Nicolas Karasiak
Senior Geospatial ML Engineer · Distributed Systems
10+ years building large-scale geospatial ML systems and remote-sensing platforms, spanning distributed processing, geospatial foundation models, time-series analysis, and open-source tooling.
600+ scientific citations for open-source geospatial software.
300+ citations across peer-reviewed publications.
EXPERIENCE
EarthDaily — Data Scientist / Data Engineer Toulouse · September 2022 – Present
- Massive-Scale Processing: Built a large-scale geospatial data platform monitoring tens of millions of agricultural parcels worldwide, with zonal statistics engines that outperform state-of-the-art solutions.
- Cost & Reliability Mastery: Redesigned distributed data pipelines to achieve a 100x reduction in processing costs while increasing first-run success rate to 99%.
- Cloud Infrastructure Optimization: Led a strategic migration to ARM64 (AWS Graviton), reducing compute costs by 40% without sacrificing performance.
- Internal AI Tooling: Standardized engineering workflows through prompt engineering, reusable skills, and slash-command integrations, improving productivity and consistency.
- Geospatial Foundation Models: Fine-tuned state-of-the-art GFMs (Tessera, Prithvi, Clay, Presto) and generated embeddings from them for crop classification and yield prediction.
Pixstart — Remote Sensing Data Scientist / ML Engineer Toulouse · September 2020 – September 2022
- Built change-detection workflows on Sentinel-2 time series for linear infrastructure monitoring.
- Developed geospatial pipelines combining ML models with domain-specific rules to generate analysis-ready outputs.
- Translated client requirements into operational remote-sensing solutions.
Dynafor / INRA — PhD, Remote Sensing Time Series Toulouse · October 2016 – August 2019
- Developed forest species mapping methods from satellite time series (Sentinel-2, Formosat-2), combining phenological and canopy-structure features.
- Investigated spatial autocorrelation effects in geospatial ML evaluation, improving robustness and reliability of performance metrics.
- Published peer-reviewed research and delivered operational remote-sensing applications.
- Built and maintained MuseoToolBox, an open-source Python library for reproducible geospatial workflows.
Guiana Amazonian Park — Remote Sensing Intern Cayenne · March 2016 – September 2016
- Developed geospatial and remote-sensing pipelines using Python, R, QGIS, and OTB.
- Designed vegetation detection and monitoring workflows at park-wide scale using satellite imagery.
PROJECTS & OPEN SOURCE
- EarthDaily Python SDK — official package; adopted and extended by the parent company following acquisition
- QGIS Dzetsaka Plugin — 200k+ downloads, cited in 600+ scientific publications
- QGIS MCP Plugin — 50+ GitHub stars in the first month of publication
- MuseoToolBox — Geospatial ML library with 43k+ downloads for reproducible remote-sensing workflows
SKILLS
- Languages: Python, R, Rust
- Distributed & Data Systems: Dask, distributed pipelines, large-scale geospatial processing
- Geospatial: STAC, GeoParquet, GDAL, Zarr
- Machine Learning: scikit-learn, PyTorch, Lightning, time-series
- Platform: Data platform & API/SDK design
- Performance: ARM64, profiling, cost-efficient distributed computing
- Cloud & Infrastructure: AWS (EC2, EKS, S3), Docker, Argo
- Leadership: System design, technical leadership