Professional experience
Data Scientist — Whympr
Remote · Apr 2026 – Present
Studies and prototypes on snow cover modeling and surface data fusion for mountain environmental products.
Modeling & data
- Temporal modeling to estimate and track snow conditions from heterogeneous sources.
- Fusion and preparation of geospatial datasets for training and evaluation pipelines.
- Field validation to calibrate models and monitor prediction quality.
Stack: Python · scikit-learn · Machine Learning · GIS
Data & Analytics Engineer — Whympr
Remote · Feb 2026 – Apr 2026 · 3 months
Implementation of a centralized analytical infrastructure.
The primary goal of this project was to unify behavioral, transactional, and product data into an industrialized daily pipeline to provide teams with a comprehensive view of user activity.
Data engineering & architecture
- Development of a Data Warehouse based on DuckDB with a 4-layer ELT approach (Staging, Clean, Build, Marts) managed end-to-end via dbt.
- Implementation and maintenance of Python extraction pipelines for 4 main data feeds: Google Analytics 4 (BigQuery), RevenueCat via Amazon S3 and via API, and the PostgreSQL production database.
- Implementation of identity resolution logic reconciling logged-out session paths with application user IDs.
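Below is a minimal sketch of one common way to do this kind of reconciliation. It assumes each event carries a hypothetical pseudonymous `device_id` and an optional `user_id`. Anonymous events on a device are attributed to the user who logs in on that device. This is an illustrative rule, not the production logic, which is not described here.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Optional


@dataclass
class Event:
    device_id: str          # hypothetical pseudonymous device/cookie identifier
    ts: int                 # event timestamp (epoch seconds)
    user_id: Optional[str]  # set only when the user is logged in


def resolve_identities(events: list[Event]) -> list[tuple[Event, Optional[str]]]:
    """Attach a resolved user id to every event (illustrative rule only).

    Within each device, anonymous events after a login inherit the last
    known user id (forward fill). Anonymous events before the first login
    inherit the first known user id (backfill). Devices that never log in
    stay unresolved (None).
    """
    resolved: list[tuple[Event, Optional[str]]] = []
    ordered = sorted(events, key=lambda e: (e.device_id, e.ts))
    for _, group in groupby(ordered, key=lambda e: e.device_id):
        device_events = list(group)
        first_known = next((e.user_id for e in device_events if e.user_id), None)
        current = first_known  # backfill value for events before the first login
        for e in device_events:
            if e.user_id:
                current = e.user_id  # forward fill from the latest login
            resolved.append((e, current))
    return resolved
```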
Modeling & analytics
- Creation of analysis models covering over 180 GA4 events, grouped into major usage categories (Navigation, Social, Monetization, etc.) to facilitate product KPI tracking.
- Modeling of the subscription lifecycle (trial, conversion, renewal, cancellation), reconstructing continuous subscription periods with an Islands & Gaps approach.
- Construction of One Big Tables (OBT) serving as a sustainable foundation for user segmentation and cohort studies.
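The Islands & Gaps idea can be sketched in Python. This assumes a subscription is a list of `(start, end)` date periods, plus a hypothetical `grace` tolerance below which two periods count as continuous. In a dbt/DuckDB model the same logic is usually written with window functions: a `LAG` on the previous end date, then a running `SUM` of "new island" flags used as the group key.

```python
from datetime import date, timedelta


def subscription_islands(
    periods: list[tuple[date, date]], grace: timedelta = timedelta(days=0)
) -> list[tuple[date, date]]:
    """Merge (start, end) billing periods into continuous subscription islands.

    Sort by start date. A new island begins whenever a period starts after
    the current island's end plus the grace tolerance (a gap). Otherwise the
    period extends the current island.
    """
    islands: list[list[date]] = []
    for start, end in sorted(periods):
        if islands and start <= islands[-1][1] + grace:
            # Overlapping or contiguous period: extend the current island.
            islands[-1][1] = max(islands[-1][1], end)
        else:
            # Gap detected: open a new island.
            islands.append([start, end])
    return [(s, e) for s, e in islands]
```

With `grace=timedelta(days=1)`, a January period ending on the 31st and a February period starting on the 1st form one island. A period starting in April after an unpaid March opens a new island, which marks a churn-then-return event.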
BI & business adoption
- Deployment of an Apache Superset instance via Docker to allow Product and Marketing teams to explore data independently, with guidance on key reports (retention by cohort, Premium conversion funnels).
- VM deployment of the containerized stack, with automated service startup and an operational runbook securing production releases and daily maintenance.
Stack: Python · dbt · DuckDB · PostgreSQL · SQL · Docker · Apache Superset
Full-Stack Data Engineer — naivo
Remote · Sept 2025 – Feb 2026 · 6 months
Design, development, and operation of a complete digital product.
From several disparate sources to a single mountain decision-support platform.
naivo.fr resolves the fragmentation of snow-weather safety data by centralizing official and collaborative streams on a high-performance map interface.
Product & value chain
- Mastery of the entire value chain: from data ingestion to the final user experience.
Data engineering & ETL
- Asynchronous ingestion (Python) of JSON/XML streams (Météo-France, BERA, Skitour, Camptocamp) with cleaning and spatialization pipelines.
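The concurrency pattern behind this kind of asynchronous ingestion can be sketched with `asyncio.gather`. The fetch function below is a hypothetical placeholder: it simulates latency and returns a toy JSON payload instead of calling the Météo-France, BERA, Skitour, or Camptocamp endpoints, whose real APIs and formats are not reproduced here.

```python
import asyncio
import json


async def fetch_source(name: str, delay: float) -> str:
    """Hypothetical stand-in for an HTTP call; returns a toy JSON payload."""
    await asyncio.sleep(delay)  # simulates network latency
    return json.dumps({"source": name, "records": [{"lat": 45.1, "lon": 5.7}]})


async def ingest_all(sources: dict[str, float]) -> dict[str, dict]:
    """Fetch all sources concurrently, then parse each payload."""
    names = list(sources)
    payloads = await asyncio.gather(
        *(fetch_source(n, sources[n]) for n in names), return_exceptions=True
    )
    results: dict[str, dict] = {}
    for name, payload in zip(names, payloads):
        if isinstance(payload, BaseException):
            continue  # one failing feed should not block the others
        results[name] = json.loads(payload)
    return results
```

Because the calls run concurrently, total wall time is roughly that of the slowest source rather than the sum of all sources. `return_exceptions=True` keeps one broken feed from cancelling the rest.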
GIS infrastructure & performance
- Serverless GIS infrastructure: design of a pipeline (GDAL, Tippecanoe, PMTiles) processing digital elevation models at 25 m resolution.
- High availability & frugality: distribution of vector tiles and rasters via Cloudflare Workers and R2 (PMTiles format), ensuring < 100 ms latency on mobile.
Diffusion & recognition
- Social media automation: CI/CD pipelines (GitHub Actions) that automatically capture the interface (multi-layer, multi-format) and publish the captures daily on Instagram and Facebook, showcasing the platform's features and diverse use cases.
- Institutional recognition: project selected and honored by data.gouv.fr among the remarkable Open Data reuses of the year, validating the technical quality and product relevance.
- Media coverage: covered by regional (Le Dauphiné Libéré) and specialized press (La Belle Route) for its concrete impact on public mountain safety.
Stack: Python · SQLite · GDAL · Cloudflare Workers · Astro/React · MapLibre GL JS · GitHub Actions
Data Engineer & Scientist — CNRS / IGE / Météo-France
Grenoble, France · Mar 2025 – Sept 2025 · 7 months
Complete industrialization of the analysis of hourly extreme precipitation in France (1959–2022): from massive climate data processing to the production of automated scientific deliverables.
Big data engineering
- Initial design of a distributed ETL pipeline (Dask, Airflow, Linux HPC cluster) processing 10 billion data points (100 GB) across 88,000 modeled points and 14,000 observed points covering 560,640 hours.
Optimized formats: NetCDF, Zarr, Parquet. 80% performance gain over sequential execution.
Advanced statistical modeling
- Automatic selection engine among 7 spatio-temporal GEV models (stationary and non-stationary) based on the profile log-likelihood, applied point by point to characterize seasonal and monthly extremes.
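For reference, here is a sketch of the likelihood this kind of selection rests on, using the standard GEV parameterization (location μ, scale σ, shape ξ ≠ 0). The linear trend in the location parameter is an illustrative assumption, not necessarily one of the 7 models, and the spatial structure is omitted. Nested models can be compared through their maximized (or profile) log-likelihoods. The example below uses the classical deviance statistic, where k is the difference in the number of parameters.

```latex
% GEV distribution function (\xi \neq 0), defined where 1 + \xi (z - \mu)/\sigma > 0
G(z) = \exp\left\{ -\left[ 1 + \xi\,\frac{z - \mu}{\sigma} \right]^{-1/\xi} \right\}

% Illustrative non-stationary variant: linear trend in the location parameter
\mu(t) = \mu_0 + \mu_1 t

% Log-likelihood of n block maxima (e.g. seasonal or monthly) z_i observed at times t_i
\ell(\mu_0, \mu_1, \sigma, \xi) = -n \log \sigma
  - \left(1 + \frac{1}{\xi}\right) \sum_{i=1}^{n} \log\left[ 1 + \xi\,\frac{z_i - \mu(t_i)}{\sigma} \right]
  - \sum_{i=1}^{n} \left[ 1 + \xi\,\frac{z_i - \mu(t_i)}{\sigma} \right]^{-1/\xi}

% Profile log-likelihood for \xi, and deviance between nested models M_0 \subset M_1
\ell_p(\xi) = \max_{\mu_0, \mu_1, \sigma} \ell(\mu_0, \mu_1, \sigma, \xi)
D = 2\left\{ \ell_1(\mathcal{M}_1) - \ell_0(\mathcal{M}_0) \right\} \;\overset{\text{approx.}}{\sim}\; \chi^2_k
```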
DevOps/MLOps & reproducibility
- Full CI/CD chain (GitHub Actions, Docker) with automated, versioned dataset publication on Hugging Face Datasets, Streamlit application deployment on Hugging Face Spaces, interactive dashboards (Plotly, Leafmap), and dynamic scientific reports (Quarto).
Configuration as code (YAML), monitoring, and full reproducibility.
Stack: Python · Dask · Xarray · Polars · NumPy · SciPy · Numba · Zarr · Parquet · Airflow · Docker · Linux HPC · Streamlit
Doctor of Veterinary Medicine — Clinique vétérinaire des Dômes
Le Broc, France · Sept 2021 – Aug 2024 · 3 years
Mixed practice in canine and rural medicine, combining clinical care, herd monitoring, and a scientific approach to animal health data.
Responsibilities & achievements
- Emergency care and clinical prioritization (canine and rural), with standardized protocols and complete traceability.
- Coordination and monitoring of technical and scientific projects on livestock health, nutrition, and reproduction.
- Rigorous methodology: diagnostic reproducibility, protocol compliance, patient data management.
- Pedagogical communication and scientific popularization for owners, technicians, and agricultural partners.
- Operational collaboration with health authorities, laboratories, field actors, and international institutions (One Health approach).
Doctor of Veterinary Medicine — Clinique Ani-Médic
La Tardière, France · Sept 2019 – Aug 2021 · 2 years
Health and technical monitoring of cattle, sheep, and goat farms with an integrated animal-environment-production approach.
Responsibilities & achievements
- Analysis and use of field data (reproduction, nutrition, pathologies, welfare).
- Health and prevention plans based on measurable indicators (herd monitoring, fertility, milk quality).
- Interdisciplinary coordination between veterinarians, technicians, and farmers on complex issues.
- Training and support for farmers; concise technical reports for decision-making.
Data Scientist & Analyst — INRA
Le Rheu, France · Mar 2019 – July 2019 · 5 months
Design of operational lactosemia indicators for livestock consulting organizations, based on statistical and mechanistic modeling of lactose transfers between the udder and the blood.
Analytical rigor
- Quality control and analytical validation of 766 measurements from 10 experiments (279 cow profiles).
Result: CV ≤ 3% (repeatability) and ≤ 7% (reproducibility), ensuring indicator reliability for production integration.
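For readers less familiar with these figures, here is a minimal sketch of how a coefficient of variation is computed on replicate measurements of the same sample. Repeatability uses replicates within a single run; reproducibility uses replicates across runs or experiments. The values in the example are made up.

```python
from statistics import mean, stdev


def cv_percent(replicates: list[float]) -> float:
    """Coefficient of variation (%) = sample standard deviation / mean x 100."""
    return 100 * stdev(replicates) / mean(replicates)
```

For example, three replicates of 10.0, 10.2, and 9.8 give a mean of 10.0 and a standard deviation of 0.2, so the CV is 2%.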
Advanced mechanistic modeling
- Development of a two-compartment udder-blood model quantifying physiological lactose transfers (2.1 g per 12 h of milking; 130 g per 24 h; renal clearance $0.24\ \mathrm{L \cdot min^{-1}}$).
Showed that during inflammation, up to 22% of the variation in lactosemia is explained by animal profiles.
Decision-making impact
- Translation of experimental data into robust indicators directly integrable into decision-support tools for livestock consultants.
- Publication at the Académie Vétérinaire de France and ADSA-INRAE communication.
Stack: R · lme4 · lmerTest · emmeans · ggplot2
Veterinary Assistant — Clinique Ani-Médic
Moncoutant, France · Jan 2019 · 1 month
Bovine prophylaxis campaign.
Field mission
- Collection and entry of health data in the field, standardization of records, quality control, and monitoring of bovine screening indicators.
- Collaboration with veterinary teams and state services.
Projects
FeelingsAnalysis
Nov 2025
Multi-aspect AI system automating the analysis of customer feedback in French-language restaurant reviews (price, cuisine, service, atmosphere).
Generalizable to e-commerce, satisfaction surveys, and HR.
- Strategic benchmark: rigorous comparison between zero-shot LLM approach (Ollama) and fine-tuning a CamemBERT-Large transformer model (110M parameters) with 4 independent classification heads.
More than 24 experimental iterations to reach the optimal configuration.
- GPU optimization: reproducible training pipeline (PyTorch Lightning) integrating Mixed Precision (FP16, 2–3x acceleration), Gradient Checkpointing (−40% GPU memory), Gradient Accumulation (effective batch 128), and Discriminative Learning Rates.
Result: 86% macro-accuracy (87.8% on service, 87.2% on cuisine).
- Production-ready: modular architecture, centralized configuration, and automatic logging of metrics, designed for industrial deployment via REST API.
Stack: Python · PyTorch Lightning · Hugging Face Transformers · Ollama · scikit-learn
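The effective batch size of 128 follows from simple arithmetic: effective batch = micro-batch × number of devices × accumulation steps. As an illustration only (the actual values are not given), a micro-batch of 16 on a single GPU needs 8 accumulated micro-batches per optimizer step. In PyTorch Lightning, that number would be passed as `Trainer(accumulate_grad_batches=...)`.

```python
def accumulation_steps(effective_batch: int, micro_batch: int, n_devices: int = 1) -> int:
    """Number of micro-batches to accumulate before each optimizer step.

    effective_batch = micro_batch * n_devices * accumulation_steps
    """
    per_step = micro_batch * n_devices
    if effective_batch % per_step:
        raise ValueError("effective batch must be a multiple of micro_batch * n_devices")
    return effective_batch // per_step
```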
SunCast
Nov 2025
High-Performance Computing (HPC) simulation engine computing high-resolution sunrise and sunset times that account for shadows cast by real topography (Copernicus DEM).
- High-Performance Computing: massively parallelized C++17 calculator (OpenMP, 96 cores per task) processing ~2 million pixels × 365 days per department.
Binary streaming from C++ to Python (no intermediate files). Deployment on an HPC cluster via Slurm job arrays.
- Use cases: actionable data for solar panel placement optimization in mountains, glaciological studies, precision agriculture, and urban planning.
Stack: C++17 · OpenMP · Python · NumPy · GDAL · Slurm HPC · Parquet
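The core test for topographic shading fits in a few lines. This is a simplified illustration, not the C++ engine. The horizon angle is computed along a single terrain profile sampled in the sun's azimuth, ignoring Earth curvature and atmospheric refraction, and a pixel counts as sunlit when the solar elevation exceeds that angle. Under this simplification, terrain-aware sunrise is the first time of day at which the test becomes true.

```python
import math


def horizon_angle_deg(profile_heights: list[float], cell_size_m: float, observer_height_m: float) -> float:
    """Maximum elevation angle (degrees) of the terrain along one azimuth.

    profile_heights[k] is the terrain height (m) at distance (k + 1) * cell_size_m
    from the observer, sampled along the sun's azimuth.
    """
    best = -90.0
    for k, h in enumerate(profile_heights, start=1):
        angle = math.degrees(math.atan2(h - observer_height_m, k * cell_size_m))
        best = max(best, angle)
    return best


def sun_visible(solar_elevation_deg: float, profile_heights: list[float],
                cell_size_m: float, observer_height_m: float) -> bool:
    """The pixel is lit if the sun is above the local topographic horizon."""
    return solar_elevation_deg > horizon_angle_deg(profile_heights, cell_size_m, observer_height_m)
```

Consider an observer at 1,000 m with a ridge at 1,500 m located 1 km away in the sun's direction. The horizon sits at about 26.6°, so a sun at 20° elevation is still hidden.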
Nivéo
Oct 2025 – Nov 2025
Fully autonomous, serverless data architecture for real-time snow monitoring in France, built on public Météo-France APIs (DPClim).
- Total automation & DevSecOps: ingestion and visualization pipeline running daily without human intervention.
Enhanced security: AWS authentication via OIDC (zero long-lived keys), encrypted secrets, minimal IAM permissions.
- Infrastructure as Code (IaC): Terraform provisioning of AWS resources (Lambda, DynamoDB with an 11-day TTL).
Zero-operating-cost stack (AWS always-free tier) with automatic data cleanup and exports via GitHub Actions.
- Lightweight frontend: static site (Astro, MapLibre GL JS, Chart.js) automatically redeployed to GitHub Pages on every data update.
Stack: Python · AWS (DynamoDB, Lambda) · Terraform · GitHub Actions · Astro · MapLibre GL JS · Chart.js
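The 11-day retention relies on DynamoDB's Time to Live feature. TTL expects a Number attribute holding an expiry time in Unix epoch seconds, and DynamoDB deletes the item some time after that timestamp has passed. Below is a minimal sketch of how an ingestion Lambda could stamp items before writing them. The attribute name `expires_at` is a hypothetical choice and must match the table's TTL setting.

```python
import time
from typing import Optional

TTL_DAYS = 11  # retention window stated for the DynamoDB table
SECONDS_PER_DAY = 86_400


def with_ttl(item: dict, now: Optional[float] = None, attr: str = "expires_at") -> dict:
    """Return a copy of `item` with a TTL attribute set to now + 11 days (epoch seconds)."""
    now = time.time() if now is None else now
    return {**item, attr: int(now) + TTL_DAYS * SECONDS_PER_DAY}
```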
MissingDataLab
Oct 2024 – Feb 2025
Comparative methodological study of missing-data imputation strategies under MCAR, MAR, and MNAR mechanisms, to ensure predictive model robustness in real-world contexts.
- Rigorous benchmarking: systematic evaluation of 7 methods (mean, median, KNN, SoftImpute, PCA, ICE, MissForest) on simulated scenarios, showing that MissForest and ICE best preserve statistical integrity.
Stack: Python · scikit-learn · SciPy · pandas · NumPy · R (simstudy)
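The three missingness mechanisms differ only in what the probability of a value being missing depends on. The simulation sketch below is simplified: it uses deterministic top-k masking for MAR and MNAR, chosen for readability. The study's actual simulation design is not reproduced here.

```python
import random


def mask_values(x: list[float], y: list[float], mechanism: str,
                rate: float = 0.3, seed: int = 0) -> list:
    """Return a copy of y with None for missing entries.

    MCAR: missingness is independent of the data (uniform random subset).
    MAR:  missingness depends only on the observed covariate x (largest x masked).
    MNAR: missingness depends on the unobserved value of y itself (largest y masked).
    """
    rng = random.Random(seed)
    n = len(y)
    k = round(rate * n)
    if mechanism == "MCAR":
        idx = set(rng.sample(range(n), k))
    elif mechanism == "MAR":
        idx = set(sorted(range(n), key=lambda i: x[i], reverse=True)[:k])
    elif mechanism == "MNAR":
        idx = set(sorted(range(n), key=lambda i: y[i], reverse=True)[:k])
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return [None if i in idx else v for i, v in enumerate(y)]
```

Under MNAR the observed values of y are systematically biased downward. No method can detect this from the observed data alone, which is why MNAR scenarios are the hardest test for imputation methods.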
BioResistanceAI
Jan 2025
Predictive AI for antibiotic resistance: predicting bacterial resistance to 5 antibiotics from multi-omic data (414 bacteria, 94,000+ features: 72,236 SNPs, 16,005 genes, 6,026 gene expression levels).
- Predictive performance: recall of 0.96 (tobramycin, XGBoost) and stable performance across all antibiotics through optimized ensemble models (XGBoost, LightGBM) with systematic hyperparameter tuning (GridSearchCV, 5-fold cross-validation).
- Explainable AI & feature importance: identification of the most predictive data sources per antibiotic (transcriptomics dominates in the best models) to guide rapid diagnostic strategies in clinical contexts.
Stack: Python · scikit-learn · XGBoost · LightGBM · PyTorch · pandas · NumPy
SnowTrack
Dec 2024 – Feb 2025
Orchestration of automated validation pipelines for climate forecast model evaluation (S2M vs field observations from Météo-France).
- ETL automation & spatio-temporal analysis: automated flows for data ingestion, processing, and aggregation, generating detailed statistics (means, maxima, distributions, trends) and identifying zones and periods with significant discrepancies between models and observations.
Stack: Python · pandas · NumPy
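Below is a minimal sketch of the kind of per-station comparison such a pipeline computes. It assumes paired model and observed series per station (e.g. snow depth) and a hypothetical bias threshold for flagging discrepancies. The statistics and thresholds actually used are not specified here.

```python
from math import sqrt


def discrepancy_stats(model: list, obs: list) -> dict:
    """Bias (mean of model - obs), RMSE and max absolute error for paired series.

    Pairs where either value is missing (None) are skipped.
    """
    pairs = [(m, o) for m, o in zip(model, obs) if m is not None and o is not None]
    if not pairs:
        raise ValueError("no valid model/observation pairs")
    diffs = [m - o for m, o in pairs]
    n = len(diffs)
    return {
        "n": n,
        "bias": sum(diffs) / n,
        "rmse": sqrt(sum(d * d for d in diffs) / n),
        "max_abs": max(abs(d) for d in diffs),
    }


def flag_stations(series_by_station: dict, bias_threshold: float) -> list:
    """Return the ids of stations whose absolute bias exceeds the threshold."""
    return sorted(
        sid for sid, (model, obs) in series_by_station.items()
        if abs(discrepancy_stats(model, obs)["bias"]) > bias_threshold
    )
```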
DevOps — Portfolio {NCS}decoopman
2024 – Present
Automated publication of the personal site, CV, and analytical reports (Quarto) within a reproducible chain.
CI/CD & publication
- Complete automation of site and analytical report publication.
- Every commit triggers regeneration of the site (Astro) and reports (Quarto), built and uploaded via GitHub Actions.
- CI/CD ensuring consistency, traceability, and editorial time savings.
Stack: Astro · TypeScript · Quarto · GitHub Actions
Université Grenoble Alpes
Master's degree in statistics and data science (SSD) — Core AI Label by MIAI · Sept 2024 – Aug 2025
Grade: Mention bien (With honors)
- Statistics: statistical tests, statistical estimation (moments, likelihood), regressions (linear, logistic), GLM, computational statistics (bootstrap, permutation), high-dimensional statistics (FWER, FDR, Lasso and Ridge regularization), biostatistics (mixed models, survival models)
- Data exploration: numerical (descriptive analysis), text mining (DL, Word2Vec, NLP, BERT, Hugging Face Transformers), spatial (kriging)
- Modeling: Bayesian (Monte Carlo), sampling, time series (ARIMA, GARCH)
- Machine learning: supervised learning (classification and regression: K-NN, SVM, Random Forests), unsupervised (K-means clustering, PCA for dimensionality reduction), deep learning (CNN, RNN, and transformers)
- Programming and optimization (solvers):
  - R: boot, lme4, randomForest, xgboost, ggplot2, RMarkdown
  - Python: NumPy, pandas, SciPy, scikit-learn, TensorFlow, Keras, matplotlib, CVXPY
Nantes National Veterinary School
State diploma of Doctor of Veterinary Medicine (DVM) · Sept 2014 – Sept 2019
Grade: Mention très honorable avec félicitations du jury (Highest honors)
Scientific and clinical training recognized at the European level (ESEVT), covering all areas of animal medicine and surgery, veterinary public health, and animal production.
Development of cross-disciplinary skills in diagnosis, methodological rigor, clinical data management, technical communication, and interdisciplinary coordination.
Final specialization in production medicine and data science with an experimental research approach focused on biological data modeling and analysis.