NICOLAS DECOOPMAN

Data & Systems Engineer

Grenoble, France

LinkedIn https://www.linkedin.com/in/nicolas-decoopman/
GitHub https://github.com/NCSdecoopman
Portfolio https://ncsdecoopman.github.io/

I analyze and structure complex systems to design robust, operational solutions. A veterinarian by training, I bring scientific rigor and sound judgment honed in high-stakes environments. My expertise combines data engineering, distributed systems, and production deployment: from ingesting billions of data points to delivering decision-support tools.

Professional experience

Data Scientist — Whympr

Remote · Apr 2026 – Present

Studies and prototypes on snow cover modeling and surface data fusion for mountain environmental products.

Modeling & data

  • Temporal modeling to estimate and track snow conditions from heterogeneous sources.
  • Fusion and preparation of geospatial datasets for training and evaluation pipelines.
  • Field validation to calibrate models and monitor prediction quality.

Stack: Python · scikit-learn · Machine Learning · GIS

Data & Analytics Engineer — Whympr

Remote · Feb 2026 – Apr 2026 · 3 months

Implementation of a centralized analytical infrastructure. The primary goal of this project was to unify behavioral, transactional, and product data into an industrialized daily pipeline to provide teams with a comprehensive view of user activity.

Data engineering & architecture

  • Development of a Data Warehouse based on DuckDB with a 4-layer ELT approach (Staging, Clean, Build, Marts) managed end-to-end via dbt.
  • Implementation and maintenance of Python extraction pipelines for 4 data feeds: Google Analytics 4 (BigQuery), RevenueCat (via Amazon S3 and via API), and the PostgreSQL production database.
  • Implementation of identity-resolution logic reconciling anonymous (logged-out) session paths with application user IDs.
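Identity resolution of this kind can be sketched as "stitching": an anonymous device or session identifier that is ever seen together with a user ID is mapped to that user, and earlier logged-out events are back-filled. This is a minimal illustration under assumptions, not the production logic; the event shape and the `anonymous_id` name are hypothetical.

```python
from typing import Optional

# Hypothetical event shape: (anonymous_id, user_id or None, event_name)
Event = tuple[str, Optional[str], str]


def build_identity_map(events: list[Event]) -> dict[str, str]:
    """Map each anonymous_id to the first user_id observed alongside it."""
    mapping: dict[str, str] = {}
    for anonymous_id, user_id, _ in events:
        if user_id is not None and anonymous_id not in mapping:
            mapping[anonymous_id] = user_id
    return mapping


def resolve(events: list[Event]) -> list[tuple[Optional[str], str]]:
    """Attach a resolved user_id to every event, including logged-out ones."""
    mapping = build_identity_map(events)
    resolved = []
    for anonymous_id, user_id, name in events:
        # Prefer the explicit user_id; otherwise fall back to the stitched identity.
        resolved.append((user_id or mapping.get(anonymous_id), name))
    return resolved


events = [
    ("dev-1", None, "page_view"),   # logged out
    ("dev-1", "u42", "login"),      # identity revealed
    ("dev-2", None, "page_view"),   # never logged in
]
print(resolve(events))
# [('u42', 'page_view'), ('u42', 'login'), (None, 'page_view')]
```

Sessions that never log in stay unresolved (`None`), which keeps anonymous traffic countable without attributing it to the wrong user.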

Modeling & analytics

  • Creation of analysis models covering 180+ GA4 events, grouped by main use case (Navigation, Social, Monetization, etc.), to facilitate product KPI tracking.
  • Modeling of the subscription lifecycle (trial, conversion, renewal, cancellation), reconstructing continuous subscription periods with a gaps-and-islands approach.
  • Construction of One Big Table (OBT) models serving as a durable foundation for user segmentation and cohort studies.
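The gaps-and-islands idea behind subscription periods can be illustrated in Python: billing periods are sorted, and a period that starts within a tolerance of the previous one's end extends the same "island"; a larger gap starts a new one. This is a sketch of the technique, not the warehouse implementation; the 3-day grace tolerance and the data are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical tolerance: a renewal starting within 3 days of the previous
# period's end is treated as the same continuous subscription.
GRACE = timedelta(days=3)


def subscription_islands(periods: list[tuple[date, date]]) -> list[tuple[date, date]]:
    """Merge (start, end) billing periods into continuous islands."""
    islands: list[tuple[date, date]] = []
    for start, end in sorted(periods):
        if islands and start <= islands[-1][1] + GRACE:
            # Overlapping or within the grace gap: extend the current island.
            island_start, island_end = islands[-1]
            islands[-1] = (island_start, max(island_end, end))
        else:
            # A real gap: a new island (e.g. a re-subscription) begins.
            islands.append((start, end))
    return islands


periods = [
    (date(2025, 1, 1), date(2025, 1, 31)),
    (date(2025, 2, 1), date(2025, 2, 28)),   # renewal, contiguous
    (date(2025, 5, 10), date(2025, 6, 9)),   # comeback after churn
]
print(subscription_islands(periods))
```

Here January and February merge into one island and May starts a second, which is what makes churn and reactivation countable. In SQL the same pattern is usually written with `LAG` to detect gaps and a running sum to number the islands.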

BI & business adoption

  • Deployment of an Apache Superset instance via Docker to allow Product and Marketing teams to explore data independently, with guidance on key reports (retention by cohort, Premium conversion funnels).
  • Deployment on a VM with a containerized stack, automated service startup, and an operational runbook securing production releases and day-to-day maintenance.

Stack: Python · dbt · DuckDB · PostgreSQL · SQL · Docker · Apache Superset

Full-Stack Data Engineer — naivo

Remote · Sept 2025 – Feb 2026 · 6 months

Design, development, and operation of a complete digital product that brings several disparate sources together into a single mountain decision-support platform. naivo.fr addresses the fragmentation of snow and weather safety data by centralizing official and collaborative streams on a high-performance map interface.

Product & value chain

  • Mastery of the entire value chain: from data ingestion to the final user experience.

Data engineering & ETL

  • Asynchronous ingestion (Python) of JSON/XML streams (Météo-France, BERA, Skitour, Camptocamp) with cleaning and spatialization pipelines.
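Concurrent ingestion of several feeds can be sketched with `asyncio.gather`: all sources are fetched at once, and `return_exceptions=True` keeps one failing feed from aborting the batch. The fetcher below is a stand-in returning canned JSON (the payloads and source keys are hypothetical, and "skitour" is made to fail on purpose); a real version would use an async HTTP client and also parse XML feeds.

```python
import asyncio
import json

# Hypothetical canned payloads standing in for real HTTP responses.
FAKE_PAYLOADS = {
    "meteo_france": '{"station": "X", "snow_cm": 42}',
    "camptocamp": '{"route": "Y", "conditions": "good"}',
}


async def fake_fetch(source: str) -> str:
    """Simulate network I/O; unknown sources raise to mimic an outage."""
    await asyncio.sleep(0.01)
    if source not in FAKE_PAYLOADS:
        raise ConnectionError(f"{source} unavailable")
    return FAKE_PAYLOADS[source]


async def ingest(sources: list[str]) -> dict[str, object]:
    """Fetch all sources concurrently; record errors instead of failing the batch."""
    results = await asyncio.gather(
        *(fake_fetch(s) for s in sources), return_exceptions=True
    )
    parsed: dict[str, object] = {}
    for source, result in zip(sources, results):
        if isinstance(result, Exception):
            parsed[source] = {"error": str(result)}
        else:
            parsed[source] = json.loads(result)
    return parsed


if __name__ == "__main__":
    print(asyncio.run(ingest(["meteo_france", "camptocamp", "skitour"])))
```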

GIS infrastructure & performance

  • Serverless GIS infrastructure: design of a pipeline (GDAL, Tippecanoe, PMTiles) processing digital elevation models at 25 m resolution.
  • High availability & frugality: distribution of vector tiles and rasters via Cloudflare Workers and R2 (PMTiles format), ensuring < 100 ms latency on mobile.
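Map clients such as MapLibre GL JS request tiles by Web Mercator z/x/y coordinates, and a PMTiles archive maps each z/x/y triple to an internal tile ID. The standard conversion from longitude/latitude to a tile index is sketched below; the example point is arbitrary.

```python
import math


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) Web Mercator tile containing a point at a zoom level."""
    n = 2 ** zoom  # number of tiles along each axis
    x = math.floor((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp to the valid range (handles lon == 180 and near-polar latitudes).
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


# Example: a point near Grenoble at zoom 12
print(lonlat_to_tile(5.72, 45.19, 12))
```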

Diffusion & recognition

  • Social media automation: CI/CD pipelines (GitHub Actions) that automatically capture the interface (multiple layers and formats) and publish the captures daily on Instagram and Facebook, showcasing the platform's features and diverse use cases.
  • Institutional recognition: project selected and honored by data.gouv.fr among the remarkable Open Data reuses of the year, validating the technical quality and product relevance.
  • Media coverage: covered by regional (Le Dauphiné Libéré) and specialized press (La Belle Route) for its concrete impact on public mountain safety.

Stack: Python · SQLite · GDAL · Cloudflare Workers · Astro/React · MapLibre GL JS · GitHub Actions

Data Engineer & Scientist — CNRS / IGE / Météo-France

Grenoble, France · Mar 2025 – Sept 2025 · 7 months

Complete industrialization of the analysis of hourly extreme precipitation in France (1959–2022): from massive climate data processing to the production of automated scientific deliverables.

Big data engineering

  • Initial design of a distributed ETL pipeline (Dask, Airflow, Linux HPC cluster) processing 10 billion data points (100 GB) across 88,000 modeled points and 14,000 observed points covering 560,640 hours. Optimized formats: NetCDF, Zarr, Parquet. 80% performance gain over sequential execution.
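The 560,640 hours correspond to 64 years (1959 to 2022) of 365 × 24 = 8,760 hours. The chunk-and-reduce pattern that a Dask pipeline relies on can be sketched with the standard library: an hourly series is split into chunks, each chunk is reduced to per-year maxima (the block maxima a GEV fit consumes), and partial results are merged. `ThreadPoolExecutor` is only a portable stand-in for Dask workers here; the chunk size and data are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

HOURS_PER_YEAR = 365 * 24  # 365-day years: 64 * 8,760 = 560,640 hours
START_YEAR = 1959


def chunk_annual_maxima(chunk: tuple[int, list[float]]) -> dict[int, float]:
    """Reduce one chunk (start_hour, values) to {year: max value}."""
    start_hour, values = chunk
    maxima: dict[int, float] = {}
    for offset, value in enumerate(values):
        year = START_YEAR + (start_hour + offset) // HOURS_PER_YEAR
        if value > maxima.get(year, float("-inf")):
            maxima[year] = value
    return maxima


def annual_maxima(series: list[float], chunk_size: int) -> dict[int, float]:
    """Split a series into chunks, reduce them concurrently, merge the results."""
    chunks = [(i, series[i:i + chunk_size]) for i in range(0, len(series), chunk_size)]
    merged: dict[int, float] = defaultdict(lambda: float("-inf"))
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(chunk_annual_maxima, chunks):
            for year, value in partial.items():
                merged[year] = max(merged[year], value)
    return dict(merged)


print(64 * HOURS_PER_YEAR)  # 560640
```

Because the per-chunk reduction is a maximum, merging is order-independent, so chunks can cross year boundaries and be processed in any order.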

Advanced statistical modeling

  • Automated selection among 7 spatio-temporal GEV models (stationary and non-stationary) based on profile log-likelihood, applied point by point to characterize seasonal and monthly extremes.
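Likelihood-based selection among nested GEV models can be sketched as a chain of likelihood-ratio tests on maximized log-likelihoods: a richer model (for example, one with a linear trend in the location parameter) is kept only if twice the log-likelihood gain exceeds the χ² critical value. This may differ in detail from the profile-likelihood procedure described above; the nested chain, the model names, the 5% level, and the log-likelihood values are assumptions. The `gev_loglik` function is the standard GEV log-density, with the Gumbel limit at ξ = 0.

```python
import math

# Upper 5% quantiles (rounded) of the chi-squared distribution for 1-3 dof.
CHI2_95 = {1: 3.841, 2: 5.991, 3: 7.815}


def gev_loglik(x: list[float], mu: float, sigma: float, xi: float) -> float:
    """Log-likelihood of data x under a GEV(mu, sigma, xi) distribution."""
    if sigma <= 0:
        return -math.inf
    total = -len(x) * math.log(sigma)
    for value in x:
        z = (value - mu) / sigma
        if abs(xi) < 1e-9:  # Gumbel limit
            total -= z + math.exp(-z)
        else:
            t = 1.0 + xi * z
            if t <= 0:  # outside the distribution's support
                return -math.inf
            total -= (1.0 + 1.0 / xi) * math.log(t) + t ** (-1.0 / xi)
    return total


def select_nested(models: list[tuple[str, int, float]]) -> str:
    """Walk nested models (name, n_params, max log-lik), simplest first.

    The richer model replaces the current one only when the likelihood-ratio
    test rejects the current model at the 5% level.
    """
    best_name, best_k, best_ll = models[0]
    for name, k, ll in models[1:]:
        deviance = 2.0 * (ll - best_ll)
        if deviance > CHI2_95[k - best_k]:
            best_name, best_k, best_ll = name, k, ll
    return best_name


# Hypothetical maximized log-likelihoods at one grid point.
candidates = [
    ("stationary", 3, -812.4),
    ("trend_in_location", 4, -806.9),        # deviance 11.0 > 3.841: kept
    ("trend_in_location_scale", 5, -806.1),  # deviance 1.6 < 3.841: rejected
]
print(select_nested(candidates))  # trend_in_location
```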

DevOps/MLOps & reproducibility

  • Full CI/CD chain (GitHub Actions, Docker) with automated, versioned dataset publication on Hugging Face Datasets, Streamlit application deployment on Hugging Face Spaces, interactive dashboards (Plotly, Leafmap), and dynamic scientific reports (Quarto). Configuration as code (YAML), monitoring, and full reproducibility.

Stack: Python · Dask · Xarray · Polars · NumPy · SciPy · Numba · Zarr · Parquet · Airflow · Docker · Linux HPC · Streamlit

Doctor of Veterinary Medicine — Clinique vétérinaire des Dômes

Le Broc, France · Sept 2021 – Aug 2024 · 3 years

Mixed practice in canine and rural medicine, combining clinical care, herd monitoring, and a scientific approach to animal health data.

Responsibilities & achievements

  • Emergency care and clinical prioritization (canine and rural), with standardized protocols and complete traceability.
  • Coordination and monitoring of technical and scientific projects on livestock health, nutrition, and reproduction.
  • Rigorous methodology: diagnostic reproducibility, protocol compliance, patient data management.
  • Educational communication and science outreach for owners, technicians, and agricultural partners.
  • Operational collaboration with health authorities, laboratories, field actors, and international institutions (One Health approach).

Doctor of Veterinary Medicine — Clinique Ani-Médic

La Tardière, France · Sept 2019 – Aug 2021 · 2 years

Health and technical monitoring of cattle, sheep, and goat farms with an integrated animal-environment-production approach.

Responsibilities & achievements

  • Analysis and use of field data (reproduction, nutrition, pathologies, welfare).
  • Health and prevention plans based on measurable indicators (herd monitoring, fertility, milk quality).
  • Interdisciplinary coordination between veterinarians, technicians, and farmers on complex issues.
  • Training and support for farmers; concise technical reports for decision-making.

Data Scientist & Analyst — INRA

Le Rheu, France · Mar 2019 – July 2019 · 5 months

Design of operational lactosemia indicators for livestock consulting organizations, based on statistical and mechanistic modeling of lactose transfers between the udder and the blood.

Analytical rigor

  • Management of quality control and analytical validation on 766 measurements from 10 experiments (279 cow profiles). Result: CV ≤ 3% (repeatability) and ≤ 7% (reproducibility), ensuring indicator reliability for production integration.
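The coefficient of variation used as the precision criterion is the sample standard deviation divided by the mean. A short sketch on made-up replicate values (not the study's data):

```python
import statistics


def cv_percent(replicates: list[float]) -> float:
    """Coefficient of variation (sample SD / mean), in percent."""
    return 100.0 * statistics.stdev(replicates) / statistics.mean(replicates)


# Hypothetical replicate assays of one sample (arbitrary units).
replicates = [10.0, 10.2, 9.8, 10.1, 9.9]
print(round(cv_percent(replicates), 2))  # 1.58
```

Repeatability uses replicates measured under the same conditions (same run), reproducibility replicates measured under changed conditions (for example across runs or days).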

Advanced mechanistic modeling

  • Development of a two-compartment udder-blood model quantifying complex physiological transfers (2.1 g/12 h of milking; 130 g/24 h; renal clearance $0.24~\mathrm{L \cdot min^{-1}}$). Showed that during inflammation, up to 22% of lactosemia variation is explained by individual animal profiles.
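A generic two-compartment structure can be sketched with explicit Euler integration: lactose leaks from the udder into blood at a first-order rate, and blood lactose is removed by renal clearance (amount removed per minute = clearance × concentration). Only the 0.24 L/min clearance comes from the text; the leak rate, blood volume, and initial amount are hypothetical placeholders, there is no synthesis term, and the original model's equations are not reproduced here.

```python
# Two-compartment sketch: udder (U, g) -> blood (B, g) -> urine (renal clearance).
CLEARANCE_L_PER_MIN = 0.24   # renal clearance quoted in the text
BLOOD_VOLUME_L = 40.0        # hypothetical placeholder
LEAK_RATE_PER_MIN = 0.002    # hypothetical first-order udder -> blood rate


def simulate(udder_g: float, minutes: int, dt: float = 1.0) -> tuple[float, float, float]:
    """Integrate dU/dt = -k*U and dB/dt = k*U - (CL/V)*B with Euler steps.

    Returns (udder_g, blood_g, excreted_g) after `minutes`.
    """
    blood_g, excreted_g = 0.0, 0.0
    elimination_rate = CLEARANCE_L_PER_MIN / BLOOD_VOLUME_L  # per minute
    for _ in range(int(minutes / dt)):
        leak = LEAK_RATE_PER_MIN * udder_g * dt
        renal = elimination_rate * blood_g * dt
        udder_g -= leak
        blood_g += leak - renal
        excreted_g += renal
    return udder_g, blood_g, excreted_g


u, b, e = simulate(udder_g=100.0, minutes=12 * 60)
print(f"udder={u:.1f} g, blood={b:.1f} g, excreted={e:.1f} g")
```

Tracking the excreted amount makes mass balance easy to check: the three compartments always sum to the initial udder content.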

Decision-making impact

  • Translation of experimental data into robust indicators directly integrable into decision-support tools for livestock consultants.
  • Publication at the Académie Vétérinaire de France and ADSA-INRAE communication.

Stack: R · lme4 · lmerTest · emmeans · ggplot2

Veterinary Assistant — Clinique Ani-Médic

Moncoutant, France · Jan 2019 · 1 month

Bovine prophylaxis campaign.

Field mission

  • Collection and entry of health data in the field, standardization of records, quality control, monitoring of bovine screening indicators.
  • Collaboration with veterinary teams and state services.

Projects

FeelingsAnalysis

Nov 2025

Aspect-based AI system automating the analysis of French-language restaurant reviews across four aspects (price, cuisine, service, atmosphere). Generalizable to e-commerce, satisfaction surveys, and HR.

  • Strategic benchmark: rigorous comparison between a zero-shot LLM approach (Ollama) and fine-tuning a CamemBERT-Large transformer model (110M parameters) with 4 independent classification heads. More than 24 experiment iterations to reach the optimal configuration.
  • GPU optimization: reproducible training pipeline (PyTorch Lightning) integrating Mixed Precision (FP16, 2–3× speedup), Gradient Checkpointing (−40% GPU memory), Gradient Accumulation (effective batch size 128), and Discriminative Learning Rates. Result: 86% macro-accuracy (87.8% on service, 87.2% on cuisine).
  • Production-ready: modular architecture, centralized configuration, and automatic logging of metrics, designed for industrial deployment via REST API.
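Gradient accumulation reaches an effective batch of 128 by summing gradients over several micro-batches before each optimizer step; with a mean loss, each micro-batch gradient is scaled by 1/steps. The sketch checks this equivalence on a toy one-parameter least-squares model in plain Python. The 16 × 8 split is an assumption, since only the effective batch size is given; in PyTorch Lightning the behavior is enabled through the Trainer's `accumulate_grad_batches` argument.

```python
import random

random.seed(0)
# Toy data for y ≈ 3x; model y_hat = w * x with squared-error loss.
data = [(x, 3.0 * x + random.gauss(0, 0.1)) for x in (random.random() for _ in range(128))]


def grad_mean_loss(w: float, batch: list[tuple[float, float]]) -> float:
    """d/dw of mean((w*x - y)^2) over the batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


w = 0.5
MICRO_BATCH, ACCUM_STEPS = 16, 8  # hypothetical split: 16 * 8 = 128

# Gradient of one full batch of 128 examples.
full = grad_mean_loss(w, data)

# Accumulated gradient: micro-batch gradients, each scaled by 1 / ACCUM_STEPS.
accumulated = 0.0
for step in range(ACCUM_STEPS):
    micro = data[step * MICRO_BATCH:(step + 1) * MICRO_BATCH]
    accumulated += grad_mean_loss(w, micro) / ACCUM_STEPS

print(abs(full - accumulated) < 1e-12)  # True
```

Only one micro-batch of activations lives in GPU memory at a time, which is why accumulation combines well with gradient checkpointing.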

Stack: Python · PyTorch Lightning · Hugging Face Transformers · Ollama · scikit-learn

SunCast

Nov 2025

HPC simulation engine computing high-resolution sunrise and sunset times that account for shadows cast by real topography (Copernicus DEM).

  • High-performance computing: massively parallel C++17 calculator (OpenMP, 96 cores per task) processing ~2 million pixels × 365 days per department. Binary streaming from C++ to Python (no intermediate files). Deployment on an HPC cluster via Slurm job arrays.
  • Use cases: actionable data for solar panel placement optimization in mountains, glaciological studies, precision agriculture, and urban planning.
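Terrain shading reduces to a horizon test: looking along the sun's azimuth, a cell is lit only if the solar elevation exceeds the steepest angle to any terrain ahead. The 1-D profile sketch below shows that test; the profile, 25 m spacing, and sun elevations are illustrative, and a full implementation would also need azimuth sampling over the DEM grid and the solar-position computation.

```python
import math


def horizon_angle_deg(profile: list[float], spacing_m: float) -> float:
    """Max elevation angle (degrees) from profile[0] to any later terrain point.

    `profile` holds terrain heights (m) sampled every `spacing_m` metres
    along the direction of the sun, starting at the observer's cell.
    """
    observer = profile[0]
    best = -90.0
    for i, height in enumerate(profile[1:], start=1):
        angle = math.degrees(math.atan2(height - observer, i * spacing_m))
        best = max(best, angle)
    return best


def is_sunlit(profile: list[float], spacing_m: float, sun_elevation_deg: float) -> bool:
    """The cell is lit when the sun is above the local terrain horizon."""
    return sun_elevation_deg > horizon_angle_deg(profile, spacing_m)


# Hypothetical valley-floor cell with a ridge 200 m higher, 500 m away (25 m cells).
profile = [1000.0] * 20 + [1200.0]
print(horizon_angle_deg(profile, 25.0))  # ≈ 21.8 degrees
print(is_sunlit(profile, 25.0, 15.0), is_sunlit(profile, 25.0, 30.0))  # False True
```

The topographic sunrise for a cell is then the first time of day at which the sun's elevation exceeds the horizon angle in the sun's current direction.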

Stack: C++17 · OpenMP · Python · NumPy · GDAL · Slurm HPC · Parquet

Nivéo

Oct 2025 – Nov 2025

Fully autonomous, serverless data infrastructure for real-time snow monitoring in France, built on public Météo-France APIs (DPClim).

  • Full automation & DevSecOps: daily ingestion and visualization pipeline running without human intervention. Hardened security: AWS authentication via OIDC (no long-lived keys), encrypted secrets, least-privilege IAM permissions.
  • Infrastructure as Code (IaC): Terraform provisioning of AWS resources (Lambda, DynamoDB with an 11-day TTL). Zero-cost stack (permanent Free Tier) with automatic cleanup and exports via GitHub Actions.
  • Lightweight frontend: static site (Astro, MapLibre GL JS, Chart.js) automatically redeployed to GitHub Pages on every data update.
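DynamoDB's TTL feature reads a Number attribute holding a Unix epoch timestamp in seconds and deletes expired items in the background, not at the exact expiry instant. A sketch of how an ingestion function could stamp each record with an 11-day expiry; the attribute name `expires_at` and the record fields are hypothetical, and the actual table write is omitted.

```python
import time
from typing import Optional

TTL_DAYS = 11
TTL_ATTRIBUTE = "expires_at"  # hypothetical; must match the table's TTL setting


def with_ttl(item: dict, now: Optional[float] = None) -> dict:
    """Return a copy of `item` carrying an epoch-seconds expiry TTL_DAYS ahead."""
    now = time.time() if now is None else now
    stamped = dict(item)
    stamped[TTL_ATTRIBUTE] = int(now) + TTL_DAYS * 24 * 3600
    return stamped


# Hypothetical record shape.
record = {"station_id": "38185001", "date": "2025-11-01", "snow_depth_cm": 35}
print(with_ttl(record, now=1_700_000_000))
```

Letting the database expire old rows removes the need for a separate cleanup job.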

Stack: Python · AWS (DynamoDB, Lambda) · Terraform · GitHub Actions · Astro · MapLibre GL JS · Chart.js

MissingDataLab

Oct 2024 – Feb 2025

Comparative methodological study of missing-data imputation strategies under MCAR, MAR, and MNAR mechanisms, assessing their impact on predictive model robustness in real-world contexts.

  • Rigorous benchmarking: systematic evaluation of 7 methods (mean, median, KNN, SoftImpute, PCA, ICE, MissForest) on simulated scenarios, showing that MissForest and ICE best preserve statistical integrity.
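The three missingness mechanisms differ in what the probability of a value being missing depends on. The sketch simulates them on a toy variable y with a fully observed covariate x and measures how far the mean of the remaining observed values drifts from the true mean (that observed mean is also what mean imputation fills in). The logistic masks and their coefficients are arbitrary illustrative choices, not the study's simulation design.

```python
import math
import random
import statistics

random.seed(42)
N = 10_000
x = [random.gauss(0, 1) for _ in range(N)]
y = [2.0 * xi + random.gauss(0, 1) for xi in x]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


# Probability that y_i is missing under each mechanism (coefficients arbitrary).
mechanisms = {
    "MCAR": lambda xi, yi: 0.3,                      # independent of the data
    "MAR": lambda xi, yi: sigmoid(-1.0 + 1.5 * xi),   # depends on observed x only
    "MNAR": lambda xi, yi: sigmoid(-1.0 + 1.5 * yi),  # depends on y itself
}


def mask_and_impute(p_missing) -> tuple[float, list[float]]:
    """Drop values of y per the mechanism, then mean-impute them.

    Returns the mean of the observed values and the imputed series.
    """
    observed = [yi if random.random() >= p_missing(xi, yi) else None
                for xi, yi in zip(x, y)]
    kept = [v for v in observed if v is not None]
    fill = statistics.mean(kept)
    return fill, [fill if v is None else v for v in observed]


true_mean = statistics.mean(y)
for name, p in mechanisms.items():
    obs_mean, imputed = mask_and_impute(p)
    print(f"{name}: bias of observed mean = {obs_mean - true_mean:+.3f}")
```

Mean imputation keeps the mean unbiased only under MCAR (and shrinks variance even then). Methods that exploit relations between variables, such as ICE or MissForest, can correct MAR bias, while MNAR generally requires modelling the missingness itself.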

Stack: Python · scikit-learn · SciPy · pandas · NumPy · R (simstudy)

BioResistanceAI

Jan 2025

Predictive AI for antibiotic resistance: prediction of bacterial resistance to 5 antibiotics from multi-omic data (414 bacteria, 94,000+ features: 72,236 SNPs, 16,005 genes, 6,026 gene expression features).

  • Predictive performance: recall of 0.96 (tobramycin, XGBoost) and stable performance across all antibiotics through optimization of ensemble models (XGBoost, LightGBM) with systematic hyperparameter tuning (GridSearchCV, 5-fold cross-validation).
  • Explainable AI & feature importance: identification of the most predictive information sources per antibiotic (transcriptomics dominates in the best models) to guide rapid diagnostic strategies in clinical settings.
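Ranking information sources rather than individual features can be done by summing per-feature importances within each omic block. A sketch under assumptions: feature names carry a hypothetical prefix (`snp_`, `gene_`, `expr_`) identifying their source, and importances come from any fitted model (for example a tree ensemble's `feature_importances_`); the values below are made up.

```python
from collections import defaultdict

# Hypothetical prefix -> omic source mapping.
SOURCES = {"snp_": "SNPs", "gene_": "genes", "expr_": "transcriptomics"}


def importance_by_source(importances: dict[str, float]) -> dict[str, float]:
    """Sum feature importances per source and normalize them to shares."""
    totals: dict[str, float] = defaultdict(float)
    for feature, value in importances.items():
        source = next((name for prefix, name in SOURCES.items()
                       if feature.startswith(prefix)), "other")
        totals[source] += value
    grand_total = sum(totals.values()) or 1.0
    return {s: v / grand_total
            for s, v in sorted(totals.items(), key=lambda kv: -kv[1])}


# Made-up importances for illustration only.
example = {"snp_001": 0.05, "snp_002": 0.05, "gene_abc": 0.10,
           "expr_x": 0.50, "expr_y": 0.30}
print(importance_by_source(example))
```

Summing importances favors blocks with many features, so per-block shares are best read alongside block sizes or a grouped permutation importance.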

Stack: Python · scikit-learn · XGBoost · LightGBM · PyTorch · pandas · NumPy

SnowTrack

Dec 2024 – Feb 2025

Orchestration of automated validation pipelines for climate forecast model evaluation (S2M vs field observations from Météo-France).

  • ETL automation & spatio-temporal analysis: automated flows for data ingestion, processing, and aggregation, generating detailed statistics (means, maxima, distributions, trends) and identifying zones and periods with significant discrepancies between models and observations.
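Discrepancy screening of this kind can be sketched as computing the mean bias (model minus observation) per zone and period, then flagging groups whose absolute bias exceeds a threshold. The zone names, the 10 cm threshold, and the records are hypothetical.

```python
from collections import defaultdict
from statistics import mean

THRESHOLD_CM = 10.0  # hypothetical tolerance on snow-depth bias


def flag_discrepancies(
    records: list[tuple[str, str, float, float]],
) -> dict[tuple[str, str], float]:
    """Group (zone, month, modelled_cm, observed_cm) records and return the
    mean bias of every (zone, month) group whose |bias| exceeds THRESHOLD_CM."""
    groups: dict[tuple[str, str], list[float]] = defaultdict(list)
    for zone, month, modelled, observed in records:
        groups[(zone, month)].append(modelled - observed)
    biases = {key: mean(diffs) for key, diffs in groups.items()}
    return {key: b for key, b in biases.items() if abs(b) > THRESHOLD_CM}


# Hypothetical records.
records = [
    ("Chartreuse", "2024-01", 80.0, 75.0),
    ("Chartreuse", "2024-01", 90.0, 82.0),
    ("Oisans", "2024-01", 120.0, 95.0),
    ("Oisans", "2024-01", 110.0, 98.0),
]
print(flag_discrepancies(records))  # {('Oisans', '2024-01'): 18.5}
```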

Stack: Python · pandas · NumPy

DevOps — Portfolio {NCS}decoopman

2024 – Present

Automated publication of the personal website, CV, and analytical reports (Quarto) through a reproducible pipeline.

CI/CD & publication

  • Complete automation of site and analytical report publication.
  • Every commit triggers regeneration of the site (Astro) and reports (Quarto), which are built and uploaded via GitHub Actions.
  • CI/CD ensuring consistency, traceability, and editorial time savings.

Stack: Astro · TypeScript · Quarto · GitHub Actions

Honors and awards

Laureate of the Académie Vétérinaire de France

Académie Vétérinaire de France (AVF) · Nov 2020

Thesis Award

Nantes National Veterinary School (Oniris) · Sept 2019

Technical skills

Data engineering: dbt, ETL/ELT, Airflow, REST API, orchestration, data quality, analytical modeling
Data science & ML: scikit-learn, XGBoost, LightGBM, PyTorch, statsmodels, SciPy, GEV, LMM, Hugging Face, Ollama
Cloud & infrastructure: AWS (DynamoDB, Lambda), Terraform, Cloudflare Workers, OIDC, IAM
Containerization & CI/CD: Docker & Compose, GitHub Actions (lint, test, build, security scan, deployment), Trivy, Grype
Languages: Python, R, C++, SQL, Shell/Bash
Databases: PostgreSQL, DuckDB, SQLite (relational); DynamoDB (NoSQL); S3 (object)
Data formats: Parquet, Zarr, NetCDF, GeoJSON, CSV, JSON
HPC: Dask, OpenMP, Slurm, ProcessPoolExecutor
GIS / Geospatial: GDAL, Tippecanoe, PMTiles, MapLibre GL JS, Rasterio, GeoPandas
Visualization & BI: Apache Superset, Streamlit, Plotly, Matplotlib, Seaborn, Chart.js, Quarto
Systems: Linux (HPC, VM servers), Windows
English: Professional (scientific publication in English)


Education

Université Grenoble Alpes

Master's degree in statistics and data science (SSD) — Core AI Label by MIAI · Sept 2024 – Aug 2025

Grade: Mention bien (With honors)

  • Statistics: statistical tests, statistical estimation (method of moments, maximum likelihood), regression (linear, logistic), GLM, computational statistics (bootstrap, permutation), high-dimensional statistics (FWER, FDR, Lasso and Ridge regularization), biostatistics (mixed models, survival models)
  • Data exploration: numerical (descriptive analysis), text mining (DL, Word2Vec, NLP, BERT, Hugging Face Transformers), spatial (kriging)
  • Modeling: Bayesian (Monte Carlo), sampling, time series (ARIMA, GARCH)
  • Machine learning: supervised learning (classification and regression: K-NN, SVM, Random Forests), unsupervised learning (K-means clustering, PCA for dimensionality reduction), deep learning (CNN, RNN, and transformers)
  • Programming and optimization (solvers):
    • R: boot, lme4, randomForest, xgboost, ggplot2, RMarkdown
    • Python: NumPy, pandas, SciPy, scikit-learn, TensorFlow, Keras, matplotlib, CVXPY

Nantes National Veterinary School

State diploma of Doctor of Veterinary Medicine (DVM) · Sept 2014 – Sept 2019

Grade: Mention très honorable avec félicitations du jury (Highest honors)

Scientific and clinical training recognized at the European level (ESEVT), covering all areas of animal medicine and surgery, veterinary public health, and animal production.

Development of transferable skills in diagnosis, methodological rigor, clinical data management, technical communication, and interdisciplinary coordination.

Final specialization in production medicine and data science with an experimental research approach focused on biological data modeling and analysis.