Computer Science

  • Machine Learning
  • High Performance Computing

Remote sensing

  • GIS
  • Google Earth Engine

Geostatistics

  • Multiple Point Statistics
  • Personal Projects
  • Post-Doc
  • PhD
  • Me

I am currently an Assistant Professor in Geo-Environmental Data Science at Utrecht University, following my postdoc at Stanford and my PhD in Lausanne. My research focuses on geostatistics (in particular, simulations of complex structures using Multiple Point Statistics (MPS) and Machine Learning), remote sensing (mainly using Google Earth Engine) and High Performance Computing (HPC). I have a broad interest in many fields and scientific questions, and I put particular emphasis on the use of new technologies to enhance geoscience studies.

I currently focus my research on applying Machine Learning frameworks as tools to derive other information, such as calibrations. Furthermore, I have a number of ongoing collaborations on different subjects such as remote sensing, geostatistics and bird tracking.

Picture Mathieu Garvey

Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.

H. G. Wells

Machine Learning

My research using Machine Learning mainly focuses on geoscience and on how to use ML frameworks in unconventional contexts.

Geostatistics

My research in geostatistics mainly focuses on statistical simulations. More precisely, I develop new algorithms in Multiple Point Statistics (MPS) and Machine Learning to generate complex structures.

Remote sensing

Remote sensing has had a huge impact on my career and remains important to me. Currently, most of my research in remote sensing is done through collaborations on different projects, ranging from vegetation evolution (Vietnam and Valais), growing season and frozen lakes to NPP (net primary productivity) evolution in the ocean. Most of these studies are done on online platforms such as Google Earth Engine, which enable studies over a gigantic amount of data.

Research philosophy

I put particular importance on investigating cutting-edge solutions, from both a technological and an algorithmic point of view. Most applications of these solutions are done through collaborations.

Furthermore, I put particular effort and time into making the results of various studies available to the community through functional, open and easy-to-use software and libraries.

Projects


QuickSampling: an efficient and robust MPS approach

Finished Multiple-point statistics Maintained

The main output of this project is QuickSampling (QS), a training image-based simulation tool that is free, open and can be further developed. It uses a Direct Sampling approach enhanced with FFT for speed and with ranking (as opposed to a threshold), which tends to be less sensitive. All this results in a more efficient and robust algorithm. The tool is available through G2S. The current implementation handles continuous, categorical, multivariate and gapped datasets.
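
To make the FFT and ranking ideas concrete, here is a minimal Python sketch. It is not the G2S implementation: it assumes a single continuous 2D training image, a squared-error mismatch, and uniform sampling among the k best candidates. The names `mismatch_map` and `sample_k_best` are hypothetical.

```python
import numpy as np


def mismatch_map(ti, pattern, mask):
    """Squared error between `pattern` and every window of `ti`, computed via FFT.

    Windows wrap around the image borders (circular correlation).
    Only cells where `mask` is True (informed neighbours) contribute.
    """
    def correlate(image, kernel):
        # Zero-pad the kernel to the image size, then correlate in Fourier space.
        padded = np.zeros(image.shape)
        padded[:kernel.shape[0], :kernel.shape[1]] = kernel
        spectrum = np.fft.fft2(image) * np.conj(np.fft.fft2(padded))
        return np.real(np.fft.ifft2(spectrum))

    weights = mask.astype(float)
    values = np.where(mask, pattern, 0.0)
    # sum w*(ti - p)^2 = sum w*ti^2 - 2*sum w*p*ti + sum w*p^2
    return (correlate(ti ** 2, weights)
            - 2.0 * correlate(ti, values)
            + np.sum(values ** 2))


def sample_k_best(ti, pattern, mask, k, rng):
    """Pick uniformly one of the k best-matching windows (top-left corner)."""
    rows = ti.shape[0] - pattern.shape[0] + 1
    cols = ti.shape[1] - pattern.shape[1] + 1
    errors = mismatch_map(ti, pattern, mask)[:rows, :cols]  # drop wrapped windows
    best = np.argpartition(errors.ravel(), k - 1)[:k]       # ranking, no threshold
    row, col = np.unravel_index(rng.choice(best), errors.shape)
    return int(row), int(col)
```

In this sketch the only tuning parameter is k, the number of ranked candidates to sample from, rather than an acceptance threshold on the mismatch value.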

Satellite image colorization

Finished Multiple-point statistics Remote-sensing

The main output of this project is an algorithm - Narrow Distribution Selection (NDS) - to spectrally enhance remotely sensed satellite images. This algorithm uses a similar approach to QuickSampling to automatically and statistically colorize the image. It uses a pair of images with a high and low spectral resolution in addition to the image to enhance. The algorithm is free, open, modifiable and available through the G2S toolset.

Auto QS: Training image based automatic calibration of direct-pixel MPS algorithm under low verbatim hypothesis

Ongoing Multiple-point statistics Maintained

The main goal of this approach is to propose an alternative way to calibrate algorithms such as QuickSampling or Direct Sampling. Based on the analysis of the training image and the possibility of reproducing patterns, the algorithm provides an optimal and evolving calibration that reduces verbatim copy as much as possible. This approach does NOT rely on a complex objective function, and is therefore much more versatile.

Effect of resolution change on remote sensed analysis

Ongoing Remote-sensing

This study explores the effects and the errors introduced by the change of spatial resolution in remote sensing applications.

FastDS

Ongoing Collaboration Multiple-point statistics Side project

FastDS enhances Direct Sampling by taking advantage of the first few pixels to remove, from the list of potential candidates, all patterns that trivially would not match. Going from random sampling to predicted potential candidates is comparable to going from a naive and uniform rejection sampling to an optimal adaptive rejection sampling. This approach provides a significant boost compared to traditional DS, without introducing measurable bias.

Snow-Vegetation trend in the Alps

Finished Collaboration Remote-sensing Side project

Studying the evolution of snow and vegetation over the last 35+ years.

Cheetah

Ongoing Multiple-point statistics Side project

Cheetah evolved from Impala and SNESIM. The idea is to encode a k-class n-point pattern in k n-bit numbers. The hope is that the method will work nicely, even with just a minor performance improvement. Mathematically, the 3 algorithms are equivalent and therefore provide similar simulations. Currently, I am investigating whether the use of FPGAs can bring a true breakthrough in performance and power consumption.
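
As an illustration of the encoding only, the following Python sketch packs a k-class n-point pattern into k integers of n bits each, one bit mask per class. The compatibility test with a bitwise AND is an assumption about how such masks could be compared, not a description of Cheetah's internals. `encode_pattern` and `is_compatible` are hypothetical names.

```python
def encode_pattern(pattern, k):
    """Encode a pattern as k bit masks.

    `pattern` is a sequence of n class ids in 0..k-1, with None for
    points that are not informed yet. Bit i of masks[c] is set when
    point i belongs to class c.
    """
    masks = [0] * k
    for position, class_id in enumerate(pattern):
        if class_id is not None:
            masks[class_id] |= 1 << position
    return masks


def is_compatible(query_masks, candidate_masks):
    """True when every informed point of the query has the same class in the candidate."""
    return all((q & c) == q for q, c in zip(query_masks, candidate_masks))


# A partially informed 5-point neighbourhood with 3 classes.
query = encode_pattern([0, 1, None, None, 2], k=3)
candidate = encode_pattern([0, 1, 1, 0, 2], k=3)
match = is_compatible(query, candidate)
```

With this layout, comparing two patterns costs k bitwise operations regardless of n, as long as n fits in a machine word.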

MAZAlib

Ongoing Collaboration Side project

MAZAlib is a project to provide a tool - algorithm implementation and interface - to easily do segmentation of porous media scans.

Tracking of vegetation evolution (growth) in Vietnam

Ongoing Collaboration Remote-sensing Side project

The goal of this project is to track the type and the growth of the vegetation in Vietnam.

Glacier tracking

Finished Collaboration Remote-sensing Side project

The project produced a high-frequency and high-resolution image time series of the Gornergletscher (in the Swiss Alps) derived from repeated UAV surveys. I implemented the tracking algorithm.

Open Earth Engine Library

Finished Remote-sensing Side project Maintained

The Open Earth Engine Library (aka. OEEL) is part of the open-geocomputing initiative. The goal is to provide Google Earth Engine (GEE) users with free and open algorithms.

Tracking bird migration using pressure

Ongoing Collaboration

This project relies on pressure instead of light to determine the position of birds during migration.

Code and Software


G2S: The GeoStatistical Server

Multiple-point statistics Maintained

The GeoStatistical Server (G2S) is a framework that allows you to use state-of-the-art Multiple Point Statistics (MPS) algorithms to run stochastic simulations. G2S is designed to run simulations in a generic way, independently of the code used or the language it is written in. For example, it lets you run a C/C++ simulation code from Python, or Python code from MATLAB (or any other combination). It includes QuickSampling, Narrow Distribution Selection (NDS) and autoQS. Furthermore, it can easily be extended to handle any grid-based simulation algorithm.

Open Earth Engine Library

Remote-sensing Maintained

The Open Earth Engine Library (aka. OEEL) is part of the open-geocomputing initiative. The goal is to provide Google Earth Engine (GEE) users with free and open algorithms.

Open Earth Engine extension

Remote-sensing Maintained

The Open Earth Engine extension (aka. OEEex) is part of the open-geocomputing initiative. The goal is to provide Google Earth Engine (GEE) users with a dedicated Chrome extension to enhance their experience.

Integration effect on gender ratio

A quick and interactive study to evaluate the time needed to correct gender bias as a function of the duration of a career. The code here includes an example with an academic career, but the equations are general and can be directly used in any field.

Useful micro toolset for statistical metric

Multiple-point statistics

A few functions to compute n-D variograms and 2D cumulants in Matlab.

MexInterrupt

An example of how to interrupt C/C++ code in a Matlab-Mex file.

Random Kmin/Kmax

A header-only library to find the k smallest/largest values (and their indices) in an unsorted array. These functions are unbiased: in case of multiple positions with the same value, all are recorded if they fit within k, otherwise the positions are sampled accordingly. These functions were designed for k <<< N (the size of the array). Furthermore, the functions were implemented with intrinsic functions to get optimal performance (similar to searching for extremes). A Matlab wrapper is provided in this repository.
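
The tie-handling rule can be illustrated with a short Python sketch. This is not the library's code (which is C/C++ with intrinsics); it only shows the behaviour described above, under the assumption that tied positions are drawn uniformly at random. `k_smallest_unbiased` is a hypothetical name.

```python
import heapq
import random


def k_smallest_unbiased(values, k, rng=random):
    """Return the indices of the k smallest values.

    When several positions tie at the k-th smallest value, the remaining
    slots are filled by sampling uniformly among the tied positions, so
    no position is favoured because of where it sits in the array.
    """
    k = min(k, len(values))
    if k <= 0:
        return []
    threshold = heapq.nsmallest(k, values)[-1]  # k-th smallest value
    below = [i for i, v in enumerate(values) if v < threshold]
    ties = [i for i, v in enumerate(values) if v == threshold]
    return below + rng.sample(ties, k - len(below))


values = [5, 1, 3, 1, 3, 3, 9]
indices = k_smallest_unbiased(values, 3, random.Random(0))
```

Here positions 1 and 3 (value 1) are always returned, and the third slot is drawn among positions 2, 4 and 5 (value 3). A deterministic selection would always return the same tied position instead.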

2D Dynamic Warping

Implementation in C/C++ of 2DDW. Check the original publication for more information.
I didn't develop the method; I only implemented a C/C++ version.

pyDev4G2S

A module to develop new MPS algorithms in Python. It takes advantage of G2S to allow remote connection and interfacing with other languages such as Matlab.

G2S for QGIS

Multiple-point statistics Remote-sensing

A QGIS interface for G2S. It allows running remote sensing stochastic simulations directly in QGIS.

PyQS-PyDS

Multiple-point statistics

This is simple Python code to grasp how Direct Sampling and QuickSampling work. It was intentionally reduced to its essentials: the goal is to show the key components, not performance. This code is several orders of magnitude slower than the C/C++ implementations.

Matching Map Maker

Remote-sensing

Performs pattern matching between big images (50k x 50k). Extremely useful for following slow-moving objects that can deform over time, such as glaciers.

Fast Gaussian Simulation

Fast Gaussian Simulation (FGS) is a Matlab function that generates multiple n-D Gaussian fields very quickly. It uses the Fast Fourier Transform (FFT). It removes the edge effect for long ranges, but introduces a micro bias for shorter ranges (only if the range outside of the simulation size is used).
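
For readers unfamiliar with FFT-based simulation, the general principle can be sketched in Python as follows. This is a textbook spectral approach and not a port of FGS: it assumes a Gaussian covariance model with unit variance, produces a periodic field, and does not include the edge-effect handling mentioned above. `fft_gaussian_field` and `corr_range` are hypothetical names.

```python
import numpy as np


def fft_gaussian_field(shape, corr_range, rng):
    """Simulate one n-D periodic Gaussian field by filtering white noise."""
    # Periodic lag (in cells) along each axis, then squared distance to the origin.
    axes = [np.minimum(np.arange(n), n - np.arange(n)) for n in shape]
    grids = np.meshgrid(*axes, indexing="ij")
    dist2 = sum(g.astype(float) ** 2 for g in grids)

    covariance = np.exp(-dist2 / corr_range ** 2)   # assumed Gaussian model
    spectrum = np.real(np.fft.fftn(covariance))
    spectrum = np.clip(spectrum, 0.0, None)         # remove round-off negatives

    noise = rng.standard_normal(shape)
    # Multiplying by sqrt(spectrum) gives the noise the target covariance.
    return np.real(np.fft.ifftn(np.sqrt(spectrum) * np.fft.fftn(noise)))


field = fft_gaussian_field((128, 128), corr_range=5.0, rng=np.random.default_rng(1))
```

Each additional realization only costs one forward and one inverse FFT, since the spectrum can be reused, which is why this family of methods is fast for generating many fields.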

TIFF Training Images

Multiple-point statistics

A set of training images useful for MPS. Saved as TIFF, they can be loaded directly from GitHub in one or a few lines. They are therefore perfect for demonstration code, where there is no need to send a separate training set.

GeoPressureAPI

Remote-sensing

A server that returns the probability of a bird's position based on pressure records, using Google Earth Engine to access the massive ERA-5 dataset.

The Berezina

Experiences

  • 2021
    to
    2025
  • Assistant Professor in Geo-Environmental Data Science at Utrecht University
  • Using Machine Learning frameworks to enhance geoscience.
  • Feb 2021
    to
    Aug 2022
    (INTR fall 2021)
  • Postdoctoral researcher, Department of Geological Sciences, Stanford University
  • Bridging the gap between machine learning and geostatistical simulations. As part of Jef Caers’ team, Center for Earth Resources Forecasting (SCERF)
  • Mar 2020
    to
    Jul 2020
  • Postdoctoral researcher at the Institute of Geography and Sustainability (IGD), UNIL
  • Forest regrowth in Vietnam: a remote sensing analysis from 1984 to the present. As part of Christian Kull’s team.
  • Jun 2016
    to
    Aug 2016
  • Intern at CERN OpenLab
  • Contributed to the GEANT V software project to carry out large-scale stochastic simulations of fundamental particles. Supervisors: Andrei Gheata and Maria Girone
  • Jun 2015
    to
    Dec 2015
  • Assistant researcher at Institute of Earth Surface Dynamics, UNIL
  • Development of a method to automatically complete gaps in point cloud datasets. Supervisor: Prof. Grégoire Mariéthoz
  • Mar 2014
    to
    Dec 2014
  • R&D in image processing at cpvrLab, Bern University of Applied Sciences
  • Development of an automatic camera to scan a manhole in 3D; also in charge of developing photogrammetric processes (e.g. image processing, point cloud generation, meshing and texturing). (Master project.) Supervisor: Prof. Hudritsch Marcus

Education

  • Jan 2016
    to
    Jan 2020
    Defense date 08.01.2020
  • PhD from Institute of Earth Surface Dynamics, University of Lausanne (UNIL)
  • Title: Multiple point geostatistical approaches to spectrally enhance satellite imagery
    Supervisor: Prof. Grégoire Mariéthoz
    Jun 2017: Multiple Point Statistic Simulations (3 days), University of Neuchatel, Switzerland.
    Oct 2016: Les méthodes de la géostatistique, (3 weeks) Ecole Des Mines de Paris, Fontainebleau, France.
    Mar 2016: An introduction to statistical reasoning and the practice of statistics in environmental sciences (4 days), University of Neuchatel, Switzerland.
  • Sep 2011
    to
    Jun 2014
  • Engineering degree from the École des Mines d’Alès
  • General engineering and computer science. (with an option in innovation)
  • Sep 2008
    to
    Jun 2011
  • Lycée Albert Schweitzer de Mulhouse, France
  • Preparatory class for entrance to Grandes Ecoles, MPSI/MP*: Advanced mathematics, physics and algorithmics.

Prizes and Awards

  • May 2022
  • Google®
  • Google Developer Expert in Earth Engine
  • Mar 2022
  • NVIDIA®
  • Academic Hardware Grant Program: A100
  • May 2021
  • Swiss National Science Foundation
  • Early Postdoc.Mobility
  • Aug 2019
  • International Association for Mathematical Geosciences
  • Student travel grant
  • Nov 2015
  • Intel®
  • Grand Prize Winner of the Intel® Modern Code Developer Challenge.

Skills

  • Programming

    Language Level
    Google Earth Engine (subset of JavaScript/Python) Official Developer Expert
    C / C++ / OpenMP Advanced expert
    MATLAB Advanced expert
    Python Advanced expert
    TensorFlow Advanced expert
    CUDA / OpenGL / CL Expert
    Shell / Bash Expert
    Emscripten Expert
    JavaScript / jQuery / HTML / CSS Expert
    Markdown Expert
    LaTeX Intermediate
    R Intermediate
    Maple Intermediate
    Java Intermediate
    SQL Intermediate
    OWL/SPARQL Intermediate
  • Interfacing languages

    MATLAB <==> Python
    Python <==> C/C++
    MATLAB <==> C/C++
  • Software

    Name
    Sublime Text
    Git
    Illustrator
    Office
    Cinema4D
    QGIS
    ...
  • Languages

    Language Level
    French +++
    English ++
    German +

GET IN TOUCH

I would love to hear from you!

So don't hesitate to drop me a line at

Projects for students


Multiple-Object-Simulation

Multiple-point statistics

Statistical simulations are extremely useful for obtaining realisations of random processes. When transfer functions (forward simulations) are used, realisations are required to compute accurate estimations. Over the last 30 years, Multiple Point Statistics (MPS) has changed the landscape of geostatistical simulations by providing realistic samples of complexly structured processes. Unfortunately, these methods are currently restricted to pixel-based (raster) simulation, while a number of situations are not gridded.
Other approaches, such as object-based simulation, can be used for non-pixel-based situations; however, they are limited to simple structures.
The goal of this master project is to experiment with developing a sequential simulation approach based on objects instead of pixels, while conserving the pattern matching process that is at the origin of the success of MPS.
Applications of this newly developed algorithm would mainly be in object-based class simulation.

Development of a joint segmentation-classification process

Remote-sensing

Object-based image analysis is a widely used methodology for extracting useful information from remote sensing imagery. Unlike pixel-based methods, which classify each pixel separately, it first creates groups of similar pixels (objects) and then tries to classify each group of pixels as a whole. The advantage is that classification can be done on the spectral properties of all pixels in the object, which provides more information for classification compared to pixel-based methods. Object-based image analysis has thus been widely used in a large number of domains, for instance for mapping the geomorphology of coastal or mountainous areas. In this project you will innovate object-based image analysis methods and test potentially improved methods by mapping geomorphology (other case studies could be defined depending on your interest). In particular, you will focus on integrating the two steps involved: 1) segmentation of the imagery into objects, 2) classification of objects. Recent work suggests that higher accuracy could be obtained if an algorithm integrates both steps.
This topic is about designing such an algorithm using image processing and Machine Learning methods, and testing it on a coastal area or, more generally, in a geomorphic application.

Simulation model with Google Earth Engine

Remote-sensing

Today, Google Earth Engine (GEE) and other cloud platforms have reached the point of being recognized as a standard in remote sensing analysis. However, while GEE allows worldwide-scale studies, it has thus far not been widely used for more complex analyses, in particular spatio-temporal forward simulations requiring iterative processes, which is the topic of this project.
Simulations run in the cloud on such a platform could take advantage of the colossal amount of data present (remotely sensed imagery, DEMs, climate models, weather reanalyses, hydrological models, …) without the need to transfer and store this data on another system usually used for such simulations.
The goal of this project is to study the possibilities (feasibility and performance) of using such a cloud platform to run simulations. Examples would range from the simple Conway's Game of Life and Lotka-Volterra scenarios to more advanced, real-case scenarios such as hydrological models. The study involves a comparison of simulation models implemented with GEE and other platforms regarding multiple aspects, including ease of implementation, run time and coupling opportunities.
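
For reference, the simplest scenario mentioned above, Conway's Game of Life, reduces to the iterative update below. This is a plain NumPy sketch with periodic borders, not Earth Engine code; how to express this kind of loop efficiently on a cloud platform is precisely what the project would investigate. `life_step` is a hypothetical name.

```python
import numpy as np


def life_step(grid):
    """Advance a 0/1 integer grid by one Game of Life generation (periodic borders)."""
    # Count the 8 neighbours of every cell by summing shifted copies of the grid.
    neighbours = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # A cell is alive next step with exactly 3 neighbours,
    # or if it is alive now and has exactly 2.
    alive = (neighbours == 3) | ((grid == 1) & (neighbours == 2))
    return alive.astype(grid.dtype)


# A "blinker": three cells in a row that oscillate with period 2.
grid = np.zeros((5, 5), dtype=int)
grid[2, 1:4] = 1
state = grid
for _ in range(2):
    state = life_step(state)  # each step depends on the previous one
```

The dependency of each step on the previous state is what makes such forward simulations iterative.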

Detection of frozen lakes using Sentinel 1 data

Machine Learning Remote-sensing

The state of lakes (frozen/unfrozen) is critical for multiple processes. One of these key processes is evaporation, which is directly related to it. To quantify the impact of climate change on freezing lakes, we need to be able to determine the periods during which lakes are frozen. This task can be solved using passive optical sensing; however, such sensing remains limited due to high cloud cover during freezing periods.
In this project we propose to use Sentinel 1 (SAR) data, which is known to sense through cloud cover. Using SAR data to detect lake status raises various challenges, such as how to handle Sun reflection on the water surface or waves on the water.
The project will focus on overcoming these challenges by solving the Sun position geometric equation and using ML to properly detect the status of lakes.

Study and compute standard transformation between satellite sensors

Remote-sensing

Nowadays, more satellites are observing the Earth than ever before, each with a different sensor with its own characteristics. Different spectral alignment methods exist (from simple equations to advanced Machine Learning techniques), each with its own pros and cons. A common approach is to approximate the bias using a simple linear regression over enough data points. This can be applied to the spectral bands themselves or to derived products (e.g., NDVI, EVI, and other indices). Although an extensive literature exists about such transformations, these studies are restricted to recent sensors (mainly NASA Landsat 7-8 and ESA Sentinel 2) and assume a universal transformation.
This project focuses on the idea that such transformations should be determined for archive imagery too (Landsat TM and MSS at least). Today especially, these early data are crucial for studying the Earth's evolution, and therefore need to be comparable to data acquired today. Furthermore, the spatial variability of such transformations should be evaluated, in particular with regard to land cover. In fact, the overabundance of particular land covers tends to bias the computed correction.
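
The linear-regression alignment described above can be sketched in a few lines of Python. The reflectance values below are synthetic and purely illustrative (they are not measured coefficients for any sensor pair), and `fit_band_transform` is a hypothetical helper.

```python
import numpy as np


def fit_band_transform(source, target):
    """Least-squares fit of target ~ slope * source + intercept."""
    slope, intercept = np.polyfit(source, target, deg=1)
    return float(slope), float(intercept)


# Synthetic co-located samples of one band seen by two sensors (made-up numbers).
rng = np.random.default_rng(42)
old_sensor = rng.uniform(0.0, 0.5, size=1000)
new_sensor = 0.95 * old_sensor + 0.01 + rng.normal(0.0, 0.005, size=1000)

slope, intercept = fit_band_transform(old_sensor, new_sensor)
harmonised = slope * old_sensor + intercept  # old data expressed in the new sensor's scale
```

Fitting the same function separately per land-cover class or per region, and comparing the resulting coefficients, would be one simple way to probe the spatial variability mentioned above.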

Create (and implement) an efficient random path generation (on distributed memory)

Multiple-point statistics

A random path is assumed to be an optimal path for sequential geostatistical simulations, due to the absence of any hypothesis on the distribution of points. Such random paths are based on the generation of random permutation sequences.
The generation of a random permutation is straightforward using the Fisher–Yates shuffle algorithm. This algorithm is extremely efficient, but it is restricted exclusively to sequential processing. For parallelized computation, a fortiori on distributed memory systems, other algorithms need to be designed.
This thesis project will focus on developing, implementing and evaluating a distributed random (or quasi-random) path generation for geostatistical simulations.
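
For reference, the sequential baseline is the standard Fisher–Yates shuffle, sketched here in Python. The function name `random_path` and the flat indexing of grid cells are illustrative assumptions.

```python
import random


def random_path(n_cells, rng=random):
    """Return a uniformly random visiting order of cells 0..n_cells-1."""
    path = list(range(n_cells))
    # Walk from the last position down, swapping each entry with a
    # uniformly chosen entry at or before it.
    for i in range(n_cells - 1, 0, -1):
        j = rng.randint(0, i)  # inclusive bounds
        path[i], path[j] = path[j], path[i]
    return path


path = random_path(10, random.Random(0))
```

Each swap reads the array as left by all previous swaps, which is why the loop cannot be split naively across processes or nodes.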

Create metric(s) to estimate verbatim copy

Multiple-point statistics Machine Learning

Multiple-Point geoStatistics (MPS) is a class of advanced geostatistical simulation methods that rely on multiple points instead of pairs of values (kriging). Such methods make it possible to reproduce complex spatial structures by transferring patterns present in the training dataset to the realization. One of the weaknesses of these approaches is that they often overfit to the training data. The main form of overfitting is called verbatim copy and consists in duplicating a contiguous part of the training dataset (copy of complete patches).
Visual validation is the main strategy to estimate verbatim copy. In fact, early algorithms did not provide the data needed for an automated computation of such a property. This project will focus on developing an “index” that quickly summarizes the proportion of verbatim copy present in a realization based on the algorithm outputs. Furthermore, using simple datasets, the intrinsic proportion of verbatim copy would be estimated and would serve as a reference point.

Fusion of high spatial resolution and high frequency model reanalysis for sea-wind

Machine Learning Remote-sensing

UERRA provides high-frequency wind reanalysis, but at a coarse spatial resolution; on the other hand, Sentinel 1 derived sea-wind estimation is at high resolution, but only sporadically available. High-resolution sea-wind estimation can be critical for optimal wind turbine positioning. Multiple geostatistical methods exist and can provide estimates at high spatiotemporal resolutions, but each relies on some strong hypothesis such as stationarity (patterns are spatially independent). In this project we propose to overcome this limitation by using a Machine Learning approach, in particular a locally weighted CNN, to create an optimal estimation for each point in space.