A Case Study of the AI4EBV Project
Machine learning and deep learning methods are widely used in Earth observation for both image classification and image segmentation. In practice, however, their use is often limited by the available computational resources, especially for large-scale, high-resolution imagery. Within the AI4EBV project (https://ai4ebv.eurac.edu/), funded by the Group on Earth Observations Biodiversity Observation Network (https://geobon.org/) and Microsoft, we assess the computational cost and accuracy of different machine learning and deep learning models for multispectral image segmentation. We integrate an open digital elevation model with multitemporal Earth observation data to derive general land cover at a spatial resolution of 30 m for the entire European Alps for the years 2016-2020. We compare the accuracy and cost of deep learning algorithms that learn their own features (e.g., deep convolutional neural networks) with traditional feature-based machine learning algorithms (e.g., random forests, support vector machines). The algorithms are written in Python 3 and build on existing open-source frameworks for scalable data analytics (numpy, xarray, and dask) and machine learning (scikit-learn and PyTorch). They are optimized for large-scale image analysis and run in parallel on multiple CPUs and GPUs.
SFScon21 - Daniel Frisinghelli - The Cost of Traditional Machine Learning and Deep Learning Models in Earth Observation
1. The Cost of Traditional Machine Learning and Deep Learning
Models in Earth Observation
SFScon 2021 - Software Architects Track
Frisinghelli Daniel (1), Claus Michele (1), Jacob Alexander (1), Sayre Roger (2), Adler Carolina (3), Thornton James (3), Zebisch Marc (1) & Sonnenschein Ruth (1)
(1) Eurac Research, Bolzano, Italy
(2) United States Geological Survey, USA
(3) Mountain Research Initiative, Bern, Switzerland
November 12, 2021
2. Introduction Use case Implementation Results Contact
What is Earth Observation Data?
Frisinghelli et al. (2021): The Cost of Machine Learning in Earth Observation, SFScon 2021 1 / 12
3. Earth Observation is Big Data!
4. The AI4EBV Project
Using Artificial Intelligence to Downscale Ecosystem Related Essential Biodiversity Variables in Mountain Environments
Funded by: Group on Earth Observations Biodiversity Observation Network (GEO BON) and Microsoft
Partners:
Goal: Integrate terrain, climate, and land cover information to derive a high-resolution map of mountain ecosystem extent (Sayre et al., 2020)
ML use case: High-resolution land cover classification problem
5. Land Cover Classification
Supervised machine learning problems require a labelled dataset D = {X, y}.
Figure 1: The multispectral image defines the input data X (left) and the land cover classes define the labels y (right).
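As a minimal sketch of how a labelled dataset D = {X, y} can be laid out in memory, the example below builds a synthetic multispectral time series X and a label map y and reshapes them into per-pixel samples. The array sizes, the number of classes, and the random values are assumptions chosen for illustration; they are not project data.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical sizes, chosen only for illustration.
n_time, n_bands, height, width = 3, 6, 4, 5
n_classes = 4

# X: multispectral image time series, one value per
# (time step, band, row, column).
X = rng.random((n_time, n_bands, height, width), dtype=np.float32)

# y: one land cover class index per pixel, in [0, n_classes).
y = rng.integers(0, n_classes, size=(height, width))

# Pixel-wise view used by per-pixel classifiers: one sample per pixel,
# with all time steps and bands flattened into one feature vector.
X_samples = X.transpose(2, 3, 0, 1).reshape(height * width, n_time * n_bands)
y_samples = y.reshape(height * width)

print(X_samples.shape, y_samples.shape)
```

Each row of `X_samples` then pairs with one entry of `y_samples`, which is the layout most per-pixel classifiers expect.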
6. The Harmonized Landsat-8 Sentinel-2 Dataset
Spatial resolution: 30 m (Claverie et al., 2018)
Tile size: 109.8 km × 109.8 km; image size: ~0.3 GB at 32-bit float
Frequency of observations: 2-3 days (~200 images / year / tile)
~250 GB / year for the province of South Tyrol
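The quoted data volume can be checked with a short back-of-the-envelope calculation. The sketch below assumes that 4 tiles cover South Tyrol (the tile count given with the South Tyrol results) and that GB means 10^9 bytes; the number of bands is not stated on the slide, so it is only inferred from the image size.

```python
# Figures quoted on the slide.
tile_size_km = 109.8
resolution_m = 30.0
image_size_gb = 0.3        # approximate size of one image
images_per_year = 200      # approximate, per tile
bytes_per_value = 4        # 32-bit float

# Assumption: 4 tiles cover South Tyrol (tile count taken from the results).
n_tiles = 4

pixels_per_side = round(tile_size_km * 1000 / resolution_m)
bytes_per_band = pixels_per_side ** 2 * bytes_per_value

# Number of 32-bit bands implied by the quoted image size
# (assumes decimal gigabytes; not a figure from the slide).
implied_bands = image_size_gb * 1e9 / bytes_per_band

volume_gb_per_year = n_tiles * images_per_year * image_size_gb

print(f"pixels per tile side: {pixels_per_side}")
print(f"implied bands per image: {implied_bands:.1f}")
print(f"volume per year: {volume_gb_per_year:.0f} GB")
```

Under these assumptions the yearly volume comes out at roughly 240 GB, in line with the ~250 GB quoted for South Tyrol.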
7. Automatic Label Extraction: CORINE Land Cover
8. Automatic Label Extraction: Removal of Boundary Pixels
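The slide does not spell out how boundary pixels are identified. One common reading is sketched below: a pixel counts as a boundary pixel if any of its four direct neighbours carries a different class, and such pixels are replaced by a no-data value. The function name, the `NODATA` marker, and the 4-neighbourhood rule are assumptions for illustration, not the project's implementation.

```python
import numpy as np

NODATA = -1  # hypothetical marker for "no label"


def remove_boundary_pixels(labels: np.ndarray, nodata: int = NODATA) -> np.ndarray:
    """Mask pixels whose 4-neighbourhood contains a different class."""
    # Pad with edge values so the image border itself is not a boundary.
    padded = np.pad(labels, 1, mode="edge")
    center = padded[1:-1, 1:-1]
    boundary = (
        (center != padded[:-2, 1:-1])    # neighbour above
        | (center != padded[2:, 1:-1])   # neighbour below
        | (center != padded[1:-1, :-2])  # neighbour to the left
        | (center != padded[1:-1, 2:])   # neighbour to the right
    )
    cleaned = labels.copy()
    cleaned[boundary] = nodata
    return cleaned


# Two classes side by side: the two columns along the class edge are masked.
labels = np.array([
    [1, 1, 1, 2, 2, 2],
    [1, 1, 1, 2, 2, 2],
    [1, 1, 1, 2, 2, 2],
])
cleaned = remove_boundary_pixels(labels)
print(cleaned)
```

Masking these pixels trades label quantity for label purity, since mixed pixels along class edges are the ones most likely to be mislabelled in a coarser land cover product.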
9. Automatic Label Extraction: Outlier Removal
10. Machine Learning Classification Algorithms
Compared algorithms: Random Forest and Convolutional Neural Network
Random Forest icon: Created by Sachin Modgekar from the Noun Project
Figure: Convolutional Neural Network, feature map sizes: Input (C × t) → Conv (128 × t) → Conv (128 × t) → Conv (256 × t) → Conv (N × t) → Average (N × 1) → Softmax (N × 1)
Random Forest
Input: spectral-temporal features
Output: P(c), ∀c ∈ [1, . . . , N]
Trained on: CPU(s)
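For a feature-based classifier such as a random forest, the image time series first has to be collapsed into a fixed-length feature vector per pixel. The sketch below shows one way to do that with per-band temporal statistics. The choice of statistics (mean, standard deviation, minimum, maximum), the function name, and the synthetic input are assumptions; the slide does not say which spectral-temporal features the project uses.

```python
import numpy as np


def spectral_temporal_features(series: np.ndarray) -> np.ndarray:
    """Collapse a (time, band, row, col) series into per-pixel statistics.

    Returns an array of shape (row * col, 4 * band): for every band the
    temporal mean, standard deviation, minimum and maximum, ignoring NaNs
    (e.g. cloud-masked observations).
    """
    stats = [
        np.nanmean(series, axis=0),
        np.nanstd(series, axis=0),
        np.nanmin(series, axis=0),
        np.nanmax(series, axis=0),
    ]
    stacked = np.concatenate(stats, axis=0)             # (4 * band, row, col)
    n_features, rows, cols = stacked.shape
    return stacked.reshape(n_features, rows * cols).T   # (pixels, features)


# Synthetic example: 10 time steps, 6 bands, 4 x 5 pixels.
rng = np.random.default_rng(seed=0)
series = rng.random((10, 6, 4, 5))
series[0, :, 0, 0] = np.nan  # one masked observation for the first pixel

features = spectral_temporal_features(series)
print(features.shape)
```

The resulting matrix has one row per pixel and can be passed to any classifier that expects tabular input, independent of how many images were available for the tile.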
Convolutional Neural Network
Input: multispectral time series
Output: P(c), ∀c ∈ [1, . . . , N]
Trained on: GPU(s)
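The figure for the convolutional network shows an input of size C × t, convolution outputs with 128, 128, 256, and N channels, a temporal average, and a softmax. A minimal PyTorch sketch consistent with those sizes is given below. It assumes 1D convolutions along the time axis of each pixel; the kernel size, padding, ReLU activations, and the class name `TemporalCNN` are assumptions, not details taken from the project code.

```python
import torch
from torch import nn


class TemporalCNN(nn.Module):
    """Per-pixel 1D CNN over the time axis with layer widths 128, 128, 256, N.

    Kernel size, padding and ReLU activations are assumptions.
    """

    def __init__(self, n_bands: int, n_classes: int, kernel_size: int = 3):
        super().__init__()
        padding = kernel_size // 2  # keeps the number of time steps t
        self.features = nn.Sequential(
            nn.Conv1d(n_bands, 128, kernel_size, padding=padding),
            nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size, padding=padding),
            nn.ReLU(),
            nn.Conv1d(128, 256, kernel_size, padding=padding),
            nn.ReLU(),
        )
        # Final convolution maps to one channel per class: (N, t).
        self.classifier = nn.Conv1d(256, n_classes, kernel_size, padding=padding)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, C, t) -> scores: (batch, N, t)
        scores = self.classifier(self.features(x))
        # Average over time, then softmax over classes -> P(c): (batch, N)
        return torch.softmax(scores.mean(dim=-1), dim=-1)


model = TemporalCNN(n_bands=6, n_classes=10)
pixels = torch.rand(8, 6, 40)  # 8 pixels, 6 bands, 40 time steps
probs = model(pixels)
print(probs.shape)
```

Because the average is taken over the time axis, the same network accepts time series of different lengths, which matters when the number of usable images varies between tiles.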
11. Land Cover Classification: Workflow
Figure: Workflow diagram with the columns Input, Algorithm, and Output.
Training: The inputs are a digital elevation model, satellite data (30 m Harmonized Landsat-8 Sentinel-2 product), and a global or regional land cover product. Automatic label extraction, including the removal of boundary pixels and outliers, produces the labels. The algorithm is either machine learning (feature extraction, then classification) or deep learning (feature extraction and classification in one model). The output is a trained classifier.
Inference: The trained classifier is applied to the 30 m Harmonized Landsat-8 Sentinel-2 product to produce the classified land cover map.
12. Land Cover Classification: Implementation
13. What is the Cost of the Models for South Tyrol?
Random Forest: ~2.3 h (~$6.5)
Deep CNN: ~2 h (~$7.2)
50%
Input data: 4 tiles, 40-80 images / tile (Mar - Sep), ~48-96 GB
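The input volume follows directly from the tile count, the number of images per tile, and the ~0.3 GB image size given for the dataset. The sketch below reproduces that range and also derives the hourly rate implied by each reported runtime and cost. The hourly rates are derived values, not figures from the slide.

```python
image_size_gb = 0.3          # per image, from the dataset description
n_tiles = 4
images_per_tile = (40, 80)   # Mar - Sep range quoted on the slide

volume_gb = tuple(n_tiles * n * image_size_gb for n in images_per_tile)

# Reported runtime in hours and cost in dollars.
reported = {"Random Forest": (2.3, 6.5), "Deep CNN": (2.0, 7.2)}

# Implied hourly rate (derived here, not stated on the slide).
hourly_rate = {name: cost / hours for name, (hours, cost) in reported.items()}

print(f"input volume: {volume_gb[0]:.0f} - {volume_gb[1]:.0f} GB")
for name, rate in hourly_rate.items():
    print(f"{name}: {rate:.2f} $/h")
```

The deep CNN finishes slightly faster but at a higher implied hourly rate, which is consistent with the two models being trained on GPU(s) and CPU(s) respectively.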
14. What is the Cost of the Models for the European Alps?
Random Forest: ~25 h (~$70)
Deep CNN: ~22 h (~$80)
50%
Input data: 43 tiles, 40-80 images / tile (Mar - Sep), ~516-1032 GB
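The figures for the European Alps are close to what a linear extrapolation by tile count from the South Tyrol run (4 tiles) would give. The sketch below makes that comparison explicit. Linear scaling is an observation about the reported numbers, assumed here for illustration, not a property stated in the talk.

```python
south_tyrol_tiles = 4
alps_tiles = 43
scale = alps_tiles / south_tyrol_tiles

# South Tyrol figures: (hours, dollars).
south_tyrol = {"Random Forest": (2.3, 6.5), "Deep CNN": (2.0, 7.2)}
# Figures reported for the European Alps, for comparison.
alps_reported = {"Random Forest": (25, 70), "Deep CNN": (22, 80)}

# Linear extrapolation by tile count (an assumption, see above).
estimated = {
    name: (hours * scale, cost * scale)
    for name, (hours, cost) in south_tyrol.items()
}

for name, (est_hours, est_cost) in estimated.items():
    rep_hours, rep_cost = alps_reported[name]
    print(
        f"{name}: estimated {est_hours:.1f} h / {est_cost:.1f} $, "
        f"reported ~{rep_hours} h / ~{rep_cost} $"
    )
```

The estimates land within a few percent of the reported runtimes and costs, which suggests that both models scale roughly linearly with the number of tiles over this range.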
15. Thank you for your attention!
Contact: daniel.frisinghelli@eurac.edu, ruth.sonnenschein@eurac.edu
Website: https://ai4ebv.eurac.edu/
Code repositories:
AI4EBV PyTorch Training
Thanks to:
In collaboration with:
16. References
Claverie, M., J. Ju, J. G. Masek, J. L. Dungan, E. F. Vermote, J. C. Roger, S. V. Skakun, and C. Justice (2018): The Harmonized Landsat and Sentinel-2 surface reflectance data set. Remote Sensing of Environment, 219, October, 145-161, https://doi.org/10.1016/j.rse.2018.09.002.
Sayre, R. et al. (2020): An assessment of the representation of ecosystems in global protected areas using new maps of World Climate Regions and World Ecosystems. Global Ecology and Conservation, 21, https://doi.org/10.1016/j.gecco.2019.e00860.