Generating Training Data from Noisy
Measurements
HAMED ALEMOHAMMAD
LEAD GEOSPATIAL DATA SCIENTIST
ML Hub Earth
 Machine learning commons for Earth observation (EO)
 Training data
 Models
 Standards and best practices
Global Land Cover Training Dataset
 Human-verified training dataset
 Using open-source Sentinel-2 imagery
 10 m spatial resolution.
 Global and geo-diverse
Workflow
[Workflow diagram. Boxes: S2 L2A Reflectance; S2 L2A Classification; GlobeLand30 Labels (2010); Filtered Labels; Class Predictions; Class Verification (Human); Model Training]
Data
 Input Data:
    10 Sentinel-2 bands: Red, Green, Blue, Red-Edge 1-3, NIR, Narrow NIR, SWIR 1-2
    20 m bands resampled to 10 m using bicubic interpolation
 Reference/Label Data:
    GlobeLand30 labels for 2010 used as the source
    Classes mapped to REF Land Cover Taxonomy
    Labels re-gridded to the Sentinel-2 grid using nearest neighbor
    Labels filtered by agreement with classes from the Sentinel-2 20 m scene classification (produced as part of atmospheric correction)
    Filtered labels used as reference labels for training
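The re-gridding and filtering steps above could be sketched as follows. This is only an illustration under stated assumptions: the 30 m and 10 m grids are assumed to be perfectly aligned, the scene classification is assumed to be already resampled to 10 m, and the `COMPATIBLE_SCL` agreement table, the label codes, and the `NODATA` value are hypothetical (the slides do not give the actual mapping).

```python
import numpy as np

NODATA = 255  # hypothetical fill value for labels that are filtered out

# Hypothetical agreement table: land cover label -> compatible scene
# classification codes. The mapping used in the talk is not given.
COMPATIBLE_SCL = {
    1: {6},  # assumed: water label          <-> scene class "water"
    2: {4},  # assumed: vegetation label     <-> scene class "vegetation"
    3: {5},  # assumed: bare ground label    <-> scene class "not vegetated"
}


def regrid_nearest(labels_30m, factor=3):
    """Nearest-neighbor upsampling of a 30 m grid onto an aligned 10 m grid."""
    return np.repeat(np.repeat(labels_30m, factor, axis=0), factor, axis=1)


def filter_labels(labels, scl, compatible=COMPATIBLE_SCL, nodata=NODATA):
    """Keep a label only where the scene classification agrees with it."""
    keep = np.zeros(labels.shape, dtype=bool)
    for label, scl_codes in compatible.items():
        keep |= (labels == label) & np.isin(scl, list(scl_codes))
    return np.where(keep, labels, nodata)


# Tiny synthetic example: a 2 x 2 label grid at 30 m becomes 6 x 6 at 10 m.
labels_10m = regrid_nearest(np.array([[1, 2], [3, 1]]))
scl_10m = np.full((6, 6), 6)   # left half classified as water
scl_10m[:, 3:] = 4             # right half classified as vegetation
filtered = filter_labels(labels_10m, scl_10m)
```

In the synthetic example only the two upper blocks survive, because the lower blocks carry labels that disagree with the scene classification.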
Methodology
 A pixel-based supervised Random Forest model is trained for each scene.
 Pixels without valid reflectance are excluded from training.
 Training uses class-stratified samples of half the pixels in a scene, with one 10 m Sentinel-2 pixel for each 30 m label pixel.
 Predictions are made on all pixels marked with usable classes during Level-2A processing, including pixels labeled as unclassified.
 Annual labels will be generated by aggregating time series of predictions and probabilities from the same tile throughout the year.
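A minimal sketch of the sampling step, assuming "half the pixels" means half of the usable pixels within each class. The function name `stratified_half_sample`, the `nodata` value, and the fixed seed are illustrative choices, not details from the talk.

```python
import numpy as np


def stratified_half_sample(labels, valid, fraction=0.5, nodata=255, seed=0):
    """Return sorted flat indices of a class-stratified sample of pixels.

    labels : 2-D integer array of filtered reference labels
    valid  : 2-D boolean array, True where reflectance is valid
    """
    rng = np.random.default_rng(seed)
    flat_labels = labels.ravel()
    # Exclude pixels without valid reflectance or without a reference label.
    usable = valid.ravel() & (flat_labels != nodata)
    chosen = []
    for cls in np.unique(flat_labels[usable]):
        idx = np.flatnonzero(usable & (flat_labels == cls))
        n = max(1, int(round(fraction * idx.size)))
        chosen.append(rng.choice(idx, size=n, replace=False))
    if not chosen:
        return np.empty(0, dtype=int)
    return np.sort(np.concatenate(chosen))
```

With a reflectance cube `features` of shape `(rows, cols, 10)`, the training arrays would be `features.reshape(-1, 10)[sample]` and `labels.ravel()[sample]`, which could then be passed to any Random Forest implementation, for example scikit-learn's `RandomForestClassifier`.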
Results
 88.75% average model accuracy across 4 diverse scenes.
 Some classes, like water and snow/ice, predicted with high accuracy and high
confidence across all scenes.
 Other classes, like wetland and (semi) natural vegetation, are subtler and were
expected to be more difficult to classify.
 Woody vegetation and cultivated vegetation were predicted relatively
accurately and not confused with each other, as a result of including 20 m red
edge bands, resampled to 10 m.
 Artificial bare ground tended to be predicted in unclassified regions (in
reference data), taking over areas of natural bare ground and cultivated
vegetation and suggesting that traces of human activity would lead to pixels
classified as artificial bare ground in off-vegetation season.
Results
What about non-categorical variables?
 True value of categorical variables vs true value of continuous variables:
 Crop Yield
 Soil Moisture
 Temperature
 Precipitation
 All measurements of continuous variables are prone to uncertainty (noise and
bias).
 How to reduce/eliminate these uncertainties in training data?
[Diagram: In-Situ, Model, and Satellite measurement systems around the Truth, captioned "Noisy and biased measurement systems"]
Slide courtesy of K. McColl
Generating Training Dataset
 Triple collocation (TC) is a technique for estimating the unknown error standard
deviations (or RMSEs) of three mutually independent measurement systems,
without treating any one system as zero-error “truth”.
$$Q_{ij} \equiv \mathrm{Cov}(X_i, X_j), \qquad \sigma_{\varepsilon_i} = \sqrt{Q_{ii} - \frac{Q_{ij}\,Q_{ik}}{Q_{jk}}}$$
 TC-based RMSE estimates at each pixel are used to compute the a priori probability ($P_i$) of selecting a particular dataset:
$$P_i = \frac{1/\sigma_{\varepsilon_i}^{2}}{\sum_{j=1}^{3} 1/\sigma_{\varepsilon_j}^{2}}$$
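The two formulas on this slide can be sketched for a single pixel as below. The function names are illustrative, the three inputs are assumed to be collocated time series of equal length, and the synthetic data (noise levels, gains, offsets) is made up purely to exercise the code.

```python
import numpy as np


def tc_error_std(x1, x2, x3):
    """Triple collocation error standard deviations of three collocated series."""
    q = np.cov(np.vstack([x1, x2, x3]))  # q[i, j] = Cov(X_i, X_j)
    var = np.array([
        q[0, 0] - q[0, 1] * q[0, 2] / q[1, 2],
        q[1, 1] - q[0, 1] * q[1, 2] / q[0, 2],
        q[2, 2] - q[0, 2] * q[1, 2] / q[0, 1],
    ])
    # Sampling noise can push an estimated variance slightly below zero,
    # so clip before taking the square root.
    return np.sqrt(np.clip(var, 0.0, None))


def selection_probabilities(sigma):
    """Inverse-variance weights normalized to sum to one.

    A zero error estimate would need special handling before this step.
    """
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2
    return w / w.sum()


# Synthetic check: three noisy, biased views of the same signal.
rng = np.random.default_rng(0)
truth = rng.normal(size=100_000)
x1 = truth + rng.normal(scale=0.5, size=truth.size)
x2 = 0.8 * truth + 1.0 + rng.normal(scale=0.75, size=truth.size)
x3 = 1.2 * truth - 0.5 + rng.normal(scale=1.0, size=truth.size)

sigma = tc_error_std(x1, x2, x3)
probs = selection_probabilities(sigma)
```

On the synthetic data the recovered error standard deviations come out close to the noise levels used to generate it, and the least noisy series receives the largest selection probability. Note that no series is treated as truth and that the additive and multiplicative biases do not prevent the noise levels from being recovered.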
Sample time series of a pixel
[Figure: columns $X_1$, $X_2$, $X_3$ over time steps $t_1, t_2, t_3, \ldots, t_N$, and a target series $X_T$]
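One possible reading of this slide, stated here as an assumption rather than as the method used in the talk, is that the target series $X_T$ is built by drawing one of the three datasets at each time step with probability $P_i$. A hedged sketch of that reading, with the hypothetical helper `sample_target_series`:

```python
import numpy as np


def sample_target_series(x1, x2, x3, probs, seed=0):
    """Draw one dataset per time step with probabilities `probs`.

    Returns the sampled target series and the index (0, 1, or 2) of the
    dataset chosen at each time step. This is an assumed interpretation.
    """
    rng = np.random.default_rng(seed)
    stacked = np.vstack([x1, x2, x3])              # shape (3, N)
    n_steps = stacked.shape[1]
    pick = rng.choice(3, size=n_steps, p=probs)    # dataset index per step
    return stacked[pick, np.arange(n_steps)], pick


# Example with constant series so the chosen source is easy to read off.
target, pick = sample_target_series(
    np.full(5, 1.0), np.full(5, 2.0), np.full(5, 3.0), probs=[0.2, 0.3, 0.5]
)
```

Under this reading, datasets with smaller estimated error contribute more time steps to the training target, while no single dataset is treated as truth.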
Backup Slides
Alemohammad, et al., Biogeosciences, 2017
Things to check
 Sentinel-2 L2A classes
 What are the usable classes there?
 Plot actual scene + artificial bare ground
