This document discusses generating training data for machine learning models from noisy measurements of land cover classifications. It describes a workflow that uses Sentinel-2 satellite imagery and GlobeLand30 land cover labels to train a random forests model for land cover classification. Key points include:
- Sentinel-2 and GlobeLand30 data are used as input, with GlobeLand30 labels filtered and resampled to the Sentinel-2 grid to create reference labels.
- A random forests model is trained separately for each Sentinel-2 scene using stratified samples of pixels.
- Initial results show 88.75% average accuracy across scenes, with some classes (e.g., water) predicted well and others (e.g., wetlands) proving more difficult.
1. Generating Training Data from Noisy Measurements
HAMED ALEMOHAMMAD
LEAD GEOSPATIAL DATA SCIENTIST
2. ML Hub Earth
Machine Learning commons for EO
Training data
Models
Standards and best practices
3. Global Land Cover Training Dataset
Human-verified training dataset
Using open-source Sentinel-2 imagery
10 m spatial resolution
Global and geo-diverse
5. Data
Input Data:
10 Sentinel-2 bands: Red, Green, Blue, Red Edge 1–3, NIR, Narrow NIR, SWIR 1–2
20 m bands scaled to 10 m using bicubic interpolation
Reference/Label Data:
GlobeLand30 labels for 2010 used as a source
Classes mapped to REF Land Cover Taxonomy
Labels re-gridded to Sentinel-2 grid using nearest neighbor
Labels filtered by agreement with classes from Sentinel-2’s 20m scene classification
(produced as part of atmospheric correction)
Filtered labels used as reference labels for training
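The label-preparation steps above can be sketched numerically. This is a toy illustration, not the actual pipeline: the array sizes, class codes, and the use of scipy.ndimage.zoom are assumptions for demonstration; the real workflow operates on full Sentinel-2 and GlobeLand30 rasters.

```python
import numpy as np
from scipy.ndimage import zoom

# Toy 20 m band (4x4) upsampled to the 10 m grid (8x8) with bicubic
# interpolation (order=3), as the slides describe for the 20 m bands.
band_20m = np.arange(16, dtype=float).reshape(4, 4)
band_10m = zoom(band_20m, 2, order=3)

# Toy 30 m labels re-gridded to a 10 m grid with nearest neighbor
# (order=0) so class codes are never blended by interpolation.
labels_30m = np.array([[1, 2], [3, 4]])
labels_10m = zoom(labels_30m, 3, order=0)

# Keep a label only where it agrees with the scene classification from
# Level-2A atmospheric correction; disagreements become 0 (no label).
scene_class_10m = np.full_like(labels_10m, 1)  # hypothetical agreement map
reference = np.where(labels_10m == scene_class_10m, labels_10m, 0)
```

Nearest-neighbor resampling is the key choice for the labels: unlike bicubic, it cannot invent class codes that do not exist in the source raster.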
7. Methodology
A pixel-based supervised Random Forests model trained for each scene.
Pixels without valid reflectance are excluded from training.
Training uses class-stratified samples of half the pixels in a scene, pairing one
10 m Sentinel-2 pixel with each 30 m label pixel.
Predictions are made on all pixels marked with usable classes during Level-2A
processing, including pixels labeled as unclassified.
Annual labels will be generated by aggregating time series of predictions and
probabilities from the same tile throughout the year.
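A minimal per-scene version of the training loop above can be sketched as follows. The use of scikit-learn's RandomForestClassifier and all data shapes are assumptions; the slides do not name an implementation, and the synthetic features stand in for the 10 Sentinel-2 bands.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic "scene": 10 band values per pixel, 3 classes with shifted
# means so they are separable; ~10% of pixels lack valid reflectance.
n = 600
X = rng.normal(size=(n, 10))
y = rng.integers(1, 4, size=n)
X += y[:, None]
valid = rng.random(n) > 0.1

# Class-stratified sample: half of the valid pixels of each class.
train_idx = []
for c in np.unique(y[valid]):
    idx = np.flatnonzero(valid & (y == c))
    train_idx.extend(rng.choice(idx, size=len(idx) // 2, replace=False))
train_idx = np.array(train_idx)

# One model per scene, trained only on the stratified sample.
rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf.fit(X[train_idx], y[train_idx])

# Predict on all valid pixels, including those left out of training.
pred = rf.predict(X[valid])
acc = (pred == y[valid]).mean()
```

Stratifying by class keeps rare classes (e.g., wetland) represented in the training sample even when they cover a small fraction of the scene.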
8. Results
88.75% average model accuracy across 4 diverse scenes.
Some classes, like water and snow/ice, predicted with high accuracy and high
confidence across all scenes.
Other classes, like wetland and (semi) natural vegetation, are subtler and were
expected to be more difficult to classify.
Woody vegetation and cultivated vegetation were predicted relatively
accurately and not confused with each other, as a result of including 20 m red
edge bands, resampled to 10 m.
Artificial bare ground tended to be predicted in regions left unclassified in the
reference data, taking over areas of natural bare ground and cultivated
vegetation; this suggests that traces of human activity lead pixels to be
classified as artificial bare ground during the off-vegetation season.
11. What about non-categorical variables?
True values of categorical variables vs. true values of continuous variables:
Crop Yield
Soil Moisture
Temperature
Precipitation
All measurements of continuous variables are prone to uncertainty (noise and
bias).
How to reduce/eliminate these uncertainties in training data?
13. Generating Training Dataset
Triple collocation (TC) is a technique for estimating the unknown error standard
deviations (or RMSEs) of three mutually independent measurement systems,
without treating any one system as zero-error “truth”.
$$Q_{ij} \equiv \mathrm{Cov}(X_i, X_j), \qquad \sigma_{\varepsilon_i}^2 = Q_{ii} - \frac{Q_{ij}\, Q_{ik}}{Q_{jk}}, \quad (i, j, k) \text{ distinct}$$

TC-based RMSE estimates at each pixel are used to compute an a priori probability
$P_i$ of selecting a particular dataset:

$$P_i = \frac{1/\sigma_{\varepsilon_i}^2}{\sum_{j=1}^{3} 1/\sigma_{\varepsilon_j}^2}$$
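As a numerical check of the triple-collocation estimator above, the following sketch builds three synthetic measurement systems as truth plus independent noise (the additive error model TC assumes) and recovers their error variances from sample covariances alone. All data here are synthetic assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
truth = rng.normal(size=n)
sigmas = np.array([0.2, 0.4, 0.6])        # true (unknown) error std devs
X = truth + sigmas[:, None] * rng.normal(size=(3, n))

Q = np.cov(X)                             # Q[i, j] = Cov(X_i, X_j)

# sigma_eps_i^2 = Q_ii - Q_ij * Q_ik / Q_jk, with (i, j, k) a
# permutation of the three systems; no system is treated as truth.
err_var = np.empty(3)
for i in range(3):
    j, k = [m for m in range(3) if m != i]
    err_var[i] = Q[i, i] - Q[i, j] * Q[i, k] / Q[j, k]

# A priori selection probabilities weighted by inverse error variance:
# the least noisy system is picked most often.
P = (1.0 / err_var) / np.sum(1.0 / err_var)
```

The cross-covariance ratio cancels the variance of the shared signal, leaving only each system's own error variance; this is why the method needs three mutually independent systems rather than two.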