Using active learning to quantify how training
data errors impact classification accuracy over
smallholder-dominated agricultural systems
Stephanie Debats, Lei Song, Su Ye, Sitian Xiong, Kaixi Zhang,
Tammy Woodard, Ron Eastman, Ryan Avery, Kelly Caylor,
Dennis McRitchie, Lyndon Estes
Clark University|Clark Labs
University of California Santa Barbara
AWS Cloud Credits for Research Program
IIASA
Stephanie Debats, Ryan Avery, Su Ye, Lei Song, Sitian Xiong
Problem 1: High spatial variability
Problem 2: High temporal variability
Bing Base Map
PlanetScope Analytic
Problem 3: Interpretation errors in training data
High spatial & temporal resolution imagery
Active Learning
[Figure: active learning cycle with the steps Train, Predict, Select, Re-label, Label (Debats et al., 2017)]
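The cycle named on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the slide does not say how samples are selected, so uncertainty sampling (picking the items whose predicted probability is closest to 0.5) is assumed here; the toy one-dimensional classifier, the `oracle` function standing in for the labelling platform, and all names and numbers are hypothetical and are not the authors' implementation.

```python
import random


def train(labeled):
    """Fit a toy 1-D model: the mean feature value of each class."""
    neg = [x for x, y in labeled if y == 0]
    pos = [x for x, y in labeled if y == 1]
    return sum(neg) / len(neg), sum(pos) / len(pos)


def predict(model, x):
    """Pseudo-probability of class 1, from distances to the two class means."""
    mu0, mu1 = model
    d0, d1 = abs(x - mu0), abs(x - mu1)
    if d0 + d1 == 0:
        return 0.5
    return d0 / (d0 + d1)


def select(model, pool, k):
    """Assumed criterion: the k pool items with probability closest to 0.5."""
    return sorted(pool, key=lambda x: abs(predict(model, x) - 0.5))[:k]


def oracle(x):
    """Hypothetical stand-in for human labelling: class 1 when x > 0.6."""
    return 1 if x > 0.6 else 0


def active_learning(pool, seed_labeled, rounds, k):
    """Repeat train -> predict/select -> label for a fixed number of rounds."""
    labeled = list(seed_labeled)
    pool = list(pool)
    for _ in range(rounds):
        model = train(labeled)
        picked = select(model, pool, k)
        labeled += [(x, oracle(x)) for x in picked]
        pool = [x for x in pool if x not in picked]
    return train(labeled), labeled


rng = random.Random(0)
pool = [rng.random() for _ in range(200)]
seed = [(0.0, 0), (1.0, 1)]  # one labelled example per class to start
model, labeled = active_learning(pool, seed, rounds=5, k=4)
```

Each round spends the labelling budget on the samples the current model is least sure about, which is the usual motivation for an active learning loop.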
Study Region
[Figure: probability (%) maps for the growing season and the dry season]
Labelling component:
Crowdsourcing Platform
True positive (TP) False positive (FP) False negative (FN) True negative (TN)
score = in_accuracy * β0 +
out_accuracy * β1 +
fragmentation * β2 +
edge_accuracy * β3 +
categorical_accuracy * β4
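As a minimal sketch, the score above is a weighted sum of five accuracy components. The weight values and component values below are hypothetical, chosen only to make the arithmetic visible; the slides do not give the β values.

```python
from dataclasses import dataclass


@dataclass
class AccuracyComponents:
    in_accuracy: float
    out_accuracy: float
    fragmentation: float
    edge_accuracy: float
    categorical_accuracy: float


def mapper_score(c, betas):
    """Weighted sum of the five components, with betas = (β0, ..., β4)."""
    if len(betas) != 5:
        raise ValueError("expected five weights, β0 through β4")
    parts = (
        c.in_accuracy,
        c.out_accuracy,
        c.fragmentation,
        c.edge_accuracy,
        c.categorical_accuracy,
    )
    return sum(b * p for b, p in zip(betas, parts))


# Hypothetical weights (not from the slides); they sum to 1 here.
EXAMPLE_BETAS = (0.3, 0.2, 0.1, 0.2, 0.2)
example = AccuracyComponents(0.9, 0.8, 0.7, 0.6, 1.0)
example_score = mapper_score(example, EXAMPLE_BETAS)
```

If the weights sum to 1 and every component lies in [0, 1], the score also stays in [0, 1], which makes it convenient to reuse as a prior weight.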
Accuracy assessment and consensus labelling
Bayesian Model Averaging
[Figure panels: label collection, heat map (probability), consensus label]
$P(\theta \mid D) = \sum_{i=1}^{N} P(M_i \mid D)\, P(\theta \mid D, M_i)$
Debats et al (2016)
A generalized computer vision approach to mapping crop fields in
heterogeneous agricultural landscapes
Remote Sensing of Environment 179
Machine Learning component
1. On the fly feature extraction
2. Spark ML RandomForest
GeoTrellis/GeoPySpark
Does Training Data Error Impact Classification Performance?
Next Steps
1. Errors in image atmospheric corrections
2. Increase feature space for classifier
3. Improve label quality
4. Quantify gap between worker and ground
Worker map
Ground truth(y)
Where lies the truth?
Circle bias: many false positives are identified because of overreliance on circular features
https://github.com/ecohydro/CropMask_RCNN
A probability score above 0.7 is deemed a center pivot
Tested on never-before-seen 512x512 tiles
Some center pivots are missed because of a date mismatch between the imagery and the labels of the reference dataset
BAYESIAN MODEL AVERAGING: combining crowdsourcing labels from their mapping history
$P(\theta \mid D) = \sum_{i=1}^{N} P(M_i \mid D)\, P(\theta \mid D, M_i)$
$\theta$: the ground truth, which will be either ‘field’ or ‘no field’
$D$: the given data of crowdsourcing opinions for labeling this pixel
(e.g., $D = \{D_{\text{mapper}_1} = \text{field}, D_{\text{mapper}_2} = \text{no field}, \ldots\}$)
$M_i$: the mappers considered
(1) Mapper$_i$’s opinion, $P(\theta \mid D, M_i)$: the probability that mapper $i$ assigns to $\theta$
(2) Weight (or evidence), $P(M_i \mid D)$: the probability with which we weigh Mapper$_i$’s opinion, based on their mapping history
MAPPER OPINION
In our mapping project, mappers may only assign a crisp category to polygons (either ‘field’ or ‘no field’). So $P(\theta \mid D, M_i) = 0$ or $1$:
(1) $P(\theta = \text{field} \mid D_i = \text{field}, M_i) = 1$
(2) $P(\theta = \text{no field} \mid D_i = \text{field}, M_i) = 0$
(3) $P(\theta = \text{no field} \mid D_i = \text{no field}, M_i) = 1$
(4) $P(\theta = \text{field} \mid D_i = \text{no field}, M_i) = 0$
WEIGHT
Weight: $P(M_i \mid D) \propto P(D \mid M_i)\, P(M_i)$
(1) $P(M_i)$: ‘mapper prior’, our prior belief for mapper $i$. We can use the average score (combining geometric and thematic accuracy) to represent our belief:
$P(M_i) \propto \left(\sum_{j=1}^{n} \mathrm{score}_j\right) / n$
(2) $P(D \mid M_i)$: ‘mapper likelihood’, $P(D \mid M_i) \propto \exp\left(-\tfrac{1}{2}\,\mathrm{BIC}_i\right)$ [1][2]
BIC (Bayesian Information Criterion) $= \ln(n)\, k - 2 \ln L(D \mid \hat{\theta}, M)$
($n$ is the sample number, $k$ is the number of parameters to be estimated (our case has only one, i.e., $\theta$), and $\hat{\theta}$ is the label that maximizes the likelihood function)
‘BIC simply reduces to maximum likelihood when the number of parameters is equal for the models of interest’ [3], so $\mathrm{BIC} \approx -2 \ln L(D \mid \hat{\theta}, M)$. After adjustment,
$P(D \mid M_i) \propto L(D \mid \hat{\theta}, M_i)$ (maximum mapper likelihood)
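The step from BIC to the maximum likelihood can be written out explicitly. This is a short worked derivation using the symbols above, under the assumption that the sample number $n$ and the parameter count $k$ are the same for every mapper being compared:

```latex
\begin{align*}
\exp\!\left(-\tfrac{1}{2}\,\mathrm{BIC}_i\right)
  &= \exp\!\left(-\tfrac{k}{2}\ln n + \ln L(D \mid \hat{\theta}, M_i)\right) \\
  &= n^{-k/2}\, L(D \mid \hat{\theta}, M_i) \\
  &\propto L(D \mid \hat{\theta}, M_i)
  \qquad \text{(the factor } n^{-k/2} \text{ is the same for all mappers).}
\end{align*}
```

If $n$ differed between mappers, the factor $n^{-k/2}$ would no longer cancel and would have to be kept in the weight.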
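WEIGHT (CONT.)
Weight: $P(M_i \mid D) \propto P(D \mid M_i)\, P(M_i)$
Mapper likelihood: $P(D \mid M_i) \propto L(D \mid \hat{\theta}, M_i)$ (maximum mapper likelihood)
$L(D \mid \hat{\theta}, M)$ can be computed as:
(1) $P(\theta = \text{field} \mid \hat{\theta}, M_i) = P(D = \text{field} \mid \theta = \text{field}, M_i) = \left(\sum_{j=1}^{n} \frac{TP_j}{TP_j + FN_j}\right) / n$
(2) $P(\theta = \text{no field} \mid \hat{\theta}, M_i) = P(D = \text{no field} \mid \theta = \text{no field}, M_i) = \left(\sum_{j=1}^{n} \frac{TN_j}{TN_j + FP_j}\right) / n$
* The maximum mapper likelihood is in effect the mapper’s average producer’s accuracy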
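The two averages above can be computed from a mapper's per-assignment confusion counts. The sketch below is illustrative only: the `Confusion` record, the function name, and the example counts are hypothetical, and skipping assignments whose denominator is zero is an assumption the slides do not address.

```python
from typing import NamedTuple, Sequence


class Confusion(NamedTuple):
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def mean_producers_accuracy(history: Sequence[Confusion], label: str) -> float:
    """Average producer's accuracy for 'field' or 'no field' over a history."""
    ratios = []
    for c in history:
        if label == "field":
            hit, miss = c.tp, c.fn
        elif label == "no field":
            hit, miss = c.tn, c.fp
        else:
            raise ValueError("label must be 'field' or 'no field'")
        # Assumption: assignments with no reference pixels of this class
        # are skipped rather than counted as zero.
        if hit + miss > 0:
            ratios.append(hit / (hit + miss))
    if not ratios:
        raise ValueError("no usable assignments in history")
    return sum(ratios) / len(ratios)


# Hypothetical mapping history for one mapper (two scored assignments).
history = [
    Confusion(tp=80, fp=10, fn=20, tn=90),
    Confusion(tp=60, fp=30, fn=40, tn=70),
]
likelihood_field = mean_producers_accuracy(history, "field")
likelihood_no_field = mean_producers_accuracy(history, "no field")
```

Which of the two values enters the weight depends on the label the mapper gave to the pixel in question.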
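SUMMARY
The posterior probability of the pixel label $\theta$ given the data of mappers’ opinions ($D$):
$P(\theta \mid D) = \sum_{i=1}^{N} P(M_i \mid D)\, P(\theta \mid D, M_i)$ ($M_i$ is mapper $i$)
→ $P(\theta \mid D) = \dfrac{\sum_{i=1}^{N} \mathrm{weight}_i \cdot P(\theta \mid D, M_i)}{\sum_{i=1}^{N} \mathrm{weight}_i}$, where
$\mathrm{weight}_i = \text{score} \times \text{producer's accuracy} \propto P(M_i \mid D)$
$P(\theta \mid D, M_i) = 0$ or $1$
Labeling:
If $P(\theta = \text{field} \mid D) > P(\theta = \text{no field} \mid D)$ (or $P(\theta = \text{field} \mid D) > 0.5$), we give a consensus label of field; otherwise, we give a label of no field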
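The summary formula amounts to a weighted vote over mappers. A minimal sketch follows, assuming each mapper supplies one crisp vote per pixel and a non-negative weight (for example, score multiplied by producer's accuracy); the function names and example numbers are hypothetical.

```python
from typing import Sequence


def consensus_probability(votes: Sequence[str], weights: Sequence[float]) -> float:
    """Weighted share of 'field' votes for one pixel."""
    if len(votes) != len(weights):
        raise ValueError("one weight is required per vote")
    if any(v not in ("field", "no field") for v in votes):
        raise ValueError("votes must be 'field' or 'no field'")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Each mapper's opinion is 1 for the label they gave and 0 otherwise,
    # so only the weights of 'field' voters contribute to the numerator.
    field_weight = sum(w for v, w in zip(votes, weights) if v == "field")
    return field_weight / total


def consensus_label(votes, weights, threshold=0.5):
    """Label the pixel 'field' when the weighted probability exceeds 0.5."""
    p = consensus_probability(votes, weights)
    return "field" if p > threshold else "no field"


# Hypothetical pixel: three mappers, two of whom voted 'field'.
votes = ["field", "no field", "field"]
weights = [0.56, 0.72, 0.30]
p_field = consensus_probability(votes, weights)
label = consensus_label(votes, weights)
```

Because the opinions are crisp, a single highly weighted mapper can outvote several low-weight mappers, which is the intended effect of weighting by mapping history.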

More Related Content

Similar to Using Active Learning to Quantify how Training Data Errors Impact Classification Accuracy over Smallholder-Dominated Agricultural Systems

07 dimensionality reduction
07 dimensionality reduction07 dimensionality reduction
07 dimensionality reductionMarco Quartulli
 
Random forest algorithm for regression a beginner's guide
Random forest algorithm for regression   a beginner's guideRandom forest algorithm for regression   a beginner's guide
Random forest algorithm for regression a beginner's guideprateek kumar
 
T. Lucas Makinen x Imperial SBI Workshop
T. Lucas Makinen x Imperial SBI WorkshopT. Lucas Makinen x Imperial SBI Workshop
T. Lucas Makinen x Imperial SBI WorkshopLucasMakinen1
 
2013.11.14 Big Data Workshop Bruno Voisin
2013.11.14 Big Data Workshop Bruno Voisin 2013.11.14 Big Data Workshop Bruno Voisin
2013.11.14 Big Data Workshop Bruno Voisin NUI Galway
 
A note on estimation of population mean in sample survey using auxiliary info...
A note on estimation of population mean in sample survey using auxiliary info...A note on estimation of population mean in sample survey using auxiliary info...
A note on estimation of population mean in sample survey using auxiliary info...Alexander Decker
 
Optimization of sample configurations for spatial trend estimation
Optimization of sample configurations for spatial trend estimationOptimization of sample configurations for spatial trend estimation
Optimization of sample configurations for spatial trend estimationAlessandro Samuel-Rosa
 
An Interactive Introduction To R (Programming Language For Statistics)
An Interactive Introduction To R (Programming Language For Statistics)An Interactive Introduction To R (Programming Language For Statistics)
An Interactive Introduction To R (Programming Language For Statistics)Dataspora
 
software engineering modules iii & iv.pptx
software engineering  modules iii & iv.pptxsoftware engineering  modules iii & iv.pptx
software engineering modules iii & iv.pptxrani marri
 
Course Project for Coursera Practical Machine Learning
Course Project for Coursera Practical Machine LearningCourse Project for Coursera Practical Machine Learning
Course Project for Coursera Practical Machine LearningJohn Edward Slough II
 
Regression and Classification with R
Regression and Classification with RRegression and Classification with R
Regression and Classification with RYanchang Zhao
 
2. data types, variables and operators
2. data types, variables and operators2. data types, variables and operators
2. data types, variables and operatorsPhD Research Scholar
 
Principal Components Analysis, Calculation and Visualization
Principal Components Analysis, Calculation and VisualizationPrincipal Components Analysis, Calculation and Visualization
Principal Components Analysis, Calculation and VisualizationMarjan Sterjev
 
Outrageous Ideas for Graph Databases
Outrageous Ideas for Graph DatabasesOutrageous Ideas for Graph Databases
Outrageous Ideas for Graph DatabasesMax De Marzi
 
Prob-Dist-Toll-Forecast-Uncertainty
Prob-Dist-Toll-Forecast-UncertaintyProb-Dist-Toll-Forecast-Uncertainty
Prob-Dist-Toll-Forecast-UncertaintyAnkoor Bhagat
 
1 PROBABILITY DISTRIBUTIONS R. BEHBOUDI Triangu.docx
1  PROBABILITY DISTRIBUTIONS R. BEHBOUDI Triangu.docx1  PROBABILITY DISTRIBUTIONS R. BEHBOUDI Triangu.docx
1 PROBABILITY DISTRIBUTIONS R. BEHBOUDI Triangu.docxaulasnilda
 

Similar to Using Active Learning to Quantify how Training Data Errors Impact Classification Accuracy over Smallholder-Dominated Agricultural Systems (20)

Iowa_Report_2
Iowa_Report_2Iowa_Report_2
Iowa_Report_2
 
07 dimensionality reduction
07 dimensionality reduction07 dimensionality reduction
07 dimensionality reduction
 
Random forest algorithm for regression a beginner's guide
Random forest algorithm for regression   a beginner's guideRandom forest algorithm for regression   a beginner's guide
Random forest algorithm for regression a beginner's guide
 
T. Lucas Makinen x Imperial SBI Workshop
T. Lucas Makinen x Imperial SBI WorkshopT. Lucas Makinen x Imperial SBI Workshop
T. Lucas Makinen x Imperial SBI Workshop
 
2013.11.14 Big Data Workshop Bruno Voisin
2013.11.14 Big Data Workshop Bruno Voisin 2013.11.14 Big Data Workshop Bruno Voisin
2013.11.14 Big Data Workshop Bruno Voisin
 
Curvefitting
CurvefittingCurvefitting
Curvefitting
 
A note on estimation of population mean in sample survey using auxiliary info...
A note on estimation of population mean in sample survey using auxiliary info...A note on estimation of population mean in sample survey using auxiliary info...
A note on estimation of population mean in sample survey using auxiliary info...
 
Scaling up data science applications
Scaling up data science applicationsScaling up data science applications
Scaling up data science applications
 
FinalReportFoxMelle
FinalReportFoxMelleFinalReportFoxMelle
FinalReportFoxMelle
 
Survey Demo
Survey DemoSurvey Demo
Survey Demo
 
Optimization of sample configurations for spatial trend estimation
Optimization of sample configurations for spatial trend estimationOptimization of sample configurations for spatial trend estimation
Optimization of sample configurations for spatial trend estimation
 
An Interactive Introduction To R (Programming Language For Statistics)
An Interactive Introduction To R (Programming Language For Statistics)An Interactive Introduction To R (Programming Language For Statistics)
An Interactive Introduction To R (Programming Language For Statistics)
 
software engineering modules iii & iv.pptx
software engineering  modules iii & iv.pptxsoftware engineering  modules iii & iv.pptx
software engineering modules iii & iv.pptx
 
Course Project for Coursera Practical Machine Learning
Course Project for Coursera Practical Machine LearningCourse Project for Coursera Practical Machine Learning
Course Project for Coursera Practical Machine Learning
 
Regression and Classification with R
Regression and Classification with RRegression and Classification with R
Regression and Classification with R
 
2. data types, variables and operators
2. data types, variables and operators2. data types, variables and operators
2. data types, variables and operators
 
Principal Components Analysis, Calculation and Visualization
Principal Components Analysis, Calculation and VisualizationPrincipal Components Analysis, Calculation and Visualization
Principal Components Analysis, Calculation and Visualization
 
Outrageous Ideas for Graph Databases
Outrageous Ideas for Graph DatabasesOutrageous Ideas for Graph Databases
Outrageous Ideas for Graph Databases
 
Prob-Dist-Toll-Forecast-Uncertainty
Prob-Dist-Toll-Forecast-UncertaintyProb-Dist-Toll-Forecast-Uncertainty
Prob-Dist-Toll-Forecast-Uncertainty
 
1 PROBABILITY DISTRIBUTIONS R. BEHBOUDI Triangu.docx
1  PROBABILITY DISTRIBUTIONS R. BEHBOUDI Triangu.docx1  PROBABILITY DISTRIBUTIONS R. BEHBOUDI Triangu.docx
1 PROBABILITY DISTRIBUTIONS R. BEHBOUDI Triangu.docx
 

More from Louisa Diggs

Workshop: Quantifying Error in Training Data for Mapping and Monitoring the E...
Workshop: Quantifying Error in Training Data for Mapping and Monitoring the E...Workshop: Quantifying Error in Training Data for Mapping and Monitoring the E...
Workshop: Quantifying Error in Training Data for Mapping and Monitoring the E...Louisa Diggs
 
Machine Learning for Better Maps
Machine Learning for Better MapsMachine Learning for Better Maps
Machine Learning for Better MapsLouisa Diggs
 
Generating Training Data from Noisy Measrements
Generating Training Data from Noisy MeasrementsGenerating Training Data from Noisy Measrements
Generating Training Data from Noisy MeasrementsLouisa Diggs
 
Cropped Field Boundaries, Food Systems, & Fire
Cropped Field Boundaries, Food Systems, & FireCropped Field Boundaries, Food Systems, & Fire
Cropped Field Boundaries, Food Systems, & FireLouisa Diggs
 
Challenges to Large Scale Mapping: Can Data Geometry Help?
Challenges to Large Scale Mapping: Can Data Geometry Help?Challenges to Large Scale Mapping: Can Data Geometry Help?
Challenges to Large Scale Mapping: Can Data Geometry Help?Louisa Diggs
 
A Random Walk of Issues Related to Training Data and Land Cover Mapping
A Random Walk of Issues Related to Training Data and Land Cover MappingA Random Walk of Issues Related to Training Data and Land Cover Mapping
A Random Walk of Issues Related to Training Data and Land Cover MappingLouisa Diggs
 
Assessing Land Cover Change using Uncertain Data
Assessing Land Cover Change using Uncertain DataAssessing Land Cover Change using Uncertain Data
Assessing Land Cover Change using Uncertain DataLouisa Diggs
 
Informal Settlements and Cadastral Mapping
Informal Settlements and Cadastral MappingInformal Settlements and Cadastral Mapping
Informal Settlements and Cadastral MappingLouisa Diggs
 
Sources of Map Error in Public Health Activities and Operations Research
Sources of Map Error in Public Health Activities and Operations ResearchSources of Map Error in Public Health Activities and Operations Research
Sources of Map Error in Public Health Activities and Operations ResearchLouisa Diggs
 
Measuring the impact of label noise on semantic segmentation using rastervision
Measuring the impact of label noise on semantic segmentation using rastervisionMeasuring the impact of label noise on semantic segmentation using rastervision
Measuring the impact of label noise on semantic segmentation using rastervisionLouisa Diggs
 
Mapping Smallholder Yields Using Micro-Satellite Data
Mapping Smallholder Yields Using Micro-Satellite DataMapping Smallholder Yields Using Micro-Satellite Data
Mapping Smallholder Yields Using Micro-Satellite DataLouisa Diggs
 
Crowdsourcing Land Cover and Land Use Data: Experiences from IIASA
Crowdsourcing Land Cover and Land Use Data: Experiences from IIASACrowdsourcing Land Cover and Land Use Data: Experiences from IIASA
Crowdsourcing Land Cover and Land Use Data: Experiences from IIASALouisa Diggs
 
IMED 2018: The use of remote sensing, geostatistical and machine learning met...
IMED 2018: The use of remote sensing, geostatistical and machine learning met...IMED 2018: The use of remote sensing, geostatistical and machine learning met...
IMED 2018: The use of remote sensing, geostatistical and machine learning met...Louisa Diggs
 
IMED 2018: Predicting the environmental suitability of podoconiosis in Ethiopia
IMED 2018: Predicting the environmental suitability of podoconiosis in EthiopiaIMED 2018: Predicting the environmental suitability of podoconiosis in Ethiopia
IMED 2018: Predicting the environmental suitability of podoconiosis in EthiopiaLouisa Diggs
 
IMED 2018: Landcover/habitat
IMED 2018: Landcover/habitatIMED 2018: Landcover/habitat
IMED 2018: Landcover/habitatLouisa Diggs
 
IMED 2018: Modeled Population Estimates from Satellite Imagery and Microcensu...
IMED 2018: Modeled Population Estimates from Satellite Imagery and Microcensu...IMED 2018: Modeled Population Estimates from Satellite Imagery and Microcensu...
IMED 2018: Modeled Population Estimates from Satellite Imagery and Microcensu...Louisa Diggs
 
IMED 2018: An intro to Remote Sensing and Machine Learning
IMED 2018: An intro to Remote Sensing and Machine LearningIMED 2018: An intro to Remote Sensing and Machine Learning
IMED 2018: An intro to Remote Sensing and Machine LearningLouisa Diggs
 
IMED 2018: Mapping Monkeypox risk in the Congo Basin using Remote Sensing and...
IMED 2018: Mapping Monkeypox risk in the Congo Basin using Remote Sensing and...IMED 2018: Mapping Monkeypox risk in the Congo Basin using Remote Sensing and...
IMED 2018: Mapping Monkeypox risk in the Congo Basin using Remote Sensing and...Louisa Diggs
 
IMED 2018: Predicting spatiotemporal risk of yellow fever using a machine lea...
IMED 2018: Predicting spatiotemporal risk of yellow fever using a machine lea...IMED 2018: Predicting spatiotemporal risk of yellow fever using a machine lea...
IMED 2018: Predicting spatiotemporal risk of yellow fever using a machine lea...Louisa Diggs
 
IMED 2018: Innovations and Challenges in the Use of Open-source Remote Sensin...
IMED 2018: Innovations and Challenges in the Use of Open-source Remote Sensin...IMED 2018: Innovations and Challenges in the Use of Open-source Remote Sensin...
IMED 2018: Innovations and Challenges in the Use of Open-source Remote Sensin...Louisa Diggs
 

More from Louisa Diggs (20)

Workshop: Quantifying Error in Training Data for Mapping and Monitoring the E...
Workshop: Quantifying Error in Training Data for Mapping and Monitoring the E...Workshop: Quantifying Error in Training Data for Mapping and Monitoring the E...
Workshop: Quantifying Error in Training Data for Mapping and Monitoring the E...
 
Machine Learning for Better Maps
Machine Learning for Better MapsMachine Learning for Better Maps
Machine Learning for Better Maps
 
Generating Training Data from Noisy Measrements
Generating Training Data from Noisy MeasrementsGenerating Training Data from Noisy Measrements
Generating Training Data from Noisy Measrements
 
Cropped Field Boundaries, Food Systems, & Fire
Cropped Field Boundaries, Food Systems, & FireCropped Field Boundaries, Food Systems, & Fire
Cropped Field Boundaries, Food Systems, & Fire
 
Challenges to Large Scale Mapping: Can Data Geometry Help?
Challenges to Large Scale Mapping: Can Data Geometry Help?Challenges to Large Scale Mapping: Can Data Geometry Help?
Challenges to Large Scale Mapping: Can Data Geometry Help?
 
A Random Walk of Issues Related to Training Data and Land Cover Mapping
A Random Walk of Issues Related to Training Data and Land Cover MappingA Random Walk of Issues Related to Training Data and Land Cover Mapping
A Random Walk of Issues Related to Training Data and Land Cover Mapping
 
Assessing Land Cover Change using Uncertain Data
Assessing Land Cover Change using Uncertain DataAssessing Land Cover Change using Uncertain Data
Assessing Land Cover Change using Uncertain Data
 
Informal Settlements and Cadastral Mapping
Informal Settlements and Cadastral MappingInformal Settlements and Cadastral Mapping
Informal Settlements and Cadastral Mapping
 
Sources of Map Error in Public Health Activities and Operations Research
Sources of Map Error in Public Health Activities and Operations ResearchSources of Map Error in Public Health Activities and Operations Research
Sources of Map Error in Public Health Activities and Operations Research
 
Measuring the impact of label noise on semantic segmentation using rastervision
Measuring the impact of label noise on semantic segmentation using rastervisionMeasuring the impact of label noise on semantic segmentation using rastervision
Measuring the impact of label noise on semantic segmentation using rastervision
 
Mapping Smallholder Yields Using Micro-Satellite Data
Mapping Smallholder Yields Using Micro-Satellite DataMapping Smallholder Yields Using Micro-Satellite Data
Mapping Smallholder Yields Using Micro-Satellite Data
 
Crowdsourcing Land Cover and Land Use Data: Experiences from IIASA
Crowdsourcing Land Cover and Land Use Data: Experiences from IIASACrowdsourcing Land Cover and Land Use Data: Experiences from IIASA
Crowdsourcing Land Cover and Land Use Data: Experiences from IIASA
 
IMED 2018: The use of remote sensing, geostatistical and machine learning met...
IMED 2018: The use of remote sensing, geostatistical and machine learning met...IMED 2018: The use of remote sensing, geostatistical and machine learning met...
IMED 2018: The use of remote sensing, geostatistical and machine learning met...
 
IMED 2018: Predicting the environmental suitability of podoconiosis in Ethiopia
IMED 2018: Predicting the environmental suitability of podoconiosis in EthiopiaIMED 2018: Predicting the environmental suitability of podoconiosis in Ethiopia
IMED 2018: Predicting the environmental suitability of podoconiosis in Ethiopia
 
IMED 2018: Landcover/habitat
IMED 2018: Landcover/habitatIMED 2018: Landcover/habitat
IMED 2018: Landcover/habitat
 
IMED 2018: Modeled Population Estimates from Satellite Imagery and Microcensu...
IMED 2018: Modeled Population Estimates from Satellite Imagery and Microcensu...IMED 2018: Modeled Population Estimates from Satellite Imagery and Microcensu...
IMED 2018: Modeled Population Estimates from Satellite Imagery and Microcensu...
 
IMED 2018: An intro to Remote Sensing and Machine Learning
IMED 2018: An intro to Remote Sensing and Machine LearningIMED 2018: An intro to Remote Sensing and Machine Learning
IMED 2018: An intro to Remote Sensing and Machine Learning
 
IMED 2018: Mapping Monkeypox risk in the Congo Basin using Remote Sensing and...
IMED 2018: Mapping Monkeypox risk in the Congo Basin using Remote Sensing and...IMED 2018: Mapping Monkeypox risk in the Congo Basin using Remote Sensing and...
IMED 2018: Mapping Monkeypox risk in the Congo Basin using Remote Sensing and...
 
IMED 2018: Predicting spatiotemporal risk of yellow fever using a machine lea...
IMED 2018: Predicting spatiotemporal risk of yellow fever using a machine lea...IMED 2018: Predicting spatiotemporal risk of yellow fever using a machine lea...
IMED 2018: Predicting spatiotemporal risk of yellow fever using a machine lea...
 
IMED 2018: Innovations and Challenges in the Use of Open-source Remote Sensin...
IMED 2018: Innovations and Challenges in the Use of Open-source Remote Sensin...IMED 2018: Innovations and Challenges in the Use of Open-source Remote Sensin...
IMED 2018: Innovations and Challenges in the Use of Open-source Remote Sensin...
 

Recently uploaded

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 

Recently uploaded (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions

Using Active Learning to Quantify how Training Data Errors Impact Classification Accuracy over Smallholder-Dominated Agricultural Systems

  • 1. Using active learning to quantify how training data errors impact classification accuracy over smallholder-dominated agricultural systems. Stephanie Debats, Lei Song, Su Ye, Sitian Xiong, Kaixi Zhang, Tammy Woodard, Ron Eastman, Ryan Avery, Kelly Caylor, Dennis McRitchie, Lyndon Estes. Clark University | Clark Labs; University of California Santa Barbara
  • 2. AWS Cloud Credits for Research Program IIASA
  • 3. Stephanie Debats, Ryan Avery, Su Ye, Lei Song, Sitian Xiong
  • 4. Problem 1: High spatial variability
  • 5. Problem 2: High temporal variability (image panels: Bing Base Map; PlanetScope Analytic)
  • 6. Problem 3: Interpretation errors in training data
  • 7. High spatial & temporal resolution imagery
  • 19. True positive (TP) False positive (FP) False negative (FN) True negative (TN)
  • 20. score = in_accuracy * β0 + out_accuracy * β1 + fragmentation * β2 + edge_accuracy * β3 + categorical_accuracy * β4
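The score on slide 20 is a weighted sum of five accuracy components. A minimal Python sketch of that sum follows; the β values are hypothetical placeholders, since the slides do not state them, and the function name `assignment_score` is invented for illustration.

```python
# Hypothetical weights (beta_0 .. beta_4); the slides do not give actual values.
BETAS = (0.3, 0.2, 0.1, 0.2, 0.2)


def assignment_score(in_accuracy, out_accuracy, fragmentation,
                     edge_accuracy, categorical_accuracy, betas=BETAS):
    """Weighted sum of the five accuracy components named on slide 20."""
    components = (in_accuracy, out_accuracy, fragmentation,
                  edge_accuracy, categorical_accuracy)
    if len(betas) != len(components):
        raise ValueError("expected one weight per accuracy component")
    return sum(b * c for b, c in zip(betas, components))


# Example with made-up component values for one mapped assignment.
example = assignment_score(0.8, 0.9, 0.5, 0.6, 1.0)
```

With weights that sum to 1, a perfect assignment (all components equal to 1) scores 1, which keeps scores comparable across mappers.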
  • 21. Accuracy assessment and consensus labelling Probability
  • 22. Bayesian Model Averaging. Label collection. $P(\theta \mid D) = \sum_{i=1}^{n} P(M_i \mid D)\, P(\theta \mid D, M_i)$
  • 23. Bayesian Model Averaging. Heat map. $P(\theta \mid D) = \sum_{i=1}^{n} P(M_i \mid D)\, P(\theta \mid D, M_i)$
  • 26. Machine Learning component: 1. On-the-fly feature extraction; 2. Spark ML RandomForest (GeoTrellis / GeoPySpark). Reference: Debats et al. (2016), A generalized computer vision approach to mapping crop fields in heterogeneous agricultural landscapes, Remote Sensing of Environment 179
  • 27. Does Training Data Error Impact Classification Performance?
  • 29. Next Steps: 1. Errors in image atmospheric corrections; 2. Increase feature space for classifier; 3. Improve label quality; 4. Quantify gap between worker and ground
  • 31. Circle bias: many false positives are identified because of overreliance on circular features. https://github.com/ecohydro/CropMask_RCNN
  • 32. A probability score above 0.7 is deemed a center pivot. Tested on never-before-seen 512x512 tiles. Some center pivots are missed because of a date mismatch between the imagery and the labels of the reference dataset.
  • 33. BAYESIAN MODEL AVERAGING: $P(\theta \mid D) = \sum_{i=1}^{n} P(M_i \mid D)\, P(\theta \mid D, M_i)$
       $\theta$: the ground truth, which will be either ‘field’ or ‘no field’
       $D$: the given data of crowdsourcing opinions for labeling this pixel (e.g., $D = \{D_{\text{mapper}_1} = \text{field},\ D_{\text{mapper}_2} = \text{no field},\ \ldots\}$)
       $M_i$: the mappers considered
       (1) $\text{Mapper}_i$'s opinion, $P(\theta \mid D, M_i)$: how much probability $\text{Mapper}_i$ assigns to $\theta$
       (2) Weight (or evidence), $P(M_i \mid D)$: the probability with which we weigh $\text{Mapper}_i$'s opinion, based on their mapping history (combining crowdsourcing labels from their mapping history)
  • 34. MAPPER OPINION: In our mapping project, mappers may only assign a crisp category to polygons (either ‘field’ or ‘no field’). So $P(\theta \mid D, M_i) = 0 \text{ or } 1$:
       (1) $P(\theta = \text{field} \mid D_i = \text{field}, M_i) = 1$
       (2) $P(\theta = \text{no field} \mid D_i = \text{field}, M_i) = 0$
       (3) $P(\theta = \text{no field} \mid D_i = \text{no field}, M_i) = 1$
       (4) $P(\theta = \text{field} \mid D_i = \text{no field}, M_i) = 0$
  • 35. WEIGHT: $P(M_i \mid D) \propto P(D \mid M_i)\, P(M_i)$
       (1) $P(M_i)$: ‘mapper prior’, our prior belief for mapper $i$. We can use the average score (combining geometric and thematic accuracy) over the mapper's $m$ scored assignments to represent our belief: $P(M_i) \propto \left(\sum_{j=1}^{m} \text{score}_j\right) / m$
       (2) $P(D \mid M_i)$: ‘mapper likelihood’, $P(D \mid M_i) \propto \exp\left(-\tfrac{1}{2}\,\mathrm{BIC}_i\right)$ [1][2]
       BIC (Bayesian Information Criterion): $\mathrm{BIC} = \ln(N)\, k - 2 \ln L(D \mid \hat{\theta}, M)$
       ‘BIC simply reduces to maximum likelihood when the number of parameters is equal for the models of interest’ [3], so $\mathrm{BIC} \approx -2 \ln L(D \mid \hat{\theta}, M)$. After adjustment, $P(D \mid M_i) \propto L(D \mid \hat{\theta}, M_i)$ (maximum mapper likelihood).
       ($N$ is the sample number, $k$ is the number of parameters to be estimated (our case has only one, i.e., $\theta$), and $\hat{\theta}$ is the label that maximizes the likelihood function.)
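  • 36. WEIGHT (CONT.): $P(M_i \mid D) \propto P(D \mid M_i)\, P(M_i)$
       Mapper likelihood: $P(D \mid M_i) \propto L(D \mid \hat{\theta}, M_i)$ (maximum mapper likelihood)
       $L(D \mid \hat{\theta}, M_i)$ can be computed over the mapper's $m$ assessed assignments as:
       (1) for $\hat{\theta} = \text{field}$: $P(D = \text{field} \mid \theta = \text{field}, M_i) = \left(\sum_{j=1}^{m} \frac{TP_j}{TP_j + FN_j}\right) / m$
       (2) for $\hat{\theta} = \text{no field}$: $P(D = \text{no field} \mid \theta = \text{no field}, M_i) = \left(\sum_{j=1}^{m} \frac{TN_j}{TN_j + FP_j}\right) / m$
       * Maximum mapper likelihood is actually the average producer's accuracy of the mapper.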
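  • 37. SUMMARY: $P(\theta \mid D) = \sum_{i=1}^{n} P(M_i \mid D)\, P(\theta \mid D, M_i)$, where
       $\text{weight}_i = \text{score} \times \text{producer's accuracy} \propto P(M_i \mid D)$
       $P(\theta \mid D, M_i) = 0 \text{ or } 1$
       The posterior probability of the pixel label $\theta$ given the data of mappers' opinions $D$ ($M_i$ is mapper $i$):
       $P(\theta \mid D) = \dfrac{\sum_{i=1}^{n} \text{weight}_i \cdot P(\theta \mid D, M_i)}{\sum_{i=1}^{n} \text{weight}_i}$
       Labeling: if $P(\theta = \text{field} \mid D) > P(\theta = \text{no field} \mid D)$ (or $P(\theta = \text{field} \mid D) > 0.5$), we give a consensus label of field; otherwise, we give a label of no field.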
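
The consensus rule summarized on slide 37 can be sketched in Python as a weighted vote for a single pixel. This is a minimal illustration, not the project's code: the function names, the `(TP, FP, FN, TN)` tuple layout, and the example numbers are all assumptions. It also assumes each mapper's weight uses the producer's accuracy of the class that mapper chose, which is one reading of slide 36.

```python
from typing import Sequence, Tuple

# Assumed layout: (TP, FP, FN, TN) pixel counts for one assessed assignment.
Confusion = Tuple[int, int, int, int]


def producers_accuracy(history: Sequence[Confusion], label: str) -> float:
    """Average producer's accuracy for `label` over a mapper's history."""
    ratios = []
    for tp, fp, fn, tn in history:
        hit, miss = (tp, fn) if label == "field" else (tn, fp)
        if hit + miss > 0:  # skip assignments with no reference pixels of this class
            ratios.append(hit / (hit + miss))
    return sum(ratios) / len(ratios) if ratios else 0.0


def mapper_weight(mean_score: float, history: Sequence[Confusion], label: str) -> float:
    """weight = score * producer's accuracy, evaluated for the mapper's own label."""
    return mean_score * producers_accuracy(history, label)


def consensus(opinions: Sequence[str], weights: Sequence[float], threshold: float = 0.5):
    """Return (P(field | opinions), consensus label) from crisp 0/1 mapper opinions."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one mapper must have a positive weight")
    # A crisp opinion contributes its full weight to 'field' or nothing at all.
    p_field = sum(w for o, w in zip(opinions, weights) if o == "field") / total
    return p_field, ("field" if p_field > threshold else "no field")


# Made-up example: three mappers label the same pixel.
opinions = ["field", "no field", "field"]
weights = [
    mapper_weight(0.9, [(8, 1, 2, 9)], "field"),
    mapper_weight(0.6, [(5, 5, 5, 5)], "no field"),
    mapper_weight(1.0, [(9, 0, 1, 10)], "field"),
]
p_field, label = consensus(opinions, weights)
```

Because every opinion is 0 or 1, the posterior reduces to the share of total weight held by mappers who labeled the pixel as field, so a single highly weighted mapper can outvote several low-weight ones.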