Quantifying Error in Training Data for Mapping and Monitoring the Earth System - A Workshop on “Quantifying Error in Training Data for Mapping and Monitoring the Earth System” was held on January 8-9, 2019 at Clark University, with support from Omidyar Network’s Property Rights Initiative, now PlaceFund.
Using Active Learning to Quantify how Training Data Errors Impact Classification Accuracy over Smallholder-Dominated Agricultural Systems
Stephanie Debats, Lei Song, Su Ye, Sitian Xiong, Kaixi Zhang,
Tammy Woodard, Ron Eastman, Ryan Avery, Kelly Caylor,
Dennis McRitchie, Lyndon Estes
Clark University | Clark Labs
University of California Santa Barbara
26. Debats et al. (2016)
A generalized computer vision approach to mapping crop fields in heterogeneous agricultural landscapes
Remote Sensing of Environment 179
Machine learning component:
1. On-the-fly feature extraction
2. Spark ML RandomForest
GeoTrellis/GeoPySpark
29. Next Steps
1. Address errors in image atmospheric correction
2. Increase the feature space for the classifier
3. Improve label quality
4. Quantify the gap between worker labels and ground truth
31. Circle bias: many false positives are identified because of overreliance on circular features
https://github.com/ecohydro/CropMask_RCNN
32. A probability score above 0.7 is deemed a center pivot
Tested on never-before-seen 512×512 tiles
Some center pivots are missed because of a date mismatch between the imagery and the labels of the reference dataset
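The 0.7 cut-off above amounts to a simple filter on per-instance scores. A minimal sketch, assuming detections arrive as hypothetical `(tile_id, instance_id, score)` tuples; the record layout and function name are illustrative and are not the CropMask_RCNN output format:

```python
# Threshold from the slide: instances scoring above 0.7 count as center pivots.
PIVOT_THRESHOLD = 0.7


def keep_center_pivots(detections, threshold=PIVOT_THRESHOLD):
    """Keep detections whose probability score is strictly above the threshold.

    detections -- iterable of hypothetical (tile_id, instance_id, score) tuples
    """
    return [d for d in detections if d[2] > threshold]


detections = [
    ("tile_001", 1, 0.92),
    ("tile_001", 2, 0.55),
    ("tile_002", 1, 0.71),
    ("tile_002", 2, 0.70),  # exactly 0.7 is not "above", so it is dropped
]
pivots = keep_center_pivots(detections)
print(pivots)  # [('tile_001', 1, 0.92), ('tile_002', 1, 0.71)]
```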
33. BAYESIAN MODEL AVERAGING:
$$P(y \mid D) = \sum_{k=1}^{K} P(M_k \mid D)\, P(y \mid D, M_k)$$
$y$: the ground truth, which is either ‘field’ or ‘no field’
$D$: the given data, i.e., the crowdsourced opinions on how to label this pixel
(e.g., $D = \{D_{\text{mapper}_1} = \text{field},\ D_{\text{mapper}_2} = \text{no field},\ \ldots\}$)
$M_k$ ($k = 1, \ldots, K$): the mappers considered
(1) Mapper $k$’s opinion, $P(y \mid D, M_k)$: how much probability mapper $k$ assigns to $y$
(2) Weight (or evidence), $P(M_k \mid D)$: the probability with which we weigh mapper $k$’s opinion, based on their mapping history, when combining crowdsourced labels
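The averaging above can be sketched in a few lines. This is a minimal illustration, assuming the weights are supplied as non-negative values proportional to $P(M_k \mid D)$ and normalized inside the function; the name `bma_posterior` and the example numbers are hypothetical:

```python
def bma_posterior(weights, opinions):
    """Bayesian model averaging: P(y|D) = sum_k P(M_k|D) * P(y|D, M_k).

    weights  -- non-negative values proportional to P(M_k|D), one per mapper
    opinions -- P(y|D, M_k) for the same label y, one per mapper
    """
    if len(weights) != len(opinions):
        raise ValueError("need exactly one weight per mapper opinion")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Dividing by the total turns the proportional weights into P(M_k|D).
    return sum((w / total) * p for w, p in zip(weights, opinions))


# Hypothetical example: mappers 1 and 2 say "field", mapper 3 says "no field".
p_field = bma_posterior([0.9, 0.6, 0.8], [1.0, 1.0, 0.0])
print(round(p_field, 3))  # 0.652
```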
34. MAPPER OPINION
In our mapping project, mappers can only assign a crisp category to polygons (either ‘field’ or ‘no field’), so $P(y \mid D, M_k) \in \{0, 1\}$:
(1) $P(y = \text{field} \mid D_k = \text{field}, M_k) = 1$
(2) $P(y = \text{no field} \mid D_k = \text{field}, M_k) = 0$
(3) $P(y = \text{no field} \mid D_k = \text{no field}, M_k) = 1$
(4) $P(y = \text{field} \mid D_k = \text{no field}, M_k) = 0$
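Because opinions are crisp, a mapper’s opinion reduces to an indicator of agreement between $y$ and the label that mapper chose. A minimal sketch (the function and constant names are hypothetical) that reproduces the four cases:

```python
FIELD, NO_FIELD = "field", "no field"
LABELS = (FIELD, NO_FIELD)


def mapper_opinion(y, mapper_label):
    """Crisp opinion P(y | D_k, M_k): 1.0 if mapper k chose y, else 0.0."""
    if y not in LABELS or mapper_label not in LABELS:
        raise ValueError("labels must be 'field' or 'no field'")
    return 1.0 if y == mapper_label else 0.0


# Print all four cases listed on the slide.
for mapper_label in LABELS:
    for y in LABELS:
        print(f"P(y={y} | D_k={mapper_label}, M_k) = {mapper_opinion(y, mapper_label):.0f}")
```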
35. WEIGHT
Weight: $P(M_k \mid D) \propto P(D \mid M_k)\, P(M_k)$
(1) $P(M_k)$: the ‘mapper prior’, our prior belief about mapper $k$. We can use the mapper’s average score (combining geometric and thematic accuracy) to represent this belief:
$$P(M_k) \propto \frac{1}{n}\sum_{i=1}^{n} \text{score}_i$$
(2) $P(D \mid M_k)$: the ‘mapper likelihood’, $P(D \mid M_k) \propto \exp\!\left(-\tfrac{1}{2}\,\mathrm{BIC}_k\right)$ [1][2]
BIC (Bayesian Information Criterion) $= p \ln n - 2 \ln L(D \mid \hat{y}, M_k)$
‘BIC simply reduces to maximum likelihood when the number of parameters is equal for the models of interest’ [3], so $\mathrm{BIC}_k \approx -2 \ln L(D \mid \hat{y}, M_k)$. After this adjustment,
$$P(D \mid M_k) \propto L(D \mid \hat{y}, M_k) \quad \text{(maximum mapper likelihood)}$$
($n$ is the sample number; $p$ is the number of parameters to be estimated, which in our case is only one, the label $y$; $\hat{y}$ is the label that maximizes the likelihood function)
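To see why the BIC step leaves only the likelihood, the sketch below computes mapper weights both through $\exp(-\tfrac{1}{2}\mathrm{BIC}_k)$ and directly from score × likelihood. It assumes every mapper shares the same sample number and the same number of estimated parameters (one, the label), so the $\ln n$ penalty becomes a common factor that cancels on normalization; all input values and function names are made up for illustration:

```python
import math


def bic(log_likelihood, n_samples, n_params=1):
    """Bayesian Information Criterion: p * ln(n) - 2 * ln(L-hat)."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def mapper_weights(avg_scores, max_likelihoods, n_samples, n_params=1):
    """Normalized P(M_k|D), proportional to P(D|M_k) * P(M_k).

    avg_scores      -- each mapper's average score, used as the prior P(M_k)
    max_likelihoods -- each mapper's maximized likelihood L(D | y-hat, M_k), in (0, 1]
    """
    # Mapper likelihood P(D|M_k) taken proportional to exp(-BIC_k / 2).
    evidence = [math.exp(-0.5 * bic(math.log(lk), n_samples, n_params))
                for lk in max_likelihoods]
    unnormalized = [e * s for e, s in zip(evidence, avg_scores)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


scores = [0.8, 0.6]       # hypothetical average scores
likelihoods = [0.9, 0.7]  # hypothetical maximum mapper likelihoods
w_bic = mapper_weights(scores, likelihoods, n_samples=50)

# Same n and p for both mappers: exp(-p*ln(n)/2) is a shared factor that
# cancels on normalization, leaving weights proportional to score * likelihood.
direct = [s * lk for s, lk in zip(scores, likelihoods)]
w_direct = [d / sum(direct) for d in direct]
print([round(w, 4) for w in w_bic])     # [0.6316, 0.3684]
print([round(w, 4) for w in w_direct])  # [0.6316, 0.3684]
```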
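36. WEIGHT (CONT.)
Weight: $P(M_k \mid D) \propto P(D \mid M_k)\, P(M_k)$
Mapper likelihood: $P(D \mid M_k) \propto L(D \mid \hat{y}, M_k)$ (maximum mapper likelihood)
$L(D \mid \hat{y}, M_k)$ can be computed as:
(1) $L(D \mid \hat{y} = \text{field}, M_k) = P(D = \text{field} \mid y = \text{field}, M_k) = \dfrac{1}{n}\sum_{i=1}^{n} \dfrac{TP_i}{TP_i + FN_i}$
(2) $L(D \mid \hat{y} = \text{no field}, M_k) = P(D = \text{no field} \mid y = \text{no field}, M_k) = \dfrac{1}{n}\sum_{i=1}^{n} \dfrac{TN_i}{TN_i + FP_i}$
($TP_i$, $FN_i$, $TN_i$, $FP_i$: true positives, false negatives, true negatives and false positives in sample $i$)
* The maximum mapper likelihood is thus the mapper’s average producer’s accuracy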
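The average producer’s accuracy can be computed from per-sample counts of true positives, false negatives, true negatives and false positives. A minimal sketch, assuming each sample is a hypothetical dict with `tp`, `fn`, `tn`, `fp` keys; samples whose denominator is zero are skipped, which is an assumption the slides do not address:

```python
def average_producers_accuracy(samples, label="field"):
    """Mapper's average producer's accuracy over its samples.

    samples -- hypothetical dicts of confusion counts with keys "tp", "fn", "tn", "fp"
    label   -- "field":    mean of TP / (TP + FN)
               "no field": mean of TN / (TN + FP)
    Samples with a zero denominator are skipped (an assumption of this sketch).
    """
    if label == "field":
        pairs = [(s["tp"], s["tp"] + s["fn"]) for s in samples]
    elif label == "no field":
        pairs = [(s["tn"], s["tn"] + s["fp"]) for s in samples]
    else:
        raise ValueError("label must be 'field' or 'no field'")
    ratios = [num / den for num, den in pairs if den > 0]
    if not ratios:
        raise ValueError("no sample has reference examples of this class")
    return sum(ratios) / len(ratios)


samples = [
    {"tp": 80, "fn": 20, "tn": 150, "fp": 50},
    {"tp": 45, "fn": 5, "tn": 90, "fp": 10},
]
print(round(average_producers_accuracy(samples, "field"), 3))     # 0.85
print(round(average_producers_accuracy(samples, "no field"), 3))  # 0.825
```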
37. SUMMARY
$$P(y \mid D) = \sum_{k=1}^{K} P(M_k \mid D)\, P(y \mid D, M_k)$$
$\text{weight}_k = \text{score}_k \times \text{producer's accuracy}_k \propto P(M_k \mid D)$
$P(y \mid D, M_k) \in \{0, 1\}$
Labeling:
If $P(y = \text{field} \mid D) > P(y = \text{no field} \mid D)$ (equivalently, $P(y = \text{field} \mid D) > 0.5$), we assign the consensus label ‘field’; otherwise, we assign ‘no field’.
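The summary and labeling rule can be sketched for a single pixel: weights are score × producer’s accuracy, each crisp opinion is 0 or 1, and because the weights are only proportional to $P(M_k \mid D)$ the sketch divides by their sum. Names and inputs are hypothetical, and the sketch takes one producer’s-accuracy value per mapper, leaving open which of the field and no-field versions is used:

```python
FIELD, NO_FIELD = "field", "no field"


def consensus_label(mapper_labels, scores, producers_accuracy):
    """Weighted consensus label for one pixel.

    weight_k = score_k * producer's accuracy_k   (proportional to P(M_k|D))
    P(field | D) = sum of weights of mappers who chose "field" / sum of all weights,
    because each crisp opinion P(y | D, M_k) is either 0 or 1.
    """
    if not (len(mapper_labels) == len(scores) == len(producers_accuracy)):
        raise ValueError("need one score and one accuracy per mapper")
    weights = [s * pa for s, pa in zip(scores, producers_accuracy)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    p_field = sum(w for w, lab in zip(weights, mapper_labels) if lab == FIELD) / total
    # Two classes: P(no field | D) = 1 - P(field | D), so this matches both rules.
    label = FIELD if p_field > 0.5 else NO_FIELD
    return label, p_field


label, p_field = consensus_label(
    ["field", "no field", "no field"],  # hypothetical opinions for one pixel
    scores=[0.9, 0.5, 0.6],
    producers_accuracy=[0.95, 0.7, 0.6],
)
print(label, round(p_field, 3))  # field 0.546
```

In this made-up example, one high-scoring, accurate mapper outweighs two weaker mappers who disagree; an exact tie at 0.5 falls to ‘no field’, following the “otherwise” branch of the rule.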
The posterior probability of the pixel label $y$ given the mappers’ opinions $D$ ($M_k$ is mapper $k$):
$$P(y \mid D) = \frac{\sum_{k=1}^{K} \text{weight}_k \, P(y \mid D, M_k)}{\sum_{k=1}^{K} \text{weight}_k},$$
where