ALEXANDER BRAYLAN* AND MATTHEW LEASE
The University of Texas at Austin
http://ir.ischool.utexas.edu/
APRIL 2020
MODELING AND AGGREGATION
OF COMPLEX ANNOTATIONS VIA
ANNOTATION DISTANCE
Simple annotation & aggregation
• classification
– sentiment analysis
– image categorization
• ordinal rating
– product & movie reviews
– search relevance
• multiple choice selection
– quizzes
Aggregation
• Crowd-sourcing: quality control
• Experts: wisdom of crowds
• Goal is to select best label available for each item
What’s the capital of Texas?
• Austin
• Austin
• Houston
Majority Vote: Austin
When majority voting falls short
Caption this image:
• A cat is eating
• The cat eats
• A beautiful picture
Problem: large label space, exact match doesn’t work!
What about complex annotations?
• Ranked lists
• Parse trees
• Image captions
– A1: A cat is eating
– A2: The cat eats
– A3: A beautiful picture
• Range sequences
Outline
• Prior work
• Approach
• Experiments
• Conclusion
Aggregating Simple Labels
• Hundreds of papers
• Multiple benchmarking studies
• Rich body of Bayesian modeling
• General-purpose aggregation models for simple labels don’t support complex labels!
• Example models: Dawid-Skene, MACE, Hierarchical Dawid-Skene, Item Difficulty, Logistic Random Effects
Source: Paun et al 2018, “Comparing Bayesian models of annotation”
Task-specific models
• Pros:
– Task specialization maximizes accuracy
• Cons:
– Need new model for every task
– Complicated, difficult to formulate
• Examples: Nguyen et al 2017 (sequences); Lin, Mausam, and Weld 2012 (math)
Task-specific workflows
• Pros:
– Empower workers for complex tasks
• Cons:
– Need new workflow for every task
– Complicated, difficult to formulate
• Examples: Noronha et al 2011 (image analysis); Lasecki et al 2012 (transcription)
Our goals
• We want aggregation for complex data types
– Build on ideas from simple label aggregation models
• We want to generalize across many labeling tasks
– Can we reduce problem to common simpler state space?
Key Insight
• Partial credit matching via task-specific distance function
– Encapsulate task-specific label features in a distance function supplied by the requester
– Model annotation distances rather than annotations
– Distance functions already exist for most tasks because people need evaluation functions to compare predicted labels vs gold
Distance functions
Properties of distance functions:
• Non-negativity
• Symmetry
• Triangle inequality

Data                    Free Text                            Rankings
Example evaluation fn   BLEU(x, y)                           Kendall’s τ(x, y)
Example distance fn     1 – (BLEU(x, y) + BLEU(y, x)) / 2    1 – τ(x, y)
Non-negativity          ✓                                    ✓
Symmetry                ✓                                    ✓
Triangle inequality     ✓                                    ✓
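As a rough illustration of turning an evaluation function into a distance function, the sketch below implements Kendall's τ for rankings without ties and a generic wrapper that averages a score in both directions before converting it to a distance. The `unigram_precision` function is a hypothetical stand-in for BLEU, chosen only because it is asymmetric and needs no external library; all function names here are assumptions for illustration, not part of the authors' code.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau between two rankings of the same items (no ties)."""
    position_in_y = {item: rank for rank, item in enumerate(y)}
    pairs = list(combinations(x, 2))  # each pair (a, b) has a ranked before b in x
    concordant = sum(1 for a, b in pairs if position_in_y[a] < position_in_y[b])
    discordant = len(pairs) - concordant
    return (concordant - discordant) / len(pairs)


def tau_distance(x, y):
    """Distance 1 - tau: 0 for identical rankings, 2 for fully reversed ones."""
    return 1.0 - kendall_tau(x, y)


def unigram_precision(x, y):
    """Hypothetical stand-in for BLEU: share of tokens in x that also occur in y."""
    x_tokens, y_tokens = x.split(), set(y.split())
    return sum(token in y_tokens for token in x_tokens) / len(x_tokens)


def symmetrized_distance(score, x, y):
    """Average the score in both directions, then flip similarity into distance."""
    return 1.0 - (score(x, y) + score(y, x)) / 2.0


d_rank = tau_distance(["a", "b", "c"], ["a", "c", "b"])
d_text = symmetrized_distance(unigram_precision, "a cat is eating", "cat is eating")
```

Averaging both directions is what makes the text distance symmetric even though the underlying score is not.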
Calculate distances
• Example task: text annotation
• Example distance function: string edit distance
• Example annotations: “a cat is eating”, “cat is eating”, “the cat eats”, “a beautiful picture”
[Figure: pairwise distances among the four example annotations, with edge labels 0.05, 0.1, 0.1, 0.8, 0.82, 0.82]
[Figure: distances among A1: A cat is eating, A2: The cat eats, A3: A beautiful picture, with edge labels 0.1, 0.6, 0.3]
All tasks reduce to matrices of annotation distances
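A minimal sketch of the distance calculation for the text example, assuming character-level Levenshtein distance normalized by the longer string's length. The normalization is an assumption made here to keep values in [0, 1]; the numbers it produces are not expected to match those on the slides.

```python
def levenshtein(a, b):
    """Character-level edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(a, b):
    """Assumed normalization: divide by the longer length so values lie in [0, 1]."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def distance_matrix(annotations, distance):
    """Pairwise distances among all annotations collected for one item."""
    return [[distance(a, b) for b in annotations] for a in annotations]


captions = ["a cat is eating", "cat is eating", "the cat eats", "a beautiful picture"]
D = distance_matrix(captions, normalized_edit_distance)
```

Whatever the task, the downstream aggregation only ever sees a matrix like `D`, which is what lets one model serve many annotation types.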
How to aggregate given distances
• Local selection model
• Global selection model
• Combined
Local approach: Smallest Avg Distance
• For each item:
1. Compute each annotation’s average distance to the other annotations for the item
2. Choose annotation with smallest average distance
• Generalization of majority vote
• Independence between items
• Local approach does not model annotator reliability
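The local selection rule can be sketched as follows. The function name and the caption distances are illustrative assumptions; the first example shows how an exact-match (0/1) distance on simple labels reduces the rule to majority vote.

```python
def smallest_average_distance(D):
    """Return the index of the annotation with the smallest mean distance to the others.

    D is a square, symmetric distance matrix for one item (at least two annotations).
    """
    n = len(D)
    averages = [sum(D[u][v] for v in range(n) if v != u) / (n - 1) for u in range(n)]
    return min(range(n), key=lambda u: averages[u])


# Exact-match (0/1) distance on simple labels recovers majority vote.
labels = ["Austin", "Austin", "Houston"]
D_simple = [[0.0 if a == b else 1.0 for b in labels] for a in labels]
winner = labels[smallest_average_distance(D_simple)]

# Illustrative (made-up) distances among three captions A1, A2, A3.
D_captions = [[0.0, 0.1, 0.6],
              [0.1, 0.0, 0.3],
              [0.6, 0.3, 0.0]]
chosen = smallest_average_distance(D_captions)
```

Each item is handled on its own matrix, so nothing learned about an annotator on one item carries over to another.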
Global approach: Best Available User
• For each annotator:
– Score by average distance over full dataset
• For each item:
– Choose label by best-scoring annotator
• Fixed annotator reliability
• Global approach does not model how well annotators did on specific items
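A minimal sketch of the global rule, assuming annotations are stored as `{item: {annotator: label}}` and that an annotator's score is the mean distance between their labels and co-annotators' labels over the whole dataset. The data layout, names, and toy numeric labels are assumptions for illustration only.

```python
from collections import defaultdict


def best_available_user(annotations, distance):
    """annotations: {item: {annotator: label}}. Returns (chosen labels, annotator scores)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for labels in annotations.values():
        for u, label_u in labels.items():
            for v, label_v in labels.items():
                if u != v:
                    totals[u] += distance(label_u, label_v)
                    counts[u] += 1
    # Lower score = closer to peers on average across the whole dataset.
    scores = {u: totals[u] / counts[u] for u in totals}
    chosen = {
        item: labels[min(labels, key=lambda u: scores.get(u, float("inf")))]
        for item, labels in annotations.items()
    }
    return chosen, scores


# Toy numeric labels with absolute difference as the distance (illustrative only).
toy = {
    "item1": {"ann_a": 1.0, "ann_b": 1.1, "ann_c": 5.0},
    "item2": {"ann_a": 2.0, "ann_b": 2.2, "ann_c": 9.0},
}
chosen, scores = best_available_user(toy, lambda x, y: abs(x - y))
```

Because the score is fixed per annotator, the same person wins every item they labeled, regardless of how they did on that particular item.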
Can we get best of both worlds?
• Want a method that combines:
– Best available user (global)
– Smallest avg distance (local)
• Should build on rich history of work on Bayesian annotation modeling
• Need a principled framework for modeling annotation distance matrices
[Figure: weights and votes combine into weighted voting]
Multidimensional Annotation Scaling (MAS)
• Based on Multidimensional Scaling (Kruskal & Wish 1978)
• Probabilistic model of multi-item distance matrices
• “Hierarchical Bayesian”
– Additional learned parameters represent crowd effects such as worker reliability
[Figure: example captions “A cat is eating”, “The cat eats”, “A beautiful picture”]
MAS Objective 1: Likelihood
Multidimensional Scaling objective:
$D_{iuv} \sim \mathcal{N}(\lVert \varepsilon_{iu} - \varepsilon_{iv} \rVert, \sigma)$
• $D_{iuv}$: observed distance
• $\varepsilon_{iu}$: annotation embedding
• $\sigma$: error scale
[Figure: the example annotations “a cat is eating”, “cat is eating”, “the cat eats”, “a beautiful picture” placed as points, with pairwise distances 0.05, 0.1, 0.1, 0.8, 0.82, 0.82]
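To make the likelihood concrete, the sketch below scores a set of candidate embeddings for a single item against observed pairwise distances. It assumes σ is the standard deviation of the normal and uses made-up distances and coordinates; it evaluates the objective only and does not perform the inference used in MAS.

```python
import math


def normal_logpdf(x, mean, scale):
    """Log-density of a normal distribution with the given mean and standard deviation."""
    return -0.5 * math.log(2 * math.pi * scale ** 2) - (x - mean) ** 2 / (2 * scale ** 2)


def mds_log_likelihood(observed, embeddings, sigma):
    """observed: {(u, v): distance}; embeddings: {u: coordinates} for one item."""
    return sum(
        normal_logpdf(d, math.dist(embeddings[u], embeddings[v]), sigma)
        for (u, v), d in observed.items()
    )


# Illustrative distances among three annotators' labels for one item.
observed = {("a", "b"): 0.1, ("a", "c"): 0.8, ("b", "c"): 0.82}

# Embeddings whose geometry roughly reproduces the observed distances...
good = {"a": (0.0, 0.0), "b": (0.1, 0.0), "c": (0.0, 0.8)}
# ...versus embeddings that collapse every annotation onto one point.
collapsed = {"a": (0.0, 0.0), "b": (0.0, 0.0), "c": (0.0, 0.0)}

ll_good = mds_log_likelihood(observed, good, sigma=0.1)
ll_collapsed = mds_log_likelihood(observed, collapsed, sigma=0.1)
```

The likelihood rewards embeddings whose Euclidean distances match the observed annotation distances, which is why `good` scores higher than `collapsed`.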
MAS Objective 2: Prior
Annotation prior probability objective:
$\varepsilon_{iu} = \tilde{\varepsilon}_{iu} / \lvert \tilde{\varepsilon}_{iu} \rvert$
$\tilde{\varepsilon}_{iu} \sim \mathcal{N}(0, \gamma_u \delta_i)$
• $\tilde{\varepsilon}$: unnormalized embedding
• $\gamma_u$: annotator error
• $\delta_i$: item difficulty
[Figure: the four example annotations and a “Pseudo-gold” point]
[Figure: panels for γ = 0.1, γ = 0.4, γ = 0.5, and γ = 0.7]
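The effect of the prior can be sketched by evaluating the log-density of an unnormalized embedding under a zero-centred normal. This sketch assumes the normal is isotropic and that the product of annotator error and item difficulty acts as its standard deviation; the normalization step and the coordinates are left out or made up for illustration.

```python
import math


def embedding_log_prior(unnormalized, gamma_u, delta_i):
    """Isotropic normal log-density centred at the origin with scale gamma_u * delta_i."""
    scale = gamma_u * delta_i
    squared_norm = sum(c * c for c in unnormalized)
    k = len(unnormalized)
    return -0.5 * k * math.log(2 * math.pi * scale ** 2) - squared_norm / (2 * scale ** 2)


near, far = (0.1, 0.0), (0.8, 0.0)  # illustrative embeddings near and far from the origin

# How much log-probability an annotator loses by landing far from the origin.
penalty_reliable = embedding_log_prior(near, 0.1, 1.0) - embedding_log_prior(far, 0.1, 1.0)
penalty_unreliable = embedding_log_prior(near, 0.7, 1.0) - embedding_log_prior(far, 0.7, 1.0)
```

An annotator with a small error parameter is expected to stay close to the origin, so a distant embedding is penalized far more for them than for an annotator with a large error parameter.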
Tasks & datasets
SYNTHETIC DATASETS
• Syntactic parse trees
– Distance function: evalb
• Ranked lists
– Distance function: Kendall’s tau
REAL DATASETS
• Biomedical text sequences (Nguyen et al 2017)
– Distance function: Span F1
• Urdu-English translations (Zaidan and Callison-Burch 2011)
– Distance function: GLEU
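As one example of how such an evaluation metric becomes a distance, the sketch below computes 1 minus span-level F1 between two sets of annotated spans. It assumes spans are `(start, end)` tuples that count as matching only when identical, and that two empty annotations are a perfect match; the matching criterion used in the actual experiments may differ.

```python
def span_f1(predicted, reference):
    """F1 over exactly matching (start, end) spans; two empty sets count as a match."""
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        return 1.0
    overlap = len(predicted & reference)
    # Equivalent to the harmonic mean of precision and recall.
    return 2 * overlap / (len(predicted) + len(reference))


def span_f1_distance(x, y):
    """Distance between two span annotations of the same text."""
    return 1.0 - span_f1(x, y)


annotation_1 = [(0, 2), (5, 7)]
annotation_2 = [(0, 2), (8, 9)]
d = span_f1_distance(annotation_1, annotation_2)
```

Written this way F1 is symmetric in its two arguments, so no extra symmetrization step is needed.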
Methods
Baselines:
• Random User (RU): pick one label randomly
• ZenCrowd (ZC) (Demartini et al 2012)
– Weighted voting based on exact match (rare!)
• Crowd Hidden Markov Model (CHMM) (Nguyen et al 2017)
– Sequence annotation task only
Upper bound: Oracle (OR) (always picks best label)
• Even if 5 workers answer, limited by best answer any of them gave
Results
Task          Metric      RU      ZC      CHMM    MAS     Oracle
Translations  GLEU        0.185   0.188   -       0.217   0.246
Sequences     F1          0.561   0.569   0.702   0.709   0.827
Parses        EVALB       0.812   0.819   -       0.932   0.939
Rankings      Kendall τ   0.491   0.495   -       0.710   0.724
• Diverse complex label datasets
• MAS scores highest among the non-oracle methods on all four tasks, with no model alteration between datasets
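One way to read the results table is to ask how much of the gap between the Random User baseline and the Oracle upper bound MAS closes on each task. The sketch below computes that share from the reported scores; the "gap closed" measure is an assumption introduced here for illustration and does not appear in the talk.

```python
# Scores copied from the results table (RU = Random User baseline).
results = {
    "Translations": {"RU": 0.185, "MAS": 0.217, "Oracle": 0.246},
    "Sequences": {"RU": 0.561, "MAS": 0.709, "Oracle": 0.827},
    "Parses": {"RU": 0.812, "MAS": 0.932, "Oracle": 0.939},
    "Rankings": {"RU": 0.491, "MAS": 0.710, "Oracle": 0.724},
}


def gap_closed(row):
    """Share of the baseline-to-oracle gap recovered by MAS (0 = baseline, 1 = oracle)."""
    return (row["MAS"] - row["RU"]) / (row["Oracle"] - row["RU"])


share = {task: round(gap_closed(row), 2) for task, row in results.items()}
```

By this measure MAS recovers a bit more than half of the gap on translations and sequences and more than nine tenths of it on parses and rankings.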
Conclusion
• Goal: general-purpose probabilistic model to aggregate complex annotations
– Categorical-based methods insufficient
– Custom models difficult to design for new annotation types
• Solution: Model annotation distances via task-specific distance functions
– Transforms problem into general-purpose variable space
• Multidimensional Annotation Scaling (MAS)
– Allows unsupervised weighted voting with inferred annotator reliability
• Not covered in talk (see paper)
– Semi-supervised learning
– Partial credit
Current & Future work
Big picture: what is needed to support complex crowd-sourcing?
• Integration with workflow design and other quality-control mechanisms
• Dynamic (online) collection – measuring value of getting another label
• Merging annotations rather than selecting best one
– e.g. guessing weight of an ox
• Learning difficult tasks over time
THANK YOU!
Code available at
https://github.com/Praznat/annotationmodeling
We thank the crowd workers for the data they contributed for this research study!
APPENDICES
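MAS Objective 2: Prior
Hierarchical priors:
$\log(\gamma_u) \sim \mathcal{N}(\log(\bar{\gamma}), \phi)$
$\log(\delta_i) \sim \mathcal{N}(\log(\bar{\delta}), \psi)$
• $\bar{\gamma}$: annotator error location
• $\bar{\delta}$: item difficulty location
• $\phi$: annotator error scale
• $\psi$: item difficulty scale
[Figure: annotator errors γ1, γ2, γ3, γ4 drawn from a shared distribution with location $\bar{\gamma}$ and scale $\phi$]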
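A small sketch of what the hierarchical prior implies for annotator error: each annotator's error parameter is drawn from a log-normal distribution centred on a shared location. The function name, seed, and parameter values are assumptions for illustration; the sketch only draws samples and does not fit anything.

```python
import math
import random


def sample_annotator_errors(n_annotators, location, scale, seed=0):
    """Draw per-annotator error parameters with log(gamma_u) ~ Normal(log(location), scale)."""
    rng = random.Random(seed)
    return [rng.lognormvariate(math.log(location), scale) for _ in range(n_annotators)]


# A wide scale lets annotators differ a lot; a scale of zero forces them to be identical.
varied = sample_annotator_errors(4, location=0.4, scale=0.5)
identical = sample_annotator_errors(4, location=0.4, scale=0.0)
```

Working on the log scale keeps every sampled error positive, and the shared location lets information about the crowd as a whole inform estimates for annotators with few labels.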
Experiments: score-all results
• Traditionally, goal of annotation aggregation is to determine a single ground truth per item
• With complex annotations there could be several acceptable answers
• Alternate goal is to score each annotation by expected quality
Insight: semi-supervised learning
• Noisy parser experiment: more workers whose annotations deviate substantially from gold
• Semi-supervised learning allows rearrangement of inferred worker reliability according to similarity to known gold
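Why is it hard to design bespoke models?
SIMPLE label generative model: $P(L_{ui} \mid g_i, \theta_u, \theta_i, \ldots)$
– Label $L_{ui}$ (observed): categorical
– Gold $g_i$ (unobserved): categorical
– Latent parameters $\theta_u, \theta_i, \ldots$ (unobserved): scalars etc
COMPLEX label generative model: $P(L_{ui} \mid g_i, \theta_u, \theta_i, \ldots)$
– Label $L_{ui}$ (observed): complex data type
– Gold $g_i$ (unobserved): complex data type
– Latent parameters $\theta_u, \theta_i, \ldots$ (unobserved): scalars etc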

More Related Content

What's hot

Mixed Effects Models - Growth Curve Analysis
Mixed Effects Models - Growth Curve AnalysisMixed Effects Models - Growth Curve Analysis
Mixed Effects Models - Growth Curve Analysis
Scott Fraundorf
 
JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...
JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...
JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...
GUANGYUAN PIAO
 
Ability Study of Proximity Measure for Big Data Mining Context on Clustering
Ability Study of Proximity Measure for Big Data Mining Context on ClusteringAbility Study of Proximity Measure for Big Data Mining Context on Clustering
Ability Study of Proximity Measure for Big Data Mining Context on Clustering
KamleshKumar394
 
JIST2015-data challenge
JIST2015-data challengeJIST2015-data challenge
JIST2015-data challenge
GUANGYUAN PIAO
 
Active learning
Active learningActive learning
Mixed Effects Models - Fixed Effects
Mixed Effects Models - Fixed EffectsMixed Effects Models - Fixed Effects
Mixed Effects Models - Fixed Effects
Scott Fraundorf
 
Learning from data
Learning from dataLearning from data
Learning from data
Govind Kanshi
 
Visual Analytics in Omics: why, what, how?
Visual Analytics in Omics: why, what, how?Visual Analytics in Omics: why, what, how?
Visual Analytics in Omics: why, what, how?
Jan Aerts
 
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Daniel Roggen
 
Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...
Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...
Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...
Pieter Heyvaert
 
Mixed Effects Models - Power
Mixed Effects Models - PowerMixed Effects Models - Power
Mixed Effects Models - Power
Scott Fraundorf
 
ResNet basics (Deep Residual Network for Image Recognition)
ResNet basics (Deep Residual Network for Image Recognition)ResNet basics (Deep Residual Network for Image Recognition)
ResNet basics (Deep Residual Network for Image Recognition)
Sanjay Saha
 
Mixed Effects Models - Post-Hoc Comparisons
Mixed Effects Models - Post-Hoc ComparisonsMixed Effects Models - Post-Hoc Comparisons
Mixed Effects Models - Post-Hoc Comparisons
Scott Fraundorf
 
Mixed Effects Models - Fixed Effect Interactions
Mixed Effects Models - Fixed Effect InteractionsMixed Effects Models - Fixed Effect Interactions
Mixed Effects Models - Fixed Effect Interactions
Scott Fraundorf
 
Visual Analytics in Omics - why, what, how?
Visual Analytics in Omics - why, what, how?Visual Analytics in Omics - why, what, how?
Visual Analytics in Omics - why, what, how?
Jan Aerts
 
22 An Introduction to Stochastic Actor-Oriented Models (SAOM or Siena)
22 An Introduction to Stochastic Actor-Oriented Models (SAOM or Siena)22 An Introduction to Stochastic Actor-Oriented Models (SAOM or Siena)
22 An Introduction to Stochastic Actor-Oriented Models (SAOM or Siena)
Duke Network Analysis Center
 
Recommendation systems
Recommendation systems  Recommendation systems
Recommendation systems
Badr Hirchoua
 
Mixed Effects Models - Effect Size
Mixed Effects Models - Effect SizeMixed Effects Models - Effect Size
Mixed Effects Models - Effect Size
Scott Fraundorf
 
Recommender Systems - A Review and Recent Research Trends
Recommender Systems  -  A Review and Recent Research TrendsRecommender Systems  -  A Review and Recent Research Trends
Recommender Systems - A Review and Recent Research Trends
Sujoy Bag
 
Slides ecir2016
Slides ecir2016Slides ecir2016
Slides ecir2016
Fattane Zarrinkalam
 

What's hot (20)

Mixed Effects Models - Growth Curve Analysis
Mixed Effects Models - Growth Curve AnalysisMixed Effects Models - Growth Curve Analysis
Mixed Effects Models - Growth Curve Analysis
 
JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...
JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...
JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...
 
Ability Study of Proximity Measure for Big Data Mining Context on Clustering
Ability Study of Proximity Measure for Big Data Mining Context on ClusteringAbility Study of Proximity Measure for Big Data Mining Context on Clustering
Ability Study of Proximity Measure for Big Data Mining Context on Clustering
 
JIST2015-data challenge
JIST2015-data challengeJIST2015-data challenge
JIST2015-data challenge
 
Active learning
Active learningActive learning
Active learning
 
Mixed Effects Models - Fixed Effects
Mixed Effects Models - Fixed EffectsMixed Effects Models - Fixed Effects
Mixed Effects Models - Fixed Effects
 
Learning from data
Learning from dataLearning from data
Learning from data
 
Visual Analytics in Omics: why, what, how?
Visual Analytics in Omics: why, what, how?Visual Analytics in Omics: why, what, how?
Visual Analytics in Omics: why, what, how?
 
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
 
Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...
Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...
Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...
 
Mixed Effects Models - Power
Mixed Effects Models - PowerMixed Effects Models - Power
Mixed Effects Models - Power
 
ResNet basics (Deep Residual Network for Image Recognition)
ResNet basics (Deep Residual Network for Image Recognition)ResNet basics (Deep Residual Network for Image Recognition)
ResNet basics (Deep Residual Network for Image Recognition)
 
Mixed Effects Models - Post-Hoc Comparisons
Mixed Effects Models - Post-Hoc ComparisonsMixed Effects Models - Post-Hoc Comparisons
Mixed Effects Models - Post-Hoc Comparisons
 
Mixed Effects Models - Fixed Effect Interactions
Mixed Effects Models - Fixed Effect InteractionsMixed Effects Models - Fixed Effect Interactions
Mixed Effects Models - Fixed Effect Interactions
 
Visual Analytics in Omics - why, what, how?
Visual Analytics in Omics - why, what, how?Visual Analytics in Omics - why, what, how?
Visual Analytics in Omics - why, what, how?
 
22 An Introduction to Stochastic Actor-Oriented Models (SAOM or Siena)
22 An Introduction to Stochastic Actor-Oriented Models (SAOM or Siena)22 An Introduction to Stochastic Actor-Oriented Models (SAOM or Siena)
22 An Introduction to Stochastic Actor-Oriented Models (SAOM or Siena)
 
Recommendation systems
Recommendation systems  Recommendation systems
Recommendation systems
 
Mixed Effects Models - Effect Size
Mixed Effects Models - Effect SizeMixed Effects Models - Effect Size
Mixed Effects Models - Effect Size
 
Recommender Systems - A Review and Recent Research Trends
Recommender Systems  -  A Review and Recent Research TrendsRecommender Systems  -  A Review and Recent Research Trends
Recommender Systems - A Review and Recent Research Trends
 
Slides ecir2016
Slides ecir2016Slides ecir2016
Slides ecir2016
 

Similar to Modeling and Aggregation of Complex Annotations

Machine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional ManagersMachine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional Managers
Albert Y. C. Chen
 
Developing a Tutorial for Grouping Analysis in ArcGIS
Developing a Tutorial for Grouping Analysis in ArcGISDeveloping a Tutorial for Grouping Analysis in ArcGIS
Developing a Tutorial for Grouping Analysis in ArcGIS
COGS Presentations
 
Machine Learning with R
Machine Learning with RMachine Learning with R
Machine Learning with R
Barbara Fusinska
 
Mini_Project
Mini_ProjectMini_Project
Mini_Project
Ashish Yadav
 
Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search
Visual-Textual Joint Relevance Learning for Tag-Based Social Image SearchVisual-Textual Joint Relevance Learning for Tag-Based Social Image Search
Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search
SOYEON KIM
 
Could a Data Science Program use Data Science Insights?
Could a Data Science Program use Data Science Insights?Could a Data Science Program use Data Science Insights?
Could a Data Science Program use Data Science Insights?
Zachary Thomas
 
Graph Analysis of Student Model Networks
Graph Analysis of Student Model NetworksGraph Analysis of Student Model Networks
Graph Analysis of Student Model Networks
mallium
 
Computational Giants_nhom.pptx
Computational Giants_nhom.pptxComputational Giants_nhom.pptx
Computational Giants_nhom.pptx
ThAnhonc
 
DataAnalyticsIntroduction and its ci.pptx
DataAnalyticsIntroduction and its ci.pptxDataAnalyticsIntroduction and its ci.pptx
DataAnalyticsIntroduction and its ci.pptx
PrincePatel272012
 
李俊良/Feature Engineering in Machine Learning
李俊良/Feature Engineering in Machine Learning李俊良/Feature Engineering in Machine Learning
李俊良/Feature Engineering in Machine Learning
台灣資料科學年會
 
0 introduction
0  introduction0  introduction
0 introduction
Dmitry Grapov
 
Machine Learning workshop by GDSC Amity University Chhattisgarh
Machine Learning workshop by GDSC Amity University ChhattisgarhMachine Learning workshop by GDSC Amity University Chhattisgarh
Machine Learning workshop by GDSC Amity University Chhattisgarh
Poorabpatel
 
AL slides.ppt
AL slides.pptAL slides.ppt
AL slides.ppt
ShehnazIslam1
 
How Does Math Matter in Data Science
How Does Math Matter in Data ScienceHow Does Math Matter in Data Science
How Does Math Matter in Data Science
Mutia Ulfi
 
ML_Overview.ppt
ML_Overview.pptML_Overview.ppt
ML_Overview.ppt
ParveshKumar17303
 
ML overview
ML overviewML overview
ML overview
NoopurRathore1
 
ML_Overview.pptx
ML_Overview.pptxML_Overview.pptx
ML_Overview.pptx
ssuserb0b8ed1
 
ML_Overview.ppt
ML_Overview.pptML_Overview.ppt
ML_Overview.ppt
vijay251387
 
Factor Analysis and Correspondence Analysis Composite and Indicator Scores of...
Factor Analysis and Correspondence Analysis Composite and Indicator Scores of...Factor Analysis and Correspondence Analysis Composite and Indicator Scores of...
Factor Analysis and Correspondence Analysis Composite and Indicator Scores of...
Matthew Powers
 
joe beck cald talk.ppt
joe beck cald talk.pptjoe beck cald talk.ppt
joe beck cald talk.ppt
EverMontoya2
 

Similar to Modeling and Aggregation of Complex Annotations (20)

Machine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional ManagersMachine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional Managers
 
Developing a Tutorial for Grouping Analysis in ArcGIS
Developing a Tutorial for Grouping Analysis in ArcGISDeveloping a Tutorial for Grouping Analysis in ArcGIS
Developing a Tutorial for Grouping Analysis in ArcGIS
 
Machine Learning with R
Machine Learning with RMachine Learning with R
Machine Learning with R
 
Mini_Project
Mini_ProjectMini_Project
Mini_Project
 
Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search
Visual-Textual Joint Relevance Learning for Tag-Based Social Image SearchVisual-Textual Joint Relevance Learning for Tag-Based Social Image Search
Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search
 
Could a Data Science Program use Data Science Insights?
Could a Data Science Program use Data Science Insights?Could a Data Science Program use Data Science Insights?
Could a Data Science Program use Data Science Insights?
 
Graph Analysis of Student Model Networks
Graph Analysis of Student Model NetworksGraph Analysis of Student Model Networks
Graph Analysis of Student Model Networks
 
Computational Giants_nhom.pptx
Computational Giants_nhom.pptxComputational Giants_nhom.pptx
Computational Giants_nhom.pptx
 
DataAnalyticsIntroduction and its ci.pptx
DataAnalyticsIntroduction and its ci.pptxDataAnalyticsIntroduction and its ci.pptx
DataAnalyticsIntroduction and its ci.pptx
 
李俊良/Feature Engineering in Machine Learning
李俊良/Feature Engineering in Machine Learning李俊良/Feature Engineering in Machine Learning
李俊良/Feature Engineering in Machine Learning
 
0 introduction
0  introduction0  introduction
0 introduction
 
Machine Learning workshop by GDSC Amity University Chhattisgarh
Machine Learning workshop by GDSC Amity University ChhattisgarhMachine Learning workshop by GDSC Amity University Chhattisgarh
Machine Learning workshop by GDSC Amity University Chhattisgarh
 
AL slides.ppt
AL slides.pptAL slides.ppt
AL slides.ppt
 
How Does Math Matter in Data Science
How Does Math Matter in Data ScienceHow Does Math Matter in Data Science
How Does Math Matter in Data Science
 
ML_Overview.ppt
ML_Overview.pptML_Overview.ppt
ML_Overview.ppt
 
ML overview
ML overviewML overview
ML overview
 
ML_Overview.pptx
ML_Overview.pptxML_Overview.pptx
ML_Overview.pptx
 
ML_Overview.ppt
ML_Overview.pptML_Overview.ppt
ML_Overview.ppt
 
Factor Analysis and Correspondence Analysis Composite and Indicator Scores of...
Factor Analysis and Correspondence Analysis Composite and Indicator Scores of...Factor Analysis and Correspondence Analysis Composite and Indicator Scores of...
Factor Analysis and Correspondence Analysis Composite and Indicator Scores of...
 
joe beck cald talk.ppt
joe beck cald talk.pptjoe beck cald talk.ppt
joe beck cald talk.ppt
 

Recently uploaded

一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
oaxefes
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
taqyea
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 
Jio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdfJio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdf
inaya7568
 
writing report business partner b1+ .pdf
writing report business partner b1+ .pdfwriting report business partner b1+ .pdf
writing report business partner b1+ .pdf
VyNguyen709676
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
ihavuls
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
nyvan3
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
xclpvhuk
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
hqfek
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Kaxil Naik
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
Building a Quantum Computer Neutral Atom.pdf
Building a Quantum Computer Neutral Atom.pdfBuilding a Quantum Computer Neutral Atom.pdf
Building a Quantum Computer Neutral Atom.pdf
cjimenez2581
 
Template xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptxTemplate xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptx
TeukuEriSyahputra
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
asyed10
 

Recently uploaded (20)

一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
Jio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdfJio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdf
 
writing report business partner b1+ .pdf
writing report business partner b1+ .pdfwriting report business partner b1+ .pdf
writing report business partner b1+ .pdf
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
Building a Quantum Computer Neutral Atom.pdf
Building a Quantum Computer Neutral Atom.pdfBuilding a Quantum Computer Neutral Atom.pdf
Building a Quantum Computer Neutral Atom.pdf
 
Template xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptxTemplate xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptx
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
 

Modeling and Aggregation of Complex Annotations

  • 1. Modeling and Aggregation of Complex Annotations via Annotation Distance. Alexander Braylan* and Matthew Lease, The University of Texas at Austin, April 2020. http://ir.ischool.utexas.edu/
  • 2. Simple annotation & aggregation. Classification: sentiment analysis, image categorization. Ordinal rating: product & movie reviews, search relevance. Multiple choice selection: quizzes. Aggregation: crowd-sourcing for quality control; experts for wisdom of crowds. The goal is to select the best label available for each item.
  • 3. What’s the capital of Texas? Answers: Austin, Austin, Houston.
  • 4. What’s the capital of Texas? Answers: Austin, Austin, Houston. Majority vote: Austin.
  • 5. Caption this image: “A cat is eating”, “The cat eats”, “A beautiful picture”.
  • 6. Caption this image: “A cat is eating”, “The cat eats”, “A beautiful picture”. When majority voting falls short. Problem: the label space is large, so exact match doesn’t work!
  • 7. What about complex annotations? Ranked lists; parse trees; image captions (A1: A cat is eating; A2: The cat eats; A3: A beautiful picture); range sequences.
  • 8. Outline: Prior work; Approach; Experiments; Conclusion.
  • 9. Aggregating Simple Labels. Hundreds of papers; multiple benchmarking studies; rich body of Bayesian modeling. General-purpose aggregation models for simple labels don’t support complex labels! Example models: Dawid-Skene, MACE, Hierarchical Dawid-Skene, Item Difficulty, Logistic Random Effects. Source: Paun et al 2018, “Comparing Bayesian models of annotation”.
  • 10. Task-specific models. Pros: task specialization maximizes accuracy. Cons: need a new model for every task; complicated, difficult to formulate. Examples: Nguyen et al 2017 (sequences); Lin, Mausam, and Weld 2012 (math).
  • 11. Task-specific workflows. Pros: empower workers for complex tasks. Cons: need a new workflow for every task; complicated, difficult to formulate. Examples: Noronha et al 2011 (image analysis); Lasecki et al 2012 (transcription).
  • 12. Our goals. We want aggregation for complex data types: build on ideas from simple label aggregation models. We want to generalize across many labeling tasks: can we reduce the problem to a common, simpler state space?
  • 13. Outline: Prior work; Approach; Experiments; Conclusion.
  • 14. Key Insight: partial-credit matching via a task-specific distance function. Encapsulate task-specific label features in a requester-supplied distance function. Model annotation distances rather than the annotations themselves. Distance functions already exist for most tasks, because people need evaluation functions to compare predicted labels against gold.
  • 15. Distance functions. Properties of distance functions: non-negativity, symmetry, triangle inequality.
        Data                    Free text                               Rankings
        Example evaluation fn   BLEU(x, y)                              Kendall’s τ(x, y)
        Example distance fn     1 − (BLEU(x, y) + BLEU(y, x)) / 2       1 − τ(x, y)
        Non-negativity          ✓                                       ✓
        Symmetry                ✓                                       ✓
        Triangle inequality     ✓                                       ✓
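The three properties listed on this slide can be spot-checked numerically for a candidate distance function. The sketch below is only an illustration: the helper names (`kendall_tau`, `tau_distance`, `check_metric_properties`) are hypothetical, Kendall's τ is implemented by hand for rankings without ties, and the check is exhaustive only over the small set of rankings supplied.

```python
from itertools import combinations, permutations

def kendall_tau(x, y):
    """Kendall's tau between two rankings of the same items (no ties)."""
    pos_y = {item: rank for rank, item in enumerate(y)}
    pairs = list(combinations(x, 2))  # each pair (a, b) has a ranked before b in x
    concordant = sum(1 for a, b in pairs if pos_y[a] < pos_y[b])
    discordant = len(pairs) - concordant
    return (concordant - discordant) / len(pairs)

def tau_distance(x, y):
    """Distance derived from the evaluation function: 1 - tau, in [0, 2]."""
    return 1.0 - kendall_tau(x, y)

def check_metric_properties(dist, items, tol=1e-9):
    """Brute-force check of the three properties over every pair and triple in `items`."""
    nonneg = all(dist(a, b) >= -tol for a in items for b in items)
    symmetric = all(abs(dist(a, b) - dist(b, a)) <= tol
                    for a in items for b in items)
    triangle = all(dist(a, c) <= dist(a, b) + dist(b, c) + tol
                   for a in items for b in items for c in items)
    return {"non_negativity": nonneg, "symmetry": symmetric,
            "triangle_inequality": triangle}

# All 24 rankings of four hypothetical items.
rankings = [list(p) for p in permutations("abcd")]
report = check_metric_properties(tau_distance, rankings)
print(report)
```

A check like this cannot prove a property in general, but it is a cheap way to catch an evaluation function that is not symmetric before using it as a distance.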
  • 16. Calculate distances. Example task: text annotation. Example distance function: string edit distance. Annotations: “a cat is eating”, “cat is eating”, “a beautiful picture”, “the cat eats”.
  • 17. Calculate distances. Example task: text annotation. Example distance function: string edit distance. Annotations: “a cat is eating”, “cat is eating”, “a beautiful picture”, “the cat eats”. Pairwise distances shown: 0.05, 0.1, 0.1.
  • 18. Calculate distances. Example task: text annotation. Example distance function: string edit distance. Annotations: “a cat is eating”, “cat is eating”, “a beautiful picture”, “the cat eats”. Pairwise distances shown: 0.05, 0.1, 0.1, 0.8, 0.82, 0.82.
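As a rough illustration of this step, the sketch below builds a pairwise distance matrix for the four example captions. It assumes character-level Levenshtein distance divided by the longer string's length; the slides do not specify the normalization, so the resulting numbers will not match the illustrative values shown on the slide. All function names are hypothetical.

```python
def levenshtein(a, b):
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def normalized_edit_distance(a, b):
    """Edit distance scaled to [0, 1] by the longer string (an assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0

def distance_matrix(annotations, dist):
    """Symmetric matrix of distances between all annotations of one item."""
    n = len(annotations)
    return [[dist(annotations[i], annotations[j]) for j in range(n)]
            for i in range(n)]

captions = ["a cat is eating", "cat is eating",
            "a beautiful picture", "the cat eats"]
D = distance_matrix(captions, normalized_edit_distance)
for row in D:
    print(["%.2f" % d for d in row])
```

Any other task-specific distance function could be passed in place of `normalized_edit_distance`; everything downstream only needs the matrix.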
  • 19. All tasks reduce to matrices of annotation distances. Example: A1: A cat is eating; A2: The cat eats; A3: A beautiful picture; distances shown: 0.1, 0.6, 0.3.
  • 20. How to aggregate given distances: local selection model; global selection model; combined. (Figure legend: current item, other items.)
  • 21. Local approach: Smallest Average Distance. For each item: 1. Compute each annotation’s average distance to the other annotations for that item. 2. Choose the annotation with the smallest average distance. This generalizes majority vote and treats items independently. The local approach does not model annotator reliability. (Figure legend: current item, other items.)
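The two steps of the local approach can be sketched in a few lines. The function name `smallest_average_distance` and the 0/1 exact-match distance are illustrative choices rather than the authors' code; with exact match as the distance, the selection coincides with majority vote on the Texas example from the earlier slides.

```python
def smallest_average_distance(D):
    """Index of the annotation with the smallest mean distance to the
    other annotations of the same item, given that item's distance matrix D."""
    n = len(D)
    if n == 1:
        return 0
    averages = [sum(D[u][v] for v in range(n) if v != u) / (n - 1)
                for u in range(n)]
    return min(range(n), key=averages.__getitem__)

def exact_match_distance(a, b):
    """0/1 distance: under this choice the method reduces to majority vote."""
    return 0.0 if a == b else 1.0

labels = ["Austin", "Austin", "Houston"]
D = [[exact_match_distance(a, b) for b in labels] for a in labels]
winner = labels[smallest_average_distance(D)]
print(winner)
```

Ties are broken by position here (the first annotation with the minimal average wins), which is an arbitrary choice for the sketch.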
  • 22. Global approach: Best Available User. For each annotator: score by average distance over the full dataset. For each item: choose the label from the best-scoring annotator. Annotator reliability is fixed. The global approach does not model how well annotators did on specific items. (Figure legend: current item, other items.)
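A minimal sketch of the global approach follows. It assumes annotations are stored as a hypothetical nested dictionary `{item: {annotator: label}}` and that an annotator's score is the mean distance between their labels and co-annotators' labels over the whole dataset; the toy numeric labels and absolute-difference distance are made up for illustration.

```python
from collections import defaultdict

def best_available_user(annotations, dist):
    """annotations: {item: {annotator: label}}.
    Returns (chosen label per item, average-distance score per annotator)."""
    totals, counts = defaultdict(float), defaultdict(int)
    # Score each annotator by average distance to co-annotators over all items.
    for labels in annotations.values():
        for u in labels:
            for v in labels:
                if u != v:
                    totals[u] += dist(labels[u], labels[v])
                    counts[u] += 1
    scores = {u: totals[u] / counts[u] for u in counts}
    # For each item, take the label from the lowest-scoring annotator who labeled it.
    chosen = {}
    for item, labels in annotations.items():
        best = min(labels, key=lambda u: scores.get(u, float("inf")))
        chosen[item] = labels[best]
    return chosen, scores

# Toy data: numeric labels, absolute difference as the distance.
annotations = {
    "item1": {"A": 1.0, "B": 1.1, "C": 3.0},
    "item2": {"A": 2.0, "B": 2.2, "C": 0.0},
    "item3": {"B": 5.0, "C": 9.0},  # annotator A did not label this item
}
chosen, scores = best_available_user(annotations, lambda a, b: abs(a - b))
print(chosen, scores)
```

Because each annotator gets a single dataset-wide score, the choice on `item3` falls back to the next-best annotator who actually labeled it.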
  • 23. Can we get the best of both worlds? We want a method that combines Best Available User (global) and Smallest Average Distance (local). It should build on the rich history of work on Bayesian annotation modeling. We need a principled framework for modeling annotation distance matrices. (Diagram labels: weights, votes, weighted voting.)
  • 24. Multidimensional Annotation Scaling (MAS). Based on Multidimensional Scaling (Kruskal & Wish 1978). Probabilistic model of multi-item distance matrices. “Hierarchical Bayesian”: additional learned parameters represent crowd effects such as worker reliability. (Figure labels: A cat is eating; The cat eats; A beautiful picture.)
  • 25. MAS Objective 1: Likelihood. Multidimensional Scaling objective: $D_{iuv} \sim \mathcal{N}(\lVert \varepsilon_{iu} - \varepsilon_{iv} \rVert, \sigma)$, where $D_{iuv}$ is the observed distance, $\varepsilon_{iu}$ is the annotation embedding, and $\sigma$ is the error scale. Example annotations: “a cat is eating”, “cat is eating”, “a beautiful picture”, “the cat eats”; pairwise distances shown: 0.05, 0.1, 0.1, 0.8, 0.82, 0.82.
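To make the likelihood concrete, the sketch below evaluates the Gaussian log-likelihood of one item's observed distance matrix given candidate embeddings. It assumes the error scale σ is a standard deviation and that each unordered pair is counted once; the function names and the two candidate embeddings are hypothetical, and no inference is performed here.

```python
import math

def gaussian_logpdf(x, mean, sigma):
    """Log density of Normal(mean, sigma) at x, with sigma as standard deviation."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mean) ** 2 / (2 * sigma ** 2)

def mds_log_likelihood(D, embeddings, sigma):
    """Sum over annotation pairs (u < v) of log N(D[u][v]; ||e_u - e_v||, sigma)."""
    total = 0.0
    n = len(D)
    for u in range(n):
        for v in range(u + 1, n):
            predicted = math.dist(embeddings[u], embeddings[v])  # Euclidean norm
            total += gaussian_logpdf(D[u][v], predicted, sigma)
    return total

# Observed distances for three annotations of one item (illustrative values).
D = [[0.0, 0.1, 0.8],
     [0.1, 0.0, 0.82],
     [0.8, 0.82, 0.0]]
good = [(0.0, 0.0), (0.1, 0.0), (0.8, 0.05)]  # roughly reproduces D
bad = [(0.0, 0.0), (0.8, 0.0), (0.1, 0.0)]    # places the outlier next to the first caption
print(mds_log_likelihood(D, good, 0.1), mds_log_likelihood(D, bad, 0.1))
```

Embeddings whose pairwise norms reproduce the observed distances score higher, which is what fitting the model would exploit.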
  • 27. MAS Objective 2: Prior. Annotation prior probability objective: $\varepsilon_{iu} = \tilde{\varepsilon}_{iu} / \lvert \tilde{\varepsilon}_{iu} \rvert$, $\tilde{\varepsilon}_{iu} \sim \text{Normal}$. (Figure labels: “a cat is eating”, “cat is eating”, “a beautiful picture”, “the cat eats”, pseudo-gold.)
  • 28. MAS Objective 2: Prior. Annotation prior probability objective: $\varepsilon_{iu} = \tilde{\varepsilon}_{iu} / \lvert \tilde{\varepsilon}_{iu} \rvert$, $\tilde{\varepsilon}_{iu} \sim \mathcal{N}(0, \gamma_u \delta_i)$, where $\tilde{\varepsilon}$ is the unnormalized embedding, $\gamma_u$ is annotator error, and $\delta_i$ is item difficulty. (Figure labels: “a cat is eating”, “cat is eating”, “a beautiful picture”, “the cat eats”.)
  • 30. MAS Objective 2: Prior. Annotation prior probability objective: $\varepsilon_{iu} = \tilde{\varepsilon}_{iu} / \lvert \tilde{\varepsilon}_{iu} \rvert$, $\tilde{\varepsilon}_{iu} \sim \mathcal{N}(0, \gamma_u \delta_i)$, where $\tilde{\varepsilon}$ is the unnormalized embedding, $\gamma_u$ is annotator error, and $\delta_i$ is item difficulty. (Figure: example annotators with γ = 0.1, 0.4, 0.5, 0.7.)
  • 32. Outline: Prior work; Approach; Experiments; Conclusion.
  • 33. Tasks & datasets. Synthetic datasets: syntactic parse trees (distance function: evalb); ranked lists (distance function: Kendall’s tau). Real datasets: biomedical text sequences (distance function: span F1; Nguyen et al 2017); Urdu-English translations (distance function: GLEU; Zaidan and Callison-Burch 2011).
  • 34. Methods. Baselines: Random User (RU): pick one label randomly. ZenCrowd (ZC) (Demartini et al 2012): weighted voting based on exact match (rare!). Crowd Hidden Markov Model (CHMM) (Nguyen et al 2017): sequence annotation task only. Upper bound: Oracle (OR), which always picks the best label; even if 5 workers answer, it is limited by the best answer any of them gave.
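The Random User baseline and the Oracle upper bound are simple enough to sketch directly. The example below is an assumption-laden illustration: the gold caption is made up, token-level Jaccard overlap stands in for a real evaluation function such as GLEU, and the function names are hypothetical.

```python
import random

def random_user(labels, rng):
    """Random User (RU): pick one of the item's labels uniformly at random."""
    return rng.choice(list(labels.values()))

def oracle(labels, gold, score):
    """Oracle (OR): pick the label that scores best against gold.
    It can only return a label that some worker actually gave."""
    return max(labels.values(), key=lambda label: score(label, gold))

def jaccard(a, b):
    """Token-overlap score in [0, 1]; a stand-in for a real evaluation function."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)

labels = {"w1": "a cat is eating", "w2": "the cat eats", "w3": "a beautiful picture"}
gold = "a cat eats"  # hypothetical gold caption, for illustration only
best = oracle(labels, gold, jaccard)
guess = random_user(labels, random.Random(0))
print(best, "|", guess)
```

The oracle needs gold labels, so it serves only as a ceiling for evaluation and not as a usable aggregation method.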
  • 35. Results
        Task           Metric       RU      ZC      CHMM    MAS     Oracle
        Translations   GLEU         0.185   0.188   -       0.217   0.246
        Sequences      F1           0.561   0.569   0.702   0.709   0.827
        Parses         EVALB        0.812   0.819   -       0.932   0.939
        Rankings       Kendall τ    0.491   0.495   -       0.710   0.724
        Diverse complex label datasets. MAS comes closest to the Oracle upper bound on every task, with no model alteration between datasets.
  • 36. Conclusion. Goal: a general-purpose probabilistic model to aggregate complex annotations; categorical-based methods are insufficient, and custom models are difficult to design for new annotation types. Solution: model annotation distances via task-specific distance functions, which transforms the problem into a general-purpose variable space. Multidimensional Annotation Scaling (MAS) allows unsupervised weighted voting with inferred annotator reliability. Not covered in talk (see paper): semi-supervised learning; partial credit.
  • 37. Current & Future work. Big picture: what is needed to support complex crowd-sourcing? Integration with workflow design and other quality-control mechanisms. Dynamic (online) collection: measuring the value of getting another label. Merging annotations rather than selecting the best one (e.g. guessing the weight of an ox). Learning difficult tasks over time.
  • 38. THANK YOU! We thank the crowd workers for the data they contributed for this research study! Code available at https://github.com/Praznat/annotationmodeling
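  • 40. MAS Objective 2: Prior. Hierarchical priors: $\log(\gamma_u) \sim \mathcal{N}(\log(\bar{\gamma}), \phi)$ and $\log(\delta_i) \sim \mathcal{N}(\log(\bar{\delta}), \psi)$, where $\bar{\gamma}$ is the annotator error location, $\bar{\delta}$ is the item difficulty location, $\phi$ is the annotator error scale, and $\psi$ is the item difficulty scale. (Figure: annotator errors γ1 to γ4 spread around the location parameter with scale φ.)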
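The hierarchical prior says each annotator's error is log-normally distributed around a shared location. The sketch below draws annotator errors from such a prior, assuming the scale φ is the standard deviation of the log; the function name and the numeric values (location 0.4, scale 0.5) are illustrative assumptions, not values from the talk.

```python
import math
import random
import statistics

def sample_annotator_errors(n, location, scale, rng):
    """Draw gamma_u with log(gamma_u) ~ Normal(log(location), scale),
    treating `scale` as the standard deviation of the log (an assumption)."""
    return [math.exp(rng.gauss(math.log(location), scale)) for _ in range(n)]

rng = random.Random(0)
gammas = sample_annotator_errors(1000, location=0.4, scale=0.5, rng=rng)

# Every draw is positive, and the location parameter is the median of the draws.
print(min(gammas) > 0, round(statistics.median(gammas), 2))
```

Working on the log scale keeps every sampled error positive, and a smaller scale pulls individual annotators closer to the shared location.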
  • 41. Experiments: score-all results. Traditionally, the goal of annotation aggregation is to determine a single ground truth per item. With complex annotations there could be several acceptable answers. An alternate goal is to score each annotation by expected quality.
  • 42. Insight: semi-supervised learning. Noisy parser experiment: more workers whose annotations deviate substantially from gold. Semi-supervised learning allows inferred worker reliability to be rearranged according to similarity to known gold.
  • 43. Why is it hard to design bespoke models? Simple label generative model: $P(L_{ui} \mid g_i, \theta_u, \theta_i, \dots)$, where the label (observed) and the gold (unobserved) are categorical and the latent parameters (unobserved) are scalars etc. Complex label generative model: $P(L_{ui} \mid g_i, \theta_u, \theta_i, \dots)$, where the label (observed) and the gold (unobserved) are complex data types and the latent parameters (unobserved) are scalars etc.