SlideShare a Scribd company logo
Matt Lease
Associate Professor
School of Information
The University of Texas at Austin
Amazon Scholar
Human-in-the-loop Services
Amazon Web Services (AWS)
Automated Models for Quantifying
Centrality of Survey Responses
1
Lab: ir.ischool.utexas.edu
@mattlease
Slides: slideshare.net/mattlease
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark.
Human-in-the-loop Services
• 3 Team Products: Mechanical Turk,
Sagemaker Ground Truth, and Augmented AI (A2I)
• https://www.amazon.science/research-awards
– Cash and/or AWS credits
• Summer, sabbatical, or longer engagements
– https://www.amazon.science/scholars
– https://www.amazon.science/visiting-academics
• https://www.amazon.science/tag/internships
HTTPS://WWW.HUMANCOMPUTATION.COM
3
What’s the capital of Texas?
Austin
Austin
Houston
4
What’s the capital of Texas?
Austin
Austin
Houston
Majority Vote
5
Simple annotation & aggregation
Classification
• sentiment analysis
• image categorization
Ordinal rating
• product & movie reviews
• search relevance
Aggregation
• Crowdsourcing: quality control
• Experts: wisdom of crowds
• Goal: select best label available
for each item (no label fusion)
6
Caption this image:
7
A cat is
eating
The cat
eats
A beautiful
picture
Caption this image:
When majority voting falls short
Problem: large label space, exact match doesn’t work!
8
A cat is
eating
The cat
eats
A beautiful
picture
What about complex annotations?
Ranked lists
Parse trees
A1: A cat is eating
A2: The cat eats
A3: A beautiful picture
Image captions
Range sequences
9
10
Alexander Braylan1 and Matthew Lease2
1
Dept. of Computer Science & 2
School of Information
The University of Texas at Austin
Modeling and Aggregation of Complex
Annotations via Annotation Distance
Code & Data: https://github.com/Praznat/annotationmodeling
https://github.com/Praznat/annotationmodeling
Roadmap
• Prior work
• Approach
• Example outputs
• Conclusion
11
https://github.com/Praznat/annotationmodeling
Aggregating Simple Labels
• Hundreds of papers
• Multiple benchmarking studies
• Rich body of Bayesian modeling
• General-purpose aggregation
models for simple labels don’t
support complex labels
Dawid-Skene MACE
Hierarchical Dawid-Skene
Item Difficulty
Logistic Random Effects
Source:
Paun et al 2018
“Comparing bayesian
models of annotation”
12
https://github.com/Praznat/annotationmodeling
Task-specific models
• Pros:
– Task specialization
maximizes accuracy
• Cons:
– Need new model for
every task
– Complicated, difficult
to formulate
Nguyen et al 2017 (Sequences)
Lin, Mausam, and Weld 2012 (Math)
13
https://github.com/Praznat/annotationmodeling
Our goals
• We want aggregation for complex data types
– Build on ideas from simple label aggregation models
• We want to generalize across many labeling tasks
– Can we reduce problem to common simpler state space?
14
https://github.com/Praznat/annotationmodeling
Roadmap
• Prior work
• Approach
• Example outputs
• Conclusion
15
https://github.com/Praznat/annotationmodeling
Key Insight
Partial credit matching via task-specific distance function
• Adopt or define a distance function for each annotation task
• Model annotation distances uniformly across tasks
• Distance functions already exist for many task types
– Free-text responses, e.g., survey questions
16
https://github.com/Praznat/annotationmodeling
Calculate distances
“a cat is eating” “cat is eating”
“a beautiful picture” “the cat eats”
17
• Example task: free text answer
• Example distance function:
string edit distance
https://github.com/Praznat/annotationmodeling
Calculate distances
“a cat is eating” “cat is eating”
“a beautiful picture” “the cat eats”
0.05
0.1
0.1
18
• Example task: free text answer
• Example distance function:
string edit distance
https://github.com/Praznat/annotationmodeling
Calculate distances
“a cat is eating” “cat is eating”
“a beautiful picture” “the cat eats”
0.8
0.82
0.05
0.1
0.1
19
0.82
• Example task: free text answer
• Example distance function:
string edit distance
https://github.com/Praznat/annotationmodeling
Example Distance: Levenshtein
20
https://github.com/Praznat/annotationmodeling
Example Distance: Word embeddings
21
https://github.com/Praznat/annotationmodeling
Distance function properties
22
Properties of distance functions
Non-negativity
Symmetry
Triangle inequality
Data Free Text Rankings
Example
evaluation fn
BLEU(x, y)
Example
distance fn
Non-negativity ✓ ✓
Symmetry ✓ ✓
Triangle
inequality
✓ ✓
https://github.com/Praznat/annotationmodeling
Calculate distances
“a cat is eating” “cat is eating”
“a beautiful picture” “the cat eats”
0.8
0.82
0.05
0.1
0.1
23
0.82
https://github.com/Praznat/annotationmodeling
A1: A cat is eating
A2: The cat eats
A3: A beautiful
picture
0.1 0.6
0.3
24
All tasks reduce to
matrices of distances
https://github.com/Praznat/annotationmodeling
How to aggregate given distances
• Local selection model
• Global selection model
• Combined
25
Current item
Other items
https://github.com/Praznat/annotationmodeling
Local approach: Smallest Avg Distance (SAD)
• For each question: compute average
distance between responses
• The response with smallest average
distance is locally most normative,
generalizing majority vote
• Independence between items
• Local approach does not model
respondent agreement
26
Current item
Other items
https://github.com/Praznat/annotationmodeling
Global approach: Best Available User (BAU)
• Score each participant by their
average distance to all other
participants across all questions
• The participant with lowest score is
globally most normative; treat their
response as most normative
• Global approach ignores distance
observed on the current item
27
Current item
Other items
https://github.com/Praznat/annotationmodeling
Can we get best of both worlds?
• Want a method that combines:
– Best available user (global)
– Smallest avg distance (local)
• Should build on rich history of work on Bayesian annotation modeling
• Need a principled framework for modeling annotation distance matrices
weights
votes weighted voting
28
https://github.com/Praznat/annotationmodeling
Multidimensional Annotation Scaling (MAS)
• Based on Multidimensional
Scaling (Kruskal & Wish 1978)
• Probabilistic model of multi-
item distance matrices
• “Hierarchical Bayesian”
– Additional learned parameters
represent crowd effects such as
worker reliability
A cat is
eating
The cat
eats
A beautiful
picture
29
https://github.com/Praznat/annotationmodeling
MAS Objective 1: Likelihood
Multidimensional Scaling
objective:
Diuv ∼ N(∥εiu−εiv∥, σ)
• Diuv : observed distance
• εiu : annotation embedding
• σ : error scale
“a cat is eating” “cat is eating”
“a beautiful picture” “the cat eats”
0.8
0.82
0.05
0.1
0.1
0.82
30
https://github.com/Praznat/annotationmodeling
MAS Objective 1: Likelihood
Multidimensional Scaling
objective:
Diuv ∼ N(∥εiu−εiv∥, σ)
• Diuv : observed distance
• εiu : annotation embedding
• σ : error scale
“a cat is eating”
“cat is eating”
“a beautiful picture”
“the cat eats”
0.8
0.82
0.05
0.1
0.1
0.82
31
https://github.com/Praznat/annotationmodeling
MAS Objective 2: Prior
“a cat is eating”
“cat is eating”
“a beautiful picture”
“the cat eats”
Pseudo-gold
32
https://github.com/Praznat/annotationmodeling
MAS Objective 2: Prior
“a cat is eating”
“cat is eating”
“a beautiful picture”
“the cat eats”
33
https://github.com/Praznat/annotationmodeling
MAS Objective 2: Prior
“a cat is eating”
“cat is eating”
“a beautiful picture”
“the cat eats”
34
https://github.com/Praznat/annotationmodeling
MAS Objective 2: Prior
35
https://github.com/Praznat/annotationmodeling
MAS Objective 2: Prior
36
https://github.com/Praznat/annotationmodelingç
Roadmap
• Prior work
• Approach
• Example outputs
• Conclusion
37
https://github.com/Praznat/annotationmodeling
Example Output: father
38
Response SAD MAS
He always speaks ill about his father behind back. 0.78 0.16
He always speaks ill of his father behind his back. 0.71 0.30
He always talks about his father behind his back. 0.74 0.50
He always speaks ill of his father 0.78 0.55
He always speak ill of his father. 0.79 0.62
He is always talking about his father behind his back. 0.82 0.63
He always says behind his father. 0.90 0.72
He always talks about his dad behind his back. 0.83 0.73
https://github.com/Praznat/annotationmodelingç
Example Output: she says
39
Response SAD MAS
Please be sure to take a note of what she says. 0.77 0.16
Please take a note of what she says. 0.84 0.30
Be sure to take a warning notice what she says. 0.86 0.46
Please be sure to take notes what she says. 0.81 0.48
Please take a note what she say. 0.92 0.73
Please be sure to take instructions for her saying. 0.93 0.76
Make sure to insert disclaimer about what she says. 0.93 0.80
Please make a memo whatever she says. 0.99 0.82
https://github.com/Praznat/annotationmodelingç
Example Output: quiet
40
Response SAD MAS
As long as you keep quiet you may stay here 0.83 0.26
You can stay here as long as you keep quiet. 0.86 0.39
You may stay here if you keep quiet. 0.81 0.39
You can stay here if you keep quiet. 0.82 0.57
So long as you remain quiet you may stay here. 0.92 0.57
If it is quiet you may stay here 0.90 0.70
If you keep quiet you can stay here. 0.92 0.81
You may be here if you keep quiet. 0.91 0.84
https://github.com/Praznat/annotationmodelingç
Example Output: go ahead
41
Response SAD MAS
Please go ahead if i am late. 0.83 0.16
Please go ahead if I'm late. 0.79 0.28
Please go ahead if I delayed. 0.82 0.51
Please go without me if I'm late. 0.91 0.62
Please go ahead if I get late 0.83 0.67
Please go ahead and leave if I'm late. 0.88 0.74
If I am late you can go in first. 1.00 0.79
If I should be late go without me. 1.00 0.81
https://github.com/Praznat/annotationmodelingç
Example Output: married
42
Response SAD MAS
Actually they are not married 0.91 0.18
To tell the truth they are not couple 0.79 0.47
To tell the truth they are not a married couple 0.84 0.62
To tell the truth they're not married 0.89 0.63
In fact they are not couple 0.94 0.69
to telling the truth we're not married 0.97 0.71
Two people are not couples in truth 1.00 0.79
https://github.com/Praznat/annotationmodelingç
Roadmap
• Prior work
• Approach
• Example outputs
• Conclusion
43
https://github.com/Praznat/annotationmodeling
Conclusion
• Probabilistic model identifies normative vs. outlier
responses by quantifying distance between responses
• Many choices for measuring distance between two
texts (e.g., character-based or more semantic NLP)
• 3 models: local (SAD), global (BAU), or combo (MAS)
• Open source: github.com/Praznat/annotationmodeling
44
A1: A cat is eating
A2: The cat eats
A3: A beautiful picture
https://github.com/Praznat/annotationmodeling
Future work
45
A1: A cat is eating
A2: The cat eats
A3: A beautiful picture
• From objective labeling tasks to subjective responses
• Evaluation on survey data
– Collaboration with behavioral science researchers?
– Compare distance functions and model settings for utility
• Automatic detection of consistent biases in a
participant’s responses vs. what’s group normative
https://github.com/Praznat/annotationmodeling
46
Matt Lease (University of Texas at Austin)
Lab: ir.ischool.utexas.edu
@mattlease
Slides: slideshare.net/mattlease
We thank our many talented crowd workers
for their contributions to our research!
https://github.com/Praznat/annotationmodeling
Alexander Braylan and Matthew Lease. Aggregating Complex Annotations via Merging and Matching.
In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data
Mining, pages 86--94, 2021. [ bib | pdf | data | sourcecode | video | slides | tech-report ]
Alexander Braylan and Matthew Lease. Modeling and Aggregation of Complex Annotations via
Annotation Distances. In Proceedings of the Web Conference, pages 1807--1818, 2020.
[ bib | pdf | data | sourcecode | video | slides ]
Bonus Material
47
MTurk: The Early Days
48
• Artificial Intelligence, With Help From the Humans.
– J. Pontin. NY Times, March 25, 2007
• Is Amazon's Mechanical Turk a Failure? April 9, 2007
– “As of this writing, there are [only] 128 HITs available on Mechanical Turk.”
• Su et al., WWW 2007: “a web-based human data collection system… ‘System M’ ”
2008: the ”Gold” Rush Begins
Braylan and Lease 49
Snow et al, EMNLP (Natural Language Processing)
• Annotating human language for natural language processing (NLP)
• 22,000 labels for only $26 USD
• Crowd’s consensus labels can replace traditional expert labels
“Discovery” sparks rush for “gold” data across areas
• Alonso et al., SIGIR Forum (Information Retrieval)
• Kittur et al., CHI (Human-Computer Interaction)
• Sorokin and Forsythe, CVPR (Computer Vision)
2010-11: Social & Behavioral Sciences
50
• A Guide to Behavioral Experiments on Mechanical Turk
– W. Mason and S. Suri (2010). SSRN online.
• Crowdsourcing for Human Subjects Research
– L. Schmidt (CrowdConf 2010)
• Crowdsourcing Content Analysis for Behavioral Research: Insights from Mechanical Turk
– Conley & Tosti-Kharas (2010). Academy of Management
• Amazon's Mechanical Turk : A New Source of Inexpensive, Yet High-Quality, Data?
– M. Buhrmester et al. (2011). Perspectives… 6(1):3-5.
– see also: Amazon Mechanical Turk Guide for Social Scientists
The Future of Crowd Work (ACM CSCW’13)
by Kittur, Nickerson, Bernstein, Gerber,
Shaw, Zimmerman, Lease, and Horton
51
Braylan and Lease 52
Example Output
53
https://github.com/Praznat/annotationmodeling
Braylan and Lease 54
Tasks & datasets
SYNTHETIC DATASETS
• Syntactic parse trees
– Distance function: evalb
• Ranked lists
– Distance function: Kendall’s tau
REAL DATASETS
• Biomedical text sequences
– Distance function: Span F1
• Urdu-English translations
– Distance function: GLEU
55
Nguyen et al 2017
Zaidan and Callison-Burch 2011
Methods
Baselines:
• Random User (RU): pick one label randomly
• ZenCrowd (ZC) (Demartini et al. 2012)
– Weighted voting based on exact match (rare!)
• Crowd Hidden Markov Model (CHMM) (Nguyen et al. 2017)
– Sequence annotation task only
Upper bound: Oracle (OR) (always picks best label)
• Even if 5 workers answer, limited by best answer any of them gave
56
Results
Task Metric RU ZC CHMM MAS Oracle
Translations GLEU 0.185 0.246
Sequences F1 0.561 0.827
Parses EVALB 0.812 0.939
Rankings 0.491 0.724
57
• Diverse complex label datasets
Results
Task Metric RU ZC CHMM MAS Oracle
Translations GLEU 0.185 0.188 0.246
Sequences F1 0.561 0.569 0.827
Parses EVALB 0.812 0.819 0.939
Rankings 0.491 0.495 0.724
58
• Diverse complex label datasets
Results
Task Metric RU ZC CHMM MAS Oracle
Translations GLEU 0.185 0.188 - 0.246
Sequences F1 0.561 0.569 0.702 0.827
Parses EVALB 0.812 0.819 - 0.939
Rankings 0.491 0.495 - 0.724
59
• Diverse complex label datasets
Results
Task Metric RU ZC CHMM MAS Oracle
Translations GLEU 0.185 0.188 - 0.217 0.246
Sequences F1 0.561 0.569 0.702 0.709 0.827
Parses EVALB 0.812 0.819 - 0.932 0.939
Rankings 0.491 0.495 - 0.710 0.724
60
• Diverse complex label datasets
• MAS aggregation is best way to get closer to ground truth with no
model alteration between datasets
Braylan and Lease 61
62
Goal: Design a future of Artificial Intelligence (AI)
technologies to meet society’s needs and values.
.
http://goodsystems.utexas.edu
Good Systems: an 8-year, $10M
UT Austin Grand Challenge
“The place where people & technology meet”
~ Wobbrock et al., 2009
“iSchools” now exist at over 100 universities around the world
63
What’s an Information School?
Task-specific workflows
• Pros:
– Empower workers
for complex tasks
• Cons:
– Need new workflow
for every task
– Complicated, difficult
to formulate
Noronha et al 2011
(image analysis)
Lasecki et al 2012
(transcription)
64

More Related Content

Similar to Automated Models for Quantifying Centrality of Survey Responses

Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"
Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"
Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"
Fwdays
 
It Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
It Does What You Say, Not What You Mean: Lessons From A Decade of Program RepairIt Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
It Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
Claire Le Goues
 
DSLs Internas e Ruby
DSLs Internas e RubyDSLs Internas e Ruby
DSLs Internas e Ruby
Fabio Kung
 
Interactive Questions and Answers - London Information Retrieval Meetup
Interactive Questions and Answers - London Information Retrieval MeetupInteractive Questions and Answers - London Information Retrieval Meetup
Interactive Questions and Answers - London Information Retrieval Meetup
Sease
 
Waking the Data Scientist at 2am: Detect Model Degradation on Production Mod...
Waking the Data Scientist at 2am:  Detect Model Degradation on Production Mod...Waking the Data Scientist at 2am:  Detect Model Degradation on Production Mod...
Waking the Data Scientist at 2am: Detect Model Degradation on Production Mod...
Chris Fregly
 
Measuring Coverage From E2E Tests
Measuring Coverage From E2E TestsMeasuring Coverage From E2E Tests
Measuring Coverage From E2E Tests
Anand Bagmar
 
Release management with NuGet/Chocolatey/JIRA
Release management with NuGet/Chocolatey/JIRARelease management with NuGet/Chocolatey/JIRA
Release management with NuGet/Chocolatey/JIRA
Yaroslav Serhieiev
 
KnittingBoar Toronto Hadoop User Group Nov 27 2012
KnittingBoar Toronto Hadoop User Group Nov 27 2012KnittingBoar Toronto Hadoop User Group Nov 27 2012
KnittingBoar Toronto Hadoop User Group Nov 27 2012
Adam Muise
 
How to not blow up spaceships
How to not blow up spaceshipsHow to not blow up spaceships
How to not blow up spaceships
Sabin Marcu
 
Contributing to Akka (Hacktoberfest 2020)
Contributing to Akka (Hacktoberfest 2020)Contributing to Akka (Hacktoberfest 2020)
Contributing to Akka (Hacktoberfest 2020)
Ignasi Marimon-Clos i Sunyol
 
ADT02 - Java 8 Lambdas and the Streaming API
ADT02 - Java 8 Lambdas and the Streaming APIADT02 - Java 8 Lambdas and the Streaming API
ADT02 - Java 8 Lambdas and the Streaming API
Michael Remijan
 
Distributed GLM with H2O - Atlanta Meetup
Distributed GLM with H2O - Atlanta MeetupDistributed GLM with H2O - Atlanta Meetup
Distributed GLM with H2O - Atlanta Meetup
Sri Ambati
 
Machine Learning with ML.NET and Azure - Andy Cross
Machine Learning with ML.NET and Azure - Andy CrossMachine Learning with ML.NET and Azure - Andy Cross
Machine Learning with ML.NET and Azure - Andy Cross
Andrew Flatters
 
デザインシステムの海で3年間もがいてみて
デザインシステムの海で3年間もがいてみてデザインシステムの海で3年間もがいてみて
デザインシステムの海で3年間もがいてみて
Yahoo!デベロッパーネットワーク
 
The Why and What of Pattern Lab
The Why and What of Pattern LabThe Why and What of Pattern Lab
The Why and What of Pattern Lab
Dave Olsen
 
Chris OBrien - Azure DevOps for managing work
Chris OBrien - Azure DevOps for managing workChris OBrien - Azure DevOps for managing work
Chris OBrien - Azure DevOps for managing work
Chris O'Brien
 
NLP Project Full Circle
NLP Project Full CircleNLP Project Full Circle
NLP Project Full Circle
Vsevolod Dyomkin
 
Oop principles
Oop principlesOop principles
Oop principles
Md. Mahedee Hasan
 
Training Neural Networks
Training Neural NetworksTraining Neural Networks
Training Neural Networks
Databricks
 
Knitting boar - Toronto and Boston HUGs - Nov 2012
Knitting boar - Toronto and Boston HUGs - Nov 2012Knitting boar - Toronto and Boston HUGs - Nov 2012
Knitting boar - Toronto and Boston HUGs - Nov 2012
Josh Patterson
 

Similar to Automated Models for Quantifying Centrality of Survey Responses (20)

Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"
Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"
Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"
 
It Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
It Does What You Say, Not What You Mean: Lessons From A Decade of Program RepairIt Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
It Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
 
DSLs Internas e Ruby
DSLs Internas e RubyDSLs Internas e Ruby
DSLs Internas e Ruby
 
Interactive Questions and Answers - London Information Retrieval Meetup
Interactive Questions and Answers - London Information Retrieval MeetupInteractive Questions and Answers - London Information Retrieval Meetup
Interactive Questions and Answers - London Information Retrieval Meetup
 
Waking the Data Scientist at 2am: Detect Model Degradation on Production Mod...
Waking the Data Scientist at 2am:  Detect Model Degradation on Production Mod...Waking the Data Scientist at 2am:  Detect Model Degradation on Production Mod...
Waking the Data Scientist at 2am: Detect Model Degradation on Production Mod...
 
Measuring Coverage From E2E Tests
Measuring Coverage From E2E TestsMeasuring Coverage From E2E Tests
Measuring Coverage From E2E Tests
 
Release management with NuGet/Chocolatey/JIRA
Release management with NuGet/Chocolatey/JIRARelease management with NuGet/Chocolatey/JIRA
Release management with NuGet/Chocolatey/JIRA
 
KnittingBoar Toronto Hadoop User Group Nov 27 2012
KnittingBoar Toronto Hadoop User Group Nov 27 2012KnittingBoar Toronto Hadoop User Group Nov 27 2012
KnittingBoar Toronto Hadoop User Group Nov 27 2012
 
How to not blow up spaceships
How to not blow up spaceshipsHow to not blow up spaceships
How to not blow up spaceships
 
Contributing to Akka (Hacktoberfest 2020)
Contributing to Akka (Hacktoberfest 2020)Contributing to Akka (Hacktoberfest 2020)
Contributing to Akka (Hacktoberfest 2020)
 
ADT02 - Java 8 Lambdas and the Streaming API
ADT02 - Java 8 Lambdas and the Streaming APIADT02 - Java 8 Lambdas and the Streaming API
ADT02 - Java 8 Lambdas and the Streaming API
 
Distributed GLM with H2O - Atlanta Meetup
Distributed GLM with H2O - Atlanta MeetupDistributed GLM with H2O - Atlanta Meetup
Distributed GLM with H2O - Atlanta Meetup
 
Machine Learning with ML.NET and Azure - Andy Cross
Machine Learning with ML.NET and Azure - Andy CrossMachine Learning with ML.NET and Azure - Andy Cross
Machine Learning with ML.NET and Azure - Andy Cross
 
デザインシステムの海で3年間もがいてみて
デザインシステムの海で3年間もがいてみてデザインシステムの海で3年間もがいてみて
デザインシステムの海で3年間もがいてみて
 
The Why and What of Pattern Lab
The Why and What of Pattern LabThe Why and What of Pattern Lab
The Why and What of Pattern Lab
 
Chris OBrien - Azure DevOps for managing work
Chris OBrien - Azure DevOps for managing workChris OBrien - Azure DevOps for managing work
Chris OBrien - Azure DevOps for managing work
 
NLP Project Full Circle
NLP Project Full CircleNLP Project Full Circle
NLP Project Full Circle
 
Oop principles
Oop principlesOop principles
Oop principles
 
Training Neural Networks
Training Neural NetworksTraining Neural Networks
Training Neural Networks
 
Knitting boar - Toronto and Boston HUGs - Nov 2012
Knitting boar - Toronto and Boston HUGs - Nov 2012Knitting boar - Toronto and Boston HUGs - Nov 2012
Knitting boar - Toronto and Boston HUGs - Nov 2012
 

More from Matthew Lease

Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Matthew Lease
 
Explainable Fact Checking with Humans in-the-loop
Explainable Fact Checking with Humans in-the-loopExplainable Fact Checking with Humans in-the-loop
Explainable Fact Checking with Humans in-the-loop
Matthew Lease
 
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Matthew Lease
 
AI & Work, with Transparency & the Crowd
AI & Work, with Transparency & the Crowd AI & Work, with Transparency & the Crowd
AI & Work, with Transparency & the Crowd
Matthew Lease
 
Designing Human-AI Partnerships to Combat Misinfomation
Designing Human-AI Partnerships to Combat Misinfomation Designing Human-AI Partnerships to Combat Misinfomation
Designing Human-AI Partnerships to Combat Misinfomation
Matthew Lease
 
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Matthew Lease
 
But Who Protects the Moderators?
But Who Protects the Moderators?But Who Protects the Moderators?
But Who Protects the Moderators?
Matthew Lease
 
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
Matthew Lease
 
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...
Matthew Lease
 
Fact Checking & Information Retrieval
Fact Checking & Information RetrievalFact Checking & Information Retrieval
Fact Checking & Information Retrieval
Matthew Lease
 
Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...
Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...
Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...
Matthew Lease
 
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...
Matthew Lease
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Matthew Lease
 
Systematic Review is e-Discovery in Doctor’s Clothing
Systematic Review is e-Discovery in Doctor’s ClothingSystematic Review is e-Discovery in Doctor’s Clothing
Systematic Review is e-Discovery in Doctor’s Clothing
Matthew Lease
 
The Rise of Crowd Computing (July 7, 2016)
The Rise of Crowd Computing (July 7, 2016)The Rise of Crowd Computing (July 7, 2016)
The Rise of Crowd Computing (July 7, 2016)
Matthew Lease
 
The Rise of Crowd Computing - 2016
The Rise of Crowd Computing - 2016The Rise of Crowd Computing - 2016
The Rise of Crowd Computing - 2016
Matthew Lease
 
The Rise of Crowd Computing (December 2015)
The Rise of Crowd Computing (December 2015)The Rise of Crowd Computing (December 2015)
The Rise of Crowd Computing (December 2015)
Matthew Lease
 
Toward Better Crowdsourcing Science
 Toward Better Crowdsourcing Science Toward Better Crowdsourcing Science
Toward Better Crowdsourcing Science
Matthew Lease
 
Beyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms
Beyond Mechanical Turk: An Analysis of Paid Crowd Work PlatformsBeyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms
Beyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms
Matthew Lease
 
The Search for Truth in Objective & Subject Crowdsourcing
The Search for Truth in Objective & Subject CrowdsourcingThe Search for Truth in Objective & Subject Crowdsourcing
The Search for Truth in Objective & Subject Crowdsourcing
Matthew Lease
 

More from Matthew Lease (20)

Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
 
Explainable Fact Checking with Humans in-the-loop
Explainable Fact Checking with Humans in-the-loopExplainable Fact Checking with Humans in-the-loop
Explainable Fact Checking with Humans in-the-loop
 
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
 
AI & Work, with Transparency & the Crowd
AI & Work, with Transparency & the Crowd AI & Work, with Transparency & the Crowd
AI & Work, with Transparency & the Crowd
 
Designing Human-AI Partnerships to Combat Misinfomation
Designing Human-AI Partnerships to Combat Misinfomation Designing Human-AI Partnerships to Combat Misinfomation
Designing Human-AI Partnerships to Combat Misinfomation
 
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
 
But Who Protects the Moderators?
But Who Protects the Moderators?But Who Protects the Moderators?
But Who Protects the Moderators?
 
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
 
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...
 
Fact Checking & Information Retrieval
Fact Checking & Information RetrievalFact Checking & Information Retrieval
Fact Checking & Information Retrieval
 
Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...
Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...
Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...
 
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
 
Systematic Review is e-Discovery in Doctor’s Clothing
Systematic Review is e-Discovery in Doctor’s ClothingSystematic Review is e-Discovery in Doctor’s Clothing
Systematic Review is e-Discovery in Doctor’s Clothing
 
The Rise of Crowd Computing (July 7, 2016)
The Rise of Crowd Computing (July 7, 2016)The Rise of Crowd Computing (July 7, 2016)
The Rise of Crowd Computing (July 7, 2016)
 
The Rise of Crowd Computing - 2016
The Rise of Crowd Computing - 2016The Rise of Crowd Computing - 2016
The Rise of Crowd Computing - 2016
 
The Rise of Crowd Computing (December 2015)
The Rise of Crowd Computing (December 2015)The Rise of Crowd Computing (December 2015)
The Rise of Crowd Computing (December 2015)
 
Toward Better Crowdsourcing Science
 Toward Better Crowdsourcing Science Toward Better Crowdsourcing Science
Toward Better Crowdsourcing Science
 
Beyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms
Beyond Mechanical Turk: An Analysis of Paid Crowd Work PlatformsBeyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms
Beyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms
 
The Search for Truth in Objective & Subject Crowdsourcing
The Search for Truth in Objective & Subject CrowdsourcingThe Search for Truth in Objective & Subject Crowdsourcing
The Search for Truth in Objective & Subject Crowdsourcing
 

Recently uploaded

9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
saastr
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
Alex Pruden
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
Edge AI and Vision Alliance
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
Neo4j
 
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
Edge AI and Vision Alliance
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
Data Hops
 

Recently uploaded (20)

9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
 
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
 

Automated Models for Quantifying Centrality of Survey Responses

  • 1. Matt Lease Associate Professor School of Information The University of Texas at Austin Amazon Scholar Human-in-the-loop Services Amazon Web Services (AWS) Automated Models for Quantifying Centrality of Survey Responses 1 Lab: ir.ischool.utexas.edu @mattlease Slides: slideshare.net/mattlease
  • 2. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark. Human-in-the-loop Services • 3 Team Products: Mechanical Turk, Sagemaker Ground Truth, and Augmented AI (A2I) • https://www.amazon.science/research-awards – Cash and/or AWS credits • Summer, sabbatical, or longer engagements – https://www.amazon.science/scholars – https://www.amazon.science/visiting-academics • https://www.amazon.science/tag/internships
  • 4. What’s the capital of Texas? Austin Austin Houston 4
  • 5. What’s the capital of Texas? Austin Austin Houston Majority Vote 5
  • 6. Simple annotation & aggregation Classification • sentiment analysis • image categorization Ordinal rating • product & movie reviews • search relevance Aggregation • Crowdsourcing: quality control • Experts: wisdom of crowds • Goal: select best label available for each item (no label fusion) 6
  • 7. Caption this image: 7 A cat is eating The cat eats A beautiful picture
  • 8. Caption this image: When majority voting falls short Problem: large label space, exact match doesn’t work! 8 A cat is eating The cat eats A beautiful picture
  • 9. What about complex annotations? Ranked lists Parse trees A1: A cat is eating A2: The cat eats A3: A beautiful picture Image captions Range sequences 9
  • 10. 10 Alexander Braylan1 and Matthew Lease2 1 Dept. of Computer Science & 2 School of Information The University of Texas at Austin Modeling and Aggregation of Complex Annotations via Annotation Distance Code & Data: https://github.com/Praznat/annotationmodeling https://github.com/Praznat/annotationmodeling
  • 11. Roadmap • Prior work • Approach • Example outputs • Conclusion 11 https://github.com/Praznat/annotationmodeling
  • 12. Aggregating Simple Labels • Hundreds of papers • Multiple benchmarking studies • Rich body of Bayesian modeling • General-purpose aggregation models for simple labels don’t support complex labels Dawid-Skene MACE Hierarchical Dawid-Skene Item Difficulty Logistic Random Effects Source: Paun et al 2018 “Comparing bayesian models of annotation” 12 https://github.com/Praznat/annotationmodeling
  • 13. Task-specific models • Pros: – Task specialization maximizes accuracy • Cons: – Need new model for every task – Complicated, difficult to formulate Nguyen et al 2017 (Sequences) Lin, Mausam, and Weld 2012 (Math) 13 https://github.com/Praznat/annotationmodeling
  • 14. Our goals • We want aggregation for complex data types – Build on ideas from simple label aggregation models • We want to generalize across many labeling tasks – Can we reduce problem to common simpler state space? 14 https://github.com/Praznat/annotationmodeling
  • 15. Roadmap • Prior work • Approach • Example outputs • Conclusion 15 https://github.com/Praznat/annotationmodeling
  • 16. Key Insight Partial credit matching via task-specific distance function • Adopt or define a distance function for each annotation task • Model annotation distances uniformly across tasks • Distance functions already exist for many task types – Free-text responses, e.g., survey questions 16 https://github.com/Praznat/annotationmodeling
  • 17. Calculate distances “a cat is eating” “cat is eating” “a beautiful picture” “the cat eats” 17 • Example task: free text answer • Example distance function: string edit distance https://github.com/Praznat/annotationmodeling
  • 18. Calculate distances “a cat is eating” “cat is eating” “a beautiful picture” “the cat eats” 0.05 0.1 0.1 18 • Example task: free text answer • Example distance function: string edit distance https://github.com/Praznat/annotationmodeling
  • 19. Calculate distances “a cat is eating” “cat is eating” “a beautiful picture” “the cat eats” 0.8 0.82 0.05 0.1 0.1 19 0.82 • Example task: free text answer • Example distance function: string edit distance https://github.com/Praznat/annotationmodeling
  • 21. Example Distance: Word embeddings 21 https://github.com/Praznat/annotationmodeling
  • 22. Distance function properties 22 Properties of distance functions Non-negativity Symmetry Triangle inequality Data Free Text Rankings Example evaluation fn BLEU(x, y) Example distance fn Non-negativity ✓ ✓ Symmetry ✓ ✓ Triangle inequality ✓ ✓ https://github.com/Praznat/annotationmodeling
  • 23. Calculate distances “a cat is eating” “cat is eating” “a beautiful picture” “the cat eats” 0.8 0.82 0.05 0.1 0.1 23 0.82 https://github.com/Praznat/annotationmodeling
  • 24. A1: A cat is eating A2: The cat eats A3: A beautiful picture 0.1 0.6 0.3 24 All tasks reduce to matrices of distances https://github.com/Praznat/annotationmodeling
  • 25. How to aggregate given distances • Local selection model • Global selection model • Combined 25 Current item Other items https://github.com/Praznat/annotationmodeling
  • 26. Local approach: Smallest Avg Distance (SAD) • For each question: compute average distance between responses • The response with smallest average distance is locally most normative, generalizing majority vote • Independence between items • Local approach does not model respondent agreement 26 Current item Other items https://github.com/Praznat/annotationmodeling
  • 27. Global approach: Best Available User (BAU) • Score each participant by their average distance to all other participants across all questions • The participant with lowest score is globally most normative; treat their response as most normative • Global approach ignores distance observed on the current item 27 Current item Other items https://github.com/Praznat/annotationmodeling
  • 28. Can we get best of both worlds? • Want a method that combines: – Best available user (global) – Smallest avg distance (local) • Should build on rich history of work on Bayesian annotation modeling • Need a principled framework for modeling annotation distance matrices weights votes weighted voting 28 https://github.com/Praznat/annotationmodeling
  • 29. Multidimensional Annotation Scaling (MAS) • Based on Multidimensional Scaling (Kruskal & Wish 1978) • Probabilistic model of multi- item distance matrices • “Hierarchical Bayesian” – Additional learned parameters represent crowd effects such as worker reliability A cat is eating The cat eats A beautiful picture 29 https://github.com/Praznat/annotationmodeling
  • 30. MAS Objective 1: Likelihood Multidimensional Scaling objective: Diuv ∼ N(∥εiu−εiv∥, σ) • Diuv : observed distance • εiu : annotation embedding • σ : error scale “a cat is eating” “cat is eating” “a beautiful picture” “the cat eats” 0.8 0.82 0.05 0.1 0.1 0.82 30 https://github.com/Praznat/annotationmodeling
  • 31. MAS Objective 1: Likelihood Multidimensional Scaling objective: Diuv ∼ N(∥εiu−εiv∥, σ) • Diuv : observed distance • εiu : annotation embedding • σ : error scale “a cat is eating” “cat is eating” “a beautiful picture” “the cat eats” 0.8 0.82 0.05 0.1 0.1 0.82 31 https://github.com/Praznat/annotationmodeling
  • 32. MAS Objective 2: Prior “a cat is eating” “cat is eating” “a beautiful picture” “the cat eats” Pseudo-gold 32 https://github.com/Praznat/annotationmodeling
  • 33. MAS Objective 2: Prior “a cat is eating” “cat is eating” “a beautiful picture” “the cat eats” 33 https://github.com/Praznat/annotationmodeling
  • 34. MAS Objective 2: Prior “a cat is eating” “cat is eating” “a beautiful picture” “the cat eats” 34 https://github.com/Praznat/annotationmodeling
  • 35. MAS Objective 2: Prior 35 https://github.com/Praznat/annotationmodeling
  • 36. MAS Objective 2: Prior 36 https://github.com/Praznat/annotationmodelingç
  • 37. Roadmap • Prior work • Approach • Example outputs • Conclusion 37 https://github.com/Praznat/annotationmodeling
  • 38. Example Output: father 38 Response SAD MAS He always speaks ill about his father behind back. 0.78 0.16 He always speaks ill of his father behind his back. 0.71 0.30 He always talks about his father behind his back. 0.74 0.50 He always speaks ill of his father 0.78 0.55 He always speak ill of his father. 0.79 0.62 He is always talking about his father behind his back. 0.82 0.63 He always says behind his father. 0.90 0.72 He always talks about his dad behind his back. 0.83 0.73 https://github.com/Praznat/annotationmodelingç
  • 39. Example Output: she says 39 Response SAD MAS Please be sure to take a note of what she says. 0.77 0.16 Please take a note of what she says. 0.84 0.30 Be sure to take a warning notice what she says. 0.86 0.46 Please be sure to take notes what she says. 0.81 0.48 Please take a note what she say. 0.92 0.73 Please be sure to take instructions for her saying. 0.93 0.76 Make sure to insert disclaimer about what she says. 0.93 0.80 Please make a memo whatever she says. 0.99 0.82 https://github.com/Praznat/annotationmodelingç
  • 40. Example Output: quiet 40 Response SAD MAS As long as you keep quiet you may stay here 0.83 0.26 You can stay here as long as you keep quiet. 0.86 0.39 You may stay here if you keep quiet. 0.81 0.39 You can stay here if you keep quiet. 0.82 0.57 So long as you remain quiet you may stay here. 0.92 0.57 If it is quiet you may stay here 0.90 0.70 If you keep quiet you can stay here. 0.92 0.81 You may be here if you keep quiet. 0.91 0.84 https://github.com/Praznat/annotationmodelingç
  • 41. Example Output: go ahead 41 Response SAD MAS Please go ahead if i am late. 0.83 0.16 Please go ahead if I'm late. 0.79 0.28 Please go ahead if I delayed. 0.82 0.51 Please go without me if I'm late. 0.91 0.62 Please go ahead if I get late 0.83 0.67 Please go ahead and leave if I'm late. 0.88 0.74 If I am late you can go in first. 1.00 0.79 If I should be late go without me. 1.00 0.81 https://github.com/Praznat/annotationmodelingç
  • 42. Example Output: married 42 Response SAD MAS Actually they are not married 0.91 0.18 To tell the truth they are not couple 0.79 0.47 To tell the truth they are not a married couple 0.84 0.62 To tell the truth they're not married 0.89 0.63 In fact they are not couple 0.94 0.69 to telling the truth we're not married 0.97 0.71 Two people are not couples in truth 1.00 0.79 https://github.com/Praznat/annotationmodelingç
  • 43. Roadmap • Prior work • Approach • Example outputs • Conclusion 43 https://github.com/Praznat/annotationmodeling
  • 44. Conclusion • Probabilistic model identifies normative vs. outlier responses by quantifying distance between responses • Many choices for measuring distance between two texts (e.g., character-based or more semantic NLP) • 3 models: local (SAD), global (BAU), or combo (MAS) • Open source: github.com/Praznat/annotationmodeling 44 A1: A cat is eating A2: The cat eats A3: A beautiful picture https://github.com/Praznat/annotationmodeling
  • 45. Future work 45 A1: A cat is eating A2: The cat eats A3: A beautiful picture • From objective labeling tasks to subjective responses • Evaluation on survey data – Collaboration with behavioral science researchers? – Compare distance functions and model settings for utility • Automatic detection of consistent biases in a participant’s responses vs. what’s group normative https://github.com/Praznat/annotationmodeling
  • 46. 46 Matt Lease (University of Texas at Austin) Lab: ir.ischool.utexas.edu @mattlease Slides: slideshare.net/mattlease We thank our many talented crowd workers for their contributions to our research! https://github.com/Praznat/annotationmodeling Alexander Braylan and Matthew Lease. Aggregating Complex Annotations via Merging and Matching. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 86--94, 2021. [ bib | pdf | data | sourcecode | video | slides | tech-report ] Alexander Braylan and Matthew Lease. Modeling and Aggregation of Complex Annotations via Annotation Distances. In Proceedings of the Web Conference, pages 1807--1818, 2020. [ bib | pdf | data | sourcecode | video | slides ]
  • 48. MTurk: The Early Days 48 • Artificial Intelligence, With Help From the Humans. – J. Pontin. NY Times, March 25, 2007 • Is Amazon's Mechanical Turk a Failure? April 9, 2007 – “As of this writing, there are [only] 128 HITs available on Mechanical Turk.” • Su et al., WWW 2007: “a web-based human data collection system… ‘System M’ ”
  • 49. 2008: the ”Gold” Rush Begins Braylan and Lease 49 Snow et al, EMNLP (Natural Language Processing) • Annotating human language for natural language processing (NLP) • 22,000 labels for only $26 USD • Crowd’s consensus labels can replace traditional expert labels “Discovery” sparks rush for “gold” data across areas • Alonso et al., SIGIR Forum (Information Retrieval) • Kittur et al., CHI (Human-Computer Interaction) • Sorokin and Forsythe, CVPR (Computer Vision)
  • 50. 2010-11: Social & Behavioral Sciences 50 • A Guide to Behavioral Experiments on Mechanical Turk – W. Mason and S. Suri (2010). SSRN online. • Crowdsourcing for Human Subjects Research – L. Schmidt (CrowdConf 2010) • Crowdsourcing Content Analysis for Behavioral Research: Insights from Mechanical Turk – Conley & Tosti-Kharas (2010). Academy of Management • Amazon's Mechanical Turk : A New Source of Inexpensive, Yet High-Quality, Data? – M. Buhrmester et al. (2011). Perspectives… 6(1):3-5. – see also: Amazon Mechanical Turk Guide for Social Scientists
  • 51. The Future of Crowd Work (ACM CSCW’13) by Kittur, Nickerson, Bernstein, Gerber, Shaw, Zimmerman, Lease, and Horton 51
  • 55. Tasks & datasets SYNTHETIC DATASETS • Syntactic parse trees – Distance function: evalb • Ranked lists – Distance function: Kendall’s tau REAL DATASETS • Biomedical text sequences – Distance function: Span F1 • Urdu-English translations – Distance function: GLEU 55 Nguyen et al 2017 Zaidan and Callison-Burch 2011
  • 56. Methods Baselines: • Random User (RU): pick one label randomly • ZenCrowd (ZC) (Demartini et al. 2012) – Weighted voting based on exact match (rare!) • Crowd Hidden Markov Model (CHMM) (Nguyen et al. 2017) – Sequence annotation task only Upper bound: Oracle (OR) (always picks best label) • Even if 5 workers answer, limited by best answer any of them gave 56
  • 57. Results Task Metric RU ZC CHMM MAS Oracle Translations GLEU 0.185 0.246 Sequences F1 0.561 0.827 Parses EVALB 0.812 0.939 Rankings 0.491 0.724 57 • Diverse complex label datasets
  • 58. Results Task Metric RU ZC CHMM MAS Oracle Translations GLEU 0.185 0.188 0.246 Sequences F1 0.561 0.569 0.827 Parses EVALB 0.812 0.819 0.939 Rankings 0.491 0.495 0.724 58 • Diverse complex label datasets
  • 59. Results Task Metric RU ZC CHMM MAS Oracle Translations GLEU 0.185 0.188 - 0.246 Sequences F1 0.561 0.569 0.702 0.827 Parses EVALB 0.812 0.819 - 0.939 Rankings 0.491 0.495 - 0.724 59 • Diverse complex label datasets
  • 60. Results Task Metric RU ZC CHMM MAS Oracle Translations GLEU 0.185 0.188 - 0.217 0.246 Sequences F1 0.561 0.569 0.702 0.709 0.827 Parses EVALB 0.812 0.819 - 0.932 0.939 Rankings 0.491 0.495 - 0.710 0.724 60 • Diverse complex label datasets • MAS aggregation is best way to get closer to ground truth with no model alteration between datasets
  • 62. 62 Goal: Design a future of Artificial Intelligence (AI) technologies to meet society’s needs and values. . http://goodsystems.utexas.edu Good Systems: an 8-year, $10M UT Austin Grand Challenge
  • 63. “The place where people & technology meet” ~ Wobbrock et al., 2009 “iSchools” now exist at over 100 universities around the world 63 What’s an Information School?
  • 64. Task-specific workflows • Pros: – Empower workers for complex tasks • Cons: – Need new workflow for every task – Complicated, difficult to formulate Noronha et al 2011 (image analysis) Lasecki et al 2012 (transcription) 64