Matt Lease
Associate Professor
School of Information
The University of Texas at Austin
Amazon Scholar
Human-in-the-loop Services
Amazon Web Services (AWS)
Automated Models for Quantifying
Centrality of Survey Responses
Lab: ir.ischool.utexas.edu
@mattlease
Slides: slideshare.net/mattlease
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark.
Human-in-the-loop Services
• 3 Team Products: Mechanical Turk,
SageMaker Ground Truth, and Augmented AI (A2I)
• https://www.amazon.science/research-awards
– Cash and/or AWS credits
• Summer, sabbatical, or longer engagements
– https://www.amazon.science/scholars
– https://www.amazon.science/visiting-academics
• https://www.amazon.science/tag/internships
HTTPS://WWW.HUMANCOMPUTATION.COM
What’s the capital of Texas?
Austin
Austin
Houston
Majority Vote
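A minimal sketch of majority voting over simple labels, using the three answers from the slide. The function name `majority_vote` is hypothetical, chosen for illustration.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label among the given answers."""
    if not labels:
        raise ValueError("need at least one label")
    counts = Counter(labels)
    return counts.most_common(1)[0][0]


answers = ["Austin", "Austin", "Houston"]
print(majority_vote(answers))  # Austin
```

Exact-match counting like this works only when several annotators can give the identical label.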
Simple annotation & aggregation
Classification
• sentiment analysis
• image categorization
Ordinal rating
• product & movie reviews
• search relevance
Aggregation
• Crowdsourcing: quality control
• Experts: wisdom of crowds
• Goal: select best label available
for each item (no label fusion)
When majority voting falls short
Caption this image:
• A cat is eating
• The cat eats
• A beautiful picture
Problem: large label space, exact match doesn’t work!
What about complex annotations?
• Ranked lists
• Parse trees
• Range sequences
• Image captions
– A1: A cat is eating
– A2: The cat eats
– A3: A beautiful picture
Alexander Braylan¹ and Matthew Lease²
¹Dept. of Computer Science & ²School of Information
The University of Texas at Austin
Modeling and Aggregation of Complex Annotations via Annotation Distance
Code & Data: https://github.com/Praznat/annotationmodeling
Roadmap
• Prior work
• Approach
• Example outputs
• Conclusion
Aggregating Simple Labels
• Hundreds of papers
• Multiple benchmarking studies
• Rich body of Bayesian modeling
• General-purpose aggregation models for simple labels don’t support complex labels
• Example models: Dawid-Skene, MACE, Hierarchical Dawid-Skene, Item Difficulty, Logistic Random Effects
Source: Paun et al. 2018, “Comparing Bayesian Models of Annotation”
Task-specific models
• Pros:
– Task specialization
maximizes accuracy
• Cons:
– Need new model for
every task
– Complicated, difficult
to formulate
Nguyen et al 2017 (Sequences)
Lin, Mausam, and Weld 2012 (Math)
Our goals
• We want aggregation for complex data types
– Build on ideas from simple label aggregation models
• We want to generalize across many labeling tasks
– Can we reduce problem to common simpler state space?
Roadmap
• Prior work
• Approach
• Example outputs
• Conclusion
Key Insight
Partial credit matching via task-specific distance function
• Adopt or define a distance function for each annotation task
• Model annotation distances uniformly across tasks
• Distance functions already exist for many task types
– Free-text responses, e.g., survey questions
Calculate distances
• Example task: free text answer
• Example distance function: string edit distance
• Responses: “a cat is eating”, “cat is eating”, “the cat eats”, “a beautiful picture”
[Figure: graph of pairwise distances between the four responses, with edge labels 0.05, 0.1, 0.1, 0.8, 0.82, 0.82]
Example Distance: Levenshtein
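A minimal sketch of Levenshtein (string edit) distance between two free-text responses. Dividing by the longer string's length to get a value in [0, 1] is an assumption made here for illustration; the slides do not state how distances are scaled, so the values need not match the figures shown earlier.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(a, b):
    """Edit distance divided by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


print(levenshtein("a cat is eating", "cat is eating"))  # 2
```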
Example Distance: Word embeddings
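A minimal sketch of an embedding-based distance: average the word vectors of each response and take cosine distance. The three-dimensional vectors in `TOY_VECTORS` are made up for illustration only; in practice they would come from a pretrained embedding model, which the slides do not name.

```python
import math

# Hypothetical toy word vectors, invented for this example.
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "eats": [0.1, 0.9, 0.0],
    "eating": [0.2, 0.8, 0.1],
    "beautiful": [0.0, 0.1, 0.9],
    "picture": [0.1, 0.0, 0.8],
}


def sentence_vector(text):
    """Average the vectors of known words; unknown words are skipped."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        raise ValueError("no known words in: %r" % text)
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]


def cosine_distance(u, v):
    """1 minus cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm


def embedding_distance(a, b):
    return cosine_distance(sentence_vector(a), sentence_vector(b))


near = embedding_distance("a cat is eating", "the cat eats")
far = embedding_distance("a cat is eating", "a beautiful picture")
print(near < far)  # True
```

Unlike character-level edit distance, this places “a cat is eating” close to “the cat eats” even though the strings differ.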
Distance function properties
Properties of distance functions: non-negativity, symmetry, triangle inequality

Data                     Free Text     Rankings
Example evaluation fn    BLEU(x, y)    …
Example distance fn      …             …
Non-negativity           ✓             ✓
Symmetry                 ✓             ✓
Triangle inequality      ✓             ✓
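The three properties can be spot-checked on a finite sample of annotations. The sketch below is a sanity check, not a proof: passing on a sample does not guarantee the property holds in general. `jaccard_distance` is a stand-in distance chosen for the demo, not one taken from the slides.

```python
import itertools


def check_metric_properties(items, dist, tol=1e-9):
    """Test non-negativity, symmetry, and triangle inequality on sample items."""
    results = {"non_negativity": True, "symmetry": True, "triangle_inequality": True}
    for x, y in itertools.product(items, repeat=2):
        if dist(x, y) < -tol:
            results["non_negativity"] = False
        if abs(dist(x, y) - dist(y, x)) > tol:
            results["symmetry"] = False
    for x, y, z in itertools.product(items, repeat=3):
        if dist(x, z) > dist(x, y) + dist(y, z) + tol:
            results["triangle_inequality"] = False
    return results


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| over lowercase word sets (stand-in distance)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return 1.0 - len(sa & sb) / len(union) if union else 0.0


captions = ["A cat is eating", "The cat eats", "A beautiful picture"]
print(check_metric_properties(captions, jaccard_distance))
```

A squared difference such as `(x - y) ** 2` on the numbers 0, 1, 2 fails the triangle check, since 4 > 1 + 1.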
All tasks reduce to matrices of distances
A1: A cat is eating
A2: The cat eats
A3: A beautiful picture
[Figure: pairwise distance matrix for A1 to A3 with entries 0.1, 0.6, 0.3]
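A minimal sketch of the reduction: the same matrix-building routine serves any annotation type once a distance function is supplied. Both `token_distance` and `span_distance` are illustrative stand-ins, assumed here rather than taken from the slides.

```python
def pairwise_distances(annotations, dist):
    """Symmetric matrix of distances between all annotations for one item."""
    n = len(annotations)
    matrix = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            matrix[u][v] = matrix[v][u] = dist(annotations[u], annotations[v])
    return matrix


def token_distance(a, b):
    """Free text: 1 - Jaccard overlap of lowercase word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return 1.0 - len(sa & sb) / len(union) if union else 0.0


def span_distance(a, b):
    """Ranges given as (start, end): 1 - overlap length / union length."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return 1.0 - inter / union if union else 0.0


captions = ["A cat is eating", "The cat eats", "A beautiful picture"]
spans = [(0, 10), (5, 15), (40, 50)]
print(pairwise_distances(captions, token_distance))
print(pairwise_distances(spans, span_distance))
```

Everything downstream of this step sees only the matrices, never the raw captions or spans.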
How to aggregate given distances
• Local selection model
• Global selection model
• Combined
[Figure labels: Current item, Other items]
Local approach: Smallest Avg Distance (SAD)
• For each question: compute average
distance between responses
• The response with smallest average
distance is locally most normative,
generalizing majority vote
• Independence between items
• Local approach does not model
respondent agreement
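A minimal sketch of the local approach for a single question, assuming responses are held in a list and `dist` is any distance function. With an exact-match distance, it picks the same answer as majority vote on the earlier example. Function names are hypothetical.

```python
def exact_match_distance(a, b):
    """0 if the two responses are identical, else 1."""
    return 0.0 if a == b else 1.0


def smallest_average_distance(responses, dist):
    """Return (index of the most central response, list of average distances)."""
    n = len(responses)
    if n == 0:
        raise ValueError("need at least one response")
    if n == 1:
        return 0, [0.0]
    averages = []
    for u in range(n):
        total = sum(dist(responses[u], responses[v]) for v in range(n) if v != u)
        averages.append(total / (n - 1))
    best = min(range(n), key=averages.__getitem__)
    return best, averages


responses = ["Austin", "Austin", "Houston"]
best, averages = smallest_average_distance(responses, exact_match_distance)
print(responses[best], averages)  # Austin [0.5, 0.5, 1.0]
```

Each question is scored on its own, so nothing learned about a respondent on one question carries over to another.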
Global approach: Best Available User (BAU)
• Score each participant by their
average distance to all other
participants across all questions
• The participant with lowest score is
globally most normative; treat their
response as most normative
• Global approach ignores distance
observed on the current item
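A minimal sketch of the global approach, assuming data is stored as `{question: {participant: response}}`. Restricting the choice on each question to participants who actually answered it is this sketch's reading of "best available"; the data layout and names are hypothetical.

```python
from collections import defaultdict


def exact_match_distance(a, b):
    """0 if the two responses are identical, else 1."""
    return 0.0 if a == b else 1.0


def best_available_user(data, dist):
    """Return (scores per participant, chosen response per question)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for answers in data.values():
        for u, ru in answers.items():
            for v, rv in answers.items():
                if u != v:
                    totals[u] += dist(ru, rv)
                    counts[u] += 1
    # Average distance to all other participants across all questions.
    scores = {u: totals[u] / counts[u] for u in totals}
    chosen = {}
    for question, answers in data.items():
        candidates = [u for u in answers if u in scores]
        if candidates:
            best = min(candidates, key=scores.get)
            chosen[question] = answers[best]
    return scores, chosen


data = {
    "q1": {"ann": "Austin", "bob": "Austin", "cy": "Houston"},
    "q2": {"ann": "Paris", "bob": "Paris", "cy": "Paris"},
    "q3": {"bob": "x", "cy": "y"},
}
scores, chosen = best_available_user(data, exact_match_distance)
print(scores)
print(chosen)
```

On `q3` the choice rests entirely on behavior elsewhere: bob is preferred because his score on the other questions is lower, not because of anything observed on `q3` itself.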
Can we get best of both worlds?
• Want a method that combines:
– Best available user (global)
– Smallest avg distance (local)
• Should build on rich history of work on Bayesian annotation modeling
• Need a principled framework for modeling annotation distance matrices
[Figure labels: weights, votes, weighted voting]
Multidimensional Annotation Scaling (MAS)
• Based on Multidimensional Scaling (Kruskal & Wish 1978)
• Probabilistic model of multi-item distance matrices
• “Hierarchical Bayesian”
– Additional learned parameters represent crowd effects such as worker reliability
[Figure: example responses “A cat is eating”, “The cat eats”, “A beautiful picture”]
MAS Objective 1: Likelihood
Multidimensional Scaling objective:
$D_{iuv} \sim \mathcal{N}(\lVert \varepsilon_{iu} - \varepsilon_{iv} \rVert, \sigma)$
• $D_{iuv}$: observed distance
• $\varepsilon_{iu}$: annotation embedding
• $\sigma$: error scale
[Figure: the four example responses with pairwise distances 0.05, 0.1, 0.1, 0.8, 0.82, 0.82]
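A minimal sketch of the likelihood term only: it scores how well a candidate set of embeddings explains the observed distances under the Gaussian model above. It assumes i indexes items and u, v index annotators, and it omits the prior and any inference procedure; the dictionary layout and function names are hypothetical, and this is not the authors' implementation.

```python
import math


def gaussian_log_pdf(x, mean, sigma):
    """Log density of a normal distribution with the given mean and scale."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mean) ** 2 / (2 * sigma ** 2)


def mas_log_likelihood(observed, embeddings, sigma):
    """observed: {(i, u, v): distance}; embeddings: {(i, u): [coordinates]}."""
    total = 0.0
    for (i, u, v), d in observed.items():
        implied = math.dist(embeddings[(i, u)], embeddings[(i, v)])
        total += gaussian_log_pdf(d, implied, sigma)
    return total


observed = {(0, "a", "b"): 0.1, (0, "a", "c"): 0.8, (0, "b", "c"): 0.82}
spread = {(0, "a"): [0.0, 0.0], (0, "b"): [0.1, 0.0], (0, "c"): [0.0, 0.8]}
collapsed = {(0, "a"): [0.0, 0.0], (0, "b"): [0.0, 0.0], (0, "c"): [0.0, 0.0]}
print(mas_log_likelihood(observed, spread, 0.1) > mas_log_likelihood(observed, collapsed, 0.1))  # True
```

Embeddings whose pairwise Euclidean distances reproduce the observed distances score higher than embeddings that collapse every annotation to one point.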
MAS Objective 2: Prior
[Figure labels: “a cat is eating”, “cat is eating”, “a beautiful picture”, “the cat eats”, Pseudo-gold]
Roadmap
• Prior work
• Approach
• Example outputs
• Conclusion
Example Output: father
Response                                                SAD   MAS
He always speaks ill about his father behind back.      0.78  0.16
He always speaks ill of his father behind his back.     0.71  0.30
He always talks about his father behind his back.       0.74  0.50
He always speaks ill of his father                      0.78  0.55
He always speak ill of his father.                      0.79  0.62
He is always talking about his father behind his back.  0.82  0.63
He always says behind his father.                       0.90  0.72
He always talks about his dad behind his back.          0.83  0.73
Example Output: she says
Response                                                SAD   MAS
Please be sure to take a note of what she says.         0.77  0.16
Please take a note of what she says.                    0.84  0.30
Be sure to take a warning notice what she says.         0.86  0.46
Please be sure to take notes what she says.             0.81  0.48
Please take a note what she say.                        0.92  0.73
Please be sure to take instructions for her saying.     0.93  0.76
Make sure to insert disclaimer about what she says.     0.93  0.80
Please make a memo whatever she says.                   0.99  0.82
Example Output: quiet
Response                                                SAD   MAS
As long as you keep quiet you may stay here             0.83  0.26
You can stay here as long as you keep quiet.            0.86  0.39
You may stay here if you keep quiet.                    0.81  0.39
You can stay here if you keep quiet.                    0.82  0.57
So long as you remain quiet you may stay here.          0.92  0.57
If it is quiet you may stay here                        0.90  0.70
If you keep quiet you can stay here.                    0.92  0.81
You may be here if you keep quiet.                      0.91  0.84
Example Output: go ahead
Response                                                SAD   MAS
Please go ahead if i am late.                           0.83  0.16
Please go ahead if I'm late.                            0.79  0.28
Please go ahead if I delayed.                           0.82  0.51
Please go without me if I'm late.                       0.91  0.62
Please go ahead if I get late                           0.83  0.67
Please go ahead and leave if I'm late.                  0.88  0.74
If I am late you can go in first.                       1.00  0.79
If I should be late go without me.                      1.00  0.81
Example Output: married
Response                                                SAD   MAS
Actually they are not married                           0.91  0.18
To tell the truth they are not couple                   0.79  0.47
To tell the truth they are not a married couple         0.84  0.62
To tell the truth they're not married                   0.89  0.63
In fact they are not couple                             0.94  0.69
to telling the truth we're not married                  0.97  0.71
Two people are not couples in truth                     1.00  0.79
Roadmap
• Prior work
• Approach
• Example outputs
• Conclusion
Conclusion
• Probabilistic model identifies normative vs. outlier
responses by quantifying distance between responses
• Many choices for measuring distance between two
texts (e.g., character-based or more semantic NLP)
• 3 models: local (SAD), global (BAU), or combo (MAS)
• Open source: github.com/Praznat/annotationmodeling
Future work
• From objective labeling tasks to subjective responses
• Evaluation on survey data
– Collaboration with behavioral science researchers?
– Compare distance functions and model settings for utility
• Automatic detection of consistent biases in a
participant’s responses vs. what’s group normative
Matt Lease (University of Texas at Austin)
Lab: ir.ischool.utexas.edu
@mattlease
Slides: slideshare.net/mattlease
We thank our many talented crowd workers
for their contributions to our research!
Alexander Braylan and Matthew Lease. Aggregating Complex Annotations via Merging and Matching.
In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data
Mining, pages 86-94, 2021.
Alexander Braylan and Matthew Lease. Modeling and Aggregation of Complex Annotations via
Annotation Distances. In Proceedings of the Web Conference, pages 1807-1818, 2020.
Bonus Material
MTurk: The Early Days
• Artificial Intelligence, With Help From the Humans.
– J. Pontin. NY Times, March 25, 2007
• Is Amazon's Mechanical Turk a Failure? April 9, 2007
– “As of this writing, there are [only] 128 HITs available on Mechanical Turk.”
• Su et al., WWW 2007: “a web-based human data collection system… ‘System M’ ”
2008: the “Gold” Rush Begins
Snow et al, EMNLP (Natural Language Processing)
• Annotating human language for natural language processing (NLP)
• 22,000 labels for only $26 USD
• Crowd’s consensus labels can replace traditional expert labels
“Discovery” sparks rush for “gold” data across areas
• Alonso et al., SIGIR Forum (Information Retrieval)
• Kittur et al., CHI (Human-Computer Interaction)
• Sorokin and Forsythe, CVPR (Computer Vision)
2010-11: Social & Behavioral Sciences
• A Guide to Behavioral Experiments on Mechanical Turk
– W. Mason and S. Suri (2010). SSRN online.
• Crowdsourcing for Human Subjects Research
– L. Schmidt (CrowdConf 2010)
• Crowdsourcing Content Analysis for Behavioral Research: Insights from Mechanical Turk
– Conley & Tosti-Kharas (2010). Academy of Management
• Amazon's Mechanical Turk: A New Source of Inexpensive, Yet High-Quality, Data?
– M. Buhrmester et al. (2011). Perspectives… 6(1):3-5.
– see also: Amazon Mechanical Turk Guide for Social Scientists
The Future of Crowd Work (ACM CSCW’13)
by Kittur, Nickerson, Bernstein, Gerber,
Shaw, Zimmerman, Lease, and Horton
Example Output
Tasks & datasets
SYNTHETIC DATASETS
• Syntactic parse trees
– Distance function: evalb
• Ranked lists
– Distance function: Kendall’s tau
REAL DATASETS
• Biomedical text sequences (Nguyen et al. 2017)
– Distance function: Span F1
• Urdu-English translations (Zaidan and Callison-Burch 2011)
– Distance function: GLEU
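A minimal sketch of a ranked-list distance based on Kendall's tau, named above for the synthetic rankings task. The slides do not say which variant or normalization was used; this version, the fraction of item pairs the two rankings order differently, is an assumption.

```python
from itertools import combinations


def kendall_tau_distance(rank_a, rank_b):
    """Fraction of item pairs ordered differently by two rankings (0 to 1)."""
    if len(set(rank_a)) != len(rank_a) or len(set(rank_b)) != len(rank_b):
        raise ValueError("rankings must not repeat items")
    if set(rank_a) != set(rank_b):
        raise ValueError("rankings must contain the same items")
    pos_a = {item: k for k, item in enumerate(rank_a)}
    pos_b = {item: k for k, item in enumerate(rank_b)}
    pairs = list(combinations(rank_a, 2))
    if not pairs:
        return 0.0
    discordant = sum(
        1 for x, y in pairs
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )
    return discordant / len(pairs)


print(kendall_tau_distance(["a", "b", "c"], ["c", "b", "a"]))  # 1.0
```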
Methods
Baselines:
• Random User (RU): pick one label randomly
• ZenCrowd (ZC) (Demartini et al. 2012)
– Weighted voting based on exact match (rare!)
• Crowd Hidden Markov Model (CHMM) (Nguyen et al. 2017)
– Sequence annotation task only
Upper bound: Oracle (OR) (always picks best label)
• Even if 5 workers answer, limited by best answer any of them gave
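A minimal sketch of the Random User baseline and the Oracle upper bound. The Oracle needs the gold label, so it is usable only for evaluation. `token_overlap` is a stand-in evaluation function invented for this example, not one of the metrics from the slides.

```python
import random


def token_overlap(candidate, gold):
    """Jaccard similarity over lowercase word sets (stand-in evaluation function)."""
    a, b = set(candidate.lower().split()), set(gold.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def random_user(labels, rng):
    """RU baseline: pick one of the submitted labels uniformly at random."""
    return rng.choice(labels)


def oracle(labels, gold, score):
    """Upper bound: the submitted label that scores best against the gold label."""
    return max(labels, key=lambda label: score(label, gold))


labels = ["the cat eats", "a beautiful picture", "a cat is eating"]
gold = "a cat eats"
print(random_user(labels, random.Random(0)))
print(oracle(labels, gold, token_overlap))  # the cat eats
```

The Oracle can only return something a worker actually submitted, so its score is capped by the best available answer rather than by the gold label itself.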
Results
Task          Metric         RU     ZC     CHMM   MAS    Oracle
Translations  GLEU           0.185  0.188  -      0.217  0.246
Sequences     F1             0.561  0.569  0.702  0.709  0.827
Parses        EVALB          0.812  0.819  -      0.932  0.939
Rankings      Kendall’s tau  0.491  0.495  -      0.710  0.724
• Diverse complex label datasets
• MAS comes closest to the Oracle on all four tasks, with no model alteration between datasets
Goal: Design a future of Artificial Intelligence (AI)
technologies to meet society’s needs and values.
http://goodsystems.utexas.edu
Good Systems: an 8-year, $10M
UT Austin Grand Challenge
What’s an Information School?
“The place where people & technology meet” ~ Wobbrock et al., 2009
“iSchools” now exist at over 100 universities around the world
Task-specific workflows
• Pros:
– Empower workers
for complex tasks
• Cons:
– Need new workflow
for every task
– Complicated, difficult
to formulate
Noronha et al 2011
(image analysis)
Lasecki et al 2012
(transcription)
64

More Related Content

Similar to Automated Models for Quantifying Centrality of Survey Responses

Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"
Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"
Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"Fwdays
 
It Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
It Does What You Say, Not What You Mean: Lessons From A Decade of Program RepairIt Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
It Does What You Say, Not What You Mean: Lessons From A Decade of Program RepairClaire Le Goues
 
DSLs Internas e Ruby
DSLs Internas e RubyDSLs Internas e Ruby
DSLs Internas e RubyFabio Kung
 
Interactive Questions and Answers - London Information Retrieval Meetup
Interactive Questions and Answers - London Information Retrieval MeetupInteractive Questions and Answers - London Information Retrieval Meetup
Interactive Questions and Answers - London Information Retrieval MeetupSease
 
Waking the Data Scientist at 2am: Detect Model Degradation on Production Mod...
Waking the Data Scientist at 2am:  Detect Model Degradation on Production Mod...Waking the Data Scientist at 2am:  Detect Model Degradation on Production Mod...
Waking the Data Scientist at 2am: Detect Model Degradation on Production Mod...Chris Fregly
 
Measuring Coverage From E2E Tests
Measuring Coverage From E2E TestsMeasuring Coverage From E2E Tests
Measuring Coverage From E2E TestsAnand Bagmar
 
Release management with NuGet/Chocolatey/JIRA
Release management with NuGet/Chocolatey/JIRARelease management with NuGet/Chocolatey/JIRA
Release management with NuGet/Chocolatey/JIRAYaroslav Serhieiev
 
KnittingBoar Toronto Hadoop User Group Nov 27 2012
KnittingBoar Toronto Hadoop User Group Nov 27 2012KnittingBoar Toronto Hadoop User Group Nov 27 2012
KnittingBoar Toronto Hadoop User Group Nov 27 2012Adam Muise
 
How to not blow up spaceships
How to not blow up spaceshipsHow to not blow up spaceships
How to not blow up spaceshipsSabin Marcu
 
ADT02 - Java 8 Lambdas and the Streaming API
ADT02 - Java 8 Lambdas and the Streaming APIADT02 - Java 8 Lambdas and the Streaming API
ADT02 - Java 8 Lambdas and the Streaming APIMichael Remijan
 
Distributed GLM with H2O - Atlanta Meetup
Distributed GLM with H2O - Atlanta MeetupDistributed GLM with H2O - Atlanta Meetup
Distributed GLM with H2O - Atlanta MeetupSri Ambati
 
Machine Learning with ML.NET and Azure - Andy Cross
Machine Learning with ML.NET and Azure - Andy CrossMachine Learning with ML.NET and Azure - Andy Cross
Machine Learning with ML.NET and Azure - Andy CrossAndrew Flatters
 
The Why and What of Pattern Lab
The Why and What of Pattern LabThe Why and What of Pattern Lab
The Why and What of Pattern LabDave Olsen
 
Chris OBrien - Azure DevOps for managing work
Chris OBrien - Azure DevOps for managing workChris OBrien - Azure DevOps for managing work
Chris OBrien - Azure DevOps for managing workChris O'Brien
 
Training Neural Networks
Training Neural NetworksTraining Neural Networks
Training Neural NetworksDatabricks
 
Knitting boar - Toronto and Boston HUGs - Nov 2012
Knitting boar - Toronto and Boston HUGs - Nov 2012Knitting boar - Toronto and Boston HUGs - Nov 2012
Knitting boar - Toronto and Boston HUGs - Nov 2012Josh Patterson
 

Similar to Automated Models for Quantifying Centrality of Survey Responses (20)

Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"
Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"
Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"
 
It Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
It Does What You Say, Not What You Mean: Lessons From A Decade of Program RepairIt Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
It Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
 
DSLs Internas e Ruby
DSLs Internas e RubyDSLs Internas e Ruby
DSLs Internas e Ruby
 
Interactive Questions and Answers - London Information Retrieval Meetup
Interactive Questions and Answers - London Information Retrieval MeetupInteractive Questions and Answers - London Information Retrieval Meetup
Interactive Questions and Answers - London Information Retrieval Meetup
 
Waking the Data Scientist at 2am: Detect Model Degradation on Production Mod...
Waking the Data Scientist at 2am:  Detect Model Degradation on Production Mod...Waking the Data Scientist at 2am:  Detect Model Degradation on Production Mod...
Waking the Data Scientist at 2am: Detect Model Degradation on Production Mod...
 
Measuring Coverage From E2E Tests
Measuring Coverage From E2E TestsMeasuring Coverage From E2E Tests
Measuring Coverage From E2E Tests
 
Release management with NuGet/Chocolatey/JIRA
Release management with NuGet/Chocolatey/JIRARelease management with NuGet/Chocolatey/JIRA
Release management with NuGet/Chocolatey/JIRA
 
KnittingBoar Toronto Hadoop User Group Nov 27 2012
KnittingBoar Toronto Hadoop User Group Nov 27 2012KnittingBoar Toronto Hadoop User Group Nov 27 2012
KnittingBoar Toronto Hadoop User Group Nov 27 2012
 
How to not blow up spaceships
How to not blow up spaceshipsHow to not blow up spaceships
How to not blow up spaceships
 
Contributing to Akka (Hacktoberfest 2020)
Contributing to Akka (Hacktoberfest 2020)Contributing to Akka (Hacktoberfest 2020)
Contributing to Akka (Hacktoberfest 2020)
 
ADT02 - Java 8 Lambdas and the Streaming API
ADT02 - Java 8 Lambdas and the Streaming APIADT02 - Java 8 Lambdas and the Streaming API
ADT02 - Java 8 Lambdas and the Streaming API
 
Distributed GLM with H2O - Atlanta Meetup
Distributed GLM with H2O - Atlanta MeetupDistributed GLM with H2O - Atlanta Meetup
Distributed GLM with H2O - Atlanta Meetup
 
Machine Learning with ML.NET and Azure - Andy Cross
Machine Learning with ML.NET and Azure - Andy CrossMachine Learning with ML.NET and Azure - Andy Cross
Machine Learning with ML.NET and Azure - Andy Cross
 
デザインシステムの海で3年間もがいてみて
デザインシステムの海で3年間もがいてみてデザインシステムの海で3年間もがいてみて
デザインシステムの海で3年間もがいてみて
 
The Why and What of Pattern Lab
The Why and What of Pattern LabThe Why and What of Pattern Lab
The Why and What of Pattern Lab
 
Chris OBrien - Azure DevOps for managing work
Chris OBrien - Azure DevOps for managing workChris OBrien - Azure DevOps for managing work
Chris OBrien - Azure DevOps for managing work
 
NLP Project Full Circle
NLP Project Full CircleNLP Project Full Circle
NLP Project Full Circle
 
Oop principles
Oop principlesOop principles
Oop principles
 
Training Neural Networks
Training Neural NetworksTraining Neural Networks
Training Neural Networks
 
Knitting boar - Toronto and Boston HUGs - Nov 2012
Knitting boar - Toronto and Boston HUGs - Nov 2012Knitting boar - Toronto and Boston HUGs - Nov 2012
Knitting boar - Toronto and Boston HUGs - Nov 2012
 

More from Matthew Lease

Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...Matthew Lease
 
Explainable Fact Checking with Humans in-the-loop
Explainable Fact Checking with Humans in-the-loopExplainable Fact Checking with Humans in-the-loop
Explainable Fact Checking with Humans in-the-loopMatthew Lease
 
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...Matthew Lease
 
AI & Work, with Transparency & the Crowd
AI & Work, with Transparency & the Crowd AI & Work, with Transparency & the Crowd
AI & Work, with Transparency & the Crowd Matthew Lease
 
Designing Human-AI Partnerships to Combat Misinfomation
Designing Human-AI Partnerships to Combat Misinfomation Designing Human-AI Partnerships to Combat Misinfomation
Designing Human-AI Partnerships to Combat Misinfomation Matthew Lease
 
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...Matthew Lease
 
But Who Protects the Moderators?
But Who Protects the Moderators?But Who Protects the Moderators?
But Who Protects the Moderators?Matthew Lease
 
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...Matthew Lease
 
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...Matthew Lease
 
Fact Checking & Information Retrieval
Fact Checking & Information RetrievalFact Checking & Information Retrieval
Fact Checking & Information RetrievalMatthew Lease
 
Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...
Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...
Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...Matthew Lease
 
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...Matthew Lease
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesMatthew Lease
 
Systematic Review is e-Discovery in Doctor’s Clothing
Systematic Review is e-Discovery in Doctor’s ClothingSystematic Review is e-Discovery in Doctor’s Clothing
Systematic Review is e-Discovery in Doctor’s ClothingMatthew Lease
 
The Rise of Crowd Computing (July 7, 2016)
The Rise of Crowd Computing (July 7, 2016)The Rise of Crowd Computing (July 7, 2016)
The Rise of Crowd Computing (July 7, 2016)Matthew Lease
 
The Rise of Crowd Computing - 2016
The Rise of Crowd Computing - 2016The Rise of Crowd Computing - 2016
The Rise of Crowd Computing - 2016Matthew Lease
 
The Rise of Crowd Computing (December 2015)
The Rise of Crowd Computing (December 2015)The Rise of Crowd Computing (December 2015)
The Rise of Crowd Computing (December 2015)Matthew Lease
 
Toward Better Crowdsourcing Science
 Toward Better Crowdsourcing Science Toward Better Crowdsourcing Science
Toward Better Crowdsourcing ScienceMatthew Lease
 
Beyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms
Beyond Mechanical Turk: An Analysis of Paid Crowd Work PlatformsBeyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms
Beyond Mechanical Turk: An Analysis of Paid Crowd Work PlatformsMatthew Lease
 
The Search for Truth in Objective & Subject Crowdsourcing
The Search for Truth in Objective & Subject CrowdsourcingThe Search for Truth in Objective & Subject Crowdsourcing
The Search for Truth in Objective & Subject CrowdsourcingMatthew Lease
 

More from Matthew Lease (20)

Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
 
Explainable Fact Checking with Humans in-the-loop
Explainable Fact Checking with Humans in-the-loopExplainable Fact Checking with Humans in-the-loop
Explainable Fact Checking with Humans in-the-loop
 
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
 
AI & Work, with Transparency & the Crowd
AI & Work, with Transparency & the Crowd AI & Work, with Transparency & the Crowd
AI & Work, with Transparency & the Crowd
 
Designing Human-AI Partnerships to Combat Misinfomation
Designing Human-AI Partnerships to Combat Misinfomation Designing Human-AI Partnerships to Combat Misinfomation
Designing Human-AI Partnerships to Combat Misinfomation
 
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
 
But Who Protects the Moderators?
But Who Protects the Moderators?But Who Protects the Moderators?
But Who Protects the Moderators?
 
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
 


Automated Models for Quantifying Centrality of Survey Responses

  • 1. Matt Lease Associate Professor School of Information The University of Texas at Austin Amazon Scholar Human-in-the-loop Services Amazon Web Services (AWS) Automated Models for Quantifying Centrality of Survey Responses 1 Lab: ir.ischool.utexas.edu @mattlease Slides: slideshare.net/mattlease
  • 2. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark. Human-in-the-loop Services • 3 Team Products: Mechanical Turk, Sagemaker Ground Truth, and Augmented AI (A2I) • https://www.amazon.science/research-awards – Cash and/or AWS credits • Summer, sabbatical, or longer engagements – https://www.amazon.science/scholars – https://www.amazon.science/visiting-academics • https://www.amazon.science/tag/internships
  • 4. What’s the capital of Texas? Austin Austin Houston 4
  • 5. What’s the capital of Texas? Austin Austin Houston Majority Vote 5
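Majority voting over simple labels can be sketched in a few lines. This is a minimal illustration, not code from the talk; the function name `majority_vote` is hypothetical, and ties are broken by first occurrence.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

# The three answers shown on the slide.
answers = ["Austin", "Austin", "Houston"]
winner = majority_vote(answers)

# With free-text captions every label is unique, so each gets one vote
# and the "winner" is just whichever caption happened to come first.
captions = ["A cat is eating", "The cat eats", "A beautiful picture"]
caption_winner = majority_vote(captions)
```

The second call shows why exact-match voting carries no signal once the label space is large.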
  • 6. Simple annotation & aggregation Classification • sentiment analysis • image categorization Ordinal rating • product & movie reviews • search relevance Aggregation • Crowdsourcing: quality control • Experts: wisdom of crowds • Goal: select best label available for each item (no label fusion) 6
  • 7. Caption this image: 7 A cat is eating The cat eats A beautiful picture
  • 8. Caption this image: When majority voting falls short Problem: large label space, exact match doesn’t work! 8 A cat is eating The cat eats A beautiful picture
  • 9. What about complex annotations? Ranked lists Parse trees A1: A cat is eating A2: The cat eats A3: A beautiful picture Image captions Range sequences 9
  • 10. 10 Alexander Braylan1 and Matthew Lease2 1 Dept. of Computer Science & 2 School of Information The University of Texas at Austin Modeling and Aggregation of Complex Annotations via Annotation Distance Code & Data: https://github.com/Praznat/annotationmodeling https://github.com/Praznat/annotationmodeling
  • 11. Roadmap • Prior work • Approach • Example outputs • Conclusion 11 https://github.com/Praznat/annotationmodeling
  • 12. Aggregating Simple Labels • Hundreds of papers • Multiple benchmarking studies • Rich body of Bayesian modeling • General-purpose aggregation models for simple labels don’t support complex labels. Models shown: Dawid-Skene, MACE, Hierarchical Dawid-Skene, Item Difficulty, Logistic Random Effects. Source: Paun et al. 2018, “Comparing Bayesian Models of Annotation”
  • 13. Task-specific models • Pros: – Task specialization maximizes accuracy • Cons: – Need new model for every task – Complicated, difficult to formulate Nguyen et al 2017 (Sequences) Lin, Mausam, and Weld 2012 (Math) 13 https://github.com/Praznat/annotationmodeling
  • 14. Our goals • We want aggregation for complex data types – Build on ideas from simple label aggregation models • We want to generalize across many labeling tasks – Can we reduce problem to common simpler state space? 14 https://github.com/Praznat/annotationmodeling
  • 15. Roadmap • Prior work • Approach • Example outputs • Conclusion 15 https://github.com/Praznat/annotationmodeling
  • 16. Key Insight Partial credit matching via task-specific distance function • Adopt or define a distance function for each annotation task • Model annotation distances uniformly across tasks • Distance functions already exist for many task types – Free-text responses, e.g., survey questions 16 https://github.com/Praznat/annotationmodeling
  • 17. Calculate distances “a cat is eating” “cat is eating” “a beautiful picture” “the cat eats” 17 • Example task: free text answer • Example distance function: string edit distance https://github.com/Praznat/annotationmodeling
  • 18. Calculate distances “a cat is eating” “cat is eating” “a beautiful picture” “the cat eats” 0.05 0.1 0.1 18 • Example task: free text answer • Example distance function: string edit distance https://github.com/Praznat/annotationmodeling
  • 19. Calculate distances “a cat is eating” “cat is eating” “a beautiful picture” “the cat eats” 0.8 0.82 0.05 0.1 0.1 19 0.82 • Example task: free text answer • Example distance function: string edit distance https://github.com/Praznat/annotationmodeling
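As a rough sketch of the string edit distance named on these slides, the code below computes Levenshtein distance and divides by the longer string's length to land in [0, 1]. The normalization is an assumption made for illustration; the slides do not say how their example values (0.05, 0.1, 0.8, ...) were scaled, and this sketch is not expected to reproduce them.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def normalized_edit_distance(a: str, b: str) -> float:
    """Edit distance scaled by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0

near = normalized_edit_distance("a cat is eating", "cat is eating")
far = normalized_edit_distance("a cat is eating", "a beautiful picture")
```

Here `near` is smaller than `far`, matching the intuition on the slide that the two cat captions should sit close together while the off-topic caption sits far away.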
  • 21. Example Distance: Word embeddings 21 https://github.com/Praznat/annotationmodeling
  • 22. Distance function properties
      Properties of distance functions: non-negativity, symmetry, triangle inequality
      Data                  | Free Text  | Rankings
      Example evaluation fn | BLEU(x, y) |
      Example distance fn   |            |
      Non-negativity        | ✓          | ✓
      Symmetry              | ✓          | ✓
      Triangle inequality   | ✓          | ✓
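The three properties listed on this slide can be checked numerically on any pairwise distance matrix. The helper below is a hypothetical sketch (the name `check_metric_properties` and the tolerance are assumptions), and the example matrix is made up, loosely following the values drawn on the "Calculate distances" slides.

```python
def check_metric_properties(matrix, tol=1e-9):
    """Report which distance-function properties hold for a square matrix."""
    idx = range(len(matrix))
    return {
        "non_negativity": all(matrix[i][j] >= -tol for i in idx for j in idx),
        "symmetry": all(abs(matrix[i][j] - matrix[j][i]) <= tol
                        for i in idx for j in idx),
        "triangle_inequality": all(
            matrix[i][k] <= matrix[i][j] + matrix[j][k] + tol
            for i in idx for j in idx for k in idx),
    }

# Illustrative distances between four captions (values are made up).
captions_matrix = [
    [0.00, 0.05, 0.10, 0.80],
    [0.05, 0.00, 0.10, 0.82],
    [0.10, 0.10, 0.00, 0.82],
    [0.80, 0.82, 0.82, 0.00],
]
ok = check_metric_properties(captions_matrix)

# A matrix that breaks the triangle inequality: 5 > 1 + 1.
broken = check_metric_properties([[0, 1, 5], [1, 0, 1], [5, 1, 0]])
```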
  • 23. Calculate distances “a cat is eating” “cat is eating” “a beautiful picture” “the cat eats” 0.8 0.82 0.05 0.1 0.1 23 0.82 https://github.com/Praznat/annotationmodeling
  • 24. A1: A cat is eating A2: The cat eats A3: A beautiful picture 0.1 0.6 0.3 24 All tasks reduce to matrices of distances https://github.com/Praznat/annotationmodeling
  • 25. How to aggregate given distances • Local selection model • Global selection model • Combined 25 Current item Other items https://github.com/Praznat/annotationmodeling
  • 26. Local approach: Smallest Avg Distance (SAD) • For each question: compute average distance between responses • The response with smallest average distance is locally most normative, generalizing majority vote • Independence between items • Local approach does not model respondent agreement 26 Current item Other items https://github.com/Praznat/annotationmodeling
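The local rule described on this slide can be sketched as follows, assuming the responses to one question are already summarized as a square distance matrix. The function name and the example matrix are illustrative, not taken from the authors' repository.

```python
def smallest_average_distance(distances):
    """Return (index of most central response, list of average distances).

    distances[i][j] is the distance between responses i and j for one
    question; at least two responses are required.
    """
    n = len(distances)
    averages = [
        sum(row[j] for j in range(n) if j != i) / (n - 1)
        for i, row in enumerate(distances)
    ]
    best = min(range(n), key=averages.__getitem__)
    return best, averages

# Made-up distances for: "a cat is eating", "cat is eating",
# "the cat eats", "a beautiful picture".
caption_distances = [
    [0.00, 0.05, 0.10, 0.80],
    [0.05, 0.00, 0.10, 0.82],
    [0.10, 0.10, 0.00, 0.82],
    [0.80, 0.82, 0.82, 0.00],
]
best_caption, caption_averages = smallest_average_distance(caption_distances)

# With a 0/1 exact-match distance the rule reduces to majority vote:
# answers Austin, Austin, Houston.
vote_distances = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
best_vote, _ = smallest_average_distance(vote_distances)
```

The second example illustrates the slide's claim that SAD generalizes majority voting: one of the two "Austin" responses is selected.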
  • 27. Global approach: Best Available User (BAU) • Score each participant by their average distance to all other participants across all questions • The participant with lowest score is globally most normative; treat their response as most normative • Global approach ignores distance observed on the current item 27 Current item Other items https://github.com/Praznat/annotationmodeling
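A minimal sketch of the global rule: average each participant's distances to everyone else over all questions, then pick the participant with the lowest score. The input layout (`{item: {(user_u, user_v): distance}}`), the function name, and the example values are assumptions chosen for illustration.

```python
from collections import defaultdict

def best_available_user(item_distances):
    """Return (most central user, per-user average distance).

    item_distances maps each item to a dict of {(user_u, user_v): distance},
    with every unordered pair of users listed once per item.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for pairs in item_distances.values():
        for (u, v), d in pairs.items():
            for user in (u, v):  # a pairwise distance counts for both users
                totals[user] += d
                counts[user] += 1
    scores = {user: totals[user] / counts[user] for user in totals}
    return min(scores, key=scores.get), scores

# Made-up distances for three participants on two questions.
example = {
    "q1": {("u1", "u2"): 0.10, ("u1", "u3"): 0.80, ("u2", "u3"): 0.82},
    "q2": {("u1", "u2"): 0.20, ("u1", "u3"): 0.70, ("u2", "u3"): 0.60},
}
best_user, user_scores = best_available_user(example)
```

Under this rule the response from `best_user` would be chosen for every question, which is why the slide notes that the distances observed on the current item are ignored.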
  • 28. Can we get best of both worlds? • Want a method that combines: – Best available user (global) – Smallest avg distance (local) • Should build on rich history of work on Bayesian annotation modeling • Need a principled framework for modeling annotation distance matrices weights votes weighted voting 28 https://github.com/Praznat/annotationmodeling
  • 29. Multidimensional Annotation Scaling (MAS) • Based on Multidimensional Scaling (Kruskal & Wish 1978) • Probabilistic model of multi- item distance matrices • “Hierarchical Bayesian” – Additional learned parameters represent crowd effects such as worker reliability A cat is eating The cat eats A beautiful picture 29 https://github.com/Praznat/annotationmodeling
  • 30. MAS Objective 1: Likelihood. Multidimensional Scaling objective: $D_{iuv} \sim \mathcal{N}(\lVert \varepsilon_{iu} - \varepsilon_{iv} \rVert, \sigma)$ • $D_{iuv}$: observed distance • $\varepsilon_{iu}$: annotation embedding • $\sigma$: error scale. Figure: “a cat is eating”, “cat is eating”, “a beautiful picture”, “the cat eats”, with pairwise distances 0.8, 0.82, 0.05, 0.1, 0.1, 0.82
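The likelihood term above can be evaluated directly for a single item. The sketch below assumes that the indices u and v range over the annotations of that item, that σ is the standard deviation of the Gaussian, and that embeddings are plain coordinate tuples; none of this is confirmed by the slides, and the full MAS model also includes a prior that is not shown here.

```python
import math

def mds_log_likelihood(distances, embeddings, sigma):
    """Gaussian log-likelihood of observed distances given embeddings.

    distances: {(u, v): observed distance} for one item
    embeddings: {u: tuple of coordinates}
    sigma: error scale, treated here as a standard deviation (assumption)
    """
    log_lik = 0.0
    for (u, v), observed in distances.items():
        predicted = math.dist(embeddings[u], embeddings[v])
        log_lik += (-0.5 * math.log(2 * math.pi * sigma ** 2)
                    - (observed - predicted) ** 2 / (2 * sigma ** 2))
    return log_lik

# Made-up distances between three annotations of one item.
observed = {("a", "b"): 0.1, ("a", "c"): 0.8, ("b", "c"): 0.8}

# An embedding that roughly reproduces the distances ...
good = {"a": (0.0, 0.0), "b": (0.1, 0.0), "c": (0.05, 0.8)}
# ... and one that collapses all annotations onto a single point.
bad = {"a": (0.0, 0.0), "b": (0.0, 0.0), "c": (0.0, 0.0)}

ll_good = mds_log_likelihood(observed, good, sigma=0.1)
ll_bad = mds_log_likelihood(observed, bad, sigma=0.1)
```

Fitting the model would mean searching for embeddings (and σ) that raise this value; the sketch only scores a given configuration.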
  • 32. MAS Objective 2: Prior “a cat is eating” “cat is eating” “a beautiful picture” “the cat eats” Pseudo-gold 32 https://github.com/Praznat/annotationmodeling
  • 33. MAS Objective 2: Prior “a cat is eating” “cat is eating” “a beautiful picture” “the cat eats” 33 https://github.com/Praznat/annotationmodeling
  • 34. MAS Objective 2: Prior “a cat is eating” “cat is eating” “a beautiful picture” “the cat eats” 34 https://github.com/Praznat/annotationmodeling
  • 35. MAS Objective 2: Prior 35 https://github.com/Praznat/annotationmodeling
  • 36. MAS Objective 2: Prior 36 https://github.com/Praznat/annotationmodeling
  • 37. Roadmap • Prior work • Approach • Example outputs • Conclusion 37 https://github.com/Praznat/annotationmodeling
  • 38. Example Output: father
      Response | SAD | MAS
      He always speaks ill about his father behind back. | 0.78 | 0.16
      He always speaks ill of his father behind his back. | 0.71 | 0.30
      He always talks about his father behind his back. | 0.74 | 0.50
      He always speaks ill of his father | 0.78 | 0.55
      He always speak ill of his father. | 0.79 | 0.62
      He is always talking about his father behind his back. | 0.82 | 0.63
      He always says behind his father. | 0.90 | 0.72
      He always talks about his dad behind his back. | 0.83 | 0.73
  • 39. Example Output: she says
      Response | SAD | MAS
      Please be sure to take a note of what she says. | 0.77 | 0.16
      Please take a note of what she says. | 0.84 | 0.30
      Be sure to take a warning notice what she says. | 0.86 | 0.46
      Please be sure to take notes what she says. | 0.81 | 0.48
      Please take a note what she say. | 0.92 | 0.73
      Please be sure to take instructions for her saying. | 0.93 | 0.76
      Make sure to insert disclaimer about what she says. | 0.93 | 0.80
      Please make a memo whatever she says. | 0.99 | 0.82
  • 40. Example Output: quiet
      Response | SAD | MAS
      As long as you keep quiet you may stay here | 0.83 | 0.26
      You can stay here as long as you keep quiet. | 0.86 | 0.39
      You may stay here if you keep quiet. | 0.81 | 0.39
      You can stay here if you keep quiet. | 0.82 | 0.57
      So long as you remain quiet you may stay here. | 0.92 | 0.57
      If it is quiet you may stay here | 0.90 | 0.70
      If you keep quiet you can stay here. | 0.92 | 0.81
      You may be here if you keep quiet. | 0.91 | 0.84
  • 41. Example Output: go ahead
      Response | SAD | MAS
      Please go ahead if i am late. | 0.83 | 0.16
      Please go ahead if I'm late. | 0.79 | 0.28
      Please go ahead if I delayed. | 0.82 | 0.51
      Please go without me if I'm late. | 0.91 | 0.62
      Please go ahead if I get late | 0.83 | 0.67
      Please go ahead and leave if I'm late. | 0.88 | 0.74
      If I am late you can go in first. | 1.00 | 0.79
      If I should be late go without me. | 1.00 | 0.81
  • 42. Example Output: married
      Response | SAD | MAS
      Actually they are not married | 0.91 | 0.18
      To tell the truth they are not couple | 0.79 | 0.47
      To tell the truth they are not a married couple | 0.84 | 0.62
      To tell the truth they're not married | 0.89 | 0.63
      In fact they are not couple | 0.94 | 0.69
      to telling the truth we're not married | 0.97 | 0.71
      Two people are not couples in truth | 1.00 | 0.79
  • 43. Roadmap • Prior work • Approach • Example outputs • Conclusion 43 https://github.com/Praznat/annotationmodeling
  • 44. Conclusion • Probabilistic model identifies normative vs. outlier responses by quantifying distance between responses • Many choices for measuring distance between two texts (e.g., character-based or more semantic NLP) • 3 models: local (SAD), global (BAU), or combo (MAS) • Open source: github.com/Praznat/annotationmodeling 44 A1: A cat is eating A2: The cat eats A3: A beautiful picture https://github.com/Praznat/annotationmodeling
  • 45. Future work 45 A1: A cat is eating A2: The cat eats A3: A beautiful picture • From objective labeling tasks to subjective responses • Evaluation on survey data – Collaboration with behavioral science researchers? – Compare distance functions and model settings for utility • Automatic detection of consistent biases in a participant’s responses vs. what’s group normative https://github.com/Praznat/annotationmodeling
  • 46. Matt Lease (University of Texas at Austin) Lab: ir.ischool.utexas.edu @mattlease Slides: slideshare.net/mattlease We thank our many talented crowd workers for their contributions to our research! https://github.com/Praznat/annotationmodeling
      Alexander Braylan and Matthew Lease. Aggregating Complex Annotations via Merging and Matching. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 86-94, 2021.
      Alexander Braylan and Matthew Lease. Modeling and Aggregation of Complex Annotations via Annotation Distances. In Proceedings of the Web Conference, pages 1807-1818, 2020.
  • 48. MTurk: The Early Days 48 • Artificial Intelligence, With Help From the Humans. – J. Pontin. NY Times, March 25, 2007 • Is Amazon's Mechanical Turk a Failure? April 9, 2007 – “As of this writing, there are [only] 128 HITs available on Mechanical Turk.” • Su et al., WWW 2007: “a web-based human data collection system… ‘System M’ ”
  • 49. 2008: the “Gold” Rush Begins. Snow et al., EMNLP (Natural Language Processing) • Annotating human language for natural language processing (NLP) • 22,000 labels for only $26 USD • Crowd’s consensus labels can replace traditional expert labels. “Discovery” sparks rush for “gold” data across areas • Alonso et al., SIGIR Forum (Information Retrieval) • Kittur et al., CHI (Human-Computer Interaction) • Sorokin and Forsyth, CVPR (Computer Vision)
  • 50. 2010-11: Social & Behavioral Sciences • A Guide to Behavioral Experiments on Mechanical Turk – W. Mason and S. Suri (2010). SSRN online. • Crowdsourcing for Human Subjects Research – L. Schmidt (CrowdConf 2010) • Crowdsourcing Content Analysis for Behavioral Research: Insights from Mechanical Turk – Conley & Tosti-Kharas (2010). Academy of Management • Amazon's Mechanical Turk: A New Source of Inexpensive, Yet High-Quality, Data? – M. Buhrmester et al. (2011). Perspectives… 6(1):3-5. – see also: Amazon Mechanical Turk Guide for Social Scientists
  • 51. The Future of Crowd Work (ACM CSCW’13) by Kittur, Nickerson, Bernstein, Gerber, Shaw, Zimmerman, Lease, and Horton 51
  • 55. Tasks & datasets SYNTHETIC DATASETS • Syntactic parse trees – Distance function: evalb • Ranked lists – Distance function: Kendall’s tau REAL DATASETS • Biomedical text sequences – Distance function: Span F1 • Urdu-English translations – Distance function: GLEU 55 Nguyen et al 2017 Zaidan and Callison-Burch 2011
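For the ranked-list task, one common way to turn Kendall's tau into a distance is the fraction of item pairs that the two rankings order differently. The slide only names "Kendall's tau", so this particular normalization is an assumption, and the function below is an illustrative sketch rather than the repository's implementation.

```python
from itertools import combinations

def kendall_tau_distance(rank_a, rank_b):
    """Fraction of item pairs ordered differently by the two rankings.

    Both rankings must contain the same items, each exactly once.
    Returns 0.0 for identical rankings and 1.0 for fully reversed ones.
    """
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    pairs = list(combinations(rank_a, 2))
    discordant = sum(
        1 for x, y in pairs
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )
    return discordant / len(pairs) if pairs else 0.0

same = kendall_tau_distance(["a", "b", "c"], ["a", "b", "c"])
one_swap = kendall_tau_distance(["a", "b", "c"], ["a", "c", "b"])
reversed_order = kendall_tau_distance(["a", "b", "c"], ["c", "b", "a"])
```

This version is non-negative and symmetric by construction, so it fits the distance-function requirements listed earlier in the deck.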
  • 56. Methods Baselines: • Random User (RU): pick one label randomly • ZenCrowd (ZC) (Demartini et al. 2012) – Weighted voting based on exact match (rare!) • Crowd Hidden Markov Model (CHMM) (Nguyen et al. 2017) – Sequence annotation task only Upper bound: Oracle (OR) (always picks best label) • Even if 5 workers answer, limited by best answer any of them gave 56
  • 60. Results
      Task         | Metric | RU    | ZC    | CHMM  | MAS   | Oracle
      Translations | GLEU   | 0.185 | 0.188 | -     | 0.217 | 0.246
      Sequences    | F1     | 0.561 | 0.569 | 0.702 | 0.709 | 0.827
      Parses       | EVALB  | 0.812 | 0.819 | -     | 0.932 | 0.939
      Rankings     |        | 0.491 | 0.495 | -     | 0.710 | 0.724
      • Diverse complex label datasets • MAS scores highest among the non-oracle methods on every dataset, with no model alteration between datasets
  • 62. Goal: Design a future of Artificial Intelligence (AI) technologies to meet society’s needs and values. http://goodsystems.utexas.edu Good Systems: an 8-year, $10M UT Austin Grand Challenge
  • 63. “The place where people & technology meet” ~ Wobbrock et al., 2009 “iSchools” now exist at over 100 universities around the world 63 What’s an Information School?
  • 64. Task-specific workflows • Pros: – Empower workers for complex tasks • Cons: – Need new workflow for every task – Complicated, difficult to formulate Noronha et al 2011 (image analysis) Lasecki et al 2012 (transcription) 64