Crowdsourcing ambiguity aware ground truth - collective intelligence 2017

Lora Aroyo, Professor of Human Computer Interaction at Vrije Universiteit Amsterdam (VU)
Crowdsourcing Ambiguity-Aware Ground Truth
Chris Welty, Anca Dumitrache, Oana Inel, Benjamin Timmermans, Lora Aroyo
June 15th, 2017
Collective Intelligence Conference
Crowdsourcing Myth: Disagreement is Bad
• traditionally, disagreement is considered a measure of poor quality in the annotation task because:
– the task is poorly defined, or
– annotators lack training
• this makes the elimination of disagreement a goal, rather than accepting disagreement as a natural property of semantic interpretation
What if it is GOOD?
Crowdsourcing Myth: All Examples Are Created Equal
• typically annotators are asked whether a binary property holds for each example
• they are often not given a chance to say that the property may partially hold, or holds but is not clearly expressed
• the mathematics of using ground truth treats every example the same: a result either matches the correct label or it does not
• poor quality examples tend to generate high disagreement
• disagreement allows us to weight sentences, giving us the ability to both train and evaluate a machine in a more flexible way
What if they are DIFFERENT?
Related Work on Annotating Ambiguity
http://CrowdTruth.org
● Jurgens (2013): For word-sense disambiguation, the crowd with ambiguity modeling was able to achieve expert-level quality of annotations.
● Cheatham et al. (2014): Current benchmarks in ontology alignment and evaluation are not designed to model uncertainty caused by disagreement between annotators, both expert and crowd.
● Plank et al. (2014): For part-of-speech tagging, most inter-annotator disagreement pointed to debatable cases in linguistic theory.
● Chang et al. (2017): In a workflow of tasks for collecting and correcting labels for text and images, many ambiguous cases could not be resolved by better annotation guidelines or through worker quality control.
CrowdTruth
• Annotator disagreement is signal, not noise.
• It is indicative of the variation in human semantic interpretation of signs.
• It can indicate ambiguity, vagueness, similarity, over-generality, etc., as well as quality.
Medical Relation Extraction
1. workers select the medical relations that hold between the given 2 terms in a sentence
2. closed task - workers pick from a given set of 14 top UMLS relations
3. 975 sentences from PubMed abstracts + distant supervision for term pair extraction
4. 15 workers / sentence
Twitter Event Extraction
1. workers select the events expressed in the tweet
2. closed task - workers pick from a set of 8 events
3. 2,019 English tweets, crawled based on event hashtags
4. 7 workers / tweet
(*) no expert annotators
News Events Identification
1. workers highlight words/phrases that describe an event in the sentence
2. open-ended task - workers can pick any words
3. 200 sentences from the English TimeBank corpus
4. 15 workers / sentence
Sound Interpretation
1. workers give tags that describe a sound
2. open-ended task - workers can pick any tag
3. 284 sounds from the Freesound database
4. 10 workers / sound
Medical Relation Extraction
Patients with ACUTE FEVER and nausea could be suffering from INFLUENZA AH1N1
Is ACUTE FEVER – related to → INFLUENZA AH1N1?
Twitter Event Extraction
News Event Identification
Sound Interpretation
Worker Vector - Closed Task
Medical Relation Extraction
[Figure: one worker's annotation of a sentence as a binary vector over the relation set, with a 1 for each of the three relations this worker selected]
Media Unit Vector - Closed Task
Medical Relation Extraction
[Figure: the worker vectors for one sentence, summed per relation into the media unit vector]
0 1 1 0 0 4 3 0 0 5 1 0
Medical Relation Extraction
Unclear relationship between the two arguments reflected in the disagreement
Medical Relation Extraction
Clearly expressed relation between the two arguments reflected in the agreement
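The closed-task vectors can be illustrated with a minimal sketch. It assumes each worker vector is a binary list with one entry per possible annotation, and that the media unit vector is the element-wise sum of the worker vectors for that unit. The function name and the three-worker data are hypothetical and only serve the illustration.

```python
from typing import List


def media_unit_vector(worker_vectors: List[List[int]]) -> List[int]:
    """Sum binary worker vectors element-wise into one media unit vector."""
    if not worker_vectors:
        return []
    length = len(worker_vectors[0])
    if any(len(vector) != length for vector in worker_vectors):
        raise ValueError("all worker vectors must have the same length")
    # zip(*...) walks the vectors column by column, one column per annotation.
    return [sum(column) for column in zip(*worker_vectors)]


# Hypothetical data: 3 workers, 4 possible relations (not taken from the slides).
workers = [
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
]
unit_vector = media_unit_vector(workers)
print(unit_vector)  # [2, 1, 0, 2]
```

A vector with one dominant entry suggests agreement on a single annotation, while counts spread over several entries suggest disagreement.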
Twitter Event Extraction
The tension building between China, Japan, U.S., and Vietnam is reaching new heights this week - let's stay tuned! #APAC #powerstruggle
Which of the following EVENTS can you identify in the tweet?
Media unit vector: 4 0 0 0 6 0 0 0 2
Labels shown on the slide, in order: islands disputed between China and Japan; anti China protests in Vietnam; None
Unclear description of events in tweet reflected in the disagreement
Twitter Event Extraction
RT @jc_stubbs: Another tragic day in #Ukraine - More than 50 rebels killed as new leader unleashes assault: http://t.co/wcfU3kyAFX
Which of the following EVENTS can you identify in the tweet?
Media unit vector: 0 6 0 0 0 0 0 0 1
Labels shown on the slide, in order: Ukraine crisis 2014; None
Clear description of events in tweet reflected in the agreement
Media Unit Vector - Open Task
Sound Interpretation
Worker 1: balloon exploding
Worker 2: gun
Worker 3: gunshot
Worker 4: loud noise, pop
Worker 5: gun, balloon
+ clustering (e.g. word2vec)
Media unit vector: balloon 2; gun, gunshot 3; loud noise 1; pop 1
Multiple interpretations of the sound reflected in the disagreement
Sound Interpretation
Worker 1: siren
Worker 2: police siren
Worker 3: siren, alarm
Worker 4: siren
Worker 5: annoying
+ clustering (e.g. word2vec)
Media unit vector: siren 4; alarm 1; annoying 1
Clear meaning of the sound reflected in the agreement
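For the open task, the media unit vector can be sketched as counts over tag clusters. The sketch below reuses the five worker answers from the first sound example. The clustering step is replaced by a hand-written tag-to-cluster map, which is an assumption made for illustration and not the word2vec-style clustering the slides mention; the names `TAG_CLUSTER` and `open_task_vector` are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hand-written map from raw tag to cluster label; a stand-in for real clustering.
TAG_CLUSTER: Dict[str, str] = {
    "balloon exploding": "balloon",
    "balloon": "balloon",
    "gun": "gun, gunshot",
    "gunshot": "gun, gunshot",
    "loud noise": "loud noise",
    "pop": "pop",
}


def open_task_vector(worker_tags: List[List[str]]) -> Counter:
    """Count, per cluster, how many workers gave at least one tag in it."""
    counts: Counter = Counter()
    for tags in worker_tags:
        # A set ensures a worker contributes at most once to each cluster.
        clusters = {TAG_CLUSTER.get(tag, tag) for tag in tags}
        counts.update(clusters)
    return counts


worker_tags = [
    ["balloon exploding"],   # Worker 1
    ["gun"],                 # Worker 2
    ["gunshot"],             # Worker 3
    ["loud noise", "pop"],   # Worker 4
    ["gun", "balloon"],      # Worker 5
]
vector = open_task_vector(worker_tags)
print(vector["balloon"], vector["gun, gunshot"], vector["loud noise"], vector["pop"])
# 2 3 1 1
```

Unlike the closed task, the dimensions of the vector are not fixed in advance; they emerge from the tags and from how the tags are clustered.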
News Event Identification
Other than usage in business, Internet technology is also beginning to infiltrate the lifestyle domain.
HIGHLIGHT the words/phrases that refer to an EVENT.
Highlight counts per word: Other 0, usage 3, business 1, Internet 5, technology 4, is 2, also 2, beginning 4, infiltrate 5, lifestyle 3, domain 3
Unclear specification of events in sentence reflected in the disagreement
News Event Identification
Most of the city's monuments were destroyed including a magnificent tiled mosque which dominated the skyline for centuries.
HIGHLIGHT the words/phrases that refer to an EVENT.
Highlight counts per word: Most 1, city 3, monuments 5, were 5, destroyed 12, including 0, magnificent 1, tiled 1, mosque 1, dominated 4, skyline 1, centuries 0
Clear specification of events in sentence reflected in the agreement
Media Unit - Annotation Score
Measures how clearly a media unit expresses an annotation
Unit vector for annotation A6: 0 0 0 0 0 1 0 0 0 0 0 0
Media unit vector: 0 1 1 0 0 4 3 0 0 5 1 0
Cosine = .55
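The cosine value on this slide can be reproduced with a short sketch. Because the annotation vector has a single 1, its cosine with the media unit vector reduces to that annotation's count divided by the Euclidean norm of the media unit vector. The function name is hypothetical, and A6 is assumed to be the sixth entry of the vector.

```python
import math
from typing import Sequence


def unit_annotation_score(unit_vector: Sequence[float], annotation_index: int) -> float:
    """Cosine between a media unit vector and the one-hot vector of one annotation."""
    norm = math.sqrt(sum(x * x for x in unit_vector))
    if norm == 0:
        return 0.0  # no worker selected anything for this unit
    return unit_vector[annotation_index] / norm


# Media unit vector from the slide; A6 is assumed to sit at index 5 (sixth entry).
unit_vector = [0, 1, 1, 0, 0, 4, 3, 0, 0, 5, 1, 0]
score_a6 = unit_annotation_score(unit_vector, 5)
print(round(score_a6, 2))  # 0.55
```

Here the score is 4 / sqrt(53): four workers chose A6, but the other selected annotations spread the vector's length across several dimensions, which lowers the score.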
Experimental Setup
• Goal: what is the most accurate data labeling method?
○ CrowdTruth: media unit - annotation score (continuous value)
○ Majority Vote: decision of the majority of workers (discrete)
○ Single: decision of a single worker, randomly sampled (discrete)
○ Expert: decision of a domain expert (discrete)
• Approach: evaluate against a trusted label set, built from either:
○ agreement between crowd and expert, or
○ manual evaluation when they disagree
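The three crowd-based labeling methods can be contrasted in a minimal sketch. It assumes that `votes` holds one 0/1 judgment per worker for a single annotation on a single media unit, that the continuous score is turned into a label by a threshold, and that "majority" means strictly more than half of the workers. These definitions, the function names, the 0.5 threshold, and the 15-worker example are illustrative assumptions, not the authors' exact procedure.

```python
import random
from typing import List


def crowdtruth_label(score: float, threshold: float) -> bool:
    """Positive when the continuous unit-annotation score reaches the threshold."""
    return score >= threshold


def majority_vote_label(votes: List[int]) -> bool:
    """Positive when strictly more than half of the workers voted 1."""
    return sum(votes) > len(votes) / 2


def single_worker_label(votes: List[int], rng: random.Random) -> bool:
    """Decision of one worker sampled at random from the votes."""
    return bool(rng.choice(votes))


# Hypothetical unit: 4 of 15 workers selected the annotation, score 0.55.
votes = [1] * 4 + [0] * 11
by_score = crowdtruth_label(0.55, threshold=0.5)
by_majority = majority_vote_label(votes)
by_single = single_worker_label(votes, random.Random(0))
print(by_score, by_majority)  # True False
```

In this example the two methods disagree: the annotation is the choice of a minority of workers, so majority vote drops it, while a thresholded score can still keep it.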
Evaluation: F1 score
CrowdTruth performs better than Majority Vote, and at least as well as Expert.
Each task has a different best threshold in the media unit - annotation score.
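Finding a best threshold can be sketched as a sweep: for each candidate threshold, turn the scores into labels, compute F1 against the trusted labels, and keep the threshold with the highest F1. The scores, trusted labels, and candidate thresholds below are made-up toy values, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def f1_score(predicted: Sequence[bool], trusted: Sequence[bool]) -> float:
    """F1 of predicted labels against trusted labels (0.0 when no true positives)."""
    tp = sum(p and t for p, t in zip(predicted, trusted))
    fp = sum(p and not t for p, t in zip(predicted, trusted))
    fn = sum(t and not p for p, t in zip(predicted, trusted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(
    scores: Sequence[float],
    trusted: Sequence[bool],
    thresholds: Sequence[float],
) -> Tuple[float, float]:
    """Return (threshold, F1) for the candidate threshold with the highest F1."""
    results = [
        (f1_score([s >= t for s in scores], trusted), t) for t in thresholds
    ]
    best_f1, best_t = max(results)  # ties go to the higher threshold
    return best_t, best_f1


# Toy values for illustration only.
scores = [0.9, 0.55, 0.4, 0.2, 0.1]
trusted = [True, True, True, False, False]
threshold, f1 = best_threshold(scores, trusted, [0.1, 0.3, 0.5, 0.7, 0.9])
print(threshold, f1)  # 0.3 1.0
```

Running the same sweep separately per task is one way to observe that the best threshold differs between tasks.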
Evaluation: number of workers
Each task reaches a stable F1 for a different number of workers.
Majority Vote never beats CrowdTruth.
Sound Interpretation needs more workers.
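One way to explore the effect of the number of workers is to subsample k workers for a media unit and recompute the unit-annotation score. The sketch below does this for a single unit. It is a simplification: the slides report F1 over whole datasets, whereas this only tracks one score, and the worker vectors and function name are hypothetical.

```python
import math
import random
from typing import List


def score_with_k_workers(
    worker_vectors: List[List[int]],
    annotation_index: int,
    k: int,
    rng: random.Random,
) -> float:
    """Unit-annotation score computed from a random sample of k workers."""
    sample = rng.sample(worker_vectors, k)
    unit = [sum(column) for column in zip(*sample)]
    norm = math.sqrt(sum(x * x for x in unit))
    return unit[annotation_index] / norm if norm else 0.0


# Hypothetical unit with 10 workers and 3 possible annotations.
worker_vectors = [[1, 0, 0]] * 6 + [[0, 1, 0]] * 3 + [[1, 0, 1]]
rng = random.Random(42)
full_score = score_with_k_workers(worker_vectors, 0, len(worker_vectors), rng)
for k in (2, 5, 10):
    # Average over repeated samples to smooth out sampling noise.
    trials = [score_with_k_workers(worker_vectors, 0, k, rng) for _ in range(200)]
    print(k, round(sum(trials) / len(trials), 3))
```

With all 10 workers the score is 7 / sqrt(59); smaller samples vary around it, and the point at which the average settles depends on how much the workers disagree.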
Experiments proved that:
• CrowdTruth performs just as well as domain experts
○ crowd is also cheaper
○ crowd is always available
• capturing ambiguity is essential:
○ majority voting discards important signals in the data
○ using only a few annotators for ground truth is faulty
○ optimal number of workers / media unit is task dependent
CrowdTruth.org
Dumitrache et al.: Empirical Methodology for Crowdsourcing Ground Truth. Semantic Web Journal, Special Issue on Human Computation and Crowdsourcing (HC&C) in the Context of the Semantic Web (in review).
Ambiguity in Crowdsourcing
[Figure: Media Unit, Annotation, Worker]
Disagreement can indicate ambiguity!

  • 1. Crowdsourcing Ambiguity-Aware Ground Truth. Chris Welty, Anca Dumitrache, Oana Inel, Benjamin Timmermans, Lora Aroyo. June 15th, 2017, Collective Intelligence Conference
  • 2. Crowdsourcing Myth: Disagreement is Bad
      • traditionally, disagreement is considered a measure of poor quality in the annotation task because:
          – the task is poorly defined, or
          – annotators lack training
      • this makes the elimination of disagreement a goal, rather than accepting disagreement as a natural property of semantic interpretation
      • What if it is GOOD?
  • 3. Crowdsourcing Myth: All Examples Are Created Equal
      • typically annotators are asked whether a binary property holds for each example
      • they are often not given a chance to say that the property may partially hold, or holds but is not clearly expressed
      • the mathematics of using ground truth treats every example the same: a result either matches the correct one or it does not
      • poor quality examples tend to generate high disagreement
      • disagreement allows us to weight sentences, giving us the ability to both train and evaluate a machine in a more flexible way
      • What if they are DIFFERENT?
  • 4. Related Work on Annotating Ambiguity http://CrowdTruth.org
      ● Juergens (2013): For word-sense disambiguation, the crowd with ambiguity modeling was able to achieve expert-level quality of annotations.
      ● Cheatham et al. (2014): Current benchmarks in ontology alignment and evaluation are not designed to model uncertainty caused by disagreement between annotators, both expert and crowd.
      ● Plank et al. (2014): For part-of-speech tagging, most inter-annotator disagreement pointed to debatable cases in linguistic theory.
      ● Chang et al. (2017): In a workflow of tasks for collecting and correcting labels for text and images, the authors found that many ambiguous cases cannot be resolved by better annotation guidelines or through worker quality control.
  • 5. CrowdTruth
      • Annotator disagreement is signal, not noise.
      • It is indicative of the variation in human semantic interpretation of signs.
      • It can indicate ambiguity, vagueness, similarity, over-generality, etc., as well as quality.
  • 6. Medical Relation Extraction
          1. workers select the medical relations that hold between the given 2 terms in a sentence
          2. closed task - workers pick from a given set of 14 top UMLS relations
          3. 975 sentences from PubMed abstracts + distant supervision for term pair extraction
          4. 15 workers / sentence
       Twitter Event Extraction (*)
          1. workers select the events expressed in the tweet
          2. closed task - workers pick from a set of 8 events
          3. 2,019 English tweets, crawled based on event hashtags
          4. 7 workers / tweet
          (*) no expert annotators
       News Events Identification
          1. workers highlight words/phrases that describe an event in the sentence
          2. open-ended task - workers can pick any words
          3. 200 sentences from the English TimeBank corpus
          4. 15 workers / sentence
       Sound Interpretation
          1. workers give tags that describe a sound
          2. open-ended task - workers can pick any tag
          3. 284 sounds from the Freesound database
          4. 10 workers / sound
  • 7. Medical Relation Extraction
       Sentence: "Patients with ACUTE FEVER and nausea could be suffering from INFLUENZA AH1N1"
       Question: Is ACUTE FEVER – related to → INFLUENZA AH1N1?
  • 12. Worker Vector - Closed Task (Medical Relation Extraction). Figure: a single worker's vector, with a 1 marking each relation the worker selected.
  • 13. Media Unit Vector - Closed Task. Figure: the individual worker selections (fifteen 1s in total) are summed per annotation into the media unit vector 0 1 1 0 0 4 3 0 0 5 1 0.
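The closed-task aggregation on slides 12 and 13 can be sketched roughly as below. This is a minimal illustration and not the CrowdTruth implementation: the function names are made up for this example, and the individual worker selections are hypothetical, arranged only so that their sum reproduces the media unit vector shown on the slide.

```python
from typing import List


def worker_vector(selected: List[int], n_annotations: int) -> List[int]:
    """Binary vector with a 1 at each annotation index the worker selected."""
    vec = [0] * n_annotations
    for idx in selected:
        vec[idx] = 1
    return vec


def media_unit_vector(worker_vectors: List[List[int]]) -> List[int]:
    """Element-wise sum of all worker vectors collected for one media unit."""
    return [sum(column) for column in zip(*worker_vectors)]


# Hypothetical selections (0-based annotation indices): 15 workers, each
# picking one annotation. The real per-worker choices are not in the slides.
selections = [[1], [2], [5], [5], [5], [5], [6], [6], [6],
              [9], [9], [9], [9], [9], [10]]
vectors = [worker_vector(s, 12) for s in selections]
unit_vector = media_unit_vector(vectors)
print(unit_vector)
```

In a real closed task a worker may select several annotations at once, in which case the entries of the media unit vector can sum to more than the number of workers.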
  • 14. Medical Relation Extraction: an unclear relationship between the two arguments is reflected in the disagreement
  • 15. Medical Relation Extraction: a clearly expressed relation between the two arguments is reflected in the agreement
  • 16. Twitter Event Extraction: an unclear description of events in the tweet is reflected in the disagreement
       Tweet: "The tension building between China, Japan, U.S., and Vietnam is reaching new heights this week - let's stay tuned! #APAC #powerstruggle"
       Question: Which of the following EVENTS can you identify in the tweet?
       Worker selections: islands disputed between China and Japan: 4; anti China protests in Vietnam: 6; None: 2; all other events: 0
  • 17. Twitter Event Extraction: a clear description of events in the tweet is reflected in the agreement
       Tweet: "RT @jc_stubbs: Another tragic day in #Ukraine - More than 50 rebels killed as new leader unleashes assault: http://t.co/wcfU3kyAFX"
       Question: Which of the following EVENTS can you identify in the tweet?
       Worker selections: Ukraine crisis 2014: 6; None: 1; all other events: 0
  • 18. Media Unit Vector - Open Task (Sound Interpretation): multiple interpretations of the sound are reflected in the disagreement
       Worker 1: balloon exploding
       Worker 2: gun
       Worker 3: gunshot
       Worker 4: loud noise, pop
       Worker 5: gun, balloon
       Tag counts after clustering (e.g. word2vec): balloon: 2; gun, gunshot: 3; loud noise: 1; pop: 1
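For the open task, the tallying step on slide 18 might look something like the sketch below. The slide names word2vec as one possible clustering method. Here a small hand-written lookup table stands in for that step, which is an assumption made purely for illustration, as are the function names.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical stand-in for the clustering step: raw tag -> cluster label.
# A real pipeline might derive these groups from word embeddings instead.
CLUSTERS: Dict[str, str] = {
    "balloon exploding": "balloon",
    "gunshot": "gun",
}


def normalize(tag: str) -> str:
    """Map a raw tag to its cluster label (or to itself if unclustered)."""
    tag = tag.strip().lower()
    return CLUSTERS.get(tag, tag)


def open_task_vector(worker_tags: List[List[str]]) -> Counter:
    """Count, per cluster, how many workers used a tag from that cluster."""
    counts: Counter = Counter()
    for tags in worker_tags:
        # The set makes each worker count at most once per cluster.
        counts.update({normalize(t) for t in tags})
    return counts


# Worker tags as listed on the slide.
worker_tags = [
    ["balloon exploding"],
    ["gun"],
    ["gunshot"],
    ["loud noise", "pop"],
    ["gun", "balloon"],
]
tag_counts = open_task_vector(worker_tags)
print(dict(tag_counts))
```

With this lookup table the counts agree with the slide: balloon 2, gun/gunshot 3, loud noise 1, pop 1.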
  • 19. Sound Interpretation: a clear meaning of the sound is reflected in the agreement
       Worker 1: siren
       Worker 2: police siren
       Worker 3: siren, alarm
       Worker 4: siren
       Worker 5: annoying
       Tag counts after clustering (e.g. word2vec): siren: 4; alarm: 1; annoying: 1
  • 20. News Event Identification: an unclear specification of events in the sentence is reflected in the disagreement
       Sentence: "Other than usage in business, Internet technology is also beginning to infiltrate the lifestyle domain."
       Task: HIGHLIGHT the words/phrases that refer to an EVENT.
       Highlights per word: Other: 0; usage: 3; business: 1; Internet: 5; technology: 4; is: 2; also: 2; beginning: 4; infiltrate: 5; lifestyle: 3; domain: 3
  • 21. News Event Identification: a clear specification of events in the sentence is reflected in the agreement
       Sentence: "Most of the city's monuments were destroyed including a magnificent tiled mosque which dominated the skyline for centuries."
       Task: HIGHLIGHT the words/phrases that refer to an EVENT.
       Highlights per word: Most: 1; city: 3; monuments: 5; were: 5; destroyed: 12; including: 0; magnificent: 1; tiled: 1; mosque: 1; dominated: 4; skyline: 1; centuries: 0
  • 22. Media Unit - Annotation Score: measures how clearly a media unit expresses an annotation. Figure: the cosine between the unit vector for annotation A6 and the media unit vector 0 1 1 0 0 4 3 0 0 5 1 0 is .55.
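The score on slide 22 can be reproduced with a few lines of arithmetic. The sketch below assumes that A6 refers to the sixth entry of the media unit vector (value 4), which is consistent with the .55 shown on the slide. The function name is hypothetical.

```python
import math
from typing import Sequence


def unit_annotation_score(unit_vector: Sequence[float], annotation_index: int) -> float:
    """Cosine between the media unit vector and a one-hot annotation vector.

    Because the annotation vector has a single 1, the dot product is just the
    selected component, and the cosine reduces to component / vector norm.
    """
    norm = math.sqrt(sum(v * v for v in unit_vector))
    if norm == 0:
        return 0.0  # no worker selected anything for this media unit
    return unit_vector[annotation_index] / norm


media_unit = [0, 1, 1, 0, 0, 4, 3, 0, 0, 5, 1, 0]
score_a6 = unit_annotation_score(media_unit, 5)  # assumed: A6 is index 5 (0-based)
print(round(score_a6, 2))  # 0.55, i.e. 4 / sqrt(53)
```

A score near 1 means almost all selections fell on that one annotation, while a low score means the selections were spread over other annotations.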
  • 23. Experimental Setup
      • Goal: what is the most accurate data labeling method?
          ○ CrowdTruth: media unit-annotation score (continuous value)
          ○ Majority Vote: decision of the majority of workers (discrete)
          ○ Single: decision of a single worker, randomly sampled (discrete)
          ○ Expert: decision of a domain expert (discrete)
      • Approach: evaluate against a trusted label set built from either:
          ○ agreement between crowd and expert, or
          ○ manual evaluation when they disagree
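The three crowd-based labeling methods compared in the setup can be contrasted in a short sketch. Everything here is an illustrative assumption rather than the authors' code: the function names, the reading of "majority" as more than half of the workers, and the 0.5 threshold (the slides only state that the best threshold differs per task).

```python
import random
from typing import List, Optional


def crowdtruth_label(score: float, threshold: float) -> bool:
    """Positive if the continuous media unit-annotation score clears a threshold."""
    return score >= threshold


def majority_vote_label(votes_for: int, n_workers: int) -> bool:
    """Positive if more than half of the workers selected the annotation (assumed rule)."""
    return votes_for > n_workers / 2


def single_worker_label(worker_votes: List[bool],
                        rng: Optional[random.Random] = None) -> bool:
    """Decision of one randomly sampled worker."""
    rng = rng or random.Random()
    return rng.choice(worker_votes)


# Annotation selected by 5 of 15 workers in the vector 0 1 1 0 0 4 3 0 0 5 1 0.
votes_for, n_workers = 5, 15
score = 5 / 53 ** 0.5  # cosine score for that annotation, about 0.69

ct_label = crowdtruth_label(score, threshold=0.5)     # hypothetical threshold
mv_label = majority_vote_label(votes_for, n_workers)
single = single_worker_label([True] * 5 + [False] * 10, random.Random(0))
print(ct_label, mv_label, single)
```

In this example the annotation is the most frequently chosen one for the media unit, yet it is rejected by majority vote because only a third of the workers picked it. The continuous score keeps that signal.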
  • 25. Evaluation: F1 score. CrowdTruth performs better than Majority Vote, and at least as well as Expert. Each task has a different best threshold on the media unit - annotation score.
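Finding a task-specific threshold, as described on the F1 slide, amounts to sweeping candidate thresholds and scoring each against the trusted labels. The sketch below uses made-up toy scores and labels. None of these numbers come from the experiments, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def f1_score(predicted: Sequence[bool], trusted: Sequence[bool]) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(p and t for p, t in zip(predicted, trusted))
    fp = sum(p and not t for p, t in zip(predicted, trusted))
    fn = sum(t and not p for p, t in zip(predicted, trusted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores: Sequence[float], trusted: Sequence[bool],
                   candidates: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, F1) for the candidate with the highest F1."""
    results: List[Tuple[float, float]] = []
    for th in candidates:
        predicted = [s >= th for s in scores]
        results.append((th, f1_score(predicted, trusted)))
    # On ties, max() keeps the first (lowest) candidate threshold.
    return max(results, key=lambda r: r[1])


# Toy data, invented for illustration only.
toy_scores = [0.9, 0.7, 0.55, 0.3, 0.1]
toy_trusted = [True, True, True, False, False]
best = best_threshold(toy_scores, toy_trusted, [0.2, 0.4, 0.6, 0.8])
print(best)
```

Repeating the sweep per task is what would yield a different best threshold for each task.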
  • 27. Evaluation: number of workers. Each task reaches a stable F1 at a different number of workers. Majority Vote never beats CrowdTruth. Sound Interpretation needs more workers.
  • 28. Experiments proved that:
      • CrowdTruth performs just as well as domain experts
      • crowd is also cheaper
      • crowd is always available
      • capturing ambiguity is essential:
      • majority voting discards important signals in the data
      • using only a few annotators for ground truth is faulty
      • optimal number of workers / media unit is task dependent
  • 29. CrowdTruth.org. Dumitrache et al.: Empirical Methodology for Crowdsourcing Ground Truth. Semantic Web Journal Special Issue on Human Computation and Crowdsourcing (HC&C) in the Context of the Semantic Web (in review).
  • 32. Ambiguity in Crowdsourcing: disagreement can indicate ambiguity! (Figure relating Media Unit, Annotation, and Worker.)