Web & Media Group
http://lora-aroyo.org @laroyo
CrowdTruth
7 Myths about Human Annotation
Bulgaria
The Netherlands
Sofia 1997
2001
2006
2012 sabbatical @IBM Research
2011
Open Domain Question-Answering Machine
– Rich Natural Language Questions
Won a 2-game Jeopardy match against all-time winners
Watson Education @ VU
•  Intro on Cognitive Computing & Watson
   •  Lecture to 1st year bachelor IMM & CS
•  Watson & Social Web
   •  Lecture to Master Information Science
•  Watson & Crowdsourcing
   •  2 day course at Big Data in Society Summer School
   •  9-10 July, 2015 (@VU)
•  Watson for Industry
   •  2 day professional course @IBM Amsterdam
   •  End September 2015
Human Annotation
Central in Machine Learning
Training & Evaluation
Fallacy of Universal Truth
The Experts Know Best
Cluster 1: passionate, rousing, confident, boisterous, rowdy
Cluster 2: rollicking, cheerful, fun, sweet, amiable, good-natured
Cluster 3: literate, poignant, wistful, bittersweet, autumnal, brooding
Cluster 4: humorous, silly, campy, quirky, whimsical, witty, wry
Cluster 5: aggressive, fiery, tense, anxious, intense, volatile, visceral
Other: does not fit into any of the 5 clusters
Goal: Choose one: Which is the mood most appropriate for each song?
(Lee and Hu 2012)
One Truth?
Who is the Expert?
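A minimal sketch of what "choose one" looks like once several annotators answer for the same song. It is an illustration only, not the procedure from Lee and Hu or from CrowdTruth: the function names and the votes are hypothetical. It contrasts a single majority label with the full vector of choices.

```python
from collections import Counter

# The five mood clusters plus "Other", as listed on the slide.
CHOICES = ["Cluster 1", "Cluster 2", "Cluster 3", "Cluster 4", "Cluster 5", "Other"]

def annotation_vector(votes):
    """Count how many annotators picked each choice for one song."""
    counts = Counter(votes)
    return [counts.get(choice, 0) for choice in CHOICES]

def majority_label(votes):
    """Collapse the votes into one label (ties go to the earlier entry in CHOICES)."""
    vector = annotation_vector(votes)
    return CHOICES[vector.index(max(vector))]

# Hypothetical votes from six annotators for two songs (made-up data).
song_a = ["Cluster 1"] * 6
song_b = ["Cluster 3", "Cluster 3", "Cluster 3", "Cluster 5", "Cluster 5", "Other"]

for name, votes in [("song_a", song_a), ("song_b", song_b)]:
    print(name, majority_label(votes), annotation_vector(votes))
```

Both songs end up with one "true" label, but only the vector shows that half of the annotators heard song_b differently.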
7 Myths
•  One truth: data collection efforts assume one correct interpretation for every example
•  All examples are created equal: ground truth treats all examples the same; each either matches the correct result or does not
•  Detailed guidelines help: if examples cause disagreement, add instructions to limit interpretations
•  Disagreement is bad: increase the quality of annotation data by reducing disagreement among the annotators
•  One is enough: most annotated examples are evaluated by one person
•  Experts are better: annotators with domain knowledge provide better annotations
•  Once done, forever valid: annotations are not updated; new data is not aligned with old
These myths directly influence the practice of collecting human-annotated data and need to be revised with a new theory of truth (CrowdTruth)
human disagreement & vagueness of expression
are part of the human semantics
disagreement is beautiful …
diversity of opinion
independent perspectives
multitude of contexts
gives the big picture
“we treat human brains as processors in a
distributed system each performing a small part
of a massive computation”
Human Computation
Luis von Ahn
crowd
annotator
annotation
example
annotation choices
Knowlton, J.Q. (1966). On the Definition of "Picture". AV Communication Review. 14 (2), 157–183.
Cluster 1
Cluster 2
Cluster 5
Triangle of
disagreement
•  annotator disagreement is signal, not noise.
•  it is indicative of the variation in human
semantic interpretation of signs
•  it can indicate ambiguity, vagueness,
similarity & quality
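One way to treat disagreement as a measurable signal can be sketched as follows. This is an assumption for illustration, not the CrowdTruth metrics themselves: it scores each example by the normalized Shannon entropy of its votes, and the function name and data are hypothetical.

```python
import math
from collections import Counter

def disagreement(votes, n_choices):
    """Normalized Shannon entropy of the votes: 0.0 = full agreement, 1.0 = even spread."""
    total = len(votes)
    if total == 0 or n_choices < 2:
        return 0.0
    entropy = 0.0
    for count in Counter(votes).values():
        p = count / total
        entropy -= p * math.log2(p)
    # Divide by the maximum possible entropy so scores are comparable across tasks.
    return entropy / math.log2(n_choices)

# Hypothetical crowd votes per example (made-up data, not from the talk).
examples = {
    "song_a": ["Cluster 1"] * 6,
    "song_b": ["Cluster 3", "Cluster 3", "Cluster 3", "Cluster 5", "Cluster 5", "Other"],
}

# Rank examples from most to least contested; a high score may point to ambiguity.
scores = {name: disagreement(votes, n_choices=6) for name, votes in examples.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.2f}")
```

A high score alone does not say whether the cause is an ambiguous example, a vague annotation choice, or a low-quality annotator; separating those would need scores along the other two corners of the triangle as well.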
Results from Crowdsourcing
Medical Relations in Text
CrowdTruth.org
Crowd-Watson team 2013
CrowdTruth team is growing, 2014
The Crew 2015
https://www.youtube.com/watch?v=CyAI_lVUdzM
To be AND not to be: quantum
intelligence?
Lora Aroyo & Chris Welty
http://lora-aroyo.org

 

Recently uploaded (20)

IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 

CrowdTruth @VU Faculty Colloquium (June 2015)

  • 1. Web & Media Group http://lora-aroyo.org @laroyo CrowdTruth 7 Myths about Human Annotation
  • 2. Bulgaria The Netherlands Sofia 1997 2001 2006
  • 3. 2012 sabbatical @IBM Research
  • 4. 2011
  • 5. Open Domain Question-Answering Machine: rich natural language questions. Won a 2-game Jeopardy match against all-time winners.
  • 8. Watson Education @ VU
      •  Intro on Cognitive Computing & Watson: lecture to 1st year bachelor IMM & CS
      •  Watson & Social Web: lecture to Master Information Science
      •  Watson & Crowdsourcing: 2 day course at Big Data in Society Summer School, 9-10 July, 2015 (@VU)
      •  Watson for Industry: 2 day professional course @IBM Amsterdam, end September 2015
  • 10. Human Annotation: central in machine learning training & evaluation
  • 11. Fallacy of Universal Truth. The Experts Know Best.
  • 12. Goal: Which is the mood most appropriate for each song? Choose one:
      Cluster 1: passionate, rousing, confident, boisterous, rowdy
      Cluster 2: rollicking, cheerful, fun, sweet, amiable, good-natured
      Cluster 3: literate, poignant, wistful, bittersweet, autumnal, brooding
      Cluster 4: humorous, silly, campy, quirky, whimsical, witty, wry
      Cluster 5: aggressive, fiery, tense, anxious, intense, volatile, visceral
      Other: does not fit into any of the 5 clusters
    One Truth? Who is the Expert? (Lee and Hu 2012)
  • 13. 7 Myths
      •  One truth: data collection efforts assume one correct interpretation for every example
      •  All examples are created equal: ground truth treats all examples the same; each either matches the correct result or does not
      •  Detailed guidelines help: if examples cause disagreement, add instructions to limit interpretations
      •  Disagreement is bad: increase the quality of annotation data by reducing disagreement among the annotators
      •  One is enough: most of the annotated examples are evaluated by one person
      •  Experts are better: annotators with domain knowledge provide better annotations
      •  Once done, forever valid: annotations are not updated; new data is not aligned with old
    These myths directly influence the practice of collecting human-annotated data; they need to be revised with a new theory of truth (CrowdTruth).
  • 14. Human disagreement & vagueness of expression are part of human semantics
  • 15. Disagreement is beautiful … diversity of opinion, independent perspectives and a multitude of contexts give the big picture
  • 16. “we treat human brains as processors in a distributed system each performing a small part of a massive computation” (Luis von Ahn, Human Computation)
  • 17. Triangle of disagreement: crowd annotator, annotation example, annotation choices. Knowlton, J.Q. (1966). On the Definition of "Picture". AV Communication Review, 14 (2), 157–183. Annotation choices: the mood clusters of slide 12 (labels shown: Cluster 1, Cluster 2, Cluster 5).
  • 18.
      •  annotator disagreement is signal, not noise
      •  it is indicative of the variation in human semantic interpretation of signs
      •  it can indicate ambiguity, vagueness, similarity & quality
  • 19. Results from Crowdsourcing Medical Relations in Text
  • 20. CrowdTruth.org
  • 21. Crowd-Watson team 2013
  • 23. CrowdTruth team is growing, 2014
  • 24. The Crew 2015
  • 25. To be AND not to be: quantum intelligence? Lora Aroyo & Chris Welty. https://www.youtube.com/watch?v=CyAI_lVUdzM http://lora-aroyo.org
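
Slide 18's claim that annotator disagreement is signal rather than noise can be illustrated with a small sketch. The slides do not define a metric, so what follows is an assumption: it keeps every crowd label for a song instead of collapsing to a single "true" label, and scores disagreement as the normalized Shannon entropy of the label distribution. This is a stand-in, not the CrowdTruth measure itself. The song names and labels are hypothetical; the six choices mirror the five mood clusters plus "Other" from slide 12.

```python
import math
from collections import Counter


def label_distribution(labels):
    """Return the fraction of annotators that chose each label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def disagreement(labels, num_choices):
    """Normalized Shannon entropy of the labels, between 0 and 1.

    0 means every annotator chose the same label; 1 means the labels
    are spread evenly over all `num_choices` options.
    """
    if not labels:
        raise ValueError("need at least one annotation")
    if num_choices < 2:
        raise ValueError("need at least two annotation choices")
    dist = label_distribution(labels)
    entropy = sum(p * math.log(1.0 / p) for p in dist.values())
    return entropy / math.log(num_choices)


# Hypothetical crowd annotations: five annotators per song.
annotations = {
    "song_a": ["Cluster 1"] * 5,
    "song_b": ["Cluster 1", "Cluster 2", "Cluster 2", "Cluster 5", "Other"],
}
NUM_CHOICES = 6  # Cluster 1 to Cluster 5, plus "Other"

scores = {
    song: disagreement(labels, NUM_CHOICES)
    for song, labels in annotations.items()
}

for song, score in scores.items():
    print(song, label_distribution(annotations[song]), round(score, 3))
```

Under this sketch a song with a high score is a candidate for being ambiguous or vague rather than badly annotated, which is the reading slide 18 suggests; a majority vote would hide that difference by returning one label for both songs.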