University of Sheffield, NLP 
Crowdsourcing Best Practices 
Marta Sabou, Kalina Bontcheva 
Leon Derczynski, Arno Scharl
The Science of Corpus Annotation 
• Best practice for creating consistently high-quality linguistic annotation is well understood: employ, train, and manage groups of linguistic and/or domain experts
• Necessary to ensure reusability and repeatability of results
• The resulting corpora are of very high quality
• Unfortunately, costs are also very high: estimated at between $0.36 and $1.00 per annotation (Zaidan and Callison-Burch, 2011; Poesio et al., 2012)
Goals 
What is crowdsourcing? 
What is a typical workflow for crowdsourcing NLP tasks? 
What general solutions does the state of the art use? 
How do different crowdsourcing genres compare?
Crowdsourcing draws on an undefined and generally large group of contributors 
Compared to in-house projects, it is: 
• cheaper (by 33%) 
• able to reach a large number of users 
• able to reach diverse user groups, e.g. speakers of rare languages
Genre 1: Mechanised Labour (MLab) 
• Participants (workers) are paid a small amount of money to complete easy tasks (HITs, Human Intelligence Tasks)
Genre 2: Games with a purpose (GWAPs)
Genre 3: Altruistic Crowdsourcing
Workflow for Crowdsourcing Corpora 
1. Project Definition 
2. Data and UI Preparation 
3. Running the Project 
4. Evaluation & Corpus Delivery
Example task: define semantic relations between concept pairs, e.g. "Coal is a subcategory of Fossil Fuel"
Trade-offs: Cost; Timescale; Worker skills 
Small, simple tasks, fast completion => MLab 
Complex, large tasks, slower completion => GWAP
• Data distribution: how “micro” is each microtask? 
• Long paragraphs are hard to digest and cause worker fatigue 
• For most NLP tasks, one sentence corresponds to one task 
• Single sentences are not always appropriate, e.g. for co-reference 
• Task Type 
• Selection task: WSD, sentiment analysis, entity 
disambiguation, relation typing. 
• Sequence marking task: co-reference resolution.
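The one-sentence-per-task convention can be sketched as a small splitter. This is a minimal illustration, assuming a naive regular-expression sentence boundary (a real project would use a proper sentence splitter); `Microtask` and `split_into_microtasks` are hypothetical names. The `sentences_per_task` parameter covers tasks such as co-reference, where a single sentence gives too little context.

```python
import re
from dataclasses import dataclass

@dataclass
class Microtask:
    doc_id: str
    unit_id: int
    text: str

# Naive boundary (assumption): split after ., ! or ? followed by whitespace.
_SENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+")

def split_into_microtasks(doc_id, text, sentences_per_task=1):
    """Split a document into microtasks of `sentences_per_task` sentences.

    One sentence per task is the default; co-reference and similar tasks
    may need a wider window so that antecedents stay visible.
    """
    sentences = [s for s in _SENT_BOUNDARY.split(text.strip()) if s]
    tasks = []
    for i in range(0, len(sentences), sentences_per_task):
        chunk = " ".join(sentences[i:i + sentences_per_task])
        tasks.append(Microtask(doc_id, len(tasks), chunk))
    return tasks
```

Keeping `doc_id` and `unit_id` on every microtask makes it straightforward to merge the crowdsourced markup back into complete documents at the aggregation stage.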
• Categories per selection type task: 
• Experts (Hovy, 2010): at most 10, ideally 7 
• In crowdsourcing, fewer categories, typically 3 to 4 
• To reduce cognitive load, focus on one category at a time 
(e.g., one NE type) 
• Number of workers per task: 
• Depends on the subjective nature/complexity of the task 
• Minimum 3, optimally 5 
• Dynamic worker assignment for inconclusive tasks 
• Lawson et al. (2010): the number of required labels varies for different aspects of the same NLP problem. Good results with only 4 annotators for Person NEs, but 6 are needed for Locations and 7 for Organizations
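Dynamic worker assignment for inconclusive tasks can be sketched as follows: start with the minimum number of judgments and request more only while no label has a clear majority. This is a hypothetical illustration; the defaults (3 to 7 workers, 60% agreement) are assumptions, and, following Lawson et al., a project might set `max_workers` per entity type.

```python
from collections import Counter

def is_conclusive(labels, min_agreement=0.6):
    """True if the most frequent label reaches a `min_agreement` share."""
    if not labels:
        return False
    top_count = Counter(labels).most_common(1)[0][1]
    return top_count / len(labels) >= min_agreement

def collect_judgments(label_stream, min_workers=3, max_workers=7, min_agreement=0.6):
    """Take judgments one at a time: always collect `min_workers`, then keep
    adding workers only while the unit is inconclusive, up to `max_workers`.

    `label_stream` stands in for answers arriving from the platform.
    """
    labels = []
    for label in label_stream:
        labels.append(label)
        if len(labels) < min_workers:
            continue
        if is_conclusive(labels, min_agreement) or len(labels) >= max_workers:
            break
    return labels
```

For example, three matching `PER` labels stop collection immediately, while a split such as `PER, LOC, ORG` keeps requesting workers until one label reaches the agreement threshold or the cap is hit.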
Reward scheme 
• What to reward? - money, game points 
• When to reward? - when work entered or after its evaluation 
• How much to reward? 
• Typically $0.01 to $0.05 per task (5 units) 
• No clear, repeatable results for the quality:reward relation 
• High rewards get work done faster, but not better 
• Use timings from a pilot task to pay at least the minimum wage 
• What to do with “bad” work? - detect at run-time and 
exclude
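Turning pilot timings into a per-task reward is simple arithmetic; a minimal sketch follows. `reward_per_task` is a hypothetical helper, and the wage is a parameter the project supplies. Using the median pilot time means roughly half of the workers are slower than assumed, so a stricter floor could use a higher percentile instead.

```python
import math
import statistics

def reward_per_task(pilot_seconds_per_unit, hourly_wage, units_per_task=5, min_reward=0.01):
    """Per-task reward (in dollars) so that a worker with the median pilot
    speed earns at least `hourly_wage`.

    pilot_seconds_per_unit: seconds per unit measured in the pilot task.
    """
    if not pilot_seconds_per_unit:
        raise ValueError("need at least one pilot timing")
    median_seconds = statistics.median(pilot_seconds_per_unit)
    task_hours = median_seconds * units_per_task / 3600
    # Round up to a whole cent; round() first absorbs floating-point noise.
    cents = math.ceil(round(hourly_wage * task_hours * 100, 6))
    return max(cents / 100, min_reward)
```

With pilot timings of 20, 30 and 25 seconds per unit and an illustrative wage of $7.25/hour, a 5-unit task takes about 125 seconds at the median, which rounds up to $0.26 per task.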
Categories: 10 
Players/task: 7 
Payment: points awarded based on previously contributed judgments
Categories: 10 
Players/task: 10 
Payment: $0.05 per 5 units 
Players filtered through gold data
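The paid set-up above ($0.05 per 5 units, 10 players per unit) implies a cost per fully judged unit that can be computed directly. A minimal sketch using `Decimal` for money; the helper names and the `fee_rate` parameter are hypothetical, and the figures exclude the cost of gold units.

```python
from decimal import Decimal

def crowd_cost_per_unit(payment_per_task, units_per_task, workers_per_unit):
    """Cost of one unit judged by `workers_per_unit` workers
    (platform fees and gold units excluded)."""
    return payment_per_task / units_per_task * workers_per_unit

def corpus_cost(n_units, payment_per_task, units_per_task, workers_per_unit,
                fee_rate=Decimal("0")):
    """Total cost for `n_units`, with an optional platform fee as a fraction."""
    per_unit = crowd_cost_per_unit(payment_per_task, units_per_task, workers_per_unit)
    return n_units * per_unit * (1 + fee_rate)
```

Under these assumptions each unit costs $0.01 per judgment times 10 judgments, i.e. $0.10, so 1,000 units cost $100 before fees, compared with $360 to $1,000 at the expert per-annotation rates cited at the start of the deck.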
Workflow for Crowdsourcing Corpora 
1. Project Definition 
2. Data and UI Preparation 
3. Running the Project 
4. Evaluation & Corpus Delivery
• Pre-process the corpus linguistically, as needed, e.g. 
• Tokenise text if user needs to select words 
• Identify proper names/noun phrases if we want to classify these 
• Add context if needed, e.g. the text of a Twitter user profile or a link to a Wikipedia page 
• For GWAPs: 
• Collect interesting input data if possible, i.e. texts that are fun to read and work on 
• Clean input data to remove errors (these lower player satisfaction) 
• MLab can be used to clean the data set
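When workers need to select words, tokenising with character offsets lets every selection made in the UI be mapped back onto the original text. A minimal sketch, assuming a simple regular-expression tokeniser rather than a full linguistic pipeline; both function names are hypothetical.

```python
import re

# Simplified tokeniser (assumption): runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def tokenise_with_offsets(text):
    """Return (token, start, end) triples over the original text."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]

def selection_to_span(tokens, first, last):
    """Map a worker's selection of tokens first..last (inclusive)
    to a character span in the original text."""
    return tokens[first][1], tokens[last][2]
```

For "Coal is a fossil fuel.", selecting tokens 3 to 4 yields the span (10, 21), which covers "fossil fuel" in the source string.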
• Build and test the user interfaces 
• Easy to medium difficulty in AMT/CF (Amazon Mechanical Turk/CrowdFlower); templates provided for some task types 
• Medium to hard for GWAPs 
• Job management interfaces 
• Provided in MLab platforms 
• Must be built from scratch for GWAPs 
• Comparative interface set-up times: 
• CF: 2 days; Climate Quiz: 2 months 
• OntoPronto (Thaler et al., 2012): 5 months
Example: Job Management Interface
HINT: Add explicitly verifiable questions to the UI to: 
- help filter out spammers 
- force workers to read the task input
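One way to build an explicitly verifiable question is to ask for something that can only be answered by reading the input, such as a specific word of the sentence. This is a hypothetical sketch, not a platform feature; the question wording and the punctuation-stripping rule are assumptions.

```python
import random

STRIP_CHARS = ".,;:!?\"'"

def make_verification_question(sentence, rng=random):
    """Build a question whose answer can be checked automatically:
    the worker must type a randomly chosen word of the input sentence."""
    words = sentence.split()
    if not words:
        raise ValueError("cannot build a question for an empty sentence")
    idx = rng.randrange(len(words))
    question = f"Type word number {idx + 1} of the sentence."
    expected = words[idx].strip(STRIP_CHARS).lower()
    return question, expected

def passes_check(answer, expected):
    """Case- and punctuation-insensitive comparison with the expected word."""
    return answer.strip().strip(STRIP_CHARS).lower() == expected
```

Passing a seeded `random.Random` instance as `rng` makes the generated questions reproducible across runs of a job.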
Pilot the design, measure performance, try again 
• Simple, clear design important 
• Binary decision tasks get good results 
Run bigger pilot studies with volunteers to test 
everything and collect gold units for quality control later
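Collecting gold units from a volunteer pilot might look like the sketch below: units on which enough volunteers agree strongly enough become gold candidates. Treating unanimity as the default criterion is an assumption, and candidates would ideally still be reviewed by an expert; `gold_candidates` is a hypothetical name.

```python
from collections import Counter

def gold_candidates(pilot_judgments, min_judgments=3, min_agreement=1.0):
    """pilot_judgments: dict unit_id -> list of labels from pilot volunteers.

    Returns dict unit_id -> label for units with at least `min_judgments`
    labels whose most frequent label reaches `min_agreement`.
    """
    gold = {}
    for unit_id, labels in pilot_judgments.items():
        if len(labels) < min_judgments:
            continue
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_agreement:
            gold[unit_id] = label
    return gold
```

Lowering `min_agreement` yields more gold units but increases the risk of training workers on a wrong answer.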
Workflow for Crowdsourcing Corpora 
1. Project Definition 
2. Data and UI Preparation 
3. Running the Project 
4. Evaluation & Corpus Delivery
Contributor recruitment: 
• MLab - easy, given the platforms’ large worker pools and economic 
incentives 
• GWAPs - challenging, requires much PR. 
• Social-network-based games let players invite friends, leveraging the viral nature of social networks 
• Multi-channel advertising: local and national press, science websites, blogs, bookmarking websites, gaming forums, and social networking sites 
Contributor screening (only in MLab): 
• MLab - by country, by skill (e.g., spoken language), by reliability 
• MLab - screening through competency tests; answers to gold units
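MLab platforms expose such screening as job settings; the sketch below only illustrates the logic of combining the criteria. The `Worker` fields, including the reliability score and competency-test score, are hypothetical and do not correspond to any platform's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Worker:
    worker_id: str
    country: str
    languages: set = field(default_factory=set)
    reliability: float = 0.0  # hypothetical platform trust score in [0, 1]
    test_score: float = 0.0   # share of competency-test gold units answered correctly

def screen(workers, countries=None, language=None, min_reliability=0.0, min_test_score=0.0):
    """Keep workers matching country, language skill, reliability and test score."""
    eligible = []
    for w in workers:
        if countries is not None and w.country not in countries:
            continue
        if language is not None and language not in w.languages:
            continue
        if w.reliability < min_reliability or w.test_score < min_test_score:
            continue
        eligible.append(w)
    return eligible
```

Leaving a criterion at its default (`None` or `0.0`) disables that filter, so the same function covers country-only, skill-only or combined screening.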
IN-TASK QUALITY CONTROL 
Train contributors - through instructions: 
• be clear and concise; 
• avoid technical jargon; 
• provide both positive and negative examples. 
Train contributors - through gold data: 
• CF - known data units (gold units) hidden in tasks 
• When completing a gold unit, a worker is shown the expected answer, thus being trained “on the job” 
• Workers who fail a certain percentage of gold units are automatically 
excluded from the job 
Great opportunity to train workers and amend expert data 
Better gold data means better output quality, for the same cost
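The gold-unit mechanism described above (hidden known units, feedback on completion, automatic exclusion below a pass rate) can be sketched as a small tracker. This is an illustration, not CrowdFlower's implementation; `GoldTracker` and its thresholds (70% accuracy after at least 4 gold units) are hypothetical.

```python
from collections import Counter

class GoldTracker:
    """Track each worker's accuracy on hidden gold units (hypothetical helper).

    A worker is excluded once they have answered at least `min_seen` gold
    units and their accuracy falls below `min_accuracy`.
    """

    def __init__(self, gold, min_accuracy=0.7, min_seen=4):
        self.gold = gold              # unit_id -> expected label
        self.min_accuracy = min_accuracy
        self.min_seen = min_seen
        self.seen = Counter()         # worker_id -> gold units answered
        self.correct = Counter()      # worker_id -> gold units answered correctly
        self.excluded = set()

    def record(self, worker_id, unit_id, label):
        """Record an answer; for a gold unit, return the expected answer
        so it can be shown to the worker as on-the-job feedback."""
        if unit_id not in self.gold:
            return None
        expected = self.gold[unit_id]
        self.seen[worker_id] += 1
        if label == expected:
            self.correct[worker_id] += 1
        if (self.seen[worker_id] >= self.min_seen
                and self.accuracy(worker_id) < self.min_accuracy):
            self.excluded.add(worker_id)
        return expected

    def accuracy(self, worker_id):
        seen = self.seen[worker_id]
        return self.correct[worker_id] / seen if seen else None
```

Requiring `min_seen` gold answers before excluding anyone avoids removing a worker on the strength of a single early mistake.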
Example: CF Instructions
• For large tasks - Multi-batch methodology 
• Submit tasks in multiple batches 
• Ensure contributor diversity by starting batches at different times 
• Needs less gold data 
• Deal with worker disputes!
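The multi-batch methodology can be sketched as a planner that splits the units into batches and staggers their start times, so that batches open at different times of day and attract different contributors. The 8-hour stagger is an assumption, and `plan_batches` is a hypothetical helper.

```python
from datetime import datetime, timedelta

def plan_batches(unit_ids, batch_size, first_start, stagger_hours=8):
    """Split units into batches, each scheduled `stagger_hours` after the
    previous one. Returns a list of (start_time, unit_ids) pairs."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    plan = []
    for batch_no, i in enumerate(range(0, len(unit_ids), batch_size)):
        start = first_start + timedelta(hours=stagger_hours * batch_no)
        plan.append((start, unit_ids[i:i + batch_size]))
    return plan
```

With 10 units, a batch size of 4 and a first start at 09:00, the batches start at 09:00, 17:00 and 01:00 the next day.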
Workflow for Crowdsourcing Corpora 
1. Project Definition 
2. Data and UI Preparation 
3. Running the Project 
4. Evaluation & Corpus Delivery
• Evaluate individual contributor inputs to produce a final decision 
• Majority vote 
• Discard inputs from low-trust contributors (e.g. Hsueh et al., 2009) 
• Aggregation: 
• Merge individual units from the microtasks (e.g. sentences) into complete documents, including all crowdsourced markup 
• Majority voting; averaging; collection 
• Aggregation strategies: 
• Climate Quiz: a relation is chosen for a concept pair once it has been voted for by 4 more players than the next most popular relation 
• CF: majority voting, with a confidence value computed taking worker accuracy into account
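The three aggregation strategies above can be sketched side by side. `margin_vote` reads the Climate Quiz rule as "the top relation must lead the runner-up by at least `margin` votes"; `weighted_vote` is an illustrative accuracy-weighted scheme, not CrowdFlower's actual confidence formula. All names are hypothetical.

```python
from collections import Counter, defaultdict

def majority_vote(labels):
    """Most frequent label; ties go to the label encountered first."""
    return Counter(labels).most_common(1)[0][0]

def margin_vote(labels, margin=4):
    """Accept the top label only if it leads the runner-up by at least
    `margin` votes; otherwise the unit stays undecided (None)."""
    ranked = Counter(labels).most_common(2)
    if not ranked:
        return None
    top_label, top = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    return top_label if top - runner_up >= margin else None

def weighted_vote(judgments, accuracy, default_accuracy=0.5):
    """judgments: list of (worker_id, label); accuracy: worker_id -> accuracy
    (e.g. on gold units). Returns (label, confidence), where confidence is
    the winning label's share of the total weight."""
    scores = defaultdict(float)
    for worker_id, label in judgments:
        scores[label] += accuracy.get(worker_id, default_accuracy)
    label = max(scores, key=scores.get)
    return label, scores[label] / sum(scores.values())
```

The weighted vote can overturn a simple majority: one worker with 0.9 accuracy outweighs two workers with 0.3 accuracy each.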
• Evaluate corpus quality 
• Compute inter-worker agreement 
• Compute agreement between workers and trusted annotators 
• Compare to a gold standard baseline (P/R/F/Acc) 
• To facilitate reuse: 
• Deliver the corpus in a widely used format (XCES, CoNLL, GATE XML) 
• Share it with the research community
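Inter-worker agreement and comparison with a gold standard can be computed with a few standard measures. The sketch below uses Fleiss' kappa for agreement among a fixed number of workers per unit, plus precision, recall, F1 and accuracy; choosing Fleiss' kappa is an assumption, since the slide does not name a specific agreement statistic.

```python
import math
from collections import Counter

def fleiss_kappa(items):
    """items: one list of labels per unit, each with the same number (>= 2) of judgments."""
    n = len(items[0])
    if n < 2 or any(len(labels) != n for labels in items):
        raise ValueError("every unit needs the same number (>= 2) of judgments")
    N = len(items)
    category_totals = Counter()
    p_bar = 0.0
    for labels in items:
        counts = Counter(labels)
        category_totals.update(counts)
        # Observed agreement for this unit: share of agreeing worker pairs.
        p_bar += (sum(c * c for c in counts.values()) - n) / (n * (n - 1))
    p_bar /= N
    # Chance agreement from the overall label distribution.
    p_e = sum((t / (N * n)) ** 2 for t in category_totals.values())
    if p_e == 1:
        return math.nan  # undefined: every judgment used the same label
    return (p_bar - p_e) / (1 - p_e)

def precision_recall_f1(predicted, gold):
    """predicted, gold: sets of (unit_id, label) annotations."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def accuracy(predicted_labels, gold_labels):
    """Both dicts map unit_id -> label; accuracy over the gold units."""
    return sum(predicted_labels.get(u) == l for u, l in gold_labels.items()) / len(gold_labels)
```

Running these on the same judgments aggregated in different ways (e.g. majority vote versus margin vote) shows how much the aggregation strategy, rather than the raw data, drives the final corpus quality.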
Evaluation of relation selection task: 
Comparison with Gold Standard 
Same data, different aggregation
Legal and Ethical Issues 
1. Acknowledging the crowd's contribution 
S. Cooper, [other authors], and Foldit players: Predicting protein structures with a multiplayer online game. Nature, 466(7307):756-760, 2010. 
2. Ensuring privacy and wellbeing 
1. Mechanised labour is criticised for low wages and a lack of worker rights 
2. The majority of workers rely on microtasks as their main source of income 
3. Prevent prolonged use and user exploitation (e.g. daily caps) 
3. Licensing and consent 
1. Some projects clearly state the use of Creative Commons licences 
2. Informed consent information is generally not provided
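A daily cap on tasks per worker, one of the safeguards against prolonged use listed above, can be sketched as a simple counter keyed by worker and date. `DailyCap` is a hypothetical helper and the default of 100 tasks per day is an arbitrary assumption, not a recommended value.

```python
from collections import defaultdict
from datetime import date

class DailyCap:
    """Limit how many tasks each worker can take per day (hypothetical helper)."""

    def __init__(self, max_tasks_per_day=100):
        self.max_tasks = max_tasks_per_day
        self.counts = defaultdict(int)  # (worker_id, date) -> tasks assigned

    def try_assign(self, worker_id, today=None):
        """Return True and count the task if the worker is under today's cap."""
        today = today or date.today()
        key = (worker_id, today)
        if self.counts[key] >= self.max_tasks:
            return False
        self.counts[key] += 1
        return True
```

Because counts are keyed by calendar date, the cap resets automatically each day without a separate clean-up step.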
Thank you! 
Questions?

 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
 
S.1 chemistry scheme term 2 for ordinary level
S.1 chemistry scheme term 2 for ordinary levelS.1 chemistry scheme term 2 for ordinary level
S.1 chemistry scheme term 2 for ordinary level
 
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
GBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture MediaGBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture Media
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
 

Crowdsourcing Best Practices

  • 1. University of Sheffield, NLP. Crowdsourcing Best Practices. Marta Sabou, Kalina Bontcheva, Leon Derczynski, Arno Scharl
  • 2. The Science of Corpus Annotation • Best practice is well understood for creating linguistic annotation of consistently high quality by employing, training, and managing groups of linguistic and/or domain experts • This is necessary to ensure reusability and repeatability of results • The resulting corpora are of very high quality • Unfortunately, costs are also very high: estimated at between $0.36 and $1.00 per annotation (Zaidan and Callison-Burch, 2011; Poesio et al., 2012)
  • 3. Goals • What is crowdsourcing? • What is a typical workflow for crowdsourcing NLP tasks? • What general solutions does the state of the art use? • How do different crowdsourcing genres compare?
  • 5. An undefined and generally large group. Compared to in-house projects, crowdsourcing is: • cheaper (by 33%); • able to reach a large number of users; • able to reach diverse user groups, e.g., speakers of rare languages
  • 6. Genre 1: Mechanised Labour • Participants (workers) are paid a small amount of money to complete easy tasks (HIT = Human Intelligence Task)
  • 7. Genre 2: Games with a purpose (GWAPs)
  • 8. Genre 3: Altruistic Crowdsourcing
  • 9. Workflow for Crowdsourcing (Corpora) 1. Project Definition 2. Data and UI Preparation 3. Running the Project 4. Evaluation & Corpus Delivery
  • 11. Task: definition of semantic relations between concept pairs, e.g., Coal is a subcategory of Fossil Fuel
  • 12. Trade-offs: cost; timescale; worker skills • Small, simple tasks, fast completion => MLab (mechanised labour) • Complex, large tasks, slower completion => GWAP
  • 13. • Data distribution: how “micro” is each microtask? • Long paragraphs are hard to digest and cause worker fatigue • For most NLP tasks, one sentence corresponds to one task • Single sentences are not always appropriate, e.g., for co-reference • Task type: • Selection tasks: WSD, sentiment analysis, entity disambiguation, relation typing • Sequence marking tasks: co-reference resolution
  • 14. • Categories per selection-type task: • Experts (Hovy, 2010): max 10, ideally 7 • In crowdsourcing, fewer categories, typically 3-4 • To reduce cognitive load, focus on one category at a time (e.g., one NE type) • Number of workers per task: • Depends on the subjective nature/complexity of the task • Minimum 3, optimally 5 • Dynamic worker assignment for inconclusive tasks • Lawson et al. (2010): the number of required labels varies for different aspects of the same NLP problem. Good results with only 4 annotators for Person NEs, but 6 are required for Location and 7 for Organization
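Dynamic worker assignment can be sketched as a loop that requests one more label at a time until the task is conclusive. This is a minimal Python sketch, not the method of any particular platform: `collect_labels`, the `request_label` callback, the 60% agreement threshold and the cap of 7 workers (borrowed from the Lawson et al. figure for Organizations) are all illustrative assumptions.

```python
from collections import Counter
from typing import Callable, List

def collect_labels(request_label: Callable[[], str], min_workers: int = 3,
                   max_workers: int = 7, agreement: float = 0.6) -> List[str]:
    """Ask for one more label at a time until the task is conclusive.

    The task counts as conclusive once at least `min_workers` labels are in and
    the most frequent label accounts for at least `agreement` of them; otherwise
    more workers are assigned, up to `max_workers`.
    """
    labels: List[str] = []
    while len(labels) < max_workers:
        labels.append(request_label())
        if len(labels) >= min_workers:
            top_count = Counter(labels).most_common(1)[0][1]
            if top_count / len(labels) >= agreement:
                break
    return labels

# Simulated workers: the first three disagree, so two extra labels are requested.
answers = iter(["PER", "LOC", "ORG", "PER", "PER"])
print(collect_labels(lambda: next(answers)))  # ['PER', 'LOC', 'ORG', 'PER', 'PER']
```

Easy, uncontroversial tasks stop at the minimum of three labels, so the extra cost is only paid on the inconclusive ones.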
  • 15. Reward scheme • What to reward? Money or game points • When to reward? When the work is entered, or after it has been evaluated • How much to reward? • Typically between $0.01 and $0.05 per task (of 5 units) • No clear, repeatable results on the quality:reward relation • High rewards get the work done faster, but not better • A pilot task gives timings, so pay at least the minimum wage • What to do with “bad” work? Detect it at run-time and exclude it
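Turning pilot timings into a per-task reward is simple arithmetic. The helper below is a hypothetical sketch: the 6-second median time per unit and the $7.25/hour wage (the US federal minimum, used only as an example figure) are assumptions to be replaced with your own pilot data and the applicable minimum wage.

```python
import math

def reward_per_task(median_seconds_per_unit: float, hourly_wage: float,
                    units_per_task: int = 1) -> float:
    """Smallest per-task reward, in dollars rounded up to the cent, that pays
    at least `hourly_wage` to a worker working at the pilot's median speed."""
    seconds_per_task = median_seconds_per_unit * units_per_task
    cents = hourly_wage * 100 * seconds_per_task / 3600
    # Round before ceil() so float noise (e.g. 3.0000000001) does not add a cent.
    return math.ceil(round(cents, 6)) / 100

# Assumed pilot: about 6 s per unit, tasks of 5 units, $7.25/hour.
print(reward_per_task(6, 7.25, units_per_task=5))  # 0.07
```

Under these assumed timings a 5-unit task needs $0.07, already above the typical $0.01 to $0.05 range, which is why measuring timings in a pilot matters.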
  • 17. Categories: 10 • Players/task: 7 • Payment: points awarded based on previously contributed judgments
  • 18. Categories: 10 • Players/task: 10 • Payment: $0.05 per 5 units • Players filtered through gold data
  • 19. Workflow for Crowdsourcing Corpora 1. Project Definition 2. Data and UI Preparation 3. Running the Project 4. Evaluation & Corpus Delivery
  • 20. • Pre-process the corpus linguistically, as needed, e.g.: • Tokenise the text if users need to select words • Identify proper names/noun phrases if we want to classify these • Bring in additional context, if needed, e.g., the text of a user's Twitter profile or a link to a Wikipedia page • For GWAPs: • Collect interesting input data if possible, i.e., texts that are fun to read and work on • Clean the input data to remove errors (these lower player satisfaction) • MLab can be used for cleaning the data set
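Pre-processing into one-sentence microtasks with tokenised text can be sketched as follows. This is a deliberately naive Python illustration: the regular-expression sentence splitter and tokeniser, and the task fields (`doc_id`, `sentence_id`, `tokens` with character offsets), are assumptions; a real project would use a proper NLP pipeline. Keeping document IDs and offsets makes it possible to merge annotated sentences back into complete documents after aggregation.

```python
import re
from typing import Dict, List

# Naive splitter: a sentence ends at ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# Naive tokeniser: runs of word characters, or single punctuation marks.
TOKEN = re.compile(r"\w+|[^\w\s]")

def to_microtasks(doc_id: str, text: str) -> List[Dict]:
    """Split a document into one-sentence tasks with character-offset tokens."""
    tasks: List[Dict] = []
    offset = 0
    for sentence in SENTENCE_END.split(text):
        sentence = sentence.strip()
        if not sentence:
            continue
        start = text.index(sentence, offset)  # position in the original document
        tokens = [
            {"text": m.group(), "start": start + m.start(), "end": start + m.end()}
            for m in TOKEN.finditer(sentence)
        ]
        tasks.append({"doc_id": doc_id, "sentence_id": len(tasks),
                      "text": sentence, "tokens": tokens})
        offset = start + len(sentence)
    return tasks

for task in to_microtasks("d1", "Coal is a fossil fuel. Sheffield is in England!"):
    print(task["sentence_id"], [t["text"] for t in task["tokens"]])
```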
  • 21. • Build and test the user interfaces • Easy to medium difficulty in AMT/CF; templates are provided for some task types • Medium to hard for GWAPs • Job management interfaces • Provided by MLab platforms • Must be built from scratch for GWAPs • Comparative interface set-up times: • CF: 2 days; Climate Quiz: 2 months • OntoPronto (Thaler et al., 2012): 5 months
  • 22. Example: Job Management Interface
  • 23. HINT: Add explicitly verifiable questions to the UI, to: • help filter out spammers • force workers to read the task input
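A run-time check on verifiable questions might look like the sketch below. It assumes, hypothetically, that each unit shows two concepts and that the verification questions ask the worker to retype them; the field names and the `keep_verified` helper are illustrative, not part of any platform's API.

```python
from typing import Dict, List, Sequence

def _norm(text: str) -> str:
    """Case- and whitespace-insensitive form used for comparison."""
    return " ".join(text.lower().split())

def passes_verification(submission: Dict[str, str], unit: Dict[str, str],
                        fields: Sequence[str] = ("concept_1", "concept_2")) -> bool:
    """True if the worker retyped every verification field correctly."""
    return all(_norm(submission.get(f, "")) == _norm(unit[f]) for f in fields)

def keep_verified(submissions: List[Dict[str, str]],
                  unit: Dict[str, str]) -> List[Dict[str, str]]:
    """Drop submissions that fail the verification questions (likely spam)."""
    return [s for s in submissions if passes_verification(s, unit)]

unit = {"concept_1": "Coal", "concept_2": "Fossil Fuel"}
submissions = [
    {"worker": "w1", "concept_1": "coal", "concept_2": "fossil  fuel",
     "relation": "is a sub-category of"},
    {"worker": "w2", "concept_1": "", "concept_2": "", "relation": "opposes"},
]
print([s["worker"] for s in keep_verified(submissions, unit)])  # ['w1']
```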
  • 24. Pilot the design, measure performance, try again • A simple, clear design is important • Binary decision tasks get good results • Run bigger pilot studies with volunteers to test everything and to collect gold units for later quality control
  • 25. Workflow for Crowdsourcing Corpora 1. Project Definition 2. Data and UI Preparation 3. Running the Project 4. Evaluation & Corpus Delivery
  • 26. Contributor recruitment: • MLab: easy, given the platforms' large worker pools and economic incentives • GWAPs: challenging, requires much PR • Social-network-based games let players invite friends, leveraging the viral aspect of social networks • Multi-channel advertising: local and national press, science websites, blogs, bookmarking websites, gaming forums, and social networking sites • Contributor screening (only in MLab): • by country, by skill (e.g., spoken language), by reliability • through competency tests and answers to gold units
  • 27. IN-TASK QUALITY CONTROL • Train contributors through instructions: • be clear and concise; • avoid technical jargon; • provide both positive and negative examples • Train contributors through gold data: • CF: known data units (gold units) are hidden in tasks • When completing a gold unit, a worker is shown the expected answer, and is thus trained “on the job” • Workers who fail a certain percentage of gold units are automatically excluded from the job • A great opportunity to train workers and amend expert data • Better gold data means better output quality for the same cost
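The gold-unit mechanism can be sketched as a small per-worker tracker. This is a hypothetical Python illustration of the idea, not CrowdFlower's implementation: the class name, the 70% accuracy threshold and the minimum of 4 gold units before a worker can be excluded are assumed values.

```python
from collections import defaultdict
from typing import Dict, Optional, Set

class GoldMonitor:
    """Tracks each worker's accuracy on hidden gold units."""

    def __init__(self, gold: Dict[str, str], min_accuracy: float = 0.7,
                 min_seen: int = 4):
        self.gold = gold                  # unit_id -> expected answer
        self.min_accuracy = min_accuracy  # assumed exclusion threshold
        self.min_seen = min_seen          # gold units seen before judging a worker
        self.seen: Dict[str, int] = defaultdict(int)
        self.correct: Dict[str, int] = defaultdict(int)
        self.excluded: Set[str] = set()

    def record(self, worker: str, unit_id: str, answer: str) -> Optional[str]:
        """Log an answer; for gold units, return the expected answer as feedback."""
        if unit_id not in self.gold:
            return None  # ordinary unit: nothing to check
        expected = self.gold[unit_id]
        self.seen[worker] += 1
        if answer == expected:
            self.correct[worker] += 1
        if (self.seen[worker] >= self.min_seen
                and self.correct[worker] / self.seen[worker] < self.min_accuracy):
            self.excluded.add(worker)
        return expected  # shown to the worker: training "on the job"

    def is_excluded(self, worker: str) -> bool:
        return worker in self.excluded

monitor = GoldMonitor({"g1": "opposes", "g2": "supports", "g3": "threatens", "g4": "influences"})
for unit_id in ["g1", "g2", "g3", "g4"]:
    monitor.record("w2", unit_id, "opposes")
print(monitor.is_excluded("w2"))  # True: only 1 of 4 gold units correct
```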
  • 28. Example: CF Instructions
  • 30. • For large tasks, use a multi-batch methodology: • Submit tasks in multiple batches • Ensure contributor diversity by starting batches at different times • Needs less gold data • Deal with worker disputes!
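Splitting a large job into batches with staggered start times could be planned as below. This is a hypothetical sketch: the `plan_batches` helper and the 8-hour spacing (meant to open each batch at a different time of day, and so reach workers in different time zones) are assumptions, and actually launching the batches is left to the platform.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple

def plan_batches(units: Sequence[str], batch_size: int, first_start: datetime,
                 spacing: timedelta = timedelta(hours=8)) -> List[Tuple[datetime, List[str]]]:
    """Split units into batches, each starting `spacing` after the previous one."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    plan = []
    for n, i in enumerate(range(0, len(units), batch_size)):
        plan.append((first_start + n * spacing, list(units[i:i + batch_size])))
    return plan

units = [f"sentence-{k}" for k in range(10)]
for start, batch in plan_batches(units, 4, datetime(2024, 1, 1, 9, 0)):
    print(start.isoformat(), len(batch))
```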
  • 31. Workflow for Crowdsourcing Corpora 1. Project Definition 2. Data and UI Preparation 3. Running the Project 4. Evaluation & Corpus Delivery
  • 32. • Evaluate individual contributor inputs to produce a final decision • Majority vote • Discard inputs from low-trust contributors (e.g., Hsueh et al., 2009) • Aggregation: • Merge individual units from the microtasks (e.g., sentences) into complete documents, including all crowdsourced markup • Majority voting; averaging; collection • Aggregation strategies: • Climate Quiz: a relation is chosen for a pair if it received 4 more votes than the next most popular relation • CF: majority voting, with a confidence value computed taking worker accuracy into account
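The aggregation strategies above can be compared on the same judgments with a short Python sketch. `majority_vote` is plain majority voting; `margin_vote` follows the Climate Quiz rule, read here as "at least 4 more votes than the runner-up"; `weighted_vote` is only an assumed approximation of accuracy-aware voting, since CrowdFlower's exact confidence formula is not given here. The worker accuracies in the example are invented.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Judgment = Tuple[str, str]  # (worker_id, label)

def majority_vote(judgments: List[Judgment]) -> str:
    """Most frequent label; ties go to the label seen first."""
    return Counter(label for _, label in judgments).most_common(1)[0][0]

def margin_vote(judgments: List[Judgment], margin: int = 4) -> Optional[str]:
    """Accept the top label only if it leads the runner-up by `margin` votes."""
    ranked = Counter(label for _, label in judgments).most_common()
    top_label, top_votes = ranked[0]
    runner_up_votes = ranked[1][1] if len(ranked) > 1 else 0
    return top_label if top_votes - runner_up_votes >= margin else None

def weighted_vote(judgments: List[Judgment], accuracy: Dict[str, float],
                  default: float = 0.5) -> Tuple[str, float]:
    """Weight each vote by the worker's gold-unit accuracy; return (label, confidence)."""
    scores: Dict[str, float] = defaultdict(float)
    for worker, label in judgments:
        scores[label] += accuracy.get(worker, default)
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())

judgments = [("w1", "is a sub-category of"), ("w2", "is a sub-category of"),
             ("w3", "influences"), ("w4", "influences"), ("w5", "influences")]
accuracy = {"w1": 0.95, "w2": 0.9, "w3": 0.4, "w4": 0.5, "w5": 0.45}
print(majority_vote(judgments))            # influences
print(margin_vote(judgments))              # None (3 vs 2 votes)
print(weighted_vote(judgments, accuracy))  # ('is a sub-category of', ~0.58)
```

On identical input the three rules disagree, which mirrors the slide's point that the same data can yield different results depending on aggregation.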
  • 33. • Evaluate corpus quality: • Compute inter-worker agreement • Compute agreement between workers and trusted annotators • Compare to a gold-standard baseline (P/R/F/Acc) • To facilitate reuse: • deliver the corpus in a widely used format (XCES, CoNLL, GATE XML) • share it with the research community
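For the agreement and gold-standard comparisons, a minimal Python sketch follows. Fleiss' kappa is used here as one common chance-corrected measure of agreement among several workers (the slide does not name a specific coefficient) and assumes every item received the same number of labels; `prf` computes precision, recall and F1 over sets of annotations, such as hypothetical (start, end, type) spans.

```python
from collections import Counter
from typing import Sequence, Set, Tuple

def fleiss_kappa(items: Sequence[Sequence[str]]) -> float:
    """Fleiss' kappa for N items, each labelled by the same number n >= 2 of workers."""
    if not items:
        raise ValueError("no items")
    n = len(items[0])
    if n < 2 or any(len(labels) != n for labels in items):
        raise ValueError("every item needs the same number (>= 2) of labels")
    N = len(items)
    per_item = [Counter(labels) for labels in items]
    # Observed agreement: mean over items of the share of agreeing worker pairs.
    p_bar = sum((sum(c * c for c in counts.values()) - n) / (n * (n - 1))
                for counts in per_item) / N
    # Chance agreement from the overall label distribution.
    totals: Counter = Counter()
    for counts in per_item:
        totals.update(counts)
    p_e = sum((t / (N * n)) ** 2 for t in totals.values())
    if p_e == 1:
        return 1.0  # only one label ever used: kappa is undefined; this sketch returns 1.0
    return (p_bar - p_e) / (1 - p_e)

def prf(predicted: Set[tuple], gold: Set[tuple]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of predicted annotations against a gold standard."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

print(fleiss_kappa([["A", "A", "A"], ["B", "B", "B"]]))  # 1.0
print(prf({(0, 4, "LOC"), (10, 15, "PER")}, {(0, 4, "LOC"), (20, 25, "ORG")}))  # (0.5, 0.5, 0.5)
```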
  • 35. Evaluation of the relation selection task: comparison with the gold standard. Same data, different aggregation
  • 36. Legal and Ethical Issues 1. Acknowledging the crowd's contribution, e.g., S. Cooper, [other authors], and Foldit players: Predicting protein structures with a multiplayer online game. Nature, 466(7307):756-760, 2010. 2. Ensuring privacy and wellbeing: (a) mechanised labour has been criticised for low wages and lack of worker rights; (b) the majority of workers rely on microtasks as their main income source; (c) prevent prolonged use and user exploitation (e.g., daily caps). 3. Licensing and consent: (a) some projects clearly state the use of Creative Commons licences; (b) there is a general failure to provide informed consent information
  • 38. Thank you! Questions?

Editor's Notes

  1. Crowdsourcing is an emerging collaborative approach for acquiring annotated corpora and a wide range of other linguistic resources. There are three main kinds of crowdsourcing platforms: paid-for marketplaces such as Amazon Mechanical Turk (AMT) and CrowdFlower (CF); games with a purpose; and volunteer-based platforms such as crowdcrafting. Paid-for crowdsourcing can be 33% cheaper than in-house employees when applied to tasks such as tagging and classification (Hoffmann, 2009). Games with a purpose can be even cheaper in the long run, since the players are not paid; however, the cost of implementing a game can be higher than AMT/CF costs for smaller projects (Poesio et al., 2012). Crowdsourcing taps into the large number of contributors/players available across the globe through the internet, and makes it easy to reach native speakers of various languages (but beware of Google Translate cheaters!)
  2. Contributors are extrinsically motivated through economic incentives. Most NLP projects use crowdsourcing marketplaces: Amazon Mechanical Turk and CrowdFlower. Requesters post Human Intelligence Tasks (HITs) to a large population of micro-workers (Callison-Burch and Dredze, 2010a). Snow et al. (2008) collect event and affect annotations, while Lawson et al. (2010) and Finin et al. (2010) annotate special types of texts such as emails and Twitter feeds, respectively. Challenges: low-quality output due to the workers' purely economic motivation; high costs for large tasks (Parent and Eskenazi, 2011); ethical issues (Fort et al., 2011)
  3. In GWAPs (von Ahn and Dabbish, 2008), contributors carry out annotation tasks as a side effect of playing a game. Example GWAPs: Phratris for annotating syntactic dependencies (Attardi, 2010); PhraseDetectives (Poesio et al., 2012) to acquire anaphora annotations; Sentiment Quiz (Scharl et al., 2012) to annotate sentiment; http://www.wordrobe.org/, a collection of NLP games including POS and NE. Challenges: designing appealing games and attracting a critical mass of players are among the key success factors within this genre (Wang et al., 2012). In 2008, the group built a Facebook game that required players to rate the sentiment associated with a sentence on a 5-point scale, and then used the ratings as a training corpus for the sentiment detection module. Over 800 players played the game. In 2009 the game was released in a slightly different form, with the aim of gathering sentiment lexicons, i.e., associations between words and their sentiment polarity (ratings from as many as 12 players were averaged to get the final value). The game ran in 7 different languages and attracted over 4,000 players. This serves as an introductory example of a crowdsourcing project; however, crowdsourcing is not a new phenomenon.
  4. Volunteers contribute because they are interested in a domain or support a cause
  5. Compared to paid-for marketplaces, GWAPs reduce costs and the incentive to cheat, as players are intrinsically motivated, and promise superior results, due to motivated players and better utilisation of sporadic, explorer-type users (Parent and Eskenazi, 2011). However, there are few papers, and most of those offer “theoretical”/survey-based comparisons.
  6. Climate Quiz is a GWAP deployed on the Facebook social networking platform. It focuses on acquiring factual knowledge in the domain of climate change. The game is coupled with an ontology learning algorithm, as follows. The ontology learning algorithm extracts terms from unstructured and structured data sources. The term pairs that are most likely related, based on the algorithm's input data sources, are subsequently sent to Climate Quiz, where players assign relations to each pair. These relations are fed back into the algorithm, which uses them to refine the learned ontology and to derive new term pairs that should be connected. As depicted here, Climate Quiz asks players to evaluate whether two concepts presented by the system are related (e.g., environmental activism, activism), and which label best describes this relation (e.g., is a sub-category of). Players can assign one of eight relations, three of which are generic (is a sub-category of, is identical to, is the opposite of), whereas five are domain-specific (opposes, supports, threatens, influences, works on/with). Two further relations, “other” and “is not related to”, were added for cases not covered by the previous eight relations. The game's interface allows players to switch the position of the two concepts or to skip ambiguous pairs.
  7. To allow a comparative analysis of the two HC genres, a mechanised labour version of Climate Quiz was created on the CrowdFlower (CF) platform. In addition to the game interface, two verification questions were added to “force” contributors to read the terms rather than select a relation at random.
  8. Projects can run for hours, days or years, depending on genre and size
  9. Quality is measured as agreement with a gold standard. Note that, depending on how the raw input from CF is aggregated, the results differ considerably. In particular, the aggregation mechanism of CQ (the highest-scored relation must have 4 more votes than the second-scored relation) leads to worse results than the aggregation methods of CF (which take worker performance into account during the majority vote).
  10. Our findings experimentally verify all the differences between the two genres that the literature-based study identified. Additionally, thanks to the experimental approach, we have concrete values for some of the parameters. For those aspects where earlier studies disagree, we found that: 1) with the appropriate aggregation method, MLab results can be as good as those obtained with games, at least for the task in question; 2) worker diversity is higher in GWAPs