On Quality Control and Machine Learning in Crowdsourcing

Talk at "Wisdom of the Crowd" AAAI 2012 Spring Symposium workshop (http://users.wpi.edu/~soniac/WisdomOfTheCrowd/WoCSchedule.htm) on 2011 AAAI-HComp paper by the same title.

  1. On Quality Control and Machine Learning in Crowdsourcing
     Matt Lease
     School of Information, University of Texas at Austin
     ml@ischool.utexas.edu
     @mattlease
  2. Quality Control
     • Many factors matter
       – guidelines, experimental design, human factors, automation, …
     • Only as strong as the weakest link
       – automation is not a silver bullet
     • Errors are not just due to lazy/stupid workers
       – Even in carefully designed and managed annotation projects, uncertain cases are encountered
  3. Human Factors (HF)
     • Questionnaire / Survey Design
     • Interface / Interaction Design
     • Incentives
     • Human Relations (HR): recruitment & retention
     • Long-term Commitment
       – rapport with co-workers
       – buy-in to organizational mission & value of work
       – opportunities for advancement in organization
     • Oversight / Management / Organization
     • Communication
  4. HF Challenges & Consequences
     • Not part of typical CS curriculum or expertise
       – crowdsourcing disrupts prior area boundaries
     • NLP, IR, ML people traditionally don’t do HCI
       – now many of us are dealing with such issues
     • Consequences
       – Errors from poor HF
       – Stumbling into known problems, recreating solutions
       – May see problems through a limited vantage point
       – May over-rely on automation
     • Great opportunities for HCI collaboration
  5. Minority Voice & Diversity
     • Opportunity: more diversity than “experts”
     • Risk: false reinforcement of the majority view when the minority is ignored, lost, or eliminated
     • Questions
       – How to recognize when the majority is wrong?
       – How to recognize alternative or better truths?
       – Is QC systematically eliminating diversity? (see the sketch after this slide)
       – How diverse is the crowd really?
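
Whether QC erases diversity can be probed before any aggregation step. A minimal Python sketch (not from the talk; the worker responses below are hypothetical) that flags items where a majority vote would silently discard minority signal, using per-item label entropy:

    # Minimal sketch: high-entropy items are where consensus discards minority views.
    from collections import Counter
    from math import log2

    def label_entropy(labels):
        """Shannon entropy (bits) of the label distribution for one item."""
        counts = Counter(labels)
        total = sum(counts.values())
        probs = [c / total for c in counts.values()]
        return -sum(p * log2(p) for p in probs) if len(probs) > 1 else 0.0

    # Hypothetical worker responses per item.
    responses = {
        "item1": ["relevant", "relevant", "relevant"],                            # unanimous
        "item2": ["relevant", "not relevant", "relevant"],                        # mild dissent
        "item3": ["relevant", "not relevant", "not relevant", "borderline"],      # real disagreement
    }

    for item, labels in responses.items():
        majority = Counter(labels).most_common(1)[0][0]
        print(f"{item}: majority={majority}, entropy={label_entropy(labels):.2f} bits")

Tracking how much of this entropy survives the QC pipeline gives one concrete handle on the "is QC eliminating diversity?" question above.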
  6. Automation
     • Examples
       – Task Routing / Worker Selection
       – Adaptive Plurality, Decomposition
       – Post-hoc: Calibration, Filtering & Aggregation (an aggregation sketch follows this slide)
     • Separation of concerns / middleware
       – Users specify their task, and the system handles QC
       – Many do not have the interest, time, skill, or risk tolerance to manage low-level QC on their own
       – Critical to widespread/enterprise adoption
       – Accelerate field progress
         • divide the problem space for different groups to work on
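
The slide names calibration, filtering, and aggregation only at a high level. As one concrete instance (an illustration, not an algorithm prescribed by the talk), a Python sketch of log-odds weighted voting, where each worker's weight might come from accuracy on embedded gold questions; the worker IDs and accuracies below are made up:

    # Sketch of post-hoc aggregation by accuracy-weighted voting.
    from collections import defaultdict
    from math import log

    def weighted_vote(votes, worker_accuracy):
        """votes: list of (worker_id, label); worker_accuracy: worker_id -> estimated accuracy in (0.5, 1)."""
        scores = defaultdict(float)
        for worker, label in votes:
            p = worker_accuracy.get(worker, 0.6)
            # Log-odds weight: reliable workers count for more, near-random workers for almost nothing.
            scores[label] += log(p / (1.0 - p))
        return max(scores, key=scores.get)

    accuracy = {"w1": 0.95, "w2": 0.60, "w3": 0.55}   # e.g. estimated from embedded gold questions
    votes = [("w1", "spam"), ("w2", "not spam"), ("w3", "not spam")]
    print(weighted_vote(votes, accuracy))   # -> "spam": one reliable vote outweighs two near-random ones

This is exactly the kind of back-end a middleware layer could swap in without the task owner ever touching low-level QC.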
  7. Automation: Questions
     • Who are the workers?
     • What is the labor model?
     • What are the affordances of the platform?
     • How does that drive subsequent setup?
     • Appropriate inter-annotator agreement measures for crowdwork? (a kappa sketch follows this slide)
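
For reference on that last question, a minimal sketch of the classical two-annotator Cohen's kappa. Its assumption that the same pair of annotators labels every item is exactly what typically breaks in crowdwork; the labels below are hypothetical:

    # Classical Cohen's kappa for two fixed annotators, shown only as the traditional baseline.
    from collections import Counter

    def cohens_kappa(labels_a, labels_b):
        assert len(labels_a) == len(labels_b)
        n = len(labels_a)
        observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
        freq_a, freq_b = Counter(labels_a), Counter(labels_b)
        # Chance agreement expected from each annotator's marginal label distribution.
        expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in set(freq_a) | set(freq_b))
        return (observed - expected) / (1.0 - expected)

    a = ["rel", "rel", "not", "rel", "not"]
    b = ["rel", "not", "not", "rel", "not"]
    print(round(cohens_kappa(a, b), 2))   # 0.62 for these hypothetical labels

With a large crowd, sparse overlapping labels, and shifting worker pools, measures designed for missing data and many coders (e.g. Krippendorff's alpha) may be better fits, which is the open question the slide raises.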
  8. Lessons from Traditional Annotation
     • Need clear, detailed guidelines
     • Cannot predict all cases in advance
     • Guidelines evolve during annotation
     • Humans are not merely better visual and audio sensors
       – e.g. imprecise directions & unforeseen examples
     • Crowdsourcing Questions
       – How to handle examples for which current guidelines are ambiguous, unclear, or insufficient?
       – What role do annotators play?
       – How to facilitate interaction?
  9. Worker Organization
     • How might we organize workers for effective QC?
     • Do workers participate in high-level discussions (telecommuters) or act like automata (HPU)?
     • What organizational patterns might be used?
       – e.g. find-verify, find-fix-verify, qualify-work (a workflow sketch follows this slide)
     • How do different organizational patterns interact with automation and other QC factors?
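
A hypothetical skeleton of the find-fix-verify pattern named above, kept platform-agnostic by passing in an ask_crowd function; the function name and the toy stand-in are assumptions for illustration, not the talk's implementation:

    # Hypothetical staging of find-fix-verify: separate worker pools find problems,
    # propose fixes, and vote on fixes, so no single worker controls the outcome.
    from collections import Counter

    def find_fix_verify(document, ask_crowd):
        """ask_crowd(instruction, payload) -> list of worker responses (platform-specific)."""
        # FIND: workers independently flag problem spans (cheap and parallel).
        spans = set(ask_crowd("find: mark one unclear passage", document))
        # FIX: a separate pool proposes a rewrite for each flagged span.
        fixes = {span: ask_crowd("fix: rewrite this passage", span) for span in spans}
        # VERIFY: a third pool votes among candidate rewrites; keep the plurality winner.
        return {span: Counter(ask_crowd("verify: choose the best rewrite", cands)).most_common(1)[0][0]
                for span, cands in fixes.items()}

    # Toy stand-in for a real platform call, so the sketch runs end to end.
    def fake_crowd(instruction, payload):
        return ["rewrite A", "rewrite A", "rewrite B"] if instruction.startswith("verify") else ["passage 1"]

    print(find_fix_verify("some draft text", fake_crowd))

How such staging interacts with automated QC (e.g. weighting the verify votes) is exactly the interaction question the slide poses.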
  10. Impact on Machine Learning: More
      • Labeled data
      • Uncertain data
      • Diverse data
      • Specific data
      • Ongoing data
      • Rapid data
      • Hybrid systems
      • On-demand evaluation
      • Datasets & Benchmarks
      • Tasks
  11. Open Questions
      • How do cheap, plentiful, rapid labels alter how we utilize supervised vs. semi-supervised vs. unsupervised methods?
        – Revisit task-specific learning curves (a small simulation sketch follows this slide)
      • Mask uncertainty via QC, or model, propagate, and expose it?
      • How do we handle noise in active learning?
      • How to best utilize a 24/7 global crowd for lifetime, continuous, never-ending learning systems?
        – Sample size vs. adaptation
      • Can we develop a more formal, computational understanding of the Wisdom of Crowds?
        – diversity, independence, decentralization, and aggregation
      • Can we better connect consensus algorithms with more general feature-based and ensemble models?
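
One assumed way to start revisiting those learning curves: simulate a fixed judging budget and compare spending it on many singly-labeled items versus fewer items labeled three times with majority voting. The dataset is synthetic and the 70%-accurate worker is simulated; nothing here is from the talk:

    # Sketch: fixed budget of noisy crowd judgments, two ways to spend it.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
    X_train, y_train, X_test, y_test = X[:3000], y[:3000], X[3000:], y[3000:]

    def noisy(labels, p_correct=0.7):
        # Simulated worker: flips each binary label with probability 1 - p_correct.
        flip = rng.random(len(labels)) > p_correct
        return np.where(flip, 1 - labels, labels)

    budget = 900  # total worker judgments we can afford

    # (a) Single noisy label per item: 900 items, one judgment each.
    idx_a = rng.choice(len(X_train), budget, replace=False)
    y_a = noisy(y_train[idx_a])

    # (b) Majority of 3 noisy labels per item: 300 items, three judgments each.
    idx_b = rng.choice(len(X_train), budget // 3, replace=False)
    votes = np.stack([noisy(y_train[idx_b]) for _ in range(3)])
    y_b = (votes.sum(axis=0) >= 2).astype(int)

    for name, idx, labels in [("single label", idx_a, y_a), ("majority of 3", idx_b, y_b)]:
        model = LogisticRegression(max_iter=1000).fit(X_train[idx], labels)
        print(name, round(model.score(X_test, y_test), 3))

Sweeping the budget and worker accuracy traces out the task-specific learning curves the slide asks about, and the same harness extends to semi-supervised or active learning baselines.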
  12. Other Issues
      • Hybrid systems can match human-level competence
        – Achievable now at a certain time/cost tradeoff, which can be navigated as a function of context and need
      • Diverse labeling is particularly valuable when the task is subjective
        – Traditional in-house annotators are few and not diverse
      • A middle way between traditional annotation and automated proxy metrics
        – e.g. translation quality & BLEU
        – More rapid than traditional annotation, more accurate than automated metrics
      • Less re-use risks less comparable evaluation
        – Enduring value of community evaluations like TREC
  13. Thank You!
      ir.ischool.utexas.edu/crowd
      Matt Lease
      ml@ischool.utexas.edu
      @mattlease
      • Students
        – Catherine Grady (iSchool)
        – Hyunjoon Jung (ECE)
        – Jorn Klinger (Linguistics)
        – Adriana Kovashka (CS)
        – Abhimanu Kumar (CS)
        – Di Liu (iSchool)
        – Hohyon Ryu (iSchool)
        – William Tang (CS)
        – Stephen Wolfson (iSchool)
      • Omar Alonso, Microsoft Bing
      • Support
        – John P. Commons
