On Quality Control and Machine Learning in Crowdsourcing

Talk at the "Wisdom of the Crowd" AAAI 2012 Spring Symposium workshop (http://users.wpi.edu/~soniac/WisdomOfTheCrowd/WoCSchedule.htm) on the 2011 AAAI-HComp paper of the same title.

    Presentation Transcript

    • On Quality Control and Machine Learning in Crowdsourcing
      Matt Lease
      School of Information, University of Texas at Austin
      ml@ischool.utexas.edu
      @mattlease
    • Quality Control
      • Many factors matter
        – guidelines, experimental design, human factors, automation, …
      • Only as strong as weakest link
        – automation is not a silver bullet
      • Errors are not just due to lazy/stupid workers
        – Even in carefully designed and managed annotation projects, uncertain cases encountered
    • Human Factors (HF)
      • Questionnaire / Survey Design
      • Interface / Interaction Design
      • Incentives
      • Human Relations (HR): recruitment & retention
      • Long-term Commitment
        – rapport with co-workers
        – buy-in to organizational mission & value of work
        – opportunities for advancement in organization
      • Oversight / Management / Organization
      • Communication
    • HF Challenges & Consequences
      • Not part of typical CS curriculum or expertise
        – crowdsourcing disrupts prior area boundaries
      • NLP, IR, ML people traditionally don’t do HCI
        – now many of us dealing with such issues
      • Consequences
        – Errors from poor HF
        – Stumbling into known problems, recreating solutions
        – May see problems through limited vantage point
        – May over-rely on automation
      • Great opportunities for HCI collaboration
    • Minority Voice & Diversity
      • Opportunity: more diversity than “experts”
      • Risk: false reinforcement of majority view when minority is ignored, lost, or eliminated
      • Questions
        – How to recognize when majority is wrong?
        – How to recognize alternative or better truths?
        – Is QC systematically eliminating diversity?
        – How diverse is the crowd really?
    • Automation
      • Examples
        – Task Routing / Worker Selection
        – Adaptive Plurality, Decomposition
        – Post-hoc: Calibration, Filtering & Aggregation
      • Separation of concerns / middleware
        – Users specify their task, and system handles QC
        – Many do not have interest, time, skill, or risk tolerance to manage low-level QC on their own
        – Critical to widespread/enterprise adoption
        – Accelerate field progress
          • divide problem space for different groups to work on
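The post-hoc filtering and aggregation step named on the Automation slide could look roughly like the sketch below. It is an illustration only, not the method from the talk or paper: the function names, the `min_agreement` threshold, and the use of plain plurality voting are all assumptions chosen to keep the example small.

```python
from collections import Counter, defaultdict


def majority_vote(labels_by_item):
    """Pick the plurality label per item; ties and empty items yield None."""
    consensus = {}
    for item, labels in labels_by_item.items():
        counts = Counter(labels.values()).most_common()
        if not counts:
            consensus[item] = None
        elif len(counts) > 1 and counts[0][1] == counts[1][1]:
            consensus[item] = None  # top two labels tied
        else:
            consensus[item] = counts[0][0]
    return consensus


def worker_agreement(labels_by_item, consensus):
    """Fraction of each worker's labels that match the consensus label."""
    hits, totals = defaultdict(int), defaultdict(int)
    for item, labels in labels_by_item.items():
        if consensus[item] is None:
            continue  # no consensus to compare against
        for worker, label in labels.items():
            totals[worker] += 1
            hits[worker] += int(label == consensus[item])
    return {w: hits[w] / totals[w] for w in totals}


def filter_and_aggregate(labels_by_item, min_agreement=0.5):
    """Hypothetical two-pass QC: vote, drop low-agreement workers, vote again."""
    first_pass = majority_vote(labels_by_item)
    agreement = worker_agreement(labels_by_item, first_pass)
    kept = {}
    for item, labels in labels_by_item.items():
        trusted = {w: l for w, l in labels.items()
                   if agreement.get(w, 0.0) >= min_agreement}
        if trusted:
            kept[item] = trusted
    return majority_vote(kept), agreement


# Made-up labels from three hypothetical workers on three documents.
labels = {
    "d1": {"a": "rel", "b": "rel", "c": "non"},
    "d2": {"a": "non", "b": "non", "c": "rel"},
    "d3": {"a": "rel", "b": "non", "c": "non"},
}
final, agreement = filter_and_aggregate(labels)
```

In this toy data, removing the low-agreement worker leaves item `d3` with a one-to-one tie, so filtering can discard a vote that would otherwise have settled an item.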
    • Automation: Questions
      • Who are the workers?
      • What is the labor model?
      • What are affordances of the platform?
      • How does that drive subsequent setup?
      • Appropriate inter-annotator agreement measures for crowdwork?
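As one concrete reference point for the agreement question, here is a minimal sketch of Fleiss' kappa, a standard chance-corrected agreement statistic for more than two raters. The slides do not endorse any particular measure; the function name and input layout are assumptions. Note that the statistic requires the same number of ratings on every item, which crowd data often violates, and that is part of why the question is open.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of per-item label lists.

    Every item must carry the same number n >= 2 of labels; rater
    identity is ignored, as in the standard definition.
    """
    n = len(ratings[0])
    if n < 2 or any(len(item) != n for item in ratings):
        raise ValueError("need the same number (>= 2) of ratings per item")
    num_items = len(ratings)
    category_totals = Counter()
    per_item_agreement = []
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Proportion of agreeing rater pairs on this item.
        agreeing = sum(c * c for c in counts.values()) - n
        per_item_agreement.append(agreeing / (n * (n - 1)))
    observed = sum(per_item_agreement) / num_items
    expected = sum((t / (num_items * n)) ** 2
                   for t in category_totals.values())
    if expected == 1.0:
        return float("nan")  # only one category ever used; kappa undefined
    return (observed - expected) / (1 - expected)


# Made-up examples: perfect agreement, then split votes on both items.
perfect = fleiss_kappa([["y", "y", "y"], ["n", "n", "n"]])
split = fleiss_kappa([["y", "y", "n"], ["y", "n", "n"]])
```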
    • Lessons from Traditional Annotation
      • Need clear, detailed guidelines
      • Cannot predict all cases in advance
      • Guidelines evolve during annotation
      • Humans not merely better visual, audio sensors
        – e.g. imprecise directions & unforeseen examples
      • Crowdsourcing Questions
        – How to handle examples for which current guidelines are ambiguous, unclear, or insufficient?
        – What role do annotators play?
        – How to facilitate interaction?
    • Worker Organization
      • How might we organize workers for effective QC?
      • Do workers participate in high-level discussions (telecommuters) or act like automata (HPU)?
      • What organizational patterns might be used?
        – e.g. find-verify, fix-fix-verify, qualify-work
      • How do different organizational patterns interact with automation and other QC factors?
    • Impact on Machine Learning: More
      • Labeled data
      • Uncertain data
      • Diverse data
      • Specific data
      • Ongoing data
      • Rapid data
      • Hybrid systems
      • On-demand evaluation
      • Datasets & Benchmarks
      • Tasks
    • Open Questions
      • How do cheap, plentiful, rapid labels alter how we utilize supervised vs. semi-supervised vs. unsupervised methods?
        – Revisit task-specific learning curves
      • Mask uncertainty via QC, or model, propagate, and expose it?
      • How do we handle noise in active learning?
      • How to best utilize a 24/7 global crowd for lifetime, continuous, never-ending learning systems?
        – Sample size vs. adaptation
      • Can we develop a more formal, computational understanding of Wisdom of Crowds?
        – diversity, independence, decentralization, and aggregation
      • Can we better connect consensus algorithms with more general feature-based and ensemble models?
    • Other Issues
      • Hybrid systems match human-level competence
        – Achievable now at certain time/cost tradeoff, which can be navigated as function of context and need
      • Diverse labeling particularly valuable when subjective
        – Traditional in-house annotators not diverse & few
      • A middle way between traditional annotation and automated proxy metrics
        – e.g. translation quality & BLEU
        – More rapid than traditional annotation, more accurate than automated metrics
      • Less re-use has the risk of less comparable evaluation
        – Enduring value of community evaluations like TREC
    • Thank You!
      ir.ischool.utexas.edu/crowd
      Matt Lease, ml@ischool.utexas.edu, @mattlease
      • Students
        – Catherine Grady (iSchool)
        – Hyunjoon Jung (ECE)
        – Jorn Klinger (Linguistics)
        – Adriana Kovashka (CS)
        – Abhimanu Kumar (CS)
        – Di Liu (iSchool)
        – Hohyon Ryu (iSchool)
        – William Tang (CS)
        – Stephen Wolfson (iSchool)
      • Omar Alonso, Microsoft Bing
      • Support
        – John P. Commons