Crowdsourcing for Information Retrieval: Principles, Methods, and Applications
Tutorial by Omar Alonso and Matthew Lease, presented July 24, 2011 at the 34th Annual ACM SIGIR Conference in Beijing, China. See http://www.sigir2011.org/crowdsourcing-for-information-retrieval.htm.
  1. 1. Crowdsourcing for Information Retrieval: Principles, Methods, and Applications Omar Alonso Microsoft Matthew Lease University of Texas at Austin July 28, 2011 July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 1
  2. 2. Tutorial Objectives• What is crowdsourcing? (!= MTurk)• How and when to use crowdsourcing?• How to use Mechanical Turk• Experimental setup and design guidelines for working with the crowd• Quality control: issues, measuring, and improving• IR + Crowdsourcing – research landscape and open challenges July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 2
  3. 3. Tutorial Outline: I. Introduction and Motivating Examples; II. Amazon Mechanical Turk (and CrowdFlower); III. Relevance Judging and Crowdsourcing; IV. Design of experiments (the good stuff); V. From Labels to Human Computation; VI. Worker Incentives (money isn’t everything); VII. The Road Ahead (+ refs at end). July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 3
  4. 4. Terminology We’ll Cover• Crowdsourcing: more than a buzzword? – What is and isn’t crowdsourcing? – Subset we discuss: micro-tasks (diagram coming)• Human Computation = having people do stuff – Functional view of human work, both helpful & harmful• AMT / MTurk – HIT, Requester, Assignment, Turking & Turkers• Quality Control (QC) – spam & spammers – label aggregation, consensus, plurality, multi-labeling – “gold” data, honey pots, verifiable answers, trap questions July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 4
  5. 5. I INTRODUCTION TO CROWDSOURCING July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 5
  6. 6. From Outsourcing to Crowdsourcing• Take a job traditionally performed by a known agent (often an employee)• Outsource it to an undefined, generally large group of people via an open call• New application of principles from open source movement July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 6
  7. 7. Community Q&A / Social Search / Public PollingJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 7
  8. 8. July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 8
  9. 9. Mechanical What?July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 9
  10. 10. Mechanical Turk (MTurk) Chess machine constructed and unveiled in 1770 by Wolfgang von Kempelen (1734–1804) J. Pontin. Artificial Intelligence, With Help From the Humans. NY Times (March 25, 2007) July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 10
  11. 11. • “Micro-task” crowdsourcing marketplace• On-demand, scalable, real-time workforce• Online since 2005 (and still in “beta”)• Programmer’s API & “Dashboard” GUIJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 11
  12. 12. This isn’t just a lab toy… http://www.mturk-tracker.com (P. Ipeirotis ’10) From 1/09 – 4/10, 7M HITs from 10K requesters worth $500,000 USD (significant under-estimate) July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 12
  13. 13. Why Crowdsourcing for IR?• Easy, cheap and fast labeling• Ready-to-use infrastructure – MTurk payments, workforce, interface widgets – CrowdFlower quality control mechanisms, etc.• Allows early, iterative, frequent experiments – Iteratively prototype and test new ideas – Try new tasks, test when you want & as you go• Proven in major IR shared task evaluations – CLEF image, TREC, INEX, WWW/Yahoo SemSearch July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 13
  14. 14. Legal Disclaimer: Caution Tape and Silver Bullets• Often still involves more art than science• Not a magic panacea, but another alternative – one more data point for analysis, complements other methods• Quality may be sacrificed for time/cost/effort• Hard work & experimental design still required! July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 14
  15. 15. Hello World Demo• We’ll show a simple, short demo of MTurk• This is a teaser highlighting things we’ll discuss – Don’t worry about details; we’ll revisit them• Specific task unimportant• Big idea: easy, fast, cheap to label with MTurk!July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 15
  16. 16. Jane saw the man with the binocularsJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 16
  17. 17. DEMOJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 17
  18. 18. Traditional Annotation / Data Collection• Setup data collection software / harness• Recruit volunteers (often undergrads)• Pay a flat fee for experiment or hourly wage• Characteristics – Slow – Expensive – Tedious – Sample BiasJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 18
  19. 19. How about some real examples?• Let’s see examples of MTurk’s use in prior studies (many areas!) – e.g. IR, NLP, computer vision, user studies, usability testing, psychological studies, surveys, …• Check bibliography at end for more referencesJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 19
  20. 20. NLP Example – Dialect IdentificationJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 20
  21. 21. NLP Example – Spelling correctionJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 21
  22. 22. NLP Example – Machine Translation• Manual evaluation of translation quality is slow and expensive• High agreement between non-experts and experts• $0.10 to translate a sentence C. Callison-Burch. “Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon’s Mechanical Turk”, EMNLP 2009. B. Bederson et al. Translation by Iterative Collaboration between Monolingual Users, GI 2010. July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 22
  23. 23. Snow et al. (2008). EMNLP• 5 Tasks – Affect recognition – Word similarity – Recognizing textual entailment – Event temporal ordering – Word sense disambiguation• high agreement between crowd labels and expert “gold” labels – assumes training data for worker bias correction• 22K labels for $26 !July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 23
  24. 24. CV Example – Painting Similarity Kovashka & Lease, CrowdConf’10July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 24
  25. 25. IR Example – Relevance and adsJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 25
  26. 26. Okay, okay! I’m a believer! How can I get started with MTurk?• You have an idea (e.g. novel IR technique)• Hiring editors too difficult / expensive / slow• You don’t have a large traffic query logCan you test your idea via crowdsourcing?• Is my idea crowdsourcable?• How do I start?• What do I need?July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 26
  27. 27. II AMAZON MECHANICAL TURKJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 27
  28. 28. The Requester• Sign up with your Amazon account• Amazon payments• Purchase prepaid HITs• There is no minimum or up-front fee• MTurk collects a 10% commission• The minimum commission charge is $0.005 per HITJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 28
  29. 29. MTurk Dashboard• Three tabs – Design – Publish – Manage• Design – HIT Template• Publish – Make work available• Manage – Monitor progress July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 29
  30. 30. Dashboard - IIJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 30
  31. 31. API• Amazon Web Services API• Rich set of services• Command line tools• More flexibility than dashboardJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 31
  32. 32. Dashboard vs. API• Dashboard – Easy to prototype – Setup and launch an experiment in a few minutes• API – Ability to integrate AMT as part of a system – Ideal if you want to run experiments regularly – Schedule tasksJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 32
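To make the API route concrete, here is a minimal sketch using the boto Python library mentioned later under tools and packages. The credentials, URLs, and task text are placeholders, and the argument names should be checked against the boto release you actually use.

    # Sketch: create a batch of HITs programmatically (boto 2's MTurk bindings assumed).
    from boto.mturk.connection import MTurkConnection
    from boto.mturk.question import ExternalQuestion

    mtc = MTurkConnection(aws_access_key_id="YOUR_KEY",
                          aws_secret_access_key="YOUR_SECRET",
                          host="mechanicalturk.sandbox.amazonaws.com")  # sandbox first

    question = ExternalQuestion(external_url="https://example.org/judge?doc=42",  # hypothetical form URL
                                frame_height=600)

    result = mtc.create_hit(question=question,
                            title="Judge the relevance of a web page",
                            description="Read one short document and answer a yes/no question",
                            keywords="relevance, search, judging",
                            reward=0.02,             # dollars per assignment
                            max_assignments=5,       # 5 workers per document
                            duration=10 * 60,        # 10 minutes to finish one HIT
                            lifetime=3 * 24 * 3600)  # keep it on the market for 3 days
    print(result[0].HITId)

The same connection object can later retrieve and approve the submitted assignments, which is what makes the API route attractive for recurring, scheduled experiments.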
  33. 33. But where do my labels come from?• An all powerful black box?• A magical, faraway land?July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 33
  34. 34. Nope, MTurk has actual workers too!• Sign up with your Amazon account• Tabs – Account: work approved/rejected – HIT: browse and search for work – Qualifications: browse & search qualifications• Start turking!July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 34
  35. 35. Doing some work• Strongly recommended• Do some work before you create workJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 35
  36. 36. But who are my workers?• A. Baio, November 2008. The Faces of Mechanical Turk.• P. Ipeirotis. March 2010. The New Demographics of Mechanical Turk• J. Ross, et al. Who are the Crowdworkers?... CHI 2010. July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 36
  37. 37. Worker Demographics• 2008-2009 studies found less global and diverse than previously thought – US – Female – Educated – Bored – Money is secondaryJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 37
  38. 38. 2010 shows increasing diversity: 47% US, 34% India, 19% other (P. Ipeirotis, March 2010) July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 38
  39. 39. Is MTurk my only choice? No, see below.• Crowdflower (since 2007, www.crowdflower.com)• CloudCrowd• DoMyStuff• Livework• Clickworker• SmartSheet• uTest• Elance• oDesk• vWorker (was rent-a-coder) July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 39
  40. 40. CrowdFlower (since 2007)• Labor on-demand• Channels• Quality control features• Sponsor: CSE’10, CSDM’11, CIR’11, TREC’11 Crowd Track July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 40
  41. 41. High-level Issues in Crowdsourcing• Process – Experimental design, annotation guidelines, iteration• Choose crowdsourcing platform (or roll your own)• Human factors – Payment / incentives, interface and interaction design, communication, reputation, recruitment, retention• Quality Control / Data Quality – Trust, reliability, spam detection, consensus labelingJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 41
  42. 42. III RELEVANCE JUDGING & CROWDSOURCING July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 42
  43. 43. July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 43
  44. 44. Relevance and IR• What is relevance? – Multidimensional – Dynamic – Complex but systematic and measurable• Relevance in Information Retrieval• Frameworks• Types – System or algorithmic – Topical – Pertinence – Situational – MotivationalJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 44
  45. 45. Evaluation• Relevance is hard to evaluate – Highly subjective – Expensive to measure• Click data• Professional editorial work• VerticalsJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 45
  46. 46. Crowdsourcing and Relevance Evaluation• For relevance, it combines two main approaches – Explicit judgments – Automated metrics• Other features – Large scale – Inexpensive – DiversityJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 46
  47. 47. User Studies• Investigate attitudes about saving, sharing, publishing, and removing online photos• Survey – A scenario-based probe of respondent attitudes, designed to yield quantitative data – A set of questions (closed- and open-ended) – Importance of recent activity – 41 questions – 7-point scale• 250 respondents C. Marshall and F. Shipman. “The Ownership and Reuse of Visual Media”, JCDL 2011. July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 47
  48. 48. Elicitation Criteria• Relevance in a vertical like e-commerce• Are classical criteria right for e-commerce?• Classical criteria (Barry and Schamber) – Accuracy & validity, consensus within the field, content novelty, depth & scope, presentation, recency, reliability, verifiability• E-commerce criteria – Brand name, product name, price/value (cheap, affordable, expensive, not suspiciously cheap), availability, ratings & user reviews, latest model/version, personal aspects, perceived value, genre & age• Experiment – Select e-C and non e-C queries – Each worker: 1 query/need (e-C or non e-C) – 7 workers per HIT. O. Alonso and S. Mizzaro. “Relevance criteria for e-commerce: a crowdsourcing-based experimental analysis”, SIGIR 2009. July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 48
  49. 49. IR Example – Product SearchJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 49
  50. 50. IR Example – Snippet Evaluation• Study on summary lengths• Determine preferred result length• Asked workers to categorize web queries• Asked workers to evaluate snippet quality• Payment between $0.01 and $0.05 per HIT M. Kaisser, M. Hearst, and L. Lowe. “Improving Search Results Quality by Customizing Summary Lengths”, ACL/HLT, 2008.July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 50
  51. 51. IR Example – Relevance Assessment• Replace TREC-like relevance assessors with MTurk?• Selected topic “space program” (011)• Modified original 4-page instructions from TREC• Workers more accurate than original assessors!• 40% provided justification for each answer O. Alonso and S. Mizzaro. “Can we get rid of TREC assessors? Using Mechanical Turk for relevance assessment”, SIGIR Workshop on the Future of IR Evaluation, 2009. July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 51
  52. 52. IR Example – Timeline Annotation• Workers annotate timeline on politics, sports, culture• Given a timex (1970s, 1982, etc.) suggest something• Given an event (Vietnam, World cup, etc.) suggest a timex K. Berberich, S. Bedathur, O. Alonso, G. Weikum “A Language Modeling Approach for Temporal Information Needs”. ECIR 2010 July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 52
  53. 53. IR Example – Is Tweet Interesting?• Detecting uninteresting content in text streams – Alonso et al. SIGIR 2010 CSE Workshop.• Is this tweet interesting to the author and friends only?• Workers classify tweets• 5 tweets per HIT, 5 workers, $0.02• 57% is categorically not interesting July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 53
  54. 54. July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 54
  55. 55. Started with a joke …July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 55
  56. 56. Results for {idiot} at WSDMFebruary 2011: 5/7 (R), 2/7 (NR) – Most of the time those TV reality stars have absolutely no talent. They do whatever they can to make a quick dollar. Most of the time the reality tv stars don not have a mind of their own. R – Most are just celebrity wannabees. Many have little or no talent, they just want fame. R – I can see this one going both ways. A particular sort of reality star comes to mind, though, one who was voted off Survivor because he chose not to use his immunity necklace. Sometimes the label fits, but sometimes it might be unfair. R – Just because someone else thinks they are an "idiot", doesnt mean that is what the word means. I dont like to think that any one persons photo would be used to describe a certain term. NR – While some reality-television stars are genuinely stupid (or cultivate an image of stupidity), that does not mean they can or should be classified as "idiots." Some simply act that way to increase their TV exposure and potential earnings. Other reality-television stars are really intelligent people, and may be considered as idiots by people who dont like them or agree with them. It is too subjective an issue to be a good result for a search engine. NR – Have you seen the knuckledraggers on reality television? They should be required to change their names to idiot after appearing on the show. You could put numbers after the word idiot so we can tell them apart. R – Although I have not followed too many of these shows, those that I have encountered have for a great part a very common property. That property is that most of the participants involved exhibit a shallow self-serving personality that borders on social pathological behavior. To perform or act in such an abysmal way could only be an act of an idiot. R July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 56
  57. 57. Two Simple Examples of MTurk1. Ask workers to classify a query2. Ask workers to judge document relevanceSteps• Define high-level task• Design & implement interface & backend• Launch, monitor progress, and assess work• Iterate designJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 57
  58. 58. Query Classification Task• Ask the user to classify a query• Show a form that contains a few categories• Upload a few queries (~20)• Use 3 workersJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 58
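If you go through the dashboard for this task, the ~20 queries become rows of a batch CSV whose column header must match the ${...} placeholder used in the HIT template. A minimal sketch (the column name "query", the sample queries, and the file name are just examples):

    # Sketch: write a small batch file for the MTurk dashboard; the "query"
    # header must match the ${query} placeholder in the HIT template.
    import csv

    queries = ["world cup 2010", "cheap flights to beijing", "jaguar"]  # sample queries

    with open("query_classification_batch.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["query"])            # header row = template variable name
        writer.writerows([q] for q in queries)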
  59. 59. DEMOJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 59
  60. 60. July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 60
  61. 61. Relevance Evaluation Task• Relevance assessment task• Use a few documents from TREC• Ask user to perform binary evaluation• Modification: graded evaluation• Use 5 workersJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 61
  62. 62. DEMOJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 62
  63. 63. Typical Workflow• Define and design what to test• Sample data• Design the experiment• Run experiment• Collect data and analyze results• Quality controlJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 63
  64. 64. Crowdsourcing in Major IR Evaluations• CLEF image • Nowak and Ruger, MIR’10• TREC blog • McCreadie et al., CSE’10, CSDM’11• INEX book • Kazai et al., SIGIR’11• SemSearch • Blanco et al., SIGIR’11July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 64
  65. 65. BREAKJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 65
  66. 66. IV DESIGN OF EXPERIMENTSJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 66
  67. 67. July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 67
  68. 68. Survey Design• One of the most important parts• Part art, part science• Instructions are key• Prepare to iterateJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 68
  69. 69. Questionnaire Design• Ask the right questions• Workers may not be IR experts, so don’t assume they share your terminology• Show examples• Hire a technical writer – Engineer writes the specification – Writer communicates July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 69
  70. 70. UX Design• Time to apply all those usability concepts• Generic tips – Experiment should be self-contained. – Keep it short and simple. Brief and concise. – Be very clear with the relevance task. – Engage with the worker. Avoid boring stuff. – Always ask for feedback (open-ended question) in an input box.July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 70
  71. 71. UX Design - II• Presentation• Document design• Highlight important concepts• Colors and fonts• Need to grab attention• LocalizationJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 71
  72. 72. Examples - I• Asking too much, task not clear, “do NOT/reject”• Worker has to do a lot of stuffJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 72
  73. 73. Example - II• Lot of work for a few cents• Go here, go there, copy, enter, count …July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 73
  74. 74. A Better Example• All information is available – What to do – Search result – Question to answerJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 74
  75. 75. July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 75
  76. 76. Form and Metadata• Form with a closed question (binary relevance) and an open-ended question (user feedback)• Clear title, useful keywords• Workers need to find your task July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 76
  77. 77. TREC Assessment – Example IJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 77
  78. 78. TREC Assessment – Example IIJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 78
  79. 79. How Much to Pay?• Price commensurate with task effort – Ex: $0.02 for yes/no answer + $0.02 bonus for optional feedback• Ethics & market-factors: W. Mason and S. Suri, 2010. – e.g. non-profit SamaSource contracts workers in refugee camps – Predict right price given market & task: Wang et al. CSDM’11• Uptake & time-to-completion vs. Cost & Quality – Too little $$, no interest or slow – too much $$, attract spammers – Real problem is lack of reliable QA substrate• Accuracy & quantity – More pay = more work, not better (W. Mason and D. Watts, 2009)• Heuristics: start small, watch uptake and bargaining feedback• Worker retention (“anchoring”) See also: L.B. Chilton et al. KDD-HCOMP 2010. July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 79
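A back-of-the-envelope budget sketch, using the fee structure quoted earlier (10% commission, $0.005 minimum) and treating the optional bonus like the reward for fee purposes, which is a simplification; the numbers are illustrative only.

    # Rough cost estimate for a crowdsourced judging batch (illustrative numbers only).
    def batch_cost(num_hits, workers_per_hit, reward, bonus=0.0,
                   fee_rate=0.10, min_fee=0.005):
        per_assignment = reward + bonus
        fee = max(fee_rate * per_assignment, min_fee)   # commission, applied per assignment here
        return num_hits * workers_per_hit * (per_assignment + fee)

    # 1,000 topic/document pairs, 5 workers each, $0.02 reward + $0.02 optional bonus
    print("$%.2f" % batch_cost(1000, 5, 0.02, bonus=0.02))   # $225.00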
  80. 80. Development Framework• Incremental approach• Measure, evaluate, and adjust as you go• Suitable for repeatable tasksJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 80
  81. 81. Implementation• Similar to a UX• Build a mock up and test it with your team – Yes, you need to judge some tasks• Incorporate feedback and run a test on MTurk with a very small data set – Time the experiment – Do people understand the task?• Analyze results – Look for spammers – Check completion times• Iterate and modify accordinglyJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 81
  82. 82. Implementation – II• Introduce quality control – Qualification test – Gold answers (honey pots)• Adjust passing grade and worker approval rate• Run experiment with new settings & same data• Scale on data• Scale on workersJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 82
  83. 83. Experiment in Production• Lots of tasks on MTurk at any moment• Need to grab attention• Importance of experiment metadata• When to schedule – Split a large task into batches and have 1 single batch in the system – Always review feedback from batch n before uploading n+1July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 83
  84. 84. July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 84
  85. 85. Quality Control• Extremely important part of the experiment• Approach as “overall” quality; not just for workers• Bi-directional channel – You may think the worker is doing a bad job. – The same worker may think you are a lousy requester. July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 85
  86. 86. Quality Control - II• Approval rate: easy to use, & just as easily defeated – P. Ipeirotis. Be a Top Mechanical Turk Worker: You Need $5 and 5 Minutes. Oct. 2010• Mechanical Turk Masters (June 23, 2011) – Very recent addition, amount of benefit uncertain• Qualification test – Pre-screen workers’ ability to do the task (accurately) – Example and pros/cons in next slides• Assess worker quality as you go – Trap questions with known answers (“honey pots”) – Measure inter-annotator agreement between workers• No guarantees July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 86
  87. 87. A qualification test snippet

    <Question>
      <QuestionIdentifier>question1</QuestionIdentifier>
      <QuestionContent>
        <Text>Carbon monoxide poisoning is</Text>
      </QuestionContent>
      <AnswerSpecification>
        <SelectionAnswer>
          <StyleSuggestion>radiobutton</StyleSuggestion>
          <Selections>
            <Selection>
              <SelectionIdentifier>1</SelectionIdentifier>
              <Text>A chemical technique</Text>
            </Selection>
            <Selection>
              <SelectionIdentifier>2</SelectionIdentifier>
              <Text>A green energy treatment</Text>
            </Selection>
            <Selection>
              <SelectionIdentifier>3</SelectionIdentifier>
              <Text>A phenomena associated with sports</Text>
            </Selection>
            <Selection>
              <SelectionIdentifier>4</SelectionIdentifier>
              <Text>None of the above</Text>
            </Selection>
          </Selections>
        </SelectionAnswer>
      </AnswerSpecification>
    </Question>

    July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 87
  88. 88. Qualification tests: pros and cons• Advantages – Great tool for controlling quality – Adjust passing grade• Disadvantages – Extra cost to design and implement the test – May turn off workers, hurt completion time – Needs to be refreshed on a regular basis – Hard to verify subjective tasks like judging relevance• Try creating task-related questions to get workers familiar with the task before they start in earnest July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 88
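A sketch of how a test like the snippet above might be registered and attached to HITs through boto; the method and argument names, the AnswerKey file, and the thresholds are assumptions to verify against your boto version and the MTurk qualification schema.

    # Sketch: register the qualification test and require it, plus a 95% approval
    # rate, on new HITs (boto 2's MTurk bindings assumed).
    from boto.mturk.connection import MTurkConnection
    from boto.mturk.qualification import (Qualifications, Requirement,
                                          PercentAssignmentsApprovedRequirement)

    mtc = MTurkConnection(host="mechanicalturk.sandbox.amazonaws.com")

    question_form_xml = open("qualification_test.xml").read()   # QuestionForm XML like the snippet above
    answer_key_xml = open("qualification_answers.xml").read()   # matching AnswerKey XML (not shown)

    qual = mtc.create_qualification_type(
        name="Relevance judging readiness",
        description="A short multiple-choice test taken before judging documents",
        status="Active",
        test=question_form_xml,
        answer_key=answer_key_xml,
        test_duration=10 * 60)                 # 10 minutes to complete the test

    requirements = Qualifications()
    requirements.add(PercentAssignmentsApprovedRequirement("GreaterThanOrEqualTo", 95))
    requirements.add(Requirement(qual[0].QualificationTypeId,
                                 "GreaterThanOrEqualTo", 70))   # adjustable passing grade

    # pass qualifications=requirements to create_hit(...) so only qualified workers see the task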
  89. 89. Methods for measuring agreement• What to look for – Agreement, reliability, validity• Inter-annotator agreement level – Agreement between judges – Agreement between judges and the gold set• Some statistics – Percentage agreement – Cohen’s kappa (2 raters) – Fleiss’ kappa (any number of raters) – Krippendorff’s alpha• With majority vote, what if 2 say relevant, 3 say not? – Use an expert to break ties (Kochhar et al., HCOMP’10; GQR) – Collect more judgments as needed to reduce uncertainty July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 89
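For the first two statistics in the list, a self-contained sketch of percentage agreement and Cohen's kappa over two judges' labels (the labels below are illustrative; no external libraries assumed):

    # Percentage agreement and Cohen's kappa for two raters over categorical labels.
    from collections import Counter

    def percent_agreement(r1, r2):
        return sum(a == b for a, b in zip(r1, r2)) / float(len(r1))

    def cohens_kappa(r1, r2):
        n = float(len(r1))
        p_o = percent_agreement(r1, r2)                       # observed agreement
        c1, c2 = Counter(r1), Counter(r2)
        p_e = sum((c1[l] / n) * (c2[l] / n)                   # chance agreement
                  for l in set(r1) | set(r2))
        return (p_o - p_e) / (1 - p_e)

    judge_a = ["R", "R", "NR", "R", "NR", "NR", "R", "R"]
    judge_b = ["R", "NR", "NR", "R", "NR", "R", "R", "R"]
    print(percent_agreement(judge_a, judge_b), cohens_kappa(judge_a, judge_b))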
  90. 90. Inter-rater reliability• Lots of research• Statistics books cover most of the material• Three categories based on the goals – Consensus estimates – Consistency estimates – Measurement estimatesJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 90
  91. 91. Quality control on relevance assessments• INEX 2008 Book track• Home grown system (no MTurk)• Propose a game for collecting assessments• CRA Method G. Kazai, N. Milic-Frayling, and J. Costello. “Towards Methods for the Collective Gathering and Quality Control of Relevance Assessments”, SIGIR 2009.July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 91
  92. 92. July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 92
  93. 93. Quality Control & Assurance• Filtering – Approval rate (built-in but defeatable) – Geographic restrictions (e.g. US only, built-in) – Worker blocking – Qualification test • Con: slows down experiment, difficult to “test” relevance • Solution: create questions to let workers get familiar before the assessment – Does not guarantee success• Assessing quality – Interject verifiable/gold answers (trap questions, honey pots) • P. Ipeirotis. Worker Evaluation in Crowdsourcing: Gold Data or Multiple Workers? Sept. 2010. – 2-tier approach: Group 1 does task, Group 2 verifies • Quinn and Bederson ’09, Bernstein et al. ’10• Identify workers that always disagree with the majority – Risk: masking cases of ambiguity or diversity, “tail” behaviors July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 93
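One way to operationalize the gold-answer checks: score each worker against the interjected trap questions and flag those who fall below a threshold for manual review. The data layout, thresholds, and example values below are assumptions, not a prescribed method.

    # Sketch: flag workers whose accuracy on interjected gold (trap) questions is low.
    from collections import defaultdict

    def flag_workers(labels, gold, min_gold=3, min_accuracy=0.7):
        """labels: list of (worker_id, item_id, label); gold: {item_id: correct_label}."""
        hits, seen = defaultdict(int), defaultdict(int)
        for worker, item, label in labels:
            if item in gold:
                seen[worker] += 1
                hits[worker] += (label == gold[item])
        return [w for w in seen
                if seen[w] >= min_gold and hits[w] / float(seen[w]) < min_accuracy]

    labels = [("W1", "doc7", "R"), ("W1", "doc9", "NR"), ("W2", "doc7", "NR")]
    gold = {"doc7": "R", "doc9": "NR"}
    print(flag_workers(labels, gold, min_gold=1))   # ['W2']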
  94. 94. More on quality control & assurance• HR issues: recruiting, selection, & retention – e.g., post/tweet, design a better qualification test, bonuses, …• Collect more redundant judgments… – at some point defeats cost savings of crowdsourcing – 5 workers is often sufficient• Use better aggregation method – Voting – Consensus – AveragingJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 94
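A sketch of the simplest aggregation option, majority voting, written so that ties are surfaced separately and can be sent to an expert or collect additional judgments, as suggested earlier; the item IDs and labels are illustrative.

    # Sketch: aggregate redundant judgments per item by majority vote, returning
    # ties separately so they can go to an expert or receive more labels.
    from collections import Counter

    def majority_vote(judgments):
        """judgments: {item_id: [label, label, ...]} -> (decisions, ties)"""
        decisions, ties = {}, []
        for item, labels in judgments.items():
            (top, top_n), *rest = Counter(labels).most_common()
            if rest and rest[0][1] == top_n:
                ties.append(item)           # e.g. 2 relevant vs. 2 not relevant
            else:
                decisions[item] = top
        return decisions, ties

    judgments = {"doc1": ["R", "R", "NR", "R", "NR"], "doc2": ["R", "NR", "R", "NR"]}
    print(majority_vote(judgments))   # ({'doc1': 'R'}, ['doc2'])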
  95. 95. Data quality • Data quality via repeated labeling • Repeated labeling can improve label quality and model quality • When labels are noisy, repeated labeling can be preferable to a single labeling • Cost issues with labeling. V. Sheng, F. Provost, P. Ipeirotis. “Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers”, KDD 2008. July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 95
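To see why a handful of repeated labels goes a long way, a quick calculation (the standard binomial argument, not taken from the cited paper) of the chance that the majority of n independent labelers, each correct with probability p, produces the right label:

    # Probability that the majority of n labelers is correct, if each labels
    # correctly and independently with probability p (odd n assumed here).
    from math import comb

    def majority_correct(p, n):
        return sum(comb(n, k) * p**k * (1 - p)**(n - k)
                   for k in range(n // 2 + 1, n + 1))

    for n in (1, 3, 5, 7):
        print(n, round(majority_correct(0.7, n), 3))
    # 1 0.7   3 0.784   5 0.837   7 0.874  -- quality rises, with diminishing returns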
  96. 96. Scales and labels• Binary – Yes, No• 5-point Likert – Strongly disagree, disagree, neutral, agree, strongly agree• Graded relevance: – DCG: Irrelevant, marginally, fairly, highly (Jarvelin, 2000) – TREC: Highly relevant, relevant, (related), not relevant – Yahoo/MS: Perfect, excellent, good, fair, bad (PEGFB) – The Google Quality Raters Handbook (March 2008) – 0 to 10 (0 = totally irrelevant, 10 = most relevant)• Usability factors – Provide clear, concise labels that use plain language – Avoid unfamiliar jargon and terminology July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 96
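As a reminder of where graded labels end up, a small sketch mapping one of the graded scales above to numeric gains and computing DCG for a ranked list. This is one common formulation; the gain values are a modeling choice, not a standard.

    # Sketch: turn a graded scale (irrelevant/marginally/fairly/highly) into
    # numeric gains and compute DCG for a ranked result list.
    from math import log2

    GAIN = {"irrelevant": 0, "marginally": 1, "fairly": 2, "highly": 3}

    def dcg(labels_in_rank_order):
        return sum(GAIN[l] / log2(i + 2) for i, l in enumerate(labels_in_rank_order))

    print(round(dcg(["highly", "irrelevant", "fairly"]), 3))   # 3 + 0 + 1 = 4.0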
  97. 97. Was the task difficult?• Ask workers to rate the difficulty of a topic• 50 topics, TREC; 5 workers, $0.01 per taskJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 97
  98. 98. Other quality heuristics• Justification/feedback as quasi-captcha – Successfully used at TREC and INEX experiments – Should be optional – Automatically verifying feedback was written by a person may be difficult (classic spam detection task)• Broken URL/incorrect object – Leave an outlier in the data set – Workers will tell you – If somebody answers “excellent” on a graded relevance test for a broken URL => probably spammerJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 98
  99. 99. MTurk QA: Tools and Packages• QA infrastructure layers atop MTurk promote useful separation-of-concerns from task – TurkIt • Quik Turkit provides nearly realtime services – Turkit-online (??) – Get Another Label (& qmturk) – Turk Surveyor – cv-web-annotation-toolkit (image labeling) – Soylent – Boto (python library) • Turkpipe: submit batches of jobs using the command line.• More needed…July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 99
  100. 100. Dealing with bad workers• Pay for “bad” work instead of rejecting it? – Pro: preserve reputation, admit if poor design at fault – Con: promote fraud, undermine approval rating system• Use bonus as incentive – Pay the minimum $0.01 and $0.01 for bonus – Better than rejecting a $0.02 task• If spammer “caught”, block from future tasks – May be easier to always pay, then block as needed July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 100
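The pay-then-block policy can be scripted; a sketch assuming boto 2's MTurk bindings, where the IDs are placeholders and the exact method names should be verified against the boto version in use.

    # Sketch: pay the base amount, add a bonus for good work, and block a caught
    # spammer instead of rejecting individual assignments.
    from boto.mturk.connection import MTurkConnection
    from boto.mturk.price import Price

    mtc = MTurkConnection(host="mechanicalturk.amazonaws.com")

    mtc.approve_assignment("ASSIGNMENT_ID")                      # pay the $0.01 base
    mtc.grant_bonus("WORKER_ID", "ASSIGNMENT_ID", Price(0.01),
                    reason="Thanks for the detailed feedback")   # optional bonus
    mtc.block_worker("SPAMMER_WORKER_ID",
                     reason="Failed repeated gold-answer checks")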
  101. 101. Worker feedback• Real feedback received via email after rejection• Worker XXX I did. If you read these articles most of them have nothing to do with space programs. I’m not an idiot.• Worker XXX As far as I remember there wasnt an explanation about what to do when there is no name in the text. I believe I did write a few comments on that, too. So I think youre being unfair rejecting my HITs.July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 101
  102. 102. Real email exchange with worker after rejectionWORKER: this is not fair , you made me work for 10 cents and i lost my 30 minutesof time ,power and lot more and gave me 2 rejections atleast you may keep itpending. please show some respect to turkersREQUESTER: Im sorry about the rejection. However, in the directions given in thehit, we have the following instructions: IN ORDER TO GET PAID, you must judge all 5webpages below *AND* complete a minimum of three HITs.Unfortunately, because you only completed two hits, we had to reject those hits.We do this because we need a certain amount of data on which to make decisionsabout judgment quality. Im sorry if this caused any distress. Feel free to contact meif you have any additional questions or concerns.WORKER: I understood the problems. At that time my kid was crying and i went tolook after. thats why i responded like that. I was very much worried about a hitbeing rejected. The real fact is that i havent seen that instructions of 5 web pageand started doing as i do the dolores labs hit, then someone called me and i wentto attend that call. sorry for that and thanks for your kind concern. July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 102
  103. 103. Exchange with worker• Worker XXX Thank you. I will post positive feedback for you at Turker Nation.Me: was this a sarcastic comment?• I took a chance by accepting some of your HITs to see if you were a trustworthy author. My experience with you has been favorable so I will put in a good word for you on that website. This will help you get higher quality applicants in the future, which will provide higher quality work, which might be worth more to you, which hopefully means higher HIT amounts in the future.July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 103
  104. 104. Build Your Reputation as a Requester• Word of mouth effect – Workers trust the requester (pay on time, clear explanation if there is a rejection) – Experiments tend to go faster – Announce forthcoming tasks (e.g. tweet)• Disclose your real identity? July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 104
  105. 105. Other practical tips• Sign up as worker and do some HITs• “Eat your own dog food”• Monitor discussion forums• Address feedback (e.g., poor guidelines, payments, passing grade, etc.)• Everything counts! – Overall design only as strong as weakest linkJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 105
106. 106. Content quality• People like to work on things that they like• TREC ad-hoc vs. INEX – TREC experiments took twice as long to complete – INEX (Wikipedia), TREC (LA Times, FBIS)• Topics – INEX: Olympic games, movies, salad recipes, etc. – TREC: cosmic events, Schengen agreement, etc.• Content and judgments according to modern times – Airport security docs are pre-9/11 – Antarctic exploration (global warming) July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 106
  107. 107. Content quality - II• Document length• Randomize content• Avoid worker fatigue – Judging 100 documents on the same subject can be tiring, leading to decreasing qualityJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 107
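One way to act on the two content-quality slides above is to interleave topics before cutting documents into HIT-sized batches, so no worker has to judge 100 documents on the same subject in a row. A minimal sketch, not from the tutorial; the (doc, topic) input format and batch size are assumptions.

```python
import random
from collections import defaultdict
from itertools import zip_longest

def build_batches(doc_topic_pairs, docs_per_hit=5, seed=42):
    """Shuffle within topics, round-robin across topics, then cut into HIT-sized batches."""
    rng = random.Random(seed)
    by_topic = defaultdict(list)
    for doc, topic in doc_topic_pairs:
        by_topic[topic].append(doc)
    for docs in by_topic.values():
        rng.shuffle(docs)
    # Round-robin across topics so each HIT mixes subjects and reduces fatigue.
    interleaved = [d for group in zip_longest(*by_topic.values()) for d in group if d is not None]
    return [interleaved[i:i + docs_per_hit] for i in range(0, len(interleaved), docs_per_hit)]
```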
  108. 108. Presentation• People scan documents for relevance cues• Document design• Highlighting no more than 10%July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 108
  109. 109. Presentation - IIJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 109
  110. 110. Relevance justification• Why settle for a label?• Let workers justify answers• INEX – 22% of assignments with comments• Must be optional• Let’s see how people justifyJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 110
111. 111. “Relevant” answers [Salad Recipes]
– Doesnt mention the word salad, but the recipe is one that could be considered a salad, or a salad topping, or a sandwich spread.
– Egg salad recipe
– Egg salad recipe is discussed.
– History of salad cream is discussed.
– Includes salad recipe
– It has information about salad recipes.
– Potato Salad
– Potato salad recipes are listed.
– Recipe for a salad dressing.
– Salad Recipes are discussed.
– Salad cream is discussed.
– Salad info and recipe
– The article contains a salad recipe.
– The article discusses methods of making potato salad.
– The recipe is for a dressing for a salad, so the information is somewhat narrow for the topic but is still potentially relevant for a researcher.
– This article describes a specific salad. Although it does not list a specific recipe, it does contain information relevant to the search topic.
– gives a recipe for tuna salad
– relevant for tuna salad recipes
– relevant to salad recipes
– this is on-topic for salad recipes
July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 111
112. 112. “Not relevant” answers [Salad Recipes]
– About gaming not salad recipes.
– Article is about Norway.
– Article is about Region Codes.
– Article is about forests.
– Article is about geography.
– Document is about forest and trees.
– Has nothing to do with salad or recipes.
– Not a salad recipe
– Not about recipes
– Not about salad recipes
– There is no recipe, just a comment on how salads fit into meal formats.
– There is nothing mentioned about salads.
– While dressings should be mentioned with salads, this is an article on one specific type of dressing, no recipe for salads.
– article about a swiss tv show
– completely off-topic for salad recipes
– not a salad recipe
– not about salad recipes
– totally off base
July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 112
  113. 113. July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 113
114. 114. Feedback length• Workers will justify answers• Has to be optional for good feedback• In E51, comments were mandatory – Length dropped – Comments shrank to just “Relevant” or “Not relevant” July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 114
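To collect justifications like the ones shown above without hurting feedback quality, the comment box should be clearly optional in the HIT itself. Below is a stripped-down sketch of one way to do this with an HTMLQuestion and boto3's create_hit; the topic, reward, and form fields are illustrative, and a real external-form HIT would also need a little JavaScript to copy assignmentId from the URL query string before submission.

```python
import boto3

# Illustrative HIT form: required relevance judgment, optional justification.
HTML_QUESTION = """
<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">
  <HTMLContent><![CDATA[
    <!DOCTYPE html>
    <html><body>
      <form action="https://www.mturk.com/mturk/externalSubmit" method="post">
        <!-- In a real HIT, fill this hidden field from the URL query string via JavaScript. -->
        <input type="hidden" name="assignmentId" value="">
        <p>Is this page relevant to the topic "salad recipes"?</p>
        <label><input type="radio" name="relevance" value="relevant"> Relevant</label>
        <label><input type="radio" name="relevance" value="not_relevant"> Not relevant</label>
        <p>Why? (optional)</p>
        <textarea name="justification" rows="3" cols="60"></textarea>
        <p><input type="submit" value="Submit"></p>
      </form>
    </body></html>
  ]]></HTMLContent>
  <FrameHeight>450</FrameHeight>
</HTMLQuestion>
"""

mturk = boto3.client("mturk", region_name="us-east-1")
hit = mturk.create_hit(
    Title="Judge the relevance of one web page",
    Description="Read one page and judge whether it is relevant to the given topic.",
    Reward="0.02",
    MaxAssignments=5,
    AssignmentDurationInSeconds=600,
    LifetimeInSeconds=86400,
    Question=HTML_QUESTION,
)
```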
  115. 115. Other design principles• Text alignment• Legibility• Reading level: complexity of words and sentences• Attractiveness (worker’s attention & enjoyment)• Multi-cultural / multi-lingual• Who is the audience (e.g. target worker community) – Special needs communities (e.g. simple color blindness)• Parsimony• Cognitive load: mental rigor needed to perform task• Exposure effectJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 115
  116. 116. Platform alternatives• Why MTurk – Amazon brand, lots of research papers – Speed, price, diversity, payments• Why not – Crowdsourcing != Mturk – Spam, no analytics, must build tools for worker & task quality• How to build your own crowdsourcing platform – Back-end – Template language for creating experiments – Scheduler – Payments?July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 116
  117. 117. The human side• As a worker – I hate when instructions are not clear – I’m not a spammer – I just don’t get what you want – Boring task – A good pay is ideal but not the only condition for engagement• As a requester – Attrition – Balancing act: a task that would produce the right results and is appealing to workers – I want your honest answer for the task – I want qualified workers; system should do some of that for me• Managing crowds and tasks is a daily activity – more difficult than managing computers July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 117
118. 118. Things that work• Qualification tests• Honey-pots• Good content and good presentation• Economy of attention• Things to improve – Manage workers at different levels of expertise, including spammers and potential spammers – Mix different pools of workers based on their profiles and expertise levels July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 118
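Honey-pots only pay off if workers are actually scored against them. A minimal sketch of that bookkeeping, assuming answers arrive as (worker_id, item_id, label) tuples; the accuracy threshold and minimum trap count are illustrative.

```python
from collections import defaultdict

def honeypot_report(answers, gold, threshold=0.5, min_traps=3):
    """
    answers: iterable of (worker_id, item_id, label) from completed assignments.
    gold:    dict item_id -> known correct label for the injected honey-pot items.
    Returns (accuracy_by_worker, likely_spammers).
    """
    seen, correct = defaultdict(int), defaultdict(int)
    for worker, item, label in answers:
        if item in gold:                       # only score the trap questions
            seen[worker] += 1
            correct[worker] += int(label == gold[item])
    accuracy = {w: correct[w] / seen[w] for w in seen}
    # Flag only workers who saw enough traps to be judged fairly.
    spammers = [w for w, acc in accuracy.items() if seen[w] >= min_traps and acc < threshold]
    return accuracy, spammers
```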
  119. 119. Things that need work• UX and guidelines – Help the worker – Cost of interaction• Scheduling and refresh rate• Exposure effect• Sometimes we just don’t agree• How crowdsourcable is your taskJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 119
120. 120. V FROM LABELING TO HUMAN COMPUTATION July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 120
  121. 121. The Turing Test (Alan Turing, 1950)July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 121
  122. 122. July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 122
  123. 123. The Turing Test (Alan Turing, 1950)July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 123
  124. 124. What is a Computer?July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 124
125. 125. • What was old becomes new• "Crowdsourcing: A New Branch of Computer Science" (March 29, 2011)• When Computers Were Human, Princeton University Press, 2005 July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 125
  126. 126. Davis et al. (2010) The HPU. HPUJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 126
  127. 127. Human ComputationRebirth of people as ‘computists’; people do tasks computers cannot (do well)Stage 1: Detecting robots – CAPTCHA: Completely Automated Public Turing test to tell Computers and Humans Apart – No useful work produced; people just answer questions with known answersStage 2: Labeling data (at scale) – E.g. ESP game, typical use of MTurk – Game changer for AI: starving for dataStage 3: General “human computation” (HPU) – people do arbitrarily sophisticated tasks (i.e. compute arbitrary functions) – HPU as core component in system architecture, many “HPC” invocations – blend HPU with automation for a new class of hybrid applications – New tradeoffs possible in latency/cost vs. functionality/accuracyL. von Ahn has pioneered the field. See bibliography for examples of his work. July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 127
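Stage 3 is easiest to picture as a routing decision: the CPU handles what it is confident about and the HPU is invoked for the rest, trading latency and cost for accuracy. A hedged sketch; classifier and ask_crowd are hypothetical callables standing in for your model and your crowdsourcing platform calls.

```python
def hybrid_label(items, classifier, ask_crowd, confidence_threshold=0.9):
    """
    classifier(item) -> (label, confidence); ask_crowd(items) -> {item: label}.
    Confident items are labeled automatically; uncertain ones go to the crowd (HPU).
    """
    labels, uncertain = {}, []
    for item in items:
        label, confidence = classifier(item)
        if confidence >= confidence_threshold:
            labels[item] = label
        else:
            uncertain.append(item)
    labels.update(ask_crowd(uncertain))   # e.g. publish HITs and collect consensus answers
    return labels
```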
  128. 128. Mobile Phone App: “Amazon Remembers”July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 128
129. 129. reCAPTCHA• L. von Ahn et al. (2008). In Science.• Harnesses human work as an invisible by-product. July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 129
  130. 130. CrowdSearch and mCrowd• T. Yan, MobiSys 2010July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 130
  131. 131. Soylent: A Word Processor with a Crowd Inside • Bernstein et al., UIST 2010 July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 131
  132. 132. Translation by monolingual speakers• C. Hu, CHI 2009July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 132
  133. 133. fold.it• S. Cooper et al. (2010).July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 133
  134. 134. VI WORKER INCENTIVESJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 134
  135. 135. Worker Incentives• Pay ($$$)• Fun (or avoid boredom)• Socialize• Earn acclaim/prestige• Altruism• Learn something new (e.g. English)• Unintended by-product (e.g. re-Captcha)• Create self-serving resource (e.g. Wikipedia)Multiple incentives are typically at work in parallel July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 135
  136. 136. Pay ($$$) P. Ipeirotis March 2010July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 136
137. 137. Pay ($$$)• Pro – Ready marketplaces (e.g. MTurk, CrowdFlower, …) – Less need for creativity – Simple motivation knob• Con: Quality – Quality control required – Can diminish intrinsic rewards that promote quality: fun/altruistic value of task, taking pride in doing quality work, self-assessment – Can attract workers only interested in the pay; fraud – $$$ (though other schemes cost indirectly)• How much to pay? – Mason & Watts 2009: more $ = more work, not better work – Wang et al. 2011: predict from market? – More later…• Zittrain 2010: if Encarta had paid for contributions, would we have Wikipedia? July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 137
  138. 138. Fun (or avoid boredom)• Games with a Purpose (von Ahn) – Data is by-product – IR: Law et al. SearchWar. HCOMP 2009.• distinct from Serious Gaming / Edutainment – Player learning / training / education is by-productJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 138
  139. 139. • Learning to map from web pages to queries • Human computation game to elicit data • Home grown system (no AMT) • Try it! pagehunt.msrlivelabs.comSee also:• H. Ma. et al. “Improving Search Engines Using Human Computation Games”, CIKM 2009.• Law et al. SearchWar. HCOMP 2009.• Bennett et al. Picture This. HCOMP 2009. July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 139
  140. 140. Fun (or avoid boredom)• Pro: – Enjoyable “work” people want to do (or at least better than anything else they have to do) – Scalability potential from involving non-workers• Con: – Need for design creativity • some would say this is a plus • better performance in game should produce better/more work – Some tasks more amenable than others • Annotating syntactic parse trees for fun? • Inferring syntax implicitly from a different activity? July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 140
  141. 141. Socialization & PrestigeJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 141
  142. 142. Socialization & Prestige• Pro: – “free” – enjoyable for connecting with one another – can share infrastructure across tasks• Con: – need infrastructure beyond simple micro-task – need critical mass (for uptake and reward) – social engineering knob more complex than $$July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 142
  143. 143. Altruism• Contribute knowledge• Help others (who need knowledge)• Help workers (e.g. SamaSource)• Charity (e.g. http://www.freerice.com) July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 143
  144. 144. Altruism• Pro – “free” – can motivate quality work for a cause• Con – Seemingly small workforce for pure altruismWhat if Mechanical Turk let you donate $$ per HIT?July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 144
  145. 145. Unintended by-product• Pro – effortless (unnoticed) work – Scalability from involving non-workers• Con – Design challenge • Given existing activity, find useful work to harness from it • Given target work, find or create another activity for which target work is by-product? – Maybe too invisible (disclosure, manipulation)July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 145
146. 146. Multiple Incentives• Ideally maximize all• Wikipedia, cQA, GWAP – fun, socialization, prestige, altruism• Fun vs. Pay – GWAP gives Amazon certificates – Workers may be paid in game currency – Pay tasks can also be fun themselves• Pay-based – Other rewards: e.g. learn something, socialization – Altruism: worker (e.g. SamaSource) or task itself – Social network integration could help everyone (currently separate and lacking structure) July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 146
  147. 147. VII THE ROAD AHEADJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 147
  148. 148. Wisdom of Crowds (WoC)Requires• Diversity• Independence• Decentralization• AggregationInput: large, diverse sample (to increase likelihood of overall pool quality)Output: consensus or selection (aggregation)July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 148
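The aggregation step can be as simple as a majority vote over independent judgments, though references later in this tutorial discuss weighted and model-based variants. A minimal sketch:

```python
from collections import Counter

def majority_vote(labels_per_item):
    """labels_per_item: dict item_id -> list of labels from independent workers."""
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in labels_per_item.items()}

# Example: three independent judgments per document.
votes = {"doc1": ["relevant", "relevant", "not_relevant"],
         "doc2": ["not_relevant", "not_relevant", "relevant"]}
print(majority_vote(votes))   # {'doc1': 'relevant', 'doc2': 'not_relevant'}
```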
  149. 149. WoC vs. Ensemble Learning• Combine multiple models to improve performance over any constituent model – Can use many weak learners to make a strong one – Compensate for poor models with extra computation• Works better with diverse, independent learners• cf. NIPS’10 Workshop – Computational Social Science & the Wisdom of Crowds• More investigation needed of traditional feature- based machine learning & ensemble methods for consensus labeling with crowdsourcing July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 149
  150. 150. Unreasonable Effectiveness of Data• Massive free Web data changed how we train learning systems – Banko and Brill (2001). Human Language Tech. – Halevy et al. (2009). IEEE Intelligent Systems. • How might access to cheap & plentiful labeled data change the balance again? July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 150
  151. 151. MapReduce with human computation• Commonalities – Large task divided into smaller sub-problems – Work distributed among worker nodes (workers) – Collect all answers and combine them – Varying performance of heterogeneous CPUs/HPUs• Variations – Human response latency / size of “cluster” – Some tasks are not suitableJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 151
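The analogy can be made concrete: the map phase publishes HIT-sized sub-tasks to workers, and the reduce phase collects and combines their answers. A sketch under the assumption that publish_hit, collect_answers, and combine are placeholders for whatever platform calls and aggregation logic you actually use.

```python
def human_map_reduce(task_items, publish_hit, collect_answers, combine, chunk_size=5):
    """
    publish_hit(chunk) -> hit_id; collect_answers(hit_id) -> list of partial results;
    combine(partials) -> final result.
    """
    chunks = [task_items[i:i + chunk_size] for i in range(0, len(task_items), chunk_size)]
    hit_ids = [publish_hit(chunk) for chunk in chunks]                # map phase (HPU workers)
    partials = [ans for hit_id in hit_ids for ans in collect_answers(hit_id)]
    return combine(partials)                                          # reduce phase
```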
  152. 152. CrowdForge: MapReduce for Automation + Human Computation Kittur et al., CHI 2011July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 152
  153. 153. Research problems – operational• Methodology – Budget, people, document, queries, presentation, incentives, etc. – Scheduling – Quality• What’s the best “mix” of HC for a task?• What are the tasks suitable for HC?• Can I crowdsource my task? – Eickhoff and de Vries, WSDM 2011 CSDM WorkshopJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 153
  154. 154. More problems• Human factors vs. outcomes• Editors vs. workers• Pricing tasks• Predicting worker quality from observable properties (e.g. task completion time)• HIT / Requestor ranking or recommendation• Expert search : who are the right workers given task nature and constraints• Ensemble methods for Crowd Wisdom consensusJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 154
  155. 155. Problems: crowds, clouds and algorithms• Infrastructure – Current platforms are very rudimentary – No tools for data analysis• Dealing with uncertainty (propagate rather than mask) – Temporal and labeling uncertainty – Learning algorithms – Search evaluation – Active learning (which example is likely to be labeled correctly)• Combining CPU + HPU – Human Remote Call? – Procedural vs. declarative? – Integration points with enterprise systems July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 155
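One way to propagate rather than mask uncertainty is to keep the full label distribution per item instead of collapsing it to a single consensus label; evaluation or learning code downstream can then weight examples by how much the crowd agreed. A minimal sketch, not tied to any particular platform:

```python
from collections import Counter

def soft_labels(labels_per_item):
    """Return item_id -> {label: fraction of workers who chose it}."""
    out = {}
    for item, labels in labels_per_item.items():
        counts = Counter(labels)
        total = sum(counts.values())
        out[item] = {label: n / total for label, n in counts.items()}
    return out

# A doc judged relevant by 2 of 3 workers keeps both signals:
# roughly {'relevant': 0.67, 'not_relevant': 0.33}, rather than just 'relevant'.
```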
  156. 156. Conclusions• Crowdsourcing for relevance evaluation works• Fast turnaround, easy to experiment, cheap• Still have to design the experiments carefully!• Usability considerations• Worker quality• User feedback extremely usefulJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 156
  157. 157. Conclusions - II• Crowdsourcing is here to stay• Lots of opportunities to improve current platforms• Integration with current systems• MTurk is a popular platform and others are emerging• Open research problemsJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 157
  158. 158. VIII RESOURCES AND REFERENCESJuly 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 158
  159. 159. Little written for the general public• July 2010, kindle-only• “This book introduces you to the top crowdsourcing sites and outlines step by step with photos the exact process to get started as a requester on Amazon Mechanical Turk.“July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 159
  160. 160. Crowdsourcing @ SIGIR’11 Workshop on Crowdsourcing for Information Retrieval Roi Blanco, Harry Halpin, Daniel Herzig, Peter Mika, Jeffrey Pound, Henry Thompson, Thanh D. Tran. “Repeatable and Reliable Search System Evaluation using Crowd-Sourcing”. Yen-Ta Huang, An-Jung Cheng, Liang-Chi Hsieh, Winston H. Hsu, Kuo-Wei Chang. “Region-Based Landmark Discovery by Crowdsourcing Geo-Referenced Photos.” Poster. Gabriella Kazai, Jaap Kamps, Marijn Koolen, Natasa Milic-Frayling. “Crowdsourcing for Book Search Evaluation: Impact of Quality on Comparative System Ranking.” Abhimanu Kumar, Matthew Lease . “Learning to Rank From a Noisy Crowd”. Poster. Edith Law, Paul N. Bennett, and Eric Horvitz. “The Effects of Choice in Routing Relevance Judgments”. Poster. July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 160
  161. 161. 2011 Workshops & ConferencesSIGIR-CIR: Workshop on Crowdsourcing for Information Retrieval (July 28)• WSDM-CSDM: Crowdsourcing for Search and Data Mining (Feb. 9)• CHI-CHC: Crowdsourcing and Human Computation (May 8)• Crowdsourcing: Improving … Scientific Data Through Social Networking (June 13)• Crowdsourcing Technologies for Language and Cognition Studies (July 27)• 2011 AAAI-HCOMP: 3rd Human Computation Workshop (Aug. 8)• UbiComp: 2nd Workshop on Ubiquitous Crowdsourcing (Sep. 18)• CIKM: BooksOnline (Oct. 24, “crowdsourcing … online books”)• CrowdConf 2011 -- 2nd Conf. on the Future of Distributed Work (Nov. 1-2)• TREC-Crowd: Crowdsourcing Track at TREC (Nov. 16-18)• ACIS: Crowdsourcing, Value Co-Creation, & Digital Economy Innovation (Nov. 30 – Dec. 2) July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 161
  162. 162. 2011 Tutorials• SIGIR (yep, this is it!)• WSDM: Crowdsourcing 101: Putting the WSDM of Crowds to Work for You – Omar Alonso and Matthew Lease (Feb. 9)• WWW: Managing Crowdsourced Human Computation – Panos Ipeirotis and Praveen Paritosh (March 29)• HCIC: Quality Crowdsourcing for Human Computer Interaction Research – Ed Chi (June 14-18) – Also see Chi’s Crowdsourcing for HCI Research with Amazon Mechanical Turk• AAAI: Human Computation: Core Research Questions and State of the Art – Edith Law and Luis von Ahn (Aug. 7)• VLDB: Crowdsourcing Applications and Platforms – AnHai Doan, Michael Franklin, Donald Kossmann, and Tim Kraska (Aug. 29)• CrowdConf: Crowdsourcing for Fun and Profit – Omar Alonso and Matthew Lease (Nov. 1) July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 162
163. 163. Other Events & Resources• ir.ischool.utexas.edu/crowd2011• Book: Omar Alonso, Gabriella Kazai, and Stefano Mizzaro. Crowdsourcing for Search Engine Evaluation: Why and How.• Forthcoming special issue of Springer’s Information Retrieval journal on Crowdsourcing July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 163
  164. 164. Thank You!For questions about tutorial or crowdsourcing, email: omar.alonso@microsoft.com ml@ischool.utexas.eduCartoons by Mateo Burtch (buta@sonic.net)July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 164
  165. 165. Crowdsourcing in IR: 2008-2010 2008  O. Alonso, D. Rose, and B. Stewart. “Crowdsourcing for relevance evaluation”, SIGIR Forum, Vol. 42, No. 2. 2009  O. Alonso and S. Mizzaro. “Can we get rid of TREC Assessors? Using Mechanical Turk for … Assessment”. SIGIR Workshop on the Future of IR Evaluation.  P.N. Bennett, D.M. Chickering, A. Mityagin. Learning Consensus Opinion: Mining Data from a Labeling Game. WWW.  G. Kazai, N. Milic-Frayling, and J. Costello. “Towards Methods for the Collective Gathering and Quality Control of Relevance Assessments”, SIGIR.  G. Kazai and N. Milic-Frayling. “… Quality of Relevance Assessments Collected through Crowdsourcing”. SIGIR Workshop on the Future of IR Evaluation.  Law et al. “SearchWar”. HCOMP.  H. Ma, R. Chandrasekar, C. Quirk, and A. Gupta. “Improving Search Engines Using Human Computation Games”, CIKM 2009. 2010  SIGIR Workshop on Crowdsourcing for Search Evaluation.  O. Alonso, R. Schenkel, and M. Theobald. “Crowdsourcing Assessments for XML Ranked Retrieval”, ECIR.  K. Berberich, S. Bedathur, O. Alonso, G. Weikum “A Language Modeling Approach for Temporal Information Needs”, ECIR.  C. Grady and M. Lease. “Crowdsourcing Document Relevance Assessment with Mechanical Turk”. NAACL HLT Workshop on … Amazons Mechanical Turk.  Grace Hui Yang, Anton Mityagin, Krysta M. Svore, and Sergey Markov . “Collecting High Quality Overlapping Labels at Low Cost”. SIGIR.  G. Kazai. “An Exploration of the Influence that Task Parameters Have on the Performance of Crowds”. CrowdConf.  G. Kazai. “… Crowdsourcing in Building an Evaluation Platform for Searching Collections of Digitized Books”., Workshop on Very Large Digital Libraries (VLDL)  Stephanie Nowak and Stefan Ruger. How Reliable are Annotations via Crowdsourcing? MIR.  Jean-François Paiement, Dr. James G. Shanahan, and Remi Zajac. “Crowdsourcing Local Search Relevance”. CrowdConf.  Maria Stone and Omar Alonso. “A Comparison of On-Demand Workforce with Trained Judges for Web Search Relevance Evaluation”. CrowdConf.  T. Yan, V. Kumar, and D. Ganesan. CrowdSearch: exploiting crowds for accurate real-time image search on mobile phones. MobiSys pp. 77--90, 2010. July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 165
  166. 166. Crowdsourcing in IR: 2011 WSDM Workshop on Crowdsourcing for Search and Data Mining. SIGIR Workshop on Crowdsourcing for Information Retrieval. SIGIR papers/posters mentioned earlier O. Alonso and R. Baeza-Yates. “Design and Implementation of Relevance Assessments using Crowdsourcing, ECIR. G. Kasneci, J. Van Gael, D. Stern, and T. Graepel, CoBayes: Bayesian Knowledge Corroboration with Assessors of Unknown Areas of Expertise, WSDM. Hyun Joon Jung, Matthew Lease . “Improving Consensus Accuracy via Z-score and Weighted Voting”. HCOMP. Poster. Gabriella Kazai,. “In Search of Quality in Crowdsourcing for Search Engine Evaluation”, ECIR. July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 166
  167. 167. Bibliography: General IR M. Hearst. “Search User Interfaces”, Cambridge University Press, 2009 K. Jarvelin, and J. Kekalainen. IR evaluation methods for retrieving highly relevant documents. Proceedings of the 23rd annual international ACM SIGIR conference . pp.41—48, 2000. M. Kaisser, M. Hearst, and L. Lowe. “Improving Search Results Quality by Customizing Summary Lengths”, ACL/HLT, 2008. D. Kelly. “Methods for evaluating interactive information retrieval systems with users”. Foundations and Trends in Information Retrieval, 3(1-2), 1-224, 2009. S. Mizzaro. Measuring the agreement among relevance judges, MIRA 1999 J. Tang and M. Sanderson. “Evaluation and User Preference Study on Spatial Diversity”, ECIR 2010 July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 167
  168. 168. Bibliography: Other J. Barr and L. Cabrera. “AI gets a Brain”, ACM Queue, May 2006. Bernstein, M. et al. Soylent: A Word Processor with a Crowd Inside. UIST 2010. Best Student Paper award. Bederson, B.B., Hu, C., & Resnik, P. Translation by Iteractive Collaboration between Monolingual Users, Proceedings of Graphics Interface (GI 2010), 39-46. N. Bradburn, S. Sudman, and B. Wansink. Asking Questions: The Definitive Guide to Questionnaire Design, Jossey-Bass, 2004. C. Callison-Burch. “Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon’s Mechanical Turk”, EMNLP 2009. P. Dai, Mausam, and D. Weld. “Decision-Theoretic of Crowd-Sourced Workflows”, AAAI, 2010. J. Davis et al. “The HPU”, IEEE Computer Vision and Pattern Recognition Workshop on Advancing Computer Vision with Human in the Loop (ACVHL), June 2010. M. Gashler, C. Giraud-Carrier, T. Martinez. Decision Tree Ensemble: Small Heterogeneous Is Better Than Large Homogeneous, ICMLA 2008. D. A. Grief. When Computers Were Human. Princeton University Press, 2005. ISBN 0691091579 JS. Hacker and L. von Ahn. “Matchin: Eliciting User Preferences with an Online Game”, CHI 2009. J. Heer, M. Bobstock. “Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design”, CHI 2010. P. Heymann and H. Garcia-Molina. “Human Processing”, Technical Report, Stanford Info Lab, 2010. J. Howe. “Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business”. Crown Business, New York, 2008. P. Hsueh, P. Melville, V. Sindhwami. “Data Quality from Crowdsourcing: A Study of Annotation Selection Criteria”. NAACL HLT Workshop on Active Learning and NLP, 2009. B. Huberman, D. Romero, and F. Wu. “Crowdsouring, attention and productivity”. Journal of Information Science, 2009. P.G. Ipeirotis. The New Demographics of Mechanical Turk. March 9, 2010. PDF and Spreadsheet. P.G. Ipeirotis, R. Chandrasekar and P. Bennett. Report on the human computation workshop. SIGKDD Explorations v11 no 2 pp. 80-83, 2010. P.G. Ipeirotis. Analyzing the Amazon Mechanical Turk Marketplace. CeDER-10-04 (Sept. 11, 2010) July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 168
  169. 169. Bibliography: Other (2) A. Kittur, E. Chi, and B. Suh. “Crowdsourcing user studies with Mechanical Turk”, SIGCHI 2008. Aniket Kittur, Boris Smus, Robert E. Kraut. CrowdForge: Crowdsourcing Complex Work. CHI 2011 Adriana Kovashka and Matthew Lease. “Human and Machine Detection of … Similarity in Art”. CrowdConf 2010. K. Krippendorff. "Content Analysis", Sage Publications, 2003 G. Little, L. Chilton, M. Goldman, and R. Miller. “TurKit: Tools for Iterative Tasks on Mechanical Turk”, HCOMP 2009. T. Malone, R. Laubacher, and C. Dellarocas. Harnessing Crowds: Mapping the Genome of Collective Intelligence. 2009. W. Mason and D. Watts. “Financial Incentives and the ’Performance of Crowds’”, HCOMP Workshop at KDD 2009. J. Nielsen. “Usability Engineering”, Morgan-Kaufman, 1994. A. Quinn and B. Bederson. “A Taxonomy of Distributed Human Computation”, Technical Report HCIL-2009-23, 2009 J. Ross, L. Irani, M. Six Silberman, A. Zaldivar, and B. Tomlinson. “Who are the Crowdworkers?: Shifting Demographics in Amazon Mechanical Turk”. CHI 2010. F. Scheuren. “What is a Survey” (http://www.whatisasurvey.info) 2004. R. Snow, B. O’Connor, D. Jurafsky, and A. Y. Ng. “Cheap and Fast But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks”. EMNLP-2008. V. Sheng, F. Provost, P. Ipeirotis. “Get Another Label? Improving Data Quality … Using Multiple, Noisy Labelers” KDD 2008. S. Weber. “The Success of Open Source”, Harvard University Press, 2004. L. von Ahn. Games with a purpose. Computer, 39 (6), 92–94, 2006. L. von Ahn and L. Dabbish. “Designing Games with a purpose”. CACM, Vol. 51, No. 8, 2008.July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 169
  170. 170. Bibliography: Other (3) C. Marshall and F. Shipman “The Ownership and Reuse of Visual Media”, JCDL, 2011. AnHai Doan, Raghu Ramakrishnan, Alon Y. Halevy: Crowdsourcing systems on the World-Wide Web. CACM, 2011 Paul Heymann, Hector Garcia-Molina: Turkalytics: analytics for human computation. WWW 2011.July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 170
171. 171. Other Resources
Blogs: Behind Enemy Lines (P.G. Ipeirotis, NYU) • Deneme: a Mechanical Turk experiments blog (Greg Little, MIT) • CrowdFlower Blog • http://experimentalturk.wordpress.com • Jeff Howe
Sites: The Crowdsortium • Crowdsourcing.org • CrowdsourceBase (for workers) • Daily Crowdsource
MTurk Forums and Resources: Turker Nation: http://turkers.proboards.com • http://www.turkalert.com (and its blog) • Turkopticon: report/avoid shady requestors • Amazon Forum for MTurk
July 24, 2011 Crowdsourcing for Information Retrieval: Principles, Methods, and Applications 171