Mechanical Turk for Social Science Introduction
Speaker notes
  • Tasks can be sorted by price or number of HITs available, among other things. To increase participation, you generally want to appear higher on at least one of these lists.
  • For this study, we wanted Conservative Republicans and Liberal Democrats, not people with neutral views, Liberal Republicans, or Conservative Democrats.
  • Restricting who can participate.
  • If not automatically scored, the qualification introduces an even bigger delay in the process, and you’ll lose workers. But scoring it yourself allows much more control and lets you retain Turkers’ answer data.
  • Transcript

    • 1. Mechanical Turk for Social Science
      Sean Munson, Eytan Bakshy
      School of Information, UMich 28 October 2009
    • 2. 11:00 am - Problem: Need to classify thousands of blogs by category.
    • 3. Lunch*
      *not actual lunch
    • 4.
    • 5. 1:00 pm
      50 blogs classified, 5x each
    • 6. Mechanical Turk for Social Science Awesome
      Sean Munson, Eytan Bakshy
      An API made of people!
    • 7. Overview
      Who are the Turkers?
      Tasks suitable for Mechanical Turk and workarounds for tasks that are semi-suitable
      Tasks from Turkers’ and requesters’ points of view
      Examples
      Classifying links
      Reacting to collections of links
      Practicalities
      Tools
      Paying Turkers at UMich
      Human Subjects
      Slides will be available online.
    • 8. Who are the Turkers?
    • 9. Andy Baio, Faces of Mechanical Turk
    • 10. Andy Baio, Faces of Mechanical Turk
    • 11. Andy Baio, Faces of Mechanical Turk
    • 12. 300-Turker survey from Panos Ipeirotis
      Limited by self-selection (the survey reaches only people who chose to do this one task, at this pay).
      By country:
      76% US; 8% India; 3% UK; 2% Canada
    • 13.
    • 14.
    • 15.
    • 16. Ideal types of tasks
      Short duration
      Repetitive – Turker learns once, repeats many
      No particular expertise required
      From the requester’s perspective: human input is verifiable with less effort than doing the work yourself or paying an expert, e.g.
      tasks that require people to write something
      assess quality using multiple raters
      but you can use it in other ways.
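The multiple-raters idea above reduces to a simple aggregation rule. A minimal sketch (the function name and the agreement threshold are illustrative, not from the deck):

```python
from collections import Counter

def majority_label(labels, min_agreement=2):
    """Return the most common label if at least `min_agreement`
    raters gave it, else None (the item needs more ratings)."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agreement else None

# Three Turkers label the same blog:
print(majority_label(["politics", "politics", "tech"]))  # politics
```

Items with no clear majority can simply be re-posted for additional ratings rather than judged by the requester.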
    • 17.–20. [Flow diagram, built up across four slides – the Turker’s loop: Task listing (preview & select task) → Complete task → Get Paid → automatically accept another task of this type, or go find a new task.]
    • 21. [Flow diagram – the requester’s steps feed the loop: Create task type → Load task instances (prepay); then the Turker’s Task listing (preview & select task) → Complete task → Get Paid → accept another task, or find a new one. Photo: Flickr, Michelle Gibson.]
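The requester steps above (create task type, load task instances) map to one API call per item. A sketch using the boto3-era MTurk client, which postdates this 2009 deck; the title, reward, and question template are illustrative values, and the live `create_hit` call is left commented out:

```python
def build_hit_params(url, reward="0.05", assignments=5):
    """Keyword arguments for boto3's mturk.create_hit():
    one HIT per item, `assignments` independent raters per item."""
    question_xml = (
        '<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/'
        'AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">'
        "<HTMLContent><![CDATA[<p>Classify this blog: {url}</p>]]></HTMLContent>"
        "<FrameHeight>400</FrameHeight>"
        "</HTMLQuestion>"
    ).format(url=url)
    return {
        "Title": "Classify a blog by category",   # illustrative values
        "Description": "Pick the best category for the linked blog.",
        "Reward": reward,                          # dollars, as a string
        "MaxAssignments": assignments,             # e.g. 5 raters per blog
        "AssignmentDurationInSeconds": 300,
        "LifetimeInSeconds": 86400,
        "Question": question_xml,
    }

params = build_hit_params("http://example.com/blog")
# With AWS credentials configured, the live call would be:
#   import boto3
#   boto3.client("mturk").create_hit(**params)
```

Loading task instances is then just a loop over your items, one `create_hit` each; the account is charged up front (the "prepay" step in the diagram).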
    • 22.
    • 23.
    • 24.
    • 25.
    • 26.
    • 27. [Flow diagram – adds the requester’s review step: Create task type → Load task instances (prepay) → Task listing (preview & select task) → Complete task → Approve or reject tasks → Get Paid → accept another task, or find a new one.]
    • 28. Turkers as Classifiers
    • 29. Large-scale study of diffusion and influence on Twitter
      How does the spread of a URL over the twitter network depend on the content?
      What proportion of “influential” users are mass media vs. individuals?
      Requires thousands of labels of URLs and users. Needs to be fast and cheap.
    • 30.
    • 31.
    • 32.
    • 33. Turkers as Subjects
    • 34. Turkers as Subjects – Challenges
      Hard to check answer quality when you want opinions!
      Screening & treatment randomization
      mTurk not optimized for 1x tasks
    • 35.
    • 36. [Flow diagram, repeated from slide 27: Create task type → Load task instances (prepay) → Task listing (preview & select task) → Complete task → Approve or reject tasks → Get Paid → accept another task, or find a new one.]
    • 37. How to screen?
      Liberal
      Republican
      Democrat
      Conservative
    • 38. [Flow diagram – adds screening. Requester: Create task type (require 95% task approval rating; require US location; ask demographics, political preferences) → Load task instances (prepay) → Approve or reject tasks. Turker: Task listing (preview & select task) → Take Qualification → Complete task → Get Paid → accept another task, or find a new one.]
    • 39. [Flow diagram – the qualification loop. Requester: Create task type → Create or use existing qualification → Load task instances (prepay) → Evaluate Qualification (grant or reject) → Approve or reject tasks. Turker: Task listing (preview & select task) → Take Qualification → Complete task → Get Paid.]
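The two built-in requirements (95% approval, US location) attach to a HIT as QualificationRequirements. A sketch in the shape the boto3-era `create_hit` expects; the two QualificationTypeIds are MTurk’s documented system qualifications for approval rate and worker locale (treat them as an assumption to verify against current AWS docs):

```python
def screening_requirements(min_approval=95, country="US"):
    """QualificationRequirements for create_hit(): workers must have
    at least a min_approval% approval rating and be in `country`."""
    return [
        {   # system qualification: PercentAssignmentsApproved
            "QualificationTypeId": "000000000000000000L0",
            "Comparator": "GreaterThanOrEqualTo",
            "IntegerValues": [min_approval],
        },
        {   # system qualification: worker locale
            "QualificationTypeId": "00000000000000000071",
            "Comparator": "EqualTo",
            "LocaleValues": [{"Country": country}],
        },
    ]

reqs = screening_requirements()
```

The custom demographics qualification is created separately (command-line tools or API) and added to this list by its own QualificationTypeId.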
    • 40. Checking for validity
      Couldn’t ask verifiable information (Kittur and Chi) about the collection without affecting how the subjects look at the list
      Did have demographic info from the qualification. Randomly selected a question to repeat, then removed people for gender changes, aging backwards, or major changes in political preference
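The repeat-a-question check can be automated. A minimal sketch with hypothetical field names; the 7-point political scale and the 2-point change threshold are assumptions here, not the study’s actual cutoffs:

```python
def keep_respondent(qual, repeat):
    """Keep a respondent only if the randomly repeated question agrees
    with their qualification answers: no gender change, no aging
    backwards, no large jump in political preference (assumed 1-7 scale)."""
    if "gender" in repeat and repeat["gender"] != qual["gender"]:
        return False
    if "age" in repeat and repeat["age"] < qual["age"]:
        return False
    if "politics" in repeat and abs(repeat["politics"] - qual["politics"]) > 2:
        return False
    return True

qual = {"gender": "F", "age": 34, "politics": 2}
print(keep_respondent(qual, {"age": 33}))  # False: aged backwards
```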
    • 41. Total cost: $382 for 485 collection ratings
      Had to pay more (~$12/hr) because only one task was available at a time, plus the required (unpaid) qualification.
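The numbers above imply the per-rating price, and back out the hourly rate; the ~4-minute task length is inferred here to reconcile the two figures, not stated in the deck:

```python
total_cost, ratings = 382.00, 485
per_rating = total_cost / ratings          # price per collection rating
minutes_per_rating = 4                     # assumed task length
hourly = per_rating * (60 / minutes_per_rating)
print(round(per_rating, 2), round(hourly, 2))  # ≈ 0.79, ≈ 11.81 → ~$12/hr
```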
    • 42. Practicalities
    • 43. Tools
      Web interface: WYSIWYG editor, CSV upload of tasks. Many task templates to use as starting points. Very simple and fast to use, but limited in capability.
      Command line tools: Required to create custom qualifications or use multiple quals. Much more flexibility. Input format is XML. Documentation is adequate, overall experience is clunky.
      Other libraries (e.g. http://developer.amazonwebservices.com/connect/entry.jspa?externalID=827&categoryID=85)
      3rd party tools: Almost as easy to use as Amazon’s web interface & support nearly all features of command line tools. But they take a cut.
      CrowdFlower – from Dolores Labs: crowdflower.com
      Smartsheet: smartsheet.com/product/smartsourcing
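The web interface’s CSV upload pairs a task template containing `${variable}` placeholders with one CSV row per task instance. A sketch of generating that file; the `url` column name is whatever placeholder your own template uses:

```python
import csv
import io

def tasks_to_csv(urls):
    """One row per HIT instance; the header row must match the
    ${url} placeholder used in the web-interface template."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["url"])
    for u in urls:
        writer.writerow([u])
    return buf.getvalue()

csv_text = tasks_to_csv(["http://example.com/a", "http://example.com/b"])
```

Generating the file from your own database keeps the fast web workflow while avoiding hand-editing thousands of rows.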
    • 44. Human subjects?
      Human subjects status varies with design
      Categorizing content: Not human subjects
      Asking for reactions to content: Human subjects.
      Informed Consent
      My preference has been to argue for waiver of informed consent. (Mechanical Turk terms of service prohibit collection of identifiable information.)
      If you have a task where you feel informed consent is appropriate, you can deliver extended consent information through a qualification; this works best for repetitive tasks.
    • 45. Subject payment
      mTurk handles all payment, but
      Associate your account with the University of Michigan employer ID number, in case any one person earns more than the IRS reporting limit from all Michigan mTurk studies. Stacy Callahan or I have more information.
    • 46.–48. [Full flow diagram with roles, built up across three slides.
      Turker: Task listing (preview & select task) → Take Qualification → Complete task → Get Paid → automatically accept another task of this type, or go find a new task.
      Requester: Create task type → Create or use existing qualification → Load task instances (prepay) → Evaluate Qualification (grant or reject) → Approve or reject tasks.
      Scoring a qualification:
      Automatically score: instant grant/reject; requires right & wrong answers; must be hosted by Amazon.
      Download & score: good for participant screening, fast turnaround (run every minute), random assignment.
      Can set limits on retaking. Too many rejects? Revoke the qualification.
      Built-in quals for location and reputation.
      Can assign people to dummy qualifications to allow them to take follow-up studies, and you can email them through mTurk. Can also exclude this way to maintain a virgin sample.]
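The download-and-score path reduces to fetching submitted qualification answers and deciding grant or reject yourself. A minimal sketch of screening for the groups the study wanted (per the speaker notes: conservative Republicans and liberal Democrats); the answer field names are hypothetical:

```python
# Target cells from the study design: only matched party/ideology pairs.
TARGET_GROUPS = {("Republican", "Conservative"), ("Democrat", "Liberal")}

def screen(answers):
    """'Download & score' for participant screening: grant the
    qualification only to respondents in a target group; there is
    no right answer, so this cannot be auto-scored by Amazon."""
    key = (answers.get("party"), answers.get("ideology"))
    return "grant" if key in TARGET_GROUPS else "reject"

print(screen({"party": "Democrat", "ideology": "Liberal"}))    # grant
print(screen({"party": "Republican", "ideology": "Liberal"}))  # reject
```

Running this on a fast cycle (e.g. every minute, as the slide suggests) keeps the delay for waiting workers small while retaining every respondent’s answer data.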
    • 49. Some references & resources
      General
      Dolores Labs blog: http://blog.doloreslabs.com/
      Turker Nation forums: http://turkers.proboards.com
      5 study how-tos from Markus Jakobsson (PARC): http://blogs.parc.com/blog/2009/07/experimenting-on-mechanical-turk-5-how-tos/
      Turker Demographics
      Survey by Panos Ipeirotis: http://behind-the-enemy-lines.blogspot.com/2008/03/mechanical-turk-demographics.html
      Turker demographics vs. Internet demographics: http://behind-the-enemy-lines.blogspot.com/2009/03/turker-demographics-vs-internet.html
      Why do people participate: http://behind-the-enemy-lines.blogspot.com/2008/03/why-people-participate-on-mechanical.html
      Why do people participate (more): http://www.floozyspeak.com/blog/archives/2008/08/valley_of_the_t.html
    • 50. Some references & resources
      Improving answer quality
      Aniket Kittur, Ed H. Chi, and Bongwon Suh (2008). “Crowdsourcing user studies with Mechanical Turk,” CHI 2008.
      Answer quality and dealing with bad answers
      Bob Carpenter (2008). Hierarchical Bayesian Models of Categorical Data.
      Raykar et al. (2009). Supervised Learning from Multiple Experts: Whom to Trust when Everyone Lies a Bit, ICML.
      Worker quality & HIT difficulty: http://behind-the-enemy-lines.blogspot.com/2008/08/mechanical-turk-worker-quality-and-hit.html
      Also see the literature on scoring a test without an answer key.
    • 51. Some references & resources
      Turker effort, skills, participation rate, and pay
      W. Mason and D. Watts (2009). Financial Incentives and the Performance of Crowds. KDD Workshop on Human Computation.
      Self-report on skills: http://behind-the-enemy-lines.blogspot.com/2009/01/how-good-are-you-turker.html
      Human Subjects
      Consent in qualification tests: http://behind-the-enemy-lines.blogspot.com/2009/08/get-consent-form-for-irb-on-mturk-using.html
      Discussion: http://behind-the-enemy-lines.blogspot.com/2009/01/mechanical-turk-human-subjects-and-irbs.html