Mechanical Turk for Social Science Introduction
  • Tasks can be sorted by price or number of HITs available, among other things. To increase participation, you generally want to appear higher on at least one of these lists.
  • For this study, we wanted Conservative Republicans and Liberal Democrats, not people with neutral views, Liberal Republicans, or Conservative Democrats.
  • Restricting who can participate.
  • If not automatically scored, the qualification introduces an even bigger delay in the process, and you’ll lose workers. But scoring it yourself allows a lot more control, and lets you retain turker answer data.

Transcript

  • 1. Mechanical Turk for Social Science
    Sean Munson, Eytan Bakshy
    School of Information, UMich, 28 October 2009
  • 2. 11:00 am - Problem: Need to classify thousands of blogs according to category.
  • 3. Lunch*
    *not actual lunch
  • 4.
  • 5. 1:00 pm
    50 blogs classified 5x each
  • 6. Mechanical Turk for Social Science Awesome
    Sean Munson, Eytan Bakshy
    An API made of people!
  • 7. Overview
    Who are the Turkers?
    Tasks suitable for Mechanical Turk and workarounds for tasks that are semi-suitable
    Tasks from Turkers’ and requesters’ points of view
    Examples
    Classifying links
    Reacting to collections of links
    Practicalities
    Tools
    Paying Turkers at UMich
    Human Subjects
    Slides will be available online.
  • 8. Who are the Turkers?
  • 9. Andy Baio, Faces of Mechanical Turk
  • 10. Andy Baio, Faces of Mechanical Turk
  • 11. Andy Baio, Faces of Mechanical Turk
  • 12. 300 Turker Survey from Panos Ipeirotis
    Limited by self-selection: respondents are people willing to do a task with only one HIT available, and at that pay.
    By country:
    76% US; 8% India; 3% UK; 2% Canada
  • 13.
  • 14.
  • 15.
  • 16. Ideal types of tasks
    Short duration
    Repetitive – Turker learns once, repeats many
    No particular expertise required
    From requester perspective: Human input is verifiable with less effort than it would take to do it yourself or to pay an expert, e.g.
    tasks that require people to write something
    assess quality using multiple raters
    but you can use it in other ways.
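    One common way to assess quality using multiple raters is a simple majority vote per item. The sketch below is illustrative only: the blog names, category labels, and the `majority_label` helper are hypothetical, and it assumes each item was rated several times (for example 5x, as in the blog classification example).

```python
from collections import Counter


def majority_label(ratings):
    """Return (label, agreement) for one item.

    label is None when the top two labels are tied.
    agreement is the share of raters who chose the most common label.
    """
    counts = Counter(ratings).most_common()
    top_label, top_count = counts[0]
    agreement = top_count / len(ratings)
    # A tie between the two most common labels means no majority.
    if len(counts) > 1 and counts[1][1] == top_count:
        return None, agreement
    return top_label, agreement


# Hypothetical data: each blog was classified by five different Turkers.
ratings_by_blog = {
    "blog-a": ["politics", "politics", "tech", "politics", "politics"],
    "blog-b": ["tech", "sports", "tech", "sports", "other"],
}

results = {blog: majority_label(r) for blog, r in ratings_by_blog.items()}
for blog, (label, agreement) in results.items():
    print(blog, label, agreement)
```

    Items with no majority or low agreement can be sent out for more ratings or checked by hand, which is usually still less effort than labeling everything yourself.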
  • 17. Automatically accept another task of this type, or go find a new task
    Task listing – Preview & select task
    Get Paid
    Complete task
  • 21. Automatically accept another task of this type, or go find a new task
    Task listing – Preview & select task
    Get Paid
    Complete task
    Create task type
    Load Task instances
    (prepay)
    Photo: Flickr, Michelle Gibson
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27. Automatically accept another task of this type, or go find a new task
    Task listing – Preview & select task
    Get Paid
    Complete task
    Create task type
    Load Task instances
    (prepay)
    Approve or reject tasks
  • 28. Turkers as Classifiers
  • 29. Large-scale study of diffusion and influence on Twitter
    How does the spread of a URL over the Twitter network depend on the content?
    What proportion of “influential” users are mass media vs. individuals?
    Requires thousands of labels of URLs and users. Needs to be fast and cheap.
  • 30.
  • 31.
  • 32.
  • 33. Turkers as Subjects
  • 34. Turkers as Subjects – Challenges
    Hard to check answer quality when you want opinions!
    Screening & treatment randomization
    mTurk not optimized for 1x tasks
  • 35.
  • 36. Automatically accept another task of this type, or go find a new task
    Task listing – Preview & select task
    Get Paid
    Complete task
    Create task type
    Load Task instances
    (prepay)
    Approve or reject tasks
  • 37. How to screen?
    Liberal
    Republican
    Democrat
    Conservative
  • 38. Automatically accept another task of this type, or go find a new task
    Task listing – Preview & select task
    Take Qualification
    Get Paid
    Complete task
    Create task type
    Load Task instances
    (prepay)
    Require 95% task approval rating
    Require US location
    Ask demographics, political preferences
    Approve or reject tasks
  • 39. Automatically accept another task of this type, or go find a new task
    Task listing – Preview & select task
    Take Qualification
    Get Paid
    Complete task
    Create task type
    Load Task instances
    (prepay)
    Approve or reject tasks
    Evaluate Qualification: Grant or reject
    Create or use existing qualification
  • 40. Checking for validity
    Couldn’t ask verifiable questions (Kittur and Chi) about the collection without affecting how the subjects look at the list.
    Did have demographic info from the qualification. Randomly selected a question to repeat, then removed people for gender changes, aging backwards, or major changes in political preferences.
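    A minimal sketch of this kind of consistency check, under stated assumptions: the field names, the numeric ideology scale, and the `max_shift` threshold are hypothetical and not taken from the study. Only the repeated question is present in the second set of answers.

```python
def is_consistent(qualification, repeat, max_shift=1):
    """Compare a repeated question against the qualification answers.

    qualification: dict with "gender", "age", "ideology" (assumed numeric scale).
    repeat: dict holding only the question that was randomly repeated.
    max_shift: largest ideology change tolerated (hypothetical threshold).
    """
    if "gender" in repeat and repeat["gender"] != qualification["gender"]:
        return False  # gender changed between qualification and task
    if "age" in repeat and repeat["age"] < qualification["age"]:
        return False  # respondent aged backwards
    if "ideology" in repeat:
        if abs(repeat["ideology"] - qualification["ideology"]) > max_shift:
            return False  # major change in political preferences
    return True


qual = {"gender": "f", "age": 34, "ideology": 2}
print(is_consistent(qual, {"age": 34}))       # same age: keep
print(is_consistent(qual, {"age": 29}))       # younger than before: remove
print(is_consistent(qual, {"ideology": 6}))   # large shift: remove
```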
  • 41. Total cost: $382 for 485 collection ratings
    Had to pay more (~$12/hr) because only one task available at a time, plus required (unpaid) qualification.
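    The figures above can be turned into a per-rating cost. The sketch below also derives an implied time per rating, but only under the assumption (not stated in the slides) that the whole $382 went to worker pay at roughly $12/hr with no fees or bonuses included.

```python
total_cost = 382.0      # dollars, from the slide
ratings = 485           # collection ratings, from the slide
hourly_rate = 12.0      # approximate dollars per hour, from the slide

cost_per_rating = total_cost / ratings

# Assumption: all cost is worker pay at the hourly rate (ignores fees).
implied_minutes = cost_per_rating / hourly_rate * 60

print(f"cost per rating: ${cost_per_rating:.2f}")
print(f"implied minutes per rating: {implied_minutes:.1f}")
```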
  • 42. Practicalities
  • 43. Tools
    Web interface: WYSIWYG editor, CSV upload of tasks. Many task templates to use as starting points. Very simple and fast to use, but limited in capability.
    Command line tools: Required to create custom qualifications or use multiple quals. Much more flexibility. Input format is XML. Documentation is adequate, overall experience is clunky.
    Other libraries (e.g. http://developer.amazonwebservices.com/connect/entry.jspa?externalID=827&categoryID=85)
    3rd party tools: Almost as easy to use as Amazon’s web interface & support nearly all features of command line tools. But they take a cut.
    CrowdFlower – from Dolores Labs: crowdflower.com
    Smartsheet: smartsheet.com/product/smartsourcing
  • 44. Human subjects?
    Human subjects status varies with design
    Categorizing content: Not human subjects
    Asking for reactions to content: Human subjects.
    Informed Consent
    My preference has been to argue for waiver of informed consent. (Mechanical Turk terms of service prohibit collection of identifiable information.)
    You can use qualifications if you have a task where you feel informed consent is appropriate, have extended consent information and have repetitive tasks.
  • 45. Subject payment
    mTurk handles all payment, but
    Associate your account with the University of Michigan employer ID number, in case any one person earns more than the IRS reporting limit from all Michigan mTurk studies. Stacy Callahan or I have more information.
  • 46. Automatically accept another task of this type, or go find a new task
    Task listing – Preview & select task
    Take Qualification
    Get Paid
    Complete task
    Turker
    Create task type
    Load Task instances
    (prepay)
    Approve or reject tasks
    Evaluate Qualification: Grant or reject
    Scoring
    • Automatically score: instant grant / reject, requires right & wrong answers
    • Download & score: Good for participant screening, fast turnaround (run every minute), random assignment
    Can set limits on retaking
    Too many rejects? Revoke qualification.
    Create or use existing qualification
    • Must be hosted by Amazon
    • Built in quals for location, reputation
    Requester
    Can assign people to dummy qualifications to allow them to take follow-up studies, and you can email them through mTurk. You can also exclude people this way to keep a sample of participants who have not seen your earlier studies.
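    The "download & score" route could look roughly like the sketch below: score each downloaded qualification response yourself, grant only the groups you want, and randomly assign a condition at the same time. Everything here is hypothetical (field names, condition names, the `score_qualification` helper), and it does not call the Mechanical Turk API; granting or rejecting would happen separately through whatever tool you use. The eligible groups follow the screening example (Conservative Republicans and Liberal Democrats).

```python
import random

# Groups wanted for the study; everyone else is screened out.
ELIGIBLE = {("conservative", "republican"), ("liberal", "democrat")}
CONDITIONS = ["condition_a", "condition_b"]  # hypothetical treatment names


def score_qualification(answers, rng):
    """Return (granted, condition) for one downloaded response.

    answers: dict with "ideology" and "party" strings (assumed field names).
    rng: a random.Random instance, so assignment is reproducible.
    """
    key = (answers["ideology"].lower(), answers["party"].lower())
    if key not in ELIGIBLE:
        return False, None
    return True, rng.choice(CONDITIONS)


rng = random.Random(2009)  # fixed seed for a reproducible assignment log
responses = [
    {"worker": "W1", "ideology": "Conservative", "party": "Republican"},
    {"worker": "W2", "ideology": "Liberal", "party": "Republican"},
    {"worker": "W3", "ideology": "Liberal", "party": "Democrat"},
]
decisions = {r["worker"]: score_qualification(r, rng) for r in responses}
for worker, (granted, condition) in decisions.items():
    print(worker, "grant" if granted else "reject", condition)
```

    Keeping the responses and the assignment in your own records is what lets you retain Turker answer data, at the cost of a delay between the qualification request and the grant.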
  • 49. Some references & resources
    General
    Dolores Labs blog: http://blog.doloreslabs.com/
    Turker Nation forums: http://turkers.proboards.com
    5 study how-tos from Markus Jakobsson (PARC): http://blogs.parc.com/blog/2009/07/experimenting-on-mechanical-turk-5-how-tos/
    Turker Demographics
    Survey by Panos Ipeirotis: http://behind-the-enemy-lines.blogspot.com/2008/03/mechanical-turk-demographics.html
    Turker demographics vs. Internet demographics: http://behind-the-enemy-lines.blogspot.com/2009/03/turker-demographics-vs-internet.html
    Why do people participate: http://behind-the-enemy-lines.blogspot.com/2008/03/why-people-participate-on-mechanical.html
    Why do people participate (more): http://www.floozyspeak.com/blog/archives/2008/08/valley_of_the_t.html
  • 50. Some references & resources
    Improving Answer quality
    Aniket Kittur, Ed H. Chi, and Bongwon Suh (2008). “Crowdsourcing user studies with Mechanical Turk,” CHI 2008.
    Answer quality and dealing with bad answers
    Carpenter, Bob. 2008. Hierarchical Bayesian Models of Categorical Data
    Raykar et al. (2009) Supervised Learning from Multiple Experts: Whom to Trust when Everyone Lies a Bit, ICML.
    Worker quality & HIT difficulty: http://behind-the-enemy-lines.blogspot.com/2008/08/mechanical-turk-worker-quality-and-hit.html
    Also see literature on scoring a test without an answer key
  • 51. Some references & resources
    Turker effort, skills, participation rate, and pay
    W Mason, D Watts. (2009). Financial Incentives and the Performance of Crowds. KDD Workshop on Human Computation.
    Self report on skills: http://behind-the-enemy-lines.blogspot.com/2009/01/how-good-are-you-turker.html
    Human Subjects
    Consent in qualification tests: http://behind-the-enemy-lines.blogspot.com/2009/08/get-consent-form-for-irb-on-mturk-using.html
    Discussion: http://behind-the-enemy-lines.blogspot.com/2009/01/mechanical-turk-human-subjects-and-irbs.html