Mechanical Turk for Social Science Introduction

  • Tasks can be sorted by price or number of HITs available, among other things. To increase participation, you generally want to appear higher on at least one of these lists.
  • For this study, we wanted Conservative Republicans and Liberal Democrats, not people with neutral views, Liberal Republicans, or Conservative Democrats.
  • Restricting who can participate.
  • If not automatically scored, the qualification introduces an even bigger delay in the process, and you’ll lose workers. But scoring it yourself allows a lot more control, and lets you retain turker answer data.

    1. Mechanical Turk for Social Science
       Sean Munson, Eytan Bakshy
       School of Information, UMich, 28 October 2009
    2. 11:00 am – Problem: need to classify thousands of blogs according to category.
    3. Lunch* (*not actual lunch)
    5. 1:00 pm – 50 blogs classified, 5× each
    6. Mechanical Turk for Social Science Awesome
       Sean Munson, Eytan Bakshy
       An API made of people!
    7. Overview
       • Who are the Turkers?
       • Tasks suitable for Mechanical Turk, and workarounds for tasks that are semi-suitable
       • Tasks from Turkers’ and requesters’ points of view
       • Examples
         – Classifying links
         – Reacting to collections of links
       • Practicalities
         – Tools
         – Paying Turkers at UMich
         – Human subjects
       Slides will be available online.
    8. Who are the Turkers?
    9. Andy Baio, Faces of Mechanical Turk
    12. 300-Turker survey from Panos Ipeirotis
        Limited by self-selection issues (reaches people who do tasks even when only one is available, and at that pay).
        By country: 76% US; 8% India; 3% UK; 2% Canada
    16. Ideal types of tasks
        • Short duration
        • Repetitive – the Turker learns once, repeats many times
        • No particular expertise required
        • From the requester’s perspective: human input is verifiable with less effort than it would take to do the task yourself or to pay an expert, e.g.
          – tasks that require people to write something
          – assessing quality using multiple raters
        …but you can use it in other ways.
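The “multiple raters” idea above is usually implemented as a majority vote over redundant labels. A minimal sketch (the threshold of 3-of-5 is an illustrative assumption, not a rule from the talk):

```python
from collections import Counter

def majority_label(labels, min_agreement=3):
    """Collapse redundant ratings into one label, or None if the
    most common label falls short of the agreement threshold."""
    winner, count = Counter(labels).most_common(1)[0]
    return winner if count >= min_agreement else None

# Example: 4 of 5 Turkers agree, so the label is accepted.
print(majority_label(["politics", "politics", "sports", "politics", "politics"]))
```

Items that fail the threshold can be re-posted for more ratings or escalated to an expert.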
    17. The Turker’s cycle: Task listing – preview & select task → Complete task → Get paid → Automatically accept another task of this type, or go find a new task
    21. Requester side: Create task type → Load task instances (prepay)
        Turker side: Task listing – preview & select task → Complete task → Get paid → Automatically accept another task of this type, or go find a new task
        Photo: Flickr, Michelle Gibson
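“Load task instances” through the web interface is done with a CSV upload, one row per HIT, where each column name matches a `${variable}` placeholder in the HIT template. A sketch of generating such a file (the `url` column name and blog URLs are hypothetical):

```python
import csv

# Hypothetical bulk-upload file: the header name "url" must match
# a ${url} placeholder in your HIT template.
blogs = [
    "http://example.com/blog-one",
    "http://example.com/blog-two",
]

with open("hits.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["url"])          # one column per template variable
    writer.writerows([b] for b in blogs)
```

Uploading this file creates one HIT per row; the redundancy (e.g. 5 ratings each) is set on the task type, not in the CSV.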
    27. Requester side: Create task type → Load task instances (prepay) → Approve or reject tasks
        Turker side: Task listing – preview & select task → Complete task → Get paid → Automatically accept another task of this type, or go find a new task
    28. Turkers as Classifiers
    29. Large-scale study of diffusion and influence on Twitter
        • How does the spread of a URL over the Twitter network depend on its content?
        • What proportion of “influential” users are mass media vs. individuals?
        • Requires thousands of labels of URLs and users; needs to be fast and cheap.
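“Thousands of labels, fast and cheap” is worth a back-of-the-envelope budget before posting. Every number below is an illustrative assumption, not the study’s actual rate:

```python
# Back-of-the-envelope labeling budget (all figures hypothetical).
n_urls = 1000           # URLs to label
raters_per_url = 5      # redundant ratings per URL
price_per_label = 0.02  # dollars paid per rating
amazon_fee = 0.10       # MTurk commission as a fraction of worker pay
                        # (check current rates)

n_labels = n_urls * raters_per_url
total_cost = n_labels * price_per_label * (1 + amazon_fee)
print(f"{n_labels} labels for about ${total_cost:.2f}")
```

The same arithmetic also bounds the effective hourly wage once you estimate seconds per label, which matters for how fast the HITs get picked up.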
    33. Turkers as Subjects
    34. Turkers as subjects – challenges
        • Hard to check answer quality when you want opinions!
        • Screening & treatment randomization
        • mTurk is not optimized for one-time tasks
    37. How to screen? A 2×2 of ideology × party: Liberal / Conservative crossed with Republican / Democrat
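The screening rule from the speaker notes – keep only Conservative Republicans and Liberal Democrats, drop neutral views and the off-diagonal cells – can be sketched like this (field values and casing are hypothetical):

```python
# Hypothetical screener for the 2x2 above: only two of the four
# ideology x party cells proceed to the study.
ELIGIBLE = {("conservative", "republican"), ("liberal", "democrat")}

def eligible(ideology, party):
    """True if this (ideology, party) pair should pass screening."""
    return (ideology.lower(), party.lower()) in ELIGIBLE
```

In practice this check runs when you grade the qualification test, so ineligible workers never see the study HIT.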
    38. Cycle with screening – Turker side: Take Qualification (ask demographics, political preferences) → Task listing – preview & select task → Complete task → Get paid → Automatically accept another task of this type, or go find a new task
        Built-in restrictions: require 95% task approval rating; require US location
        Requester side: Create task type → Load task instances (prepay) → Approve or reject tasks
    39. Same cycle, requester side additions: Create or use an existing qualification → Evaluate qualification (grant or reject), alongside Create task type → Load task instances (prepay) → Approve or reject tasks
    40. Checking for validity
        • Couldn’t ask verifiable information (Kittur and Chi) about the collection without affecting how subjects look at the list.
        • Did have demographic info from the qualification. Randomly selected a question to repeat → removed people for gender changes, aging backwards, or major changes in political preferences.
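The repeat-question check above amounts to comparing each worker’s in-task answers against their earlier qualification responses. A sketch, where the field names and the tolerance on a 7-point politics scale are assumptions for illustration:

```python
# Sketch of the consistency check: drop workers whose repeated
# answers drift implausibly from their qualification responses.
def consistent(qual, task):
    """False means drop this worker's data."""
    if qual["gender"] != task["gender"]:
        return False                       # gender changed
    if task["age"] < qual["age"]:
        return False                       # aged backwards
    if abs(task["politics"] - qual["politics"]) > 2:
        return False                       # major political shift
    return True
```

Because the repeated question is chosen at random, workers cannot anticipate which answer will be cross-checked.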
    41. Total cost: $382 for 485 collection ratings
        Had to pay more (~$12/hr) because only one task was available at a time, plus a required (unpaid) qualification.
    42. Practicalities
    43. Tools
        • Web interface: WYSIWYG editor, CSV upload of tasks, many task templates to use as starting points. Very simple and fast to use, but limited in capability.
        • Command-line tools: required to create custom qualifications or to use multiple quals. Much more flexibility; input format is XML. Documentation is adequate; the overall experience is clunky.
        • Other libraries (e.g. http://developer.amazonwebservices.com/connect/entry.jspa?externalID=827&categoryID=85)
        • 3rd-party tools: almost as easy to use as Amazon’s web interface and support nearly all features of the command-line tools, but they take a cut.
          – CrowdFlower, from Dolores Labs: crowdflower.com
          – Smartsheet: smartsheet.com/product/smartsourcing
    44. Human subjects?
        • Human-subjects status varies with design: categorizing content is not human-subjects research; asking for reactions to content is.
        • Informed consent: my preference has been to argue for a waiver of informed consent. (Mechanical Turk’s terms of service prohibit collection of identifiable information.)
        • You can use qualifications if you have a task where you feel informed consent is appropriate, have extended consent information, and have repetitive tasks.
    45. Subject payment
        mTurk handles all payment, but associate your account with the University of Michigan employer ID number, in case any one person earns more than the IRS reporting limit across all Michigan mTurk studies. Stacy Callahan or I have more information.
    46. The full cycle, Turker and requester sides
        Turker: Take Qualification → Task listing – preview & select task → Complete task → Get paid → Automatically accept another task of this type, or go find a new task
        Requester: Create or use an existing qualification → Evaluate qualification (grant or reject); Create task type → Load task instances (prepay) → Approve or reject tasks
        Scoring the qualification:
        • Automatically score: instant grant/reject; requires right & wrong answers
        • Download & score: good for participant screening, fast turnaround (run every minute), random assignment
        Can set limits on retaking. Too many rejects? Revoke the qualification.
        Qualifications must be hosted by Amazon; built-in quals exist for location and reputation.
        You can assign people to dummy qualifications so they can take follow-up studies, and you can email them through mTurk. You can also exclude this way to maintain a virgin sample.
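The “download & score” option boils down to grading submitted answers against a key you keep on your side, then granting or rejecting. A sketch – the question IDs, answer key, and passing threshold are illustrative; real submissions arrive through the MTurk API or command-line tools:

```python
# "Download & score" sketch: grade qualification answers locally
# and decide grant/reject. All names here are hypothetical.
ANSWER_KEY = {"q1": "b", "q2": "d", "q3": "a"}

def score(answers):
    """Number of questions answered correctly."""
    return sum(answers.get(q) == correct for q, correct in ANSWER_KEY.items())

def decide(answers, passing=2):
    """Grant the qualification if enough answers match the key."""
    return "grant" if score(answers) >= passing else "reject"
```

Scoring locally is slower than Amazon-hosted auto-scoring, but it lets you keep the answer data, randomize assignment, and apply screening rules (like the political-quadrant filter) that an answer key alone cannot express.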
    49. Some references & resources
        General
        • Dolores Labs blog: http://blog.doloreslabs.com/
        • Turker Nation forums: http://turkers.proboards.com
        • 5 study how-tos from Markus Jakobsson (PARC): http://blogs.parc.com/blog/2009/07/experimenting-on-mechanical-turk-5-how-tos/
        Turker demographics
        • Survey by Panos Ipeirotis: http://behind-the-enemy-lines.blogspot.com/2008/03/mechanical-turk-demographics.html
        • Turker demographics vs. Internet demographics: http://behind-the-enemy-lines.blogspot.com/2009/03/turker-demographics-vs-internet.html
        • Why do people participate: http://behind-the-enemy-lines.blogspot.com/2008/03/why-people-participate-on-mechanical.html
        • Why do people participate (more): http://www.floozyspeak.com/blog/archives/2008/08/valley_of_the_t.html
    50. Some references & resources
        Improving answer quality
        • Aniket Kittur, Ed H. Chi, and Bongwon Suh (2008). “Crowdsourcing user studies with Mechanical Turk,” CHI 2008.
        Answer quality and dealing with bad answers
        • Carpenter, Bob (2008). Hierarchical Bayesian Models of Categorical Data.
        • Raykar et al. (2009). Supervised Learning from Multiple Experts: Whom to Trust when Everyone Lies a Bit, ICML.
        • Worker quality & HIT difficulty: http://behind-the-enemy-lines.blogspot.com/2008/08/mechanical-turk-worker-quality-and-hit.html
        • Also see the literature on scoring a test without an answer key.
    51. Some references & resources
        Turker effort, skills, participation rate, and pay
        • W. Mason, D. Watts (2009). Financial Incentives and the Performance of Crowds. KDD Workshop on Human Computation.
        • Self-report on skills: http://behind-the-enemy-lines.blogspot.com/2009/01/how-good-are-you-turker.html
        Human subjects
        • Consent in qualification tests: http://behind-the-enemy-lines.blogspot.com/2009/08/get-consent-form-for-irb-on-mturk-using.html
        • Discussion: http://behind-the-enemy-lines.blogspot.com/2009/01/mechanical-turk-human-subjects-and-irbs.html
