Mechanical Turk for Social Science Introduction

Published in: Education, Technology

  1. 1. Mechanical Turk for Social Science<br />Sean Munson, Eytan Bakshy<br />School of Information, UMich, 28 October 2009<br />
  2. 2. 11:00 am - Problem: Need to classify thousands of blogs by category.<br />
  3. 3. Lunch*<br />*not actual lunch<br />
  4. 4.
  5. 5. 1:00 pm<br />50 blogs classified 5x each<br />
  6. 6. Mechanical Turk for Social Science Awesome<br />Sean Munson, Eytan Bakshy<br />An API made of people!<br />
  7. 7. Overview<br />Who are the Turkers?<br />Tasks suitable for Mechanical Turk and workarounds for tasks that are semi-suitable<br />Tasks from Turkers’ and requesters’ points of view<br />Examples<br />Classifying links<br />Reacting to collections of links<br />Practicalities<br />Tools<br />Paying Turkers at UMich<br />Human Subjects<br />Slides will be available online.<br />
  8. 8. Who are the Turkers?<br />
  9. 9. Andy Baio, Faces of Mechanical Turk<br />
  10. 10. Andy Baio, Faces of Mechanical Turk<br />
  11. 11. Andy Baio, Faces of Mechanical Turk<br />
  12. 12. 300-Turker survey from Panos Ipeirotis<br />Limited by self-selection (respondents are people willing to do a task with only one instance available, and at that pay rate).<br />By country:<br />76% US; 8% India; 3% UK; 2% Canada<br />
  13. 13.
  14. 14.
  15. 15.
  16. 16. Ideal types of tasks<br />Short duration<br />Repetitive – Turker learns once, repeats many times<br />No particular expertise required<br />From requester perspective: Human input is verifiable with less effort than it would take to do it yourself or to pay an expert, e.g., for tasks that require people to write something, assess quality using multiple raters<br />But you can use it in other ways.<br />
  17. 17. Automatically accept another task of this type, or go find a new task<br />Task listing – Preview & select task <br />Get Paid<br />Complete task<br />
  21. 21. Automatically accept another task of this type, or go find a new task<br />Task listing – Preview & select task <br />Get Paid<br />Complete task<br />Create task type<br />Load Task instances<br />(prepay)<br />Flickr:Michelle Gibson<br />
  22. 22.
  23. 23.
  24. 24.
  25. 25.
  26. 26.
  27. 27. Automatically accept another task of this type, or go find a new task<br />Task listing – Preview & select task <br />Get Paid<br />Complete task<br />Create task type<br />Load Task instances<br />(prepay)<br />Approve or reject tasks<br />
  28. 28. Turkers as Classifiers<br />
  29. 29. Large-scale study of diffusion and influence on Twitter<br />How does the spread of a URL over the Twitter network depend on the content?<br />What proportion of “influential” users are mass media vs. individuals?<br />Requires thousands of labels of URLs and users. Needs to be fast and cheap.<br />
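When each URL or user is labeled by several Turkers (earlier in the talk, 50 blogs were classified 5x each), the redundant labels have to be combined somehow. The following is a minimal sketch of one common approach, simple majority voting with an agreement score. It is not necessarily what the authors did; the function name, the `min_agreement` threshold, and the example URLs and categories are all hypothetical.

```python
from collections import Counter

def aggregate_labels(labels_by_item, min_agreement=0.6):
    """Majority-vote each item's labels and flag items with weak agreement.

    labels_by_item maps an item (e.g. a URL) to the list of labels that
    different workers gave it. min_agreement is an illustrative threshold.
    """
    results = {}
    for item, labels in labels_by_item.items():
        counts = Counter(labels)
        top_label, top_count = counts.most_common(1)[0]
        agreement = top_count / len(labels)
        results[item] = {
            "label": top_label,
            "agreement": agreement,
            # Low agreement suggests an ambiguous item or careless raters.
            "needs_review": agreement < min_agreement,
        }
    return results

# Hypothetical ratings: five workers per URL.
ratings = {
    "http://example.com/a": ["news", "news", "news", "blog", "news"],
    "http://example.com/b": ["blog", "news", "video", "blog", "news"],
}
results = aggregate_labels(ratings)
for url, outcome in results.items():
    print(url, outcome)
```

Items flagged `needs_review` could be sent back for more ratings or checked by hand. The references at the end of the deck (Carpenter; Raykar et al.) describe model-based alternatives that weight raters by estimated reliability.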
  30. 30.
  31. 31.
  32. 32.
  33. 33. Turkers as Subjects<br />
  34. 34. Turkers as Subjects – Challenges<br />Hard to check answer quality when you want opinions!<br />Screening & treatment randomization<br />mTurk not optimized for 1x tasks<br />
  35. 35.
  36. 36. Automatically accept another task of this type, or go find a new task<br />Task listing – Preview & select task <br />Get Paid<br />Complete task<br />Create task type<br />Load Task instances<br />(prepay)<br />Approve or reject tasks<br />
  37. 37. How to screen?<br />Liberal<br />Republican<br />Democrat<br />Conservative<br />
  38. 38. Automatically accept another task of this type, or go find a new task<br />Task listing – Preview & select task <br />Take Qualification<br />Get Paid<br />Complete task<br />Create task type<br />Load Task instances<br />(prepay)<br />Require 95% task approval rating<br />Require US location<br />Ask demographics, political preferences<br />Approve or reject tasks<br />
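The screening requirements on this slide (95% approval rating, US location, plus a custom qualification) can be written down as data. The sketch below builds them in the shape that the present-day MTurk API expects for `QualificationRequirements`; this is an assumption about current tooling, since the talk itself used Amazon's 2009 command-line tools. The two system qualification IDs are the ones Amazon documents for approval rate and locale; `build_requirements` and the custom qualification ID are hypothetical.

```python
# System qualification type IDs as documented by Amazon for the MTurk API.
PERCENT_APPROVED_ID = "000000000000000000L0"   # percent of assignments approved
LOCALE_ID = "00000000000000000071"             # worker locale

def build_requirements(min_approval=95, country="US", custom_qualification_id=None):
    """Return a list of qualification requirements for a task type.

    custom_qualification_id is a placeholder for a requester-created
    qualification, e.g. a demographics and political-preference survey.
    """
    requirements = [
        {
            "QualificationTypeId": PERCENT_APPROVED_ID,
            "Comparator": "GreaterThanOrEqualTo",
            "IntegerValues": [min_approval],
        },
        {
            "QualificationTypeId": LOCALE_ID,
            "Comparator": "EqualTo",
            "LocaleValues": [{"Country": country}],
        },
    ]
    if custom_qualification_id is not None:
        # "Exists" only requires that the worker holds the qualification.
        requirements.append(
            {"QualificationTypeId": custom_qualification_id, "Comparator": "Exists"}
        )
    return requirements

reqs = build_requirements(custom_qualification_id="HYPOTHETICAL_QUAL_ID")
print(reqs)
```

With the boto3 MTurk client, a list like this would be passed as the `QualificationRequirements` argument when creating a HIT. The block builds only the data structure and makes no network calls.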
  39. 39. Automatically accept another task of this type, or go find a new task<br />Task listing – Preview & select task <br />Take Qualification<br />Get Paid<br />Complete task<br />Create task type<br />Load Task instances<br />(prepay)<br />Approve or reject tasks<br />Evaluate Qualification: Grant or reject<br />Create or use existing qualification<br />
  40. 40. Checking for validity<br />Couldn’t ask for verifiable information (as Kittur and Chi suggest) about the collection without affecting how the subjects look at the list<br />Did have demographic info from the qualification. Randomly selected one question to repeat, then removed people for gender changes, aging backwards, or major changes in political preferences<br />
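The consistency check described on this slide can be sketched as a small function that compares the one repeated answer against the qualification answers. The field names, the numeric political scale, and the `max_politics_shift` threshold are hypothetical; the slide does not say how a "major" change in political preference was defined.

```python
def is_consistent(first, repeat, max_politics_shift=1):
    """Compare a repeated answer against the original qualification answers.

    first:  dict of answers from the qualification, with keys
            "gender", "age", and "politics" (a hypothetical 1-7 scale).
    repeat: dict holding the single randomly chosen question that was
            asked again during the task.
    """
    if "gender" in repeat and repeat["gender"] != first["gender"]:
        return False  # gender changed between the two answers
    if "age" in repeat and repeat["age"] < first["age"]:
        return False  # respondent aged backwards
    if "politics" in repeat:
        shift = abs(repeat["politics"] - first["politics"])
        if shift > max_politics_shift:
            return False  # major change in political preference
    return True

# Hypothetical respondent.
qualification = {"gender": "F", "age": 30, "politics": 3}
print(is_consistent(qualification, {"age": 29}))
print(is_consistent(qualification, {"politics": 4}))
```

Responses from workers who fail the check would be dropped before analysis.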
  41. 41. Total cost: $382 for 485 collection ratings<br />Had to pay more (~$12/hr) because only one task was available at a time, plus a required (unpaid) qualification.<br />
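As a rough check on these figures, the per-rating cost and the task length implied by the hourly rate can be worked out directly. This sketch assumes the $382 total is entirely worker pay (the slide does not say whether Amazon's fee is included), so the implied duration is an estimate rather than a reported number.

```python
total_cost = 382.00      # dollars, from the slide
num_ratings = 485        # collection ratings, from the slide
hourly_rate = 12.00      # approximate dollars per hour, from the slide

cost_per_rating = total_cost / num_ratings
# If workers earned about $12/hr, each rating took roughly this long.
implied_minutes = cost_per_rating / hourly_rate * 60

print(f"Cost per rating: ${cost_per_rating:.2f}")
print(f"Implied time per rating: {implied_minutes:.1f} minutes")
```

Under that assumption, each rating cost roughly $0.79 and took about four minutes.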
  42. 42. Practicalities<br />
  43. 43. Tools<br />Web interface: WYSIWYG editor, CSV upload of tasks. Many task templates to use as starting points. Very simple and fast to use, but limited in capability. <br />Command line tools: Required to create custom qualifications or use multiple quals. Much more flexibility. Input format is XML. Documentation is adequate, but the overall experience is clunky.<br />Other libraries (e.g. http://developer.amazonwebservices.com/connect/entry.jspa?externalID=827&categoryID=85)<br />Third-party tools: Almost as easy to use as Amazon’s web interface, and they support nearly all features of the command line tools. But they take a cut. <br />CrowdFlower – from Dolores Labs: crowdflower.com<br />Smartsheet: smartsheet.com/product/smartsourcing<br />
  44. 44. Human subjects?<br />Human subjects status varies with design<br />Categorizing content: Not human subjects<br />Asking for reactions to content: Human subjects.<br />Informed Consent<br />My preference has been to argue for waiver of informed consent. (Mechanical Turk terms of service prohibit collection of identifiable information.)<br />You can use qualifications if you have a task where you feel informed consent is appropriate, have extended consent information and have repetitive tasks. <br />
  45. 45. Subject payment<br />mTurk handles all payment, but<br />Associate your account with the University of Michigan employer ID number, in case any one person earns more than the IRS reporting limit from all Michigan mTurk studies. Stacy Callahan or I have more information.<br />
  46. 46. Automatically accept another task of this type, or go find a new task<br />Task listing – Preview & select task <br />Take Qualification<br />Get Paid<br />Complete task<br />Turker<br />Create task type<br />Load Task instances<br />(prepay)<br />Approve or reject tasks<br />Evaluate Qualification: Grant or reject<br />Scoring<br /><ul><li>Automatically score: instant grant / reject, requires right & wrong answers</li><li>Download & score: Good for participant screening, fast turnaround (run every minute), random assignment</li></ul>Can set limits on retaking<br />Too many rejects? Revoke qualification.<br />Create or use existing qualification<br /><ul><li>Must be hosted by Amazon</li><li>Built-in quals for location, reputation</li></ul>Requester<br />Can assign people to dummy qualifications to allow them to take follow-up studies, and you can email them through mTurk. Can also exclude people this way to keep the sample limited to first-time participants.<br />
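The "download & score" route mentions random assignment: once qualification answers are downloaded, each accepted worker can be placed in a treatment before the qualification is granted. A minimal sketch of balanced random assignment follows; the worker IDs, condition names, and the `assign_conditions` helper are hypothetical, and the slide does not specify how the authors randomized.

```python
import random

def assign_conditions(worker_ids, conditions, seed=None):
    """Shuffle workers, then deal them round-robin across conditions.

    Dealing after a shuffle keeps group sizes within one of each other,
    which plain independent coin flips would not guarantee.
    """
    rng = random.Random(seed)
    shuffled = list(worker_ids)
    rng.shuffle(shuffled)
    return {
        worker: conditions[index % len(conditions)]
        for index, worker in enumerate(shuffled)
    }

# Hypothetical batch of workers who passed screening.
workers = ["W1", "W2", "W3", "W4", "W5", "W6"]
assignment = assign_conditions(workers, ["treatment", "control"], seed=42)
print(assignment)
```

Each condition could then correspond to its own dummy qualification, so that a worker only sees the task version for the assigned treatment.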
  49. 49. Some references & resources<br />General<br />Dolores Labs blog: http://blog.doloreslabs.com/<br />Turker Nation forums: http://turkers.proboards.com<br />5 Study how-tos from Markus Jakobsson (PARC): http://blogs.parc.com/blog/2009/07/experimenting-on-mechanical-turk-5-how-tos/<br />Turker Demographics<br />Survey by Panos Ipeirotis: http://behind-the-enemy-lines.blogspot.com/2008/03/mechanical-turk-demographics.html<br />Turker demographics vs. Internet demographics: http://behind-the-enemy-lines.blogspot.com/2009/03/turker-demographics-vs-internet.html<br />Why do people participate: http://behind-the-enemy-lines.blogspot.com/2008/03/why-people-participate-on-mechanical.html<br />Why do people participate (more): http://www.floozyspeak.com/blog/archives/2008/08/valley_of_the_t.html<br />
  50. 50. Some references & resources<br />Improving answer quality<br />Aniket Kittur, Ed H. Chi, and Bongwon Suh (2008). “Crowdsourcing user studies with Mechanical Turk,” CHI 2008. <br />Answer quality and dealing with bad answers<br />Carpenter, Bob. 2008. Hierarchical Bayesian Models of Categorical Data<br />Raykar et al. (2009) Supervised Learning from Multiple Experts: Whom to Trust when Everyone Lies a Bit, ICML.<br />Worker quality & HIT difficulty: http://behind-the-enemy-lines.blogspot.com/2008/08/mechanical-turk-worker-quality-and-hit.html<br />Also see literature on scoring a test without an answer key<br />
  51. 51. Some references & resources<br />Turker effort, skills, participation rate, and pay<br />W Mason, D Watts. (2009). Financial Incentives and the Performance of Crowds. KDD Workshop on Human Computation.<br />Self report on skills: http://behind-the-enemy-lines.blogspot.com/2009/01/how-good-are-you-turker.html<br />Human Subjects<br />Consent in qualification tests: http://behind-the-enemy-lines.blogspot.com/2009/08/get-consent-form-for-irb-on-mturk-using.html<br />Discussion: http://behind-the-enemy-lines.blogspot.com/2009/01/mechanical-turk-human-subjects-and-irbs.html<br />
