My name is Dahn Tamir, and I’ve used MTurk for everything from vetting names for my new daughter to a recent study of web browser preference by political affiliation (http://www.evilsoft.org/?p=151). This evening I’m going to focus on the work we’ve done at Knewton.
Knewton is a venture-backed eLearning startup in the West Village. We prepare students for graduate entrance exams, and in the future will open our learning platform to publishers of other educational content. We've been using MTurk since we were in stealth mode a year ago and continue to be heavy users today.
The core of our system is adaptivity, and adaptive testing requires response data from hundreds of users on thousands of test questions. We built groups of qualified workers and administered quizzes to establish the foundation for our testing engine. This is real science, overseen by the former director of research at Educational Testing Service. We have load tested our online classroom via MTurk, proofed all our course material, and beta tested the functionality of our learning and testing engines. We’ve also used MTurk for ratings and feedback on our name, logo, web design, price/feature analysis, video evaluation of teachers, and so on. We’ve collected and cleaned data on schools, potential partners and marketing outlets. And while this requires care, as we don’t want to risk being seen as spammers, we do for instance tap over 500 current college students to distribute flyers at their campuses. We also pretest banner ads and landing pages on MTurk.
How else can you get a thousand pages of text thoroughly proofread in 72 hours? But there's another dimension of speed beyond time to complete a project, and that's time to spin up and start getting responses. Because it's so fast and easy, we experiment a lot. Some things we try go nowhere, but the risk of trying is trivial. Calibrating our test engine was expected to cost tens of thousands of dollars, and we got it done for one-tenth of our budget. Through surveys and with custom qualifications we've established panels of workers by country, age, gender, education level, language ability, and so on, and can go to the right group for each task. Because we can afford to get many eyes on each task and because we can iterate, we end up with more complete and accurate results on everything we do than we'd have without the wisdom of the crowd. This point is huge to us. Saving time and money is great, but in some cases the improvement in quality is reason enough to use MTurk.
It's inconceivable to many that people would be Turking for the money if they are only paid a dollar or two an hour. If you think of MTurk fundamentally as a way to get 10c worth of work from some bored person for 1c, you're selling the opportunity short. There are many highly capable Turkers who are perhaps temporarily out of the workforce because of medical disability, child rearing, a layoff, or because they’re in school. Our top 20 workers each have from 100 thousand to 500 thousand approved HITs, and overall we believe a very large fraction of work on MTurk is completed by a small number of huge, accurate producers. Getting those people working for you is key. Restricting by approval rate is useful, but we get better results by creating a pool of workers who have shown they can do good work on tasks relevant to us. A poor worker can have an artificially high approval rate and vice versa. And someone’s performance on other HITs may not predict performance on your work, for better or worse. Qualifications help. It pays to take time and care in building and testing HITs to ensure that everything looks and operates for the worker as you intend. Poorly constructed or poorly explained HITs just get poor results. We try to align the payment amount to the time and difficulty of the task, and have paid from a penny to five dollars for a single HIT. It’s also helped to break up complex tasks into separate HITs whenever possible. The increased effort of structuring two or three HITs really is worthwhile. Finally, for large projects it’s best to try a small sample first and expect to tweak the HIT a few times, then load your 50 thousand data points. Because most requesters use the approval-rate qualification, workers live in fear of unfair rejection. Good workers will avoid your tasks if the setup suggests a chance of rejection. For instance, it's not unreasonable to use the majority opinion as the "correct" answer on an image moderation task.
But that does not mean you have to reject the response that was "wrong," especially as that response may actually be correct. We create goodwill with workers by paying for quality effort and tolerating the occasional "error." On the other hand, if we identify a scammer or careless worker, we simply reject their submissions and block them from future tasks. For simple and well-established uses, the automation metaphor of MTurk works fine. But if you’re trying to do anything even a little different, it pays to introduce yourself on the forums, establish yourself as a trustworthy employer and solicit free advice. Once you are running HITs, take the time to be responsive to questions, concerns and suggestions from your workers. These are real people and your respect for their efforts will pay dividends in faster, more accurate results.
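For readers who want to see the majority-opinion idea in concrete terms, here is a minimal sketch in Python. It is not Knewton's actual pipeline: the function name `aggregate`, the `(task_id, worker_id, label)` tuple layout, and the `review_threshold` and `min_tasks` values are all assumptions made for illustration. It takes the most common answer on each task as the working "correct" label, rejects nothing automatically, and only lists workers who disagree with the majority most of the time so that a person can look at them before any blocking decision.

```python
from collections import Counter, defaultdict


def aggregate(responses, review_threshold=0.5, min_tasks=3):
    """Majority-vote labels plus a list of workers worth a manual look.

    responses: iterable of (task_id, worker_id, label) tuples (assumed layout).
    review_threshold and min_tasks are illustrative values, not recommendations.
    """
    by_task = defaultdict(list)
    for task_id, worker_id, label in responses:
        by_task[task_id].append((worker_id, label))

    consensus = {}
    agreed = Counter()   # tasks where the worker matched the majority
    scored = Counter()   # tasks that had a clear majority

    for task_id, answers in by_task.items():
        counts = Counter(label for _, label in answers)
        top_label, top_count = counts.most_common(1)[0]
        tied = sum(1 for c in counts.values() if c == top_count) > 1
        if tied:
            # No clear majority: leave undecided and score nobody on it.
            consensus[task_id] = None
            continue
        consensus[task_id] = top_label
        for worker_id, label in answers:
            scored[worker_id] += 1
            if label == top_label:
                agreed[worker_id] += 1

    # A single disagreement never flags anyone; only a sustained pattern does.
    needs_review = sorted(
        w for w in scored
        if scored[w] >= min_tasks and agreed[w] / scored[w] < review_threshold
    )
    return consensus, needs_review


# Hypothetical image-moderation answers from five workers on three images.
sample = [
    ("img1", "a", "ok"), ("img1", "b", "ok"), ("img1", "c", "ok"),
    ("img1", "d", "bad"), ("img1", "e", "ok"),
    ("img2", "a", "bad"), ("img2", "b", "bad"), ("img2", "c", "bad"),
    ("img2", "d", "ok"), ("img2", "e", "bad"),
    ("img3", "a", "ok"), ("img3", "b", "ok"), ("img3", "c", "bad"),
    ("img3", "d", "bad"), ("img3", "e", "ok"),
]
consensus, needs_review = aggregate(sample)
print(consensus)
print(needs_review)
```

In this made-up sample, worker `c` disagrees with the majority once and is left alone, while worker `d` disagrees on every scored task and is the only one listed for review. Whether to pay, reject, or block is kept as a separate human decision, in line with the point that a minority answer may actually be correct.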
I’d love to take your questions now, and also welcome you to contact me directly.
Amazon MTurk Developer Meetup - Tamir
Amazon Mechanical Turk Requester Meetup Dahn Tamir, Knewton Inc.
Knewton - Introduction <ul><li>Live online GMAT and LSAT prep courses customized for each student, powered by the world’s most advanced adaptive learning engine.</li><li>Selected to the 2009 AlwaysOn Global 250 List. Named Category Winner in the Digital Education field.</li></ul>
How we use MTurk <ul><li>Calibration for computer-adaptive testing</li><li>Quality assurance</li><li>Focus Groups and Surveys</li><li>Database building</li><li>Marketing</li></ul>
Why MTurk? <ul><li>Speed</li><li>Cost</li><li>Appropriate worker population for each task</li><li>Quality</li></ul>
What We Learned <ul><li>Use qualification tests</li><li>Invest in building good HITs</li><li>Hesitate to reject work (but not cheaters)</li><li>Turkers are a diverse and capable population</li><li>Meet Turker Nation</li></ul>