New York Mechanical Turk Meetup
Published in: Technology, Design, Business

  • Transcript

    • 1. Amazon Mechanical Turk Requester Meetup (Panos Ipeirotis – New York University) © 2009 Amazon.com, Inc. or its Affiliates.
    • 2. Panos Ipeirotis - Introduction
        • New York University, Stern School of Business
        • Blog: “A Computer Scientist in a Business School”, http://behind-the-enemy-lines.blogspot.com/
        • Email: panos@nyu.edu
    • 3. Example: Build an Adult Web Site Classifier
        • Need a large number of hand-labeled sites
        • Get people to look at sites and classify them as: G (general), PG (parental guidance), R (restricted), X (porn)
        • Cost/Speed Statistics
            • Undergrad intern: 200 websites/hr, cost: $15/hr
            • MTurk: 2500 websites/hr, cost: $12/hr
    • 4. Bad news: Spammers! Worker ATAMRO447HWJQ labeled X (porn) sites as G (general audience)
    • 5. Improve Data Quality through Repeated Labeling
        • Get multiple, redundant labels using multiple workers
        • Pick the correct label based on majority vote
        • Probability of correctness increases with number of workers
        • Probability of correctness increases with quality of workers
        • Figure labels: 1 worker, 70% correct; 11 workers, 93% correct
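The effect of redundancy on majority voting can be sketched with a simple binomial model. This is only an illustration under assumptions the slide does not state: labels are binary, workers answer independently, and every worker has the same accuracy `p`. The function name is hypothetical, and the slide's figures come from the author's own data rather than from this model.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n workers is correct.

    Assumes binary labels, independent workers, and a shared
    per-worker accuracy p. n must be odd so that ties cannot occur.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd to avoid ties")
    needed = n // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(needed, n + 1)
    )


if __name__ == "__main__":
    for workers in (1, 3, 5, 11):
        print(workers, round(majority_vote_accuracy(0.7, workers), 3))
```

Under these assumptions, 11 workers at 70% accuracy each give a majority that is correct a little over 90% of the time, which is in the same range as the figure on the slide.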
    • 6. But Majority Voting is Expensive
        • Single Vote Statistics
            • MTurk: 2500 websites/hr, cost: $12/hr
            • Undergrad: 200 websites/hr, cost: $15/hr
        • 11-vote Statistics
            • MTurk: 227 websites/hr, cost: $12/hr
            • Undergrad: 200 websites/hr, cost: $15/hr
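The 11-vote throughput follows from dividing the single-vote rate by the number of votes per site. The short sketch below reproduces that arithmetic and adds a cost-per-labeled-site figure, which is derived here and does not appear on the slide. The function names are hypothetical.

```python
def labeled_sites_per_hour(single_vote_rate: float, votes_per_site: int) -> float:
    """Sites fully labeled per hour when each site needs several votes."""
    return single_vote_rate / votes_per_site


def cost_per_labeled_site(hourly_cost: float, sites_per_hour: float) -> float:
    """Dollar cost of one fully labeled site."""
    return hourly_cost / sites_per_hour


if __name__ == "__main__":
    mturk_11 = labeled_sites_per_hour(2500, 11)  # about 227 sites/hr
    print("MTurk, 11 votes:", int(mturk_11), "sites/hr")
    print("MTurk, 1 vote:   $", cost_per_labeled_site(12, 2500), "per site")
    print("MTurk, 11 votes: $", round(cost_per_labeled_site(12, mturk_11), 4), "per site")
    print("Undergrad:       $", cost_per_labeled_site(15, 200), "per site")
```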
    • 7. Using redundant votes, we can infer worker quality
        • Look at our spammer friend ATAMRO447HWJQ together with 9 other workers
        • We can compute error rates for each worker
        • Error rates for ATAMRO447HWJQ:
            • P[X → X]=9.847%, P[X → G]=90.153%
            • P[G → X]=0.053%, P[G → G]=99.947%
        • Our “friend” ATAMRO447HWJQ mainly marked sites as G. Obviously a spammer…
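The slides do not say how the per-worker error rates were estimated. As a minimal sketch, one could treat the majority label for each site as a stand-in for the true label and count how each worker's answers compare with it. This is a simplification chosen for illustration, not necessarily the author's method, and all names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def majority_labels(labels):
    """Map each item to its most frequent label.

    labels: iterable of (worker, item, label) tuples.
    Ties go to the label seen first, so use an odd number of voters.
    """
    votes = defaultdict(Counter)
    for _worker, item, label in labels:
        votes[item][label] += 1
    return {item: c.most_common(1)[0][0] for item, c in votes.items()}


def worker_error_rates(labels):
    """Estimate P[true -> assigned] per worker, using majority as truth."""
    labels = list(labels)
    truth = majority_labels(labels)
    pair_counts = defaultdict(Counter)   # worker -> (true, assigned) counts
    true_counts = defaultdict(Counter)   # worker -> true-class counts
    for worker, item, label in labels:
        pair_counts[worker][(truth[item], label)] += 1
        true_counts[worker][truth[item]] += 1
    return {
        w: {pair: n / true_counts[w][pair[0]] for pair, n in counts.items()}
        for w, counts in pair_counts.items()
    }


# Toy data (made up): two careful workers and one who always answers "G".
toy_labels = [
    ("a", "s1", "X"), ("b", "s1", "X"), ("spam", "s1", "G"),
    ("a", "s2", "X"), ("b", "s2", "X"), ("spam", "s2", "G"),
    ("a", "s3", "G"), ("b", "s3", "G"), ("spam", "s3", "G"),
]
rates = worker_error_rates(toy_labels)

if __name__ == "__main__":
    for worker, table in rates.items():
        print(worker, table)
```

In the toy data the always-"G" worker ends up with P[X → G] = 1.0, the same pattern the slide shows for ATAMRO447HWJQ.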
    • 8. Rejecting spammers and Benefits
        • Random answers error rate = 50%
        • Average error rate for ATAMRO447HWJQ: 45.2%
            • P[X → X]=9.847%, P[X → G]=90.153%
            • P[G → X]=0.053%, P[G → G]=99.947%
        • Action: REJECT and BLOCK
        • Results:
            • Over time you block all spammers
            • Spammers learn to avoid your HITs
            • You can decrease redundancy, as quality of workers is higher
    • 9. After rejecting spammers, quality goes up
        • Spam keeps quality down
        • Without spam, workers are of higher quality
        • Need less redundancy for same quality
        • Same quality of results for lower cost
        • Figure labels:
            • Without spam: 1 worker, 80% correct; 5 workers, 94% correct
            • With spam: 1 worker, 70% correct; 11 workers, 93% correct
    • 10. Correcting biases
        • Classifying sites as G, PG, R, X
        • Sometimes workers are careful but biased
        • Error Rates for Worker: ATLJIK76YH1TF
            • P[G → G]=20.0%, P[G → P]=80.0%, P[G → R]=0.0%, P[G → X]=0.0%
            • P[P → G]=0.0%, P[P → P]=0.0%, P[P → R]=100.0%, P[P → X]=0.0%
            • P[R → G]=0.0%, P[R → P]=0.0%, P[R → R]=100.0%, P[R → X]=0.0%
            • P[X → G]=0.0%, P[X → P]=0.0%, P[X → R]=0.0%, P[X → X]=100.0%
        • Classifies G → P and P → R
        • Average error rate for ATLJIK76YH1TF: 45.0%
        • Is ATLJIK76YH1TF a spammer?
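The 45.0% figure is consistent with an unweighted mean of the per-class error rates in the table above (80%, 100%, 0%, 0%). The sketch below reproduces that arithmetic. Equal weighting of the four classes is an assumption, and the variable and function names are hypothetical.

```python
# Error-rate table for ATLJIK76YH1TF from the slide:
# outer key = true class, inner key = label the worker assigned.
BIASED_WORKER = {
    "G": {"G": 0.20, "P": 0.80, "R": 0.00, "X": 0.00},
    "P": {"G": 0.00, "P": 0.00, "R": 1.00, "X": 0.00},
    "R": {"G": 0.00, "P": 0.00, "R": 1.00, "X": 0.00},
    "X": {"G": 0.00, "P": 0.00, "R": 0.00, "X": 1.00},
}


def average_error_rate(confusion):
    """Unweighted mean over true classes of P[assigned != true]."""
    per_class = [1.0 - row[true] for true, row in confusion.items()]
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    print(f"{average_error_rate(BIASED_WORKER):.1%}")
```

The average alone cannot tell this worker apart from a spammer with a similar score, even though the mistakes here are systematic (G is shifted to P, P to R) rather than uninformative.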
    • 11. Correcting biases
        • Error Rates for Worker: ATLJIK76YH1TF
            • P[G → G]=20.0%, P[G → P]=80.0%, P[G → R]=0.0%, P[G → X]=0.0%
            • P[P → G]=0.0%, P[P → P]=0.0%, P[P → R]=100.0%, P[P → X]=0.0%
            • P[R → G]=0.0%, P[R → P]=0.0%, P[R → R]=100.0%, P[R → X]=0.0%
            • P[X → G]=0.0%, P[X → P]=0.0%, P[X → R]=0.0%, P[X → X]=100.0%
        • For ATLJIK76YH1TF, we simply need to compute the “non-recoverable” error-rate (technical details omitted)
        • Non-recoverable error-rate for ATLJIK76YH1TF: 9%
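The slide omits how the “non-recoverable” error rate is computed, so the following is only one plausible formulation, offered as an assumption rather than as the author's method. For each label the worker might assign, Bayes' rule turns it into a posterior over the true classes; the cost of that "soft" label is the expected misclassification cost when both the true class and the guess are drawn from the posterior; the worker's score is the average of those costs, weighted by how often each label is assigned. The class priors, the 0/1 cost matrix, and all names below are hypothetical, and the result is not expected to reproduce the 9% on the slide, which depends on priors and costs the slides do not give.

```python
def expected_soft_label_cost(confusion, priors, cost=None):
    """Expected cost of a worker's labels after Bayesian correction.

    confusion[t][a]: P(worker assigns a | true class t)
    priors[t]:       assumed P(true class t)
    cost[t][a]:      cost of deciding a when the truth is t (0/1 by default)
    """
    classes = list(priors)
    if cost is None:
        cost = {t: {a: 0.0 if t == a else 1.0 for a in classes} for t in classes}
    total = 0.0
    for assigned in classes:
        # Joint probability of (true class, this assigned label).
        joint = {t: priors[t] * confusion[t][assigned] for t in classes}
        p_assigned = sum(joint.values())
        if p_assigned == 0.0:
            continue  # the worker never uses this label
        posterior = {t: joint[t] / p_assigned for t in classes}
        soft_cost = sum(
            posterior[i] * posterior[j] * cost[i][j]
            for i in classes
            for j in classes
        )
        total += p_assigned * soft_cost
    return total


CLASSES = ["G", "P", "R", "X"]
UNIFORM_PRIORS = {c: 0.25 for c in CLASSES}  # assumption, not from the slides
BIASED = {
    "G": {"G": 0.20, "P": 0.80, "R": 0.00, "X": 0.00},
    "P": {"G": 0.00, "P": 0.00, "R": 1.00, "X": 0.00},
    "R": {"G": 0.00, "P": 0.00, "R": 1.00, "X": 0.00},
    "X": {"G": 0.00, "P": 0.00, "R": 0.00, "X": 1.00},
}
RANDOM = {t: {a: 0.25 for a in CLASSES} for t in CLASSES}

if __name__ == "__main__":
    print("biased worker:", expected_soft_label_cost(BIASED, UNIFORM_PRIORS))
    print("random worker:", expected_soft_label_cost(RANDOM, UNIFORM_PRIORS))
```

Under these assumed priors and costs, the biased worker scores far better than a uniformly random labeler, because labels "G", "P" and "X" from this worker each pin down the true class exactly; only the "R" label remains ambiguous between P and R.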
    • 12. Too much theory? Open source implementation available at: http://code.google.com/p/get-another-label/
        • Input:
            • Labels from Mechanical Turk
            • Cost of incorrect labelings (e.g., X → G costlier than G → X)
        • Output:
            • Corrected labels
            • Worker error rates
            • Ranking of workers according to their quality
        • Alpha version, more improvements to come!
        • Suggestions and collaborations welcomed!
    • 13. Thank you! Questions?
        • Blog: “A Computer Scientist in a Business School”, http://behind-the-enemy-lines.blogspot.com/
        • Email: panos@nyu.edu