
- 1. Amazon Mechanical Turk Requester Meetup (Panos Ipeirotis – New York University) © 2009 Amazon.com, Inc. or its Affiliates.
- 2. Panos Ipeirotis - Introduction. New York University, Stern School of Business. “A Computer Scientist in a Business School”: http://behind-the-enemy-lines.blogspot.com/ Email: panos@nyu.edu
- 3. Example: Build an Adult Web Site Classifier. Need a large number of hand-labeled sites. Get people to look at sites and classify them as G (general), PG (parental guidance), R (restricted), or X (porn). Cost/speed statistics: undergrad intern, 200 websites/hr at $15/hr; MTurk, 2500 websites/hr at $12/hr.
- 4. Bad news: Spammers! Worker ATAMRO447HWJQ labeled X (porn) sites as G (general audience).
- 5. Improve Data Quality through Repeated Labeling. Get multiple, redundant labels using multiple workers, and pick the correct label based on majority vote. 1 worker: 70% correct; 11 workers: 93% correct. The probability of correctness increases with the number of workers and with the quality of the workers.
- 6. But Majority Voting is Expensive. Single-vote statistics: MTurk, 2500 websites/hr at $12/hr; undergrad, 200 websites/hr at $15/hr. 11-vote statistics: MTurk, 227 websites/hr at $12/hr; undergrad, 200 websites/hr at $15/hr.
- 7. Using redundant votes, we can infer worker quality. Look at our spammer friend ATAMRO447HWJQ together with 9 other workers: we can compute error rates for each worker. Error rates for ATAMRO447HWJQ: P[X → X]=9.847%, P[X → G]=90.153%, P[G → X]=0.053%, P[G → G]=99.947%. Our “friend” ATAMRO447HWJQ mainly marked sites as G. Obviously a spammer…
- 8. Rejecting spammers and Benefits. Random answers have an error rate of 50%; the average error rate for ATAMRO447HWJQ is 45.2% (P[X → X]=9.847%, P[X → G]=90.153%, P[G → X]=0.053%, P[G → G]=99.947%). Action: REJECT and BLOCK. Results: over time you block all spammers; spammers learn to avoid your HITs; you can decrease redundancy, as the quality of workers is higher.
- 9. After rejecting spammers, quality goes up. Spam keeps quality down; without spam, workers are of higher quality, so you need less redundancy for the same quality and get the same quality of results for lower cost. With spam: 1 worker, 70% correct; 11 workers, 93% correct. Without spam: 1 worker, 80% correct; 5 workers, 94% correct.
- 10. Correcting biases. Classifying sites as G, PG, R, X: sometimes workers are careful but biased. Error rates for worker ATLJIK76YH1TF (P stands for PG): P[G → G]=20.0%, P[G → P]=80.0%, P[G → R]=0.0%, P[G → X]=0.0%; P[P → G]=0.0%, P[P → P]=0.0%, P[P → R]=100.0%, P[P → X]=0.0%; P[R → G]=0.0%, P[R → P]=0.0%, P[R → R]=100.0%, P[R → X]=0.0%; P[X → G]=0.0%, P[X → P]=0.0%, P[X → R]=0.0%, P[X → X]=100.0%. This worker classifies G → P and P → R. Average error rate for ATLJIK76YH1TF: 45.0%. Is ATLJIK76YH1TF a spammer?
- 11. Correcting biases. Error rates for worker ATLJIK76YH1TF: P[G → G]=20.0%, P[G → P]=80.0%, P[G → R]=0.0%, P[G → X]=0.0%; P[P → G]=0.0%, P[P → P]=0.0%, P[P → R]=100.0%, P[P → X]=0.0%; P[R → G]=0.0%, P[R → P]=0.0%, P[R → R]=100.0%, P[R → X]=0.0%; P[X → G]=0.0%, P[X → P]=0.0%, P[X → R]=0.0%, P[X → X]=100.0%. For ATLJIK76YH1TF, we simply need to compute the “non-recoverable” error rate (technical details omitted). Non-recoverable error rate for ATLJIK76YH1TF: 9%.
- 12. Too much theory? Open source implementation available at: http://code.google.com/p/get-another-label/ Input: labels from Mechanical Turk; cost of incorrect labelings (e.g., X → G costlier than G → X). Output: corrected labels; worker error rates; ranking of workers according to their quality. Alpha version, more improvements to come! Suggestions and collaborations welcome!
- 13. Thank you! Questions? “A Computer Scientist in a Business School”: http://behind-the-enemy-lines.blogspot.com/ Email: panos@nyu.edu
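
The repeated-labeling figures on slides 5, 6, and 9 can be approximated with a short calculation. The sketch below is not taken from the slides: it assumes a binary task, an odd number of workers, and workers who are each correct with the same probability, independently of one another. The function names are hypothetical.

```python
from math import comb


def majority_accuracy(p: float, n: int) -> float:
    """Probability that a majority of n independent workers is correct.

    ASSUMPTIONS (not from the slides): binary task, odd n so there are
    no ties, every worker is correct with the same probability p.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of workers to avoid ties")
    need = n // 2 + 1  # smallest number of correct votes that wins
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


def items_per_hour(single_vote_rate: float, votes_per_item: int) -> float:
    """Throughput once every item needs votes_per_item labels."""
    return single_vote_rate / votes_per_item


for p, n in [(0.7, 1), (0.7, 11), (0.8, 1), (0.8, 5)]:
    print(f"accuracy={p:.0%}, workers={n}: {majority_accuracy(p, n):.1%}")
print(f"11-vote throughput: {items_per_hour(2500, 11):.0f} websites/hr")
```

Under these assumptions, 5 workers at 80% reach about 94% and 11 workers at 70% reach roughly 92%, close to the figures quoted on the slides, and 2500 single votes per hour become about 227 labeled websites per hour at 11 votes each.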

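Slides 10 and 11 contrast a worker's average error rate with a “non-recoverable” error rate but omit the details. The sketch below shows one possible reading, offered as an assumption and not as the method behind the slides or the get-another-label tool: use Bayes' rule to turn each assigned label into a posterior over true classes, then measure the expected cost of that posterior. Uniform class priors and 0/1 costs are assumed here, so the result is not expected to reproduce the 9% on the slide, which depends on priors and costs the slides do not give. All function names are hypothetical.

```python
# Rows are true classes, columns are the labels the worker assigned
# (values copied from the slide for worker ATLJIK76YH1TF; P means PG).
BIASED = {
    "G": {"G": 0.2, "P": 0.8, "R": 0.0, "X": 0.0},
    "P": {"G": 0.0, "P": 0.0, "R": 1.0, "X": 0.0},
    "R": {"G": 0.0, "P": 0.0, "R": 1.0, "X": 0.0},
    "X": {"G": 0.0, "P": 0.0, "R": 0.0, "X": 1.0},
}

# ASSUMPTION: equal class priors; the slides do not state the real ones.
UNIFORM = {c: 1 / len(BIASED) for c in BIASED}


def average_error_rate(confusion):
    """Mean over true classes of the probability of a wrong label."""
    return sum(1 - confusion[c][c] for c in confusion) / len(confusion)


def posterior(confusion, priors, assigned):
    """Bayes' rule: P(true class | worker assigned this label)."""
    joint = {c: priors[c] * confusion[c][assigned] for c in confusion}
    total = sum(joint.values())
    return {c: v / total for c, v in joint.items()} if total else None


def expected_cost_after_correction(confusion, priors):
    """Expected 0/1 cost of the Bayes-corrected ("soft") labels.

    ASSUMPTION: every wrong class costs 1 and the right one costs 0, so
    a soft label with posterior q costs 1 - sum(q_i ** 2).
    """
    cost = 0.0
    for assigned in confusion:  # square matrix: label set == class set
        q = posterior(confusion, priors, assigned)
        if q is None:  # the worker never uses this label
            continue
        p_assigned = sum(priors[c] * confusion[c][assigned] for c in confusion)
        cost += p_assigned * (1 - sum(v * v for v in q.values()))
    return cost


print(f"average error rate: {average_error_rate(BIASED):.1%}")
print(f"cost after correction: {expected_cost_after_correction(BIASED, UNIFORM):.1%}")
```

The average error rate comes out at 45.0%, matching slide 10. The corrected cost is lower because the worker's G and P labels both identify a true G site unambiguously; only the R label, which mixes true P and true R sites, leaves residual uncertainty. A worker who assigns the same label to everything would gain nothing from the correction, which is what separates a biased worker from a spammer in this reading.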