Adventures in Crowdsourcing: Research at UT Austin & Beyond
Talks at LinkedIn (August 20, 2012) and Microsoft (August 23, 2012, updated version).

Presentation Transcript

  • Adventures in Crowdsourcing: Research at UT Austin & Beyond • Matt Lease • School of Information, University of Texas at Austin • @mattlease • ml@ischool.utexas.edu
  • Outline • Foundations • Work at UT Austin • A Few Roadblocks – Workflow Design – Sensitive Data – Regulation – Fraud – Ethics
  • Amazon Mechanical Turk (MTurk) • Marketplace for crowd labor (microtasks) • Created in 2005 (still in “beta”) • On-demand, scalable, 24/7 global workforce
  • Labeling Data (“Gold Rush”)
  • Snow et al. (EMNLP 2008) • MTurk annotation for 5 tasks – Affect recognition – Word similarity – Recognizing textual entailment – Event temporal ordering – Word sense disambiguation • 22K labels for US $26 • High agreement between consensus labels and gold-standard labels
  • Sorokin & Forsyth (CVPR 2008) • MTurk for Computer Vision • 4K labels for US $60
  • Kittur, Chi, & Suh (CHI 2008) • MTurk for User Studies • “…make creating believable invalid responses as effortful as completing the task in good faith.”
  • Alonso et al. (SIGIR Forum 2008) • MTurk for Information Retrieval (IR) – Judge relevance of search engine results • Various follow-on studies (design, quality, cost)
  • Social & Behavioral Sciences • A Guide to Behavioral Experiments on Mechanical Turk – W. Mason and S. Suri (2010). SSRN online. • Crowdsourcing for Human Subjects Research – L. Schmidt (CrowdConf 2010) • Crowdsourcing Content Analysis for Behavioral Research: Insights from Mechanical Turk – Conley & Tosti-Kharas (2010). Academy of Management. • Amazon's Mechanical Turk: A New Source of Inexpensive, Yet High-Quality, Data? – M. Buhrmester et al. (2011). Perspectives… 6(1):3-5.
  • What about data quality? • Many CS papers on statistical methods – Online vs. offline, feature-based vs. content-agnostic – Worker calibration, noise vs. bias, weighted voting – Work in my lab by Jung, Kumar, Ryu, & Tang • Human factors also matter! – Instructions, design, interface, interaction – Names, relationship, reputation – Fair pay, hourly vs. per-task, recognition, advancement – For contrast with MTurk, consider Kochhar (2010) • See Lease, HCOMP'11
  • Kovashka & Lease, CrowdConf’10
  • Grady & Lease, 2010 (Search Eval.)
  • Noisy Supervised Classification • Kumar and Lease, 2011(a) • Our 1st study of aggregation (Fall ’10) • Simple idea, simulated workers • Highlights concepts & open questions
  • Problem • Crowd labels tend to be noisy • Can reduce uncertainty via wisdom of crowds – Collect & aggregate multiple labels per example • How do we maximize learning (per unit of labeling effort)? – Label a new example? – Get another label for an already-labeled example? • See: Sheng, Provost & Ipeirotis, KDD’08
  • Setup • Task: Binary classification • Learner: C4.5 decision tree • Given – An initial seed set of single-labeled examples (64) – An unlimited pool of unlabeled examples • Cost model – Fixed unit cost for labeling any example – Unlabeled examples are freely obtained • Goal: Maximize learning rate (for labeling effort)
  • Compare 3 methods: SL, MV, & NB • Single Labeling (SL): label a new example • Multi-Labeling: get another label for the pool – Majority Vote (MV): consensus by simple vote – Naïve Bayes (NB): weight vote by annotator accuracy
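To make the MV vs. NB contrast concrete, here is a minimal aggregation sketch in Python (not the paper's implementation). It assumes binary labels and known per-annotator accuracies, and uses log-odds weights as one common way to realize accuracy-weighted voting; the exact weighting in the study may differ.

```python
import math

def majority_vote(labels):
    """Majority Vote (MV): aggregate binary labels {0, 1} by simple count."""
    return int(sum(labels) >= len(labels) / 2)

def accuracy_weighted_vote(labels, accuracies):
    """NB-style aggregation: weight each vote by the log-odds of that
    annotator's accuracy, so accurate workers count more and workers
    below 50% accuracy effectively vote against the label they gave."""
    score = 0.0
    for y, p in zip(labels, accuracies):
        p = min(max(p, 1e-6), 1 - 1e-6)   # guard against log(0)
        w = math.log(p / (1 - p))         # log-odds weight
        score += w if y == 1 else -w
    return int(score >= 0)

# Example: three workers label the same item
labels     = [1, 0, 0]
accuracies = [0.95, 0.55, 0.45]
print(majority_vote(labels))                        # -> 0
print(accuracy_weighted_vote(labels, accuracies))   # -> 1
```

In this example the two methods disagree because the lone minority vote comes from the most accurate worker, which is exactly the situation where weighting pays off.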
  • Assumptions • Example selection: random – From pool for SL, from seed set for multi-labeling • Fixed commitment to a single method a priori • Balanced classes (accuracy, uniform prior) • Annotator accuracies are known to system – In practice, must estimate these: from gold data (Snow et al. ’08) or EM (Dawid & Skene ’79)
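As a sketch of the EM route for the case where accuracies are not known, the code below estimates one accuracy per worker from redundant labels. It is a deliberately simplified variant in the spirit of Dawid & Skene (1979), not their full confusion-matrix model, and all function and variable names are illustrative.

```python
import numpy as np

def em_worker_accuracy(triples, n_items, n_workers, iters=50):
    """Estimate per-worker accuracy and per-item P(true label = 1) by EM.
    `triples` is a list of (item, worker, label) with binary labels {0, 1}.
    Simplified one-accuracy-per-worker model; assumes a uniform class prior."""
    # Initialize item posteriors with the (soft) majority vote
    votes = np.zeros(n_items)
    counts = np.zeros(n_items)
    for i, w, y in triples:
        votes[i] += y
        counts[i] += 1
    post = np.where(counts > 0, votes / np.maximum(counts, 1), 0.5)

    acc = np.full(n_workers, 0.7)  # initial guess for worker accuracies
    for _ in range(iters):
        # M-step: expected fraction of each worker's labels matching the truth
        num = np.zeros(n_workers)
        den = np.zeros(n_workers)
        for i, w, y in triples:
            num[w] += post[i] if y == 1 else (1 - post[i])
            den[w] += 1
        acc = np.clip(num / np.maximum(den, 1), 1e-3, 1 - 1e-3)

        # E-step: recompute each item's posterior given the worker accuracies
        log_odds = np.zeros(n_items)
        for i, w, y in triples:
            w_odds = np.log(acc[w] / (1 - acc[w]))
            log_odds[i] += w_odds if y == 1 else -w_odds
        post = 1.0 / (1.0 + np.exp(-log_odds))
    return acc, post

# Example: 3 items labeled by 2 workers
triples = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 1), (2, 0, 1), (2, 1, 0)]
acc, post = em_worker_accuracy(triples, n_items=3, n_workers=2)
```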
  • Simulation • Each annotator – Has parameter p (prob. of producing correct label) – Generates exactly one label • Uniform distribution of accuracies U(min, max) • Generative model for simulation – Pick an example x (with true label y*) at random – Draw annotator accuracy p ~ U(min, max) – Generate label y ~ P(y | p, y*)
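A direct transcription of this generative model into Python, assuming binary labels in {0, 1} (names are illustrative):

```python
import random

def simulate_label(y_true, acc_min, acc_max, rng=random):
    """Generative model from the slide: draw a simulated annotator's
    accuracy p ~ U(min, max), then emit the true label with probability p
    and the flipped label otherwise."""
    p = rng.uniform(acc_min, acc_max)   # annotator accuracy
    correct = rng.random() < p          # does this annotator get it right?
    return y_true if correct else 1 - y_true

# Example: gather 5 simulated labels for an item whose true label is 1,
# using the "very noisy" setting U(0.4, 0.6) from the experiments.
labels = [simulate_label(1, 0.4, 0.6) for _ in range(5)]
print(labels)
```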
  • Evaluation • Data: datasets from the UCI ML Repository (http://archive.ics.uci.edu/ml/datasets.html) – Mushroom – Spambase – Tic-Tac-Toe – Chess: King-Rook vs. King-Pawn • Same trends across all 4, so we report the first 2 • Random 70/30 split of data for seed+pool / test • Repeat each run 10 times and average results
  • p ~ U(0.6, 1.0) • Fairly accurate annotators (mean = 0.8) • Little uncertainty -> little gain from multi-labeling
  • p ~ U(0.4, 0.6) • Very noisy (mean = 0.5, random coin flip) • SL and MV learning rates are flat • NB wins by weighting more accurate workers
  • p ~ U(0.1, 0.7) • Worsening accuracies further (mean = 0.4) • NB virtually unchanged • SL and MV predictions become anti-correlated – We should actually flip their predictions…
  • Label flipping • Is NB doing better due to how it uses accuracy, or simply because it’s using more information? • Average accuracy < 50% -> label usually wrong – NB implicitly captures this; SL and MV do not • Label flipping: put all methods on an even footing • Simple case of bias vs. noise – Issue is not whether correlated or anti-correlated – Issue is strength of correlation
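A minimal statement of the flipping rule, assuming binary labels and a known or estimated worker accuracy (illustrative only):

```python
def flip_if_anticorrelated(label, worker_accuracy):
    """If a worker's accuracy is below 50%, their binary label is informative
    in the opposite direction, so invert it before (weighted) voting.
    This puts SL and MV on an even footing with NB."""
    return label if worker_accuracy >= 0.5 else 1 - label
```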
  • p ~ U(0.1, 0.7): learning curves without vs. with label flipping on the Mushroom and Spambase datasets (SL, MV, and NB accuracy (%) vs. number of labels, 64–4096)
  • Summary of study • Detecting anti-correlated (bad) workers more important than the model used • Open Questions – When accuracies are estimated (noisy)? – With actual error distribution (real data)? – With different learners or tasks (e.g. ranking)? – With dynamic choice of new example or re-label? – With active learning example selection? – With imbalanced classes?
  • Snapshots
  • Noisy Learning to Rank • Kumar & Lease, 2011b
  • Semi-Supervised Repeated Labeling • Tang & Lease, CIR’11
  • Smart Crowd Filter • Ryu & Lease, ASIS&T’11 • Active Learning – Train multi-class SVM to estimate P(Y|X) – Estimate average P(Y|X) for each worker – Filter out workers below threshold • Explore/Exploit (unexpected/expected labels)
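The slide only names the steps, so the sketch below fills in one plausible reading rather than the paper's exact method: train a multi-class SVM with probability estimates, score each worker by the model's average probability for the labels that worker submitted, and drop workers whose average falls below a threshold. The threshold value, data layout, and function names are assumptions.

```python
import numpy as np
from collections import defaultdict
from sklearn.svm import SVC

def filter_workers(X_seed, y_seed, worker_labels, threshold=0.3):
    """worker_labels: list of (worker_id, feature_vector, submitted_label).
    Returns the set of workers kept and each worker's average P(label | x)."""
    model = SVC(probability=True).fit(X_seed, y_seed)   # estimate P(Y|X)
    scores = defaultdict(list)
    for worker, x, y in worker_labels:
        probs = model.predict_proba([x])[0]
        class_idx = list(model.classes_).index(y)
        scores[worker].append(probs[class_idx])         # P(worker's label | x)
    avg = {w: float(np.mean(s)) for w, s in scores.items()}
    kept = {w for w, s in avg.items() if s >= threshold}
    return kept, avg
```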
  • Z-score Weighted Filtering & Voting • Jung & Lease, HCOMP’11
  • Inferring Missing Judgments • Jung & Lease, 2012
  • Jung & Lease, HCOMP’12
  • Social Network + Crowdsourcing • Klinger & Lease, ASIS&T’11
  • Website Usability (Liu et al., 2012)
  • Designing & Optimizing Workflows
  • Workflow Management • How should we balance automation vs. human computation? Who does what? • Who’s the right person for the job? • Juggling constraints on budget, scheduling, quality, effort…
  • What about sensitive data? • Not all data can be publicly disclosed – User data (e.g. AOL query log, Netflix ratings) – Intellectual property – Legal confidentiality • Need to restrict who is in your crowd – Separate channel (workforce) from technology – Hot question for adoption at the enterprise level
  • What about regulation? • Wolfson & Lease (ASIS&T’11) • As usual, technology is ahead of the law – employment law – patent inventorship – data security and the Federal Trade Commission – copyright ownership – securities regulation of crowdfunding • Take-away: don’t panic, but be mindful – Understand risks of “just-in-time compliance”
  • What about fraud? • Some reports of robot “workers” on MTurk – Artificial Artificial Artificial Intelligence – Violates terms of service • Why not just use a captcha?
  • Captcha Fraud • Severity?
  • Requester Fraud on MTurk • “Do not do any HITs that involve: filling in CAPTCHAs; secret shopping; test our web page; test zip code; free trial; click my link; surveys or quizzes (unless the requester is listed with a smiley in the Hall of Fame/Shame); anything that involves sending a text message; or basically anything that asks for any personal information at all—even your zip code. If you feel in your gut it’s not on the level, IT’S NOT. Why? Because they are scams...”
  • Fraud via Crowds
  • Wang et al., WWW’12 • “…not only do malicious crowd-sourcing systems exist, but they are rapidly growing…”
  • Robert Sim, MSR Summit’12
  • Identifying Workers (Uniquely) • Need for identifiable workers – Repeated labeling – Recognizing “Master Workers” • Today – Platforms assign IDs intended to be unique – Problem in practice, esp. with multiple platforms – Sybil attacks • Identity value – If workers are interchangeable, identities are disposable – If workers are distinguished, identities become valuable – Reduces some types of attacks, increases others
  • What about ethics? • Fort, Adda, and Cohen (2011) – “…opportunities for our community to deliberately value ethics above cost savings.” – Suggest we focus on unpaid games; narrow solution • Silberman, Irani, and Ross (2010) – “How should we… conceptualize the role of these people who we ask to power our computing?” – Power dynamics between parties – “Abstraction hides detail”
  • Davis et al. (2010): The HPU
  • HPU: “Abstraction hides detail”
  • Digital Dirty Jobs • The Googler Who Looked at the Worst of the Internet • Policing the Web’s Lurid Precincts • Facebook content moderation • The dirty job of keeping Facebook clean • Even linguistic annotators report stress & nightmares from reading news articles!
  • What about freedom? • Vision: empowering worker freedom – work whenever you want, for whomever you want • Risk: people being compelled to perform work – As crowdsourcing grows, greater $$$ at stake – Digital sweat shops? Digital slaves? – Prisoners used for gold farming – We really don’t know much today – Traction? Human Trafficking at MSR Summit’12
  • Thank You! • Students, Past & Present – Catherine Grady (iSchool) – Hyunjoon Jung (iSchool) – Jorn Klinger (Linguistics) – Adriana Kovashka (CS) – Abhimanu Kumar (CS) – Hohyon Ryu (iSchool) – Wei Tang (CS) – Stephen Wolfson (iSchool) • Support – John P. Commons Fellowship – Temple Fellowship • ir.ischool.utexas.edu/crowd • Matt Lease - ml@ischool.utexas.edu - @mattlease