UT Dallas CS - Rise of Crowd Computing
Talk at the University of Texas at Dallas Department of Computer Science, hosted by Yang Liu (October 10, 2012)

    Presentation Transcript

    • The Rise of Crowd Computing
      Matt Lease
      School of Information, University of Texas at Austin
      ml@ischool.utexas.edu (@mattlease)
    • Crowdsourcing
      • Jeff Howe. Wired, June 2006.
      • Take a job traditionally performed by a known agent (often an employee)
      • Outsource it to an undefined, generally large group of people via an open call
      • New application of principles from the open source movement
    • Amazon Mechanical Turk (MTurk)
      • Marketplace for crowd labor (microtasks); a posting sketch follows below
      • Created in 2005 (still in “beta”)
      • On-demand, scalable, 24/7 global workforce
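For concreteness, here is a minimal sketch of posting one task (a HIT) to this marketplace with the boto3 MTurk client; the task title, reward, timing values, and question URL are illustrative assumptions, not details from the talk.

```python
# Hedged sketch: create one HIT on the MTurk requester sandbox with boto3.
# All task-specific values (title, reward, URL, timings) are made up.
import boto3

mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# An ExternalQuestion points workers at a task page hosted by the requester.
question_xml = """
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.org/relevance-task</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>"""

hit = mturk.create_hit(
    Title="Judge the relevance of a search result",   # assumed task
    Description="Rate how well a web page answers a query.",
    Reward="0.05",                      # USD per assignment
    MaxAssignments=5,                   # redundant judgments per item
    LifetimeInSeconds=3 * 24 * 3600,    # HIT stays available for 3 days
    AssignmentDurationInSeconds=600,    # 10 minutes per assignment
    Question=question_xml,
)
print("Created HIT:", hit["HIT"]["HITId"])
```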
    • The Gold Rush: Data Labeling
    • Snow et al. (EMNLP 2008)
      • MTurk annotation for 5 tasks
        – Affect recognition
        – Word similarity
        – Recognizing textual entailment
        – Event temporal ordering
        – Word sense disambiguation
      • 22K labels for US $26
      • High agreement between consensus labels and gold-standard labels (see the sketch below)
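A minimal sketch of the consensus-labeling idea behind that comparison: aggregate redundant worker labels per item by majority vote, then measure agreement against gold-standard labels. The items, labels, and workers below are made up for illustration.

```python
# Toy sketch: majority-vote consensus labels and their agreement with gold labels.
from collections import Counter

# Redundant labels collected from different workers for each item (made up).
worker_labels = {
    "item1": ["pos", "pos", "neg"],
    "item2": ["neg", "neg", "neg"],
    "item3": ["pos", "neg", "pos"],
}
gold = {"item1": "pos", "item2": "neg", "item3": "neg"}  # expert labels (made up)

def majority_vote(labels):
    """Most common label wins; ties are broken arbitrarily."""
    return Counter(labels).most_common(1)[0][0]

consensus = {item: majority_vote(labels) for item, labels in worker_labels.items()}
agreement = sum(consensus[i] == gold[i] for i in gold) / len(gold)
print(f"Consensus/gold agreement: {agreement:.2f}")  # 0.67 on this toy data
```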
    • Alonso et al. (SIGIR Forum 2008)
      • MTurk for Information Retrieval (IR)
        – Judge relevance of search engine results
      • Many follow-on studies (design, quality, cost)
    • Sorokin & Forsythe (CVPR 2008)• MTurk for Computer Vision• 4K labels for US $60 7
    • Studying People & Interactive Systems
    • Kittur, Chi, & Suh (CHI 2008)
      • MTurk for User Studies
      • “…make creating believable invalid responses as effortful as completing the task in good faith.”
    • Social & Behavioral Sciences• A Guide to Behavioral Experiments on Mechanical Turk – W. Mason and S. Suri (2010). SSRN online.• Crowdsourcing for Human Subjects Research – L. Schmidt (CrowdConf 2010)• Crowdsourcing Content Analysis for Behavioral Research: Insights from Mechanical Turk – Conley & Tosti-Kharas (2010). Academy of Management• Amazons Mechanical Turk : A New Source of Inexpensive, Yet High-Quality, Data? – M. Buhrmester et al. (2011). Perspectives… 6(1):3-5. – see also: Amazon Mechanical Turk Guide for Social Scientists 10
    • Remote Usability Testing
      • Liu, Bias, Lease, & Kuipers (ASIS&T’12)
      • On-site vs. crowdsourced usability testing
      • Advantages
        – More Participants
        – More Diverse Participants
        – High Speed
        – Low Cost
      • Disadvantages
        – Lower Quality Feedback
        – Less Interaction
        – Greater Need for Quality Control
        – Less Focused User Groups
    • Beyond MTurk
    • ESP Game (Games With a Purpose)
      • von Ahn & Dabbish (2004)
    • reCAPTCHA
      • von Ahn et al. (2008). In Science.
    • Crowd Sensing & Monitoring
      • Sullivan et al. (2009). Biological Conservation 142(10)
      • Keynote by Steve Kelling (ASIS&T 2011)
    • Human Computation
    • What was old is new
      • Crowdsourcing: A New Branch of Computer Science
        – D.A. Grier, March 29, 2011
      • Tabulating the heavens: computing the Nautical Almanac in 18th-century England
        – M. Croarken (2003)
      • When Computers Were Human
        – D.A. Grier, Princeton University Press, 2005
    • The Human Processing Unit (HPU)
      • Davis et al. (2010)
    • Blending Automation & Human Computation
    • Ethics Checking: The Next Frontier?
      • Mark Johnson’s address at ACL 2003
        – Transcript in Conduit 12(2), 2003
      • Think how useful a little “ethics checker and corrector” program integrated into a word processor could be!
    • Soylent: A Word Processor with a Crowd Inside
      • Bernstein et al., UIST 2010
    • Translation by monolingual speakers
      • C. Hu, CHI 2009
    • fold.it
      • S. Cooper et al. (2010)
      • Alice G. Walton. Online Gamers Help Solve Mystery of Critical AIDS Virus Enzyme. The Atlantic, October 8, 2011.
    • Quality Assurance
      • Many CS papers on statistical methods
        – Online vs. offline, feature-based vs. content-agnostic
        – Worker calibration, noise vs. bias, weighted voting (see the sketch below)
        – Work in my lab by Jung, Kumar, Ryu, & Tang
      • Human factors matter
        – Instructions, design, interface, interaction
        – Names, relationship, reputation (Klinger & Lease ’11)
        – Fair pay, hourly vs. per-task, recognition, advancement
        – For contrast with MTurk, consider Kochhar (2010)
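As one concrete example of the content-agnostic statistical methods listed above, here is a minimal weighted-voting sketch in which each worker's vote counts in proportion to an accuracy estimate from gold calibration items. The workers, weights, and items are assumptions for illustration and do not reproduce any specific paper's method.

```python
# Hedged sketch of weighted voting: weight each worker's label by that worker's
# estimated accuracy on gold calibration questions. All data here is made up.
from collections import defaultdict

worker_accuracy = {"w1": 0.95, "w2": 0.60, "w3": 0.80}  # estimated from gold items

# votes[item] = list of (worker_id, label) pairs collected from the crowd
votes = {
    "doc42": [("w1", "relevant"), ("w2", "not_relevant"), ("w3", "relevant")],
    "doc43": [("w1", "not_relevant"), ("w2", "relevant"), ("w3", "relevant")],
}

def weighted_vote(pairs):
    """Return the label whose supporters carry the largest total accuracy weight."""
    totals = defaultdict(float)
    for worker, label in pairs:
        totals[label] += worker_accuracy.get(worker, 0.5)  # unseen workers get 0.5
    return max(totals, key=totals.get)

for item, pairs in votes.items():
    print(item, "->", weighted_vote(pairs))
```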
    • Grady & Lease, 2010 (Search Eval.)August 23, 2012 Matt Lease - ml@ischool.utexas.edu 27/10
    • Social Network + Crowdsourcing
      • Klinger & Lease, 2011
    • Semi-Supervised Repeated Labeling
      • Tang & Lease, 2011
    • Noisy Learning to Rank
      • Kumar & Lease, 2011b
    • Active Learning
      • Ryu & Lease, ASIS&T’11
      • Settles’ “noisy oracles”
        – Train multi-class SVM to estimate P(Y|X)
        – Estimate average P(Y|X) for each worker
        – Filter out workers below threshold (see the sketch below)
      • Explore/Exploit (unexpected/expected labels)
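A minimal sketch of the worker-filtering recipe outlined above, using scikit-learn's SVC with probability estimates: score each worker by the model's average confidence in the labels that worker gave, then drop workers below a threshold. The toy features, worker labels, and 0.4 cutoff are illustrative assumptions, not the actual setup from Ryu & Lease.

```python
# Hedged sketch of noisy-oracle filtering: average P(worker's label | item) under
# a probabilistic classifier, then keep only workers above a threshold.
import numpy as np
from sklearn.svm import SVC

# Toy items (2-d features) with consensus labels used to fit the model.
X = np.array([
    [0.1, 0.9], [0.2, 0.8], [0.15, 0.85], [0.3, 0.7], [0.25, 0.75],  # class 1
    [0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.7, 0.3], [0.75, 0.25],  # class 0
])
y = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

# (worker_id, item_index, label_given) tuples from the crowd (made up).
worker_labels = [
    ("w1", 0, 1), ("w1", 5, 0), ("w1", 3, 1),
    ("w2", 1, 0), ("w2", 6, 1), ("w2", 8, 1),  # w2 often contradicts the model
]

model = SVC(probability=True, random_state=0).fit(X, y)  # estimates P(Y|X)
proba = model.predict_proba(X)                           # rows: items, cols: classes
col = {c: i for i, c in enumerate(model.classes_)}

scores = {}
for worker in sorted({w for w, _, _ in worker_labels}):
    confs = [proba[i, col[lab]] for w, i, lab in worker_labels if w == worker]
    scores[worker] = float(np.mean(confs))  # average confidence in worker's labels

THRESHOLD = 0.4  # assumed cutoff
kept = [w for w, s in scores.items() if s >= THRESHOLD]
print("worker scores:", scores, "| kept:", kept)
```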
    • Inferring Missing Judgments
      • Jung & Lease, 2012
    • What about benchmarks?
      • How well do alternative methods perform?
        – Common datasets & tasks enable comparison
        – Contests drive innovation & measure collective progress
      • Common tasks today
        – Translation
        – Transcription
        – Search Evaluation
        – Verification & Correction
        – Content Generation
      • NIST TREC Crowdsourcing Track (2012 is Year 2)
    • What about workflow design?
    • What about sensitive data?
      • Not all data can be publicly disclosed
        – User data (e.g. AOL query log, Netflix ratings)
        – Intellectual property
        – Legal confidentiality
      • Need to restrict who is in your crowd
        – Separate channel (workforce) from technology
        – Hot question for adoption at the enterprise level
    • What about fraud?
      • Some reports of robot “workers” on MTurk
        – Artificial Artificial Artificial Intelligence
        – Violates terms of service
      • Why not just use a captcha?
    • Fraud wears many faces
      “Do not do any HITs that involve: filling in CAPTCHAs; secret shopping; test our web page; test zip code; free trial; click my link; surveys or quizzes (unless the requester is listed with a smiley in the Hall of Fame/Shame); anything that involves sending a text message; or basically anything that asks for any personal information at all—even your zip code. If you feel in your gut it’s not on the level, IT’S NOT. Why? Because they are scams...”
    • Fraud via Crowds
    • Wang et al., WWW’12
      • “…not only do malicious crowd-sourcing systems exist, but they are rapidly growing…”
    • Robert Sim, MSR Summit’12
    • Broader Issues
    • What about regulation?
      • Wolfson & Lease (ASIS&T’11)
      • As usual, technology is ahead of the law
        – employment law
        – patent inventorship
        – data security and the Federal Trade Commission
        – copyright ownership
        – securities regulation of crowdfunding
      • Take-away: don’t panic, but be mindful
        – Understand risks of “just-in-time compliance”
    • What about ethics?
      • Silberman, Irani, and Ross (2010)
        – “How should we… conceptualize the role of these people who we ask to power our computing?”
        – Power dynamics between parties
        – “Abstraction hides detail”
      • Fort, Adda, and Cohen (2011)
        – “…opportunities for our community to deliberately value ethics above cost savings.”
    • Davis et al. (2010). The HPU.
    • Who are the workers?
      • A. Baio, November 2008. The Faces of Mechanical Turk.
      • P. Ipeirotis, March 2010. The New Demographics of Mechanical Turk.
      • J. Ross et al. Who are the Crowdworkers? CHI 2010.
    • HPU: “Abstraction hides detail”
    • How much to pay?
      Performance, psychology, economics, and ethics
      • Pay vs. performance tradeoff, incentive design
      • Primary or supplemental income?
      • Effect on local economies?
      • Ethics of paying something (if low) vs. paying nothing (e.g., games)
    • Digital Dirty Jobs
      • The Googler who Looked at the Worst of the Internet
      • Policing the Web’s Lurid Precincts
      • Facebook content moderation
      • The dirty job of keeping Facebook clean
      • Even linguistic annotators report stress & nightmares from reading news articles!
    • What about freedom?
      • Vision: empowering worker freedom
        – Work whenever you want, for whomever you want
      • Risk: people being compelled to perform work
        – Digital sweat shops? Digital slaves?
        – Prisoners used for gold farming
        – We really don’t know (and need to learn more…)
        – Traction? Human Trafficking at MSR Summit’12
    • Conclusion
      • Crowdsourcing is quickly transforming practice in industry and academia via greater efficiency
      • Crowd computing is creating a new breed of applications, augmenting state-of-the-art automation (AI) with human computation to offer new capabilities and user experiences
      • By placing people at the center of this new computing model, we must confront important considerations beyond the technological
    • Thank You!
      Students: Past & Present
        – Catherine Grady (iSchool)
        – Hyunjoon Jung (iSchool)
        – Jorn Klinger (Linguistics)
        – Adriana Kovashka (CS)
        – Abhimanu Kumar (CS)
        – Hohyon Ryu (iSchool)
        – Wei Tang (CS)
        – Stephen Wolfson (iSchool)
      Support
        – John P. Commons Fellowship
        – Temple Fellowship
      ir.ischool.utexas.edu/crowd
      Matt Lease - ml@ischool.utexas.edu - @mattlease
    • REFERENCES & RESOURCES
    • 2012 Conferences & Workshops
      • AAAI: Human Computation (HComp) (July 22-23)
      • AAAI Spring Symposium: Wisdom of the Crowd (March 26-28)
      • ACL: 3rd Workshop on the People’s Web meets NLP (July 12-13)
      • AMCIS: Crowdsourcing Innovation, Knowledge, and Creativity in Virtual Communities (August 9-12)
      • CHI: CrowdCamp (May 5-6)
      • CIKM: Multimodal Crowd Sensing (CrowdSens) (Oct. or Nov.)
      • Collective Intelligence (April 18-20)
      • CrowdConf 2012 (October 23)
      • CrowdNet: 2nd Workshop on Cloud Labor and Human Computation (Jan 26-27)
      • EC: Social Computing and User Generated Content Workshop (June 7)
      • ICDIM: Emerging Problem-specific Crowdsourcing Technologies (August 23)
      • ICEC: Harnessing Collective Intelligence with Games (September)
      • ICML: Machine Learning in Human Computation & Crowdsourcing (June 30)
      • ICWE: 1st International Workshop on Crowdsourced Web Engineering (CroWE) (July 27)
      • KDD: Workshop on Crowdsourcing and Data Mining (August 12)
      • Multimedia: Crowdsourcing for Multimedia (Nov 2)
      • SocialCom: Social Media for Human Computation (September 6)
      • TREC-Crowd: 2nd TREC Crowdsourcing Track (Nov. 14-16)
      • WWW: CrowdSearch: Crowdsourcing Web Search (April 17)
    • Surveys
      • Ipeirotis, Panagiotis G., R. Chandrasekar, and P. Bennett (2009). “A report on the human computation workshop (HComp).” ACM SIGKDD Explorations Newsletter 11(2).
      • Alex Quinn and Ben Bederson. Human Computation: A Survey and Taxonomy of a Growing Field. In Proceedings of CHI 2011.
      • Law and von Ahn (2011). Human Computation.
    • 2013 Events Planned
      Research events
      • 1st year of HComp as AAAI conference
      • 2nd annual Collective Intelligence?
      Industrial events
      • 4th CrowdConf (San Francisco, Fall)
      • 1st Crowdsourcing Week (Singapore, April)
    • Journal Special Issues 2012
      – Springer’s Information Retrieval (articles now online): Crowdsourcing for Information Retrieval
      – IEEE Internet Computing (articles now online): Crowdsourcing (Sept./Oct. 2012)
      – Hindawi’s Advances in Multimedia Journal: Multimedia Semantics Analysis via Crowdsourcing Geocontext
    • 2011 Tutorials and Keynotes
      • By Omar Alonso and/or Matthew Lease
        – CLEF: Crowdsourcing for Information Retrieval Experimentation and Evaluation (Sep. 20, Omar only)
        – CrowdConf (Nov. 1, this is it!)
        – IJCNLP: Crowd Computing: Opportunities and Challenges (Nov. 10, Matt only)
        – WSDM: Crowdsourcing 101: Putting the WSDM of Crowds to Work for You (Feb. 9)
        – SIGIR: Crowdsourcing for Information Retrieval: Principles, Methods, and Applications (July 24)
      • AAAI: Human Computation: Core Research Questions and State of the Art
        – Edith Law and Luis von Ahn, August 7
      • ASIS&T: How to Identify Ducks In Flight: A Crowdsourcing Approach to Biodiversity Research and Conservation
        – Steve Kelling, October 10, eBird
      • EC: Conducting Behavioral Research Using Amazon’s Mechanical Turk
        – Winter Mason and Siddharth Suri, June 5
      • HCIC: Quality Crowdsourcing for Human Computer Interaction Research
        – Ed Chi, June 14-18 (about HCIC)
        – Also see his: Crowdsourcing for HCI Research with Amazon Mechanical Turk
      • Multimedia: Frontiers in Multimedia Search
        – Alan Hanjalic and Martha Larson, Nov 28
      • VLDB: Crowdsourcing Applications and Platforms
        – Anhai Doan, Michael Franklin, Donald Kossmann, and Tim Kraska
      • WWW: Managing Crowdsourced Human Computation
        – Panos Ipeirotis and Praveen Paritosh
    • 2011 Workshops & Conferences
      • AAAI-HCOMP: 3rd Human Computation Workshop (Aug. 8)
      • ACIS: Crowdsourcing, Value Co-Creation, & Digital Economy Innovation (Nov. 30 – Dec. 2)
      • Crowdsourcing Technologies for Language and Cognition Studies (July 27)
      • CHI-CHC: Crowdsourcing and Human Computation (May 8)
      • CIKM: BooksOnline (Oct. 24, “crowdsourcing … online books”)
      • CrowdConf 2011: 2nd Conf. on the Future of Distributed Work (Nov. 1-2)
      • Crowdsourcing: Improving … Scientific Data Through Social Networking (June 13)
      • EC: Workshop on Social Computing and User Generated Content (June 5)
      • ICWE: 2nd International Workshop on Enterprise Crowdsourcing (June 20)
      • Interspeech: Crowdsourcing for Speech Processing (August)
      • NIPS: Second Workshop on Computational Social Science and the Wisdom of Crowds (Dec., TBD)
      • SIGIR-CIR: Workshop on Crowdsourcing for Information Retrieval (July 28)
      • TREC-Crowd: Year 1 of TREC Crowdsourcing Track (Nov. 16-18)
      • UbiComp: 2nd Workshop on Ubiquitous Crowdsourcing (Sep. 18)
      • WSDM-CSDM: Crowdsourcing for Search and Data Mining (Feb. 9)
    • More Books
      July 2010, Kindle-only: “This book introduces you to the top crowdsourcing sites and outlines step by step with photos the exact process to get started as a requester on Amazon Mechanical Turk.”
    • Bibliography J. Barr and L. Cabrera. “AI gets a Brain”, ACM Queue, May 2006. Bernstein, M. et al. Soylent: A Word Processor with a Crowd Inside. UIST 2010. Best Student Paper award. Bederson, B.B., Hu, C., & Resnik, P. Translation by Interactive Collaboration between Monolingual Users, Proceedings of Graphics Interface (GI 2010), 39-46. N. Bradburn, S. Sudman, and B. Wansink. Asking Questions: The Definitive Guide to Questionnaire Design, Jossey-Bass, 2004. C. Callison-Burch. “Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon’s Mechanical Turk”, EMNLP 2009. P. Dai, Mausam, and D. Weld. “Decision-Theoretic of Crowd-Sourced Workflows”, AAAI, 2010. J. Davis et al. “The HPU”, IEEE Computer Vision and Pattern Recognition Workshop on Advancing Computer Vision with Human in the Loop (ACVHL), June 2010. M. Gashler, C. Giraud-Carrier, T. Martinez. Decision Tree Ensemble: Small Heterogeneous Is Better Than Large Homogeneous, ICMLA 2008. D. A. Grier. When Computers Were Human. Princeton University Press, 2005. ISBN 0691091579 JS. Hacker and L. von Ahn. “Matchin: Eliciting User Preferences with an Online Game”, CHI 2009. J. Heer, M. Bobstock. “Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design”, CHI 2010. P. Heymann and H. Garcia-Molina. “Human Processing”, Technical Report, Stanford Info Lab, 2010. J. Howe. “Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business”. Crown Business, New York, 2008. P. Hsueh, P. Melville, V. Sindhwami. “Data Quality from Crowdsourcing: A Study of Annotation Selection Criteria”. NAACL HLT Workshop on Active Learning and NLP, 2009. B. Huberman, D. Romero, and F. Wu. “Crowdsourcing, attention and productivity”. Journal of Information Science, 2009. P.G. Ipeirotis. The New Demographics of Mechanical Turk. March 9, 2010. PDF and Spreadsheet. P.G. Ipeirotis, R. Chandrasekar and P. Bennett. Report on the human computation workshop. SIGKDD Explorations v11 no 2 pp. 80-83, 2010. P.G. Ipeirotis. Analyzing the Amazon Mechanical Turk Marketplace. CeDER-10-04 (Sept. 11, 2010) 60
    • Bibliography (2)
      • A. Kittur, E. Chi, and B. Suh. “Crowdsourcing user studies with Mechanical Turk”, SIGCHI 2008.
      • A. Kittur, B. Smus, and R.E. Kraut. CrowdForge: Crowdsourcing Complex Work. CHI 2011.
      • A. Kovashka and M. Lease. “Human and Machine Detection of … Similarity in Art”. CrowdConf 2010.
      • K. Krippendorff. Content Analysis. Sage Publications, 2003.
      • G. Little, L. Chilton, M. Goldman, and R. Miller. “TurKit: Tools for Iterative Tasks on Mechanical Turk”, HCOMP 2009.
      • T. Malone, R. Laubacher, and C. Dellarocas. Harnessing Crowds: Mapping the Genome of Collective Intelligence. 2009.
      • W. Mason and D. Watts. “Financial Incentives and the ‘Performance of Crowds’”, HCOMP Workshop at KDD 2009.
      • J. Nielsen. Usability Engineering. Morgan Kaufmann, 1994.
      • A. Quinn and B. Bederson. “A Taxonomy of Distributed Human Computation”, Technical Report HCIL-2009-23, 2009.
      • J. Ross, L. Irani, M. Six Silberman, A. Zaldivar, and B. Tomlinson. “Who are the Crowdworkers?: Shifting Demographics in Amazon Mechanical Turk”. CHI 2010.
      • F. Scheuren. “What is a Survey” (http://www.whatisasurvey.info), 2004.
      • R. Snow, B. O’Connor, D. Jurafsky, and A.Y. Ng. “Cheap and Fast But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks”. EMNLP 2008.
      • V. Sheng, F. Provost, and P. Ipeirotis. “Get Another Label? Improving Data Quality … Using Multiple, Noisy Labelers”. KDD 2008.
      • S. Weber. The Success of Open Source. Harvard University Press, 2004.
      • L. von Ahn. Games with a purpose. Computer, 39(6):92-94, 2006.
      • L. von Ahn and L. Dabbish. “Designing Games with a Purpose”. CACM 51(8), 2008.
    • Bibliography (3)
      • Shuo Chen et al. What if the Irresponsible Teachers Are Dominating? A Method of Training on Samples and Clustering on Teachers. AAAI 2010.
      • Paul Heymann and Hector Garcia-Molina. Turkalytics: Analytics for Human Computation. WWW 2011.
      • Florian Laws, Christian Scheible, and Hinrich Schütze. Active Learning with Amazon Mechanical Turk. EMNLP 2011.
      • C.Y. Lin. ROUGE: A Package for Automatic Evaluation of Summaries. Proceedings of the Workshop on Text Summarization Branches Out (WAS), 2004.
      • C. Marshall and F. Shipman. “The Ownership and Reuse of Visual Media”, JCDL 2011.
      • Hohyon Ryu and Matthew Lease. Crowdworker Filtering with Support Vector Machine. ASIS&T 2011.
      • Wei Tang and Matthew Lease. Semi-Supervised Consensus Labeling for Crowdsourcing. ACM SIGIR Workshop on Crowdsourcing for Information Retrieval (CIR), 2011.
      • S. Vijayanarasimhan and K. Grauman. Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds. CVPR 2011.
      • Stephen Wolfson and Matthew Lease. Look Before You Leap: Legal Pitfalls of Crowdsourcing. ASIS&T 2011.