Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

The keynote talk at CrowdKDD 2012 http://www.cse.ust.hk/~nliu/crowdkdd12/

  1. Crowdsourcing beyond … Building Crowdmining Services for Your Own Research. Kuan-Ta Chen, Institute of Information Science, Academia Sinica. CrowdKDD’12, Aug 12, 2012
  2. What I’m going to talk about: Crowdsourcing? Crowdsourcing + Data Mining Research? Common Fallacies of CS4DM (Crowdsourcing for Data Mining) Research; Pomics: A Crowdmining Service; Conclusion
  3. Crowdsourcing = Crowd + Outsourcing: “soliciting solutions via open calls to large-scale communities”
     CrowdKDD’12: Crowdsourcing beyond Mechanical Turk / Kuan-Ta Chen 3
  4. A more formal definition: “Crowdsourcing is the act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call.” [1]
     [1] Howe, Jeff. Crowdsourcing: A Definition, http://crowdsourcing.typepad.com/
  5. What Can Crowdsourcing Do?
  6. Brand Tagging
  7. Data Entry (reward: 4.4 USD/hour)
  8. General Questions (reward: points on Yahoo! Answers)
  9. When crowdsourcing meets data mining… (Crowdsourcing / Data mining: what’s in here?)
  10. Crowdsourcing for Data Mining: Issues
      Purposes: Annotation (ground-truth generation); Evaluation; Retrieval; Human-in-the-loop computation
      Methodologies: Recruiting; Incentives; Task Design; Workflow; Learning from crowd; Quality control; Cheat detection
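Quality control appears among the methodologies above. One common baseline (not necessarily what the talk recommends) is to collect redundant labels per item, keep the majority answer, and use the agreement rate to flag doubtful items. A minimal Python sketch, assuming labels arrive in a hypothetical `{item: [label, ...]}` dictionary:

```python
from collections import Counter


def aggregate_labels(labels_by_item):
    """Majority-vote aggregation of redundant crowd labels.

    Returns {item: (winning_label, agreement)}, where agreement is the share
    of workers who chose the winning label. Counter.most_common breaks ties
    in favour of the label encountered first.
    """
    result = {}
    for item, labels in labels_by_item.items():
        if not labels:
            continue  # skip items that nobody labeled
        label, count = Counter(labels).most_common(1)[0]
        result[item] = (label, count / len(labels))
    return result


# Hypothetical usage: three workers judged the orientation of two photos.
votes = {
    "photo_1": ["upright", "upright", "rotated"],
    "photo_2": ["rotated", "rotated", "rotated"],
}
print(aggregate_labels(votes))
```

Items with low agreement are natural candidates for extra assignments or expert review.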
  11. Crowdsourcing Uses in Data Mining Research
  12. Image Semantics (reward: 0.04 USD/task): main theme? key objects? unique attributes?
  13. Find photos of revolvers! (0.02 USD/task)
  14. Human Skeleton (0.01 USD/task)
  15. Photo Orientation (0.01 USD/task)
  16. Perspectives for 3D Objects: Thi Phuong Nghiem, Axel Carlier, Geraldine Morin, and Vincent Charvillat, “Enhancing online 3D products through crowdsourcing,” ACM CrowdMM’12.
  17. Web Site Classifier (12 USD/hour): Panos Ipeirotis, “Crowdsourcing using Mechanical Turk: Quality Management and Scalability,” Invited Talk at CSDM 2011.
  18. Photographers’ Intentions: to support a task? to capture a bad feeling? to preserve a good feeling? to recall later on? to publish it online? to show it to friends and family? Mathias Lux, Mario Taschwer, and Oge Marques, “A Closer Look at Photographers’ Intentions: a Test Dataset,” ACM CrowdMM’12.
  19. Linguistic Affective Judgement: affective response (Snow et al. 2008) to headlines such as “Closing and cancellations top advice on flu outbreak”; USD 0.40 to label 20 headlines (140 labels)
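The slide's figures allow a quick cost check: 140 labels for 20 headlines is 7 labels per headline, and USD 0.40 / 140 is about USD 0.0029 per label. The sketch below redoes that arithmetic exactly with `fractions.Fraction`; `estimate_budget` is a hypothetical helper that assumes the same per-label rate holds at a larger scale, which a real task may not.

```python
from fractions import Fraction

# Figures quoted on the slide (Snow et al. 2008).
TOTAL_USD = Fraction("0.40")
HEADLINES = 20
LABELS = 140

labels_per_headline = Fraction(LABELS, HEADLINES)  # 7
usd_per_label = TOTAL_USD / LABELS                 # 1/350 USD, about 0.0029
usd_per_headline = TOTAL_USD / HEADLINES           # 1/50 USD, i.e. 0.02


def estimate_budget(n_headlines: int) -> Fraction:
    """Hypothetical: cost of labeling n_headlines at the same per-label rate."""
    return n_headlines * labels_per_headline * usd_per_label


print(float(estimate_budget(1000)))  # 1,000 headlines at the slide's rate
```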
  20. A Lot More Examples: document relevance evaluation (Alonso et al. 2008); document rating collection (Kittur et al. 2008); noun compound paraphrasing (Nakov 2008); person name resolution (Su et al. 2007); and so on...
  21. THE COMMON FALLACIES: EXPERIENCES FROM CROWDMM’12. Thanks to the CrowdMM’12 co-organizers Wei-Tsang Ooi, Martha Larson, and Wei-Ta Chu; also thanks to the “Crowdsourcing for Multimedia” special issue (SI) co-guest-editors Paul Bennent and Matt Lease.
  22. Common Fallacy #1: Crowdsourcing is NOT JUST conducting user studies. The crowd is uncontrollable, and tasks are performed in uncontrolled conditions. How do we manage the crowd?
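One standard answer to "how do we manage the crowd?" is to mix gold questions (items with known answers) into the task and trust only workers who answer them reliably. A minimal sketch; the data layout, the 0.8 accuracy threshold, and the minimum of 3 graded gold answers are illustrative assumptions, not values from the talk:

```python
def screen_workers(answers, gold, min_accuracy=0.8, min_gold_seen=3):
    """Return the ids of workers whose accuracy on gold questions is high enough.

    answers: {worker_id: {question_id: answer}}  (hypothetical layout)
    gold:    {question_id: correct_answer}
    Workers who answered fewer than min_gold_seen gold questions are not
    trusted yet, because their accuracy estimate rests on too few items.
    """
    trusted = set()
    for worker, given in answers.items():
        graded = [given[q] == gold[q] for q in given if q in gold]
        if len(graded) >= min_gold_seen and sum(graded) / len(graded) >= min_accuracy:
            trusted.add(worker)
    return trusted


gold = {"g1": "yes", "g2": "no", "g3": "yes"}
answers = {
    "w1": {"g1": "yes", "g2": "no", "g3": "yes", "q9": "no"},
    "w2": {"g1": "no", "g2": "yes", "g3": "yes"},
}
print(screen_workers(answers, gold))
```

Labels from untrusted workers can then be dropped before aggregation, or down-weighted instead of discarded.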
  23. Common Fallacy #2: Crowdsourcing is NOT JUST analyzing user-generated content. Cope with the noise in UGC, not only the information. How do we manage the imperfection and diversity of UGC?
  24. Common Fallacy #2 (cont.): Crowdsourcing is NOT JUST analyzing user-generated content. Put the task element in the loop: re-purpose the creation of UGC as your own microtasks.
  25. Common Fallacy #3: Crowdsourcing is NOT JUST posting tasks on Mechanical Turk. Explicit crowdsourcing; implicit crowdsourcing; piggyback crowdsourcing (Doan et al., “Crowdsourcing systems on the World-Wide Web,” CACM, vol. 54, no. 4, 2011).
  26. An implicit crowdmining platform for multimedia content
  27. Crowdsourcing for Data Mining: Issues
      Purposes: Annotation (ground-truth generation); Evaluation; Retrieval; Human-in-the-loop computation
      Methodologies: Recruiting; Incentives; Task Design; Workflow; Learning from crowd; Quality control; Cheat detection
  28. The Era of Too Many Photos: people today use pictures to record their daily experiences (with the prevalence of digital cameras)
  29. How to Share Photos?
  30. 3 Common Ways: photo browsing; photo/video slideshow; illustrated text
  31. Photo Browsing
  32. Photo/Video Slideshow
  33. Illustrated Text
  34. A MISSING PIECE
  35. Comics
  36. Photo Comics: Baby Born
  37. Photo Comics: Birthday Party
  38. Photo Comics: Daily Fun
  39. Media Comparison
      Format            Creation cost  Viewer req.  Viewer control  Portability  Richness
      Photo browsing    Low            Low          High            Low          Low
      Slideshow         Medium         Low          Low             Medium       Low
      Illustrated text  High           High         High            High         High
      Comic             High           Low          High            High         High
      How to lower it? (the creation cost of comics)
  40. Comic Making: The Cartoonist’s Way
  41. http://www.pomics.net
  42. Goal of Pomics
  43. Pomics = Picture to Comics
  44. Computer-Aided Storytelling (diagram): Picture → automated analysis (location and timing analysis, aesthetics analysis, semantics analysis) → Auto Draft → User Storytelling / Story Editing → Final Story; Machine Learning (from own rating, popularity, and user preference adjustment)
  45. Technical Challenges #1: Semantics analysis (human recognition, emotion recognition, behavior recognition, object recognition, location identification, natural language processing); Aesthetics analysis (exposure, composition); Timing analysis; Contextual analysis
  46. Technical Challenges #2: Automatic storytelling: significant photo selection; pagination and page layout; narrative design
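Significant photo selection and pagination can be illustrated with a deliberately simple heuristic. This is a hypothetical sketch, not Pomics' actual algorithm: it assumes each photo already carries an aesthetics score from an earlier analysis step, keeps the top `k` photos, restores their chronological order so the story still reads in sequence, and splits them into pages with a fixed number of frames.

```python
from dataclasses import dataclass


@dataclass
class Photo:
    name: str
    taken_at: float   # capture time, e.g. a UNIX timestamp
    aesthetic: float  # hypothetical score from an aesthetics-analysis step


def select_significant(photos, k):
    """Keep the k highest-scoring photos, then restore chronological order."""
    best = sorted(photos, key=lambda p: p.aesthetic, reverse=True)[:k]
    return sorted(best, key=lambda p: p.taken_at)


def paginate(photos, frames_per_page):
    """Split photos into consecutive pages of at most frames_per_page frames."""
    if frames_per_page <= 0:
        raise ValueError("frames_per_page must be positive")
    return [photos[i:i + frames_per_page] for i in range(0, len(photos), frames_per_page)]


album = [Photo("a", 1, 0.2), Photo("b", 2, 0.9), Photo("c", 3, 0.5),
         Photo("d", 4, 0.7), Photo("e", 5, 0.1)]
for page in paginate(select_significant(album, 3), 2):
    print([p.name for p in page])
```

A real layout step would also vary frame counts per page and weigh narrative cues; fixed-size pages only keep the sketch short.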
  47. Pomics as a Social Service: web albums; publish & share; web resources
  48. Live Demo
  49. HOW IS IT RELATED TO CROWDSOURCING?
  50. USERS ARE IMPLICITLY DOING IMAGE ANNOTATION AND EVALUATION
  51. What pictures are used? Aesthetics information: why were these 3 pictures used?
  52. Wizard Interface: aesthetics information
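If picking some photos from an album and skipping others is read as an implicit aesthetics judgement (an assumption about how such logs could be used, not a description of Pomics' internals), a per-photo score can be the smoothed rate at which a photo is chosen when it is offered. The session log format and the Beta(1, 1) smoothing are hypothetical choices:

```python
from collections import Counter


def selection_scores(sessions, alpha=1.0, beta=1.0):
    """Smoothed rate at which each photo is picked when it is offered.

    sessions: iterable of (offered_photo_ids, selected_photo_ids) pairs,
    one per comic-making session (hypothetical log format).
    alpha/beta act as a Beta prior that pulls the scores of rarely
    offered photos toward alpha / (alpha + beta).
    """
    shown, picked = Counter(), Counter()
    for offered, selected in sessions:
        offered = set(offered)
        shown.update(offered)
        picked.update(set(selected) & offered)  # ignore picks that were never offered
    return {p: (picked[p] + alpha) / (shown[p] + alpha + beta) for p in shown}


sessions = [({"p1", "p2", "p3"}, {"p1"}),
            ({"p1", "p2"}, {"p1", "p2"})]
print(selection_scores(sessions))
```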
  53. The Page Layout: saliency information; semantics
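Similarly, if the region a user keeps when fitting a photo into a comic frame is taken as a saliency cue (again an assumption, not the talk's stated method), overlaying many users' crop rectangles yields a coarse saliency map. A pure-Python sketch over a small grid; the rectangle format is hypothetical:

```python
def crop_saliency(width, height, crops):
    """Fraction of user crops covering each cell of a width x height grid.

    crops: list of (x0, y0, x1, y1) half-open rectangles in grid coordinates
    (hypothetical: one rectangle per time the photo was framed in a comic).
    Rectangles are clipped to the grid.
    """
    votes = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in crops:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                votes[y][x] += 1
    n = len(crops) or 1  # avoid dividing by zero when there are no crops
    return [[v / n for v in row] for row in votes]


saliency = crop_saliency(4, 3, [(0, 0, 2, 2), (1, 0, 3, 2)])
for row in saliency:
    print(row)
```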
  54. Usage Statistics of Pomics (since July 15, 2012): 352 authors; 434 comic books; 4,362 frames; 4,332 images used; 1,057 image annotations; 3,789 text balloons; 3,000+ shares on Facebook
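The totals on this slide also give a few per-unit averages. The sketch below only divides the reported counts and adds no data beyond the slide; plain means can hide how skewed activity across authors might be.

```python
# Counts as reported on the slide (since July 15, 2012).
stats = {
    "authors": 352,
    "comic_books": 434,
    "frames": 4362,
    "images_used": 4332,
    "image_annotations": 1057,
    "text_balloons": 3789,
}

books_per_author = stats["comic_books"] / stats["authors"]                 # about 1.23
frames_per_book = stats["frames"] / stats["comic_books"]                   # about 10.05
balloons_per_frame = stats["text_balloons"] / stats["frames"]              # about 0.87
annotations_per_image = stats["image_annotations"] / stats["images_used"]  # about 0.24

for name, value in [("books/author", books_per_author),
                    ("frames/book", frames_per_book),
                    ("balloons/frame", balloons_per_frame),
                    ("annotations/image", annotations_per_image)]:
    print(f"{name}: {value:.2f}")
```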
  55. WHAT HAVE WE GATHERED SO FAR?
  56. Picture Aesthetics Info
  57. Picture Aesthetics (cont.)
  58. Picture Saliency Info
  59. Picture Semantics: Love / Like / Dear; Happy; Sleepy / sleeping; Tears; Wearing a hat; NO!
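The semantics slide groups near-synonyms (Love / Like / Dear, Sleepy / sleeping). A minimal sketch of counting free-text annotations after normalizing case and merging such variants; the synonym table is hypothetical and merely mirrors the slide's groupings:

```python
from collections import Counter

# Hypothetical synonym table that mirrors the groupings shown on the slide.
SYNONYMS = {"like": "love", "dear": "love", "sleeping": "sleepy"}


def normalize(tag: str) -> str:
    """Lower-case and trim a tag, then map known variants to one canonical form."""
    t = tag.strip().lower()
    return SYNONYMS.get(t, t)


def tag_counts(annotations):
    """Count normalized free-text tags collected from user annotations."""
    return Counter(normalize(t) for t in annotations)


print(tag_counts(["Love", "like", " Sleepy", "sleeping", "Tears"]))
```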
  60. Can Pomics Do Microtasks? The answer is YES! Users were asked to create comics using a specific album and were rewarded with a 200 MB quota if their books were “shared” by 20+ FB users
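The microtask incentive can be written as a simple rule. A sketch under stated assumptions: the share-log format is hypothetical, "20+ FB users" is read as at least 20 distinct users per book, and the 200 MB bonus is granted once per author if any of their books qualifies.

```python
from collections import defaultdict

REWARD_MB = 200   # quota bonus stated on the slide
MIN_SHARERS = 20  # "20+ FB users", read here as at least 20 distinct users


def qualifying_books(share_events):
    """Return the ids of books shared by at least MIN_SHARERS distinct users.

    share_events: iterable of (book_id, fb_user_id) pairs (hypothetical log).
    Repeated shares by the same user are counted once.
    """
    sharers = defaultdict(set)
    for book_id, user_id in share_events:
        sharers[book_id].add(user_id)
    return {b for b, users in sharers.items() if len(users) >= MIN_SHARERS}


def quota_bonus_mb(share_events):
    """Grant the bonus once if any of the author's books qualifies (assumption)."""
    return REWARD_MB if qualifying_books(share_events) else 0


events = [("b1", f"u{i}") for i in range(20)] + [("b2", "u1"), ("b2", "u1")]
print(qualifying_books(events), quota_bonus_mb(events))
```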
  61. Picture Aesthetics from Microtasks
  62. Picture Saliency from Microtasks
  63. Crowdmining Services. Advantages: no or little hiring cost once the right incentives are given; scales up easily; the game rules can be changed to fit the research. Disadvantages: high development cost; less flexible; hard to find the right incentives (besides money)
  64. Conclusion: Crowdmining is a promising and exciting area. Crowdsourcing != Mechanical Turking. A lot more can be done with crowdmining services. Build your own crowdmining service today!
  65. CrowdMM 2012 (in conjunction with ACM Multimedia 2012): keynote by Prof. Masataka Goto (AIST, Japan); 11 oral + poster presentations on annotation, evaluation, and novel applications; an industrial panel discussion. You are welcome to join us! http://crowdmm.org/
  66. Unleash the power of the crowd! Thank you! Kuan-Ta Chen, Academia Sinica, http://www.iis.sinica.edu.tw/~swc
