Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

The keynote talk at CrowdKDD 2012 http://www.cse.ust.hk/~nliu/crowdkdd12/

Transcript

  • 1. Crowdsourcing beyond … Building Crowdmining Services for Your Own Research. Kuan-Ta Chen, Institute of Information Science, Academia Sinica. CrowdKDD’12, Aug 12, 2012
  • 2. What I’m going to talk about: Crowdsourcing?; Crowdsourcing + Data Mining Research?; Common Fallacies of CS4DM Research; Pomics: A Crowdmining Service; Conclusion
  • 3. Crowdsourcing = Crowd + Outsourcing: “soliciting solutions via open calls to large-scale communities” (CrowdKDD’12: Crowdsourcing beyond Mechanical Turk / Kuan-Ta Chen)
  • 4. A more formal definition: “Crowdsourcing is the act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call.” [1] [1] Howe, Jeff. Crowdsourcing: A Definition, http://crowdsourcing.typepad.com/
  • 5. What Can Crowdsourcing Do?
  • 6. Brand Tagging
  • 7. Data Entry. Reward: 4.4 USD/hour
  • 8. General Questions. Reward: points on Yahoo! Answers
  • 9. When crowdsourcing meets data mining… Crowdsourcing, Data mining: What’s in here?
  • 10. Crowdsourcing for Data Mining: Issues. Purposes: Annotation (ground-truth generation); Evaluation; Retrieval; Human-in-the-loop computation. Methodologies: Recruiting; Incentives; Task Design; Workflow; Learning from crowd; Quality control; Cheat detection
  • 11. Crowdsourcing Uses in Data Mining Research
  • 12. Image Semantics. Reward: 0.04 USD / task. Main theme? Key objects? Unique attributes?
  • 13. Find out photos of revolvers! 0.02 USD / task
  • 14. Human Skeleton. 0.01 USD / task
  • 15. Photo Orientation. 0.01 USD / task
  • 16. Perspectives for 3D Objects. Thi Phuong Nghiem, Axel Carlier, Geraldine Morin, and Vincent Charvillat, "Enhancing online 3D products through crowdsourcing," ACM CrowdMM’12.
  • 17. Web Site Classifier. 12 USD / hour. Panos Ipeirotis, “Crowdsourcing using Mechanical Turk: Quality Management and Scalability,” Invited Talk at CSDM 2011.
  • 18. Photographers’ Intention: to support a task? to capture a bad feeling? to preserve a good feeling? to recall later on? to publish it online? to show it to friends and family? Mathias Lux, Mario Taschwer, and Oge Marques, “A Closer Look at Photographers’ Intentions: a Test Dataset,” ACM CrowdMM’12.
  • 19. Linguistic Affective Judgement. Affective response (Snow et al. 2008): “Closing and cancellations top advice on flu outbreak”. USD 0.4 to label 20 headlines (140 labels)
  • 20. A Lot More Examples: Document relevance evaluation, Alonso et al. (2008); Document rating collection, Kittur et al. (2008); Noun compound paraphrasing, Nakov (2008); Person name resolution, Su et al. (2007); and so on...
  • 21. THE COMMON FALLACIES: EXPERIENCES FROM CROWDMM’12. Thanks to CrowdMM’12 co-organizers: Wei-Tsang Ooi, Martha Larson, and Wei-Ta Chu; also thanks to “Crowdsourcing for Multimedia” SI co-guest-editors Paul Bennent and Matt Lease.
  • 22. Common Fallacies #1: Crowdsourcing is NOT JUST conducting user studies. The crowd is uncontrollable, with tasks performed in uncontrolled conditions. → How to manage the crowd?
  • 23. Common Fallacies #2: Crowdsourcing is NOT JUST analyzing user-generated content. Cope with the noise in UGC rather than only the information. → How to manage the imperfection & diversity in UGC?
  • 24. Common Fallacies #2: Crowdsourcing is NOT JUST analyzing user-generated content. Put the task element in the loop. → Re-purposing the creation of UGC as your own microtasks
  • 25. Common Fallacies #3: Crowdsourcing is NOT JUST posting tasks on Mechanical Turk. Explicit Crowdsourcing; Implicit Crowdsourcing; Piggyback Crowdsourcing. Doan et al., "Crowdsourcing systems on the World-Wide Web," CACM, vol. 54, no. 4, 2011.
  • 26. An implicit crowdmining platform for multimedia content
  • 27. Crowdsourcing for Data Mining: Issues. Purposes: Annotation (ground-truth generation); Evaluation; Retrieval; Human-in-the-loop computation. Methodologies: Recruiting; Incentives; Task Design; Workflow; Learning from crowd; Quality control; Cheat detection
  • 28. The Era of Too Many Photos. People today use pictures to write down their daily experience (with the prevalence of digital cameras)
  • 29. How to Share Photos?
  • 30. 3 Common Ways: Photo browsing; Photo/video slideshow; Illustrated text
  • 31. Photo Browsing
  • 32. Photo/Video slideshow
  • 33. Illustrated Text
  • 34. A MISSING PIECE
  • 35. Comics
  • 36. Photo Comics – Baby Born
  • 37. Photo Comics – Birthday Party
  • 38. Photo Comics – Daily Fun
  • 39. Media Comparison
      Media             Richness  Creation Cost  Viewer Req.  Viewer Control  Portability
      Photo browsing    Low       Low            High         Low             Low
      Slideshow         Medium    Low            Low          Medium          Low
      Illustrated text  High      High           High         High            High
      Comic             High      Low            High         High            High
      How to lower it?
  • 40. Comic Making – Cartoonist’s Way
  • 41. http://www.pomics.net
  • 42. Goal of Pomics
  • 43. Pomics = Picture to Comics
  • 44. Computer-Aided Storytelling. [Diagram labels: Picture; Location; Timing Analysis; Aesthetics Analysis; Semantics Analysis; Automated Storytelling; Auto Draft; User Story Editing; Machine Learning; Own rating; Popularity; User Preference Adjustment; Final Story]
  • 45. Technical Challenges #1: Semantics Analysis (Human recognition; Emotion recognition; Behavior recognition; Object recognition; Location identification; Natural language processing); Aesthetics Analysis (Exposure; Composition); Timing Analysis; Contextual Analysis
  • 46. Technical Challenges #2: Automatic Storytelling; Significant photo selection; Pagination and page layout; Narrative design
  • 47. Pomics as a Social Service: Web albums; Publish & share; Web resources
  • 48. Live Demo
  • 49. HOW IS POMICS RELATED TO CROWDSOURCING?
  • 50. USERS ARE IMPLICITLY DOING IMAGE ANNOTATION AND EVALUATION
  • 51. What pictures are used? Aesthetics information. Why were the 3 pictures used?
  • 52. Wizard Interface: Aesthetics information
  • 53. The Page Layout: Saliency info; Semantics
  • 54. Usage Statistics of Pomics (since July 15, 2012): 352 authors; 434 comic books; 4,362 frames; 4,332 images used; 1,057 image annotations; 3,789 text balloons; 3,000+ shares on Facebook
  • 55. WHAT HAVE WE GATHERED SO FAR?
  • 56. Picture Aesthetics Info
  • 57. Picture Aesthetics (cont.)
  • 58. Picture Saliency Info
  • 59. Picture Semantics: Love / Like / Dear; Happy; Sleepy / sleeping; Tears; Wearing a hat; NO!
  • 60. Can Pomics Do Micro-tasks? The answer is YES! Users were asked to create comics using a specific album. They were rewarded with a 200 MB quota if their books were “shared” by 20+ FB users
  • 61. Picture Aesthetics from Microtasks
  • 62. Picture Saliency from Microtasks
  • 63. Crowdmining Services. Advantages: No or little hiring cost once the right incentives are given; Easily scale up; Can change the game rules to fit the research. Disadvantages: High development cost; Less flexible; Hard to find the right incentives (besides money)
  • 64. Conclusion: Crowdmining is a promising and exciting area; Crowdsourcing != Mechanical Turking; A lot more can be done with crowdmining services; Build your own crowdmining service today!
  • 65. CrowdMM 2012 (in conjunction with ACM Multimedia 2012). Keynote: Prof. Masataka Goto (AIST, Japan). 11 oral+poster presentations: Annotation, Evaluation, Novel applications. An industrial panel discussion. Welcome to join us! http://crowdmm.org/
  • 66. Unleash the power of Crowd! Thank You! Kuan-Ta Chen, Academia Sinica, http://www.iis.sinica.edu.tw/~swc
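
Slides 50, 51, and 61 suggest that the pictures users pick while authoring comics double as implicit aesthetics annotations. The talk does not say how these signals are aggregated, so the Python sketch below is only one possible reading: it assumes that a picture chosen by a larger share of comic books built from the same album is a better-liked picture. The function name selection_rates, the picture IDs, and the sample books are hypothetical and not part of Pomics.

```python
from collections import Counter


def selection_rates(album, books):
    """Return, for each picture in the album, the fraction of books that used it.

    album: list of picture IDs offered to every author (hypothetical data model).
    books: list of comic books, each a list of the picture IDs it used.
    """
    album_set = set(album)
    counts = Counter()
    for book in books:
        # Count a picture at most once per book and ignore pictures
        # that do not belong to the album.
        counts.update(set(book) & album_set)
    total = len(books)
    return {pic: (counts[pic] / total if total else 0.0) for pic in album}


# Toy example: three authors made comics from the same four-picture album.
album = ["p1", "p2", "p3", "p4"]
books = [["p1", "p2"], ["p1", "p3", "p1"], ["p1", "p2"]]
rates = selection_rates(album, books)

# Rank pictures from most to least frequently selected.
ranking = sorted(album, key=lambda pic: rates[pic], reverse=True)
print(rates)
print(ranking)
```

A selection rate is a coarse proxy: it ignores why a picture was chosen (story relevance versus visual quality), so in practice it would probably need to be combined with the other signals named on the slides, such as annotations and saliency information.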