
Data Science with Human in the Loop @Faculty of Science #Leiden University

Software systems are becoming ever more intelligent and useful, but the way we interact with them too often reveals that they don’t actually understand people. Knowledge Representation and the Semantic Web address the scientific challenges of providing human knowledge in machine-readable form. However, various types of human knowledge still cannot be captured by machines, especially across wide ranges of real-world tasks and contexts. The key scientific challenge is to capture human knowledge in a way that is scalable and adequate to real-world needs. Human Computation has begun to study scientifically how human intelligence at scale can be used to methodically improve machine-based knowledge and data management. My research focuses on understanding human computation in order to improve how machine-based systems acquire, capture and harness human knowledge, and thus become even more intelligent. In this talk I will show how the CrowdTruth framework (http://crowdtruth.org) facilitates data collection, processing and analytics of human computation knowledge.

Some project links:
- http://controcurator.org/
- http://crowdtruth.org/
- http://diveproject.beeldengeluid.nl/
- http://vu-amsterdam-web-media-group.github.io/linkflows/

  1. 1. Cognitive Computing with Human in the Loop: Harnessing User Semantics at Scale. Lora Aroyo, Web & Media Group, VU; IBM Center for Advanced Studies (CAS). http://lora-aroyo.org · http://slideshare.net/laroyo · @laroyo
  2. 2. Who am I … Vrije Universiteit Amsterdam: computer science professor heading the Web & Media Group; Amsterdam Data Science. IBM Center for Advanced Studies, Amsterdam: research associate leading the cognitive computing & crowdsourcing team. Columbia University, NY: visiting scholar; computer science, NLP, computer vision; Columbia Data Science. Tagasauris Inc, NY: Chief of Science.
  4. 4. VU Web & Media Group … Tobias Kuhn, Davide Ceolin, Victor de Boer, Jan Wielemaker, Lora Aroyo, 10 PhD students. Intelligent & Interactive Information Systems: enriching metadata & content of digital collections; content analysis for entity extraction; modeling provenance in digital collections; tracking changes over time; augmenting online multimedia; text & video summarization; interactive product placement, hotspots; assessing quality of web data; bias, controversy, opinions, perspectives; uncertainty, ambiguity; trust, privacy
  5. 5. software systems becoming ever more intelligent … but they don’t actually understand people
  6. 6. Knowledge Representation aims at human knowledge in machine-readable form; not all human knowledge can yet be captured by machines for wide ranges of real-world contexts
  7. 7. all the information machines have is all the information there is
  8. 8. there is always something else …
  9. 9. key scientific challenge: capturing human knowledge at scale and adequate to real-world needs
  10. 10. Human Computation: how human intelligence at scale can be used to improve machine-based knowledge
  11. 11. understanding human computation: improving how machine-based systems acquire, capture & harness human knowledge
  12. 12. … understanding the data: variety of meanings, multitude of perspectives, abundance of sources, endless applications
  13. 13. … understanding the crowds: volunteers, enthusiasts, visitors on-site, visitors online, paid crowds, in-house experts; understand who the different crowds are and what they can do for your collection
  14. 14. CrowdTruth (http://crowdtruth.org/): framework that facilitates data collection, processing & analytics of human computation knowledge
  15. 15. “best collective decisions are result of disagreement, not consensus or compromise” James Surowiecki
  16. 16. disagreement = signal
  17. 17. disagreement is a signal of the natural ambiguity of language and of the diversity & perspectives of human interpretation (http://crowdtruth.org/)
  18. 18. http://controcurator.org/
  21. 21. DIVE+: Interactive Exploration & Discovery in Context; building automatic storylines (narratives); aggregated views over the collection; collecting perspectives from crowds & niches. http://diveproject.beeldengeluid.nl/
  22. 22. VOTE for DIVE: https://summit2017.lodlam.net/2017/04/12/dive-explorative-search-for-digital-humanities/
  23. 23. VU – IBM CAS Team
  24. 24. Victor de Boer, Lora Aroyo, Oana Inel, Chiel van den Akker, Susan Legêne
  25. 25. Carlos Martinez Ortiz, Werner Helmich, Berber Hagedoorn, Sabrina Sauer
  26. 26. Liliana Melgar, Johan Oomen, Jaap Blom
  28. 28. Cognitive Computing with Human in the Loop: Harnessing User Semantics at Scale
  29. 29. Crowds for Co-creation Data https://www.rijksmuseum.nl/en/rijksstudio
  30. 30. … by user-driven augmentations of existing online collections
  32. 32. Nichesourcing with Experts http://annotate.accurator.nl
  33. 33. niches of people with the right expertise to contribute specific information
  34. 34. Train Lay Crowds to be Experts: training the general crowd to be a niche, through a game in which players can carry out expert annotation tasks with some assistance
  35. 35. Volunteer crowds for continuous gaming http://spotvogel.vroegevogels.vara.nl
  36. 36. Paid Crowds for Video Analysis CrowdTruth.org
  37. 37. Paid Crowds for Text Analysis CrowdTruth.org
  38. 38. Paid Crowds for Image Analysis CrowdTruth.org
  39. 39. Crowdsourcing Challenges. Challenge 1: Typically undertaken in isolation. Challenge 2: Difficult to estimate & control the time to complete. Challenge 3: Difficult to assess & compare quality. Challenge 4: Demands continuous promotional effort. Challenge 5: Active learning (human-in-the-loop) needs different expertise. Challenge 6: Challenging for institutions to incorporate crowdsourcing results into their existing content infrastructure.
  40. 40. measure & assess: ensure impact. Be aware of the channel, e.g. Wikipedia, Wikimedia, Facebook
  41. 41. measure & assess: monitor progress. 6 months; 2 years; 340,551 tags; 36,981 tags; 137,421 matches; 602 items; 1,782 items; 555 registered players; 2,017 users (taggers); thousands of anonymous players; 12,279 visits (3+ min online); 44,362 pageviews. Source: Riste Gligorov, Michiel Hildebrand, Jacco van Ossenbruggen, Guus Schreiber, Lora Aroyo (2011). On the role of user-generated metadata in audio visual collections. International Conference on Knowledge Capture (K-CAP '11), pages 145-152.
  42. 42. measure & assess: evaluate content, compare crowds. user vocabulary: 8% in professional vocabulary, 23% in Dutch lexicon, 89% found on Google; locations (7%): engeland; persons (31%); objects (57%); 88% of the tags useful for specific genres
  43. 43. disagreement signals ambiguity: if people disagree, then it will be more difficult for a machine to classify that example (http://crowdtruth.org/)
  46. 46. http://mediasuite.clariah.nl/
  53. 53. 1998: from DVDs to data science
  54. 54. 2006: 1 million dollar prize for best algorithm
  55. 55. 2007: Netflix switches to streaming
  56. 56. 2009: Team BellKor wins Netflix Prize
  58. 58. From Jeopardy to real-world problems (2011, 2017)
  59. 59. data is at the centre of every process
  60. 60. data is essential to evolve with users
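
The "disagreement = signal" idea on slides 16, 17 and 43 can be made concrete with a small sketch. What follows is an illustrative toy under stated assumptions, not the CrowdTruth API: the label set, the function names (`unit_vector`, `label_scores`, `clarity`) and the example data are all hypothetical. Each annotated unit is turned into a vector of label counts, each label is scored by the cosine similarity between that vector and the single-label vector, and the highest score serves as a rough clarity value for the unit.

```python
from collections import Counter
from math import sqrt

# Hypothetical label set for a relation-annotation task (an assumption for illustration).
LABELS = ["cause", "treat", "none"]


def unit_vector(worker_choices, labels=LABELS):
    """Add up all workers' label choices for one unit into a count vector."""
    counts = Counter(label for choice in worker_choices for label in choice)
    return [counts[label] for label in labels]


def label_scores(vector):
    """Cosine similarity between the unit vector and each single-label vector."""
    norm = sqrt(sum(c * c for c in vector))
    if norm == 0:
        return [0.0] * len(vector)  # no annotations at all
    return [c / norm for c in vector]


def clarity(vector):
    """Highest label score: 1.0 when every choice falls on one label, lower otherwise."""
    return max(label_scores(vector))


# Made-up example data: five workers per unit, one label each.
clear_unit = [["treat"]] * 5
ambiguous_unit = [["cause"], ["cause"], ["treat"], ["treat"], ["none"]]

clear_clarity = clarity(unit_vector(clear_unit))
ambiguous_clarity = clarity(unit_vector(ambiguous_unit))
```

In this made-up data the unanimous unit gets a clarity of 1.0, while the split unit (counts 2, 2, 1) gets 2/3. A low value marks an example on which people disagree, which, following slide 43, is likely to be harder for a machine to classify as well.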
