Digital Humanities Benelux 2017: Keynote Lora Aroyo


  1. http://lora-aroyo.org @laroyo Harnessing Human Semantics at Scale: Measurable, Reproducible, Engaging, Sustainable Crowdsourcing & Nichesourcing. Lora Aroyo
  2. 1998 2006 2007 2009: from DVDs to data science
  3. 1998 2006 2007 2009: Team BellKor wins Netflix Prize
  4. 1994 2003 2006 2016 2017: from books to data science
  9. data is at the centre of every process
  10. data is essential to evolve with users
  11. Ceci n'est pas … la mona lisa
  12. Ceci n'est pas … la mona lisa. Louvre’s Mona Lisa is only #14
  13. the battle of two worlds: 9.3 million Louvre visitors (2014); 14 million website visitors; 2.3 million social media
  14. in the (very near) future most visitors will be: digital-born; not bound by time or location; native to new forms of co-makership; native to new media. Source: Siebe Weide, Max Meijer and Marieke Krabshuis (2012). Agenda 2026: Study on the Future of the Dutch Museum Sector
  15. know your data: variety of meanings; multitude of perspectives; abundance of sources; endless contexts
  16. crowdsourcing to know your data at scale
  17. know your crowds: variety of types; multitude of platforms; abundance of interactions; endless characteristics
  18. Engage with Co-creation: https://www.rijksmuseum.nl/en/rijksstudio
  19. Engage with Co-creativity
  20. Engage with Co-curation
  21. Engage the Expert Niche: http://annotate.accurator.nl
  22. the expertise of Rijksmuseum professionals lies in annotating their collection with art-historical information, e.g. when the works were created, by whom, etc.
  23. detailed domain-specific information about depicted objects, e.g. which species an animal or plant belongs to, is in most cases not available
  24. use nichesourcing, i.e. niches of people with the right expertise, to add more specific information
  25. Keep Reproducing: http://annotate.accurator.nl
  26. Engage with Games: training the general crowd to be a niche; a game in which players can carry out expert annotation tasks with some assistance
  27. Engage with Games: http://waisda.nl
  29. Keep Reproducing: http://spotvogel.vroegevogels.vara.nl
  30. Experiment with Paid Crowds: CrowdTruth.org
  33. http://crowdtruth.org/
  34. http://data.crowdtruth.org/
  35. Challenges
  36. Challenges: low reproducibility rates; difficult to estimate & control the time to complete; difficult to assess & compare quality; demands continuous promotional effort; active learning (human-in-the-loop) needs different expertise; difficult to incorporate results into existing content infrastructure; crowdsourcing typically undertaken in isolation
  37. Assess Impact of Task Design
  38. Assess Impact of Task Design: experiment with different designs (instructions, layout, sequence, crowds, payment, campaign)
  39. for example: mapping music to mood
  40. Goal: 1 song - 1 mood??? Choose one: which is the mood most appropriate for each song? (Lee and Hu 2012)
      Cluster 1: passionate, rousing, confident, boisterous, rowdy
      Cluster 2: rollicking, cheerful, fun, sweet, amiable, good-natured
      Cluster 3: literate, poignant, wistful, bittersweet, autumnal, brooding
      Cluster 4: humorous, silly, campy, quirky, whimsical, witty, wry
      Cluster 5: aggressive, fiery, tense, anxious, intense, volatile, visceral
      Other: does not fit into any of the 5 clusters
  41. If “One Truth” & “No Disagreement”: workers W1-W6, W9 and W10 each give one vote to a single mood cluster; W7 and W8 give none. Totals: Mood-C1 1, Mood-C2 3, Mood-C3 1, Mood-C4 2, Mood-C5 1
  42. If “Many Truths” & “Disagreement”: workers may mark several clusters, including Other. Votes per worker: W1 3, W2 3, W3 3, W4 2, W5 2, W6 3, W7 3, W8 3, W9 2, W10 5. Totals: Mood-C1 3, Mood-C2 5, Mood-C3 6, Mood-C4 5, Mood-C5 2, Other 8
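The contrast between the two vote tables (slides 41 and 42) can be sketched in a few lines of Python. This is only an illustration built on the column totals shown on the slides: the function names are hypothetical, and the normalized entropy used here is a generic spread measure chosen for the example, not the metric defined by CrowdTruth.

```python
import math

# Column totals copied from the two slide tables (slides 41 and 42).
ONE_TRUTH_TOTALS = {"C1": 1, "C2": 3, "C3": 1, "C4": 2, "C5": 1}
MANY_TRUTHS_TOTALS = {"C1": 3, "C2": 5, "C3": 6, "C4": 5, "C5": 2, "Other": 8}


def majority_label(totals):
    """Collapse the votes to the single most-voted label ("one truth")."""
    return max(totals, key=totals.get)


def vote_distribution(totals):
    """Keep every label with its share of the votes ("many truths")."""
    total_votes = sum(totals.values())
    return {label: count / total_votes for label, count in totals.items()}


def normalized_entropy(totals):
    """Shannon entropy of the votes scaled to [0, 1].

    0 means all votes fall on one label; 1 means the votes are spread
    evenly over all labels. Illustrative only (an assumption of this sketch).
    """
    if len(totals) < 2:
        return 0.0
    probabilities = [p for p in vote_distribution(totals).values() if p > 0]
    entropy = -sum(p * math.log(p) for p in probabilities)
    return entropy / math.log(len(totals))


if __name__ == "__main__":
    for name, totals in [("one truth", ONE_TRUTH_TOTALS),
                         ("many truths", MANY_TRUTHS_TOTALS)]:
        print(name, majority_label(totals), round(normalized_entropy(totals), 3))
```

Reporting the full distribution alongside the majority label keeps the disagreement visible instead of discarding it.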
  43. Web & Media Group. this all results in simplification of context
  45. Assess Impact of Results
      ● Identify Crowdsourcing Goals through user log analysis
        ○ # queries, # unique queries, # queries of specific type
        ○ ranked by popularity
        ○ ranked by popularity and with error, e.g.
          ■ # queries entered over 50 times with 0 results
          ■ # queries of specific type with 0 results
        ○ which will have biggest impact
        ○ which has biggest urgency
      ● … or through other user analysis
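The log statistics listed on slide 45 could be computed along the following lines. This is a minimal sketch under stated assumptions: the `QueryLogEntry` record, its field names, and the `summarize_query_log` function are hypothetical, and a real search log would need its own parsing and query normalization.

```python
from collections import Counter
from typing import NamedTuple


class QueryLogEntry(NamedTuple):
    """One logged search (hypothetical record layout)."""
    query: str
    result_count: int


def summarize_query_log(entries, min_occurrences=50):
    """Count queries and list frequent queries that returned nothing."""
    all_counts = Counter(entry.query for entry in entries)
    zero_result_counts = Counter(
        entry.query for entry in entries if entry.result_count == 0
    )
    # Queries entered more than `min_occurrences` times with 0 results.
    frequent_zero_result = [
        (query, count)
        for query, count in zero_result_counts.most_common()
        if count > min_occurrences
    ]
    return {
        "total_queries": sum(all_counts.values()),
        "unique_queries": len(all_counts),
        "by_popularity": all_counts.most_common(),
        "frequent_zero_result": frequent_zero_result,
    }


if __name__ == "__main__":
    # Toy log, invented for illustration only.
    log = (
        [QueryLogEntry("bernhard", 0)] * 51
        + [QueryLogEntry("engeland", 3)] * 60
        + [QueryLogEntry("fiets", 0)] * 2
    )
    summary = summarize_query_log(log)
    print(summary["total_queries"], summary["unique_queries"])
    print(summary["frequent_zero_result"])
```

Queries that are both popular and unanswered are natural candidates for a crowdsourcing campaign, since fixing them affects the most users.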
  46. for example in video search
  47. for example in video search: people search for fragments; experts annotate full videos; 35% of search queries result in not found
  48. Measure Quality: “On the Role of User-Generated Metadata in A/V Collections”, Riste Gligorov et al., K-CAP 2011
  49. Measure Quality: time-based annotation; bernhard; 88% of the tags useful for specific genres; describe short segments; often not very specific; don’t describe program as a whole. “On the Role of User-Generated Metadata in A/V Collections”, Riste Gligorov et al., K-CAP 2011
  50. for example in video search: video annotation is time-consuming (5 times the video duration); experts use a specific vocabulary that is unknown to general audiences
  51. Measure Quality: user vocabulary: 8% in professional vocabulary; 23% in Dutch lexicon; 89% found on Google; locations (7%), e.g. engeland; persons (31%); objects (57%). “On the Role of User-Generated Metadata in A/V Collections”, Riste Gligorov et al., K-CAP 2011
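A vocabulary overlap figure like those on slide 51 can be computed as the share of distinct user tags that also appear in a reference vocabulary. The sketch below is an assumption about how such a figure might be derived, not the procedure from the cited paper: `vocabulary_coverage` is a hypothetical name, matching is a plain case-insensitive string comparison, and the sample tags are invented.

```python
def vocabulary_coverage(tags, vocabulary):
    """Fraction of distinct tags that also occur in the vocabulary.

    Tags and vocabulary terms are compared case-insensitively after
    trimming whitespace. Returns 0.0 when there are no tags.
    """
    distinct_tags = {tag.strip().lower() for tag in tags if tag.strip()}
    if not distinct_tags:
        return 0.0
    known_terms = {term.strip().lower() for term in vocabulary}
    return len(distinct_tags & known_terms) / len(distinct_tags)


if __name__ == "__main__":
    # Invented example data; a real run would load the crowd tags and
    # the professional thesaurus from their own sources.
    user_tags = ["Bernhard", "engeland", "fiets", "fiets"]
    professional_terms = {"engeland"}
    print(vocabulary_coverage(user_tags, professional_terms))
```

Running the same function against different reference vocabularies (a professional thesaurus, a general lexicon) gives comparable percentages for each.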
  52. human subjectivity, ambiguity & uncertainty of expression: natural part of human semantics
  53. measure quality: quality is not just about spam; quality is typically multi-dimensional; understand the diversity in crowd answers; do not ignore multitude of interpretations; understand the variety of contexts; identify cases with high ambiguity, similarity, …; experiment with explicit metrics; experiment with different designs
  54. Measure Progress
      6 months: 340,551 tags; 137,421 matches; 602 items; 555 registered players; thousands of anonymous players
      2 years: 36,981 tags; 1,782 items; 2,017 users (taggers); 12,279 visits (3+ min online); 44,362 pageviews
      Riste Gligorov, Michiel Hildebrand, Jacco van Ossenbruggen, Guus Schreiber, Lora Aroyo (2011). On the role of user-generated metadata in audio visual collections. International conference on Knowledge capture K-CAP '11, Pages 145-152
  55. campaign, campaign, campaign
  59. Goals: Measurable quality; Reproducible results; Sustainable settings; Engaging interaction
