Digital Humanities Benelux 2017: Keynote Lora Aroyo

https://dhbenelux2017.eu/programme/keynotes/lora/

Published in: Technology

  1. 1. http://lora-aroyo.org @laroyo Harnessing Human Semantics at Scale Measurable, Reproducible, Engaging, Sustainable Crowdsourcing & Nichesourcing Lora Aroyo
  2. 2. 1998, 2006, 2007, 2009: from DVDs to data science
  3. 3. 1998, 2006, 2007, 2009: Team BellKor wins Netflix Prize
  4. 4. 1994, 2003, 2006, 2016, 2017: from books to data science
  9. 9. data is at the centre of every process
  10. 10. data is essential to evolve with users
  11. 11. Ceci n'est pas … la Mona Lisa
  12. 12. Ceci n'est pas … la Mona Lisa: Louvre’s Mona Lisa is only #14
  13. 13. the battle of two worlds: 9.3 million Louvre visitors (2014), 14 million website visitors, 2.3 million social media
  14. 14. in the (very near) future most visitors will be digital-born, not bound by time or location, native to new forms of co-makership, native to new media (Siebe Weide, Max Meijer and Marieke Krabshuis (2012). Agenda 2026: Study on the Future of the Dutch Museum Sector)
  15. 15. know your data: variety of meanings, multitude of perspectives, abundance of sources, endless contexts
  16. 16. crowdsourcing to know your data at scale
  17. 17. know your crowds: variety of types, multitude of platforms, abundance of interactions, endless characteristics
  18. 18. Engage with Co-creation: https://www.rijksmuseum.nl/en/rijksstudio
  19. 19. Engage with Co-creativity
  20. 20. Engage with Co-curation
  21. 21. Engage the Expert Niche: http://annotate.accurator.nl
  22. 22. expertise of Rijksmuseum professionals is in annotating their collection with art-historical information, e.g. when the works were created, by whom, etc.
  23. 23. detailed domain-specific information about depicted objects, e.g. which species the animal or plant belongs to, is in most cases not available
  24. 24. use nichesourcing, i.e. niches of people with the right expertise, to add more specific information
  25. 25. Keep Reproducing: http://annotate.accurator.nl
  26. 26. Engage with Games: training the general crowd to be a niche; a game in which players can carry out expert annotation tasks with some assistance
  27. 27. Engage with Games: http://waisda.nl
  29. 29. Keep Reproducing: http://spotvogel.vroegevogels.vara.nl
  30. 30. Experiment with Paid Crowds: CrowdTruth.org
  33. 33. http://crowdtruth.org/
  34. 34. http://data.crowdtruth.org/
  35. 35. Challenges
  36. 36. Challenges: Low reproducibility rates; Difficult to estimate & control the time to complete; Difficult to assess & compare quality; Demands continuous promotional effort; Active learning (human-in-the-loop) needs different expertise; Difficult to incorporate results into existing content infrastructure; Crowdsourcing typically undertaken in isolation
  37. 37. Assess Impact of Task Design
  38. 38. Assess Impact of Task Design: experiment with different designs (Instructions, Layout, Sequence, Crowds, Payment, Campaign)
  39. 39. for example: mapping music to mood
  40. 40. Goal: 1 song - 1 mood??? Choose one: Which is the mood most appropriate for each song? (Lee and Hu 2012)
      Cluster 1: passionate, rousing, confident, boisterous, rowdy
      Cluster 2: rollicking, cheerful, fun, sweet, amiable, good-natured
      Cluster 3: literate, poignant, wistful, bittersweet, autumnal, brooding
      Cluster 4: humorous, silly, campy, quirky, whimsical, witty, wry
      Cluster 5: aggressive, fiery, tense, anxious, intense, volatile, visceral
      Other: does not fit into any of the 5 clusters
  41. 41. If “One Truth” & “No Disagreement”: each of the workers W1 to W10 has at most one vote (W7 and W8 have none). Totals: Mood-C1 1, Mood-C2 3, Mood-C3 1, Mood-C4 2, Mood-C5 1
  42. 42. If “Many Truths” & “Disagreement”: workers may vote for several moods, including Other. Votes per worker: W1 3, W2 3, W3 3, W4 2, W5 2, W6 3, W7 3, W8 3, W9 2, W10 5. Totals: Mood-C1 3, Mood-C2 5, Mood-C3 6, Mood-C4 5, Mood-C5 2, Other 8
  43. 43. Web & Media Group: this all results in simplification of context
  45. 45. Assess Impact of Results
      ● Identify Crowdsourcing Goals through user log analysis
        ○ # queries, # unique queries, # queries of specific type
        ○ ranked by popularity
        ○ ranked by popularity and with error, e.g.
          ■ # queries entered over 50 times with 0 results
          ■ # queries of specific type with 0 results
        ○ which will have biggest impact
        ○ which has biggest urgency
      ● … or through other user analysis
  46. 46. for example in video search
  47. 47. for example in video search: people search for fragments; experts annotate full videos; 35% of search queries result in not found
  48. 48. Measure Quality (“On the Role of User-Generated Metadata in A/V Collections”, Riste Gligorov et al., K-CAP 2011)
  49. 49. Measure Quality: time-based annotation bernhard 88% of the tags useful for specific genres describe short segments often not very specific don’t describe program as a whole
  50. 50. for example in video search: video annotation is time-consuming (5 times the video duration); experts use a specific vocabulary that is unknown to general audiences
  51. 51. Measure Quality: user vocabulary: 8% in professional vocabulary, 23% in Dutch lexicon, 89% found on Google; locations (7%) engeland; persons (31%); objects (57%)
  52. 52. human subjectivity, ambiguity & uncertainty of expression are a natural part of human semantics
  53. 53. measure quality: quality is not just about spam; quality is typically multi-dimensional; understand the diversity in crowd answers; do not ignore multitude of interpretations; understand the variety of contexts; identify cases with high ambiguity, similarity, …; experiment with explicit metrics; experiment with different designs
  54. 54. Measure Progress. 6 months: 340,551 tags; 137,421 matches; 602 items; 555 registered players; thousands of anonymous players. 2 years: 36,981 tags; 1,782 items; 2,017 users (taggers). 12,279 visits (3+ min online); 44,362 pageviews. Riste Gligorov, Michiel Hildebrand, Jacco van Ossenbruggen, Guus Schreiber, Lora Aroyo (2011). On the role of user-generated metadata in audio visual collections. International Conference on Knowledge Capture, K-CAP '11, pages 145-152
  55. 55. campaign, campaign, campaign
  59. 59. Goals: Measurable quality, Reproducible results, Sustainable settings, Engaging interaction
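
The contrast between slides 41 and 42 ("One Truth" versus "Many Truths") can be made concrete with a small sketch. It takes the vote totals printed on those two slides and computes each mood cluster's share of the votes, plus a normalized entropy as a rough ambiguity score. The function names and the choice of entropy are illustrative assumptions; they are not taken from the talk and are not CrowdTruth's own metrics.

```python
import math

# Vote totals per mood cluster, as printed on slides 41 and 42.
ONE_TRUTH_TOTALS = {"C1": 1, "C2": 3, "C3": 1, "C4": 2, "C5": 1}
MANY_TRUTHS_TOTALS = {"C1": 3, "C2": 5, "C3": 6, "C4": 5, "C5": 2, "Other": 8}


def vote_shares(totals):
    """Return each label's share of all votes cast."""
    n = sum(totals.values())
    return {label: count / n for label, count in totals.items()}


def normalized_entropy(totals):
    """Shannon entropy of the vote distribution, scaled to [0, 1].

    0 means every vote went to one label; 1 means votes are spread evenly.
    Hypothetical ambiguity score for illustration only.
    """
    if len(totals) < 2:
        return 0.0
    shares = [s for s in vote_shares(totals).values() if s > 0]
    entropy = -sum(s * math.log(s) for s in shares)
    return entropy / math.log(len(totals))


def majority_label(totals):
    """Label with the most votes (ties go to the first label listed)."""
    return max(totals, key=totals.get)


if __name__ == "__main__":
    for name, totals in [("one truth", ONE_TRUTH_TOTALS),
                         ("many truths", MANY_TRUTHS_TOTALS)]:
        print(name, majority_label(totals),
              round(normalized_entropy(totals), 3))
```

A majority vote keeps only one label per song (C2 for the slide 41 totals, Other with 8 of 29 votes for the slide 42 totals), whereas the shares and the entropy score retain how the crowd's answers are spread across clusters.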
