2. http://lora-aroyo.org http://slideshare.net/laroyo @laroyo http://crowdtruth.org
TAKE HOME MESSAGE
data is essential to evolve with your users
data should be at the center of every process
there is no single notion of truth
but a spectrum of context, opinions,
perspectives & shades of grey
harnessing the full spectrum of
truth from experts & users
creates more opportunities for
serendipity, creativity & engagement
4.
CULTURAL HERITAGE
Before the Digital Age
Lots of manual effort
Focus on internal collection
management
Focus on art historical
significance
Access targeted to researchers
& professionals
Small curated selection online
for general audiences onsite
6.
DIGITAL HERITAGE
Bringing collections online
Focus on massive digitization
of heritage collections
Getting large collections online
Still need significant art
historical understanding to get
access
Metadata not sufficient for the
online presence
7.
Knowledge Representation, Taxonomies, Thesauri
METADATA ENRICHMENT
Shared structured knowledge
Guus Schreiber et al. (2000). The CommonKADS Methodology
8.
Linked Data, Semantic Web, Interoperability, Standards
METADATA ENRICHMENT
Shift from metadata for internal use to metadata for online access
Michiel Hildebrand, http://e-culture.multimedian.nl, 2009
14.
RIJKSMUSEUM LINKED DATA
Using Linked Data to Diversify Search Results: a Case Study in Cultural Heritage
Chris Dijkshoorn, Lora Aroyo, Guus Schreiber, Jan Wielemaker, and Lizzy Jongma, 2014
15.
Linked Data, Semantic Web, Interoperability, Standards
METADATA ENRICHMENT
Building community for shared knowledge creation, use & maintenance
2014, http://www.getty.edu/research/tools/vocabularies/lod/index.html
16.
ADDRESSED THE WEB ACCESS & SCALE ISSUES ...
through using automated methods to enrich & curate metadata
André Malraux, The Imaginary Museum of World Sculpture, 1953
17.
André Malraux, The Imaginary Museum of World Sculpture, 1953
BUT THAT WASN’T ENOUGH FOR TRUE ENGAGEMENT
Focus on information support rather than interpretation support for online collections
18.
Gravity (2013)
LOST IN CULTURAL SPACE MORE THAN EVER
The sense of disconnect is now greater than ever: there has never been so much
information online, and it has never been so difficult to find ...
26.
IN THE VERY NEAR FUTURE
most visitors will be digital-born
not bound by time or location
native to new forms of co-makership
native to new media
Siebe Weide, Max Meijer and Marieke Krabshuis (2012). Agenda 2026: Study on the Future of the Dutch Museum Sector
28.
NETFLIX: BIG & DEEP DATA FOR ENGAGEMENT
engagement monitoring:
- how many users watching show X finished it to the end of season Y?
- what did the other users do?
- how big of a ‘time gap’ between watching episodes?
deep data with tracking EVENTS:
- when people pause, rewind, fast forward, leave (and if ever come back)
- when people watch; where they watch (zip code) & on what device
- what people search for (~3 million searches per day); how people browse & scroll
deep data within video - “in the moment” characteristics
- how much users need to watch in order to be less likely to cancel, e.g.:
- if users watch at least 15 h/month, they are 75% less likely to cancel
- if they drop below 5 hours, there is a 95% chance they will cancel
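The engagement questions on this slide can be read as simple aggregations over viewing events. The following is a minimal sketch only, not Netflix's actual pipeline: the `ViewEvent` record, its fields, and the sample data are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ViewEvent:
    """Hypothetical record of one viewing session (an assumption, not a real schema)."""
    user: str
    episode: int            # episode number within the season
    started: datetime
    hours_watched: float


def completion_rate(events, last_episode):
    """Share of users who started the season and also reached its last episode."""
    episodes_by_user = defaultdict(set)
    for e in events:
        episodes_by_user[e.user].add(e.episode)
    if not episodes_by_user:
        return 0.0
    finished = sum(1 for eps in episodes_by_user.values() if last_episode in eps)
    return finished / len(episodes_by_user)


def episode_gaps(events, user):
    """'Time gaps' in days between consecutive viewing sessions of one user."""
    starts = sorted(e.started for e in events if e.user == user)
    return [(b - a).total_seconds() / 86400 for a, b in zip(starts, starts[1:])]


def monthly_hours(events, year, month):
    """Total hours watched per user in the given month."""
    totals = defaultdict(float)
    for e in events:
        if e.started.year == year and e.started.month == month:
            totals[e.user] += e.hours_watched
    return dict(totals)


# Made-up sample data for illustration only.
events = [
    ViewEvent("ann", 1, datetime(2017, 3, 1, 20), 1.0),
    ViewEvent("ann", 2, datetime(2017, 3, 3, 20), 1.0),
    ViewEvent("bob", 1, datetime(2017, 3, 2, 21), 0.5),
]
```

The monthly totals could then be compared against thresholds such as the 15 h/month and 5 hour figures quoted on the slide; how those figures were derived is not stated here, so the sketch does not attempt to reproduce them.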
29.
NETFLIX: BIG & DEEP DATA FOR ENGAGEMENT
1. Set objectives, pick metrics = shared across whole organization
2. Consider UX as mission-critical
3. Personalize UX as much as possible
4. Understand user’s lifestyle and context
5. Use interaction data then ask for feedback
6. Let users know your service is adapting to their tastes
7. Ensure metadata captures content nuances and is consistent
8. Give reasons to come back often
9. Run frequent UI experiments
10. Close the loop and base your decisions upon data
https://www.slideshare.net/PancrazioAuteri/personalization-10-lessons-learned-from-netflix/50-Netflix_solutions_are_applicable_and
36.
BINARY WORLD
7 Myths about Human Annotation
One truth: knowledge acquisition for the semantic web assumes one correct interpretation for every example
All examples are created equal: triples are triples, one is not more important than another, they are all either true or false
Disagreement bad: when people disagree, they don’t understand the problem
Experts rule: knowledge is captured from domain experts
One is enough: knowledge by a single expert is sufficient
Detailed explanations help: if examples cause disagreement - add instructions
Once done, forever valid: knowledge is not updated; new data not aligned with old
“Truth is a Lie: 7 Myths about Human Annotation”, AI Magazine 2014, L. Aroyo, C. Welty
44.
WHAT DOES THE CROWD SAY?
This applies to everything, e.g. interests, popularity, relevance, significance ...
& the bigger the crowd the better
& better to do it continuously and from various perspectives
& at various granularity levels
Does This Image Depict A Woman?
95% 75% 50%
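One way to read the percentages above is as the share of crowd workers who answered "yes" for each image, kept as a graded score instead of being collapsed into a single true/false label. The sketch below assumes exactly that reading; the function name, the image identifiers, and the vote counts are hypothetical, and this is not the CrowdTruth metric itself.

```python
from collections import Counter


def crowd_scores(judgments):
    """Map each item to the fraction of workers who answered 'yes'.

    judgments: iterable of (item_id, answer) pairs, where answer is a bool.
    """
    yes = Counter()
    total = Counter()
    for item, answer in judgments:
        total[item] += 1
        if answer:
            yes[item] += 1
    return {item: yes[item] / total[item] for item in total}


# Made-up votes for three hypothetical images.
judgments = (
    [("img_a", True)] * 19 + [("img_a", False)]
    + [("img_b", True)] * 3 + [("img_b", False)]
    + [("img_c", True), ("img_c", False)]
)
scores = crowd_scores(judgments)
```

Keeping the fraction, rather than a majority vote, preserves the information that some images are clear-cut while others are genuinely ambiguous.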
47.
a spatial representation of truth & meaning
harnesses disagreement
CROWDTRUTH.ORG
“The Three Sides of CrowdTruth”, Journal of Human Computation 2014, L. Aroyo, C. Welty
48.
gathering diversity of perspectives & opinions from the crowd
expand the expert vocabularies with these
provide new type of gold standard for machine intelligence
CROWDTRUTH.ORG
“The Three Sides of CrowdTruth”, Journal of Human Computation 2014, L. Aroyo, C. Welty
49.
COMFORT ZONE DISRUPTED
Encourage Disagreement in Annotation Tasks & Develop New Quality Metrics
Anca Dumitrache, Oana Inel, Benjamin Timmermans, Carlos Ortiz, Robert-Jan Sips, Lora Aroyo, Chris Welty (2018):
Empirical Methodology for Crowdsourcing Ground Truth
50.
On the role of user-generated metadata in audio visual collections (2011).
R. Gligorov, M. Hildebrand, J. van Ossenbruggen, G. Schreiber, L. Aroyo, K-CAP 2011
VIDEO METADATA ENRICHMENT
The Netherlands Institute for Sound and Vision
http://waisda.nl
51.
VIDEO METADATA ENRICHMENT
The Netherlands Institute for Sound and Vision
http://spotvogel.vroegevogels.vara.nl/
On the role of user-generated metadata in audio visual collections (2011).
R. Gligorov, M. Hildebrand, J. van Ossenbruggen, G. Schreiber, L. Aroyo, K-CAP 2011
52.
DIVE+
Explorative Search
DIVE into the event-based browsing of linked historical media (2015)
V De Boer, J Oomen, O Inel, L Aroyo, E Van Staveren, Journal of Web Semantics
http://diveplus.beeldengeluid.nl/
53.
DEEP QA IN CULTURAL HERITAGE
Mauritshuis use case
Nikita Galinkin, Zoltán Szlávik, Lora Aroyo and Benjamin Timmermans (2017).
Catch Them If You Can: A Simulation Study on Malicious Behavior in a Cultural Heritage
Question Answering System. The 29th Benelux Conference on Artificial Intelligence (BNAIC 2017)
61.
Chris Dijkshoorn, Victor De Boer, Lora Aroyo, Guus Schreiber (2014).
Accurator: Nichesourcing for Cultural Heritage
NICHESOURCING: FINDING NICHES IN THE CROWD
Accurator tool: SealincMedia Project
http://sealincmedia.wordpress.com
62.
Chris Dijkshoorn, Victor De Boer, Lora Aroyo, Guus Schreiber (2014).
Accurator: Nichesourcing for Cultural Heritage
NICHESOURCING IN THE CULTURAL HERITAGE
Accurator tool
http://annotate.accurator.nl
63.
Chris Dijkshoorn, Victor De Boer, Lora Aroyo, Guus Schreiber (2014).
Accurator: Nichesourcing for Cultural Heritage
NICHESOURCING IN THE CULTURAL HERITAGE
Accurator tool
http://annotate.accurator.nl
64.
Chris Dijkshoorn, Victor De Boer, Lora Aroyo, Guus Schreiber (2014).
Accurator: Nichesourcing for Cultural Heritage
NICHESOURCING IN THE CULTURAL HERITAGE
Accurator tool
http://annotate.accurator.nl
65.
DigiBird: on the fly collection integration supported by the crowd (2017)
Chris Dijkshoorn, Cristina-Iulia Bucur, Maarten Brinkerink, Sander Pieterse and Lora Aroyo
SUCCESS STORIES: NICHESOURCING EVENTS
Part of the SealincMedia Project
http://annotate.accurator.nl
66.
DigiBird: on the fly collection integration supported by the crowd (2017)
Chris Dijkshoorn, Cristina-Iulia Bucur, Maarten Brinkerink, Sander Pieterse and Lora Aroyo
DigiBird Project
http://annotate.accurator.nl
SUCCESS STORIES: NICHESOURCING EVENTS
67.
DigiBird: on the fly collection integration supported by the crowd (2017)
Chris Dijkshoorn, Cristina-Iulia Bucur, Maarten Brinkerink, Sander Pieterse and Lora Aroyo
DigiBird Project
http://annotate.accurator.nl
SUCCESS STORIES: NICHESOURCING EVENTS
68.
DigiBird: on the fly collection integration supported by the crowd (2017)
Chris Dijkshoorn, Cristina-Iulia Bucur, Maarten Brinkerink, Sander Pieterse and Lora Aroyo
DigiBird Project
http://annotate.accurator.nl
SUCCESS STORIES: NICHESOURCING EVENTS
69.
Chris Dijkshoorn, Victor De Boer, Lora Aroyo, Guus Schreiber (2014).
Accurator: Nichesourcing for Cultural Heritage
CREATING EXPERTS WITH GAMES
Accurator tool
http://annotate.accurator.nl
78.
DATA SCIENCE WITH RIJKSSTUDIO COULD PROVIDE ...
engagement monitoring:
- how many users viewing artwork X?
- in how many collections is artwork X?
- what are the labels users give to their collections where artwork X is?
- how many shares / downloads of artwork X?
- what follow-up user study can be done with the most popular artworks in rijksstudio?
- how to promote the less popular artworks?
deep data with tracking EVENTS:
- when people engage with the content (and if ever come back)
- what day & date & time people engage with content;
- where do they view it (zip code) & on what device
- what people search for in the rijksstudio / website
- how people browse & scroll;
- what kind of in-museum engagement can be done for rijksstudio popular users?
- how to link the collections in rijksstudio to current exhibitions?
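Several of the questions above (in how many collections is artwork X, what labels do users give those collections, which artworks are less popular) reduce to counting over user-made collections. The following is a rough sketch under stated assumptions: the input format, function names, artwork identifiers, and collection labels are all hypothetical and do not reflect the actual Rijksstudio data model or API.

```python
from collections import Counter, defaultdict


def artwork_stats(collections):
    """Count, per artwork, how many user collections contain it and under which labels.

    collections: iterable of (label, [artwork ids]) pairs, one pair per user collection.
    """
    in_collections = Counter()
    labels = defaultdict(list)
    for label, artworks in collections:
        for art in set(artworks):       # count each artwork once per collection
            in_collections[art] += 1
            labels[art].append(label)
    return in_collections, dict(labels)


def least_popular(in_collections, n=1):
    """Artworks appearing in the fewest collections (ties broken alphabetically)."""
    return sorted(in_collections, key=lambda a: (in_collections[a], a))[:n]


# Made-up user collections for illustration only.
collections = [
    ("Dutch masters", ["night-watch", "milkmaid"]),
    ("Light", ["milkmaid"]),
    ("Portraits", ["night-watch", "milkmaid", "self-portrait"]),
]
counts, labels = artwork_stats(collections)
```

The labels gathered per artwork are user vocabulary, so they could also serve as candidate terms for the kind of metadata enrichment discussed earlier in the deck.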
82.
TAKE HOME MESSAGE
data is essential to evolve with your users
data should be at the center of every process
there is no single notion of truth
but a spectrum of context, opinions,
perspectives & shades of grey
harnessing the full spectrum of
truth from experts & users
creates more opportunities for
serendipity, creativity & engagement