Measuring System Performance in
Cultural Heritage Information Systems
Toine Bogers
Aalborg University Copenhagen, Denmark
‘Evaluating Cultural Heritage Information Systems’ workshop
iConference 2015, Newport Beach
March 24, 2015
Outline
• Types of cultural heritage (CH) information systems
- Definition
- Common evaluation practice
• Challenges
• Case study: Social Book Search track
Types of cultural heritage information systems
• Large variety in the types of cultural heritage collections → many
different ways of unlocking this material
• Four main types of cultural heritage information systems
- Search
- Browsing
- Recommendation
- Enrichment
Search (Definition)
• Search engines provide direct access to the collection
- Search engine indexes representations of the collection objects
(occasionally full-text)
- User interacts by actively submitting queries describing their information
need(s)
- Search engine ranks the collection documents by (topical) relevance for
the query
• Examples
- Making museum collection metadata accessible (Koolen et al., 2009)
- Searching through war-time radio broadcasts (Heeren et al., 2007)
- Unlocking television broadcast archives (Hollink et al., 2009)
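The index, query, and rank loop from the search definition above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the toy `collection`, the `rank` function, and the plain TF-IDF score are all hypothetical and are not taken from any of the cited systems.

```python
import math
from collections import Counter

# Hypothetical object representations (e.g., short metadata descriptions).
collection = {
    "obj1": "oil painting of a harbour at night",
    "obj2": "radio broadcast about the harbour strike",
    "obj3": "watercolour painting of a village",
}

def tokenize(text):
    return text.lower().split()

# Index: term frequencies per object, plus document frequency per term.
index = {obj_id: Counter(tokenize(text)) for obj_id, text in collection.items()}
doc_freq = Counter(term for tf in index.values() for term in tf)

def rank(query):
    """Return ids of objects with a non-zero TF-IDF score, best first."""
    n_docs = len(index)
    scores = {}
    for obj_id, tf in index.items():
        score = sum(
            tf[term] * math.log(n_docs / doc_freq[term])
            for term in tokenize(query)
            if term in tf
        )
        if score > 0:
            scores[obj_id] = score
    return sorted(scores, key=scores.get, reverse=True)

# obj1 matches both query terms, so it is ranked first.
print(rank("harbour painting"))
```

A production search engine would add stemming, field weighting, and a stronger ranking function; the sketch only shows where indexing, querying, and ranking fit together.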
Example: Searching broadcast video archives
[Figure: GUI of the video search system, with the high-level concept search part at the top left]
Search (Evaluation practice)
• What do we need?
- Realistic collection of objects with textual representations
- Representative set of real-world information needs → for reliable
evaluation we typically need ≥50 topics
- Relevance judgments → (semi-)complete list of correct
answers for each of these topics, preferably from the original users
• How do we evaluate?
- Unranked → Precision (what did we get right?) & Recall (what did we miss?)
- Ranked → MRR (where is the first relevant result?), MAP (are the relevant
results all near the top?), and nDCG (are the most relevant results returned
before the less relevant ones?)
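The metrics listed above can be sketched for a single topic as follows. The function names and the toy ranking are illustrative assumptions. MRR and MAP are the means of the per-topic reciprocal rank and average precision over all topics, and the nDCG variant shown uses a log2 discount, which is one common formulation among several.

```python
import math

def precision_recall(retrieved, relevant):
    """Unranked: what did we get right, and what did we miss?"""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant result (0 if none is found)."""
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def average_precision(ranking, relevant):
    """Mean of the precision values at each rank holding a relevant result."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def ndcg(ranking, gains, k=10):
    """gains maps doc id -> graded relevance (missing ids count as 0)."""
    def dcg(values):
        return sum(g / math.log2(i + 1) for i, g in enumerate(values, start=1))
    actual = dcg([gains.get(doc, 0) for doc in ranking[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0

# Toy topic: three relevant documents, two of which are retrieved.
gains = {"d1": 2, "d2": 1, "d5": 1}
relevant = set(gains)
ranking = ["d3", "d1", "d7", "d2"]

print(precision_recall(ranking, relevant))   # precision 0.5, recall 2/3
print(reciprocal_rank(ranking, relevant))    # 0.5
print(average_precision(ranking, relevant))  # (1/2 + 2/4) / 3 = 1/3
print(ndcg(ranking, gains))
```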
• Together, the collection, the topics, and the relevance judgments form a ‘test collection’
Browsing (Definition)
• Browsing supports free to semi-guided exploration of collections
- Object metadata allows for links between objects → clicking on a link
shows all other objects that share that property
- Exploration can also take place along other dimensions (e.g., temporal or
geographical)
- Taxonomies & ontologies can be used to link objects in different ways
- Users can explore with or without a direct information need
• Examples
- Exploring digital cultural heritage spaces in PATHS (Hall et al., 2012)
- Semantic portals for cultural heritage (Hyvönen, 2009)
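The metadata-based linking from the browsing definition above can be sketched as an inverted index from (field, value) pairs to objects. The records, field names, and the `follow_link` helper are hypothetical and only illustrate the idea; they do not describe how PATHS or any other cited system is built.

```python
from collections import defaultdict

# Hypothetical object metadata records.
objects = {
    "obj1": {"creator": "Anonymous", "type": "drawing", "period": "20th century"},
    "obj2": {"creator": "Anonymous", "type": "painting", "period": "17th century"},
    "obj3": {"creator": "A. Painter", "type": "drawing", "period": "20th century"},
}

# Inverted index: every (field, value) pair links to the objects that share it.
links = defaultdict(set)
for obj_id, metadata in objects.items():
    for field, value in metadata.items():
        links[(field, value)].add(obj_id)

def follow_link(obj_id, field):
    """Clicking a metadata value shows all other objects sharing that property."""
    value = objects[obj_id][field]
    return sorted(links[(field, value)] - {obj_id})

print(follow_link("obj1", "type"))     # other drawings
print(follow_link("obj1", "creator"))  # other objects by the same creator
```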
Example: Providing multiple paths through collection using PATHS
Browsing (Evaluation practice)
• What do we need?
- System-based evaluation of performance is hard to do → browsing is the
most user-focused of the four system types
- If historical interaction logs are available, then these could be used to
identify potential browsing ‘shortcuts’
• How do we evaluate?
- Known-item evaluation → Shortest path lengths to randomly selected
items can provide a hint about best possible outcome
‣ Needs to be complemented with user-based studies of actual browsing behavior!
- ‘Novel’ information need → User-based evaluation is required to draw any
meaningful conclusions (about satisfaction, effectiveness, and efficiency)
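The known-item idea above can be sketched with a breadth-first search over the browsing link graph. The `link_graph`, the `home` start page, and the sampling setup are hypothetical; the resulting click counts are a lower bound on effort, not a measurement of what users actually do.

```python
import random
from collections import deque

# Hypothetical browsing structure: page -> pages reachable with one click.
link_graph = {
    "home": ["paintings", "maps"],
    "paintings": ["17th century", "20th century"],
    "maps": ["20th century"],
    "17th century": ["obj2"],
    "20th century": ["obj1", "obj3"],
    "obj1": [], "obj2": [], "obj3": [],
}

def shortest_path_length(graph, start, target):
    """Minimum number of clicks from start to target (None if unreachable)."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

# Average best-case path length to a random sample of known items.
rng = random.Random(0)
sample = rng.sample(["obj1", "obj2", "obj3"], 2)
lengths = [shortest_path_length(link_graph, "home", item) for item in sample]
print(sum(lengths) / len(lengths))
```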
Recommendation (Definition)
• Recommender systems provide suggestions for new content
- Non-personalized → “More like this” functionality
- Personalized → Suggestions for new content based on past interactions
‣ System records implicit (or explicit) evidence of user interest (e.g., views,
bookmarks, prints, ...)
‣ Find interesting, related content based on content-based and/or social similarity &
generate a personalized ranking of the related content by training a model of the
users and item space
‣ User’s role is passive: interactions are recorded & suggestions are pushed on the user
• Examples
- Personalized museum tours (Ardissono et al., 2012; Bohnert et al., 2008;
De Gemmis et al., 2008; Wang et al., 2009)
Example: Personalized museum tours using CHIP
[Figure: Screenshot of the CHIP Recommender]
Recommendation (Evaluation practice)
• What do we need?
- User profiles for each user, containing a sufficiently large number (≥20) of
user preferences (views, plays, bookmarks, prints, ratings, etc.)
- Problematic in the start-up phase of a system, leading to the cold-start
problem
‣ Possible solution → combining multiple algorithms to provide recommendations until we
have collected enough information
• How do we evaluate?
- Backtesting (combination of information retrieval & machine learning evaluation)
‣ We hide a small number (e.g., 10) of a user’s preferences, train our algorithm and check
whether we can successfully predict interest in the ‘missing’ items
- Evaluation metrics are similar to search engine evaluation
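The backtesting procedure above can be sketched as follows. Everything here is an illustrative assumption: the toy `profiles`, the simple co-occurrence recommender, and the `hit_rate` measure (the fraction of hidden items recovered in the top k). A real evaluation would use much larger profiles and could plug in any of the ranking metrics used for search engines.

```python
import random
from collections import Counter

# Hypothetical user profiles: user -> set of items they showed interest in.
profiles = {
    "u1": {"a", "b", "c", "d"},
    "u2": {"a", "b", "c", "e"},
    "u3": {"b", "c", "d", "e"},
    "u4": {"a", "c", "d", "e"},
}

def split_profiles(profiles, n_hidden, seed=0):
    """Hide n_hidden randomly chosen preferences per user."""
    rng = random.Random(seed)
    train, hidden = {}, {}
    for user, items in profiles.items():
        held_out = set(rng.sample(sorted(items), n_hidden))
        hidden[user] = held_out
        train[user] = items - held_out
    return train, hidden

def recommend(user, train, k):
    """Score unseen items by the profile overlap of the users who have them."""
    scores = Counter()
    for other, items in train.items():
        if other == user:
            continue
        overlap = len(train[user] & items)
        for item in items - train[user]:
            scores[item] += overlap
    return [item for item, _ in scores.most_common(k)]

def hit_rate(profiles, n_hidden=1, k=2):
    """Fraction of hidden items that appear in the top-k recommendations."""
    train, hidden = split_profiles(profiles, n_hidden)
    hits = sum(len(set(recommend(u, train, k)) & hidden[u]) for u in profiles)
    return hits / sum(len(h) for h in hidden.values())

print(hit_rate(profiles))
```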
Enrichment (Definition)
• Enrichment covers all approaches that add extra layers of
information to collection objects
- Many different types of ‘added information’: entities, events, errors/
corrections, geo-tagging, clustering, etc.
- Typically use machine learning to predict additional information relevant for
an object
‣ Supervised learning uses labeled examples to learn patterns
‣ Unsupervised learning attempts to find patterns without examples
• Examples
- Automatically correcting database entry errors (Van den Bosch et al., 2009)
- Historical event detection in text (Cybulska & Vossen, 2011)
Example: Automatic error correction in databases
[Figure: details of an animal specimen database entry result]
Example: Historical event extraction from text
[Figure: Screenshot of an object page in the Agora Event Browsing Demonstrator, showing the object “Slachtoffers gemaakt door de Nederlandse troepen op weg naar Jogyakarta” (NG-1998-7-10) linked to the event “Tweede politionele actie”]
Enrichment (Evaluation practice)
• What do we need?
- Most enrichment approaches use machine learning algorithms to predict
which annotations to add to an object
- Data set with a large number (>1000) of labeled examples, each of which
contains different features about this object and the actual output label
- Including humans in a feedback loop can reduce the number of examples
needed for good performance, but results in a longer training phase
• How do we evaluate?
- Metrics from machine learning are commonly used
‣ Precision (what did we get right?) & Recall (what did we miss?)
‣ F-score (harmonic mean of Precision & Recall)
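A minimal sketch of these metrics for a single label, assuming gold-standard and predicted labels are available as parallel lists. The function name and the toy labels are illustrative only.

```python
def precision_recall_f1(gold, predicted, positive):
    """Precision, recall, and F-score for one label of interest."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if p == positive and g == positive)
    fp = sum(1 for g, p in pairs if p == positive and g != positive)
    fn = sum(1 for g, p in pairs if p != positive and g == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # The F-score is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example: does a sentence describe a historical event or not?
gold      = ["event", "other", "event", "event", "other", "event"]
predicted = ["event", "event", "other", "event", "other", "other"]

# 2 true positives, 1 false positive, 2 false negatives:
# precision = 2/3, recall = 1/2, F-score = 4/7
print(precision_recall_f1(gold, predicted, positive="event"))
```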
Challenges
• Propagation of errors
- Unlocking cultural heritage is inherently a multi-stage process
‣ Digitization → correction → enrichment → access
- Errors will propagate and influence all subsequent stages → difficult to
tease apart what caused errors at the later stage
‣ Only possible with additional manual labor!
• Language
- Historical spelling variants need to be detected and incorporated
- Multilinguality → many collections contain content in multiple languages,
which presents problems for both algorithms and evaluation
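To make the error-propagation point concrete, here is a back-of-the-envelope sketch. It assumes that each stage handles an object correctly with a fixed probability, that errors in different stages are independent, and that no later stage repairs an earlier mistake. The per-stage accuracies are invented for illustration and are not measurements.

```python
import math

# Hypothetical per-stage accuracies for the multi-stage pipeline.
stage_accuracy = {
    "digitization": 0.95,
    "correction": 0.98,
    "enrichment": 0.90,
}

def end_to_end_accuracy(stages):
    """Probability that an object passes every stage without error,
    assuming independent errors that are never repaired downstream."""
    return math.prod(stages.values())

# 0.95 * 0.98 * 0.90 = 0.8379: roughly one in six objects reaches
# the access stage carrying at least one upstream error.
print(round(end_to_end_accuracy(stage_accuracy), 4))
```

The product only says how many objects are affected; finding out which stage caused a given error still requires the manual inspection mentioned above.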
Challenges
• Measuring system performance still requires user input!
- Queries, relevance judgments, user preferences, pre-classified examples, ...
• Different groups provide different input affecting the performance →
how do we reach them and how do we strike a balance?
- Experts
‣ Interviews, observation
- Amateurs & enthusiasts
‣ Dedicated websites & online communities
- General public
‣ Search logs
Challenges
• Scaling up from cases to databases
- Can we scale up small-scale user-based evaluation to large-scale
system-based evaluation?
- Which evaluation aspects can we measure reliably?
- How much should the human be in the loop?
• No two cultural heritage systems are the same!
- Means evaluation needs to be tailored to each situation (in collaboration
with end users)
Case study: Social Book Search
• The Social Book Search track (2011-2015) is a search challenge
focused on book search & discovery
- Originally at INEX (2011-2014), now at CLEF (2015- )
• What do we need to investigate book search & discovery?
- Collection of book records
‣ Amazon/LibraryThing collection containing 2.8 million book metadata records
‣ Mix of metadata from Amazon and LibraryThing
‣ Controlled metadata from Library of Congress (LoC) and British Library (BL)
- Representative set of book requests & information needs
- Relevance judgments (preferably graded)
Challenge: Information needs & relevance judgments
• Getting a large, varied & representative set of book information
needs and relevance judgments is far from trivial!
- Each method has its own pros and cons in terms of realism and size
Method               Information needs   Relevance judgments   Size
Interviews           ✓                   ✓                     ✗
Surveys              ✓                   ✗                     ✓
Search engine logs   ✗                   ✗                     ✓
Web mining           ✓                   ✓                     ✓
Solution: Mining the LibraryThing fora
• Book discussion fora contain discussions on many different topics
- Analyses of single or related books
- Author discussions & comparisons
- Reading behavior discussions
- Requests for new books to read & discover
- Re-finding known but forgotten books
• Example: LibraryThing fora
Annotated LT topic
[Figure: LibraryThing forum topic annotated with its group name, topic title, narrative, and recommended books]
Solution: Mining the LibraryThing fora
• LibraryThing fora provided us with 10,000+ rich, realistic,
representative information needs captured in discussion threads
- Annotated 1000+ threads with additional aspects of the information needs
- Graded relevance judgments based on
‣ Number of mentions by other LibraryThing users
‣ Interest by original requester
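One way to turn these two signals into a graded judgment is sketched below. The thresholds and grade values are purely hypothetical and are not the scheme used in the Social Book Search track; the sketch only shows how mention counts and requester interest could be combined.

```python
def relevance_grade(n_mentions, requester_interested):
    """Hypothetical grading: books mentioned more often get a higher grade,
    and interest from the original requester adds a bonus."""
    if n_mentions == 0:
        return 0  # never suggested in the thread: treated as not relevant
    grade = 1 if n_mentions == 1 else 2
    if requester_interested:
        grade += 2
    return grade

# Hypothetical suggestions from one thread:
# (book id, mentions by other users, requester showed interest)
suggestions = [("book-a", 3, True), ("book-b", 1, False), ("book-c", 2, False)]
judgments = {book: relevance_grade(n, interested)
             for book, n, interested in suggestions}
print(judgments)
```

Graded judgments of this kind can then be fed directly into a metric such as nDCG.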
Catalog additions
[Figure: forum suggestions added after the topic was posted]
Not just true for the book domain!
Relevance for designing CH information systems
• Benefits
- Better understanding of the needs of amateurs, enthusiasts, and the
general public
- Easy & cheap way of collecting many examples of information needs
- Should not be seen as a substitute, but as an addition
• Caveat
- Example needs might not be available on the Web for every domain...
Conclusions
• Different types of systems require different evaluation approaches
• Many challenges exist that can influence performance
• Some of these challenges can be addressed by leveraging the power
and the breadth of the Web
Want to hear more about what we can learn from the Social
Book Search track? Come to our talk “Tagging vs. Controlled
Vocabulary: Which is More Helpful for Book Search?”
in the “Extracting, Comparing and Creating Book and Journal
Data” session (Wednesday, 10:30-12:00, Salon D)
Questions? Comments? Suggestions?

molar-distalization in orthodontics-seminar.pptx
 
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
 
Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
 
8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf
 
Compexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titrationCompexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titration
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
 
The cost of acquiring information by natural selection
The cost of acquiring information by natural selectionThe cost of acquiring information by natural selection
The cost of acquiring information by natural selection
 
HOW DO ORGANISMS REPRODUCE?reproduction part 1
HOW DO ORGANISMS REPRODUCE?reproduction part 1HOW DO ORGANISMS REPRODUCE?reproduction part 1
HOW DO ORGANISMS REPRODUCE?reproduction part 1
 
Sciences of Europe journal No 142 (2024)
Sciences of Europe journal No 142 (2024)Sciences of Europe journal No 142 (2024)
Sciences of Europe journal No 142 (2024)
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
 
23PH301 - Optics - Optical Lenses.pptx
23PH301 - Optics  -  Optical Lenses.pptx23PH301 - Optics  -  Optical Lenses.pptx
23PH301 - Optics - Optical Lenses.pptx
 
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
 
Direct Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart AgricultureDirect Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart Agriculture
 
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
 
Basics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different formsBasics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different forms
 

Measuring System Performance in Cultural Heritage Systems

  • 1. Measuring System Performance in Cultural Heritage Information Systems Toine Bogers Aalborg University Copenhagen, Denmark ‘Evaluating Cultural Heritage Information Systems’ workshop iConference 2015, Newport Beach March 24, 2015
  • 2. Outline • Types of cultural heritage (CH) information systems - Definition - Common evaluation practice • Challenges • Case study: Social Book Search track
  • 3. Types of cultural heritage information systems • Large variety in the types of cultural heritage collections → many different ways of unlocking this material • Four main types of cultural heritage information systems - Search - Browsing - Recommendation - Enrichment
  • 4. Search (Definition) • Search engines provide direct access to the collection - Search engine indexes representations of the collection objects (occasionally full-text) - User interacts by actively submitting queries describing their information need(s) - Search engine ranks the collection documents by (topical) relevance for the query • Examples - Making museum collection metadata accessible (Koolen et al., 2009) - Searching through war-time radio broadcasts (Heeren et al., 2007) - Unlocking television broadcast archives (Hollink et al., 2009)
  • 5. Example: Searching broadcast video archives (figure: screenshot of the GUI used for the system)
  • 6. Search (Evaluation practice) • What do we need? (together, a ‘test collection’) - Realistic collection of objects with textual representations - Representative set of real-world information needs → for reliable evaluation we typically need ≥50 topics - Relevance judgments → (semi-)complete list of correct answers for each of these topics, preferably from the original users • How do we evaluate? - Unranked → Precision (what did we get right?) & Recall (what did we miss?) - Ranked → MRR (where is the first relevant result?), MAP (are the relevant results all near the top?), and nDCG (are the most relevant results returned before the less relevant ones?)
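The metrics named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used in any of the cited studies: the function names, the document IDs, and the nDCG cutoff `k=10` are assumptions made for the example. MRR and MAP are the per-topic reciprocal rank and average precision averaged over all topics.

```python
import math

def precision_recall(retrieved, relevant):
    """Unranked: share of retrieved items that are relevant, and share of relevant items retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant result (0 if none is returned)."""
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def average_precision(ranking, relevant):
    """Mean of the precision values at each rank where a relevant result appears."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def ndcg(ranking, gains, k=10):
    """gains maps a document ID to its graded relevance; unjudged documents count as 0."""
    dcg = sum(gains.get(doc, 0) / math.log2(rank + 1)
              for rank, doc in enumerate(ranking[:k], start=1))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0

def mean_over_topics(metric, runs):
    """runs is a list of (ranking, judgments) pairs, one per topic; gives MRR or MAP."""
    return sum(metric(ranking, judged) for ranking, judged in runs) / len(runs)

# Hypothetical single-topic run: four returned documents, two of them relevant.
ranking = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(precision_recall(ranking, relevant))
print(reciprocal_rank(ranking, relevant), average_precision(ranking, relevant))
print(ndcg(ranking, {"d1": 2, "d2": 1}))
```

With at least 50 topics, as the slide recommends, `mean_over_topics(reciprocal_rank, runs)` and `mean_over_topics(average_precision, runs)` give MRR and MAP for a system.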
  • 7. Browsing (Definition) • Browsing supports free to semi-guided exploration of collections - Object metadata allows for links between objects → clicking on a link shows all other objects that share that property - Exploration can also take place along other dimensions (e.g., temporal or geographical) - Taxonomies & ontologies can be used to link objects in different ways - Users can explore with or without a direct information need • Examples - Exploring digital cultural heritage spaces in PATHS (Hall et al., 2012) - Semantic portals for cultural heritage (Hyvönen, 2009)
  • 8. Example: Providing multiple paths through a collection using PATHS
  • 9. Browsing (Evaluation practice) • What do we need? - System-based evaluation of performance is hard to do → browsing is the most user-focused of the four system types - If historical interaction logs are available, then these could be used to identify potential browsing ‘shortcuts’ • How do we evaluate? - Known-item evaluation → Shortest path lengths to randomly selected items can provide a hint about the best possible outcome ‣ Needs to be complemented with user-based studies of actual browsing behavior! - ‘Novel’ information need → User-based evaluation is required to draw any meaningful conclusions (about satisfaction, effectiveness, and efficiency)
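The known-item idea can be illustrated with a breadth-first search over the link structure of a browsing interface. This is only a sketch under assumptions: the `links` dictionary and its page names are invented, and the result is a lower bound on the number of clicks, not a model of how users actually browse.

```python
from collections import deque

def shortest_path_length(links, start, target):
    """Minimum number of clicks from start to target, or None if target is unreachable."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in links.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

# Hypothetical link graph: each page lists the pages it links to.
links = {
    "home": ["paintings", "maps"],
    "paintings": ["painting_1", "painting_2"],
    "maps": ["map_1"],
    "painting_1": ["map_1"],
}

for item in ["painting_2", "map_1"]:
    print(item, shortest_path_length(links, "home", item))
```

Averaging these lengths over a random sample of target items gives the best-case figure the slide refers to; the user studies it calls for are still needed to see how close real browsing comes to it.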
  • 10. Recommendation (Definition) • Recommender systems provide suggestions for new content - Non-personalized → “More like this” functionality - Personalized → Suggestions for new content based on past interactions ‣ System records implicit (or explicit) evidence of user interest (e.g., views, bookmarks, prints, ...) ‣ Find interesting, related content based on content-based and/or social similarity & generate a personalized ranking of the related content by training a model of the users and item space ‣ User’s role is passive: interactions are recorded & suggestions are pushed on the user • Examples - Personalized museum tours (Ardissono et al., 2012; Bohnert et al., 2008; De Gemmis et al., 2008; Wang et al., 2009)
  • 11. Example: Personalized museum tours using CHIP (figure: screenshot of the CHIP Recommender)
  • 12. Recommendation (Evaluation practice) • What do we need? - User profiles for each user, containing a sufficiently large number (≥20) of user preferences (views, plays, bookmarks, prints, ratings, etc.) - Problematic in the start-up phase of a system, leading to the cold-start problem ‣ Possible solution → combining multiple algorithms to provide recommendations until we have collected enough information • How do we evaluate? - Backtesting (combination of information retrieval & machine learning evaluation) ‣ We hide a small number (e.g., 10) of a user’s preferences, train our algorithm and check whether we can successfully predict interest in the ‘missing’ items - Evaluation metrics are similar to search engine evaluation
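The backtesting procedure can be sketched as follows. The co-occurrence recommender here is a deliberately simple stand-in chosen for the example, not the algorithm of any system mentioned in the slides, and the user profiles are invented. For determinism the sketch hides the last `n_hidden` items of each profile, whereas a real backtest would usually hide a random sample.

```python
from collections import Counter

def split_profile(profile, n_hidden):
    """Hold out the last n_hidden preferences (assumes 1 <= n_hidden < len(profile))."""
    return profile[:-n_hidden], profile[-n_hidden:]

def recommend(train_profiles, user, k):
    """Toy recommender: score unseen items by the overlap between the user and each other user who has them."""
    own = set(train_profiles[user])
    scores = Counter()
    for other, items in train_profiles.items():
        if other == user:
            continue
        overlap = len(own & set(items))
        if overlap == 0:
            continue
        for item in items:
            if item not in own:
                scores[item] += overlap
    # Sort by descending score, breaking ties alphabetically for reproducibility.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]

def backtest(profiles, n_hidden=1, k=3):
    """Fraction of hidden preferences that reappear in each user's top-k recommendations."""
    train, hidden = {}, {}
    for user, profile in profiles.items():
        train[user], hidden[user] = split_profile(profile, n_hidden)
    hits = total = 0
    for user in profiles:
        recs = recommend(train, user, k)
        hits += len(set(recs) & set(hidden[user]))
        total += len(hidden[user])
    return hits / total

# Hypothetical profiles; real ones would hold 20 or more preferences each.
profiles = {
    "ann": ["a", "b", "c", "d"],
    "bob": ["a", "b", "d", "c"],
    "cat": ["b", "c", "a", "d"],
    "dan": ["a", "e", "f"],
}
print(backtest(profiles, n_hidden=1, k=3))
```

The returned value is a recall-style score over the hidden items; the ranked metrics used for search evaluation can be applied to the same recommendation lists.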
  • 13. Enrichment (Definition) • Enrichment covers all approaches that add extra layers of information to collection objects - Many different types of ‘added information’: entities, events, errors/corrections, geo-tagging, clustering, etc. - Typically use machine learning to predict additional information relevant for an object ‣ Supervised learning uses labeled examples to learn patterns ‣ Unsupervised learning attempts to find patterns without examples • Examples - Automatically correcting database entry errors (Van den Bosch et al., 2009) - Historical event detection in text (Cybulska & Vossen, 2011)
  • 14. Example: Automatic error correction in databases (figure: details of an animal specimen database entry)
  • 15. Example: Historical event extraction from text (figure: screenshot of an object page in the Agora Event Browsing Demonstrator)
  • 16. Enrichment (Evaluation practice) • What do we need? - Most enrichment approaches use machine learning algorithms to predict which annotations to add to an object - Data set with a large number (>1000) of labeled examples, each of which contains different features about this object and the actual output label - Including humans in a feedback loop can reduce the number of examples needed for good performance, but results in a longer training phase • How do we evaluate? - Metrics from machine learning are commonly used ‣ Precision (what did we get right?) & Recall (what did we miss?) ‣ F-score (harmonic mean of Precision & Recall)
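For a labeling task such as error detection, these three metrics follow directly from counts of true positives, false positives, and false negatives. The sketch below assumes a single positive class and uses invented labels (`"error"` and `"ok"`); it is an illustration, not code from the cited work.

```python
def precision_recall_f1(gold, predicted, positive):
    """Precision, recall, and F-score for one positive class, from parallel label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical gold labels and predictions for five database entries.
gold = ["error", "ok", "error", "error", "ok"]
predicted = ["error", "error", "ok", "error", "ok"]
print(precision_recall_f1(gold, predicted, positive="error"))
```

Here two of the three predicted errors are real and two of the three real errors are found, so precision, recall, and F-score all come out at 2/3.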
  • 17. Challenges • Propagation of errors - Unlocking cultural heritage is inherently a multi-stage process ‣ Digitization → correction → enrichment → access - Errors will propagate and influence all subsequent stages → difficult to tease apart what caused errors at the later stage ‣ Only possible with additional manual labor! • Language - Historical spelling variants need to be detected and incorporated - Multilinguality → many collections contain content in multiple languages, which present problems for both algorithms and evaluation
  • 18. Challenges • Measuring system performance still requires user input! - Queries, relevance judgments, user preferences, pre-classified examples, ... • Different groups provide different input affecting the performance → how do we reach them and how do we strike a balance? - Experts ‣ Interviews, observation - Amateurs & enthusiasts ‣ Dedicated websites & online communities - General public ‣ Search logs
  • 19. Challenges • Scaling up from cases to databases - Can we scale up small-scale user-based evaluation to large-scale system-based evaluation? - Which evaluation aspects can we measure reliably? - How much should the human be in the loop? • No two cultural heritage systems are the same! - Means evaluation needs to be tailored to each situation (in collaboration with end users)
  • 20. Case study: Social Book Search • The Social Book Search track (2011-2015) is a search challenge focused on book search & discovery - Originally at INEX (2011-2014), now at CLEF (2015- ) • What do we need to investigate book search & discovery? - Collection of book records ‣ Amazon/LibraryThing collection containing 2.8 million book metadata records ‣ Mix of metadata from Amazon and LibraryThing ‣ Controlled metadata from Library of Congress (LoC) and British Library (BL) - Representative set of book requests & information needs - Relevance judgments (preferably graded)
  • 21. Challenge: Information needs & relevance judgments • Getting a large, varied & representative set of book information needs and relevance judgments is far from trivial! - Each method has its own pros and cons in terms of realism and size
      Method              | Information needs | Relevance judgments | Size
      Interview           | ✓                 | ✓                   | ✗
      Surveys             | ✓                 | ✗                   | ✓
      Search engine logs  | ✗                 | ✗                   | ✓
      Web mining          | ✓                 | ✓                   | ✓
  • 22. Solution: Mining the LibraryThing fora • Book discussion fora contain discussions on many different topics - Analyses of single or related books - Author discussions & comparisons - Reading behavior discussions - Requests for new books to read & discover - Re-finding known but forgotten books • Example: LibraryThing fora
  • 24. Annotated LT topic (figure: forum thread annotated with group name, topic title, narrative, and recommended books)
  • 25. Solution: Mining the LibraryThing fora • LibraryThing fora provided us with 10,000+ rich, realistic, representative information needs captured in discussion threads - Annotated 1000+ threads with additional aspects of the information needs - Graded relevance judgments based on ‣ Number of mentions by other LibraryThing users ‣ Interest by original requester
  • 26. Catalog additions (figure: forum suggestions added after the topic was posted)
  • 27. Not just true for the book domain!
  • 28. Relevance for designing CH information systems • Benefits - Better understanding of the needs of amateurs, enthusiasts, and the general public - Easy & cheap way of collecting many examples of information needs - Should not be seen as a substitute, but as an addition • Caveat - Example needs might not be available on the Web for every domain...
  • 29. Conclusions • Different types of systems require different evaluation approaches • Many challenges exist that can influence performance • Some of these challenges can be addressed by leveraging the power and the breadth of the Web • Want to hear more about what we can learn from the Social Book Search track? Come to our talk “Tagging vs. Controlled Vocabulary: Which is More Helpful for Book Search?” in the “Extracting, Comparing and Creating Book and Journal Data” session (Wednesday, 10:30-12:00, Salon D)