Measuring System Performance in
Cultural Heritage Information Systems
Toine Bogers
Aalborg University Copenhagen, Denmark
‘Evaluating Cultural Heritage Information Systems’ workshop
iConference 2015, Newport Beach
March 24, 2015
Outline
• Types of cultural heritage (CH) information systems
- Definition
- Common evaluation practice
• Challenges
• Case study: Social Book Search track
Types of cultural heritage information systems
• Large variety in the types of cultural heritage collections → many
different ways of unlocking this material
• Four main types of cultural heritage information systems
- Search
- Browsing
- Recommendation
- Enrichment
Search (Definition)
• Search engines provide direct access to the collection
- Search engine indexes representations of the collection objects
(occasionally full-text)
- User interacts by actively submitting queries describing their information
need(s)
- Search engine ranks the collection documents by (topical) relevance for
the query
• Examples
- Making museum collection metadata accessible (Koolen et al., 2009)
- Searching through war-time radio broadcasts (Heeren et al., 2007)
- Unlocking television broadcast archives (Hollink et al., 2009)
Example: Searching broadcast video archives
[Figure: GUI of the search system, with the high-level concept search part at top left]
Search (Evaluation practice)
• What do we need?
- Realistic collection of objects with textual representations
- Representative set of real-world information needs → for reliable
evaluation we typically need ≥50 topics
- Relevance judgments → (semi-)complete list of correct
answers for each of these topics, preferably from the original users
• How do we evaluate?
- Unranked → Precision (what did we get right?) & Recall (what did we miss?)
- Ranked → MRR (where is the first relevant result?), MAP (are the relevant
results all near the top?), and nDCG (are the most relevant results returned
before the less relevant ones?)
(Collection, information needs, and relevance judgments together form a ‘test collection’.)
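The ranked and unranked metrics listed above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for an established evaluation tool: the function names are hypothetical, relevance is treated as binary except for nDCG, and nDCG uses the linear-gain, log2-discount variant (other formulations exist).

```python
import math

def precision_recall(retrieved, relevant):
    """Unranked: what did we get right, and what did we miss?"""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant result (0 if none is returned)."""
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def average_precision(ranking, relevant):
    """Mean of the precision values at each rank holding a relevant result."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def ndcg(ranking, gains, k=10):
    """nDCG@k with graded gains (dict: doc -> grade), linear gain, log2 discount."""
    def dcg(values):
        return sum(g / math.log2(i + 2) for i, g in enumerate(values))
    actual = dcg([gains.get(doc, 0) for doc in ranking[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0

def mean_metric(metric, runs):
    """Average a per-topic metric over (ranking, relevant) pairs.

    mean_metric(reciprocal_rank, runs) gives MRR,
    mean_metric(average_precision, runs) gives MAP.
    """
    return sum(metric(ranking, relevant) for ranking, relevant in runs) / len(runs)
```

Each function scores a single topic; averaging over all topics in the test collection (for example the ≥50 topics mentioned above) gives the reported MRR and MAP.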
Browsing (Definition)
• Browsing supports free to semi-guided exploration of collections
- Object metadata allows for links between objects → clicking on a link
shows all other objects that share that property
- Exploration can also take place along other dimensions (e.g., temporal or
geographical)
- Taxonomies & ontologies can be used to link objects in different ways
- Users can explore with or without a direct information need
• Examples
- Exploring digital cultural heritage spaces in PATHS (Hall et al., 2012)
- Semantic portals for cultural heritage (Hyvönen, 2009)
Example: Providing multiple paths through collection using PATHS
Browsing (Evaluation practice)
• What do we need?
- System-based evaluation of performance is hard to do → browsing is the
most user-focused of the four system types
- If historical interaction logs are available, then these could be used to
identify potential browsing ‘shortcuts’
• How do we evaluate?
- Known-item evaluation → Shortest path lengths to randomly selected
items can provide a hint about the best possible outcome
‣ Needs to be complemented with user-based studies of actual browsing behavior!
- ‘Novel’ information need → User-based evaluation is required to draw any
meaningful conclusions (about satisfaction, effectiveness, and efficiency)
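The known-item idea above can be sketched as a breadth-first search over the link structure of a browsing interface. This is a minimal sketch under assumptions: the `links` dictionary (page → pages reachable in one click), the function names, and the toy page names are all hypothetical, and the result is only a lower bound on the clicks a real user would need.

```python
from collections import deque

def shortest_path_length(links, start, target):
    """Minimum number of clicks from start to target, or None if unreachable."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in links.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

def mean_clicks(links, start, targets):
    """Average shortest path length over the reachable target items."""
    lengths = [shortest_path_length(links, start, t) for t in targets]
    reachable = [n for n in lengths if n is not None]
    return sum(reachable) / len(reachable) if reachable else None

# Toy link graph (hypothetical): each key links to the listed pages.
links = {
    "home": ["paintings", "maps"],
    "paintings": ["p1", "p2"],
    "maps": ["m1"],
    "p1": ["m1"],
}
```

In practice the targets would be sampled at random from the collection, for example with `random.sample`, and the resulting averages compared across interface variants.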
Recommendation (Definition)
• Recommender systems provide suggestions for new content
- Non-personalized → “More like this” functionality
- Personalized → Suggestions for new content based on past interactions
‣ System records implicit (or explicit) evidence of user interest (e.g., views,
bookmarks, prints, ...)
‣ Find interesting, related content based on content-based and/or social similarity &
generate a personalized ranking of the related content by training a model of the
users and item space
‣ User’s role is passive: interactions are recorded & suggestions are pushed on the user
• Examples
- Personalized museum tours (Ardissono et al., 2012; Bohnert et al., 2008;
De Gemmis et al., 2008; Wang et al., 2009)
Example: Personalized museum tours using CHIP
[Figure: Screenshot of the CHIP Recommender]
Recommendation (Evaluation practice)
• What do we need?
- User profiles for each user, containing a sufficiently large number (≥20) of
user preferences (views, plays, bookmarks, prints, ratings, etc.)
- Problematic in the start-up phase of a system, leading to the cold-start
problem
‣ Possible solution → combining multiple algorithms to provide recommendations until we
have collected enough information
• How do we evaluate?
- Backtesting (combination of information retrieval & machine learning evaluation)
‣ We hide a small number (e.g., 10) of a user’s preferences, train our algorithm and check
whether we can successfully predict interest in the ‘missing’ items
- Evaluation metrics are similar to search engine evaluation
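The backtesting procedure above can be sketched as follows. This is an illustration under assumptions, not the method of any cited system: the co-occurrence recommender is a deliberately simple placeholder, all function names are hypothetical, and the score reported is the fraction of hidden items recovered in the top k (recall@k).

```python
import random

def split_profile(items, n_hidden, rng):
    """Hide n_hidden randomly chosen preferences; return (visible, hidden)."""
    items = sorted(items)          # sort first so the shuffle is reproducible
    rng.shuffle(items)
    return set(items[n_hidden:]), set(items[:n_hidden])

def recommend_cooccurrence(visible, other_profiles, k):
    """Placeholder recommender: score unseen items by profile overlap."""
    scores = {}
    for profile in other_profiles:
        overlap = len(visible & profile)
        if overlap:
            for item in profile - visible:
                scores[item] = scores.get(item, 0) + overlap
    # Highest score first; ties broken by item id for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]

def backtest(profiles, n_hidden=2, k=3, seed=0):
    """Mean recall@k over all users with enough preferences to hide some."""
    rng = random.Random(seed)
    recalls = []
    for user, items in profiles.items():
        if len(items) <= n_hidden:
            continue               # too few preferences: cold-start user
        visible, hidden = split_profile(items, n_hidden, rng)
        others = [set(p) for u, p in profiles.items() if u != user]
        recs = recommend_cooccurrence(visible, others, k)
        recalls.append(len(set(recs) & hidden) / n_hidden)
    return sum(recalls) / len(recalls) if recalls else 0.0
```

Because the output is a ranked list per user, the same ranked metrics used for search (MRR, MAP, nDCG) can be computed over the hidden items instead of recall@k.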
Enrichment (Definition)
• Enrichment covers all approaches that add extra layers of
information to collection objects
- Many different types of ‘added information’: entities, events, errors/
corrections, geo-tagging, clustering, etc.
- Typically use machine learning to predict additional information relevant for
an object
‣ Supervised learning uses labeled examples to learn patterns
‣ Unsupervised learning attempts to find patterns without examples
• Examples
- Automatically correcting database entry errors (Van den Bosch et al., 2009)
- Historical event detection in text (Cybulska & Vossen, 2011)
Example: Automatic error correction in databases
[Figure: Details of an animal specimen database entry result]
Example: Historical event extraction from text
[Figure: Screenshot of an object page in the Agora Event Browsing Demonstrator]
Enrichment (Evaluation practice)
• What do we need?
- Most enrichment approaches use machine learning algorithms to predict
which annotations to add to an object
- Data set with a large number (>1000) of labeled examples, each of which
contains features describing the object and the actual output label
- Including humans in a feedback loop can reduce the number of examples
needed for good performance, but results in a longer training phase
• How do we evaluate?
- Metrics from machine learning are commonly used
‣ Precision (what did we get right?) & Recall (what did we miss?)
‣ F-score (harmonic mean of Precision & Recall)
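For a single annotation label, the three metrics above can be computed from gold and predicted labels as in this minimal sketch. The function name and the example labels are hypothetical; the F-score shown is the balanced F1 variant.

```python
def precision_recall_f1(gold, predicted, positive):
    """Precision, recall, and F1 for one label, given parallel label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical example: does a sentence mention a historical event?
gold = ["event", "none", "event", "event", "none"]
predicted = ["event", "event", "none", "event", "none"]
scores = precision_recall_f1(gold, predicted, positive="event")
```

Here the classifier finds two of the three true events and raises one false alarm, so precision, recall, and F1 all come out at 2/3.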
Challenges
• Propagation of errors
- Unlocking cultural heritage is inherently a multi-stage process
‣ Digitization → correction → enrichment → access
- Errors will propagate and influence all subsequent stages → difficult to
tease apart what caused errors at the later stage
‣ Only possible with additional manual labor!
• Language
- Historical spelling variants need to be detected and incorporated
- Multilinguality → many collections contain content in multiple languages,
which presents problems for both algorithms and evaluation
Challenges
• Measuring system performance still requires user input!
- Queries, relevance judgments, user preferences, pre-classified examples, ...
• Different user groups provide different input, which affects measured
performance → how do we reach them and how do we strike a balance?
- Experts
‣ Interviews, observation
- Amateurs & enthusiasts
‣ Dedicated websites & online communities
- General public
‣ Search logs
Challenges
• Scaling up from cases to databases
- Can we scale up small-scale user-based evaluation to large-scale system-
based evaluation?
- Which evaluation aspects can we measure reliably?
- How much should the human be in the loop?
• No two cultural heritage systems are the same!
- Means evaluation needs to be tailored to each situation (in collaboration
with end users)
Case study: Social Book Search
• The Social Book Search track (2011-2015) is a search challenge
focused on book search & discovery
- Originally at INEX (2011-2014), now at CLEF (2015- )
• What do we need to investigate book search & discovery?
- Collection of book records
‣ Amazon/LibraryThing collection containing 2.8 million book metadata records
‣ Mix of metadata from Amazon and LibraryThing
‣ Controlled metadata from Library of Congress (LoC) and British Library (BL)
- Representative set of book requests & information needs
- Relevance judgments (preferably graded)
Challenge: Information needs & relevance judgments
• Getting a large, varied & representative set of book information
needs and relevance judgments is far from trivial!
- Each method has its own pros and cons in terms of realism and size
Method               Information needs   Relevance judgments   Size
Interviews           ✓                   ✓                     ✗
Surveys              ✓                   ✗                     ✓
Search engine logs   ✗                   ✗                     ✓
Web mining           ✓                   ✓                     ✓
Solution: Mining the LibraryThing fora
• Book discussion fora contain discussions on many different topics
- Analyses of single or related books
- Author discussions & comparisons
- Reading behavior discussions
- Requests for new books to read & discover
- Re-finding known but forgotten books
• Example: LibraryThing fora
Annotated LT topic
[Figure: LibraryThing forum topic annotated with group name, topic title, narrative, and recommended books]
Solution: Mining the LibraryThing fora
• LibraryThing fora provided us with 10,000+ rich, realistic,
representative information needs captured in discussion threads
- Annotated 1000+ threads with additional aspects of the information needs
- Graded relevance judgments based on
‣ Number of mentions by other LibraryThing users
‣ Interest by original requester
Catalog additions
[Figure annotation: forum suggestions added after the topic was posted]
Not just true for the book domain!
Relevance for designing CH information systems
• Benefits
- Better understanding of the needs of amateurs, enthusiasts, and the
general public
- Easy & cheap way of collecting many examples of information needs
- Should not be seen as a substitute, but as an addition
• Caveat
- Example needs might not be available on the Web for every domain...
Conclusions
• Different types of systems require different evaluation approaches
• Many challenges exist that can influence performance
• Some of these challenges can be addressed by leveraging the power
and the breadth of the Web
Want to hear more about what we can learn from the Social
Book Search track? Come to our talk “Tagging vs. Controlled
Vocabulary: Which is More Helpful for Book Search?”
in the “Extracting, Comparing and Creating Book and Journal
Data” session (Wednesday, 10:30-12:00, Salon D)
Questions? Comments? Suggestions?
CONTRIBUTION OF PANCHANAN MAHESHWARI.pptxCONTRIBUTION OF PANCHANAN MAHESHWARI.pptx
CONTRIBUTION OF PANCHANAN MAHESHWARI.pptx
 
Genome organization in virus,bacteria and eukaryotes.pptx
Genome organization in virus,bacteria and eukaryotes.pptxGenome organization in virus,bacteria and eukaryotes.pptx
Genome organization in virus,bacteria and eukaryotes.pptx
 
Information science research with large language models: between science and ...
Information science research with large language models: between science and ...Information science research with large language models: between science and ...
Information science research with large language models: between science and ...
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
 
Site specific recombination and transposition.........pdf
Site specific recombination and transposition.........pdfSite specific recombination and transposition.........pdf
Site specific recombination and transposition.........pdf
 
Sequence submission tools ............pptx
Sequence submission tools ............pptxSequence submission tools ............pptx
Sequence submission tools ............pptx
 
Taphonomy and Quality of the Fossil Record
Taphonomy and Quality of the  Fossil RecordTaphonomy and Quality of the  Fossil Record
Taphonomy and Quality of the Fossil Record
 
GBSN - Biochemistry (Unit 3) Metabolism
GBSN - Biochemistry (Unit 3) MetabolismGBSN - Biochemistry (Unit 3) Metabolism
GBSN - Biochemistry (Unit 3) Metabolism
 
Efficient spin-up of Earth System Models usingsequence acceleration
Efficient spin-up of Earth System Models usingsequence accelerationEfficient spin-up of Earth System Models usingsequence acceleration
Efficient spin-up of Earth System Models usingsequence acceleration
 
ABHISHEK ANTIBIOTICS PPT MICROBIOLOGY // USES OF ANTIOBIOTICS TYPES OF ANTIB...
ABHISHEK ANTIBIOTICS PPT MICROBIOLOGY  // USES OF ANTIOBIOTICS TYPES OF ANTIB...ABHISHEK ANTIBIOTICS PPT MICROBIOLOGY  // USES OF ANTIOBIOTICS TYPES OF ANTIB...
ABHISHEK ANTIBIOTICS PPT MICROBIOLOGY // USES OF ANTIOBIOTICS TYPES OF ANTIB...
 
GBSN - Microbiology (Unit 4) Concept of Asepsis
GBSN - Microbiology (Unit 4) Concept of AsepsisGBSN - Microbiology (Unit 4) Concept of Asepsis
GBSN - Microbiology (Unit 4) Concept of Asepsis
 
Method of Quantifying interactions and its types
Method of Quantifying interactions and its typesMethod of Quantifying interactions and its types
Method of Quantifying interactions and its types
 
SaffronCrocusGenomicsThessalonikiOnlineMay2024TalkOnline.pptx
SaffronCrocusGenomicsThessalonikiOnlineMay2024TalkOnline.pptxSaffronCrocusGenomicsThessalonikiOnlineMay2024TalkOnline.pptx
SaffronCrocusGenomicsThessalonikiOnlineMay2024TalkOnline.pptx
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
 

Measuring System Performance in Cultural Heritage Systems

  • 1. Measuring System Performance in Cultural Heritage Information Systems Toine Bogers Aalborg University Copenhagen, Denmark ‘Evaluating Cultural Heritage Information Systems’ workshop iConference 2015, Newport Beach March 24, 2015
  • 2. Outline • Types of cultural heritage (CH) information systems - Definition - Common evaluation practice • Challenges • Case study: Social Book Search track
  • 3. Types of cultural heritage information systems • Large variety in the types of cultural heritage collections → many different ways of unlocking this material • Four main types of cultural heritage information systems - Search - Browsing - Recommendation - Enrichment
  • 4. Search (Definition) • Search engines provide direct access to the collection - Search engine indexes representations of the collection objects (occasionally full-text) - User interacts by actively submitting queries describing their information need(s) - Search engine ranks the collection documents by (topical) relevance for the query • Examples - Making museum collection metadata accessible (Koolen et al., 2009) - Searching through war-time radio broadcasts (Heeren et al., 2007) - Unlocking television broadcast archives (Hollink et al., 2009)
  • 5. Example: Searching broadcast video archives (figure: screenshot of the GUI used for the system, with the high-level concept search part at the top left)
  • 6. Search (Evaluation practice) • What do we need? (together, these three components form a ‘test collection’) - Realistic collection of objects with textual representations - Representative set of real-world information needs → for reliable evaluation we typically need ≥50 topics - Relevance judgments → (semi-)complete list of correct answers for each of these topics, preferably from the original users • How do we evaluate? - Unranked → Precision (what did we get right?) & Recall (what did we miss?) - Ranked → MRR (where is the first relevant result?), MAP (are the relevant results all near the top?), and nDCG (are the most relevant results returned before the less relevant ones?)
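The metrics named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used in any of the cited work: the document identifiers, the toy ranking, and the log2 discount in nDCG are assumptions chosen for the example.

```python
import math

def precision_recall(retrieved, relevant):
    """Set-based precision and recall for an unranked result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant result (0 if none is retrieved)."""
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def average_precision(ranking, relevant):
    """Average of the precision values at each rank holding a relevant result."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def ndcg(ranking, gains, k=10):
    """nDCG@k with graded gains (dict: doc -> grade) and a log2 rank discount."""
    def dcg(grades):
        return sum(g / math.log2(i + 2) for i, g in enumerate(grades))
    actual = dcg([gains.get(doc, 0) for doc in ranking[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0

def mean_over_topics(metric, runs, judgments):
    """MRR and MAP are the per-topic values averaged over all topics."""
    return sum(metric(runs[t], judgments[t]) for t in runs) / len(runs)

# Toy topic (hypothetical identifiers): two relevant documents, four retrieved.
ranking = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(precision_recall(ranking, relevant))   # (0.5, 1.0)
print(reciprocal_rank(ranking, relevant))    # 0.5
print(average_precision(ranking, relevant))  # 0.5
print(ndcg(ranking, {"d1": 2, "d2": 1}))
```

With ≥50 topics, `mean_over_topics(reciprocal_rank, runs, judgments)` would give MRR and `mean_over_topics(average_precision, runs, judgments)` would give MAP, where `runs` and `judgments` are hypothetical dictionaries keyed by topic.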
  • 7. Browsing (Definition) • Browsing supports free to semi-guided exploration of collections - Object metadata allows for links between objects → clicking on a link shows all other objects that share that property - Exploration can also take place along other dimensions (e.g., temporal or geographical) - Taxonomies & ontologies can be used to link objects in different ways - Users can explore with or without a direct information need • Examples - Exploring digital cultural heritage spaces in PATHS (Hall et al., 2012) - Semantic portals for cultural heritage (Hyvönen, 2009)
  • 8. Example: Providing multiple paths through the collection using PATHS
  • 9. Browsing (Evaluation practice) • What do we need? - System-based evaluation of performance is hard to do → browsing is the most user-focused of the four system types - If historical interaction logs are available, then these could be used to identify potential browsing ‘shortcuts’ • How do we evaluate? - Known-item evaluation → Shortest path lengths to randomly selected items can provide a hint about best possible outcome ‣ Needs to be complemented with user-based studies of actual browsing behavior! - ‘Novel’ information need → User-based evaluation is required to draw any meaningful conclusions (about satisfaction, effectiveness, and efficiency)
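The known-item idea above can be sketched as a breadth-first search over the link structure of a browsing interface. The sketch assumes the interface can be represented as a dictionary mapping each page to the pages it links to; the page names and the function names are hypothetical.

```python
import random
from collections import deque

def shortest_path_lengths(links, start):
    """Breadth-first search: minimum number of clicks from `start` to each reachable page."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, ()):
            if nxt not in dist:
                dist[nxt] = dist[page] + 1
                queue.append(nxt)
    return dist

def known_item_baseline(links, start, items, sample_size, seed=0):
    """Mean shortest path length to a random sample of target items.

    Items that cannot be reached by browsing at all are returned separately.
    """
    rng = random.Random(seed)
    targets = rng.sample(sorted(items), min(sample_size, len(items)))
    dist = shortest_path_lengths(links, start)
    reached = [dist[t] for t in targets if t in dist]
    unreachable = [t for t in targets if t not in dist]
    mean = sum(reached) / len(reached) if reached else float("nan")
    return mean, unreachable

# Toy link graph (hypothetical): a home page, two facets, and four objects.
links = {
    "home": ["paintings", "maps"],
    "paintings": ["obj1", "obj2"],
    "maps": ["obj3"],
    "obj1": ["obj2"],
}
print(known_item_baseline(links, "home", {"obj1", "obj2", "obj3", "obj4"}, 4))
```

The resulting number is only a lower bound on the clicks a user would need, which is why the slide asks for it to be complemented with user-based studies.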
  • 10. Recommendation (Definition) • Recommender systems provide suggestions for new content - Non-personalized → “More like this” functionality - Personalized → Suggestions for new content based on past interactions ‣ System records implicit (or explicit) evidence of user interest (e.g., views, bookmarks, prints, ...) ‣ Find interesting, related content based on content-based and/or social similarity & generate a personalized ranking of the related content by training a model of the users and item space ‣ User’s role is passive: interactions are recorded & suggestions are pushed to the user • Examples - Personalized museum tours (Ardissono et al., 2012; Bohnert et al., 2008; De Gemmis et al., 2008; Wang et al., 2009)
  • 11. Example: Personalized museum tours using CHIP (figure: screenshot of the CHIP Recommender)
  • 12. Recommendation (Evaluation practice) • What do we need? - User profiles for each user, containing a sufficiently large number (≥20) of user preferences (views, plays, bookmarks, prints, ratings, etc.) - Problematic in the start-up phase of a system, leading to the cold-start problem ‣ Possible solution → combining multiple algorithms to provide recommendations until we have collected enough information • How do we evaluate? - Backtesting (combination of information retrieval & machine learning evaluation) ‣ We hide a small number (e.g., 10) of a user’s preferences, train our algorithm and check whether we can successfully predict interest in the ‘missing’ items - Evaluation metrics are similar to search engine evaluation
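The backtesting procedure above can be sketched as follows. The co-occurrence recommender is only a stand-in for whatever algorithm is actually under evaluation, and the profile format (a dictionary from user to a set of item identifiers) and all names are assumptions made for this illustration.

```python
import random
from collections import Counter

def split_profile(items, n_hidden, rng):
    """Hide `n_hidden` randomly chosen preferences; the rest stay visible."""
    hidden = set(rng.sample(sorted(items), n_hidden))
    return set(items) - hidden, hidden

def recommend_cooccurrence(visible, other_profiles, k):
    """Placeholder recommender: score unseen items by how strongly the
    profiles containing them overlap with the user's visible items."""
    scores = Counter()
    for other in other_profiles:
        overlap = len(visible & other)
        if overlap:
            for item in other - visible:
                scores[item] += overlap
    return [item for item, _ in scores.most_common(k)]

def backtest(profiles, n_hidden=1, k=5, seed=0):
    """Return mean precision@k and mean recall of the hidden items over all users."""
    rng = random.Random(seed)
    precisions, recalls = [], []
    for user, items in profiles.items():
        if len(items) <= n_hidden:
            continue  # too few preferences to hide any
        visible, hidden = split_profile(items, n_hidden, rng)
        others = [set(p) for u, p in profiles.items() if u != user]
        recs = recommend_cooccurrence(visible, others, k)
        hits = len(set(recs) & hidden)
        precisions.append(hits / k)
        recalls.append(hits / n_hidden)
    n = len(precisions)
    return (sum(precisions) / n, sum(recalls) / n) if n else (0.0, 0.0)

# Toy profiles (hypothetical): every user likes the same three objects.
profiles = {u: {"a", "b", "c"} for u in ("u1", "u2", "u3")}
print(backtest(profiles, n_hidden=1, k=5))
```

In practice the hidden items would also be fed into the ranked metrics used for search (for example MAP or nDCG), since the slide notes that the evaluation metrics are similar.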
  • 13. Enrichment (Definition) • Enrichment covers all approaches that add extra layers of information to collection objects - Many different types of ‘added information’: entities, events, errors/corrections, geo-tagging, clustering, etc. - Typically use machine learning to predict additional information relevant for an object ‣ Supervised learning uses labeled examples to learn patterns ‣ Unsupervised learning attempts to find patterns without examples • Examples - Automatically correcting database entry errors (Van den Bosch et al., 2009) - Historical event detection in text (Cybulska & Vossen, 2011)
  • 14. Example: Automatic error correction in databases (figure: details of an animal specimen database entry)
  • 15. Example: Historical event extraction from text (figure: screenshot of an object page in the Agora Event Browsing Demonstrator)
  • 16. Enrichment (Evaluation practice) • What do we need? - Most enrichment approaches use machine learning algorithms to predict which annotations to add to an object - Data set with a large number (>1000) of labeled examples, each of which contains features describing the object and the actual output label - Including humans in a feedback loop can reduce the number of examples needed for good performance, but results in a longer training phase • How do we evaluate? - Metrics from machine learning are commonly used ‣ Precision (what did we get right?) & Recall (what did we miss?) ‣ F-score (harmonic mean of Precision & Recall)
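As a small worked example of these metrics, the sketch below scores predicted annotations against gold-standard labels, treating each annotation as an (object, label) pair. The object identifiers and labels are invented for illustration only.

```python
def precision_recall_f1(predicted, gold):
    """Precision, recall, and F-score (harmonic mean) over (object, label) pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical annotations: 6 gold pairs, 4 predicted pairs, 3 of them correct.
gold = {("obj1", "event"), ("obj1", "place"), ("obj2", "person"),
        ("obj2", "event"), ("obj3", "place"), ("obj3", "person")}
predicted = {("obj1", "event"), ("obj1", "place"), ("obj2", "person"),
             ("obj3", "event")}
print(precision_recall_f1(predicted, gold))  # precision 0.75, recall 0.5, F-score ≈ 0.6
```

Because the F-score is a harmonic mean, it stays close to the lower of the two values, so a system cannot score well by maximizing only precision or only recall.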
  • 17. Challenges • Propagation of errors - Unlocking cultural heritage is inherently a multi-stage process ‣ Digitization → correction → enrichment → access - Errors will propagate and influence all subsequent stages → difficult to tease apart what caused errors at the later stage ‣ Only possible with additional manual labor! • Language - Historical spelling variants need to be detected and incorporated - Multilinguality → many collections contain content in multiple languages, which poses problems for both algorithms and evaluation
  • 18. Challenges • Measuring system performance still requires user input! - Queries, relevance judgments, user preferences, pre-classified examples, ... • Different groups provide different input affecting the performance → how do we reach them and how do we strike a balance? - Experts ‣ Interviews, observation - Amateurs & enthusiasts ‣ Dedicated websites & online communities - General public ‣ Search logs
  • 19. Challenges • Scaling up from cases to databases - Can we scale up small-scale user-based evaluation to large-scale system-based evaluation? - Which evaluation aspects can we measure reliably? - How much should the human be in the loop? • No two cultural heritage systems are the same! - Means evaluation needs to be tailored to each situation (in collaboration with end users)
  • 20. Case study: Social Book Search • The Social Book Search track (2011-2015) is a search challenge focused on book search & discovery - Originally at INEX (2011-2014), now at CLEF (2015- ) • What do we need to investigate book search & discovery? - Collection of book records ‣ Amazon/LibraryThing collection containing 2.8 million book metadata records ‣ Mix of metadata from Amazon and LibraryThing ‣ Controlled metadata from Library of Congress (LoC) and British Library (BL) - Representative set of book requests & information needs - Relevance judgments (preferably graded)
  • 21. Challenge: Information needs & relevance judgments • Getting a large, varied & representative set of book information needs and relevance judgments is far from trivial! - Each method has its own pros and cons in terms of realism and size
      Method              | Information needs | Relevance judgments | Size
      Interview           | ✓                 | ✓                   | ✗
      Surveys             | ✓                 | ✗                   | ✓
      Search engine logs  | ✗                 | ✗                   | ✓
      Web mining          | ✓                 | ✓                   | ✓
  • 22. Solution: Mining the LibraryThing fora • Book discussion fora contain discussions on many different topics - Analyses of single or related books - Author discussions & comparisons - Reading behavior discussions - Requests for new books to read & discover - Re-finding known but forgotten books • Example: LibraryThing fora
  • 24. Annotated LT topic (figure labels: Group name, Topic title, Narrative, Recommended books)
  • 25. Solution: Mining the LibraryThing fora • LibraryThing fora provided us with 10,000+ rich, realistic, representative information needs captured in discussion threads - Annotated 1000+ threads with additional aspects of the information needs - Graded relevance judgments based on ‣ Number of mentions by other LibraryThing users ‣ Interest by original requester
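To make the idea of grading by forum evidence concrete, here is a purely illustrative sketch. The slides do not give the grading scheme used in the track, so the 0-3 scale, the thresholds, and the use of catalog additions as the signal of requester interest are all assumptions, as are the function names and book identifiers.

```python
from collections import Counter

def relevance_grade(n_mentions, requester_interested):
    """Map forum evidence to a relevance grade on a hypothetical 0-3 scale."""
    if n_mentions <= 0:
        return 0                          # never suggested in the thread
    grade = 1 if n_mentions == 1 else 2   # suggested once vs. by several users
    if requester_interested:
        grade += 1                        # assumed bonus for requester interest
    return grade

def grade_thread(mentions, requester_catalog_additions):
    """Grade every book suggested in one discussion thread.

    `mentions` lists one book id per suggestion made by other users;
    `requester_catalog_additions` holds the books the requester later
    added to their catalog (an assumed proxy for interest).
    """
    counts = Counter(mentions)
    return {
        book: relevance_grade(n, book in requester_catalog_additions)
        for book, n in counts.items()
    }

# Hypothetical thread: b1 suggested twice, b2 and b3 once, b3 later catalogued.
print(grade_thread(["b1", "b2", "b1", "b3"], {"b3"}))
```

Graded judgments produced this way can be used directly as the gain values in nDCG.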
  • 26. Catalog additions (figure: forum suggestions added after the topic was posted)
  • 27. Not just true for the book domain!
  • 28. Relevance for designing CH information systems • Benefits - Better understanding of the needs of amateurs, enthusiasts, and the general public - Easy & cheap way of collecting many examples of information needs - Should not be seen as a substitute, but as an addition • Caveat - Example needs might not be available on the Web for every domain...
  • 29. Conclusions • Different types of systems require different evaluation approaches • Many challenges exist that can influence performance • Some of these challenges can be addressed by leveraging the power and the breadth of the Web • Want to hear more about what we can learn from the Social Book Search track? Come to our talk “Tagging vs. Controlled Vocabulary: Which is More Helpful for Book Search?” in the “Extracting, Comparing and Creating Book and Journal Data” session (Wednesday, 10:30-12:00, Salon D)