The document discusses using linked open data to diversify search results for a cultural heritage collection. It presents a case study of the Rijksmuseum collection, which has been published as Linked Data connected to external vocabularies. A first experiment matches query terms from the museum's search logs to literals in the datasets. A second experiment applies an existing semantic search algorithm and finds that adding external semantics increases the number of results and clusters per query, as well as the path lengths of cluster definitions, indicating more diverse search results. However, not all vocabularies proved equally useful for diversification. Future work could incorporate relevance assessments in the evaluation, create variants of the graph search algorithm, and run experiments on integrated collections.
Using Linked Data to diversify search results: a case study in cultural heritage
1. Using Linked Data to diversify search results: a case study in cultural heritage
Chris Dijkshoorn
Lora Aroyo
Guus Schreiber
Jan Wielemaker
Lizzy Jongma
2. Uses of diverse search results
Provide diverse search results to enable exploration
Provide clusters of results with different topics to address ambiguity
Hypothesis
External semantics can provide the means to diversify search results while providing context
4. Cultural heritage collection
Rijksmuseum Amsterdam
‣ ~1,000,000 objects
‣ Works by Dutch Masters such as Rembrandt
The collection online
‣ Published as Linked Data
‣ ~550,000 object records
‣ ~250,000 images
5. Links to external datasets
Subject matter
‣ Iconclass
‣ IOC World Bird List
Type and format of artwork
‣ Art & Architecture Thesaurus
‣ WordNet
Creators
‣ Union List of Artist Names
6. Statistics of datasets linked to collection
Dataset sizes
‣ Rijksmuseum Collection: 543,816 artworks
‣ Iconclass: 39,578 concepts
‣ Art and Architecture Thesaurus: 38,619 concepts
‣ Union List of Artist Names: 113,768 persons
‣ WordNet: 115,424 concepts
‣ IOC World Bird List: 34,197 concepts
Links and alignments shown in the figure
‣ 708,287 total, 143 distinct
‣ 306,872 total, 11,945 distinct
‣ 186,126 total, 7,977 distinct
‣ 2,293 alignments
‣ 148 alignments
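The statistics distinguish total from distinct link counts. A minimal sketch of that distinction, assuming links are available as (artwork, property, concept) triples, that "total" counts every artwork-to-concept link and that "distinct" counts the different concepts linked to. The triples and the `link_statistics` helper are hypothetical, not taken from the Rijksmuseum data.

```python
from collections import defaultdict

# Hypothetical link triples (artwork, linking property, vocabulary concept).
# All identifiers are illustrative placeholders, not real collection records.
LINKS = [
    ("rma:artwork1", "dc:creator", "ulan:person1"),
    ("rma:artwork2", "dc:creator", "ulan:person1"),
    ("rma:artwork2", "dc:subject", "ic:concept1"),
    ("rma:artwork3", "dc:subject", "ic:concept1"),
    ("rma:artwork3", "dc:type", "aat:concept1"),
]


def link_statistics(links):
    """Return {vocabulary prefix: (total links, distinct linked concepts)}.

    Assumption: "total" counts every artwork-to-concept link, while
    "distinct" counts the different concepts those links point to.
    """
    totals = defaultdict(int)
    targets = defaultdict(set)
    for _artwork, _prop, concept in links:
        vocab = concept.split(":", 1)[0]  # the prefix names the vocabulary
        totals[vocab] += 1
        targets[vocab].add(concept)
    return {vocab: (totals[vocab], len(targets[vocab])) for vocab in totals}


for vocab, (total, distinct) in sorted(link_statistics(LINKS).items()):
    print(f"{vocab}: {total} total, {distinct} distinct")
```

A large gap between the two numbers means many artworks share a few concepts, which is one reading of the conclusion that the number of distinct links matters for diversification.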
7. Linked Data and search result diversity
Influence of external semantics on search functionality
8. Experiment: match queries with datasets
Investigate how many query terms match literals in the triple store
‣ Collect query terms
‣ Check whether each query matches a literal in the dataset
‣ Record the percentage of query terms that match, per dataset
Goal: investigate the potential of external vocabularies
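The matching steps can be sketched as follows, assuming each dataset is reduced to a set of literal strings and that a query term matches when it equals a literal after case-folding and whitespace normalization. The normalization rule, the dataset contents, and the function names are hypothetical; the slides do not state the exact matching criteria.

```python
def normalize(text):
    """Case-fold and collapse whitespace (an assumed matching rule)."""
    return " ".join(text.casefold().split())


def match_percentage(queries, literals):
    """Percentage of query terms that equal some literal of one dataset."""
    if not queries:
        return 0.0
    index = {normalize(literal) for literal in literals}
    hits = sum(1 for query in queries if normalize(query) in index)
    return 100.0 * hits / len(queries)


# Hypothetical datasets, each reduced to the set of its literal values.
DATASETS = {
    "collection": ["Rembrandt van Rijn", "Nachtwacht"],
    "wordnet": ["ship", "dog", "windmill"],
    "ioc": ["Grey Heron", "Common Kingfisher"],
}

queries = ["rembrandt van rijn", "Ship", "dog", "grey  heron", "vermeer"]
for name, literals in DATASETS.items():
    print(f"{name}: {match_percentage(queries, literals):.1f}%")
```

Exact matching is the strictest choice; substring or stemmed matching would raise the percentages, so the rule should be fixed before comparing vocabularies.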
9. Query logs
Use one month of Rijksmuseum website query logs as input
Split the logs into three parts by query frequency
‣ High frequency: 2,393 queries
‣ Medium frequency: 16,963 queries
‣ Low frequency: 25,303 queries
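One way to produce such a split is to count how often each distinct query occurs in the log and bucket the queries by that count. A minimal sketch, assuming queries are compared after case-folding; `high_min` and `medium_min` are hypothetical thresholds, since the slides do not give the boundaries used for the Rijksmuseum logs.

```python
from collections import Counter


def split_by_frequency(log, high_min, medium_min):
    """Split distinct queries into high/medium/low frequency buckets.

    high_min and medium_min are hypothetical thresholds: a query seen at
    least high_min times is "high", at least medium_min times is "medium",
    and anything rarer is "low".
    """
    counts = Counter(" ".join(q.casefold().split()) for q in log)
    splits = {"high": [], "medium": [], "low": []}
    for query, n in counts.most_common():  # most frequent queries first
        if n >= high_min:
            splits["high"].append(query)
        elif n >= medium_min:
            splits["medium"].append(query)
        else:
            splits["low"].append(query)
    return splits


log = ["Rembrandt"] * 5 + ["Vermeer"] * 2 + ["nachtwacht", "Delft"]
print(split_by_frequency(log, high_min=5, medium_min=2))
```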
10. Query terms that match with a label in the vocabularies
[Figure: percentage of query terms matching a literal, per query frequency split (high, medium, low, and H − M − L), for each set of loaded vocabularies: Art and Architecture Thesaurus, WordNet, Union List of Artist Names, Iconclass, IOC World Bird List, and collection only]
11. Experiment: diversifying search results
Use an existing semantic search algorithm [1]
Investigate the effects of added external semantics on
‣ Overall number of results
‣ Number of clusters found
‣ Path length of cluster definitions
[1] Jan Wielemaker, Michiel Hildebrand, Jacco van Ossenbruggen and Guus Schreiber. Thesaurus-Based Search in Large Heterogeneous Collections. ISWC 2008.
12. Semantic Search Algorithm
Steps of the algorithm
‣ Match the user query with literals in the index
‣ Find paths to artworks using the graph structure
‣ Use the paths to create clusters
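The three steps above can be sketched on a small in-memory graph. This is a simplified illustration under stated assumptions, not the implementation from Wielemaker et al. (ISWC 2008): the graph is a hypothetical list of (subject, predicate, object) triples, literals are quoted strings, artworks are recognized by a hypothetical `rma:` prefix, a breadth-first search follows edges in either direction up to `max_depth`, and artworks are clustered by the sequence of predicates on the path that reaches them.

```python
from collections import defaultdict, deque

# Hypothetical triples (subject, predicate, object); literals are quoted.
TRIPLES = [
    ("ulan:rubensPP", "skos:prefLabel", '"Rubens, Peter Paul"'),
    ("ulan:rubensPP", "skos:altLabel", '"Paolo Rubens"'),
    ("ulan:rubensPh", "skos:prefLabel", '"Rubens, Philip"'),
    ("ulan:rubensPh", "ulan:siblingOf", "ulan:rubensPP"),
    ("rma:art1", "dc:creator", "ulan:rubensPP"),
    ("rma:art2", "dc:subject", "ulan:rubensPP"),
]

ARTWORK_PREFIX = "rma:"  # assumption: artworks are recognized by this prefix


def build_adjacency(triples):
    """Index every edge in both directions (edge direction is ignored here)."""
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
        adj[o].append((p, s))
    return adj


def search(query, triples, max_depth=4):
    """Return {predicate path: set of artworks} for literals containing query."""
    adj = build_adjacency(triples)
    q = query.casefold()
    # Step 1: match the query with literals (here: case-insensitive substring).
    starts = [o for _s, _p, o in triples if o.startswith('"') and q in o.casefold()]
    clusters = defaultdict(set)
    for literal in starts:
        # Step 2: breadth-first search from the literal towards artworks;
        # each node is visited once per start, i.e. via a shortest path.
        queue = deque([(literal, ())])
        seen = {literal}
        while queue:
            node, path = queue.popleft()
            if node.startswith(ARTWORK_PREFIX):
                # Step 3: the predicate sequence defines the cluster.
                clusters[path].add(node)
                continue
            if len(path) >= max_depth:
                continue
            for pred, nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + (pred,)))
    return dict(clusters)


result = search("rubens", TRIPLES)
for path in sorted(result):
    print(" -> ".join(path), sorted(result[path]))
```

Clustering by the full predicate path keeps artworks reached through a sibling or an alternative label apart from direct matches, which is what lets longer paths add diversity.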
14. Paths, Clusters and Diversity
[Figure: example paths from literals matching "Rubens", such as "Rubens, Peter Paul", "Paolo Rubens", "Rubens, Philip" and "Rubens Peale", to artworks via dc:creator, dc:subject, skos:altLabel, ulan:siblingOf and ulan:teacherOf; other nodes on the paths include "Peale, James" and "West, Benjamin"]
Use path length and number of clusters as indicators of search result diversity
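Given clusters keyed by their defining path, these indicators reduce to simple counts. A minimal sketch, assuming a cluster definition is the tuple of predicates on its path and that path length is the number of predicates (edges) in it; the example clusters and both helper functions are hypothetical.

```python
from statistics import mean

# Hypothetical clusters for one query: defining path (tuple of predicates)
# mapped to the artworks in that cluster.
CLUSTERS = {
    ("dc:title",): {"rma:artwork3"},
    ("skos:altLabel", "dc:creator"): {"rma:artwork1", "rma:artwork4"},
    ("skos:prefLabel", "ulan:siblingOf", "dc:subject"): {"rma:artwork2"},
}


def diversity_indicators(clusters):
    """Return (number of results, number of clusters, mean path length).

    Assumption: path length is the number of predicates (edges) in the
    path that defines a cluster.
    """
    results = set().union(*clusters.values())  # an artwork counts once
    lengths = [len(path) for path in clusters]
    return len(results), len(clusters), (mean(lengths) if lengths else 0.0)


def mean_clusters_per_query(per_query_clusters):
    """Average number of clusters over the queries of one frequency split."""
    if not per_query_clusters:
        return 0.0
    return mean(len(clusters) for clusters in per_query_clusters)


print(diversity_indicators(CLUSTERS))
print(mean_clusters_per_query([CLUSTERS, {}]))
```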
15. Number of clusters per query
[Figure: number of clusters per query for the Rijksmuseum collection alone (rma) and with all vocabularies loaded (all), for the high, medium and low query frequency splits]
16. Path lengths of cluster definitions
[Figure: percentage of paths per path length (1, 2, 3, 4, 5, 6+) for each loaded vocabulary: rma, edm, aat, wn, ulan, ic, ioc, and all vocs]
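A distribution like the one on slide 16 can be produced by bucketing the path length of each cluster definition and normalizing to percentages per vocabulary configuration. A minimal sketch, assuming path lengths are positive integers and that lengths of six or more share one bucket, as the "6+" label suggests; the sample lengths and the `path_length_distribution` helper are hypothetical.

```python
from collections import Counter


def path_length_distribution(path_lengths, cap=6):
    """Percentage of paths per length; lengths >= cap share the "cap+" bucket."""
    labels = [str(n) for n in range(1, cap)] + [f"{cap}+"]
    counts = Counter(str(n) if n < cap else f"{cap}+" for n in path_lengths)
    total = len(path_lengths)
    return {
        label: (100.0 * counts[label] / total if total else 0.0)
        for label in labels
    }


# Hypothetical path lengths collected for one vocabulary configuration.
lengths = [1, 2, 2, 3, 7, 9, 6, 4]
print(path_length_distribution(lengths))
```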
17. Conclusions
Linked Data can be used to diversify search results
But not every vocabulary is useful
Hypothesis: search result diversification is influenced by
1. the number of distinct links to vocabularies
2. the richness of the internal links between vocabulary objects
19. Future Work
Incorporate relevance assessments in the evaluation
Create variants of the graph search algorithm
Run experiments on integrated collections
20. Using Linked Data to diversify search results: a case study in cultural heritage
https://github.com/rasvaan/cluster_search_experimental_data/
http://sealinc.ops.few.vu.nl/clustersearch/
Chris Dijkshoorn c.r.dijkshoorn@vu.nl http://chrisdijkshoorn.nl