Using Linked Data to diversify search results
a case study in cultural heritage
Chris Dijkshoorn
Lora Aroyo
Guus Schreiber
Jan Wielemaker
Lizzy Jongma
Uses of diverse search results 
Provide diverse search results to 
enable exploration 
Provide clusters of results with 
different topics to address 
ambiguity
Hypothesis 
External semantics can provide 
the means to diversify search 
results while providing context
Cultural heritage collection 
Rijksmuseum Amsterdam 
‣ ~1,000,000 objects 
‣ Dutch Masters like Rembrandt 
The collection online 
‣ Published as Linked Data 
‣ ~550,000 object records 
‣ ~250,000 images
Links to external datasets 
Subject matter 
‣ Iconclass 
‣ IOC World Bird List 
Type and format of artwork 
‣ Art & Architecture Thesaurus 
‣ Wordnet 
Creators 
‣ Union List of Artist Names
Statistics of datasets linked to collection
‣ Iconclass: 39,578 concepts
‣ Art and Architecture Thesaurus: 38,619 concepts
‣ Union List of Artist Names: 113,768 persons
‣ Wordnet: 115,424 concepts
‣ IOC World Bird List: 34,197 concepts
‣ Rijksmuseum Collection: 543,816 artworks
[Figure: link and alignment counts between the collection and the datasets, in the order extracted from the diagram: 708,287 total, 143 distinct, 306,872 total, 11,945 distinct, 2,293 alignments, 7,977 distinct, 148 alignments, 186,126 total]
Linked Data and search result 
diversity 
Influence of external semantics on
search functionality
Experiment: match queries with datasets 
Investigate how many query terms match literals in triple store 
‣ Collect query terms 
‣ See if query matches a literal in dataset 
‣ Record percentage of query terms that match per dataset
Goal: investigate potential of external vocabularies
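
The matching experiment can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes exact matching after lowercasing and whitespace normalization, and the function names, vocabulary keys and toy labels are all hypothetical.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching ignores case and spacing."""
    return " ".join(text.lower().split())


def match_percentages(queries, labels_by_vocab):
    """Return, per vocabulary, the percentage of queries equal to one of its labels."""
    normalized_queries = [normalize(q) for q in queries]
    result = {}
    for vocab, labels in labels_by_vocab.items():
        label_set = {normalize(label) for label in labels}
        hits = sum(1 for q in normalized_queries if q in label_set)
        total = len(normalized_queries)
        result[vocab] = 100.0 * hits / total if total else 0.0
    return result


# Hypothetical toy data; the real experiment used the museum's query logs
# and the literals of each loaded vocabulary.
queries = ["Rembrandt", "night watch", "heron", "still life"]
labels_by_vocab = {
    "ulan": ["Rembrandt", "Rubens, Peter Paul"],
    "ioc": ["Heron", "Kingfisher"],
    "aat": ["still life", "etching"],
}
percentages = match_percentages(queries, labels_by_vocab)
```

Exact matching is the simplest choice; a triple store's literal index may also support stemming or prefix matching, which would raise the percentages.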
Query logs 
Use one month of Rijksmuseum website query logs as input 
Split logs into three parts 
‣ High frequency 2,393 queries 
‣ Medium frequency 16,963 queries
‣ Low frequency 25,303 queries
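
One way to produce such splits is to count how often each distinct query occurs and bucket it by count. The slides do not state the cut-off values, so the thresholds `high_min` and `medium_min` below are purely hypothetical, as are the function name and the toy log.

```python
from collections import Counter


def split_by_frequency(log_entries, high_min=10, medium_min=2):
    """Bucket distinct queries into high, medium and low frequency splits.

    The thresholds are assumptions for illustration only.
    """
    counts = Counter(q.strip().lower() for q in log_entries if q.strip())
    splits = {"high": [], "medium": [], "low": []}
    for query, n in counts.items():
        if n >= high_min:
            splits["high"].append(query)
        elif n >= medium_min:
            splits["medium"].append(query)
        else:
            splits["low"].append(query)
    return splits


# Hypothetical toy log: one query string per logged search.
log = ["rembrandt"] * 12 + ["Vermeer"] * 3 + ["delft tiles"]
splits = split_by_frequency(log)
```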
Query terms that match with a label in the vocabularies
[Figure: percentage of query terms matching a literal (0% to 100%) per query frequency split (High Split, Medium Split, Low Split, H − M − L), for each set of loaded vocabularies: Art and Architecture Thesaurus, Wordnet, Union List of Artist Names, Iconclass, IOC World Bird List, Collection only]
Experiment: diversifying search 
results 
Use existing semantic search
algorithm [1]
Investigate the effects of added
external semantics on
‣ Overall number of results
‣ Number of found clusters
‣ Path length of cluster definitions
[1] Thesaurus-Based Search in Large Heterogeneous
Collections, Jan Wielemaker, Michiel Hildebrand, Jacco
van Ossenbruggen and Guus Schreiber, ISWC 2008
Semantic Search Algorithm 
Steps of the algorithm
‣ Match user query with literals in index 
‣ Find paths to artwork using graph structure 
‣ Use path to create clusters
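
The three steps above can be sketched on a toy graph. This is a simplified illustration under stated assumptions, not the algorithm from the cited ISWC 2008 paper: it uses substring matching, a breadth-first traversal that ignores edge direction, a hypothetical `max_depth` cut-off, and clusters keyed by the sequence of properties on the path. All identifiers and data are made up.

```python
from collections import defaultdict, deque


def search(query, literals, edges, artworks, max_depth=4):
    """Cluster artworks by the property path that links them to a matching literal.

    literals: (resource, property, text) triples with a literal object.
    edges:    (subject, property, object) triples between resources.
    artworks: set of resource ids that count as search results.
    """
    q = query.lower()
    # Step 1: match the user query with literals.
    starts = [(res, (prop,)) for res, prop, text in literals if q in text.lower()]

    # Index edges in both directions (an assumption made for simplicity).
    neighbours = defaultdict(list)
    for s, p, o in edges:
        neighbours[s].append((p, o))
        neighbours[o].append((p, s))

    clusters = defaultdict(set)
    for start, start_path in starts:
        # Step 2: find paths from the matched resource to artworks.
        queue = deque([(start, start_path)])
        seen = {start}
        while queue:
            node, path = queue.popleft()
            if node in artworks:
                # Step 3: the path defines the cluster.
                clusters[path].add(node)
                continue
            if len(path) >= max_depth:
                continue
            for prop, nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + (prop,)))
    return dict(clusters)


# Hypothetical toy graph, loosely modelled on the Rubens example.
literals = [
    ("ulan:rubens_pp", "skos:altLabel", "Paolo Rubens"),
    ("ulan:peale_rubens", "skos:altLabel", "Rubens Peale"),
    ("art:2", "dc:subject", "Rubens, Philip"),
]
edges = [
    ("art:1", "dc:creator", "ulan:rubens_pp"),
    ("ulan:peale_rubens", "ulan:siblingOf", "ulan:peale_james"),
    ("art:3", "dc:creator", "ulan:peale_james"),
]
clusters = search("rubens", literals, edges, {"art:1", "art:2", "art:3"})
```

On this toy graph a single query term yields three clusters with different path definitions, which is the kind of spread the experiment measures.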
[Figure: example literals ”Rubens, Peter Paul” and ”Rubens, Philip”, with a dc:subject edge]
Paths, Clusters and Diversity
[Figure: example graph. Resources: Rubens, Peter Paul; Peale, Rubens; Peale, James; Benjamin, West. Literals: ”Paolo Rubens”, ”Rubens Peale”. Properties: skos:altLabel, ulan:siblingOf, ulan:teacherOf, dc:creator]
Use path length and number of clusters as indicators for
search result diversity
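
The two indicators can be computed from the cluster definitions of a query. A minimal sketch, assuming each cluster is identified by its path (a tuple of property names) and that path length is the number of properties on that path; the "6+" bucket mirrors the legend of the path length chart, and the function name and sample paths are hypothetical.

```python
from collections import Counter


def diversity_indicators(cluster_paths):
    """Return the number of clusters and the percentage of paths per length bucket."""
    buckets = Counter(
        "6+" if len(path) >= 6 else str(len(path)) for path in cluster_paths
    )
    total = len(cluster_paths)
    distribution = {b: 100.0 * n / total for b, n in buckets.items()} if total else {}
    return {"clusters": total, "path_length_pct": distribution}


# Hypothetical cluster definitions for one query.
paths = [
    ("dc:subject",),
    ("skos:altLabel", "dc:creator"),
    ("skos:prefLabel", "dc:creator"),
    ("skos:altLabel", "ulan:siblingOf", "dc:creator"),
]
indicators = diversity_indicators(paths)
```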
Number of clusters per query
[Figure: number of clusters (axis ticks 2 to 14) per combination of loaded vocabularies and query frequency split: rma high, all high, rma med, all med, rma low, all low]
Path lengths of cluster definitions
[Figure: percentage of paths (0% to 100%) per path length (1, 2, 3, 4, 5, 6+) for each vocabulary: rma, edm, aat, wn, ulan, ic, ioc, all vocs]
Conclusions 
Linked Data can be used to 
diversify search results 
But not every vocabulary is useful 
Hypothesis: search result
diversification is influenced by
1. the number of distinct links to 
vocabularies
2. the richness of the internal links 
between vocabulary objects
Future Work 
Incorporate relevance 
assessments in evaluation 
Create variants of the graph search
algorithm
Run experiments on integrated 
collections
Using Linked Data to diversify search results 
a case study in cultural heritage 
https://github.com/rasvaan/cluster_search_experimental_data/ 
http://sealinc.ops.few.vu.nl/clustersearch/ 
Chris Dijkshoorn c.r.dijkshoorn@vu.nl http://chrisdijkshoorn.nl
