Published in: Education, Technology
Transcript of "A candidate dataset_discovery_and_linkage_recommendation_system_for_linked_data"

  1. A Candidate Dataset Discovery and Linkage Recommendation System for Linked Data
     OC WG Meeting – 26.6.2012
     Michael Luger
  2. Outline
     ● Introduction
     ● Linked Data
     ● Candidate Dataset Selection
     ● Linkage Recommendation
     ● Conclusion
  3. Introduction
     ● Maturing of Semantic Web technologies
     ● Growing amount of structured data becoming available as Linked Data
       – Data publishers from a variety of domains should be encouraged to participate
     ● Establishment of links between Linked Data resources → Data Linking
       – Draws influences from Record Linkage, Ontology Matching
       – Need for solutions to facilitate this process
     ● Data Linking can benefit from the exploitation of various information sources
       – Available Metadata
       – Ontology Alignments
       – Statistical Information
       – User Input
  4. Linked Data – Linking Open Data Initiative
     ● 300 datasets
     ● 30 billion triples
     ● 500 million RDF links
     [Figure: LOD Cloud – September 2011]
  5. Linked Data – Publishing Workflow
     [Diagram labels: Ontology Selection, Dataset Selection, Conversion, Matcher Configuration, Publication, Matching, Interlinking, Application]
  6. Dataset Discovery – Overview
     ● Goal – Given a source dataset, discover candidate target datasets from the LOD cloud
     [Diagram:
       Source Dataset → Information Extraction → Representative Information Sources (Keywords, Schema Elements, Topics, ...)
       Set of LOD Datasets → Information Extraction (pre-computed) → Representative Information Sources (Keywords, Schema Elements, Topics, ...)
       User → User-supplied Information Sources
       → Comparison of Information Sources
       → Results – Top ranked similar Datasets from the set of LOD Datasets]
  7. Dataset Discovery – Metadata Initiatives (i)
     ● CKAN
       – Data hub platform covering LOD datasets
       – Allows organizations to publish metadata about their data in a structured format
     [Figure: Example – CKAN metadata published about the dataset bbc-music]
  8. Dataset Discovery – Metadata Initiatives (ii)
     ● Ontologies for Linked Data
       – VoID – Vocabulary of Interlinked Datasets
         ● RDFS vocabulary for describing linked datasets
         ● Main concepts – void:Dataset, void:Linkset
       – LOV – Linked Open Vocabularies
         ● RDFS vocabulary for describing ontologies used by linked datasets
       – VOAF – Vocabulary of a Friend
         ● Extension of LOV
         ● Describes relationships between ontologies, as well as topic classifications
  9. Dataset Discovery – Metadata Initiatives (iii)
     [Figure: Example – LOV / VOAF metadata about the Music Ontology]
  10. Dataset Discovery – Information Sources
      ● Schema Elements
        – Classes and properties, extracted from datasets using SPARQL
        – Ontology Alignments
      ● Keywords
        – Class and property labels, extracted from schema information
        – CKAN metadata (tags, groups, …)
        – User-supplied keywords
      ● Vocabularies
        – Derived from schema elements
        – Retrieved from VoID metadata
      ● Topics
        – Retrieved from LOV / VOAF metadata
        – User-supplied topics
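The slides do not show the queries used to extract schema elements, so the following is only a sketch of what such extraction might look like. The query texts, the `schema_queries` helper, and the `namespace_of` helper (one possible way to derive vocabularies from schema elements) are all assumptions made for illustration.

```python
# Hypothetical SPARQL templates: the actual queries are not given in the slides.
CLASS_QUERY = """
SELECT DISTINCT ?class WHERE {
  ?s a ?class .
}
LIMIT %d
"""

PROPERTY_QUERY = """
SELECT DISTINCT ?property WHERE {
  ?s ?property ?o .
}
LIMIT %d
"""


def schema_queries(limit=1000):
    """Return (class_query, property_query) to send to a dataset's SPARQL endpoint."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return CLASS_QUERY % limit, PROPERTY_QUERY % limit


def namespace_of(uri):
    """Cut a class or property URI after its last '#' or '/' to get the vocabulary namespace."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[:cut + 1] if cut >= 0 else uri


def vocabularies(schema_elements):
    """Derive the set of vocabularies used by a dataset from its schema elements."""
    return {namespace_of(uri) for uri in schema_elements}
```

Sending the queries to an endpoint is left out on purpose; any SPARQL client could be used for that step.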
  11. Dataset Discovery – Computation
      ● Computation of overlap score O between the source dataset D_S and each potential target dataset D_T
      ● D_S … Source Dataset
      ● D_T … Target Dataset
      ● KW_DS, KW_DT, SE_DS, SE_DT, T_DS, T_DT … Sets of Keywords, Schema Elements, Topics
      ● kw_DS, kw_DT, se_DS, se_DT, t_DS, t_DT … Individual Keywords, Schema Elements, Topics
      ● M_kw, M_se, M_t … Matching Predicates
      ● w_kw, w_se, w_t … Weights
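The formula for the overlap score O did not survive in the transcript. The sketch below shows one plausible reading built from the listed symbols: for each information source, take the fraction of source items matched by at least one target item under the matching predicate, then combine the fractions as a weighted average. The exact form, the normalisation by the weight sum, and all function names are assumptions.

```python
def matched_fraction(source_items, target_items, matches):
    """Fraction of source items that match at least one target item."""
    if not source_items:
        return 0.0
    hits = sum(
        1 for s in source_items
        if any(matches(s, t) for t in target_items)
    )
    return hits / len(source_items)


def overlap_score(source, target, predicates, weights):
    """Weighted average of per-source matched fractions (assumed form of O)."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    score = sum(
        weights[kind] * matched_fraction(
            source.get(kind, set()), target.get(kind, set()), predicates[kind]
        )
        for kind in weights
    )
    return score / total_weight


def rank_candidates(source, candidates, predicates, weights, top_n=5):
    """Score every candidate target dataset and return the top-ranked ones."""
    scored = [
        (name, overlap_score(source, info, predicates, weights))
        for name, info in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Illustrative data only: keys "kw", "se", "t" stand for keywords,
# schema elements and topics; the dataset contents are made up.
predicates = {
    "kw": lambda a, b: a.lower() == b.lower(),
    "se": lambda a, b: a == b,
    "t": lambda a, b: a == b,
}
weights = {"kw": 1.0, "se": 2.0, "t": 1.0}
source = {"kw": {"music", "artist"}, "se": {"mo:MusicArtist"}, "t": {"media"}}
candidates = {
    "A": {"kw": {"Music", "album"}, "se": {"mo:MusicArtist", "mo:Record"}, "t": {"media"}},
    "B": {"kw": {"protein"}, "se": {"bio:Gene"}, "t": {"life sciences"}},
}
ranking = rank_candidates(source, candidates, predicates, weights)
```

Keeping the weights as plain parameters reflects the observation in the results that the computation works well in combination with user input.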
  12. Dataset Discovery – Results
      ● CKAN Metadata
        – Complete but can skew results (non-representative keywords)
      ● VoID Descriptions
        – Not available for all the datasets
      ● Schema Elements
        – Problems with retrieval for some datasets
        – Ontology alignments improve results significantly
        – Potential through application of ontology matching
      ● Topics
        – LOV provides a very general classification
        – Potential through application of Upper Level Ontologies
      ● Computation works well in combination with user input (weights, custom input)
  13. Linkage Recommendation – Overview
      ● Matching Linked Data resources → Data Linking
        – Given two datasets, links between their resources are established by means of assessing similarity through matching
        – Reliance on well-established value-matching techniques
        – Adapted to RDF-type data characteristics (graph-like structure, heterogeneous vocabularies, large collections)
      ● Existing data linking tools
        – Fully automated approaches exhibit limited applicability
        – Semi-automated approaches demand manual configuration → Linkage Specification
      ● Goal
        – User interactive environment for creating linkage specifications
        – Recommendation of linkage specifications
        – Exploitation of available information sources
        – Ability to perform matching and evaluate results in an iterative way
  14. Linkage Recommendation – Specification (Silk LSL)

      <Prefixes>
        <Prefix id="rdfs" namespace="http://www.w3.org/2000/01/rdf-schema#" />
        <Prefix id="dbpedia" namespace="http://dbpedia.org/ontology/" />
        <Prefix id="gn" namespace="http://www.geonames.org/ontology#" />
      </Prefixes>
      <DataSources>
        <DataSource id="dbpedia">
          <Param name="endpointURI" value="http://example.org/sparql" />
          <Param name="graph" value="http://dbpedia.org" />
        </DataSource>
        <DataSource id="geonames">
          <Param name="endpointURI" value="http://example2.org/sparql" />
          <Param name="graph" value="http://sws.geonames.org/" />
        </DataSource>
      </DataSources>
      <Interlinks>
        <Interlink id="cities">
          <LinkType>owl:sameAs</LinkType>
          <SourceDataset dataSource="dbpedia" var="a">
            <RestrictTo> ?a rdf:type dbpedia:City </RestrictTo>
          </SourceDataset>
          <TargetDataset dataSource="geonames" var="b">
            <RestrictTo> ?b rdf:type gn:P </RestrictTo>
          </TargetDataset>
          <LinkageRule>
            <Aggregate type="average">
              <Compare metric="jaro">
                <Input path="?a/rdfs:label" />
                <Input path="?b/gn:name" />
              </Compare>
              <Compare metric="num">
                <Input path="?a/dbpedia:populationTotal" />
                <Input path="?b/gn:population" />
              </Compare>
            </Aggregate>
          </LinkageRule>
          <...>
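To make the linkage rule in the specification more concrete, here is a rough Python sketch of what "average of a string comparison and a numeric comparison" amounts to. It is not Silk's implementation: `difflib.SequenceMatcher` stands in for the Jaro metric, the numeric similarity is an assumed relative-difference measure, and the acceptance threshold, record layout, and example URIs are hypothetical.

```python
from difflib import SequenceMatcher


def string_similarity(a, b):
    """Stand-in for the Jaro metric: difflib's similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def numeric_similarity(x, y):
    """Assumed numeric metric: 1 minus the relative difference, clamped at 0."""
    if x == y:
        return 1.0
    largest = max(abs(x), abs(y))
    return max(0.0, 1.0 - abs(x - y) / largest)


def link_confidence(a, b):
    """Average the two comparisons, mirroring <Aggregate type="average">."""
    scores = [
        string_similarity(a["label"], b["name"]),
        numeric_similarity(a["populationTotal"], b["population"]),
    ]
    return sum(scores) / len(scores)


def recommend_links(source, target, threshold=0.9):
    """Return (source URI, target URI, confidence) for pairs above the threshold."""
    links = []
    for a in source:
        for b in target:
            confidence = link_confidence(a, b)
            if confidence >= threshold:
                links.append((a["uri"], b["uri"], confidence))
    return links


# Made-up records, shaped after the properties named in the rule.
cities_a = [{"uri": "http://example.org/a/Berlin", "label": "Berlin",
             "populationTotal": 3500000}]
cities_b = [
    {"uri": "http://example.org/b/1", "name": "Berlin", "population": 3500000},
    {"uri": "http://example.org/b/2", "name": "Paris", "population": 2200000},
]
links = recommend_links(cities_a, cities_b)
```

The nested loop compares every pair, which is fine for a sketch but would need some form of blocking or indexing for large collections.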
  15. Linkage Recommendation – Information Sources (i)
      ● User input
        – Tabular view for browsing the contents of two chosen datasets
        – Filtering of contents by specifying constraints
        – Ability to specify and execute linkage specification
        – Display of additional information sources inline
      [Diagram labels: User; Alignment Server (Storage / Retrieval); Matcher & Analysis Service; Source Dataset and Target Dataset, each with Classes, Properties and a SPARQL Endpoint]
  16. Linkage Recommendation – Information Sources (ii)
      ● Ontology Alignments
        – Management of ontology correspondences as part of the user interface
      ● Statistics
        – Property Discriminability & Coverage

          $\mathrm{dis}(p) = \dfrac{|\{\, o \mid t = \langle i, p, o \rangle \,\}|}{|\{\, t \mid t = \langle i, p, o \rangle \,\}|}$

          $\mathrm{cov}(p) = \dfrac{|\{\, i \mid t = \langle i, p, o \rangle \,\}|}{|\{\, i \mid t = \langle i, *, o \rangle \,\}|}$

        – Sample Property Pair Analysis
          ● Instance-based ontology matching based on instance samples
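The two property statistics can be read as follows: discriminability is the number of distinct object values of a property divided by the number of triples using that property, and coverage is the number of instances having the property divided by the number of all instances. A minimal sketch under that reading, with triples modelled as plain `(instance, property, object)` tuples (a representation chosen here for illustration):

```python
def discriminability(triples, prop):
    """Distinct object values of prop divided by the number of triples using prop."""
    values = [o for (_, p, o) in triples if p == prop]
    if not values:
        return 0.0
    return len(set(values)) / len(values)


def coverage(triples, prop):
    """Instances that have prop divided by all instances appearing as subjects."""
    all_instances = {i for (i, _, _) in triples}
    if not all_instances:
        return 0.0
    with_prop = {i for (i, p, _) in triples if p == prop}
    return len(with_prop) / len(all_instances)


# Made-up sample: four instances, two properties.
sample = [
    ("a", "name", "X"),
    ("b", "name", "X"),
    ("c", "name", "Y"),
    ("a", "pop", 1),
    ("d", "pop", 2),
]
```

On this sample, `name` covers three of the four instances but repeats one value, while `pop` covers only two instances and has no repeated values. A property that scores high on both measures is a natural candidate for a comparison in a linkage rule.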
  17. Linkage Recommendation – User Interface (i)
  18. Linkage Recommendation – User Interface (ii)
  19. Linkage Recommendation – User Interface (iii)
  20. Conclusion
      ● Linked Data is seeing growing adoption
      ● Development of a Data Linking system
        – Focus on dataset selection & matcher configuration
        – Combination of available information sources and user feedback
      ● Outlook
        – Data publishers should be encouraged to participate in Linked Data
        – High-quality data is key
        – Important aspects also include up-to-date views on data, trust, provenance information, ...
        – Ongoing research and growing tool support