Splendid: SPARQL Endpoint Federation Exploiting VOID Descriptions

  • Pre-selected linked datasets; transparent query federation

    1. Institute for Web Science and Technologies, University of Koblenz-Landau, Germany. SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions. Olaf Görlitz, Steffen Staab
    2. Motivation: How to access a large number of linked data sources? (Slide footer: WeST Institute, Olaf Görlitz, People and Knowledge Networks, COLD 2011, Bonn, Germany)
    3. Data Integration Approaches
       Data Warehouse: efficient query execution; complete results; data copies; inflexible
       Link Traversal: live data access; flexible / on demand; incomplete results; biased by starting point
    4. Our Approach: Data Federation
       Live data access; flexible source integration; effective query planning; complete results
       Hypothesis: Efficient query federation is possible using core Semantic Web technology (i.e. SPARQL endpoints, VoiD descriptions)
    5. VoiD: "Vocabulary of Interlinked Datasets"
       Sections of the example VoiD description:
       General information
       Basic statistics: triples = 732744
       Type statistics: chebi:Compound = 50477
       Predicate statistics: bio:formula = 39555
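The three kinds of statistics on this slide are enough for a rough per-dataset cardinality estimate of a triple pattern. The following is a minimal sketch under stated assumptions: the dictionary layout and the function `estimate_cardinality` are hypothetical names chosen for illustration and are not taken from the SPLENDID code; only the three numbers come from the slide.

```python
# Numbers copied from the VoiD example on the slide. The dictionary layout
# is a hypothetical representation chosen for this sketch.
VOID_STATS = {
    "triples": 732744,
    "types": {"chebi:Compound": 50477},
    "predicates": {"bio:formula": 39555},
}


def estimate_cardinality(pattern, stats):
    """Estimate how many triples of one dataset match (subject, predicate, object).

    Terms starting with '?' are variables. Only type and predicate counts
    are consulted, because those are the statistics shown on the slide.
    """
    _subject, predicate, obj = pattern
    if predicate == "rdf:type" and not obj.startswith("?"):
        # Bound class: use the type statistics.
        return stats["types"].get(obj, 0)
    if not predicate.startswith("?"):
        # Bound predicate: use the predicate statistics (0 if not listed).
        return stats["predicates"].get(predicate, 0)
    # Unbound predicate: fall back to the total number of triples.
    return stats["triples"]


formula_count = estimate_cardinality(("?c", "bio:formula", "?f"), VOID_STATS)
compound_count = estimate_cardinality(("?c", "rdf:type", "chebi:Compound"), VOID_STATS)
any_count = estimate_cardinality(("?s", "?p", "?o"), VOID_STATS)
print(formula_count, compound_count, any_count)
```

A predicate that is missing from the statistics yields 0 here, which also makes the same lookup usable as a relevance check for a data source.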
    6. Distributed Query Processing
       Contribution: Apply best practices of RDBMS for RDF federation
       http://code.google.com/p/rdffederator/
    7. Query Example: Which drugs are categorized as micronutrients?
       SELECT ?drug ?title WHERE {
         ?drug drugbank:drugCategory category:micronutrient .
         ?drug drugbank:casRegistryNumber ?id .
         ?keggDrug rdf:type kegg:Drug .
         ?keggDrug bio2rdf:xRef ?id .
         ?keggDrug purl:title ?title .
       }
    8. Query Processing in three phases: Source Selection, Join Optimization, Query Execution
       SELECT ?drug ?title WHERE {
         ?drug drugbank:drugCategory category:micronutrient .
         ?drug drugbank:casRegistryNumber ?id .
         ?keggDrug rdf:type kegg:Drug .
         ?keggDrug bio2rdf:xRef ?id .
         ?keggDrug purl:title ?title .
       }
    9. Query Processing, Source Selection. Step 1: Index-based source mapping
       SELECT ?drug ?title WHERE {
         ?drug drugbank:drugCategory category:micronutrient .   → drugbank
         ?drug drugbank:casRegistryNumber ?id .                 → drugbank
         ?keggDrug rdf:type kegg:Drug .                         → kegg
         ?keggDrug bio2rdf:xRef ?id .                           → kegg
         ?keggDrug purl:title ?title .                          → kegg, dbpedia, Chebi
       }
       predicate-index: drugbank:drugCategory → drugbank
       type-index: kegg:Drug → kegg
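The index lookup in step 1 can be sketched as two dictionary lookups. This is an illustrative sketch, not the SPLENDID implementation: the slide states only two index entries explicitly (`drugbank:drugCategory → drugbank` and `kegg:Drug → kegg`); the remaining entries are inferred from the source annotations next to the triple patterns, and `select_sources` and `ALL_SOURCES` are hypothetical names.

```python
# Index contents follow the source annotations on the slide. In a real system
# they would be built from the VoiD descriptions of the endpoints.
PREDICATE_INDEX = {
    "drugbank:drugCategory": {"drugbank"},
    "drugbank:casRegistryNumber": {"drugbank"},
    "bio2rdf:xRef": {"kegg"},
    "purl:title": {"kegg", "dbpedia", "Chebi"},
}
TYPE_INDEX = {"kegg:Drug": {"kegg"}}
# Assumption: the federation consists of the four sources named on the slide.
ALL_SOURCES = {"drugbank", "kegg", "dbpedia", "Chebi"}

QUERY_PATTERNS = [
    ("?drug", "drugbank:drugCategory", "category:micronutrient"),
    ("?drug", "drugbank:casRegistryNumber", "?id"),
    ("?keggDrug", "rdf:type", "kegg:Drug"),
    ("?keggDrug", "bio2rdf:xRef", "?id"),
    ("?keggDrug", "purl:title", "?title"),
]


def select_sources(pattern):
    """Map one triple pattern to the set of data sources that may answer it."""
    _subject, predicate, obj = pattern
    if predicate == "rdf:type" and not obj.startswith("?"):
        # Bound class: consult the type index.
        return set(TYPE_INDEX.get(obj, set()))
    if predicate.startswith("?"):
        # Unbound predicate: no index applies, so every source is a candidate.
        return set(ALL_SOURCES)
    return set(PREDICATE_INDEX.get(predicate, set()))


source_mapping = [(pattern, select_sources(pattern)) for pattern in QUERY_PATTERNS]
for pattern, sources in source_mapping:
    print(" ".join(pattern), "->", sorted(sources))
```

Returning fresh sets keeps later refinement steps from mutating the indexes by accident.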
    10. Query Processing, Source Selection. Step 2: Refinement with ASK queries
        SELECT ?drug ?title WHERE {
          ?drug drugbank:drugCategory category:micronutrient .
          ?drug drugbank:casRegistryNumber ?id .
          ?keggDrug rdf:type kegg:Drug .
          ?keggDrug bio2rdf:xRef ?id .
          ?keggDrug purl:title ?title .
        }
        Note: there is no index for subject / object values
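Because the indexes do not cover subject and object values, a pattern with a bound subject or object can be checked against each candidate endpoint with a SPARQL ASK query. The sketch below assumes exactly that reading of the slide. The helper names are hypothetical, and the `ask` callback stands in for a real endpoint call, so the example runs without network access.

```python
def is_variable(term):
    """SPARQL variables start with '?' in this simplified representation."""
    return term.startswith("?")


def build_ask_query(pattern):
    """Wrap one triple pattern in an ASK query.

    A real query would also need PREFIX declarations for the prefixed names.
    """
    subject, predicate, obj = pattern
    return f"ASK {{ {subject} {predicate} {obj} . }}"


def refine_sources(pattern, candidate_sources, ask):
    """Keep only the candidate sources that answer the ASK query with true.

    `ask(source, query)` is a hypothetical callback that sends the query to
    the endpoint of `source` and returns a boolean.
    """
    subject, _predicate, obj = pattern
    if is_variable(subject) and is_variable(obj):
        # Nothing beyond the predicate to check; keep the index result.
        return set(candidate_sources)
    query = build_ask_query(pattern)
    return {source for source in candidate_sources if ask(source, query)}


def fake_ask(source, query):
    """Stand-in for a remote endpoint: only 'drugbank' reports a match."""
    return source == "drugbank"


bound_pattern = ("?drug", "drugbank:drugCategory", "category:micronutrient")
# The candidate set is made up for the demonstration.
refined = refine_sources(bound_pattern, {"drugbank", "kegg"}, fake_ask)
unbound = refine_sources(
    ("?keggDrug", "purl:title", "?title"), {"kegg", "dbpedia", "Chebi"}, fake_ask
)
print(build_ask_query(bound_pattern))
print(sorted(refined), sorted(unbound))
```

Passing the endpoint call in as a function keeps the selection logic separate from the HTTP client.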
    11. Query Processing, Source Selection. Step 3: Grouping triple patterns
        SELECT ?drug ?title WHERE {
          ?drug drugbank:drugCategory category:micronutrient .   } drugbank
          ?drug drugbank:casRegistryNumber ?id .                 } drugbank
          ?keggDrug rdf:type kegg:Drug .                         } kegg
          ?keggDrug bio2rdf:xRef ?id .                           } kegg
          ?keggDrug purl:title ?title .                          } kegg, dbpedia, Chebi
        }
        + grouping sameAs patterns
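The grouping on this slide puts patterns that map to the same single source into one subquery, so that they can be sent to that endpoint together. A minimal sketch of that idea follows. `group_patterns` is a hypothetical helper, the input mirrors the source annotations on the slide, and the additional grouping of sameAs patterns is not covered.

```python
from collections import defaultdict

# Triple patterns with the sources assigned to them on the slide.
PATTERNS_WITH_SOURCES = [
    (("?drug", "drugbank:drugCategory", "category:micronutrient"), {"drugbank"}),
    (("?drug", "drugbank:casRegistryNumber", "?id"), {"drugbank"}),
    (("?keggDrug", "rdf:type", "kegg:Drug"), {"kegg"}),
    (("?keggDrug", "bio2rdf:xRef", "?id"), {"kegg"}),
    (("?keggDrug", "purl:title", "?title"), {"kegg", "dbpedia", "Chebi"}),
]


def group_patterns(patterns_with_sources):
    """Split patterns into per-source groups and a list of ungrouped patterns.

    A pattern joins a group only if exactly one source can answer it.
    Patterns with several candidate sources stay separate.
    """
    exclusive = defaultdict(list)
    remaining = []
    for pattern, sources in patterns_with_sources:
        if len(sources) == 1:
            (source,) = sources
            exclusive[source].append(pattern)
        else:
            remaining.append((pattern, sources))
    return dict(exclusive), remaining


groups, ungrouped = group_patterns(PATTERNS_WITH_SOURCES)
for source, patterns in groups.items():
    print(source, "->", len(patterns), "patterns in one subquery")
print("ungrouped:", [" ".join(pattern) for pattern, _ in ungrouped])
```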
    12. Join Order Optimization
        Dynamic programming with statistics-based cost estimation
        Bind join / hash join
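Dynamic programming for join ordering builds the cheapest plan for every subset of subqueries from the cheapest plans of its parts. The sketch below is a generic textbook-style version, not the SPLENDID optimizer: the cardinalities, the fixed `JOIN_SELECTIVITY`, and the cost model (sum of estimated intermediate result sizes) are all assumptions made up for illustration, and the choice between bind join and hash join is left out.

```python
from itertools import combinations

# Hypothetical subqueries: name -> (variables, estimated cardinality).
SUBQUERIES = {
    "drugbank_group": ({"?drug", "?id"}, 50),
    "kegg_group": ({"?keggDrug", "?id"}, 8000),
    "title_pattern": ({"?keggDrug", "?title"}, 200000),
}
JOIN_SELECTIVITY = 0.0001  # assumed constant for joins on a shared variable


def optimize_join_order(subqueries, selectivity=JOIN_SELECTIVITY):
    """Return (cost, cardinality, plan) of the cheapest join plan.

    Plans are nested tuples of subquery names. The cost of a join is the
    cost of both inputs plus the estimated size of its result.
    """
    # best: frozenset of names -> (cost, cardinality, variables, plan)
    best = {}
    for name, (variables, cardinality) in subqueries.items():
        best[frozenset([name])] = (0.0, float(cardinality), set(variables), name)
    names = sorted(subqueries)
    for size in range(2, len(names) + 1):
        for subset in map(frozenset, combinations(names, size)):
            # Try every split of the subset into two non-empty parts.
            for left_size in range(1, size):
                for left in map(frozenset, combinations(sorted(subset), left_size)):
                    right = subset - left
                    l_cost, l_card, l_vars, l_plan = best[left]
                    r_cost, r_card, r_vars, r_plan = best[right]
                    cardinality = l_card * r_card
                    if l_vars & r_vars:
                        # Shared variable: a join, not a cross product.
                        cardinality *= selectivity
                    cost = l_cost + r_cost + cardinality
                    if subset not in best or cost < best[subset][0]:
                        best[subset] = (cost, cardinality, l_vars | r_vars, (l_plan, r_plan))
    cost, cardinality, _variables, plan = best[frozenset(names)]
    return cost, cardinality, plan


best_cost, best_cardinality, best_plan = optimize_join_order(SUBQUERIES)
print(best_plan, best_cost)
```

With these made-up numbers the small drugbank and kegg groups are joined first and the large title pattern is joined last, which is the behaviour a statistics-based cost model is meant to produce.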
    13. Evaluation
        FedBench Evaluation Suite: Life Science + Cross Domain data; different query characteristics
        Measuring: number of data sources selected; query execution time
        Orthogonal state-of-the-art approaches:
                            DARQ                     AliBaba      FedX                          SPLENDID
        Statistics          ServiceDesc              –            –                             VoiD
        Source Selection    Statistics (predicates)  All sources  ASK queries                   Statistics + ASK queries
        Query Optimization  DynProg                  Heuristics   Heuristics                    DynProg
        Query Execution     Bind join                Bind join    Bound Join + parallelization  Bind Join + Hash Join
    14. Evaluation: Source Selection (chart labels: owl:sameAs, rdf:type)
    15. Evaluation: Query Optimization
    16. Conclusion
        Publish more VoiD descriptions!
        VoiD-based query federation is efficient
        What next? Combination with FedX; improving estimation and cost model; integrating SPARQL 1.1 features
