
Splendid: SPARQL Endpoint Federation Exploiting VOID Descriptions


  1. SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions. Olaf Görlitz, Steffen Staab. Institute for Web Science and Technologies, University of Koblenz-Landau, Germany.
  2. Motivation: How to access a large number of linked data sources? (Footer: WeST Institute, People and Knowledge Networks; Olaf Görlitz; COLD 2011, Bonn, Germany.)
  3. Data Integration Approaches
       Data Warehouse. Advantages: efficient query execution; complete results. Drawbacks: data copies; inflexible.
       Link Traversal. Advantages: live data access; flexible / on demand. Drawbacks: incomplete results; biased by starting point.
  4. Our Approach: Data Federation
       Live data access; flexible source integration; effective query planning; complete results.
       Hypothesis: Efficient query federation is possible using core Semantic Web technology (i.e. SPARQL endpoints, VoiD descriptions).
  5. VoiD: "Vocabulary of Interlinked Datasets"
       General information
       Basic statistics, e.g. triples = 732744
       Type statistics, e.g. chebi:Compound = 50477
       Predicate statistics, e.g. bio:formula = 39555
  6. Distributed Query Processing
       Contribution: Apply best practices of RDBMS for RDF federation.
       http://code.google.com/p/rdffederator/
  7. Query Example: Which drugs are categorized as micronutrients?
       SELECT ?drug ?title WHERE {
         ?drug drugbank:drugCategory category:micronutrient .
         ?drug drugbank:casRegistryNumber ?id .
         ?keggDrug rdf:type kegg:Drug .
         ?keggDrug bio2rdf:xRef ?id .
         ?keggDrug purl:title ?title .
       }
  8. Query Processing. Stages: Source Selection, Join Optimization, Query Execution.
       SELECT ?drug ?title WHERE {
         ?drug drugbank:drugCategory category:micronutrient .
         ?drug drugbank:casRegistryNumber ?id .
         ?keggDrug rdf:type kegg:Drug .
         ?keggDrug bio2rdf:xRef ?id .
         ?keggDrug purl:title ?title .
       }
  9. Query Processing, Source Selection. Step 1: Index-based source mapping
       SELECT ?drug ?title WHERE {
         ?drug drugbank:drugCategory category:micronutrient .   → drugbank
         ?drug drugbank:casRegistryNumber ?id .                 → drugbank
         ?keggDrug rdf:type kegg:Drug .                         → kegg
         ?keggDrug bio2rdf:xRef ?id .                           → kegg
         ?keggDrug purl:title ?title .                          → kegg, dbpedia, Chebi
       }
       predicate-index: drugbank:drugCategory → drugbank
       type-index: kegg:Drug → kegg
  10. Query Processing, Source Selection. Step 2: Refinement with ASK queries
       SELECT ?drug ?title WHERE {
         ?drug drugbank:drugCategory category:micronutrient .
         ?drug drugbank:casRegistryNumber ?id .
         ?keggDrug rdf:type kegg:Drug .
         ?keggDrug bio2rdf:xRef ?id .
         ?keggDrug purl:title ?title .
       }
       No index for subject / object values.
  11. Query Processing, Source Selection. Step 3: Grouping triple patterns
       SELECT ?drug ?title WHERE {
         ?drug drugbank:drugCategory category:micronutrient .
         ?drug drugbank:casRegistryNumber ?id .
       Group for drugbank: the two ?drug patterns above.
         ?keggDrug rdf:type kegg:Drug .
         ?keggDrug bio2rdf:xRef ?id .
       Group for kegg: the two ?keggDrug patterns above.
         ?keggDrug purl:title ?title .
       Sent to kegg, dbpedia, Chebi.
       }
       + grouping sameAs patterns
  12. Join Order Optimization
       Dynamic programming with statistics-based cost estimation.
       Join implementations: bind join / hash join.
  13. Evaluation
       FedBench Evaluation Suite: Life Science + Cross Domain data; different query characteristics.
       Measuring: number of data sources selected; query execution time.
       Orthogonal state-of-the-art approaches:
                          DARQ                     AliBaba      FedX                          SPLENDID
      Statistics          ServiceDesc              –            –                             VoiD
      Source Selection    Statistics (predicates)  All sources  ASK queries                   Statistics + ASK queries
      Query Optimization  DynProg                  Heuristics   Heuristics                    DynProg
      Query Execution     Bind join                Bind join    Bound Join + parallelization  Bind Join + Hash Join
  14. Evaluation: Source Selection [Figure not preserved in the transcript; surviving labels: owl:sameAs, rdf:type]
  15. Evaluation: Query Optimization [Figure not preserved in the transcript]
  16. Conclusion
       Publish more VoiD descriptions!
       VoiD-based query federation is efficient.
       What next? Combination with FedX; improving estimation and cost model; integrating SPARQL 1.1 features.
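
The three source selection steps on slides 9 to 11 can be illustrated with a small Python sketch. It is not the SPLENDID implementation: the index layout, the function names, the `rdf:type` entry in the predicate index, and the `ask` callback (a stand-in for sending a SPARQL ASK query to an endpoint) are all assumptions made for illustration. Only the drugbank, kegg, dbpedia and Chebi mappings are taken from the slides. The sketch also assumes that ASK refinement is applied only to patterns with a constant subject or object that the indexes cannot resolve, and it leaves out the sameAs grouping mentioned on slide 11.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object); variables start with "?"

# Hypothetical index contents, as they might be derived from VoiD statistics.
PREDICATE_INDEX: Dict[str, Set[str]] = {
    "drugbank:drugCategory": {"drugbank"},
    "drugbank:casRegistryNumber": {"drugbank"},
    "rdf:type": {"drugbank", "kegg", "dbpedia", "Chebi"},  # assumed entry
    "bio2rdf:xRef": {"kegg"},
    "purl:title": {"kegg", "dbpedia", "Chebi"},
}
TYPE_INDEX: Dict[str, Set[str]] = {"kegg:Drug": {"kegg"}}


def is_var(term: str) -> bool:
    return term.startswith("?")


def index_sources(tp: Triple) -> Set[str]:
    """Step 1: map a triple pattern to candidate sources via the indexes."""
    _, p, o = tp
    if p == "rdf:type" and not is_var(o):
        return set(TYPE_INDEX.get(o, set()))
    if is_var(p):
        # Unbound predicate: every indexed source remains a candidate.
        return set().union(*PREDICATE_INDEX.values())
    return set(PREDICATE_INDEX.get(p, set()))


def refine_with_ask(tp: Triple, candidates: Set[str],
                    ask: Callable[[str, Triple], bool]) -> Set[str]:
    """Step 2: the indexes hold no subject/object values, so ask the sources."""
    s, p, o = tp
    has_constant = not is_var(s) or (not is_var(o) and p != "rdf:type")
    if not has_constant:
        return candidates
    return {src for src in candidates if ask(src, tp)}


def group_patterns(selected: List[Tuple[Triple, Set[str]]]):
    """Step 3: patterns with the same single source form one subquery."""
    exclusive: Dict[str, List[Triple]] = {}
    shared: List[Tuple[Triple, Set[str]]] = []
    for tp, sources in selected:
        if len(sources) == 1:
            exclusive.setdefault(next(iter(sources)), []).append(tp)
        else:
            # Several sources (or none): the pattern stays on its own.
            shared.append((tp, sources))
    return exclusive, shared


def select_sources(query: List[Triple], ask: Callable[[str, Triple], bool]):
    selected = [(tp, refine_with_ask(tp, index_sources(tp), ask)) for tp in query]
    return group_patterns(selected)


# The example query from slide 7 as a list of triple patterns.
QUERY: List[Triple] = [
    ("?drug", "drugbank:drugCategory", "category:micronutrient"),
    ("?drug", "drugbank:casRegistryNumber", "?id"),
    ("?keggDrug", "rdf:type", "kegg:Drug"),
    ("?keggDrug", "bio2rdf:xRef", "?id"),
    ("?keggDrug", "purl:title", "?title"),
]

if __name__ == "__main__":
    # Stand-in for a real ASK request; here every source answers "yes".
    exclusive, shared = select_sources(QUERY, lambda source, tp: True)
    print("exclusive groups:", exclusive)
    print("multi-source patterns:", shared)
```

With these assumed indexes, the two `?drug` patterns end up in one drugbank group, the `rdf:type` and `bio2rdf:xRef` patterns in one kegg group, and the `purl:title` pattern remains assigned to kegg, dbpedia and Chebi, which matches the grouping shown on slide 11.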
