balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information

Presentation for the 5th International Workshop on Data Engineering meets the Semantic Web (DESWeb), held in conjunction with ICDE 2014, Chicago IL, USA, March 31, 2014. Presented by Kai Schlegel.

  1. DESWeb 2014, ICDE 2014, Chicago IL, USA, March 31
     balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information
     Kai Schlegel (kai.schlegel@googlemail.com), Florian Stegmaier, Sebastian Bayerl, Michael Granitzer, Harald Kosch
  2. Outline
     • Motivation
     • SPARQL Rewriting & Federation
     • Intermediate Results
     Supported by the European Commission under the Seventh Framework Program
  3. "Linked Data is the heart of Semantic Web" - W3C Semantic Web Group
  4.
  5.
  6. Motivation: Nothing-as-a-Service
     • Easy access to Linked Data
     • Query Linked Open Data with SPARQL
     • Plethora of tools available
     • Problems:
       • Business oriented
       • Complex setup
       • Maintenance
       • "Paper-only"
       • Not developer friendly
     → Simple and "instant" SPARQL Query Federation (-as-a-Service)
  7. Querying LOD
     • How to get information about the German city "Passau"?
     • Problem: LOD is not a single database!
     Query:
       SELECT ?p ?o WHERE {
         <http://de.dbpedia.org/resource/Passau> ?p ?o.
       }
     de.dbpedia.org returns relations, coordinates, leader, etc. What about the population?
  8. Distributed Querying!
     • Problem: Selection of appropriate endpoints
     • Send the query to some endpoints and aggregate the results?
     Query:
       SELECT ?p ?o WHERE {
         <http://de.dbpedia.org/resource/Passau> ?p ?o.
       }
     Endpoints: de.dbpedia.org, linkedgeodata.org
  9. Misunderstanding: Co-Referencing
     • Problem: Different identifiers for the same semantic concept
     Query:
       SELECT ?p ?o WHERE {
         <http://de.dbpedia.org/resource/Passau> ?p ?o.
       }
     Endpoints: de.dbpedia.org, linkedgeodata.org
     Known problem in linguistics: "It's a spud!" "What?" "I mean potato!"
     Co-Referencing: Multiple expressions refer to the same thing.
  10. Problem = Solution?
     SPARQL-based crawling of co-reference information
     Exploit co-reference information for
     • accomplishing immediate SPARQL rewriting
     • performing endpoint selection
     • executing automatic query federation
     Basic idea: Focusing distributed co-reference information
     Main principle: Semantic entities over identifiers!
  11. Components: balloon toolsuite
  12. balloon Overflight
     • SPARQL-based crawling of LOD endpoints
     • Query: Ask for subjects and objects which are related by a special predicate
     • Simplified global view on
       • Equivalence: owl:sameAs, skos:exactMatch, coref:coreferenceData, ...
       • Graph database: Neo4j
     • Equivalence cluster: Multiple synonym URIs representing the same semantic entity, including provenance
  13. balloon Fusion
     SPARQL Federation setup using co-reference information
     SPARQL transformation for each BGP (basic graph pattern):
       1. Determine synonym URIs
       2. Select suitable endpoints
       3. Adapt sub-queries to endpoints
       4. Federated querying
     Query:
       SELECT ?p ?o WHERE {
         <http://de.dbpedia.org/resource/Passau> ?p ?o.
       }
  14. Step 1: Determine synonym URIs
     Query:
       SELECT ?p ?o WHERE {
         <http://de.dbpedia.org/resource/Passau> ?p ?o.
       }
  15. Step 2: Select suitable endpoints
     • Provenance-based selection (PBS)
       • Endpoints which are involved in cluster composition
     • Namespace-based selection (NBS)
       • Prefix and namespace matching of synonym URIs
     Summarized: origin of co-reference information and origin of synonym URIs
  16. Step 2: Select suitable endpoints (2)
     Assumption:
     • Provenance information only contains "linkedgeodata.org" as co-reference origin
     • Namespaces for Freebase and DBpedia available (datahub.io)
     Selected endpoints:
     • PBS: Linked-Geo-Data endpoint
     • NBS: DBpedia endpoint
     • NBS: Freebase endpoint
  17. Step 3: Adapt sub-queries to endpoints
     Original query:
       SELECT ?p ?o WHERE {
         <http://de.dbpedia.org/resource/Passau> ?p ?o.
       }
     NBS, Freebase endpoint:
       SELECT ?p ?o WHERE {
         <http://rdf.freebase.com/ns/m.01h5td> ?p ?o.
       }
     NBS, DBpedia endpoint:
       SELECT ?p ?o WHERE {
         <http://de.dbpedia.org/resource/Passau> ?p ?o.
       }
     PBS, Linked-Geo-Data endpoint:
       SELECT ?p ?o WHERE {
         { <http://rdf.freebase.com/ns/m.01h5td> ?p ?o. }
         UNION
         { <http://linkedgeodata.org/triplify/node240057351> ?p ?o. }
         UNION
         { <http://de.dbpedia.org/resource/Passau> ?p ?o. }
       }
  18. Step 4: Federated Querying
     • W3C SPARQL 1.1 Federated Query Extension (SERVICE)
     • A (partial) query can be executed against a remote SPARQL endpoint
     • Distributed sub-queries don't contain SPARQL 1.1 features
     Query:
       SELECT ?p ?o WHERE {
         {
           SERVICE <http://dbpedia.org/sparql> {
             <http://de.dbpedia.org/resource/Passau> ?p ?o.
           }
         }
         UNION
         {
           SERVICE <http://www.freebase.com/base/sparql> {
             <http://rdf.freebase.com/ns/m.01h5td> ?p ?o.
           }
         }
         UNION
         {
           SERVICE <http://linkedgeodata.org/sparql/> {
             { <http://rdf.freebase.com/ns/m.01h5td> ?p ?o. }
             UNION
             { <http://linkedgeodata.org/triplify/node240057351> ?p ?o. }
             UNION
             { <http://de.dbpedia.org/resource/Passau> ?p ?o. }
           }
         }
       }
  19. Optimizations (ongoing)
     • Endpoint status check
       • Check routine in terms of availability and latency
     • Minimize sub-queries
       • Group sub-queries with a common endpoint
       • Push join to endpoint
     • SPARQL features
       • Condense PBS UNION-construct of synonym URIs
       • SPARQL 1.1 VALUES or FILTER with IN operator
       • Not well implemented in Linked Data endpoints
  20. balloon Overflight: Results
  21. Results from a sounding balloon
  22. balloon toolsuite
  23. Statistics
     • Datahub.io: Linked Open Data Cloud catalog
       • 337 datasets in total
       • 237 expose a SPARQL endpoint
       • 112 successfully queried for co-reference information
     • Balloon dataset (first run)
       • 17.6M co-reference statements
       • 22.4M distinct URLs
       • 8.4M equivalence clusters (~2.68 identifiers per cluster)
     • Pending analysis
       • Distribution of cluster sizes, number of different hosts per cluster
       • Main representative per cluster & false friends
  24. Open Source: http://schlegel.github.io/balloon
     • Demo, information and sources available (MIT License)
     • X as a Service
       • SPARQL Rewriting (HTTP API)
       • Query Federation (SPARQL)
  25. Summary: Single Point of Access
     • SPARQL-based crawling of distributed co-reference information
     • Exploit co-reference information for SPARQL federation
  26. Any questions?
     "Research is formalized curiosity. It is poking and prying with a purpose." - Zora Neale Hurston
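
The crawling and clustering idea from the balloon Overflight slide can be sketched roughly as follows. This is a minimal illustration and not the balloon implementation: the function `crawl_query`, the class `EquivalenceClusters`, the paging parameters, and the use of a union-find structure in place of the Neo4j graph database are all assumptions made for the example. Only the URIs and the Linked-Geo-Data endpoint address are taken from the slides.

```python
from collections import defaultdict


def crawl_query(predicate, limit=1000, offset=0):
    """Build a paged query for all subject/object pairs linked by `predicate`.

    Hypothetical helper; a real crawler would likely add ORDER BY for stable paging.
    """
    return (
        f"SELECT ?s ?o WHERE {{ ?s <{predicate}> ?o . }} "
        f"LIMIT {limit} OFFSET {offset}"
    )


class EquivalenceClusters:
    """Union-find over URIs; each URI remembers which endpoints mentioned it."""

    def __init__(self):
        self.parent = {}
        self.provenance = defaultdict(set)  # uri -> endpoints that stated it

    def find(self, uri):
        self.parent.setdefault(uri, uri)
        root = uri
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[uri] != root:  # path compression
            self.parent[uri], uri = root, self.parent[uri]
        return root

    def add_statement(self, subject, obj, endpoint):
        """Record one co-reference statement crawled from `endpoint`."""
        self.provenance[subject].add(endpoint)
        self.provenance[obj].add(endpoint)
        root_s, root_o = self.find(subject), self.find(obj)
        if root_s != root_o:
            self.parent[root_o] = root_s

    def cluster_of(self, uri):
        """All synonym URIs in the same equivalence cluster as `uri`."""
        root = self.find(uri)
        return {u for u in self.parent if self.find(u) == root}

    def provenance_of(self, uri):
        """Endpoints that contributed statements to the cluster of `uri`."""
        return set().union(*(self.provenance[m] for m in self.cluster_of(uri)))


# Example with the three Passau URIs shown in the slides.
DBP = "http://de.dbpedia.org/resource/Passau"
FB = "http://rdf.freebase.com/ns/m.01h5td"
LGD = "http://linkedgeodata.org/triplify/node240057351"
LGD_ENDPOINT = "http://linkedgeodata.org/sparql/"

clusters = EquivalenceClusters()
clusters.add_statement(LGD, DBP, LGD_ENDPOINT)
clusters.add_statement(LGD, FB, LGD_ENDPOINT)
```

Asking `clusters.cluster_of(DBP)` then yields all three URIs, and `clusters.provenance_of(DBP)` yields only the Linked-Geo-Data endpoint, which matches the assumption stated on slide 16.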

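Steps 2 to 4 of the balloon Fusion transformation (endpoint selection, sub-query adaptation, federated query) might look like the sketch below. The function names and the namespace-to-endpoint table are hypothetical; the slides only say that namespaces come from datahub.io. The endpoint addresses and URIs are the ones shown on slides 17 and 18, and the UNION form follows the slides, not the VALUES optimization listed as ongoing.

```python
def select_endpoints(synonyms, provenance_endpoints, namespace_to_endpoint):
    """Return {endpoint: [uris]} using PBS first, then NBS.

    PBS endpoints contributed co-reference statements, so they get every synonym.
    NBS endpoints get only the synonyms that fall into their namespace.
    """
    pbs = set(provenance_endpoints)
    plan = {endpoint: list(synonyms) for endpoint in provenance_endpoints}
    for uri in synonyms:
        for namespace, endpoint in namespace_to_endpoint.items():
            if uri.startswith(namespace) and endpoint not in pbs:
                plan.setdefault(endpoint, []).append(uri)
    return plan


def subquery_pattern(uris):
    """One triple pattern per synonym URI, combined with UNION."""
    return " UNION ".join(f"{{ <{u}> ?p ?o . }}" for u in uris)


def federated_query(plan):
    """Wrap each endpoint's pattern in SERVICE and UNION the branches."""
    branches = [
        f"{{ SERVICE <{endpoint}> {{ {subquery_pattern(uris)} }} }}"
        for endpoint, uris in plan.items()
    ]
    return "SELECT ?p ?o WHERE { " + " UNION ".join(branches) + " }"


# Example data; the namespace strings are assumptions for illustration.
DBP = "http://de.dbpedia.org/resource/Passau"
FB = "http://rdf.freebase.com/ns/m.01h5td"
LGD = "http://linkedgeodata.org/triplify/node240057351"

plan = select_endpoints(
    synonyms=[DBP, FB, LGD],
    provenance_endpoints=["http://linkedgeodata.org/sparql/"],
    namespace_to_endpoint={
        "http://de.dbpedia.org/resource/": "http://dbpedia.org/sparql",
        "http://rdf.freebase.com/ns/": "http://www.freebase.com/base/sparql",
    },
)
query = federated_query(plan)
```

With this input, the Linked-Geo-Data endpoint receives all three synonym URIs (PBS), while the DBpedia and Freebase endpoints each receive only the URI from their own namespace (NBS), as on slide 17.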