Executing SPARQL Queries over the Web of Linked Data

With these slides I presented my paper at the International Semantic Web Conference (ISWC'09), Washington DC, USA, Oct. 2009.

Transcript

  • 1. Executing SPARQL Queries over the Web of Linked Data Olaf Hartig* Christian Bizer˚ Johann-Christoph Freytag* *Humboldt-Universität zu Berlin ˚Freie Universität Berlin
  • 2-8. Linked Data principles (Tim Berners-Lee, July 2006): ● Use URIs as names for things. ● Use HTTP URIs so that people can look up those names. ● When someone looks up a URI, provide useful information. ● Include links to other URIs so that they can discover more things. [Figure, built up step by step: My Movie DB with the URIs http://mymovie.db/movie1342, http://mymovie.db/movie0362, http://mymovie.db/movie5112 and http://mymovie.db/movie2449; a look-up ("?") of http://mymovie.db/movie2449; the final build adds the URIs http://geo.db/country21, http://geo.db/country7, http://geo.db/cityCJ and http://geo.db/cityXA.] Olaf Hartig - Executing SPARQL Queries over the Web of Linked Data
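In practice, "looking up" a URI means sending an HTTP GET request that asks for an RDF representation. The following is a minimal sketch using only Python's standard library. The helper name `build_lookup_request` and the list of media types are assumptions made for illustration, and because the example URI from the slides is fictional, the request is only built, not sent.

```python
from urllib.request import Request

# Media types requested via HTTP content negotiation. This particular list
# is an assumption for illustration; real clients may ask for other formats.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_lookup_request(uri: str) -> Request:
    """Build (but do not send) the HTTP GET request for a URI look-up."""
    # A fragment identifier is never sent to the server, so drop it.
    document_uri = uri.split("#", 1)[0]
    return Request(document_uri, headers={"Accept": RDF_ACCEPT})


req = build_lookup_request("http://mymovie.db/movie2449")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Sending the request (for example with `urllib.request.urlopen`) and parsing the returned RDF are left out of this sketch.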
  • 9. ● The Web: a huge, globally distributed dataspace ● Querying this dataspace opens new possibilities: ● Aggregating data from different sources ● Integrating fragmentary information ● Achieving a more complete view
  • 10-11. Traditional approach 1: data centralization ● Querying a collection of copies from all relevant datasets ● Misses unknown or new sources ● Collection probably out of date ● Will it scale?
  • 12-13. Traditional approach 2: federated query processing ● Querying a mediator which distributes subqueries to relevant sources and integrates the results ● Requires sources to provide a query service ● Requires information about the sources ● Misses unknown or new sources
  • 14. Main drawback: You have to know the relevant data sources in advance. You restrict yourself to the selected sources. You do not tap the full potential of the Web!
  • 15. A novel approach: Link Traversal Based Query Execution. Allows data sources to be discovered at runtime.
  • 16. Outline ● Part I: Overview of Link Traversal based Query Execution ● Part II: An Iterator based Implementation Approach
  • 17-31. Main Idea ● Intertwine query evaluation with traversal of RDF links ● Alternately: ● Evaluate parts of the query on a continuously augmented set of data ● Look up URIs in intermediate solutions and add retrieved data to the queried data set
      Example query (a chain of three triple patterns):
        http://.../movie2449 filmingLocation ?loc .
        ?loc statistics ?stat .
        ?stat unemp_rate ?ur
      Walk-through shown on the slides:
      1. Look up http://.../movie2449 and add the retrieved data to the queried data.
      2. The queried data now contains the triple ( http://.../movie2449 filmingLocation http://geo.../Italy ), which yields the intermediate solution ?loc → http://geo.../Italy.
      3. Look up http://geo.../Italy and add the retrieved data to the queried data.
      4. The queried data now contains the triple ( http://geo.../Italy statistics http://stat.db/.../it ), which yields the intermediate solution ?loc → http://geo.../Italy, ?stat → http://stats.db/../it.
  • 32. In a Nutshell ● Link traversal based query execution: ● Evaluation on a continuously augmented dataset ● Discovery of potentially relevant data during execution ● Discovery driven by intermediate solutions ● Main advantage: ● No need to know all data sources in advance
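The interplay of evaluation and link traversal can be sketched in a few lines of Python. This is a simplified illustration, not the implementation presented in the talk: the Web is simulated by a dictionary, all URIs and the value "6.7" are made up, only terms starting with `http://` are treated as URIs, and the names `WEB`, `QUERY`, `match` and `execute` are hypothetical.

```python
# Simulated Web: looking up a URI yields a set of RDF triples.
# All URIs and data values are made up for illustration.
WEB = {
    "http://mymovie.db/movie2449": {
        ("http://mymovie.db/movie2449", "filmingLocation", "http://geo.db/Italy"),
    },
    "http://geo.db/Italy": {
        ("http://geo.db/Italy", "statistics", "http://stats.db/it"),
    },
    "http://stats.db/it": {
        ("http://stats.db/it", "unemp_rate", "6.7"),
    },
}

QUERY = [
    ("http://mymovie.db/movie2449", "filmingLocation", "?loc"),
    ("?loc", "statistics", "?stat"),
    ("?stat", "unemp_rate", "?ur"),
]


def match(pattern, triple):
    """Return variable bindings if the triple matches the pattern, else None."""
    binding = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding


def execute(patterns, web):
    """Evaluate the patterns while traversing links; return (solutions, looked-up URIs)."""
    queried_data = set()  # grows during query execution
    looked_up = []        # URIs in the order they were looked up

    def look_up(uri):
        if uri not in looked_up:
            looked_up.append(uri)
            queried_data.update(web.get(uri, set()))

    def evaluate(remaining, solution):
        if not remaining:
            yield solution
            return
        # Substitute the bindings found so far into the next triple pattern.
        pattern = tuple(solution.get(term, term) for term in remaining[0])
        # Look up every URI in the pattern before evaluating it.
        for term in pattern:
            if term.startswith("http://"):
                look_up(term)
        # Match against a snapshot; data added later by deeper look-ups is
        # not revisited here, which is a simplification of this sketch.
        for triple in list(queried_data):
            binding = match(pattern, triple)
            if binding is not None:
                yield from evaluate(remaining[1:], {**solution, **binding})

    return list(evaluate(patterns, {})), looked_up


solutions, looked_up = execute(QUERY, WEB)
print(solutions)
print(looked_up)
```

Only the first URI comes from the query itself. The other two are discovered through intermediate solutions, which is the point of the approach.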
  • 33. Real-World Examples: Return phone numbers of authors of ontology engineering papers at ESWC'09.
      SELECT DISTINCT ?author ?phone WHERE {
        ?pub swc:isPartOf <http://data.semanticweb.org/conference/eswc/2009/proceedings> .
        ?pub swc:hasTopic ?topic .
        ?topic rdfs:label ?topicLabel .
        FILTER regex( str(?topicLabel), "ontology engineering", "i" ) .
        ?pub swrc:author ?author .
        { ?author owl:sameAs ?authorAlt }
        UNION
        { ?authorAlt owl:sameAs ?author }
        ?authorAlt foaf:phone ?phone
      }
      # of query results: 2
      # of retrieved graphs: 297
      # of accessed servers: 16
      avg. execution time: 1min 30sec
  • 34. Outline ● Part I: Overview of Link Traversal based Query Execution ● Part II: An Iterator based Implementation Approach ➢ Introduction to the Iterator Paradigm ➢ Application to Link Traversal based Query Execution ➢ URI Prefetching ➢ Extension to the Iterator Paradigm ➢ Evaluation
  • 35-37. Iterator based Query Execution ● Iterator: ● implements an operation ● is a group of functions: OPEN, GETNEXT, CLOSE ● Query execution uses a chain of iterators (I1, I2, I3) ● Each iterator is responsible for a single triple pattern: I1 for ( http://.../movie2449 filmingLocation ?loc ), I2 for ( ?loc statistics ?stat ), I3 for ( ?stat unemp_rate ?ur )
  • 38-43. Iterator based Query Execution: how iterator Ii evaluates its triple pattern tpi
      Results from Ii-1 (one of these solutions is the current input μcur):
        ?c                          ?cStats
        http://geo.db/country/US    http://stats.example.org/USstatistics
        http://geo.db/country/IT    http://stats.example.org/ITstatistics
        http://geo.db/country/IT    http://stats.db/example/It
        http://example.db/ctry/DE   http://stats.example.org/Germany
      Steps of Ii for tpi, with the example tpi = ( ?loc ex:stats ?s ) and μcur = { ?p → http://ex... , ?loc → http://geo... }:
      1. Substitute: tpcur = μcur[ tpi ]. Example: tpcur = ( http://geo... ex:stats ?s )
      2. Find matching triples match(tpcur) in the queried data set. Example: ( http://geo... ex:stats http://db... ), ( http://geo... ex:stats http://ex... )
      3. Create a solution μ' for each t in match(tpcur). Example: μ' = { ?s → http://db... }
      4. Return each μcur ∪ μ' as a result. Example: { ?p → http://ex... , ?loc → http://geo.db/... , ?s → http://db... }
  • 44. Iterator based Query Execution ● Results of Ii are solutions for tp1, …, tpi [Figure: iterator chain Ii-1 for tpi-1, Ii for tpi, Ii+1 for tpi+1]
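The iterator chain described on these slides can be sketched as a small Python class. This is an illustration under assumptions, not the code of any existing query engine: the class name, the method names `open`, `get_next` and `close` (mirroring OPEN, GETNEXT, CLOSE), the prefixed names such as `geo:Italy`, and the convention that the first iterator starts from one empty solution are all choices made for this example.

```python
class TriplePatternIterator:
    """Iterator I_i for one triple pattern tp_i (variables start with '?')."""

    def __init__(self, pattern, dataset, predecessor=None):
        self.pattern = pattern
        self.dataset = dataset          # the queried data set (a list of triples)
        self.predecessor = predecessor  # I_{i-1}, or None for the first iterator

    def open(self):
        if self.predecessor is not None:
            self.predecessor.open()
        self._pending = []       # results for the current input not yet returned
        self._exhausted = False

    def get_next(self):
        """Return the next solution for tp_1 .. tp_i, or None when done."""
        while not self._pending:
            mu_cur = self._next_input()
            if mu_cur is None:
                return None
            # 1. Substitute: tp_cur = mu_cur[tp_i]
            tp_cur = tuple(mu_cur.get(term, term) for term in self.pattern)
            # 2.-4. Find matching triples, create mu' for each, merge with mu_cur
            for triple in self.dataset:
                mu_prime = self._match(tp_cur, triple)
                if mu_prime is not None:
                    self._pending.append({**mu_cur, **mu_prime})
        return self._pending.pop(0)

    def close(self):
        if self.predecessor is not None:
            self.predecessor.close()
        self._pending = []

    def _next_input(self):
        if self.predecessor is not None:
            return self.predecessor.get_next()
        # The first iterator starts from a single empty solution.
        if self._exhausted:
            return None
        self._exhausted = True
        return {}

    @staticmethod
    def _match(tp_cur, triple):
        mu_prime = {}
        for term, value in zip(tp_cur, triple):
            if term.startswith("?"):
                if mu_prime.setdefault(term, value) != value:
                    return None
            elif term != value:
                return None
        return mu_prime


# Made-up data, loosely following the example on the slides.
DATA = [
    ("ex:movie2449", "ex:filmingLocation", "geo:Italy"),
    ("geo:Italy", "ex:stats", "db:it"),
    ("geo:Italy", "ex:stats", "ex:it"),
]
i1 = TriplePatternIterator(("ex:movie2449", "ex:filmingLocation", "?loc"), DATA)
i2 = TriplePatternIterator(("?loc", "ex:stats", "?s"), DATA, predecessor=i1)

i2.open()
results = []
while (mu := i2.get_next()) is not None:
    results.append(mu)
i2.close()
print(results)
```

Calling `get_next` on the last iterator pulls one solution at a time through the whole chain, so each result of `i2` is a solution for both triple patterns.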
  • 46-47. Application to Link Traversal ● The queried data set grows ● Look-up requirement: Do not evaluate tpcur until the queried data set contains all data that can be retrieved from all URIs in tpcur [Figure: iterator chain Ii-1 for tpi-1, Ii for tpi, Ii+1 for tpi+1]
  • 48-50. Application to Link Traversal: steps of Ii for tpi, given an input solution μcur from the results of Ii-1 (same results table as on slides 38-43)
      1. Substitute: tpcur = μcur[ tpi ]
      2. Ensure look-up requirement for tpcur (initiate look-ups and wait)
      3. Find matching triples match(tpcur) in the queried data set
      4. Create a solution μ' for each t in match(tpcur)
      5. Return each μcur ∪ μ' as a result
  • 51. Blocked Query Execution ● Waiting for URI look-ups blocks query execution [Figure: iterator chain Ii-1 for tpi-1, Ii for tpi, Ii+1 for tpi+1, annotated "Initiate look-ups and wait"]
  • 53. URI Prefetching ● Waiting for URI look-ups blocks query execution ● URI prefetching: when a URI is bound to a variable, initiate the look-up in the background [Figure: iterator chain Ii-1 for tpi-1, Ii for tpi, Ii+1 for tpi+1, annotated "Initiate look-up", "Ensure look-up is finished" and "Initiate look-ups and wait"]
  • 54. URI Prefetching: steps of Ii for tpi, given an input solution μcur from the results of Ii-1 (same results table as on slides 38-43)
      1. Substitute: tpcur = μcur[ tpi ]
      2. Ensure look-up requirement for tpcur
      3. Find matching triples match(tpcur) in the queried data set
      4. Create a solution μ' for each t in match(tpcur)
      5. Initiate parallel look-up for each new URI in μ'
      6. Return each μcur ∪ μ' as a result
  • 55-56. URI Prefetching [Figure: the same iterator chain, annotated "Initiate look-up", "Initiate look-ups and wait", and first "Ensure look-up is finished", then "Wait until look-up is finished"]
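URI prefetching can be sketched with a thread pool: a look-up is started as soon as a URI shows up in a solution, and the iterator that later needs the data only waits for that look-up to finish. The class `PrefetchingStore`, its method names, and the `slow_fetch` stand-in for an HTTP look-up are assumptions made for this illustration. The sketch uses Python's standard `concurrent.futures` module.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class PrefetchingStore:
    """Queried data set whose URI look-ups can run in the background."""

    def __init__(self, fetch, max_workers=4):
        self._fetch = fetch      # function: URI -> set of triples
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}       # URI -> Future of its look-up
        self.dataset = set()

    def prefetch(self, uri):
        """Initiate the look-up in the background and return immediately."""
        if uri not in self._futures:
            self._futures[uri] = self._pool.submit(self._fetch, uri)

    def ensure_looked_up(self, uri):
        """Wait until the look-up for uri is finished, then add its data."""
        self.prefetch(uri)  # no-op if the look-up was already initiated
        self.dataset.update(self._futures[uri].result())

    def shutdown(self):
        self._pool.shutdown()


calls = []


def slow_fetch(uri):
    """Stand-in for an HTTP look-up; sleeps to simulate network latency."""
    calls.append(uri)
    time.sleep(0.05)
    return {(uri, "ex:label", "data from " + uri)}


store = PrefetchingStore(slow_fetch)
uris = ["http://geo.db/country/IT", "http://geo.db/country/US"]

# Step 5 of the slide: initiate look-ups when URIs appear in a solution.
for uri in uris:
    store.prefetch(uri)

# Step 2, executed later by the next iterator: ensure the look-up requirement.
for uri in uris:
    store.ensure_looked_up(uri)

store.shutdown()
print(len(store.dataset), sorted(calls))
```

Both look-ups run concurrently, so the second `ensure_looked_up` call usually finds its data already retrieved. If a look-up is still running, the call blocks, which is the remaining problem discussed on the following slides.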
  • 57-58. URI Prefetching ● Even with URI prefetching, query execution may block ● Possible solutions: ● Program parallelism ● Asynchronous pipeline ● Drawback: requires major rewrite of existing query engines [Figure: iterator chain Ii-1 for tpi-1, Ii for tpi, Ii+1 for tpi+1, annotated "Wait until look-up is finished"]
  • 60. Postponing Iterator ● Enabled by an extension of the iterator paradigm: ● New function POSTPONE: take the most recently provided result back ● Adjusted GETNEXT: either return the next result or return a formerly postponed result ● POSTPONE allows an iterator to temporarily reject an input solution μcur
  • 61. Postponing Iterator: steps of Ii for tpi, given an input solution μcur from the results of Ii-1 (same results table as on slides 38-43)
      1. Substitute: tpcur = μcur[ tpi ]
      2. POSTPONE μcur if the look-up requirement doesn't hold for tpcur
      3. Find matching triples match(tpcur) in the queried data set
      4. Create a solution μ' for each t in match(tpcur)
      5. Initiate parallel look-up for each new URI in μ'
      6. Return each μcur ∪ μ' as a result
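The POSTPONE extension can be sketched as follows. This is a simplified illustration: the class `PostponableSource`, the policy of preferring fresh results over postponed ones, and the `is_ready` function that simulates whether a look-up has finished are all assumptions of this example. The slides only state that GETNEXT returns either the next result or a formerly postponed one.

```python
from collections import deque

_DONE = object()  # sentinel marking the end of the fresh results


class PostponableSource:
    """Provider of input solutions with the additional POSTPONE function."""

    def __init__(self, solutions):
        self._fresh = iter(solutions)
        self._postponed = deque()
        self._last = None

    def get_next(self):
        """Return the next fresh result, else a formerly postponed one, else None."""
        mu = next(self._fresh, _DONE)
        if mu is _DONE:
            mu = self._postponed.popleft() if self._postponed else None
        self._last = mu
        return mu

    def postpone(self):
        """Take the most recently provided result back for later delivery."""
        if self._last is not None:
            self._postponed.append(self._last)
            self._last = None


# Simulation: the look-up for geo:IT is reported as unfinished once.
unfinished_checks = {"geo:IT": 1}


def is_ready(mu):
    uri = mu["?loc"]
    if unfinished_checks.get(uri, 0) > 0:
        unfinished_checks[uri] -= 1
        return False
    return True


source = PostponableSource([{"?loc": "geo:IT"}, {"?loc": "geo:US"}, {"?loc": "geo:DE"}])
processed = []
while (mu_cur := source.get_next()) is not None:
    if not is_ready(mu_cur):
        source.postpone()  # step 2: temporarily reject mu_cur
        continue
    processed.append(mu_cur["?loc"])  # steps 3-6 would run here
print(processed)
```

The solution for `geo:IT` is processed last instead of blocking the two solutions behind it. In this sketch a solution whose look-up never finishes would be retried in a busy loop, so a real engine would need to wait once only postponed solutions remain.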
  • 63. Evaluation ● Implementation: Semantic Web Client Library (SWClLib) http://www4.wiwiss.fu-berlin.de/bizer/ng4j/semwebclient/ ● Berlin SPARQL Benchmark (BSBM) ● Simulates e-commerce scenario ● Mix of 12 SPARQL queries ● Generates datasets of different sizes (scaling factor) ● Simulation of the Web of Linked Data ● Linked Data server publishes BSBM datasets ● Experiment ● Adjusted BSBM queries link to the simulation server ● Execute query mix with SWClLib
  • 64. Evaluation [Chart: avg. execution time per query mix in seconds (y-axis, 0 to 250) over the BSBM scaling factor (x-axis, 10 to 60); series: w/o prefetching, w/ prefetching, non-blocking + prefetching, all data retrieved in advance]
      scal. factor   # of triples   # of entities
      10             4,971          613
      20             8,485          928
      30             11,999         1,245
      40             16,918         1,845
      50             22,616         2,599
      60             26,108         2,914
  • 65. Take-away Summary ● Novel query execution approach for the Web of Data: ● Utilizes the characteristics of the Web ● Traverses RDF links during query execution ● Discovery of new data sources ● No need to know all data sources in advance ● Implementation approach: ● Iterator based execution with URI Prefetching ● Extension of the iterator paradigm (POSTPONE) ● New research challenges: ● Improving result completeness ● Investigating suitable caching strategies
  • 66. Try it! ● SQUIN http://squin.org ● Provides SWClLib functionality as a Web service ● Accessible like a SPARQL endpoint ● Public SQUIN service at http://squin.informatik.hu-berlin.de/SQUIN/
  • 67. These slides have been created by Olaf Hartig http://olafhartig.de This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/)