Executing SPARQL Queries over the Web of Linked Data
Olaf Hartig*
Christian Bizer˚
Johann-Christoph Freytag*
*...
●   Use URIs as names for things
                                                           ●   Use HTTP URIs so that peop...
●   The Web: a huge, globally distributed dataspace
 ●   Querying this dataspace opens new possibilities:
     ●   Aggrega...
Traditional approach 1: data centralization
 ●   Querying a collection of
     copies from all relevant
     datasets




...
Traditional approach 2: federated query processing
 ●   Querying a mediator which distributes
     subqueries to relevant ...
Main drawback:

                                You have to know the relevant
                                  data sourc...
A novel approach:

  Link Traversal Based Query Execution

                       Allows data sources to be discovered at ...
Outline

     Part I
        Overview of Link Traversal based Query Execution


     Part II
        An Iterator based Imp...
Main Idea
●   Intertwine query evaluation with traversal of RDF links
●   Alternately:
    ●   Evaluate parts of the query...
In a Nutshell

 ●   Link traversal based query execution:
     ●   Evaluation on a continuously augmented dataset
     ●  ...
Real-World Examples
 SELECT DISTINCT ?author ?phone WHERE {
     ?pub swc:isPartOf
           <http://data.semanticweb.org...
Iterator based Query Execution
 ●   Iterator:
     ●   implements an operation
     ●   is a group of functions:
         ...
Iterator based Query Execution
                                           ?c                               ?cStats

      ...
Iterator based Query Execution


 ●   Results of I_i are solutions for tp_1, …, tp_i

                                     ...
Application to Link Traversal


 ●   The queried data set grows


 ●   Look-up Requirement:
                              ...
Application to Link Traversal
                                           ?c                               ?cStats

       ...
Blocked Query Execution
 ●   Waiting for URI look-ups
     blocks query execution
                                        ...
URI Prefetching
 ●   Waiting for URI look-ups
     blocks query execution
 ●   URI prefetching: when a URI                ...
URI Prefetching
                                           ?c                               ?cStats

                     ...
URI Prefetching


                                                                                     I_{i-1} for tp_{i-1}



 ...
URI Prefetching
 ●   Even with URI prefetching
     query execution may block
                                            ...
Postponing Iterator


 ●   Enabled by an extension of the iterator paradigm:
     ●   New function POSTPONE: take most rec...
Postponing Iterator
                                           ?c                               ?cStats

                 ...
Evaluation
 ●   Implementation: Semantic Web Client Library (SWClLib)
          http://www4.wiwiss.fu-berlin.de/bizer/ng4j...
Evaluation
                                               250

                                                           ...
Take-away Summary
 ●   Novel query execution approach for the Web of Data:
     ●   Utilizes the characteristics of the We...
Try it!


 ●   SQUIN                                                           http://squin.org
     ●   Provides SWClLib ...
These slides have been created by
                                      Olaf Hartig

                                     ...
Executing SPARQL Queries over the Web of Linked Data


With these slides I presented my paper at the International Semantic Web Conference (ISWC'09), Washington DC, USA, Oct. 2009.


  1. Executing SPARQL Queries over the Web of Linked Data. Olaf Hartig*, Christian Bizer˚, Johann-Christoph Freytag* (*Humboldt-Universität zu Berlin, ˚Freie Universität Berlin)
  2. ● Use URIs as names for things.
     ● Use HTTP URIs so that people can look up those names.
     ● When someone looks up a URI, provide useful information.
     ● Include links to other URIs so that they can discover more things.
     Tim Berners-Lee, July 2006
     [Diagram: My Movie DB]
     Olaf Hartig - Executing SPARQL Queries over the Web of Linked Data
  3. The four principles of slide 2 (Tim Berners-Lee, July 2006), illustrated with My Movie DB. The diagram shows HTTP URIs such as http://mymovie.db/movie1342, http://mymovie.db/movie0362, http://mymovie.db/movie5112 and http://mymovie.db/movie2449.
  4. The same principles and URIs as on slide 3; the diagram adds a look-up of http://mymovie.db/movie2449.
  7. The same principles and My Movie DB URIs as on slide 3; the diagram adds URIs from a second namespace: http://geo.db/country21, http://geo.db/country7, http://geo.db/cityCJ and http://geo.db/cityXA.
  9. ● The Web: a huge, globally distributed dataspace
     ● Querying this dataspace opens new possibilities:
       ● Aggregating data from different sources
       ● Integrating fragmentary information
       ● Achieving a more complete view
  11. Traditional approach 1: data centralization
      ● Querying a collection of copies from all relevant datasets
      ● Misses unknown or new sources
      ● Collection probably out of date
      ● Will it scale?
  13. Traditional approach 2: federated query processing
      ● Querying a mediator which distributes subqueries to relevant sources and integrates the results
      ● Requires sources to provide a query service
      ● Requires information about the sources
      ● Misses unknown or new sources
  14. Main drawback: You have to know the relevant data sources in advance. You restrict yourself to the selected sources. You do not tap the full potential of the Web!
  15. A novel approach: Link Traversal Based Query Execution. Allows data sources to be discovered at runtime.
  16. Outline
      Part I: Overview of Link Traversal based Query Execution
      Part II: An Iterator based Implementation Approach
  17. Main Idea
      ● Intertwine query evaluation with traversal of RDF links
      ● Alternately:
        ● Evaluate parts of the query on a continuously augmented set of data
        ● Look up URIs in intermediate solutions and add retrieved data to the queried data set
      [Diagram: the queried data set, initially empty]
  18. Main Idea (continued). The diagram adds the example query:
        http://.../movie2449 filmingLocation ?loc .
        ?loc statistics ?stat .
        ?stat unemp_rate ?ur .
  19. Main Idea (continued). The diagram shows a look-up of http://.../movie2449, the URI mentioned in the query.
  23. Main Idea (continued). The queried data set now contains the retrieved triple:
        http://.../movie2449 filmingLocation http://geo.../Italy
  24. Main Idea (continued). Evaluating the first triple pattern yields the intermediate solution:
        ?loc = http://geo.../Italy
  25. Main Idea (continued). The diagram shows a look-up of http://geo.../Italy, the URI bound to ?loc.
  30. Main Idea (continued). The queried data set now also contains the retrieved triple:
        http://geo.../Italy statistics http://stat.db/.../it
  31. Main Idea (continued). Evaluating the second triple pattern extends the intermediate solution:
        ?loc = http://geo.../Italy, ?stat = http://stats.db/../it
  32. In a Nutshell
      ● Link traversal based query execution:
        ● Evaluation on a continuously augmented dataset
        ● Discovery of potentially relevant data during execution
        ● Discovery driven by intermediate solutions
      ● Main advantage:
        ● No need to know all data sources in advance
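The alternation between evaluation and look-up described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `WEB` dictionary stands in for HTTP look-ups, all URIs and the value "6.7" are made up, and the names `execute`, `match` and `look_up` are hypothetical.

```python
# Toy "Web": dereferencing a URI yields RDF triples (all data here is made up).
WEB = {
    "http://mymovie.db/movie2449": [
        ("http://mymovie.db/movie2449", "filmingLocation", "http://geo.db/Italy")],
    "http://geo.db/Italy": [
        ("http://geo.db/Italy", "statistics", "http://stats.db/it")],
    "http://stats.db/it": [
        ("http://stats.db/it", "unemp_rate", "6.7")],
}

QUERY = [
    ("http://mymovie.db/movie2449", "filmingLocation", "?loc"),
    ("?loc", "statistics", "?stat"),
    ("?stat", "unemp_rate", "?ur"),
]

def is_var(term):
    return term.startswith("?")

def match(pattern, triple, solution):
    """Extend `solution` so that `pattern` matches `triple`, or return None."""
    extended = dict(solution)
    for p, t in zip(pattern, triple):
        p = extended.get(p, p)      # replace an already bound variable by its value
        if is_var(p):
            extended[p] = t         # bind a free variable
        elif p != t:
            return None             # constant mismatch
    return extended

def execute(patterns):
    dataset, seen = set(), set()    # the queried data set starts empty

    def look_up(term):
        # Only dereference HTTP URIs, and each URI at most once.
        if term.startswith("http://") and term not in seen:
            seen.add(term)
            dataset.update(WEB.get(term, []))

    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for solution in solutions:
            # Look up the URIs in the pattern after substituting bound variables ...
            for term in pattern:
                look_up(solution.get(term, term))
            # ... then evaluate the pattern on the augmented data set.
            for triple in list(dataset):
                extended = match(pattern, triple, solution)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions

print(execute(QUERY))
```

The data set is empty at the start, and every look-up is triggered by a URI that appears in the query or in an intermediate solution, which is the point of the approach: no source has to be known in advance.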
  33. Real-World Examples
      Return phone numbers of authors of ontology engineering papers at ESWC'09.
        SELECT DISTINCT ?author ?phone WHERE {
          ?pub swc:isPartOf <http://data.semanticweb.org/conference/eswc/2009/proceedings> .
          ?pub swc:hasTopic ?topic .
          ?topic rdfs:label ?topicLabel .
          FILTER regex( str(?topicLabel), "ontology engineering", "i" ) .
          ?pub swrc:author ?author .
          { ?author owl:sameAs ?authorAlt } UNION { ?authorAlt owl:sameAs ?author }
          ?authorAlt foaf:phone ?phone
        }
      # of query results: 2
      # of retrieved graphs: 297
      # of accessed servers: 16
      avg. execution time: 1min 30sec
  34. Outline
      Part I: Overview of Link Traversal based Query Execution
      Part II: An Iterator based Implementation Approach
        ➢ Introduction to the Iterator Paradigm
        ➢ Application to Link Traversal based Query Execution
        ➢ URI Prefetching
        ➢ Extension to the Iterator Paradigm
        ➢ Evaluation
  37. Iterator based Query Execution
      ● Iterator:
        ● implements an operation
        ● is a group of functions: OPEN, GETNEXT, CLOSE
      ● Query execution uses a chain of iterators
      ● Each iterator is responsible for a single triple pattern
      Example query and iterator chain:
        I1: http://.../movie2449 filmingLocation ?loc
        I2: ?loc statistics ?stat
        I3: ?stat unemp_rate ?ur
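The iterator paradigm named on this slide (OPEN, GETNEXT, CLOSE, with iterators chained so that each pulls from its predecessor) can be illustrated generically. The classes `ListScan` and `Select` and the helper `run` below are hypothetical names chosen for this sketch; they are not taken from the slides or from any query engine.

```python
class ListScan:
    """Leaf iterator over an in-memory list of solutions."""
    def __init__(self, rows):
        self.rows = rows

    def open(self):                      # OPEN: initialise state
        self.pos = 0

    def get_next(self):                  # GETNEXT: one result, or None at the end
        if self.pos >= len(self.rows):
            return None
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self):                     # CLOSE: release state
        self.pos = None


class Select:
    """Iterator that pulls from its predecessor and keeps matching solutions."""
    def __init__(self, pred, condition):
        self.pred, self.condition = pred, condition

    def open(self):
        self.pred.open()

    def get_next(self):
        while (row := self.pred.get_next()) is not None:
            if self.condition(row):
                return row
        return None

    def close(self):
        self.pred.close()


def run(iterator):
    """Drive the last iterator of a chain until it is exhausted."""
    iterator.open()
    results = []
    while (row := iterator.get_next()) is not None:
        results.append(row)
    iterator.close()
    return results


chain = Select(ListScan([{"?c": "IT"}, {"?c": "US"}, {"?c": "DE"}]),
               lambda row: row["?c"] != "US")
print(run(chain))
```

Only the last iterator is called by the engine; every result is computed on demand by pulling through the chain, so no intermediate result set has to be materialised.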
  43. Iterator based Query Execution
      Results from I_{i-1} (one of them is the current input solution μ_cur):
        ?c                          ?cStats
        http://geo.db/country/US    http://stats.example.org/USstatistics
        http://geo.db/country/IT    http://stats.example.org/ITstatistics
        http://geo.db/country/IT    http://stats.db/example/It
        http://example.db/ctry/DE   http://stats.example.org/Germany
      I_i for tp_i:
        1. Substitute tp_cur = μ_cur[ tp_i ]
        2. Find matching triples match(tp_cur) in queried data set
        3. Create solution μ' for each t in match(tp_cur)
        4. Return each μ_cur ∪ μ' as a result
      Example:
        tp_i = ( ?loc ex:stats ?s )
        μ_cur = { ?p → http://ex... , ?loc → http://geo... }
        1. tp_cur = ( http://geo... ex:stats ?s )
        2. (http://geo... ex:stats http://db...), (http://geo... ex:stats http://ex...)
        3. μ' = { ?s → http://db... }
        4. { ?p → http://ex... , ?loc → http://geo.db/... , ?s → http://db... }
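The four steps of I_i can be written down as a small Python iterator. This is a sketch under assumptions: the queried data set is a plain list of triples, solutions are dictionaries, and the names `TriplePatternIterator`, `InputSolutions`, `substitute` and `find_matches` as well as the concrete URIs are invented for the example.

```python
def substitute(tp, mu):
    """Step 1: replace the variables bound in mu by their values."""
    return tuple(mu.get(term, term) for term in tp)

def find_matches(tp_cur, dataset):
    """Steps 2 and 3: one solution mu' per triple that matches tp_cur."""
    solutions = []
    for triple in dataset:
        mu_prime = {}
        for p, t in zip(tp_cur, triple):
            if p.startswith("?"):
                if mu_prime.setdefault(p, t) != t:   # same variable, different value
                    break
            elif p != t:                             # constant mismatch
                break
        else:
            solutions.append(mu_prime)
    return solutions

class InputSolutions:
    """Stand-in for I_{i-1}: returns a fixed list of solutions."""
    def __init__(self, solutions):
        self.solutions = list(solutions)
    def open(self):
        self.queue = list(self.solutions)
    def get_next(self):
        return self.queue.pop(0) if self.queue else None
    def close(self):
        self.queue = []

class TriplePatternIterator:
    """I_i: responsible for the single triple pattern tp_i."""
    def __init__(self, pred, tp, dataset):
        self.pred, self.tp, self.dataset = pred, tp, dataset
    def open(self):
        self.pred.open()
        self.pending = []
    def get_next(self):
        while not self.pending:
            mu_cur = self.pred.get_next()
            if mu_cur is None:
                return None
            tp_cur = substitute(self.tp, mu_cur)                    # step 1
            self.pending = [{**mu_cur, **mu_prime}                  # step 4
                            for mu_prime in find_matches(tp_cur, self.dataset)]
        return self.pending.pop(0)
    def close(self):
        self.pred.close()

DATASET = [
    ("http://geo.db/country/IT", "ex:stats", "http://stats.example.org/ITstatistics"),
    ("http://geo.db/country/IT", "ex:stats", "http://stats.db/example/It"),
    ("http://geo.db/country/US", "ex:stats", "http://stats.example.org/USstatistics"),
]
mu_cur = {"?p": "http://example.org/p1", "?loc": "http://geo.db/country/IT"}

iterator = TriplePatternIterator(InputSolutions([mu_cur]), ("?loc", "ex:stats", "?s"), DATASET)
iterator.open()
results = []
while (solution := iterator.get_next()) is not None:
    results.append(solution)
iterator.close()
print(results)
```

As on the slide, one input solution can produce several results: here μ_cur is extended once for each of the two matching statistics triples.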
  44. Iterator based Query Execution
      ● Results of I_i are solutions for tp_1, …, tp_i
      [Diagram: iterator chain I_{i-1} for tp_{i-1}, I_i for tp_i, I_{i+1} for tp_{i+1}]
  47. Application to Link Traversal
      ● The queried data set grows
      ● Look-up Requirement: Do not evaluate tp_cur until the queried data set contains all data that can be retrieved from all URIs in tp_cur
      [Diagram: iterator chain I_{i-1} for tp_{i-1}, I_i for tp_i, I_{i+1} for tp_{i+1}]
  50. Application to Link Traversal
      Input: results from I_{i-1} (bindings for ?c and ?cStats); μ_cur is the current input solution.
      I_i for tp_i:
        1. Substitute tp_cur = μ_cur[ tp_i ]
        2. Ensure look-up requirement for tp_cur (initiate look-ups and wait)
        3. Find matching triples match(tp_cur) in queried data set
        4. Create solution μ' for each t in match(tp_cur)
        5. Return each μ_cur ∪ μ' as a result
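Step 2 is the only change compared with the plain iterator, and a minimal sketch shows why it blocks. The code assumes a hypothetical `dereference` function that returns the triples obtained from a URI (here backed by an in-memory `TOY_WEB` with made-up data); the matching is simplified and ignores repeated variables.

```python
# Made-up data standing in for what an HTTP look-up would return.
TOY_WEB = {
    "http://geo.db/country/IT": {
        ("http://geo.db/country/IT", "ex:stats", "http://stats.db/example/It"),
    },
}

def dereference(uri):
    """Hypothetical look-up; a real one would issue an HTTP request and parse RDF."""
    return TOY_WEB.get(uri, set())

def ensure_lookup_requirement(tp_cur, dataset, looked_up):
    """Step 2: look up every URI in tp_cur that has not been looked up yet."""
    for term in tp_cur:
        if term.startswith("http://") and term not in looked_up:
            dataset.update(dereference(term))   # the caller waits for this call
            looked_up.add(term)

def evaluate(tp_cur, dataset, looked_up):
    """Steps 2 and 3 for an already substituted triple pattern."""
    ensure_lookup_requirement(tp_cur, dataset, looked_up)
    return [t for t in dataset
            if all(p.startswith("?") or p == v for p, v in zip(tp_cur, t))]

dataset, looked_up = set(), set()
tp_cur = ("http://geo.db/country/IT", "ex:stats", "?s")
print(evaluate(tp_cur, dataset, looked_up))
```

Because `dereference` is called synchronously inside GETNEXT, the whole iterator chain stands still for the duration of each look-up.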
  51. Blocked Query Execution
      ● Waiting for URI look-ups blocks query execution
      [Diagram: iterator chain I_{i-1} for tp_{i-1}, I_i for tp_i, I_{i+1} for tp_{i+1}; annotation: "Initiate look-ups and wait"]
  53. URI Prefetching
      ● Waiting for URI look-ups blocks query execution
      ● URI prefetching: when a URI is bound to a variable, initiate look-up in the background
      [Diagram: iterator chain I_{i-1} for tp_{i-1}, I_i for tp_i, I_{i+1} for tp_{i+1}; annotations: "Initiate look-up", "Ensure look-up is finished", "Initiate look-ups and wait"]
  54. URI Prefetching
      Input: results from I_{i-1} (bindings for ?c and ?cStats); μ_cur is the current input solution.
      I_i for tp_i:
        1. Substitute tp_cur = μ_cur[ tp_i ]
        2. Ensure look-up requirement for tp_cur
        3. Find matching triples match(tp_cur) in queried data set
        4. Create solution μ' for each t in match(tp_cur)
        5. Initiate parallel look-up for each new URI in μ'
        6. Return each μ_cur ∪ μ' as a result
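One way to realise steps 2 and 5 is a thread pool that starts a look-up as soon as a URI is bound and only waits for it when the data is needed. The sketch below uses Python's standard `concurrent.futures`; the `Prefetcher` class, the `slow_dereference` function and the URIs are assumptions made for illustration, not part of the presented system.

```python
import time
from concurrent.futures import ThreadPoolExecutor

class Prefetcher:
    def __init__(self, dereference, dataset, workers=4):
        self.dereference = dereference
        self.dataset = dataset
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.futures = {}                       # URI -> Future of its look-up

    def prefetch(self, uri):
        """Step 5: initiate a background look-up unless one was already started."""
        if uri not in self.futures:
            self.futures[uri] = self.pool.submit(self.dereference, uri)

    def ensure(self, uri):
        """Step 2: wait until the look-up is finished, then add the retrieved data."""
        self.prefetch(uri)
        self.dataset.update(self.futures[uri].result())

    def close(self):
        self.pool.shutdown()

calls = []

def slow_dereference(uri):
    """Hypothetical look-up that takes 50 ms and returns one made-up triple."""
    calls.append(uri)
    time.sleep(0.05)
    return {(uri, "ex:label", "data from " + uri)}

uris = ["http://geo.db/country/US", "http://geo.db/country/IT", "http://example.db/ctry/DE"]
dataset = set()
prefetcher = Prefetcher(slow_dereference, dataset)
for uri in uris:            # look-ups start as soon as the URIs are known ...
    prefetcher.prefetch(uri)
for uri in uris:            # ... and are awaited only when a later iterator needs them
    prefetcher.ensure(uri)
prefetcher.close()
print(len(dataset), "triples retrieved with", len(calls), "look-ups")
```

The three look-ups overlap instead of running one after another, but `ensure` can still block when a look-up has not finished by the time its data is needed, which is the situation the next slides address.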
  55. URI Prefetching
      [Diagram: iterator chain I_{i-1} for tp_{i-1}, I_i for tp_i, I_{i+1} for tp_{i+1}; annotations: "Initiate look-up", "Ensure look-up is finished", "Initiate look-ups and wait"]
  56. URI Prefetching
      [Diagram: the same iterator chain; "Ensure look-up is finished" becomes "Wait until look-up is finished"]
  58. 58. URI Prefetching
     ●   Even with URI prefetching, query execution may block
     ●   Possible solutions:
         ●   Program parallelism
         ●   Asynchronous pipeline
     ●   Drawback: requires major rewrite of existing query engines
     [Diagram: iterator chain I_{i-1} (for tp_{i-1}), I_i (for tp_i), I_{i+1} (for tp_{i+1}); annotation: "Wait until look-up is finished"]
  59. 59. Outline
     Part I
        Overview of Link Traversal based Query Execution
     Part II
        An Iterator based Implementation Approach
        ➢ Introduction to the Iterator Paradigm
        ➢ Application to Link Traversal based Query Execution
        ➢ URI Prefetching
        ➢ Extension to the Iterator Paradigm
        ➢ Evaluation
  60. 60. Postponing Iterator
     ●   Enabled by an extension of the iterator paradigm:
         ●   New function POSTPONE: take the most recently provided result back
         ●   Adjusted GETNEXT: either return the next result or return a formerly postponed result
     ●   POSTPONE allows an iterator to temporarily reject the input solution μ_cur
  61. 61. Postponing Iterator
     Results from I_{i-1} (μ_cur is the solution currently being processed):
        ?c                           ?cStats
        http://geo.db/country/US     http://stats.example.org/USstatistics
        http://geo.db/country/IT     http://stats.example.org/ITstatistics
        http://geo.db/country/IT     http://stats.db/example/It
        http://example.db/ctry/DE    http://stats.example.org/Germany
     I_i for tp_i:
        1. Substitute: tp_cur = μ_cur[tp_i]
        2. POSTPONE μ_cur if the look-up requirement doesn't hold for tp_cur
        3. Find matching triples match(tp_cur) in the queried data set
        4. Create a solution μ' for each t in match(tp_cur)
        5. Initiate a parallel look-up for each new URI in μ'
        6. Return each μ_cur ∪ μ' as a result
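The POSTPONE extension might look like the following Python sketch. It is an illustration under assumptions, not the SWClLib code: the class and function names are hypothetical, and the policy of handing postponed solutions back only after the input is exhausted is one possible choice that the slides do not prescribe. The `lookup_finished` function merely simulates a look-up that completes between two checks.

```python
from collections import deque

class PostponingIterator:
    """Hypothetical iterator wrapper offering GETNEXT and POSTPONE."""
    def __init__(self, source):
        self._source = iter(source)
        self._postponed = deque()   # solutions taken back via postpone()
        self._last = None           # most recently provided result

    def get_next(self):
        """Return the next fresh result, else a formerly postponed one, else None."""
        try:
            self._last = next(self._source)
        except StopIteration:
            if not self._postponed:
                return None
            self._last = self._postponed.popleft()
        return self._last

    def postpone(self):
        """Take the most recently provided result back for later delivery."""
        if self._last is not None:
            self._postponed.append(self._last)
            self._last = None

# Simulated look-up state: the IT look-up is still running at the first check
# and has finished by the second one (an assumption for this demo only).
pending = {"http://geo.db/country/IT"}

def lookup_finished(uri):
    if uri in pending:
        pending.discard(uri)
        return False
    return True

def consume(pred, lookup_finished):
    """Step 2 of the slide: postpone mu_cur instead of waiting for its look-up."""
    results = []
    while (mu_cur := pred.get_next()) is not None:
        if not lookup_finished(mu_cur["?c"]):
            pred.postpone()      # reject mu_cur for now, continue with the next one
            continue
        results.append(mu_cur)   # steps 3 to 6 would run here
    return results

inputs = [
    {"?c": "http://geo.db/country/US"},
    {"?c": "http://geo.db/country/IT"},
    {"?c": "http://example.db/ctry/DE"},
]
results = consume(PostponingIterator(inputs), lookup_finished)
print([mu["?c"] for mu in results])
```

Here the solution for IT is processed last because it was postponed once. A real engine would still have to wait when only postponed solutions remain and none of their look-ups has finished; the sketch leaves that case out.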
  62. 62. Outline
     Part I
        Overview of Link Traversal based Query Execution
     Part II
        An Iterator based Implementation Approach
        ➢ Introduction to the Iterator Paradigm
        ➢ Application to Link Traversal based Query Execution
        ➢ URI Prefetching
        ➢ Extension to the Iterator Paradigm
        ➢ Evaluation
  63. 63. Evaluation
     ●   Implementation: Semantic Web Client Library (SWClLib) http://www4.wiwiss.fu-berlin.de/bizer/ng4j/semwebclient/
     ●   Berlin SPARQL Benchmark (BSBM)
         ●   Simulates e-commerce scenario
         ●   Mix of 12 SPARQL queries
         ●   Generates datasets of different sizes (scaling factor)
     ●   Simulation of the Web of Linked Data
         ●   Linked Data server publishes BSBM datasets
     ●   Experiment
         ●   Adjusted BSBM queries link to the simulation server
         ●   Execute query mix with SWClLib
  64. 64. Evaluation
     [Chart: avg. execution time per query mix in seconds (y-axis, 0 to 250) over BSBM scaling factor (x-axis, 10 to 60); series: w/o prefetching, w/ prefetching, non-blocking + prefetching, all data retrieved in advance]
        scal. factor    # of triples    # of entities
        10              4,971           613
        20              8,485           928
        30              11,999          1,245
        40              16,918          1,845
        50              22,616          2,599
        60              26,108          2,914
  65. 65. Take-away Summary
     ●   Novel query execution approach for the Web of Data:
         ●   Utilizes the characteristics of the Web
         ●   Traverses RDF links during query execution
             ●   Discovery of new data sources
             ●   No need to know all data sources in advance
     ●   Implementation approach:
         ●   Iterator based execution with URI Prefetching
         ●   Extension of the iterator paradigm (POSTPONE)
     ●   New research challenges:
         ●   Improving result completeness
         ●   Investigating suitable caching strategies
  66. 66. Try it!
     ●   SQUIN http://squin.org
         ●   Provides SWClLib functionality as a Web service
         ●   Accessible like a SPARQL endpoint
     ●   Public SQUIN service at http://squin.informatik.hu-berlin.de/SQUIN/
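Since the service is described as accessible like a SPARQL endpoint, a request could presumably be built the way the SPARQL Protocol defines it, with the query passed in a `query` parameter of an HTTP GET. The Python sketch below only constructs such a URL and sends nothing. That the public SQUIN address accepts requests in exactly this form is an assumption, and the helper name and the example query are hypothetical.

```python
from urllib.parse import urlencode

# Service address taken from the slide; whether it accepts SPARQL Protocol
# GET requests at this exact path is an assumption.
SQUIN_ENDPOINT = "http://squin.informatik.hu-berlin.de/SQUIN/"

def build_query_url(endpoint, sparql):
    """Encode a SPARQL query as a GET request URL (query=... parameter)."""
    return endpoint + "?" + urlencode({"query": sparql})

# Hypothetical query; the URI in it serves as the starting point for link traversal.
query = "SELECT ?p ?o WHERE { <http://geo.db/country/IT> ?p ?o }"
url = build_query_url(SQUIN_ENDPOINT, query)
print(url)
```

The resulting URL could then be fetched with any HTTP client, for example `urllib.request.urlopen(url)`, provided the service is reachable.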
  67. 67. These slides have been created by Olaf Hartig http://olafhartig.de
     This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/)