Executing SPARQL Queries over the Web of Linked Data
1. Executing SPARQL Queries over the Web of Linked Data
Olaf Hartig*
Christian Bizer˚
Johann-Christoph Freytag*
*Humboldt-Universität zu Berlin ˚Freie Universität Berlin
2. ● Use URIs as names for things
● Use HTTP URIs so that people can look up those names.
● When someone looks up a URI, provide useful information.
● Include links to other URIs so that they can discover more things.
Tim Berners-Lee, July 2006
My Movie DB
Olaf Hartig - Executing SPARQL Queries over the Web of Linked Data
3. My Movie DB
http://mymovie.db/movie1342
http://mymovie.db/movie0362
http://mymovie.db/movie5112
http://mymovie.db/movie2449
4. Looking up http://mymovie.db/movie2449 ?
7. My Movie DB and a second data source:
http://geo.db/country21
http://geo.db/country7
http://geo.db/cityCJ
http://geo.db/cityXA
9. ● The Web: a huge, globally distributed dataspace
● Querying this dataspace opens new possibilities:
● Aggregating data from different sources
● Integrating fragmentary information
● Achieving a more complete view
11. Traditional approach 1: data centralization
● Querying a collection of copies from all relevant datasets
● Misses unknown or new sources
● Collection probably out of date
● Will it scale?
13. Traditional approach 2: federated query processing
● Querying a mediator which distributes subqueries to relevant sources and integrates the results
● Requires sources to provide a query service
● Requires information about the sources
● Misses unknown or new sources
14. Main drawback:
You have to know the relevant data sources in advance.
You restrict yourself to the selected sources.
You do not tap the full potential of the Web!
15. A novel approach:
Link Traversal Based Query Execution
Allows data sources to be discovered at runtime
16. Outline
Part I
Overview of Link Traversal based Query Execution
Part II
An Iterator based Implementation Approach
17. Main Idea
● Intertwine query evaluation with traversal of RDF links
● Alternately:
● Evaluate parts of the query on a continuously augmented set of data
● Look up URIs in intermediate solutions and add retrieved data to the queried data set
18. Main Idea
Query:
( http://.../movie2449 filmingLocation ?loc )
( ?loc statistics ?stat )
( ?stat unemp_rate ?ur )
19. Main Idea
Look up http://.../movie2449 ?
23. Main Idea
Queried data: ( http://.../movie2449 filmingLocation http://geo.../Italy )
24. Main Idea
Intermediate solution: ?loc → http://geo.../Italy
25. Main Idea
Look up http://geo.../Italy ?
30. Main Idea
Queried data now also contains: ( http://geo.../Italy statistics http://stat.db/.../it )
31. Main Idea
Intermediate solution: ?loc → http://geo.../Italy, ?stat → http://stats.db/../it
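The walkthrough above can be condensed into a short program. This is a minimal sketch under stated assumptions, not the implementation discussed later in the talk: the Web is simulated by a hypothetical in-memory dictionary `WEB` (the `geo.example.org` and `stats.example.org` URIs and the value `7.8` are invented for illustration), every term starting with `http://` is treated as a dereferenceable URI, and the triple patterns are evaluated one after another. Before a pattern is evaluated, the URIs it contains after substitution are looked up and the retrieved triples are added to the queried data set.

```python
from typing import Dict, List, Optional, Set, Tuple
from urllib.parse import urlparse

Triple = Tuple[str, str, str]
Solution = Dict[str, str]

# Hypothetical toy Web: looking up a URI returns these triples.
WEB: Dict[str, List[Triple]] = {
    "http://mymovie.db/movie2449": [
        ("http://mymovie.db/movie2449", "filmingLocation", "http://geo.example.org/Italy")],
    "http://geo.example.org/Italy": [
        ("http://geo.example.org/Italy", "statistics", "http://stats.example.org/it")],
    "http://stats.example.org/it": [
        ("http://stats.example.org/it", "unemp_rate", "7.8")],
}


def match(tp: Triple, t: Triple) -> Optional[Solution]:
    """Bindings that make pattern tp equal to triple t, or None if t does not match."""
    mu: Solution = {}
    for p, v in zip(tp, t):
        if p.startswith("?"):
            if mu.setdefault(p, v) != v:
                return None
        elif p != v:
            return None
    return mu


def execute(query: List[Triple]) -> Tuple[List[Solution], Set[str]]:
    """Evaluate the patterns in order on a data set that grows during execution."""
    data: Set[Triple] = set()        # queried data set, continuously augmented
    looked_up: Set[str] = set()
    solutions: List[Solution] = [{}]
    for tp in query:
        extended: List[Solution] = []
        for mu in solutions:
            tp_cur = tuple(mu.get(x, x) for x in tp)
            # Look up URIs from the intermediate solution, add the retrieved data.
            for term in tp_cur:
                if term.startswith("http://") and term not in looked_up:
                    looked_up.add(term)
                    data.update(WEB.get(term, []))
            # Evaluate this part of the query on the data retrieved so far.
            for t in sorted(data):
                mu_prime = match(tp_cur, t)
                if mu_prime is not None:
                    extended.append({**mu, **mu_prime})
        solutions = extended
    return solutions, looked_up


QUERY: List[Triple] = [
    ("http://mymovie.db/movie2449", "filmingLocation", "?loc"),
    ("?loc", "statistics", "?stat"),
    ("?stat", "unemp_rate", "?ur"),
]
solutions, looked_up = execute(QUERY)
servers = {urlparse(u).netloc for u in looked_up}  # data sources discovered at runtime
```

Only `http://mymovie.db/movie2449` is known in advance; the other two servers end up in `servers` because their URIs appear in intermediate solutions.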
32. In a Nutshell
● Link traversal based query execution:
● Evaluation on a continuously augmented dataset
● Discovery of potentially relevant data during execution
● Discovery driven by intermediate solutions
● Main advantage:
● No need to know all data sources in advance
33. Real-World Examples
Return phone numbers of authors of ontology engineering papers at ESWC'09:
SELECT DISTINCT ?author ?phone WHERE {
  ?pub swc:isPartOf <http://data.semanticweb.org/conference/eswc/2009/proceedings> .
  ?pub swc:hasTopic ?topic . ?topic rdfs:label ?topicLabel .
  FILTER regex( str(?topicLabel), "ontology engineering", "i" ) .
  ?pub swrc:author ?author .
  { ?author owl:sameAs ?authorAlt }
  UNION
  { ?authorAlt owl:sameAs ?author }
  ?authorAlt foaf:phone ?phone
}
# of query results: 2
# of retrieved graphs: 297
# of accessed servers: 16
avg. execution time: 1min 30sec
34. Outline
Part I
Overview of Link Traversal based Query Execution
Part II
An Iterator based Implementation Approach
➢ Introduction to the Iterator Paradigm
➢ Application to Link Traversal based Query Execution
➢ URI Prefetching
➢ Extension to the Iterator Paradigm
➢ Evaluation
36. Iterator based Query Execution
● Iterator:
● implements an operation
● is a group of functions: OPEN, GETNEXT, CLOSE
● Query execution uses a chain of iterators (I1, I2, I3)
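To make the iterator paradigm concrete, here is a minimal Python sketch. The class names `QueryIterator`, `Scan`, and `Select` are hypothetical and the operations are deliberately trivial. Each iterator pulls its input from its predecessor through GETNEXT, so a chain of three iterators produces results one at a time, on demand. Returning `None` to signal exhaustion assumes `None` is never a regular result.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Optional


class QueryIterator(ABC):
    """OPEN / GETNEXT / CLOSE; get_next() returns None once exhausted."""

    @abstractmethod
    def open(self) -> None: ...

    @abstractmethod
    def get_next(self) -> Optional[object]: ...

    @abstractmethod
    def close(self) -> None: ...


class Scan(QueryIterator):
    """First iterator in the chain: reads items from a list."""

    def __init__(self, items: List[object]) -> None:
        self.items, self.pos = items, 0

    def open(self) -> None:
        self.pos = 0

    def get_next(self) -> Optional[object]:
        if self.pos == len(self.items):
            return None
        self.pos += 1
        return self.items[self.pos - 1]

    def close(self) -> None:
        pass


class Select(QueryIterator):
    """Pulls from its predecessor and passes on items satisfying a predicate."""

    def __init__(self, child: QueryIterator, keep: Callable[[object], bool]) -> None:
        self.child, self.keep = child, keep

    def open(self) -> None:
        self.child.open()

    def get_next(self) -> Optional[object]:
        while (item := self.child.get_next()) is not None:
            if self.keep(item):
                return item
        return None

    def close(self) -> None:
        self.child.close()


def run(root: QueryIterator) -> List[object]:
    """Drive the chain: OPEN once, GETNEXT until exhausted, then CLOSE."""
    root.open()
    results = []
    while (item := root.get_next()) is not None:
        results.append(item)
    root.close()
    return results


# Chain I1 -> I2 -> I3: scan, keep even numbers, keep numbers greater than 2.
result = run(Select(Select(Scan([1, 2, 3, 4, 5, 6]), lambda x: x % 2 == 0), lambda x: x > 2))
```

Here `result` is `[4, 6]`; the outermost iterator never sees the whole input at once, it only asks its predecessor for the next item.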
37. Iterator based Query Execution
● Each iterator is responsible for a single triple pattern:
I1 for ( http://.../movie2449 filmingLocation ?loc )
I2 for ( ?loc statistics ?stat )
I3 for ( ?stat unemp_rate ?ur )
38. Iterator based Query Execution
Results from I_(i-1) (μ_cur is the current input solution):
?c                          ?cStats
http://geo.db/country/US    http://stats.example.org/USstatistics
http://geo.db/country/IT    http://stats.example.org/ITstatistics
http://geo.db/country/IT    http://stats.db/example/It
http://example.db/ctry/DE   http://stats.example.org/Germany
I_i for tp_i:
1. Substitute tp_cur = μ_cur[tp_i]
2. Find matching triples match(tp_cur) in queried data set
3. Create solution μ' for each t in match(tp_cur)
4. Return each μ_cur ∪ μ' as a result
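The four steps can be sketched as an iterator over a fixed queried data set (URI look-ups are left out here). Assumptions for illustration: `SolutionList` is a hypothetical stand-in for the predecessor I_(i-1), the `example.org` URIs and triples are invented, prefixed names such as `ex:stats` are kept as plain strings, and variables are strings starting with `?`. The input mirrors the worked example with tp_i = ( ?loc ex:stats ?s ).

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Solution = Dict[str, str]


class SolutionList:
    """Stand-in for the predecessor I_(i-1): returns precomputed solutions."""

    def __init__(self, solutions: List[Solution]) -> None:
        self.solutions, self.pos = solutions, 0

    def open(self) -> None:
        self.pos = 0

    def get_next(self) -> Optional[Solution]:
        if self.pos == len(self.solutions):
            return None
        self.pos += 1
        return self.solutions[self.pos - 1]

    def close(self) -> None:
        pass


def match(tp: Triple, t: Triple) -> Optional[Solution]:
    """Solution mu' for triple t if t matches pattern tp, else None."""
    mu: Solution = {}
    for p, v in zip(tp, t):
        if p.startswith("?"):
            if mu.setdefault(p, v) != v:
                return None
        elif p != v:
            return None
    return mu


class TriplePatternIterator:
    """I_i for tp_i over a fixed queried data set."""

    def __init__(self, pred: SolutionList, tp: Triple, data: Set[Triple]) -> None:
        self.pred, self.tp, self.data = pred, tp, data
        self.buffer: List[Solution] = []

    def open(self) -> None:
        self.pred.open()
        self.buffer = []

    def get_next(self) -> Optional[Solution]:
        while not self.buffer:
            mu_cur = self.pred.get_next()
            if mu_cur is None:
                return None
            tp_cur = tuple(mu_cur.get(x, x) for x in self.tp)   # 1. substitute
            for t in sorted(self.data):                         # 2. find matching triples
                mu_prime = match(tp_cur, t)                     # 3. create mu'
                if mu_prime is not None:
                    self.buffer.append({**mu_cur, **mu_prime})  # 4. mu_cur ∪ mu'
        return self.buffer.pop(0)

    def close(self) -> None:
        self.pred.close()


DATA: Set[Triple] = {
    ("http://example.org/geo/IT", "ex:stats", "http://example.org/db/it"),
    ("http://example.org/geo/IT", "ex:stats", "http://example.org/ex/it"),
    ("http://example.org/geo/US", "ex:stats", "http://example.org/db/us"),
}
mu_cur = {"?p": "http://example.org/ex/p1", "?loc": "http://example.org/geo/IT"}
it = TriplePatternIterator(SolutionList([mu_cur]), ("?loc", "ex:stats", "?s"), DATA)
it.open()
results: List[Solution] = []
while (r := it.get_next()) is not None:
    results.append(r)
it.close()
```

Because one μ_cur can match several triples, the iterator buffers the extended solutions and hands them out one per GETNEXT call.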
43. Iterator based Query Execution
Example: tp_i = ( ?loc ex:stats ?s ), μ_cur = { ?p → http://ex... , ?loc → http://geo... }
1. Substitute tp_cur = μ_cur[tp_i]
   tp_cur = ( http://geo... ex:stats ?s )
2. Find matching triples match(tp_cur) in queried data set
   ( http://geo... ex:stats http://db... ), ( http://geo... ex:stats http://ex... )
3. Create solution μ' for each t in match(tp_cur)
   μ' = { ?s → http://db... }
4. Return each μ_cur ∪ μ' as a result
   { ?p → http://ex... , ?loc → http://geo.db/... , ?s → http://db... }
44. Iterator based Query Execution
● Results of I_i are solutions for tp_1, …, tp_i
I_(i-1) for tp_(i-1)
I_i for tp_i
I_(i+1) for tp_(i+1)
45. Outline
Part I
Overview of Link Traversal based Query Execution
Part II
An Iterator based Implementation Approach
➢ Introduction to the Iterator Paradigm
➢ Application to Link Traversal based Query Execution
➢ URI Prefetching
➢ Extension to the Iterator Paradigm
➢ Evaluation
47. Application to Link Traversal
● The queried data set grows
● Look-up Requirement: Do not evaluate tp_cur until the queried data set contains all data that can be retrieved from all URIs in tp_cur
I_(i-1) for tp_(i-1)
I_i for tp_i
I_(i+1) for tp_(i+1)
49. Application to Link Traversal
I_i for tp_i (input: results from I_(i-1), current solution μ_cur)
1. Substitute tp_cur = μ_cur[tp_i]
2. Ensure look-up requirement for tp_cur: initiate look-ups and wait
3. Find matching triples match(tp_cur) in queried data set
4. Create solution μ' for each t in match(tp_cur)
5. Return each μ_cur ∪ μ' as a result
51. Blocked Query Execution
● Waiting for URI look-ups blocks query execution
I_(i-1) for tp_(i-1)
I_i for tp_i: initiate look-ups and wait
I_(i+1) for tp_(i+1)
52. Outline
Part I
Overview of Link Traversal based Query Execution
Part II
An Iterator based Implementation Approach
➢ Introduction to the Iterator Paradigm
➢ Application to Link Traversal based Query Execution
➢ URI Prefetching
➢ Extension to the Iterator Paradigm
➢ Evaluation
53. URI Prefetching
● Waiting for URI look-ups blocks query execution
● URI prefetching: when a URI is bound to a variable, initiate its look-up in the background
(Figure: iterator chain I_(i-1) for tp_(i-1), I_i for tp_i, I_(i+1) for tp_(i+1), annotated "Initiate look-up", "Ensure look-up is finished", and "Initiate look-ups and wait")
54. URI Prefetching
I_i for tp_i
1. Substitute tp_cur = μ_cur[tp_i]
2. Ensure look-up requirement for tp_cur
3. Find matching triples match(tp_cur) in queried data set
4. Create solution μ' for each t in match(tp_cur)
5. Initiate parallel look-up for each new URI in μ'
6. Return each μ_cur ∪ μ' as a result
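Steps 2 and 5 can be sketched with a thread pool from Python's `concurrent.futures`: step 5 starts a background look-up as soon as a new URI is bound, and step 2 only waits for look-ups that are still running. This is a minimal sketch under assumptions: the Web is the hypothetical dictionary `WEB` (invented `example.org` URIs and value), `time.sleep` stands in for HTTP latency, the worker count is arbitrary, and the iterators are Python generators (creating a generator plays the role of OPEN, `next()` of GETNEXT, and `close()` of CLOSE).

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Solution = Dict[str, str]
LATENCY = 0.05  # assumed seconds per look-up

# Hypothetical toy Web: URI -> triples returned when the URI is looked up.
WEB: Dict[str, List[Triple]] = {
    "http://mymovie.db/movie2449": [
        ("http://mymovie.db/movie2449", "filmingLocation", "http://geo.example.org/Italy")],
    "http://geo.example.org/Italy": [
        ("http://geo.example.org/Italy", "statistics", "http://stats.example.org/it")],
    "http://stats.example.org/it": [
        ("http://stats.example.org/it", "unemp_rate", "7.8")],
}


class QueriedDataSet:
    """Queried data set that grows as background look-ups complete."""

    def __init__(self, web: Dict[str, List[Triple]], workers: int = 4) -> None:
        self.web = web
        self.triples: Set[Triple] = set()
        self.lock = threading.Lock()
        self.lookups: Dict[str, Future] = {}
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def _fetch(self, uri: str) -> None:
        time.sleep(LATENCY)  # stands in for the HTTP round trip
        with self.lock:
            self.triples.update(self.web.get(uri, []))

    def prefetch(self, uri: str) -> None:
        """Initiate a background look-up unless one was already initiated."""
        if uri not in self.lookups:
            self.lookups[uri] = self.pool.submit(self._fetch, uri)

    def ensure_looked_up(self, uri: str) -> None:
        """Look-up requirement: waits only if the look-up has not finished."""
        self.prefetch(uri)
        self.lookups[uri].result()

    def snapshot(self) -> List[Triple]:
        with self.lock:
            return sorted(self.triples)


def match(tp: Triple, t: Triple) -> Optional[Solution]:
    """Bindings mu' that make pattern tp equal to triple t, or None."""
    mu: Solution = {}
    for p, v in zip(tp, t):
        if p.startswith("?"):
            if mu.setdefault(p, v) != v:
                return None
        elif p != v:
            return None
    return mu


def tp_iterator(pred: Iterator[Solution], tp: Triple,
                qds: QueriedDataSet) -> Iterator[Solution]:
    """I_i for tp_i with URI prefetching, written as a generator."""
    for mu_cur in pred:
        tp_cur = tuple(mu_cur.get(x, x) for x in tp)          # 1. substitute
        for term in tp_cur:                                   # 2. look-up requirement
            if term.startswith("http://"):
                qds.ensure_looked_up(term)
        for t in qds.snapshot():                              # 3. matching triples
            mu_prime = match(tp_cur, t)                       # 4. create mu'
            if mu_prime is None:
                continue
            for v in mu_prime.values():                       # 5. prefetch new URIs
                if v.startswith("http://"):
                    qds.prefetch(v)
            yield {**mu_cur, **mu_prime}                      # 6. mu_cur ∪ mu'


QUERY: List[Triple] = [
    ("http://mymovie.db/movie2449", "filmingLocation", "?loc"),
    ("?loc", "statistics", "?stat"),
    ("?stat", "unemp_rate", "?ur"),
]
qds = QueriedDataSet(WEB)
chain: Iterator[Solution] = iter([{}])  # start with one empty solution
for tp in QUERY:
    chain = tp_iterator(chain, tp, qds)
results = list(chain)
qds.pool.shutdown()
```

In this linear example each look-up depends on the previous one, so prefetching saves little time; it would pay off when an iterator produces many solutions whose URIs can be fetched while later iterators are still busy.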
58. URI Prefetching
● Even with URI prefetching, query execution may block ("Wait until look-up is finished")
● Possible solutions:
● Program parallelism
● Asynchronous pipeline
● Drawback: requires major rewrite of existing query engines
I_(i-1) for tp_(i-1)
I_i for tp_i
I_(i+1) for tp_(i+1)
59. Outline
Part I
Overview of Link Traversal based Query Execution
Part II
An Iterator based Implementation Approach
➢ Introduction to the Iterator Paradigm
➢ Application to Link Traversal based Query Execution
➢ URI Prefetching
➢ Extension to the Iterator Paradigm
➢ Evaluation
60. Postponing Iterator
● Enabled by an extension of the iterator paradigm:
● New function POSTPONE: take the most recently provided result back
● Adjusted GETNEXT: either return the next result or return a formerly postponed result
● POSTPONE allows an iterator to temporarily reject its input solution μ_cur
61. Postponing Iterator
I_i for tp_i
1. Substitute tp_cur = μ_cur[tp_i]
2. POSTPONE μ_cur if the look-up requirement does not hold for tp_cur
3. Find matching triples match(tp_cur) in queried data set
4. Create solution μ' for each t in match(tp_cur)
5. Initiate parallel look-up for each new URI in μ'
6. Return each μ_cur ∪ μ' as a result
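The POSTPONE extension can be sketched on top of the same thread-pool idea. Several details are assumptions, not taken from the slides: `PostponingIterator` is a hypothetical wrapper that implements the adjusted GETNEXT with the policy "fresh results first, then formerly postponed results in FIFO order"; an iterator postpones μ_cur only while its predecessor may still deliver fresh input and otherwise waits for the look-up, which keeps it from postponing the same solution forever; the `example.org` URIs and latencies are invented.

```python
import threading
import time
from collections import deque
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Deque, Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Solution = Dict[str, str]

# Hypothetical toy Web: URI -> (simulated latency in seconds, triples).
WEB: Dict[str, Tuple[float, List[Triple]]] = {
    "http://example.org/list": (0.01, [
        ("http://example.org/list", "item", "http://example.org/a"),
        ("http://example.org/list", "item", "http://example.org/b")]),
    "http://example.org/a": (0.2, [("http://example.org/a", "title", "A")]),
    "http://example.org/b": (0.01, [("http://example.org/b", "title", "B")]),
}

class QueriedDataSet:
    """Queried data set that grows as background look-ups complete."""
    def __init__(self) -> None:
        self.triples: Set[Triple] = set()
        self.lock = threading.Lock()
        self.lookups: Dict[str, Future] = {}
        self.pool = ThreadPoolExecutor(max_workers=4)
    def _fetch(self, uri: str) -> None:
        delay, data = WEB.get(uri, (0.0, []))
        time.sleep(delay)  # simulated HTTP latency
        with self.lock:
            self.triples.update(data)
    def prefetch(self, uri: str) -> None:
        if uri not in self.lookups:
            self.lookups[uri] = self.pool.submit(self._fetch, uri)
    def snapshot(self) -> List[Triple]:
        with self.lock:
            return sorted(self.triples)

class PostponingIterator:
    """Wraps an iterator and adds POSTPONE plus an adjusted GETNEXT."""
    def __init__(self, source: Iterator[Solution]) -> None:
        self.source = source
        self.postponed: Deque[Solution] = deque()
        self.last: Optional[Solution] = None
        self.exhausted = False
    def get_next(self) -> Optional[Solution]:
        # Assumed policy: fresh results first, then formerly postponed ones (FIFO).
        if not self.exhausted:
            nxt = next(self.source, None)
            if nxt is not None:
                self.last = nxt
                return nxt
            self.exhausted = True
        self.last = self.postponed.popleft() if self.postponed else None
        return self.last
    def postpone(self) -> None:
        # POSTPONE: take the most recently provided result back.
        self.postponed.append(self.last)

def match(tp: Triple, t: Triple) -> Optional[Solution]:
    mu: Solution = {}
    for p, v in zip(tp, t):
        if p.startswith("?"):
            if mu.setdefault(p, v) != v:
                return None
        elif p != v:
            return None
    return mu

def tp_iterator(pred: PostponingIterator, tp: Triple, qds: QueriedDataSet) -> Iterator[Solution]:
    while (mu_cur := pred.get_next()) is not None:
        tp_cur = tuple(mu_cur.get(x, x) for x in tp)          # 1. substitute
        uris = [x for x in tp_cur if x.startswith("http://")]
        for u in uris:
            qds.prefetch(u)  # no-op if a look-up was already initiated
        if not all(qds.lookups[u].done() for u in uris) and not pred.exhausted:
            pred.postpone()  # 2. POSTPONE mu_cur while fresh input may still arrive
            continue
        for u in uris:
            qds.lookups[u].result()  # blocks only once no fresh input is left
        for t in qds.snapshot():                               # 3. matching triples
            mu_prime = match(tp_cur, t)                        # 4. create mu'
            if mu_prime is None:
                continue
            for v in mu_prime.values():                        # 5. prefetch new URIs
                if v.startswith("http://"):
                    qds.prefetch(v)
            yield {**mu_cur, **mu_prime}                       # 6. mu_cur ∪ mu'

QUERY: List[Triple] = [("http://example.org/list", "item", "?m"), ("?m", "title", "?title")]
qds = QueriedDataSet()
chain = PostponingIterator(iter([{}]))
for tp in QUERY:
    chain = PostponingIterator(tp_iterator(chain, tp, qds))
results: List[Solution] = []
while (r := chain.get_next()) is not None:
    results.append(r)
qds.pool.shutdown()
```

With these latencies, I_2 typically postpones both bindings of ?m instead of blocking on the slow look-up of `http://example.org/a`, which lets I_1 run to completion first. The order of `results` may therefore vary between runs.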
62. Outline
Part I
Overview of Link Traversal based Query Execution
Part II
An Iterator based Implementation Approach
➢ Introduction to the Iterator Paradigm
➢ Application to Link Traversal based Query Execution
➢ URI Prefetching
➢ Extension to the Iterator Paradigm
➢ Evaluation
63. Evaluation
● Implementation: Semantic Web Client Library (SWClLib)
http://www4.wiwiss.fu-berlin.de/bizer/ng4j/semwebclient/
● Berlin SPARQL Benchmark (BSBM)
● Simulates e-commerce scenario
● Mix of 12 SPARQL queries
● Generates datasets of different sizes (scaling factor)
● Simulation of the Web of Linked Data
● Linked Data server publishes BSBM datasets
● Experiment
● Adjusted BSBM queries link to the simulation server
● Execute query mix with SWClLib
64. Evaluation
(Figure: avg. execution time per query mix in seconds (0 to 250) over the BSBM scaling factor (10 to 60) for four configurations: w/o prefetching; w/ prefetching; non-blocking + prefetching; all data retrieved in advance)
scaling factor   # of triples   # of entities
10               4,971          613
20               8,485          928
30               11,999         1,245
40               16,918         1,845
50               22,616         2,599
60               26,108         2,914
65. Take-away Summary
● Novel query execution approach for the Web of Data:
● Utilizes the characteristics of the Web
● Traverses RDF links during query execution
● Discovery of new data sources
● No need to know all data sources in advance
● Implementation approach:
● Iterator based execution with URI Prefetching
● Extension of the iterator paradigm (POSTPONE)
● New research challenges:
● Improving result completeness
● Investigating suitable caching strategies
66. Try it!
● SQUIN http://squin.org
● Provides SWClLib functionality as a Web service
● Accessible like a SPARQL endpoint
● Public SQUIN service at
http://squin.informatik.hu-berlin.de/SQUIN/
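Since SQUIN is described as accessible like a SPARQL endpoint, a request can be prepared the way one would for any SPARQL protocol service. The following sketch assumes that the service accepts the query text in a `query` parameter of an HTTP GET request at the URL shown above; the exact path and supported result formats are assumptions, the service may no longer be online, and the queried resource `http://example.org/people/alice` is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

# Public service URL from the slide; it may no longer be reachable.
SQUIN_ENDPOINT = "http://squin.informatik.hu-berlin.de/SQUIN/"

# Hypothetical query; the resource URI is invented for illustration.
QUERY = (
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
    "SELECT ?name WHERE { <http://example.org/people/alice> foaf:name ?name }"
)


def build_request(endpoint: str, query: str) -> Request:
    """Prepare an HTTP GET carrying the query in the `query` parameter
    (SPARQL protocol style); that SQUIN accepts it at this path is assumed."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+xml"})


req = build_request(SQUIN_ENDPOINT, QUERY)
# Check that the query text survives URL encoding unchanged.
sent_query = parse_qs(urlparse(req.full_url).query)["query"][0]
# Sending it would be urllib.request.urlopen(req); this sketch does not do so.
```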
67. These slides have been created by
Olaf Hartig
http://olafhartig.de
This work is licensed under a
Creative Commons Attribution-Share Alike 3.0 License
(http://creativecommons.org/licenses/by-sa/3.0/)