(An Overview on) Linked Data Management and SPARQL Querying (ISSLOD2011)
1. An Overview on
Linked Data Management
and SPARQL Querying
Olaf Hartig
http://olafhartig.de/foaf.rdf#olaf
@olafhartig
Database and Information Systems Research Group
Humboldt-Universität zu Berlin
2. Outline
Part 1 Introduction to SPARQL
Part 2 Storage Solutions
for RDF Data
Part 3 Querying the
Web of Linked Data
Olaf Hartig - Linked Data Management and SPARQL Querying 2
3. Outline
Part 1 Introduction to SPARQL
Part 2 Storage do express
How Solutions
declarative queries
for RDF Data
over RDF data?
(A user perspective)
Part 3 Querying the
Web of Linked Data
Olaf Hartig - Linked Data Management and SPARQL Querying 3
4. SPARQL in General
● A family of W3C recommendations
● SPARQL Query
● Declarative query language for RDF data
● Our focus today
● SPARQL Update
● Declarative update language for RDF data
● SPARQL Protocol
● Communication between SPARQL processing
services (a.k.a. SPARQL endpoints) and clients
● SPARQL Query Results XML Format
● XML format for serializing query results
Olaf Hartig - Linked Data Management and SPARQL Querying 4
5. Main Idea of SPARQL Queries
● Pattern matching:
● Describe subgraphs of the queried RDF graph
● Subgraphs that match your description contribute an answer
● Mean: graph patterns (i.e. RDF graphs with variables)
?v rdf:type
umbel-sc:Volcano
Olaf Hartig - Linked Data Management and SPARQL Querying 5
6. Main Idea of SPARQL Queries
Queried RDF graph:
rdf:type
dbpedia:Mount_Baker umbel-sc:Volcano
p:lastEruption rdf:type
"1880" dbpedia:Mount_Etna
?v rdf:type
umbel-sc:Volcano
Result:
?v
dbpedia:Mount_Baker
dbpedia:Mount_Etna
Olaf Hartig - Linked Data Management and SPARQL Querying 6
7. Components of SPARQL Queries
PREFIX
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX
PREFIX umbel-sc: <http://umbel.org/umbel/sc/>
umbel-sc: <http://umbel.org/umbel/sc/>
SELECT ?v
SELECT ?v
FROM <http://example.org/myGeoData>
FROM <http://example.org/myGeoData>
WHERE {
WHERE {
?v rdf:type umbel-sc:Volcano .
?v rdf:type umbel-sc:Volcano .
}
}
ORDER BY ?name
ORDER BY ?name
● Prologue:
● Prefix definitions for using compact URIs (CURIEs)
● Attention (difference to Turtle): No period (“.”) character as
separator
Olaf Hartig - Linked Data Management and SPARQL Querying 7
8. Components of SPARQL Queries
PREFIX
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX
PREFIX umbel-sc: <http://umbel.org/umbel/sc/>
umbel-sc: <http://umbel.org/umbel/sc/>
SELECT ?v
SELECT ?v
FROM <http://example.org/myGeoData>
FROM <http://example.org/myGeoData>
WHERE {
WHERE {
?v rdf:type umbel-sc:Volcano .
?v rdf:type umbel-sc:Volcano .
}
}
ORDER BY ?name
ORDER BY ?name
● Result form specification:
● SELECT, DESCRIBE, CONSTRUCT, or ASK
(more about that later)
Olaf Hartig - Linked Data Management and SPARQL Querying 8
9. Components of SPARQL Queries
PREFIX
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX
PREFIX umbel-sc: <http://umbel.org/umbel/sc/>
umbel-sc: <http://umbel.org/umbel/sc/>
SELECT ?v
SELECT ?v
FROM <http://example.org/myGeoData>
FROM <http://example.org/myGeoData>
WHERE {
WHERE {
?v rdf:type umbel-sc:Volcano .
?v rdf:type umbel-sc:Volcano .
}
}
ORDER BY ?name
ORDER BY ?name
● Dataset specification:
● Specify the RDF dataset to be queried
(more about that later)
Olaf Hartig - Linked Data Management and SPARQL Querying 9
10. Components of SPARQL Queries
PREFIX
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX
PREFIX umbel-sc: <http://umbel.org/umbel/sc/>
umbel-sc: <http://umbel.org/umbel/sc/>
SELECT ?v
SELECT ?v
FROM <http://example.org/myGeoData>
FROM <http://example.org/myGeoData>
WHERE {
WHERE {
?v rdf:type umbel-sc:Volcano .
?v rdf:type umbel-sc:Volcano .
}
}
ORDER BY ?name
ORDER BY ?name
● Query Pattern:
● WHERE clause specifies the graph pattern to be matched
Olaf Hartig - Linked Data Management and SPARQL Querying 10
11. Graph Patterns
● Different types of graph patterns for the query pattern
(WHERE clause):
● Basic graph pattern (BGP)
● Group graph pattern
● Optional graph pattern
● Union graph pattern
● Graph graph pattern
● (Constraints)
Olaf Hartig - Linked Data Management and SPARQL Querying 11
12. Basic Graph Patterns
● Set of triple patterns (i.e. RDF triples with variables)
● Variable names prefixed with “?” or “$” (e.g. ?v, $v)
● Turtle syntax
● Including syntactical sugar (e.g. property and object lists)
PREFIX
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX
PREFIX umbel-sc: <http://umbel.org/umbel/sc/>
umbel-sc: <http://umbel.org/umbel/sc/>
SELECT
SELECT ?name
?name
WHERE {
WHERE {
?v rdf:type umbel-sc:Volcano .
?v rdf:type umbel-sc:Volcano .
?v rdfs:label ?name .
?v rdfs:label ?name .
}
}
Olaf Hartig - Linked Data Management and SPARQL Querying 12
13. Basic Graph Patterns
● Set of triple patterns (i.e. RDF triples with variables)
● Variable names prefixed with “?” or “$” (e.g. ?v, $v)
● Turtle syntax
● Including syntactical sugar (e.g. property and object lists)
PREFIX
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX
PREFIX umbel-sc: <http://umbel.org/umbel/sc/>
umbel-sc: <http://umbel.org/umbel/sc/>
SELECT
SELECT ?name
?name
WHERE {
WHERE {
?v rdf:type umbel-sc:Volcano ;
?v rdf:type umbel-sc:Volcano ;
rdfs:label ?name .
rdfs:label ?name .
}
}
Olaf Hartig - Linked Data Management and SPARQL Querying 13
14. Basic Graph Patterns (Example)
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano ;
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano ; Data*
rdfs:label "Etna" .
rdfs:label "Etna" .
dbpedia:Mount_Baker rdf:type umbel-sc:Volcano.
dbpedia:Mount_Baker rdf:type umbel-sc:Volcano.
dbpedia:Beerenberg rdf:type umbel-sc:Volcano,
dbpedia:Beerenberg rdf:type umbel-sc:Volcano,
umbel-sc:NaturalElevation ;
umbel-sc:NaturalElevation ;
rdfs:label "Beerenberg"@en ;
rdfs:label "Beerenberg"@en ;
rdfs:label "Бееренберг"@ru .
rdfs:label "Бееренберг"@ru .
● Question: What are the names of all (known) volcanos?
SELECT ?name WHERE {
SELECT ?name WHERE { Query*
?v rdf:type umbel-sc:Volcano ;
?v rdf:type umbel-sc:Volcano ;
rdfs:label ?name . }
rdfs:label ?name . }
*Prefix definitions omitted
Olaf Hartig - Linked Data Management and SPARQL Querying 14
15. Basic Graph Patterns (Example)
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano ;
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano ; Data*
rdfs:label "Etna" .
rdfs:label "Etna" .
dbpedia:Mount_Baker rdf:type umbel-sc:Volcano.
dbpedia:Mount_Baker rdf:type umbel-sc:Volcano.
dbpedia:Beerenberg rdf:type umbel-sc:Volcano,
dbpedia:Beerenberg rdf:type umbel-sc:Volcano,
umbel-sc:NaturalElevation ;
umbel-sc:NaturalElevation ;
rdfs:label "Beerenberg"@en ;
rdfs:label "Beerenberg"@en ;
rdfs:label "Бееренберг"@ru .
rdfs:label "Бееренберг"@ru .
●Question: What are the names of all (known) volcanos?
Result:
SELECT ?name WHERE {
SELECT ?name WHERE { Query*
?v rdf:type umbel-sc:Volcano ;
?name
?v rdf:type umbel-sc:Volcano ;
rdfs:label ?name . }
rdfs:label ?name . }
"Etna"
"Бееренберг"@ru
*Prefix definitions omitted "Beerenberg"@en
Olaf Hartig - Linked Data Management and SPARQL Querying 15
16. Basic Graph Patterns (Example)
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano ;
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano ; Data
rdfs:label "Etna" .
rdfs:label "Etna" .
dbpedia:Mount_Baker rdf:type umbel-sc:Volcano.
dbpedia:Mount_Baker rdf:type umbel-sc:Volcano.
dbpedia:Beerenberg rdf:type umbel-sc:Volcano,
dbpedia:Beerenberg rdf:type umbel-sc:Volcano,
umbel-sc:NaturalElevation ;
umbel-sc:NaturalElevation ;
rdfs:label "Beerenberg"@en ;
rdfs:label "Beerenberg"@en ;
rdfs:label "Бееренберг"@ru .
rdfs:label "Бееренберг"@ru .
● Question: List all types of the volcano called “Beerenberg”
SELECT ?type WHERE {
SELECT ?type WHERE { Query
?type
?v rdf:type ?type ;
?v rdf:type ?type ;
rdfs:label "Beerenberg" .
rdfs:label "Beerenberg" .
}
}
Empty!
Olaf Hartig - Linked Data Management and SPARQL Querying 16
17. Basic Graph Patterns (Example)
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano ;
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano ; Data
rdfs:label "Etna" .
rdfs:label "Etna" .
dbpedia:Mount_Baker rdf:type umbel-sc:Volcano.
dbpedia:Mount_Baker rdf:type umbel-sc:Volcano.
dbpedia:Beerenberg rdf:type umbel-sc:Volcano,
dbpedia:Beerenberg rdf:type umbel-sc:Volcano,
umbel-sc:NaturalElevation ;
umbel-sc:NaturalElevation ;
rdfs:label "Beerenberg"@en ;
rdfs:label "Beerenberg"@en ;
rdfs:label "Бееренберг"@ru .
rdfs:label "Бееренберг"@ru .
● Question: List all types of the volcano called “Beerenberg”
SELECT ?type WHERE {
SELECT ?type WHERE { Query
?v rdf:type ?type ;
?v rdf:type ?type ; ?type
rdfs:label "Beerenberg"@en .
rdfs:label "Beerenberg"@en . umbel-sc:Volcano
}
} umbel-sc:NaturalElevation
Olaf Hartig - Linked Data Management and SPARQL Querying 17
18. Basic Graph Patterns (Example)
dbpedia:Mount_Baker rdf:type umbel-sc:Volcano ;
dbpedia:Mount_Baker rdf:type umbel-sc:Volcano ; Data
p:location dbpedia:United_States .
p:location dbpedia:United_States .
dbpedia:United_States rdfs:label "United States" .
dbpedia:United_States rdfs:label "United States" .
● Question: Where are all (known) volcanos located?
(List the names of these locations)
● Blank nodes in SPARQL queries
● As subject or object of a triple pattern
● “Non-selectable” variables
SELECT ?name WHERE {
SELECT ?name WHERE { Query
?name
_:x rdf:type umbel-sc:Volcano ;
_:x rdf:type umbel-sc:Volcano ;
"United States"
p:location [ rdfs:label ?name ] . }
p:location [ rdfs:label ?name ] . }
Olaf Hartig - Linked Data Management and SPARQL Querying 18
20. Optional Graph Patterns
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano ;
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano ; Data
rdfs:label "Etna" .
rdfs:label "Etna" .
dbpedia:Mount_Baker rdf:type umbel-sc:Volcano .
dbpedia:Mount_Baker rdf:type umbel-sc:Volcano .
dbpedia:Beerenberg rdf:type umbel-sc:Volcano ;
dbpedia:Beerenberg rdf:type umbel-sc:Volcano ;
rdfs:label "Beerenberg"@en .
rdfs:label "Beerenberg"@en .
● Question: What are all (known) volcanos and their names?
SELECT ?v ?name WHERE {
SELECT ?v ?name WHERE { Query
?v rdf:type umbel-sc:Volcano ;
?v rdf:type umbel-sc:Volcano ;
rdfs:label ?name . }
rdfs:label ?name . }
?v ?name
● Problem: Mount
dbpedia:Mount_Etna "Etna"
Baker missing
dbpedia:Beerenberg "Beerenberg"@en
(it has no name)
Olaf Hartig - Linked Data Management and SPARQL Querying 20
21. Optional Graph Patterns
● Keyword OPTIONAL allows for optional patterns
SELECT ?v ?name WHERE {
SELECT ?v ?name WHERE { Query
?v rdf:type umbel-sc:Volcano .
?v rdf:type umbel-sc:Volcano .
OPTIONAL { ?v rdfs:label ?name }
OPTIONAL { ?v rdfs:label ?name }
}
}
?v ?name
dbpedia:Mount_Etna "Etna"
dbpedia:Mount_Baker
dbpedia:Beerenberg "Beerenberg"@en
● Optional patterns may result in unbound variables
Olaf Hartig - Linked Data Management and SPARQL Querying 21
22. Union Graph Patterns
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano ;
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano ; Data
rdfs:label "Etna" ;
rdfs:label "Etna" ;
p:location dbpedia:Italy .
p:location dbpedia:Italy .
dbpedia:Mount_Baker rdf:type umbel-sc:Volcano ;
dbpedia:Mount_Baker rdf:type umbel-sc:Volcano ;
p:location dbpedia:United_States .
p:location dbpedia:United_States .
dbpedia:Beerenberg rdf:type umbel-sc:Volcano ;
dbpedia:Beerenberg rdf:type umbel-sc:Volcano ;
rdfs:label
rdfs:label "Beerenberg"@en ;
"Beerenberg"@en ;
p:location
p:location dbpedia:Norway .
dbpedia:Norway .
● Question: What volcanos are located in Italy or in Norway?
SELECT ?v WHERE {
SELECT ?v WHERE { Query
?v rdf:type umbel-sc:Volcano ;
?v rdf:type umbel-sc:Volcano ;
p:location
p:location ?
? . }
. }
Olaf Hartig - Linked Data Management and SPARQL Querying 22
23. Union Graph Patterns
● Union graph patterns allow for alternatives
SELECT ?v WHERE { ?v rdf:type umbel-sc:Volcano .
SELECT ?v WHERE { ?v rdf:type umbel-sc:Volcano .
{ ?v p:location dbpedia:Italy }
{ ?v p:location dbpedia:Italy }
UNION
UNION
{ ?v p:location dbpedia:Norway }
{ ?v p:location dbpedia:Norway }
}
} Query
Olaf Hartig - Linked Data Management and SPARQL Querying 23
24. Union Graph Patterns
● Union graph patterns allow for alternatives
SELECT ?v WHERE { ?v rdf:type umbel-sc:Volcano .
SELECT ?v WHERE { ?v rdf:type umbel-sc:Volcano .
{ ?v p:location dbpedia:Italy }
{ ?v p:location dbpedia:Italy }
UNION
UNION
{ ?v p:location dbpedia:Norway }
{ ?v p:location dbpedia:Norway }
}
} Query
SELECT
SELECT ?v WHERE
?v WHERE {
{ Query
{ ?v
{ ?v rdf:type
rdf:type umbel-sc:Volcano ;
umbel-sc:Volcano ;
p:location dbpedia:Italy }
p:location dbpedia:Italy }
Semantically UNION
UNION
equivalent { ?v rdf:type umbel-sc:Volcano
{ ?v rdf:type umbel-sc:Volcano ;
;
p:location dbpedia:Norway
p:location dbpedia:Norway }
}
}
}
Olaf Hartig - Linked Data Management and SPARQL Querying 24
25. Group Graph Patterns
SELECT ?v WHERE { ?v rdf:type umbel-sc:Volcano .
SELECT ?v WHERE { ?v rdf:type umbel-sc:Volcano .
{ ?v p:location dbpedia:Italy }
{ ?v p:location dbpedia:Italy }
UNION
UNION
{ ?v p:location dbpedia:Norway }
{ ?v p:location dbpedia:Norway }
}
} Query
Semantically equivalent
SELECT ?v WHERE { { ?v rdf:type umbel-sc:Volcano }
SELECT ?v WHERE { { ?v rdf:type umbel-sc:Volcano }
{ { ?v p:location dbpedia:Italy }
{ { ?v p:location dbpedia:Italy }
UNION
UNION
{ ?v p:location dbpedia:Norway } }
{ ?v p:location dbpedia:Norway } }
}
} Query
Olaf Hartig - Linked Data Management and SPARQL Querying 25
26. Constraints on Solutions
● Syntax: Keyword FILTER followed by filter expression
PREFIX
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX
PREFIX umbel-sc: <http://umbel.org/umbel/sc/>
umbel-sc: <http://umbel.org/umbel/sc/>
PREFIX p: <http://dbpedia.org/property/>
PREFIX p: <http://dbpedia.org/property/>
SELECT ?v
SELECT ?v
WHERE {
WHERE {
?v rdf:type umbel-sc:Volcano ;
?v rdf:type umbel-sc:Volcano ;
p:lastEruption ?le .
p:lastEruption ?le .
FILTER ( ?le > 1900 )
FILTER ( ?le > 1900 )
}
}
● Filter expressions contain operators and functions
● Operators and functions operate on RDF terms
Olaf Hartig - Linked Data Management and SPARQL Querying 26
27. Constraints (Truth Table)
● Filter expressions evaluate to true, false, or error
● Truth table:
A B A || B A && B
T T T T
T F T F
F T T F
F F F F
T E T E
E T T E
F E E F
E F E F
E E E E
Olaf Hartig - Linked Data Management and SPARQL Querying 27
28. Unary Operators in Constraints
Operator Type(A) Result type
!A xsd:boolean xsd:boolean
+A numeric numeric
-A numeric numeric
BOUND(A) variable xsd:boolean
isURI(A) RDF term xsd:boolean
isBLANK(A) RDF term xsd:boolean
isLITERAL(A) RDF term xsd:boolean
STR(A) literal / URI simple literal
LANG(A) literal simple literal
DATATYPE(A) literal simple literal
Olaf Hartig - Linked Data Management and SPARQL Querying 28
29. Constraints (Example)
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano ;
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano ; Data
rdfs:label "Etna" .
rdfs:label "Etna" .
dbpedia:Beerenberg rdf:type umbel-sc:Volcano,
dbpedia:Beerenberg rdf:type umbel-sc:Volcano,
umbel-sc:NaturalElevation ;
umbel-sc:NaturalElevation ;
rdfs:label "Beerenberg"@en ;
rdfs:label "Beerenberg"@en ;
rdfs:label "Бееренберг"@ru .
rdfs:label "Бееренберг"@ru .
●Question: List all types of the volcano called “Beerenberg”
SELECT ?type WHERE {
SELECT ?type WHERE { Query ?type
?v rdf:type ?type ;
?v rdf:type ?type ; umbel-sc:Volcano
rdfs:label ?name .
rdfs:label ?name .
umbel-sc:NaturalElevation
FILTER ( STR(?name) = "Beerenberg" )
FILTER ( STR(?name) = "Beerenberg" )
}
}
Olaf Hartig - Linked Data Management and SPARQL Querying 29
30. Constraints (Further Operators)
● Binary operators:
● Logical connectives && and || for xsd:boolean
● Comparison operators =, !=, <, >, <=, and >= for numeric
datatypes, xsd:dateTime, xsd:string, and xsd:boolean
● Comparison operators = and != for other datatypes
● Arithmetic operators +, -, *, and / for numeric datatypes
● Furthermore:
● REGEX(String,Pattern) or REGEX(String,Pattern,Flags)
● sameTERM(A,B)
● langMATCHES(A,B)
Olaf Hartig - Linked Data Management and SPARQL Querying 30
31. Constraints (Example)
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano ;
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano ; Data
rdfs:label "Etna" .
rdfs:label "Etna" .
dbpedia:Beerenberg rdf:type umbel-sc:Volcano,
dbpedia:Beerenberg rdf:type umbel-sc:Volcano,
umbel-sc:NaturalElevation ;
umbel-sc:NaturalElevation ;
rdfs:label "Beerenberg"@en ;
rdfs:label "Beerenberg"@en ;
rdfs:label "Бееренберг"@ru .
rdfs:label "Бееренберг"@ru .
● Question: What volcanos have an “e” in their name?
SELECT ?v WHERE {
SELECT ?v WHERE { Query
?v
?v rdf:type umbel-sc:Volcano ;
?v rdf:type umbel-sc:Volcano ;
rdfs:label ?name . dbpedia:Beerenberg
rdfs:label ?name .
dbpedia:Beerenberg
FILTER( REGEX(STR(?name),"e") )
FILTER( REGEX(STR(?name),"e") )
}
}
Olaf Hartig - Linked Data Management and SPARQL Querying 31
32. Constraints (Example)
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano ;
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano ; Data
rdfs:label "Etna" .
rdfs:label "Etna" .
dbpedia:Beerenberg rdf:type umbel-sc:Volcano,
dbpedia:Beerenberg rdf:type umbel-sc:Volcano,
umbel-sc:NaturalElevation ;
umbel-sc:NaturalElevation ;
rdfs:label "Beerenberg"@en ;
rdfs:label "Beerenberg"@en ;
rdfs:label "Бееренберг"@ru .
rdfs:label "Бееренберг"@ru .
● Question: What volcanos have an “e” in their name?
SELECT ?v WHERE {
SELECT ?v WHERE { Query
?v
?v rdf:type umbel-sc:Volcano ;
?v rdf:type umbel-sc:Volcano ;
rdfs:label ?name . dbpedia:Mount_Etna
rdfs:label ?name .
dbpedia:Beerenberg
FILTER( REGEX(STR(?name),"e","i") )
FILTER( REGEX(STR(?name),"e","i") ) dbpedia:Beerenberg
}
}
Olaf Hartig - Linked Data Management and SPARQL Querying 32
33. Graph Graph Patterns
dbpedia:Mount_Etna rdfs:seeAlso <http://example.org/d1>.
dbpedia:Mount_Etna rdfs:seeAlso <http://example.org/d1>.
dbpedia:Mount_Baker rdfs:seeAlso <http://example.org/d2>.
dbpedia:Mount_Baker rdfs:seeAlso <http://example.org/d2>.
dbpedia:Mount_Etna
dbpedia:Mount_Etna http://example.org/d1
rdf:type umbel-sc:Volcano ;
rdf:type umbel-sc:Volcano ;
Default
rdfs:label "Etna" .
rdfs:label "Etna" .
Graph
dbpedia:Mount_Baker http://example.org/d2
dbpedia:Mount_Baker
rdf:type umbel-sc:Volcano .
rdf:type umbel-sc:Volcano .
dbpedia:Beerenberg http://example.org/d3
dbpedia:Beerenberg
● SPARQL queries rdf:type umbel-sc:Volcano ;
rdf:type umbel-sc:Volcano ;
are executed over rdfs:label "Beerenberg"@en .
rdfs:label "Beerenberg"@en .
an RDF dataset:
● One default graph and
● Zero or more named graphs (identified by an URI)
Olaf Hartig - Linked Data Management and SPARQL Querying 33
34. Components of SPARQL Queries
PREFIX
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX
PREFIX umbel-sc: <http://umbel.org/umbel/sc/>
umbel-sc: <http://umbel.org/umbel/sc/>
SELECT ?v
SELECT ?v
FROM <http://example.org/myGeoData>
FROM <http://example.org/myGeoData>
WHERE {
WHERE {
?v rdf:type umbel-sc:Volcano .
?v rdf:type umbel-sc:Volcano .
}
}
ORDER BY ?name
ORDER BY ?name
● Dataset specification:
● Specify the RDF dataset to be queried
(more about that later) here ...
● Specification using FROM and FROM NAMED
Olaf Hartig - Linked Data Management and SPARQL Querying 34
35. Graph Graph Patterns
dbpedia:Mount_Etna rdfs:seeAlso <http://example.org/d1>.
dbpedia:Mount_Etna rdfs:seeAlso <http://example.org/d1>.
dbpedia:Mount_Baker rdfs:seeAlso <http://example.org/d2>.
dbpedia:Mount_Baker rdfs:seeAlso <http://example.org/d2>.
dbpedia:Mount_Etna
dbpedia:Mount_Etna http://example.org/d1
rdf:type umbel-sc:Volcano ;
rdf:type umbel-sc:Volcano ;
Default
rdfs:label "Etna" .
rdfs:label "Etna" .
Graph
dbpedia:Mount_Baker http://example.org/d2
dbpedia:Mount_Baker
rdf:type umbel-sc:Volcano .
rdf:type umbel-sc:Volcano .
dbpedia:Beerenberg
dbpedia:Beerenberg http://example.org/d3
rdf:type umbel-sc:Volcano ;
rdf:type umbel-sc:Volcano ;
rdfs:label "Beerenberg"@en .
rdfs:label "Beerenberg"@en .
● Evaluation of patterns w.r.t. active graph
● GRAPH clause for making a named graph the active graph
Olaf Hartig - Linked Data Management and SPARQL Querying 35
43. Graph Graph Patterns
dbpedia:Mount_Etna rdfs:seeAlso <http://example.org/d1>.
dbpedia:Mount_Etna rdfs:seeAlso <http://example.org/d1>.
dbpedia:Mount_Baker rdfs:seeAlso <http://example.org/d2>.
dbpedia:Mount_Baker rdfs:seeAlso <http://example.org/d2>.
dbpedia:Mount_Etna
dbpedia:Mount_Etna http://example.org/d1
rdf:type umbel-sc:Volcano ;
rdf:type umbel-sc:Volcano ;
Default
rdfs:label "Etna" .
rdfs:label "Etna" .
Graph
dbpedia:Mount_Baker http://example.org/d2
dbpedia:Mount_Baker
rdf:type umbel-sc:Volcano .
rdf:type umbel-sc:Volcano .
dbpedia:Beerenberg
dbpedia:Beerenberg http://example.org/d3
rdf:type umbel-sc:Volcano ;
rdf:type umbel-sc:Volcano ;
rdfs:label "Beerenberg"@en .
rdfs:label "Beerenberg"@en .
● Question: Which named graphs contain the name of a
volcano that is not referenced in the default graph?
Olaf Hartig - Linked Data Management and SPARQL Querying 43
44. Graph Graph Patterns
dbpedia:Mount_Etna rdfs:seeAlso <http://example.org/d1>.
dbpedia:Mount_Etna rdfs:seeAlso <http://example.org/d1>.
dbpedia:Mount_Baker rdfs:seeAlso <http://example.org/d2>.
dbpedia:Mount_Baker rdfs:seeAlso <http://example.org/d2>.
dbpedia:Mount_Etna
dbpedia:Mount_Etna http://example.org/d1
rdf:type umbel-sc:Volcano ;
rdf:type umbel-sc:Volcano ;
Default
rdfs:label "Etna" .
rdfs:label "Etna" .
Graph
SELECT ?g WHERE
SELECT ?g WHERE {
dbpedia:Mount_Baker http://example.org/d2
{dbpedia:Mount_Baker
GRAPH ?g {
GRAPH ?g { rdf:type umbel-sc:Volcano .
rdf:type umbel-sc:Volcano .
?v rdf:type umbel-sc:Volcano ;
?v rdf:type umbel-sc:Volcano ;
dbpedia:Beerenberg
dbpedia:Beerenberg http://example.org/d3
rdfs:label ?name .
rdfs:label ?name .
rdf:type umbel-sc:Volcano ;
rdf:type umbel-sc:Volcano ;
}}
rdfs:label "Beerenberg"@en .
OPTIONAL { ?v rdfs:seeAlso ?r } "Beerenberg"@en .
rdfs:label
OPTIONAL { ?v rdfs:seeAlso ?r }
● Question: ! BOUND(?r) ) graphs contain the name of a
FILTER ( Which named
FILTER ( ! BOUND(?r) )
}
} volcano that is not referenced in the default graph?
Olaf Hartig - Linked Data Management and SPARQL Querying 44
45. Summary – Graph Patterns
● Different types of graph patterns for the query pattern
(WHERE clause):
● Basic graph pattern (BGP)
● Group graph pattern
● Optional graph pattern – keyword OPTIONAL
● Union graph pattern – keyword UNION
● Graph graph pattern – keyword GRAPH
● Constraints – keyword FILTER
Olaf Hartig - Linked Data Management and SPARQL Querying 45
46. Components of SPARQL Queries
PREFIX
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX
PREFIX umbel-sc: <http://umbel.org/umbel/sc/>
umbel-sc: <http://umbel.org/umbel/sc/>
SELECT ?v
SELECT ?v
FROM <http://example.org/myGeoData>
FROM <http://example.org/myGeoData>
WHERE {
WHERE {
?v rdf:type umbel-sc:Volcano .
?v rdf:type umbel-sc:Volcano .
}
}
ORDER BY ?name
ORDER BY ?name
● Result form specification:
● SELECT, DESCRIBE, CONSTRUCT, or ASK
(more about that later)
^ here ...
Olaf Hartig - Linked Data Management and SPARQL Querying 46
47. Result Forms
● SELECT
● Result: sequence of solutions (i.e. sets of variable bindings)
● Selected variables separated by space (not by comma!)
● Asterisk character (“*”) selects all variables in the pattern
● ASK
● Check whether there is at least one result
● Result: true or false
● Example: Do we have data about volcanos?
ASK WHERE {
ASK WHERE { Query
?v rdf:type umbel-sc:Volcano .
?v rdf:type umbel-sc:Volcano .
}
}
Olaf Hartig - Linked Data Management and SPARQL Querying 47
48. Result Forms
● DESCRIBE
● Result: an RDF graph with data about resources
● Non-deterministic (i.e. query processor defines the actual
structure of the resulting RDF graph)
● Example: just name the resource
DESCRIBE <http://dbpedia.org/resource/Beerenberg>
DESCRIBE <http://dbpedia.org/resource/Beerenberg> Query
● Example: Specify the resource(s) with a query pattern
DESCRIBE ?v WHERE {
DESCRIBE ?v WHERE { Query
?v rdf:type umbel-sc:Volcano ;
?v rdf:type umbel-sc:Volcano ;
rdfs:label ?name .
rdfs:label ?name . }
}
● Multiple variables possible or asterisk (“*”) for all
Olaf Hartig - Linked Data Management and SPARQL Querying 48
49. Result Forms
● CONSTRUCT
● Result: an RDF graph constructed from a template
● Template: graph pattern with variables from the query pattern
CONSTRUCT { ?v rdfs:label ?name ;
CONSTRUCT { ?v rdfs:label ?name ; Query
rdf:type myTypes:VolcanosOutsideTheUS
rdf:type myTypes:VolcanosOutsideTheUS
}
}
WHERE {
WHERE {
?v rdf:type umbel-sc:Volcano ;
?v rdf:type umbel-sc:Volcano ;
rdfs:label ?name .
rdfs:label ?name .
OPTIONAL { ?v p:location
OPTIONAL { ?v p:location ?l
?l
FILTER ( ?l =
FILTER ( ?l = dbpedia:United_States ) }
dbpedia:United_States ) }
FILTER ( ! BOUND(?l) )
FILTER ( ! BOUND(?l) )
}
}
Olaf Hartig - Linked Data Management and SPARQL Querying 49
51. Components of SPARQL Queries
PREFIX
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX
PREFIX umbel-sc: <http://umbel.org/umbel/sc/>
umbel-sc: <http://umbel.org/umbel/sc/>
SELECT ?v
SELECT ?v
FROM <http://example.org/myGeoData>
FROM <http://example.org/myGeoData>
WHERE {
WHERE {
?v rdf:type umbel-sc:Volcano .
?v rdf:type umbel-sc:Volcano .
}
}
ORDER BY ?name
ORDER BY ?name
● Solution modifiers:
● Only for SELECT queries
● Modify the result set as a whole (not single solutions)
● Keywords: DISTINCT, ORDER BY, LIMIT, and OFFSET
Olaf Hartig - Linked Data Management and SPARQL Querying 51
52. Solution Modifiers
● DISTINCT removes duplicates from the result set
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano ;
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano ; Data
rdfs:label "Etna" .
rdfs:label "Etna" .
dbpedia:Mount_Baker rdf:type umbel-sc:Volcano.
dbpedia:Mount_Baker rdf:type umbel-sc:Volcano.
dbpedia:Beerenberg rdf:type umbel-sc:Volcano,
dbpedia:Beerenberg rdf:type umbel-sc:Volcano,
umbel-sc:NaturalElevation ;
umbel-sc:NaturalElevation ;
rdfs:label "Beerenberg"@en ;
rdfs:label "Beerenberg"@en ;
rdfs:label "Бееренберг"@ru .
rdfs:label "Бееренберг"@ru .
?type
SELECT ?type
SELECT ?type Query
umbel-sc:Volcano
WHERE { _:x rdf:type ?type }
WHERE { _:x rdf:type ?type } umbel-sc:Volcano
umbel-sc:NaturalElevation
umbel-sc:Volcano
Olaf Hartig - Linked Data Management and SPARQL Querying 52
53. Solution Modifiers
● DISTINCT removes duplicates from the result set
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano ;
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano ; Data
rdfs:label "Etna" .
rdfs:label "Etna" .
dbpedia:Mount_Baker rdf:type umbel-sc:Volcano.
dbpedia:Mount_Baker rdf:type umbel-sc:Volcano.
dbpedia:Beerenberg rdf:type umbel-sc:Volcano,
dbpedia:Beerenberg rdf:type umbel-sc:Volcano,
umbel-sc:NaturalElevation ;
umbel-sc:NaturalElevation ;
rdfs:label "Beerenberg"@en ;
rdfs:label "Beerenberg"@en ;
rdfs:label "Бееренберг"@ru .
rdfs:label "Бееренберг"@ru .
?type
SELECT DISTINCT ?type
SELECT DISTINCT ?type Query
umbel-sc:Volcano
WHERE { _:x rdf:type ?type }
WHERE { _:x rdf:type ?type } umbel-sc:NaturalElevation
Olaf Hartig - Linked Data Management and SPARQL Querying 53
54. Solution Modifiers
● ORDER BY orders the results
SELECT ?v WHERE { ?v rdf:type umbel-sc:Volcano ;
SELECT ?v WHERE { ?v rdf:type umbel-sc:Volcano ; Query
rdfs:label ?name }
rdfs:label ?name }
ORDER BY ?name
ORDER BY ?name
● How do we order different kinds of elements?
unbound variable < blank node < URI < literal
● ASC for ascending (default) and DESC for descending
● Hierarchical order criteria:
SELECT ?name WHERE { ?v rdf:type umbel-sc:Volcano ; Query
SELECT ?name WHERE { ?v rdf:type umbel-sc:Volcano ;
p:lastEruption ?le ;
p:lastEruption ?le ;
rdfs:label ?name }
rdfs:label ?name }
ORDER BY DESC(?le), ?name
ORDER BY DESC(?le), ?name
Olaf Hartig - Linked Data Management and SPARQL Querying 54
55. Solution Modifiers
● LIMIT – limits the number of results
SELECT ?name WHERE { ?v rdf:type umbel-sc:Volcano ; Query
SELECT ?name WHERE { ?v rdf:type umbel-sc:Volcano ;
rdfs:label ?name }
rdfs:label ?name }
ORDER
ORDER BY ?name
BY ?name
LIMIT
LIMIT 5
5
● OFFSET – position/index of the first reported results
SELECT ?name WHERE { ?v rdf:type umbel-sc:Volcano ; Query
SELECT ?name WHERE { ?v rdf:type umbel-sc:Volcano ;
rdfs:label ?name }
rdfs:label ?name }
ORDER
ORDER BY ?name
BY ?name
LIMIT
LIMIT 5 OFFSET 10
5 OFFSET 10
● Order of result should be predictable
(i.e. combine with ORDER BY)
Olaf Hartig - Linked Data Management and SPARQL Querying 55
57. Further Reading
● W3C SPARQL Working Group
http://www.w3.org/2001/sw/DataAccess/
● J. Pérez, M. Arenas, and C. Gutierrez: Semantics and
Complexity of SPARQL. ACM Transactions on Database
Systems 34(3), September 2009
● SPARQL interface
for DBpedia:
http://dbpedia.org/snorql/
Olaf Hartig - Linked Data Management and SPARQL Querying 57
58. Outline
Part 1 Introduction to SPARQL
Part 2 Storage Solutions
for RDF Data
Part 3 Querying the
Web of Linked Data
Olaf Hartig - Linked Data Management and SPARQL Querying 58
59. Outline
Part 1 Introduction to SPARQL
Part 2 Storage Solutions
for RDF Data
How can we manage RDF data in
Part 3 Querying the an efficient and scalable way?
Web of (relational) DBMS, or
Use existing Linked Data
●
● Create a new, native DBMS for RDF
Olaf Hartig - Linked Data Management and SPARQL Querying 59
60. Outline
Part 1 Introduction to SPARQL
Part 2 Storage Solutions
for RDF Data
(RDF in a RDBMS)
Part 3 Querying the
Web of Linked Data
Olaf Hartig - Linked Data Management and SPARQL Querying 60
61. Overview on RDF in RDBMS
● Idea: Use a specific relational schema for RDF data and
benefit from 40 years of research in the DB community
● 3 steps for SPARQL query processing:
1. Convert SPARQL query to SQL query (w.r.t. the schema)
2. Use RDBMS to answer the SQL query S P O
3. Generate SPARQL query result from
the SQL query result
S P1 ... Pi ... Pn
● Relational schemas for RDF:
● Triple table
● Property tables P1 ... Pi ... Pn
● Binary tables S O S O S O
● Hybrid
Olaf Hartig - Linked Data Management and SPARQL Querying 61
62. Triple Table (Basic Idea)
● Store all S P O
RDF triples
in a single
table
Olaf Hartig - Linked Data Management and SPARQL Querying 62
63. Triple Table (Basic Idea)
● Store all S P O
RDF triples dbpedia:Mount_Etna rdf:type umbel-sc:Volcano
in a single dbpedia:Mount_Etna rdfs:label "Etna"
table dbpedia:Mount_Baker rdf:type umbel-sc:Volcano
dbpedia:Beerenberg rdf:type umbel-sc:Volcano
dbpedia:Beerenberg rdfs:label "Beerenberg"@en
● Create indexes on combinations of S, P, and O
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano ;
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano ; RDF Data
rdfs:label "Etna" .
rdfs:label "Etna" .
dbpedia:Mount_Baker rdf:type umbel-sc:Volcano .
dbpedia:Mount_Baker rdf:type umbel-sc:Volcano .
dbpedia:Beerenberg rdf:type umbel-sc:Volcano ;
dbpedia:Beerenberg rdf:type umbel-sc:Volcano ;
rdfs:label "Beerenberg"@en .
rdfs:label "Beerenberg"@en .
Olaf Hartig - Linked Data Management and SPARQL Querying 63
64. Triple Table (Querying)
● Store all S P O
RDF triples
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano
in a single
dbpedia:Mount_Etna rdfs:label "Etna"
table dbpedia:Mount_Baker rdf:type umbel-sc:Volcano
dbpedia:Beerenberg rdf:type umbel-sc:Volcano
SELECT ?name WHERE { SPARQL Query
dbpedia:Beerenberg
SELECT ?name WHERE { rdfs:label "Beerenberg"@en
?v rdf:type umbel-sc:Volcano ;
?v rdf:type umbel-sc:Volcano ;
● Create indexes on combinations of S, P, and O
rdfs:label ?name .
rdfs:label ?name .
}
} SELECT umbel-sc:Volcano ; SQL Query
SELECT t2.o AS name
dbpedia:Mount_Etna rdf:type t2.o AS name
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano ; RDF Data
FROM triples AS t1, triples AS t2
FROM triples AS t1, triples AS t2
rdfs:label "Etna" .
rdfs:label "Etna" .
WHERE t1.p = 'rdf:type' AND
dbpedia:Mount_Baker rdf:type umbel-sc:VolcanoAND
WHERE t1.p = 'rdf:type' .
dbpedia:Mount_Baker rdf:type umbel-sc:Volcano .
SPARQL t1.o = "umbel-sc:Volcano" AND
dbpedia:Beerenberg rdf:typet1.o = "umbel-sc:Volcano" AND
dbpedia:Beerenberg rdf:type umbel-sc:Volcano ;
to SQL umbel-sc:Volcano ;
t2.p = 'rdfs:label' AND
rdfs:label "Beerenberg"@en .AND
t2.p = 'rdfs:label'
rdfs:label "Beerenberg"@en .
t1.s = t2.s
t1.s = t2.s
Olaf Hartig - Linked Data Management and SPARQL Querying 64
65. ID-based Triple Table
S P O
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano
dbpedia:Mount_Etna rdfs:label "Etna"
dbpedia:Mount_Baker rdf:type umbel-sc:Volcano
dbpedia:Beerenberg rdf:type umbel-sc:Volcano
RDF Term dbpedia:Beerenberg
ID rdfs:label "Beerenberg"@en
● Use numerical identifier for
each RDF term in the dataset
● Advantages:
● Saves space
● Enhances efficiency
Olaf Hartig - Linked Data Management and SPARQL Querying 65
66. ID-based Triple Table
S P O
1 4 6
1 5 7
2 4 6
3 4 6
RDF Term ID 3 5 8
dbpedia:Mount_Etna 1
dbpedia:Mount_Baker 2 ● Use numerical identifier for
dbpedia:Beerenberg 3 each RDF term in the dataset
rdf:type 4 ● Advantages:
rdfs:label 5
● Saves space
umbel-sc:Volcano 6
"Etna" 7
● Enhances efficiency
"Beerenberg"@en 8
Olaf Hartig - Linked Data Management and SPARQL Querying 66
67. Quad Table
G S P O
100 1 4 6
100 1 5 7
101 2 4 6
102 3 4 6
RDF Term
100 ID 3 5 8
dbpedia:Mount_Etna 1
dbpedia:Mount_Baker 2 ● Storing multiple RDF graphs
dbpedia:Beerenberg 3 ● Use Cases:
rdf:type 4
rdfs:label 5
● Provenance
umbel-sc:Volcano 6 ● Versioning (multiple snapshots)
"Etna" 7 ● Contexts
"Beerenberg"@en 8
Olaf Hartig - Linked Data Management and SPARQL Querying 67
68. Pros and Cons of Triple Tables
● Pros:
● Easy to implement
● If predicates are selective, not expensive
● Cons:
● Many self joins are needed
● If predicates are not selective, very expensive
Olaf Hartig - Linked Data Management and SPARQL Querying 68
69. Property Tables (Basic Idea)
● Combining all (or some) properties
of similar subjects in n-ary tables
S rdf:type rdfs:label
dbpedia:Mount_Etna umbel-sc:Volcano "Etna"
dbpedia:Mount_Baker umbel-sc:Volcano NULL
dbpedia:Beerenberg umbel-sc:Volcano "Beerenberg"@en
● Also uses ID based encoding
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano ;
dbpedia:Mount_Etna rdf:type umbel-sc:Volcano ; RDF Data
rdfs:label "Etna" .
rdfs:label "Etna" .
dbpedia:Mount_Baker rdf:type umbel-sc:Volcano .
dbpedia:Mount_Baker rdf:type umbel-sc:Volcano .
dbpedia:Beerenberg rdf:type umbel-sc:Volcano ;
dbpedia:Beerenberg rdf:type umbel-sc:Volcano ;
rdfs:label "Beerenberg"@en .
rdfs:label "Beerenberg"@en .
Olaf Hartig - Linked Data Management and SPARQL Querying 69
70. Pros and Cons of Property Tables
● Pros:
● Fewer joins
● If the data is structured, we have a relational DB
● Cons:
● Potentially a lot of NULLs
● Clustering is not trivial
● Multi-value properties are complicated
Olaf Hartig - Linked Data Management and SPARQL Querying 70
71. Binary Tables (Basic Idea)
● Also called vertical partitioning
● n two column tables (n is the number of unique
properties in the data)
● Also combined with rdf:type
ID based encoding S O
of RDF terms dbpedia:Mount_Etna umbel-sc:Volcano
dbpedia:Mount_Baker umbel-sc:Volcano
dbpedia:Beerenberg umbel-sc:Volcano
rdfs:label
S O
dbpedia:Mount_Etna "Etna"
dbpedia:Beerenberg "Beerenberg"@en
Olaf Hartig - Linked Data Management and SPARQL Querying 71
72. Row Store vs. Column Store
● Different physical storage models for relational DBs
● Row based storage:
● Tuples (i.e. DB records)
are stored consecutively
● Entire row needs to be
read even if few attributes
are projected
● Column based storage:
● Read columns relevant to the 1 2 1 2 1 2
query → projection is free 3 3 3
● Inserts are expensive
● Column based storage model fit binary tables very well
Olaf Hartig - Linked Data Management and SPARQL Querying 72
73. Performance Comparison
Copied from:
D.J. Abadi, A. Marcus, S.R.
Madden, and K. Hollenbach.
Scalable Semantic Web
Data Management Using
Vertical Partitioning. In Proc.
of VLDB 2007
● However: property tables may outperform triple table and
binary tables for RDF data with regular structure*
*J.J. Levandoski and M.F. Mokbel. RDF data-centric storage. ICWS 2009
Olaf Hartig - Linked Data Management and SPARQL Querying 73
74. Pros and Cons of Binary Tables
● Pros:
● Supports multi-value properties
● No NULLs
● Read only needed attributes (i.e. less I/O)
● No clustering
● Excellent performance (if number of properties
is small, queries with bounded properties)
● Cons:
● Expensive inserts
● Bad performance (large number of properties,
queries with unbounded properties)
Olaf Hartig - Linked Data Management and SPARQL Querying 74
75. Summary (RDF in RDBMS)
● Relational schemas for RDF: S P O
● Triple table S P1 ... Pi ... Pn
● Property tables
● Binary tables
● Hybrid P1 ... Pi ... Pn
S O S O S O
● None of these approaches
can compete with a purely
relational model:
“... there is still room for optimization in the proposed generic
relational RDF storage schemes and thus new techniques for
storing and querying RDF data are still required ...”*
* S. Sakr and G. Al-Naymat. Relational Processing of
RDF Queries: A Survey. SIGMOD Record 38(4), 2009
Olaf Hartig - Linked Data Management and SPARQL Querying 75
76. Outline
Part 1 Introduction to SPARQL
Part 2 Storage Solutions
for RDF Data
Part 3 Querying the
Web of Linked Data
Olaf Hartig - Linked Data Management and SPARQL Querying 76
77. Outline
Part 1 Introduction to SPARQL
Interesting …
but, that was only about (local) RDF data.
Part 2 Storage Solutions Web?
What's about Linked Data on the
for RDF Data
Part 3 Querying the
Web of Linked Data
Olaf Hartig - Linked Data Management and SPARQL Querying 77
78. Can we query the
Web of Linked Data
as if it were a single
(gigantic) database?
SELECT DISTINCT ?i ?label
WHERE {
?prof rdf:type <http://res ... data/dbprofs#DBProfessor> ;
foaf:topic_interest ?i .
}
OPTIONAL {
}
?i rdfs:label ?label
FILTER( LANG(?label)="en" || LANG(?label)="")
ORDER BY ?label
?
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 78
79. Warehousing Based Approach
Traditional approach 1:
Data centralization (e.g. data warehouse)
● Querying a collection of
copies from all relevant
datasets
+ Most efficient
– Misses unknown or new sources
– Collection possibly out of date
Olaf Hartig - A Novel Approach to Query the Web of Linked Data
80. Query Federation Approach
Traditional approach 2:
Federated query processing ?
● Mediator distributes subqueries
to relevant sources and
integrates the results
?
+ Current data ? ?
– Sources must
provide a
query service
– Misses unknown
or new sources
Olaf Hartig - A Novel Approach to Query the Web of Linked Data
81. Main Drawback
You have to know the relevant
data sources in advance.
You restrict yourself to
the selected sources.
You do not tap the
full potential of
the Web !
Olaf Hartig - A Novel Approach to Query the Web of Linked Data
82. A Novel Approach
*
Link Traversal Based Query Execution
* First Publication: O. Hartig, C. Bizer, and J.-C. Freytag. Executing SPARQL
Queries over the Web of Linked Data. In Proc. of ISWC 2009
Olaf Hartig - A Novel Approach to Query the Web of Linked Data
83. Main Idea
● Intertwine query evaluation with traversal of data links
● We alternate between:
● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data
● Look up URIs in intermediate
solutions and add retrieved data
to the query-local dataset
Queried data
Olaf Hartig - A Novel Approach to Query the Web of Linked Data
84. Main Idea
● Intertwine query evaluation with traversal of data links
● We alternate between:
● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data
● Look up URIs in intermediate
solutions and add retrieved data
to the query-local dataset
Query
http://.../movie2449
film
ing
n
r_i
Lo
ca
to
t io
ac
n
lives_in
?actor ?loc
Queried data
Olaf Hartig - A Novel Approach to Query the Web of Linked Data
85. Main Idea
● Intertwine query evaluation with traversal of data links
We alternate between:
htt
●
p:/
/.
Evaluate parts of the query (triple patterns)
../m ?
●
on a continuously augmented set of data
ov
ie2
44
● Look up URIs in intermediate
9
solutions and add retrieved data
to the query-local dataset
Query
http://.../movie2449
film
ing
n
r_i
Lo
ca
to
t io
ac
n
lives_in
?actor ?loc
Queried data
Olaf Hartig - A Novel Approach to Query the Web of Linked Data
86. Main Idea
● Intertwine query evaluation with traversal of data links
We alternate between:
htt
●
p:/
/.
Evaluate parts of the query (triple patterns)
../m ?
●
on a continuously augmented set of data
ov
ie2
44
● Look up URIs in intermediate
9
solutions and add retrieved data
to the query-local dataset
Query
http://.../movie2449
film
ing
n
r_i
Lo
ca
to
t io
ac
n
lives_in
?actor ?loc
Queried data
Olaf Hartig - A Novel Approach to Query the Web of Linked Data
87. Main Idea
● Intertwine query evaluation with traversal of data links
We alternate between:
htt
●
p:/
/.
Evaluate parts of the query (triple patterns)
../m ?
●
on a continuously augmented set of data
ov
ie2
44
● Look up URIs in intermediate
9
solutions and add retrieved data
to the query-local dataset
Query
http://.../movie2449
film
ing
n
r_i
Lo
ca
to
t io
ac
n
lives_in
?actor ?loc
Queried data
Olaf Hartig - A Novel Approach to Query the Web of Linked Data
88. Main Idea
● Intertwine query evaluation with traversal of data links
● We alternate between:
● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data
● Look up URIs in intermediate
solutions and add retrieved data
to the query-local dataset
Query
http://.../movie2449
film
ing
n
r_i
Lo
ca
to
t io
ac
n
lives_in
?actor ?loc
Queried data
Olaf Hartig - A Novel Approach to Query the Web of Linked Data
89. Main Idea
● Intertwine query evaluation with traversal of data links
● We alternate between:
● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data
● Look up URIs in intermediate
solutions and add retrieved data
to the query-local dataset
http://.../movie2449
Query
http://.../movie2449 in
t or_
film http://mdb.../Paul ac
ing
n
r_i
Lo
ca
to
t io
ac
n
lives_in
?actor ?loc
Queried data
Olaf Hartig - A Novel Approach to Query the Web of Linked Data
90. Main Idea
● Intertwine query evaluation with traversal of data links
?actor
● We alternate between:
http://mdb.../Paul
● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data
● Look up URIs in intermediate
solutions and add retrieved data
to the query-local dataset
http://.../movie2449
Query
http://.../movie2449 in
t or_
film http://mdb.../Paul ac
ing
n
r_i
Lo
ca
to
t io
ac
n
lives_in
?actor ?loc
Queried data
Olaf Hartig - A Novel Approach to Query the Web of Linked Data
91. Main Idea
● Intertwine query evaluation with traversal of data links
?actor
● We alternate between:
http://mdb.../Paul
● Evaluate parts of the query (triple patterns)
? aul
P
on a continuously augmented set of data
.../
db
/m
Look up URIs in intermediate
p:/
●
htt
solutions and add retrieved data
to the query-local dataset
Query
http://.../movie2449
film
ing
n
r_i
Lo
ca
to
t io
ac
n
lives_in
?actor ?loc
Queried data
Olaf Hartig - A Novel Approach to Query the Web of Linked Data
92. Main Idea
● Intertwine query evaluation with traversal of data links
?actor
● We alternate between:
http://mdb.../Paul
● Evaluate parts of the query (triple patterns)
? aul
P
on a continuously augmented set of data
.../
db
/m
Look up URIs in intermediate
p:/
●
htt
solutions and add retrieved data
to the query-local dataset
Query
http://.../movie2449
film
ing
n
r_i
Lo
ca
to
t io
ac
n
lives_in
?actor ?loc
Queried data
Olaf Hartig - A Novel Approach to Query the Web of Linked Data
93. Main Idea
● Intertwine query evaluation with traversal of data links
?actor
● We alternate between:
http://mdb.../Paul
● Evaluate parts of the query (triple patterns)
? aul
P
on a continuously augmented set of data
.../
db
/m
Look up URIs in intermediate
p:/
●
htt
solutions and add retrieved data
to the query-local dataset
Query
http://.../movie2449
film
ing
n
r_i
Lo
ca
to
t io
ac
n
lives_in
?actor ?loc
Queried data
Olaf Hartig - A Novel Approach to Query the Web of Linked Data
94. Main Idea
● Intertwine query evaluation with traversal of data links
?actor
● We alternate between:
http://mdb.../Paul
● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data
● Look up URIs in intermediate
solutions and add retrieved data
to the query-local dataset
Query
http://.../movie2449
film
ing
n
r_i
Lo
ca
to
t io
ac
n
lives_in
?actor ?loc
Queried data
Olaf Hartig - A Novel Approach to Query the Web of Linked Data
95. Main Idea
● Intertwine query evaluation with traversal of data links
?actor
● We alternate between:
http://mdb.../Paul
● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data
● Look up URIs in intermediate
solutions and add retrieved data
to the query-local dataset
Query
http://.../movie2449
film
ing
n
r_i
Lo
ca
to
t io
ac
n
lives_in
?actor ?loc
Queried data
Olaf Hartig - A Novel Approach to Query the Web of Linked Data