Efficient Query Answering against
Dynamic RDF Databases
François Goasdoué, Ioana Manolescu, Alexandra Roatiş
Université Paris-Sud & Inria Saclay (OAK project)
20 March 2013
Overview
EDBT 2013 Efficient Query Answering against Dynamic RDF Databases 2 / 35
The Resource Description Framework
Basic Graph Pattern Queries
Contributions
Experiments
Related Work
Conclusion
The Resource Description Framework
The Resource Description Framework (RDF)
⊲ graph-based data model
⊲ W3C standard
RDF Graph:
⊲ set of triples s p o, with s ∈ U ∪ B, p ∈ U, o ∈ U ∪ B ∪ L
U – URIs, L – literals (constants), B – blank nodes
the subject s has the property p with the value: the object o
⊲ built-in property: rdf:type
specify to which classes a resource belongs
Constructor          Triple          Relational notation
Class assertion      s rdf:type o    o(s)
Property assertion   s p o           p(s, o)
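The correspondence between triples and the relational notation can be sketched in a few lines of Python. This is only an illustration: encoding a triple as a tuple of strings and the helper name to_relational are assumptions made here, not part of the talk.

```python
# A triple is modelled as a (subject, property, object) tuple of strings.
RDF_TYPE = "rdf:type"


def to_relational(triple):
    """Render a triple in the relational notation of the table above."""
    s, p, o = triple
    if p == RDF_TYPE:
        return f"{o}({s})"       # class assertion: o(s)
    return f"{p}({s}, {o})"      # property assertion: p(s, o)


# Two triples taken from the running example.
graph = {
    ("book1", RDF_TYPE, "Book"),
    ("book1", "writtenIn", "English"),
}

for t in sorted(graph):
    print(to_relational(t))
```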
Blank nodes
⊲ feature of RDF
⊲ support unknown URI/literal tokens
Example:
the country of _:b1 is Italy
the city of the same _:b1 is Genoa
the population of Genoa is an unspecified value _:b2
Running example
[Figure: RDF graph of the running example. Nodes: book1, “Good Omens”, “Neil Gaiman”, “Terry Pratchett”, Book, Publication, Language, English, _:b0, _:b1, writtenIn, hasLanguage. Edge labels: hasTitle, hasAuthor (twice), rdf:type (twice), translatedTo, writtenIn, rdfs:subClassOf (twice), rdfs:domain, rdfs:range, rdfs:subPropertyOf.]
RDF Schema (RDFS)
⊲ feature of RDF
⊲ enhance the descriptions in graphs
⊲ declare semantic constraints between classes and properties
Built-in properties:
⊲ subclass relationships: rdfs:subClassOf
⊲ subproperty relationships: rdfs:subPropertyOf
⊲ typing the first attribute (domain) of a property: rdfs:domain
⊲ typing the second attribute (range) of a property: rdfs:range
Constructor                Triple                   Relational notation
Subclass constraint        s rdfs:subClassOf o      s ⊆ o
Subproperty constraint     s rdfs:subPropertyOf o   s ⊆ o
Domain typing constraint   s rdfs:domain o          Π_domain(s) ⊆ o
Range typing constraint    s rdfs:range o           Π_range(s) ⊆ o
Open-world assumption and RDF entailment
The RDF data model is based on the open-world assumption.
→ deductive constraints – implicitly propagate tuples
Implicit triples
→ considered part of the graph – not explicitly present
Entailment – reasoning mechanism
set of explicit triples & some entailment rules
derive implicit information
Exhaustive application of entailment rules → saturation (a.k.a. closure)
The saturation of a graph is unique (up to blank node renaming).
Entailment is part of the RDF specification itself.
The semantics of an RDF graph is its saturation.
Entailment rules by example
1) book1 rdf:type Book, Book rdfs:subClassOf Publication ⊢ book1 rdf:type Publication
2) book1 writtenIn English, writtenIn rdfs:subPropertyOf hasLanguage ⊢ book1 hasLanguage English
3) book1 writtenIn English, writtenIn rdfs:domain Book ⊢ book1 rdf:type Book
4) book1 writtenIn English, writtenIn rdfs:range Language ⊢ English rdf:type Language
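The four rules above can be sketched as a small fixpoint computation. This is a minimal illustration rather than the authors' implementation: the tuple encoding of triples and the function names entail_once and saturate are assumptions, and the quadratic loop is only suitable for toy graphs.

```python
RDF_TYPE = "rdf:type"
SUBCLASS, SUBPROP = "rdfs:subClassOf", "rdfs:subPropertyOf"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"


def entail_once(triples):
    """Apply the four rules once to every (triple, constraint) pair."""
    derived = set()
    for (s, p, o) in triples:
        for (s2, p2, o2) in triples:
            if p == RDF_TYPE and p2 == SUBCLASS and s2 == o:
                derived.add((s, RDF_TYPE, o2))   # rule 1: subclass
            if p2 == SUBPROP and s2 == p:
                derived.add((s, o2, o))          # rule 2: subproperty
            if p2 == DOMAIN and s2 == p:
                derived.add((s, RDF_TYPE, o2))   # rule 3: domain typing
            if p2 == RANGE and s2 == p:
                derived.add((o, RDF_TYPE, o2))   # rule 4: range typing
    return derived


def saturate(triples):
    """Apply the rules until no new triple appears (the fixpoint)."""
    closure = set(triples)
    while True:
        new = entail_once(closure) - closure
        if not new:
            return closure
        closure |= new


# The explicit triples used in the four examples above.
graph = {
    ("book1", RDF_TYPE, "Book"),
    ("book1", "writtenIn", "English"),
    ("Book", SUBCLASS, "Publication"),
    ("writtenIn", SUBPROP, "hasLanguage"),
    ("writtenIn", DOMAIN, "Book"),
    ("writtenIn", RANGE, "Language"),
}

implicit = saturate(graph) - graph
for t in sorted(implicit):
    print(t)
```

Rule 3 derives book1 rdf:type Book, which is already explicit here, so only three triples are reported as implicit.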
Basic Graph Pattern Queries
Basic Graph Pattern (BGP) Queries
⊲ subset of SPARQL
⊲ BGP – conjunction of triple patterns (or triples)
q(x̄) :- t1, . . . , tα
ti = si pi oi, with si, pi ∈ U ∪ B ∪ V and oi ∈ U ∪ B ∪ V ∪ L
x̄ ⊆ V (distinguished variables)
query evaluation treats blank nodes in a query as
non-distinguished variables
Example:
q(x, y):- x hasAuthor z, x rdf:type y
≡
q(x, y):- x hasAuthor _:b0, x rdf:type y
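The equivalence above can be checked with a naive BGP evaluator. This is a sketch under assumptions: variables are written with a leading "?", blank nodes with a leading "_:", and the helper names is_var, match and evaluate are hypothetical. It evaluates against the explicit triples only.

```python
def is_var(term):
    # Assumed convention: "?x" is a variable, "_:b0" is a blank node.
    # A blank node in a query is treated as a non-distinguished variable.
    return term.startswith("?") or term.startswith("_:")


def match(pattern, triple, binding):
    """Extend binding so that pattern maps onto triple, or return None."""
    binding = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if binding.setdefault(term, value) != value:
                return None      # variable already bound to another value
        elif term != value:
            return None          # constant mismatch
    return binding


def evaluate(head, body, graph):
    """Evaluate q(head) :- body as a conjunction of triple patterns."""
    bindings = [{}]
    for pattern in body:
        bindings = [b2 for b in bindings for t in graph
                    if (b2 := match(pattern, t, b)) is not None]
    return {tuple(b[v] for v in head) for b in bindings}


graph = {
    ("book1", "hasAuthor", "Neil Gaiman"),
    ("book1", "hasAuthor", "Terry Pratchett"),
    ("book1", "rdf:type", "Book"),
    ("book1", "writtenIn", "English"),
}

with_variable = evaluate(("?x", "?y"),
                         [("?x", "hasAuthor", "?z"), ("?x", "rdf:type", "?y")],
                         graph)
with_blank = evaluate(("?x", "?y"),
                      [("?x", "hasAuthor", "_:b0"), ("?x", "rdf:type", "?y")],
                      graph)
print(with_variable == with_blank, with_variable)
```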
Query answering
Problem:
query evaluation ≠ query answering
the evaluation of a query only uses the graph’s explicit triples
may lead to an incomplete answer set
the (complete) answer set is obtained by evaluating the query
against the graph’s saturation
Solution:
decouple RDF entailment from query evaluation
Perform a pre-processing step to deal with entailed triples:
⊲ on the database – data saturation
⊲ on the queries – query reformulation
Data saturation vs. Query reformulation
Data saturation
Advantages:
⊲ straightforward
⊲ easy to implement
Drawbacks:
⊲ computation time
⊲ additional storage space
⊲ must be recomputed upon database updates
Example:
the YAGO2 dataset nearly doubles in size when its RDFS closure is computed
→ from 33M to 64M triples
Query reformulation
Advantages:
⊲ database saturation does not need to be (re)computed
Drawbacks:
⊲ every incoming query must be reformulated
⊲ reformulations can be prohibitively large
⊲ difficult to optimize
Example:
a single-atom query over YAGO2 can yield a union of more than 300,000 queries
Contributions
Contributions
1. The database (DB) fragment of RDF
extending previously studied fragments by the support of blank nodes
2. Novel BGP query answering techniques for this DB fragment
designed to work on top of any standard conjunctive query processor
(i) an efficient incremental RDF saturation maintenance algorithm
(ii) a novel reformulation-based query answering algorithm
3. Thorough performance comparison and analysis
The database (DB) fragment of RDF
⊲ restricts entailment to RDFS entailment
⊲ does not restrict graphs in any way
An RDF database: db = ⟨D, S⟩
D & S – disjoint sets of triples
D (RDF) – instance level → assertions
S (RDFS) – schema level → semantics
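db = ⟨D, S⟩ for the running example:
[Figure: D, the instance graph. Nodes: book1, “Good Omens”, “Neil Gaiman”, “Terry Pratchett”, Book, English, _:b0, Language. Edge labels: hasTitle, hasAuthor (twice), rdf:type (twice), translatedTo, writtenIn.]
[Figure: S, the schema graph. Nodes: Book, _:b1, Language, writtenIn, hasLanguage, Publication. Edge labels: rdfs:subClassOf (twice), rdfs:domain, rdfs:range, rdfs:subPropertyOf.]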
Query reformulation algorithm
Reformulate(q, db)
⊲ fixpoint algorithm (13 reformulation rules)
⊲ reformulates q into a set of queries s.t.
the union of the evaluations of these queries
on db produces the correct answer
[Figure: the schema graph S. Nodes: Book, _:b1, Language, writtenIn, hasLanguage, Publication. Edge labels: rdfs:subClassOf (twice), rdfs:domain, rdfs:range, rdfs:subPropertyOf.]
Reformulate(q, db) =
q(x, y):- x rdf:type y
∪
q(x, Publication):- x rdf:type Publication
∪
q(x, Publication):- x rdf:type Book
∪
q(x, Publication):- x writtenIn z
∪ . . . ∪
q(x, _:b1):- x rdf:type _:b1
∪ . . .
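[Figure: the schema graph extended with two instance triples. Nodes: Book, _:b1, Language, writtenIn, hasLanguage, Publication, English, book1. Edge labels: rdfs:subClassOf (twice), rdfs:domain, rdfs:range, rdfs:subPropertyOf, rdf:type (twice).]
Standard evaluation treats the blank node in the query as a variable:
q(x, _:b1) :- x rdf:type _:b1
≡
q(x, _:b1) :- x rdf:type z
Answer set: {⟨book1, _:b1⟩, ⟨English, _:b1⟩}
wrong answer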
Non-standard evaluation of the same query:
q(x, _:b1) :- x rdf:type _:b1
Answer set: {⟨book1, _:b1⟩}
correct answer
⊲ refined guarantee: the union of the non-standard evaluations of the reformulated queries on db produces the correct answer
⊲ size of the output: O((6 × #db²)^#q)
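To make the idea concrete, here is a deliberately reduced sketch of reformulation for one kind of query only: a single atom "x rdf:type C" with a fixed class C. It is not the 13-rule algorithm of the talk. The helper names below, reformulate_type_atom and answers are hypothetical, and the placement of the two rdfs:subClassOf edges in the schema is an assumption. Blank nodes coming from the database are compared as constants, in the spirit of the non-standard evaluation.

```python
RDF_TYPE = "rdf:type"
SUBCLASS, SUBPROP = "rdfs:subClassOf", "rdfs:subPropertyOf"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"


def below(term, relation, schema):
    """All t such that t relation* term (reflexive-transitive closure)."""
    found, todo = {term}, [term]
    while todo:
        current = todo.pop()
        for (s, p, o) in schema:
            if p == relation and o == current and s not in found:
                found.add(s)
                todo.append(s)
    return found


def reformulate_type_atom(cls, schema):
    """Atoms whose union answers 'x rdf:type cls' on explicit data only."""
    atoms = set()
    for c in below(cls, SUBCLASS, schema):
        atoms.add(("?x", RDF_TYPE, c))
        for (p, kind, o) in schema:
            if o == c and kind in (DOMAIN, RANGE):
                for sub in below(p, SUBPROP, schema):
                    atoms.add(("?x", sub, "?z") if kind == DOMAIN
                              else ("?z", sub, "?x"))
    return atoms


def answers(atoms, data):
    """Union of the evaluations of the single-atom queries, returning x."""
    result = set()
    for (s, p, o) in atoms:
        for (ds, dp, do) in data:
            if dp != p:
                continue
            if s == "?x" and (o == "?z" or o == do):
                result.add(ds)
            elif o == "?x":
                result.add(do)
    return result


schema = {
    ("Book", SUBCLASS, "Publication"),   # assumed edge placement
    ("Book", SUBCLASS, "_:b1"),          # assumed edge placement
    ("writtenIn", SUBPROP, "hasLanguage"),
    ("writtenIn", DOMAIN, "Book"),
    ("writtenIn", RANGE, "Language"),
}
data = {("book1", "writtenIn", "English")}

publication_atoms = reformulate_type_atom("Publication", schema)
print(sorted(publication_atoms))
print(answers(publication_atoms, data))                       # {'book1'}
print(answers(reformulate_type_atom("_:b1", schema), data))   # {'book1'}
```

For the class Publication the sketch produces the same three atoms as the union shown earlier (rdf:type Publication, rdf:type Book, writtenIn), and the database is never saturated.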
Database saturation algorithm
Saturate(db)
⊲ fixpoint algorithm (4 saturation rules)
⊲ explicitly adds to db all its implicit triples
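Saturate(db) = db ∪ the implicit triples:
[Figure: the implicit triples of the running example. Nodes: book1, Language, Publication, _:b1, English. Edge labels: rdf:type (three times), hasLanguage.]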
⊲ size of the output: O(#db²)
⊲ computation time: O(#db³)
What about updates?
Saturation maintenance algorithm
Saturate+(db)
⊲ multiset variant of Saturate(db)
⊲ allows saturation maintenance upon updates
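Saturate+(db) = db ∪ the multiset of implicit triples:
[Figure: the triples added by the multiset saturation. Nodes: Book, book1, Language, Publication, _:b1, English. Edge labels: rdf:type (four times), hasLanguage.]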
Example of instance insertion
[Figure: db together with its saturation, before the insertion.]
To insert the triple:
book1 writtenIn French
First saturate the triple using db:
[Figure: the saturation of the inserted triple. Nodes: book1, Language, Book, Publication, _:b1, French. Edge labels: rdf:type (four times), hasLanguage.]
Then insert the explicit triple and the inferred ones in db.
[Figure: the saturated graph after the insertion, now including French with its writtenIn, hasLanguage and rdf:type edges.]
Example of schema deletion
[Figure: db together with its saturation, before the deletion.]
To delete the triple:
writtenIn rdfs:domain Book
First infer affected data triples using db:
[Figure: the affected triples. Nodes: book1, Book, Publication, _:b1. Edge labels: rdf:type (three times).]
Then delete the explicit triple and the inferred ones from db.
[Figure: the graph after the deletion, without the rdfs:domain edge.]
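The instance insertion and schema deletion examples can be mimicked with a counted multiset. The sketch below is an assumption-laden illustration, not the Saturate+ algorithm itself: the class name SatM is borrowed from the table name used in the experiments, the counting scheme (one unit per explicit instance triple whose closure contains the triple) is chosen here for simplicity, and the schema deletion naively rescans all explicit triples.

```python
from collections import Counter

RDF_TYPE = "rdf:type"
SUBCLASS, SUBPROP = "rdfs:subClassOf", "rdfs:subPropertyOf"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"


def closure_of(triple, schema):
    """The triple itself plus every instance triple it entails."""
    found, todo = {triple}, [triple]
    while todo:
        s, p, o = todo.pop()
        derived = set()
        for (s2, p2, o2) in schema:
            if p == RDF_TYPE and p2 == SUBCLASS and s2 == o:
                derived.add((s, RDF_TYPE, o2))
            if p2 == SUBPROP and s2 == p:
                derived.add((s, o2, o))
            if p2 == DOMAIN and s2 == p:
                derived.add((s, RDF_TYPE, o2))
            if p2 == RANGE and s2 == p:
                derived.add((o, RDF_TYPE, o2))
        for t in derived - found:
            found.add(t)
            todo.append(t)
    return found


class SatM:
    """count[t] = number of explicit instance triples whose closure has t."""

    def __init__(self, data, schema):
        self.data, self.schema = set(), set(schema)
        self.count = Counter()
        for t in data:
            self.insert_instance(t)

    def insert_instance(self, triple):
        if triple not in self.data:
            self.data.add(triple)
            # Saturate the new triple alone, then add it and its consequences.
            self.count.update(closure_of(triple, self.schema))

    def delete_schema(self, constraint):
        old = self.schema
        self.schema = old - {constraint}
        for t in self.data:
            # Consequences of t that relied on the deleted constraint.
            lost = closure_of(t, old) - closure_of(t, self.schema)
            self.count.subtract(lost)
        self.count = +self.count     # drop triples whose count reached 0


schema = {
    ("Book", SUBCLASS, "Publication"),   # assumed edge placement
    ("Book", SUBCLASS, "_:b1"),          # assumed edge placement
    ("writtenIn", SUBPROP, "hasLanguage"),
    ("writtenIn", DOMAIN, "Book"),
    ("writtenIn", RANGE, "Language"),
}
data = {("book1", RDF_TYPE, "Book"), ("book1", "writtenIn", "English")}
pub = ("book1", RDF_TYPE, "Publication")

sat = SatM(data, schema)
count_initial = sat.count[pub]
sat.insert_instance(("book1", "writtenIn", "French"))
count_after_insert = sat.count[pub]
sat.delete_schema(("writtenIn", DOMAIN, "Book"))
count_after_delete = sat.count[pub]
print(count_initial, count_after_insert, count_after_delete)
```

After the schema deletion, book1 rdf:type Publication is still present because the explicit triple book1 rdf:type Book keeps supporting it. Keeping such triples without recomputing the whole saturation is the reason for maintaining counts.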
Experiments
Experimental setup
• implementation in Java 1.6
• deployed on top of a PostgreSQL v8.5 server
• 6 indexes – all permutations of the (s, p, o) columns
• the spo index is clustered
• dictionary encoding
Graph characteristics and saturation times:
Graph                     Storage                       Barton       DBpedia     DBLP
#Schema                   in memory                     101          5,666       41
#Instance                 Triple(s, p, o)               34 × 10⁶     27 × 10⁶    8.4 × 10⁶
#Saturation               Sat(s, p, o)                  39 × 10⁶     30 × 10⁶    12 × 10⁶
Saturation increase (%)                                 14.91        10.65       41.05
#Multiset                 SatM(s, p, o, isExp, count)   73.5 × 10⁶   66 × 10⁶    18.7 × 10⁶
Multiset increase (%)                                   116.89       227.37      121.97
t_sat (s)                                               4,294        2,742       748
t_sat+ (s)                                              4,586        2,977       799
Query answering
• 26 hand-picked queries (between 1 and 10 triple patterns – 6 on average)
• similar query answering times on Sat and SatM
Graph updates
• no impact on reformulation
• saturation needs to maintain SatM
• insertions & deletions
• updates of one triple on the data and the schema
Saturation thresholds
The saturation threshold of a query q, st(q):
the smallest integer n s.t.
n × t_ref(q) > n × t_sat(q) + t_sat+
t_ref(q) – time to answer q through reformulation (using Triple)
t_sat(q) – time to answer q based on saturation (using SatM)
t_sat+ – time to saturate db (create SatM)
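Rearranging the inequality gives n > t_sat+ / (t_ref(q) − t_sat(q)) whenever reformulation is slower than answering on the saturation, so the threshold can be computed directly. The function below is a small sketch of that arithmetic. The per-query times in the example are hypothetical; only the 4,586 s saturation time for Barton comes from the table of graph characteristics.

```python
import math


def saturation_threshold(t_ref, t_sat, t_sat_plus):
    """Smallest integer n with n * t_ref > n * t_sat + t_sat_plus.

    Returns None when t_ref <= t_sat: saturation then never pays off
    for this query, whatever the number of runs.
    """
    gain = t_ref - t_sat             # time saved per run of the query
    if gain <= 0:
        return None
    return math.floor(t_sat_plus / gain) + 1


# Hypothetical query: 2.0 s through reformulation, 0.5 s on the saturation.
n = saturation_threshold(t_ref=2.0, t_sat=0.5, t_sat_plus=4586)
print(n)  # 3058
```

With these assumed timings, saturating Barton would pay off only once the query has been run 3,058 times without an intervening update.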
Related Work
Outline of the positioning of our work
[Figure: positioning of this work along two axes. Query language expressive power: relational conjunctive queries, BGP queries, SPARQL. RDF fragment expressive power: DL, DB. Plotted items: [1, 3, 5], [4, 6, 7], [2], this work.]
[1] ADJIMAN, P., GOASDOUÉ, F., AND ROUSSET, M.-C. SomeRDFS in the semantic web. JODS 8 (2007).
[2] ARENAS, M., GUTIERREZ, C., AND PÉREZ, J. Foundations of RDF databases. In Reasoning Web (2009).
[3] CALVANESE, D., GIACOMO, G. D., LEMBO, D., LENZERINI, M., AND ROSATI, R. Tractable reasoning and efficient query answering in
description logics: The DL-Lite family. Journal of Automated Reasoning (JAR) 39, 3 (2007).
[4] GOASDOUÉ, F., KARANASOS, K., LEBLAY, J., AND MANOLESCU, I. View selection in semantic web databases. PVLDB (2011).
[5] GOTTLOB, G., ORSI, G., AND PIERIS, A. Ontological queries: Rewriting and optimization. In ICDE (2011). Keynote.
[6] KAOUDI, Z., MILIARAKI, I., AND KOUBARAKIS, M. RDFS reasoning and query answering on DHTs. In ISWC (2008).
[7] URBANI, J., VAN HARMELEN, F., SCHLOBACH, S., AND BAL, H. QueryPIE: Backward reasoning for OWL Horst over very large
knowledge bases. In ISWC (2011).
Conclusion
Conclusion
Summary:
⊲ RDF fragment (extending those studied in the literature)
⊲ novel saturation- and reformulation-based query answering techniques
robust to instance and schema updates
⊲ algorithms directly deployable on top of any RDBMS
⊲ thorough performance comparison and analysis
Future work:
An automated strategy to choose between the two techniques:
Saturate+(db) / Reformulate(q, db)
Thank you!
Open-world interpretation of RDFS constraints
Constraint interpretation:
⊲ closed-world assumption (CWA)
any fact not present in the database is assumed not to hold
database facts do not respect a constraint → inconsistency
R1 ⊆ R2 – any tuple in the relation R1 must also be in the relation R2
⊲ open-world assumption (OWA)
facts may hold even though they are not in the database
R1 ⊆ R2 – any tuple in the relation R1 is also in the relation R2
The RDF data model is based on OWA.
RDF meets Relational Database Management Systems (RDBMS)
RDF graphs:
incomplete relational databases based on V-tables
V-tables:
allow using variables in their tuples
using a variable multiple times allows expressing joins on unknown values
BGP query answering boils down to
conjunctive query evaluation on a saturated database.
Saturation (related work)
• J. Broekstra and A. Kampman
“Inferencing and truth maintenance in RDF Schema: Exploring a naive
practical approach”
in PSSS Workshop, 2003.
• B. Bishop, A. Kiryakov, D. Ognyanoff, I. Peikov, Z. Tashev, and R. Velkov
“OWLIM: A family of scalable semantic repositories”
Semantic Web, vol. 2, no. 1, 2011.
• C. Gutierrez, C. A. Hurtado, and A. A. Vaisman
“RDFS update: From theory to practice”
in ESWC, 2011.

Efficient Query Answering against Dynamic RDF Databases

  • 1.
    Efficient Query Answeringagainst Dynamic RDF Databases François Goasdoué, Ioana Manolescu, Alexandra Roati¸s Université Paris-Sud & Inria Saclay (OAK project) 20 March 2013
  • 2.
    Overview EDBT 2013 EfficientQuery Answering against Dynamic RDF Databases 2 / 35 The Resource Description Framework Basic Graph Pattern Queries Contributions Experiments Related Work Conclusion
  • 3.
  • 4.
    The Resource DescriptionFramework (RDF) EDBT 2013 Efficient Query Answering against Dynamic RDF Databases 4 / 35 ⊲ graph-based data model ⊲ W3C standard
  • 5.
    The Resource DescriptionFramework (RDF) EDBT 2013 Efficient Query Answering against Dynamic RDF Databases 4 / 35 ⊲ graph-based data model ⊲ W3C standard RDF Graph: ⊲ set of triples: s p o s ∈ U ∪ B, p ∈ U, o ∈ U ∪ B ∪ L U – URIs, L – literals (constants), B – blank nodes the subject s has the property p with the value: the object o
  • 6.
    The Resource DescriptionFramework (RDF) EDBT 2013 Efficient Query Answering against Dynamic RDF Databases 4 / 35 ⊲ graph-based data model ⊲ W3C standard RDF Graph: ⊲ set of triples: s p o s ∈ U ∪ B, p ∈ U, o ∈ U ∪ B ∪ L U – URIs, L – literals (constants), B – blank nodes the subject s has the property p with the value: the object o ⊲ built-in property: rdf:type specify to which classes a resource belongs
  • 7.
    The Resource DescriptionFramework (RDF) EDBT 2013 Efficient Query Answering against Dynamic RDF Databases 4 / 35 ⊲ graph-based data model ⊲ W3C standard RDF Graph: ⊲ set of triples: s p o s ∈ U ∪ B, p ∈ U, o ∈ U ∪ B ∪ L U – URIs, L – literals (constants), B – blank nodes the subject s has the property p with the value: the object o ⊲ built-in property: rdf:type specify to which classes a resource belongs Constructor Triple Relational notation Class assertion s rdf:type o o(s) Property assertion s p o p(s, o)
  • 11.
    Blank nodes
    ⊲ feature of RDF
    ⊲ support unknown URI/literal tokens
    Example:
      the country of _:b1 is Italy
      the city of the same _:b1 is Genoa
      the population of Genoa is an unspecified value _:b2
  • 12.
    Running example
    [Figure: RDF graph. Node labels: book1, “Good Omens”, “Neil Gaiman”, “Terry Pratchett”, Book, English, _:b0, _:b1, Language, writtenIn, hasLanguage, Publication. Edge labels: hasTitle, hasAuthor (twice), rdf:type (twice), translatedTo, writtenIn, rdfs:subClassOf (twice), rdfs:domain, rdfs:range, rdfs:subPropertyOf.]
  • 14.
    RDF Schema (RDFS)
    ⊲ feature of RDF
    ⊲ enhance the descriptions in graphs
    ⊲ declare semantic constraints between classes and properties
    Built-in properties:
    ⊲ subclass relationships: rdfs:subClassOf
    ⊲ subproperty relationships: rdfs:subPropertyOf
    ⊲ typing the first attribute (domain) of a property: rdfs:domain
    ⊲ typing the second attribute (range) of a property: rdfs:range
    Constructor                 Triple                   Relational notation
    Subclass constraint         s rdfs:subClassOf o      s ⊆ o
    Subproperty constraint      s rdfs:subPropertyOf o   s ⊆ o
    Domain typing constraint    s rdfs:domain o          Πdomain(s) ⊆ o
    Range typing constraint     s rdfs:range o           Πrange(s) ⊆ o
  • 19.
    Open-world assumption and RDF entailment
    The RDF data model is based on the open-world assumption.
      → deductive constraints: implicitly propagate tuples
    Implicit triples
      → considered part of the graph, though not explicitly present
    Entailment: a reasoning mechanism that derives implicit information from a set of explicit triples and some entailment rules
    Exhaustive application of entailment rules → saturation (a.k.a. closure)
    The saturation of a graph is unique (up to blank node renaming).
    Entailment is part of the RDF specification itself.
    The semantics of an RDF graph is its saturation.
  • 24.
    Entailment rules by example
    1) book1 rdf:type Book, Book rdfs:subClassOf Publication ⊢ book1 rdf:type Publication
    2) book1 writtenIn English, writtenIn rdfs:subPropertyOf hasLanguage ⊢ book1 hasLanguage English
    3) book1 writtenIn English, writtenIn rdfs:domain Book ⊢ book1 rdf:type Book
    4) book1 writtenIn English, writtenIn rdfs:range Language ⊢ English rdf:type Language
  • 28.
    Basic Graph Pattern (BGP) Queries
    ⊲ subset of SPARQL
    ⊲ BGP: conjunction of triple patterns (or triples)
      q(x̄):- t1, . . . , tα
      ti = si pi oi, with si, pi ∈ U ∪ B ∪ V and oi ∈ U ∪ B ∪ V ∪ L (V: variables)
      x̄ ∈ V (distinguished variables)
    Query evaluation treats blank nodes in a query as non-distinguished variables.
    Example: q(x, y):- x hasAuthor z, x rdf:type y ≡ q(x, y):- x hasAuthor _:b0, x rdf:type y
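The equivalence in the example above can be checked with a minimal sketch of BGP evaluation over explicit triples only. The function names, the `?name` convention for variables, and the toy triples (including the hypothetical `book2`) are assumptions made for illustration, not part of the slides.

```python
def is_variable(term):
    # Assumed convention: "?name" is a variable; a blank node "_:name"
    # appearing in a query is treated as a non-distinguished variable.
    return term.startswith("?") or term.startswith("_:")


def match(term, value, binding):
    # Bind a variable on first use, otherwise require the same value again.
    if is_variable(term):
        return binding.setdefault(term, value) == value
    return term == value


def evaluate_bgp(head, body, triples):
    """Return the set of head tuples obtained by matching every pattern in body."""
    bindings = [{}]
    for pattern in body:
        extended = []
        for binding in bindings:
            for triple in triples:
                candidate = dict(binding)  # copy, so a failed match leaves no trace
                if all(match(t, v, candidate) for t, v in zip(pattern, triple)):
                    extended.append(candidate)
        bindings = extended
    return {tuple(b[v] for v in head) for b in bindings}


# Toy explicit triples (hypothetical data in the spirit of the running example).
GRAPH = {
    ("book1", "hasAuthor", "Neil Gaiman"),
    ("book1", "hasAuthor", "Terry Pratchett"),
    ("book1", "rdf:type", "Book"),
    ("book2", "rdf:type", "Book"),  # no author, so it must not be returned
}

with_variable = evaluate_bgp(
    ("?x", "?y"), [("?x", "hasAuthor", "?z"), ("?x", "rdf:type", "?y")], GRAPH)
with_blank_node = evaluate_bgp(
    ("?x", "?y"), [("?x", "hasAuthor", "_:b0"), ("?x", "rdf:type", "?y")], GRAPH)
```

Both calls return the same answer set, since the blank node `_:b0` in the query body behaves exactly like the variable `z`.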
  • 32.
    Query answering
    Problem: query evaluation ≠ query answering
      the evaluation of a query only uses the graph’s explicit triples, so it may lead to an incomplete answer set
      the (complete) answer set is obtained by evaluating the query against the graph’s saturation
    Solution: decouple RDF entailment from query evaluation
    Perform a pre-processing step to deal with entailed triples:
    ⊲ on the database: data saturation
    ⊲ on the queries: query reformulation
  • 33.
    Data saturation vs. Query reformulation
    Data saturation
      Advantages:
      ⊲ straightforward
      ⊲ easy to implement
      Drawbacks:
      ⊲ computation time
      ⊲ additional storage space
      ⊲ must be recomputed upon database updates
      Example: the YAGO2 dataset doubles in size when computing the RDFS closure → 33M to 64M triples
    Query reformulation
      Advantages:
      ⊲ database saturation does not need to be (re)computed
      Drawbacks:
      ⊲ every incoming query must be reformulated
      ⊲ reformulations can be prohibitively large
      ⊲ difficult to optimize
      Example: a single-atom query over YAGO2 can yield a union of > 300 000 queries
  • 38.
    Contributions
    1. The database (DB) fragment of RDF, extending previously studied fragments by the support of blank nodes
    2. Novel BGP query answering techniques for this DB fragment, designed to work on top of any standard conjunctive query processor
       (i) an efficient incremental RDF saturation maintenance algorithm
       (ii) a novel reformulation-based query answering algorithm
    3. Thorough performance comparison and analysis
  • 41.
    The database (DB) fragment of RDF
    ⊲ restricts entailment to RDFS entailment
    ⊲ does not restrict graphs in any way
    An RDF database: db = ⟨D, S⟩, where D and S are disjoint sets of triples
      D (RDF): instance level → assertions
      S (RDFS): schema level → semantics
    [Figure: the running example written as db = ⟨D, S⟩. D has node labels book1, “Good Omens”, “Neil Gaiman”, “Terry Pratchett”, Book, English, _:b0, Language and edge labels hasTitle, hasAuthor (twice), rdf:type (twice), translatedTo, writtenIn. S has node labels Book, _:b1, Language, writtenIn, hasLanguage, Publication and edge labels rdfs:subClassOf (twice), rdfs:domain, rdfs:range, rdfs:subPropertyOf.]
  • 53.
    Query reformulation algorithm
    Reformulate(q, db)
    ⊲ fixpoint algorithm (13 reformulation rules)
    ⊲ reformulates q into a set of queries s.t. the union of the non-standard evaluations of these queries on db produces the correct answer
    ⊲ size of the output: O((6 × #db²)^#q)
    Example (schema of the running example):
      Reformulate(q, db) = q(x, y):- x rdf:type y
        ∪ q(x, Publication):- x rdf:type Publication
        ∪ q(x, Publication):- x rdf:type Book
        ∪ q(x, Publication):- x writtenIn z
        ∪ . . .
        ∪ q(x, _:b1):- x rdf:type _:b1
        ∪ . . .
    Why the evaluation is non-standard: standard evaluation treats the blank node in the body as a variable, so
      q(x, _:b1):- x rdf:type _:b1 ≡ q(x, _:b1):- x rdf:type z
      Standard evaluation, answer set: { ⟨book1, _:b1⟩, ⟨English, _:b1⟩ } (wrong answer)
      Non-standard evaluation, answer set: { ⟨book1, _:b1⟩ } (correct answer)
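To make the fixpoint idea concrete, here is a minimal sketch that reformulates a single atom of the form `x rdf:type C` using only subclass, subproperty, domain and range constraints. It is not the 13-rule Reformulate(q, db) algorithm of the slides: the function name, the `?x`/`?z` variable convention and the schema triples are assumptions, and blank nodes and multi-atom queries are deliberately left out.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"

# Assumed reading of the running-example schema (the _:b1 class is omitted).
SCHEMA = {
    ("Book", SUBCLASS, "Publication"),
    ("writtenIn", SUBPROP, "hasLanguage"),
    ("writtenIn", DOMAIN, "Book"),
    ("writtenIn", RANGE, "Language"),
}


def reformulate_type_atom(cls, schema):
    """Return the atoms whose union of evaluations answers '?x rdf:type cls'."""
    start = ("?x", RDF_TYPE, cls)
    atoms, frontier = {start}, [start]
    while frontier:  # fixpoint: stop when no rule yields a new atom
        s, p, o = frontier.pop()
        for a, rel, b in schema:
            if p == RDF_TYPE and rel == SUBCLASS and b == o:
                new = ("?x", RDF_TYPE, a)   # instances of a subclass qualify
            elif p == RDF_TYPE and rel == DOMAIN and b == o:
                new = ("?x", a, "?z")       # subjects of a property with this domain
            elif p == RDF_TYPE and rel == RANGE and b == o:
                new = ("?z", a, "?x")       # objects of a property with this range
            elif p != RDF_TYPE and rel == SUBPROP and b == p:
                new = (s, a, o)             # same atom over a subproperty
            else:
                continue
            if new not in atoms:
                atoms.add(new)
                frontier.append(new)
    return atoms


publication_atoms = reformulate_type_atom("Publication", SCHEMA)
language_atoms = reformulate_type_atom("Language", SCHEMA)
```

Under the assumed schema, `publication_atoms` contains the three atoms that appear on the slide for the `Publication` case: `?x rdf:type Publication`, `?x rdf:type Book` and `?x writtenIn ?z`.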
  • 59.
    Database saturation algorithm
    Saturate(db)
    ⊲ fixpoint algorithm (4 saturation rules)
    ⊲ explicitly adds to db all its implicit triples
      Saturate(db) = db ∪ { book1 rdf:type Publication, book1 rdf:type _:b1, book1 hasLanguage English, English rdf:type Language }
    ⊲ size of the output: O(#db²)
    ⊲ computation time: O(#db³)
    What about updates?
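The fixpoint above can be sketched as a naive loop over the four instance-level RDFS rules (subclass, subproperty, domain, range). This is an illustrative sketch, not the authors' implementation: the function name and the schema triples are assumptions, and the `_:b1` class of the running example is omitted.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"

# Assumed reading of the running-example schema.
SCHEMA = {
    ("Book", SUBCLASS, "Publication"),
    ("writtenIn", SUBPROP, "hasLanguage"),
    ("writtenIn", DOMAIN, "Book"),
    ("writtenIn", RANGE, "Language"),
}
DATA = {("book1", "writtenIn", "English")}


def saturate(data, schema):
    """Apply the four rules until no new instance triple is produced."""
    saturated = set(data)
    while True:
        new = set()
        for s, p, o in saturated:
            for a, rel, b in schema:
                if rel == SUBCLASS and p == RDF_TYPE and o == a:
                    new.add((s, RDF_TYPE, b))
                elif rel == SUBPROP and p == a:
                    new.add((s, b, o))
                elif rel == DOMAIN and p == a:
                    new.add((s, RDF_TYPE, b))
                elif rel == RANGE and p == a:
                    new.add((o, RDF_TYPE, b))
        if new.issubset(saturated):  # fixpoint reached
            return saturated
        saturated |= new


implicit = saturate(DATA, SCHEMA) - DATA
```

With the assumed inputs, `implicit` holds the four triples entailed by `book1 writtenIn English`. Each pass rescans every triple, which is simple but wasteful; a worklist that only revisits newly derived triples would avoid the repeated scans.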
  • 61.
    Saturation maintenance algorithm
    Saturate+(db)
    ⊲ multiset variant of Saturate(db)
    ⊲ allows saturation maintenance upon updates
      Saturate+(db) = db ∪ { book1 rdf:type Book, book1 rdf:type Publication, book1 rdf:type _:b1, book1 hasLanguage English, English rdf:type Language }
  • 64.
    Example of instance insertion
    [Figure: the schema and the saturated instance of the running example, before and after the insertion.]
    To insert the triple: book1 writtenIn French
    First saturate the triple using db: book1 rdf:type Book, book1 rdf:type Publication, book1 rdf:type _:b1, book1 hasLanguage French, French rdf:type Language
    Then insert the explicit triple and the inferred ones in db.
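The insertion steps above suggest a counting scheme, sketched below for instance-level updates only. It is a simplification, not the Saturate+ algorithm itself: each implicit triple carries the number of explicit triples that entail it, which is enough here because every rule has a single instance-triple premise. The class and function names, the schema triples, and the fixed-schema restriction are all assumptions.

```python
from collections import Counter

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"

# Assumed reading of the running-example schema; kept fixed in this sketch.
SCHEMA = {
    ("Book", SUBCLASS, "Publication"),
    ("writtenIn", SUBPROP, "hasLanguage"),
    ("writtenIn", DOMAIN, "Book"),
    ("writtenIn", RANGE, "Language"),
}


def consequences(triple, schema):
    """All triples entailed by one instance triple (the triple itself excluded)."""
    derived, frontier = set(), [triple]
    while frontier:
        s, p, o = frontier.pop()
        for a, rel, b in schema:
            if rel == SUBCLASS and p == RDF_TYPE and o == a:
                new = (s, RDF_TYPE, b)
            elif rel == SUBPROP and p == a:
                new = (s, b, o)
            elif rel == DOMAIN and p == a:
                new = (s, RDF_TYPE, b)
            elif rel == RANGE and p == a:
                new = (o, RDF_TYPE, b)
            else:
                continue
            if new != triple and new not in derived:
                derived.add(new)
                frontier.append(new)
    return derived


class SaturatedStore:
    """Explicit triples plus a support count for every entailed triple."""

    def __init__(self, schema):
        self.schema = schema
        self.explicit = set()
        self.support = Counter()

    def insert(self, triple):
        if triple in self.explicit:
            return
        self.explicit.add(triple)
        for t in consequences(triple, self.schema):
            self.support[t] += 1

    def delete(self, triple):
        if triple not in self.explicit:
            return
        self.explicit.remove(triple)
        for t in consequences(triple, self.schema):
            self.support[t] -= 1
            if self.support[t] == 0:  # no explicit triple entails it any more
                del self.support[t]

    def triples(self):
        return self.explicit | set(self.support)


store = SaturatedStore(SCHEMA)
store.insert(("book1", "writtenIn", "English"))
store.insert(("book1", "writtenIn", "French"))
after_insert = store.triples()
store.delete(("book1", "writtenIn", "French"))
after_delete = store.triples()
```

After the deletion, `French rdf:type Language` disappears because its only support is gone, while `book1 rdf:type Publication` survives because `book1 writtenIn English` still entails it. Schema updates would require recomputing the consequences of the affected instance triples and are not covered.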
  • 67.
    Example of schema deletion
    [Figure: the schema and the saturated instance of the running example, before and after the deletion.]
    To delete the triple: writtenIn rdfs:domain Book
    First infer affected data triples using db: book1 rdf:type Book, book1 rdf:type Publication, book1 rdf:type _:b1
    Then delete the explicit triple and the inferred ones from db.
  • 69.
    Experimental setup
    • implementation in Java 1.6
    • deployed on top of a PostgreSQL v8.5 server
    • 6 indexes: all permutations of the (s, p, o) columns
    • the spo index is clustering
    • dictionary encoding
    Graph characteristics and saturation times:
    Graph                     Storage                         Barton       DBpedia     DBLP
    #Schema                   in memory                       101          5,666       41
    #Instance                 Triple(s, p, o)                 34 × 10⁶     27 × 10⁶    8.4 × 10⁶
    #Saturation               Sat(s, p, o)                    39 × 10⁶     30 × 10⁶    12 × 10⁶
    Saturation increase (%)                                   14.91        10.65       41.05
    #Multiset                 SatM(s, p, o, isExp, count)     73.5 × 10⁶   66 × 10⁶    18.7 × 10⁶
    Multiset increase (%)                                     116.89       227.37      121.97
    tsat (s)                                                  4,294        2,742       748
    tsat+ (s)                                                 4,586        2,977       799
  • 70.
    Query answering
    • 26 hand-picked queries (between 1 and 10 triple patterns, 6 on average)
    • similar query answering times on Sat and SatM
    [Figure: chart; axis labels and values not recoverable from the extraction.]
  • 71.
    Graph updates
    • no impact on reformulation
    • saturation needs to maintain SatM
    • insertions & deletions
    • updates of one triple on the data and the schema
    [Figure: chart; axis labels and values not recoverable from the extraction.]
  • 72.
    Saturation thresholds
    The saturation threshold of a query q, st(q): the smallest integer n s.t. n × tref(q) > n × tsat(q) + tsat+
      tref(q): time to answer q through reformulation (using Triple)
      tsat(q): time to answer q based on saturation (using SatM)
      tsat+: time to saturate db (create SatM)
    [Figure: chart; axis labels and values not recoverable from the extraction.]
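The definition of st(q) can be turned into a small calculation. The sketch below solves n × (tref(q) − tsat(q)) > tsat+ for the smallest integer n. The function name and the per-query times in the usage lines are hypothetical; only the 799 s figure is the DBLP tsat+ value from the experimental setup table.

```python
import math


def saturation_threshold(t_ref_q, t_sat_q, t_sat_plus):
    """Smallest integer n with n * t_ref_q > n * t_sat_q + t_sat_plus.

    Returns None when reformulation is not slower than saturation-based
    answering for this query, since saturation then never pays off.
    """
    gain = t_ref_q - t_sat_q  # time saved per run of q once db is saturated
    if gain > 0:
        # n > t_sat_plus / gain, so take the next integer strictly above the ratio.
        return math.floor(t_sat_plus / gain) + 1
    return None


# Hypothetical per-query times (seconds) against the DBLP saturation time.
threshold = saturation_threshold(t_ref_q=2.0, t_sat_q=0.5, t_sat_plus=799)
never = saturation_threshold(t_ref_q=0.5, t_sat_q=0.5, t_sat_plus=799)
```

With these made-up times each run saves 1.5 s, so the 799 s spent on saturation is recovered after 533 runs of the query.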
  • 74.
    Outline of the positioning of our work
    [Figure: positioning chart. Axes: query language expressive power (SPARQL, BGP queries, relational conjunctive queries) and RDF fragment expressive power (DL, DB). Plotted entries: [1, 3, 5], [4, 6, 7], [2], this work; their positions are not recoverable from the extraction.]
    [1] ADJIMAN, P., GOASDOUÉ, F., AND ROUSSET, M.-C. SomeRDFS in the semantic web. JODS 8 (2007).
    [2] ARENAS, M., GUTIERREZ, C., AND PÉREZ, J. Foundations of RDF databases. In Reasoning Web (2009).
    [3] CALVANESE, D., GIACOMO, G. D., LEMBO, D., LENZERINI, M., AND ROSATI, R. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. Journal of Automated Reasoning (JAR) 39, 3 (2007).
    [4] GOASDOUÉ, F., KARANASOS, K., LEBLAY, J., AND MANOLESCU, I. View selection in semantic web databases. PVLDB (2011).
    [5] GOTTLOB, G., ORSI, G., AND PIERIS, A. Ontological queries: Rewriting and optimization. In ICDE (2011). Keynote.
    [6] KAOUDI, Z., MILIARAKI, I., AND KOUBARAKIS, M. RDFS reasoning and query answering on DHTs. In ISWC (2008).
    [7] URBANI, J., VAN HARMELEN, F., SCHLOBACH, S., AND BAL, H. QueryPIE: Backward reasoning for OWL Horst over very large knowledge bases. In ISWC (2011).
  • 77.
    Conclusion
    Summary:
    ⊲ RDF fragment (extending those studied in the literature)
    ⊲ novel saturation- and reformulation-based query answering techniques robust to instance and schema updates
    ⊲ algorithms directly deployable on top of any RDBMS
    ⊲ thorough performance comparison and analysis
    Future work:
    An automated strategy to choose between the two techniques: Saturate+(db) / Reformulate(q, db)
  • 78.
    Thank you!
    [Figure: RDF graph with node labels I, you, attention, Question, _:b1, _:b2, _:b3 and edge labels thank, pay, ask (three times), rdf:type (three times).]
  • 79.
    Open-world interpretation of RDFS constraints
    Constraint interpretation:
    ⊲ closed-world assumption (CWA)
      any fact not present in the database is assumed not to hold
      database facts do not respect a constraint → inconsistency
      R1 ⊆ R2: any tuple in the relation R1 must also be in the relation R2
    ⊲ open-world assumption (OWA)
      facts may hold even though they are not in the database
      R1 ⊆ R2: any tuple in the relation R1 is also in the relation R2
    The RDF data model is based on OWA.
  • 80.
    RDF meets Relational Database Management Systems (RDBMS)
    RDF graphs: incomplete relational databases based on V-tables
    V-tables: allow using variables in their tuples
      using a variable multiple times allows expressing joins on unknown values
    BGP query answering boils down to conjunctive query evaluation on a saturated database.
  • 81.
    Saturation (related work)
    • J. Broekstra and A. Kampman, “Inferencing and truth maintenance in RDF Schema: Exploring a naive practical approach”, in PSSS Workshop, 2003.
    • B. Bishop, A. Kiryakov, D. Ognyanoff, I. Peikov, Z. Tashev, and R. Velkov, “OWLIM: A family of scalable semantic repositories”, Semantic Web, vol. 2, no. 1, 2011.
    • C. Gutierrez, C. A. Hurtado, and A. A. Vaisman, “RDFS update: From theory to practice”, in ESWC, 2011.