Efficient Query Answering against Dynamic RDF Databases

Presentation Transcript

  • Efficient Query Answering against Dynamic RDF Databases
    François Goasdoué, Ioana Manolescu, Alexandra Roatiş
    Université Paris-Sud & Inria Saclay (OAK project)
    20 March 2013
  • Overview
    EDBT 2013, Efficient Query Answering against Dynamic RDF Databases (slide 2 / 35)
    ⊲ The Resource Description Framework
    ⊲ Basic Graph Pattern Queries
    ⊲ Contributions
    ⊲ Experiments
    ⊲ Related Work
    ⊲ Conclusion
  • The Resource Description Framework
  • The Resource Description Framework (RDF)
    ⊲ graph-based data model
    ⊲ W3C standard
    RDF graph:
    ⊲ set of triples s p o, with s ∈ U ∪ B, p ∈ U, o ∈ U ∪ B ∪ L
      (U – URIs, L – literals (constants), B – blank nodes)
      the subject s has the property p with the value o (the object)
    ⊲ built-in property rdf:type: specifies to which classes a resource belongs
    Constructor          Triple          Relational notation
    Class assertion      s rdf:type o    o(s)
    Property assertion   s p o           p(s, o)
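The mapping from triples to relational notation in the table above can be illustrated with a minimal Python sketch. Representing a triple as a plain `(s, p, o)` tuple of strings and the helper name `to_relational` are assumptions made for illustration; the two sample triples come from the book example used later in the talk.

```python
RDF_TYPE = "rdf:type"

def to_relational(triple):
    """Render one (s, p, o) triple in the relational notation of the table."""
    s, p, o = triple
    if p == RDF_TYPE:
        return f"{o}({s})"       # class assertion: o(s)
    return f"{p}({s}, {o})"      # property assertion: p(s, o)

# Two triples from the book example (tuple encoding is an assumption).
triples = [("book1", "rdf:type", "Book"), ("book1", "writtenIn", "English")]
relational = [to_relational(t) for t in triples]
print(relational)
```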
  • Blank nodes
    ⊲ feature of RDF
    ⊲ support unknown URI/literal tokens
    Example:
      the country of _:b1 is Italy
      the city of the same _:b1 is Genoa
      the population of Genoa is an unspecified value _:b2
  • Running example
    [figure: RDF graph about book1. Instance nodes: book1, “Good Omens”, “Neil Gaiman”, “Terry Pratchett”, Book, English, _:b0, Language; instance edges: hasTitle, hasAuthor (×2), rdf:type (×2), translatedTo, writtenIn. Schema nodes: Book, _:b1, Language, writtenIn, hasLanguage, Publication; schema edges: rdfs:subClassOf (×2), rdfs:domain, rdfs:range, rdfs:subPropertyOf.]
  • RDF Schema (RDFS)
    ⊲ feature of RDF
    ⊲ enhances the descriptions in graphs
    ⊲ declares semantic constraints between classes and properties
    Built-in properties:
    ⊲ subclass relationships: rdfs:subClassOf
    ⊲ subproperty relationships: rdfs:subPropertyOf
    ⊲ typing the first attribute (domain) of a property: rdfs:domain
    ⊲ typing the second attribute (range) of a property: rdfs:range
    Constructor                Triple                   Relational notation
    Subclass constraint        s rdfs:subClassOf o      s ⊆ o
    Subproperty constraint     s rdfs:subPropertyOf o   s ⊆ o
    Domain typing constraint   s rdfs:domain o          Π_domain(s) ⊆ o
    Range typing constraint    s rdfs:range o           Π_range(s) ⊆ o
  • Open-world assumption and RDF entailment
    The RDF data model is based on the open-world assumption.
    → deductive constraints – implicitly propagate tuples
    Implicit triples
    → considered part of the graph – not explicitly present
    Entailment – reasoning mechanism: from a set of explicit triples and some entailment rules, derive implicit information
    Exhaustive application of entailment rules → saturation (a.k.a. closure)
    The saturation of a graph is unique (up to blank node renaming).
    Entailment is part of the RDF specification itself.
    The semantics of an RDF graph is its saturation.
  • Entailment rules by example
    1) book1 rdf:type Book, Book rdfs:subClassOf Publication → book1 rdf:type Publication
    2) book1 writtenIn English, writtenIn rdfs:subPropertyOf hasLanguage → book1 hasLanguage English
    3) book1 writtenIn English, writtenIn rdfs:domain Book → book1 rdf:type Book
    4) book1 writtenIn English, writtenIn rdfs:range Language → English rdf:type Language
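The four rules illustrated on this slide can be applied exhaustively to reach the saturation. The following Python sketch is one possible way to do so, not the authors' implementation: triples are assumed to be `(s, p, o)` string tuples, and the names `derive` and `saturate` are hypothetical.

```python
TYPE = "rdf:type"
SUBCLASS, SUBPROP = "rdfs:subClassOf", "rdfs:subPropertyOf"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"

def derive(triple, schema):
    """Triples that follow from one instance triple by one rule application."""
    s, p, o = triple
    out = set()
    for a, rel, b in schema:
        if rel == SUBCLASS and p == TYPE and o == a:
            out.add((s, TYPE, b))    # rule 1: subclass
        elif rel == SUBPROP and p == a:
            out.add((s, b, o))       # rule 2: subproperty
        elif rel == DOMAIN and p == a:
            out.add((s, TYPE, b))    # rule 3: domain typing
        elif rel == RANGE and p == a:
            out.add((o, TYPE, b))    # rule 4: range typing
    return out

def saturate(instance, schema):
    """Apply the four rules until no new triple appears (fixpoint)."""
    saturated = set(instance)
    frontier = list(instance)
    while frontier:
        for new in derive(frontier.pop(), schema):
            if new not in saturated:
                saturated.add(new)
                frontier.append(new)
    return saturated

# Schema and instance triples taken from the four examples on the slide.
schema = {
    ("Book", SUBCLASS, "Publication"),
    ("writtenIn", SUBPROP, "hasLanguage"),
    ("writtenIn", DOMAIN, "Book"),
    ("writtenIn", RANGE, "Language"),
}
instance = {("book1", "writtenIn", "English")}
closure = saturate(instance, schema)
```

Starting from the single explicit triple, rule 3 derives the Book typing, which in turn lets rule 1 derive the Publication typing; this chaining is why the rules are iterated to a fixpoint.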
  • Basic Graph Pattern Queries
  • Basic Graph Pattern (BGP) Queries
    ⊲ subset of SPARQL
    ⊲ BGP – conjunction of triple patterns (or triples)
    q(x̄) :- t_1, …, t_α
    t_i = s_i p_i o_i, with s_i, p_i ∈ U ∪ B ∪ V and o_i ∈ U ∪ B ∪ V ∪ L (V – variables)
    x̄: variables from V (distinguished variables)
    Query evaluation treats blank nodes in a query as non-distinguished variables.
    Example: q(x, y) :- x hasAuthor z, x rdf:type y ≡ q(x, y) :- x hasAuthor _:b0, x rdf:type y
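The equivalence in the example above can be checked with a small evaluator. This is a minimal sketch under stated assumptions: variables are written `?x`, a blank node in the query is any term starting with `_:`, and the two-triple graph is an illustrative fragment, not the full running example. The function names are hypothetical.

```python
def is_variable(term):
    # Assumed convention: "?x" is a variable; a blank node such as "_:b0"
    # in a query behaves like a non-distinguished variable.
    return term.startswith("?") or term.startswith("_:")

def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if is_variable(term):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended

def evaluate(head, body, graph):
    """Evaluate a BGP query against the explicit triples of graph."""
    bindings = [{}]
    for pattern in body:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return {tuple(b[v] for v in head) for b in bindings}

# Illustrative fragment of the book graph (an assumption for this sketch).
graph = {("book1", "hasAuthor", "Neil Gaiman"), ("book1", "rdf:type", "Book")}
head = ("?x", "?y")
with_variable = evaluate(head, [("?x", "hasAuthor", "?z"), ("?x", "rdf:type", "?y")], graph)
with_blank_node = evaluate(head, [("?x", "hasAuthor", "_:b0"), ("?x", "rdf:type", "?y")], graph)
```

Both formulations return the same answer set, since `_:b0` in the query only needs to match some value and never appears in the head.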
  • Query answering
    Problem: query evaluation ≠ query answering
    ⊲ the evaluation of a query only uses the graph’s explicit triples, so it may lead to an incomplete answer set
    ⊲ the (complete) answer set is obtained by evaluating the query against the graph’s saturation
    Solution: decouple RDF entailment from query evaluation
    Perform a pre-processing step to deal with entailed triples:
    ⊲ on the database – data saturation
    ⊲ on the queries – query reformulation
  • Data saturation vs. Query reformulation
    Data saturation
      Advantages:
      ⊲ straightforward
      ⊲ easy to implement
      Drawbacks:
      ⊲ computation time
      ⊲ additional storage space
      ⊲ must be recomputed upon database updates
      Example: the YAGO2 dataset roughly doubles in size when computing the RDFS closure (33M → 64M triples)
    Query reformulation
      Advantages:
      ⊲ database saturation does not need to be (re)computed
      Drawbacks:
      ⊲ every incoming query must be reformulated
      ⊲ reformulations can be prohibitively large
      ⊲ difficult to optimize
      Example: a single-atom query over YAGO2 can yield a union of more than 300,000 queries
  • Contributions
  • Contributions
    1. The database (DB) fragment of RDF, extending previously studied fragments with support for blank nodes
    2. Novel BGP query answering techniques for this DB fragment, designed to work on top of any standard conjunctive query processor:
       (i) an efficient incremental RDF saturation maintenance algorithm
       (ii) a novel reformulation-based query answering algorithm
    3. Thorough performance comparison and analysis
  • The database (DB) fragment of RDF
    ⊲ restricts entailment to RDFS entailment
    ⊲ does not restrict graphs in any way
    An RDF database: db = ⟨D, S⟩
      D and S – disjoint sets of triples
      D (RDF) – instance level → assertions
      S (RDFS) – schema level → semantics
    [figure: db = ⟨D, S⟩ for the running example. D has nodes book1, “Good Omens”, “Neil Gaiman”, “Terry Pratchett”, Book, English, _:b0, Language and edges hasTitle, hasAuthor (×2), rdf:type (×2), translatedTo, writtenIn. S has nodes Book, _:b1, Language, writtenIn, hasLanguage, Publication and edges rdfs:subClassOf (×2), rdfs:domain, rdfs:range, rdfs:subPropertyOf.]
  • Query reformulation algorithm
    Reformulate(q, db)
    ⊲ fixpoint algorithm (13 reformulation rules)
    ⊲ reformulates q into a set of queries s.t. the union of the evaluations of these queries on db produces the correct answer
    [figure: schema of the running example]
    Reformulate(q, db) =
        q(x, y) :- x rdf:type y
      ∪ q(x, Publication) :- x rdf:type Publication
      ∪ q(x, Publication) :- x rdf:type Book
      ∪ q(x, Publication) :- x writtenIn z
      ∪ …
      ∪ q(x, _:b1) :- x rdf:type _:b1
      ∪ …
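The Publication branch of the union above can be reproduced with a deliberately simplified sketch. It handles only a single atom of the form `x rdf:type C` with a constant class C, and is not the 13-rule Reformulate algorithm of the talk. The schema triples are the ones from the entailment examples; the function names and the `?x`/`?z` variable convention are assumptions.

```python
TYPE = "rdf:type"
SUBCLASS, SUBPROP = "rdfs:subClassOf", "rdfs:subPropertyOf"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"

def below(start, relation, schema):
    """start plus everything that reaches it through `relation` edges."""
    found, frontier = {start}, [start]
    while frontier:
        current = frontier.pop()
        for a, rel, b in schema:
            if rel == relation and b == current and a not in found:
                found.add(a)
                frontier.append(a)
    return found

def reformulate_type_atom(cls, schema):
    """Triple patterns whose union answers q(x) :- x rdf:type cls."""
    patterns = set()
    for c in below(cls, SUBCLASS, schema):
        patterns.add(("?x", TYPE, c))
        for a, rel, b in schema:
            if b != c:
                continue
            if rel == DOMAIN:      # subjects of a (and its subproperties)
                patterns |= {("?x", p, "?z") for p in below(a, SUBPROP, schema)}
            elif rel == RANGE:     # objects of a (and its subproperties)
                patterns |= {("?z", p, "?x") for p in below(a, SUBPROP, schema)}
    return patterns

def answer(cls, schema, instance):
    """Evaluate every reformulated pattern on the explicit triples only."""
    result = set()
    for s, p, o in reformulate_type_atom(cls, schema):
        for ts, tp, to in instance:
            if tp != p:
                continue
            if s == "?x" and o in ("?z", to):
                result.add(ts)
            elif o == "?x":
                result.add(to)
    return result

schema = {
    ("Book", SUBCLASS, "Publication"),
    ("writtenIn", SUBPROP, "hasLanguage"),
    ("writtenIn", DOMAIN, "Book"),
    ("writtenIn", RANGE, "Language"),
}
instance = {("book1", "writtenIn", "English")}
patterns = reformulate_type_atom("Publication", schema)
publications = answer("Publication", schema, instance)
```

In this sketch no saturation is computed: the explicit data contains no rdf:type triple at all, yet the pattern over writtenIn still finds book1.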
  • Query reformulation algorithm (blank nodes, standard evaluation)
    ⊲ reformulates q into a set of queries s.t. the union of the evaluations of these queries on db produces the correct answer
    [figure: schema of the running example, with nodes Language, English, book1, _:b1 and two rdf:type edges added]
    q(x, _:b1) :- x rdf:type _:b1 ≡ q(x, _:b1) :- x rdf:type z
    Answer set: {⟨book1, _:b1⟩, ⟨English, _:b1⟩} → wrong answer
  • Query reformulation algorithm (blank nodes, non-standard evaluation)
    ⊲ reformulates q into a set of queries s.t. the union of the non-standard evaluations of these queries on db produces the correct answer
    q(x, _:b1) :- x rdf:type _:b1 ≡ q(x, _:b1) :- x rdf:type z
    Answer set: {⟨book1, _:b1⟩} → correct answer
  • Query reformulation algorithm (output size)
    ⊲ size of the output: O((6 × #db²)^#q)
  • Database saturation algorithm
    Saturate(db)
    ⊲ fixpoint algorithm (4 saturation rules)
    ⊲ explicitly adds to db all its implicit triples
    Saturate(db) = db ∪ [figure: the implicit triples, over nodes book1, Language, Publication, _:b1, English with edges rdf:type (×3) and hasLanguage]
  • Database saturation algorithm (complexity)
    ⊲ size of the output: O(#db²)
    ⊲ computation time: O(#db³)
    What about updates?
  • Saturation maintenance algorithm
    Saturate+(db)
    ⊲ multiset variant of Saturate(db)
    ⊲ allows saturation maintenance upon updates
    Saturate+(db) = db ∪ [figure: the derived triples, over nodes Book, book1, Language, Publication, _:b1, English with edges rdf:type (×4) and hasLanguage]
  • Example of instance insertion
    [figure: saturated running example, before and after the insertion]
    To insert the triple: book1 writtenIn French
    First saturate the triple using db: [figure: derived triples over nodes book1, Language, Book, Publication, _:b1, French with edges rdf:type (×4) and hasLanguage]
    Then insert the explicit triple and the inferred ones in db.
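The two steps above (saturate the new triple, then add it and its consequences) can be sketched with a multiset of triples. This is an illustrative counting scheme, not necessarily the authors' exact bookkeeping: the count of a triple is assumed to be the number of explicit triples that entail it, the isExp flag of SatM is omitted, and the names `consequences`, `insert` and `delete` are hypothetical. The schema is the one from the entailment examples, without the _:b1 class.

```python
from collections import Counter

TYPE = "rdf:type"
SUBCLASS, SUBPROP = "rdfs:subClassOf", "rdfs:subPropertyOf"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"

def consequences(triple, schema):
    """Saturate one instance triple against the schema (triple included)."""
    derived, frontier = {triple}, [triple]
    while frontier:
        s, p, o = frontier.pop()
        for a, rel, b in schema:
            new = None
            if rel == SUBCLASS and p == TYPE and o == a:
                new = (s, TYPE, b)
            elif rel == SUBPROP and p == a:
                new = (s, b, o)
            elif rel == DOMAIN and p == a:
                new = (s, TYPE, b)
            elif rel == RANGE and p == a:
                new = (o, TYPE, b)
            if new is not None and new not in derived:
                derived.add(new)
                frontier.append(new)
    return derived

def insert(sat_m, triple, schema):
    """Add the triple and everything it entails, one count each."""
    sat_m.update(consequences(triple, schema))

def delete(sat_m, triple, schema):
    """Undo an earlier insert; drop triples whose count reaches zero."""
    sat_m.subtract(consequences(triple, schema))
    for t in [t for t, n in sat_m.items() if n <= 0]:
        del sat_m[t]

schema = {
    ("Book", SUBCLASS, "Publication"),
    ("writtenIn", SUBPROP, "hasLanguage"),
    ("writtenIn", DOMAIN, "Book"),
    ("writtenIn", RANGE, "Language"),
}
sat_m = Counter()
insert(sat_m, ("book1", TYPE, "Book"), schema)
insert(sat_m, ("book1", "writtenIn", "English"), schema)
insert(sat_m, ("book1", "writtenIn", "French"), schema)   # the slide's update
```

Keeping counts is what makes deletions cheap in this sketch: removing the French triple lowers the count of `book1 rdf:type Book` but leaves it in place, because two other explicit triples still entail it.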
  • Example of schema deletion
    [figure: saturated running example, before and after the deletion]
    To delete the triple: writtenIn rdfs:domain Book
    First infer affected data triples using db: book1 rdf:type Book, book1 rdf:type Publication, book1 rdf:type _:b1
    Then delete the explicit triple and the inferred ones from db.
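A naive way to mimic this schema deletion on a counted saturation is sketched below. It is an assumption-laden illustration rather than the talk's algorithm: it re-derives the consequences of every explicit triple with and without the removed schema triple and decrements the difference, which is simple but not efficient. Counts follow the assumed convention "number of explicit triples entailing the triple"; all function names are hypothetical, and the schema omits the _:b1 class.

```python
from collections import Counter

TYPE = "rdf:type"
SUBCLASS, SUBPROP = "rdfs:subClassOf", "rdfs:subPropertyOf"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"

def consequences(triple, schema):
    """Saturate one instance triple against the schema (triple included)."""
    derived, frontier = {triple}, [triple]
    while frontier:
        s, p, o = frontier.pop()
        for a, rel, b in schema:
            new = None
            if rel == SUBCLASS and p == TYPE and o == a:
                new = (s, TYPE, b)
            elif rel == SUBPROP and p == a:
                new = (s, b, o)
            elif rel == DOMAIN and p == a:
                new = (s, TYPE, b)
            elif rel == RANGE and p == a:
                new = (o, TYPE, b)
            if new is not None and new not in derived:
                derived.add(new)
                frontier.append(new)
    return derived

def delete_schema_triple(sat_m, explicit, schema, removed):
    """Decrement every derivation that relied on `removed`; return new schema."""
    smaller = schema - {removed}
    for triple in explicit:
        lost = consequences(triple, schema) - consequences(triple, smaller)
        sat_m.subtract(lost)
    for t in [t for t, n in sat_m.items() if n <= 0]:
        del sat_m[t]
    return smaller

schema = {
    ("Book", SUBCLASS, "Publication"),
    ("writtenIn", SUBPROP, "hasLanguage"),
    ("writtenIn", DOMAIN, "Book"),
    ("writtenIn", RANGE, "Language"),
}
explicit = {("book1", TYPE, "Book"), ("book1", "writtenIn", "English")}
sat_m = Counter()
for t in explicit:
    sat_m.update(consequences(t, schema))

# The slide's update: drop the domain constraint of writtenIn.
schema = delete_schema_triple(sat_m, explicit, schema, ("writtenIn", DOMAIN, "Book"))
```

In this example the typing triples of book1 survive with a lower count, because `book1 rdf:type Book` is also explicit; had it been derived only through the domain constraint, it would have been removed.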
  • Experiments
  • Experimental setup
    • implementation in Java 1.6
    • deployed on top of a PostgreSQL v8.5 server
    • 6 indexes – all permutations of the (s, p, o) columns
    • the spo index is clustering
    • dictionary encoding
    Graph characteristics and saturation times:
    Graph                     Storage                           Barton       DBpedia     DBLP
    #Schema                   in memory                         101          5,666       41
    #Instance                 Triple(s, p, o)                   34 × 10⁶     27 × 10⁶    8.4 × 10⁶
    #Saturation               Sat(s, p, o)                      39 × 10⁶     30 × 10⁶    12 × 10⁶
    Saturation increase (%)                                     14.91        10.65       41.05
    #Multiset                 SatM(s, p, o, isExp, count)       73.5 × 10⁶   66 × 10⁶    18.7 × 10⁶
    Multiset increase (%)                                       116.89       227.37      121.97
    t_sat (s)                                                   4,294        2,742       748
    t_sat+ (s)                                                  4,586        2,977       799
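The storage layout listed above (a dictionary-encoded triple table with one index per column permutation) can be sketched as follows. The sketch uses Python's built-in sqlite3 instead of the PostgreSQL server and Java code of the experiments, so clustering on spo is not reproduced; the table, index and function names are assumptions.

```python
import sqlite3
from itertools import permutations

conn = sqlite3.connect(":memory:")
# Dictionary table: maps each URI, literal or blank node to an integer code.
conn.execute("CREATE TABLE dict (id INTEGER PRIMARY KEY, value TEXT UNIQUE)")
# Triple table over the integer codes.
conn.execute("CREATE TABLE triple (s INTEGER, p INTEGER, o INTEGER)")

# One index per permutation of (s, p, o): spo, sop, pso, pos, osp, ops.
for cols in permutations("spo"):
    name = "idx_" + "".join(cols)
    conn.execute(f"CREATE INDEX {name} ON triple ({', '.join(cols)})")

def encode(value):
    """Return the integer code of value, creating it on first use."""
    conn.execute("INSERT OR IGNORE INTO dict (value) VALUES (?)", (value,))
    row = conn.execute("SELECT id FROM dict WHERE value = ?", (value,)).fetchone()
    return row[0]

def add_triple(s, p, o):
    conn.execute("INSERT INTO triple VALUES (?, ?, ?)",
                 (encode(s), encode(p), encode(o)))

add_triple("book1", "writtenIn", "English")
add_triple("book1", "rdf:type", "Book")
```

With all six permutations indexed, any triple pattern with bound positions has an index whose leading columns are exactly the bound ones, at the cost of extra storage and slower updates.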
  • Query answering
    • 26 hand-picked queries (between 1 and 10 triple patterns – 6 on average)
    • similar query answering times on Sat and SatM
    [figure not recoverable from the transcript]
  • Graph updates
    • no impact on reformulation
    • saturation needs to maintain SatM
    • insertions & deletions
    • updates of one triple on the data and the schema
    [figure not recoverable from the transcript]
  • Saturation thresholds
    The saturation threshold of a query q, st(q): the smallest integer n s.t.
      n × t_ref(q) > n × t_sat(q) + t_sat+
    t_ref(q) – time to answer q through reformulation (using Triple)
    t_sat(q) – time to answer q based on saturation (using SatM)
    t_sat+ – time to saturate db (create SatM)
    [figure not recoverable from the transcript]
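Rearranging the inequality gives n > t_sat+ / (t_ref(q) − t_sat(q)) whenever t_ref(q) > t_sat(q), so the threshold can be computed directly. The sketch below assumes all times are in the same unit; the function name and the per-query timings in the example are hypothetical, while 799 s is the t_sat+ value reported for DBLP in the experimental setup.

```python
import math

def saturation_threshold(t_ref, t_sat, t_sat_plus):
    """Smallest integer n with n * t_ref > n * t_sat + t_sat_plus.

    Returns None when reformulation is never slower per query
    (t_ref <= t_sat), in which case saturation never pays off.
    """
    if t_ref <= t_sat:
        return None
    # n must be strictly greater than the ratio, hence floor(...) + 1.
    return math.floor(t_sat_plus / (t_ref - t_sat)) + 1

# Hypothetical per-query times (seconds); 799 s is t_sat+ for DBLP.
threshold = saturation_threshold(t_ref=2.0, t_sat=0.5, t_sat_plus=799.0)
print(threshold)
```

Read this way, st(q) is the number of runs of q after which paying the one-time saturation cost becomes cheaper than reformulating q each time.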
  • Related Work
  • Outline of the positioning of our work
    [figure: related work placed along two axes, query language expressive power (SPARQL, BGP queries, relational conjunctive queries) and RDF fragment expressive power (DL, DB); entries: [1, 3, 5], [4, 6, 7], [2], this work]
    [1] ADJIMAN, P., GOASDOUÉ, F., AND ROUSSET, M.-C. SomeRDFS in the semantic web. JODS 8 (2007).
    [2] ARENAS, M., GUTIERREZ, C., AND PÉREZ, J. Foundations of RDF databases. In Reasoning Web (2009).
    [3] CALVANESE, D., GIACOMO, G. D., LEMBO, D., LENZERINI, M., AND ROSATI, R. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. Journal of Automated Reasoning (JAR) 39, 3 (2007).
    [4] GOASDOUÉ, F., KARANASOS, K., LEBLAY, J., AND MANOLESCU, I. View selection in semantic web databases. PVLDB (2011).
    [5] GOTTLOB, G., ORSI, G., AND PIERIS, A. Ontological queries: Rewriting and optimization. In ICDE (2011). Keynote.
    [6] KAOUDI, Z., MILIARAKI, I., AND KOUBARAKIS, M. RDFS reasoning and query answering on DHTs. In ISWC (2008).
    [7] URBANI, J., VAN HARMELEN, F., SCHLOBACH, S., AND BAL, H. QueryPIE: Backward reasoning for OWL Horst over very large knowledge bases. In ISWC (2011).
  • Conclusion
  • Conclusion
    Summary:
    ⊲ RDF fragment (extending those studied in the literature)
    ⊲ novel saturation- and reformulation-based query answering techniques, robust to instance and schema updates
    ⊲ algorithms directly deployable on top of any RDBMS
    ⊲ thorough performance comparison and analysis
    Future work:
    An automated strategy to choose between the two techniques: Saturate+(db) / Reformulate(q, db)
  • Thank you!
    [figure: RDF graph with nodes I, you, attention, Question, _:b1, _:b2, _:b3 and edges thank, pay, ask (×3), rdf:type (×3)]
  • Open-world interpretation of RDFS constraints
    Constraint interpretation:
    ⊲ closed-world assumption (CWA)
      any fact not present in the database is assumed not to hold
      database facts that do not respect a constraint → inconsistency
      R1 ⊆ R2 – any tuple in the relation R1 must also be in the relation R2
    ⊲ open-world assumption (OWA)
      facts may hold even though they are not in the database
      R1 ⊆ R2 – any tuple in the relation R1 is also in the relation R2
    The RDF data model is based on OWA.
  • RDF meets Relational Database Management Systems (RDBMS)
    RDF graphs: incomplete relational databases based on V-tables
    V-tables: allow using variables in their tuples; using a variable multiple times allows expressing joins on unknown values
    BGP query answering boils down to conjunctive query evaluation on a saturated database.
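The last statement can be made concrete by translating a BGP into a conjunctive SQL query with one self-join of the triple table per triple pattern. The sketch below uses Python's sqlite3 and an unencoded `triple(s, p, o)` table for brevity; the function name, the `?x` variable convention and the three sample triples (which include one implicit triple, as a saturated table would) are assumptions.

```python
import sqlite3

def bgp_to_sql(head, body):
    """Translate a BGP into a SELECT over self-joins of triple(s, p, o)."""
    first_use, where, params = {}, [], []
    for i, pattern in enumerate(body):
        for col, term in zip("spo", pattern):
            ref = f"t{i}.{col}"
            if term.startswith("?"):
                if term in first_use:
                    where.append(f"{ref} = {first_use[term]}")  # join condition
                else:
                    first_use[term] = ref
            else:
                where.append(f"{ref} = ?")                      # constant selection
                params.append(term)
    select = ", ".join(first_use[v] for v in head)
    tables = ", ".join(f"triple t{i}" for i in range(len(body)))
    sql = f"SELECT DISTINCT {select} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triple (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triple VALUES (?, ?, ?)", [
    ("book1", "hasAuthor", "Neil Gaiman"),
    ("book1", "rdf:type", "Book"),
    ("book1", "rdf:type", "Publication"),   # implicit triple, stored after saturation
])

# q(x, y) :- x hasAuthor z, x rdf:type y
sql, params = bgp_to_sql(["?x", "?y"],
                         [("?x", "hasAuthor", "?z"), ("?x", "rdf:type", "?y")])
rows = conn.execute(sql, params).fetchall()
```

Each repeated variable becomes an equality between columns of two table aliases, which is why any standard conjunctive query processor can evaluate the query.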
  • Saturation (related work)
    • J. Broekstra and A. Kampman, “Inferencing and truth maintenance in RDF Schema: Exploring a naive practical approach”, in PSSS Workshop, 2003.
    • B. Bishop, A. Kiryakov, D. Ognyanoff, I. Peikov, Z. Tashev, and R. Velkov, “OWLIM: A family of scalable semantic repositories”, Semantic Web, vol. 2, no. 1, 2011.
    • C. Gutierrez, C. A. Hurtado, and A. A. Vaisman, “RDFS update: From theory to practice”, in ESWC, 2011.