Efficient Query Answering against Dynamic RDF Databases

Published in: Data & Analytics, Education

  1. 1. Efficient Query Answering against Dynamic RDF Databases
     François Goasdoué, Ioana Manolescu, Alexandra Roatiş
     Université Paris-Sud & Inria Saclay (OAK project)
     20 March 2013
  2. 2. Overview
     EDBT 2013, Efficient Query Answering against Dynamic RDF Databases
     ⊲ The Resource Description Framework
     ⊲ Basic Graph Pattern Queries
     ⊲ Contributions
     ⊲ Experiments
     ⊲ Related Work
     ⊲ Conclusion
  3. 3. The Resource Description Framework
  4. 4. The Resource Description Framework (RDF)
     ⊲ graph-based data model
     ⊲ W3C standard
     RDF graph:
     ⊲ set of triples s p o, with s ∈ U ∪ B, p ∈ U, o ∈ U ∪ B ∪ L
       (U: URIs, L: literals (constants), B: blank nodes)
     ⊲ a triple states that the subject s has the property p with, as value, the object o
     ⊲ built-in property rdf:type: specifies to which classes a resource belongs

     Constructor | Triple | Relational notation
     Class assertion | s rdf:type o | o(s)
     Property assertion | s p o | p(s, o)
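
The typing constraints on triples and the relational notation above can be sketched in Python. This is only an illustration: the `Term` type and the helpers `uri`, `blank`, `lit`, `is_valid_triple` and `relational` are hypothetical names, not part of the talk or of any RDF library.

```python
from typing import NamedTuple


class Term(NamedTuple):
    kind: str   # "uri" (U), "blank" (B) or "literal" (L)
    value: str


def uri(value):
    return Term("uri", value)


def blank(value):
    return Term("blank", value)


def lit(value):
    return Term("literal", value)


def is_valid_triple(s, p, o):
    """Check s in U ∪ B, p in U, o in U ∪ B ∪ L."""
    return (s.kind in ("uri", "blank")
            and p.kind == "uri"
            and o.kind in ("uri", "blank", "literal"))


def relational(s, p, o):
    """Render a triple in relational notation: o(s) for rdf:type, else p(s, o)."""
    if p == uri("rdf:type"):
        return f"{o.value}({s.value})"
    return f"{p.value}({s.value}, {o.value})"


print(relational(uri("book1"), uri("rdf:type"), uri("Book")))
print(relational(uri("book1"), uri("hasTitle"), lit("Good Omens")))
```

A literal in subject position, such as `is_valid_triple(lit("x"), uri("p"), uri("o"))`, is rejected by the check.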
  8. 8. Blank nodes
     ⊲ feature of RDF
     ⊲ support unknown URI/literal tokens
     Example:
     ⊲ the country of _:b1 is Italy
     ⊲ the city of the same _:b1 is Genoa
     ⊲ the population of Genoa is an unspecified value _:b2
  12. 12. Running example
      [Figure: RDF graph of the running example. Nodes: book1, “Good Omens”, “Neil Gaiman”, “Terry Pratchett”, Book, Publication, English, Language, writtenIn, hasLanguage, _:b0, _:b1. Edge labels: hasTitle, hasAuthor (×2), rdf:type (×2), translatedTo, writtenIn, rdfs:subClassOf (×2), rdfs:subPropertyOf, rdfs:domain, rdfs:range.]
  13. 13. RDF Schema (RDFS)
      ⊲ feature of RDF
      ⊲ enhances the descriptions in graphs
      ⊲ declares semantic constraints between classes and properties
      Built-in properties:
      ⊲ subclass relationships: rdfs:subClassOf
      ⊲ subproperty relationships: rdfs:subPropertyOf
      ⊲ typing the first attribute (domain) of a property: rdfs:domain
      ⊲ typing the second attribute (range) of a property: rdfs:range

      Constructor | Triple | Relational notation
      Subclass constraint | s rdfs:subClassOf o | s ⊆ o
      Subproperty constraint | s rdfs:subPropertyOf o | s ⊆ o
      Domain typing constraint | s rdfs:domain o | Π_domain(s) ⊆ o
      Range typing constraint | s rdfs:range o | Π_range(s) ⊆ o
  16. 16. Open-world assumption and RDF entailment
      The RDF data model is based on the open-world assumption.
      → constraints are deductive: they implicitly propagate tuples
      Implicit triples → considered part of the graph, although not explicitly present
      Entailment: a reasoning mechanism that derives implicit information from a set of explicit triples and some entailment rules
      Exhaustive application of entailment rules → saturation (a.k.a. closure)
      The saturation of a graph is unique (up to blank node renaming).
      Entailment is part of the RDF specification itself.
      The semantics of an RDF graph is its saturation.
  20. 20. Entailment rules by example
      1) Book rdfs:subClassOf Publication, book1 rdf:type Book ⇒ book1 rdf:type Publication
      2) writtenIn rdfs:subPropertyOf hasLanguage, book1 writtenIn English ⇒ book1 hasLanguage English
      3) writtenIn rdfs:domain Book, book1 writtenIn English ⇒ book1 rdf:type Book
      4) writtenIn rdfs:range Language, book1 writtenIn English ⇒ English rdf:type Language
  25. 25. Basic Graph Pattern Queries
  26. 26. Basic Graph Pattern (BGP) Queries
      ⊲ subset of SPARQL
      ⊲ BGP: conjunction of triple patterns (or triples)
        q(x̄) :- t1, …, tα
        ti = si pi oi, with si, pi ∈ U ∪ B ∪ V and oi ∈ U ∪ B ∪ V ∪ L
        x̄ ∈ V (distinguished variables)
      ⊲ query evaluation treats blank nodes in a query as non-distinguished variables
      Example:
        q(x, y) :- x hasAuthor z, x rdf:type y ≡ q(x, y) :- x hasAuthor _:b0, x rdf:type y
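
BGP evaluation over explicit triples can be sketched as a nested-loop join over triple patterns. The sketch below is a toy evaluator under stated assumptions: variables are written with a leading `?`, terms are plain strings, and the function name `evaluate` is hypothetical. Blank nodes in the query are turned into non-distinguished variables, as on the slide.

```python
def is_var(term):
    return term.startswith("?")


def evaluate(head, body, graph):
    """Evaluate the BGP query head :- body against the explicit triples of graph."""
    # Blank nodes in a query behave like non-distinguished variables.
    def normalize(term):
        return "?" + term if term.startswith("_:") else term

    patterns = [tuple(normalize(t) for t in pattern) for pattern in body]
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in graph:
                candidate = dict(binding)
                matches = True
                for p_term, g_term in zip(pattern, triple):
                    if is_var(p_term):
                        # Bind the variable, or check an earlier binding.
                        if candidate.setdefault(p_term, g_term) != g_term:
                            matches = False
                            break
                    elif p_term != g_term:
                        matches = False
                        break
                if matches:
                    extended.append(candidate)
        bindings = extended
    return {tuple(b[v] for v in head) for b in bindings}


GRAPH = {
    ("book1", "hasAuthor", "Neil Gaiman"),
    ("book1", "hasAuthor", "Terry Pratchett"),
    ("book1", "rdf:type", "Book"),
}

with_variable = evaluate(("?x", "?y"),
                         [("?x", "hasAuthor", "?z"), ("?x", "rdf:type", "?y")],
                         GRAPH)
with_blank = evaluate(("?x", "?y"),
                      [("?x", "hasAuthor", "_:b0"), ("?x", "rdf:type", "?y")],
                      GRAPH)
print(with_variable == with_blank)
```

Both formulations of the example query return the same answer set, since `_:b0` plays the role of the variable `z`.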
  29. 29. Query answering
      Problem: query evaluation ≠ query answering
      ⊲ the evaluation of a query only uses the graph’s explicit triples, so it may lead to an incomplete answer set
      ⊲ the (complete) answer set is obtained by evaluating the query against the graph’s saturation
      Solution: decouple RDF entailment from query evaluation
      Perform a pre-processing step to deal with entailed triples:
      ⊲ on the database: data saturation
      ⊲ on the queries: query reformulation
  33. 33. Data saturation vs. Query reformulation
      Data saturation
      Advantages:
      ⊲ straightforward
      ⊲ easy to implement
      Drawbacks:
      ⊲ computation time
      ⊲ additional storage space
      ⊲ must be recomputed upon database updates
      Example: the YAGO2 dataset doubles in size when computing the RDFS closure → 33M to 64M triples
      Query reformulation
      Advantages:
      ⊲ database saturation does not need to be (re)computed
      Drawbacks:
      ⊲ every incoming query must be reformulated
      ⊲ reformulations can be prohibitively large
      ⊲ difficult to optimize
      Example: a single-atom query over YAGO2 can yield a union of > 300 000 queries
  34. 34. Contributions
  35. 35. Contributions
      1. The database (DB) fragment of RDF, extending previously studied fragments with support for blank nodes
      2. Novel BGP query answering techniques for this DB fragment, designed to work on top of any standard conjunctive query processor:
         (i) an efficient incremental RDF saturation maintenance algorithm
         (ii) a novel reformulation-based query answering algorithm
      3. Thorough performance comparison and analysis
  39. 39. The database (DB) fragment of RDF
      ⊲ restricts entailment to RDFS entailment
      ⊲ does not restrict graphs in any way
      An RDF database: db = ⟨D, S⟩
      ⊲ D and S: disjoint sets of triples
      ⊲ D (RDF): instance level → assertions
      ⊲ S (RDFS): schema level → semantics
      [Figure: db = ⟨D, S⟩ for the running example. D, nodes: book1, “Good Omens”, “Neil Gaiman”, “Terry Pratchett”, Book, English, _:b0, Language; edge labels: hasTitle, hasAuthor (×2), rdf:type (×2), translatedTo, writtenIn. S, nodes: Book, _:b1, Language, writtenIn, hasLanguage, Publication; edge labels: rdfs:subClassOf (×2), rdfs:domain, rdfs:range, rdfs:subPropertyOf.]
  42. 42. Query reformulation algorithm
      Reformulate(q, db)
      ⊲ fixpoint algorithm (13 reformulation rules)
      ⊲ reformulates q into a set of queries s.t. the union of the non-standard evaluations of these queries on db produces the correct answer
      ⊲ size of the output: O((6 × #db²)^#q)
      Example, using the schema of the running example:
        Reformulate(q, db) = q(x, y) :- x rdf:type y
          ∪ q(x, Publication) :- x rdf:type Publication
          ∪ q(x, Publication) :- x rdf:type Book
          ∪ q(x, Publication) :- x writtenIn z
          ∪ …
          ∪ q(x, _:b1) :- x rdf:type _:b1
          ∪ …
      Why a non-standard evaluation is needed:
        q(x, _:b1) :- x rdf:type _:b1 ≡ q(x, _:b1) :- x rdf:type z under standard evaluation
        [Figure: schema of the running example, plus nodes English, Language, book1, _:b1 connected by two rdf:type edges]
        standard evaluation: answer set {⟨book1, _:b1⟩, ⟨English, _:b1⟩} (wrong answer)
        non-standard evaluation: answer set {⟨book1, _:b1⟩} (correct answer)
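
To make the idea of reformulation concrete, here is a deliberately simplified sketch. It is not the 13-rule algorithm of the talk: it only handles a single atom of the form `x rdf:type C` with a constant class C, ignores blank nodes, and uses hypothetical names (`subsumees`, `reformulate_type_atom`, `answers`). The schema is an assumption loosely modeled on the running example.

```python
def subsumees(term, schema, relation):
    """All t with t relation* term (reflexive, transitive), e.g. all subclasses."""
    seen, todo = {term}, [term]
    while todo:
        current = todo.pop()
        for s, p, o in schema:
            if p == relation and o == current and s not in seen:
                seen.add(s)
                todo.append(s)
    return seen


def reformulate_type_atom(cls, schema):
    """Atoms whose explicit matches for ?x give every x with (x rdf:type cls)."""
    atoms = set()
    for c in subsumees(cls, schema, "rdfs:subClassOf"):
        atoms.add(("?x", "rdf:type", c))
        for s, p, o in schema:
            if o == c and p in ("rdfs:domain", "rdfs:range"):
                for prop in subsumees(s, schema, "rdfs:subPropertyOf"):
                    if p == "rdfs:domain":
                        atoms.add(("?x", prop, "?z"))
                    else:
                        atoms.add(("?z", prop, "?x"))
    return atoms


def answers(atoms, data):
    """Union of the evaluations of the atoms on the explicit data triples."""
    result = set()
    for s, p, o in atoms:
        for ds, dp, do in data:
            if dp != p:
                continue
            if s == "?x" and (o == "?z" or o == do):
                result.add(ds)
            elif o == "?x":
                result.add(do)
    return result


SCHEMA = {
    ("Book", "rdfs:subClassOf", "Publication"),
    ("writtenIn", "rdfs:subPropertyOf", "hasLanguage"),
    ("writtenIn", "rdfs:domain", "Book"),
    ("writtenIn", "rdfs:range", "Language"),
}
DATA = {("book1", "writtenIn", "English")}

publication_atoms = reformulate_type_atom("Publication", SCHEMA)
print(sorted(publication_atoms))
print(answers(publication_atoms, DATA))
```

On this toy schema the reformulation of `x rdf:type Publication` contains the three atoms listed on the slide (`x rdf:type Publication`, `x rdf:type Book`, `x writtenIn z`), and evaluating their union on the unsaturated data already returns `book1`.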
  54. 54. Database saturation algorithm
      Saturate(db)
      ⊲ fixpoint algorithm (4 saturation rules)
      ⊲ explicitly adds to db all its implicit triples
      ⊲ size of the output: O(#db²)
      ⊲ computation time: O(#db³)
      [Figure: Saturate(db) = db ∪ implicit triples over the nodes book1, Language, Publication, _:b1, English; edge labels: rdf:type (×3), hasLanguage.]
      What about updates?
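
A fixpoint saturation over the four instance-level RDFS rules (subclass, subproperty, domain, range) can be sketched as follows. This is a naive illustration, not the implementation evaluated in the talk; the function name `saturate` and the toy schema and data are assumptions loosely modeled on the running example.

```python
def saturate(data, schema):
    """Apply the four RDFS instance rules until no new triple is derived."""
    sat = set(data)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(sat):
            derived = set()
            for a, relation, b in schema:
                if relation == "rdfs:subClassOf" and p == "rdf:type" and o == a:
                    derived.add((s, "rdf:type", b))      # subclass rule
                elif relation == "rdfs:subPropertyOf" and p == a:
                    derived.add((s, b, o))               # subproperty rule
                elif relation == "rdfs:domain" and p == a:
                    derived.add((s, "rdf:type", b))      # domain typing rule
                elif relation == "rdfs:range" and p == a:
                    derived.add((o, "rdf:type", b))      # range typing rule
            if not derived <= sat:
                sat |= derived
                changed = True
    return sat


SCHEMA = {
    ("Book", "rdfs:subClassOf", "Publication"),
    ("writtenIn", "rdfs:subPropertyOf", "hasLanguage"),
    ("writtenIn", "rdfs:domain", "Book"),
    ("writtenIn", "rdfs:range", "Language"),
}
DATA = {("book1", "writtenIn", "English")}

saturated = saturate(DATA, SCHEMA)
for triple in sorted(saturated):
    print(triple)
```

Starting from the single explicit triple `book1 writtenIn English`, the loop adds the four implicit triples that the entailment examples derive, including `book1 rdf:type Publication`, which needs two rule applications.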
  60. 60. Saturation maintenance algorithm
      Saturate+(db)
      ⊲ multiset variant of Saturate(db)
      ⊲ allows saturation maintenance upon updates
      [Figure: Saturate+(db) = db ∪ multiset of implicit triples over the nodes Book, book1, Language, Publication, _:b1, English; edge labels: rdf:type (×4), hasLanguage.]
  62. 62. Example of instance insertion
      To insert the triple: book1 writtenIn French
      ⊲ first saturate the triple using db
        [Figure: derived triples over the nodes book1, Language, Book, Publication, _:b1, French; edge labels: rdf:type (×4), hasLanguage.]
      ⊲ then insert the explicit triple and the inferred ones in db
      [Figure: the saturated running example before and after the insertion.]
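
The insertion steps above can be sketched with a counting scheme. This is a simplified assumption, not necessarily the multiset semantics of Saturate+: each triple carries the number of explicit instance triples that entail it, which is sound here because every rule has a single instance-level premise. The names `consequences` and `SatM` (borrowed from the table name in the experiments) are used hypothetically, only instance updates are handled, and each explicit triple is assumed to be inserted at most once.

```python
from collections import Counter


def consequences(triple, schema):
    """The triple itself plus every triple it entails under the four rules."""
    seen, todo = {triple}, [triple]
    while todo:
        s, p, o = todo.pop()
        for a, relation, b in schema:
            if relation == "rdfs:subClassOf" and p == "rdf:type" and o == a:
                derived = (s, "rdf:type", b)
            elif relation == "rdfs:subPropertyOf" and p == a:
                derived = (s, b, o)
            elif relation == "rdfs:domain" and p == a:
                derived = (s, "rdf:type", b)
            elif relation == "rdfs:range" and p == a:
                derived = (o, "rdf:type", b)
            else:
                continue
            if derived not in seen:
                seen.add(derived)
                todo.append(derived)
    return seen


class SatM:
    """Saturation with counts, maintained under instance insertions and deletions."""

    def __init__(self, schema):
        self.schema = set(schema)
        self.count = Counter()

    def insert(self, triple):
        # Saturate the new triple using the schema, then add everything.
        for t in consequences(triple, self.schema):
            self.count[t] += 1

    def delete(self, triple):
        # Assumes the triple was inserted explicitly before.
        for t in consequences(triple, self.schema):
            self.count[t] -= 1
            if self.count[t] == 0:
                del self.count[t]

    def triples(self):
        return set(self.count)


SCHEMA = {
    ("Book", "rdfs:subClassOf", "Publication"),
    ("writtenIn", "rdfs:subPropertyOf", "hasLanguage"),
    ("writtenIn", "rdfs:domain", "Book"),
    ("writtenIn", "rdfs:range", "Language"),
}

db = SatM(SCHEMA)
db.insert(("book1", "rdf:type", "Book"))
db.insert(("book1", "writtenIn", "English"))
db.insert(("book1", "writtenIn", "French"))
print(db.count[("book1", "rdf:type", "Book")])
```

Deleting `book1 writtenIn French` afterwards only decrements the counts: the French-specific triples disappear, while `book1 rdf:type Book` stays because other explicit triples still entail it.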
  65. 65. Example of schema deletion
      To delete the triple: writtenIn rdfs:domain Book
      ⊲ first infer the affected data triples using db
        [Figure: affected triples over the nodes book1, Book, Publication, _:b1; edge labels: rdf:type (×3).]
      ⊲ then delete the explicit triple and the inferred ones from db
      [Figure: the saturated running example before and after the deletion.]
  68. 68. Experiments
  69. 69. Experimental setup
      • implementation in Java 1.6
      • deployed on top of a PostgreSQL v8.5 server
      • 6 indexes: all permutations of the (s, p, o) columns
      • the spo index is clustered
      • dictionary encoding
      Graph characteristics and saturation times:

      Graph | Storage | Barton | DBpedia | DBLP
      #Schema | in memory | 101 | 5,666 | 41
      #Instance | Triple(s, p, o) | 34 × 10⁶ | 27 × 10⁶ | 8.4 × 10⁶
      #Saturation | Sat(s, p, o) | 39 × 10⁶ | 30 × 10⁶ | 12 × 10⁶
      Saturation increase (%) | | 14.91 | 10.65 | 41.05
      #Multiset | SatM(s, p, o, isExp, count) | 73.5 × 10⁶ | 66 × 10⁶ | 18.7 × 10⁶
      Multiset increase (%) | | 116.89 | 227.37 | 121.97
      t_sat (s) | | 4,294 | 2,742 | 748
      t_sat+ (s) | | 4,586 | 2,977 | 799
  70. 70. Query answering
      • 26 hand-picked queries (between 1 and 10 triple patterns, 6 on average)
      • similar query answering times on Sat and SatM
  71. 71. Graph updates
      • no impact on reformulation
      • saturation needs to maintain SatM
      • insertions & deletions
      • updates of one triple on the data and the schema
  72. 72. Saturation thresholds
      The saturation threshold of a query q, st(q), is the smallest integer n s.t.
        n × t_ref(q) > n × t_sat(q) + t_sat+
      ⊲ t_ref(q): time to answer q through reformulation (using Triple)
      ⊲ t_sat(q): time to answer q based on saturation (using SatM)
      ⊲ t_sat+: time to saturate db (create SatM)
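
The threshold definition can be turned into a small computation: the inequality n × t_ref(q) > n × t_sat(q) + t_sat+ holds exactly when n × (t_ref(q) − t_sat(q)) > t_sat+. The function name `saturation_threshold` is hypothetical, and the per-query times in the usage line are made-up illustration values, not measurements from the talk.

```python
import math


def saturation_threshold(t_ref, t_sat, t_sat_plus):
    """Smallest positive integer n with n * t_ref > n * t_sat + t_sat_plus.

    Returns None when reformulation is never slower per query (t_ref <= t_sat),
    in which case saturation never pays off.
    """
    gain_per_run = t_ref - t_sat
    if gain_per_run <= 0:
        return None
    return math.floor(t_sat_plus / gain_per_run) + 1


# Hypothetical times in seconds: 3 s per run by reformulation,
# 1 s per run on the saturation, 10 s to build the saturation.
print(saturation_threshold(3, 1, 10))
```

With these illustrative values, saturation starts to pay off from the sixth run of the query: 6 × 3 = 18 exceeds 6 × 1 + 10 = 16, while 5 × 3 = 15 does not exceed 5 × 1 + 10 = 15.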
  73. 73. Related Work
  74. 74. Outline of the positioning of our work
      [Figure: prior work [1, 3, 5], [4, 6, 7], [2] and this work, positioned along two axes. Query language expressive power: SPARQL, BGP queries, relational conjunctive queries. RDF fragment expressive power: DL, DB.]
      [1] ADJIMAN, P., GOASDOUÉ, F., AND ROUSSET, M.-C. SomeRDFS in the semantic web. JODS 8 (2007).
      [2] ARENAS, M., GUTIERREZ, C., AND PÉREZ, J. Foundations of RDF databases. In Reasoning Web (2009).
      [3] CALVANESE, D., GIACOMO, G. D., LEMBO, D., LENZERINI, M., AND ROSATI, R. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. Journal of Automated Reasoning (JAR) 39, 3 (2007).
      [4] GOASDOUÉ, F., KARANASOS, K., LEBLAY, J., AND MANOLESCU, I. View selection in semantic web databases. PVLDB (2011).
      [5] GOTTLOB, G., ORSI, G., AND PIERIS, A. Ontological queries: Rewriting and optimization. In ICDE (2011). Keynote.
      [6] KAOUDI, Z., MILIARAKI, I., AND KOUBARAKIS, M. RDFS reasoning and query answering on DHTs. In ISWC (2008).
      [7] URBANI, J., VAN HARMELEN, F., SCHLOBACH, S., AND BAL, H. QueryPIE: Backward reasoning for OWL Horst over very large knowledge bases. In ISWC (2011).
  75. 75. Conclusion
  76. 76. Conclusion
      Summary:
      ⊲ RDF fragment (extending those studied in the literature)
      ⊲ novel saturation- and reformulation-based query answering techniques, robust to instance and schema updates
      ⊲ algorithms directly deployable on top of any RDBMS
      ⊲ thorough performance comparison and analysis
      Future work:
      ⊲ an automated strategy to choose between the two techniques: Saturate+(db) / Reformulate(q, db)
  78. 78. Thank you!
      [Figure: RDF graph with nodes I, you, attention, Question, _:b1, _:b2, _:b3 and edge labels thank, pay, ask (×3), rdf:type (×3).]
  79. 79. Open-world interpretation of RDFS constraints
      Constraint interpretation:
      ⊲ closed-world assumption (CWA)
        any fact not present in the database is assumed not to hold
        database facts that do not respect a constraint → inconsistency
        R1 ⊆ R2: any tuple in the relation R1 must also be in the relation R2
      ⊲ open-world assumption (OWA)
        facts may hold even though they are not in the database
        R1 ⊆ R2: any tuple in the relation R1 is also in the relation R2
      The RDF data model is based on OWA.
  80. 80. RDF meets Relational Database Management Systems (RDBMS)
      RDF graphs: incomplete relational databases based on V-tables
      V-tables:
      ⊲ allow using variables in their tuples
      ⊲ using a variable multiple times allows expressing joins on unknown values
      BGP query answering boils down to conjunctive query evaluation on a saturated database.
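
The last statement can be illustrated with a relational triple table and a self-join. The sketch uses Python's built-in `sqlite3` module with an in-memory database instead of the PostgreSQL server of the experiments; the table name `Sat` follows the experimental setup, and the rows are a hand-written toy saturation, not data from the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sat (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO Sat VALUES (?, ?, ?)",
    [
        ("book1", "hasAuthor", "Neil Gaiman"),
        ("book1", "hasAuthor", "Terry Pratchett"),
        ("book1", "rdf:type", "Book"),
        ("book1", "rdf:type", "Publication"),  # implicit triple, made explicit
    ],
)

# q(x, y) :- x hasAuthor z, x rdf:type y
# One table alias per triple pattern; the shared variable x becomes a join.
rows = conn.execute(
    """
    SELECT DISTINCT t1.s, t2.o
    FROM Sat AS t1
    JOIN Sat AS t2 ON t1.s = t2.s
    WHERE t1.p = 'hasAuthor' AND t2.p = 'rdf:type'
    ORDER BY t2.o
    """
).fetchall()
print(rows)
```

Because the table already holds the saturated graph, the plain conjunctive query returns both `Book` and `Publication` as types of `book1`, with no reasoning at query time.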
  81. 81. Saturation (related work)
      • J. Broekstra and A. Kampman, “Inferencing and truth maintenance in RDF Schema: Exploring a naive practical approach”, in PSSS Workshop, 2003.
      • B. Bishop, A. Kiryakov, D. Ognyanoff, I. Peikov, Z. Tashev, and R. Velkov, “OWLIM: A family of scalable semantic repositories”, Semantic Web, vol. 2, no. 1, 2011.
      • C. Gutierrez, C. A. Hurtado, and A. A. Vaisman, “RDFS update: From theory to practice”, in ESWC, 2011.
