Introduction to query rewriting optimisation with dependencies

Introduction to query rewriting optimisation with dependencies in APEX lab, Shanghai 2012.

  1. Dependencies: Making Ontology Based Data Access Work in Practice. Mariano Rodriguez-Muro and Diego Calvanese, {rodriguez,calvanese}@inf.unibz.it, KRDB Research Centre, Free University of Bozen-Bolzano, July 2011. Rodriguez-Muro and Calvanese (UNIBZ) APEX-Shanghai, 2011 July, 2011 1 / 33
  2. The context
  4. DL Ontologies
     Description Logics:
     • Formalisms for knowledge representation.
     • Decidable fragments of FOL.
     • Base of OWL.
     • The world is described by means of concepts and roles.
     Ontologies:
     • Intensional knowledge: TBox T.
     • Extensional knowledge: ABox A.
  9. OBDA with DL-Lite
     A family of lightweight ontology languages:
     • DL-Lite_F concepts: B ::= A | ∃R
     • DL-Lite_F roles: R ::= P | P⁻
     • DL-Lite_F TBoxes: B₁ ⊑ B₂ | B₁ ⊑ ¬B₂ | (funct R)
     • DL-Lite_F ABoxes: A(a) | R(a, b)
  13. Query Answering
     TBox: Man ⊑ Person, Woman ⊑ Person, Person ⊑ ∃hasFather, ∃hasFather⁻ ⊑ Person
     ABox: Man(mariano)
     Query: q(x) ← Person(x), hasFather(x, y), Person(y)
     Problem: compute the certain answers of Q, denoted cert(Q, O).
     The promise: we can do this as efficiently as answering DB queries, also in the virtual setting.
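The DL-Lite_F grammar given earlier maps naturally onto a small algebraic data type, and the TBox of this example is a direct instance of it. The sketch below is a hypothetical Python encoding (the class names `Role`, `Exists`, `Inclusion` and `Funct` are illustrative, not from the slides) that keeps inverse roles normalized, so that (P⁻)⁻ is again P.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Role:
    """Basic role R ::= P | P⁻ (inverse=True encodes P⁻)."""
    name: str
    inverse: bool = False

    def inv(self) -> "Role":
        # (P⁻)⁻ = P, so inverting twice returns the original role
        return Role(self.name, not self.inverse)

    def __str__(self) -> str:
        return self.name + ("⁻" if self.inverse else "")


@dataclass(frozen=True)
class Exists:
    """Unqualified existential ∃R."""
    role: Role

    def __str__(self) -> str:
        return f"∃{self.role}"


# Basic concept B ::= A | ∃R; atomic concepts A are plain strings here
Basic = Union[str, Exists]


@dataclass(frozen=True)
class Inclusion:
    """B1 ⊑ B2, or B1 ⊑ ¬B2 when negated=True."""
    lhs: Basic
    rhs: Basic
    negated: bool = False

    def __str__(self) -> str:
        return f"{self.lhs} ⊑ {'¬' if self.negated else ''}{self.rhs}"


@dataclass(frozen=True)
class Funct:
    """Functionality assertion (funct R)."""
    role: Role

    def __str__(self) -> str:
        return f"(funct {self.role})"


# The TBox of the Query Answering example
has_father = Role("hasFather")
tbox = [
    Inclusion("Man", "Person"),
    Inclusion("Woman", "Person"),
    Inclusion("Person", Exists(has_father)),
    Inclusion(Exists(has_father.inv()), "Person"),
]
for axiom in tbox:
    print(axiom)
```

Frozen dataclasses make assertions hashable, so a TBox can also be stored as a set.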
  16. Query Answering with PerfectRef (2005)
     Query: q(x) ← Person(x), hasFather(x, y), Person(y)
     Reformulation:
     q(x) ← Person(x), hasFather(x, y), Person(y)
     q(x) ← Person(x), hasFather(x, y), hasFather(z, y)
     q(x) ← Person(x), hasFather(x, y)
     q(x) ← Person(x), Person(x)
     q(x) ← Person(x)
     q(x) ← Person(x), hasFather(x, y), Man(y)
     q(x) ← Person(x), hasFather(x, y), Woman(y)
     q(x) ← hasFather(x, m), hasFather(x, y), Person(y)
     q(x) ← hasFather(x, m), hasFather(x, y), hasFather(z, y)
     q(x) ← hasFather(x, m), hasFather(x, y)
     q(x) ← hasFather(x, m), Person(x)
     q(x) ← hasFather(x, m), hasFather(x, t)
     q(x) ← hasFather(x, m)
     q(x) ← hasFather(x, m), hasFather(x, y), Man(y)
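Each line of the reformulation is an ordinary conjunctive query (CQ) that can be evaluated directly over the ABox, and the answer is the union of their results. A minimal sketch of that evaluation step, assuming a naive backtracking join over an in-memory set of facts; the ABox with `ana` and `juan` is hypothetical, and only three of the listed queries are used, so this is not the full reformulation. It shows why rewriting matters: the original query misses `ana` because Person(juan) is only implied by Man(juan).

```python
from typing import Dict, List, Set, Tuple

# An atom or ABox fact: (predicate, arguments). In query atoms, terms that
# start with "?" are variables; anything else is a constant.
Atom = Tuple[str, Tuple[str, ...]]


def evaluate_cq(head: List[str], body: List[Atom], facts: Set[Atom]) -> Set[Tuple[str, ...]]:
    """Answers of the conjunctive query head <- body over a finite set of facts."""
    answers: Set[Tuple[str, ...]] = set()

    def extend(i: int, binding: Dict[str, str]) -> None:
        if i == len(body):
            answers.add(tuple(binding[v] for v in head))
            return
        pred, args = body[i]
        for fact_pred, fact_args in facts:
            if fact_pred != pred or len(fact_args) != len(args):
                continue
            new_binding = dict(binding)
            # setdefault binds a fresh variable, or returns its existing value
            if all(
                (new_binding.setdefault(a, c) == c) if a.startswith("?") else a == c
                for a, c in zip(args, fact_args)
            ):
                extend(i + 1, new_binding)

    extend(0, {})
    return answers


def evaluate_ucq(head: List[str], bodies: List[List[Atom]], facts: Set[Atom]) -> Set[Tuple[str, ...]]:
    """Union of the answers of several CQs sharing the same head."""
    result: Set[Tuple[str, ...]] = set()
    for body in bodies:
        result |= evaluate_cq(head, body, facts)
    return result


# Hypothetical ABox (not from the slides)
facts = {
    ("Person", ("ana",)),
    ("hasFather", ("ana", "juan")),
    ("Man", ("juan",)),
}

original = [("Person", ("?x",)), ("hasFather", ("?x", "?y")), ("Person", ("?y",))]
# Two of the reformulated queries listed above
via_man = [("Person", ("?x",)), ("hasFather", ("?x", "?y")), ("Man", ("?y",))]
person_only = [("Person", ("?x",))]

print(evaluate_cq(["?x"], original, facts))                          # set()
print(evaluate_ucq(["?x"], [original, via_man, person_only], facts))  # {('ana',)}
```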
  23. Alternatives
     • Improved version of PerfectRef (2007-2011)
     • RQR (Urbina et al., 2007)
       Too many unions; cannot execute!
     • PRESTO (Rosati et al., 2010)
       Better, but eventually it breaks.
     • Combined Approach (Kontchakov et al., 2010)
       Fast, but too much data and too much time.
  24. What can we do?
  26. Query Answering
     It is not only about existential constants.
     Query: q(x, y) ← Person(x), hasFather(x, y), Person(y)
     Reformulation:
     q(x, y) ← Person(x), hasFather(x, y), Person(y)
     q(x, y) ← Person(x), hasFather(x, y), hasFather(z, y)
     q(x, y) ← Person(x), hasFather(x, y), Man(y)
     q(x, y) ← Person(x), hasFather(x, y), Woman(y)
     q(x, y) ← hasFather(x, m), hasFather(x, y), Person(y)
     q(x, y) ← hasFather(x, m), hasFather(x, y), hasFather(z, y)
     q(x, y) ← hasFather(x, m), hasFather(x, y), Man(y)
     q(x, y) ← hasFather(x, m), hasFather(x, y), Woman(y)
     q(x, y) ← Man(x), hasFather(x, y), Person(y)
     q(x, y) ← Man(x), hasFather(x, y), hasFather(z, y)
     q(x, y) ← Man(x), hasFather(x, y), Man(y)
     q(x, y) ← Man(x), hasFather(x, y), Woman(y)
     q(x, y) ← Woman(x), hasFather(x, y), Person(y)
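The listed queries follow a regular pattern: Person(x) and Person(y) are each replaced by one of four alternatives, and the two choices appear to combine independently, so the union grows as a product. A small sketch that regenerates the list under that reading (the alternatives are read off the reformulation above; the independence assumption is ours). In this order the first 13 generated lines coincide with the 13 listed, and the product has 16 members in total.

```python
from itertools import product

# Alternatives for each rewritten atom of
#   q(x, y) <- Person(x), hasFather(x, y), Person(y)
# copied from the reformulation above (m and z are unbound variables).
alternatives_x = ["Person(x)", "hasFather(x, m)", "Man(x)", "Woman(x)"]
alternatives_y = ["Person(y)", "hasFather(z, y)", "Man(y)", "Woman(y)"]

rewritings = [
    f"q(x, y) ← {ax}, hasFather(x, y), {ay}"
    for ax, ay in product(alternatives_x, alternatives_y)
]

print(len(rewritings))   # 16
print(rewritings[0])     # q(x, y) ← Person(x), hasFather(x, y), Person(y)
print(rewritings[12])    # q(x, y) ← Woman(x), hasFather(x, y), Person(y)
```

With k alternatives for each of n rewritten atoms the union has kⁿ members, which is the "too many unions" problem noted under Alternatives.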
  27. The full picture: Ontology Based Data Access
     [Figure: the OBDA architecture, with users, queries, ontology, mappings and sources]
     To deal with OBDA we need to consider:
     • If in the backend we have RDBMSs, we cannot go beyond their capabilities.
     • All systems are composed of ⟨T, D, M⟩, with D = ⟨R, I⟩.
  35. First Observation
     Is my data complete?
     Completeness of A: the TBox says Manager ⊑ Employee; in the ABox, all managers are already employees.
     In any realistic scenario:
     • We don't use arbitrary sources;
     • Intersection of semantics is reflected in completeness (e.g., no need to chase, expand or rewrite);
     • This happens a lot!
     Keyword: redundancy.
  37. Second Observation
     There are no ABoxes.
     THERE ARE NO ABOXES!
     Any ontology-based query answering system today:
     • uses relational DBs to store the ABox data;
     • in such a D, both R and I can be manipulated;
     • implementors may choose any M for their system.
     Opportunity: to complete an ABox we can do more than expansion.
  41. How to approach the problem: a two-level approach
     How to approach OBDA in practice?
     • Efficient ways to deal with redundancy due to completeness.
     • Efficient ways to complete (virtual) ABoxes.
  42. Contributions: Dealing with redundancy
  44. Characterizing completeness: ABox Dependencies
     Definition: an assertion B₁ ⊑_A B₂ that restricts valid ABoxes.
     Syntax: B₁ ⊑_A B₂
     Semantics: A ⊨ Manager ⊑_A Employee if Manager(x) ∈ A implies Employee(x) ∈ A.
     ABox dependencies are fundamentally different from TBox assertions. Think open world: the TBox assertion Manager ⊑ Employee lets us infer Employee(x) for every manager x, whereas the dependency Manager ⊑_A Employee states that Employee(x) is already in A.
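Because a dependency constrains the ABox itself, it can be checked directly on a finite ABox by comparing explicit memberships, without any TBox inference. A minimal sketch, assuming a tuple encoding of facts and basic concepts that is ours, not the authors'; the mariano/ramon ABox mirrors the statement hasFather(mariano, ramon) ∈ A → Human(mariano) ∈ A from the redundancy examples.

```python
from typing import Set, Tuple, Union

# ABox facts: ("Concept", a) for concept assertions, ("role", a, b) for role assertions.
Fact = Tuple[str, ...]
# Basic concept: an atomic name, ("exists", P) for ∃P, or ("exists_inv", P) for ∃P⁻.
Basic = Union[str, Tuple[str, str]]


def members(abox: Set[Fact], concept: Basic) -> Set[str]:
    """Individuals that the ABox explicitly lists as instances of a basic concept."""
    if isinstance(concept, str):
        return {f[1] for f in abox if len(f) == 2 and f[0] == concept}
    kind, role = concept
    if kind == "exists":
        return {f[1] for f in abox if len(f) == 3 and f[0] == role}
    if kind == "exists_inv":
        return {f[2] for f in abox if len(f) == 3 and f[0] == role}
    raise ValueError(f"unknown basic concept: {concept!r}")


def satisfies(abox: Set[Fact], sub: Basic, sup: Basic) -> bool:
    """A ⊨ sub ⊑_A sup: every explicit instance of sub is an explicit instance of sup.
    No TBox reasoning is applied, which is what distinguishes a dependency."""
    return members(abox, sub) <= members(abox, sup)


complete = {("Manager", "john"), ("Employee", "john"), ("Employee", "ana")}
incomplete = complete | {("Manager", "bob")}
print(satisfies(complete, "Manager", "Employee"))    # True
print(satisfies(incomplete, "Manager", "Employee"))  # False

family = {("hasFather", "mariano", "ramon"), ("Human", "mariano")}
print(satisfies(family, ("exists", "hasFather"), "Human"))  # True
```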
  48. Where to deal with redundancy?
     Given a TBox T, an ABox A, a set of dependencies Σ and a query Q, what do we do?
     Available options:
     • Optimize the query reformulation algorithm to deal with Σ.
     • Optimize the TBox T with respect to Σ.
  52. When is an assertion redundant?
     Direct redundancy, Case 1
     Let T imply the following hierarchy: [Figure: hierarchy over ∃hasFather, Person and Human]
     Redundant if Σ is: [Figure: dependencies over ∃hasFather, Person and Human]
     Σ says: hasFather(mariano, ramon) ∈ A → Human(mariano) ∈ A.
  57. When is an assertion redundant?
     Direct redundancy, Case 2
     Let T be the following TBox: [Figure: TBox over Person, ∃hasFather⁻, ∃hasFather and Man]
     Redundant if Σ is: [Figure: dependencies over Person, ∃hasFather⁻, ∃hasFather and Man]
     Σ says: Man(ramon) ∈ A → there is some a such that hasFather(ramon, a) ∈ A and Person(a) ∈ A.
  61. When is an assertion redundant?
     Indirect redundancy
     Let T be the following TBox: [Figure: TBox over Animal, Man and Human]
     Redundant if Σ is: [Figure: dependencies over Animal, Man and Human]
     Σ says: if Man(mariano) ∈ A, then Animal(mariano) ∈ A.
  62. Formalization: Redundancy
     Given a TBox T and a set of dependencies Σ over T, the optimized version of T w.r.t. Σ, denoted optim(T, Σ), is the set of inclusion assertions
       optim(T, Σ) = {α ∈ sat(T) | α is not redundant in sat(T) w.r.t. sat(Σ)}.
     We can compute optim(T, Σ) in linear time.
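The definition of optim(T, Σ) starts from sat(T). Assuming sat(T) denotes the closure of T under implied inclusions, a minimal sketch for the atomic fragment only (no ∃R and no negation, both of which a full DL-Lite_F saturation must also handle) is a simple fixpoint. It is not the linear-time procedure the slide refers to, and the redundancy filter with respect to sat(Σ) is not implemented here; the TBox in the example is hypothetical.

```python
from typing import Set, Tuple

Inclusion = Tuple[str, str]  # (B, C) encodes B ⊑ C


def saturate(tbox: Set[Inclusion]) -> Set[Inclusion]:
    """Transitive closure of atomic concept inclusions (reflexive pairs omitted)."""
    closure = set(tbox)
    changed = True
    while changed:
        changed = False
        for b, c in list(closure):
            for c2, d in list(closure):
                # B ⊑ C and C ⊑ D imply B ⊑ D
                if c == c2 and b != d and (b, d) not in closure:
                    closure.add((b, d))
                    changed = True
    return closure


tbox = {("Man", "Human"), ("Human", "Animal")}
print(sorted(saturate(tbox)))
# [('Human', 'Animal'), ('Man', 'Animal'), ('Man', 'Human')]
```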
  63. Contributions: Completing ABoxes
  67. General considerations
     OBDA systems have no ABoxes; instead they have virtual ABoxes V = ⟨D, M⟩, with D = ⟨R, I⟩.
     If we want V ⊨ A ⊑_A B, we make sure that the mappings for B include all the data coming from the mappings for A.
     Trade-off between:
     • degree of completeness (number of dependencies),
     • cost of the procedure,
     • performance of query answering.
     We can complete virtual ABoxes up to B ⊑ ∃R without the need for new data.
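Completion of a virtual ABox can be sketched at the level of mappings: to make V ⊨ A ⊑_A B hold, every source query mapped to A is also mapped to B. A minimal sketch, assuming each concept is mapped to a list of one-column SQL queries and using an in-memory SQLite database with hypothetical `employee` and `manager` tables; the function names are illustrative. Only the mappings change; no data is copied.

```python
import sqlite3
from typing import Dict, List, Set, Tuple

Mappings = Dict[str, List[str]]  # concept name -> SQL source queries returning one column


def complete_mappings(mappings: Mappings, dependencies: List[Tuple[str, str]]) -> Mappings:
    """Enforce each dependency sub ⊑_A sup by copying sub's source queries into sup's
    mappings, repeating until nothing changes (so chains of dependencies propagate)."""
    completed = {concept: list(queries) for concept, queries in mappings.items()}
    changed = True
    while changed:
        changed = False
        for sub, sup in dependencies:
            targets = completed.setdefault(sup, [])
            for query in completed.get(sub, []):
                if query not in targets:
                    targets.append(query)
                    changed = True
    return completed


def extension(conn: sqlite3.Connection, mappings: Mappings, concept: str) -> Set[str]:
    """Virtual extension of a concept: union of the rows of all its source queries."""
    return {row[0] for sql in mappings.get(concept, []) for row in conn.execute(sql)}


# Hypothetical source schema: managers are stored in a separate table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id TEXT);
    CREATE TABLE manager (id TEXT);
    INSERT INTO employee VALUES ('ana');
    INSERT INTO manager VALUES ('john');
""")

mappings = {
    "Employee": ["SELECT id FROM employee"],
    "Manager": ["SELECT id FROM manager"],
}
print(extension(conn, mappings, "Employee"))  # {'ana'}

completed = complete_mappings(mappings, [("Manager", "Employee")])
print(sorted(extension(conn, completed, "Employee")))  # ['ana', 'john']
```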
  72. Semantic Index for OBDA
     General idea:
     • Encode the semantics of T in numeric indexes and ranges for concept names and roles.
     • Store the ABox in the database using those indexes and ranges.
     • Make mappings for the system that take the ranges into account.
     We can do this by using the implied hierarchy of T to generate the indexes and ranges!
  76. Semantic Index Example
     T = {B ⊑ A, C ⊑ A, C ⊑ D}
     Indexes: A = 1, B = 2, C = 3, D = 4.
     We create a table TC with columns constant and idx. To insert the data we use the indexes; e.g., if B(mariano) ∈ A, then we put (mariano, 2) into TC.
  78. Semantic Index Example
     T = {B ⊑ A, C ⊑ A, C ⊑ D}
     Indexes and ranges: A: 1, {(1, 3)}; B: 2, {(2, 2)}; C: 3, {(3, 3)}; D: 4, {(3, 4)}.
     We create the mappings using the ranges, e.g.,
       SELECT constant FROM TC WHERE idx >= 1 AND idx <= 3  ⇝  A(constant)
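The construction in this example can be reproduced with a short program: number the concepts in depth-first pre-order starting from the top concepts, give each concept the merged intervals covering itself and all its subconcepts, store assertions in TC by index, and map each concept to a range query. A minimal sketch with Python's built-in sqlite3, assuming an acyclic hierarchy of atomic concepts only (roles are omitted) and a child order that follows the order of T; the function names are hypothetical. On the example T it yields the indexes and ranges shown above.

```python
import sqlite3
from typing import Dict, List, Set, Tuple


def semantic_index(concepts: List[str], tbox: List[Tuple[str, str]]):
    """DFS pre-order indexes plus, per concept, merged intervals covering the
    indexes of the concept and all its subconcepts. Assumes an acyclic hierarchy."""
    children: Dict[str, List[str]] = {c: [] for c in concepts}
    has_parent: Set[str] = set()
    for sub, sup in tbox:  # (sub, sup) encodes sub ⊑ sup
        children[sup].append(sub)
        has_parent.add(sub)

    index: Dict[str, int] = {}

    def visit(c: str) -> None:
        if c not in index:
            index[c] = len(index) + 1
            for child in children[c]:
                visit(child)

    for c in concepts:
        if c not in has_parent:  # start from the top concepts
            visit(c)

    def below(c: str) -> Set[str]:
        result = {c}
        for child in children[c]:
            result |= below(child)
        return result

    ranges: Dict[str, List[Tuple[int, int]]] = {}
    for c in concepts:
        intervals: List[Tuple[int, int]] = []
        for i in sorted(index[d] for d in below(c)):
            if intervals and i == intervals[-1][1] + 1:
                intervals[-1] = (intervals[-1][0], i)  # extend the current interval
            else:
                intervals.append((i, i))
        ranges[c] = intervals
    return index, ranges


def concept_sql(ranges: Dict[str, List[Tuple[int, int]]], concept: str) -> str:
    """Source query of the mapping for one concept."""
    conditions = " OR ".join(f"(idx >= {lo} AND idx <= {hi})" for lo, hi in ranges[concept])
    return f"SELECT constant FROM TC WHERE {conditions}"


concepts = ["A", "B", "C", "D"]
index, ranges = semantic_index(concepts, [("B", "A"), ("C", "A"), ("C", "D")])
print(index)   # {'A': 1, 'B': 2, 'C': 3, 'D': 4}
print(ranges)  # {'A': [(1, 3)], 'B': [(2, 2)], 'C': [(3, 3)], 'D': [(3, 4)]}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TC (constant TEXT, idx INTEGER)")
conn.execute("INSERT INTO TC VALUES (?, ?)", ("mariano", index["B"]))  # B(mariano)

for c in concepts:
    # A and B return ['mariano']; C and D return []
    print(c, [row[0] for row in conn.execute(concept_sql(ranges, c))])
```

A single range filter per concept replaces the union over all subconcepts that a rewriting would otherwise produce.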
  79. Experimentation I
     The Resource Index features:
     • search over 22 document collections;
     • semantics given by the hierarchies of 200 ontologies (SNOMED, GO).
     Implementation in a nutshell:
     (i) understand documents with natural language processing and annotate them, e.g., Cervical Cancer(doc224);
     (ii) expand the ABox;
     (iii) pose queries that retrieve documents, of the form q(x) ← A₁(x) ∧ ··· ∧ Aₙ(x).
  80. Experimentation II
     The challenge:
     • ≈ 3 million concepts and ≈ 2.5 million is-a assertions;
     • split-second responses;
     • 150 GB of data;
     • expansion data: 1.5 TB.
     The experimentation data:
     • ClinicalTrials.gov (CT);
     • 181 million assertions (≈ 14 GB of data, ≈ 140 GB when expanded).
  82. Results
     The query: q(x) ← DNA Repair Gene(x) ∧ Antigen Gene(x) ∧ Cancer Gene(x)
     Results:
     • Traditional reformulation: union of 467,874 SQL SPJ queries.
     • Semantic Index: 1 SQL query; execution 3.582 s (0.082 s if warm); time to compute the semantic index: 1 min; size of data: +≈ 4 GB.
     • ABox expansion: 1 SQL query; execution 3 s (0.6 s if warm); expansion time ≈ 7 days; size of data: +≈ 126 GB.
  83. The Query
     The query: q(x) ← DNA Repair Gene(x) ∧ Antigen Gene(x) ∧ Cancer Gene(x)
       SELECT DISTINCT r0.element_id AS element_id
       FROM RESOURCE_INDEX.CT_ANN r0
       JOIN RESOURCE_INDEX.CT_ANN r1 ON r0.element_id = r1.element_id
       JOIN RESOURCE_INDEX.CT_ANN r2 ON r1.element_id = r2.element_id
       WHERE ((r0.idx >= 1783559 AND r0.idx <= 1783657))
         AND ((r1.idx >= 1782996 AND r1.idx <= 1783029))
         AND ((r2.idx >= 1783115 AND r2.idx <= 1783253));
     A standard SQL query, efficient in ANY DBMS.
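The query has a regular shape: one self-join of the annotation table per atom, and one index-range filter per atom. A sketch that generates this shape from per-atom ranges and runs it on a toy SQLite table, assuming the (element_id, idx) layout visible in the query and dropping the RESOURCE_INDEX schema prefix (SQLite would treat it as an attached database). The ranges are copied from the query above; the documents are hypothetical.

```python
import sqlite3
from typing import List, Tuple

Range = Tuple[int, int]


def conjunction_sql(table: str, atom_ranges: List[List[Range]]) -> str:
    """SQL for q(x) <- A1(x) ∧ ... ∧ An(x) over a semantic-index table with columns
    (element_id, idx): one self-join per atom, one range filter per atom."""
    joins = [f"FROM {table} r0"]
    for i in range(1, len(atom_ranges)):
        joins.append(f"JOIN {table} r{i} ON r{i - 1}.element_id = r{i}.element_id")
    filters = []
    for i, ranges in enumerate(atom_ranges):
        alternatives = " OR ".join(
            f"(r{i}.idx >= {lo} AND r{i}.idx <= {hi})" for lo, hi in ranges
        )
        filters.append(f"({alternatives})")
    return (
        "SELECT DISTINCT r0.element_id AS element_id "
        + " ".join(joins)
        + " WHERE "
        + " AND ".join(filters)
    )


# Ranges of the three gene concepts, copied from the query above
gene_ranges = [[(1783559, 1783657)], [(1782996, 1783029)], [(1783115, 1783253)]]
sql = conjunction_sql("CT_ANN", gene_ranges)

# Hypothetical data: doc1 is annotated with all three concepts, doc2 with two
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CT_ANN (element_id TEXT, idx INTEGER)")
conn.executemany(
    "INSERT INTO CT_ANN VALUES (?, ?)",
    [("doc1", 1783560), ("doc1", 1783000), ("doc1", 1783200),
     ("doc2", 1783560), ("doc2", 1783000)],
)
result = [row[0] for row in conn.execute(sql)]
print(result)  # ['doc1']
```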
  85. Conclusions
     Contributions:
     • We indicated that efficient OBDA requires taking into account more than only T, A and Q.
     • We provided means to deal with redundancy at the level of the TBox.
     • We showed that expansion is not necessary and that we can complete ABoxes.
     • We presented two efficient ways to complete ABoxes, one for the general OBDA setting and one for the virtual setting.
     Future work:
     • Exploring more expressive languages.
     • Exploring the RDFS/SPARQL setting.
     • Handling updates of T and A.
  86. Extra examples
  89. First Observation (cont.)
     Mappings will introduce dependencies over ABoxes.
     Let R be a DB schema with the relation schema employee with attributes id, dept and salary. Let M be the following mappings:
       SELECT id, dept FROM employee  ⇝  q(id, dept) ← Employee(id) ∧ WORKS-FOR(id, dept)
       SELECT id, dept FROM employee WHERE salary > 1000  ⇝  q(id, dept) ← Manager(id) ∧ MANAGES(id, dept)
     Then, for any instance I, if Manager(John) ∈ A we have that Employee(John) ∈ A.
     This is an indicator of completeness of all ABoxes A for M and R; e.g., A is complete w.r.t. Manager ⊑_A Employee.
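The two mappings can be run on a concrete instance to observe the dependency. Running them on one hypothetical instance only illustrates the claim; what makes it hold for every I is that the Manager source query reads the same table as the Employee source query with an extra filter, so its rows are always a subset. A minimal SQLite sketch, where the employee rows are made up for illustration:

```python
import sqlite3

# Source queries of the two mappings in the example
EMPLOYEE_SQL = "SELECT id, dept FROM employee"
MANAGER_SQL = "SELECT id, dept FROM employee WHERE salary > 1000"


def virtual_concept(conn: sqlite3.Connection, sql: str) -> set:
    """Individuals x such that Concept(x) is in the virtual ABox produced by one mapping."""
    return {row[0] for row in conn.execute(sql)}


# Hypothetical instance I of the schema employee(id, dept, salary)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("John", "sales", 1500), ("Ana", "it", 900)],
)

employees = virtual_concept(conn, EMPLOYEE_SQL)
managers = virtual_concept(conn, MANAGER_SQL)
print(sorted(employees), sorted(managers))  # ['Ana', 'John'] ['John']
# Manager ⊑_A Employee holds on this virtual ABox
print(managers <= employees)  # True
```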
  90. Formalization: Chains
     Let T be a TBox, B, C basic concepts, and Σ a set of dependencies over T. A T-chain from B to C in T (resp., a Σ-chain from B to C in Σ) is a sequence of concept inclusion assertions (B_i ⊑ B′_i)_{i=0}^{n} in T (resp., a sequence of inclusion dependencies (B_i ⊑_A B′_i)_{i=0}^{n} in Σ), for some n ≥ 0, such that:
     1. B_0 = B, B_n = C, and
     2. for 1 ≤ i ≤ n, B′_{i−1} and B_i are basic concepts s.t. either (i) B′_{i−1} = B_i, or (ii) B′_{i−1} = ∃R and B_i = ∃R⁻, for some basic role R.
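Condition 2 is a purely syntactic test on consecutive assertions, and the ∃R/∃R⁻ case is what lets a chain pass through an existential. A minimal sketch of that test, assuming basic concepts are written as strings such as "∃hasFather⁻"; the endpoint condition 1 is left out, and the example assertions and function names are hypothetical.

```python
from typing import List, Optional, Tuple

Assertion = Tuple[str, str]  # (B_i, B'_i) encodes B_i ⊑ B'_i


def inverse_exists(concept: str) -> Optional[str]:
    """Map ∃P to ∃P⁻ and ∃P⁻ to ∃P; None for atomic concepts."""
    if not concept.startswith("∃"):
        return None
    return concept[:-1] if concept.endswith("⁻") else concept + "⁻"


def links(prev_rhs: str, next_lhs: str) -> bool:
    """Condition 2: B'_{i-1} = B_i, or B'_{i-1} = ∃R and B_i = ∃R⁻."""
    return prev_rhs == next_lhs or (
        prev_rhs.startswith("∃") and next_lhs == inverse_exists(prev_rhs)
    )


def satisfies_linking(sequence: List[Assertion]) -> bool:
    """Check condition 2 for every consecutive pair of assertions."""
    return all(links(sequence[i - 1][1], sequence[i][0]) for i in range(1, len(sequence)))


print(satisfies_linking([("Man", "∃hasFather"), ("∃hasFather⁻", "Person")]))  # True
print(satisfies_linking([("Man", "Human"), ("Human", "Animal")]))             # True
print(satisfies_linking([("Man", "∃hasFather"), ("Person", "Human")]))        # False
```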
  91. Formalization: Redundancy
     Let T be a TBox, B, C basic concepts, and Σ a set of dependencies. The concept inclusion assertion B ⊑ C is directly redundant in T w.r.t. Σ if (i) Σ ⊨ B ⊑_A C and (ii) for every T-chain (B_i ⊑ B′_i)_{i=0}^{n} with B_n = B in T, there is a Σ-chain (B_i ⊑_A B′_i)_{i=0}^{n}.
     Then, B ⊑ C is redundant in T w.r.t. Σ if (a) it is directly redundant, or (b) there exists B′ ≠ B s.t. (i) T ⊨ B′ ⊑ C, (ii) B′ ⊑ C is not redundant in T w.r.t. Σ, and (iii) B ⊑ B′ is directly redundant in T w.r.t. Σ.
