Introduction to query rewriting optimisation with dependencies

Introduction to query rewriting optimisation with dependencies in APEX lab, Shanghai 2012.

Published in: Education, Business
  1. 1. Dependencies: Making Ontology Based Data Access Work in Practice. Mariano Rodriguez-Muro and Diego Calvanese, {rodriguez,calvanese}@inf.unibz.it, KRDB Research Centre, Free University of Bozen-Bolzano, July 2011. Rodriguez-Muro and Calvanese (UNIBZ) APEX-Shanghai, 2011 July, 2011 1 / 33
  2. 2. The context
  4. 4. DL Ontologies. Description Logics: • Formalisms for knowledge representation • Decidable fragments of FOL • Basis of OWL • The world is described by means of concepts and roles. Ontologies: • Intensional knowledge: TBox T • Extensional knowledge: ABox A
  9. 9. OBDA with DL-Lite. A family of light-weight ontology languages: • DL-Lite_F concepts: B ::= A | ∃R • DL-Lite_F roles: R ::= P | P⁻ • DL-Lite_F TBox assertions: B1 ⊑ B2 | B1 ⊑ ¬B2 | (funct R) • DL-Lite_F ABox assertions: A(a) | R(a, b)
  13. 13. Query Answering. TBox: Man ⊑ Person, Woman ⊑ Person, Person ⊑ ∃hasFather, ∃hasFather⁻ ⊑ Person. ABox: Man(mariano). Query: q(x) ← Person(x), hasFather(x, y), Person(y). Problem: compute the certain answers of Q, denoted cert(Q, O). The promise: we can do this as efficiently as answering DB queries, also in the virtual setting.
  16. 16. Query Answering with PerfectRef (2005). Query: q(x) ← Person(x), hasFather(x, y), Person(y). Reformulation:
      q(x) ← Person(x), hasFather(x, y), Person(y)
      q(x) ← Person(x), hasFather(x, y), hasFather(z, y)
      q(x) ← Person(x), hasFather(x, y)
      q(x) ← Person(x), Person(x)
      q(x) ← Person(x)
      q(x) ← Person(x), hasFather(x, y), Man(y)
      q(x) ← Person(x), hasFather(x, y), Woman(y)
      q(x) ← hasFather(x, m), hasFather(x, y), Person(y)
      q(x) ← hasFather(x, m), hasFather(x, y), hasFather(z, y)
      q(x) ← hasFather(x, m), hasFather(x, y)
      q(x) ← hasFather(x, m), Person(x)
      q(x) ← hasFather(x, m), hasFather(x, t)
      q(x) ← hasFather(x, m)
      q(x) ← hasFather(x, m), hasFather(x, y), Man(y)
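To make the idea of reformulation concrete, here is a minimal sketch restricted to the special case of an atomic query such as q(x) ← Person(x). It collects every basic concept that the TBox of the running example places below the goal concept and evaluates the resulting union over the ABox. This is not PerfectRef itself (there are no unification or reduction steps); the tuple encoding of concepts and the names `rewrite_atomic` and `evaluate` are assumptions made for this illustration.

```python
# Basic concepts are encoded as tuples (an assumption of this sketch):
#   ("atom", "Man")        the atomic concept Man
#   ("dom", "hasFather")   ∃hasFather   (subjects of hasFather)
#   ("rng", "hasFather")   ∃hasFather⁻  (objects of hasFather)
TBOX = [
    (("atom", "Man"), ("atom", "Person")),
    (("atom", "Woman"), ("atom", "Person")),
    (("atom", "Person"), ("dom", "hasFather")),
    (("rng", "hasFather"), ("atom", "Person")),
]

# ABox of the running example: only Man(mariano).
CONCEPT_FACTS = {("Man", "mariano")}
ROLE_FACTS = set()  # triples (role, subject, object); empty here


def rewrite_atomic(goal, tbox):
    """Return the goal plus every basic concept reachable below it
    by chaining the positive inclusions in `tbox`."""
    found = {goal}
    frontier = [goal]
    while frontier:
        current = frontier.pop()
        for sub, sup in tbox:
            if sup == current and sub not in found:
                found.add(sub)
                frontier.append(sub)
    return found


def evaluate(concepts, concept_facts, role_facts):
    """Evaluate the union of the atomic queries, one per concept."""
    answers = set()
    for kind, name in concepts:
        if kind == "atom":
            answers |= {ind for (c, ind) in concept_facts if c == name}
        elif kind == "dom":
            answers |= {s for (r, s, o) in role_facts if r == name}
        else:  # "rng"
            answers |= {o for (r, s, o) in role_facts if r == name}
    return answers


rewriting = rewrite_atomic(("atom", "Person"), TBOX)
print(evaluate(rewriting, CONCEPT_FACTS, ROLE_FACTS))
```

Even in this tiny case the single atom Person(x) becomes a union of four atomic queries (Person, Man, Woman, ∃hasFather⁻), which hints at why reformulations of multi-atom queries grow so quickly.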
  23. 23. Alternatives. • Improved version of PerfectRef (2007-2011) • RQR (Urbina et al., 2007): too many unions, cannot execute! • PRESTO (Rosati et al., 2010): better, but eventually it breaks. • Combined Approach (Kontchakov et al., 2010): fast, but too much data and too much time.
  24. 24. What can we do?
  26. 26. Query Answering. It is not only about existential constants. Query: q(x, y) ← Person(x), hasFather(x, y), Person(y). Reformulation:
      q(x, y) ← Person(x), hasFather(x, y), Person(y)
      q(x, y) ← Person(x), hasFather(x, y), hasFather(z, y)
      q(x, y) ← Person(x), hasFather(x, y), Man(y)
      q(x, y) ← Person(x), hasFather(x, y), Woman(y)
      q(x, y) ← hasFather(x, m), hasFather(x, y), Person(y)
      q(x, y) ← hasFather(x, m), hasFather(x, y), hasFather(z, y)
      q(x, y) ← hasFather(x, m), hasFather(x, y), Man(y)
      q(x, y) ← hasFather(x, m), hasFather(x, y), Woman(y)
      q(x, y) ← Man(x), hasFather(x, y), Person(y)
      q(x, y) ← Man(x), hasFather(x, y), hasFather(z, y)
      q(x, y) ← Man(x), hasFather(x, y), Man(y)
      q(x, y) ← Man(x), hasFather(x, y), Woman(y)
      q(x, y) ← Woman(x), hasFather(x, y), Person(y)
  27. 27. The full picture: Ontology Based Data Access. [Figure: OBDA architecture with user queries, ontology, mappings and sources.] To deal with OBDA we need to consider: • If the backend consists of RDBMSs, we cannot go beyond their capabilities. • All systems are composed of T, D = ⟨R, I⟩, and M.
  35. 35. First Observation: Is my data complete? Completeness of A. The TBox says: Manager ⊑ Employee. In the ABox: all Managers are already Employees. In any realistic scenario: • We don't use arbitrary sources; • Intersection of semantics is reflected in completeness (e.g., no need to chase, expand or rewrite); • This happens a lot! Keyword: redundancy.
  37. 37. Second Observation: There are no ABoxes. THERE ARE NO ABOXES! Any ontology-based query answering system today: • Uses relational DBs to store the ABox data; • In such a D, both R and I can be manipulated; • Implementors may choose any M for their system. Opportunity: to complete an ABox we can do more than expansion.
  41. 41. How to approach the problem: a two-level approach. How to approach OBDA in practice? • Efficient ways to deal with redundancy due to completeness. • Efficient ways to complete (virtual) ABoxes.
  42. 42. Contributions: Dealing with redundancy
  44. 44. Characterizing completeness: ABox Dependencies. Definition: an assertion B ⊑_A B that restricts valid ABoxes. Syntax: B1 ⊑_A B2. Semantics: A ⊨ Manager ⊑_A Employee if Manager(x) ∈ A implies Employee(x) ∈ A. ABox dependencies are fundamentally different from TBox assertions. Think open world.
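The semantics of an ABox dependency between atomic concepts can be checked directly on the explicit assertions. The following is a minimal sketch under the assumption that an ABox is a set of (concept, individual) pairs; the function name `satisfies` and the individuals are hypothetical.

```python
def satisfies(abox, sub, sup):
    """True if A |= sub ⊑_A sup, i.e. every individual explicitly
    asserted to be in `sub` is also explicitly asserted to be in `sup`."""
    def members(concept):
        return {ind for (c, ind) in abox if c == concept}
    return members(sub) <= members(sup)


# Hypothetical ABoxes for illustration.
complete = {("Manager", "john"), ("Employee", "john"), ("Employee", "mary")}
incomplete = {("Manager", "john"), ("Employee", "mary")}

print(satisfies(complete, "Manager", "Employee"))
print(satisfies(incomplete, "Manager", "Employee"))
```

Both ABoxes are consistent with the TBox assertion Manager ⊑ Employee under the open-world reading; only the first one satisfies the dependency, because the dependency talks about what is explicitly stored.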
  48. 48. Where to deal with redundancy? Given a TBox T, an ABox A, a set of dependencies Σ and a query Q, what do we do? Available options: • Optimize the query reformulation algorithm to deal with Σ. • Optimize the TBox T with respect to Σ.
  52. 52. When is an assertion redundant? Direct redundancy, case 1. Let T imply the following hierarchy: [diagram over ∃hasFather, Person, Human]. Redundant if Σ is: [diagram over ∃hasFather, Person, Human]. Σ says: hasFather(mariano, ramon) ∈ A → Human(mariano) ∈ A.
  57. 57. When is an assertion redundant? Direct redundancy, case 2. Let T be the following TBox: [diagram over Person, ∃hasFather⁻, ∃hasFather, Man]. Redundant if Σ is: [diagram over Person, ∃hasFather⁻, ∃hasFather, Man]. Σ says: Man(ramon) ∈ A → there exists a′ such that hasFather(ramon, a′) ∈ A and Person(a′) ∈ A.
  61. 61. When is an assertion redundant? Indirect redundancy. Let T be the following TBox: [diagram over Animal, Man, Human]. Redundant if Σ is: [diagram over Animal, Man, Human]. Σ says: if Man(mariano) ∈ A then Animal(mariano) ∈ A.
  62. 62. Formalization: Redundancy. Given a TBox T and a set of dependencies Σ over T, the optimized version of T w.r.t. Σ, denoted optim(T, Σ), is the set of inclusion assertions {α ∈ sat(T) | α is not redundant in sat(T) w.r.t. sat(Σ)}. We can compute optim(T, Σ) in linear time.
  63. 63. Contributions: Completing ABoxes
  67. 67. General considerations. OBDA systems have no ABoxes; instead they have virtual ABoxes V = ⟨D, M⟩ with D = ⟨R, I⟩. To ensure that V ⊨ A ⊑_A B, we make sure that the mappings for B include all the data coming from the mappings of A. Trade-off: • Degree of completeness (number of dependencies), • Cost of the procedure, • Performance of query answering. We can complete virtual ABoxes up to B ⊑ ∃R without the need for new data.
  72. 72. Semantic Index for OBDA: general idea. • Encode the semantics of T in numeric indexes and ranges for concept names and roles. • Store the ABox in the database using those indexes and ranges. • Make mappings for the system that take the ranges into account. We can do this by using the implied hierarchy of T to generate the indexes and ranges!
  78. 78. Semantic Index Example. T = {B ⊑ A, C ⊑ A, C ⊑ D}. Indexes and ranges assigned over the implied hierarchy: A has index 1 and range {(1, 3)}; B has index 2 and range {(2, 2)}; C has index 3 and range {(3, 3)}; D has index 4 and range {(3, 4)}. We create a table TC with constant and idx columns. To insert the data we use the indexes, e.g., if B(mariano) ∈ A then we put (mariano, 2) ∈ TC. We create the mappings using the ranges, e.g., SELECT constant FROM TC WHERE idx >= 1 AND idx <= 3 ⇝ A(constant)
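The example can be replayed with an in-memory SQLite database. The sketch below takes the index assignment (A=1, B=2, C=3, D=4) from the slide rather than computing it, derives each concept's ranges by merging the indexes of the concept and its descendants, and then answers instance queries with range conditions. The helper names (`descendants`, `ranges`, `instances`) and the individual `diego` are assumptions added for illustration.

```python
import sqlite3

TBOX = [("B", "A"), ("C", "A"), ("C", "D")]   # (sub, super) pairs from the slide
INDEX = {"A": 1, "B": 2, "C": 3, "D": 4}      # index assignment from the slide


def descendants(concept, tbox):
    """The concept itself plus everything below it in the hierarchy."""
    seen = {concept}
    stack = [concept]
    while stack:
        current = stack.pop()
        for sub, sup in tbox:
            if sup == current and sub not in seen:
                seen.add(sub)
                stack.append(sub)
    return seen


def ranges(concept, tbox, index):
    """Merge the indexes of the concept and its descendants into intervals."""
    merged = []
    for i in sorted(index[c] for c in descendants(concept, tbox)):
        if merged and i == merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], i)   # extend the current interval
        else:
            merged.append((i, i))             # start a new interval
    return merged


def instances(conn, concept):
    """Retrieve the instances of a concept using only range conditions."""
    rs = ranges(concept, TBOX, INDEX)
    where = " OR ".join("(idx >= ? AND idx <= ?)" for _ in rs)
    params = [bound for r in rs for bound in r]
    rows = conn.execute(f"SELECT DISTINCT constant FROM TC WHERE {where}", params)
    return {row[0] for row in rows}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TC (constant TEXT, idx INTEGER)")
# B(mariano) is from the slide; C(diego) is a hypothetical extra assertion.
conn.executemany("INSERT INTO TC VALUES (?, ?)", [("mariano", 2), ("diego", 3)])

for concept in "ABCD":
    print(concept, ranges(concept, TBOX, INDEX), instances(conn, concept))
```

Each assertion is stored once, under the index of its most specific concept, and the hierarchy is applied at query time through the ranges, so no expansion of the data is needed.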
  79. 79. Experimentation I. The Resource Index features: • Search over 22 document collections • Semantics given by the hierarchies of 200 ontologies (SNOMED, GO). Implementation in a nutshell: (i) Understand documents with natural language processing and annotate them, e.g., Cervical Cancer(doc224); (ii) Expand the ABox; (iii) Pose queries that retrieve documents, of the form q(x) ← A1(x) ∧ · · · ∧ An(x)
  80. 80. Experimentation II. The challenge: • ≈ 3 million concepts and ≈ 2.5 million is-a assertions • Split-second responses • 150 GB of data • Expansion data: 1.5 TB. The experimentation data: • ClinicalTrials.gov (CT) • 181 million assertions (≈ 14 GB of data, ≈ 140 GB when expanded)
  82. 82. Results. The query: q(x) ← DNA Repair Gene(x) ∧ Antigen Gene(x) ∧ Cancer Gene(x). Results: • Traditional reformulation: union of 467874 SQL SPJ queries. • Semantic Index: 1 SQL query; execution 3.582s (0.082s if warm); time to compute the semantic index: 1 min; size of data: ≈ +4 GB. • ABox expansion: 1 SQL query; execution 3s (0.6s if warm); expansion time ≈ 7 days; size of data: ≈ +126 GB.
  83. 83. The Query. The query: q(x) ← DNA Repair Gene(x) ∧ Antigen Gene(x) ∧ Cancer Gene(x)
      SELECT DISTINCT r0.element_id AS element_id
      FROM RESOURCE_INDEX.CT_ANN r0
      JOIN RESOURCE_INDEX.CT_ANN r1 ON r0.element_id = r1.element_id
      JOIN RESOURCE_INDEX.CT_ANN r2 ON r1.element_id = r2.element_id
      WHERE ((r0.idx >= 1783559 AND r0.idx <= 1783657))
        AND ((r1.idx >= 1782996 AND r1.idx <= 1783029))
        AND ((r2.idx >= 1783115 AND r2.idx <= 1783253));
      A standard SQL query, efficient in ANY DBMS.
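A query of this shape can be generated mechanically from the index ranges of each atom. The sketch below is one possible generator, not the authors' implementation: the function name `conjunctive_sql` is hypothetical, the table name is a trusted parameter, and the rows inserted for the demonstration are invented. Only the three ranges are taken from the slide.

```python
import sqlite3


def conjunctive_sql(table, atom_ranges):
    """Build one SQL query for q(x) ← A1(x) ∧ ... ∧ An(x).

    atom_ranges[i] is the list of (low, high) index ranges of concept Ai.
    `table` is interpolated as-is, so it must come from trusted code.
    """
    if not atom_ranges:
        raise ValueError("at least one atom is required")
    parts = [f"SELECT DISTINCT r0.element_id AS element_id FROM {table} r0"]
    for i in range(1, len(atom_ranges)):
        # One self-join per additional atom, chained on element_id.
        parts.append(
            f"JOIN {table} r{i} ON r{i - 1}.element_id = r{i}.element_id"
        )
    conditions = []
    for i, rs in enumerate(atom_ranges):
        alternatives = " OR ".join(
            f"(r{i}.idx >= {int(lo)} AND r{i}.idx <= {int(hi)})" for lo, hi in rs
        )
        conditions.append(f"({alternatives})")
    parts.append("WHERE " + " AND ".join(conditions))
    return " ".join(parts)


# Ranges from the slide; the rows below are hypothetical test data.
sql = conjunctive_sql(
    "CT_ANN",
    [[(1783559, 1783657)], [(1782996, 1783029)], [(1783115, 1783253)]],
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CT_ANN (element_id TEXT, idx INTEGER)")
conn.executemany(
    "INSERT INTO CT_ANN VALUES (?, ?)",
    [("doc1", 1783560), ("doc1", 1783000), ("doc1", 1783200), ("doc2", 1783560)],
)
answers = {row[0] for row in conn.execute(sql)}
print(sql)
print(answers)
```

The number of joins grows with the number of atoms in the query, not with the size of the ontology, which is the contrast with the union of 467874 queries reported for traditional reformulation.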
  85. 85. Conclusions. Contributions: • We indicated that efficient OBDA requires taking into account more than only T, A and Q. • We provided means to deal with redundancy at the level of the TBox. • We showed that expansion is not necessary and that we can complete ABoxes instead. • We presented two efficient ways to complete ABoxes, one for the general OBDA setting and one for the virtual setting. Future work: • Exploring more expressive languages. • Exploring the RDFS/SPARQL setting. • Handling updates of T and A.
  86. 86. Extra examples
  89. 89. First Observation (cont.). Mappings will introduce dependencies over ABoxes. Let R be a DB schema with the relation schema employee with attributes id, dept, and salary. Let M be the following mappings:
      SELECT id, dept FROM employee ⇝ q(id, dept) ← Employee(id) ∧ WORKS-FOR(id, dept)
      SELECT id, dept FROM employee WHERE salary > 1000 ⇝ q(id, dept) ← Manager(id) ∧ MANAGES(id, dept)
      Then for any instance I, if Manager(John) ∈ A we have that Employee(John) ∈ A. This is an indicator of completeness of all ABoxes A for M and R, e.g., A is complete w.r.t. Manager ⊑_A Employee.
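The effect of these two mappings can be observed on a concrete instance. The sketch below materialises the virtual ABox for one hypothetical instance of the employee relation in SQLite and checks the dependency Manager ⊑_A Employee on it. The rows, the encoding of mappings as (SQL, target atoms) pairs, and the name `virtual_abox` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id TEXT, dept TEXT, salary INTEGER)")
# Hypothetical instance I.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("John", "sales", 1500), ("Mary", "sales", 900)],
)

# Each mapping pairs a source query with the atoms it populates.
MAPPINGS = [
    ("SELECT id, dept FROM employee",
     [("Employee", ("id",)), ("WORKS-FOR", ("id", "dept"))]),
    ("SELECT id, dept FROM employee WHERE salary > 1000",
     [("Manager", ("id",)), ("MANAGES", ("id", "dept"))]),
]


def virtual_abox(conn, mappings):
    """Materialise the assertions produced by the mappings over the data."""
    facts = set()
    for sql, targets in mappings:
        cursor = conn.execute(sql)
        columns = [d[0] for d in cursor.description]
        for row in cursor:
            values = dict(zip(columns, row))
            for predicate, args in targets:
                facts.add((predicate,) + tuple(values[a] for a in args))
    return facts


abox = virtual_abox(conn, MAPPINGS)
managers = {f[1] for f in abox if f[0] == "Manager"}
employees = {f[1] for f in abox if f[0] == "Employee"}
print(managers <= employees)
```

The check succeeds for this instance, and it would for any other: the source query for Manager only adds a filter to the source query for Employee, so its result is always contained in the latter's.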
  90. 90. Formalization: Chains. Let T be a TBox, B, C basic concepts, and Σ a set of dependencies over T. A T-chain from B to C in T (resp., a Σ-chain from B to C in Σ) is a sequence of concept inclusion assertions (B_i ⊑ B′_i), i = 0, …, n, in T (resp., a sequence of inclusion dependencies (B_i ⊑_A B′_i), i = 0, …, n, in Σ), for some n ≥ 0, such that: (1) B_0 = B and B′_n = C, and (2) for 1 ≤ i ≤ n, B′_{i−1} and B_i are basic concepts s.t. either (i) B′_{i−1} = B_i, or (ii) B′_{i−1} = ∃R and B_i = ∃R⁻, for some basic role R.
  91. 91. Formalization: Redundancy. Let T be a TBox, B, C basic concepts, and Σ a set of dependencies. The concept inclusion assertion B ⊑ C is directly redundant in T w.r.t. Σ if (i) Σ ⊨ B ⊑_A C and (ii) for every T-chain (B_i ⊑ B′_i), i = 0, …, n, with B′_n = B in T, there is a Σ-chain (B_i ⊑_A B′_i), i = 0, …, n. Then, B ⊑ C is redundant in T w.r.t. Σ if (a) it is directly redundant, or (b) there exists B′ ≠ B s.t. (i) T ⊨ B′ ⊑ C, (ii) B′ ⊑ C is not redundant in T w.r.t. Σ, and (iii) B ⊑ B′ is directly redundant in T w.r.t. Σ.