OXFORD'13 Optimising OWL 2 QL query rewriting

OXFORD 2013, Presentation on the query rewriting approach taken in ontop/Quest. Separating reasoning with respect to hierarchies and existential constants using mapping transformation techniques and a specialised query rewriting algorithm

Presentation Transcript

  • Ontop at Work
    Mariano Rodríguez-Muro (1), Roman Kontchakov (2), Michael Zakharyaschev (2)
    (1) Faculty of Computer Science, Free University of Bozen-Bolzano, Italy
    (2) Department of Computer Science and Information Systems, Birkbeck, University of London, U.K.
    May 22nd, 2013
  • OBDA: What is it?
    Loosely speaking: using ontologies to access data.
    [Diagram: User Query, Ontology (TBox), Mappings, (Virtual) ABox, OBDA System, RDBMS, Data source]
    Our focus is on OWL 2 QL ontologies, since they are tailored to handle very large amounts of data by means of query rewriting techniques.
  • Query Answering by Query Rewriting
    Objective: given a query Q over the ontology T, derive a query Q′ over the database D that preserves the semantics of T.
    Consider a TBox T:
      Movie ≡ ∃title, Movie ⊑ ∃year, Movie ≡ ∃cast, ∃cast⁻ ⊑ Person,
      Actor ⊑ Person, Actress ⊑ Person, Producer ⊑ Person,
      Director ⊑ Person, Writer ⊑ Person, Editor ⊑ Person.
  • Example
    The database D: two DB relations title[m, t, y] and castinfo[p, m, r].
    The mapping M (logical form, think R2RML):
      Movie(m) ← title(m, t, y),
      title(m, t) ← title(m, t, y),
      year(m, y) ← title(m, t, y),
      cast(m, p) ← castinfo(p, m, r),
      Person(p) ← castinfo(p, m, r),
      Actor(p) ← castinfo(p, m, "c1"),
      ...
      Editor(p) ← castinfo(p, m, "c6").
  • The classic OBDA architecture
    [Diagram: CQ q + ontology T → rewriting → FO q′; q′ + mapping → unfolding → SQL; data D + mapping → ABox virtualisation → ABox A]
    Stages in the classic OBDA approach:
    – Rewriting w.r.t. T,
    – Unfolding w.r.t. M,
    – Execution over D.
    Unfolding and mappings are ignored in most of the OBDA literature.
  • Example: Rewriting
    Given the query Q
      q(x) ← Person(x)
    the rewriting w.r.t. T is
      q(x) ← Person(x)
      q(x) ← cast(z, x)
      q(x) ← Actor(x)
      ...
      q(x) ← Editor(x)
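A minimal Python sketch of how the hierarchy and range axioms of T produce a rewriting of this shape. The tuple encoding of atoms and the name `rewrite_atom` are illustrative assumptions, not Ontop's implementation; only the axioms about Person from the example TBox are encoded.

```python
# Sub-concepts of each concept, read off the example TBox (A ⊑ Person).
SUBCONCEPTS = {
    "Person": ["Actor", "Actress", "Producer", "Director", "Writer", "Editor"],
}
# Roles whose range is included in the concept (∃cast⁻ ⊑ Person).
ROLES_WITH_RANGE = {"Person": ["cast"]}


def rewrite_atom(concept, var):
    """Return every single-atom body that entails concept(var) under T."""
    bodies = [(concept, (var,))]
    for role in ROLES_WITH_RANGE.get(concept, []):
        bodies.append((role, ("z", var)))  # z is existentially quantified
    for sub in SUBCONCEPTS.get(concept, []):
        bodies.extend(rewrite_atom(sub, var))
    return bodies


rewriting = rewrite_atom("Person", "x")
for pred, args in rewriting:
    print(f"q(x) ← {pred}({', '.join(args)})")
```

For the example TBox this yields eight conjunctive queries: one for Person itself, one for cast, and one for each of the six sub-concepts.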
  • Example: Unfolding
    Given the query Q
      q(x) ← Person(x)
    unfolding its rewriting w.r.t. M gives
      q(x1) ← castinfo(x1, m, r)
      q(x2) ← castinfo(x2, m, r)
      q(x3) ← castinfo(x3, m, "c1")
      ...
      q(x8) ← castinfo(x8, m, "c6")
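The unfolding step can be sketched in Python as a lookup of each atom of the rewriting in the mapping, followed by a UNION of the resulting SQL fragments. The SQL column names follow castinfo[p, m, r]; the assignment of the codes "c2" to "c5" to the intermediate roles and the helper name `unfold` are assumptions made for illustration.

```python
ROLES = ["Actor", "Actress", "Producer", "Director", "Writer", "Editor"]

# SQL source of the answer variable x for each ontology predicate,
# following the mapping M of the example.
SOURCE = {
    "Person": "SELECT p AS x FROM castinfo",
    "cast": "SELECT p AS x FROM castinfo",
}
for i, role in enumerate(ROLES, start=1):
    # Assumed: role codes c1..c6 are assigned in the order listed above.
    SOURCE[role] = f"SELECT p AS x FROM castinfo WHERE r = 'c{i}'"


def unfold(rewriting):
    """Turn a list of single-atom CQs into one SQL query with UNIONs."""
    return "\nUNION\n".join(SOURCE[pred] for pred in rewriting)


sql = unfold(["Person", "cast"] + ROLES)
print(sql)
```

The result is a union of eight SELECT blocks over the same table, which is the redundancy the following slides discuss.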
  • Issues
    The issues with these rewritings are:
    – Large size (n1 × ... × nk, one factor per query atom)
    – Largely redundant (w.r.t. query containment)
    In the literature we find two solutions. The first is encoding the rewriting as a Datalog program. For example, given the query
      q(x, y) ← Person(x), Person(y), cast(m, x), cast(m, z)
    we generate the rewriting
      q(x, y) ← Person(x), Person(y), cast(m, x), cast(m, z)
      Person(x) ← cast(m, x)
      Person(x) ← Actor(x)
      ...
      Person(x) ← Editor(x)
  • Issues
    The issues with these rewritings are:
    – Large size (n1 × ... × nk, one factor per query atom)
    – Largely redundant (w.r.t. query containment)
    In the literature we find two solutions. The first is encoding the rewriting as a Datalog program.
    But the query still needs to be unfolded into an SQL query. There are two choices here:
    – Generate SQL queries with nested UNIONs. Very bad for performance.
    – Expand into a UCQ. Back to square one.
  • Issues (cont.)
    The issues with these rewritings are:
    – Large size (n1 × ... × nk, one factor per query atom)
    – Largely redundant (w.r.t. query containment)
    The second solution is using query containment to clean the output. For example, to detect that
      q(x1) ← castinfo(x1, m, r)
      q(x2) ← castinfo(x2, m, r)
      q(x3) ← castinfo(x3, m, "c1")
      ...
      q(x8) ← castinfo(x8, m, "c6")
    can be simplified to
      q(x1) ← castinfo(x1, m, r)
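As an illustration of what such a containment test involves, here is a brute-force Python sketch based on the standard homomorphism criterion: q1 ⊆ q2 holds if the body of q2 can be mapped into the body of q1 while sending the head of q2 to the head of q1. The query encoding and the names `is_const` and `contained_in` are assumptions for this example; constants are written with double quotes inside the string.

```python
from itertools import product


def is_const(term):
    return term.startswith('"')


def contained_in(q1, q2):
    """True if q1 ⊆ q2, i.e. some homomorphism maps q2 into q1."""
    (head1, body1), (head2, body2) = q1, q2
    terms2 = list(head2) + [t for _, args in body2 for t in args]
    variables = sorted({t for t in terms2 if not is_const(t)})
    targets = sorted(set(head1) | {t for _, args in body1 for t in args})
    # Try every assignment of q2's variables to terms of q1 (exponential).
    for image in product(targets, repeat=len(variables)):
        h = dict(zip(variables, image))

        def apply(terms):
            return tuple(t if is_const(t) else h[t] for t in terms)

        if apply(head2) == tuple(head1) and all(
            (pred, apply(args)) in body1 for pred, args in body2
        ):
            return True
    return False


general = (("x",), [("castinfo", ("x", "m", "r"))])
actor = (("x",), [("castinfo", ("x", "m", '"c1"'))])
print(contained_in(actor, general))  # the actor query is subsumed
print(contained_in(general, actor))  # the general query is not
```

The enumeration over all variable assignments is what makes the check expensive once queries have many atoms and the set of queries is large.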
  • Issues (cont.)
    The issues with these rewritings are:
    – Large size (n1 × ... × nk, one factor per query atom)
    – Largely redundant (w.r.t. query containment)
    The second solution is using query containment to clean the output. But:
    – Query containment is an extremely expensive operation.
    – We are working with large sets of queries.
  • Roots of the problem
    There are three main reasons for large CQ rewritings and unfoldings:
    (E) Sub-queries of q with existentially quantified variables can be folded in many different ways to match the canonical model (existential trees), e.g., Person ⊑ ∃hasFather.Person and the query q(x) ← hasFather(x, y), hasFather(y, z).
    (H) The concepts and roles for atoms in q can have many sub-concepts and sub-roles according to T.
    (M) The mapping M can have multiple definitions of the ontology terms.
    Most of the proposed rewriting techniques try to tame (E).
  • More about (E)
    – It is in theory incurable.
    – It is independent of (H) and (M).
    However:
    – Rewriting algorithms deal with (E) and (H) at the same time.
    – Real-world Qs and Ts generate few queries when dealing with (E) in isolation.
    – Even artificially constructed Qs and Ts become simple.
    The strongest issues in query rewriting are (H) and (M).
    In Ontop we deal with (H) and (M) separately from (E). We do it through T-mappings and TreeWitness rewritings.
  • Dealing with (H) and (M): T-Mappings
    A T-mapping M_T is a transformation of M that enforces all (H) entailments (H-completeness). Formally: if M ⊨ A(c) and T ⊨ A ⊑ B, then M_T ⊨ B(c).
    T-mapping example 1 (domain/range):
    Consider two DB relations title[m, t, y] and castinfo[p, m, r] and an ontology MO describing the film domain as follows:
      Movie ≡ ∃cast
    Let M be the following mappings:
      Movie(m) ← title(m, t, y),
      cast(m, p) ← castinfo(p, m, r).
    Since ∃cast ⊑ Movie, the T-mapping also contains:
      Movie(m) ← castinfo(p, m, r).
  • T-Mappings: Example 2
    T-mapping example 2 (hierarchies):
    Consider a TBox T:
      Actor ⊑ Person, Actress ⊑ Person, Producer ⊑ Person,
      Director ⊑ Person, Writer ⊑ Person, Editor ⊑ Person.
    The mapping M:
      Actor(p) ← castinfo(p, m, "c1"),
      ...
      Editor(p) ← castinfo(p, m, "c6").
  • T-Mappings: Example 2
    T-mapping example 2 (hierarchies):
    Consider a TBox T:
      Actor ⊑ Person, Actress ⊑ Person, Producer ⊑ Person,
      Director ⊑ Person, Writer ⊑ Person, Editor ⊑ Person.
    For each rule of M that defines a sub-concept of Person, the T-mapping adds a rule for Person with the same body:
      Person(p) ← castinfo(p, m, "c1"),
      ...
      Person(p) ← castinfo(p, m, "c6").
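Both examples can be sketched in Python as a saturation of the mapping under the inclusions of T: whenever a rule defines A (or a role R) and T entails A ⊑ B (or ∃R ⊑ B, ∃R⁻ ⊑ B), a rule for B with the same body is added. The rule encoding and the name `t_mapping` are assumptions for illustration; only two of the six sub-concepts are listed, and rule bodies are kept as opaque strings.

```python
# Mapping rules: (ontology atom, source body). Atoms are (predicate, args).
M = [
    (("Movie", ("m",)), "title(m, t, y)"),
    (("cast", ("m", "p")), "castinfo(p, m, r)"),
    (("Actor", ("p",)), 'castinfo(p, m, "c1")'),
    (("Editor", ("p",)), 'castinfo(p, m, "c6")'),
]
CONCEPT_INCLUSIONS = [("Actor", "Person"), ("Editor", "Person")]  # A ⊑ B
DOMAIN = [("cast", "Movie")]   # ∃cast ⊑ Movie
RANGE = [("cast", "Person")]   # ∃cast⁻ ⊑ Person


def t_mapping(mapping):
    """Saturate the mapping under the inclusions until nothing new is added."""
    rules, todo = list(mapping), list(mapping)
    while todo:
        (pred, args), body = todo.pop()
        derived = [(b, (args[0],)) for a, b in CONCEPT_INCLUSIONS if a == pred]
        derived += [(c, (args[0],)) for role, c in DOMAIN if role == pred]
        derived += [(c, (args[1],)) for role, c in RANGE if role == pred]
        for head in derived:
            rule = (head, body)
            if rule not in rules:
                rules.append(rule)
                todo.append(rule)
    return rules


tm = t_mapping(M)
for (pred, args), body in tm:
    print(f"{pred}({', '.join(args)}) ← {body}")
```

On this input the saturation adds four rules: Movie(m) and Person(p) from the cast rule, and one Person(p) rule for each of the two listed sub-concepts.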
  • Optimising T-mappings
    T-mappings allow us to deal with hierarchical reasoning (H) at the level of the unfolding. At this point, we can exploit
    – DB dependencies and
    – SQL expressivity
    to reduce, and often eliminate, the exponential growth coming from (H) and (M).
  • Optimising with Dependencies
    A first optimisation is query containment (w.r.t. dependencies).
    Example: consider the previous example. Since T ⊨ ∃cast ⊑ Movie, the T-mapping contains:
      Movie(m) ← title(m, t, y),
      Movie(m) ← castinfo(p, m, r).
    The latter rule is redundant since IMDb contains the foreign key
      castinfo(p, m, r) ⇝ title(m, t, y),
    i.e., every movie identifier in castinfo also occurs in title.
    This step is crucial to reduce the growth due to inferences related to domain and range.
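A simplified Python sketch of this pruning step, assuming a foreign key from castinfo.m to title.m and rules that project a single column of a table without any filter. The triple encoding of rules and the name `prune` are hypothetical; a full implementation would test query containment under dependencies rather than this special case.

```python
# Foreign keys as (child table, child column, parent table, parent column).
# Assumed for the example: castinfo.m references title.m.
FOREIGN_KEYS = [("castinfo", "m", "title", "m")]

# T-mapping rules for Movie: (class, source table, column with the individual).
rules = [("Movie", "title", "m"), ("Movie", "castinfo", "m")]


def prune(rules, fks):
    """Drop a rule if a foreign key sends its column into another rule's column."""
    kept = []
    for cls, table, col in rules:
        redundant = any(
            (table, col, other_table, other_col) in fks
            for other_cls, other_table, other_col in rules
            if other_cls == cls
        )
        if not redundant:
            kept.append((cls, table, col))
    return kept


print(prune(rules, FOREIGN_KEYS))
```

Only the rule over title survives, since every value produced by the castinfo rule is already produced by it.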
  • Optimising with SQL expressivity
    Observation: the only means for perfect reformulations to deal with (H) is through disjunction (UNION). DBMSs are not good at planning UNIONs.
    However, at the level of the unfolding and mappings, we have full SQL expressivity (e.g., disjunction (OR), inequalities, etc.).
    Objective: given a T-mapping, define mapping transformations that entail the same ABox using fewer mappings, while ensuring that the encoding used is efficient during execution.
  • Optimising with SQL expressivity
    Use OR and inequalities to re-express mappings for hierarchies and discriminant columns.
    Dealing with discriminant columns: for example, the T-mapping for IMDb and MO contains six rules for the sub-concepts of Person:
      Person(p) ← castinfo(p, m, "c1"),
      ...
      Person(p) ← castinfo(p, m, "c6").
    These can be reduced to a single rule:
      Person(p) ← castinfo(p, m, r), (r = "c1") ∨ ... ∨ (r = "c6").
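A small Python sketch of this transformation: rules that differ only in the value of a discriminant column are grouped and emitted as one SQL query with an OR condition. The rule encoding, the hard-coded answer column `p`, and the name `merge_discriminants` are assumptions made for the example.

```python
from collections import defaultdict

# T-mapping rules for Person: (class, table, discriminant column, value).
rules = [("Person", "castinfo", "r", f"c{i}") for i in range(1, 7)]


def merge_discriminants(rules):
    """Group rules by (class, table, column) and OR their values together."""
    grouped = defaultdict(list)
    for cls, table, column, value in rules:
        grouped[(cls, table, column)].append(value)
    merged = []
    for (cls, table, column), values in grouped.items():
        condition = " OR ".join(f"{column} = '{v}'" for v in values)
        # Assumed: the individual is always taken from column p.
        merged.append((cls, f"SELECT p FROM {table} WHERE {condition}"))
    return merged


merged = merge_discriminants(rules)
for cls, sql in merged:
    print(cls, "←", sql)
```

Six rules become one, so a query over Person unfolds into a single SELECT with a disjunctive filter instead of a six-way UNION.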
  • The architecture of Ontop
    [Architecture diagram. Components: CQ q, ontology T, UCQ q_tw, T-mapping, mapping M, dependencies Σ, SQL, data D, ABox A, H-complete ABox. Steps: tw-rewriting, unfolding, ABox virtualisation, ABox completion, completion, SQO]
    Highlights: (H) and (M) dealt with by T-mappings, rewriting for (H)-complete ABoxes, extensive use of SQO over the unfolding.
  • Other Optimisations in Ontop
    We also apply other important optimisations during system setup and at query time. The most important:
    – Equivalence simplification: simplify the ontology vocabulary w.r.t. equivalence (keep one representative of each equivalence class).
    – Semantic Query Optimisation: optimise each generated query individually (see the slides on SQO).
    – Emptiness indexes: keep track of empty predicates.
  • Results
    A summary of the results we have observed using this architecture:
    – Mappings per class/property are few.
    – Query rewritings are small.
    – SQL queries generated like this often correspond to what a human expert would have written.
    – Query execution of SPARQL with entailments is fast, often much faster than in triple stores.
    Query rewriting can be done efficiently.
  • Benchmarks
    [Chart: results on a logarithmic axis from 0.1 to 1000000 for queries R1 to R5, Q1 to Q6 and V7 to V10; series: OWLIM, Stardog, Ontop]
    Benchmark: LUBMex, 200 universities (30M triples).
    Systems: OWLIM (forward chaining), Stardog (rewriting), Ontop/DB2.
    Ontop/DB2
    – returns immediately for 5/15 queries,
    – is faster than the rest in 12/15 queries.
  • Summary
    Results so far:
    – Efficiently dealt with exponential growth from (H) and (M).
    – Use of dependencies and CQC/SQO to minimise and optimise mapping rules.
    – We exploit SQL expressivity to transform mappings and minimise their number.
    OWL 2 QL query answering with query rewriting is efficient, and materialisation is not required.
    Ontop is available as a SPARQL end-point, an OWLAPI and Sesame library, and a Protege 4 plugin. It has many more features (SPARQL, R2RML). It is permanently under development, but stable enough to be used seriously in many projects, incl. Optique.
    Current work is applying these techniques to more expressive settings, e.g., OWL + Rules, OWL 2 EL, OWL 2 RL, through a hybrid approach.
  • Semantic Query Optimisation
    Consider the query
      q(t, y) ← Movie(m), title(m, t), year(m, y), (y > 2010)
    By straightforwardly applying the unfolding to q_tw and the T-mapping M above, we obtain the query
      q′_tw(t, y) ← title(m, t0, y0), title(m, t, y1), title(m, t2, y), (y > 2010),
    which requires two (potentially) expensive join operations. However, by using the primary key m of title we obtain:
      q′′_tw(t, y) ← title(m, t, y), (y > 2010).
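The primary-key step can be sketched in Python as follows: atoms over the same table that share the key term must denote the same row, so their remaining columns are unified, preferring answer variables as representatives. This is an illustrative sketch with hypothetical names (`eliminate_self_joins`, `key_pos`); it handles variables only, leaves filters such as (y > 2010) untouched, and does not cover the case where two distinct answer variables meet in the same column.

```python
def eliminate_self_joins(atoms, key_pos, answer_vars):
    """Merge atoms of one table that agree on the primary-key term."""
    subst, rows = {}, {}

    def find(term):
        # Follow the substitution chain to the current representative.
        while term in subst:
            term = subst[term]
        return term

    for table, args in atoms:
        args = [find(t) for t in args]
        key = (table, args[key_pos[table]])
        if key not in rows:
            rows[key] = args
            continue
        for i, (old, new) in enumerate(zip(rows[key], args)):
            if old != new:
                # Same key, same row: the two terms must be equal.
                keep, drop = (new, old) if new in answer_vars else (old, new)
                subst[drop] = keep
                rows[key][i] = keep
    return [(table, tuple(find(t) for t in row)) for (table, _), row in rows.items()]


body = [
    ("title", ("m", "t0", "y0")),
    ("title", ("m", "t", "y1")),
    ("title", ("m", "t2", "y")),
]
optimised = eliminate_self_joins(body, {"title": 0}, {"t", "y"})
print(optimised)
```

The three title atoms collapse into the single atom title(m, t, y), which removes both self-joins.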
  • Semantic Query Optimisation
    Semantic Query Optimisation (SQO) is a field of DB theory focused on the optimisation of queries w.r.t. dependencies.
    SQO in DB and OBDA:
    – While some SQO techniques reached industrial RDBMSs, SQO never had a strong impact on the database community.
    – In OBDA, in contrast, SQL queries are generated automatically, and so SQO is the only tool to reach optimal queries.
    In practice, an OBDA system must implement at least SQO w.r.t. primary keys and foreign keys to deal with the disparities between the RDF and relational models.
  • Why does it work?
    DBs are created through standard practices that generate the features targeted by the previous optimisations. Starting from a rich conceptual schema, we encode it in a relational schema by:
    – amalgamating N-to-1 and 1-to-1 attributes of an entity into a single n-ary relation with a primary key identifying the entity (e.g., title with title and year),
    – using foreign keys over attribute columns when a column refers to the entity (e.g., name and castinfo),
    – using type-discriminant columns to encode hierarchical information (e.g., castinfo).
    As this process is universal, the T-mappings created for the resulting databases are dramatically simplified by the Ontop optimisations.