OXFORD'13 Optimising OWL 2 QL query rewriting

Ontop at Work
Mariano Rodríguez-Muro¹, Roman Kontchakov², Michael Zakharyaschev²
¹ Faculty of Computer Science, Free University of Bozen-Bolzano, Italy
² Department of Computer Science and Information Systems, Birkbeck, University of London, U.K.
May 22nd, 2013

OBDA: What is it?
Loosely speaking...
Using ontologies to access data.
[Figure: OBDA system: a user query is posed over the ontology (TBox); the OBDA system uses mappings to expose an RDBMS data source as a (virtual) ABox]
Our focus is on OWL 2 QL ontologies, since they are tailored to handling very large amounts of data by means of query rewriting techniques.

Query Answering by Query Rewriting
Objective
Given a query Q over the ontology T, derive a query Q′ over the database D that preserves the semantics of T, i.e., evaluating Q′ over D returns exactly the certain answers to Q over T and the data.

Consider a TBox T:
Movie ≡ ∃title, Movie ⊑ ∃year, Movie ≡ ∃cast, ∃cast⁻ ⊑ Person,
Actor ⊑ Person, Actress ⊑ Person, Producer ⊑ Person,
Director ⊑ Person, Writer ⊑ Person, Editor ⊑ Person.
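To see where (H) comes from, the following Python sketch encodes the concept inclusions of T as plain pairs (an equivalence becomes two inclusions) and collects every basic concept subsumed by Person. The string encoding, such as "∃cast-" for ∃cast⁻, is an assumption made for illustration only.

```python
from collections import defaultdict

# Inclusions B ⊑ C between basic concepts of the example TBox T.
# An equivalence B ≡ C is encoded as the two inclusions B ⊑ C and C ⊑ B;
# "∃cast" stands for the domain of cast and "∃cast-" for its range (∃cast⁻).
TBOX = [
    ("Movie", "∃title"), ("∃title", "Movie"),
    ("Movie", "∃year"),
    ("Movie", "∃cast"), ("∃cast", "Movie"),
    ("∃cast-", "Person"),
    ("Actor", "Person"), ("Actress", "Person"), ("Producer", "Person"),
    ("Director", "Person"), ("Writer", "Person"), ("Editor", "Person"),
]


def subsumees(target, tbox):
    """All basic concepts B with T |= B ⊑ target (reflexive-transitive closure)."""
    below = defaultdict(set)  # C -> {B | B ⊑ C is asserted in T}
    for sub, sup in tbox:
        below[sup].add(sub)
    seen, stack = {target}, [target]
    while stack:
        for sub in below[stack.pop()]:
            if sub not in seen:
                seen.add(sub)
                stack.append(sub)
    return seen


print(sorted(subsumees("Person", TBOX)))
```

Each of the eight concepts returned for Person is a way in which an atom Person(x) can be matched, which is exactly the source of the hierarchy blow-up discussed below.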

Example
The database D: two DB relations, title[m, t, y] and castinfo[p, m, r].

The mapping M (in logical form; think R2RML):
Movie(m) ← title(m, t, y), title(m, t) ← title(m, t, y),
year(m, y) ← title(m, t, y), cast(m, p) ← castinfo(p, m, r),
Person(p) ← castinfo(p, m, r),
Actor(p) ← castinfo(p, m, "c1") · · ·
Editor(p) ← castinfo(p, m, "c6").
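The mapping defines a virtual ABox over D. The sketch below materialises that ABox for a hypothetical toy instance of title and castinfo, purely to show which assertions M produces; the tuple encoding of rules is our own, and an OBDA system would not materialise these facts.

```python
# Hypothetical toy instance of the relations title[m, t, y] and castinfo[p, m, r].
DB = {
    "title": [("m1", "Title A", 2009), ("m2", "Title B", 2013)],
    "castinfo": [("p1", "m1", "c1"), ("p2", "m2", "c6")],
}

# The mapping M as data: (ontology predicate, source columns used as its arguments,
# source relation, optional (column, constant) filter). Column positions follow
# title[m, t, y] and castinfo[p, m, r]; the elided rules between c1 and c6 are left out.
MAPPING = [
    ("Movie", (0,), "title", None),
    ("title", (0, 1), "title", None),
    ("year", (0, 2), "title", None),
    ("cast", (1, 0), "castinfo", None),
    ("Person", (0,), "castinfo", None),
    ("Actor", (0,), "castinfo", (2, "c1")),
    ("Editor", (0,), "castinfo", (2, "c6")),
]


def virtual_abox(db, mapping):
    """Materialise the ABox that the mapping defines over db."""
    facts = set()
    for pred, args, relation, condition in mapping:
        for row in db[relation]:
            if condition is None or row[condition[0]] == condition[1]:
                facts.add((pred, *(row[i] for i in args)))
    return facts


ABOX = virtual_abox(DB, MAPPING)
```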

The classic OBDA architecture
[Figure: the classic OBDA pipeline: the CQ q is rewritten w.r.t. the ontology T into an FO query q′, which is unfolded w.r.t. the mapping into SQL over the data D; the mapping and D together define a virtual ABox A]
Stages in the classic OBDA approach:
– rewriting w.r.t. T,
– unfolding w.r.t. M,
– execution over D.

Unfolding and mappings are ignored in most of the OBDA literature.

Example: Rewriting
Given the query Q:
q(x) ← Person(x)
the rewriting w.r.t. T is the union of the CQs:
q(x) ← Person(x)
q(x) ← cast(z, x)
q(x) ← Actor(x)
. . .
q(x) ← Editor(x)
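The rewriting of this atomic query can be reproduced with a short sketch: every basic concept B with T ⊨ B ⊑ Person yields one CQ, and the existential ∃cast⁻ becomes an atom cast(z, x) with a fresh variable z. It handles only a single atom under concept inclusions, which is all this example needs; it is not a general rewriting algorithm.

```python
# The basic concepts B with T |= B ⊑ Person for the example TBox
# ("∃cast-" is the range of cast, i.e. ∃cast⁻).
PERSON_SUBSUMEES = ["Person", "∃cast-", "Actor", "Actress",
                    "Producer", "Director", "Writer", "Editor"]


def atom_for(concept, var):
    """Turn a basic concept into an atom over var; z is a fresh existential variable."""
    if concept.startswith("∃") and concept.endswith("-"):
        return f"{concept[1:-1]}(z, {var})"   # ∃R⁻(x) becomes R(z, x)
    if concept.startswith("∃"):
        return f"{concept[1:]}({var}, z)"     # ∃R(x) becomes R(x, z)
    return f"{concept}({var})"                 # a concept name A(x) stays A(x)


def rewrite_atomic(var, subsumees):
    """UCQ rewriting of q(var) ← Person(var) under concept inclusions only."""
    return [f"q({var}) ← {atom_for(b, var)}" for b in subsumees]


for cq in rewrite_atomic("x", PERSON_SUBSUMEES):
    print(cq)
```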

Example: Unfolding
Given the query Q:
q(x) ← Person(x)
unfolding its rewriting w.r.t. M gives:
q(x1) ← castinfo(x1, m, r)
q(x3) ← castinfo(x3, m, "c1")
. . .
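Unfolding replaces each atom by the SQL definitions that the mapping gives for its predicate. The sketch below assumes that the columns of castinfo are literally named p, m and r, and that every CQ of the rewriting is a single atom; it already shows identical subqueries appearing in the unfolded union.

```python
# SQL definitions of the ontology predicates taken from M: predicate -> list of
# (table, columns bound to the predicate's arguments, WHERE condition or None).
# Column names p, m, r are assumed from the notation castinfo[p, m, r].
MAPPING = {
    "Person": [("castinfo", ("p",), None)],
    "cast": [("castinfo", ("m", "p"), None)],
    "Actor": [("castinfo", ("p",), "r = 'c1'")],
    "Editor": [("castinfo", ("p",), "r = 'c6'")],
}

# Single-atom CQs of the rewriting: (predicate, argument position of the answer variable x),
# i.e. q(x) ← Person(x), q(x) ← cast(z, x), q(x) ← Actor(x), q(x) ← Editor(x).
UCQ = [("Person", 0), ("cast", 1), ("Actor", 0), ("Editor", 0)]


def unfold(ucq, mapping):
    """Replace every atom by each of its SQL definitions and combine them with UNION."""
    selects = []
    for pred, pos in ucq:
        for table, cols, where in mapping.get(pred, []):
            sql = f"SELECT {cols[pos]} AS x FROM {table}"
            if where:
                sql += f" WHERE {where}"
            selects.append(sql)
    return "\nUNION\n".join(selects)


print(unfold(UCQ, MAPPING))
```

The first two subqueries coincide, because both Person(x) and cast(z, x) are defined by a projection of castinfo; this is the redundancy discussed next.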

Issues
The issues with these rewritings are:
– Large size (n1 × · · · × nk)
– Largely redundant (w.r.t. query containment)
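The product n1 × · · · × nk arises because each atom is rewritten independently. As a simple illustration (a hypothetical query, not the one on the slides), q(x, y) ← Person(x), Person(y) has eight alternatives per atom under T, so its UCQ rewriting contains 8 × 8 CQs:

```python
from itertools import product

# The eight alternatives for an atom Person(v) in the rewriting w.r.t. T;
# z_v is a fresh existential variable introduced for the range of cast.
PERSON_ALTERNATIVES = ["Person({v})", "cast(z_{v}, {v})", "Actor({v})", "Actress({v})",
                       "Producer({v})", "Director({v})", "Writer({v})", "Editor({v})"]


def expand(answer_vars):
    """All CQ bodies obtained by rewriting each atom Person(v) independently."""
    per_atom = [[alt.format(v=v) for alt in PERSON_ALTERNATIVES] for v in answer_vars]
    return [", ".join(body) for body in product(*per_atom)]


print(len(expand(["x", "y"])))        # prints 64 for q(x, y) ← Person(x), Person(y)
print(len(expand(["x", "y", "w"])))   # prints 512: one more Person atom multiplies by 8
```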

In the literature, we find two solutions:
– Encoding the rewriting as a Datalog program. For example, given the query:
q(x, y) ← Person(x), Person(y), cast(m, x), cast(m, z)

we generate the rewriting:
Person(x) ← cast(m, x)
Person(x) ← Actor(x)
. . .
Person(x) ← Editor(x)
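A Datalog rewriting keeps Person as an intensional predicate, so the Person rules are evaluated once instead of being multiplied into every CQ. The following sketch evaluates such a program bottom-up over a hypothetical toy ABox; it lists only the Person rules spelled out above.

```python
# Hypothetical toy ABox; cast facts are (movie, person) pairs.
ABOX = {
    "cast": {("m1", "p1"), ("m1", "p2")},
    "Actor": {("p1",)},
    "Editor": {("p3",)},
}

# Rules Person(x) ← pred(...) written as (body predicate, position of x);
# ("Person", 0) copies explicit Person facts. The elided rules are omitted.
PERSON_RULES = [("Person", 0), ("cast", 1), ("Actor", 0), ("Editor", 0)]


def eval_person(abox):
    """Evaluate the Person rules once, bottom-up."""
    return {fact[pos] for pred, pos in PERSON_RULES for fact in abox.get(pred, set())}


def eval_query(abox):
    """q(x, y) ← Person(x), Person(y), cast(m, x), cast(m, z) over the computed Person."""
    person = eval_person(abox)
    cast = abox.get("cast", set())
    return {
        (x, y)
        for (m, x) in cast
        if any(m2 == m for (m2, _z) in cast)   # cast(m, z)
        for y in person
        if x in person
    }
```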

But...
The query still needs to be unfolded into an SQL query. There are two choices here:
– generate SQL queries with nested UNIONs, which is very bad for performance;
– expand into a UCQ, which brings us back to square one.

Issues (cont.)
– Large size (n1 × · · · × nk)
– Using query containment to clean the output. For example, to detect that this:
. . .
can be simplified to . . .

But...
– Query containment is an extremely expensive operation (already NP-complete for CQs).
– We are working with large sets of queries.
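Containment between two CQs is decided by searching for a homomorphism (Chandra and Merlin), and the brute-force search below makes the cost visible: it tries every mapping from the variables of one query to the terms of the other. The convention that variables start with "?" is an assumption of this sketch, and no TBox is taken into account.

```python
from itertools import product

# A CQ is (head terms, body atoms); an atom is (predicate, terms).
# Assumed convention: variables start with "?", every other term is a constant.


def is_var(term):
    return term.startswith("?")


def contained_in(q1, q2):
    """True iff q1 ⊆ q2, i.e. some homomorphism maps q2 into q1 (Chandra-Merlin)."""
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False
    vars2 = sorted({t for _, args in body2 for t in args if is_var(t)}
                   | {t for t in head2 if is_var(t)})
    terms1 = sorted({t for _, args in body1 for t in args} | set(head1))
    atoms1 = {(p, tuple(args)) for p, args in body1}
    # Brute force: |terms1| ** |vars2| candidate mappings, hence the cost.
    for image in product(terms1, repeat=len(vars2)):
        h = dict(zip(vars2, image))
        if tuple(h.get(t, t) for t in head2) != tuple(head1):
            continue
        if all((p, tuple(h.get(t, t) for t in args)) in atoms1 for p, args in body2):
            return True
    return False


Q_LONG = (("?x",), [("cast", ("?m", "?x")), ("cast", ("?m2", "?x"))])
Q_SHORT = (("?x",), [("cast", ("?m", "?x"))])
Q_ACTOR = (("?x",), [("Actor", ("?x",))])
Q_PERSON = (("?x",), [("Person", ("?x",))])

print(contained_in(Q_LONG, Q_SHORT), contained_in(Q_SHORT, Q_LONG))  # equivalent CQs
print(contained_in(Q_ACTOR, Q_PERSON))  # not contained without the TBox
```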

Roots of the problem
There are three main reasons for large CQ rewritings and unfoldings:
(E) Sub-queries of q with existentially quantified variables can be folded in many different ways to match the canonical model (existential trees); e.g., consider the axiom
Person ⊑ ∃hasFather.Person
and the query
q(x) ← hasFather(x, y), hasFather(y, z)
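To illustrate (E), the sketch below builds the relevant part of the canonical model for Person ⊑ ∃hasFather.Person (two levels of anonymous fathers suffice for this two-atom query) and evaluates the query over it. The answers correspond to the three ways of folding the query: q(x) ← hasFather(x, y), hasFather(y, z); q(x) ← hasFather(x, y), Person(y); and q(x) ← Person(x). The "/f" naming of anonymous individuals is hypothetical.

```python
# Person ⊑ ∃hasFather.Person: every Person has a (possibly anonymous) father who is a Person.
# Anonymous individuals are named "<child>/f" here, a hypothetical naming scheme.


def canonical_model(persons, has_father, depth):
    """Unfold Person ⊑ ∃hasFather.Person `depth` levels below every Person."""
    persons, has_father = set(persons), set(has_father)
    frontier = set(persons)
    for _ in range(depth):
        fresh = set()
        for ind in frontier:
            father = f"{ind}/f"
            has_father.add((ind, father))
            fresh.add(father)
        persons |= fresh
        frontier = fresh
    return persons, has_father


def certain_answers(named, persons, has_father):
    """Answers to q(x) ← hasFather(x, y), hasFather(y, z) among named individuals."""
    # The query has two atoms, so two levels of the canonical model suffice.
    _, edges = canonical_model(persons, has_father, depth=2)
    return {x for (x, y) in edges if x in named and any(y2 == y for (y2, _z) in edges)}
```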

(H) The concepts and roles for atoms in q can have many
sub-concepts and sub-roles according to T,

(M) The mapping M can have multiple definitions of the ontology terms.
Most of the proposed rewriting techniques try to tame (E).

More about (E)
– It is incurable in theory.
– It is independent of (H) and (M).

However
– Rewriting algorithms deal with (E) and (H) at the same time.
– Real-world queries and TBoxes generate few queries when (E) is dealt with in isolation.
– Even artificially constructed queries and TBoxes become simple.

The most serious issues in query rewriting are (H) and (M).

In Ontop, we deal with (H) and (M) separately from (E), through T-mappings and tree-witness rewritings.

Dealing with (H) and (M): T-Mappings
A T-mapping M_T is a transformation of M that enforces all (H) entailments (H-completeness); formally,
if M ⊨ A(c) and T ⊨ A ⊑ B, then M_T ⊨ B(c).

T-mapping example 1 (domain/range)
Consider two DB relations, title[m, t, y] and castinfo[p, m, r], and an ontology MO describing the film domain as follows:
Movie ≡ ∃cast
Let M be the following mappings:
Movie(m) ← title(m, t, y),
cast(m, p) ← castinfo(p, m, r).

Since T ⊨ ∃cast ⊑ Movie, the T-mapping M_T also contains:
Movie(m) ← castinfo(p, m, r).

T-mapping example 2 (hierarchies)
Consider the TBox T above (Actor ⊑ Person, . . ., Editor ⊑ Person) and the mapping M:
Actor(p) ← castinfo(p, m, "c1") · · ·
Editor(p) ← castinfo(p, m, "c6").

The T-mapping M_T additionally contains:
Person(p) ← castinfo(p, m, "c1") · · ·
Person(p) ← castinfo(p, m, "c6").
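Both examples can be reproduced by a small saturation procedure: whenever a rule populates a basic concept A and T ⊨ A ⊑ B for a concept name B, a copy of the rule with head B is added. This is a minimal sketch assuming single-head rules, opaque bodies and no role inclusions; it is not Ontop's implementation.

```python
# Mapping rules as (head predicate, head terms, body); bodies are opaque strings.
M = [
    ("Movie", ("m",), "title(m, t, y)"),
    ("cast", ("m", "p"), "castinfo(p, m, r)"),
    ("Actor", ("p",), 'castinfo(p, m, "c1")'),
    ("Editor", ("p",), 'castinfo(p, m, "c6")'),
]

# Concept inclusions of T used by the two examples (no role inclusions);
# "∃cast" is the domain of cast and "∃cast-" its range.
T = [("Movie", "∃cast"), ("∃cast", "Movie"), ("∃cast-", "Person"),
     ("Actor", "Person"), ("Editor", "Person")]


def populated(rule):
    """The basic concepts a rule populates, with the corresponding head term and body."""
    pred, terms, body = rule
    if len(terms) == 1:
        return [(pred, terms[0], body)]
    return [(f"∃{pred}", terms[0], body), (f"∃{pred}-", terms[1], body)]


def t_mapping(mapping, tbox):
    """Saturate the mapping: if A(x) ← body and T |= A ⊑ B, add B(x) ← body."""
    result = set(mapping)
    derived = {fact for rule in mapping for fact in populated(rule)}
    changed = True
    while changed:
        changed = False
        for concept, term, body in list(derived):
            for sub, sup in tbox:
                if sub == concept and (sup, term, body) not in derived:
                    derived.add((sup, term, body))
                    if not sup.startswith("∃"):   # only concept names become rule heads
                        result.add((sup, (term,), body))
                    changed = True
    return result
```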

Optimising T-mappings
T-mappings allow us to deal with hierarchical reasoning (H) at the level of the unfolding. At this point, we can exploit
– DB dependencies and
– SQL expressivity
to reduce, and often avoid, the exponential growth coming from (H) and (M).

Optimising with Dependencies
A first optimisation is query containment (w.r.t. dependencies).

Example
Consider the previous example: since T ⊨ ∃cast ⊑ Movie, the T-mapping contains:
Movie(m) ← title(m, t, y),
Movie(m) ← castinfo(p, m, r).

The latter rule is redundant since IMDb contains the foreign key
castinfo(p, m, r) ⇝ title(m, t, y), i.e., every movie m in castinfo also occurs in title.
This step is crucial to reduce the growth due to inferences related to
domain and range.
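The redundancy check can be sketched for unary rules with single-atom bodies: a rule A(x) ← R(...) is dropped when another kept rule A(x) ← S(...) covers it through an inclusion dependency from R's column to S's column. The column names and the dependency set are assumptions taken from the example.

```python
# Unary, single-atom T-mapping rules: (head predicate, source table, column of the head variable).
RULES = [("Movie", "title", "m"), ("Movie", "castinfo", "m")]

# Inclusion dependencies (foreign keys) as ((table, column), (referenced table, column)).
# Assumed here: castinfo.m references title.m.
FOREIGN_KEYS = {(("castinfo", "m"), ("title", "m"))}


def drop_redundant(rules, fks):
    """Drop A(x) ← R(...) when a kept rule A(x) ← S(...) covers it via R.col ⊆ S.col."""
    kept = list(rules)
    for rule in rules:
        pred, table, col = rule
        if any(other != rule and other[0] == pred
               and ((table, col), (other[1], other[2])) in fks
               for other in kept):
            kept.remove(rule)
    return kept


print(drop_redundant(RULES, FOREIGN_KEYS))
```

Checking against the rules that are still kept (rather than all rules) ensures that two rules covering each other through mutual dependencies do not both disappear.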

Optimising with SQL expressivity

Observation. The only means for perfect reformulations to deal with (H) is disjunction (UNION), and DBMSs are not good at planning UNIONs.

However, at the level of the unfolding and mappings, we have full SQL expressivity (e.g., disjunction (OR), inequalities, etc.).

Objective
Given a T-mapping, define mapping transformations that entail the same ABox using fewer mappings, while ensuring that the encoding used is efficient during execution.

Use OR and inequalities to re-express mappings for hierarchies and
discriminant columns.

Dealing with discriminant columns
For example, the T-mapping for IMDb and MO contains six rules for sub-concepts of Person:
Person(p) ← castinfo(p, m, "c1")
· · ·

These can be reduced to a single rule:
Person(p) ← castinfo(p, m, r), (r = "c1") ∨ · · · ∨ (r = "c6").
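The merge can be sketched as grouping rules that differ only in the constant on the discriminant column and emitting one SQL query per group. The sketch assumes that the six type codes are "c1" to "c6" and that the columns of castinfo are named p, m and r; r IN ('c1', ..., 'c6') would be an equivalent SQL encoding of the disjunction.

```python
from collections import defaultdict

# T-mapping rules Person(p) ← castinfo(p, m, "ci"), as
# (head predicate, table, selected column, discriminant column, constant).
# Assumption: the six type codes are "c1", ..., "c6".
RULES = [("Person", "castinfo", "p", "r", f"c{i}") for i in range(1, 7)]


def merge_discriminants(rules):
    """Group rules that differ only in the constant on the discriminant column
    and emit one SQL query per group, combining the constants with OR."""
    groups = defaultdict(list)
    for pred, table, col, disc, const in rules:
        groups[(pred, table, col, disc)].append(const)
    merged = []
    for (pred, table, col, disc), consts in groups.items():
        condition = " OR ".join(f"{disc} = '{c}'" for c in consts)
        merged.append((pred, f"SELECT {col} FROM {table} WHERE {condition}"))
    return merged


for pred, sql in merge_discriminants(RULES):
    print(pred, "←", sql)
```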

The architecture of Ontop
[Figure: the Ontop architecture: the CQ q is rewritten w.r.t. the ontology T into a UCQ q_tw by the tree-witness rewriting; the ontology T, the mapping M and the dependencies Σ are combined into a T-mapping (ABox completion, SQO) that virtualises an H-complete ABox A; unfolding q_tw w.r.t. the T-mapping (with SQO) yields SQL over the data D]
Highlights: (H) and (M) are dealt with by T-mappings, the rewriting targets H-complete ABoxes, and SQO is used extensively over the unfolding.

Other Optimisations in Ontop
We also apply other important optimisations during system setup and at query time; the most important are:
– Equivalence simplification: simplify the ontology vocabulary w.r.t. equivalence (keep one representative of each equivalence class).
– Semantic query optimisation: optimise each generated query individually (see the following slides).
– Emptiness indexes: keep track of empty predicates.
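Equivalence simplification can be sketched as computing mutual subsumption and mapping each name to one representative of its equivalence class. Picking the lexicographically smallest name is an arbitrary choice made here; the inclusions come from the example TBox.

```python
# Inclusions between basic concepts taken from the example TBox
# (Movie ≡ ∃title, Movie ⊑ ∃year, Movie ≡ ∃cast).
NAMES = ["Movie", "∃title", "∃year", "∃cast"]
INCLUSIONS = [("Movie", "∃title"), ("∃title", "Movie"), ("Movie", "∃year"),
              ("Movie", "∃cast"), ("∃cast", "Movie")]


def representatives(names, inclusions):
    """Map every name to the representative of its equivalence class."""
    above = {n: {n} for n in names}          # above[n]: everything n is subsumed by
    changed = True
    while changed:                            # naive transitive closure
        changed = False
        for sub, sup in inclusions:
            missing = above[sup] - above[sub]
            if missing:
                above[sub] |= missing
                changed = True
    # n ≡ m iff each is subsumed by the other; the smallest name represents the class
    return {n: min(m for m in above[n] if n in above[m]) for n in names}


print(representatives(NAMES, INCLUSIONS))
```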

Results
A summary of the results we have observed using this architecture:
– there are few mappings per class/property;
– query rewritings are small;
– the SQL queries generated in this way often correspond to what a human expert would have written;
– execution of SPARQL queries with entailments is fast, often much faster than in triple stores.
Query rewriting can be done efficiently.

Benchmarks
[Figure: benchmark results (log-scale axis from 0.1 to 1,000,000) for OWLIM, Stardog and Ontop on queries R1-R5, Q1-Q6 and V7-V10]
Benchmark: LUBMex, 200 Unis (30M triples). Systems: OWLIM
(forward chaining), Stardog (rewriting), Ontop/DB2.
Ontop/DB2:
– returns immediately for 5/15 queries,
– is faster than the other systems in 12/15 queries.

Summary
Results so far
– Efficiently dealt with the exponential growth from (H) and (M).
– Used dependencies and CQC/SQO (conjunctive query containment and semantic query optimisation) to minimise and optimise mapping rules.
– Exploited SQL expressivity to transform mappings and minimise their number.

OWL 2 QL query answering with query rewriting is efficient, and materialisation is not required.

Ontop is available as a SPARQL end-point, an OWLAPI and Sesame library, and a Protégé 4 plugin, with many more features (SPARQL, R2RML). It is permanently under development but stable enough to be used seriously in many projects, including Optique.

Current work applies these techniques to more expressive settings, e.g., OWL + Rules, OWL 2 EL and OWL 2 RL, through a hybrid approach.

Semantic Query Optimisation
Consider the query
q(t, y) ← Movie(m), title(m, t), year(m, y), (y > 2010)
By straightforwardly applying the unfolding to q_tw and the T-mapping M above, we obtain the query
q′_tw(t, y) ← title(m, t0, y0), title(m, t, y1), title(m, t2, y), (y > 2010),
which requires two (potentially) expensive join operations.
However, by using the primary key m of title, we obtain:
q′′_tw(t, y) ← title(m, t, y), (y > 2010).
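The primary-key step can be sketched as repeated unification: two atoms over the same relation that agree on the key must denote the same tuple, so their remaining terms can be identified. The sketch assumes that the key of title is its first column and keeps the answer and filter variables t and y; it does not handle two distinct protected variables being unified (a real system would have to add an equality instead).

```python
# Assumed primary keys: relation -> position of its key column (title[m, t, y] is keyed by m).
PRIMARY_KEYS = {"title": 0}


def eliminate_key_joins(atoms, protected):
    """Merge atoms over the same relation that share the primary-key term.

    atoms: list of (relation, terms); protected: variables that must survive
    (answer variables and variables used in filters such as y > 2010).
    """
    parent = {}

    def find(t):
        while t in parent:
            t = parent[t]
        return t

    changed = True
    while changed:
        changed = False
        seen = {}
        for rel, terms in atoms:
            if rel not in PRIMARY_KEYS:
                continue
            key = terms[PRIMARY_KEYS[rel]]
            if (rel, key) not in seen:
                seen[(rel, key)] = terms
                continue
            # Same key, hence the same tuple: unify the remaining terms position-wise.
            for a, b in zip(terms, seen[(rel, key)]):
                a, b = find(a), find(b)
                if a != b:
                    if a in protected:       # rename the unprotected variable
                        a, b = b, a
                    parent[a] = b
                    changed = True
        # Apply the substitution and drop duplicate atoms (order preserved).
        atoms = list(dict.fromkeys((rel, tuple(find(t) for t in terms)) for rel, terms in atoms))
    return atoms


Q_UNFOLDED = [("title", ("m", "t0", "y0")), ("title", ("m", "t", "y1")), ("title", ("m", "t2", "y"))]
print(eliminate_key_joins(Q_UNFOLDED, {"t", "y"}))
```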

Semantic Query Optimisation
Semantic query optimisation (SQO) is a field of DB theory focused on the optimisation of queries w.r.t. dependencies.
Semantic query optimisation in DB and OBDA:
– While some SQO techniques reached industrial RDBMSs, SQO never had a strong impact on the database community.
– In OBDA, in contrast, SQL queries are generated automatically, so SQO is the only tool to reach optimal queries.
In practice, an OBDA system must implement at least SQO w.r.t. primary keys and foreign keys to deal with the disparities between RDF and the relational model.

Why does it work?
DBs are created through standard practices that produce exactly the features targeted by the previous optimisations.
Starting from a rich conceptual schema, we encode it in a relational schema by:
– amalgamating N-to-1 and 1-to-1 attributes of an entity into a single n-ary relation with a primary key identifying the entity (e.g., title with title and year),
– using foreign keys over attribute columns when a column refers to an entity (e.g., name and castinfo),
– using type-discriminant columns to encode hierarchical information (e.g., castinfo).
As this process is universal, the T-mappings created for the resulting databases are dramatically simplified by the Ontop optimisations.
