Ontology-mediated query answering with data-tractable description logics
Meghyn Bienvenu (CNRS & Université de Montpellier)
Magdalena Ortiz (Vienna University of Technology)
ontology-mediated query answering (omqa)
[Figure: the OMQA setting. Data: incomplete database (ground facts). Ontology: logical theory. User query. One component is marked “???”.]
ontology-mediated query answering (omqa)
patient data: “Melanie has listeriosis”, “Paul has Lyme disease”
medical knowledge: “Listeriosis & Lyme disease are bacterial infections”
user query: “Find all patients with bacterial infections”
expected answers: Melanie, Paul
ontology-mediated query answering (omqa)
employee data: “Marie is a professor”, “Mark teaches CS200”
org. knowledge: “Professors are teaching staff”, “Someone who teaches is part of the teaching staff”
user query: “Find all teaching staff”
expected answers: Marie, Mark
what are ontologies good for?
To standardize the terminology of an application domain
∙ meaning of terms is constrained, so fewer misunderstandings
∙ by adopting a common vocabulary, easy to share information
To present an intuitive and unified view of data sources
∙ ontology can be used to enrich the data vocabulary, making it easier for users to formulate their queries
∙ especially useful when integrating multiple data sources
To support automated reasoning
∙ uncover implicit connections between terms, errors in modelling
∙ exploit knowledge in the ontology during query answering, to get back a more complete set of answers to queries
applications of omqa: medicine
General medical ontologies: SNOMED CT (∼ 400,000 terms!), GALEN
Specialized ontologies: FMA (anatomy), NCI (cancer), ...
Querying & exchanging medical records (find patients for medical trials)
∙ myocardial infarction vs. MI vs. heart attack vs. 410.0
Supports tools for annotating and visualizing patient data (scans, x-rays)
applications of omqa: life sciences
Hundreds of ontologies at BioPortal (http://bioportal.bioontology.org/):
Gene Ontology (GO), Cell Ontology, Pathway Ontology, Plant Anatomy, ...
Help scientists share, query, & visualize experimental data
applications of omqa: enterprise information systems
Companies and organizations have lots of data
∙ need easy and flexible access to support decision-making
Example industrial projects:
∙ Public debt data: Sapienza Univ. & Italian Department of Treasury
∙ Energy sector: Optique EU project (several universities, StatOil, & Siemens)
our focus: horn description logics
Ontologies formulated using description logics (DLs):
∙ family of decidable fragments of first-order logic
∙ basis for OWL web ontology language (W3C)
∙ range from fairly simple to highly expressive
∙ complexity of query answering well understood
In this tutorial, focus on Horn description logics:
∙ DL-LiteR, EL, ELHI, Horn-SHIQ, ...
∙ good computational properties, well suited for OMQA
∙ still expressive enough for interesting applications
∙ basis for OWL 2 QL and OWL 2 EL profiles
Consider various types of queries
plan for today
∙ Horn Description Logics
∙ Basics of OMQA
∙ Instance Queries
∙ Conjunctive Queries
∙ Navigational Queries
∙ Queries with Negation and Recursion
∙ Research Trends in OMQA
horn description logics
dl basics
Building blocks of DLs:
∙ concept names (unary predicates, classes)
IceCream, Pizza, Meat, SpicyDish, Dish, Menu, Restaurant, ...
∙ role names (binary predicates, properties)
hasIngred, hasCourse, hasDessert, serves, ...
∙ individual names (constants)
menu32, pastadish17, d3, rest156, r12, ...
(specific menus, dishes, restaurants ...)
NC / NR / NI: set of all concept / role / individual names
dl knowledge bases
Knowledge base (KB) = ABox (data) + TBox (ontology)
ABox contains facts about specific individuals
∙ finite set of concept assertions A(a) and role assertions r(a, b)
∙ IceCream(d2): dish d2 is of type IceCream
∙ hasDessert(m, d2): menu m is connected via hasDessert to dish d2
TBox contains general knowledge about the domain of interest
∙ finite set of axioms (details on syntax to follow)
∙ IceCream is a subclass of Dessert
∙ hasCourse connects Menus to Dishes
∙ every Menu is connected to at least one dish via hasCourse
concept and role constructors
Can build complex concepts and roles using constructors:
∙ conjunction (⊓), disjunction (⊔), negation (¬)
   Dessert ⊓ ¬IceCream     Pizza ⊔ PastaDish
∙ restricted forms of existential and universal quantification (∃, ∀)
   ∃hasCourse.⊤     ∃contains.Meat     Dish ⊓ ∀contains.¬Meat
   (⊤ acts as a “wildcard”, denotes set of all things)
∙ inverse (⁻) and composition (·) of roles
   hasCourse⁻     contains · contains
   (use $N_R^{\pm}$ for set of role names and inverse roles)
   (use inv(r) to toggle ⁻: inv(r) = r⁻, inv(r⁻) = r)
Note: set of available constructors depends on the particular DL!
tbox axioms
Concept inclusions C ⊑ D (C, D possibly complex concepts)
   IceCream ⊑ Dessert     Menu ⊑ ∃hasCourse.⊤     Spicy ⊓ Dish ⊑ SpicyDish
Role inclusions R ⊑ S (R, S possibly complex roles)
   hasIngred ⊑ contains     ingredOf⁻ ⊑ hasIngred     hasDessert ⊑ hasCourse
Note: type and syntax of axioms depend on the particular DL!
dl semantics
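Interpretation $\mathcal{I}$ (“possible world”)
∙ domain of objects $\Delta^{\mathcal{I}}$ (possibly infinite set)
∙ interpretation function $\cdot^{\mathcal{I}}$ that maps
   ∙ concept name $A$ ⇝ set of objects $A^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$
   ∙ role name $r$ ⇝ set of pairs of objects $r^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$
   ∙ individual name $a$ ⇝ object $a^{\mathcal{I}} \in \Delta^{\mathcal{I}}$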
example: interpretation
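[Figure: an interpretation $\mathcal{I}$ with domain $\Delta^{\mathcal{I}}$, showing the sets $\mathit{Dish}^{\mathcal{I}}$, $\mathit{Dessert}^{\mathcal{I}}$, $\mathit{Appetizer}^{\mathcal{I}}$, $\mathit{Menu}^{\mathcal{I}}$, the elements $\mathit{italFeast}^{\mathcal{I}}$ and $\mathit{chCake}^{\mathcal{I}}$, and four hasCourse edges]
4 concept names: Dish, Dessert, Appetizer, Menu
1 role name: hasCourse
2 individual names: italFeast, chCake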
dl semantics
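Interpretation function $\cdot^{\mathcal{I}}$ extends to complex concepts and roles:
\[
\begin{array}{ll}
\top^{\mathcal{I}} & = \Delta^{\mathcal{I}} \\
\bot^{\mathcal{I}} & = \emptyset \\
(\neg C)^{\mathcal{I}} & = \Delta^{\mathcal{I}} \setminus C^{\mathcal{I}} \\
(C_1 \sqcap C_2)^{\mathcal{I}} & = C_1^{\mathcal{I}} \cap C_2^{\mathcal{I}} \\
(\exists R.C)^{\mathcal{I}} & = \{ d_1 \mid \text{there exists } (d_1, d_2) \in R^{\mathcal{I}} \text{ with } d_2 \in C^{\mathcal{I}} \} \\
(\forall R.C)^{\mathcal{I}} & = \{ d_1 \mid d_2 \in C^{\mathcal{I}} \text{ for all } (d_1, d_2) \in R^{\mathcal{I}} \} \\
(r^{-})^{\mathcal{I}} & = \{ (d_2, d_1) \mid (d_1, d_2) \in r^{\mathcal{I}} \}
\end{array}
\]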
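The semantics of the constructors can be checked mechanically on a finite interpretation. The following Python sketch is only an illustration: the nested-tuple encoding of concepts, the dictionary layout of an interpretation, and the function names `role_ext` and `concept_ext` are assumptions made here, and the small interpretation is a made-up one, not the one drawn in the figure.

```python
# Assumed encoding of concepts:
#   "A"                         concept name
#   ("top",), ("bot",)          top and bottom
#   ("not", C), ("and", C1, C2)
#   ("exists", R, C), ("forall", R, C)
# Roles: "r" for a role name, ("inv", "r") for its inverse.

def role_ext(role, interp):
    """Extension of a role name or of an inverse role."""
    if isinstance(role, tuple) and role[0] == "inv":
        return {(d2, d1) for (d1, d2) in interp["roles"].get(role[1], set())}
    return set(interp["roles"].get(role, set()))

def concept_ext(concept, interp):
    """Extension of a (possibly complex) concept in a finite interpretation."""
    domain = interp["domain"]
    if isinstance(concept, str):
        return set(interp["concepts"].get(concept, set()))
    op = concept[0]
    if op == "top":
        return set(domain)
    if op == "bot":
        return set()
    if op == "not":
        return domain - concept_ext(concept[1], interp)
    if op == "and":
        return concept_ext(concept[1], interp) & concept_ext(concept[2], interp)
    if op in ("exists", "forall"):
        pairs = role_ext(concept[1], interp)
        filler = concept_ext(concept[2], interp)
        if op == "exists":
            return {d1 for (d1, d2) in pairs if d2 in filler}
        # forall: every successor (if any) must be in the filler
        return {d for d in domain
                if all(d2 in filler for (d1, d2) in pairs if d1 == d)}
    raise ValueError(f"unknown constructor: {op}")

# A small hypothetical interpretation (not the one from the figure).
interp = {
    "domain": {"m", "d1", "d2"},
    "concepts": {"Menu": {"m"}, "Dish": {"d1", "d2"}, "Dessert": {"d2"}},
    "roles": {"hasCourse": {("m", "d1"), ("m", "d2")}},
}

has_course = concept_ext(("exists", "hasCourse", ("top",)), interp)
course_of_menu = concept_ext(("exists", ("inv", "hasCourse"), "Menu"), interp)
only_dishes = concept_ext(("forall", "hasCourse", "Dish"), interp)
non_dessert_dish = concept_ext(("and", "Dish", ("not", "Dessert")), interp)
```

Note that the universal restriction holds vacuously for elements without any hasCourse successor, which follows directly from the definition above.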
back to the example
[Figure: the interpretation $\mathcal{I}$ from the earlier example slide]
   Dish ⊓ Menu     Dessert ⊓ Appetizer     ∃hasCourse.⊤     ∃hasCourse⁻.Dessert
semantics of dl kbs
Satisfaction in an interpretation
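∙ $\mathcal{I}$ satisfies $C \sqsubseteq D$ ⇔ $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$
∙ $\mathcal{I}$ satisfies $R \sqsubseteq S$ ⇔ $R^{\mathcal{I}} \subseteq S^{\mathcal{I}}$
∙ $\mathcal{I}$ satisfies $A(a)$ ⇔ $a^{\mathcal{I}} \in A^{\mathcal{I}}$
∙ $\mathcal{I}$ satisfies $r(a, b)$ ⇔ $(a^{\mathcal{I}}, b^{\mathcal{I}}) \in r^{\mathcal{I}}$
Model of a KB K = interpretation that satisfies all statements in K
K is satisfiable = K has at least one model
K entails α (written K |= α) = every model I of K satisfies α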
back to the example
[Figure: the interpretation $\mathcal{I}$ from the earlier example slide]
Which of the following assertions / axioms are satisfied in I?
   Dessert ⊑ Dish     Dish ⊓ Menu ⊑ ⊥     Menu ⊑ ∃hasCourse.⊤
   ∃hasCourse⁻.⊤ ⊑ Dish     Menu(italFeast)     hasCourse(italFeast, chCake)
some important horn dls
Idea: Horn DLs cannot express disjunction (explicitly or implicitly)
∙ better computational properties than non-Horn DLs (more on this later)
DL-LiteR
∙ concept inclusions B1 ⊑ (¬)B2, where B1, B2 are either A ∈ NC or ∃R (R ∈ $N_R^{\pm}$)
∙ role inclusions R1 ⊑ (¬)R2, where R1, R2 ∈ $N_R^{\pm}$
EL
∙ allows only ⊤, ⊓, and ∃r.C as constructors
∙ only concept inclusions in TBox
ELHI⊥
∙ additionally allows for ⊥ and inverse roles (r⁻)
∙ can also have role inclusions
Horn-SHIQ
∙ limited use of ¬, ∀r.C, and number restrictions (≥ nR.C, ≤ nR.C)
∙ also have transitivity axioms (e.g. assert contains is transitive)
basics of omqa
aboxes vs. databases
ABoxes and databases (DBs) are syntactically similar:
∙ ABox = finite set of assertions (unary and binary facts)
∙ Database = finite set of facts of arbitrary arity
ABoxes interpreted under open world assumption:
∙ every assertion in the ABox is assumed to hold (true)
∙ assertions not present in the ABox may hold or not (unknown)
Each ABox gives rise to many interpretations (its models)
∙ models can be infinite, can have infinitely many models
Databases interpreted under closed world assumption:
∙ every fact in the DB is assumed to hold (true)
∙ every fact not in the DB is assumed not to hold (false)
In other words, each DB corresponds to a single finite interpretation
∙ domain of the interpretation = set of constants in DB
querying databases
Database query q of arity n maps (Boolean query = arity 0)
Database D ⇝ ans(q, D) = set of n-tuples of constants from D
Interpretation I ⇝ ans(q, I) = set of n-tuples of elements from I
First-order (FO) query = first-order formula
∙ arity of FO query = number of free variables
∙ answers = substitutions for free vars that make formula hold
∙ example: Dish(x) ∧ ∀y.(contains(x, y) → ¬Spicy(y))
Datalog queries = finite set of Datalog rules + ‘goal’ relation
∙ arity of Datalog query = arity of goal relation
∙ answers = exhaustively apply rules to DB / interpretation, collect tuples in goal relation
∙ example: rules
   contains(x, z) ← contains(x, y), contains(y, z)
   SpicyDish(x) ← Dish(x), contains(x, y), Spicy(y)
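To make "exhaustively apply rules" concrete, here is a minimal sketch of naive bottom-up Datalog evaluation in Python, run on the two example rules. The tuple encoding, the helper names `match` and `evaluate`, and the small database are assumptions for illustration; the sketch also assumes that every argument in a rule is a variable (no constants), which holds for the two rules above.

```python
def match(atom, fact, subst):
    """Extend subst so that atom (pred, var, ...) matches fact (pred, const, ...).
    Returns the extended substitution, or None if there is no match."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    subst = dict(subst)
    for var, const in zip(atom[1:], fact[1:]):
        if subst.setdefault(var, const) != const:
            return None
    return subst

def evaluate(rules, facts):
    """Apply the rules until no new fact can be derived (naive evaluation)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            substs = [{}]
            for atom in body:
                substs = [s2 for s in substs for f in facts
                          if (s2 := match(atom, f, s)) is not None]
            for s in substs:
                new_fact = (head[0],) + tuple(s[v] for v in head[1:])
                if new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts

# The two rules from the slide; the goal relation is SpicyDish.
rules = [
    (("contains", "x", "z"), [("contains", "x", "y"), ("contains", "y", "z")]),
    (("SpicyDish", "x"), [("Dish", "x"), ("contains", "x", "y"), ("Spicy", "y")]),
]

# A hypothetical database.
db = {
    ("Dish", "d"),
    ("contains", "d", "sauce"),
    ("contains", "sauce", "chili"),
    ("Spicy", "chili"),
}

result = evaluate(rules, db)
answers = {f[1:] for f in result if f[0] == "SpicyDish"}
```

The dish `d` is an answer only because the recursive rule first derives `contains(d, chili)`; production Datalog engines use more efficient strategies such as semi-naive evaluation.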
querying dl knowledge bases
Problem: each KB gives rise to multiple interpretations (its models),
but DB query semantics defines answers w.r.t. a single interpretation
Solution: adopt certain answer semantics
∙ require tuple to be an answer w.r.t. all models of KB
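Formally: Call a tuple $(a_1, \ldots, a_n)$ of individuals from $\mathcal{A}$ a certain answer to n-ary query $q$ over DL KB $\mathcal{K} = (\mathcal{T}, \mathcal{A})$ iff
\[
(a_1^{\mathcal{I}}, \ldots, a_n^{\mathcal{I}}) \in \mathrm{ans}(q, \mathcal{I}) \quad \text{for every model } \mathcal{I} \text{ of } \mathcal{K}
\]
Ontology-mediated query answering (OMQA) = computing certain answers to queries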
example: certain answers
Consider the query q(x) = Dessert(x) and the DL-LiteR KB K = (T , A):
T = { Cake ⊑ Dessert,  IceCream ⊑ Dessert,  hasDessert ⊑ hasCourse,
      ∃hasCourse ⊑ Menu,  ∃hasDessert⁻ ⊑ Dessert }
A = { Cake(d1),  IceCream(d2),  Dessert(d3),  hasDessert(m, d4) }
There are four certain answers to q w.r.t. K:
∙ d1 ∈ cert(q, K): Cake(d1) ∈ A, Cake ⊑ Dessert ∈ T
∙ d2 ∈ cert(q, K): IceCream(d2) ∈ A, IceCream ⊑ Dessert ∈ T
∙ d3 ∈ cert(q, K): Dessert(d3) ∈ A
∙ d4 ∈ cert(q, K): hasDessert(m, d4) ∈ A, ∃hasDessert⁻ ⊑ Dessert ∈ T
The fifth individual m is not a certain answer: can construct model $\mathcal{J}$ of K in which $m^{\mathcal{J}} \notin \mathit{Dessert}^{\mathcal{J}}$ (see lecture notes).
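The argument that m is not a certain answer can be replayed by exhibiting one countermodel. The Python sketch below checks that a hand-built finite interpretation J satisfies every axiom and assertion of K while leaving m outside Dessert. The encoding, the helper names `basic_ext` and `is_model`, and the particular J are assumptions made for this illustration (individuals are interpreted as themselves); the lecture notes may use a different model.

```python
def basic_ext(b, interp):
    """Extension of a basic concept: a name, ("exists", r) or ("exists_inv", r)."""
    if isinstance(b, str):
        return interp["concepts"].get(b, set())
    kind, role = b
    pairs = interp["roles"].get(role, set())
    if kind == "exists":
        return {x for (x, _) in pairs}
    return {y for (_, y) in pairs}  # "exists_inv"

def is_model(interp, concept_incl, role_incl, abox):
    """Check positive concept/role inclusions and ABox assertions in interp."""
    ok_concepts = all(basic_ext(c, interp) <= basic_ext(d, interp)
                      for c, d in concept_incl)
    ok_roles = all(interp["roles"].get(r, set()) <= interp["roles"].get(s, set())
                   for r, s in role_incl)
    ok_abox = all(
        f[1] in interp["concepts"].get(f[0], set()) if len(f) == 2
        else (f[1], f[2]) in interp["roles"].get(f[0], set())
        for f in abox
    )
    return ok_concepts and ok_roles and ok_abox

# The KB from the slide.
concept_incl = [
    ("Cake", "Dessert"),
    ("IceCream", "Dessert"),
    (("exists", "hasCourse"), "Menu"),
    (("exists_inv", "hasDessert"), "Dessert"),
]
role_incl = [("hasDessert", "hasCourse")]
abox = [("Cake", "d1"), ("IceCream", "d2"), ("Dessert", "d3"),
        ("hasDessert", "m", "d4")]

# A hypothetical countermodel J: m is a Menu but not a Dessert.
J = {
    "concepts": {"Cake": {"d1"}, "IceCream": {"d2"},
                 "Dessert": {"d1", "d2", "d3", "d4"}, "Menu": {"m"}},
    "roles": {"hasDessert": {("m", "d4")}, "hasCourse": {("m", "d4")}},
}

j_is_model = is_model(J, concept_incl, role_incl, abox)
m_is_dessert_in_j = "m" in J["concepts"]["Dessert"]
```

A single model in which the tuple is not an answer suffices to refute certainty; proving that d1 to d4 are certain answers requires an argument over all models, which no finite enumeration provides.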
key techniques for omqa: query rewriting
Query rewriting: Reduces problem of finding certain answers to
standard DB query evaluation (⇝ exploit existing DB systems)
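[Diagram: query rewriting takes the TBox $\mathcal{T}$ and the query $q$ and produces a database query $q'$; query evaluation takes $q'$ and the ABox $\mathcal{A}$ and produces the query answers]
Call $q'(\vec{x})$ a rewriting of $q(\vec{x})$ and $\mathcal{T}$ iff for every ABox $\mathcal{A}$ and tuple $\vec{a}$
\[
\mathcal{T}, \mathcal{A} \models q(\vec{a}) \iff \vec{a} \in \mathrm{ans}(q'(\vec{x}), \mathcal{I}_{\mathcal{A}}) \qquad (\mathcal{I}_{\mathcal{A}} = \text{treat } \mathcal{A} \text{ as DB})
\]
Types of rewritings: FO-rewritings (SQL), Datalog rewritings, ...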
key techniques for omqa: saturation
Saturation: Render explicit (some of) the implicit information
contained in the KB, making it available for query evaluation
Simple use of saturation: (works e.g. for RDFS ontologies)
∙ use saturation to ‘complete’ the ABox by adding those assertions
that are logically entailed from the KB
∙ then evaluate the query over the saturated ABox
More complex uses:
∙ enrich the ABox in other ways (e.g. add new ABox individuals to
witness the existential restrictions ∃R.C)
∙ combine saturation with query rewriting
complexity of omqa
View OMQA as a decision problem (yes-or-no question):
Problem: Q answering in L (Q a query language, L a DL)
Input: An n-ary query q ∈ Q, an ABox A, an L-TBox T, and a tuple $\vec{a} \in \mathrm{Ind}(\mathcal{A})^n$
Question: Does ⃗a belong to cert(q, (T , A))?
Combined complexity: in terms of size of whole input
Data complexity: in terms of size of A only
∙ view rest of input as fixed (of constant size)
∙ motivation: ABox typically much larger than rest of input
Note: use |A| to denote size of A (similarly for |T |, |q|, etc.)
complexity classes
We will mention the following standard classes:
   P           problems solvable in deterministic polynomial time
   NP          problems solvable in non-det. polynomial time
   coNP        problems whose complement is solvable in non-deterministic polynomial time
   LogSpace    problems solvable in deterministic logarithmic space
   NLogSpace   problems solvable in non-det. logarithmic space
   PSpace      problems solvable in polynomial space (note: = NPSpace)
   Exp         problems solvable in deterministic exponential time
Another less known but important class:
   AC⁰         problems solvable by uniform family of polynomial-size constant-depth circuits
Relationships between classes:
   AC⁰ ⊊ LogSpace ⊆ NLogSpace ⊆ P ⊆ NP ⊆ PSpace ⊆ Exp
instance queries
instance queries
Instance queries (IQs): find instances of a given concept or role
   A(x) where A ∈ NC      (concept instance query)
   r(x, y) where r ∈ NR   (role instance query)
To query for a complex concept C, take $A_C(x)$ for fresh $A_C \in N_C$ and add $C \sqsubseteq A_C$ to the TBox
Remarks:
∙ Instance query answering is often called instance checking
∙ Focus of OMQA until mid-2000s
instance checking in dl-lite via query rewriting
Input = instance query q + DL-LiteR TBox T
We construct an FO-rewriting of q w.r.t. T
More specifically, we construct:
∙ an FO-rewriting of q relative to consistent ABoxes, and
∙ an FO-rewriting of unsatisfiability
(these can be easily combined into FO-rewriting of q for all ABoxes)
rewriting relative to consistent aboxes
We first define two procedures:
ComputeSubsumees: all reasons for an individual to be in B
   input: concept B, TBox T
   output: set of C such that T |= C ⊑ B ⇝ subsumees of B w.r.t. T
ComputeSubroles: all reasons for a pair to be in R
   input: role R, TBox T
   output: set of S such that T |= S ⊑ R ⇝ subroles of R w.r.t. T
computing subsumees
Algorithm ComputeSubsumees
Input: DL-LiteR TBox T, concept B ∈ NC ∪ {∃R | R ∈ $N_R^{\pm}$}
1. Initialize Subsumees = {B} and Examined = ∅.
2. While Subsumees \ Examined ≠ ∅
   2.1 Select D ∈ Subsumees \ Examined and add D to Examined.
   2.2 For every concept inclusion C ⊑ D ∈ T
       ∙ If C ∉ Subsumees, add C to Subsumees
   2.3 For every role inclusion R ⊑ S ∈ T such that D = ∃S
       ∙ If ∃R ∉ Subsumees, add ∃R to Subsumees
   2.4 For every role inclusion R ⊑ S ∈ T such that D = ∃inv(S)
       ∙ If ∃inv(R) ∉ Subsumees, add ∃inv(R) to Subsumees
3. Return Subsumees.
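The algorithm translates almost line by line into Python. The following is a sketch under stated assumptions: basic concepts are encoded as a string (concept name) or `("exists", role)`, roles as a string or `("inv", name)`, only positive inclusions are passed in, and the names `inv` and `compute_subsumees` are chosen here for illustration.

```python
def inv(role):
    """Toggle the inverse: inv(r) = r⁻ and inv(r⁻) = r."""
    return role[1] if isinstance(role, tuple) else ("inv", role)

def compute_subsumees(concept_incl, role_incl, b):
    """Worklist version of ComputeSubsumees.
    concept_incl: pairs (C, D) for C ⊑ D; role_incl: pairs (R, S) for R ⊑ S."""
    subsumees = {b}
    examined = set()
    while subsumees - examined:
        d = next(iter(subsumees - examined))   # step 2.1
        examined.add(d)
        for c, rhs in concept_incl:            # step 2.2
            if rhs == d:
                subsumees.add(c)
        if isinstance(d, tuple) and d[0] == "exists":
            for r, s in role_incl:
                if d[1] == s:                  # step 2.3: D = ∃S
                    subsumees.add(("exists", r))
                if d[1] == inv(s):             # step 2.4: D = ∃inv(S)
                    subsumees.add(("exists", inv(r)))
    return subsumees

# The TBox used in the example slides.
concept_incl = [
    ("ItalDish", "Dish"),
    ("VegDish", "Dish"),
    ("Dish", ("exists", "hasIngred")),
    (("exists", ("inv", "hasCourse")), "Dish"),
]
role_incl = [("hasMain", "hasCourse"), ("hasDessert", "hasCourse")]

dish_subsumees = compute_subsumees(concept_incl, role_incl, "Dish")
```

Using a set for Subsumees makes the explicit "if not already present" tests of the pseudocode unnecessary; the order in which concepts are selected does not affect the final result.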
computing subsumees: an example (1/3)
ComputeSubsumees on (T, Dish), where T:
   ItalDish ⊑ Dish
   VegDish ⊑ Dish
   Dish ⊑ ∃hasIngred
   ∃hasCourse⁻ ⊑ Dish
   hasMain ⊑ hasCourse
   hasDessert ⊑ hasCourse
Initially:
   Examined = ∅
   Subsumees = {Dish}
Choose: Dish
   Examined = {Dish}
   Subsumees = {Dish, ItalDish, VegDish, ∃hasCourse⁻}
Choose: ItalDish
   Examined = {Dish, ItalDish}
   Subsumees = {Dish, ItalDish, VegDish, ∃hasCourse⁻}
computing subsumees: an example (2/3)
Choose: VegDish
   Examined = {Dish, ItalDish, VegDish}
   Subsumees = {Dish, ItalDish, VegDish, ∃hasCourse⁻}
Choose: ∃hasCourse⁻
   Examined = {Dish, ItalDish, VegDish, ∃hasCourse⁻}
   Subsumees = {Dish, ItalDish, VegDish, ∃hasCourse⁻, ∃hasMain⁻, ∃hasDessert⁻}
computing subsumees: an example (3/3)
Choose: ∃hasMain⁻
   Examined = {Dish, ItalDish, VegDish, ∃hasCourse⁻, ∃hasMain⁻}
   Subsumees = {Dish, ItalDish, VegDish, ∃hasCourse⁻, ∃hasMain⁻, ∃hasDessert⁻}
Choose: ∃hasDessert⁻
   Examined = {Dish, ItalDish, VegDish, ∃hasCourse⁻, ∃hasMain⁻, ∃hasDessert⁻}
   Subsumees = {Dish, ItalDish, VegDish, ∃hasCourse⁻, ∃hasMain⁻, ∃hasDessert⁻}
computing subroles
Algorithm ComputeSubroles
Input: DL-LiteR TBox T, role R ∈ $N_R^{\pm}$
1. Initialize Subroles = {R} and Examined = ∅.
2. While Subroles \ Examined ≠ ∅
   2.1 Select S ∈ Subroles \ Examined and add S to Examined.
   2.2 For every role inclusion U ⊑ S or inv(U) ⊑ inv(S) in T
       ∙ If U ∉ Subroles, add U to Subroles
3. Return Subroles.
Example, with T:
   ItalDish ⊑ Dish
   VegDish ⊑ Dish
   Dish ⊑ ∃hasIngred
   ∃hasCourse⁻ ⊑ Dish
   hasMain ⊑ hasCourse
   hasDessert ⊑ hasCourse
Run on hasCourse:
   Subroles = {hasCourse, hasMain, hasDessert}
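ComputeSubroles follows the same worklist pattern. The sketch below assumes roles are encoded as a string (role name) or `("inv", name)` and that the TBox's positive role inclusions are given as pairs; the function names are illustrative choices, not taken from the slides.

```python
def inv(role):
    """Toggle the inverse: inv(r) = r⁻ and inv(r⁻) = r."""
    return role[1] if isinstance(role, tuple) else ("inv", role)

def compute_subroles(role_incl, r):
    """Worklist version of ComputeSubroles.
    role_incl: pairs (U, V) standing for the role inclusion U ⊑ V."""
    subroles = {r}
    examined = set()
    while subroles - examined:
        s = next(iter(subroles - examined))    # step 2.1
        examined.add(s)
        for u, v in role_incl:                 # step 2.2
            if v == s:                         # axiom U ⊑ S
                subroles.add(u)
            if inv(v) == s:                    # axiom of the form inv(U') ⊑ inv(S)
                subroles.add(inv(u))
    return subroles

role_incl = [("hasMain", "hasCourse"), ("hasDessert", "hasCourse")]

course_subroles = compute_subroles(role_incl, "hasCourse")
inverse_course_subroles = compute_subroles(role_incl, ("inv", "hasCourse"))
```

The second call shows why step 2.2 also looks at inverted axioms: hasMain ⊑ hasCourse implies hasMain⁻ ⊑ hasCourse⁻ even though the latter is not written in the TBox.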
from concepts and roles to queries
Let $S_C$ = ComputeSubsumees(A, T), $S_R$ = ComputeSubroles(r, T).
Rewriting of A(x) w.r.t. T (and consistent ABoxes):
\[
\mathrm{RewriteIQ}(A, \mathcal{T}) = \bigvee_{C \in S_C \cap N_C} C(x) \;\vee\; \bigvee_{\exists r \in S_C} \exists y.\, r(x, y) \;\vee\; \bigvee_{\exists r^{-} \in S_C} \exists y.\, r(y, x)
\]
Rewriting of r(x, y) w.r.t. T (and consistent ABoxes):
\[
\mathrm{RewriteIQ}(r, \mathcal{T}) = \bigvee_{s \in S_R} s(x, y) \;\vee\; \bigvee_{s^{-} \in S_R} s(y, x)
\]
The rewriting is ABox-independent and polysize in |T| and |q|.
example of query rewriting (1/2)
We have already computed:
ComputeSubsumees(Dish, T) = {Dish, ItalDish, VegDish, ∃hasCourse⁻, ∃hasMain⁻, ∃hasDessert⁻}
Get following rewriting of Dish(x) w.r.t. T:
RewriteIQ(Dish, T) = Dish(x) ∨ ItalDish(x) ∨ VegDish(x)
                     ∨ ∃y.hasCourse(y, x) ∨ ∃y.hasMain(y, x)
                     ∨ ∃y.hasDessert(y, x)
example of query rewriting (2/2)
TBox T:
   ItalDish ⊑ Dish
   VegDish ⊑ Dish
   Dish ⊑ ∃hasIngred
   ∃hasCourse⁻ ⊑ Dish
   hasMain ⊑ hasCourse
   hasDessert ⊑ hasCourse
ABox A:
   hasMain(m, d1)
   hasDessert(m, d2)
   VegDish(d3)
RewriteIQ(Dish, T) = Dish(x) ∨ ItalDish(x) ∨ VegDish(x) ∨ ∃y.hasCourse(y, x)
                     ∨ ∃y.hasMain(y, x) ∨ ∃y.hasDessert(y, x)
Certain answers:
   d1, because of the disjunct ∃y.hasMain(y, x)
   d2, because of the disjunct ∃y.hasDessert(y, x)
   d3, because of the disjunct VegDish(x)
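Evaluating the rewriting over the ABox viewed as a database amounts to taking a union of simple selections, one per disjunct. The Python sketch below hard-codes the subsumee set computed earlier and evaluates it on the example ABox. The encoding and the function name `eval_rewriting` are assumptions for illustration; in practice the rewriting would typically be shipped to a database system as an SQL query.

```python
def eval_rewriting(subsumees, abox):
    """Evaluate the disjunction built from a subsumee set over the ABox,
    treating the ABox as a plain database (closed world)."""
    answers = set()
    for c in subsumees:
        if isinstance(c, str):
            # disjunct C(x)
            answers |= {f[1] for f in abox if len(f) == 2 and f[0] == c}
        else:
            role = c[1]
            if isinstance(role, str):
                # ∃r in the set: disjunct ∃y.r(x, y)
                answers |= {f[1] for f in abox if len(f) == 3 and f[0] == role}
            else:
                # ∃r⁻ in the set: disjunct ∃y.r(y, x)
                answers |= {f[2] for f in abox if len(f) == 3 and f[0] == role[1]}
    return answers

# ComputeSubsumees(Dish, T) from the example, in the assumed encoding.
dish_subsumees = {
    "Dish", "ItalDish", "VegDish",
    ("exists", ("inv", "hasCourse")),
    ("exists", ("inv", "hasMain")),
    ("exists", ("inv", "hasDessert")),
}

abox = {
    ("hasMain", "m", "d1"),
    ("hasDessert", "m", "d2"),
    ("VegDish", "d3"),
}

dish_answers = eval_rewriting(dish_subsumees, abox)
```

The menu `m` is not returned: no disjunct of the rewriting is satisfied by `m` in the ABox.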
checking unsatisfiability
We have an FO-rewriting of q w.r.t. T relative to consistent ABoxes
We need a rewriting of unsatisfiability to obtain a rewriting of q
∙ only negative inclusions relevant
∙ one subquery for each such inclusion G ⊑ ¬H
∙ consider all possible ways of violating G ⊑ ¬H: combinations of a subsumee (subrole) of G and a subsumee (subrole) of H
Details in the lecture notes.
complexity of instance checking in dl-lite
In data complexity:
∙ rewriting takes constant time, yields FO query
∙ upper bound from FO query evaluation: AC⁰
In combined complexity:
∙ P membership: rewriting and evaluation both in polynomial time
∙ NLogSpace upper bound: ‘guess’ relevant part of rewriting
Theorem. In DL-LiteR, satisfiability and instance checking are
   1. in AC⁰ for data complexity
   2. NLogSpace-complete for combined complexity.
Note: Same bounds hold for several other DL-Lite dialects
instance checking in el
Next consider instance checking in EL.
Assume EL TBoxes given in normal form: axioms of the forms
   A1 ⊓ ... ⊓ An ⊑ B     A ⊑ ∃r.B     ∃r.A ⊑ B
Cannot use FO query rewriting approach for EL:
   no FO-rewriting of A(x) w.r.t. T = {∃r.A ⊑ A}
   (an individual is an answer as soon as some r-path of arbitrary length leads from it to an instance of A)
We present a saturation-based approach.
saturation rules for el
TBox rules
A ⊑ Bi (1 ≤ i ≤ n) B1 ⊓ . . . ⊓ Bn ⊑ D
A ⊑ D
T1
A ⊑ B B ⊑ ∃r.D
A ⊑ ∃r.D
T2
A ⊑ ∃r.B B ⊑ D ∃r.D ⊑ E
A ⊑ E
T3
ABox rules
A1 ⊓ . . . ⊓ An ⊑ B Ai(a) (1 ≤ i ≤ n)
B(a)
A1
∃r.B ⊑ A r(a, b) B(b)
A(a)
A2
Algorithm: apply rules exhaustively, check if A(a) (r(a, b)) is present
45/109
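The rules above can be sketched as a naive fixpoint computation. This is a minimal illustration, not the authors' implementation: the function names, the tuple encoding of axioms, and the toy axioms (modelled on the deck's pasta example) are assumptions, and trivial inclusions A ⊑ A are added so that T3 can fire when B = D.

```python
def saturate_tbox(conj, ex_rhs, ex_lhs):
    """Apply T1-T3 exhaustively (hypothetical encoding).

    conj:   (lhs_names, B) pairs for A1 ⊓ ... ⊓ An ⊑ B
    ex_rhs: (A, r, B) triples for A ⊑ ∃r.B
    ex_lhs: (r, A, B) triples for ∃r.A ⊑ B
    Returns the derived atomic inclusions and the derived A ⊑ ∃r.B axioms.
    """
    conj, ex_rhs, ex_lhs = list(conj), set(ex_rhs), list(ex_lhs)
    names = {n for lhs, b in conj for n in (*lhs, b)}
    names |= {n for a, _, b in ex_rhs for n in (a, b)}
    names |= {n for _, a, b in ex_lhs for n in (a, b)}
    sub = {(a, a) for a in names}  # assumption: trivial inclusions A ⊑ A
    while True:
        new_sub, new_ex = set(), set()
        for a in names:                      # rule T1
            for lhs, d in conj:
                if all((a, b) in sub for b in lhs):
                    new_sub.add((a, d))
        for a, b in sub:                     # rule T2
            for b2, r, d in ex_rhs:
                if b == b2:
                    new_ex.add((a, r, d))
        for a, r, b in ex_rhs:               # rule T3
            for r2, d, e in ex_lhs:
                if r == r2 and (b, d) in sub:
                    new_sub.add((a, e))
        if new_sub <= sub and new_ex <= ex_rhs:
            return sub, ex_rhs               # fixpoint reached
        sub |= new_sub
        ex_rhs |= new_ex


def saturate_abox(conj, ex_lhs, sub, concept_facts, role_facts):
    """Apply A1 and A2 exhaustively over facts (A, a) and (r, a, b)."""
    facts = set(concept_facts)
    axioms = list(conj) + [((a,), b) for a, b in sub]
    individuals = {a for _, a in facts} | {x for _, *xs in role_facts for x in xs}
    while True:
        new = set()
        for lhs, b in axioms:                # rule A1
            for a in individuals:
                if all((c, a) in facts for c in lhs):
                    new.add((b, a))
        for r, b, a_name in ex_lhs:          # rule A2
            for r2, x, y in role_facts:
                if r == r2 and (b, y) in facts:
                    new.add((a_name, x))
        if new <= facts:
            return facts
        facts |= new


# Toy axioms (assumed, modelled on the pasta example).
conj = [(("PenneArrab",), "PastaDish"), (("PastaDish",), "Dish"),
        (("Peperonc",), "Spicy"), (("Dish", "Spicy"), "SpicyDish")]
ex_rhs = [("PenneArrab", "hasIngred", "ArrabSauce"),
          ("ArrabSauce", "hasIngred", "Peperonc")]
ex_lhs = [("hasIngred", "Spicy", "Spicy")]
sub, ex = saturate_tbox(conj, ex_rhs, ex_lhs)
facts = saturate_abox(conj, ex_lhs, sub, {("PenneArrab", "p")}, set())
```

The loops recompute all rule applications in every round, which keeps the sketch short; a practical implementation would index axioms and process only newly derived facts.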
example: saturation in el
Derived axiom / assertion            Rule: premises         No.
ArrabSauce ⊑ Spicy                   T3: (5), (6), (7)      (10)
PenneArrab ⊑ Spicy                   T3: (1), (10), (7)     (11)
PenneArrab ⊑ Dish                    T1: (2), (3)           (12)
PenneArrab ⊑ ∃hasIngred.Pasta        T2: (2), (4)           (13)
PenneArrab ⊑ SpicyDish               T1: (11), (12), (8)    (14)
Spicy(p)                             A1: (11), (9)          (15)
Dish(p)                              A1: (12), (9)          (16)
SpicyDish(p)                         A1: (16), (15)         (17)
46/109
complexity of instance checking in el
Saturation approach is sound: everything derived is entailed
Also complete for instance checking:
Theorem Let K be an EL knowledge base, and let K′ be the result
of saturating K. For every ABox assertion α, we have:
K |= α iff α ∈ K′
Note: does not make all consequences explicit
∙ can have infinitely many implied axioms ⇝ would not terminate!
∙ so: only complete for some reasoning tasks
Runs in polynomial time in |K|. This is optimal:
Theorem Instance checking in EL is P-complete for both data and
combined complexity.
47/109
extending the saturation approach
Saturation approach can be extended to ELHI⊥
Additional rules required
Key difference: new conjunctions of concepts can occur
from A ⊑ ∃R.D and ∃R−.B ⊑ E we get A ⊓ B ⊑ ∃R.(D ⊓ E)
48/109
extended set of saturation rules
TBox rules
T1: from A ⊑ Bi (1 ≤ i ≤ n) and B1 ⊓ . . . ⊓ Bn ⊑ D, derive A ⊑ D
T4: from R ⊑ S and S ⊑ T, derive R ⊑ T
T5: from M ⊑ ∃R.(N ⊓ ⊥), derive M ⊑ ⊥
T6: from M ⊑ ∃R.(N ⊓ N′) and N ⊑ A, derive M ⊑ ∃R.(N ⊓ N′ ⊓ A)
T7: from M ⊑ ∃R.(N ⊓ A), ∃S.A ⊑ B, and R ⊑ S, derive M ⊑ B
T8: from M ⊑ ∃R.N, ∃inv(S).A ⊑ B, and R ⊑ S, derive M ⊓ A ⊑ ∃R.(N ⊓ B)
ABox rules
A1: from A1 ⊓ . . . ⊓ An ⊑ B and Ai(a) (1 ≤ i ≤ n), derive B(a)
A2: from ∃r.B ⊑ A, r(a, b), and B(b), derive A(a)
A3: from ∃r−.B ⊑ A, r(b, a), and B(b), derive A(a)
A4: from r ⊑ s and r(a, b), derive s(a, b)
A5: from r ⊑ s− and r(a, b), derive s(b, a)
49/109
New set of rules ⇝ exponentially many different new axioms
Theorem Instance checking in ELHI⊥ is P-complete for data and
Exp-complete for combined complexity.
50/109
saturation as a datalog program
Let sat(T ) be the result of applying the TBox saturation rules to T .
For each ELHI⊥ TBox T and ABox signature Σ, define the following
Datalog program Π(T , Σ):
Π(T , Σ) = {B(x) ← A1(x), . . . , An(x) | A1 ⊓ . . . ⊓ An ⊑ B ∈ sat(T )} ∪
           {B(x) ← A(y), r(x, y) | ∃r.A ⊑ B ∈ T } ∪
           {B(y) ← A(x), r(x, y) | ∃r−.A ⊑ B ∈ T } ∪
           {s(x, y) ← r(x, y) | r ⊑ s ∈ sat(T ), s ∈ NR} ∪
           {s(y, x) ← r(x, y) | r ⊑ s− ∈ sat(T ), s ∈ NR} ∪
           {⊤(x) ← A(x) | A ∈ NC ∩ Σ} ∪
           {⊤(x) ← r(x, y) | r ∈ NR ∩ Σ} ∪
           {⊤(x) ← r(y, x) | r ∈ NR ∩ Σ}
51/109
saturation as a datalog program (continued)
Theorem For every finite signature Σ and ELHI⊥ KB K = (T , A)
with sig(A) ⊆ Σ:
1. K is unsatisfiable iff ans((Π(T , Σ), ⊥), IA) ≠ ∅;
2. If K is satisfiable, then for all A ∈ NC, r ∈ NR, and a, b ∈ Ind(A):
∙ K |= A(a) iff a ∈ ans((Π(T , Σ), A), IA);
∙ K |= r(a, b) iff (a, b) ∈ ans((Π(T , Σ), r), IA).
This means:
∙ get Datalog rewriting of instance queries in ELHI⊥
∙ can use Datalog program to create saturated ABox
52/109
saturation as a datalog program: an example
The Datalog program associated with our example:
PastaDish(x) ← PenneArrab(x) PenneArrab ⊑ PastaDish
Dish(x) ← PastaDish(x) PastaDish ⊑ Dish
Spicy(x) ← Peperonc(x) Peperonc ⊑ Spicy
Spicy(x) ← hasIngred(x, y), Spicy(y) ∃hasIngred.Spicy ⊑ Spicy
SpicyDish(x) ← Spicy(x), Dish(x) Dish ⊓ Spicy ⊑ SpicyDish
Spicy(x) ← ArrabSauce(x) ArrabSauce ⊑ Spicy
Spicy(x) ← PenneArrab(x) PenneArrab ⊑ Spicy
Dish(x) ← PenneArrab(x) PenneArrab ⊑ Dish
SpicyDish(x) ← PenneArrab(x) PenneArrab ⊑ SpicyDish
(technically, also have T -independent rules for ⊤...)
53/109
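A program of this shape can be evaluated bottom-up until no new facts appear. The following is a minimal sketch of naive Datalog evaluation, not a production engine. The encoding is an assumption made for illustration: atoms are tuples, variables are strings starting with "?", and the toy ABox names the ingredients s and c explicitly.

```python
def match(atom, fact, env):
    """Extend the variable binding env so that atom equals fact, or return None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    env = dict(env)
    for term, value in zip(atom[1:], fact[1:]):
        if term.startswith("?"):             # variable
            if env.setdefault(term, value) != value:
                return None
        elif term != value:                  # constant mismatch
            return None
    return env


def evaluate(rules, facts):
    """Naive bottom-up evaluation: apply all rules until a fixpoint is reached."""
    facts = set(facts)
    while True:
        derived = set()
        for head, body in rules:
            envs = [{}]
            for atom in body:                # join the body atoms left to right
                next_envs = []
                for env in envs:
                    for fact in facts:
                        extended = match(atom, fact, env)
                        if extended is not None:
                            next_envs.append(extended)
                envs = next_envs
            for env in envs:
                derived.add((head[0],) + tuple(env.get(t, t) for t in head[1:]))
        if derived <= facts:
            return facts
        facts |= derived


# A subset of the example program, written as (head, body) pairs.
rules = [
    (("PastaDish", "?x"), [("PenneArrab", "?x")]),
    (("Dish", "?x"), [("PastaDish", "?x")]),
    (("Spicy", "?x"), [("Peperonc", "?x")]),
    (("Spicy", "?x"), [("hasIngred", "?x", "?y"), ("Spicy", "?y")]),
    (("SpicyDish", "?x"), [("Spicy", "?x"), ("Dish", "?x")]),
]
# Assumed toy ABox in which the ingredients are named individuals.
abox = {("PenneArrab", "p"), ("hasIngred", "p", "s"),
        ("hasIngred", "s", "c"), ("Peperonc", "c")}
saturated = evaluate(rules, abox)
```

Here Spicy propagates from c up to p through the recursive rule, which is exactly the behaviour that no FO-rewriting can capture.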
conjunctive queries
(unions of) conjunctive queries
IQs quite restricted: No selections and joins as in DB queries
Most work on OMQA adopts (unions of) conjunctive queries (CQs)
A conjunctive query (CQ) is a first-order query q(⃗x) of the form
∃⃗y.P1(⃗t1) ∧ · · · ∧ Pn(⃗tn)
where every variable in some ⃗ti appears in either ⃗x or ⃗y
A union of CQs (UCQ) is a first-order query q(⃗x) of the form
q1(⃗x) ∨ · · · ∨ qn(⃗x)
where the qi(⃗x) are CQs with same tuple ⃗x of free vars
55/109
cqs and ucqs in datalog
Alternatively, CQs and UCQs can be seen as Datalog rules
CQs:
q(⃗x) = ∃⃗y.P1(⃗t1) ∧ · · · ∧ Pn(⃗tn) ⇝ q(⃗x) ← P1(⃗t1), . . . , Pn(⃗tn)
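UCQs:
\[
\begin{array}{rlcl}
q(\vec{x}) = & \exists \vec{y}_1.\, P^1_1(\vec{t}^{\,1}_1) \wedge \cdots \wedge P^1_{n_1}(\vec{t}^{\,1}_{n_1}) & & q(\vec{x}) \leftarrow P^1_1(\vec{t}^{\,1}_1), \ldots, P^1_{n_1}(\vec{t}^{\,1}_{n_1}) \\
\vee & \exists \vec{y}_2.\, P^2_1(\vec{t}^{\,2}_1) \wedge \cdots \wedge P^2_{n_2}(\vec{t}^{\,2}_{n_2}) & & q(\vec{x}) \leftarrow P^2_1(\vec{t}^{\,2}_1), \ldots, P^2_{n_2}(\vec{t}^{\,2}_{n_2}) \\
 & \vdots & \rightsquigarrow & \vdots \\
\vee & \exists \vec{y}_\ell.\, P^\ell_1(\vec{t}^{\,\ell}_1) \wedge \cdots \wedge P^\ell_{n_\ell}(\vec{t}^{\,\ell}_{n_\ell}) & & q(\vec{x}) \leftarrow P^\ell_1(\vec{t}^{\,\ell}_1), \ldots, P^\ell_{n_\ell}(\vec{t}^{\,\ell}_{n_\ell})
\end{array}
\]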
56/109
what can we express as (u)cqs?
q1(y, x) = ∃z.serves(x, y) ∧ hasIngred(y, z) ∧ Spicy(z)
q2(y, x) = q1(y, x) ∨ (∃z, z′.serves(x, y) ∧ hasIngred(y, z) ∧ hasIngred(z, z′) ∧ Spicy(z′))
Capture select-project-join queries of relational algebra
Capture basic graph patterns of SPARQL
57/109
query matches
A match for q(⃗x) = ∃⃗y.φ(⃗x,⃗y) in an interpretation I is a mapping π
from the variables in ⃗x ∪ ⃗y to objects in ∆^I such that:
∙ π(⃗t) ∈ P^I for every atom P(⃗t) ∈ q
We write I |=π q(⃗a) if π is a match for q(⃗x) in I and π(⃗x) = ⃗a
By definition: ⃗a ∈ ans(q, I) iff there exists a match π such that I |=π q(⃗a)
Answering CQs amounts to searching for matches
Recall that ⃗a ∈ cert(q, K) iff ⃗a ∈ ans(q, I) for every model I of K
Challenge: how do we check that there is a match in every model?
58/109
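Searching for matches in a single finite interpretation can be sketched as a simple backtracking join. This is a minimal illustration under assumptions: the interpretation is given as a set of fact tuples, all query terms are variables, and the helper names `matches` and `answers` are hypothetical. It does not address the harder problem of checking all models.

```python
def matches(atoms, facts, env=None):
    """Yield every match (variable -> domain element) for a list of atoms."""
    env = env or {}
    if not atoms:
        yield dict(env)
        return
    first, rest = atoms[0], atoms[1:]
    for fact in facts:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        new_env = dict(env)
        # Bind each variable of the atom, rejecting conflicting bindings.
        if all(new_env.setdefault(t, v) == v
               for t, v in zip(first[1:], fact[1:])):
            yield from matches(rest, facts, new_env)


def answers(answer_vars, atoms, facts):
    """ans(q, I): restrict every match to the answer variables."""
    return {tuple(m[v] for v in answer_vars) for m in matches(atoms, facts)}


# Assumed toy interpretation with a named spicy ingredient s.
interp = {("serves", "r", "b"), ("serves", "r", "p"),
          ("hasIngred", "b", "s"), ("Spicy", "s")}
# q1(y, x) = ∃z. serves(x, y) ∧ hasIngred(y, z) ∧ Spicy(z)
q1 = [("serves", "x", "y"), ("hasIngred", "y", "z"), ("Spicy", "z")]
result = answers(("y", "x"), q1, interp)
```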
the universal model property
For Horn DLs, each satisfiable K has a universal model IK
IK is ‘contained’ in every model I of K
⇝ formally, there is a homomorphism from IK to I
An answer to a (U)CQ q in IK is an answer to q in every model of K
⇝ matches of (U)CQs are preserved under homomorphisms
⃗a ∈ cert(q, K) iff ⃗a ∈ ans(q, IK)
So: IK gives us the certain answers to q over K
59/109
constructing a universal model
Use the saturation of (T , A) for building a universal model IT ,A
Intuition: - IT ,A contains the saturated ABox A′
- if an object satisfies M and M ⊑ ∃R.M′ ∈ sat(T ),
a fresh object witnessing this is created
Formally, the domain of IT ,A contains words aR1M1 . . . RnMn with a ∈ Ind(A) and:
∙ Ri are roles and Mi are conjunctions of concepts
∙ there exists M ⊑ ∃R1.M1 ∈ sat(T ) such that T , A |= M(a)
∙ Mi ⊑ ∃Ri+1.Mi+1 ∈ sat(T ) for every 1 ≤ i < n
(note: use only strongest axioms M ⊑ ∃R.M′ in sat(T ))
The interpretation function (writing I for IT ,A) is defined as follows:
∙ a^I = a,
∙ a ∈ A^I iff A(a) ∈ sat(T , A), eRM ∈ A^I iff M ⊑ A ∈ sat(T )
∙ (a, b) ∈ r^I iff r(a, b) ∈ sat(T , A), (e, eRM) ∈ r^I iff R ⊑ r ∈ sat(T ),
and (eRM, e) ∈ r^I if R ⊑ r− ∈ sat(T )
60/109
example of the canonical model construction (1/3)
TBox: PenneArrab ⊑ ∃hasIngred.Penne
Penne ⊑ Pasta
PenneArrab ⊑ ∃hasIngred.ArrabSauce
ArrabSauce ⊑ ∃hasIngred.Peperonc
Peperonc ⊑ Spicy
PizzaCalab ⊑ ∃hasIngred.Nduja
Nduja ⊑ Spicy
ABox: serves(r, b) serves(r, p) PenneArrab(b) PizzaCalab(p)
The saturated TBox additionally contains:
PenneArrab ⊑ ∃hasIngred.(Penne ⊓ Pasta)
ArrabSauce ⊑ ∃hasIngred.(Peperonc ⊓ Spicy)
PizzaCalab ⊑ ∃hasIngred.(Nduja ⊓ Spicy)
61/109
example of the canonical model construction (2/3)
IT ,A contains the ABox and is closed under inclusions
serves(r, b) serves(r, p) PenneArrab(b) PizzaCalab(p)
[Figure: individual r with serves edges to p (PizzaCalab) and b (PenneArrab)]
62/109
example of the canonical model construction (3/3)
The anonymous objects witnessing existential concepts form trees
[Figure: r has serves edges to p (PizzaCalab) and b (PenneArrab); p has a hasIngred edge to e1 (Nduja, Spicy); b has hasIngred edges to e2 (Penne, Pasta) and e3 (ArrabSauce); e3 has a hasIngred edge to e4 (Peperonc, Spicy)]
PenneArrab ⊑ ∃hasIngred.ArrabSauce
PenneArrab ⊑ ∃hasIngred.(Penne ⊓ Pasta)
ArrabSauce ⊑ ∃hasIngred.(Peperonc ⊓ Spicy)
PizzaCalab ⊑ ∃hasIngred.(Nduja ⊓ Spicy)
63/109
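The tree-shaped anonymous part of this example can be generated mechanically from the saturated existential axioms. The sketch below is a simplified illustration, not the general construction: it assumes left-hand sides are single concept names, ignores role inclusions and inverses, and cuts the unfolding at a hypothetical `max_depth` because the canonical model may be infinite in general.

```python
def canonical_model(ind_labels, exist_axioms, max_depth):
    """Unfold axioms A ⊑ ∃r.M below each individual, up to max_depth levels.

    ind_labels:   individual -> set of concept names from the saturated ABox
    exist_axioms: (A, r, M) triples with M a frozenset of concept names
    Elements are words (a, r1, M1, ..., rn, Mn), encoded as tuples.
    """
    domain = {(a,): set(label) for a, label in ind_labels.items()}
    edges = set()
    frontier = list(domain)
    for _ in range(max_depth):
        next_frontier = []
        for elem in frontier:
            for lhs, role, rhs in exist_axioms:
                if lhs in domain[elem]:
                    child = elem + (role, rhs)       # fresh witness
                    if child not in domain:
                        domain[child] = set(rhs)
                        edges.add((role, elem, child))
                        next_frontier.append(child)
        frontier = next_frontier
    return domain, edges


labels = {"r": set(), "b": {"PenneArrab"}, "p": {"PizzaCalab"}}
axioms = [
    ("PenneArrab", "hasIngred", frozenset({"Penne", "Pasta"})),
    ("PenneArrab", "hasIngred", frozenset({"ArrabSauce"})),
    ("ArrabSauce", "hasIngred", frozenset({"Peperonc", "Spicy"})),
    ("PizzaCalab", "hasIngred", frozenset({"Nduja", "Spicy"})),
]
domain, edges = canonical_model(labels, axioms, max_depth=5)
anonymous = [e for e in domain if len(e) > 1]
```

On this TBox the unfolding stops after two levels and yields the four anonymous objects e1 to e4 of the figure; with a cyclic axiom such as A ⊑ ∃r.A it would only stop at the depth bound.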
finding answers in the canonical model
To answer CQ q, it suffices to test whether it has a match in IT ,A
But this is still challenging!
- IT ,A contains assertions and objects not present in A
- we cannot build IT ,A explicitly: can be infinite!
Our approach: use query rewriting!
Formally: given a CQ q, we construct a UCQ rewT (q) such that
⃗a ∈ ans(q, IT ,A)
iff
there is a match π for a disjunct q′ of rewT (q) such that
IT ,A |=π q′(⃗a) and π sends all vars to individuals from A
64/109
rewriting the query
Rewriting step (idea):
1. Choose leaf variable x so that no vars are mapped below it in IT ,A
2. Find axiom M ⊑ ∃R.N in sat(T ) that ensures atoms containing x
are satisfied
3. Drop from q all atoms with x and add M to parent of x to get q′
Properties:
∙ Every match for q′ can be extended to a match for q
∙ Every match for q contains a match for q′
∙ The answers of q and q′ coincide, but the relevant matches for q′
are closer to the ABox than those of q
We repeatedly apply this rewriting step to obtain a set of queries
whose relevant matches range over ABox individuals.
65/109
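A single rewriting step can be sketched as follows. This is a deliberately simplified illustration, not the full procedure: it assumes role atoms point from parent to leaf (no inverse roles or role inclusions), requires the leaf to have exactly one parent variable, and uses a hypothetical tuple encoding of atoms and axioms.

```python
def rewrite_step(atoms, leaf, axiom, answer_vars=()):
    """Eliminate the variable `leaf` using an axiom M ⊑ ∃R.N from sat(T).

    atoms: list of (A, t) concept atoms and (r, t, t') role atoms
    axiom: (M, R, N) with M a concept name and N a set of concept names
    Returns the rewritten atom list, or None if the step does not apply.
    """
    lhs, role, rhs = axiom
    if leaf in answer_vars:
        return None                      # answer variables must map to individuals
    parents = set()
    for atom in atoms:
        if leaf not in atom[1:]:
            continue
        if len(atom) == 2:               # concept atom A(leaf): need A in N
            if atom[0] not in rhs:
                return None
        elif atom[0] == role and atom[2] == leaf and atom[1] != leaf:
            parents.add(atom[1])         # role atom R(parent, leaf)
        else:
            return None                  # atom not guaranteed by the axiom
    if len(parents) != 1:
        return None                      # simplification: single parent only
    parent = parents.pop()
    rest = [atom for atom in atoms if leaf not in atom[1:]]
    new_atom = (lhs, parent)
    return rest if new_atom in rest else rest + [new_atom]


q = [("serves", "x", "y"), ("hasIngred", "y", "z"),
     ("hasIngred", "z", "z2"), ("Spicy", "z2")]
q1 = rewrite_step(q, "z2", ("ArrabSauce", "hasIngred", {"Peperonc", "Spicy"}),
                  answer_vars=("y", "x"))
q2 = rewrite_step(q1, "z", ("PenneArrab", "hasIngred", {"ArrabSauce"}),
                  answer_vars=("y", "x"))
```

Applying the step twice to the running query first drops z2 and then z, leaving a query over serves and PenneArrab only.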
example of a rewriting step (1/2)
q(y, x) = ∃z, z′.serves(x, y) ∧ hasIngred(y, z) ∧ hasIngred(z, z′) ∧ Spicy(z′)
We have IK |=π q(b, r) with π(x) = r, π(y) = b, π(z) = e3, π(z′) = e4
[Figure: the canonical model, with π(x) = r, π(y) = b, π(z) = e3, and π(z′) = e4 marked]
• Choose z′ as ‘leaf’
• Choose ArrabSauce ⊑ ∃hasIngred.Spicy ∈ sat(T )
• RHS ensures hasIngred(z, z′), Spicy(z′)
• We replace these atoms by ArrabSauce(z)
q′(y, x) = ∃z.serves(x, y) ∧ hasIngred(y, z) ∧ ArrabSauce(z)
66/109
example of a rewriting step (2/2)
q(y, x) = ∃z, z′.serves(x, y) ∧ hasIngred(y, z) ∧ hasIngred(z, z′) ∧ Spicy(z′)
q′(y, x) = ∃z.serves(x, y) ∧ hasIngred(y, z) ∧ ArrabSauce(z)
IK |=π q(b, r)    IK |=π′ q′(b, r)
[Figure: the canonical model, with π and π′ both mapping x to r, y to b, and z to e3, and only π mapping z′ to e4]
depth(π) > depth(π′)
67/109
another rewriting step (1/2)
q′(y, x) = ∃z.serves(x, y) ∧ hasIngred(y, z) ∧ ArrabSauce(z)
IK |=π′ q′(b, r)
[Figure: the canonical model, with π′(x) = r, π′(y) = b, and π′(z) = e3 marked]
• Choose z as leaf
• Choose PenneArrab ⊑ ∃hasIngred.ArrabSauce
• RHS yields hasIngred(y, z) and ArrabSauce(z)
• We replace these atoms by PenneArrab(y)
q′′(y, x) = serves(x, y) ∧ PenneArrab(y)
68/109
another rewriting step (2/2)
q(y, x) = ∃z, z′.serves(x, y) ∧ hasIngred(y, z) ∧ hasIngred(z, z′) ∧ Spicy(z′)
q′(y, x) = ∃z.serves(x, y) ∧ hasIngred(y, z) ∧ ArrabSauce(z)
q′′(y, x) = serves(x, y) ∧ PenneArrab(y)
IK |=π q(b, r)    IK |=π′ q′(b, r)    IK |=π′′ q′′(b, r)
[Figure: the canonical model, with π, π′, and π′′ all mapping x to r and y to b, π and π′ mapping z to e3, and π mapping z′ to e4]
depth(π) > depth(π′) > depth(π′′)
In π′′ all variables are mapped to individuals
69/109
decision procedure
Theorem For every satisfiable ELHI⊥ KB K = (T , A) and CQ q(⃗x):
⃗a ∈ cert(q, K) iff IK |=π q′(⃗a) for some q′ ∈ rewT (q) and some π that
maps all variables to individuals in A.
There is a bounded number of such restricted matches π
Checking if π is a match reduces to linearly many instance checks
Yields terminating, sound, and complete CQ answering procedure
70/109
complexity of cq answering
Combined complexity:
sat(T ) and rewT (q) can be constructed in single exponential time
single exponential bound on candidate matches π
instance checking in single exponential time
Data complexity:
sat(T ) and rewT (q) are ABox independent
polynomial bound on candidate matches π
Instance checking in polynomial time
Theorem CQ answering in ELHI⊥ and Horn-SHIQ is Exp-complete
in combined complexity and P-complete in data complexity.
71/109
optimal bounds for lightweight dls
Adapting our technique gives optimal bounds for lightweight DLs:
For ELH and DL-LiteR we get NP in combined complexity:
∙ compute sat(T ) in polynomial time
∙ non-deterministically build the right q′ ∈ rewT (q)
∙ guess a candidate π
∙ check if it is a match ⇝ instance checking in polynomial time
CQ answering is NP-hard over ABox alone seen as DB (no TBox)
For EL in data complexity, yields P membership
⇝ optimal since instance queries already P-hard
In DL-LiteR, can get a FO rewriting ⇝ in AC0 for data complexity.
Theorem CQ answering in ELH and DL-LiteR is NP-complete in
combined complexity. For ELH the data complexity is P-complete,
and for DL-LiteR the data complexity is in AC0.
72/109
datalog rewriting
Our procedure yields a Datalog rewriting:
∙ rewT (q) is a UCQ ⇝ translate into set of Datalog rules Πrew(q)
∙ use Q in head of rules
∙ the program Π(T ) (from earlier) computes all entailed ABox
assertions
(Πrew(q) ∪ Π(T ), Q)
is a Datalog rewriting of q w.r.t. T relative to consistent ABoxes
73/109
combined approach for cqs in elhi
Alternative: combined approach (saturation and rewriting)
Know that it suffices to evaluate the UCQ rewT (q) over the set of
ABox assertions entailed from the KB K
Also know: assertions entailed from K = assertions in sat(K)
Materialize assertions in sat(K) and view result as database
+ only need to evaluate a UCQ; can use standard relational database systems
– materializing not always convenient; saturation needs to be updated if data changes
74/109
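To illustrate the combined approach, the sketch below stores a (here already saturated) ABox in an in-memory SQLite database and evaluates the three disjuncts q, q′, q′′ of the running example as one SQL UNION. The table layout with columns c0 and c1 and the hand-written SQL translation are assumptions made for this example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# One table per concept (arity 1) or role (arity 2); hypothetical layout.
schema = [("serves", 2), ("hasIngred", 2), ("Spicy", 1),
          ("ArrabSauce", 1), ("PenneArrab", 1), ("PizzaCalab", 1)]
for table, arity in schema:
    cols = ", ".join(f"c{i} TEXT" for i in range(arity))
    cur.execute(f"CREATE TABLE {table} ({cols})")

# Materialized assertions: the example ABox entails nothing beyond itself
# over named individuals, so the saturated ABox equals the original one.
cur.executemany("INSERT INTO serves VALUES (?, ?)", [("r", "b"), ("r", "p")])
cur.execute("INSERT INTO PenneArrab VALUES ('b')")
cur.execute("INSERT INTO PizzaCalab VALUES ('p')")

# The UCQ q ∨ q′ ∨ q′′ with answer variables (y, x), one SELECT per disjunct.
ucq = """
SELECT s.c1, s.c0 FROM serves s, hasIngred h1, hasIngred h2, Spicy sp
 WHERE s.c1 = h1.c0 AND h1.c1 = h2.c0 AND h2.c1 = sp.c0
UNION
SELECT s.c1, s.c0 FROM serves s, hasIngred h, ArrabSauce a
 WHERE s.c1 = h.c0 AND h.c1 = a.c0
UNION
SELECT s.c1, s.c0 FROM serves s, PenneArrab pa
 WHERE s.c1 = pa.c0
"""
certain_answers = set(cur.execute(ucq).fetchall())
```

Only the third disjunct produces a row here, which matches the observation that q′′ is the rewriting whose match uses ABox individuals only.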
an fo rewriting approach for cqs in dl-lite
For DL-LiteR we can generate an FO-rewriting as follows.
Replace in all q′ ∈ rewT (q) each atom by its FO-rewriting for
instance checking:
∙ replace each A(t) by RewriteIQ(A, T )
∙ replace each r(t, t′) by RewriteIQ(r, T )
Resulting FO formula:
∙ positive, can be transformed into a UCQ
∙ rewriting relative to consistent ABoxes
∙ can be combined with a rewriting of unsatisfiability
∙ yields AC0 upper bound in data complexity
75/109
a glimpse beyond
Other Horn DLs
∙ Similar results hold for other dialects of DL-Lite and EL
∙ Sometimes complexity increases, e.g., EL with complex role
inclusions
∙ The P data and Exp combined upper bounds extend to even more
expressive Horn DLs, like Horn-SHOIQ
Beyond Horn DLs
∙ no universal model property
∙ query answering usually exponentially harder
∙ different techniques: automata, rolling-up, resolution, etc.
∙ for some well-known DLs (e.g., SHOIQ) decidability open
76/109
complexity of answering (u)cqs
                            IQs                         CQs
                            data         combined       data          combined
DL-Lite, DL-LiteR           in AC0       NLogSpace      in AC0        NP
EL, ELH                     P            P              P             NP
ELI, ELHI⊥, Horn-SHOIQ      P            Exp            P             Exp
ALC, ALCHQ                  coNP         Exp            coNP          Exp
ALCI, SH, SHIQ              coNP         Exp            coNP          2Exp
SHOIQ                       coNP-hard    coNExp         coNP-hard¹    coN2Exp-hard¹
¹ decidability open
77/109
navigational queries
limitations of (u)cqs
Some very natural queries are not expressible as CQs:
- find dishes that contain something spicy
- is a a relative of b?
- is there a bus connection from X to Y?
We need navigational queries: queries that can query the topology
of the graph, that is, the information stored in the connections
Especially important for highly connected data with no fixed schema
∙ social, biological, and chemical networks
79/109
navigational queries
Prominent navigational query languages:
Regular Path Queries (RPQs): find pairs of objects that are connected
by a chain of roles that comply with a given regular language
(hasCourse ∪ courseOf−) · (hasIngred ∪ ingredOf−)∗ · Spicy?(x, y)
Conjunctive RPQs: allow RPQs to be joined conjunctively
∙ similar to CQs, but each atom is an RPQ
∙ extend CQs with the navigational power of RPQs
q(x, x′) = ∃y, z. serves · Menu? · (hasMain ∪ hasStarter)(x, y) ∧
           serves · Menu? · (hasCourse ∪ courseOf−)(x′, y) ∧
           (hasIngred ∪ ingredOf−)∗ · Spicy?(y, z)
Both languages have 1-way and 2-way variants
80/109
our most expressive navigational queries: c2rpqs
Recall: N±R contains all role names and their inverses.
A conjunctive two-way regular path query (C2RPQ) has the form
q(⃗x) = ∃⃗y. ⋀ L(t, t′) ∧ ⋀ A(t)
where A is a concept name,
t, t′ are variables or individuals (in NI ∪ ⃗x ∪ ⃗y), and
L is a regular language over N±R ∪ {A? | A ∈ NC}
Regular languages can be given as:
∙ regular expressions E → r ∈ N±R | A? | E · E | E ∪ E | E∗
∙ non-deterministic finite automata NFA
Recall: RegExps and NFAs are equivalent, but NFAs are more succinct
81/109
other navigational query languages
Conjunctive (one-way) regular path queries (CRPQs) disallow inverses ⇝
regular expressions use only (direct) role names
q(x, x′) = ∃y, z. serves · Menu? · hasCourse(x, y) ∧
           serves · Menu? · hasCourse(x′, y) ∧ hasIngred∗ · Spicy?(y, z)
q(x) = ∃y.hasIngred∗ · Spicy?(x, y)
Two-way regular path queries (2RPQs) have only one atom and no
existential variables ⇝ both variables are answer variables
q(x, y) = (hasIngred ∪ ingredOf−)∗ · Spicy?(x, y)
q(x, y) = (hasIngred ∪ ingredOf−)∗ · Spicy? · Σ∗(x, y)
(One-way) Regular path queries (RPQs) are 2RPQs with no inverses
⇝ all of the restrictions above
q(x, y) = hasIngred∗ · Spicy?(x, y)
q(x, y) = hasCourse · hasIngred∗ · Spicy?(x, y)
82/109
semantics of c2rpqs
Satisfaction of atoms L(t, t′):
(d, d′) ∈ L^I if there is an L-path from d to d′, i.e.,
∙ a sequence e0e1 . . . en of objects from ∆^I with e0 = d and en = d′
∙ a word u1u2 . . . un ∈ L over N±R ∪ {A? | A ∈ NC}
such that, for every 1 ≤ i ≤ n:
∙ if ui = A?, then ei−1 = ei ∈ A^I
∙ if ui = R ∈ N±R , then (ei−1, ei) ∈ R^I
Match: mapping π from terms to elements that satisfies all atoms
As before: I |=π q(⃗a) if match π maps answer variables to ⃗a
Certain answers defined as for CQs
Again suffices to find match in canonical model
83/109
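Over a single finite interpretation, the L-path semantics can be checked by a breadth-first search over pairs (object, NFA state). The sketch below is an illustration under assumptions: the language L is given as an NFA with transition triples, inverse roles are written with a trailing "-", tests with a trailing "?", and the toy graph names the anonymous objects of the running example as if they were ordinary elements.

```python
from collections import deque


def eval_2rpq(delta, start, finals, concepts, roles, source):
    """Return all d' such that there is an L-path from source to d'.

    delta:    NFA transitions (s, symbol, s'), symbol is 'r', 'r-', or 'A?'
    concepts: set of (A, d) facts; roles: set of (r, d, d') facts
    """
    seen = {(source, start)}
    queue = deque(seen)
    reached = set()
    while queue:
        d, s = queue.popleft()
        if s in finals:
            reached.add(d)
        for s1, sym, s2 in delta:
            if s1 != s:
                continue
            if sym.endswith("?"):            # test A?: stay on d if d is in A
                successors = [d] if (sym[:-1], d) in concepts else []
            elif sym.endswith("-"):          # inverse role: follow edge backwards
                successors = [x for r, x, y in roles if r == sym[:-1] and y == d]
            else:                            # role name: follow edge forwards
                successors = [y for r, x, y in roles if r == sym and x == d]
            for d2 in successors:
                if (d2, s2) not in seen:
                    seen.add((d2, s2))
                    queue.append((d2, s2))
    return reached


# Toy interpretation (assumed): the finite canonical model of the example.
roles = {("serves", "r", "b"), ("serves", "r", "p"),
         ("hasIngred", "p", "e1"), ("hasIngred", "b", "e2"),
         ("hasIngred", "b", "e3"), ("hasIngred", "e3", "e4")}
concepts = {("Spicy", "e1"), ("Spicy", "e4")}

# NFA for hasIngred∗ · Spicy?
one_way = [("s0", "hasIngred", "s0"), ("s0", "Spicy?", "sf")]
# NFA for (hasIngred ∪ hasIngred−)∗ · Spicy?
two_way = one_way + [("s0", "hasIngred-", "s0")]

from_b = eval_2rpq(one_way, "s0", {"sf"}, concepts, roles, "b")
from_e2_one_way = eval_2rpq(one_way, "s0", {"sf"}, concepts, roles, "e2")
from_e2_two_way = eval_2rpq(two_way, "s0", {"sf"}, concepts, roles, "e2")
```

The two-way variant reaches e4 from e2 by first walking backwards to b, which a one-way query cannot do.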
answering 2rpqs
We focus on answering 2RPQs: one atom, no existential variables
Bound on matches ranging over individuals only
Challenge: paths may need to go deep into the canonical model
q(x, y) = serves · (hasIngred ∪ ingredOf−)∗ · Spicy? · Σ∗(x, y)
[Figure: the canonical model, with π(x) = r and π(y) = b marked]
84/109
loops through the anonymous part
Goal: compact representation of all ways in which paths through
the anonymous part can participate in matches
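[Figure: NFA α with states s0, s1, sf and transitions s0 --serves--> s1, s1 --hasIngred--> s1, s1 --ingredOf−--> s1, s1 --Spicy?--> sf, and a loop on sf labelled Σ∗]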
We use NFA representation
We write M ∈ Loopα[s, s′] iff a ∈ M^{I_K} implies the existence of a path p below a that takes the NFA α from s to s′, e.g.,
PenneArrab ∈ Loopα[s1, sf]
because of
PenneArrab ⊑ ∃hasIngred.ArrabSauce
ArrabSauce ⊑ ∃hasIngred.(Peperonc ⊓ Spicy)
85/109
computing the loop table
We can explicitly compute the full table Loopα inductively:
∙ if s is a state, then Loopα[s, s] = NC
∙ if M1 ∈ Loopα[s1, s2] and M2 ∈ Loopα[s2, s3], then M1 ⊓ M2 ∈ Loopα[s1, s3]
∙ if T |= C1 ⊓ · · · ⊓ Cn ⊑ A and (s1, A?, s2) ∈ δ, then C1 ⊓ · · · ⊓ Cn ∈ Loopα[s1, s2]
∙ if T |= C1 ⊓ · · · ⊓ Cn ⊑ ∃R.D, T |= R ⊑ R′, T |= R ⊑ R′′, (s1, R′, s2) ∈ δ, D ∈ Loopα[s2, s3], and (s3, R′′−, s4) ∈ δ, then C1 ⊓ · · · ⊓ Cn ∈ Loopα[s1, s4]
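The inductive rules can be read as a saturation procedure. The sketch below is a deliberately simplified illustration and not the full construction: it assumes that the entailed axioms A ⊑ B and A ⊑ ∃R.D over concept names are already given as sets (so no reasoner is called), it ignores role inclusions (R′ = R′′ = R), and it only records single concept names rather than conjunctions, which makes it sound but incomplete. The function name `loop_table` and the string encoding (trailing "-" for inverses, trailing "?" for tests) are hypothetical.

```python
from itertools import product

def inv(role):
    """Toggle the inverse marker on a role name."""
    return role[:-1] if role.endswith("-") else role + "-"

def loop_table(states, delta, concept_names, subsumptions, existentials):
    """Saturate a simplified Loop table restricted to single concept names.

    subsumptions : pairs (A, B) with T |= A ⊑ B          (assumed precomputed)
    existentials : triples (A, R, D) with T |= A ⊑ ∃R.D  (assumed precomputed)
    """
    subs = set(subsumptions) | {(a, a) for a in concept_names}
    loop = {(s, t): set() for s, t in product(states, repeat=2)}
    for s in states:                                  # Loop[s, s] = N_C
        loop[(s, s)] = set(concept_names)
    changed = True
    while changed:
        changed = False
        new = []
        for (s1, sym, s2) in delta:                   # test transitions A?
            if sym.endswith("?"):
                new += [(s1, s2, a) for (a, b) in subs if b == sym[:-1]]
        for (a, r, d) in existentials:                # step down via R, loop, step back up
            for (s1, r1, s2), (s3, r2, s4) in product(delta, repeat=2):
                if r1 == r and r2 == inv(r) and d in loop[(s2, s3)]:
                    new.append((s1, s4, a))
        for s1, s2, s3 in product(states, repeat=3):  # composition (same concept name only)
            new += [(s1, s3, a) for a in loop[(s1, s2)] & loop[(s2, s3)]]
        for (s, t, a) in new:
            if a not in loop[(s, t)]:
                loop[(s, t)].add(a)
                changed = True
    return loop

# Data for the restaurant example; the Σ∗ loop on sf is spelled out per symbol.
states = ["s0", "s1", "sf"]
alphabet = ["serves", "serves-", "hasIngred", "hasIngred-",
            "ingredOf", "ingredOf-", "Spicy?"]
delta = {("s0", "serves", "s1"), ("s1", "hasIngred", "s1"),
         ("s1", "ingredOf-", "s1"), ("s1", "Spicy?", "sf")}
delta |= {("sf", x, "sf") for x in alphabet}
names = {"PenneArrab", "ArrabSauce", "Peperonc", "Spicy", "PizzaCalab"}
table = loop_table(states, delta, names,
                   subsumptions={("Peperonc", "Spicy")},
                   existentials={("PenneArrab", "hasIngred", "ArrabSauce"),
                                 ("ArrabSauce", "hasIngred", "Peperonc")})
print(sorted(table[("s1", "sf")]))
```

The table depends only on the TBox and the NFA, not on the ABox, which is why its computation counts as constant in data complexity.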
86/109
computing loops: an example
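[Figure: NFA α with states s0, s1, sf and transitions s0 --serves--> s1, s1 --hasIngred--> s1, s1 --ingredOf−--> s1, s1 --Spicy?--> sf, and a loop on sf labelled Σ∗]
[Figure: canonical model with individuals r, p (PizzaCalab), b (PenneArrab) and anonymous elements e1 (Nduja, Spicy), e2 (Penne, Pasta), e3 (ArrabSauce), e4 (Peperonc, Spicy); edges labelled serves and hasIngred]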
∙ Peperonc ∈ Loopα[s1, sf] because (s1, Spicy?, sf) ∈ δ and Peperonc ⊑ Spicy
∙ ArrabSauce ∈ Loopα[s1, sf] because (s1, hasIngred, s1), (sf, hasIngred−, sf) ∈ δ, ArrabSauce ⊑ ∃hasIngred.Peperonc, and Peperonc ∈ Loopα[s1, sf]
∙ PenneArrab ∈ Loopα[s1, sf] because (s1, hasIngred, s1), (sf, hasIngred−, sf) ∈ δ, PenneArrab ⊑ ∃hasIngred.ArrabSauce, and ArrabSauce ∈ Loopα[s1, sf]
87/109
evaluation 2rpqs using the loop table
Non-deterministic algorithm to decide (a, b) ∈ cert(α(x, y), K)
Input: NFA α = (S, Σ, δ, s0, F), KB K = (T , A), (a, b) from A
∙ After checking consistency, we start from (a, s0)
∙ At pair (c, s), guess new pair (d, s′) together with one of:
  ∙ transition (s, σ, s′): a σ-step from c to d in the ABox
    ⇝ check if (c, d) ∈ σI
  ∙ concepts M in Loopα[s, s′]: stay at the same individual, and jump to s′
    ⇝ check if c = d ∈ MI
∙ Exit when we get pair (b, sf)
∙ Use counter to ensure termination (only need to consider each pair once)
88/109
evaluation algorithm
Algorithm EvalAtom
Input: NFA α = (S, Σ, δ, s0, F) with Σ ⊆ N±R ∪ {A? | A ∈ NC}, ELHI⊥ KB (T , A), (a, b) ∈ Ind(A) × Ind(A)
1. Test whether (T , A) is satisfiable, output yes if not.
2. Initialize current = (a, s0) and count = 0. Set max = |A| · |S| + 1.
3. While count < max and current ∉ {(b, sf) | sf ∈ F}
   3.1 Let current = (c, s).
   3.2 Guess a pair (d, s′) ∈ Ind(A) × S and either (s, σ, s′) ∈ δ or M ∈ Loopα[s, s′].
   3.3 If (s, σ, s′) was guessed
       ∙ If σ ∈ N±R, then verify that T , A |= σ(c, d), and return no if not.
       ∙ If σ = A?, then verify that c = d and T , A |= A(c), and return no if not.
   3.4 If M was guessed, then verify that c = d and that T , A |= B(c) for every concept name B ∈ M, and return no if not.
   3.5 Set current = (d, s′) and increment count.
4. If current = (b, sf) for some sf ∈ F, return yes. Else return no.
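A deterministic reading of EvalAtom replaces the guesses by a breadth-first search over pairs (individual, state). The sketch below is an illustration only, not the authors' implementation. It assumes that the entailment checks are already available as precomputed sets (`role_facts`, `concept_facts`), that satisfiability is passed in as a flag, and that the Loop table is given as a dictionary of conjunctions. These names and encodings are hypothetical.

```python
from collections import deque

def eval_atom(delta, s0, finals, loop, individuals, role_facts, concept_facts,
              a, b, satisfiable=True):
    """Decide (a, b) ∈ cert(α(x, y), K) by search over (individual, state) pairs.

    loop          : dict (s, s') -> list of frozensets of concept names
    role_facts    : set of (σ, c, d) with T, A |= σ(c, d)  (assumed given)
    concept_facts : set of (B, c) with T, A |= B(c)        (assumed given)
    """
    if not satisfiable:                      # step 1: unsatisfiable KB entails everything
        return True
    seen, queue = {(a, s0)}, deque([(a, s0)])
    while queue:
        c, s = queue.popleft()
        if c == b and s in finals:           # step 4: reached (b, sf) with sf final
            return True
        successors = set()
        for (s1, sigma, s2) in delta:        # step 3.3: follow an NFA transition
            if s1 != s:
                continue
            if sigma.endswith("?"):
                if (sigma[:-1], c) in concept_facts:
                    successors.add((c, s2))
            else:
                successors |= {(d, s2) for d in individuals
                               if (sigma, c, d) in role_facts}
        for (s1, s2), entries in loop.items():   # step 3.4: jump via a Loop entry
            if s1 == s and any(all((B, c) in concept_facts for B in M)
                               for M in entries):
                successors.add((c, s2))
        for pair in successors - seen:       # each pair is considered once
            seen.add(pair)
            queue.append(pair)
    return False

# Restaurant example. Only asserted concept facts are listed here (a
# simplification), so the match for (r, b) has to use the Loop entry.
alphabet = ["serves", "serves-", "hasIngred", "hasIngred-",
            "ingredOf", "ingredOf-", "Spicy?"]
delta = {("s0", "serves", "s1"), ("s1", "hasIngred", "s1"),
         ("s1", "ingredOf-", "s1"), ("s1", "Spicy?", "sf")}
delta |= {("sf", x, "sf") for x in alphabet}
loop = {("s1", "sf"): [frozenset({"Peperonc"}), frozenset({"ArrabSauce"}),
                       frozenset({"PenneArrab"})]}
role_facts = {("serves", "r", "b"), ("serves", "r", "p"),
              ("serves-", "b", "r"), ("serves-", "p", "r")}
concept_facts = {("PenneArrab", "b"), ("PizzaCalab", "p")}
args = (delta, "s0", {"sf"}, loop, {"r", "p", "b"}, role_facts, concept_facts)
print(eval_atom(*args, "r", "b"), eval_atom(*args, "b", "r"))
```

The `seen` set plays the role of the counter: there are at most |Ind(A)| · |S| pairs, so the search terminates after polynomially many steps.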
89/109
evaluation algorithm: example (1/2)
q(x, y) = serves · (hasIngred ∪ ingredOf−)∗ · Spicy? · Σ∗ (x, y)
serves(r, b) serves(r, p) PenneArrab(b) PizzaCalab(p)
PenneArrab ⊑ PastaDish ⊓ ∃hasIngred.ArrabSauce
PastaDish ⊑ Dish ⊓ ∃hasIngred.Pasta
ArrabSauce ⊑ ∃hasIngred.Peperonc
Peperonc ⊔ ∃hasIngred.Spicy ⊑ Spicy
Spicy ⊓ Dish ⊑ SpicyDish
[Figure: canonical model with individuals r (π(x)), p (PizzaCalab), b (PenneArrab, π(y)) and anonymous elements e1 (Nduja, Spicy), e2 (Penne, Pasta), e3 (ArrabSauce), e4 (Peperonc, Spicy); edges labelled serves and hasIngred]
90/109
evaluation algorithm: example (2/2)
serves(r, b) serves(r, p) PenneArrab(b) PizzaCalab(p)
q(x, y) = serves · (hasIngred ∪ ingredOf−)∗ · Spicy? · Σ∗ (x, y)
[Figure: NFA α with states s0, s1, sf and transitions s0 --serves--> s1, s1 --hasIngred--> s1, s1 --ingredOf−--> s1, s1 --Spicy?--> sf, and a loop on sf labelled Σ∗]
Peperonc ∈ Loopα[s1, sf]
ArrabSauce ∈ Loopα[s1, sf]
PenneArrab ∈ Loopα[s1, sf]
count:   0          1                       2
Guess    (r, s0)    (b, s1)                 (b, sf)
                    (s0, serves, s1) ∈ δ    PenneArrab ∈ Loopα[s1, sf]
Test                (r, b) ∈ servesI        b ∈ PenneArrabI
return yes
91/109
complexity of the algorithm
Theorem (a, b) ∈ cert(q, K) iff there is some execution of
EvalAtom(α, K, (a, b)) that returns yes.
∙ Iterations bounded by counter (poly. counter ⇝ log space)
∙ We need calls to procedures for: satisfiability, instance checking, membership in the Loopα table
∙ These calls are in Exp for ELHI⊥
  Loopα computation: polynomially many iterations, each one tests entailment
Exp upper bound for ELHI⊥ (combined complexity)
For ELH and DL-Lite, we can obtain P upper bound (combined)
Data complexity:
∙ Loopα computation in constant time (ABox independent)
∙ called procedures in P for ELH and NLogSpace for DL-LiteR
92/109
complexity bounds
Theorem
∙ For ELHI⊥, 2RPQ answering is Exp-complete in combined
complexity and P-complete in data complexity
∙ For DL-LiteR and ELH, the combined complexity drops to
P-complete
∙ In data complexity, the problem is NLogSpace-complete for
DL-LiteR, and P-complete for ELH.
Most matching lower bounds from simpler problems:
∙ instance checking
∙ graph reachability = RPQ over plain ABox
P-hardness for DL-LiteR non-trivial
93/109
answering c2rpqs
For answering C2RPQs, we combine the ideas above:
∙ rewrite the query so that matches ranging over individuals suffice
∙ in each step, consider possibly deeper paths using the Loopα table
After rewriting, guess matches using individuals only and check
them using EvalAtom on each atom
This algorithm works for all DLs discussed and gives optimal
complexity bounds
Answering C2RPQs is not much harder:
∙ Combined complexity increases to PSpace for DL-LiteR and ELH
∙ but most other bounds are the same as for RPQs and CQs
∙ even for very expressive DLs that are not Horn
94/109
final remarks on navigational queries
Navigational queries provide more querying power at moderate
computational cost
Expressible in Datalog, but computationally better behaved
Good alternative to CQs, gaining increasing attention
Property paths in SPARQL
∙ included in the SPARQL 1.1 standard
∙ add regular paths as in C2RPQs
Ongoing quest for more flexible navigational languages
95/109
complexity of answering (c)(2)rpqs
                          2RPQs                       C2RPQs
                          data         combined       data          combined
                          complexity   complexity     complexity    complexity
DL-Lite, DL-LiteR         NLogSpace    P              NLogSpace     PSpace
EL, ELH                   P            P              P             PSpace
ELI, ELHI⊥, Horn-SHOIQ    P            Exp            P             Exp
ALC, ALCHQ                coNP         Exp            coNP-hard     2Exp
ALCI, SH, SHIQ            coNP         Exp            coNP-hard     2Exp
SHOIQ                     coNP-hard    coNExp         coNP-hard¹    coN2Exp-hard¹
¹ decidability open
96/109
comparison: complexity of answering (u)cqs
                          IQs                         CQs
                          data         combined       data          combined
                          complexity   complexity     complexity    complexity
DL-Lite, DL-LiteR         in AC0       NLogSpace      in AC0        NP
EL, ELH                   P            P              P             NP
ELI, ELHI⊥, Horn-SHOIQ    P            Exp            P             Exp
ALC, ALCHQ                coNP         Exp            coNP          Exp
ALCI, SH, SHIQ            coNP         Exp            coNP          2Exp
SHOIQ                     coNP-hard    coNExp         coNP-hard¹    coN2Exp-hard¹
¹ decidability open
97/109
queries with negation or recursion
queries with negation
Conjunctive query with safe negation (CQ¬s):
∙ like a CQ, but can also have negated atoms
∙ safety condition: every variable occurs in some positive atom
∙ example: find menus whose main course is not spicy
∃y Menu(x) ∧ hasMain(x, y) ∧ ¬Spicy(y)
Conjunctive query with inequalities (CQ≠):
∙ like a CQ, but can also have atoms t1 ≠ t2 (t1, t2 vars or individuals)
∙ example: find restaurants offering two menus having different dessert courses
∃y1y2z1z2 offers(x, y1) ∧ Menu(y1) ∧ hasDessert(y1, z1) ∧
offers(x, y2) ∧ Menu(y2) ∧ hasDessert(y2, z2) ∧ z1 ≠ z2
∙ example: find menus with at least three courses (see notes)
Note: unions of such queries (UCQ¬s, UCQ≠) can be defined in the obvious way
99/109
undecidability results for queries with negation
Adding negation leads to undecidability even in very restricted
settings.
Theorem The following problems are undecidable:
∙ CQ¬s answering in DL-LiteR
∙ UCQ¬s answering in EL⊥
∙ CQ≠ answering in DL-LiteR
∙ CQ≠ answering in EL⊥
Possible solution: adopt alternative semantics (see lecture notes)
100/109
undecidability results for queries with recursion
Significant interest in combining DLs with Datalog rules
Unfortunately, this almost always leads to undecidability:
Theorem Datalog query answering is undecidable in every DL that
can express (directly or indirectly) A ⊑ ∃r.A
In particular: undecidable in both DL-Lite and EL
Possible solutions:
∙ use restricted classes of Datalog queries (e.g. path queries)
∙ DL-safe rules: can only apply rules to (named) individuals
101/109
research trends in omqa
efficient omqa
Lots of work on developing and implementing efficient OMQA
algorithms
Focus mostly on DL-Lite (and related dialects):
∙ First algorithm PerfectRef proposed in mid-2000’s
∙ Rewrites into UCQs, implemented in Quonto
∙ Improved versions proposed in Requiem, Presto, Rapid, …
∙ Some algorithms rewrite into positive existential queries or
Datalog programs instead of UCQs
∙ Resulting queries are smaller, can be easier to evaluate
103/109
optimizations and omqa beyond dl-lite
Tractable classes, fragments of lower complexity
Rewriting engines for other Horn DLs also developed, e.g.,
∙ Requiem and the related Kyrie cover several EL dialects
∙ Clipper and, more recently, Rapid cover Horn-SHIQ
They usually rewrite into Datalog programs
104/109
understanding rewritability
Much attention devoted to understanding the limits of rewritability
and size of rewritings
When are polynomial rewritings possible?
Can we give bounds on the size of rewritings?
Which non-DL-Lite ontologies can be rewritten into FO-queries?
⇝ related to non-uniform complexity:
∙ study specific pairs (q, T ), called ontology-mediated queries
105/109
combined approaches
Saturate the ABox using the TBox axioms
⇝ a finite version of the canonical model
and then evaluate the query over the saturated ABox
Two approaches:
∙ modify the query before evaluation to ensure soundness, or
∙ evaluate and then filter unsound answers
First proposed for EL, then also for DL-Lite
Extended to other dialects, richer DLs
106/109
querying existing relational data using mappings
Today: assumed data given as ABox assertions (unary + binary facts)
Problem: how to query existing relational data (arbitrary arity)?
Solution: use mapping that specifies relationship between the
database relations and the concepts / roles in DL vocabulary
Formally: mapping assertions of the form φ → ψ where:
∙ φ is a query formulated using DB relations
∙ ψ is a query in the DL vocabulary
Global-as-view (GAV) mappings: φ CQ, ψ atom (no quantifiers)
Handling mappings:
∙ apply mappings to generate ABox, proceed as usual
∙ virtual ABox: unfolding step to get rewriting over DB relations
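The first option (apply the mappings to generate an ABox) can be sketched as follows. This is a minimal illustration under stated assumptions: the relational schema, the mapping encoding, and the function name `apply_gav_mappings` are all hypothetical, and the body φ of each mapping is restricted to a selection over a single relation, which is simpler than a general CQ.

```python
def apply_gav_mappings(database, mappings):
    """Materialize an ABox from relational data.

    database : dict relation name -> list of tuples
    mappings : list of (relation, condition, predicate, columns); every row of
               `relation` that satisfies `condition` yields the assertion
               predicate(row[i] for i in columns)
    """
    abox = set()
    for relation, condition, predicate, columns in mappings:
        for row in database.get(relation, []):
            if condition(row):
                abox.add((predicate,) + tuple(row[i] for i in columns))
    return abox

# Hypothetical schema: dish(id, name, category), menu_item(menu_id, dish_id, position)
database = {
    "dish": [("d1", "Tiramisu", "dessert"), ("d2", "Penne arrabbiata", "main")],
    "menu_item": [("m1", "d2", 1), ("m1", "d1", 2)],
}
mappings = [
    ("dish", lambda row: True, "Dish", (0,)),                   # dish(x, _, _) → Dish(x)
    ("dish", lambda row: row[2] == "dessert", "Dessert", (0,)), # dish(x, _, 'dessert') → Dessert(x)
    ("menu_item", lambda row: True, "hasCourse", (0, 1)),       # menu_item(x, y, _) → hasCourse(x, y)
]
abox = apply_gav_mappings(database, mappings)
print(sorted(abox))
```

The resulting set contains only unary and binary facts, so it can be handed to any of the ABox-based algorithms discussed earlier. The virtual-ABox option avoids this materialization by unfolding the rewritten query through the mappings instead.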
107/109
other research topics (non-exhaustive)
Beyond classical OMQA  
∙ inconsistency-tolerant query answering
∙ probabilistic query answering
∙ privacy-aware query answering
∙ temporal query answering
Support for building and maintaining OMQA systems
∙ module extraction
∙ ontology evolution
∙ query inseparability and emptiness
Improving the usability of OMQA systems
∙ interfaces and support for query formulation
∙ explaining query (non-)answers
108/109
Questions ?
109/109

Ontology-mediated query answering with data-tractable description logics

  • 1.
    ontology-mediated query answering with data-tractable descriptionlogics Meghyn Bienvenu (CNRS & Université de Montpellier) Magdalena Ortiz (Vienna University of Technology)
  • 2.
    ontology-mediated query answering(omqa) data incomplete database (ground facts) ontology (logical theory) ??? user query 2/109
  • 3.
    ontology-mediated query answering(omqa) data ??? patient data “Melanie has listeriosis” “Paul has Lyme disease” medical knowledge “Listeriosis & Lyme disease are bacterial infections” user query “Find all patients with bacterial infections” 2/109
  • 4.
    ontology-mediated query answering(omqa) data ??? patient data “Melanie has listeriosis” “Paul has Lyme disease” medical knowledge “Listeriosis & Lyme disease are bacterial infections” user query “Find all patients with bacterial infections” expected answers: Melanie, Paul 2/109
  • 5.
    ontology-mediated query answering(omqa) data ??? employee data “Marie is a professor” “Mark teaches CS200” org. knowledge “Professors are teaching staff” “Someone who teaches is part of the teaching staff” user query “Find all teaching staff” 2/109
  • 6.
    ontology-mediated query answering(omqa) data ??? employee data “Marie is a professor” “Mark teaches CS200” org. knowledge “Professors are teaching staff” “Someone who teaches is part of the teaching staff” user query “Find all teaching staff” expected answers: Marie, Mark 2/109
  • 7.
    what are ontologiesgood for? To standardize the terminology of an application domain ∙ meaning of terms is constrained, so less misunderstandings ∙ by adopting a common vocabulary, easy to share information 3/109
  • 8.
    what are ontologiesgood for? To standardize the terminology of an application domain ∙ meaning of terms is constrained, so less misunderstandings ∙ by adopting a common vocabulary, easy to share information To present an intuitive and unified view of data sources ∙ ontology can be used to enrich the data vocabulary, making it easier for users to formulate their queries ∙ especially useful when integrating multiple data sources 3/109
  • 9.
    what are ontologiesgood for? To standardize the terminology of an application domain ∙ meaning of terms is constrained, so less misunderstandings ∙ by adopting a common vocabulary, easy to share information To present an intuitive and unified view of data sources ∙ ontology can be used to enrich the data vocabulary, making it easier for users to formulate their queries ∙ especially useful when integrating multiple data sources To support automated reasoning ∙ uncover implicit connections between terms, errors in modelling ∙ exploit knowledge in the ontology during query answering, to get back a more complete set of answers to queries 3/109
  • 10.
    applications of omqa:medicine General medical ontologies: SNOMED CT (∼ 400,000 terms!), GALEN Specialized ontologies: FMA (anatomy), NCI (cancer), ... Querying & exchanging medical records (find patients for medical trials) ∙ myocardial infarction vs. MI vs. heart attack vs. 410.0 Supports tools for annotating and visualizing patient data (scans, x-rays) 4/109
  • 11.
    applications of omqa:life sciences Hundreds of ontologies at BioPortal (http://bioportal.bioontology.org/): Gene Ontology (GO), Cell Ontology, Pathway Ontology, Plant Anatomy, ... Help scientists share, query, & visualize experimental data 5/109
  • 12.
    applications of omqa:entreprise information systems Companies and organizations have lots of data ∙ need easy and flexible access to support decision-making Example industrial projects: ∙ Public debt data: Sapienza Univ. & Italian Department of Treasury ∙ Energy sector: Optique EU project (several univ, StatOil, & Siemens) 6/109
  • 13.
    our focus: horndescription logics Ontologies formulated using description logics (DLs): ∙ family of decidable fragments of first-order logic ∙ basis for OWL web ontology language (W3C) ∙ range from fairly simple to highly expressive ∙ complexity of query answering well understood 7/109
  • 14.
    our focus: horndescription logics Ontologies formulated using description logics (DLs): ∙ family of decidable fragments of first-order logic ∙ basis for OWL web ontology language (W3C) ∙ range from fairly simple to highly expressive ∙ complexity of query answering well understood In this tutorial, focus on Horn description logics: ∙ DL-LiteR, EL, ELHI, Horn-SHIQ, ... ∙ good computational properties, well suited for OMQA ∙ still expressive enough for interesting applications ∙ basis for OWL 2 QL and OWL 2 EL profiles Consider various types of queries 7/109
  • 15.
    plan for today ∙Horn Description Logics ∙ Basics of OMQA ∙ Instance Queries ∙ Conjunctive Queries ∙ Navigational Queries ∙ Queries with Negation and Recursion ∙ Research Trends in OMQA 8/109
  • 16.
  • 17.
    dl basics Building blocksof DLs: ∙ concept names (unary predicates, classes) IceCream, Pizza, Meat, SpicyDish, Dish, Menu, Restaurant, ... ∙ role names (binary predicates, properties) hasIngred, hasCourse, hasDessert, serves, ... ∙ individual names (constants) menu32, pastadish17, d3, rest156, r12, ... (specific menus, dishes, restaurants ...) NC / NR / NI: set of all concept / role / individual names 10/109
  • 18.
    dl knowledge bases Knowledgebase (KB) = ABox (data) + TBox (ontology) ABox contains facts about specific individuals ∙ finite set of concept assertions A(a) and role assertions r(a, b) ∙ IceCream(d2): dish d2 is of type IceCream ∙ hasDessert(m, d2): menu m is connected via hasDessert to dish d2 11/109
  • 19.
    dl knowledge bases Knowledgebase (KB) = ABox (data) + TBox (ontology) ABox contains facts about specific individuals ∙ finite set of concept assertions A(a) and role assertions r(a, b) ∙ IceCream(d2): dish d2 is of type IceCream ∙ hasDessert(m, d2): menu m is connected via hasDessert to dish d2 TBox contains general knowledge about the domain of interest ∙ finite set of axioms (details on syntax to follow) ∙ IceCream is a subclass of Dessert ∙ hasCourse connects Menus to Dishes ∙ every Menu is connected to at least one dish via hasCourse 11/109
  • 20.
    concept and roleconstructors Can build complex concepts and roles using constructors: ∙ conjunction (⊓), disjunction (⊔), negation (¬) Dessert ⊓ ¬IceCream Pizza ⊔ PastaDish 12/109
  • 21.
    concept and roleconstructors Can build complex concepts and roles using constructors: ∙ conjunction (⊓), disjunction (⊔), negation (¬) Dessert ⊓ ¬IceCream Pizza ⊔ PastaDish ∙ restricted forms of existential and universal quantification (∃, ∀) ∃hasCourse.⊤ ∃contains.Meat Dish ⊓ ∀contains.¬Meat ( ⊤ acts as a “wildcard”, denotes set of all things) 12/109
  • 22.
    concept and roleconstructors Can build complex concepts and roles using constructors: ∙ conjunction (⊓), disjunction (⊔), negation (¬) Dessert ⊓ ¬IceCream Pizza ⊔ PastaDish ∙ restricted forms of existential and universal quantification (∃, ∀) ∃hasCourse.⊤ ∃contains.Meat Dish ⊓ ∀contains.¬Meat ( ⊤ acts as a “wildcard”, denotes set of all things) ∙ inverse (− ) and composition (·) of roles hasCourse− contains · contains (use N± R for set of role names and inverse roles) (use inv(r) to toggle −: inv(r) = r−, inv(r−) = r ) 12/109
  • 23.
    concept and roleconstructors Can build complex concepts and roles using constructors: ∙ conjunction (⊓), disjunction (⊔), negation (¬) Dessert ⊓ ¬IceCream Pizza ⊔ PastaDish ∙ restricted forms of existential and universal quantification (∃, ∀) ∃hasCourse.⊤ ∃contains.Meat Dish ⊓ ∀contains.¬Meat ( ⊤ acts as a “wildcard”, denotes set of all things) ∙ inverse (− ) and composition (·) of roles hasCourse− contains · contains (use N± R for set of role names and inverse roles) (use inv(r) to toggle −: inv(r) = r−, inv(r−) = r ) Note: set of available constructors depends on the particular DL! 12/109
  • 24.
    tbox axioms Concept inclusionsC ⊑ D (C, D possibly complex concepts) IceCream ⊑ Dessert Menu ⊑ ∃hasCourse.⊤ Spicy ⊓ Dish ⊑ SpicyDish Role inclusions R ⊑ S (R, S possibly complex roles) hasIngred ⊑ contains ingredOf− ⊑ hasIngred hasDessert ⊑ hasCourse Note: type and syntax of axioms depends on the particular DL! 13/109
  • 25.
    dl semantics Interpretation I(“possible world”) ∙ domain of objects ∆I (possibly infinite set) ∙ interpretation function ·I that maps ∙ concept name A ⇝ set of objects AI ⊆ ∆I ∙ role name r ⇝ set of pairs of objects rI ⊆ ∆I × ∆I ∙ individual name a ⇝ object aI ∈ ∆I 14/109
  • 26.
    example: interpretation ∆I italFeastI chCakeI Dish I DessertI Appetizer I MenuI hasCourse hasCourse hasCourse hasCourse 4 conceptnames: Dish, Dessert, Appetizer, Menu 1 role name: hasCourse 2 individual names: italFeast, chCake 15/109
  • 27.
    dl semantics Interpretation I(“possible world”) ∙ domain of objects ∆I (possibly infinite set) ∙ interpretation function ·I that maps ∙ concept name A ⇝ set of objects AI ⊆ ∆I ∙ role name r ⇝ set of pairs of objects rI ⊆ ∆I × ∆I ∙ individual name a ⇝ object aI ∈ ∆I Interpretation function ·I extends to complex concepts and roles: ⊤ ∆I ⊥ ∅ ¬C ∆I CI C1 ⊓ C2 C1 I ∩ C2 I ∃R.C {d1 | there exists (d1, d2) ∈ RI with d2 ∈ CI } ∀R.C {d1 | d2 ∈ CI for all (d1, d2) ∈ RI } r− {(d2, d1) | (d1, d2) ∈ rI } 16/109
  • 28.
    back to theexample ∆I italFeastI chCakeI Dish I DessertI Appetizer I MenuI hasCourse hasCourse hasCourse hasCourse Dish ⊓ Menu Dessert ⊓ Appetizer ∃hasCourse.⊤ ∃hasCourse− .Dessert 17/109
  • 29.
    semantics of dlkbs Satisfaction in an interpretation ∙ I satisfies C ⊑ D ⇔ CI ⊆ DI ∙ I satisfies R ⊑ S ⇔ RI ⊆ SI 18/109
  • 30.
    semantics of dlkbs Satisfaction in an interpretation ∙ I satisfies C ⊑ D ⇔ CI ⊆ DI ∙ I satisfies R ⊑ S ⇔ RI ⊆ SI ∙ I satisfies A(a) ⇔ aI ∈ AI ∙ I satisfies r(a, b) ⇔ (aI , bI ) ∈ rI 18/109
  • 31.
    semantics of dlkbs Satisfaction in an interpretation ∙ I satisfies C ⊑ D ⇔ CI ⊆ DI ∙ I satisfies R ⊑ S ⇔ RI ⊆ SI ∙ I satisfies A(a) ⇔ aI ∈ AI ∙ I satisfies r(a, b) ⇔ (aI , bI ) ∈ rI Model of a KB K = interpretation that satisfies all statements in K K is satisfiable = K has at least one model K entails α (written K |= α) = every model I of K satisfies α 18/109
  • 32.
    back to theexample ∆I italFeastI chCakeI Dish I DessertI Appetizer I MenuI hasCourse hasCourse hasCourse hasCourse Which of the following assertions / axioms is satisfied in I? Dessert ⊑ Dish Dish ⊓ Menu ⊑ ⊥ Menu ⊑ ∃hasCourse.⊤ ∃hasCourse− .⊤ ⊑ Dish Menu(italFeast) hasCourse(italFeast, chCake) 19/109
  • 33.
    some important horndls Idea: Horn DLs cannot express disjunction (explicitly or implicitly) ∙ better computational properties than non-Horn DLs (more on this later) 20/109
  • 34.
    some important horndls Idea: Horn DLs cannot express disjunction (explicitly or implicitly) ∙ better computational properties than non-Horn DLs (more on this later) DL-LiteR ∙ concept inclusions B1 ⊑ (¬)B2 B1, B2 either A ∈ NC or ∃R (R ∈ N± R ) ∙ role inclusions R1 ⊑ (¬)R2 R1, R2 ∈ N± R 20/109
  • 35.
    some important horndls Idea: Horn DLs cannot express disjunction (explicitly or implicitly) ∙ better computational properties than non-Horn DLs (more on this later) DL-LiteR ∙ concept inclusions B1 ⊑ (¬)B2 B1, B2 either A ∈ NC or ∃R (R ∈ N± R ) ∙ role inclusions R1 ⊑ (¬)R2 R1, R2 ∈ N± R EL ∙ allows only ⊤, ⊓, and ∃r.C as constructors ∙ only concept inclusions in TBox 20/109
  • 36.
    some important horndls Idea: Horn DLs cannot express disjunction (explicitly or implicitly) ∙ better computational properties than non-Horn DLs (more on this later) DL-LiteR ∙ concept inclusions B1 ⊑ (¬)B2 B1, B2 either A ∈ NC or ∃R (R ∈ N± R ) ∙ role inclusions R1 ⊑ (¬)R2 R1, R2 ∈ N± R EL ∙ allows only ⊤, ⊓, and ∃r.C as constructors ∙ only concept inclusions in TBox ELHI⊥ ∙ additionally allows for ⊥ and inverse roles (r− ) ∙ can also have role inclusions 20/109
  • 37.
    some important horndls Idea: Horn DLs cannot express disjunction (explicitly or implicitly) ∙ better computational properties than non-Horn DLs (more on this later) DL-LiteR ∙ concept inclusions B1 ⊑ (¬)B2 B1, B2 either A ∈ NC or ∃R (R ∈ N± R ) ∙ role inclusions R1 ⊑ (¬)R2 R1, R2 ∈ N± R EL ∙ allows only ⊤, ⊓, and ∃r.C as constructors ∙ only concept inclusions in TBox ELHI⊥ ∙ additionally allows for ⊥ and inverse roles (r− ) ∙ can also have role inclusions Horn-SHIQ ∙ limited use of ¬, ∀r.C, and number restrictions (≥ nR.C, ≤ nR.C) ∙ also have transitivity axioms (e.g. assert contains is transitive) 20/109
  • 38.
  • 39.
    aboxes vs. databases ABoxesand databases (DBs) and are syntactically similar: ∙ ABox = finite set of assertions (unary and binary facts) ∙ Database = finite set of facts of arbitrary arity 22/109
  • 40.
    aboxes vs. databases ABoxesand databases (DBs) and are syntactically similar: ∙ ABox = finite set of assertions (unary and binary facts) ∙ Database = finite set of facts of arbitrary arity ABoxes interpreted under open world assumption: ∙ every assertion in the ABox is assumed to hold (true) ∙ assertions not present in the ABox may hold or not (unknown) Each ABox gives rise to many interpretations (its models) ∙ models can be infinite, can have infinitely many models 22/109
  • 41.
    aboxes vs. databases ABoxesand databases (DBs) and are syntactically similar: ∙ ABox = finite set of assertions (unary and binary facts) ∙ Database = finite set of facts of arbitrary arity ABoxes interpreted under open world assumption: ∙ every assertion in the ABox is assumed to hold (true) ∙ assertions not present in the ABox may hold or not (unknown) Each ABox gives rise to many interpretations (its models) ∙ models can be infinite, can have infinitely many models Databases interpreted under closed world assumption: ∙ every fact in the DB is assumed to hold (true) ∙ every fact not in the DB is assumed not to hold (false) In other words, each DB corresponds to single finite interpretation ∙ domain of the interpretation = set of constants in DB 22/109
  • 42.
    querying databases
    A database query q of arity n (Boolean query = arity 0) maps:
    ∙ a database D ⇝ ans(q, D) = set of n-tuples of constants from D
    ∙ an interpretation I ⇝ ans(q, I) = set of n-tuples of elements from I
    First-order (FO) query = first-order formula
    ∙ arity of FO query = number of free variables
    ∙ answers = substitutions for the free variables that make the formula hold
    ∙ example: Dish(x) ∧ ∀y.(contains(x, y) → ¬Spicy(y))
    Datalog query = finite set of Datalog rules + ‘goal’ relation
    ∙ arity of Datalog query = arity of goal relation
    ∙ answers = exhaustively apply the rules to the DB / interpretation, then collect the tuples in the goal relation
    ∙ example: rules contains(x, z) ← contains(x, y), contains(y, z) and SpicyDish(x) ← Dish(x), contains(x, y), Spicy(y)
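The Datalog semantics described above ("exhaustively apply rules, collect tuples in goal relation") can be illustrated with a minimal naive evaluator. This is only a sketch: the tuple encoding of facts, the "?x" convention for variables, and the sample database are assumptions made for illustration, not part of the slides.

```python
def is_var(term):
    # Assumed convention: variables are strings starting with "?"
    return isinstance(term, str) and term.startswith("?")

def match_atom(atom, fact, binding):
    """Extend `binding` so that `atom` matches `fact`; return None on failure."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    binding = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding

def evaluate(rules, facts):
    """Apply rules (head, [body atoms]) until no new fact is derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            bindings = [{}]
            for atom in body:
                extended = []
                for b in bindings:
                    for f in facts:
                        b2 = match_atom(atom, f, b)
                        if b2 is not None:
                            extended.append(b2)
                bindings = extended
            for b in bindings:
                new_fact = (head[0],) + tuple(b[t] if is_var(t) else t for t in head[1:])
                if new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts

# The two example rules from the slide
rules = [
    (("contains", "?x", "?z"), [("contains", "?x", "?y"), ("contains", "?y", "?z")]),
    (("SpicyDish", "?x"), [("Dish", "?x"), ("contains", "?x", "?y"), ("Spicy", "?y")]),
]
# Hypothetical database
db = {("Dish", "penne"), ("contains", "penne", "sauce"),
      ("contains", "sauce", "chili"), ("Spicy", "chili")}

result = evaluate(rules, db)
answers = {f[1:] for f in result if f[0] == "SpicyDish"}  # goal relation: SpicyDish
print(answers)
```

Here penne becomes a SpicyDish only after the transitivity rule has derived contains(penne, chili), which shows why the rules must be applied to a fixpoint.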
  • 46.
    querying dl knowledge bases
    Problem: each KB gives rise to multiple interpretations (its models), but DB query semantics defines answers w.r.t. a single interpretation
    Solution: adopt certain answer semantics
    ∙ require a tuple to be an answer w.r.t. all models of the KB
    Formally: call a tuple (a1, ..., an) of individuals from A a certain answer to an n-ary query q over a DL KB K = (T, A) iff (a1^I, ..., an^I) ∈ ans(q, I) for every model I of K
    The set of certain answers to q over K is written cert(q, K)
    Ontology-mediated query answering (OMQA) = computing certain answers to queries
  • 50.
    example: certain answers
    Consider the query q(x) = Dessert(x) and the DL-LiteR KB K = (T, A):
    T = { Cake ⊑ Dessert, IceCream ⊑ Dessert, hasDessert ⊑ hasCourse, ∃hasCourse ⊑ Menu, ∃hasDessert− ⊑ Dessert }
    A = { Cake(d1), IceCream(d2), Dessert(d3), hasDessert(m, d4) }
    There are four certain answers to q w.r.t. K:
    ∙ d1 ∈ cert(q, K): Cake(d1) ∈ A, Cake ⊑ Dessert ∈ T
    ∙ d2 ∈ cert(q, K): IceCream(d2) ∈ A, IceCream ⊑ Dessert ∈ T
    ∙ d3 ∈ cert(q, K): Dessert(d3) ∈ A
    ∙ d4 ∈ cert(q, K): hasDessert(m, d4) ∈ A, ∃hasDessert− ⊑ Dessert ∈ T
    The fifth individual m is not a certain answer: we can construct a model J of K in which m^J ∉ Dessert^J (see lecture notes).
  • 53.
    key techniques for omqa: query rewriting
    Query rewriting: reduces the problem of finding certain answers to standard DB query evaluation (⇝ exploit existing DB systems)
    [Diagram: TBox T + query q ⇝ query rewriting ⇝ database query q′; q′ + ABox A ⇝ query evaluation ⇝ query answers]
    Call q′(x⃗) a rewriting of q(x⃗) w.r.t. T iff for every ABox A and tuple a⃗:
    T, A |= q(a⃗) ⇔ a⃗ ∈ ans(q′(x⃗), I_A)   (I_A = treat A as DB)
    Types of rewritings: FO-rewritings (SQL), Datalog rewritings, ...
  • 56.
    key techniques for omqa: saturation
    Saturation: render explicit (some of) the implicit information contained in the KB, making it available for query evaluation
    Simple use of saturation (works e.g. for RDFS ontologies):
    ∙ use saturation to ‘complete’ the ABox by adding those assertions that are logically entailed by the KB
    ∙ then evaluate the query over the saturated ABox
    More complex uses:
    ∙ enrich the ABox in other ways (e.g. add new ABox individuals to witness the existential restrictions ∃R.C)
    ∙ combine saturation with query rewriting
  • 59.
    complexity of omqa
    View OMQA as a decision problem (yes-or-no question):
    Problem: Q answering in L (Q a query language, L a DL)
    Input: an n-ary query q ∈ Q, an ABox A, an L-TBox T, and a tuple a⃗ ∈ Ind(A)^n
    Question: does a⃗ belong to cert(q, (T, A))?
    Combined complexity: in terms of the size of the whole input
    Data complexity: in terms of the size of A only
    ∙ view the rest of the input as fixed (of constant size)
    ∙ motivation: the ABox is typically much larger than the rest of the input
    Note: we use |A| to denote the size of A (similarly for |T|, |q|, etc.)
  • 61.
    complexity classes
    We will mention the following standard classes:
    ∙ P: problems solvable in deterministic polynomial time
    ∙ NP: problems solvable in non-deterministic polynomial time
    ∙ coNP: problems whose complement is solvable in non-deterministic polynomial time
    ∙ LogSpace: problems solvable in deterministic logarithmic space
    ∙ NLogSpace: problems solvable in non-deterministic logarithmic space
    ∙ PSpace: problems solvable in polynomial space (note: = NPSpace)
    ∙ Exp: problems solvable in deterministic exponential time
    Another less well-known but important class:
    ∙ AC0: problems solvable by a uniform family of polynomial-size constant-depth circuits
    Relationships between classes:
    AC0 ⊊ LogSpace ⊆ NLogSpace ⊆ P ⊆ NP ⊆ PSpace ⊆ Exp
  • 63.
  • 64.
    instance queries
    Instance queries (IQs): find instances of a given concept or role
    ∙ A(x) where A ∈ NC: concept instance query
    ∙ r(x, y) where r ∈ NR: role instance query
    To query for a complex concept C, take A_C(x) for a fresh A_C ∈ NC and add C ⊑ A_C to the TBox
    Remarks:
    ∙ Instance query answering is often called instance checking
    ∙ Focus of OMQA until the mid-2000s
  • 67.
    instance checking in dl-lite via query rewriting
    Input = instance query q + DL-LiteR TBox T
    We construct an FO-rewriting of q w.r.t. T
    More specifically, we construct:
    ∙ an FO-rewriting of q relative to consistent ABoxes, and
    ∙ an FO-rewriting of unsatisfiability
    (these can easily be combined into an FO-rewriting of q for all ABoxes)
  • 68.
    rewriting relative to consistent aboxes
    We first define two procedures:
    ComputeSubsumees: all reasons for an individual to be in B
    ∙ input: concept B, TBox T
    ∙ output: set of C such that T |= C ⊑ B ⇝ subsumees of B w.r.t. T
    ComputeSubroles: all reasons for a pair to be in R
    ∙ input: role R, TBox T
    ∙ output: set of S such that T |= S ⊑ R ⇝ subroles of R w.r.t. T
  • 69.
    computing subsumees
    Algorithm ComputeSubsumees
    Input: DL-LiteR TBox T, concept B ∈ NC ∪ {∃R | R ∈ N±R}
    1. Initialize Subsumees = {B} and Examined = ∅.
    2. While Subsumees ∖ Examined ≠ ∅
      2.1 Select D ∈ Subsumees ∖ Examined and add D to Examined.
      2.2 For every concept inclusion C ⊑ D ∈ T
          ∙ If C ∉ Subsumees, add C to Subsumees.
      2.3 For every role inclusion R ⊑ S ∈ T such that D = ∃S
          ∙ If ∃R ∉ Subsumees, add ∃R to Subsumees.
      2.4 For every role inclusion R ⊑ S ∈ T such that D = ∃inv(S)
          ∙ If ∃inv(R) ∉ Subsumees, add ∃inv(R) to Subsumees.
    3. Return Subsumees.
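A minimal Python sketch of ComputeSubsumees follows. The encoding is an assumption made for illustration: atomic concepts are strings, ∃R is the tuple ("exists", R), and the inverse of role r is written "r-". Only positive inclusions are passed in. The sample TBox is the one used in the slides' worked example.

```python
def inv(role):
    """Inverse of a role; 'r-' denotes the inverse of 'r' (assumed encoding)."""
    return role[:-1] if role.endswith("-") else role + "-"

def compute_subsumees(concept_incls, role_incls, b):
    """concept_incls: pairs (C, D) for C ⊑ D; role_incls: pairs (R, S) for R ⊑ S."""
    subsumees, examined = {b}, set()
    while subsumees - examined:
        d = next(iter(subsumees - examined))   # step 2.1
        examined.add(d)
        for c, d2 in concept_incls:            # step 2.2
            if d2 == d:
                subsumees.add(c)
        if isinstance(d, tuple):               # d has the form ∃S'
            for r, s in role_incls:
                if d[1] == s:                  # step 2.3: D = ∃S
                    subsumees.add(("exists", r))
                if d[1] == inv(s):             # step 2.4: D = ∃inv(S)
                    subsumees.add(("exists", inv(r)))
    return subsumees

concept_incls = [
    ("ItalDish", "Dish"),
    ("VegDish", "Dish"),
    ("Dish", ("exists", "hasIngred")),
    (("exists", "hasCourse-"), "Dish"),
]
role_incls = [("hasMain", "hasCourse"), ("hasDessert", "hasCourse")]

dish_subsumees = compute_subsumees(concept_incls, role_incls, "Dish")
print(dish_subsumees)
```

Because sets are used, the explicit "if C ∉ Subsumees" checks of the pseudocode become unnecessary; the order in which concepts are selected does not affect the result.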
  • 70.
    computing subsumees: an example (1/3)
    ComputeSubsumees on (T, Dish), where T:
    ItalDish ⊑ Dish    VegDish ⊑ Dish    Dish ⊑ ∃hasIngred
    ∃hasCourse− ⊑ Dish    hasMain ⊑ hasCourse    hasDessert ⊑ hasCourse
    Initially:
    Examined = ∅
    Subsumees = {Dish}
    Choose: Dish
    Examined = {Dish}
    Subsumees = {Dish, ItalDish, VegDish, ∃hasCourse−}
    Choose: ItalDish
    Examined = {Dish, ItalDish}
    Subsumees = {Dish, ItalDish, VegDish, ∃hasCourse−}
  • 73.
    computing subsumees: an example (2/3)
    ComputeSubsumees on (T, Dish), where T:
    ItalDish ⊑ Dish    VegDish ⊑ Dish    Dish ⊑ ∃hasIngred
    ∃hasCourse− ⊑ Dish    hasMain ⊑ hasCourse    hasDessert ⊑ hasCourse
    Choose: VegDish
    Examined = {Dish, ItalDish, VegDish}
    Subsumees = {Dish, ItalDish, VegDish, ∃hasCourse−}
    Choose: ∃hasCourse−
    Examined = {Dish, ItalDish, VegDish, ∃hasCourse−}
    Subsumees = {Dish, ItalDish, VegDish, ∃hasCourse−, ∃hasMain−, ∃hasDessert−}
  • 75.
    computing subsumees: an example (3/3)
    ComputeSubsumees on (T, Dish), where T:
    ItalDish ⊑ Dish    VegDish ⊑ Dish    Dish ⊑ ∃hasIngred
    ∃hasCourse− ⊑ Dish    hasMain ⊑ hasCourse    hasDessert ⊑ hasCourse
    Choose: ∃hasMain−
    Examined = {Dish, ItalDish, VegDish, ∃hasCourse−, ∃hasMain−}
    Subsumees = {Dish, ItalDish, VegDish, ∃hasCourse−, ∃hasMain−, ∃hasDessert−}
    Choose: ∃hasDessert−
    Examined = {Dish, ItalDish, VegDish, ∃hasCourse−, ∃hasMain−, ∃hasDessert−}
    Subsumees = {Dish, ItalDish, VegDish, ∃hasCourse−, ∃hasMain−, ∃hasDessert−}
  • 77.
    computing subroles
    Algorithm ComputeSubroles
    Input: DL-LiteR TBox T, role R ∈ N±R
    1. Initialize Subroles = {R} and Examined = ∅.
    2. While Subroles ∖ Examined ≠ ∅
      2.1 Select S ∈ Subroles ∖ Examined and add S to Examined.
      2.2 For every role inclusion U ⊑ S or inv(U) ⊑ inv(S) in T
          ∙ If U ∉ Subroles, add U to Subroles.
    3. Return Subroles.
    Example TBox:
    ItalDish ⊑ Dish    VegDish ⊑ Dish    Dish ⊑ ∃hasIngred
    ∃hasCourse− ⊑ Dish    hasMain ⊑ hasCourse    hasDessert ⊑ hasCourse
    Run on hasCourse: Subroles = {hasCourse, hasMain, hasDessert}
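ComputeSubroles admits the same kind of sketch. As before, the string encoding of roles (with "r-" for the inverse of r) is an assumption made for illustration.

```python
def inv(role):
    """Inverse of a role; 'r-' denotes the inverse of 'r' (assumed encoding)."""
    return role[:-1] if role.endswith("-") else role + "-"

def compute_subroles(role_incls, r):
    """role_incls: pairs (U, V) standing for the role inclusion U ⊑ V."""
    subroles, examined = {r}, set()
    while subroles - examined:
        s = next(iter(subroles - examined))    # step 2.1
        examined.add(s)
        for u, v in role_incls:                # step 2.2
            if v == s:                         # inclusion U ⊑ S
                subroles.add(u)
            if inv(v) == s:                    # inclusion inv(U') ⊑ inv(S), stored as (u, v)
                subroles.add(inv(u))
    return subroles

role_incls = [("hasMain", "hasCourse"), ("hasDessert", "hasCourse")]
course_subroles = compute_subroles(role_incls, "hasCourse")
inverse_subroles = compute_subroles(role_incls, "hasCourse-")
print(course_subroles, inverse_subroles)
```

The second call shows the effect of the inv(U) ⊑ inv(S) case: querying the inverse role returns the inverses of the subroles.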
  • 79.
    from concepts and roles to queries
    Let S_C = ComputeSubsumees(A, T) and S_R = ComputeSubroles(r, T).
    Rewriting of A(x) w.r.t. T (and consistent ABoxes):
    RewriteIQ(A, T) = ⋁_{C ∈ S_C ∩ NC} C(x) ∨ ⋁_{∃r ∈ S_C} ∃y.r(x, y) ∨ ⋁_{∃r− ∈ S_C} ∃y.r(y, x)
    Rewriting of r(x, y) w.r.t. T (and consistent ABoxes):
    RewriteIQ(r, T) = ⋁_{s ∈ S_R} s(x, y) ∨ ⋁_{s− ∈ S_R} s(y, x)
    The rewriting is ABox-independent and of polynomial size in |T| and |q|.
  • 82.
    example of query rewriting (1/2)
    We have already computed:
    ComputeSubsumees(Dish, T) = {Dish, ItalDish, VegDish, ∃hasCourse−, ∃hasMain−, ∃hasDessert−}
    This gives the following rewriting of Dish(x) w.r.t. T:
    RewriteIQ(Dish, T) = Dish(x) ∨ ItalDish(x) ∨ VegDish(x) ∨ ∃y.hasCourse(y, x) ∨ ∃y.hasMain(y, x) ∨ ∃y.hasDessert(y, x)
  • 83.
    example of query rewriting (2/2)
    TBox T:
    ItalDish ⊑ Dish    VegDish ⊑ Dish    Dish ⊑ ∃hasIngred
    ∃hasCourse− ⊑ Dish    hasMain ⊑ hasCourse    hasDessert ⊑ hasCourse
    ABox A: hasMain(m, d1)    hasDessert(m, d2)    VegDish(d3)
    RewriteIQ(Dish, T) = Dish(x) ∨ ItalDish(x) ∨ VegDish(x) ∨ ∃y.hasCourse(y, x) ∨ ∃y.hasMain(y, x) ∨ ∃y.hasDessert(y, x)
    Certain answers:
    ∙ d1, because of the disjunct ∃y.hasMain(y, x)
    ∙ d2, because of the disjunct ∃y.hasDessert(y, x)
    ∙ d3, because of the disjunct VegDish(x)
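To make the evaluation step concrete, the rewriting can be evaluated over the ABox treated as a plain database. The sketch below assumes a hypothetical encoding (facts as tuples, ∃R as ("exists", R), "r-" for inverse roles) and evaluates each disjunct of the rewriting of a concept instance query separately.

```python
def eval_concept_rewriting(subsumees, abox):
    """Evaluate the disjunction induced by `subsumees` over `abox`.

    abox: facts (A, a) for A(a) and (r, a, b) for r(a, b).
    """
    answers = set()
    for c in subsumees:
        if isinstance(c, str):                 # disjunct C(x)
            answers |= {f[1] for f in abox if len(f) == 2 and f[0] == c}
        elif c[1].endswith("-"):               # disjunct ∃y.r(y, x) for ∃r−
            role = c[1][:-1]
            answers |= {f[2] for f in abox if len(f) == 3 and f[0] == role}
        else:                                  # disjunct ∃y.r(x, y) for ∃r
            answers |= {f[1] for f in abox if len(f) == 3 and f[0] == c[1]}
    return answers

# Subsumees of Dish, as computed in the example
subsumees = ["Dish", "ItalDish", "VegDish",
             ("exists", "hasCourse-"), ("exists", "hasMain-"), ("exists", "hasDessert-")]
abox = {("hasMain", "m", "d1"), ("hasDessert", "m", "d2"), ("VegDish", "d3")}

dishes = eval_concept_rewriting(subsumees, abox)
print(sorted(dishes))
```

In a real deployment the disjunction would typically be handed to a database engine as a single query (for instance a SQL UNION); the loop above only mimics that evaluation.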
  • 84.
    checking unsatisfiability
    We have an FO-rewriting of q w.r.t. T relative to consistent ABoxes
    We need a rewriting of unsatisfiability to obtain a rewriting of q for all ABoxes
    ∙ only negative inclusions are relevant
    ∙ one subquery for each such inclusion G ⊑ ¬H
    ∙ consider all possible ways of violating G ⊑ ¬H: combinations of a subsumee (subrole) of G and a subsumee (subrole) of H
    Details in the lecture notes.
  • 86.
    complexity of instance checking in dl-lite
    In data complexity:
    ∙ rewriting takes constant time, yields an FO query
    ∙ upper bound from FO query evaluation: AC0
    In combined complexity:
    ∙ P membership: rewriting and evaluation both in polynomial time
    ∙ NLogSpace upper bound: ‘guess’ the relevant part of the rewriting
    Theorem. In DL-LiteR, satisfiability and instance checking are
    1. in AC0 for data complexity
    2. NLogSpace-complete for combined complexity.
    Note: the same bounds hold for several other DL-Lite dialects
  • 89.
    instance checking in el
    Next consider instance checking in EL.
    Assume EL TBoxes are given in normal form, with axioms of the forms:
    A1 ⊓ ... ⊓ An ⊑ B    A ⊑ ∃r.B    ∃r.A ⊑ B
    Cannot use the FO query rewriting approach for EL: there is no FO-rewriting of A(x) w.r.t. T = {∃r.A ⊑ A}
    We present a saturation-based approach.
  • 92.
    saturation rules for el
    TBox rules (premises ⟹ conclusion):
    T1: A ⊑ Bi (1 ≤ i ≤ n), B1 ⊓ ... ⊓ Bn ⊑ D ⟹ A ⊑ D
    T2: A ⊑ B, B ⊑ ∃r.D ⟹ A ⊑ ∃r.D
    T3: A ⊑ ∃r.B, B ⊑ D, ∃r.D ⊑ E ⟹ A ⊑ E
    ABox rules (premises ⟹ conclusion):
    A1: A1 ⊓ ... ⊓ An ⊑ B, Ai(a) (1 ≤ i ≤ n) ⟹ B(a)
    A2: ∃r.B ⊑ A, r(a, b), B(b) ⟹ A(a)
    Algorithm: apply the rules exhaustively, then check whether A(a) (resp. r(a, b)) is present
  • 93.
  • 94.
    example: saturation in el
    Derived axioms and assertions (number, statement, rule: premises):
    (10) ArrabSauce ⊑ Spicy                T3: (5), (6), (7)
    (11) PenneArrab ⊑ Spicy                T3: (1), (10), (7)
    (12) PenneArrab ⊑ Dish                 T1: (2), (3)
    (13) PenneArrab ⊑ ∃hasIngred.Pasta     T2: (2), (4)
    (14) PenneArrab ⊑ SpicyDish            T1: (11), (12), (8)
    (15) Spicy(p)                          A1: (11), (9)
    (16) Dish(p)                           A1: (12), (9)
    (17) SpicyDish(p)                      A1: (16), (15)
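The EL saturation rules T1 to T3 and A1, A2 can be sketched as a simple fixpoint loop. Several things here are assumptions: the tuple encoding of axioms, the use of trivial subsumptions A ⊑ A as a starting point, and the input KB itself, which is inferred from the premises cited in the example above because the input axioms (1) to (9) are not reproduced in this transcript.

```python
def saturate_el(conj, ex_rhs, ex_lhs, abox):
    """conj: (frozenset of names, B) for A1 ⊓ ... ⊓ An ⊑ B
    ex_rhs: (A, r, B) for A ⊑ ∃r.B;  ex_lhs: (r, A, B) for ∃r.A ⊑ B
    abox: facts (A, a) or (r, a, b).
    Returns (atomic subsumptions, existential axioms, facts)."""
    names = {n for lhs, b in conj for n in lhs | {b}}
    names |= {n for a, _, b in ex_rhs for n in (a, b)}
    names |= {n for _, a, b in ex_lhs for n in (a, b)}
    subs = {(a, a) for a in names}     # trivial A ⊑ A (assumed starting point)
    ex, facts = set(ex_rhs), set(abox)
    individuals = {t for f in facts for t in f[1:]}
    while True:
        new_subs, new_ex, new_facts = set(), set(), set()
        for lhs, d in conj:            # T1, and A1 on the input conjunction axioms
            new_subs |= {(a, d) for a in names if all((a, b) in subs for b in lhs)}
            new_facts |= {(d, i) for i in individuals
                          if all((b, i) in facts for b in lhs)}
        for a, b in subs:
            new_ex |= {(a, r, d) for b2, r, d in ex if b2 == b}            # T2
            new_facts |= {(b, i) for i in individuals if (a, i) in facts}  # A1, derived A ⊑ B
        for a, r, b in ex:             # T3
            new_subs |= {(a, e) for r2, d, e in ex_lhs if r2 == r and (b, d) in subs}
        for r, b, a in ex_lhs:         # A2
            new_facts |= {(a, f[1]) for f in facts
                          if len(f) == 3 and f[0] == r and (b, f[2]) in facts}
        if new_subs <= subs and new_ex <= ex and new_facts <= facts:
            return subs, ex, facts     # fixpoint reached
        subs |= new_subs
        ex |= new_ex
        facts |= new_facts

# KB inferred from the example's justifications (an assumption)
conj = [(frozenset({"PenneArrab"}), "PastaDish"),
        (frozenset({"PastaDish"}), "Dish"),
        (frozenset({"Peperonc"}), "Spicy"),
        (frozenset({"Dish", "Spicy"}), "SpicyDish")]
ex_rhs = [("PenneArrab", "hasIngred", "ArrabSauce"),
          ("PastaDish", "hasIngred", "Pasta"),
          ("ArrabSauce", "hasIngred", "Peperonc")]
ex_lhs = [("hasIngred", "Spicy", "Spicy")]
abox = [("PenneArrab", "p")]

subs, ex, facts = saturate_el(conj, ex_rhs, ex_lhs, abox)
print(sorted(f for f in facts if len(f) == 2))
```

The loop recomputes all rule applications in every round, which is far from an optimized implementation but keeps the correspondence with the rules visible.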
  • 95.
    complexity of instance checking in el
    Saturation approach is sound: everything derived is entailed
    Also complete for instance checking:
    Theorem. Let K be an EL knowledge base, and let K′ be the result of saturating K. For every ABox assertion α, we have: K |= α iff α ∈ K′
    Note: saturation does not make all consequences explicit
    ∙ there can be infinitely many implied axioms ⇝ would not terminate!
    ∙ so: only complete for some reasoning tasks
    Runs in polynomial time in |K|. This is optimal:
    Theorem. Instance checking in EL is P-complete for both data and combined complexity.
  • 99.
    extending the saturation approach
    Saturation approach can be extended to ELHI⊥
    ∙ additional rules required
    ∙ key difference: new conjunctions of concepts can occur
    Example: A ⊑ ∃R.D, ∃R−.B ⊑ E ⟹ A ⊓ B ⊑ ∃R.(D ⊓ E)
  • 101.
    extended set of saturation rules
    TBox rules (premises ⟹ conclusion):
    T1: A ⊑ Bi (1 ≤ i ≤ n), B1 ⊓ ... ⊓ Bn ⊑ D ⟹ A ⊑ D
    T4: R ⊑ S, S ⊑ T ⟹ R ⊑ T
    T5: M ⊑ ∃R.(N ⊓ ⊥) ⟹ M ⊑ ⊥
    T6: M ⊑ ∃R.(N ⊓ N′), N ⊑ A ⟹ M ⊑ ∃R.(N ⊓ N′ ⊓ A)
    T7: M ⊑ ∃R.(N ⊓ A), ∃S.A ⊑ B, R ⊑ S ⟹ M ⊑ B
    T8: M ⊑ ∃R.N, ∃inv(S).A ⊑ B, R ⊑ S ⟹ M ⊓ A ⊑ ∃R.(N ⊓ B)
    ABox rules (premises ⟹ conclusion):
    A1: A1 ⊓ ... ⊓ An ⊑ B, Ai(a) (1 ≤ i ≤ n) ⟹ B(a)
    A2: ∃r.B ⊑ A, r(a, b), B(b) ⟹ A(a)
    A3: ∃r−.B ⊑ A, r(b, a), B(b) ⟹ A(a)
    A4: r ⊑ s, r(a, b) ⟹ s(a, b)
    A5: r ⊑ s−, r(a, b) ⟹ s(b, a)
  • 102.
    extending the saturation approach
    Saturation approach can be extended to ELHI⊥
    ∙ additional rules required
    ∙ key difference: new conjunctions of concepts can occur
    C ⊑ ∃R.D    D ⊑ ∀R−.B    C ⊑ ∃R.(N ⊓ N′ ⊓ A)
    New set of rules ⇝ exponentially many different new axioms
    Theorem. Instance checking in ELHI⊥ is P-complete for data complexity and Exp-complete for combined complexity.
  • 103.
    saturation as a datalog program
    Let sat(T) be the result of applying the TBox saturation rules to T.
    For each ELHI⊥ TBox T and ABox signature Σ, define the following Datalog program Π(T, Σ):
    Π(T, Σ) = { B(x) ← A1(x), ..., An(x) | A1 ⊓ ... ⊓ An ⊑ B ∈ sat(T) }
      ∪ { B(x) ← A(y), r(x, y) | ∃r.A ⊑ B ∈ T }
      ∪ { B(y) ← A(x), r(x, y) | ∃r−.A ⊑ B ∈ T }
      ∪ { s(x, y) ← r(x, y) | r ⊑ s ∈ sat(T), s ∈ NR }
      ∪ { s(y, x) ← r(x, y) | r ⊑ s− ∈ sat(T), s ∈ NR }
      ∪ { ⊤(x) ← A(x) | A ∈ NC ∩ Σ }
      ∪ { ⊤(x) ← r(x, y) | r ∈ NR ∩ Σ }
      ∪ { ⊤(x) ← r(y, x) | r ∈ NR ∩ Σ }
  • 104.
    saturation as a datalog program (continued)
    Theorem. For every finite signature Σ and ELHI⊥ KB K = (T, A) with sig(A) ⊆ Σ:
    1. K is unsatisfiable iff ans((Π(T, Σ), ⊥), I_A) ≠ ∅;
    2. If K is satisfiable, then for all A ∈ NC, r ∈ NR, and a, b ∈ Ind(A):
       ∙ K |= A(a) iff a ∈ ans((Π(T, Σ), A), I_A);
       ∙ K |= r(a, b) iff (a, b) ∈ ans((Π(T, Σ), r), I_A).
    This means:
    ∙ we get a Datalog rewriting of instance queries in ELHI⊥
    ∙ we can use the Datalog program to create the saturated ABox
  • 105.
    saturation as a datalog program: an example
    The Datalog program associated with our example (rule, then the axiom it comes from):
    PastaDish(x) ← PenneArrab(x)               PenneArrab ⊑ PastaDish
    Dish(x) ← PastaDish(x)                     PastaDish ⊑ Dish
    Spicy(x) ← Peperonc(x)                     Peperonc ⊑ Spicy
    Spicy(x) ← hasIngred(x, y), Spicy(y)       ∃hasIngred.Spicy ⊑ Spicy
    SpicyDish(x) ← Spicy(x), Dish(x)           Dish ⊓ Spicy ⊑ SpicyDish
    Spicy(x) ← ArrabSauce(x)                   ArrabSauce ⊑ Spicy
    Spicy(x) ← PenneArrab(x)                   PenneArrab ⊑ Spicy
    Dish(x) ← PenneArrab(x)                    PenneArrab ⊑ Dish
    SpicyDish(x) ← PenneArrab(x)               PenneArrab ⊑ SpicyDish
    (technically, we also have T-independent rules for ⊤...)
  • 106.
  • 107.
    (unions of) conjunctive queries
    IQs are quite restricted: no selections and joins as in DB queries
    Most work on OMQA adopts (unions of) conjunctive queries (CQs)
    A conjunctive query (CQ) is a first-order query q(x⃗) of the form
    ∃y⃗.P1(t⃗1) ∧ ··· ∧ Pn(t⃗n)
    where every variable in some t⃗i appears in either x⃗ or y⃗
    A union of CQs (UCQ) is a first-order query q(x⃗) of the form
    q1(x⃗) ∨ ··· ∨ qn(x⃗)
    where the qi(x⃗) are CQs with the same tuple x⃗ of free variables
  • 111.
    cqs and ucqs in datalog
    Alternatively, CQs and UCQs can be seen as Datalog rules
    CQs: q(x⃗) = ∃y⃗.P1(t⃗1) ∧ ··· ∧ Pn(t⃗n)  ⇝  q(x⃗) ← P1(t⃗1), ..., Pn(t⃗n)
    UCQs: one rule per disjunct, all with the same head q(x⃗). For
    q(x⃗) = ∃y⃗_1.P^1_1(t⃗^1_1) ∧ ··· ∧ P^1_{n_1}(t⃗^1_{n_1})
      ∨ ∃y⃗_2.P^2_1(t⃗^2_1) ∧ ··· ∧ P^2_{n_2}(t⃗^2_{n_2})
      ∨ ...
      ∨ ∃y⃗_ℓ.P^ℓ_1(t⃗^ℓ_1) ∧ ··· ∧ P^ℓ_{n_ℓ}(t⃗^ℓ_{n_ℓ})
    we obtain the rules
    q(x⃗) ← P^1_1(t⃗^1_1), ..., P^1_{n_1}(t⃗^1_{n_1})
    q(x⃗) ← P^2_1(t⃗^2_1), ..., P^2_{n_2}(t⃗^2_{n_2})
    ...
    q(x⃗) ← P^ℓ_1(t⃗^ℓ_1), ..., P^ℓ_{n_ℓ}(t⃗^ℓ_{n_ℓ})
  • 113.
    what can we express as (u)cqs?
    q1(y, x) = ∃z.serves(x, y) ∧ hasIngred(y, z) ∧ Spicy(z)
    q2(y, x) = q1(y, x) ∨ ( ∃z, z′.serves(x, y) ∧ hasIngred(y, z) ∧ hasIngred(z, z′) ∧ Spicy(z′) )
    (U)CQs capture select-project-join queries of relational algebra
    (U)CQs capture basic graph patterns of SPARQL
  • 115.
    query matches
    A match for q(x⃗) = ∃y⃗.φ(x⃗, y⃗) in an interpretation I is a mapping π from the variables in x⃗ ∪ y⃗ to objects in Δ^I such that:
    ∙ π(t⃗) ∈ P^I for every atom P(t⃗) ∈ q
    We write I |=π q(a⃗) if π is a match for q(x⃗) in I and π(x⃗) = a⃗
    By definition: a⃗ ∈ ans(q, I) iff there exists a match π such that I |=π q(a⃗)
    Answering CQs amounts to searching for matches
    Recall that a⃗ ∈ cert(q, K) iff a⃗ ∈ ans(q, I) for every model I of K
    Challenge: how do we check that there is a match in every model?
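Over a single finite interpretation, searching for matches is a straightforward backtracking search. The sketch below assumes that all terms in the query are variables and that an interpretation is a dictionary from predicate names to sets of tuples; both choices, as well as the sample interpretation, are illustrative assumptions.

```python
def find_matches(atoms, interp, binding=None):
    """Yield every mapping π from variables to elements that satisfies all atoms."""
    binding = binding or {}
    if not atoms:
        yield dict(binding)
        return
    (pred, *terms), rest = atoms[0], atoms[1:]
    for tup in interp.get(pred, set()):
        if len(tup) != len(terms):
            continue
        new = dict(binding)
        # Bind each variable, or check consistency with an earlier binding
        if all(new.setdefault(t, v) == v for t, v in zip(terms, tup)):
            yield from find_matches(rest, interp, new)

def answers(answer_vars, atoms, interp):
    """ans(q, I): project every match onto the answer variables."""
    return {tuple(pi[v] for v in answer_vars) for pi in find_matches(atoms, interp)}

# Hypothetical finite interpretation
interp = {
    "serves": {("r", "b"), ("r", "p")},
    "hasIngred": {("p", "e1"), ("b", "e2"), ("b", "e3"), ("e3", "e4")},
    "Spicy": {("e1",), ("e4",)},
}
# q1(y, x) = ∃z. serves(x, y) ∧ hasIngred(y, z) ∧ Spicy(z)
q1 = [("serves", "x", "y"), ("hasIngred", "y", "z"), ("Spicy", "z")]
q1_answers = answers(("y", "x"), q1, interp)
print(q1_answers)
```

This only addresses one interpretation at a time; the challenge stated above, a match in every model, is what the universal model is meant to resolve.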
  • 119.
    the universal model property
    For Horn DLs, each satisfiable K has a universal model I_K
    I_K is ‘contained’ in every model I of K
    ⇝ formally, there is a homomorphism from I_K to I
    An answer to a (U)CQ q in I_K is an answer to q in every model of K
    ⇝ matches of (U)CQs are preserved under homomorphisms
    a⃗ ∈ cert(q, K) iff a⃗ ∈ ans(q, I_K)
    So: I_K gives us the certain answers to q over K
  • 120.
    constructing a universal model
    Use the saturation of (T, A) to build a universal model I_{T,A}
    Intuition:
    - I_{T,A} contains the saturated ABox A′
    - if an object satisfies M and M ⊑ ∃R.M′ ∈ sat(T), a fresh object witnessing this is created
    Formally, the domain Δ^{I_{T,A}} contains words a R1 M1 ... Rn Mn with a ∈ Ind(A) and:
    ∙ the Ri are roles and the Mi are conjunctions of concepts
    ∙ there exists M ⊑ ∃R1.M1 ∈ sat(T) such that T, A |= M(a)
    ∙ Mi ⊑ ∃R_{i+1}.M_{i+1} ∈ sat(T) for every 1 ≤ i < n
    (note: use only the strongest axioms M ⊑ ∃R.M′ in sat(T))
    The interpretation function is defined as follows:
    ∙ a^{I_{T,A}} = a
    ∙ a ∈ A^{I_{T,A}} iff A(a) ∈ sat(T, A); eRM ∈ A^{I_{T,A}} iff M ⊑ A ∈ sat(T)
    ∙ (a, b) ∈ r^{I_{T,A}} iff r(a, b) ∈ sat(T, A); (e, eRM) ∈ r^{I_{T,A}} iff R ⊑ r ∈ sat(T); and (eRM, e) ∈ r^{I_{T,A}} if R ⊑ r− ∈ sat(T)
  • 123.
    example of the canonical model construction (1/3)
    TBox:
    PenneArrab ⊑ ∃hasIngred.Penne    Penne ⊑ Pasta
    PenneArrab ⊑ ∃hasIngred.ArrabSauce    ArrabSauce ⊑ ∃hasIngred.Peperonc
    Peperonc ⊑ Spicy    PizzaCalab ⊑ ∃hasIngred.Nduja    Nduja ⊑ Spicy
    ABox:
    serves(r, b)    serves(r, p)    PenneArrab(b)    PizzaCalab(p)
    The saturated TBox additionally contains:
    PenneArrab ⊑ ∃hasIngred.(Penne ⊓ Pasta)
    ArrabSauce ⊑ ∃hasIngred.(Peperonc ⊓ Spicy)
    PizzaCalab ⊑ ∃hasIngred.(Nduja ⊓ Spicy)
  • 124.
    example of the canonical model construction (2/3)
    I_{T,A} contains the ABox and is closed under inclusions
    serves(r, b)    serves(r, p)    PenneArrab(b)    PizzaCalab(p)
    [Figure: individual r with serves edges to b (PenneArrab) and p (PizzaCalab)]
  • 125.
    example of the canonical model construction (3/3)
    The anonymous objects witnessing existential concepts form trees
    [Figure: r with serves edges to p (PizzaCalab) and b (PenneArrab); hasIngred edges from p to e1 (Nduja, Spicy), from b to e2 (Penne, Pasta), from b to e3 (ArrabSauce), and from e3 to e4 (Peperonc, Spicy)]
    Axioms used:
    PenneArrab ⊑ ∃hasIngred.ArrabSauce
    PenneArrab ⊑ ∃hasIngred.(Penne ⊓ Pasta)
    ArrabSauce ⊑ ∃hasIngred.(Peperonc ⊓ Spicy)
    PizzaCalab ⊑ ∃hasIngred.(Nduja ⊓ Spicy)
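The tree-shaped part of the canonical model can be unfolded level by level. The following is a simplified sketch under several assumptions: only the strongest existential axioms are supplied, their right-hand sides are already closed under atomic inclusions (as in the saturated axioms above), inverse roles and role inclusions are ignored, and the unfolding is cut off at a chosen depth because the full model may be infinite. Anonymous objects are encoded as Python tuples mirroring the words a R1 M1 ... Rn Mn.

```python
def build_model(abox_concepts, abox_roles, exist_axioms, max_depth):
    """abox_concepts: individual -> set of concept names (saturated ABox)
    abox_roles: triples (r, a, b) for r(a, b)
    exist_axioms: triples (M, R, N) for M ⊑ ∃R.N, with M, N frozensets
    Returns (labels, edges) of the model unfolded to max_depth."""
    labels = {(a,): set(cs) for a, cs in abox_concepts.items()}
    edges = {(r, (a,), (b,)) for r, a, b in abox_roles}
    frontier = list(labels)
    for _ in range(max_depth):
        next_frontier = []
        for e in frontier:
            for lhs, role, rhs in exist_axioms:
                if lhs <= labels[e]:               # e satisfies M
                    child = e + ((role, rhs),)     # the word e R N
                    labels[child] = set(rhs)
                    edges.add((role, e, child))
                    next_frontier.append(child)
        frontier = next_frontier
    return labels, edges

axioms = [
    (frozenset({"PenneArrab"}), "hasIngred", frozenset({"ArrabSauce"})),
    (frozenset({"PenneArrab"}), "hasIngred", frozenset({"Penne", "Pasta"})),
    (frozenset({"ArrabSauce"}), "hasIngred", frozenset({"Peperonc", "Spicy"})),
    (frozenset({"PizzaCalab"}), "hasIngred", frozenset({"Nduja", "Spicy"})),
]
abox_concepts = {"r": set(), "b": {"PenneArrab"}, "p": {"PizzaCalab"}}
abox_roles = [("serves", "r", "b"), ("serves", "r", "p")]

labels, edges = build_model(abox_concepts, abox_roles, axioms, max_depth=3)
anonymous = [e for e in labels if len(e) > 1]
print(len(anonymous))
```

On this example the unfolding stops by itself after two levels, giving the four anonymous objects e1 to e4 of the figure; with cyclic axioms the depth bound would be what keeps the construction finite.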
  • 126.
    finding answers in the canonical model
    To answer a CQ q, it suffices to test whether it has a match in I_{T,A}
    But this is still challenging!
    - I_{T,A} contains assertions and objects not present in A
    - we cannot build I_{T,A} explicitly: it can be infinite!
    Our approach: use query rewriting!
    Formally: given a CQ q, we construct a UCQ rew_T(q) such that a⃗ ∈ ans(q, I_{T,A}) iff there is a match π for a disjunct q′ of rew_T(q) such that I_{T,A} |=π q′(a⃗) and π sends all variables to individuals from A
  • 127.
    rewriting the query
    Rewriting step (idea):
    1. Choose a leaf variable x such that no variables are mapped below it in I_{T,A}
    2. Find an axiom M ⊑ ∃R.N in sat(T) that ensures the atoms containing x are satisfied
    3. Drop from q all atoms with x and add M to the parent of x to get q′
    Properties:
    ∙ Every match for q′ can be extended to a match for q
    ∙ Every match for q contains a match for q′
    ∙ The answers of q and q′ coincide, but the relevant matches for q′ are closer to the ABox than those of q
    We repeatedly apply this rewriting step to obtain a set of queries whose relevant matches range over ABox individuals.
  • 130.
    example of a rewriting step (1/2)
    q(y, x) = ∃z, z′. serves(x, y) ∧ hasIngred(y, z) ∧ hasIngred(z, z′) ∧ Spicy(z′)
    We have I_K ⊨_π q(b, r) with π(x) = r, π(y) = b, π(z) = e3, π(z′) = e4
    [Figure: canonical model I_K with individuals r, p (PizzaCalab), b (PenneArrab) and anonymous elements e1 (Nduja, Spicy), e2 (Penne, Pasta), e3 (ArrabSauce), e4 (Peperonc, Spicy); edges are labelled serves and hasIngred]
    ∙ Choose z′ as the leaf
    ∙ Choose ArrabSauce ⊑ ∃hasIngred.Spicy ∈ sat(T)
    ∙ The RHS ensures hasIngred(z, z′) and Spicy(z′)
    ∙ We replace these atoms by ArrabSauce(z)
    q′(y, x) = ∃z. serves(x, y) ∧ hasIngred(y, z) ∧ ArrabSauce(z)
  • 135.
    example of a rewriting step (2/2)
    q(y, x) = ∃z, z′. serves(x, y) ∧ hasIngred(y, z) ∧ hasIngred(z, z′) ∧ Spicy(z′)
    q′(y, x) = ∃z. serves(x, y) ∧ hasIngred(y, z) ∧ ArrabSauce(z)
    I_K ⊨_π q(b, r) and I_K ⊨_π′ q′(b, r)
    [Figure: canonical model I_K; π and π′ both map x to r, y to b and z to e3, and only π maps z′ to e4]
    depth(π) > depth(π′)
  • 138.
    another rewriting step (1/2)
    q′(y, x) = ∃z. serves(x, y) ∧ hasIngred(y, z) ∧ ArrabSauce(z)
    I_K ⊨_π′ q′(b, r)
    [Figure: canonical model I_K with π′(x) = r, π′(y) = b, π′(z) = e3]
    ∙ Choose z as the leaf
    ∙ Choose PenneArrab ⊑ ∃hasIngred.ArrabSauce
    ∙ The RHS yields hasIngred(y, z) and ArrabSauce(z)
    ∙ We replace these atoms by PenneArrab(y)
    q′′(y, x) = serves(x, y) ∧ PenneArrab(y)
  • 143.
    another rewriting step (2/2)
    q(y, x) = ∃z, z′. serves(x, y) ∧ hasIngred(y, z) ∧ hasIngred(z, z′) ∧ Spicy(z′)
    q′(y, x) = ∃z. serves(x, y) ∧ hasIngred(y, z) ∧ ArrabSauce(z)
    q′′(y, x) = serves(x, y) ∧ PenneArrab(y)
    I_K ⊨_π q(b, r), I_K ⊨_π′ q′(b, r), I_K ⊨_π′′ q′′(b, r)
    [Figure: canonical model I_K; π, π′ and π′′ all map x to r and y to b, π and π′ map z to e3, and only π maps z′ to e4]
    depth(π) > depth(π′) > depth(π′′)
    In π′′ all variables are mapped to individuals
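The two rewriting steps of the running example can be replayed with a small sketch. It is an illustration under simplifying assumptions, not the procedure from the lecture: atoms are plain tuples, the names `Axiom` and `rewrite_step` are hypothetical, the axioms are taken directly from the slides, and matching is purely syntactic (no role hierarchy, no reasoning over the filler N).

```python
from typing import FrozenSet, NamedTuple, Optional, Set, Tuple

Atom = Tuple[str, ...]  # ("Spicy", "z′") or ("hasIngred", "z", "z′")


class Axiom(NamedTuple):
    """An axiom M ⊑ ∃R.N; M and N are read as conjunctions of concept names."""
    lhs: FrozenSet[str]
    role: str
    filler: FrozenSet[str]


def rewrite_step(query: Set[Atom], leaf: str, axiom: Axiom) -> Optional[Set[Atom]]:
    """Apply one rewriting step, or return None if the axiom does not fit.

    The caller is assumed to pick an existentially quantified variable as leaf.
    """
    touching = {atom for atom in query if leaf in atom[1:]}
    role_atoms = [atom for atom in touching if len(atom) == 3]
    concept_atoms = [atom for atom in touching if len(atom) == 2]
    # The leaf must hang below exactly one parent term.
    parents = {atom[1] for atom in role_atoms}
    if len(parents) != 1 or leaf in parents:
        return None
    # Every role atom must be R(parent, leaf) for the role R of the axiom.
    if any(atom[0] != axiom.role or atom[2] != leaf for atom in role_atoms):
        return None
    # Every concept atom on the leaf must be guaranteed by the filler N.
    if any(atom[0] not in axiom.filler for atom in concept_atoms):
        return None
    (parent,) = parents
    # Drop all atoms containing the leaf and add M to its parent.
    return (query - touching) | {(name, parent) for name in axiom.lhs}


q = {("serves", "x", "y"), ("hasIngred", "y", "z"),
     ("hasIngred", "z", "z′"), ("Spicy", "z′")}
ax1 = Axiom(frozenset({"ArrabSauce"}), "hasIngred", frozenset({"Spicy"}))
ax2 = Axiom(frozenset({"PenneArrab"}), "hasIngred", frozenset({"ArrabSauce"}))
q1 = rewrite_step(q, "z′", ax1)   # corresponds to q′
q2 = rewrite_step(q1, "z", ax2)   # corresponds to q′′
```

Choosing z as the leaf of the original q is rejected, because z still has z′ below it.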
  • 146.
    decision procedure
    Theorem. For every satisfiable ELHI⊥ KB K = (T, A) and CQ q(x⃗): a⃗ ∈ cert(q, K) iff I_K ⊨_π q′(a⃗) for some q′ ∈ rew_T(q) and some π that maps all variables to individuals in A.
    ∙ There is a bounded number of such restricted matches π
    ∙ Checking whether π is a match reduces to linearly many instance checks
    ∙ This yields a terminating, sound, and complete CQ answering procedure
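A brute-force sketch of the resulting procedure, assuming the rewritten queries and an instance-checking oracle are already available. The function name `restricted_answers`, the tuple encoding of atoms, and the hand-picked set of entailed assertions are assumptions made for illustration. Only three queries from the running example are listed, and the full rew_T(q) may contain more.

```python
from itertools import product


def restricted_answers(ucq, answer_vars, individuals, entails):
    """Enumerate matches that send every variable to an ABox individual.

    entails(atom) stands in for an instance check against the KB.
    """
    individuals = sorted(individuals)
    found = set()
    for atoms in ucq:
        variables = sorted({t for _, *args in atoms for t in args
                            if t not in individuals})
        # Candidate matches: |individuals| ** |variables| many per query.
        for values in product(individuals, repeat=len(variables)):
            pi = dict(zip(variables, values))
            grounded = [(pred, *(pi.get(t, t) for t in args))
                        for pred, *args in atoms]
            # One instance check per atom, so linearly many per candidate.
            if all(entails(atom) for atom in grounded):
                found.add(tuple(pi[v] for v in answer_vars))
    return found


individuals = {"r", "b", "p"}
entailed = {("serves", "r", "b"), ("serves", "r", "p"),
            ("PenneArrab", "b"), ("PizzaCalab", "p")}
ucq = [
    [("serves", "x", "y"), ("hasIngred", "y", "z"),
     ("hasIngred", "z", "z′"), ("Spicy", "z′")],
    [("serves", "x", "y"), ("hasIngred", "y", "z"), ("ArrabSauce", "z")],
    [("serves", "x", "y"), ("PenneArrab", "y")],
]
answers = restricted_answers(ucq, ["y", "x"], individuals, entailed.__contains__)
```

Only the last query has a match over individuals, which gives the answer (b, r).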
  • 149.
    complexity of cq answering
    Combined complexity:
    ∙ sat(T) and rew_T(q) can be constructed in single exponential time
    ∙ single exponential bound on candidate matches π
    ∙ instance checking in single exponential time
    Data complexity:
    ∙ sat(T) and rew_T(q) are ABox independent
    ∙ polynomial bound on candidate matches π
    ∙ instance checking in polynomial time
    Theorem. CQ answering in ELHI⊥ and Horn-SHIQ is Exp-complete in combined complexity and P-complete in data complexity.
  • 153.
    optimal bounds for lightweight dls
    Adapting our technique gives optimal bounds for lightweight DLs.
    For ELH and DL-Lite_R we get NP in combined complexity:
    ∙ compute sat(T) in polynomial time
    ∙ non-deterministically build the right q′ ∈ rew_T(q)
    ∙ guess a candidate π
    ∙ check if it is a match ⇝ instance checking in polynomial time
    CQ answering is NP-hard over the ABox alone, seen as a DB (no TBox)
    For EL in data complexity, this yields P membership ⇝ optimal, since instance queries are already P-hard
    In DL-Lite_R, we can get an FO rewriting ⇝ in AC0 for data complexity
    Theorem. CQ answering in ELH and DL-Lite_R is NP-complete in combined complexity. For ELH the data complexity is P-complete, and for DL-Lite_R the data complexity is in AC0.
  • 155.
    datalog rewriting
    Our procedure yields a Datalog rewriting:
    ∙ rew_T(q) is a UCQ ⇝ translate it into a set of Datalog rules Π_rew(q)
    ∙ use Q in the head of the rules
    ∙ the program Π(T) (from earlier) computes all entailed ABox assertions
    (Π_rew(q) ∪ Π(T), Q) is a Datalog rewriting of q w.r.t. T relative to consistent ABoxes
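The translation of a UCQ into rules with head predicate Q is mechanical. The sketch below prints rules in a generic textbook syntax. The helper names `cq_to_rule` and `ucq_to_program` are hypothetical, variables are written in upper case by assumption, and a concrete Datalog engine may impose its own lexical conventions. The program Π(T) is not reproduced here.

```python
def cq_to_rule(answer_vars, atoms, head="Q"):
    """Render one CQ as a Datalog rule: head over answer variables, atoms as body."""
    body = ", ".join(f"{pred}({', '.join(args)})" for pred, *args in atoms)
    return f"{head}({', '.join(answer_vars)}) :- {body}."


def ucq_to_program(answer_vars, ucq, head="Q"):
    """One rule per disjunct of the UCQ, all sharing the same head predicate."""
    return [cq_to_rule(answer_vars, atoms, head) for atoms in ucq]


# Three queries from the running example (the full rewriting may contain more).
ucq = [
    [("serves", "X", "Y"), ("hasIngred", "Y", "Z"),
     ("hasIngred", "Z", "Z1"), ("Spicy", "Z1")],
    [("serves", "X", "Y"), ("hasIngred", "Y", "Z"), ("ArrabSauce", "Z")],
    [("serves", "X", "Y"), ("PenneArrab", "Y")],
]
program = ucq_to_program(["Y", "X"], ucq)
```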
  • 159.
    combined approach for cqs in elhi
    Alternative: combined approach (saturation and rewriting)
    ∙ We know that it suffices to evaluate the UCQ rew_T(q) over the set of ABox assertions entailed by the KB K
    ∙ We also know: assertions entailed by K = assertions in sat(K)
    Materialize the assertions in sat(K) and view the result as a database:
    + only need to evaluate a UCQ, so standard relational database systems can be used
    - materializing is not always convenient: the saturation needs to be updated if the data changes
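To make the "standard relational database" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout (one table per concept or role, with columns `s`, `o`, `i`) is an assumption, the inserted rows are a hand-picked fragment of the materialized assertions, and the SQL union covers only the three queries of the running example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout: one unary table per concept, one binary table per role.
conn.executescript("""
    CREATE TABLE serves(s TEXT, o TEXT);
    CREATE TABLE hasIngred(s TEXT, o TEXT);
    CREATE TABLE Spicy(i TEXT);
    CREATE TABLE ArrabSauce(i TEXT);
    CREATE TABLE PenneArrab(i TEXT);
    INSERT INTO serves VALUES ('r', 'b'), ('r', 'p');
    INSERT INTO PenneArrab VALUES ('b');
""")

# q, q' and q'' written as one UNION query with answer tuple (y, x).
UCQ_SQL = """
    SELECT sv.o AS y, sv.s AS x
      FROM serves sv
      JOIN hasIngred h1 ON h1.s = sv.o
      JOIN hasIngred h2 ON h2.s = h1.o
      JOIN Spicy sp ON sp.i = h2.o
    UNION
    SELECT sv.o, sv.s
      FROM serves sv
      JOIN hasIngred h ON h.s = sv.o
      JOIN ArrabSauce a ON a.i = h.o
    UNION
    SELECT sv.o, sv.s
      FROM serves sv
      JOIN PenneArrab pa ON pa.i = sv.o
"""
rows = conn.execute(UCQ_SQL).fetchall()
```

The drawback listed on the slide shows up directly: whenever the ABox changes, the materialized tables have to be refreshed before the query is run again.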
  • 161.
    an fo rewriting approach for cqs in dl-lite
    For DL-Lite_R we can generate an FO-rewriting as follows.
    In all q′ ∈ rew_T(q), replace each atom by its FO-rewriting for instance checking:
    ∙ replace each A(t) by RewriteIQ(A, T)
    ∙ replace each r(t, t′) by RewriteIQ(r, T)
    Resulting FO formula:
    ∙ positive, so it can be transformed into a UCQ
    ∙ a rewriting relative to consistent ABoxes
    ∙ can be combined with a rewriting of unsatisfiability
    ∙ yields an AC0 upper bound in data complexity
  • 163.
    a glimpse beyond
    Other Horn DLs
    ∙ Similar results hold for other dialects of DL-Lite and EL
    ∙ Sometimes complexity increases, e.g., EL with complex role inclusions
    ∙ The P data and Exp combined upper bounds extend to even more expressive Horn DLs, like Horn-SHOIQ
    Beyond Horn DLs
    ∙ no universal model property
    ∙ query answering usually exponentially harder
    ∙ different techniques: automata, rolling-up, resolution, etc.
    ∙ for some well-known DLs (e.g., SHOIQ) decidability is open
  • 164.
    complexity of answering (u)cqs
                               IQs                       CQs
                               data        combined      data          combined
    DL-Lite, DL-Lite_R         in AC0      NLogSpace     in AC0        NP
    EL, ELH                    P           P             P             NP
    ELI, ELHI⊥, Horn-SHOIQ     P           Exp           P             Exp
    ALC, ALCHQ                 coNP        Exp           coNP          Exp
    ALCI, SH, SHIQ             coNP        Exp           coNP          2Exp
    SHOIQ                      coNP-hard   coNExp        coNP-hard¹    coN2Exp-hard¹
    ¹ decidability open
  • 168.
    limitations of (u)cqs
    Some very natural queries are not expressible as CQs:
    - find dishes that contain something spicy
    - is a a relative of b?
    - is there a bus connection from X to Y?
    We need navigational queries: queries that can query the topology of the graph, that is, the information stored in the connections
    Especially important for highly connected data with no fixed schema
    ∙ social, biological, and chemical networks
  • 171.
    navigational queries
    Prominent navigational query languages:
    Regular Path Queries (RPQs): find pairs of objects that are connected by a chain of roles that complies with a given regular language
      (hasCourse ∪ courseOf⁻) · (hasIngred ∪ ingredOf⁻)* · Spicy?(x, y)
    Conjunctive RPQs: allow RPQs to be joined conjunctively
    ∙ similar to CQs, but each atom is an RPQ
    ∙ extend CQs with the navigational power of RPQs
      q(x, x′) = ∃y, z. serves · Menu? · (hasMain ∪ hasStarter)(x, y)
                 ∧ serves · Menu? · (hasCourse ∪ courseOf⁻)(x′, y)
                 ∧ (hasIngred ∪ ingredOf⁻)* · Spicy?(y, z)
    Both languages have 1-way and 2-way variants
  • 173.
    our most expressive navigational queries: c2rpqs
    Recall: N±_R contains all role names and their inverses.
    A conjunctive two-way regular path query (C2RPQ) has the form
      q(x⃗) = ∃y⃗. ⋀ L(t, t′) ∧ ⋀ A(t)
    where
    ∙ A is a concept name
    ∙ t, t′ are variables or individuals (in N_I ∪ x⃗ ∪ y⃗)
    ∙ L is a regular language over N±_R ∪ {A? | A ∈ N_C}
    Regular languages can be given as:
    ∙ regular expressions E → r ∈ N±_R | A? | r · r | r ∪ r | r*
    ∙ non-deterministic finite automata (NFAs)
    Recall: RegExps and NFAs are equivalent, but NFAs are more succinct
  • 176.
    other navigational query languages
    Conjunctive (one-way) regular path queries (CRPQs) disallow inverses ⇝ regular expressions use only (direct) role names
      q(x, x′) = ∃y, z. serves · Menu? · hasCourse(x, y) ∧ serves · Menu? · hasCourse(x′, y) ∧ hasIngred* · Spicy?(y, z)
      q(x) = ∃y. hasIngred* · Spicy?(x, y)
    Two-way regular path queries (2RPQs) have only one atom and no existential variables ⇝ both variables are answer variables
      q(x, y) = (hasIngred ∪ ingredOf⁻)* · Spicy?(x, y)
      q(x, y) = (hasIngred ∪ ingredOf⁻)* · Spicy? · Σ*(x, y)
    (One-way) regular path queries (RPQs) are 2RPQs with no inverses ⇝ all of the restrictions above
      q(x, y) = hasIngred* · Spicy?(x, y)
      q(x, y) = hasCourse · hasIngred* · Spicy?(x, y)
  • 179.
    semantics of c2rpqs
    Satisfaction of atoms L(t, t′): (d, d′) ∈ L^I if there is an L-path from d to d′, i.e.,
    ∙ a sequence e0 e1 . . . en of objects from Δ^I with e0 = d and en = d′
    ∙ a word u1 u2 . . . un ∈ L over N±_R ∪ {A? | A ∈ N_C}
    such that, for every 1 ≤ i ≤ n:
    ∙ if ui = A?, then e(i−1) = ei ∈ A^I
    ∙ if ui = R ∈ N±_R, then (e(i−1), ei) ∈ R^I
    Match: mapping π from terms to elements that satisfies all atoms
    As before: I ⊨_π q(a⃗) if the match π maps the answer variables to a⃗
    Certain answers are defined as for CQs
    Again it suffices to find a match in the canonical model
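Over a finite interpretation, the existence of an L-path can be checked by a breadth-first search over pairs (element, NFA state). The sketch below assumes an ad hoc encoding that is not from the slides: role symbols are plain strings, a trailing `-` marks an inverse, a trailing `?` marks a concept test, and the interpretation is a finite fragment of the running example with the Σ* loop on the final state spelled out for the roles that occur in it.

```python
from collections import deque


def successors(concepts, roles, element, symbol):
    """Elements reachable from `element` by reading one symbol."""
    if symbol.endswith("?"):      # test A?: stay in place if element is in A^I
        return {element} if element in concepts.get(symbol[:-1], set()) else set()
    if symbol.endswith("-"):      # inverse role: follow an edge backwards
        return {d for (d, e) in roles.get(symbol[:-1], set()) if e == element}
    return {e for (d, e) in roles.get(symbol, set()) if d == element}


def has_l_path(concepts, roles, delta, start_state, final_states, d, d_prime):
    """Decide (d, d') ∈ L^I for the language L of the given NFA."""
    seen = {(d, start_state)}
    queue = deque(seen)
    while queue:
        element, state = queue.popleft()
        if element == d_prime and state in final_states:
            return True
        for (s, symbol, s_next) in delta:
            if s != state:
                continue
            for nxt in successors(concepts, roles, element, symbol):
                if (nxt, s_next) not in seen:
                    seen.add((nxt, s_next))
                    queue.append((nxt, s_next))
    return False


# Fragment of the canonical model: r -serves-> b -hasIngred-> e3 -hasIngred-> e4.
concepts = {"Spicy": {"e4"}}
roles = {"serves": {("r", "b")}, "hasIngred": {("b", "e3"), ("e3", "e4")}}
# NFA for serves · (hasIngred ∪ ingredOf⁻)* · Spicy? · Σ*
delta = {
    ("s0", "serves", "s1"),
    ("s1", "hasIngred", "s1"), ("s1", "ingredOf-", "s1"),
    ("s1", "Spicy?", "sf"),
    ("sf", "serves", "sf"), ("sf", "serves-", "sf"),
    ("sf", "hasIngred", "sf"), ("sf", "hasIngred-", "sf"),
}
r_to_b = has_l_path(concepts, roles, delta, "s0", {"sf"}, "r", "b")
b_to_r = has_l_path(concepts, roles, delta, "s0", {"sf"}, "b", "r")
```

The path from r to b descends to the spicy element e4 and climbs back through inverse hasIngred edges, which is why two-way navigation matters here.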
  • 182.
    answering 2rpqs
    We focus on answering 2RPQs: one atom, no existential variables
    ∙ The number of matches is bounded, since matches range over individuals only
    ∙ Challenge: paths may need to go deep into the canonical model
    q(x, y) = serves · (hasIngred ∪ ingredOf⁻)* · Spicy? · Σ*(x, y)
    [Figure: canonical model I_K with individuals r, p (PizzaCalab), b (PenneArrab) and anonymous elements e1 (Nduja, Spicy), e2 (Penne, Pasta), e3 (ArrabSauce), e4 (Peperonc, Spicy); edges are labelled serves and hasIngred; π(x) = r, π(y) = b]
  • 185.
    loops through the anonymous part
    Goal: a compact representation of all ways in which paths through the anonymous part can participate in matches
    We use the NFA representation
    [Figure: NFA α with states s0, s1, sf; a serves transition from s0 to s1; loops on s1 labelled hasIngred and ingredOf⁻; a Spicy? transition from s1 to sf; a loop on sf labelled Σ*]
    We write M ∈ Loop_α[s, s′] iff a ∈ M^{I_K} implies the existence of a path p below a that takes the NFA α from s to s′, e.g.,
    PenneArrab ∈ Loop_α[s1, sf] because of
    ∙ PenneArrab ⊑ ∃hasIngred.ArrabSauce
    ∙ ArrabSauce ⊑ ∃hasIngred.(Peperonc ⊓ Spicy)
  • 187.
    computing the loop table
    We can explicitly compute the full table Loop_α inductively:
    ∙ if s is a state, then Loop_α[s, s] = N_C
    ∙ if M1 ∈ Loop_α[s1, s2] and M2 ∈ Loop_α[s2, s3], then M1 ⊓ M2 ∈ Loop_α[s1, s3]
    ∙ if T ⊨ C1 ⊓ · · · ⊓ Cn ⊑ A and (s1, A?, s2) ∈ δ, then C1 ⊓ · · · ⊓ Cn ∈ Loop_α[s1, s2]
    ∙ if T ⊨ C1 ⊓ · · · ⊓ Cn ⊑ ∃R.D, T ⊨ R ⊑ R′, T ⊨ R ⊑ R′′, (s1, R′, s2) ∈ δ, D ∈ Loop_α[s2, s3], and (s3, R′′⁻, s4) ∈ δ, then C1 ⊓ · · · ⊓ Cn ∈ Loop_α[s1, s4]
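The inductive rules can be turned into a fixpoint computation. The sketch below is deliberately restricted and should be read as an approximation of the table, not the full construction: table entries are single concept names instead of conjunctions, role inclusions are ignored (R = R′ = R′′), composition is applied only when the same concept name occurs in both entries, and the entailed axioms are supplied by hand as the hypothetical inputs `subsumptions` and `existentials`.

```python
def inverse(role):
    """Toggle the trailing '-' that marks an inverse role."""
    return role[:-1] if role.endswith("-") else role + "-"


def compute_loops(states, delta, concept_names, subsumptions, existentials):
    """Fixpoint over the (restricted) inductive rules for the loop table.

    subsumptions: pairs (C, A) with T ⊨ C ⊑ A
    existentials: triples (C, R, D) with T ⊨ C ⊑ ∃R.D
    """
    loop = {(s, t): set() for s in states for t in states}
    for s in states:
        loop[(s, s)] = set(concept_names)          # empty path
    while True:
        derived = set()
        for (s1, symbol, s2) in delta:             # test rule
            if symbol.endswith("?"):
                derived |= {((s1, s2), c) for (c, a) in subsumptions
                            if a == symbol[:-1]}
        for (c, r, d) in existentials:             # step down, loop, step up
            for (s1, sym1, s2) in delta:
                for (s3, sym2, s4) in delta:
                    if sym1 == r and sym2 == inverse(r) and d in loop[(s2, s3)]:
                        derived.add(((s1, s4), c))
        for s1 in states:                          # restricted composition
            for s2 in states:
                for s3 in states:
                    derived |= {((s1, s3), c)
                                for c in loop[(s1, s2)] & loop[(s2, s3)]}
        new = {(key, c) for (key, c) in derived if c not in loop[key]}
        if not new:
            return loop
        for key, c in new:
            loop[key].add(c)


states = ["s0", "s1", "sf"]
delta = {
    ("s0", "serves", "s1"),
    ("s1", "hasIngred", "s1"), ("s1", "ingredOf-", "s1"),
    ("s1", "Spicy?", "sf"),
    ("sf", "serves", "sf"), ("sf", "serves-", "sf"),
    ("sf", "hasIngred", "sf"), ("sf", "hasIngred-", "sf"),
}
names = {"PenneArrab", "ArrabSauce", "Peperonc", "Spicy"}
subsumptions = {("Peperonc", "Spicy"), ("Spicy", "Spicy")}
existentials = {("ArrabSauce", "hasIngred", "Peperonc"),
                ("PenneArrab", "hasIngred", "ArrabSauce")}
loops = compute_loops(states, delta, names, subsumptions, existentials)
```

On this input, Peperonc enters Loop_α[s1, sf] first through the test rule, and ArrabSauce and PenneArrab follow in later rounds through the existential rule.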
  • 191.
    computing loops: an example
    [Figure: NFA α with states s0, s1, sf (serves from s0 to s1; loops hasIngred and ingredOf⁻ on s1; Spicy? from s1 to sf; Σ* loop on sf), next to the canonical model with individuals r, p (PizzaCalab), b (PenneArrab) and anonymous elements e1 (Nduja, Spicy), e2 (Penne, Pasta), e3 (ArrabSauce), e4 (Peperonc, Spicy)]
    ∙ Peperonc ∈ Loop_α[s1, sf] because (s1, Spicy?, sf) ∈ δ and Peperonc ⊑ Spicy
    ∙ ArrabSauce ∈ Loop_α[s1, sf] because (s1, hasIngred, s1), (sf, hasIngred⁻, sf) ∈ δ, ArrabSauce ⊑ ∃hasIngred.Peperonc, and Peperonc ∈ Loop_α[s1, sf]
    ∙ PenneArrab ∈ Loop_α[s1, sf] because (s1, hasIngred, s1), (sf, hasIngred⁻, sf) ∈ δ, PenneArrab ⊑ ∃hasIngred.ArrabSauce, and ArrabSauce ∈ Loop_α[s1, sf]
  • 193.
    evaluating 2rpqs using the loop table
    Non-deterministic algorithm to decide (a, b) ∈ cert(α(x, y), K)
    Input: NFA α = (S, Σ, δ, s0, F), KB K = (T, A), pair (a, b) of individuals from A
    ∙ After checking consistency, we start from (a, s0)
    ∙ At pair (c, s), guess a new pair (d, s′) together with one of:
      ∙ a transition (s, σ, s′): take a σ-step from c to d in the ABox ⇝ check if (c, d) ∈ σ^I
      ∙ concepts M in Loop_α[s, s′]: stay at the same individual and jump to s′ ⇝ check if c = d ∈ M^I
    ∙ Exit when we reach a pair (b, sf)
    ∙ Use a counter to ensure termination (we only need to consider each pair once)
  • 194.
    evaluation algorithm
    Algorithm EvalAtom
    Input: NFA α = (S, Σ, δ, s0, F) with Σ ⊆ N±_R ∪ {A? | A ∈ N_C}, ELHI⊥ KB (T, A), (a, b) ∈ Ind(A) × Ind(A)
    1. Test whether (T, A) is satisfiable; output yes if not.
    2. Initialize current = (a, s0) and count = 0. Set max = |A| · |S| + 1.
    3. While count < max and current ∉ {(b, sf) | sf ∈ F}:
       3.1 Let current = (c, s).
       3.2 Guess a pair (d, s′) ∈ Ind(A) × S and either (s, σ, s′) ∈ δ or M ∈ Loop_α[s, s′].
       3.3 If (s, σ, s′) was guessed:
           ∙ If σ ∈ N±_R, then verify that T, A ⊨ σ(c, d), and return no if not.
           ∙ If σ = A?, then verify that c = d and T, A ⊨ A(c), and return no if not.
       3.4 If M was guessed, then verify that c = d and that T, A ⊨ B(c) for every concept name B ∈ M, and return no if not.
       3.5 Set current = (d, s′) and increment count.
    4. If current = (b, sf) for some sf ∈ F, return yes. Else return no.
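A deterministic variant of EvalAtom can be sketched by replacing the guesses with a breadth-first search over pairs (individual, state), so each pair is visited at most once and no counter is needed. This swaps the non-deterministic formulation for plain graph search. The function name `eval_atom`, the two oracle callbacks, and the string encoding of symbols are assumptions, the satisfiability test of step 1 is taken as already done, and the loop table is passed in as a dictionary containing only the entries shown on the slides.

```python
from collections import deque


def eval_atom(delta, start_state, final_states, loops, individuals,
              entails_concept, entails_role, a, b):
    """Search for a run from (a, start_state) to (b, sf) with sf final."""
    seen = {(a, start_state)}
    queue = deque(seen)
    while queue:
        c, s = queue.popleft()
        if c == b and s in final_states:
            return True
        nxt = set()
        for (s1, symbol, s2) in delta:
            if s1 != s:
                continue
            if symbol.endswith("?"):          # test A?: stay at c
                if entails_concept(symbol[:-1], c):
                    nxt.add((c, s2))
            else:                             # role step inside the ABox
                nxt |= {(d, s2) for d in individuals
                        if entails_role(symbol, c, d)}
        for (s1, s2), concept_sets in loops.items():
            # Loop through the anonymous part: stay at c, jump to s2.
            if s1 == s and any(all(entails_concept(name, c) for name in m)
                               for m in concept_sets):
                nxt.add((c, s2))
        for pair in nxt - seen:
            seen.add(pair)
            queue.append(pair)
    return False


role_facts = {("serves", "r", "b"), ("serves", "r", "p")}
concept_facts = {("PenneArrab", "b"), ("PizzaCalab", "p")}


def entails_concept(name, c):
    return (name, c) in concept_facts


def entails_role(symbol, c, d):
    if symbol.endswith("-"):                  # inverse role: swap arguments
        return (symbol[:-1], d, c) in role_facts
    return (symbol, c, d) in role_facts


delta = {
    ("s0", "serves", "s1"),
    ("s1", "hasIngred", "s1"), ("s1", "ingredOf-", "s1"),
    ("s1", "Spicy?", "sf"),
    ("sf", "serves", "sf"), ("sf", "serves-", "sf"),
    ("sf", "hasIngred", "sf"), ("sf", "hasIngred-", "sf"),
}
loops = {("s1", "sf"): [frozenset({"Peperonc"}), frozenset({"ArrabSauce"}),
                        frozenset({"PenneArrab"})]}
r_b = eval_atom(delta, "s0", {"sf"}, loops, {"r", "b", "p"},
                entails_concept, entails_role, "r", "b")
b_r = eval_atom(delta, "s0", {"sf"}, loops, {"r", "b", "p"},
                entails_concept, entails_role, "b", "r")
```

The successful run for (r, b) takes the serves step from r to b and then uses the loop-table entry for PenneArrab to jump from s1 to sf while staying at b.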
  • 195.
    evaluation algorithm: example (1/2)
    q(x, y) = serves · (hasIngred ∪ ingredOf⁻)* · Spicy? · Σ*(x, y)
    ABox:
    ∙ serves(r, b), serves(r, p)
    ∙ PenneArrab(b), PizzaCalab(p)
    TBox:
    ∙ PenneArrab ⊑ PastaDish ⊓ ∃hasIngred.ArrabSauce
    ∙ PastaDish ⊑ Dish ⊓ ∃hasIngred.Pasta
    ∙ ArrabSauce ⊑ ∃hasIngred.Peperonc
    ∙ Peperonc ⊔ ∃hasIngred.Spicy ⊑ Spicy
    ∙ Spicy ⊓ Dish ⊑ SpicyDish
    [Figure: canonical model I_K with individuals r, p (PizzaCalab), b (PenneArrab) and anonymous elements e1 (Nduja, Spicy), e2 (Penne, Pasta), e3 (ArrabSauce), e4 (Peperonc, Spicy); edges are labelled serves and hasIngred; π(x) = r, π(y) = b]
  • 197.
    evaluation algorithm: example (2/2)
    ABox: serves(r, b), serves(r, p), PenneArrab(b), PizzaCalab(p)
    q(x, y) = serves · (hasIngred ∪ ingredOf⁻)* · Spicy? · Σ*(x, y)
    [Figure: NFA α with states s0, s1, sf (serves from s0 to s1; loops hasIngred and ingredOf⁻ on s1; Spicy? from s1 to sf; Σ* loop on sf)]
    Loop table entries: Peperonc ∈ Loop_α[s1, sf], ArrabSauce ∈ Loop_α[s1, sf], PenneArrab ∈ Loop_α[s1, sf]
    Run:
    ∙ count 0: start at (r, s0)
    ∙ count 1: guess (b, s1) with (s0, serves, s1) ∈ δ; test (r, b) ∈ serves^I
    ∙ count 2: guess (b, sf) with PenneArrab ∈ Loop_α[s1, sf]; test b ∈ PenneArrab^I
    ∙ (b, sf) is reached ⇝ return yes
  • 202.
    complexity of the algorithm
    Theorem. (a, b) ∈ cert(q, K) iff there is some execution of EvalAtom(α, K, (a, b)) that returns yes.
    ∙ Iterations are bounded by the counter (poly. counter ⇝ log space)
    ∙ We need calls to procedures for:
      ∙ satisfiability
      ∙ instance checking
      ∙ membership in the Loop_α table
    ∙ These calls are in Exp for ELHI⊥
    ∙ Loop_α computation: polynomially many iterations, each of which tests entailment
    Exp upper bound for ELHI⊥ (combined complexity)
    For ELH and DL-Lite, we can obtain a P upper bound (combined)
    Data complexity:
    ∙ Loop_α computation in constant time (ABox independent)
    ∙ called procedures are in P for ELH and in NLogSpace for DL-Lite_R
  • 206.
    complexity bounds
    Theorem.
    ∙ For ELHI⊥, 2RPQ answering is Exp-complete in combined complexity and P-complete in data complexity
    ∙ For DL-Lite_R and ELH, the combined complexity drops to P-complete
    ∙ In data complexity, the problem is NLogSpace-complete for DL-Lite_R, and P-complete for ELH
    Most matching lower bounds follow from simpler problems:
    ∙ instance checking
    ∙ graph reachability = RPQ over plain ABox
    The P-hardness for DL-Lite_R (combined complexity) is non-trivial
  • 209.
    answering c2rpqs
    For answering C2RPQs, we combine the ideas above:
    ∙ rewrite the query so that matches ranging over individuals suffice
    ∙ in each step, consider possibly deeper paths with the Loop_α table
    After rewriting, guess matches using individuals only and check them using EvalAtom on each atom
    This algorithm works for all DLs discussed and gives optimal complexity bounds
    Answering C2RPQs is not much harder:
    ∙ Combined complexity increases to PSpace for DL-Lite_R and ELH
    ∙ but most other bounds are the same as for RPQs and CQs
    ∙ even for very expressive DLs that are not Horn
  • 210.
    final remarks on navigational queries
    ∙ Navigational queries provide more querying power at moderate computational cost
    ∙ Expressible in Datalog, but computationally better behaved
    ∙ Good alternative to CQs, gaining increasing attention
    Property paths in SPARQL
    ∙ included in the SPARQL 1.1 standard
    ∙ add regular paths as in C2RPQs
    Ongoing quest for more flexible navigational languages
  • 211.
    complexity of answering (c)(2)rpqs
                               2RPQs                     C2RPQs
                               data        combined      data          combined
    DL-Lite, DL-Lite_R         NLogSpace   P             NLogSpace     PSpace
    EL, ELH                    P           P             P             PSpace
    ELI, ELHI⊥, Horn-SHOIQ     P           Exp           P             Exp
    ALC, ALCHQ                 coNP        Exp           coNP-hard     2Exp
    ALCI, SH, SHIQ             coNP        Exp           coNP-hard     2Exp
    SHOIQ                      coNP-hard   coNExp        coNP-hard¹    coN2Exp-hard¹
    ¹ decidability open
  • 212.
    comparison: complexity of answering (u)cqs
                               IQs                       CQs
                               data        combined      data          combined
    DL-Lite, DL-Lite_R         in AC0      NLogSpace     in AC0        NP
    EL, ELH                    P           P             P             NP
    ELI, ELHI⊥, Horn-SHOIQ     P           Exp           P             Exp
    ALC, ALCHQ                 coNP        Exp           coNP          Exp
    ALCI, SH, SHIQ             coNP        Exp           coNP          2Exp
    SHOIQ                      coNP-hard   coNExp        coNP-hard¹    coN2Exp-hard¹
    ¹ decidability open
  • 213.
    queries with negation or recursion
  • 216.
    queries with negation
    Conjunctive query with safe negation (CQ¬s):
    ∙ like a CQ, but can also have negated atoms
    ∙ safety condition: every variable occurs in some positive atom
    ∙ example: find menus whose main course is not spicy
      ∃y Menu(x) ∧ hasMain(x, y) ∧ ¬Spicy(y)
    Conjunctive query with inequalities (CQ≠):
    ∙ like a CQ, but can also have atoms t1 ≠ t2 (t1, t2 variables or individuals)
    ∙ example: find restaurants offering two menus with different dessert courses
      ∃y1 y2 z1 z2 offers(x, y1) ∧ Menu(y1) ∧ hasDessert(y1, z1) ∧
      offers(x, y2) ∧ Menu(y2) ∧ hasDessert(y2, z2) ∧ z1 ≠ z2
    ∙ example: find menus with at least three courses (see notes)
    Note: unions of such queries (UCQ¬s and UCQ≠) can be defined in the obvious way
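    To make the two example queries concrete, here is a minimal Python sketch that evaluates them over a single finite database. The facts are hypothetical toy data, not taken from the slides, and the evaluation treats every missing fact as false. That is exactly what OMQA does not do: certain answers must hold in all models of the data and ontology, which is one reason negation is so much harder in the ontology-mediated setting.

```python
from itertools import product

# Hypothetical toy data (an assumption for illustration only).
menu = {"m1", "m2"}
spicy = {"curry"}
has_main = {("m1", "curry"), ("m2", "risotto")}
has_dessert = {("m1", "tiramisu"), ("m2", "sorbet"), ("m3", "sorbet")}
offers = {("r1", "m1"), ("r1", "m2"), ("r2", "m2"), ("r2", "m3")}


def menus_with_non_spicy_main():
    """q(x) = exists y. Menu(x) and hasMain(x, y) and not Spicy(y)."""
    return {x for (x, y) in has_main if x in menu and y not in spicy}


def restaurants_with_two_desserts():
    """q(x) = exists y1, y2, z1, z2. offers(x, y1), Menu(y1), hasDessert(y1, z1),
    offers(x, y2), Menu(y2), hasDessert(y2, z2), z1 != z2."""
    answers = set()
    for (x1, y1), (x2, y2) in product(offers, repeat=2):
        # Both offers atoms must share the answer variable x,
        # and both y1 and y2 must be menus.
        if x1 != x2 or y1 not in menu or y2 not in menu:
            continue
        desserts1 = {z for (m, z) in has_dessert if m == y1}
        desserts2 = {z for (m, z) in has_dessert if m == y2}
        if any(z1 != z2 for z1 in desserts1 for z2 in desserts2):
            answers.add(x1)
    return answers


print(menus_with_non_spicy_main())
print(restaurants_with_two_desserts())
```

    On this toy database, m2 is returned by the first query only because Spicy(risotto) is absent from the data. Under open-world OMQA semantics it would be a certain answer only if the ontology and data entailed that risotto is not spicy.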
  • 217.
    undecidability results for queries with negation
    Adding negation leads to undecidability even in very restricted settings.
    Theorem. The following problems are undecidable:
    ∙ CQ¬s answering in DL-LiteR
    ∙ UCQ¬s answering in EL⊥
    ∙ CQ≠ answering in DL-LiteR
    ∙ CQ≠ answering in EL⊥
    Possible solution: adopt alternative semantics (see lecture notes)
  • 219.
    undecidability results for queries with recursion
    Significant interest in combining DLs with Datalog rules
    Unfortunately, this almost always leads to undecidability:
    Theorem. Datalog query answering is undecidable in every DL that can express (directly or indirectly) A ⊑ ∃r.A
    In particular: undecidable in both DL-Lite and EL
    Possible solutions:
    ∙ use restricted classes of Datalog queries (e.g. path queries)
    ∙ DL-safe rules: can only apply rules to (named) individuals
  • 221.
  • 222.
    efficient omqa
    Lots of work on developing and implementing efficient OMQA algorithms
    Focus mostly on DL-Lite (and related dialects):
    ∙ First algorithm, PerfectRef, proposed in the mid-2000s
    ∙ Rewrites into UCQs, implemented in Quonto
    ∙ Improved versions proposed in Requiem, Presto, Rapid, …
    ∙ Some algorithms rewrite into positive existential queries or Datalog programs instead of UCQs
    ∙ Resulting queries are smaller, can be easier to evaluate
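    The idea of rewriting into a UCQ can be illustrated with a deliberately simplified Python sketch. It is not PerfectRef: it handles only an atomic query A(x) and inclusions of the form B ⊑ A or ∃r ⊑ A, with no inverse roles, role inclusions or negative inclusions. The first two axioms and the data mirror the teaching-staff example from the introduction; the `FullProfessor` axiom and the tuple encoding are assumptions added for illustration.

```python
# Inclusions as (lhs, rhs): lhs ⊑ rhs. A side is either a concept name
# or ("exists", r), standing for ∃r. FullProfessor is a hypothetical addition.
tbox = [
    ("Professor", "TeachingStaff"),
    (("exists", "teaches"), "TeachingStaff"),
    ("FullProfessor", "Professor"),
]


def rewrite_atomic(concept, tbox):
    """Collect the disjuncts of a UCQ rewriting of concept(x) by chaining
    inclusions backwards from the queried concept."""
    result = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for lhs, rhs in tbox:
            if rhs == current and lhs not in result:
                result.add(lhs)
                frontier.append(lhs)
    return result


def evaluate(disjuncts, concept_facts, role_facts):
    """Evaluate the union of disjuncts directly over the data (no ontology)."""
    answers = set()
    for d in disjuncts:
        if isinstance(d, tuple):  # ("exists", r): x has some r-successor
            answers |= {a for (r, a, _) in role_facts if r == d[1]}
        else:                     # concept name: x is asserted to be a d
            answers |= {a for (c, a) in concept_facts if c == d}
    return answers


concept_facts = {("Professor", "Marie")}
role_facts = {("teaches", "Mark", "CS200")}
ucq = rewrite_atomic("TeachingStaff", tbox)
print(evaluate(ucq, concept_facts, role_facts))
```

    The ontology is used only during rewriting; evaluation is a plain database query, which is what allows such rewritings to be delegated to a relational engine.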
  • 223.
    optimizations and omqa beyond dl-lite
    Tractable classes, fragments of lower complexity
    Rewriting engines for other Horn DLs also developed, e.g.,
    ∙ Requiem and the related Kyrie cover several EL dialects
    ∙ Clipper and, more recently, Rapid cover Horn-SHIQ
    They usually rewrite into Datalog programs
  • 224.
    understanding rewritability
    Much attention devoted to understanding the limits of rewritability and the size of rewritings
    When are polynomial rewritings possible?
    Can we give bounds on the size of rewritings?
    Which non-DL-Lite ontologies can be rewritten into FO-queries?
    ⇝ related to non-uniform complexity:
    ∙ study specific pairs (q, T), called ontology-mediated queries
  • 225.
    combined approaches
    Saturate the ABox using the TBox axioms ⇝ a finite version of the canonical model,
    and then evaluate the query over the saturated ABox
    Two approaches:
    ∙ modify the query before evaluation to ensure soundness, or
    ∙ evaluate and then filter unsound answers
    First proposed for EL, then also for DL-Lite
    Extended to other dialects, richer DLs
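    The saturate-then-evaluate idea can be sketched in Python under strong simplifying assumptions. Only axioms A ⊑ B and A ⊑ ∃r.B are handled, one shared anonymous witness is reused per pair (r, B) to keep the saturated ABox finite, and the filter merely drops answers that are anonymous individuals. The axiom encoding, the witness naming scheme and the axiom Professor ⊑ ∃teaches.Course are all hypothetical. Actual combined approaches need more careful query rewriting or filtering to exclude spurious matches caused by shared witnesses.

```python
def saturate(concept_facts, role_facts, tbox):
    """Close the ABox under ("sub", A, B) for A ⊑ B and
    ("exists", A, r, B) for A ⊑ ∃r.B, reusing one witness per (r, B)."""
    concepts = set(concept_facts)  # pairs (A, a)
    roles = set(role_facts)        # triples (r, a, b)
    changed = True
    while changed:
        changed = False
        new_concepts, new_roles = set(), set()
        for axiom in tbox:
            if axiom[0] == "sub":
                _, a_name, b_name = axiom
                new_concepts |= {(b_name, x) for (c, x) in concepts if c == a_name}
            elif axiom[0] == "exists":
                _, a_name, r, b_name = axiom
                witness = f"_w_{r}_{b_name}"  # hypothetical naming scheme
                for (c, x) in concepts:
                    if c == a_name:
                        new_roles.add((r, x, witness))
                        new_concepts.add((b_name, witness))
        if not new_concepts <= concepts or not new_roles <= roles:
            concepts |= new_concepts
            roles |= new_roles
            changed = True
    return concepts, roles


def is_named(individual):
    """Anonymous witnesses introduced by saturation carry the _w_ prefix."""
    return not individual.startswith("_w_")


def teaches_some_course(concepts, roles):
    """q(x) = exists y. teaches(x, y) and Course(y), keeping named answers only."""
    answers = {a for (r, a, b) in roles if r == "teaches" and ("Course", b) in concepts}
    return {a for a in answers if is_named(a)}


tbox = [("sub", "Professor", "TeachingStaff"),
        ("exists", "Professor", "teaches", "Course")]
concepts, roles = saturate({("Professor", "Marie")}, set(), tbox)
print(teaches_some_course(concepts, roles))
```

    Marie is returned even though the data names no course she teaches: the match goes through the anonymous witness, which stands in for the unnamed course whose existence the ontology implies.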
  • 226.
    querying existing relational data using mappings
    Today: assumed data given as ABox assertions (unary + binary facts)
    Problem: how to query existing relational data (arbitrary arity)?
    Solution: use a mapping that specifies the relationship between the database relations and the concepts / roles in the DL vocabulary
    Formally: mapping assertions of the form φ → ψ where:
    ∙ φ is a query formulated using DB relations
    ∙ ψ is a query in the DL vocabulary
    Global-as-view (GAV) mappings: φ is a CQ, ψ is an atom (no quantifiers)
    Handling mappings:
    ∙ apply mappings to generate ABox, proceed as usual
    ∙ virtual ABox: unfolding step to get rewriting over DB relations
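    As a rough illustration of the first option (apply the mappings to generate an ABox), the Python sketch below materializes two GAV mapping assertions over a small relational database. The relation names `staff` and `teaching`, their columns and the two mappings are all hypothetical. In the virtual ABox option these facts would not be stored; the mappings would instead be unfolded into the rewritten query.

```python
# Hypothetical relational data of arity 3 and 4 (assumed for illustration).
# staff(id, name, role) and teaching(staff_id, course_code, term, room)
staff = [(1, "Marie", "professor"), (2, "Mark", "lecturer")]
teaching = [(2, "CS200", "2024S", "R12")]


def apply_mappings(staff, teaching):
    """Materialize an ABox from two GAV mappings (CQ body, atomic head):
      staff(i, n, 'professor')                 -> Professor(n)
      staff(i, n, r) and teaching(i, c, t, l)  -> teaches(n, c)
    """
    concept_facts, role_facts = set(), set()
    names = {sid: name for (sid, name, _) in staff}
    for (_, name, role) in staff:
        if role == "professor":
            concept_facts.add(("Professor", name))
    for (sid, course, _, _) in teaching:
        if sid in names:  # join staff and teaching on the staff id
            role_facts.add(("teaches", names[sid], course))
    return concept_facts, role_facts


abox_concepts, abox_roles = apply_mappings(staff, teaching)
print(abox_concepts, abox_roles)
```

    The resulting unary and binary facts have the shape of the ABox assertions assumed throughout, so any of the query answering techniques discussed earlier can be applied to them unchanged.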
  • 227.
    other research topics (non-exhaustive)
    Beyond classical OMQA
    ∙ inconsistency-tolerant query answering
    ∙ probabilistic query answering
    ∙ privacy-aware query answering
    ∙ temporal query answering
    Support for building and maintaining OMQA systems
    ∙ module extraction
    ∙ ontology evolution
    ∙ query inseparability and emptiness
    Improving the usability of OMQA systems
    ∙ interfaces and support for query formulation
    ∙ explaining query (non-)answers
  • 228.