Query Answering in Probabilistic Datalog+/– Ontologies under Group Preferences

Outline
Introduction
Datalog+/–
GPP-Datalog+/–
Query Answering in Probabilistic Datalog+/–
Ontologies under Group Preferences
Thomas Lukasiewicz, Maria Vanina Martinez,
Gerardo I. Simari, and Oana Tifrea-Marciuska
Department of Computer Science, University of Oxford, UK
July 5, 2013

Outline
Introduction
Datalog+/–
Databases and Queries
The Chase
GPP-Datalog+/–
Group Preference Model
Probabilistic Model
Preference Merging and Aggregation
Strategies to Answer k-rank Disjunctive Atomic Queries

Motivation
◮ Web → Social Semantic Web
◮ model a group of users (e.g., movie night, trip), handling
  ◮ qualitative preferences of users
  ◮ disagreement between users
  ◮ efficiency
  (our previous work in SUM 2013)
◮ model uncertainty (e.g., information integration from travel sites)
◮ Desire: an ontology language that handles the preferences of a group of
users and can handle uncertainty
[Image; source: www.boundless.com]

Datalog+/– (1/3)
◮ A database (instance) D for R is a (possibly infinite) set of atoms
with predicates from a finite set of predicate symbols R and
arguments from a set of data constants ∆.
D = {sport(s1), sport(s2), relax(r1), relax(r2), adv(a1), adv(a2),
museum(m1), museum(m2), park(p1), free_entrance(p1)}.
◮ A conjunctive query (CQ) over R has the form Q(X) = ∃Y Φ(X, Y),
where Φ(X, Y) is a conjunction of atoms.
Q(X) = park(X) ∧ free_entrance(X).
◮ A Boolean CQ (BCQ) over R is a CQ of the form Q(), often written
as the set of all its atoms, without quantifiers.
Q() = ∃X (park(X) ∧ free_entrance(X)).

Datalog+/– (2/3)
◮ Answers to CQs and BCQs are defined via homomorphisms, which
are mappings µ: ∆ ∪ ∆N ∪ V → ∆ ∪ ∆N ∪ V such that
1. c ∈ ∆ (set of constants) implies µ(c) = c,
2. c ∈ ∆N (set of labelled nulls) implies µ(c) ∈ ∆ ∪ ∆N,
3. µ is naturally extended to atomic formulas, sets of atomic formulas,
and conjunctions of atomic formulas.
◮ The set of all answers Q(D) is the set of all tuples t over data
constants such that there is a homomorphism µ: X ∪ Y → ∆ ∪ ∆N with
µ(Φ(X, Y)) ⊆ D and µ(X) = t.
D = {sport(s1), sport(s2), relax(r1), relax(r2), adv(a1), adv(a2),
museum(m1), museum(m2), park(p1), free_entrance(p1)}.
For Q(X) = park(X) ∧ free_entrance(X),
the set of all answers over D is Q(D) = {p1}.
For Q() = ∃X (park(X) ∧ free_entrance(X)),
the answer is YES.
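The homomorphism-based definition of answers can be tried out on the example database. The following is a minimal sketch, not the authors' implementation: the tuple encoding of atoms, the helper names `is_var`, `homomorphisms`, and `answers`, and the convention that variables start with an uppercase letter are all assumptions made for illustration. It enumerates assignments by brute force, which is only sensible for tiny databases.

```python
from itertools import product

# Database from the slides; an atom is encoded as (predicate, args).
D = {
    ("sport", ("s1",)), ("sport", ("s2",)),
    ("relax", ("r1",)), ("relax", ("r2",)),
    ("adv", ("a1",)), ("adv", ("a2",)),
    ("museum", ("m1",)), ("museum", ("m2",)),
    ("park", ("p1",)), ("free_entrance", ("p1",)),
}

def is_var(term):
    # Assumed convention: variables start with an uppercase letter.
    return term[0].isupper()

def homomorphisms(body, db):
    """Yield every assignment mu of the body variables with mu(body) a subset of db."""
    variables = sorted({t for _, args in body for t in args if is_var(t)})
    domain = sorted({t for _, args in db for t in args})
    for values in product(domain, repeat=len(variables)):
        mu = dict(zip(variables, values))
        image = {(p, tuple(mu.get(t, t) for t in args)) for p, args in body}
        if image <= db:
            yield mu

def answers(head_vars, body, db):
    """Set of tuples mu(head_vars) over all homomorphisms of the body into db."""
    return {tuple(mu[v] for v in head_vars) for mu in homomorphisms(body, db)}

body = [("park", ("X",)), ("free_entrance", ("X",))]
cq_answers = answers(("X",), body, D)    # Q(X) = park(X) ∧ free_entrance(X)
bcq_answer = bool(answers((), body, D))  # Q() holds iff some homomorphism exists
```

Here `cq_answers` contains the single tuple for p1 and `bcq_answer` is true, matching the slide. The sketch does not filter out labelled nulls, because D contains none.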

Datalog+/– (3/3)
◮ Tuple-generating dependency (TGD): constraint of the form
∀X∀Y Φ(X, Y) → ∃Z Ψ(X, Z), where Φ(X, Y) and Ψ(X, Z) are
conjunctions of atoms over a set of predicates R, called the body
and the head, respectively.
museum(X) → SS(X)
◮ For a database D for R, and a set of TGDs Σ on R, the set of
models of D and Σ, denoted mods(D, Σ), is the set of all (possibly
infinite) databases B such that
◮ D ⊆ B and
◮ every σ ∈ Σ is satisfied in B.
◮ The set of answers for a CQ Q to D and Σ, denoted ans(Q, D, Σ),
is the set of all tuples a such that a ∈ Q(B) for all B ∈ mods(D, Σ).

The Chase
◮ The chase is a procedure for repairing a DB relative to a set of
dependencies.
◮ D ∪ Σ |= Q iff chase(D, Σ) |= Q.
◮ A TGD σ is guarded iff it contains an atom in its body that contains
all universally quantified variables of σ.
σ1 : P(X) ∧ R(X, Y) ∧ Q(Y) → ∃Z R(Y, Z) is guarded (guard: R(X, Y)).
σ2 : R(X, Y) ∧ R(Y, Z) → R(X, Z) is not guarded.
◮ If Σ consists of guarded TGDs, CQs can be evaluated on a fragment
of the chase of constant depth k · |Q|, in PTIME in the data complexity.
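The guardedness condition is easy to check mechanically. A minimal sketch, assuming TGD bodies are encoded as lists of (predicate, args) pairs and that every body term is a variable (both are illustrative assumptions; `is_guarded` is a hypothetical helper name):

```python
def is_guarded(body):
    """True iff some body atom contains every variable that occurs in the body."""
    all_vars = {t for _, args in body for t in args}
    return any(all_vars <= set(args) for _, args in body)

# Bodies of the two example TGDs from the slide.
sigma1_body = [("P", ("X",)), ("R", ("X", "Y")), ("Q", ("Y",))]
sigma2_body = [("R", ("X", "Y")), ("R", ("Y", "Z"))]

guarded1 = is_guarded(sigma1_body)  # R(X, Y) covers X and Y
guarded2 = is_guarded(sigma2_body)  # no single atom covers X, Y, and Z
```

Only the body matters for this check, since the universally quantified variables of a TGD are exactly its body variables.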

TGD Chase
Informally, a TGD σ is applicable in a DB D if body(σ) maps to atoms in
D. If the head of σ is not already satisfied in D, applying σ to D adds the
head atom, with a fresh null for each existentially quantified variable in head(σ).
Example. Let O = (D, Σ) be an ontology describing travel activities:
Σ = {museum(X) → SS(X), park(A) → SS(A),
SS(A) → act(A), relax(X) → act(X),
adv(X) → act(X), sport(X) → act(X),
adv(X) → ∃Y requireEquip(X, Y )};
D = {sport(s1), sport(s2), relax(r1), relax(r2), adv(a1),
adv(a2), museum(m1), museum(m2), park(p1)}
chase(D, Σ) = D ∪ {SS(m1), SS(m2), SS(p1), act(r1),
act(r2), act(a1), act(a2), act(s1),
act(s2), act(m1), act(m2), act(p1),
requireEquip(a1, e1), requireEquip(a2, e2)}
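The chase of the travel ontology can be reproduced with a small fixpoint loop. This is a sketch of a restricted chase under stated assumptions, not the authors' algorithm: atoms are (predicate, args) tuples, variables start with an uppercase letter, fresh nulls are named e1, e2, ... to mirror the slide, and `max_rounds` is a hypothetical safety bound because the chase need not terminate in general.

```python
from itertools import count, product

def is_var(term):
    return term[0].isupper()  # assumed convention: variables are capitalised

def substitute(atom, mu):
    pred, args = atom
    return (pred, tuple(mu.get(t, t) for t in args))

def homomorphisms(atoms, db, partial=None):
    """Yield extensions of `partial` that map every atom into db."""
    partial = partial or {}
    free = sorted({t for _, args in atoms for t in args
                   if is_var(t) and t not in partial})
    domain = sorted({t for _, args in db for t in args})
    for values in product(domain, repeat=len(free)):
        mu = {**partial, **dict(zip(free, values))}
        if all(substitute(a, mu) in db for a in atoms):
            yield mu

def chase(db, tgds, max_rounds=100):
    db, fresh = set(db), count(1)
    for _ in range(max_rounds):
        changed = False
        for body, head in tgds:
            for mu in list(homomorphisms(body, db)):
                # Restricted chase: skip the TGD if its head is already satisfied.
                if next(homomorphisms(head, db, mu), None) is not None:
                    continue
                nulls = {}
                for _, args in head:
                    for t in args:
                        if is_var(t) and t not in mu and t not in nulls:
                            nulls[t] = f"e{next(fresh)}"  # fresh labelled null
                db |= {substitute(a, {**mu, **nulls}) for a in head}
                changed = True
        if not changed:
            return db
    raise RuntimeError("no fixpoint reached within max_rounds")

tgds = [
    ([("museum", ("X",))], [("SS", ("X",))]),
    ([("park", ("A",))], [("SS", ("A",))]),
    ([("SS", ("A",))], [("act", ("A",))]),
    ([("relax", ("X",))], [("act", ("X",))]),
    ([("adv", ("X",))], [("act", ("X",))]),
    ([("sport", ("X",))], [("act", ("X",))]),
    ([("adv", ("X",))], [("requireEquip", ("X", "Y"))]),
]
D = {("sport", ("s1",)), ("sport", ("s2",)), ("relax", ("r1",)),
     ("relax", ("r2",)), ("adv", ("a1",)), ("adv", ("a2",)),
     ("museum", ("m1",)), ("museum", ("m2",)), ("park", ("p1",))}

result = chase(D, tgds)
```

On this input the loop stops after the second round, and `result` contains the nine database atoms plus the fourteen derived atoms listed on the slide.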

Group Preference Model
◮ A preference relation is a binary relation ≻ ⊆ HPref × HPref.
◮ A user preference model U induces a preference relation over a
subset of HOnt, denoted ≻U.
[Figure: preference graph of a single user over the atoms act(s1), act(s2), act(a1), act(a2), act(p1), act(m1), act(m2), act(r1), act(r2)]

Definition
A group preference model U = (U1, . . . , Un) for n ≥ 1 users is a
collection of n user preference models.
[Figure: preference graphs of three users u1, u2, u3 over the atoms act(s1), act(s2), act(a1), act(a2), act(p1), act(m1), act(m2), act(r1), act(r2)]

Probabilistic Model
◮ A preference relation ≻ is score-based if it is defined as follows:
a1 ≻ a2 iff score(a1) > score(a2).
◮ The probabilistic model assigns a probability to each atom (using, e.g.,
Markov logic or Bayesian networks).
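A score-based relation can be materialised directly from a score table. In this minimal sketch the function name `score_based_relation` is hypothetical and the scores are illustrative only: which probability belongs to which atom is an assumption, not something the text states.

```python
def score_based_relation(score):
    """Return {(a, b) : score[a] > score[b]}, read as 'a is preferred to b'."""
    return {(a, b) for a in score for b in score if score[a] > score[b]}

# Illustrative scores; the pairing of values with atoms is assumed.
score = {"act(m1)": 0.8, "act(p1)": 0.75, "act(s2)": 0.6}
prefers = score_based_relation(score)
```

Because it is induced by a strict comparison of numbers, the resulting relation is irreflexive and transitive, and atoms with equal scores stay incomparable.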
[Figure: probabilistic model PrM over the atoms act(m1), act(p1), act(s2), act(m2), act(r1), act(a1), act(r2), act(a2), act(s1); probabilities shown: 0.8, 0.44, 0.75, 0.6, 0.52, 0.4, 0.34, 0.3, 0.1]

Challenges of the given model
[Figure: the preference graphs of users u1, u2, u3 shown next to the probabilistic model PrM]

Preference Merging and Aggregation.
◮ Challenge 1: a user preference model and the probabilistic model may be in
disagreement: preference merging operators
◮ Challenge 2: user preference models may be in disagreement with
each other: preference aggregation operators
Definition
Let ≻U be an SPO (strict partial order) and ≻M be a score-based preference relation. A
preference merging operator ⊗(≻U, ≻M) yields a relation ≻∗ such that
1. ≻∗ is an SPO, and
2. if a1 ≻U a2 and a1 ≻M a2, then a1 ≻∗ a2.
Definition
Let U = (U1, . . . , Un) be a group preference model, where every ≻Ui is
an SPO. A preference aggregation operator on U yields an SPO ≻∗.
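The two conditions on a merging operator can be checked on finite relations. The sketch below assumes relations are encoded as sets of ordered pairs; `is_spo` and `respects_agreement` are hypothetical helper names, and the three-atom example is made up for illustration.

```python
def is_spo(rel):
    """Strict partial order: irreflexive and transitive (asymmetry then follows)."""
    irreflexive = all(a != b for a, b in rel)
    transitive = all((a, d) in rel for a, b in rel for c, d in rel if b == c)
    return irreflexive and transitive

def respects_agreement(user_rel, model_rel, merged):
    """Condition 2: pairs on which user and model agree must survive merging."""
    return (user_rel & model_rel) <= merged

# Made-up example: the user states one pair, the model orders all three atoms.
user_rel = {("a", "b")}
model_rel = {("a", "b"), ("b", "c"), ("a", "c")}
merged = set(model_rel)

valid = is_spo(merged) and respects_agreement(user_rel, model_rel, merged)
cyclic_is_spo = is_spo({("a", "b"), ("b", "a")})
```

A relation containing a two-element cycle fails the check, because transitivity would force a reflexive pair.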

Definition
A GPP-Datalog+/– ontology has the form KB = (O, U, M, ⊗, ⊎),
where
◮ O is a Datalog+/– ontology,
◮ U = (U1, . . . , Un) is a group preference model with n ≥ 1,
◮ M is a probabilistic model,
◮ ⊗ is a preference merging operator, and
◮ ⊎ is a preference aggregation operator.
O, U, and M have Herbrand bases HOnt, HPref, and HM, respectively, such that HPref ⊆ HOnt.
We say that KB is guarded iff O is guarded.

DAQ queries
Definition
Let KB = (O, U, M, ⊗, ⊎) be a GPP-Datalog+/– ontology, where
U = (U1, . . . , Un), and let Q(X) = q1(X1) ∨ · · · ∨ qn(Xn) be a DAQ
(disjunctive atomic query). Then, a skyline answer to Q relative to
≻∗ = ⊎(⊗(≻U1, ≻M), . . . , ⊗(≻Un, ≻M)) is any θqi entailed by O such that
no θ′ exists with O |= θ′qj and θ′qj ≻∗ θqi, where θ and θ′ are ground
substitutions for the variables in Q(X).
A substitution is a mapping from variables to variables or constants.
◮ Intuitively, a skyline answer is a most preferred answer.
◮ A 1-rank answer is a skyline answer.
◮ A 2-rank answer consists of the first and second most preferred answers.
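Skyline and k-rank answers can be illustrated on a finite preference relation. This is a sketch of one reading of the definitions above, in which a k-rank answer is built by repeatedly picking a skyline answer and removing it. The names `skyline` and `k_rank`, the alphabetical tie-break, and the toy relation are assumptions for illustration.

```python
def skyline(candidates, prefers):
    """Candidates that no other candidate is preferred to."""
    return {a for a in candidates
            if not any((b, a) in prefers for b in candidates)}

def k_rank(candidates, prefers, k):
    """Pick up to k answers by repeatedly taking one skyline answer."""
    remaining, ranked = set(candidates), []
    while remaining and len(ranked) < k:
        top = skyline(remaining, prefers)
        if not top:          # only possible if `prefers` contains a cycle
            break
        best = min(top)      # assumed tie-break: alphabetical order
        ranked.append(best)
        remaining.remove(best)
    return ranked

# Toy relation, not taken from the figures.
prefers = {("act(m1)", "act(p1)"), ("act(p1)", "act(s2)"), ("act(m1)", "act(s2)")}
candidates = {"act(m1)", "act(p1)", "act(s2)", "act(r1)"}

top = skyline(candidates, prefers)
top2 = k_rank(candidates, prefers, 2)
```

Several atoms can be skyline answers at once (here act(m1) and act(r1)), so a k-rank answer is in general not unique and depends on how ties are broken.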

Collapse to single user
◮ t ∈ [0, 1]: threshold controlling the influence of the probabilistic model (t = 0: high influence)
◮ Is 0.34 − 0.6 > 0.1? No ⟹ keep the relation
[Figure: probabilistic model PrM and preference graphs of user u2, with t = 0.1]

◮ Is 0.75 − 0.6 > 0.1? Yes ⟹ invert the relation
[Figure: probabilistic model PrM and preference graphs of user u2, with t = 0.1]
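The two threshold checks can be read as a rule applied to each edge of a user's graph: if the model scores the less preferred atom higher by more than t, the edge is inverted; otherwise it is kept. The sketch below encodes that reading. The names `merge_edge` and `merge` are hypothetical, and the assignment of the scores 0.6, 0.75, and 0.34 to particular atoms is an assumption inferred from the figure, not stated in the text.

```python
def merge_edge(preferred, other, score, t):
    """Merge one user edge 'preferred > other' with the score-based relation."""
    if score[other] - score[preferred] > t:
        return (other, preferred)  # model disagrees by more than t: invert
    return (preferred, other)      # otherwise keep the user's edge

def merge(user_edges, score, t):
    return {merge_edge(a, b, score, t) for a, b in user_edges}

# Assumed scores and user edges for illustration.
score = {"act(s2)": 0.6, "act(p1)": 0.75, "act(r2)": 0.34}
user_edges = {("act(s2)", "act(p1)"), ("act(s2)", "act(r2)")}

merged = merge(user_edges, score, t=0.1)
```

With t = 0.1 the first edge is inverted (0.75 − 0.6 > 0.1) and the second is kept (0.34 − 0.6 is not greater than 0.1). This edge-by-edge rule does not by itself guarantee that the result is a strict partial order, so a full operator would still have to restore transitivity and remove any cycles.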

◮ no relation
[Figure: preference graphs of users u1 (t = 0), u2 (t = 0.1), and u3 (t = 0.3)]

◮ relation with weight 1
[Figure: preference graphs of users u1 (t = 0), u2 (t = 0.1), and u3 (t = 0.3)]
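One plausible reading of the edge weights is that each ordered pair is weighted by the number of users whose relation contains it. The sketch below implements only that counting step, under this assumption. `collapse` and `conflicting_pairs` are hypothetical names, the input relations are made up, and the removeCycles subroutine mentioned later in the talk is not reproduced here.

```python
from collections import Counter

def collapse(user_relations):
    """Weight each ordered pair by the number of users whose relation contains it."""
    weights = Counter()
    for relation in user_relations:
        weights.update(relation)
    return weights

def conflicting_pairs(weights):
    """Pairs present in both directions, i.e. 2-cycles a cycle remover must resolve."""
    return {(a, b) for (a, b) in weights if (b, a) in weights and a < b}

# Made-up relations for three users.
u1 = {("a", "b")}
u2 = {("a", "b"), ("b", "c")}
u3 = {("b", "a")}

weights = collapse([u1, u2, u3])
conflicts = conflicting_pairs(weights)
```

In this toy input the pair (a, b) gets weight 2 and its reverse gets weight 1, so the collapsed graph contains a cycle that has to be broken before answers can be ranked.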


Collapse to single user (Final Graph)
◮ Q = act(X), (t1, t2, t3) = (0, 0.1, 0.3), k = 1
◮ k-rank answer to Q: act(m1).
[Figure: collapsed single-user graph over the atoms act(m1), act(m2), act(p1), act(s1), act(s2), act(r1), act(r2), act(a1), act(a2), with edge weights between 1 and 3]

◮ Q = act(X), (t1, t2, t3) = (0, 0.1, 0.3)
◮ k = 2: k-rank answer to Q: act(m1), act(p1).
◮ k = 3: k-rank answer to Q: act(m1), act(p1), act(s2).
◮ k = 4: k-rank answer to Q: act(m1), act(p1), act(s2), act(m2).
◮ k = 5: k-rank answer to Q: act(m1), act(p1), act(s2), act(m2), act(r2).
[Figures: for each k, the collapsed graph with edge weights, shown without the atoms already selected]

Collapse to single user (Theorem)
Theorem
Let KB = (O, U, M, ⊗, ⊎) be a GPP-Datalog+/– ontology, Q be a
DAQ, and k ≥ 0. If O is a guarded Datalog+/– ontology and the
removeCycles subroutine does not remove any unnecessary edges, then
Algorithm k-Rank-CSU
◮ correctly computes k-rank answers to Q, and
◮ runs in O(poly(|D|) · S + C) time in the data complexity, where
S is the cost of computing score(a) = PrKB(a) for any atom a such
that O |= a, and C is the cost of removeCycles.

Voting
◮ different voting strategies, e.g., plurality voting
◮ that is, computing the individual rankings first and then voting
Q = act(X), k = 2, and (t1, t2, t3) = (0, 0.1, 0.3). The k-rank answer to Q
using voting is act(m1), act(m2) or act(m1), act(p1).
[Figure: preference graphs of users u1, u2, and u3]
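Voting over individual rankings can be sketched as repeated plurality voting: each user votes for their best atom that has not yet been selected, and the atom with the most votes is selected next. This is only one possible reading of the strategy. The function name `plurality_k_rank`, the alphabetical tie-break, and the three rankings are hypothetical and are not derived from the figure.

```python
from collections import Counter

def plurality_k_rank(rankings, k):
    """Select up to k winners by repeated plurality voting over user rankings."""
    winners = []
    while len(winners) < k:
        votes = Counter()
        for ranking in rankings:
            # Each user votes for their best atom that has not been selected yet.
            top = next((a for a in ranking if a not in winners), None)
            if top is not None:
                votes[top] += 1
        if not votes:
            break
        best = max(votes.values())
        # Assumed tie-break: alphabetical order among the most-voted atoms.
        winners.append(min(a for a, v in votes.items() if v == best))
    return winners

# Hypothetical individual rankings, best first.
rankings = [
    ["act(m1)", "act(p1)", "act(s2)"],
    ["act(m1)", "act(m2)", "act(p1)"],
    ["act(p1)", "act(m1)", "act(m2)"],
]
top2 = plurality_k_rank(rankings, 2)
```

When several atoms receive the same number of votes, the outcome depends on the tie-break, which is consistent with the slide listing two alternative answers for k = 2.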

Summary
◮ Extension of Datalog+/– that allows for dealing with both partially
ordered preferences of groups of users and probabilistic uncertainty.
◮ We have focused on answering k-rank queries for DAQs (disjunctions
of atomic queries) in this context.
◮ We have presented different operators to compute group preferences as a
merging and an aggregation of the preferences of single users with
probability-based preferences and with each other, respectively.
◮ We have then provided algorithms to answer k-rank queries
for DAQs under these group preferences.
◮ We have shown that, under certain reasonable conditions, such DAQ
answering in Datalog+/– can be done in polynomial time in the
data complexity.

Future work
◮ Implementing and testing the GPP-Datalog+/– framework.
◮ Exploring which of the merging/aggregation operators is closest to
human judgment and thus well-suited as a general default
merging/aggregation operator for search and query answering in the
Social Semantic Web.

THANK YOU
Questions? oana.tifrea@cs.ox.ac.uk
