Incomplete Information in RDF
Charalampos Nikolaou and Manolis Koubarakis
charnik@di.uoa.gr

koubarak@di.uoa.gr

Department of Informatics and Telecommunications
National and Kapodistrian University of Athens

Web Reasoning and Rule Systems (RR) 2013
July 27, 2013
Outline
Motivation
Previous work
The RDFi framework
SPARQL query evaluation over RDFi databases
An algorithm for certain answer computation
Preliminary complexity results
Conclusions and future work

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

2/36
Motivation

Incomplete information is an important issue in many research
areas: relational databases, knowledge representation and the
semantic web.
Incomplete information arises in many practical settings (e.g.,
sensor data). RDF is often used to represent such data.
Even if initial information is complete, incomplete information
arises later on (e.g., relational view updates, data integration,
data exchange).
Although there is much work recently on incomplete
information in XML, not much has been done for incomplete
information in RDF.

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

3/36
Previous work
Relational
Relations extended to tables with various models of
incompleteness [Imielinski/Lipski ’84]
Complexity results for the associated decision problems
[Abiteboul/Kanellakis/Grahne ’91]
Dependencies and updates [Grahne ’91]

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

4/36
Previous work
Relational
Relations extended to tables with various models of
incompleteness [Imielinski/Lipski ’84]
Complexity results for the associated decision problems
[Abiteboul/Kanellakis/Grahne ’91]
Dependencies and updates [Grahne ’91]

XML
Dynamic enrichment of incomplete information
[Abiteboul/Segoufin/Vianu ’01,’06]
General models of incompleteness, query answering, and
computational complexity [Barcel´/Libkin/Poggi/Sirangelo
o
’09,’10]
C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

4/36
Previous work (cont’d)
RDF
Blank nodes as existential variables in the RDF standard
SPARQL query evaluation under certain answer semantics
(Open World Assumption) [Arenas/P´rez ’11]
e
Anonymous timestamps in general temporal RDF graphs
[Gutierrez/Hurtado/Vaisman ’05]
General temporal RDF graphs with temporal constraints
[Hurtado/Vaisman ’06]

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

5/36
Previous work (cont’d)
RDF
Blank nodes as existential variables in the RDF standard
SPARQL query evaluation under certain answer semantics
(Open World Assumption) [Arenas/P´rez ’11]
e
Anonymous timestamps in general temporal RDF graphs
[Gutierrez/Hurtado/Vaisman ’05]
General temporal RDF graphs with temporal constraints
[Hurtado/Vaisman ’06]

RDFi : It captures incomplete information for property values
using constraints. It is for RDF what the c-tables model is for the
relational model.
C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

5/36
RDFi by example

Example
y
P

19
hotspot1
type
Hotspot
fire1
type
Fire
hotspot1 correspondsTo fire1
fire1
occuredIn
_R1

.
.
.
.

8
6

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

23

x

6/36
RDFi by example

Example
y
P

19
hotspot1
type
Hotspot
fire1
type
Fire
hotspot1 correspondsTo fire1
fire1
occuredIn
_R1

.
.
.
.

8
6

23

x

R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19"

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

6/36
RDFi in a nutshell
Extension of RDF for capturing incomplete information for
property values that exist but are unknown or partially known
Partial knowledge captured by constraints using an appropriate
constraint language L interpreted over a fixed structure ML

Syntax
RDF graphs extended to RDFi databases: pair (G , φ)
G : RDF graph with a new kind of literals, called e-literals
φ: quantifier-free formula of L

Semantics
Possible world semantics as in [Imielinski/Lipski ’84] and
[Grahne ’91]
C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

7/36
Constraint languages L

Examples

ECL
Equality constraints
interpreted over an infinite
domain: x EQ y , x EQ c
Blank nodes as existential
variables

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

8/36
Constraint languages L

Examples

diPCL/dePCL

ECL
Equality constraints
interpreted over an infinite
domain: x EQ y , x EQ c
Blank nodes as existential
variables

C. Nikolaou and M. Koubarakis

–

Difference constraints of the
form x − y ≤ c interpreted over
the integers or rationals
Incomplete temporal
information [Koubarakis ’94]

Incomplete Information in RDF

8/36
Constraint languages L

Examples

diPCL/dePCL

ECL
Equality constraints
interpreted over an infinite
domain: x EQ y , x EQ c
Blank nodes as existential
variables

Difference constraints of the
form x − y ≤ c interpreted over
the integers or rationals
Incomplete temporal
information [Koubarakis ’94]

TCL
Topological constraints of
non-empty, regular closed
subsets of topological space
Six binary predicates:
DC, EC, PO, EQ, TPP, NTPP
C. Nikolaou and M. Koubarakis

–

x DC y
y

x

Incomplete Information in RDF

x EQ y
x
y

x EC y
y

x

x TPP y
y
x

x PO y
y

x

x NTPP y
y
x

8/36
Constraint languages L

Examples

diPCL/dePCL

ECL
Equality constraints
interpreted over an infinite
domain: x EQ y , x EQ c
Blank nodes as existential
variables

TCL

Difference constraints of the
form x − y ≤ c interpreted over
the integers or rationals
Incomplete temporal
information [Koubarakis ’94]

PCL
Topological constraints of
non-empty, regular closed
subsets of topological space
Six binary predicates:
DC, EC, PO, EQ, TPP, NTPP

C. Nikolaou and M. Koubarakis

–

TCL plus constant symbols
representing polygons in Q2
e.g.,
r NTPP "x − y ≥ 0 ∧ x ≤ 1 ∧ y ≥ 0"

Incomplete Information in RDF

8/36
RDFi : Vocabulary

RDF

RDFi

I (IRIs)
B (blank nodes)
L (literals)

I
B
L
C (literals)
U (e-literals)
M
A (datatypes)

M (datatype map)

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

L

constants
variables
set of sorts

9/36
RDFi : Vocabulary

RDF

RDFi

I (IRIs)
B (blank nodes)
L (literals)

I
B
L
C (literals)
U (e-literals)
M
A (datatypes)

M (datatype map)

L

constants
variables
set of sorts

ML interprets the constants of L in agreement with function
L2V of M

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

9/36
RDFi : Syntax
I
subject

I

predicate

B

object

θ

I B L C U

I :
B:
L:
C:
U:

IRIs
blank nodes
literals
constants of L
e-literals

Definition
(s, p, o) ∈ (I ∪ B) ∪ I ∪ (I ∪ B ∪ L ∪ C ∪ U) is called an e-triple

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

10/36
RDFi : Syntax
I
subject

I

predicate

B

object

θ

I B L C U

I :
B:
L:
C:
U:

IRIs
blank nodes
literals
constants of L
e-literals

Definition
(s, p, o) ∈ (I ∪ B) ∪ I ∪ (I ∪ B ∪ L ∪ C ∪ U) is called an e-triple
If t is an e-triple and θ a conjunction of L-constraints, then
the pair (t, θ) is called a conditional triple

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

10/36
RDFi : Syntax
I
subject

I

predicate

B

object

θ

I B L C U

I :
B:
L:
C:
U:

IRIs
blank nodes
literals
constants of L
e-literals

Definition
(s, p, o) ∈ (I ∪ B) ∪ I ∪ (I ∪ B ∪ L ∪ C ∪ U) is called an e-triple
If t is an e-triple and θ a conjunction of L-constraints, then
the pair (t, θ) is called a conditional triple
A set of conditional triples is called a conditional graph
C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

10/36
RDFi : Syntax (cont’d)
Definition
An RDFi database D is a pair D = (G , φ) where G is a conditional
graph and φ a Boolean combination of L-constraints (global
constraint)

Example
y

hotspot1
type
Hotspot
fire1
type
Fire
hotspot1 correspondsTo fire1
fire1
occuredIn _R1

.
.
.
.

R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19"

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

P

19
8
6

23
11/36

x
RDFi : Semantics
RDFi database
Rep

D

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

set of RDF

 G1 ,

G2 ,
 .
 .
.

graphs






12/36
RDFi : Semantics
RDFi database
Rep

D

set of RDF

 G1 ,

G2 ,
 .
 .
.

graphs






Definition
A valuation v is a function from U to C assigning to each e-literal
from U a constant from C

Definition
Let G be a conditional graph and v a valuation. Then v (G )
denotes the RDF graph
{v (t) | (t, θ) ∈ G and ML |= v (θ)}
C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

12/36
RDFi : Semantics (cont’d)
From RDFi databases to sets of RDF graphs
An RDFi database D = (G , φ) corresponds to the following set of
RDF graphs:
Rep(D) = H | there exists valuation v and RDF graph H
such that ML |= v (φ) and H ⊇ v (G )

Relation ⊇ captures the OWA semantics

An RDFi database corresponds to an infinite number of RDF
graphs

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

13/36
Question
How can we evaluate a query q over an RDFi database D
(compute q D )?

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

14/36
Question
How can we evaluate a query q over an RDFi database D
(compute q D )?

Semantic definition
q

C. Nikolaou and M. Koubarakis

Rep(D)

–

={ q

G

| G ∈ Rep(D)}

Incomplete Information in RDF

14/36
Question
How can we evaluate a query q over an RDFi database D
(compute q D )?

Semantic definition
q

C. Nikolaou and M. Koubarakis

Rep(D)

–

={ q

G

| G ∈ Rep(D)}

Incomplete Information in RDF

14/36
Question
How can we evaluate a query q over an RDFi database D
(compute q D )?

Semantic definition
q

Rep(D)

={ q

G

| G ∈ Rep(D)}

In practice?
Start with SPARQL algebra of [P´rez/Arenas/Gutierrez ’06]
e
with set semantics
Define SPARQL query evaluation for RDFi databases
C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

14/36
From mappings to e-mappings...

{?F → fire1, ?S → ”x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2”}

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

15/36
From mappings to e-mappings...

{?F → fire1, ?S → ”x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2”}
{?F → fire1, ?S → R1}

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

15/36
... to conditional mappings

{?F → fire1, ?S →”x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2”}

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

16/36
... to conditional mappings

{?F → fire1, ?S →”x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2”}, true

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

16/36
... to conditional mappings

{?F → fire1, ?S → R1}, R1 EQ ”x ≥ 1∧x ≤ 2∧y ≥ 1∧y ≤ 2”

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

16/36
From compatible mappings to possibly compatible
mappings
Join of conditional mappings

{?F → fire1, ?S → R1}, R1 EQ ”x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2”
{

?S → R2}, true

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

17/36
From compatible mappings to possibly compatible
mappings
Join of conditional mappings

{?F → fire1, ?S → R1}, R1 EQ ”x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2”
{

?S → R2}, true

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

17/36
From compatible mappings to possibly compatible
mappings
Join of conditional mappings

{?F → fire1, ?S → R1}, R1 EQ ”x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2”
{

?S → R2}, true
=

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

17/36
From compatible mappings to possibly compatible
mappings
Join of conditional mappings

{?F → fire1, ?S → R1}, R1 EQ ”x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2”
{

?S → R2}, true
=

{?F → fire1, ?S → R1}, true ∧ R1 EQ R2 ∧
R1 EQ ”x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2”

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

17/36
Operations on conditional mappings

Let Ω1 and Ω2 be sets of conditional mappings. We can define the
operation of:
Join (Ω1

Ω2 )

Union (Ω1 ∪ Ω2 )

Difference (Ω1  Ω2 )
Left-outer join (Ω1

C. Nikolaou and M. Koubarakis

–

1Ω )
2

Incomplete Information in RDF

18/36
Graph pattern evaluation

If D is an RDFi database and P a graph pattern, the evaluation of
P over D is defined recursively:

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

19/36
Graph pattern evaluation

If D is an RDFi database and P a graph pattern, the evaluation of
P over D is defined recursively:
base case:
P is the triple pattern t
recursion:
P is (P1 AND P2 )
→ P1 D
P2 D
P is (P1 UNION P2 )
→ P1 D ∪ P2 D
P is (P1 OPT P2 )
→ P1 D
P2 D

1

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

19/36
Graph pattern evaluation

If D is an RDFi database and P a graph pattern, the evaluation of
P over D is defined recursively:
base case:
P is the triple pattern t
recursion:
P is (P1 AND P2 )
→ P1 D
P2 D
P is (P1 UNION P2 )
→ P1 D ∪ P2 D
P is (P1 OPT P2 )
→ P1 D
P2 D
P is (P1 FILTER R)
where R is a conjunction of L-constraints

1

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

19/36
Graph pattern evaluation

If D is an RDFi database and P a graph pattern, the evaluation of
P over D is defined recursively:
base case:
P is the triple pattern t
recursion:
P is (P1 AND P2 )
→ P1 D
P2 D
P is (P1 UNION P2 )
→ P1 D ∪ P2 D
P is (P1 OPT P2 )
→ P1 D
P2 D
P is (P1 FILTER R)
where R is a conjunction of L-constraints

1

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

19/36
Triple pattern evaluation (case 1)

Example
Database D

Query q

fire1 occuredIn _R1 .

?F occuredIn ?R

R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19"

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

20/36
Triple pattern evaluation (case 1)

Example
Database D

Query q

fire1 occuredIn _R1 .

?F occuredIn ?R

R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19"

Answer (set of conditional mappings)
q

D

{?F → fire1, ?R → R1}, true

=

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

20/36
Triple pattern evaluation (case 2)

Example
Database D

Query q

fire1 occuredIn _R1 .

?F occuredIn
"x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2"

R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19"

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

21/36
Triple pattern evaluation (case 2)

Example
Database D

Query q

fire1 occuredIn _R1 .

?F occuredIn
"x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2"

R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19"

Answer (set of conditional mappings)

q

D

=

{?F → fire1}, R1 EQ "x ≥ 1∧x ≤ 2∧y ≥ 1∧y ≤ 2"

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

21/36
Evaluation of FILTER graph patterns
Example
Database D

Query q

fire1 occuredIn _R1 .

?F occuredIn ?R .
FILTER (?R NTPP
"x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2")

R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19"

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

22/36
Evaluation of FILTER graph patterns
Example
Database D

Query q

fire1 occuredIn _R1 .

?F occuredIn ?R .
FILTER (?R NTPP
"x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2")

R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19"

Answer
q

D

=

{?F → fire1, ?R → R1},
R1 NTPP "x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2"

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

22/36
SELECT queries
Example
Database D

Query q

SELECT ?F
WHERE {
?F occuredIn ?R .
R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19"
FILTER (?R NTPP
"x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2")}

fire1 occuredIn _R1 .

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

23/36
SELECT queries
Example
Database D

Query q

SELECT ?F
WHERE {
?F occuredIn ?R .
R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19"
FILTER (?R NTPP
"x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2")}

fire1 occuredIn _R1 .

Answer (set of conditional mappings)

q

D

=

{?F → fire1},
R1 NTPP "x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2"

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

23/36
CONSTRUCT queries
Example
Database D

Query q

fire1 occuredIn _R1 .

CONSTRUCT { ?F type Fire }
WHERE {
R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19"
?F occuredIn ?R
}

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

24/36
CONSTRUCT queries
Example
Database D

Query q

fire1 occuredIn _R1 .

CONSTRUCT { ?F type Fire }
WHERE {
R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19"
?F occuredIn ?R
}

Answer (RDFi database)
fire1 type Fire .

D = (G , φ)
R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19"

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

24/36
CONSTRUCT queries
Example
Database D

Query q

fire1 occuredIn _R1 .

CONSTRUCT { ?F type Fire }
WHERE {
R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19"
?F occuredIn ?R
}

Answer (RDFi database)
fire1 type Fire .

D = (G , φ)
R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19"

Closure property
C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

24/36
Correctness of SPARQL query evaluation for RDFi
Does query evaluation compute the correct answer
(the answer agrees with the semantic definition)?

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

25/36
Correctness of SPARQL query evaluation for RDFi
Does query evaluation compute the correct answer
(the answer agrees with the semantic definition)?

The following diagram should commute

D

Rep

G

q

q

D

G

Rep

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

25/36
Correctness of SPARQL query evaluation for RDFi
Does query evaluation compute the correct answer
(the answer agrees with the semantic definition)?

The following diagram should commute

D

Rep

G

q

q

D

G

Rep

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

25/36
Correctness of SPARQL query evaluation for RDFi
Does query evaluation compute the correct answer
(the answer agrees with the semantic definition)?

The following diagram should commute

D

Rep

G

q

q

D

G

Rep

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

25/36
Correctness of SPARQL query evaluation for RDFi
Does query evaluation compute the correct answer
(the answer agrees with the semantic definition)?

The following diagram should commute

D

Rep

G

q

q

D

Rep( q

D)

= q

Rep(D)

G

Rep

C. Nikolaou and M. Koubarakis

OR

–

Incomplete Information in RDF

25/36
Correctness of SPARQL query evaluation for RDFi
Does query evaluation compute the correct answer
(the answer agrees with the semantic definition)?

The following diagram should commute. Does it?

D

Rep

G

q

q

D

Rep( q

D)

= q

Rep(D)

G

Rep

C. Nikolaou and M. Koubarakis

OR

–

Incomplete Information in RDF

25/36
Correctness of SPARQL query evaluation for RDFi
Does query evaluation compute the correct answer
(the answer agrees with the semantic definition)?

The following diagram should commute. Does it?

D

Rep

G

q

q

D

Rep( q

D)

=

q

Rep(D)

G

Rep

C. Nikolaou and M. Koubarakis

OR

–

Incomplete Information in RDF

25/36
Certain answer to the rescue
Definition
The certain answer to query q over a set of RDF graphs G is set
{ q

C. Nikolaou and M. Koubarakis

–

G

| G ∈ G}

Incomplete Information in RDF

26/36
Certain answer to the rescue
Definition
The certain answer to query q over a set of RDF graphs G is set
{ q

G

| G ∈ G}

Using the notion of certain answer we can relax the earlier equality
requirement to one that uses Q-equivalence.

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

26/36
Certain answer to the rescue
Definition
The certain answer to query q over a set of RDF graphs G is set
{ q

G

| G ∈ G}

Using the notion of certain answer we can relax the earlier equality
requirement to one that uses Q-equivalence.

Definition
Let Q be a fragment of SPARQL. Two sets of RDF graphs G, H
will be Q-equivalent (denoted by G ≡Q H) if they give the same
certain answer to every query q ∈ Q
{ q
C. Nikolaou and M. Koubarakis

G

–

| G ∈ G} =

{ q

Incomplete Information in RDF

H

| H ∈ H}
26/36
Representation system
Let
D be the set of all RDFi databases
G be the set of all RDF graphs

Rep : D → G be a function determining the set of possible
RDF graphs corresponding to an RDFi database, and
Q be a fragment of SPARQL

D, Rep, Q is a representation system if for all D ∈ D and all
q ∈ Q, there exists an RDFi database q D such that
Rep( q

C. Nikolaou and M. Koubarakis

–

D)

≡Q q

Rep(D)

Incomplete Information in RDF

27/36
Representation system
Let
D be the set of all RDFi databases
G be the set of all RDF graphs

Rep : D → G be a function determining the set of possible
RDF graphs corresponding to an RDFi database, and
Q be a fragment of SPARQL

D, Rep, Q is a representation system if for all D ∈ D and all
q ∈ Q, there exists an RDFi database q D such that
Rep( q

D)

≡Q q

Rep(D)

Are there interesting fragments Q of SPARQL that lead to a
representation system?
C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

27/36
Representation systems for RDFi
Theorem
The following fragments of SPARQL can give us representation
systems for RDFi (with D and Rep as defined):
QC : CONSTRUCT queries using only AND, UNION,
AUF
and FILTER graph patterns, and without blank nodes in their
templates
QC : CONSTRUCT queries using only well-designed
WD
graph patterns, and without blank nodes in their templates

Well-designed graph patterns [P´rez/Arenas/Gutierrez ’06]
e
AND, FILTER, OPT fragment
P FILTER R: safe
P1 OPT P2 : variables in P2 are properly scoped
C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

28/36
Representation systems for RDFi (cont’d)
Monotonicity

Definition
A fragment Q of SPARQL is monotone if for every q ∈ Q and
RDF graphs G and H such that G ⊆ H, it is q G ⊆ q H .

Proposition [Arenas/P´rez ’11]
e
The fragment of SPARQL corresponding to AND, UNION,
and FILTER graph patterns is monotone.
The fragment of SPARQL corresponding to well-designed
graph patterns is weakly-monotone ( ).

Proposition
Fragments QC and QC are monotone.
AUF
WD
C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

29/36
Computing certain answers

Representation systems guarantee correctness of query
evaluation for RDFi and SPARQL
Query evaluation computes an RDFi database
q

D

= D = (G , φ)

How could we compute the certain answer?
Rep( q
Rep( q

D)

D)

is infinite!

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

30/36
Computing certain answers (cont’d)

Theorem
For D = (G , φ) and q from QC or QC , the certain answer of q
AUF
WD
over D can be computed as follows:
i) compute q

D

= Dq = (Gq , φ),

ii) compute the RDFi database (Hq , φ) = ((Dq )EQ )∗ , and
iii) return the set of RDF triples
{(s, p, o) | ((s, p, o), θ) ∈ Hq such that φ |= θ and o ∈ U}
/

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

31/36
The certainty problem
CERT (q, H, D)

Input
An RDF graph H, a CONSTRUCT query q, and an RDFi
database D

Question
Does H belong to the certain answer of q over D?
H⊆

C. Nikolaou and M. Koubarakis

–

q

Rep(D) ?

Incomplete Information in RDF

32/36
The certainty problem
CERT (q, H, D)

Input
An RDF graph H, a CONSTRUCT query q, and an RDFi
database D

Question
Does H belong to the certain answer of q over D?
H⊆

q

Rep(D) ?

We study the data complexity of CERT (q, H, D)
H and D are part of the input
q is fixed
C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

32/36
Deciding the certainty problem

Theorem
CERT (q, H, D) is equivalent to deciding whether formula

t∈H

(∀ l)(φ( l) ⊃ Θ(t, q, D, l))

is true
l is the vector of all e-literals in D
Θ(t, q, D, l) is of the form θ1 ∨ · · · ∨ θk , where θi is a
conjunction of L-constraints

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

33/36
Computational complexity

CERT (q, H, D)

C. Nikolaou and M. Koubarakis

L

data complexity

ECL/diPCL/dePCL/RCL

coNP-complete

TCL/PCL (RCC-5)

Problem

EXPTIME

–

Incomplete Information in RDF

34/36
Computational complexity

CERT (q, H, D)

L

data complexity

ECL/diPCL/dePCL/RCL

coNP-complete

TCL/PCL (RCC-5)

Problem

EXPTIME

Problem

combined complexity

SPARQL
SPARQLAUF
SPARQLWD

PSPACE-complete
NP-complete
coNP-complete

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

data complexity
E
AC
SP
OG

L

34/36
Conclusions

RDFi framework
Modeling of incomplete information for property values
Formal semantics through possible worlds semantics
SPARQL query evaluation and certain answer semantics
Two representation systems for RDFi and SPARQL
Algorithm for certain answer computation
Preliminary complexity analysis

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

35/36
Future work

More general models of incomplete information (subject,
predicate)
More refined complexity results
Scalable implementation when L expresses topological
constraints with/without constants (TCL/PCL)
Connection with query processing for the topology vocabulary
extension of GeoSPARQL
Probabilistic extension to RDFi
Data integration theory for linked data (only practice exists so
far)
Connection to geospatial OBDA using DL logics

C. Nikolaou and M. Koubarakis

–

Incomplete Information in RDF

36/36
Thank you
Constraint languages L

Properties of L
Many-sorted first-order language
Interpreted over a fixed (intended) structure ML
EQ: distinguished equality predicate
L-constraints: quantifier-free formulae of L

Weakly closed under negation: the negation of every atomic
L-constraint is equivalent to a disjunction of L-constraints
Correctness of SPARQL query evaluation for RDFi
An easy negative example

Example (classical RDF - OWA)
D
s p o .

q
CONSTRUCT { s ?p ?o }
WHERE { s ?p ?o }
Correctness of SPARQL query evaluation for RDFi
An easy negative example

Example (classical RDF - OWA)
D

q

s p o .

CONSTRUCT { s ?p ?o }
WHERE { s ?p ?o }

Then,
q

D

=D
Correctness of SPARQL query evaluation for RDFi (cont’d)
An easy negative example

Example
Let us compare the the set of graphs represented by q

D

with q

Rep(D)
Correctness of SPARQL query evaluation for RDFi (cont’d)
An easy negative example

Example
Let us compare the the set of graphs represented by q

Rep( q

D)

=

(s, p, o)

,

(s, p, o)
(c, d, e)

,

D

with q

(s, p, o)
(s, b, c)

Rep(D)

,···
Correctness of SPARQL query evaluation for RDFi (cont’d)
An easy negative example

Example
Let us compare the the set of graphs represented by q

Rep( q

D)

q

=

Rep(D)

(s, p, o)

=

,

(s, p, o)

(s, p, o)
(c, d, e)

,

,

(s, p, o)
(s, b, c)

D

with q

(s, p, o)
(s, b, c)

,···

Rep(D)

,···
Correctness of SPARQL query evaluation for RDFi (cont’d)
An easy negative example

Example
Let us compare the the set of graphs represented by q

Rep( q

D)

q

=

Rep(D)

(s, p, o)

=

There is no g ∈ q

(s, p, o)
(c, d, e)

,

(s, p, o)

Rep(D)

,

,

(s, p, o)
(s, b, c)

D

with q

(s, p, o)
(s, b, c)

Rep(D)

,···

,···

containing the triple (c, d, e)!
Correctness of SPARQL query evaluation for RDFi (cont’d)
An easy negative example

Example
Let us compare the the set of graphs represented by q

Rep( q

D)

q

=

Rep(D)

(s, p, o)

=

There is no g ∈ q

(s, p, o)
(c, d, e)

,

(s, p, o)

Rep(D)

,

,

(s, p, o)
(s, b, c)

D

with q

(s, p, o)
(s, b, c)

,···

,···

containing the triple (c, d, e)!

This would work if RDF made the CWA

Rep(D)
Correctness of SPARQL query evaluation for RDFi (cont’d)
An easy negative example

Example
Let us compare the the set of graphs represented by q

Rep( q

D)

q

=

Rep(D)

(s, p, o)

=

There is no g ∈ q

(s, p, o)
(c, d, e)

,

(s, p, o)

Rep(D)

,

,

(s, p, o)
(s, b, c)

D

with q

(s, p, o)
(s, b, c)

Rep(D)

,···

,···

containing the triple (c, d, e)!

This would work if RDF made the CWA
We know this already from the relational case [Imielinski/Lipski ’84]
Computing certain answers
Definitions

Definition (EQ-completion)
The EQ-completed form of D = (G , φ), denoted by
D EQ = (G EQ , φ), is taken from D by replacing all e-literals l ∈ U
appearing in G by the constant c ∈ C such that φ |= l EQ c
Computing certain answers
Definitions

Definition (EQ-completion)
The EQ-completed form of D = (G , φ), denoted by
D EQ = (G EQ , φ), is taken from D by replacing all e-literals l ∈ U
appearing in G by the constant c ∈ C such that φ |= l EQ c

Definition (Normalization)
The normalized form of D is the RDFi database D ∗ = (G ∗ , φ)
where G ∗ is the set
{(t, θ) | (t, θi ) ∈ G for all i = 1 . . . n, and θ is

G = {(t, θ1 ), (t, θ2 ), (t , θ )}

i

θi }

G ∗ = {(t, θ1 ∨ θ2 ), (t , θ )}

Incomplete Information in RDF

  • 1.
    Incomplete Information inRDF Charalampos Nikolaou and Manolis Koubarakis charnik@di.uoa.gr koubarak@di.uoa.gr Department of Informatics and Telecommunications National and Kapodistrian University of Athens Web Reasoning and Rule Systems (RR) 2013 July 27, 2013
  • 2.
    Outline Motivation Previous work The RDFiframework SPARQL query evaluation over RDFi databases An algorithm for certain answer computation Preliminary complexity results Conclusions and future work C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 2/36
  • 3.
    Motivation Incomplete information isan important issue in many research areas: relational databases, knowledge representation and the semantic web. Incomplete information arises in many practical settings (e.g., sensor data). RDF is often used to represent such data. Even if initial information is complete, incomplete information arises later on (e.g., relational view updates, data integration, data exchange). Although there is much work recently on incomplete information in XML, not much has been done for incomplete information in RDF. C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 3/36
  • 4.
    Previous work Relational Relations extendedto tables with various models of incompleteness [Imielinski/Lipski ’84] Complexity results for the associated decision problems [Abiteboul/Kanellakis/Grahne ’91] Dependencies and updates [Grahne ’91] C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 4/36
  • 5.
    Previous work Relational Relations extendedto tables with various models of incompleteness [Imielinski/Lipski ’84] Complexity results for the associated decision problems [Abiteboul/Kanellakis/Grahne ’91] Dependencies and updates [Grahne ’91] XML Dynamic enrichment of incomplete information [Abiteboul/Segoufin/Vianu ’01,’06] General models of incompleteness, query answering, and computational complexity [Barcel´/Libkin/Poggi/Sirangelo o ’09,’10] C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 4/36
  • 6.
    Previous work (cont’d) RDF Blanknodes as existential variables in the RDF standard SPARQL query evaluation under certain answer semantics (Open World Assumption) [Arenas/P´rez ’11] e Anonymous timestamps in general temporal RDF graphs [Gutierrez/Hurtado/Vaisman ’05] General temporal RDF graphs with temporal constraints [Hurtado/Vaisman ’06] C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 5/36
  • 7.
    Previous work (cont’d) RDF Blanknodes as existential variables in the RDF standard SPARQL query evaluation under certain answer semantics (Open World Assumption) [Arenas/P´rez ’11] e Anonymous timestamps in general temporal RDF graphs [Gutierrez/Hurtado/Vaisman ’05] General temporal RDF graphs with temporal constraints [Hurtado/Vaisman ’06] RDFi : It captures incomplete information for property values using constraints. It is for RDF what the c-tables model is for the relational model. C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 5/36
  • 8.
    RDFi by example Example y P 19 hotspot1 type Hotspot fire1 type Fire hotspot1correspondsTo fire1 fire1 occuredIn _R1 . . . . 8 6 C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 23 x 6/36
  • 9.
    RDFi by example Example y P 19 hotspot1 type Hotspot fire1 type Fire hotspot1correspondsTo fire1 fire1 occuredIn _R1 . . . . 8 6 23 x R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19" C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 6/36
  • 10.
    RDFi in anutshell Extension of RDF for capturing incomplete information for property values that exist but are unknown or partially known Partial knowledge captured by constraints using an appropriate constraint language L interpreted over a fixed structure ML Syntax RDF graphs extended to RDFi databases: pair (G , φ) G : RDF graph with a new kind of literals, called e-literals φ: quantifier-free formula of L Semantics Possible world semantics as in [Imielinski/Lipski ’84] and [Grahne ’91] C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 7/36
  • 11.
    Constraint languages L Examples ECL Equalityconstraints interpreted over an infinite domain: x EQ y , x EQ c Blank nodes as existential variables C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 8/36
  • 12.
    Constraint languages L Examples diPCL/dePCL ECL Equalityconstraints interpreted over an infinite domain: x EQ y , x EQ c Blank nodes as existential variables C. Nikolaou and M. Koubarakis – Difference constraints of the form x − y ≤ c interpreted over the integers or rationals Incomplete temporal information [Koubarakis ’94] Incomplete Information in RDF 8/36
  • 13.
    Constraint languages L Examples diPCL/dePCL ECL Equalityconstraints interpreted over an infinite domain: x EQ y , x EQ c Blank nodes as existential variables Difference constraints of the form x − y ≤ c interpreted over the integers or rationals Incomplete temporal information [Koubarakis ’94] TCL Topological constraints of non-empty, regular closed subsets of topological space Six binary predicates: DC, EC, PO, EQ, TPP, NTPP C. Nikolaou and M. Koubarakis – x DC y y x Incomplete Information in RDF x EQ y x y x EC y y x x TPP y y x x PO y y x x NTPP y y x 8/36
  • 14.
    Constraint languages L Examples diPCL/dePCL ECL Equalityconstraints interpreted over an infinite domain: x EQ y , x EQ c Blank nodes as existential variables TCL Difference constraints of the form x − y ≤ c interpreted over the integers or rationals Incomplete temporal information [Koubarakis ’94] PCL Topological constraints of non-empty, regular closed subsets of topological space Six binary predicates: DC, EC, PO, EQ, TPP, NTPP C. Nikolaou and M. Koubarakis – TCL plus constant symbols representing polygons in Q2 e.g., r NTPP "x − y ≥ 0 ∧ x ≤ 1 ∧ y ≥ 0" Incomplete Information in RDF 8/36
  • 15.
    RDFi : Vocabulary RDF RDFi I(IRIs) B (blank nodes) L (literals) I B L C (literals) U (e-literals) M A (datatypes) M (datatype map) C. Nikolaou and M. Koubarakis – Incomplete Information in RDF L constants variables set of sorts 9/36
  • 16.
    RDFi : Vocabulary RDF RDFi I(IRIs) B (blank nodes) L (literals) I B L C (literals) U (e-literals) M A (datatypes) M (datatype map) L constants variables set of sorts ML interprets the constants of L in agreement with function L2V of M C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 9/36
  • 17.
    RDFi : Syntax I subject I predicate B object θ IB L C U I : B: L: C: U: IRIs blank nodes literals constants of L e-literals Definition (s, p, o) ∈ (I ∪ B) ∪ I ∪ (I ∪ B ∪ L ∪ C ∪ U) is called an e-triple C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 10/36
  • 18.
    RDFi : Syntax I subject I predicate B object θ IB L C U I : B: L: C: U: IRIs blank nodes literals constants of L e-literals Definition (s, p, o) ∈ (I ∪ B) ∪ I ∪ (I ∪ B ∪ L ∪ C ∪ U) is called an e-triple If t is an e-triple and θ a conjunction of L-constraints, then the pair (t, θ) is called a conditional triple C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 10/36
  • 19.
    RDFi : Syntax I subject I predicate B object θ IB L C U I : B: L: C: U: IRIs blank nodes literals constants of L e-literals Definition (s, p, o) ∈ (I ∪ B) ∪ I ∪ (I ∪ B ∪ L ∪ C ∪ U) is called an e-triple If t is an e-triple and θ a conjunction of L-constraints, then the pair (t, θ) is called a conditional triple A set of conditional triples is called a conditional graph C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 10/36
  • 20.
    RDFi : Syntax(cont’d) Definition An RDFi database D is a pair D = (G , φ) where G is a conditional graph and φ a Boolean combination of L-constraints (global constraint) Example y hotspot1 type Hotspot fire1 type Fire hotspot1 correspondsTo fire1 fire1 occuredIn _R1 . . . . R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19" C. Nikolaou and M. Koubarakis – Incomplete Information in RDF P 19 8 6 23 11/36 x
  • 21.
    RDFi : Semantics RDFidatabase Rep D C. Nikolaou and M. Koubarakis – Incomplete Information in RDF set of RDF   G1 ,  G2 ,  .  . . graphs      12/36
  • 22.
    RDFi : Semantics RDFidatabase Rep D set of RDF   G1 ,  G2 ,  .  . . graphs      Definition A valuation v is a function from U to C assigning to each e-literal from U a constant from C Definition Let G be a conditional graph and v a valuation. Then v (G ) denotes the RDF graph {v (t) | (t, θ) ∈ G and ML |= v (θ)} C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 12/36
  • 23.
    RDFi : Semantics(cont’d) From RDFi databases to sets of RDF graphs An RDFi database D = (G , φ) corresponds to the following set of RDF graphs: Rep(D) = H | there exists valuation v and RDF graph H such that ML |= v (φ) and H ⊇ v (G ) Relation ⊇ captures the OWA semantics An RDFi database corresponds to an infinite number of RDF graphs C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 13/36
  • 24.
    Question How can weevaluate a query q over an RDFi database D (compute q D )? C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 14/36
  • 25.
    Question How can weevaluate a query q over an RDFi database D (compute q D )? Semantic definition q C. Nikolaou and M. Koubarakis Rep(D) – ={ q G | G ∈ Rep(D)} Incomplete Information in RDF 14/36
  • 26.
    Question How can weevaluate a query q over an RDFi database D (compute q D )? Semantic definition q C. Nikolaou and M. Koubarakis Rep(D) – ={ q G | G ∈ Rep(D)} Incomplete Information in RDF 14/36
  • 27.
    Question How can weevaluate a query q over an RDFi database D (compute q D )? Semantic definition q Rep(D) ={ q G | G ∈ Rep(D)} In practice? Start with SPARQL algebra of [P´rez/Arenas/Gutierrez ’06] e with set semantics Define SPARQL query evaluation for RDFi databases C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 14/36
  • 28.
    From mappings toe-mappings... {?F → fire1, ?S → ”x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2”} C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 15/36
  • 29.
    From mappings toe-mappings... {?F → fire1, ?S → ”x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2”} {?F → fire1, ?S → R1} C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 15/36
  • 30.
    ... to conditionalmappings {?F → fire1, ?S →”x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2”} C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 16/36
  • 31.
    ... to conditionalmappings {?F → fire1, ?S →”x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2”}, true C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 16/36
  • 32.
    ... to conditionalmappings {?F → fire1, ?S → R1}, R1 EQ ”x ≥ 1∧x ≤ 2∧y ≥ 1∧y ≤ 2” C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 16/36
  • 33.
    From compatible mappingsto possibly compatible mappings Join of conditional mappings {?F → fire1, ?S → R1}, R1 EQ ”x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2” { ?S → R2}, true C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 17/36
  • 34.
    From compatible mappingsto possibly compatible mappings Join of conditional mappings {?F → fire1, ?S → R1}, R1 EQ ”x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2” { ?S → R2}, true C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 17/36
  • 35.
    From compatible mappingsto possibly compatible mappings Join of conditional mappings {?F → fire1, ?S → R1}, R1 EQ ”x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2” { ?S → R2}, true = C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 17/36
  • 36.
    From compatible mappingsto possibly compatible mappings Join of conditional mappings {?F → fire1, ?S → R1}, R1 EQ ”x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2” { ?S → R2}, true = {?F → fire1, ?S → R1}, true ∧ R1 EQ R2 ∧ R1 EQ ”x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2” C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 17/36
  • 37.
    Operations on conditionalmappings Let Ω1 and Ω2 be sets of conditional mappings. We can define the operation of: Join (Ω1 Ω2 ) Union (Ω1 ∪ Ω2 ) Difference (Ω1 Ω2 ) Left-outer join (Ω1 C. Nikolaou and M. Koubarakis – 1Ω ) 2 Incomplete Information in RDF 18/36
  • 38.
    Graph pattern evaluation IfD is an RDFi database and P a graph pattern, the evaluation of P over D is defined recursively: C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 19/36
  • 39.
    Graph pattern evaluation IfD is an RDFi database and P a graph pattern, the evaluation of P over D is defined recursively: base case: P is the triple pattern t recursion: P is (P1 AND P2 ) → P1 D P2 D P is (P1 UNION P2 ) → P1 D ∪ P2 D P is (P1 OPT P2 ) → P1 D P2 D 1 C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 19/36
  • 40.
    Graph pattern evaluation IfD is an RDFi database and P a graph pattern, the evaluation of P over D is defined recursively: base case: P is the triple pattern t recursion: P is (P1 AND P2 ) → P1 D P2 D P is (P1 UNION P2 ) → P1 D ∪ P2 D P is (P1 OPT P2 ) → P1 D P2 D P is (P1 FILTER R) where R is a conjunction of L-constraints 1 C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 19/36
  • 41.
    Graph pattern evaluation IfD is an RDFi database and P a graph pattern, the evaluation of P over D is defined recursively: base case: P is the triple pattern t recursion: P is (P1 AND P2 ) → P1 D P2 D P is (P1 UNION P2 ) → P1 D ∪ P2 D P is (P1 OPT P2 ) → P1 D P2 D P is (P1 FILTER R) where R is a conjunction of L-constraints 1 C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 19/36
  • 42.
    Triple pattern evaluation(case 1) Example Database D Query q fire1 occuredIn _R1 . ?F occuredIn ?R R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19" C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 20/36
  • 43.
    Triple pattern evaluation(case 1) Example Database D Query q fire1 occuredIn _R1 . ?F occuredIn ?R R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19" Answer (set of conditional mappings) q D {?F → fire1, ?R → R1}, true = C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 20/36
  • 44.
    Triple pattern evaluation(case 2) Example Database D Query q fire1 occuredIn _R1 . ?F occuredIn "x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2" R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19" C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 21/36
  • 45.
    Triple pattern evaluation(case 2) Example Database D Query q fire1 occuredIn _R1 . ?F occuredIn "x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2" R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19" Answer (set of conditional mappings) q D = {?F → fire1}, R1 EQ "x ≥ 1∧x ≤ 2∧y ≥ 1∧y ≤ 2" C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 21/36
  • 46.
    Evaluation of FILTERgraph patterns Example Database D Query q fire1 occuredIn _R1 . ?F occuredIn ?R . FILTER (?R NTPP "x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2") R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19" C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 22/36
  • 47.
    Evaluation of FILTERgraph patterns Example Database D Query q fire1 occuredIn _R1 . ?F occuredIn ?R . FILTER (?R NTPP "x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2") R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19" Answer q D = {?F → fire1, ?R → R1}, R1 NTPP "x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2" C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 22/36
  • 48.
    SELECT queries Example Database D Queryq SELECT ?F WHERE { ?F occuredIn ?R . R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19" FILTER (?R NTPP "x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2")} fire1 occuredIn _R1 . C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 23/36
  • 49.
    SELECT queries Example Database D Queryq SELECT ?F WHERE { ?F occuredIn ?R . R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19" FILTER (?R NTPP "x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2")} fire1 occuredIn _R1 . Answer (set of conditional mappings) q D = {?F → fire1}, R1 NTPP "x ≥ 1 ∧ x ≤ 2 ∧ y ≥ 1 ∧ y ≤ 2" C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 23/36
  • 50.
    CONSTRUCT queries Example Database D Queryq fire1 occuredIn _R1 . CONSTRUCT { ?F type Fire } WHERE { R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19" ?F occuredIn ?R } C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 24/36
  • 51.
    CONSTRUCT queries Example Database D Queryq fire1 occuredIn _R1 . CONSTRUCT { ?F type Fire } WHERE { R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19" ?F occuredIn ?R } Answer (RDFi database) fire1 type Fire . D = (G , φ) R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19" C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 24/36
  • 52.
    CONSTRUCT queries Example Database D Queryq fire1 occuredIn _R1 . CONSTRUCT { ?F type Fire } WHERE { R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19" ?F occuredIn ?R } Answer (RDFi database) fire1 type Fire . D = (G , φ) R1 NTPP "x ≥ 6 ∧ x ≤ 23 ∧ y ≥ 8 ∧ y ≤ 19" Closure property C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 24/36
  • 53.
    Correctness of SPARQLquery evaluation for RDFi Does query evaluation compute the correct answer (the answer agrees with the semantic definition)? C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 25/36
  • 54.
    Correctness of SPARQLquery evaluation for RDFi Does query evaluation compute the correct answer (the answer agrees with the semantic definition)? The following diagram should commute D Rep G q q D G Rep C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 25/36
  • 55.
    Correctness of SPARQLquery evaluation for RDFi Does query evaluation compute the correct answer (the answer agrees with the semantic definition)? The following diagram should commute D Rep G q q D G Rep C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 25/36
  • 56.
    Correctness of SPARQLquery evaluation for RDFi Does query evaluation compute the correct answer (the answer agrees with the semantic definition)? The following diagram should commute D Rep G q q D G Rep C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 25/36
  • 57.
    Correctness of SPARQLquery evaluation for RDFi Does query evaluation compute the correct answer (the answer agrees with the semantic definition)? The following diagram should commute D Rep G q q D Rep( q D) = q Rep(D) G Rep C. Nikolaou and M. Koubarakis OR – Incomplete Information in RDF 25/36
  • 58.
    Correctness of SPARQLquery evaluation for RDFi Does query evaluation compute the correct answer (the answer agrees with the semantic definition)? The following diagram should commute. Does it? D Rep G q q D Rep( q D) = q Rep(D) G Rep C. Nikolaou and M. Koubarakis OR – Incomplete Information in RDF 25/36
  • 59.
    Correctness of SPARQLquery evaluation for RDFi Does query evaluation compute the correct answer (the answer agrees with the semantic definition)? The following diagram should commute. Does it? D Rep G q q D Rep( q D) = q Rep(D) G Rep C. Nikolaou and M. Koubarakis OR – Incomplete Information in RDF 25/36
  • 60.
    Certain answer tothe rescue Definition The certain answer to query q over a set of RDF graphs G is set { q C. Nikolaou and M. Koubarakis – G | G ∈ G} Incomplete Information in RDF 26/36
  • 61.
    Certain answer tothe rescue Definition The certain answer to query q over a set of RDF graphs G is set { q G | G ∈ G} Using the notion of certain answer we can relax the earlier equality requirement to one that uses Q-equivalence. C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 26/36
  • 62.
    Certain answer tothe rescue Definition The certain answer to query q over a set of RDF graphs G is set { q G | G ∈ G} Using the notion of certain answer we can relax the earlier equality requirement to one that uses Q-equivalence. Definition Let Q be a fragment of SPARQL. Two sets of RDF graphs G, H will be Q-equivalent (denoted by G ≡Q H) if they give the same certain answer to every query q ∈ Q { q C. Nikolaou and M. Koubarakis G – | G ∈ G} = { q Incomplete Information in RDF H | H ∈ H} 26/36
  • 63.
    Representation system Let D bethe set of all RDFi databases G be the set of all RDF graphs Rep : D → G be a function determining the set of possible RDF graphs corresponding to an RDFi database, and Q be a fragment of SPARQL D, Rep, Q is a representation system if for all D ∈ D and all q ∈ Q, there exists an RDFi database q D such that Rep( q C. Nikolaou and M. Koubarakis – D) ≡Q q Rep(D) Incomplete Information in RDF 27/36
  • 64.
    Representation system Let D bethe set of all RDFi databases G be the set of all RDF graphs Rep : D → G be a function determining the set of possible RDF graphs corresponding to an RDFi database, and Q be a fragment of SPARQL D, Rep, Q is a representation system if for all D ∈ D and all q ∈ Q, there exists an RDFi database q D such that Rep( q D) ≡Q q Rep(D) Are there interesting fragments Q of SPARQL that lead to a representation system? C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 27/36
  • 65.
    Representation systems forRDFi Theorem The following fragments of SPARQL can give us representation systems for RDFi (with D and Rep as defined): QC : CONSTRUCT queries using only AND, UNION, AUF and FILTER graph patterns, and without blank nodes in their templates QC : CONSTRUCT queries using only well-designed WD graph patterns, and without blank nodes in their templates Well-designed graph patterns [P´rez/Arenas/Gutierrez ’06] e AND, FILTER, OPT fragment P FILTER R: safe P1 OPT P2 : variables in P2 are properly scoped C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 28/36
  • 66.
    Representation systems forRDFi (cont’d) Monotonicity Definition A fragment Q of SPARQL is monotone if for every q ∈ Q and RDF graphs G and H such that G ⊆ H, it is q G ⊆ q H . Proposition [Arenas/P´rez ’11] e The fragment of SPARQL corresponding to AND, UNION, and FILTER graph patterns is monotone. The fragment of SPARQL corresponding to well-designed graph patterns is weakly-monotone ( ). Proposition Fragments QC and QC are monotone. AUF WD C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 29/36
  • 67.
    Computing certain answers Representationsystems guarantee correctness of query evaluation for RDFi and SPARQL Query evaluation computes an RDFi database q D = D = (G , φ) How could we compute the certain answer? Rep( q Rep( q D) D) is infinite! C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 30/36
  • 68.
    Computing certain answers(cont’d) Theorem For D = (G , φ) and q from QC or QC , the certain answer of q AUF WD over D can be computed as follows: i) compute q D = Dq = (Gq , φ), ii) compute the RDFi database (Hq , φ) = ((Dq )EQ )∗ , and iii) return the set of RDF triples {(s, p, o) | ((s, p, o), θ) ∈ Hq such that φ |= θ and o ∈ U} / C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 31/36
  • 69.
    The certainty problem CERT(q, H, D) Input An RDF graph H, a CONSTRUCT query q, and an RDFi database D Question Does H belong to the certain answer of q over D? H⊆ C. Nikolaou and M. Koubarakis – q Rep(D) ? Incomplete Information in RDF 32/36
  • 70.
    The certainty problem CERT(q, H, D) Input An RDF graph H, a CONSTRUCT query q, and an RDFi database D Question Does H belong to the certain answer of q over D? H⊆ q Rep(D) ? We study the data complexity of CERT (q, H, D) H and D are part of the input q is fixed C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 32/36
  • 71.
    Deciding the certaintyproblem Theorem CERT (q, H, D) is equivalent to deciding whether formula t∈H (∀ l)(φ( l) ⊃ Θ(t, q, D, l)) is true l is the vector of all e-literals in D Θ(t, q, D, l) is of the form θ1 ∨ · · · ∨ θk , where θi is a conjunction of L-constraints C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 33/36
  • 72.
    Computational complexity CERT (q,H, D) C. Nikolaou and M. Koubarakis L data complexity ECL/diPCL/dePCL/RCL coNP-complete TCL/PCL (RCC-5) Problem EXPTIME – Incomplete Information in RDF 34/36
  • 73.
    Computational complexity CERT (q,H, D) L data complexity ECL/diPCL/dePCL/RCL coNP-complete TCL/PCL (RCC-5) Problem EXPTIME Problem combined complexity SPARQL SPARQLAUF SPARQLWD PSPACE-complete NP-complete coNP-complete C. Nikolaou and M. Koubarakis – Incomplete Information in RDF data complexity E AC SP OG L 34/36
  • 74.
    Conclusions RDFi framework Modeling ofincomplete information for property values Formal semantics through possible worlds semantics SPARQL query evaluation and certain answer semantics Two representation systems for RDFi and SPARQL Algorithm for certain answer computation Preliminary complexity analysis C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 35/36
  • 75.
    Future work More generalmodels of incomplete information (subject, predicate) More refined complexity results Scalable implementation when L expresses topological constraints with/without constants (TCL/PCL) Connection with query processing for the topology vocabulary extension of GeoSPARQL Probabilistic extension to RDFi Data integration theory for linked data (only practice exists so far) Connection to geospatial OBDA using DL logics C. Nikolaou and M. Koubarakis – Incomplete Information in RDF 36/36
  • 76.
  • 77.
    Constraint languages L Propertiesof L Many-sorted first-order language Interpreted over a fixed (intended) structure ML EQ: distinguished equality predicate L-constraints: quantifier-free formulae of L Weakly closed under negation: the negation of every atomic L-constraint is equivalent to a disjunction of L-constraints
  • 78.
    Correctness of SPARQLquery evaluation for RDFi An easy negative example Example (classical RDF - OWA) D s p o . q CONSTRUCT { s ?p ?o } WHERE { s ?p ?o }
  • 79.
    Correctness of SPARQLquery evaluation for RDFi An easy negative example Example (classical RDF - OWA) D q s p o . CONSTRUCT { s ?p ?o } WHERE { s ?p ?o } Then, q D =D
  • 80.
    Correctness of SPARQLquery evaluation for RDFi (cont’d) An easy negative example Example Let us compare the the set of graphs represented by q D with q Rep(D)
  • 81.
    Correctness of SPARQLquery evaluation for RDFi (cont’d) An easy negative example Example Let us compare the the set of graphs represented by q Rep( q D) = (s, p, o) , (s, p, o) (c, d, e) , D with q (s, p, o) (s, b, c) Rep(D) ,···
  • 82.
    Correctness of SPARQLquery evaluation for RDFi (cont’d) An easy negative example Example Let us compare the the set of graphs represented by q Rep( q D) q = Rep(D) (s, p, o) = , (s, p, o) (s, p, o) (c, d, e) , , (s, p, o) (s, b, c) D with q (s, p, o) (s, b, c) ,··· Rep(D) ,···
  • 83.
    Correctness of SPARQLquery evaluation for RDFi (cont’d) An easy negative example Example Let us compare the the set of graphs represented by q Rep( q D) q = Rep(D) (s, p, o) = There is no g ∈ q (s, p, o) (c, d, e) , (s, p, o) Rep(D) , , (s, p, o) (s, b, c) D with q (s, p, o) (s, b, c) Rep(D) ,··· ,··· containing the triple (c, d, e)!
  • 84.
    Correctness of SPARQLquery evaluation for RDFi (cont’d) An easy negative example Example Let us compare the the set of graphs represented by q Rep( q D) q = Rep(D) (s, p, o) = There is no g ∈ q (s, p, o) (c, d, e) , (s, p, o) Rep(D) , , (s, p, o) (s, b, c) D with q (s, p, o) (s, b, c) ,··· ,··· containing the triple (c, d, e)! This would work if RDF made the CWA Rep(D)
  • 85.
    Correctness of SPARQLquery evaluation for RDFi (cont’d) An easy negative example Example Let us compare the the set of graphs represented by q Rep( q D) q = Rep(D) (s, p, o) = There is no g ∈ q (s, p, o) (c, d, e) , (s, p, o) Rep(D) , , (s, p, o) (s, b, c) D with q (s, p, o) (s, b, c) Rep(D) ,··· ,··· containing the triple (c, d, e)! This would work if RDF made the CWA We know this already from the relational case [Imielinski/Lipski ’84]
  • 86.
    Computing certain answers Definitions Definition(EQ-completion) The EQ-completed form of D = (G , φ), denoted by D EQ = (G EQ , φ), is taken from D by replacing all e-literals l ∈ U appearing in G by the constant c ∈ C such that φ |= l EQ c
  • 87.
    Computing certain answers Definitions Definition(EQ-completion) The EQ-completed form of D = (G , φ), denoted by D EQ = (G EQ , φ), is taken from D by replacing all e-literals l ∈ U appearing in G by the constant c ∈ C such that φ |= l EQ c Definition (Normalization) The normalized form of D is the RDFi database D ∗ = (G ∗ , φ) where G ∗ is the set {(t, θ) | (t, θi ) ∈ G for all i = 1 . . . n, and θ is G = {(t, θ1 ), (t, θ2 ), (t , θ )} i θi } G ∗ = {(t, θ1 ∨ θ2 ), (t , θ )}