Data Exchange over RDF

Andrés Letelier
Advisor: Marcelo Arenas

Pontificia Universidad Católica de Chile

September 1, 2011

What is data exchange?

Problem
Data under one schema S needs to be restructured and translated
into a target schema T

S → T
I_S → I_T

Schema mappings

Question
Which source instances correspond to which target instances?

Answer
Schema mappings:

M ⊆ Instances(S) × Instances(T)

Usually, schema mappings are defined as M = (S, T, Σ_ST)

Definition (Solution)
I2 is a solution of I1 under M iff (I1, I2) ∈ M
The set of all solutions for I1 under M is denoted by Sol_M(I1)

Resource Description Framework (RDF)

Data model for representing information about World Wide
Web resources
W3C Recommendation (1999, revised 2004)
Part of the semantic web stack
Directed, labeled graphs
Blank nodes (labeled nulls)
Basically, sets of triples (s, p, o)

Example
D= {
(B1 name paul)
(B1 email paul@example.edu)
(B2 name john)
(B2 city Liverpool)
}

SPARQL (pronounced “sparkle”)

Query language for RDF
W3C Recommendation (2008)
Standard for querying RDF datasets
Returns sets of partial mappings
Operators:
Projection
AND (inner join)
OPT (left join)
FILTER
UNION
and more

Example

P1 = (?X, name, ?Y)

⟦P1⟧_D =
?X   ?Y
B1   paul
B2   john

Example

P2 = (?X, name, ?Y) AND (?X, email, ?Z)

⟦P2⟧_D =
?X   ?Y     ?Z
B1   paul   paul@example.edu

Example

P3 = (?X, name, ?Y) OPT (?X, email, ?Z)

⟦P3⟧_D =
?X   ?Y     ?Z
B1   paul   paul@example.edu
B2   john
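The three evaluations above can be reproduced with a small evaluator. This is only a sketch for triple patterns, AND and OPT under set semantics; the tuple encoding of patterns and the names `evaluate`, `eval_triple`, `compatible` and `as_set` are illustrative assumptions, not part of SPARQL or of any library.

```python
# Dataset D from the RDF example; blank nodes B1 and B2 are plain strings here.
D = {
    ("B1", "name", "paul"),
    ("B1", "email", "paul@example.edu"),
    ("B2", "name", "john"),
    ("B2", "city", "Liverpool"),
}

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def compatible(m1, m2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())

def eval_triple(pattern, graph):
    """All mappings that send the triple pattern onto a triple of the graph."""
    results = []
    for triple in graph:
        mapping = {}
        for p_term, g_term in zip(pattern, triple):
            if is_var(p_term):
                # A repeated variable must be bound to the same value.
                if mapping.setdefault(p_term, g_term) != g_term:
                    break
            elif p_term != g_term:
                break
        else:
            results.append(mapping)
    return results

def evaluate(pattern, graph):
    """Pattern is a triple, ("AND", P1, P2) or ("OPT", P1, P2)."""
    if pattern[0] in ("AND", "OPT") and isinstance(pattern[1], tuple):
        op, p1, p2 = pattern
        left, right = evaluate(p1, graph), evaluate(p2, graph)
        joined = [{**a, **b} for a in left for b in right if compatible(a, b)]
        if op == "AND":
            return joined
        # OPT also keeps left mappings with no compatible partner on the right.
        unmatched = [a for a in left
                     if not any(compatible(a, b) for b in right)]
        return joined + unmatched
    return eval_triple(pattern, graph)

def as_set(mappings):
    """Order-independent view of a list of mappings."""
    return {frozenset(m.items()) for m in mappings}

P1 = ("?X", "name", "?Y")
P2 = ("AND", P1, ("?X", "email", "?Z"))
P3 = ("OPT", P1, ("?X", "email", "?Z"))

for name, pattern in [("P1", P1), ("P2", P2), ("P3", P3)]:
    print(name, evaluate(pattern, D))
```

The mapping for B2 in the evaluation of P3 has no binding for ?Z, which is why the results are sets of partial mappings.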

Well-designed SPARQL patterns

Definition (Well-designed patterns)
A pattern P is well designed if for every subpattern P′ of the form
(P1 OPT P2), every variable that appears in P2 and outside P′ also
appears in P1.

Example
(?X, name, ?Y ) OPT ((?X, email, ?Z) OPT (?X, city, ?A))
is well-designed
(?X, name, ?Y ) OPT ((?W, email, ?Z) OPT (?X, city, ?A))
is not
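The definition can be checked mechanically. The sketch below covers only patterns built from triple patterns, AND and OPT; the tuple encoding and the names `variables` and `is_well_designed` are assumptions made for illustration.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def is_op(pattern):
    """True for ("AND", P1, P2) and ("OPT", P1, P2), False for a triple."""
    return pattern[0] in ("AND", "OPT") and isinstance(pattern[1], tuple)

def variables(pattern):
    if is_op(pattern):
        return variables(pattern[1]) | variables(pattern[2])
    return {term for term in pattern if is_var(term)}

def is_well_designed(pattern):
    """Check the OPT condition of the definition on every subpattern."""
    def check(sub, outside):
        # `outside` holds the variables that occur in `pattern` outside `sub`.
        if not is_op(sub):
            return True
        op, p1, p2 = sub
        if op == "OPT" and (variables(p2) & outside) - variables(p1):
            return False
        return (check(p1, outside | variables(p2))
                and check(p2, outside | variables(p1)))
    return check(pattern, set())

good = ("OPT", ("?X", "name", "?Y"),
        ("OPT", ("?X", "email", "?Z"), ("?X", "city", "?A")))
bad = ("OPT", ("?X", "name", "?Y"),
       ("OPT", ("?W", "email", "?Z"), ("?X", "city", "?A")))

print(is_well_designed(good))
print(is_well_designed(bad))
```

In the second pattern, ?X occurs in (?X, city, ?A) and outside the inner OPT, but not in (?W, email, ?Z), so the check fails.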

Data Exchange over RDF

S and T are fixed to be RDF triples
Tuple generating dependencies have to be redefined
But first, we need some definitions...

RDF Tuple Generating Dependencies

Let P be a SPARQL pattern, µ1 and µ2 be partial mappings, and
Ω1 and Ω2 be sets of mappings. Then:
var(P ) are the variables mentioned in P
dom(µ1 ) is the domain of µ1
A SPARQL SELECT query (denoted by (W, P ), where
W ⊆ var(P )) is the projection of the evaluation of P onto
the variables in W


µ1 is subsumed by µ2 (µ1 ⊑ µ2) if dom(µ1) ⊆ dom(µ2), for
every ?X in dom(µ1) that is not bound to a blank node we
have that µ1(?X) = µ2(?X), and for every pair of variables
?X and ?Y in dom(µ1) such that µ1(?X) = µ1(?Y) it is the
case that µ2(?X) = µ2(?Y).
Ω1 is subsumed by Ω2 (Ω1 ⊑ Ω2) if for every mapping µ1 in
Ω1 there exists a mapping µ2 in Ω2 such that µ1 ⊑ µ2.
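Both subsumption tests translate directly into code. A minimal sketch, assuming mappings are Python dicts and blank nodes are strings prefixed with `_:` (a convention chosen here for illustration); the function names are hypothetical.

```python
def is_blank(value):
    # Assumption: blank nodes are written as strings starting with "_:".
    return isinstance(value, str) and value.startswith("_:")

def subsumed(mu1, mu2):
    """Test whether mapping mu1 is subsumed by mapping mu2."""
    if not mu1.keys() <= mu2.keys():
        return False
    # Bindings to non-blank values must be preserved exactly.
    for var, value in mu1.items():
        if not is_blank(value) and mu2[var] != value:
            return False
    # Variables that share a value in mu1 must share a value in mu2.
    for x in mu1:
        for y in mu1:
            if mu1[x] == mu1[y] and mu2[x] != mu2[y]:
                return False
    return True

def set_subsumed(omega1, omega2):
    """Every mapping of omega1 is subsumed by some mapping of omega2."""
    return all(any(subsumed(m1, m2) for m2 in omega2) for m1 in omega1)

print(subsumed({"?X": "1"}, {"?X": "1", "?Y": "2"}))
print(subsumed({"?X": "_:b", "?Y": "_:b"}, {"?X": "1", "?Y": "2"}))
```

The second call fails the last condition: ?X and ?Y share a blank node in the first mapping but are bound to different values in the second.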


(Re)Definition (Tuple Generating Dependencies)
Let P1 and P2 be SPARQL patterns, and W ⊂ var(P1) ∩ var(P2).
An RDF tgd is a sentence of the form

(W, P1) → (W, P2)

Given two RDF graphs G1 and G2, and a set of tgds Σ,
(G1, G2) |= Σ if for every tgd (W, P1) → (W, P2) in Σ it is the
case that ⟦(W, P1)⟧_G1 ⊑ ⟦(W, P2)⟧_G2

RDF Schema Mappings

Since S and T are fixed,

M = Σ

G2 ∈ Sol_M(G1) ⟷ (G1, G2) |= Σ

Universal solutions

Example
Let W = {?Y, ?Z}, Σ =
{(W, (?X, name, ?Y) AND (?X, email, ?Z)) →
(W, (?Y, hasmail, ?Z))}
and consider the dataset D from the RDF example:

Solution 1
G2 = {
(paul hasmail paul@example.edu)
}

Solution 2
G2 = {
(paul hasmail paul@example.edu)
(john hasmail n)
}

Universal solutions

Definition
A solution G2 is universal if for every other solution G2′, G2 ⊑ G2′

Solution 1 is universal
Solution 2 is not

Universal solutions

Not all settings have universal solutions:
Consider G1 = {(1, 2, 3)}, W = {?X, ?Y } and

Σ = {(W, (?X, ?Y, ?Z)) →
(W, ((?X, a, b) OPT (?W, b, ?Y ))
AND ((?X, c, d) OPT (?Z, d, ?Y )))}

Solution 1
G2 = {
(1 a b)
( n1 b 2)
(1 c d)
}

Solution 2
G2 = {
(1 a b)
( n2 d 2)
(1 c d)
}
This setting has no universal solution!

Good and bad news

Bad news
There is no guarantee that an exchange setting that has a solution
will have a universal solution

Good news
If the heads of all tgds in Σ are well-designed and there is a
solution, there is always a universal solution

Better news
We have an algorithm

“Chasing” SPARQL queries

input: A mapping µ and a (well-designed) SPARQL pattern P
output: An RDF graph G such that µ ∈ ⟦P⟧_G

Chase(µ, ν, P, G), by cases on P:
  t (a triple pattern):
    add unbound variables in t as fresh blank nodes to ν
    add ν(t) to G
  P1 AND P2:
    Chase(µ, ν, P1, G)
    Chase(µ, ν, P2, G)
  P1 OPT P2:
    Chase(µ, ν, P1, G)
    if (dom(µ) \ dom(ν)) ∩ var(P2) ≠ ∅: Chase(µ, ν, P2, G)

After chasing:

µ ⊑ ν
ν ∈ ⟦P⟧_G
{µ} ⊑ ⟦P⟧_G
If we chase every mapping in ⟦(W, P1)⟧_G1 with the corresponding
head P2 in Heads(Σ), we get a universal solution.
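The chase can be sketched in Python as follows. This is one reading of the procedure, not a definitive implementation. It assumes that ν starts empty, that a variable of a triple pattern takes its value from µ when µ binds it and a fresh blank node otherwise, that both sides of an AND are chased, and that the optional side of an OPT is chased only when µ binds a variable of P2 that ν does not yet cover. The tuple encoding, the `_:n1` naming of blank nodes and the helper `chase_pattern` are illustrative.

```python
import itertools

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def is_op(pattern):
    return pattern[0] in ("AND", "OPT") and isinstance(pattern[1], tuple)

def variables(pattern):
    if is_op(pattern):
        return variables(pattern[1]) | variables(pattern[2])
    return {term for term in pattern if is_var(term)}

def chase(mu, nu, pattern, graph, fresh):
    """Extend nu and graph in place so that the pattern is matched."""
    if is_op(pattern):
        op, p1, p2 = pattern
        chase(mu, nu, p1, graph, fresh)
        # AND: always chase P2. OPT: only if mu still binds a variable of P2
        # that nu does not cover yet.
        if op == "AND" or (mu.keys() - nu.keys()) & variables(p2):
            chase(mu, nu, p2, graph, fresh)
        return
    # Triple pattern: bind every new variable, then add the instantiated triple.
    for term in pattern:
        if is_var(term) and term not in nu:
            nu[term] = mu[term] if term in mu else next(fresh)
    graph.add(tuple(nu.get(term, term) for term in pattern))

def chase_pattern(mu, pattern):
    """Hypothetical wrapper: chase one mapping against one pattern."""
    nu, graph = {}, set()
    fresh = (f"_:n{i}" for i in itertools.count(1))
    chase(mu, nu, pattern, graph, fresh)
    return graph, nu

head = ("OPT", ("?X", "name", "?Y"),
        ("OPT", ("?X", "email", "?Z"), ("?X", "city", "?A")))
graph, nu = chase_pattern({"?Y": "paul", "?Z": "paul@example.edu"}, head)
print(graph)
print(nu)
```

Here µ leaves ?X unbound, so the chase invents one blank node and reuses it in both triples. The city triple is not added because µ binds no variable that occurs only there.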

Certain answers

Definition (Certain answers on a regular data exchange setting)
The set of certain answers is the intersection of the evaluation of
the query over all the valid solutions

Example
Consider G1 = {(1, 2, 3)} and

Σ = {({?X}, (?X, ?Y, ?Z)) →
     ({?X}, (?X, 1, 2) OPT (?X, ?Y, 3))}

Solution 1
G2 = {
(1 1 2)
}
⟦(W, P2)⟧_G2 = {{?X → 1}}

Solution 2
G2′ = {
(1 1 2)
(1 2 3)
}
⟦(W, P2)⟧_G2′ = {{?X → 1, ?Y → 2}}

The intersection of ⟦(W, P2)⟧_G2 and ⟦(W, P2)⟧_G2′ is empty!

Certain answers

Given a pattern P and a set of RDF graphs 𝒢, let Lower(P, 𝒢) be
the set of all lower bounds of {⟦P⟧_G | G ∈ 𝒢} w.r.t. subsumption.
(Re)Definition (Certain Answers)
The set of certain answers of a set of RDF graphs 𝒢 and a SPARQL
pattern P is defined as any set of mappings Ω in Lower(P, 𝒢) such that
for any other Ω′ in Lower(P, 𝒢) it is the case that Ω′ ⊑ Ω.

Claim
All the possible sets of certain answers to an RDF data exchange
setting are homomorphically equivalent.

Back in our previous example...

Solution 1
G2 = {
(1 1 2)
}
⟦(W, P2)⟧_G2 = {{?X → 1}}

Solution 2
G2′ = {
(1 1 2)
(1 2 3)
}
⟦(W, P2)⟧_G2′ = {{?X → 1, ?Y → 2}}

The set of certain answers is now {{?X → 1}}
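The difference between the two definitions can be checked on this example. The sketch takes the two answer sets exactly as given above, restricts subsumption to mappings without blank nodes (an assumption that holds here), and uses hypothetical names throughout. It verifies only that {{?X → 1}} is a lower bound of both answer sets, not that it is the greatest one.

```python
def subsumed(mu1, mu2):
    """Subsumption for mappings without blank nodes: mu2 extends mu1."""
    return all(var in mu2 and mu2[var] == val for var, val in mu1.items())

def is_lower_bound(omega, answer_sets):
    """omega is subsumed by every answer set in the collection."""
    return all(
        all(any(subsumed(m, m2) for m2 in answers) for m in omega)
        for answers in answer_sets
    )

answers_1 = [{"?X": "1"}]             # evaluation over Solution 1
answers_2 = [{"?X": "1", "?Y": "2"}]  # evaluation over Solution 2

# The first definition: plain intersection of the two answer sets.
plain_intersection = [m for m in answers_1 if m in answers_2]

# The redefinition: a candidate must be a lower bound w.r.t. subsumption.
candidate = [{"?X": "1"}]

print(plain_intersection)
print(is_lower_bound(candidate, [answers_1, answers_2]))
print(is_lower_bound(answers_2, [answers_1, answers_2]))
```

The larger mapping {?X → 1, ?Y → 2} is not a lower bound, because nothing in the answers over Solution 1 binds ?Y.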

In conclusion...

Our contributions so far:
RDF and SPARQL TGDs
RDF Schema mappings
Universal solutions
Materialization of universal solutions
Certain answers

In conclusion...

To do:
Prove remaining claims
Query answering (using universal solutions)
Incomplete information in the source instance
Knowledge exchange over RDFs

Thank you for listening

Any questions?
