# Data Exchange over RDF

### Transcript

• 1. Data Exchange over RDF. Andrés Letelier. Advisor: Marcelo Arenas. Pontificia Universidad Católica de Chile. September 1, 2011
• 2. What is data exchange? Problem: Data under one schema S needs to be restructured and translated into a target schema T. S → T, I_S → I_T
• 3. Schema mappings. Question: Which source instances correspond to which target instances? Answer: Schema mappings, M ⊆ Instances(S) × Instances(T). Usually, schema mappings are defined as M = (S, T, Σ_ST).
• 4. Definition (Solution): I2 is a solution of I1 under M iff (I1, I2) ∈ M. The set of all solutions for I1 under M is denoted by Sol_M(I1).
• 5. Resource Description Framework (RDF): Data model for representing information about World Wide Web resources. W3C Recommendation (1999). Part of the semantic web stack. Directed, labeled graphs. Blank nodes (labeled nulls). Basically, sets of triples (s, p, o).
• 6. Example: D = { (B1, name, paul), (B1, email, paul@example.edu), (B2, name, john), (B2, city, Liverpool) }
• 7. SPARQL (pronounced “sparkle”): Query language for RDF. W3C Recommendation (2008). Standard for querying RDF datasets. Returns sets of partial mappings. Operators: projection, AND (inner join), OPT (left join), FILTER, UNION, and more.
• 8. Example: P1 = (?X, name, ?Y). ⟦P1⟧_D = { {?X → B1, ?Y → paul}, {?X → B2, ?Y → john} }
• 9. Example: P2 = (?X, name, ?Y) AND (?X, email, ?Z). ⟦P2⟧_D = { {?X → B1, ?Y → paul, ?Z → paul@example.edu} }
• 10. Example: P3 = (?X, name, ?Y) OPT (?X, email, ?Z). ⟦P3⟧_D = { {?X → B1, ?Y → paul, ?Z → paul@example.edu}, {?X → B2, ?Y → john} }
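The three examples on slides 8 to 10 can be reproduced with a small evaluator. This is only a minimal sketch: the function names (`match`, `eval_triple`, `join`, `left_join`) are hypothetical, mappings are plain dictionaries, and the blank nodes B1 and B2 are treated as ordinary constants.

```python
# Dataset D from slide 6, as a set of (subject, predicate, object) triples.
D = {
    ("B1", "name", "paul"),
    ("B1", "email", "paul@example.edu"),
    ("B2", "name", "john"),
    ("B2", "city", "Liverpool"),
}

def match(pattern, triple):
    """Return the mapping that sends `pattern` onto `triple`, or None."""
    mu = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must be bound consistently within the pattern.
            if mu.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return mu

def eval_triple(pattern, graph):
    """Evaluate a single triple pattern over a graph."""
    results = []
    for triple in sorted(graph):  # sorted only to make the output order stable
        mu = match(pattern, triple)
        if mu is not None:
            results.append(mu)
    return results

def compatible(m1, m2):
    """Two mappings are compatible if they agree on their shared variables."""
    return all(m2[var] == val for var, val in m1.items() if var in m2)

def join(omega1, omega2):
    """AND: merge every pair of compatible mappings."""
    return [{**m1, **m2} for m1 in omega1 for m2 in omega2 if compatible(m1, m2)]

def left_join(omega1, omega2):
    """OPT: the join, plus the left mappings that have no compatible partner."""
    unmatched = [m1 for m1 in omega1
                 if not any(compatible(m1, m2) for m2 in omega2)]
    return join(omega1, omega2) + unmatched

names = eval_triple(("?X", "name", "?Y"), D)    # P1
emails = eval_triple(("?X", "email", "?Z"), D)
print(names)
print(join(names, emails))                      # P2
print(left_join(names, emails))                 # P3
```

The OPT result keeps the mapping for B2 even though B2 has no email, which is why SPARQL returns partial mappings.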
• 11. Well-designed SPARQL patterns. Definition (Well-designed patterns): A pattern P is well-designed if for every subpattern P′ of P of the form P1 OPT P2, every variable that appears in P2 and also outside P′ appears in P1 as well. Example: (?X, name, ?Y) OPT ((?X, email, ?Z) OPT (?X, city, ?A)) is well-designed; (?X, name, ?Y) OPT ((?W, email, ?Z) OPT (?X, city, ?A)) is not.
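The well-designedness condition can be checked by a recursive walk that tracks which variables occur outside the current subpattern. The sketch below is an illustration under assumptions: patterns are nested tuples, with a triple pattern as a 3-tuple of strings and compound patterns as `("AND", P1, P2)` or `("OPT", P1, P2)`, and the names `is_triple`, `variables` and `well_designed` are hypothetical. It covers only the OPT condition stated on the slide.

```python
def is_triple(p):
    """A triple pattern is a tuple of three strings; operators hold sub-tuples."""
    return all(isinstance(term, str) for term in p)

def variables(p):
    """The set var(P) of variables mentioned in pattern p."""
    if is_triple(p):
        return {term for term in p if term.startswith("?")}
    _, left, right = p
    return variables(left) | variables(right)

def well_designed(p, outside=frozenset()):
    """Check the OPT condition; `outside` holds the variables occurring outside p."""
    if is_triple(p):
        return True
    op, left, right = p
    # For P1 OPT P2: variables of P2 that also occur outside must occur in P1.
    if op == "OPT" and not (variables(right) & outside) <= variables(left):
        return False
    return (well_designed(left, outside | variables(right))
            and well_designed(right, outside | variables(left)))

good = ("OPT", ("?X", "name", "?Y"),
        ("OPT", ("?X", "email", "?Z"), ("?X", "city", "?A")))
bad = ("OPT", ("?X", "name", "?Y"),
       ("OPT", ("?W", "email", "?Z"), ("?X", "city", "?A")))
print(well_designed(good), well_designed(bad))
```

In the second pattern, ?X occurs in the innermost optional part and outside the inner OPT, but not in (?W, email, ?Z), so the check fails there.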
• 12. Data Exchange over RDF. S and T are fixed to be RDF triples. Tuple generating dependencies have to be redefined. But first, we need some definitions...
• 13. RDF Tuple Generating Dependencies. Let P be a SPARQL pattern, µ1 and µ2 be partial mappings, and Ω1 and Ω2 be sets of mappings. Then: var(P) is the set of variables mentioned in P; dom(µ1) is the domain of µ1; a SPARQL SELECT query, denoted by (W, P) with W ⊆ var(P), is the projection of the evaluation of P onto the variables in W.
• 14. RDF Tuple Generating Dependencies. Let P be a SPARQL pattern, µ1 and µ2 be partial mappings, and Ω1 and Ω2 be sets of mappings. Then: µ1 is subsumed by µ2 (µ1 ⊑ µ2) if dom(µ1) ⊆ dom(µ2), for every ?X in dom(µ1) that is not bound to a blank node we have that µ1(?X) = µ2(?X), and for every pair of variables ?X and ?Y in dom(µ1) such that µ1(?X) = µ1(?Y) it is the case that µ2(?X) = µ2(?Y). Ω1 is subsumed by Ω2 (Ω1 ⊑ Ω2) if for every mapping µ1 in Ω1 there exists a mapping µ2 in Ω2 such that µ1 ⊑ µ2.
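The two subsumption relations translate almost directly into code. The following is a minimal sketch with hypothetical function names; it assumes mappings are dictionaries from variable names to string values and that blank nodes are written with a `_:` prefix, a representation the slides do not prescribe.

```python
def is_blank(value):
    """Assumption of this sketch: blank nodes are strings starting with '_:'."""
    return value.startswith("_:")

def mapping_subsumed(mu1, mu2):
    """Subsumption of mappings: mu1 is subsumed by mu2."""
    # dom(mu1) must be contained in dom(mu2).
    if not set(mu1) <= set(mu2):
        return False
    # Values other than blank nodes must be preserved exactly.
    for var, value in mu1.items():
        if not is_blank(value) and mu2[var] != value:
            return False
    # Variables that share a value in mu1 must share a value in mu2.
    for x in mu1:
        for y in mu1:
            if mu1[x] == mu1[y] and mu2[x] != mu2[y]:
                return False
    return True

def set_subsumed(omega1, omega2):
    """Subsumption of sets: every mapping of omega1 is subsumed by one of omega2."""
    return all(any(mapping_subsumed(m1, m2) for m2 in omega2) for m1 in omega1)

print(mapping_subsumed({"?X": "1"}, {"?X": "1", "?Y": "2"}))            # True
print(mapping_subsumed({"?X": "_:n"}, {"?X": "john"}))                  # True
print(mapping_subsumed({"?X": "_:n", "?Y": "_:n"}, {"?X": "a", "?Y": "b"}))  # False
```

The last call fails on the third condition: the same blank node is bound to both variables in the first mapping, but the second mapping sends them to different values.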
• 15. RDF Tuple Generating Dependencies. (Re)Definition (Tuple Generating Dependencies): Let P1 and P2 be SPARQL patterns, and W ⊂ var(P1) ∩ var(P2). An RDF tgd is a sentence of the form (W, P1) → (W, P2). Given two RDF graphs G1 and G2, and a set of tgds Σ, (G1, G2) |= Σ if for every tgd (W, P1) → (W, P2) in Σ it is the case that ⟦(W, P1)⟧_G1 ⊑ ⟦(W, P2)⟧_G2.
• 16. RDF Schema Mappings. Since S and T are fixed, M = Σ, and G2 ∈ Sol_M(G1) ⟷ (G1, G2) |= Σ.
• 17. Universal solutions. Example: Let W = {?Y, ?Z}, Σ = {(W, (?X, name, ?Y) AND (?X, email, ?Z)) → (W, (?Y, hasmail, ?Z))} and consider the dataset D. Solution 1: G2 = { (paul, hasmail, paul@example.edu) }. Solution 2: G2′ = { (paul, hasmail, paul@example.edu), (john, hasmail, n) }.
• 18. Universal solutions. Definition: A solution G2 is universal if for every other solution G2′, G2 ⊑ G2′. Solution 1 is universal; Solution 2 is not.
• 19. Universal solutions. Not all settings have universal solutions. Consider G1 = {(1, 2, 3)}, W = {?X, ?Y} and Σ = {(W, (?X, ?Y, ?Z)) → (W, ((?X, a, b) OPT (?W, b, ?Y)) AND ((?X, c, d) OPT (?Z, d, ?Y)))}.
• 20. Solution 1: G2 = { (1, a, b), (n1, b, 2), (1, c, d) }. Solution 2: G2′ = { (1, a, b), (n2, d, 2), (1, c, d) }. This setting has no universal solution!
• 21. Good and bad news. Bad news: There is no guarantee that an exchange setting that has a solution will also have a universal solution. Good news: If the heads of all tgds in Σ are well-designed and there is a solution, there is always a universal solution. Better news: We have an algorithm.
• 22. “Chasing” SPARQL queries. Input: a mapping µ and a (well-designed) SPARQL pattern P. Output: an RDF graph G such that µ ∈ ⟦P⟧_G. Chase(µ, ν, P, G) proceeds by cases on P. Case t (a triple pattern): add unbound variables in t as fresh blank nodes to ν; add ν(t) to G. Case P1 AND P2: Chase(µ, ν, P1, G); Chase(µ, ν, P2, G). Case P1 OPT P2: Chase(µ, ν, P1, G); if dom(µ) dom(ν) ∩ var(P2) = ∅: Chase(µ, ν, P2, G).
• 23. After chasing: µ ⊑ ν, ν ∈ ⟦P⟧_G and {µ} ⊑ ⟦P⟧_G. If we chase every mapping in ⟦(W, P1)⟧_G1 with the corresponding head P2 in Heads(Σ), we get a universal solution.
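The Chase procedure of slide 22 might be sketched as follows. Several points are assumptions rather than the authors' implementation: patterns are nested tuples (`("AND", P1, P2)`, `("OPT", P1, P2)`, or a 3-tuple of strings for a triple pattern), fresh blank nodes are named `_:n0`, `_:n1`, ..., ν starts empty and copies the bindings of µ as variables are met, and, because the OPT condition is garbled in the transcript, the optional part is chased exactly when it mentions a variable of dom(µ) that ν has not bound yet.

```python
import itertools

def is_triple(p):
    """A triple pattern is a tuple of three strings; operators hold sub-tuples."""
    return all(isinstance(term, str) for term in p)

def variables(p):
    """The set var(P) of variables mentioned in pattern p."""
    if is_triple(p):
        return {term for term in p if term.startswith("?")}
    _, left, right = p
    return variables(left) | variables(right)

def chase(mu, nu, pattern, graph, fresh):
    """Extend `nu` and `graph` in place so that the pattern matches under nu."""
    if is_triple(pattern):
        for term in pattern:
            if term.startswith("?") and term not in nu:
                # Reuse mu's binding if there is one, else invent a blank node.
                nu[term] = mu[term] if term in mu else f"_:n{next(fresh)}"
        graph.add(tuple(nu.get(term, term) for term in pattern))
        return
    op, left, right = pattern
    chase(mu, nu, left, graph, fresh)
    if op == "AND":
        chase(mu, nu, right, graph, fresh)
    elif (set(mu) - set(nu)) & variables(right):
        # Assumed OPT rule: only materialize the optional part when mu
        # binds one of its variables that is still unbound in nu.
        chase(mu, nu, right, graph, fresh)

head = ("OPT", ("?X", "a", "b"), ("?W", "b", "?Y"))
graph, nu = set(), {}
chase({"?X": "1", "?Y": "2"}, nu, head, graph, itertools.count())
print(sorted(graph))
print(nu)
```

With µ = {?X → 1, ?Y → 2} the optional triple is added with a fresh blank node for ?W, so µ is contained in ν. With µ = {?X → 1} alone, only (1, a, b) would be produced.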
• 24. Certain answers. Definition (Certain answers on a regular data exchange setting): The set of certain answers is the intersection of the evaluations of the query over all the valid solutions. Example: Consider G1 = {(1, 2, 3)} and Σ = {({?X}, (?X, ?Y, ?Z)) → ({?X}, (?X, 1, 2) OPT (?X, ?Y, 3))}.
• 25. Solution 1: G2 = { (1, 1, 2) }, with ⟦(W, P2)⟧_G2 = {{?X → 1}}. Solution 2: G2′ = { (1, 1, 2), (1, 2, 3) }, with ⟦(W, P2)⟧_G2′ = {{?X → 1, ?Y → 2}}. The intersection of ⟦(W, P2)⟧_G2 and ⟦(W, P2)⟧_G2′ is empty!
• 26. Certain answers. Given a pattern P and a set of RDF graphs 𝒢, let Lower(P, 𝒢) be the set of all lower bounds, w.r.t. subsumption, of the evaluations of P over the graphs in 𝒢. (Re)Definition (Certain answers): The set of certain answers of a set of RDF graphs 𝒢 and a SPARQL pattern P is defined as any set of mappings Ω in Lower(P, 𝒢) such that for any other Ω′ in Lower(P, 𝒢) it is the case that Ω′ ⊑ Ω. Claim: All the possible sets of certain answers to an RDF data exchange setting are homomorphically equivalent.
• 27. Back in our previous example... Solution 1: G2 = { (1, 1, 2) }, with ⟦(W, P2)⟧_G2 = {{?X → 1}}. Solution 2: G2′ = { (1, 1, 2), (1, 2, 3) }, with ⟦(W, P2)⟧_G2′ = {{?X → 1, ?Y → 2}}. The set of certain answers is now {{?X → 1}}.
• 28. In conclusion... Our contributions so far: RDF and SPARQL tgds; RDF schema mappings; universal solutions; materialization of universal solutions; certain answers.
• 29. In conclusion... To do: prove remaining claims; query answering (using universal solutions); incomplete information in the source instance; knowledge exchange over RDFs.
• 30. Thank you for listening. Any questions?