Data Exchange over RDF


Published in: Education

  1. 1. Data Exchange over RDF. Andrés Letelier. Advisor: Marcelo Arenas. Pontificia Universidad Católica de Chile. September 1, 2011
  2. 2. What is data exchange? Problem: data under one schema S needs to be restructured and translated into a target schema T. S → T, I_S → I_T
  3. 3. Schema mappings. Question: which source instances correspond to which target instances? Answer: schema mappings, M ⊆ Instances(S) × Instances(T). Usually, schema mappings are defined as M = (S, T, Σ_ST).
  4. 4. Definition (Solution): I2 is a solution of I1 under M iff (I1, I2) ∈ M. The set of all solutions for I1 under M is denoted by Sol_M(I1).
  5. 5. Resource Description Framework (RDF): a data model for representing information about World Wide Web resources. W3C Recommendation (1999). Part of the Semantic Web stack. Directed, labeled graphs. Blank nodes (labeled nulls). Basically, sets of triples (s, p, o).
  6. 6. Example: D = { (B1 name paul), (B1 email …), (B2 name john), (B2 city Liverpool) }
  7. 7. SPARQL (pronounced “sparkle”): query language for RDF. W3C Recommendation (2008). Standard for querying RDF datasets. Returns sets of partial mappings. Operators: projection, AND (inner join), OPT (left join), FILTER, UNION, and more.
  8. 8. Example: P1 = (?X, name, ?Y). Evaluation over D: ⟦P1⟧_D = { {?X → B1, ?Y → paul}, {?X → B2, ?Y → john} }
  9. 9. Example: P2 = (?X, name, ?Y) AND (?X, email, ?Z). Evaluation over D: ⟦P2⟧_D = { {?X → B1, ?Y → paul, ?Z → …} }
  10. 10. Example: P3 = (?X, name, ?Y) OPT (?X, email, ?Z). Evaluation over D: ⟦P3⟧_D = { {?X → B1, ?Y → paul, ?Z → …}, {?X → B2, ?Y → john} }. The second mapping leaves ?Z unbound.
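The three example evaluations can be reproduced with a small evaluator. This is a minimal sketch, not the authors' implementation: patterns are encoded as nested tuples, mappings as dicts, and the email value `paul@example.org` is a hypothetical placeholder, since the actual address is missing from the transcript.

```python
# Triple patterns are 3-tuples of strings; ("AND", P1, P2) and ("OPT", P1, P2)
# are operator nodes. Mappings are dicts from variables ("?X") to terms.

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def compatible(m1, m2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m2[v] == val for v, val in m1.items() if v in m2)

def match(tp, triple):
    """Return the mapping that sends triple pattern tp to triple, or None."""
    mu = {}
    for pat, val in zip(tp, triple):
        if is_var(pat):
            if mu.setdefault(pat, val) != val:
                return None
        elif pat != val:
            return None
    return mu

def evaluate(pattern, graph):
    if isinstance(pattern[1], tuple):  # operator node
        op, p1, p2 = pattern
        left, right = evaluate(p1, graph), evaluate(p2, graph)
        joined = [{**a, **b} for a in left for b in right if compatible(a, b)]
        if op == "AND":
            return joined
        # OPT (left join): also keep left mappings with no compatible partner
        return joined + [a for a in left
                         if not any(compatible(a, b) for b in right)]
    return [m for m in (match(pattern, t) for t in graph) if m is not None]

# Dataset D; the email value is a HYPOTHETICAL placeholder.
D = [("B1", "name", "paul"), ("B1", "email", "paul@example.org"),
     ("B2", "name", "john"), ("B2", "city", "Liverpool")]

P1 = ("?X", "name", "?Y")
P2 = ("AND", P1, ("?X", "email", "?Z"))
P3 = ("OPT", P1, ("?X", "email", "?Z"))

r1, r2, r3 = evaluate(P1, D), evaluate(P2, D), evaluate(P3, D)
```

Under these assumptions, `r2` contains only the mapping for B1, while `r3` additionally keeps the mapping for B2 with `?Z` unbound, which is the difference between AND and OPT shown in the examples.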
  11. 11. Well-designed SPARQL patterns. Definition (Well-designed patterns): a pattern P is well-designed if for every subpattern P′ of P of the form P1 OPT P2, every variable that appears in P2 and outside P′ also appears in P1. Example: (?X, name, ?Y) OPT ((?X, email, ?Z) OPT (?X, city, ?A)) is well-designed; (?X, name, ?Y) OPT ((?W, email, ?Z) OPT (?X, city, ?A)) is not, because ?X appears in (?X, city, ?A) and outside the inner OPT subpattern, but not in (?W, email, ?Z).
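The definition can be checked mechanically. The sketch below assumes a nested-tuple encoding of patterns (an illustration choice, not from the slides) and covers only AND and OPT, the operators used in the definition as stated here.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def variables(pattern):
    """Set of variables mentioned in a pattern."""
    if isinstance(pattern[1], tuple):  # ("AND" | "OPT", P1, P2)
        return variables(pattern[1]) | variables(pattern[2])
    return {t for t in pattern if is_var(t)}

def well_designed(pattern, outside=frozenset()):
    """outside = variables occurring in the whole pattern but outside `pattern`."""
    if not isinstance(pattern[1], tuple):  # a triple pattern has no OPT inside
        return True
    op, p1, p2 = pattern
    v1, v2 = variables(p1), variables(p2)
    if op == "OPT" and any(v in outside and v not in v1 for v in v2):
        return False
    # For each child, the sibling's variables also count as "outside".
    return (well_designed(p1, outside | v2)
            and well_designed(p2, outside | v1))

good = ("OPT", ("?X", "name", "?Y"),
        ("OPT", ("?X", "email", "?Z"), ("?X", "city", "?A")))
bad = ("OPT", ("?X", "name", "?Y"),
       ("OPT", ("?W", "email", "?Z"), ("?X", "city", "?A")))

good_ok = well_designed(good)
bad_ok = well_designed(bad)
```

Here `good` and `bad` are the two patterns from the example; the checker accepts the first and rejects the second.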
  12. 12. Data Exchange over RDF. S and T are fixed to be RDF triples. Tuple generating dependencies have to be redefined. But first, we need some definitions...
  13. 13. RDF Tuple Generating Dependencies. Let P be a SPARQL pattern, µ1 and µ2 be partial mappings, and Ω1 and Ω2 be sets of mappings. Then: var(P) is the set of variables mentioned in P; dom(µ1) is the domain of µ1; a SPARQL SELECT query, denoted by (W, P) where W ⊆ var(P), is the projection of the evaluation of P onto the variables in W.
  14. 14. RDF Tuple Generating Dependencies. Let P be a SPARQL pattern, µ1 and µ2 be partial mappings, and Ω1 and Ω2 be sets of mappings. Then: µ1 is subsumed by µ2 (µ1 ⊑ µ2) if (i) dom(µ1) ⊆ dom(µ2), (ii) for every ?X in dom(µ1) that is not bound to a blank node, µ1(?X) = µ2(?X), and (iii) for every pair of variables ?X and ?Y in dom(µ1) such that µ1(?X) = µ1(?Y), it is the case that µ2(?X) = µ2(?Y). Ω1 is subsumed by Ω2 (Ω1 ⊑ Ω2) if for every mapping µ1 in Ω1 there exists a mapping µ2 in Ω2 such that µ1 ⊑ µ2.
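The two subsumption tests translate almost literally into code. In this sketch mappings are dicts, and blank nodes are recognised by a `_:` prefix; that prefix is an assumed convention for the illustration, not something the slides specify.

```python
def is_blank(term):
    # ASSUMPTION: blank nodes are written with a "_:" prefix.
    return isinstance(term, str) and term.startswith("_:")

def subsumed(mu1, mu2):
    """True if mapping mu1 is subsumed by mapping mu2."""
    if not set(mu1) <= set(mu2):               # dom(mu1) must be within dom(mu2)
        return False
    for x, val in mu1.items():                 # non-blank values must agree
        if not is_blank(val) and mu2[x] != val:
            return False
    for x in mu1:                              # equalities in mu1 must carry over
        for y in mu1:
            if mu1[x] == mu1[y] and mu2[x] != mu2[y]:
                return False
    return True

def set_subsumed(omega1, omega2):
    """True if every mapping in omega1 is subsumed by some mapping in omega2."""
    return all(any(subsumed(m1, m2) for m2 in omega2) for m1 in omega1)
```

For instance, `{"?X": "_:n1"}` is subsumed by `{"?X": "paul", "?Y": "x"}`, because a blank node may be replaced by any value, whereas `{"?X": "_:n", "?Y": "_:n"}` is not subsumed by `{"?X": "a", "?Y": "b"}`, because the shared blank node forces both variables to keep equal values.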
  15. 15. RDF Tuple Generating Dependencies. (Re)Definition (Tuple Generating Dependencies): let P1 and P2 be SPARQL patterns, and W ⊂ var(P1) ∩ var(P2). An RDF tgd is a sentence of the form (W, P1) → (W, P2). Given two RDF graphs G1 and G2 and a set of tgds Σ, (G1, G2) |= Σ if for every tgd (W, P1) → (W, P2) in Σ it is the case that ⟦(W, P1)⟧_G1 ⊑ ⟦(W, P2)⟧_G2.
  16. 16. RDF Schema Mappings. Since S and T are fixed, M = Σ. G2 ∈ Sol_M(G1) ⟷ (G1, G2) |= Σ
  17. 17. Universal solutions. Example: let W = {?X}, Σ = {(W, (?X, name, ?Y) AND (?X, email, ?Z)) → (W, (?Y, hasmail, ?Z))}, and consider the dataset D. Solution 1: G2 = { (paul hasmail …) }. Solution 2: G2′ = { (paul hasmail …), (john hasmail n) }.
  18. 18. Universal solutions. Definition: a solution G2 is universal if for every other solution G2′, G2 ⊑ G2′. Solution 1 is universal; Solution 2 is not.
  19. 19. Universal solutions Not all settings have universal solutions: Consider G1 = {(1, 2, 3)}, W = {?X, ?Y } and Σ = {(W, (?X, ?Y, ?Z)) → (W, ((?X, a, b) OPT (?W, b, ?Y )) AND ((?X, c, d) OPT (?Z, d, ?Y )))}
  20. 20. Solution 1: G2 = { (1 a b), (n1 b 2), (1 c d) }. Solution 2: G2′ = { (1 a b), (n2 d 2), (1 c d) }. This setting has no universal solution!
  21. 21. Good and bad news. Bad news: there is no guarantee that an exchange setting that has a solution will have a universal solution. Good news: if the heads of all tgds in Σ are well-designed and there is a solution, there is always a universal solution. Better news: we have an algorithm.
  22. 22. “Chasing” SPARQL queries.
      Input: a mapping µ and a (well-designed) SPARQL pattern P.
      Output: an RDF graph G such that µ ∈ ⟦P⟧_G.
      Chase(µ, ν, P, G):
        P is a triple pattern t: add the unbound variables in t to ν as fresh blank nodes; add ν(t) to G.
        P is P1 AND P2: Chase(µ, ν, P1, G); Chase(µ, ν, P2, G).
        P is P1 OPT P2: Chase(µ, ν, P1, G); if (dom(µ) \ dom(ν)) ∩ var(P2) ≠ ∅: Chase(µ, ν, P2, G).
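A minimal Python sketch of the chase procedure follows. It rests on three assumptions that are readings of the slide, not confirmed details: ν starts empty; a variable of a triple pattern that µ already binds takes µ's value, and only the remaining variables receive fresh blank nodes; and the OPT test is read as "µ binds some variable of P2 that ν does not bind yet". The nested-tuple pattern encoding and the `_:b` blank-node names are illustration choices.

```python
import itertools

_fresh = itertools.count(1)  # counter for fresh blank node labels (hypothetical naming)

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def variables(pattern):
    if isinstance(pattern[1], tuple):  # ("AND" | "OPT", P1, P2)
        return variables(pattern[1]) | variables(pattern[2])
    return {t for t in pattern if is_var(t)}

def chase(mu, nu, pattern, graph):
    """Extend nu and graph in place so that the pattern is matched in graph."""
    if isinstance(pattern[1], tuple):
        op, p1, p2 = pattern
        chase(mu, nu, p1, graph)
        # AND: always chase the right side.
        # OPT (assumed reading): only if mu binds a variable of P2 missing from nu.
        if op == "AND" or (set(mu) - set(nu)) & variables(p2):
            chase(mu, nu, p2, graph)
        return
    for term in pattern:
        if is_var(term) and term not in nu:
            # ASSUMPTION: reuse mu's value when available, else a fresh blank node.
            nu[term] = mu[term] if term in mu else f"_:b{next(_fresh)}"
    graph.add(tuple(nu[t] if is_var(t) else t for t in pattern))

head = ("OPT", ("?X", "1", "2"), ("?X", "?Y", "3"))

g_small, g_large = set(), set()
chase({"?X": "1"}, {}, head, g_small)             # the OPT branch is skipped
chase({"?X": "1", "?Y": "2"}, {}, head, g_large)  # the OPT branch is chased
```

With the head pattern (?X, 1, 2) OPT (?X, ?Y, 3), chasing the mapping {?X → 1} adds only the triple (1 1 2), while chasing {?X → 1, ?Y → 2} also adds (1 2 3).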
  23. 23. After chasing: µ ⊑ ν, ν ∈ ⟦P⟧_G, and {µ} ⊑ ⟦P⟧_G. If we chase, with every P2 in Heads(Σ), the mappings in the evaluations ⟦(W, P1)⟧_G1, we get a universal solution.
  24. 24. Certain answers. Definition (Certain answers on a regular data exchange setting): the set of certain answers is the intersection of the evaluations of the query over all the valid solutions. Example: consider G1 = {(1, 2, 3)} and Σ = {({?X}, (?X, ?Y, ?Z)) → ({?X}, (?X, 1, 2) OPT (?X, ?Y, 3))}
  25. 25. Solution 1: G2 = { (1 1 2) }, with ⟦(W, P2)⟧_G2 = {{?X → 1}}. Solution 2: G2′ = { (1 1 2), (1 2 3) }, with ⟦(W, P2)⟧_G2′ = {{?X → 1, ?Y → 2}}. The intersection of ⟦(W, P2)⟧_G2 and ⟦(W, P2)⟧_G2′ is empty!
  26. 26. Certain answers. Given a pattern P and a set of RDF graphs G, let Lower(P, G) be the set of all lower bounds, w.r.t. subsumption, of the evaluations of P over the graphs in G. (Re)Definition (Certain Answers): the set of certain answers of a set of RDF graphs G and a SPARQL pattern P is defined as any set of mappings Ω in Lower(P, G) such that for any other Ω′ in Lower(P, G) it is the case that Ω′ ⊑ Ω. Claim: all the possible sets of certain answers to an RDF data exchange setting are homomorphically equivalent.
  27. 27. Back in our previous example... Solution 1: G2 = { (1 1 2) }, with ⟦(W, P2)⟧_G2 = {{?X → 1}}. Solution 2: G2′ = { (1 1 2), (1 2 3) }, with ⟦(W, P2)⟧_G2′ = {{?X → 1, ?Y → 2}}. The set of certain answers is now {{?X → 1}}.
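The example can be replayed in a few lines. This sketch evaluates the head pattern (?X, 1, 2) OPT (?X, ?Y, 3) over both solutions without applying the projection, an assumption made because that is what reproduces the mappings listed in the example. The tuple encoding and the `_:` blank-node prefix are illustration choices as well.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def compatible(a, b):
    return all(b[v] == val for v, val in a.items() if v in b)

def match(tp, triple):
    mu = {}
    for pat, val in zip(tp, triple):
        if is_var(pat):
            if mu.setdefault(pat, val) != val:
                return None
        elif pat != val:
            return None
    return mu

def evaluate(p, graph):
    if isinstance(p[1], tuple):  # ("AND" | "OPT", P1, P2)
        op, p1, p2 = p
        left, right = evaluate(p1, graph), evaluate(p2, graph)
        joined = [{**a, **b} for a in left for b in right if compatible(a, b)]
        if op == "AND":
            return joined
        return joined + [a for a in left
                         if not any(compatible(a, b) for b in right)]
    return [m for m in (match(p, t) for t in graph) if m is not None]

def subsumed(m1, m2):
    # ASSUMPTION: blank nodes carry a "_:" prefix.
    return (set(m1) <= set(m2)
            and all(v.startswith("_:") or m2[x] == v for x, v in m1.items())
            and all(m2[x] == m2[y] for x in m1 for y in m1 if m1[x] == m1[y]))

def set_subsumed(o1, o2):
    return all(any(subsumed(a, b) for b in o2) for a in o1)

head = ("OPT", ("?X", "1", "2"), ("?X", "?Y", "3"))
solution_1 = [("1", "1", "2")]
solution_2 = [("1", "1", "2"), ("1", "2", "3")]

omega_1 = evaluate(head, solution_1)
omega_2 = evaluate(head, solution_2)

# Plain intersection of the two answer sets (mappings compared as a whole).
freeze = lambda omega: {frozenset(m.items()) for m in omega}
intersection = freeze(omega_1) & freeze(omega_2)

# Subsumption-based lower bound proposed in the example.
candidate = [{"?X": "1"}]
is_lower_bound = (set_subsumed(candidate, omega_1)
                  and set_subsumed(candidate, omega_2))
```

Under these assumptions `intersection` is empty, while `candidate` is subsumed by both evaluations, which is why the subsumption-based definition recovers {{?X → 1}} where plain intersection loses it.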
  28. 28. In conclusion... Our contributions so far: RDF and SPARQL TGDs; RDF schema mappings; universal solutions; materialization of universal solutions; certain answers.
  29. 29. In conclusion... To do: prove remaining claims; query answering (using universal solutions); incomplete information in the source instance; knowledge exchange over RDFs.
  30. 30. Thank you for listening. Any questions?