# Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

These are the slides from my WWW 2013 Tutorial "Linked Data Query Processing" http://db.uwaterloo.ca/LDQTut2013/

1. Linked Data Query Processing
   - Tutorial at the 22nd International World Wide Web Conference (WWW 2013)
   - May 14, 2013
   - http://db.uwaterloo.ca/LDQTut2013/
   - Part 2: Theoretical Foundations
   - Olaf Hartig, University of Waterloo
2. WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ]: “Ingredients” for LD Query Execution
   - Data retrieval approach
     - Data source selection
     - Data source ranking (optional, for optimization)
   - Result construction approach
     - i.e., query-local data processing
   - Combining data retrieval and result construction
   - (Figure: query-local data filled via GET http://.../movie2449; a result table over ?actor and ?loc with the values http://mdb.../Paul, http://geo.../Berlin, http://mdb.../Ric, http://geo.../Rome)
3. Outline
   - Basics of the SPARQL Query Language
   - Data Model for Linked Data Queries
   - Full-Web Query Semantics
     - Definition
     - Theoretical Properties
   - Reachability-Based Query Semantics
     - Definition
     - Theoretical Properties
4. General Idea of SPARQL
   - SPARQL: Declarative query language for RDF data
   - Main idea: pattern matching
     - Query describes a pattern to be found in RDF data
     - Basically, query patterns consist of RDF triples with variables
   - Basic query pattern: ( ?p , affiliated with , orgaX ) , ( ?p , interested in , ?i )
5. General Idea of SPARQL
   - SPARQL: Declarative query language for RDF data
   - Main idea: pattern matching
     - Query describes a pattern to be found in RDF data
     - Basically, query patterns consist of RDF triples with variables
     - Any part of the queried data that matches the query pattern contributes a solution to the query result
   - Basic query pattern: ( ?p , affiliated with , orgaX ) , ( ?p , interested in , ?i )
   - Data:
     - ( alice , affiliated with , orgaX )
     - ( bob , affiliated with , acme )
     - ( alice , interested in , yoga )
     - ( alice , interested in , tea )
     - ( bob , interested in , tea )
   - Query result:

     | ?p | ?i |
     |---|---|
     | alice | yoga |
     | alice | tea |
6. Some Terminology
   - Any query result is represented by a set of valuations $\Omega = \{ \mu_1, \ldots, \mu_n \}$
   - Valuation: mapping of query variables to RDF terms, e.g. $\mu = \{ ?p \to \text{alice},\ ?i \to \text{yoga} \}$
   - Any valuation in a query result is a solution
   - Two valuations $\mu_1$ and $\mu_2$ are compatible if they agree on the binding for any shared variable
     - i.e., $\mu_1(?v) = \mu_2(?v)$ for all $?v \in \mathrm{dom}(\mu_1) \cap \mathrm{dom}(\mu_2)$
   - Example query result:

     | ?p | ?i |
     |---|---|
     | alice | yoga |
     | alice | tea |
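
The compatibility condition on slide 6 can be sketched in a few lines of Python. This is only an illustration: representing a valuation as a `dict` from variable names to plain strings, and the helper name `compatible`, are assumptions made here and are not taken from the slides.

```python
from typing import Dict

# Assumption for this sketch: a valuation maps variable names to RDF terms,
# and both are represented as plain strings.
Valuation = Dict[str, str]


def compatible(mu1: Valuation, mu2: Valuation) -> bool:
    """True if mu1 and mu2 agree on every variable in dom(mu1) ∩ dom(mu2)."""
    shared = mu1.keys() & mu2.keys()
    return all(mu1[v] == mu2[v] for v in shared)


mu_a = {"?p": "alice", "?i": "yoga"}
mu_b = {"?p": "alice"}
mu_c = {"?p": "bob"}

print(compatible(mu_a, mu_b))           # shared variable ?p has the same binding
print(compatible(mu_a, mu_c))           # shared variable ?p has different bindings
print(compatible(mu_b, {"?i": "tea"}))  # no shared variable at all
```

Valuations without any shared variable are trivially compatible, which is what later allows a conjunction to combine bindings for disjoint sets of variables.
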
7. Query Patterns and their Semantics
   - Triple pattern: An RDF triple that may have a variable as subject, predicate, or object, respectively
     - $\mathrm{eval}(tp, G) := \{ \mu \mid \mu[tp] \in G \text{ and } \mathrm{dom}(\mu) = \mathrm{vars}(tp) \}$
   - Group graph pattern: Conjunction of two query patterns
     - $\mathrm{eval}((P_1 \text{ AND } P_2), G) := \{ \mu_1 \cup \mu_2 \mid \mu_1 \in \mathrm{eval}(P_1, G) \text{ and } \mu_2 \in \mathrm{eval}(P_2, G) \text{ and } \mu_1 \text{ and } \mu_2 \text{ are compatible} \}$
     - Associative and commutative
   - Basic graph pattern (BGP): A set of triple patterns
     - $B = \{ tp_1, \ldots, tp_n \}$ is equivalent to $( tp_1 \text{ AND } \ldots \text{ AND } tp_n )$
   - Filter, union, optional, etc.
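
The two definitions on slide 7 translate fairly directly into code. The following is a minimal sketch, not an efficient evaluator: terms are plain strings, variables are recognized by a leading `?`, and the function names `eval_tp`, `eval_and` and `eval_bgp` are chosen here for illustration. The data and the query pattern are the ones from slide 5.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
Valuation = Dict[str, str]


def eval_tp(tp: Triple, graph: Iterable[Triple]) -> List[Valuation]:
    """eval(tp, G): all mu with dom(mu) = vars(tp) and mu[tp] in G."""
    solutions: List[Valuation] = []
    for triple in graph:
        mu: Valuation = {}
        for pattern_term, data_term in zip(tp, triple):
            if pattern_term.startswith("?"):
                # A variable must be bound consistently within one triple.
                if mu.setdefault(pattern_term, data_term) != data_term:
                    break
            elif pattern_term != data_term:
                break
        else:
            if mu not in solutions:
                solutions.append(mu)
    return solutions


def compatible(mu1: Valuation, mu2: Valuation) -> bool:
    return all(mu1[v] == mu2[v] for v in mu1.keys() & mu2.keys())


def eval_and(omega1: List[Valuation], omega2: List[Valuation]) -> List[Valuation]:
    """eval((P1 AND P2), G), given omega_i = eval(P_i, G)."""
    joined: List[Valuation] = []
    for mu1 in omega1:
        for mu2 in omega2:
            if compatible(mu1, mu2):
                merged = {**mu1, **mu2}
                if merged not in joined:
                    joined.append(merged)
    return joined


def eval_bgp(bgp: Iterable[Triple], graph: Iterable[Triple]) -> List[Valuation]:
    """A BGP {tp1, ..., tpn} is evaluated as (tp1 AND ... AND tpn)."""
    graph = list(graph)
    omega: List[Valuation] = [{}]  # the empty valuation is compatible with everything
    for tp in bgp:
        omega = eval_and(omega, eval_tp(tp, graph))
    return omega


DATA = [
    ("alice", "affiliated with", "orgaX"),
    ("bob", "affiliated with", "acme"),
    ("alice", "interested in", "yoga"),
    ("alice", "interested in", "tea"),
    ("bob", "interested in", "tea"),
]
BGP = [("?p", "affiliated with", "orgaX"), ("?p", "interested in", "?i")]

result = eval_bgp(BGP, DATA)
for mu in result:
    print(mu["?p"], mu["?i"])
```

Because AND is associative and commutative, folding the triple patterns in any order yields the same set of valuations; only the cost of the intermediate joins changes.
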
8. Outline
   - Basics of the SPARQL Query Language √
   - Data Model for Linked Data Queries
   - Full-Web Query Semantics
     - Definition
     - Theoretical Properties
   - Reachability-Based Query Semantics
     - Definition
     - Theoretical Properties
9. Data Model
   - Static view (i.e., no changes during computation)
   - Web of Linked Data: $W = ( D, \mathit{adoc}, \mathit{data} )$
     - $D$: set of symbols that represent Web documents from which Linked Data can be extracted
     - $\mathit{adoc}$: partial mapping from URIs to $D$
     - $\mathit{data}$: total mapping from $D$ to finite sets of RDF triples
   - All data in $W$: $\mathrm{AllData}(W) := \bigcup_{d \in D} \mathit{data}(d)$
   - (Figure: example values for $D$, $\mathit{adoc}(\cdot)$ and $\mathit{data}(\cdot)$)
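
The data model on slide 9 can be pictured as a small data structure. The sketch below is one possible encoding, under the assumptions that documents are identified by strings and that both mappings are Python dictionaries; the class name `WebOfLinkedData`, the method names, and the `http://example.org/` URIs are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass
class WebOfLinkedData:
    docs: Set[str]                      # D
    adoc: Dict[str, str]                # partial mapping: URI -> document
    data: Dict[str, FrozenSet[Triple]]  # total mapping: document -> triples

    def __post_init__(self) -> None:
        # data must be defined for every document; adoc may only point into D.
        assert set(self.data) == self.docs
        assert set(self.adoc.values()) <= self.docs

    def lookup(self, uri: str) -> Optional[FrozenSet[Triple]]:
        """Triples obtained by looking up a URI, or None if adoc is undefined for it."""
        doc = self.adoc.get(uri)
        return None if doc is None else self.data[doc]

    def all_data(self) -> Set[Triple]:
        """AllData(W): union of data(d) over all d in D."""
        merged: Set[Triple] = set()
        for doc in self.docs:
            merged |= self.data[doc]
        return merged


EX = "http://example.org/"  # hypothetical namespace for the example
web = WebOfLinkedData(
    docs={"dA", "dB"},
    adoc={EX + "alice": "dA", EX + "bob": "dB"},
    data={
        "dA": frozenset({(EX + "alice", EX + "knows", EX + "bob")}),
        "dB": frozenset({(EX + "bob", EX + "knows", EX + "alice")}),
    },
)

print(len(web.all_data()))
print(web.lookup(EX + "carol"))
```

The partiality of `adoc` shows up as `lookup` returning `None` for a URI that does not lead to any document, whereas `data` is required to be defined for every element of `D`.
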
10. Linked Data Query
    - (Figure: a query $Q$ applied to a Web of Linked Data $W$, written $Q(W)$)
11. SPARQL-Based Linked Data Query
    - Query pattern $P$: ( ?p , affiliated with , orgaX ) , ( ?p , interested in , ?i )
    - $Q^P(W) = \{ \mu_1, \mu_2, \ldots \}$

      | ?p | ?i |
      |---|---|
      | alice | yoga |
      | alice | tea |
12. Outline
    - Basics of the SPARQL Query Language √
    - Data Model for Linked Data Queries √
    - Full-Web Query Semantics
      - Definition
      - Theoretical Properties
    - Reachability-Based Query Semantics
      - Definition
      - Theoretical Properties
13. Full-Web Semantics
    - $Q^P(W) := \mathrm{eval}(P, \mathrm{AllData}(W))$
14. Computability
    - $Q^P(W) := \mathrm{eval}(P, \mathrm{AllData}(W))$
    - (Ordinary) Turing machines (TM) are unsuitable:
      - Limited data access capabilities not properly captured
    - Web machines
      - Abiteboul and Vianu, 1997
      - Mendelzon and Milo, 1997
15. LD Machine
    - Multi-tape Turing machine
      - Input
      - Work
      - Output
16. LD Machine
    - Multi-tape Turing machine
      - Web Input: # enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙
      - Input
      - Work
      - Output
17. LD Machine
    - Multi-tape Turing machine
      - Web Input: # enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙
      - Input
      - Work
      - Output
    - Access to Web Input is restricted
      - Only by performing a particular procedure in a particular state
18. Finitely Computable LD Queries
    - Tapes: Web Input (# enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙), Input, Work, Output
    - A Linked Data query $Q$ is finitely computable if there exists an LD machine $M_Q$ such that for any $W$:
      - $M_Q$ halts after a finite number of computation steps, and
      - $M_Q$ outputs the complete result $Q(W)$
    - Output tape after the final step: # enc(μ1) # enc(μ2) # ∙ ∙ ∙ # enc(μn) #
    - (Figure: timeline of computation steps from step 1 to step k)
19. Eventually Computable LD Queries
    - Tapes: Web Input (# enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙), Input, Work, Output
    - A Linked Data query $Q$ is eventually computable if there exists an LD machine $M_Q$ such that for any $W$:
      1. Output always encodes a subset of the query result $Q(W)$, and
      2. Each $\mu \in Q(W)$ eventually appears on the output
    - ✗ No guarantee for termination
    - Output tape at some intermediate step: # enc(μ1) # enc(μ2)
    - (Figure: open-ended timeline of computation steps ∙ ∙ ∙, step k - 3, …, step k + 2, ∙ ∙ ∙)
20. Not (LD Machine) Computable
    - Tapes: Web Input (# enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙), Input, Work, Output
    - $Q$ is not even eventually computable, i.e., no LD machine $M_Q$ satisfies both of the following for any $W$:
      1. Output always encodes a subset of the query result $Q(W)$, and
      2. Each $\mu \in Q(W)$ eventually appears on the output
    - (Figure: open-ended timeline of computation steps ∙ ∙ ∙, step k - 3, …, step k + 2, ∙ ∙ ∙)
21. Computability (Full-Web Semantics)
    - $Q^P(W) := \mathrm{eval}(P, \mathrm{AllData}(W))$
    - Theorem [Har12]: If a satisfiable SPARQL$_{LD}$ query $Q^P$ under full-Web semantics is monotonic, then it is eventually computable (but not finitely computable); otherwise, it is not even eventually computable.
    - Main reason: Set of all URIs is infinite (but countable)
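
One way to picture why a monotonic query can be eventually computable is an open-ended loop that enumerates URIs, looks each one up, and reports every solution that the data discovered so far supports. The Python generator below is only a sketch of that intuition and not the LD machine construction from [Har12]; the function `eventually_compute`, the toy Web `TOY_WEB`, and the `http://example.org/` URIs are all hypothetical.

```python
from itertools import count, islice
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Valuation = Dict[str, str]


def eventually_compute(
    uris: Iterable[str],
    lookup: Callable[[str], Optional[Set[Triple]]],
    evaluate: Callable[[Set[Triple]], List[Valuation]],
) -> Iterator[Valuation]:
    """Yield solutions one by one while looking up an (infinite) URI enumeration."""
    discovered: Set[Triple] = set()
    reported: List[Valuation] = []
    for uri in uris:  # may never end: the set of all URIs is infinite
        triples = lookup(uri)
        if not triples:
            continue
        discovered |= triples
        # For a monotonic query, a solution over the data seen so far
        # remains a solution over all data, so it is safe to report it now.
        for mu in evaluate(discovered):
            if mu not in reported:
                reported.append(mu)
                yield mu


# Hypothetical toy Web: only two of the enumerated URIs lead to data.
TOY_WEB: Dict[str, Set[Triple]] = {
    "http://example.org/doc3": {("alice", "interested in", "yoga")},
    "http://example.org/doc7": {("bob", "interested in", "tea")},
}


def all_uris() -> Iterator[str]:
    return ("http://example.org/doc%d" % n for n in count())


def evaluate(graph: Set[Triple]) -> List[Valuation]:
    # Stand-in for eval(P, G) with P = ( ?p , interested in , ?i ).
    return [{"?p": s, "?i": o} for (s, p, o) in sorted(graph) if p == "interested in"]


first_two = list(islice(eventually_compute(all_uris(), TOY_WEB.get, evaluate), 2))
print(first_two)
```

The caller has to cut the iteration off with `islice`, because the loop itself cannot know that no further URI will contribute data. That mirrors the "no guarantee for termination" remark on slide 19.
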
22. Satisfiability and Monotonicity
    - Satisfiability: $Q(W) = \{ \ldots, r_i, \ldots \} \neq \emptyset$ for some Web of Linked Data $W$
    - Monotonicity: if $W_1$ is a subweb of $W_2$, then $Q(W_1) \subseteq Q(W_2)$
    - Proposition [Har12]: Let $Q^P$ be a SPARQL$_{LD}$ query under full-Web semantics.
      - $Q^P$ is satisfiable $\iff$ its query pattern $P$ is satisfiable
      - $Q^P$ is monotonic $\iff$ its query pattern $P$ is monotonic
    - Unfortunately: Satisfiability and monotonicity of SPARQL query patterns are undecidable
    - (Figure: the operators tp, AND, FILTER, UNION, OPT)
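
A small worked example may help to see what a non-monotonic pattern looks like. The sketch below assumes the standard set-based semantics of OPT (a join plus those left-hand valuations that have no compatible partner), which the slides mention only as "optional"; the helper names and the two toy graphs are chosen here for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
Valuation = Dict[str, str]


def eval_tp(tp: Triple, graph: Iterable[Triple]) -> List[Valuation]:
    """eval(tp, G) for a single triple pattern."""
    solutions: List[Valuation] = []
    for triple in graph:
        mu: Valuation = {}
        for pattern_term, data_term in zip(tp, triple):
            if pattern_term.startswith("?"):
                if mu.setdefault(pattern_term, data_term) != data_term:
                    break
            elif pattern_term != data_term:
                break
        else:
            if mu not in solutions:
                solutions.append(mu)
    return solutions


def compatible(mu1: Valuation, mu2: Valuation) -> bool:
    return all(mu1[v] == mu2[v] for v in mu1.keys() & mu2.keys())


def eval_opt(omega1: List[Valuation], omega2: List[Valuation]) -> List[Valuation]:
    """Assumed OPT semantics: join, plus left valuations without a compatible partner."""
    joined = [{**m1, **m2} for m1 in omega1 for m2 in omega2 if compatible(m1, m2)]
    unmatched = [m1 for m1 in omega1 if not any(compatible(m1, m2) for m2 in omega2)]
    return joined + unmatched


TP1 = ("?p", "affiliated with", "orgaX")
TP2 = ("?p", "interested in", "?i")


def eval_query(graph: List[Triple]) -> List[Valuation]:
    # P = ( TP1 OPT TP2 )
    return eval_opt(eval_tp(TP1, graph), eval_tp(TP2, graph))


G1 = [("alice", "affiliated with", "orgaX")]
G2 = G1 + [("alice", "interested in", "yoga")]  # G1 is a subset of G2

r1 = eval_query(G1)
r2 = eval_query(G2)
print(r1)
print(r2)
print(all(mu in r2 for mu in r1))  # does every solution over G1 survive over G2?
```

Adding a triple replaces the partial valuation for alice by an extended one, so the smaller result is not a subset of the larger one. Patterns built only from triple patterns and AND do not show this effect.
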
23. Computability (Full-Web Semantics)
    - $Q^P(W) := \mathrm{eval}(P, \mathrm{AllData}(W))$
    - Theorem [Har12]: If a satisfiable SPARQL$_{LD}$ query $Q^P$ under full-Web semantics is monotonic, then it is eventually computable (but not finitely computable); otherwise, it is not even eventually computable.
    - Main reason: Set of all URIs is infinite (but countable)
24. Outline
    - Basics of the SPARQL Query Language √
    - Data Model for Linked Data Queries √
    - Full-Web Query Semantics √
      - Definition
      - Theoretical Properties
    - Reachability-Based Query Semantics
      - Definition
      - Theoretical Properties
25. Reachability-Based Query Semantics
26. Reachability-Based Query Semantics
    - Seed URIs $S \subset U$
27. Reachability-Based Query Semantics
    - Seed URIs $S \subset U$
    - Reachability criterion $c$
      - (Turing) computable function $c : T \times U \times P \to \{ \text{true}, \text{false} \}$
28. Reachability-Based Query Semantics
    - $Q_c^{P,S}(W) := \mathrm{eval}(P, \mathrm{AllData}(W^{*}))$
29. (Reachability-Based) $c_{All}$-Semantics
    - $Q_{c_{All}}^{P,S}(W) := \mathrm{eval}(P, \mathrm{AllData}(W^{*}))$
30. (Reachability-Based) $c_{None}$-Semantics
    - $Q_{c_{None}}^{P,S}(W) := \mathrm{eval}(P, \mathrm{AllData}(W^{*}))$
31. (Reachability-Based) $c_{Match}$-Semantics
    - $Q_{c_{Match}}^{P,S}(W) := \mathrm{eval}(P, \mathrm{AllData}(W^{*}))$
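
The three criteria are named on the slides but their definitions live in the figures, so the following sketch fills them in under stated assumptions: `c_all` accepts every link, `c_none` accepts none, and `c_match` accepts a link only if the triple containing the URI matches some triple pattern of the query. A document counts as reachable if a seed URI leads to it, or if an already reachable document contains a triple with a URI that satisfies the criterion and leads to it. All function names, the toy Web, and the `http://example.org/` URIs are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Criterion = Callable[[Triple, str, List[Triple]], bool]  # c(t, u, P)


def matches(t: Triple, tp: Triple) -> bool:
    """True if data triple t matches triple pattern tp."""
    binding: Dict[str, str] = {}
    for pattern_term, data_term in zip(tp, t):
        if pattern_term.startswith("?"):
            if binding.setdefault(pattern_term, data_term) != data_term:
                return False
        elif pattern_term != data_term:
            return False
    return True


def c_all(t: Triple, u: str, pattern: List[Triple]) -> bool:
    return True


def c_none(t: Triple, u: str, pattern: List[Triple]) -> bool:
    return False


def c_match(t: Triple, u: str, pattern: List[Triple]) -> bool:
    return any(matches(t, tp) for tp in pattern)


def reachable_docs(
    seeds: Iterable[str],
    c: Criterion,
    pattern: List[Triple],
    adoc: Dict[str, str],
    data: Dict[str, List[Triple]],
) -> Set[str]:
    """Documents reachable from the seed URIs under criterion c (breadth-first)."""
    reached: Set[str] = set()
    queue = deque(adoc[u] for u in seeds if u in adoc)
    while queue:
        doc = queue.popleft()
        if doc in reached:
            continue
        reached.add(doc)
        for t in data[doc]:
            for u in t:
                # Follow u only if it leads to a document and c permits it.
                if u in adoc and c(t, u, pattern):
                    queue.append(adoc[u])
    return reached


EX = "http://example.org/"  # hypothetical namespace
ADOC = {EX + "alice": "dA", EX + "bob": "dB", EX + "carol": "dC"}
DATA = {
    "dA": [(EX + "alice", EX + "knows", EX + "bob")],
    "dB": [(EX + "bob", EX + "likes", EX + "carol")],
    "dC": [(EX + "carol", EX + "knows", EX + "alice")],
}
PATTERN = [("?x", EX + "knows", "?y")]
SEEDS = [EX + "alice"]

for name, c in [("cNone", c_none), ("cMatch", c_match), ("cAll", c_all)]:
    print(name, sorted(reachable_docs(SEEDS, c, PATTERN, ADOC, DATA)))
```

The query result under the chosen criterion would then be obtained by evaluating the pattern over the union of the triples in the reached documents. On a Web with finitely many documents the loop visits each document at most once and therefore terminates.
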
32. Satisfiability and Monotonicity
    - Satisfiability: $Q(W) = \{ \ldots, r_i, \ldots \} \neq \emptyset$ for some Web of Linked Data $W$
    - Monotonicity: if $W_1$ is a subweb of $W_2$, then $Q(W_1) \subseteq Q(W_2)$
    - Proposition [Har12]: Let $c$ be a reachability criterion and $Q_c^{P,S}$ be a SPARQL$_{LD}$ query under $c$-semantics.
      - $Q_c^{P,S}$ is satisfiable $\iff$ its SPARQL expression $P$ is satisfiable
      - $Q_c^{P,S}$ is monotonic $\Longleftarrow$ its SPARQL expression $P$ is monotonic
    - Counterexample (for the converse of the second statement): Let $Q_c^{P,S}$ be a SPARQL$_{LD}$ query under $c$-semantics. If $c = c_{None}$ and $|S| = 1$, then $Q_c^{P,S}$ is monotonic, even if $P$ is not.
33. LD Machine-Based Computability
    - If we restrict our data model such that Webs of Linked Data are finite (i.e., consist of a finite number of documents), then:
      - Theorem [Har12]: For any reachability criterion $c$, any SPARQL-based Linked Data query under $c$-semantics is finitely computable.
    - For results w.r.t. the unrestricted data model (i.e., Webs of Linked Data that may be infinitely large), refer to [Har12].
34. Outline
    - Basics of the SPARQL Query Language √
    - Data Model for Linked Data Queries √
    - Full-Web Query Semantics √
      - Definition
      - Theoretical Properties
    - Reachability-Based Query Semantics √
      - Definition
      - Theoretical Properties
    - Next part: 3. Source Selection Strategies ...