Chapter 3           Accessing a SPARQL Endpoint           Queries over Multiple Datasets            ➢ Query a given coll...
Query Plan Selection ●   Assessment criteria:     ●   Cost (query execution time)     ●   Benefit (size of computed of res...
Query Plan Selection ●   Assessment criteria:     ●   Cost (query execution time)     ●   Benefit (size of computed of res...
DEPENDENCY RESPECT RULE                 Use a dependency respecting query plan ●   Dependency respect: a variable from eac...
DEPENDENCY RESPECT RULE                 Use a dependency respecting query plan ●   Dependency respect: a variable from eac...
DEPENDENCY RESPECT RULE                 Use a dependency respecting query plan ●   Dependency respect: a variable from eac...
DEPENDENCY RESPECT RULE                 Use a dependency respecting query plan ●   Dependency respect: a variable from eac...
SEED TP RULE                       Use a plan with a seed triple pattern ●   Potential seed triple pattern      … is a tri...
NO VOCAB SEED RULE        Avoid a seed triple pattern with vocabulary terms ●   Not only vocabulary term URIs in the seed ...
FILTERING TP RULE           Use a plan where all filtering triple patterns are            as close to the seed triple patt...
Evaluation Procedure ●   Generate all possible plans ●   Execute each plan:     ●   5 runs (+ 1 initial warm-up run)     ●...
Evaluation Query (Example) SELECT ?spec ?genus WHERE {                                                         Of what gen...
Measurements                30                                                                          400               ...
Summary (Linked Data Queries) ●   Theoretical foundations of Linked Data queries     ●   Full-Web semantics, (family of) r...
Chapter 3           Accessing a SPARQL Endpoint           Queries over Multiple Datasets            ➢ Query a given coll...
These slides have been created by                                      Olaf Hartig                                        ...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (ICWE 2012 Ed.)

These are the slides from my ICWE 2012 Tutorial "An Introduction to SPARQL and Queries over Linked Data"


  1. ICWE 2012 Tutorial: An Introduction to SPARQL and Queries over Linked Data
     Chapter 3: Querying Linked Data
     Olaf Hartig, http://olafhartig.de/foaf.rdf#olaf, @olafhartig
     Database and Information Systems Research Group, Humboldt-Universität zu Berlin
  2. Chapter 3
     ● Accessing a SPARQL Endpoint
     ● Queries over Multiple Datasets
     ● Linked Data Queries
     Olaf Hartig - ICWE 2012 Tutorial "An Introduction to SPARQL and Queries over Linked Data" - Chapter 3: Querying Linked Data
  3. SPARQL Endpoints
     ● SPARQL query processing service
     ● Supports the SPARQL protocol
     ● Issuing a SPARQL query is an HTTP GET request with the parameter query, whose value is the URL-encoded string of the SPARQL query:
         GET /sparql?query=PREFIX+rd... HTTP/1.1
         Host: dbpedia.org
         User-agent: my-sparql-client/0.1
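     The URL encoding of the query parameter can be sketched in Java as follows. This is a minimal illustration using only the standard library; the class name SparqlGetUrl and the sample query are assumptions made for the example, not part of the tutorial.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not from the slides): builds the URL of a SPARQL protocol GET request.
class SparqlGetUrl {
    // endpoint + "?query=" + URL-encoded query text
    static String build(String endpoint, String sparql) {
        return endpoint + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample query chosen for illustration only.
        String query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1";
        System.out.println(build("http://dbpedia.org/sparql", query));
    }
}
```

     Spaces become "+" and characters such as "?", "{" and "}" are percent-encoded, which is the shape of the request line on the slide.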
  4. Query Result Formats
     ● For SELECT and ASK queries: XML, JSON, plain text
     ● For CONSTRUCT and DESCRIBE: RDF/XML, Turtle, ...
     ● How to request?
       ● Accept header:
           GET /sparql?query=PREFIX+rd... HTTP/1.1
           Host: dbpedia.org
           User-agent: my-sparql-client/0.1
           Accept: application/sparql-results+json
       ● Non-standard alternative: parameter out
           GET /sparql?out=json&query=... HTTP/1.1
           Host: dbpedia.org
           User-agent: my-sparql-client/0.1
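     Requesting JSON results through the Accept header might look like this with the HTTP client that ships with Java 11 and later. The sketch only builds the request and does not send it; the class name SparqlJsonRequest and the sample ASK query are assumptions for illustration.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not from the slides): a GET request that asks for JSON query results.
class SparqlJsonRequest {
    static HttpRequest build(String endpoint, String sparql) {
        String url = endpoint + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/sparql-results+json") // result format for SELECT and ASK
                .header("User-Agent", "my-sparql-client/0.1")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://dbpedia.org/sparql", "ASK { ?s ?p ?o }");
        System.out.println(request.method() + " " + request.uri());
        System.out.println("Accept: " + request.headers().firstValue("Accept").orElse(""));
    }
}
```

     To actually run the query, the request could be passed to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()); whether a given endpoint honours the header depends on that endpoint.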
  5. SPARQL Client Libraries
     ● More convenient than working on the protocol level:
       ● SPARQL JavaScript Library http://www.thefigtrees.net/lee/blog/2006/04/sparql_calendar_demo_a_sparql.html
       ● ARC for PHP http://arc.semsol.org/
       ● RAP – RDF API for PHP http://www4.wiwiss.fu-berlin.de/bizer/rdfapi/index.html
       ● Jena / ARQ (Java) http://jena.sourceforge.net/
       ● Sesame (Java) http://www.openrdf.org/
       ● SPARQL Wrapper (Python) http://sparql-wrapper.sourceforge.net/
       ● PySPARQL (Python) http://code.google.com/p/pysparql/
  6. SPARQL Client Libraries
     ● Example with Jena ARQ:
         import com.hp.hpl.jena.query.*;

         String service = "...";       // address of the SPARQL endpoint
         String query = "SELECT ...";  // your SPARQL query
         QueryExecution e = QueryExecutionFactory.sparqlService( service, query );
         try {
             ResultSet results = e.execSelect();
             while ( results.hasNext() ) {
                 QuerySolution s = results.nextSolution();
                 // …
             }
         } finally {
             e.close();  // release the execution even if processing a solution fails
         }
  8. SPARQL Endpoints
     ● Several Linked Data sets are exposed via a SPARQL endpoint:
       ● DBpedia http://dbpedia.org/sparql
       ● Musicbrainz http://dbtune.org/musicbrainz/sparql
       ● Semantic Web dog food http://data.semanticweb.org/sparql
       ● etc. http://esw.w3.org/topic/SparqlEndpoints
     ● Send your query, receive the result
     Querying a single dataset is quite boring compared to issuing SPARQL queries over multiple datasets.
  10. Chapter 3
      ● Accessing a SPARQL Endpoint
      ● Queries over Multiple Datasets
        ➢ Query a given collection
        ➢ Manage your own collection
        ➢ Use a query federation system
        ➢ Link traversal based query execution
      ● Linked Data Queries
  11. Querying a Given Collection
      ● Some public SPARQL endpoints provide access to a collection of data from multiple sources:
        ● http://lod.openlinksw.com/sparql
        ● http://sparql.sindice.com/
      ● Pros:
        ● Nothing to set up
        ● Good query execution times
      ● Cons:
        ● Queried data might be out of date
        ● Not all relevant data in the collection
  12. Setting up Your Own Collection
      ● RDF-specific DBMSs:
        ● Virtuoso http://virtuoso.openlinksw.com/
        ● Allegro Graph http://www.franz.com/agraph/allegrograph/
        ● Bigdata http://www.systap.com/bigdata.htm
        ● OWLIM http://www.ontotext.com/owlim
        ● 4store http://4store.org/
        ● Jena TDB http://jena.apache.org/
        ● Sesame http://www.openrdf.org/
        ● etc.
  13. Populating Your Own Collection
      ● Datasets provided as RDF dumps
      ● (Focused) crawling
        ● ldspider http://code.google.com/p/ldspider/
  14. Setting up Your Own Collection
      ● Pros:
        ● All relevant data
        ● Independent of existence, availability, and efficiency of SPARQL endpoints
        ● Good query execution times (once set up properly)
      ● Cons:
        ● Effort to set up
        ● Effort to operate
        ● Queried data might be out of date
  16. SPARQL Endpoint Federation
      ● Idea of federated query processing:
        ● Querying a query federation service (mediator)
        ● Mediator distributes sub-queries to relevant sources
        ● Finally, mediator combines sub-results
      ● Prototypes:
        ● FedX
        ● SPLENDID
        ● ANAPSID
  17. SPARQL Endpoint Federation
      ● Pros:
        ● Queried data is up to date
      ● Cons:
        ● All relevant datasets must be exposed via a SPARQL endpoint
        ● Effort to set up mediator
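      The mediator idea (send sub-queries to the sources, then combine the sub-results) can be sketched in a few lines of Java. This is a toy model under loud assumptions: each "endpoint" is an in-memory list of triples, every triple pattern is sent to every source (no source selection), and the sample triples, including ex:Etna and ex:someDate, are invented for the example. It is not how FedX, SPLENDID, or ANAPSID are implemented.

```java
import java.util.*;

// Toy mediator: triples and triple patterns are 3-element lists; variables start with "?".
class MediatorSketch {
    // Stand-in for one endpoint answering a triple-pattern sub-query over its own triples.
    static List<Map<String, String>> ask(List<List<String>> source, List<String> tp) {
        List<Map<String, String>> out = new ArrayList<>();
        for (List<String> t : source) {
            Map<String, String> mu = new TreeMap<>();
            boolean ok = true;
            for (int i = 0; i < 3 && ok; i++) {
                String x = tp.get(i);
                if (x.startsWith("?")) {
                    String prev = mu.putIfAbsent(x, t.get(i));
                    ok = prev == null || prev.equals(t.get(i));
                } else {
                    ok = x.equals(t.get(i));
                }
            }
            if (ok) out.add(mu);
        }
        return out;
    }

    // Combine sub-results: keep pairs of mappings that agree on their shared variables.
    static List<Map<String, String>> join(List<Map<String, String>> left, List<Map<String, String>> right) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            for (Map<String, String> r : right) {
                boolean compatible = true;
                for (Map.Entry<String, String> e : r.entrySet()) {
                    String v = l.get(e.getKey());
                    if (v != null && !v.equals(e.getValue())) { compatible = false; break; }
                }
                if (compatible) {
                    Map<String, String> merged = new TreeMap<>(l);
                    merged.putAll(r);
                    out.add(merged);
                }
            }
        }
        return out;
    }

    // The mediator: distribute each pattern to all sources, then join the sub-results.
    static List<Map<String, String>> evaluate(List<List<List<String>>> sources, List<List<String>> patterns) {
        List<Map<String, String>> result = new ArrayList<>();
        result.add(new TreeMap<>()); // the empty mapping is compatible with everything
        for (List<String> tp : patterns) {
            List<Map<String, String>> sub = new ArrayList<>();
            for (List<List<String>> source : sources) sub.addAll(ask(source, tp));
            result = join(result, sub);
        }
        return result;
    }

    static List<Map<String, String>> demo() {
        // Hypothetical data split over two sources.
        List<List<String>> a = List.of(
                List.of("ex:Etna", "rdf:type", "umbel-sc:Volcano"),
                List.of("ex:Etna", "p:location", "dbpedia:Italy"));
        List<List<String>> b = List.of(
                List.of("ex:Etna", "p:lastEruption", "ex:someDate"));
        List<List<String>> query = List.of(
                List.of("?v", "rdf:type", "umbel-sc:Volcano"),
                List.of("?v", "p:location", "dbpedia:Italy"),
                List.of("?v", "p:lastEruption", "?ve"));
        return evaluate(List.of(a, b), query);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

      Neither source can answer the whole query alone; only the join over both sub-results binds ?v and ?ve together.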
  18. SPARQL 1.1 Federation Extension
      ● SERVICE pattern in SPARQL 1.1
        ● Explicitly specify query patterns whose execution must be distributed to a remote SPARQL endpoint
            SELECT ?v ?ve WHERE {
              ?v rdf:type umbel-sc:Volcano ;
                 p:location dbpedia:Italy .
              SERVICE <http://volcanos.example.org/query> {
                ?v p:lastEruption ?ve
              }
            }
  19. For all these approaches ...
      ● … you have to know the relevant data sources beforehand
        ● When selecting a SPARQL endpoint over an existing collection of datasets
        ● When setting up your own collection
        ● When configuring your federation system
        ● When using the SERVICE pattern
      ● … you restrict yourself to the selected sources
      ● … you do not tap the full potential of the Web
  21. Main Idea
      ● Intertwine query evaluation with traversal of data links
      ● We alternate between:
        ● Evaluating parts of the query (triple patterns) on a continuously augmented set of data
        ● Looking up URIs in intermediate solutions and adding the retrieved data to the query-local dataset
      Example (slides 22-27; the figure shows the query graph and the queried data):
        ● Query graph, as far as it can be recovered from the figure: ?actor is connected via actor_in to http://.../movie2449, the movie is connected via filming location to ?loc, and ?actor is connected via lives_in to ?loc
        ● Look up http://.../movie2449; the retrieved data contains: http://mdb.../Paul actor_in http://.../movie2449
        ● Intermediate solution: ?actor = http://mdb.../Paul
        ● Look up http://mdb.../Paul; the retrieved data contains: http://mdb.../Paul lives_in http://geo.../Berlin
        ● Solution: ?actor = http://mdb.../Paul, ?loc = http://geo.../Berlin
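      The alternation between evaluation and URI lookup can be sketched as follows. This is a deliberately simplified model, not the tutorial's implementation: the "Web" is an in-memory map from URI to the triples returned when that URI is looked up, all example.org URIs and the exact property names are hypothetical, and the loop re-evaluates query prefixes from scratch instead of working incrementally.

```java
import java.util.*;

// Simplified link traversal: triples and patterns are 3-element lists; variables start with "?".
class LinkTraversalSketch {
    // Evaluate the triple patterns, in order, over the given data.
    static List<Map<String, String>> eval(List<List<String>> patterns, Collection<List<String>> data) {
        List<Map<String, String>> sols = new ArrayList<>();
        sols.add(new TreeMap<>());
        for (List<String> tp : patterns) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> mu : sols) {
                for (List<String> t : data) {
                    Map<String, String> ext = new TreeMap<>(mu);
                    boolean ok = true;
                    for (int i = 0; i < 3 && ok; i++) {
                        String x = tp.get(i);
                        String bound = x.startsWith("?") ? ext.get(x) : x;
                        if (bound == null) ext.put(x, t.get(i)); // unbound variable: bind it
                        else ok = bound.equals(t.get(i));        // constant or bound variable: must agree
                    }
                    if (ok) next.add(ext);
                }
            }
            sols = next;
        }
        return sols;
    }

    static List<Map<String, String>> run(Map<String, List<List<String>>> web, List<List<String>> patterns) {
        Set<List<String>> dataset = new LinkedHashSet<>(); // query-local dataset
        Set<String> lookedUp = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        for (List<String> tp : patterns)                   // start from the URIs mentioned in the query
            for (String term : tp) if (!term.startsWith("?")) todo.add(term);
        while (!todo.isEmpty()) {
            String uri = todo.poll();
            if (!lookedUp.add(uri)) continue;              // each URI is looked up at most once
            dataset.addAll(web.getOrDefault(uri, List.of()));
            // Evaluate parts of the query; URIs in intermediate solutions are looked up next.
            for (int n = 1; n <= patterns.size(); n++)
                for (Map<String, String> mu : eval(patterns.subList(0, n), dataset))
                    todo.addAll(mu.values());
        }
        return eval(patterns, dataset);
    }

    static List<Map<String, String>> demo() {
        String movie = "http://example.org/movie2449", paul = "http://example.org/Paul",
               berlin = "http://example.org/Berlin";
        Map<String, List<List<String>>> web = Map.of(
                movie, List.of(List.of(paul, "actor_in", movie), List.of(movie, "filming_location", berlin)),
                paul, List.of(List.of(paul, "lives_in", berlin)));
        List<List<String>> query = List.of(
                List.of("?actor", "actor_in", movie),
                List.of(movie, "filming_location", "?loc"),
                List.of("?actor", "lives_in", "?loc"));
        return run(web, query);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

      The lives_in triple only enters the query-local dataset after the intermediate solution for ?actor has caused a lookup of the actor's URI, which is the alternation the bullets describe.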
  28. "Real World" Example
      Return phone numbers of authors of ontology engineering papers at ESWC09.
        SELECT DISTINCT ?author ?phone WHERE {
          ?pub swc:isPartOf <http://data.semanticweb.org/conference/eswc/2009/proceedings> .
          ?pub swc:hasTopic ?topic .
          ?topic rdfs:label ?topicLabel .
          FILTER regex( str(?topicLabel), "ontology engineering", "i" ) .
          ?pub swrc:author ?author .
          { ?author owl:sameAs ?authorAlt }
          UNION
          { ?authorAlt owl:sameAs ?author }
          ?authorAlt foaf:phone ?phone
        }
      Result size: 2
      # of retrieved docs: 297
      # of accessed servers: 16
      Avg. execution time: 1 min 30 sec
  29. Summary
      O. Hartig and A. Langegger. A Database Perspective on Consuming Linked Data on the Web. Datenbankspektrum 10(2), 2010
  30. Chapter 3
      ● Accessing a SPARQL Endpoint
      ● Queries over Multiple Datasets
        ➢ Query a given collection
        ➢ Manage your own collection
        ➢ Use a query federation system
        ➢ Link traversal based query execution
      ● Linked Data Queries
        ➢ Foundations
        ➢ Iterator Based Implementation
        ➢ Query Planning
  31. SPARQL Pattern Evaluation
      eval(P, G) = { μ1 , μ2 , ... }
      [Figure: the query pattern P (?actor, ?loc, http://.../movie2449, with edges actor_in, filming location, lives_in) evaluated over a graph G; one solution is ?actor = http://mdb.../Paul, ?loc = http://geo.../Berlin]
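      For the special case where P is a basic graph pattern (a set of triple patterns), eval can be spelled out as below. This is the standard textbook formulation rather than something stated on the slide; dom(μ) denotes the variables bound by μ, vars(P) the variables in P, and μ[P] the triples obtained by replacing the variables in P according to μ.

```latex
\[
  \mathrm{eval}(P, G) \;=\; \{\, \mu \;\mid\; \mathrm{dom}(\mu) = \mathrm{vars}(P) \ \text{and}\ \mu[P] \subseteq G \,\}
\]
% Example from the figure: the mapping
%   \mu_1 = \{\, ?actor \mapsto \text{http://mdb.../Paul},\ ?loc \mapsto \text{http://geo.../Berlin} \,\}
% is in eval(P, G) because every triple of \mu_1[P] occurs in G.
```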
  32. SPARQL Linked Data Query
      Q^P(W) = { μ1 , μ2 , ... }
      [Figure: the same query pattern, now evaluated over a Web of Linked Data W; one solution is ?actor = http://mdb.../Paul, ?loc = http://geo.../Berlin]
  33. Full-Web Semantics
      Q^P(W) = eval( P , AllData(W) ) = { μ1 , μ2 , ... }
  34. Reachability-based Semantics
      ● Seed URIs S
      ● Reachability criterion c
  35. Reachability-based Semantics
      Q_c^{P,S}(W) = eval( P , AllData(W) ) *
      Slides 36-38 repeat this definition for three reachability criteria c: c_All, c_None, and c_Match.
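      The slides only name the three criteria. The sketch below assumes their usual reading, which is not spelled out in this deck: c_All follows the URIs of every triple, c_None follows none (only the seed documents are reached), and c_Match follows the URIs of a triple only if the triple matches some triple pattern of the query. The toy Web with documents A, B, C and the simplification of the criterion to a predicate over (triple, query) are assumptions for illustration.

```java
import java.util.*;
import java.util.function.BiPredicate;

// Toy model: the Web maps a URI to the triples of the document obtained by looking it up.
class ReachabilitySketch {
    // Does triple t match triple pattern tp? Variables start with "?".
    static boolean matches(List<String> t, List<String> tp) {
        Map<String, String> bound = new HashMap<>();
        for (int i = 0; i < 3; i++) {
            String x = tp.get(i);
            if (x.startsWith("?")) {
                String prev = bound.putIfAbsent(x, t.get(i));
                if (prev != null && !prev.equals(t.get(i))) return false;
            } else if (!x.equals(t.get(i))) {
                return false;
            }
        }
        return true;
    }

    // Assumed readings of the three criteria (see the lead-in).
    static final BiPredicate<List<String>, List<List<String>>> C_ALL = (t, query) -> true;
    static final BiPredicate<List<String>, List<List<String>>> C_NONE = (t, query) -> false;
    static final BiPredicate<List<String>, List<List<String>>> C_MATCH =
            (t, query) -> query.stream().anyMatch(tp -> matches(t, tp));

    // URIs of all documents reachable from the seeds under criterion c.
    static Set<String> reachable(Map<String, List<List<String>>> web, Set<String> seeds,
                                 List<List<String>> query,
                                 BiPredicate<List<String>, List<List<String>>> c) {
        Set<String> seen = new TreeSet<>();
        Deque<String> todo = new ArrayDeque<>(seeds);
        while (!todo.isEmpty()) {
            String u = todo.poll();
            if (!web.containsKey(u) || !seen.add(u)) continue; // no document, or already visited
            for (List<String> t : web.get(u))
                if (c.test(t, query)) todo.addAll(t);          // follow the URIs mentioned in t
        }
        return seen;
    }

    static Map<String, List<List<String>>> demoWeb() {
        return Map.of(
                "A", List.of(List.of("A", "knows", "B"), List.of("A", "likes", "C")),
                "B", List.of(List.of("B", "knows", "A")),
                "C", List.of(List.of("C", "likes", "A")));
    }

    public static void main(String[] args) {
        List<List<String>> query = List.of(List.of("?x", "knows", "?y"));
        for (BiPredicate<List<String>, List<List<String>>> c : List.of(C_NONE, C_MATCH, C_ALL))
            System.out.println(reachable(demoWeb(), Set.of("A"), query, c));
    }
}
```

      With seed A and the pattern ( ?x , knows , ?y ), the reachable part grows from the seed document only, to A and B, to all three documents as the criterion is relaxed.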
  39. Computability
      Q_{c_Match}^{P,S}(W)
      ● (Ordinary) Turing machines are unsuitable:
        ● Limited data access capabilities are not properly captured
      ● Web machines
        ● Abiteboul and Vianu, 1997
        ● Mendelzon and Milo, 1997
  40. LD Machine
      ● Multi-tape Turing machine
        ➔ Web Input: # enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙
        ➔ Input
        ➔ Work
        ➔ Output
      ● Access to Web input is restricted
        ● Only by performing a particular procedure in a particular state
  41. Finitely Computable LD Queries
      ➔ Web Input: # enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙
      ➔ Input
      ➔ Work
      ➔ Output: # enc(μ1) # enc(μ2) # ∙ ∙ ∙ # enc(μn) #
      ● For Q there exists an LD machine MQ such that for any W it holds that:
        ● MQ halts after a finite number of computation steps, and
        ● MQ outputs the complete result Q(W)
      [Figure: timeline of computation steps, from step 1 to step k]
  42. Eventually Computable LD Queries
      ➔ Web Input: # enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙
      ➔ Input
      ➔ Work
      ➔ Output: # enc(μ1) # enc(μ2)
      ● For Q there exists an LD machine MQ such that for any W it holds that:
        1. The output always encodes a subset of the query result Q(W), and
        2. Each μ ∈ Q(W) eventually appears on the output
      ✗ No guarantee for termination
      [Figure: timeline of computation steps around step k (k - 3 to k + 2), continuing without end]
  43. Main Results for c_Match-Semantics
      Theorem: Any satisfiable SPARQL-based Linked Data query Q_{c_Match}^{P,S} under c_Match-semantics that is monotonic is at least eventually computable. Any non-monotonic Q_{c_Match}^{P,S} is either finitely computable or not even eventually computable.
      Problem: TERMINATION(c_Match)
        Web Input: W – a (potentially infinite) Web of Linked Data
        Ord. Input: S – a finite but nonempty set of seed URIs
                    P – a SPARQL expression
        Question: Does an LD machine exist that computes Q_{c_Match}^{P,S}(W) and halts?
      Theorem: TERMINATION(c_Match) is not LD machine decidable.
  45. Iterator Based Execution
      Query:
        ?p ex:affiliated_with <http://.../orgaX>
        ?p ex:interested_in ?b
        ?b rdf:type <http://.../Book>
      Seed: <http://.../orgaX>
      One iterator per triple pattern, all working on the query-local dataset:
        I1: tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )
        I2: tp2 = ( ?p , ex:interested_in , ?b )
        I3: tp3 = ( ?b , rdf:type , <http://.../Book> )
      Execution (slides 47-53):
        ● "Next?" is requested from I3, which asks I2, which asks I1
        ● I1 finds <http://.../alice> ex:affiliated_with <http://.../orgaX> in the query-local dataset and returns { ?p = <http://.../alice> }
        ● I2 instantiates tp2 to ( <http://.../alice> , ex:interested_in , ?b ), finds <http://.../alice> ex:interested_in <http://.../b1>, and returns { ?p = <http://.../alice> , ?b = <http://.../b1> }
        ● I3 instantiates tp3 to ( <http://.../b1> , rdf:type , <http://.../Book> ), finds <http://.../b1> rdf:type <http://.../Book>, and returns { ?p = <http://.../alice> , ?b = <http://.../b1> }
  54. Alternative Execution Order
      ● I1: tp1 = ( ?b , rdf:type , <http://.../Book> )
      ● I2: tp2 = ( ?p , ex:interested_in , ?b )
      ● I3: tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX> )
      ● Query: ?p ex:affiliated_with <http://.../orgaX> . ?p ex:interested_in ?b . ?b rdf:type <http://.../Book>
      ● Seed: <http://.../orgaX>
  55. Iterator Based Execution
      ● Same plan, query, and seed as on the previous slide, now shown together with the query-local dataset
  56. Alternative Execution Order
      ● I1, tp1 = ( ?b , rdf:type , <http://.../Book> ): END!
      ● I2, tp2 = ( ?p , ex:interested_in , ?b ): "Next?" request pending
      ● I3, tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX> ): "Next?" request pending
      ● Query-local dataset contains: <http://.../alice> ex:affiliated_with <http://.../orgaX>
  57. Alternative Execution Order
      ● I1, tp1 = ( ?b , rdf:type , <http://.../Book> ): END!
      ● I2, tp2 = ( ?p , ex:interested_in , ?b ): END!
      ● I3, tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX> ): END!
      ● The computed query result may depend on the order of the triple patterns, that is, on the logical query execution plan
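The order dependence shown on these slides can be reproduced with a small Python sketch. It is an illustration under stated assumptions, not the tutorial's implementation: the Web is simulated by a hypothetical `WEB` dictionary that maps a URI to the triples obtained by looking it up, the elided URIs are used as opaque strings, and only URIs in subject and object position are looked up. Each triple pattern becomes a generator that substitutes the incoming bindings, looks up the URIs of the substituted pattern, and matches the pattern against the query-local dataset.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Elided URIs from the slides, kept as opaque strings.
ORG, ALICE, B1, BOOK = ("http://.../orgaX", "http://.../alice",
                        "http://.../b1", "http://.../Book")

# Hypothetical toy Web: URI -> triples in the document found by looking it up.
WEB: Dict[str, List[Triple]] = {
    ORG: [(ALICE, "ex:affiliated_with", ORG)],
    ALICE: [(ALICE, "ex:interested_in", B1)],
    B1: [(B1, "rdf:type", BOOK)],
}


def is_var(term: str) -> bool:
    return term.startswith("?")


def lookup(uri: str, dataset: Set[Triple], seen: Set[str]) -> None:
    """Add the document for `uri` to the query-local dataset (at most once)."""
    if uri not in seen:
        seen.add(uri)
        dataset.update(WEB.get(uri, []))


def match(tp: Triple, triple: Triple, mu: Binding) -> Optional[Binding]:
    """Extend `mu` so that `tp` matches `triple`; None if they do not match."""
    out = dict(mu)
    for pattern_term, value in zip(tp, triple):
        if is_var(pattern_term):
            if out.setdefault(pattern_term, value) != value:
                return None
        elif pattern_term != value:
            return None
    return out


def tp_iterator(tp: Triple, inputs: Iterable[Binding],
                dataset: Set[Triple], seen: Set[str]) -> Iterator[Binding]:
    for mu in inputs:
        bound = tuple(mu.get(t, t) for t in tp)   # substitute known bindings
        for term in (bound[0], bound[2]):         # assumption: skip predicates
            if not is_var(term):
                lookup(term, dataset, seen)
        # Match against a snapshot: later lookups may grow the dataset.
        for triple in list(dataset):
            extended = match(bound, triple, mu)
            if extended is not None:
                yield extended


def execute(plan: List[Triple], seeds: List[str]) -> List[Binding]:
    dataset: Set[Triple] = set()
    seen: Set[str] = set()
    for uri in seeds:
        lookup(uri, dataset, seen)
    results: Iterable[Binding] = [{}]
    for tp in plan:                               # chain I1, I2, I3, ...
        results = tp_iterator(tp, results, dataset, seen)
    return list(results)


GOOD_PLAN = [("?p", "ex:affiliated_with", ORG),
             ("?p", "ex:interested_in", "?b"),
             ("?b", "rdf:type", BOOK)]
ALT_PLAN = list(reversed(GOOD_PLAN))

print(execute(GOOD_PLAN, [ORG]))
print(execute(ALT_PLAN, [ORG]))
```

With the seed <http://.../orgaX>, the first order yields the single solution from the walk-through, while the reversed order yields no solution, because the only triple in the seed document does not match ( ?b , rdf:type , <http://.../Book> ).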
  58. Chapter 3
      ● Accessing a SPARQL Endpoint
      ● Queries over Multiple Datasets
        ➢ Query a given collection
        ➢ Manage your own collection
        ➢ Use a query federation system
        ➢ Link traversal based query execution
      ● Linked Data Queries
        ➢ Foundations
        ➢ Iterator Based Implementation
        ➢ Query Planning
  59. Query Plan Selection
      ● Assessment criteria:
        ● Cost (query execution time)
        ● Benefit (size of the computed result)
      ● Cost and benefit must be estimated without executing the plan
      ● Estimation is impossible due to "zero knowledge"
      ● Heuristic Based Plan Selection
        ● DEPENDENCY RESPECT RULE
        ● SEED TP RULE
        ● NO VOCAB SEED RULE
        ● FILTERING TP RULE
      ● Assumptions about Q_c^{P,S}:
        ● cMatch-semantics
        ● P refers to instance data
        ● S = uris(P)
  61. DEPENDENCY RESPECT RULE
      ● Use a dependency respecting query plan
      ● Dependency respect: every triple pattern after the first contains a variable that already occurs in one of the preceding triple patterns
      ● Query: ?p ex:affiliated_with <http://.../orgaX> . ?p ex:interested_in ?b . ?b rdf:type <http://.../Book>
      ● Plan (√ dependency respecting):
        I1: tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )
        I2: tp2 = ( ?p , ex:interested_in , ?b )
        I3: tp3 = ( ?b , rdf:type , <http://.../Book> )
  62. DEPENDENCY RESPECT RULE
      ● Plan:
        I1: tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )
        I2: tp2 = ( ?p , ex:interested_in , ?b )
        I3: tp3 = ( ?b , rdf:type , <http://.../Book> )
  63. DEPENDENCY RESPECT RULE
      ● Plan (not dependency respecting, because ?b in tp2 does not occur in tp1):
        I1: tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )
        I2: tp2 = ( ?b , rdf:type , <http://.../Book> )
        I3: tp3 = ( ?p , ex:interested_in , ?b )
  64. DEPENDENCY RESPECT RULE
      ● Plan:
        I1: tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )
        I2: tp2 = ( ?b , rdf:type , <http://.../Book> )
        I3: tp3 = ( ?p , ex:interested_in , ?b )
      ● Rationale: avoid Cartesian products
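The dependency respect condition is easy to check mechanically. The following Python sketch is one possible reading of the definition, with hypothetical helper names: a plan is a list of triple patterns, terms starting with "?" are variables, and every pattern after the first must share a variable with the patterns before it.

```python
from typing import List, Set, Tuple

TriplePattern = Tuple[str, str, str]


def variables(tp: TriplePattern) -> Set[str]:
    """Variables of a triple pattern (terms that start with '?')."""
    return {term for term in tp if term.startswith("?")}


def is_dependency_respecting(plan: List[TriplePattern]) -> bool:
    """True if each pattern after the first reuses an earlier variable."""
    seen: Set[str] = set()
    for position, tp in enumerate(plan):
        if position > 0 and not (variables(tp) & seen):
            return False  # would require a Cartesian product
        seen |= variables(tp)
    return True


AFFILIATED = ("?p", "ex:affiliated_with", "http://.../orgaX")
INTERESTED = ("?p", "ex:interested_in", "?b")
IS_BOOK = ("?b", "rdf:type", "http://.../Book")

print(is_dependency_respecting([AFFILIATED, INTERESTED, IS_BOOK]))
print(is_dependency_respecting([AFFILIATED, IS_BOOK, INTERESTED]))
```

The first plan passes the check; the second fails it because ( ?b , rdf:type , <http://.../Book> ) shares no variable with the pattern that precedes it.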
  65. SEED TP RULE
      ● Use a plan with a seed triple pattern
      ● Potential seed triple pattern: a triple pattern that contains at least one HTTP URI
      ● Seed triple pattern of a plan: the first triple pattern in the plan, provided it is a potential seed triple pattern
      ● Recall: S = uris(P)
      ● Rationale: good starting point
      ● Query (√ = potential seed triple pattern):
        ?p ex:affiliated_with <http://.../orgaX> √
        ?p ex:interested_in ?b √
        ?b rdf:type <http://.../Book> √
  66. NO VOCAB SEED RULE
      ● Avoid a seed triple pattern with vocabulary terms
      ● The URIs in the seed triple pattern should not all be vocabulary term URIs
      ● Patterns to avoid:
        ?s ex:any_property ?o
        ?s rdf:type ex:any_class
      ● Rationale: URIs for vocabulary terms usually resolve to vocabulary definitions with little instance data
      ● Query (√ = acceptable seed triple pattern):
        ?p ex:affiliated_with <http://.../orgaX> √
        ?p ex:interested_in ?b
        ?b rdf:type <http://.../Book>
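The two seed rules can be approximated with a simple syntactic test. The Python sketch below rests on assumptions that are not stated on the slides: every non-variable term is taken to be an HTTP URI (no literals), and "vocabulary term" is approximated as any URI in predicate position plus the object of an rdf:type pattern. The function names are made up for illustration.

```python
from typing import List, Tuple

TriplePattern = Tuple[str, str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def is_potential_seed(tp: TriplePattern) -> bool:
    """SEED TP RULE: the pattern contains at least one URI."""
    return any(not is_var(term) for term in tp)


def instance_uris(tp: TriplePattern) -> List[str]:
    """URIs of the pattern that are not treated as vocabulary terms."""
    subject, predicate, obj = tp
    uris = []
    if not is_var(subject):
        uris.append(subject)
    # Assumption: the object of rdf:type is a class, i.e. a vocabulary term.
    if not is_var(obj) and predicate != "rdf:type":
        uris.append(obj)
    return uris


def satisfies_no_vocab_seed_rule(tp: TriplePattern) -> bool:
    """NO VOCAB SEED RULE: not only vocabulary term URIs in the pattern."""
    return bool(instance_uris(tp))


QUERY: List[TriplePattern] = [
    ("?p", "ex:affiliated_with", "http://.../orgaX"),
    ("?p", "ex:interested_in", "?b"),
    ("?b", "rdf:type", "http://.../Book"),
]

for tp in QUERY:
    print(tp, is_potential_seed(tp), satisfies_no_vocab_seed_rule(tp))
```

Under these assumptions all three patterns are potential seed triple patterns, but only the first one also satisfies the NO VOCAB SEED RULE, which agrees with the check marks on the two slides.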
  67. FILTERING TP RULE
      ● Use a plan where all filtering triple patterns are as close to the seed triple pattern as possible
      ● Filtering triple pattern: each variable already occurs in one of the preceding triple patterns
      ● For each result consumed as input, a filtering TP can only report 1 or 0 results as output
      ● Rationale: reduce cost
      ● Example plan:
        I1: tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) returns { ?p = <http://.../alice> }
        I2: tp2 = ( ?p , ex:interested_in , ?b ), evaluated as ( <http://.../alice> , ex:interested_in , ?b ), returns { ?p = <http://.../alice> , ?b = <http://.../b1> }
        I3: tp3 = ( ?b , rdf:type , <http://.../Book> ), evaluated as ( <http://.../b1> , rdf:type , <http://.../Book> )
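Which patterns of a plan are filtering triple patterns follows directly from the definition. A minimal Python sketch, with a hypothetical `filtering_positions` helper that reports 1-based positions:

```python
from typing import List, Set, Tuple

TriplePattern = Tuple[str, str, str]


def variables(tp: TriplePattern) -> Set[str]:
    return {term for term in tp if term.startswith("?")}


def filtering_positions(plan: List[TriplePattern]) -> List[int]:
    """1-based positions of patterns whose variables all occurred earlier."""
    seen: Set[str] = set()
    positions = []
    for position, tp in enumerate(plan, start=1):
        if position > 1 and variables(tp) <= seen:
            positions.append(position)
        seen |= variables(tp)
    return positions


PLAN: List[TriplePattern] = [
    ("?p", "ex:affiliated_with", "http://.../orgaX"),
    ("?p", "ex:interested_in", "?b"),
    ("?b", "rdf:type", "http://.../Book"),
]

print(filtering_positions(PLAN))
```

In the example plan only tp3 is a filtering triple pattern: once ?b is bound, ( ?b , rdf:type , <http://.../Book> ) contains no unbound variable, so it either confirms the incoming solution or discards it.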
  68. Evaluation Procedure
      ● Generate all possible plans
      ● Execute each plan:
        ● 5 runs (+ 1 initial warm-up run)
        ● Use an initially empty query-local dataset for each run
      ● Measure for each plan:
        ● Avg. execution time
        ● Avg. number of RDF documents retrieved during execution
        ● Avg. number of query results
  69. Evaluation Query (Example)
      Of what genus are the species that are
      ● classified in the same family as the American Badger,
      ● and expected in the same states as the American Badger?

      SELECT ?spec ?genus WHERE {
        geospecies:4qyn7 gs:inFamily ?fam .
        ?fam skos:narrowerTransitive ?spec .
        ?spec skos:closeMatch ?sp2 .
        ?sp2 rdfs:subClassOf ?genus .
        ?spec gs:isExpectedIn ?loc .
        geospecies:4qyn7 gs:isExpectedIn ?loc .
        ?loc rdf:type gs:State .
      }

      ● 2 potential seed triple patterns that satisfy our NO VOCAB SEED RULE
      ● 56 different dependency respecting plans, each of which contains 2 filtering TPs
      Picture source: Wikipedia
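The figures quoted for the example query can be checked by brute force over all orderings of its seven triple patterns. The Python sketch below makes the same simplifying assumptions as the rules suggest, none of which is spelled out on the slide: vocabulary terms are approximated as URIs in predicate position and the object of rdf:type, a plan must start with a seed triple pattern that passes the NO VOCAB SEED RULE, and the helper names are hypothetical.

```python
from itertools import permutations
from typing import Sequence, Set, Tuple

TriplePattern = Tuple[str, str, str]

BADGER = "geospecies:4qyn7"
BGP = [
    (BADGER, "gs:inFamily", "?fam"),
    ("?fam", "skos:narrowerTransitive", "?spec"),
    ("?spec", "skos:closeMatch", "?sp2"),
    ("?sp2", "rdfs:subClassOf", "?genus"),
    ("?spec", "gs:isExpectedIn", "?loc"),
    (BADGER, "gs:isExpectedIn", "?loc"),
    ("?loc", "rdf:type", "gs:State"),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def variables(tp: TriplePattern) -> Set[str]:
    return {term for term in tp if is_var(term)}


def is_valid_seed(tp: TriplePattern) -> bool:
    """Pattern has a URI that is not (by our approximation) a vocabulary term."""
    subject, predicate, obj = tp
    return not is_var(subject) or (not is_var(obj) and predicate != "rdf:type")


def is_dependency_respecting(plan: Sequence[TriplePattern]) -> bool:
    seen = set(variables(plan[0]))
    for tp in plan[1:]:
        if not (variables(tp) & seen):
            return False
        seen |= variables(tp)
    return True


def count_filtering(plan: Sequence[TriplePattern]) -> int:
    """Number of patterns whose variables all occur in preceding patterns."""
    seen = set(variables(plan[0]))
    count = 0
    for tp in plan[1:]:
        if variables(tp) <= seen:
            count += 1
        seen |= variables(tp)
    return count


seeds = [tp for tp in BGP if is_valid_seed(tp)]
plans = [plan for plan in permutations(BGP)
         if is_valid_seed(plan[0]) and is_dependency_respecting(plan)]

print(len(seeds), "seed triple patterns")
print(len(plans), "dependency respecting plans")
print({count_filtering(plan) for plan in plans}, "filtering TPs per plan")
```

Under this reading the enumeration agrees with the slide: the two patterns that mention geospecies:4qyn7 are the only acceptable seeds, 56 orderings are dependency respecting, and each of them contains exactly 2 filtering triple patterns.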
  70. Measurements
      [Figure: two plots of the number of query results and the number of retrieved documents against query execution time (in seconds); two charts, "1st Filtering TP" and "2nd Filtering TP", showing the percentage of plans in each group with a filtering TP in specific positions, by TP position (1 to 7) in the ordered BGP]
  71. Summary (Linked Data Queries)
      ● Theoretical foundations of Linked Data queries
        ● Full-Web semantics, (family of) reachability based semantics
        ● Theoretical properties of queries (e.g. computability)
      ● Link traversal based query execution
        ● Novel paradigm for executing Linked Data queries
        ● Sound and complete for conjunctive Linked Data queries under cMatch-semantics
      ● Iterator implementation of the LTBQE paradigm
        ● Trades off completeness for a termination guarantee
        ● Degree of completeness depends on execution order of TPs
        ● Heuristic based plan selection
  73. These slides have been created by Olaf Hartig (http://olafhartig.de). This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/).
