Querying the Web of Data
Kennissystemen, December 2010
Rinke Hoekstra
Presentation at the Knowledge Systems course of the UvA (December 2010)

Overview
- Linked (Open) Data
  - The Web of Data
  - Scalability issues
- Technology
  - RDF syntaxes
  - RDF storage and querying

The Semantic Web Ideology
- Identity is everything
- Partial solutions are great too!
- Layer cake
- OWL

The Web of Data
… does it exist?

Linked Data

Semantic Web
- Initially: 'metadata' for web pages
- Since ~2006: the 'Web of Data'
  - The Semantic Web as a data source in its own right
  - Linked Data
  - A 'database-esque' Web
    - RDF triple stores
    - Query languages

Storage (on the Web)
- As documents
  - .rdf, .n3, .ttl, .html
- RDF triple stores
  - Sesame, Joseki, 4store, AllegroGraph, OpenLink Virtuoso, SDB/TDB, Open Calais, SWI-Prolog
- Reasoners 'on top', or via DIG
  - Pellet, OWLIM, etc.
- SPARQL endpoints
  - Results as JSON, XML, CSV, etc.

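SPARQL endpoints return SELECT results in standard serializations. As a sketch, assuming the W3C SPARQL Query Results JSON format (a `head.vars` list plus one object per solution under `results.bindings`), a client can turn a response into rows using only the standard library. The sample response below is made up for illustration; a real client would fetch it over HTTP from an endpoint.

```python
import json

# A made-up response in the SPARQL Query Results JSON format.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["city", "code"]},
  "results": {"bindings": [
    {"city": {"type": "uri", "value": "http://example.org/geo/Amsterdam"},
     "code": {"type": "literal", "value": "020"}},
    {"city": {"type": "uri", "value": "http://example.org/geo/Rotterdam"}}
  ]}
}
"""


def parse_select_results(text):
    """Return (variables, rows) for a SPARQL SELECT result in JSON.

    Each row maps every result variable to its value, or to None when the
    variable is unbound in that solution.
    """
    doc = json.loads(text)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        rows.append({v: (binding[v]["value"] if v in binding else None)
                     for v in variables})
    return variables, rows


variables, rows = parse_select_results(SAMPLE_RESPONSE)
for row in rows:
    print(row)
```

Unbound variables simply do not appear in a binding object, which is why the sketch maps them to None.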
Data and the Web
- Need to add this 'meta' to my 'data'
- 'Linking' data across sites
- Web of Documents and the Web of Data
- Old-fashioned HTML:
    <link rel="meta" type="application/rdf+xml" href="http://www.leibnizcenter.org/~hoekstra/foaf.rdf" title="FOAF">
- URL-based: HTTP 303 'See Other' (http://www.w3.org/TR/swbp-vocab-pub/)
- RDFa

BBC Music

Integration: 303 See Other

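The 303 pattern separates the URI of a thing (a person, an album) from the URIs of documents that describe it: a GET on the thing's URI is answered with 303 See Other and a Location pointing at an RDF or HTML description, chosen by content negotiation. The function below is a minimal sketch of that decision only. The /id/, /data/ and /page/ URI layout is a hypothetical assumption, and the Accept check is deliberately naive (it ignores q-values).

```python
# Hypothetical URI layout (an assumption for this sketch):
#   /id/<name>        the thing itself (not a document)
#   /data/<name>.rdf  an RDF/XML description of the thing
#   /page/<name>      an HTML page about the thing


def resolve(path, accept):
    """Return (status, location) for a GET on `path` with the given Accept header."""
    if not path.startswith("/id/"):
        return 200, None  # an ordinary document: serve it directly
    name = path[len("/id/"):]
    # Naive content negotiation: only checks whether RDF/XML is mentioned.
    if "application/rdf+xml" in accept:
        return 303, "/data/" + name + ".rdf"
    return 303, "/page/" + name


print(resolve("/id/hoekstra", "application/rdf+xml"))  # (303, '/data/hoekstra.rdf')
print(resolve("/id/hoekstra", "text/html"))            # (303, '/page/hoekstra')
```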
Integration: Inline
- RDFa
- Attributes on XHTML elements
- http://www.w3.org/TR/xhtml-rdfa-primer

Integration: RDFa Example
In XHTML:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:cal="http://www.w3.org/2002/12/cal/ical#">
  <head><title>Jo's Friends and Family Blog</title></head>
  <body>
    <p typeof="cal:Vevent">
      I'm holding
      <span property="cal:summary">one last summer Barbecue</span>,
      on
      <span property="cal:dtstart" content="20070916T1600-0500">
        September 16th at 4pm.
      </span>
    </p>
  </body>
</html>
```
In RDF:
```turtle
_:blanknode0 rdf:type cal:Vevent ;
    cal:summary "one last summer Barbecue" ;
    cal:dtstart "20070916T1600-0500" .
```

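To make the correspondence between the two views concrete, here is a deliberately simplified extraction sketch. It is not a conforming RDFa processor: it assumes the prefix mapping is passed in explicitly (ElementTree does not expose xmlns declarations), and it only handles typeof (a new blank node typed with that class) and property on descendant elements (taking content if present, else the element text).

```python
import xml.etree.ElementTree as ET

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The event markup from the example, minus the XML declaration and DOCTYPE.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:cal="http://www.w3.org/2002/12/cal/ical#">
  <body>
    <p typeof="cal:Vevent">
      I'm holding
      <span property="cal:summary">one last summer Barbecue</span>,
      on
      <span property="cal:dtstart" content="20070916T1600-0500">
        September 16th at 4pm.
      </span>
    </p>
  </body>
</html>"""


def expand(curie, prefixes):
    """Expand a CURIE such as 'cal:Vevent' using an explicit prefix map."""
    prefix, local = curie.split(":", 1)
    return prefixes[prefix] + local


def extract(doc, prefixes):
    """Toy RDFa-style extraction: typeof + property only (not conforming RDFa)."""
    triples = []
    bnode_count = 0
    for elem in ET.fromstring(doc).iter():
        if "typeof" not in elem.attrib:
            continue
        subject = "_:b%d" % bnode_count  # each typeof element gets a new blank node
        bnode_count += 1
        triples.append((subject, RDF_TYPE, expand(elem.get("typeof"), prefixes)))
        for child in elem.iter():
            if child is elem or "property" not in child.attrib:
                continue
            # Prefer the machine-readable @content over the visible text.
            value = child.get("content")
            if value is None:
                value = "".join(child.itertext()).strip()
            triples.append((subject, expand(child.get("property"), prefixes), value))
    return triples


CAL = "http://www.w3.org/2002/12/cal/ical#"
for triple in extract(DOC, {"cal": CAL}):
    print(triple)
```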
Legal Information Retrieval for Laymen

Example

So, where's that data?
I repeat: does it really exist?

Linked Open Data

November 2009: 13.1 billion triples, 142 million links

September 2010: 25 billion triples, 395 million links

Scalability
- How to deal with massive amounts of data?
- Consequences for reasoning
  - Billion Triple Challenge (864.8 million triples)
- Consequences for querying
  - Table lookups, joins, etc.
- … and what about dealing with change, provenance, trust?

A rough idea…
- I can crash a DL reasoner using an ontology of ~15 classes and 5 individuals (honestly)
- What if my ontology contains thousands of classes and billions of individuals?

Reasoning
- Reasoning with
  - inconsistent knowledge
  - incomplete knowledge
- Complete vs. incomplete reasoning

Reasoning
- When?
  - Real-time vs. in advance
- Lightweight reasoning (RDFS, OWL 2 RL)
  - Implementable using forward-chaining rules
  - Still problems with scalability
- Distributed reasoning (DAS-3)
  - MaRVIN
    - 'SpeedDate' distribution of triples across nodes
  - MapReduce
    - Full closure of the BTC in 57 minutes
    - Output: 30B triples
- And what to do with the results?

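The claim that lightweight reasoning is implementable with forward-chaining rules can be illustrated by a naive fixpoint loop over four RDFS entailment rules (rdfs2 domain, rdfs3 range, rdfs9 and rdfs11 for subclasses), applied to the uva example used later in these slides. This is a toy sketch with prefixed names as plain strings, not how MaRVIN or any production reasoner works; its nested loop is quadratic in the number of triples per round, which hints at why billion-triple closures call for distributed approaches.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def rdfs_closure(triples):
    """Apply a few RDFS rules by naive forward chaining until nothing new is derived."""
    closure = set(triples)
    while True:
        new = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))   # rdfs11: subClassOf is transitive
                if p == RDF_TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, RDF_TYPE, o2))   # rdfs9: inherit superclass types
                if p2 == DOMAIN and p == s2:
                    new.add((s, RDF_TYPE, o2))   # rdfs2: subject typed by domain
                if p2 == RANGE and p == s2:
                    new.add((o, RDF_TYPE, o2))   # rdfs3: object typed by range
        if new <= closure:  # fixpoint reached
            return closure
        closure |= new


facts = {
    ("uva:AssociateProfessor", SUBCLASS, "uva:StaffMember"),
    ("uva:teaches", DOMAIN, "uva:AssociateProfessor"),
    ("uva:teaches", RANGE, "uva:Course"),
    (":radboud", "uva:teaches", "courses:ks2009"),
}
inferred = rdfs_closure(facts) - facts
for triple in sorted(inferred):
    print(triple)
```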
2 Degrees from Kevin Bacon
```sparql
PREFIX p: <http://dbpedia.org/property/>
SELECT ?film1 ?actor1 ?film2 ?actor2
WHERE {
  ?film1 p:starring <http://dbpedia.org/resource/Kevin_Bacon> .
  ?film1 p:starring ?actor1 .
  ?film2 p:starring ?actor1 .
  ?film2 p:starring ?actor2 .
}
```
DBpedia: 150M triples

Another rough idea…
- 1 billion triples in MySQL
  - Load time: … a couple of hours
  - Simple table lookup (one-variable query): … about 5 minutes
  - Single join (two-variable query): … a couple of hours
- Better indexes?
  - Hard-disk access times are the bottleneck (9 ms)
- More targeted reasoning, querying, federation

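One way to see why joins dominate is to store triples in a single relational table, roughly the MySQL setup described above. Each triple pattern of the Kevin Bacon query then becomes one copy of the table, and every shared variable becomes a join condition. The sketch uses SQLite from Python's standard library with a few made-up triples (not DBpedia data); the two indexes loosely mimic the permutation indexes many triple stores keep, although whether SQLite's planner uses them depends on the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
# Two permutation indexes, in the spirit of the SPO/POS indexes of triple stores.
conn.execute("CREATE INDEX idx_spo ON triples (s, p, o)")
conn.execute("CREATE INDEX idx_pos ON triples (p, o, s)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("film:A", "p:starring", "Kevin_Bacon"),
    ("film:A", "p:starring", "actor:X"),
    ("film:B", "p:starring", "actor:X"),
    ("film:B", "p:starring", "actor:Y"),
])

# One table alias per triple pattern; shared variables become join conditions.
QUERY = """
SELECT t1.s AS film1, t2.o AS actor1, t3.s AS film2, t4.o AS actor2
FROM triples AS t1
JOIN triples AS t2 ON t2.s = t1.s
JOIN triples AS t3 ON t3.o = t2.o
JOIN triples AS t4 ON t4.s = t3.s
WHERE t1.p = 'p:starring' AND t1.o = 'Kevin_Bacon'
  AND t2.p = 'p:starring' AND t3.p = 'p:starring' AND t4.p = 'p:starring'
"""
rows = conn.execute(QUERY).fetchall()
print(len(rows))  # 6
```

Even four toy triples yield six rows, because nothing in the query forces ?actor1, ?actor2 or the two films to differ.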
SPARQL
Querying the linked data cloud

Languages
- Multiple languages
  - RDF, RDFS and OWL
- Multiple syntaxes
  - RDF/XML, Turtle (restricted N3), N-Triples
  - Functional Syntax, Manchester Syntax, OWL/XML
- RDF
  - Triples <subject, predicate, object>
  - Distributed
  - Always about something else
  - … but can be about other RDF triples as well.

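Because RDF is just a set of <subject, predicate, object> triples, a statement can itself be described by other triples. One standard mechanism is the RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object). The sketch below represents triples as Python tuples; the statement identifier _:stmt1 and the example.org source URI are hypothetical.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(triple, statement_id):
    """Describe `triple` as a resource, using the RDF reification vocabulary."""
    return [
        Triple(statement_id, RDF + "type", RDF + "Statement"),
        Triple(statement_id, RDF + "subject", triple.subject),
        Triple(statement_id, RDF + "predicate", triple.predicate),
        Triple(statement_id, RDF + "object", triple.object),
    ]


claim = Triple("http://www.uva.nl/people#radboud",
               "http://www.uva.nl/rdf#teaches",
               "http://www.uva.nl/courses#ks2009")
graph = [claim] + reify(claim, "_:stmt1")
# A triple about the triple (the source URI is hypothetical).
graph.append(Triple("_:stmt1", "http://purl.org/dc/terms/source",
                    "http://example.org/course-catalogue"))
print(len(graph))  # 6
```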
Languages: RDF(S)/XML
```xml
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:uva="http://www.uva.nl/rdf#"
    xml:base="http://www.uva.nl/people">
  <rdf:Description rdf:ID="radboud">
    <rdf:type rdf:resource="http://www.uva.nl/rdf#AssociateProfessor"/>
    <uva:name>Radboud Winkels</uva:name>
    <uva:teaches rdf:resource="http://www.uva.nl/courses#ks2009"/>
  </rdf:Description>
  <uva:Course rdf:about="http://www.uva.nl/courses#ks2009"/>
  <rdfs:Class rdf:about="http://www.uva.nl/rdf#AssociateProfessor">
    <rdfs:subClassOf rdf:resource="http://www.uva.nl/rdf#StaffMember"/>
  </rdfs:Class>
  <owl:ObjectProperty rdf:about="http://www.uva.nl/rdf#teaches">
    <rdfs:domain rdf:resource="http://www.uva.nl/rdf#AssociateProfessor"/>
    <rdfs:range rdf:resource="http://www.uva.nl/rdf#Course"/>
  </owl:ObjectProperty>
</rdf:RDF>
```

Languages: FS
```
Prefix(:=<http://www.uva.nl/people#>)
Prefix(owl:=<http://www.w3.org/2002/07/owl#>)
Prefix(uva:=<http://www.uva.nl/rdf#>)
Prefix(courses:=<http://www.uva.nl/courses#>)
Ontology(
  Declaration(Class(uva:Course))
  Declaration(Class(uva:StaffMember))
  Declaration(Class(uva:AssociateProfessor))
  SubClassOf(uva:AssociateProfessor uva:StaffMember)
  Declaration(DataProperty(uva:name))
  Declaration(ObjectProperty(uva:teaches))
  ObjectPropertyDomain(uva:teaches uva:AssociateProfessor)
  ObjectPropertyRange(uva:teaches uva:Course)
  Declaration(NamedIndividual(courses:ks2009))
  Declaration(NamedIndividual(:radboud))
  ObjectPropertyAssertion(uva:teaches :radboud courses:ks2009)
  DataPropertyAssertion(uva:name :radboud "Radboud Winkels")
)
```

Languages: Turtle
```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix uva: <http://www.uva.nl/rdf#> .
@prefix courses: <http://www.uva.nl/courses#> .
@prefix : <http://www.uva.nl/people#> .

uva:AssociateProfessor a rdfs:Class ;
    rdfs:subClassOf uva:StaffMember .
uva:teaches a owl:ObjectProperty ;
    rdfs:domain uva:AssociateProfessor ;
    rdfs:range uva:Course .
:radboud a uva:AssociateProfessor ;
    uva:name "Radboud Winkels"^^xsd:string ;
    uva:teaches courses:ks2009 .
courses:ks2009 a uva:Course .
```

Turtle Syntax
Simple abbreviations (the second line means the same as the first):
```turtle
:x geo:name "A'dam" . :x geo:areacode "020" .
:x geo:name "A'dam" ; geo:areacode "020" .
```
Nesting (shorthand for the explicit blank node form below):
```turtle
:nl geo:capital [ geo:name "A'dam" ] .
```
Blank nodes:
```turtle
:nl geo:capital _:bnode1 .
_:bnode1 geo:name "A'dam" .
```

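The abbreviations are pure syntax: each form expands to the same set of plain triples. The sketch below is a hypothetical helper, not part of any Turtle parser. It expands a predicate-object list (the ';' form) and treats a nested list of pairs as a '[ … ]' blank node that receives a fresh label such as _:b1; literal values are written as bare Python strings for simplicity.

```python
import itertools

_bnode_ids = itertools.count(1)


def fresh_bnode():
    """Return a new blank node label (_:b1, _:b2, ...)."""
    return "_:b%d" % next(_bnode_ids)


def expand(subject, pred_objs):
    """Expand 'subject p1 o1 ; p2 o2 .' into one triple per (predicate, object).

    An object given as a list of (predicate, object) pairs stands for a nested
    '[ ... ]' blank node: it gets a fresh label and is expanded recursively.
    """
    triples = []
    for p, o in pred_objs:
        if isinstance(o, list):
            node = fresh_bnode()
            triples.extend(expand(node, o))
            o = node
        triples.append((subject, p, o))
    return triples


# :x geo:name "A'dam" ; geo:areacode "020" .
semicolon = expand(":x", [("geo:name", "A'dam"), ("geo:areacode", "020")])
# :nl geo:capital [ geo:name "A'dam" ] .
nested = expand(":nl", [("geo:capital", [("geo:name", "A'dam")])])
print(semicolon)
print(nested)
```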
Querying
- Originally there were many languages
  - SPARQL, nRQL, SeRQL, etc.
- SPARQL:
  - SPARQL Query Language for RDF
  - http://www.w3.org/TR/rdf-sparql-query/
  - Version 1.1 in the making…

Do you know SQL?
- Formulate a query on the relational model
  - students(name, age, address)
- Structured Query Language
```sql
SELECT name      -- data needed
FROM students    -- data source
WHERE age > 20   -- data constraint
```

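For comparison with the SPARQL queries that follow, the students query can be tried against an in-memory SQLite database from Python's standard library. The three student rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, age INTEGER, address TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?, ?)", [
    ("Anna", 19, "Amsterdam"),
    ("Bram", 23, "Utrecht"),
    ("Carla", 21, "Leiden"),
])

# SELECT (data needed) FROM (data source) WHERE (data constraint)
names = [row[0] for row in conn.execute(
    "SELECT name FROM students WHERE age > 20 ORDER BY name")]
print(names)  # ['Bram', 'Carla']
```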
SPARQL Query Syntax
- Inspired by SQL (select-from-where)
- SELECT: the entities (variables) you want to return
    SELECT ?city
- FROM: a data source (RDF graph)
    FROM <http://example.org/geoData.rdf>
- WHERE: the (sub)graph you want to get information from
    WHERE { ?city geo:areacode "010" . }
- Including additional constraints on objects, using operators
    WHERE { ?city geo:areacode ?c . FILTER (?c > 010) }
```sparql
PREFIX geo: <http://example.org/geo/>
SELECT ?city
FROM <http://example.org/geoData.rdf>
WHERE { ?city geo:areacode ?c .
        FILTER (?c > 010)
}
```
Note: ?c > 010 is a numeric comparison (010 is the integer 10), so it only holds for area codes stored as numbers, such as "020"^^xsd:integer, not for plain strings such as "010".

SPARQL Graph Patterns
- The WHERE clause specifies a graph pattern
  - the pattern should be matched
  - the pattern can match more than once
- Graph pattern:
  - an RDF graph
  - with some nodes/edges as variables
[Figure: a graph pattern over the nodes :EuropeanCountry and "020"^^xsd:integer, with edges rdf:type and :hasCapital; three of its nodes/edges are variables, shown as "?"]

Basis: triple patterns
- Triples with one or more variables
- Turtle syntax
```sparql
?x geo:hasCapital :Amsterdam
?x geo:hasCapital ?y
?x geo:areacode "020"^^xsd:integer
?x ?p ?y
```
- All of them match the graph:
[Figure: :Netherlands geo:hasCapital :Amsterdam; :Amsterdam geo:areacode "020"^^xsd:integer]

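Matching a single triple pattern amounts to trying each triple in the graph and binding the variables consistently. The minimal sketch below encodes the example graph as Python tuples, writing variables as strings that start with '?' and the typed literal as the string '"020"^^xsd:integer'; these encodings are assumptions made for illustration, not how a triple store represents terms.

```python
GRAPH = {
    (":Netherlands", "geo:hasCapital", ":Amsterdam"),
    (":Amsterdam", "geo:areacode", '"020"^^xsd:integer'),
}


def match(pattern, graph):
    """Return one binding dict per triple in `graph` that matches `pattern`.

    Terms starting with '?' are variables; a variable used twice in the
    pattern must bind to the same term both times.
    """
    solutions = []
    for triple in sorted(graph):  # sorted only to make the output order stable
        binding = {}
        for pat, term in zip(pattern, triple):
            if pat.startswith("?"):
                if binding.setdefault(pat, term) != term:
                    break  # repeated variable with conflicting values
            elif pat != term:
                break  # constant does not match
        else:
            solutions.append(binding)
    return solutions


for pattern in [("?x", "geo:hasCapital", ":Amsterdam"),
                ("?x", "geo:hasCapital", "?y"),
                ("?x", "geo:areacode", '"020"^^xsd:integer'),
                ("?x", "?p", "?y")]:
    print(pattern, "->", match(pattern, GRAPH))
```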
Conjunctions: several patterns
- A pattern with several graphs; all must match
```sparql
PREFIX geo: <http://example.org/geo/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?x
FROM <http://example.org/geoData.rdf>
WHERE { { ?x geo:hasCapital ?y } { ?y geo:areacode "020"^^xsd:integer } }
```
is equivalent to
```sparql
PREFIX geo: <http://example.org/geo/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?x
FROM <http://example.org/geoData.rdf>
WHERE { ?x geo:hasCapital ?y .
        ?y geo:areacode "020"^^xsd:integer . }
```

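A conjunction of patterns is evaluated by joining the solutions of the individual patterns: two bindings combine only if they agree on their shared variables (here ?y). The sketch below uses a compact single-pattern matcher over tuples ('?'-prefixed strings are variables) and adds a second, hypothetical country whose capital has a different area code, so the join has something to filter out.

```python
GRAPH = {
    (":Netherlands", "geo:hasCapital", ":Amsterdam"),
    (":Amsterdam", "geo:areacode", '"020"^^xsd:integer'),
    # Hypothetical extra data so that the join has something to exclude.
    (":Belgium", "geo:hasCapital", ":Brussels"),
    (":Brussels", "geo:areacode", '"02"^^xsd:integer'),
}


def match(pattern, graph):
    """Bindings for a single triple pattern ('?'-prefixed terms are variables)."""
    solutions = []
    for triple in sorted(graph):
        binding = {}
        for pat, term in zip(pattern, triple):
            if pat.startswith("?"):
                if binding.setdefault(pat, term) != term:
                    break
            elif pat != term:
                break
        else:
            solutions.append(binding)
    return solutions


def compatible(left, right):
    """Two bindings are compatible if they agree on every shared variable."""
    return all(right[v] == t for v, t in left.items() if v in right)


def bgp(patterns, graph):
    """Evaluate a conjunction of triple patterns by joining their solutions."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [{**left, **right}
                     for left in solutions
                     for right in match(pattern, graph)
                     if compatible(left, right)]
    return solutions


print(bgp([("?x", "geo:hasCapital", "?y"),
           ("?y", "geo:areacode", '"020"^^xsd:integer')], GRAPH))
```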
Conjunctions: several patterns (2)
- A pattern with several graphs; all must match
```sparql
PREFIX geo: <http://example.org/geo/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?x
FROM <http://example.org/geoData.rdf>
WHERE { { ?x geo:hasCapital ?y } { ?y geo:areacode "020"^^xsd:integer } }
```
is equivalent to
```sparql
PREFIX geo: <http://example.org/geo/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?x
FROM <http://example.org/geoData.rdf>
WHERE { ?x geo:hasCapital [ geo:areacode "020"^^xsd:integer ] . }
```

Alternatives: UNION
- A pattern with several graphs
- At least one should match
```sparql
PREFIX geo: <http://example.org/geo/>
SELECT ?city
FROM <http://example.org/geoData.rdf>
WHERE {
  { ?city geo:name "Parijs"@nl . }
  UNION
  { ?city geo:name "Paris"@fr . }
}
```

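UNION concatenates the solutions of its alternatives and keeps duplicates. In the toy graph below (hypothetical data, with language-tagged literals encoded as plain strings), one city carries both names, so it is returned twice; SELECT DISTINCT would collapse the duplicates, which the last step imitates.

```python
GRAPH = {
    (":paris", "geo:name", '"Parijs"@nl'),
    (":paris", "geo:name", '"Paris"@fr'),
    (":lyon", "geo:name", '"Lyon"@fr'),
}


def match(pattern, graph):
    """Bindings for a single triple pattern ('?'-prefixed terms are variables)."""
    solutions = []
    for triple in sorted(graph):
        binding = {}
        for pat, term in zip(pattern, triple):
            if pat.startswith("?"):
                if binding.setdefault(pat, term) != term:
                    break
            elif pat != term:
                break
        else:
            solutions.append(binding)
    return solutions


# { ?city geo:name "Parijs"@nl } UNION { ?city geo:name "Paris"@fr }
left = match(("?city", "geo:name", '"Parijs"@nl'), GRAPH)
right = match(("?city", "geo:name", '"Paris"@fr'), GRAPH)
union = left + right  # bag union: duplicates are kept

# SELECT DISTINCT ?city would remove the duplicate row.
distinct = []
for solution in union:
    if solution not in distinct:
        distinct.append(solution)

print(union)     # [{'?city': ':paris'}, {'?city': ':paris'}]
print(distinct)  # [{'?city': ':paris'}]
```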
Optional Graphs
- RDF allows for 'partial' representations
- "Give me all people with names, and if known their email address"
- Use an OPTIONAL graph expression
```sparql
PREFIX : <http://example.org/geo/>
SELECT ?person ?name ?email
WHERE {
  ?person :name ?name .
  OPTIONAL { ?person :email ?email }
}
```

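OPTIONAL behaves like a left outer join: each solution of the required pattern is extended with compatible optional bindings when there are any, and kept unchanged (with ?email unbound) otherwise. The sketch below assumes a tiny made-up graph and ignores FILTERs inside OPTIONAL, which real SPARQL evaluation has to take into account.

```python
GRAPH = {
    (":alice", ":name", '"Alice"'),
    (":alice", ":email", '"alice@example.org"'),
    (":bob", ":name", '"Bob"'),
}


def match(pattern, graph):
    """Bindings for a single triple pattern ('?'-prefixed terms are variables)."""
    solutions = []
    for triple in sorted(graph):
        binding = {}
        for pat, term in zip(pattern, triple):
            if pat.startswith("?"):
                if binding.setdefault(pat, term) != term:
                    break
            elif pat != term:
                break
        else:
            solutions.append(binding)
    return solutions


def compatible(left, right):
    """Two bindings are compatible if they agree on every shared variable."""
    return all(right[v] == t for v, t in left.items() if v in right)


def left_join(left_solutions, right_solutions):
    """OPTIONAL: extend each left solution where possible, otherwise keep it as is."""
    result = []
    for left in left_solutions:
        extended = [{**left, **right} for right in right_solutions
                    if compatible(left, right)]
        result.extend(extended if extended else [left])
    return result


names = match(("?person", ":name", "?name"), GRAPH)
emails = match(("?person", ":email", "?email"), GRAPH)
result = left_join(names, emails)
for row in result:
    print(row.get("?person"), row.get("?name"), row.get("?email"))
```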
Testing values of nodes
- Tests in a FILTER clause have to be validated for matching subgraphs
- Functions
  - isLiteral(?aNode)
  - isURI(?aNode)
  - str(?aResource)
    - for resources with partially known names
    - for literals with unknown language tag
```sparql
PREFIX geo: <http://example.org/geo/>
SELECT ?x ?n
WHERE {
  ?x ?p ?n .
  FILTER ( regex(str(?p), "areacode") )
}
```
Note: str(?p) returns the full IRI (e.g. "http://example.org/geo/areacode"), so a partially known name is matched with regex() rather than compared with =.

Testing values of nodes (2)
- Tests in a FILTER clause
  - Comparison (<=, <, =, etc.)
  - Arithmetic operators (+, -, etc.)
  - String matching using regular expressions
      regex(?x, "netherlands", "i")
      … matches "The Netherlands"
  - Boolean combinations of these
      && (and), || (or), ! (not)
      (?y > 10 && ?y < 30) || !regex(?z, "Rott")

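The boolean FILTER example can be mirrored in Python, with re.search standing in for regex() and the "i" flag mapped to re.IGNORECASE. This is only an approximation: SPARQL's regex() uses XPath regular-expression syntax rather than Python's, and SPARQL's handling of errors from unbound or mistyped values is not modeled. The candidate rows are invented.

```python
import re


def regex(text, pattern, flags=""):
    """Approximate SPARQL regex(): True if `pattern` matches anywhere in `text`."""
    return re.search(pattern, text, re.IGNORECASE if "i" in flags else 0) is not None


def example_filter(y, z):
    # (?y > 10 && ?y < 30) || !regex(?z, "Rott")
    return (y > 10 and y < 30) or not regex(z, "Rott")


rows = [
    {"y": 20, "z": "Rotterdam"},
    {"y": 40, "z": "Rotterdam"},
    {"y": 40, "z": "Amsterdam"},
]
kept = [row for row in rows if example_filter(row["y"], row["z"])]
print(kept)
print(regex("The Netherlands", "netherlands", "i"))  # True
```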
Boolean comparisons and datatypes
- RDF has basic datatypes for literals
  - xsd:integer, xsd:float, xsd:string, xsd:dateTime, etc.
- Datatypes can be used in value comparisons
    ?x < "21"^^xsd:integer
- … and can be obtained from literals
    datatype(?aLiteral)

Solution modifiers
- Sorting using ORDER BY
- Limiting the number of results: LIMIT
```sparql
PREFIX : <http://example.org/geo/>
SELECT ?dog ?age
WHERE { ?dog a :Dog ; :age ?age . }
ORDER BY DESC(?age)
```
```sparql
PREFIX : <http://example.org/geo/>
SELECT ?dog ?age
WHERE { ?dog a :Dog ; :age ?age . }
ORDER BY ?dog
LIMIT 10
```

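Solution modifiers act on the sequence of solutions after pattern matching: ORDER BY sorts it and LIMIT truncates it. A sketch with invented dog records, assuming every solution binds the sort variable to comparable values (SPARQL itself defines an ordering for unbound and mixed-type values, which is not modeled here).

```python
solutions = [
    {"?dog": ":rex", "?age": 7},
    {"?dog": ":bello", "?age": 3},
    {"?dog": ":fido", "?age": 12},
]


def order_limit(solutions, var, descending=False, limit=None):
    """ORDER BY [DESC](var) followed by an optional LIMIT."""
    ordered = sorted(solutions, key=lambda s: s[var], reverse=descending)
    return ordered if limit is None else ordered[:limit]


# ORDER BY DESC(?age)
print([s["?dog"] for s in order_limit(solutions, "?age", descending=True)])
# ORDER BY ?dog LIMIT 2 (a smaller limit than 10, so the cut is visible)
print([s["?dog"] for s in order_limit(solutions, "?dog", limit=2)])
```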
SPARQL query types
- SELECT: returns a table with variable bindings
    SELECT … WHERE { … }
- CONSTRUCT: returns a graph
    CONSTRUCT { … } WHERE { … }
- ASK: returns yes/no
    ASK { … }
- DESCRIBE: returns a graph
    DESCRIBE dbpedia:Amsterdam

SELECT Query Results
- Solutions consist of variable bindings
  - For each variable in the query, a solution gives a value, or leaves it unbound (e.g. when an OPTIONAL part did not match)
- The result is a table, where each column is a variable and each row a combination of variable bindings
```sparql
PREFIX geo: <http://example.org/geo/>
SELECT ?x ?y ?z
WHERE { ?x geo:contains ?y .
        OPTIONAL { ?y geo:areacode ?z } }
```

CONSTRUCT query results
- CONSTRUCT queries return RDF statements
- The query result is either a subgraph or a transformed graph.
```sparql
PREFIX geo: <http://example.org/geo/>
CONSTRUCT { ?x geo:hasCapital ?y . }
WHERE { ?x geo:containsCity ?y .
        ?y geo:name "Amsterdam"@nl . }
```

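CONSTRUCT instantiates its template once per solution of the WHERE clause and returns the union of the resulting triples, skipping template triples that would contain unbound variables. The sketch below evaluates the WHERE pattern with a compact join-based matcher over a hypothetical graph (terms as strings, '?'-prefixed strings as variables) and then fills in the template.

```python
GRAPH = {
    (":Netherlands", "geo:containsCity", ":Amsterdam"),
    (":Netherlands", "geo:containsCity", ":Rotterdam"),
    (":Amsterdam", "geo:name", '"Amsterdam"@nl'),
    (":Rotterdam", "geo:name", '"Rotterdam"@nl'),
}


def match(pattern, graph):
    """Bindings for a single triple pattern ('?'-prefixed terms are variables)."""
    solutions = []
    for triple in sorted(graph):
        binding = {}
        for pat, term in zip(pattern, triple):
            if pat.startswith("?"):
                if binding.setdefault(pat, term) != term:
                    break
            elif pat != term:
                break
        else:
            solutions.append(binding)
    return solutions


def bgp(patterns, graph):
    """Join the solutions of all patterns, keeping bindings that agree on shared variables."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [{**left, **right}
                     for left in solutions
                     for right in match(pattern, graph)
                     if all(right.get(v, t) == t for v, t in left.items())]
    return solutions


def construct(template, solutions):
    """Instantiate the template once per solution; skip triples left with variables."""
    graph = set()
    for solution in solutions:
        for triple in template:
            terms = tuple(solution.get(t, t) for t in triple)
            if not any(t.startswith("?") for t in terms):
                graph.add(terms)
    return graph


where = [("?x", "geo:containsCity", "?y"), ("?y", "geo:name", '"Amsterdam"@nl')]
template = [("?x", "geo:hasCapital", "?y")]
result = construct(template, bgp(where, GRAPH))
print(result)  # {(':Netherlands', 'geo:hasCapital', ':Amsterdam')}
```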
Recap
- SPARQL is the query language for the Web of Data
- Queries are sent to 'endpoints' on the Web
- Queries describe graph patterns with variables
- Graph patterns match the graphs in the triple store
- Results are typically returned as a table
