15. Storage (on the web)
    As documents: .rdf, .n3, .turtle, .html
    RDF triple stores: Sesame, Joseki, 4Store, AllegroGraph, OpenLink Virtuoso, SDB/TDB, Open Calais, SWI Prolog
    Reasoners ‘on top’, or via DIG: Pellet, OWLIM, etc.
    SPARQL endpoints: results as JSON, XML, CSV etc.
    Kennissystemen 2010
16. Data and the Web
    Need to add this ‘meta’ to my ‘data’
    ‘Linking’ data across sites
    Web of Documents and the Web of Data
    Old-fashioned HTML:
        <link rel='meta' type='application/rdf+xml' href='http://www.leibnizcenter.org/~hoekstra/foaf.rdf' title='FOAF'>
    URL-based: HTTP 303 ‘See Other’ (http://www.w3.org/TR/swbp-vocab-pub/)
    RDFa
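The ‘old fashioned HTML’ route can be sketched in a few lines: a client scans a page for a link element with rel='meta' and an RDF media type, and follows its href to the data. This is a minimal illustration using only the Python standard library; the class name MetaLinkFinder and the sample page are assumptions made up for the example, only the link element itself comes from the slide.

```python
from html.parser import HTMLParser


class MetaLinkFinder(HTMLParser):
    """Collects href values of <link rel='meta' type='application/rdf+xml'> tags."""

    def __init__(self):
        super().__init__()
        self.rdf_links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        attributes = dict(attrs)
        if (
            tag == "link"
            and attributes.get("rel") == "meta"
            and attributes.get("type") == "application/rdf+xml"
        ):
            self.rdf_links.append(attributes.get("href"))


# Hypothetical page wrapping the link element shown on the slide.
page = """<html><head>
<link rel='meta' type='application/rdf+xml'
      href='http://www.leibnizcenter.org/~hoekstra/foaf.rdf' title='FOAF'>
</head><body></body></html>"""

finder = MetaLinkFinder()
finder.feed(page)
print(finder.rdf_links)
```

The sketch only discovers the URL of the RDF document; fetching and parsing it would be a separate step.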
29. September 2010: 25 billion triples, 395 million links
30. Scalability
    How to deal with massive amounts of data?
    Consequences for reasoning
        Billion Triple Challenge (864.8 million triples)
    Consequences for querying
        Table lookups, joins etc.
    … and what about …
        Dealing with change, provenance, trust?
31. A rough idea…
    I can crash a DL reasoner using an ontology of ~15 classes and 5 individuals (honestly)
    What if my ontology contains thousands of classes and billions of individuals?
32. Reasoning
    Reasoning with
        inconsistent knowledge
        incomplete knowledge
    Complete vs. incomplete reasoning
68. Querying
    Originally there were many languages: SPARQL, nRQL, SeRQL, etc.
    SPARQL: SPARQL Query Language for RDF
        http://www.w3.org/TR/rdf-sparql-query/
        Version 1.1 in the making...
69. Do you know SQL?
    Formulate a query on the relational model
        students(name, age, address)
    Structured Query Language
        SELECT name       (data needed)
        FROM students     (data source)
        WHERE age > 20    (data constraint)
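To make the SQL example concrete, the sketch below runs the same select-from-where query against an in-memory SQLite database using Python's standard sqlite3 module. The three rows are invented for the illustration; only the table layout students(name, age, address) and the query come from the slide.

```python
import sqlite3

# In-memory database with the relational model from the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, age INTEGER, address TEXT)")

# Made-up sample rows, purely for illustration.
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [("Anna", 19, "Amsterdam"), ("Bram", 23, "Utrecht"), ("Carla", 31, "Leiden")],
)

# SELECT = data needed, FROM = data source, WHERE = data constraint.
rows = conn.execute("SELECT name FROM students WHERE age > 20").fetchall()
names = sorted(name for (name,) in rows)
print(names)
conn.close()
```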
70. SPARQL Query Syntax
    Inspired by SQL (select-from-where)
    select: the entities (variables) you want to return
        SELECT ?city
    from: a data source (RDF graph)
        FROM <http://example.org/geo.rdf>
    where: the (sub)graph you want to get information from
        WHERE { ?city geo:areacode "010" . }
    Including additional constraints on objects, using operators
        WHERE { ?city geo:areacode ?c . FILTER (?c > 010) . }
    Complete query:
        PREFIX geo: <http://example.org/geo/>
        SELECT ?city
        FROM <http://example.org/geoData.rdf>
        WHERE { ?city geo:areacode ?c .
                FILTER (?c > 010) }
71. SPARQL Graph Patterns
    WHERE clause specifies a graph pattern
        pattern should be matched
        pattern can match more than once
    Graph pattern: an RDF graph with some nodes/edges as variables
    [Figure: example graph pattern; labels: :EuropeanCountry, "020"^^xsd:integer, :hasCapital, rdf:type, and three variable nodes (?)]
72. Basis: triple patterns
    Triples with one/more variables
    Turtle syntax
        ?x geo:hasCapital geo:Amsterdam
        ?x geo:hasCapital ?y
        ?x geo:areacode "020"^^xsd:integer
        ?x ?p ?y
    All of them match the graph:
    [Figure: graph with nodes :Netherlands, :Amsterdam and "020"^^xsd:integer, and edges geo:hasCapital and geo:areacode]
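How a single triple pattern is matched can be sketched as follows. This is a toy matcher over tuples of strings, not how a real triple store works internally; the function names match_pattern and solutions are made up, and the two-triple graph is an assumed reading of the figure (the Netherlands has capital Amsterdam, Amsterdam has area code 020), written with the geo: prefix throughout.

```python
def match_pattern(pattern, triple):
    """Return variable bindings if the triple matches the pattern, else None."""
    bindings = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable matches any value, but must bind consistently.
            if bindings.setdefault(term, value) != value:
                return None
        elif term != value:
            # A constant must be identical to the value in the triple.
            return None
    return bindings


def solutions(pattern, graph):
    """All bindings for which the pattern matches some triple in the graph."""
    results = []
    for triple in graph:
        bindings = match_pattern(pattern, triple)
        if bindings is not None:
            results.append(bindings)
    return results


# Assumed example graph (two triples).
graph = [
    ("geo:Netherlands", "geo:hasCapital", "geo:Amsterdam"),
    ("geo:Amsterdam", "geo:areacode", '"020"^^xsd:integer'),
]

# The four triple patterns from the slide.
patterns = [
    ("?x", "geo:hasCapital", "geo:Amsterdam"),
    ("?x", "geo:hasCapital", "?y"),
    ("?x", "geo:areacode", '"020"^^xsd:integer'),
    ("?x", "?p", "?y"),
]

for pattern in patterns:
    print(pattern, "->", solutions(pattern, graph))
```

The last pattern, with three variables, matches every triple, which is why a pattern can match more than once.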
73. Conjunctions: several patterns
    A pattern with several graphs, all must match
        PREFIX geo: <http://example.org/geo/>
        PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
        SELECT ?x
        FROM <http://example.org/geoData.rdf>
        WHERE { { ?x geo:hasCapital ?y }
                { ?y geo:areacode "020"^^xsd:integer } }
    equivalent to
        PREFIX geo: <http://example.org/geo/>
        PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
        SELECT ?x
        FROM <http://example.org/geoData.rdf>
        WHERE { ?x geo:hasCapital ?y .
                ?y geo:areacode "020"^^xsd:integer . }
74. Conjunctions: several patterns (2)
    A pattern with several graphs, all must match
        PREFIX geo: <http://example.org/geo/>
        PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
        SELECT ?x
        FROM <http://example.org/geoData.rdf>
        WHERE { { ?x geo:hasCapital ?y }
                { ?y geo:areacode "020"^^xsd:integer } }
    equivalent to (using a blank node instead of ?y)
        PREFIX geo: <http://example.org/geo/>
        PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
        SELECT ?x
        FROM <http://example.org/geoData.rdf>
        WHERE { ?x geo:hasCapital [ geo:areacode "020"^^xsd:integer ] . }
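A conjunction behaves like a join: every pattern must match, and a variable shared between patterns (?y here) must take the same value in all of them. The sketch below illustrates this with a naive nested-loop join; the helper names extend and match_all and the four-triple graph are assumptions made for the example, and real stores use indexes and query planning instead.

```python
def extend(solution, pattern, triple):
    """Extend a partial solution so that pattern matches triple; None on conflict."""
    extended = dict(solution)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Shared variables must keep the value bound by earlier patterns.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def match_all(patterns, graph):
    """Solutions for which every pattern matches some triple in the graph."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for solution in solutions:
            for triple in graph:
                extended = extend(solution, pattern, triple)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Made-up example data.
graph = [
    ("geo:Netherlands", "geo:hasCapital", "geo:Amsterdam"),
    ("geo:France", "geo:hasCapital", "geo:Paris"),
    ("geo:Amsterdam", "geo:areacode", '"020"^^xsd:integer'),
    ("geo:Paris", "geo:areacode", '"01"^^xsd:integer'),
]

# ?x geo:hasCapital ?y . ?y geo:areacode "020"^^xsd:integer .
query = [
    ("?x", "geo:hasCapital", "?y"),
    ("?y", "geo:areacode", '"020"^^xsd:integer'),
]

result = match_all(query, graph)
print(result)
```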
75. Alternatives: UNION
    A pattern with several graphs
    At least one should match
        PREFIX geo: <http://example.org/geo/>
        SELECT ?city
        FROM <http://example.org/geoData.rdf>
        WHERE { { ?city geo:name "Parijs"@nl . }
                UNION
                { ?city geo:name "Paris"@fr . } }
76. Optional Graphs
    RDF allows for ‘partial’ representations
    “Give me all people with names, and if known their email address”
    Use an OPTIONAL graph expression
        PREFIX geo: <http://example.org/geo/>
        SELECT ?person ?name ?email
        WHERE { ?person geo:name ?name .
                OPTIONAL { ?person geo:email ?email } }
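OPTIONAL can be read as a left join over solutions: every solution of the required pattern is kept, and it is extended with bindings from the optional pattern only where compatible ones exist. The sketch below shows this on hand-written solution lists; the function names compatible and left_join and the people in the data are invented for the illustration.

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on all shared variables."""
    return all(b[var] == value for var, value in a.items() if var in b)


def left_join(required, optional):
    """Keep every required solution; merge in optional bindings where possible."""
    results = []
    for left in required:
        matches = [right for right in optional if compatible(left, right)]
        if matches:
            results.extend({**left, **right} for right in matches)
        else:
            # No optional match: the solution survives with ?email unbound.
            results.append(dict(left))
    return results


# Made-up solutions of the required and the optional pattern.
names = [
    {"?person": ":alice", "?name": "Alice"},
    {"?person": ":bob", "?name": "Bob"},
]
emails = [{"?person": ":alice", "?email": "alice@example.org"}]

rows = left_join(names, emails)
for row in rows:
    print(row)
```

Bob has no known email address, so his row simply lacks a binding for ?email instead of being dropped.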
77. Testing values of nodes
    Tests in FILTER clause must hold for matching subgraphs
    Functions
        isLiteral(?aNode)
        isURI(?aNode)
        str(?aResource)
            for resources with partially known names
            for literals with unknown language tag
        PREFIX geo: <http://example.org/geo/>
        SELECT ?x ?n
        WHERE { ?x ?p ?n .
                FILTER regex(str(?p), "areacode") }
78. Testing values of nodes
    Tests in FILTER clause
        Comparison (<=, <, =, etc.)
        Arithmetic operators (+, -, etc.)
        String matching using regular expressions
            regex(?x, "netherlands", "i") ... matches "The Netherlands"
        Boolean combination of these: && (and), || (or), ! (not)
            (?y > 10 && ?y < 30) || !regex(?z, "Rott")
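The FILTER tests above can be mimicked in Python to see which solutions would survive. This is only an approximation: SPARQL's regex follows XPath regular expression rules, which differ in details from Python's re module, and the helper names sparql_regex and keep as well as the sample solutions are made up for the example.

```python
import re


def sparql_regex(text, pattern, flags=""):
    """Rough stand-in for regex(text, pattern, flags); supports only the "i" flag."""
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.search(pattern, text, re_flags) is not None


def keep(solution):
    """(?y > 10 && ?y < 30) || !regex(?z, "Rott")"""
    y, z = solution["?y"], solution["?z"]
    return (y > 10 and y < 30) or not sparql_regex(z, "Rott")


# Made-up solutions to filter.
candidates = [
    {"?y": 20, "?z": "Rotterdam"},  # kept: ?y is between 10 and 30
    {"?y": 40, "?z": "Rotterdam"},  # dropped: ?y out of range and ?z matches "Rott"
    {"?y": 40, "?z": "Utrecht"},    # kept: ?z does not match "Rott"
]

kept = [solution for solution in candidates if keep(solution)]
print(kept)
print(sparql_regex("The Netherlands", "netherlands", "i"))
```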
79. Boolean comparisons and datatypes
    RDF has basic datatypes for literals
        xsd:integer, xsd:float, xsd:string, xsd:dateTime etc.
    Datatypes can be used in value comparisons
        ?x < "21"^^xsd:integer
    ... and be obtained from literals
        datatype(?aLiteral)
80. Solution modifiers
    Sorting using ORDER BY
        PREFIX geo: <http://example.org/geo/>
        SELECT ?dog ?age
        WHERE { ?dog a geo:Dog ;
                     geo:age ?age . }
        ORDER BY DESC(?age)
    Limiting the number of results: LIMIT
        PREFIX geo: <http://example.org/geo/>
        SELECT ?dog ?age
        WHERE { ?dog a geo:Dog ;
                     geo:age ?age . }
        ORDER BY ?dog
        LIMIT 10
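Solution modifiers operate on the list of solutions after pattern matching is done. In Python terms, ORDER BY corresponds roughly to sorting and LIMIT to slicing, as in the sketch below. The dogs and ages are invented, and the limit is set to 2 rather than 10 so that it has a visible effect on three rows.

```python
# Made-up solutions of the WHERE clause.
solutions = [
    {"?dog": ":rex", "?age": 7},
    {"?dog": ":fido", "?age": 3},
    {"?dog": ":bello", "?age": 12},
]

# ORDER BY DESC(?age): oldest dog first.
by_age_desc = sorted(solutions, key=lambda s: s["?age"], reverse=True)

# ORDER BY ?dog LIMIT 2: sort by identifier, then keep the first two rows.
limit = 2
first_by_name = sorted(solutions, key=lambda s: s["?dog"])[:limit]

print([s["?dog"] for s in by_age_desc])
print([s["?dog"] for s in first_by_name])
```

Without an ORDER BY, the order of solutions is not defined, so a LIMIT on its own returns an arbitrary subset.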
81. SPARQL query types
    SELECT: table with variable bindings
        SELECT ... WHERE { ... }
    CONSTRUCT: returns a graph
        CONSTRUCT { ... } WHERE { ... }
    ASK: returns yes/no
        ASK { ... }
    DESCRIBE: returns a graph
        DESCRIBE dbpedia:Amsterdam
82. SELECT Query Results
    Solutions consist of variable bindings
    For each variable in the query, a solution gives a value (or list)
    The result is a table, where each column is a variable and each row a combination of variable bindings
        PREFIX geo: <http://example.org/geo/>
        SELECT ?x ?y ?z
        WHERE { ?x geo:contains ?y .
                OPTIONAL { ?y geo:areacode ?z } }
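When an endpoint returns SELECT results as JSON, the response follows the SPARQL Query Results JSON Format: a "head" listing the variables and a "results" object with one binding set per solution. The sketch below turns such a response into a table with Python's json module. The response body is hand-written for illustration, not taken from a real endpoint, and the function name to_table is made up.

```python
import json

# Hand-written example response; the second solution leaves z unbound.
response_body = """
{
  "head": {"vars": ["x", "y", "z"]},
  "results": {"bindings": [
    {"x": {"type": "uri", "value": "http://example.org/geo/Netherlands"},
     "y": {"type": "uri", "value": "http://example.org/geo/Amsterdam"},
     "z": {"type": "literal", "value": "020"}},
    {"x": {"type": "uri", "value": "http://example.org/geo/Netherlands"},
     "y": {"type": "uri", "value": "http://example.org/geo/Rotterdam"}}
  ]}
}
"""


def to_table(body):
    """Return (columns, rows); unbound variables become None."""
    data = json.loads(body)
    columns = data["head"]["vars"]
    rows = [
        tuple(binding[var]["value"] if var in binding else None for var in columns)
        for binding in data["results"]["bindings"]
    ]
    return columns, rows


columns, rows = to_table(response_body)
print(columns)
for row in rows:
    print(row)
```

Variables that stay unbound, as can happen with OPTIONAL, are simply absent from a binding set, which is why the sketch fills in None for them.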
83. CONSTRUCT query results
    Construct queries return RDF statements
    The query result is either a subgraph or a transformed graph.
        PREFIX geo: <http://example.org/geo/>
        CONSTRUCT { ?x geo:hasCapital ?y . }
        WHERE { ?x geo:containsCity ?y .
                ?y geo:name "Amsterdam"@nl . }
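A CONSTRUCT query can be thought of as filling in a template once per solution of the WHERE clause and collecting the resulting triples into a new graph. The sketch below illustrates that step only; the function name construct and the single solution are assumptions for the example, and the WHERE clause is taken as already evaluated.

```python
def construct(template, solutions):
    """Instantiate the template for every solution and collect the triples."""
    triples = set()
    for solution in solutions:
        for pattern in template:
            triple = tuple(solution.get(term, term) for term in pattern)
            # Triples that still contain an unbound variable are left out.
            if not any(term.startswith("?") for term in triple):
                triples.add(triple)
    return triples


# CONSTRUCT { ?x geo:hasCapital ?y . }
template = [("?x", "geo:hasCapital", "?y")]

# Assumed solution of the WHERE clause on the slide.
solutions = [{"?x": "geo:Netherlands", "?y": "geo:Amsterdam"}]

new_graph = construct(template, solutions)
print(new_graph)
```

Because the result is a set of triples, duplicate triples produced by different solutions collapse into one, as they do in an RDF graph.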
84. Recap
    SPARQL is the query language for the web of data
    Queries are sent to ‘endpoints’ on the web
    Queries describe graph patterns with variables
    Graph patterns match the graphs in the triple store
    Results are typically returned as a table