Querying the Web of Data

Presentation at the Knowledge Systems course of the UvA (December 2010)

    Presentation Transcript

    • Querying the Web of Data
      Kennissystemen, December 2010
      Rinke Hoekstra
    • Overview
      Linked (Open) Data
      The Web of Data
      Scalability issues
      RDF Syntaxes
      RDF Storage and Querying
      Kennissystemen 2010
    • The Semantic Web Ideology
      Identity is everything
      Partial solutions are great too!
      Layer cake
    • The Web of Data
      … does it exist?
    • Linked Data
    • Semantic Web
      • Initially
      • ‘Metadata’ for web pages
      • Since ~2006
      • ‘Web of Data’
      • Semantic web as data source in its own right
      • Linked Data
      • A ‘database-esque’ Web
      • RDF triple stores
      • Query languages
    • Storage (on the web)
      As documents
      .rdf, .n3, .turtle, .html
      RDF triple stores
      Sesame, Joseki, 4Store, AllegroGraph, OpenLink Virtuoso, SDB/TDB, Open Calais, SWI Prolog
      Reasoners ‘on top’, or via DIG
      Pellet, OWLIM, etc.
      SPARQL Endpoints
      Results as JSON, XML, CSV etc.
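Endpoints commonly serialise SELECT results in the SPARQL JSON results format (`application/sparql-results+json`). The following is a minimal sketch of reading such a response with the Python standard library. The sample payload and the helper name `rows` are made up for illustration; no real endpoint is contacted.

```python
import json

# Hypothetical response body in the SPARQL JSON results layout.
SAMPLE = """
{
  "head": {"vars": ["city", "code"]},
  "results": {"bindings": [
    {"city": {"type": "uri", "value": "http://example.org/geo/Amsterdam"},
     "code": {"type": "literal", "value": "020"}},
    {"city": {"type": "uri", "value": "http://example.org/geo/Rotterdam"}}
  ]}
}
"""

def rows(payload):
    """Turn a SPARQL JSON result into a list of dicts; unbound variables become None."""
    doc = json.loads(payload)
    variables = doc["head"]["vars"]
    return [
        {v: (binding[v]["value"] if v in binding else None) for v in variables}
        for binding in doc["results"]["bindings"]
    ]

table = rows(SAMPLE)
for row in table:
    print(row)
```

The second row has no `code` binding, which is how the format represents a variable that stayed unbound.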
    • Data and the Web
      Need to add this ‘meta’ to my ‘data’
      ‘Linking’ data across sites
      Web of Documents and the Web of Data
      Old-fashioned HTML: <link rel='meta' type='application/rdf+xml' href='http://www.leibnizcenter.org/~hoekstra/foaf.rdf' title='FOAF'>
      HTTP 303 ‘See Other’: http://www.w3.org/TR/swbp-vocab-pub/
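The 303 ‘See Other’ pattern can be sketched as a small routing function: a request for the identifier of a thing is redirected to a document about it, chosen by the Accept header. This is only an illustration. The `/id/`, `/data/` and `/page/` URI layout and the function name `respond` are assumptions, not something the slides or the W3C note prescribe.

```python
def respond(path, accept):
    """Return (status, headers) for a GET request.

    Assumed layout: /id/NAME identifies a thing, /data/NAME.rdf and
    /page/NAME.html are documents describing it.
    """
    if path.startswith("/id/"):
        name = path[len("/id/"):]
        wants_rdf = "application/rdf+xml" in accept or "text/turtle" in accept
        target = f"/data/{name}.rdf" if wants_rdf else f"/page/{name}.html"
        # 303 tells the client: the thing itself is not a document, look over there.
        return 303, {"Location": target, "Vary": "Accept"}
    return 200, {}

print(respond("/id/amsterdam", "application/rdf+xml"))
print(respond("/id/amsterdam", "text/html"))
```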
    • BBC Music
    • Integration: 303 See Other
    • Integration: Inline
      Attributes on XHTML elements
    • Integration: RDFa Example
      • In XHTML:
      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
          "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
      <html xmlns="http://www.w3.org/1999/xhtml"
            xmlns:cal="http://www.w3.org/2002/12/cal/ical#">
      <head><title>Jo's Friends and Family Blog</title></head>
      <body>
      <p>I'm holding
      <span property="cal:summary">one last summer Barbecue</span>,
      <span property="cal:dtstart" content="20070916T1600-0500">
      September 16th at 4pm</span>.</p>
      </body>
      </html>
      • In RDF:
      <> cal:summary "one last summer Barbecue" ;
         cal:dtstart "20070916T1600-0500" .
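To make the XHTML-to-RDF mapping concrete, here is a rough sketch that pulls the `property` values out of the snippet with Python's built-in HTML parser. It is not an RDFa processor: it ignores subjects, prefixes and nesting, and the class name `PropertyCollector` is invented for this example.

```python
from html.parser import HTMLParser

SNIPPET = """
<p>I'm holding
<span property="cal:summary">one last summer Barbecue</span>,
<span property="cal:dtstart" content="20070916T1600-0500">September 16th at 4pm</span>.</p>
"""

class PropertyCollector(HTMLParser):
    """Collect (property, value) pairs; @content wins over the element text."""

    def __init__(self):
        super().__init__()
        self.pairs = []      # collected (property, value) pairs
        self._open = None    # property whose text content is still to be read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "property" in attrs:
            if "content" in attrs:
                self.pairs.append((attrs["property"], attrs["content"]))
            else:
                self._open = attrs["property"]

    def handle_data(self, data):
        if self._open is not None:
            self.pairs.append((self._open, data.strip()))
            self._open = None

collector = PropertyCollector()
collector.feed(SNIPPET)
print(collector.pairs)
```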
    • Legal Information Retrieval for Laymen
    • Example
    • So, where’s that data?
      I repeat: does it really exist?
    • Linked Open Data
    • November 2009: 13.1 billion triples, 142 million links
    • September 2010: 25 billion triples, 395 million links
    • Scalability
      How to deal with massive amounts of data?
      Consequences for reasoning
      Billion Triple Challenge
      (864.8 Million Triples)
      Consequences for querying
      Table lookups, joins etc.
      … and what about …
      Dealing with change, provenance, trust?
    • A rough idea…
      I can crash a DL reasoner using an ontology of ~15 classes and 5 individuals (honestly)
      What if my ontology contains thousands of classes and billions of individuals?
    • Reasoning
      Reasoning with
      inconsistent knowledge
      incomplete knowledge
      Complete vs. incomplete reasoning
    • Reasoning
      • When?
      • Realtime vs. in advance
      • Lightweight reasoning (RDFS, OWL 2 RL)
      • Implementable using forward chaining rules
      • Still problems with scalability
      • Distributed reasoning (DAS-3)
      • MaRVIN
      • ‘SpeedDate’ distribution of triples across nodes
      • MapReduce
      • Full closure of BTC in 57 minutes
      • Output: 30B triples
      • And what to do with the results?
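As an illustration of lightweight reasoning with forward-chaining rules, the sketch below applies two RDFS rules (subclass transitivity, rdfs11, and type inheritance, rdfs9) until nothing new is derived. The class names in the toy data are hypothetical, and the naive pairwise loop is quadratic per round, so it says nothing about how MaRVIN or the MapReduce approach work.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

def rdfs_closure(triples):
    """Forward-chain two RDFS rules to a fixpoint and return all facts."""
    facts = set(triples)
    while True:
        new = set()
        for (s, p, o) in facts:
            for (s2, p2, o2) in facts:
                if o != s2 or p2 != SUBCLASS:
                    continue
                if p == SUBCLASS:      # rdfs11: subclass transitivity
                    new.add((s, SUBCLASS, o2))
                elif p == TYPE:        # rdfs9: instances inherit superclasses
                    new.add((s, TYPE, o2))
        if new <= facts:               # nothing new derived: fixpoint reached
            return facts
        facts |= new

# Hypothetical toy data.
data = [
    ("uva:AssociateProfessor", SUBCLASS, "uva:Professor"),
    ("uva:Professor", SUBCLASS, "uva:Employee"),
    (":radboud", TYPE, "uva:AssociateProfessor"),
]
closure = rdfs_closure(data)
for fact in sorted(closure):
    print(fact)
```

Even here the output is twice the size of the input, which hints at why materialising the full closure of billions of triples raises the question of what to do with the results.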
    • 2 Degrees from Kevin Bacon
      PREFIX p: <http://dbpedia.org/property/>
      SELECT ?film1 ?actor1 ?film2 ?actor2
      WHERE {
        ?film1 p:starring <http://dbpedia.org/resource/Kevin_Bacon> .
        ?film1 p:starring ?actor1 .
        ?film2 p:starring ?actor1 .
        ?film2 p:starring ?actor2 .
      }
      DBpedia: 150M triples
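The query above is a four-way join over the same predicate. A sketch of the same join in plain Python, over a handful of invented triples (not DBpedia data), shows what the store has to compute; the function name `two_degrees` is an assumption for this example.

```python
STARRING = "p:starring"
BACON = "Kevin_Bacon"

# Hypothetical toy data, not taken from DBpedia.
triples = [
    ("Film_A", STARRING, BACON),
    ("Film_A", STARRING, "Actor_1"),
    ("Film_B", STARRING, "Actor_1"),
    ("Film_B", STARRING, "Actor_2"),
    ("Film_C", STARRING, "Actor_3"),
]

def two_degrees(triples, start):
    """Return all (film1, actor1, film2, actor2) rows the four patterns would bind."""
    cast = {}    # film -> set of actors
    films = {}   # actor -> set of films
    for film, pred, actor in triples:
        if pred == STARRING:
            cast.setdefault(film, set()).add(actor)
            films.setdefault(actor, set()).add(film)
    results = set()
    for film1 in films.get(start, ()):
        for actor1 in cast[film1]:
            for film2 in films[actor1]:
                for actor2 in cast[film2]:
                    results.add((film1, actor1, film2, actor2))
    return results

results = two_degrees(triples, BACON)
for row in sorted(results):
    print(row)
```

Like the SPARQL query, this also returns trivial rows (for example, Kevin Bacon as his own co-star), because nothing in the patterns forbids two variables from binding to the same node.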
    • Another rough idea…
      • 1 Billion triples in MySQL
      • Load time
      • … a couple of hours
      • Simple table lookup (one-variable query)
      • … about 5 minutes
      • Single join (two-variable query)
      • … a couple of hours
      • Better indexes?
      • Hard disk access times are the bottleneck (9 ms)
      • More targeted reasoning, querying, federation.
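A back-of-the-envelope calculation shows why the 9 ms access time dominates. The sketch assumes every access is a random seek executed one after the other, and reads "a couple of hours" as two hours; both are simplifications introduced here, not measurements from the slides.

```python
SEEK_SECONDS = 0.009  # 9 ms per random disk access (figure from the slide)

def seeks_within(seconds):
    """How many random accesses fit in the given wall-clock time."""
    return round(seconds / SEEK_SECONDS)

def days_for(seeks):
    """Wall-clock days needed for that many sequentially executed random accesses."""
    return seeks * SEEK_SECONDS / 86_400

print(seeks_within(5 * 60))                # budget of a 5-minute lookup
print(seeks_within(2 * 60 * 60))           # budget of a 2-hour join (assumed reading)
print(round(days_for(1_000_000_000), 1))   # one seek per triple for 1 billion triples
```

Under these assumptions even a two-hour query can touch only a tiny fraction of a billion triples by random access, which is why better indexes alone do not help much.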
    • SPARQL
      Querying the linked data cloud
    • Languages
      • Multiple Languages
      • RDF, RDFS and OWL
      • Multiple Syntaxes
      RDF/XML, Turtle (restricted N3), N-Triples
      Functional Syntax, Manchester Syntax, OWL/XML
      • RDF
      • Triples <subject, predicate, object>
      • Distributed
      • Always about something else
      • ... but can be about other RDF triples as well.
    • Languages: RDF(S)/XML
    • Languages: FS
      ObjectPropertyRange(uva:teaches uva:Course)
      ObjectPropertyAssertion(uva:teaches :radboud courses:ks2009)
      DataPropertyAssertion(uva:name :radboud "Radboud Winkels")
    • Languages: Turtle
      @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
      @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
      @prefix owl: <http://www.w3.org/2002/07/owl#> .
      @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
      @prefix uva: <http://www.uva.nl/rdf#> .
      @prefix courses: <http://www.uva.nl/courses#> .
      @prefix : <http://www.uva.nl/people#> .
      uva:AssociateProfessor a rdfs:Class .
      uva:teaches a owl:ObjectProperty .
      :radboud a uva:AssociateProfessor ;
          uva:name "Radboud Winkels"^^xsd:string .
      courses:ks2009 a uva:Course .
    • Turtle Syntax
      Simple abbreviations
      :x geo:name "A'dam" . :x geo:areacode "020" .
      :x geo:name "A'dam" ; geo:areacode "020" .
      :nl geo:capital [ geo:name "A'dam" ] .
      Blank nodes
      :nl geo:capital _:bnode1 .
      _:bnode1 geo:name "A'dam" .
    • Querying
      Originally there were many languages
      SPARQL, nRQL, SeRQL, etc.
      SPARQL Query Language for RDF
      Version 1.1 in the making...
    • Do you know SQL?
      Formulate a query on the relational model
      students(name, age, address)
      Structured Query Language
      SELECT name      -- data needed
      FROM students    -- data source
      WHERE age > 20   -- data constraint
    • SPARQL Query Syntax
      Inspired by SQL (select-from-where)
      select: the entities (variables) you want to return
      SELECT ?city
      from: a data source (RDF graph)
      FROM <http://example.org/geo.rdf>
      where: the (sub)graph you want to get information from
      WHERE { ?city geo:areacode "010" . }
      Including additional constraints on objects, using operators
      WHERE { ?city geo:areacode ?c . FILTER (?c > 010) }
      PREFIX geo: <http://example.org/geo/>
      SELECT ?city
      FROM <http://example.org/geoData.rdf>
      WHERE { ?city geo:areacode ?c .
              FILTER (?c > 010) }
    • SPARQL Graph Patterns
      WHERE clause specifies graph pattern
      pattern should be matched
      pattern can match more than once
      Graph pattern:
      an RDF graph
      with some nodes/edges as variables
    • Basis: triple patterns
      Triples with one/more variables
      Turtle syntax
      ?x geo:hasCapital ?y
      ?x geo:areacode "020"^^xsd:integer
      ?x ?p ?y
      All of them match the graph:
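Matching a triple pattern amounts to comparing it position by position with each triple and recording what the variables bind to. A minimal sketch, with terms as plain strings and a two-triple graph invented for illustration (the helper names `match` and `find` are not from any library):

```python
def match(pattern, triple, binding=None):
    """Match one triple pattern against one triple.

    Terms starting with '?' are variables. Returns the (extended) binding
    as a dict, or None when the triple does not match.
    """
    binding = dict(binding or {})
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding

def find(pattern, graph):
    """All bindings for which the pattern matches some triple in the graph."""
    results = []
    for triple in graph:
        binding = match(pattern, triple)
        if binding is not None:
            results.append(binding)
    return results

# Hypothetical toy graph.
graph = [
    (":nl", "geo:hasCapital", ":amsterdam"),
    (":amsterdam", "geo:areacode", "020"),
]

print(find(("?x", "geo:hasCapital", "?y"), graph))
print(find(("?x", "geo:areacode", "020"), graph))
print(find(("?x", "?p", "?y"), graph))
```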
    • Conjunctions: several patterns
      A pattern with several graphs, all must match
      PREFIX geo: <http://example.org/geo/>
      PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
      SELECT ?x
      FROM <http://example.org/geoData.rdf>
      WHERE { { ?x geo:hasCapital ?y } { ?y geo:areacode "020"^^xsd:integer } }
      equivalent to
      PREFIX geo: <http://example.org/geo/>
      PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
      SELECT ?x
      FROM <http://example.org/geoData.rdf>
      WHERE { ?x geo:hasCapital ?y .
              ?y geo:areacode "020"^^xsd:integer . }
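A conjunction is a join: every solution of the first pattern is extended with compatible matches of the next. The sketch below uses a naive nested-loop join over an invented four-triple graph; real triple stores use indexes instead, and the helper names are assumptions.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    binding = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding

def solve(patterns, graph):
    """Nested-loop join: all bindings that satisfy every pattern."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in graph:
                candidate = match(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        solutions = extended
    return solutions

# Hypothetical toy graph.
graph = [
    (":nl", "geo:hasCapital", ":amsterdam"),
    (":fr", "geo:hasCapital", ":paris"),
    (":amsterdam", "geo:areacode", "020"),
    (":paris", "geo:areacode", "01"),
]

solutions = solve(
    [("?x", "geo:hasCapital", "?y"), ("?y", "geo:areacode", "020")],
    graph,
)
print(solutions)
```

The shared variable `?y` is what ties the two patterns together: the second pattern only matches triples whose subject equals the capital found by the first.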
    • Conjunctions: several patterns (2)
      A pattern with several graphs, all must match
      PREFIX geo: <http://example.org/geo/>
      PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
      SELECT ?x
      FROM <http://example.org/geoData.rdf>
      WHERE { { ?x geo:hasCapital ?y } { ?y geo:areacode "020"^^xsd:integer } }
      equivalent to
      PREFIX geo: <http://example.org/geo/>
      PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
      SELECT ?x
      FROM <http://example.org/geoData.rdf>
      WHERE { ?x geo:hasCapital [ geo:areacode "020"^^xsd:integer ] . }
    • Alternatives: UNION
      A pattern with several graphs
      At least one should match
      PREFIX geo: <http://example.org/geo/>
      SELECT ?city
      FROM <http://example.org/geoData.rdf>
      WHERE {
        { ?city geo:name "Parijs"@nl . }
        UNION
        { ?city geo:name "Paris"@fr . }
      }
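UNION simply concatenates the solutions of its alternatives. In the sketch below, language-tagged literals are modelled as `(text, language)` tuples, which is a choice made for this example only; the data and the function name `cities_named` are likewise invented.

```python
# Hypothetical toy data; a literal with a language tag is a (text, lang) tuple.
triples = [
    (":paris", "geo:name", ("Parijs", "nl")),
    (":paris", "geo:name", ("Paris", "fr")),
    (":london", "geo:name", ("London", "en")),
]

def cities_named(triples, *labels):
    """Union of the matches of one geo:name pattern per label."""
    results = []
    for label in labels:
        results.extend(
            s for s, p, o in triples if p == "geo:name" and o == label
        )
    return results

cities = cities_named(triples, ("Parijs", "nl"), ("Paris", "fr"))
print(cities)
```

Both alternatives match the same city here, so it is returned twice; in SPARQL one would add DISTINCT to collapse such duplicates.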
    • Optional Graphs
      RDF allows for ‘partial’ representations
      “Give me all people with names, and if known their email address”
      Use an OPTIONAL graph expression
      PREFIX : <http://example.org/geo/>
      SELECT ?person ?name ?email
      WHERE {
        ?person :name ?name .
        OPTIONAL { ?person :email ?email }
      }
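OPTIONAL behaves like a left outer join: every person with a name is kept, and the email column stays empty when nothing matches. A small sketch over invented data (the people, addresses and function name are hypothetical):

```python
# Hypothetical toy data.
triples = [
    (":anna", ":name", "Anna"),
    (":anna", ":email", "anna@example.org"),
    (":bob", ":name", "Bob"),
]

def names_with_optional_email(triples):
    """Rows of (person, name, email), with email None when it is unknown."""
    names = {s: o for s, p, o in triples if p == ":name"}
    emails = {s: o for s, p, o in triples if p == ":email"}
    return [
        (person, name, emails.get(person))
        for person, name in sorted(names.items())
    ]

table = names_with_optional_email(triples)
for row in table:
    print(row)
```

Without OPTIONAL (an ordinary join on both patterns) the row for `:bob` would disappear entirely.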
    • Testing values of nodes
      Tests in a FILTER clause must hold for a subgraph to match
      for resources with partially known names
      for literals with unknown language tag
      PREFIX geo: <http://example.org/geo/>
      SELECT ?x ?n
      WHERE {
        ?x ?p ?n .
        FILTER ( regex(str(?p), "areacode") )
      }
    • Testing values of nodes
      Tests in FILTER clause
      Comparison (<=, <, =, etc.)
      Arithmetic operators (+, -, etc.)
      String matching using regular expressions
      regex(?x, "netherlands", "i")
      ... matches "The Netherlands"
      Boolean combination of these
      && (and), || (or), ! (not)
      (?y > 10 && ?y < 30) || !regex(?z, "Rott")
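The two FILTER expressions above translate almost literally into Python, which can help when checking what a filter will let through. This is an analogy only: SPARQL's regex flavour and type rules differ in details from Python's `re` module, and the function name `keep` is invented.

```python
import re

def keep(y, z):
    """Python rendering of: (?y > 10 && ?y < 30) || !regex(?z, "Rott")"""
    return (10 < y < 30) or not re.search("Rott", z)

# regex(?x, "netherlands", "i") matches "The Netherlands":
case_insensitive = re.search("netherlands", "The Netherlands", re.IGNORECASE)
print(bool(case_insensitive))

print(keep(20, "Rotterdam"))   # first disjunct holds
print(keep(40, "Rotterdam"))   # neither disjunct holds
print(keep(40, "Amsterdam"))   # second disjunct holds
```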
    • Boolean comparisons and datatypes
      RDF has basic datatypes for literals
      xsd:integer, xsd:float, xsd:string, xsd:dateTime etc.
      Datatypes can be used in value comparisons
      ?x < "21"^^xsd:integer
      ... and be obtained from literals, e.g. datatype(?x)
    • Solution modifiers
      Sorting using ORDER BY
      Limiting the number of results: LIMIT
      PREFIX : <http://example.org/geo/>
      SELECT ?dog ?age
      WHERE { ?dog a :Dog ; :age ?age . }
      ORDER BY DESC(?age)
      PREFIX : <http://example.org/geo/>
      SELECT ?dog ?age
      WHERE { ?dog a :Dog ; :age ?age . }
      ORDER BY ?dog
      LIMIT 10
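Solution modifiers act on the result table after matching is done. The sketch below mimics ORDER BY DESC(?age) and ORDER BY ?dog with LIMIT on a made-up list of (?dog, ?age) rows; a limit of 2 is used instead of 10 so that the effect is visible on three rows.

```python
# Hypothetical solutions for (?dog, ?age).
dogs = [("rex", 7), ("fido", 3), ("bello", 12)]

# ORDER BY DESC(?age)
by_age_desc = sorted(dogs, key=lambda row: row[1], reverse=True)

# ORDER BY ?dog LIMIT 2
first_two_by_name = sorted(dogs, key=lambda row: row[0])[:2]

print(by_age_desc)
print(first_two_by_name)
```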
    • SPARQL query types
      SELECT: table with variable bindings
      SELECT ... WHERE { ... }
      CONSTRUCT: returns a graph
      CONSTRUCT { ... } WHERE { ... }
      ASK: returns yes/no
      ASK { ... }
      DESCRIBE: returns a graph
      DESCRIBE dbpedia:Amsterdam
    • SELECT Query Results
      Solutions consist of variable bindings
      For each variable in the query, it gives a value (or list)
      The result is a table, where each column is a variable and each row a combination of variable bindings
      PREFIX geo: <http://example.org/geo/>
      SELECT ?x ?y ?z
      WHERE { ?x geo:contains ?y .
              OPTIONAL { ?y geo:areacode ?z } }
    • CONSTRUCT query results
      Construct queries return RDF statements
      The query result is either a subgraph or a transformed graph.
      PREFIX geo: <http://example.org/geo/>
      CONSTRUCT { ?x geo:hasCapital ?y . }
      WHERE { ?x geo:containsCity ?y .
              ?y geo:name "Amsterdam"@nl . }
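A CONSTRUCT query matches a pattern and then instantiates a template with each solution, yielding new triples. The sketch below hard-codes the pattern and template of the query above over invented data; language-tagged literals are modelled as `(text, language)` tuples, and the function name is an assumption.

```python
# Hypothetical toy data.
triples = [
    (":nl", "geo:containsCity", ":amsterdam"),
    (":nl", "geo:containsCity", ":rotterdam"),
    (":amsterdam", "geo:name", ("Amsterdam", "nl")),
    (":rotterdam", "geo:name", ("Rotterdam", "nl")),
]

def construct_capitals(triples, name=("Amsterdam", "nl")):
    """WHERE: ?x containsCity ?y . ?y name <name>   CONSTRUCT: ?x hasCapital ?y"""
    named = {s for s, p, o in triples if p == "geo:name" and o == name}
    return [
        (x, "geo:hasCapital", y)
        for x, p, y in triples
        if p == "geo:containsCity" and y in named
    ]

graph = construct_capitals(triples)
print(graph)
```

The result is itself a list of triples, so it could be stored or queried again, unlike the table a SELECT query returns.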
    • Recap
      SPARQL is the query language for the web of data
      Queries are sent to ‘endpoints’ on the web
      Queries describe graph patterns with variables
      Graph patterns match the graphs in the triple store
      Results are typically returned as a table
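Sending a query to an endpoint is an ordinary HTTP request with the query text in a `query` parameter. The sketch below only builds the request object and does not send it; the endpoint URL is a placeholder and `build_request` is a name made up for this example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint URL

def build_request(query):
    """Build (but do not send) a GET request carrying a SPARQL query."""
    url = ENDPOINT + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

request = build_request("ASK { ?s ?p ?o }")
print(request.full_url)

# The query survives URL encoding unchanged:
print(parse_qs(urlsplit(request.full_url).query)["query"][0])
```

Passing the request to `urllib.request.urlopen` would perform the call; the Accept header asks for the JSON results format rather than XML.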