Linked Data: Spreading data over the web
Damian Steer
d.steer@bris.ac.uk
Overview

What is linked data?
A brief primer on RDF
How we used linked data in Research Revealed
The hidden world of data on the web
Recent developments: Facebook, Google, Yahoo, Bing
Information Management: A Proposal
“To a computer, then, the web is a flat, boring world devoid of meaning.
This is a pity, as in fact documents on the web describe real objects and imaginary concepts, and give particular relationships between them.”
Real ...
...and imaginary objects
Data
Linked Data
Use URIs as names for things.
Use HTTP URIs so that people can look up those names.
When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
Include links to other URIs, so that they can discover more things.
RDF is simple


<http://www.bristol.ac.uk/>
  <http://purl.org/dc/terms/hasVersion>
    <http://m.bristol.ac.uk> .

<http://www.bristol.ac.uk/>
  <http://purl.org/dc/terms/title>
    "Bristol University homepage" .
RDF is simple

<http://www.bristol.ac.uk/>                  # subject
  <http://purl.org/dc/terms/hasVersion>      # predicate
    <http://m.bristol.ac.uk> .               # object

<http://www.bristol.ac.uk/>
  <http://purl.org/dc/terms/title>
    "Bristol University homepage" .
RDF is simple

<http://www.bristol.ac.uk/>                  # subject
  <http://purl.org/dc/terms/hasVersion>      # predicate
    <http://m.bristol.ac.uk> .               # object

<http://www.bristol.ac.uk/>
  <http://purl.org/dc/terms/title>
    "Bristol University homepage" .          # literal
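
The two statements above can be held as plain (subject, predicate, object) tuples. The following is a minimal Python sketch, not a full serialiser: the `URI` and `Literal` types and the `to_ntriples` helper are hypothetical names introduced for illustration, and only backslashes and double quotes are escaped.

```python
from typing import NamedTuple, Union


class URI(NamedTuple):
    """A resource named by a URI (hypothetical helper type)."""
    value: str


class Literal(NamedTuple):
    """A plain string value (hypothetical helper type)."""
    value: str


Node = Union[URI, Literal]

DCTERMS = "http://purl.org/dc/terms/"

# The two statements from the slide, as (subject, predicate, object)
triples = [
    (URI("http://www.bristol.ac.uk/"), URI(DCTERMS + "hasVersion"),
     URI("http://m.bristol.ac.uk")),
    (URI("http://www.bristol.ac.uk/"), URI(DCTERMS + "title"),
     Literal("Bristol University homepage")),
]


def term(node: Node) -> str:
    """Render one node in N-Triples style: <uri> or "literal"."""
    if isinstance(node, URI):
        return f"<{node.value}>"
    escaped = node.value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(statements) -> str:
    """One statement per line, each terminated by ' .'."""
    return "\n".join(" ".join(term(n) for n in t) + " ." for t in statements)


print(to_ntriples(triples))
```

Each printed line is one complete statement, which is what makes the line-oriented form easy to process with ordinary text tools.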
Writing it can be easy


@prefix dc: <http://purl.org/dc/terms/> .

<http://www.bristol.ac.uk/>
 dc:title "Bristol University homepage" ;
 dc:hasVersion <http://m.bristol.ac.uk> .
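
The `@prefix` shortcut is only string concatenation: `dc:title` stands for the prefix URI followed by `title`. A small Python sketch of that expansion follows; the `expand` function and `PREFIXES` table are hypothetical names used for illustration.

```python
# Prefix table matching the @prefix declaration on the slide
PREFIXES = {"dc": "http://purl.org/dc/terms/"}


def expand(name: str, prefixes: dict = PREFIXES) -> str:
    """Expand a prefixed name such as 'dc:title' to a full URI.

    Raises KeyError if the prefix has not been declared.
    """
    prefix, _, local = name.partition(":")
    return prefixes[prefix] + local


print(expand("dc:title"))
print(expand("dc:hasVersion"))
```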
Writing it...

 RDF/XML - the standard
 N-Triples - line-oriented, simple
 Turtle - human friendly, N-Triples with shortcuts
 RDFa - embedded in (X)HTML
 various JSON formats
Publishing it

 What’s ‘http://purl.org/dc/terms/hasVersion’?
 GET http://purl.org/dc/terms/hasVersion

 302 http://dublincore.org/2010/10/11/dcterms.rdf#hasVersion

 GET ...

 200 <bunch of rdf/xml, some of which concerns
 dc:hasVersion>

 “Follow your nose”
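
The lookup above can be sketched with Python's standard library. This is an illustration under assumptions: the function names are hypothetical, the `Accept` value asks for RDF/XML, and the network call is defined but not executed here. Whether the server still answers with the same redirect is not guaranteed.

```python
import urllib.request

RDF_XML = "application/rdf+xml"


def build_request(uri: str) -> urllib.request.Request:
    """Prepare a GET that asks for RDF/XML rather than HTML."""
    return urllib.request.Request(uri, headers={"Accept": RDF_XML})


def follow_your_nose(uri: str):
    """Dereference a URI; urllib follows redirects such as the 302 itself.

    Returns the final URL (after redirects) and the response body.
    Needs network access, so it is not called in this sketch.
    """
    with urllib.request.urlopen(build_request(uri), timeout=10) as response:
        return response.geturl(), response.read()


request = build_request("http://purl.org/dc/terms/hasVersion")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```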
Publishing it
 Upload an RDF file. Put everything in that.
   URLs like <http://example.com/about.rdf#me>
 Upload an RDF and an HTML version. Content negotiate.
   <../about#me> yields the HTML page in a browser.
   <../about#me> yields the RDF document if the agent asks for it.
 <../about/me>: redirect and (perhaps) content negotiate.
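
Content negotiation comes down to reading the client's `Accept` header and picking a file. A simplified Python sketch follows; the file names `about.rdf` and `about.html` are assumptions, wildcards such as `*/*` are not matched and fall through to the default, and a real web server would normally do this through its own configuration.

```python
def parse_accept(header: str):
    """Parse an Accept header into (media_type, q) pairs, best first."""
    choices = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q= parameter is given
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        choices.append((media_type, q))
    # sorted() is stable, so equal q values keep the client's order
    return sorted(choices, key=lambda c: c[1], reverse=True)


# Hypothetical file names for the two versions of the 'about' page
REPRESENTATIONS = {
    "application/rdf+xml": "about.rdf",
    "text/html": "about.html",
}


def choose(accept_header: str, default: str = "about.html") -> str:
    """Pick the file to serve for <../about>."""
    for media_type, q in parse_accept(accept_header):
        if q > 0 and media_type in REPRESENTATIONS:
            return REPRESENTATIONS[media_type]
    return default


print(choose("application/rdf+xml"))
print(choose("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
```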
DBpedia
Wikipedia as linked data
http://dbpedia.org/resource/Bristol
 Infoboxes as machine readable data
 Some cleaning of categories
 Solid base for linked data (there’s a bit of everything)
Querying: SPARQL

PREFIX dc: <http://purl.org/dc/terms/>

SELECT ?homepage ?version
WHERE {
  ?homepage
  dc:title "Bristol University homepage" ;
  dc:hasVersion ?version .
}
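
A query like the one above is a set of triple patterns in which names starting with `?` are variables. The following Python sketch shows the matching idea on the two example statements. It is a toy under stated assumptions: URIs and literals are both plain strings, and `match_pattern` and `query` are hypothetical helper names. A real SPARQL engine does far more.

```python
DC = "http://purl.org/dc/terms/"

DATA = [
    ("http://www.bristol.ac.uk/", DC + "hasVersion", "http://m.bristol.ac.uk"),
    ("http://www.bristol.ac.uk/", DC + "title", "Bristol University homepage"),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` equals `triple`."""
    result = dict(binding)
    for p, value in zip(pattern, triple):
        if is_var(p):
            if result.setdefault(p, value) != value:
                return None  # variable already bound to something else
        elif p != value:
            return None  # constant in the pattern does not match
    return result


def query(patterns, data):
    """Return every binding that satisfies all patterns (a join)."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in data:
                candidate = match_pattern(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        bindings = extended
    return bindings


results = query(
    [
        ("?homepage", DC + "title", "Bristol University homepage"),
        ("?homepage", DC + "hasVersion", "?version"),
    ],
    DATA,
)
print(results)
```

Because `?homepage` appears in both patterns, the second pattern can only match statements about the resource found by the first.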
Querying: SPARQL

PREFIX dc: <http://purl.org/dc/terms/>

SELECT ?origin ?homepage ?version
WHERE {
  GRAPH ?origin {
  ?homepage
  dc:title "Bristol University homepage" ;
  dc:hasVersion ?version .
  }
}
Try on sparql.org


PREFIX dc: <http://purl.org/dc/terms/>
SELECT *
FROM <http://purl.org/dc/terms/hasVersion>
WHERE {
  dc:hasVersion ?p ?o
}
Result

p                                                     o
<http://purl.org/dc/terms/modified>                   "2008-01-14"
<http://purl.org/dc/terms/issued>                     "2000-07-11"
<http://purl.org/dc/terms/hasVersion>                 <http://dublincore.org/usage/terms/history/#hasVersion-003>
<http://www.w3.org/2000/01/rdf-schema#label>          "Has Version"@en-US
<http://www.w3.org/2000/01/rdf-schema#subPropertyOf>  <http://purl.org/dc/elements/1.1/relation>
<http://www.w3.org/2000/01/rdf-schema#comment>        "A related resource that is a version, edition, or adaptation of the described resource."@en-US
<http://www.w3.org/2004/02/skos/core#note>            "This term is intended to be used with non-literal values as defined in the DCMI Abstract Model (http://dublincore.org/documents/abstract-model/). As of December 2007, the DCMI Usage Board is seeking a way to express this intention with a formal range declaration."@en-US
<http://www.w3.org/2000/01/rdf-schema#isDefinedBy>    <http://purl.org/dc/terms/>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>     <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property>
<http://www.w3.org/2000/01/rdf-schema#subPropertyOf>  <http://purl.org/dc/terms/relation>
Run your own store

$ curl -O http://openjena.org/repo-dev/org/openjena/fuseki/0.2.1-SNAPSHOT/fuseki-0.2.1-20110904.172006-16.zip
$ unzip fuseki*.zip
$ cd Fuseki-0.2.1-SNAPSHOT
$ mkdir DB
$ ./fuseki-server --loc DB --update /my-data

http://localhost:3030/
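
Once the server is running, a query can be sent over HTTP using the SPARQL protocol. The Python sketch below assumes the query service for the dataset is at `/my-data/query`. That path is an assumption about this Fuseki build, so check the server's front page. The function names are hypothetical and the network call is defined but not executed.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the query service for the /my-data dataset lives here.
ENDPOINT = "http://localhost:3030/my-data/query"

QUERY = "SELECT (COUNT(*) AS ?size) WHERE { ?s ?p ?o }"


def build_query_request(endpoint: str, query: str) -> urllib.request.Request:
    """SPARQL protocol: GET with the query in a 'query' parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint: str, query: str):
    """Send the query and decode the JSON results (needs a running server)."""
    with urllib.request.urlopen(build_query_request(endpoint, query)) as response:
        return json.load(response)


request = build_query_request(ENDPOINT, QUERY)
print(request.full_url)
```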
Research Revealed
Integrating the university’s research data
Faceted browsing
Behind the scenes

Lots of internal data
Funding council data
Researcher contributed
Other third-party sources
Publishing our data
 All staff have a contact page. Err, actually we seem to have dozens.
 Links to my organisation
 My organisation links to members and parent organisation
Publishing data with RDFa

RDF-in-attributes
Adds a few attributes to HTML
Links and content can become objects.
Attributes introduce properties.
RDFa

<html
   xmlns:foaf="http://xmlns.com/foaf/0.1/"
>
...
<div id="container" about="#person"
typeof="foaf:Person">
  <h1 property="foaf:name">Damian...</h1>
  <h3>
   <a href="http://..."
      rel="foaf:homepage">Homepage</a>
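
To show how those attributes turn into statements, here is a deliberately simplified Python sketch using the standard `html.parser` module. It is not a conforming RDFa processor: prefixes are left unexpanded, only `about`, `typeof`, `property` and `rel`/`href` are handled, and the class name `RDFaSketch` is hypothetical. The closing tags in the snippet are added so that it is complete. The elided name and URL are kept as they appear on the slide.

```python
from html.parser import HTMLParser

SNIPPET = """
<div id="container" about="#person" typeof="foaf:Person">
  <h1 property="foaf:name">Damian...</h1>
  <h3>
    <a href="http://..." rel="foaf:homepage">Homepage</a>
  </h3>
</div>
"""


class RDFaSketch(HTMLParser):
    """Collect (subject, predicate, object) tuples from a few RDFa attributes."""

    def __init__(self):
        super().__init__()
        self.subject = ""             # set by the most recent about=
        self.pending_property = None  # property= waiting for its text
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "about" in a:
            self.subject = a["about"]
        if "typeof" in a:
            self.triples.append((self.subject, "rdf:type", a["typeof"]))
        if "rel" in a and "href" in a:
            # A link becomes an object
            self.triples.append((self.subject, a["rel"], a["href"]))
        if "property" in a:
            self.pending_property = a["property"]

    def handle_data(self, data):
        # Element content becomes a literal object
        if self.pending_property and data.strip():
            self.triples.append((self.subject, self.pending_property, data.strip()))
            self.pending_property = None


parser = RDFaSketch()
parser.feed(SNIPPET)
for triple in parser.triples:
    print(triple)
```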
Linking with the British Library
 BL converted bibliographic data
 New books published in the UK since 1950
 3 million records (1E8 triples)
 Ought to be some crossover
Data needed munging
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dc: <http://purl.org/dc/terms/>
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix ilrt: <http://www.ilrt.org/#>
insert into <urn:x:normalised>
{ ?person foaf:name ?nlabel } where
{ select ?person (ilrt:normaliseName(?label) as ?nlabel)
  { graph ?g1 { ?s dc:contributor ?person }
    graph ?g2 { ?person rdfs:label ?label } } }
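
The slide does not show what `ilrt:normaliseName` actually did. Purely as an illustration of the kind of cleanup such a function might perform, here is a hypothetical Python sketch. Every rule in it (reordering "Surname, Forenames", stripping accents and punctuation, lowercasing) is an assumption, not the project's implementation.

```python
import re
import unicodedata


def normalise_name(label: str) -> str:
    """Hypothetical name normaliser; the rules here are guesses."""
    # "Surname, Forenames" -> "Forenames Surname"
    if "," in label:
        surname, _, forenames = label.partition(",")
        label = f"{forenames} {surname}"
    # Strip accents by dropping combining marks after decomposition
    decomposed = unicodedata.normalize("NFKD", label)
    label = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Replace punctuation with spaces, collapse whitespace, lowercase
    label = re.sub(r"[^\w\s]", " ", label)
    return " ".join(label.lower().split())


print(normalise_name("Steer, Damian"))
print(normalise_name("Damian  STEER"))
```

Two labels that normalise to the same string become candidates for being the same person, which is how string-valued authors can be matched against people.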
Linking with the British Library
 At the time the data wasn’t great
 Authors were strings
 We enriched BL data more than the other way around
 Much improved now
Linked data in the wild
BBC Nature
Ask the web about penguins


select (count(*) as ?size)
from <http://www.bbc.co.uk/nature/life/Aptenodytes>
where
{?s ?p ?o}

=> 58
BBC Programmes
Ask about the One Show
prefix po: <http://purl.org/ontology/po/>
select ?synopsis from
<http://www.bbc.co.uk/programmes/b0171t8n>
{ <http://www.bbc.co.uk/programmes/b0171t8n#programme>
   po:long_synopsis ?synopsis }
=> “Alex Jones and Joe Crowley are joined
by the actor Neil Morrissey. Larry Lamb
visits a small village in France to find
out about a British WWII airman who has
been honoured there since 1944. Marcus
Facebook Open Graph Protocol
Open Graph Protocol

Uses RDFa
Found in:
  IMDB
  Rotten Tomatoes
  ...
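
Open Graph data sits in `<meta>` elements whose `property` attribute starts with `og:`. The Python sketch below pulls those out of a page head. The page content is invented for illustration (it is not taken from IMDB or Rotten Tomatoes), and `OpenGraphSketch` is a hypothetical name.

```python
from html.parser import HTMLParser

# Hypothetical page head; the values are invented for illustration.
PAGE = """
<html prefix="og: http://ogp.me/ns#">
<head>
  <title>Example Film</title>
  <meta property="og:title" content="Example Film" />
  <meta property="og:type" content="video.movie" />
  <meta property="og:url" content="http://example.com/films/1" />
</head>
</html>
"""


class OpenGraphSketch(HTMLParser):
    """Collect og:* properties from <meta property=... content=...>."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        name = a.get("property") or ""
        if tag == "meta" and name.startswith("og:") and "content" in a:
            self.properties[name] = a["content"]


parser = OpenGraphSketch()
parser.feed(PAGE)
print(parser.properties)
```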
Schema.org
Schema.org


NOT RDF. But awfully close.
Richer than OGP (in initial incarnation)
Augment results
Questions?
http://incubator.apache.org/jena/

http://sparql.org/
