Linked (Open) Data
INFO 4302 - April 18, 2011
Bernhard Haslhofer - Cornell University
Who am I?

• Postdoc at Cornell Information Science
• Research areas
 • linked data
 • user-contributed data (annotations)
 • (meta-)data interoperability
• Contact:
 • bernhard.haslhofer@cornell.edu
Today we talk about...



http://www.youtube.com/watch?v=5Cb3ik6zP2I
Today we talk about...

• Movies, actors and other real-world entities
• How to make data about these entities
 available on the Web (Linked Data)
• Enabling technologies, best-practices and
 useful tools that help us in doing so
• Other Linked Data projects (BBC, LoC)
Web Architecture Recap
The World Wide Web (WWW)
• Internet != WWW != Google != Facebook
• Fundamental technologies
 • URI - a simple and generic syntax for identifiers
 • HTML - a markup language without formal schema
     binding
 •   HTTP - a simple protocol to access and manipulate
     resources and resource representations in a
     distributed environment

• World Wide Web Consortium, W3C (http://www.w3.org)
URIs

• Identification of resources via Uniform
  Resource Identifiers (URIs)
• Generic syntax: a hierarchical sequence of components: scheme,
  authority, path, query, and fragment

  URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

  The scheme and path are required, though the path may be empty.

  Example URI with components:

     foo://example.com:8042/over/there?name=ferret#nose
     \_/   \______________/\_________/ \_________/ \__/
      |           |            |            |        |
   scheme     authority       path        query   fragment

• URLs and URNs are both kinds of URI
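The component breakdown above can be reproduced with Python's standard urllib.parse module. This is only an illustrative sketch; note that urllib calls the authority component "netloc".

```python
from urllib.parse import urlsplit

# Example URI from the slide above.
uri = "foo://example.com:8042/over/there?name=ferret#nose"

parts = urlsplit(uri)

print("scheme:   ", parts.scheme)
print("authority:", parts.netloc)   # urllib's name for the authority
print("path:     ", parts.path)
print("query:    ", parts.query)
print("fragment: ", parts.fragment)
```

The fragment is split off on the client side; it is not part of what is sent to the server when the URI is dereferenced.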
URIs / Resources

• Information Resource
 •   web pages, images, product catalogs, etc
 •   all their essential characteristics can be conveyed in a
     message
 •   e.g., http://www.flickr.com/user2/photos/image.jpg

• Non-Information Resource
 •   other things such as dogs, people, this classroom, concepts
 •   their essence is not information
 •   e.g., http://www.example.com/ontology/meter
HTTP


• A stateless request-response protocol in the
 client-server computing model
• HTTP methods: GET, POST, PUT, DELETE, ...
• Agents may use a URI to access the
 referenced resource = dereferencing the URI
HTTP Content Negotiation

• A URI is not (necessarily) a filename
• Conneg = making available multiple resource
 representations via the same URI

  URI (resource): http://example.com/The_Shining

  Representations:
   • Plain Text - text/plain
   • HTML (en) - text/html
   • HTML (ja) - text/html
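How a server might pick one representation for a URI can be sketched as follows. This is a simplified illustration, not a complete implementation of HTTP content negotiation: it only looks at the Accept header (media types), ignores Accept-Language, and does not rank specific media ranges above wildcards. The function names parse_accept, matches and negotiate are made up for this example.

```python
def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((media_range, q))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(ranges, key=lambda pair: pair[1], reverse=True)


def matches(media_range, media_type):
    """True if a media range such as text/* or */* covers media_type."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.startswith(media_range[:-1])
    return media_range == media_type


def negotiate(accept_header, available):
    """Return the first available media type the client accepts, or None."""
    for media_range, q in parse_accept(accept_header):
        if q <= 0:
            continue
        for media_type in available:
            if matches(media_range, media_type):
                return media_type
    return None


# Representations offered for http://example.com/The_Shining
available = ["text/plain", "text/html"]
print(negotiate("text/html, text/plain;q=0.5", available))
print(negotiate("application/rdf+xml", available))
```

When nothing matches, a real server would typically answer 406 Not Acceptable or fall back to a default representation.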
(X)HTML(5)
• A resource representation data format...
• ... for presentation markup
 • rendered by user agents (typically browsers)
 • focus on readability
 • less formal, user-friendly syntax and semantics
Web Services
• Application-to-application communication
 based on the Web architecture
 • simple and open standards (HTTP, XML, JSON, ...)
 • send data from Application A to Application B
     through the Web
 •   usually define some API



         Application A  <->  Web  <->  Application B
Linked Data
Why Linked Data?

• There is lots of information on the Web
• ...valuable information that can be (re-)used
• Problem
 • information is usually expressed in the form of
     HTML documents
 •   the underlying raw data are locked in closed data
     silos (mostly DBMS)
(c) http://www.flickr.com/photos/docsearls/5500714140
Why Linked Data?

• The Web is successful because it provides
 • Uniform encoding (HTML)
 • Uniform addressing (URI)
 • Uniform transportation (HTTP)
 for the exchange of documents.
• Why not apply the same mechanism to the
 underlying data?
What is Linked Data?

• A method to build a Web of Data
• Architectural style, set of standards



What is Linked Data?

• A set of four principles
 • use URIs as names for things
 • use HTTP URIs so that people can look up those
     names
 •   when someone looks up a URI, provide useful
     information, using the standards (RDF, SPARQL)
 •   include links to other URIs, so that they can
     discover more things
Enabling Technologies
Uniform Resource Identifiers (URI)

• Name and identify things (resources)
• Dereferenceable HTTP URIs
 • http://dbpedia.org/resource/The_Shining_(film)
 • http://data.linkedmdb.org/resource/film/2014
 • http://rdf.freebase.com/ns/m/04fjzv
Resource Description Framework (RDF)

• A model for representing data on the Web
• Several statements (triples) form a graph
  Example graph (subject, predicate, object):

  <http://dbpedia.org/resource/The_Shining_(film)>
     rdf:type          <http://dbpedia.org/ontology/Film>
     rdfs:label        "The Shining (film)"
     rdfs:label        (a second label in another language)
     dbpprop:starring  <http://dbpedia.org/resource/Jack_Nicholson>

  <http://dbpedia.org/resource/Jack_Nicholson>
     rdf:type               <http://xmlns.com/foaf/0.1/Person>
     foaf:name              "Jack Nicholson"
     dbpedia-owl:birthDate  "1937-04-22"
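The idea that a graph is just a set of (subject, predicate, object) statements can be sketched in plain Python. This is a toy model for illustration only: URIs and literals are both plain strings here, which a real RDF library would keep apart, and the match helper is a made-up name.

```python
DBR = "http://dbpedia.org/resource/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
STARRING = "http://dbpedia.org/property/starring"

film = DBR + "The_Shining_(film)"
actor = DBR + "Jack_Nicholson"

# Each tuple is one statement (triple); together they form the graph.
graph = {
    (film, RDF_TYPE, "http://dbpedia.org/ontology/Film"),
    (film, RDFS_LABEL, "The Shining (film)"),
    (film, STARRING, actor),
    (actor, RDF_TYPE, "http://xmlns.com/foaf/0.1/Person"),
    (actor, FOAF_NAME, "Jack Nicholson"),
}


def match(graph, s=None, p=None, o=None):
    """Return all triples that fit the pattern; None acts as a wildcard."""
    return [
        triple
        for triple in graph
        if (s is None or triple[0] == s)
        and (p is None or triple[1] == p)
        and (o is None or triple[2] == o)
    ]


# Who stars in the film?
for _, _, who in match(graph, s=film, p=STARRING):
    print(who)
```

Because the object of one triple (the actor) is the subject of others, following shared URIs is what turns separate statements into a graph.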
RDF serialization (RDF/XML, N3, Turtle, etc.)


• Data formats for RDF resource representations
  (RDF/XML, N3, Turtle, N-Triples, etc.)
• Used to transfer RDF data between apps

  N3/Turtle example:

         @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
         @prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
         @prefix dbpprop: <http://dbpedia.org/property/> .
         @prefix ns9: <http://dbpedia.org/datatype/> .

         <http://dbpedia.org/resource/The_Shining_%28film%29>
               rdf:type dbpedia-owl:Work , dbpedia-owl:Film ;
               dbpprop:runtime "146.0"^^ns9:minute .

   (c) Prof. Dr. Wolfgang Klas and Dr. Bernhard Haslhofer, WS 2010/11 - Multimediale Systeme 2
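To make "serialization" concrete, here is a minimal sketch that writes a couple of triples in the line-based N-Triples format (one statement per line, full URIs, no prefixes). The helper names iri, literal and to_ntriples are invented for this example, and the escaping covers only backslashes, double quotes and newlines.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(text, datatype=None):
    """Quote a literal value and optionally attach a datatype URI."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    term = '"' + escaped + '"'
    if datatype is not None:
        term += "^^" + iri(datatype)
    return term


def to_ntriples(triples):
    """Serialize already-formatted terms, one statement per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


film = iri("http://dbpedia.org/resource/The_Shining_%28film%29")
triples = [
    (film, iri(RDF_TYPE), iri("http://dbpedia.org/ontology/Film")),
    (
        film,
        iri("http://dbpedia.org/property/runtime"),
        literal("146.0", "http://dbpedia.org/datatype/minute"),
    ),
]

output = to_ntriples(triples)
print(output, end="")
```

The same two statements appear in the Turtle example above in abbreviated form; the formats differ in syntax, not in the triples they carry.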
RDF Vocabulary Description Language (RDFS)

• A language for describing the syntax and
 semantics of vocabularies in a machine-
 understandable way

  <http://dbpedia.org/ontology/Film>
     rdfs:subClassOf  <http://dbpedia.org/ontology/Work>
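What "machine-understandable" buys us can be sketched with the subclass statement above: if Film is a subclass of Work, anything typed as a Film can also be treated as a Work. The following toy reasoner covers only this one RDFS rule; prefixed names are used as plain strings for brevity, and infer_types is a made-up helper name.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def infer_types(triples):
    """Add rdf:type triples implied by rdfs:subClassOf until nothing changes."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = {(s, o) for s, p, o in inferred if p == SUBCLASS_OF}
        # Iterate over a copy because new triples are added inside the loop.
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for sub, sup in subclass_pairs:
                if o == sub and (s, RDF_TYPE, sup) not in inferred:
                    inferred.add((s, RDF_TYPE, sup))
                    changed = True
    return inferred


data = {
    ("dbpedia:The_Shining_(film)", RDF_TYPE, "dbpedia-owl:Film"),
    ("dbpedia-owl:Film", SUBCLASS_OF, "dbpedia-owl:Work"),
}

result = infer_types(data)
for triple in sorted(result - data):
    print(triple)
```

The loop repeats until no new triple is added, so chains of subclasses (A under B under C) are followed as well.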
OWL - Web Ontology Language
• A more expressive (formal) language for defining the
  syntax and semantics of vocabularies
• Solves RDFS shortcomings but introduces quite some
  complexity

  <http://dbpedia.org/ontology/starring>
     rdf:type     <http://www.w3.org/2002/07/owl#ObjectProperty>
     rdfs:domain  <http://dbpedia.org/ontology/Work>
     rdfs:range   <http://dbpedia.org/ontology/Person>
     rdfs:label   "starring"
Simple Knowledge Organization System (SKOS)

• A language for describing controlled vocabularies
      (taxonomies, thesauri, classification schemes)


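  <http://dbpedia.org/resource/The_Shining_(film)>
     skos:subject  <http://dbpedia.org/resource/Category:1980s_horror_films>

  <http://dbpedia.org/resource/Category:1980s_horror_films>
     rdf:type      <http://www.w3.org/2004/02/skos/core#Concept>
     skos:broader  <http://dbpedia.org/resource/Category:1980s_films>

  <http://dbpedia.org/resource/Category:1980s_films>
     rdf:type      <http://www.w3.org/2004/02/skos/core#Concept>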
Links between Resources

• OWL defines properties for linking resources

  <http://dbpedia.org/resource/The_Shining_(film)>
     dbpprop:starring  <http://dbpedia.org/resource/Jack_Nicholson>
     owl:sameAs        <http://data.linkedmdb.org/resource/film/2014>
     owl:sameAs        <http://rdf.freebase.com/ns/m/04fjzv>

  <http://dbpedia.org/resource/Jack_Nicholson>
     owl:sameAs        <http://data.nytimes.com/N5761411277431266513>
SPARQL

• A query language and protocol for accessing
  RDF data on the Web

   PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

   SELECT DISTINCT ?x

   WHERE { ?x skos:subject <http://dbpedia.org/resource/Category:1980s_horror_films> }

   LIMIT 10
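The "protocol" half of SPARQL means a query travels over plain HTTP, typically as a query parameter on a GET request. A minimal sketch of building such a request URL with the standard library follows. It assumes DBpedia's public endpoint at http://dbpedia.org/sparql and does not send the request; whether the endpoint's data still uses skos:subject is not confirmed here. build_query_url is a made-up helper name.

```python
from urllib.parse import urlencode

# Assumption: DBpedia's public SPARQL endpoint.
ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?x
WHERE { ?x skos:subject <http://dbpedia.org/resource/Category:1980s_horror_films> }
LIMIT 10
"""


def build_query_url(endpoint, query):
    """Encode a query as a SPARQL Protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

The resulting URL can be fetched with any HTTP client; an Accept header then selects the result format the endpoint should return.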
Vocabulary / Data Publishing Best Practices
Publishing Vocabularies
• Hash-based URIs
 •   e.g., http://example.com/example1#ClassA
 •   Suited to group the description of a moderate number of
     related terms into one RDF document
 •   Agent can retrieve terms with a single request

• Slash-based URIs
 •   e.g., http://example.com/example1/ClassB
 •   Suited to split terms in large vocabularies into one
     document per term
 •   No need to download a massive document
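Why hash URIs lead to a single document can be seen by stripping the fragment, which is what an HTTP client does before sending a request. A small sketch using the standard library and the two example terms from the slide:

```python
from urllib.parse import urldefrag

hash_term = "http://example.com/example1#ClassA"
slash_term = "http://example.com/example1/ClassB"

for term in (hash_term, slash_term):
    # urldefrag returns the URI without its fragment, plus the fragment.
    document, fragment = urldefrag(term)
    print(term, "-> requested document:", document)
```

Every hash-based term in the vocabulary resolves to the same document, while each slash-based term is requested under its own URI.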
Provide either:
 • human-readable content from the vocabulary URI, or
 • machine-readable content from the vocabulary URI
... depending on what is requested.
Publishing Data

• Distinguish between non-information and
 information resource
• Sample non-information resource
 • http://dbpedia.org/resource/The_Shining_(film)
• Sample information resource
 • http://dbpedia.org/page/The_Shining_(film) - HTML
 • http://dbpedia.org/data/The_Shining_(film) - RDF
Publishing Data

       GET http://dbpedia.org/resource/The_Shining_(film)
       Accept: application/rdf+xml



       303 See Other
       Location: http://dbpedia.org/data/The_Shining_(film)



       GET http://dbpedia.org/data/The_Shining_(film)
       Accept: application/rdf+xml



       200 OK
       ...
       <?xml version="1.0" encoding="utf-8"?>
       <rdf:RDF ...
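The server side of the exchange above can be sketched as a small function that answers requests for a non-information resource with a 303 redirect to the matching information resource. The /resource/, /data/ and /page/ path layout follows the DBpedia examples on these slides; the function name dereference and the simple substring check on the Accept header are simplifying assumptions, not how DBpedia is actually implemented.

```python
def dereference(path, accept_header):
    """Return (status, headers) for a request to a /resource/ URI path."""
    prefix = "/resource/"
    if not path.startswith(prefix):
        return 404, {}
    name = path[len(prefix):]
    # Simplification: a real server would parse q-values in the Accept header.
    if "application/rdf+xml" in accept_header:
        target = "/data/" + name   # RDF description of the thing
    else:
        target = "/page/" + name   # HTML description of the thing
    # 303 See Other: the thing itself cannot be sent, a description can.
    return 303, {"Location": target, "Vary": "Accept"}


print(dereference("/resource/The_Shining_(film)", "application/rdf+xml"))
print(dereference("/resource/The_Shining_(film)", "text/html"))
```

The client then issues a second GET to the Location it was given and receives 200 OK with the actual document.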
The Linking Open Data Community Project
Linking? Open? Data Project?

• Open Data: a philosophy, practice, or policy that data are
  freely available to everyone without restrictions from
  copyright, patents, and so on.

• Linked Data: method / best practices for exposing, sharing,
  and connecting data using URIs and RDF

• Linking Open Data: a W3C community project with the
  goal to extend the Web with a data commons by publishing
  various open data sets as RDF on the Web and by setting
  links between data items from different sources
Useful Tools
RDF APIs
•   Java
    •   Jena Semantic Web Framework (http://openjena.org/)
    •   Sesame RDF API (http://www.openrdf.org/)

•   PHP
    •   ARC (http://arc.semsol.org/)

•   Ruby
    •   RDF.rb: Linked Data for Ruby (http://rdf.rubyforge.org/)

•   Python
    •   RDFLib (http://www.rdflib.net/)

•   C
    •   Redland RDF Libraries (http://librdf.org/)
RDF Stores

• OpenLink Virtuoso (http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/)
• 4Store (http://4store.org/)
• AllegroGraph (http://www.franz.com/agraph/allegrograph/)
• Oracle 11g (http://www.oracle.com/technetwork/database/options/semantic-tech/index.html)
• ...and many more: http://www.w3.org/2001/sw/wiki/Tools
RDF / Linked Data Wrappers
• D2RQ - SPARQL / Linked Data for relational
 databases (http://www4.wiwiss.fu-berlin.de/bizer/d2rq/)
• OAI2LOD Server - expose any OAI-PMH
 source as Linked Data
• TripFS - filesystem as Linked Data
• TripCel - XLS spreadsheets as Linked Data
• ...
Linked Data debugging

Start up your console / terminal
  - native on Linux / Mac OS X
  - Windows: http://www.cygwin.com/

Dereference resources with cURL (http://curl.haxx.se/)
curl -I -H "Accept: application/rdf+xml" http://dbpedia.org/resource/The_Shining_%28film%29

curl -H "Accept: application/rdf+xml" http://dbpedia.org/data/The_Shining_%28film%29
Linked Data debugging

Install the Raptor RDF Syntax Library (http://librdf.org/raptor/)
  - Mac: brew install raptor

Use the rapper utility to dereference URIs
rapper http://dbpedia.org/resource/The_Shining_%28film%29

rapper -o rdfxml http://dbpedia.org/resource/The_Shining_%28film%29
Readings
Required Reading



• T. Heath, C. Bizer. Linked Data: Evolving the Web into a
  Global Data Space, Chapters 1-5

  http://linkeddatabook.com/editions/1.0/
Recommended Readings
• Linked Data Web Site: http://linkeddata.org
• Linked Data / Semantic Web Introduction:
  http://www.linkeddatatools.com/semantic-web-basics
• Tim Berners-Lee. Linked Data Design Issues:
  http://www.w3.org/DesignIssues/LinkedData.html
• Best Practice Recipes for Publishing RDF Vocabularies:
  http://www.w3.org/TR/swbp-vocab-pub/
• How to Publish Linked Data on the Web:
  http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/
