Digital Enterprise Research Institute                                                               www.deri.ie




                                   Linked Data:
                            opportunities and challenges
                                              Dr. Michael Hausenblas, DERI, NUI Galway

                      Open Science Data Cloud NSF PIRE Workshop, Edinburgh, UK, 18 July 2012




 Copyright 2011 Digital Enterprise Research Institute. All rights reserved.




                                                                              Enabling Networked Knowledge
Linked Data 101
Linked Data principles


① Use URIs to identify the “things” in your data


② Use HTTP URIs so people & machines can look them up


③ When a URI is looked up return a description of the thing in a
  structured format (RDF)


④ Link to related things to provide context



                         http://www.w3.org/DesignIssues/LinkedData.html
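
Principles ② and ③ amount to an HTTP GET whose Accept header names an RDF media type. A minimal Python sketch using only the standard library is given here; the function names `build_lookup_request` and `look_up` are illustrative and not part of the talk, and the media types are the ones used in the curl examples in this deck.

```python
import urllib.request


def build_lookup_request(uri: str, media_type: str = "text/turtle") -> urllib.request.Request:
    """Prepare an HTTP GET for `uri` that asks for an RDF serialization."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


def look_up(uri: str, media_type: str = "text/turtle", timeout: float = 10.0) -> str:
    """Dereference `uri` and return the response body as text (needs network access)."""
    request = build_lookup_request(uri, media_type)
    # urlopen follows HTTP redirects by default, similar to `curl -L`.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


if __name__ == "__main__":
    # Only builds the request; call look_up(...) to actually fetch the description.
    req = build_lookup_request("http://dbpedia.org/resource/Edinburgh", "application/rdf+xml")
    print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Separating request construction from the network call keeps the content-negotiation part easy to inspect without contacting the server.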
HTTP URIs




curl -L -H "Accept: application/rdf+xml" http://dbpedia.org/resource/Edinburgh

<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:foaf="http://xmlns.com/foaf/0.1/"
     xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
     xmlns:dbpedia-owl="http://dbpedia.org/ontology/"
     xmlns:dcterms="http://purl.org/dc/terms/"
     xmlns:dbpprop="http://dbpedia.org/property/"
     xmlns:ns10="http://dbpedia.org/property/start/" >
  <rdf:Description rdf:about="http://dbpedia.org/resource/Firrhill_High_School">
    <dbpedia-owl:city rdf:resource="http://dbpedia.org/resource/Edinburgh" />
  </rdf:Description>
  <rdf:Description rdf:about="http://dbpedia.org/resource/Murrayfield_Stadium">
    <dbpedia-owl:location rdf:resource="http://dbpedia.org/resource/Edinburgh" />
    <dbpprop:location rdf:resource="http://dbpedia.org/resource/Edinburgh" />
  </rdf:Description>
  <rdf:Description rdf:about="http://dbpedia.org/resource/Stewart%27s_Melville_College">
    <dbpedia-owl:city rdf:resource="http://dbpedia.org/resource/Edinburgh" />
    <dbpprop:city rdf:resource="http://dbpedia.org/resource/Edinburgh" />
  </rdf:Description>
HTTP URIs




curl -L -H "Accept: text/turtle" http://data.ordnancesurvey.co.uk/id/7000000000017765


<http://data.ordnancesurvey.co.uk/doc/7000000000017765> rdf:type foaf:Document, dctype:Text ;
       foaf:primaryTopic <http://data.ordnancesurvey.co.uk/id/7000000000017765> ;
       dct:title "Linked Data for The County of Hampshire" ;
       dct:hasFormat <http://data.ordnancesurvey.co.uk/doc/7000000000017765.rdf> ,
                     <http://data.ordnancesurvey.co.uk/doc/7000000000017765.html> ,
                     <http://data.ordnancesurvey.co.uk/doc/7000000000017765.json> ,
                     <http://data.ordnancesurvey.co.uk/doc/7000000000017765.ttl> .

<http://data.ordnancesurvey.co.uk/id/7000000000017636> rdfs:label "Tadley" ;
                                                       skos:prefLabel "Tadley" .

<http://data.ordnancesurvey.co.uk/id/7000000000017510> rdfs:label "Newton Valence" ;
                                                       skos:prefLabel "Newton Valence" .

<http://data.ordnancesurvey.co.uk/id/7000000000017817> rdfs:label "Ashmansworth" ;
                                                       skos:prefLabel "Ashmansworth" .
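
The response above hints at a naming pattern: the thing is identified by an `/id/` URI, while the document describing it lives under `/doc/`, with one suffix per format. The Python sketch below derives those document URLs from a thing URI. The function name `describing_documents` is hypothetical, and the pattern is inferred from this single response only; a real client would follow the server's redirects rather than rewrite URIs itself.

```python
from typing import Dict

# Suffixes listed under dct:hasFormat in the response above.
FORMATS = ("rdf", "html", "json", "ttl")


def describing_documents(thing_uri: str) -> Dict[str, str]:
    """Map an `/id/` URI to the `/doc/` URLs that describe it, one per format."""
    if "/id/" not in thing_uri:
        raise ValueError(f"not an /id/ URI: {thing_uri}")
    # Rewrite only the first "/id/" so the identifier itself is left untouched.
    doc_uri = thing_uri.replace("/id/", "/doc/", 1)
    return {suffix: f"{doc_uri}.{suffix}" for suffix in FORMATS}


if __name__ == "__main__":
    urls = describing_documents("http://data.ordnancesurvey.co.uk/id/7000000000017765")
    for suffix, url in urls.items():
        print(suffix, url)
```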
HTTP URIs




curl -L -H "Accept: text/turtle" http://bio2rdf.org/genbank:AC008393

@prefix   rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix   owl: <http://www.w3.org/2002/07/owl#> .
@prefix   rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix   dc: <http://purl.org/dc/elements/1.1/> .

<http://bio2rdf.org/genbank:AC008393> a <http://bio2rdf.org/genbank_resource:Sequence> ;
     rdfs:label "Homo sapiens chromosome 5 clone CTC-241N9, complete sequence [genbank:AC008393]" ;
     owl:sameAs <http://bio2rdf.org/genbank:ac008393> ;
     dc:title "Homo sapiens chromosome 5 clone CTC-241N9, complete sequence" ;
     dc:modified "26-FEB-2002" ;
     <http://bio2rdf.org/bio2rdf_resource:length> "166847" ;
     <http://bio2rdf.org/bio2rdf_resource:linkedToFrom>
HTTP URIs




curl -L -H "Accept: text/turtle" http://bnb.data.bl.uk/doc/resource/009468944

@prefix   rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix   rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix   dct: <http://purl.org/dc/terms/> .
@prefix   blterms: <http://www.bl.uk/schemas/bibliographic/blterms#> .
@prefix   elements: <http://iflastandards.info/ns/isbd/elements/> .
@prefix   bibo: <http://purl.org/ontology/bibo/> .
@prefix   owl: <http://www.w3.org/2002/07/owl#> .
@prefix   foaf: <http://xmlns.com/foaf/0.1/> .
@prefix   linked-data: <http://purl.org/linked-data/api/vocab#> .
@prefix   void: <http://rdfs.org/ns/void#> .

<http://bnb.data.bl.uk/id/resource/009468944> dct:language <http://lexvo.org/id/iso639-3/eng> ;
     rdfs:seeAlso <http://www4.wiwiss.fu-berlin.de/bookmashup/books/0859761541> ;
     elements:P1053 "vii,147p."@en ;
     rdfs:label "William Wallace / Andrew Fisher" ;
     blterms:bnb "GB8714157" ;
     dct:creator <http://bnb.data.bl.uk/id/person/FisherAndrew1935-> ;
     bibo:isbn10 "0859761541" ;
     dct:title "William Wallace" ;
     rdf:type bibo:Book ,
     dct:BibliographicResource ;
     dct:subject <http://bnb.data.bl.uk/id/concept/ddc/e19/941.1020924> .
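
Principle ④ is visible in the record above: it points at URIs hosted by other datasets. As a rough illustration, the following Python sketch groups the `<http://...>` references in a Turtle snippet by host. It is a regular-expression scan, not a Turtle parser (it would also match an IRI-looking string inside a literal), and the names `external_links` and `SAMPLE` are made up for this example.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Matches an IRI reference written as <http://...> or <https://...>.
IRI_PATTERN = re.compile(r"<(https?://[^<>\s]+)>")


def external_links(turtle_text: str, home_host: str) -> dict:
    """Group the IRIs found in `turtle_text` by host, skipping `home_host`."""
    by_host = defaultdict(list)
    for line in turtle_text.splitlines():
        if line.lstrip().startswith("@prefix"):
            continue  # namespace declarations are not links to things
        for iri in IRI_PATTERN.findall(line):
            host = urlparse(iri).netloc
            if host != home_host:
                by_host[host].append(iri)
    return dict(by_host)


# Three statements taken from the British National Bibliography record above.
SAMPLE = """\
<http://bnb.data.bl.uk/id/resource/009468944> dct:language <http://lexvo.org/id/iso639-3/eng> ;
     rdfs:seeAlso <http://www4.wiwiss.fu-berlin.de/bookmashup/books/0859761541> ;
     dct:creator <http://bnb.data.bl.uk/id/person/FisherAndrew1935-> .
"""

if __name__ == "__main__":
    for host, iris in external_links(SAMPLE, "bnb.data.bl.uk").items():
        print(host, iris)
```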
Linked Open Data
Linked Open Data cloud




       [Figure: snapshots of the LOD cloud diagram, 2007 to 2010]
Linked Open Data cloud
http://lod-cloud.net/




       Over 300 open data sets with 40 billion facts, interlinked by 500 million typed links.
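
Taking the rounded figures on this slide at face value, a back-of-the-envelope calculation gives a feel for the link density. The Python sketch below assumes exactly 300 data sets (the slide says "over 300", so the per-data-set averages are upper bounds); the function name `lod_ratios` is illustrative.

```python
def lod_ratios(datasets: int, facts: int, links: int) -> dict:
    """Derive simple averages from the headline LOD cloud numbers."""
    return {
        "facts_per_dataset": facts / datasets,
        "links_per_dataset": links / datasets,
        "facts_per_link": facts / links,
    }


if __name__ == "__main__":
    # Figures as quoted on the slide: 300 data sets, 40 billion facts, 500 million links.
    ratios = lod_ratios(datasets=300, facts=40_000_000_000, links=500_000_000)
    for name, value in ratios.items():
        print(f"{name}: {value:,.0f}")
```

With these inputs the ratio works out to one typed link for every 80 facts.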
Linked Open Data cloud stats

       [Figures: triples distribution and links distribution]
       Source: http://lod-cloud.net/state/

… cost and benefits
Linked Data life cycles

   http://linked-data-life-cycles.info

   Stage   Activity         Tools and resources
   1       data awareness   LOD cloud, 5stardata.info
   2       modeling         Neologism, Schema.org
   3       publishing       Google Refine, D2RQ
   4       discovery        FYN, LATC DSI
   5       integration      LATC 24/7
   6       use cases        data-gov.ie

Modeling
Neologism
  http://neologism.deri.ie/




Neologism
   http://vocab.data.gov/




Schema.org – Linked Data
Publishing
Google Refine extension
http://lab.linkeddata.deri.ie/2010/grefine-rdf-extension/




RDB2RDF – D2RQ
http://d2rq.org/




Discovery
Follow-Your-Nose
Dataset discovery
http://dsi.lod-cloud.net/




Integration
Why linking?

   http://webofdata.wordpress.com/2011/05/22/why-we-link/

   [Figure: Central Contractor Registration (CCR) and Geonames]

Effort distribution

   [Figure: fixed overall data integration effort, split into publisher's effort, consumer's effort, and third-party effort]

LATC – Interlinking Platform
http://latc-project.eu/platform




http://www4.wiwiss.fu-berlin.de/latc/toollibrary/screencast.html
Conclusion

     Opportunities:
                 Use the LOD cloud as a test-bed (experiments)
                 Benefit from the LOD cloud in apps (context)
                 Contribute to the LOD cloud to make your data more valuable

     Challenges:
                 Large-scale processing of Linked Data
                 Distributed/federated SPARQL queries
                 Quality of the links and the data
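
To make the federated-query challenge listed above concrete, here is a hedged Python sketch that wraps a graph pattern in a SPARQL 1.1 `SERVICE` clause and encodes it as a SPARQL Protocol GET URL. The helper names `federated_query` and `query_url` are invented for this example, `http://localhost:3030/ds/sparql` is a hypothetical local endpoint, and `http://dbpedia.org/sparql` is assumed to be the remote endpoint; the triple pattern reuses the `dbpedia-owl:city` property from the DBpedia response earlier in the deck. Nothing is sent over the network.

```python
from urllib.parse import urlencode


def federated_query(remote_endpoint: str, remote_pattern: str, select_vars: str) -> str:
    """Wrap `remote_pattern` in a SERVICE clause evaluated at `remote_endpoint`."""
    return (
        f"SELECT {select_vars} WHERE {{\n"
        f"  SERVICE <{remote_endpoint}> {{\n"
        f"    {remote_pattern}\n"
        f"  }}\n"
        f"}}"
    )


def query_url(local_endpoint: str, query: str) -> str:
    """Build a GET URL that passes `query` in the `query` parameter."""
    return f"{local_endpoint}?{urlencode({'query': query})}"


if __name__ == "__main__":
    pattern = "?s <http://dbpedia.org/ontology/city> <http://dbpedia.org/resource/Edinburgh> ."
    query = federated_query("http://dbpedia.org/sparql", pattern, "?s")
    print(query)
    # Hypothetical local endpoint that would delegate the SERVICE block to DBpedia.
    print(query_url("http://localhost:3030/ds/sparql", query))
```

The local endpoint has to contact the remote one for every such query, which is one reason performance and availability make federation a challenge.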



Resources
     Tutorials, technologies, specifications:
                 http://linkeddatabook.com
                 http://lod-cloud.net
                 http://linkeddata.org
                 http://linkeddata-specs.info
                 http://schema.rdfs.org

     Videos:
                 http://ted.com/talks/tim_berners_lee_on_the_next_web.html - Tim Berners-Lee’s TED talk
                 http://www.youtube.com/watch?v=GKfJ5onP5SQ - Linked Data (and the Web of Data)
                 http://www.youtube.com/watch?v=4x_xzT5eF5Q - What is Linked Data?
                 http://vimeo.com/36752317 - Linked Open Data (by Europeana)






Editor's Notes

  • #11 In the figure, each node represents a distinct dataset and arcs indicate the existence of links between data elements in the two data sets.
  • #12 Some 300 datasets, 35 billion facts, over 500 million links