Methodological Guidelines for
   Publishing Linked Data



Boris Villazón-Terrazas, Asunción Gómez-Pérez, and Óscar Corcho

    Facultad de Informática, Universidad Politécnica de Madrid
  Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
                      http://www.oeg-upm.net
               {bvillazon,asun,ocorcho}@fi.upm.es
           Phone: 34.91.3366605, Fax: 34.91.3524819



                CONSEGI 2011 – Brasília, Brazil
                      12th May, 2011
ToC




• Introduction to Linked Data

• Guidelines for Publishing Linked Data

• Demo




                           2
ToC


• Introduction to Linked Data

• Guidelines for Publishing Linked Data

• Demo




                           3
Classic Web



          MovieDB




                                                 Data exposed to
                                                  the Web via
                                                 HTML, pdf, etc.

            CIA
           World
          FactBook




© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig
                                                                       4
Classic Web




                                                  Information from single
                                                  pages can be found via
                                                  search engines

                                                  Complex queries over
                                                  multiple pages / data
                                                  sources??




© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig
                                                                       5
What do we actually want?

      • Use the Web like a single global database




    CIA
   World                                                                                     MovieDB
  FactBook




© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig
                                                                       6
Linked Data enables such a Web of Data
Global Identifier: URI (Uniform Resource Identifier), a string of characters used
               to identify a name or a resource on the Internet.
Data Model: RDF (Resource Description Framework), a standard model
               for data interchange on the Web
Access Mechanism: HTTP
Connection: Typed Links


Figure: typed links between the CIA World FactBook and MovieDB
   http://cia.../Bolivia     http://.../population          8000000
   http://imdb.../TLLuvia    http://.../name                “Even the Rain”
   http://imdb.../TLLuvia    http://.../filming_location    http://cia.../Bolivia



    © Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig
                                                                           7
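The figure can be read as a handful of RDF triples. The Python sketch below is a minimal illustration under stated assumptions: the slide elides the real URIs, so the example.org namespaces are hypothetical placeholders, and the direction of the filming_location link (from the film to the country) is read off the figure. It shows how a typed link lets a query hop from one data source to another.

```python
from typing import NamedTuple, Union

class Triple(NamedTuple):
    s: str                 # subject URI
    p: str                 # predicate URI (the "type" of the link)
    o: Union[str, int]     # object: a URI or a literal value

# Hypothetical namespaces standing in for the elided URIs on the slide.
CIA = "http://cia.example.org/"
MOVIES = "http://movies.example.org/"
VOCAB = "http://vocab.example.org/"

cia_data = [Triple(CIA + "Bolivia", VOCAB + "population", 8000000)]
movie_data = [
    Triple(MOVIES + "TLLuvia", VOCAB + "name", "Even the Rain"),
    # Typed link: the object is a URI minted by a different data source.
    Triple(MOVIES + "TLLuvia", VOCAB + "filming_location", CIA + "Bolivia"),
]

# Merging is a plain set union, because both sources use global identifiers.
web_of_data = set(cia_data) | set(movie_data)

def objects(graph, subject, predicate):
    """All objects of the triples with the given subject and predicate."""
    return [t.o for t in graph if t.s == subject and t.p == predicate]

# Follow the link from the film to the country, then ask about the country.
location = objects(web_of_data, MOVIES + "TLLuvia", VOCAB + "filming_location")[0]
population = objects(web_of_data, location, VOCAB + "population")[0]
print(population)  # prints 8000000
```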
In a nutshell
• An extension of the current
  Web…
   • … where information and data
     are given well-defined and explicitly
     represented meaning, …
   • … so that it can be shared and used
     by humans and machines, …
   • ... better enabling them to work in
     cooperation


• How?
   • Promoting information exchange by
     tagging web content with machine
     processable descriptions of its
     meaning.
   • And technologies and infrastructure
     to do this
   • And clear principles on how to
     publish data


                                             8
The four principles (Tim Berners-Lee, 2006)


1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information,
   using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that they can discover more things.

• http://www.w3.org/DesignIssues/LinkedData.html
• http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html

                            9
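Principles 2 and 3 meet when a client looks up an HTTP URI and asks for a machine-readable description. A minimal Python sketch using only the standard library follows; the Accept header preference order is an assumption, and the request is only prepared here, since actually opening it needs network access.

```python
import urllib.request

# Assumed preference order: Turtle first, then RDF/XML, HTML as a fallback.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9, text/html;q=0.5"

def build_lookup_request(uri: str) -> urllib.request.Request:
    """Prepare a GET request that asks the server for an RDF description."""
    if not uri.startswith(("http://", "https://")):
        # Principle 2: names should be HTTP URIs so that they can be looked up.
        raise ValueError(f"not an HTTP URI: {uri}")
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT})

req = build_lookup_request("http://geo.linkeddata.es/resource/Provincia/Madrid")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))

# Dereferencing needs network access; urlopen follows HTTP redirects
# (such as 303 See Other) automatically:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.headers.get("Content-Type"))
```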
So does that mean I have to publish my data as Linked Data, now?

          • But, why?

                        • What was your incentive to publish an HTML page in 1990?
                           • Share data in documents and because your neighbor
                             was doing it


                         • So, why should we publish Linked Data in 2011?
                            • Share data as data and because your neighbor is doing it




© Slide adapted from “Introduction to Linked Data”- Juan Sequeda
                                                                   10
And guess who is starting to publish Linked Data now?


 •   UK Government
 •   US Government
 •   BBC
 •   Open Calais
 •   Freebase
 •   NY Times
 •   CNET
 •   DBpedia
 •   …




                       11
Linked Open Data evolution

 2007


          2008

                             2009




           12     12
Linked Open Data

2010




http://richard.cyganiak.de/2007/10/lod/
                                          13
ToC




• Introduction to Linked Data

• Guidelines for Publishing Linked Data

• Demo




                           14
Linked Data in OEG

• GeoLinkedData is an open initiative whose aim is to
  enrich the Web of Data with Spanish geospatial data.
   http://geo.linkeddata.es

• El Viajero Linked Data is a project that focuses on the
  integration of the contents produced by newspapers
  and digital platforms belonging to Prisa Group.
   http://webenemasuno.linkeddata.es/

• A project with the Biblioteca Nacional to publish the
  library information as Linked Data.
    http://cultura.linkeddata.es/visualizer/



                           15
Linked Data in OEG

• Tools for generating and consuming Linked Data, e.g.,
   • geometry2rdf http://www.oeg-upm.net/index.php/downloads/151-geometry2rdf

   • map4rdf http://oegdev.dia.fi.upm.es/projects/map4rdf/


• Spanish Thematic Network of Linked Data
        http://red.linkeddata.es

                                 » Group leader: Ontology Engineering Group

                                 » 19 Research Groups

                                 » 4 companies




                                        16
Guidelines for Publishing Linked Data




      17
Guidelines for Publishing Linked Data




      18
Identification of the data sources



• Guidelines based on the Open Data Manual 1




• Two possibilities

   • To find the data sources already available in a public data
     catalog, e.g., Aporta project 2

   • To reach an agreement with a particular government body to
     publish its data sources, e.g., GeoLinkedData - IGN



   1   http://opendatamanual.org/
   2   http://aporta.es
                                    19
Identification of the data sources
                                                             GeoLinkedData

                                                            Agreement with the IGN
                IGN
National Geographic Institute of Spain

        Oracle & MySQL




                                                             Data sources available
                                                            in a public data catalog
         INE
National Statistics Institute of Spain




                                         20
Identification of the data sources
                                                IGN & INE




Figure: sample data with the columns Year, Province and Industry Production Index




                  21
Guidelines for Publishing Linked Data




      22
Vocabulary Modelling
                                                                            Ontology




•   An ontology is an engineering artifact, which provides:
     •   A set of terms
     •   A set of explicit assumptions regarding the intended meaning of the terms.
           • Almost always including concepts and their classification
           • Almost always including properties between concepts




•   Shared understanding of a domain of interest




                                          23
Vocabulary Modelling
                                 Reuse available vocabularies



Search for suitable
  vocabularies



                                                 Linked Open Vocabularies




Are there suitable vocabularies?
   • Yes: build the vocabulary by reusing available vocabularies
   • No: …
                            24
Vocabulary Modelling
                 Reuse available non-ontological resources

                                               Highly reliable Web Sites



   Search for suitable                         Domain-related sites
non-ontological resources

                                               Government Catalogs




Are there suitable resources?
   • Yes: build the vocabulary by transforming available resources
   • No: build the vocabulary from scratch



                                  25
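The decision flow on the two Vocabulary Modelling slides can be summarised as a small function. This is an illustrative Python sketch, assuming that the "No" branch of the first flow (reuse vocabularies) continues with the search for non-ontological resources; the resource name in the usage lines is hypothetical.

```python
from typing import List

def choose_vocabulary_strategy(suitable_vocabularies: List[str],
                               suitable_resources: List[str]) -> str:
    """Walk the vocabulary-modelling decision flow in order."""
    # 1. Reuse available vocabularies (e.g., found via Linked Open Vocabularies).
    if suitable_vocabularies:
        return "reuse vocabularies: " + ", ".join(suitable_vocabularies)
    # 2. Transform available non-ontological resources
    #    (highly reliable web sites, domain-related sites, government catalogs).
    if suitable_resources:
        return "transform resources: " + ", ".join(suitable_resources)
    # 3. Nothing suitable was found.
    return "build from scratch"

print(choose_vocabulary_strategy(["WGS84 Geo Positioning"], []))
print(choose_vocabulary_strategy([], ["hypothetical government catalog"]))
print(choose_vocabulary_strategy([], []))
```

Reuse is checked first because it is the cheapest option and produces terms that other Linked Data publishers already understand.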
Vocabulary Modelling
                                                                                                                 GeoLinkedData
Figure: vocabularies and resources used for GeoLinkedData
   • WGS84 Geo Positioning: an RDF vocabulary
   • scv:Dimension, scv:Item, scv:Dataset
   • hydrographical phenomena (rivers, lakes, etc.)
   • Vocabulary for instants, intervals, durations, etc.
   • Ontology for OGC Geography Markup Language
   • Names and international code systems for territories and groups

Classes                        33
Object Properties              44
Data Properties               318
                                                        http://neon-toolkit.org/


                                                                      26
Guidelines for Publishing Linked Data




      27
Generation of the RDF Data




INE                       →  NOR2O
IGN                       →  ODEMapster
IGN (geospatial column)   →  Geometry2RDF




                                       28
Generation of the RDF Data
                                                            NOR2O
Figure: NOR2O applied to the INE data (Industry Production Index by Year and Province)




                                   29
Generation of the RDF Data
                                                                       R2O & ODEMapster
•   R2O is an extensible, fully declarative language to describe
    mappings between relational database schemas and ontologies.
•   The ODEMapster processor generates RDF instances from
    relational instances based on the mapping description
    expressed in the R2O document.




    www.oeg-upm.net/index.php/en/downloads/9-r2o-odempaster
                                                              30
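The idea behind R2O and ODEMapster, a declarative mapping applied to relational rows to produce RDF, can be sketched in Python with an in-memory SQLite table. This is not R2O syntax and not how ODEMapster is implemented. The mapping structure, the table, the class URI (assumed to follow the ontology/{class} pattern) and the codigoINE property are all hypothetical.

```python
import sqlite3
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical declarative mapping in the spirit of R2O (NOT R2O syntax):
# table -> class, a URI template, and column -> property pairs.
MAPPING = {
    "table": "provincia",
    "key": "codigo",
    "class": "http://geo.linkeddata.es/ontology/Provincia",  # assumed class URI
    "uri_template": "http://geo.linkeddata.es/resource/Provincia/{nombre}",
    "columns": {
        "nombre": RDFS_LABEL,
        "codigo": "http://example.org/ontology/codigoINE",  # hypothetical property
    },
}

def apply_mapping(conn: sqlite3.Connection, mapping: dict) -> list:
    """Generate (subject, predicate, object) triples for every mapped row."""
    cols = list(mapping["columns"])
    # Identifiers come from the trusted mapping document, not from user input.
    sql = (f"SELECT {', '.join(cols)} FROM {mapping['table']} "
           f"ORDER BY {mapping['key']}")
    triples = []
    for row in conn.execute(sql):
        values = dict(zip(cols, row))
        # Percent-encode values used inside the URI (e.g., Málaga -> M%C3%A1laga).
        subject = mapping["uri_template"].format(
            **{k: quote(str(v), safe="") for k, v in values.items()})
        triples.append((subject, RDF_TYPE, mapping["class"]))
        for col, prop in mapping["columns"].items():
            triples.append((subject, prop, values[col]))
    return triples

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE provincia (codigo TEXT, nombre TEXT)")
conn.executemany("INSERT INTO provincia VALUES (?, ?)",
                 [("28", "Madrid"), ("29", "Málaga")])
triples = apply_mapping(conn, MAPPING)
for t in triples:
    print(t)
```

Keeping the mapping as data, separate from the code that executes it, is the point of a declarative language: the same processor can run any mapping document.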
Generation of the RDF Data
                                     R2O & ODEMapster
• Creation of the R2O Mappings




                         31
Generation of the RDF Data
         R2O & ODEMapster


         Excerpt of the R2O document




32
Generation of the RDF Data
                                                                             geometry2rdf

• Tool for generating RDF from geometrical information

• The geometry can be provided in GML or WKT

• The RDF generated follows our Geometry Model




  http://www.oeg-upm.net/index.php/en/downloads/151-geometry2rdf

                                                           33
Generation of the RDF Data
                                geometry2rdf



                   Oracle SDO_UTIL package




SELECT TO_CHAR(SDO_UTIL.TO_GML311GEOMETRY(geometry))
          AS Gml311Geometry
FROM "BCN200"."BCN200_0301L_RIO" c
WHERE c.Etiqueta='Arroyo'




     34
Generation of the RDF Data
              geometry2rdf
Generation of the RDF Data
                                                                                        Geometry Model
                                                               geoes: http://geo.linkeddata.es/
                                                               geo: http://www.w3.org/2003/01/geo/wgs84_pos#




geo:Point, geoes:ontology/Curva and geoes:ontology/Polígono are each
rdfs:subClassOf geoes:ontology/Geometría.
   • geo:Point is described by geo:lat and geo:long
   • geoes:ontology/Curva formadoPor a collection of 2 or more geo:Points
   • geoes:ontology/Polígono formadoPor a collection of 3 or more geo:Points




                                                          36
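A Curva in the Geometry Model is formed by two or more geo:Points. The Python sketch below converts a WKT LINESTRING into triples shaped like that model. It is an illustration, not geometry2rdf itself: the full URI of formadoPor, the URI scheme for points and the curve URI are hypothetical, and coordinates are assumed to be "longitude latitude" pairs.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
GEOES = "http://geo.linkeddata.es/ontology/"
FORMADO_POR = GEOES + "formadoPor"  # assumed full URI of the formadoPor property

def linestring_to_triples(wkt: str, curve_uri: str) -> list:
    """Convert a 2D WKT LINESTRING into Curva / geo:Point triples.

    Assumes each coordinate pair is 'longitude latitude' (x y).
    """
    m = re.fullmatch(r"\s*LINESTRING\s*\((.+)\)\s*", wkt, flags=re.IGNORECASE)
    if m is None:
        raise ValueError("only LINESTRING geometries are handled in this sketch")
    pairs = [p.split() for p in m.group(1).split(",")]
    if len(pairs) < 2:
        raise ValueError("a Curva is formed by 2 or more points")
    triples = [(curve_uri, RDF_TYPE, GEOES + "Curva")]
    for i, (lon, lat) in enumerate(pairs):
        point = f"{curve_uri}/point/{i}"  # hypothetical URI scheme for points
        triples += [
            (curve_uri, FORMADO_POR, point),
            (point, RDF_TYPE, GEO + "Point"),
            (point, GEO + "lat", float(lat)),
            (point, GEO + "long", float(lon)),
        ]
    return triples

curve = "http://example.org/resource/Curva/arroyo-1"  # hypothetical
triples = linestring_to_triples("LINESTRING (-3.70 40.41, -3.68 40.42)", curve)
for t in triples:
    print(t)
```

A Polígono could be handled the same way with a check for at least 3 points; GML input, such as the output of the Oracle query above, would need an XML parser instead of the regular expression.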
Generation of the RDF Data
RDF generated according to our Geometry Model





           37
Generation of the RDF Data
                                                                                    URI Generation

• URIs are crucial in this process, since they are the key to
  aligning heterogeneous resources that come from different data sources.
      • Cool URIs 1
      • UK Cabinet Office 2


• Examples:
  http://geo.linkeddata.es/ontology/{class/property}
        http://geo.linkeddata.es/ontology/Lago

  http://geo.linkeddata.es/resource/dataset/type/{resourcename}
            http://geo.linkeddata.es/resource/Provincia/Madrid

  1   http://www.w3.org/TR/cooluris/
  2   http://www.cabinetoffice.gov.uk/media/301253/puiblic sector uri.pdf

                                                                38
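The two URI patterns on this slide can be turned into small helper functions. This Python sketch is an illustration, assuming that the dataset segment is optional (the Madrid example omits it) and that each path segment is percent-encoded as UTF-8, which yields the same encoded form used for Málaga in the Data Cleansing guideline.

```python
from typing import Optional
from urllib.parse import quote

BASE = "http://geo.linkeddata.es"

def ontology_uri(term: str) -> str:
    """Pattern: http://geo.linkeddata.es/ontology/{class/property}"""
    return f"{BASE}/ontology/{quote(term, safe='')}"

def resource_uri(type_name: str, name: str, dataset: Optional[str] = None) -> str:
    """Pattern: http://geo.linkeddata.es/resource/[dataset/]type/{resourcename}

    Each segment is percent-encoded (UTF-8), so a '/' inside a name is escaped.
    """
    segments = ([dataset] if dataset else []) + [type_name, name]
    return f"{BASE}/resource/" + "/".join(quote(s, safe="") for s in segments)

print(ontology_uri("Lago"))
print(resource_uri("Provincia", "Madrid"))
print(resource_uri("Provincia", "Málaga"))
```

Minting every URI through one function keeps the scheme consistent across data sources, which is what makes later alignment and linking possible.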
Generation of the RDF Data
                                                       Provenance Information

• It is important
    • to manage the provenance information of the resources
    • to state the license of the information


• Example




  Pubby: http://www4.wiwiss.fu-berlin.de/pubby/


                                                  39
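One simple way to record provenance and licensing is to attach extra triples to every generated resource. The Python sketch below is an assumption-laden illustration: the slide does not name a vocabulary, so Dublin Core terms (dcterms:source, dcterms:license) are a swapped-in choice, and the source and license URIs are hypothetical placeholders.

```python
DCTERMS = "http://purl.org/dc/terms/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

def with_provenance(triples: list, source: str, license_uri: str) -> list:
    """Attach dcterms:source and dcterms:license to every subject in triples.

    Dublin Core terms are an assumed choice of vocabulary for this sketch.
    """
    subjects = sorted({s for s, _, _ in triples})
    extra = []
    for s in subjects:
        extra.append((s, DCTERMS + "source", source))
        extra.append((s, DCTERMS + "license", license_uri))
    return triples + extra  # a new list; the input is left unchanged

madrid = "http://geo.linkeddata.es/resource/Provincia/Madrid"
data = [(madrid, RDFS_LABEL, "Madrid")]
# Hypothetical source and license URIs.
enriched = with_provenance(data,
                           "http://example.org/source/IGN-database",
                           "http://example.org/license/open-data")
for t in enriched:
    print(t)
```

Attaching the statements per resource makes them visible whenever a single URI is dereferenced; describing the whole dataset once is a lighter alternative.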
Guidelines for Publishing Linked Data




      40
Publication of the RDF data

map4rdf      http://oegdev.dia.fi.upm.es/projects/map4rdf/

Access:      HTML  |  Linked Data  |  SPARQL

Pubby 0.3, including provenance support
             http://www4.wiwiss.fu-berlin.de/pubby/

Virtuoso 6.1.0

                                                               41
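Serving the same resource as an HTML page and as Linked Data usually relies on content negotiation. The Python sketch below picks a redirect target from the Accept header. It is a simplified illustration, not Pubby's code: the /resource/, /page/ and /data/ paths are assumptions modelled on the 303-redirect pattern described in Cool URIs, and wildcards such as */* are not interpreted.

```python
from typing import Dict, Tuple

RDF_TYPES = ("text/turtle", "application/rdf+xml")
HTML_TYPES = ("text/html", "application/xhtml+xml")

def parse_accept(header: str) -> Dict[str, float]:
    """Map each media type in an Accept header to its q-value (default 1.0)."""
    prefs = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs[media] = q
    return prefs

def negotiate(path: str, accept: str) -> Tuple[int, str]:
    """Return (status, Location) for a GET on a /resource/... URI."""
    if not path.startswith("/resource/"):
        return 404, ""
    prefs = parse_accept(accept or "*/*")
    html_q = max(prefs.get(t, 0.0) for t in HTML_TYPES)
    rdf_q = max(prefs.get(t, 0.0) for t in RDF_TYPES)
    # Wildcards are not interpreted here; ties go to the HTML page.
    target = "/page/" if html_q >= rdf_q else "/data/"
    return 303, target + path[len("/resource/"):]

print(negotiate("/resource/Provincia/Madrid", "text/turtle"))
print(negotiate("/resource/Provincia/Madrid", "text/html,application/xhtml+xml"))
```

The 303 See Other response keeps the resource URI (the thing) distinct from the documents that describe it.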
Guidelines for Publishing Linked Data




      42
Data Cleansing

• Find possible errors, such as those identified by Hogan et al.:
   • HTTP-level issues, such as accessibility and dereferenceability,
     e.g., HTTP URIs that return 40x/50x errors
   • reasoning issues, such as a namespace without a vocabulary,
     e.g., an invented rss:item term
   • malformed/incompatible datatypes, e.g., “true” as xsd:int


• Fix the identified errors

• Example: encoding URIs
   • Special characters: á, é, ñ
       • http://geo.linkeddata.es/resource/Provincia/M%C3%A1laga




                                 43
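Two of the checks above, malformed datatypes and unencoded special characters in URIs, are easy to automate. The Python sketch below covers only those two cases and only the lexical forms of xsd:int and xsd:boolean (value ranges are not checked); it is an illustration, not a full RDF validator.

```python
import re
from urllib.parse import quote, urlsplit, urlunsplit

XSD = "http://www.w3.org/2001/XMLSchema#"
# Lexical forms only; value ranges (e.g., the 32-bit limit of xsd:int) are not checked.
LEXICAL_FORMS = {
    XSD + "int": re.compile(r"[+-]?[0-9]+"),
    XSD + "boolean": re.compile(r"true|false|1|0"),
}

def malformed_literals(literals):
    """Return the (value, datatype) pairs whose value is not a valid lexical form."""
    return [(value, dt) for value, dt in literals
            if dt in LEXICAL_FORMS and not LEXICAL_FORMS[dt].fullmatch(value)]

def encode_uri(uri: str) -> str:
    """Percent-encode non-ASCII characters (UTF-8) in the URI path.

    '%' is kept as safe, so already-encoded URIs are left unchanged.
    """
    parts = urlsplit(uri)
    return urlunsplit(parts._replace(path=quote(parts.path, safe="/%")))

print(malformed_literals([("true", XSD + "int"), ("42", XSD + "int")]))
print(encode_uri("http://geo.linkeddata.es/resource/Provincia/Málaga"))
```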
Guidelines for Publishing Linked Data




      44
Linking the RDF Data




• Identify suitable data sets as linking targets
     http://ckan.net

• Discover relationships between data items
     LIMES            http://aksw.org/Projects/limes
     Silk Framework   http://www4.wiwiss.fu-berlin.de/bizer/silk/

• Validate the relationships discovered
     sameAs Validator http://oegdev.dia.fi.upm.es:8080/sameAs/




                                                                45
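The "discover relationships" step can be illustrated with a naive label comparison that proposes owl:sameAs candidates. This Python sketch is a deliberately simplified stand-in: tools such as LIMES and Silk use richer similarity metrics and avoid comparing every pair, and the URIs, labels and 0.9 threshold are hypothetical. Candidates produced this way would still go through the validation step.

```python
import unicodedata
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def normalize(label: str) -> str:
    """Lower-case and strip accents, so that 'Málaga' matches 'Malaga'."""
    decomposed = unicodedata.normalize("NFKD", label)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()

def discover_links(source: dict, target: dict, threshold: float = 0.9) -> list:
    """Compare every label pair; return (s, owl:sameAs, t, score) candidates."""
    candidates = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            score = SequenceMatcher(None, normalize(s_label), normalize(t_label)).ratio()
            if score >= threshold:
                candidates.append((s_uri, OWL_SAME_AS, t_uri, round(score, 2)))
    return candidates

# Hypothetical URIs and labels.
ours = {"http://example.org/geo/Madrid": "Madrid",
        "http://example.org/geo/Malaga": "Málaga"}
theirs = {"http://example.org/other/Malaga": "Malaga",
          "http://example.org/other/Madrid": "Madrid"}
links = discover_links(ours, theirs)
for link in links:
    print(link)
```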
Linking the RDF Data
                                                                        GeoLinkedData


GeoLinkedData, DBpedia and GeoNames resources for Madrid:
   http://geo.linkeddata.es/.../Madrid
   http://dbpedia.org/resource/Madrid
   http://sws.geonames.org/6355233/

                                                46
Linking the RDF Data
                                                sameAs Validator




http://oegdev.dia.fi.upm.es:8080/sameAs/




                                           47
Guidelines for Publishing Linked Data




      48
Enable Effective Discovery
                                 Register the dataset in the CKAN registry

• Add the dataset to CKAN, the open registry of data
  and content packages

• Minimum information
    • Name: unique ID for your data set on CKAN
    • Title: full name of your data set
    • URL: link to the data set home page




  http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation


                                                       49
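The minimum CKAN metadata (Name, Title, URL) can be assembled and checked before registration. In this Python sketch the naming rule (2 to 100 lowercase ASCII letters, digits, '-' or '_') is assumed to mirror CKAN's own validator, and "geolinkeddata" is a hypothetical name; the resulting dictionary could then be submitted through the CKAN Action API's package_create call, which requires an API key.

```python
import json
import re

# Assumed to mirror CKAN's rule for dataset names: 2-100 characters,
# lowercase ASCII letters, digits, '-' and '_'.
CKAN_NAME = re.compile(r"[a-z0-9_-]{2,100}")

def ckan_package(name: str, title: str, url: str) -> dict:
    """Build the minimum metadata (Name, Title, URL) for a CKAN entry."""
    if not CKAN_NAME.fullmatch(name):
        raise ValueError(f"invalid CKAN name: {name!r}")
    if not url.startswith(("http://", "https://")):
        raise ValueError("URL must link to the data set home page over HTTP(S)")
    return {"name": name, "title": title, "url": url}

# "geolinkeddata" is a hypothetical CKAN name used for illustration.
pkg = ckan_package("geolinkeddata", "GeoLinkedData", "http://geo.linkeddata.es/")
print(json.dumps(pkg, indent=2))
```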
Enable Effective Discovery
                                                  Sitemap protocol

• Used by web crawlers
• Efficiently find all your content & discover
  what has been updated
             http://sitemaps.org/




A sitemap file contains information about one or more URLs on
   your Web site. This information helps search engines crawl your
   website more effectively.


                                 50
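A sitemap following the protocol at sitemaps.org is a small XML document. The Python sketch below builds one with the standard library. The URLs and the lastmod date are illustrative, and the check against the protocol's limit of 50,000 URLs per file is a simple safeguard added for this sketch.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50000  # the protocol's per-file limit

def build_sitemap(urls: Iterable[str], lastmod: Optional[str] = None) -> str:
    """Return a sitemap <urlset> document listing the given URLs."""
    urls = list(urls)
    if len(urls) > MAX_URLS:
        raise ValueError("split the URLs over several sitemaps and use a sitemap index")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = u  # ElementTree escapes &, <, >
        if lastmod:  # W3C Datetime format, e.g. "2011-05-12"
            ET.SubElement(entry, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

xml_text = build_sitemap([
    "http://geo.linkeddata.es/resource/Provincia/Madrid",
    "http://geo.linkeddata.es/resource/Provincia/M%C3%A1laga",
], lastmod="2011-05-12")
print(xml_text)
```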
Enable Effective Discovery
Sindice: the best RDF search engine




     51
Enable Effective Discovery
                                                                     sitemap4rdf


• Simple command line tool
• Sends a SPARQL query to list all URIs
• Generates sitemap

 sitemap4rdf http://yoursite/sparql http://yoursite/resource/

 Example:

 sitemap4rdf http://geo.linkeddata.es/sparql http://geo.linkeddata.es/


• Run sitemap4rdf, specifying the SPARQL endpoint
  and the prefix of the URLs to include in the Sitemap

  http://lab.linkeddata.deri.ie/2010/sitemap4rdf/


                                                    52
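The two steps sitemap4rdf performs, querying the SPARQL endpoint for URIs and keeping those under the given prefix, can be sketched in Python. The query text is hypothetical (sitemap4rdf's actual query may differ), the request URL follows the SPARQL 1.1 Protocol's "query" GET parameter, and a hand-written response in the SPARQL JSON results format replaces a live call.

```python
import json
from urllib.parse import urlencode

# Hypothetical query: every distinct subject URI in the store.
LIST_SUBJECTS = "SELECT DISTINCT ?s WHERE { ?s ?p ?o }"

def sparql_get_url(endpoint: str, query: str = LIST_SUBJECTS) -> str:
    """SPARQL 1.1 Protocol: send the query as the 'query' parameter of a GET."""
    return endpoint + "?" + urlencode({"query": query})

def subjects_from_results(results_json: str) -> list:
    """Extract URI bindings of ?s from SPARQL JSON results."""
    data = json.loads(results_json)
    return [b["s"]["value"] for b in data["results"]["bindings"]
            if b.get("s", {}).get("type") == "uri"]

def urls_for_sitemap(subjects: list, prefix: str) -> list:
    """Keep only URIs under the site's prefix, without duplicates."""
    return sorted({s for s in subjects if s.startswith(prefix)})

print(sparql_get_url("http://geo.linkeddata.es/sparql"))

# A hand-written response in the SPARQL JSON results format, for illustration.
sample = json.dumps({"head": {"vars": ["s"]}, "results": {"bindings": [
    {"s": {"type": "uri", "value": "http://geo.linkeddata.es/resource/Provincia/Madrid"}},
    {"s": {"type": "uri", "value": "http://dbpedia.org/resource/Madrid"}},
    {"s": {"type": "bnode", "value": "b0"}},
]}})
print(urls_for_sitemap(subjects_from_results(sample), "http://geo.linkeddata.es/"))
```

Filtering by prefix matters because a store usually also holds URIs from linked datasets, such as DBpedia, which do not belong in this site's sitemap.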
Enable Effective Discovery
                    Submit the sitemap location - Sindice

• http://sindice.com/main/submit




                           53
Enable Effective Discovery
                   Submit the sitemap location - Google

• https://www.google.com/webmasters/tools/




                         54
ToC




• Introduction to Linked Data

• Guidelines for Publishing Linked Data

• Demo




                           55
DEMO
http://geo.linkeddata.es/browser




              56
Provinces




57
Capital of Province




58
Provinces – Industry Production Index




 59
Beaches




60
DEMO
http://webenemasuno.linkeddata.es/




                61
Trips




62
Guide Locations




63
Guide




64
Future Work




65
Methodological Guidelines for
   Publishing Linked Data



Boris Villazón-Terrazas, Asunción Gómez-Pérez, and Óscar Corcho

    Facultad de Informática, Universidad Politécnica de Madrid
  Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
                      http://www.oeg-upm.net
                      http://www oeg upm net
               {bvillazon,asun,ocorcho}@fi.upm.es
           Phone: 34.91.3366605, Fax: 34.91.3524819



                CONSEGI 2011 – Brasília, Brazil
                      12th May, 2011

Methodological Guidelines for Publishing Linked Data

  • 1.
    Methodological Guidelines for Publishing Linked Data Boris Villazón-Terrazas, Asunción Gómez-Pérez, and Óscar Corcho Facultad de Informática, Universidad Politécnica de Madrid Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid http://www.oeg-upm.net http://www oeg upm net {bvillazon,asun,ocorcho}@fi.upm.es Phone: 34.91.3366605, Fax: 34.91.3524819 CONSEGI 2011 – Brasília, Brazil 12th May, 2011
  • 2.
    ToC • Introduction toLinked Data • G id li Guidelines f P bli hi Li k d D t for Publishing Linked Data • Demo 2
  • 3.
    ToC • Introduction toLinked Data • Guidelines for Publishing Linked Data • Demo 3
  • 4.
    Classic Web MovieDB Data exposed to the Web via HTML, pdf, etc. CIA World FactBook © Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig 4
  • 5.
    Classic Web Information from Complexpages single queries can be multiple over found via pages / data search engines sources?? © Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig 5
  • 6.
    What do weactually want? • Use the Web like a single global database CIA World MovieDB FactBook © Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig 6
  • 7.
    Linked Data enablessuch Web of Data Global Identifier: URI (Uniform Resource Identifier) which is a string of characters used Identifier), to identify a name or a resource on the Internet. Data Model: RDF (Resource Description Framework), which is a standard model for data interchange on the Web Access Mechanism: HTTP Connection: Typed Links 8000000 “Even the Rain” http://.../population http://.../name http://.../filming_location http://cia.../Bolivia http://imdb.../TLLuvia p CIA World MovieDB FactBook © Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig 7
  • 8.
    In a nutshell •An extension of the current Web… • … where information and services data are given well-defined and explicitly represented meaning, … • … so that it can be shared and used by humans and machines ... machines, • ... better enabling them to work in cooperation • How? • Promoting information exchange by tagging web content with machine processable descriptions of its meaning. • A d t h l i and i f t t And technologies d infrastructure to do this • And clear principles on how to publish data 8
  • 9.
    The four principles(Tim Berners Lee, 2006) 1. Use URIs as names • http://www.w3.org/D for things esignIssues/Linked 2. Use HTTP URIs so Data.html that people can look up those names. http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html 3. When someone looks up a URI, provide useful information, using th standards i the t d d (RDF*, SPARQL) 4. 4 Include links to other URIs, so that they can discover more things. 9
  • 10.
    So does thatmean I have to publish my data as Linked Data, now? • But, why? • What was your incentive to publish an HTML page in 1990? • Share data in documents and because your neighbor was doing it • So, why should we publish Linked Data in 2011? , y p • Share data as data and because your neighbor is doing it © Slide adapted from “Introduction to Linked Data”- Juan Sequeda 10
  • 11.
    And guess whois starting to publish Linked Data now? • UK Government • US Government • BBC • Open Calais • Freebase • NY Times • CNET • Dbpedia • …. 11
  • 12.
    Linked Open Dataevolution  2007  2008  2009 12 12
  • 13.
  • 14.
    ToC • Introduction toLinked Data • G id li Guidelines f P bli hi Li k d D t for Publishing Linked Data • Demo 14
  • 15.
    Linked Data inOEG • GeoLinkedData is an open initiative whose aim is to enrich the Web of Data with Spanish geospatial data. p g p http://geo.linkeddata.es • El Viajero Linked Data is project that focuses on the integration of the contents produced by newspapers and digital platforms belonging to Prisa Group Group. http://webenemasuno.linkeddata.es/ • A project with the Biblioteca Nacional to publish the library information as Linked Data. y http://cultura.linkeddata.es/visualizer/ 15
  • 16.
    Linked Data inOEG • Tools for generating and cosuming Linked Data, e.g., • geometry2rdf http://www oeg upm net/index php/downloads/151 geometry2rdf http://www.oeg-upm.net/index.php/downloads/151-geometry2rdf • map4rdf http://oegdev.dia.fi.upm.es/projects/map4rdf/ • Spanish Thematic Network of Linked Data http://red.linkeddata.es p » Group leader: Ontology Engineering Group » 19 Research Groups » 4 companies 16
  • 17.
  • 18.
  • 19.
    Identification of thedata sources • Guidelines based on the Open Data Manual 1 • Two possibilities • To find the data sources already available in a public data catalog, e.g., Aporta project 2 • To get an agreement with a particular government body to p publish its data sources, e.g., GeoLinkedData - IGN g 1 http://opendatamanual.org/ 2 http://aporta.es 19
  • 20.
    Identification of thedata sources GeoLinkedData Agreement with the IGN IGN National Geographic Institute of Spain g p p Oracle & MySQL Data sources available in a public data catalog INE National Statistic Institute of Spain 20
  • 21.
    Identification of thedata sources IGN & INE Year Province Industry Production Index 21
  • 22.
  • 23.
    Vocabulary Modelling Ontology • An ontology is an engineering artifact, which provides: • A set of terms • A set of explicit assumptions regarding the intended meaning of the terms. • Almost always including concepts and their classification • Almost always including properties between concepts • Shared understanding of a domain of interest nderstanding 23
  • 24.
    Vocabulary Modelling Reuse available vocabularies Search for suitable vocabularies Linked Open Vocabularies are there Yes Build the vocabulary by suitable reusing available vocabularies? vocabularies No … 24
  • 25.
    Vocabulary Modelling Reuse available non-ontological resources Highly reliable Web Sites Search for suitable Domain-related sites non-ontological resources Government Catalogs are there Yes Build the vocabulary by suitable transforming available resources? resources No Build the vocabulary from scratch 25
  • 26.
    Vocabulary Modelling GeoLinkedData WGS84 Geo Positioning: an RDF vocabulary scv:Dimension scv:Item scv:Dataset hydrographical phenomena (rivers (rivers, lakes, etc.) Vocabulary for instants, intervals, , , durations, etc. Names and international code Ontology for OGC systems for Geography Markup territories and Language g g groups Classes 33 33 Object Properties j p 44 44 Data Properties 318 318 http://neon-toolkit.org/ 26
  • 27.
  • 28.
    Generation of theRDF Data NOR2O INE ODEMapster IGN Geospatial Geometry2RDF column IGN 28
  • 29.
    Generation of theRDF Data NOR2O Industry Production Index Year Province NOR2O 29
  • 30.
    Generation of theRDF Data R2O & ODEMapster • R2O is an extensible fully declarative language to describe extensible, mappings between relational database schemas and ontologies. • The ODEMapster processor generates RDF instances from relational instances based on the mapping description expressed in the R2O document www.oeg-upm.net/index.php/en/downloads/9-r2o-odempaster 30
  • 31.
    Generation of theRDF Data R2O & ODEMapster • Creation of the R2O Mappings 31
  • 32.
    Generation of theRDF Data R2O & ODEMapster Excerpt of the R2O document 32
  • 33.
    Generation of theRDF Data geometry2rdf • Tool for generating RDF from geometrical information • The geometry could be available in GML or WKT • The RDF generated follows our Geometry Model http://www.oeg-upm.net/index.php/en/downloads/151-geometry2rdf 33
  • 34.
    Generation of theRDF Data geometry2rdf Oracle STO UTIL package SELECT TO_CHAR(SDO_UTIL.TO_GML311GEOMETRY(geometry)) AS Gml311Geometry FROM "BCN200"."BCN200_0301L_RIO" c WHERE c.Etiqueta='Arroyo' 34
  • 35.
    Generation of theRDF Data geometry2rdf
  • 36.
    Generation of theRDF Data Geometry Model geoes: http://geo.linkeddata.es/ geo: http://www.w3.org/2003/01/geo/wgs84_pos# geoes:ontology/Geometría rdfs:subClassOf rdfs:subClassOf rdfs:subClassOf geo:Point geoes:ontology/Curva geoes:ontology/Polígono formadoPor formadoPor 39 geo:lat 39 geo:long Collection of 2 or Collection of 3 or more geo:Points more geo:Points 36
  • 37.
    Generation of theRDF Data RDF generated according to our Geometry Model 1 2 0 0 37
  • 38.
    Generation of theRDF Data URI Generation • URIs are extremely relevant in this process since they are the key for the alignment of heterogeneous resources that come from different data sources. • Cool URIs 1 • UK Cabinet Office 2 • Examples: http://geo.linkeddata.es/ontology/{class/property} http://geo.linkeddata.es/ontology/Lago http://geo.linkeddata.es/resource/dataset/type/{resourcename} http://geo linkeddata es/resource/dataset/type/{resourcename} http://geo.linkeddata.es/resource/Provincia/Madrid 1 http://www.w3.org/TR/cooluris/ 2 http://www.cabinetoffice.gov.uk/media/301253/puiblic sector uri.pdf 38
  • 39.
    Generation of the RDF Data: Provenance Information • It is important: • to manage the provenance information of the resources • to establish the license of the information • Example: Pubby http://www4.wiwiss.fu-berlin.de/pubby/
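One common way to record provenance and licensing is to attach Dublin Core terms to a resource or to the dataset as a whole. The sketch below does that with `dcterms:source`, `dcterms:license` and `dcterms:created`; this vocabulary choice is ours, not necessarily what Pubby or GeoLinkedData use, and the function name and all `example.org` URIs are hypothetical.

```python
from datetime import date, datetime

DCTERMS = "http://purl.org/dc/terms/"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def provenance_triples(subject, source, license_uri, created):
    """Attach source, license and creation date to a resource or dataset URI."""
    if isinstance(created, datetime):
        created = created.date()  # xsd:date has no time component
    if not isinstance(created, date):
        raise TypeError("created must be a datetime.date")
    return [
        f"<{subject}> <{DCTERMS}source> <{source}> .",
        f"<{subject}> <{DCTERMS}license> <{license_uri}> .",
        f'<{subject}> <{DCTERMS}created> "{created.isoformat()}"^^<{XSD_DATE}> .',
    ]
```

Attaching these triples once to a dataset-level URI, rather than to every resource, keeps the volume down when all resources share the same origin and license.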
  • 40.
  • 41.
    Publication of the RDF Data: map4rdf • map4rdf http://oegdev.dia.fi.upm.es/projects/map4rdf/ • Pubby 0.3 http://www4.wiwiss.fu-berlin.de/pubby/, serving HTML and Linked Data, including provenance support • Virtuoso 6.1.0, serving SPARQL
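Clients such as map4rdf consume the published data through the SPARQL endpoint. Following the SPARQL Protocol, a query can be sent as an HTTP GET with a `query` parameter, asking for JSON results via the `Accept` header. The sketch below only builds the request and parses a results document, so it runs without network access; the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def binding_values(results_json, var):
    """Extract the values bound to one variable from a SPARQL JSON results document."""
    data = json.loads(results_json)
    return [row[var]["value"] for row in data["results"]["bindings"] if var in row]
```

To actually run a query, pass the request to `urllib.request.urlopen` and feed the decoded body to `binding_values`; whether a given endpoint (for instance the `http://geo.linkeddata.es/sparql` endpoint used in the sitemap4rdf example) is still online is not guaranteed.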
  • 42.
  • 43.
    Data Cleansing • To find possible errors, such as those identified by Hogan et al.: • HTTP-level issues, such as accessibility and dereferenceability problems, e.g., HTTP URIs that return 40x/50x errors • reasoning issues, such as a namespace without a vocabulary, e.g., an invented rss:item term • malformed or incompatible datatypes, e.g., “true” typed as xsd:int • To fix the identified errors • Example: encoding URIs that contain special characters such as á, é, ñ: http://geo.linkeddata.es/resource/Provincia/M%C3%A1laga
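Two of the checks above are easy to automate. The sketch below flags literals that are not valid `xsd:int` values (a 32-bit signed integer, so “true” fails) and repairs URIs by percent-encoding raw non-ASCII characters and spaces as UTF-8. The function names are hypothetical. Keeping `%` in the safe set avoids double-encoding URIs that are already escaped, at the cost of leaving a stray unescaped `%` untouched.

```python
import re
from urllib.parse import quote

XSD_INT_LEXICAL = re.compile(r"[+-]?[0-9]+")


def is_valid_xsd_int(lexical):
    """True if the string is a legal xsd:int: an integer within 32-bit signed range."""
    if not XSD_INT_LEXICAL.fullmatch(lexical):
        return False  # e.g. "true" is not an integer at all
    return -2**31 <= int(lexical) <= 2**31 - 1


# Characters kept as-is: URI delimiters, plus '%' so existing escapes survive.
URI_SAFE = ":/?#[]@!$&'()*+,;=%"


def fix_uri(uri):
    """Percent-encode characters (e.g. á, é, ñ, spaces) not allowed raw in a URI."""
    return quote(uri, safe=URI_SAFE)
```

For example, `fix_uri("http://geo.linkeddata.es/resource/Provincia/Málaga")` produces the encoded URI shown on the slide.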
  • 44.
  • 45.
    Linking the RDF Data • Identify suitable data sets as linking targets: http://ckan.net • Discover relationships between data items: LIMES http://aksw.org/Projects/limes, Silk Framework http://www4.wiwiss.fu-berlin.de/bizer/silk/ • Validate the relationships discovered: sameAs Validator http://oegdev.dia.fi.upm.es:8080/sameAs/
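Link discovery frameworks such as Silk and LIMES compare resources from two data sets with configurable similarity metrics and keep pairs above a threshold. The sketch below shows only that core idea, with a naive all-pairs comparison of accent-stripped labels using `difflib.SequenceMatcher`. It is not the Silk or LIMES API (LIMES in particular exists to avoid the quadratic comparison done here), and the function names, threshold and input URIs are hypothetical.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(label):
    """Lower-case and strip accents so 'Málaga' and 'Malaga' compare equal."""
    decomposed = unicodedata.normalize("NFKD", label)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold().strip()


def discover_links(source, target, threshold=0.9):
    """Return (source_uri, target_uri, score) for label pairs at or above threshold.

    source and target map resource URIs to their labels.
    """
    links = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            score = SequenceMatcher(None, normalize(s_label), normalize(t_label)).ratio()
            if score >= threshold:
                links.append((s_uri, t_uri, score))
    return links
```

Label similarity alone can confuse, say, a province with a city of the same name, which is why candidate links still need the validation step listed above.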
  • 46.
    Linking the RDF Data: GeoLinkedData • Resources in GeoLinkedData, DBpedia and GeoNames are linked, e.g., http://geo.linkeddata.es/.../Madrid, http://dbpedia.org/resource/Madrid and http://sws.geonames.org/6355233/
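Accepted links are typically published as `owl:sameAs` triples. Because `owl:sameAs` is symmetric, a pair stated in both directions only needs one triple, and a resource linked to itself adds nothing. The sketch below serializes links that way and rejects anything that is not an absolute HTTP(S) URI. The function name is hypothetical, and the local subject URI used to try it out should be a placeholder, since the GeoLinkedData path is elided on the slide.

```python
from urllib.parse import urlsplit

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_ntriples(links):
    """Serialize (subject, object) URI pairs as owl:sameAs N-Triples lines."""
    seen = set()
    lines = []
    for s, t in links:
        for uri in (s, t):
            parts = urlsplit(uri)
            if parts.scheme not in ("http", "https") or not parts.netloc:
                raise ValueError(f"not an absolute HTTP URI: {uri}")
        key = frozenset((s, t))
        if s == t or key in seen:
            continue  # skip reflexive links and symmetric duplicates
        seen.add(key)
        lines.append(f"<{s}> <{OWL_SAME_AS}> <{t}> .")
    return lines
```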
  • 47.
    Linking the RDF Data: sameAs Validator http://oegdev.dia.fi.upm.es:8080/sameAs/
  • 48.
  • 49.
    Enable Effective Discovery: Register the dataset in the CKAN registry • Add the dataset to CKAN, the open registry of data and content packages • Minimum information: • Name: a unique ID for your data set on CKAN • Title: the full name of your data set • URL: a link to the data set home page http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation
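Before registering, it can help to assemble and check the three minimum fields locally. The sketch below builds them as a JSON payload. The name rule it enforces (2 to 100 characters of lowercase ASCII letters, digits, `-` and `_`) is our understanding of CKAN's dataset-name validation and should be confirmed against the CKAN documentation. The function name and the example values are hypothetical, and actually submitting the payload to CKAN is not shown.

```python
import json
import re
from urllib.parse import urlsplit

# Assumed CKAN dataset-name rule: lowercase ASCII letters, digits, '-' and '_'.
NAME_PATTERN = re.compile(r"[a-z0-9_-]{2,100}")


def ckan_package(name, title, url):
    """Return a JSON payload with the minimum CKAN fields: name, title and url."""
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError("name must be 2-100 lowercase letters, digits, '-' or '_'")
    if not title.strip():
        raise ValueError("title must not be empty")
    if urlsplit(url).scheme not in ("http", "https"):
        raise ValueError("url must be an http(s) link to the data set home page")
    return json.dumps({"name": name, "title": title, "url": url}, ensure_ascii=False)
```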
  • 50.
    Enable Effective Discovery: Sitemap protocol • Used by web crawlers • Lets them efficiently find all your content and discover what has been updated http://sitemaps.org/ • A sitemap file contains information about one or more URLs on your Web site. The information stored there helps search engines crawl your website more effectively.
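A sitemap in the sitemaps.org format is a `urlset` element in the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace, with one `url` entry per page holding a required `loc` and optional fields such as `lastmod`; a single file may list at most 50,000 URLs. The sketch below writes such a file with Python's standard library. The function name and the single shared `lastmod` value are simplifying assumptions.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50000  # per-file limit in the sitemaps.org protocol


def build_sitemap(urls, lastmod=None):
    """Return a sitemap XML string listing each URL in a <url><loc> entry."""
    urls = list(urls)
    if len(urls) > MAX_URLS:
        raise ValueError("split the URLs across several sitemaps and use a sitemap index")
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for loc in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes &, < and >
        if lastmod is not None:
            # W3C datetime format, e.g. "2011-05-12"
            ET.SubElement(url, "lastmod").text = lastmod
    xml_body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + xml_body
```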
  • 51.
    Enable Effective Discovery: Sindice, the best RDF search engine
  • 52.
    Enable Effective Discovery: sitemap4rdf • Simple command-line tool • Sends a SPARQL query to list all URIs • Generates the sitemap • Usage: sitemap4rdf http://yoursite/sparql http://yoursite/resource/ • Example: sitemap4rdf http://geo.linkeddata.es/sparql http://geo.linkeddata.es/ • Run sitemap4rdf specifying the SPARQL endpoint and the prefix of the URLs to include in the sitemap http://lab.linkeddata.deri.ie/2010/sitemap4rdf/
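The two steps sitemap4rdf performs, listing URIs via SPARQL and restricting them to a prefix, can be approximated as below. The query is a guess at the kind of query such a tool might send, not sitemap4rdf's actual query. It filters server-side with SPARQL 1.1 `STRSTARTS`, which the endpoint must support, and the result is split into batches that respect the 50,000-URL limit per sitemap file. Both function names are hypothetical.

```python
def list_uris_query(prefix):
    """Hypothetical SPARQL query for distinct subject IRIs starting with prefix."""
    escaped = prefix.replace("\\", "\\\\").replace('"', '\\"')
    return ("SELECT DISTINCT ?s WHERE { ?s ?p ?o . "
            f'FILTER(isIRI(?s) && STRSTARTS(STR(?s), "{escaped}")) }}')


def chunk(uris, size=50000):
    """Split a URI list into batches of at most `size`, one batch per sitemap file."""
    return [uris[i:i + size] for i in range(0, len(uris), size)]
```

For large stores, a single `SELECT DISTINCT` over all subjects can be slow or truncated by endpoint result limits, so paging with `LIMIT`/`OFFSET` may be necessary in practice.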
  • 53.
    Enable Effective Discovery: Submit the sitemap location to Sindice • http://sindice.com/main/submit
  • 54.
    Enable Effective Discovery: Submit the sitemap location to Google • https://www.google.com/webmasters/tools/
  • 55.
    ToC • Introduction to Linked Data • Guidelines for Publishing Linked Data • Demo
  • 56.
  • 57.
  • 58.
  • 59.
    Provinces – Industry Production Index
  • 60.
  • 61.
  • 62.
  • 63.
  • 64.
  • 65.
  • 67.
    Methodological Guidelines for Publishing Linked Data Boris Villazón-Terrazas, Asunción Gómez-Pérez, and Óscar Corcho Facultad de Informática, Universidad Politécnica de Madrid Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid http://www.oeg-upm.net {bvillazon,asun,ocorcho}@fi.upm.es Phone: 34.91.3366605, Fax: 34.91.3524819 CONSEGI 2011 – Brasília, Brazil 12th May, 2011