Methodological Guidelines for Publishing Linked Data
Methodological Guidelines for Publishing Linked Data presented at CONSEGI 2011

  1. Methodological Guidelines for Publishing Linked Data. Boris Villazón-Terrazas, Asunción Gómez-Pérez, and Óscar Corcho. Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo s/n, 28660 Boadilla del Monte, Madrid. http://www.oeg-upm.net {bvillazon,asun,ocorcho}@fi.upm.es Phone: 34.91.3366605, Fax: 34.91.3524819. CONSEGI 2011, Brasília, Brazil, 12th May, 2011
  2. ToC • Introduction to Linked Data • Guidelines for Publishing Linked Data • Demo
  3. ToC • Introduction to Linked Data • Guidelines for Publishing Linked Data • Demo
  4. Classic Web: data from sources such as MovieDB and the CIA World FactBook is exposed to the Web via HTML, PDF, etc. (© Slide adapted from "5min Introduction to Linked Data", Olaf Hartig)
  5. Classic Web: single pages can be found via search engines, but what about complex queries over information from multiple pages / data sources? (© Slide adapted from "5min Introduction to Linked Data", Olaf Hartig)
  6. What do we actually want? • Use the Web like a single global database, e.g., querying the CIA World FactBook and MovieDB together (© Slide adapted from "5min Introduction to Linked Data", Olaf Hartig)
  7. Linked Data enables such a Web of Data • Global identifier: URI (Uniform Resource Identifier), a string of characters used to identify a name or a resource on the Internet • Data model: RDF (Resource Description Framework), a standard model for data interchange on the Web • Access mechanism: HTTP • Connection: typed links. [Figure: in MovieDB, http://imdb.../TLLuvia has http://.../name "Even the Rain" and a http://.../filming_location link to http://cia.../Bolivia in the CIA World FactBook, which has http://.../population 8000000.] (© Slide adapted from "5min Introduction to Linked Data", Olaf Hartig)
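
To make these ingredients concrete, the typed links in the figure can be written down as RDF triples. The following minimal Python sketch (not part of the original slides) serializes them in N-Triples; because the slide elides the real URIs, every URI below is a hypothetical example.org stand-in.

```python
# Hypothetical example.org URIs stand in for the elided MovieDB,
# CIA World FactBook and property URIs shown on the slide.
MOVIE = "http://moviedb.example.org/film/TLLuvia"
BOLIVIA = "http://factbook.example.org/country/Bolivia"
VOCAB = "http://vocab.example.org/"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

triples = [
    (MOVIE, VOCAB + "name", "Even the Rain"),      # plain literal
    (MOVIE, VOCAB + "filming_location", BOLIVIA),  # typed link into another data set
    (BOLIVIA, VOCAB + "population", 8000000),      # typed (integer) literal
]


def term(value):
    """Serialize a Python value as an N-Triples object term."""
    if isinstance(value, int):
        return f'"{value}"^^<{XSD_INTEGER}>'
    # Simplification: any string that looks like an HTTP(S) URI is treated as a URI.
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(ts):
    """Return one N-Triples statement per (subject, predicate, object) tuple."""
    return "\n".join(f"<{s}> <{p}> {term(o)} ." for s, p, o in ts)


print(to_ntriples(triples))
```

The filming_location triple is the "typed link": its object is a URI in a different data set, which a client can dereference over HTTP to keep following the data.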
  8. In a nutshell • An extension of the current Web… • … where information and data are given well-defined and explicitly represented meaning, … • … so that it can be shared and used by humans and machines, … • … better enabling them to work in cooperation • How? • By promoting information exchange through tagging web content with machine-processable descriptions of its meaning • With technologies and infrastructure to do this • With clear principles on how to publish data
  9. The four principles (Tim Berners-Lee, 2006): 1. Use URIs as names for things. 2. Use HTTP URIs so that people can look up those names. 3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL). 4. Include links to other URIs, so that they can discover more things. (http://www.w3.org/DesignIssues/LinkedData.html, http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html)
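
Principles 2 and 3 imply that looking up an HTTP URI should return a representation the client can use: HTML for a browser, RDF for a Linked Data client. One common way to do this is HTTP content negotiation on the Accept header. The sketch below is a simplified illustration for a hypothetical server that can produce Turtle, RDF/XML and HTML; it only understands the q parameter and is not a complete implementation of HTTP content negotiation.

```python
# Media types this hypothetical server can produce, in order of preference.
SUPPORTED = ["text/turtle", "application/rdf+xml", "text/html"]


def _matches(media_range, media_type):
    """True if an Accept media range such as 'text/*' covers media_type."""
    if media_range in ("*/*", media_type):
        return True
    return media_range.endswith("/*") and media_type.startswith(media_range[:-1])


def negotiate(accept, supported=SUPPORTED, default="text/html"):
    """Pick the supported media type with the highest q-value in an Accept header."""
    best, best_q = None, 0.0
    for part in accept.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        # A range is resolved to the first supported type it covers.
        for candidate in supported:
            if _matches(media_range, candidate):
                if q > best_q:  # ties keep the earlier Accept entry
                    best, best_q = candidate, q
                break
    return best or default
```

For example, a Linked Data client sending "text/turtle" gets Turtle, while a typical browser header such as "text/html,application/xhtml+xml,*/*;q=0.8" gets HTML.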
  10. So does that mean I have to publish my data as Linked Data now? • But, why? • What was your incentive to publish an HTML page in 1990? • To share data in documents, and because your neighbor was doing it • So, why should we publish Linked Data in 2011? • To share data as data, and because your neighbor is doing it (© Slide adapted from "Introduction to Linked Data", Juan Sequeda)
  11. And guess who is starting to publish Linked Data now? • UK Government • US Government • BBC • Open Calais • Freebase • NY Times • CNET • DBpedia • …
  12. Linked Open Data evolution: 2007, 2008, 2009
  13. Linked Open Data, 2010: http://richard.cyganiak.de/2007/10/lod/
  14. ToC • Introduction to Linked Data • Guidelines for Publishing Linked Data • Demo
  15. Linked Data in OEG • GeoLinkedData is an open initiative whose aim is to enrich the Web of Data with Spanish geospatial data. http://geo.linkeddata.es • El Viajero Linked Data is a project that focuses on the integration of the contents produced by newspapers and digital platforms belonging to Prisa Group. http://webenemasuno.linkeddata.es/ • A project with the Biblioteca Nacional to publish the library information as Linked Data. http://cultura.linkeddata.es/visualizer/
  16. Linked Data in OEG • Tools for generating and consuming Linked Data, e.g., • geometry2rdf http://www.oeg-upm.net/index.php/downloads/151-geometry2rdf • map4rdf http://oegdev.dia.fi.upm.es/projects/map4rdf/ • Spanish Thematic Network of Linked Data http://red.linkeddata.es » Group leader: Ontology Engineering Group » 19 research groups » 4 companies
  17. Guidelines for Publishing Linked Data
  18. Guidelines for Publishing Linked Data
  19. Identification of the data sources • Guidelines based on the Open Data Manual [1] • Two possibilities: • Find the data sources already available in a public data catalog, e.g., the Aporta project [2] • Get an agreement with a particular government body to publish its data sources, e.g., GeoLinkedData and the IGN. [1] http://opendatamanual.org/ [2] http://aporta.es
  20. Identification of the data sources: GeoLinkedData • Agreement with the IGN (National Geographic Institute of Spain): Oracle & MySQL • Data sources available in a public data catalog: INE (National Statistics Institute of Spain)
  21. Identification of the data sources: IGN & INE [figure: Industry Production Index by Province and Year]
  22. Guidelines for Publishing Linked Data
  23. Vocabulary Modelling: Ontology • An ontology is an engineering artifact, which provides: • A set of terms • A set of explicit assumptions regarding the intended meaning of the terms • Almost always including concepts and their classification • Almost always including properties between concepts • Shared understanding of a domain of interest
  24. Vocabulary Modelling: Reuse available vocabularies. Search for suitable vocabularies (e.g., Linked Open Vocabularies). Are there suitable vocabularies? Yes: build the vocabulary by reusing available vocabularies. No: …
  25. Vocabulary Modelling: Reuse available non-ontological resources. Search for suitable non-ontological resources (highly reliable Web sites, domain-related sites, government catalogs). Are there suitable resources? Yes: build the vocabulary by transforming available resources. No: build the vocabulary from scratch.
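
Taken together, the two flowcharts describe a reuse-first procedure. The sketch below encodes that decision order in Python; the two search callables are hypothetical placeholders for what is, in practice, a manual search (e.g., in Linked Open Vocabularies, or in government catalogs for non-ontological resources).

```python
from typing import Callable, List


def model_vocabulary(
    find_suitable_vocabularies: Callable[[], List[str]],
    find_suitable_non_ontological_resources: Callable[[], List[str]],
) -> str:
    """Follow the reuse-first decision flow of the two Vocabulary Modelling slides."""
    # 1. Prefer reusing existing vocabularies.
    vocabularies = find_suitable_vocabularies()
    if vocabularies:
        return "reuse vocabularies: " + ", ".join(vocabularies)
    # 2. Otherwise, transform suitable non-ontological resources.
    resources = find_suitable_non_ontological_resources()
    if resources:
        return "transform resources: " + ", ".join(resources)
    # 3. Only as a last resort, build the vocabulary from scratch.
    return "build from scratch"
```

The second search only runs when the first finds nothing, mirroring the "No" branch that links the two flowcharts.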
  26. Vocabulary Modelling: GeoLinkedData [figure, vocabularies involved: WGS84 Geo Positioning: an RDF vocabulary; scv:Dimension, scv:Item, scv:Dataset; hydrographical phenomena (rivers, lakes, etc.); vocabulary for instants, intervals, durations, etc.; names and international code systems for territories and groups; ontology for OGC Geography Markup Language]. Classes: 33; object properties: 44; data properties: 318. http://neon-toolkit.org/
  27. Guidelines for Publishing Linked Data
  28. Generation of the RDF Data [figure: INE data transformed with NOR2O; IGN data transformed with ODEMapster; the IGN geospatial column transformed with geometry2rdf]
  29. Generation of the RDF Data: NOR2O [figure: Industry Production Index by Province and Year, transformed with NOR2O]
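
NOR2O itself is not shown in the slides, so the following is only an illustration of the general idea of turning a non-ontological resource (a Province-by-Year table of the Industry Production Index) into RDF. It assumes that the scv: terms on the vocabulary slide are SCOVO terms (namespace http://purl.org/NET/scovo#), and it uses a hypothetical example.org base URI; the numbers are placeholders, not INE data.

```python
import csv
import io
from urllib.parse import quote

SCV = "http://purl.org/NET/scovo#"  # assumption: scv: is SCOVO
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BASE = "http://example.org/ipi/"  # hypothetical base URI, not a GeoLinkedData URI

# Placeholder values for illustration only; they are NOT real INE figures.
SAMPLE_CSV = """Province,Year,IndustryProductionIndex
Madrid,2009,100.0
Málaga,2009,90.5
"""


def table_to_scovo(text):
    """Turn each (Province, Year, value) row into a scv:Item with two dimensions."""
    dataset = BASE + "dataset/IndustryProductionIndex"
    triples = [(dataset, RDF + "type", SCV + "Dataset")]
    for row in csv.DictReader(io.StringIO(text)):
        province = quote(row["Province"])  # UTF-8 percent-encoding, e.g. M%C3%A1laga
        year = row["Year"]
        item = f"{BASE}item/{province}/{year}"
        triples += [
            (item, RDF + "type", SCV + "Item"),
            (item, SCV + "dataset", dataset),
            (item, SCV + "dimension", f"{BASE}province/{province}"),
            (item, SCV + "dimension", f"{BASE}year/{year}"),
            (item, RDF + "value", float(row["IndustryProductionIndex"])),
        ]
    return triples
```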
  30. Generation of the RDF Data: R2O & ODEMapster • R2O is an extensible, fully declarative language to describe mappings between relational database schemas and ontologies • The ODEMapster processor generates RDF instances from relational instances, based on the mapping description expressed in the R2O document. www.oeg-upm.net/index.php/en/downloads/9-r2o-odempaster
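
R2O mappings are written in R2O's own syntax, which this sketch does not attempt to reproduce. Instead, it illustrates the same idea, a declarative mapping that a generic processor applies to relational rows, using Python's built-in sqlite3 module. The mapping format, table, columns and example.org URIs are all hypothetical.

```python
import sqlite3
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical declarative mapping (NOT R2O syntax): one table -> one class,
# the key column -> the subject URI, other columns -> data properties.
MAPPING = {
    "table": "provincia",
    "key": "nombre",
    "class": "http://example.org/ontology/Provincia",
    "uri_template": "http://example.org/resource/Provincia/{}",
    "columns": {"codigo": "http://example.org/ontology/codigo"},
}


def map_table(conn, m):
    """Generate (subject, predicate, object) triples for every row of m['table']."""
    columns = list(m["columns"])
    # Table and column names come from the trusted mapping, never from user input.
    sql = f"SELECT {', '.join([m['key']] + columns)} FROM {m['table']}"
    triples = []
    for key, *values in conn.execute(sql):
        subject = m["uri_template"].format(quote(str(key)))
        triples.append((subject, RDF_TYPE, m["class"]))
        for column, value in zip(columns, values):
            if value is not None:  # NULLs produce no triple
                triples.append((subject, m["columns"][column], value))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE provincia (nombre TEXT PRIMARY KEY, codigo TEXT)")
conn.executemany("INSERT INTO provincia VALUES (?, ?)", [("Madrid", "28"), ("Málaga", "29")])
triples = map_table(conn, MAPPING)
```

Keeping the mapping as data, separate from the processor, is the point of the declarative approach: a new table only needs a new mapping, not new code.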
  31. Generation of the RDF Data: R2O & ODEMapster • Creation of the R2O mappings
  32. Generation of the RDF Data: R2O & ODEMapster. Excerpt of the R2O document
  33. Generation of the RDF Data: geometry2rdf • Tool for generating RDF from geometrical information • The geometry can be provided in GML or WKT • The generated RDF follows our Geometry Model. http://www.oeg-upm.net/index.php/en/downloads/151-geometry2rdf
  34. Generation of the RDF Data: geometry2rdf, using the Oracle SDO_UTIL package to export a geometry as GML 3.1.1:
      SELECT TO_CHAR(SDO_UTIL.TO_GML311GEOMETRY(c.geometry)) AS Gml311Geometry
      FROM "BCN200"."BCN200_0301L_RIO" c
      WHERE c.Etiqueta = 'Arroyo'
  35. Generation of the RDF Data: geometry2rdf
  36. Generation of the RDF Data: Geometry Model. Prefixes: geoes: http://geo.linkeddata.es/ and geo: http://www.w3.org/2003/01/geo/wgs84_pos#. geo:Point, geoes:ontology/Curva and geoes:ontology/Polígono are each rdfs:subClassOf geoes:ontology/Geometría. A geo:Point has geo:lat and geo:long; a Curva is formadoPor (formed by) a collection of 2 or more geo:Points; a Polígono is formadoPor a collection of 3 or more geo:Points.
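
As a rough illustration of the Geometry Model, the sketch below maps a 2D WKT LINESTRING onto a geoes:ontology/Curva formed by geo:Points. It is not geometry2rdf's code. Two details are assumptions: that formadoPor lives in the http://geo.linkeddata.es/ontology/ namespace, and that the ordered collection of points is an rdf:Seq. WKT coordinates are read as longitude, latitude.

```python
import re

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
ONT = "http://geo.linkeddata.es/ontology/"  # assumption: namespace of formadoPor
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def linestring_to_curva(subject, wkt):
    """Map a 2D WKT LINESTRING onto a Curva formed by an ordered collection of geo:Points."""
    match = re.fullmatch(r"\s*LINESTRING\s*\((.+)\)\s*", wkt, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("this sketch only handles 2D LINESTRING geometries")
    # WKT lists "x y" pairs; for WGS84 data x is the longitude and y the latitude.
    coords = [tuple(float(v) for v in pair.split()) for pair in match.group(1).split(",")]
    if len(coords) < 2:
        raise ValueError("a Curva is formed by at least 2 points")
    collection = subject + "/puntos"
    triples = [
        (subject, RDF + "type", ONT + "Curva"),
        (subject, ONT + "formadoPor", collection),
        (collection, RDF + "type", RDF + "Seq"),  # assumption: rdf:Seq keeps point order
    ]
    for i, (lon, lat) in enumerate(coords, start=1):
        point = f"{subject}/punto/{i}"
        triples += [
            (collection, f"{RDF}_{i}", point),  # rdf:_1, rdf:_2, ... membership
            (point, RDF + "type", GEO + "Point"),
            (point, GEO + "lat", lat),
            (point, GEO + "long", lon),
        ]
    return triples
```

A Polígono would be handled the same way, with the check raised to 3 points.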
  37. Generation of the RDF Data: RDF generated according to our Geometry Model
  38. Generation of the RDF Data: URI Generation • URIs are extremely relevant in this process, since they are the key to aligning heterogeneous resources that come from different data sources • Cool URIs [1] • UK Cabinet Office [2] • Examples: http://geo.linkeddata.es/ontology/{class/property}, e.g., http://geo.linkeddata.es/ontology/Lago; http://geo.linkeddata.es/resource/dataset/type/{resourcename}, e.g., http://geo.linkeddata.es/resource/Provincia/Madrid. [1] http://www.w3.org/TR/cooluris/ [2] http://www.cabinetoffice.gov.uk/media/301253/puiblic sector uri.pdf
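
A minimal sketch of minting URIs that follow the two example patterns. Percent-encoding non-ASCII characters as UTF-8 (what Python's urllib.parse.quote does by default) yields the M%C3%A1laga form used for Málaga in these slides; turning spaces into underscores is an assumption, not something the slides specify.

```python
from urllib.parse import quote

BASE = "http://geo.linkeddata.es/"


def ontology_uri(term):
    """http://geo.linkeddata.es/ontology/{class/property}"""
    return BASE + "ontology/" + quote(term, safe="")


def resource_uri(class_name, name):
    """http://geo.linkeddata.es/resource/{class}/{name}, percent-encoded as UTF-8."""
    # Assumption: spaces become underscores (a common Linked Data convention).
    slug = quote(name.replace(" ", "_"), safe="")
    return BASE + "resource/" + quote(class_name, safe="") + "/" + slug
```

Encoding once, at minting time, keeps every data source that refers to the same province producing the identical URI, which is what makes later alignment possible.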
  39. Generation of the RDF Data: Provenance Information • It is relevant to manage the provenance information of the resources • It is relevant to establish the license of the information • Example: Pubby http://www4.wiwiss.fu-berlin.de/pubby/
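
The slide does not prescribe a vocabulary for provenance and licensing. One possible approach, sketched below, attaches each resource to a dataset description with VoID's void:inDataset and records the source and license on that dataset with Dublin Core terms. The function name and all example.org URIs are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
VOID = "http://rdfs.org/ns/void#"
DCT = "http://purl.org/dc/terms/"


def with_provenance(triples, dataset, source, license_uri):
    """Attach every subject to a void:Dataset that records its source and license."""
    subjects = sorted({s for s, _, _ in triples})
    annotated = list(triples)
    # Per-resource provenance: which dataset does this resource belong to?
    annotated += [(s, VOID + "inDataset", dataset) for s in subjects]
    # Dataset-level provenance and licensing, stated once.
    annotated += [
        (dataset, RDF_TYPE, VOID + "Dataset"),
        (dataset, DCT + "source", source),
        (dataset, DCT + "license", license_uri),
    ]
    return annotated
```

Stating the license once on the dataset, rather than on every triple, keeps the overhead constant regardless of data size.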
  40. Guidelines for Publishing Linked Data
  41. Publication of the RDF data: map4rdf http://oegdev.dia.fi.upm.es/projects/map4rdf/ [figure: HTML, Linked Data and SPARQL access, including provenance support; components map4rdf, Pubby 0.3 (http://www4.wiwiss.fu-berlin.de/pubby/) and Virtuoso 6.1.0]
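
Besides HTML and Linked Data pages, the published data is reachable through a SPARQL endpoint. The sketch below builds a SPARQL Protocol GET request with the standard query parameter and parses a response in the SPARQL JSON results format. It only constructs the request (pass it to urllib.request.urlopen to send it); the sample response is hand-written, and no assumption is made that the endpoint is still online.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://geo.linkeddata.es/sparql"  # endpoint named in the sitemap4rdf example


def sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def column(results_json, var):
    """Extract the values bound to `var` from a SPARQL JSON results document."""
    data = json.loads(results_json)
    return [b[var]["value"] for b in data["results"]["bindings"] if var in b]


req = sparql_request(ENDPOINT, "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1")
# A hand-written response in the SPARQL JSON results format, for illustration.
sample = ('{"head": {"vars": ["s"]}, "results": {"bindings": [{"s": {"type": "uri", '
          '"value": "http://geo.linkeddata.es/resource/Provincia/Madrid"}}]}}')
print(column(sample, "s"))
```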
  42. Guidelines for Publishing Linked Data
  43. Data Cleansing • Find possible errors, as identified by Hogan et al.: • HTTP-level issues such as accessibility and dereferenceability, e.g., HTTP URIs that return 40x/50x errors • reasoning issues such as a namespace without a vocabulary, e.g., an invented rss:item term • malformed/incompatible datatypes, e.g., "true" as xsd:int • Fix the identified errors • Example: encoding URIs with special characters (á, é, ñ), e.g., http://geo.linkeddata.es/resource/Provincia/M%C3%A1laga
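
The datatype check from this list can be automated. The sketch below validates literal lexical forms for a few XSD datatypes using simplified regular expressions (it is not a full XML Schema validator); xsd:int additionally gets its 32-bit range check, so "true" as xsd:int is flagged.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Simplified lexical-form patterns for a few XSD datatypes.
LEXICAL = {
    XSD + "int": re.compile(r"[+-]?[0-9]+"),
    XSD + "integer": re.compile(r"[+-]?[0-9]+"),
    XSD + "boolean": re.compile(r"true|false|1|0"),
    XSD + "decimal": re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)"),
}


def check_literal(lexical, datatype):
    """True/False for a valid/malformed literal; None if the datatype is not checked."""
    pattern = LEXICAL.get(datatype)
    if pattern is None:
        return None
    if not pattern.fullmatch(lexical):
        return False
    if datatype == XSD + "int":
        # xsd:int is restricted to the signed 32-bit range.
        return -2**31 <= int(lexical) <= 2**31 - 1
    return True
```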
  44. Guidelines for Publishing Linked Data
  45. Linking the RDF Data • Identify suitable data sets as linking targets: http://ckan.net • Discover relationships between data items: LIMES (http://aksw.org/Projects/limes), Silk Framework (http://www4.wiwiss.fu-berlin.de/bizer/silk/) • Validate the relationships discovered: sameAs Validator (http://oegdev.dia.fi.upm.es:8080/sameAs/)
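
LIMES and Silk discover links using configurable similarity metrics and thresholds. The sketch below is neither tool: it is a deliberately naive stand-in that compares accent-stripped labels with difflib's string similarity and emits candidate owl:sameAs links, which would still need validation (e.g., with the sameAs Validator). The input dictionaries, the threshold and the example.org URI are hypothetical.

```python
import difflib
import unicodedata

OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(label):
    """Strip accents and case so that 'Málaga' and 'Malaga' compare equal."""
    decomposed = unicodedata.normalize("NFKD", label)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold().strip()


def candidate_links(source, target, threshold=0.9):
    """For each source resource, link it to its most similar target label above threshold."""
    links = []
    for s_uri, s_label in source.items():
        best_uri, best_score = None, 0.0
        for t_uri, t_label in target.items():
            score = difflib.SequenceMatcher(None, normalize(s_label), normalize(t_label)).ratio()
            if score > best_score:
                best_uri, best_score = t_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append((s_uri, OWL_SAMEAS, best_uri))
    return links


source = {"http://geo.linkeddata.es/resource/Provincia/Madrid": "Madrid",
          "http://geo.linkeddata.es/resource/Provincia/M%C3%A1laga": "Málaga"}
target = {"http://dbpedia.org/resource/Madrid": "Madrid",
          "http://example.org/places/Malaga": "Malaga"}
links = candidate_links(source, target)
```

Label matching alone cannot tell a province from its capital city of the same name, which is exactly why the discovered links are validated before publication.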
  46. Linking the RDF Data: GeoLinkedData [figure: http://geo.linkeddata.es/.../Madrid in GeoLinkedData linked to http://dbpedia.org/resource/Madrid in DBpedia and to http://sws.geonames.org/6355233/ in GeoNames]
  47. Linking the RDF Data: sameAs Validator http://oegdev.dia.fi.upm.es:8080/sameAs/
  48. Guidelines for Publishing Linked Data
  49. Enable Effective Discovery: Register the dataset in the CKAN registry • Add the dataset to CKAN, the open registry of data and content packages • Minimum information: • Name: unique ID for your data set on CKAN • Title: full name of your data set • URL: link to the data set home page. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation
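
The minimum metadata can be checked before registering. The sketch below assembles the three fields as JSON without calling any CKAN API. The naming rule it enforces (2 to 100 lowercase ASCII letters, digits, "-" or "_") is our understanding of CKAN's convention and should be checked against the CKAN documentation; the dataset name in the usage example is hypothetical.

```python
import json
import re

# Assumed CKAN naming convention: 2-100 lowercase ASCII letters, digits, '-' or '_'.
CKAN_NAME = re.compile(r"[a-z0-9_-]{2,100}")


def ckan_dataset_metadata(name, title, url):
    """Validate the minimum CKAN fields (name, title, url) and return them as JSON."""
    if not CKAN_NAME.fullmatch(name):
        raise ValueError(f"invalid CKAN name: {name!r}")
    if not title.strip():
        raise ValueError("title must not be empty")
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must be an HTTP(S) link to the data set home page")
    return json.dumps({"name": name, "title": title, "url": url}, ensure_ascii=False)


print(ckan_dataset_metadata("geolinkeddata", "GeoLinkedData", "http://geo.linkeddata.es/"))
```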
  50. Enable Effective Discovery: Sitemap protocol • Used by web crawlers • Lets them efficiently find all your content and discover what has been updated. http://sitemaps.org/ A sitemap file contains information about one or more URLs on your Web site; this information helps search engines crawl your website more effectively.
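
A sitemap is a small XML document in the http://www.sitemaps.org/schemas/sitemap/0.9 namespace: a urlset element containing one url element per page, each with a loc and an optional lastmod. The following sketch builds one with Python's xml.etree.ElementTree, which also takes care of escaping characters such as "&" in URLs.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)  # serialize as the default namespace (no prefix)


def build_sitemap(urls, lastmod=None):
    """Return a sitemap <urlset> document listing `urls`, optionally with a lastmod date."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc in urls:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod is not None:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap(["http://geo.linkeddata.es/resource/Provincia/Madrid"], lastmod="2011-05-12"))
```

The protocol allows at most 50,000 URLs per sitemap file, so larger datasets need several files.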
  51. Enable Effective Discovery: Sindice, the best RDF search engine
  52. Enable Effective Discovery: sitemap4rdf • Simple command-line tool • Sends a SPARQL query to list all URIs • Generates the sitemap • Usage: sitemap4rdf http://yoursite/sparql http://yoursite/resource/ • Example: sitemap4rdf http://geo.linkeddata.es/sparql http://geo.linkeddata.es/ • Run sitemap4rdf specifying the SPARQL endpoint and the prefix of the URLs to include in the sitemap. http://lab.linkeddata.deri.ie/2010/sitemap4rdf/
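
The slide does not show which query sitemap4rdf sends. A plausible query for "list all URIs under a prefix" is built below; it is an assumption, not the tool's actual query. It uses SPARQL 1.1's STRSTARTS (older endpoints may need REGEX instead) and pages with LIMIT/OFFSET in steps of 50,000, the per-file URL limit of the sitemap protocol.

```python
def list_uris_query(prefix, limit=50000, offset=0):
    """Build a SPARQL 1.1 query listing distinct subject IRIs that start with prefix."""
    # Escape backslashes and double quotes so the prefix is a valid SPARQL string literal.
    literal = prefix.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT DISTINCT ?s WHERE {\n"
        "  ?s ?p ?o .\n"
        f'  FILTER(isIRI(?s) && STRSTARTS(STR(?s), "{literal}"))\n'
        "}\n"
        # ORDER BY makes LIMIT/OFFSET paging deterministic across requests.
        f"ORDER BY ?s LIMIT {limit} OFFSET {offset}"
    )


print(list_uris_query("http://geo.linkeddata.es/"))
```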
  53. Enable Effective Discovery: Submit the sitemap location to Sindice • http://sindice.com/main/submit
  54. Enable Effective Discovery: Submit the sitemap location to Google • https://www.google.com/webmasters/tools/
  55. ToC • Introduction to Linked Data • Guidelines for Publishing Linked Data • Demo
  56. DEMO: http://geo.linkeddata.es/browser
  57. Provinces
  58. Capital of Province
  59. Provinces: Industry Production Index
  60. Beaches
  61. DEMO: http://webenemasuno.linkeddata.es/
  62. Trips
  63. Guide Locations
  64. Guide
  65. Future Work