Methodological Guidelines for Publishing Linked Data
 


Methodological Guidelines for Publishing Linked Data presented at CONSEGI 2011

    Methodological Guidelines for Publishing Linked Data: Presentation Transcript

    • Methodological Guidelines for Publishing Linked Data • Boris Villazón-Terrazas, Asunción Gómez-Pérez, and Óscar Corcho • Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid • http://www.oeg-upm.net • {bvillazon,asun,ocorcho}@fi.upm.es • Phone: 34.91.3366605, Fax: 34.91.3524819 • CONSEGI 2011, Brasília, Brazil, 12th May, 2011
    • ToC • Introduction to Linked Data • Guidelines for Publishing Linked Data • Demo 2
    • ToC • Introduction to Linked Data • Guidelines for Publishing Linked Data • Demo 3
    • Classic Web • Data exposed to the Web via HTML, PDF, etc. • Sources shown: MovieDB, CIA World FactBook • © Slide adapted from “5min Introduction to Linked Data” by Olaf Hartig 4
    • Classic Web • Information from single pages can be found via search engines • Complex queries over multiple pages / data sources?? • © Slide adapted from “5min Introduction to Linked Data” by Olaf Hartig 5
    • What do we actually want? • Use the Web like a single global database • Sources shown: MovieDB, CIA World FactBook • © Slide adapted from “5min Introduction to Linked Data” by Olaf Hartig 6
    • Linked Data enables such a Web of Data • Global Identifier: URI (Uniform Resource Identifier), which is a string of characters used to identify a name or a resource on the Internet • Data Model: RDF (Resource Description Framework), which is a standard model for data interchange on the Web • Access Mechanism: HTTP • Connection: Typed Links • Example (CIA World FactBook and MovieDB): http://cia.../Bolivia has http://.../population 8000000; http://imdb.../TLLuvia has http://.../name “Even the Rain” and http://.../filming_location http://cia.../Bolivia • © Slide adapted from “5min Introduction to Linked Data” by Olaf Hartig 7
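The building blocks on this slide (URIs as identifiers, RDF triples as the data model, typed links between resources) can be illustrated with a small standard-library sketch. The slide elides the real URIs, so the `http://example.org/` identifiers below are hypothetical placeholders, and the serializer covers only IRIs and plain string literals, not the full N-Triples grammar.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class IRI:
    """A global identifier (URI) for a resource or a property."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A plain string value such as a name or a number written as text."""
    value: str


@dataclass(frozen=True)
class Triple:
    """One RDF statement: subject, predicate (the typed link), object."""
    subject: IRI
    predicate: IRI
    obj: Union[IRI, Literal]


def term_to_ntriples(term: Union[IRI, Literal]) -> str:
    """Serialize one term in N-Triples style (simplified escaping)."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    escaped = term.value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize triples, one statement per line."""
    return "\n".join(
        f"{term_to_ntriples(t.subject)} {term_to_ntriples(t.predicate)} "
        f"{term_to_ntriples(t.obj)} ."
        for t in triples
    )


# Hypothetical URIs standing in for the elided ones on the slide.
EX = "http://example.org/"
bolivia = IRI(EX + "country/Bolivia")
film = IRI(EX + "film/EvenTheRain")

triples = [
    Triple(bolivia, IRI(EX + "population"), Literal("8000000")),
    Triple(film, IRI(EX + "name"), Literal("Even the Rain")),
    # The typed link: an object that is itself a URI in another data set.
    Triple(film, IRI(EX + "filming_location"), bolivia),
]

print(to_ntriples(triples))
```

The third triple is the one that makes the data "linked": its object is a URI that another source could describe further.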
    • In a nutshell • An extension of the current Web… • … where information, services, and data are given well-defined and explicitly represented meaning, … • … so that it can be shared and used by humans and machines, … • … better enabling them to work in cooperation • How? • Promoting information exchange by tagging web content with machine-processable descriptions of its meaning • And technologies and infrastructure to do this • And clear principles on how to publish data 8
    • The four principles (Tim Berners-Lee, 2006) • 1. Use URIs as names for things. • 2. Use HTTP URIs so that people can look up those names. • 3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL). • 4. Include links to other URIs, so that they can discover more things. • http://www.w3.org/DesignIssues/LinkedData.html • http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html 9
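Principles 2 and 3 amount to an ordinary HTTP lookup in which the client asks for RDF instead of HTML. A minimal sketch with Python's standard library follows; the helper names are hypothetical, and it is an assumption that a given server honours the `Accept` header (content negotiation is common for Linked Data servers but not guaranteed).

```python
import urllib.request


def build_rdf_request(uri: str,
                      media_type: str = "application/rdf+xml") -> urllib.request.Request:
    """Prepare a GET request that asks the server for an RDF representation."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


def fetch_description(uri: str, timeout: float = 10.0) -> bytes:
    """Dereference the URI and return the raw response body.

    Redirects (for example HTTP 303 to a document URL) are followed
    automatically by urllib.
    """
    with urllib.request.urlopen(build_rdf_request(uri), timeout=timeout) as response:
        return response.read()


# Building the request does not touch the network.
request = build_rdf_request("http://geo.linkeddata.es/resource/Provincia/Madrid")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Calling `fetch_description` performs the actual lookup, so it is left out of the demonstration line above.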
    • So does that mean I have to publish my data as Linked Data, now? • But, why? • What was your incentive to publish an HTML page in 1990? • Share data in documents and because your neighbor was doing it • So, why should we publish Linked Data in 2011? • Share data as data and because your neighbor is doing it • © Slide adapted from “Introduction to Linked Data” by Juan Sequeda 10
    • And guess who is starting to publish Linked Data now? • UK Government • US Government • BBC • Open Calais • Freebase • NY Times • CNET • DBpedia • … 11
    • Linked Open Data evolution 2007  2008  2009 12 12
    • Linked Open Data2010http://richard.cyganiak.de/2007/10/lod/ 13
    • ToC• Introduction to Linked Data• G id li Guidelines f P bli hi Li k d D t for Publishing Linked Data• Demo 14
    • Linked Data in OEG • GeoLinkedData is an open initiative whose aim is to enrich the Web of Data with Spanish geospatial data: http://geo.linkeddata.es • El Viajero Linked Data is a project that focuses on the integration of the contents produced by newspapers and digital platforms belonging to Prisa Group: http://webenemasuno.linkeddata.es/ • A project with the Biblioteca Nacional to publish the library information as Linked Data: http://cultura.linkeddata.es/visualizer/ 15
    • Linked Data in OEG • Tools for generating and consuming Linked Data, e.g., • geometry2rdf: http://www.oeg-upm.net/index.php/downloads/151-geometry2rdf • map4rdf: http://oegdev.dia.fi.upm.es/projects/map4rdf/ • Spanish Thematic Network of Linked Data: http://red.linkeddata.es » Group leader: Ontology Engineering Group » 19 Research Groups » 4 companies 16
    • Guidelines for Publishing Linked Data 17
    • Guidelines for Publishing Linked Data 18
    • Identification of the data sources • Guidelines based on the Open Data Manual (http://opendatamanual.org/) • Two possibilities • To find the data sources already available in a public data catalog, e.g., Aporta project (http://aporta.es) • To get an agreement with a particular government body to publish its data sources, e.g., GeoLinkedData - IGN 19
    • Identification of the data sources: GeoLinkedData • Agreement with the IGN (National Geographic Institute of Spain): Oracle & MySQL • Data sources available in a public data catalog: INE (National Statistics Institute of Spain) 20
    • Identification of the data sources: IGN & INE • Data shown: Industry Production Index by Year and Province 21
    • Guidelines for Publishing Linked Data 22
    • Vocabulary Modelling: Ontology • An ontology is an engineering artifact, which provides: • A set of terms • A set of explicit assumptions regarding the intended meaning of the terms • Almost always including concepts and their classification • Almost always including properties between concepts • Shared understanding of a domain of interest 23
    • Vocabulary Modelling: Reuse available vocabularies • Search for suitable vocabularies (Linked Open Vocabularies) • Are there suitable vocabularies? • Yes: build the vocabulary by reusing available vocabularies • No: … 24
    • Vocabulary Modelling: Reuse available non-ontological resources • Search for suitable non-ontological resources (highly reliable Web sites, domain-related sites, government catalogs) • Are there suitable resources? • Yes: build the vocabulary by transforming available resources • No: build the vocabulary from scratch 25
    • Vocabulary Modelling: GeoLinkedData • Vocabularies shown: WGS84 Geo Positioning: an RDF vocabulary • scv:Dimension, scv:Item, scv:Dataset • hydrographical phenomena (rivers, lakes, etc.) • Vocabulary for instants, intervals, durations, etc. • Names and international code systems for territories and groups • Ontology for OGC Geography Markup Language • Classes: 33 • Object Properties: 44 • Data Properties: 318 • http://neon-toolkit.org/ 26
    • Guidelines for Publishing Linked Data 27
    • Generation of the RDF Data • INE: NOR2O • IGN: ODEMapster • IGN geospatial column: Geometry2RDF 28
    • Generation of the RDF Data: NOR2O • Industry Production Index by Year and Province, transformed with NOR2O 29
    • Generation of the RDF Data: R2O & ODEMapster • R2O is an extensible, fully declarative language to describe mappings between relational database schemas and ontologies. • The ODEMapster processor generates RDF instances from relational instances based on the mapping description expressed in the R2O document. • www.oeg-upm.net/index.php/en/downloads/9-r2o-odempaster 30
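The idea of a declarative mapping from relational rows to RDF instances can be sketched in a few lines. This is not R2O syntax and not how ODEMapster is implemented; the mapping layout, the `rows_to_triples` helper, and the `http://example.org/` URIs are hypothetical and only illustrate the general pattern of "table row in, triples out".

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical declarative mapping: which class the rows belong to, how the
# subject URI is built, and which column feeds which property.
MAPPING = {
    "class_uri": "http://example.org/ontology/Province",
    "subject_template": "http://example.org/resource/Province/{id}",
    "columns": {"name": RDFS_LABEL},
}


def rows_to_triples(rows, mapping):
    """Turn relational rows (dicts) into (subject, predicate, object) tuples."""
    triples = []
    for row in rows:
        # Percent-encode every value before it is placed into the URI template.
        safe_values = {key: quote(str(value), safe="") for key, value in row.items()}
        subject = mapping["subject_template"].format(**safe_values)
        triples.append((subject, RDF_TYPE, mapping["class_uri"]))
        for column, property_uri in mapping["columns"].items():
            if row.get(column) is not None:
                triples.append((subject, property_uri, str(row[column])))
    return triples


rows = [{"id": 1, "name": "Madrid"}, {"id": 2, "name": None}]
for triple in rows_to_triples(rows, MAPPING):
    print(triple)
```

Keeping the mapping as data, separate from the code that applies it, is the property the slide attributes to a declarative mapping language.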
    • Generation of the RDF Data: R2O & ODEMapster • Creation of the R2O Mappings 31
    • Generation of the RDF Data: R2O & ODEMapster • Excerpt of the R2O document 32
    • Generation of the RDF Data: geometry2rdf • Tool for generating RDF from geometrical information • The geometry could be available in GML or WKT • The RDF generated follows our Geometry Model • http://www.oeg-upm.net/index.php/en/downloads/151-geometry2rdf 33
    • Generation of the RDF Data: geometry2rdf • Oracle SDO_UTIL package • SELECT TO_CHAR(SDO_UTIL.TO_GML311GEOMETRY(geometry)) AS Gml311Geometry FROM "BCN200"."BCN200_0301L_RIO" c WHERE c.Etiqueta = 'Arroyo' 34
    • Generation of the RDF Data geometry2rdf
    • Generation of the RDF Data: Geometry Model • Prefixes: geoes: http://geo.linkeddata.es/, geo: http://www.w3.org/2003/01/geo/wgs84_pos# • geo:Point, geoes:ontology/Curva, and geoes:ontology/Polígono are each rdfs:subClassOf geoes:ontology/Geometría • geo:Point has geo:lat and geo:long • geoes:ontology/Curva is formadoPor a collection of 2 or more geo:Points • geoes:ontology/Polígono is formadoPor a collection of 3 or more geo:Points 36
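A rough sketch of how a WKT line could be turned into triples shaped like this Geometry Model. It is not the geometry2rdf implementation. Several details are assumptions: the full URI of `formadoPor` (the slide shows only the local name), the WKT coordinate order (x = longitude, y = latitude), the blank-node labels for points, and the use of plain strings for coordinate values. The sketch also links each point directly to the curve and does not preserve point order, which a real collection would.

```python
import re

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
GEOES = "http://geo.linkeddata.es/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Assumption: the slide does not give the full URI of formadoPor.
FORMADO_POR = GEOES + "ontology/formadoPor"


def parse_linestring(wkt):
    """Parse a 2D 'LINESTRING (x y, x y, ...)' into a list of (x, y) floats."""
    match = re.fullmatch(r"\s*LINESTRING\s*\((.*)\)\s*", wkt, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a LINESTRING: {wkt!r}")
    points = []
    for pair in match.group(1).split(","):
        x, y = pair.split()  # raises ValueError unless exactly two numbers
        points.append((float(x), float(y)))
    if len(points) < 2:
        raise ValueError("a curve needs 2 or more points")
    return points


def curve_to_triples(curve_uri, wkt):
    """Describe a curve as a Curva made of geo:Points with geo:lat / geo:long."""
    triples = [(curve_uri, RDF_TYPE, GEOES + "ontology/Curva")]
    for index, (lon, lat) in enumerate(parse_linestring(wkt)):
        point = f"_:p{index}"  # blank-node label, local to this output
        triples.append((curve_uri, FORMADO_POR, point))
        triples.append((point, RDF_TYPE, GEO + "Point"))
        triples.append((point, GEO + "lat", str(lat)))
        triples.append((point, GEO + "long", str(lon)))
    return triples


for triple in curve_to_triples("http://example.org/river/1",
                               "LINESTRING (-3.7 40.4, -3.6 40.5)"):
    print(triple)
```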
    • Generation of the RDF Data • RDF generated according to our Geometry Model 37
    • Generation of the RDF Data: URI Generation • URIs are extremely relevant in this process since they are the key for the alignment of heterogeneous resources that come from different data sources. • Cool URIs (http://www.w3.org/TR/cooluris/) • UK Cabinet Office (http://www.cabinetoffice.gov.uk/media/301253/puiblic_sector_uri.pdf) • Examples: http://geo.linkeddata.es/ontology/{class/property}, e.g., http://geo.linkeddata.es/ontology/Lago • http://geo.linkeddata.es/resource/dataset/type/{resourcename}, e.g., http://geo.linkeddata.es/resource/Provincia/Madrid 38
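The two URI patterns on this slide can be applied with a pair of small helpers. The function names are hypothetical; the base URI and the example values come from the slide. Percent-encoding with UTF-8 is an assumption about how non-ASCII names should be handled, consistent with the encoded Málaga URI shown later in the deck.

```python
from urllib.parse import quote

BASE = "http://geo.linkeddata.es"


def ontology_uri(term: str) -> str:
    """URI for a class or property: {base}/ontology/{class or property}."""
    return f"{BASE}/ontology/{quote(term, safe='')}"


def resource_uri(resource_type: str, name: str) -> str:
    """URI for an instance: {base}/resource/{type}/{resource name}."""
    return f"{BASE}/resource/{quote(resource_type, safe='')}/{quote(name, safe='')}"


print(ontology_uri("Lago"))
print(resource_uri("Provincia", "Madrid"))
print(resource_uri("Provincia", "Málaga"))
```

Generating every URI through one function keeps the pattern stable, which matters because the URIs are the join keys across data sources.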
    • Generation of the RDF Data: Provenance Information • It is relevant • to manage the provenance information of the resources • to establish the license of the information • Example, Pubby: http://www4.wiwiss.fu-berlin.de/pubby/ 39
    • Guidelines for Publishing Linked Data 40
    • Publication of the RDF data • Interfaces: HTML, Linked Data, SPARQL • Virtuoso 6.1.0 • Pubby 0.3, including provenance support: http://www4.wiwiss.fu-berlin.de/pubby/ • map4rdf: http://oegdev.dia.fi.upm.es/projects/map4rdf/ 41
    • Guidelines for Publishing Linked Data 42
    • Data Cleansing • To find possible errors, identified by Hogan et al. • HTTP-level issues such as accessibility and dereferenceability, e.g., HTTP URIs return 40x/50x errors • reasoning issues such as namespace without vocabulary, e.g., the invented term rss:item • malformed/incompatible datatypes, e.g., “true” as xsd:int • To fix the identified errors • Example, encoding URIs • Special characters: á, é, ñ • http://geo.linkeddata.es/resource/Provincia/M%C3%A1laga 43
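Two of the checks on this slide are easy to automate: spotting a literal whose lexical form does not fit its datatype, and spotting a URI that still contains unencoded special characters. The sketch below uses hypothetical helper names, covers only `xsd:int` and `xsd:boolean`, and is strict about whitespace; it is not the validation procedure from Hogan et al.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"
XSD_INT = XSD + "int"
XSD_BOOLEAN = XSD + "boolean"

_INTEGER = re.compile(r"[+-]?[0-9]+")


def is_valid_literal(lexical: str, datatype: str) -> bool:
    """Check a literal's lexical form against a few XSD datatypes."""
    if datatype == XSD_INT:
        # xsd:int is a signed 32-bit integer.
        return (_INTEGER.fullmatch(lexical) is not None
                and -2147483648 <= int(lexical) <= 2147483647)
    if datatype == XSD_BOOLEAN:
        return lexical in {"true", "false", "1", "0"}
    return True  # datatypes outside this sketch are not checked


def has_unencoded_characters(uri: str) -> bool:
    """True if the URI contains whitespace or non-ASCII characters."""
    return any(ch.isspace() or ord(ch) > 127 for ch in uri)


print(is_valid_literal("true", XSD_INT))
print(has_unencoded_characters("http://geo.linkeddata.es/resource/Provincia/Málaga"))
```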
    • Guidelines for Publishing Linked Data 44
    • Linking the RDF Data • Identify suitable data sets as linking targets: http://ckan.net • Discover relationships between data items: LIMES (http://aksw.org/Projects/limes), Silk Framework (http://www4.wiwiss.fu-berlin.de/bizer/silk/) • Validate the relationships discovered: sameAs Validator (http://oegdev.dia.fi.upm.es:8080/sameAs/) 45
    • Linking the RDF Data: GeoLinkedData • DBpedia: http://dbpedia.org/resource/Madrid • GeoLinkedData: http://geo.linkeddata.es/.../Madrid • GeoNames: http://sws.geonames.org/6355233/ 46
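The "discover relationships" step can be approximated by comparing labels and proposing `owl:sameAs` candidates above a similarity threshold. This is a simplified stand-in, not the matching logic of LIMES or Silk. The labels attached to each URI and the 0.9 threshold are assumptions made for the illustration, and every proposed link would still need validation.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def label_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def propose_links(local, remote, threshold=0.9):
    """Propose sameAs candidates between two {uri: label} dictionaries."""
    links = []
    for local_uri, local_label in local.items():
        for remote_uri, remote_label in remote.items():
            if label_similarity(local_label, remote_label) >= threshold:
                links.append((local_uri, OWL_SAME_AS, remote_uri))
    return links


# Labels are assumed for illustration; they are not taken from the data sets.
local = {"http://geo.linkeddata.es/resource/Provincia/Madrid": "Madrid"}
remote = {
    "http://dbpedia.org/resource/Madrid": "Madrid",
    "http://sws.geonames.org/6355233/": "Provincia de Madrid",
}

for link in propose_links(local, remote):
    print(link)
```

With these assumed labels only the DBpedia URI clears the threshold, which shows why a validation step after automatic discovery is useful: a stricter threshold misses true matches, a looser one admits false ones.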
    • Linking the RDF Data: sameAs Validator • http://oegdev.dia.fi.upm.es:8080/sameAs/ 47
    • Guidelines for Publishing Linked Data 48
    • Enable Effective Discovery: Register the dataset into CKAN Registry • Add the dataset to CKAN, the open registry of data and content packages • Minimum information • Name, unique ID for your data set on CKAN • Title, full name of your data set • URL, link to the data set home page • http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation 49
    • Enable Effective Discovery: Sitemap protocol • Used by web crawlers • Efficiently find all your content & discover what has been updated • http://sitemaps.org/ • A sitemap file contains information regarding one or more URLs on your Web site. The information that is stored there helps search engines better spider your website. 50
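A sitemap file is a small XML document listing URLs and, optionally, when each was last modified. The sketch below builds one with Python's standard library; the `build_sitemap` helper and the `lastmod` date are hypothetical, while the namespace is the one defined by the Sitemap protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, last_modified or None) pairs."""
    ET.register_namespace("", SITEMAP_NS)  # serialize with a default namespace
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for location, last_modified in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = location
        if last_modified:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = last_modified
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    ("http://geo.linkeddata.es/resource/Provincia/Madrid", "2011-05-12"),
    ("http://geo.linkeddata.es/resource/Provincia/M%C3%A1laga", None),
])
print(sitemap_xml)
```

The `lastmod` element is what lets a crawler discover what has been updated without refetching every URL.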
    • Enable Effective Discovery • Sindice: the best RDF search engine 51
    • Enable Effective Discovery: sitemap4rdf • Simple command line tool • Sends a SPARQL query to list all URIs • Generates sitemap • Run sitemap4rdf specifying the SPARQL endpoint and the prefix of the URLs to include in the Sitemap: sitemap4rdf http://yoursite/sparql http://yoursite/resource/ • Example: sitemap4rdf http://geo.linkeddata.es/sparql http://geo.linkeddata.es/ • http://lab.linkeddata.deri.ie/2010/sitemap4rdf/ 52
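The "send a SPARQL query to list all URIs" step might look roughly like the sketch below. The query text is an assumption and is not taken from sitemap4rdf; it uses `STRSTARTS`, which requires a SPARQL 1.1 endpoint. The helper names are hypothetical, and the request URL follows the SPARQL protocol convention of passing the query in a `query` parameter.

```python
from urllib.parse import urlencode


def build_uri_listing_query(prefix: str, limit: int = 1000) -> str:
    """SPARQL query listing distinct subject URIs that start with a prefix."""
    if any(ch in prefix for ch in '"\\<>\n\r'):
        raise ValueError("prefix contains characters that are unsafe in a query")
    return (
        "SELECT DISTINCT ?s WHERE { ?s ?p ?o . "
        f'FILTER(isIRI(?s) && STRSTARTS(STR(?s), "{prefix}")) }} LIMIT {limit}'
    )


def build_request_url(endpoint: str, query: str) -> str:
    """GET URL for a SPARQL endpoint with the query URL-encoded."""
    return f"{endpoint}?{urlencode({'query': query})}"


query = build_uri_listing_query("http://geo.linkeddata.es/")
print(query)
print(build_request_url("http://geo.linkeddata.es/sparql", query))
```

The URIs returned by such a query would then be written out as sitemap entries, paging with `LIMIT` and `OFFSET` for large data sets.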
    • Enable Effective Discovery: Submit the sitemap location - Sindice • http://sindice.com/main/submit 53
    • Enable Effective Discovery: Submit the sitemap location - Google • https://www.google.com/webmasters/tools/ 54
    • ToC • Introduction to Linked Data • Guidelines for Publishing Linked Data • Demo 55
    • DEMO • http://geo.linkeddata.es/browser 56
    • Provinces 57
    • Capital of Province 58
    • Provinces – Industry Production Index 59
    • Beaches 60
    • DEMO • http://webenemasuno.linkeddata.es/ 61
    • Trips 62
    • Guide Locations 63
    • Guide 64
    • Future Work 65