Methodological Guidelines for Publishing Linked Data
 


Updated version of our methodological guidelines for publishing Linked Data


    Presentation Transcript

    • Methodological Guidelines for Publishing Linked Data
      Boris Villazón-Terrazas, Oscar Corcho
      Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
      http://www.oeg-upm.net
      {bvillazon,ocorcho}@fi.upm.es
      Phone: 34 91 3366605, Fax: 34 91 3524819
      Slides available at: http://www.slideshare.net/boricles/
      Acknowledgements: Asunción Gómez-Pérez, Luis M. Vilches, Victor Saquicela, Alexander de León, and many others that we may have omitted.
      Work distributed under the license Creative Commons Attribution-Noncommercial-Share Alike 3.0
    • Main References
      • Boris Villazón-Terrazas, Luis M. Vilches, Oscar Corcho, Asunción Gómez-Pérez. Methodological Guidelines for Publishing Government Linked Data. In: Wood, David (Ed.), Linking Government Data, 2011.
      • Michael Hausenblas, Bernadette Hyland, Boris Villazón-Terrazas. Best Practices for Publishing Linked Data. W3C Editor's Draft, Government Linked Data Working Group. https://dvcs.w3.org/hg/gld/raw-file/bcb72f87b5cc/bp/index.html
      • Bernadette Hyland, Boris Villazón-Terrazas, Sarven Capadisli. Cookbook for Open Government Linked Data. W3C Editor's Draft, Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
    • Guidelines for Publishing Linked Data
      • The process of publishing Linked Data follows an iterative, incremental life cycle model.
      • The guidelines are based on our experience producing Linked Data in several governmental contexts, and they have been applied in real case scenarios.
    • Specification
      • Identification and analysis of the data sources
      • URI design
      • Definition of the license
    • Specification: Identification and analysis of the data sources
      We have to distinguish two cases:
      • Open and publish data that government agencies have not yet opened up and published
        • This task may require contacting specific government data owners to get access to their legacy data
      • Reuse and leverage data already opened up and published by government agencies
        • This task consists of looking for these data in public government catalogs
          • Open Government Data
          • datacatalogs.org
          • Open Government Catalog
    • Specification: Identification and analysis of the data sources
      After we have identified and selected the government data sources:
      • Search and compile all the available data and documentation about those resources
      • Identify the schema of those resources, including conceptual components and their relationships
      • Identify the items in the domain, i.e., things whose properties and relations are described in the data sources
    • Specification: GeoLinkedData - Identification of the data sources
      • IGN (National Geographic Institute of Spain): Oracle & MySQL, accessed through an agreement with the IGN
      • INE (National Statistic Institute of Spain): data sources available in a public data catalog
    • Specification: GeoLinkedData - Analysis of the data sources
      [Figure: source table with the columns Year, Province, and Industry Production Index]
    • Specification: URI Design
      • Use meaningful URIs, instead of opaque URIs, when possible
      • Separate TBox (ontology model) URIs from ABox (instances) URIs
        • Base URI: http://data.gov.bo/, http://health.data.gov.bo/
        • TBox URIs: http://data.gov.bo/ontology/{class|property}
        • ABox URIs: http://data.gov.bo/resource/, http://data.gov.bo/resource/province/Tiraque
    • Specification: GeoLinkedData - URI design
      • Base URI: http://linkeddata.es/, http://geo.linkeddata.es/
      • TBox URIs: http://geo.linkeddata.es/ontology/{concept|property}, e.g., http://geo.linkeddata.es/ontology/Provincia
      • ABox URIs: http://geo.linkeddata.es/resource/{r. type}/{r. name}, e.g., http://geo.linkeddata.es/resource/Provincia/Madrid
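The TBox/ABox patterns above can be sketched as two small helper functions. This is only an illustration: the helper names `tbox_uri` and `abox_uri`, and the rule that spaces become underscores, are assumptions and are not taken from the slides.

```python
from urllib.parse import quote

BASE = "http://geo.linkeddata.es/"  # base URI shown on the slide


def tbox_uri(term: str) -> str:
    """Build an ontology (TBox) URI for a concept or property name."""
    return f"{BASE}ontology/{quote(term, safe='')}"


def abox_uri(resource_type: str, name: str) -> str:
    """Build an instance (ABox) URI of the form resource/{type}/{name}."""
    # Assumption: spaces become underscores, anything else unsafe is percent-encoded.
    slug = quote(name.strip().replace(" ", "_"), safe="")
    return f"{BASE}resource/{quote(resource_type, safe='')}/{slug}"


print(tbox_uri("Provincia"))
print(abox_uri("Provincia", "Madrid"))
```

Keeping the two helpers separate mirrors the guideline: ontology terms and instances never share a path prefix.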
    • Specification: Definition of the license
      • Several possibilities
        • The UK Open Government License
        • Open Database License
        • Public Domain Dedication and License
        • Open Data Commons Attribution License
        • The Creative Commons Licenses
      • It is also possible to reuse and apply an existing license of the government data sources.
    • Specification: GeoLinkedData - Definition of the license
      • Reusing the original license of the government data sources: the IGN and INE data sources have their own license, similar to the Attribution-Share Alike 2.5 Generic License, http://creativecommons.org/licenses/by-sa/2.5/
    • Modelling: Ontology
      • An ontology is an engineering artifact, which provides:
        • A set of terms
        • A set of explicit assumptions regarding the intended meaning of the terms
        • Almost always including concepts and their classification
        • Almost always including properties between concepts
      • Shared understanding of a domain of interest
      • Ontologies are expressed in OWL or RDF(S), both based on RDF
    • Modelling: Reuse available vocabularies
      • Search for suitable vocabularies (Linked Open Vocabularies)
      • Are there suitable vocabularies?
        • Yes: build the vocabulary by reusing available vocabularies
        • No: …
    • Modelling: Reuse available non-ontological resources
      • Search for suitable non-ontological resources (highly reliable Web sites, domain-related sites, government catalogs)
      • Are there suitable resources?
        • Yes: build the vocabulary by transforming available resources
        • No: build the vocabulary from scratch
    • Modelling: GeoLinkedData
      [Figure: vocabularies used in the GeoLinkedData model]
      • WGS84 Geo Positioning: an RDF vocabulary
      • scv:Dimension, scv:Item, scv:Dataset
      • Hydrographical phenomena (rivers, lakes, etc.)
      • Vocabulary for instants, intervals, durations, etc.
      • Names and international code systems for territories and groups
      • Ontology for OGC Geography Markup Language
      • Classes: 33, Object Properties: 44, Data Properties: 318
      • http://neon-toolkit.org/
    • Modelling: GeoLinkedData [figure]
    • Generation
      • Transformation
      • Data cleansing
      • Linking
    • Generation: Transformation
      • Take the data sources selected in the specification activity and transform them to RDF according to the vocabulary created in the modelling activity
      • Some tools
        • CSV and spreadsheets: RDF extension of Google Refine, XLWrap, RDF123, NOR2O
        • RDB: D2R Server, ODEMapster, W3C RDB2RDF WG - R2RML
        • XML: GRDDL, ReDeFer
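As a rough illustration of the transformation step for tabular sources, the sketch below turns CSV rows into N-Triples using only the standard library. It does not reproduce what NOR2O or the other listed tools do. The sample values, the `IPI` resource path, and the property names `province`, `year`, and `value` are hypothetical; only the column names and the base URIs come from the slides.

```python
import csv
import io
from urllib.parse import quote

RESOURCE = "http://geo.linkeddata.es/resource/"
ONTOLOGY = "http://geo.linkeddata.es/ontology/"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical sample rows; the values are made up for illustration.
SAMPLE = "Year,Province,IndustryProductionIndex\n2009,Madrid,95.4\n2009,Granada,88.1\n"


def rows_to_ntriples(text: str) -> list[str]:
    """Turn each CSV row into three N-Triples statements about one observation."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        year = row["Year"]
        value = row["IndustryProductionIndex"]
        slug = quote(row["Province"], safe="")
        province = f"<{RESOURCE}Provincia/{slug}>"
        obs = f"<{RESOURCE}IPI/{slug}_{year}>"  # hypothetical observation URI
        triples.append(f"{obs} <{ONTOLOGY}province> {province} .")
        triples.append(f'{obs} <{ONTOLOGY}year> "{year}"^^<{XSD}gYear> .')
        triples.append(f'{obs} <{ONTOLOGY}value> "{value}"^^<{XSD}decimal> .')
    return triples


for line in rows_to_ntriples(SAMPLE):
    print(line)
```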
    • Generation: GeoLinkedData - Transformation
      • INE: NOR2O
      • IGN: ODEMapster
      • IGN geospatial column: Geometry2RDF
    • Generation: GeoLinkedData - Transformation
      [Figure: NOR2O applied to the table with the columns Year, Province, and Industry Production Index]
    • Generation: GeoLinkedData - Transformation
      • R2O is an extensible, fully declarative language to describe mappings between relational database schemas and ontologies.
      • The ODEMapster processor generates RDF instances from relational instances based on the mapping description expressed in the R2O document
      • www.oeg-upm.net/index.php/en/downloads/9-r2o-odempaster
    • Generation: GeoLinkedData - Transformation
      • Creation of the R2O Mappings
    • Generation: GeoLinkedData - Transformation
      • Excerpt of the R2O document
    • Generation: GeoLinkedData - Transformation
      • Geometry2RDF: a tool for generating RDF from geometrical information
      • The geometry could be available in GML or WKT
      • The RDF generated follows our Geometry Model
      • http://www.oeg-upm.net/index.php/en/downloads/151-geometry2rdf
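To give a feel for geometry-to-RDF conversion, here is a minimal sketch that parses a two-dimensional WKT POINT and emits N-Triples. It is not Geometry2RDF and does not follow the authors' Geometry Model, which the slides do not spell out. It uses the W3C WGS84 vocabulary (`geo:lat`, `geo:long`) instead; the `xsd:double` datatype and the sample coordinates are assumptions.

```python
import re

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Matches "POINT(x y)" with plain decimal coordinates only.
POINT = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def wkt_point_to_ntriples(subject_uri: str, wkt: str) -> list[str]:
    """Convert a WKT POINT (x = longitude, y = latitude) into two N-Triples lines."""
    match = POINT.match(wkt)
    if match is None:
        raise ValueError(f"not a 2D WKT POINT: {wkt!r}")
    lon, lat = match.groups()
    return [
        f'<{subject_uri}> <{GEO}lat> "{lat}"^^<{XSD}double> .',
        f'<{subject_uri}> <{GEO}long> "{lon}"^^<{XSD}double> .',
    ]


# Hypothetical coordinates, for illustration only.
for line in wkt_point_to_ntriples("http://geo.linkeddata.es/resource/Provincia/Madrid", "POINT(-3.7 40.4)"):
    print(line)
```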
    • Generation: GeoLinkedData - Transformation
      • Oracle SDO_UTIL package
      • SELECT TO_CHAR(SDO_UTIL.TO_GML311GEOMETRY(geometry)) AS Gml311Geometry FROM "BCN200"."BCN200_0301L_RIO" c WHERE c.Etiqueta='Arroyo'
    • Generation: GeoLinkedData - Transformation [figure]
    • Generation: Data Cleansing
      • Find possible errors, such as those identified by Hogan et al.
        • HTTP-level issues, such as accessibility and dereferenceability, e.g., HTTP URIs return 40x/50x errors
        • Reasoning issues, such as namespace without vocabulary, e.g., rss:item term invented
        • Malformed/incompatible datatypes, e.g., "true" as xsd:int
      • Fix the identified errors
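The malformed-datatype check can be sketched as a lookup of simplified lexical patterns. This is a minimal sketch, not a full XSD validator: the patterns cover only four datatypes and ignore value ranges (for example the 32-bit bound of xsd:int), and the function name `is_well_formed` is hypothetical.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Simplified lexical patterns for a few XSD datatypes (not the full XSD grammar).
LEXICAL = {
    XSD + "int": re.compile(r"[+-]?\d+"),
    XSD + "integer": re.compile(r"[+-]?\d+"),
    XSD + "boolean": re.compile(r"true|false|1|0"),
    XSD + "decimal": re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)"),
}


def is_well_formed(lexical: str, datatype: str) -> bool:
    """Return False when the literal's text cannot belong to the given datatype."""
    pattern = LEXICAL.get(datatype)
    if pattern is None:
        return True  # unknown datatype: this sketch does not judge it
    return pattern.fullmatch(lexical.strip()) is not None


print(is_well_formed("true", XSD + "int"))      # the slide's example of a malformed literal
print(is_well_formed("true", XSD + "boolean"))
```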
    • Generation: GeoLinkedData - Data Cleansing
      • Errors
        • Some resources with the same name were mixed up. For example, Granada municipality belongs to Granada province, and La Granada municipality belongs to Barcelona province.
        • Autonomous communities that have only one province, e.g., Murcia Region, were missing some municipalities, but their corresponding provinces, e.g., Murcia Province, have the correct number of municipalities.
        • Some hydrographical resources were missing parts of their geometrical information.
    • Generation: Linking
      • Identify suitable data sets as linking targets: http://ckan.net
      • Discover relationships between data items
        • LIMES: http://aksw.org/Projects/limes
        • Silk Framework: http://www4.wiwiss.fu-berlin.de/bizer/silk/
      • Validate the relationships discovered
        • sameAs Validator: http://oegdev.dia.fi.upm.es:8080/sameAs/
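The discovery step can be illustrated with a simple label comparison. This is a sketch under stated assumptions and not how LIMES or Silk work internally: it compares accent-stripped, lowercased labels with `difflib.SequenceMatcher`, and the function names, the 0.9 threshold, and the sample labels are all hypothetical. The candidates it yields would still need validation before being asserted as owl:sameAs links.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(label: str) -> str:
    """Lowercase the label, strip accents and surrounding whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower().strip()


def candidate_links(source: dict[str, str], target: dict[str, str], threshold: float = 0.9):
    """Yield (source_uri, target_uri, score) for label pairs at or above the threshold."""
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            score = SequenceMatcher(None, normalize(s_label), normalize(t_label)).ratio()
            if score >= threshold:
                yield s_uri, t_uri, score


# Hypothetical labels keyed by resource URI.
source = {"http://geo.linkeddata.es/resource/Provincia/Madrid": "Madrid"}
target = {
    "http://dbpedia.org/resource/Madrid": "Madrid",
    "http://dbpedia.org/resource/Granada": "Granada",
}
for link in candidate_links(source, target):
    print(link)
```

A strict threshold matters here: with these settings "Granada" and "La Granada" score below 0.9, so the two municipalities are not proposed as the same resource.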
    • Generation: GeoLinkedData - Linking
      • GeoLinkedData: http://geo.linkeddata.es/.../Madrid
      • DBpedia: http://dbpedia.org/resource/Madrid
      • GeoNames: http://sws.geonames.org/6355233/
    • Generation: GeoLinkedData - Linking
      • http://oegdev.dia.fi.upm.es:8080/sameAs/
    • Publication
      • Dataset publication
      • Metadata publication
      • Dataset discovery
    • Publication: Dataset Publication
      • Tools for storing RDF: Virtuoso Universal Server, Jena, Sesame, 4Store, YARS, OWLIM
      • SPARQL endpoint and Linked Data frontend: Pubby, Talis Platform, Fuseki
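Once a SPARQL endpoint is in place, clients reach it over HTTP. The sketch below only builds the GET request URL, using the `query` parameter defined by the SPARQL Protocol, and sends nothing over the network. The endpoint address and the query are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://geo.linkeddata.es/sparql"  # assumed endpoint location

QUERY = """
SELECT ?province WHERE {
  ?province a <http://geo.linkeddata.es/ontology/Provincia> .
} LIMIT 10
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as a GET request URL via the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)
print(url)
# Decoding the URL again recovers the original query text.
print(parse_qs(urlsplit(url).query)["query"][0])
```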
    • Publication: Metadata Publication
      • VoID allows expressing metadata about RDF datasets
      • Open Provenance Model
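A VoID description is itself a small RDF document. The following sketch renders one as Turtle with plain string formatting. The function name, the dataset URI, the title, and the SPARQL endpoint address are assumptions; the VoID and Dublin Core terms used (`void:Dataset`, `void:sparqlEndpoint`, `void:uriSpace`, `dcterms:title`, `dcterms:license`) are standard vocabulary terms.

```python
def void_description(dataset_uri: str, title: str, sparql_endpoint: str,
                     uri_space: str, license_uri: str) -> str:
    """Render a minimal VoID description of a dataset as Turtle."""
    # Assumption: the title contains no double quotes or backslashes (no escaping is done).
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',
        f"    dcterms:license <{license_uri}> ;",
        f"    void:sparqlEndpoint <{sparql_endpoint}> ;",
        f'    void:uriSpace "{uri_space}" .',
    ])


print(void_description(
    dataset_uri="http://geo.linkeddata.es/dataset",   # hypothetical dataset URI
    title="GeoLinkedData",
    sparql_endpoint="http://geo.linkeddata.es/sparql",  # assumed endpoint location
    uri_space="http://geo.linkeddata.es/resource/",
    license_uri="http://creativecommons.org/licenses/by-sa/2.5/",
))
```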
    • Publication: Dataset discovery
      • Register the dataset in the CKAN registry
      • Generate sitemap files for your dataset by using sitemap4rdf
      • Submit the sitemap location to Google and Sindice
      • http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation
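For orientation, a plain sitemap in the sitemaps.org format can be produced with the standard library as sketched below. This is not sitemap4rdf, and the files that tool generates may contain additional dataset-specific elements; the function name and the listed URLs are illustrative.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Serialize the given URLs as a sitemaps.org <urlset> document."""
    ET.register_namespace("", NS)  # emit NS as the default namespace
    urlset = ET.Element(f"{{{NS}}}urlset")
    for u in urls:
        url = ET.SubElement(urlset, f"{{{NS}}}url")
        ET.SubElement(url, f"{{{NS}}}loc").text = u
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap([
    "http://geo.linkeddata.es/resource/Provincia/Madrid",
    "http://geo.linkeddata.es/ontology/Provincia",
]))
```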
    • Publication: GeoLinkedData - Dataset publication
      • Pubby 0.3, including provenance support (HTML, Linked Data, SPARQL): http://www4.wiwiss.fu-berlin.de/pubby/
      • Virtuoso 6.1.0
    • Publication: GeoLinkedData - Dataset discovery [figure]
    • Exploitation: Streaming resources
    • Exploitation: GeoLinkedData
      • map4rdf: http://oegdev.dia.fi.upm.es/projects/map4rdf/
        • Google Maps viewer of RDF resources
        • Resources with spatial information
        • Extensible with Google plugins
        • Used in other applications like Aemet, Goodrelations
      [Figure: map4rdf, SPARQL, Triplestore]
    • DEMO: http://geo.linkeddata.es/browser
    • Provinces
    • Capital of Province
    • Provinces - Industry Production Index
    • Beaches