ENTER 2015 Research Track Slide Number 1
Methodology for the publication of
Linked Open Data from small and
medium size DMO
Ander García, Maria Teresa Linaza, Javier Franco and
Miriam Juaristi
Vicomtech-IK4, Spain
agarcia@vicomtech.org
http://www.vicomtech.org
Outline
• Introduction
• Methodology
• Application to a small DMO
• Conclusions
Introduction
• Linked Open Data (LOD) = OD+LD:
– non-privacy-restricted and non-confidential
data produced with public money
– made available without any restrictions on its
usage or distribution
– published on the Web
– machine readable
– its meaning is explicitly defined
– it is linked to other external datasets
Introduction
• Publishing LOD involves 3 basic steps:
– assign URIs to the entities described by the
dataset and dereference these URIs over the
HTTP protocol into RDF representations
– set RDF links to other data sources on the Web
– provide metadata about published data
• Researchers have proposed different
methodologies to publish LOD for a variety
of domains
Introduction
• Although previous examples could serve as
guidelines for DMOs, the differences
between domains require specific
methodologies and examples for tourism
LOD
• Benefits for DMOs:
– Syntactic interoperability
– Reduction of the costs of applications
– Provision of innovative added-value services
Introduction
• Tourism OD:
– Several DMOs publish OD, for example Open
Data Euskadi from Basque Country publishes
(starting from 2010):
• POIs
• Stays
• Restaurants
• Cultural entities
• …
Introduction
• Tourism LOD:
– Few examples
– No linking to other datasets
– No reuse of ontologies
– No active URIs
Introduction
• Objective: Publication of 5 star LOD by
DMOs
• Benefits for DMOs:
– Syntactic interoperability
– Reduction of the costs of applications
– Provision of innovative added-value services
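The 5 star objective refers to the widely used five-star deployment scheme for open data, in which each level builds on the previous one. A minimal sketch of that cumulative rating follows; the class and field names are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class DatasetProfile:
    """Hypothetical description of how a dataset is published."""
    open_license: bool       # 1 star: on the Web under an open licence
    machine_readable: bool   # 2 stars: structured, machine-readable data
    open_format: bool        # 3 stars: non-proprietary format such as CSV
    uses_uris_and_rdf: bool  # 4 stars: entities have URIs, data is RDF
    linked_to_others: bool   # 5 stars: links to external datasets


def star_rating(profile: DatasetProfile) -> int:
    """Count stars cumulatively: a level counts only if all lower ones hold."""
    levels = [
        profile.open_license,
        profile.machine_readable,
        profile.open_format,
        profile.uses_uris_and_rdf,
        profile.linked_to_others,
    ]
    stars = 0
    for satisfied in levels:
        if not satisfied:
            break
        stars += 1
    return stars


csv_only = DatasetProfile(True, True, True, False, False)
full_lod = DatasetProfile(True, True, True, True, True)
print(star_rating(csv_only), star_rating(full_lod))
```

Under this sketch, a CSV file with an open licence stops at three stars; reaching five requires both RDF with URIs and links to external datasets.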
Methodology
• Oriented to small and medium size DMOs
• Based on Open Source tools
• Main steps:
– Configuration
– Pre-processing
– Triplification
– Publication
Configuration
• Non-technical issues:
– Selection and categorization of data
– Publication license, main options:
• public domain: free to share, create and adapt
• attribution: requires reusers to credit the source of
the data
• share-alike: requires public reusers of your data to
share back their changes (and to attribute)
• keep-open: anyone who redistributes the data, or
an adaptation of it, must also make a free version
available
Configuration
• Technical issues:
– Design of URIs:
– Multilingual data publication patterns:
• All languages associated to the same/different URI
• Use labels
• …
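One way to read the "use labels" pattern is a single URI per entity with one language-tagged label per language. The sketch below serialises such labels as N-Triples using only the standard library; the example.org URI, the function names and the sample labels are assumptions made for illustration.

```python
from typing import Dict, List

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def label_triples(uri: str, labels: Dict[str, str]) -> List[str]:
    """Return one rdfs:label triple per language, sorted by language tag."""
    return [
        f'<{uri}> <{RDFS_LABEL}> "{escape_literal(text)}"@{lang} .'
        for lang, text in sorted(labels.items())
    ]


# Hypothetical entity and labels in the four languages of a multilingual dataset.
lines = label_triples(
    "http://example.org/data/tourism/Hondartza",
    {"eu": "Hondartza", "es": "Playa", "en": "Beach", "fr": "Plage"},
)
print("\n".join(lines))
```

The alternative pattern, one URI per language, would instead mint a separate subject for each language and link the variants together.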
Pre-processing
• Extract, clean, and normalise data:
– Format for strings and numbers
– Format for multilingual values
– Storage of multivalued fields
– Detection of errors and non-existent values
• LOD Refine software to transform original
data based on formulas expressed in
GREL
Triplification
• Analyse the domain: ontologies, datasets,
vocabularies
• Create new ontologies and/or vocabularies
if required
• Express data as RDF triples
• Link data to external datasets
• LOD Refine software to generate RDF
triples
Publication
• Both as OD (repository) and LOD (triple
store)
• Include metadata adhering to the Data
Catalog Vocabulary (DCAT)
• DKAN (OD) and Virtuoso (LOD) software
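As a rough illustration of DCAT metadata, the sketch below builds a dataset description with one distribution and prints it as JSON-LD. The title, description and download URL are placeholders, the PDDL licence URL is used only as an example value, and the choice of properties is an assumption rather than the metadata actually published.

```python
import json


def dcat_dataset(title, description, license_url, download_url, media_type):
    """Build a minimal DCAT dataset description as a JSON-LD dictionary."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dct:license": {"@id": license_url},
                "dcat:downloadURL": {"@id": download_url},
                "dcat:mediaType": media_type,
            }
        ],
    }


# Placeholder values; a real catalogue entry would use the DMO's own data.
metadata = dcat_dataset(
    title="Tourism points of interest",
    description="Points of interest published as Linked Open Data.",
    license_url="http://opendatacommons.org/licenses/pddl/1.0/",
    download_url="http://example.org/data/tourism/pois.ttl",
    media_type="text/turtle",
)
print(json.dumps(metadata, indent=2))
```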
Methodology
Step            | Task                                                  | Tool
Configuration   | Select data                                           | -
Configuration   | Select the license to publish the data                | -
Configuration   | Design the URI scheme                                 | -
Configuration   | Select a multilingual data publication pattern        | -
Pre-processing  | Clean and normalise the data                          | LOD Refine
Triplification  | Select existing ontologies, vocabularies and LOD      | -
Triplification  | Define new ontologies and vocabularies (if required)  | Protégé
Triplification  | Triplification                                        | LOD Refine
Triplification  | Link to external LOD                                  | LOD Refine
Publication     | Upload the RDF file to a triple store                 | Virtuoso
Publication     | Create the dataset and add metadata                   | DKAN
Publication     | Upload the resources of the datasets                  | DKAN
Application to a small DMO
• Dataset: 143 POIs of a regional DMO, Urola
Kosta, in four languages (Spanish, Basque,
English, French) and five categories
• Configuration:
– PDDL license, no restrictions
– URI: /data/tourism/BASQUE_NAME
• Ñ replaced by ‘in’ and spaces by ‘_’
– Labels for multilingual data
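The URI rule above can be sketched as a small function. The treatment of an upper-case "Ñ" (mapped to "In") and the sample name are assumptions, since the slide only states that Ñ becomes "in" and spaces become "_".

```python
import unicodedata

BASE_PATH = "/data/tourism/"


def mint_uri(basque_name: str) -> str:
    """Build the URI path of a POI from its Basque name."""
    # Normalise to NFC so that "ñ" is one code point, not "n" + combining tilde.
    name = unicodedata.normalize("NFC", basque_name.strip())
    # Assumption: lower-case ñ becomes "in", upper-case Ñ becomes "In".
    slug = name.replace("ñ", "in").replace("Ñ", "In").replace(" ", "_")
    return BASE_PATH + slug


# Hypothetical POI name, not taken from the dataset.
print(mint_uri("Oñati Plaza"))
```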
Application to a small DMO
• Pre-processing
– Names to title case, for example from ERREXIL to Errexil
– Prefix added to telephone numbers:
(+34)943309230
– Secondary mobile numbers stored in a new
column
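The three clean-up operations above were done with LOD Refine; the following is only a plain Python restatement of the same logic, not the formulas actually used. The column names, the "/" separator between primary and secondary numbers, and the second phone number are hypothetical.

```python
def title_case(name: str) -> str:
    """Convert a name such as 'ERREXIL' to 'Errexil'."""
    return name.strip().title()


def add_prefix(number: str, prefix: str = "(+34)") -> str:
    """Remove spaces and prepend the country prefix unless already present."""
    compact = number.replace(" ", "")
    if not compact or compact.startswith(prefix):
        return compact
    return prefix + compact


def split_numbers(raw: str, sep: str = "/"):
    """Split a multivalued phone field into (primary, secondary)."""
    parts = [p.strip() for p in raw.split(sep) if p.strip()]
    primary = parts[0] if parts else ""
    secondary = parts[1] if len(parts) > 1 else ""
    return primary, secondary


def clean_row(row: dict) -> dict:
    """Apply the three operations to one record of the source table."""
    primary, secondary = split_numbers(row.get("phone", ""))
    return {
        "name": title_case(row["name"]),
        "phone": add_prefix(primary),
        "mobile": add_prefix(secondary),  # secondary number gets its own column
    }


print(clean_row({"name": "ERREXIL", "phone": "943 309 230 / 600 000 000"}))
```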
Application to a small DMO
• Triplification
– Ontologies:
• vCard: Contact information
• Dublin Core: Metadata about the resource
– Linked Datasets:
• GeoNames: Locations
• DBpedia: Categories and locations
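To make the triplification step concrete, here is a hedged sketch that turns one cleaned record into N-Triples using vCard and Dublin Core terms and adds links to GeoNames and DBpedia. The specific properties chosen (vcard:fn, vcard:hasTelephone, dcterms:title, dcterms:spatial, rdfs:seeAlso), the example.org URI and the GeoNames identifier are assumptions for illustration; the slides do not say which properties were used.

```python
VCARD = "http://www.w3.org/2006/vcard/ns#"
DCTERMS = "http://purl.org/dc/terms/"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def iri(value: str) -> str:
    return f"<{value}>"


def lit(text: str, lang: str = "") -> str:
    """Serialise a plain or language-tagged literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def tel_iri(phone: str) -> str:
    """Turn '(+34)943309230' into the IRI <tel:+34943309230>."""
    return iri("tel:" + "".join(c for c in phone if c.isdigit() or c == "+"))


def poi_triples(poi: dict) -> list:
    """Return the N-Triples lines describing one point of interest."""
    s = iri(poi["uri"])
    triples = [
        (s, iri(DCTERMS + "title"), lit(poi["name"], "eu")),
        (s, iri(VCARD + "fn"), lit(poi["name"])),
        (s, iri(VCARD + "hasTelephone"), tel_iri(poi["phone"])),
        (s, iri(DCTERMS + "spatial"), iri(poi["geonames"])),  # external link
        (s, iri(RDFS + "seeAlso"), iri(poi["dbpedia"])),      # external link
    ]
    return [" ".join(t) + " ." for t in triples]


# Hypothetical record; the GeoNames identifier is a placeholder.
example = {
    "uri": "http://example.org/data/tourism/Errexil",
    "name": "Errexil",
    "phone": "(+34)943309230",
    "geonames": "https://sws.geonames.org/0000000/",
    "dbpedia": "http://dbpedia.org/resource/Zarautz",
}
print("\n".join(poi_triples(example)))
```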
Application to a small DMO
• Triplification
Application to a small DMO
• Publication (OD and LOD)
Application to a small DMO
• Mobile application:
Application to a small DMO
• Mobile application:
– Data available through three channels:
• Direct download (CSV, JSON, RDF,…)
• DKAN Datastore API
• Virtuoso SPARQL
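For the SPARQL channel, a client can send a query over HTTP GET using the standard `query` parameter of the SPARQL protocol. The sketch below only builds the request and performs no network call; the endpoint URL and the query itself are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint URL

QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?poi ?label WHERE {
  ?poi rdfs:label ?label .
  FILTER (lang(?label) = "eu")
}
LIMIT 10
""".strip()


def sparql_request(endpoint: str, query: str) -> Request:
    """Build a GET request asking for SPARQL results in JSON."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = sparql_request(ENDPOINT, QUERY)
print(req.full_url)
```

Passing the request to `urllib.request.urlopen` would execute it against a live endpoint; the direct download and DKAN Datastore API channels would be accessed through their own URLs instead.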
Conclusions
• We have presented a methodology and
Open Source tools for DMOs to publish five
star tourism LOD
• The pre-processing and triplification steps
are the hardest, but they are done only
once for each type of data to be
published; the same operations can then
be reapplied directly
Conclusions
• We have shown an example of a mobile
application based on published data:
– Data accessible through different channels:
direct download, DKAN API, SPARQL
• Tourism LOD can provide multiple benefits
for DMOs and society, but DMOs need more
tools, best practices and standards
Future work
• More publication examples:
– Statistical data, based on the RDF Data Cube
vocabulary
– Data stored in relational databases
• Integrate standards such as UNE
178301:2015
