Presentation given by Ander García, Maria Teresa Linaza, Javier Franco and Miriam Juaristi at the "Data management" workshop of the ENTER2015 eTourism conference.
Methodology for the publication of Linked Open Data from small and medium size DMO
1. ENTER 2015 Research Track Slide Number 1
Methodology for the publication of
Linked Open Data from small and
medium size DMO
Ander García, Maria Teresa Linaza, Javier Franco and
Miriam Juaristi
Vicomtech-IK4, Spain
agarcia@vicomtech.org
http://www.vicomtech.org
2. ENTER 2015 Research Track Slide Number 2
Outline
• Introduction
• Methodology
• Application to a small DMO
• Conclusions
3. ENTER 2015 Research Track Slide Number 3
Introduction
• Linked Open Data (LOD) = OD+LD:
– non-privacy-restricted and non-confidential
data produced with public money
– made available without any restrictions on its
usage or distribution
– published on the Web
– machine readable
– its meaning is explicitly defined
– it is linked to other external datasets
4. ENTER 2015 Research Track Slide Number 4
Introduction
• Publishing LOD involves 3 basic steps:
– assign URIs to the entities described by the
dataset and dereference these URIs over the
HTTP protocol into RDF representations
– set RDF links to other data sources on the Web
– provide metadata about published data
• Researchers have proposed different
methodologies to publish LOD for a variety
of domains
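The first step above, dereferencing an entity URI into an RDF representation, is commonly handled with HTTP content negotiation. The following is a minimal sketch of that idea and is not taken from the presentation: the function name `choose_representation` and the list of offered media types are assumptions made for illustration.

```python
# Media types this hypothetical server can return for an entity URI.
OFFERED = ("text/html", "text/turtle", "application/rdf+xml")


def choose_representation(accept_header: str, offered=OFFERED) -> str:
    """Pick the offered media type with the highest q-value in the Accept header.

    Wildcards such as */* are not interpreted; when nothing matches,
    the first offered type is returned as a default.
    """
    best_type, best_q = offered[0], 0.0
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type, q = fields[0], 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # malformed q-value: treat as not acceptable
        if media_type in offered and q > best_q:
            best_type, best_q = media_type, q
    return best_type
```

A Linked Data client sending `Accept: text/turtle` would get the RDF representation, while a browser sending `text/html` would get a human-readable page for the same URI.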
5. ENTER 2015 Research Track Slide Number 5
Introduction
• Although previous examples could serve as
guidelines for DMOs, the differences
between domains require specific
methodologies and examples for tourism
LOD
• Benefits for DMOs:
– Syntactic interoperability
– Reduction of the costs of applications
– Provision of innovative added-value services
6. ENTER 2015 Research Track Slide Number 6
Introduction
• Tourism OD:
– Several DMOs publish OD, for example Open
Data Euskadi from Basque Country publishes
(starting from 2010):
• POIs
• Stays
• Restaurants
• Cultural entities
• ….
7. ENTER 2015 Research Track Slide Number 7
Introduction
• Tourism LOD:
– Few examples
– No linking to other datasets
– No reuse of ontologies
– No active URIs
8. ENTER 2015 Research Track Slide Number 8
Introduction
• Objective: Publication of 5 star LOD by
DMOs
• Benefits for DMOs:
– Syntactic interoperability
– Reduction of the costs of applications
– Provision of innovative added-value services
9. ENTER 2015 Research Track Slide Number 9
Methodology
• Oriented to small and medium size DMOs
• Based on Open Source tools
• Main steps:
– Configuration
– Pre-processing
– Triplification
– Publication
10. ENTER 2015 Research Track Slide Number 10
Configuration
• Non-technical issues:
– Selection and categorization of data
– Publication license, main options:
• public domain: free to share, create and adapt
• attribution: requires reusers to credit the source
data
• share-alike: requires public reusers of your data to
share back changes (and attribute)
• keep-open: if the data or an adaptation of it is
redistributed, a free version must also be
redistributed
11. ENTER 2015 Research Track Slide Number 11
Configuration
• Technical issues:
– Design of URIs:
– Multilingual data publication patterns:
• All languages associated with the same URI or with different URIs
• Use labels
• …
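The "same URI plus labels" pattern above can be sketched as one entity URI carrying one language-tagged `rdfs:label` per language, serialised here as N-Triples lines. This is an illustrative sketch only: the helper names `escape_literal` and `label_triples` are hypothetical, and the choice of `rdfs:label` as the labelling property is an assumption.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def label_triples(uri: str, labels: dict) -> list:
    """Return one language-tagged label triple per language for a single URI.

    `labels` maps a language tag (e.g. "eu", "es") to the label text.
    """
    return [
        f'<{uri}> <{RDFS_LABEL}> "{escape_literal(text)}"@{lang} .'
        for lang, text in sorted(labels.items())
    ]
```

With this pattern a consumer selects the language it needs by filtering on the language tag, and links from other datasets keep pointing at a single URI.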
12. ENTER 2015 Research Track Slide Number 12
Pre-processing
• Extract, clean, and normalise data:
– Format for strings and numbers
– Format for multilingual values
– Storage of multivalued fields
– Detection of errors and non-existent values
• LOD Refine software to transform original
data based on formulas expressed in
GREL
13. ENTER 2015 Research Track Slide Number 13
Triplification
• Analyse the domain: ontologies, datasets,
vocabularies
• Create new ontologies and/or vocabularies
if required
• Express data as RDF triples
• Link data to external datasets
• LOD Refine software to generate RDF
triples
14. ENTER 2015 Research Track Slide Number 14
Publication
• Both as OD (repository) and LOD (triple
store)
• Include metadata adhering to the Data
Catalog Vocabulary (DCAT)
• DKAN (OD) and Virtuoso (LOD) software
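As a rough illustration of DCAT metadata, the sketch below builds a JSON-LD style description of a dataset and its distributions. It only shows the shape of such a record and is not the output of DKAN: the function name `dcat_dataset`, its parameters and every URL passed to it are assumptions.

```python
def dcat_dataset(title: str, description: str, license_uri: str,
                 distributions: list) -> dict:
    """Build a JSON-LD style dict describing a dataset with DCAT terms.

    `distributions` is a list of (download_url, media_type) pairs.
    The license is attached to each distribution.
    """
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": url},
                "dcat:mediaType": media_type,
                "dct:license": {"@id": license_uri},
            }
            for url, media_type in distributions
        ],
    }
```

The resulting dict can be serialised with `json.dumps` from the standard library; one distribution per published format (for example CSV and RDF) keeps the download options explicit.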
15. ENTER 2015 Research Track Slide Number 15
Methodology
Step            Task                                                   Tool
Configuration   Select data                                            -
                Select the license to publish the data                 -
                Design the URI scheme                                  -
                Select a multilingual data publication pattern         -
Pre-processing  Clean and normalise the data                           LOD Refine
Triplification  Select existing ontologies, vocabularies and LOD       -
                Define new ontologies and vocabularies (if required)   Protégé
                Triplification                                         LOD Refine
                Link to external LOD                                   LOD Refine
Publication     Upload the RDF file to a triple store                  Virtuoso
                Create the dataset and add metadata                    DKAN
                Upload the resources of the datasets                   DKAN
16. ENTER 2015 Research Track Slide Number 16
Application to a small DMO
• Dataset: 143 POIs of a regional DMO, Urola
Kosta, in four languages (Spanish, Basque,
English, French) and five categories
• Configuration:
– PDDL license, no restrictions
– URI: /data/tourism/BASQUE_NAME
• Ñ replaced by ‘in’ and spaces by ‘_’
– Labels for multilingual data
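The URI rule above can be sketched as a small function. The host `http://example.org` is a placeholder, since the slide only gives the path; treating lowercase "ñ" the same way as "Ñ" is also an assumption, and the function name `mint_uri` is hypothetical.

```python
BASE = "http://example.org"  # placeholder host; the slide only gives the path


def mint_uri(basque_name: str, base: str = BASE) -> str:
    """Build a POI URI from its Basque name.

    Follows the rule on the slide: "Ñ" becomes "in" and spaces become "_".
    Lowercase "ñ" is handled the same way (an assumption).
    """
    slug = (basque_name.strip()
            .replace("Ñ", "in")
            .replace("ñ", "in")
            .replace(" ", "_"))
    return f"{base}/data/tourism/{slug}"
```

For an illustrative name such as "Iñurritza biotopoa" (not taken from the dataset), this yields a URI ending in `/data/tourism/Iinurritza_biotopoa`.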
17. ENTER 2015 Research Track Slide Number 17
Application to a small DMO
• Pre-processing
– Names to title case, from ERREXIL to Errexil
– Prefix added to telephone numbers:
(+34)943309230
– Secondary mobile numbers stored in a new
column
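The presentation performs these clean-up operations in LOD Refine. As a rough equivalent for readers without that tool, the sketch below expresses the same three operations in Python. The helper names, the separators accepted between phone numbers and the sample secondary number used in the note below are assumptions, not details from the dataset.

```python
import re


def title_case(name: str) -> str:
    """Convert a name such as 'ERREXIL' to title case ('Errexil')."""
    return " ".join(word.capitalize() for word in name.split())


def add_prefix(number: str, prefix: str = "(+34)") -> str:
    """Strip non-digits and prepend the country prefix.

    Assumes the raw number does not already include a country code.
    """
    digits = re.sub(r"\D", "", number)
    return f"{prefix}{digits}" if digits else ""


def split_phones(raw: str) -> tuple:
    """Split a multivalued phone cell into (primary, secondary) columns.

    Numbers are assumed to be separated by '/', ';' or ','.
    """
    parts = [p for p in re.split(r"\s*[/;,]\s*", raw.strip()) if p]
    primary = add_prefix(parts[0]) if parts else ""
    secondary = add_prefix(parts[1]) if len(parts) > 1 else ""
    return primary, secondary
```

For instance, a cell containing `943309230 / 600000000` (the second number is made up) would be split into a primary column and a secondary column, both carrying the `(+34)` prefix.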
18. ENTER 2015 Research Track Slide Number 18
Application to a small DMO
• Triplification
– Ontologies:
• vCard: Contact information
• Dublin Core: Metadata about the resource
– Linked Datasets:
• Geonames: Locations
• DBpedia: Categories and locations
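To make the role of these vocabularies and datasets concrete, the sketch below writes a few N-Triples lines for one POI. The slides do not say which classes and properties were used, so the choice of `vcard:Location`, `vcard:hasTelephone`, `dcterms:spatial` and `dcterms:subject` is an assumption, as are the function names and any URIs passed in.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
VCARD = "http://www.w3.org/2006/vcard/ns#"
DCT = "http://purl.org/dc/terms/"


def tel_uri(phone: str) -> str:
    """Turn a cleaned number such as '(+34)943309230' into a tel: URI.

    Assumes the number already includes its country code.
    """
    return "tel:+" + "".join(ch for ch in phone if ch.isdigit())


def poi_triples(uri: str, phone: str, location_uri: str,
                category_uri: str) -> list:
    """Describe one POI as N-Triples lines.

    vCard carries the contact data; Dublin Core terms carry the links
    to an external location (e.g. GeoNames) and category (e.g. DBpedia).
    """
    return [
        f"<{uri}> <{RDF_TYPE}> <{VCARD}Location> .",
        f"<{uri}> <{VCARD}hasTelephone> <{tel_uri(phone)}> .",
        f"<{uri}> <{DCT}spatial> <{location_uri}> .",
        f"<{uri}> <{DCT}subject> <{category_uri}> .",
    ]
```

The last two triples are what connect the local data to external datasets, which is the linking requirement of five star LOD.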
19. ENTER 2015 Research Track Slide Number 19
Application to a small DMO
• Triplification
20. ENTER 2015 Research Track Slide Number 20
Application to a small DMO
• Publication (OD and LOD)
21. ENTER 2015 Research Track Slide Number 21
Application to a small DMO
• Mobile application:
22. ENTER 2015 Research Track Slide Number 22
Application to a small DMO
• Mobile application:
– Data available through three channels:
• Direct download (CSV, JSON, RDF,…)
• DKAN Datastore API
• Virtuoso SPARQL
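For the third channel, a client could build a SPARQL protocol request URL as sketched below. The endpoint address is supplied by the caller and is not given in the slides; the query assumes that POIs carry language-tagged `rdfs:label` values, and the `format` parameter is a Virtuoso convention rather than part of the SPARQL protocol. No request is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Illustrative query: ten POIs with their Basque labels.
# Assumes labels are published as language-tagged rdfs:label values.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?poi ?label WHERE {
  ?poi rdfs:label ?label .
  FILTER (lang(?label) = "eu")
}
LIMIT 10
"""


def sparql_url(endpoint: str, query: str) -> str:
    """Build a GET URL for a SPARQL endpoint, asking for JSON results."""
    params = {
        "query": query.strip(),
        "format": "application/sparql-results+json",  # Virtuoso-style parameter
    }
    return f"{endpoint}?{urlencode(params)}"
```

The returned URL can be fetched with any HTTP client; the JSON result then lists one binding per matching POI.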
23. ENTER 2015 Research Track Slide Number 23
Conclusions
• We have presented a methodology and
Open Source tools for DMOs to publish five
star tourism LOD
• The pre-processing and triplification steps
are the hardest, but they are done only once
for each type of data to be published. The
same operations can then be re-applied
directly
24. ENTER 2015 Research Track Slide Number 24
Conclusions
• We have shown an example of a mobile
application based on published data:
– Data accessible through different channels:
direct download, DKAN API, SPARQL
• Tourism LOD can provide multiple benefits
for DMOs and society, but more tools, best
practices and standards are required by
DMOs
25. ENTER 2015 Research Track Slide Number 25
Future work
• More publication examples:
– Statistical data, based on the RDF Data Cube
vocabulary
– Data stored in relational databases
• Integrate standards such as the UNE
178301:2015 standard