
Publishing Linked Statistical Data: Aragón, a case study

Presentation at the Semstats2017 workshop (http://semstats.org/2017/) for the paper "Publishing Linked Statistical Data: Aragón, a Case Study", by Oscar Corcho, Idafen Santana-Pérez, Hugo Lafuente, David Portolés, César Cano, Alfredo Peris, José María Subero.

Published in: Government & Nonprofit

Publishing Linked Statistical Data: Aragón, a case study

  1. Publishing Linked Statistical Data: Aragón, a case study
     Oscar Corcho (1), Idafen Santana-Pérez (1), Hugo Lafuente (2), David Portolés (3), César Cano (4), Alfredo Peris (4) and José María Subero (4)
     (1) Ontology Engineering Group, Universidad Politécnica de Madrid; (2) Localidata; (3) Idearium Consultores; (4) Gobierno de Aragón
     ocorcho@fi.upm.es, @ocorcho
     22/10/2017, SemStats 2017 @ ISWC
  2. Publishing Linked Statistical Data: Aragón, a case study. – SemStats 2017
     Context
     ▪ IAEst: Instituto Aragonés de Estadística
        o http://www.aragon.es/iaest
        o The statistical office of Aragón
        o Offers open data through
           • The Open Data portal of Aragón (http://opendata.aragon.es/)
           • Its own portal (our interest is in the “estadística local” (local statistics) database)
  3. Context: Existing IAEst data infrastructure
     ▪ Existing data infrastructure
        o Data warehouse infrastructure based on Oracle BI
        o Exports into different formats, including CSV
  4. Context: Existing IAEst data infrastructure
     ▪ Existing data infrastructure
        o Data warehouse infrastructure based on Oracle BI
        o Exports into different formats, including CSV
        o http://www.aragon.es/DepartamentosOrganismosPublicos/Institutos/InstitutoAragonesEstadistica/AreasGenericas/ci.EstadisticaLocal.detalleDepartamento
     ▪ Data retrieval and browsing
        o Taxonomy-based
        o Fixed filters coded in the app
        o The user
           • Selects an administrative division
           • Selects the specific municipality
           • Browses the folder structure
        o Data retrieved in HTML, PDF or CSV
  5. Context: Existing IAEst web app
     ▪ Predesigned reports offered from Oracle BI
     ▪ Web app for Estadística Local
  6. Context: Existing IAEst data sharing
     ▪ On the IAEst website
        o http://www.aragon.es/DepartamentosOrganismosPublicos/Institutos/InstitutoAragonesEstadistica/AreasGenericas/ci.EstadisticaLocal.detalleDepartamento
     ▪ On Open Data Aragón
        o http://opendata.aragon.es/catalogo/edificios-superficie-y-vivienda-comarcas
  7. Goals
     Extract those statistical reports, transform them into RDF according to W3C standards, curate them, link them to the existing Linked Data from Aragón (mostly URIs of municipalities and regions), and provide an API and a new user interface that make use of them.
  8. Results
     ▪ An easier-to-maintain data transformation process
        o Enriching existing Linked Data APIs from Aragón
        o Using GitHub for
           • Version control and archival
           • Continuous updates: detecting new data and data structures on a daily basis
           • https://github.com/aragonopendata/local-data-aragopedia/
     ▪ Developer-friendly API
     ▪ Additional user interface
        o Improving data retrieval and browsing capabilities
     ▪ Side effect: data curation
        o Many errors and possible improvements were detected in the pre-existing CSV exports and corrected during the process
  9. Transformation and publication process
     ▪ Initial characterisation
        o Identify sources
        o Identify dimensions and measurements
     ▪ Transformation
        o Daily data download
        o Processing (UTF-8)
        o Upload into GitHub
        o New dimensions/measures annotation
        o RDF transformation
     ▪ Publication and use
        o Linked Data APIs
     https://github.com/aragonopendata/local-data-aragopedia/
  10. Initial characterisation
      ▪ Identify and download the data sources to be published (~1000)
         o https://github.com/aragonopendata/local-data-aragopedia/tree/master/data/resource/DatosDescarga-UTF8
      ▪ Pre-process data (UTF-8 encoding, download error verification and retries)
      ▪ Identify potential dimensions and measurements
         o Analysis of column header names (e.g., municipio, comarca) and of data content (how many different values)
            • https://github.com/aragonopendata/local-data-aragopedia/blob/master/data/resource/heads.txt
         o Reduced from 700+ dimensions to ~500
            • Curated by IAEst experts (e.g., Male, M, Males, Men; Female, F, Females, Women)
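The header-and-content analysis described on this slide could be sketched roughly as follows. This is a minimal illustration, not the project's code: the seed set `KNOWN_DIMENSION_HEADERS`, the `max_distinct` threshold and the function names are all assumptions made for the example.

```python
import csv
import io

# Hypothetical seed list; the curated header list (heads.txt) is not reproduced here.
KNOWN_DIMENSION_HEADERS = {"municipio", "comarca"}


def is_number(text):
    """Accept both '1234.5' and the Spanish decimal comma '1234,5'."""
    try:
        float(text.replace(",", "."))
        return True
    except ValueError:
        return False


def classify_columns(csv_text, max_distinct=20):
    """Guess a role for each column: dimension, measure, empty or unknown."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    roles = {}
    for header in reader.fieldnames or []:
        values = [(row.get(header) or "").strip() for row in rows]
        values = [v for v in values if v]
        if not values:
            roles[header] = "empty"        # columns with no values need review
        elif header.strip().lower() in KNOWN_DIMENSION_HEADERS:
            roles[header] = "dimension"    # recognised by header name
        elif all(is_number(v) for v in values):
            roles[header] = "measure"      # purely numeric content
        elif len(set(values)) <= max_distinct:
            roles[header] = "dimension"    # few distinct values: likely a code list
        else:
            roles[header] = "unknown"      # left for an expert to decide
    return roles
```

A heuristic like this only proposes candidates; the slide makes clear that the final list was curated by IAEst experts.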
  11. Initial characterisation
      ▪ SKOS concept schemes for each dimension
         o https://github.com/aragonopendata/local-data-aragopedia/tree/master/data/dump/DatosTTL/codelists
         o Mapping files available in GitHub (e.g., https://github.com/aragonopendata/local-data-aragopedia/blob/master/data/metadata/mapping-tipo-edificio-detalle.xlsx)
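To illustrate how a mapping from raw CSV values to curated labels might become a SKOS concept scheme, here is a small sketch that emits Turtle as plain text. The `BASE` namespace, the function names and the in-memory `mapping` dict are placeholders for the example; the project's real URI patterns and its xlsx mapping format are not shown here.

```python
import re
import unicodedata

BASE = "http://example.org/kos/"  # placeholder namespace, not the project's URIs


def slugify(label):
    """Lower-case, strip accents and replace non-alphanumeric runs with '-'."""
    text = unicodedata.normalize("NFKD", label)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def lit(text):
    """Quote a string as a Turtle literal (escapes backslash and double quote)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def concept_scheme_turtle(dimension, mapping):
    """mapping: raw value found in the CSVs -> curated canonical label."""
    scheme = f"<{BASE}{slugify(dimension)}>"
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"{scheme} a skos:ConceptScheme ;",
        f"    skos:prefLabel {lit(dimension)} .",
    ]
    by_label = {}
    for raw, canonical in mapping.items():
        by_label.setdefault(canonical, []).append(raw)
    for canonical in sorted(by_label):
        concept = f"<{BASE}{slugify(dimension)}/{slugify(canonical)}>"
        body = [f"    skos:inScheme {scheme}", f"    skos:prefLabel {lit(canonical)}"]
        alts = sorted(raw for raw in by_label[canonical] if raw != canonical)
        if alts:
            # Variant spellings are kept as alternative labels of one concept.
            body.append("    skos:altLabel " + ", ".join(lit(a) for a in alts))
        lines.append("")
        lines.append(f"{concept} a skos:Concept ;")
        lines.append(" ;\n".join(body) + " .")
    return "\n".join(lines)
```

Keeping the variants as `skos:altLabel` values is one design option: it records which raw spellings were merged into each concept.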
  12. Initial characterisation
      ▪ Measurement properties
         o https://github.com/aragonopendata/local-data-aragopedia/blob/master/data/dump/DatosTTL/codelists/properties.ttl
      ▪ Data Structure Definitions (DSDs)
         o https://github.com/aragonopendata/local-data-aragopedia/tree/master/data/dump/DatosTTL/dataStructures
      ▪ Errors were identified during this phase
         o Same concept, different names (e.g., sexo and género)
         o Typos in header names
         o Columns with no values
         o Data belonging to the wrong municipalities and districts
         o https://github.com/aragonopendata/local-data-aragopedia/blob/master/data/dump/errorReport.txt
  13. Continuous Transformation
      ▪ Continuous production cycle
         o Update RDF as reports are generated, modified or removed
      ▪ Executed every night
         o Retrieves all the reports from the previously generated list
         o Checks whether the reports have already been transformed or whether they contain new data
         o Hash signatures for each generated Data Cube
            • https://github.com/aragonopendata/local-data-aragopedia/blob/master/data/resource/hashcode.csv
            • Used to compare data versions
            • If hashes do not match, the Data Cube is marked to be regenerated
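The hash comparison on this slide can be illustrated with a short sketch. The choice of SHA-256, the dict-based inputs and the function names are assumptions for the example; the slide does not say which hash function the project uses or how hashcode.csv is laid out.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hex digest of a downloaded report (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def cubes_to_regenerate(stored_hashes, downloaded):
    """Return ids of reports that are new or whose content hash changed.

    stored_hashes: report id -> digest recorded on the previous run
    downloaded:    report id -> bytes fetched on this run
    """
    changed = []
    for report_id, data in sorted(downloaded.items()):
        if stored_hashes.get(report_id) != content_hash(data):
            changed.append(report_id)
    return changed


def removed_reports(stored_hashes, downloaded):
    """Return ids that were known before but are absent from this run."""
    return sorted(set(stored_hashes) - set(downloaded))
```

Comparing digests instead of full files keeps the stored state small, since only one short string per report has to be kept between nightly runs.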
  14. Continuous Transformation
      ▪ Each iteration generates a GitHub issue listing the cubes that must be created, modified, etc.
         o https://github.com/aragonopendata/local-data-aragopedia/issues
            • https://github.com/aragonopendata/local-data-aragopedia/issues/93 (new data)
            • https://github.com/aragonopendata/local-data-aragopedia/issues/457 (datacube to delete, new configurations needed)
         o When user interaction is needed, this is stated in the issue text, and the person responsible at IAEst needs to update it
      ▪ RDF transformation is done according to the configuration file
         o https://github.com/aragonopendata/local-data-aragopedia/blob/master/data/metadata/Informe-01-010001-A-TC-TM-TP.xlsx
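As a rough idea of what a configuration-driven transformation of one CSV row into an RDF Data Cube observation might look like, consider the sketch below. The `config` dict is a hypothetical stand-in for the xlsx configuration file, whose actual layout is not described on the slide. The example.org URIs used with it are placeholders; only the `qb:` and `xsd:` namespaces are standard.

```python
from decimal import Decimal

QB = "http://purl.org/linked-data/cube#"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def observation_turtle(dataset_uri, observation_uri, row, config):
    """Serialise one CSV row as a qb:Observation in Turtle.

    config (hypothetical layout): column name ->
        {"role": "dimension", "property": URI, "codes": {raw value: concept URI}}
        or {"role": "measure", "property": URI}
    """
    lines = [
        f"<{observation_uri}> a <{QB}Observation>",
        f"    <{QB}dataSet> <{dataset_uri}>",
    ]
    for column, spec in config.items():
        raw = row[column].strip()
        if spec["role"] == "dimension":
            # Dimension values point at concepts from the code list.
            obj = f"<{spec['codes'][raw]}>"
        else:
            # Measures become typed literals; a decimal comma is normalised.
            number = Decimal(raw.replace(",", "."))
            obj = f'"{number:f}"^^<{XSD_DECIMAL}>'
        lines.append(f"    <{spec['property']}> {obj}")
    return " ;\n".join(lines) + " ."
```

A raw value missing from `codes` raises `KeyError`, which is the kind of case the slide says needs user interaction through a GitHub issue.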
  15. Continuous Transformation
      ▪ RDF data is stored in GitHub (new version)
         o https://github.com/aragonopendata/local-data-aragopedia/tree/master/data/dump/DatosTTL/informes
      ▪ RDF data is stored in the Open Data Aragón SPARQL endpoint
         o http://opendata.aragon.es/sparql
         o Reusing the 3cixty KB deployment utilities
         o Each cube is stored in its own graph
         o Graphs updated for Data Structure Definition (DSD), properties and SKOS information
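Because each cube lives in its own named graph, a client could count observations per graph with a standard SPARQL Protocol GET request. The sketch below uses the endpoint URL from the slide. The query, the function names and the JSON results media type are choices made for the example, and it is not verified here that the endpoint is still available or returns this format.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://opendata.aragon.es/sparql"  # URL taken from the slide

QUERY = """PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT ?graph (COUNT(?obs) AS ?observations)
WHERE { GRAPH ?graph { ?obs a qb:Observation } }
GROUP BY ?graph
ORDER BY ?graph
LIMIT 10"""


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint, query, timeout=30):
    """Send the request and return the list of result bindings."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query(ENDPOINT, QUERY)` performs a network request, so it is kept separate from `build_request`, which can be inspected offline.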
  16. Data transformation. In summary… (flowchart)
      ▪ Components: bi.aragon.es, Google Drive, GitHub, SPARQL
      ▪ Dataset and configuration download
      ▪ For each dataset
         o New dataset? Yes: generate new configuration and create an issue
         o New structure? Yes: create issue
         o New data? Yes: regenerate data and create issue
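One possible reading of the decision flow in this flowchart is sketched below. The order of the checks, the `Snapshot` type and the action names are assumptions; the extracted slide text does not fully preserve the diagram's arrows.

```python
from typing import NamedTuple, Optional, Tuple


class Snapshot(NamedTuple):
    """Hypothetical summary of a dataset as seen on one nightly run."""
    columns: Tuple[str, ...]  # column headers, used to detect structure changes
    digest: str               # content hash, used to detect data changes


def decide(previous: Optional[Snapshot], current: Snapshot) -> str:
    """Pick the action for one dataset, checking the most disruptive case first."""
    if previous is None:
        return "generate-configuration-and-create-issue"  # new dataset
    if previous.columns != current.columns:
        return "create-issue"                             # new structure
    if previous.digest != current.digest:
        return "regenerate-data-and-create-issue"         # new data
    return "nothing-to-do"
```

Checking structure before data matters here, because a change in columns also changes the hash, and regenerating data with a stale configuration would be wrong.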
  17. Data publication and use
      ▪ Data can be accessed through
         o API (using ELDA)
            • http://opendata.aragon.es/herramientas/apis?#aragodbpedia
         o GitHub (CSVs, RDF)
         o SPARQL endpoint
      ▪ Diagram: SPARQL, Elda, Linked Data
  18. Data API
      http://opendata.aragon.es/herramientas/apis?#aragodbpedia
  19. Data publication and use
      ▪ Aragopedia
         o http://opendata.aragon.es/apps/aragopedia/datos
         o Where, when and what (dónde, cuándo y qué)
         o Data can be downloaded in
            • CSV
            • JSON
  20. Aragopedia
      ▪ Aragopedia
         o JSON result of a query for
            • The Maestrazgo region (where)
            • Population (what)
            • In 1999 (when)
  21. Conclusions (Results)
      ▪ An easier-to-maintain data transformation process
         o Enriching existing Linked Data APIs from Aragón
         o Using GitHub for
            • Version control and archival
            • Continuous updates: detecting new data and data structures on a daily basis
            • https://github.com/aragonopendata/local-data-aragopedia/
      ▪ Developer-friendly API
      ▪ Additional user interface
         o Improving data retrieval and browsing capabilities
      ▪ Side effect: data curation
         o Many errors and possible improvements were detected in the pre-existing CSV exports and corrected during the process
  22. Publishing Linked Statistical Data: Aragón, a case study
      Oscar Corcho (1), Idafen Santana-Pérez (1), Hugo Lafuente (2), David Portolés (3), César Cano (4), Alfredo Peris (4) and José María Subero (4)
      (1) Ontology Engineering Group, Universidad Politécnica de Madrid; (2) Localidata; (3) Idearium Consultores; (4) Gobierno de Aragón
      ocorcho@fi.upm.es, @ocorcho
      22/10/2017, SemStats 2017 @ ISWC
