Semantically-Enabled
Environmental Data Discovery and Integration:
Demonstration Using the Icelandic Volcano Use Case

Tatiana Tarasova 1, Massimo Argenti 2, Maarten Marx 1

1 ISLA, University of Amsterdam
2 European Space Agency

October 7, 2013
System description papers, KESW 2013

Environmental Research
Example Case Study

Icelandic Volcano Eruption [6]
What was the impact of the 2010 eruption of the Icelandic volcano
Eyjafjallajökull on the environment?
What data can contribute to the research?

[Figure: research infrastructures that may hold relevant data, interspersed with question marks: SIOS, AURORA BOREALIS, EMSO, EUFAR-COPAL, LIFEWATCH, EISCAT-3D, EPOS, EURO-ARGO, IAGOS-ERI, ICOS ATC]
Technological and Structural Data Heterogeneity

EURO-ARGO: ocean temperature; CSV; FTP catalogues
ICOS ATC: atmospheric measurements; NetCDF; authorized IP access
Semantic Data Heterogeneity

EURO-ARGO: ocean temperature; CSV; FTP catalogues
ICOS ATC: atmospheric measurements; NetCDF; authorized IP access
Source-specific terms: hourly, ppm, measurements, level 2, flask; platform, good quality, float trajectories
Outline

1 Motivation
2 Environmental Data Discovery
3 Environmental Data Integration
Linked Data Approach to Environmental Data Integration
Demonstration Using the Icelandic Volcano Case Study
Environmental Data Discovery
Approach
discover data through a single harmonized metadata catalogue
enable semantic data discovery through semantic tagging of datasets

Implementation
ENVRI portal http://portal.envri.eu
OpenSearch-based [11] catalogue
1350 data series, 288,971 triples stored in SESAME [12]
geospatial metadata model that extends the INSPIRE guidelines [7]
with richer semantics
http://portal.genesi-dec.eu/news/?id=117
semantic tagging against a set of Earth Science vocabularies
(GCMD [8], SBA [9], GEMET [10])
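
A catalogue that follows OpenSearch [11] advertises URL templates whose placeholders a client fills in. The sketch below illustrates that one step only. The template URL and the bounding box around Iceland are assumptions made for the example, not the ENVRI portal's actual interface; the `geo:box` and `time:start`/`time:end` placeholder names follow the OpenSearch Geo and Time extensions.

```python
import re
from urllib.parse import quote

# Hypothetical template; a real catalogue publishes its own in its
# OpenSearch description document.
TEMPLATE = (
    "http://catalogue.example.org/search?q={searchTerms}"
    "&bbox={geo:box?}&start={time:start?}&end={time:end?}"
)


def fill_template(template: str, params: dict) -> str:
    """Substitute OpenSearch placeholders; {name?} marks an optional one."""

    def substitute(match: re.Match) -> str:
        name, optional = match.group(1), match.group(2) == "?"
        if name in params:
            return quote(str(params[name]), safe="")
        if optional:
            return ""  # optional parameters may be left empty
        raise KeyError(f"missing required parameter: {name}")

    return re.sub(r"\{([^{}?]+)(\??)\}", substitute, template)


url = fill_template(
    TEMPLATE,
    {
        "searchTerms": "CO2 concentration",
        "geo:box": "-25,63,-13,67",  # west,south,east,north (assumed box)
        "time:start": "2010-03-20",
        "time:end": "2010-06-23",
    },
)
print(url)
```

Keeping the substitution generic means the same helper would work with whatever template a given catalogue exposes.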
Linked Environmental Data
Linked Data [13]
→ publish data not documents!

Environmental Data
→ datasets with observations

Linked Environmental Data
→ publish observations not datasets!
→ fine-grained representation of environmental data opens new
opportunities to query and integrate environmental data at the level
of single observations
Atmospheric Measurements (ICOS) [14]

Dataset “CO2 concentration measured by Mace Head”

Dimensions:
Time
Geospatial location
Unit of Measure
Observed Phenomenon …

Observation
“CO2 concentration in the air
Measured by Mace Head
on 2010-01-05
was 392.011”
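
The step from a dataset to individual observations can be sketched as follows. This is a minimal illustration, not the pipeline used in the demonstration: the CSV layout, the field names, the unit (ppm), and the second data row are assumptions; only the first row mirrors the example above.

```python
import csv
import io
from dataclasses import dataclass

# Illustrative dataset. Only the first data row comes from the slide;
# the second row is invented for the example.
DATASET_CSV = """date,co2
2010-01-05,392.011
2010-01-06,391.500
"""


@dataclass(frozen=True)
class Observation:
    station: str
    observed_phenomenon: str
    unit: str
    date: str
    value: float


def split_into_observations(text, station, phenomenon, unit):
    """Turn each row of a dataset into a self-describing observation."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        Observation(station, phenomenon, unit, row["date"], float(row["co2"]))
        for row in reader
    ]


observations = split_into_observations(
    DATASET_CSV, "Mace Head", "CO2 concentration in the air", "ppm"
)
for obs in observations:
    print(obs)
```

Each record now carries its own dimensions, so it can be queried or combined without reference to the file it came from.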
Ocean Monitoring (Euro-Argo) [15]
[Figure: dataset, its dimensions, and its observations]
Ash Cloud Dynamics (ACTRIS) [16]
[Figure: dataset, its dimensions, and its observations]
Related Work
RDF Data Cube [17] based approaches
Linked Environmental Data [1]
The ACORN-SAT Linked Climate Dataset [2]
A Linked Data Framework for Publishing the UK Environmental Data [4]

Data Cube: core model
Observation --has dataset--> Dataset
Dataset --has structure--> Dataset Structure
Dataset Structure --has dimension--> Dimension
But what about semantic data interoperability?
Question
Can we find generic concepts to capture domain semantics of
environmental data?

Data Cube Extension
Observation --has dataset--> Dataset
Dataset --has structure--> Dataset Structure
...
Observation --has location--> Location
Observation --has time--> Time
Observation --has Feature of Interest--> FeatureOfInterest
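ENVRI vocabulary [18] (based on OGC O&M [19])
Observation --has Feature of Interest--> Feature of Interest
Observation --has Time--> Time
Observation --has Location--> Location
Observation --has Observed Property--> Observed Property
Observation --has procedure--> Procedure
Observation --has Result--> Result
Feature of Interest --has Property--> Observed Property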
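
One way to read the vocabulary is as a common shape into which source-specific records are mapped. The sketch below assumes that reading; the field names of both source records, the float identifier, the Euro-Argo values, and the approximate Mace Head coordinates are invented for illustration and are not taken from the ENVRI vocabulary or the demonstration data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Generic concepts mirroring the diagram above."""
    feature_of_interest: str
    observed_property: str
    procedure: str
    time: str
    location: tuple  # (latitude, longitude)
    result: float


# Source records with source-specific field names (all hypothetical).
icos_record = {"station": "Mace Head", "species": "CO2",
               "date": "2010-01-05", "lat": 53.33, "lon": -9.90,
               "value": 392.011}
argo_record = {"platform": "float-0000", "param": "TEMP",
               "juld": "2010-04-15", "latitude": 62.9,
               "longitude": -18.7, "temp": 7.4}


def from_icos(record):
    return Observation("air", record["species"] + " concentration",
                       record["station"], record["date"],
                       (record["lat"], record["lon"]), record["value"])


def from_argo(record):
    return Observation("ocean", "temperature", record["platform"],
                       record["juld"],
                       (record["latitude"], record["longitude"]),
                       record["temp"])


harmonized = [from_icos(icos_record), from_argo(argo_record)]
# One query now works across both sources.
properties = sorted({obs.observed_property for obs in harmonized})
print(properties)
```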
Demonstration

Data
ICOS: CO2 concentration; Euro-Argo: ocean temperature
35 collections, 10,520 observations, 136,556 RDF triples

Implementation
RDF generation - RDF Data Cube plug-in for Google Refine [5]
storage - Virtuoso RDF store
access - http://data.politicalmashup.nl/sparql/
Queries

1: query for individual and subsets of observations
Retrieve all the observations for the days of the Volcano eruption
(from 20 March to 23 June, 2010).
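
A date-range query of this kind might be phrased in SPARQL along the following lines. The sketch only builds the query text and does not send it anywhere; `qb:Observation` comes from the Data Cube vocabulary [17], while the `ex:` namespace and the `ex:hasTime` and `ex:hasResult` property names are placeholders, since the actual ENVRI vocabulary terms are not given here.

```python
from datetime import date

# The ex: terms are hypothetical stand-ins for the real vocabulary.
QUERY_TEMPLATE = """PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ex: <http://example.org/envri-demo/>

SELECT ?obs ?time ?result WHERE {{
  ?obs a qb:Observation ;
       ex:hasTime ?time ;
       ex:hasResult ?result .
  FILTER (?time >= "{start}"^^xsd:date && ?time <= "{end}"^^xsd:date)
}}
ORDER BY ?time"""


def observations_between(start: date, end: date) -> str:
    """Build a SPARQL query selecting observations in [start, end]."""
    if start > end:
        raise ValueError("start must not be after end")
    return QUERY_TEMPLATE.format(start=start.isoformat(),
                                 end=end.isoformat())


query = observations_between(date(2010, 3, 20), date(2010, 6, 23))
print(query)
```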
2: exploit the semantics of the terms of the ENVRI vocabulary
What phenomena were measured in 2010 in the area next to the
Volcano?
What instruments were used to make measurements in 2010 in the
area next to the Volcano?
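
The logic behind both questions can be sketched in memory: filter observations by year and by proximity to the volcano, then collect the distinct observed properties or procedures. All records below are invented, and "next to the Volcano" is interpreted, as an assumption, as a box of two degrees around the approximate position of Eyjafjallajökull.

```python
from datetime import date

VOLCANO = (63.63, -19.62)  # approximate latitude, longitude
HALF_WIDTH_DEG = 2.0       # assumed meaning of "next to"

# Invented records using the generic concepts of the vocabulary.
records = [
    {"property": "CO2 concentration", "procedure": "station A",
     "time": date(2010, 4, 15), "lat": 63.4, "lon": -20.3},
    {"property": "ocean temperature", "procedure": "float B",
     "time": date(2010, 5, 2), "lat": 62.9, "lon": -18.7},
    {"property": "ocean temperature", "procedure": "float C",
     "time": date(2009, 5, 2), "lat": 63.0, "lon": -19.0},   # other year
    {"property": "CO2 concentration", "procedure": "station D",
     "time": date(2010, 4, 15), "lat": 53.33, "lon": -9.90},  # too far
]


def is_near(record, centre=VOLCANO, half_width=HALF_WIDTH_DEG):
    """True if the record lies inside the box around the centre."""
    return (abs(record["lat"] - centre[0]) <= half_width
            and abs(record["lon"] - centre[1]) <= half_width)


def distinct_values(field, year):
    """Distinct values of a field for nearby observations in a year."""
    return sorted({r[field] for r in records
                   if r["time"].year == year and is_near(r)})


phenomena = distinct_values("property", 2010)
instruments = distinct_values("procedure", 2010)
print(phenomena)
print(instruments)
```

Because both sources share the same property and procedure concepts, one filter answers the question across them.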
Conclusion

→ data discovery through a harmonized metadata catalogue based on
the geospatial metadata model
→ fine-grained representation of environmental data enables queries that
retrieve and integrate data at the level of single observations instead of
pre-defined collections
→ ENVRI vocabulary enables semantically rich queries

Future Work
→ Alignment between data models for data discovery and data
harmonization
→ Systematic study of the proposed modelling solution
Questions?

Thank you!

→ ENVRI portal http://portal.envri.eu
→ more about Linked Environmental Data
http://staff.science.uva.nl/~ttaraso1/html/envri.html
References I
[1] Rüther, M., Fock, J., and Hubener, J.: Linked Environmental Data. 24th
International Conference on Informatics for Environmental Protection (2010)
[2] Rüther, M., Fock, J., and Hubener, J.: The ACORN-SAT Linked Climate Dataset.
Semantic Web Journal (2013)
http://www.semantic-web-journal.net/system/files/swj457.pdf
[3] The ENVRI vocabulary
http://data.politicalmashup.nl/RDF/vocabularies/envri
[4] Shaon, A., Woolf, A., Boczek, R., Rogers, W., and Jackson, M.: An Open Source
Linked Data Framework for Publishing Environmental Data under the UK Location
Strategy. Proceedings of the Terra Cognita Workshop on Foundations,
Technologies and Applications of the Geospatial Web (2011)
http://ceur-ws.org/Vol-798/paper6.pdf
[5] The Data Cube plug-in for Google Refine http://refine.deri.ie/qbExport
[6] 2010 eruptions of Eyjafjallajökull on Wikipedia
http://en.wikipedia.org/wiki/2010_eruptions_of_Eyjafjallaj%C3%B6kull
References II
[7] State of progress in the development of guidelines to express elements of the
Infrastructure for Spatial Information in the European Community (INSPIRE)
metadata implementing rules using ISO 15836 (Dublin Core). European
Commission (2008)
http://inspire.jrc.ec.europa.eu/reports/ImplementingRules/metadata/MD_IR_and_DC_state%20of%20progress.pdf
[8] The Global Change Master Directory (GCMD) http://gcmd.nasa.gov/
[9] The Societal Benefit Area vocabularies (SBA)
http://www.earthobservations.org/
[10] The GEneral Multilingual Environmental Thesaurus (GEMET)
http://www.eionet.europa.eu/gemet/
[11] The OpenSearch standard protocol http://www.opensearch.org/
[12] Sesame http://www.openrdf.org/
[13] Berners-Lee, T.: Linked data - design issues, 2006.
http://www.w3.org/DesignIssues/LinkedData.html
References III

[14] The Integrated Carbon Observation System (ICOS), Atmospheric Measurements
System https://icos-atc-demo.lsce.ipsl.fr/
[15] Euro-Argo http://www.argodatamgt.org/
[16] The Aerosols, Clouds, and Trace Gases Research Infrastructure Network (ACTRIS)
www.actris.net
[17] The Data Cube vocabulary
http://www.w3.org/TR/2013/WD-vocab-data-cube-20130312/
[18] The ENVRI vocabulary
http://data.politicalmashup.nl/RDF/vocabularies/envri
[19] Geographic Information: Observations and Measurements. OGC Abstract
Specification http://www.opengeospatial.org/standards/om
