netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Towards linked data conventions for
delivery of environmental data using
netCDF
LAND AND WATER FLAGSHIP | OCEANS AND ATMOSPHERE FLAGSHIP
Jonathan Yu, Nicholas Car, Adam Leadbetter*, Bruce A. Simons, and Simon J.D. Cox
CSIRO LWF and BODC/ Marine Institute (Galway)
25 March 2014 | ISESS
… netCDF-LD

Outline
1. Big data and netCDF
2. Metadata challenges
3. Linked Data, JSON-LD & CSV-on-the-web
4. netCDF-LD
5. Applications
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu2 |

Big data
• “90% of the world’s data has been produced over the last two
years”
• Environmental and earth observation has always been big-data…

National Soil and Landscape Grid

Hydrodynamic models

Real-time wind data

netCDF
• Persistence layer
• Simple: self-describing, user
accessible “flat file”
• Java, C++, python, others
• Interfaces to the Data Model
File format
Software library
API

netCDF cont’
• Really good at handling array-oriented data – efficient subsetting
• Multi-dimensional grids
• CF conventions
climatology and
weather forecasting
domains

netCDF metadata example
float Nap_MIM(time, latitude, longitude) ;
Nap_MIM:_FillValue = -999.f ;
Nap_MIM:long_name = "TSS, MIM SVDC on Rrs" ;
Nap_MIM:units = "mg/L" ;
Nap_MIM:valid_min = 0.01209607f ;
Nap_MIM:valid_max = 226.9626f ;
Nap_MIM:standard_name =
”total_suspended_solids”;

Use of netCDF for Chlorophyll observations

netCDF challenges
• Encoding format and framework
• “Domain agnostic” but conventions exist for communities – e.g.
Climate and Forecasting (CF conventions)
• CF conventions limited
• Use of ‘standard_name’ attribute to denote the combination of
medium/transformation/substance/state/quantity using a particular syntax
• Assumes English
• Standard names take a while to be adopted
• Standard names variations of each other
• Units are from UDUNITS – Unidata managed
• Narrow scope of scientific domain – can’t reuse for other domains

netCDF Metadata challenges – Semantic heterogeneity
Enviro
Application
#1
Data
DB
Chl_MIM
Enviro
Application
#2
Data
DB
mass_conc_
chlorophyll_
In_sea_water
Enviro
Application
#3
Data
DB
mass_conc_
chlorophyll_a_
In_sea_water

Semantic Heterogeneity… leads to data silos
Enviro
Application
#1
Enviro
Application
#2
Enviro
Application
#3
Data Data Data
DB DBDB
Chl_MIM
mass_conc_
chlorophyll_
In_sea_water
mass_conc_
chlorophyll_a_
In_sea_water
X X
Meetings

Approaches to address netCDF metadata challenge
1. Strict naming grammars for standard_name
(Peckham 2014) CSDMS Standard
Naming Conventions

Approaches to address netCDF metadata challenge (2)
2. Break up various concerns in standard_name into multiple
attributes – ref (Yu et al. 2014)
float Nap_MIM(time, latitude, longitude) ;
Nap_MIM:_FillValue = -999.f ;
Nap_MIM:long_name = "TSS, MIM SVDC on Rrs" ;
Nap_MIM:units = "mg/L" ;
Nap_MIM:valid_min = 0.01209607f ;
Nap_MIM:valid_max = 226.9626f ;
Nap_MIM:scaledQuantityKind_id
= "http://environment.data.gov.au/water/quality/def/property/solids-
total_suspended" ;
Nap_MIM:unit_id =
"http://environment.data.gov.au/water/quality/def/unit/MilliGramsPerLitre" ;
Nap_MIM:substanceOrTaxon_id =
"http://environment.data.gov.au/water/quality/def/object/solids";
Nap_MIM:medium_id =
"http://environment.data.gov.au/water/quality/def/object/ocean"
Nap_MIM:procedure_id = "http://data.ereefs.org.au/ocean-colour/MIM_SVDC_RRS" ;

Harmonised Publish, Discovery, Access and Use
Relies on community
agreed vocabularies
Describe/Publish
the data
Query/Use data
Enviro
Application
#1
Enviro
Application
#2
Enviro
Application
#3
Data Data Data
DB DBDB
substanceOrTaxon=
http://environment.data.gov.au/def/object/chlorophyll
scaledQuantityKind =
http://environment.data.gov.au/def/property/chlorophyll_
concentration
Need to communicate more consistently
Requires shared, precise, agreed semantics

Linked Data, JSON-LD & CSV-on-the-web
"LOD Cloud Diagram as of September 2011" by Anja Jentzsch

Linked Data
• Method of connect related data, existing vocabularies and other
semantics using web links (URIs) and a language for describing
resources (RDF)
• Applications can then
query data, follow the links
and infer new insights
from the data

JSON-LD
{
"name": "John Lennon",
"born": "1940-10-09",
"spouse":
"http://dbpedia.org/resource/Cynthia_Lennon"
}
JSON-LD
Decorators

JSON-LD and Semantic Web
http://dbpedia.org/re
source/John_Lennon
http://dbpedia.org/res
ource/Cynthia_Lennon
“John Lennon”
1940-10-09

CSV-on-the-web
• Linked data for CSV
tabular data
• Add context to
tables

Example: Domain vocab term

Example: Domain vocab term
...
wqp:chlorophyll_a_concentration
a skos:Concept, op:ScaledQuantityKind,
qudt:ChemistryQuantityKind ;
skos:broader wqp:chlorophyll_concentration ;
skos:prefLabel "chlorophyll a concentration"@en ;

netCDF-LD
Take the already successful and popular encoding
format and…
‘linkify’ netCDF!
25 |

netCDF-LD: Linkifying netCDF
‘Context’ boilerplates

netCDF-LD: Linkifying netCDF
Air Temperature
Definition
Air
Temperature
Medium
Quantity
Kind
Kelvin
UoM
eReefs Vocabularies

netCDF-LD: Global Attributes

Assigning URIs to variable level attributes
z:units = "meters";
z:units_ref = "http://qudt.org/vocab/unit#Meter";
z:a =
"http://environment.data.gov.au/def/op#quantityKind";
z:dcPartOf = "http://foo.bar/linked_netCDF_example";
z:valid_range = 0., 5000.;

netCDF-LD to RDF
@prefix unit: <http://qudt.org/vocab/unit#> .
@prefix qudt: <http://qudt.org/1.1/schema/qudt#> .
@prefix op: <http://environment.data.gov.au/def/op#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
_:z qudt:unit unit:Meter;
a op:ScaledQuantityKind;
dcterms:isPartOf <http://foo.bar/linked_netCDF_example>.

Use of netCDF-LD to support Data Discovery

Data annotated with bindings to vocab URIs
THREDDS
THREDDS
Catalog
Domain Vocabs
(Water Quality at
environment.data.gov.au)
Quantities/ Units ontology
(QUDT)
substanceOrTaxon=
http://environment.data.gov.au
/def/object/chlorophyll
scaledQuantityKind
= http://environment.data.gov.au
/def/property/chlorophyll_concentra
tion
unit
= http://qudt.org/vocab/unit#Unitless
medium
/def/feature/ocean

DBL Harvesting and End Use
Data Brokering
Layer
THREDDS
THREDDS
Catalog
Domain Vocabs
(Water Quality at
environment.data.gov.au)
Quantities/ Units ontology
(QUDT)
substanceOrTaxon=
http://environment.data.gov.au
/def/object/chlorophyll
scaledQuantityKind
/def/property/chlorophyll_concentra
tion
unit
= http://qudt.org/vocab/unit#Unitless
medium
/def/feature/ocean
End users
Client
application
chlorophyll

netCDF-LD benefits
• Enhanced metadata – better self-description for any domain
• Allows binding to rich semantics from existing vocabularies being
published via SKOS/OWL/RDF in multiple domains
• Linked Data approach – consistent with Semantic Web
technologies, RDF, JSON-LD, CSV-on-the-web
• Hook into toolkits and software libraries for data discovery via
Semantic Web technologies and other Linked data already being
published online

Future/Current Work
1. netCDF-LD requires adoption at community level – need to test
the proposal at OGC, Unidata, CF communities
2. Tooling for parsing netCDF-LD
3. Encoding large volumes using netCDF-LD
4. Demonstrate Data Broker concept using netCDF-LD

Summary
• netCDF is really good at big data – esp. array-oriented data
• Limitations with the current netCDF metadata structure
• *-LD pattern is emerging across multiple formats (JSON, CSV)
• Propose netCDF-LD which is an unintrusive extension on netCDF to
facilitate improved self-described datasets
• netCDF-LD has potential to enhance data discovery from
environmental dataset across to other Linked Data being
published online

LAND AND WATER
Thank you
Land and Water
Jonathan Yu
Research Software
Engineer
t +61 3 9252 6440
e jonathan.yu@csiro.au

netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF

Similar to netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF (20)

More from Jonathan Yu

More from Jonathan Yu (6)

Recently uploaded

Recently uploaded (20)

netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF