This document describes linking the ACORN-SAT climate dataset as linked open data using the Semantic Sensor Network (SSN) ontology and RDF Data Cube vocabulary. It presents the ACORN-SAT dataset, motivation for publishing it as linked data, and details how SSN is used to describe sensor deployments and observations while RDF Data Cube structures the data into observations, dimensions and measures. Lessons learned include opportunities to improve vocabularies and link additional climate and environmental datasets.
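To make the RDF Data Cube structure concrete, here is a minimal sketch of one daily maximum-temperature reading modelled as a qb:Observation. The URIs, station identifier, and measure property are invented for illustration; the actual ACORN-SAT Linked Data uses its own URI scheme and a structure definition combined with SSN terms.

```python
# Minimal sketch of an RDF Data Cube observation (hypothetical URIs).
# Triples are plain (subject, predicate, object) tuples.

QB = "http://purl.org/linked-data/cube#"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"

def make_observation(obs_uri, dataset_uri, station_uri, date, tmax_c):
    """Build the triples describing one qb:Observation."""
    return [
        (obs_uri, "rdf:type", QB + "Observation"),
        (obs_uri, QB + "dataSet", dataset_uri),
        (obs_uri, SDMX_DIM + "refArea", station_uri),  # dimension: station
        (obs_uri, SDMX_DIM + "refPeriod", date),       # dimension: day
        (obs_uri, "ex:maxTemperature", tmax_c),        # measure (hypothetical)
    ]

triples = make_observation(
    "ex:obs/070351/2013-01-18",
    "ex:dataset/acorn-sat-daily-tmax",
    "ex:station/070351",  # hypothetical station identifier
    "2013-01-18",
    41.6,
)

for s, p, o in triples:
    print(s, p, o)
```

Each observation carries every dimension value explicitly, which is what lets generic Data Cube tooling slice the dataset by station or by time period.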
Using the Data Cube vocabulary for Publishing Environmental Linked Data on la... (Laurent Lefort)
Canberra Semantic Web Meetup.
Initiatives have been launched to develop semantic vocabularies representing statistical classifications and discovery metadata. Tools are also being created by statistical organizations to support the publication of dimensional data conforming to the Data Cube specification, now in Last Call at W3C.
The meeting will be an opportunity to hear about two Semantic Web and Linked Data initiatives for statistical data driven by the Australian Government. The Bureau of Meteorology and CSIRO have recently released a Linked Data version of the ACORN-SAT historical climate data at http://lab.environment.data.gov.au, and the ABS has released Census data modelled in the Data Cube vocabulary as part of a challenge the ABS is organising in the context of the SemStats Workshop (http://www.datalift.org/en/event/semstats2013/challenge) at the International Semantic Web Conference (ISWC) in Sydney (http://iswc2013.semanticweb.org).
Come along to hear about these two projects, the challenges encountered and the solutions developed.
This document summarizes a presentation about representing statistical data in RDF. It discusses existing statistical datasets published in RDF format, including datasets from Eurostat, the US Census, and various government sources. It also covers vocabularies for modeling the structure and domain semantics of statistical data, such as SCOVO, Data Cube, and SDMX/RDF. The presentation addresses challenges in converting legacy statistical formats to RDF and techniques for linking and publishing statistical Linked Data on the web.
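The conversion challenge mentioned above can be sketched in a few lines: each row of a legacy statistical table becomes one observation, with columns mapped to dimensions or measures. The column names, URIs, and sample figures below are invented for the example; a real conversion would declare the mapping in a qb:DataStructureDefinition.

```python
# Toy conversion of a legacy CSV statistics table into Data Cube-style
# triples. One row -> one observation; columns -> dimensions/measures.
import csv
import io

legacy_csv = """\
region,year,population
AU-ACT,2011,357222
AU-NSW,2011,6917658
"""

def csv_to_triples(text):
    triples = []
    for i, row in enumerate(csv.DictReader(io.StringIO(text))):
        obs = f"ex:obs/{i}"
        triples.append((obs, "rdf:type", "qb:Observation"))
        triples.append((obs, "ex:region", row["region"]))  # dimension
        triples.append((obs, "ex:year", row["year"]))      # dimension
        triples.append((obs, "ex:population", int(row["population"])))  # measure
    return triples

triples = csv_to_triples(legacy_csv)
print(len(triples))  # 4 triples per row
```

The hard part in practice is not this mechanical step but choosing shared code lists and URIs so that the resulting observations link to other published cubes.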
The document discusses the Finnish Meteorological Institute's (FMI) approach to providing weather data in an INSPIRE compliant format. It describes how FMI opened all of its data in 2013 through a single data portal that serves as both an open data and INSPIRE portal. It then covers the various data models used to structure different types of weather data, including observations, forecasts, and radar images. Finally, it discusses experiences with implementing the different models and serving the wide range of weather data sets.
The Finnish Meteorological Institute opened its meteorological data in 2013, providing freely accessible machine-readable data through its open data portal. This includes weather observations, forecasts, radar images, and more. While the amount of data held by FMI is substantial, reaching over 1 terabyte for observations alone, it follows common standards to make the data broadly usable. The open data project has helped FMI improve its services and data sharing while generating interest from both commercial and independent users.
The document discusses the open meteorological data provided by the Finnish Meteorological Institute (FMI). FMI opened its data in 2013, making basically all of its data freely available in machine-readable formats. The data portal follows INSPIRE requirements and provides metadata, data, models, and services. Various types of observational and forecast data are available, including weather, marine, and radar data as well as lightning strikes and model outputs.
FMI Open Data Interface and Data Models (Roope Tervo)
Description of FMI Open Data Portal services and data models, including some WFS basics. The presentation also includes a description of the INSPIRE harmonised data models used in the portal.
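A WFS 2.0 request against a portal like FMI's is just an HTTP GET built from a stored-query id and parameters. The endpoint and stored-query id below follow FMI's public documentation at the time, but they are assumptions here; check the current docs before relying on them.

```python
# Construct a WFS 2.0 GetFeature URL for FMI open data (endpoint and
# stored-query id assumed from FMI's documentation; verify before use).
from urllib.parse import urlencode

base = "http://opendata.fmi.fi/wfs"
params = {
    "service": "WFS",
    "version": "2.0.0",
    "request": "getFeature",
    "storedquery_id": "fmi::observations::weather::simple",
    "place": "Helsinki",
}
url = base + "?" + urlencode(params)
print(url)
```

The response would be INSPIRE-conformant GML (e.g. a MultiPointCoverage or MeasurementTimeSeries), which is why clients typically wrap the parsing in a small library rather than handling the XML by hand.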
The Earth System Grid Federation (ESGF) is a large international collaboration that operates a global infrastructure for management and access of Earth System data. Some of the most valuable data collections served by ESGF include the output of global climate models used for the IPCC reports on climate change (CMIP3, CMIP5 and the upcoming CMIP6), regional climate model output (CORDEX), and observational data from several American and European agencies (Obs4MIPs). This talk will present a brief introduction to ESGF, describe the data access and analysis methods currently available or planned for the future, and conclude with some ideas on how this infrastructure could be used as a testbed for executing distributed analytics on a global scale.
TU1.L10 - Globwave and applications of global satellite wave observations (grssieee)
The GlobWave project aims to improve the use of satellite-derived wind and wave data. It develops a web portal providing access to multi-sensor satellite wave data in a common format, demonstration products, and tools for comparing satellite and model wave data. The project is led by Logica and involves partners ESA, CNES, Ifremer, SatOC, CLS, and NOC.
5 IGARSS_Riishojgaard July 25 2011_rev2.ppt (grssieee)
The document discusses the Joint Center for Satellite Data Assimilation's (JCSDA) work related to the upcoming launch of the National Polar-orbiting Partnership (NPP) satellite. The JCSDA is preparing operational weather prediction services to assimilate data from NPP by improving radiative transfer models, developing emissivity databases, and conducting observing system simulation experiments. After launch, the JCSDA will monitor NPP data and work to incorporate it into operational weather forecasting systems to improve predictions and generate tens of billions of dollars in economic benefits annually.
Accelerating Science with Cloud Technologies in the ABoVE Science Cloud (Globus)
This document summarizes the use of the ABoVE Science Cloud (ASC) to support research for the Arctic-Boreal Vulnerability Experiment (ABoVE). The ASC provides researchers with large datasets, computing resources, and tools to process and analyze remote sensing and model data related to Alaska and northern Canada. Several examples are given of projects using the ASC, including analyzing satellite imagery to map forest structure, tracking surface water changes over time, characterizing fire history, and modeling future forest composition under climate change. The ASC aims to facilitate collaboration by allowing scientists to access common datasets and run computationally-intensive processes in the cloud without having to directly transfer large amounts of data.
The Finnish Meteorological Institute (FMI) opened its data in 2013, making everything it has property rights to freely available in machine-readable formats. FMI's open data portal follows INSPIRE requirements and works as both an open data and INSPIRE portal. FMI provides data using various INSPIRE data models like MultiPointCoverage, MeasurementTimeSeries, and simple features. Common data types include observations, point forecasts, radar images, and gridded forecasts in formats like Grib and NetCDF.
The Finnish Meteorological Institute opened its meteorological data in 2013, making everything it owns available in open, machine-readable formats through a single data portal. The portal follows INSPIRE requirements and provides metadata, data, models and services. A variety of weather and climate datasets are available, including observations, forecasts, radar images and more. Standards like OGC and INSPIRE are used to ensure interoperability.
The Finnish Meteorological Institute is opening its weather data. These slides, presented at the Aaltoes Insights event, describe first insights into the open data portal and what is going to be opened.
AusCover Earth Observation Services and Data Cubes (TERN Australia)
The presentation provides an overview of earth observation services offered by AusCover Facility of TERN. The presentation was part of the Workshop on Approaches to Terrestrial Ecosystem Data Management : from collection to synthesis and beyond which was held on 9th of March 2016 in University of Queensland.
Ian Grant_Adoption of AusCover data standards and systems to improve access t... (TERN Australia)
This document discusses the adoption of AusCover data standards and systems by the Bureau of Meteorology to improve access to national climate data. Specifically, it summarizes how the Bureau has adopted AusCover's use of netCDF format and metadata standards to deliver daily meteorological grid data and established an ongoing process to convert historical data and updates to this format. This benefits climate modeling by streamlining data delivery and bringing standardized metadata to Bureau products.
NOAA is transitioning SAR-derived sea surface wind products to operational status to provide high-resolution coastal wind data to users. The system ingests SAR data from various satellites, retrieves winds using geophysical models, and distributes products through CoastWatch. Validation shows accuracy of 1-2.5 m/s compared to buoy winds. Operational implementation began in 2009 and will be complete in 2012 to handle future SAR missions like Sentinel-1 and provide coastal wind information to users.
Eco-informatics: Data services for bringing together and publishing the full ... (TERN Australia)
The presentation provides an overview of Advanced Ecological Knowledge and Observation System and SHaRED services by the TERN Eco-informatics to publish plot-based ecological data. The presentation was part of the Workshop on Approaches to Terrestrial Ecosystem Data Management : from collection to synthesis and beyond which was held on 9th of March 2016 in University of Queensland.
AusPlots field data collection with AusScribe (TERN Australia)
AusPlots collects standardized ecological data from permanent plots across Australian rangelands to facilitate long-term monitoring and decision making. Field data is collected using a custom mobile app, AuScribe, which follows a rigorous protocol. This generates clean, integrated data that is easily curated and published through various platforms. The iterative development of AuScribe and its component-based architecture delivered results quickly while handling complex data needs in the field. The standardized long-term data made available through AusPlots informs ecological research and management.
ExtremeEarth Open Workshop - Overview and Achievements (ExtremeEarth)
This document summarizes the objectives and achievements of the ExtremeEarth project, which developed artificial intelligence and big data techniques to analyze large volumes of Copernicus Earth observation data. The project created scalable deep learning models and large training datasets for food security and polar use cases. It also developed linked geospatial data systems that can handle petabyte-scale Copernicus data. The techniques were integrated into the Hopsworks platform and deployed on CREODIAS to enable extreme Earth analytics.
FR1.L09.2 - ONBOARD RADAR PROCESSING CONCEPTS FOR THE DESDYNI MISSION (grssieee)
The document discusses onboard radar processing concepts for the DESDynI Earth observation mission. It describes how onboard processing can enable more frequent observations than currently feasible by reducing the volume of downlinked data through techniques like SAR image formation and compression. Onboard processing could generate targeted science products to enable rapid response to natural hazards. Examples of potential onboard products discussed include forest fire extent, forest fuel load, earthquake damage assessment, glacier melting, and vegetation classification.
Application packaging and systematic processing in earth observation exploita... (terradue)
An overview of Terradue's solutions supporting Earth Observations (EO) Exploitation Platforms across multiple domains.
Presentation done as part of the Open Geospatial Consortium (OGC) Technical Committee ad-hoc meeting for the setup of a new domain working group on EO Exploitation Platforms.
WE1.L10 - IMPLEMENTATION OF THE LAND, ATMOSPHERE NEAR-REAL-TIME CAPABILITY FO... (grssieee)
The LANCE system provides near real-time satellite data from NASA instruments within 3 hours of observation for applications such as weather forecasting, monitoring natural hazards, and agricultural monitoring. It leverages existing EOS processing and distribution capabilities. Products include MODIS imagery, AIRS temperature and moisture profiles, and OMI measurements of ozone and sulfur dioxide. The system aims to improve latency and provide a one-stop shop for users through the LANCE web portal.
Polar Use Case - ExtremeEarth Open Workshop (ExtremeEarth)
This document provides an overview of an ExtremeEarth project that aims to apply deep learning techniques to classify sea ice in polar regions using satellite imagery. The project has received funding from the European Union. It discusses challenges in classifying sea ice from SAR imagery compared to optical imagery. It outlines user requirements for sea ice products, including high resolution (300m or better) and frequent updates (near real-time). The document describes workflows using the Polar Thematic Exploitation Platform (Polar TEP) for large-scale sea ice mapping using Copernicus satellite data and machine learning algorithms. It also discusses exploitation of results, including the impact of Polar TEP and efforts to facilitate the polar machine learning community.
AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound ... (Mario Juric)
The document discusses large sky surveys and how they are transforming astronomy into a software-driven field. It focuses on the Large Synoptic Survey Telescope (LSST) project, which will be the largest sky survey to date. Some key points:
- LSST will image the entire visible sky every few nights for 10 years, collecting enormous amounts of data on billions of objects.
- Processing and analyzing this data poses major computational challenges and requires new techniques for extracting science from massive catalogs and datasets.
- LSST aims to deliver real-time alerts of changing/transient objects, yearly source catalogs with positions/measurements of billions of objects, and deep co-added images.
- The data
The document summarizes the development of satellite modeling for the National Solar Radiation Database (NSRDB) to provide accurate surface solar radiation data. It describes the evolution from empirical to physical models using satellite measurements and ancillary data as inputs to radiative transfer models. Validation shows the new 2005-2012 dataset has a mean bias error of less than 5% for GHI and DNI compared to surface measurements, though uncertainty remains for cloudy cases. Future work aims to improve the model with higher resolution data and better representation of aerosols and surfaces.
2004-09-12 Data and Tools for Web-Based Monitoring and Analysis (Rudolf Husar)
The document discusses the DataFed infrastructure, which integrates distributed fire-related and air quality data sources to provide new insights. It provides access to dozens of aerosol, emissions, fire, meteorology, and GIS datasets. DataFed uses web services and a service-oriented architecture to facilitate data sharing and allow users to perform customized analyses across different datasets.
The document summarizes the past 10 years of studying volcanoes using InSAR techniques from spaceborne radar systems and looks ahead to future developments. Key points include: 1) InSAR has advanced from initial imaging to reliable time series analyses of deformation; 2) New radar systems provide higher resolution data at different frequencies but coverage remains limited; and 3) Future missions like DESDynI-R are designed for volcanology but funding and policies remain challenges to fully utilizing the technique.
1. Mashups are collections of small applications called widgets that can be embedded into web pages using standards like XML, HTML, JavaScript and CSS.
2. Major companies develop their own widget platforms with tools to create and deploy widgets, such as Google Gadgets, Yahoo Widgets and Amazon Widgets.
3. Widgets are described in XML files but require a widget engine for execution, making their implementation dependent on the hosting platform. Standardization efforts are ongoing but adoption by providers is uncertain.
Semantic pipes aggregate data from multiple sources to create new data sources, similar to Yahoo! Pipes. Semantic pipes operate on RDF data sources using SPARQL queries. DERI Pipes is a tool for building semantic pipes that defines blocks for processing RDF and other data sources. Semantic mashups may have additional reasoning capabilities beyond basic data aggregation, using semantic web reasoners. They implement behavior through SPARQL queries over RDF data. Examples include mashups over Flickr, book data, and scholarly references.
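The "pipe" idea above reduces to two steps: aggregate triples from several sources, then evaluate a graph pattern over the union. Here is a toy illustration in that spirit; the vocabulary and data are invented, and a real pipe such as DERI Pipes would fetch remote RDF and use actual SPARQL.

```python
# Toy "semantic pipe": merge two RDF-like sources, then run a
# SPARQL-style basic graph pattern over the union. Variables start
# with '?'. All data and vocabulary terms are invented.

source_a = [("ex:book1", "dc:title", "Linked Data"),
            ("ex:book1", "dc:creator", "ex:heath")]
source_b = [("ex:heath", "foaf:name", "Tom Heath")]

def match(pattern, graph, binding):
    """Yield extended variable bindings for one (s, p, o) pattern."""
    for triple in graph:
        b = dict(binding)
        ok = True
        for pat, val in zip(pattern, triple):
            if pat.startswith("?"):
                if b.setdefault(pat, val) != val:
                    ok = False
                    break
            elif pat != val:
                ok = False
                break
        if ok:
            yield b

def query(patterns, graph):
    """Join several patterns, like a SPARQL basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(pattern, graph, b)]
    return bindings

merged = source_a + source_b  # the "pipe" step: aggregate sources
results = query([("?book", "dc:creator", "?author"),
                 ("?author", "foaf:name", "?name")], merged)
print(results)
```

Note that the join across the two patterns only succeeds because both sources use the same URI for the author, which is exactly the linking discipline that makes such mashups work.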
1. A Linked Sensor Data Cube
for a 100 year homogenised daily temperature dataset
Laurent Lefort
5th Semantic Sensor Network Workshop, 12 November 2012
CSIRO ICT CENTRE
2. Outline
• ACORN-SAT dataset
• Role of SSN ontology
• Role of RDF Data Cube vocabulary
• Integration of SSN and RDF Data Cube
• Lessons learned
• Conclusions
A Linked Sensor Data Cube for a 100 year homogenised daily temperature dataset | Laurent Lefort
3. The ACORN-SAT dataset
• Released by Aus. Bureau of Meteorology (23 March 2012)
• Available at http://www.bom.gov.au/climate/change/acorn-sat/
• 112 stations in total - 60 from 1910 to 2011
• Homogenised (adjusted) daily temperatures
• Tabular format (1 file per time series/station)
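The tabular distribution is one plain file per station time series. A minimal sketch of loading such a series follows; the column names and CSV layout here are assumptions for illustration, not the Bureau's actual file format:

```python
import csv
from datetime import date

def read_acorn_sat_series(path):
    """Read a hypothetical per-station ACORN-SAT file into (date, temp) pairs.

    Assumes a simple CSV layout with columns `date` (ISO format) and
    `temperature` (homogenised daily value, deg C), missing days left
    as an empty field. The real files from bom.gov.au use their own
    fixed layout; adapt the parsing accordingly.
    """
    series = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row["temperature"]:  # skip missing observations
                series.append((date.fromisoformat(row["date"]),
                               float(row["temperature"])))
    return series
```

Keeping one series per file mirrors the published distribution, so a station's full record can be loaded without touching the other 111 stations.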
4. The Linked Data version of ACORN-SAT
• Experimental version of ACORN-SAT data
• Available at http://lab.environment.data.gov.au/
• Developed for the Australian Bureau of Meteorology (BoM) by CSIRO in cooperation with the Australian Government Information Management Office (AGIMO)
• Temperature (homogenised) plus Rainfall (not homogenised)
• First version presented at Australian GovHack Day
• Alternative to tabular data
• Latest version uploaded to the LOD cloud
• http://thedatahub.org/dataset/acorn-sat
5. Motivation: linked government agency data in Australia
• Linked data (and well-managed URIs) to build bridges between the different agencies
• The current linked data pilot covers one agency (BoM) and one server, but applies solutions and schemes already in place in multi-agency, multi-service-provider contexts (e.g. the UK)
• Thanks to AGIMO for helping us to set up http://lab.environment.data.gov.au/
6. SSN Ontology
• SSN-XG report http://www.w3.org/2005/Incubator/ssn/XGR-ssn/
• SSN Ontology http://purl.oclc.org/NET/ssnx/ssn
• Auto-derived navigable documentation on the wiki: http://www.w3.org/2005/Incubator/ssn/wiki/SSN
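As a rough sketch of how a single ACORN-SAT reading can be cast in SSN terms, the snippet below assembles Turtle using the core properties ssn:observedBy, ssn:observedProperty and ssn:observationResult. The acorn: namespace and identifiers are placeholders invented here for illustration, not the URIs actually published at lab.environment.data.gov.au:

```python
# Build a minimal SSN observation description as Turtle text.
# The acorn: URIs below are illustrative placeholders only.
PREFIXES = """\
@prefix ssn:   <http://purl.oclc.org/NET/ssnx/ssn#> .
@prefix acorn: <http://example.org/acorn-sat/> .
"""

def ssn_observation(obs_id, sensor_id, prop_id, result_id):
    """Return Turtle for one ssn:Observation and its core SSN links."""
    return (f"acorn:{obs_id} a ssn:Observation ;\n"
            f"    ssn:observedBy acorn:{sensor_id} ;\n"
            f"    ssn:observedProperty acorn:{prop_id} ;\n"
            f"    ssn:observationResult acorn:{result_id} .\n")

doc = PREFIXES + ssn_observation(
    "obs-014015-1941-01-01-max", "sensor-014015-thermometer",
    "air-temperature-maximum", "result-014015-1941-01-01-max")
```

In the actual dataset one such observation exists per station, per day and per measured property, which is what makes the Data Cube's dimensional structure a natural fit.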
8. Specific challenges for the SSN Ontology
• ACORN-SAT data derived from multiple stations with complex history
• Uses a homogenisation algorithm to make adjustments to the raw data
• “Metadata” used by the algorithm to identify “breakpoints” in time series
– Site changes (moves, building or vegetation having an impact on the quality of
observation), sensor (and sensor screens) changes, procedure changes (hours
of observations)
• BoM station numbering system “somewhat confusing over time”
• Desire to retain a single site number for upper-air observations at obs sites
• Several numbering conventions have been used at one or more locations where
an overlap occurs between an old (comparison) and new site:
– Old site retains old number, new site opens with new number.
– Old site switches to new number for the duration of the comparison, new site
takes over old number from the start of its observations.
– New site opens under new number then switches to old number after end of
comparison.
9. Linked ACORN-SAT deployment data with SSN
• Data describing the deployment history
• Available in ACORN-SAT station catalogue (pdf)
• Not available in tabular format distribution
• ACORN-SAT composite stations
– System composed of one or several BoM stations
• BoM (Bureau of Meteorology) stations
– System composed of one or several stations sharing the same codes
• Textual description of significant events
• Data describing the detailed conditions of observations
• Sensors
• Screens
• Automatic Weather stations
• Procedures e.g. hours of observation
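The deployment description above can be sketched with SSN terms. A minimal illustration follows, generating Turtle for a composite station with plain Python string templates; the resource URIs under `BASE` are illustrative assumptions, not the published scheme, while the `ssn:` properties come from the SSN ontology itself:

```python
# Minimal sketch: SSN triples for a composite ACORN-SAT station.
# URIs under BASE are illustrative assumptions, not the published scheme;
# the ssn: properties come from http://purl.oclc.org/NET/ssnx/ssn.

BASE = "http://lab.environment.data.gov.au/id"  # assumed base, for illustration

def composite_station(acorn_id, bom_station_ids):
    """Emit Turtle linking an ACORN-SAT composite system to its BoM sub-systems."""
    prefix = "@prefix ssn: <http://purl.oclc.org/NET/ssnx/ssn#> ."
    subject = f"<{BASE}/system/{acorn_id}>"
    lines = [prefix, "", f"{subject} a ssn:System ;"]
    for bom_id in bom_station_ids:
        lines.append(f"    ssn:hasSubSystem <{BASE}/system/{acorn_id}/{bom_id}> ;")
    lines.append(f"    ssn:hasDeployment <{BASE}/deployment/{acorn_id}> .")
    return "\n".join(lines)

# Darwin combines two BoM stations: Post Office (014016) and Airport (014015).
print(composite_station("darwin", ["014016", "014015"]))
```

The same template approach extends to deployment phases and sensor/screen descriptions, one sub-system per station-number episode.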
10. Example (Darwin)
Time series – Weather stations – Sites – (Sensors)
Darwin Post Office
014016 (1910-1942)
Darwin Airport
014015 (1941-2007 & 2001-now)
2 sites – 1km apart – same code used
11. Deployment phases in Darwin
12. RDF Data Cube http://purl.org/linked-data/cube
• RDF Data Cube: a method to organise linked data in slices
• A vocabulary published by the W3C Government Linked Data (GLD) Working
Group (Working Draft)
• Also the method used to publish statistics data and environmental data in
Europe e.g. for Bathing Water Quality in UK
http://www.epimorphics.com/web/projects/bathing-water-quality
• Advantages
• Allows multiple views on the same data (similar to OLAP)
• Generic approach which supports the links to domain-specific definitions
• Usable:
• In any browser via Linked Data API (HTML output)
• In JavaScript via Linked Data API (JSON output)
• In R via SPARQL
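As an illustration of the "in R via SPARQL" route, a query of roughly the following shape could pull daily observations out of the cube. The `qb:` terms are from the Data Cube vocabulary; the `sat:` dimension and measure names are hypothetical placeholders standing in for the dataset's actual terms:

```python
# Build a SPARQL query over a Data Cube of daily observations.
# qb: terms come from http://purl.org/linked-data/cube; the sat: dimension
# and measure names are hypothetical placeholders, not the dataset's terms.

def daily_max_temp_query(station_uri, year):
    return f"""\
PREFIX qb:  <http://purl.org/linked-data/cube#>
PREFIX sat: <http://lab.environment.data.gov.au/def/acorn/sat#>  # assumed

SELECT ?day ?maxTemp WHERE {{
  ?obs a qb:Observation ;
       sat:station <{station_uri}> ;   # assumed dimension property
       sat:year {year} ;               # assumed dimension property
       sat:day ?day ;
       sat:maxTemperature ?maxTemp .   # assumed measure property
}}
ORDER BY ?day"""

q = daily_max_temp_query("http://lab.environment.data.gov.au/id/station/014015", 2000)
print(q)
```

In R, such a string would be handed to a SPARQL client against the site's endpoint; in JavaScript, the same data is reachable through the Linked Data API's JSON output instead.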
13. From: The RDF Data Cube Vocabulary
W3C Working Draft 05 April 2012
http://www.w3.org/TR/vocab-data-cube/
14. Data cube, slice and observation
[Diagram: a data cube with dimensions d1–d7, measures m1, m2, … and attributes a1, a2, …, illustrating the cube, slice and observation levels]
16. Data Cube Structure: dimensions, measures, attributes
Current Data Cube structure (and URI/API logic)
• Dimensions:
(1) ACORN-SAT Series/System (station)
(2) Year
(3) Month
(4) Day
• Measures (per Observation):
– MinTemperature
– MaxTemperature
– Rainfall
– Booleans for missing data
• Stations/time series, year and month all link to observations
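A single day's record under this structure might serialise roughly as below. This is a sketch: only the dimension/measure layout follows the slide, while the `sat:` property names and the observation URI pattern are assumptions:

```python
# Sketch of one qb:Observation with the four dimensions (station, year,
# month, day) and the measures listed above. The sat: property names and
# the URI pattern are hypothetical; only the cube layout mirrors the slide.

def observation_turtle(station, year, month, day, min_t, max_t, rain):
    obs_uri = (f"<http://lab.environment.data.gov.au/data/acorn/climate/"
               f"{station}/{year}/{month:02d}/{day:02d}>")
    return f"""\
@prefix qb:  <http://purl.org/linked-data/cube#> .
@prefix sat: <http://lab.environment.data.gov.au/def/acorn/sat#> .

{obs_uri} a qb:Observation ;
    sat:station "{station}" ;
    sat:year {year} ; sat:month {month} ; sat:day {day} ;
    sat:minTemperature {min_t} ;
    sat:maxTemperature {max_t} ;
    sat:rainfall {rain} ."""

print(observation_turtle("014015", 2000, 1, 15, 24.1, 33.5, 2.2))
```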
17. Slices and URI scheme
18. Coupling SSN and RDF Data Cube
20. Access to data with Elda via
http://lab.environment.data.gov.au/
[Diagram: resources linked via ssn:hasSubSystem, ssn:hasDeployment, ssn:observedBy and ssn:deploymentProcessPart]
21. Mashups
• Display the station locations and their average temperature
readings on a map
• http://lab.environment.data.gov.au/mashup/drilldown
• Select a Date range for climate readings for a given location
• http://lab.environment.data.gov.au/mashup
22. Lessons learned
• Flexible URI scheme
• ELDA-friendly, UK-style: using nested list endpoints and item endpoints
– http://lab.environment.data.gov.au/data/acorn/climate/slice/station
– http://lab.environment.data.gov.au/data/acorn/climate/slice/station/014015
• Extra slice(s) easy to add to allow multiple access to the same observations
• RDF Data Cube vocabulary (QB)
• Some clarifications needed for qb:structure, qb:sliceKey, qb:sliceStructure,
qb:component and qb:componentAttachment properties e.g. through the
publication of validation rules
• Coupling of SSN ontology and RDF Data Cube vocabulary
• Different ecosystems (OWL vs. RDF/RDFS)
– OK for RDF Data Cube, not OK for other reused vocabularies e.g. UK Intervals
(Jena Eyeball used for validation)
• Observed properties are classes in the SSN ontology and properties in the RDF
Data Cube
– Possibility to reuse/extend the qb:concept properties defined to manage
references to skos:Concept in QB
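The nested list/item endpoint pattern mentioned under "Flexible URI scheme" can be generated mechanically. A small sketch follows; only the `station` slice appears on the slide, so treating further path segments (e.g. year, month) as deeper item endpoints is an assumption:

```python
# Generate ELDA-style nested list and item endpoint URIs, following the
# pattern shown on the slide (…/slice/station, …/slice/station/014015).
# Extending the path with year/month segments is an assumption.

BASE = "http://lab.environment.data.gov.au/data/acorn/climate/slice"

def slice_uris(station=None, year=None, month=None):
    """Return the list endpoint first, then an item endpoint per given key."""
    uris = [f"{BASE}/station"]
    path = f"{BASE}/station"
    for part in (station, year, month):
        if part is None:
            break
        path = f"{path}/{part}"
        uris.append(path)
    return uris

print(slice_uris("014015"))
```

Because each slice is just another path segment, adding an extra slice means adding one more list/item endpoint pair, which is what makes multiple access paths to the same observations cheap.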
23. Conclusions
• Approach is applicable to all climate time series
• Several climate-specific issues not addressed
• Transparency/reproducibility of homogenisation process
– Requires raw data plus extra (meta)data (sensors, screen types, sensor
exposure, “qualified” observed properties during a specific observation
interval), plus data used/generated by the homogenisation algorithm (ACORN-SAT
uses different values for different value-distribution percentiles)
– More ontology work needed (beyond SSN) on homogenisation algorithm
parameters, types of breakpoints and types of adjustment lookup tables
• Opportunities to link to other datasets (Australia, World)
• Geo-features (e.g. GeoNames - done) for weather station sites, districts
• Other climate data e.g. regional and world climate data archives, cyclone tracks
(not yet available as linked data)
• Other environmental data (not yet available as linked data)
24. Thank you
Laurent Lefort
Ontologist
t +61 2 9123 4567
e laurent.lefort@csiro.au
w ict.csiro.au
CSIRO ICT CENTRE
25. Images credits
• Blair Trewin The ACORN-SAT station at Butlers Gorge in central
Tasmania (surfacetemperatures.blogspot.com.au )
26. Reused ontologies
• DOLCE Ultra Lite (DUL): a lightweight foundational ontology for modeling either physical or social contexts. http://www.loa-cnr.it/ontologies/DUL.owl
• Semantic Sensor Network (SSN): an ontology for the description of sensors and observations, and related concepts. http://purl.oclc.org/NET/ssnx/ssn
• RDF Data Cube: a vocabulary for the publication of multi-dimensional data as linked data. http://purl.org/linked-data/cube
• OWL Time: an ontology of temporal concepts. http://www.w3.org/2006/time
• Intervals: a vocabulary (and URI scheme) for the definition of instants and intervals. http://reference.data.gov.uk/def/intervals
• WGS84_Pos: a vocabulary for representing latitude, longitude and altitude information in the WGS84 geodetic reference datum. http://www.w3.org/2003/01/geo/wgs84_pos
• GeoNames: an ontology for the description of geographical features, their characteristics and relationships. http://www.geonames.org/ontology/ontology_v3.01.rdf
• VoID (Vocabulary of Interlinked Datasets): a vocabulary for expressing metadata about RDF datasets. http://vocab.deri.ie/void
27. Developed ontologies
• ETCCDI: indicators defined by the joint CCl/CLIVAR/JCOMM Expert Team on Climate Change Detection and Indices. http://purl.oclc.org/NET/ssnx/etccdi
• Rainfall districts and states: geographical areas defined as part of the Bureau's numbering system for observation sites. http://lab.environment.data.gov.au/def/stations/raindist and …/rainstate
• BoM Station: definition for the weather stations registered in the Bureau’s Weather Station Directory. http://lab.environment.data.gov.au/def/stations/station
• Surface Air Temperature: ACORN-SAT observation (temperature, rainfall) for one day. http://lab.environment.data.gov.au/def/acorn/sat
• Time Series: time series data defined as data cube slices (aggregated at different levels). http://lab.environment.data.gov.au/def/acorn/time-series
• ACORN-SAT deployment: phases and sub-phases recorded in the ACORN-SAT documentation pack. http://lab.environment.data.gov.au/def/acorn/deployment
• ACORN-SAT system: the sensing asset used for a deployment phase (or sub-phase). http://lab.environment.data.gov.au/def/acorn/system
• ACORN-SAT site: the site used for a deployment phase (or sub-phase). http://lab.environment.data.gov.au/def/acorn/site
28. RDF Data Cube (qb:componentAttachment)
29. Reference to skos:Concept
Editor's Notes
The ACORN-SAT dataset replaces the long-term climate time series datasets previously released by the Bureau (e.g. the High Quality dataset)
OWL2 ontology, SRIQ(D)
41 concepts & 39 object properties, organised into ten conceptual modules
117 concepts and 142 object properties in total, including DUL
Aligned to DOLCE UltraLite
Working Draft http://www.w3.org/TR/vocab-data-cube/