
Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

Canberra Semantic Web Meetup.

Initiatives have been launched to develop semantic vocabularies representing statistical classifications and discovery metadata. Tools are also being created by statistical organizations to support the publication of dimensional data conforming to the Data Cube specification, now in Last Call at W3C.

The meeting will be an opportunity to hear about two Semantic Web and Linked Data initiatives for statistical data driven by the Australian Government. The Bureau of Meteorology and CSIRO have recently released a Linked Data version of the ACORN-SAT historical climate data at http://lab.environment.data.gov.au. The ABS has released Census data modelled in the Data Cube vocabulary as part of a challenge it is organising in the context of the SemStats Workshop (http://www.datalift.org/en/event/semstats2013/challenge) at the International Semantic Web Conference (ISWC) in Sydney (http://iswc2013.semanticweb.org).

Come along to hear about these two projects, the challenges encountered and the solutions developed.

  1. Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au
     Canberra Semantic Web meetup
     CSIRO Computational Informatics
     Laurent Lefort, Armin Haller
  2. Outline
     • ACORN-SAT Dataset
     • Building the Data Cube
     • Enriching ACORN-SAT Linked Data with Metadata
     • Published ACORN-SAT Linked Data
     Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au | Laurent Lefort | 2
  3. The ACORN-SAT dataset
     • Released by the Australian Bureau of Meteorology (23 March 2012)
     • Available at http://www.bom.gov.au/climate/change/acorn-sat/
     • 112 stations in total, 60 of them from 1910 to 2011
     • Homogenised (adjusted) daily temperatures
     • Tabular format (1 file per time series/station)
  4. “Catalogue websites do not unlock the full potential of the collected data and metadata”
     – Richard Cyganiak
  5. Limitations of ACORN-SAT in tabular files
     • Metadata fields are not documented
     • Querying across the catalog is difficult
     • Exploring the catalog through different facets (geographical, statistical, tabular) is not possible
     • Bulk processing of the dataset, or parts of it, is not possible
     • Social annotations are not possible
     • Integrating the dataset with other datasets is difficult
  6. ACORN-SAT as Linked Data
     Linked Data is a shift from publishing data in human-readable HTML documents to publishing machine-readable documents.
     Linked Data principles:
     1. Use URIs as identifiers for things: http://sws.geonames.org/2172517
     2. Make them actionable → http://www.geonames.org/2172517/canberra.html
     3. Return information following standards → http://sws.geonames.org/2172517/about.rdf
     4. Link to other information objects: <rdfs:seeAlso rdf:resource="http://dbpedia.org/resource/Canberra"/>
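Principles 2 and 3 are commonly implemented with HTTP content negotiation: a client asks the identifier for a machine-readable format and the server answers with RDF (or redirects to a document such as about.rdf). A minimal sketch with Python's standard urllib.request, assuming the server honours an Accept header of application/rdf+xml (an assumption; the slide does not say how GeoNames negotiates). The helper name `rdf_request` is hypothetical:

```python
import urllib.request

def rdf_request(uri: str, media_type: str = "application/rdf+xml") -> urllib.request.Request:
    """Build, but do not send, a GET request asking for an RDF representation of a URI."""
    # Content negotiation: the Accept header states the preferred response format.
    return urllib.request.Request(uri, headers={"Accept": media_type})

req = rdf_request("http://sws.geonames.org/2172517")
# Sending it would be: urllib.request.urlopen(req).read()
```

The same URI requested without that header (for example from a browser) would typically be answered with HTML, which is what makes one identifier serve both audiences.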
  7. ACORN-SAT as Linked Data
     RDF Data Cube: a method to organise linked data in slices
     • A vocabulary published by the W3C Government Linked Data (GLD) Working Group (Working Draft)
     • Also the method used to publish statistical and environmental data in Europe, e.g. Bathing Water Quality in the UK: http://www.epimorphics.com/web/projects/bathing-water-quality
     Advantages
     • Allows multiple views on the same data (similar to OLAP)
     • Generic approach that supports links to domain-specific definitions
     Usable:
     • In any browser via the Linked Data API (HTML output)
     • In JavaScript via the Linked Data API (JSON output)
     • In R via SPARQL
  8. RDF Data Cube 101 – Slices and observations
     [Diagram: a cube with dimensions d1 to d7, measures m1, m2, … and attributes a1, a2, …, showing a cube, a slice and an observation]
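In plain terms, an observation is one cell of the cube, located by its dimension values and carrying measures; a slice is the subset obtained by fixing some dimensions while the others vary. A minimal in-memory sketch of that idea, with hypothetical dimension names and toy values (not ACORN-SAT data) and no RDF machinery:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Observation:
    dimensions: Dict[str, str]   # e.g. station, year, month: identify the cell
    measures: Dict[str, float]   # e.g. maxTemperature: the observed values

def take_slice(cube: List[Observation], fixed: Dict[str, str]) -> List[Observation]:
    """A slice fixes some dimensions; the remaining ones vary within it."""
    return [o for o in cube
            if all(o.dimensions.get(dim) == val for dim, val in fixed.items())]

# Toy cube: illustrative values only.
cube = [
    Observation({"station": "A", "year": "2000", "month": "01"}, {"maxTemperature": 30.0}),
    Observation({"station": "A", "year": "2000", "month": "02"}, {"maxTemperature": 29.0}),
    Observation({"station": "B", "year": "2000", "month": "01"}, {"maxTemperature": 25.0}),
]
station_a = take_slice(cube, {"station": "A"})                # fix one dimension
january = take_slice(cube, {"year": "2000", "month": "01"})   # fix two dimensions
```

Fixing no dimension at all returns the whole cube, which is why the dataset itself can be seen as the least specific "slice".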
  9. RDF Data Cube 101 – Dataset, Slice, Observation
     [Diagram "Cube and Slice" / "Cube observation": qb:DataSet, qb:Slice and qb:Observation connected by qb:slice, qb:subSlice, qb:observation, qb:dataSet and void:subset]
  10. RDF Data Cube 101 – Data Structure Definitions (DSDs)
      • RDF Data Cube model compatible with SDMX
      • http://sdmx.org/wp-content/uploads/2012/11/SDMX-Guidelines-for-the-Design-of-Data-Structure-Definitions.pdf
  11. 5 basic steps
      1. Define the prefixes to be used
      2. Publish your schema
         • Define the dimension(s), used to identify the observations (e.g. time, region): what the observation applies to
         • Define the measure(s): the phenomenon being observed
         • Define the attribute(s): unit of measure
         • Define the DSD (attach components)
      3. Publish your data
         • Define the dataset (attach DSD)
         • Define observations: the actual data
      4. Include slices (views) on your data
         • Define slice key(s): the fixed dimensions
         • Define the DSD (attach slice key(s))
         • Define the dataset (attach the slices to be defined)
         • Define slices and observations
      5. Select appropriate URIs
  12. 1. Prefixes
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX qb: <http://purl.org/linked-data/cube#>
      PREFIX interval: <http://reference.data.gov.uk/def/intervals/>
      PREFIX gn: <http://www.geonames.org/ontology#>
      PREFIX ssn: <http://purl.oclc.org/NET/ssnx/ssn#>
      PREFIX acorn-sat: <http://lab.environment.data.gov.au/def/acorn/sat/>
      PREFIX acorn-series: <http://lab.environment.data.gov.au/def/acorn/time-series/>
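Keeping the prefixes in a single table lets the SPARQL and Turtle headers be generated from one source, so they cannot drift apart. A small sketch that renders the prefix list above in both syntaxes and expands a compact name to a full IRI; the helper names are hypothetical:

```python
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "qb": "http://purl.org/linked-data/cube#",
    "interval": "http://reference.data.gov.uk/def/intervals/",
    "gn": "http://www.geonames.org/ontology#",
    "ssn": "http://purl.oclc.org/NET/ssnx/ssn#",
    "acorn-sat": "http://lab.environment.data.gov.au/def/acorn/sat/",
    "acorn-series": "http://lab.environment.data.gov.au/def/acorn/time-series/",
}

def sparql_prefixes(prefixes: dict) -> str:
    # SPARQL syntax: PREFIX name: <iri>  (the IRI must be in angle brackets)
    return "\n".join(f"PREFIX {p}: <{iri}>" for p, iri in prefixes.items())

def turtle_prefixes(prefixes: dict) -> str:
    # Turtle syntax: @prefix name: <iri> .
    return "\n".join(f"@prefix {p}: <{iri}> ." for p, iri in prefixes.items())

def expand(curie: str, prefixes: dict) -> str:
    """Expand a compact name such as 'qb:Observation' into a full IRI."""
    prefix, local = curie.split(":", 1)
    return prefixes[prefix] + local
```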
  13. 2. Define the schema
      [Diagram: the component properties of the ACORN-SAT cube, each labelled as a dimension, measure or attribute]
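In Data Cube terms, the schema step declares each component as an rdf:Property typed qb:DimensionProperty, qb:MeasureProperty or qb:AttributeProperty. A minimal sketch that generates those declarations as Turtle text; apart from sat:month and sat:maxTemperature, which appear in the SPARQL example later in the deck, the component names are hypothetical placeholders:

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
QB = "http://purl.org/linked-data/cube#"
SAT = "http://lab.environment.data.gov.au/def/acorn/sat/"

# Hypothetical component list: role -> local names in the sat: namespace.
COMPONENTS = {
    "qb:DimensionProperty": ["station", "year", "month", "day"],
    "qb:MeasureProperty": ["minTemperature", "maxTemperature", "rainfall"],
    "qb:AttributeProperty": ["unit"],
}

def schema_turtle(components: dict) -> str:
    """Declare each component as an rdf:Property typed with its Data Cube role."""
    lines = [f"@prefix rdf: <{RDF}> .", f"@prefix qb: <{QB}> .", f"@prefix sat: <{SAT}> .", ""]
    for role, names in components.items():
        for name in names:
            lines.append(f"sat:{name} a rdf:Property, {role} .")
    return "\n".join(lines)

ttl = schema_turtle(COMPONENTS)
```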
  14. 3. Define the Observations
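Each row of a station's tabular file becomes one qb:Observation carrying its dimension values, its measures and, where a value is absent, a flag in place of the number. A sketch assuming a hypothetical CSV layout (`date` as YYYYMMDD, `max`, `min`) and hypothetical `...Missing` flag properties; the real BoM file layout and ACORN-SAT property names may differ, and the output relies on qb: and sat: prefixes declared elsewhere:

```python
import csv
import io

def row_to_observation(station, row, obs_uri, dataset_uri):
    """Turn one daily row into a Turtle description of a qb:Observation.
    An empty cell becomes a hypothetical '<measure>Missing true' flag."""
    date = row["date"]  # assumed YYYYMMDD
    lines = [
        f"<{obs_uri}> a qb:Observation ;",
        f"    qb:dataSet <{dataset_uri}> ;",
        f'    sat:station "{station}" ; sat:year "{date[:4]}" ; sat:month "{date[4:6]}" ; sat:day "{date[6:8]}" ;',
    ]
    for column, prop in (("max", "maxTemperature"), ("min", "minTemperature")):
        value = row[column].strip()
        if value:
            lines.append(f"    sat:{prop} {float(value)} ;")
        else:
            lines.append(f"    sat:{prop}Missing true ;")
    lines[-1] = lines[-1][:-2] + " ."  # replace the final ';' with '.' to close the statement
    return "\n".join(lines)

# Illustrative input: column names, station id and values are made up.
sample = io.StringIO("date,max,min\n20000101,30.5,\n")
rows = list(csv.DictReader(sample))
ttl = row_to_observation("000000", rows[0], "http://example.org/obs/1", "http://example.org/dataset")
```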
  15. 4. Define the slices
      [Diagram: ACORN-SAT series/system (station), Year (1), Month (2), Day (3) and Observation (MinTemperature, MaxTemperature, Rainfall, booleans for missing data)]
      Current Data Cube structure (and URI/API logic)
      • Stations/time series
      • Year
      • Month
      • All linking to observations
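The station/year/month hierarchy can be derived mechanically by grouping observations. A sketch that emits the qb:subSlice and qb:observation links, reusing the slice URI pattern visible in the SPARQL result later in the deck; the input tuples and observation URIs are hypothetical, and qb: and sat: prefixes are assumed to be declared elsewhere:

```python
from collections import defaultdict

BASE = "http://lab.environment.data.gov.au/data/acorn/climate/slice/station/"

def build_slices(observations):
    """observations: iterable of (station, year, month, obs_uri) tuples.
    Returns Turtle triples: station slice -> year slice -> month slice -> observations."""
    # Nested grouping: tree[station][year][month] -> list of observation URIs.
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for station, year, month, obs in observations:
        tree[station][year][month].append(obs)
    triples = []
    for station, years in sorted(tree.items()):
        station_uri = BASE + station
        for year, months in sorted(years.items()):
            year_uri = f"{station_uri}/year/{year}"
            triples.append(f"<{station_uri}> qb:subSlice <{year_uri}> .")
            for month, obs_list in sorted(months.items()):
                month_uri = f"{year_uri}/month/{month}"
                triples.append(f"<{year_uri}> qb:subSlice <{month_uri}> .")
                triples.append(f'<{month_uri}> sat:month "{month}" .')
                for obs in obs_list:
                    triples.append(f"<{month_uri}> qb:observation <{obs}> .")
    return triples

t = build_slices([
    ("086071", "1975", "07", "http://example.org/obs/1"),
    ("086071", "1975", "07", "http://example.org/obs/2"),
    ("086071", "1975", "08", "http://example.org/obs/3"),
])
```

Each year slice is linked once from its station slice, however many observations fall under it.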
  16. Define the DSD
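The DSD ties the components together and, for the slice views, names one qb:SliceKey per kind of slice, listing the dimensions that slice fixes. A minimal Turtle-emitting sketch; the example.org URIs and sat: component names are hypothetical, and qb: and sat: prefixes are assumed to be declared elsewhere:

```python
def dsd_turtle(dsd_uri, dimensions, measures, attributes, slice_keys):
    """Emit a qb:DataStructureDefinition with one qb:component per component property
    and one qb:sliceKey per kind of slice; each qb:SliceKey lists the dimensions it fixes."""
    lines = [f"<{dsd_uri}> a qb:DataStructureDefinition ;"]
    for role, props in (("qb:dimension", dimensions),
                        ("qb:measure", measures),
                        ("qb:attribute", attributes)):
        for prop in props:
            lines.append(f"    qb:component [ {role} {prop} ] ;")
    for key_uri in slice_keys:
        lines.append(f"    qb:sliceKey <{key_uri}> ;")
    lines[-1] = lines[-1][:-2] + " ."  # close the DSD description
    for key_uri, fixed in slice_keys.items():
        lines.append(f"<{key_uri}> a qb:SliceKey ; qb:componentProperty {', '.join(fixed)} .")
    return "\n".join(lines)

# Hypothetical slice keys: station, station+year, station+year+month.
KEYS = {
    "http://example.org/key/station": ["sat:station"],
    "http://example.org/key/year": ["sat:station", "sat:year"],
    "http://example.org/key/month": ["sat:station", "sat:year", "sat:month"],
}
ttl = dsd_turtle("http://example.org/dsd",
                 ["sat:station", "sat:year", "sat:month", "sat:day"],
                 ["sat:minTemperature", "sat:maxTemperature", "sat:rainfall"],
                 ["sat:unit"], KEYS)
```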
  17. 5. Select appropriate URIs
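One benefit of hierarchical URIs is that the fixed dimensions of a slice can be read straight from its address, and the parent slice found by trimming it. A sketch using a regular expression inferred from the slice URI in the SPARQL result later in the deck; the pattern is an assumption, not a documented URI policy, and the function names are hypothetical:

```python
import re

# Inferred pattern: .../slice/station/<id>[/year/<yyyy>[/month/<mm>]]
SLICE_RE = re.compile(
    r"^http://lab\.environment\.data\.gov\.au/data/acorn/climate/slice"
    r"/station/(?P<station>\d+)"
    r"(?:/year/(?P<year>\d{4})"
    r"(?:/month/(?P<month>\d{2}))?)?$"
)

def fixed_dimensions(uri):
    """Recover the fixed dimensions of a slice from its URI; None if the URI does not match."""
    m = SLICE_RE.match(uri)
    if m is None:
        return None
    return {k: v for k, v in m.groupdict().items() if v is not None}

def parent_slice(uri):
    """Drop the last /dimension/value pair: month slice -> year slice -> station slice."""
    dims = fixed_dimensions(uri)
    if not dims or len(dims) == 1:
        return None  # a station slice (or a non-slice URI) has no parent slice here
    return uri.rsplit("/", 2)[0]
```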
  18. (Extra) Statistics at slice level
      • To port to DDI-RDF Discovery
  19. Station metadata
      • Data describing the deployment history
        – Available in the ACORN-SAT station catalogue (PDF)
        – Not available in the tabular distribution
        – ACORN-SAT composite stations: composed of one or several BoM stations
        – BoM (Bureau of Meteorology) stations: composed of one or several stations sharing the same code
        – Textual description of significant events
      • Data describing the detailed conditions of observations
        – Sensors
        – Deployment intervals
      … using the Semantic Sensor Network (SSN) ontology
      • SSN-XG report http://www.w3.org/2005/Incubator/ssn/XGR-ssn/
      • SSN Ontology http://purl.oclc.org/NET/ssnx/ssn
  20. SSN: deployed systems and observations
      [Diagram of the SSN modules Skeleton, Device, Deployment, PlatformSite and System: classes ssn:System, ssn:DeploymentRelatedProcess, ssn:Deployment, ssn:Platform, ssn:Device, ssn:Sensor, ssn:SensingDevice, ssn:Property and ssn:Observation, linked by onPlatform, hasSubSystem, hasDeployment, deploymentProcessPart, deployedSystem, deployedOnPlatform, attachedSystem, observes, inDeployment, observedBy and observedProperty]
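Applied to ACORN-SAT, one plausible reading of this skeleton treats a composite station as an ssn:System whose subsystems are BoM stations, each deployed on a site (ssn:Platform). The sketch below emits Turtle for that pattern using only SSN terms named on this slide; the modelling choice, the function name and all example.org URIs are assumptions, not the published ACORN-SAT mapping:

```python
SSN = "http://purl.oclc.org/NET/ssnx/ssn#"

def deployment_turtle(composite, stations, deployed, deployment, platform):
    """Describe a composite station as an ssn:System made of BoM stations (subsystems),
    plus one deployment of the `deployed` station on a platform (site)."""
    lines = [f"@prefix ssn: <{SSN}> .", "", f"<{composite}> a ssn:System ."]
    for station in stations:
        lines.append(f"<{composite}> ssn:hasSubSystem <{station}> .")
    lines += [
        f"<{deployed}> ssn:hasDeployment <{deployment}> .",
        f"<{deployment}> a ssn:Deployment ;",
        f"    ssn:deployedSystem <{deployed}> ;",
        f"    ssn:deployedOnPlatform <{platform}> .",
        f"<{platform}> a ssn:Platform .",
    ]
    return "\n".join(lines)

ttl = deployment_turtle(
    "http://example.org/acorn/darwin",
    ["http://example.org/bom/014016", "http://example.org/bom/014015"],
    "http://example.org/bom/014015",
    "http://example.org/deployment/airport",
    "http://example.org/site/airport",
)
```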
  21. Example (Darwin)
      Time series – Weather stations – Sites – (Sensors)
      • Darwin Post Office: 014016 (1910–1942)
      • Darwin Airport: 014015 (1941–2007 and 2001–now)
      • 2 sites, 1 km apart, same code used
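The overlapping periods are why the deployment history matters: for some years more than one record applies. A minimal sketch that models the Darwin periods listed on this slide as year intervals and returns those covering a given year; "now" is represented as an open end, the site labels are descriptive placeholders, and whole years simplify the real deployment dates:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Deployment:
    station: str          # BoM station number
    site: str             # descriptive site label (placeholder)
    start: int            # first year
    end: Optional[int]    # last year; None means still running ("now")

# Periods as listed on the Darwin slide.
DARWIN = [
    Deployment("014016", "Darwin Post Office", 1910, 1942),
    Deployment("014015", "Darwin Airport (site 1)", 1941, 2007),
    Deployment("014015", "Darwin Airport (site 2)", 2001, None),
]

def active_in(deployments: List[Deployment], year: int) -> List[Deployment]:
    """Deployments whose interval covers the given year (inclusive at both ends)."""
    return [d for d in deployments
            if d.start <= year and (d.end is None or year <= d.end)]
```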
  22. Deployment phases in Darwin
  23. Multiple Views on Data – Mashups
      • Display the station locations and their average temperature readings on a map
        – http://lab.environment.data.gov.au/mashup/drilldown
      • Select a date range for climate readings for a given location
        – http://lab.environment.data.gov.au/mashup
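The drill-down map needs, per station, an average over the readings in the selected date range. A minimal sketch of that aggregation, assuming readings have already been fetched (for example through the Linked Data API's JSON output) as (station, date, max temperature) tuples; the tuple layout, function name and values are illustrative, since the mashup's own code is not shown in the slides:

```python
from collections import defaultdict
from datetime import date

def station_averages(readings, start, end):
    """readings: iterable of (station, date, max_temperature).
    Returns {station: mean value} over [start, end], skipping missing (None) values."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for station, day, value in readings:
        if value is None or not (start <= day <= end):
            continue
        sums[station] += value
        counts[station] += 1
    return {s: sums[s] / counts[s] for s in sums}

# Illustrative values only.
readings = [
    ("A", date(2000, 1, 1), 30.0),
    ("A", date(2000, 1, 2), 32.0),
    ("A", date(2000, 2, 1), 40.0),   # outside the January range below
    ("B", date(2000, 1, 1), None),   # missing reading
    ("B", date(2000, 1, 3), 20.0),
]
avg = station_averages(readings, date(2000, 1, 1), date(2000, 1, 31))
```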
  24. Multiple Views on Data – ELDA Linked Data API
      [Figure annotated with ssn:hasSubSystem, ssn:hasDeployment, ssn:deploymentProcessPart and ssn:observedBy]
  25. Multiple Views on Data – SPARQL
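Slide 7 lists R via SPARQL as one access route; any HTTP client can do the same with the SPARQL 1.1 Protocol, which sends the query text as a `query` parameter. A sketch in Python with a placeholder endpoint URL, because the slides do not give the address of the ACORN-SAT SPARQL endpoint; `sparql_get` is a hypothetical helper name:

```python
import urllib.parse
import urllib.request

def sparql_get(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol query-via-GET request asking for JSON results.
    Nothing is sent until urllib.request.urlopen(req) is called."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})

# Placeholder endpoint: not the real ACORN-SAT address.
req = sparql_get("http://example.org/sparql", "SELECT * WHERE { ?s ?p ?o } LIMIT 1")
```

With a real endpoint, `json.load(urllib.request.urlopen(req))["results"]["bindings"]` would give one dict per result row, following the SPARQL 1.1 JSON results format.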
  26. Multiple Views on Data – SPARQL
      Query: the highest maximum temperature recorded in any July at station 086071
      PREFIX cube: <http://purl.org/linked-data/cube#>
      PREFIX sat: <http://lab.environment.data.gov.au/def/acorn/sat/>
      SELECT ?x (MAX(?max) AS ?MaxEver)
      WHERE {
        <http://lab.environment.data.gov.au/data/acorn/climate/slice/station/086071> cube:subSlice ?y .
        ?y cube:subSlice ?x .
        ?x sat:month ?z .
        ?x cube:observation ?obs .
        ?obs sat:maxTemperature ?max .
        FILTER regex(str(?z), "07")
      }
      GROUP BY ?x
      ORDER BY DESC(?MaxEver)
      LIMIT 1
      RESULT: http://lab.environment.data.gov.au/data/acorn/climate/slice/station/086071/year/1975/month/07  23.3
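To make the query's logic concrete, here is the same computation over an in-memory list: keep the month slices whose month is "07", take the highest maximum temperature within each slice, sort descending and keep the first. The rows, slice URIs and values are illustrative, not ACORN-SAT data, and exact string equality stands in for the regex filter:

```python
def hottest_slice(rows, month):
    """rows: (month_slice_uri, slice_month, max_temperature), one per observation.
    Keep slices for `month`, take the highest value per slice, return the top (uri, value)."""
    best = {}
    for slice_uri, slice_month, value in rows:
        if slice_month != month:
            continue  # plays the role of the FILTER on sat:month
        best[slice_uri] = max(value, best.get(slice_uri, value))  # MAX per slice
    if not best:
        return None
    # ORDER BY DESC(...) LIMIT 1
    return max(best.items(), key=lambda item: item[1])

# Illustrative rows only.
rows = [
    ("http://example.org/slice/1990/07", "07", 20.1),
    ("http://example.org/slice/1990/07", "07", 21.4),
    ("http://example.org/slice/1991/07", "07", 19.8),
    ("http://example.org/slice/1991/01", "01", 35.0),
]
top = hottest_slice(rows, "07")
```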
  28. Wrap up
      • Experimental version of ACORN-SAT data
        – Available at http://lab.environment.data.gov.au/
        – Developed for the Australian Bureau of Meteorology (BoM) by CSIRO in cooperation with the Australian Government Information Management Office (AGIMO)
        – Temperature (homogenised) plus rainfall (not homogenised)
        – First version presented at Australian GovHack Day
      • Alternative to tabular data
        – Latest version uploaded to the LOD cloud: http://thedatahub.org/dataset/acorn-sat
      • Linked data (and well-managed URIs) to build bridges between agencies
        – The current linked data pilot involves one agency (BoM) and one server, but applies solutions and schemes already used in multi-agency, multi-service-provider contexts (e.g. UK)
      • Thanks to AGIMO for helping us to set up http://lab.environment.data.gov.au/
  29. Use It! http://michaelhalls.net/planforsun/index.php
  30. Australian Government Linked Data Working Group (AGLDWG)
      • Ad-hoc group established August 2012
        – BoM, OSP, CSIRO, AGIMO, DRALGAS, NAA, GA, ABS
      • Terms of reference
        – Develop technical guidelines and best practice on the use of ‘linked-data’ by AG agencies
        – Inform the development of data.gov.au as a platform for publishing Commonwealth PSI
        – Promote the benefits and encourage adoption of ‘linked-data’ for publishing Commonwealth PSI
        – Where appropriate, undertake specific activities and coordinate projects in pursuit of these objectives
      • Seeking formal endorsement
  31. Conclusions
      • Approach is applicable to all climate time series
      • Opportunities to link to other datasets (Australia, world)
        – Geo-features (e.g. GeoNames, done) for weather station sites, districts
        – Other climate data, e.g. regional and world climate data archives, cyclone tracks (not yet available as linked data)
        – Other environmental data (not yet available as linked data)
  32. ISWC 2013
      • The 12th International Semantic Web Conference and the 1st Australasian Semantic Web Conference, 21–25 October 2013, Sydney, Australia
        – http://iswc2013.semanticweb.org/
        – https://twitter.com/iswc2013
      • First International Workshop on Semantic Statistics (SemStats 2013)
      • SemStats 2013 Challenge
        – Call for Papers: http://datalift.org/en/event/semstats2013/challenge-cfp
        – Data: http://datalift.org/en/event/semstats2013/challenge
  33. Thank you
      CSIRO Computational Informatics
      Laurent Lefort, Ontologist
      t +61 2 9123 4567
      e laurent.lefort@csiro.au
      w csiro.au
  34. Image credits
      • Blair Trewin: The ACORN-SAT station at Butlers Gorge in central Tasmania (surfacetemperatures.blogspot.com.au)
      • Nathanael Boehm
  35. More information
      • Laurent Lefort, Josh Bobruk, Armin Haller, Kerry Taylor and Andrew Woolf. A Linked Sensor Data Cube for a 100 Year Homogenised Daily Temperature Dataset. Proc. SSN 2012.
