#4 FAIR - Provenance as an element of FAIR data principles - 20-09-17

Margie Smith
Full Webinar: https://youtu.be/EDhJTCm9RN8

Transcript: https://www.slideshare.net/AustralianNationalDataService/transcript-4-fair-r-for-reusable

Other webinars in the series: http://www.ands.org.au/news-and-events/events/fair-webinar-series

  1. Provenance as an element of FAIR data principles: enabling data reuse. Margie Smith, Science Data Governance & Policy, Science Data Section
  2. Data governance and policy (ANDS FAIR webinar series #4 – September 2017): Data Governance Committee, Data Strategy, Data Management Policy, Data Archive Policy, …; Product Management Plans, Data Management Plans, source catalogue, standardised vocabularies, publishing schemas, …
  3. Why Geoscience Australia (GA) cares about data re-use: understanding the provenance of the data that GA creates and consumes enables the organisation to adhere to its science principles and underpins its vision to 'maximise our data potential' (http://www.ga.gov.au/about/corporate-plan).
  4. What does provenance information look like? As part of a metadata record, the information can be brief free text or structured free text, e.g. "Pilbara Block 1:100 000 Landsat-5-TM image maps. Image files in BIL format."
  5. What does provenance information look like? It can be discursive text, for example: "The ANUGA hydrodynamic model (https://anuga.anu.edu.au/) was run based on a Digital Elevation Model (DEM) and inputs from a regional storm surge model (GEMS GCOM2D). The maximum inundation depth and momentum values were identified in ArcGIS post-processing. DEM used within ANUGA: a triangular mesh created by/within ANUGA from a regular grid (1 m horizontal resolution). The input grid was based on elevation data of varying accuracy: onshore and offshore LiDAR, Navy soundings and the 1 second SRTM DEM. The derived triangular mesh consisted of smaller triangles (max 5 m²) around the man-made drainage channels and larger triangles (max 350 m²) around the remainder of the study region. Regional storm input: temporal inputs (i.e. storm characteristics through the simulation time) were extracted from the regional storm modelling (GEMS GCOM2D model) results for point locations along the Busselton-Dunsborough coastline. ANUGA model variables: some key variables set within the Python code were minimum_storable_height = 0.10 m, Manning's coefficient of friction = 0.03, 12-minute modelling time steps, and 64 CPUs (results varied depending on the number of CPUs specified; the 64-CPU results were in the middle of the range tested, from 8 to 128 CPUs). Broader detail of the methods applied within this project is in the technical methodology document. Also see the GA Professional Opinion (Coastal inundation modelling for Busselton, Western Australia, under current and future climate) (http://pid.geoscience.gov.au/dataset/78873)."
  6. Why we need provenance. Scenario: advice to the public was generated based on a collection of sensor data at a point in time. [Diagram: "Advice is generated", with labels for the advice request, Dataset A, the Dataset A temporal subset, the agent, the models and algorithms used, the software version, HPRM and eCat.]
  7. Nick Car gave a presentation previously: https://youtu.be/elPcKqWoOPg
  8. Provenance for data re-use. [Diagram: provenance graph with labels Process, Dataset A, acquisition, Temporal DB, Event, code / query, GitHub, HPRM, eCat, Output(s), Advice and Report; PROV terms shown: prov:Entity, prov:Activity, prov:Plan, wasGeneratedBy.]
  9. FAIR principles. To be re-usable:
      R1. (meta)data have a plurality of accurate and relevant attributes.
        • R1.1. (meta)data are released with a clear and accessible data usage license.
        • R1.2. (meta)data are associated with their provenance.
        • R1.3. (meta)data meet domain-relevant community standards.
      Source: https://www.force11.org/fairprinciples
  10. What else we are doing at GA:
      • We have moved from an Oracle-based 'GeoCat' catalogue to our current 'eCat', which was made public last month.
      • eCat was released as a minimum viable product; improvements are now being backlogged and prioritised alongside the business-as-usual (BAU) work of product release.
      • We are currently cataloguing our 300+ services and linking them to the corresponding data records in eCat where these exist (some services are based on aggregated datasets or non-GA datasets).
      • The catalogue schema and codelists will be published next month.
      • The processes for releasing/publishing data products are well described and generally well known in the organisation.
  11. GA Data and Publications Catalogue - eCat
  12. GA Data and Publications Catalogue - eCat
  13. GA Data and Publications Catalogue - eCat: http://pid.geoscience.gov.au/id/dataset/ga/72759
  14. GA Data and Publications Catalogue - eCat
  15. How to support provenance and data reuse: a 'source catalogue' for the data acquisition phase; eCat for publishing the data products; software and object catalogues in the future.
  16. Standards on provenance. "Machine readable" could be an ISO 19115 metadata statement per dataset contributing to a PROV-DM provenance graph. [Diagram: Source Catalogue records (Dataset, Service, Report; Record 1..n) linked by derivedFrom to product records in eCat (data products and products/subsets of data; Record 1..n).]
  17. Standards on provenance: license aggregation. A derived or aggregated dataset will inherit a license from its ancestor(s). [Diagram: ancestors Dataset A (CC-By), Dataset B (Commercial), Dataset D (Commercial), Software C and a WMS (CC-By) feeding a derived/aggregated dataset; licences listed: CC-By, CiC, …]
  18. Data management prioritisation. [Chart with axes labelled Usability and High Value.]
  19. Thank you. Margie.smith@ga.gov.au
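
The PROV terms on slides 8 and 16 (prov:Entity, prov:Activity, prov:Plan, wasGeneratedBy, derivedFrom) can be illustrated with a small in-memory graph. The following is a minimal sketch in plain Python, not GA's implementation and not the W3C `prov` library; the node identifiers (dataset_a, run_model, advice and so on) are hypothetical, loosely modelled on the advice scenario of slide 6. It also simplifies PROV: the `hadPlan` edge is attached directly to the activity, whereas in PROV-O proper it hangs off a qualified Association. The point is to answer the question provenance exists for: which sources, plans and agents lie behind a given output.

```python
from dataclasses import dataclass, field


@dataclass
class ProvGraph:
    """Minimal PROV-DM-style graph (hypothetical sketch, not a full PROV implementation)."""

    types: dict[str, str] = field(default_factory=dict)  # node id -> PROV type
    relations: list[tuple[str, str, str]] = field(default_factory=list)  # (subject, relation, object)

    def add(self, node_id: str, prov_type: str) -> None:
        self.types[node_id] = prov_type

    def relate(self, subject: str, relation: str, obj: str) -> None:
        # Both ends must be declared first so the graph stays consistent.
        if subject not in self.types or obj not in self.types:
            raise KeyError(f"undeclared node in: {subject} {relation} {obj}")
        self.relations.append((subject, relation, obj))

    def lineage(self, node_id: str) -> set[str]:
        """Return every node that node_id depends on, directly or indirectly.

        Every relation used here points from the dependent node to what it
        depends on (entity wasGeneratedBy activity, activity used entity, ...),
        so following subject -> object walks back through the history.
        """
        seen: set[str] = set()
        stack = [node_id]
        while stack:
            current = stack.pop()
            for subject, _relation, obj in self.relations:
                if subject == current and obj not in seen:
                    seen.add(obj)
                    stack.append(obj)
        return seen


# Hypothetical version of the slide 6 scenario.
g = ProvGraph()
g.add("dataset_a", "prov:Entity")
g.add("dataset_a_subset", "prov:Entity")  # temporal subset of Dataset A
g.add("model_code_v1.2", "prov:Plan")  # e.g. code/query versioned in GitHub
g.add("duty_scientist", "prov:Agent")
g.add("run_model", "prov:Activity")
g.add("advice", "prov:Entity")

g.relate("dataset_a_subset", "wasDerivedFrom", "dataset_a")
g.relate("run_model", "used", "dataset_a_subset")
g.relate("run_model", "hadPlan", "model_code_v1.2")  # simplified, see lead-in
g.relate("run_model", "wasAssociatedWith", "duty_scientist")
g.relate("advice", "wasGeneratedBy", "run_model")

print(sorted(g.lineage("advice")))
# ['dataset_a', 'dataset_a_subset', 'duty_scientist', 'model_code_v1.2', 'run_model']
```

Storing relations as plain triples keeps the sketch close to how PROV is serialised (for example as RDF); a production system would more likely use a PROV library or a triple store and the full qualified-relation patterns.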

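Slide 17's point that a derived or aggregated dataset inherits licensing from its ancestors can be sketched as a traversal over derivedFrom links that collects every ancestor license. This is a minimal sketch under stated assumptions: the dataset names and license labels follow the slide, but the `LICENSES` and `DERIVED_FROM` tables, the links between them, and the rule that every ancestor license is carried forward (rather than resolved into one) are hypothetical, because the slide does not say how conflicts such as CC-By versus Commercial are resolved. Software C is left out because the slide gives it no license.

```python
from collections import deque

# Hypothetical declared licenses, using the labels shown on slide 17.
LICENSES = {
    "dataset_a": "CC-By",
    "dataset_b": "Commercial",
    "dataset_d": "Commercial",
    "wms": "CC-By",
}

# Hypothetical derivedFrom links: child -> its direct ancestors.
DERIVED_FROM = {
    "aggregated_product": ["dataset_a", "dataset_b"],
    "dataset_b": ["dataset_d"],
    "wms": ["aggregated_product"],  # assume the WMS serves the aggregated product
}


def inherited_licenses(dataset, licenses, derived_from):
    """Collect the license of the dataset and of every ancestor reachable via derivedFrom."""
    result = set()
    queue = deque([dataset])
    visited = {dataset}  # guards against cycles in badly formed lineage
    while queue:
        current = queue.popleft()
        if current in licenses:
            result.add(licenses[current])
        for parent in derived_from.get(current, []):
            if parent not in visited:
                visited.add(parent)
                queue.append(parent)
    return result


def license_conflicts(dataset, licenses, derived_from):
    """Return inherited licenses that differ from the dataset's own declared license."""
    own = licenses.get(dataset)
    return inherited_licenses(dataset, licenses, derived_from) - {own}


print(sorted(inherited_licenses("wms", LICENSES, DERIVED_FROM)))  # ['CC-By', 'Commercial']
print(license_conflicts("wms", LICENSES, DERIVED_FROM))  # {'Commercial'}
```

In this hypothetical setup the WMS is declared CC-By but sits downstream of commercially licensed data, which is exactly the kind of mismatch that recorded provenance makes visible.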