Metadata synchronisation with GeoNetwork - a user’s perspective

Metadata synchronisation with GeoNetwork - a user’s perspective: making metadata great again.
Presented at the ANDS facilitated GeoNetwork Community of Practice on April 3rd, 2017 in Canberra.

Published in: Science

  1. Metadata Synchronisation with GeoNetwork - a user’s perspective: Making Metadata Great Again
  2. What we needed to do:
     • Programmatically synchronise metadata between catalogue and files (both directions)
     • Automagically create large numbers of metadata records populated with values drawn from diverse sources (e.g. in-house databases, spreadsheets, text files, floppy disks, papyrus scrolls, stone tablets, etc.)
     • Update spatial information in metadata record from dataset
     • Update online distribution linkage metadata from authorised distributions (e.g. THREDDS at NCI)
     • Demonstrate practical querying of the catalogue for real-time operational usage
     NO REPLICATION WITHOUT SYNCHRONISATION!
     GeoNetwork Users’ Group
  3. Issues which needed to be resolved
     • The task was initially given as file format translation only, with no mention of metadata. I was a metadata noob who had to ask lots of stupid questions.
     • Data is hosted externally to GA, at the NCI.
     • No “hard” linkages existed between datasets and their metadata records. Needed to find records using “soft” keys like title words and filenames.
     • Some datasets have existing records; some new ones don’t.
     • GA’s eCat GeoNetwork catalogue implementation went live in April 2016, creating an immediate and pressing support backlog.
     • GA’s heroic and knowledgeable eCat gurus (Andy, Marty, Belle & Aaron) were (are?) oversubscribed.
  4. Issues which needed to be resolved (continued)
     (You know you’re in trouble when the issues run to two slides)
     • GA is an early adopter of ISO 19115-3.
     • The CSW API for updating data is relatively cumbersome and requires identity management and authentication. The GeoNetwork API is also needed to change the status of edited records. Too damn hard.
     • Centralised identity management & authentication is still a work in progress at GA.
     • Many metadata update operations are relatively complex and best handled with direct manipulation of the metadata XML in Python scripts.
     • CSW querying is user-unfriendly.
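
As an illustration of the "direct manipulation of the metadata XML in Python scripts" mentioned on slide 4, here is a minimal sketch that rewrites the geographic bounding box in a record fragment using only the standard library. The namespace URIs, the trimmed `SAMPLE` fragment and the function name `update_bounding_box` are assumptions made for this example; they are not taken from the presentation, and real eCat records are far larger.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for ISO 19115-3; verify them against a real record.
NS = {
    "gex": "http://standards.iso.org/iso/19115/-3/gex/1.0",
    "gco": "http://standards.iso.org/iso/19115/-3/gco/1.0",
}

# Hypothetical, heavily trimmed fragment used only for illustration.
SAMPLE = """<gex:EX_GeographicBoundingBox
    xmlns:gex="http://standards.iso.org/iso/19115/-3/gex/1.0"
    xmlns:gco="http://standards.iso.org/iso/19115/-3/gco/1.0">
  <gex:westBoundLongitude><gco:Decimal>0</gco:Decimal></gex:westBoundLongitude>
  <gex:eastBoundLongitude><gco:Decimal>0</gco:Decimal></gex:eastBoundLongitude>
  <gex:southBoundLatitude><gco:Decimal>0</gco:Decimal></gex:southBoundLatitude>
  <gex:northBoundLatitude><gco:Decimal>0</gco:Decimal></gex:northBoundLatitude>
</gex:EX_GeographicBoundingBox>"""


def update_bounding_box(root, west, east, south, north):
    """Overwrite the four bounds of every bounding box under root.

    Returns the number of values that were rewritten.
    """
    values = {
        "westBoundLongitude": west,
        "eastBoundLongitude": east,
        "southBoundLatitude": south,
        "northBoundLatitude": north,
    }
    updated = 0
    # Element.iter() also visits root itself, so a bare fragment works too.
    for bbox in root.iter("{%s}EX_GeographicBoundingBox" % NS["gex"]):
        for name, value in values.items():
            node = bbox.find("gex:%s/gco:Decimal" % name, NS)
            if node is not None:
                node.text = str(value)
                updated += 1
    return updated


if __name__ == "__main__":
    record = ET.fromstring(SAMPLE)
    update_bounding_box(record, 148.206, 148.522, -36.022, -35.31)
    print(ET.tostring(record, encoding="unicode"))
```

The values passed in the usage example are the `geospatial_*` extents from the sample netCDF header on slide 8.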
  5. Metadata synchronisation workflow explained in one simple diagram
  6. Overview of Metadata Synchronisation Workflow
     1. Convert datasets to standards-compliant netCDF-CF with ACDD metadata attributes
     2. Generate unique identifiers (UUID, eCat ID, DOI) for each dataset and write these into the files
     3. Create new XML metadata records using values drawn from dataset and other specified sources, and bulk-ingest them into eCat
     4. Populate ACDD metadata attributes in netCDF files from values in eCat records
     5. Generate updated XML metadata records with updated extents and valid URLs for online distributions, and bulk-ingest them into eCat
     6. Repeat steps 4 and 5 as required
  7. Key factors which make it work
     • UUID is written into dataset in order to establish a “hard” link to the associated metadata record
     • Bulk ingestion of new or updated records into eCat becomes a simple, semi-automatic operation. If it validates, it’s good.
     • The complete, unfiltered, internally-visible metadata record must be accessed for updating (as opposed to the filtered externally-visible one)
     • Workflow is modular and tuneable, and adaptable to different collections. Already successfully applied to geophysics, bathymetry and elevation datasets.
     • Simple tools have been provided to users to easily leverage CSW queries (csw_find)
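
The “hard” link described on slide 7 can be sketched as follows: generate a UUID (or reuse one that already exists) and derive the identifier attributes that would be written into the dataset. This is only a sketch under stated assumptions. The function name `identifier_attributes` is hypothetical, the resolver prefix is copied from the `metadata_link` value in the sample header on slide 8, and actually writing the attributes into a netCDF file (which needs a netCDF library) is left out.

```python
import uuid

# Prefix copied from the metadata_link attribute in the sample header;
# treating it as a fixed resolver prefix is an assumption.
PID_PREFIX = "https://pid.nci.org.au/dataset/"


def identifier_attributes(existing_uuid=None):
    """Return the uuid and metadata_link attributes for a dataset.

    An existing UUID is validated and reused so that a dataset keeps
    pointing at the same metadata record across repeated runs.
    """
    if existing_uuid:
        dataset_uuid = str(uuid.UUID(existing_uuid))  # raises ValueError if malformed
    else:
        dataset_uuid = str(uuid.uuid4())
    return {
        "uuid": dataset_uuid,
        "metadata_link": PID_PREFIX + dataset_uuid,
    }


if __name__ == "__main__":
    print(identifier_attributes("c85d7857-f031-4b16-9917-e8b732e3e950"))
    print(identifier_attributes())
```

Reusing an existing UUID rather than always minting a new one is what lets steps 4 and 5 of the workflow be repeated without breaking the link.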
  8. Sample auto-created netCDF header with ACDD attributes
     // global attributes:
     :GDAL = "GDAL 1.11.1, released 2014/09/24" ;
     :survey_id = "409" ;
     :ecat_id = 105000 ;
     :geospatial_lon_min = 148.206 ;
     :geospatial_lon_resolution = 0.00399999999999068 ;
     :geospatial_lat_max = -35.31 ;
     :geospatial_bounds_crs = "GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]]" ;
     :geospatial_lat_min = -36.022 ;
     :geospatial_lat_resolution = 0.00399999999999778 ;
     :geospatial_lat_units = "degrees_north" ;
     :geospatial_lon_units = "degrees_east" ;
     :geospatial_bounds = "POLYGON((148.4480 -36.0202, 148.2227 -35.9990, 148.2173 -35.9931, 148.2086 -35.3141, 148.2154 -35.3099, 148.5179 -35.4566, 148.5221 -35.4634, 148.5228 -36.0172, 148.5172 -36.0228, 148.4480 -36.0202))" ;
     :geospatial_lon_max = 148.522 ;
     :uuid = "c85d7857-f031-4b16-9917-e8b732e3e950" ;
     :title = "Total Magnetic Intensity (TMI) grid of Canberra-Wagga Wagga, ACT/NSW, 1973/74 survey" ;
     :source = "This mNSW0409.nc grid includes airborne-derived TMI data for the Canberra-Wagga Wagga, ACT/NSW, 1973/74 survey acquired for the geological survey of ACT, NSW" ;
     :summary = "Total magnetic intensity (TMI) data measures variations in the intensity of the Earth magnetic field caused by the contrasting content of rock-forming minerals in the Earth crust. Magnetic anomalies can be either positive (field stronger than normal) or negative (field weaker) depending on the susceptibility of the rock. The data are processed via standard methods to ensure the response recorded is that due only to the rocks in the ground. The results produce datasets that can be interpreted to reveal the geological structure of the sub-surface. The processed data is checked for quality by GA geophysicists to ensure that the final data released by GA are fit-for-purpose. This magnetic grid has a cell size of 0.004 degrees (approximately 400m). The data used to produce this grid was acquired in 1975 by the ACT, NSW Government, and consisted of 24461 line-kilometres of data at 1500m line spacing and 150m terrain clearance." ;
     :product_version = "Version 2.0, April 2015" ;
     :history = "This mNSW0409.nc grid is an airborne-derived Total Magnetic Intensity (TMI) grid for the Canberra-Wagga Wagga, ACT/NSW, 1973/74 survey. The survey was acquired under the project No. 409 for the geological survey of ACT, NSW. The grid has a cell size of 0.004 degrees (approximately 400m). A total of 24461 line-kilometres of data at a line spacing of 1500m were acquired to produce this grid. To constrain long wavelengths in the grid, an independent data set, the Australia-wide Airborne Geophysical Survey (AWAGS) airborne magnetic data, was used to control the base levels of the survey grid (Milligan et al., 2009). This survey grid is essentially levelled to AWAGS. Details of the specifications of individual airborne surveys can be found in the Fourteenth Edition of the Index of Airborne Geophysical Surveys (Percival, 2014). This Index is also available online at http://www.ga.gov.au/metadata-gateway/metadata/record/gcat_f3ad4f15-96bc-0cf3-e044-00144fdd4fa6/Index+of+airborne+geophysical+surveys%3A+14th+edition. Further up to date information about individual surveys can also be obtained online from the Airborne Surveys Database at http://www.ga.gov.au/oracle/argus/. The original grid was converted from ERMapper (.ers) format to netCDF4_classic format using GDAL1.11.1. The main purpose of this conversion is to enable access to the data by relevant open source tools and software. The netCDF grid was created on 2016-08-29T10:51:42 and has its y-axis indexed Southward-positive. References: Milligan, P.R., Minty, B.R.S., Richardson, M. & Franklin, R., 2009. The Australia-wide Airborne Geophysical Survey accurate continental magnetic coverage. Preview, No. 138, p. 1-128. Percival, P.J., 2014. Index of airborne geophysical surveys (Fourteenth Edition)." ;
     :institution = "Commonwealth of Australia (Geoscience Australia)" ;
     :keywords = "TMI, magnetics, NCI, AU, Magnetism and Palaeomagnetism, Airborne Digital Data, Geophysical Survey, grid, 409" ;
     :license = "Creative Commons Attribution 4.0 International Licence" ;
     :time_coverage_start = "1973-04-09" ;
     :time_coverage_end = "1975-02-07" ;
     :doi = "http://dx.doi.org/10.4225/25/589c553856e0e" ;
     :metadata_link = "https://pid.nci.org.au/dataset/c85d7857-f031-4b16-9917-e8b732e3e950" ;
     :Conventions = "CF-1.6, ACDD-1.3" ;
     :date_created = "2013-05-03T00:00:00" ;
     :date_modified = "2016-08-29T10:51:42" ;
     }
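
To show how spatial extents of the kind seen in the header above might be derived for a metadata record, the sketch below computes the rectangular envelope of a single-ring WKT polygon such as the `geospatial_bounds` value. The function name `polygon_envelope` is hypothetical and the parser is deliberately minimal (no holes, no multipolygons); it is not the method used to produce the header.

```python
import re


def polygon_envelope(wkt):
    """Return (lon_min, lat_min, lon_max, lat_max) for a single-ring WKT POLYGON.

    Coordinates are assumed to be in "lon lat" order, as in the sample header.
    """
    match = re.match(r"\s*POLYGON\s*\(\(\s*(.*?)\s*\)\)\s*$", wkt,
                     re.IGNORECASE | re.DOTALL)
    if not match:
        raise ValueError("expected a single-ring POLYGON")
    points = [
        tuple(float(value) for value in pair.split())
        for pair in match.group(1).split(",")
    ]
    lons = [point[0] for point in points]
    lats = [point[1] for point in points]
    return min(lons), min(lats), max(lons), max(lats)


if __name__ == "__main__":
    print(polygon_envelope("POLYGON((0 0, 2 0, 2 1, 0 1, 0 0))"))
```

For the sample polygon the envelope is (148.2086, -36.0228, 148.5228, -35.3099), which is close to, but not identical with, the header's `geospatial_lon_min`/`geospatial_lat_min`/`geospatial_lon_max`/`geospatial_lat_max` values, so the two sets of attributes should not be assumed interchangeable.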
  9. csw_find examples: human-friendly metadata searches
     Find all NCI filenames & titles for potassium grids overlapping a geographic bounding box
     $ csw_find -k NCI,grid,potassium -b 148.996,-35.48,149.399,-35.124
     "FILE:GEO" "/g/data1/rr2/National_Coverages/radmap_v3_2015_unfiltered_pctk/radmap_v3_2015_unfiltered_pctk.nc" "radmap v3 2015 unfiltered pct potassium grid"
     "FILE:GEO" "/g/data1/rr2/National_Coverages/radmap_v3_2015_filtered_pctk/radmap_v3_2015_filtered_pctk.nc" "radmap v3 2015 filtered pct potassium grid"
     "FILE:GEO" "/g/data2/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/rad_survey_grids_levelled/potassium/rNSW1218k/rNSW1218k.nc" "Radiometric Potassium grid of Southeast Lachlan, NSW, 2010 survey"
     "FILE:GEO" "/g/data2/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/rad_survey_grids_levelled/potassium/rNSW0756k/rNSW0756k.nc" "Radiometric Potassium grid of NSW DMR, Discovery 2000, Area S, Braidwood, NSW 2001 survey"
     Find all WMS endpoints for all NCI data from survey ID 850
     $ csw_find -k NCI,grid,850 -p WMS -f url
     "http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/rad_survey_grids_levelled/thorium/rSA0850_A6t/rSA0850_A6t.nc"
     "http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/rad_survey_grids_levelled/potassium/rSA0850_A1k/rSA0850_A1k.nc"
     "http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/rad_survey_grids_levelled/potassium/rSA0850_A6k/rSA0850_A6k.nc"
     "http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/rad_survey_grids_levelled/uranium/rSA0850_A6u/rSA0850_A6u.nc"
     "http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/rad_survey_grids_levelled/uranium/rSA0850_A1u/rSA0850_A1u.nc"
     "http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/rad_survey_grids_levelled/potassium/rSA0850_851K/rSA0850_851K.nc"
     "http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/mag_survey_grids_levelled/mSA0850A6/mSA0850A6.nc"
     "http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/rad_survey_grids_levelled/thorium/rSA0850_A1t/rSA0850_A1t.nc"
     "http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/mag_survey_grids_levelled/mSA0850A5/mSA0850A5.nc"
     "http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/mag_survey_grids_levelled/mSA0850A4/mSA0850A4.nc"
     "http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/rad_survey_grids_levelled/uranium/rSA0850_851u/rSA0850_851u.nc"
     "http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/mag_survey_grids_levelled/mSA0850A7/mSA0850A7.nc"
     "http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/rad_survey_grids_levelled/thorium/rSA0850_851t/rSA0850_851t.nc"
     "http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/mag_survey_grids_levelled/mSA0850A1/mSA0850A1.nc"
     "http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/mag_survey_grids_levelled/mSA0850_851/mSA0850_851.nc"
     csw_find defaults to hitting GA’s externally-visible GeoNetwork CSW, but can be pointed to any CSW URL
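
For contrast with the csw_find one-liners above, the sketch below builds the kind of raw CSW 2.0.2 GetRecords request (key-value-pair encoding with a CQL text constraint) that a keyword search might translate to. It does not show how csw_find works internally. The function name `getrecords_url` and the endpoint `https://example.org/csw` are hypothetical, and the queryable name `csw:AnyText` is an assumption: queryable names and CQL support vary between CSW servers.

```python
from urllib.parse import urlencode


def getrecords_url(csw_url, keywords, max_records=10):
    """Build a CSW 2.0.2 GetRecords KVP URL that requires every keyword
    to appear somewhere in the record text."""
    # Single quotes inside a keyword are doubled so the CQL string stays valid.
    constraint = " AND ".join(
        "csw:AnyText LIKE '%{}%'".format(keyword.replace("'", "''"))
        for keyword in keywords
    )
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "full",
        "resultType": "results",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": constraint,
        "maxRecords": max_records,
    }
    return csw_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical endpoint; substitute a real CSW URL.
    print(getrecords_url("https://example.org/csw", ["NCI", "grid", "potassium"]))
```

Even this simplified request needs ten parameters and returns XML that still has to be parsed, which gives some sense of why slide 4 calls CSW querying user-unfriendly.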
  10. Still to do
     • Integrate tools into completely automated workflows (e.g. folder watchers, regular nightly/weekly synchronisation, etc.)
     • Support non-netCDF file formats (partially done)
     • Better implement MD5 checksums in metadata against data files for change detection (currently only in description)
     • Eliminate remaining manual steps in bulk ingestion/updating of metadata records
     • Demonstrate large-scale automated discovery and processing system
     • Implement Linked Data solutions to express relationships between entities (e.g. Datasets<->Surveys)
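
The MD5-based change detection listed above could look roughly like the following sketch, which hashes a data file in chunks and compares the result with a checksum held in the metadata. The function names `file_md5` and `has_changed` are hypothetical, and where the recorded checksum lives in the metadata record is left open, since the slides only say it is currently in the description.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks so that
    large grids do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path, recorded_md5):
    """Return True if the file no longer matches the recorded checksum."""
    return file_md5(path) != recorded_md5.strip().lower()
```

A nightly synchronisation job could call `has_changed` for each dataset and regenerate the metadata record only for files whose checksum differs.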
  11. Presented by Alex Ip (data@ga.gov.au). Thank you! Questions?