DOIs, provenance & vocabularies - Nicholas Car (CSIRO)

Presented at the ANDS-facilitated GeoNetwork Community of Practice meeting on April 3rd, 2017 in Canberra.

Published in: Science

  1. DOIs, provenance & vocabularies • Nicholas Car, Data Architect, nicholas.car@ga.gov.au • DOIs, Provenance & Vocabs
  2. Outline • Three different extensions to regular GN (GeoNetwork) use: (1) DOI and other identifier use; (2) provenance formulation and recording; (3) vocabulary use
  3. DOIs, other identifiers and GA • 64af9ff3-71dd-431a-bc94-9d2280acef79
  6. DOIs and other identifiers • GN uses UUIDs for records • Strengths: • Universally unique, so they are: • Able to be generated by or outside GN • Transferable • Indefinitely stable • Example: Alex can generate catalogue records using custom code and post them into GA's eCat. He can generate the UUIDs himself, rather than have eCat do it, so he knows what they are before submission. • Example: Jingbo can move records between catalogues at the NCI and still use the same UUIDs for them.
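A minimal sketch of Alex's workflow, where the UUID is minted client-side so it is known before the record is posted. It uses Python's standard `uuid` module; the `build_record_stub` helper and its dictionary fields are hypothetical placeholders, not eCat's or GeoNetwork's actual submission format (a real record would be ISO 19115 XML).

```python
import uuid


def new_record_id() -> str:
    """Mint a random (version 4) UUID locally, before any catalogue sees the record."""
    return str(uuid.uuid4())


def build_record_stub(title: str, record_id: str) -> dict:
    # Hypothetical record structure; only the identifier handling is the point here.
    return {"identifier": record_id, "title": title}


if __name__ == "__main__":
    rid = new_record_id()
    record = build_record_stub("Example dataset", rid)
    # The ID is known here, before submission, so it can be logged or cross-linked,
    # and the same value can travel with the record if it moves to another catalogue.
    print(record["identifier"])
```

Because version 4 UUIDs are random rather than issued by a central counter, any client can create them without coordinating with the catalogue, which is what makes them "able to be generated by or outside GN".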
  7. DOIs and other identifiers • GN uses UUIDs for records • Strengths: • Universally unique, so: • Able to be generated by or outside GN • Transferable • Indefinitely stable • Weaknesses: • Not meaningful • Not part of an identifier scheme • Not resolvable by themselves
  8. DOIs and other identifiers • GN uses UUIDs for records • Weaknesses: • Not meaningful • Not part of an identifier scheme • Not resolvable by themselves • Example: data.gov.au, which does not use GN, provides UUIDs and meaningful aliases for datasets, e.g. "Offshore reconnaissance geophysical techniques": • http://data.gov.au/dataset/cdecf261-84a7-4911-a645-2d7113e97d0b • http://data.gov.au/dataset/offshore-reconnaissance-geophysical-techniques
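The data.gov.au alias on this slide looks like a lower-cased, hyphenated form of the dataset title. The sketch below shows one way such an alias could be derived and published alongside the UUID URL; the slug rule is an assumption inferred from this single example, not data.gov.au's documented behaviour, and the function names are hypothetical.

```python
import re
from typing import Tuple

BASE = "http://data.gov.au/dataset/"


def slugify(title: str) -> str:
    """Lower-case the title and join its alphanumeric runs with hyphens (assumed rule)."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


def dataset_urls(uuid_str: str, title: str) -> Tuple[str, str]:
    # One opaque, stable URL (UUID-based) and one meaningful alias (title-based).
    return BASE + uuid_str, BASE + slugify(title)


uuid_url, alias_url = dataset_urls(
    "cdecf261-84a7-4911-a645-2d7113e97d0b",
    "Offshore reconnaissance geophysical techniques",
)
print(alias_url)  # http://data.gov.au/dataset/offshore-reconnaissance-geophysical-techniques
```

The alias addresses the "not meaningful" weakness, but it is only as stable as the title it was derived from, which is why the UUID form is kept as well.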
  9. DOIs and other identifiers • What are DOIs? • Persistent identifiers used to uniquely identify digital objects, standardised by ISO (ISO 26324) • Use the Handle network, so are highly persistent • Popular and widely understood • Have many convenience resolver systems, e.g. https://doi.org/{DOI} (https://doi.org/10.4225/25/58a3ff6e07d21) • IGSNs are another DOI-like identifier
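The `https://doi.org/{DOI}` resolver pattern can be applied mechanically. A small sketch follows; the function name is hypothetical, and the validation only checks the `10.` directory indicator and the prefix/suffix slash that every DOI has, not whether the DOI is actually registered.

```python
from urllib.parse import quote

RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI, or a 'doi:'-prefixed one, into a doi.org resolver URL."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    # Every DOI starts with the "10." directory indicator, then prefix "/" suffix.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, but keep "/" intact.
    return RESOLVER + quote(doi, safe="/")


print(doi_to_url("10.4225/25/58a3ff6e07d21"))  # https://doi.org/10.4225/25/58a3ff6e07d21
```

A UUID such as `64af9ff3-...` fails this check, which illustrates the "not part of an identifier scheme" point from the earlier slides: nothing in it says which resolver to use.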
  10. DOIs and other identifiers • GA uses DOIs for important datasets and our own eCat IDs for all datasets, e.g.: • "Radiometric Thorium Equivalent grid of Warrachie, SA" • UUID: 64af9ff3-71dd-431a-bc94-9d2280acef79 • eCat ID: 106850 • Our landing page: http://www.ga.gov.au/metadata-gateway/metadata/record/106850 • DOI: https://doi.org/10.4225/25/58a3ff6e07d21
  11. GA's DOI directions • Our eCat ID will remain our authoritative ID • Due to its embedded presence & simplicity • GN is configured to mint them • We will promote eCat IDs & other IDs like DOIs, not UUIDs • The GN landing page's "Permalink" button will reveal a DOI • If one exists for a record • If not, an eCat-based URI including the eCat ID • UUIDs only used under the hood • For GN functions like crosslinks • We may support other ID schemes in the future, like IGSNs • We require architecture outside GN for URI ID redirection
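The "Permalink" rule above (DOI if one exists, otherwise an eCat-based URI, never the UUID) can be sketched as a single function. The `Record` type is hypothetical, and the eCat URI pattern is an assumption that reuses the landing-page form shown on slide 10; GA's actual permalink URI may differ.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: the eCat-based fallback reuses the landing-page pattern from slide 10.
ECAT_BASE = "http://www.ga.gov.au/metadata-gateway/metadata/record/"


@dataclass
class Record:
    uuid: str                  # internal only, e.g. for GN crosslinks
    ecat_id: int               # GA's authoritative identifier
    doi: Optional[str] = None  # only some records have a DOI


def permalink(record: Record) -> str:
    """Prefer the DOI; otherwise fall back to an eCat-based URI. Never expose the UUID."""
    if record.doi:
        return "https://doi.org/" + record.doi
    return f"{ECAT_BASE}{record.ecat_id}"


warrachie = Record(
    uuid="64af9ff3-71dd-431a-bc94-9d2280acef79",
    ecat_id=106850,
    doi="10.4225/25/58a3ff6e07d21",
)
print(permalink(warrachie))  # https://doi.org/10.4225/25/58a3ff6e07d21
```

Keeping the choice in one place means that when a DOI is later minted for a record, its permalink changes without any other GN function (which keeps using the UUID) being affected.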
  12. Provenance
  13. GA's provenance model • We use PROV (the W3C provenance model)
  15. GA's provenance model • We use PROV • We do not use ISO19115 Lineage, which is: • Designed for satellite data processing • Limited to the history of the catalogued item only • Not a database/graph model (it is de-normalised with respect to many objects)
  16. GA's provenance model • We use PROV • We do not use ISO19115 Lineage • Some provenance is stored in our GN eCat • We also link across multiple systems • Example: GN → ARGUS • Datasets → surveys' metadata online
  17. GA's provenance model • We use PROV • We do not use ISO19115 Lineage • Some provenance is stored in our GN eCat • We also link across multiple systems • We have had to define our dataset-to-dataset provenance relationships in ISO19115: • PROV: wasDerivedFrom • ISO19115-1: AssociationTypeCode dependency • PROV: wasRevisionOf • ISO19115-1: AssociationTypeCode revisionOf • PROV: hadPrimarySource • ISO19115-1: AssociationTypeCode source
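The three correspondences on this slide amount to a one-to-one lookup table, so they can be applied in both directions: PROV to ISO when writing a GN record, ISO to PROV when reading one back. A minimal sketch follows; the function names are hypothetical, and only the three mappings listed on the slide are included.

```python
# PROV property -> ISO19115-1 AssociationTypeCode value, as listed on this slide.
PROV_TO_ISO = {
    "wasDerivedFrom": "dependency",
    "wasRevisionOf": "revisionOf",
    "hadPrimarySource": "source",
}
# The mapping is one-to-one, so it can be inverted without losing information.
ISO_TO_PROV = {iso: prov for prov, iso in PROV_TO_ISO.items()}


def to_iso_association(prov_property: str) -> str:
    """Translate a PROV dataset-to-dataset relation into an ISO association type."""
    try:
        return PROV_TO_ISO[prov_property]
    except KeyError:
        raise ValueError(f"no ISO mapping defined for prov:{prov_property}") from None


def to_prov_property(association_type: str) -> str:
    """Reverse direction, e.g. when reading associations back out of a GN record."""
    try:
        return ISO_TO_PROV[association_type]
    except KeyError:
        raise ValueError(f"no PROV mapping defined for {association_type!r}") from None
```

Raising on unmapped terms, rather than silently dropping them, makes gaps such as the Dataset-to-Activity case on the next slide visible.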
  18. GA's provenance model • We use PROV • We do not use ISO19115 Lineage • Some provenance is stored in our GN eCat • We also link across multiple systems • We have had to define our dataset-to-dataset provenance relationships in ISO19115 • We can have dataset-to-other-thing relationships • ARGUS example: • PROV: Dataset prov:wasGeneratedBy Activity • ISO19115-1: Dataset ? Activity (not in GN)
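Outside GN, the `Dataset prov:wasGeneratedBy Activity` statement is easy to express directly in RDF. The sketch below writes it as Turtle using plain string formatting to stay dependency-free (an RDF library such as rdflib would be the more robust choice). The `prov:` namespace and the `prov:Entity`/`prov:Activity` classes are from PROV-O; the function name and the example.org URIs are hypothetical stand-ins for real eCat and ARGUS identifiers.

```python
def generated_by_turtle(dataset_uri: str, activity_uri: str) -> str:
    """Serialise 'Dataset prov:wasGeneratedBy Activity' as a small Turtle document."""
    return "\n".join([
        "@prefix prov: <http://www.w3.org/ns/prov#> .",
        "",
        f"<{dataset_uri}> a prov:Entity ;",
        f"    prov:wasGeneratedBy <{activity_uri}> .",
        "",
        f"<{activity_uri}> a prov:Activity .",
    ])


# Hypothetical URIs: a catalogued dataset and the survey activity that produced it.
print(generated_by_turtle(
    "http://example.org/dataset/1",
    "http://example.org/argus/survey/1",
))
```

Because the activity is a first-class node with its own URI, the same survey can be linked from many datasets, which is the graph-style provenance that ISO19115 Lineage does not provide.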
  19. Vocabularies
  20. Vocabularies • Items in GN stored with keywords and the thesaurus they come from:
      <mri:descriptiveKeywords>
        <mri:MD_Keywords>
          <mri:keyword>
            <gco:CharacterString>Offshore Areas</gco:CharacterString>
          </mri:keyword>
          <mri:type>
            <mri:MD_KeywordTypeCode codeList="http://asdd.ga.gov.au/asdd/profileinfo/gmxCodelists.xml#MD_KeywordTypeCode" codeListValue="theme">theme</mri:MD_KeywordTypeCode>
          </mri:type>
        </mri:MD_Keywords>
      </mri:descriptiveKeywords>
  21. Vocabularies • Items in GN stored with keywords and the thesaurus they come from:
      <mri:descriptiveKeywords>
        <mri:MD_Keywords>
          <mri:keyword>
            <gco:CharacterString>Earth Sciences</gco:CharacterString>
          </mri:keyword>
          <mri:thesaurusName>
            <cit:CI_Citation>
              <cit:title>
                <gco:CharacterString>Australian and New Zealand Standard Research Classification (ANZSRC)</gco:CharacterString>
              </cit:title>
              ...
  22. Vocabularies • Items in GN stored with keywords and the thesaurus they come from:
      ...
      <cit:CI_OnlineResource>
        <cit:linkage>
          <gco:CharacterString>http://www.abs.gov.au/ausstats/abs@.nsf/mf/1297.0</gco:CharacterString>
        </cit:linkage>
      </cit:CI_OnlineResource>
      ...
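Because each `MD_Keywords` block pairs its keywords with a `thesaurusName` citation, both can be pulled out together with Python's standard `xml.etree.ElementTree`. A sketch follows. The namespace URIs are an assumption (the ISO 19115-3 `/1.0` forms); check them against the schema version your catalogue actually emits. The sample is a completed, well-formed version of the Earth Sciences snippet above, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed ISO 19115-3 namespace URIs; adjust to match your records.
NS = {
    "mri": "http://standards.iso.org/iso/19115/-3/mri/1.0",
    "gco": "http://standards.iso.org/iso/19115/-3/gco/1.0",
    "cit": "http://standards.iso.org/iso/19115/-3/cit/1.0",
}

SAMPLE = f"""<mri:descriptiveKeywords xmlns:mri="{NS['mri']}" xmlns:gco="{NS['gco']}" xmlns:cit="{NS['cit']}">
  <mri:MD_Keywords>
    <mri:keyword><gco:CharacterString>Earth Sciences</gco:CharacterString></mri:keyword>
    <mri:thesaurusName>
      <cit:CI_Citation>
        <cit:title><gco:CharacterString>Australian and New Zealand Standard Research Classification (ANZSRC)</gco:CharacterString></cit:title>
      </cit:CI_Citation>
    </mri:thesaurusName>
  </mri:MD_Keywords>
</mri:descriptiveKeywords>"""


def keywords_with_thesaurus(xml_text: str) -> list:
    """Return (keyword, thesaurus title or None) pairs for every MD_Keywords block."""
    root = ET.fromstring(xml_text)
    results = []
    for block in root.iter(f"{{{NS['mri']}}}MD_Keywords"):
        title_el = block.find(
            "mri:thesaurusName/cit:CI_Citation/cit:title/gco:CharacterString", NS
        )
        thesaurus = title_el.text.strip() if title_el is not None and title_el.text else None
        for kw in block.findall("mri:keyword/gco:CharacterString", NS):
            results.append((kw.text.strip(), thesaurus))
    return results


print(keywords_with_thesaurus(SAMPLE))
```

Keywords without a `thesaurusName`, like the "Offshore Areas" example on slide 20, come back with `None`, which is a convenient hook for the remediation work described later.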
  23. Vocabularies • Items in GN stored with keywords and the thesaurus they come from • GA is moving to using online SKOS-based vocabs for all code lists • E.g. "GA Data Classification" • Broad GA categorisation for all data • Will be compulsory, as ANZSRC is, enforced by GN • Can use specialised terms in other vocabs • GN will offer term selection • Live from the online vocab, not from stored XML
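"Compulsory, enforced by GN" implies a check that every record carries at least one keyword from each required vocabulary. Below is a minimal sketch of such a check over (keyword, thesaurus title) pairs. The required titles, especially the exact title string for "GA Data Classification", are assumptions, and this is not how GN itself implements validation (that would typically be a schema or Schematron rule).

```python
from typing import AbstractSet, Iterable, List, Optional, Tuple

# Hypothetical thesaurus titles every record must draw at least one keyword from.
REQUIRED_THESAURI = frozenset({
    "GA Data Classification",
    "Australian and New Zealand Standard Research Classification (ANZSRC)",
})


def missing_required_vocabs(
    keywords: Iterable[Tuple[str, Optional[str]]],
    required: AbstractSet[str] = REQUIRED_THESAURI,
) -> List[str]:
    """Return the required thesauri the record has no keyword from (empty means valid)."""
    used = {thesaurus for _term, thesaurus in keywords if thesaurus}
    return sorted(required - used)


record_keywords = [
    ("Earth Sciences", "Australian and New Zealand Standard Research Classification (ANZSRC)"),
    ("Offshore Areas", None),  # free-text keyword with no thesaurus
]
print(missing_required_vocabs(record_keywords))  # ['GA Data Classification']
```

Returning the list of missing vocabularies, rather than a bare pass/fail, gives an editor something actionable to fix.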
  24. Vocabularies • Items in GN stored with keywords and the thesaurus they come from • GA is moving to using online SKOS-based vocabs for all code lists • We are keen to work with others testing GN/SPARQL service integration
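A GN/SPARQL integration would need, at minimum, to query a SKOS vocabulary for matching concepts and read the answers back. The sketch below builds a SPARQL 1.1 Protocol GET URL (query passed as the `query` parameter) and parses the standard SPARQL JSON results format. The endpoint URL and function names are hypothetical, and no network call is made here; in practice the URL would be fetched with an `Accept: application/sparql-results+json` header.

```python
import json
from urllib.parse import urlencode

# SKOS concepts whose preferred label contains the search text (case-insensitive).
QUERY_TEMPLATE = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
  FILTER (CONTAINS(LCASE(STR(?label)), LCASE("%s")))
}
ORDER BY ?label"""


def build_query_url(endpoint: str, search: str) -> str:
    """Build a SPARQL Protocol GET URL, escaping the search text as a string literal."""
    safe = search.replace("\\", "\\\\").replace('"', '\\"')
    return endpoint + "?" + urlencode({"query": QUERY_TEMPLATE % safe})


def parse_terms(results_json: str) -> list:
    """Extract (concept URI, label) pairs from SPARQL 1.1 JSON results."""
    data = json.loads(results_json)
    return [
        (b["concept"]["value"], b["label"]["value"])
        for b in data["results"]["bindings"]
    ]


url = build_query_url("http://example.org/sparql", "earth")  # hypothetical endpoint
```

Querying the live vocabulary on each lookup, rather than caching a copy in GN's stored XML, is what keeps term selection in step with the online vocab as it evolves.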
  25. Vocabularies • Items in GN stored with keywords and the thesaurus they come from • GA is moving to using online SKOS-based vocabs for all code lists • Remediation of existing keywords anticipated • Automated keyword (KW) testing for term tidy-up • Abstract text mining with natural language processing (NLP) to add to KWs • Bulk addition, based on business knowledge of record data • E.g. thematic tagging based on GA section
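One plausible first pass of the automated keyword tidy-up is to normalise whitespace and remove duplicates that differ only in case. The rules in this sketch are assumptions about what "tidy-up" covers, not GA's actual remediation procedure, and the function name is hypothetical.

```python
import re
from typing import Iterable, List


def tidy_keywords(keywords: Iterable[str]) -> List[str]:
    """Collapse whitespace, drop empty keywords and case-insensitive duplicates.

    The first spelling seen for each keyword is the one kept.
    """
    seen = set()
    tidied = []
    for kw in keywords:
        clean = re.sub(r"\s+", " ", kw).strip()
        key = clean.casefold()
        if clean and key not in seen:
            seen.add(key)
            tidied.append(clean)
    return tidied


print(tidy_keywords(["  Offshore  Areas", "offshore areas", "", "Earth Sciences"]))
# ['Offshore Areas', 'Earth Sciences']
```

Running a conservative pass like this before any vocabulary matching reduces the number of near-identical free-text terms that would otherwise each need a manual decision.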
  26. Vocabularies • Items in GN stored with keywords and the thesaurus they come from • GA is moving to using online SKOS-based vocabs for all code lists • Remediation of existing keywords anticipated • Automated KW testing for term tidy-up • Abstract text mining with natural language processing (NLP) to add to KWs • Bulk addition, based on business knowledge of record data • Reverse vocab application • Existing free-text terms → vocabs
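"Reverse vocab application" can start as a label lookup: index every SKOS preferred and alternative label, then match existing free-text keywords against that index, leaving unmatched terms for human review. The mini-vocabulary and its example.org concept URIs below are hypothetical, and exact, case-insensitive matching is a deliberately simple assumption.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mini-vocabulary: concept URI -> (skos:prefLabel, [skos:altLabel, ...]).
VOCAB: Dict[str, Tuple[str, List[str]]] = {
    "http://example.org/vocab/offshore": ("Offshore Areas", ["offshore", "marine areas"]),
    "http://example.org/vocab/geophysics": ("Geophysics", ["geophysical"]),
}


def build_label_index(vocab: Dict[str, Tuple[str, List[str]]]) -> Dict[str, str]:
    """Map each case-folded preferred or alternative label to its concept URI."""
    index: Dict[str, str] = {}
    for uri, (pref, alts) in vocab.items():
        for label in [pref, *alts]:
            index.setdefault(label.casefold(), uri)  # first concept wins on clashes
    return index


def match_free_text(terms: List[str], vocab=VOCAB) -> List[Tuple[str, Optional[str]]]:
    """Pair each existing free-text keyword with a concept URI, or None if unmatched."""
    index = build_label_index(vocab)
    return [(t, index.get(t.strip().casefold())) for t in terms]


for term, concept in match_free_text(["Offshore", "Geophysics ", "Seismic"]):
    print(repr(term), "->", concept)
```

Including `altLabel`s is what lets legacy spellings such as "offshore" map onto the controlled "Offshore Areas" concept; terms that still come back `None` are candidates either for new vocabulary entries or for manual mapping.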
  27. Vocabularies • Items in GN stored with keywords and the thesaurus they come from • GA is moving to using online SKOS-based vocabs for all code lists • Remediation of existing keywords anticipated • We will be registering our vocabs themselves as datasets in eCat!
  28. Afterword • Lots of extension work at GA using GN • Inter-system linking is growing • Semantic richness beyond ISO19115 is growing • GN remains the only catalogue system for the foreseeable future • Other GN initiatives at GA: for another CoP meeting!
