Developing and publishing vocabularies


Presentation describing recent work on observation-related vocabularies, undertaken by CSIRO as part of a contribution to Australia's National Environmental Information Infrastructure.

Presented at the 2nd workshop of the Ocean Data Interoperability Platform, La Jolla, Ca. 3rd-6th December, 2013

  • Here’s a set of important vocabularies. One of the key lessons from these is that words alone are not sufficient to act as keys. If you ask for a ‘pot’ of beer in different parts of Australia, the quantity you get can differ by 100%. At the very least, the name must be scoped, but a different kind of identifier is probably required.
  • Example of the data mapped between the HH and NGIS databases. Pretty good: mostly separates units and methods from the observable property; maps to terms from other related vocabularies (CAS, IUPAC, WDTF); some grouping (simple hierarchy) and collections. Not modular: difficult to incorporate vocabularies not governed by that discipline (e.g. ‘units of measure’). Inconsistent governance: the same concepts (different terms?) appear in multiple vocabularies, with no relationships between terms and collections from different data providers. Not interoperable: local, non-resolvable identifiers, lack of formal definitions, lack of an ontology describing the relationships between concepts. Ambiguity: concepts are poorly defined.
  • If we look at a ‘nitrogen’ example, we have the ScaledQuantityKind “dissolved nitrogen concentration”, which has the more general concept “nitrogen concentration”, which in turn is a specialization of “Concentration”. “dissolved nitrogen concentration” can be measured in “MolPerCent” units, which can be used for “Concentration” QuantityKinds, or in “MilliGramsPerLitre”, which can be used for “AmountOfSubstancePerUnitVolume” QuantityKinds. “dissolved nitrogen concentration” is a measure of the “nitrogen” SubstanceOrTaxon, which is an exactMatch of the ChEBI “elemental nitrogen” concept.
  • A number of formal units of measure systems exist: ‘The Unified Code For Units of Measure’ (UCUM; Schadow and McDonald, 2009), ‘Ontology of units of measure’ (OM; Rijgersberg et al., 2013), ‘Measurement Units Ontology’ (MUO; Berrueta and Polo, 2009), ‘Ontology of Units of Measurement’ (UO), ‘Semantic Web for Earth and Environmental Terminology’ (SWEET v2.2), ‘Quantities, Units, Dimensions, Values’ (QUDV; de Kooning et al. 2009) and ‘Quantities, Units, Dimensions and Data Types’ (QUDT; Hodgson and Keller, 2011). These use different modelling approaches and formalisms, ranging from simple vocabularies enumerating units of measure to alignment of measurement-related concepts with upper ontologies. They also differ in how much of the set of units of measure they cover. Of the established and well-governed unit of measure ontologies, QUDT is well aligned with our understanding of the relationships between measurements and units of measure. Our extension to QUDT recognised that QuantityKinds are the subset of all PropertyKinds that can be measured. We created the ScaledQuantityKind class as an equivalent class to allow specifying the units associated with the QuantityKind via the qudt:unit property. I understand that QUDT v2 intends to add this property to QuantityKind, making ScaledQuantityKind redundant. The objectOfInterest property allows specifying the substance (or taxon, if biological) that is being measured. Vocabularies for the units of measure (instances of the ‘qudt:Unit’ concept) and the kinds of quantities (instances of the ‘qudt:QuantityKind’ concept) were imported from QUDT, supplemented with units of measure from the existing water quality vocabularies and any additional required quantity kinds. QUDT contains 1484 instances of units of measure and 236 quantity kinds. However, it is missing some units of measure and kinds of quantities required for the water quality data. To rectify this, we added 41 units of measure and 17 quantity kinds, all in namespaces separate from the original QUDT. The conversion factors between these new units and the existing QUDT units still need to be defined using the QUDT mechanics.
  • HIC 2014: Harmonization of vocabularies for water data. Cox, Yu, Simons. Observational data encodes values of properties associated with a feature of interest, estimated by a specified procedure. For water, the properties are physical parameters like level, volume, flow and pressure, and concentrations and counts of chemicals, substances and organisms. Water property vocabularies have been assembled at project, agency and jurisdictional level. Organizations such as EPA, USGS, CEH, GA and BoM maintain vocabularies for internal use, and may make them available externally as text files. BODC and MMI have harvested many water vocabularies alongside others of interest in their domain, formalized the content using SKOS, and published them through web interfaces. Scope is highly variable both within and between vocabularies. Individual items may conflate multiple concerns (e.g. property, instrument, statistical procedure, units). There is significant duplication between vocabularies. Semantic web technologies provide the opportunity both to publish vocabularies more effectively and to achieve harmonization that supports greater interoperability between datasets: models for vocabulary items (property, substance/taxon, process, unit-of-measure, etc.) may be formalized as OWL ontologies, supporting semantic relations between items in related vocabularies; by specializing the ontology elements from SKOS concepts and properties, diverse vocabularies may be published through a common interface; properties from standard vocabularies (e.g. OWL, SKOS, PROV-O and VAEM) support mappings between vocabularies having a similar scope; existing items from various sources may be assembled into new virtual vocabularies. However, there are a number of challenges: use of standard properties such as sameAs/exactMatch/equivalentClass requires reasoning support; items have been conceptualised as both classes and individuals, complicating the mapping mechanics; re-use of items across vocabularies may conflict with expectations concerning URI patterns; versioning complicates cross-references and re-use. This presentation will discuss ways to harness semantic web technologies to publish harmonized vocabularies, and will summarise how many of the challenges may be addressed.
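The reasoning requirement mentioned in the abstract can be illustrated with a small sketch. It assumes that the mapping property is symmetric and transitive (as skos:exactMatch and owl:sameAs are defined to be) and groups terms into equivalence classes with a union-find structure. Only the link between the nitrogen item and ChEBI "elemental nitrogen" comes from the notes; every identifier and the other mappings are hypothetical, and this is not the mechanism used by any of the services described here.

```python
from collections import defaultdict

# Hypothetical identifiers: none of these are real vocabulary URIs.
# Each pair stands for one asserted exactMatch-style mapping.
EXACT_MATCHES = [
    ("wqop:nitrogen", "chebi:elemental_nitrogen"),
    ("chebi:elemental_nitrogen", "agencyA:N"),
    ("agencyB:TDS", "agencyC:total_dissolved_solids"),
]


def equivalence_classes(pairs):
    """Group terms linked by a symmetric, transitive mapping (union-find)."""
    parent = {}

    def find(term):
        # Register unseen terms as their own root, then walk up to the root.
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for left, right in pairs:
        parent[find(left)] = find(right)  # merge the two groups

    groups = defaultdict(set)
    for term in list(parent):
        groups[find(term)].add(term)
    return [frozenset(group) for group in groups.values()]


def same_concept(a, b, pairs):
    """True if a and b fall in the same inferred equivalence class."""
    return any(a in g and b in g for g in equivalence_classes(pairs))
```

In this sketch, "wqop:nitrogen" and "agencyA:N" are never mapped to each other directly, so a plain lookup over the asserted pairs finds nothing; they are only recognised as the same concept once the closure is computed, which is the kind of work a reasoner would otherwise do.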
  • Developing and publishing vocabularies

    1. Developing and publishing vocabularies. Simon Cox, Bruce Simons, Jonathan Yu | Environmental Information Systems | 4 December 2013 | LAND AND WATER
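    2. Are we talking about the same thing? - Beer glasses in Australia
       Glass sizes: 115 ml (4 oz), 140 ml (5 oz), 170 ml (6 oz), 200 ml (7 oz), 225 ml (8 oz), 255 ml (9 oz), 285 ml (10 oz), 425 ml (15 oz), 575 ml (20 oz)
         • NSW: Pony (140 ml), Seven (200 ml), Middy (285 ml), Schooner (425 ml), Pint (575 ml)
         • NT: Seven (200 ml), Handle (285 ml), Schooner (425 ml)
         • QLD: Small Beer (140 ml), Glass (225 ml), Pot (285 ml)
         • SA: Pony (140 ml), Butcher (200 ml), Schooner (285 ml), Pint (425 ml)
         • TAS: Small Beer (115 ml), A Beer or Six (170 ml), Eight (225 ml), Ten or Pot (285 ml)
         • VIC: Pony (140 ml), Small (170 ml), Glass (200 ml), Pot (285 ml), Schooner (425 ml)
         • WA: Shetland Pony (115 ml), Pony (140 ml), Bobbie (170 ml), Glass (200 ml), Middy (285 ml), Schooner (425 ml), Pot (575 ml)
       Source:
       Linked Vocabularies | Simon Cox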
    3. Healthy Headwater - NGIS Terms
       Columns: cas_rn number; ANGDTS Code; ANGDTS Description; Units_used; WDTF Parameter; hierarchy; alt names; chemical name; ADWG name; IUPAC name; Group; collection
         • EC: ease at which conduction current can be caused to flow through material in microSiemens/centimetre (ElectricalConductivityAt25C_uScm; Electrical Conductivity)
         • PH: negative logarithm of hydrogen ion concentration in pH units (WaterpH_pH; pH)
         • Chloride (CAS 16887-00-6): concentration of chloride as Cl in milligrams/litre
         • TDS: the portion of total solids that passes through filter and deemed to have been dissolved in sample in milligrams/litre (Total Dissolved Solids)
         • ALKT (TOTALALKALINITY): concentration in milligrams/litre CaCO3 of titratable bases using a methyl-orange endpoint of about pH 4.3 (Total Alkalinity (as CaCO3))
         • HARD (HARDNESS_CACO3): the ability of water to precipitate soap and is sum of calcium and magnesium concentrations as milligrams/litre CaCO3 (Hardness (as CaCO3))
         • SAR: ratio of sodium to magnesium and calcium and used to assess risk of excess sodium in irrigation water (Sodium Adsorption Ratio)
         • ALKC (CAS 3812-32-6): alkalinity ascribed to carbonate in milligrams/litre CO3 (Carbonate Alkalinity (as CaCO3))
         • NITRATE (CAS 14797-55-8): concentration of nitrate as N in milligrams/litre (Nitrate)
         • Iron (CAS 7439-89-6): concentration of iron as Fe in milligrams/litre
       Units used (row assignment not recoverable): us/cm, ms/cm, mg/L, mg/kg, ug/L, pH units, Ratio, %MOL
       Groups and collections: Ion; Anion; Cation; Metal; Salinity; Conductivity; pH, alkalinity, acidity; Chloride; Total Dissolved Solids; Hardness (as calcium carbonate); Carbonate; Nitrate and Nitrite; Iron
       AGU Fall 2013 | IN52B-08 | Cox, Simons, Yu | Vocabulary re-use
    4. Are these the same?
         • “nitrogen”
         • “dissolved nitrogen”
         • “Total nitrogen, water, filtered, milligrams per liter”
         • “Concentration of nitrogen (total) per unit volume of the water body [dissolved plus reactive particulate phase] by oxidation and colorimetric autoanalysis”
         • “Concentration of nitrogen (total) per unit mass of the water body [dissolved plus reactive particulate <GF/F phase] by filtration and high temperature Pt catalytic oxidation”
         • “Concentration (moles or mass) of total nitrogen (i.e. nitrogen in all chemical forms) in suspended particulate material per unit volume of the water column.”
         • “Concentration of nitrogen (total) {PON} per unit volume of the water body [particulate 2-10um phase] by filtration, acidification and elemental analysis”
         • “Dissolved total and organic nitrogen concentrations in the water column”
    5. Standards
    6. We are not alone! OKFN Linked Open Vocabularies
    7. Conceptual Model of QUDT
    8. Standard ontology of chemicals: >36 000 chemical entities
    9. Dissolved nitrogen concentration objects [diagram]
       Classes: SubstanceOrTaxon, ScaledQuantityKind, Unit
       Items: nitrogen; elemental nitrogen (CHEBI_33267); dissolved nitrogen concentration; nitrogen concentration; Concentration; AmountOfSubstancePerUnitVolume; MolePercent; MilliGramsPerLitre
       Properties: objectOfInterest, exactMatch, qudt:generalization, qudt:unit, qudt:quantityKind
    10. Extension to QUDT [diagram labels: QUDT, WQOP]
    11. Linked to SKOS
    12. [image-only slide]
    13. Linked vocabulary items • • • ntration
    14. Underneath • SPARQL endpoint (test with this tool) • SISSvoc service • SISSvoc search
    15. NERC Vocabulary Service • 60+ collections • 30,000+ terms (most in P01!) • Scope: geography, instruments, organizations, properties ...
    16. NVS implementation/interfaces • SQL store • View item as SKOS Concept • SPARQL endpoint • SISSvoc service • SISSvoc search
    17. Harmonization • WQOP Model extends QUDT • Items refer to ChEBI • • Mappings to NVS • • centration
    18. [ODIP-1] Conflation • Property/parameter vocabularies have 100s-1000s of entries • each definition includes – Semantics – the quantity being observed e.g. ‘Nitrogen’ Plus one or more of – Procedure – the instrument or method used – Sampling protocol e.g. Weekly-mean – Units of measure – Aggregation with other primitive parameters • This makes it difficult to discover and combine data from different projects
    19. Harmonization challenges • Even standard mapping props have no effect without reasoning (sameAs/exactMatch/equivalentClass) • items may be conceptualised as both classes and individuals: complicates mapping mechanics • URIs generally reflect ownership/maintenance • re-use of items across vocabularies may lead to surprises • versioning complicates cross-references and re-use vs. ???
    20. SUMMARY
    21. Summary • Vocabularies should be • Standardized • Published • Harmonized • Extend / re-use existing vocabularies where possible
    22. Acknowledgements: This work was undertaken as part of CSIRO’s contribution to eReefs – a National Environmental Information Infrastructure project.
    23. Thank you. Simon Cox, Research Scientist, CSIRO Land and Water. t +61 3 9252 6342
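The dissolved nitrogen example from slide 9 and its speaker note can be written out as a handful of subject-predicate-object triples. The sketch below is only an illustration in plain Python: the short labels stand in for the published URIs, the "type" predicate and the helper functions `objects` and `generalizations` are assumptions made for this example, and the unit label follows the spelling shown on the slide.

```python
# Triples transcribed from the slide 9 description. The labels are
# illustrative stand-ins, not the published identifiers.
DNC = "dissolved nitrogen concentration"
TRIPLES = [
    (DNC, "type", "ScaledQuantityKind"),
    (DNC, "qudt:generalization", "nitrogen concentration"),
    ("nitrogen concentration", "qudt:generalization", "Concentration"),
    (DNC, "qudt:unit", "MolePercent"),
    (DNC, "qudt:unit", "MilliGramsPerLitre"),
    ("MolePercent", "qudt:quantityKind", "Concentration"),
    ("MilliGramsPerLitre", "qudt:quantityKind", "AmountOfSubstancePerUnitVolume"),
    (DNC, "objectOfInterest", "nitrogen"),
    ("nitrogen", "type", "SubstanceOrTaxon"),
    ("nitrogen", "exactMatch", "elemental nitrogen (CHEBI_33267)"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object asserted for (subject, predicate), in order."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def generalizations(term, triples=TRIPLES):
    """Follow qudt:generalization links outward from term, nearest first."""
    chain, seen, frontier = [], {term}, [term]
    while frontier:
        current = frontier.pop(0)
        for broader in objects(current, "qudt:generalization", triples):
            if broader not in seen:  # guard against cycles
                seen.add(broader)
                chain.append(broader)
                frontier.append(broader)
    return chain


if __name__ == "__main__":
    print(generalizations(DNC))
    print(objects(DNC, "qudt:unit"))
    print(objects("nitrogen", "exactMatch"))
```

Keeping the property, the substance and the units as separate linked items, rather than packing them into one label, is what lets a query start from "nitrogen" or from a unit and still reach the same quantity kind.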