• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
GBIF Vocabularies, at TDWG 2011 (20 Oct 2011)
 

GBIF Vocabularies, at TDWG 2011 (20 Oct 2011)

on

  • 996 views

Establishing a support infrastructure for Knowledge Organisation Systems (KOS) in biodiversity informatics. ...

Establishing a support infrastructure for Knowledge Organisation Systems (KOS) in biodiversity informatics.
Eamonn O Tuama, Dag Terje Filip Endresen, and David Remsen (GBIF)

Vocabularies and ontologies are types of Knowledge Organisation System (KOS) that range from simple glossaries and dictionaries through classification schemes, taxonomies, thesauri and ontologies [1]. As such, they are an essential foundation for data interoperability within the biodiversity informatics domain. Recent TDWG conferences have acknowledged the need to adopt principles and best practices for developing ontologies and to draw, wherever possible, on available community resources [2]. GBIF also recognises this need by including KOS related activities in the GBIF Strategic Plan 2012-2016 [3] and in its ongoing work programmes. Key activities to date have been the establishment of the prototype GBIF Vocabularies Service [4] and the publication of a white paper Recommendations for the use of Knowledge Organisation Systems by GBIF [5].

The focus of this symposium is on identifying key actors and activities that will enable TDWG, GBIF and related communities of practice to develop a road map on how to engage all players in creating a global infrastructure for the development, maintenance and governance of such vocabularies (the term “vocabularies” is used here in a generic sense to represent the various types of KOS). Several brief presentations will set the background for our discussions. After introducing the broader GBIF context, the recommendations of the GBIF KOS task group and their uptake in the GBIF work programme will be outlined. Two systems for managing vocabularies and ontologies will then be introduced – the GBIF Vocabularies Service [4] and the BioPortal [6]. This will be followed by a summary of the history, management and plans for the TDWG vocabularies which form the foundation for KOS in biodiversity informatics, and conclude with an account of experiences in managing the Darwin Core as an example of a widely used set of terms under active discussion in the community.

[1] http://www.clir.org/pubs/reports/pub91/contents.htm
[2] http://imsgbif.gbif.org/CMS/DMS_.php?ID=1057
[3] http://links.gbif.org/sp2012_2016.pdf
[4] http://vocabularies.gbif.org
[5] http://links.gbif.org/gbif_kos_whitepaper_v1.pdf
[6] http://bioportal.bioontology.org/

Statistics

Views

Total Views
996
Views on SlideShare
996
Embed Views
0

Actions

Likes
0
Downloads
7
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

CC Attribution License

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment
  • GBIF Vocabularies: http://vocabularies.gbif.org/ - ScratchPads: http://scratchpads.eu/ - Drupal: http://drupal.org/ Semantic Media Wiki: http://www.semantic-mediawiki.org/wiki/Semantic_MediaWiki - SpecesID: http://species-id.net/wiki/ - Key To Nature: http://www.keytonature.eu/wiki/ Protégé: http://protege.stanford.edu/ - Protégé SKOSed: http://code.google.com/p/skoseditor/ (CO-ODE) TopQuadrant, TopBraid Enterprise Vocabulary Net (EVN): Pool Party iQvoc: http://www.w3.org/2001/sw/wiki/Iqvoc TemaTres: http://www.vocabularyserver.com/ More tools are listed from the SKOS community Wiki: http://www.w3.org/2001/sw/wiki/SKOS
  • GBIF Registry: http://rs.gbif.org Google Code project site: http://code.google.com/p/gbif-registry/
  • SKOS: http://www.w3.org/2004/02/skos/, http://www.w3.org/TR/skos-primer, http://en.wikipedia.org/wiki/SKOS CO-ODE: http://www.co-ode.org/, http://code.google.com/p/co-ode-owl-plugins/ - SKOSEd: http://code.google.com/p/skoseditor/
  • CO-ODE: http://www.co-ode.org/, http://code.google.com/p/co-ode-owl-plugins/ - OWL Ontology Browser: http://code.google.com/p/ontology-browser/ - OWLDoc: http://code.google.com/p/co-ode-owl-plugins/wiki/OWLDoc
  • NCBO BioPortal: http://bioportal.bioontology.org/ BioPortal Core Download and Source Code: https://bmir-gforge.stanford.edu/gf/project/bioportal_core/ BioPortal UI Download and Source Code: https://bmir-gforge.stanford.edu/gf/project/bioportalui/ (Ruby on Rails)
  • NCBO BioPortal: http://bioportal.bioontology.org/ BioPortal Core Download and Source Code: https://bmir-gforge.stanford.edu/gf/project/bioportal_core/ BioPortal UI Download and Source Code: https://bmir-gforge.stanford.edu/gf/project/bioportalui/ (Ruby on Rails)
  • NCBO BioPortal: http://bioportal.bioontology.org/ BioPortal Core Download and Source Code: https://bmir-gforge.stanford.edu/gf/project/bioportal_core/ BioPortal UI Download and Source Code: https://bmir-gforge.stanford.edu/gf/project/bioportalui/ (Ruby on Rails)
  • GBIF Strategic Plan 2012-2016: “ Capture new data resources by enabling multiple and enriched data types, such as metadata, taxonomic names, observations, etc., to be linked and accessed.” Page 6. “ GBIF will continue to catalyse the development and adoption of biodiversity informatics standards, a role it has played with great success to date. In anticipation of the integration and serving of future data types, GBIF will work closely with partners to enable data integration and interoperability across phenotypic, genomic, taxonomic, geospatial and ecosystem domains. Two critical areas for the next five years include: (i) Knowledge Organization Systems such as vocabularies, thesauri and ontologies; and (ii) Persistent Identifiers to support a variety of web services including those for core taxonomies and nomenclature through the Global Names Architecture.” Page 7.
  • GBIF Strategic Plans 2007-2011: “ Further activities as part of the Plan will include improving the Data Portal system and expanding the depth and range of data types to which it is a user-friendly gateway ” (page 7). “ GBIF will stimulate high impact science and exploit linkages among biodiversity data types at all levels of biological organisation and with data of other types through worldwide data sharing ” (page 14). “ Indexing of all data types (e.g. names, specimens, observations and literature) using GUIDs in place by end 2007 ” (page 20). “ User-friendly tools for Nodes and data providers to support data sharing: At least one supported package available for each data type such as specimen (2004), observation (2004), descriptive (2008), literature (2009), name/concept (2006), image (2007), character (2008), OGC (2007), etc. ” Page 22. “ Integrate genetic sequences, eco-logical datasets and more data types such as images, sound files, etc. with species occurrence data and digital literature resources. ” Page 52. “ Observational and monitoring data have been underserved by GBIF. Funds are needed for a Programme Officer to work with these communities to bring the users and providers together, identify common needs and opportunities, and develop data models and standards for these data types. Observational and monitoring data are key data types for important user groups, including national, regional and local land managers and conservationists. Failure to develop better ways of managing observational and monitoring data will severely compromise its utility for many of these users, including the biodiversity conventions ” (page 53).

GBIF Vocabularies, at TDWG 2011 (20 Oct 2011) GBIF Vocabularies, at TDWG 2011 (20 Oct 2011) Presentation Transcript

  • Dag Endresen (dendresen@gbif.org) Knowledge Systems Engineer GBIF New Orleans (Louisiana, USA) 20 October 2011 Biodiversity Information Standards, TDWG Annual Meeting 2011, New Orleans The GBIF KOS Work Program: Prioritized Requirements and Proposed Solutions
  • Outline
    • Element vocabularies and value vocabularies
    • Vocabulary management tools
    • Vocabularies exchange format (SKOS)
    • Vocabulary registry (portal)
    • New data types
  • Standards Biodiversity Information Standards (TDWG), Dublin Core Metadata Initiative (DCMI), Genomics Standards Consortium (GSC), etc... provide domain standards. We want to reuse, map and relate terms across these standards. Why: Gain understanding across domains
  • Element vocabulary (glossary) Darwin Core (DwC), Dublin Core (DCMI), Ecological Metadata Language (EML), Gene Ontology (GO), TDWG Ontology, etc... provide definitions for conceptual terms. We want to reuse, map and relate terms from basic vocabularies with concept definitions. Why: reuse terms and share a common definitions and understanding of biodiversity concepts.
  • Vocabulary management tools
    • GBIF Vocabularies
      • Custom Scratchpad Tool (Drupal)
    • Semantic Wiki (SpeciesID, Key to Nature)
    • Protégé (collaborative Protégé)
      • SKOSEd plugin, Web-Protégé
    • Top Quadrant EVN (commercial)
    • Pool Party (commercial)
    • ThManager (open source)
    • ISOcat (Clarin, linguistics)
    • iQvoc (open source)
    • TemaTres (open source, Spanish)
  • GBIF Vocabularies http://vocabularies.gbif.org Collaborative development of community terminology, including biodiversity concept definitions and controlled value lists.
  • Controlled Vocabularies The “Vocabularies” are Value Vocabularies (authority files) of accepted values for terms where controlled values are already available - or appropriate to develop. dwc: basisOfRecord PreservedSpecimen FossilSpecimen LivingSpecimen HumanObservation MachineObservation NomenclaturalChecklist Occurrence Taxon Location dc: Type Collection Dataset Event Image InteractiveResource Service Software Sound Text PhysicalObject StillImage MovingImage gbif: nomenclatural_code ICBN ICZN ICVCN ICNB ICNCP BioCode Why: standardize how biodiversity data is provided when controlled values are appropriate
  • Controlled Vocabularies
    • “ Extensions” are Element Vocabularies defining new terms organized as extensions to Core Types (dwc: Taxon and dwc: Occurrence).
    • Audubon Core (multimedia/images)
    • DwC-Germplasm (plant genetic resources)
    • EOL Data Object (species profiles)
    • GISIN Species Status (invasive species)
    • … etc
    Why: Provide a mechanism for thematic communities to define their own specific terms.
  • GBIF Vocabularies
    • Core types – could be more than DwC: Taxon and DwC: Occurrence
    • habitat, spatial areas, lines, grid, places, images/multimedia, literature, people, institutes, collections, collection specimens, etc…?
    • “ Extensions ” = element/attribute vocabulary, definition of terms
    • Separate the definition of terminology from application models
    • Is “ extensions ” the appropriate label?
    • “ Vocabularies ” = value vocabulary, authority files
    • external examples: countries, languages, …
    • biodiversity domain: taxonRank, basisOfRecord, …
    http://vocabularies.gbif.org
  • GBIF Vocabularies
    • GBIF Vocabularies is hosted by the Scratchpads server in London
    • Install the GBIF Vocabulary Service in Copenhagen?
    • Further developments are needed.
    • Package the Vocabulary Service as an open-source tool?
    • Develop as Drupal modules, migrate to Drupal 7?
    • Element vocabularies are not always an “ extension ” of Darwin Core…?
    • Add management interface with definitions for new core types?
    • Rename “ Extensions ” to “ Element- ” or ” Attribute Vocabularies ” ?
    • Rename “Vocabularies” to “Value Vocabularies” or “Authority files”…?
    • Export and import of vocabularies to and from other management systems (SKOS, RDF, OWL as vocabulary exchange format?)
    • SKOS import and export features to be developed?
    • Improved Human readable interface
    • Export to HTML/PDF format for human readable documentation of a vocabulary?
  • Vocabulary Registry/Portal
    • GBIF Vocabulary Registry
      • Is the present registry sufficient?
    • GBIF Vocabularies
      • Develop the Scratchpads solution further as a vocabulary registry?
    • NCBO BioPortal alternative
      • Start using the NCBO BioPortal software
    Why: Support the discovery of biodiversity terminology and standard vocabularies.
  • GBIF Vocabulary Registry The official versions of the “ vocabularies ” and “ extensions ” for deployment are available from the GBIF Registry ( http://rs.gbif.org ). They are used from here by the GBIF infrastructure such as the IPT and HIT. Separate service for discovery – different service from the GBIF Vocabulary site ( management ≠ discovery ).
  • GBIF Vocabulary Registry
    • Promote SKOS as the preferred vocabulary (exchange) format?
      • Gradually replace XML Schema for defining standards?
      • Why: Promote ease of vocabulary exchange, import and export.
    http://rs.gbif.org Simple Knowledge Organization System (SKOS)
  • GBIF Vocabulary Registry
    • Add human interface to explore SKOS documents at the GBIF Registry?
      • OWLDoc (CO-ODE, static HTML)
      • OWL Ontology Browser (CO-ODE, dynamic)
  • Using the BioPortal Registry GBIF KOS Task Group: “ GBIF should deploy an instance of the BioPortal platform for biodiversity ontologies as a complement to the GBIF Vocabularies Server. ”
  • Using the BioPortal Registry
    • Include Biodiversity Vocabularies to the NCBO BioPortal…?
      • Will support the mapping of terms to the major Genomics Vocabularies.
    • Establish a “ GBIF BioPortal ” using the same BioPortal software?
      • Will focus on Biodiversity Community identity and relevance.
  • Workflow Draft vocabulary Review version Published version Approve? … and other SKOS compliant vocabulary management tools. -> Uptake by the GBIF infrastructure including the IPT and the data portal.
  • “ In anticipation of the integration and serving of future data types , GBIF will work closely with partners to enable data integration and interoperability across phenotypic , genomic , taxonomic , geospatial and ecosystem domains.” GBIF Strategic Plan 2012-2016:
  • “ Further activities as part of the Plan will include improving the Data Portal system and expanding the depth and range of data types “ “ specimen, observation, descriptive, literature, name/concept , image, character, OGC, etc ” GBIF Strategic Plan 2007-2011:
  • New Core Types?
    • DwC: Taxon
    • DwC: Occurrence
    • Aububon Core (images/multimedia)
    • Invasive Species (invasive in region/country)
    • New Spatial Objects (from point locations to include polygon, poly-line and grid objects)
    • etc…
    • Is the general principle on Extension of Core Types also suitable for new data types?
  • Data types dwc: identificationID dwc: dateIdentified dwc: identifiedBy dwc: taxonID dwc: scientificNameID dwc: scientificName … dwc: measurementID dwc: measurementValue dwc:m easurementUnit dwc: measurementDeterminedBy … dwc: taxonID dwc: scientificNameID dwc: scientificName dwc: taxonConceptID dwc: kingdom dwc: family dwc: genus dwc: specificEpithet … dwc: occurrenceID dwc: basisOfRecord dwc: eventID dwc: eventDate dwc: locationID dwc: decimalLongitude dwc: decimalLatitude dwc: taxonID dwc: scientificNameID dwc: scientificName … dc: identifier dc: bibliographicCitation dc: title dc: creator dc: date dc: source dc: language dwc: taxonRemarks … dc = http://purl.org/dc/terms/ dwc = http://rs.tdwg.org/dwc/terms/ gbif = http://rs.gbif.org/terms/1.0/ etc… dwc: vernacularName dc: language dc: temporal dwc: locality …
  • Star schema dc = http://purl.org/dc/terms/ dwc = http://rs.tdwg.org/dwc/terms/ gbif = http://rs.gbif.org/terms/1.0/ audubon: http://rs.tdwg.org/ac/terms/
  • Star schema (??) dc = http://purl.org/dc/terms/ dwc = http://rs.tdwg.org/dwc/terms/ gbif = http://rs.gbif.org/terms/1.0/ audubon: http://rs.tdwg.org/ac/terms/ etc…
  • Dag Endresen (dendresen@gbif.org) Knowledge Systems Engineer GBIF New Orleans (Louisiana, USA) 20 October 2011 Biodiversity Information Standards, TDWG Annual Meeting 2011, New Orleans The GBIF KOS Work Program: Prioritized Requirements and Proposed Solutions