advancing biodiversity science and addressing the many pressing questions in the field that require access to large-scale data and integration of data from many sources. The Biological Collections Ontology (BCO) provides a semantic framework for integrating and reasoning over biodiversity data from diverse sources, including Darwin Core Archives (DwCA), Minimum Information about any (x) Sequence (MIxS)-compliant genomic/metagenomic datasets, and field-based surveys or sampling studies. While the ontology itself is a key component of the biodiversity informatics toolkit, we believe that it must be accompanied by user-friendly data conversion, management, and query tools in order to gain wide adoption and function effectively. Over the past year, several hackathons and workshops have focused on converting DwCA and tabular-formatted data into Resource Description Framework (RDF), so that the data can be queried using the BCO. The result is a functioning proof-of-concept workflow and proposals to take parts of the workflow into production.
In this presentation, we will report on some of the latest developments in the BCO and related applications, including mappings from Darwin Core (DwC) and MIxS to BCO and tools for converting between tabular/DwCA data and RDF. We will conclude with a brief discussion of how the semantic approach adopted by the BCO project can contribute to research for understanding and sustaining biodiversity.
Using the Biological Collections Ontology to Advance Biodiversity Science
1. Using the Biological Collections Ontology to Advance Biodiversity Science
TDWG 2014, Jönköping, Sweden
Ramona Walls
John Wieczorek
Robert Guralnick
John Deck
2. Overview
1. How we model biodiversity information in the Biological Collections Ontology
2. Integrating ontologies into biodiversity information workflows
3. Properties in an example Darwin Core record
• occurrenceID
• modified
• rights
• institutionCode
• collectionCode
• datasetName
• basisOfRecord
• dynamicProperties
• catalogNumber
• recordedBy
• sex
• preparations
• otherCatalogNumbers
• associatedMedia
• associatedReferences
• associatedSequences
• eventDate
• year
• month
• day
• fieldNumber
• eventRemarks
• higherGeography
• continent
• waterBody
• islandGroup
• island
• country
• stateProvince
• county
• locality
• minimumDepthInMeters
• maximumDepthInMeters
• locationRemarks
• decimalLatitude
• decimalLongitude
• geodeticDatum
• coordinateUncertaintyInMeters
• georeferencedBy
• georeferencedDate
• georeferenceSources
• georeferenceRemarks
• identifiedBy
• dateIdentified
• typeStatus
• scientificName
• kingdom
• phylum
• class
• order
• family
• genus
• specificEpithet
• infraspecificEpithet
• scientificNameAuthorship
4. Properties in an example Darwin Core record (same property list as slide 3)
RECORD
5. Properties in an example Darwin Core record (same property list as slide 3)
MATERIAL SAMPLE & ORGANISM
6. Properties in an example Darwin Core record (same property list as slide 3)
EVENT & OCCURRENCE
7. Properties in an example Darwin Core record (same property list as slide 3)
LOCATION
8. Properties in an example Darwin Core record (same property list as slide 3)
IDENTIFICATION/TAXON
12. How to create RDF triples (using ontology terms) for biodiversity data
Check for an easy way first!
See if you can use the BiSciCol triplifier (http://biscicol.org/triplifier/) or a similar tool that automates file conversion for specific formats. If not, proceed.
Create Mapping File
• Create groups of columns and assign them to the relevant classes.
• Define the columns containing a URI identifier for each class within each distinct record.
• If you’re not importing an existing ontology, create relationships between classes.
Assemble these into a mapping file, whose format depends on the tool used in the next step.
Use Conversion Tool
Check out WebKarma (http://www.isi.edu/integration/karma/) or D2RQ (http://d2rq.org/).
Send to Triple-Store
Upload the data to a triple store or SPARQL endpoint (e.g., Virtuoso, http://www.openlinksw.com/).
http://www.wikihow.com/Create-RDF-Triples-%28Using-Ontology-Terms%29-for-Biodiversity-Data
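The mapping steps on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not the triplifier, WebKarma, or D2RQ mapping format: the mapping is a plain dictionary, the sample record and its example.org identifiers are made up, and the two relationship predicates (recordedDuring, identifiedAs) are hypothetical stand-ins for relations that a real workflow would take from BCO or another ontology. Only the Darwin Core term namespace and rdf:type are real IRIs. The output is N-Triples, which most triple stores can load.

```python
import csv
import io

DWC = "http://rs.tdwg.org/dwc/terms/"  # Darwin Core term namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace, for illustration only

# Step 1 and 2: group columns by class and name the column holding each class's URI.
MAPPING = {
    "Occurrence": {"id_column": "occurrenceID", "columns": ["catalogNumber", "sex"]},
    "Event": {"id_column": "eventID", "columns": ["eventDate"]},
    "Taxon": {"id_column": "taxonID", "columns": ["scientificName"]},
}

# Step 3: relationships between classes (hypothetical predicates, not BCO terms).
RELATIONSHIPS = [
    ("Occurrence", EX + "recordedDuring", "Event"),
    ("Occurrence", EX + "identifiedAs", "Taxon"),
]

# Made-up sample record; the identifier columns already contain URIs.
SAMPLE_CSV = (
    "occurrenceID,eventID,taxonID,catalogNumber,sex,eventDate,scientificName\n"
    "http://example.org/occ/1,http://example.org/evt/1,http://example.org/taxon/1,"
    "12345,female,2014-10-27,Puma concolor\n"
)


def literal(value):
    """Format a string as an N-Triples literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def cell(row, column):
    """Return a trimmed cell value, or an empty string if the cell is missing."""
    return (row.get(column) or "").strip()


def triplify(rows, mapping, relationships):
    """Turn tabular rows into a de-duplicated list of N-Triples lines."""
    triples = []
    for row in rows:
        for cls, spec in mapping.items():
            subject = cell(row, spec["id_column"])
            if not subject:
                continue  # no identifier for this class in this record
            triples.append(f"<{subject}> <{RDF_TYPE}> <{DWC}{cls}> .")
            for column in spec["columns"]:
                value = cell(row, column)
                if value:
                    triples.append(f"<{subject}> <{DWC}{column}> {literal(value)} .")
        for subject_cls, predicate, object_cls in relationships:
            s = cell(row, mapping[subject_cls]["id_column"])
            o = cell(row, mapping[object_cls]["id_column"])
            if s and o:
                triples.append(f"<{s}> <{predicate}> <{o}> .")
    # The same event or taxon can appear in many rows; keep each triple once.
    return list(dict.fromkeys(triples))


if __name__ == "__main__":
    for line in triplify(csv.DictReader(io.StringIO(SAMPLE_CSV)), MAPPING, RELATIONSHIPS):
        print(line)
```

Keeping the mapping as data rather than code mirrors the slide's advice: the same conversion function can then be reused for a differently shaped spreadsheet by editing only the mapping.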
17. Conclusions
• BCO can work across different data types, not just for DwC.
• The work of producing BCO has forced us to look at DwC definitions more rigorously.
• BCO provides an opportunity to manage parts of the DwC vocabulary as controlled vocabularies that are rigorously, logically defined.
– example: basisOfRecord
• Road map for this work includes the intention to propose BCO as a TDWG standard.
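To make the basisOfRecord point concrete, here is a minimal sketch of what treating that property as a controlled vocabulary could look like on the data-cleaning side. It is an assumption-laden illustration: the term set is only a subset of the values Darwin Core recommends for basisOfRecord, the function name is hypothetical, and a BCO-backed vocabulary would use logically defined ontology classes with IRIs rather than plain strings.

```python
# Illustrative subset of the values Darwin Core recommends for basisOfRecord.
BASIS_OF_RECORD_TERMS = {
    "PreservedSpecimen",
    "FossilSpecimen",
    "LivingSpecimen",
    "HumanObservation",
    "MachineObservation",
    "MaterialSample",
}

# Lookup keyed on a lowercase, punctuation-free form of each term.
_LOOKUP = {term.lower(): term for term in BASIS_OF_RECORD_TERMS}


def normalize_basis_of_record(value):
    """Return the controlled term matching a free-text value, or None.

    Matching ignores case, spaces, and punctuation, so "preserved specimen"
    and "Preserved_Specimen" both resolve to "PreservedSpecimen".
    """
    key = "".join(ch for ch in value if ch.isalnum()).lower()
    return _LOOKUP.get(key)
```

Values that return None are the ones a curator would need to review by hand, which is where rigorous definitions of each term become useful.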
18. Acknowledgments
• Dozens of participants at BCO workshops and hackathons over the past two years
• NSF-EAGER: An Interoperable Information Infrastructure for Biodiversity Research (I3BR)
• NSF: Research Coordination Network for GSC (RCN4GSC)
• Gordon and Betty Moore Foundation (iMicrobe)
• VertNet
• University of Kansas Biodiversity Institute
Editor's Notes
Ramona and introductions
Ramona
John
Show typical metadata, how it confounds material entities and processes, and why this is a problem.
John
Show typical metadata, how it confounds material entities and processes, and why this is a problem.
John
Show typical metadata, how it confounds material entities and processes, and why this is a problem.
John
John
John
Ramona
Ramona
Separation of processes, material entities, and information content entities, and how they link to one another
John
Show structure of specimens and observations
Ramona
John
Ramona
Ramona
From a data collection spreadsheet with an ontology behind it, to a triple store you can query, to new discoveries!