A conceptual framework for implementing solid identifiers in data aggregation frameworks, and the impact of those identifiers on data publishing and downstream linking.
BiSciCol + VertNet: A Conceptual and Technical Framework for Identifying Specimens
1. BiSciCol + VertNet: A Conceptual and Technical Framework for Identifying Specimens
iEvoBio Flash Talk 2013
Aaron Steele, University of California, Berkeley
John Deck, University of California, Berkeley
Rob Guralnick, University of Colorado, Boulder
5. BiSciCol / Identifier Review of Challenges
• <1% DwC Triplet match between GenBank and VertNet
• Identifiers are not awesome (not persistent, resolvable, or even globally unique)
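To illustrate why triplet matching can fail, here is a minimal Python sketch. It assumes the common convention of joining the Darwin Core terms institutionCode, collectionCode, and catalogNumber with a colon. The record values, the free-text voucher string, and the helper names `dwc_triplet` and `matches` are hypothetical and are not taken from VertNet or GenBank data.

```python
from typing import Mapping

# Darwin Core terms that conventionally make up the "DwC triplet".
TRIPLET_TERMS = ("institutionCode", "collectionCode", "catalogNumber")


def dwc_triplet(record: Mapping[str, str], sep: str = ":") -> str:
    """Join the three triplet terms of a record (the separator is an assumption)."""
    return sep.join(record[term].strip() for term in TRIPLET_TERMS)


def matches(record: Mapping[str, str], voucher: str) -> bool:
    """Exact string comparison: any formatting difference breaks the match."""
    return dwc_triplet(record) == voucher.strip()


# Hypothetical specimen record and hypothetical voucher strings.
specimen = {
    "institutionCode": "MVZ",
    "collectionCode": "Mamm",
    "catalogNumber": "12345",
}
print(dwc_triplet(specimen))
print(matches(specimen, "MVZ:Mamm:12345"))  # same formatting on both sides
print(matches(specimen, "MVZ 12345"))       # collection code dropped, different separator
```

Because the comparison is a plain string match, two references to the same specimen only link when both sides format the triplet identically, which is one way low match rates can arise.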
6. BCID Technology (from software bazaar)
ark:/21547/R2_550e8400-e29b...
ark:/21547/ = Scheme plus name assigning authority
R2 = BCID Group identifier, defines a common concept per dataset
ark:/21547/R2 = Uniquely identifies processed data instance
_ = separator
550e8400-e29b-41d4-a716-446655440000 = suffix
The suffix is assigned by VertNet and can be resolved by both the EZID and BCID systems using the suffix passthrough system.
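The identifier anatomy on this slide can be sketched as a small Python parser. This is only an illustration of the structure shown (scheme, name assigning authority, group, separator, suffix). The `BCID` type, the `parse_bcid` function, and the regular expression are hypothetical and are not part of the EZID or BCID software.

```python
import re
import uuid
from typing import NamedTuple


class BCID(NamedTuple):
    scheme: str     # "ark"
    authority: str  # name assigning authority, e.g. "21547"
    group: str      # BCID group identifier, e.g. "R2"
    suffix: str     # locally assigned suffix (a UUID in the slide's example)

    @property
    def group_identifier(self) -> str:
        """Identifier without the suffix, e.g. ark:/21547/R2."""
        return f"{self.scheme}:/{self.authority}/{self.group}"


# Assumed layout: ark:/<authority>/<group>_<suffix>
_BCID_PATTERN = re.compile(r"^(ark):/(\d+)/([^_/]+)_(.+)$")


def parse_bcid(identifier: str) -> BCID:
    """Split an identifier into its parts, or raise ValueError if it does not fit."""
    match = _BCID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"not a group_suffix ARK identifier: {identifier!r}")
    return BCID(*match.groups())


parsed = parse_bcid("ark:/21547/R2_550e8400-e29b-41d4-a716-446655440000")
print(parsed.group_identifier)
print(parsed.suffix)
# The suffix in the slide's example is a well-formed UUID string.
print(uuid.UUID(parsed.suffix).version)
```

Splitting on the first underscore after the group keeps the group identifier and the suffix separate, which mirrors the idea that a resolver can recognize the group and pass the suffix through.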
7. A Conceptual and Technical Framework for Identifying Specimens with (VertNet + BiSciCol)
Lifecycle of a record
  1. Publisher uploads a record to IPT, which publishes it in a Darwin Core Archive.
  2. VertNet processes the Darwin Core Archive into a set of CSV files that target specific use cases.
  3. The CSV files are bulk-loaded into various datastores, including Amazon S3, CartoDB, and the Google App Engine datastore.
  4. Once in the datastore, records are served as JSON over HTTP APIs, which power apps like the VertNet portal.
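The CSV and JSON stages of the lifecycle above can be sketched with the Python standard library. This is a toy illustration under stated assumptions, not VertNet's pipeline: the column selection, the `records` envelope in the JSON, the sample record, and the function names `write_csv` and `csv_to_json` are all hypothetical, and the datastore and HTTP layers are omitted.

```python
import csv
import io
import json
from typing import Dict, Iterable, List

# Assumed column selection for one use case (Darwin Core term names).
FIELDS: List[str] = [
    "institutionCode",
    "collectionCode",
    "catalogNumber",
    "scientificName",
]


def write_csv(records: Iterable[Dict[str, str]], fields: List[str] = FIELDS) -> str:
    """Flatten records into CSV text, keeping only the chosen columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


def csv_to_json(csv_text: str) -> str:
    """Read CSV text back and serialize the rows as a JSON payload."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps({"records": rows}, sort_keys=True)


# Hypothetical record; "remarks" is dropped because it is not in FIELDS.
sample = [{
    "institutionCode": "MVZ",
    "collectionCode": "Mamm",
    "catalogNumber": "12345",
    "scientificName": "Peromyscus maniculatus",
    "remarks": "not exported",
}]
csv_text = write_csv(sample)
print(csv_text)
print(csv_to_json(csv_text))
```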