Jennifer Thomas – Assistant Collection Manager
Division of Entomology
University of Kansas
Current common problems with digital specimen
Common issues with disparate data, data integration
General types of unique identifiers and how they
Specific digital IDs for Natural History objects and
the need for a global solution
What to do until we have a solution – best practices
Current common problems for
Entomology collection object records
Multiple barcodes for single
This happens when…
Specimens are gifted to another institution
Specimens are retained by another institution
as a result of a revisionary work
Specimens are returned to an institution in
accordance with permitting requirements for that
The digital records rarely accompany any of these
Current problems for all Natural History
Example from Briefings in Bioinformatics: Roderic Page 2008. vol. 9 (5):
345-354. Biodiversity Informatics: The challenge of linking data and the role
of shared identifiers.
The discussion continues
Lots of literature out there.
Dr. Page’s blog is great.
The NHColl listserv recently
hosted a long discussion on
Hinojosa-Diaz, Nemesio, Engel 2012
We’ll only ever have more
specimens, more associated data, more
data portals, and more ways to share
Natural History Specimens need Globally
Unique Identifiers that really work!
Familiar Unique IDs
ISBN = International Standard Book Number.
ISSN = International Standard Serial Number.
SSN = Social Security Number.
1st 3 numbers = area number
2nd 2 numbers = group number
Last 4 numbers = serial number
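As a small illustration of how a structured identifier like the SSN encodes meaning in its parts, here is a minimal Python sketch. The function name `split_ssn` and the sample number are made up for this example; the sample is not a real SSN.

```python
from typing import NamedTuple


class SsnParts(NamedTuple):
    area: str    # first 3 digits
    group: str   # next 2 digits
    serial: str  # last 4 digits


def split_ssn(ssn: str) -> SsnParts:
    """Split an SSN written as AAA-GG-SSSS (or 9 bare digits) into its parts."""
    digits = ssn.replace("-", "")
    if len(digits) != 9 or not digits.isdigit():
        raise ValueError(f"expected 9 digits, got {ssn!r}")
    return SsnParts(digits[:3], digits[3:5], digits[5:])


# Placeholder number, used only to show the three fields.
parts = split_ssn("123-45-6789")
print(parts.area, parts.group, parts.serial)
```

The point for collections work: an ID whose parts carry meaning breaks as soon as that meaning changes, which is one argument for opaque identifiers.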
Where did Gooooids come
GUID 1 = Globally Unique Identifier (/Gooooid/). Unique
reference number used as an identifier in computer
hardware/software and based on the UUID standard.
[128-bit values displayed as 32 hexadecimal digits in five groups
separated by hyphens] Ex: 3F2504E0-4F89-11D3-9A0C-0305E82C3301
UUID = Universally Unique Identifier. An identifier standard
used in software construction standardized by the Open
GUID 2 = RSS definition, still Globally Unique Identifier. The <guid>
element defines a unique identifier for the item. Aggregators
must view the guid as a string. No rules for syntax. Up to the
creator of the RSS document to establish uniqueness.
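The layout described for GUID 1 above (128 bits shown as 32 hexadecimal digits) can be checked with Python's standard `uuid` module. This is only a sketch: it parses the example value from the slide and then mints a fresh random identifier. The variable names are assumptions for illustration.

```python
import uuid

# Parse the example GUID from the slide.
example = uuid.UUID("3F2504E0-4F89-11D3-9A0C-0305E82C3301")

hex_digits = example.hex                          # hex digits, hyphens removed
group_lengths = [len(g) for g in str(example).split("-")]

print(len(hex_digits))       # number of hex digits
print(group_lengths)         # sizes of the hyphen-separated groups
print(len(example.bytes) * 8)  # total number of bits

# Mint a new random (version 4) UUID, e.g. for a new specimen record.
new_id = uuid.uuid4()
print(new_id)
```

Unlike the RSS `<guid>`, which is just a string whose uniqueness is left to the feed's creator, a UUID has a fixed syntax and needs no central registry.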
DOI = Digital Object Identifier. A character string used
to uniquely identify an object. Used mostly by
publishers, through registration agencies such as
CrossRef and DataCite. A persistent identifier commonly
assigned to scientific articles in their electronic form.
Managed by the International DOI Foundation
(IDF), the governance body of the DOI system.
The IDF appoints registration agencies that provide services to
DOI registrants like allocating DOI prefixes, registering
DOI names, etc.
Resolution uses the Handle System.
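To make the prefix/registrant structure concrete, here is a minimal Python sketch that splits a DOI name into its prefix and suffix and builds a link through the public `https://doi.org/` resolver. The helper names `split_doi` and `resolver_url` are invented for this example, and the DOI shown is a placeholder, not a real registered object.

```python
from typing import Tuple
from urllib.parse import quote


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI name into (prefix, suffix) at the first slash."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI name: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Build an HTTP link that asks the DOI resolver to redirect to the object."""
    split_doi(doi)  # validate first
    return "https://doi.org/" + quote(doi, safe="/")


# Placeholder DOI, for illustration only.
doi = "10.5555/example-suffix"
prefix, suffix = split_doi(doi)
print(prefix)             # allocated to a registrant by a registration agency
print(suffix)             # chosen by the registrant
print(resolver_url(doi))
```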
ARK = Archival Resource Key. ARKs are URLs
(Uniform Resource Locators) designed to support long-term
access to information objects. Used extensively
by university digital libraries/digital archives and
Google! Also requires a registry, maintained by the California
Digital Library. NAA = Name Assigning Authority; NAAN =
Name Assigning Authority Number.
Everyone and everything
wants a unique ID!
ASIN (Amazon Standard Identification Number, a
proprietary product identifier)
CODEN (serial publication identifier currently used by
libraries; replaced by the ISSN for new works)
DOI (Digital Object Identifier)
ETTN (Electronic Textbook Track Number)
ISAN (International Standard Audiovisual Number)
ISBN (International Standard Book Number)
ISMN (International Standard Music Number)
ISRC (International Standard Recording Code)
ISWC (International Standard Musical Work Code)
LCCN (Library of Congress Control Number)
OCLC (Online Computer Library Center)
Has the world gone
Natural History Collections
LSID = Life Science Identifier (no funny
pronunciation). It is a URN. Ex: Applied to species
names in Species 2000 and ITIS Catalogue of Life
Again, requires a registry. The governing body here
is TDWG “Biodiversity Information Standards”
(formerly The International Working Group on
John Deck, University of California, Berkeley
Brian Stucky, University of Colorado, Boulder
Lukasz Ziemba, University of Florida, Gainesville
Nico Cellinese, University of Florida, Gainesville
Rob Guralnick, University of Colorado, Boulder
Reed Beaman, Nico Cellinese, Jonathan
Coddington, Neil Davies, John Deck,
Rob Guralnick, P. Bryan Heidorn, Chris Meyer,
Tom Orrell, Rich Pyle, Kate Rachwal, Brian
Stucky, Rob Whitton, Lukasz Ziemba
Natural History Collections
The Museum community should
implement an international system for
distribution and maintenance of
persistent unique identifiers for all of
our biological objects.
BiSciCol Blog: http://biscicol.blogspot.com/
GUIDs must be globally unique. The “Darwin
Core Triplet” might not be good enough.
GUIDs must be persistent.
GUIDs must be assigned as close to the source
GUIDs propagate downstream to other systems.
Don’t conflate GUIDs for physical material with
GUIDs for metadata about the physical object.
GUIDs need to be attached in a meaningful way
to semantic services.
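The first two points above can be illustrated with a short Python sketch. It assumes the usual reading of the Darwin Core Triplet as institution code, collection code, and catalog number joined by colons. The `SpecimenRecord` class and all codes and numbers are hypothetical, not taken from any real collection.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SpecimenRecord:
    institution_code: str
    collection_code: str
    catalog_number: str
    # Opaque identifier assigned once, at the source, and never edited.
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))

    def triplet(self) -> str:
        return f"{self.institution_code}:{self.collection_code}:{self.catalog_number}"


# Two different specimens held by two museums that share an acronym.
first = SpecimenRecord("XYZ", "ENT", "000123")
second = SpecimenRecord("XYZ", "ENT", "000123")
same_triplet = first.triplet() == second.triplet()  # the triplets collide
same_guid = first.guid == second.guid               # a UUID collision is astronomically unlikely

# The first specimen is gifted and re-cataloged by the receiving institution.
old_triplet, old_guid = first.triplet(), first.guid
first.institution_code, first.catalog_number = "ABC", "987654"
triplet_changed = first.triplet() != old_triplet    # the triplet is not persistent
guid_persisted = first.guid == old_guid             # the GUID travels with the specimen

print(same_triplet, same_guid, triplet_changed, guid_persisted)
```

This is the same situation described earlier for gifted, retained, or returned specimens: a new barcode or catalog number breaks the triplet, while an opaque GUID carried with the record keeps downstream systems linked.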
KU Division of Entomology
Michael Engel - PI
NSF DBI – 1057366: A specimen-
level database of the world’s bees
(Apoidea) at the University of