Thomas ecn 2012


Published in: Education, Technology
  • Figure 3 from his paper. He searched GenBank for sequence data for the ant species Melissotarsus insularis and found nothing. Then he searched AntWeb and found a specimen listed as sequenced, along with a publication whose supplementary table lists this specimen as the source for sequence number DQ176312. He went back to GenBank and found this record under the taxon name Melissotarsus sp. BLF ml.
  • ISBN: A unique standard book numbering code, 13 digits long (10 digits before 2007). One agency per country is designated to assign ISBNs for the publishers and self-publishers located in that country. ISSN: This is similar to ISBN, but for periodical publications. SSN: A unique nine-digit number issued to U.S. citizens, permanent residents and temporary (working) residents under section 205(c)(2) of the Social Security Act. Issued by the Social Security Administration, an independent agency of the US government.
  • The advantage is its persistence. When an object moves, its URL changes, but its DOI remains the same.
  • 10-15 years ago TDWG started working on the GUID problem for museum specimens. The Specify software will generate a human-readable GUID for your specimens by co-opting a domain name system approach. This hasn't been widely implemented by the community at large, and SEMC hasn't implemented them for that reason.

    1. 1. Entomology specimens develop dissociative identity disorder. Jennifer Thomas – Assistant Collection Manager, Division of Entomology, University of Kansas
    2. 2. Synopsis • Current common problems with digital specimen records • Common issues with disparate data and data integration • General types of unique identifiers and how they function • Specific digital IDs for Natural History objects and the need for a global solution • What to do until we have a solution – best practices
    3. 3. Current common problems for Entomology collection object records • Multiple barcodes for a single specimen/collection object
    4. 4. This happens when… • Specimens are gifted to another institution • Specimens are retained by another institution as a result of a revisionary work • Specimens are returned to an institution in accordance with permitting requirements for that particular country • The digital records rarely accompany any of these specimen transactions.
    5. 5. Current problems for all Natural History specimens
    6. 6. Example from Roderic Page 2008. Biodiversity informatics: the challenge of linking data and the role of shared identifiers. Briefings in Bioinformatics 9(5): 345-354. Melissotarsus sp. BLF ml.
    7. 7. The discussion continues. Lots of literature out there. • Dr. Page's blog is great. • The NHColl listserv recently hosted a long discussion on identifiers.
    8. 8. Euglossa embera Hinojosa-Diaz, Nemesio, Engel 2012
    9. 9. We’ll only ever have more specimens, more associated data, more data portals, and more ways to share that data… Natural History Specimens need Globally Unique Identifiers that really work!
    10. 10. Familiar Unique IDs • ISBN = International Standard Book Number. • ISSN = International Standard Serial Number. • SSN = Social Security Number. First 3 digits = area number; next 2 digits = group number; last 4 digits = serial number.
    11. 11. Where did Gooooids come from? • GUID 1 = Globally Unique Identifier (/Gooooid/). A unique reference number used as an identifier in computer hardware and software, based on the UUID standard. [128-bit values displayed as 32 hexadecimal digits in hyphen-separated groups] Ex: 3F2504E0-4F89-11D3-9A0C-0305E82C3301 • UUID = Universally Unique Identifier. An identifier standard used in software construction, standardized by the Open Software Foundation. • GUID 2 = the RSS definition, still "Globally Unique ID". The <guid> element defines a unique identifier for the item. Aggregators must treat the guid as a string. There are no rules for syntax; it is up to the creator of the RSS document to establish uniqueness.
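The 128-bit layout described on this slide can be checked with Python's standard `uuid` module. This is a minimal sketch that parses the slide's own example value; the variable names are illustrative only.

```python
import uuid

# The example GUID from the slide, parsed as a standard UUID.
example = uuid.UUID("3F2504E0-4F89-11D3-9A0C-0305E82C3301")

# A UUID is a 128-bit value: 32 hexadecimal digits, usually shown
# in five hyphen-separated groups (8-4-4-4-12).
hex_digits = example.hex                       # hyphens removed
group_lengths = [len(g) for g in str(example).split("-")]

print(len(hex_digits) * 4)   # number of bits: 128
print(group_lengths)         # [8, 4, 4, 4, 12]
print(example.version)       # 1 (a time-based UUID)

# Minting a fresh random (version 4) identifier, for example for a new record.
new_id = uuid.uuid4()
print(new_id)
```

Version 4 identifiers are random, so no central registry is needed to mint them, but nothing in the value itself says who issued it or where to resolve it.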
    12. 12. Persistent identifiers… • DOI = Digital Object Identifier. A character string used to uniquely identify an object. Used mostly by publishers, through registration agencies such as CrossRef and DataCite. A persistent identifier commonly assigned to scientific articles in their electronic form. • Managed by the International DOI Foundation (IDF), the governance body of the DOI system. • The IDF appoints registration agencies that provide services to DOI registrants, such as allocating DOI prefixes and registering DOI names. • Resolution uses the Handle System.
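To make the prefix/suffix structure and the resolution step on this slide concrete, here is a minimal sketch. It assumes the public resolver at https://doi.org/; the DOI value and both function names are made up for illustration and do not refer to a real object or library API.

```python
from typing import Tuple
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI name into (prefix, suffix) at the first slash."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI name: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Build the URL that asks the resolver to redirect to the object's current location."""
    prefix, suffix = split_doi(doi)
    return f"{DOI_RESOLVER}{prefix}/{quote(suffix, safe='/')}"


placeholder = "10.5555/example-specimen-001"  # hypothetical DOI, illustration only
print(split_doi(placeholder))
print(resolver_url(placeholder))
```

The prefix is what a registration agency allocates to a registrant; the suffix is chosen by the registrant. The resolver URL stays the same even when the object it redirects to moves.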
    13. 13. More… • ARK = Archival Resource Key. ARKs are URLs (Uniform Resource Locators) designed to support long-term access to information objects. Used extensively by university digital libraries and digital archives, and by Google! • Also requires a registry, maintained by the California Digital Library. • NAA = name assigning authority • NAAN = name assigning authority number!
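The NAAN mentioned on this slide is the part of an ARK that identifies who assigned the name. A minimal sketch of pulling it out of an ARK follows; it assumes a purely numeric NAAN of 5 to 9 digits, and the example URL, host, and name are hypothetical placeholders.

```python
import re
from typing import NamedTuple


class Ark(NamedTuple):
    naan: str  # Name Assigning Authority Number
    name: str  # name assigned by that authority (may include qualifiers)


# Accepts a bare ARK or one embedded in a URL; the slash after "ark:" is optional.
# Assumption: the NAAN is 5 to 9 digits.
_ARK_RE = re.compile(r"ark:/?(?P<naan>\d{5,9})/(?P<name>\S+)", re.IGNORECASE)


def parse_ark(text: str) -> Ark:
    """Extract the NAAN and the assigned name from an ARK string."""
    match = _ARK_RE.search(text)
    if match is None:
        raise ValueError(f"no ARK found in {text!r}")
    return Ark(match["naan"], match["name"])


# Hypothetical ARK served from a hypothetical host.
print(parse_ark("https://example.org/ark:/12345/x9specimen42"))
```

Because the hostname sits outside the `ark:` part, the same ARK can be served from a different host later without changing the identifier itself.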
    14. 14. Everyone and everything wants a unique ID! • ASIN (Amazon Standard Identification Number, a proprietary product identifier) • CODEN (serial publication identifier currently used by libraries; replaced by the ISSN for new works) • DOI (Digital Object Identifier) • ETTN (Electronic Textbook Track Number) • ISAN (International Standard Audiovisual Number) • ISBN (International Standard Book Number) • ISMN (International Standard Music Number) • ISRC (International Standard Recording Code) • ISWC (International Standard Musical Work Code) • LCCN (Library of Congress Control Number) • OCLC (Online Computer Library Center)
    15. 15. Has the world gone identifier crazy? YES!
    16. 16. Natural History Collections implementations • LSID = Life Science Identifier (no funny pronunciation). It is a URN. Ex: applied to species names in the Species 2000 and ITIS Catalogue of Life project. • Again, requires a registry. The governing body here is TDWG, "Biodiversity Information Standards" (formerly The International Working Group on Taxonomic Databases).
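Since an LSID is a URN, its parts can be read off by splitting on colons: `urn:lsid:<authority>:<namespace>:<object>` with an optional trailing revision. A minimal sketch under that assumption; the example identifier and the names `Lsid` and `parse_lsid` are hypothetical.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str           # who issued the identifier, usually a domain name
    namespace: str           # the issuer's own grouping of objects
    object_id: str           # the identifier within that namespace
    revision: Optional[str]  # optional version of the object


def parse_lsid(lsid: str) -> Lsid:
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = lsid.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {lsid!r}")
    if not all(parts[2:]):
        raise ValueError(f"empty component in LSID: {lsid!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


# Hypothetical identifier, for illustration only.
print(parse_lsid("urn:lsid:example.org:specimens:12345"))
```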
    17. 17. Natural History Collections implementations • John Deck, University of California, Berkeley • Brian Stucky, University of Colorado, Boulder • Lukasz Ziemba, University of Florida, Gainesville • Nico Cellinese, University of Florida, Gainesville • Rob Guralnick, University of Colorado, Boulder • BiSciCol Team: Reed Beaman, Nico Cellinese, Jonathan Coddington, Neil Davies, John Deck, Rob Guralnick, Bryan P. Heidorn, Chris Meyer, Tom Orrell, Rich Pyle, Kate Rachwal, Brian Stucky, Rob Whitton, Lukasz Ziemba
    18. 18. The Solution • The museum community should implement an international system for distribution and maintenance of persistent unique identifiers for all of our biological objects.
    19. 19. Best Practices (from the BiSciCol blog) • GUIDs must be globally unique. The "Darwin Core Triplet" might not be good enough. • GUIDs must be persistent. • GUIDs must be assigned as close to the source as possible. • GUIDs propagate downstream to other systems. • Don't conflate GUIDs for physical material with GUIDs for metadata about the physical object. • GUIDs need to be attached in a meaningful way to semantic services.
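One way to see why the Darwin Core Triplet (institution code, collection code, catalog number) "might not be good enough" is to follow a specimen through a transfer between institutions. The sketch below is an illustration under stated assumptions, not the BiSciCol implementation: the record class, the institution and collection codes, and the catalog numbers are all hypothetical.

```python
import uuid
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SpecimenRecord:
    occurrence_id: str     # GUID minted once, as close to the source as possible
    institution_code: str
    collection_code: str
    catalog_number: str

    @property
    def triplet(self) -> str:
        """Darwin Core Triplet built from the three descriptive fields."""
        return f"{self.institution_code}:{self.collection_code}:{self.catalog_number}"


# Record created at the originating collection (hypothetical codes).
original = SpecimenRecord(str(uuid.uuid4()), "INST-A", "ENT", "0001234")

# The specimen is gifted to another institution and re-cataloged there.
transferred = replace(original, institution_code="INST-B", catalog_number="B-98765")

print(original.triplet)
print(transferred.triplet)
print(original.triplet == transferred.triplet)              # False: the triplet changed
print(original.occurrence_id == transferred.occurrence_id)  # True: the GUID persisted
```

The triplet describes where the specimen currently lives, so it changes when the specimen moves; an identifier minted once and carried downstream does not.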
    20. 20. Acknowledgements • KU Bioinformatics: Andy Bentley, Rod Spears, Theresa Lammer • KU Division of Entomology: Zack Falin, Michael Engel (PI) • NSF DBI-1057366: A specimen-level database of the world's bees (Apoidea) at the University of Kansas