David Schindel - Barcode Data Standard Compliance

Vouchering and archiving of vouchers, imaging and archival of e-vouchers, provenance data quality and sequence and trace file quality

Published in: Education, Technology

  1. The BARCODE Data Standard. David E. Schindel, Executive Secretary, National Museum of Natural History, Smithsonian Institution. SchindelD@si.edu; http://www.barcoding.si.edu; 202/633-0812; fax 202/633-2938
  2. BARCODE Data Standard is:
     – A set of required elements for a reserved keyword (‘BARCODE’) in GenBank
     – A set of sequence quality requirements
     – Required or recommended formats for data interoperability with:
       – Voucher specimens in biorepositories
       – Georeferenced data
       – Taxonomic literature
  3. An Internal ID System for All Animals: The Mitochondrial Genome. [Figure: typical animal cell, mitochondrion, and mtDNA (L-strand and H-strand); labeled regions: D-loop, small ribosomal RNA, cytochrome b, ND1, ND2, ND3, ND4, ND4L, ND5, ND6, COI, COII, COIII, ATPase subunit 6, ATPase subunit 8.]
  4. Non-COI regions for other taxa
     Land plants:
       – Chloroplast matK and rbcL approved Nov 09
       – 70-75% resolving ability, higher in angiosperms
       – Non-coding plastid and nuclear regions being explored
     Fungi:
       – CBOL Working Group met this week in Amsterdam
       – Agreed to recommend ITS; 72% effective
     Protists:
       – CBOL Working Group July meeting, Berlin
  5. BARCODE Record Flow Chart. [Flow chart; key: mirroring, update channel, private records; labels: USER, GenBank.]
  6. BARCODE Records in GenBank
  7. Submission of BARCODE Records to EBI and DDBJ
  8. Required Elements for BARCODE
     – Taxonomic identification to species
     – Voucher specimen ID in standard format
     – Name of barcode region
     – Length, quality, 2 trace files
     – Forward/reverse primer sequences, names
     – Country/Ocean/Sea of origin
  9. Highly Recommended Elements
     – Latitude/longitude
     – Name of Collector
     – Collection date
     – Name of identifier
  10. [Table; column headings: Traditional Taxonomy, Traditional GenBank, GSC Minimum Standards (MI*). Marks for each row follow in their original order.]
      Voucher specimen ID: XXX XXX
      Species ID: XXX X X
      Identified by: XXX
      DNA sequence: XXX XXX
      Gene region: XXX
      Geographic origin (country, ocean): XXX X
      Latitude/Longitude: XXX XXX
      Collection date, collector name: XXX XXX
      Trace files: XXX XX
      Primer information: X XX
  11. BARCODE Records in INSDC. [Diagram; labels: Specimen Metadata (georeference, habitat, character sets, images, behavior, other genes); Voucher Specimen; Species Name; Barcode Sequence (trace files, primers); Indices (Catalogue of Life, GBIF/ECAT); Nomenclators (Zoo Record, IPNI, NameBank); Publication links (new species, provisional sp.); Literature citation; Databases; Record in BOLD.]
  12. Compliance with Standard (1)
      – 1.37 million records in BOLD
      – 514,390 BARCODE records in INSDC
      – 395,774 have ordinal name plus Barcode Index Number for taxonomic ID
        – Rapid data release versus time for annotation
        – Exposure to data theft, risk of misidentification
        – Added value of Linnean name
        – Incidence of misidentifications in GenBank
        – Danger of circular reasoning
  13. Taxonomic Identification
      The genus and species combination that can be found in:
        – a taxonomic index such as Catalogue of Life, Zoological Record or IPNI;
        – a taxonomic treatment of a previously published species name; or
        – a published description of the species;
      or a provisional label for a potential new species.
  14. Rod Page’s ‘Dark Taxa’ (R. Page, iPhylo blogspot, 12 April 2011)
  15. Taxonomic Content in iBOL Data
      iBOL ‘Phase 1’: Org name: Order + BIN; Tentative Name: blank
      iBOL ‘Phase 2’: Org name: Order + BIN; Tentative Name: blank
      GenBank ‘Phase 0’: Tentative name is in BOLD, unreleased
      GenBank ‘Phase 1’: Org name = Order + BIN plus Tentative name
      GenBank ‘Phase 2’: Org name = sp. name
  16. Unique identifier for the voucher specimen
      In standardized format based on Darwin Core:
        – Institutional acronym:Collection code:Specimen number
        – Institutional acronym:Specimen number
        – personal:Collection code:Specimen number
      GTI/CBOL/iBOL Workshop, 7 November 2009
  17. Compliance with Standard (2)
      – 514,390 BARCODE records in INSDC
        – Traces, primers, length, country, and presence of voucher ID checked by GenBank
      – 99.9% have entry for /specimen_voucher
      – 13,151 have formatted voucher from 38 institutions
        – 20 confirmed in biorepositories
        – 11 unconfirmed
        – 7 unlisted
  18. Darwin Core Triplet: Structured Link to Vouchers
      Institutional Acronym : Collection Code : Catalog ID
      NHM : LEP : 123456
      personal : DHJanzen : SRNP12345
  19. [Table: institutional acronyms shared by more than one institution]
      AMNH: Icelandic Institute of Natural History, Akureyri Division; Akureyri; Iceland
      AMNH: American Museum of Natural History; New York; USA
      UNL: Universidad Autónoma de Nuevo León; Monterrey, Nuevo León; Mexico
      UNL: University of Nebraska State Museum; Lincoln, Nebraska; USA
      UNL: Centro de Estratigrafia e Paleobiologia da Universidade Nova de Lisboa; Monte de Caparica; Portugal
      ZMK: Zoological Museum, Kristiania; Oslo; Norway
      ZMK: Zoologisches Museum der Universität Kiel; Kiel; Germany
      ZMK: Zoological Museum, Copenhagen; Copenhagen; Denmark
  20. CBOL/GBIF/NCBI Registry of Biorepositories: www.biorepositories.org
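
The voucher identifier formats on slides 16 and 18 (Institutional acronym:Collection code:Specimen number, the two-part form without a collection code, and the "personal" form) lend themselves to a small parser. The following is a minimal sketch in Python, not an official CBOL or GenBank tool: the names `VoucherID` and `parse_voucher` are hypothetical, and the rule that "personal" must appear in three-part form is an assumption drawn only from the examples on the slides.

```python
from typing import NamedTuple, Optional


class VoucherID(NamedTuple):
    """Hypothetical container for the parts of a structured voucher ID."""
    institution: str           # institutional acronym, or the literal "personal"
    collection: Optional[str]  # collection code (None in the two-part form)
    specimen: str              # specimen number / catalog ID


def parse_voucher(text: str) -> VoucherID:
    """Split a colon-delimited voucher identifier into its parts."""
    # Tolerate spaces around the colons, as in "NHM : LEP : 123456".
    parts = [part.strip() for part in text.split(":")]
    if any(not part for part in parts):
        raise ValueError(f"empty component in voucher ID: {text!r}")
    if len(parts) == 3:
        institution, collection, specimen = parts
        return VoucherID(institution, collection, specimen)
    if len(parts) == 2:
        institution, specimen = parts
        if institution.lower() == "personal":
            # Assumption: the slides show "personal" only in three-part form.
            raise ValueError(f"'personal' voucher needs three parts: {text!r}")
        return VoucherID(institution, None, specimen)
    raise ValueError(f"expected two or three parts, got {len(parts)}: {text!r}")


if __name__ == "__main__":
    for example in ("NHM:LEP:123456", "personal : DHJanzen : SRNP12345"):
        print(parse_voucher(example))
```

Parsing only checks the shape of the identifier. As slide 19 shows, an acronym such as AMNH or ZMK can belong to several institutions, so resolving the institution still needs a lookup against a registry such as the one on slide 20.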

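The required elements listed on slide 8 can be read as a checklist applied to each record. The sketch below shows one way such a check might look in Python. It is illustrative only: the dictionary keys (`species_name`, `voucher_id`, and so on) and the function `missing_elements` are hypothetical names, not GenBank or BOLD field names, and the slide's length and quality requirements are not modeled because the slides give no thresholds.

```python
from typing import Any, Dict, List

# Hypothetical field names standing in for the required elements on slide 8.
REQUIRED_FIELDS = (
    "species_name",         # taxonomic identification to species
    "voucher_id",           # voucher specimen ID in standard format
    "barcode_region",       # name of barcode region, e.g. "COI"
    "sequence",             # the barcode sequence itself
    "forward_primer_name",
    "forward_primer_seq",
    "reverse_primer_name",
    "reverse_primer_seq",
    "country_or_ocean",     # country/ocean/sea of origin
)
MIN_TRACE_FILES = 2         # slide 8 asks for 2 trace files


def missing_elements(record: Dict[str, Any]) -> List[str]:
    """Return the names of required elements that are absent or blank."""
    missing = [
        field for field in REQUIRED_FIELDS
        if not str(record.get(field) or "").strip()
    ]
    if len(record.get("trace_files") or []) < MIN_TRACE_FILES:
        missing.append("trace_files")
    return missing


if __name__ == "__main__":
    record = {
        "species_name": "Genus species",
        "voucher_id": "NHM:LEP:123456",
        "barcode_region": "COI",
        "sequence": "ACGT",
        "trace_files": ["forward.ab1"],
    }
    print(missing_elements(record))
```

An empty list means every required element is present. The highly recommended elements on slide 9 could be reported the same way as warnings rather than failures.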