Schindel i evobio norman ok - jun 11
1. The BARCODE Data Standard as a Cross-Cultural Bridge. David E. Schindel, Executive Secretary, National Museum of Natural History, Smithsonian Institution. SchindelD@si.edu; http://www.barcoding.si.edu; 202/633-0812; fax 202/633-2938
2. Gaining Large Scale Through Standards. Are our data meant only for small segregated communities of practice or bigger audiences? Accelerate progress, Economies of scale; Re-use and new use of data, synthesis, comparative analysis; Shared hardware and software; Standardized protocols, easier training and technical assistance; Applications by non-specialists (regulatory agencies, citizen scientists, K-12 classroom)
4. Species Identification Matters. Basic research: One more character set, but digital and calibrated; Standardized yardstick for measuring variability and divergence; Objective comparison across taxa, distance; Links to Linnean names; Triage by non-specialists for species discovery; Ecology of juveniles, gut contents, fecal matter; Shallow phylogenies showing history of community assemblages; Subject to weaknesses of any single character (convergence, pseudogenes, introgression, etc.)
5. Species Identification Matters. Applied research/regulation by non-specialists: Agricultural pests/beneficial species; Endangered/protected species; Disease vectors/pathogens; Environmental quality indicators; Invasive species (e.g., in ballast water); Managing for sustainable harvesting; Consumer protection, ensuring food quality; Fidelity of seedbanks, culture collections
8. The Mitochondrial Genome: An Internal ID System for All Animals [Figure: typical animal cell and mitochondrion, with a map of the mtDNA (L-strand and H-strand) labeling the D-loop, small ribosomal RNA, cytochrome b, ND1, ND2, ND3, ND4, ND4L, ND5, ND6, COI, COII, COIII, and ATPase subunits 6 and 8]
9. Non-COI regions for other taxa. Land plants: Chloroplast matK and rbcL approved Nov 09; 70-75% resolving ability, higher in angiosperms; Non-coding plastid and nuclear regions being explored. Fungi: CBOL Working Group met this week in Amsterdam; Agreed to recommend ITS; 72% effective. Protists: CBOL Working Group July meeting, Berlin
10. How Barcoding Works. PHASE 1: Build a barcode reference library: Well-identified specimen; Tissue subsample; DNA extraction, PCR amplification; DNA sequencing; Data submission to GenBank. PHASE 2: Identify unknowns: Any unidentified juvenile, adult, fragment, product; Tissue sample, DNA, sequencing; Comparison with sequences in reference library
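The comparison step in Phase 2 can be illustrated with a minimal sketch. It assumes the query and reference sequences are already aligned to the same length, and it uses a simple uncorrected p-distance with a nearest-match rule. The function names, the toy sequences, and the species labels are all hypothetical, and this is not a description of how BOLD or GenBank actually perform identifications.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or ambiguous base are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [
        (x, y)
        for x, y in zip(a.upper(), b.upper())
        if x in "ACGT" and y in "ACGT"
    ]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in compared) / len(compared)


def identify(query: str, library: dict) -> tuple:
    """Return (name, distance) for the closest reference sequence."""
    scored = ((name, p_distance(query, seq)) for name, seq in library.items())
    return min(scored, key=lambda pair: pair[1])


# Toy, made-up sequences (not real barcode data).
reference_library = {
    "Species A": "ACGTACGTACGTACGTACGT",
    "Species B": "ACGTTCGTACCTACGAACGT",
}
unknown = "ACGTACGTACGTACGAACGT"

best_name, best_distance = identify(unknown, reference_library)
print(best_name, round(best_distance, 3))
```

A real workflow would also need a threshold or a tree-based criterion to decide when the closest match is too distant to count as an identification. That choice is left out here.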
21. BARCODE Records in INSDC. Voucher Specimen; Species Name; Specimen Metadata; Georeference; Habitat; Character sets; Images; Behavior; Other genes; Indices - Catalogue of Life - GBIF/ECAT; Nomenclators - Zoo Record - IPNI - NameBank; Publication links - New species; Barcode Sequence; Trace files; Primers; Other Databases; Literature (link to content or citation); Phylogenetic; Pop’n Genetics; Ecological; Databases - Provisional sp.
31. Structured Link to Vouchers: NHM:LEP:123456; personal:DHJanzen:SRNP12345
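The two voucher examples on this slide appear to follow the INSDC structured voucher form, institution-code:collection-code:specimen_id, in which the first two parts are optional. A minimal parsing sketch under that assumption follows. The `Voucher` type and `parse_voucher` function are hypothetical names, and the colon-joined strings are a reading of the slide, not a quotation of it.

```python
from typing import NamedTuple, Optional


class Voucher(NamedTuple):
    institution: Optional[str]
    collection: Optional[str]
    specimen_id: str


def parse_voucher(text: str) -> Voucher:
    """Split a colon-delimited voucher string into its parts.

    Accepts 'specimen_id', 'institution:specimen_id', or
    'institution:collection:specimen_id'.
    """
    parts = [p.strip() for p in text.split(":")]
    if any(not p for p in parts) or len(parts) > 3:
        raise ValueError(f"not a structured voucher: {text!r}")
    if len(parts) == 3:
        return Voucher(parts[0], parts[1], parts[2])
    if len(parts) == 2:
        return Voucher(parts[0], None, parts[1])
    return Voucher(None, None, parts[0])


museum = parse_voucher("NHM:LEP:123456")
private = parse_voucher("personal:DHJanzen:SRNP12345")
print(museum)
print(private)
```

Keeping the institution and collection codes as separate fields is what lets a database resolve a voucher against a registry of biorepositories instead of matching free text.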
32. NCBI’s Biorepository List. Compiled from Index Herbariorum, literature sources, GenBank submissions; 6,936 records; 1,177 records with non-unique acronyms; 517 homonymous acronyms: 374 shared by two records, 143 shared by three records
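The counts on this slide are internally consistent: 374 + 143 = 517 homonymous acronyms, and 374 × 2 + 143 × 3 = 1,177 records with non-unique acronyms. The sketch below shows one way such a tally could be produced from a list of acronyms. The function name and the sample acronyms are made up for illustration and are not taken from the NCBI list.

```python
from collections import Counter


def homonym_summary(acronyms):
    """Summarize acronyms that are used by more than one record."""
    counts = Counter(acronyms)
    shared = {acronym: n for acronym, n in counts.items() if n > 1}
    return {
        "homonymous_acronyms": len(shared),
        "records_with_non_unique_acronyms": sum(shared.values()),
        # how many acronyms are shared by 2 records, by 3 records, ...
        "by_multiplicity": dict(Counter(shared.values())),
    }


# Made-up acronyms: AAA is shared by two records, CCC by three.
sample = ["AAA", "AAA", "BBB", "CCC", "CCC", "CCC", "DDD"]
summary = homonym_summary(sample)
print(summary)

# Consistency check of the figures quoted on the slide.
slide_two, slide_three = 374, 143
print(slide_two + slide_three)          # homonymous acronyms
print(slide_two * 2 + slide_three * 3)  # records with non-unique acronyms
```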
35. Two Taxonomic Research Processes. Accessibility; Formal naming; Collaborative consensus-building of taxon concepts (CATE); Sharing of non-BARCODE data (ScratchPads); BARCODE data release with provisional nomenclature (PLoS); Specimen data release (GBIF); Comparisons, concept validation; Taxon concept formation, refinement; Collecting events, specimens; Specimen clustering
36. Long-term data curation of BARCODE records. Data records assembled in BOLD; Community feedback; Compliant with BARCODE standards?; Update records (audit trail of species names retained); Data records released on INSDC; IDs consistent with other records?; GenBank adds BARCODE flag; CBOL control of BARCODE flag; Data records published in BOLD
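The "audit trail of species names retained" step can be sketched as a record that stores every earlier name whenever its identification is updated. The class, field names, record ID, and species names below are hypothetical. This is an illustration of the idea, not the data model used by BOLD or GenBank.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class BarcodeRecord:
    record_id: str
    species_name: str
    # Each entry: (previous name, date of change, reason for change)
    name_history: list = field(default_factory=list)

    def rename(self, new_name: str, changed_on: date, reason: str) -> None:
        """Update the species name while retaining the old one."""
        self.name_history.append((self.species_name, changed_on, reason))
        self.species_name = new_name


# Made-up record: a provisional name later replaced by a formal one.
record = BarcodeRecord("REC-0001", "Genus sp. A")
record.rename("Genus exemplum", date(2011, 6, 1), "community feedback")

print(record.species_name)
print(record.name_history)
```

Appending to the history instead of overwriting the name means earlier identifications, including provisional ones, stay traceable after the record is updated.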