Dr David Schindel and Mike Trizna - BOL Data Portal

The Barcode of Life
Data Portal
(http://bol.uvm.edu)
Dr. David E. Schindel, Executive Secretary
Michael Trizna, Database Specialist
Consortium for the Barcode of Life (CBOL)
Smithsonian Institution
Washington, DC
www.barcodeoflife.org
SchindelD@si.edu and TriznaM@si.edu

Contents of Presentation
Crowd-sourced open source software
How does Data Portal complement BOLD and GenBank?
Data Portal capabilities
Case Study: Smithsonian frozen bird tissue project

An Experiment in Museum Tissue Mining and Fast Data Release
Tissue sampling winter/spring
Sequencing completed in September
Sequence quality control in October
Taxonomic checking in early November
– Obvious errors removed
– Minor discrepancies remain
Data released for Adelaide Conference
– Crowd-sourced annotation by community
– Will data be misused?

Unique Data Portal Capabilities
Creating customized datasets from public and/or your private data
Online library of standard datasets
Support sharing within project teams using Connect IDs, with easy links to Working Groups
Running different identification analyses based on different methodologies:
– Standard sequence input using FASTA format
– Use standard or customized datasets
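Query sequences go into the Data Portal in standard FASTA format. As a minimal sketch (not the Portal's own code), a FASTA file can be read into a dictionary keyed by record ID. The function name `parse_fasta` and the example records are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence}.

    The record ID is the first whitespace-delimited token after '>'.
    Sequence lines are joined and upper-cased; blank lines are skipped.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            chunks = []
        elif current_id is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


# Hypothetical query file with two records (IDs and bases are made up).
example = """>QUERY_001 unidentified flycatcher
ACCTTCTACCTAATCTTCGG
CGCATGAGCCGGAATAGT
>QUERY_002
aaactgtagt
"""
seqs = parse_fasta(example.splitlines())
print(seqs["QUERY_001"])
```

Multi-line records are joined into one string, so the first record prints as a single 38-base sequence.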

Barcode Aggregator

727,170 public records

Existing Data Analysis Packages
List of packages:
– BLOG
– BRONX
– Kernel
– CAOS
– USEARCH
– BLAST
Output of identification routines as probabilities of assignment
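Each package above has its own model for turning sequence comparisons into probabilities of assignment. Purely to illustrate the idea (this is not how BLOG, BRONX, CAOS or the others compute it), the sketch below converts per-species distances into probabilities with a softmax over negative distances. The function name, the `scale` parameter and the distances are hypothetical.

```python
import math
from typing import Dict


def assignment_probabilities(distances: Dict[str, float], scale: float = 0.01) -> Dict[str, float]:
    """Convert query-to-species distances into probabilities that sum to 1.

    Weights are exp(-d / scale); a smaller `scale` (hypothetical tuning
    parameter) concentrates probability on the closest species.
    """
    if not distances:
        return {}
    best = min(distances.values())
    # Shift by the smallest distance before exponentiating, for numerical stability.
    weights = {sp: math.exp(-(d - best) / scale) for sp, d in distances.items()}
    total = sum(weights.values())
    return {sp: w / total for sp, w in weights.items()}


# Hypothetical distances from one query to the nearest reference of each species.
probs = assignment_probabilities({
    "Myiarchus ferox": 0.004,
    "Myiarchus swainsoni": 0.012,
    "Conopias parvus": 0.090,
})
for species, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{species}: {p:.3f}")
```

The softmax is only one convenient choice; it keeps every candidate's probability positive, which makes close calls between congeners visible rather than hiding them behind a single best hit.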

Data Analysis Methods Session
New packages presented Friday afternoon:
– Damon Little: Automatic Plants Barcode pipeline (from raw traces to trimmed/edited sequences)
– Ka Hou Chu: Composite Vector Method (profile trees for faster alignment and tree-based analysis)
– Alain Franc: Matching Next Generation results to Sanger-based reference records

CONNECT for Data Portal Collaboration

The USNM Bird Project
USNM Division of Birds frozen tissue collection:
– 21,104 specimens, 2,512 species
Which new ones to sample/barcode?
Public records for birds
– All public bird COI records: 10,967
– All BARCODE records in GenBank: 8,419
– BARCODE with taxonomic names: 7,965
– BARCODE, name and 2 traces: 2,388
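The record counts above shrink as each requirement is added. A minimal sketch of that filtering funnel, assuming each category is a subset of the one above it and that every record exposes the fields shown (the `CoiRecord` class, its field names and the toy records are hypothetical, not a BOLD or GenBank schema):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CoiRecord:
    accession: str
    has_barcode_keyword: bool    # record carries the BARCODE keyword
    species_name: Optional[str]  # None when not identified to species
    trace_count: int             # number of trace files attached


def funnel(records: List[CoiRecord]) -> List[int]:
    """Count records surviving each successive filter, in the slide's order."""
    barcode = [r for r in records if r.has_barcode_keyword]
    named = [r for r in barcode if r.species_name]
    traced = [r for r in named if r.trace_count >= 2]
    return [len(records), len(barcode), len(named), len(traced)]


# Toy records; accessions and values are hypothetical.
records = [
    CoiRecord("X0001", True, "Sublegatus modestus", 2),
    CoiRecord("X0002", True, None, 2),
    CoiRecord("X0003", False, "Conopias parvus", 0),
    CoiRecord("X0004", True, "Myiarchus ferox", 1),
]
print(funnel(records))  # [4, 3, 2, 1]
```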

Moving Data Among BOLD, GenBank, Data Portal
– USNM Excel spreadsheet (KE EMu source)
– BOLD: split into projects that consist of 2-4 plates
– Local database that holds all fields from the original database spreadsheet
– Data Portal Aggregator
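Splitting the spreadsheet into BOLD projects of 2-4 plates can be sketched as below. This assumes 96 samples per plate and fills plates in spreadsheet order; both the plate size and the helper names are hypothetical. Spreading plates evenly across ceil(P / 4) projects keeps every project within 2-4 plates whenever there are at least two plates.

```python
import math
from typing import List


def split_into_plates(sample_ids: List[str], wells_per_plate: int = 96) -> List[List[str]]:
    """Fill plates in input order; the last plate may be partly empty."""
    return [sample_ids[i:i + wells_per_plate]
            for i in range(0, len(sample_ids), wells_per_plate)]


def group_plates(plates: List[List[str]], max_plates: int = 4) -> List[List[List[str]]]:
    """Group plates into as few projects as possible, sizes differing by at most one.

    With the default max_plates=4 this yields ceil(P / 4) projects, each
    holding 2-4 plates whenever there are at least 2 plates in total.
    """
    if not plates:
        return []
    n_projects = math.ceil(len(plates) / max_plates)
    base, extra = divmod(len(plates), n_projects)
    projects, start = [], 0
    for k in range(n_projects):
        size = base + (1 if k < extra else 0)  # the first `extra` projects get one more plate
        projects.append(plates[start:start + size])
        start += size
    return projects


# 3,150 hypothetical tissue IDs, the number of tissues the project sampled.
samples = [f"USNM-{i:05d}" for i in range(1, 3151)]
plates = split_into_plates(samples)
projects = group_plates(plates)
print(len(plates), [len(p) for p in projects])  # 33 [4, 4, 4, 4, 4, 4, 3, 3, 3]
```

Balancing sizes, rather than filling projects of four and leaving a remainder, avoids a final single-plate project.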

Creating a ‘Pick List’
Spreadsheet of tissue samples compared with:
– ITIS taxonomy
– Clements species list in BOLD
– Counts of GenBank and/or public BOLD records
– Geographic information
[Screenshot: USNM list side-by-side with BOLD records]
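One way to turn these comparisons into a ranked pick list is sketched below, under stated assumptions: tissues whose names are not on the reference checklist are skipped (they would need manual review), species with the fewest public records come first, and within a species tissues from different countries are preferred. The function, the `per_species` limit and the example data (including countries) are hypothetical, not the project's actual selection rules.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Tissue = Tuple[str, str, str]  # (tissue_id, species_name, country)


def build_pick_list(tissues: List[Tissue],
                    public_counts: Dict[str, int],
                    checklist: Set[str],
                    per_species: int = 2) -> List[Tissue]:
    """Rank species by how few public records they have, then pick tissues.

    Names missing from the checklist are skipped. Within a species,
    tissues from countries not yet chosen are preferred.
    """
    by_species: Dict[str, List[Tissue]] = defaultdict(list)
    for tissue in tissues:
        if tissue[1] in checklist:
            by_species[tissue[1]].append(tissue)
    picks: List[Tissue] = []
    for species in sorted(by_species, key=lambda s: (public_counts.get(s, 0), s)):
        chosen: List[Tissue] = []
        countries: Set[str] = set()
        for t in by_species[species]:  # first pass: at most one tissue per country
            if len(chosen) < per_species and t[2] not in countries:
                chosen.append(t)
                countries.add(t[2])
        for t in by_species[species]:  # second pass: fill any remaining slots
            if len(chosen) < per_species and t not in chosen:
                chosen.append(t)
        picks.extend(chosen)
    return picks


tissues = [
    ("T1", "Conopias parvus", "Guyana"),
    ("T2", "Conopias parvus", "Guyana"),
    ("T3", "Conopias parvus", "Peru"),
    ("T4", "Myiarchus ferox", "Peru"),
    ("T5", "Sublegatus sp.", "Peru"),
]
public_counts = {"Myiarchus ferox": 12}  # hypothetical record counts
checklist = {"Conopias parvus", "Myiarchus ferox"}
pick = build_pick_list(tissues, public_counts, checklist)
print([t[0] for t in pick])  # ['T1', 'T3', 'T4']
```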

Identifying Samples to be Subsampled

USNM Bird Dataset
3,150 tissues sampled
168 failed sequences
94 problematic sequences
166 clustered badly
2,761 ‘BARCODE-ready’ samples
1,147 ‘first-BARCODE’ species
91% increase over 1,259 barcoded species
(BOLD lists 3,892 species, but that count includes BINs and other entries)
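The 91% figure follows directly from the two species counts on this slide. The combined total assumes, as the term ‘first-BARCODE’ implies, that the 1,147 new species do not overlap with the 1,259 already barcoded.

$$
\frac{1{,}147}{1{,}259} \approx 0.911 \;\Rightarrow\; \text{an increase of about } 91\%,
\qquad 1{,}259 + 1{,}147 = 2{,}406 \text{ barcoded species}
$$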

Two problematic clades, USNM data
Flycatchers: Family Tyrannidae
– Sublegatus arenarum, S. modestus, S. obscurior, S. sp.
– Conopias parvus, C. albovittatus
– Myiarchus ferox, M. swainsoni, M. sp.
Hummingbirds: Family Trochilidae
– Phaethornis longuemareus
Inconsistencies within USNM dataset
Incompatibilities with public and other data
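Inconsistencies like these can be surfaced with a simple nearest-neighbour check: flag any specimen whose closest sequence carries a different species label. The sketch below uses uncorrected p-distance on pre-aligned sequences for simplicity (barcode studies often use other models, such as K2P); it is an illustration, not the Data Portal's method. Function names and the 10-base toy sequences are hypothetical.

```python
import math
from typing import List, Tuple

Specimen = Tuple[str, str, str]  # (specimen_id, species_label, aligned_sequence)


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance over sites where both sequences have A, C, G or T."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            if x != y:
                differing += 1
    return differing / compared if compared else math.inf


def flag_label_conflicts(specimens: List[Specimen]) -> List[Tuple[str, str, str, float]]:
    """Flag specimens whose nearest neighbour carries a different species label.

    Ties go to the first neighbour encountered in input order.
    """
    flags = []
    for i, (sid, label, seq) in enumerate(specimens):
        best_dist, best_label = math.inf, None
        for j, (_, other_label, other_seq) in enumerate(specimens):
            if i == j:
                continue
            d = p_distance(seq, other_seq)
            if d < best_dist:
                best_dist, best_label = d, other_label
        if best_label is not None and best_label != label:
            flags.append((sid, label, best_label, best_dist))
    return flags


specimens = [
    ("S1", "Myiarchus ferox", "ACGTACGTAC"),
    ("S2", "Myiarchus ferox", "ACGTACGTAA"),
    ("S3", "Myiarchus swainsoni", "TCGAACGTCC"),
    ("S4", "Myiarchus swainsoni", "TCGAACGTCA"),
    ("S5", "Myiarchus swainsoni", "ACGTACGTAC"),  # identical to S1
]
flags = flag_label_conflicts(specimens)
print([f[0] for f in flags])  # ['S1', 'S5']
```

Both members of the conflicting pair are flagged; the check cannot tell which label is wrong, so resolving the misidentification still means going back to the voucher specimens.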

Resolving Misidentified Specimens

Which testing dataset to use?
ID trees and analytical routines could use:
– All public bird COI records: 10,967
– All BARCODE records in GenBank: 8,419
– BARCODE with taxonomic names: 7,965
– BARCODE, name and 2 traces: 2,388
Which ones have reliable taxonomic IDs?

Preparing a Data Release Paper
Summary statistics from Data Portal

Figures from BOLD
