Dr David Schindel and Mike Trizna - BOL Data Portal
 


Using the BOL Data Portal in conjunction with BOLD, its capabilities, and a case study: the Smithsonian frozen bird tissue project

Usage Rights: CC Attribution-NonCommercial License

    Presentation Transcript

    • The Barcode of Life Data Portal (http://bol.uvm.edu). Dr. David E. Schindel, Executive Secretary; Michael Trizna, Database Specialist. Consortium for the Barcode of Life (CBOL), Smithsonian Institution, Washington, DC. www.barcodeoflife.org; SchindelD@si.edu and TriznaM@si.edu
    • Contents of Presentation: crowd-sourced open source software; how does Data Portal complement BOLD and GenBank?; Data Portal capabilities; case study: Smithsonian frozen bird tissue project
    • An Experiment in Museum Tissue Mining and Fast Data Release: tissue sampling in winter/spring; sequencing completed in September; sequence quality control in October; taxonomic checking in early November (obvious errors removed, minor discrepancies remain); data released for Adelaide Conference (crowd-sourced annotation by community; will data be misused?)
    • Unique Data Portal Capabilities: creating customized datasets from public and/or your private data; online library of standard datasets; support for sharing within project teams using Connect IDs, with an easy link to Working Groups; running different identification analyses based on different methodologies (standard sequence input using FASTA format; use of standard or customized datasets)
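The slide names FASTA as the standard sequence input format. As a rough illustration of what such an input looks like once read, here is a minimal sketch of a FASTA reader. The function name `parse_fasta`, the record ids, and the bases are all made up for the example; this is not the Data Portal's own code.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into a {record id: sequence} mapping."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: use the first whitespace-delimited token as the id.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence line: append to the current record, normalised to upper case.
            records[current_id] += line.upper()
    return records


# Hypothetical query file contents (ids and bases are invented).
example = """>query1 unidentified tissue sample
ACGTACGT
ttgacc
>query2
GGCCAATT
"""
queries = parse_fasta(example.splitlines())
print(queries)
```

Sequences split across several lines are joined into one string per record, which is the form most identification routines expect.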
    • Barcode Aggregator: 727,170 public records
    • Summary Statistics per Family
    • Creating Customized Datasets
    • Existing Data Analysis Packages: list of packages (BLOG, BRONX, Kernel, CAOS, USEARCH, BLAST); output of identification routines as probabilities of assignment
    • Data Analysis Methods Session: new packages presented Friday afternoon. Damon Little: Automatic Plants Barcode pipeline (from raw traces to trimmed/edited sequences); Ka Hou Chu: Composite Vector Method (profile trees for faster alignment and tree-based analysis); Alain Franc: matching Next Generation results to Sanger-based reference records
    • Sample output
    • CONNECT for Data Portal Collaboration
    • The USNM Bird Project: USNM Division of Birds frozen tissue collection (21,104 specimens, 2,512 species); which new ones to sample/barcode? Public records for birds (all public bird COI records: 10,967; all BARCODE records in GenBank: 8,419; BARCODE with taxonomic names: 7,965; BARCODE, name and 2 traces: 2,388)
    • Moving Data Among BOLD, GenBank, Data Portal [diagram labels: USNM Excel spreadsheet (KE-Emu source); BOLD: split into projects that consist of 2-4 plates; local database that holds all fields from the original spreadsheet; Data Portal Aggregator database]
    • Creating a ‘Pick List’: spreadsheet of tissue samples compared with ITIS taxonomy, the Clements species list in BOLD, counts of GenBank and/or public BOLD records, and geographic information [screenshot of USNM list side-by-side with BOLD records]
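The pick-list idea on this slide, comparing each tissue sample against counts of existing public records, can be sketched as follows. This is only an illustration under stated assumptions: the field names (`tissue_id`, `species`, `public_records`), the function `build_pick_list`, and all sample data are hypothetical, and the real spreadsheet also drew on ITIS taxonomy and geographic information, which are left out here.

```python
from collections import Counter
from typing import Dict, List


def build_pick_list(
    tissues: List[Dict[str, str]], public_species: List[str]
) -> List[Dict[str, object]]:
    """Annotate each tissue with the number of public records for its species.

    Rows are sorted so that species with the fewest public records come first.
    """
    counts = Counter(public_species)
    rows = [
        {
            "tissue_id": t["tissue_id"],
            "species": t["species"],
            "public_records": counts[t["species"]],  # 0 if the species is absent
        }
        for t in tissues
    ]
    return sorted(rows, key=lambda row: row["public_records"])


# Hypothetical tissue inventory and public record list (placeholder names).
tissues = [
    {"tissue_id": "T001", "species": "Genus alpha"},
    {"tissue_id": "T002", "species": "Genus beta"},
    {"tissue_id": "T003", "species": "Genus gamma"},
]
public_species = ["Genus alpha", "Genus alpha", "Genus gamma"]

pick_list = build_pick_list(tissues, public_species)
# Tissues whose species has no public record at all are the obvious candidates.
candidates = [row["tissue_id"] for row in pick_list if row["public_records"] == 0]
print(pick_list)
print(candidates)
```

Sorting by record count rather than filtering outright keeps under-represented species visible as second-tier choices.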
    • Identifying Samples to be Subsampled
    • Side-by-Side Lists
    • USNM Bird Dataset: 3150 tissues sampled; 168 failed sequences; 94 problematic sequences; 166 clustered badly; 2761 ‘BARCODE-ready’ samples; 1,147 ‘first-BARCODE’ species; 91% increase over 1,259 barcoded species (3,892 listed in BOLD includes BINs, others)
    • Two problematic clades, USNM data. Flycatchers, Family Tyrannidae: Sublegatus arenarum, S. modestus, S. obscurior, S. sp.; Conopias parvus, C. albovittatus; Myiarchus ferox, M. swainsoni, M. sp. Hummingbirds, Family Trochilidae: Phaethornis longuemareus. Inconsistencies within USNM dataset; incompatibilities with public, other data
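The 91% figure on this slide appears to be the ratio of newly barcoded species to previously barcoded species. A short check of that arithmetic, using only the two numbers from the slide (the variable names are ours, and the combined total assumes the ‘first-BARCODE’ species do not overlap with the 1,259 already barcoded):

```python
# Figures taken from the slide.
first_barcode_species = 1147        # species barcoded for the first time
previously_barcoded_species = 1259  # species that already had a barcode

# Relative increase in the number of barcoded bird species.
relative_increase = first_barcode_species / previously_barcoded_species

# Combined total, assuming no overlap between the two groups.
combined_species = first_barcode_species + previously_barcoded_species

print(f"{relative_increase:.1%}")
print(combined_species)
```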
    • Resolving Mis-identified Specimens
    • What testing dataset to use? ID trees and analytical routines could use: all public bird COI records: 10,967; all BARCODE records in GenBank: 8,419; BARCODE with taxonomic names: 7,965; BARCODE, name and 2 traces: 2,388. Which ones have reliable taxonomic IDs?
    • Preparing a Data Release Paper: summary statistics from Data Portal; figures from BOLD
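The four candidate datasets read like progressively stricter filters over the same pool of records. The sketch below treats them that way, which is an assumption: the slide does not say the tiers are strictly nested. The `Record` fields, the `dataset_tiers` function, and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Record:
    accession: str             # hypothetical identifier
    marker: str                # gene region, e.g. "COI"
    has_barcode_keyword: bool  # flagged as a BARCODE record
    species_name: str          # empty string when no taxonomic name is given
    trace_count: int           # number of trace files attached


def dataset_tiers(records: List[Record]) -> Dict[str, List[Record]]:
    """Build four candidate test datasets, each a subset of the previous one."""
    coi = [r for r in records if r.marker == "COI"]
    barcode = [r for r in coi if r.has_barcode_keyword]
    named = [r for r in barcode if r.species_name]
    traced = [r for r in named if r.trace_count >= 2]
    return {
        "all public COI": coi,
        "BARCODE": barcode,
        "BARCODE with name": named,
        "BARCODE, name and 2 traces": traced,
    }


# Invented sample records, one per filtering outcome.
records = [
    Record("A1", "COI", True, "Genus alpha", 2),
    Record("A2", "COI", True, "Genus beta", 1),
    Record("A3", "COI", True, "", 2),
    Record("A4", "COI", False, "Genus gamma", 0),
    Record("A5", "cytb", False, "Genus delta", 0),
]
tier_sizes = {name: len(subset) for name, subset in dataset_tiers(records).items()}
print(tier_sizes)
```

Stricter tiers give fewer records but more metadata for judging whether a taxonomic ID is reliable, which is the trade-off the slide's closing question points at.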