Ilene Mizrachi - Opening Plenary

Barcode Sequence Dataflow into GenBank

Usage Rights

CC Attribution-NonCommercial License

Ilene Mizrachi - Opening Plenary: Presentation Transcript

  • Ilene Mizrachi, November 30, 2011, Fourth International Barcode of Life Conference. National Center for Biotechnology Information – National Library of Medicine – Bethesda MD 20892 USA
  • Barcode Project: 2003 and beyond
      • The Barcode of Life project was initiated in 2003.
      • INSDC would be the repository for raw and assembled sequence data.
      • INSDC adopts new source fields to accommodate Barcode metadata requirements.
      • The Barcode of Life Database (BOLD) is established as a community workbench and sequencing center.
  • What is a Barcode?
      • A global reference library of DNA barcode sequences that is integrated with other systems of biodiversity information (e.g., databases of specimens, species, biogeographic information).
      • A mechanism to link DNA sequences to vouchered specimens and valid species names.
      • A reserved BARCODE keyword was adopted for data that met strict barcode standards.
  • Barcode Standard Formally described species or a provisional label for an unpublished species Voucher specimen identifier, preferably in a biorepository using a structured field Country-Code using the controlled vocabulary used by GenBank; Sequence from a gene region specified by the CBOL  COI for animals  matK and rbcL for plants  ITS for fungi Contain at least 75% contiguous, high quality bases from within the approved region Electropherogram trace files for bidirectional sequencing runs Sequences of all forward and reverse primers Strongly recommended data elements  GPS coordinates  Name of the identifier  Name of the collector  Date of collection National Center for Biotechnology Information – National Library of Medicine – Bethesda MD 20892 USA
  • Compliant Barcode Record
  • Barcode records in GenBank
  • Life of an iBOL Record
  • Submissions from BOLD
  • Data Sharing Works
  • http://www.ncbi.nlm.nih.gov/WebSub/?tool=barcode
  • QA checks at GenBank: to ensure that the sequence data are of high quality, the following checks are run:
      • Barcode data element compliance
      • Consistency checks, such as: reported latitude-longitude falls within the cited country; collection date has already occurred
      • Sequence quality checks
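The two consistency checks named on the QA slide can be sketched as follows. This is an illustration under stated assumptions, not GenBank's implementation: the `consistency_problems` function and the `COUNTRY_BOXES` table are hypothetical, and the bounding box is a rough approximation chosen for the example. A real check would presumably test against actual country boundaries instead of a rectangle.

```python
from datetime import date

# Rough, illustrative bounding boxes: (min_lat, max_lat, min_lon, max_lon).
# The values are approximations for this example only.
COUNTRY_BOXES = {
    "Iceland": (63.0, 67.0, -25.0, -13.0),
}


def consistency_problems(country, lat, lon, collection_date, today=None):
    """Return a list of consistency problems (empty if none were found)."""
    today = today or date.today()
    problems = []

    # Check 1: the collection date must already have occurred.
    if collection_date > today:
        problems.append("collection date is in the future")

    # Check 2: the reported coordinates must fall within the cited country.
    box = COUNTRY_BOXES.get(country)
    if box is None:
        problems.append("no boundary data for cited country")
    else:
        min_lat, max_lat, min_lon, max_lon = box
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
            problems.append("coordinates fall outside cited country")
    return problems


# Usage with made-up values: coordinates far from Iceland, date after "today".
print(consistency_problems(
    "Iceland", 40.0, -3.0, date(2012, 1, 1), today=date(2011, 11, 30)
))
```

The `today` parameter exists only so that the date check can be exercised with a fixed reference date.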
  • Compliance tool
  • Checking Sequence Quality
      • Trim primer sequences
      • Check congruence between forward and reverse reads
      • Align sequences to check for gaps
      • Translate sequences to check for internal stops
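Three of the sequence quality steps on this slide (primer trimming, forward/reverse congruence, and the internal stop check) can be sketched in a few lines. This is a simplified illustration, not the pipeline run at GenBank, and all function names are hypothetical. It assumes exact primer matches, reads that already cover the same region at the same length, and a default stop set of TAA and TAG only; the full stop set depends on the genetic code of the organism and should be passed in by the caller. The alignment step is omitted.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def trim_primer(seq, primer):
    """Remove the primer from the start of the read if it matches exactly."""
    return seq[len(primer):] if seq.startswith(primer) else seq


def count_mismatches(forward, reverse):
    """Compare the forward read with the reverse-complemented reverse read.

    Assumes both reads span the same region and have equal length.
    """
    flipped = reverse_complement(reverse)
    if len(forward) != len(flipped):
        raise ValueError("reads must have equal length in this sketch")
    return sum(a != b for a, b in zip(forward, flipped))


def internal_stops(seq, frame=0, stops=("TAA", "TAG")):
    """Return start positions of stop codons in the given reading frame."""
    return [
        i for i in range(frame, len(seq) - 2, 3)
        if seq[i:i + 3] in stops
    ]


# Usage with a made-up read and a made-up four-base primer.
read = trim_primer("ACGT" + "ATGGCATTAGGA", "ACGT")
print(read)
print(count_mismatches(read, reverse_complement(read)))
print(internal_stops(read, frame=0), internal_stops(read, frame=1))
```

Because a barcode fragment lies inside the coding region, any stop codon found in the correct reading frame points to a sequencing error, a wrong frame, or a non-functional copy of the gene.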
  • Updates Are Critical
      • Primary data repository: sequence records are owned by the submitter.
      • The submitter is responsible for providing additional data and metadata as they become available: publication; sequence; taxonomy; voucher.
      • Third-party updates are welcome!
  • Challenges If Reference Barcodes are to be used for species identification, phylogenetics, ecological forensics, conservation, and macro-analysis of biodiversity patterns, then the minimal requirement should be (a) high quality sequence (b) link to specimen and (c) taxonomic identification Need to support rapid data release including preliminary taxonomic classifications similar to “Fort Lauderdale Principles” of genomics community Data updated asynchronously at BOLD and in GenBank. Need to continue work on update channel Need to work with communities to devise strict QA tests for plant and fungal Barcodes National Center for Biotechnology Information – National Library of Medicine – Bethesda MD 20892 USA
  • Acknowledgements Taxonomy Group  GenBank Group  Scott Federhen  Susan Schafer  Conrad Schoch  Michael Fetchko  Lu Sun  Carol Hotton  Software Support  Detlef Leipe  Colleen Bollin  Kamen Todorov  Vasuki Gobu National Center for Biotechnology Information – National Library of Medicine – Bethesda MD 20892 USA