Church gmod2012 pt2


Part 2 of my talk at GMOD 2012

Published in: Technology

  • Show alignment of a feature from first slide to show how far down the chromosome it has moved…
  • Keeping track of people is way easier than keeping track of assemblies.

    1. The Evolution of the Reference Human Genome; Navigating Genome Resources at NCBI, Part 2. Deanna M. Church, NCBI (@deannachurch)
    2. Data Archives: GenBank • Data in a common format • Data in a single location (and mirrored) • Most quality checked prior to deposition • Robust data tracking mechanism (accession.version) • Data owned by submitter
    3. Data tracking: ABC14-1065514J1 (columns: Date, Phase, Gaps, Length)
        • FP565796.1: 21-Oct-2009, phase 1, gaps 1
        • FP565796.2: 14-Oct-2010, phase 1, gaps 0
        • FP565796.3: 07-Nov-2010, phase 3, gaps 0
    4. Mouse chrX: 34,800,000-34,890,000 [Figure: NC_000086.1 and CM001013.1; numeric labels 2, 4, 3, 6, 5, 7 and 2]
    5. Mouse chrX: 35,000,000-36,000,000 [Figure labels: MGSCv3, MGSCv36, X]
    6. What’s in a name? GRCh37, hg19, Zv7, danRer5, MGSCv37, mm8, NCBIM37
    7. By any other name… chr21:8,913,216-9,246,964
    8. By any other name… Zv7 chr21:8,913,216-9,246,964 X Mouse Build 36 chrX
    9. Genome Browser Agreement • Submitter deposits assembly to GenBank/EMBL/DDBJ • Assembly QA • Submitter updates assembly based on QA results • Browsers pick up assembly from GenBank/EMBL/DDBJ • Assemblies must be in GenBank/EMBL/DDBJ
    10. hg19 GRCh37
    11. Assembly (e.g. GRCh37.p5): GCA_000001405.6 / GCF_000001405.17
        • Primary Assembly: GCA_000001305.1 / GCF_000001305.13
        • Non-nuclear assembly unit (e.g. MT): GCA_000006015.1 / GCF_000006015.1
        • ALT 1: GCA_000001315.1 / GCF_000001315.1
        • ALT 2: GCA_000001325.1 / GCF_000001325.2
        • ALT 3: GCA_000001335.1 / GCF_000001335.1
        • ALT 4: GCA_000001345.1 / GCF_000001345.1
        • ALT 5: GCA_000001355.1 / GCF_000001355.1
        • ALT 6: GCA_000001365.1 / GCF_000001365.2
        • ALT 7: GCA_000001375.1 / GCF_000001375.1
        • ALT 8: GCA_000001385.1 / GCF_000001385.1
        • ALT 9: GCA_000001395.1 / GCF_000001395.1
        • Patches: GCA_000005045.5 / GCF_000005045.4
    12. GenBank vs RefSeq
        • GenBank: Submitter Owned; Redundancy; Updated rarely; INSDC
        • RefSeq: RefSeq Owned; Non-Redundant; Curated; Not INSDC
        • BRCA1 in GenBank: 83 genomic records, 31 mRNA records, 27 protein records
        • BRCA1 in RefSeq: 3 genomic records, 5 mRNA records, 1 RNA record, 5 protein records
    13. RefSeq for Assemblies. Typical assembly edits: • Addition of non-nuclear (e.g. MT) assembly units • Removal of contamination • Drop unlocalized/unplaced scaffolds • Mask contamination that is placed on chromosome
    14.
    15. Understanding relationships between assemblies using alignments • First Pass: Reciprocal best hit • Second Pass: Non-reciprocal, duplicative hits
    16. NCBI36, GRCh37.p5. No second pass alignments in GRCh37.p5
    17. Annotation pipeline: Assemblies, Transcripts, Proteins, Set of genes, Other decoration (Francoise Thibaud-Nissen)
    18. Content of the final annotation product (columns: Description, In sequence database, In a BLAST database, On FTP site; legend: Annotation Pipeline, RefSeq)
        • Chromosomes (NC_ or AC_)
        • Scaffolds (NW_ or NT_)
        • Curated transcripts/proteins (NM_, NR_/NP_)
        • Predicted transcripts/proteins (fully or partially supported) (XM_, XR_/XP_)
        • Non-transcribed pseudogenes
        • tRNA (annotated with tRNAScan)
        • Ab initio Gnomon models
    19. Where to find the annotation products? • Nucleotide/Protein databases • Gene • Map Viewer • BLAST databases • FTP site
    20. Annotating multiple assemblies • Assembly-assembly alignments • Consistent placement of transcripts • Consistent labelling of the genes • Consistent annotation on all assemblies [Figure labels: Available at, Group 1, Transcript, Assembly 1, Group 2, Assembly 2]
    21. Annotating multiple assemblies (2): Btau_4.6.1, UMD_3.1. Same Gene symbol
    22. Interacting with the community: FlyBase, GenBank, RefSeq
    23. Thanks! • The Genome Reference Consortium • The Genome Center at Washington University • The Wellcome Trust Sanger Institute • The European Bioinformatics Institute • The National Center for Biotechnology Information • Church group at NCBI • For Slides • Valerie Schneider, Francoise Thibaud-Nissen, Nathan Bouk, Evan Eichler, Hsiu-Chuan Chen, Steve Sherry, Peter Meric, Victor Ananiev, Chao Chen, John Lopez, John Garner, Tim Hefferon, NCBI, Cliff Clausen
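
Slides 2 and 3 describe tracking a record through accession.version identifiers (FP565796.1 through FP565796.3). The following is a minimal sketch of how such identifiers could be split and compared. The names `VersionedAccession`, `parse_accession` and `latest` are made up for illustration and are not part of any NCBI tool.

```python
from typing import Iterable, NamedTuple


class VersionedAccession(NamedTuple):
    accession: str  # stable part, e.g. "FP565796"
    version: int    # incremented when the sequence changes, e.g. 3


def parse_accession(identifier: str) -> VersionedAccession:
    """Split 'FP565796.3' into ('FP565796', 3)."""
    accession, sep, version = identifier.rpartition(".")
    if not sep or not accession or not version.isdigit():
        raise ValueError(f"not an accession.version identifier: {identifier!r}")
    return VersionedAccession(accession, int(version))


def latest(identifiers: Iterable[str]) -> VersionedAccession:
    """Return the highest version among identifiers that share one accession."""
    parsed = [parse_accession(i) for i in identifiers]
    if len({p.accession for p in parsed}) != 1:
        raise ValueError("identifiers do not share a single accession")
    return max(parsed, key=lambda p: p.version)


if __name__ == "__main__":
    # Identifiers taken from slide 3.
    print(latest(["FP565796.1", "FP565796.2", "FP565796.3"]))
```

The same split applies to the assembly accessions on slide 11, where for example `GCF_000001305.13` has the accession `GCF_000001305` and the version 13.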
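
Slide 15 names a first alignment pass based on reciprocal best hits. The slides do not say how NCBI computes these, so the sketch below only illustrates the generic idea: a pair is kept when each region is the other's top-scoring hit. The region names, the scores and the `(query, subject, score)` tuple layout are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query, subject, score); layout is an assumption


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (first one wins on ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    # Hypothetical regions on two assemblies, with made-up scores.
    a_vs_b = [("a1", "b1", 95), ("a1", "b2", 80), ("a2", "b2", 90), ("a3", "b1", 70)]
    b_vs_a = [("b1", "a1", 95), ("b1", "a3", 70), ("b2", "a2", 90), ("b2", "a1", 80)]
    print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy input `a3` has a best hit to `b1`, but `b1` prefers `a1`, so that pair is left out. Hits of that kind are what slide 15 assigns to the second pass (non-reciprocal, duplicative hits).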
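
Slide 18 associates RefSeq accession prefixes with kinds of annotation product. As a rough sketch, the lookup below encodes only the prefixes listed on that slide. The category labels and the `classify` helper are made up for illustration, and anything not on the slide is reported as unknown rather than guessed.

```python
# Prefix table transcribed from slide 18; the labels are informal descriptions.
REFSEQ_PREFIXES = {
    "NC_": "chromosome",
    "AC_": "chromosome",
    "NW_": "scaffold",
    "NT_": "scaffold",
    "NM_": "curated transcript",
    "NR_": "curated transcript",
    "NP_": "curated protein",
    "XM_": "predicted transcript",
    "XR_": "predicted transcript",
    "XP_": "predicted protein",
}


def classify(accession: str) -> str:
    """Return the slide-18 category for an accession, or 'unknown'."""
    return REFSEQ_PREFIXES.get(accession[:3], "unknown")


if __name__ == "__main__":
    # Both accessions appear on slide 4; only the first carries a RefSeq prefix.
    for acc in ("NC_000086.1", "CM001013.1"):
        print(acc, "->", classify(acc))
```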