Church gmod2012 pt2

Part 2 of my talk at GMOD 2012

  • Show alignment of a feature from first slide to show how far down the chromosome it has moved…
  • Keeping track of people is way easier than keeping track of assemblies.

    1. The Evolution of the Resources Navigating Genome Reference Human Genome at NCBI, Part 2. Deanna M. Church, NCBI, @deannachurch
    2. Data Archives: GenBank. Data in a common format; data in a single location (and mirrored); most data quality-checked prior to deposition; robust data-tracking mechanism (accession.version); data owned by the submitter.
    3. Data tracking: ABC14-1065514J1
        Accession.version  Date         Phase  Gaps
        FP565796.1         21-Oct-2009  1      1
        FP565796.2         14-Oct-2010  1      0
        FP565796.3         07-Nov-2010  3      0
    4. Mouse chrX: 34,800,000-34,890,000 (NC_000086.1 and CM001013.1)
    5. Mouse chrX: 35,000,000-36,000,000 (MGSCv3 and MGSCv36)
    6. What’s in a name? GRCh37, hg19, Zv7, danRer5, MGSCv37, mm8, NCBIM37
    7. By any other name… chr21:8,913,216-9,246,964
    8. By any other name… Zv7 chr21:8,913,216-9,246,964 X Mouse Build 36 chrX
    9. Genome Browser Agreement: the submitter deposits the assembly to GenBank/EMBL/DDBJ; Assembly QA; the submitter updates the assembly based on the QA results; browsers pick up the assembly from GenBank/EMBL/DDBJ. Assemblies must be in GenBank/EMBL/DDBJ.
    10. hg19 / GRCh37: http://www.ncbi.nlm.nih.gov/genome/assembly
    11. Assembly (e.g. GRCh37.p5): GCA_000001405.6 / GCF_000001405.17
        Primary Assembly: GCA_000001305.1 / GCF_000001305.13
        ALT 1: GCA_000001315.1 / GCF_000001315.1
        ALT 2: GCA_000001325.1 / GCF_000001325.2
        ALT 3: GCA_000001335.1 / GCF_000001335.1
        ALT 4: GCA_000001345.1 / GCF_000001345.1
        ALT 5: GCA_000001355.1 / GCF_000001355.1
        ALT 6: GCA_000001365.1 / GCF_000001365.2
        ALT 7: GCA_000001375.1 / GCF_000001375.1
        ALT 8: GCA_000001385.1 / GCF_000001385.1
        ALT 9: GCA_000001395.1 / GCF_000001395.1
        Non-nuclear assembly unit (e.g. MT): GCA_000006015.1 / GCF_000006015.1
        Patches: GCA_000005045.5 / GCF_000005045.4
    12. GenBank vs RefSeq
        GenBank               RefSeq
        Submitter owned       RefSeq owned
        Redundant             Non-redundant
        Updated rarely        Curated
        INSDC                 Not INSDC
        BRCA1 records:
        83 genomic            3 genomic
        31 mRNA               5 mRNA
                              1 RNA
        27 protein            5 protein
    13. RefSeq for assemblies. Typical assembly edits: addition of non-nuclear (e.g. MT) assembly units; removal of contamination (drop unlocalized/unplaced scaffolds; mask contamination that is placed on a chromosome).
    14. http://www.ncbi.nlm.nih.gov/genome
    15. Understanding relationships between assemblies using alignments. First pass: reciprocal best hits. Second pass: non-reciprocal, duplicative hits.
    16. NCBI36 vs. GRCh37.p5: no second-pass alignments in GRCh37.p5 (http://www.ncbi.nlm.nih.gov/tools/gbench/)
    17. Annotation pipeline: assemblies, transcripts, proteins → set of genes, other decoration (Francoise Thibaud-Nissen)
    18. Content of the final annotation product (availability: in sequence database, in a BLAST database, on FTP site):
        Chromosomes (NC_ or AC_)
        Scaffolds (NW_ or NT_)
        Curated transcripts/proteins (NM_, NR_/NP_)
        Predicted transcripts/proteins, fully or partially supported (XM_, XR_/XP_)
        Non-transcribed pseudogenes
        tRNA (annotated with tRNAScan)
        Ab initio Gnomon models
        Annotation Pipeline, RefSeq
    19. Where to find the annotation products? Nucleotide/Protein databases; Gene (http://www.ncbi.nlm.nih.gov/gene); Map Viewer (http://www.ncbi.nlm.nih.gov/mapview); BLAST databases; FTP site
    20. Annotating multiple assemblies: assembly-assembly alignments (available at http://www.ncbi.nlm.nih.gov/genome/tools/remap); consistent placement of transcripts; consistent labelling of the genes; consistent annotation on all assemblies. Figure labels: Group 1, Transcript, Assembly 1, Group 2, Assembly 2.
    21. Annotating multiple assemblies (2): Btau_4.6.1 and UMD_3.1, same gene symbol
    22. Interacting with the community: FlyBase, GenBank, RefSeq
    23. Thanks! The Genome Reference Consortium: The Genome Center at Washington University, The Wellcome Trust Sanger Institute, The European Bioinformatics Institute, The National Center for Biotechnology Information. Church group at NCBI. For slides: Valerie Schneider, Francoise Thibaud-Nissen. Nathan Bouk, Evan Eichler, Hsiu-Chuan Chen, Steve Sherry, Peter Meric, Victor Ananiev, Chao Chen, John Lopez, John Garner, Tim Hefferon, NCBI, Cliff Clausen.
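
Slide 3 shows the accession.version mechanism at work. The accession FP565796 stays fixed while the version climbs from .1 to .3 as the sequence is updated. Below is a minimal sketch of tracking the current version. It assumes only the accession.version shape seen on the slide. `split_accession_version` and `latest_versions` are hypothetical helpers, not an NCBI API.

```python
from datetime import date

# Hypothetical helpers for illustration; not an NCBI API.


def split_accession_version(acc_ver: str) -> tuple[str, int]:
    """Split an accession.version such as 'FP565796.3' into ('FP565796', 3)."""
    accession, sep, version = acc_ver.rpartition(".")
    if not sep or not accession or not version.isdigit():
        raise ValueError(f"not an accession.version: {acc_ver!r}")
    return accession, int(version)


def latest_versions(history: list[tuple[str, date]]) -> dict[str, tuple[int, date]]:
    """Return, for each accession, its highest version and that version's date."""
    latest: dict[str, tuple[int, date]] = {}
    for acc_ver, updated in history:
        accession, version = split_accession_version(acc_ver)
        if accession not in latest or version > latest[accession][0]:
            latest[accession] = (version, updated)
    return latest


# Version history shown on slide 3.
history = [
    ("FP565796.1", date(2009, 10, 21)),
    ("FP565796.2", date(2010, 10, 14)),
    ("FP565796.3", date(2010, 11, 7)),
]
print(latest_versions(history))
```

Citing FP565796.3 rather than the bare accession pins the exact sequence that was analysed. That is why the versioned identifier matters.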
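
Slides 6 to 8 make two points. A coordinate string such as chr21:8,913,216-9,246,964 means nothing until you know which assembly it refers to, and one assembly can go by several names. The sketch below rests on two assumptions. First, a hypothetical `Location` type cannot be built without an assembly name. Second, the alias table holds only two pairings we are confident of: hg19 is GRCh37, and danRer5 is Zv7.

```python
import re
from dataclasses import dataclass

# Hypothetical alias table (browser-style name -> assembly name); deliberately tiny.
ASSEMBLY_ALIASES = {"hg19": "GRCh37", "danRer5": "Zv7"}

_LOCATION_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


@dataclass(frozen=True)
class Location:
    """Hypothetical location type: coordinates always travel with an assembly."""
    assembly: str
    chrom: str
    start: int
    end: int


def parse_location(assembly: str, text: str) -> Location:
    """Parse 'chr21:8,913,216-9,246,964' in the context of a named assembly."""
    match = _LOCATION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised location: {text!r}")
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    if start > end:
        raise ValueError("start must not exceed end")
    canonical = ASSEMBLY_ALIASES.get(assembly, assembly)
    return Location(canonical, match["chrom"], start, end)


a = parse_location("danRer5", "chr21:8,913,216-9,246,964")
b = parse_location("Zv7", "chr21:8,913,216-9,246,964")
print(a == b)  # one assembly, two names
```

The same chromosome name and numbers under two different assemblies compare unequal. That is the distinction the slides warn about.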
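
On slide 11, every unit of GRCh37.p5 carries two accessions: a GenBank (GCA_) accession and a RefSeq (GCF_) accession. The two share the same nine-digit number, while their versions move independently (for example, GCA_000001405.6 alongside GCF_000001405.17). Below is a minimal sketch of checking such a pairing. `parse_assembly_accession` and `is_paired` are hypothetical names, and the pattern assumes the GCx_ + nine digits + .version shape seen on the slide.

```python
import re
from typing import NamedTuple

_ASSEMBLY_ACC_RE = re.compile(r"^(GCA|GCF)_(\d{9})\.(\d+)$")


class AssemblyAccession(NamedTuple):
    source: str   # "GenBank" for GCA_, "RefSeq" for GCF_
    number: str   # nine-digit identifier shared by a GCA_/GCF_ pair
    version: int  # versions of the two members change independently


def parse_assembly_accession(text: str) -> AssemblyAccession:
    """Hypothetical parser for accessions shaped like those on slide 11."""
    match = _ASSEMBLY_ACC_RE.match(text)
    if match is None:
        raise ValueError(f"not a GCA_/GCF_ accession: {text!r}")
    prefix, number, version = match.groups()
    source = "GenBank" if prefix == "GCA" else "RefSeq"
    return AssemblyAccession(source, number, int(version))


def is_paired(gca: str, gcf: str) -> bool:
    """True if a GenBank and a RefSeq accession share the same numeric part."""
    g, r = parse_assembly_accession(gca), parse_assembly_accession(gcf)
    return g.source == "GenBank" and r.source == "RefSeq" and g.number == r.number


print(is_paired("GCA_000001405.6", "GCF_000001405.17"))  # full assembly
print(is_paired("GCA_000001305.1", "GCF_000001305.13"))  # primary assembly unit
print(is_paired("GCA_000001405.6", "GCF_000001305.13"))  # mismatched units
```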
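
Slide 13 lists the contamination edits RefSeq typically makes to an assembly. The sketch below illustrates them on toy data. It is not the RefSeq procedure, and it works under these assumptions:

- `ToyRecord` is a hypothetical record type with a role: chromosome, unlocalized or unplaced.
- Contamination is given as 0-based half-open intervals.
- An unlocalized/unplaced scaffold that carries contamination is dropped.
- Contamination placed on a chromosome is masked by replacing the bases with N, so chromosome length and coordinates do not change.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRecord:
    """Hypothetical, minimal sequence record used only for this sketch."""
    name: str
    role: str  # "chromosome", "unlocalized" or "unplaced"
    seq: str
    contamination: list[tuple[int, int]] = field(default_factory=list)  # [start, end)


def mask(seq: str, spans: list[tuple[int, int]]) -> str:
    """Replace each [start, end) span with Ns, keeping the sequence length."""
    bases = list(seq)
    for start, end in spans:
        bases[start:end] = "N" * (end - start)
    return "".join(bases)


def clean_toy_assembly(records: list[ToyRecord]) -> list[ToyRecord]:
    """Drop contaminated unlocalized/unplaced scaffolds; mask placed contamination."""
    cleaned = []
    for rec in records:
        if not rec.contamination:
            cleaned.append(rec)
        elif rec.role in ("unlocalized", "unplaced"):
            continue  # assumed rule: the whole scaffold is dropped
        else:
            cleaned.append(ToyRecord(rec.name, rec.role, mask(rec.seq, rec.contamination)))
    return cleaned


records = [
    ToyRecord("chr1", "chromosome", "ACGTACGTAC", [(2, 5)]),
    ToyRecord("scaf1", "unplaced", "GGGGCCCC", [(0, 4)]),
    ToyRecord("scaf2", "unlocalized", "TTTTAAAA"),
]
for rec in clean_toy_assembly(records):
    print(rec.name, rec.seq)
```

In this sketch, masking rather than cutting keeps every downstream coordinate on the chromosome valid.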
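
Slide 15 describes two alignment passes. The first keeps reciprocal best hits, and the second covers non-reciprocal, duplicative hits. Below is a minimal sketch of the reciprocal-best-hit rule. It assumes alignments have already been reduced to (sequence in assembly 1, sequence in assembly 2, score) triples, and the names and scores are invented. This is not NCBI's pipeline, which works on real alignments rather than single scores.

```python
# (sequence in assembly 1, sequence in assembly 2, score); illustrative only.
Hit = tuple[str, str, float]


def _best(pairs: list[Hit]) -> dict[str, str]:
    """For each query, keep the target with the highest score (first wins ties)."""
    best: dict[str, tuple[str, float]] = {}
    for query, target, score in pairs:
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def split_hits(hits: list[Hit]) -> tuple[list[Hit], list[Hit]]:
    """Split hits into reciprocal best hits (first pass) and the remainder."""
    forward = _best(hits)                          # assembly 1 -> assembly 2
    reverse = _best([(b, a, s) for a, b, s in hits])  # assembly 2 -> assembly 1
    first_pass, remainder = [], []
    for a, b, score in hits:
        if forward.get(a) == b and reverse.get(b) == a:
            first_pass.append((a, b, score))
        else:
            remainder.append((a, b, score))
    return first_pass, remainder


# Invented scores: A1 also aligns, less well, to B2 in assembly 2.
hits = [
    ("A1", "B1", 950.0),
    ("A1", "B2", 900.0),
    ("A2", "B2", 980.0),
    ("A3", "B3", 400.0),
]
first, second = split_hits(hits)
print(first)
print(second)
```

The leftover (A1, B2) hit is a second, weaker placement of A1. It is the kind of non-reciprocal, duplicative alignment the second pass is meant to pick up.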
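
The accession prefixes on slide 18 tell you what kind of record you have before you open it. Below is a minimal lookup built only from the prefixes on that slide, so it is not a complete list of RefSeq prefixes. `classify` is a hypothetical helper.

```python
from typing import Optional

# Prefix -> (molecule type, curation status), from slide 18 only.
# Status is None for chromosome and scaffold records.
REFSEQ_PREFIXES: dict[str, tuple[str, Optional[str]]] = {
    "NC_": ("chromosome", None),
    "AC_": ("chromosome", None),
    "NW_": ("scaffold", None),
    "NT_": ("scaffold", None),
    "NM_": ("mRNA", "curated"),
    "NR_": ("non-coding RNA", "curated"),
    "NP_": ("protein", "curated"),
    "XM_": ("mRNA", "predicted"),
    "XR_": ("non-coding RNA", "predicted"),
    "XP_": ("protein", "predicted"),
}


def classify(accession: str) -> tuple[str, Optional[str]]:
    """Return (molecule type, status) for a RefSeq accession or bare prefix."""
    prefix = accession[:3]
    if prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"prefix not covered by this sketch: {accession!r}")
    return REFSEQ_PREFIXES[prefix]


print(classify("NC_000086.1"))  # the mouse chrX record from slide 4
print(classify("XP_"))
```

Anything outside the table raises an error rather than being guessed at. The GenBank accession FP565796.1 from slide 3 is one example.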
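
Slide 20 relies on assembly-assembly alignments to place the same transcript consistently on two assemblies. The core step can be sketched as projecting a coordinate through sorted, ungapped alignment blocks. This is a simplified, hypothetical model, not the interface of NCBI's Remap service. It uses 0-based coordinates, the forward strand only, and made-up block values.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class Block(NamedTuple):
    """Hypothetical ungapped block: source[src_start:src_start+length] aligns to target[tgt_start:...]."""
    src_start: int
    tgt_start: int
    length: int


def remap(pos: int, blocks: list[Block]) -> Optional[int]:
    """Project a 0-based source position onto the target, or None if unaligned.

    `blocks` must be sorted by src_start and non-overlapping on the source.
    """
    starts = [b.src_start for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    block = blocks[i]
    offset = pos - block.src_start
    if offset >= block.length:
        return None  # falls in a gap between aligned blocks
    return block.tgt_start + offset


# Made-up alignment: the second block sits 5,000 bases further along in the target.
blocks = [Block(0, 0, 1_000), Block(1_500, 6_500, 2_000)]
print(remap(250, blocks))    # inside block 1
print(remap(1_200, blocks))  # in the gap between blocks
print(remap(1_600, blocks))  # inside block 2
```

In this model, an exon whose ends project to None falls in an alignment gap and would need separate handling on the second assembly.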
