The Evolution of the Resources
  Navigating Genome Reference
        Human Genome
             at NCBI
                        Part 2


                Deanna M. Church, NCBI




@deannachurch
Data Archives




                     GenBank

   • Data in a common format
   • Data in a single location (and mirrored)
   • Most quality checked prior to deposition
   • Robust data tracking mechanism (accession.version)
   • Data owned by submitter
Data tracking

ABC14-1065514J1
Accession.version   Date          Phase   Gaps   Length
FP565796.1          21-Oct-2009   1       1
FP565796.2          14-Oct-2010   1       0
FP565796.3          07-Nov-2010   3       0
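
The accession.version scheme above can be sketched in a few lines of Python. This is a minimal illustration, not NCBI code: the names `VersionedAccession`, `parse_accession`, and `latest` are hypothetical, and the only assumption about the format is that the version is the integer after the last dot.

```python
from typing import NamedTuple


class VersionedAccession(NamedTuple):
    accession: str
    version: int


def parse_accession(text: str) -> VersionedAccession:
    """Split an identifier such as 'FP565796.3' into accession and version."""
    accession, sep, version = text.strip().rpartition(".")
    if not sep or not accession or not version.isdigit():
        raise ValueError(f"not an accession.version identifier: {text!r}")
    return VersionedAccession(accession, int(version))


def latest(records: list[str]) -> dict[str, int]:
    """Map each accession to the highest version seen in the input."""
    newest: dict[str, int] = {}
    for record in records:
        acc, ver = parse_accession(record)
        newest[acc] = max(ver, newest.get(acc, 0))
    return newest


history = ["FP565796.1", "FP565796.2", "FP565796.3"]
print(latest(history))  # {'FP565796': 3}
```

Keeping the version as an integer means ".10" sorts after ".9", which a plain string comparison would get wrong.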
Mouse chrX: 34,800,000-34,890,000

[Figure labels: NC_000086.1 through .7; CM001013.1 and .2]
Mouse chrX: 35,000,000-36,000,000

[Figure labels: MGSCv3, MGSCv36, X]
What’s in a name?

GRCh37
hg19

               Zv7
               danRer5

  MGSCv37
mm8
    NCBIM37
By any other name…




chr21:8,913,216-9,246,964
By any other name…




Zv7 chr21:8,913,216-9,246,964 X Mouse Build 36 chrX
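
One way to make the point in code: a coordinate range should never travel without its assembly name. The sketch below is illustrative only. The `Region` class and `parse_region` helper are hypothetical, and treating coordinates as 1-based and inclusive is an assumption, not something the slides state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    assembly: str  # e.g. "Zv7"; required, never implied
    chrom: str
    start: int     # assumed 1-based, inclusive
    end: int       # assumed 1-based, inclusive

    def __post_init__(self) -> None:
        if not self.assembly:
            raise ValueError("a region needs an assembly name")
        if not 1 <= self.start <= self.end:
            raise ValueError("expected 1 <= start <= end")

    def __str__(self) -> str:
        return f"{self.assembly} {self.chrom}:{self.start:,}-{self.end:,}"


def parse_region(assembly: str, text: str) -> Region:
    """Parse a string like 'chr21:8,913,216-9,246,964' for a named assembly."""
    chrom, _, span = text.partition(":")
    start, _, end = span.replace(",", "").partition("-")
    return Region(assembly, chrom, int(start), int(end))


zebrafish = parse_region("Zv7", "chr21:8,913,216-9,246,964")
human = parse_region("GRCh37", "chr21:8,913,216-9,246,964")
print(zebrafish)           # Zv7 chr21:8,913,216-9,246,964
print(zebrafish == human)  # False
```

The same coordinate string yields two unequal regions, because the assembly is part of the identity.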
Genome Browser Agreement
  • Submitter deposits assembly to GenBank/EMBL/DDBJ
  • Assembly QA
  • Submitter updates assembly based on QA results
  • Browsers pick up assembly from GenBank/EMBL/DDBJ

  Assemblies must be in GenBank/EMBL/DDBJ
hg19
               GRCh37




http://www.ncbi.nlm.nih.gov/genome/assembly
Assembly (e.g. GRCh37.p5)               GCA_000001405.6 / GCF_000001405.17

Assembly unit                           GenBank (GCA_)     RefSeq (GCF_)
Primary Assembly                        GCA_000001305.1    GCF_000001305.13
Non-nuclear assembly unit (e.g. MT)     GCA_000006015.1    GCF_000006015.1
ALT 1                                   GCA_000001315.1    GCF_000001315.1
ALT 2                                   GCA_000001325.1    GCF_000001325.2
ALT 3                                   GCA_000001335.1    GCF_000001335.1
ALT 4                                   GCA_000001345.1    GCF_000001345.1
ALT 5                                   GCA_000001355.1    GCF_000001355.1
ALT 6                                   GCA_000001365.1    GCF_000001365.2
ALT 7                                   GCA_000001375.1    GCF_000001375.1
ALT 8                                   GCA_000001385.1    GCF_000001385.1
ALT 9                                   GCA_000001395.1    GCF_000001395.1
Patches                                 GCA_000005045.5    GCF_000005045.4
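
A short sketch of how the paired accessions above could be handled in code. The pattern (GCA_ or GCF_, nine digits, a dot, a version) is inferred from the accessions listed on this slide and is an assumption rather than a format specification. The helper names are hypothetical.

```python
import re
from typing import NamedTuple

# Pattern inferred from the accessions on the slide, not from a formal spec.
ASSEMBLY_RE = re.compile(r"^(GCA|GCF)_(\d{9})\.(\d+)$")


class AssemblyAccession(NamedTuple):
    source: str   # "GenBank" for GCA_, "RefSeq" for GCF_
    number: str   # digits shared by the GenBank/RefSeq pair
    version: int  # versioned independently on each side


def parse_assembly_accession(text: str) -> AssemblyAccession:
    match = ASSEMBLY_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not an assembly accession: {text!r}")
    prefix, number, version = match.groups()
    source = "GenBank" if prefix == "GCA" else "RefSeq"
    return AssemblyAccession(source, number, int(version))


def is_paired(genbank: str, refseq: str) -> bool:
    """True when a GCA_ and a GCF_ accession share the same digits."""
    g = parse_assembly_accession(genbank)
    r = parse_assembly_accession(refseq)
    return g.source == "GenBank" and r.source == "RefSeq" and g.number == r.number


print(is_paired("GCA_000001405.6", "GCF_000001405.17"))  # True
print(is_paired("GCA_000001405.6", "GCF_000001305.13"))  # False
```

Note that the versions on the two sides differ (.6 versus .17 for GRCh37.p5), so pairing has to ignore the version and compare only the digits.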
GenBank vs RefSeq

GenBank                       RefSeq
Submitter Owned               RefSeq Owned
Redundancy                    Non-Redundant
Updated rarely                Curated
INSDC                         Not INSDC

BRCA1
83 genomic records            3 genomic records
31 mRNA records               5 mRNA records
27 protein records            1 RNA record
                              5 protein records
RefSeq for Assemblies

Typical assembly edits
  Addition of non-nuclear (e.g. MT) assembly units
  Removal of contamination
    Drop unlocalized/unplaced scaffolds
    Mask contamination that is placed on chromosome
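
As a rough illustration of the masking edit, the sketch below replaces contaminated intervals with N. The slides do not say how masking is performed, so the use of N, the 0-based half-open intervals, and the function name `mask_intervals` are all assumptions.

```python
def mask_intervals(sequence: str, intervals: list[tuple[int, int]]) -> str:
    """Replace bases in each 0-based, half-open interval with 'N'.

    Masking keeps the sequence length unchanged, so coordinates of
    everything else on the chromosome stay valid.
    """
    bases = list(sequence)
    for start, end in intervals:
        if not 0 <= start <= end <= len(bases):
            raise ValueError(f"interval out of range: {(start, end)}")
        bases[start:end] = "N" * (end - start)
    return "".join(bases)


print(mask_intervals("ACGTACGTAC", [(2, 5)]))  # ACNNNCGTAC
```

Masking rather than deleting is what makes this edit safe for a placed sequence: deleting bases would shift every downstream coordinate.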
http://www.ncbi.nlm.nih.gov/genome
Understanding relationships between assemblies using alignments

First Pass: Reciprocal best hit

Second Pass: Non-reciprocal, duplicative hits
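
The two-pass idea can be sketched as follows. This is a generic reciprocal-best-hit filter under simplifying assumptions (each hit has a single score, ties are not handled specially), not the alignment procedure NCBI uses. `Hit`, `best_by`, and `split_passes` are hypothetical names.

```python
from typing import Callable, NamedTuple


class Hit(NamedTuple):
    a: str        # region identifier on assembly A
    b: str        # region identifier on assembly B
    score: float


def best_by(hits: list[Hit], key: Callable[[Hit], str]) -> dict[str, Hit]:
    """Keep the highest-scoring hit for each key value."""
    best: dict[str, Hit] = {}
    for hit in hits:
        k = key(hit)
        if k not in best or hit.score > best[k].score:
            best[k] = hit
    return best


def split_passes(hits: list[Hit]) -> tuple[list[Hit], list[Hit]]:
    """First pass: hits that are best in both directions. Second pass: the rest."""
    best_for_a = best_by(hits, lambda h: h.a)
    best_for_b = best_by(hits, lambda h: h.b)
    first, second = [], []
    for hit in hits:
        if best_for_a[hit.a] == hit and best_for_b[hit.b] == hit:
            first.append(hit)
        else:
            second.append(hit)
    return first, second


hits = [Hit("a1", "b1", 100), Hit("a1", "b2", 60), Hit("a2", "b2", 90)]
first, second = split_passes(hits)
print([(h.a, h.b) for h in first])   # [('a1', 'b1'), ('a2', 'b2')]
print([(h.a, h.b) for h in second])  # [('a1', 'b2')]
```

Here a1 also aligns to b2, but b2's best partner is a2, so that hit is non-reciprocal and falls into the second pass.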
NCBI36




                                            GRCh37.p5




No second pass alignments in GRCh37.p5

http://www.ncbi.nlm.nih.gov/tools/gbench/
Annotation pipeline
                   Assemblies
Transcripts                            Proteins




Set of genes
Other decoration
                                    Francoise Thibaud-Nissen
Content of the final annotation product
                   Description                       In sequence database   In a BLAST database   On FTP site

Chromosomes (NC_or AC_)                                                       
Scaffolds (NW_ or NT_)                                                       
Curated transcripts/proteins (NM_, NR_/NP_)                                  

Predicted transcripts/proteins (fully or partially                           
-supported) (XM_, XR_/XP_)

Non-transcribed pseudogenes                                                   
tRNA (annotated with tRNAScan)                                                
Ab initio Gnomon models                                                       



     Annotation Pipeline                                         RefSeq
Where to find the annotation products?
•   Nucleotide/Protein databases
•   Gene: http://www.ncbi.nlm.nih.gov/gene
•   Map Viewer: http://www.ncbi.nlm.nih.gov/mapview
•   BLAST databases
•   FTP site
Annotating multiple assemblies
  • Assembly-assembly alignments
     Available at http://www.ncbi.nlm.nih.gov/genome/tools/remap


     [Figure labels: Transcript, Group 1, Group 2, Assembly 1, Assembly 2]




     • Consistent placement of transcripts
     • Consistent labelling of the genes
     • Consistent annotation on all assemblies
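
To show how assembly-assembly alignments support consistent placement, here is a minimal coordinate-projection sketch. It assumes ungapped, same-strand alignment blocks and 0-based half-open coordinates. The NCBI Remap service linked above handles far more cases, and the names `Block`, `remap_position`, and `remap_interval` are hypothetical.

```python
from typing import NamedTuple, Optional


class Block(NamedTuple):
    src_start: int  # 0-based start on assembly 1
    dst_start: int  # 0-based start on assembly 2
    length: int     # ungapped, same-strand block length


def remap_position(pos: int, blocks: list[Block]) -> Optional[int]:
    """Project one position through the first block that contains it."""
    for block in blocks:
        offset = pos - block.src_start
        if 0 <= offset < block.length:
            return block.dst_start + offset
    return None  # position falls outside every aligned block


def remap_interval(start: int, end: int, blocks: list[Block]) -> Optional[tuple[int, int]]:
    """Project a half-open interval only if it lies inside a single block."""
    for block in blocks:
        block_end = block.src_start + block.length
        if block.src_start <= start < end <= block_end:
            shift = block.dst_start - block.src_start
            return start + shift, end + shift
    return None


blocks = [Block(0, 100, 50), Block(50, 200, 30)]
print(remap_position(10, blocks))      # 110
print(remap_position(90, blocks))      # None
print(remap_interval(5, 20, blocks))   # (105, 120)
print(remap_interval(40, 60, blocks))  # None
```

Returning None for a feature that spans two blocks is a deliberately conservative choice in this sketch: it flags the feature for review instead of guessing a placement.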
Annotating multiple assemblies (2)


                             Btau_4.6.1


         Same Gene symbol



                             UMD_3.1
Interacting with the community




FlyBase   GenBank         RefSeq
Thanks!
 The Genome Reference Consortium
  The Genome Center at Washington University
  The Wellcome Trust Sanger Institute
  The European Bioinformatics Institute
  The National Center for Biotechnology Information

  Church group at NCBI
    Valerie Schneider
    Nathan Bouk
    Hsiu-Chuan Chen
    Peter Meric
    Victor Ananiev
    Chao Chen
    John Lopez
    John Garner
    Tim Hefferon
    Cliff Clausen

  For Slides:
    Francoise Thibaud-Nissen
    Evan Eichler
    Steve Sherry


Editor's Notes

  1. Show alignment of a feature from first slide to show how far down the chromosome it has moved…
  2. Keeping track of people is way easier than keeping track of assemblies.