HAVANA / Ensembl / GENCODE annotation on GRCh38
Jonathan M. Mudge
Wellcome Trust Sanger Institute
HAVANA group
HAVANA provide manual gene annotation
Transcript models are built using:
• cDNAs
• ESTs
• Genomic sequence (human, mouse, zebrafish…)
• Protein
• Publication data
• Comparative analyses
• Next generation datasets
Ensembl: computational genome annotation
Ensembl genebuild based on genomic alignments
Not all Ensembl releases represent new genebuilds
GENCODE is a merge of HAVANA and Ensembl annotation
… with 8 institutes contributing
The merge is run every 3-6 months
GENCODE v23 released July 2015
Human GENCODE v23: 60,498 genes containing 198,619 transcripts
• 19,797 protein-coding genes
• 15,931 long non-coding RNA (lncRNA) genes
• 14,477 pseudogenes
• 79,795 protein-coding (CDS) transcripts, reflecting alternative splicing
• 27,817 lncRNA transcripts
• 1,112 transcribed
[Figure legend: CDS exon; non-coding / UTR exon]
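The gene and transcript counts above support a few quick derived figures. The minimal Python sketch below hard-codes only the numbers quoted on the slide and computes how many genes fall outside the three named classes and the mean number of transcripts per gene. The dictionary keys are illustrative labels chosen here, not guaranteed GENCODE field values, and the 1,112 figure is left out because its label is incomplete.

```python
# Headline counts for human GENCODE v23, copied from the slide above.
TOTAL_GENES = 60_498
TOTAL_TRANSCRIPTS = 198_619

# Illustrative labels (assumption), not necessarily GENCODE gene_type values.
GENES = {
    "protein_coding": 19_797,
    "lncRNA": 15_931,
    "pseudogene": 14_477,
}

TRANSCRIPTS = {
    "protein_coding": 79_795,
    "lncRNA": 27_817,
}


def other_genes() -> int:
    """Genes outside the three classes listed on the slide, found by subtraction."""
    return TOTAL_GENES - sum(GENES.values())


def transcripts_per_gene(kind: str) -> float:
    """Mean number of transcripts per gene for one class."""
    return TRANSCRIPTS[kind] / GENES[kind]


if __name__ == "__main__":
    print(f"genes in other classes: {other_genes():,}")
    print(f"transcripts per gene, all genes: {TOTAL_TRANSCRIPTS / TOTAL_GENES:.2f}")
    for kind in TRANSCRIPTS:
        print(f"transcripts per gene, {kind}: {transcripts_per_gene(kind):.2f}")
```

Run as a script, it reports 10,293 genes in other classes (presumably small RNA and other biotypes, which the slide does not break down), about 3.28 transcripts per gene overall, about 4.03 per protein-coding gene and about 1.75 per lncRNA gene.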
GENCODE is the gene set used by the ENCODE project
GENCODE has a dedicated web portal
www.gencodegenes.org
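GENCODE releases are distributed as GTF files. As a sketch, assuming the standard nine-column GTF layout and that gene records carry a `gene_type "protein_coding";` style attribute (as GENCODE GTFs do), per-class gene counts can be tallied as below. The helper names are hypothetical, and this is not an official GENCODE tool.

```python
import gzip
import re
from collections import Counter
from typing import Iterable

# One `key "value";` pair from the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field: str) -> dict:
    """Return quoted attributes as a dict; repeated keys keep the last value."""
    return dict(ATTR_RE.findall(field))


def count_gene_types(lines: Iterable[str]) -> Counter:
    """Count `gene` records by their gene_type attribute."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # one record per gene, ignoring transcript/exon rows
        attrs = parse_attributes(cols[8])
        counts[attrs.get("gene_type", "unknown")] += 1
    return counts


def count_gene_types_in_file(path: str) -> Counter:
    """Open a plain or gzip-compressed GTF and count its gene types."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_gene_types(handle)
```

For example, `count_gene_types_in_file("gencode.v23.annotation.gtf.gz")` would tally a downloaded release; that file name is an assumption, so check the portal for the exact name. The slide's broader classes (for example "pseudogenes") group several gene_type values, so the raw counts need merging before comparison with the slide.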
Viewing GENCODE in genome browsers
• Ensembl: www.ensembl.org (Ensembl 81/82 = GENCODE v23)
• UCSC Genome Browser: https://genome.ucsc.edu
HAVANA annotation can be viewed in Vega
vega.sanger.ac.uk (V61, 1 June 2015): ‘update’ annotation
v20 was the first GENCODE release on GRCh38; v19 was on GRCh37
Building GENCODE on GRCh38:
(1) HAVANA liftover
(2) HAVANA reannotation
(3) Merge into a full new Ensembl genebuild (Ensembl release 76)
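When scripting against several releases it helps to record which assembly each GENCODE version uses. The sketch below encodes only what the talk states (v19 on GRCh37, v20 onwards on GRCh38, v20 matching Ensembl 76, v23 matching Ensembl 81/82); the function names are hypothetical, and treating every later release as GRCh38 is an assumption that holds at least through v23.

```python
# Only the facts stated in the talk are encoded here.
FIRST_GRCH38_RELEASE = 20  # "v20 was the first GENCODE on GRCh38"
LAST_GRCH37_RELEASE = 19   # "v19 on GRCh37"

# GENCODE release -> matching Ensembl release(s), as quoted in the talk.
ENSEMBL_FOR_GENCODE = {20: (76,), 23: (81, 82)}


def primary_assembly(gencode_version: int) -> str:
    """Reference assembly a human GENCODE release was built on."""
    # Assumption: releases after v20 stay on GRCh38 (true through v23).
    if gencode_version >= FIRST_GRCH38_RELEASE:
        return "GRCh38"
    if gencode_version == LAST_GRCH37_RELEASE:
        return "GRCh37"
    raise ValueError(f"GENCODE v{gencode_version} is not covered by this table")


def ensembl_releases(gencode_version: int) -> tuple:
    """Ensembl release(s) matching a GENCODE release, where the talk states it."""
    if gencode_version not in ENSEMBL_FOR_GENCODE:
        raise KeyError(f"no Ensembl release recorded for GENCODE v{gencode_version}")
    return ENSEMBL_FOR_GENCODE[gencode_version]
```

For example, `primary_assembly(23)` returns `"GRCh38"` and `ensembl_releases(20)` returns `(76,)`. Unknown versions raise rather than guess, which keeps the table honest about what it does not know.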
Most gene IDs are preserved on GRCh38
Gene IDs were transferred from GRCh37 to GRCh38 using a contig-to-contig mapping strategy
… also used to map variation data, etc.
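The slide does not spell out the mapping algorithm. As an illustration only, here is a minimal Python sketch of block-based coordinate mapping, the general idea behind projecting features through contig-to-contig alignments between two assemblies. The `Block` layout, contig names and coordinates are hypothetical, and this is not the Ensembl implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Block:
    """One aligned, gap-free block between two assemblies (1-based, inclusive)."""
    src_seq: str
    src_start: int
    src_end: int
    dst_seq: str
    dst_start: int
    strand: int  # +1: same orientation; -1: reversed in the target assembly


def map_position(blocks: Sequence[Block], seq: str, pos: int) -> Optional[Tuple[str, int, int]]:
    """Map one base; returns (target_seq, target_pos, strand) or None if unaligned."""
    for b in blocks:
        if b.src_seq == seq and b.src_start <= pos <= b.src_end:
            offset = pos - b.src_start
            if b.strand == 1:
                return b.dst_seq, b.dst_start + offset, 1
            # Reversed block: the source start lands on the target block's end.
            dst_end = b.dst_start + (b.src_end - b.src_start)
            return b.dst_seq, dst_end - offset, -1
    return None


def map_interval(blocks: Sequence[Block], seq: str, start: int, end: int):
    """Map a feature's two ends; give up if either end is unaligned or the
    ends land on different target sequences or strands."""
    first = map_position(blocks, seq, start)
    last = map_position(blocks, seq, end)
    if first is None or last is None:
        return None
    if first[0] != last[0] or first[2] != last[2]:
        return None
    lo, hi = sorted((first[1], last[1]))
    return first[0], lo, hi, first[2]
```

For example, with a block mapping `ctgB:1-500` onto `chr1:9001-9500` in reverse orientation, `map_interval(blocks, "ctgB", 1, 100)` gives `("chr1", 9401, 9500, -1)`. A production mapper would also check the interior of each feature, since a gene spanning a gap between blocks can be split.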
[Figure: ESPN]
Ensembl re-annotation of SRGAP2
Assembly       Gene ID
GRCh37         ENSG00000266028
GRCh37 patch   ENSG00000266028
GRCh38         ENSG00000163486
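To put a number on a claim like "most gene IDs are preserved", one could compare the stable ID assigned to each gene in two builds. The sketch below assumes you already have, for each build, a mapping from a shared gene key to its stable ID; keying by symbol, as in the example, is a simplification because symbols can change or be duplicated. All names and IDs used with it are placeholders.

```python
from typing import Dict, Set, Tuple


def compare_gene_ids(
    old: Dict[str, str], new: Dict[str, str]
) -> Tuple[Set[str], Dict[str, Tuple[str, str]]]:
    """Split genes present in both builds into kept and changed stable IDs.

    `old` and `new` map a gene key (assumed shared between builds) to its
    stable gene ID in each build.
    """
    shared = old.keys() & new.keys()
    kept = {g for g in shared if old[g] == new[g]}
    changed = {g: (old[g], new[g]) for g in shared if old[g] != new[g]}
    return kept, changed


def fraction_kept(old: Dict[str, str], new: Dict[str, str]) -> float:
    """Share of genes present in both builds whose stable ID is unchanged."""
    kept, changed = compare_gene_ids(old, new)
    total = len(kept) + len(changed)
    return len(kept) / total if total else 0.0
```

Genes present in only one build are excluded from the fraction; they would need separate reporting as additions or removals.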
HAVANA reannotation on GRCh38:
• Fixed gene issues caused by GRCh37-to-GRCh38 assembly changes
• Major quality control (QC) performed
• New complex regions on chr1, chr9 and chrX
• Annotation of alt loci / haplotypes
The new pericentromeric region of chr9
• Old p-arm vs. new p-arm: gaps closed, clones flipped round and clones moved to the correct arm, supported by optical mapping data
• Hundreds of new / rebuilt gene models
Ongoing strategy for patch annotation
Ensembl: annotates patches as they are released, without a full genebuild
HAVANA: prioritises certain FIX / NOVEL patches and alt loci for annotation, because:
• some patches don’t contain genes that need re-annotating
• others are exceptionally complex
[Figure: NOVEL patch HG-2048, GRCh38.p3; HAVANA pseudogene]
HAVANA LRC annotation on GRCh38
Annotation of 34 Leukocyte Receptor Complexes (LRCs) completed for v20
[Figure labels: COX1, COX2, PGF1, PGF2, DM1A, DM1B, MC1A, MC1B; LILRs, KIRs]
GENCODE remains a work in progress
… arguably, far from complete
• We are missing genes, transcripts and exons
• Thousands of our models are incomplete
• Functional annotation is largely putative
Which transcripts are functional?
How do they function?
GRCh38 GENCODE incorporates next-generation data
Aims: transcript capture and completion; functional annotation
Next-generation experimental data:
• Short read data: querying transcript-level support of existing introns / exons; examining expression patterns, e.g. tissue specificity
• Long read data: querying transcript-level support of existing introns / exons
• CAGE / RAMPAGE / PolyA-seq: establishing the start and end points of genes / transcripts
• Ribosome profiling: reappraising initiation codon usage
• Mass spectrometry: identifying novel protein-coding regions
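For the short-read case, one common way to query support for existing introns is to count split alignments: in a SAM/BAM record an `N` CIGAR operation marks a skipped reference region, which for RNA-seq is normally an intron. The sketch below assumes alignments already reduced to (chromosome, 1-based position, CIGAR) tuples and ignores strand and mapping quality; it illustrates the idea and is not the GENCODE pipeline.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Intron = Tuple[str, int, int]  # (chrom, first intronic base, last intronic base), 1-based

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def introns_from_alignment(chrom: str, pos: int, cigar: str) -> List[Intron]:
    """Introns implied by 'N' (skipped-region) operations in one spliced alignment.

    `pos` is the 1-based leftmost aligned reference position (the SAM POS field).
    """
    introns: List[Intron] = []
    ref = pos
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op == "N":
            introns.append((chrom, ref, ref + n - 1))
        if op in REF_CONSUMING:
            ref += n
    return introns


def intron_support(
    alignments: Iterable[Tuple[str, int, str]], annotated: Set[Intron]
) -> Tuple[Dict[Intron, int], Set[Intron], Set[Intron]]:
    """Count split reads per intron and compare them with annotated introns.

    Returns (annotated introns with read counts, annotated introns with no
    reads, read-supported introns absent from the annotation).
    """
    counts: Counter = Counter()
    for chrom, pos, cigar in alignments:
        for intron in introns_from_alignment(chrom, pos, cigar):
            counts[intron] += 1
    supported = {i: counts[i] for i in annotated if counts[i] > 0}
    unsupported = set(annotated) - set(supported)
    novel = set(counts) - set(annotated)
    return supported, unsupported, novel
```

For example, a read at position 100 with CIGAR `50M200N50M` supports the intron `chr1:150-349`. Unsupported annotated introns are candidates for review, and read-supported introns missing from the annotation are candidates for new transcripts.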
GENCODE v23 compared with v19
v23 has 2,678 more genes than v19, but 548 fewer protein-coding genes
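The comparison can be run backwards to recover the v19 totals it implies. This is arithmetic on the slide's own figures only, assuming both differences refer to whole-release gene counts.

```python
# Figures quoted on the slides for GENCODE v23 and its difference from v19.
V23_GENES = 60_498
V23_PROTEIN_CODING = 19_797
EXTRA_GENES_IN_V23 = 2_678
FEWER_PROTEIN_CODING_IN_V23 = 548

# v23 has more genes overall, so subtract; it has fewer protein-coding genes, so add.
v19_genes = V23_GENES - EXTRA_GENES_IN_V23                              # 57,820
v19_protein_coding = V23_PROTEIN_CODING + FEWER_PROTEIN_CODING_IN_V23  # 20,345

print(f"implied v19 totals: {v19_genes:,} genes, {v19_protein_coding:,} protein-coding")
print(f"protein-coding share: v19 {v19_protein_coding / v19_genes:.1%}, "
      f"v23 {V23_PROTEIN_CODING / V23_GENES:.1%}")
```

This gives 57,820 genes and 20,345 protein-coding genes for v19, so the protein-coding share falls from about 35.2% to about 32.7% while the total gene count grows.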
In conclusion
GENCODE is now a GRCh38 genebuild
Compared to GRCh37 builds it is:
• More accurate
• More comprehensive
• More sophisticated
We recommend you use GENCODE v23 on GRCh38
Acknowledgements
Major funding:
GENCODE partners: Wellcome Trust Sanger Institute; European Bioinformatics
Institute; The University of Lausanne; The Centre de Regulació Genòmica; The
University of California, Santa Cruz; The Massachusetts Institute of Technology;
Yale University; The Spanish National Cancer Research Centre.

GRC ASHG 2015 workshop: Mudge

  • 1. HAVANA / Ensembl / GENCODE annotation on GRCh38. Jonathan M. Mudge, Wellcome Trust Sanger Institute, HAVANA group
  • 2. HAVANA provide manual gene annotation, building transcript models on genomic sequence (human, mouse, zebrafish…) from cDNAs, ESTs, protein, publication data, comparative analyses and next generation datasets
  • 4. The Ensembl genebuild is based on genomic alignments. Not all Ensembl releases represent new genebuilds.
  • 5. GENCODE is a HAVANA / Ensembl merge, with 8 institutes contributing. It is run every 3-6 months; GENCODE v23 was released in July 2015.
  • 6. Human GENCODE v23: 60,498 genes containing 198,619 transcripts. 19,797 protein-coding genes; 15,931 long non-coding RNA genes; 14,477 pseudogenes. 79,795 CDS transcripts due to alternative splicing; 27,817 lncRNA transcripts; 1,112 transcribed. [Figure legend: CDS exon; non-coding / UTR] GENCODE is the geneset for ENCODE.
  • 7. GENCODE has a designated web portal www.gencodegenes.org
  • 8. GENCODE has a designated web portal www.gencodegenes.org
  • 9. Viewing GENCODE in genome browsers: www.ensembl.org (Ensembl 81/82 = GENCODE v23)
  • 10. Viewing GENCODE in genome browsers: https://genome.ucsc.edu
  • 11. HAVANA annotation can be viewed in Vega: vega.sanger.ac.uk (V61, June 2015), ‘update’ annotation
  • 12. v20 was the first GENCODE release on GRCh38 (v19 was on GRCh37). Steps: (1) HAVANA liftover; (2) HAVANA reannotation; (3) merge into a full new Ensembl genebuild (Ensembl release 76).
  • 13. Most gene IDs are preserved on GRCh38. Gene IDs were transferred from GRCh37 to GRCh38 based on a contig-contig mapping strategy, which is also used to map variation etc. [Figure example: ESPN, GRCh37 vs GRCh38]
  • 14. Ensembl re-annotation of SRGAP2. [Figure: assembly vs gene ID. Assemblies shown: GRCh37, GRCh37 patch, GRCh38; gene IDs shown: ENSG00000266028 (twice), ENSG00000163486]
  • 15. Work for v20, the first GENCODE on GRCh38: • Fixed gene issues caused by GRCh37 > GRCh38 changes • Major QC performed • New complex regions on chr 1, 9 and X • Alt loci / haplotype annotation
  • 16. The new pericentromeric region of chr9: gaps closed / clones flipped round / clones moved to the correct arm; optical mapping data; hundreds of new / rebuilt models. [Figure: old p-arm vs new p-arm]
  • 17. Ongoing strategy for patch annotation. Ensembl: annotate patches when released, without a full genebuild. HAVANA: prioritise certain fix / novel patches and alt loci for annotation • some patches don’t contain genes that need re-annotating • others are exceptionally complex. [Figure: NOVEL patch HG-2048, GRCh38.p3, HAVANA pseudogene]
  • 18. HAVANA LRC annotation on GRCh38: annotation of 34 Leukocyte Receptor Complexes (LRCs) completed for v20. [Figure labels: COX2, COX1, PGF1, PGF2, DM1A, DM1B, MC1B, MC1A; LILRs; KIRs]
  • 19. GENCODE remains a work in progress … arguably, far from complete: • We are missing genes, transcripts and exons • Thousands of our models are incomplete • Functional annotation is largely putative. Which transcripts are functional? How do they function?
  • 20. GRCh38 GENCODE incorporates NextGen data for transcript capture and completion, and for functional annotation. Next generation experimental data: • Short read data: querying transcript-level support of existing introns / exons; examining expression patterns, e.g. tissue specificity • Long read data: querying transcript-level support of existing introns / exons • CAGE / RAMPAGE / PolyA-seq: establishing start and end points of genes / transcripts • Ribosome profiling: reappraising initiation codon usage • Mass spectrometry: identifying novel protein-coding regions
  • 21. GENCODE v23 compared with v19: v23 has 2,678 more genes… but 548 fewer protein-coding genes
  • 22. In conclusion, GENCODE is now a GRCh38 genebuild. Compared to GRCh37 builds it is: • More accurate • More comprehensive • More sophisticated. We recommend you use GENCODE v23 on GRCh38.
  • 23. Acknowledgements Major funding: GENCODE partners: Wellcome Trust Sanger Institute; European Bioinformatics Institute; The University of Lausanne; The Centre de Regulació Genòmica; The University of California, Santa Cruz; The Massachusetts Institute of Technology; Yale University; The Spanish National Cancer Research Centre.
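Gene counts like those on slide 6 are usually tallied from the GENCODE GTF annotation file. The following is a minimal Python sketch, assuming the standard nine-column GTF layout with a `gene_type` attribute on `gene` records. The toy records and gene IDs are hypothetical, not real GENCODE entries. In a real file, the slide's broad categories (for example long non-coding RNA or pseudogene) each group several `gene_type` values, so a further mapping step would be needed to reproduce them exactly.

```python
from collections import Counter


def parse_attributes(column9: str) -> dict:
    """Parse a GTF attribute column of the form 'key "value"; key "value";'."""
    attrs = {}
    for item in column9.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def count_gene_types(gtf_lines) -> Counter:
    """Tally gene_type over 'gene' records only, so each gene is counted once."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header / comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue  # skip transcript, exon, CDS ... records
        counts[parse_attributes(fields[8]).get("gene_type", "unknown")] += 1
    return counts


# Hypothetical toy records (not real GENCODE entries)
toy_gtf = [
    "##description: toy example",
    'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "GENE1"; gene_type "protein_coding";',
    'chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\tgene_id "GENE1"; transcript_id "TX1"; gene_type "protein_coding";',
    'chr1\tENSEMBL\tgene\t2000\t3000\t.\t-\t.\tgene_id "GENE2"; gene_type "lincRNA";',
    'chr2\tHAVANA\tgene\t500\t800\t.\t+\t.\tgene_id "GENE3"; gene_type "processed_pseudogene";',
]

counts = count_gene_types(toy_gtf)
print(dict(counts))  # {'protein_coding': 1, 'lincRNA': 1, 'processed_pseudogene': 1}
```

Counting only `gene` rows avoids double-counting a gene once per transcript or exon; transcript totals would instead be tallied over `transcript` rows.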
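Slide 13 does not spell out how the contig-contig mapping works. As an illustration only, here is a minimal sketch assuming the two assemblies are related by ungapped aligned blocks, some of them reversed (as when a clone is flipped round, slide 16). The `Block` type, its coordinates, the gene IDs and the conservative rule of mapping a feature only when both ends fall in one block are all assumptions of this sketch, not Ensembl's pipeline.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Block:
    """Hypothetical ungapped alignment block between two assemblies (1-based, inclusive)."""
    old_chrom: str
    old_start: int
    old_end: int
    new_chrom: str
    new_start: int
    new_end: int
    flipped: bool = False  # True if the contig is reversed in the new assembly

    def __post_init__(self):
        if self.old_end - self.old_start != self.new_end - self.new_start:
            raise ValueError("an ungapped block must have equal lengths on both assemblies")

    def contains(self, chrom: str, pos: int) -> bool:
        return chrom == self.old_chrom and self.old_start <= pos <= self.old_end

    def lift(self, pos: int) -> int:
        """Convert an old-assembly position inside this block to the new assembly."""
        offset = pos - self.old_start
        return self.new_end - offset if self.flipped else self.new_start + offset


def map_position(blocks: List[Block], chrom: str, pos: int) -> Optional[Tuple[str, int]]:
    for block in blocks:
        if block.contains(chrom, pos):
            return block.new_chrom, block.lift(pos)
    return None  # position falls in a gap or unaligned region


def map_feature(blocks: List[Block], chrom: str, start: int, end: int, strand: str):
    """Map a feature only if both ends lie in the same block (a conservative simplification)."""
    for block in blocks:
        if block.contains(chrom, start) and block.contains(chrom, end):
            new_a, new_b = block.lift(start), block.lift(end)
            if block.flipped:
                # Reversal swaps the feature's ends and its strand
                return block.new_chrom, new_b, new_a, "-" if strand == "+" else "+"
            return block.new_chrom, new_a, new_b, strand
    return None


# Hypothetical blocks: a shifted contig and a flipped contig
blocks = [
    Block("chr1", 1, 1000, "chr1", 101, 1100),
    Block("chr1", 1001, 2000, "chr1", 3001, 4000, flipped=True),
]
genes = {
    "GENE_A": ("chr1", 200, 300, "+"),
    "GENE_B": ("chr1", 1101, 1200, "+"),
    "GENE_C": ("chr1", 900, 1100, "+"),  # spans a block boundary
}
transferred = {gid: map_feature(blocks, *loc) for gid, loc in genes.items()}
print(transferred)
```

In this sketch GENE_A keeps its strand, GENE_B lands on the opposite strand of the flipped contig, and GENE_C returns `None`, making it a candidate for manual review rather than automatic ID transfer.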
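Slide 20 lists CAGE / RAMPAGE / PolyA-seq as evidence for gene and transcript start and end points. A minimal sketch of the kind of check this implies, assuming peaks are given as (chrom, start, end, strand) tuples and using an arbitrary, hypothetical 50 bp tolerance window; it is not the GENCODE procedure. The detail that matters is strand: on the minus strand, a transcript's 5' end is its highest coordinate.

```python
from typing import Iterable, Tuple

Interval = Tuple[str, int, int, str]  # (chrom, start, end, strand), 1-based inclusive


def five_prime_end(tx: Interval) -> int:
    _, start, end, strand = tx
    return start if strand == "+" else end  # minus-strand transcripts begin at the high coordinate


def three_prime_end(tx: Interval) -> int:
    _, start, end, strand = tx
    return end if strand == "+" else start


def supported(tx: Interval, site: int, peaks: Iterable[Interval], window: int = 50) -> bool:
    """True if a same-strand peak lies within `window` bp of `site` (window is an arbitrary choice)."""
    chrom, _, _, strand = tx
    return any(
        p_chrom == chrom and p_strand == strand and p_start - window <= site <= p_end + window
        for p_chrom, p_start, p_end, p_strand in peaks
    )


# Hypothetical minus-strand transcript with hypothetical CAGE and PolyA-seq clusters
tx = ("chr1", 1000, 5000, "-")
cage_peaks = [("chr1", 5010, 5030, "-"), ("chr1", 990, 1000, "+")]
polya_sites = [("chr1", 990, 995, "-")]

print(supported(tx, five_prime_end(tx), cage_peaks))    # True: peak starts 10 bp from the 5' end
print(supported(tx, three_prime_end(tx), polya_sites))  # True
```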
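Slide 21 gives only the differences between releases. Assuming both figures are v23 minus v19, the implied v19 totals follow directly from the v23 numbers on slide 6:

```python
# v23 totals from slide 6 and v23-minus-v19 differences from slide 21
v23 = {"genes": 60_498, "protein_coding_genes": 19_797}
diff_v23_minus_v19 = {"genes": 2_678, "protein_coding_genes": -548}

v19 = {key: v23[key] - diff_v23_minus_v19[key] for key in v23}
print(v19)  # {'genes': 57820, 'protein_coding_genes': 20345}

# Share of genes that are protein-coding in each release
for name, release in (("v19", v19), ("v23", v23)):
    share = release["protein_coding_genes"] / release["genes"]
    print(f"{name}: {share:.1%} protein-coding")
```

On these figures the protein-coding share drops from about 35.2% of genes in v19 to about 32.7% in v23, because the gene total grew while the protein-coding count shrank.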