HAVANA / Ensembl / GENCODE
annotation on GRCh38
Jonathan M. Mudge
Wellcome Trust Sanger Institute
HAVANA group
HAVANA provide manual gene annotation
cDNAs
ESTs
Genomic sequence (human, mouse, zebrafish…)
Protein
Transcript model
Publication data
Comparative analyses
Next generation datasets
Ensembl: computational genome annotation
Ensembl genebuild based on genomic alignments
Not all Ensembl releases represent new genebuilds
GENCODE is a HAVANA / Ensembl merge
… with 8 institutes contributing
Run every 3-6 months
GENCODEv23 released July 2015
19,797 protein coding genes
15,931 long non-coding RNA genes
14,477 pseudogenes
Human GENCODE v23: 60,498 genes containing 198,619 transcripts
CDS exon
Non-coding / UTR
79,795 CDS transcripts due to alternative splicing
27,817 lncRNA transcripts 1,112 transcribed
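Per-biotype totals like the ones above can be recomputed from a GENCODE GTF release file. The following is a minimal sketch, assuming the standard nine-column GTF layout and the `gene_type` attribute that GENCODE GTF files carry; the function name `count_gene_types` and the sample records (IDs `G1` to `G3`) are made up for illustration and are not real annotation.

```python
import re
from collections import Counter

# Matches the gene_type attribute in column 9 of a GENCODE GTF line.
GENE_TYPE = re.compile(r'gene_type "([^"]+)"')


def count_gene_types(lines):
    """Count 'gene' features per gene_type in an iterable of GTF lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only count gene-level records, not transcripts/exons
        match = GENE_TYPE.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return counts


# Toy input with invented identifiers (hypothetical, not from GENCODE).
SAMPLE = [
    "##description: toy example",
    'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding";',
    'chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_type "protein_coding";',
    'chr1\tENSEMBL\tgene\t2000\t2500\t.\t-\t.\tgene_id "G2"; gene_type "lincRNA";',
    'chr2\tHAVANA\tgene\t50\t400\t.\t+\t.\tgene_id "G3"; gene_type "protein_coding";',
]

toy_counts = count_gene_types(SAMPLE)
```

To run this on a real release, the same function could be fed the lines of an opened (and, if needed, decompressed) GTF file instead of `SAMPLE`.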
GENCODE is the geneset for ENCODE
GENCODE has a designated web portal
www.gencodegenes.org
Viewing GENCODE in genome browsers
www.ensembl.org Ensembl 81/82 = GENCODEv23
Viewing GENCODE in genome browsers
https://genome.ucsc.edu
HAVANA annotation can be viewed in Vega
vega.sanger.ac.uk V61 Jun1 2015
‘update’ annotation
v20 was the first GENCODE on GRCh38
v19 on GRCh37
GRCh38
(1) HAVANA
liftover
(2) HAVANA
reannotation
(3) Merge into full new Ensembl genebuild
(Ensembl release 76)
Most gene IDs are preserved on GRCh38
GRCh37 GRCh38
Gene IDs were transferred based on contig-contig mapping strategy
… also used to map variation etc
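The slides do not spell out how the contig-contig mapping works. One way to picture the idea, purely as an illustrative sketch and not the actual Ensembl or HAVANA pipeline, is to project a chromosome coordinate down onto the contig that carries it in the old assembly and then back up through the placement of that same contig in the new assembly. All names here (`Placement`, `map_position`, contigs `ctgA` to `ctgC`) and all coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Placement:
    """One contig slice placed on a chromosome (1-based, inclusive)."""
    contig: str
    chrom_start: int   # chromosome position of the first placed base
    contig_start: int  # first contig base used by this placement
    length: int
    strand: int        # +1 forward, -1 reverse-complemented


def to_contig(pos: int, placements: List[Placement]) -> Optional[Tuple[str, int]]:
    """Project a chromosome position onto the contig that covers it."""
    for p in placements:
        offset = pos - p.chrom_start
        if 0 <= offset < p.length:
            if p.strand == 1:
                return p.contig, p.contig_start + offset
            return p.contig, p.contig_start + p.length - 1 - offset
    return None  # position falls in a gap


def to_chrom(contig: str, cpos: int, placements: List[Placement]) -> Optional[int]:
    """Project a contig position back onto a chromosome."""
    for p in placements:
        if p.contig != contig:
            continue
        offset = cpos - p.contig_start
        if 0 <= offset < p.length:
            if p.strand == 1:
                return p.chrom_start + offset
            return p.chrom_start + p.length - 1 - offset
    return None  # contig (or this part of it) is not used in the assembly


def map_position(pos: int, old: List[Placement], new: List[Placement]) -> Optional[int]:
    """Map a position between assemblies via a shared contig, if any."""
    hit = to_contig(pos, old)
    if hit is None:
        return None
    return to_chrom(hit[0], hit[1], new)


# Invented layouts: the new assembly closes a gap with ctgC and flips ctgB.
OLD = [Placement("ctgA", 1, 1, 1000, 1), Placement("ctgB", 1501, 1, 500, 1)]
NEW = [
    Placement("ctgA", 1, 1, 1000, 1),
    Placement("ctgC", 1001, 1, 200, 1),
    Placement("ctgB", 1201, 1, 500, -1),
]

same = map_position(500, OLD, NEW)      # on ctgA, unchanged
flipped = map_position(1600, OLD, NEW)  # on ctgB, now reversed
in_gap = map_position(1200, OLD, NEW)   # old gap, nothing to map
```

In this toy layout a feature on an unchanged contig keeps its coordinate, a feature on a flipped contig moves and changes orientation, and a position in an old gap has no mapping, which is the kind of case that would need reannotation rather than transfer.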
ESPN
GRCh37
GRCh37 patch
GRCh38
Ensembl re-annotation of SRGAP2
ENSG00000266028
ENSG00000266028
ENSG00000163486
Assembly Gene ID
• Fixed gene issues caused by 37 > 38 changes
• Major QC performed
• New complex regions on chr 1, 9, X
• Alt loci / Haplotype annotation
The new pericentromeric region of chr9
New p-arm
Gaps closed / clones flipped round / clones moved to correct arm
Optical mapping data
Hundreds of new / rebuilt models
Old p-arm
Ongoing strategy for patch annotation
Ensembl: annotate patches when released without full gene build
HAVANA: prioritise certain fix / novel patches and alt loci for annotation
• some patches don’t contain genes that need re-annotating
• others are exceptionally complex
NOVEL patch HG-2048
GRCh38.p3
HAVANA pseudogene
HAVANA LRC annotation on GRCh38
Annotation of 34 Leukocyte Receptor Complexes (LRCs) completed for v20
COX2
COX1
PGF1
PGF2
DM1A
DM1B
MC1B
MC1A
LILRs KIRs
GENCODE remains a work in progress
… arguably, far from complete
• We are missing genes, transcripts and exons
• 1000s of our models are incomplete
• Functional annotation is largely putative
Which transcripts are functional?
How do they function?
GRCh38 GENCODE incorporates NextGen data
Transcript capture and completion
Functional annotation
Next generation experimental data
Short read data: querying transcript-level support of existing introns / exons
examining expression patterns, e.g. tissue specificity
Long read data: querying transcript-level support of existing introns / exons
CAGE / RAMPAGE / PolyAseq: establishing start and end points of genes / transcripts
Ribosome profiling: reappraising initiation codon usage
Mass spectrometry: identifying novel protein-coding regions
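As a rough illustration of what "querying transcript-level support of existing introns" could look like in practice, the sketch below derives introns from a transcript's exon coordinates and checks each against a set of splice junctions observed in read alignments. This is an assumption about the general approach, not the GENCODE pipeline; the function names and all coordinates are invented.

```python
def introns_from_exons(exons):
    """Return introns as (start, end) pairs lying between consecutive exons.

    exons: (start, end) tuples for one transcript, 1-based inclusive.
    """
    ordered = sorted(exons)
    return [
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])
    ]


def intron_support(exons, junctions):
    """Map each annotated intron to True if an identical junction was observed."""
    return {intron: intron in junctions for intron in introns_from_exons(exons)}


# Hypothetical transcript model and hypothetical junctions from split reads.
model_exons = [(100, 200), (301, 400), (501, 650)]
observed_junctions = {(201, 300), (401, 480)}

support = intron_support(model_exons, observed_junctions)
```

Here the first intron has exact junction support while the second does not, because the observed junction ends at a different acceptor position; models like the latter would be candidates for manual review.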
GENCODE v23 compared with v19
v23 has 2,678 more genes… 548 fewer protein-coding genes
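Combining these differences with the v23 totals quoted earlier in the deck gives the implied v19 figures. The short calculation below uses only numbers stated on the slides; the variable names are illustrative.

```python
# Figures quoted on the slides for GENCODE v23.
V23_GENES = 60_498
V23_PROTEIN_CODING = 19_797

# Differences relative to v19, as stated on the comparison slide.
EXTRA_GENES_IN_V23 = 2_678
FEWER_PROTEIN_CODING_IN_V23 = 548

# Implied v19 totals.
v19_genes = V23_GENES - EXTRA_GENES_IN_V23
v19_protein_coding = V23_PROTEIN_CODING + FEWER_PROTEIN_CODING_IN_V23

print(v19_genes, v19_protein_coding)  # 57820 20345
```

So the overall gene count rose between v19 and v23 even as the protein-coding count fell, which means the increase came from other gene categories.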
In conclusion
GENCODE is now a GRCh38 genebuild
Compared to GRCh37 builds it is:
• More accurate
• More comprehensive
• More sophisticated
We recommend you use GENCODEv23 on GRCh38
Acknowledgements
Major funding:
GENCODE partners: Wellcome Trust Sanger Institute; European Bioinformatics
Institute; The University of Lausanne; The Centre de Regulació Genòmica; The
University of California, Santa Cruz; The Massachusetts Institute of Technology;
Yale University; The Spanish National Cancer Research Centre.

 
From Seeds to Supermassive Black Holes: Capture, Growth, Migration, and Pairi...
From Seeds to Supermassive Black Holes: Capture, Growth, Migration, and Pairi...From Seeds to Supermassive Black Holes: Capture, Growth, Migration, and Pairi...
From Seeds to Supermassive Black Holes: Capture, Growth, Migration, and Pairi...
 
Potential of Marine Renewable and Non renewable energy.pptx
Potential of Marine Renewable and Non renewable energy.pptxPotential of Marine Renewable and Non renewable energy.pptx
Potential of Marine Renewable and Non renewable energy.pptx
 
End of pipe treatment: Unlocking the potential of RAS waste - Carlos Octavio ...
End of pipe treatment: Unlocking the potential of RAS waste - Carlos Octavio ...End of pipe treatment: Unlocking the potential of RAS waste - Carlos Octavio ...
End of pipe treatment: Unlocking the potential of RAS waste - Carlos Octavio ...
 
Shoot apex organization and its theories
Shoot apex organization and its theoriesShoot apex organization and its theories
Shoot apex organization and its theories
 
Current Electricity MCQ Class XII. Physics pptx
Current Electricity MCQ Class XII. Physics pptxCurrent Electricity MCQ Class XII. Physics pptx
Current Electricity MCQ Class XII. Physics pptx
 
17. 20240529_Ingrid Olesen_MariGreen summer school.pdf
17. 20240529_Ingrid Olesen_MariGreen summer school.pdf17. 20240529_Ingrid Olesen_MariGreen summer school.pdf
17. 20240529_Ingrid Olesen_MariGreen summer school.pdf
 
Analytical methods for blue residues characterization - Oana Crina Bujor
Analytical methods for blue residues characterization - Oana Crina BujorAnalytical methods for blue residues characterization - Oana Crina Bujor
Analytical methods for blue residues characterization - Oana Crina Bujor
 
Potential of Marine renewable and Non renewable energy.pptx
Potential of Marine renewable and Non renewable energy.pptxPotential of Marine renewable and Non renewable energy.pptx
Potential of Marine renewable and Non renewable energy.pptx
 
Fish in the Loop: Exploring RAS - Julie Hansen Bergstedt
Fish in the Loop: Exploring RAS - Julie Hansen BergstedtFish in the Loop: Exploring RAS - Julie Hansen Bergstedt
Fish in the Loop: Exploring RAS - Julie Hansen Bergstedt
 
ellipticytescausesprognosistreatment-240622051139-23d50b05.pptx
ellipticytescausesprognosistreatment-240622051139-23d50b05.pptxellipticytescausesprognosistreatment-240622051139-23d50b05.pptx
ellipticytescausesprognosistreatment-240622051139-23d50b05.pptx
 
NuGOweek 2024 Ghent programme__flyer.pdf
NuGOweek 2024 Ghent programme__flyer.pdfNuGOweek 2024 Ghent programme__flyer.pdf
NuGOweek 2024 Ghent programme__flyer.pdf
 
Rapid pulse drying of marine biomasses - Sigurd Sannan
Rapid pulse drying of marine biomasses - Sigurd SannanRapid pulse drying of marine biomasses - Sigurd Sannan
Rapid pulse drying of marine biomasses - Sigurd Sannan
 
The CGIAR needs a revolution John McIntire a, Achim Dobermann b
The CGIAR needs a revolution John McIntire a, Achim Dobermann bThe CGIAR needs a revolution John McIntire a, Achim Dobermann b
The CGIAR needs a revolution John McIntire a, Achim Dobermann b
 
A NICER VIEW OF THE NEAREST AND BRIGHTEST MILLISECOND PULSAR: PSR J0437−4715
A NICER VIEW OF THE NEAREST AND BRIGHTEST MILLISECOND PULSAR: PSR J0437−4715A NICER VIEW OF THE NEAREST AND BRIGHTEST MILLISECOND PULSAR: PSR J0437−4715
A NICER VIEW OF THE NEAREST AND BRIGHTEST MILLISECOND PULSAR: PSR J0437−4715
 
Phytoremediation: Harnessing Nature's Power with Phytoremediation
Phytoremediation: Harnessing Nature's Power with PhytoremediationPhytoremediation: Harnessing Nature's Power with Phytoremediation
Phytoremediation: Harnessing Nature's Power with Phytoremediation
 
MACRAMÉ ChIPs @Behoerdenklausur 2024 (Berlin)
MACRAMÉ ChIPs @Behoerdenklausur 2024 (Berlin)MACRAMÉ ChIPs @Behoerdenklausur 2024 (Berlin)
MACRAMÉ ChIPs @Behoerdenklausur 2024 (Berlin)
 
bloodclotfactorsprocoagulantsexstrinsicintrinsicfactors-240607054610-6895d6e5...
bloodclotfactorsprocoagulantsexstrinsicintrinsicfactors-240607054610-6895d6e5...bloodclotfactorsprocoagulantsexstrinsicintrinsicfactors-240607054610-6895d6e5...
bloodclotfactorsprocoagulantsexstrinsicintrinsicfactors-240607054610-6895d6e5...
 
VIII-Geography FOR CBSE CLASS 8 INDIA.pdf
VIII-Geography FOR CBSE CLASS 8 INDIA.pdfVIII-Geography FOR CBSE CLASS 8 INDIA.pdf
VIII-Geography FOR CBSE CLASS 8 INDIA.pdf
 
Traditional, current and future use of fish and seaweed for fertilisation - ...
Traditional, current and future use of fish and seaweed for fertilisation -  ...Traditional, current and future use of fish and seaweed for fertilisation -  ...
Traditional, current and future use of fish and seaweed for fertilisation - ...
 

Grc ashg2015 workshop_mudge

  • 1. HAVANA / Ensembl / GENCODE annotation on GRCh38 Jonathan M. Mudge Wellcome Trust Sanger Institute HAVANA group
  • 2. HAVANA provide manual gene annotation cDNAs ESTs Genomic sequence (human, mouse, zebrafish…) Protein Transcript model Publication data Comparative analyses Next generation datasets
  • 4. Ensembl genebuild based on genomic alignments Not all Ensembl releases represent new genebuilds
  • 5. GENCODE is a HAVANA / Ensembl merge … with 8 institutes contributing Run every 3-6 months GENCODEv23 released July 2015
  • 6. 19,797 protein coding genes 15,931 long non-coding RNA genes 14,477 pseudogenes Human GENCODE v23: 60,498 genes containing 198,619 transcripts CDS exon Non-coding / UTR 79,795 CDS transcripts due to alternative splicing 27,817 lncRNA transcripts 1,112 transcribed GENCODE is the geneset for ENCODE
  • 7. GENCODE has a designated web portal www.gencodegenes.org
  • 8. GENCODE has a designated web portal www.gencodegenes.org
  • 9. Viewing GENCODE in genome browsers www.ensembl.org Ensembl 81/82 = GENCODEv23
  • 10. Viewing GENCODE in genome browsers https://genome.ucsc.edu
  • 11. HAVANA annotation can be viewed in Vega vega.sanger.ac.uk V61 Jun1 2015 ‘update’ annotation
  • 12. v20 was the first GENCODE on GRCh38 v19 on GRCh37 GRCh38 (1) HAVANA liftover (2) HAVANA reannotation (3) Merge into full new Ensembl genebuild (Ensembl release 76)
  • 13. Most gene IDs are preserved on GRCh38 GRCh37 GRCh38 Gene IDs were transferred based on contig-contig mapping strategy … also used to map variation etc ESPN
  • 14. GRCh37 GRCh37 patch GRCh38 Ensembl re-annotation of SRGAP2 ENSG00000266028 ENSG00000266028 ENSG00000163486 Assembly Gene ID
  • 15. • Fixed gene issues caused by 37 > 38 changes • Major QC performed • New complex regions on chr 1, 9, X • Alt loci / Haplotype annotation v20 was the first GENCODE on GRCh38 v19 on GRCh37 GRCh38 (1) HAVANA liftover (2) HAVANA reannotation (3) Merge into full new Ensembl genebuild (Ensembl release 76)
  • 16. The new pericentromeric region of chr9 New p-arm Gaps closed / clones flipped round / clones moved to correct arm Optical mapping data Hundreds of new / rebuilt models Old p-arm
  • 17. Ongoing strategy for patch annotation Ensembl: annotate patches when released without full gene build HAVANA: prioritise certain fix / novel patches and alt loci for annotation • some patches don’t contain genes that need re-annotating • others are exceptionally complex NOVEL patch HG-2048 GRCh38.p3 HAVANA pseudogene
  • 18. HAVANA LRC annotation on GRCh38 Annotation of 34 Leukocyte Receptor Complexes (LRCs) completed for v20 COX2 COX1 PGF1 PGF2 DM1A DM1B MC1B MC1A LILRs KIRs
  • 19. GENCODE remains a work in progress … arguably, far from complete • We are missing genes, transcripts and exons • 1000s of our models are incomplete • Functional annotation is largely putative Which transcripts are functional? How do they function?
  • 20. GRCh38 GENCODE incorporates NextGen data Transcript capture and completion Functional annotation Next generation experimental data Short read data: querying transcript-level support of existing introns / exons examining expression patterns, e.g. tissue specificity Long read data: querying transcript-level support of existing introns / exons CAGE / RAMPAGE / PolyAseq: establishing start and end points of genes / transcripts Ribosome profiling: reappraising initiation codon usage Mass spectrometry: identifying novel protein-coding regions
  • 21. GENCODE v23 compared with v19 v23 has 2,678 more genes… 548 fewer protein coding genes
  • 22. In conclusion GENCODE is now a GRCh38 genebuild Compared to GRCh37 builds it is: • More accurate • More comprehensive • More sophisticated We recommend you use GENCODEv23 on GRCh38
  • 23. Acknowledgements Major funding: GENCODE partners: Wellcome Trust Sanger Institute; European Bioinformatics Institute; The University of Lausanne; The Centre de Regulació Genòmica; The University of California, Santa Cruz; The Massachusetts Institute of Technology; Yale University; The Spanish National Cancer Research Centre.
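
The per-biotype gene counts quoted on slide 6 can in principle be reproduced from a GENCODE GTF release file by tallying the `gene_type` attribute of every `gene` feature. The following is a minimal sketch, not the GENCODE team's own procedure. It assumes the standard nine-column, tab-separated GTF layout with `key "value";` attributes. The function name `count_gene_types` and the `SAMPLE` records are hypothetical; the coordinates and the `ENSG_EXAMPLE_*` identifiers are placeholders, not real annotation.

```python
from collections import Counter


def count_gene_types(gtf_lines):
    """Tally GTF 'gene' features by their gene_type attribute."""
    counts = Counter()
    for line in gtf_lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 3 holds the feature type; only count gene records.
        if len(fields) < 9 or fields[2] != "gene":
            continue
        # Column 9 holds semicolon-separated 'key "value"' attributes.
        attrs = {}
        for item in fields[8].split(";"):
            item = item.strip()
            if not item:
                continue
            key, _, value = item.partition(" ")
            attrs.setdefault(key, value.strip('"'))
        counts[attrs.get("gene_type", "unknown")] += 1
    return counts


# Hypothetical records for illustration only (placeholder IDs and coordinates).
SAMPLE = [
    "##description: hypothetical excerpt, not real annotation",
    'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "ENSG_EXAMPLE_1"; gene_type "protein_coding";',
    'chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\tgene_id "ENSG_EXAMPLE_1"; gene_type "protein_coding";',
    'chr1\tENSEMBL\tgene\t2000\t2600\t.\t-\t.\tgene_id "ENSG_EXAMPLE_2"; gene_type "lincRNA";',
    'chr9\tHAVANA\tgene\t5000\t5400\t.\t+\t.\tgene_id "ENSG_EXAMPLE_3"; gene_type "protein_coding";',
]

if __name__ == "__main__":
    for gene_type, n in count_gene_types(SAMPLE).most_common():
        print(gene_type, n)
```

To run this on a real release, pass an open file handle for the downloaded GTF instead of `SAMPLE`. Counting only `gene` rows avoids counting a gene once per transcript or exon.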