Grc ashg2015 workshop_mudge

Genome Reference Consortium

ASHG 2015 GRC workshop talk by Jonathan Mudge

HAVANA / Ensembl / GENCODE annotation on GRCh38
Jonathan M. Mudge
Wellcome Trust Sanger Institute
HAVANA group

HAVANA provide manual gene annotation
cDNAs
ESTs
Genomic sequence (human, mouse, zebrafish…)
Protein
Transcript model
Publication data
Comparative analyses
Next generation datasets

Ensembl: computational genome annotation

Ensembl genebuild based on genomic alignments
Not all Ensembl releases represent new genebuilds

GENCODE is a HAVANA / Ensembl merge
… with 8 institutes contributing
Run every 3-6 months
GENCODEv23 released July 2015

19,797 protein coding genes
15,931 long non-coding RNA genes
14,477 pseudogenes
Human GENCODE v23: 60,498 genes containing 198,619 transcripts
CDS exon
Non-coding / UTR
79,795 CDS transcripts due to alternative splicing
27,817 lncRNA transcripts
1,112 transcribed
GENCODE is the geneset for ENCODE
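Per-biotype gene counts like those above can be recomputed from a GENCODE GTF file. The following is a minimal sketch, assuming the standard nine-column GTF layout and the `gene_type` attribute used in GENCODE GTF files. The helper names and the example rows are made up for illustration and are not real annotation.

```python
import re
from collections import Counter

# GENCODE GTF attribute strings contain entries such as: gene_type "protein_coding";
GENE_TYPE_RE = re.compile(r'gene_type "([^"]+)"')


def count_gene_types(gtf_lines):
    """Count 'gene' features per gene_type in an iterable of GTF lines."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only count gene rows, not transcripts or exons
        match = GENE_TYPE_RE.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return counts


def _row(feature, gene_id, gene_type):
    """Build one made-up GTF row (hypothetical helper for this example only)."""
    attrs = f'gene_id "{gene_id}"; gene_type "{gene_type}";'
    return "\t".join(["chr1", "EXAMPLE", feature, "1", "100", ".", "+", ".", attrs])


# Made-up rows: three genes and one transcript row that must be ignored.
EXAMPLE_GTF = [
    "##description: made-up example, not real annotation",
    _row("gene", "EXAMPLE_GENE_1", "protein_coding"),
    _row("transcript", "EXAMPLE_GENE_1", "protein_coding"),
    _row("gene", "EXAMPLE_GENE_2", "protein_coding"),
    _row("gene", "EXAMPLE_GENE_3", "lincRNA"),
]

counts = count_gene_types(EXAMPLE_GTF)
for gene_type, n in sorted(counts.items()):
    print(gene_type, n)
```

For a real release, the same function could be fed the lines of a downloaded GTF file opened with `open()` or `gzip.open(..., "rt")`. How the slide groups individual biotypes into its headline categories is not stated here, so totals may need regrouping.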

GENCODE has a designated web portal
www.gencodegenes.org

Viewing GENCODE in genome browsers
www.ensembl.org Ensembl 81/82 = GENCODEv23

Viewing GENCODE in genome browsers
https://genome.ucsc.edu

HAVANA annotation can be viewed in Vega
vega.sanger.ac.uk V61 Jun1 2015
‘update’ annotation

v20 was the first GENCODE on GRCh38
v19 on GRCh37
GRCh38:
(1) HAVANA liftover
(2) HAVANA reannotation
(3) Merge into full new Ensembl genebuild (Ensembl release 76)

Most gene IDs are preserved on GRCh38
GRCh37 GRCh38
Gene IDs were transferred based on contig-contig mapping strategy
… also used to map variation etc
ESPN
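The idea behind a contig-contig mapping can be sketched as follows: a position on the old assembly is first expressed relative to the contig that carries it, and is then re-projected through that same contig's placement on the new assembly. This is only an illustration under simplifying assumptions (whole contigs reused unchanged, one placement per contig). The contig names, coordinates and function names are hypothetical, and the actual HAVANA / Ensembl pipeline is not described in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Where one contig sits on a chromosome in one assembly (1-based, inclusive)."""
    contig: str
    chrom: str
    start: int
    end: int
    strand: int  # +1 = forward, -1 = reversed ("flipped") on the chromosome


def to_contig(pos, chrom, placements):
    """Convert a chromosome position to (contig, offset from contig start)."""
    for p in placements:
        if p.chrom == chrom and p.start <= pos <= p.end:
            offset = pos - p.start if p.strand == 1 else p.end - pos
            return p.contig, offset
    return None  # position falls in a gap or outside all contigs


def from_contig(contig, offset, placements):
    """Convert (contig, offset) back to a chromosome position."""
    for p in placements:
        if p.contig == contig:
            pos = p.start + offset if p.strand == 1 else p.end - offset
            return p.chrom, pos
    return None  # contig is not used in this assembly


def map_position(pos, chrom, old, new):
    """Map a position from the old assembly to the new one via its contig."""
    hit = to_contig(pos, chrom, old)
    if hit is None:
        return None
    return from_contig(*hit, new)


# Hypothetical layouts: the new assembly inserts a contig and flips CTG_B.
OLD = [Placement("CTG_A", "chr1", 1, 1000, 1),
       Placement("CTG_B", "chr1", 1001, 2000, 1)]
NEW = [Placement("CTG_A", "chr1", 1, 1000, 1),
       Placement("CTG_NEW", "chr1", 1001, 1500, 1),
       Placement("CTG_B", "chr1", 1501, 2500, -1)]

print(map_position(500, "chr1", OLD, NEW))
print(map_position(1100, "chr1", OLD, NEW))
```

A gene whose contig is unchanged between assemblies keeps a well-defined location under this scheme, which is consistent with most gene IDs being preserved. Regions whose underlying sequence was rebuilt have no such mapping and would need re-annotation.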

Ensembl re-annotation of SRGAP2
Assemblies shown: GRCh37, GRCh37 patch, GRCh38
Gene IDs shown: ENSG00000266028, ENSG00000266028, ENSG00000163486

• Fixed gene issues caused by GRCh37 > GRCh38 changes
• Major QC performed
• New complex regions on chr 1, 9, X
• Alt loci / Haplotype annotation

The new pericentromeric region of chr9
New p-arm
Gaps closed / clones flipped round / clones moved to correct arm
Optical mapping data
Hundreds of new / rebuilt models
Old p-arm

Ongoing strategy for patch annotation
Ensembl: annotate patches when released without full gene build
HAVANA: prioritise certain fix / novel patches and alt loci for annotation
• some patches don’t contain genes that need re-annotating
• others are exceptionally complex
NOVEL patch HG-2048
GRCh38.p3
HAVANA pseudogene

HAVANA LRC annotation on GRCh38
Annotation of 34 Leukocyte Receptor Complexes (LRCs) completed for v20
COX2
COX1
PGF1
PGF2
DM1A
DM1B
MC1B
MC1A
LILRs KIRs

GENCODE remains a work in progress
… arguably, far from complete
• We are missing genes, transcripts and exons
• 1000s of our models are incomplete
• Functional annotation is largely putative
Which transcripts are functional?
How do they function?

GRCh38 GENCODE incorporates NextGen data
Transcript capture and completion Functional annotation
Next generation experimental data
Short read data: querying transcript-level support of existing introns / exons; examining expression patterns, e.g. tissue specificity
Long read data: querying transcript-level support of existing introns / exons
CAGE / RAMPAGE / PolyAseq: establishing start and end points of genes / transcripts
Ribosome profiling: reappraising initiation codon usage
Mass spectrometry: identifying novel protein-coding regions
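As an illustration of what querying read support for existing introns can look like, the sketch below derives the introns of a transcript model from its exon coordinates and checks each one against a table of splice-junction read counts. It is a simplified assumption of how such a check might work, not the GENCODE pipeline. The coordinates, read counts, function names and the `min_reads` threshold are all hypothetical.

```python
def introns_from_exons(exons):
    """Return intron (start, end) pairs implied by a transcript's exons.

    exons: list of (start, end) tuples, 1-based inclusive, on one strand.
    """
    ordered = sorted(exons)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


def intron_support(exons, junction_counts, min_reads=1):
    """Map each intron to True/False depending on spliced-read support.

    junction_counts: dict of (intron_start, intron_end) -> number of spliced
    reads spanning exactly that junction. min_reads is an arbitrary,
    illustrative threshold.
    """
    return {
        intron: junction_counts.get(intron, 0) >= min_reads
        for intron in introns_from_exons(exons)
    }


# Made-up transcript model with three exons, and made-up junction counts.
EXONS = [(100, 200), (301, 400), (501, 600)]
JUNCTIONS = {(201, 300): 12}

support = intron_support(EXONS, JUNCTIONS)
for intron, ok in support.items():
    print(intron, "supported" if ok else "unsupported")
```

An intron without support in one dataset is not necessarily wrong, since expression can be tissue specific, so a result like this would flag a model for review rather than reject it.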

GENCODE v23 compared with v19
v23 has 2,678 more genes… but 548 fewer protein coding genes
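Combining these differences with the v23 totals given earlier in the talk (60,498 genes, 19,797 protein coding genes) gives the implied v19 totals. The v19 figures below are derived by arithmetic from the slide numbers and are not stated on the slides themselves.

```python
# Figures quoted in the talk for GENCODE v23.
V23_GENES = 60_498
V23_PROTEIN_CODING = 19_797

# Differences quoted for v23 relative to v19.
MORE_GENES_THAN_V19 = 2_678
FEWER_PROTEIN_CODING_THAN_V19 = 548

# Implied v19 totals (derived, not quoted in the slides).
v19_genes = V23_GENES - MORE_GENES_THAN_V19
v19_protein_coding = V23_PROTEIN_CODING + FEWER_PROTEIN_CODING_THAN_V19

print(f"Implied v19 genes: {v19_genes:,}")
print(f"Implied v19 protein coding genes: {v19_protein_coding:,}")
```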

In conclusion
GENCODE is now a GRCh38 genebuild
Compared to GRCh37 builds it is:
• More accurate
• More comprehensive
• More sophisticated
We recommend you use GENCODEv23 on GRCh38

Acknowledgements
Major funding:
GENCODE partners: Wellcome Trust Sanger Institute; European Bioinformatics
Institute; The University of Lausanne; The Centre de Regulació Genòmica; The
University of California, Santa Cruz; The Massachusetts Institute of Technology;
Yale University; The Spanish National Cancer Research Centre.
