Grc ashg2015 workshop_mudge

  1. HAVANA / Ensembl / GENCODE annotation on GRCh38 Jonathan M. Mudge Wellcome Trust Sanger Institute HAVANA group
  2. HAVANA provide manual gene annotation cDNAs ESTs Genomic sequence (human, mouse, zebrafish…) Protein Transcript model Publication data Comparative analyses Next generation datasets
  3. Ensembl: computational genome annotation
  4. Ensembl genebuild based on genomic alignments Not all Ensembl releases represent new genebuilds
  5. GENCODE is a HAVANA / Ensembl merge … with 8 institutes contributing Run every 3-6 months GENCODEv23 released July 2015
  6. Human GENCODE v23: 60,498 genes containing 198,619 transcripts. 19,797 protein-coding genes; 15,931 long non-coding RNA genes; 14,477 pseudogenes; 79,795 CDS transcripts due to alternative splicing; 27,817 lncRNA transcripts; 1,112 transcribed. Figure key: CDS exon; non-coding / UTR. GENCODE is the geneset for ENCODE
  7. GENCODE has a designated web portal www.gencodegenes.org
  8. GENCODE has a designated web portal www.gencodegenes.org
  9. Viewing GENCODE in genome browsers www.ensembl.org Ensembl 81/82 = GENCODEv23
  10. Viewing GENCODE in genome browsers https://genome.ucsc.edu
  11. HAVANA annotation can be viewed in Vega vega.sanger.ac.uk V61 Jun1 2015 ‘update’ annotation
  12. v20 was the first GENCODE on GRCh38 v19 on GRCh37 GRCh38 (1) HAVANA liftover (2) HAVANA reannotation (3) Merge into full new Ensembl genebuild (Ensembl release 76)
  13. Most gene IDs are preserved on GRCh38 GRCh37 GRCh38 Gene IDs were transferred based on contig-contig mapping strategy … also used to map variation etc ESPN
  14. GRCh37 GRCh37 patch GRCh38 Ensembl re-annotation of SRGAP2 ENSG00000266028 ENSG00000266028 ENSG00000163486 Assembly Gene ID
  15. v20 was the first GENCODE on GRCh38; v19 on GRCh37. GRCh38: (1) HAVANA liftover (2) HAVANA reannotation (3) Merge into full new Ensembl genebuild (Ensembl release 76) • Fixed gene issues caused by 37 > 38 changes • Major QC performed • New complex regions on chr 1, 9, X • Alt loci / Haplotype annotation
  16. The new pericentromeric region of chr9 New p-arm Gaps closed / clones flipped round / clones moved to correct arm Optical mapping data Hundreds of new / rebuilt models Old p-arm
  17. Ongoing strategy for patch annotation Ensembl: annotate patches when released without full gene build HAVANA: prioritise certain fix / novel patches and alt loci for annotation • some patches don’t contain genes that need re-annotating • others are exceptionally complex NOVEL patch HG-2048 GRCh38.p3 HAVANA pseudogene
  18. HAVANA LRC annotation on GRCh38 Annotation of 34 Leukocyte Receptor Complexes (LRCs) completed for v20 COX2 COX1 PGF1 PGF2 DM1A DM1B MC1B MC1A LILRs KIRs
  19. GENCODE remains a work in progress … arguably, far from complete • We are missing genes, transcripts and exons • 1000s of our models are incomplete • Functional annotation is largely putative Which transcripts are functional? How do they function?
  20. GRCh38 GENCODE incorporates NextGen data. Transcript capture and completion; functional annotation; next generation experimental data. Short read data: querying transcript-level support of existing introns / exons; examining expression patterns, e.g. tissue specificity. Long read data: querying transcript-level support of existing introns / exons. CAGE / RAMPAGE / PolyAseq: establishing start and end points of genes / transcripts. Ribosome profiling: reappraising initiation codon usage. Mass spectrometry: identifying novel protein-coding regions
  21. GENCODE v23 compared with v19: v23 has 2,678 more genes… 548 fewer protein-coding genes
  22. In conclusion: GENCODE is now a GRCh38 genebuild. Compared to GRCh37 builds it is: • More accurate • More comprehensive • More sophisticated. We recommend you use GENCODEv23 on GRCh38
  23. Acknowledgements Major funding: GENCODE partners: Wellcome Trust Sanger Institute; European Bioinformatics Institute; The University of Lausanne; The Centre de Regulació Genòmica; The University of California, Santa Cruz; The Massachusetts Institute of Technology; Yale University; The Spanish National Cancer Research Centre.
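
The slides report gene counts per biotype (slide 6), state that most gene IDs are preserved between GRCh37 and GRCh38 (slide 13), and compare v23 with v19 (slide 21). The sketch below shows one way such figures could be checked from two GENCODE GTF files. It is not the GENCODE pipeline: it assumes the standard nine-column GTF layout with `gene_id` and `gene_type` attributes, the function names `parse_genes` and `compare_releases` are hypothetical, and the records are made-up toy data rather than real annotation.

```python
import re
from collections import Counter

# Matches quoted GTF attributes such as: gene_id "ENSG....5";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_genes(gtf_lines):
    """Return {unversioned gene ID: gene_type} for every 'gene' row."""
    genes = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only gene-level rows carry one record per gene
        attrs = dict(ATTR_RE.findall(fields[8]))
        # Drop the ".N" version suffix so IDs compare across releases.
        gene_id = attrs["gene_id"].split(".")[0]
        genes[gene_id] = attrs.get("gene_type", "unknown")
    return genes


def compare_releases(old, new):
    """Count gene IDs kept, dropped and added between two releases."""
    old_ids, new_ids = set(old), set(new)
    return {
        "preserved": len(old_ids & new_ids),
        "retired": len(old_ids - new_ids),
        "novel": len(new_ids - old_ids),
    }


# Toy records (hypothetical IDs and coordinates, not real annotation).
OLD_GTF = [
    "##description: toy GRCh37-style release",
    'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\t'
    'gene_id "ENSGTOY0001.3"; gene_type "protein_coding"; gene_name "TOYA";',
    'chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\t'
    'gene_id "ENSGTOY0001.3"; transcript_id "ENSTTOY0001.1"; gene_type "protein_coding";',
    'chr1\tHAVANA\tgene\t2000\t2500\t.\t-\t.\t'
    'gene_id "ENSGTOY0002.1"; gene_type "protein_coding"; gene_name "TOYB";',
]
NEW_GTF = [
    "##description: toy GRCh38-style release",
    'chr1\tHAVANA\tgene\t150\t950\t.\t+\t.\t'
    'gene_id "ENSGTOY0001.4"; gene_type "protein_coding"; gene_name "TOYA"; level 2;',
    'chr1\tHAVANA\tgene\t5000\t5600\t.\t+\t.\t'
    'gene_id "ENSGTOY0003.1"; gene_type "lincRNA"; gene_name "TOYC";',
]

old_genes = parse_genes(OLD_GTF)
new_genes = parse_genes(NEW_GTF)
print(Counter(new_genes.values()))
print(compare_releases(old_genes, new_genes))
```

For real data, each list of lines would be replaced by an open file handle on a GTF downloaded from www.gencodegenes.org. Comparing on the unversioned ID is a deliberate simplification: a gene that keeps its stable ID but changes structure between assemblies still counts as preserved.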