
Ensembl annotation

GRC Workshop held at Churchill College on Sep 21, 2014. Talk by Bronwen Aken discussing the Ensembl approach to annotating the complete human reference assembly.

EBI is an Outstation of the European Molecular Biology Laboratory.
Ensembl annotation
Bronwen Aken
21 September 2014
How Ensembl started
• Ewan Birney
• Michele Clamp
• Tim Hubbard
Ensembl’s goals
Annotate (vertebrate) genomes
Integrate with other biological data
Make publicly available
• Stable, automatic annotation
• High quality
• Regular release cycles
• Open source
"Provide a bioinformatics framework to organise biology around the sequences of large genomes"
Challenges
1. Find functional elements in a genome
• Data have lots of noise
2. Software / hardware
• Storing and manipulating data
3. Intuitive and comprehensive access to data
• Visualization
GRCh38 annotation in Ensembl
What is Genebuilding?
• Automatic, evidence-based annotation of genes
• Not ab initio
• Based on sequence alignment
• “Best-in-genome”
• Aim for high specificity
• Prefer to miss a few features rather than heavily over-predict
The automated gene annotation pipeline is designed around decisions made during manual annotation
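The "best-in-genome" and high-specificity principles above can be illustrated with a small filter over alignment hits. This is a minimal sketch, not Ensembl's genebuild code: the `Alignment` record, the `best_in_genome` function and the 90% coverage / 95% identity cut-offs are hypothetical choices made for illustration. Each evidence sequence keeps at most one placement, its highest-scoring alignment that clears the thresholds, so weaker paralogous or partial hits are dropped rather than turned into extra gene models.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Alignment:
    """One alignment of an evidence sequence (e.g. a protein or cDNA) to the genome.

    Hypothetical record type for illustration only.
    """
    query_id: str    # identifier of the evidence sequence
    region: str      # genomic location, e.g. "1:100-900"
    coverage: float  # percent of the query covered by the alignment
    identity: float  # percent identity over the aligned portion
    score: float     # aligner score; higher is better


def best_in_genome(alignments: Iterable[Alignment],
                   min_coverage: float = 90.0,
                   min_identity: float = 95.0) -> Dict[str, Alignment]:
    """Keep, for each query, only its best-scoring alignment that passes strict cut-offs.

    The default thresholds are assumptions, not Ensembl's actual settings.
    """
    best: Dict[str, Alignment] = {}
    for aln in alignments:
        # Strict cut-offs favour specificity: weak or partial hits are discarded.
        if aln.coverage < min_coverage or aln.identity < min_identity:
            continue
        current = best.get(aln.query_id)
        # Strictly greater: on a tie, the first alignment seen is kept.
        if current is None or aln.score > current.score:
            best[aln.query_id] = aln
    return best


hits = [
    Alignment("P1", "1:100-900", 98.0, 99.0, 850.0),
    Alignment("P1", "7:500-1300", 97.0, 96.0, 610.0),  # weaker, paralogous placement
    Alignment("P2", "3:200-400", 40.0, 99.0, 300.0),   # partial hit, fails coverage
]
kept = best_in_genome(hits)
print({q: a.region for q, a in kept.items()})  # {'P1': '1:100-900'}
```

Raising the thresholds trades sensitivity for specificity, in line with the preference for missing a few features over heavily over-predicting. A production pipeline would also have to deal with ties and with genuine multi-copy genes, which a strict one-placement-per-query rule like this sketch would collapse.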

  • 2. How Ensembl started • Ewan Birney • Michele Clamp • Tim Hubbard
  • 3. Ensembl’s goals: annotate (vertebrate) genomes; integrate them with other biological data; make them publicly available • Stable, automatic annotation • High quality • Regular release cycles • Open source • “Provide a bioinformatics framework to organise biology around the sequences of large genomes”
  • 4. Challenges 1. Finding functional elements in a genome • The data are noisy 2. Software / hardware • Storing and manipulating data 3. Intuitive and comprehensive access to data • Visualization
  • 6. What is Genebuilding? • Automatic, evidence-based annotation of genes • Not ab initio • Based on sequence alignment • “Best-in-genome” • Aims for high specificity • Prefers to miss a few features rather than heavily over-predict • The automated gene annotation pipeline is designed around decisions made during manual annotation
  • 7. Advantages of re-annotating • Add new genes to new / fixed genomic regions • Updated supporting evidence: Remove models built on data that has been deleted from archives • Move alignments to regions with better mapping
  • 8. Gene annotation pipeline: the basics 1. Identify interesting regions • Rough alignment of sequences to the genome 2. Exhaustive alignment to produce transcript models 3. Filter models • Prioritize data sources 4. Produce a ‘best guess’ gene set
  • 9. Pipeline diagram: Repeatmasking • Same-species proteins • Other-species proteins • cDNAs/ESTs • Protein-coding genebuild • Filtering • UTR addition • Filtering • TranscriptConsensus • LayerAnnotation • Final gene set • Also: small ncRNAs, lincRNAs, pseudogenes
  • 10. Pipeline diagram: Repeatmasking • Same-species proteins • Other-species proteins • cDNAs/ESTs • Protein-coding genebuild • Filtering • UTR addition • Filtering • RNA-Seq models • Final gene set • Also: small ncRNAs, lincRNAs, pseudogenes • Merge with Havana
  • 11. Release cycle 26 September 2014 • Figure adapted from the ENCODE project (www.nature.com/nature/focus/encode/): regulation, gene, allele, conserved sequence • Genes: coding & noncoding; protein & mRNA alignments; GTF & BAM files • Compara: conserved DNA sequence; multiple genome alignments; homologues; protein families • Regulatory regions: DNA methylation; TFBS; open chromatin • Variation: SNPs, indels, structural variation; phenotypes; QTLs
  • 12. Integrate with other species: chimpanzee and human, gene SLC12A1
  • 14. Genome assembly representation • Coord_system table • Lists the allowed coordinate systems: chromosome, scaffold, contig • With ‘versions’: GRCh37, GRCh38 • Contigs are shared between assemblies, so they have no version • ‘Toplevel’ coordinate system • Chromosomes + unplaced scaffolds + unlocalized scaffolds + alternate sequences • The most popular way to access the whole genome • API options for including/excluding alternate sequences and the pseudoautosomal regions (PAR)
  • 20. Seq_region names • Regions of the genome are given a slice name; it’s like an address • e.g. chromosome:GRCh37:6:133090509:133119701:1 • Users like to say ‘chromosome 6’ • INSDC accessions are versioned, but less human-readable • chromosome:GRCh37:CM000668.1:133090509:133119701:1 • Fields: coord_system:assembly:seq_region name:start:end:strand
  • 21. Alternate sequences • The assembly_exception table defines ‘bubbles’ • Initially set up to handle the Y-chromosome PAR • Adapted to work for MHC haplotypes • Now also used for GRC patches • Assumes an ‘equivalent’ region is present in the primary assembly
  • 22. Gene annotation on a ‘patched’ genome • [Figure: Ensembl browser views of the patched chromosome (HG183_PATCH, 62.3 to 62.5 Mb) and the primary chromosome 17 (62.225 to 62.475 Mb), with contigs, GENCODE genes, H.sap-H.sap lastz alignments and assembly exceptions] • The TEX2 gene lies across the patch boundary • PECAM1 is annotated only on patch HG183 • Gap in primary assembly
  • 23. Gene annotation on a ‘patched’ genome
  • 24. Gene annotation on patches [Figure: patch vs primary]
  • 25. Gene annotation on patches [Figure: patch vs primary] 1. Manual annotation
  • 26. Gene annotation on patches [Figure: patch vs primary] 1. Manual annotation 2. Project models to patch
  • 27. Gene annotation on patches [Figure: patch vs primary] 1. Manual annotation 2. Project models to patch 3. Gap-fill with mini genebuild
  • 28. Ongoing challenges • How strict should we be when aligning proteins and cDNAs to the genome? 1. Genome assembly • Sequencing errors (inversions, artificial duplications) • Assembly incomplete • Alignments must allow for truncated matches 2. Population variation • The linear genome is made from ‘one’ individual, whereas protein databases contain data from many unknown individuals • Paralogues, gene families, pseudogenes 3. Public databases, e.g. UniProt • Include suspect data and are incomplete for many species • When there’s a match, or no match, is it biologically real? • Aligning proteins from other species must allow for mismatches • [Diagram: trade-off between specificity and sensitivity]
  • 31. Reporting data to users. Visualisation and data querying: • When browsing the primary assembly, how do we make it obvious to users when alternate sequences are available? • How do we show whether the alternate genomic sequences are identical to or differ from one another? • How do we show whether the alternate genome sequences result in identical or different transcribed / translated products? • How do we make a qualitative call about which allele is “better” to use? e.g. ABO • Data download options • Concept of a ‘canonical’ transcript per gene (per tissue). Data analysis: • Linking between alternate alleles (and paralogues?) • How do we show when data have been mapped from an old to a new assembly, compared to freshly aligned to a new assembly? When is it right to map instead of align? • In a non-linear genome model, how will SNPs (rsIDs) work? • In a non-linear genome model, what coordinate system should be used?
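Slide 6 describes the genebuild as “best-in-genome”: an input sequence is used where it aligns best, not at every place it aligns. The Python sketch below illustrates that filtering idea. It assumes each alignment carries a single score that can be compared across hits. The `Alignment` type, the `tolerance` parameter, and the example accessions and regions are all hypothetical. This is not Ensembl's pipeline code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    query_id: str  # input protein/cDNA identifier (hypothetical)
    region: str    # genomic location of this hit
    score: float   # alignment score; higher is better (assumed comparable across hits)


def best_in_genome(alignments, tolerance=0.0):
    """Keep, for each query, only hits scoring within `tolerance` of that query's best hit."""
    best = {}
    for aln in alignments:
        best[aln.query_id] = max(best.get(aln.query_id, float("-inf")), aln.score)
    return [a for a in alignments if a.score >= best[a.query_id] - tolerance]


hits = [
    Alignment("P1", "chr1:100-900", 980.0),
    Alignment("P1", "chr7:50-800", 610.0),  # weaker secondary hit, e.g. to a paralogue
    Alignment("P2", "chr3:10-500", 400.0),
]
for kept in best_in_genome(hits):
    print(kept.query_id, kept.region)
```

Dropping secondary hits trades sensitivity for specificity. This matches the slide's preference for missing a few features over heavily over-predicting. A non-zero `tolerance` would keep near-equal hits, for example within recent duplications.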
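Slide 8 lists “prioritize data sources” as part of filtering, and the slide 9 diagram names a LayerAnnotation step. One way to picture source prioritisation is a layered filter:

- Models from more trusted sources are accepted first.
- A model from a lower layer is discarded if it overlaps an accepted model from a higher layer.

The sketch below assumes that rule and a three-level priority table. The names `SOURCE_PRIORITY`, `Model` and `layer_filter` are hypothetical, and this is not the actual LayerAnnotation implementation.

```python
from dataclasses import dataclass

# Hypothetical priority order: lower number = more trusted evidence source.
SOURCE_PRIORITY = {"same_species_protein": 0, "other_species_protein": 1, "cdna": 2}


@dataclass(frozen=True)
class Model:
    name: str
    source: str
    start: int
    end: int
    strand: int


def overlaps(a: Model, b: Model) -> bool:
    # Same strand and intervals intersect (1-based inclusive coordinates assumed).
    return a.strand == b.strand and a.start <= b.end and b.start <= a.end


def layer_filter(models):
    """Accept models layer by layer; drop a model that overlaps one accepted from a higher layer."""
    accepted = []
    for m in sorted(models, key=lambda m: SOURCE_PRIORITY[m.source]):  # stable sort
        blocked = any(
            overlaps(m, k) and SOURCE_PRIORITY[k.source] < SOURCE_PRIORITY[m.source]
            for k in accepted
        )
        if not blocked:
            accepted.append(m)
    return accepted


models = [
    Model("a", "cdna", 100, 500, 1),
    Model("b", "same_species_protein", 300, 900, 1),
    Model("c", "other_species_protein", 2000, 2500, 1),
    Model("d", "same_species_protein", 350, 880, 1),
]
print([m.name for m in layer_filter(models)])
```

Overlapping models from the same layer are both kept, because they may be alternative transcripts of one gene. Only cross-layer overlaps are resolved, in favour of the more trusted source.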
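Slide 14 defines the ‘toplevel’ coordinate system as chromosomes plus unplaced, unlocalized and alternate sequences. It also mentions API options to include or exclude alternate sequences and the PAR. The Python sketch below models that choice over toy data:

- Alternate sequences are skipped unless requested.
- Regions duplicated elsewhere (such as a Y-chromosome PAR) are cut out of their chromosome unless duplicates are requested.

The `SeqRegion` type, both flag names, the region names and all lengths are assumptions for illustration. This is not the signature of the Ensembl API.

```python
from dataclasses import dataclass, field


@dataclass
class SeqRegion:
    name: str
    length: int
    is_reference: bool = True  # False for alternate haplotypes/patches
    # 1-based inclusive intervals whose sequence is duplicated from another region (e.g. a PAR)
    duplicated: list = field(default_factory=list)


def toplevel_slices(regions, include_alternates=False, include_duplicates=False):
    """Return (name, start, end) slices covering the toplevel genome."""
    slices = []
    for r in regions:
        if not r.is_reference and not include_alternates:
            continue
        if include_duplicates or not r.duplicated:
            slices.append((r.name, 1, r.length))
            continue
        pos = 1  # next position not yet covered
        for start, end in sorted(r.duplicated):
            if start > pos:
                slices.append((r.name, pos, start - 1))
            pos = end + 1
        if pos <= r.length:
            slices.append((r.name, pos, r.length))
    return slices


# Toy lengths, not real chromosome sizes.
regions = [
    SeqRegion("X", 1000),
    SeqRegion("Y", 800, duplicated=[(1, 100), (701, 800)]),
    SeqRegion("PATCH_1", 300, is_reference=False),
]
print(toplevel_slices(regions))
print(toplevel_slices(regions, include_alternates=True, include_duplicates=True))
```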
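Slide 20 shows that a slice name packs six colon-separated fields into one address: coordinate system, assembly, seq_region name, start, end and strand. A small Python parser makes that structure explicit. It assumes 1-based inclusive coordinates and a strand of 1 or -1. `SliceName` and `parse_slice_name` are illustrative names, not part of the Ensembl API.

```python
from typing import NamedTuple


class SliceName(NamedTuple):
    coord_system: str
    assembly: str
    seq_region: str
    start: int
    end: int
    strand: int

    @property
    def length(self) -> int:
        # Coordinates assumed 1-based and inclusive, so both ends count.
        return self.end - self.start + 1

    def __str__(self) -> str:
        return ":".join(str(part) for part in self)


def parse_slice_name(name: str) -> SliceName:
    # Split the numeric fields from the right so the seq_region name is kept intact.
    head, start, end, strand = name.rsplit(":", 3)
    coord_system, assembly, seq_region = head.split(":", 2)
    s = SliceName(coord_system, assembly, seq_region, int(start), int(end), int(strand))
    if s.strand not in (1, -1) or s.start > s.end:
        raise ValueError(f"invalid slice name: {name!r}")
    return s


s = parse_slice_name("chromosome:GRCh37:6:133090509:133119701:1")
print(s.seq_region, s.length)  # 6 29193
insdc = parse_slice_name("chromosome:GRCh37:CM000668.1:133090509:133119701:1")
print(insdc.seq_region)  # CM000668.1
```

The two example names from the slide parse to the same coordinates. Only the seq_region field differs: the human-friendly "6" versus the versioned INSDC accession.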
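Slide 21 describes the assembly_exception table as defining ‘bubbles’ whose equivalent region is assumed to exist in the primary assembly. The sketch below looks up that equivalent region using a simplified, hypothetical record, not the table's actual column layout. Only for a PAR, where the sequence is shared, does it also translate the exact position. The exception-type labels and all coordinates are toy values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AssemblyException:
    """Simplified, hypothetical view of one assembly_exception 'bubble'."""
    region: str      # region carrying the exception, e.g. "Y" or a haplotype/patch
    start: int
    end: int
    exc_type: str    # e.g. "PAR", "HAP" (labels assumed for illustration)
    ref_region: str  # equivalent region in the primary assembly
    ref_start: int
    ref_end: int


def equivalent_region(exceptions, region, pos) -> Optional[Tuple[str, int, int, Optional[int]]]:
    """Return (ref_region, ref_start, ref_end, ref_pos) for the bubble containing pos, else None.

    ref_pos is computed only for PAR exceptions, whose sequence is shared with the
    reference; haplotype or patch sequence may differ inside the bubble.
    """
    for exc in exceptions:
        if exc.region == region and exc.start <= pos <= exc.end:
            ref_pos = exc.ref_start + (pos - exc.start) if exc.exc_type == "PAR" else None
            return exc.ref_region, exc.ref_start, exc.ref_end, ref_pos
    return None


exceptions = [
    AssemblyException("Y", 1, 100, "PAR", "X", 1, 100),
    AssemblyException("HAP_1", 1, 500, "HAP", "6", 2001, 2450),
]
print(equivalent_region(exceptions, "Y", 42))
print(equivalent_region(exceptions, "HAP_1", 250))
print(equivalent_region(exceptions, "Y", 500))
```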
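Slides 24 to 27 outline annotation on patches: manual annotation, projection of models to the patch, then gap-filling with a mini genebuild. The sketch below illustrates the projection step, assuming the primary-to-patch alignment is available as ungapped blocks:

- An exon is projected only if it lies wholly within one block.
- A transcript with any unprojectable exon is left for gap-filling.

The `Block` representation and the function names are hypothetical. Real projection also has to handle gaps, strand and exon phase, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Ungapped alignment block between primary and patch (hypothetical representation)."""
    primary_start: int
    patch_start: int
    length: int


def project_exon(blocks, start, end):
    """Map a primary-assembly exon onto the patch if it lies wholly inside one block."""
    for b in blocks:
        b_end = b.primary_start + b.length - 1
        if b.primary_start <= start and end <= b_end:
            offset = b.patch_start - b.primary_start
            return (start + offset, end + offset)
    return None


def project_transcript(blocks, exons):
    """Project every exon; return None (a candidate for gap-filling) if any exon fails."""
    projected = []
    for start, end in exons:
        mapped = project_exon(blocks, start, end)
        if mapped is None:
            return None
        projected.append(mapped)
    return projected


# Toy alignment: primary 1000-1499 -> patch 1-500; primary 1600-1999 -> patch 501-900
# (so primary 1500-1599 has no counterpart on the patch).
blocks = [Block(1000, 1, 500), Block(1600, 501, 400)]
print(project_transcript(blocks, [(1100, 1200), (1700, 1800)]))
print(project_transcript(blocks, [(1450, 1550)]))  # spans the unaligned stretch
```

A model that runs off the aligned part of a patch would fail this simple projection. TEX2 on slide 22, which lies across the patch boundary, is likely such a case, and step 3 is meant to catch it.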
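Slide 28 frames alignment strictness as a trade-off between specificity and sensitivity. The sketch below applies two threshold presets to the same hits:

- A strict preset suited to same-species evidence.
- A relaxed preset that tolerates the mismatches and truncated matches expected from other species or incomplete assemblies.

The coverage and identity cut-offs are invented for illustration and are not Ensembl's settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    query_id: str
    coverage: float  # percent of the query sequence covered by the alignment
    identity: float  # percent identical positions within the aligned region


# Illustrative presets (min_coverage, min_identity); not real pipeline values.
STRICT = (95.0, 95.0)   # same-species evidence: expect near-identical, full-length matches
RELAXED = (50.0, 40.0)  # other-species evidence: allow mismatches and truncated matches


def passes(hit: Hit, min_coverage: float, min_identity: float) -> bool:
    return hit.coverage >= min_coverage and hit.identity >= min_identity


def kept_ids(hits, preset):
    return [h.query_id for h in hits if passes(h, *preset)]


hits = [Hit("A", 99.0, 99.5), Hit("B", 70.0, 85.0), Hit("C", 55.0, 45.0)]
print("strict:", kept_ids(hits, STRICT))
print("relaxed:", kept_ids(hits, RELAXED))
```

The strict preset keeps fewer, more trustworthy alignments (higher specificity). The relaxed preset recovers more genuine but divergent matches, at the cost of admitting more spurious ones (higher sensitivity).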