Genome Annotation
Karan Veer Singh,
Scientist.
NBAGR, Karnal,
India
The Genome
• The genome contains all the biological information required to build and maintain a living organism
• The genome contains the organism's molecular history
• Decoding the biological information encoded in the genome will have an enormous impact on our understanding of biology
Genomics
1. Structural genomics: genetic and physical mapping of genomes.
2. Functional genomics: analysis of gene function (and non-genes).
3. Comparative genomics: comparison of genomes across species.
  – Includes structural and functional genomics.
  – Evolutionary genomics.
Human Genome Project
The Human Genome Project promised to revolutionise medicine and explain every base of our DNA.
Large MEDICAL GENETICS focus:
• Identify variation in the genome that is disease causing
• Determine how individual genes play a role in health and disease
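Human Genome Project & Functional Genome
It cost about 3 billion dollars. A working draft was announced after 10 years, and the project was declared complete in 2003, 13 years after its 1990 launch (2 years less than the 15 initially planned).
• Approx. 200 Mb still in progress
  – Heterochromatin
  – Repetitive sequence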
Genomics & Genome annotation
• The first genome annotation software system was designed in 1995 by Dr. Owen White at The Institute for Genomic Research, the team that sequenced and analyzed the first genome of a free-living organism to be decoded, the bacterium Haemophilus influenzae
• Sequencing reads are assembled into contigs, which are then assembled either against a reference genome (reference assembly) or de novo to obtain the complete genome
• Variations such as mutations, SNPs and InDels can be identified
• The genome is then annotated by structural and functional annotation
• The whole genome is mapped as an image in an easily understandable manner
Sequence to Annotation
Input 1 to Genome Viewer: Variant Annotation
Input 2 to Genome Viewer: Structural Annotation
• Structural annotation: AUGUSTUS (version 2.5.5)
Input 3 to Genome Viewer: Functional Annotation
Genome Annotation
• The process of identifying the locations of genes and coding regions in a genome and determining what those genes do
• Finding the structural elements and attaching the related function to each genomic location
Genome Annotation
• Gene structure prediction: identifying elements (introns/exons, CDS, start, stop) in the genome
• Gene function prediction: attaching biological information to these elements, e.g. which protein an exon codes for
Structural annotation: identification of genomic elements
• Open reading frames and their localisation
• Gene structure
• Coding regions
• Location of regulatory motifs
Functional annotation: attaching biological information to genomic elements
• Biochemical function
• Biological function
• Regulation involved
Genome annotation - workflow
1. Genome sequence
2. Repeats (masked or un-masked genome sequence)
3. Structural annotation (gene finding): protein-coding genes, nc-RNAs (tRNA, rRNA), introns
4. Functional annotation
5. View in Genome viewer
Genome Repeats & features
• Percentage of repetitive sequences in different organisms

Genome              Genome size (Mb)   % Repeat
Aedes aegypti       1,300              ~70
Anopheles gambiae   260                ~30
Culex pipiens       540                ~50

• Microsatellite
• Minisatellite
• Tandem repeat
• Short tandem repeat
• SSR
Polymorphic between individuals/populations
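Simple sequence repeats of the kind listed above can be located with a pattern search. The following is a minimal sketch, assuming perfect (uninterrupted) repeats only; the function name `find_ssrs` and its default thresholds are illustrative choices and are not taken from any particular repeat-finding tool.

```python
import re


def find_ssrs(seq, min_unit=1, max_unit=6, min_repeats=3):
    """Return (start, end, unit, copies) for each perfect tandem repeat.

    Coordinates are 0-based and half-open. The repeat unit is matched
    lazily, so the shortest unit that explains the repeat is reported.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        copies = len(m.group(0)) // len(unit)
        hits.append((m.start(), m.end(), unit, copies))
    return hits


# A (CA)5 microsatellite embedded in flanking sequence.
print(find_ssrs("GGCACACACACATT"))  # [(2, 12, 'CA', 5)]
```

Real repeat finders also tolerate mismatches and indels inside the repeat, which this sketch does not attempt.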
Finding repeats as a preliminary to gene prediction
• Repeat discovery
  – Homology-based approaches
• Use RepeatMasker to search the genome and mask the sequence
Masked sequence
• A repeat-masked sequence is an artificial construction in which regions thought to be repetitive are marked with X's
• Widely used to reduce the overhead of subsequent computational analyses and to reduce the impact of transposable elements (TEs) on the final annotation set
>my sequence
atgagcttcgatagcgatcagctagcgatcaggct
actattggcttctctagactcgtctatctctatta
gctatcatctcgatagcgatcagctagcgatcagg
ctactattggcttcgatagcgatcagctagcgatc
aggctactattggcttcgatagcgatcagctagcg
atcaggctactattggctgatcttaggtcttctga
tcttct
>my sequence (repeatmasked)
atgagcttcgatagcgatcagctagcgatcaggct
actattxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxatctcgatagcgatcagctagcgatcagg
ctactattxxxxxxxxxxxxxxxxxxxtagcgatc
aggctactattggcttcgatagcgatcagctagcg
atcaggctxxxxxxxxxxxxxxxxxxxtcttctga
tcttct
Positions/locations are not affected by masking
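The point that masking leaves coordinates unchanged can be illustrated with a small sketch. It assumes repeat intervals are already known as 0-based, half-open (start, end) pairs; the function name `hard_mask` and the example interval are hypothetical and do not reproduce RepeatMasker output.

```python
def hard_mask(seq, intervals, mask_char="x"):
    """Replace every base inside the given intervals with mask_char.

    Each base is substituted one-for-one, so the sequence length and
    the coordinates of all unmasked bases stay the same.
    """
    chars = list(seq)
    for start, end in intervals:
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


seq = "atgagcttcgatagcgatcagctagcgatcaggct"
masked = hard_mask(seq, [(6, 12)])  # hypothetical repeat at positions 6..11
print(masked)
print(len(seq) == len(masked))  # True: positions are preserved
```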
Types of Masking - Hard or Soft?
• Sometimes we want to mark up repetitive sequence but not exclude it from downstream analyses. This is achieved with a format known as soft-masking, in which repeats are written in lower case; hard-masking replaces them with X's
>my sequence
ATGAGCTTCGATAGCGCATCAGCTAGCGATCAGGC
TACTATTGGCTTCTCTAGACTCGTCTATCTCTATT
AGTATCATCTCGATAGCGATCAGCTAGCGATCAGG
CTACTATTGGCTTCGATAGCGATCAGCTAGCGATC
AGGCTACTATTGGCTTCGATAGCGATCAGCTAGCG
ATCAGGCTACTATTGGCTGATCTTAGGTCTTCTGA
TCTTCT
>my sequence (softmasked)
ATGAGCTTCGATAGCGCATCAGCTAGCGATCAGGC
TACTATTggcttctctagactcgtctatctctatt
agtatcATCTCGATAGCGATCAGCTAGCGATCAGG
CTACTATTggcttcgatagcgatcagcTAGCGATC
AGGCTACTATTggcttcgatagcgatcagcTAGCG
ATCAGGCTACTATTGGCTGATCTTAGGTCTTCTGA
TCTTCT
>my sequence (hardmasked)
atgagcttcgatagcgatcagctagcgatcaggct
actattxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxatctcgatagcgatcagctagcgatcagg
ctactattxxxxxxxxxxxxxxxxxxxtagcgatc
aggctactattggcttcgatagcgatcagctagcg
atcaggctxxxxxxxxxxxxxxxxxxxtcttctga
tcttct
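Because a soft-masked sequence keeps the original bases, it can always be converted to a hard-masked one, but not the other way round. The sketch below assumes the convention used in the example above (lower case marks a repeat in an otherwise upper-case sequence); the function names are illustrative.

```python
def soft_to_hard(seq, mask_char="X"):
    """Convert a soft-masked sequence (repeats in lower case) to hard-masked."""
    return "".join(mask_char if base.islower() else base for base in seq)


def masked_fraction(seq):
    """Fraction of a soft-masked sequence that is marked as repeat."""
    if not seq:
        return 0.0
    return sum(1 for base in seq if base.islower()) / len(seq)


soft = "CTACTATTggcttcgatagcgatcagcTAGCGATC"
print(soft_to_hard(soft))
print(round(masked_fraction(soft), 2))
```

The masked fraction is one way to arrive at a per-genome repeat percentage like the figures tabulated earlier, given a soft-masked assembly.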
Structural annotation
Identification of genomic elements
• Open reading frames and their localization
• Coding regions
• Location of regulatory motifs
• Start/Stop
• Splice sites
• Non-coding regions/RNAs
• Introns
Methods
• Similarity
  – Similarity between sequences, which does not necessarily imply any evolutionary linkage
• Ab initio prediction
  – Prediction of gene structure from first principles using only the genome sequence
Genefinding
Two approaches: ab initio and similarity
ab initio prediction
[Figure: signals read along the genome: coding potential, ATG & stop codons, splice sites]
Examples:
Genefinder, Augustus,
Glimmer, SNAP, fgenesh
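The start and stop codon signal can be shown with a deliberately simple open reading frame scan. This is only a sketch of one signal, under stated assumptions: forward strand only, no introns, no coding-potential model. The gene finders named above use probabilistic models that combine several signals, which this code does not reproduce. The function name `find_orfs` is illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for forward-strand ORFs.

    An ORF runs from the first ATG in a frame to the next in-frame stop
    codon. Coordinates are 0-based, half-open, and include the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 14, 2)]
```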
Genefinding - similarity
• Use known coding sequence to define coding regions
  – EST sequences
  – Peptide sequences
• Problem: fuzzy alignment regions around splice sites are hard to handle
Examples: EST2Genome, exonerate, genewise, Augustus, Prodigal
Gene-finding - comparative
• Use two or more genomic sequences to predict genes based on conservation of exon sequences
• Examples: Twinscan and SLAM
Genefinding - non-coding RNA genes
• Non-coding RNA genes can be predicted using knowledge of their structure or by similarity with known examples
• tRNAscan: uses an HMM and covariance model for prediction of tRNA genes
• Rfam: a collection of RNA families, each represented by an alignment and a covariance model built from known RNA genes
Gene-finding omissions
Alternative isoforms
Currently there is no good method for predicting alternative isoforms
Only created where supporting transcript evidence is present
Pseudogenes
Each genome project has a fuzzy definition of pseudogenes
Badly curated/described across the board
Promoters
Rarely a priority for a genome project
Some algorithms exist but usually not integrated into an annotation set
Practical - structural annotation
Eukaryotes - AUGUSTUS (gene model)
~/Programs/augustus.2.5.5/bin/augustus --strand=both --genemodel=partial \
  --singlestrand=true --alternatives-from-evidence=true \
  --alternatives-from-sampling=true --progress=true --gff3=on \
  --uniqueGeneId=true --species=magnaporthe_grisea \
  our_genome.fasta > structural_annotation.gff
Prokaryotes - PRODIGAL (codon usage table)
~/Programs/prodigal.v2_60.linux -a protein_file.fa -g 11 -d nucleotide_exon_seq.fa \
  -f gff -i contigs.fa -o genes_quality.txt -s genes_score.txt -t genome_training_file.txt
Structural Annotation - output
• Structural annotation conducted using AUGUSTUS (version 2.5.5), with Magnaporthe grisea as the species model
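A quick sanity check on a structural annotation file is to count the feature types it contains. The sketch below assumes standard nine-column, tab-separated GFF3 records; the function name `count_gff_features` and the example records are hypothetical and are not real AUGUSTUS output.

```python
from collections import Counter


def count_gff_features(lines):
    """Count GFF3 records per feature type (column 3)."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GFF3 record
        counts[fields[2]] += 1
    return counts


# Hypothetical records for illustration only.
example = [
    "##gff-version 3",
    "contig1\tAUGUSTUS\tgene\t1\t900\t0.9\t+\t.\tID=g1",
    "contig1\tAUGUSTUS\tCDS\t1\t300\t0.9\t+\t0\tID=g1.t1.cds1;Parent=g1.t1",
    "contig1\tAUGUSTUS\tCDS\t601\t900\t0.9\t+\t0\tID=g1.t1.cds2;Parent=g1.t1",
]
print(count_gff_features(example))
```

For a file on disk, an open file handle can be passed in place of the list, for example `count_gff_features(open("structural_annotation.gff"))`.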
Functional annotation
[Figure: from genome to function. Genome (ATG ... STOP) > transcription > primary transcript > RNA processing > processed mRNA (m7G cap, AAAn tail) > translation > polypeptide > protein folding > folded protein > enzyme activity > functional activity. Goal: find function]
Functional annotation
Attaching biological information to genomic elements
• Biochemical function
• Biological function
• Regulation and interactions involved
• Expression
• Use the known structural annotation to predict the protein sequence
Functional annotation - Homology Based
• Predicted exons/CDS/ORFs are searched against non-redundant protein databases (NCBI, SwissProt) to find similar sequences
• Visually assess the top 5-10 hits to identify whether these have been assigned a function
• Functions are assigned to the query on the basis of these hits
Functional annotation - Other features
• Other features which can be determined
  – Signal peptides
  – Transmembrane domains
  – Low complexity regions
  – Various binding sites, glycosylation sites etc.
  – Protein domains
  – Secretome
See http://expasy.org/tools/ for a good list of possible prediction algorithms
Functional annotation - Other features (Ontologies)
• Use of ontologies to annotate gene products
• Gene Ontology (GO)
  – Cellular component
  – Molecular function
  – Biological process
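One way to picture a GO annotation is as a record that groups term identifiers under the three aspects listed above. This is a minimal sketch; the class name, the gene identifier and the assignment of terms to that gene are hypothetical, while the three GO identifiers are real terms (ATP binding, nucleus, translation).

```python
from dataclasses import dataclass, field

ASPECTS = ("cellular_component", "molecular_function", "biological_process")


@dataclass
class GeneProductAnnotation:
    """GO terms attached to one gene product, grouped by aspect."""
    gene_id: str
    terms: dict = field(default_factory=lambda: {a: set() for a in ASPECTS})

    def add_term(self, aspect, go_id):
        if aspect not in ASPECTS:
            raise ValueError(f"unknown GO aspect: {aspect}")
        self.terms[aspect].add(go_id)


ann = GeneProductAnnotation("gene_0001")            # hypothetical gene
ann.add_term("molecular_function", "GO:0005524")    # ATP binding
ann.add_term("cellular_component", "GO:0005634")    # nucleus
ann.add_term("biological_process", "GO:0006412")    # translation
print(ann.terms)
```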
Practical - FUNCTIONAL ANNOTATION
• Homology-based method
  – Set up a BLAST database (nucleotide/protein)
  – BLAST the sequences in genome.fasta against it (nucleotide/protein)
  – Filter the BLAST hits by E-value, keeping those at or below the cutoff (E-value <= 0.01)
  – Assign functions
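The filtering and assignment steps can be sketched as follows, assuming the BLAST results were written in the default 12-column tabular format (`-outfmt 6`), where column 11 is the E-value and column 12 the bit score. The function name `best_hits`, the cutoff default and the example records are illustrative, not real search results.

```python
def best_hits(lines, max_evalue=0.01):
    """Return {query: (subject, evalue, bitscore)} for the best hit per query.

    Hits with an E-value above max_evalue are discarded; among the rest,
    the hit with the highest bit score is kept for each query.
    """
    best = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical hits for illustration only.
example = [
    "g1.t1\tprotA\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t540",
    "g1.t1\tprotB\t45.0\t280\t150\t4\t5\t284\t10\t289\t2e-30\t120",
    "g2.t1\tprotC\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t25",
]
print(best_hits(example))
```

The function description attached to the retained subject sequence is then what gets transferred to the query gene, ideally after the visual check of the top hits described earlier.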
Functional annotation- output
August 2008 Bioinformatics tools for Comparative Genomics
of Vectors
Conclusion
• Annotation accuracy depends on the supporting data available at the time of annotation; annotations need to be updated
• Gene predictions will change over time as new data (e.g. in NCBI) become available that are more similar to the genome being annotated than the data available earlier
• Functional assignments will change over time as new data become available (e.g. characterization of hypothetical proteins)
Genome Viewer
The files that can be visualised:
• Annotation files
• Indel files
• Consensus sequence
• Comparative genomics
Genome View
Short Read track
Thank You

More Related Content

Similar to genomeannotation2013-140127002622-phpapp02.ppt

Thesis def
Thesis defThesis def
Thesis defJay Vyas
 
Whole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysisWhole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysisdrelamuruganvet
 
SAGE- Serial Analysis of Gene Expression
SAGE- Serial Analysis of Gene ExpressionSAGE- Serial Analysis of Gene Expression
SAGE- Serial Analysis of Gene ExpressionAashish Patel
 
Project report-on-bio-informatics
Project report-on-bio-informaticsProject report-on-bio-informatics
Project report-on-bio-informaticsDaniela Rotariu
 
Pathema Burkholderia Annotation Jamboree: Prokaryotic Annotation Overview
Pathema Burkholderia Annotation Jamboree: Prokaryotic Annotation OverviewPathema Burkholderia Annotation Jamboree: Prokaryotic Annotation Overview
Pathema Burkholderia Annotation Jamboree: Prokaryotic Annotation OverviewPathema
 
Functional annotation
Functional annotationFunctional annotation
Functional annotationRavi Gandham
 
Genome sequencing
Genome sequencingGenome sequencing
Genome sequencingShital Pal
 
Genes, Genomics and Proteomics
Genes, Genomics and Proteomics Genes, Genomics and Proteomics
Genes, Genomics and Proteomics Garry D. Lasaga
 
Apollo Collaborative genome annotation editing
Apollo Collaborative genome annotation editing Apollo Collaborative genome annotation editing
Apollo Collaborative genome annotation editing Monica Munoz-Torres
 
ELS - M9 L3 L4 print.pdf
ELS - M9 L3 L4 print.pdfELS - M9 L3 L4 print.pdf
ELS - M9 L3 L4 print.pdfBobbyPabores1
 
Bioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahuBioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahuKAUSHAL SAHU
 
encode project
encode project encode project
encode project Priti Pal
 

Similar to genomeannotation2013-140127002622-phpapp02.ppt (20)

Thesis def
Thesis defThesis def
Thesis def
 
NCBI
NCBINCBI
NCBI
 
Whole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysisWhole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysis
 
bioinformatic.pptx
bioinformatic.pptxbioinformatic.pptx
bioinformatic.pptx
 
Gene identification using bioinformatic tools.pptx
Gene identification using bioinformatic tools.pptxGene identification using bioinformatic tools.pptx
Gene identification using bioinformatic tools.pptx
 
Biological databases
Biological databasesBiological databases
Biological databases
 
SAGE- Serial Analysis of Gene Expression
SAGE- Serial Analysis of Gene ExpressionSAGE- Serial Analysis of Gene Expression
SAGE- Serial Analysis of Gene Expression
 
Project report-on-bio-informatics
Project report-on-bio-informaticsProject report-on-bio-informatics
Project report-on-bio-informatics
 
Pathema Burkholderia Annotation Jamboree: Prokaryotic Annotation Overview
Pathema Burkholderia Annotation Jamboree: Prokaryotic Annotation OverviewPathema Burkholderia Annotation Jamboree: Prokaryotic Annotation Overview
Pathema Burkholderia Annotation Jamboree: Prokaryotic Annotation Overview
 
proteome.pptx
proteome.pptxproteome.pptx
proteome.pptx
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Functional annotation
Functional annotationFunctional annotation
Functional annotation
 
Genome sequencing
Genome sequencingGenome sequencing
Genome sequencing
 
Genes, Genomics and Proteomics
Genes, Genomics and Proteomics Genes, Genomics and Proteomics
Genes, Genomics and Proteomics
 
Apollo Collaborative genome annotation editing
Apollo Collaborative genome annotation editing Apollo Collaborative genome annotation editing
Apollo Collaborative genome annotation editing
 
Paper - Muhammad Gulraj
Paper - Muhammad GulrajPaper - Muhammad Gulraj
Paper - Muhammad Gulraj
 
Gene prediction strategies
Gene prediction strategies Gene prediction strategies
Gene prediction strategies
 
ELS - M9 L3 L4 print.pdf
ELS - M9 L3 L4 print.pdfELS - M9 L3 L4 print.pdf
ELS - M9 L3 L4 print.pdf
 
Bioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahuBioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahu
 
encode project
encode project encode project
encode project
 

More from MohamedHasan816582

Bioinformatic_Databases_2.ppt Bioinformatics
Bioinformatic_Databases_2.ppt BioinformaticsBioinformatic_Databases_2.ppt Bioinformatics
Bioinformatic_Databases_2.ppt BioinformaticsMohamedHasan816582
 
Next Generation Sequence Analysis and genomics
Next Generation Sequence Analysis and genomicsNext Generation Sequence Analysis and genomics
Next Generation Sequence Analysis and genomicsMohamedHasan816582
 
Lecture bioinformatics Part2.next generation
Lecture bioinformatics Part2.next generationLecture bioinformatics Part2.next generation
Lecture bioinformatics Part2.next generationMohamedHasan816582
 
Databases, bioinformatics, sequence analysis
Databases, bioinformatics, sequence analysisDatabases, bioinformatics, sequence analysis
Databases, bioinformatics, sequence analysisMohamedHasan816582
 
Nucleic_Acid_Databases, Bioinformatics, genome
Nucleic_Acid_Databases, Bioinformatics, genomeNucleic_Acid_Databases, Bioinformatics, genome
Nucleic_Acid_Databases, Bioinformatics, genomeMohamedHasan816582
 
Genes, Genomics, and Chromosomes computational biology introduction .ppt
Genes, Genomics, and Chromosomes computational biology introduction .pptGenes, Genomics, and Chromosomes computational biology introduction .ppt
Genes, Genomics, and Chromosomes computational biology introduction .pptMohamedHasan816582
 

More from MohamedHasan816582 (9)

Bioinformatic_Databases_2.ppt Bioinformatics
Bioinformatic_Databases_2.ppt BioinformaticsBioinformatic_Databases_2.ppt Bioinformatics
Bioinformatic_Databases_2.ppt Bioinformatics
 
Next Generation Sequence Analysis and genomics
Next Generation Sequence Analysis and genomicsNext Generation Sequence Analysis and genomics
Next Generation Sequence Analysis and genomics
 
Lecture bioinformatics Part2.next generation
Lecture bioinformatics Part2.next generationLecture bioinformatics Part2.next generation
Lecture bioinformatics Part2.next generation
 
Databases, bioinformatics, sequence analysis
Databases, bioinformatics, sequence analysisDatabases, bioinformatics, sequence analysis
Databases, bioinformatics, sequence analysis
 
Nucleic_Acid_Databases, Bioinformatics, genome
Nucleic_Acid_Databases, Bioinformatics, genomeNucleic_Acid_Databases, Bioinformatics, genome
Nucleic_Acid_Databases, Bioinformatics, genome
 
Genes, Genomics, and Chromosomes computational biology introduction .ppt
Genes, Genomics, and Chromosomes computational biology introduction .pptGenes, Genomics, and Chromosomes computational biology introduction .ppt
Genes, Genomics, and Chromosomes computational biology introduction .ppt
 
protein.pptx
protein.pptxprotein.pptx
protein.pptx
 
lecture 1.pptx
lecture 1.pptxlecture 1.pptx
lecture 1.pptx
 
protein Lec.1.ppt
protein Lec.1.pptprotein Lec.1.ppt
protein Lec.1.ppt
 

Recently uploaded

Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structuredhanjurrannsibayan2
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - Englishneillewis46
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...Amil baba
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxDr. Ravikiran H M Gowda
 
21st_Century_Skills_Framework_Final_Presentation_2.pptx
21st_Century_Skills_Framework_Final_Presentation_2.pptx21st_Century_Skills_Framework_Final_Presentation_2.pptx
21st_Century_Skills_Framework_Final_Presentation_2.pptxJoelynRubio1
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxJisc
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17Celine George
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 
latest AZ-104 Exam Questions and Answers
latest AZ-104 Exam Questions and Answerslatest AZ-104 Exam Questions and Answers
latest AZ-104 Exam Questions and Answersdalebeck957
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Pooja Bhuva
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jisc
 
Simple, Complex, and Compound Sentences Exercises.pdf
Simple, Complex, and Compound Sentences Exercises.pdfSimple, Complex, and Compound Sentences Exercises.pdf
Simple, Complex, and Compound Sentences Exercises.pdfstareducators107
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxJisc
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 

Recently uploaded (20)

Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
21st_Century_Skills_Framework_Final_Presentation_2.pptx
21st_Century_Skills_Framework_Final_Presentation_2.pptx21st_Century_Skills_Framework_Final_Presentation_2.pptx
21st_Century_Skills_Framework_Final_Presentation_2.pptx
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
latest AZ-104 Exam Questions and Answers
latest AZ-104 Exam Questions and Answerslatest AZ-104 Exam Questions and Answers
latest AZ-104 Exam Questions and Answers
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
Simple, Complex, and Compound Sentences Exercises.pdf
Simple, Complex, and Compound Sentences Exercises.pdfSimple, Complex, and Compound Sentences Exercises.pdf
Simple, Complex, and Compound Sentences Exercises.pdf
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 

genomeannotation2013-140127002622-phpapp02.ppt

  • 1. Genome Annotation Karan Veer Singh, Scientist. NBAGR, Karnal, India 1
  • 2. • The genome contains all the biological information required to build and maintain any given living organism • The genome contains the organisms molecular history • Decoding the biological information encoded in these molecules will have enormous impact in our understanding of biology The Genome
  • 3. 1. Structural genomics-genetic and physical mapping of genomes. 2. Functional genomics-analysis of gene function (and non-genes). 3. Comparative genomics-comparison of genomes across species.  Includes structural and functional genomics.  Evolutionary genomics. Genomics
  • 4. The Human genome project promised to revolutionise medicine and explain every base of our DNA. Large MEDICAL GENETICS focus Identify variation in the genome that is disease causing Determine how individual genes play a role in health and disease Human Genome Project
  • 5. Human Genome Project & Functional Genome It cost 3 billion dollars and took 10 years to complete (5 less than initially predicted). • Approx 200 Mb still in progress – Heterochromatin – Repetitive
  • 6. Genomics & Genome annotation  First genome annotation software system was designed in 1995 by Dr. Owen White with The Institute for Genomic Research that sequenced and analyzed the first genome of a free-living organism to be decoded, the bacterium Haemophilus influenzae  It involve assembling of the reads to form contigs then assembling with a reference genome (reference assembly) or de novo assembly to obtain the complete genome  Variations such as mutations, SNP, InDels etc can be identified  The genome is then annotated by structural and functional annotation  Mapping Image of Whole genome in an easily understandable manner.
  • 8. Input1 to Genome Viewer- Variant Annotation
  • 9. Input2 to Genome Viewer- Structural Annotation  Structural Annotation- AUGUSTUS (version 2.5.5)
  • 10. Input3 to Genome Viewer-Functional Annotation
  • 11. Genome Annotation  The process of identifying the locations of genes and the coding regions in a genome to determe what those genes do  Finding and attaching the structural elements and its related function to each genome locations 11
  • 12. Genome Annotation 12 gene structure prediction Identifying elements (Introns/exons,CDS,stop,start) in the genome gene function prediction Attaching biological information to these elements- eg: for which protein exon will code for
  • 13. Structural annotation Structural annotation - identification of genomic elements  Open reading frame and their localisation  gene structure  coding regions  location of regulatory motifs
  • 14. Functional annotation Functional annotation- attaching biological information to genomic elements  biochemical function  biological function  involved regulations
  • 15. Genome annotation - workflow 16 Genome sequence Repeats Structural annotation-Gene finding Protein-coding genes nc-RNAs (tRNA, rRNA), Introns Functional annotation View in Genome viewer Masked or un-masked genome sequence
  • 16. Genome Repeats & features 17  Percentage of repetitive sequences in different organisms Genome Genome Size (Mb) % Repeat Aedes aegypti 1,300 ~70 Anopheles gambiae 260 ~30 Culex pipiens 540 ~50  Microsatellite  Minisatellite  Tandem repeat  Short tandem repeat  SSR Polymorphic between individuals/populations
  • 17. Finding repeats as a preliminary to gene prediction 18  Repeat discovery Homology based approaches Use RepeatMasker to search the genome and mask the sequence
  • 18. Masked sequence  Repeatmasked sequence is an artificial construction where those regions which are thought to be repetitive are marked with X’s  Widely used to reduce the overhead of subsequent computational analyses and to reduce the impact of TE’s in the final annotation set 19 >my sequence atgagcttcgatagcgatcagctagcgatcaggct actattggcttctctagactcgtctatctctatta gctatcatctcgatagcgatcagctagcgatcagg ctactattggcttcgatagcgatcagctagcgatc aggctactattggcttcgatagcgatcagctagcg atcaggctactattggctgatcttaggtcttctga tcttct >my sequence (repeatmasked) atgagcttcgatagcgatcagctagcgatcaggct actattxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxatctcgatagcgatcagctagcgatcagg ctactattxxxxxxxxxxxxxxxxxxxtagcgatc aggctactattggcttcgatagcgatcagctagcg atcaggctxxxxxxxxxxxxxxxxxxxtcttctga tcttct Positions/locations are not affected by masking
  • 19. Types of Masking- Hard or Soft?  Sometimes we want to mark up repetitive sequence but not to exclude it from downstream analyses. This is achieved using a format known as soft-masked 20 >my sequence ATGAGCTTCGATAGCGCATCAGCTAGCGATCAGGC TACTATTGGCTTCTCTAGACTCGTCTATCTCTATT AGTATCATCTCGATAGCGATCAGCTAGCGATCAGG CTACTATTGGCTTCGATAGCGATCAGCTAGCGATC AGGCTACTATTGGCTTCGATAGCGATCAGCTAGCG ATCAGGCTACTATTGGCTGATCTTAGGTCTTCTGA TCTTCT >my sequence (softmasked) ATGAGCTTCGATAGCGCATCAGCTAGCGATCAGGC TACTATTggcttctctagactcgtctatctctatt agtatcATCTCGATAGCGATCAGCTAGCGATCAGG CTACTATTggcttcgatagcgatcagcTAGCGATC AGGCTACTATTggcttcgatagcgatcagcTAGCG ATCAGGCTACTATTGGCTGATCTTAGGTCTTCTGA TCTTCT >my sequence (hardmasked) atgagcttcgatagcgatcagctagcgatcaggct actattxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxatctcgatagcgatcagctagcgatcagg ctactattxxxxxxxxxxxxxxxxxxxtagcgatc aggctactattggcttcgatagcgatcagctagcg atcaggctxxxxxxxxxxxxxxxxxxxtcttctga tcttct
  • 20. Genome annotation - workflow 21 Genome sequence Map repeats Gene finding- structural annotation Protein-coding genes nc-RNAs, Introns Functional annotation View in Genome viewer Masked or un-masked
  • 21. Structural annotation
      Identification of genomic elements
      – Open reading frames and their localization
      – Coding regions
      – Location of regulatory motifs
      – Start/stop
      – Splice sites
      – Non-coding regions/RNAs
      – Introns
  • 22. Methods
      – Similarity: similarity between sequences, which does not necessarily imply any evolutionary linkage
      – Ab initio prediction: prediction of gene structure from first principles using only the genome sequence
  • 24. Ab initio prediction
      – Signals evaluated along the genome: coding potential, ATG and stop codons, splice sites
      – Examples: Genefinder, Augustus, Glimmer, SNAP, fgenesh
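The simplest of these signals, an in-frame ATG followed by a stop codon, can be sketched in a few lines of Python. This is a toy illustration only: it scans the forward strand, ignores splice sites and coding potential, and is not how Augustus or the other listed tools work. The function name `find_orfs` and the test sequence are made up for the example.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> List[Tuple[int, int]]:
    """Find forward-strand ORFs as 0-based, half-open (start, end) intervals.

    An ORF here runs from the first ATG in a reading frame to the next
    in-frame stop codon (inclusive). min_codons counts start and stop codons.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)


toy = "CCATGAAATAGCC"  # made-up sequence: ATG AAA TAG in frame 2
print(find_orfs(toy))  # [(2, 11)]
```

Real gene finders combine such signals with statistical models of coding sequence, which is why an ORF scan alone over-predicts heavily in eukaryotic genomes.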
  • 25. Gene finding - similarity
      – Use known coding sequence to define coding regions
          – EST sequences
          – Peptide sequences
      – Difficulty: handling fuzzy alignment regions around splice sites
      – Examples: EST2Genome, exonerate, genewise, Augustus, Prodigal
    Gene finding - comparative
      – Use two or more genomic sequences to predict genes based on conservation of exon sequences
      – Examples: Twinscan and SLAM
  • 27. Gene finding - non-coding RNA genes
      – Non-coding RNA genes can be predicted using knowledge of their structure or by similarity with known examples
      – tRNAscan: uses an HMM and a covariance model to predict tRNA genes
      – Rfam: a suite of HMMs trained against a large number of different RNA genes
  • 28. Gene-finding omissions
      – Alternative isoforms
          – Currently there is no good method for predicting alternative isoforms
          – Only created where supporting transcript evidence is present
      – Pseudogenes
          – Each genome project has a fuzzy definition of pseudogenes
          – Badly curated/described across the board
      – Promoters
          – Rarely a priority for a genome project
          – Some algorithms exist but are usually not integrated into an annotation set
  • 29. Practical - structural annotation
      – Eukaryotes: AUGUSTUS (gene model)
        ~/Programs/augustus.2.5.5/bin/augustus --strand=both --genemodel=partial --singlestrand=true --alternatives-from-evidence=true --alternatives-from-sampling=tru progress=true --gff3=on --uniqueGeneId=true --species=magnaporthe_grisea our_genome.fasta >structural_annotation.gff
      – Prokaryotes: PRODIGAL (codon usage table)
        ~/Programs/prodigal.v2_60.linux -a protein_file.fa -g 11 -d nucleotide_exon_seq.fa -f gff -i contigs.fa -o genes_quality.txt -s genes_score.txt -t genome_training_file.txt
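Both commands above write their predictions in GFF format. As a quick sanity check on such output, the feature types can be tallied with a short Python sketch. It assumes standard nine-column, tab-separated GFF lines; the function name `count_feature_types` and the example records are made up for illustration and are not real AUGUSTUS output.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(gff_lines: Iterable[str]) -> Counter:
    """Count GFF records per feature type (column 3).

    Comment/directive lines (starting with '#'), blank lines and lines
    that do not have exactly nine tab-separated columns are skipped.
    """
    counts = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue
        counts[fields[2]] += 1
    return counts


# Hypothetical records, written by hand for this example.
example_gff = [
    "##gff-version 3",
    "contig1\tAUGUSTUS\tgene\t100\t900\t0.95\t+\t.\tID=g1",
    "contig1\tAUGUSTUS\tCDS\t100\t400\t0.95\t+\t0\tParent=g1.t1",
    "contig1\tAUGUSTUS\tCDS\t500\t900\t0.95\t+\t0\tParent=g1.t1",
    "",
]
print(count_feature_types(example_gff))  # Counter({'CDS': 2, 'gene': 1})

# On a real file, pass the open file handle instead of a list, e.g.
#   with open("structural_annotation.gff") as fh:
#       print(count_feature_types(fh))
```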
  • 30. Structural annotation - output
      – Structural annotation was conducted using AUGUSTUS (version 2.5.5) with magnaporthe_grisea as the species model
  • 33. Functional annotation
      Genome → (transcription) → primary transcript → (RNA processing) → processed mRNA (m7G cap, ATG ... STOP, AAAn tail) → (translation) → polypeptide → (protein folding) → folded protein → (enzyme activity) → functional activity
      Goal: find function
  • 34. Functional annotation
      Attaching biological information to genomic elements
      – Biochemical function
      – Biological function
      – Regulation and interactions involved
      – Expression
      Use the known structural annotation to predict the protein sequence
  • 35. Functional annotation - homology based
      – Predicted exons/CDS/ORFs are searched against a non-redundant protein database (NCBI, SwissProt) for similar sequences
      – Visually assess the top 5-10 hits to check whether they have been assigned a function
      – Functions are then assigned
  • 36. Functional annotation - other features
      Other features which can be determined
      – Signal peptides
      – Transmembrane domains
      – Low-complexity regions
      – Various binding sites, glycosylation sites etc.
      – Protein domains
      – Secretome
      See http://expasy.org/tools/ for a good list of possible prediction algorithms
  • 37. Functional annotation - other features (ontologies)
      – Use of ontologies to annotate gene products
      – Gene Ontology (GO)
          – Cellular component
          – Molecular function
          – Biological process
  • 38. Practical - functional annotation
      Homology-based method
      – Set up a BLAST database (nucleotide or protein)
      – BLAST genome.fasta against the database for annotations (nucleotide or protein)
      – Sort the hits by E-value and keep those at or below the cutoff (E-value <= 0.01)
      – Assign functions
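The filtering step can be sketched in Python as follows. The sketch assumes the BLAST results were saved in the default 12-column tabular format (BLAST+ `-outfmt 6`, where column 11 is the E-value), and it treats the 0.01 cutoff as an upper bound, which is an assumption about the intended threshold. The function name `best_hits` and the hit records are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def best_hits(blast_lines: Iterable[str],
              max_evalue: float = 0.01) -> Dict[str, Tuple[str, float]]:
    """Return the lowest-E-value hit per query from tabular BLAST output.

    Expects the default 12 tab-separated columns of BLAST+ -outfmt 6:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore. Hits with an E-value above max_evalue are discarded.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        query, subject = fields[0], fields[1]
        evalue = float(fields[10])
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical hits, written by hand for this example.
rows = [
    "g1\tsp|P1\t98.0\t200\t4\t0\t1\t200\t1\t200\t1e-100\t380",
    "g1\tsp|P2\t60.0\t180\t70\t2\t5\t184\t3\t182\t1e-20\t90",
    "g2\tsp|P3\t30.0\t50\t35\t0\t1\t50\t1\t50\t2.5\t20",
]
for query, (subject, evalue) in best_hits(rows).items():
    print(query, subject, evalue)
```

Here `g2` receives no annotation because its only hit is above the cutoff; the description of the retained subject sequence would then be carried over as the putative function, ideally after the manual check of the top hits described earlier.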
  • 39. Functional annotation - output (August 2008, Bioinformatics tools for Comparative Genomics of Vectors)
  • 40. Conclusion
      – Annotation accuracy depends on the supporting data available at the time of annotation, so annotations need to be updated
      – Gene predictions will change over time as new data become available (e.g. at NCBI) that are more similar than the previous ones
      – Functional assignments will change over time as new data become available (e.g. characterization of hypothetical proteins)
  • 42. Genome viewer
      Files that can be visualised
      – Annotation files
      – Indel files
      – Consensus sequence
      – Comparative genomics