SlideShare a Scribd company logo
Gene Prediction Methods
NUSRAT. M. GULBARGA
Prokaryotic gene structure
ORF (open reading frame)
Start codon Stop codon
TATA box
ATGACAGATTACAGATTACAGATTACAGGATA
Frame 1
Frame 2
Frame 3
Prokaryotes
• Advantages
– Simple gene structure
– Small genomes (0.5 to 10 million bp)
– No introns
– Genes are called Open Reading Frames (ORFs)
– High coding density (>90%)
• Disadvantages
– Some genes overlap (nested)
– Some genes are quite short (<60 bp)
Gene finding approaches
1) Rule-based (e.g, start & stop codons)
2) Content-based (e.g., codon bias, promoter
sites)
3) Similarity-based (e.g., orthologs)
4) Pattern-based (e.g., machine-learning)
5) Ab-initio methods (FFT)
Simple rule-based gene finding
• Look for putative start codon (ATG)
• Staying in same frame, scan in groups of three
until a stop codon is found
• If # of codons >=50, assume it’s a gene
• If # of codons <50, go back to last start codon,
increment by 1 & start again
• At end of chromosome, repeat process for reverse
complement
Example ORF
Content based gene prediction method
• RNA polymerase promoter site (-10, -30
site or TATA box)
• Shine-Dalgarno sequence (+10, Ribosome
Binding Site) to initiate protein translation
• Codon biases
• High GC content
Similarity-based gene finding
• Take all known genes from a related genome and
compare them to the query genome via BLAST
• Disadvantages:
– Orthologs/paralogs sometimes lose function and
become pseudogenes
– Not all genes will always be known in the comparison
genome (big circularity problem)
– The best species for comparison isn’t always obvious
• Summary: Similarity comparisons are good
supporting evidence for prediction validity
Machine Learning Techniques
Hidden Markov Model
ANN based method
Bayes Networks
Ab-initio Methods
• Fast Fourier Transform based methods
• Poor performance
• Able to identify new genes
• FTG method
• http://www.imtech.res.in/raghava/ftg/
Eukaryotic genes
Eukaryotes
• Complex gene structure
• Large genomes (0.1 to 3 billion bases)
• Exons and Introns (interrupted)
• Low coding density (<30%)
– 3% in humans, 25% in Fugu, 60% in yeast
• Alternate splicing (40-60% of all genes)
• Considerable number of pseudogenes
Finding Eukaryotic Genes
Computationally
• Rule-based
– Not as applicable – too many false positives
• Content-based Methods
– CpG islands, GC content, hexamer repeats, composition statistics,
codon frequencies
• Feature-based Methods
– donor sites, acceptor sites, promoter sites, start/stop codons,
polyA signals, feature lengths
• Similarity-based Methods
– sequence homology, EST searches
• Pattern-based
– HMMs, Artificial Neural Networks
• Most effective is a combination of all the above
Gene prediction programs
• Rule-based programs
– Use explicit set of rules to make decisions.
– Example: GeneFinder
• Neural Network-based programs
– Use data set to build rules.
– Examples: Grail, GrailEXP
• Hidden Markov Model-based programs
– Use probabilities of states and transitions between
these states to predict features.
– Examples: Genscan, GenomeScan
Combined Methods
• GRAIL (http://compbio.ornl.gov/Grail-1.3/)
• FGENEH (http://www.bioscience.org/urllists/genefind.htm)
• HMMgene (http://www.cbs.dtu.dk/services/HMMgene/)
• GENSCAN(http://genes.mit.edu/GENSCAN.html)
• GenomeScan (http://genes.mit.edu/genomescan.html)
• Twinscan (http://ardor.wustl.edu/query.html)
Egpred: Prediction of Eukaryotic Genes
• Similarity Search
– First BLASTX against RefSeq datbase
– Second BLASTX against sequences from first BLAST
– Detection of significant exons from BLASTX output
– BLASTN against Introns to filter exons
• Prediction using ab-initio programs
– NNSPLICE used to compute splice sites
• Combined method
http://www.imtech.res.in/raghava/ (Genome Research 14:1756-66)
Thank
you

More Related Content

What's hot

Clustal W - Multiple Sequence alignment
Clustal W - Multiple Sequence alignment   Clustal W - Multiple Sequence alignment
Clustal W - Multiple Sequence alignment
The Oxford College Engineering
 
Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins
Vijay Hemmadi
 
EMBL- European Molecular Biology Laboratory
EMBL- European Molecular Biology LaboratoryEMBL- European Molecular Biology Laboratory
Sequence alig Sequence Alignment Pairwise alignment:-
Sequence alig Sequence Alignment Pairwise alignment:-Sequence alig Sequence Alignment Pairwise alignment:-
Sequence alig Sequence Alignment Pairwise alignment:-
naveed ul mushtaq
 
Prosite
PrositeProsite
Cath
CathCath
Cath
Ramya S
 
Genomic databases
Genomic databasesGenomic databases
Genomic databases
DrSatyabrataSahoo
 
European molecular biology laboratory (EMBL)
European molecular biology laboratory (EMBL)European molecular biology laboratory (EMBL)
European molecular biology laboratory (EMBL)
Hafiz Muhammad Zeeshan Raza
 
SAGE (Serial analysis of Gene Expression)
SAGE (Serial analysis of Gene Expression)SAGE (Serial analysis of Gene Expression)
SAGE (Serial analysis of Gene Expression)
talhakhat
 
Kegg databse
Kegg databseKegg databse
Kegg databse
Rashi Srivastava
 
Introduction to NCBI
Introduction to NCBIIntroduction to NCBI
Introduction to NCBI
geetikaJethra
 
MULTIPLE SEQUENCE ALIGNMENT
MULTIPLE  SEQUENCE  ALIGNMENTMULTIPLE  SEQUENCE  ALIGNMENT
MULTIPLE SEQUENCE ALIGNMENT
Mariya Raju
 
Secondary protein structure prediction
Secondary protein structure predictionSecondary protein structure prediction
Secondary protein structure prediction
Siva Dharshini R
 
Protein Databases
Protein DatabasesProtein Databases
Protein Databases
SATHIYA NARAYANAN
 
Scop database
Scop databaseScop database
Scop database
Sayantani Roy
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
Afra Fathima
 
ENTREZ.ppt
ENTREZ.pptENTREZ.ppt
ENTREZ.ppt
kishoreGupta17
 
Uni prot presentation
Uni prot presentationUni prot presentation
Uni prot presentation
Rida Khalid
 
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICSSTRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
SHEETHUMOLKS
 

What's hot (20)

Clustal W - Multiple Sequence alignment
Clustal W - Multiple Sequence alignment   Clustal W - Multiple Sequence alignment
Clustal W - Multiple Sequence alignment
 
Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins
 
EMBL- European Molecular Biology Laboratory
EMBL- European Molecular Biology LaboratoryEMBL- European Molecular Biology Laboratory
EMBL- European Molecular Biology Laboratory
 
Sequence alig Sequence Alignment Pairwise alignment:-
Sequence alig Sequence Alignment Pairwise alignment:-Sequence alig Sequence Alignment Pairwise alignment:-
Sequence alig Sequence Alignment Pairwise alignment:-
 
Prosite
PrositeProsite
Prosite
 
Est database
Est databaseEst database
Est database
 
Cath
CathCath
Cath
 
Genomic databases
Genomic databasesGenomic databases
Genomic databases
 
European molecular biology laboratory (EMBL)
European molecular biology laboratory (EMBL)European molecular biology laboratory (EMBL)
European molecular biology laboratory (EMBL)
 
SAGE (Serial analysis of Gene Expression)
SAGE (Serial analysis of Gene Expression)SAGE (Serial analysis of Gene Expression)
SAGE (Serial analysis of Gene Expression)
 
Kegg databse
Kegg databseKegg databse
Kegg databse
 
Introduction to NCBI
Introduction to NCBIIntroduction to NCBI
Introduction to NCBI
 
MULTIPLE SEQUENCE ALIGNMENT
MULTIPLE  SEQUENCE  ALIGNMENTMULTIPLE  SEQUENCE  ALIGNMENT
MULTIPLE SEQUENCE ALIGNMENT
 
Secondary protein structure prediction
Secondary protein structure predictionSecondary protein structure prediction
Secondary protein structure prediction
 
Protein Databases
Protein DatabasesProtein Databases
Protein Databases
 
Scop database
Scop databaseScop database
Scop database
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
ENTREZ.ppt
ENTREZ.pptENTREZ.ppt
ENTREZ.ppt
 
Uni prot presentation
Uni prot presentationUni prot presentation
Uni prot presentation
 
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICSSTRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
 

Similar to Gene prediction method

gene prediction methods.pptx
gene prediction methods.pptxgene prediction methods.pptx
gene prediction methods.pptx
DrRKSelvakesavanPSGR
 
Genome analysis2
Genome analysis2Genome analysis2
Assembly and gene_prediction
Assembly and gene_predictionAssembly and gene_prediction
Assembly and gene_prediction
Bas van Breukelen
 
Lecture 4.ppt
Lecture 4.pptLecture 4.ppt
Lecture 4.ppt
khadijarafique14
 
Gene Prediction
Gene PredictionGene Prediction
Gene Prediction
Usman Randhawa
 
Bioinformatics t8-go-hmm v2014
Bioinformatics t8-go-hmm v2014Bioinformatics t8-go-hmm v2014
Bioinformatics t8-go-hmm v2014
Prof. Wim Van Criekinge
 
Genome annotation
Genome annotationGenome annotation
Genome annotation
Shifa Ansari
 
genomeannotation-160822182432.pdf
genomeannotation-160822182432.pdfgenomeannotation-160822182432.pdf
genomeannotation-160822182432.pdf
VidyasriDharmalingam1
 
EST Clustering.ppt
EST Clustering.pptEST Clustering.ppt
EST Clustering.ppt
Medhavi27
 
GLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics WorkshopGLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics Workshop
Morgan Langille
 
Next generation sequencing technologies for crop improvement
Next generation sequencing technologies for crop improvementNext generation sequencing technologies for crop improvement
Next generation sequencing technologies for crop improvement
anjaligoud
 
Structural annotation................pptx
Structural annotation................pptxStructural annotation................pptx
Structural annotation................pptx
Cherry
 
Finding genes
Finding genesFinding genes
Finding genes
Sabahat Ali
 
NGS.pptx
NGS.pptxNGS.pptx
NGS.pptx
Bl Saini
 
using_web_based_tools.ppt
using_web_based_tools.pptusing_web_based_tools.ppt
using_web_based_tools.ppt
kenter
 
using_webbased_tools.ppt
using_webbased_tools.pptusing_webbased_tools.ppt
using_webbased_tools.ppt
ssuserb86ba7
 
High Throughput Sequencing Technologies: What We Can Know
High Throughput Sequencing Technologies: What We Can KnowHigh Throughput Sequencing Technologies: What We Can Know
High Throughput Sequencing Technologies: What We Can Know
Brian Krueger
 
2 md2016 annotation
2 md2016 annotation2 md2016 annotation
2 md2016 annotation
Scott Dawson
 
Micro RNA.ppt
Micro RNA.pptMicro RNA.ppt
Micro RNA.ppt
Pudhuvai Baveesh
 
BTC 506 Gene Identification using Bioinformatic Tools-230302130331.pptx
BTC 506 Gene Identification using Bioinformatic Tools-230302130331.pptxBTC 506 Gene Identification using Bioinformatic Tools-230302130331.pptx
BTC 506 Gene Identification using Bioinformatic Tools-230302130331.pptx
ChijiokeNsofor
 

Similar to Gene prediction method (20)

gene prediction methods.pptx
gene prediction methods.pptxgene prediction methods.pptx
gene prediction methods.pptx
 
Genome analysis2
Genome analysis2Genome analysis2
Genome analysis2
 
Assembly and gene_prediction
Assembly and gene_predictionAssembly and gene_prediction
Assembly and gene_prediction
 
Lecture 4.ppt
Lecture 4.pptLecture 4.ppt
Lecture 4.ppt
 
Gene Prediction
Gene PredictionGene Prediction
Gene Prediction
 
Bioinformatics t8-go-hmm v2014
Bioinformatics t8-go-hmm v2014Bioinformatics t8-go-hmm v2014
Bioinformatics t8-go-hmm v2014
 
Genome annotation
Genome annotationGenome annotation
Genome annotation
 
genomeannotation-160822182432.pdf
genomeannotation-160822182432.pdfgenomeannotation-160822182432.pdf
genomeannotation-160822182432.pdf
 
EST Clustering.ppt
EST Clustering.pptEST Clustering.ppt
EST Clustering.ppt
 
GLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics WorkshopGLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics Workshop
 
Next generation sequencing technologies for crop improvement
Next generation sequencing technologies for crop improvementNext generation sequencing technologies for crop improvement
Next generation sequencing technologies for crop improvement
 
Structural annotation................pptx
Structural annotation................pptxStructural annotation................pptx
Structural annotation................pptx
 
Finding genes
Finding genesFinding genes
Finding genes
 
NGS.pptx
NGS.pptxNGS.pptx
NGS.pptx
 
using_web_based_tools.ppt
using_web_based_tools.pptusing_web_based_tools.ppt
using_web_based_tools.ppt
 
using_webbased_tools.ppt
using_webbased_tools.pptusing_webbased_tools.ppt
using_webbased_tools.ppt
 
High Throughput Sequencing Technologies: What We Can Know
High Throughput Sequencing Technologies: What We Can KnowHigh Throughput Sequencing Technologies: What We Can Know
High Throughput Sequencing Technologies: What We Can Know
 
2 md2016 annotation
2 md2016 annotation2 md2016 annotation
2 md2016 annotation
 
Micro RNA.ppt
Micro RNA.pptMicro RNA.ppt
Micro RNA.ppt
 
BTC 506 Gene Identification using Bioinformatic Tools-230302130331.pptx
BTC 506 Gene Identification using Bioinformatic Tools-230302130331.pptxBTC 506 Gene Identification using Bioinformatic Tools-230302130331.pptx
BTC 506 Gene Identification using Bioinformatic Tools-230302130331.pptx
 

More from Nusrat Gulbarga

NMR .pdf
NMR .pdfNMR .pdf
NMR .pdf
Nusrat Gulbarga
 
multiomics-ebook.pdf
multiomics-ebook.pdfmultiomics-ebook.pdf
multiomics-ebook.pdf
Nusrat Gulbarga
 
How Genomics & Data analysis are intertwined each other (1).pdf
How Genomics & Data analysis are intertwined each other  (1).pdfHow Genomics & Data analysis are intertwined each other  (1).pdf
How Genomics & Data analysis are intertwined each other (1).pdf
Nusrat Gulbarga
 
Newtons law of motion ~ II sem ~ m sc bioinformatics
Newtons law of motion ~  II sem ~ m sc  bioinformaticsNewtons law of motion ~  II sem ~ m sc  bioinformatics
Newtons law of motion ~ II sem ~ m sc bioinformatics
Nusrat Gulbarga
 
Chemo-informatics sem 4 MSc Bioinformatics
Chemo-informatics sem 4 MSc  BioinformaticsChemo-informatics sem 4 MSc  Bioinformatics
Chemo-informatics sem 4 MSc Bioinformatics
Nusrat Gulbarga
 
Genomes, omics and its importance, general features III semester
Genomes, omics and its importance, general features   III semesterGenomes, omics and its importance, general features   III semester
Genomes, omics and its importance, general features III semester
Nusrat Gulbarga
 
Architecture of prokaryotic and eukaryotic cells and tissues
Architecture of prokaryotic and eukaryotic cells and tissuesArchitecture of prokaryotic and eukaryotic cells and tissues
Architecture of prokaryotic and eukaryotic cells and tissues
Nusrat Gulbarga
 
Proteomics 123
Proteomics 123Proteomics 123
Proteomics 123
Nusrat Gulbarga
 
Water the unusual properties of water
Water   the unusual properties of waterWater   the unusual properties of water
Water the unusual properties of water
Nusrat Gulbarga
 
JUST SAY CHEESE
JUST SAY CHEESEJUST SAY CHEESE
JUST SAY CHEESE
Nusrat Gulbarga
 
Generation of computer
Generation of computerGeneration of computer
Generation of computer
Nusrat Gulbarga
 
Unit 1 cell biology
Unit 1 cell biologyUnit 1 cell biology
Unit 1 cell biology
Nusrat Gulbarga
 
Mutation
MutationMutation
Mutation
Nusrat Gulbarga
 
Cell signaling
Cell signalingCell signaling
Cell signaling
Nusrat Gulbarga
 
Necrosis
NecrosisNecrosis
Necrosis
Nusrat Gulbarga
 
Biophysics thermodynamics
Biophysics  thermodynamicsBiophysics  thermodynamics
Biophysics thermodynamics
Nusrat Gulbarga
 
Translation in Pro and Eu karyotes
Translation in Pro  and Eu karyotesTranslation in Pro  and Eu karyotes
Translation in Pro and Eu karyotes
Nusrat Gulbarga
 
DATABASE ADMINSTRATION
DATABASE ADMINSTRATION DATABASE ADMINSTRATION
DATABASE ADMINSTRATION
Nusrat Gulbarga
 
123 pathology of_the_endocrine_system_i_final_
123 pathology of_the_endocrine_system_i_final_123 pathology of_the_endocrine_system_i_final_
123 pathology of_the_endocrine_system_i_final_
Nusrat Gulbarga
 
Apoptosis
ApoptosisApoptosis
Apoptosis
Nusrat Gulbarga
 

More from Nusrat Gulbarga (20)

NMR .pdf
NMR .pdfNMR .pdf
NMR .pdf
 
multiomics-ebook.pdf
multiomics-ebook.pdfmultiomics-ebook.pdf
multiomics-ebook.pdf
 
How Genomics & Data analysis are intertwined each other (1).pdf
How Genomics & Data analysis are intertwined each other  (1).pdfHow Genomics & Data analysis are intertwined each other  (1).pdf
How Genomics & Data analysis are intertwined each other (1).pdf
 
Newtons law of motion ~ II sem ~ m sc bioinformatics
Newtons law of motion ~  II sem ~ m sc  bioinformaticsNewtons law of motion ~  II sem ~ m sc  bioinformatics
Newtons law of motion ~ II sem ~ m sc bioinformatics
 
Chemo-informatics sem 4 MSc Bioinformatics
Chemo-informatics sem 4 MSc  BioinformaticsChemo-informatics sem 4 MSc  Bioinformatics
Chemo-informatics sem 4 MSc Bioinformatics
 
Genomes, omics and its importance, general features III semester
Genomes, omics and its importance, general features   III semesterGenomes, omics and its importance, general features   III semester
Genomes, omics and its importance, general features III semester
 
Architecture of prokaryotic and eukaryotic cells and tissues
Architecture of prokaryotic and eukaryotic cells and tissuesArchitecture of prokaryotic and eukaryotic cells and tissues
Architecture of prokaryotic and eukaryotic cells and tissues
 
Proteomics 123
Proteomics 123Proteomics 123
Proteomics 123
 
Water the unusual properties of water
Water   the unusual properties of waterWater   the unusual properties of water
Water the unusual properties of water
 
JUST SAY CHEESE
JUST SAY CHEESEJUST SAY CHEESE
JUST SAY CHEESE
 
Generation of computer
Generation of computerGeneration of computer
Generation of computer
 
Unit 1 cell biology
Unit 1 cell biologyUnit 1 cell biology
Unit 1 cell biology
 
Mutation
MutationMutation
Mutation
 
Cell signaling
Cell signalingCell signaling
Cell signaling
 
Necrosis
NecrosisNecrosis
Necrosis
 
Biophysics thermodynamics
Biophysics  thermodynamicsBiophysics  thermodynamics
Biophysics thermodynamics
 
Translation in Pro and Eu karyotes
Translation in Pro  and Eu karyotesTranslation in Pro  and Eu karyotes
Translation in Pro and Eu karyotes
 
DATABASE ADMINSTRATION
DATABASE ADMINSTRATION DATABASE ADMINSTRATION
DATABASE ADMINSTRATION
 
123 pathology of_the_endocrine_system_i_final_
123 pathology of_the_endocrine_system_i_final_123 pathology of_the_endocrine_system_i_final_
123 pathology of_the_endocrine_system_i_final_
 
Apoptosis
ApoptosisApoptosis
Apoptosis
 

Recently uploaded

Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
silvermistyshot
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
RenuJangid3
 
Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
IshaGoswami9
 
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Sérgio Sacani
 
ISI 2024: Application Form (Extended), Exam Date (Out), Eligibility
ISI 2024: Application Form (Extended), Exam Date (Out), EligibilityISI 2024: Application Form (Extended), Exam Date (Out), Eligibility
ISI 2024: Application Form (Extended), Exam Date (Out), Eligibility
SciAstra
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
Sharon Liu
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
kejapriya1
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
moosaasad1975
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
pablovgd
 
Introduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptxIntroduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptx
zeex60
 
ESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptxESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptx
PRIYANKA PATEL
 
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Studia Poinsotiana
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
KrushnaDarade1
 
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdfDMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
fafyfskhan251kmf
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Ana Luísa Pinho
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
Gokturk Mehmet Dilci
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills MN
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
Richard Gill
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
Abdul Wali Khan University Mardan,kP,Pakistan
 

Recently uploaded (20)

Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
 
Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
 
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
 
ISI 2024: Application Form (Extended), Exam Date (Out), Eligibility
ISI 2024: Application Form (Extended), Exam Date (Out), EligibilityISI 2024: Application Form (Extended), Exam Date (Out), Eligibility
ISI 2024: Application Form (Extended), Exam Date (Out), Eligibility
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
 
Introduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptxIntroduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptx
 
ESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptxESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptx
 
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
 
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdfDMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
 

Gene prediction method

  • 2. Prokaryotic gene structure ORF (open reading frame) Start codon Stop codon TATA box ATGACAGATTACAGATTACAGATTACAGGATA Frame 1 Frame 2 Frame 3
  • 3. Prokaryotes • Advantages – Simple gene structure – Small genomes (0.5 to 10 million bp) – No introns – Genes are called Open Reading Frames (ORFs) – High coding density (>90%) • Disadvantages – Some genes overlap (nested) – Some genes are quite short (<60 bp)
  • 4. Gene finding approaches 1) Rule-based (e.g, start & stop codons) 2) Content-based (e.g., codon bias, promoter sites) 3) Similarity-based (e.g., orthologs) 4) Pattern-based (e.g., machine-learning) 5) Ab-initio methods (FFT)
  • 5. Simple rule-based gene finding • Look for putative start codon (ATG) • Staying in same frame, scan in groups of three until a stop codon is found • If # of codons >=50, assume it’s a gene • If # of codons <50, go back to last start codon, increment by 1 & start again • At end of chromosome, repeat process for reverse complement
  • 7. Content based gene prediction method • RNA polymerase promoter site (-10, -30 site or TATA box) • Shine-Dalgarno sequence (+10, Ribosome Binding Site) to initiate protein translation • Codon biases • High GC content
  • 8. Similarity-based gene finding • Take all known genes from a related genome and compare them to the query genome via BLAST • Disadvantages: – Orthologs/paralogs sometimes lose function and become pseudogenes – Not all genes will always be known in the comparison genome (big circularity problem) – The best species for comparison isn’t always obvious • Summary: Similarity comparisons are good supporting evidence for prediction validity
  • 9. Machine Learning Techniques Hidden Markov Model ANN based method Bayes Networks
  • 10. Ab-initio Methods • Fast Fourier Transform based methods • Poor performance • Able to identify new genes • FTG method • http://www.imtech.res.in/raghava/ftg/
  • 12. Eukaryotes • Complex gene structure • Large genomes (0.1 to 3 billion bases) • Exons and Introns (interrupted) • Low coding density (<30%) – 3% in humans, 25% in Fugu, 60% in yeast • Alternate splicing (40-60% of all genes) • Considerable number of pseudogenes
  • 13. Finding Eukaryotic Genes Computationally • Rule-based – Not as applicable – too many false positives • Content-based Methods – CpG islands, GC content, hexamer repeats, composition statistics, codon frequencies • Feature-based Methods – donor sites, acceptor sites, promoter sites, start/stop codons, polyA signals, feature lengths • Similarity-based Methods – sequence homology, EST searches • Pattern-based – HMMs, Artificial Neural Networks • Most effective is a combination of all the above
  • 14. Gene prediction programs • Rule-based programs – Use explicit set of rules to make decisions. – Example: GeneFinder • Neural Network-based programs – Use data set to build rules. – Examples: Grail, GrailEXP • Hidden Markov Model-based programs – Use probabilities of states and transitions between these states to predict features. – Examples: Genscan, GenomeScan
  • 15. Combined Methods • GRAIL (http://compbio.ornl.gov/Grail-1.3/) • FGENEH (http://www.bioscience.org/urllists/genefind.htm) • HMMgene (http://www.cbs.dtu.dk/services/HMMgene/) • GENSCAN(http://genes.mit.edu/GENSCAN.html) • GenomeScan (http://genes.mit.edu/genomescan.html) • Twinscan (http://ardor.wustl.edu/query.html)
  • 16. Egpred: Prediction of Eukaryotic Genes • Similarity Search – First BLASTX against RefSeq datbase – Second BLASTX against sequences from first BLAST – Detection of significant exons from BLASTX output – BLASTN against Introns to filter exons • Prediction using ab-initio programs – NNSPLICE used to compute splice sites • Combined method http://www.imtech.res.in/raghava/ (Genome Research 14:1756-66)