3. Strategies used in Gene Prediction Programs
Homology based
• Based on comparison with other (known) sequences
Ab initio based
• Based on the given sequence alone
• Uses two major features
a) Gene signals
- Start and stop codons
- Transcription factor binding sites
- Ribosomal binding sites
- Polyadenylation (Poly-A) sites
b) Gene content
- Nucleotide composition
- Pattern of sequence in coding and non-coding regions
- GC content etc.
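As a minimal sketch of the two ab initio feature types, the Python below computes one content feature (GC content in sliding windows) and one signal feature (positions of start and stop codons). The function names and the default window and step sizes are hypothetical choices for illustration, not parameters of any particular gene finder.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases (a gene content feature); 0.0 for an empty string."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(seq: str, window: int = 100, step: int = 50) -> list:
    """GC content per window, as (window start, GC fraction) pairs.
    Window and step sizes are illustrative assumptions."""
    last = max(len(seq) - window, 0)
    return [(i, gc_content(seq[i:i + window])) for i in range(0, last + 1, step)]


def find_signals(seq: str) -> dict:
    """0-based positions of start (ATG) and stop (TAA, TAG, TGA) codons in all
    three forward reading frames (a gene signal feature)."""
    seq = seq.upper()
    signals = {"start": [], "stop": []}
    for i in range(len(seq) - 2):
        codon = seq[i:i + 3]
        if codon == "ATG":
            signals["start"].append(i)
        elif codon in ("TAA", "TAG", "TGA"):
            signals["stop"].append(i)
    return signals


print(find_signals("ATGAAATAG"))               # {'start': [0], 'stop': [1, 6]}
print(sliding_gc("GGGGAAAA", window=4, step=4))  # [(0, 1.0), (4, 0.0)]
```

Note the stop codon at position 1 (TGA, in a different frame): signals occur in every frame by chance, which is why signals alone are combined with content measures.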
4. Gene Prediction in Prokaryotes
Characteristics of Genome
- Small genome size (0.5 to 10 Mbp)
- High gene density: about 90% of the genome is coding sequence
- Few repetitive sequences
- No introns
- Start codon usually ATG, but also GTG and TTG
- Shine-Dalgarno sequence: a purine-rich sequence (AGGAGGT) complementary to the 3' end of the 16S rRNA in the ribosome; it lies downstream of the TSS and upstream of the translation initiation codon
- Three possible stop codons (TAA, TAG, TGA)
- Operons: genes are followed by a termination codon, and the transcript ends at a Rho (ρ)-independent terminator with a stretch of T's (TTTT)
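The Shine-Dalgarno signal can be illustrated with a minimal search, assuming a core motif of AGGAGG (from the AGGAGGT consensus above), a 20 bp upstream window before each ATG, GTG or TTG, and at most one mismatch. All three settings, and the helper names, are hypothetical choices for this sketch; real tools score ribosome binding sites with position-specific models.

```python
SD_CORE = "AGGAGG"                   # assumed core of the AGGAGGT consensus
START_CODONS = ("ATG", "GTG", "TTG")  # prokaryotic start codons listed above


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def sd_hits(seq: str, upstream: int = 20, max_mismatches: int = 1) -> list:
    """For each candidate start codon, report (start codon position,
    best SD match position, mismatches) if a near-match lies upstream."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] not in START_CODONS:
            continue
        region_start = max(0, i - upstream)
        region = seq[region_start:i]
        best = None
        for j in range(len(region) - len(SD_CORE) + 1):
            mm = hamming(region[j:j + len(SD_CORE)], SD_CORE)
            if mm <= max_mismatches and (best is None or mm < best[1]):
                best = (region_start + j, mm)
        if best is not None:
            hits.append((i, best[0], best[1]))
    return hits


print(sd_hits("TTAGGAGGTTTAACATGAAA"))  # [(14, 2, 0)]
```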
6. Gene Prediction in Prokaryotes
Conventional Method
a) Identification of ORFs and major signals related to prokaryotic genes
- Conceptual translation in all six reading frames
- Identification of frames longer than 30 codons; by chance, a stop codon occurs about once in every 20 codons (3 of the 64 codons are stops)
- Confirmation of signals such as the Shine-Dalgarno sequence and other signals
- Sequence similarity searching with BLAST
b) Codon bias: the third nucleotide of a codon is often G or C, so coding regions have a higher GC content.
c) TESTCODE: the base at the third codon position tends to repeat itself every three nucleotides, so plotting this repetition pattern differentiates coding from non-coding regions.
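The three conventional steps can be sketched in Python. Since 3 of the 64 codons are stops, a random frame hits a stop about every 64/3 ≈ 21 codons, so an open frame of 30 or more codons is unlikely by chance. The sketch assumes ATG is the only start codon, uses the first ATG after each stop, and implements only the position-asymmetry part of Fickett's TESTCODE (max over the three codon positions divided by min + 1); the full statistic also uses composition values and published lookup tables, which are omitted here. Function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> list:
    """(a) ATG...stop ORFs in all six frames with >= min_codons codons before the stop.
    Returns (strand, frame, start, end): 0-based, end-exclusive, on the strand read
    (for '-' this is the reverse complement)."""
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs


def gc3(cds: str) -> float:
    """(b) Fraction of codons whose third base is G or C (frame 0)."""
    thirds = cds.upper()[2::3]
    return sum(b in "GC" for b in thirds) / len(thirds) if thirds else 0.0


def position_asymmetry(seq: str) -> dict:
    """(c) For each base: max / (min + 1) of its counts at the three codon positions.
    High values indicate the three-base periodicity typical of coding DNA."""
    seq = seq.upper()
    result = {}
    for base in "ACGT":
        counts = [seq[p::3].count(base) for p in range(3)]
        result[base] = max(counts) / (min(counts) + 1)
    return result


print(find_orfs("ATG" + "AAA" * 30 + "TAA"))  # [('+', 0, 0, 96)]
print(position_asymmetry("GCA" * 10))       # {'A': 10.0, 'C': 10.0, 'G': 10.0, 'T': 0.0}
```

For a '-' strand hit, forward-strand coordinates are `len(seq) - end` to `len(seq) - start`.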
8. Gene Prediction in Prokaryotes
Non-conventional Method
A Markov model describes the probability distribution of nucleotides in a DNA sequence, in which the conditional probability of a particular sequence position depends on the k previous positions.
Based on Markov models and Hidden Markov Models (HMMs)
Zero order: every position is independent
First order: a position depends on the previous position
Second order: a position depends on the preceding two positions
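A k-th order Markov model can be sketched as follows: count each base after every k-base context in training sequences, convert counts to conditional probabilities, then score a new sequence by the log-odds of a "coding" model against a "non-coding" model. The add-one pseudocounts, the restriction to A/C/G/T, and the function names are assumptions of this sketch; tools such as GeneMark and Glimmer use more elaborate variants (for example, three-periodic or interpolated Markov models).

```python
import math
from collections import defaultdict
from itertools import product


def train_markov(seqs, k):
    """Estimate P(base | preceding k bases), with add-one pseudocounts so
    unseen transitions keep a small nonzero probability."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in seqs:
        s = s.upper()
        for i in range(k, len(s)):
            counts[s[i - k:i]][s[i]] += 1
    model = {}
    for ctx in ("".join(p) for p in product("ACGT", repeat=k)):
        total = sum(counts[ctx][b] for b in "ACGT") + 4
        model[ctx] = {b: (counts[ctx][b] + 1) / total for b in "ACGT"}
    return model


def log_prob(seq, model, k):
    """Log-probability of seq[k:] given its first k bases (assumes only A, C, G, T)."""
    seq = seq.upper()
    return sum(math.log(model[seq[i - k:i]][seq[i]]) for i in range(k, len(seq)))


def log_odds(seq, coding, noncoding, k):
    """Positive values favour the coding model."""
    return log_prob(seq, coding, k) - log_prob(seq, noncoding, k)


# Toy training data (hypothetical), first-order models.
coding = train_markov(["GCGCGCGCGC"], 1)
noncoding = train_markov(["ATATATATAT"], 1)
print(log_odds("GCGCGC", coding, noncoding, 1) > 0)  # True
```

With k = 0 the single context is the empty string, which gives the zero-order (independent positions) case.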
Based on gene content and gene length, a prokaryotic gene can be typical (100 to 500 amino acids) or atypical (shorter or longer), so HMMs were developed to model both types.
Tools based on MMs and HMMs: GeneMark, Glimmer, FGENESB
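To show how an HMM segments a sequence, the sketch below decodes a toy two-state model (N = non-coding, AT-rich; C = coding, GC-rich) with the Viterbi algorithm. The states, start, transition and emission probabilities are hypothetical values chosen for illustration and are not the model of GeneMark, Glimmer or FGENESB, which use codon-level states and signal models.

```python
import math

# Hypothetical toy parameters (assumptions, not from any real tool).
STATES = ("N", "C")
START = {"N": 0.5, "C": 0.5}
TRANS = {"N": {"N": 0.99, "C": 0.01}, "C": {"N": 0.01, "C": 0.99}}
EMIT = {
    "N": {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2},
    "C": {"A": 0.2, "T": 0.2, "G": 0.3, "C": 0.3},
}


def viterbi(seq):
    """Most probable state path for seq, computed in log space to avoid underflow."""
    seq = seq.upper()
    if not seq:
        return ""
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for landing in state s at this position.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("A" * 50 + "GC" * 50 + "T" * 50) == "N" * 50 + "C" * 100 + "N" * 50)  # True
```

The low switching probability (0.01) keeps the decoded segments long; lowering it further would suppress short predicted regions.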
9. Gene Prediction in Eukaryotes
Characteristics of Genome
- Large genomes (10 Mbp to 670 Gbp)
- Low gene density: almost 3% of the human genome is coding sequence
- Abundant repetitive sequences between coding regions
- Genes split into exons and introns
- Start codon usually ATG; also GTG, TTG
- RNA processing: 5' capping, splicing, polyadenylation
GT-AG rule for intron-exon prediction: introns usually begin with GT and end with AG
- Kozak sequence flanking the ATG start codon (consensus gccRccATGG)
- High frequency of CG dinucleotides near the TSS, called CpG islands
- Three possible stop codons (TAA, TAG, TGA)
- Poly-A signal (AATAAA)
- High frequency of certain hexamers in coding regions
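CpG islands are usually detected from two window statistics: GC fraction and the observed/expected CpG ratio, count(CG) × length / (count(C) × count(G)). The sketch below assumes the widely used thresholds of a 200 bp window, GC > 50% and observed/expected > 0.6; the function names are hypothetical, and a step of 1 is simple but slow on long sequences.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for one window."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cg = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_frac = (c + g) / n if n else 0.0
    obs_exp = cg * n / (c * g) if c and g else 0.0
    return gc_frac, obs_exp


def cpg_island_windows(seq, window=200, step=1, min_gc=0.5, min_obs_exp=0.6):
    """Start positions of windows that pass both (assumed) thresholds."""
    hits = []
    for i in range(0, len(seq) - window + 1, step):
        gc_frac, obs_exp = cpg_stats(seq[i:i + window])
        if gc_frac > min_gc and obs_exp > min_obs_exp:
            hits.append(i)
    return hits


print(cpg_stats("CG" * 100))                        # (1.0, 2.0)
print(cpg_island_windows("G" * 100 + "C" * 100))    # []  (GC-rich but no CpG)
```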
10. Ab Initio based
- The first objective is to discriminate exons from introns
Uses two major features
a) Gene signals
- Start and stop codons
- Intron splice signals
- Transcription factor binding sites
- Polyadenylation (Poly-A) sites
b) Gene content
- Nucleotide composition
- Pattern of sequence in coding and non-coding regions
- GC content etc.
- Hexamer frequencies
Tools: GRAIL, MZEF, FGENES, HMMgene
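The simplest intron splice signal is the GT-AG rule. The sketch below enumerates every substring that starts with GT and ends with AG within an assumed length range (60 to 10,000 bp, hypothetical limits). It shows why the rule alone is not enough: the number of candidates grows with the product of donor and acceptor sites, so real ab initio tools score splice sites with statistical models.

```python
def candidate_introns(seq, min_len=60, max_len=10000):
    """(start, end) of substrings beginning with GT and ending with AG,
    0-based and end-exclusive; length limits are illustrative assumptions."""
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    introns = []
    for d in donors:
        for a in acceptors:
            length = a + 2 - d  # include the final AG
            if min_len <= length <= max_len:
                introns.append((d, a + 2))
    return introns


print(candidate_introns("AAAGT" + "C" * 60 + "AGAAA"))  # [(3, 67)]
```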
Strategies used in Gene Prediction Programs
Homology based
• Based on homology of exon structure, exon sequence and exon-intron patterns
• Tools: GenomeScan, EST2Genome
Also consensus based
• Tools: GeneComber, DIGIT
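The consensus idea can be illustrated with a simple majority vote: each predictor's exons are marked per base, and bases supported by at least `min_votes` predictors are merged into consensus intervals. This is only a sketch of the general idea under that assumption, not the algorithm used by GeneComber or DIGIT; the function name and interval format are hypothetical.

```python
def consensus_exons(predictions, seq_len, min_votes=2):
    """predictions: one list of (start, end) exon intervals per predictor,
    0-based and end-exclusive. Assumes min_votes >= 1."""
    votes = [0] * seq_len
    for exons in predictions:
        covered = [False] * seq_len  # count each predictor at most once per base
        for start, end in exons:
            for i in range(max(0, start), min(end, seq_len)):
                covered[i] = True
        for i, c in enumerate(covered):
            votes[i] += c
    result, start = [], None
    for i, v in enumerate(votes + [0]):  # trailing 0 closes an open interval
        if v >= min_votes and start is None:
            start = i
        elif v < min_votes and start is not None:
            result.append((start, i))
            start = None
    return result


print(consensus_exons([[(10, 50)], [(20, 60)], [(100, 120)]], 200))  # [(20, 50)]
```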
Editor's Notes
Promoters in prokaryotic organisms consist of two short DNA sequences located at the -10 (10 bp 5', or upstream) and -35 positions relative to the transcription start site (TSS). The Pribnow box (TATAAT), the prokaryotic equivalent of the eukaryotic TATA box, is located at the -10 position and is essential for transcription initiation. The -35 position, simply called the -35 element, typically consists of the sequence TTGACA, and this element controls the rate of transcription. Prokaryotic cells contain sigma factors, which help RNA polymerase bind to the promoter region; each sigma factor recognizes different core promoter sequences.
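A minimal promoter scan follows directly from these two elements: look for a near-match to TTGACA, then a near-match to TATAAT after a spacer. The spacer range of 15 to 19 bp (17 bp is commonly cited as optimal), the limit of two mismatches per box, and the function names are assumptions of this sketch; it ignores sigma-factor-specific variants.

```python
MINUS35 = "TTGACA"  # -35 element consensus
MINUS10 = "TATAAT"  # -10 element (Pribnow box) consensus


def mismatches(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def find_promoters(seq: str, spacer=range(15, 20), max_mm: int = 2) -> list:
    """(pos of -35 box, pos of -10 box, total mismatches), best hits first."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        mm35 = mismatches(seq[i:i + 6], MINUS35)
        if mm35 > max_mm:
            continue
        for gap in spacer:
            j = i + 6 + gap
            box = seq[j:j + 6]
            if len(box) < 6:
                break  # ran off the end of the sequence
            mm10 = mismatches(box, MINUS10)
            if mm10 <= max_mm:
                hits.append((i, j, mm35 + mm10))
    return sorted(hits, key=lambda h: h[2])


print(find_promoters("GG" + "TTGACA" + "A" * 17 + "TATAAT" + "GG")[0])  # (2, 25, 0)
```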