Statistical Methods in Bioinformatics
Dot Matrix
•First described by Gibbs and McIntyre (1970)
•Dot matrix analysis of DNA sequence (window W=11, stringency S=7)
Sequences compared: phage P22 c2 repressor vs. phage lambda cI
Dot Matrix
•Dot matrix analysis of amino acid sequence (window W=1, stringency S=1)
Sequences compared: phage lambda cI vs. phage P22 c2 repressor
Filtering in Dot Matrix
•Filtering can be applied using Sliding windows
           Window size    Match requirement (stringency)
DNA        15             10
Protein    2 or 3         2
•For DNA: long windows, high stringency
•For proteins: short windows, low stringency
•For protein domains: long windows, low stringency
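The sliding-window filter above can be sketched in a few lines of Python. This is a minimal illustration, not the code of any program listed in these slides. The names `dot_matrix` and `render` are hypothetical, and placing the dot at the start of the window (rather than its center) is an assumption.

```python
def dot_matrix(seq_a, seq_b, window=1, stringency=1):
    """Return the set of (i, j) positions that receive a dot.

    A dot at (i, j) means the windows seq_a[i:i+window] and
    seq_b[j:j+window] agree in at least `stringency` positions.
    Assumption: the dot is recorded at the window start.
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= stringency:
                dots.add((i, j))
    return dots


def render(seq_a, seq_b, dots):
    """Draw the matrix as text: rows follow seq_a, columns follow seq_b."""
    lines = ["  " + seq_b]
    for i, a in enumerate(seq_a):
        row = "".join("*" if (i, j) in dots else "." for j in range(len(seq_b)))
        lines.append(a + " " + row)
    return "\n".join(lines)


if __name__ == "__main__":
    s = "GATTACA"
    print(render(s, s, dot_matrix(s, s, window=1, stringency=1)))
    print()
    print(render(s, s, dot_matrix(s, s, window=3, stringency=3)))
```

With W=1, S=1 every chance identity produces a dot. Raising the window and stringency keeps only runs of consecutive matches, which is why the diagonal stands out more clearly after filtering.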
Dot Matrix Programs
•DNA strider
•DOTTER
•COMPARE
•DOTPLOT
For sequence repeats:
•LALIGN
•PLALIGN
LALIGN/PLALIGN
Dot plot for Repeat analysis
(Window=1, Stringency=1)
Dot plot for Repeat analysis
(Window=23, Stringency=7)
Dynamic programming
•Compares every pair of characters in the two sequences and
generates an alignment
•Alignment includes matches, mismatches and gaps
•Alignments obtained depend on the choice of scoring system
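The three bullets above can be illustrated with a small global alignment routine in the style of Needleman and Wunsch. This is a sketch under stated assumptions: the function name `global_alignment` is hypothetical, and the scores (match +1, mismatch -1, linear gap penalty -2) are illustrative defaults, not values taken from these slides.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for a global alignment.

    Assumptions: simple match/mismatch scores and a linear gap penalty.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # Fill the matrix: every pair of characters is compared once.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # match or mismatch
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    s, x, y = global_alignment("ACGT", "AGT")
    print(s)
    print(x)
    print(y)
```

Changing `match`, `mismatch`, or `gap` can change which alignment is returned, which is the dependence on the scoring system noted above.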
Programs for alignment of sequences
Scoring using Gap penalty
Derivation of Dynamic programming algorithm
Dynamic programming Algorithm
Formal description of Algorithm
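The slide content for this heading is not available here. As a hedged reminder, a commonly used form of the recurrence is given below. The symbols are assumptions, not notation taken from the slides: $S_{i,j}$ is the best score for an alignment ending at position $i$ of sequence $a$ and position $j$ of sequence $b$, $s(a_i, b_j)$ is the substitution score, and $w_x$ is the penalty for a gap of length $x$.

```latex
S_{i,j} = \max \left\{
\begin{array}{l}
S_{i-1,\,j-1} + s(a_i, b_j) \\[4pt]
\max\limits_{x \ge 1} \left( S_{i-x,\,j} - w_x \right) \\[4pt]
\max\limits_{y \ge 1} \left( S_{i,\,j-y} - w_y \right)
\end{array}
\right.
```

With a linear gap penalty $w_x = g\,x$, the two inner maxima reduce to single-step terms, $S_{i-1,j} - g$ and $S_{i,j-1} - g$.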
Global and Local alignments
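A local alignment (in the style of Smith and Waterman) uses the same matrix as a global alignment, with two changes: no cell may fall below zero, and the result is the best cell anywhere in the matrix rather than the final corner. The sketch below computes only the score. The name `local_alignment_score` and the scoring values are illustrative assumptions.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best local alignment score between a and b.

    Assumptions: simple match/mismatch scores and a linear gap penalty.
    Only two rows of the matrix are kept, since no traceback is done.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                0,                  # start a new local alignment here
                prev[j - 1] + s,    # extend with a match or mismatch
                prev[j] + gap,      # gap in b
                cur[j - 1] + gap,   # gap in a
            )
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


if __name__ == "__main__":
    # The shared core GATTACA is found despite unrelated flanking regions.
    print(local_alignment_score("TTTGATTACATTT", "CCCGATTACACCC"))
```

A global alignment of the same two sequences would also be charged for the mismatched flanks. The local score ignores them.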
Scoring matrices
•Certain amino acid substitutions are common in related proteins
from different species
•Proteins still function with these substitutions
Scoring matrices
•Probability of changing A → B is identical to that of changing B → A
PAM (Percent Accepted Mutation)
•Based on evolutionary principles
•Each matrix gives the changes expected for a given period
of evolutionary time
•Each change at a particular site is assumed to be
independent of previous mutational events
•Estimations are based on 1572 changes in 71 groups of
protein sequences that were at least 85% similar
Scoring matrices
PAM (Percent Accepted Mutation)
•The PAM1 matrix estimates the rate of substitution that would be
expected if 1% of the amino acids had changed
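Because each change is assumed independent of earlier ones, matrices for longer evolutionary periods are obtained by multiplying the PAM1 probability matrix by itself: PAM-n corresponds to PAM1 raised to the n-th power. The sketch below shows the idea on a made-up two-letter alphabet. The matrix `toy_pam1` and the helper names are hypothetical and are not Dayhoff's 20x20 data.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [
        [sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


# Toy "PAM1": each letter has a 1% chance of changing per unit of time.
# Row i holds the probabilities that letter i stays or changes.
toy_pam1 = [
    [0.99, 0.01],
    [0.01, 0.99],
]

toy_pam250 = mat_pow(toy_pam1, 250)

if __name__ == "__main__":
    for row in toy_pam250:
        print([round(p, 4) for p in row])
```

Each row still sums to 1 after extrapolation, and the chance that a letter is unchanged falls well below 99% because repeated and reverse changes at the same site accumulate.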
Sequence similarity    Matrix used
40%                    PAM120
50%                    PAM80
60%                    PAM60
14-27%                 PAM250
BLOSUM (Blocks Amino acid Substitution Matrices)
Matrix values are based on amino acid substitutions in a large
set of ~2000 conserved amino acid patterns (blocks)
Note: patterns are found by the MOTIF program
BLOSUM
– Derivation of the Matrix values
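The slide content for this derivation is not available here. As a hedged sketch, BLOSUM-style values are log-odds scores: the observed frequency of each aligned pair in the blocks is compared with the frequency expected by chance, and the base-2 logarithm of the ratio is scaled to half-bit units and rounded. The counts below are invented for a two-letter alphabet, and the function name `log_odds_scores` is hypothetical.

```python
import math


def log_odds_scores(pair_counts):
    """Compute rounded half-bit log-odds scores from unordered pair counts.

    pair_counts maps (x, y) with x <= y to the number of times the two
    residues are seen aligned in the same column. All counts must be > 0.
    """
    total = sum(pair_counts.values())
    q = {pair: count / total for pair, count in pair_counts.items()}

    # Background frequency of each residue, derived from the pair frequencies.
    p = {}
    for (x, y), freq in q.items():
        if x == y:
            p[x] = p.get(x, 0.0) + freq
        else:
            p[x] = p.get(x, 0.0) + freq / 2
            p[y] = p.get(y, 0.0) + freq / 2

    scores = {}
    for (x, y), freq in q.items():
        # Chance expectation: p_x^2 for identical pairs, 2 p_x p_y otherwise.
        expected = p[x] * p[y] if x == y else 2 * p[x] * p[y]
        scores[(x, y)] = round(2 * math.log2(freq / expected))
    return scores


if __name__ == "__main__":
    # Invented counts, for illustration only.
    toy_counts = {("A", "A"): 60, ("A", "S"): 20, ("S", "S"): 20}
    print(log_odds_scores(toy_counts))
```

Pairs seen more often than chance predicts receive positive scores, and pairs seen less often receive negative scores.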
PAM 250
BLOSUM62
BLAST home page
BLAST
BLAST results
