PAM and BLOSUM are widely used substitution matrices in sequence alignment. These slides explain the mathematical modeling of PAM matrices.
2. DNA SUBSTITUTION MATRIX
Simple substitution matrix
4 bases: Adenine, Guanine, Thymine, Cytosine
A C T G
A 1 -1 -1 -1
C -1 1 -1 -1
T -1 -1 1 -1
G -1 -1 -1 1
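As a minimal sketch (Python is our choice here; the slides contain no code), the matrix above can score a gap-free alignment of two equal-length DNA sequences by summing the per-position scores. The names `DNA_MATRIX` and `score_ungapped` are hypothetical, and gaps are deliberately left out.

```python
# Simple DNA substitution matrix from the slide: +1 for a match, -1 for a mismatch.
BASES = "ACTG"
DNA_MATRIX = {(x, y): (1 if x == y else -1) for x in BASES for y in BASES}


def score_ungapped(seq1: str, seq2: str) -> int:
    """Sum the substitution scores of two equal-length, gap-free aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(DNA_MATRIX[(a, b)] for a, b in zip(seq1, seq2))


print(score_ungapped("ACTG", "ACGG"))  # 3 matches, 1 mismatch -> 2
```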
3. PROTEIN SUBSTITUTION MATRICES
Protein substitution matrices are more complex than DNA substitution matrices: they cover 20 residues.
The physicochemical properties of individual amino acids vary significantly.
A protein substitution matrix can be based on any property, such as size, polarity, or charge.
Evolution-based substitution matrices are the most important!
THE NEEDLEMAN-WUNSCH ALGORITHM FOR SEQUENCE ALIGNMENT, 7TH MELBOURNE BIOINFORMATICS COURSE
4. EVOLUTIONARY SUBSTITUTION MATRICES (WIDELY USED)
PAM: Point Accepted Mutation
e.g., PAM250
BLOSUM: BLOcks SUbstitution Matrix
e.g., BLOSUM62
A MODEL OF EVOLUTIONARY CHANGE IN PROTEINS
5. POINT ACCEPTED MUTATION (PAM) MATRICES
Used to score protein sequence alignments.
Based on strong evolutionary principles.
PAM scoring matrices are symmetric.
A PAM matrix gives the probability that one amino acid is replaced by another over a given period of evolutionary time: the time taken for n point accepted mutations to occur per 100 amino acids.
6. CONSTRUCTION OF PAM MATRICES
Introduced by Margaret Dayhoff in 1978.
The study used 1572 accepted mutations observed in the phylogenetic trees of 71 families of closely related proteins.
Sequences within a tree were 85% similar (only 15% different) to their ancestors.
Assumption: each aligned mismatch resulted from a single mutation event.
An explicit evolutionary model, such as a phylogenetic tree, is required to identify point accepted mutations (mutations accepted by natural selection) and to build the matrix of accepted point mutations.
[Figure: phylogenetic tree]
7. Without an explicit model such as a phylogenetic tree:
Observed sequences: ACGH, DBGH, ADIJ, CBIJ
Inferred ancestral sequences: ABGH, ABIJ
CB CD BD BB CB X BD BB
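   A B C D G H I J
A  0 0 1 1 0 0 0 0
B  0 0 1 1 0 0 0 0
C  1 1 0 0 0 0 0 0
D  1 1 0 0 0 0 0 0
G  0 0 0 0 0 0 1 0
H  0 0 0 0 0 0 0 1
I  0 0 0 0 1 0 0 0
J  0 0 0 0 0 1 0 0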
Matrix of accepted point mutations
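The counting step can be sketched as follows, assuming the tree topology implied by the matrix: ACGH and DBGH descend from ABGH, ADIJ and CBIJ descend from ABIJ, and ABGH and ABIJ are joined by one edge. Each differing position along an edge is counted once as an unordered pair. The edge list and the function `count_accepted_mutations` are illustrative assumptions, not Dayhoff's procedure verbatim.

```python
from collections import Counter

# Assumed tree edges as (ancestor, descendant) aligned sequences.
# ABGH and ABIJ are the inferred ancestors; the ABGH-ABIJ pair is one edge between them.
EDGES = [
    ("ABGH", "ACGH"),
    ("ABGH", "DBGH"),
    ("ABIJ", "ADIJ"),
    ("ABIJ", "CBIJ"),
    ("ABGH", "ABIJ"),
]


def count_accepted_mutations(edges):
    """Count each position that differs along a tree edge as one accepted point mutation.

    Pairs are stored unordered, matching the assumption that x -> y is as likely as y -> x.
    """
    counts = Counter()
    for ancestor, descendant in edges:
        for x, y in zip(ancestor, descendant):
            if x != y:
                counts[frozenset((x, y))] += 1
    return counts


counts = count_accepted_mutations(EDGES)
for pair in sorted(counts, key=sorted):
    print("-".join(sorted(pair)), counts[pair])
```

This prints six pairs (A-C, A-D, B-C, B-D, G-I, H-J), each with a count of 1, which matches the matrix of accepted point mutations above.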
8. Assumption: the likelihood of amino acid X replacing Y is the same as that of amino acid Y replacing X.
Elements in these matrices are shown multiplied by 10.
Fractional exchanges result when ancestral sequences are unknown.
What information can we acquire from this matrix?
Can we directly conclude that Asp-Glu has a higher mutability than Gly-Trp?
Not directly: the raw exchange counts also depend on how often each amino acid occurs.
Number of occurrences of each amino acid
9. RELATIVE MUTABILITY & FREQUENCY
Relative mutability:
$$m(j) = \frac{\sum_{i \neq j} A_{ij}}{n(j)} = \frac{\text{total number of changes of amino acid } j}{\text{total exposure of amino acid } j \text{ to mutation}}$$
Normalized frequency:
$$f(j) = \frac{n(j)}{N} = \frac{\text{total exposure of amino acid } j \text{ to mutation}}{\text{total number of all amino acids}}$$
The normalized frequencies sum to 1: $\sum_j f(j) = 1$.
$A_{ij}$: elements of the previous matrix (number of mutations observed between amino acid i and amino acid j)
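A small numerical sketch of these two definitions, using hypothetical toy counts for three residues labeled A, B and C (not Dayhoff's data). It assumes N is the total exposure summed over all residues, which is what makes the normalized frequencies sum to 1.

```python
# Hypothetical toy data for three residues A, B, C (illustrative only, not Dayhoff's counts).
RESIDUES = ["A", "B", "C"]
# A[i][j]: accepted mutations observed between residue i and residue j (symmetric, diagonal unused).
A = [
    [0, 4, 2],
    [4, 0, 6],
    [2, 6, 0],
]
# n[j]: total exposure of residue j to mutation (hypothetical values).
n = [100, 200, 100]


def relative_mutability(A, n):
    """m(j) = (sum over i != j of A[i][j]) / n(j)."""
    size = len(n)
    return [sum(A[i][j] for i in range(size) if i != j) / n[j] for j in range(size)]


def normalized_frequency(n):
    """f(j) = n(j) / N, assuming N is the total exposure, so the f(j) sum to 1."""
    total = sum(n)
    return [nj / total for nj in n]


m = relative_mutability(A, n)
f = normalized_frequency(n)
print(m)  # [0.06, 0.05, 0.08]
print(f)  # [0.25, 0.5, 0.25]
```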
11. MUTATION PROBABILITY MATRIX (BASIS FOR 1PAM)
Non-diagonal elements ($i \neq j$):
$$M_{ij} = \frac{\lambda\, m(j)\, A_{ij}}{\sum_{k \neq j} A_{kj}} = \frac{\lambda\, A_{ij}}{N f(j)} = \frac{\lambda\, A_{ij}}{n(j)}$$
Diagonal elements:
$$M_{jj} = 1 - \lambda\, m(j) = 1 - \sum_{i \neq j} M_{ij}$$
$\lambda$: proportionality constant
Pr(remain the same) + Pr(change into another amino acid) = 1, so the elements of each column sum to 1.
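On hypothetical toy data, the mutation probability matrix can be built column by column from A_ij, n(j) and λ. Here λ = 0.1 is an arbitrary illustrative value, not the 1-PAM constant; the sketch only shows that each column sums to 1 for any λ that keeps the diagonal non-negative. The function name is hypothetical.

```python
# Hypothetical toy data for three residues (illustrative only, not Dayhoff's counts).
A = [[0, 4, 2], [4, 0, 6], [2, 6, 0]]  # A[i][j]: accepted exchanges between i and j
n = [100, 200, 100]                    # n[j]: exposure of residue j to mutation
lam = 0.1                              # arbitrary illustrative proportionality constant


def mutation_probability_matrix(A, n, lam):
    """Off-diagonal: M[i][j] = lam * A[i][j] / n[j].
    Diagonal: M[j][j] = 1 - (sum of the other entries in column j)."""
    size = len(n)
    M = [[0.0] * size for _ in range(size)]
    for j in range(size):
        for i in range(size):
            if i != j:
                M[i][j] = lam * A[i][j] / n[j]
        M[j][j] = 1.0 - sum(M[i][j] for i in range(size) if i != j)
    return M


M = mutation_probability_matrix(A, n, lam)
col_sums = [sum(M[i][j] for i in range(len(n))) for j in range(len(n))]
print([round(M[j][j], 3) for j in range(len(n))])  # [0.994, 0.995, 0.992]
```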
12. PROPORTIONALITY CONSTANT
$$1\ \text{PAM} = \frac{1 \text{ accepted point mutation}}{100 \text{ amino acids}}$$
One PAM is the basic unit of evolutionary time.
In the mutation matrix for 1 PAM, 99% of the amino acids remain conserved.
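The total probability that an amino acid is conserved is $\sum_j f(j)\, M_{jj} = \sum_j f(j)\,\bigl(1 - \lambda\, m(j)\bigr) = 1 - \lambda \sum_j f(j)\, m(j)$.
$\lambda$ must be chosen so that this conserved probability is 99%:
$$100\,\lambda \sum_j f(j)\, m(j) = 1$$
(the observed percentage difference for 1 PAM is 1%)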
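To illustrate how λ is fixed, the sketch below solves the equation above for hypothetical toy values of m(j) and f(j), then checks that the total conserved probability comes out at 99%. The function name `choose_lambda` is hypothetical.

```python
# Hypothetical toy inputs: relative mutabilities m(j) and normalized frequencies f(j).
m = [0.06, 0.05, 0.08]
f = [0.25, 0.50, 0.25]


def choose_lambda(m, f, percent_change=1.0):
    """Solve 100 * lam * sum_j f(j) m(j) = percent_change for lam (1% change per 1 PAM)."""
    return percent_change / (100 * sum(fj * mj for fj, mj in zip(f, m)))


lam = choose_lambda(m, f)
# Total conservation: sum_j f(j) * M_jj, with M_jj = 1 - lam * m(j).
conservation = sum(fj * (1 - lam * mj) for fj, mj in zip(f, m))
print(round(lam, 4), round(conservation, 4))  # 0.1667 0.99
```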
13. $M_{ij} \neq M_{ji}$, but $M_{ij}\, f(j) = M_{ji}\, f(i)$
Example: $42 \neq 36$, but $M_{ij}\, f(j) = M_{ji}\, f(i) = 1.68$
Each element gives the probability that the amino acid in column j will be replaced by the amino acid in row i after a given evolutionary interval (1 PAM).
In Markov-chain notation, $P_{ij,1} = \Pr(X_1 = j \mid X_0 = i)$, so $M_{ij} = P_{ji,1}$.
Elements shown are multiplied by 10,000.
Mutation probability matrix: basis for 1 PAM
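The asymmetry and the balance relation can be checked exactly with Python's `fractions.Fraction`, again on hypothetical toy counts (not Dayhoff's data). The check holds because M_ij f(j) = λ A_ij / N and A_ij is symmetric; the value λ = 1/6 is illustrative only.

```python
from fractions import Fraction

# Hypothetical toy data (illustrative only, not Dayhoff's counts).
A = [[0, 4, 2], [4, 0, 6], [2, 6, 0]]  # symmetric accepted-exchange counts A[i][j]
n = [100, 200, 100]                    # exposure n(j) of each residue
lam = Fraction(1, 6)                   # illustrative proportionality constant

size = len(n)
N = sum(n)
f = [Fraction(nj, N) for nj in n]      # normalized frequencies f(j) = n(j) / N

# Off-diagonal M[i][j] = lam * A[i][j] / n(j); diagonal chosen so each column sums to 1.
M = [[lam * A[i][j] / n[j] for j in range(size)] for i in range(size)]
for j in range(size):
    M[j][j] = 1 - sum(M[i][j] for i in range(size) if i != j)

print(M[0][1], M[1][0])                # 1/300 1/150  (not symmetric)
print(M[0][1] * f[1], M[1][0] * f[0])  # 1/600 1/600  (balanced)
```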
14. MARKOV CHAIN MODEL
$P_{ij,1} = \Pr(X_1 = j \mid X_0 = i)$, or more generally $P_{ij,1} = \Pr(X_{m+1} = j \mid X_m = i)$
$P_{ij,n} = \Pr(X_n = j \mid X_0 = i)$?
Do we need n observations to know the probability of observing j, given that i was observed initially? That would require example proteins at every evolutionary interval n.
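No: because the model is a Markov chain (each step depends only on the current amino acid), the n-step probabilities follow from the 1-PAM matrix alone:
$$M_n = M_1^{\,n}$$
(relation between the mutation matrix of 1 PAM and that of PAMn)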
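Because the model is a Markov chain, PAMn can be sketched as the n-th power of the 1-PAM matrix. The 2-residue matrix below is a hypothetical toy whose columns sum to 1; `matmul` and `matrix_power` are hand-written helpers in pure Python (no NumPy assumed).

```python
# Hypothetical 2-residue toy 1-PAM matrix: M1[i][j] = Pr(column residue j -> row residue i),
# so each column sums to 1 (illustrative numbers, not Dayhoff's).
M1 = [[0.99, 0.02],
      [0.01, 0.98]]


def matmul(X, Y):
    """Plain matrix product X @ Y for square lists of lists."""
    size = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def matrix_power(M, n):
    """Return M^n by repeated squaring (M^0 is the identity)."""
    size = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    base = M
    while n > 0:
        if n & 1:
            result = matmul(result, base)
        base = matmul(base, base)
        n >>= 1
    return result


M2 = matrix_power(M1, 2)
M250 = matrix_power(M1, 250)
print([[round(x, 4) for x in row] for row in M2])  # [[0.9803, 0.0394], [0.0197, 0.9606]]
print([round(sum(col), 6) for col in zip(*M250)])  # each column still sums to 1
```

The columns of the 250th power still sum to 1, while the diagonal (probability of conservation) shrinks as n grows.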
15. RELATEDNESS ODDS MATRIX
The ratio of the probability that the j-th amino acid is replaced by the i-th amino acid to the probability of these amino acids being aligned by chance.
$$R_{ij} = \frac{M_{ij}}{f(i)}$$
Symmetric: $R_{ij} = R_{ji}$, since $M_{ij}\, f(j) = M_{ji}\, f(i)$
$$\text{PAM}(i,j) = \log R_{ij}$$
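A sketch of the log-odds step on a hypothetical 2-residue toy. The logarithm base 10 and the scaling by 10 are assumptions chosen to mirror the "(x10)" display of PAM250; the slides themselves only write log R_ij. The frequencies are chosen so that M_ij f(j) = M_ji f(i), which is what makes the score matrix symmetric.

```python
import math

# Hypothetical 2-residue toy: 1-PAM-like mutation matrix (columns sum to 1)
# and background frequencies f(i) (illustrative only, not Dayhoff's values).
M = [[0.99, 0.02],
     [0.01, 0.98]]
f = [2 / 3, 1 / 3]  # chosen so that M[i][j] * f[j] == M[j][i] * f[i]


def log_odds_scores(M, f, base=10.0, scale=10.0):
    """PAM(i, j) = scale * log_base(R_ij), with R_ij = M_ij / f(i).

    The base and scale are assumptions; the slides write only log R_ij.
    """
    size = len(f)
    return [[scale * math.log(M[i][j] / f[i], base) for j in range(size)]
            for i in range(size)]


scores = log_odds_scores(M, f)
print([[round(s, 2) for s in row] for row in scores])  # [[1.72, -15.23], [-15.23, 4.68]]
```

Positive scores mark pairs seen more often than chance, negative scores pairs seen less often, and a score of 0 corresponds to R_ij = 1.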
16. PAM250
Elements shown are multiplied by 10.
Neutral score = 0 ($R_{ij} = 1$: the pair is aligned as often as expected by chance)
Most informative matrix, built on strong evolutionary principles