2. RNA evolution
● Homologous RNAs can share a common secondary structure without significant sequence similarity
● A mutation in one base of a pair can be followed by a compensatory mutation in its partner that maintains base-pairing complementarity
3. Comparative sequence analysis
● In a structurally correct multiple alignment of RNAs, conserved base pairs are often revealed by the presence of frequent correlated compensatory mutations
● Measure sequence covariation with mutual information:
● $f_{x_i}$ is the frequency of character $x_i$ in column $i$, where $x_i$ is one of five possible characters: the four nucleotides or a gap
● $f_{x_i, x_j}$ is the joint frequency of the pair $(x_i, x_j)$ observed in columns $i$ and $j$

$$M_{ij} = \sum_{x_i, x_j} f_{x_i, x_j} \log_2 \frac{f_{x_i, x_j}}{f_{x_i} \, f_{x_j}}$$
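A minimal Python sketch of $M_{ij}$ for two alignment columns, assuming the alignment is a list of equal-length strings, gaps are written as '-' (the slides do not fix a gap symbol) and columns are numbered from 1 as on the slides. The function name `mutual_information` is illustrative only. It is applied here to the three-sequence alignment used in the next worked example.

```python
import math
from collections import Counter


def mutual_information(alignment: list[str], i: int, j: int) -> float:
    """Return M_ij in bits for columns i and j (1-based) of an alignment.

    Characters are read as-is, so the gap symbol (assumed here to be '-')
    counts as a fifth character alongside A, C, G and U.
    """
    n = len(alignment)
    col_i = [seq[i - 1] for seq in alignment]
    col_j = [seq[j - 1] for seq in alignment]
    f_i = Counter(col_i)                # counts behind f_{x_i}
    f_j = Counter(col_j)                # counts behind f_{x_j}
    f_ij = Counter(zip(col_i, col_j))   # counts behind f_{x_i, x_j}
    m = 0.0
    # Pairs that never occur have f_{x_i, x_j} = 0 and contribute nothing to the sum.
    for (a, b), count in f_ij.items():
        p_ab = count / n
        m += p_ab * math.log2(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return m


aln = ["GUCUGGAC", "GACUGGUC", "GGCUGGCC"]
print(round(mutual_information(aln, 2, 7), 3))  # 1.585, i.e. log2 3
```

Only observed pairs are summed, which sidesteps the undefined $0 \log_2 0$ terms by using the usual convention that they are zero.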
4. Mutual information
G U C U G G A C
G A C U G G U C
G G C U G G C C
$$M_{ij} = \sum_{x_i, x_j} f_{x_i, x_j} \log_2 \frac{f_{x_i, x_j}}{f_{x_i} \, f_{x_j}}$$

$$M_{2,7} = 3 \cdot \left( \frac{1}{3} \log_2 \frac{1/3}{1/9} \right) = \log_2 3 \approx 1.585$$
● $M_{ij}$ is maximal when columns $i$ and $j$ each look completely random but are perfectly correlated with each other
● If $i$ and $j$ are uncorrelated, the mutual information is 0
● If either $i$ or $j$ is a highly conserved position, there is also little or no mutual information
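The three properties above can be checked on toy columns. This is a sketch with made-up four-sequence columns; `mi` is an illustrative helper that takes each column as a string read top to bottom.

```python
import math
from collections import Counter


def mi(col_i: str, col_j: str) -> float:
    """Mutual information M_ij (bits) between two alignment columns given as strings."""
    n = len(col_i)
    f_i, f_j, f_ij = Counter(col_i), Counter(col_j), Counter(zip(col_i, col_j))
    return sum((c / n) * math.log2((c / n) / ((f_i[a] / n) * (f_j[b] / n)))
               for (a, b), c in f_ij.items())


# Hypothetical columns for four sequences.
correlated = mi("UAGC", "AUCG")    # both columns vary, and each x_i determines x_j
uncorrelated = mi("AACC", "GUGU")  # every combination of observed x_i and x_j is equally frequent
conserved = mi("GGGG", "AUCG")     # column i never changes
print(correlated, uncorrelated, conserved)  # 2.0 0.0 0.0
```

With four sequences, 2 bits is the largest value possible: a column of four entries carries at most $\log_2 4 = 2$ bits, and $M_{ij}$ cannot exceed the information in either column.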
5. Mutual information
$$M_{2,7} = 4 \cdot \left( \frac{1}{4} \log_2 \frac{1/4}{1/16} \right) = 2$$

$$M_{1,8} = \log_2 \frac{1}{1} = 0$$
G U C U G G A C
G A C U G G U C
G G C U G G C C
G C C U G G G C
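To see which column pair stands out, a sketch can compute $M_{ij}$ for every pair of columns in this four-sequence alignment. The helpers `mi` and `mi_matrix` are illustrative names, and columns are numbered from 1 as on the slides.

```python
import math
from collections import Counter
from itertools import combinations


def mi(col_i: str, col_j: str) -> float:
    """Mutual information M_ij (bits) between two alignment columns given as strings."""
    n = len(col_i)
    f_i, f_j, f_ij = Counter(col_i), Counter(col_j), Counter(zip(col_i, col_j))
    return sum((c / n) * math.log2((c / n) / ((f_i[a] / n) * (f_j[b] / n)))
               for (a, b), c in f_ij.items())


def mi_matrix(alignment: list[str]) -> dict[tuple[int, int], float]:
    """Return {(i, j): M_ij} for every column pair i < j, numbered from 1."""
    columns = ["".join(seq[k] for seq in alignment) for k in range(len(alignment[0]))]
    return {(i + 1, j + 1): mi(columns[i], columns[j])
            for i, j in combinations(range(len(columns)), 2)}


aln = ["GUCUGGAC", "GACUGGUC", "GGCUGGCC", "GCCUGGGC"]
M = mi_matrix(aln)
best = max(M, key=M.get)
print(best, M[best], M[(1, 8)])  # (2, 7) 2.0 0.0
```

Every pair that involves a conserved column scores 0, so (2, 7) is the only pair with non-zero mutual information, even though columns 1 and 8 also form a G-C pair in every sequence.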
6. Comparative analysis
● Start with a multiple alignment
● Predict the secondary structure based on the alignment
● Refine the alignment based on the secondary structure
● Repeat
● The sequences to be compared must be:
● similar enough to be aligned initially by primary sequence
● dissimilar enough that a number of co-varying substitutions can be detected
7. Comparative analysis
● How can a secondary structure be built from the alignment?
● Greedy method:
● choose the pair of columns with the highest $M_{ij}$
● make a base pair
● carry on with the second highest $M_{ij}$, and so on
● Problem: a column might end up in more than one base pair
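The greedy method can be sketched in a few lines. The `threshold` stopping rule and the `skip_used` option are assumptions, not part of the slides: with `skip_used=False` the loop follows the slide literally and reproduces the problem of a column landing in two pairs, while `skip_used=True` is one simple, hypothetical fix. The $M_{ij}$ values are made up for illustration.

```python
def greedy_pairs(M: dict[tuple[int, int], float], threshold: float,
                 skip_used: bool = False) -> list[tuple[int, int]]:
    """Take column pairs in order of decreasing M_ij until M_ij <= threshold.

    threshold (hypothetical) decides when to stop adding pairs.
    skip_used=True (hypothetical fix) ignores any pair that touches a column
    which is already paired; skip_used=False allows columns to be reused.
    """
    pairs: list[tuple[int, int]] = []
    used: set[int] = set()
    for (i, j), m in sorted(M.items(), key=lambda item: item[1], reverse=True):
        if m <= threshold:
            break
        if skip_used and (i in used or j in used):
            continue
        pairs.append((i, j))
        used.update((i, j))
    return pairs


# Hypothetical M_ij values: columns 3 and 12 each appear in two strong candidate pairs.
M = {(1, 12): 1.9, (3, 10): 1.5, (3, 12): 1.2, (4, 9): 0.8, (5, 6): 0.1}
print(greedy_pairs(M, threshold=0.5))                  # [(1, 12), (3, 10), (3, 12), (4, 9)]
print(greedy_pairs(M, threshold=0.5, skip_used=True))  # [(1, 12), (3, 10), (4, 9)]
```

Skipping used columns guarantees each column pairs at most once, but the result still depends on the order of choices, which motivates the dynamic-programming approach on the following slides.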
8. Nussinov and alignments
● Notation
● aln: the RNA alignment
● aln_k: the k-th sequence in the alignment
● aln[i, j]: the RNA alignment from position i to j
● str: the best secondary structure for aln (over the alphabet {(, ), .})
● str[i, j]: the best secondary structure for aln[i, j]
● score[i, j]: the number of base pairs in str[i, j]
● aln[i] · aln[j] (columns i and j can pair) if aln_k[i] · aln_k[j] for all k, i.e. the bases at i and j can pair in every sequence
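The column-compatibility test aln[i] · aln[j] can be sketched as follows. The slides do not say which base pairs are allowed; this sketch assumes Watson-Crick pairs plus the G-U wobble pair and treats a gap as unable to pair. `CAN_PAIR` and `columns_can_pair` are hypothetical names.

```python
# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
# Any other combination, including one with a gap '-', cannot pair.
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def columns_can_pair(aln: list[str], i: int, j: int) -> bool:
    """aln[i] . aln[j]: true iff aln_k[i] and aln_k[j] can pair for every k.

    Columns are numbered from 1 to match the slides.
    """
    return all((seq[i - 1], seq[j - 1]) in CAN_PAIR for seq in aln)


aln = ["GUCUGGAC", "GACUGGUC", "GGCUGGCC", "GCCUGGGC"]
print(columns_can_pair(aln, 2, 7), columns_can_pair(aln, 1, 8), columns_can_pair(aln, 3, 4))
# True True False
```

Requiring every sequence to pair is strict; a single mismatch or gap rules out the column pair.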
9. Nussinov and alignments
● str[i, j] is the best of four cases:
● i unpaired, and str[i+1, j]
● j unpaired, and str[i, j-1]
● aln[i] · aln[j] (i paired with j), and str[i+1, j-1]
● str[i, k] and str[k+1, j] for some i < k < j
[Figure: arc diagrams of the four cases, marking positions i, i+1, j-1, j, k and k+1]
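The four cases translate into a Nussinov-style dynamic program over the alignment columns. This is a sketch under stated assumptions: allowed pairs are Watson-Crick plus G-U wobble, each pair scores +1, columns are 0-based internally, and `min_loop` is a hypothetical extra parameter (the minimum number of unpaired columns a pair must enclose), whose default 0 follows the slide recursion exactly.

```python
# Assumed pairing rules (not given on the slides): Watson-Crick pairs plus G-U wobble.
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(aln: list[str], i: int, j: int) -> bool:
    """aln[i] . aln[j] for 0-based columns: every sequence can pair at (i, j)."""
    return all((seq[i], seq[j]) in CAN_PAIR for seq in aln)


def nussinov_alignment(aln: list[str], min_loop: int = 0) -> tuple[int, str]:
    """Maximise the number of base pairs over an alignment (+1 per pair)."""
    n = len(aln[0])
    # score[i][j] holds score[i, j] on 0-based columns; entries with i >= j stay 0.
    score = [[0] * n for _ in range(n)]

    def paired(i: int, j: int) -> int:
        """Score of the 'i pairs with j' case, or -1 if the pair is not allowed."""
        if j - i - 1 < min_loop or not can_pair(aln, i, j):
            return -1
        return (score[i + 1][j - 1] if i + 1 <= j - 1 else 0) + 1

    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            score[i][j] = max(
                score[i + 1][j],   # i unpaired
                score[i][j - 1],   # j unpaired
                paired(i, j),      # aln[i] . aln[j] and str[i+1, j-1]
                *(score[i][k] + score[k + 1][j] for k in range(i + 1, j)),  # split at k
            )

    # Traceback: recover one optimal str in dot-bracket notation.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if score[i][j] == score[i + 1][j]:
            stack.append((i + 1, j))
        elif score[i][j] == score[i][j - 1]:
            stack.append((i, j - 1))
        elif score[i][j] == paired(i, j):
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            k = next(k for k in range(i + 1, j)
                     if score[i][j] == score[i][k] + score[k + 1][j])
            stack.extend([(i, k), (k + 1, j)])
    return score[0][n - 1], "".join(structure)


aln = ["GUCUGGAC", "GACUGGUC", "GGCUGGCC", "GCCUGGGC"]
print(nussinov_alignment(aln, min_loop=3))  # (2, '((....))')
```

With `min_loop=0`, the same alignment yields four pairs, `(((())))`, including pairs that enclose no unpaired columns at all. That is allowed by the bare recursion but unrealistic for RNA, which is why a minimum loop length is a common addition.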
10. Nussinov and alignments
● Scoring base pairs
● on one sequence: +1 per base pair
● on an alignment: +1 + $M_{ij}$ per base pair between columns $i$ and $j$
● Base pairs between columns with high mutual information are favoured
● Other scoring schemes?
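Swapping the scoring scheme only changes the value added when $i$ pairs with $j$. The sketch below keeps the four-case recursion and takes the per-pair score as a function, comparing +1 with +1 + $M_{ij}$. It assumes Watson-Crick plus G-U wobble pairs and a hypothetical `min_loop` parameter (minimum number of unpaired columns enclosed by a pair; 0 follows the slide recursion). All function names are illustrative.

```python
import math
from collections import Counter
from collections.abc import Callable

# Assumed pairing rules (not given on the slides): Watson-Crick pairs plus G-U wobble.
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def mi(col_i: str, col_j: str) -> float:
    """Mutual information M_ij (bits) between two columns given as strings."""
    n = len(col_i)
    f_i, f_j, f_ij = Counter(col_i), Counter(col_j), Counter(zip(col_i, col_j))
    return sum((c / n) * math.log2((c / n) / ((f_i[a] / n) * (f_j[b] / n)))
               for (a, b), c in f_ij.items())


def unit_score(col_i: str, col_j: str) -> float:
    return 1.0                      # +1 per base pair


def covariation_score(col_i: str, col_j: str) -> float:
    return 1.0 + mi(col_i, col_j)   # +1 + M_ij per base pair


def best_score(aln: list[str], pair_score: Callable[[str, str], float],
               min_loop: int = 0) -> float:
    """Best total score over the four cases with a pluggable per-pair score."""
    n = len(aln[0])
    cols = ["".join(seq[k] for seq in aln) for k in range(n)]
    S = [[0.0] * n for _ in range(n)]  # S[i][j]: best score on columns i..j (0-based)
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(S[i + 1][j], S[i][j - 1])  # i unpaired or j unpaired
            if j - i - 1 >= min_loop and all(p in CAN_PAIR for p in zip(cols[i], cols[j])):
                inner = S[i + 1][j - 1] if i + 1 <= j - 1 else 0.0
                best = max(best, inner + pair_score(cols[i], cols[j]))  # i pairs with j
            for k in range(i + 1, j):  # split into [i, k] and [k+1, j]
                best = max(best, S[i][k] + S[k + 1][j])
            S[i][j] = best
    return S[0][n - 1]


aln = ["GUCUGGAC", "GACUGGUC", "GGCUGGCC", "GCCUGGGC"]
print(best_score(aln, unit_score, min_loop=3), best_score(aln, covariation_score, min_loop=3))
# 2.0 4.0
```

Both schemes pick the same two pairs here, but under +1 + $M_{ij}$ the covarying pair (2, 7) contributes 3 while the conserved pair (1, 8) contributes only 1, so on larger alignments the covariation bonus can tip the choice between competing structures.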