2. Why?
● RNA molecules can be classified as:
  ● messenger (coding) RNA
  ● non-coding RNA
● Non-coding RNAs have a wide range of functions, which are (believed to be) determined by their tertiary structure
● The scaffold for the tertiary structure is provided by the 2nd structure
Figure: tertiary and 2nd structure of yeast phenylalanine tRNA
3. RNA sequence
● RNA (RiboNucleic Acid) molecules are very similar to DNA (DeoxyriboNucleic Acid) molecules
● Each molecule is made of a chain of nucleotides (bases). There are only four nucleotides. Thus, the sequence (or primary structure) of the RNA molecule can be represented as a string over the alphabet {A, C, G, U}
A = adenine, C = cytosine, G = guanine, U = uracil
4. RNA 2nd structure
● Unlike DNA, RNA is produced as a single-stranded molecule, which then folds to form base pairs (2nd structure)
● The typical base pairs are:
  ● canonical (Watson-Crick) base pairs:
    – A and U
    – C and G
  ● non-canonical (wobble) base pairs:
    – G and U
● RNA can form other base pairs, but they occur with very low frequency
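The pairing rules above can be written as a small lookup. This is a minimal sketch: the names `ALLOWED_PAIRS` and `can_pair` are illustrative, not from the slides, and only the three typical pairs listed above are accepted.

```python
# Typical base pairs from the slide: Watson-Crick (A-U, C-G) plus the G-U pair.
# Each pair is stored as an unordered set, so the order of the bases does not matter.
ALLOWED_PAIRS = {frozenset("AU"), frozenset("CG"), frozenset("GU")}


def can_pair(x: str, y: str) -> bool:
    """Return True if bases x and y form one of the typical base pairs."""
    return frozenset((x, y)) in ALLOWED_PAIRS


print(can_pair("G", "U"))  # True
print(can_pair("A", "G"))  # False
```

Rarer pairs, which the slide mentions but does not list, are deliberately left out of this sketch.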
5. RNA 2nd structure representation
● Primary structure
A C A G U A G G U G U C
● The sequence of base pairs
{1 · 11, 2 · 10, 3 · 9}
● Bracket
( ( ( . . . . . ) ) ) .
● Dome
● Standard graphical representation
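The base-pair set and the bracket representation carry the same information, so one can be converted into the other. The sketch below assumes 1-based positions as on the slide; the function names `pairs_to_bracket` and `bracket_to_pairs` are made up for illustration.

```python
def pairs_to_bracket(n, pairs):
    """Render 1-based base pairs (i, j), with i < j, as a bracket string of length n."""
    symbols = ["."] * n
    for i, j in pairs:
        symbols[i - 1] = "("
        symbols[j - 1] = ")"
    return "".join(symbols)


def bracket_to_pairs(bracket):
    """Recover the 1-based base pairs from a bracket string using a stack."""
    stack, pairs = [], set()
    for pos, ch in enumerate(bracket, start=1):
        if ch == "(":
            stack.append(pos)               # remember the opening position
        elif ch == ")":
            pairs.add((stack.pop(), pos))   # match with the most recent opening
    return pairs


pairs = {(1, 11), (2, 10), (3, 9)}
print(pairs_to_bracket(12, pairs))  # (((.....))).
```

The stack-based reverse conversion only works for structures without crossing pairs, since a single bracket type cannot encode them.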
6. Base pairs
● Any base can take part in at most one base pair
● Two base pairs i · j and k · l (with i < k) can be in one of three configurations:
  ● juxtaposed: i < j < k < l
  ● nested: i < k < l < j
  ● overlapping: i < k < j < l
● Overlapping base pairs form a pseudoknot
● A 2nd structure without pseudoknots can be represented as a planar graph
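The three configurations can be told apart by comparing the endpoints of the two pairs. A minimal sketch, assuming each pair is a tuple (i, j) with i < j and that the two pairs share no base; the function name `classify` is hypothetical.

```python
def classify(p, q):
    """Classify two base pairs as juxtaposed, nested, or overlapping."""
    (i, j), (k, l) = sorted([p, q])  # order the pairs so that i < k
    if j < k:
        return "juxtaposed"    # i < j < k < l: one pair ends before the other starts
    if l < j:
        return "nested"        # i < k < l < j: second pair lies inside the first
    return "overlapping"       # i < k < j < l: the pairs cross (pseudoknot)


print(classify((1, 5), (6, 10)))   # juxtaposed
print(classify((1, 10), (2, 9)))   # nested
print(classify((1, 5), (3, 8)))    # overlapping
```

A structure is free of pseudoknots exactly when no two of its pairs are classified as overlapping.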
7. RNA 2nd structure prediction
● Energy minimization
  ● predict a 2nd structure of least free energy
  ● based on primary structure only
  ● examples: Nussinov, Zuker's Mfold
● Comparative structure prediction
  ● predict 2nd structures for several sequences
  ● based on a prior (reliable) alignment
● Probabilistic models
  ● example: SCFGs (stochastic context-free grammars)
8. Nussinov
● Minimum energy is approximated by the maximum number of base pairs
● Calculate the best structure for small subsequences and work outwards to larger and larger subsequences
● Notation
  ● seq – the RNA sequence (over the alphabet {A, C, G, U})
  ● seq[i, j] – the RNA sequence from position i to position j
  ● str – the best 2nd structure for seq (over the alphabet { (, ), . })
  ● str[i, j] – the best 2nd structure for seq[i, j]
  ● score[i, j] – the number of base pairs in str[i, j]
9. Nussinov
● str[i, j] is the best of four cases:
  ● i unpaired and str[i+1, j]
  ● j unpaired and str[i, j-1]
  ● seq[i] · seq[j] (i and j form a base pair) and str[i+1, j-1]
  ● str[i, k] and str[k+1, j] for some i < k < j
10. Nussinov
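\[
\mathrm{score}[i, j] =
\begin{cases}
0 & \text{if } j - i < 2 \\[4pt]
\max
\begin{cases}
\mathrm{score}[i+1, j] \\
\mathrm{score}[i, j-1] \\
\mathrm{score}[i+1, j-1] + 1 & \text{if } \mathrm{seq}[i] \cdot \mathrm{seq}[j] \\
\max_{i < k < j-1} \left( \mathrm{score}[i, k] + \mathrm{score}[k+1, j] \right)
\end{cases} & \text{otherwise}
\end{cases}
\]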
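The recurrence can be sketched as a bottom-up table fill that processes subsequences in order of increasing length. This is an illustrative sketch, not the presenter's implementation: it uses 0-based indices (so score[1, 12] on the slides is `score[0][11]` here), and the names `ALLOWED_PAIRS` and `nussinov_scores` are assumptions.

```python
# Typical base pairs: A-U, C-G and G-U, stored as unordered sets.
ALLOWED_PAIRS = {frozenset("AU"), frozenset("CG"), frozenset("GU")}


def nussinov_scores(seq):
    """Fill the score table: score[i][j] = max number of base pairs in seq[i..j]."""
    n = len(seq)
    score = [[0] * n for _ in range(n)]   # cells with j - i < 2 stay 0
    for span in range(2, n):              # span = j - i, shortest subsequences first
        for i in range(n - span):
            j = i + span
            # Cases 1 and 2: i unpaired, or j unpaired.
            best = max(score[i + 1][j], score[i][j - 1])
            # Case 3: i and j form a base pair.
            if frozenset((seq[i], seq[j])) in ALLOWED_PAIRS:
                best = max(best, score[i + 1][j - 1] + 1)
            # Case 4: split into two independent parts, i < k < j - 1.
            for k in range(i + 1, j - 1):
                best = max(best, score[i][k] + score[k + 1][j])
            score[i][j] = best
    return score


score = nussinov_scores("ACAGUAGGUGUC")
print(score[0][-1])  # 4
```

Filling by increasing span guarantees that every cell read on the right-hand side of the recurrence has already been computed.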
11. Nussinov
Space? Time?
12. Nussinov
Space? O(n²)
Time? O(n³)
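One way to see these bounds, assuming the table is filled cell by cell as in the recurrence: the table has at most n² cells, which gives the space bound, and filling cell (i, j) examines at most j − i split points k. Summing that work over all cells gives the time bound:

```latex
\[
\sum_{1 \le i < j \le n} (j - i)
  = \sum_{d=1}^{n-1} d\,(n - d)
  = \frac{(n-1)\,n\,(n+1)}{6}
  = O(n^3)
\]
```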
13. Nussinov
● Initialization: score[i, j] = 0 if j − i < 2
   A  C  A  G  U  A  G  G  U  G  U  C
A  0  0
C  0  0  0
A  0  0  0  0
G  0  0  0  0  0
U  0  0  0  0  0  0
A  0  0  0  0  0  0  0
G  0  0  0  0  0  0  0  0
G  0  0  0  0  0  0  0  0  0
U  0  0  0  0  0  0  0  0  0  0
G  0  0  0  0  0  0  0  0  0  0  0
U  0  0  0  0  0  0  0  0  0  0  0  0
C  0  0  0  0  0  0  0  0  0  0  0  0
14. Nussinov
● Filled table (row i, column j holds score[i, j]); score[1, 12] = 4
   A  C  A  G  U  A  G  G  U  G  U  C
A  0  0  0  1  2  2  2  2  3  3  4  4
C  0  0  0  1  1  1  2  2  2  3  3  3
A  0  0  0  0  1  1  1  1  2  2  3  3
G  0  0  0  0  0  0  1  1  2  2  3  3
U  0  0  0  0  0  0  1  1  1  2  2  2
A  0  0  0  0  0  0  0  0  1  1  2  2
G  0  0  0  0  0  0  0  0  1  1  1  2
G  0  0  0  0  0  0  0  0  0  0  1  1
U  0  0  0  0  0  0  0  0  0  0  0  1
G  0  0  0  0  0  0  0  0  0  0  0  1
U  0  0  0  0  0  0  0  0  0  0  0  0
C  0  0  0  0  0  0  0  0  0  0  0  0
15. Backtracking
● Trace back from score[1, 12] through the filled table to recover a best 2nd structure
A C A G U A G G U G U C
( ( ( . ( . . ) ) ) ) .
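Backtracking can be sketched as follows: starting from the whole sequence, check which case of the recurrence produced each score and follow it. This is a hedged sketch with 0-based indices; the names `can_pair`, `fill_scores`, and `traceback` are illustrative. When several cases tie, the order in which they are tested decides which optimal structure is returned, so the result may differ from the structure on the slide while containing the same number of base pairs.

```python
ALLOWED_PAIRS = {frozenset("AU"), frozenset("CG"), frozenset("GU")}


def can_pair(x, y):
    """True if bases x and y form one of the typical base pairs."""
    return frozenset((x, y)) in ALLOWED_PAIRS


def fill_scores(seq):
    """Bottom-up fill of score[i][j], the max number of base pairs in seq[i..j]."""
    n = len(seq)
    score = [[0] * n for _ in range(n)]
    for span in range(2, n):
        for i in range(n - span):
            j = i + span
            best = max(score[i + 1][j], score[i][j - 1])
            if can_pair(seq[i], seq[j]):
                best = max(best, score[i + 1][j - 1] + 1)
            for k in range(i + 1, j - 1):
                best = max(best, score[i][k] + score[k + 1][j])
            score[i][j] = best
    return score


def traceback(seq, score):
    """Recover one optimal structure in bracket notation from the filled table."""
    structure = ["."] * len(seq)
    stack = [(0, len(seq) - 1)]          # intervals still to be explained
    while stack:
        i, j = stack.pop()
        if j - i < 2:
            continue                      # too short to contain a base pair
        if score[i][j] == score[i + 1][j]:
            stack.append((i + 1, j))      # i unpaired
        elif score[i][j] == score[i][j - 1]:
            stack.append((i, j - 1))      # j unpaired
        elif can_pair(seq[i], seq[j]) and score[i][j] == score[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"   # i pairs with j
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j - 1):           # find the split point
                if score[i][j] == score[i][k] + score[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return "".join(structure)


seq = "ACAGUAGGUGUC"
structure = traceback(seq, fill_scores(seq))
print(seq)
print(structure)
```

Each interval popped from the stack costs at most one scan over k, which is where the split-point search contributes to the backtracking time.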
16. Backtracking
Time?
17. Backtracking
Time? O(n²)