RNA structure analysis

RNA
• Ribonucleic acid
• Single-stranded nucleic acid polymer
• Carbohydrate is ribose
• Folds into unique structures guided by
complementary pairing between nucleotide
bases.
• To many people:
– “RNA is the passive intermediary messenger
between DNA genes and the protein
translation machinery”

Non-coding RNAs
• Many non-coding RNAs exist
– Adopt sophisticated 3D structures
– Catalyse biochemical reactions
• Different classes of non-coding RNAs participate in
different cellular processes:
– Gene expression regulation (miRNAs, piRNAs,
lncRNAs)
– RNA maturation (snRNAs, snoRNAs)
– Protein synthesis (rRNAs, tRNAs)

Three major types of RNA
• Messenger RNA (mRNA)
– Serving as a temporary copy of genes that is used as
a template for protein synthesis.
• Transfer RNA (tRNA)
– Functioning as adaptor molecules that decode the
genetic code.
• Ribosomal RNA (rRNA)
– Catalyzing the synthesis of proteins.

RNA secondary structure
• Unlike DNA, RNA is typically produced as
a single stranded molecule
• Then folds intramolecularly to form a
number of short base-paired stems.
• This base-paired structure is called the
secondary structure of the RNA.

Pseudoknots
• A nucleic acid secondary structure
containing at least two stem-loop
structures, in which half of
one stem is intercalated
between the two halves of
another stem

Levels of RNA structure:
• Primary: sequence
• Secondary: hairpins, bulges and internal loops
• Tertiary: A-minor motif, 3-way junction, pseudoknot, etc.

Elements of an
RNA secondary structure
• Loop: single stranded subsequence bounded by
base pairs
• Hairpin loop: a loop at the end of a stem
• Bulge (loop): single stranded bases occurring
within a stem
• Interior loop: single stranded bases interrupting
both sides of a stem
• Multi-branched loop/junctions: a loop from
which three or more stems radiate

Need
• Understanding the structures of RNA provides insights
into the functions of this class of molecules.
• Detailed structural information about RNA has significant
impact on understanding the mechanisms of a vast array
of cellular processes such as gene expression, viral
infection, and immunity.
• RNA structures can be experimentally determined using
X-ray crystallography or NMR techniques.
• However, these approaches are extremely time-consuming
and expensive.
• As a result, computational prediction has become an
attractive alternative.

RNA Sequence Analysis
• The sequence evolution of RNA is
constrained by the structure.
• It is possible to have two different RNA
sequences with the same secondary
structure.
• Drastic changes in sequence can often be
tolerated as long as compensatory
mutations maintain base-pairing
complementarity.

SRP: signal recognition particle RNA

Computational Prediction
• At present, there are essentially two types of methods for
RNA structure prediction.
• One is based on calculating the minimum free
energy over all possible combinations of potential
double-stranded regions derived from a single RNA
sequence.
• This can be considered an ab initio approach.
• The second is a comparative approach, which infers
structures from an evolutionary comparison of
multiple related RNA sequences.
• It involves identifying base covariation that
maintains the 2° and 3° structure of an RNA molecule during
evolution.
– Base covariation: the sequence changes while base-pairing interactions
are preserved.

1. Ab Initio
• Many secondary structures can be drawn for a given
sequence
• The number of possible secondary structures increases
exponentially with sequence length
– An RNA of 200 bases has over 10^50 possible base-paired
structures
• Goal: Distinguish the biologically correct structure from all the
incorrect structures.
• In searching for the lowest energy form, all possible base-pair
patterns have to be examined.
• There are several methods for finding all the possible base-
paired regions from a given nucleic acid sequence.
• Two methods
– Dot matrix method
– Dynamic programming method
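
The exponential growth mentioned above can be made concrete with a small counting sketch. It is an upper-bound illustration, not a count for any real sequence: it assumes that any two bases may pair, that structures are nested (no pseudoknots), and that a hairpin loop needs at least `min_loop` unpaired bases. The function name and the `min_loop` parameter are assumptions introduced here.

```python
def count_structures(n, min_loop=3):
    """Count nested secondary structures on n bases.

    Assumption: every pair of bases is allowed to pair, so this is a
    sequence-independent upper bound rather than a count for a real RNA.
    """
    s = [1] * (n + 1)  # s[m] = number of structures on m bases
    for length in range(min_loop + 2, n + 1):
        total = s[length - 1]  # case 1: the last base is unpaired
        # case 2: the last base pairs with base k (1-based), leaving
        # k - 1 bases to the left and length - k - 1 bases enclosed
        for k in range(1, length - min_loop):
            total += s[k - 1] * s[length - k - 1]
        s[length] = total
    return s[n]


if __name__ == "__main__":
    for n in (10, 50, 100, 200):
        print(n, count_structures(n))
```

Under these assumptions the count for 200 bases is far beyond what can be enumerated one structure at a time, which is why the methods listed above avoid explicit enumeration.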

Dot Matrices
• A simple dot matrix can find all possible base-pairing
patterns of an RNA sequence when the sequence is
compared with itself.
• Here dots are placed in the matrix to represent matching
complementary bases instead of identical ones.
• The diagonals perpendicular to the main diagonal
represent regions that can self-hybridize to form
double-stranded structure with traditional A–U and G–C
base pairs.
• In reality, pattern detection in a dot matrix is often
obscured by high noise levels.
• One way to reduce the noise in the matrix is to select an
appropriate window size, i.e. a minimum number of
contiguous base matches.
• Normally, a window size of four consecutive base
matches is used.
• If the dot plot reveals more than one feasible structure,
the lowest energy one is chosen.
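
A minimal sketch of the dot matrix idea, assuming Watson–Crick pairs only and a minimum hairpin loop of three unpaired bases. The function names, the `window` and `min_loop` parameters and the 0-based `(i, j, length)` output format are illustrative choices, not part of any particular program.

```python
# Traditional Watson-Crick pairs only, as on the slide (no G-U wobble).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def dot_matrix(seq):
    """dots[i][j] is True when bases i and j are complementary."""
    n = len(seq)
    return [[(seq[i], seq[j]) in PAIRS for j in range(n)] for i in range(n)]


def find_stems(seq, window=4, min_loop=3):
    """Return (i, j, length) for each anti-diagonal run of at least
    `window` consecutive dots; the run pairs i+k with j-k (0-based)."""
    dots = dot_matrix(seq)
    n = len(seq)
    stems = []
    for i in range(n):
        for j in range(i + 1, n):
            if not dots[i][j]:
                continue
            # Skip cells that merely continue a run started further out.
            if i > 0 and j < n - 1 and dots[i - 1][j + 1]:
                continue
            length = 0
            # Walk inward along the anti-diagonal while bases still pair
            # and at least min_loop unpaired bases remain in between.
            while ((j - length) - (i + length) - 1 >= min_loop
                   and dots[i + length][j - length]):
                length += 1
            if length >= window:
                stems.append((i, j, length))
    return stems


if __name__ == "__main__":
    print(find_stems("GGGGAAAACCCC"))
```

Raising `window` removes short chance matches (the noise described above) at the cost of missing short genuine stems.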

Dynamic Programming
• The use of a dot plot can be effective in finding a single
secondary structure in a small molecule.
• However, if a large molecule contains multiple secondary
structure segments, choosing a combination that is
energetically most stable among a large number of
possibilities can be difficult.
• In the dynamic programming approach, an RNA sequence is compared with itself.
• A scoring scheme is applied to fill the matrix with match
scores based on Watson–Crick base complementarity.
• A path with the maximal score within a scoring matrix after
taking into account the entire sequence information
represents the most probable secondary structure form.
• The dynamic programming method produces one
structure with a single best score.
• This is a potential drawback, because in reality an RNA
may exist in multiple alternative forms with near-minimum
energy, and these are not necessarily the forms with the
maximum number of base pairs.

Algorithms for RNA secondary
structure prediction
• We need:
– An algorithm for evaluating the scores of all
possible structures
– A function that assigns the correct structure
the highest score
• Two methods:
– Nussinov folding algorithm
– Zuker folding algorithm

Nussinov folding algorithm
• Goal:
Find the structure with the most base pairs
• Nussinov introduced an efficient dynamic programming algorithm
for this problem
• A recursive algorithm that
– calculates the best structure for small subsequences and
– computes, for a given RNA sequence, the maximal number of base pairs
of any nested structure
• The Nussinov algorithm solves the problem of non-crossing RNA
secondary structure prediction by base-pair maximization
• Simplistic approach: does not give accurate structure predictions
– Misses nearest-neighbor interactions, stacking interactions and
loop-length preferences
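
The base-pair maximization described above can be sketched as follows. This is a minimal version under stated assumptions: only Watson–Crick pairs count, a hairpin loop must contain at least `min_loop` unpaired bases, and the result is reported in dot-bracket notation (a convention introduced here for display, where matching parentheses mark paired bases). When several structures tie, the traceback returns just one of them.

```python
WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def nussinov(seq, min_loop=3):
    """Return (max_pairs, dot_bracket) for one maximally paired nested structure."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # score[i][j] = max number of pairs in seq[i..j]; short spans stay 0.
    score = [[0] * n for _ in range(n)]

    def left_of(i, k):
        return score[i][k - 1] if k > i else 0

    # Fill from short subsequences to long ones.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = score[i][j - 1]  # base j left unpaired
            for k in range(i, j - min_loop):  # base j paired with base k
                if (seq[k], seq[j]) in WC_PAIRS:
                    best = max(best, left_of(i, k) + 1 + score[k + 1][j - 1])
            score[i][j] = best

    # Traceback: recover one structure that achieves the best score.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if score[i][j] == score[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if ((seq[k], seq[j]) in WC_PAIRS
                    and score[i][j] == left_of(i, k) + 1 + score[k + 1][j - 1]):
                structure[k], structure[j] = "(", ")"
                stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return score[0][n - 1], "".join(structure)


if __name__ == "__main__":
    print(nussinov("GGGGAAAACCCC"))
```

Because every pair scores the same, the sketch cannot prefer a stacked helix over scattered pairs, which is the limitation noted in the last bullet above.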

Zuker folding algorithm
• Most sophisticated secondary structure prediction
method for single RNAs
– An energy minimisation algorithm which assumes
that the correct structure is the one with the
lowest equilibrium free energy
• The equilibrium free energy of an RNA secondary
structure is approximated as the sum of individual
contributions from loops, base pairs and other secondary
structure elements.
• The minimum energy structure can be calculated
recursively by a dynamic programming algorithm
very similar to the one the Nussinov algorithm uses to
calculate the maximum base-paired structure

Suboptimal RNA folding
• The original Zuker algorithm finds only the
optimal structure.
• The biologically correct structure is often not the
calculated optimal structure.
• Suboptimal structures are structures that a
given sequence could fold into aside from the
minimum free energy structure
• Zuker introduced a suboptimal folding algorithm.
• The algorithm samples one base pair
suboptimally.
• The rest of the structure is the optimal structure
given that base pair.

Zuker folding algorithm
• Difference from the Nussinov folding algorithm:
– Energies of stems are calculated by adding stacking
contributions for the interface between
neighbouring base pairs instead of individual
contributions for each pair.
• Advantage:
– Better fit to experimentally observed equilibrium free
energy values for RNA structures, but it complicates
the dynamic programming algorithm
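
To illustrate what "adding stacking contributions" means, the sketch below scores a stem by summing one term per interface between neighbouring base pairs, so a stem of n pairs contributes n - 1 terms. The energy values are hypothetical placeholders chosen for readability, not measured thermodynamic parameters, and the function names are introduced here for illustration only.

```python
def stack_energy(pair_a, pair_b):
    """Hypothetical stacking energy (kcal/mol) for two adjacent base pairs.

    Placeholder rule, NOT a measured parameter set: -1.0 per stack,
    plus an extra -1.0 for each G-C pair taking part in it.
    """
    gc_count = sum(1 for p in (pair_a, pair_b) if set(p) == {"G", "C"})
    return -1.0 - gc_count


def stem_energy(pairs):
    """Sum stacking terms over each interface between neighbouring pairs."""
    return sum(stack_energy(a, b) for a, b in zip(pairs, pairs[1:]))


if __name__ == "__main__":
    stem = [("G", "C"), ("G", "C"), ("A", "U")]
    print(stem_energy(stem))  # two interfaces: (GC, GC) and (GC, AU)
```

An isolated base pair has no neighbouring pair and therefore contributes nothing here, whereas a per-pair scheme such as Nussinov's would still reward it.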

RNA Analysis Tools
• MFOLD: prediction of RNA Secondary Structure by
Energy Minimization (Zuker)
• RNAfold: calculate secondary structures of RNAs
• RNAeval: calculate energy of RNA sequences on given
secondary structure
• RNAheat: calculate specific heat of RNAs
• RNAdistance: calculate distances of RNA secondary
structures
• RNApdist: calculate distances of thermodynamic RNA
secondary structure ensembles
• RNAinverse: find RNA sequences with given secondary
structure

RNA Analysis Tools
• RNAsubopt: calculate suboptimal secondary
structures of RNAs
• tRNAscan-SE: detection of transfer RNA genes
• FAStRNA: predicts potential tRNA genes in genomic
DNA sequences.
• FAStRNA-CM relies on a probabilistic model.
• FAStRNA-CLASS relies on a pattern-matching
approach.
• palindrome: Looks for inverted repeats in a nucleotide
sequence (EMBOSS).
• RNAGA: Prediction of common secondary structures of
RNAs by genetic algorithm (Chen, Le, Maizel)

MFOLD
• Predicts energetically most stable structure of an
RNA molecule.
• Also uses covariance information from
phylogenetically related sequences.
• Includes methods for graphic display of predicted
molecule.
• Used for sequences shorter than 1000 nucleotides.
• Computationally demanding
• Has N^3 time complexity, where N is the sequence length
• Doubling the sequence length increases computation time
up to 8 times
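
The factor of 8 follows directly from cubic scaling. As a quick check, assuming the running time T is proportional to N^3 and ignoring constant factors and lower-order terms:

```latex
\frac{T(2N)}{T(N)} = \frac{(2N)^3}{N^3} = 2^3 = 8
```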

2. Comparative Approach
• The comparative approach uses multiple
evolutionarily related RNA sequences to infer
a consensus structure.
• This approach is based on the assumption that
RNA sequences deemed to be homologous
fold into the same secondary structure.
• By comparing related RNA sequences, an
evolutionarily conserved secondary structure
can be derived.

Covariation
• To distinguish the conserved secondary structure among
multiple related RNA sequences, a concept of
“covariation” is used.
• It is known that RNA functional motifs are structurally
conserved.
• To maintain the secondary structure while the
homologous sequences evolve, a mutation occurring in
one position that is responsible for base pairing should
be compensated for by a mutation in the corresponding
base-pairing position, so as to maintain base pairing and the
stability of the secondary structure.
• Any lack of covariation can be deleterious to the RNA
structure and functions.
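
A simplified sketch of how covariation might be spotted in an alignment. It flags a pair of columns when every sequence can form a Watson–Crick pair there and at least two different pair types occur, i.e. the bases changed but pairing was preserved. This is an illustrative rule with assumed names; it ignores gaps, G–U wobble pairs, column distance and statistical significance, all of which real tools have to handle.

```python
WC = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def covarying_columns(alignment):
    """Return 0-based column pairs (i, j) showing compensatory changes.

    `alignment` is a list of equal-length aligned RNA sequences.
    """
    length = len(alignment[0])
    hits = []
    for i in range(length):
        for j in range(i + 1, length):
            pairs = {(s[i], s[j]) for s in alignment}
            # All sequences pair at (i, j), and the pair type varies.
            if pairs <= WC and len(pairs) >= 2:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    aln = [
        "GCAAAAGC",
        "ACAAAAGU",
        "CCAAAAGG",
    ]
    print(covarying_columns(aln))
```

In the example, columns 1 and 6 also pair in every sequence, but they are fully conserved, so they offer no covariation evidence and are not reported.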

Consensus drawing
• Another aspect of the comparative method is to select a
common structure through consensus drawing.
• Because predicting secondary structures for each
individual sequence may produce errors, by comparing
all predicted structures of a group of aligned RNA
sequences and drawing a consensus, the commonly
adopted structure can be selected; many other possible
structures can be eliminated in the process.
• The comparative-based algorithms can be further
divided into two categories based on the type of input
data.
• One requires predefined alignment and the other does
not.

Algorithms That Use Prealignment
• This type of algorithm requires the user to provide a
pairwise or multiple alignment as input.
• The sequence alignment can be obtained using standard
alignment programs such as T-Coffee, PRRN, or Clustal.
• Based on the alignment input, the prediction programs
compute structurally consistent mutational patterns such
as covariation and derive a consensus structure
common for all the sequences.
• In practice, the consensus structure prediction is often
combined with thermodynamic calculations to improve
accuracy.
• This type of program is relatively successful for
reasonably conserved sequences.

RNAalifold
• RNAalifold (http://rna.tbi.univie.ac.at/cgi-bin/alifold.cgi)
is a program in the Vienna
package.
• It uses a multiple sequence alignment as
input to analyze covariation patterns in the
sequences.
• A scoring matrix is created that combines
minimum free energy and covariation
information.
• Dynamic programming is used to select the
structure that has the minimum energy for the
whole set of aligned RNA sequences.

Algorithms That Do Not Use
Prealignment
• This type of algorithm simultaneously aligns
multiple input sequences and infers a consensus
structure.
• The alignment is produced using dynamic
programming with a scoring scheme that
incorporates sequence similarity as well as
energy terms.
• Because the full dynamic programming for
multiple alignment is computationally too
demanding, currently available programs limit
the input to two sequences.

Foldalign
• Foldalign (http://foldalign.kvl.dk/server/index.html) is a
web-based program for RNA alignment and structure
prediction.
• The user provides a pair of unaligned sequences.
• The program uses a combination of Clustal and dynamic
programming with a scoring scheme that includes
covariation information to construct the alignment.
• A commonly conserved structure for both sequences is
subsequently derived based on the alignment.
• To reduce computational complexity, the program
ignores multibranch loops and is only suitable for
handling short RNA sequences.
