SlideShare a Scribd company logo
1 of 12
An analysis of large rRNA sequences folded by a thermodynamic
method
Dana S Fields1 and Robin R Gutell1,2
Background: The secondary structure of RNA can be predicted by the
thermodynamics-based method of Zuker and Turner. The accuracy of the
method’s secondary structure predictions for rRNA can be assessed by using as
reference the currently available rRNA secondary structure models that have
been derived from comparative analysis of rRNA sequence alignments.
Results: We folded 72 23S rRNA sequences with the Zuker–Turner method
and scored the resulting secondary structure predictions against the
comparative model. Empirically, trends in the score were observed as a function
of the phylogenetic memberships of the sequences and as a function of the base
pairs secondary structural contexts. Further, three parameters were found that
(anti-)correlate with the score.
Conclusions: Three semiquantitative predictors of score were found: % of
noncanonical base pairs, % of hairpin loops that were stable tetraloops, and
sequence %G+C. The folding of rRNA is a tractable problem and
thermodynamics-based folding algorithms, in particular, are useful in the study of
this folding problem even for large RNA molecules (e.g. 16S and 23S rRNA).
Introduction
The translation of mRNA into protein is a process central
to all known forms of life on Earth. The molecular
machine that performs this process of decoding and con-
struction is the ribosome. Hence, the nature of the
(self)organization of the ribosome itself from its con-
stituent molecular parts is of great concern to biologists
and chemists alike.
Basically, the ribosome is composed of three primary
rRNA molecules (5S, 16S or 16S-like, and 23S or 23S-like
rRNA) and, depending on the particular organism under
study, a host of ∼ 50–80 different proteins [1,2]. The sec-
ondary structure for 5S, 16S, and 23S rRNA can be
inferred by comparative analysis on an appropriate align-
ment of rRNA sequences [3–5]. The comparative method
assumes that the sequences of rRNA have evolved so as to
preserve a three-dimensional structure (and function)
common to all organisms. It does not assume anything
about the thermodynamic stability or the kinetic history of
the proposed structure. In fact, it does not comment at all
on the time dependence of any structural feature.
Comparative analysis has already been performed on 5S,
16S, and 23S rRNA (reviewed in [3–5]). Importantly, due
to the ubiquitous lack of experimentally derived rRNA
secondary structures, comparatively derived rRNA sec-
ondary structure models currently serve as the reference
or ‘true’ structures of the three rRNAs. Experiments do,
however, provide a large body of evidence in confirmation
of the comparatively derived rRNA structures [1,2].
With this valuable point of reference in hand, the rRNA
secondary structure predictions of any computation proce-
dure can be gauged. In particular, we want to assess the
accuracy of the secondary structure predictions of the
heavily used method of Zuker and Turner [6–9] as it
applies to rRNA. Essentially, the method folds the input
RNA sequence according to a set of experimentally
derived thermodynamic values and returns the (not neces-
sarily unique) secondary structure of minimum free
energy — this is called the energetically optimal folding.
At the same time, the method also returns a number of
secondary structures of lesser stability pursuant to several
other defined parameters; these structures are called ener-
getically suboptimal foldings.
Recently, we submitted 56 16S rRNA sequences from the
three primary lines of descent (Archaea, (Eu)bacteria, and
Eucarya [10]) and the two eucaryal organelles (chloroplast
and mitochondria) to the Zuker–Turner method [11]. A
subsequent comparison of each of the 56 energetically
optimal foldings to their respective comparatively derived
structures revealed that the Zuker–Turner method had,
on average, correctly predicted 46 ± 17% (mean ± standard
deviation [SD]) of the true secondary structure. The
Addresses: 1Department of Molecular, Cellular, and
Developmental Biology, Campus Box 347,
University of Colorado at Boulder, Boulder, CO
80309-0347, USA. 2Department of Chemistry and
Biochemistry, Campus Box 215, University of
Colorado at Boulder, Boulder, CO 80309-0215,
USA. DS Fields e-mail: Dana.Fields@Colorado.edu.
Correspondence: Robin R Gutell at Department of
Chemistry and Biochemistry, Campus Box 215,
University of Colorado at Boulder, Boulder, CO
80309-0215, USA.
e-mail: Robin.Gutell@Colorado.Edu
Key words: comparative analysis, rRNA, RNA folding,
RNA secondary structure, Zuker–Turner method
Received: 17 Jun 1996
Revisions requested: 15 Jul 1996
Revisions received: 14 Aug 1996
Accepted: 16 Aug 1996
Published: 09 Oct 1996
Electronic identifier: 1359-0278-001-00419
Folding & Design 09 Oct 1996, 1:419–430
© Current Biology Ltd ISSN 1359-0278
Research Paper 419
maximum score was 81% for an archaeal sequence and the
minimum was 10% for a eucaryal sequence. In general,
the high-to-low trend in score and the mean score of each
group (in parentheses) was: 16S Archaea (69%) > Eubacte-
ria (55%) > chloroplast (48%) > mitochondria (31%) =
Eucarya (30%). These results were very interesting
because they showed that, in some cases, the thermody-
namics-based method of Zuker and Turner could success-
fully elucidate the biologically relevant form of a large
RNA sequence such as 16S rRNA, which is about 1500
bases in length. (We believe that a score of 46% is mean-
ingful in the context of a 10% minimum score and an 81%
maximum score.)
In the present paper, we continue this investigation of the
applicability of the Zuker–Turner method by presenting
the outcome of the folding of 72 phylogenetically and
structurally distinct 23S rRNA sequences (see Table 1) by
the said method (see Materials and methods). But even
more, we identify several properties of the comparatively
derived secondary structures that serve as semiquantita-
tive predictors of the accuracy of the secondary structure
predictions of the Zuker–Turner method. In the Results
section, we start by providing a summary of the vast
amount of information obtained from the 72 23S rRNA
foldings by working with average scores and we also take
this opportunity to compare and contrast these results
with our previous findings on 16S rRNA [11]. The conclu-
sions drawn here represent a set of empirical observations
on the performance of the Zuker–Turner method. We
then switch from using average scores to looking at indi-
vidual scores as a function of several properties of the
comparatively derived secondary structures; in this way,
we are able to identify the predictors. The predictors are
the % of noncanonical base pairs, the % of hairpin loops
that are stable tetraloops, and the sequence %G+C. Here
we keep our speculation on the causality of the predictive
relationship to a minimum; the important point is that we
demonstrate that this complex problem does indeed have
‘handles’ with which it can be understood. In the Discus-
420 Folding & Design Vol 1 No 6
Table 1
Selection of 23S rRNA.
Archaea (8)
Euryarchaeota Halobacterium marismortui, Halococcus morrhuae, Methanobacterium thermoautotrophicum,
Methanococcus vannielii, Thermococcus celer, Thermoplasma acidophilum
Crenarchaeota Sulfolobus solfataricus, Thermoproteus tenax
(Eu)bacteria (13)
Thermotoga Thermotoga maritima
Deinococcus + relatives Thermus thermophilus
Spirochaetes + relatives Borrelia burgdorferi
Purple bacteria Campylobacter coli, Escherichia coli, Pseudomonas aeruginosa, Pseudomonas cepacia,
Rhodobacter sphaeroides (A)
Cyanobacteria Anacystis nidulans (new name = Synechoccus sp. 6301)
Gram-positive Bacillus subtilis, Frankia sp., Mycobacterium leprae, Streptomyces ambofaciens
Chloroplast (10)
Protista Astasia longa, Chlamydomonas eugametos, Chlamydomonas reinhardtii, Chlorella ellipsoidea,
Euglena gracilis, Palmaria palmata
Plantae Alnus incana, Marchantia polymorpha, Nicotiana tabacum, Zea mays
Mitochondria (18)
Protista Acanthamoeba castellanii, Chondrus crispus, Dictyostelium discoideum, Paramecium tetraurelia,
Prototheca wickerhamii, Tetrahymena pyriformis (R)
Fungi Penicillium chrysogenum, Saccharomyces cerevisiae, Schizosaccharomyces pombe
Plantae Marchantia polymorpha, Zea mays
Animalia Caenorhabditis elegans, Crossostoma lacustre, Drosophila melanogaster, Gallus gallus, Homo sapiens,
Paracentrotus lividus, Xenopus laevis
Eucarya (23)
Archezoa Giardia intestinalis, Giardia muris
Protista Chlorella ellipsoidea, Crithidia fasciculata, Euglena gracilis, Physarum polycephalum, Phytophthora megasperma,
Prorocentrum micans, Tetrahymena thermophila, Toxoplasma gondii-P, Trypanosoma brucei
Fungi Cryptococcus neoformans B, Pneumocystis carinii, Saccharomyces cerevisiae
Animalia Aedes albopictus, Caenorhabditis elegans, Drosophila melanogaster, Herdmania momus, Homo sapiens,
Mus musculus, Xenopus laevis
Plantae Arabidopsis thaliana, Oryza sativa
sion, we assume that the biologically relevant secondary
structure of rRNA is in fact the structure of global
minimum free energy. By doing this, we provide a frame-
work within which we can speculate on the biological
meaning of the results.
Results
The scoring scheme
For any one rRNA sequence, the Zuker–Turner method
returns an energetically optimal folding and a number of
energetically suboptimal foldings, and we obtain a sepa-
rate score for each of these alternative structures. The
score for any particular energetically optimal or suboptimal
folding is simply the percentage of the canonical base
pairs (i.e. AU-, CG-, and GU-containing base pairs) that
are present in the sequence’s comparatively derived sec-
ondary structure [12,13] that are proposed to exist in the
folding. For example, the comparatively derived sec-
ondary structure of 23S rRNA of Escherichia coli has 801
canonical base pairs. The energetically optimal folding of
the E. coli 23S rRNA sequence correctly predicts 466 of
these base pairs, yielding a score of 58%. For each of the
72 23S rRNA sequences studied here, we have mapped
the base pairs that were correctly predicted in the energet-
ically optimal foldings onto the comparatively derived sec-
ondary structures (Materials and methods; see also
http://pundit.colorado.edu:8080/root.html).
The definition of score given here is useful because it
straightforwardly reflects the amount of secondary struc-
ture correctly predicted by the folding algorithm that it
was theoretically able to predict. The Zuker–Turner
method, unlike the comparative approach, does not
predict noncanonical base pairs, so excluding noncanoni-
cal base pairs from the score allows it to range from 0 to
100%. This definition does not address the larger issue of
what is happening to the single-stranded regions, nor does
it examine the energetically optimal and suboptimal fold-
ings in the regions where the true secondary structure was
not correctly predicted to see what structure, if any,
formed instead. These issues must eventually be
addressed and a score must be defined to suit this need,
but these are not in the scope of this paper.
The prediction of RNA secondary structure with the
Zuker–Turner method tacitly assumes that the structure
of global minimal free energy is in fact the structure of
biological relevance. Hence, the scores given in this paper
always refer to the score of the energetically optimal fold-
ings computed in the aforementioned base pair based
manner, except when we explicitly note that we are
working with the scores of the highest scoring energeti-
cally suboptimal foldings (i.e. the best suboptimal fold-
ings) or otherwise. Finally, it is not uncommon for
secondary structures to be scored against their respective
comparatively derived structures by counting the number
of helices that were correctly predicted. According to
Zuker et al. [14], a helix is defined as a region of double-
stranded RNA of at least three base pairs where interrup-
tions of internal or bulge loops of up to two unpaired bases
are allowed. A helix is then said to be correctly predicted
in the folding when all the base pairs comprising the helix
are correctly predicted (in the aforementioned sense) with
the exception of at most two base pairs. We used this
helix-based definition of score alongside our own base pair
based definition of score only in Figure 1.
Summary of the 23S rRNA foldings
General trends in the scores
The 23S rRNA sequence of highest score was the archaeal
sequence of Thermococcus celer at 74%, while the lowest
scoring 23S rRNA sequence was that of the chloroplast of
Astasia longa at 19%. The mean score ± SD over all 72 23S
rRNA sequences was 44 ± 11%. Interestingly, this is
essentially the same as the score of 46 ± 17% that was
found in our previous work over all 56 16S rRNA
sequences [11]. This strongly suggests for 16S and 23S
rRNA that sequence length does not relate to the folding
score because 23S and 23S-like rRNAs are about twice the
length of 16S and 16S-like rRNAs.
Research Paper Analysis of 23S rRNA Fields and Gutell 421
Figure 1
Scoring of the secondary structure predictions of the method of Zuker
and Turner [6–9]. The horizontal axis shows the five phylogenetic
groups under study: arc, Archaea; eub, Eubacteria; chl, chloroplast;
mit, mitochondria; euc, Eucarya. The vertical axis is the group mean
score of the energetically optimal foldings of the 23S rRNA
sequences. The group mean scores in terms of the base pair based
counting method are denoted as ‘bp’ while the group mean scores in
terms of the helix-based counting method are denoted as ‘hel’. The
maximum and minimum scores are denoted by closed and open circles
respectively. The standard deviations are displayed as vertical lines.
100
90
80
70
60
50
40
30
20
10
100
50% %
bp hel
arc
bp hel
eub
bp hel
chl
bp hel
mit
bp hel
euc
The mean score of each group is presented in Figure 1
where the trend in the group mean scores is (mean score in
parentheses): 23S Archaea (59%) = Eubacteria (53%) >
chloroplast (39%) = mitochondria (38%) = Eucarya (41%).
The 23S and 16S rRNA cases are similar in that the
sequences of Archaea and Eubacteria are folded more accu-
rately on average by the Zuker–Turner method than are the
sequences of mitochondria and Eucarya. The two cases
differ in that the group mean scores for 23S Archaea, Eubac-
teria, and chloroplast are less than their corresponding values
in the 16S rRNA case, while in contrast, the group mean
scores of 23S mitochondria and Eucarya are larger than their
corresponding values in the 16S rRNA case. Taking into
account our previous comments on the overall mean scores,
one concludes that 16S and 23S rRNA secondary structures
are predicted with a similar pattern of accuracy (on average)
by the Zuker–Turner method, except that as one moves
from working with 16S to 23S rRNA there is a corresponding
shift in accuracy away from sequences of Archaea, Eubacte-
ria, and chloroplast to those of mitochondria and Eucarya.
From Figure 1, one can see that the range of prediction
success (i.e. maximum minus minimum score) is ∼ 22% for
Archaea and Eubacteria, and ∼ 35% for chloroplast, mito-
chondria, and Eucarya. These are slightly smaller than the
analogous ranges found for 16S rRNA, which were 28%
and 38% respectively. By using instead the scores of the
best suboptimal foldings, the 23S rRNA group mean
scores increase only 5–8%, but the aforementioned trend
in the 23S rRNA group mean score is preserved. Figure 1
also displays the group mean scores of the 23S rRNA fold-
ings in terms of the helix-based scoring scheme, where the
helix-based scheme clearly causes no significant change in
the results.
Average scores in terms of base pair distances
A base pair can be categorized by the difference in the
numerical positions in the primary sequence of its two
constituent bases. This type of categorization is useful
because it allows one to attempt to understand the scores
in terms of local versus global base-to-base pairing. The
inset of Figure 2 uses bins of size 100 nt to denote the
mean distribution of the base pair distances found in the
comparatively derived secondary structures of the 13 23S
rRNA eubacterial sequences; the analogous distributions
for Archaea, chloroplast, mitochondria, and Eucarya give
essentially the same pattern. All five phylogenetic groups
do have base pairs in the 23S rRNA comparatively derived
secondary structures with distances > 600 nt, the largest of
these reaching 3300 nt in the mitochondria. But since all
these combined make up only ~ 2% of the total in each
group, they are not shown.
The inset of Figure 2 reveals that the majority (∼ 75%) of
the base pairs in 23S rRNA comparatively derived sec-
ondary structures are separated by < 100 nt; these are
422 Folding & Design Vol 1 No 6
Figure 2
The inset shows the distribution of the base
pairs according to distance. The horizontal
axis shows the classification of the base pairs
in the eubacterial comparatively derived
secondary structures according to the
distance of separation of each base pair’s
constituent bases in the primary sequence; a
bin size of 100 nt is used. The vertical axis
denotes the average % of the base pairs over
the Eubacteria which fall into each of the
distance bins of the horizontal axis. The
standard deviation (SD) of each average was
no larger than 1.4%. In the main figure, the
scoring of the secondary structure predictions
of the method of Zuker and Turner are shown
with respect to base pair distance. The
horizontal axis shows the five phylogenetic
groups under study: arc, Archaea; eub,
Eubacteria; chl, chloroplast; mit, mitochondria;
euc, Eucarya. The vertical axis is the group
mean score of the energetically optimal
foldings of the 72 23S rRNA sequences
specific to the classification of the base pairs
according to the distance of separation of the
base pair’s constituent bases in the primary
sequence; a bin size of 100 nt is used. For the
100 nt bins, the SD was about 7% for arc and
eub, and about 11% for chl, mit, and euc. For
the 200 nt bins the SD was ~ 17% for all the
groups, except eub was ~ 10%. For the 300
to 600 nt bins, the SD was typically one- to
two-fold the value of the respective mean
score.
100
90
80
70
60
50
40
30
20
10
100
50% %
100200300400500600
arc
100200300400500600
eub
100200300400500600
chl
100200300400500600
mit
100200300400500600
euc
100
90
80
70
60
50
40
30
20
10
%
Comparative structure
100 200 300 400 500 600
called ‘short range’ while base pairs separated by > 100 nt
are called ‘long range’. Interestingly, this inset is similar to
its counterpart in our previous study of 16S rRNA [11]. So,
an important point to note is that even though 23S and
23S-like rRNAs are about twice the size of 16S and 16S-
like rRNA molecules, which of course gives 23S rRNA the
potential to form many more long-range interactions than
could be found in 16S rRNA, both rRNA molecules
appear to be using short-range base pairs to constitute the
bulk of their secondary structure.
Carrying on with the idea of binning each base pair in the
comparatively derived structures, for Figure 2 proper we
calculate a new set of group mean scores specific to each
bin. The most immediate conclusion from Figure 2 is that
within any one group the short-range base pairs are pre-
dicted better than the long-range base pairs. In support of
this, one observes that the mean scores of each of the five
100 nt bins exceeds its corresponding group mean score in
Figure 1 by ∼ 7%; no other distance bin has this behavior.
This same type of behavior was observed in the case of 16S
rRNA. The most notable pattern of difference between
the 23S and 16S rRNA cases is found in the 200 nt bin;
specifically, the 23S rRNA mean scores were greater than
their 16S rRNA counterparts by typically ∼ 20%.
Average scores with base pairs classified by loop closure
A base pair can be categorized according to the type of
loop that it closes. The categories (as exemplified in the
inset of Fig. 3) are hairpin loop closing (h), internal loop
closing (i), bulge loop closing (b), and multistem loop
closing (m). Categorizing base pairs in this way is useful
because it allows one to attempt to relate the scores with
the local structural features of RNA. Group mean scores
specific to each type of loop category were computed and
collected in Figure 3. Although the standard deviations in
these group means can be rather large (e.g. in Fig. 3, see
the m category in the mitochondria group), one can still
see that within each of the five phylogenetic groups,
hairpin loop closing base pairs are predicted better than
bulge and internal loop closers, which in turn score better
than the multistem loop closing base pairs.
The group mean scores specific to the hairpin loops all
exceed their respective group mean score in Figure 1 by
9–18% and this is not seen with the mean scores specific
to the other loop categories (except for the b category in
Eucarya in Fig. 3). By the nature of RNA structure, a base
pair that closes a hairpin loop is usually a short-range base
pair as well. So it is interesting to observe that the group
mean scores specific to the hairpin loops all exceed their
respective 100 nt bin mean scores in Figure 2 by 3–10%.
In the case of 16S rRNA, we observed basically these
same two patterns [11]. So overall, one concludes that base
pairs classified as being hairpin loop closers in 16S and 23S
rRNA are predicted better than base pairs classified as
closing the other three loop types.
Predictors of folding success
Score versus % of noncanonical base pairs
The Zuker–Turner method does not predict noncanoni-
cal base pairs even though this type of pairing does occur
Research Paper Analysis of 23S rRNA Fields and Gutell 423
Figure 3
The inset shows the base pair classification in
terms of loop closure. The categories are: h,
hairpin loop closing; i, internal loop closing; b,
bulge loop closing; m, multistem loop closing.
In the main figure, scoring of the secondary
structure predictions of the method of Zuker
and Turner is shown with respect to the four
loop closing classifications. The horizontal axis
shows the five phylogenetic groups under
study. The vertical axis is the group mean
score of the energetically optimal foldings of
the 72 23S rRNA sequences specific to the
classification of the base pairs according to
the type of loop that they close. The maximum
and minimum scores that contributed to each
of the loop-specific group mean scores are
denoted by a closed and open circle,
respectively. The standard deviations are
displayed as vertical lines.
100
90
80
70
60
50
40
30
20
10
100
50% %
h i b m h i b m h i b m h i b m h i b m
arc eub chl mit euc
′ ′
in comparatively derived secondary structures. In partic-
ular, for the 72 23S rRNA comparatively derived sec-
ondary structures studied here, the % of noncanonical
base pairs ranged from 2.9 to 9.9%. (By noncanonical, we
mean any pairing other than AU, UA, CG, GC, GU, or
UG.) In our earlier work on 16S rRNA [11], it was noted
qualitatively that the % of noncanonical base pairs
present in a comparatively derived secondary structure
was ‘inversely proportional’ to the folding score of the
corresponding primary sequence. To follow up on these
two clues, Figure 4 shows the score of the folding of each
of the 72 23S rRNA sequences plotted against their cor-
responding values of % noncanonical base pairs; Figure 5
gives the same information for the 56 16S rRNA
sequences. In Figure 4, the score decreases by ∼ 40%
with a 6% increase in % noncanonical base pairs, while in
Figure 5, the score decreases by ∼ 50% with an 8%
increase in % noncanonical base pairs. The linear correla-
tions coefficients (see Materials and methods) are –0.55
and –0.74 for Figures 4 and 5, respectively, which are
both significant at a level < 0.05%. We therefore con-
clude that the accuracy of the secondary structure predic-
tions of 16S and 23S rRNA made by the Zuker–Turner
method will decrease with increasing % noncanonical
base pairs.
Why are noncanonical base pairs detrimental to the score?
One reason for this may be that in the comparatively
derived secondary structures, noncanonical base pairs are
often internal to helices (see the secondary structure dia-
grams at http://pundit.colorado.edu:8080/root.html). So,
the closest the Zuker–Turner method can come to cor-
rectly reproducing helices bearing internal noncanonical
base pair(s) is to correctly form the canonical base pairs
while simultaneously leaving internal loop(s) where the
noncanonical base pair(s) should be. The destabilizing
influence of an internal loop negates the stabilizing contri-
bution of about two stacked dinucleotide units, and this
clearly presents a way by which noncanonical base pairs
can erode the accuracy of a secondary structure prediction.
Score versus % of hairpin loops that are ‘stable tetraloops’
Examination of the comparatively derived secondary
structures of the 72 23S rRNA sequences revealed that the
most frequently observed hairpin loops were of size four
(tetraloops) and, excepting triloops, the observed fre-
quency of each size of loop tended to increase with
decreasing loop size. Specifically, considering the five
phylogenetic group averages, tetraloops comprised
424 Folding & Design Vol 1 No 6
Figure 4
Score versus % of noncanonical base pairs (%NC) present in the
corresponding comparatively derived secondary structure. For each of
the 72 23S rRNA sequences folded, the score of the energetically
optimal folding is plotted against its %NC. The linear correlation
coefficient is –0.55.
Score
%NC
20.00
25.00
30.00
35.00
40.00
45.00
50.00
55.00
60.00
65.00
70.00
75.00
4.00 6.00 8.00 10.00
Figure 5
Score versus % of noncanonical base pairs (%NC) present in the
corresponding comparatively derived secondary structure. For each of
the 56 16S rRNA sequences folded in our previous work [11], the
score of the energetically optimal folding is plotted against its %NC.
The linear correlation coefficient is –0.74.
Score
%NC
10.00
15.00
20.00
25.00
30.00
35.00
40.00
45.00
50.00
55.00
60.00
65.00
70.00
75.00
80.00
6.00 8.00 10.00 12.00 14.00 16.00
26–35% of the total hairpin loops. In our study of 16S
rRNA [11,15], tetraloops were also found to be the most
frequently observed (30–50%) type of hairpin loop; the
nature and function of tetraloops in 16S rRNA [5,15] and
23S rRNA [5] has been discussed. The Zuker–Turner
method assigns a free energy contribution of about
+4.5 kcal to any tetraloop, but even more, it will simulta-
neously add an additional –2.0 kcal to the free energy con-
tribution of tetraloops with the following composition [7]:
GHGA, GVAA, and UWCG (H = not G, V = not U, W = A
or U [16]). Working with the comparatively derived sec-
ondary structures, the distribution of these three sub-
classes of tetraloop (which have been called ‘stable
tetraloops’) with respect to all tetraloops is shown in
Table 2. Clearly, the frequency of use of these stable
tetraloops is: GVAA > GHGA > UWCG. This same trend
was found in our study of 16S rRNA [11,15], although on
average the frequencies of use of the stable tetraloops
were higher by ∼ 15% than in the 23S rRNA case.
Given that the Zuker–Turner method recognizes these
so-called stable tetraloops, the % of hairpin loops that are
in fact stable tetraloops in the comparatively derived sec-
ondary structures (% hairpin loops that are stable
tetraloops [%HSTL] = number of stable tetraloops /
number of all hairpin loops) seems likely to be related to
the score. Therefore, the scores of the folding of each of
the 72 23S rRNA sequences are plotted against the corre-
sponding values of %HSTL in Figure 6; the kindred plot
for the 56 16S rRNA sequences is given in Figure 7. In
Figure 6 the score increases by ∼ 30% with a 20% increase
in %HSTL, while in Figure 7, the score increases by
∼ 50% with a 30% increase in %HSTL. The linear correla-
tion coefficients for Figures 6 and 7 are +0.51 and +0.66,
respectively, which are both significant at a level < 0.05%.
We conclude that the accuracy of the secondary structure
predictions of 16S and 23S rRNA made by the
Zuker–Turner method will increase with increasing %
hairpin loops that are stable tetraloops.
Attempting a quick rationalization of this correlation, one
might recall that the Zuker–Turner method simply adds a
stabilizing contribution to GHGA, GVAA, and UWCG
tetraloops. So continuing with the thought, it might then
be considered ‘obvious’ that the placement of these stabi-
lizing structures where a hairpin loop is required would
improve the score. However, this argument fails to con-
sider the specific pattern of distribution of the tetraloop
subsequences in each of the full-length sequences under
consideration; i.e. in each full-length sequence, there are a
number of subsequences that could form ‘stable
tetraloops’ (just by chance) and this presents a potentially
sequence-specific type of ‘background’ in which %
hairpin loops that are stable tetraloops must be inter-
preted. So further work is clearly needed to elucidate the
nature of this correlation.
Score versus sequence length
Perhaps the most obvious property of a secondary struc-
ture that one would expect to have an impact on the accu-
racy of its folding is the length of the sequence, because
the complexity of the folding problem increases with
increasing sequence size. In our earlier work with 16S
rRNA [11], we commented that score and length appear
not to be related on the basis of a simple survey of the
results. To place this idea on more solid ground for both
16S and 23S rRNA, we prepared plots (data not shown;
see Figs A,B at http://pundit.colorado.edu:8080/root.html)
of score versus sequence length where only the mitochon-
drial and eucaryal sequences were considered; we did not
use archaeal, eubacterial, or chloroplast sequences because
these sequences are very close in length and hence they
do not sample a broad length range. The linear correlation
coefficients for the two plots are +0.15 and +0.19 for the 41
mitochondrial and eucaryal 23S rRNA sequences and the
22 mitochondrial and eucaryal 16S rRNA sequences,
respectively. These coefficients indicate a correlation of
doubtful significance, so we conclude that score and
length are not related in the case of 16S and 23S rRNA (as
suggested earlier).
Score versus %G+C
The % of bases in a sequence that are G or C (%G+C) is
known to be useful in characterizing the behavior of
nucleic acids; for example, %G+C correlates with the
melting temperature of DNA [17]. Therefore, we plotted
%G+C against the score for the archaeal, eubacterial,
chloroplast, and mitochondrial 23S rRNA data (see Fig. 8),
and in a separate plot for the eucaryal 23S rRNA data (see
Fig. 9). Analogous plots were produced for 16S rRNA from
our earlier work [11] (data not shown; see Figs C,D at
http://pundit.colorado.edu:8080/root.html). The most
immediate impression from the examination of all of these
plots was that the 49 23S and 41 16S rRNA archaeal,
eubacterial, chloroplast, and mitochondrial data points cor-
relate positively with the score; i.e. the score increases
with increasing %G+C. Specifically, linear correlation
coefficients are +0.59 and +0.55 for the 23S and 16S rRNA
cases, respectively, which are both significant at a level of
Research Paper Analysis of 23S rRNA Fields and Gutell 425
Table 2
Distribution of the ‘stable tetraloops’ with respect to all
tetraloops in the comparatively derived secondary structures.
Group %GHGA %GVAA %UWCG Sum of %
Archaea 23 ± 9 26 ± 10 6 ± 4 55
Eubacteria 16 ± 4 27 ± 9 8 ± 3 51
Chloroplast 19 ± 6 22 ± 6 6 ± 4 47
Mitochondria 12 ± 11 20 ± 13 3 ± 4 35
Eucarya 17 ± 5 22 ± 9 10 ± 7 49
H = not G; V = not U; W = A or U [16].
< 0.1%. In contrast, the eucaryal data points for the 23 23S
and 15 16S rRNA sequences appeared to be anti-correlat-
ing with the score (see Fig. 9). Specifically, linear correla-
tion coefficients are –0.63 and –0.40 for the 23S and 16S
rRNA cases, respectively, which is significant at a level of
< 0.5% for the 23S rRNA case but which represents a
weak correlation (i.e. significance level of 14%) for the 16S
rRNA case. So, regarding the noneucaryal sequences of
16S and 23S rRNA, we conclude that %G+C correlates
with the score; for 23S eucaryal rRNA, we conclude that
an anti-correlation exists.
In our earlier paper on 16S rRNA [11], a plot of score
versus %G+C was considered. In this plot, the data points
belonging to Archaea, Eubacteria, and chloroplast were
displayed as a single group, while the data points belong-
ing to mitochondria and Eucarya were taken together.
With the results displayed in this way, the correlations and
anti-correlation that were noted in the previous paragraph
were not realized; the conclusion was instead that score
and %G+C did not seem to be related.
Discussion
The mechanism by which an rRNA molecule attains its
biologically relevant secondary and tertiary structure is not
known. Undoubtedly, this folding mechanism incorpo-
rates pathways guided by both kinetic and thermody-
namic constraints, and these pathways certainly include
proteins and other important factors [18,19]. Nevertheless,
for the sake of discussion here, we assume that the
‘driving force’ behind the in vivo folding of rRNA is the
global minimization of the free energy of the naked rRNA
molecule.
The overall average scores for the prediction of 16S and
23S rRNA were found to be 46 ± 17% and 44 ± 11%,
respectively. Under the assumption that the global mini-
mization of free energy determines the secondary structure,
why do the majority of the scores not approach 100% and
why are there significant differences in the folding success
of the different phylogenetic groups? In answer, first recall
the overall trend in prediction success displayed in Figure 1
where, on average, the mitochondrial and eucaryal
sequences did not fold well relative to the archaeal and
eubacterial sequences. This suggests that the free energy
elements used as input to the Zuker–Turner method may
need to be separately ‘optimized’ for each of the five phylo-
genetic groups. One must take care when discussing opti-
426 Folding & Design Vol 1 No 6
Figure 7
Score versus % of the hairpin loops that are ‘stable tetraloops’
(%HSTL) in the corresponding comparatively derived secondary
structure. For each of the 56 16S rRNA sequences folded in our
previous work [11], the score of the energetically optimal folding is
plotted against its %HSTL. The linear correlation coefficient is +0.66.
Score
%HSTL
10.00
15.00
20.00
25.00
30.00
35.00
40.00
45.00
50.00
55.00
60.00
65.00
70.00
75.00
80.00
10.00 20.00 30.00 40.00
Figure 6
Score versus % of the hairpin loops that are ‘stable tetraloops’
(%HSTL) in the corresponding comparatively derived secondary
structure. For each of the 72 23S rRNA sequences folded, the score
of the energetically optimal folding is plotted against its %HSTL.
The linear correlation coefficient is +0.51.
Score
%HSTL
20.00
25.00
30.00
35.00
40.00
45.00
50.00
55.00
60.00
65.00
70.00
75.00
5.00 10.00 15.00 20.00 25.00
mization. In an ideal setting at a given temperature, only
one set of free energy parameters is expected for all RNA
molecules. However, real systems are likely to maintain
different conditions of salt, pH, and other factors, so in this
sense ‘optimization’ might prove useful.
Second, there may be structural features (i.e. motifs) still
unknown to us that need to be identified and incorporated
into the algorithm; these unknown features could be
group-specific and could account for the trend in the score
in Figure 1. Analogously, the frequency of occurrence of
motifs in each of the five phylogenetic groups may be
related to the accurate prediction of base pairs and hence
the score; this relationship requires further study. The
aforementioned ‘optimization’ of the Zuker–Turner
method would need to account for the frequency of occur-
rence in motifs
Why are short-range base pairs predicted better than long-
range base pairs (see Fig. 2)? The Zuker–Turner method
does not penalize a base pair simply for being long range,
nor does it stabilize a base pair simply for being short
range. So, one must look beyond this convenient view in
order to answer the question. By definition, the con-
stituent bases that make up a short-range base pair are
separated by fewer nucleotides than are the constituent
bases of a long-range base pair. Hence, a smaller amount
of the overall rRNA structure is ‘closed’ by a short-range
base pair, and in contrast, a long-range base pair can close
a significant portion of the overall structure. This suggests
that the formation of a long-range base pair is a more
complex problem in three-dimensional space than is the
formation of a short-range base pair. Since the
Zuker–Turner method folds RNA by estimating the free
energy of only secondary structural features, the three-
dimensional information required to correctly predict
long-range base pairs is not inherent in the thermody-
namic energy values, and the result that short-range base
pairs are predicted better than long-range base pairs is
then not unexpected.
Why are base pairs that are classified as being hairpin loop
closers predicted better than base pairs classified as
closing bulge, internal, and multistem loops? By the con-
struction of RNA, base pairs that close hairpin loops are
very likely to be short-range base pairs as well. So this
question may be partially answered by the previous com-
ments regarding short-range base pairs. On the other
hand, hairpin loops have received much greater experi-
Research Paper Analysis of 23S rRNA Fields and Gutell 427
Figure 8
Score versus % of bases in the corresponding primary sequence that
are G or C (%G+C). For each of the 49 23S archaeal, eubacterial,
chloroplast, and mitochondrial rRNA sequences folded, the score of
the energetically optimal folding is plotted against its %G+C. The
linear correlation coefficient is +0.59.
Score
%G+C
20.00
25.00
30.00
35.00
40.00
45.00
50.00
55.00
60.00
65.00
70.00
75.00
20.00 30.00 40.00 50.00 60.00
Figure 9
Score versus % of bases in the corresponding primary sequence that
are G or C (%G+C). For each of the 23 23S eucaryal rRNA
sequences folded, the score of the energetically optimal folding is
plotted against its %G+C. The linear correlation coefficient is –0.63.
Score
%G+C
22.00
24.00
26.00
28.00
30.00
32.00
34.00
36.00
38.00
40.00
42.00
44.00
46.00
48.00
50.00
52.00
54.00
56.00
58.00
60.00
40.00 50.00 60.00 70.00
mental attention than have bulge, internal, and multistem
loops (i.e. irregularities within helices, which would also
include noncanonical base pairs). Given this disparate
level of attention, the Zuker–Turner method may simply
be accounting more accurately for base pairs in proximity
to hairpin loops than for base pairs in proximity to the
other three loop types; this could answer the current ques-
tion. To illustrate the complexity of the study of irregular-
ities in a helix, one needs to consider only the two
different arrangements of bulge loops of size one (see [3]).
In this case, the single base may ‘bulge out’ of the helix in
which it is situated (as its name implies), or instead the
base may be intercalated into its respective helix. Surely
these two arrangements should not be assigned the same
value of free energy by the Zuker–Turner method as is
currently the case? (One must use caution with arguments
that focus on the adjustment of a free energy input
element because, by necessity, the adjustment will affect
the secondary structure at both the global and the local
level.) Also, the free energy values currently used for mul-
tistem loops are estimations and a better understanding of
multistem loops would undoubtedly lead to an increase in
predictive accuracy [6,20].
Any time a prediction of RNA secondary structure is
made, one must be concerned with the accuracy of the
selfsame prediction. Regarding the Zuker–Turner
method, Zuker and co-workers have begun to address the
issue of accuracy in the case of 16S rRNA [20]. Specifi-
cally, they showed that predictions made by their method
are more reliable when the helices of the structure of
interest are predicted with a low level of competing struc-
tures in the suboptimal foldings. (They call this property
‘well determinedness’.) We approached the issue of accu-
racy for 16S and 23S rRNA by looking for predictors of
score and we semiquantitatively identified three such
parameters: % noncanonical base pairs, % hairpin loops
that are stable tetraloops, and %G+C.
Might claiming % noncanonical base pairs and % hairpin
loops that are stable tetraloops as predictors of score
simply be ‘too obvious’ to warrant our elaboration? Not at
all! First, this work has provided not only semiquantitative
proof that the relationships exist, but it has gauged the
magnitude of the relationship. Second, the fact that
several predictors have been found, and that their effect
on the score is measurable, shows that a continued investi-
gation of the accuracy of the Zuker–Turner method has
the potential of uncovering other secondary structure fea-
tures that might (anti-)correlate with the score. It was a
negative result that length was found not to be a predictor
of score for the 16S and 23S mitochondrial and eucaryal
sequences. Nevertheless, this fact was worth reporting
because it shows that the Zuker–Turner method should
not be disconsidered simply because one is working with
large sequences.
Predictors of score can be used in two ways. First, predic-
tors can point out where time and effort (experimental and
theoretical) should be focused to improve the method. All
the predictors noted here fall into this category and %
noncanonical base pairs is a very straightforward example.
Zuker and co-workers [20] have already suggested that
incorporation of rules to predict noncanonical base pairs is
likely to improve the accuracy of the method and our
demonstration of % noncanonical base pairs as a predictor
places this suggestion on solid ground. Second, using the
Zuker–Turner method as a way to play out the folding of
rRNA, predictors directly question the biological nature of
rRNA. We hinted at this above when we suggested “that
the free energy elements used as input to the
Zuker–Turner method may need to be separately ‘opti-
mized’ for each of the five phylogenetic groups.” The %
noncanonical base pairs is a (perhaps trivial) example of
this in that it forces one to notice that mitochondrial and
eucaryal 23S rRNA comparatively derived secondary
structures utilize noncanonical base pairs to a greater
extent than do the secondary structures of the other phy-
logenetic groups. In a contrasting fashion, one is forced to
note that % hairpin loops that are stable tetraloops is larger
for archaeal and eubacterial comparatively derived sec-
ondary structures than it is for structures of mitochondria
and Eucarya.
The case of %G+C in 23S rRNA provides a less superficial
example of this second usage of a predictor, because the
content of G+C in a eucaryal sequence exhibits a very dif-
ferent behavior in the execution of the algorithm than it
does in the other phylogenetic groups (see Figs 8,9). This
may infer that the folding strategy of eucaryal rRNA
differs in some (unknown) manner from the strategies of
the other phylogenetic groups. In the future, this differ-
ence could be investigated by charting the context of use
(e.g. is the base found predominantly in helices or in
single-stranded regions, or does it tend to close a particular
type of loop?) of the four base types in the comparatively
derived secondary structures. We speculate that this could
provide a basis of explanation for the behavior of %G+C as
a predictor of score, and yet at the same time, our bio-
chemical perspective of rRNA would expand in a funda-
mental way by learning how each of the four base types
are ‘used’ in building secondary structure in each of the
five phylogenetic groups.
In closing, we have presented in this paper the results of
the folding of 72 phylogenetically and structurally distinct
23S rRNA sequences by the Zuker–Turner method.
Using averages of scores, we have pointed out trends in
the accuracy of the prediction of the method by grouping
the sequences into five phylogenetic classes and by sub-
grouping the classes according to base pair distance and
loop closing category. This showed that base pairs that are
better understood (i.e. short-range and hairpin loop
428 Folding & Design Vol 1 No 6
closing base pairs) tend to be better predicted. To begin
to understand how the Zuker–Turner method can be
improved and, just as important, to probe the biology of
16S and 23S rRNA, we identified three semiquantitative
predictors of score (i.e. % noncanonical base pairs, %
hairpin loops that are stable tetraloops, and %G+C).
Finally, by assuming that the global minimization of free
energy is the ‘driving force’ behind the folding of 16S and
23S rRNA, we have discussed the results and showed that
much work remains to be done in this field.
Materials and methods
Diagrams of the 72 comparatively derived secondary structures of 23S
rRNA used here are publically available at: http://pundit.
colorado.edu:8080/root.html. Each diagram shows the comparatively
inferred base pairs and the base pairs that were correctly predicted by
the energetically optimal folding of the Zuker–Turner method. The most
immediate conclusion upon review of these structures is that the base
pairs that are correctly predicted are ‘short range’ with respect to their
positions in the primary sequence (i.e. < 100 bases apart). For each of
the 72 sequences, the score of the energetically optimal folding, the
score of the best suboptimal folding, the sequence length, and the
other values used in the Results section are also available.
Notation and abbreviations
Both 23S and 23S-like rRNA are referred to as 23S rRNA. Likewise,
16S and 16S-like rRNA are both referred to as 16S rRNA. Canonical
base pairs are Watson–Crick base pairs or GU or UG base pairs; any
other type of base pair is noncanonical. For the purposes of denoting
trends in the text we use ‘>’ in its standard role as ‘greater than’, except
that ‘=’ will be used whenever the values of two scores come within
90% of each other. We use the word ‘semiquantitative’ to describe our
results because we utilize linear correlation coefficients without provid-
ing or discussing the associated regression line.
Comparatively derived secondary structures
The comparatively derived secondary structures upon which all the
scores were based were secondary structures derived from compara-
tive analysis [3–5]. The 72 23S rRNA sequences used in this paper
and their respective comparatively derived secondary structures were
selected from recent compilations [12,13] or from data in preparation
for publication by MN Schnare, SH Damberger, MW Gray, and
RR Gutell. The sequences were selected so that phylogenetically and
structurally distinct comparatively derived secondary structures would
be represented. The sequences were similar in phylogenetic position-
ing and degree of structural variation to the 16S rRNA sequences ana-
lyzed previously [11]. The secondary structure diagrams were drawn
with the program XRNA (Weiser and Noller;
ftp://fangio.ucsc.edu/pub/XRNA/) on Sun Sparc workstations.
Zuker–Turner method
The Zuker–Turner method [6–9] used here was version 2.2 of the
MFOLD program and the free energy values were those for 37°C. At
the time of writing the MFOLD package was available at
ftp://snark.wustl.edu/pub/ and the MFOLD manual could be found at
http://ibc.wustl.edu/~zuker/seqanal/. We implemented MFOLD on a
Sun Sparc 10/51 and we set the user-defined parameters required for
the operation of MFOLD to the following values: % for sort (P%) = 10,
number of trace-backs = 50, and window size = 20. Each sequence
was submitted to MFOLD in its full-length form rather than being
broken into its known domains with subsequent submission of these
domain subsequences. Although this practice increased the time of
computation by several-fold (each folding took approximately 48 to 72
and 120 megabytes of memory), it did emphasize our pretense of
having no prior knowledge of the biologically relevant secondary struc-
ture. It is noted that the Zuker–Turner method has been used in a way
in which coaxial stacking and some noncanonical base pairs were
addressed [21]; we did not use this version.
Linear correlation coefficients.
Our goal was to find semiquantitative predictors of the accuracy of the
Zuker–Turner method and this ultimately required us to interpret scatter
plots of score versus the corresponding values of the suspected pre-
dictors. To police our qualitative interpretation of the scatter plots, we
computed the linear correlation coefficient without regression analysis
for each plot under consideration. Naturally, one does not expect that
the hypothetical function relating the score to any particular predictor
need be linear; nevertheless, from the multitude of standard statistics
that can be employed in any study, the extent to which our scatter plots
support a linear relation was a frugal choice. Letting x and y represent
the two values to be compared and with the bar denoting the mean
value the linear correlation coefficient was [22]:
⌺ (xi – x–) (yi – y–)
√⌺ (xi – x–)2 ⌺ (yi – y–)2 (1)
The linear correlation coefficients were computed in Fortran 77 on a
Sun Sparc workstation and the level of significance for each linear cor-
relation coefficient was determined from Table 3 in the appendix of [22].
Acknowledgements
This work was supported by grants from the NIH (GM48207) and the Col-
orado RNA Center. The WH Keck Foundation is thanked for its support of
RNA research at the University of Colorado. RR Gutell is an associate in the
Program in Evolutionary Biology of the Canadian Institute for Advanced
Research. We thank Danielle Konings for a great deal of assistance in all
the phases of this project. We also thank the referees for their constructive
criticisms.
References
1. Hill, W.E., Dahlberg, A., Garrett, R.A., Moore, P.B., Schlessinger, D. &
Warner, J.R. (eds). (1990). The Ribosome: Structure, Function, and
Evolution. American Society for Microbiology, Washington, DC.
2. Nierhaus, K.H., Franceschi, F., Subramanian, A.R., Erdmann, V.A. &
Wittmann-Liebold, B. (eds). (1993). The Translational Apparatus:
Structure, Function, Regulation, Evolution. Plenum Press, New York.
3. Gutell, R.R., Larsen, N. & Woese, C.R. (1994). Lessons from an evolv-
ing rRNA: 16S and 23S rRNA structures from a comparative perspec-
tive. Microbiol. Rev. 58, 10–26.
4. Woese, C.R. & Pace, N.R. (1993). Probing RNA structure, function
and history by comparative analysis. In The RNA World. (Gesteland,
R.F., Atkins, J.F., eds), pp. 91–117, Cold Spring Harbor Laboratory
Press, Plainview NY.
5. Gutell, R.R. (1996). Comparative sequence analysis and the structure
of 16S and 23S rRNA. In Ribosomal RNA: Structure, Evolution, Pro-
cessing, and Function in Protein Biosynthesis. (Zimmermann, R.A.,
Dahlberg, A.E., eds), pp. 111–128, CRC Press, Boca Raton FL.
6. Jaeger, J.A., Turner, D.H. & Zuker, M. (1989). Improved predictions of
secondary structures for RNA. Proc. Natl. Acad. Sci. U.S.A. 86,
7706–7710.
7. Jaeger, J.A., Turner, D.H. & Zuker, M. (1990). Predicting optimal and
suboptimal secondary structure for RNA. In Molecular Evolution:
Computer Analysis of Protein and Nucleic Acid Sequences. (Doolit-
tle, R.F., ed.), vol. 183, pp. 281–306, Academic Press, San Diego CA.
8. Zuker, M. (1989). On finding all suboptimal foldings of an RNA mole-
cule. Science 244, 48–52.
9. Zuker, M. (1989). The use of dynamic programming algorithms in RNA
secondary structure prediction. In Mathematical Methods for DNA
Sequences. (Waterman, M.S., ed.), pp. 159–184, CRC Press, Boca
Raton FL.
10. Woese, C.R., Kandler, O. & Wheelis, M.L. (1990). Towards a natural
system of organisms: proposal for the domains Archaea, Bacteria and
Eucarya. Proc. Natl. Acad. Sci. U.S.A. 87, 4576–4579.
11. Konings, D.A.M. & Gutell, R.R. (1995). A comparison of thermody-
namic foldings with comparatively derived structures of 16S and 16S-
like rRNAs. RNA 1, 559–574.
12. Gutell, R.R., Gray, M.W. & Schnare, M.N. (1993). A compilation of
large subunits (23S and 23S-like) ribosomal RNA structures: 1993.
Nucleic Acids Res. 21, 3055–3074.
Research Paper Analysis of 23S rRNA Fields and Gutell 429
13. Schnare, M.N., Damberger, S.H., Gray, M.W. & Gutell, R.R. (1996).
Comprehensive comparison of structural characteristics in eukaryotic
cytoplasmic large subunit (23S-like) ribosomal RNA. J. Mol. Biol. 256,
701–719.
14. Zuker, M., Jaeger, J.A. & Turner, D.H. (1991). Comparison of optimal
and suboptimal RNA secondary structures predicted by free energy
minimization with structures determined by phylogenetic comparison.
Nucleic Acids Res. 19, 2707–2714.
15. Woese, C.R., Winker, S. & Gutell, R.R. (1990). Architecture of riboso-
mal RNA: constraints on the sequence of “tetra-loops”. Proc. Natl.
Acad. Sci. U.S.A. 87, 8467–8471.
16. Cornish-Bowden, A. (1985). Nomenclature for incompletely specified
bases in nucleic acid sequences: recommendations 1984. Nucleic
Acids Res. 13, 3021–3030.
17. Saenger, W. (1984). Principles of Nucleic Acid Structure. p. 146,
Springer-Verlag, New York.
18. Herschlag, D. (1995). RNA chaperones and the RNA folding problem.
J. Biol. Chem. 270, 20871–20874.
19. Higgs, P.G. & Morgan, S.R. (1995). Thermodynamics of RNA folding.
When is an RNA molecule in equilibrium? In Advances in Artificial
Life: Proceedings of the Third European Conference on Artificial Life.
(Moran, F., Moreno, A., Merelo, J.J., Chacon, P., eds), In Lecture Notes
in Artificial Intelligence, vol. 929, pp. 852–861, Springer Verlag, New
York.
20. Zuker, M. & Jacobson, A.B. (1995). “Well-determined” regions in RNA
secondary structure prediction: analysis of small subunit ribosomal
RNA. Nucleic Acids Res. 23, 2791–2798.
21. Walter, A.E., et al., & Zuker, M. (1994). Coaxial stacking of helixes
enhances binding of oligoribonucleotides and improves predictions of
RNA folding. Proc. Natl. Acad. Sci. U.S.A. 91, 9218–9222.
22. Taylor, J.R. (1982). An Introduction to Error Analysis. University
Science Books, Mill Valley CA.
430 Folding & Design Vol 1 No 6
Because Folding & Design operates a ‘Continuous Publication
System’ for Research Papers, this paper has been published
via the internet before being printed. The paper can be
accessed from http://biomednet.com/cbiology/fad.htm — for
further information, see the explanation on the contents page.

More Related Content

Similar to Analysis of rRNA Sequences Folded by Thermodynamic Method

Gutell 090.bmc.bioinformatics.2004.5.105
Gutell 090.bmc.bioinformatics.2004.5.105Gutell 090.bmc.bioinformatics.2004.5.105
Gutell 090.bmc.bioinformatics.2004.5.105Robin Gutell
 
Gutell 080.bmc.bioinformatics.2002.3.2
Gutell 080.bmc.bioinformatics.2002.3.2Gutell 080.bmc.bioinformatics.2002.3.2
Gutell 080.bmc.bioinformatics.2002.3.2Robin Gutell
 
Gutell 068.rna.1999.05.1430
Gutell 068.rna.1999.05.1430Gutell 068.rna.1999.05.1430
Gutell 068.rna.1999.05.1430Robin Gutell
 
Gutell 010.jbc.1984.259.05173
Gutell 010.jbc.1984.259.05173Gutell 010.jbc.1984.259.05173
Gutell 010.jbc.1984.259.05173Robin Gutell
 
Gutell 054.jmb.1996.256.0701
Gutell 054.jmb.1996.256.0701Gutell 054.jmb.1996.256.0701
Gutell 054.jmb.1996.256.0701Robin Gutell
 
Gutell 025.nar.1992.20.05785
Gutell 025.nar.1992.20.05785Gutell 025.nar.1992.20.05785
Gutell 025.nar.1992.20.05785Robin Gutell
 
Gutell 062.jmb.1997.267.1104
Gutell 062.jmb.1997.267.1104Gutell 062.jmb.1997.267.1104
Gutell 062.jmb.1997.267.1104Robin Gutell
 
Gutell 034.mr.1994.58.0010
Gutell 034.mr.1994.58.0010Gutell 034.mr.1994.58.0010
Gutell 034.mr.1994.58.0010Robin Gutell
 
Gutell 002.nar.1981.09.06167
Gutell 002.nar.1981.09.06167Gutell 002.nar.1981.09.06167
Gutell 002.nar.1981.09.06167Robin Gutell
 
Gutell 114.jmb.2011.413.0473
Gutell 114.jmb.2011.413.0473Gutell 114.jmb.2011.413.0473
Gutell 114.jmb.2011.413.0473Robin Gutell
 
Gutell 075.jmb.2001.310.0735
Gutell 075.jmb.2001.310.0735Gutell 075.jmb.2001.310.0735
Gutell 075.jmb.2001.310.0735Robin Gutell
 
Gutell 028.cosb.1993.03.0313
Gutell 028.cosb.1993.03.0313Gutell 028.cosb.1993.03.0313
Gutell 028.cosb.1993.03.0313Robin Gutell
 
Gutell 043.curg.1994.26.0321
Gutell 043.curg.1994.26.0321Gutell 043.curg.1994.26.0321
Gutell 043.curg.1994.26.0321Robin Gutell
 
Gutell 119.plos_one_2017_7_e39383
Gutell 119.plos_one_2017_7_e39383Gutell 119.plos_one_2017_7_e39383
Gutell 119.plos_one_2017_7_e39383Robin Gutell
 
Gutell 112.j.phys.chem.b.2010.114.13497
Gutell 112.j.phys.chem.b.2010.114.13497Gutell 112.j.phys.chem.b.2010.114.13497
Gutell 112.j.phys.chem.b.2010.114.13497Robin Gutell
 
Gutell 069.mpe.2000.15.0083
Gutell 069.mpe.2000.15.0083Gutell 069.mpe.2000.15.0083
Gutell 069.mpe.2000.15.0083Robin Gutell
 
Gutell 081.cosb.2002.12.0301
Gutell 081.cosb.2002.12.0301Gutell 081.cosb.2002.12.0301
Gutell 081.cosb.2002.12.0301Robin Gutell
 
Gutell 101.physica.a.2007.386.0564.good
Gutell 101.physica.a.2007.386.0564.goodGutell 101.physica.a.2007.386.0564.good
Gutell 101.physica.a.2007.386.0564.goodRobin Gutell
 
Gutell 108.jmb.2009.391.769
Gutell 108.jmb.2009.391.769Gutell 108.jmb.2009.391.769
Gutell 108.jmb.2009.391.769Robin Gutell
 
Gutell 120.plos_one_2012_7_e38320_supplemental_data
Gutell 120.plos_one_2012_7_e38320_supplemental_dataGutell 120.plos_one_2012_7_e38320_supplemental_data
Gutell 120.plos_one_2012_7_e38320_supplemental_dataRobin Gutell
 

Similar to Analysis of rRNA Sequences Folded by Thermodynamic Method (20)

Gutell 090.bmc.bioinformatics.2004.5.105
Gutell 090.bmc.bioinformatics.2004.5.105Gutell 090.bmc.bioinformatics.2004.5.105
Gutell 090.bmc.bioinformatics.2004.5.105
 
Gutell 080.bmc.bioinformatics.2002.3.2
Gutell 080.bmc.bioinformatics.2002.3.2Gutell 080.bmc.bioinformatics.2002.3.2
Gutell 080.bmc.bioinformatics.2002.3.2
 
Gutell 068.rna.1999.05.1430
Gutell 068.rna.1999.05.1430Gutell 068.rna.1999.05.1430
Gutell 068.rna.1999.05.1430
 
Gutell 010.jbc.1984.259.05173
Gutell 010.jbc.1984.259.05173Gutell 010.jbc.1984.259.05173
Gutell 010.jbc.1984.259.05173
 
Gutell 054.jmb.1996.256.0701
Gutell 054.jmb.1996.256.0701Gutell 054.jmb.1996.256.0701
Gutell 054.jmb.1996.256.0701
 
Gutell 025.nar.1992.20.05785
Gutell 025.nar.1992.20.05785Gutell 025.nar.1992.20.05785
Gutell 025.nar.1992.20.05785
 
Gutell 062.jmb.1997.267.1104
Gutell 062.jmb.1997.267.1104Gutell 062.jmb.1997.267.1104
Gutell 062.jmb.1997.267.1104
 
Gutell 034.mr.1994.58.0010
Gutell 034.mr.1994.58.0010Gutell 034.mr.1994.58.0010
Gutell 034.mr.1994.58.0010
 
Gutell 002.nar.1981.09.06167
Gutell 002.nar.1981.09.06167Gutell 002.nar.1981.09.06167
Gutell 002.nar.1981.09.06167
 
Gutell 114.jmb.2011.413.0473
Gutell 114.jmb.2011.413.0473Gutell 114.jmb.2011.413.0473
Gutell 114.jmb.2011.413.0473
 
Gutell 075.jmb.2001.310.0735
Gutell 075.jmb.2001.310.0735Gutell 075.jmb.2001.310.0735
Gutell 075.jmb.2001.310.0735
 
Gutell 028.cosb.1993.03.0313
Gutell 028.cosb.1993.03.0313Gutell 028.cosb.1993.03.0313
Gutell 028.cosb.1993.03.0313
 
Gutell 043.curg.1994.26.0321
Gutell 043.curg.1994.26.0321Gutell 043.curg.1994.26.0321
Gutell 043.curg.1994.26.0321
 
Gutell 119.plos_one_2017_7_e39383
Gutell 119.plos_one_2017_7_e39383Gutell 119.plos_one_2017_7_e39383
Gutell 119.plos_one_2017_7_e39383
 
Gutell 112.j.phys.chem.b.2010.114.13497
Gutell 112.j.phys.chem.b.2010.114.13497Gutell 112.j.phys.chem.b.2010.114.13497
Gutell 112.j.phys.chem.b.2010.114.13497
 
Gutell 069.mpe.2000.15.0083
Gutell 069.mpe.2000.15.0083Gutell 069.mpe.2000.15.0083
Gutell 069.mpe.2000.15.0083
 
Gutell 081.cosb.2002.12.0301
Gutell 081.cosb.2002.12.0301Gutell 081.cosb.2002.12.0301
Gutell 081.cosb.2002.12.0301
 
Gutell 101.physica.a.2007.386.0564.good
Gutell 101.physica.a.2007.386.0564.goodGutell 101.physica.a.2007.386.0564.good
Gutell 101.physica.a.2007.386.0564.good
 
Gutell 108.jmb.2009.391.769
Gutell 108.jmb.2009.391.769Gutell 108.jmb.2009.391.769
Gutell 108.jmb.2009.391.769
 
Gutell 120.plos_one_2012_7_e38320_supplemental_data
Gutell 120.plos_one_2012_7_e38320_supplemental_dataGutell 120.plos_one_2012_7_e38320_supplemental_data
Gutell 120.plos_one_2012_7_e38320_supplemental_data
 

More from Robin Gutell

Gutell 124.rna 2013-woese-19-vii-xi
Gutell 124.rna 2013-woese-19-vii-xiGutell 124.rna 2013-woese-19-vii-xi
Gutell 124.rna 2013-woese-19-vii-xiRobin Gutell
 
Gutell 123.app environ micro_2013_79_1803
Gutell 123.app environ micro_2013_79_1803Gutell 123.app environ micro_2013_79_1803
Gutell 123.app environ micro_2013_79_1803Robin Gutell
 
Gutell 121.bibm12 alignment 06392676
Gutell 121.bibm12 alignment 06392676Gutell 121.bibm12 alignment 06392676
Gutell 121.bibm12 alignment 06392676Robin Gutell
 
Gutell 118.plos_one_2012.7_e38203.supplementalfig
Gutell 118.plos_one_2012.7_e38203.supplementalfigGutell 118.plos_one_2012.7_e38203.supplementalfig
Gutell 118.plos_one_2012.7_e38203.supplementalfigRobin Gutell
 
Gutell 117.rcad_e_science_stockholm_pp15-22
Gutell 117.rcad_e_science_stockholm_pp15-22Gutell 117.rcad_e_science_stockholm_pp15-22
Gutell 117.rcad_e_science_stockholm_pp15-22Robin Gutell
 
Gutell 116.rpass.bibm11.pp618-622.2011
Gutell 116.rpass.bibm11.pp618-622.2011Gutell 116.rpass.bibm11.pp618-622.2011
Gutell 116.rpass.bibm11.pp618-622.2011Robin Gutell
 
Gutell 115.rna2dmap.bibm11.pp613-617.2011
Gutell 115.rna2dmap.bibm11.pp613-617.2011Gutell 115.rna2dmap.bibm11.pp613-617.2011
Gutell 115.rna2dmap.bibm11.pp613-617.2011Robin Gutell
 
Gutell 113.ploso.2011.06.e18768
Gutell 113.ploso.2011.06.e18768Gutell 113.ploso.2011.06.e18768
Gutell 113.ploso.2011.06.e18768Robin Gutell
 
Gutell 111.bmc.genomics.2010.11.485
Gutell 111.bmc.genomics.2010.11.485Gutell 111.bmc.genomics.2010.11.485
Gutell 111.bmc.genomics.2010.11.485Robin Gutell
 
Gutell 110.ant.v.leeuwenhoek.2010.98.195
Gutell 110.ant.v.leeuwenhoek.2010.98.195Gutell 110.ant.v.leeuwenhoek.2010.98.195
Gutell 110.ant.v.leeuwenhoek.2010.98.195Robin Gutell
 
Gutell 109.ejp.2009.44.277
Gutell 109.ejp.2009.44.277Gutell 109.ejp.2009.44.277
Gutell 109.ejp.2009.44.277Robin Gutell
 
Gutell 107.ssdbm.2009.200
Gutell 107.ssdbm.2009.200Gutell 107.ssdbm.2009.200
Gutell 107.ssdbm.2009.200Robin Gutell
 
Gutell 106.j.euk.microbio.2009.56.0142.2
Gutell 106.j.euk.microbio.2009.56.0142.2Gutell 106.j.euk.microbio.2009.56.0142.2
Gutell 106.j.euk.microbio.2009.56.0142.2Robin Gutell
 
Gutell 105.zoologica.scripta.2009.38.0043
Gutell 105.zoologica.scripta.2009.38.0043Gutell 105.zoologica.scripta.2009.38.0043
Gutell 105.zoologica.scripta.2009.38.0043Robin Gutell
 
Gutell 104.biology.direct.2008.03.016
Gutell 104.biology.direct.2008.03.016Gutell 104.biology.direct.2008.03.016
Gutell 104.biology.direct.2008.03.016Robin Gutell
 
Gutell 103.structure.2008.16.0535
Gutell 103.structure.2008.16.0535Gutell 103.structure.2008.16.0535
Gutell 103.structure.2008.16.0535Robin Gutell
 
Gutell 102.bioinformatics.2007.23.3289
Gutell 102.bioinformatics.2007.23.3289Gutell 102.bioinformatics.2007.23.3289
Gutell 102.bioinformatics.2007.23.3289Robin Gutell
 
Gutell 100.imb.2006.15.533
Gutell 100.imb.2006.15.533Gutell 100.imb.2006.15.533
Gutell 100.imb.2006.15.533Robin Gutell
 
Gutell 099.nature.2006.443.0931
Gutell 099.nature.2006.443.0931Gutell 099.nature.2006.443.0931
Gutell 099.nature.2006.443.0931Robin Gutell
 
Gutell 098.jmb.2006.360.0978
Gutell 098.jmb.2006.360.0978Gutell 098.jmb.2006.360.0978
Gutell 098.jmb.2006.360.0978Robin Gutell
 

More from Robin Gutell (20)

Gutell 124.rna 2013-woese-19-vii-xi
Gutell 124.rna 2013-woese-19-vii-xiGutell 124.rna 2013-woese-19-vii-xi
Gutell 124.rna 2013-woese-19-vii-xi
 
Gutell 123.app environ micro_2013_79_1803
Gutell 123.app environ micro_2013_79_1803Gutell 123.app environ micro_2013_79_1803
Gutell 123.app environ micro_2013_79_1803
 
Gutell 121.bibm12 alignment 06392676
Gutell 121.bibm12 alignment 06392676Gutell 121.bibm12 alignment 06392676
Gutell 121.bibm12 alignment 06392676
 
Gutell 118.plos_one_2012.7_e38203.supplementalfig
Gutell 118.plos_one_2012.7_e38203.supplementalfigGutell 118.plos_one_2012.7_e38203.supplementalfig
Gutell 118.plos_one_2012.7_e38203.supplementalfig
 
Gutell 117.rcad_e_science_stockholm_pp15-22
Gutell 117.rcad_e_science_stockholm_pp15-22Gutell 117.rcad_e_science_stockholm_pp15-22
Gutell 117.rcad_e_science_stockholm_pp15-22
 
Gutell 116.rpass.bibm11.pp618-622.2011
Gutell 116.rpass.bibm11.pp618-622.2011Gutell 116.rpass.bibm11.pp618-622.2011
Gutell 116.rpass.bibm11.pp618-622.2011
 
Gutell 115.rna2dmap.bibm11.pp613-617.2011
Gutell 115.rna2dmap.bibm11.pp613-617.2011Gutell 115.rna2dmap.bibm11.pp613-617.2011
Gutell 115.rna2dmap.bibm11.pp613-617.2011
 
Gutell 113.ploso.2011.06.e18768
Gutell 113.ploso.2011.06.e18768Gutell 113.ploso.2011.06.e18768
Gutell 113.ploso.2011.06.e18768
 
Gutell 111.bmc.genomics.2010.11.485
Gutell 111.bmc.genomics.2010.11.485Gutell 111.bmc.genomics.2010.11.485
Gutell 111.bmc.genomics.2010.11.485
 
Gutell 110.ant.v.leeuwenhoek.2010.98.195
Gutell 110.ant.v.leeuwenhoek.2010.98.195Gutell 110.ant.v.leeuwenhoek.2010.98.195
Gutell 110.ant.v.leeuwenhoek.2010.98.195
 
Gutell 109.ejp.2009.44.277
Gutell 109.ejp.2009.44.277Gutell 109.ejp.2009.44.277
Gutell 109.ejp.2009.44.277
 
Gutell 107.ssdbm.2009.200
Gutell 107.ssdbm.2009.200Gutell 107.ssdbm.2009.200
Gutell 107.ssdbm.2009.200
 
Gutell 106.j.euk.microbio.2009.56.0142.2
Gutell 106.j.euk.microbio.2009.56.0142.2Gutell 106.j.euk.microbio.2009.56.0142.2
Gutell 106.j.euk.microbio.2009.56.0142.2
 
Gutell 105.zoologica.scripta.2009.38.0043
Gutell 105.zoologica.scripta.2009.38.0043Gutell 105.zoologica.scripta.2009.38.0043
Gutell 105.zoologica.scripta.2009.38.0043
 
Gutell 104.biology.direct.2008.03.016
Gutell 104.biology.direct.2008.03.016Gutell 104.biology.direct.2008.03.016
Gutell 104.biology.direct.2008.03.016
 
Gutell 103.structure.2008.16.0535
Gutell 103.structure.2008.16.0535Gutell 103.structure.2008.16.0535
Gutell 103.structure.2008.16.0535
 
Gutell 102.bioinformatics.2007.23.3289
Gutell 102.bioinformatics.2007.23.3289Gutell 102.bioinformatics.2007.23.3289
Gutell 102.bioinformatics.2007.23.3289
 
Gutell 100.imb.2006.15.533
Gutell 100.imb.2006.15.533Gutell 100.imb.2006.15.533
Gutell 100.imb.2006.15.533
 
Gutell 099.nature.2006.443.0931
Gutell 099.nature.2006.443.0931Gutell 099.nature.2006.443.0931
Gutell 099.nature.2006.443.0931
 
Gutell 098.jmb.2006.360.0978
Gutell 098.jmb.2006.360.0978Gutell 098.jmb.2006.360.0978
Gutell 098.jmb.2006.360.0978
 

Recently uploaded

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 

Recently uploaded (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 

Analysis of rRNA Sequences Folded by Thermodynamic Method

  • 1. An analysis of large rRNA sequences folded by a thermodynamic method Dana S Fields1 and Robin R Gutell1,2 Background: The secondary structure of RNA can be predicted by the thermodynamics-based method of Zuker and Turner. The accuracy of the method’s secondary structure predictions for rRNA can be assessed by using as reference the currently available rRNA secondary structure models that have been derived from comparative analysis of rRNA sequence alignments. Results: We folded 72 23S rRNA sequences with the Zuker–Turner method and scored the resulting secondary structure predictions against the comparative model. Empirically, trends in the score were observed as a function of the phylogenetic memberships of the sequences and as a function of the base pairs secondary structural contexts. Further, three parameters were found that (anti-)correlate with the score. Conclusions: Three semiquantitative predictors of score were found: % of noncanonical base pairs, % of hairpin loops that were stable tetraloops, and sequence %G+C. The folding of rRNA is a tractable problem and thermodynamics-based folding algorithms, in particular, are useful in the study of this folding problem even for large RNA molecules (e.g. 16S and 23S rRNA). Introduction The translation of mRNA into protein is a process central to all known forms of life on Earth. The molecular machine that performs this process of decoding and con- struction is the ribosome. Hence, the nature of the (self)organization of the ribosome itself from its con- stituent molecular parts is of great concern to biologists and chemists alike. Basically, the ribosome is composed of three primary rRNA molecules (5S, 16S or 16S-like, and 23S or 23S-like rRNA) and, depending on the particular organism under study, a host of ∼ 50–80 different proteins [1,2]. The sec- ondary structure for 5S, 16S, and 23S rRNA can be inferred by comparative analysis on an appropriate align- ment of rRNA sequences [3–5]. The comparative method assumes that the sequences of rRNA have evolved so as to preserve a three-dimensional structure (and function) common to all organisms. It does not assume anything about the thermodynamic stability or the kinetic history of the proposed structure. In fact, it does not comment at all on the time dependence of any structural feature. Comparative analysis has already been performed on 5S, 16S, and 23S rRNA (reviewed in [3–5]). Importantly, due to the ubiquitous lack of experimentally derived rRNA secondary structures, comparatively derived rRNA sec- ondary structure models currently serve as the reference or ‘true’ structures of the three rRNAs. Experiments do, however, provide a large body of evidence in confirmation of the comparatively derived rRNA structures [1,2]. With this valuable point of reference in hand, the rRNA secondary structure predictions of any computation proce- dure can be gauged. In particular, we want to assess the accuracy of the secondary structure predictions of the heavily used method of Zuker and Turner [6–9] as it applies to rRNA. Essentially, the method folds the input RNA sequence according to a set of experimentally derived thermodynamic values and returns the (not neces- sarily unique) secondary structure of minimum free energy — this is called the energetically optimal folding. At the same time, the method also returns a number of secondary structures of lesser stability pursuant to several other defined parameters; these structures are called ener- getically suboptimal foldings. Recently, we submitted 56 16S rRNA sequences from the three primary lines of descent (Archaea, (Eu)bacteria, and Eucarya [10]) and the two eucaryal organelles (chloroplast and mitochondria) to the Zuker–Turner method [11]. A subsequent comparison of each of the 56 energetically optimal foldings to their respective comparatively derived structures revealed that the Zuker–Turner method had, on average, correctly predicted 46 ± 17% (mean ± standard deviation [SD]) of the true secondary structure. The Addresses: 1Department of Molecular, Cellular, and Developmental Biology, Campus Box 347, University of Colorado at Boulder, Boulder, CO 80309-0347, USA. 2Department of Chemistry and Biochemistry, Campus Box 215, University of Colorado at Boulder, Boulder, CO 80309-0215, USA. DS Fields e-mail: Dana.Fields@Colorado.edu. Correspondence: Robin R Gutell at Department of Chemistry and Biochemistry, Campus Box 215, University of Colorado at Boulder, Boulder, CO 80309-0215, USA. e-mail: Robin.Gutell@Colorado.Edu Key words: comparative analysis, rRNA, RNA folding, RNA secondary structure, Zuker–Turner method Received: 17 Jun 1996 Revisions requested: 15 Jul 1996 Revisions received: 14 Aug 1996 Accepted: 16 Aug 1996 Published: 09 Oct 1996 Electronic identifier: 1359-0278-001-00419 Folding & Design 09 Oct 1996, 1:419–430 © Current Biology Ltd ISSN 1359-0278 Research Paper 419
  • 2. maximum score was 81% for an archaeal sequence and the minimum was 10% for a eucaryal sequence. In general, the high-to-low trend in score and the mean score of each group (in parentheses) was: 16S Archaea (69%) > Eubacte- ria (55%) > chloroplast (48%) > mitochondria (31%) = Eucarya (30%). These results were very interesting because they showed that, in some cases, the thermody- namics-based method of Zuker and Turner could success- fully elucidate the biologically relevant form of a large RNA sequence such as 16S rRNA, which is about 1500 bases in length. (We believe that a score of 46% is mean- ingful in the context of a 10% minimum score and an 81% maximum score.) In the present paper, we continue this investigation of the applicability of the Zuker–Turner method by presenting the outcome of the folding of 72 phylogenetically and structurally distinct 23S rRNA sequences (see Table 1) by the said method (see Materials and methods). But even more, we identify several properties of the comparatively derived secondary structures that serve as semiquantita- tive predictors of the accuracy of the secondary structure predictions of the Zuker–Turner method. In the Results section, we start by providing a summary of the vast amount of information obtained from the 72 23S rRNA foldings by working with average scores and we also take this opportunity to compare and contrast these results with our previous findings on 16S rRNA [11]. The conclu- sions drawn here represent a set of empirical observations on the performance of the Zuker–Turner method. We then switch from using average scores to looking at indi- vidual scores as a function of several properties of the comparatively derived secondary structures; in this way, we are able to identify the predictors. The predictors are the % of noncanonical base pairs, the % of hairpin loops that are stable tetraloops, and the sequence %G+C. Here we keep our speculation on the causality of the predictive relationship to a minimum; the important point is that we demonstrate that this complex problem does indeed have ‘handles’ with which it can be understood. In the Discus- 420 Folding & Design Vol 1 No 6 Table 1 Selection of 23S rRNA. Archaea (8) Euryarchaeota Halobacterium marismortui, Halococcus morrhuae, Methanobacterium thermoautotrophicum, Methanococcus vannielii, Thermococcus celer, Thermoplasma acidophilum Crenarchaeota Sulfolobus solfataricus, Thermoproteus tenax (Eu)bacteria (13) Thermotoga Thermotoga maritima Deinococcus + relatives Thermus thermophilus Spirochaetes + relatives Borrelia burgdorferi Purple bacteria Campylobacter coli, Escherichia coli, Pseudomonas aeruginosa, Pseudomonas cepacia, Rhodobacter sphaeroides (A) Cyanobacteria Anacystis nidulans (new name = Synechoccus sp. 6301) Gram-positive Bacillus subtilis, Frankia sp., Mycobacterium leprae, Streptomyces ambofaciens Chloroplast (10) Protista Astasia longa, Chlamydomonas eugametos, Chlamydomonas reinhardtii, Chlorella ellipsoidea, Euglena gracilis, Palmaria palmata Plantae Alnus incana, Marchantia polymorpha, Nicotiana tabacum, Zea mays Mitochondria (18) Protista Acanthamoeba castellanii, Chondrus crispus, Dictyostelium discoideum, Paramecium tetraurelia, Prototheca wickerhamii, Tetrahymena pyriformis (R) Fungi Penicillium chrysogenum, Saccharomyces cerevisiae, Schizosaccharomyces pombe Plantae Marchantia polymorpha, Zea mays Animalia Caenorhabditis elegans, Crossostoma lacustre, Drosophila melanogaster, Gallus gallus, Homo sapiens, Paracentrotus lividus, Xenopus laevis Eucarya (23) Archezoa Giardia intestinalis, Giardia muris Protista Chlorella ellipsoidea, Crithidia fasciculata, Euglena gracilis, Physarum polycephalum, Phytophthora megasperma, Prorocentrum micans, Tetrahymena thermophila, Toxoplasma gondii-P, Trypanosoma brucei Fungi Cryptococcus neoformans B, Pneumocystis carinii, Saccharomyces cerevisiae Animalia Aedes albopictus, Caenorhabditis elegans, Drosophila melanogaster, Herdmania momus, Homo sapiens, Mus musculus, Xenopus laevis Plantae Arabidopsis thaliana, Oryza sativa
  • 3. sion, we assume that the biologically relevant secondary structure of rRNA is in fact the structure of global minimum free energy. By doing this, we provide a frame- work within which we can speculate on the biological meaning of the results. Results The scoring scheme For any one rRNA sequence, the Zuker–Turner method returns an energetically optimal folding and a number of energetically suboptimal foldings, and we obtain a sepa- rate score for each of these alternative structures. The score for any particular energetically optimal or suboptimal folding is simply the percentage of the canonical base pairs (i.e. AU-, CG-, and GU-containing base pairs) that are present in the sequence’s comparatively derived sec- ondary structure [12,13] that are proposed to exist in the folding. For example, the comparatively derived sec- ondary structure of 23S rRNA of Escherichia coli has 801 canonical base pairs. The energetically optimal folding of the E. coli 23S rRNA sequence correctly predicts 466 of these base pairs, yielding a score of 58%. For each of the 72 23S rRNA sequences studied here, we have mapped the base pairs that were correctly predicted in the energet- ically optimal foldings onto the comparatively derived sec- ondary structures (Materials and methods; see also http://pundit.colorado.edu:8080/root.html). The definition of score given here is useful because it straightforwardly reflects the amount of secondary struc- ture correctly predicted by the folding algorithm that it was theoretically able to predict. The Zuker–Turner method, unlike the comparative approach, does not predict noncanonical base pairs, so excluding noncanoni- cal base pairs from the score allows it to range from 0 to 100%. This definition does not address the larger issue of what is happening to the single-stranded regions, nor does it examine the energetically optimal and suboptimal fold- ings in the regions where the true secondary structure was not correctly predicted to see what structure, if any, formed instead. These issues must eventually be addressed and a score must be defined to suit this need, but these are not in the scope of this paper. The prediction of RNA secondary structure with the Zuker–Turner method tacitly assumes that the structure of global minimal free energy is in fact the structure of biological relevance. Hence, the scores given in this paper always refer to the score of the energetically optimal fold- ings computed in the aforementioned base pair based manner, except when we explicitly note that we are working with the scores of the highest scoring energeti- cally suboptimal foldings (i.e. the best suboptimal fold- ings) or otherwise. Finally, it is not uncommon for secondary structures to be scored against their respective comparatively derived structures by counting the number of helices that were correctly predicted. According to Zuker et al. [14], a helix is defined as a region of double- stranded RNA of at least three base pairs where interrup- tions of internal or bulge loops of up to two unpaired bases are allowed. A helix is then said to be correctly predicted in the folding when all the base pairs comprising the helix are correctly predicted (in the aforementioned sense) with the exception of at most two base pairs. We used this helix-based definition of score alongside our own base pair based definition of score only in Figure 1. Summary of the 23S rRNA foldings General trends in the scores The 23S rRNA sequence of highest score was the archaeal sequence of Thermococcus celer at 74%, while the lowest scoring 23S rRNA sequence was that of the chloroplast of Astasia longa at 19%. The mean score ± SD over all 72 23S rRNA sequences was 44 ± 11%. Interestingly, this is essentially the same as the score of 46 ± 17% that was found in our previous work over all 56 16S rRNA sequences [11]. This strongly suggests for 16S and 23S rRNA that sequence length does not relate to the folding score because 23S and 23S-like rRNAs are about twice the length of 16S and 16S-like rRNAs. Research Paper Analysis of 23S rRNA Fields and Gutell 421 Figure 1 Scoring of the secondary structure predictions of the method of Zuker and Turner [6–9]. The horizontal axis shows the five phylogenetic groups under study: arc, Archaea; eub, Eubacteria; chl, chloroplast; mit, mitochondria; euc, Eucarya. The vertical axis is the group mean score of the energetically optimal foldings of the 23S rRNA sequences. The group mean scores in terms of the base pair based counting method are denoted as ‘bp’ while the group mean scores in terms of the helix-based counting method are denoted as ‘hel’. The maximum and minimum scores are denoted by closed and open circles respectively. The standard deviations are displayed as vertical lines. 100 90 80 70 60 50 40 30 20 10 100 50% % bp hel arc bp hel eub bp hel chl bp hel mit bp hel euc
  • 4. The mean score of each group is presented in Figure 1 where the trend in the group mean scores is (mean score in parentheses): 23S Archaea (59%) = Eubacteria (53%) > chloroplast (39%) = mitochondria (38%) = Eucarya (41%). The 23S and 16S rRNA cases are similar in that the sequences of Archaea and Eubacteria are folded more accu- rately on average by the Zuker–Turner method than are the sequences of mitochondria and Eucarya. The two cases differ in that the group mean scores for 23S Archaea, Eubac- teria, and chloroplast are less than their corresponding values in the 16S rRNA case, while in contrast, the group mean scores of 23S mitochondria and Eucarya are larger than their corresponding values in the 16S rRNA case. Taking into account our previous comments on the overall mean scores, one concludes that 16S and 23S rRNA secondary structures are predicted with a similar pattern of accuracy (on average) by the Zuker–Turner method, except that as one moves from working with 16S to 23S rRNA there is a corresponding shift in accuracy away from sequences of Archaea, Eubacte- ria, and chloroplast to those of mitochondria and Eucarya. From Figure 1, one can see that the range of prediction success (i.e. maximum minus minimum score) is ∼ 22% for Archaea and Eubacteria, and ∼ 35% for chloroplast, mito- chondria, and Eucarya. These are slightly smaller than the analogous ranges found for 16S rRNA, which were 28% and 38% respectively. By using instead the scores of the best suboptimal foldings, the 23S rRNA group mean scores increase only 5–8%, but the aforementioned trend in the 23S rRNA group mean score is preserved. Figure 1 also displays the group mean scores of the 23S rRNA fold- ings in terms of the helix-based scoring scheme, where the helix-based scheme clearly causes no significant change in the results. Average scores in terms of base pair distances A base pair can be categorized by the difference in the numerical positions in the primary sequence of its two constituent bases. This type of categorization is useful because it allows one to attempt to understand the scores in terms of local versus global base-to-base pairing. The inset of Figure 2 uses bins of size 100 nt to denote the mean distribution of the base pair distances found in the comparatively derived secondary structures of the 13 23S rRNA eubacterial sequences; the analogous distributions for Archaea, chloroplast, mitochondria, and Eucarya give essentially the same pattern. All five phylogenetic groups do have base pairs in the 23S rRNA comparatively derived secondary structures with distances > 600 nt, the largest of these reaching 3300 nt in the mitochondria. But since all these combined make up only ~ 2% of the total in each group, they are not shown. The inset of Figure 2 reveals that the majority (∼ 75%) of the base pairs in 23S rRNA comparatively derived sec- ondary structures are separated by < 100 nt; these are 422 Folding & Design Vol 1 No 6 Figure 2 The inset shows the distribution of the base pairs according to distance. The horizontal axis shows the classification of the base pairs in the eubacterial comparatively derived secondary structures according to the distance of separation of each base pair’s constituent bases in the primary sequence; a bin size of 100 nt is used. The vertical axis denotes the average % of the base pairs over the Eubacteria which fall into each of the distance bins of the horizontal axis. The standard deviation (SD) of each average was no larger than 1.4%. In the main figure, the scoring of the secondary structure predictions of the method of Zuker and Turner are shown with respect to base pair distance. The horizontal axis shows the five phylogenetic groups under study: arc, Archaea; eub, Eubacteria; chl, chloroplast; mit, mitochondria; euc, Eucarya. The vertical axis is the group mean score of the energetically optimal foldings of the 72 23S rRNA sequences specific to the classification of the base pairs according to the distance of separation of the base pair’s constituent bases in the primary sequence; a bin size of 100 nt is used. For the 100 nt bins, the SD was about 7% for arc and eub, and about 11% for chl, mit, and euc. For the 200 nt bins the SD was ~ 17% for all the groups, except eub was ~ 10%. For the 300 to 600 nt bins, the SD was typically one- to two-fold the value of the respective mean score. 100 90 80 70 60 50 40 30 20 10 100 50% % 100200300400500600 arc 100200300400500600 eub 100200300400500600 chl 100200300400500600 mit 100200300400500600 euc 100 90 80 70 60 50 40 30 20 10 % Comparative structure 100 200 300 400 500 600
  • 5. called ‘short range’ while base pairs separated by > 100 nt are called ‘long range’. Interestingly, this inset is similar to its counterpart in our previous study of 16S rRNA [11]. So, an important point to note is that even though 23S and 23S-like rRNAs are about twice the size of 16S and 16S- like rRNA molecules, which of course gives 23S rRNA the potential to form many more long-range interactions than could be found in 16S rRNA, both rRNA molecules appear to be using short-range base pairs to constitute the bulk of their secondary structure. Carrying on with the idea of binning each base pair in the comparatively derived structures, for Figure 2 proper we calculate a new set of group mean scores specific to each bin. The most immediate conclusion from Figure 2 is that within any one group the short-range base pairs are pre- dicted better than the long-range base pairs. In support of this, one observes that the mean scores of each of the five 100 nt bins exceeds its corresponding group mean score in Figure 1 by ∼ 7%; no other distance bin has this behavior. This same type of behavior was observed in the case of 16S rRNA. The most notable pattern of difference between the 23S and 16S rRNA cases is found in the 200 nt bin; specifically, the 23S rRNA mean scores were greater than their 16S rRNA counterparts by typically ∼ 20%. Average scores with base pairs classified by loop closure A base pair can be categorized according to the type of loop that it closes. The categories (as exemplified in the inset of Fig. 3) are hairpin loop closing (h), internal loop closing (i), bulge loop closing (b), and multistem loop closing (m). Categorizing base pairs in this way is useful because it allows one to attempt to relate the scores with the local structural features of RNA. Group mean scores specific to each type of loop category were computed and collected in Figure 3. Although the standard deviations in these group means can be rather large (e.g. in Fig. 3, see the m category in the mitochondria group), one can still see that within each of the five phylogenetic groups, hairpin loop closing base pairs are predicted better than bulge and internal loop closers, which in turn score better than the multistem loop closing base pairs. The group mean scores specific to the hairpin loops all exceed their respective group mean score in Figure 1 by 9–18% and this is not seen with the mean scores specific to the other loop categories (except for the b category in Eucarya in Fig. 3). By the nature of RNA structure, a base pair that closes a hairpin loop is usually a short-range base pair as well. So it is interesting to observe that the group mean scores specific to the hairpin loops all exceed their respective 100 nt bin mean scores in Figure 2 by 3–10%. In the case of 16S rRNA, we observed basically these same two patterns [11]. So overall, one concludes that base pairs classified as being hairpin loop closers in 16S and 23S rRNA are predicted better than base pairs classified as closing the other three loop types. Predictors of folding success Score versus % of noncanonical base pairs The Zuker–Turner method does not predict noncanoni- cal base pairs even though this type of pairing does occur Research Paper Analysis of 23S rRNA Fields and Gutell 423 Figure 3 The inset shows the base pair classification in terms of loop closure. The categories are: h, hairpin loop closing; i, internal loop closing; b, bulge loop closing; m, multistem loop closing. In the main figure, scoring of the secondary structure predictions of the method of Zuker and Turner is shown with respect to the four loop closing classifications. The horizontal axis shows the five phylogenetic groups under study. The vertical axis is the group mean score of the energetically optimal foldings of the 72 23S rRNA sequences specific to the classification of the base pairs according to the type of loop that they close. The maximum and minimum scores that contributed to each of the loop-specific group mean scores are denoted by a closed and open circle, respectively. The standard deviations are displayed as vertical lines. 100 90 80 70 60 50 40 30 20 10 100 50% % h i b m h i b m h i b m h i b m h i b m arc eub chl mit euc ′ ′
  • 6. in comparatively derived secondary structures. In partic- ular, for the 72 23S rRNA comparatively derived sec- ondary structures studied here, the % of noncanonical base pairs ranged from 2.9 to 9.9%. (By noncanonical, we mean any pairing other than AU, UA, CG, GC, GU, or UG.) In our earlier work on 16S rRNA [11], it was noted qualitatively that the % of noncanonical base pairs present in a comparatively derived secondary structure was ‘inversely proportional’ to the folding score of the corresponding primary sequence. To follow up on these two clues, Figure 4 shows the score of the folding of each of the 72 23S rRNA sequences plotted against their cor- responding values of % noncanonical base pairs; Figure 5 gives the same information for the 56 16S rRNA sequences. In Figure 4, the score decreases by ∼ 40% with a 6% increase in % noncanonical base pairs, while in Figure 5, the score decreases by ∼ 50% with an 8% increase in % noncanonical base pairs. The linear correla- tions coefficients (see Materials and methods) are –0.55 and –0.74 for Figures 4 and 5, respectively, which are both significant at a level < 0.05%. We therefore con- clude that the accuracy of the secondary structure predic- tions of 16S and 23S rRNA made by the Zuker–Turner method will decrease with increasing % noncanonical base pairs. Why are noncanonical base pairs detrimental to the score? One reason for this may be that in the comparatively derived secondary structures, noncanonical base pairs are often internal to helices (see the secondary structure dia- grams at http://pundit.colorado.edu:8080/root.html). So, the closest the Zuker–Turner method can come to cor- rectly reproducing helices bearing internal noncanonical base pair(s) is to correctly form the canonical base pairs while simultaneously leaving internal loop(s) where the noncanonical base pair(s) should be. The destabilizing influence of an internal loop negates the stabilizing contri- bution of about two stacked dinucleotide units, and this clearly presents a way by which noncanonical base pairs can erode the accuracy of a secondary structure prediction. Score versus % of hairpin loops that are ‘stable tetraloops’ Examination of the comparatively derived secondary structures of the 72 23S rRNA sequences revealed that the most frequently observed hairpin loops were of size four (tetraloops) and, excepting triloops, the observed fre- quency of each size of loop tended to increase with decreasing loop size. Specifically, considering the five phylogenetic group averages, tetraloops comprised 424 Folding & Design Vol 1 No 6 Figure 4 Score versus % of noncanonical base pairs (%NC) present in the corresponding comparatively derived secondary structure. For each of the 72 23S rRNA sequences folded, the score of the energetically optimal folding is plotted against its %NC. The linear correlation coefficient is –0.55. Score %NC 20.00 25.00 30.00 35.00 40.00 45.00 50.00 55.00 60.00 65.00 70.00 75.00 4.00 6.00 8.00 10.00 Figure 5 Score versus % of noncanonical base pairs (%NC) present in the corresponding comparatively derived secondary structure. For each of the 56 16S rRNA sequences folded in our previous work [11], the score of the energetically optimal folding is plotted against its %NC. The linear correlation coefficient is –0.74. Score %NC 10.00 15.00 20.00 25.00 30.00 35.00 40.00 45.00 50.00 55.00 60.00 65.00 70.00 75.00 80.00 6.00 8.00 10.00 12.00 14.00 16.00
  • 7. 26–35% of the total hairpin loops. In our study of 16S rRNA [11,15], tetraloops were also found to be the most frequently observed (30–50%) type of hairpin loop; the nature and function of tetraloops in 16S rRNA [5,15] and 23S rRNA [5] has been discussed. The Zuker–Turner method assigns a free energy contribution of about +4.5 kcal to any tetraloop, but even more, it will simulta- neously add an additional –2.0 kcal to the free energy con- tribution of tetraloops with the following composition [7]: GHGA, GVAA, and UWCG (H = not G, V = not U, W = A or U [16]). Working with the comparatively derived sec- ondary structures, the distribution of these three sub- classes of tetraloop (which have been called ‘stable tetraloops’) with respect to all tetraloops is shown in Table 2. Clearly, the frequency of use of these stable tetraloops is: GVAA > GHGA > UWCG. This same trend was found in our study of 16S rRNA [11,15], although on average the frequencies of use of the stable tetraloops were higher by ∼ 15% than in the 23S rRNA case. Given that the Zuker–Turner method recognizes these so-called stable tetraloops, the % of hairpin loops that are in fact stable tetraloops in the comparatively derived sec- ondary structures (% hairpin loops that are stable tetraloops [%HSTL] = number of stable tetraloops / number of all hairpin loops) seems likely to be related to the score. Therefore, the scores of the folding of each of the 72 23S rRNA sequences are plotted against the corre- sponding values of %HSTL in Figure 6; the kindred plot for the 56 16S rRNA sequences is given in Figure 7. In Figure 6 the score increases by ∼ 30% with a 20% increase in %HSTL, while in Figure 7, the score increases by ∼ 50% with a 30% increase in %HSTL. The linear correla- tion coefficients for Figures 6 and 7 are +0.51 and +0.66, respectively, which are both significant at a level < 0.05%. We conclude that the accuracy of the secondary structure predictions of 16S and 23S rRNA made by the Zuker–Turner method will increase with increasing % hairpin loops that are stable tetraloops. Attempting a quick rationalization of this correlation, one might recall that the Zuker–Turner method simply adds a stabilizing contribution to GHGA, GVAA, and UWCG tetraloops. So continuing with the thought, it might then be considered ‘obvious’ that the placement of these stabi- lizing structures where a hairpin loop is required would improve the score. However, this argument fails to con- sider the specific pattern of distribution of the tetraloop subsequences in each of the full-length sequences under consideration; i.e. in each full-length sequence, there are a number of subsequences that could form ‘stable tetraloops’ (just by chance) and this presents a potentially sequence-specific type of ‘background’ in which % hairpin loops that are stable tetraloops must be inter- preted. So further work is clearly needed to elucidate the nature of this correlation. Score versus sequence length Perhaps the most obvious property of a secondary struc- ture that one would expect to have an impact on the accu- racy of its folding is the length of the sequence, because the complexity of the folding problem increases with increasing sequence size. In our earlier work with 16S rRNA [11], we commented that score and length appear not to be related on the basis of a simple survey of the results. To place this idea on more solid ground for both 16S and 23S rRNA, we prepared plots (data not shown; see Figs A,B at http://pundit.colorado.edu:8080/root.html) of score versus sequence length where only the mitochon- drial and eucaryal sequences were considered; we did not use archaeal, eubacterial, or chloroplast sequences because these sequences are very close in length and hence they do not sample a broad length range. The linear correlation coefficients for the two plots are +0.15 and +0.19 for the 41 mitochondrial and eucaryal 23S rRNA sequences and the 22 mitochondrial and eucaryal 16S rRNA sequences, respectively. These coefficients indicate a correlation of doubtful significance, so we conclude that score and length are not related in the case of 16S and 23S rRNA (as suggested earlier). Score versus %G+C The % of bases in a sequence that are G or C (%G+C) is known to be useful in characterizing the behavior of nucleic acids; for example, %G+C correlates with the melting temperature of DNA [17]. Therefore, we plotted %G+C against the score for the archaeal, eubacterial, chloroplast, and mitochondrial 23S rRNA data (see Fig. 8), and in a separate plot for the eucaryal 23S rRNA data (see Fig. 9). Analogous plots were produced for 16S rRNA from our earlier work [11] (data not shown; see Figs C,D at http://pundit.colorado.edu:8080/root.html). The most immediate impression from the examination of all of these plots was that the 49 23S and 41 16S rRNA archaeal, eubacterial, chloroplast, and mitochondrial data points cor- relate positively with the score; i.e. the score increases with increasing %G+C. Specifically, linear correlation coefficients are +0.59 and +0.55 for the 23S and 16S rRNA cases, respectively, which are both significant at a level of Research Paper Analysis of 23S rRNA Fields and Gutell 425 Table 2 Distribution of the ‘stable tetraloops’ with respect to all tetraloops in the comparatively derived secondary structures. Group %GHGA %GVAA %UWCG Sum of % Archaea 23 ± 9 26 ± 10 6 ± 4 55 Eubacteria 16 ± 4 27 ± 9 8 ± 3 51 Chloroplast 19 ± 6 22 ± 6 6 ± 4 47 Mitochondria 12 ± 11 20 ± 13 3 ± 4 35 Eucarya 17 ± 5 22 ± 9 10 ± 7 49 H = not G; V = not U; W = A or U [16].
  • 8. < 0.1%. In contrast, the eucaryal data points for the 23 23S and 15 16S rRNA sequences appeared to be anti-correlat- ing with the score (see Fig. 9). Specifically, linear correla- tion coefficients are –0.63 and –0.40 for the 23S and 16S rRNA cases, respectively, which is significant at a level of < 0.5% for the 23S rRNA case but which represents a weak correlation (i.e. significance level of 14%) for the 16S rRNA case. So, regarding the noneucaryal sequences of 16S and 23S rRNA, we conclude that %G+C correlates with the score; for 23S eucaryal rRNA, we conclude that an anti-correlation exists. In our earlier paper on 16S rRNA [11], a plot of score versus %G+C was considered. In this plot, the data points belonging to Archaea, Eubacteria, and chloroplast were displayed as a single group, while the data points belong- ing to mitochondria and Eucarya were taken together. With the results displayed in this way, the correlations and anti-correlation that were noted in the previous paragraph were not realized; the conclusion was instead that score and %G+C did not seem to be related. Discussion The mechanism by which an rRNA molecule attains its biologically relevant secondary and tertiary structure is not known. Undoubtedly, this folding mechanism incorpo- rates pathways guided by both kinetic and thermody- namic constraints, and these pathways certainly include proteins and other important factors [18,19]. Nevertheless, for the sake of discussion here, we assume that the ‘driving force’ behind the in vivo folding of rRNA is the global minimization of the free energy of the naked rRNA molecule. The overall average scores for the prediction of 16S and 23S rRNA were found to be 46 ± 17% and 44 ± 11%, respectively. Under the assumption that the global mini- mization of free energy determines the secondary structure, why do the majority of the scores not approach 100% and why are there significant differences in the folding success of the different phylogenetic groups? In answer, first recall the overall trend in prediction success displayed in Figure 1 where, on average, the mitochondrial and eucaryal sequences did not fold well relative to the archaeal and eubacterial sequences. This suggests that the free energy elements used as input to the Zuker–Turner method may need to be separately ‘optimized’ for each of the five phylo- genetic groups. One must take care when discussing opti- 426 Folding & Design Vol 1 No 6 Figure 7 Score versus % of the hairpin loops that are ‘stable tetraloops’ (%HSTL) in the corresponding comparatively derived secondary structure. For each of the 56 16S rRNA sequences folded in our previous work [11], the score of the energetically optimal folding is plotted against its %HSTL. The linear correlation coefficient is +0.66. Score %HSTL 10.00 15.00 20.00 25.00 30.00 35.00 40.00 45.00 50.00 55.00 60.00 65.00 70.00 75.00 80.00 10.00 20.00 30.00 40.00 Figure 6 Score versus % of the hairpin loops that are ‘stable tetraloops’ (%HSTL) in the corresponding comparatively derived secondary structure. For each of the 72 23S rRNA sequences folded, the score of the energetically optimal folding is plotted against its %HSTL. The linear correlation coefficient is +0.51. Score %HSTL 20.00 25.00 30.00 35.00 40.00 45.00 50.00 55.00 60.00 65.00 70.00 75.00 5.00 10.00 15.00 20.00 25.00
  • 9. mization. In an ideal setting at a given temperature, only one set of free energy parameters is expected for all RNA molecules. However, real systems are likely to maintain different conditions of salt, pH, and other factors, so in this sense ‘optimization’ might prove useful. Second, there may be structural features (i.e. motifs) still unknown to us that need to be identified and incorporated into the algorithm; these unknown features could be group-specific and could account for the trend in the score in Figure 1. Analogously, the frequency of occurrence of motifs in each of the five phylogenetic groups may be related to the accurate prediction of base pairs and hence the score; this relationship requires further study. The aforementioned ‘optimization’ of the Zuker–Turner method would need to account for the frequency of occur- rence in motifs Why are short-range base pairs predicted better than long- range base pairs (see Fig. 2)? The Zuker–Turner method does not penalize a base pair simply for being long range, nor does it stabilize a base pair simply for being short range. So, one must look beyond this convenient view in order to answer the question. By definition, the con- stituent bases that make up a short-range base pair are separated by fewer nucleotides than are the constituent bases of a long-range base pair. Hence, a smaller amount of the overall rRNA structure is ‘closed’ by a short-range base pair, and in contrast, a long-range base pair can close a significant portion of the overall structure. This suggests that the formation of a long-range base pair is a more complex problem in three-dimensional space than is the formation of a short-range base pair. Since the Zuker–Turner method folds RNA by estimating the free energy of only secondary structural features, the three- dimensional information required to correctly predict long-range base pairs is not inherent in the thermody- namic energy values, and the result that short-range base pairs are predicted better than long-range base pairs is then not unexpected. Why are base pairs that are classified as being hairpin loop closers predicted better than base pairs classified as closing bulge, internal, and multistem loops? By the con- struction of RNA, base pairs that close hairpin loops are very likely to be short-range base pairs as well. So this question may be partially answered by the previous com- ments regarding short-range base pairs. On the other hand, hairpin loops have received much greater experi- Research Paper Analysis of 23S rRNA Fields and Gutell 427 Figure 8 Score versus % of bases in the corresponding primary sequence that are G or C (%G+C). For each of the 49 23S archaeal, eubacterial, chloroplast, and mitochondrial rRNA sequences folded, the score of the energetically optimal folding is plotted against its %G+C. The linear correlation coefficient is +0.59. Score %G+C 20.00 25.00 30.00 35.00 40.00 45.00 50.00 55.00 60.00 65.00 70.00 75.00 20.00 30.00 40.00 50.00 60.00 Figure 9 Score versus % of bases in the corresponding primary sequence that are G or C (%G+C). For each of the 23 23S eucaryal rRNA sequences folded, the score of the energetically optimal folding is plotted against its %G+C. The linear correlation coefficient is –0.63. Score %G+C 22.00 24.00 26.00 28.00 30.00 32.00 34.00 36.00 38.00 40.00 42.00 44.00 46.00 48.00 50.00 52.00 54.00 56.00 58.00 60.00 40.00 50.00 60.00 70.00
  • 10. mental attention than have bulge, internal, and multistem loops (i.e. irregularities within helices, which would also include noncanonical base pairs). Given this disparate level of attention, the Zuker–Turner method may simply be accounting more accurately for base pairs in proximity to hairpin loops than for base pairs in proximity to the other three loop types; this could answer the current ques- tion. To illustrate the complexity of the study of irregular- ities in a helix, one needs to consider only the two different arrangements of bulge loops of size one (see [3]). In this case, the single base may ‘bulge out’ of the helix in which it is situated (as its name implies), or instead the base may be intercalated into its respective helix. Surely these two arrangements should not be assigned the same value of free energy by the Zuker–Turner method as is currently the case? (One must use caution with arguments that focus on the adjustment of a free energy input element because, by necessity, the adjustment will affect the secondary structure at both the global and the local level.) Also, the free energy values currently used for mul- tistem loops are estimations and a better understanding of multistem loops would undoubtedly lead to an increase in predictive accuracy [6,20]. Any time a prediction of RNA secondary structure is made, one must be concerned with the accuracy of the selfsame prediction. Regarding the Zuker–Turner method, Zuker and co-workers have begun to address the issue of accuracy in the case of 16S rRNA [20]. Specifi- cally, they showed that predictions made by their method are more reliable when the helices of the structure of interest are predicted with a low level of competing struc- tures in the suboptimal foldings. (They call this property ‘well determinedness’.) We approached the issue of accu- racy for 16S and 23S rRNA by looking for predictors of score and we semiquantitatively identified three such parameters: % noncanonical base pairs, % hairpin loops that are stable tetraloops, and %G+C. Might claiming % noncanonical base pairs and % hairpin loops that are stable tetraloops as predictors of score simply be ‘too obvious’ to warrant our elaboration? Not at all! First, this work has provided not only semiquantitative proof that the relationships exist, but it has gauged the magnitude of the relationship. Second, the fact that several predictors have been found, and that their effect on the score is measurable, shows that a continued investi- gation of the accuracy of the Zuker–Turner method has the potential of uncovering other secondary structure fea- tures that might (anti-)correlate with the score. It was a negative result that length was found not to be a predictor of score for the 16S and 23S mitochondrial and eucaryal sequences. Nevertheless, this fact was worth reporting because it shows that the Zuker–Turner method should not be disconsidered simply because one is working with large sequences. Predictors of score can be used in two ways. First, predic- tors can point out where time and effort (experimental and theoretical) should be focused to improve the method. All the predictors noted here fall into this category and % noncanonical base pairs is a very straightforward example. Zuker and co-workers [20] have already suggested that incorporation of rules to predict noncanonical base pairs is likely to improve the accuracy of the method and our demonstration of % noncanonical base pairs as a predictor places this suggestion on solid ground. Second, using the Zuker–Turner method as a way to play out the folding of rRNA, predictors directly question the biological nature of rRNA. We hinted at this above when we suggested “that the free energy elements used as input to the Zuker–Turner method may need to be separately ‘opti- mized’ for each of the five phylogenetic groups.” The % noncanonical base pairs is a (perhaps trivial) example of this in that it forces one to notice that mitochondrial and eucaryal 23S rRNA comparatively derived secondary structures utilize noncanonical base pairs to a greater extent than do the secondary structures of the other phy- logenetic groups. In a contrasting fashion, one is forced to note that % hairpin loops that are stable tetraloops is larger for archaeal and eubacterial comparatively derived sec- ondary structures than it is for structures of mitochondria and Eucarya. The case of %G+C in 23S rRNA provides a less superficial example of this second usage of a predictor, because the content of G+C in a eucaryal sequence exhibits a very dif- ferent behavior in the execution of the algorithm than it does in the other phylogenetic groups (see Figs 8,9). This may infer that the folding strategy of eucaryal rRNA differs in some (unknown) manner from the strategies of the other phylogenetic groups. In the future, this differ- ence could be investigated by charting the context of use (e.g. is the base found predominantly in helices or in single-stranded regions, or does it tend to close a particular type of loop?) of the four base types in the comparatively derived secondary structures. We speculate that this could provide a basis of explanation for the behavior of %G+C as a predictor of score, and yet at the same time, our bio- chemical perspective of rRNA would expand in a funda- mental way by learning how each of the four base types are ‘used’ in building secondary structure in each of the five phylogenetic groups. In closing, we have presented in this paper the results of the folding of 72 phylogenetically and structurally distinct 23S rRNA sequences by the Zuker–Turner method. Using averages of scores, we have pointed out trends in the accuracy of the prediction of the method by grouping the sequences into five phylogenetic classes and by sub- grouping the classes according to base pair distance and loop closing category. This showed that base pairs that are better understood (i.e. short-range and hairpin loop 428 Folding & Design Vol 1 No 6
  • 11. closing base pairs) tend to be better predicted. To begin to understand how the Zuker–Turner method can be improved and, just as important, to probe the biology of 16S and 23S rRNA, we identified three semiquantitative predictors of score (i.e. % noncanonical base pairs, % hairpin loops that are stable tetraloops, and %G+C). Finally, by assuming that the global minimization of free energy is the ‘driving force’ behind the folding of 16S and 23S rRNA, we have discussed the results and showed that much work remains to be done in this field. Materials and methods Diagrams of the 72 comparatively derived secondary structures of 23S rRNA used here are publically available at: http://pundit. colorado.edu:8080/root.html. Each diagram shows the comparatively inferred base pairs and the base pairs that were correctly predicted by the energetically optimal folding of the Zuker–Turner method. The most immediate conclusion upon review of these structures is that the base pairs that are correctly predicted are ‘short range’ with respect to their positions in the primary sequence (i.e. < 100 bases apart). For each of the 72 sequences, the score of the energetically optimal folding, the score of the best suboptimal folding, the sequence length, and the other values used in the Results section are also available. Notation and abbreviations Both 23S and 23S-like rRNA are referred to as 23S rRNA. Likewise, 16S and 16S-like rRNA are both referred to as 16S rRNA. Canonical base pairs are Watson–Crick base pairs or GU or UG base pairs; any other type of base pair is noncanonical. For the purposes of denoting trends in the text we use ‘>’ in its standard role as ‘greater than’, except that ‘=’ will be used whenever the values of two scores come within 90% of each other. We use the word ‘semiquantitative’ to describe our results because we utilize linear correlation coefficients without provid- ing or discussing the associated regression line. Comparatively derived secondary structures The comparatively derived secondary structures upon which all the scores were based were secondary structures derived from compara- tive analysis [3–5]. The 72 23S rRNA sequences used in this paper and their respective comparatively derived secondary structures were selected from recent compilations [12,13] or from data in preparation for publication by MN Schnare, SH Damberger, MW Gray, and RR Gutell. The sequences were selected so that phylogenetically and structurally distinct comparatively derived secondary structures would be represented. The sequences were similar in phylogenetic position- ing and degree of structural variation to the 16S rRNA sequences ana- lyzed previously [11]. The secondary structure diagrams were drawn with the program XRNA (Weiser and Noller; ftp://fangio.ucsc.edu/pub/XRNA/) on Sun Sparc workstations. Zuker–Turner method The Zuker–Turner method [6–9] used here was version 2.2 of the MFOLD program and the free energy values were those for 37°C. At the time of writing the MFOLD package was available at ftp://snark.wustl.edu/pub/ and the MFOLD manual could be found at http://ibc.wustl.edu/~zuker/seqanal/. We implemented MFOLD on a Sun Sparc 10/51 and we set the user-defined parameters required for the operation of MFOLD to the following values: % for sort (P%) = 10, number of trace-backs = 50, and window size = 20. Each sequence was submitted to MFOLD in its full-length form rather than being broken into its known domains with subsequent submission of these domain subsequences. Although this practice increased the time of computation by several-fold (each folding took approximately 48 to 72 and 120 megabytes of memory), it did emphasize our pretense of having no prior knowledge of the biologically relevant secondary struc- ture. It is noted that the Zuker–Turner method has been used in a way in which coaxial stacking and some noncanonical base pairs were addressed [21]; we did not use this version. Linear correlation coefficients. Our goal was to find semiquantitative predictors of the accuracy of the Zuker–Turner method and this ultimately required us to interpret scatter plots of score versus the corresponding values of the suspected pre- dictors. To police our qualitative interpretation of the scatter plots, we computed the linear correlation coefficient without regression analysis for each plot under consideration. Naturally, one does not expect that the hypothetical function relating the score to any particular predictor need be linear; nevertheless, from the multitude of standard statistics that can be employed in any study, the extent to which our scatter plots support a linear relation was a frugal choice. Letting x and y represent the two values to be compared and with the bar denoting the mean value the linear correlation coefficient was [22]: ⌺ (xi – x–) (yi – y–) √⌺ (xi – x–)2 ⌺ (yi – y–)2 (1) The linear correlation coefficients were computed in Fortran 77 on a Sun Sparc workstation and the level of significance for each linear cor- relation coefficient was determined from Table 3 in the appendix of [22]. Acknowledgements This work was supported by grants from the NIH (GM48207) and the Col- orado RNA Center. The WH Keck Foundation is thanked for its support of RNA research at the University of Colorado. RR Gutell is an associate in the Program in Evolutionary Biology of the Canadian Institute for Advanced Research. We thank Danielle Konings for a great deal of assistance in all the phases of this project. We also thank the referees for their constructive criticisms. References 1. Hill, W.E., Dahlberg, A., Garrett, R.A., Moore, P.B., Schlessinger, D. & Warner, J.R. (eds). (1990). The Ribosome: Structure, Function, and Evolution. American Society for Microbiology, Washington, DC. 2. Nierhaus, K.H., Franceschi, F., Subramanian, A.R., Erdmann, V.A. & Wittmann-Liebold, B. (eds). (1993). The Translational Apparatus: Structure, Function, Regulation, Evolution. Plenum Press, New York. 3. Gutell, R.R., Larsen, N. & Woese, C.R. (1994). Lessons from an evolv- ing rRNA: 16S and 23S rRNA structures from a comparative perspec- tive. Microbiol. Rev. 58, 10–26. 4. Woese, C.R. & Pace, N.R. (1993). Probing RNA structure, function and history by comparative analysis. In The RNA World. (Gesteland, R.F., Atkins, J.F., eds), pp. 91–117, Cold Spring Harbor Laboratory Press, Plainview NY. 5. Gutell, R.R. (1996). Comparative sequence analysis and the structure of 16S and 23S rRNA. In Ribosomal RNA: Structure, Evolution, Pro- cessing, and Function in Protein Biosynthesis. (Zimmermann, R.A., Dahlberg, A.E., eds), pp. 111–128, CRC Press, Boca Raton FL. 6. Jaeger, J.A., Turner, D.H. & Zuker, M. (1989). Improved predictions of secondary structures for RNA. Proc. Natl. Acad. Sci. U.S.A. 86, 7706–7710. 7. Jaeger, J.A., Turner, D.H. & Zuker, M. (1990). Predicting optimal and suboptimal secondary structure for RNA. In Molecular Evolution: Computer Analysis of Protein and Nucleic Acid Sequences. (Doolit- tle, R.F., ed.), vol. 183, pp. 281–306, Academic Press, San Diego CA. 8. Zuker, M. (1989). On finding all suboptimal foldings of an RNA mole- cule. Science 244, 48–52. 9. Zuker, M. (1989). The use of dynamic programming algorithms in RNA secondary structure prediction. In Mathematical Methods for DNA Sequences. (Waterman, M.S., ed.), pp. 159–184, CRC Press, Boca Raton FL. 10. Woese, C.R., Kandler, O. & Wheelis, M.L. (1990). Towards a natural system of organisms: proposal for the domains Archaea, Bacteria and Eucarya. Proc. Natl. Acad. Sci. U.S.A. 87, 4576–4579. 11. Konings, D.A.M. & Gutell, R.R. (1995). A comparison of thermody- namic foldings with comparatively derived structures of 16S and 16S- like rRNAs. RNA 1, 559–574. 12. Gutell, R.R., Gray, M.W. & Schnare, M.N. (1993). A compilation of large subunits (23S and 23S-like) ribosomal RNA structures: 1993. Nucleic Acids Res. 21, 3055–3074. Research Paper Analysis of 23S rRNA Fields and Gutell 429
  • 12. 13. Schnare, M.N., Damberger, S.H., Gray, M.W. & Gutell, R.R. (1996). Comprehensive comparison of structural characteristics in eukaryotic cytoplasmic large subunit (23S-like) ribosomal RNA. J. Mol. Biol. 256, 701–719. 14. Zuker, M., Jaeger, J.A. & Turner, D.H. (1991). Comparison of optimal and suboptimal RNA secondary structures predicted by free energy minimization with structures determined by phylogenetic comparison. Nucleic Acids Res. 19, 2707–2714. 15. Woese, C.R., Winker, S. & Gutell, R.R. (1990). Architecture of riboso- mal RNA: constraints on the sequence of “tetra-loops”. Proc. Natl. Acad. Sci. U.S.A. 87, 8467–8471. 16. Cornish-Bowden, A. (1985). Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984. Nucleic Acids Res. 13, 3021–3030. 17. Saenger, W. (1984). Principles of Nucleic Acid Structure. p. 146, Springer-Verlag, New York. 18. Herschlag, D. (1995). RNA chaperones and the RNA folding problem. J. Biol. Chem. 270, 20871–20874. 19. Higgs, P.G. & Morgan, S.R. (1995). Thermodynamics of RNA folding. When is an RNA molecule in equilibrium? In Advances in Artificial Life: Proceedings of the Third European Conference on Artificial Life. (Moran, F., Moreno, A., Merelo, J.J., Chacon, P., eds), In Lecture Notes in Artificial Intelligence, vol. 929, pp. 852–861, Springer Verlag, New York. 20. Zuker, M. & Jacobson, A.B. (1995). “Well-determined” regions in RNA secondary structure prediction: analysis of small subunit ribosomal RNA. Nucleic Acids Res. 23, 2791–2798. 21. Walter, A.E., et al., & Zuker, M. (1994). Coaxial stacking of helixes enhances binding of oligoribonucleotides and improves predictions of RNA folding. Proc. Natl. Acad. Sci. U.S.A. 91, 9218–9222. 22. Taylor, J.R. (1982). An Introduction to Error Analysis. University Science Books, Mill Valley CA. 430 Folding & Design Vol 1 No 6 Because Folding & Design operates a ‘Continuous Publication System’ for Research Papers, this paper has been published via the internet before being printed. The paper can be accessed from http://biomednet.com/cbiology/fad.htm — for further information, see the explanation on the contents page.