Gutell 081.cosb.2002.12.0301

The accuracy of ribosomal RNA comparative structure models
Robin R Gutell*, Jung C Lee† and Jamie J Cannone‡

The determination of the 16S and 23S rRNA secondary structure models began shortly after the first complete 16S and 23S rRNA sequences were determined in the late 1970s. The structures that are common to all 16S rRNAs and all 23S rRNAs were determined with comparative methods from the analysis of thousands of rRNA sequences. More than twenty years later, the 16S and 23S rRNA comparative structure models have been evaluated against the recently determined high-resolution crystal structures of the 30S and 50S ribosomal subunits. Nearly all of the base pairs predicted by covariation analysis, including the regular base pairs and helices as well as the irregular base pairs and tertiary interactions, are present in the 30S and 50S crystal structures.
Addresses
*Institute for Cellular and Molecular Biology, and Section of Integrative
Biology, University of Texas, 2500 Speedway, Austin,
Texas 78712-1095, USA; e-mail: robin.gutell@mail.utexas.edu
†Division of Medicinal Chemistry, College of Pharmacy, University of
Texas, Austin, Texas 78712, USA; e-mail: hanbau@pundit.icmb.utexas.edu
‡Institute for Cellular and Molecular Biology, University of Texas,
2500 Speedway, Austin, Texas 78712-1095, USA;
e-mail: cannone@mail.utexas.edu
Correspondence: Robin R Gutell
Current Opinion in Structural Biology 2002, 12:301–310
0959-440X/02/$ — see front matter
© 2002 Elsevier Science Ltd. All rights reserved.
Abbreviations
CRW Comparative RNA Web
PDB Protein Data Bank
Introduction: the grand challenge
One of the grand challenges in science is the RNA folding problem. The computational aim is to fold a linear sequence of nucleotides into its biologically active three-dimensional structure. The challenge is to distinguish the correct base pairings and helices from the large number of possible interactions. For 16S rRNA, a molecule approximately 1500 nucleotides in length, there are approximately 15,000 possible helices, fewer than 100 of which are in the final structure. The 23S rRNA is about twice the length of 16S rRNA and has about 50,000 possible helices, of which about 150 are in the final structure. A single structure model is assembled from a set of unique, nonoverlapping helices (or portions of them). The maximum number of combinatorial arrangements of all possible helices is very (very) large: about 4.3 × 10^393 possible structure models for 16S rRNA and about 6.3 × 10^740 for 23S rRNA.
To identify the correct structure from these large numbers of possible base pairings, helices and structure models, we need the basic rules of RNA structure, or constraints, that define the following:
1. All of the possible RNA structural motifs (e.g. base pairs, helices, hairpin loops).
2. The mapping between each of these structural elements and the permissible arrangements and compositions of the nucleotides that form it (a ‘many-to-one problem’).
3. The organization and arrangement of these structural elements relative to one another, both locally and globally across the entire RNA structure.
4. The thermodynamics associated with the proper folding of the RNA molecule.
5. Other factors influencing RNA folding, including protein binding (e.g. chaperones and ribosomal proteins) and the rates of folding during transcription.
6. The relative contributions of these rules to the process of folding the RNA and to the structure that participates in its function.
Beyond the basic building blocks of RNA structure (the canonical base pairs G•C, A•U and G•U, and the arrangement of these base pairs into helices), our understanding of these aspects of RNA folding is rudimentary. Consequently, we do not yet have sufficient constraints to predict the correct higher-order RNA structure accurately and reliably from its underlying sequence. The program mfold [1,2], the most successful of the RNA folding algorithms that predict secondary structure from sequence, integrates thermodynamic base-pairing rules with a helix identification and selection scheme. Although the prediction of RNA secondary structure from a single sequence has improved significantly, this program, with its inherent folding criteria, still does not consistently and unambiguously determine the correct secondary structure [2–6]. Tertiary interactions, which are layered onto the secondary structure, are even harder to predict because they involve a larger number of less well-defined structural components.
Beginning in the late 1970s, our specific goal was to predict the structures of the 16S and 23S rRNAs, the major RNA components of the 30S and 50S ribosomal subunits, respectively. These RNAs are complexed with ribosomal proteins and are intimately associated with protein synthesis. An understanding of their secondary and tertiary structures lays the foundation for understanding their functions.
In contrast to the RNA folding algorithms, which use thermodynamic information on consecutive base pairs and other small structural elements, an alternative method,
comparative analysis, is based on a very simple and
profound principle. This method has been used to predict the secondary structure and the first elements of the tertiary structure of several RNA molecules, including the rRNAs.
In addition to these structure predictions, the comparative
approach has also revealed new information about RNA
structural motifs and other principles of RNA structure.
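The scale of the search space described in the introduction can be illustrated with a short sketch that enumerates candidate helices in a single sequence. The definition used here is an assumption for illustration, not the criterion behind the 15,000 and 50,000 counts quoted above: a candidate helix is a maximal run of at least two stacked G•C, A•U or G•U pairs, with at least three nucleotides between the innermost paired positions. The function names and thresholds are hypothetical.

```python
# Canonical pairs considered when building candidate helices (assumption:
# only G-C, A-U and G-U; noncanonical pairs are ignored).
CANONICAL = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def can_pair(a: str, b: str) -> bool:
    return (a, b) in CANONICAL


def enumerate_helices(seq: str, min_len: int = 2, min_loop: int = 3) -> list[tuple[int, int, int]]:
    """List maximal candidate helices as (i, j, length).

    A helix pairs i with j, i+1 with j-1, ... for `length` stacked pairs.
    `min_len` and `min_loop` are illustrative thresholds, not values from the text.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    helices = []
    for i in range(n):
        # j must leave at least `min_loop` unpaired nucleotides between i and j
        for j in range(n - 1, i + min_loop, -1):
            if not can_pair(seq[i], seq[j]):
                continue
            # Skip (i, j) if it only extends a helix that starts one pair further out.
            if i > 0 and j < n - 1 and can_pair(seq[i - 1], seq[j + 1]):
                continue
            length = 0
            # Extend inward while pairs are canonical and the loop stays >= min_loop
            while (i + length < j - length - min_loop
                   and can_pair(seq[i + length], seq[j - length])):
                length += 1
            if length >= min_len:
                helices.append((i, j, length))
    return helices
```

Even a 10-nucleotide hairpin such as `GGGAAAUCCC` yields several overlapping, slipped candidates. On a full-length rRNA the count depends strongly on `min_len` and `min_loop`, and choosing a structure model then means selecting a compatible, nonoverlapping subset of these helices, which is where the combinatorial explosion arises.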
Inferring higher-order structure from patterns of sequence variation
Shortly after the first tRNA sequence was determined [7], it was reasoned from a comparative perspective that all tRNA sequences should have equivalent secondary and tertiary structures, allowing them to interact with the same binding sites on the ribosome and with the same set of proteins and RNAs during protein synthesis. Two basic principles form the foundation for the comparative analysis of RNA structure: firstly, different RNA sequences can fold into the same secondary and tertiary structures and, secondly, the unique structure and function of an RNA molecule are maintained through the evolutionary process of mutation and selection. We applied this comparative paradigm to the prediction of the 16S and 23S rRNA structures, assuming that all 16S (and 16S-like) and 23S (and 23S-like) rRNAs have the same general secondary and tertiary structures, regardless of the extent of conservation and variation among the sequences. Correct helices identified by comparative analysis occur in the same homologous region of the rRNAs, and their sequences vary while maintaining G•C, A•U and G•U base pairs. Initially, we identified base-paired positions within a potential helix that have ‘covariation’ (similar patterns of variation) in a set of sequences aligned for maximum sequence identity [8–10]. Proposed helices with two or more covariations were considered ‘proven’. Versions of the 16S and 23S rRNA structure models from the early 1980s (Santa Cruz/Urbana versions) are shown in Figure 1. Most of the helices in these early structure models had at least one covariation per helix. We considered these models to be minimal structures; that is, some regions were incomplete.
Two other sets of 16S and 23S rRNA structure models
were determined independently with comparative methods
[11–14], whereas another set of model diagrams was adapted
in full from previously proposed structure models [15–17].
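The early helix-level criterion described above (a proposed helix with two or more covariations is ‘proven’) can be sketched as follows. Here a base pair is taken to covary when the alignment contains two canonical pairs that differ at both positions (for example G•C and A•U); this definition, the gap handling and the function names are simplifying assumptions, and the later algorithms cited in the text [18–20] are more elaborate.

```python
from itertools import combinations

CANONICAL = {"GC", "CG", "AU", "UA", "GU", "UG"}


def pair_covaries(col_i: str, col_j: str) -> bool:
    """True if two canonical pairs in the columns differ at BOTH positions
    (e.g. G-C vs A-U), i.e. a compensatory change. Gaps are ignored."""
    observed = {a + b for a, b in zip(col_i, col_j) if a + b in CANONICAL}
    return any(p[0] != q[0] and p[1] != q[1] for p, q in combinations(observed, 2))


def count_covariations(alignment: list[str], helix: list[tuple[int, int]]) -> int:
    """Count the column pairs of a proposed helix that show covariation."""
    count = 0
    for i, j in helix:
        col_i = "".join(seq[i] for seq in alignment)
        col_j = "".join(seq[j] for seq in alignment)
        if pair_covaries(col_i, col_j):
            count += 1
    return count


def is_proven(alignment: list[str], helix: list[tuple[int, int]], threshold: int = 2) -> bool:
    # Early criterion from the text: two or more covariations => "proven"
    return count_covariations(alignment, helix) >= threshold
```

For example, in the aligned sequences `GGAAAACC`, `AGAAAACU` and `GAAAAAUC`, both column pairs (0, 7) and (1, 6) switch between G•C and A•U, so the two-pair helix counts as proven; a G•C ↔ G•U change alone does not count, because only one position varies.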
Subsequently, as the number of sequences in our 16S and 23S rRNA alignments surpassed 25, we developed different algorithms and computer programs to identify positions in an alignment that have similar patterns of variation [18–20]. With these successive improvements in the covariation algorithms, coupled with dramatic increases in the number and diversity of rRNA sequences in our collection, we were able to identify more positions with similar patterns of variation. Although the early covariation analysis identified only those covariations that involve A•U and G•C pairings within a potential helix, for the past ten years our algorithms have identified all positional covariations, regardless of base pair type and of the types of interchange with other base pairs (e.g. U•U ↔ C•C, A•A ↔ G•G, U•U ↔ G•G), and independently of the spatial relationship with other base pairings and structural elements [21]. Consequently, we began identifying single base pairings not flanked by other base pairings, noncanonical base pairs and other types of tertiary interactions (see below). In addition to including newly identified base pairs, we removed previously proposed base pairs from the structure models when the ratio of covariation to variation dropped as the number of sequences increased.

302 Nucleic acids

Figure 1
The original (1980–81) Noller-Woese-Gutell comparative structure models for the 16S and 23S rRNAs. (a) 16S rRNA (adapted from [8]). (b) 23S rRNA, 5′ half (adapted from [9]). (c) 23S rRNA, 3′ half (adapted from [9]). E. coli (GenBank accession number J01695) is used as the reference sequence. Each original model has been superimposed onto the corresponding current model diagram to highlight the similarities and differences. Nucleotides are replaced with colored dots: black, positions that are unchanged between the original and current models; blue, base pairs present in the original models but absent from the current models; red, positions that are unpaired in the original models but are part of a base pair in the current models; green, positions that are part of one base pair in the original models but part of a different base pair in the current models. Full-page versions of each panel are available online at http://www.rna.icmb.utexas.edu/ANALYSIS/COSB2002/ (part of the CRW site at http://www.rna.icmb.utexas.edu/).

To gauge the extent of positional covariation and our
confidence in the accuracy of each of these proposed base
pairs, we established a quantitative scoring method.
Higher scores reflect a greater extent of pure covariation
(simultaneous changes at both of the paired positions),
larger numbers of exchanges between a set of base pair
types that covary with one another (e.g. A•U ↔ G•C)
and/or a larger number of mutual changes or covariations
that occur during the evolution of the RNA (also called
phylogenetic events). These three parameters can,
individually or collectively, influence our confidence in a
putative base pair. For example, we were more confident
in the authenticity of the 570•866 base pair in 16S rRNA
because of several phylogenetic events within the bacteria,
archaea and eucarya [22]. These 16S and 23S rRNA
covariation-based structure models only contain those base
pairs with positional covariation or G•C, A•U or G•U base
pairs that are within a regular helix and present in more
than 80% of the sequences.
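The scoring method above is not specified in enough detail to reproduce here. As a stand-in, the sketch below uses mutual information between two alignment columns, a standard measure of positional covariation, together with the fraction of sequences forming a G•C, A•U or G•U pair (the 80% inclusion rule stated above). This is not claimed to be the scoring scheme used for these models; in particular it ignores phylogenetic events, which require a tree. Function names are hypothetical.

```python
import math
from collections import Counter

CANONICAL = {"GC", "CG", "AU", "UA", "GU", "UG"}


def mutual_information(col_i: str, col_j: str) -> float:
    """Mutual information (bits) between two alignment columns; rows with a gap are skipped."""
    rows = [(a, b) for a, b in zip(col_i, col_j) if a != "-" and b != "-"]
    n = len(rows)
    if n == 0:
        return 0.0
    joint = Counter(rows)
    left = Counter(a for a, _ in rows)
    right = Counter(b for _, b in rows)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


def canonical_fraction(col_i: str, col_j: str) -> float:
    """Fraction of sequences in which the two columns form G-C, A-U or G-U."""
    pairs = [a + b for a, b in zip(col_i, col_j)]
    return sum(p in CANONICAL for p in pairs) / len(pairs)
```

A G•C ↔ G•U exchange varies at only one position and scores zero mutual information, whereas a balanced G•C ↔ A•U exchange (pure covariation) scores one bit; a candidate pair in a regular helix would pass the 80% rule when `canonical_fraction` exceeds 0.8.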
The most recent comparative structure models for 16S and 23S rRNA are shown in Figure 2 and are based on the analysis of approximately 7000 16S and 1050 23S rRNA sequences [21,23]. These two structure models are the culmination of 20 years of comparative analysis (see below). The base pair symbols are color coded to indicate our confidence in each base pair: base pairs with the highest covariation scores are shown in red, followed by green and black. Base pairs with gray symbols are conserved in more than 98% of the sequences, whereas blue base pairs do not have a significant amount of pure covariation and do not occur within a standard helix (see [23] for more details). As the majority of the base pairs have red symbols, we believe that nearly all of the base pairs in the current 16S and 23S rRNA covariation-based structure models are correct (see below).

Ribosomal RNA comparative structure models Gutell, Lee and Cannone 303

Figure 2
The current Noller-Woese-Gutell comparative structure models for the 16S and 23S rRNAs. (a) 16S rRNA. (b) 23S rRNA, 5′ half. (c) 23S rRNA, 3′ half. E. coli (GenBank accession number J01695) is used as the reference sequence. Nucleotides are replaced with colored dots that represent confidence in the base pair: red, high covariation scores; green, lower but significant covariation scores occurring within a standard helix containing a red base pair; black, even lower covariation scores occurring within a standard helix containing a red base pair; gray, conserved in more than 98% of the sequences and occurring within a standard helix containing a red base pair; blue, no significant amount of pure covariation and not within a standard helix (see [23] for additional details). Base pair symbols indicate the type of base pair: line, canonical base pair; small closed circle, G•U base pair; large open circle, G•A base pair; large closed circle, other noncanonical base pairs. Nucleotides involved in tertiary interactions (including pseudoknots) are boxed and connected with lines. Diagrams adapted from [23]. Full-page versions of each panel are available online at the CRW site (http://www.rna.icmb.utexas.edu/ANALYSIS/COSB2002/).

The evolution of the 16S and 23S rRNA covariation-based
structure models is shown graphically in Figure 1 and
quantitatively in Table 1. To allow easy comparison with the
current models, the original 1980–81 16S and 23S rRNA
structure models were redrawn using the current models as
a template (Figure 1). Base pairs that are present in both the
original and current models are shown in black, and those that differ between the original and the most recent covariation-based structure models are shown in blue, red and green. Blue base pair symbols indicate base
pairs in the original models that are absent from the current
models, red nucleotides are unpaired in the original models
and paired in the current models, and green nucleotides are
part of different base pairs in the two structure models.
In 1980–81, the 16S and 23S rRNA structure models were
based on just two complete rRNA sequences per structure;
at the end of 1999, this work culminated with the analysis of
approximately 7000 16S and 1050 23S rRNA sequences.
These structure models evolved over nearly 20 years as the
collection of sequences grew and our methods to identify
and score covariations were developed and refined. To assess
the changes, the original 1980–81 structure models were
compared with the current 1999 structure models (Table 1,
adapted from Section 1b on the ‘Comparative RNA Web’
[CRW] site and database; http://www.rna.icmb.utexas.edu).
We draw four significant conclusions from this analysis. Firstly, nearly 60% of the base pairs in the current 16S rRNA structure model were already predicted in the original structure model from the analysis of two sequences; for 23S rRNA, nearly 78% of the current base pairings were predicted in the original structure model. Secondly, of the base pairs proposed in the original 1980–81 models, approximately 80% (16S) and 87% (23S) are present in the current models. Thirdly, approximately 70 16S and 100 23S base pairs from the original rRNA structure models have since been removed. Finally, the number of unusual, tertiary and tertiary-like base pairings that are predicted with confidence increases in parallel with increases in the number and diversity of rRNA sequences studied and with improvements in the covariation algorithms. In conclusion, the major components of the 16S and 23S rRNA structure models were predicted correctly from the analysis of just a few 16S and 23S rRNA sequences that are approximately 75% similar to one another. Thousands of additional rRNA sequences, with significant degrees of similarity to and diversity from one another, were subsequently examined with covariation analysis to refine the secondary structure models, to begin to identify tertiary base pairs and to establish a system for measuring the extent of covariation at all of the proposed base pairs. Beyond the prediction of base pairs with covariation analysis, the comparative sequence and structure data encode fundamental principles of RNA structure and archaeological markers that indicate the ancestry of each RNA sequence [24].
Our next task is to decipher these ‘treasures’ from the
comparative RNA sequence and structure data sets. To
this end, we have established the CRW site and database
([23]; http://www.rna.icmb.utexas.edu/) to organize, analyze
and disseminate comparative data for the 5S, 16S (and
16S-like) and 23S (and 23S-like) rRNAs, group I and II
introns, and tRNAs. The main types of information and
data available online for each of these RNAs are: the current
comparative RNA structure model; nucleotide and base
pair frequency tables for all positions in the reference
structures; secondary structure conservation diagrams that
reveal the extent of conservation of the RNA sequence
and structure; more than 400 representative secondary
structure diagrams for organisms from groups that span the
phylogenetic tree and reveal the major forms of structural
variation; nearly 12,000 publicly available sequences that
are 90% or more complete; and sequence alignments.
Table 1
Summary of the evolution of the Noller-Woese-Gutell 16S and 23S rRNA structure models from the first to the most recent covariation-based structure models (adapted from Table 3a,b in [23]).

Model                                                               16S rRNA        23S rRNA
Date                                                                1980    1999    1981    1999
1. Approximate number of complete sequences                         2       7000    2       1050
2. Percentage of 1999 sequences*                                    0.03    100     0.2     100
3. Number of bp proposed correctly*                                 284     478     676     870
4. Number of bp proposed incorrectly*                               69      0       102     0
5. Total bp in model (3 + 4)                                        353     478     778     870
6. Percentage of current-model bp present in this model (3 / X)*†   59.4    100     77.7    100
7. Accuracy of proposed bp (3 / 5)                                  80.5    100     86.9    100
8. Number of bp in current model missing from this model (X – 3)*†  194     0       194     0
9. Number of tertiary bp proposed correctly*                        4       40      4       65
10. Percentage of tertiary bp proposed correctly*                   10.0    100     6.2     100
11. Number of base triples proposed correctly*                      0       6       0       7
12. Percentage of base triples proposed correctly*                  0       100     0       100

*Comparisons are made against the current (1999) models.
†X = 478 for 16S rRNA; X = 870 for 23S rRNA.
bp, base pairs.
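The derived rows of Table 1 follow directly from rows 3 and 4 and the current-model totals X. The small helper below (a hypothetical name) recomputes rows 5 to 8 as a consistency check on the table.

```python
def derived_rows(correct: int, incorrect: int, current_total: int) -> dict:
    """Recompute rows 5-8 of Table 1 from rows 3-4 and X (bp in the current model)."""
    total = correct + incorrect                                     # row 5
    return {
        "total_bp": total,
        "pct_in_current": round(100 * correct / current_total, 1),  # row 6: 3 / X
        "accuracy": round(100 * correct / total, 1),                # row 7: 3 / 5
        "missing_from_model": current_total - correct,              # row 8: X - 3
    }


for label, args in {"16S (1980)": (284, 69, 478), "23S (1981)": (676, 102, 870)}.items():
    print(label, derived_rows(*args))
```

Both early models miss 194 base pairs of the current models, and the tertiary percentages in row 10 follow the same pattern (4 / 40 = 10.0% and 4 / 65 ≈ 6.2%).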

This type of comparative data is the foundation for the
subsequent identification and analysis of RNA structural
motifs. Although the patterns of variation at both positions
in many of the base pairs in the RNA structure are similar
and thus should be identified with covariation analysis,
other sets of base pairs do not have similar patterns of
variation at the two interacting positions. Thus, one of the
larger goals of comparative analysis is to predict those base
pairs lacking similar patterns of variation that occur in
several different types of structural elements, as well as
those base pairs with positional covariation that are conserved
among the sequences in that data set. The process of
comparative analysis, then, is to first predict base pairings
with covariation analysis, followed by the identification of
motifs that are composed of unique arrangements of
sequences within specific structural elements. Several
RNA structural motifs have been identified and/or are still
being defined from sequence and structure perspectives.
These motifs include:
1. Unpaired adenosines in the covariation-based structure
model [18,25•].
2. Tetraloops — hairpin loops with four nucleotides that are
composed of specific sequences [26].
3. Tetraloop receptors and other tertiary interactions involving
tetraloops [27–30].
4. Dominant G•U base pairs [31,32].
5. Tandem G•A oppositions [33,34].
6. Base triples [20].
7. Adenosine platforms [25•,35].
8. U-turns [36].
9. E loops (or S turns) [25•,37,38].
10. E-like loops [25•].
11. Cross-strand purine stacks [39].
12. A•A and A•G oppositions/base pairs at the ends of
helices [10,40,41•].
13. Lone pair triloops ([21]; RR Gutell et al., unpublished
data).
14. A-minor motif [42•,43•].
15. Kink-turn [44•].
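Motif identification of this kind starts from the covariation-based structure and looks for characteristic sequences within specific structural elements. As a minimal sketch, the code below finds four-nucleotide hairpin loops in a structure written in dot-bracket notation (an assumption; the models here are drawn as diagrams) and checks them against GNRA and UNCG, two well-characterized tetraloop sequence families (N is any nucleotide, R is a purine). The function names are hypothetical, and real motif searches use far richer sequence and structure context.

```python
import re

# Two well-known tetraloop sequence families (N = any nucleotide, R = purine).
TETRALOOP_PATTERNS = {
    "GNRA": re.compile(r"G[ACGU][AG]A"),
    "UNCG": re.compile(r"U[ACGU]CG"),
}


def hairpin_loops(structure: str) -> list[tuple[int, int]]:
    """Return (first, last) indices of hairpin loops in a dot-bracket string.

    A hairpin loop is an unpaired run closed directly by a single base pair.
    Unbalanced brackets raise IndexError.
    """
    stack: list[int] = []
    loops = []
    for k, ch in enumerate(structure):
        if ch == "(":
            stack.append(k)
        elif ch == ")":
            i = stack.pop()
            inner = structure[i + 1:k]
            if inner and set(inner) == {"."}:
                loops.append((i + 1, k - 1))
    return loops


def find_tetraloops(seq: str, structure: str) -> list[tuple[int, str, str]]:
    """Return (start, loop sequence, family) for 4-nt hairpin loops matching a family."""
    seq = seq.upper().replace("T", "U")
    hits = []
    for start, end in hairpin_loops(structure):
        loop = seq[start:end + 1]
        if len(loop) != 4:
            continue
        for family, pattern in TETRALOOP_PATTERNS.items():
            if pattern.fullmatch(loop):
                hits.append((start, loop, family))
    return hits
```

For example, `find_tetraloops("GGCGAAAGCC", "(((....)))")` reports the GAAA loop as a GNRA tetraloop starting at index 3.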
Crystal structures of the 16S and 23S rRNAs: the accuracy of the rRNA comparative structure models
To assess the accuracy of the covariation-based structure
models, the comparative models for tRNA [19,20,45–50],
fragments of 5S rRNA [51], the L11-binding region of
23S rRNA [9,21,23] and the group I intron [52,53] were
compared with the corresponding high-resolution crystal
structures [39,54–58]. Nearly all of the secondary structure
base pairings and a few of the tertiary base pairs observed
in the crystal structure were predicted in the comparative
structure models for all of these RNAs. More recently, the
high-resolution crystal structures of the 30S [59••,60] and
50S [61••] ribosomal subunits were solved, giving us the
opportunity to evaluate the accuracy of our most recent
16S and 23S rRNA structure models. The results were
again affirmative: approximately 97–98% of the base
pairings predicted with covariation analysis (in the final
covariation-based structure models) are indeed present
in the 16S and 23S rRNA crystal structures (Table 2;
RR Gutell et al., unpublished data). The accuracy of the 16S and 23S rRNA covariation-based structure predictions not only augments the credibility of the comparative approach, but also validates the sequence alignments that have been initiated, refined and expanded over the past 20 years, the initial covariation analysis, and our subsequent covariation algorithms and their refinements. In addition to the base pairs in the final covariation-based structure models, nearly 45% of the tentative covariation-based base pairs and 70% of the motif-based base pairs that were predicted are in the crystal structures (Table 2). In total, about 90% of the base pairs predicted by comparative analysis are from the covariation-based analysis and 10% are from the alternative motif-based analysis ([20,25•,33,34,41•]; RR Gutell et al., unpublished data).

Table 2
Comparison of the current comparative structure models and the crystal structures of the 16S and 23S rRNAs*.

                                  16S rRNA†           23S rRNA‡           Total
Predicted base pairs§
  Model CB#                       461 / 476 / 97%     779 / 797 / 98%     1240 / 1273 / 97%
  Tentative CB#                   8 / 23 / 35%        18 / 36 / 50%       26 / 59 / 44%
  Motif-based¶                    45 / 65 / 70%       86 / 122 / 70%      131 / 187 / 70%
Crystal structure interactions¥
  +/+ base–base                   514                 883                 1397
  –/+ base–base                   56                  425                 481
  Total base–base                 683                 1297                1862
  Base–backbone                   49                  237                 286

*A more complete analysis will be presented later (RR Gutell et al., unpublished data).
†T. thermophilus, GenBank accession number M26923, PDB code 1FJF [59••].
‡H. marismortui, GenBank accession number AF034620, PDB code 1JJ2 [61••].
§Data are shown as approximate number of base pairs present in the crystal structure / approximate number of predicted base pairs / percentage of predicted base pairs present in the crystal structure.
#CB, covariation-based.
¶The motifs analyzed here are AA.AG@helix.ends [41•], tandem GA [33,34], E and E-like loops [25•], lone pair triloops (RR Gutell et al., unpublished data) and base triples [20].
¥Approximate numbers of interactions in the two ribosomal crystal structures.

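The totals and percentages in Table 2, and the roughly 90%/10% split between covariation-based and motif-based predictions, can be rechecked from the per-molecule counts. This is a sketch over the table's numbers only; `TABLE2` and the helper names are hypothetical.

```python
# (present in crystal structure, predicted) per row and molecule, from Table 2
TABLE2 = {
    "model_cb": {"16S": (461, 476), "23S": (779, 797)},
    "tentative_cb": {"16S": (8, 23), "23S": (18, 36)},
    "motif": {"16S": (45, 65), "23S": (86, 122)},
}


def row_total(row: str) -> tuple[int, int]:
    """Sum the 16S and 23S counts for one row of the table."""
    present = sum(p for p, _ in TABLE2[row].values())
    predicted = sum(q for _, q in TABLE2[row].values())
    return present, predicted


def percent_present(row: str) -> int:
    present, predicted = row_total(row)
    return round(100 * present / predicted)


def covariation_share() -> float:
    """Share of confirmed (+/+) comparative base pairs that came from covariation analysis."""
    cov = row_total("model_cb")[0] + row_total("tentative_cb")[0]
    motif = row_total("motif")[0]
    return cov / (cov + motif)
```

The confirmed counts also sum to the +/+ row of the table: 461 + 8 + 45 = 514 for 16S and 779 + 18 + 86 = 883 for 23S rRNA, 1397 in total, of which about 90.6% come from covariation analysis.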
The secondary structure diagrams for Thermus thermophilus
16S rRNA and Haloarcula marismortui 23S rRNA are shown
in Figure 3. All of the base–base and base–backbone interactions in the 30S [59••] and 50S [61••] ribosomal subunit crystal structures are colored to indicate the source of each pairing. The three primary categories
are: present in both the comparative model (covariation
and motif analysis) and the crystal structure (+/+), present
in the comparative model but not in the crystal structure
(+/–), and not present in the comparative model but
present in the crystal structure (–/+). The nucleotides and
base pair symbols are colored red for +/+, green for +/–,
blue for –/+ base–base interactions and brown for –/+
base–backbone interactions.
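The +/+, +/– and –/+ categories amount to set operations on base pairs. The sketch below assumes that both the comparative model and the crystal structure are given as sets of position pairs in a common numbering; mapping E. coli numbering onto the T. thermophilus or H. marismortui sequences, and base–backbone interactions, are outside its scope. Names are hypothetical.

```python
Pair = tuple[int, int]


def normalize(pairs) -> set[Pair]:
    """Store each base pair as (5' position, 3' position), so (10, 1) equals (1, 10)."""
    return {tuple(sorted(p)) for p in pairs}


def classify_pairs(predicted, observed) -> dict[str, set[Pair]]:
    pred, obs = normalize(predicted), normalize(observed)
    return {
        "+/+": pred & obs,  # in the comparative model and the crystal structure
        "+/-": pred - obs,  # in the comparative model only
        "-/+": obs - pred,  # in the crystal structure only
    }
```

The fraction of predicted pairs confirmed, as reported in Table 2, is then `len(r["+/+"]) / (len(r["+/+"]) + len(r["+/-"]))` for a result `r`.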
The affirmative base pairs that were predicted using covariation analysis (see red nucleotides and base pair symbols in Figure 3) include: essentially all base pairs that are strictly homologous between the E. coli reference structure models and the T. thermophilus 16S and H. marismortui 23S rRNA crystal structures and that have a significant amount of positional covariation; base pairs that are standard Watson–Crick (G•C and A•U) and G•U base pair exchanges; base pairs that occur within standard, nested secondary structure helices (≥2 base pairs in length; i.e. not pseudoknots); individual base pairs and helices that form pseudoknots, including tertiary interactions; lone pairs, including those in the lone pair triloop motif (RR Gutell et al., unpublished data); and noncanonical base pairs and their exchanges (A•A ↔ G•G, U•U ↔ C•C, A•G ↔ G•A, A•C ↔ G•U, U•A ↔ G•G, A•C ↔ U•A and A•G ↔ R•U) [21].

Figure 3
Comparison of the current Noller-Woese-Gutell comparative structure models for the 16S and 23S rRNAs with the corresponding ribosomal subunit crystal structures. (a) 16S rRNA versus the T. thermophilus structure (GenBank accession number M26923; PDB code 1FJF; [59••]). (b) 23S rRNA, 5′ half versus the H. marismortui structure (GenBank accession number AF034620; PDB code 1JJ2; [61••]). (c) 23S rRNA, 3′ half versus the H. marismortui structure (GenBank accession number AF034620; PDB code 1JJ2; [61••]). Nucleotides are replaced with colored dots that show the sources of the interactions: red, present in both the covariation-based structure model and the crystal structure; green, present in the comparative structure but not in the crystal structure; blue, not present in the comparative structure but present in the crystal structure; magenta, present among the tentative covariation-based or motif-based predictions and in the crystal structure; brown, base–backbone or backbone–backbone interactions; purple, positions that are unresolved in the crystal structure. Colored open circles mark the third nucleotide of base triples, and colored open rectangles mark the base pairs of base triples. Colored open squares are used for clarity. Full-page versions of each panel are available online at the CRW site (http://www.rna.icmb.utexas.edu/ANALYSIS/COSB2002/).

Although more than 1250 base pairs predicted with covariation analysis are in the crystal structures, approximately 35 of them are not (see green nucleotides in Figure 3; note that the green interactions include those predicted by covariation analysis as well as those predicted by motif-based analysis). Most of these +/– covariation-based base pairs that are absolutely homologous between the E. coli reference models and the T. thermophilus 16S and H. marismortui 23S rRNA structures were not predicted with our highest (red) confidence rating. Instead, these putative base pairs showed either no positional covariation or an insignificant amount of it; they were included in the structure model because they form a G•C, A•U or G•U pair in more than 80% of the sequences and are adjacent to a base pair with covariation. Most of these +/– base pairs are colored black, our lowest covariation confidence rating.
The aberrant base pairs that are truly homologous between the crystal structures and the E. coli reference structure have two other important characteristics. First, all of these putative base pairs occur at the ends of helices and, second, there is a bias in the types of base pairs that are incorrectly predicted at helix ends: the two most frequent pairing types are U•G and U•A (where the U is in the 5′ half of the helix). These putative base pairs might not occur in the rRNA structure or, alternatively, they might be dynamic, forming at certain stages of protein synthesis but not in the states captured by the crystal structures analyzed here. There is a precedent for conformational changes of the base pairings at the ends of helices. Positions 1408 and 1493 form an A•A base pair in the uncomplexed 30S ribosomal subunit (PDB code 1FJF; [59••]), but are not paired when tRNA and mRNA are complexed to the 30S subunit [62]. We speculate that other A•A and A•G oppositions/base pairs at the ends of helices in the 16S and 23S rRNAs might be involved in conformational changes [41•]. There is also an interesting observation about the putative U•A pairings that are not in the crystal structures: the orientation of these U•A pairs would place the conserved, ‘unpaired’ adenosine at the 3′ end of the loop, a very common arrangement in the 16S and 23S rRNAs [25•].
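The helix-end bias described above can be tallied once a structure is available as a list of base pairs. In the sketch below, a pair (i, j) with i < j is treated as a helix end if it is not stacked on both sides, which also counts lone pairs; pair types are written 5′ partner first, so ‘UG’ corresponds to the U•G case in the text. The definitions and names are illustrative assumptions.

```python
from collections import Counter


def helix_end_pairs(pairs) -> set[tuple[int, int]]:
    """Pairs not stacked on both sides, i.e. at the end of a helix (lone pairs included)."""
    pair_set = {tuple(sorted(p)) for p in pairs}
    return {
        (i, j) for i, j in pair_set
        if not ((i - 1, j + 1) in pair_set and (i + 1, j - 1) in pair_set)
    }


def end_pair_types(seq: str, pairs) -> Counter:
    """Tally helix-end pair types as 5' nucleotide + 3' nucleotide (e.g. 'UG')."""
    return Counter(seq[i] + seq[j] for i, j in helix_end_pairs(pairs))
```

Applied to the +/– pairs of a structure model, such a tally would show whether U•G and U•A dominate among the incorrectly predicted helix-end pairs.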
We will not know all of the structural possibilities for these
putative base pairings until we obtain more crystallographic,
NMR or other experimental data for these regions of the
rRNA. Although comparative analysis has predicted approximately
510 base pairs in 16S rRNA and 880 in 23S rRNA, the crystal
structures contain an additional ~170 16S and ~415 23S rRNA
base–base pairs that were not predicted with comparative methods.
Essentially none of these ‘–/+’ base pairs has a significant amount
of positional covariation, so they could not be predicted with
covariation analysis. In general, these ‘–/+’ base pairs are
noncanonical and are not associated with the standard helices
that were predicted with covariation analysis. A more detailed
comparison between the comparative and crystal structures will
be presented elsewhere (RR Gutell et al., unpublished data).
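Why the ‘–/+’ base pairs escape covariation analysis can be made concrete with a simple covariation score. The Python sketch below uses mutual information between two alignment columns, a widely used covariation measure; treat it as an illustrative stand-in, not as the exact statistic behind the comparative models described here. If either column is invariant, it carries no information about the other, so the score is zero no matter how the two positions interact in three dimensions. The function name and toy alignments are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def mutual_information(alignment: Sequence[str], i: int, j: int) -> float:
    """Mutual information (bits) between alignment columns i and j.

    M(i, j) = sum over observed residue pairs (x, y) of
              f(x, y) * log2(f(x, y) / (f(x) * f(y)))
    where the f values are observed frequencies in the two columns.
    """
    n = len(alignment)
    col_i = [s[i] for s in alignment]
    col_j = [s[j] for s in alignment]
    f_i, f_j = Counter(col_i), Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (x, y), count in f_ij.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((f_i[x] / n) * (f_j[y] / n)))
    return mi


# Compensatory changes (G•C, C•G, A•U, U•A): a strong covariation signal.
print(mutual_information(["GC", "CG", "AU", "UA"], 0, 1))  # 2.0
# Column 0 is invariant: the score is zero even though every row can pair.
print(mutual_information(["GC", "GC", "GU", "GC"], 0, 1))  # 0.0
```

A conserved or noncanonical pair whose positions vary little, or vary independently, scores near zero under this kind of measure, which is consistent with the text's point that such pairs could not be predicted from covariation alone.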
Conclusions
Covariation analysis has accurately predicted all of the
standard secondary structure base pairings and helices in
the 16S and 23S rRNA crystal structures. These methods
have also identified some of the 16S and 23S rRNA tertiary
base–base interactions. Motif-based analysis has begun to
identify some of the base pairs whose two positions do not share
similar patterns of variation (that is, pairs that do not covary).
Our future goal is to gain a better
understanding of tertiary base–base interactions from a
comparative perspective and, more specifically, to determine
their base pair types and exchanges, and the types of
structural elements or motifs with which they are associated.
A more complete set of RNA structure constraints is
necessary to accurately and reliably predict an RNA structure
from its underlying sequence, and to understand the
dynamic relationship between structure and function.
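As an illustration of motif-based analysis, the A•A/A•G opposition at the end of a helix (see above and [41•]) can be searched for without any covariation signal. The Python sketch below is hypothetical: the helix representation (a list of stacked column pairs, outermost first), the decision to treat A•G and G•A alike, and the 90% conservation cutoff are assumptions chosen for illustration, not parameters taken from this paper.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # (i, j): 0-based alignment columns, i < j (assumed layout)

# Oppositions matched by the motif, in either orientation: A•A and A•G.
AA_AG = {("A", "A"), ("A", "G"), ("G", "A")}


def opposition_fraction(alignment: Sequence[str], i: int, j: int) -> float:
    """Fraction of sequences with an A•A or A•G opposition at columns i and j."""
    hits = sum(1 for s in alignment if (s[i], s[j]) in AA_AG)
    return hits / len(alignment)


def helix_end_candidates(alignment: Sequence[str], helix: List[Pair],
                         min_fraction: float = 0.9) -> List[Pair]:
    """Flag the oppositions immediately flanking a helix that match the motif.

    `helix` lists stacked pairs (i, j) from the outermost pair to the innermost
    one (i increasing, j decreasing). `min_fraction` is a hypothetical cutoff.
    """
    outer_i, outer_j = helix[0]
    inner_i, inner_j = helix[-1]
    flanks = [(outer_i - 1, outer_j + 1), (inner_i + 1, inner_j - 1)]
    width = len(alignment[0])
    return [
        (i, j) for i, j in flanks
        if 0 <= i < j < width and opposition_fraction(alignment, i, j) >= min_fraction
    ]


# Toy alignment: helix (1,8)/(2,7); columns 0/9 hold A•G or A•A, columns 3/6 hold U•A.
toy = ["AGCUAAAGCG", "AGCUAAAGCA", "AGCUAAAGCG"]
print(helix_end_candidates(toy, [(1, 8), (2, 7)]))  # [(0, 9)]
```

Only the two oppositions directly flanking the helix are examined, which mirrors the "end of a helix" wording; a real analysis would also need a curated alignment and helix list, both of which are outside the scope of this sketch.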
Acknowledgements
This work was supported by the National Institutes of Health (GM48207),
by the Welch Foundation (F-1427) and by start-up funds from the Institute
for Cellular and Molecular Biology at the University of Texas at Austin.
References and recommended reading
Papers of particular interest, published within the annual period of review,
have been highlighted as:
• of special interest
•• of outstanding interest
1. Zuker M: On finding all suboptimal foldings of an RNA molecule.
Science 1989, 244:48-52.
2. Mathews DH, Sabina J, Zuker M, Turner DH: Expanded sequence
dependence of thermodynamic parameters improves prediction
of RNA secondary structure. J Mol Biol 1999, 288:911-940.
3. Zuker M, Jaeger JA, Turner DH: A comparison of optimal
and suboptimal RNA secondary structures predicted by
free energy minimization with structures determined
by phylogenetic comparison. Nucleic Acids Res 1991,
19:2707-2714.
4. Zuker M, Jacobson AB: ‘Well-determined’ regions in RNA
secondary structure prediction: analysis of small subunit
ribosomal RNA. Nucleic Acids Res 1995, 23:2791-2798.
5. Konings DAM, Gutell RR: A comparison of thermodynamic
foldings with comparatively derived structures of 16S and
16S-like rRNAs. RNA 1995, 1:559-574.
6. Fields DS, Gutell RR: An analysis of large rRNA sequences
folded by a thermodynamic method. Fold Des 1996,
1:419-430.
7. Holley RW, Apgar J, Everett GA, Madison JT, Marquisee M, Merrill SH,
Penswick JR, Zamir A: Structure of a ribonucleic acid. Science
1965, 147:1462-1465.
8. Woese CR, Magrum LJ, Gupta R, Siegel RB, Stahl DA, Kop J,
Crawford N, Brosius J, Gutell R, Hogan JJ et al.: Secondary structure
model for bacterial 16S ribosomal RNA: phylogenetic, enzymatic
and chemical evidence. Nucleic Acids Res 1980, 8:2275-2293.
9. Noller HF, Kop J, Wheaton V, Brosius J, Gutell RR, Kopylov AM,
Dohme F, Herr W, Stahl DA, Gupta R et al.: Secondary structure
model for 23S ribosomal RNA. Nucleic Acids Res 1981,
9:6167-6189.
10. Woese CR, Gutell R, Gupta R, Noller HF: Detailed analysis of the
higher-order structure of 16S-like ribosomal ribonucleic acids.
Microbiol Rev 1983, 47:621-669.
11. Stiegler P, Carbon P, Zuker M, Ebel JP, Ehresmann C: Secondary and
topographic structure of ribosomal RNA 16S of Escherichia coli.
C R Seances Acad Sci D 1980, 291:937-940.
12. Glotz C, Zwieb C, Brimacombe R, Edwards K, Kossel H:
Secondary structure of the large subunit ribosomal RNA from
Escherichia coli, Zea mays chloroplast, and human and mouse
mitochondrial ribosomes. Nucleic Acids Res 1981, 9:3287-3306.
13. Zwieb C, Glotz C, Brimacombe R: Secondary structure
comparisons between small subunit ribosomal RNA molecules
from six different species. Nucleic Acids Res 1981, 9:3621-3640.
14. Branlant C, Krol A, Machatt MA, Pouyet J, Ebel JP, Edwards K,
Kossel H: Primary and secondary structures of Escherichia coli
MRE 600 23S ribosomal RNA. Comparison with models of
secondary structure for maize chloroplast 23S rRNA and for large
portions of mouse and human 16S mitochondrial rRNAs. Nucleic
Acids Res 1981, 9:4303-4324.
15. Huysmans E, De Wachter R: Compilation of small ribosomal subunit
RNA sequences. Nucleic Acids Res 1986, 14(suppl):R73-R118.
16. De Rijk P, Van de Peer Y, Chapelle S, De Wachter R: Database on
the structure of large ribosomal subunit RNA. Nucleic Acids Res
1994, 22:3495-3501.
17. Van de Peer Y, De Rijk P, Wuyts J, Winkelmans T, De Wachter R:
The European small subunit ribosomal RNA database.
Nucleic Acids Res 2000, 28:175-176.
18. Gutell RR, Weiser B, Woese CR, Noller HF: Comparative anatomy
of 16-S-like ribosomal RNA. Prog Nucleic Acid Res Mol Biol 1985,
32:155-216.
19. Gutell RR, Power A, Hertz GZ, Putz EJ, Stormo GD: Identifying
constraints on the higher-order structure of RNA: continued
development and application of comparative sequence analysis
methods. Nucleic Acids Res 1992, 20:5785-5795.
20. Gautheret D, Damberger SH, Gutell RR: Identification of
base-triples in RNA using comparative sequence analysis. J Mol
Biol 1995, 248:27-43.
21. Gutell RR: Comparative sequence analysis and the structure of
16S and 23S rRNA. In Ribosomal RNA: Structure, Evolution, Processing
and Function in Protein Biosynthesis. Edited by Dahlberg AE,
Zimmermann RA. Boca Raton: CRC Press; 1996:111-128.
22. Gutell RR, Noller HF, Woese CR: Higher order structure in
ribosomal RNA. EMBO J 1986, 5:1111-1113.
23. Cannone JJ, Subramanian S, Schnare MN, Collett JR, D’Souza LM,
Du Y, Feng B, Lin N, Madabusi LV, Müller KM et al.: The Comparative
RNA Web (CRW) site: an online database of comparative
sequence and structure information for ribosomal, intron, and
other RNAs. BMC Bioinformatics 2002, 3:2.
24. Woese CR: Bacterial evolution. Microbiol Rev 1987, 51:221-271.
25. • Gutell RR, Cannone JJ, Shang Z, Du Y, Serra MJ: A story: unpaired
adenosine bases in ribosomal RNA. J Mol Biol 2000,
304:335-354.
Although the abundance of conserved, unpaired adenosines was revealed in
1985 [18], a more extensive comparative analysis of 16S and 23S rRNAs
substantiated the initial finding and revealed that more than 50% of the
3′ ends of loops in 16S and 23S rRNAs have an adenosine that is
conserved in more than 90% of the sequences.
26. Woese CR, Winker S, Gutell RR: Architecture of ribosomal RNA:
constraints on the sequence of tetra-loops. Proc Natl Acad Sci
USA 1990, 87:8467-8471.
27. Jaeger L, Michel F, Westhof E: Involvement of a GNRA tetraloop in
long-range RNA tertiary interactions. J Mol Biol 1994,
236:1271-1276.
28. Costa M, Michel F: Frequent use of the same tertiary motif by
self-folding RNAs. EMBO J 1995, 14:1276-1285.
29. Costa M, Michel F: Rules for RNA recognition of GNRA tetraloops
deduced by in vitro selection: comparison with in vivo evolution.
EMBO J 1997, 16:3289-3302.
30. Juneau K, Cech TR: In vitro selection of RNAs with increased
tertiary structure stability. RNA 1999, 5:1119-1129.
31. Gutell RR, Larsen N, Woese CR: Lessons from an evolving
ribosomal RNA: 16S and 23S rRNA structure from a comparative
perspective. Microbiol Rev 1994, 58:10-26.
32. Gautheret D, Konings D, Gutell RR: GU base pairing motifs in
ribosomal RNAs. RNA 1995, 1:807-814.
33. SantaLucia J Jr, Kierzek R, Turner DH: Effects of GA mismatches on
the structure and thermodynamics of RNA internal loops.
Biochemistry 1990, 29:8813-8819.
34. Gautheret D, Konings D, Gutell RR: A major family of motifs
involving G-A mismatches in ribosomal RNA. J Mol Biol 1994,
242:1-8.
35. Cate JH, Gooding AR, Podell E, Zhou K, Golden BL, Szewczak AA,
Kundrot CE, Cech TR, Doudna JA: RNA tertiary structure mediation
by adenosine platforms. Science 1996, 273:1696-1699.
36. Gutell RR, Cannone JJ, Konings D, Gautheret D: Predicting U-turns
in ribosomal RNA with comparative sequence analysis. J Mol Biol
2000, 300:791-803.
37. Wimberly B: A common RNA loop motif as a docking module and
its function in the hammerhead ribozyme. Nat Struct Biol 1994,
1:820-827.
38. Leontis NB, Westhof E: A common motif organizes the structure of
multi-helix loops in 16 S and 23 S ribosomal RNAs. J Mol Biol
1998, 283:571-583.
39. Correll CC, Freeborn B, Moore PB, Steitz TA: Metals, motifs, and
recognition in the crystal structure of a 5S rRNA domain. Cell
1997, 91:705-712.
40. Traub W, Sussman JL: Adenine-guanine base pairing in ribosomal
RNA. Nucleic Acids Res 1982, 10:2701-2708.
41. • Elgavish T, Cannone JJ, Lee JC, Harvey SC, Gutell RR:
AA.AG@Helix.Ends: A:A and A:G base-pairs at the ends of 16 S
and 23 S rRNA helices. J Mol Biol 2001, 310:735-753.
Conserved A•A or A•G oppositions occur at the ends of more than
100 helices in the 16S and 23S rRNAs. Approximately 75% of these
oppositions are base paired. This paper gives an example in which one ‘simple’
RNA structure principle, an A•A and/or A•G opposition at the end of a helix,
was the basis for more than 75 new rRNA base pairs.
42. • Nissen P, Ippolito JA, Ban N, Moore PB, Steitz TA: RNA tertiary
interactions in the large ribosomal subunit: the A-minor motif.
Proc Natl Acad Sci USA 2001, 98:4899-4903.
This paper, along with [43•], presents at least a partial three-dimensional
structure explanation for the abundance of the unpaired adenosines in the
covariation-based structure model.
43. • Doherty EA, Batey RT, Masquida B, Doudna JA: A universal mode
of helix packing in RNA. Nat Struct Biol 2001, 8:339-343.
This paper, along with [42•], presents at least a partial three-dimensional
structure explanation for the abundance of the unpaired adenosines in the
covariation-based structure model.
44. • Klein DJ, Schmeing TM, Moore PB, Steitz TA: The kink-turn:
a new RNA secondary structure motif. EMBO J 2001,
20:4214-4221.
The authors present another new RNA structural motif. Expect more motifs
to be identified from the analysis of the ribosomal crystal structures and the
comparative rRNA sequence data.
45. Rajbhandary UL, Stuart A, Faulkner RD, Chang SH, Khorana HG:
Nucleotide sequence studies on yeast phenylalanine sRNA. Cold
Spring Harb Symp Quant Biol 1966, 31:425-434.
46. Madison JT, Everett GA, Kung HK: On the nucleotide sequence of
yeast tyrosine transfer RNA. Cold Spring Harb Symp Quant Biol
1966, 31:409-416.
47. Zachau HG, Dutting D, Feldman H, Melchers F, Karau W: Serine
specific transfer ribonucleic acids. XIV. Comparison of nucleotide
sequences and secondary structure models. Cold Spring Harb
Symp Quant Biol 1966, 31:417-424.
48. Levitt M: Detailed molecular model for transfer ribonucleic acid.
Nature 1969, 224:759-763.
49. Olsen GJ: Comparative analysis of nucleotide sequence data
[PhD Thesis]. Colorado: University of Colorado Health Sciences
Center; 1983.
50. Chiu DK, Kolodziejczak T: Inferring consensus structure
from nucleic acid sequences. Comput Appl Biosci 1991,
7:347-352.

51. Fox GE, Woese CR: 5S RNA secondary structure. Nature 1975,
256:505-507.
52. Michel F, Dujon B: Conservation of RNA secondary structures in
two intron families including mitochondrial-, chloroplast- and
nuclear-encoded members. EMBO J 1983, 2:33-38.
53. Michel F, Westhof E: Modelling of the three-dimensional
architecture of group I catalytic introns based on comparative
sequence analysis. J Mol Biol 1990, 216:585-610.
54. Quigley GJ, Rich A: Structural domains of transfer RNA molecules.
Science 1976, 194:796-806.
55. Kim SH: Crystal structure of yeast tRNA-phe and general structural
features of other tRNAs. In Transfer RNA: Structure, Properties, and
Recognition. Edited by Schimmel PR, Söll D, Abelson JN. New York:
Cold Spring Harbor Laboratory Press; 1979:83-100.
56. Conn GL, Draper DE, Lattman EE, Gittis AG: Crystal structure of a
conserved ribosomal protein-RNA complex. Science 1999,
284:1171-1174.
57. Wimberly BT, Guymon R, McCutcheon JP, White SW,
Ramakrishnan V: A detailed view of a ribosomal active site: the
structure of the L11-RNA complex. Cell 1999, 97:491-502.
58. Cate JH, Gooding AR, Podell E, Zhou K, Golden BL, Kundrot CE,
Cech TR, Doudna JA: Crystal structure of a group I ribozyme
domain: principles of RNA packing. Science 1996, 273:1678-1685.
59. •• Wimberly BT, Brodersen DE, Clemons WM Jr, Morgan-Warren RJ,
Carter AP, Vonrhein C, Hartsch T, Ramakrishnan V: Structure of the
30 S ribosomal subunit. Nature 2000, 407:327-339.
This high-resolution crystal structure of the 30S ribosomal subunit, along
with the 50S crystal structure [61••], establishes the foundation for much of
the future work on the ribosome.
60. Schluenzen F, Tocilj A, Zarivach R, Harms J, Gluehmann M, Janell D,
Bashan A, Bartels H, Agmon I, Franceschi F et al.: Structure of
functionally activated small ribosomal subunit at 3.3 Å resolution.
Cell 2000, 102:615-623.
61. •• Ban N, Nissen P, Hansen J, Moore PB, Steitz TA: The complete
atomic structure of the large ribosomal subunit at 2.4 Å
resolution. Science 2000, 289:905-920.
This high-resolution crystal structure of the 50S ribosomal subunit, along
with the 30S crystal structure [59••], establishes the foundation for much of
the future work on the ribosome.
62. Ogle JM, Brodersen DE, Clemons WM Jr, Tarry MJ, Carter AP,
Ramakrishnan V: Recognition of cognate transfer RNA by the 30 S
ribosomal subunit. Science 2001, 292:897-902.