Gutell 081.cosb.2002.12.0301


The accuracy of ribosomal RNA comparative structure models
Robin R Gutell*, Jung C Lee† and Jamie J Cannone‡

The determination of the 16S and 23S rRNA secondary structure models was initiated shortly after the first complete 16S and 23S rRNA sequences were determined in the late 1970s. The structures that are common to all 16S rRNAs and all 23S rRNAs were determined using comparative methods from the analysis of thousands of rRNA sequences. Twenty-plus years later, the 16S and 23S rRNA comparative structure models have been evaluated against the recently determined high-resolution crystal structures of the 30S and 50S ribosomal subunits. Nearly all of the predicted covariation-based base pairs, including the regular base pairs and helices, and the irregular base pairs and tertiary interactions, were present in the 30S and 50S crystal structures.

Addresses
*Institute for Cellular and Molecular Biology, and Section of Integrative Biology, University of Texas, 2500 Speedway, Austin, Texas 78712-1095, USA; e-mail:
†Division of Medicinal Chemistry, College of Pharmacy, University of Texas, Austin, Texas 78712, USA; e-mail:
‡Institute for Cellular and Molecular Biology, University of Texas, 2500 Speedway, Austin, Texas 78712-1095, USA; e-mail: cannone@mail.utexas.edu
Correspondence: Robin R Gutell

Current Opinion in Structural Biology 2002, 12:301–310
0959-440X/02/$, see front matter
© 2002 Elsevier Science Ltd. All rights reserved.

Abbreviations
CRW  Comparative RNA Web
PDB  Protein Data Bank

Introduction: the grand challenge
One of the grand challenges in science is the RNA folding problem. The computational aim is to be able to fold a linear sequence of nucleotides into its biologically active three-dimensional structure. The challenge is to distinguish the correct base pairings and helices from the large number of possible interactions. For 16S rRNA, a molecule 1500 nucleotides in length, there are approximately 15,000 possible helices, with fewer than 100 of these in the final structure. The 23S rRNA is about twice the length of 16S rRNA, with about 50,000 possible helices, of which 150 are in the final structure. A possible set of unique, nonoverlapping helices, or portions of them, is assembled to form a single structure model. The maximum number of combinatorial arrangements of all possible helices is very (very) large, with about 4.3 × 10^393 possible structure models for 16S rRNA and about 6.3 × 10^740 for 23S rRNA.

To identify the correct structure from these large numbers of possible base pairings, helices and structure models, we need the basic rules of RNA structure, or constraints, that define the following:
1. All of the possible RNA structural motifs (e.g. base pair, helix, hairpin loops, etc.).
2. The mappings and associations between each of these structural elements, and the permissible arrangements and composition of the nucleotides that form that element (a ‘many-to-one problem’).
3. The organization and arrangement of these structural elements with one another, both locally and globally across the entire RNA structure.
4. The thermodynamic energetics associated with the proper folding of the RNA molecule.
5. Other factors influencing RNA folding, including protein binding (e.g. chaperones and ribosomal proteins) and the rates of folding during transcription.
6. The relative contributions of these rules to the process of folding the RNA and to the structure that participates in its function.

Our appreciation of these dynamics of RNA folding, beyond our understanding of the basic building blocks of RNA structure (the canonical base pairs, G•C, A•U and G•U, and the arrangement of these base pairs into helices), is rudimentary. Consequently, we do not have sufficient constraints at this time to accurately and reliably predict the correct RNA higher-order structure from its underlying sequence. The program mfold [1,2], the most successful of the RNA folding algorithms that predict secondary structure from the underlying sequence, integrates thermodynamic base-pairing rules with a helix identification and selection scheme. Although the prediction of RNA secondary structure from the analysis of a single sequence has improved significantly, this computer program, with its inherent folding criteria, still does not consistently and unambiguously determine the correct secondary structure [2–6]. Beyond the prediction of the base pairings in the secondary structure, tertiary interactions that are layered onto the secondary structure are even harder to predict because of the larger number of less defined structural components.

Beginning in the late 1970s, our specific goals were to predict the structure of the 16S and 23S rRNAs, the major RNA components in the 30S and 50S ribosomal subunits, respectively. These RNAs are complexed with ribosomal proteins and are intimately associated with protein synthesis. An understanding of their secondary and tertiary structures will lay the foundation for our future understanding and appreciation of their functions.

In contrast to the RNA folding algorithms, which utilize thermodynamic information on consecutive base pairs and other small structural elements, an alternative method, comparative analysis, is based on a very simple and profound principle. This method has been utilized to predict the secondary structure and the early stages of the tertiary structure of several RNA molecules, including the rRNAs. In addition to these structure predictions, the comparative approach has also revealed new information about RNA structural motifs and other principles of RNA structure.

Inferring higher-order structure from patterns of sequence variation
Shortly after the first tRNA sequence was determined [7], it was rationalized from a comparative perspective that all tRNA sequences should have equivalent secondary and tertiary structures to allow them to interact with the same binding sites on the ribosome and with the same set of proteins and RNAs during protein synthesis. Two basic principles form the foundation for the comparative analysis of RNA structure: firstly, different RNA sequences can fold into the same secondary and tertiary structures and, secondly, the unique structure and function of an RNA molecule is maintained through the evolutionary process of mutation and selection. We utilized this comparative paradigm for the prediction of the 16S and 23S rRNA structures. We assumed that all 16S (and 16S-like) and 23S (and 23S-like) rRNAs have the same general secondary and tertiary structures, regardless of the extent of conservation and variation among the sequences. The correct helices that have been identified using comparative analysis are present in the same homologous region of the rRNAs and have variation in the composition of the sequences, whilst maintaining G•C, A•U and G•U base pairs. Initially, we identified base-paired positions within a potential helix that have ‘covariation’ (similar patterns of variation) in a set of sequences aligned for maximum sequence identity [8–10]. Proposed helices with two or more covariations were considered ‘proven’.
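The early criterion described above (a proposed helix counts as 'proven' once two or more of its base pairs covary) can be illustrated with a minimal Python sketch. The toy alignment, the helix coordinates and the function names are hypothetical, and the strict rule that every sequence must form a G•C, A•U or G•U pair is a simplifying assumption; this is not the authors' program.

```python
from itertools import combinations

# Pair types treated as canonical in this sketch (order: 5' base, 3' base).
CANONICAL = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def has_covariation(alignment, i, j):
    """True if columns i and j (0-based) always pair canonically and at least
    two observed pair types differ at BOTH positions (e.g. G-C versus A-U)."""
    observed = {(seq[i], seq[j]) for seq in alignment}
    if not observed <= CANONICAL:
        return False
    return any(a != c and b != d for (a, b), (c, d) in combinations(observed, 2))


def count_covarying_pairs(alignment, helix):
    """Number of base pairs in a proposed helix that show a covariation."""
    return sum(has_covariation(alignment, i, j) for i, j in helix)


# Hypothetical toy alignment (not real rRNA data); columns 0-2 pair with 9-7.
toy_alignment = ["GGCAAAAGCC", "AGCAAAAGCU", "GCCAAAAGGC"]
toy_helix = [(0, 9), (1, 8), (2, 7)]

n_covarying = count_covarying_pairs(toy_alignment, toy_helix)
print(n_covarying, "covarying pairs;", "proven" if n_covarying >= 2 else "unproven")
```

In this toy case the first two pairs covary (G•C with A•U, and G•C with C•G) while the third is invariant, so the helix meets the two-covariation threshold. A change at only one position, such as G•C to G•U, is variation rather than covariation and is not counted.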
Versions of the 16S and 23S rRNA structure models from the early 1980s (Santa Cruz/Urbana versions) are shown in Figure 1. The majority of the helices in these early structure models had at least one covariation per helix. We considered this model to be the minimal structure, that is, there were areas that were incomplete. Two other sets of 16S and 23S rRNA structure models were determined independently with comparative methods [11–14], whereas another set of model diagrams was adapted in full from previously proposed structure models [15–17].

Figure 1. The original (1980–81) Noller-Woese-Gutell comparative structure models for the 16S and 23S rRNAs. (a) 16S rRNA (adapted from [8]). (b) 23S rRNA, 5′ half (adapted from [9]). (c) 23S rRNA, 3′ half (adapted from [9]). E. coli (GenBank accession number J01695) is used as the reference sequence. Each of these models has been superimposed onto the corresponding current model diagrams to highlight the similarities and differences. Nucleotides are replaced with colored dots: black, positions that are unchanged between the original and current models; blue, base pairs present in the original models but absent from the current models; red, positions that are unpaired in the original models but are part of a base pair in the current models; green, positions that are part of one base pair in the original models but are part of a different base pair in the current models. Full-page versions of each panel are available online as part of the CRW site.

Subsequently, as the number of sequences in our 16S and 23S rRNA alignments surpassed 25, we developed different algorithms and computer programs to identify positions in an alignment that have similar patterns of variation [18–20]. Given this series of improvements in the covariation algorithms, coupled with very dramatic increases in the number and diversity of rRNA sequences in our sequence collection, we were able to identify more positions with similar patterns of variation. Although the early covariation analysis only identified those covariations that involve A•U and G•C pairings within a potential helix, our algorithms have, for the past ten years, identified all positional covariations, regardless of base pair type and their types of interchanges with other base pairs (e.g. U•U ↔ C•C, A•A ↔ G•G, U•U ↔ G•G), and independent of the spatial relationship with other base pairings and structural elements [21]. Consequently, we began identifying single base pairings not flanked by other base pairings, noncanonical base pairs and other types of tertiary interactions (see below). In addition to the inclusion of newly identified base pairs, previously proposed base pairs were removed from the structure models when the ratio of covariation to variation dropped with increasing numbers of sequences.

To gauge the extent of positional covariation and our confidence in the accuracy of each of these proposed base pairs, we established a quantitative scoring method. Higher scores reflect a greater extent of pure covariation (simultaneous changes at both of the paired positions), larger numbers of exchanges between a set of base pair types that covary with one another (e.g. A•U ↔ G•C) and/or a larger number of mutual changes or covariations that occur during the evolution of the RNA (also called phylogenetic events). These three parameters can, individually or collectively, influence our confidence in a putative base pair. For example, we were more confident in the authenticity of the 570•866 base pair in 16S rRNA because of several phylogenetic events within the bacteria, archaea and eucarya [22].
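The scoring method itself is not specified in this text, so the sketch below substitutes a generic and widely used measure, the mutual information between two alignment columns, purely to illustrate how "similar patterns of variation" can be turned into a number. It is an assumed stand-in and not the authors' score: it ignores phylogenetic events and does not distinguish pair types. The function name and the toy columns are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two equally long alignment columns.

    0.0 means the columns vary independently or do not vary at all;
    larger values mean a change in one column predicts a change in the other.
    """
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_i)
    pair_counts = Counter(zip(col_i, col_j))
    counts_i = Counter(col_i)
    counts_j = Counter(col_j)
    mi = 0.0
    for (x, y), count in pair_counts.items():
        p_xy = count / n
        mi += p_xy * log2(p_xy / ((counts_i[x] / n) * (counts_j[y] / n)))
    return mi


# Hypothetical columns read down four aligned sequences (not real rRNA data).
pure_covariation = mutual_information("GGAA", "CCUU")   # G-C <-> A-U exchange
independent = mutual_information("GGAA", "CUCU")        # no shared pattern
conserved = mutual_information("GGGG", "CCCC")          # invariant G-C pair
print(pure_covariation, independent, conserved)
```

Note that an invariant pair scores zero under this measure, which is one reason a covariation score alone cannot support highly conserved base pairs and a separate conservation criterion is needed for them.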
These 16S and 23S rRNA covariation-based structure models only contain those base pairs with positional covariation or G•C, A•U or G•U base pairs that are within a regular helix and present in more than 80% of the sequences.

The most recent comparative structure models for 16S and 23S rRNA are shown in Figure 2 and are based on the analysis of approximately 7000 16S and 1050 23S rRNA sequences [21,23]. These two structure models are the culmination of 20 years of comparative analysis (see below). The base pair symbols are color coded to reveal our confidence in the authenticity of that base pair; base pairs with the highest covariation scores are shown in red, followed by green and black. Base pairs with gray symbols are conserved in more than 98% of the sequences, whereas blue base pairs do not have a significant amount of pure covariation and do not occur within a standard helix (see [23] for more details). As the majority of the base pairs have red symbols, we believe that nearly all of the base pairs in the current 16S and 23S rRNA covariation-based structure models are correct (see below).

Figure 2. The current Noller-Woese-Gutell comparative structure models for the 16S and 23S rRNAs. (a) 16S rRNA. (b) 23S rRNA, 5′ half. (c) 23S rRNA, 3′ half. E. coli (GenBank accession number J01695) is used as the reference sequence. Nucleotides are replaced with colored dots that represent confidence in the base pair: red, high covariation scores; green, lower but significant covariation scores occurring within a standard helix containing a red base pair; black, even lower covariation scores occurring within a standard helix containing a red base pair; gray, conserved in more than 98% of the sequences occurring within a standard helix containing a red base pair; blue, do not have a significant amount of pure covariation and do not occur within a standard helix (see [23] for additional details). Base pair symbols indicate the type of base pair: line, canonical base pair; small closed circle, G•U base pair; large open circle, G•A base pair; large closed circle, other noncanonical base pairs. Nucleotides involved in tertiary interactions (including pseudoknots) are boxed and connected with lines. Diagrams adapted from [23]. Full-page versions of each panel are available online at the CRW site.

The evolution of the 16S and 23S rRNA covariation-based structure models is shown graphically in Figure 1 and quantitatively in Table 1. To allow easy comparison with the current models, the original 1980–81 16S and 23S rRNA structure models were redrawn using the current models as a template (Figure 1). Base pairs that are present in both the original and current models are shown in black, and those that are different in the original structure models and the most recent covariation-based structure models are illustrated in blue, red and green. Blue base pair symbols indicate base pairs in the original models that are absent from the current models, red nucleotides are unpaired in the original models and paired in the current models, and green nucleotides are part of different base pairs in the two structure models.

In 1980–81, the 16S and 23S rRNA structure models were based on just two complete rRNA sequences per structure; at the end of 1999, this work culminated with the analysis of approximately 7000 16S and 1050 23S rRNA sequences. These structure models evolved over nearly 20 years as the collection of sequences grew and our methods to identify and score covariations were developed and refined.
To assess the changes, the original 1980–81 structure models were compared with the current 1999 structure models (Table 1, adapted from Section 1b on the ‘Comparative RNA Web’ [CRW] site and database). We draw four significant conclusions from this analysis. Firstly, nearly 60% of the base pairs in the current 16S rRNA structure model were predicted from the analysis of two sequences for the original structure model; nearly 78% of the current 23S rRNA base pairings were predicted from the original structure model. Secondly, in contrast, approximately 80% of the original 16S and 87% of the original 23S rRNA base pairs proposed in 1980–81 are present in the current models. Thirdly, approximately 70 16S and 100 23S initial base pairs have been removed from the original rRNA structure models. Finally, the number of unusual, tertiary and tertiary-like base pairings that are predicted with confidence increases in parallel with increases in the number and diversity of rRNA sequences studied and with improvements in the covariation algorithms.

In conclusion, the major components of the 16S and 23S rRNA structure models were predicted correctly from the analysis of just a few 16S and 23S rRNA sequences that are approximately 75% similar to one another. Thousands of additional rRNA sequences with significant degrees of similarity and diversity with one another were subsequently analyzed with covariation analysis to refine the secondary structure models, to begin to identify tertiary base pairs and to establish a system to measure the extent of covariation at all of the proposed base pairs. Beyond the prediction of base pairs with covariation analysis, the comparative sequence and structure data are encrypted with fundamental principles of RNA structure and archaeological markers that indicate the ancestry of that RNA sequence [24]. Our next task is to decipher these ‘treasures’ from the comparative RNA sequence and structure data sets.
To this end, we have established the CRW site and database [23] to organize, analyze and disseminate comparative data for the 5S, 16S (and 16S-like) and 23S (and 23S-like) rRNAs, group I and II introns, and tRNAs. The main types of information and data available online for each of these RNAs are: the current comparative RNA structure model; nucleotide and base pair frequency tables for all positions in the reference structures; secondary structure conservation diagrams that reveal the extent of conservation of the RNA sequence and structure; more than 400 representative secondary structure diagrams for organisms from groups that span the phylogenetic tree and reveal the major forms of structural variation; nearly 12,000 publicly available sequences that are 90% or more complete; and sequence alignments.

Table 1. Summary of the evolution of the Noller-Woese-Gutell 16S and 23S rRNA structure models from the first to the most recent covariation-based structure models (adapted from Table 3a,b in [23]).

Model                                                                       16S rRNA          23S rRNA
Date                                                                        1980     1999     1981     1999
 1. Approximate number of complete sequences                                2        7000     2        1050
 2. Percentage of 1999 sequences*                                           0.03     100      0.2      100
 3. Number of bp proposed correctly*                                        284      478      676      870
 4. Number of bp proposed incorrectly*                                      69       0        102      0
 5. Total bp in model (3 + 4)                                               353      478      778      870
 6. Percentage of bp in model present in the current model (3 / X)*†        59.4     100      77.7     100
 7. Accuracy of proposed bp (3 / 5)                                         80.5     100      86.9     100
 8. Number of bp in current model missing from this model (X – 3)*†         194      0        194      0
 9. Number of tertiary bp proposed correctly*                               4        40       4        65
10. Percentage of tertiary bp proposed correctly*                           10.0     100      6.2      100
11. Number of base triples proposed correctly*                              0        6        0        7
12. Percentage of base triples proposed correctly*                          0        100      0        100

*Comparisons are made against the current (1999) models. †X = 478 for 16S rRNA; X = 870 for 23S rRNA. bp, base pairs.
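Rows 5 to 8 of Table 1 are derived from rows 3 and 4 and from X, the number of base pairs in the current model. The short Python sketch below recomputes them from the tabulated counts as a consistency check; the function and key names are illustrative and not part of the original analysis.

```python
def derived_rows(correct, incorrect, current_total):
    """Recompute rows 5-8 of Table 1 for one original (1980-81) model.

    correct       -- row 3, bp proposed correctly
    incorrect     -- row 4, bp proposed incorrectly
    current_total -- X, total bp in the current (1999) model
    """
    total = correct + incorrect                                       # row 5 = 3 + 4
    return {
        "total_bp": total,
        "pct_of_current": round(100 * correct / current_total, 1),   # row 6 = 3 / X
        "accuracy": round(100 * correct / total, 1),                  # row 7 = 3 / 5
        "missing": current_total - correct,                           # row 8 = X - 3
    }


# Counts taken from Table 1 (rows 3 and 4, and X from the table footnote).
rrna_16s = derived_rows(correct=284, incorrect=69, current_total=478)
rrna_23s = derived_rows(correct=676, incorrect=102, current_total=870)
print(rrna_16s)
print(rrna_23s)
```

The two percentages answer different questions: row 6 measures how much of the current model was already present in the original model, whereas row 7 measures how much of the original model has survived.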
This type of comparative data is the foundation for the subsequent identification and analysis of RNA structural motifs. Although the patterns of variation at both positions in many of the base pairs in the RNA structure are similar and thus should be identified with covariation analysis, other sets of base pairs do not have similar patterns of variation at the two interacting positions. Thus, one of the larger goals of comparative analysis is to predict those base pairs lacking similar patterns of variation that occur in several different types of structural elements, as well as those base pairs with positional covariation that are conserved among the sequences in that data set. The process of comparative analysis, then, is to first predict base pairings with covariation analysis, followed by the identification of motifs that are composed of unique arrangements of sequences within specific structural elements. Several RNA structural motifs have been identified and/or are still being defined from sequence and structure perspectives. These motifs include:
1. Unpaired adenosines in the covariation-based structure model [18,25•].
2. Tetraloops: hairpin loops with four nucleotides that are composed of specific sequences [26].
3. Tetraloop receptors and other tertiary interactions involving tetraloops [27–30].
4. Dominant G•U base pairs [31,32].
5. Tandem G•A oppositions [33,34].
6. Base triples [20].
7. Adenosine platforms [25•,35].
8. U-turns [36].
9. E loops (or S turns) [25•,37,38].
10. E-like loops [25•].
11. Cross-strand purine stacks [39].
12. A•A and A•G oppositions/base pairs at the ends of helices [10,40,41•].
13. Lone pair triloops ([21]; RR Gutell et al., unpublished data).
14. A-minor motif [42•,43•].
15. Kink-turn [44•].

Crystal structures of the 16S and 23S rRNAs: the accuracy of the rRNA comparative structure models
To assess the accuracy of the covariation-based structure models, the comparative models for tRNA [19,20,45–50], fragments of 5S rRNA [51], the L11-binding region of 23S rRNA [9,21,23] and the group I intron [52,53] were compared with the corresponding high-resolution crystal structures [39,54–58]. Nearly all of the secondary structure base pairings and a few of the tertiary base pairs observed in the crystal structure were predicted in the comparative structure models for all of these RNAs. More recently, the high-resolution crystal structures of the 30S [59••,60] and 50S [61••] ribosomal subunits were solved, giving us the opportunity to evaluate the accuracy of our most recent 16S and 23S rRNA structure models. The results were again affirmative: approximately 97–98% of the base pairings predicted with covariation analysis (in the final covariation-based structure models) are indeed present in the 16S and 23S rRNA crystal structures (Table 2; RR Gutell et al., unpublished data).
The accuracy of the 16S and 23S rRNA covariation-based structure prediction not only augments the credibility of the comparative approach, but it also validates the sequence alignments that have been initiated, refined and expanded over the past 20 years, the initial covariation analysis and our subsequent covariation algorithms and their refinements. In addition to the final covariation-based structure model, nearly 45% of the tentative covariation-based base pairs and 70% of the motif-based base pairs that were predicted are in the crystal structure (Table 2). In total, about 90% of the base pairs predicted by comparative analysis are from the covariation-based analysis and 10% are from the alternative motif-based analysis ([20,25•,33,34,41•]; RR Gutell et al., unpublished data).

Table 2. Comparison of the current comparative structure models and the crystal structures of the 16S and 23S rRNAs*.

                                   16S rRNA†           23S rRNA‡           Total
Predicted base pairs§
  Model CB#                        461 / 476 / 97%     779 / 797 / 98%     1240 / 1273 / 97%
  Tentative CB#                    8 / 23 / 35%        18 / 36 / 50%       26 / 59 / 44%
  Motif-based¶                     45 / 65 / 70%       86 / 122 / 70%      131 / 187 / 70%
Crystal structure interactions¥
  +/+ base–base                    514                 883                 1397
  –/+ base–base                    56                  425                 481
  Total base–base                  683                 1297                1862
  Base–backbone                    49                  237                 286

*A more complete analysis will be presented later (RR Gutell et al., unpublished data). †T. thermophilus, GenBank accession number M26923, PDB code 1FJF [59]. ‡H. marismortui, GenBank accession number AF034620, PDB code 1JJ2 [61]. §Data are shown as approximate number of base pairs present in the crystal structure / approximate number of predicted base pairs / percentage of predicted base pairs present in the crystal structure. #CB, covariation-based. ¶The motifs analyzed here are AA.AG@helix.ends [41], tandem GA [33,34], E and E-like loops [25], lone pair triloops (RR Gutell et al., unpublished data) and base triples [20]. ¥Approximate numbers of interactions in the two ribosomal crystal structures.

The secondary structure diagrams for Thermus thermophilus 16S rRNA and Haloarcula marismortui 23S rRNA are shown in Figure 3. All of the base–base and base–backbone interactions in the 30S [59••] and 50S [61••] ribosomal subunit crystal structures are colored to reflect the initial identification of each pairing. The three primary categories are: present in both the comparative model (covariation and motif analysis) and the crystal structure (+/+), present in the comparative model but not in the crystal structure (+/–), and not present in the comparative model but present in the crystal structure (–/+). The nucleotides and base pair symbols are colored red for +/+, green for +/–, blue for –/+ base–base interactions and brown for –/+ base–backbone interactions.

Figure 3. Comparison of the current Noller-Woese-Gutell comparative structure models for the 16S and 23S rRNAs with the corresponding ribosomal subunit crystal structures. (a) 16S rRNA versus the T. thermophilus structure (GenBank accession number M26923; PDB code 1FJF; [59••]). (b) 23S rRNA, 5′ half versus the H. marismortui structure (GenBank accession number AF034620; PDB code 1JJ2; [61••]). (c) 23S rRNA, 3′ half versus the H. marismortui structure (GenBank accession number AF034620; PDB code 1JJ2; [61••]). Nucleotides are replaced with colored dots that show the sources of the interactions: red, present in both the covariation-based structure model and the crystal structure; green, present in the comparative structure and not present in the crystal structure; blue, not present in the comparative structure and present in the crystal structure; magenta, present in the covariation-based tentatives or motif-based analysis, and present in the crystal structure; brown, base–backbone or backbone–backbone interactions; purple, positions that are unresolved in the crystal structure. Colored open circles around positions show the third nucleotide of base triples and colored open rectangles show the base pairs of base triples. Colored open squares are used for clarity. Full-page versions of each panel are available online at the CRW site.

The affirmative base pairs that were predicted using covariation analysis (see red nucleotides and base pair symbols in Figure 3) include: essentially all base pairs that are strictly homologous between the E. coli reference structure models and the T. thermophilus 16S and H. marismortui 23S rRNA crystal structures that have a significant amount of positional covariation; base pairs that are standard Watson–Crick (G•C and A•U) and G•U base pair exchanges; base pairs that occur within standard secondary structure helices (at least 2 base pairs in length) that are nested (i.e. not a pseudoknot); individual base pairs and helices that form pseudoknots, including tertiary interactions; lone pairs, including those in the lone pair triloop motif (RR Gutell et al., unpublished data); and noncanonical base pairs and their exchanges: A•A ↔ G•G, U•U ↔ C•C, A•G ↔ G•A, A•C ↔ G•U, U•A ↔ G•G, A•C ↔ U•A and A•G ↔ R•U [21].

Although more than 1250 base pairs predicted with covariation analysis are in the crystal structure, approximately 35 of them are not (see green nucleotides in Figure 3; note that the green interactions include those predicted with both covariation analysis and motif-based analysis). The majority of these +/– proposed covariation-based base pairs that are absolutely homologous between the E. coli reference models and the T. thermophilus 16S and H. marismortui 23S rRNA structures were not predicted with our highest (red) confidence rating. Instead, these putative base pairs had either no positional covariation or an insignificant amount; they were included in the structure model because they form a G•C, A•U or G•U pair in more than 80% of the sequences and were adjacent to a base pair with covariation. The majority of these +/– base pairs are colored black, our lowest covariation confidence rating.

The aberrant base pairs that are truly homologous between the crystal structure and the E. coli reference structure have two other important characteristics. First, all of these putative base pairs occur at the ends of helices and, second, there is a bias in the types of base pairs that are not predicted correctly at the ends of helices. The two most frequent pairing types (in this latter category) are U•G and U•A (where the U is at the 5′ half of the helix). These putative base pairs might not occur in the rRNA structure or, alternatively, they might be dynamic, paired at certain stages of protein synthesis but not in the states of the crystal structures analyzed here. There is a precedent for conformational changes of the base pairings at the ends of helices.
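The three categories used above to compare a comparative model with a crystal structure (+/+, +/– and –/+) amount to set operations on two lists of base pairs. The following is a minimal Python sketch under that reading; the position numbers are invented toy values and not taken from the 16S or 23S rRNA data, and the function name is hypothetical.

```python
def classify_pairs(predicted, observed):
    """Partition base pairs into the +/+, +/- and -/+ categories.

    Each pair is a (i, j) tuple of sequence positions; (i, j) and (j, i)
    are treated as the same pair.
    """
    pred = {tuple(sorted(p)) for p in predicted}
    obs = {tuple(sorted(p)) for p in observed}
    return {
        "+/+": pred & obs,   # predicted and present in the crystal structure
        "+/-": pred - obs,   # predicted but absent from the crystal structure
        "-/+": obs - pred,   # in the crystal structure but not predicted
    }


# Invented toy data for illustration only.
toy_predicted = [(1, 10), (2, 9), (3, 8)]
toy_observed = [(10, 1), (2, 9), (4, 7)]

categories = classify_pairs(toy_predicted, toy_observed)
fraction_confirmed = len(categories["+/+"]) / len(toy_predicted)
print(categories)
print(f"{fraction_confirmed:.0%} of predicted pairs are in the toy structure")
```

The percentage reported in the third field of each 'Predicted base pairs' entry in Table 2 corresponds to this fraction: the number of +/+ pairs divided by the number of predicted pairs.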
Positions 1408 and 1493 form an A•A base pair in the uncomplexed 30S ribosomal subunit (PDB code 1FJF; [59••]), but are not paired when tRNA and mRNA are complexed to the 30S subunit [62]. We speculate that other A•A and A•G oppositions/base pairs at the ends of helices in the 16S and 23S rRNAs might be involved in conformational changes [41•]. There is also an interesting observation about the putative U•A pairings that are not in the crystal structure. The orientation of these U•A pairs would place the conserved, ‘unpaired’ adenosine at the 3′ end of the loop, a very common arrangement in the 16S and 23S rRNAs [25•].

We will not know all of the structural possibilities for these putative base pairings until we obtain more crystallographic, NMR or other experimental data for these regions of the rRNA. Although comparative analysis has predicted approximately 510 16S and 880 23S rRNA base pairs, an additional ~170 16S and ~415 23S rRNA base pairs (base–base) are in the crystal structure that were not predicted with comparative methods. Essentially, none of these ‘–/+’ base pairs has a significant amount of positional covariation and thus could not be predicted with covariation analysis. In general, these ‘–/+’ base pairs comprise noncanonical base pairs that are not associated with standard helices that were predicted with covariation analysis. A more detailed comparison between the comparative and crystal structures will be presented elsewhere (RR Gutell et al., unpublished data).

Conclusions
Covariation analysis has accurately predicted all of the standard secondary structure base pairings and helices in the 16S and 23S rRNA crystal structures. These methods have also identified some of the 16S and 23S rRNA tertiary base–base interactions. Motif-based analysis has begun to identify some of the base pairs that do not have similar patterns of variation.
Our future goal is to gain a better understanding of tertiary base–base interactions from a comparative perspective and, more specifically, to determine their base pair types and exchanges, and the types of structural elements or motifs with which they are associated. A more complete set of RNA structure constraints is necessary to accurately and reliably predict an RNA structure from its underlying sequence, and to understand the dynamics between structure and function.

Acknowledgements

This work was supported by the National Institutes of Health (GM48207), by the Welch Foundation (F-1427) and by start-up funds from the Institute for Cellular and Molecular Biology at the University of Texas at Austin.

References and recommended reading

Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest

1. Zuker M: On finding all suboptimal foldings of an RNA molecule. Science 1989, 244:48-52.
2. Mathews DH, Sabina J, Zuker M, Turner DH: Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol 1999, 288:911-940.
3. Zuker M, Jaeger JA, Turner DH: A comparison of optimal and suboptimal RNA secondary structures predicted by free energy minimization with structures determined by phylogenetic comparison. Nucleic Acids Res 1991, 19:2707-2714.
4. Zuker M, Jacobson AB: ‘Well-determined’ regions in RNA secondary structure prediction: analysis of small subunit ribosomal RNA. Nucleic Acids Res 1995, 23:2791-2798.
5. Konings DAM, Gutell RR: A comparison of thermodynamic foldings with comparatively derived structures of 16S and 16S-like rRNAs. RNA 1995, 1:559-574.
6. Fields DS, Gutell RR: An analysis of large rRNA sequences folded by a thermodynamic method. Fold Des 1996, 1:419-430.
7. Holley RW, Apgar J, Everett GA, Madison JT, Marquisee M, Merrill SH, Penswick JR, Zamir A: Structure of a ribonucleic acid. Science 1965, 147:1462-1465.
8. Woese CR, Magrum LJ, Gupta R, Siegel RB, Stahl DA, Kop J, Crawford N, Brosius J, Gutell R, Hogan JJ et al.: Secondary structure model for bacterial 16S ribosomal RNA: phylogenetic, enzymatic and chemical evidence. Nucleic Acids Res 1980, 8:2275-2293.
9. Noller HF, Kop J, Wheaton V, Brosius J, Gutell RR, Kopylov AM, Dohme F, Herr W, Stahl DA, Gupta R et al.: Secondary structure model for 23S ribosomal RNA. Nucleic Acids Res 1981, 9:6167-6189.
10. Woese CR, Gutell R, Gupta R, Noller HF: Detailed analysis of the higher-order structure of 16S-like ribosomal ribonucleic acids. Microbiol Rev 1983, 47:621-669.
11. Stiegler P, Carbon P, Zuker M, Ebel JP, Ehresmann C: Secondary and topographic structure of ribosomal RNA 16S of Escherichia coli. C R Seances Acad Sci D 1980, 291:937-940.
12. Glotz C, Zwieb C, Brimacombe R, Edwards K, Kossel H: Secondary structure of the large subunit ribosomal RNA from Escherichia coli, Zea mays chloroplast, and human and mouse mitochondrial ribosomes. Nucleic Acids Res 1981, 9:3287-3306.
13. Zwieb C, Glotz C, Brimacombe R: Secondary structure comparisons between small subunit ribosomal RNA molecules from six different species. Nucleic Acids Res 1981, 9:3621-3640.
14. Branlant C, Krol A, Machatt MA, Pouyet J, Ebel JP, Edwards K, Kossel H: Primary and secondary structures of Escherichia coli MRE 600 23S ribosomal RNA. Comparison with models of secondary structure for maize chloroplast 23S rRNA and for large portions of mouse and human 16S mitochondrial rRNAs. Nucleic Acids Res 1981, 9:4303-4324.
15. Huysmans E, De Wachter R: Compilation of small ribosomal subunit RNA sequences. Nucleic Acids Res 1986, 14(suppl):R73-R118.
16. De Rijk P, Van de Peer Y, Chapelle S, De Wachter R: Database on the structure of large ribosomal subunit RNA. Nucleic Acids Res 1994, 22:3495-3501.
17. Van de Peer Y, De Rijk P, Wuyts J, Winkelmans T, De Wachter R: The European small subunit ribosomal RNA database. Nucleic Acids Res 2000, 28:175-176.
18. Gutell RR, Weiser B, Woese CR, Noller HF: Comparative anatomy of 16-S-like ribosomal RNA. Prog Nucleic Acid Res Mol Biol 1985, 32:155-216.
19. Gutell RR, Power A, Hertz GZ, Putz EJ, Stormo GD: Identifying constraints on the higher-order structure of RNA: continued development and application of comparative sequence analysis methods. Nucleic Acids Res 1992, 20:5785-5795.
20. Gautheret D, Damberger SH, Gutell RR: Identification of base-triples in RNA using comparative sequence analysis. J Mol Biol 1995, 248:27-43.
21. Gutell RR: Comparative sequence analysis and the structure of 16S and 23S rRNA. In Ribosomal RNA: Structure, Evolution, Processing and Function in Protein Biosynthesis. Edited by Dahlberg AE, Zimmermann RA. Boca Raton: CRC Press; 1996:111-128.
22. Gutell RR, Noller HF, Woese CR: Higher order structure in ribosomal RNA. EMBO J 1986, 5:1111-1113.
23. Cannone JJ, Subramanian S, Schnare MN, Collett JR, D’Souza LM, Du Y, Feng B, Lin N, Madabusi LV, Müller KM et al.: The Comparative RNA Web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinformatics 2002, 3:2.
24. Woese CR: Bacterial evolution. Microbiol Rev 1987, 51:221-271.
25. • Gutell RR, Cannone JJ, Shang Z, Du Y, Serra MJ: A story: unpaired adenosine bases in ribosomal RNA. J Mol Biol 2000, 304:335-354.
    Although the abundance of conserved, unpaired adenosines was revealed in 1985 [18], a more extensive comparative analysis of 16S and 23S rRNAs substantiated the initial finding and revealed that more than 50% of the 3′ ends of loops in 16S and 23S rRNAs have an adenosine that is conserved in more than 90% of the sequences.
26. Woese CR, Winker S, Gutell RR: Architecture of ribosomal RNA: constraints on the sequence of tetra-loops. Proc Natl Acad Sci USA 1990, 87:8467-8471.
27. Jaeger L, Michel F, Westhof E: Involvement of a GNRA tetraloop in long-range RNA tertiary interactions. J Mol Biol 1994, 236:1271-1276.
28. Costa M, Michel F: Frequent use of the same tertiary motif by self-folding RNAs. EMBO J 1995, 14:1276-1285.
29. Costa M, Michel F: Rules for RNA recognition of GNRA tetraloops deduced by in vitro selection: comparison with in vivo evolution. EMBO J 1997, 16:3289-3302.
30. Juneau K, Cech TR: In vitro selection of RNAs with increased tertiary structure stability. RNA 1999, 5:1119-1129.
31. Gutell RR, Larsen N, Woese CR: Lessons from an evolving ribosomal RNA: 16S and 23S rRNA structure from a comparative perspective. Microbiol Rev 1994, 58:10-26.
32. Gautheret D, Konings D, Gutell RR: GU base pairing motifs in ribosomal RNAs. RNA 1995, 1:807-814.
33. SantaLucia J Jr, Kierzek R, Turner DH: Effects of GA mismatches on the structure and thermodynamics of RNA internal loops. Biochemistry 1990, 29:8813-8819.
34. Gautheret D, Konings D, Gutell RR: A major family of motifs involving G-A mismatches in ribosomal RNA. J Mol Biol 1994, 242:1-8.
35. Cate JH, Gooding AR, Podell E, Zhou K, Golden BL, Szewczak AA, Kundrot CE, Cech TR, Doudna JA: RNA tertiary structure mediation by adenosine platforms. Science 1996, 273:1696-1699.
36. Gutell RR, Cannone JJ, Konings D, Gautheret D: Predicting U-turns in ribosomal RNA with comparative sequence analysis. J Mol Biol 2000, 300:791-803.
37. Wimberly B: A common RNA loop motif as a docking module and its function in the hammerhead ribozyme. Nat Struct Biol 1994, 1:820-827.
38. Leontis NB, Westhof E: A common motif organizes the structure of multi-helix loops in 16 S and 23 S ribosomal RNAs. J Mol Biol 1998, 283:571-583.
39. Correll CC, Freeborn B, Moore PB, Steitz TA: Metals, motifs, and recognition in the crystal structure of a 5S rRNA domain. Cell 1997, 91:705-712.
40. Traub W, Sussman JL: Adenine-guanine base pairing in ribosomal RNA. Nucleic Acids Res 1982, 10:2701-2708.
41. • Elgavish T, Cannone JJ, Lee JC, Harvey SC, Gutell RR: AA.AG@Helix.Ends: A:A and A:G base-pairs at the ends of 16 S and 23 S rRNA helices. J Mol Biol 2001, 310:735-753.
    Conserved A•A or A•G oppositions occur at the ends of more than 100 helices in the 16S and 23S rRNAs. Approximately 75% of these oppositions are base paired. This paper gives an example in which one ‘simple’ RNA structure principle, an A•A and/or A•G opposition at the end of a helix, was the basis for more than 75 new rRNA base pairs.
42. • Nissen P, Ippolito JA, Ban N, Moore PB, Steitz TA: RNA tertiary interactions in the large ribosomal subunit: the A-minor motif. Proc Natl Acad Sci USA 2001, 98:4899-4903.
    This paper, along with [43•], presents at least a partial three-dimensional structure explanation for the abundance of the unpaired adenosines in the covariation-based structure model.
43. • Doherty EA, Batey RT, Masquida B, Doudna JA: A universal mode of helix packing in RNA. Nat Struct Biol 2001, 8:339-343.
    This paper, along with [42•], presents at least a partial three-dimensional structure explanation for the abundance of the unpaired adenosines in the covariation-based structure model.
44. • Klein DJ, Schmeing TM, Moore PB, Steitz TA: The kink-turn: a new RNA secondary structure motif. EMBO J 2001, 20:4214-4221.
    The authors present another new RNA structural motif. Expect more motifs to be identified from the analysis of the ribosomal crystal structures and the comparative rRNA sequence data.
45. Rajbhandary UL, Stuart A, Faulkner RD, Chang SH, Khorana HG: Nucleotide sequence studies on yeast phenylalanine sRNA. Cold Spring Harb Symp Quant Biol 1966, 31:425-434.
46. Madison JT, Everett GA, Kung HK: On the nucleotide sequence of yeast tyrosine transfer RNA. Cold Spring Harb Symp Quant Biol 1966, 31:409-416.
47. Zachau HG, Dutting D, Feldman H, Melchers F, Karau W: Serine specific transfer ribonucleic acids. XIV. Comparison of nucleotide sequences and secondary structure models. Cold Spring Harb Symp Quant Biol 1966, 31:417-424.
48. Levitt M: Detailed molecular model for transfer ribonucleic acid. Nature 1969, 224:759-763.
49. Olsen GJ: Comparative analysis of nucleotide sequence data [PhD Thesis]. Colorado: University of Colorado Health Sciences Center; 1983.
50. Chiu DK, Kolodziejczak T: Inferring consensus structure from nucleic acid sequences. Comput Appl Biosci 1991, 7:347-352.
51. Fox GE, Woese CR: 5S RNA secondary structure. Nature 1975, 256:505-507.
52. Michel F, Dujon B: Conservation of RNA secondary structures in two intron families including mitochondrial-, chloroplast- and nuclear-encoded members. EMBO J 1983, 2:33-38.
53. Michel F, Westhof E: Modelling of the three-dimensional architecture of group I catalytic introns based on comparative sequence analysis. J Mol Biol 1990, 216:585-610.
54. Quigley GJ, Rich A: Structural domains of transfer RNA molecules. Science 1976, 194:796-806.
55. Kim SH: Crystal structure of yeast tRNA-phe and general structural features of other tRNAs. In Transfer RNA: Structure, Properties, and Recognition. Edited by Schimmel PR, Söll D, Abelson JN. New York: Cold Spring Harbor Laboratory Press; 1979:83-100.
56. Conn GL, Draper DE, Lattman EE, Gittis AG: Crystal structure of a conserved ribosomal protein-RNA complex. Science 1999, 284:1171-1174.
57. Wimberly BT, Guymon R, McCutcheon JP, White SW, Ramakrishnan V: A detailed view of a ribosomal active site: the structure of the L11-RNA complex. Cell 1999, 97:491-502.
58. Cate JH, Gooding AR, Podell E, Zhou K, Golden BL, Kundrot CE, Cech TR, Doudna JA: Crystal structure of a group I ribozyme domain: principles of RNA packing. Science 1996, 273:1678-1685.
59. •• Wimberly BT, Brodersen DE, Clemons WM Jr, Morgan-Warren RJ, Carter AP, Vonhein C, Hartsch T, Ramakrishnan V: Structure of the 30 S ribosomal subunit. Nature 2000, 407:327-339.
    This high-resolution crystal structure of the 30S ribosomal subunit, along with the 50S crystal structure [61••], establishes the foundation for much of the future work on the ribosome.
60. Schluenzen F, Tocilj A, Zarivach R, Harms J, Gluehmann M, Janell D, Bashan A, Bartels H, Agmon I, Franceschi F et al.: Structure of functionally activated small ribosomal subunit at 3.3 Å resolution. Cell 2000, 102:615-623.
61. •• Ban N, Nissen P, Hansen J, Moore PB, Steitz TA: The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science 2000, 289:905-920.
    This high-resolution crystal structure of the 50S ribosomal subunit, along with the 30S crystal structure [59••], establishes the foundation for much of the future work on the ribosome.
62. Ogle JM, Brodersen DE, Clemons WM Jr, Tarry MJ, Carter AP, Ramakrishnan V: Recognition of cognate transfer RNA by the 30 S ribosomal subunit. Science 2001, 292:897-902.