Gutell 053.book r rna.1996.dahlberg.zimmermann.p111-128.ocr
Chapter 6

Comparative Sequence Analysis and the Structure of 16S and 23S rRNA

Robin R. Gutell

CONTENTS

I. Introduction
   A. General Comments
   B. Principles of Covariance Analysis
   C. Modelling rRNA Secondary and Tertiary Structure
II. Secondary and Tertiary Structure
   A. Current Models of 16S and 23S rRNA Secondary and Tertiary Structure
   B. Characteristics of the Base Pairs
      1. Canonical Pairings
      2. Noncanonical Pairings
         a. G:U Base Pairs and G:U ↔ A:C Interchanges
         b. Purine:Purine, Pyrimidine:Pyrimidine, and Other Interconversions
   C. Arrangement of the Base Pairs and Characteristics of Helices
      1. Traditional Organization of Base Pairs and Helices
      2. Lone Pairs
      3. Pseudoknots
      4. Base Pairings in Parallel
      5. Base Triples
      6. Coaxial Helices
      7. Tetraloops
III. A Comparative Perspective on the Structure of rRNA
Acknowledgments
References

I. INTRODUCTION

A. GENERAL COMMENTS

Determining the secondary and tertiary structure of an RNA from its sequence requires a strong understanding of the basic principles of RNA structure and the expertise to utilize these principles to transform the sequence of that molecule into its higher-order structure. Beyond our basic appreciation for secondary-structure helices and a few emerging RNA structural motifs (e.g., pseudoknots, tetraloops, etc.), our knowledge of the principles of RNA structure is rudimentary. Moreover, our ability to fold a primary structure into its secondary structure, while getting better, needs further improvement since the number of theoretically possible secondary structures for large RNA molecules is quite large, and identifying the biologically correct version is not easily accomplished. Given these limitations, we are unable to take any single rRNA sequence and determine its secondary or tertiary structure with strong confidence.
However, the methodology of comparative sequence analysis has been used to solve the secondary structures for a number of different RNA molecules (reviewed in Woese and Pace 1993; Gutell 1993a). The objective of this chapter is to review comparative sequence analysis and the structures for 16S and 23S rRNA that have been inferred with this approach, and to discuss some of the principles of RNA structure that are resulting from these studies. [A longer and more detailed account of the material presented in this article has been published elsewhere (Gutell, Larsen, and Woese 1994).]

Several different RNA molecules have been analyzed by the comparative approach. Comparative sequence analysis was first used to suggest the cloverleaf secondary-structure configuration of tRNA (Holley et al. 1965; Madison, Everett, and Kung 1966; RajBhandary et al. 1966; Zachau et al. 1966). A few years later a more detailed analysis yielded comparative support for a few tertiary interactions within tRNA (Levitt 1969). Today, with a significantly larger collection of tRNA sequences, and more refined and quantitative covariation methods, all of the secondary-structure base pairs and the majority of tertiary base-base interactions (Olsen 1983; Gutell et al. 1992; Gutell 1993a, and references therein) and tertiary base-triple interactions (Gautheret, Damberger, and Gutell 1995) can now be inferred.

© 1996 by CRC Press, Inc.

B. PRINCIPLES OF COVARIANCE ANALYSIS

Comparative sequence analysis is based on a very simple and profound principle: the same three-dimensional structure for an RNA molecule can be derived from a large number of different sequences. For this paradigm to be proven, the functionally equivalent RNA molecules under study (e.g., tRNAs, 16S rRNA, etc.) should form a comparable three-dimensional structure no matter how similar or divergent their sequences are. This principle has been applied mainly in the following circumstances. Analogous secondary- and tertiary-structure helices have been proposed when compensatory base changes (e.g., A:U ↔ G:C) in a collection of aligned sequences are identified within a potential helix. In other words, (two) positions are said to have a structural relationship (i.e., base pair) when their patterns of change in a sequence alignment are coordinated (i.e., the two positions covary). Thus, to identify secondary-structure helices we search for positional covariances. The first and simplest result to emerge from this analysis was the identification of secondary-structure helices. Subsequently, tertiary base-base interactions have been proposed on the basis of these same methods. More recently, refined methods for identifying base triples (Gautheret, Damberger, and Gutell 1995) have been developed and applied to the rRNA datasets, resulting in several likely base-triple candidates (Gutell et al., mss.
in preparation).

Comparative analysis reveals more than just the secondary- and tertiary-structure pairing assignments between two or three bases. The pattern of variation between the two pairing positions can suggest a specific conformation for the pairing arrangement. A few examples are mentioned here. (1) Two positions that contain all canonical and G:U base pairing possibilities are quite likely to form the standard Watson-Crick base pairing conformation. (2) A few base-paired positions in 16S and 23S rRNAs are restricted to A:U or U:A base pairs, suggesting that these base pairs form a unique conformation that only this combination of pairing types can adopt. (3) For some base pair positions, the constraint is different among various phylogenetic groups, implying that different conformations might occur within different phylogenetic groups. A sampling of other restricted base pair exchanges is discussed below.

The comparative approach by itself does not prove the existence of a base pair, helix, or entire secondary structure. Instead it reveals constraints in positional variation, from which we infer secondary and tertiary structure. Given the success this approach has had in predicting a three-dimensional tRNA structure that is largely congruent with its crystal structure solution, we are confident that these inferred structures are biologically meaningful. While the comparatively derived 16S and 23S rRNA secondary and tertiary structures cannot be experimentally substantiated in the same manner as tRNA, the combination of various experimental approaches (e.g., site-directed mutagenesis, chemical modification, etc.) has corroborated these proposed structures (see Hill et al. 1990; Nierhaus et al. 1993). The comparative methodology should not be viewed as a competing approach in elucidating rRNA structure. Rather, the combination of experimental and comparative approaches presents us with a richer collection of facts to build upon.
The comparative approach is, in one sense, an analysis of experiments performed during the evolution of the rRNAs under study. However, we do not know what the experimental design or intents were. We observe and analyze the sequences that have survived this evolutionary process, and from these patterns of variation we have inferred secondary- and tertiary-structure base pairings. We can also derive other types of information from these patterns of sequence variation. For example, positions and structural features that are highly conserved are indicative of functional significance. The compositional frequency within certain unpaired loops could suggest unique conformations or thermodynamic properties (e.g., "tetraloops"; see below).

C. MODELLING rRNA SECONDARY AND TERTIARY STRUCTURE

The ribosomal RNAs are quite amenable to the comparative approach due to their significant role in the structure of the ribosome and its function in protein synthesis, their long biological ancestry that links them to an early stage in the evolution of the cell, and their patterns of conservation that make them the preferred choice for phylogenetic reconstruction studies (Woese 1980; Woese 1987).

The modelling of rRNA structures has proceeded in stages. The comparative analysis has relied primarily on the simplest pattern of covariation. Two positions are considered paired only when the two
aligned positions vary in a highly coordinated manner (i.e., variation at one position is compensated for by a correlated change at its pairing position). Initially, only covarying positions with canonical (A:U and G:C) and G:U pairings, flanked by other canonical pairings, were considered to be base paired. The initial studies on 16S and 23S rRNAs resulted in minimal secondary-structure models (Woese et al. 1980; Stiegler et al. 1981; Zwieb, Glotz, and Brimacombe 1981; Noller et al. 1981; Branlant et al. 1981; Glotz, Zwieb, and Brimacombe 1981). The majority of the proposed helices are still present in our current secondary-structure model, although many have been refined. At the outset, with a small number of available sequences, only a few base pairs within a potential helix exhibited compensatory base substitutions. In such cases, the length of the helix was determined to be the maximum number of possible consecutive and canonical base pairs. At that time a helix was considered phylogenetically proven when two or more of its base pairs had two or more compensatory base changes (Woese et al. 1980).

The growing rRNA sequence collections presented us with the opportunity to test the authenticity of the first comparatively derived structures, and to search for new interactions. The majority of the earlier secondary-structure base pairs in the 16S and 23S rRNAs were corroborated with the additional sequence information. These base pairings either received support from additional examples of compensatory base substitutions, or the nucleotides that are putatively base paired were phylogenetically conserved in the larger rRNA datasets (i.e., they did not contain counter-evidence). Thus the number of base pairs with comparative support (i.e., compensatory base changes) increased with the larger number of available sequences.
In contrast, some base pairs were eliminated due to uncompensated base substitutions (i.e., a change at one position without a change in its putative pairing partner). New positional covariances were identified with these larger sequence collections. Some of these pairings were components of newly proposed secondary-structure helices. Others were associated with more complex secondary (e.g., noncanonical base pairings, or base pairings not flanked by other canonical base pairs; see below) or tertiary (e.g., pseudoknots and parallel interactions; see below) interactions.

The first complete 16S rRNA sequence was determined in 1978 (Brosius et al. 1978; Brosius et al. 1981). Subsequently, approximately 3,750 16S and 16S-like rRNA sequences (complete or nearly so) have been determined and are in the public domain (or will be soon) (Maidak et al. 1994; Gutell 1994; Gutell, unpublished collections). This number includes 2,200 from (eu)Bacteria, 100 from Archaea, 1,220 from Eucarya (nuclear), 60 from chloroplasts, and 160 from mitochondria. In 1980, the first complete 23S rRNA sequence was published (Brosius, Dull, and Noller 1980). Since then, approximately 340 23S and 23S-like rRNA sequences (complete or nearly so) have become publicly available (or will be soon) (Gutell, Gray, and Schnare 1993; Gutell, unpublished collection). Of this number, 82 are from (eu)Bacteria, 16 from Archaea, 42 from Eucarya (nuclear), 92 from chloroplasts, and 105 from mitochondria. This collection of sequences is large and well distributed across all of the major taxa. Primary-structure alignments and secondary-structure diagrams for representative 16S and 23S rRNA sequences are also publicly available (Maidak et al. 1994; Gutell 1994; Gutell, Gray, and Schnare 1993).

In parallel with the increase in the number of 16S and 23S rRNA sequences, the methods used to identify positional covariances have improved.
Initially, only covariations involving canonical base pairs in a potential secondary-structure helix were scored positively. Later on, a more sophisticated algorithm was developed to identify positional covariations regardless of the base pair types and the interactions at the flanking positions (Gutell et al. 1985). However, this algorithm could only identify perfect covariations (e.g., A:U ↔ G:C interchanges were identified, but A:U ↔ G:U ↔ G:C were missed). Currently, our algorithms measure the mutual information between two positions with a chi-square statistic (Gutell et al. 1992), and a pseudo-measure of the number of mutual changes that have occurred during the phylogenetic evolution of the RNA under study (i.e., phylogenetic events) (Gautheret, Damberger, and Gutell 1995; Gutell and Damberger, mss. in preparation). These newest computational methods are currently being applied to our very large collection of 16S and 23S rRNA databases. The 16S rRNA secondary and tertiary structure model remains largely the same, although a few secondary-structure base pairings will be eliminated, while a few tertiary-like pairings will be added. Several putative base triples have been identified as well (Gutell et al., mss. in preparation). For the 23S rRNA, the refinements in structure will involve a few more base pairings. A few pairings have been eliminated (which are reflected in Figure 1), while a greater number of new secondary and tertiary base pairs will be included in future structure models. Several strong base triple candidates have now been identified (these new base-base and base-triple interactions are not shown in Figure 1). Of these newly proposed 23S rRNA interactions, several are located in functionally important regions of the 23S rRNA (Gutell et al., mss. in preparation).
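
The difference between scoring only perfect covariations and using a graded measure can be illustrated with a short sketch. The Python below computes the plain textbook mutual information between two alignment columns; it is an assumption-laden illustration, not the chi-square statistic of Gutell et al. (1992) nor the phylogenetic-event count, and the function name and toy columns are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns.

    col_i and col_j are equal-length strings holding the nucleotide found
    at each of the two positions, one character per aligned sequence.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must come from the same alignment")
    n = len(col_i)
    freq_i = Counter(col_i)                # nucleotide counts at position i
    freq_j = Counter(col_j)                # nucleotide counts at position j
    freq_ij = Counter(zip(col_i, col_j))   # counts of the observed pairs
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


# Hypothetical toy columns, invented for illustration (not real rRNA data).
perfect = mutual_information("AAGGAAGG", "UUCCUUCC")    # A:U <-> G:C only
mixed = mutual_information("AAGGGGAA", "UUUCCCUU")      # A:U <-> G:U <-> G:C
conserved = mutual_information("GGGGGGGG", "CCCCCCCC")  # invariant G:C
```

In this sketch the strict A:U ↔ G:C pattern scores highest, the pattern that passes through G:U still scores above zero rather than being missed, and an invariant pair scores zero because conservation alone carries no covariation signal.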
Figure 1A The current version of the Escherichia coli 16S and 23S rRNA secondary and tertiary structure models. A detailed comparative analysis for the secondary and tertiary structure base pairs will be formally presented elsewhere. All canonical (A:U, U:A, C:G, and G:C) secondary-structure base pairs are indicated by a connecting line, G:U pairs by a dot, G:A pairs with open circles, and other noncanonical pairings (see text) with closed circles. The nucleotides that are juxtaposed but not connected with a line or circles are considered possible, but do not have comparative proof or disproof. These nucleotides are usually invariant, or nearly so. "Tertiary" interactions with strong comparative data are connected by thicker (and longer) solid lines. Dashed tertiary lines refer to interactions that are considered possible, but are lacking convincing comparative evidence. Every 10th nucleotide is marked with a tick mark, and every 50th nucleotide is numbered. The secondary structure diagrams were drawn with the program XRNA, which was developed by B. Weiser and H. Noller. (A) Escherichia coli 16S rRNA. The sequence was determined by Brosius et al. (Brosius et al. 1978; Brosius et al. 1981).
Figure 1B Escherichia coli 23S rRNA, 5′ half. The sequence was determined by Brosius et al. (Brosius et al. 1980).

II. SECONDARY AND TERTIARY STRUCTURE

A. CURRENT MODELS OF 16S AND 23S rRNA SECONDARY AND TERTIARY STRUCTURE

The comparatively inferred secondary- and tertiary-structure models for Escherichia coli 16S and 23S rRNAs are shown in Figure 1. The base numbering in Figure 1 and within this chapter is for the E. coli 16S and 23S rRNA sequences (Brosius et al. 1978; Brosius, Dull, and Noller 1980; Brosius et al. 1981). The degree of comparative support for the majority of the secondary and tertiary base pairs in the 16S and 23S rRNA structure models is quite strong. A small number of pairings reflect minimal comparative evidence, although enough to warrant their inclusion in the structure diagram. A formal analysis of each secondary- and tertiary-structure base pair will be presented elsewhere.
Figure 1C Escherichia coli 23S rRNA, 3′ half.

Base pairs which contain strong comparative support are shown with a connecting symbol. Tick marks designate A:U and G:C base pairs, a small closed dot is used for G:U pairs, a larger open circle denotes A:G pairings, and a larger closed circle is used for all other noncanonical pairings. In contrast, conserved nucleotides in the immediate vicinity of a convincing helix that can form canonical and G:U base pairs are shown juxtaposed with no connecting symbol. These latter base pairings are (currently) neither proven nor disproven by the comparative data.

One of the simple beauties of comparative sequence analysis, as it is practiced here, is that secondary- and tertiary-structure base pairings are inferred in the absence of any knowledge of the principles of RNA
structure. The best correlation is determined for every E. coli position in the alignment. This correlation is based solely on the mutual change of nucleotides in this alignment, no matter what the underlying bases are (e.g., this procedure will identify interchanges such as C:C ↔ U:U as well as A:U ↔ G:C). Thus, canonical base pairs are not specifically searched for. The two positions that do correlate will do so independently (to a first approximation) of flanking positions. Therefore the correlating pair could be located in a multitude of structural environments. They could be antiparallel and adjacent to another set of correlating positions (e.g., to form a helix), arranged to form a set of parallel interactions, isolated from other known base pairings, etc. Secondary-structure helices can be nested or arranged into a pseudoknot configuration. In the former, helices are contained within one another, with a few long-range helices establishing the primary structural domains. In the latter, the pseudoknot helix is formed by the crossing of another helix.

As Nature will have it (as she usually does), the majority of all correlating pairs involve canonical and G:U base pairs, the vast majority of all pairings are arranged into secondary-structure helices, and the majority of these helices are nested. These expected observations should not be taken too lightly; comparative sequence analysis has revealed the most basic principles of RNA structure.

In contrast with these predominant and expected results, we also observe some exceptions to these common forms of RNA structure in the two rRNAs. Several unusual base pairings and base-pairing interchanges are emerging from the comparative studies, including G:U ↔ A:C, A:G ↔ G:A, U:U ↔ C:C, and A:A ↔ G:G noncanonical base pair replacements. While the majority of all base pairings are adjacent and the nucleotides within them antiparallel, a few are isolated by themselves, and others contain parallel base pairings.
In addition, a small but growing number of helices cross the boundaries of other helices, forming structural pseudoknots. These less common and structurally interesting elements both contribute to our understanding of 16S and 23S rRNA structure and enrich our knowledge about possible RNA structures. Are these exceptional and aberrant structural features biologically meaningful? Given that our comparative studies have independently derived the basic rules for RNA structure, we are confident that these newly proposed structural elements will be experimentally verified. A few pertinent rRNA examples for these structural themes will be presented below.

B. CHARACTERISTICS OF THE BASE PAIRS

1. Canonical Pairings

Since all of the 16S and 23S rRNA secondary and tertiary structure base pairs are inferred from mutual patterns of variation, and not from a search for canonical pairings, it is of interest to note the frequency of base-pair types in these comparative structure models. Of the 16 possible pairing types (e.g., U:U, U:A, U:G, U:C, C:U, C:A, etc.), the majority of all helical base pairs in the comparatively derived 16S and 23S rRNA structures are composed of G:C and A:U pairings. In an analysis of hundreds of (eu)Bacterial and chloroplast 16S and 23S rRNA sequences, the order of frequency for all of the comparatively inferred base pairs is (with the relative frequencies for 16S and 23S separated by a slash): G:C [30%/28%] > C:G [24%/28%] > U:A [15%/13%] > A:U [12%/13%] > G:U [7%/7%] > U:G [6%/6%] > A:G [2%/1%] > G:A [1%/1%] > U:U [1%/1%] > others (Konings and Gutell, unpublished results). This result parallels the thermodynamic stabilities for base pairs (Freier et al. 1986; Turner, Sugimoto, and Freier 1988), suggesting that the comparatively derived structures are, at least in part, selected to be thermodynamically stable.

Underlying these general base pair frequencies and constraints are different selection pressures for each base pair position.
While a detailed analysis is beyond the scope of this chapter, a few key observations will be presented here. The majority of all base pair positions covary between canonical and G:U pairings only. The others reflect varying amounts of noncanonical pairings. While canonical and G:U pairings are still the predominant pairing type at these base pair positions, there are some positions that only contain noncanonical pairs (Gutell, Larsen, and Woese 1994; Konings and Gutell, unpublished results).

Next we ask if the secondary and tertiary structure base pairs are restricted in their variation. If a helical base pair freely interchanges between all canonical and G:U pairings, then we can infer that this pairing forms the standard base pairing conformation (assuming that the conformation of that base pair is conserved across all sequences in the dataset). Its purpose is to maintain a helix and it is probably not intimately involved in protein synthesis. However, if a base pair position is limited in the types of base pairs observed, then we can surmise that this position might be part of a structural element that is
important for recognition and/or function. [Because our rRNA database is now sufficiently large, we can assume that each position has had ample opportunity to explore all possible nucleotides and base pairings. Thus, if a position or a base pair is restricted in variation it is due to a structural or functional constraint and not due to a small sampling size.]

While many base pair positions appear to freely exchange between base pair types (e.g., A:U ↔ U:A ↔ C:G ↔ G:C) without much restraint, and are thus under less selection pressure, a large number of pairings are restricted in a variety of interesting ways. Within the (eu)Bacteria, some pairings are invariant, although they differ in the other phylogenetic domains. Other base pair positions only interchange between two base pair types (e.g., A:U and G:C), while others exchange between three or more base pair types. Base pairings that are predominantly R:Y-constrained (i.e., covary between A:U and G:C or U:A and C:G) are relatively frequent, occurring at approximately 40% and 50% of the (eu)Bacterial 16S and 23S rRNA base pair positions, respectively. There is a tendency for these R:Y-restricted base pairs to occur in long-range helices (e.g., helices that are separated by a large number of nucleotides) or helices that have been associated with ribosomal function or ribosomal protein binding (Konings and Gutell, unpublished). Of those base pairs with only two canonical pairing types, the majority involve conservative transitional interchanges (i.e., G:C ↔ A:U), as we might expect. A small number of canonical interchanges involve transversions. Of these, A:U ↔ C:G and G:C ↔ U:A occur less frequently than G:C ↔ C:G and A:U ↔ U:A (Konings and Gutell, unpublished data). While the conformational and functional significance of these latter canonical base pair exchanges is not currently appreciated,
we can speculate that they will be associated with protein binding, interesting RNA conformations (e.g., base triples), and possibly ribosomal RNA function (e.g., tRNA binding). A more detailed analysis of these base pair exchanges will be presented elsewhere (Konings and Gutell, manuscript in preparation).

2. Noncanonical Pairings

In addition to the restricted canonical base pair exchanges, there are several noncanonical helical base pairings with interesting interchange patterns. The majority of these occur in secondary-structure helices, although a few discussed in this section occur in more complex structural elements and might be considered tertiary. For this discussion G:U base pairs will be considered to be noncanonical.

a. G:U Base Pairs and G:U ↔ A:C Interchanges

The canonical base pairs G:C, C:G, U:A, and A:U are the most frequent comparatively derived pairings (see above). Immediately following in frequency are G:U and U:G base pairs, occurring at 7% and 6%, respectively, in both 16S and 23S rRNAs [in a (eu)Bacterial and chloroplast alignment] (Konings and Gutell, unpublished results). Although this overall frequency is low, G:U base pairs occur at more than 30% of the base pair positions in 16S and 23S rRNAs. The frequency of G:U pairings at these positions varies over a large range, some positions containing just a few percent, while other positions are occupied with nearly 100% G:U base pairings. Positions with a majority (>50%) of G:U pairings have been classified into three types (see Figure 3 in Gutell, Larsen, and Woese 1994). Those that are invariant (or nearly so) within the (eu)Bacterial datasets are designated "I". Base pair positions that are predominantly G:U are called "D" (for dominant). Within this class are examples of G:U ↔ U:G interchanges. The third type has been called "N" for nontypical. Here the base pair interconverts between G:U and A:C.
For the positions where this occurs, it is usually quite pronounced within one or two phylogenetic groups, or sometimes within parts of a phylogenetic domain (e.g., within the gram positive group of the (eu)Bacteria). At this time we will only discuss some examples of the "N" G:U type.

Although very infrequent, there are a few examples where the U:G (or C:A) is invariant in the three primary phylogenetic domains, while U:G and C:A pairings covary in the mitochondria. The 16S rRNA position 1402:1500 contains a C:A juxtaposition in all (eu)Bacteria, Archaea, Eucarya, and the majority of mitochondria, while a few phylogenetically distantly related mitochondria have a U:G pair (Gutell 1993b; Gutell, Larsen, and Woese 1994). The 16S rRNA position 1052:1206 is predominantly U:G, with a few C:A pairings in the mitochondria (Konings and Gutell, unpublished). Although there is no immediate explanation, it is of interest to note that both of these 16S rRNA base pairs are in immediate proximity to sites associated with protein synthesis (Rinke-Appel et al. 1994; Prince et al. 1982; Cunningham et al. 1993; Moine and Dahlberg 1994; Noller et al. 1990). Within 23S rRNA, positions 2249:2255 and 2457:2494 provide two more examples for this type of U:G ↔ C:A interchange. The U:G
pair is mostly invariant in the three phylogenetic domains for both of these base pairs. While U:G is the predominant mitochondrial pairing at the 2249:2255 position, there are a small but significant number of sequences with a C:A pair. The second 23S base pair, 2457:2494, contains approximately 50% C:A and 50% U:G base pairs in the mitochondria. Both of these positions are also in close proximity to sites involved in protein synthetic activity (Noller et al. 1990; Sirum-Connolly and Mason 1993; Gregory, Lieberman, and Dahlberg 1994).

Some of the U:G ↔ C:A interconversions occur primarily within one phylogenetic group, while canonical pairings are present in the other phylogenetic groups. The 16S rRNA pair 249:275 interchanges between U:G and C:A in the Archaea and (eu)Bacterial groups (Gutell, Larsen, and Woese 1994), while this U:G pair alternates with other pairs in the Eucarya and mitochondria. Two other examples of this mini-theme occur at 16S rRNA positions 1074:1083 and 1086:1099. At the 1074:1083 base pair, G:U and A:C pairs covary in the Archaea, with the A:C pair present in over 80% of the Archaea sequences. Within the (eu)Bacteria, this base pair is always a G:U, while the Eucarya and mitochondria maintain canonical pairings along with the G:U pair. At the 1086:1099 base pair, the (eu)Bacteria interchange between U:G and C:A pairings, with the U:G pair occurring in over 90% of the sequences. Canonical and other types of noncanonical pairings are present in the Archaea, Eucarya, and mitochondria at the 1086:1099 base pair. One last example of this theme occurs in (eu)Bacteria at positions 383:391 in 23S rRNA. This pairing is predominantly a C:A, alternating to U:G in a few phylogenetically distinct organisms. This base pair and surrounding nucleotides are deleted in the Archaea and Eucarya. These are the purest examples for this novel pairing exchange in the rRNAs. Other alternating U:G ↔ C:A pairs are not exclusive.
Instead they also exchange with canonical or noncanonical pairs. One prominent example is present at the (eu)Bacterial 16S rRNA base pair 152:169. Here the A:C pair exists in 75% of the sequences, followed by G:U (13%), G:G (6%), and N:N (6%). The number of mutual changes (i.e., covariations) occurring during evolution at this base pair position is quite large. The most common compensatory change is between A:C and G:U base pairs, followed by A:C and G:G base pairs. Other examples of alternating G:U and A:C base pairs within select phylogenetic groups have been identified and will be presented elsewhere.

b. Purine:Purine, Pyrimidine:Pyrimidine, and Other Interconversions

Comparative studies have revealed several examples of A:G, G:A, A:A, G:G, U:U, and C:C pairings. A fraction of these rare base pairs exchange with canonical pairings. However, each one of the noncanonical base pairings is also involved in unique and exclusive covariations with other noncanonical pairings. Although none of these has been explicitly associated with protein recognition, ribosomal function, or more complex RNA-RNA interactions, we can speculate that each of them is involved in more than simple base pairing.

A:G ↔ G:A. There are several examples in the rRNAs for A:G ↔ G:A covariations. In 16S rRNA the cleanest example involves positions 1357 and 1365, which form the closing base pair of a hairpin loop (Woese et al. 1983). Within the (eu)Bacteria, chloroplasts, and mitochondria, A:G and G:A interconvert exclusively with one another, while in the Archaea and Eucarya this base pair consists only of canonical base pairs. Three examples are known in 23S rRNA. The first occurs in a secondary-structure helix at positions 1858:1884 (Gutell 1993b; Trust et al. 1994). This exchange is flanked on both sides with noncanonical and canonical pairings, forming an irregular helix. This unusual structural element only forms in the (eu)Bacteria and chloroplasts (Gutell 1992).
The Archaea and Eucarya have truncated this part of the helix. The second 23S rRNA example, at positions 2112:2169, does not occur in a secondary-structure helix; instead, it is situated adjacent to other pairings that form a parallel structure (Gutell and Woese 1990; see below). This structurally unique region is associated with the E site (Moazed and Noller 1989). Crosslinking studies have confirmed the proposed 2112:2169 interaction (Doring, Greuer, and Brimacombe 1991). The third 23S rRNA A:G ↔ G:A exchange occurs in (eu)Bacteria at positions 857:920 (Konings and Gutell, unpublished). Although this pairing is flanked by a canonical pairing on its 5′ side and a G:U pairing on its 3′ side, this helical region is irregular, with several A:G pairings. The Archaea contain canonical pairings at 857:920 while the Eucarya have both canonical and noncanonical pairs. The (eu)Bacterial 5S rRNA has an example of an A:G ↔ G:A covariation between positions 75 and 101, which is nested in an irregular helical structure (Gutell, unpublished analysis). This base pair is flanked on its 5′ side with a U:G pair that exchanges with C:A and A:A pairings, and on its 3′ side with a G:G pair that interchanges with A:A (see below). Each of these A:G ↔ G:A covariations is found in an interesting structural context. One occurs at the end of a helix, one is present in a parallel
structure, another is in close proximity to other A:G pairings, and two are flanked by other noncanonical pairings.

U:U ↔ C:C. There are three rRNA base pairings that covary exclusively between U:U and C:C. Two of them occur in 16S rRNA, at positions 245:283 and 1307:1330, and one in 23S rRNA, at positions 1782:2586 (Gutell and Woese 1990; Gutell, Larsen, and Woese 1994). The 16S rRNA 245:283 base pair is considered a lone pair since neither of the paired nucleotides is contiguous with an adjacent base pair. This pair is a C:C in nearly all Eucarya sequences, while in both the (eu)Bacteria and Archaea there are many documented U:U ↔ C:C covariations. Note that this pairing is a few base pairs from the 249:275 U:G ↔ C:A exchange (see above). Ribosomal protein S17 protects these base pairs from modification (Powers and Noller 1995), suggesting that these unusual pairings are important for protein recognition. The second 16S rRNA U:U ↔ C:C exchange occurs at the 1307:1330 base pair (Gutell and Woese 1990; Gutell, Larsen, and Woese 1994). This covariation is present within each of the three phylogenetic domains, and is flanked on its 3′ side by the 1308-1314/1323-1329 helix and on its 5′ side with several highly conserved G:A and A:G juxtapositions. The third U:U ↔ C:C exchange associates the 23S rRNA positions 1782 and 2586. Not only is this pairing isolated from other known base pairings (e.g., a lone pair; see below), it is also a long-range interaction between domains IV and V. This long-range covariance is supported by crosslinking studies (Stiege, Glotz, and Brimacombe 1983; Mitchell et al. 1990).

A:A ↔ G:G. The 16S rRNA lone base pair at positions 722:733 covaries between A:A and G:G (Gutell and Woese 1990). This base pair is considered a lone pair since neither of the paired nucleotides is contiguous with an adjacent base pair. A second example of this type of exchange occurs in the (eu)Bacterial 5S rRNA at positions 76:100, as noted above.
This pair is adjacent to an A:G ↔ G:A pairing exchange on its 5′ side; an invariant U:A pair is on its 3′ side. Here both examples of the A:A ↔ G:G covariance occur in an irregular helical structure, suggesting a recognition motif (Gutell, Larsen, and Woese 1994).

Other interconversions. There are also a few base pair positions that alternate between a noncanonical and canonical base pair. Those that interchange primarily between two types of base pairings are noted here.

U:A ↔ G:G at 16S rRNA positions 438:496 (Gutell and Woese 1990). This pairing constraint occurs in (eu)Bacteria and Archaea. The Eucarya do not have an analogous helix. This pairing is situated within a putative base triple (see below; Gutell, Larsen, and Woese 1994).

A:C ↔ U:A at 16S rRNA positions 996:1045 (Woese, personal communication; Gutell, Larsen, and Woese 1994). This base pair is only present in the (eu)Bacteria. The Archaea and Eucarya have a different helical form in the corresponding region. This pairing is at the end of a helix and involved in a putative base triple (Gutell et al., mss. in preparation).

A:G ↔ R:U at 16S rRNA positions 122:239 (Gutell and Woese, unpublished). This exchange only occurs in the (eu)Bacteria. The predominant alternation is between the G:U and A:G base pairings. A:U pairings are the third most frequent base pair, interchanging mostly with the G:U pairing. The Archaea contain mostly A:G base pairs, while the Eucaryal helix and base pair are not strictly homologous. The 122:239 base pair is at the terminus of the 122-129/232-239 helix, and situated in close proximity to a putative triple between positions 121 and 124:237 or 125:236 (see below; Gutell et al., mss. in preparation).

A:G ↔ G:U at 23S rRNA positions 15:525 and 2675:2732. The 15:525 G:U ↔ A:G base pair covariance is strongest in the (eu)Bacteria, present but weaker in the Archaea, and nonexistent in the Eucaryal domain. The 2675:2732 base pair alternates between A:G and G:U in the (eu)Bacteria.
The Archaea and Eucarya are lacking this base pair; instead, they have an extra base pair at the terminus of the 2646-2652/2668-2674 helix. This 2675:2732 interaction is at the junction of a putative coaxial stack between helices 2646-2652/2668-2674 and 2675-2680/2727-2732 (see below; Gutell 1992; Gutell 1993b).

The significance of these constrained and noncanonical base pair alternations is not currently understood. However, it is of interest to note that the majority of them occur in a unique structural context, be it at the end of a helix, in a lone pair, in proximity or direct association with (putative) base triples, or at the junction of a (putative) coaxial helix stack. We can speculate that these unusual pairings and pairing exchanges are generally important for the formation of specialized structural conformations. We await the experiments that should enlighten us as to why these base pairings are restricted in their variation.

C. ARRANGEMENT OF THE BASE PAIRS AND CHARACTERISTICS OF HELICES

1. Traditional Organization of Base Pairs and Helices

Comparative sequence analysis, in its simplest form, identifies positional covariation. From these correlations, we infer base pairings. As noted earlier, the basic principles of base pairings in RNA have
been derived independently in these comparative studies. It was also noted that the majority of the canonical base pairings are arranged consecutively and antiparallel with one another so as to form helices. Further, the majority of these helices are constructed so that each helix is within the boundaries of another helix (e.g., they are not knotted). In contrast to traditional base pairs and helices, a few base pairs are not flanked by other base pairings (e.g., lone pairs), a small number of helices cross the boundaries of other helices, forming pseudoknots, while other base pairs are arranged in parallel. The nontraditional structural elements are described, along with a few other interesting structural features derived from our comparative studies of rRNA, in the following paragraphs.

2. Lone Pairs

Approximately 20 comparatively derived base pairings are not components of a secondary or tertiary structure helix. Rather, these base pairs are isolated from other base pairings (see Figure 1). While a few of the lone pairs are completely isolated from other base pairings (i.e., they are not within a few nucleotides of the closest base pair; see below), most of the lone pairs are adjacent on their 5′ or 3′ sides to an existing secondary structure helix. While some of these lone pairs contain noncanonical base pairings (see above), the majority were identified by canonical base pairing exchanges. Examples of lone pairs that are completely isolated from other base pairs are in the 16S rRNA, 450:483 and 1399:1504, and in the 23S rRNA, 1782:2586 (noncanonical pair, see above) and 2117:2172 (see below). Lone pairs that are partially isolated in 16S rRNA are: 47:361; 245:283 (noncanonical pair, see above); 575:880; 722:733 (noncanonical pair, see above); and 779:803.
Partially isolated lone pairs in the 23S rRNA are: 30:510; 61:93 (see below); 67:74; 234:430; 319:323; 1082:1086; 1087:1102; 1262:2017; 1752:1756; 1800:1817; 2282:2427; 2512:2574; and 2626:2777 [lone pairings that cross an existing helix (i.e., a pseudoknot) are designated in bold type]. Three of the 23S rRNA lone pairs form a hairpin loop composed of three nucleotides (319:323; 1082:1086; and 1752:1756). Each of these is immediately contiguous on its 5′ and/or 3′ sides with other helical base pairs. All of these single base pairs, in isolation from other interactions, are probably not very stable. Thus for these base pairs to form, we suspect that auxiliary factors, such as proteins and/or other RNA:RNA interactions (i.e., stacking onto adjacent bases or base triple formation), stabilize their formation. The 2282:2427 interaction has been substantiated by crosslinking experiments (Mitchell et al. 1990).

3. Pseudoknots

As noted earlier, the majority of all base pairs and helices are nested; their formation does not involve the formation of a knot. In contrast, pseudoknots are helical interactions that cross the boundaries of another set of helices. There are approximately 15 of these structural motifs in the 16S and 23S rRNAs (Gutell, Larsen, and Woese 1994 and references therein). The three pseudoknot helices in 16S rRNA are: 17-19/916-918; 505-507/524-526; and 570-571/865-866 (see Figure 1). The base pairings in these helices are all canonical. These helices appear to be more than just a structural entity. While the structural integrity of each of these comparatively derived helices has been experimentally substantiated, they have also been shown to be important for protein synthesis (Brink, Verbeet, and de Boer 1993; Powers and Noller 1991; Vila et al. 1994). These pseudoknots are all located in close proximity to the helices that establish the three structural domains in 16S rRNA and in regions of the rRNA that are conserved in the Archaea, Eucarya, and (eu)Bacteria.
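
The crossing criterion that separates a pseudoknot helix from a nested helix can be made concrete with a short Python sketch. It is an illustration only, under stated assumptions: the function names (`helix_pairs`, `pairs_cross`, `helices_cross`) are hypothetical, helices are written in this chapter's a-b/c-d notation and are assumed to be antiparallel, and the coordinates are the 16S rRNA helices 505-507/524-526, 500-504/541-545, and 511-515/536-540 named in this chapter.

```python
def helix_pairs(five_start, five_end, three_start, three_end):
    """Expand an antiparallel helix 'a-b/c-d' into base pairs a:d, ..., b:c."""
    length = five_end - five_start + 1
    if three_end - three_start + 1 != length:
        raise ValueError("both strands of the helix must have the same length")
    return [(five_start + k, three_end - k) for k in range(length)]


def pairs_cross(p, q):
    """True when pairs i:j and k:l interleave (i < k < j < l), i.e. are not nested."""
    # Order each pair 5'->3', then order the two pairs by their 5' position.
    (i, j), (k, l) = sorted([tuple(sorted(p)), tuple(sorted(q))])
    return i < k < j < l


def helices_cross(helix_a, helix_b):
    """True when any base pair of one helix crosses any base pair of the other."""
    return any(pairs_cross(p, q) for p in helix_a for q in helix_b)


# Illustrative coordinates (E. coli 16S rRNA numbering, as used in this chapter).
pseudoknot = helix_pairs(505, 507, 524, 526)
outer_helix = helix_pairs(500, 504, 541, 545)
inner_helix = helix_pairs(511, 515, 536, 540)

print(helices_cross(pseudoknot, outer_helix))  # nested inside the outer helix
print(helices_cross(pseudoknot, inner_helix))  # interleaves with the inner helix
```

With these coordinates the 505-507/524-526 helix lies entirely within 500-504/541-545 but interleaves with 511-515/536-540, which is the arrangement described in the text as a pseudoknot. The sketch only tests sequence positions; it says nothing about whether a proposed pairing has comparative support.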
We can speculate that they might coordinate interactions, and even movement, between the 16S rRNA structural domains.

The 23S rRNA contains more than ten pseudoknots. The majority of the base pairs in these structural elements are canonical, although there are a few noncanonical pairings. While the pseudoknot helices in 16S rRNA are two or three base pairs in length, some of the 23S rRNA pseudoknots consist of only a single base pair (e.g., 2626:2777). However, as with 16S rRNA, the longest pseudoknots are only three base pairs in length. Two of the 23S rRNA pseudoknots associate two hairpin loops (designated below as "loop-loop" interactions). The 23S rRNA pseudoknots are: 61:93 and 65-66/88-89 (loop-loop); 67:74; 234:430; 317-318/333-334; 413-416/2407-2410 (loop-loop); 1005-1006/1137-1138; 1343-1344/1403-1404; 2111/(2144 2147); 2112:2169 (noncanonical, see above), 2113:2170, 2117:2172 (base pairings in parallel, see below); 1782:2586 (noncanonical, see above); 2328-2330/2385-2387; and 2626:2777. A few of the comparatively inferred interactions in 23S rRNA are also substantiated with experimental data. Recently, the proposed 1005-1006/1137-1138 pseudoknot helix in domain II has been verified by site-directed mutagenesis and shown to be important for ribosomal function (Rosendahl, Hansen, and Douthwaite 1995). Domain III of 23S rRNA contains the 1343-1344/1403-1404 pseudoknot helix, which is positioned at the base of three secondary structure helices. Experiments verified this pairing and revealed
that this helix is important for ribosomal protein binding (Kooi et al. 1993). Experimental support for the 1782:2586 and 2112:2169 interactions was discussed earlier. Direct experimental evidence (e.g., crosslinking) for the other proposed 23S rRNA pseudoknots is currently lacking. However, it should be emphasized that the comparative support (i.e., the number of compensatory base changes) for each of these pseudoknot interactions is very strong. Thus, we anticipate experimental verification in time.

In summary, all of the proposed rRNA pseudoknot helices are short, ranging from one to three base pairs in length. The majority of these helices have either their 5′ or 3′ halves immediately adjacent to the end of a secondary-structure helix. And some of the pseudoknot helices can potentially stack onto more than one secondary-structure helix. These observations, in parallel with the experimental characterization of a simple pseudoknot structure that revealed coaxial stacking of the stems (Puglisi, Wyatt, and Tinoco 1990), lend support to the idea that some of the 16S and 23S rRNA pseudoknot helices can be coaxially stacked onto one or more adjacent helices in a conformationally static or dynamic manner (ten Dam, Pleij, and Draper 1992; see below).

4. Base Pairings in Parallel

The majority of all base pairs are positioned antiparallel to one another to form traditional RNA helices. In contrast, there are a few adjacent base pairs that are configured in parallel to one another. These occur in 23S rRNA at positions 2112/2169, 2113/2170, and 2117/2172 (Gutell and Woese 1990; Gutell, Larsen, and Woese 1994). These pairings are at the E site (Moazed and Noller 1989), suggesting that this unusual arrangement of base pairs is associated with protein synthesis. As noted earlier, positions 2112 and 2169 alternate primarily between A:G and G:A pairings, and the proposed interactions have been validated by cross-linking data (Doring, Greuer, and Brimacombe 1991).

5. Base Triples

Our search for base triples by comparative methods has led to several unexpected findings. Initially, no strong 16S or 23S rRNA triple candidates were identified. Consequently we investigated the triples in tRNA, for which there are several three-dimensional crystal structures available for different tRNAs. Moreover, the comparative sequence alignment database is large and representative of all tRNA classes. First, these studies revealed that base triples are not conserved across all tRNAs. For example, in yeast tRNAPhe, the three triples involve positions 45(10:25), (12:23)9, and (13:22)46. In E. coli tRNAGln, the one and only triple associates positions 45(13:22), while in E. coli tRNA"" the base triples form between positions 9(13:22), 8(14:21), and 48(15:20) (Gautheret, Damberger, and Gutell 1995, and references therein). Second, similar base triple conformations can form in the absence of covariation. For example, the conformation for the (12:23)9 tRNA base triple is nearly identical for the sequences (U:A)A and (U:A)G (Klug, Ladner, and Robertus 1974; Gautheret, Damberger, and Gutell 1995). Third, the base pairs in the helices that are associated with base-triple formation tend to have a strong neighbor effect. In particular we observe that paired nucleotides have a strong correlation with nucleotides in the flanking base pairs, as well as a strong correlation with their base pair partners. Taken together, these findings suggest that for base triples, the structural unit under selection is not simply and only the base pair, as it is for base:base interactions (to a first approximation). For base triples, a larger three-dimensional structure appears to be evolving as a unit under the same selection constraint (Gautheret, Damberger, and Gutell 1995).
These findings help explain in part why our initial searches for base triples in 16S and 23S rRNA were inconclusive.

This analysis of the tRNA base triples has resulted in our development of quantitative methods to better identify these structural features (Gautheret, Damberger, and Gutell 1995). The newer comparative algorithms have improved our identification of the known base triples in tRNAs and group I introns, and have also suggested new base triples in the group I introns (Gautheret, Damberger, and Gutell 1995). More recently these methods have been applied to the 16S and 23S rRNA datasets, resulting in several strong base triple candidates (Gutell et al., manuscript in preparation). A few of them are mentioned here. Previously we had recognized a probable triple covariance involving positions 440, 494, and 497 in 16S rRNA (Gutell, Larsen, and Woese 1994). This triple is detected with the newer computational methods, suggesting that it is indeed a significant correlation. However, this pairing is a bit unusual. Although positions 440 and 494 are juxtaposed in the secondary structure, the correlation is weaker than for positions 440 and 497. Within the boundaries of this putative base triple lies the unusual base pair 438:496 (see above), which alternates between U:A and G:G (Gutell and Woese 1990). Taken together, these two sets of correlations suggest an interesting structure at the base of the 442-446/488-492 helix. Our
quantitative algorithms have also identified a strong correlation between the unpaired 16S rRNA position 121 and the 124:237 and 125:236 base pairs, suggesting another probable base triple. While position 121 can potentially interact with either base pair, the base triple more likely involves just one of these base pairs. The other correlating base pair is probably due to a very strong neighbor effect (we observe a strong neighbor effect in the tRNA D-helix, where three of the four base pairs are involved in base triples; Gutell et al. 1992; Gautheret, Damberger, and Gutell 1995). Several other base triples have been detected that also link an unpaired nucleotide with a base pair that is a few nucleotides away. The details for these base triples will be presented elsewhere (Gutell et al., manuscript in preparation).

Within 23S rRNA there are several excellent base triple covariations (Gutell et al., manuscript in preparation). A few of these incorporate two distant regions of the secondary structure that have previously been physically linked by crosslinking experiments (Mitchell et al. 1990; Doring, Greuer, and Brimacombe 1991). Probably the most tantalizing of these involves position 746 and the 2057:2611 base pair. Here the triplet (G:C)U is most frequent, followed by (A:U)G and (C:G)C. Positions 748 and 2613-2614, in very close proximity to these triple nucleotides, have been crosslinked with ultraviolet irradiation (Mitchell et al. 1990), strongly suggesting the authenticity of this comparatively derived base triple. This base triple, if real, might well be involved in protein synthesis. The 2057:2611 base pair adjoins the peptidyl transferase loop and is implicated in resistance to numerous antibiotics (reviewed in Douthwaite 1992). In addition, chemical protection studies have mapped vernamycin B to position 752 (Moazed and Noller 1987). Other putative 16S and 23S rRNA base triples have been identified.
These along with the base triples discussed here will be presented in greater detail in the near future (Gutell et al., manuscript in preparation).

6. Coaxial Helices

Ultimately, we seek to fold the secondary-structure helices into a three-dimensional structure. The few known tertiary interactions and pseudoknot helices already impose a sense of three dimensionality upon the 16S and 23S rRNA structures. However, many more constraints are necessary to achieve a biologically meaningful structure. Potentially, a better understanding of how adjacent helices interact could provide some of this information.

Our knowledge of the tRNA crystal structure (Kim 1979) and physical investigations of small oligonucleotides (Puglisi, Wyatt, and Tinoco 1990) reveal that adjacent helices can coaxially stack, forming an elongated helical structure. Inspection of the 16S and 23S rRNA secondary-structure diagrams (Figure 1) suggests many possible helical stackings. Can we determine which of these are present in the native 16S and 23S rRNA structures? In 1983, Carl Woese proposed a comparative rationale that could potentially identify some helices that are coordinated into an elongated helical structure (Woese et al. 1983). Since two helices that are coaxially stacked are expected to maintain a constant overall length, comparative support for coaxial helices would be derived from cases in which one helix in a group of organisms is shorter, while its coordinating helix in that group is longer by the same amount, thereby maintaining the same combined length. This is noted in two sets of helices proposed to form coaxial stacks, one in 16S and the other in 23S rRNA. The 16S rRNA helices 500-504/541-545 and 511-515/536-540 are together 10 base pairs in length. In the (eu)Bacteria the two helices are 5 and 5 bp in length. In contrast, the corresponding helices in both the Archaea and Eucarya are 6 and 4 bp in length (Winker and Woese 1991).
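
The length-compensation rationale can be written as a small check. The sketch below is illustrative only: the function name `supports_coaxial_stack` and the dictionary layout are assumptions made for this example, and the helix lengths are the 16S rRNA values quoted above from Winker and Woese (1991).

```python
def supports_coaxial_stack(lengths_by_group):
    """Return True when two helices trade length between groups at constant total.

    lengths_by_group maps a phylogenetic group name to a (helix_a_bp, helix_b_bp)
    tuple. Comparative support requires (1) the same combined length in every
    group and (2) at least two groups that divide that length differently.
    """
    splits = set(lengths_by_group.values())
    totals = {a + b for a, b in splits}
    return len(totals) == 1 and len(splits) > 1


# 16S rRNA helices 500-504/541-545 and 511-515/536-540, lengths as quoted above.
helix_lengths_16s = {
    "(eu)Bacteria": (5, 5),
    "Archaea": (6, 4),
    "Eucarya": (6, 4),
}

print(supports_coaxial_stack(helix_lengths_16s))  # True: 5 + 5 == 6 + 4 == 10
```

A constant total with no variation between groups returns False here, because an invariant pair of helices offers no comparative evidence either way. The check captures only the arithmetic of the argument, not the underlying alignment analysis.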
A pseudoknot helix forms between the side bulge of these two helices and the large hairpin loop capping the 511-515/536-540 helix (see above), suggesting a more complex and dynamic structure. At the base of the α-sarcin helix in 23S rRNA, helix 2646-2652/2668-2674 is proposed to stack onto the 3′ adjoining helix, 2675-2680/2727-2732, forming a 13-bp coordinated helix. In the (eu)Bacteria the lengths of the helices are 7 and 6 bp, while in the Archaea and Eucarya the corresponding lengths are 8 and 5 bp (see above discussion about A:G ↔ G:U interconversions; Gutell 1992; Gutell 1993b).

7. Tetraloops

The search for positional covariation has identified base pairs, and has revealed much detail about pairing constraints and the arrangement of the base pairs into larger structural elements. While the analysis of unpaired nucleotides is not as advanced, these studies have categorized the unpaired nucleotides into several categories: hairpin loops, bulges, terminal extensions of helices, internal loops, and multistem loops. Only the most frequent of the hairpin loops, the so-called "tetraloops," will be discussed here.

The most frequent hairpin loop size in 16S and 23S rRNA is four. These tetraloops account for approximately 50% and 40% of all hairpin loops in prokaryotic 16S and 23S rRNAs, respectively. Of the
256 possible sequences of length four, the majority of the rRNA tetraloops are distributed into three sequence families: GNRA, UUCG, and CUUG (Woese et al. 1990). The tetraloop at positions 83-86 in the 16S rRNA alternates almost exclusively between the sequences CUUG, UUCG, and GCAA. The particular closing base pair for these hairpin loops is usually associated with the tetraloop sequence. CUUG is predominantly closed with a G:C pair, UUCG is closed with C:G, while the closing base pair for the GCAA tetraloop is usually an A:U. While this correlation between the closing base pair and the type of tetraloop is, on the whole, true for other UUCG and CUUG tetraloops in the rRNA, the other GNRA tetraloops are usually closed with a C:G or G:C pair (Woese et al. 1990). A few tetraloops alternate between these three sequence families, but the majority of these positions are evolving more slowly and are usually restricted to a smaller number of sequence motifs. The GNRA sequence family accounts for the majority of tetraloop positions that are conserved at a level above 80%. The various tetraloop themes suggest that these structural elements can serve different functions in the rRNA structure.

Do tetraloops influence the folding of rRNA, that is, does their stability contribute to nucleating the formation of important helices? Do they form a unique conformation and/or interact with other regions of the RNA? The UUCG loop, closed with a C:G base pair, is very stable (Tuerk et al. 1988), which helps to explain why it is a frequent sequence motif in the rRNAs. However, the GNRA sequence motif occurs more frequently than UUCG even though this tetraloop is less stable (although it is slightly more stable than sequences that are less frequently observed in rRNA tetraloops) (Antao, Lai, and Tinoco 1991; SantaLucia, Kierzek, and Turner 1992). Why are the GNRA tetraloops so abundant if they are not as energetically stable?
While the conformations for both of these tetraloops are very compact and structurally unique (Varani, Cheong, and Tinoco 1991; Heus and Pardi 1991), these features do not directly resolve the question. Thus, are the GNRA tetraloops selected to interact with other regions of the rRNA or to be recognized by proteins? One study has revealed the formation of a base triple between the third nucleotide of the GNRA loop and a helical base pair (Jaeger, Michel, and Westhof 1994). In another study, an internal loop comprised of an 11-nucleotide motif was shown to bind to GAAA hairpin loops (Costa and Michel 1995). In other cases, it has been shown that certain proteins recognize specific GNRA tetraloops (Orita et al. 1993; Szewczak et al. 1993). Moreover, α-sarcin recognition of the GAGA tetraloop requires a C:G closing base pair (Gluck, Endo, and Wool 1994), suggesting a rationale for some of the closing base pair constraints noted earlier. We can only speculate that some tetraloops nucleate folding so as to assure the proper rRNA structure (e.g., the 16S rRNA 83-86 tetraloop), while others, including some of the conserved GNRA motifs, are involved in tertiary interactions. Consistent with this proposition, there are two possible tertiary interactions in 16S rRNA that involve a tetraloop. The first is between position 1268, the third nucleotide of a GNRA loop, and the base pair 1311:1326 (Gutell, unpublished). This putative base triple is similar to the form first recognized by Michel in group I introns (Jaeger, Michel, and Westhof 1994). When the R (of the GNRA loop) is a G, then the interacting base pair is an A:U, and when it is an A, the interacting base pair is a G:C. A second tertiary interaction involving a 16S rRNA tetraloop has been known since 1985 (Gutell et al. 1985; Gutell, Noller, and Woese 1986). The unpaired position 570 covaries with 866, the last nucleotide of the 863-866 tetraloop. While E. coli has a UAAC loop sequence,
many 16S rRNA sequences contain a GAAA sequence in the corresponding hairpin loop (Gutell 1993b). Do GNRA tetraloops have a special conformation that predisposes them to tertiary interactions? The answer is yes. Very recently it has been determined that GNRA tetraloops can adopt the uridine turn conformation (Jucker and Pardi 1995). This U-turn, as it is known, creates a sharp turn in the backbone, preparing the nucleotides immediately 3′ for tertiary-like base pairing (Jucker and Pardi 1995). We can now begin to understand how positions 865 and 866 in the UAAC and GAAA tetraloops can form a pseudoknot with positions 570-571. These recent findings and the high frequency of GNRA tetraloops in 16S and 23S rRNAs now suggest that other GNRA and related tetraloops may well be involved in tertiary interactions.

III. A COMPARATIVE PERSPECTIVE ON THE STRUCTURE OF rRNA

Just a few years ago, we began our studies of rRNA structure with a small collection of aligned rRNA sequences, a minimal knowledge of how RNA folds up into specific secondary and tertiary structures, and little appreciation for the functionally important structural elements. In addition, we accepted the simple principle that different sequences can adopt a similar secondary and tertiary structure when all of the members of the sequence family under study are constrained to a common three-dimensional structure. With this sequence information and conceptual framework, comparative sequence analysis has
transformed the 16S and 23S rRNA datasets into reliable secondary-structure models and the beginnings of tertiary-structure models. At present our inquiries are leading to the recognition of novel RNA structural elements while providing an important perspective on how the rRNAs are involved in the functioning ribosome.

The comparatively derived 16S and 23S rRNA secondary- and tertiary-structure models are the result of over 10 years of development. In the early 1980s, with a handful of 16S and 23S rRNA sequences, the search for a common secondary structure relied primarily on the identification of covarying nucleotides present in a secondary-structure helix. These efforts produced the initial secondary-structure models for 16S and 23S rRNA. While, at the time, our confidence in these models was based on a minimal amount of comparative evidence, the majority of the secondary-structure base pairings originally proposed are present in today's highly refined secondary-structure diagrams. However, the older comparative methods and the limited number of 16S and 23S rRNA sequences available were only sufficient to establish the basic secondary-structure models. No tertiary interactions could be discerned, nor could we begin to understand other structural and conformational details.

At this time we have a very large collection of 16S and 23S rRNA sequences and a better appreciation for the comparative sequence paradigm, along with more powerful and generalized algorithms for correlation analysis. With the assistance of a faster computer, we are now determining the best correlation for each position in the 16S and 23S rRNA. This exhaustive study has already yielded a highly refined secondary-structure model and led to the identification of numerous tertiary interactions which have an overwhelming degree of comparative support.

For the future we wonder whether the comparative sequence paradigm will have yet more to offer in regard to understanding rRNA structure.
The answer is surely affirmative. We should expect comparative analysis to reveal more tertiary interactions, including base triples, and to provide a better understanding of the relationships between sequence and structure (e.g., tetraloops) with the goal of ascertaining new RNA structure motifs. Moreover, comparative analysis should permit us to further investigate patterns of variation and how they relate to RNA conformation (e.g., a G:A pairing motif in internal loops; Gautheret, Konings, and Gutell 1994). Ultimately, comparative analysis and the effort to determine a structure common to all of the 16S and 23S rRNA sequences will show us more than one-to-one positional covariance. These methods will take advantage of the growing appreciation for RNA conformations and the mapping between a sequence and its secondary and tertiary interactions to assist in the search for a common structure.

ACKNOWLEDGMENTS

I would like to gratefully acknowledge the contribution and influence Drs. Carl Woese and Harry Noller had on this chapter. Critical readings and enhancements by Drs. Al Dahlberg and Bob Zimmermann are appreciated.

REFERENCES

Antao V.P., Lai S.Y., and Tinoco I. Jr. (1991). A thermodynamic study of unusually stable RNA and DNA hairpins. Nucleic Acids Res. 19:5901-5905.
Branlant C., Krol A., Machatt M.A., Pouyet J., Ebel J.P., Edwards K., and Kossel H. (1981). Primary and secondary structures of Escherichia coli MRE 600 23S ribosomal RNA. Comparison with models of secondary structure for maize chloroplast 23S rRNA and for large portions of mouse and human 16S mitochondrial rRNAs. Nucleic Acids Res. 9:4303-4324.
Brink M.F., Verbeet M.Ph., and de Boer H.A. (1993). Formation of the central pseudoknot in 16S rRNA is essential for initiation of translation. EMBO J. 12:3987-3996.
Brosius J., Palmer M.L., Kennedy P.J., and Noller H.F. (1978). Complete nucleotide sequence of a 16S ribosomal RNA gene from Escherichia coli. Proc. Natl. Acad. Sci. USA 75:4801-4805.
Brosius J., Dull T., and Noller H.F. (1980). Complete nucleotide sequence of a 23S ribosomal RNA gene from Escherichia coli. Proc. Natl. Acad. Sci. USA 77:201-204.
Brosius J., Dull T., Sleeter D.D., and Noller H.F. (1981). Gene Organization and Primary Structure of a Ribosomal RNA Operon from Escherichia coli. J. Mol. Biol. 148:107-127.
Costa M. and Michel F. (1995). Frequent use of the same tertiary motif by self-folding RNAs. EMBO J. 14:1276-1285.
Doring T., Greuer B., and Brimacombe R. (1991). The three-dimensional folding of ribosomal RNA; localization of a series of intra-RNA cross-links in 23S RNA induced by treatment of Escherichia coli 50S ribosomal subunits with bis-(2-chloroethyl)-methylamine. Nucleic Acids Res. 19:3517-3524.
Freier S.M., Kierzek R., Jaeger J.A., Sugimoto N., Caruthers M.H., Neilson T., and Turner D.H. (1986). Improved free-energy parameters for predictions of RNA duplex stability. Proc. Natl. Acad. Sci. USA 83:9373-9377.
Gautheret D., Konings D., and Gutell R.R. (1994). A major family of motifs involving G.A mismatches in ribosomal RNA. J. Mol. Biol. 242:1-8.
Gautheret D., Damberger S.H., and Gutell R.R. (1995). Identification of Base-triples in RNA using Comparative Sequence Analysis. J. Mol. Biol. 248:27-43.
Glotz C., Zwieb C., and Brimacombe R. (1981). Secondary structure of the large subunit ribosomal RNA from Escherichia coli, Zea mays chloroplast, and human and mouse mitochondrial ribosomes. Nucleic Acids Res. 9:3287-3306.
Gluck A., Endo Y., and Wool I.G. (1994). The ribosomal RNA identity elements for ricin and for alpha-sarcin: mutations in the putative CG pair that closes a GAGA tetraloop. Nucleic Acids Res. 22:321-324.
Gregory S.T., Lieberman K.R., and Dahlberg A.E. (1994). Mutations in the peptidyl transferase region of E. coli 23S rRNA affecting translational accuracy. Nucleic Acids Res. 22:279-284.
Gutell R.R., Weiser B., Woese C.R., and Noller H.F. (1985). Comparative anatomy of 16S-like ribosomal RNA. Progress in Nucleic Acid Research and Molecular Biology 32:155-216.
Gutell R.R., Noller H.F., and Woese C.R. (1986). Higher order structure in ribosomal RNA. EMBO J. 5:1111-1113.
Gutell R.R. and Woese C.R. (1990). Higher order structural elements in ribosomal RNAs: Pseudo-knots and the use of noncanonical pairs. Proc. Natl. Acad. Sci. USA 87:663-667.
Gutell R.R., Power A., Hertz G.Z., Putz E.J., and Stormo G.D. (1992). Identifying Constraints on the Higher-Order Structure of RNA: Continued Development and Application of Comparative Sequence Analysis Methods. Nucleic Acids Res. 20:5785-5795.
Gutell R.R. (1992). Evolutionary Characteristics of 16S and 23S rRNA Structures. pp. 243-309. PRESENTED AT: The Symposium - The Origin and Evolution of Prokaryotic and Eukaryotic Cells. Shimada, Japan. April 22-25, 1992. Hyman Hartman and Koichiro Matsuno (eds.). World Scientific Publishing Co., New Jersey, USA.
Gutell R.R., Gray M.W., and Schnare M.N. (1993). A compilation of large subunit (23S- and 23S-like) ribosomal RNA structures. Nucleic Acids Res. 21:3055-3074 [Database issue].
Gutell R.R. (1993a). Comparative studies of RNA: inferring higher-order structure from patterns of sequence variation. Current Opinion in Structural Biology 3:313-322.
Gutell R.R. (1993b). The Simplicity Behind the Elucidation of Complex Structure in Ribosomal RNA. pp. 477-488. In The Translational Apparatus. Nierhaus et al., eds. Plenum Publishing Corporation.
Gutell R.R. (1994). Collection of Small Subunit (16S- and 16S-like) ribosomal RNA structures: 1994. Nucleic Acids Res. 22:3502-3507 [Database issue].
Gutell R.R., Larsen N., and Woese C.R. (1994). Lessons from an evolving rRNA: 16S and 23S rRNA structures from a comparative perspective. Microbiological Reviews 58:10-26.
Heus H.A. and Pardi A. (1991). Structural Features That Give Rise to the Unusual Stability of RNA Hairpins Containing GNRA Loops. Science 253:191-194.
Hill W.E., Dahlberg A., Garrett R.A., Moore P.B., Schlessinger D., and Warner J.R., eds. (1990). The Ribosome: Structure, Function & Evolution. Hill et al., eds. American Society for Microbiology.
Holley R.W., Apgar J., Everett G.A., Madison J.T., Marquisee M., Merrill S.H., Penswick J.R., and Zamir A. (1965). Structure of a Ribonucleic Acid. Science 147:1462-1465.
Jaeger L., Michel F., and Westhof E. (1994). Involvement of a GNRA Tetraloop in Long-range RNA Tertiary Interactions. J. Mol. Biol. 236:1271-1276.
Jucker F.M. and Pardi A. (1995). GNRA tetraloops make a U-turn. RNA 1:219-222.
Kim S.H. (1979). Crystal Structure of Yeast tRNA-phe and General Structural Features of Other tRNAs. pp. 83-100. In Transfer RNA: Structure, Properties, and Recognition. Schimmel P.R., Soll D., and Abelson J.N., eds. Cold Spring Harbor Laboratory.
Klug A., Ladner J., and Robertus J.D. (1974). The Structural Geometry of Co-ordinated Base Changes in Transfer RNA. J. Mol. Biol. 89:511-516.
Kooi E.A., Rutgers C.A., Mulder A., Van't Riel J., Venema J., and Raue H.A. (1993). The phylogenetically conserved doublet tertiary interaction in domain III of the large subunit rRNA is crucial for ribosomal protein binding. Proc. Natl. Acad. Sci. USA 90:213-216.
Levitt M. (1969). Detailed Molecular Model for Transfer Ribonucleic Acid. Nature 224:759-763.
Madison J.T., Everett G.A., and Kung H.K. (1966). On the Nucleotide Sequence of Yeast Tyrosine Transfer RNA. pp. 409-416. Volume XXXI, Cold Spring Harbor Symposia on Quantitative Biology.
Maidak B.L., Larsen N., McCaughey M.J., Overbeek R., Olsen G.J., Fogel K., Blandy J., and Woese C.R. (1994). The Ribosomal Database Project. Nucleic Acids Res. 22:3485-3487 [Database issue].
Mitchell P., Osswald M., Schueler D., and Brimacombe R. (1990). Selective isolation and detailed analysis of intra-RNA cross-links induced in the large ribosomal subunit of E. coli: a model for the tertiary structure of the tRNA binding domain in 23S RNA. Nucleic Acids Res. 18:4325-4333.
Moazed D. and Noller H.F. (1987). Chloramphenicol, erythromycin, carbomycin and vernamycin B protect overlapping sites in the peptidyl transferase region of 23S ribosomal RNA. Biochimie 69:879-884.
Moazed D. and Noller H.F. (1989). Interaction of tRNA with 23S rRNA in the Ribosomal A, P, and E Sites. Cell 57:585-597.
Nierhaus K.H., Franceschi F., Subramanian A.R., Erdmann V.A., and Wittmann-Liebold B. (1993). The Translational Apparatus. Nierhaus et al., eds. Plenum Publishing Corporation.
Noller H.F., Kop J., Wheaton V., Brosius J., Gutell R.R., Kopylov A.M., Dohme F., Herr W., Stahl D.A., Gupta R., and Woese C.R. (1981). Secondary structure model for 23S ribosomal RNA. Nucleic Acids Res. 9:6167-6189.
Noller H.F., Moazed D., Stern S., Powers T., Allen P.N., Robertson J.M., Weiser B., and Triman K. (1990). Structure of rRNA and Its Functional Interaction in Translation. pp. 73-92. In The Ribosome: Structure, Function & Evolution. Hill et al., eds. American Society for Microbiology, Washington, DC.
Olsen G.J. (1983). Comparative Analysis of Nucleotide Sequence Data. Ph.D. thesis. University of Colorado.
Orita M., Nishikawa F., Shimayama T., Taira K., Endo Y., and Nishikawa S. (1993). High-resolution NMR study of a synthetic oligoribonucleotide with a tetranucleotide GAGA loop that is a substrate for the cytotoxic protein, ricin. Nucleic Acids Res. 21:5670-5678.
Powers T. and Noller H.F. (1991). A functional pseudoknot in 16S ribosomal RNA. EMBO J. 10:2203-2214.
Powers T. and Noller H.F. (1995). Hydroxyl radical footprinting of ribosomal proteins on 16S rRNA. RNA 1:194-209.
Prince J.B., Taylor B.H., Thurlow D.L., Ofengand J., and Zimmermann R.A. (1982). Covalent crosslinking of tRNA-val to 16S RNA at the ribosomal P site: identification of crosslinked residues. Proc. Natl. Acad. Sci. USA 79:5450-5454.
Puglisi J.D., Wyatt J.R., and Tinoco I. Jr. (1990). Conformation of an RNA Pseudoknot. J. Mol. Biol. 214:437-453.
RajBhandary U.L., Stuart A., Faulkner R.D., Chang S.H., and Khorana H.G. (1966). Nucleotide Sequence Studies on Yeast Phenylalanine sRNA. pp. 425-434. Volume XXXI, Cold Spring Harbor Symposia on Quantitative Biology.
Rinke-Appel J., Junke N., Brimacombe R., Lavrik I., Dokudovskaya S., Dontsova O., and Bogdanov A. (1994). Contacts between 16S ribosomal RNA and mRNA, within the spacer region separating the AUG initiator codon and the Shine-Dalgarno sequence; a site-directed cross-linking study. Nucleic Acids Res. 22:3018-3025.
Rosendahl G., Hansen L.B., and Douthwaite S. (1995). Pseudoknot in domain II of 23S rRNA is essential for ribosomal function. J. Mol. Biol. 249:59-60.
SantaLucia J., Kierzek R., and Turner D.H. (1992). Context dependence of Hydrogen bond free energy revealed by substitutions in an RNA hairpin. Science 256:217-219.
Sirum-Connolly K. and Mason T.L. (1993). Functional requirement of a site-specific ribose methylation in ribosomal RNA. Science 262:1886-1889.
Stiege W., Glotz C., and Brimacombe R. (1983). Localisation of a series of intra-RNA cross-links in the secondary and tertiary structure of 23S RNA, induced by ultraviolet irradiation of Escherichia coli 50S ribosomal subunits. Nucleic Acids Res. 11:1687-1706.
Stiegler P., Carbon P., Ebel J.P., and Ehresmann C. (1981). A General Secondary-Structure Model for Procaryotic and Eucaryotic RNAs of the Small Ribosomal Subunits. Eur. J. Biochem. 120:487-495.
Szewczak A.A., Moore P.B., Chan Y.-L., and Wool I.G. (1993). The conformation of the sarcin/ricin loop from 28S ribosomal RNA. Proc. Natl. Acad. Sci. USA 90:9581-9585.
Trust T.J., Logan S.M., Gustafson C.E., Romaniuk P.J., Kim N.W., Chan V.L., Ragan M.A., Guerry P., and Gutell R.R. (1994). Phylogenetic and Molecular Characterization of a 23S rRNA gene positions the genus Campylobacter in the epsilon subdivision of the Proteobacteria and shows that the presence of transcribed spacers is common in Campylobacter spp. J. Bacteriology 176:4597-4609.
ten Dam E., Pleij K., and Draper D. (1992). Structural and Functional Aspects of RNA Pseudoknots. Biochemistry 31:11665-11676.
Tuerk C., Gauss P., Thermes C., Groebe D.R., Gayle M., Guild N., Stormo G., D'Aubenton-Carafa Y., Uhlenbeck O.C., Tinoco I., Brody E.N., and Gold L. (1988). CUUCGG hairpins: Extraordinarily stable RNA secondary structures associated with various biochemical processes. Proc. Natl. Acad. Sci. USA 85:1364-1368.
Turner D.H., Sugimoto N., and Freier S.M. (1988). RNA Structure Prediction. Annu. Rev. Biophys. Biophys. Chem. 17:167-192.
Varani G., Cheong C., and Tinoco I. Jr. (1991). Structure of an Unusually Stable RNA Hairpin. Biochemistry 30:3280-3289.
Vila A., Viril-Farley J., and Tapprich W.E. (1991). Pseudoknot in the central domain of small subunit ribosomal RNA is essential for translation. Proc. Natl. Acad. Sci. USA 91:11148-11152.
Winker S. and Woese C.R. (1991). A Definition of the Domains Archaea, Bacteria and Eucarya in Terms of Small Subunit Ribosomal RNA Characteristics. System. Appl. Microbiol. 14:305-310.
Woese C.R., Magrum L.J., Gupta R., Siegel R.B., Stahl D.A., Kop J., Crawford N., Brosius J., Gutell R., Hogan J.J., and Noller H.F. (1980). Secondary structure model for bacterial 16S ribosomal RNA: phylogenetic, enzymatic and chemical evidence. Nucleic Acids Res. 8:2275-2293.
Woese C.R. (1980). Just So Stories and Rube Goldberg Machines: Speculations on the Origin of the Protein Synthetic Machinery. In Ribosomes: Structure, Function, and Genetics. Chambliss G., Craven G.R., Davies J., Kahan L., and Nomura M., eds. University Park Press, Baltimore.
Woese C.R., Gutell R., Gupta R., and Noller H.F. (1983). Detailed analysis of the higher-order structure of 16S-like ribosomal ribonucleic acids. Microbiol. Rev. 47:621-669.
Woese C.R. and Gutell R.R. (1989). Evidence for several higher order structural elements in ribosomal RNA. Proc. Natl. Acad. Sci. USA 86:3119-3122.
Woese C.R., Winker S., and Gutell R.R. (1990). Architecture of Ribosomal RNA: Constraints on the sequence of Tetra-loops. Proc. Natl. Acad. Sci. USA 87:8467-8471.
Woese C.R. (1987). Bacterial Evolution. Microbiological Reviews 51:221-271.
Woese C.R. and Pace N.R. (1993). Probing RNA Structure, Function, and History by Comparative Analysis. pp. 91-117. In The RNA World. Cold Spring Harbor Laboratory Press.
Yonath A., Bennett W., Weinstein S., and Wittmann H.G. (1990). Crystallography and Image Reconstructions of Ribosomes. pp. 134-147. In The Ribosome: Structure, Function & Evolution. Hill et al., eds. American Society for Microbiology.
Zachau H.G., Dutting D., Feldmann H., Melchers F., and Karau W. (1966). Serine Specific Transfer Ribonucleic Acids. XIV. Comparison of Nucleotide Sequences and Secondary Structure Models. pp. 417-424. Volume XXXI, Cold Spring Harbor Symposia on Quantitative Biology.
Zwieb C., Glotz C., and Brimacombe R. (1981). Secondary structure comparisons between small subunit ribosomal RNA molecules from six different species. Nucleic Acids Res. 9:3621-3640.

Ribosomal RNA
Structure, Evolution, Processing, and Function in Protein Biosynthesis

Edited by
Robert A. Zimmermann
Albert E. Dahlberg

CRC Press
Boca Raton New York London Tokyo

Library of Congress Cataloging-in-Publication Data

Ribosomal RNA : structure, evolution, processing, and function in protein biosynthesis / Robert A. Zimmermann, Albert E. Dahlberg, editors.
p. cm.
ISBN 0-8493-8864-3
1. RNA. 2. Ribosomes. I. Zimmermann, Robert A. II. Dahlberg, Albert E.
QP623.R46 1995
574.873283-dc20
95-17157 CIP
Developed by Telford Press

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior permission in writing from the publisher.

All rights reserved. Authorization to photocopy items for internal or personal use, or the personal or internal use of specific clients, may be granted by CRC Press, Inc., provided that $.50 per page photocopied is paid directly to Copyright Clearance Center, 27 Congress Street, Salem, MA 01970 USA. The fee code for users of the Transactional Reporting Service is ISBN 0-8493-8864-3/96/$0.00+$.50. The fee is subject to change without notice. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

CRC Press, Inc.'s consent does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press for such copying.

Direct all inquiries to CRC Press, Inc., 2000 Corporate Blvd., N.W., Boca Raton, Florida 33431.

© 1996 by CRC Press, Inc.
No claim to original U.S. Government works
International Standard Book Number 0-8493-8864-3
Library of Congress Card Number 95-17157
Printed in the United States of America 1 2 3 4 5 6 7 8 9 0
Printed on acid-free paper