Gutell 074.jmb.2000.304.0335
Gutell 074.jmb.2000.304.0335 Document Transcript

A Story: Unpaired Adenosine Bases in Ribosomal RNAs

R. R. Gutell (1*), J. J. Cannone (1), Z. Shang (1), Y. Du (1) and M. J. Serra (2)

(1) Institute for Cellular and Molecular Biology, University of Texas, 2500 Speedway, Austin, TX 78712-1095, USA
(2) Department of Chemistry, Allegheny College, 520 N. Main St., Meadville, PA 16335, USA

In 1985 an analysis of the Escherichia coli 16 S rRNA covariation-based structure model revealed a strong bias for unpaired adenosines. The same analysis revealed that the majority of the G, C, and U bases were paired. These biases are (now) consistent with the high percentage of unpaired adenosine nucleotides in several structure motifs.

An analysis of a larger set of bacterial comparative 16 S and 23 S rRNA structure models has substantiated this initial finding and revealed new biases in the distribution of adenosine nucleotides in loop regions. The majority of the adenosine nucleotides are unpaired, while the majority of the G, C, and U bases are paired in the covariation-based structure model. The unpaired adenosine nucleotides predominate in the middle and at the 3′ end of loops, and are the second most frequent nucleotide type at the 5′ end of loops (G is the most common nucleotide). There are additional biases for unpaired adenosine nucleotides at the 3′ end of loops and adjacent to a G at the 5′ end of the helix. The most prevalent consecutive nucleotides are GG, GA, AG, and AA. A total of 70 % of the GG sequences are within helices, while more than 70 % of the AA sequences are unpaired. Nearly 50 % of the GA sequences are unpaired, and approximately one-third of the AG sequences are within helices while another third are at the 3′loop.5′helix junction.

Unpaired positions with an adenosine nucleotide in more than 50 % of the sequences at the 3′ end of 16 S and 23 S rRNA loops were identified and arranged into the A-motif categories XAZ, AAZ, XAG, AAG, and AAG:U, where G or Z is paired, G:U is a base-pair, and X is not an A and Z is not a G in more than 50 % of the sequences.
These sequence motifs were associated with several structural motifs, such as adenosine platforms, E and E-like loops, A:A and A:G pairings at the end of helices, G:A tandem base-pairs, GNRA tetraloop hairpins, and U-turns.

© 2000 Academic Press

Keywords: RNA structure; comparative sequence analysis; unpaired adenosines; structure motifs; computational biology/bioinformatics

*Corresponding author. E-mail address of the corresponding author: robin.gutell@mail.utexas.edu

doi:10.1006/jmbi.2000.4172, available online at http://www.idealibrary.com. J. Mol. Biol. (2000) 304, 335-354. 0022-2836/00/030335-20 $35.00/0

Introduction

RNA molecules can form similar secondary and tertiary structures for sequences that are not identical, and in many situations with less than 50 % sequence similarity. Comparative sequence analysis attempts to identify those structural elements that are in common between different sequences that are members of the same RNA family (e.g. tRNA). Comparative sequence analysis has been used successfully to predict secondary and tertiary interactions in several RNA molecules (reviewed by Woese & Pace, 1993; Gutell, 1996; Michel et al., 2000). The majority of these interactions are composed of G:C and A:U base-pairs (here, we define underlined nucleotides as base-paired), organized into regular secondary structure helices, and identified with covariation analysis due to the manner in which both paired positions coordinately change, or covary, their nucleotide composition (Woese et al., 1983; Gutell et al., 1985). Beyond the prediction of standard base-pairs in secondary structure helices, covariation analysis is also predicting non-standard base-pairs (e.g. A:G exchanges with G:A, and U:U exchanges with C:C) and base-pairs that form tertiary structure (Gutell, 1996; Gutell et al., unpublished results). We now believe that all of the standard secondary structure base-pairs in the Escherichia coli 16 S
and 23 S rRNAs have been identified with our covariation analysis. For those situations where we can compare and contrast a solved crystal structure with comparative data from an RNA sequence alignment, paired positions with a strong covariation are nearly always base-paired in the crystal structure (Gutell, 1999; Gutell et al., unpublished results). Therefore, covariation analysis, when used judiciously, can accurately predict base-pairs in an RNA structure.

We now wonder what type of contribution comparative analysis will make to the prediction and understanding of the three-dimensional structures of the rRNAs (Ban et al., 1999, 2000; Cate et al., 1999; Clemons et al., 1999; Tocilj et al., 1999; Schluenzen et al., 2000; Wimberly et al., 2000). We can begin to address this issue when we appreciate that comparative analysis, in its most general form, identifies patterns of variation in its search for a common structure. Base-pairs are predicted for those positions that vary at the same time in the evolution of that RNA, regardless of the type of base-pairing and/or the arrangement of this pairing in relationship with the flanking positions. Since the majority of the base-pairs are G:C, A:U, or G:U, and these pairs are arranged into standard secondary structure helices, we conclude that covariation analysis can identify the basic building blocks of RNA structure without any structural or other preconceived biases.

Given this success, we now question whether other RNA building blocks or motifs can be deciphered from our comparative RNA sequence and structure data sets. Our traditional comparative secondary structure model only shows those secondary and tertiary structure base-pairs with positional covariation within the underlying sequences plus invariant Watson-Crick base-pairs which are directly adjacent to base-pairs with positional covariation.
All of the unpaired positions in these diagrams imply the lack of pairings with covariation, not that these positions are not paired or interacting with other regions of the RNA. Can we relate specific patterns of variation that occur within a defined structural context to a three-dimensional structure motif? Can we now predict structure for the positions that do not covary with other positions? Alternatively, we question what types of structure occur at the unpaired positions in the covariation structure model and ask if we can develop principles that relate sequence variation with these structural elements.

While some structural elements, such as base-pairs and helices, form similar structures with sequences whose positions covary, other structural elements with similar shapes form sets of aligned sequences that do not have positional covariation with one another (Gautheret et al., 1995a). Comparative analysis of nucleotide distributions in different structural elements has resulted in the identification of several sequence and structure motifs in these unpaired regions. This list includes tetraloops (Woese et al., 1990), tandem G:A base-pairs (Gautheret et al., 1994), dominant G:U base-pairs (Gautheret et al., 1995b), E-loops (Gutell et al., unpublished results; Gautheret et al., 1994; Wimberly, 1994; Leontis & Westhof, 1998), U-turns (Gutell et al., 2000), and A:A and A:G base-pairs at the ends of helices (hereafter called AA.AG@helix.ends). These sequence-based analyses are given more meaning, biologically and structurally, from their comparison with experimental studies, especially the NMR and crystallographic analysis of several rRNA fragments (Szewczak et al., 1993; Kalurachchi et al., 1997; Conn et al., 1999; Wimberly et al., 1999; Agalarov et al., 2000; Nikulin et al., 2000).
Our goals for the future are to identify more biased distributions of nucleotides and sequences in different structural arrangements, to ascribe biological and structural significance to them, and to deduce sets of sequence-structure relationship rules, from which we aspire to accurately predict detailed RNA structure from a single sequence.

In 1985, a simple count of the paired and unpaired nucleotides in E. coli 16 S rRNA revealed a strong bias for unpaired adenosine nucleotides (Gutell et al., 1985). A total of 62 % of the adenosine nucleotides were unpaired, while approximately 30 % of the G, C, and U bases were unpaired. The structural significance of this bias was not known at the time. However, these biases are (now) consistent with the high percentage of unpaired adenosine bases in the GNRA tetraloops (Woese et al., 1990), E-loops (Gautheret et al., 1994; Wimberly, 1994; Leontis & Westhof, 1998), adenosine platforms (Cate et al., 1996b) and AA side-step (Conn et al., 1999) RNA sequence and structure motifs identified after this initial adenosine bias was found.

Here, we follow up with a larger and more detailed analysis of paired and unpaired nucleotides in our collection of rRNA and group I intron comparative structure models, track the frequently occurring unpaired nucleotides, and associate these with different structural motifs.

Results

The base compositions for 175 bacterial 16 S and 71 bacterial 23 S rRNA comparative structure models have been analyzed and are presented here. For our online presentation (see Materials and Methods for detailed explanations), we have analyzed a larger set of comparative structures from 5 S, 16 S, and 23 S rRNAs (including bacteria, archaea, and eucarya nuclear, chloroplast, and mitochondria sequences) and group I introns. Our collection of structure diagrams represents all of the major phylogenetic groups within the bacterial domain (as well as for the other primary phylogenetic domains).
The comparative structure model is based on covariation analysis (Woese et al., 1983; Gutell et al., 1985, unpublished results). For the purposes of the current analysis, positions
with substantial covariation or containing invariant Watson-Crick base-pairs are base-paired and positions that do not covary with other positions are unpaired in our covariation structure model. The current 16 S and 23 S rRNA secondary structure models are available from http://www.rna.icmb.utexas.edu/CSI/2STR/ref2str.html

The frequencies for single nucleotide positions are presented in histogram format (Figure 1). The total frequencies for the four RNA nucleotides A, U, C, and G were categorized into helices (base-paired) and loops (unpaired), and then subdivided further into the 5′ end, center, and 3′ end positions for helices and loops. Overall, G (31.4 %) is the most prevalent nucleotide, followed by A (25.7 %), C (22.4 %), and U (20.5 %). G is also the most common nucleotide in helices (36.6 %), while A (14.5 %) occurs with the lowest frequency in paired positions. Guanosine occurs with an even higher frequency at the 5′ end of helices (46.2 %), where U is the least frequent (13.5 %). Meanwhile, C is the most abundant nucleotide at the 3′ end of helices (38.1 %), followed by G (30.4 %). Adenosine is the most prevalent nucleotide at unpaired positions, occurring at 42.6 %, while C is the least common at 12.5 %. Adenosine is even more dominant at the 3′ end of loops, occurring in 53.5 % of the sequences. Meanwhile, G is the most common nucleotide at the 5′ end of loops (37.1 %); adenosine is second at 29.3 %. Another measure of the bias in unpaired adenosine bases is revealed in the ratio of unpaired to paired nucleotides for single nucleotides (see also the online query system). The unpaired/paired ratio for each nucleotide is: A, 1.96; U, 0.71; G, 0.43; and C, 0.29. Alternatively, 66.2 % of the adenosine bases are unpaired; the percentages of unpaired U, G, and C bases are 41.5 %, 30.1 %, and 22.3 %, respectively, for our collection of bacterial 16 S and 23 S rRNA structure models. These values are similar but not identical to the values determined for the 1985 version of the E.
coli 16 S rRNA covariation structure model (Gutell et al., 1985). The same trends and nucleotide biases also occur for our other RNA structure models (available online).

Figure 1. Frequency and distribution of single nucleotides in bacterial 16 S and 23 S rRNAs comparative structure models. The total number of occurrences for each of the four nucleotides at nine structural categories: total (all positions), paired, unpaired, 5′-helix.end (5′ end of a helix), 3′-helix.end (3′ end of a helix), 5′-loop.end (5′ end of a loop), 3′-loop.end (3′ end of a loop), helix.center (all positions within a helix that are not at the 5′ or 3′ ends of a helix), and loop.center (all positions within a loop that are not at the 5′ or 3′ ends of a loop).

Figure 2. Frequency and distribution of consecutive nucleotides in bacterial 16 S and 23 S rRNAs comparative structure models. The total number of occurrences for the 16 dinucleotides at three structural categories: total (all positions), in helix (paired), and in loop (unpaired).
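The unpaired/paired ratios and the percentages of unpaired bases reported in the Results are two views of the same counts: for a ratio r = unpaired/paired, the unpaired fraction is r / (1 + r). The following Python sketch applies that identity to the four published ratios. The function and variable names are illustrative assumptions and are not part of the original analysis.

```python
def fraction_unpaired(ratio):
    """Convert an unpaired/paired ratio r into the unpaired fraction r / (1 + r)."""
    return ratio / (1.0 + ratio)


# Unpaired/paired ratios as quoted in the text (rounded to two decimals there).
RATIOS = {"A": 1.96, "U": 0.71, "G": 0.43, "C": 0.29}

# Percentage of each nucleotide that is unpaired, rounded to one decimal.
percent_unpaired = {
    base: round(100.0 * fraction_unpaired(ratio), 1)
    for base, ratio in RATIOS.items()
}

for base, pct in percent_unpaired.items():
    print(f"{base}: {pct} % unpaired")
```

For A, U, and G this reproduces the quoted 66.2 %, 41.5 %, and 30.1 %. For C the two-decimal ratio gives about 22.5 % rather than the quoted 22.3 %, a difference that is plausibly due to rounding of the published ratio.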
Next, we investigated the frequency and distribution of consecutive nucleotides. The most common dinucleotides are the four purine combinations. Consecutive GG residues are the most prevalent at 9.86 %, followed by GA (7.92 %), AG (7.88 %), and AA (7.65 %) (Figure 2). The dinucleotides were classified into four categories: paired (helical), unpaired (loop), and the two paired/unpaired junctions, 3′loop.5′helix and 3′helix.5′loop. The most frequent consecutive dinucleotides are distinctly different between these four categories. In helices, GG (14.1 %), GC (10.4 %), CC (9.0 %), and GU (8.3 %) are the most prevalent consecutive dinucleotides; note that these consecutive dinucleotide arrangements are components of the most stable nearest-neighbors (Xia et al., 1998). In contrast, AA (19.2 %), GA (13.4 %), and UA (9.8 %) are the most common adjacent dinucleotides in loop motifs (Figure 2). Greater than 70 % of the consecutive adenosine residues are within unpaired regions, consistent with the observation that 5′-AA-3′/3′-UU-5′ is the least stable nearest-neighbor (Xia et al., 1998).

The adjacent dinucleotides with the highest unpaired to paired ratio are AA (5.68), UA (2.03), GA (1.47), and AU (1.20), while the three lowest ratios are GC (0.17), GG (0.15), and CC (0.11). These ratios again emphasize that adenosine bases tend to be unpaired, consecutive adenosine bases are even more likely to be unpaired, and that consecutive G and C bases tend to be paired.

The most abundant dinucleotides at loop-helix junctions were analyzed (Figure 3). CG (14.6 %), GA (10.3 %), and CA (10.2 %) are the most abundant at the 3′helix.5′loop junction; AG (25.0 %) and AC (13.3 %) are the two most abundant pairs at the 3′loop.5′helix junction.
These results are consistent with the abundance of A and G bases at the 5′ end of loops, A nucleotides at the 3′ end of loops, and G and C nucleotides at the 5′ and 3′ ends of helices. The strong preference for AG at loop-helix junctions might not be a simple consequence of stability since all 5′ dangling ends have nearly the same small stabilizing effect on helices (Freier et al., 1986). The most stable 3′ dangling end sequences, CA, CG, GA, and GG (Freier et al., 1986), occur frequently in our 16 S and 23 S rRNA structure data sets (Figure 3).

Next, we investigated the frequencies for three consecutive nucleotides at the loop.helix and helix.loop interfaces: NNN with two unpaired nucleotides followed by one paired nucleotide (3′loop.5′helix), and NNN with one paired nucleotide followed by two unpaired nucleotides (3′helix.5′loop). Figure 4(a) and (b) display the 32 most prevalent trinucleotide combinations for the 3′loop.5′helix (a) and 3′helix.5′loop (b) interfaces. The observed triplets at these junctions are very biased in their distributions. At the 3′loop.5′helix interface (Figure 4(a)), AAG occurs in 14.4 % of the junctions, followed by AAC (6.7 %) and GAG (5.4 %). All of the 11 most frequent sequences contain at least one unpaired A nucleotide; nine of these 11 trinucleotides have an A base at the extreme 3′ end of the loop. The trinucleotides at the 3′helix.5′loop interface (Figure 4(b)) are significantly different. The three most abundant trinucleotides are BGA, where B is paired and is not A: CGA (7.6 %), UGA (5.8 %), and GGA (5.4 %). The six most frequent sequences have at least one adenosine base in the two unpaired positions, with purines accounting for 11 of the 12 unpaired positions.
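The di- and trinucleotide tallies described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used for the published counts: it assumes each structure model is available as a dot-bracket string (parentheses for paired positions, dots for unpaired ones), and the function names and the toy sequence are hypothetical.

```python
from collections import Counter


def paired_mask(structure):
    """Return True for each position that is base-paired in a dot-bracket string."""
    return [symbol in "()" for symbol in structure]


def count_dinucleotides(sequence, structure):
    """Tally adjacent nucleotide pairs by structural context."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    paired = paired_mask(structure)
    counts = {
        "helix": Counter(),
        "loop": Counter(),
        "3'helix.5'loop": Counter(),
        "3'loop.5'helix": Counter(),
    }
    for i in range(len(sequence) - 1):
        dinucleotide = sequence[i:i + 2]
        if paired[i] and paired[i + 1]:
            context = "helix"            # both positions paired
        elif not paired[i] and not paired[i + 1]:
            context = "loop"             # both positions unpaired
        elif paired[i]:
            context = "3'helix.5'loop"   # paired followed by unpaired
        else:
            context = "3'loop.5'helix"   # unpaired followed by paired
        counts[context][dinucleotide] += 1
    return counts


def count_loop_helix_triplets(sequence, structure):
    """Tally triplets with two unpaired positions followed by one paired position."""
    paired = paired_mask(structure)
    triplets = Counter()
    for i in range(len(sequence) - 2):
        if not paired[i] and not paired[i + 1] and paired[i + 2]:
            triplets[sequence[i:i + 3]] += 1
    return triplets


# Hypothetical toy hairpin with a 3' tail, for illustration only.
toy_sequence = "GGCGAAAGCCAAG"
toy_structure = "(((....)))..."
dinucleotides = count_dinucleotides(toy_sequence, toy_structure)
triplets = count_loop_helix_triplets(toy_sequence, toy_structure)
```

One simplification to keep in mind: two adjacent paired positions are counted as "helix" even when they belong to different helices, so a production version would need the actual pairing partners rather than a paired/unpaired mask.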
In addition to these biased distributions of triplets at loop/helix junctions, Figure 4(a) and (b) also reveal that only 32 of the 64 possible triplets account for more than 80 % of these occurrences.

The most significant findings to this stage in our analysis are the high percentages of: (1) unpaired adenosine bases, with adenosine residues accounting for more than 50 % of the nucleotides at the 3′ loop ends; (2) paired guanosine bases, with guanosine accounting for nearly 50 % of the nucleotides at the 5′ end of helices; (3) unpaired consecutive adenosine bases; and (4) AG at 3′loop.5′helix junctions.

Figure 3. Frequency and distribution of dinucleotides at loop-helix junctions in bacterial 16 S and 23 S rRNAs comparative structure models. Total number of occurrences of consecutive nucleotides at the two loop-helix junctions, 3′helix.5′loop and 3′loop.5′helix.

Our next set of goals is to map these frequently occurring nucleotides onto the 16 S and 23 S rRNA comparative structure models, to determine those positions where the unpaired adenosine residue at the 3′ end of the loop occurs in more than 50 % of the bacterial sequences, and to identify larger motifs that build onto these dominant adenosine bases. We reason that 3′ loop positions with an adenosine in more than 50 % of the sequences (hereafter called the "A-motifs") are important for
the formation of conserved structural motifs. A total of 527 unpaired positions in the 16 S and 23 S rRNAs are followed by a base-pair predicted with covariation analysis. We expect, based upon the observed nucleotide frequencies in the bacterial 16 S and 23 S rRNA sequences (A, 25.7 %; C, 22.4 %; G, 31.4 %; U, 20.5 %), adenosine to occur at 25.7 % (135 occurrences) of these 3′ loop ends for any one set of 16 S and 23 S rRNA structures. We observe that, collectively, the positions at the 3′ loop ends contain 54.5 % adenosine bases. The two extreme cases for the distribution of these adenosine bases among the 527 3′ loop ends are (1) the adenosine nucleotides are distributed evenly, so that each of the loop ends contains 54.5 % adenosine; and (2) the adenosine nucleotides are concentrated such that 287 of the loop ends contain 100 % adenosine. In fact, 294 of the 527 3′ loop ends have an adenosine base in more than 50 % of the bacterial 16 S and 23 S rRNA sequences (Table 1); the average conservation value for adenosine at these positions is 93.7 %. Therefore, there is a very pronounced bias for adenosines to be highly conserved at the 3′ loop ends of the 16 S and 23 S rRNAs.

Of the 294 3′ loop ends with an adenosine base in more than 50 % of bacterial sequences, 136 are followed by a paired G in more than 50 % of those sequences (AG motif; Table 1). In contrast, we expect 43 of these motifs in the 16 S and 23 S rRNAs, based on the observed nucleotide frequencies (527 × 0.257 × 0.314). Finally, the number of AA and AAG motifs observed is again more than the number expected for a random distribution (Table 1). The distributions of the expected and observed A, AA, AG, AAG, and AAG:U motifs in hairpin, multi-stem, internal, and bulge loops were determined (Table 1). The number of observed A-motifs in each of the loop types is (again) significantly larger than expected.
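The expected counts quoted above follow from treating each position as an independent draw from the overall nucleotide frequencies, so the expected number of sites for a motif is 527 multiplied by the product of its nucleotide frequencies. The Python sketch below reproduces that arithmetic for the five motif classes. The names are illustrative, and treating the U of the G:U pair as one more independent factor is an assumption made here because it matches the predicted values in Table 1.

```python
# Overall nucleotide frequencies reported for the bacterial 16 S and 23 S rRNAs.
FREQ = {"A": 0.257, "C": 0.224, "G": 0.314, "U": 0.205}

# Number of unpaired 3' loop-end positions followed by a predicted base-pair.
N_LOOP_ENDS = 527

# Observed motif counts from the "Total / Measured" row of Table 1.
OBSERVED = {"A": 294, "AA": 113, "AG": 136, "AAG": 56, "AAG:U": 13}


def expected_count(motif, n_sites=N_LOOP_ENDS, freq=FREQ):
    """Expected number of sites if every motif position were drawn independently."""
    probability = 1.0
    for base in motif.replace(":", ""):  # "AAG:U" contributes A, A, G, and U
        probability *= freq[base]
    return n_sites * probability


expected = {motif: round(expected_count(motif)) for motif in OBSERVED}
enrichment = {motif: OBSERVED[motif] / expected_count(motif) for motif in OBSERVED}

for motif in OBSERVED:
    print(f"{motif}: observed {OBSERVED[motif]}, expected {expected[motif]}, "
          f"ratio {enrichment[motif]:.1f}")
```

The rounded expectations are 135, 35, 43, 11, and 2, matching the predicted row of Table 1. The observed/expected ratio is included only as a convenient summary and is not a statistic reported in the text.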
(Note for the following A-motifs (where each motif occurs in a minimum of 50 % of the sequences): AAG, the G is not paired to a U in more than 33 % of the sequences; AA, the nucleotide 3′ of the second A is not a G in more than 50 % of the sequences; AG, the nucleotide 5′ of the A is not an A in more than 50 % of the sequences; A, the paired nucleotide following the A is not a G in more than 50 % of the sequences and the nucleotide preceding the A is not an A in more than 50 % of the sequences.)

The A-motifs have been mapped onto the 16 S and 23 S rRNA secondary structure models (Figure 5). Each of the five motifs is assigned a different color: AAG:U motifs are indicated in red, AAG in green, AG in blue, AA in orange, and A in yellow. Position numbers for the A-motifs in the 16 S and 23 S rRNA are listed in Tables 2 (AAG:U), 3 (AAG), 4 (AG), 5 (AA), and 6 (A).

The loop-helix junctions listed in Table 2 have the AAG sequence present in more than 50 % of the bacterial sequences, and G:U in more than 33 % of the same sequence set. Thirteen 16 S and 23 S rRNA junctions satisfy these criteria. The majority of these occur in internal loops (10), and a few occur in bulge (2) and multi-stem (1) loops; three occur in 16 S rRNA, and ten appear in 23 S rRNA (see Table 2 and Figure 5). The majority of these are very well conserved, occurring with percentages significantly higher than the required minimum. Seven have greater than 90 % AAG and 90 % G:U base-pair conservation; the average conservation values are 81 % AAG and 77 % G:U.

The remaining 43 AAG loop-helix junctions are listed in Table 3. These junctions are distributed more evenly than the AAG:U A-motif in hairpin (9), multi-stem (19), and internal (14) loops, with one in a bulge loop; 15 occur in 16 S rRNA and 28 occur in 23 S rRNA (see Table 3 and Figure 5). More than 75 % of the hairpin junctions are part of a GNRA tetraloop. Over half (23) of these AAG junctions are conserved in more than 90 % of the sequences, with an average conservation value of 86 %. The consecutive AA nucleotides are conserved in approximately 93 % of the sequences.

AG loop-helix junctions are listed in Table 4. There are 80 examples of this motif, with a significant proportion occurring in internal (26), multi-stem (28), and hairpin (17) loops, and the remaining nine in bulge loops; 23 occur in 16 S rRNA and 57 occur in 23 S rRNA (see Table 4 and Figure 5). Almost 60 % of the AG motifs are conserved in more than 90 % of the sequences, and 81 % of these motifs are conserved in more than 70 % of the sequences. Six of the hairpin loops are GNRA tetraloops; seven other loops have unusually stable G:A mismatches between the first and last nucleotides of the hairpin loop (Serra et al., 1994).

Figure 4. Frequency and distribution of consecutive trinucleotides at loop-helix junctions in bacterial 16 S and 23 S rRNAs. The ranking of the top 32 most frequent trinucleotides at the two loop-helix junctions, 3′helix.5′loop and 3′loop.5′helix. Two of the three consecutive nucleotides are unpaired at both junctions. The paired nucleotides are underlined. (a) 3′loop.5′helix junction. (b) 3′helix.5′loop junction.
A total of 56 AA motifs (Table 5) occur predominantly in multi-stem (24), internal (16), and hairpin (12) loops; four occur in bulge loops (see Table 5 and Figure 5). Eighteen occur in 16 S rRNA and 39 occur in 23 S rRNA. Over 60 % of these motifs are conserved in more than 90 % of the sequences. Table 5 also contains the most prevalent AAN sequence at each motif site (where N is base-paired; sites having AAG > 50 % appear in Tables 2 or 3). Nearly 50 % of the AA motifs in Table 5 are AAC. Eight of the hairpin loops have unusually stable sequences, either GNRA tetraloops (4) or G:A first mismatches (4) (Serra et al., 1994).

Figure 5. A-motifs mapped onto the Escherichia coli 16 S and 23 S rRNA comparative secondary structure models. Unpaired adenosines at the 3′ end of loops that occur in more than 50 % of the bacterial sequences are highlighted in different colors: XAZ, yellow; AAZ, orange; XAG, blue; AAG, green; and AAG:U, red; where X is not A in more than 50 % of the sequences, Z is not G in more than 50 % of the sequences, and paired nucleotides are underlined. Diagrams were generated using the program XRNA (Weiser, B. & Noller, H., University of California at Santa Cruz). (a) 16 S rRNA. (b) 23 S rRNA, 5′ half. (c) 23 S rRNA, 3′ half.
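The five A-motif categories in the Figure 5 legend amount to a small decision procedure over conservation values at a loop-helix junction. The Python sketch below is one hedged reading of those rules, not the authors' implementation. All argument names are hypothetical; each is the fraction of aligned sequences showing the named feature, and the categories are checked from most to least specific so that AAG sites are not also reported as AG or AA.

```python
def classify_a_motif(p_a, p_aa, p_ag, p_aag, p_gu):
    """Assign a 3' loop-end position to an A-motif category, or None.

    p_a   -- fraction of sequences with A at the 3' loop end
    p_aa  -- fraction with A at the 3' loop end and A immediately 5' of it
    p_ag  -- fraction with A at the 3' loop end followed by a paired G
    p_aag -- fraction with AA followed by a paired G
    p_gu  -- fraction in which that paired G pairs with U
    """
    if p_a <= 0.50:
        return None          # not an A-motif site at all
    if p_aag > 0.50:
        # AAG sites are split by the 33 % threshold on the G:U pair.
        return "AAG:U" if p_gu > 0.33 else "AAG"
    if p_ag > 0.50:
        return "AG"          # XAG in the Figure 5 legend
    if p_aa > 0.50:
        return "AA"          # AAZ in the Figure 5 legend
    return "A"               # XAZ in the Figure 5 legend


# Hypothetical conservation profiles, chosen only to exercise each branch.
examples = {
    "site_1": classify_a_motif(0.99, 0.76, 0.90, 0.75, 0.59),
    "site_2": classify_a_motif(1.00, 1.00, 0.99, 0.99, 0.05),
    "site_3": classify_a_motif(1.00, 0.10, 1.00, 0.10, 0.00),
    "site_4": classify_a_motif(0.99, 0.99, 0.02, 0.02, 0.00),
    "site_5": classify_a_motif(0.95, 0.10, 0.10, 0.05, 0.00),
    "site_6": classify_a_motif(0.40, 0.10, 0.30, 0.05, 0.00),
}
```

A site where AG and AA each exceed 50 % but AAG does not is reported as AG by this ordering; how the published tables treat such a case is not stated, so that tie-break is an assumption.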
There are 102 A motifs, with a significant number of occurrences in multi-stem (38), internal (29), bulge (20), and hairpin (15) loops; 41 occur in 16 S and 61 occur in 23 S rRNA (see Table 6 and Figure 5). A total of 77 % of the A motifs are conserved in more than 90 % of the bacterial sequences, and 50 % are 100 % conserved in those sequences!

Discussion

Analysis of a large set of bacterial 16 S and 23 S rRNA covariation-based comparative structure models has revealed a propensity for adenosine bases to be unpaired. A disproportionate number of these unpaired adenosine nucleotides are consecutive, at the 3′ end of loops, and adjacent to a paired G at the 3′loop.5′helix junction. The highly conserved nature of the loop-helix junctions described here suggests that they are an important part of several different motifs. Because they occur so frequently, we believe that they are a major building block in the 16 S and 23 S rRNA structures. Our goal is to transform these sequence motifs into structural motifs that help coordinate three-dimensional structure. We have named the adenosine bases that occur at the 3′ end of loops in more than 50 % of the bacterial 16 S and 23 S rRNA sequences A-motifs. These are associated with several known structural motifs and are classified into five categories: AAG:U, AAG, AG, AA, and A.

Adenosine platforms

The first set of loop-helix junctions to consider is those with an AAG:U motif (Table 2 and Figure 5). Thirteen positions in the 16 S and 23 S rRNA contain the AAG sequence conserved in more than 50 % of the sequences (see Table 2) and the G:U base-pair conserved in more than 33 % of the sequences (16 S positions 415, 432, and 1289; 23 S positions 14, 706, 1214, 1470, 1854, 1877, 1890, 2135, 2542, 2851). Seven of these sites (23 S positions 14, 706, 1214, 1877, 1890, 2542, and 2851) are conserved in more than 90 % of the sequences.

This complex sequence motif forms the adenosine platform present in the crystal structure of the Tetrahymena thermophila group I intron P4-P6 domain (Cate et al., 1996a,b).
Table 1. Characterization of nucleotides at loop-helix junctions for loops with unpaired 5′ nucleotides in 16 S and 23 S rRNA

Loop type                Total   A            AA           AG           AAG         AAG:U
Total        Measured    527     294 (56 %)   113 (21 %)   136 (26 %)   56 (11 %)   13 (2 %)
             Predicted   -       135 (26 %)   35 (7 %)     43 (8 %)     11 (2 %)    2 (1 %)
Hairpin      Measured    91      53 (58 %)    21 (23 %)    26 (29 %)    9 (10 %)    0 (-)
             Predicted   -       24 (25 %)    6 (6 %)      8 (8 %)      2 (2 %)     0 (-)
Multi-stem   Measured    202     110 (54 %)   45 (22 %)    48 (24 %)    20 (10 %)   1 (1 %)
             Predicted   -       51 (26 %)    13 (7 %)     16 (8 %)     4 (2 %)     1 (1 %)
Internal     Measured    163     95 (58 %)    40 (25 %)    50 (31 %)    24 (15 %)   10 (6 %)
             Predicted   -       42 (26 %)    11 (7 %)     13 (8 %)     3 (2 %)     1 (1 %)
Bulge        Measured    71      36 (51 %)    7 (10 %)     12 (17 %)    3 (4 %)     2 (3 %)
             Predicted   -       18 (25 %)    5 (7 %)      6 (8 %)      1 (1 %)     0 (-)

Junctions were counted if an A-motif occurred in greater than 50 % (33 % for AAG:U) of the sequences in the bacterial 16 S and 23 S rRNA alignments (http://www.rna.icmb.utexas.edu/). Predicted values were calculated with nucleotide frequencies: A (25.7 %), G (31.4 %), and U (20.5 %); values are rounded to the nearest whole number. Percentages are calculated with respect to the total number of positions for that loop type; values are rounded to the nearest whole number, with "-" used to represent zero.

Table 2. A-motif: AAG:U sites in 16 S and 23 S rRNA

Position (a)   AA (%) (b)   AAG (%) (b)   G:U (%) (b)   Predicted structure motifs (c)
A. Multi-stem loops
  23 S rRNA
  14           99           99            98            P
B. Internal loops
  16 S rRNA
  415          76           75            59            EL, P
  432          100          55            45            GA, P
  1289         100          55            55            A, P
  23 S rRNA
  706          100          94            94            A, P
  1214         97           97            97            A, P
  1470         86           81            76            GA, P
  1854         100          54            39            GA, P
  1877         98           98            98            P
  1890         100          100           100           P
  2135 (d)     86           48            46            P
C. Bulge loops
  23 S rRNA
  2542         100          100           99            P
  2851         93           91            91            P

rRNA positions have an AAG:U motif in more than 33 % of the bacterial sequences and are indicated in red on Figure 5.
(a) The position number is the nucleotide at the 3′ loop end, at the loop-helix junction.
(b) More detailed information is available at http://www.rna.icmb.utexas.edu/ANALYSIS/A-STORY/.
(c) A, AA.AG@helix.ends; EL, E-like loop; GA, tandem G:A base-pairs; P, adenosine platform (see Discussion).
(d) Although this site contains less than 50 % AAG, it was included because it contains more than 33 % G:U and narrowly missed the required minimum for AAG.

To ascertain if the adenosine platform-like sequence motifs in the 16 S and 23 S rRNA are capable of forming the
adenosine platform structural motif, we have analyzed the group I intron adenosine platforms from a comparative sequence perspective.

Table 3. A-motif: AAG sites in 16 S and 23 S rRNA

Position (a)   AA (%) (b)   AAG (%) (b)   Predicted structure motifs (c)   Loop (d)
A. Hairpin loops
  16 S rRNA
  383          98           70            A                                GNRA
  901          100          97            A, U                             GNRA
  23 S rRNA
  311          91           84            U                                6
  633          100          77            U                                GNRA
  1226         62           52            A, U                             GNRA
  1810         95           88            A                                GNRA
  1872         70           65                                             GNRA
  1928         100          100           U                                3
  2361         62           55                                             6
B. Internal loops
  16 S rRNA
  1333         100          99            A
  1434         98           94
  1469         54           54
  1493         99           99            A
  1503         100          100
  23 S rRNA
  609          100          68            A
  1001         99           99            A, GA
  1156         98           85
  1354         100          99            A, GA, U
  1572         92           83            A, GA, U
  1580         88           86            GA
  1701         100          99            A, EL
  2469         96           96            A, GA
  2810         83           83            A
C. Multi-stem loops
  16 S rRNA
  60           98           98            A, GA
  197          99           93            A
  499          99           98
  574          99           98
  768          97           96            EL
  873          100          89
  915          100          85
  938          100          99
  23 S rRNA
  423          100          93
  472          94           94
  603          53           53            A, GA
  1010         100          53
  1029         100          65            A, GA
  1308         99           99
  1641         86           85            A
  2336         100          99
  2378         100          96            A, U
  2412         93           85
  2566         100          100           A
D. Bulge loops
  23 S rRNA
  1848         100          96

rRNA positions have an AAG motif in more than 50 % of the bacterial sequences and are indicated in green on Figure 5.
(a) The position number is the nucleotide at the 3′ loop end, at the loop-helix junction.
(b) More detailed information is available at http://www.rna.icmb.utexas.edu/ANALYSIS/A-STORY/.
(c) A, AA.AG@helix.ends; E, E loop; EL, E-like loop; GA, tandem G:A base-pairs; U, U-turn (see Discussion).
(d) Hairpin loop size (in nucleotides) and special characteristics: GNRA, tetraloops (Woese et al., 1990) occur in more than 70 % of the bacterial rRNA sequences; *, unusually stable G:A or U:U first mismatch.

Table 4. A-motif: AG sites in 16 S and 23 S rRNA

Position (a)   A (%) (b)   AG (%) (b)   Predicted structure motifs (c)   Loop (d)
A. Hairpin loops
  16 S rRNA
  300          100         100          A, U                             GNRA
  1080         100         90           A, U                             GNRA
  1269         100         72           A, U                             GNRA
  23 S rRNA
  167          100         99                                            9*
  251          100         98           A                                5*
  322          100         71                                            3
  466          100         79           A, U                             GNRA
  492          99          75                                            5*
  646          87          86                                            5*
  1073         100         100          U                                9
  1098         99          99           A, U                             6*
  1618         98          95           A                                6*
  1755         100         73                                            3
  2147         95          95                                            4*
  2534         54          53                                            6
  2598         100         100          A, U                             GNRA
  2662#        100         100          A, U                             GNRA
B. Multi-stem loops
  16 S rRNA
  8            98          98
  26‡          100         99           A
  288          100         92
  353          98          98           A
  523‡         100         99
  828          80          71
  860          96          88           A
  1046‡        100         99
  1067         100         100          A, U
  23 S rRNA
  177‡         59          58           A
  324‡         73          55
  332          100         88
  374          100         67           A, GA, E
  532‡         65          61
  627          99          98           A, GA
  655          98          98           A, GA
  699‡         99          95           A
  945          99          76           A
  975          99          99           A
  1189         100         99           A, GA, E
  1342         100         100          U
  1791         100         98           A
  1932         100         100          A, GA, EL
  2119         100         100
  2126         100         100          A, GA
  2587         100         83           A, U
  2629         63          57
C. Internal loops
  16 S rRNA
  246‡         100         100          A
  520          100         100          A
  665          70          67
  687          100         97           A
  802          100         99           A, EL
  1252         72          68
  1275         93          92
  1418         100         98           A, GA
  1456‡        82          73
  23 S rRNA
  84           100         98
  244          100         99           A, GA, E
  294‡         100         88           A
  861          100         96           A, GA, E
  878          86          73
  1111         100         100
  1237         100         82
  1268         100         65           A, GA, E
  1373‡        100         91           EL
  1434         78          58
  1439         90          56
  1477         92          88           A, GA, EL
  1866         99          90           A, GA
  2158         100         99
  2298‡        91          67
  2320         60          51
  2388‡        100         100
  2639         100         78           A, GA
D. Bulge loops
  16 S rRNA
  583‡         100         100
  777          100         96
  23 S rRNA
  213          100         100
  764‡         100         60
  941‡         100         99
  1205‡        76          67
  1490‡        97          96
  1586         90          79
  2602‡        100         100

rRNA positions have an AG motif in more than 50 % of the bacterial sequences and are indicated in blue on Figure 5.
(a) The position number is the nucleotide at the 3′ loop end, at the loop-helix junction; ‡, the nucleotide prior to this position is base-paired; #, Sarcin/Ricin loop.
(b) More detailed information is available at http://www.rna.icmb.utexas.edu/ANALYSIS/A-STORY/.
(c) A, AA.AG@helix.ends; E, E loop; EL, E-like loop; GA, tandem G:A base-pairs; U, U-turn (see Discussion).
(d) Hairpin loop size and special characteristics: GNRA, tetraloops (Woese et al., 1990) occur in more than 70 % of the bacterial rRNA sequences; *, unusually stable G:A or U:U first mismatch.

The crystal structure of the P4-P6 domain of the group I intron has three adenosine platforms at positions 172, 219, and 226 (numbers refer to the second A of the AAG motif for the T. thermophila sequence (GenBank Accession # J01235)). Each of the three adenosine platforms occurs in a distinct structural environment in the comparative secondary structure model (Michel & Dujon, 1983; Michel & Westhof, 1990) and the three-dimensional crystal structure (Cate et al., 1996b): a hairpin loop at position 172, a symmetric 3 × 3 internal loop at position 219 (where 3 × 3 refers to the number of nucleotides on each side of the internal loop), and an asymmetric 3 × 2 internal loop at position 226. They also differ with regard to the type of tertiary interactions with which they are associated. The adenosine platform at position 226 is part of the tetraloop receptor (Murphy & Cech, 1994; Cate et al., 1996b) that makes an intramolecular contact with a tetraloop at position 150, one of the interactions responsible for aligning the two
  • 12. coaxial stacked helices of the P4-P6 domain. The other two adenosine platforms form intermolecular crystal contacts, whose physiological significance is uncertain. We will focus on the two internal loops at positions 219 and 226, since ten of the 13 adenosine platform candidates in 16 S and 23 S rRNA occur in internal loops (two occur in bulge loops, and the last occurs in a multi-stem loop (Table 2 and Figure 5)). The adenosine platform at the hairpin loop at position 172 of the P4-P6 domain will not be considered here, in part because it is also involved in an intermolecular crystal interaction that is not physiological.
The P4-P6 domain, as represented by the T. thermophila crystal structure, is only present in the C1 and C2 subgroups of the group I introns (Michel & Westhof, 1990; Damberger & Gutell, 1994). To ensure that we are comparing similar structural elements, we only analyzed those C1 sequences that have the same number of nucleotides as T. thermophila at the positions involved in the two adenosine platforms. Only 110 of the 319 sequences in the group C1 intron alignment have a symmetric 3 × 3 internal loop at position 219 in our sequence alignments and data set. Table 7 reveals the high degree of conservation of the two adenosine residues 5′ of the loop-helix junction; 98 % of the sequences have an A residue at positions 218 and 219. Position G220 and its pairing partner U253 are each conserved in approximately 70 % of the sequences, while the G:U base-pair occurs in less than 60 % of the sequences. The second most common base-pair is C:G, followed by A:U and G:C.

Table 5. A-motif: AA sites in 16 S and 23 S rRNA
Position(a)  AA (%)(b)  Sequence(c)  Predicted structure motifs(d)  Loop(e)
A. Hairpin loops
16 S rRNA
162     99   AAC   A, U        GNRA
622     99   AAC   U           5
696    100   AAU   A, U        6*
1170    97   AAA               5*
1519    97   AAG   A           GNRA
23 S rRNA
127    100   AAC   A           GNRA
390     72   AAA               7
752     92   AAA   U           8*
1085   100   AAA   U           3
1367    66   AAG               GNRA
1635    55   AAU   A           5*
2311    84   AAU               7
B. Internal loops
16 S rRNA
374    100   AAU   A
449     52   AAG   E
676    100   AAU   A, GA
782    100   AAC   A, EL
909    100   AAC   A, E
1447    94   AAC
23 S rRNA
257     60   AAG   E
346     89   AAA
515    100   AAC   U
677     82   AAC
901     60   AAC
911    100   AAC
1143   100   AAA
1322    71   AAG
1655    99   AAC   A
2015    90   AAU
2741   100   AAC   A, GA, U
C. Multi-stem loops
16 S rRNA
120     99   AAC   U
510     99   AAC
959    100   AAU   A, GA
1005    51   AAU
23 S rRNA
182     56   AAC   A
218     61   AAA
223     94   AAU   A, GA, U
300     99   AAC   EL
429     98   AAA
483     58   AAC   A, U
735     99   AAC
793     61   AAA   A, GA
821    100   AAU   U
1275    99   AAA
1302    68   AAG
1610   100   AAC
1786   100   AAA
1978   100   AAC   A
2199   100   AAC   A, GA, U
2287    65   AAA   A, GA
2426    98   AAC   U
2433   100   AAA   U
2734    50   AAG
D. Bulge loops
16 S rRNA
51      87   AAC
72      58   AGC
642     51   AAC
23 S rRNA
1900    89   AAA
rRNA positions have an AA motif in more than 50 % of the bacterial sequences and are indicated in orange in Figure 5.
(a) The position number is the nucleotide at the 3′ loop end, at the loop-helix junction.
(b) More detailed information is available at http://www.rna.icmb.utexas.edu/ANALYSIS/A-STORY/.
(c) Most prevalent loop-helix sequence.
(d) A, AA.AG@helix.ends; E, E loop; EL, E-like loop; GA, tandem G:A base-pairs; U, U-turn (see Discussion).
(e) Hairpin loop size and special characteristics: GNRA, tetraloops (Woese et al., 1990) occur in more than 70 % of the bacterial rRNA sequences; *, unusually stable G:A or U:U first mismatch.

  • 13. Table 6. A-motif: A sites in 16 S and 23 S rRNA
Position(a)  A (%)(b)  Predicted structure motifs(c)  Loop(d)
A. Hairpin loops
16 S rRNA
845     61               5
1016    94   A, U        GNRA
1453    52               UNGG
23 S rRNA
199    100               4
548     59               4
574     76   U           8
616     75               5*
1176    62               4
1918    93               7
2478   100   A           7*
2705    99               4*
2757   100   A           11*
2799    56               3
2826   100               7
2860    96   A, U        GNRA
B. Multi-stem loops
16 S rRNA
16     100   A
315‡   100   A
338‡    99   A
366     65
495     99
546‡    51
864    100   U
983    100   A
994    100
1101   100
1157   100   A, GA
1191   100
1339   100
1349   100   A, GA, E
1398   100   A
23 S rRNA
52      99   A, GA
73     100
94      81
149‡    95   A
233    100   GA
270     92
340     99   A, GA, EL
412     98
432    100
460     99   A, GA, E
670    100
990    100
1103‡  100   A
1384   100
1603    99
1829   100
2042    84
2062   100
2171‡  100   U
2173‡  100   A, GA, U
2346   100   A, GA
2358    98   A
2835   100   A
C. Internal loops
16 S rRNA
151    100
174     94   A, GA
282    100   A
389‡   100   A
482     98   A, GA
487     99   A, GA, E
535    100
715    100   A, GA
1306   100   A, GA
1408    99   A
1483    99   A, GA
1499   100
23 S rRNA
63‡     56
91‡     89
103     99   A
207     99   A, GA, E
1050   100
1419    95   A, GA
1664‡  100
1689   100   A, GA, EL
1723    62
1745    53
1802   100   A, GA
1885‡   98
2005‡   85   A
2327‡  100   A
2614   100
2657#  100   A, GA, E
2690    68
D. Bulge loops
16 S rRNA
55‡    100
65      94
130‡   100
205     83
397‡   100
595‡    79   BT
1042‡   55
1055   100
1196‡   99
1227‡  100
1394‡  100
23 S rRNA
443‡   100
739‡    61   BT
896‡    99
927‡    89
1819   100
1981‡   99
2051‡   61
2873‡  100
2879‡   98
rRNA positions have an A motif in more than 50 % of the bacterial sequences and are indicated in yellow on Figure 5.
(a) The position number is the nucleotide at the 3′ loop end, at the loop-helix junction; ‡, the nucleotide prior to this position is base-paired; #, Sarcin/Ricin loop.
(b) More detailed information is available at http://www.rna.icmb.utexas.edu/ANALYSIS/A-STORY/.
(c) A, AA.AG@helix.ends; BT, base triple; E, E loop; EL, E-like loop; GA, tandem G:A base-pairs; U, U-turn (see Discussion).
(d) Hairpin loop size and special characteristics: GNRA, tetraloops (Woese et al., 1990) occur in more than 70 % of the bacterial rRNA sequences; *, unusually stable G:A or U:U first mismatch.

In
  • 14. addition, positions 219:254 do not form Watson-Crick base-pairs.
A total of 139 of the 319 IC1 sequences had a 3 × 2 internal loop at position 226 (Table 7). As in the previous example, adenosine bases are the most frequent nucleotide at the two positions 5′ of the loop-helix junction; however, the frequencies of these two adenosine bases are not as high. One-quarter of the sequences have a C base in place of the adenosine at position 226, which is consistent with previous sequence analysis and in vitro selection experiments (Costa & Michel, 1997). The G at position 227 and the G:U base-pair at positions 227:247 are present in 65 % and 62 % of the sequences, respectively. One of the most conserved features of the 226 adenosine platform is the U224:A248 reverse Hoogsteen base-pair, which occurs in 87 % of the sequences. While all four nucleotides are observed at the bulge at position 249, 88 % of the sequences have a pyrimidine base; in the P4-P6 crystal structure (Cate et al., 1996a), this position is involved in the tertiary interactions with the tetraloop at position 150 and can potentially form a hydrogen bond to A226 (Costa & Michel, 1997).
The adenosine platform at position 226 in the P4-P6 domain crystal structure widens the minor groove of the RNA helix to allow tertiary contact with the tetraloop at positions 150-153. The tetraloop receptor in the absence of bound tetraloop assumes an alternate structure, with the adenosines forming a cross-strand stack (Butcher et al., 1997). The adenosine bases, rather than forming the side-by-side arrangement observed in the crystal structure, adopt a stacked, zipper-like arrangement. In addition, the first adenosine nucleotides of the two platforms (218 and 225) become susceptible to methylation by dimethylsulfate when the tetraloop-receptor interaction is disrupted by mutation (Murphy & Cech, 1994). Thus, the adenosine platform motif appears to have both conformational and sequence plasticity.
The majority of the IC1 sequences with the same internal loop configuration as the Tetrahymena group I intron (see above) have an adenosine and purine juxtaposed and adjacent to the G:U base-pair (positions 219 and 254, and 226 and 248; see Table 7).

Table 7. Base composition of adenosine platforms in group IC1 introns
Columns: Percentage(a) (A, C, G, U for each of the two platforms); Pairing(b); Structure(c)
[Table body and structure diagram not recoverable from this transcript.]
(a) Percentages were determined as described in the text. Only percentages greater than 1 % are shown.
(b) Base-pairing occurring in more than 5 % of the sequences examined.
(c) Partial secondary structure of the Tetrahymena thermophila IC1 intron (GenBank # J01235). The complete structure is available at http://www.rna.icmb.utexas.edu/CSI/2STR/ref2str.html
(d) Indicates base present in the P4-P6 subdomain of Tetrahymena thermophila.

The most conserved features of the two group I intron adenosine platforms that occur at internal loops are the two consecutive adenosines at the 3′ end of the loop. The paired G at the 3′loop.5′helix junction and the G:U base-pair are also moderately conserved. Since the majority of the 16 S and 23 S rRNA adenosine platform candidates are more conserved at these four positions than the two known intron adenosine platforms, it is reasonable to expect this motif to occur at the majority (if not all) of the 16 S and 23 S rRNA AAG:U sequence motifs listed in Table 2. Also note that the majority (77 %) of the rRNA platform candidates occur in internal loops (Table 2 and Figure 5). Most of our 16 S and 23 S rRNA adenosine platform candidates also have an adenosine and purine juxtaposed and adjacent to the G:U base-pair facing the loop (Gautheret et al., 1995b), similar to the two intron adenosine platforms; the most notable exception is the junction at position 1890 in the 23 S rRNA, where a highly conserved (97 %) uridine at position 1852 is opposite the first A at position 1890. Two sets of rRNA adenosine platform candidates (16 S rRNA positions 415 and 432, and 23 S rRNA positions 1854 and 1890) occur at the two opposing ends of the same internal loop. The structural and functional significance of this tight clustering of adenosine platforms is currently unknown. We wonder if these two potential adenosine platform
  • 15. motifs form simultaneously, or perhaps alternate in formation during protein biosynthesis. Additionally, six of the putative adenosine platforms in Table 2 overlap with other A-motifs, e.g. 16 S rRNA position 1289 is part of the adenosine platform and the AA.AG@helix.ends motif (Elgavish et al., unpublished results), and 16 S rRNA position 415 is part of the adenosine platform and the E-like loop motif (see below). The A-motifs that are associated with adenosine platforms are noted in Table 2.

E and E-like loops
Comparative sequence analysis has identified potential E loop motifs (Varani et al., 1989; Wimberly et al., 1993) in both 16 S and 23 S rRNA (Gautheret et al., 1994; Wimberly, 1994; Leontis & Westhof, 1998). Thirteen dominant A sites in Tables 2-6 overlap with eleven E loops; each occurrence is indicated in these Tables. Two 16 S and eight 23 S rRNA loop E motifs were predicted earlier. The 16 S rRNA positions are 909 (Table 5) and 1349 (Table 6); the 23 S rRNA positions are 207 (Table 6), 244 (Table 4), 374 (Table 4), 460 (Table 6), 674, 1189 (Table 4), 1268 (Table 4), and 2657 (Table 6). Our analysis identified all of these except for position 674 in 23 S rRNA. This E loop motif overlapped with two positions (674 and 806) that are now base-paired in our covariation structure model (comparative support shown in base-pair frequency tables at the CRW Site; see Materials and Methods) but were unpaired at the time that the E loop was proposed (Leontis & Westhof, 1998). Therefore, we do not consider this putative E loop to be valid.
Our analysis of dominant A positions has also revealed two new E loop sequence motifs. The first is at positions 447-449 and 484-487 in 16 S rRNA, with both positions 449 and 487 containing a dominant A. This potential E loop motif is at the center of an elongated and irregular compound helix. This motif is flanked on one side by a helix and on the other by a lone pair (450:483, E. coli numbering). A tandem G:A base-pair is on the other side of this lone pair. The second new E loop sequence motif is in the 23 S rRNA at positions 858-861 and 916-918. The nucleotides in this motif were paired in the older versions of the 23 S rRNA secondary structure model, thus preventing its detection until now. The previous base-pairs were removed from the current structure model since the variations at the individual positions were not matched by a similar pattern of variation at the partner positions.
Our analysis of the dominant A bases at the 3′ end of loops has also revealed a sequence motif that is similar to but not identical with the E loop motif. The canonical E loop motif has an asymmetric 4 × 3 internal loop, as shown in Figure 6(a). For sequences 5′-NGUAP-3′ and 5′-QGAA-3′, P and Q (positions 5 and 6) are base-paired, with unusual pairing conformations between positions 1 and 9, 3 and 8, and 4 and 7 (Figure 6(a)). In contrast, our E-like loop motif, as we call it, contains the two sequences 5′-NGUAP-3′ and 5′-QGAAZ-3′ (Figure 6(b)). Here again, P and Q (positions 5 and 6) and N and Z (positions 1 and 10) form two canonical base-pairs, leaving the 5′-GUA-3′ in sequence 1 juxtaposed with the 5′-GAA-3′ in sequence 2. Presumably three additional pairings are formed: G:A (2 and 9), U:A (3 and 8), and A:G (4 and 7). The conformations for the second and third pairings, U:A and A:G, are related to the G:A type II tandems as described by Gautheret et al. (1994). Here, the invariant U:A base-pair is thought to adopt the reverse Hoogsteen conformation, adjacent to a sheared A:G base-pair, resulting in the two adenosine bases protruding into the minor groove and overwinding the helix. This arrangement of nucleotides is present in the bacterial version of the 5 S rRNA E loop, and is called the cross-strand A stack (Correll et al., 1997). Possibly the first sheared A:G base-pair (positions 2 and 9 in Figure 6(b)) underwinds the helix and returns it to register.
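The E-like loop sequence pattern described above (5′-NGUAP-3′ opposite 5′-QGAAZ-3′, with N:Z and P:Q canonical pairs) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function names are hypothetical, "canonical" is taken to mean Watson-Crick pairs only (whether G:U should also qualify is not settled by the text), and a sequence match says nothing about the actual three-dimensional conformation.

```python
# Assumption: "canonical" means Watson-Crick pairs only (G:U is excluded here).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def is_e_like_loop(strand1: str, strand2: str) -> bool:
    """Check two 5'->3' strands against 5'-NGUAP-3' / 5'-QGAAZ-3'.

    Positions 1-5 are strand1, positions 6-10 are strand2.
    """
    s1, s2 = strand1.upper(), strand2.upper()
    if len(s1) != 5 or len(s2) != 5:
        return False
    n, core1, p = s1[0], s1[1:4], s1[4]
    q, core2, z = s2[0], s2[1:4], s2[4]
    return (
        core1 == "GUA"
        and core2 == "GAA"
        and (p, q) in WATSON_CRICK  # positions 5 and 6
        and (n, z) in WATSON_CRICK  # positions 1 and 10
    )


def loop_juxtapositions(strand1: str, strand2: str):
    """Return the nucleotides juxtaposed at positions 2:9, 3:8 and 4:7."""
    s1, s2 = strand1.upper(), strand2.upper()
    return [(s1[1], s2[3]), (s1[2], s2[2]), (s1[3], s2[1])]


if __name__ == "__main__":
    print(is_e_like_loop("CGUAG", "CGAAG"))
    print(loop_juxtapositions("CGUAG", "CGAAG"))
```

For a matching pair of strands, `loop_juxtapositions` lists the G:A, U:A, and A:G juxtapositions in the order given in the text.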
Eight E-like loop motifs are present in the conserved core of the 16 S and 23 S rRNAs and contain eleven dominant A sites. Three of these motifs occur at positions 413-415/428-430, 765-767/812-814, and 780-782/800-802 in the 16 S rRNA; five more occur in the 23 S rRNA at positions 298-300/338-340, 1358-1360/1371-1373, 1475-1477/1514-1516, 1687-1689/1699-1701, and 1930-1932/1968-1970. Five of these E-like loops occur in internal loops; three are present in multi-stem loops. The A-motifs that are associated with E and E-like loops are noted in Tables 2-6.

Figure 6. Schematic of E and E-like loops. Nucleotides are numbered for reference. Types of base-pairing are indicated by lines: canonical pairings (G:C, A:U) have thick, continuous lines, type II tandem G:A pairings have thin, broken lines, and other non-canonical pairings are shown with thick, broken lines. (a) Canonical E loop, where positions 1-4 and 7-9 comprise the 4 × 3 internal loop. (b) E-like loop. Positions 2-4 and 7-9 comprise the 3 × 3 internal loop.

AA.AG@helix.ends and tandem G:A base-pairs
Adenosine bases at the 3′ end of loops have also been associated with G:A base-pairs at the end of helices (Traub & Sussman, 1982; Woese et al., 1983). Here, the helix is extended by at least one G:A base-pair (for example, the sequences 5′-AGP-3′ and 5′-QCG-3′ interact to form A:G, G:C, and P:Q base-pairs). G:A juxtapositions have been
  • 16. shown to be energetically stable in one thermodynamic study of bulge loops (Longfellow et al., 1990). More recently, we have analyzed a large number of 16 S and 23 S rRNA comparative structure models and confirmed that many helices do close with a G:A juxtaposition (Elgavish et al., unpublished results). However, we also noted in our comparative study that many of these juxtapositions in E. coli are maintained in at least 90 % of the sequences and found, in addition to the G:A juxtapositions, that many helices are flanked by A:A or A:A/G:A juxtapositions. Our studies revealed a strong bias in the orientation for these G:A base-pairs: A is always 5′ to the helix, while G or A is 3′ to the helix. These observations are consistent with the bias for unpaired adenosine bases at the 3′ end of loops and for the high percentage of unpaired G and A at the 5′ end of loops. Note that some of these AA.AG@helix.ends are a component of E and E-like loops and that GNRA tetraloops (Woese et al., 1990) have the AA.AG@helix.ends motif. A total of 116 A-motifs are associated with AA.AG@helix.ends and are noted in Tables 2-6.
Several of these A:A and G:A juxtapositions at the 5′ end of helices are flanked on their 5′ side by a second A:A or G:A pair. Tandem G:A and A:A pairs in the 16 S and 23 S rRNA were identified earlier (SantaLucia et al., 1990; Gautheret et al., 1994), and can adopt a single structure conformation that is consistent with their pattern of nucleotide substitutions (Gautheret et al., 1994). We have searched again for these tandem G:A/A:A motifs in our newer 16 S and 23 S rRNA comparative structure models and our larger collection of comparative rRNA structure models. In addition to the tandems identified earlier (Gautheret et al., 1994), we have found 23 new tandems that are conserved in at least 90 % of the bacterial 16 S and 23 S rRNA sequences. Fifty A-motifs are associated with G:A tandems, and they are noted in Tables 2-6.

U-turns
The U-turn, a structure motif characterized by a sharp turn in the RNA, was first identified in the tRNA crystal structure (Quigley & Rich, 1976), and subsequently has been found in several other RNAs (Pley et al., 1994; Jucker & Pardi, 1995; Huang et al., 1996; Fountain et al., 1996; Conn et al., 1999; Culver et al., 1999; Stallings & Moore, 1997; Puglisi & Puglisi, 1998).
Dominant A nucleotides at the 3′ end of 16 S and 23 S rRNA loops are also found in some of the tetra- and hexanucleotide hairpin loops that form U-turns (Woese et al., 1990; Jucker & Pardi, 1995; Huang et al., 1996; Fountain et al., 1996). In both of these loop motifs, a base-pair forms between the guanosine at the first position of the hairpin loop (and 3′ to the helix), and the adenosine at the last position of the loop (and 5′ to the helix). Recently, we have predicted, based on the analysis of many comparative structure models, 57 positions in the 16 S and 23 S rRNA where the U-turn motif might occur (Gutell et al., 2000). The 39 U-turn candidates that are coincident with A-motifs are noted in Tables 2-6. Of these, 22 occur in hairpin loops; 13 (59 %) of these are GNRA tetraloops. The remaining 17 occur in internal loops and multi-stem loops.

Concluding comments
Of the 527 positions at the 3′ end of loops in the 16 S and 23 S rRNA, nearly 300 are occupied with a dominant A, an adenosine that occurs in more than 50 % of the bacterial sequences. Larger sequence motifs that occur frequently are built onto these A-motifs. There are 102 A, 56 AA, 80 AG, 43 AAG, and 13 AAG:U A-motifs. A total of 51 % of these sites are part of a known structural motif (Table 8(a)). In all, 39 % of the A-motifs are associated with the AA.AG@helix.ends motif; 14 % of these are within GNRA tetraloops. Tandem G:A pairs and U-turns are also common, occurring at 17 % and 14 % of the A-motif sites, respectively. There are smaller percentages of adenosine platforms (4 %) and E loop (4 %) and E-like loop (4 %) sequence motifs (Table 8(a)).
Some of these structural motifs are part of a larger structural element. For example, some of the AA.AG@helix.ends motifs are within the boundaries of E and E-like loops, the tandem G:A motif, and GNRA tetraloops. Some of these GNRA hairpin loops are themselves involved in larger tertiary folds (Jaeger et al., 1994; Costa & Michel, 1995; Cate et al., 1996b). Other A-motifs are associated with more than one structural motif in which one motif is not entirely contained within the other. Here, the structural motifs involve positions that are not utilized by the other, except for the dominant A at the 3′ end of the loop. For example, position 415 in 16 S rRNA is part of the E-like loop and adenosine platform motifs. Two examples where a single dominant A is part of both an adenosine platform and a G:A tandem are at 16 S rRNA position 432 and position 1854 in 23 S rRNA. Although our understanding of RNA structural motifs is not complete, these overlapping and possibly competing structural A-motifs suggest that these junctions of the RNA might be undergoing conformational changes. In total, only one structural motif occurs at 51 % of the A-motifs that are associated with a known structural motif (Table 8). A total of 36 % are associated with two structural motifs, and 13 % are associated with three structural motifs.
In contrast, we are unable to predict the structure conformation for 49 % of the A-motifs. Therefore, there is the possibility that new structural motifs occur at these positions. Alternatively, structural motifs that we are already familiar with occur at these A-motifs with a composition and arrangement of nucleotides that were not previously associated with that motif (for example, adenosine
  • 17. platforms occur at positions with sequences other than AAG:U). To help resolve this issue, the conformations of these adenosine bases in the 30 S and 50 S ribosomal subunit crystal structures (Ban et al., 2000; Schluenzen et al., 2000; Wimberly et al., 2000) need to be analyzed. Some 8 % of the A-motifs are single bulge adenosine nucleotides; while the structural significance of most of them is unknown, covariation analysis and NMR have revealed a base-triple in 16 S rRNA between a bulged A at position 595 and the base-pair at 596:644 (CRW Site; Kalurachchi et al., 1997).
Although the thermodynamic consequences of the unpaired adenosine bases identified here in the covariation-based structure models are not known, an earlier thermodynamic study of internal loops revealed that unpaired adenosine bases in asymmetrical loops are more destabilizing than those in symmetrical loops (Peritz et al., 1991). The three sets of results, (1) this thermodynamic study; (2) the preponderance of adenosine bases in unpaired regions of the covariation-based structure model, with the majority of these occurring in asymmetrical loops; and (3) the structural studies that reveal that the majority of these unpaired adenosine nucleotides are base-paired, albeit in an irregular manner (Cate et al., 1996a,b; Ban et al., 2000; Schluenzen et al., 2000; Wimberly et al., 2000), may all be coordinated and influence RNA folding. We speculate that these destabilizing, asymmetrically placed adenosine nucleotides are a significant component in the transition from secondary to tertiary RNA structure. The destabilizing effects of these adenosines on secondary structure, coupled with the need for an RNA molecule to adopt its minimal energetic state, suggest that these abundant adenosine nucleotides will actively seek out energetically stabilizing tertiary interactions and, in the process, form a three-dimensional RNA molecule.
The propensity for conserved and unpaired adenosine bases in the 16 S and 23 S rRNA covariation structure models must be related to the structure and function of the ribosome. As stated earlier, an unpaired position in the covariation structure model is not necessarily unpaired in the molecule; the model says only that the position does not pair in the regular manner that most covariation-based base-pairs do. And given that other unpaired positions are paired, albeit irregularly, in other RNA molecules whose structures have been solved by crystallography or NMR (e.g. adenosine platforms, E loops), we anticipate these unpaired positions in the 16 S and 23 S rRNA covariation structure models to be paired. We now wonder if these unusual pairings can be predicted with comparative analysis. Our A story is a beginning towards this end.
As noted, the A-motifs come in various forms, i.e. A, AA, AG, AAG, and AAG:U, and these are associated with several known structural motifs. These observations suggest that unpaired adenosine bases can form a variety of different structural conformations. What is special about adenosine that lends itself to participating in these structural motifs? And in some situations, it appears as though at least two different structural elements can occur at the same A-motif. Does one structural motif predominate at these positions, or do these sites provide the ribosome with an opportunity to alternate conformations during the ribosome cycle? Is the prevalence of adenosine bases at these positions related to the ability of adenosine to accommodate a variety of binding partners, perhaps its base stacking potential, or other interesting interactions?
The A story is not finished.

Table 8. Summary of dominant A nucleotides and related motifs (based upon Tables 1-6)
A. Occurrences of motifs at dominant A positions
     Category                                                        16 S rRNA    23 S rRNA    Total
1    # of adenosine platforms                                        3 (3 %)      10 (5 %)     13 (4 %)
2    # of E loops                                                    4 (4 %)      8 (4 %)      12 (4 %)
3    # of E-like loops                                               4 (4 %)      7 (4 %)      11 (4 %)
4    # of AA.AG@helix.ends                                           44 (44 %)    72 (37 %)    116 (39 %)
4a   # of AA.AG@helix.ends in GNRA tetraloops                        8 (8 %)      8 (4 %)      16 (5 %)
4b   # of other AA.AG@helix.ends                                     36 (36 %)    64 (33 %)    100 (34 %)
5    # of tandem GAs                                                 13 (13 %)    37 (19 %)    50 (17 %)
6    # of U-turns                                                    11 (11 %)    29 (15 %)    40 (14 %)
7    # of single bulges                                              9 (9 %)      14 (7 %)     23 (8 %)
8    Total # of dominant A bases associated with motifs (1-6)(a)     51 (51 %)    98 (51 %)    149 (51 %)
9    # of dominant A bases not associated with motifs (1-6)          49 (49 %)    96 (49 %)    145 (49 %)
10   Total # of dominant A bases at 3′ ends of loops (8 + 9)         100          194          294
B. Number of motifs per dominant A nucleotide (not including single bulges)
Motifs                                                               16 S rRNA    23 S rRNA    Total
1                                                                    25 (49 %)    51 (52 %)    76 (51 %)
2                                                                    24 (47 %)    30 (31 %)    54 (36 %)
3                                                                    2 (4 %)      17 (17 %)    19 (13 %)
Total # of dominant A bases                                          51           98           149
Total # of associated motifs                                         79           162          241
Average # of associated motifs per dominant A position
with an associated motif                                             1.5          1.7          1.6
(a) A single dominant A may be associated with 1-3 motifs.
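The totals and averages in Table 8B follow directly from the counts of dominant A positions carrying one, two, or three motifs. As a worked check, the short Python sketch below recomputes them. The counts are copied from Table 8B, while the variable and function names are illustrative only and are not part of the original analysis.

```python
# Counts from Table 8B: dominant A positions associated with 1, 2, or 3 motifs.
MOTIFS_PER_SITE = {
    "16 S rRNA": {1: 25, 2: 24, 3: 2},
    "23 S rRNA": {1: 51, 2: 30, 3: 17},
}


def summarize(counts):
    """Return site total, motif total, average motifs per site, and % shares."""
    sites = sum(counts.values())
    motifs = sum(k * n for k, n in counts.items())
    shares = {k: round(100 * n / sites) for k, n in counts.items()}
    return {
        "sites": sites,
        "motifs": motifs,
        "average": round(motifs / sites, 1),
        "shares": shares,
    }


# Combine the two rRNAs to obtain the "Total" column.
TOTAL = {k: sum(c[k] for c in MOTIFS_PER_SITE.values()) for k in (1, 2, 3)}

if __name__ == "__main__":
    for name, counts in {**MOTIFS_PER_SITE, "Total": TOTAL}.items():
        print(name, summarize(counts))
```

For the combined data this gives 149 positions, 241 associated motifs, and an average of 1.6 motifs per position, with 51 %, 36 %, and 13 % of positions carrying one, two, and three motifs.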
  • 18. Materials and Methods
Additional supporting data are presented at the CRW Site (http://www.rna.icmb.utexas.edu) and the CRW A Story pages (http://www.rna.icmb.utexas.edu/ANALYSIS/A-STORY/). The CRW A Story information supplements the data presented in Figures 1-4 and Tables 1-8 and is divided into four categories: general data; position-specific data; structure diagrams; and manuscript materials. The general data (GE) section contains generalized counts for the number and frequency of different A-motifs in (1) the bacterial 16 S and 23 S rRNA comparative structure models (summarized in Figures 1-4); (2) the archaeal and eucaryal (nuclear, chloroplast, and mitochondrial) 16 S and 23 S rRNA comparative structure models; and (3) the comparative structure models for 5 S rRNA and group I introns. The position-specific data (PS) section presents frequency tables for all of the 16 S and 23 S rRNA positions which contain an A-motif (with data from the three phylogenetic domains, chloroplasts, and mitochondria); larger motifs (adenosine platforms, E and E-like loops, AA.AG@helix.ends, tandem G:A pairings, and U-turns) that map onto the A-motifs are identified. Frequency tables for E and E-like loops (including only bacterial data) are also provided here. The structure diagrams (SD) section contains Figure 5 and includes secondary structure diagrams for each of the motifs examined here. The manuscript materials (MS) section contains all of the Figures and Tables from this manuscript.
The RNA sequence alignments used for this analysis are maintained by us at the University of Texas (R.R.G., unpublished results; CRW Site). Sequences were manually aligned with the alignment editor AE2 (T. Macke, Scripps Research Institute, San Diego, CA). As of June 2000, the bacterial 16 S alignment contains 5859 sequences, and the bacterial 23 S alignment contains 327 sequences; both alignments use E. coli (GenBank Accession # J01695) as their reference sequence for position numbers.
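As a rough illustration of how per-site A-motif frequencies might be tallied from an alignment such as those described above, the following Python sketch counts A, AA, AG, and AAG occurrences around a single alignment column. It is not the query program. The function names, the use of the adjacent alignment columns as the 5′ and 3′ neighbours, and the precedence AAG > AA > AG > A are assumptions made for illustration; the pairing status of the 3′ neighbour is not checked, and the AAG:U category, which requires the pairing partner, is omitted. Only the 50 % threshold is taken from the text.

```python
from typing import Dict, List, Optional


def a_motif_frequencies(alignment: List[str], col: int) -> Dict[str, float]:
    """Percentage of sequences with A, AA, AG, AAG around 0-based column `col`.

    `col` is assumed to be the loop position at the 3' end of a loop;
    col - 1 is its 5' neighbour and col + 1 the first helix position.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    if col < 1 or any(len(seq) <= col + 1 for seq in alignment):
        raise ValueError("col needs a neighbour on both sides in every sequence")
    counts = {"A": 0, "AA": 0, "AG": 0, "AAG": 0}
    for seq in alignment:
        prev, here, nxt = (seq[i].upper() for i in (col - 1, col, col + 1))
        if here != "A":
            continue
        counts["A"] += 1
        if prev == "A":
            counts["AA"] += 1
        if nxt == "G":
            counts["AG"] += 1
        if prev == "A" and nxt == "G":
            counts["AAG"] += 1
    total = len(alignment)
    return {motif: 100.0 * n / total for motif, n in counts.items()}


def dominant_a_motif(freqs: Dict[str, float], threshold: float = 50.0) -> Optional[str]:
    """Return the most specific motif present in more than `threshold` % of sequences."""
    for motif in ("AAG", "AA", "AG", "A"):  # assumed precedence, most specific first
        if freqs[motif] > threshold:
            return motif
    return None


if __name__ == "__main__":
    toy = ["GAAGC", "CAAGC", "GAAGC", "GUAGC"]  # made-up sequences, not real rRNA
    freqs = a_motif_frequencies(toy, col=2)
    print(freqs, dominant_a_motif(freqs))
```

The toy alignment is invented purely to exercise the functions; real use would require the curated alignments and the loop annotations from the structure models.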
The group I intron (C1 subclass) alignment contains 319 sequences and uses T. thermophila (GenBank Accession # J01235) as its reference sequence for position numbers. Two subalignments of 110 and 139 sequences having the appropriate arrangement of nucleotides at the 219 and 226 adenosine platform internal loops (see the text) were created from this larger alignment. These sequence alignments will be available from the CRW Site in the future.
Secondary structure models for representatives of the main phylogenetic groupings are inferred by comparative sequence analysis (Gutell, 1996; Gutell et al., unpublished results). As of June 2000, a total of 399 16 S rRNA, 292 23 S rRNA, 73 5 S rRNA, and 174 group I intron secondary structure models are in our collection (CRW Site). At present, only a subset of these diagrams (those incorporating all of the newest pairings in our refined structure models and in which we have the most confidence) is publicly available; as diagrams are updated to meet these standards, they will be made available. For Figures 1-4, we counted the overall distributions of the four nucleotides for the entire RNA structure, and for paired, unpaired, and loop-helix junction positions, analyzing 278 bacterial structures (209 from 16 S rRNA and 69 from 23 S rRNA); a complete list of these models is available online. We also present online the detailed frequencies used to calculate the histograms in Figures 1-4. For these tables (CRW A Story (GE)), we have analyzed all of our 16 S, 23 S, and 5 S rRNA (bacterial, archaea, eucarya, chloroplast, and mitochondria) and group I intron comparative structure models. The numbers of structure models analyzed for the online tables are included in those tables. Other nucleotide distributions are listed dynamically on our online tables. The programs that generate this information will be presented elsewhere (Z.S.
& R.G., unpublished results).These online tables will be routinely updated as morecomparative structure models are determined.Positions at the 3Hends of loops in the E. coli 16 S and23 S rRNA secondary structure models were manuallyidenti®ed. Each site was classi®ed into one of four looptypes: hairpin, multi-stem, internal, or bulge. The pre-dicted A-motif frequencies in Table 1 were calculatedusing the nucleotide frequency values determined fromthe bacterial 16 S and 23 S structures (above).The program query (Gutell et al., unpublishedprogram) was used to collect nucleotide frequency datafrom (AE2) sequence alignments. Base frequenciesfor each site were computed independently from thebacterial alignments (16 S and 23 S rRNA). For bacterialdata, sites with a given A-motif in more than 50 % of thesequences (33 % for the AAG:U motif) are summarizedin Table 1 and detailed in Tables 2-6; the data fromTables 1-6 are summarized with respect to structuralmotifs in Table 8. Single nucleotide and base-pairfrequencies in Table 7 were calculated from the intronalignments using query.The secondary structure ®gures showing the A-motifsites (Figure 5), the group I intron secondary structurediagram portion in Table 7, and the additional secondarystructure diagrams available online were generated withthe program XRNA (Weiser & Noller, University ofCalifornia, Santa Cruz).AcknowledgmentsThis work was supported by the NIH (awarded toR.R.G., GM48207), NSF (awarded to M.S., MCB-9707940), Welch Foundation (awarded to R.R.G.), andfrom startup funds from the Institute for Cellular andMolecular Biology at the University of Texas at Austin(awarded to R.R.G.).ReferencesAgalarov, S. C., Prasad, G. S., Funke, P. M., Stout, C. D.& Williamson, J. R. (2000). Structure of theS15,S6,S18-rRNA complex: assembly of the 30 Sribosome central domain. Science, 288, 107-112.Ban, N., Nissen, P., Hansen, J., Capel, M., Moore, P. B.& Steitz, T. A. (1999). 
Placement of protein and RNA structures into a 5 Å-resolution map of the 50 S ribosomal subunit. Nature, 400, 841-847.
Ban, N., Nissen, P., Hansen, J., Moore, P. B. & Steitz, T. A. (2000). The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science, 289, 905-920.
Butcher, S. E., Dieckmann, T. & Feigon, J. (1997). Solution structure of a GAAA tetraloop receptor RNA. EMBO J. 16, 7490-7499.
Cate, J. H., Gooding, A. R., Podell, E., Zhou, K., Golden, B. L., Kundrot, C. E. et al. (1996a). Crystal structure of a group I ribozyme domain: principles of RNA packing. Science, 273, 1678-1686.
Cate, J. H., Gooding, A. R., Podell, E., Zhou, K., Golden, B. L., Szewczak, A. A., Kundrot, C. E., Cech, T. R.
& Doudna, J. A. (1996b). RNA tertiary structure mediation by adenosine platforms. Science, 273, 1696-1699.
Cate, J. H., Yusupov, M. M., Yusupova, G. Z., Earnest, T. N. & Noller, H. F. (1999). X-ray crystal structures of 70S ribosome functional complexes. Science, 285, 2095-2104.
Clemons, W. M., Jr, May, J. L. C., Wimberly, B. T., McCutcheon, J. P., Capel, M. S. & Ramakrishnan, V. (1999). Structure of a bacterial 30 S ribosomal subunit at 5.5 Å resolution. Nature, 400, 833-840.
Conn, G. L., Draper, D. E., Lattman, E. E. & Gittis, A. G. (1999). Crystal structure of a conserved ribosomal protein-RNA complex. Science, 284, 1171-1174.
Correll, C. C., Freeborn, B., Moore, P. B. & Steitz, T. A. (1997). Metals, motifs, and recognition in the crystal structure of a 5S rRNA domain. Cell, 91, 705-712.
Costa, M. & Michel, F. (1995). Frequent use of the same tertiary motif by self-folding RNAs. EMBO J. 14, 1276-1285.
Costa, M. & Michel, F. (1997). Rules for RNA recognition of GNRA tetraloops deduced by in vitro selection: comparison with in vivo evolution. EMBO J. 16, 3289-3302.
Culver, G. M., Cate, J. H., Yusupova, G. Z., Yusupov, M. M. & Noller, H. F. (1999). Identification of an RNA-protein bridge spanning the ribosomal subunit interface. Science, 285, 2133-2136.
Damberger, S. H. & Gutell, R. R. (1994). A comparative database of group I intron structures. Nucl. Acids Res. 22, 3508-3510.
Fountain, M. A., Serra, M. J., Krugh, T. R. & Turner, D. H. (1996). Structural features of a six-nucleotide RNA hairpin loop found in ribosomal RNA. Biochemistry, 35, 6539-6548.
Freier, S. M., Kierzek, R., Jaeger, J. A., Sugimoto, N., Caruthers, M. H., Neilson, T. & Turner, D. H. (1986). Improved free-energy parameters for predictions of RNA duplex stability. Proc. Natl Acad. Sci. USA, 83, 9373-9377.
Gautheret, D., Konings, D. & Gutell, R. R. (1994). A major family of motifs involving G:A mismatches in ribosomal RNA. J. Mol. Biol. 242, 1-8.
Gautheret, D., Damberger, S. H. & Gutell, R. R.
(1995a). Identification of base-triples in RNA using comparative sequence analysis. J. Mol. Biol. 248, 27-43.
Gautheret, D., Konings, D. & Gutell, R. R. (1995b). G:U base pairing motifs in ribosomal RNAs. RNA, 1, 807-814.
Gutell, R. R. (1996). Comparative sequence analysis and the structure of 16S and 23S rRNA. In Ribosomal RNA: Structure, Evolution, Processing and Function in Protein Biosynthesis (Dahlberg, A. E. & Zimmermann, R. A., eds), pp. 111-128, CRC Press, Boca Raton, FL, USA.
Gutell, R. R. (1999). Comparative analysis of RNA sequences. Nucl. Acids Symp. Ser. 41, 48-53.
Gutell, R. R., Weiser, B., Woese, C. R. & Noller, H. F. (1985). Comparative anatomy of 16S-like ribosomal RNA. Prog. Nucl. Acid Res. Mol. Biol. 32, 155-216.
Gutell, R. R., Cannone, J. J., Konings, D. & Gautheret, D. (2000). Predicting U-turns in ribosomal RNA with comparative sequence analysis. J. Mol. Biol. 300, 791-803.
Huang, S., Wang, Y.-X. & Draper, D. E. (1996). Structure of a hexanucleotide RNA hairpin loop conserved in ribosomal RNAs. J. Mol. Biol. 258, 308-321.
Jaeger, L., Michel, F. & Westhof, E. (1994). Involvement of a GNRA tetraloop in long-range RNA tertiary interactions. J. Mol. Biol. 236, 1271-1276.
Jucker, F. M. & Pardi, A. (1995). GNRA tetraloops make a U-turn. RNA, 1, 219-222.
Kalurachchi, K., Uma, K., Zimmermann, R. A. & Nikonowicz, E. P. (1997). Structural features of the binding site for ribosomal protein S8 in Escherichia coli 16S rRNA defined using NMR spectroscopy. Proc. Natl Acad. Sci. USA, 94, 2139-2144.
Leontis, N. B. & Westhof, E. (1998). A common motif organizes the structure of multi-helix loops in 16 S and 23 S ribosomal RNAs. J. Mol. Biol. 283, 571-583.
Longfellow, C. E., Kierzek, R. & Turner, D. H. (1990). Thermodynamic and spectroscopic study of bulge loops in oligoribonucleotides. Biochemistry, 29, 278-285.
Michel, F. & Dujon, B. (1983). Conservation of RNA secondary structures in two intron families including mitochondrial-, chloroplast- and nuclear-encoded members. EMBO J.
2, 33-38.
Michel, F. & Westhof, E. (1990). Modeling of the three-dimensional architecture of group I catalytic introns based upon comparative sequence analysis. J. Mol. Biol. 216, 585-610.
Michel, F., Costa, M., Massire, I. & Westhof, E. (2000). Modeling RNA tertiary structure from patterns of sequence variation. Methods Enzymol. 317, 491-510.
Murphy, F. L. & Cech, T. R. (1994). GAAA tetraloop and conserved bulge stabilize tertiary structure of a group I intron domain. J. Mol. Biol. 236, 49-63.
Nikulin, A., Serganov, A., Ennifar, E., Tishchenko, S., Nevskaya, N., Shepard, W., Portier, C., Garber, M., Ehresmann, B., Ehresmann, C., Nikonov, S. & Dumas, P. (2000). Crystal structure of the S15-rRNA complex. Nature Struct. Biol. 7, 273-277.
Peritz, A. E., Kierzek, R., Sugimoto, N. & Turner, D. H. (1991). Thermodynamic study of internal loops in oligoribonucleotides: symmetric loops are more stable than asymmetric loops. Biochemistry, 30, 6428-6436.
Pley, H. W., Flaherty, K. M. & McKay, D. B. (1994). Three-dimensional structure of a hammerhead ribozyme. Nature, 372, 68-74.
Puglisi, E. V. & Puglisi, J. D. (1998). HIV-1 A-rich RNA loop mimics the tRNA anticodon structure. Nature Struct. Biol. 5, 1033-1036.
Quigley, G. J. & Rich, A. (1976). Structural domains of transfer RNA molecules. Science, 194, 796-806.
SantaLucia, J., Kierzek, R. & Turner, D. H. (1990). Effects of GA mismatches on the structure and thermodynamics of RNA internal loops. Biochemistry, 29, 8813-8819.
Schluenzen, F., Tocilj, A., Zarivach, R., Harms, J., Gluehmann, M., Janell, D., Bashan, A., Bartels, H., Agmon, I., Franceschi, F. & Yonath, A. (2000). Structure of functionally activated small ribosomal subunit at 3.3 Å resolution. Cell, 102, 615-623.
Serra, M. J., Axenson, T. J. & Turner, D. H. (1994). A model for the stabilities of RNA hairpins based on a study of the sequence dependence of stability for hairpins with six nucleotides. Biochemistry, 33, 14289-14296.
Stallings, S. C. & Moore, P. B. (1997).
The structure of an essential splicing element: stem loop IIA from yeast U2 snRNA. Structure, 5, 1173-1185.
Szewczak, A. A., Moore, P., Chan, Y-L. & Wool, I. G. (1993). The conformation of the sarcin/ricin loop
from 28S ribosomal RNA. Proc. Natl Acad. Sci. USA, 90, 9581-9585.
Tocilj, A., Schluenzen, F., Janell, D., Gluehmann, M., Hansen, H. A. S., Harms, J., Bashan, A., Bartels, H., Agmon, I., Franceschi, F. & Yonath, A. (1999). The small ribosomal subunit from Thermus thermophilus at 4.5 Å resolution: pattern fittings and the identification of functional site. Proc. Natl Acad. Sci. USA, 96, 14252-14257.
Traub, W. & Sussman, J. L. (1982). Adenine-guanine base pairing in ribosomal RNA. Nucl. Acids Res. 10, 2701-2708.
Varani, G., Wimberly, B. & Tinoco, I., Jr (1989). Conformation and dynamics of an RNA internal loop. Biochemistry, 28, 7760-7772.
Wimberly, B. (1994). A common RNA loop motif as a docking module and its function in the hammerhead ribozyme. Nature Struct. Biol. 1, 820-827.
Wimberly, B., Varani, G. & Tinoco, I., Jr (1993). The conformation of loop E of eukaryotic 5S ribosomal RNA. Biochemistry, 32, 1078-1087.
Wimberly, B. R., Guymon, R., McCutcheon, J. P., White, S. W. & Ramakrishnan, V. (1999). A detailed view of a ribosomal active site: the structure of the L11-RNA complex. Cell, 97, 491-502.
Wimberly, B. T., Broderson, D. E., Clemons, W. M., Jr, Morgan-Warren, R. J., Carter, A. P., Vonrhein, C., Hartsch, T. & Ramakrishnan, V. (2000). Structure of the 30 S ribosomal subunit. Nature, 407, 327-339.
Woese, C. R. & Pace, N. R. (1993). Probing RNA structure, function, and history by comparative analysis. In The RNA World (Gesteland, R. F. & Atkins, J. F., eds), pp. 91-117, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Woese, C. R., Gutell, R., Gupta, R. & Noller, H. F. (1983). Detailed analysis of the higher-order structure of 16S-like ribosomal ribonucleic acids. Microb. Rev. 47, 621-669.
Woese, C. R., Winker, S. & Gutell, R. R. (1990). Architecture of ribosomal RNA: constraints on the sequence of "tetra-loops". Proc. Natl Acad. Sci. USA, 87, 8467-8471.
Xia, T., SantaLucia, J., Jr, Burkard, M. E., Kierzek, R., Schroeder, S., Jiao, X., Cox, C. & Turner, D. H. (1998).
Thermodynamic parameters for an expanded nearest neighbor model for formation of RNA duplexes with Watson-Crick base-pairs. Biochemistry, 37, 14719-14735.

Edited by D. E. Draper

(Received 7 July 2000; received in revised form 9 September 2000; accepted 9 September 2000)