Gutell 075.jmb.2001.310.0735

AA.AG@Helix.Ends: A:A and A:G Base-pairs at the Ends of 16 S and 23 S rRNA Helices

Tricia Elgavish¹, Jamie J. Cannone², Jung C. Lee³, Stephen C. Harvey¹ and Robin R. Gutell²*

¹Department of Biochemistry and Molecular Genetics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
²Institute for Cellular and Molecular Biology, University of Texas at Austin, 2500 Speedway, Austin, TX 78712-1095, USA
³Division of Medicinal Chemistry, College of Pharmacy, University of Texas at Austin, Austin, TX 78712, USA
*Corresponding author

This study reveals that AA and AG oppositions occur frequently at the ends of helices in RNA crystal and NMR structures in the PDB database and in the 16 S and 23 S rRNA comparative structure models, with the G usually 3′ to the helix for the AG oppositions. In addition, these oppositions are frequently base-paired and usually in the sheared conformation, although other conformations are present in NMR and crystal structures. These A:A and A:G base-pairs are present in a variety of structural environments, including GNRA tetraloops, E and E-like loops, interfaces between two helices that are coaxially stacked, tandem G:A base-pairs, U-turns, and adenosine platforms. Finally, given structural studies that reveal conformational rearrangements occurring in regions of the RNA with AA and AG oppositions at the ends of helices, we suggest that these conformationally unique helix extensions might be associated with functionally important structural rearrangements.

© 2001 Academic Press

Keywords: ribosomal RNA structure; comparative sequence analysis; A:A and A:G base-pairs (non-canonical pairs); structure motifs; computational biology/bioinformatics (coaxial stacking)

Introduction

Our ultimate goal is to accurately predict RNA secondary and tertiary structure from its sequence. To begin to achieve this objective, we need a detailed set of RNA structure rules and principles that relate sequences to small structural elements as well as to global structure.
Given that the number of possible secondary structures for an RNA sequence is very large (http://www.rna.icmb.utexas.edu/METHODS/) and the current set of RNA structure principles within the best of the RNA folding algorithms¹,² are not adequate to achieve these goals,³,⁴ we have utilized comparative sequence analysis⁵,⁶ to identify those base-pairs that would form similar structures for a set of sequences considered to be structurally and functionally equivalent. Traditionally, we have searched for positions in a sequence alignment with similar patterns of variation (also called covariation). Due to the strong congruence between these covariation-based comparative structure models and crystal structure solutions⁷ (Gutell et al., unpublished results), we are very confident in the authenticity of these proposed base-pairs. While the majority of the positions that covary with one another are associated with secondary structure base-pairs, there are a few short- and long-range tertiary interactions in the rRNAs⁸ (CRW Site; see Materials and Methods). We now aspire to predict additional base-pairings at the positions that are not base-paired in the covariation-based structure models. These base-pairs would add more secondary structure to the current comparative structure models and fold this model into a three-dimensional structure.

Both of these latter aspirations will require a different type of comparative sequence analysis that goes beyond simple covariation analysis. Operationally, we define comparative sequence analysis as the general method that identifies structures that are common to different sequences, while covariation analysis is the method that identifies positions in a sequence alignment with similar patterns of variation. Covariation analysis will identify a subset of the total number of base-pairs that are in common to different sequences.
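The distinction between covariation analysis and the broader comparative analysis can be illustrated with a small sketch. This is not the authors' program. It swaps in mutual information, one standard covariation score, and separately counts canonical pairs. The column strings and the `CANONICAL` set are assumptions made for the example. The sketch shows why covariation alone recovers only a subset of shared base-pairs: an invariant pair scores zero, and so does an A:A/A:G exchange in which one column never varies, however consistently the two positions oppose each other.

```python
import math
from collections import Counter

# Canonical pairs (Watson-Crick plus G:U), written with the first column's base first.
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def mutual_information(col_i: str, col_j: str) -> float:
    """Mutual information (bits) between two aligned columns, one base per sequence."""
    n = len(col_i)
    if n == 0 or n != len(col_j):
        raise ValueError("columns must be non-empty and of equal length")
    f_i, f_j = Counter(col_i), Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return mi


def canonical_fraction(col_i: str, col_j: str) -> float:
    """Fraction of sequences in which the two positions form a canonical pair."""
    return sum(a + b in CANONICAL for a, b in zip(col_i, col_j)) / len(col_i)


# Covarying helix pair: every sequence pairs, and the identities change together.
print(round(mutual_information("GGCCAU", "CCGGUA"), 3), canonical_fraction("GGCCAU", "CCGGUA"))  # 1.918 1.0
# An invariant G:C pair and an A:A/A:G exchange: no covariation signal in either case.
print(mutual_information("GGGG", "CCCC"), mutual_information("AAAA", "AGAG"))  # 0.0 0.0
```

The zero scores in the second line are the gap the text describes. Detecting such pairs requires knowing in advance which exchanges, such as A:G to A:A, are structurally isomorphic.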
While covariation analysis identifies structurally isomorphic base-pairs (e.g. A:U, G:C, C:G, and U:A) from the identification of positions with similar patterns of variation in a sequence alignment, it is possible to form isomorphic base-pair conformations from two positions that have different patterns of variation. To identify these, we need to know, a priori, the base-pair exchanges (e.g. G:U to G:C or A:G to A:A) that will form isomorphic base-pair conformations within a specific structural context. A few years ago, we developed a computer program that would return the isomorphic base-pair conformations that are possible for any known set of pairing types.⁹ However, this system by itself will not help us to identify new base-pairs at positions with no matching pattern of variation since, without additional information, we do not know which positions to base-pair. Ultimately, we need to have a larger set of structural constraints that will help us decipher the unique patterns of variation into isomorphic structures.

E-mail address of the corresponding author: robin.gutell@mail.utexas.edu
Abbreviations used: PDB, Protein Data Bank.
doi:10.1006/jmbi.2001.4807, available online at http://www.idealibrary.com on J. Mol. Biol. (2001) 310, 735-753
0022-2836/01/040735-19 $35.00/0 © 2001 Academic Press

Beyond the canonical base-pairs (A:U, G:C, G:U) that are arranged into the standard secondary structure helices and tertiary interactions, several other RNA structural motifs have been identified with a sequence analysis perspective.⁵,⁶,⁸,¹⁰ These include tetraloops,¹¹ lone-pair tri-loops,⁸ pseudoknots,⁶,¹²,¹³ dominant G:U base-pairs,¹⁴ tandem G:A base-pairs,¹⁵ E-loops,¹⁵⁻¹⁷ U-turns,¹⁸ base triples,¹⁹,²⁰ tetraloop receptors,²¹⁻²³ adenosine platforms,²⁴,²⁵ and base-pairs arranged in parallel.⁶ A structural perspective of these RNA motifs is presented in two recent reviews.²⁶,²⁷

In addition to the comparative sequence analysis of these RNA motifs, it was first observed in the early 1980s that helices in Escherichia coli 16 S rRNA were frequently flanked by AG oppositions.²⁸,²⁹ Consistent with this observation, it was observed that the majority of the 3′ ends of loops are an adenosine while the 5′ ends of loops are an adenosine or guanosine in the covariation-based 16 S and 23 S rRNA structure models.²⁵ An AG opposition (where an opposition
refers to two bases on opposite strands at the end of a helix that are in proximity with one another) at positions 1056:1103 (E. coli numbering) is base-paired in the crystal structure for the L11 binding fragment of 23 S rRNA.³⁰ Position 1056 is a G in the majority of the Bacteria, Archaea, and chloroplasts, while it is an A in the majority of the Eucarya. Position 1103 is an A in nearly all of the Bacteria, Archaea, Eucarya, and chloroplasts. Thus, from a comparative perspective, we expect the majority of the Eucarya with an A at position 1056 to form an A1056:A1103 base-pair. The experimental support for this A:G base-pair, in addition to the earlier AG sightings at the ends of E. coli 16 S rRNA helices and the bias for unpaired As and Gs at the ends of helices, suggested that many helices in the rRNAs might be flanked with A:G and A:A base-pairs. During the preparation of this manuscript, high-resolution crystal structures were determined for the 30 S and 50 S ribosomal subunits.³¹⁻³³ Our objectives for this paper are: (1) to identify the conserved AA and AG oppositions at the helix ends in the comparative structure models for 16 S and 23 S rRNA, (2) to determine if AA and AG oppositions are base-paired in all RNA crystal and NMR structures that contain an AA or AG at the end of a standard helix, and (3) to determine the conformations for these A:A or A:G base-pairs.

Results

Comparative sequence analysis of the ends of rRNA helices

The nucleotide frequencies at the positions flanking the ends of all helices in our 16 S and 23 S rRNA alignments (see Materials and Methods and CRW Site) were determined for the nuclear encoded rRNAs from the three major phylogenetic groups (Bacteria, Archaea, and Eucarya) and the two Eucarya organelles (chloroplasts and mitochondria). Only helix ends in the Bacteria with an AA, AG, or AA/AG in more than 90 % of the sequences were scored as candidates.
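The candidate threshold just stated, together with the invariant/exchange split defined in the following paragraph, can be written as a small classifier. This is a sketch under simplifying assumptions. Each opposition is supplied as a pair of bases, and orientation (G 3′ or 5′ to the helix) is ignored by sorting the pair. The function names are hypothetical and are not taken from the authors' software. The thresholds (more than 90 %, and at least 2 % of each pairing for an exchange) come from the text.

```python
from collections import Counter


def opposition_counts(oppositions):
    """Count unordered oppositions; ('G', 'A') and ('A', 'G') both count as 'AG'."""
    return Counter("".join(sorted(pair)) for pair in oppositions)


def is_candidate(bacterial, threshold=0.90):
    """True if AA plus AG make up more than `threshold` of the Bacterial sequences."""
    counts = opposition_counts(bacterial)
    total = sum(counts.values())
    return total > 0 and (counts["AA"] + counts["AG"]) / total > threshold


def classify(alignments, min_fraction=0.02):
    """Return 'exchange', 'invariant AA' or 'invariant AG' for one candidate site.

    `alignments` maps an alignment name (e.g. 'Bacteria', 'Archaea') to the list
    of oppositions observed at this site; 'Bacteria' must be present.
    """
    for oppositions in alignments.values():
        counts = opposition_counts(oppositions)
        total = sum(counts.values())
        # Exchange: both AA and AG reach the minimum frequency in some alignment.
        if total and min(counts["AA"], counts["AG"]) / total >= min_fraction:
            return "exchange"
    bacterial = opposition_counts(alignments["Bacteria"])
    return "invariant AG" if bacterial["AG"] >= bacterial["AA"] else "invariant AA"


bacteria = [("A", "G")] * 95 + [("G", "C")] * 5
archaea = [("A", "G")] * 60 + [("A", "A")] * 40
print(is_candidate(bacteria))                                # True
print(classify({"Bacteria": bacteria}))                      # invariant AG
print(classify({"Bacteria": bacteria, "Archaea": archaea}))  # exchange
```

The exchange test is checked first, so a site that is invariant in the Bacteria but mixes AA and AG in another alignment is scored as an exchange. This follows the text's "in at least one of the primary alignments".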
Since approximately 90 % of the AG oppositions have the G 3′ to the helix, we have focused on this orientation in this manuscript and in Table 1. However, a small number (eight in rRNA and 14 in the PDB structure database) of examples of AG oppositions where the G is 5′ to the helix are discussed below. All oppositions were subdivided into two categories: invariant and exchange. Invariant sites contain only AA or AG in the Bacterial alignment, while sites with both types of pairings (where the minimum for each pairing is 2 %) in at least one of the primary alignments (Archaea, Bacteria, Eucarya nuclear, chloroplast, or mitochondrial) were classified as exchanges. These oppositions are mapped onto the December 1999 version of the E. coli 16 S and 23 S rRNA covariation-based structure models (Figure 1; CRW Site). The base-pair frequencies for each of the AA and AG sites for each of the 16 S and 23 S alignments (Archaea, Bacteria, Eucarya nuclear, chloroplast, and mitochondrial) are all available at our web site, CRW AA.AG (see Materials and Methods).

There are 139 oppositions (as defined above) in the 16 S and 263 oppositions in the 23 S rRNA comparative structure models. In the hypothetical world where the frequency of each of the four nucleotides is 25 % at paired and unpaired positions and there is no bias for any nucleotide pairs at these positions, for each opposition, we expect a 12.5 % (2/16) chance of finding an AA or AG. Thus, for any one rRNA sequence, we expect, based upon this random sampling, there to be approximately 17 ($139 \times 0.125$) sites in 16 S and 33 ($263 \times 0.125$) sites in 23 S rRNA with an AA or AG opposition at the end of a helix (referred to hereunder as AA.AG@helix.ends). The expected number of AA and AG sites that occur at the same positions in 90 % of 5850 Bacterial 16 S sequences is $1.7 \times 10^{-4755}$, and for 325 Bacterial 23 S rRNA
sequences the number is $7.0 \times 10^{-265}$. Thus, we conclude that the odds of finding the same pattern in 90 % of the sequence sets by random chance are extremely low; however, 30 % of the oppositions at the ends of 16 S rRNA helices (42 of 139) and 28 % of the oppositions at the ends of 23 S rRNA helices (73 of 263) have an AA or AG opposition in at least 90 % of the sequences.

Since the 1056:1103 base-pair in 23 S rRNA has a significant number of AA and AG oppositions with a minimal number of alternative base-pairs, we have flagged this base-pair, along with other similar positions that also have a more significant extent of A:A and A:G pairings. These sites are shown in Figure 1 with red and green asterisks on the 16 S and 23 S rRNA secondary structure diagrams and within the AA/AG base-pair frequency tables (CRW AA.AG Online Table 4). The red asterisk sites contain only AA and AG in all of the Archaea, Bacteria, Eucarya nuclear and chloroplast alignments, with a minimum number of exceptions. The 23 S rRNA 1056:1103 site contains significant amounts of AA/AG pairings in nearly all of the non-mitochondrial sequences; only a few sequences out of 582 do not have an AA or AG. The other red asterisk sites in 23 S rRNA are 627:636 and 2126:2162; sites with comparable nucleotide frequencies in 16 S rRNA are 780:802, 888:909, 959:976, 1408:1493, 1417:1483, and 1418:1482.

The green asterisks (Figure 1; CRW AA.AG) reveal those sites with significant amounts of AA/AG exchanges with a minimal amount of other oppositions in at least one alignment while at least one other alignment contains a larger number of exceptions to the pure AA/AG exchange pattern. Green sites in the 16 S rRNA are: 26:557, 60:107, 197:220, 447:487 (with a large percentage of Watson-Crick/G:U base-pairs in the Archaea), 691:696, 860:869, 1157:1179, and 1304:1333.
Green asterisk sites in 23 S rRNA are 244:254, 463:466, 602:655, 603:625, 637:651 (with a large percentage of Watson-Crick base-pairs in the Archaea), 861:916, 945:972, 975:988, 1000:1155, 1354:1377, 1655:2005, 1791:1828, 2125:2173, 2199:2224, 2287:2345, 2346:2371, 2358:2429, 2587:2607, and 2639:2775.

Orientation of the AG oppositions

There are two orientations possible for AG oppositions relative to the helix to which they are adjacent: the G can be 5′ or 3′ to the adjacent helix. The analysis of an early version of the E. coli 16 S rRNA comparative structure model revealed that the G tends to be at the 3′ end of the helix.²⁸ Our analysis here of the most recent versions of a large number of phylogenetically diverse 16 S and 23 S rRNA comparative structure models is consistent with this earlier result. Of the invariant AG and AA/AG oppositions that flank a helix, approximately 87 are oriented with the G 3′ to the helix, while eight AG oppositions have the G 5′ to the helix. This result, as discussed later, is consistent with the types and frequencies of A:A and A:G base-pair conformations present in the crystal structures.

Table 1. Distribution of AA/AG oppositions (with G 3′ to helix for AG oppositions) in the bacterial 16 S and 23 S rRNA comparative structure models

Loop type         Hairpin                Internal               Multi-stem
Opposition        C[+,ø,−](a) [S,I,O](b) C[+,ø,−](a) [S,I,O](b) C[+,ø,−](a)  [S,I,O](b) Co(c)  Cr(d) (%)

16 S rRNA
Invariant         7[7,0,0]    [7,0,0]    9[6,0,3]    [3,2,1]    5[4,0,1]     [0,2,2]    21     17 (81%)
  AA              0[0,0,0]    [0,0,0]    5[2,0,3]    [0,1,1]    1[1,0,0]     [0,1,0]    6      3 (50%)
  AG              7[7,0,0]    [7,0,0]    4[4,0,0]    [3,1,0]    4[3,0,1]     [0,1,2]    15     14 (93%)
Exchange          2[2,0,0]    [2,0,0]    10[9,1,0]   [7,1,1]    9[4,0,5]     [2,0,2]    20     15 (75%)
Total             9[9,0,0]    [9,0,0]    19[15,1,3]  [10,3,2]   14[8,0,6]    [2,2,4]    41     32
% xtal. str.(e)   9/9 = 100%             15/18 = 83%            8/14 = 57%              32/41 = 78%

23 S rRNA
Invariant         11[9,2,0]   [9,0,0]    13[10,2,1]  [9,0,1]    13[6,1,6]    [5,1,0]    32     25 (78%)
  AA              0[0,0,0]    [0,0,0]    4[1,2,1]    [0,0,1]    4[0,0,4]     [0,0,0]    6      1 (17%)
  AG              11[9,2,0]   [9,0,0]    9[9,0,0]    [9,0,0]    9[6,1,2]     [5,1,0]    26     24 (92%)
Exchange          4[2,1,1]    [2,0,0]    12[8,3,1]   [8,0,0]    20[9,6,5]    [7,0,2]    26     19 (74%)
Total             15[11,3,1]  [11,0,0]   25[18,5,2]  [17,0,1]   33[15,7,11]  [12,1,2]   58     44
% xtal. str.(e)   11/12 = 92%            18/20 = 90%            15/26 = 58%             44/58 = 76%

rRNA Total        24[20,3,1]  [20,0,0]   44[33,6,5]  [27,3,3]   47[23,7,17]  [14,3,6]   99     76
% xtal. str.(e)   20/21 = 95%            33/38 = 87%            23/40 = 58%             76/99 = 77%
Conformation(f)   S: 20/20 (100%)        S: 27/33 (82%)         S: 14/23 (61%)          S: 61/76 (80%)
                                         I: 3/33 (9%)           I: 3/23 (13%)           I: 6/76 (8%)
                                         O: 3/33 (9%)           O: 6/23 (26%)           O: 9/76 (12%)

(a) C, number of predicted base-pairings based on the bacterial structure; +, number of predicted pairings in the crystal structure; ø, number of predicted pairings for which there is no homologous structure in the crystal structures (see the text for details); −, number of predicted pairings that are not present in the crystal structure.
(b) Conformation of the base-pair: S, sheared; I, imino or imino-like; O, other.
(c) Co, the total number of homologous base-pairs from that category in the comparative structure model.
(d) Cr, the total number and percentage of base-pairs in the crystal structure.
(e) The percentage of base-pairs predicted with comparative analysis that are present in the crystal structure [+/(C − ø)].
(f) Percentage of the base-pairs having the conformation: S, sheared; I, imino; O, other.
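Footnote e of Table 1 defines agreement with the crystal structures as +/(C − ø). The sketch below recomputes the "% xtal. str." rows from the cells of the Total rows. It also checks that each cell satisfies + + ø + − = C, an identity that holds for every Total-row cell in the table. The dictionary layout is an illustrative assumption. The fourth tuple in each row (all loops) is the column sum of the three loop types, which the table does not print directly.

```python
# Cells (C, +, ø, −) from the Total rows of Table 1, in the order
# hairpin, internal, multi-stem, all loops (the last tuple is the row sum).
TOTAL_ROWS = {
    "16 S": [(9, 9, 0, 0), (19, 15, 1, 3), (14, 8, 0, 6), (42, 32, 1, 9)],
    "23 S": [(15, 11, 3, 1), (25, 18, 5, 2), (33, 15, 7, 11), (73, 44, 15, 14)],
    "rRNA": [(24, 20, 3, 1), (44, 33, 6, 5), (47, 23, 7, 17), (115, 76, 16, 23)],
}


def percent_present(c, plus, no_homolog, absent):
    """Footnote e: + / (C - ø), as a whole-number percentage rounded half up."""
    if plus + no_homolog + absent != c:
        raise ValueError(f"cell does not add up: {c} != {plus} + {no_homolog} + {absent}")
    homologous = c - no_homolog
    # Integer arithmetic avoids float surprises at exact halves (23/40 = 57.5 %).
    return (200 * plus + homologous) // (2 * homologous)


for rrna, cells in TOTAL_ROWS.items():
    print(rrna, [percent_present(*cell) for cell in cells])
# 16 S [100, 83, 57, 78]
# 23 S [92, 90, 58, 76]
# rRNA [95, 87, 58, 77]
```

The printed values match the "% xtal. str." rows of the table. The denominators C − ø are the homologous counts (for example 18, 38 and 99) quoted in the text.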
[Figure 1 (a) to (c): E. coli 16 S rRNA and 23 S rRNA (5′ and 3′ halves) comparative secondary structure diagrams marking the AA and AG oppositions at helix ends; see the Figure 1 legend.]
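The chance-level arithmetic used earlier in the comparative analysis can be reproduced in a few lines. The expected site counts follow directly from 139 × 0.125 and 263 × 0.125. The text does not give the formula for the two very small values. We assume they were obtained as 0.125 raised to the number of sequences that make up 90 % of each set (0.9 × 5850 and 0.9 × 325), because this reading reproduces both reported figures to two significant figures. Logarithms are used because the 16 S value underflows a 64-bit float.

```python
import math

# Chance of an AA or AG opposition under the uniform model in the text (2 of 16).
P_AA_OR_AG = 2 / 16


def expected_sites(n_oppositions: int, p: float = P_AA_OR_AG) -> float:
    """Expected number of helix-end oppositions that are AA or AG in one sequence."""
    return n_oppositions * p


def log10_chance(n_sequences: int, fraction: float = 0.9, p: float = P_AA_OR_AG) -> float:
    """log10 of p ** (fraction * n_sequences).

    Assumption: this is the calculation behind the reported values. Log space is
    needed because 0.125 ** 5265 underflows a 64-bit float to 0.0.
    """
    return fraction * n_sequences * math.log10(p)


def scientific(log10_value: float) -> tuple[float, int]:
    """Split a base-10 logarithm into (mantissa, exponent)."""
    exponent = math.floor(log10_value)
    return 10 ** (log10_value - exponent), exponent


print(round(expected_sites(139)), round(expected_sites(263)))  # 17 33
for n_sequences in (5850, 325):
    mantissa, exponent = scientific(log10_chance(n_sequences))
    print(f"{mantissa:.1f}e{exponent}")  # 1.7e-4755, then 7.0e-265
```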
An analysis of the helix ends in the crystal and NMR structures and in the 16 S and 23 S rRNA crystal structures

AA.AG@helix.ends in rRNAs

An analysis of approximately 6000 Bacterial 16 S and over 300 23 S rRNA sequences aligned for maximum structure similarity revealed 115 helix ends with AA, AG, and AA/AG oppositions in more than 90 % of the sequences (Table 1 and Figure 1). These are proportionately distributed in the 16 S and 23 S rRNAs, with 42 occurrences in 16 S and 73 in 23 S rRNA. They are present in the three loop categories, with 24 candidates in hairpins, 44 in internal loops, and 47 in multi-stem loops. Invariant and exchange cases occur at nearly the same frequencies. Of the invariant sites, 75 % contain an AG opposition, while only 25 % have an AA (Table 1). In addition, there is a bias for invariant A:G base-pairs in hairpin loops (with the majority of these occurring in tetraloops¹¹), and a slight bias for multi-stem loops to have AA/AG exchanges (Table 1). The nucleotide frequencies for a larger set of sequences (approximately 8500 16 S and over 1000 23 S rRNA sequences) include the nuclear encoded rRNAs in the three primary phylogenetic groups (Archaea, Bacteria and Eucarya) and the two Eucarya organelles (chloroplasts and mitochondria; see Online Table 4 at CRW AA.AG). They reveal that the majority of the positions contain the AA and AG oppositions in all of the alignments and phylogenetic groups, while some of the AA and AG oppositions in the Bacteria contain AU/GC or other nucleotide sets in one or more of the non-bacterial alignments. For example, 23 S rRNA positions 637:651 and 713:718 both contain AG oppositions in nearly all of the Bacteria, and both exchange between G:C and C:G in the Archaea.

During the preparation of this manuscript, the crystal structures for the 30 S³²,³³ and 50 S³¹ ribosomal subunits were solved.
We have analyzed these structures to determine if the AA and AG oppositions at the ends of helices that occur in more than 90 % of the known Bacterial rRNA sequences are base-paired in the crystal structures. A total of 99 of the 115 Bacterial-centric oppositions were resolved in the crystal structures and had homologous positions in the Thermus thermophilus 16 S and Haloarcula marismortui 23 S rRNA crystal structures. These are tabulated in Table 1 and highlighted on the 16 S and 23 S rRNA secondary structure diagrams in Figure 1. Of these 99, 76 (77 %) form an A:A or A:G base-pair (78 % (32/41) in 16 S and 76 % (44/58) in 23 S rRNA). Invariant AG oppositions (41 examples) at the ends of helices occur more frequently than invariant AA oppositions (12 examples) in the 16 S and 23 S rRNAs (Table 1). Our analysis of the rRNA crystal structures reveals that the AG oppositions form base-pairs more frequently than the AA oppositions.

The 99 homologous oppositions have a slightly biased distribution in the three unpaired loop categories. A total of 40 % (40/99) occur in multi-stem loops, 38 % (38/99) in internal loops, and 21 % (21/99) in hairpin loops.

A total of 20 of the 21 (95 %) homologous AA and AG candidates in hairpin loops are base-paired (Table 1 and Figure 1). GNRA tetraloops occur at 62 % (13/21) of these hairpin loops, and all of these have base-pairing between the first and last nucleotide of this hairpin loop. As well, six of the seven (86 %) homologous hairpin loops with more than four nucleotides also have base-pairing at the two ends of the loop. Finally, all of these base-pairs are in the sheared conformation.

For the AA and AG oppositions at the ends of helices in internal loops, 87 % (33/38) are base-paired (83 % and 90 % of the 16 S and 23 S rRNA candidates). In contrast with the hairpin loops, where 76 % (16/21) of the candidates have an invariant AG, 47 % (18/38) of the internal loops have an AA/AG exchange, while only 34 % (13/38) have an invariant AG.
Figure 1. E. coli 16 S and 23 S rRNA comparative secondary structure models (based upon the sequences in GenBank Accession no. J01695) showing the AA and AG oppositions at the ends of helices that occur in more than 90 % of the bacterial sequences. These opposed nucleotides are shown in red. Highlights indicate additional information from crystal structures: orange, opposition is base-paired in the crystal structure; green, candidate is not base-paired in the crystal structure; blue, candidate is not homologous, was not determined, or is a Watson-Crick base-pair in the crystal structure (e.g. this region is deleted, or is not an AA or AG opposition in the sequence of the organism that was crystallized). Candidates with AA/AG exchanges are marked with asterisks: red, significant exchanges in all alignments with minimal exceptions; green, significant exchanges in at least one alignment with minimal exceptions but with more exceptions in at least one other alignment; blue, exchanges in at least one alignment (excluding mitochondria). Nucleotides which are base-paired in the crystal structures but not in the comparative structure models which affect potential coaxial stacking and AA/AG oppositions that are not base-paired are colored blue and connected with blue lines and boxes to indicate the base-pairing. Highlights within helices indicate potential coaxial stacking: brown, not present in crystal structure; yellow, present in crystal structure. Base-pairs predicted with covariation analysis are denoted with - for canonical A:U and G:C base-pairs, small closed circles for G:U base-pairs, large open circles for G:A base-pairs, and large closed circles for non-canonical base-pairs. (a) 16 S rRNA (crystal structure: T. thermophilus³³). (b) 23 S rRNA, 5′ half (crystal structure: H. marismortui³¹). (c) 23 S rRNA, 3′ half (crystal structure: H. marismortui³¹).

[Figure 2 panels: (b) front and (c) side views of sheared A:G base-pairs; (e) front and (f) side views of imino A:G base-pairs; (h) front and (i) side views of A:A base-pairs; see the Figure 2 legend.]

All of the invariant AG oppositions are base-paired, and all except one of these (92 %) form a sheared conformation. All but one of the 18 (94 %) AA/AG exchanges are also base-paired. Of the 17 base-paired AA/AG exchanges, 15 (88 %) are in the sheared conformation, one is in the imino conformation, and the other is in the unusual A:G N3-amino base-pair conformation (see CRW AA.AG Online Figure 3 for chemical structure drawings and abbreviations used in other online materials). A lower percentage of base-pairing occurs with the invariant AA oppositions. Here, base-pairing occurs in only three of the seven (43 %) homologous invariant AA oppositions. None of these form a sheared conformation, one forms an imino conformation, and two form unusual conformations. On the whole, the sheared conformation occurs in 82 % (27/33) of the paired oppositions in internal loops, 9 % (3/33) have the imino conformation, and the remaining 9 % (3/33) have another type of conformation (Table 1).

Of the three loop categories, the lowest percentage of base-pairs for AA/AG oppositions at the ends of helices occurs in multi-stem loops. Here, 58 % (23/40) of these candidates are base-paired in the 16 S and 23 S rRNAs. Within this category, the highest percentage of base-pairings occurs for the invariant AG oppositions, where 75 % (9/12) are base-paired. Base-pairing occurs in 57 % (13/23) of the AA/AG exchanges, and for only one of five (20 %) invariant AA oppositions. Of the AA/AG oppositions in multi-stem loops, 61 % (14/23) form sheared conformations, 13 % (3/23) form the imino conformation, and six (26 %) form other conformation types. For these rRNA oppositions, the highest percentage of base-pairs occurs for the invariant AGs, followed by the AA/AG exchanges, with the lowest percentage of pairing in multi-stem loops (Table 1). Overall, 93 % (38/41) of the invariant AG oppositions are base-paired, 74 % (34/46) of the AA/AG exchanges are base-paired, and only 33 % (4/12) of the invariant AAs are base-paired.

Several conformations are possible for these A:G base-pairs. The most common and well-characterized are sheared and imino (Figure 2(a) and (d)³⁴). The sheared conformation occurs in 80 % (61/76) of the base-paired oppositions of the 16 S and 23 S rRNAs.
The sheared conformation forms in 87 % (33/38) of the invariant A:G base-pairs (Figure 2(a)) and in 82 % (28/34) of the AA/AG exchanges. It does not occur in any of the four invariant A:A base-pairs (Figure 2(g), top). An imino or imino-like conformation occurs six times (6/76 = 8 %) in the 16 S and 23 S rRNAs. These form in 8 % (3/38) of the invariant A:G base-pairs (Figure 2(d)), in just one of the 34 (3 %) AA/AG exchanges, and in two of the four (50 %) invariant A:A base-pairs (Figure 2(g), bottom). Beyond these two well-characterized conformations, there are five other conformations (CRW AA.AG Online Figure 3 and Online Table 4): (1) A:A N7-amino ("A7-1"; one in 16 S rRNA at positions 1248:1289); (2) A:A N7-amino symmetric ("A7"; one in 23 S rRNA at positions 1689:1698); (3) A:G N1-amino ("G1"; one in 16 S rRNA at positions 983:1222); (4) A:G N7-amino ("G7"; one in 23 S rRNA at positions 149:177); and (5) A:G N3-amino ("G3"; four in 16 S rRNA at positions 60:107, 197:220, 687:700, and 1067:1108; one in 23 S rRNA at positions 627:636).

There are eight examples of the A:G base-pair in the 16 S and 23 S rRNA crystal structures where the G is 5′ to the helix. These occur at 16 S rRNA positions 112:315, 143:220, 321:332, 945:1236, 1160:1176, and 1357:1365, and at 23 S rRNA positions 75:111 and 2547:2561 (Figure 1 and base-pair frequency tables at CRW AA.AG). Five of these base-pairs were already in the covariation-based rRNA structure models, with exchanges between the G:A and G:C/G:U/A:U or A:G base-pairs. The remaining three had minor exchanges with G:C/G:U/A:U base-pairs.
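The five additional conformations listed above can be kept as a small lookup table keyed by the abbreviations used in the online materials. Tallying the sites per rRNA gives nine base-pairs, six in 16 S and three in 23 S rRNA. This agrees with the "O" (other) entries in the Total rows of Table 1 (9/76 overall). The data structure and function names are illustrative assumptions. The positions are copied as given in the text.

```python
from collections import Counter

# Conformations other than sheared and imino, keyed by the online abbreviations.
# Each site is (rRNA, position, position) as listed in the text.
OTHER_CONFORMATIONS = {
    "A7-1": ("A:A N7-amino", [("16 S", 1248, 1289)]),
    "A7": ("A:A N7-amino symmetric", [("23 S", 1689, 1698)]),
    "G1": ("A:G N1-amino", [("16 S", 983, 1222)]),
    "G7": ("A:G N7-amino", [("23 S", 149, 177)]),
    "G3": ("A:G N3-amino", [("16 S", 60, 107), ("16 S", 197, 220), ("16 S", 687, 700),
                            ("16 S", 1067, 1108), ("23 S", 627, 636)]),
}


def sites_per_rrna(catalog):
    """Count listed base-pairs per rRNA across all conformations."""
    return Counter(rrna for _, sites in catalog.values() for rrna, _, _ in sites)


def conformation_of(catalog, rrna, i, j):
    """Look up the conformation name of a listed base-pair, or None if absent."""
    for name, sites in catalog.values():
        if (rrna, i, j) in sites:
            return name
    return None


counts = sites_per_rrna(OTHER_CONFORMATIONS)
print(sum(counts.values()), dict(counts))                     # 9 {'16 S': 6, '23 S': 3}
print(conformation_of(OTHER_CONFORMATIONS, "23 S", 627, 636))  # A:G N3-amino
```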
All eight of these rRNA base-pairs are in the imino conformation, which is consistent with the similarity between the G:A imino and Watson-Crick conformations.

AA.AG@helix.ends in the PDB structure database

To understand the conformation and structural details of these AA and AG oppositions at the ends of rRNA helices, and to establish a set of RNA structure rules that define them and will help us predict their occurrence in the future, we have also analyzed the ends of helices in the crystal and NMR structures available at the PDB structure database (http://www.rcsb.org/pdb/)³⁵. The crystal and NMR RNA structures analyzed and discussed below are summarized in Table 2 and detailed in CRW AA.AG Online Table 5. These 29 crystal and 41 NMR structures contain 116 AA and AG oppositions (61 in crystal structures and 55 in NMR structures) at the end of a helix. The 70 structures can be divided by RNA molecule into the following categories: 12 rRNA structures (22 cases), 11 tRNA structures (22 cases), four group I intron structures (14 cases), and 43 other RNA structures (58 cases), including one SRP structure (three cases), five ribozyme structures (nine cases), five pseudoknot structures (five cases), and four Rev response element structures (six cases).

Figure 2. Stereo views of A:G and A:A base-pairs at helix ends in different structural motifs from X-ray crystallography. NMR structures are omitted for clarity. The A in each base-pair is superimposed on the left of each panel. Chemical drawings were created using ISIS/Draw and stereo images were created using Insight II. (a) Chemical drawing of the G:A sheared base-pair (G:A N3-amino, amino-N7 base-pair³⁴). (b) Front view of sheared A:G base-pairs: blue, GNRA tetraloop; yellow, E loop; green, tandem GA; red, helix end. (c) Side view of (b). (d) Chemical drawing of the G:A imino base-pair (G:A carbonyl-amino, imino-N1 base-pair³⁴). (e) Front view of imino A:G base-pairs: blue, 5′ helix end; yellow, 3′ helix end. (f) Side view of (e). (g) Chemical drawings of the A:A sheared-like base-pair (top; A:A N3-amino base-pair⁴⁴) and the A:A imino-like base-pair (bottom; A:A N1-amino base-pair⁴⁴). (h) Front view of A:A base-pairs: yellow, N1-amino conformation; blue, N3-amino conformation; red, N7-amino conformation; green, tandem; gray, triple. (i) Side view of (h).

For the PDB structure database (Table 2), 80 % (93/116) of the oppositions are base-paired. AG oppositions at the ends of helices occur more frequently than AA oppositions in the PDB structure database (Table 2), and they also form base-pairs more frequently than the AA oppositions. These oppositions have a biased distribution among the three loop categories: 44 % (51/116) occur in internal loops, 39 % (45/116) in hairpin loops, and 17 % (20/116) in multi-stem loops. The fraction of oppositions that are base-paired is similar across these loops: 76 % (34/45) in the hairpin loops, 82 % (42/51) in the internal loops, and 85 % (17/20) in the multi-stem loops.

A total of 90 % (70/78) of the AG oppositions at the ends of helices in the PDB structure database (Table 2) are base-paired. These include both orientations (i.e. G 5′ and 3′ to the helix, and GA tandems); however, 70 % (54/78) have the G 3′ to the helix. 67 % (47/70) of the A:G base-pairs are in the sheared conformation (Figure 2(a)), 30 % (21/70) are in the imino conformation (Figure 2(d)), and 3 % (2/70) form the G:A⁺ carbonyl-amino, N7-N1 base-pair conformation (Online Figure 3(e)).

When the G is 3′ to the helix in the examples in Table 2, the sheared conformation is formed in 83 % (40/48) of the A:G base-pairs; 12 % (6/48) are in the imino conformation, and 4 % (2/48) form other conformations. These sheared A:G base-pairs are often a component of a larger motif that we currently recognize. All 16 examples of A:G base-pairs in GNRA tetraloops are in the sheared conformation, and all of the A:G base-pairs in hairpin loops and at the end of a helix are in the sheared conformation (with the G 3′ to the helix).
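Much of the analysis turns on whether the G is 5′ or 3′ to the helix, so it helps to make that convention explicit. The sketch below assumes a single chain numbered 5′→3′ and a helix given as a set of paired positions; both representations, and the helper names `side_of_helix` and `g_orientation`, are hypothetical. Under this convention, a nucleotide is "3′ to the helix" when its 5′ neighbor is a helix nucleotide, and "5′ to the helix" when its 3′ neighbor is. The two toy examples use the first and last nucleotides of a GNRA-type tetraloop and the tRNA 26:44 opposition discussed later in the text.

```python
from typing import Dict, Optional, Set


def side_of_helix(helix: Set[int], pos: int) -> Optional[str]:
    """Return "3'" if `pos` directly follows a helix nucleotide along the chain,
    "5'" if it directly precedes one, or None if it touches neither."""
    if pos - 1 in helix:
        return "3'"
    if pos + 1 in helix:
        return "5'"
    return None


def g_orientation(helix: Set[int], seq: Dict[int, str], p: int, q: int) -> Optional[str]:
    """For an A:G opposition p:q that caps `helix`, report whether the G is 5' or 3' to it."""
    if {seq[p], seq[q]} != {"A", "G"}:
        raise ValueError(f"{p}:{q} is not an A:G opposition")
    g_pos = p if seq[p] == "G" else q
    return side_of_helix(helix, g_pos)


# Toy hairpin: stem 1-5/10-14, loop 6-9 whose first (G6) and last (A9) nucleotides face each other.
hairpin_stem = set(range(1, 6)) | set(range(10, 15))
print(g_orientation(hairpin_stem, {6: "G", 9: "A"}, 6, 9))        # → 3'

# tRNA anticodon stem 27-31/39-43 (yeast tRNA-Phe numbering) capped by the 26:44 opposition.
anticodon_stem = set(range(27, 32)) | set(range(39, 44))
print(g_orientation(anticodon_stem, {26: "G", 44: "A"}, 26, 44))  # → 5'
```

With this convention, the G of a GNRA tetraloop G:A pair is 3′ to the helix, matching the sheared GNRA cases above. G26 in tRNA is 5′ to the anticodon helix, matching the tRNA imino cases described under coaxial stacking.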
All 11 of the A:G base-pairs in the E-loop and E-like loop cases that occur in internal and multi-stem loops are also sheared, as are 14 of the 22 other A:G base-pairs with the G 3′ to the helix. The sheared conformation induces a bend in the backbone that does not distort the flanking helix when the G is 3′ to the helix; however, the flanking helix will be distorted when the G is 5′ to the helix. The observed bias toward sheared conformations for A:G base-pairs oriented with the G 3′ to the helix is consistent with this topological constraint. There is, however, one example from a lower-resolution crystal structure of a sheared A:G base-pair with the G 5′ to the helix: positions A299:G279 in the Tetrahymena thermophila group I intron, with 3-4 Å between the hydrogen-bonding partners.³⁶

In contrast with the sheared conformation, the imino conformation³⁴ of an A:G base-pair can form at either end of a helix (with the G 5′ or 3′ to the helix) without distorting the surrounding base-pairs. There are six examples in Table 2 where an A:G base-pair with the G 3′ to the helix forms an imino conformation, and a few examples where an A:G base-pair with this orientation adopts another conformation type (see below). Conversely, 71 % (15/21) of the A:G base-pairs in the imino conformation (including the two tandem GA cases) are oriented with the G 5′ to the helix (Table 2), and 93 % (13/14) of the single A:G base-pairs with the G 5′ to the helix are in the imino conformation; the other is a sheared base-pair (see above).
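Table 2 packs two bracketed tallies into every cell, which makes the percentages quoted in this section tedious to verify by eye. These include 76 %, 82 % and 85 % base-paired per loop type; 90 % of AG oppositions base-paired; 83 % sheared when the G is 3′ to the helix; and 93 % imino for single A:G pairs with the G 5′ to the helix. The minimal Python sketch below checks that every cell is internally consistent and recomputes those ratios. It assumes only the cell values transcribed from Table 2 and uses names of our own choosing.

```python
# Table 2 cells as (count, paired, unpaired, sheared, imino, other), i.e. the
# C[+,-] and [S,I,O] entries for the hairpin, internal and multi-stem columns.
LOOPS = ("hairpin", "internal", "multi-stem")
TABLE2 = {
    "AA":        [(18, 11, 7, 11, 0, 0), (14, 8, 6, 4, 1, 3), (6, 4, 2, 1, 1, 2)],
    "AG (G 3')": [(27, 23, 4, 23, 0, 0), (24, 23, 1, 15, 6, 2), (3, 2, 1, 2, 0, 0)],
    "GA (G 5')": [(0, 0, 0, 0, 0, 0), (7, 5, 2, 0, 5, 0), (9, 9, 0, 1, 8, 0)],
    "GA tandem": [(0, 0, 0, 0, 0, 0), (6, 6, 0, 4, 2, 0), (2, 2, 0, 2, 0, 0)],
}


def check_cell(cell):
    """A cell is consistent if count = paired + unpaired and paired = S + I + O."""
    count, paired, unpaired, s, i, o = cell
    return count == paired + unpaired and paired == s + i + o


def loop_totals(table):
    """Column-wise sums over all opposition rows, for each loop type."""
    totals = {loop: [0] * 6 for loop in LOOPS}
    for cells in table.values():
        for loop, cell in zip(LOOPS, cells):
            totals[loop] = [t + c for t, c in zip(totals[loop], cell)]
    return totals


def pooled(cells):
    """Column-wise sum of a list of cells."""
    return [sum(col) for col in zip(*cells)]


def pct(part, whole):
    return round(100 * part / whole)


assert all(check_cell(c) for cells in TABLE2.values() for c in cells)
totals = loop_totals(TABLE2)
for loop in LOOPS:
    count, paired = totals[loop][0], totals[loop][1]
    print(f"{loop:10s} {paired}/{count} base-paired ({pct(paired, count)} %)")

# "AG totals" row: both single orientations plus the GA tandems.
ag_totals = pooled([c for row in ("AG (G 3')", "GA (G 5')", "GA tandem") for c in TABLE2[row]])
g3, g5 = pooled(TABLE2["AG (G 3')"]), pooled(TABLE2["GA (G 5')"])
print(f"AG base-paired {pct(ag_totals[1], ag_totals[0])} %, "
      f"sheared with G 3' {pct(g3[3], g3[1])} %, imino with G 5' {pct(g5[4], g5[1])} %")
```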
There are two examples of tandem G:A imino base-pairs where the G is 3′ to the helix in one case and 5′ to the helix in the other.³⁷ Four of the six examples of imino A:G base-pairs with the G 3′ to a helix are in single-nucleotide bulges adjacent to the A:G or A:A base-pair, where only one nucleotide remains unpaired.³⁸⁻⁴¹ In these instances, an imino conformation, with its non-helix-distorting properties, may be preferred over the sheared conformation.

We have investigated the A:G base-pair conformations in different structural motifs to determine whether the nucleotides surrounding an A:G base-pair influence its conformation. The A:G base-pairs in Figure 2 are color-coded for the GNRA tetraloop, E loop, and GA tandem motifs, and for the unincorporated A:G base-pairs when the G is 3′ to the helix. Our analysis revealed that the conformations of the A:G base-pairs are nearly identical in all of these motifs except the GNRA tetraloops (Figure 2(b) and (c), blue nucleotides), where the G of the GNRA tetraloop G:A sheared base-pair is shifted toward the major groove of the A. This shift is due to additional hydrogen bonds between the guanosine base and the backbone of the A in the tetraloop, and between the backbone atoms of the G and other bases in the loop.⁴² There is minimal conformational flexibility in tandem G:A base-pairs with sheared and imino conformations (Figure 2(b), (c), (e) and (f)). Imino base-pairs showed much less conformational flexibility than sheared base-pairs, regardless of whether the base-pair was 5′ or 3′ to the helix (Figure 2(e) and (f)).

Table 2. Distribution of AA and AG juxtapositions at the ends of helices in the structures in the PDB Structure Database

Loop type       Hairpin              Internal             Multi-stem           Total
Opposition      C[+,−]ᵃ   [S,I,O]ᵇ   C[+,−]ᵃ   [S,I,O]ᵇ   C[+,−]ᵃ   [S,I,O]ᵇ   C[+,−]ᵃ   [S,I,O]ᵇ
AA              18[11,7]  [11,0,0]   14[8,6]   [4,1,3]    6[4,2]    [1,1,2]    38[23,15] [16,2,5]
AGᶜ             27[23,4]  [23,0,0]   24[23,1]  [15,6,2]   3[2,1]    [2,0,0]    54[48,6]  [40,6,2]
GAᵈ             0[0,0]    [0,0,0]    7[5,2]    [0,5,0]    9[9,0]    [1,8,0]    16[14,2]  [1,13,0]
GA tandem       0[0,0]    [0,0,0]    6[6,0]    [4,2,0]    2[2,0]    [2,0,0]    8[8,0]    [6,2,0]
AG totals       27[23,4]  [23,0,0]   37[34,3]  [19,13,2]  14[13,1]  [5,8,0]    78[70,8]  [47,21,2]
Total           45[34,11] [34,0,0]   51[42,9]  [23,14,5]  20[17,3]  [6,9,2]    116[93,23] [63,23,7]

ᵃ C, number of examples of AA or AG juxtapositions at the ends of helices from crystal or NMR structures; +, base-pair is present; −, base-pair is absent.
ᵇ Conformation of AA or AG base-pairs present in the crystal or NMR structures: S, sheared; I, imino or imino-like; O, other.
ᶜ G is 3′ to the helix.
ᵈ G is 5′ to the helix.

Two consecutive A:G base-pairs can both form sheared base-pairs within a helix when the first G:A base-pair is followed by another A:G base-pair. Both A:G base-pairs distort the helix; however, they are oriented so that they offset one another and maintain the overall regularity of the helix.¹⁵,⁴³ There are six examples of tandem sheared G:A base-pairs in the database.

We have identified conformations for A:A base-pairs that are analogous to the sheared and imino A:G base-pairs. 61 % (23/38) of the AA oppositions at the ends of helices in the PDB NMR and crystal structure database (Table 2) are base-paired. There are five different A:A base-pairing conformations; two are analogous to the sheared and imino A:G conformations. The A:A N3-amino (A:A sheared) base-pair has one hydrogen bond, between N3 of the first adenosine and the amino group of the second (Figure 2(g), top⁴⁴); in comparison, the sheared A:G base-pair forms two hydrogen bonds, one from N3 of the guanosine to the adenosine amino group and the second between N7 of the A and the amino group of the G (Figure 2(a)).
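The contrast drawn here, one hydrogen bond for the A:A N3-amino pair versus two for the sheared A:G pair, can be captured in a small lookup table. The sketch below is a hypothetical data structure, not part of the original analysis. It lists the donor and acceptor atoms named in the text and in Figure 2(a), (d) and (g). Translating "amino group" and "carbonyl" into the standard atom names N6 (adenine), N2 (guanine) and O6 (guanine) is our assumption.

```python
# Hydrogen-bond partners for three conformations described in the text and in
# Figure 2. Atom names follow standard nucleobase nomenclature (our mapping of
# the text's wording): adenine amino = N6, guanine amino = N2,
# guanine carbonyl = O6, guanine imino = N1.
# Each tuple is (base, atom, partner base, partner atom).
HBONDS = {
    "G:A sheared (N3-amino, amino-N7)": [
        ("G", "N3", "A", "N6"),   # G N3 accepts from the A amino group
        ("A", "N7", "G", "N2"),   # A N7 accepts from the G amino group
    ],
    "G:A imino (carbonyl-amino, imino-N1)": [
        ("G", "O6", "A", "N6"),   # G carbonyl accepts from the A amino group
        ("G", "N1", "A", "N1"),   # G imino proton donated to A N1
    ],
    "A:A N3-amino (sheared)": [
        ("A", "N3", "A", "N6"),   # N3 of the first A to the amino group of the second
    ],
}


def hbond_count(conformation: str) -> int:
    """Number of inter-base hydrogen bonds listed for a conformation."""
    return len(HBONDS[conformation])


def atoms_used(conformation: str, base: str) -> set:
    """Atoms of `base` (in either position of the pair) that take part in hydrogen bonds."""
    atoms = set()
    for b1, a1, b2, a2 in HBONDS[conformation]:
        if b1 == base:
            atoms.add(a1)
        if b2 == base:
            atoms.add(a2)
    return atoms


for name in HBONDS:
    print(f"{name}: {hbond_count(name)} H-bond(s)")
```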
The A:A N1-amino (A:A imino-like) base-pair conformation forms a single hydrogen bond between N1 of one adenosine and the amino group of the second (Figure 2(g), bottom⁴⁴). Although the hydrogen-bonding pattern is different, the overall shape of the base-pair and the orientation of the backbone resemble those of the A:G imino conformation (Figure 2(d)). 70 % (16/23) of the A:A base-pairs in the PDB structure database (Table 2) are in the sheared (A:A N3-amino) conformation, and 9 % (2/23) are in the A:A imino-like (A:A N1-amino) conformation. The sheared A:A (A:A N3-amino) base-pairs occur at the D stem/hairpin loop junction in tRNAs and within A:A tandem base-pairs. Other sheared A:A (A:A N3-amino) base-pairs occur in a tetraloop and in the unincorporated 3′ helix end category. All 11 of the hairpin loops with an A:A base-pair have the sheared conformation, while 50 % (4/8) of the internal loops and 33 % (1/3) of the multi-stem loops have this conformation for the A:A base-pair. The imino-like A:A (A:A N1-amino) base-pairs occur in the unincorporated 3′ helix end category. The remaining 21 % (5/23) of the A:A base-pairs in the structure database have three other conformations, each with two hydrogen bonds, as opposed to a single hydrogen bond for the sheared (A:A N3-amino; Figure 2(g), top) and imino-like (A:A N1-amino; Figure 2(g), bottom) conformations.
There are three "A:A N7-amino, amino-N1" base-pairs, with hydrogen bonds between the Watson-Crick and Hoogsteen faces of each A: one from N7 of the first A to the amino group of the second, and one from N1 of the second A to the amino group of the first.³⁴ There is one "A:A N1-amino symmetric" base-pair, which is similar to the imino-like A:A (A:A N1-amino) base-pair but with one adenosine flipped so that two hydrogen bonds can form between N1 of each adenosine and the amino group of its partner.³⁴ There is also one "A:A N7-amino symmetric" base-pair, with hydrogen bonds between the N7 and amino groups of each A⁴⁴, which is analogous to a sheared A:G base-pair where the G is in the syn conformation.

A:A and A:G base-pairs that stack onto the ends of helices

Beyond the base-pairing of the AA and AG oppositions at the ends of helices, we have investigated the structures in the PDB structure database to determine whether these non-canonical base-pairs stack onto the adjoining base-pair of the helix to which they are adjacent. The results are affirmative: all but one of the 72 A:G and 23 A:A base-pairs are stacked. Stacking is defined here as one or both of the base-paired nucleotides overlapping with the adjoining base-pair in the helix. Examples of the three-dimensional structures for stacked A:G base-pairs in the sheared and imino conformations are shown in Online Figure 4.

The one exception to base stacking in the PDB structure database occurs in the mouse mammary tumor virus pseudoknot, where an A:A base-pair does not stack onto the end of the helix. This base-pair is composed of A14, situated between the two helices of the pseudoknot, and A6, located in one of the loops. It forms in only one of the two constructs of the mouse mammary tumor virus pseudoknot.
In the construct where A14 is unpaired, A14 stacks on G15 in the helix below.⁴⁵ In the construct where A14 is base-paired to A6, the A14:A6 base-pair does not stack on the G15:C5 base-pair at the end of the helix.⁴⁶

Burkard et al.⁴⁷ analyzed nucleotide stacking at the ends of helices in the PDB structure database and found that all AG oppositions at the ends of helices are base-paired and stacked when the G is 3′ to the helix. Our analysis of the rRNA crystal structures³¹,³³ revealed that both positions of the A:G base-pairs at the ends of helices are stacked in 78 % (21/27) of the cases in the 16 S rRNA and 88 % (36/41) of the cases in the 23 S rRNA (information about stacking is available from the base-pair frequency tables at CRW AA.AG). In the remaining six 16 S rRNA and five 23 S rRNA cases, one nucleotide of each A:G base-pair is stacked upon the neighboring base-pair. For A:A base-pairs, four of the five 16 S rRNA cases and all three of the 23 S rRNA cases have both nucleotides stacked; in the lone exception, one nucleotide of the A:A base-pair is stacked upon the neighboring base-pair. In total, stacking occurs on both positions in 78 % (25/32) of the 16 S rRNA base-pairs and 89 % (39/44) of the 23 S rRNA base-pairs. The remaining seven 16 S rRNA and five 23 S rRNA A:A and A:G base-pairs have only one of the two base-paired positions involved in stacking (see Online Table 4 at CRW AA.AG).

Coaxial stacking with A:A and A:G base-pairing at the helix interfaces

The ends of helices have a propensity to stack onto one another. Transfer RNA contains two sets of coaxial helices: the acceptor and TΨC helices, and the D and anticodon helices.⁴⁸ The two most common base-pairs at positions 26:44, at the top of the tRNA anticodon helix (and stacked onto the D stem), are G:A and A:G (see CRW AA.AG Online Table 7 for the base-pair frequencies at tRNA positions 26:44, Saccharomyces cerevisiae phenylalanine numbering). Other base-pairs present in more than 5 % of the sequences are A:A, A:U, A:C and U:A.

More recently, two sets of coaxial helices were identified in the crystal structure of the L11-binding region of 23 S rRNA (Figure 1(b)³⁰,⁴⁹). The lone-pair 1082:1086 (E. coli numbering) is stacked onto the 1057-1059/1079-1081 helix.
A second lone-pair, 1087:1102, is stacked onto the G1056:A1103 base-pair at the top of the 1051-1056/1103-1108 helix.

Given these two precedents for A:G and A:A base-pairs at the interface between coaxially stacked helices, we asked two questions: (1) are there other examples of this motif in the RNA structure database, and (2) is one of the functions of A:A and A:G base-pairs at the termini of helices to sit at the interface of two coaxially stacked helices?

23 of the 116 examples in the PDB structure database with an AA or AG at the end of a helix (Online Table 6) are adjacent to another helix; 21 of these are base-paired, while two are unpaired (Table 2). A Curves analysis was performed on these helix junctions to measure the angle between the two helices and the overall helix displacement.⁵⁰,⁵¹ Helices are considered to be coaxial when both the angle between the helix axes and their displacement are minimal, as discussed in Materials and Methods. Eight of the 21 examples in the structure database with an A:A or A:G base-pair at the end of one helix and adjacent to another helix occur at the anticodon/D helix junction in tRNA. All eight of these tRNA examples are coaxially stacked, with the G 5′ to the helix and the G:A base-pair in the imino conformation (the N1-amino conformation for the one A:A base-pair). In addition to the eight tRNA cases, eight more examples satisfy these strict criteria, including examples in 23 S rRNA and the RRE RNA.

However, there are a few cases where an A:A or A:G base-pair at the end of a helix is not coaxial with a second helix. The P4-P6 domain of the group I intron contains tandem G:A base-pairs in a multi-stem loop at positions 139:164 and 140:163. These pairs extend the P5b helix and flank and adjoin the P5a and P5c helices (PDB ID 1GID⁵²). The axis of the P5c helix (165-167/173-175), 3′ to the A139:G164 base-pair, continues at an angle of 94° to, and is displaced 11.7 Å from, the P5b helix ending in A:G.
The axis of the P5a helix (136-138/180-182), 5′ to the A139:G164 base-pair, has an angle of 42° to, and a 9.15 Å displacement from, the P5b helix ending in A:G. Helices P5a and P5c are therefore not considered to be coaxial with P5b. The second exception also occurs in the group I intron, where the P3 and P7 helices that end with A:A base-pairs (A269:A306 and A270:A104) are not coaxial.³⁶ Here, the two helices are separated by 3.9 Å and meet at an angle of 40°.

Of the 23 examples in the PDB database with an AA or AG opposition at the end of one helix and adjacent to another helix, 21 form an A:A or A:G base-pair; no such base-pair forms in the remaining two examples. In both of these cases, the helices are not coaxial with one another. In the first, the RNA is kinked at the internal loop junction by 170° and the axis is displaced by 16 Å when the spliceosomal U1A protein is bound to its RNA.⁵³ In the second, the helices flanking the unpaired AA opposition in the mouse mammary tumor virus pseudoknot junction are not stacked; the angle between the helices is 78° and the helix displacement is 5.3 Å.⁴⁵

As noted earlier, there are 115 cases with an AA or AG opposition at the end of a helix in the bacterial 16 S and 23 S rRNA secondary structure models. A total of 99 of these are at positions whose structures were determined in the 16 S and 23 S rRNA crystal structures, and 76 of these are base-paired in the two crystal structures. Some base-pairs in the crystal structures are not in the comparative structure models but are adjacent to A:A and A:G base-pairs at the ends of helices and immediately opposed to another helix with no intervening nucleotides. All such base-pairs were identified on the secondary structure diagrams in Figure 1.
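The secondary-structure criterion used here is mechanical enough to sketch in code: an A:A or A:G pair that ends one helix and is immediately opposed to another helix, with no intervening nucleotides on the connecting strand. The version below is only an illustration under stated assumptions: a single chain numbered 5′→3′, base-pairs supplied as a symmetric position map, and no special handling of pseudoknots. `make_pairs` and `coaxial_candidate` are hypothetical helpers, and the folds are toy examples rather than rRNA data.

```python
from typing import Dict, Tuple


def make_pairs(*helices: Tuple[int, int, int]) -> Dict[int, int]:
    """Build a symmetric pair map from (5' start, 3' end, length) helix tuples."""
    pairs: Dict[int, int] = {}
    for i, j, length in helices:
        for k in range(length):
            pairs[i + k] = j - k
            pairs[j - k] = i + k
    return pairs


def coaxial_candidate(pairs: Dict[int, int], seq: Dict[int, str], p: int) -> bool:
    """True if p:pairs[p] is an A:A or A:G pair that ends one helix and abuts
    a second helix with no unpaired nucleotide on the connecting strand."""
    q = pairs[p]
    p, q = min(p, q), max(p, q)
    if seq[p] + seq[q] not in {"AA", "AG", "GA"}:
        return False
    inward = pairs.get(p + 1) == q - 1    # stacked helix continues on the loop side
    outward = pairs.get(p - 1) == q + 1   # stacked helix continues on the far side
    if inward == outward:
        return False                      # helix interior or isolated pair: not a helix end
    open_side = (p - 1, q + 1) if inward else (p + 1, q - 1)
    return any(k in pairs for k in open_side)


# Toy tRNA-like fold: D stem 10-13/22-25, G26:A44, anticodon stem 27-31/39-43.
trna = make_pairs((10, 25, 4), (26, 44, 1), (27, 43, 5))
print(coaxial_candidate(trna, {26: "G", 44: "A"}, 26))    # → True

# Toy hairpin: stem 1-5/10-14 capped by a G6:A9 pair inside the loop.
hairpin = make_pairs((1, 14, 5), (6, 9, 1))
print(coaxial_candidate(hairpin, {6: "G", 9: "A"}, 6))    # → False
```

A candidate flagged this way is only a possible coaxial stack. Whether the two helices are actually stacked still has to be decided from the three-dimensional structure, as was done with the Curves measurements above.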
Two helices with an A:A or A:G base-pair at their interface and no unpaired nucleotides on the strand connecting them were considered possible coaxial helices and marked in Figure 1; those that are stacked in the crystal structures were also identified.

Discussion

Our goal is to predict base-pairs for positions with similar patterns of variation (covariation) and, more recently, for positions with either unique patterns of variation or no variation.
Toward this end, an earlier analysis of base-paired and unpaired nucleotides in covariation-based rRNA structure models revealed a significant bias for adenosines to be unpaired, and a more pronounced bias for unpaired As at the 3′ end of loops.²⁵ The same analysis also determined that Gs and As are the two most frequent nucleotides at the 5′ end of a loop. Given that the GA/AA opposition at positions 1056:1103 is base-paired in the 23 S rRNA L11 crystal structures,³⁰,⁴⁹ we have searched for other examples of AA and AG oppositions at the ends of helices.

AA and AG oppositions, base-pairs, and conformations at the ends of helices

Our analysis of the 16 S and 23 S rRNA covariation-based models revealed that AA and AG oppositions present in more than 90 % of the rRNA sequences are very common at the ends of helices. Of the approximately 400 oppositions at the end of a helix, more than 100 have a highly conserved AA or AG, or an AA/AG exchange. Before the 16 S and 23 S rRNA crystal structures were solved, our only examples with physical evidence for A:G and A:A base-pairs at the ends of helices were in the NMR and crystal structures available from the PDB structure database. Our analysis of both databases revealed, as discussed earlier, the following trends. (1) More than 75 % of these AA and AG oppositions are base-paired. (2) Of the AA and AG oppositions, AG oppositions occur more frequently and are base-paired at a higher percentage. (3) For the two AG orientations, the G is 3′ to the helix in approximately 90 % of the cases. (4) For the three loop categories, the highest percentage of base-pairing occurs in the hairpin loops, followed by internal and multi-stem loops. (5) Overall, the most common conformation for the base-paired oppositions is sheared. The imino and several unusual conformations occur at a much lower frequency.
The percentage of sheared conformations is higher for A:G base-pairs (versus A:A) and higher when the G is 3′ to the helix. In contrast, essentially all of the A:G base-pairs with the G 5′ to the helix have the imino conformation.

AA and AG oppositions that are not base-paired

While 80 % (93/116) of the AA and AG oppositions at the ends of helices from the PDB structure database are base-paired, 23 are not. 65 % (15/23) of these are AA oppositions and 35 % (8/23) are AG oppositions. For the 16 S and 23 S rRNA, the percentages of unpaired AA and AG oppositions are similar: 77 % (76/99) of the oppositions are base-paired, while 23 are not. Here, the highest percentage of non-pairing occurs for the invariant AA oppositions (67 %; 8/12), followed by AA/AG exchanges (26 %; 12/46) and invariant AGs (7 %; 3/41). It is not obvious why these oppositions are not base-paired while the majority of them are. A higher percentage of AA oppositions are not base-paired. For the 16 S and 23 S rRNA, a higher percentage of oppositions in multi-stem loops are also not base-paired: 42 % of the oppositions in multi-stem loops are unpaired, versus 5 % and 13 % of the oppositions in hairpin and internal loops (Table 1). There are no obvious sequence patterns flanking the oppositions that distinguish the paired from the unpaired. Perhaps the multi-stem loops have a higher percentage of unpaired oppositions because these regions of the RNA have more opportunities to interact with other positions in the multi-stem loop. Perhaps also the higher frequency of unpaired AA oppositions arises because these unpaired adenosines insert into the minor groove of helices, as recently documented for the A-minor motif⁵⁴ and type I/II base triples.⁵⁵

Alternatively, these AA and AG oppositions might not base-pair because one or both of these positions are involved in a standard base-base interaction with another region of the RNA, or in an interaction with a protein.
Some of the unpaired oppositions in the PDB database are associated with protein binding to the RNA, with pseudoknots, or with unusual base-pair conformations between one of the positions in the opposition and another position (entries with unpaired oppositions associated with proteins are: 1CN8, 1AUD, 1RNK, 1ZDI, 1ZDJ, 7MSF, 1YFG, 1C04, 1QA6, 1TLR, and 1GID). For the rRNAs, there are 23 oppositions that are not base-paired. In only four of these are the positions free of other intramolecular base-base interactions. Both positions are involved in other intramolecular RNA-RNA interactions in 12 oppositions, and one of the positions is so involved in seven oppositions (Figure 1).

In contrast, however, there are examples of A:A and A:G base-pairs at the ends of helices in the PDB database that also interact with proteins (entries with paired oppositions associated with proteins are: 1A4T, 1QFQ, 1D6K, 1DFU, 1ETF, 1ULL, 484D, 2TOB, 1NEM, and 1PBR). For the rRNAs, there are examples of A:A and A:G base-pairs at the ends of helices that interact with other positions in the rRNA crystal structures.³¹,³³ Thus, there is no simple explanation for why some of the AA and AG oppositions are not base-paired. However, one A:A/A:G base-pair at the end of a helix in the 16 S rRNA becomes unpaired during protein synthesis. This suggests that these AA and AG oppositions might not be static but are instead involved in movement (see below).

A:A and A:G base-pairs and conformations in larger motifs

In 1985, it was observed that the majority of the adenosines were unpaired in the E. coli 16 S rRNA covariation-based structure model.⁵⁶ More recently, this bias was found in a large collection of 16 S and 23 S rRNA structure models.²⁵ That study also found an even stronger bias for unpaired adenosines to be at the 3′ end of loops, and for guanines and adenines to occur at the 5′ end of loops. These biases are consistent with, and augment, our identification of AA and AG oppositions at the ends of helices. Other biases were identified in the distributions of nucleotides in the loops with these dominant adenosines at their 3′ ends, and several different structural motifs were mapped onto these regions of the 16 S and 23 S rRNA²⁵ (see also the 16 S and 23 S rRNA secondary structure figures with motifs mapped onto the oppositions at CRW AA.AG). These include adenosine platforms, E and E-like loops, tandem GAs, GNRA tetraloops, and U-turns. The AA and AG oppositions at the ends of helices are a component of these motifs, although not necessarily in every example of each motif. Sheared A:G base-pairs with the G 3′ to the helix are present in GNRA tetraloops, the E loop, tandem A:G base-pairs, and some of the U-turns. Thus, the sheared base-pairing conformation appears to be an important structural element in these larger structural motifs. The first motif, the GNRA tetraloop, is a common structural element in various RNAs, including the rRNAs.¹¹ The second motif is the E loop, which was first identified in the 5 S rRNA and subsequently observed in several other RNAs.¹⁶,⁵⁷⁻⁶¹ The third motif is tandem G:A base-pairs. Here, the A:G base-pairs arranged in tandem can be in the sheared or imino conformation. A single A:G base-pair in the sheared conformation, flanked by standard G:C or A:U base-pairs, would distort the helix. However, a second sheared A:G base-pair in the proper orientation would offset this distortion and bring the helix back into register.
An unexpectedly high number of tandem G:A base-pairs was identified by comparative sequence analysis of the rRNAs¹⁵,⁶² (a revised list of tandem GA oppositions in the rRNAs is available at CRW A Story). The U-turn is the fourth motif, in which the RNA backbone undergoes a sharp bend after the single-stranded U in a UNR sequence. This motif is most notably present in the anticodon and T loops of tRNAs.⁶³,⁶⁴ As revealed in a recent study of comparative structures of 16 S and 23 S rRNAs,¹⁸ the UNR sequence is sometimes flanked by an A:G base-pair and occurs within all three loop categories: hairpin, internal, and multi-stem. (We have also noted that there is usually an AG or AA opposition adjacent to the G:U base-pair associated with the adenosine platform¹⁴,²⁵ (see above).)

Given that these A:G base-pairs at the ends of helices are associated with several larger motifs, we have analyzed the conformation of the A:G base-pairs in various structural motifs. The conformations are nearly identical in all of these motifs, except for the GNRA tetraloops, where the G is shifted slightly (Figure 2(b) and (c)).

A:A and A:G base-pair and coaxial stacking

All but one of the A:A and A:G base-pairs at the ends of helices in the PDB database and the 16 S and 23 S rRNA crystal structures are stacked onto the end of the helix. This helix extension occurs for all of the A:A and A:G base-pairs in the structure database except one example in a conformationally constrained pseudoknot.⁴⁵ This preponderance of stacking is maintained in the rRNAs, as noted earlier.

Given the tendency of adjacent helices to stack coaxially onto one another, we asked whether A:A and A:G base-pairs at the interface of two helices might influence the coaxial stacking potential of these two helices. Our analysis of the structures in the PDB structure database was affirmative: 76 % (16/21) of adjacent helices with an A:A or A:G base-pair between them are coaxially stacked.
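The coaxial-stacking statistics rest on two measurements per junction: the angle between the helix axes and their displacement. The sketch below shows one simple way to compute both from axis vectors and apply a two-threshold test. It is not the Curves algorithm used in the study. The axis representation, the displacement definition, and the cut-offs `MAX_ANGLE_DEG` and `MAX_DISPLACEMENT_A` are hypothetical placeholders, because the paper's actual thresholds are given in its Materials and Methods. The placeholder thresholds also classify as non-coaxial the angle/displacement pairs reported in the Results for junctions that were judged non-coaxial.

```python
import math
from typing import Sequence

# Hypothetical cut-offs; the study's own thresholds are not reproduced here.
MAX_ANGLE_DEG = 30.0
MAX_DISPLACEMENT_A = 5.0


def _norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def axis_angle_deg(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle between two helix-axis direction vectors, in degrees.
    Assumes both vectors point along the same stacking direction."""
    cos = sum(a * b for a, b in zip(u, v)) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def axis_displacement(end1: Sequence[float], dir1: Sequence[float],
                      start2: Sequence[float]) -> float:
    """Perpendicular offset (Å) of helix 2's first axis point from the line
    that extends helix 1's axis beyond its last point."""
    d = [b - a for a, b in zip(end1, start2)]
    n = _norm(dir1)
    unit = [x / n for x in dir1]
    along = sum(a * b for a, b in zip(d, unit))
    return _norm([x - along * u for x, u in zip(d, unit)])


def is_coaxial(angle_deg: float, displacement: float,
               max_angle: float = MAX_ANGLE_DEG,
               max_disp: float = MAX_DISPLACEMENT_A) -> bool:
    """Coaxial only if both the inter-axis angle and the displacement are small."""
    return angle_deg <= max_angle and displacement <= max_disp


# (angle in degrees, displacement in Å) quoted in the Results for non-coaxial junctions.
reported = {
    "P5b/P5c": (94, 11.7),
    "P5b/P5a": (42, 9.15),
    "P3/P7": (40, 3.9),
    "U1A complex": (170, 16),
    "MMTV pseudoknot": (78, 5.3),
}
for name, (ang, disp) in reported.items():
    print(f"{name:16s} coaxial={is_coaxial(ang, disp)}")
```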
It has previously been shown that coaxial stacking at helix junctions stabilizes the structure by about 2 kcal/mol.⁶⁵⁻⁶⁶ Additional studies confirmed that A:G base-pairs at the junction between coaxially stacked helices contribute the same energy as U:A base-pairs, while tandem GAs are almost as stabilizing as single AGs in a junction.⁶⁷

The analysis of the potential coaxial helices in the 16 S and 23 S rRNA revealed mixed results. A total of 11 of the 12 (92 %) potential coaxial helices are stacked in the 16 S rRNA crystal structure (Figure 1(a); base-pair frequency tables at CRW AA.AG). However, only 11 of the 22 (50 %) potential coaxially stacked helices are actually stacked in the 23 S rRNA crystal structure (Figure 1(b) and (c); base-pair frequency tables at CRW AA.AG).

Conformational changes in the 16 S rRNA A-site

In our paper about unpaired adenosines in the covariation-based rRNA structure models,²⁵ we observed that some of the positions involved in AA and AG oppositions at the ends of helices also occur in adenosine platforms, E and E-like loops, tandem GAs, and U-turn sequence motifs. We speculated that conformational rearrangements might be necessary if both of these sequence motifs fold into their respective structural motifs. The crystal structure of the A-site in 16 S rRNA has been determined in the presence and absence of the antibiotics paromomycin, streptomycin, and spectinomycin,⁶⁸ initiation factor 1 (IF1),⁶⁹ and mRNA/tRNA.⁷⁰ These crystal structures revealed the status of the 1408:1493 AA/AG opposition at the end of a helix. This opposition is adjacent to the invariant C1407:G1494 base-pair. Position 1408 is an A in more than 99 % of the bacteria, 98 % of the chloroplasts, and 96 % of the mitochondria (see Online Table 4(a) at CRW AA.AG, and the individual nucleotide frequency tables at the CRW Site). All of these sequences that do not have an A at position 1408 have a G. More than 99 % of the Eucarya 16 S-like rRNA sequences have a G at position 1408; the remaining sequences have an A. 70 % of the Archaea 16 S rRNA sequences have an A at position 1408, while the remaining 30 % have a G. Position 1493 is an A in more than 99 % of all 16 S and 16 S-like rRNA sequences. Position 1492 is equally conserved, with an adenosine in more than 99 % of all 16 S and 16 S-like rRNA sequences (CRW Site Single Base Frequency Tables). Thus, in the Bacteria, chloroplasts, mitochondria, and 70 % of the Archaea, the 1408:1493 opposition is an AA, while it is a GA in the Eucarya and 30 % of the Archaea.

Positions 1408:1493 form an A:A base-pair in the T. thermophilus 30 S ribosomal subunit crystal structure that is not complexed with antibiotics, IF1, or mRNA/tRNA³³ (Online Table 8). They are unpaired in the three different crystal structures that are complexed with the antibiotics paromomycin, streptomycin, and spectinomycin, with IF1, and with an mRNA/tRNA codon-anticodon helix. When positions 1408:1493 are not base-paired, the two invariant adenines at positions 1492 and 1493 are flipped out of the helix and are available for interactions with IF1 and the codon-anticodon helix. The unpairing of 1408:1493 and the movement of positions 1492 and 1493 from the inside to the outside of the helix are accompanied by minor changes in the bend angle and in the displacement of the coaxial stack flanking both sides of the 1408:1493 opposition (Online Table 8). The base-pairs in proximity to the 1408:1493 opposition (C1399:G1504, G1401:C1501, C1402:A1500, C1404:G1497, G1405:C1496, U1406:U1495, C1407:G1494, C1409:G1491, G1410:C1490, C1411:G1489, and C1412:G1488) are all base-paired both in the presence and in the absence of these molecules involved in protein synthesis (Online Table 8).
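Statements such as "position 1408 is an A in more than 99 % of the bacteria" come from tallying nucleotides at aligned positions. So does the earlier identification of AA and AG oppositions present in more than 90 % of rRNA sequences. The minimal sketch below shows that tally for two aligned columns. It assumes aligned sequences stored as one string per column, with one character per sequence in the same order; this layout is hypothetical. The data are toy values, not real alignment counts. The 90 % threshold mirrors the figure mentioned earlier, and skipping gapped sequences is our own choice. `opposition_profile` and `is_conserved_aa_ag` are illustrative names.

```python
from collections import Counter
from typing import Dict, Tuple

AG_LIKE = {"AA", "AG", "GA"}


def opposition_profile(col_i: str, col_j: str) -> Tuple[Dict[str, float], int]:
    """Fraction of each base combination at two aligned positions.

    col_i and col_j hold one character per sequence (same sequence order);
    sequences with a gap at either position are skipped.
    """
    counts = Counter(a + b for a, b in zip(col_i, col_j) if a != "-" and b != "-")
    total = sum(counts.values())
    return {combo: n / total for combo, n in counts.items()}, total


def is_conserved_aa_ag(profile: Dict[str, float], threshold: float = 0.90) -> bool:
    """True if AA plus AG/GA combinations exceed `threshold` of the sequences."""
    return sum(f for combo, f in profile.items() if combo in AG_LIKE) > threshold


# Toy columns for an opposition like 1408:1493 (not real alignment data):
# nine sequences with A:A, one with G:A, and one sequence gapped at 1408.
pos_1408 = "AAAAAAAAAG-"
pos_1493 = "AAAAAAAAAAA"
profile, n = opposition_profile(pos_1408, pos_1493)
print(n, profile, is_conserved_aa_ag(profile))
```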
The conserved, but not invariant, A1413:G1487 base-pair is predominantly A:G in the Bacteria, Archaea, and chloroplasts, U:A in the Eucarya, and C:G in the mitochondria (see CRW Site base-pair frequency tables for 16 S rRNA). It is base-paired in the imino conformation in three of the four crystal structures and is unpaired in the presence of IF1. These results reveal that the 1408:1493 AA/AG opposition at the end of the helix is involved in a conformational rearrangement directly associated with protein synthesis. This region of the A-site contains a set of commonly occurring rRNA motifs, described earlier.²⁵ More than 50 % (527 in total) of the 3′ ends of loops in 16 S and 23 S rRNA contain a conserved adenosine in the covariation-based structure models. 56 (11 %) of these "A-motifs" are flanked by an A on their 5′ side and a paired G on their 3′ side. This highly conserved AAG motif occurs at 16 S rRNA positions 1492-1494. While this sequence motif contains some of the features characteristic of the adenosine platform,²⁴,²⁵ we do not know whether positions 1492 and 1493 are base-paired at some stage in protein synthesis, as they are in the adenosine platform.

Concluding statement

Our analysis of the PDB structure database and the 16 S and 23 S rRNA crystal structures revealed general similarities between the two data sets. Both show higher than expected frequencies of AA and AG oppositions at the ends of helices and similar extents of base-pairing (80 % for the PDB, 77 % for the two rRNAs). In both data sets, the frequencies of AG oppositions and of base-paired AG oppositions were higher than those of AA oppositions and their base-pairs. As well, the frequency of oppositions that are base-paired is highest for the hairpin loops in both data sets, followed by internal and multi-stem loops for the rRNAs.
The frequencies of A:G base-pairs (when the G is 3′ to the helix) in the sheared conformation are significantly higher than the frequencies of imino and other unusual conformations for both data sets, while essentially all of the A:G base-pairs with the G 5′ to the helix are in the imino conformation for both data sets. The sheared conformation occurs in 100 % of the A:A/A:G base-pairs at the ends of helices in hairpin loops in both data sets, a lower percentage in internal loops (82 % (27/33) in rRNA, 55 % (23/42) in the PDB), and the lowest percentage in multi-stem loops (61 % (14/23) in rRNA, 35 % (6/17) in the PDB). In contrast, the imino conformation occurs at the lowest percentage in hairpin loops (0 % in both data sets), a higher percentage in internal loops (9 % (3/33) in rRNA, 33 % (14/42) in the PDB), and the highest percentage in multi-stem loops (13 % (3/23) in rRNA, 53 % (9/17) in the PDB). Other conformations occur in both data sets, although only in internal and multi-stem loops. For the rRNAs, they are more prevalent than imino conformations, especially in multi-stem loops (Table 1). All of these A:A/A:G base-pairs are stacked in some form onto the flanking helix. The one major, anomalous difference between the two data sets is in coaxial stacking. 91 % (21/23) of the potential coaxial stacks in the PDB database are coaxial. For 16 S rRNA, this number is 92 % (11/12). However, for 23 S rRNA, it is only 50 % (11/22). The combined total for 16 S and 23 S rRNA is 65 % (22/34).

A:A and A:G base-pairs at the ends of helices are associated with several different structural motifs, including E loops, U-turns, and GNRA tetraloops. While the majority of the AA and AG oppositions are base-paired, approximately 25 % of them are not. The percentage of unpaired AA oppositions is higher than that of unpaired AG oppositions. For the ribosomal RNAs, the highest percentage of unpaired oppositions is found in the multi-stem loops.
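The coaxial-stacking comparison above rests on the geometric test given in Materials and Methods: two helices count as coaxial when the angle between their linear axes is below 30° and the axis displacement is below 5 Å. The Python sketch below applies that test and shows why the combined rRNA figure is computed from pooled counts. It is only an illustration under stated assumptions, not the Curves analysis used in the study: the helper names are hypothetical, axes are treated as unoriented lines, and the displacement is approximated as the perpendicular distance from a point on one axis to the line of the other.

```python
import math

# Thresholds quoted in Materials and Methods.
MAX_ANGLE_DEG = 30.0
MAX_DISPLACEMENT_ANGSTROM = 5.0


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _norm(a):
    return math.sqrt(_dot(a, a))


def axis_angle_deg(u, v):
    """Angle (0-90 degrees) between two helix axis directions.

    Assumption: axes are treated as unoriented lines, so the sign of
    each direction vector is ignored.
    """
    cos_angle = abs(_dot(u, v)) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(min(1.0, cos_angle)))


def point_to_axis_distance(point, origin, direction):
    """Perpendicular distance from `point` to the line origin + t * direction.

    Hypothetical stand-in for the Curves axis displacement.
    """
    offset = [p - o for p, o in zip(point, origin)]
    t = _dot(offset, direction) / _dot(direction, direction)
    foot = [o + t * d for o, d in zip(origin, direction)]
    return _norm([p - f for p, f in zip(point, foot)])


def is_coaxial(angle_deg, displacement_angstrom):
    """Apply the 30 degree / 5 Angstrom criterion to one pair of helices."""
    return (angle_deg < MAX_ANGLE_DEG
            and displacement_angstrom < MAX_DISPLACEMENT_ANGSTROM)


def pooled_fraction(tallies):
    """Pool (coaxial, potential) tallies by summing counts, not averaging percentages."""
    coaxial = sum(c for c, _ in tallies)
    potential = sum(n for _, n in tallies)
    return coaxial / potential


# The tRNA D stem / anticodon stem baseline (17.17 degrees, 3.36 Angstrom) passes.
print(is_coaxial(17.17, 3.36))  # True

# Toy axes: the second axis is tilted 45 degrees, so this pair fails the test.
angle = axis_angle_deg((0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
shift = point_to_axis_distance((3.0, 4.0, 10.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
print(round(angle, 1), shift, is_coaxial(angle, shift))  # 45.0 5.0 False

# 16 S: 11 of 12 coaxial; 23 S: 11 of 22 coaxial (counts from the text above).
print(round(100 * pooled_fraction([(11, 12), (11, 22)])))  # 65
```

Pooling the counts weights each molecule by its number of potential stacks; simply averaging the 16 S and 23 S percentages (92 % and 50 %) would instead give about 71 %.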
Currently, there is no obvious explanation for why 25 % of the oppositions are not base-paired. However, given that the 16 S rRNA 1408:1493 AA/AG opposition is dynamic, changing its form from paired to unpaired during protein synthesis, we wonder whether the states of other AA/AG oppositions at the ends of helices are also dynamic and associated with ribosomal movement during assembly and protein synthesis.⁷¹,⁷²

Materials and Methods

The rRNA sequence alignments used for this analysis are maintained by us at the University of Texas and are available from the CRW AA.AG Site (see below). Sequences were manually aligned with the alignment editor AE2 (T. Macke, Scripps Clinic, San Diego, CA). Our analysis of the AA and AG oppositions at the ends of helices was performed on this large collection of 16 S and 23 S rRNA sequences that span the three primary phylogenetic lineages and the two Eucarya organelles, as outlined in Table 3. The numbering systems from the E. coli 16 S and 23 S rRNA sequences (GenBank Accession no. J01695) are used as the references for position numbers for both 16 S and 23 S rRNAs.

AA and AG oppositions at the ends of helices in the most recent (December 1999) 16 S and 23 S rRNA E. coli covariation-based structure models (CRW Site; see below) were manually identified. Each candidate was classified into one of three loop types: hairpin, internal or multi-stem. The program query (Gutell et al., unpublished) was used to collect single-nucleotide and base-pair frequency data from the (AE2) sequence alignments. Base frequencies for each candidate were computed independently from each of the alignments (16 S and 23 S rRNAs; bacterial, archaeal, and eucaryal nuclear, chloroplast, and mitochondrial). AA.AG@helix.ends candidates with greater than 90 % AA, AG or AA/AG (with the G 3′ to the helix for AG and AA/AG oppositions) in the bacterial alignment were considered further.
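The candidate filter described above can be sketched as follows. This is not the unpublished query program; it is a minimal Python illustration that assumes each alignment is a dict mapping sequence names to gapped, equal-length sequences, that the caller supplies the two 0-based alignment columns of an opposition, and that sequences with a gap or ambiguity code at either column are skipped. Each opposition is keyed with the nucleotide lying 3′ of a helix strand first, so ('G', 'A') is an AG opposition with the G 3′ to the helix. The function names are hypothetical.

```python
from collections import Counter

NUCLEOTIDES = set("ACGU")


def opposition_counts(alignment, col_3p, col_5p):
    """Tally the oppositions found at two columns of an alignment.

    alignment: dict mapping sequence name -> aligned (gapped) sequence.
    col_3p: 0-based column of the nucleotide lying 3' of one helix strand.
    col_5p: 0-based column of the nucleotide lying 5' of the other strand.
    Keys are (base at col_3p, base at col_5p), so ('G', 'A') is an AG
    opposition with the G 3' to the helix.
    """
    counts = Counter()
    for seq in alignment.values():
        b3 = seq[col_3p].upper().replace("T", "U")
        b5 = seq[col_5p].upper().replace("T", "U")
        # Assumption: skip sequences with a gap or ambiguity code at either column.
        if b3 in NUCLEOTIDES and b5 in NUCLEOTIDES:
            counts[(b3, b5)] += 1
    return counts


def classify_candidate(counts, threshold=0.90):
    """Return 'AA', 'AG', 'AA/AG', or None using the >90 % rule."""
    total = sum(counts.values())
    if total == 0:
        return None
    frac_aa = counts[("A", "A")] / total
    frac_ag = counts[("G", "A")] / total  # G 3' to the helix
    if frac_aa > threshold:
        return "AA"
    if frac_ag > threshold:
        return "AG"
    if frac_aa + frac_ag > threshold:
        return "AA/AG"
    return None


# Toy bacterial alignment (invented sequences): columns 1 and 4 hold the
# opposition; six sequences have AA, four have GA, and one is gapped.
pairs = [("A", "A")] * 6 + [("G", "A")] * 4
toy = {f"seq{i}": "C" + b3 + "UU" + b5 + "G" for i, (b3, b5) in enumerate(pairs)}
toy["gapped"] = "C-UUAG"

counts = opposition_counts(toy, col_3p=1, col_5p=4)
print(counts[("A", "A")], counts[("G", "A")])  # 6 4
print(classify_candidate(counts))  # AA/AG
```

The strict > comparison mirrors "greater than 90 %". Because the key is oriented, an A:G with the G 5′ to the helix (key ('A', 'G')) does not count toward the AG or AA/AG categories.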
The comparative sequence analysis data are summarized in Table 1 and presented in greater detail in Online Table 4 at CRW AA.AG (see below).

Supplementary data that augment the Tables and Figures in this manuscript are available from the CRW AA.AG@helix.ends pages (abbreviated as CRW AA.AG; http://www.rna.icmb.utexas.edu/ANALYSIS/AAAG/), the CRW Site (http://www.rna.icmb.utexas.edu), and the CRW A Story pages (http://www.rna.icmb.utexas.edu/ANALYSIS/A-STORY/). The information available at CRW AA.AG includes: base-pair frequency tables for all of the AA and AG oppositions at the ends of helices that occur in more than 90 % of the bacterial sequences (Online Table 4); tables of the PDB structures analyzed in Table 2 (Online Table 5) and for the coaxial stacking analysis (Online Table 6); chemical structure diagrams for all of the base-pair types described here (Online Figure 3); and 16 S and 23 S rRNA secondary structure diagrams showing the AA/AG oppositions, potential coaxial stackings (Figure 1) and multiple motifs (Online Figure 5).

The tabulated information in Table 1 is culled from Online Table 4 (16 S and 23 S rRNA base-pair frequency tables), which includes: (1) the percent occurrences for all 16 base-pairing types (e.g., A:A, A:C, A:G, etc.) at each of the AA and AG sites in five alignments (Bacteria, Archaea, Eucarya nuclear, chloroplasts and mitochondria); (2) the exchange patterns between AA and AG; (3) the loop type (hairpin, internal, or multi-stem); (4) any associated motifs (e.g. E loop); and (5) for all of the oppositions that are base-paired in the rRNA crystal structures, four additional entries: (a) a RasMol⁷³,⁷⁴ image of that base-pair created from the crystal structures (16 S rRNA, PDB ID 1FJF;³³ 23 S rRNA, PDB ID 1FFK³¹); (b) the conformation of the base-pair;³⁴,⁴⁴ (c) identification of the nucleotides of the opposition that stack onto the adjoining helix; and (d) the adjoining base-pair(s) upon which the opposition stacks.
The online tables describing the PDB structures (Online Tables 5 and 6) present, for each of the NMR and crystal structures, an expanded description of the experimental systems, RasMol⁷³,⁷⁴ images highlighting the AA and AG oppositions, links to the MEDLINE abstract, and additional information pertinent to that analysis.

The secondary structure Figures showing the AA.AG@helix.ends sites (Figure 1) and additional secondary structure diagrams at CRW AA.AG (Online Figure 5) were generated using the interactive graphics program XRNA (Weiser & Noller, University of California, Santa Cruz). Chemical structures were generated using ISIS/Draw and CS ChemDraw Std. 3D images were generated using Insight II.

The PDB file for each rRNA crystal structure was visualized using RasMol.⁷³,⁷⁴ The conformation³⁴,⁴⁴ of each base-pair was assessed.

We have analyzed the A:A and A:G oppositions at the ends of helices in the NMR and crystal structures from the PDB.³⁵ Only one structure was analyzed when that structure was solved more than once with the same method. For NMR structures, we analyzed either the minimized average structure (when available) or the first structure. Both NMR and crystal structures were analyzed when a single structure was solved using both methods. For sequences determined by both X-ray crystallography and NMR spectroscopy, we analyzed one structure from each method. Both the free and bound forms were analyzed when the same RNA construct was solved in the presence and absence of protein or other ligands.

Table 3. Approximate number of sequences in the 16 S and 23 S rRNA alignments

                                                  No. of sequences(b)
Alignment ID(a)   Phylogenetic group/organelle    16 S rRNA    23 S rRNA
B                 Bacteria                        5850         325
A                 Archaea                         260          40
C                 Chloroplast                     180          100
E                 Eucarya                         1050         265
M                 Mitochondria                    160          310
Total             All                             8500         1040

(a) Single-letter code used to identify the alignment in the base-pair frequency tables (Online Table 4 at CRW AA.AG).
(b) Approximate number of sequences in each alignment at the time of this analysis.

Base-pairs were extracted from PDB files and superimposed using Insight II. The atoms in the base of each adenine in A:G base-pairs were superimposed. For A:A base-pairs, the atoms in one adenine were superimposed so that the other adenine of the base-pair sat on the major groove side of the superimposed adenines. Base stacking was evaluated manually using Insight II and RasMol.

A Curves analysis⁵⁰,⁵¹ was used to assess whether adjacent helices were coaxial by determining the angle and axis displacement between the best linear axes of these helices. Linear axes were calculated for helices with three or more base-pairs, including the terminal A:G or A:A base-pair. When the A was not base-paired, this nucleotide was not included in axis calculations. Coaxial helices should theoretically have no axis displacement and little or no angle between axes. The D stem and anticodon stem are relatively coaxial in the tRNA three-dimensional structure. In this case, the average angle between the axes of the anticodon stem (ending in an imino A:G base-pair) and the D stem is 17.17° and the axis displacement is 3.36 Å for the eight structures studied. These values were used as a baseline to determine whether the axes in other structures were also coaxially stacked, accounting for a range of normal base-pair helicoidal parameters at the junctions. For the analysis of the full set of 21 examples, we considered two helices to be coaxial when the angle between them was less than 30° and the helix displacement was less than 5 Å.

Note Added in Proof

A re-analysis of the 50 S ribosomal crystal structure revealed that the 2650 helix in 23 S rRNA (Figure 1(c), page 740) is coaxially stacked, and thus should be colored yellow and not brown. The counts of coaxially stacked helices on pages 748 and 749 have been corrected.

Acknowledgments

We greatly appreciate the constructive comments from both reviewers.
This work was supported by the NIH (GM48207, awarded to R.R.G.; GM56544, awarded to S.C.H.) and by startup funds from the Institute for Cellular and Molecular Biology at the University of Texas at Austin and the Welch Foundation (both awarded to R.R.G.).

References

1. Mathews, D. H., Sabina, J., Zuker, M. & Turner, D. H. (1999). Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 288, 911-940.
2. Zuker, M., Mathews, D. H. & Turner, D. H. (1999). Algorithms and thermodynamics for RNA secondary structure prediction: a practical guide. In RNA Biochemistry and Biotechnology (Barciszewski, J. & Clark, B. F. C., eds), pp. 11-43, Kluwer Academic Publishers.
3. Konings, D. A. M. & Gutell, R. R. (1995). A comparison of thermodynamic foldings with comparatively derived structures of 16 S and 16 S-like rRNAs. RNA, 1, 559-574.
4. Fields, D. S. & Gutell, R. R. (1996). An analysis of large rRNA sequences folded by a thermodynamic method. Fold. Des. 1, 419-430.
5. Woese, C. R. & Pace, N. R. (1993). Probing RNA structure, function, and history by comparative analysis. In The RNA World (Gesteland, R. F. & Atkins, J. F., eds), pp. 91-118, Cold Spring Harbor Laboratory Press, Plainview, New York.
6. Gutell, R. R., Larsen, N. & Woese, C. R. (1994). Lessons from an evolving rRNA: 16 S and 23 S rRNA structures from a comparative perspective. Microbiol. Rev. 58, 10-26.
7. Gutell, R. R. (1999). Comparative analysis of RNA sequences. Nucl. Acids Symp. Ser. 41, 48-53.
8. Gutell, R. R. (1996). Comparative sequence analysis and the structure of 16 S and 23 S rRNA. In Ribosomal RNA. Structure, Evolution, Processing, and Function in Protein Biosynthesis (Zimmerman, R. A. & Dahlberg, A. E., eds), pp. 111-128, CRC Press, Boca Raton.
9. Gautheret, D. & Gutell, R. R. (1997). Inferring the conformation of RNA base pairs and triples from patterns of sequence variation. Nucl. Acids Res. 25, 1559-1564.
10. Michel, F., Costa, M., Massire, C. & Westhof, E. (2000). Modeling RNA tertiary structure from patterns of sequence variation. Methods Enzymol. 317, 491-510.
11. Woese, C. R., Winker, S. & Gutell, R. R. (1990). Architecture of ribosomal RNA: constraints on the sequence of tetra-loops. Proc. Natl Acad. Sci. USA, 87, 8467-8471.
12. Gutell, R. R., Noller, H. F. & Woese, C. R. (1986). Higher order structure in ribosomal RNA. EMBO J. 5, 1111-1113.
13. Lehnert, V., Jaeger, L., Michel, F. & Westhof, E. (1996). New loop-loop tertiary interactions in self-splicing introns of subgroup IC and ID: a complete 3D model of the Tetrahymena thermophila ribozyme. Chem. Biol. 3, 993-1009.
14. Gautheret, D., Konings, D. & Gutell, R. R. (1995). G.U base pairing motifs in ribosomal RNA. RNA, 1, 807-814.
15. Gautheret, D., Konings, D. & Gutell, R. R. (1994). A major family of motifs involving G.A mismatches in ribosomal RNA. J. Mol. Biol. 242, 1-8.
16. Wimberly, B. (1994). A common RNA loop motif as a docking module and its function in the hammerhead ribozyme. Nature Struct. Biol. 1, 820-827.
17. Leontis, N. B. & Westhof, E. (1998). A common motif organizes the structure of multi-helix loops in 16 S and 23 S ribosomal RNAs. J. Mol. Biol. 283, 571-583.
18. Gutell, R. R., Cannone, J. J., Konings, D. & Gautheret, D. (2000). Predicting U-turns in ribosomal RNA with comparative sequence analysis. J. Mol. Biol. 300, 791-803.
19. Michel, F. & Westhof, E. (1990). Modeling of the three-dimensional architecture of group I catalytic introns based upon comparative sequence analysis. J. Mol. Biol. 216, 585-610.
20. Gautheret, D., Damberger, S. H. & Gutell, R. R. (1995). Identification of base triples in RNA using comparative sequence analysis. J. Mol. Biol. 248, 27-43.
21. Jaeger, L., Michel, F. & Westhof, E. (1994). Involvement of a GNRA tetraloop in long-range RNA tertiary interactions. J. Mol. Biol. 236, 1271-1276.
22. Costa, M. & Michel, F. (1995). Frequent use of the same tertiary motif by self-folding RNAs. EMBO J. 14, 1276-1285.
23. Costa, M. & Michel, F. (1997). Rules for RNA recognition of GNRA tetraloops deduced by in vitro selection: comparison with in vivo evolution. EMBO J. 16, 3289-3302.
24. Cate, J. H., Gooding, A. R., Podell, E., Zhou, K., Golden, B. L., Szewczak, A. A., Kundrot, C. E., Cech, T. R. & Doudna, J. A. (1996). RNA tertiary structure mediation by adenosine platforms. Science, 273, 1696-1699.
25. Gutell, R. R., Cannone, J. J., Shang, Z., Du, Y. & Serra, M. (2000). A story: unpaired adenosine bases in ribosomal RNAs. J. Mol. Biol. 304, 335-354.
26. Hermann, T. & Patel, D. J. (1999). Stitching together RNA tertiary architectures. J. Mol. Biol. 294, 829-849.
27. Moore, P. B. (1999). Structural motifs in RNA. Annu. Rev. Biochem. 68, 287-300.
28. Traub, W. & Sussman, J. L. (1982). Adenine-guanine base pairing ribosomal RNA. Nucl. Acids Res. 10, 2701-2708.
29. Woese, C. R., Gutell, R., Gupta, R. & Noller, H. F. (1983). Detailed analysis of the higher-order structure of 16 S-like ribosomal ribonucleic acids. Microbiol. Rev. 47, 621-669.
30. Conn, G. L., Draper, D. E., Lattman, E. E. & Gittis, A. G. (1999). Crystal structure of a conserved ribosomal protein-RNA complex. Science, 284, 1171-1174.
31. Ban, N., Nissen, P., Hansen, J., Moore, P. B. & Steitz, T. A. (2000). The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science, 289, 905-920.
32. Schluenzen, F., Tocilj, A., Zarivach, R., Harms, J., Gluehmann, M., Janell, D., Bashan, A., Bartels, H., Agmon, I., Franceschi, F. & Yonath, A. (2000). Structure of functionally activated small ribosomal subunit at 3.3 Å resolution. Cell, 102, 615-623.
33. Wimberly, B. T., Brodersen, D. E., Clemons, W. M., Jr, Morgan-Warren, R. J., Carter, A. P., Vonhein, C., Hartsch, T. & Ramakrishnan, V. (2000). Structure of the 30 S ribosomal subunit. Nature, 407, 327-339.
34. Burkard, M. E., Turner, D. H. & Tinoco, I., Jr (1999). Structures of base pairs involving at least two hydrogen bonds. In The RNA World (Gesteland, R. F., Cech, T. R. & Atkins, J. F., eds), 2nd edit., pp. 675-680, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.
35. Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). The Protein Data Bank. Nucl. Acids Res. 28, 235-242.
36. Golden, B. L., Gooding, A. R., Podell, E. R. & Cech, T. R. (1998). A preorganized active site in the crystal structure of the Tetrahymena ribozyme. Science, 282, 259-264.
37. Wu, M. & Turner, D. H. (1996). Solution structure of (rGCGGACGC)2 by two-dimensional NMR and the iterative relaxation matrix approach. Biochemistry, 35, 9677-9689.
38. Rowsell, S., Stonehouse, N. J., Convery, M. A., Adams, C. J., Ellington, A. D., Hirao, I., Peabody, D. S., Stockley, P. G. & Phillips, S. E. (1998). Crystal structures of a series of RNA aptamers complexed to the same protein target. Nature Struct. Biol. 5, 970-975.
39. Peterson, R. D. & Feigon, J. (1996). Structural change in Rev responsive element RNA of HIV-1 on binding Rev peptide. J. Mol. Biol. 264, 863-877.
40. Battiste, J., Mao, H., Rao, N., Tan, R., Muhandiram, D., Kay, L., Frankel, A. & Williamson, J. (1996). Alpha helix-RNA major groove recognition in an HIV-1 rev peptide-RRE RNA complex. Science, 273, 1547-1551.
41. Ye, X., Gorin, A., Ellington, A. D. & Patel, D. J. (1996). Deep penetration of an alpha-helix into a widened RNA major groove in the HIV-1 rev peptide-RNA aptamer complex. Nature Struct. Biol. 3, 1026-1033.
42. Jucker, F. M., Heus, H. A., Yip, P. F., Moors, E. H. & Pardi, A. (1996). A network of heterogeneous hydrogen bonds in GNRA tetraloops. J. Mol. Biol. 264, 968-980.
43. SantaLucia, J. J. & Turner, D. H. (1993). Structure of (rGGCGAGCC)2 in solution from NMR and restrained molecular dynamics. Biochemistry, 32, 12612-12623.
44. Nagaswamy, U., Voss, N., Zhang, Z. & Fox, G. E. (2000). Database of non-canonical base pairs found in known RNA structures. Nucl. Acids Res. 28, 375-376.
45. Shen, L. X. & Tinoco, I. J. (1995). The structure of an RNA pseudoknot that causes efficient frameshifting in mouse mammary tumor virus. J. Mol. Biol. 247, 963-978.
46. Kang, H., Hines, J. V. & Tinoco, I. J. (1996). Conformation of a non-frameshifting RNA pseudoknot from mouse mammary tumor virus. J. Mol. Biol. 259, 135-147.
47. Burkard, M. E., Kierzek, R. & Turner, D. H. (1999). Thermodynamics of unpaired terminal nucleotides on short RNA helixes correlates with stacking at helix termini in larger RNAs. J. Mol. Biol. 290, 967-982.
48. Sussman, J. L., Holbrook, S. R., Warrant, R. W., Church, G. M. & Kim, S.-H. (1978). Crystal structure of yeast phenylalanine T-RNA. I. Crystallographic refinement. J. Mol. Biol. 123, 607-630.
49. Wimberly, B. T., Guymon, R., McCutcheon, J. P., White, S. W. & Ramakrishnan, V. (1999). A detailed view of a ribosomal active site: the structure of the L11-RNA complex. Cell, 97, 491-502.
50. Lavery, R. & Sklenar, H. (1988). The definition of generalized helicoidal parameters and of axis curvature for irregular nucleic acids. J. Biomol. Struct. Dynam. 6, 63-91.
51. Lavery, R. & Sklenar, H. (1989). Defining the structure of irregular nucleic acids: conventions and principles. J. Biomol. Struct. Dynam. 6, 655-667.
52. Cate, J. H., Gooding, A. R., Podell, E., Zhou, K., Golden, B. L., Kundrot, C. E., Cech, T. R. & Doudna, J. A. (1996). Crystal structure of a group I ribozyme domain: principles of RNA packing. Science, 273, 1678-1685.
53. Allain, F. H., Howe, P. W., Neuhaus, D. & Varani, G. (1997). Structural basis of the RNA-binding specificity of human U1A protein. EMBO J. 16, 5764-5772.
54. Nissen, P., Ippolito, J. A., Ban, N., Moore, P. B. & Steitz, T. A. (2001). RNA tertiary interactions in the large ribosomal subunit: the A-minor motif. Proc. Natl Acad. Sci. USA, 98, 4899-4903.
55. Doherty, E. A., Batey, R. T., Masquida, B. & Doudna, J. A. (2001). A universal mode of helix packing in RNA. Nature Struct. Biol. 8, 339-343.
56. Gutell, R. R., Weiser, B., Woese, C. R. & Noller, H. F. (1985). Comparative anatomy of 16 S-like ribosomal RNA. Prog. Nucl. Acid Res. Mol. Biol. 32, 155-216.
57. Varani, G., Wimberly, B. & Tinoco, I. J. (1989). Conformation and dynamics of an RNA internal loop. Biochemistry, 28, 7760-7772.
58. Wimberly, B., Varani, G. & Tinoco, I. J. (1993). The conformation of loop E of eukaryotic 5S ribosomal RNA. Biochemistry, 32, 1078-1087.
59. Szewczak, A. A., Moore, P. B., Chang, Y. L. & Wool, I. G. (1993). The conformation of the sarcin/ricin loop from 28S ribosomal RNA. Proc. Natl Acad. Sci. USA, 90, 9581-9585.
60. Correll, C. C., Munishkin, A., Chan, Y. L., Ren, Z., Wool, I. G. & Steitz, T. A. (1998). Crystal structure of the ribosomal RNA domain essential for binding elongation factors. Proc. Natl Acad. Sci. USA, 95, 13436-13441.
61. Correll, C. C. & Munishkin, W. I. (1999). The two faces of the Escherichia coli 23 S rRNA Sarcin/Ricin domain: the structure at 1.11 Å resolution. J. Mol. Biol. 292, 275-287.
62. SantaLucia, J. J., Kierzek, R. & Turner, D. H. (1990). Effects of GA mismatches on the structure and thermodynamics of RNA internal loops. Biochemistry, 29, 8813-8819.
63. Quigley, G. J. & Rich, A. (1976). Structural domains of transfer RNA molecules. Science, 194, 796-806.
64. Kim, S.-H. (1979). Crystal structure of yeast tRNA-phe and general structural features of other tRNAs. In Transfer RNA: Structure, Properties, and Recognition (Schimmel, P. R., Soll, D. & Abelson, J. N., eds), pp. 83-100, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.
65. Walter, A. E. & Turner, D. H. (1994). Sequence dependence of stability for coaxial stacking of RNA helixes with Watson-Crick base paired interfaces. Biochemistry, 33, 12715-12719.
66. Walter, A. E., Turner, D. H., Kim, J., Lyttle, M. H., Müller, P., Mathews, D. H. & Zuker, M. (1994). Coaxial stacking of helixes enhances binding of oligoribonucleotides and improves predictions of RNA folding. Proc. Natl Acad. Sci. USA, 91, 9218-9222.
67. Kim, J., Walter, A. E. & Turner, D. H. (1996). Thermodynamics of coaxially stacked helixes with GA and CC mismatches. Biochemistry, 35, 13753-13761.
68. Carter, A. P., Clemons, W. M., Brodersen, D. E., Morgan-Warren, R. J., Wimberly, B. T. & Ramakrishnan, V. (2000). Functional insights from the structure of the 30S ribosomal subunit and its interactions with antibiotics. Nature, 407, 340-348.
69. Carter, A. P., Clemons, W. M., Jr, Brodersen, D. E., Morgan-Warren, R. J., Hartsch, T., Wimberly, B. T. & Ramakrishnan, V. (2001). Crystal structure of an initiation factor bound to the 30S ribosomal subunit. Science, 291, 498-501.
70. Ogle, J. M., Brodersen, D. E., Clemons, W. M., Jr, Tarry, M. J., Carter, A. P. & Ramakrishnan, V. (2001). Recognition of cognate transfer RNA by the 30 S ribosomal subunit. Science, 292, 897-902.
71. Woese, C. R. (1980). Just so stories and rube goldberg machines: speculations on the origins of the protein synthetic machinery. In Ribosomes: Structure, Function, and Genetics (Chambliss, G., Craven, G. R., Davies, J., Davis, K., Kahan, L. & Nomura, M., eds), pp. 357-373, University Park Press, Baltimore, Maryland.
72. Frank, J. & Agrawal, R. K. (2000). A ratchet-like inter-subunit reorganization of the ribosome during translocation. Nature, 406, 318-322.
73. Sayle, R. A. & Milner-White, E. J. (1995). RASMOL: biomolecular graphics for all. Trends Biochem. Sci. 20, 374.
74. Bernstein, H. J. (2000). Recent changes to RasMol, recombining the variants. Trends Biochem. Sci. 25, 453-455.

Edited by J. Doudna

(Received 27 December 2000; received in revised form 14 May 2001; accepted 29 May 2001)
