JMB—MS 440 Cust. Ref. No. PEW 84/94 [SGML]
J. Mol. Biol. (1995) 248, 27–43

Identification of Base-triples in RNA using Comparative Sequence Analysis

Daniel Gautheret, Simon H. Damberger and Robin R. Gutell*

Department of Molecular, Cellular and Developmental Biology, Campus Box 347, University of Colorado, Boulder, CO 80309-0347, U.S.A.

*Corresponding author

Comparative sequence analysis has proven to be a very efficient tool for the determination of RNA secondary structure and certain tertiary interactions. However, base-triples, an important RNA structural element, cannot be predicted accurately from sequence data. We show here that the poor base correlations observed at base-triple positions are the result of two factors. (1) Base covariation is not as strictly required in triples as it is in Watson–Crick pairs. (2) Base-triple structures are less conserved among homologous molecules. A particularity of known triple-helical regions is the presence of multiple base correlations that do not reflect direct pairing. We suggest that natural mutations in base-triples create structural changes that require compensatory mutations in adjacent base-pairs and triples to maintain the triple-helix conformation. On the basis of these observations, we devised two new measures of association that significantly enhance the base-triple signal in correlation studies. We evaluated correlations between base-pairs and single-stranded bases, and correlations between adjacent base-pairs. Positions that score well in both analyses are the best triple candidates. This procedure correctly identifies triples, or interactions very close to the proposed triples, in type I and type II tRNAs and in the group I intron.

Keywords: RNA structure; comparative sequence analysis; base-triples

Introduction

Base-triples are among the essential tertiary interactions in RNA three-dimensional structure. The best characterized RNA base-triples are those of tRNA (Quigley & Rich, 1976; Sussman & Kim, 1976), and there is also good evidence for base or nucleotide triples in self-splicing group I introns, in which they are required for enzymatic activity (Michel et al., 1990). Base-triples involving a base-pair and a distant single-stranded nucleotide create long-range constraints on RNA folding, and constitute powerful assets for structure determination. The value of base-triple information in modeling studies has been clearly demonstrated in the case of group I introns (Michel & Westhof, 1990; Jaeger et al., 1994), and more benefits can be expected from the incorporation of base-triple information in computational RNA folding procedures (Malhotra et al., 1990; Major et al., 1993). The prediction of base-triples directly from sequence information is therefore highly desirable.

Certain base interactions, those constituting RNA secondary structure, can be predicted accurately from sequence data using comparative sequence analysis, a method based on the principle that evolution maintains a common structure through compensatory mutations (reviewed by Gutell, 1993; Woese & Pace, 1993). Compensatory mutations were initially identified visually in relatively small sequence alignments, resulting in the first reliable secondary structure models (Gutell, 1993; Woese & Pace, 1993).
The simultaneous growth of sequence databases and refinement of computational methods have significantly enhanced our ability to derive base–base interactions from sequence analysis (Olsen, 1983; Gutell et al., 1985; Haselman et al., 1988; Winker et al., 1990; Chiu & Kolodziejczak, 1991; Gutell et al., 1992). Although methods have improved sufficiently to identify correctly several tertiary interactions in 16 S and 23 S rRNA (Gutell et al., 1994), predicting base-triples with confidence remains problematic. Only a few base-triples have been suggested on the basis of comparative analysis to date: in the early study of tRNA by Levitt (1969), in rRNA (Gutell et al., 1994) and in the group I intron (Michel et al., 1990), where triples were experimentally substantiated (Michel et al., 1990; Green & Szostak, 1994).

Present address: D. Gautheret, Departement de Biologie, Universite Aix-Marseille II, Faculte de Luminy, 13 000 Marseille, and J.G.S., C.N.R.S., 31 ch. Joseph Aiguier, 13 402 Marseille Cedex 20, France.

0022–2836/95/160027–17 $08.00/0 © 1995 Academic Press Limited
In spite of the scarcity of comparatively inferred base-triples, these interactions are certainly widespread, and therefore many remain to be discovered. We have thus begun a detailed comparative analysis of RNA triples to derive principles and algorithms that can be applied to base-triple prediction in different RNA molecules. The availability of large sequence databases and of several tRNA crystal structures now permits a more thorough characterization of triple interactions. We can now ask how base-triple structures vary in related molecules, and how base sequences at and around triples reflect these structural changes. Principles derived from the analysis of tRNA and group I intron triples can be incorporated into our correlation analyses, and significantly enhance our ability to predict base-triples from sets of aligned sequences.

Characterization of Base-triples

Sequence correlations in the vicinity of base-triples

Current comparative analysis methods detect nucleotide interactions by measuring correlations between pairs of RNA positions. This usually involves the construction of contingency tables containing the number of observations for each base-pair at position i·j. Let $n_o(M_i,N_j)$ be the number of observations of base-pair M·N ($M,N \in \{A,U,G,C\}$) at position i·j. We compute the number of bases M and N at positions i and j ($n_o(M_i)$ and $n_o(N_j)$) and the expected number of observations for each M·N base pair: $n_e(M_i,N_j) = n_o(M_i) \times n_o(N_j)$. The difference between expected and observed values reflects the dependence of the two positions. This difference can be computed as follows (Olsen, 1983):

\[ \chi^2 = \sum_{M,N} \frac{[n_o(M_i,N_j) - n_e(M_i,N_j)]^2}{n_e(M_i,N_j)} \qquad (1) \]

Mutual information is an alternative measure of correlation that yields improved results in the detection of RNA interactions (Chiu & Kolodziejczak, 1991).
It requires base frequencies ($f_o(M_i,N_j)$, $f_o(M_i)$, $f_o(N_j)$) to be used instead of absolute numbers; it is computed as follows:

\[ M(i,j) = \sum_{M,N} f_o(M_i,N_j) \times \ln \frac{f_o(M_i,N_j)}{f_o(M_i) \times f_o(N_j)} \qquad (2) \]

Mutual information accurately predicts the secondary structure of tRNA, as well as the tertiary pairs 15·48 and 26·44 (Chiu & Kolodziejczak, 1991; Gutell et al., 1992). We present in Tables 1 and 2 the M(i,j) values obtained in the base-triple regions of tRNA and group I intron. For each position, the eight highest correlations are shown (73 positions in tRNA and 134 in the group I intron were analyzed). The most significant correlations are at the top of each column, and those corresponding to possible triples are indicated by asterisks. The secondary structure and tertiary interactions of yeast tRNAPhe are shown in Figure 1a. Base-triples involve positions 45·(10·25), (12·23)·9 and (13·22)·46. The proposed group I intron triples (Michel & Westhof, 1990) involve positions (108·213)·259 and (109·212)·260 in the P4 stem and (216·257)·105 and (215·258)·106 in the P6 stem. These are shown on the intron secondary structure in Figure 2.

The secondary structure correlations (10/25, 11/24, 12/23 and 13/22 in tRNA (see Table 1) and 108/213, 109/212, 215/258 and 216/257 in group I (see Table 2)) are the highest at each helical position. The correlations that follow Watson–Crick pairings in Tables 1 and 2 are intriguing. Certain base-triple positions correlate (23/9 and 22/46 in tRNA, 212/260 and 213/259 in the group I intron), but do so more weakly than secondary pairs (compare, e.g. 23/12 and 23/9), and even more weakly than some non-interacting positions. For example, in tRNA, the value of correlation 23/9 (a base-triple) is lower than that of 23/13 (non-interacting positions).
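As a rough illustration of equations (1) and (2), the two pairwise measures can be sketched in Python for a pair of alignment columns. This is a minimal sketch and not the authors' program: the function names and the column representation (one string per alignment position) are hypothetical, gapped or ambiguous characters are simply skipped, and the expected count is taken as the product of the marginal counts divided by the number of sequences, which is the usual χ² convention and is an assumption here.

```python
import math
from collections import Counter

BASES = "AUGC"

def pair_counts(col_i, col_j):
    """Observed counts n_o(M_i, N_j) over two alignment columns.
    Sequences with a gap or ambiguous base at either position are skipped."""
    return Counter((m, n) for m, n in zip(col_i, col_j)
                   if m in BASES and n in BASES)

def chi_square(col_i, col_j):
    """Chi-square statistic in the spirit of equation (1)."""
    obs = pair_counts(col_i, col_j)
    total = sum(obs.values())
    if total == 0:
        return 0.0
    n_i, n_j = Counter(), Counter()
    for (m, n), count in obs.items():
        n_i[m] += count
        n_j[n] += count
    chi2 = 0.0
    for m in BASES:
        for n in BASES:
            # Assumption: expected count normalized by the number of sequences.
            expected = n_i[m] * n_j[n] / total
            if expected > 0:
                chi2 += (obs[(m, n)] - expected) ** 2 / expected
    return chi2

def mutual_information(col_i, col_j):
    """Mutual information M(i,j) as in equation (2), natural logarithm."""
    obs = pair_counts(col_i, col_j)
    total = sum(obs.values())
    if total == 0:
        return 0.0
    n_i, n_j = Counter(), Counter()
    for (m, n), count in obs.items():
        n_i[m] += count
        n_j[n] += count
    mi = 0.0
    for (m, n), count in obs.items():
        f_mn = count / total
        mi += f_mn * math.log(f_mn / ((n_i[m] / total) * (n_j[n] / total)))
    return mi

# Toy columns: perfect Watson-Crick covariation versus independent columns.
covarying = ("AUGC", "UACG")
independent = ("AAUU", "GCGC")
mi_cov = mutual_information(*covarying)
mi_ind = mutual_information(*independent)
chi_cov = chi_square(*covarying)
chi_ind = chi_square(*independent)
```

On the toy columns, the perfectly covarying pair gives the maximal mutual information for four equally frequent pairs (ln 4), while the independent pair gives zero for both measures.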
The correlation 25/45 (a base-triple) is not within the top eight correlates, ranking at number 31 in the correlations involving position 25 (not shown). Similar effects are observed in the group I introns (Table 2).

Table 1
The eight best correlations (M(i,j); Gutell et al., 1992) for tRNA positions 2, 9 to 13, 22 to 25 and 45 to 46 are evaluated against all tRNA positions

tRNA positions
2         9         10        11        12        13        22        23        24        25        45        46
71a 0.90b 23 0.26*  25 0.08   24 0.78   23 0.99   22 0.33   13 0.33   12 0.99   11 0.78   10 0.08   46 0.12   13 0.31*
35 0.09   12 0.26*  45 0.06*  13 0.29   13 0.30   46 0.31*  46 0.28*  13 0.28   13 0.28   24 0.06   13 0.11*  22 0.28*
31 0.06   13 0.12   64 0.04   36 0.18   9 0.26*   12 0.30   23 0.17   9 0.26*   36 0.16   11 0.06   22 0.08*  12 0.17
12 0.06   46 0.09   32 0.03   12 0.14   46 0.17   11 0.29   12 0.17   22 0.17   12 0.15   39 0.05   12 0.07   23 0.17
29 0.06   24 0.07   50 0.03   23 0.14   22 0.17   23 0.28   11 0.13   46 0.17   23 0.14   26 0.04   9 0.07    45 0.12
24 0.06   11 0.07   49 0.03   22 0.13   24 0.15   24 0.28   24 0.13   24 0.14   22 0.13   49 0.04   23 0.06   24 0.11
70 0.05   45 0.07   68 0.02   26 0.11   11 0.14   36 0.13   36 0.08   11 0.14   46 0.11   13 0.04   10 0.06*  35 0.10
41 0.05   22 0.06   5 0.02    46 0.10   26 0.09   9 0.12    45 0.08   1 0.08    26 0.11   65 0.04   36 0.06   11 0.10

Numbers in bold type denote correlations between nucleotides in close proximity in the 3D structure. Correlations corresponding to a secondary structure base-pair are underlined, while base-triples in any of the type I tRNA crystal structures are indicated by an asterisk.
a tRNA position number, based on yeast Phe reference numbering. Base-triples for yeast Phe are: (10·25)·45, (12·23)·9 and (13·22)·46. Alternative base-triples found in other tRNA crystal structures are noted in Figure 1.
b M(i,j) correlation value.

Table 2
The eight best correlations (M(i,j)) for group I intron positions 105 to 109, 212 to 213, 215 to 216 and 257 to 260 are evaluated against the positions of the group I intron core (defined in Materials and Methods)

Group I intron positions
105        106        108        109        212        213        215        216        257        258        259        260
103a 0.39b 216 0.33   213 0.85   212 0.78   109 0.78   108 0.85   258 0.38   257 0.74   216 0.74   215 0.38   213 0.56*  212 0.47*
101 0.27   257 0.32   259 0.56*  108 0.51   108 0.49   259 0.56*  221 0.22   106 0.33   106 0.32   269 0.32   108 0.56*  109 0.43*
269 0.26   103 0.26   109 0.51   213 0.51   213 0.49   109 0.51   112 0.20   258 0.31   255 0.30   217 0.31   109 0.47   259 0.25
257 0.26*  101 0.22   212 0.49   259 0.47   260 0.47*  212 0.49   222 0.18   269 0.29   258 0.30   216 0.31   212 0.45   108 0.21
216 0.24*  105 0.21   278 0.23   260 0.43*  259 0.45   278 0.21   220 0.17   255 0.29   269 0.28   257 0.30   260 0.25   213 0.21
271 0.22   255 0.21   260 0.21   268 0.37   268 0.36   260 0.20   252 0.17   103 0.25   103 0.27   103 0.25   268 0.24   268 0.17
104 0.22   258 0.20*  96 0.21    307 0.28   307 0.28   268 0.20   208 0.17   105 0.24*  105 0.26*  255 0.25   284 0.20   258 0.14
255 0.21   217 0.18   268 0.20   256 0.21   256 0.21   96 0.19    218 0.16   101 0.21   101 0.22   222 0.24   278 0.18   253 0.13

Numbers in bold type denote correlations between nucleotides in close proximity in the 3D model of Michel & Westhof (1990). Correlations corresponding to a secondary structure base-pair are underlined, while correlations corresponding to proposed base-triples in the group I intron 3D model (Michel & Westhof, 1990) are denoted with an asterisk.
a Group I intron position number based on T. thermophila reference numbering. Previously proposed base-triples are: (108·213)·259, (109·212)·260, (215·258)·106 and (216·257)·105.
b M(i,j) correlation value.

A second important observation concerns the network of correlations linking most nucleotides in the vicinity of the tRNA base-triples. Significant correlations between unpaired positions were recorded earlier with smaller tRNA datasets (Olsen, 1983; Haselman et al., 1988), and in a more recent study (Gutell et al., 1992). These "cross-correlations", indicated by boldface numbers in Tables 1 and 2, involve consecutive or non-interacting positions, such as 11/12, 22/23, 9/46 and 9/12 in tRNA or 109/108, 109/259, 212/213 and 106/105 in group I introns, spanning the entire triple-helical regions in both RNAs. These correlations have values of the same order of magnitude as the main secondary structure correlations. This contrasts significantly with what is usually observed in helical positions. Typical Watson–Crick positions (see, e.g. tRNA position 2 in Table 1, or Figure 3 of Gutell et al., 1992) display a difference of one order of magnitude between the first and second highest correlations, and rarely show correlations with neighboring positions. This analysis thus raises two questions regarding base-triples. (1) Why do positions involved in base-triples have correlation values that are lower than secondary structure positions, and (2) why would a triple-helical region display networked correlations?

Why are sequence correlations weaker in base-triples?

Base-triples do not demonstrate covariation as do secondary base-pairs

Comparative analysis searches for a common structure by identifying compensatory changes, or covariation.
This principle applies very well to the detection of Watson–Crick pairs: in order to preserve the Watson–Crick conformation, mutations must occur in a compensatory fashion, which results in four prominent sequence patterns (A·U, U·A, G·C or C·G). Each base type in a position is associated with a distinct base type in a second position, and vice versa. Even when a significant incidence of G·U or other non-canonical pairs is observed, the existence of a secondary base-pair usually remains unambiguous (Gutell et al., 1994). Considering triple sequences in tRNA (Figure 3) and group I introns (Figure 4), we find there is no strict covariation between the secondary structure base-pair and the third position. In the tRNA triple (12·23)·9 (Figure 3b), there is covariation between the sequences (U·A)·A and (G·C)·G, but this covariation is obscured by the presence of several non-compensatory changes. For example, an A at position 9 is associated with several different Watson–Crick pairs at position 12·23. Similarly, a G·C pair at position 10·25 (Figure 3a) is associated with all four bases at position 45. The other triples in tRNA and the group I intron also display significant levels of uncorrelated changes (Figures 3c and 4). These observations lead us to ask why base-triples lack the stricter patterns of covariation observed in secondary structure base-pairs. This question can be answered, at least in part, by an observation of base-triple structures.

A perfect triple isomorphism is possible in the absence of base covariation

The interaction of a Watson–Crick pair with a third base occurs through different types of non-canonical interactions, such as the Hoogsteen pairing. In contrast to Watson–Crick pairs, these tertiary interactions can retain an identical conformation after a unilateral mutation. For example, the Hoogsteen-like A9·A23 base-pair present in the (12·23)·9 triple of yeast tRNAPhe can be converted to a G9·A23 base-pair, which occurs in some tRNAs (Figure 3b), while retaining the same conformation (Figure 5a) (Klug et al., 1974). Among the multiple non-canonical pairs that can be constructed with one or two hydrogen bonds, there are several ways of forming a unique conformation while modifying either base in the pair. Numerous base-triple conformations can thus be maintained through non-compensatory mutations.

Base-triples vary in structure and position

The available tRNA crystal structures reveal more structural heterogeneity in base-triples than in secondary structure base-pairs. Figure 1 shows the base-triples forming in four tRNA crystal structures. The yeast tRNAPhe base-triples (described above) are shown in Figure 1a. Base-triples in E. coli tRNAMetf (Woo et al., 1980) and yeast tRNAMeti (Basavappa & Sigler, 1991) do not differ significantly from those of yeast tRNAPhe (data not shown). However, in yeast tRNAAsp (Dumas et al., 1985) (Figure 1b), a base–sugar interaction that formed between positions 14 and 21 in tRNAPhe is converted into a base–base interaction, creating a (8·14)·21 base-triple. On the basis of the electron density map, there is no evidence for the triples 45·(10·25) and (13·22)·46 in the E. coli tRNAGln complexed with its cognate synthetase (Rould et al., 1989; V. Rath & T. A. Steitz, personal communication) (Figure 1c). Within this same complex, the distance between the pair 12·23 and position 9 suggests that this triple also does not form. Instead, a base-triple forms at positions 45·(13·22), resulting in a local conformation different from that of tRNAPhe (Figure 5b) (V. Rath & T. A. Steitz, personal communication). Alignment errors are an unlikely cause of this important difference, since both E. coli tRNAGln and yeast tRNAPhe have a variable loop of five nucleotides, and the bases surrounding the variable triple are positioned similarly in both three-dimensional structures. Different triples also form in E. coli tRNASer(GGA) complexed with seryl-tRNA synthetase (Biou et al., 1994) (Figure 1d). In this type II tRNA, all the tRNAPhe triples are absent, while other base-triples form at positions (8·14)·21, 20·(15·48) and 9·(13·22) (two insertions in the D-loop of this tRNA are given the numbers 20a and 20b by Biou et al. (1994), while they are numbered 19a and 20a in our alignment; the triple noted 20a·(15·48) by these authors is thus 20·(15·48) here).

Figure 1. Tertiary base/base interactions in 4 tRNA crystal structures, mapped onto the yeast tRNAPhe secondary structure. Continuous lines, base-triples; broken lines, other tertiary base/base interactions. Sugar–phosphate backbone interactions are not shown. a, Yeast tRNAPhe; b, yeast tRNAAsp; c, E. coli tRNAGln; d, E. coli tRNASer. Insertions (+) and deletions (r) relative to: a, yeast tRNAPhe; and b, yeast tRNAAsp, r48; c, E. coli tRNAGln, r17; d, E. coli tRNASer, r17, +19a, +20a, +47a, 47b, to 47q. AA, amino acceptor stem; TΨC, TΨC stem and loop; D, D-stem and loop; AC, anticodon stem and loop; V, variable loop.

Figure 2. Core secondary structure and triple interactions in the T. thermophila group I intron. Triples are indicated by bold lines. Filled circles denote triples and other positions discussed in the text. The 2 putative triples, (220·253)·255 and (110·211)·305, are shown by a thicker broken line. The representation is formatted as proposed in Cech et al. 1994.

This comparison of tRNA structures is most illuminating. Of the six available tRNA crystal structures, four are different with respect to their base-triples. Some of the observed variations involve only small conformational changes (e.g. the formation of the (8·14)·21 triple), and some might result from the formation of the tRNA synthetase complex (in tRNAGln in particular), but the fact remains that base-triples can form differently even among tRNAs of the same morphological family (e.g. type I tRNAs).
Therefore, even if triple sequences demonstrated covariation, this would not always involve the same pairs of positions, and would therefore be poorly detected. This is another important explanation for the relatively low correlations observed at base-triples.

Analysis of the network of correlations around base-triples

We have shown that tRNA and group I introns present networked sequence correlations in the vicinity of base-triples. If these correlations, which we will also refer to as "neighbor effects", are specific to base-triples, they will constitute a useful instrument for triple identification.

The previous observation of triples involving different positions in different molecules could explain some cross-correlations, for example 45/10 and 45/13, which could result from alternative interactions in tRNAPhe and tRNAGln. However, this does not explain correlations between different pairs of the same helix (e.g. 11/12, 12/13, 12/22 and 22/23 in tRNA), which constitute the majority of cross-correlations. From a closer observation of structure variations in base-triples, we explain below how these cross-correlations might result from "compensatory" mutations involving not only paired bases but also adjacent bases in a triple region.

All the sequence combinations observed at a given base-triple position cannot, in general, adopt an identical triple conformation. For example, when the third triple position changes from a pyrimidine to a purine, it is not always possible to build a triple that would accommodate the bulkier residue without significantly displacing the sugar backbone of the third nucleotide. In the (108·213)·259 triple in the group I intron, most species have a pyrimidine at position 259 (Figure 4a), and these can all be folded into a conformation very similar to that shown for the (C·G)·C triple in Figure 5c (Michel et al., 1990). However, a change from (C·G)·C to (C·G)·G, which occurs naturally, requires a relatively large displacement of the sugar backbone of G259, as shown in Figure 5c (Michel et al., 1990). We expect to observe such variations in most RNA triples, since both purines and pyrimidines generally occur as the third residue (see Figures 3 and 4). Even when sequence variations maintain a purine or a pyrimidine as the third base, hydrogen bonding constraints can prevent the conservation of an identical structure. We thus expect to observe widespread conformational variations of the type shown in Figure 5c.

The formation of a triple helix such as that in the tRNA D-stem is a highly cooperative process involving a complex network of ion binding, stacking and van der Waals' interactions (Bina-Stein & Stein, 1976; Holbrook et al., 1977). Therefore, a structural change such as the one shown in Figure 5c might adversely affect neighboring base-triples, and thus "compensatory" mutations may be required to preserve the conformational or energetic properties of the triple helix. In this case, a mutation in the third base of an adjacent triple can be as appropriate as further mutations in the same triple, since a single change in the flanking position can directly compensate for the backbone displacement. We therefore propose that "compensatory" mutations in a triple helix involve nucleotides in different stacking planes as well as within the planes. Such mutations could propagate through the triple helix and create the multiple correlations observed. If this hypothesis is confirmed, the presence of cross-correlations would be indicative of triple helix formation.

An alternative explanation for the presence of networked correlations could be the involvement of the correlated positions in a common RNA identity element. In other words, nucleotides of a base-triple region could be selected as a whole in order to maintain the specificity of the RNA with respect to a certain biological process, such as interaction with a specific protein, thus creating correlations between non-interacting bases. Although a few identity elements have been localized into the base-triple region of tRNA (Pütz et al., 1991; Smith & Yarus, 1989; McClain, 1993a), we do not believe they are an important source of networked correlations, for the following reasons. First, cross-correlations are much higher in the D-stem than in any other part of the molecule (Gutell et al., 1992), although important identity sites are present elsewhere (Hou & Schimmel, 1989; McClain et al., 1991). Second, cross-correlations in the group I intron are also higher in the triple region (stems P4 and P6) than in any other part of the molecule (see analysis below). Finally, a recent experimental study (Hou, 1994) demonstrated that mutations in the tRNA triples (8·14)·21 and (13·22)·46 had major effects on the structure of the 15·48 pair. This shows that large physical constraints exist in this triple region that do not result from tRNA identity. Although we cannot exclude the possibility that identity elements contribute to cross-correlations in base-triple regions, there are better reasons for correlations to be caused by base-triples or other complex folding patterns.

We will now concentrate on two types of cross-correlations. First, the interdependence of all three bases in a triple produces correlations between each position of the secondary structure base-pair and the third base of the triple (see Tables 1 and 2). Therefore, directly measuring the correlation between secondary structure base-pairs and single-stranded bases (base to base-pair correlation) is expected to produce a stronger signal than the usual pairwise correlations. A second type of interesting correlation occurs between adjacent base-pairs (base-pair to base-pair correlation). Assuming this type of correlation is characteristic of base-triples, its identification should also help predict triples. In the following sections, we propose methods to quantify these two types of correlations.

Figure 3. Sequences observed at base-triple positions in type I tRNA. Positions shown are those of yeast tRNAPhe triples. a, 45·(10·25) triple; b, (12·23)·9 triple; c, (13·22)·46 triple. Only values greater than 5 are shown. Numbers in bold type represent more than 10% of the tRNAs.

Figure 4. Sequences (seqs) observed at base-triple positions in group I introns. Positions shown are those of the suggested T. thermophila triples. a, (108·213)·259; b, (109·212)·260; c, (215·258)·106; d, (216·257)·105. Only values greater than 2 are shown. Numbers in bold type represent more than 10% of the introns.

Figure 5. Alternative structures of homologous base-triples in different RNAs. a, Base-triple (12·23)·9 in yeast tRNAPhe, and a possible conformation for the same triple after an A9 to G9 mutation. b, Triples forming with base-pair 13·22 in yeast tRNAPhe (Quigley & Rich, 1976) and E. coli tRNAGln (V. Rath & T. A. Steitz, personal communication). c, Proposed structure of group I intron triple (108·213)·259 forming with C259 or G259 (Michel et al., 1990).
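The sequence tabulations of Figures 3 and 4 (counts of each base-pair sequence against each third base) can be reproduced from an alignment with a few lines of Python. This is only a sketch: the function names and the toy columns are hypothetical, and each alignment position is assumed to be available as a string with one character per sequence.

```python
from collections import Counter

def triple_table(col_i, col_j, col_k, bases="AUGC"):
    """Count (base-pair, third base) combinations for a candidate triple
    (i.j).k. Sequences with a gap or ambiguous base at any of the three
    positions are skipped."""
    table = Counter()
    for a, b, c in zip(col_i, col_j, col_k):
        if a in bases and b in bases and c in bases:
            table[(a + b, c)] += 1
    return table

def pairs_seen_with(table, third_base):
    """Base-pair sequences (and their counts) observed with one third base."""
    return {pair: n for (pair, c), n in table.items() if c == third_base}

# Toy columns (hypothetical data, five sequences).
col_12, col_23, col_9 = "UUGGC", "AACCG", "AAGAA"
table = triple_table(col_12, col_23, col_9)
with_a9 = pairs_seen_with(table, "A")
```

In this toy example an A at the third position is found with three different base-pairs, which is the kind of non-compensatory pattern described above for A9 and the pair 12·23.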
Table 3
Base to base-pair correlation (χ²) and neighbor effects (N) in type I tRNAs (895 sequences)

Best                                              Suggested cause of
correlates  Pair        best nt  χ² (a)  N (b)    correlation (c,d)
+           13·22  ↔    46       100     100      Triple
+           12·23  ↔    9        76      64       Triple
            31·39  ↔    36       69      20       Id (Yarus, 1982)
            3·70   ↔    35       61      16       Id tRNAAla (Hou & Schimmel, 1989)
            51·63  →    36       41      9
+           11·24  →    36       41      57       Id tRNATrp (Hisch, 1971)
            13·22  ←    45       40      100      Triple (E. coli tRNAGln)
            1·72   →    35       35      16       Id tRNAGln (Rould et al., 1989)
+           10·25  →    45       27      27       Triple (yeast tRNAPhe, tRNAAsp)
            1·72   ←    73       26      16       Id tRNAGln (Rould et al., 1989)
            15·48  →    35       24      NA       Id tRNACys (Hou et al., 1993)
            27·43  →    36       22      14       Id tRNATrp (Schultz & Yarus, 1994)
            30·40  →    36       20      13       Id (Yarus, 1982)

Correlations are ranked by the χ² value. Only those correlates within 20% of the highest χ² value are listed. The best correlates are identified with a plus (+) in the first column (see Table 7).
a % of highest value.
b Neighbor effects computed according to eqn (3) (% of highest value). NA, base-pairs having no neighbor in the secondary structure.
c Id, identity element possibly responsible for the correlation. When several tRNAs have identity elements matching the correlated positions, only one is cited as an example.
d References are given for identity elements only.

Inferring Triple Interactions

Base to base-pair correlations: identification of potential triples

Our goal here was to consider base-pairs as single variables, and directly compute the correlations between base-pairs and single-stranded positions. These base to base-pair correlations can be evaluated using a χ² test, replacing the usual 4 × 4 contingency table with a 16 × 4 contingency table that compares the four possible sequences for the single position with the 16 possible sequences for the base-pair. Such a table is similar to those in Figures 3 and 4. A χ² test can be performed using equation (1). However, large contingency tables increase the probability of having empty or almost empty cells that can strongly bias χ² values. To remedy this problem, we subdivided the 16 × 4 table into several sub-tables. This method is an alternative to that proposed by Olsen (1983) to address the same problem. For each row M and column N in the original 16 × 4 table (T), we create a 2 × 2 table of the form:

\[
\begin{array}{cc}
T(M,N) & \sum_{i=1}^{4} T(i,N) - T(M,N) \\
\sum_{j=1}^{16} T(M,j) - T(M,N) & \sum_{i=1}^{4}\sum_{j=1}^{16} T(i,j) - \sum_{j=1}^{16} T(M,j) - \sum_{i=1}^{4} T(i,N) + T(M,N)
\end{array}
\]

Values from table T are compressed in the new 2 × 2 table, so that one of the cells contains T(M,N), two other cells contain the sums of the remaining values of row M and column N, and the last cell contains the sum of all remaining values in table T. Such tables are generated for each value of M and N. Values of χ² are then computed for all sub-tables, except those having expected values smaller than 5 in any of their cells. The highest χ² value generated is kept as the final correlation value.

To simulate an application of the method to base-triple prediction, we computed only correlations between known secondary structure base-pairs and unpaired positions. In tRNA, tertiary interactions predicted by pairwise comparative analysis (15·48 and 26·44; Gutell et al., 1992) were also included as base-pairs, so that triples involving these pairs could be detected. An application of this procedure to the type I tRNA alignment yields the results shown in Table 3.

The following criteria were used to establish the significance of correlations. Since χ² values do not have an upper limit and vary with the number of sequences considered, we did not use absolute χ² values, but the percentage of the highest value encountered in the whole analysis. The highest correlation observed in a given molecule thus takes a value of 100. A cutoff point for the significance of correlations was then chosen empirically, based on known base-triples. All known tRNA and group I intron triples (see below) have a χ² value greater than 25% of the highest value.
We thus tentatively considered correlations in this range as significant (a cutoff of 20% is used in Table 3 in order to show additional correlations, which will be discussed below).

A second selection criterion was introduced to treat base-pairs that had significant correlations (>25%) with several single-stranded positions. To solve this problem, we performed the correlation analysis in two directions: we sought the single-stranded positions that best correlated with each base-pair, and we sought the base-pairs that best correlated with each single-stranded position. When a base-pair and a single-stranded position are mutually best correlates (hereafter termed "reciprocal correlates"), they are indicated with a double arrow in Table 3. When either the base-pair is the best correlate of the single-stranded position, or the single-stranded position is the best correlate of the base-pair, the correlation is indicated with a single arrow towards the best correlate. When neither the base-pair nor the single-stranded position is the best correlate, the correlation is not shown. This method considerably reduces the number of correlations shown for each position. These two criteria are used in all subsequent analyses.

In type I tRNAs, the two triples (13·22)·46 and (12·23)·9 can now be predicted with confidence. They both display high and reciprocal correlations. The best correlate of position 10·25 is 45, as expected from triples forming in yeast tRNAPhe. However, this relationship is not reciprocal: the best correlate of 45 is 13·22, which rather reflects the E. coli tRNAGln situation, where the triple involves positions 45·(13·22). Since there is a precedent for base 45 to form triples with at least two different base-pairs, correlation results like this are expected.

Other high and reciprocal correlations are (31·39)/36 and (3·70)/35. Interestingly, these seemingly false positives are not artifacts.
The pair 3·70 isan important identity element for the aminoacylationof alanine tRNAs of several organisms (McClain &Foss, 1988; Hou & Schimmel, 1989) and of variousother tRNAs (reviewed by McClain, 1993b), so it isnot surprising that it varies in concert with position35, at the center of the anticodon, and thereforenecessarily associated with tRNA identity as well.The (31·39)/36 correlation was identiﬁed by Yarus(1982) in the ‘‘extended anticodon’’ hypothesis,which states that several positions in the anticodonstem and loop are selected as a block to confer on thetRNA an optimal coding accuracy. It is alsonoteworthy that 36 is the best correlate of pair 27·43,which reﬂects a recent experimental associationbetween these two sites in the control of translationby tRNATrp(Schultz & Yarus, 1994). Other ‘‘falsepositives’’ can be related to known tRNA identityelements (see references in Table 3), suggesting thatthis method is identifying biologically meaningfulassociations.The recent determination of a tRNASercrystalstructure (Biou et al., 1994) provides us with valuablebase-triple information about type II tRNAs.We applied the same correlation analysis to this classof tRNAs to determine whether the two triplesforming in tRNASerat positions 20·(15·48) and9·(13·22) could be detected (there is not enoughvariation in basepair 8·14, also involved in a triple,to seek correlations involving this pair). Table 4presents the highest base to base-pair correlationsobserved in an alignment of 262 type II tRNAs(comprising serine, leucine and certain tyrosinetRNAs). In excellent agreement with the crystallo-graphic data, the only ‘‘reciprocal correlates’’ inTable 4 are observed at positions involved inbase-triples in E. coli tRNASer. 
The triple 20·(15·48) has the highest overall χ² value, and the triple 9·(13·22) has a χ² value above the threshold of significance defined previously, albeit relatively low

Table 4
Base to base-pair correlation (χ²) and neighbor effects (N) in type II tRNAs (262 sequences)

Best                  χ²                                Suggested cause of
correlates  Pair      best   nt    χ²(a)   N(b)         correlation(c)
+           15·48     t      20    100     NA           Triple
            15·48     9      21    94      NA           Neighbor effect
            15·48     9      59    91      NA
+           12·23     :      21    90      64
            12·23     9      15    86      64
            12·23     9      48    82      64
            12·23     9      20    79      64
            15·48     9      35    57      NA           Id? (as in tRNACys)
            3·70      :      35    37      16           Id? (as in tRNAGln)
            12·23     9      73    36      64
            2·71      9      20    33      11
            6·67      t      15    30      13
            2·71      9      59    28      11
            27·43     t      35    28      14           Id? (as in tRNATrp)
            27·43     9      36    28      14           Id? (as in tRNATrp)
+           13·22     t      9     27      100          Triple
            6·67      9      37    26      13

Correlations are ranked by the χ² value. Only those correlates within 25% of the highest χ² value are listed. The best correlates are identified with a plus (+) in the first column (see Table 7).
(a) % of highest value.
(b) Neighbor effects computed according to eqn (3) (% of highest value). NA, base-pairs having no neighbor in the secondary structure.
(c) See footnotes to Table 3.
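The reciprocal-correlate rule used to build Tables 3 and 4 can be illustrated with a short sketch. This is not the authors' program: the function name `classify_correlates`, the label strings and the numerical scores below are assumptions made for illustration only. The toy scores merely mimic the situation described for type I tRNAs, where 45 is the best correlate of 10·25 but the best correlate of 45 is 13·22.

```python
def classify_correlates(scores):
    """Label each (base_pair, single_base) correlation by best-correlate status.

    scores maps (base_pair, single_base) -> chi-square value.
    Correlations where neither partner is the other's best correlate
    are left out, as in the tables.
    """
    best_base = {}  # base-pair -> its highest-scoring single-stranded base
    best_pair = {}  # single-stranded base -> its highest-scoring base-pair
    for (pair, base), value in scores.items():
        if pair not in best_base or value > scores[(pair, best_base[pair])]:
            best_base[pair] = base
        if base not in best_pair or value > scores[(best_pair[base], base)]:
            best_pair[base] = pair

    labels = {}
    for pair, base in scores:
        base_is_best = best_base[pair] == base  # base is best correlate of pair
        pair_is_best = best_pair[base] == pair  # pair is best correlate of base
        if base_is_best and pair_is_best:
            labels[(pair, base)] = "reciprocal"   # double arrow
        elif base_is_best:
            labels[(pair, base)] = "pair->base"   # arrow points to the base
        elif pair_is_best:
            labels[(pair, base)] = "base->pair"   # arrow points to the pair
    return labels


# Hypothetical chi-square values, chosen only to illustrate the rule.
toy_scores = {
    ("13.22", "46"): 100.0,
    ("13.22", "45"): 80.0,
    ("10.25", "45"): 60.0,
    ("10.25", "46"): 20.0,
}
toy_labels = classify_correlates(toy_scores)
```

With these toy values, (13.22, 46) is labeled reciprocal, (10.25, 45) gets a single arrow towards 45, (13.22, 45) gets a single arrow towards the pair, and (10.25, 46) is not reported.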
(27% of the highest value). No significant correlations are detected for canonical type I tRNA triples, in agreement with crystal and solution studies, which suggest that these triples are absent in type II tRNAs (Biou et al., 1994; Dock-Bregeon et al., 1989; Dietrich et al., 1990; Baron et al., 1993).

The sequences observed at positions (15·48)·20 and 9·(13·22) in the type II tRNA dataset are shown in Figure 6. (The base-triples in yeast tRNASer are (G15·C48)·U20 and G9·(G13·A22).) The correlation (15·48)/20 is due primarily to an association of A20 with A15·U48, and a U or C at position 20 with G15·C48 (Figure 6a). Analysis of the alignment reveals that these three principal sequences are present in all type II isoacceptor groups (data not shown), suggesting that concerted changes have occurred several separate times through evolution. The significance of a correlation is considerably increased when multiple concerted changes are observed independently, as they are here (Gutell et al., 1985). The correlation (13·22)/9 is primarily due to an association between A9 and A13·A22 (Figure 6b). Although the correlation is relatively weak, concerted changes yielding sequence A9·(A13·A22) occur in all type II isoacceptor groups, and even among isoacceptor tRNAs from the same organism (data not shown). This again indicates that the correlation is very significant.

Figure 6. Sequences observed at positions forming base-triples in E. coli tRNASer. a, Triple (13·22)·9. b, Triple (15·48)·20. Only values greater than 2 are shown. Numbers in bold type represent more than 10% of the tRNAs.

Phylogenetic Event Counting

In the previous section, we mentioned a few concerted mutations occurring in closely related tRNAs to help support some of our correlation results. These mutations are particularly interesting, since they are probably not the result of a fortuitous ancestral event, but instead they more likely reflect "neutral" changes between functionally equivalent sequences. To help identify more base-pair and base-triple interactions with comparative analysis, we need to determine the number of times these concerted mutations have occurred throughout the evolution of the RNA under study. The larger the number of such phylogenetic events (e.g. concerted mutations over evolutionary space), the more significant the correlation is, and thus the more confident we are that the positions of interest are physically interacting. This general concept was introduced a number of years ago, and was utilized to reinforce our case for some of the first proposed base–base tertiary interactions in 16 S rRNA (Gutell et al., 1985). This type of observation, essential in correlation analyses, requires knowledge of the phylogenetic relationships among the sequences under study. For tRNA, these relationships are unclear. On one hand, all tRNAs interact with the ribosome and its factors; thus, they are all under this common constraint; changes in their sequence in the evolutionary dimension will be neutral. On the other hand, tRNA sequences within each acceptor family are constrained by a specific synthetase recognition function. Thus, tRNAs have at least two mutational dimensions, which obscure their phylogenetic history (for a more detailed assessment of this issue, please see: Ninio, 1982; Cedergren et al., 1981). In contrast, a molecule such as the 16 S rRNA has the same function in all organisms. Its phylogeny, and the phylogeny of the cells in which these 16 S rRNAs exist, is well defined (Woese, 1987). Thus, comparative studies can determine with more confidence the number and nature of the concerted mutations that have occurred throughout a phylogenetic tree, allowing us to pinpoint mutations that occurred between closely related RNAs. These changes are the most likely to be "neutral".

Group I introns are mobile elements with a fast evolutionary clock, and therefore their phylogeny cannot be defined as well as that of rRNAs. However, there is no known variety of functions in group I introns that would impede the construction of a tree as it does for tRNAs. Since a consistent (although imperfect) classification of group I introns is available (Michel & Westhof, 1990), we can search for significant phylogenetic events more rigorously than we did for tRNAs. In this section, we implement a simple method to count mutations, and use its results to strengthen base-triple prediction in the group I introns.

Our phylogenetic event counting was performed as follows. Group I intron sequences in the alignment were classified into phylogenetic groups as described in Materials and Methods. For each potential triple position (i·j)·k (i and j being base-paired and k single-stranded in the secondary structure), changes are counted as aligned sequences are examined from the first sequence to the last; c equals the number of times a change is observed at i, j or k
and e equals the number of times a change is observed at (i or j) and k (i.e. a concerted change between the base-pair and the third position). The ratio e/c is the proportion of mutual changes over the total number of changes, and is our measure of phylogenetic events. As noted earlier, the detail we can decipher with correlation analysis is enhanced by incorporating phylogenetic event information into our algorithm. For this paper we have not sought a complete solution, since that would entail a better appreciation of the phylogenetic relationships of the RNAs under study, and better knowledge of how to value mutual changes that occur between distantly and closely related organisms. For the purposes of this article we have developed a simple method which assumes that the sequences are roughly ordered by their phylogenetic relationships, and treats all mutual changes as equivalent. Therefore, a large number of mutual changes within closely related RNA sequences will increase the e/c value more than a few mutual changes between distantly related RNA species. For our immediate needs, this approximation works well, as we will see.

Results of the combined χ² and e/c analysis of group I intron sequences are presented in Table 5. Here the χ² analysis was performed first, followed by an e/c analysis for each significant χ² base-pair/base correlate. An asterisk in Table 5 denotes those triple correlations that score the highest with the e/c analysis.

Among the three highest χ² reciprocal correlates are (109·212)/260 and (108·213)/259, corresponding to the two proposed base-triples in the P4 stem (Michel et al., 1990). Both of these triples are also strongly supported by our phylogenetic counting method. However, the proposed P6 stem triples (215·258)·106 and (216·257)·105 are not accurately predicted. Of these two previously proposed triples, position 105 correlates best with the pair 216·257 with χ² analysis, and the e/c analysis associates the pair 216·257 with position 105. 
The highest χ² correlations for the P6 base-pairs are (216·257)/106 (a reciprocal correlate) and (215·258)/103. The (215·258)/106 correlation is in the significant range (38% of the highest value), but it is not shown in Table 5 because better correlations involving positions 106 and 215·258 exist (see above). There are several possible explanations for these apparent inaccuracies. First, the P3/P4 junction (positions 103 to 106) generally varies in size from three to five bases, and there are a few examples of an insertion of several hundred bases. Thus, the sequences in this region cannot be aligned with absolute confidence. Until more sequence information suggests otherwise, we have justified the two unpaired 3′ P3/P4 nucleotides toward the P4 stem,

Table 5
Base to base-pair correlation (χ²) and neighbor effects (N) in group I introns (222 seqs)

Best                  χ² + e/c                                        Suggested cause
correlates  Pair      best(a)   nt    χ²(b)   N(c)   Sequences        of correlation(d)
+           109·212   * t *     260   100.0   67     222              Triple
+           262·312   * t *     263   97.4    NA     222              Triple?
+           108·213   * t *     259   95.8    75     222              Triple(e)
            216·257   t         106   61.2    100    221              Neighbor effect
+           110·211   * t *     305   56.9    36     222              Triple?(f)
            280·298   t         279   52.9    39     210
            268·307   :         279   49.3    2      215
+           216·257   9 *       105   48.7    100    222              Triple
            215·258   t         103   46.0    100    220              Neighbor effect
            268·307   9         256   40.3    2      182
            97·277    :         279   40.1    23     183
            285·293   :         256   38.2    30     161
            107·214   :         260   36.4    78     222              Neighbor effect
            215·258   9         269   35.4    100    187
+           220·253   * t       255   33.0    15     161              Triple?
            216·257   9         101   32.3    100    221
            215·258   9         217   29.6    100    119              Neighbor effect
            111·209   : *       305   29.4    43     222              Neighbor effect?
            109·212   9         304   29.2    67     222              Neighbor effect?
            102·272   :         263   27.8    NA     221
            286·292   :         300   26.9    18     150
            102·272   9         270   25.2    NA     189

Correlations are ranked by the χ² value. Only those correlates within 25% of the highest χ² value are listed. The best correlates are identified with a plus in the first column (see Table 7).
(a) χ² best correlate noted with arrows; best e/c ratio noted with *.
(b) % of highest value.
(c) Neighbor effects computed according to eqn (3) (% of highest value). NA, base-pairs having no neighbor in the secondary structure.
(d) References in text.
(e) The best base e/c correlate for the (108·213) base-pair is shared by positions 259 and 302.
(f) The best base-pair e/c correlate for position 305 is shared by (110·211) and (97·277). The alignment in the vicinity of the (97·277) base-pair is questionable due to length variation of the P3 helix. Thus we believe the best correlation is between (110·211) and 305.
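The e/c ratio reported with asterisks in Table 5 follows the counting rule given in the text: walk through the ordered alignment, count a change c whenever i, j or k differs from the preceding sequence, and count a mutual change e whenever the base-pair (i or j) and k change together. The sketch below is a minimal reading of that rule, not the authors' software; the function name `count_events` and the toy alignment are assumptions. The toy sequences alternate between the (A·U)·C and (G·C)·U patterns mentioned for (110·211)/305.

```python
def count_events(alignment, i, j, k):
    """Return (e, c) for a candidate triple (i.j).k in an ordered alignment.

    c: transitions between consecutive sequences where i, j or k changes.
    e: transitions where the pair (i or j) and position k change together.
    Assumes the sequences are roughly ordered by phylogenetic relatedness.
    """
    c = 0
    e = 0
    for prev, cur in zip(alignment, alignment[1:]):
        pair_changed = prev[i] != cur[i] or prev[j] != cur[j]
        base_changed = prev[k] != cur[k]
        if pair_changed or base_changed:
            c += 1
        if pair_changed and base_changed:
            e += 1
    return e, c


# Toy alignment: columns 0 and 1 are paired, column 2 is single-stranded.
toy_alignment = ["AUC", "AUC", "GCU", "GCU", "AUC", "ACC"]
e, c = count_events(toy_alignment, 0, 1, 2)
ratio = e / c if c else 0.0  # proportion of mutual changes
```

In this toy case three transitions involve a change, and two of them are concerted, so e/c is 2/3. Because every transition counts equally, the result depends on the ordering of the alignment, which is why the text stresses that sequences are grouped by subgroup, gene, insertion site and organism.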
Figure 7. Sequences observed at various correlating group I intron positions. a, (262·312)/263. b, (110·211)/305. c, (280·298)/279. Only values greater than 2 are shown. Numbers in bold type represent more than 10% of the introns.

while the other P3/P4 nucleotides are justified toward the P3 helix. Alternative decisions could have produced significantly different correlations for nucleotides 103 to 106. Other potential problems in the identification of P6 triples were raised by a recent NMR study (Chastain & Tinoco, 1993), which suggested that P6 triples involved base/sugar interactions, and varied significantly in structure upon sequence change. In contrast to base/base interactions, base/sugar interactions could produce sequence constraints in which correlations between adjacent bases become predominant, thus possibly explaining these unexpected correlations. Finally, it is also possible that very high neighbor effects could relegate the actual triple correlations to second position.

Other reciprocal correlates having high χ² values are present. One involves positions 262·312 and 263. This triple correlation is also strongly identified with our phylogenetic event-based method (see Table 5). Figure 7a shows the sequences observed at (262·312)·263. C263 is associated with A262·U312, while A263 is associated with G262·C312 or C262·G312. Our observations of the alignment reveal several concerted mutations occurring among closely related introns, particularly within subgroups IB1 and IC1 (data not shown). The base-pair 262·312 is itself well supported, with multiple independent covariations observed (data not shown). This strong correlation can be interpreted in various ways. On the basis of a three-dimensional modeling study of the intron guanosine binding site (Yarus et al., 1991), it was proposed that these three nucleotides form a base-triple. 
Alternatively, it has been suggested, from experimental mutagenesis studies, that this sequence constraint is necessary to ensure that the nucleotide at position 263 is bulged out of this helix, and is not base-paired to position 312 (Couture et al., 1990). Note that when position 263 is a C, the 262·312 base-pair is an A·U or U·A. When 263 is an A, 262·312 is a G·C or C·G. Thus, position 263 is not able to form a standard Watson–Crick pair with 312. This hypothesis suggests that the triplets (U·A)A and (G·C)C should also be found, which has not been the case to date. We favor the suggestion by Yarus that a base-triple interaction forms between these positions.

A second correlation, (110·211)/305, is supported by the e/c study and a high χ² reciprocal correlation. This correlation is particularly interesting, since it involves nucleotides spanning two distant domains of the group I intron, namely the P3/P7 and P4/P6 coaxial stems. The correlation results primarily from an exchange between the sequence patterns (A·U)·C and (G·C)·U. This correlation, unlike many of the others, occurs in its purest form in the subgroups IA and ID (Figure 7b), although covariation between these triplets is found in the other subgroups, albeit intermixed with non-converted variations (data not shown). The correlation (110·211)/305 was identified previously using a smaller dataset (Michel & Westhof, 1990) but was disregarded on the grounds of steric conflicts with the P4 triples. However, more recent experimental data (Pyle et al., 1992) have suggested interactions between the P1 stem and the J7/8 strand that shift J7/8 towards the P4 stem, and thus reduce the distance between nucleotides 110·211 and 305. Adjusting the current three-dimensional model to take these new data into account could suggest alternative ways to form a (110·211)/305 interaction, and perhaps resolve the steric conflicts. 
In addition, two other correlations in Table 5, (111·209)/305 and (109·212)/304, resemble the neighbor effects that could be expected in the presence of a (110·211)·305 triple.

The other reciprocal correlations in Table 5 are (280·298)/279 and (220·253)/255. The first is not supported by the e/c analysis, and thus we do not consider it a credible triple candidate. The other reciprocal correlation, (220·253)/255, is supported
by a significant number of coordinated changes, primarily in subgroups IC1 and IC2. The number of nucleotides between positions 253 and 257 is variable. Thus it is difficult to align these unpaired nucleotides across all the subgroups with much confidence. However, within the IC1 subgroup this number is three in almost all cases, while it is always five in the IC2 subgroup, allowing us to obtain a reliable local alignment for these two groups. The sequences observed in these two subgroups are shown in Figure 7c. Formation of a (220·253)·255 triple is feasible stereochemically, nucleotide 255 being situated in the internal loop flanking the 220·253 base-pair.

The combined e/c and χ² analysis has identified three additional base-triple candidates in the group I introns, namely (262·312)·263, (110·211)·305 in the ID and IA subgroups, and (220·253)·255 in the IC1 and IC2 subgroups.

Base-pair to base-pair correlations: identification of neighbor effects

The identification of base-triples requires the ability to distinguish between correlations due to physical interactions and those due to other factors, such as RNA identity or accidental evolutionary events. We have suggested that networked sequence correlations are characteristic of triple-helix formation. We now propose to use this property to help distinguish base-triples (at least when present in triple helices) from other correlated positions.

A simple method to assess neighbor effects is to directly measure correlations between base-pairs. For this purpose, we perform a χ² test as done in the previous analysis, the only difference being a contingency table having 16 rows and 16 columns (instead of 16 × 4). The sparseness problem is again resolved here by creating smaller 2 × 2 tables, computing χ² in each table, and retaining the highest value. A simple measure of the neighbor effect, N, could then involve computing χ² for each set of adjacent base-pairs (i,j) and (i + 1,j − 1): N = χ²(i,j,i + 1,j − 1). 
However, since sequence correlations also occur between positions separated by several base-pairs in the same helical stem (Tables 1 and 2), the neighbor effect N at base-pair i,j can be more accurately measured by averaging correlations in a window comprising n base-pairs at each side of i,j, using the following formula:

$$N(i,j) = \frac{\sum_{k=1}^{n} \left[ \chi^2(i,j,i+k,j-k) + \chi^2(i,j,i-k,j+k) \right]}{2n} \tag{3}$$

If i ± n or j ± n is not a paired position, the corresponding correlation is not computed, and n is corrected accordingly. We use n = 2, and thus evaluate a window of five base-pairs (from i − 2 to i + 2) surrounding i,j. Figure 8 shows results obtained for tRNA and the group I intron.

Figure 8. Neighbor effects measured in equation (3). The density of the dots is proportional to N(i,j), darker dots representing the highest values and lighter dots the lowest values. Precise N(i,j) values for base-pairs of interest are given in Tables 3 to 5. a, Type I tRNA. b, Group I intron.
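Equation (3) can be sketched in a few lines of code. The version below is an illustration under stated assumptions rather than the authors' implementation: the helper names `max_chi2_2x2` and `neighbor_effect` are hypothetical, each 2 × 2 table is assumed to contrast one base-pair state against all others at each of the two base-pairs, and "n is corrected accordingly" is read as dividing by the number of correlations actually computed.

```python
def max_chi2_2x2(states_a, states_b):
    """Highest chi-square over all 2x2 tables (state a vs rest, state b vs rest)."""
    total = len(states_a)
    best = 0.0
    for a in set(states_a):
        for b in set(states_b):
            row = states_a.count(a)
            col = states_b.count(b)
            both = sum(1 for x, y in zip(states_a, states_b) if x == a and y == b)
            denom = row * (total - row) * col * (total - col)
            if denom == 0:
                continue  # a constant column gives no usable 2x2 table
            det = both * (total - row - col + both) - (row - both) * (col - both)
            best = max(best, total * det * det / denom)
    return best


def neighbor_effect(alignment, pairs, i, j, n=2):
    """Average chi-square between base-pair (i, j) and its stacked neighbors.

    pairs lists secondary-structure base-pairs as (5' index, 3' index).
    Returns None when no neighboring base-pair exists (NA in the tables).
    """
    partner = {a: b for a, b in pairs}

    def states(a, b):
        return [seq[a] + seq[b] for seq in alignment]

    center = states(i, j)
    total, terms = 0.0, 0
    for k in range(1, n + 1):
        for a, b in ((i + k, j - k), (i - k, j + k)):
            if partner.get(a) == b:  # skip positions that are not paired
                total += max_chi2_2x2(center, states(a, b))
                terms += 1
    return total / terms if terms else None


# Toy three-base-pair helix in which all pairs covary perfectly.
toy_alignment = ["GGCGCC", "GGCGCC", "ACGCGU", "ACGCGU"]
toy_pairs = [(0, 5), (1, 4), (2, 3)]
n_middle = neighbor_effect(toy_alignment, toy_pairs, 1, 4)
```

For this toy helix each neighbor correlation equals the number of sequences (4), so the middle pair gets N = 4.0. In the tables, N is then expressed as a percentage of the highest value observed in the molecule.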
In tRNA and group I introns, helices associated with base-triples show significantly larger neighbor effects (N, measured as in eqn (3)) than those helices with no known base-triples. To illustrate these strong base-pair to base-pair correlations, we show in Table 6 the sequences observed in group I introns at positions 109·212 and 108·213. The base-pair G108·C213 is strongly associated with a C·G at position 109·212, while C108·G213 is associated with A·U or G·C at position 109·212.

Table 6
Sequences observed at group I intron positions (109·212) and (108·213)

            108·213
109·212     A·U     U·A     C·G     G·C
A·U         -       -       26      -
C·G         5       7       6       122
G·C         -       -       36      -
U·G         -       -       -       5

Only values greater than 2 are shown. Numbers in bold face represent more than 10% of the group I intron sequences.

In group I introns (Figure 8b), neighbor effects are consistent with triple formation in the P4/P6 helices, and are also significant at positions 110·211, a base-pair having a potential triple partner (Table 5). However, no significant neighbor effect supports the strong triple correlations (262·312)/263 and (220·253)/255. In spite of this result, we still support the formation of base-triples at these positions, since these triples would not be part of an extended triple-helical region, which we proposed was necessary for the base-pairs to have noticeable neighbor effects. Also, base-pairs near 262·312 in P7 are extremely conserved, and thus limit any base correlation in this region.

Combining analyses for base-triple prediction

The various analyses presented here can be combined into a single protocol for base-triple prediction. The criteria we propose to apply in this protocol remain loose at this stage of our work, but will be refined as the method is applied to other classes of RNA. These criteria are presented here.

First, we believe good triple candidates should score well in both base to base-pair correlations (χ² and e/c) and neighbor effect analysis. A cutoff of 25% of the highest value for χ² and neighbor effect measurements would retain all experimentally proven triples in tRNA and group I introns. We therefore require that values for χ² and neighbor effects N (given in Tables 3 to 5) stand above this threshold. A measure of phylogenetic events (e/c) being available for group I introns, we require that triple correlations in the group I intron are associated with a significant level of concerted mutations (at least one asterisk in Table 5). Finally, to tighten the prediction criteria, we require χ² correlations to be reciprocal. The triplets that best satisfy these stringent criteria are shown in the first row of Table 7.

These stringent criteria yield no false positives in either tRNA family. In type II tRNA, the triple (13·22)·9 is predicted, but a question remains for the triple (15·48)·20. We cannot use equation (3) to compute the neighbor effect associated with this triple, since no secondary base-pair flanks the 15·48 pair. However, the strong correlation observed in Table 4 between 15·48 and 21 could very well be a neighbor effect. Thus, we tentatively include this triple in Table 7. In type I tRNA, two of the three yeast tRNAPhe base-triples are predicted, although 45·(10·25) is not. In group I introns, the previously identified P4 triples are predicted, along with one experimentally unproven interaction, (110·211)·305. Two triple candidates with

Table 7
Triples predicted in tRNA and group I introns based on Tables 3 to 5, using two different criteria

Criteria for triple prediction                 tRNA type I    tRNA type II     Group I intron(a)

Stringent
  χ² (base to base-pair) > 25% of              (13·22)·46     (13·22)·9        (109·212)·260
    highest value                              (12·23)·9      (15·48)·20(b)    (108·213)·259
  N > 25% of highest value                                                     (110·211)·305
  Best reciprocal correlate                                                    (262·312)·263(c)
                                                                               (220·253)·255(c)

Relaxed
  χ² (base to base-pair) > 25% of              Same +         Same +           Same +
    highest value                              (11·24)·36     (12·23)·21       (216·257)·105
  N > 25% of highest value                     (10·25)·45
  Each position involved in only
    one triple (not necessarily best
    reciprocal correlate)

(a) For group I intron triples, we use the phylogenetic event count as an additional criterion. Only putative triples associated with an asterisk in Table 5 are included.
(b) N cannot be measured for this position, but there is a large cross-correlation at (15·48)/21.
(c) These 2 putative triples are not supported by neighbor effects, but are best reciprocal correlates and associated with significant phylogenetic events (see discussion in text).
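The stringent criteria of Table 7 amount to a conjunction of simple tests, which can be sketched as a filter. This is an illustrative assumption about how such a protocol might be automated, not the authors' procedure: the `Candidate` record and `stringent_filter` function are hypothetical names. The χ² and N values are transcribed from Table 5, the reciprocal flags follow the correlations described as reciprocal in the text, and the e/c flag records whether the row carries an asterisk.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    triple: str              # "(pair).base", with "." standing for the middle dot
    chi2: float              # chi-square, % of highest value
    n: Optional[float]       # neighbor effect, % of highest value; None = NA
    reciprocal: bool         # best reciprocal chi-square correlate
    ec_support: bool         # at least one asterisk in Table 5


def stringent_filter(candidates, cutoff=25.0):
    """Keep candidates passing every stringent criterion, applied literally."""
    return [
        c.triple
        for c in candidates
        if c.chi2 > cutoff
        and c.n is not None
        and c.n > cutoff
        and c.reciprocal
        and c.ec_support
    ]


GROUP_I_ROWS = [
    Candidate("(109.212).260", 100.0, 67, True, True),
    Candidate("(262.312).263", 97.4, None, True, True),
    Candidate("(108.213).259", 95.8, 75, True, True),
    Candidate("(216.257).106", 61.2, 100, True, False),
    Candidate("(110.211).305", 56.9, 36, True, True),
    Candidate("(280.298).279", 52.9, 39, True, False),
    Candidate("(220.253).255", 33.0, 15, True, True),
]

predicted = stringent_filter(GROUP_I_ROWS)
```

Applied literally, the filter keeps the two P4 triples and (110.211).305. The candidates (262.312).263 and (220.253).255 fail only the neighbor-effect test; these are the borderline cases that Table 7 retains with a footnote, which is why a purely mechanical filter still needs the judgement discussed in the text.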
neighbor effects below the 25% threshold, (262·312)·263 and (220·253)·255, are noteworthy, since they satisfy all of our other requirements. While the other group I intron triples would be complexed in a triple-helix formation, these two putative triples are both isolated from other known base-triples; therefore, they would not be part of a triple helix. Further study is required to determine if this is the reason for their lack of neighbor effects. Until we have the results from this study, the biologist's judgement is still necessary to resolve these "border-line" cases. The possible existence of the triple (110·211)·305 has been discussed.

The prediction criteria were relaxed by allowing for non-reciprocal correlations, under the condition that no base-pair or single-stranded nucleotide belongs to more than one triple (Table 7, line 2). For type I tRNAs, the triple 45·(10·25) is now predicted. The relaxed criteria also identify the correlation (11·24)/36. We suggest that this unique false positive results from a functional linkage between positions 24 and 36, on the basis of experiments establishing that mutations at position 24 affect codon/anticodon recognition by tRNATrp (Hirsh, 1971; Smith & Yarus, 1989). In type II tRNAs, the relaxed criteria identify the correlation (12·23)/21. 
Instead of interacting with the pair 12·23, as this correlation suggests, nucleotide 21 faces the pair 8·14 in the type II tRNASer crystal structure, and is proposed to interact with or face pair 8·14 in other type II tRNA solution structures (Dock-Bregeon et al., 1989; Baron et al., 1993). However, since bases 12·23 and 21 are close in space, we cannot rigorously exclude their interaction in certain type II tRNAs.

In group I introns, the relaxed criterion identifies the triple (216·257)·105, one of the previously proposed P6 triples (Michel & Westhof, 1990).

Conclusion and Perspectives

Our previous correlation analyses sought correlations that occur between two positions in an RNA alignment (Gutell et al., 1992). While these analyses effectively predicted secondary structure pairing, we had difficulty identifying base-triples with confidence. We suggest here two reasons for this weakness. First, structurally similar base-triples can form between bases that vary in a non-compensatory fashion, which reduces covariation. Second, base-triples do not necessarily involve the same positions in all members of an RNA family.

With these obstacles in mind, we have developed methods to enhance our ability to predict base-triples by specifically seeking correlations between secondary structure base-pairs and nucleotides unpaired in the secondary structure. This significantly enhances correlations for base-triples. During our earlier studies, we also identified weaker correlations between many of the bases in the tRNA D-stem. We suggested that these effects could be specific to base-triples forming local triple helices. We developed an algorithm that quantifies these neighbor effects in RNA secondary helices. The most pronounced effects in tRNA were in the D-helix, while in the group I intron they were in the P4 and P6 helices, the same helices known to be involved in triple formation. 
The combination of these two correlation analyses identifies known base-triples more effectively than any previous method.

The accuracy of current protocols is limited by heterogeneity within the sequence datasets. Base-triple prediction will remain ambiguous as long as the dataset analyzed contains RNAs that form triples in different positions. For example, we are currently unable to predict triples (13·22)·46 and 45·(13·22) simultaneously in type I tRNAs, since they both occur in the analyzed sequences. It should be possible to isolate subsets of sequences displaying specific correlations, and enhance predictions in each subset. The growth of RNA databases, and the availability of the algorithms presented herein, will certainly lead us in that direction. Another enhancement would be to combine the various prediction criteria introduced in this study into an automated protocol. An integration of χ² correlation values and phylogenetic event counts would be particularly useful in RNAs with well established phylogenetic relationships, such as the ribosomal RNAs.

Materials and Methods

Sequence alignments

The tRNA sequence alignment used was adapted from Sprinzl et al. (1991). We aligned the variable loop (which was not aligned in the original database), and removed mitochondrial sequences, leaving 895 type I and 263 type II nuclear tRNAs, which were analyzed separately. The group I intron alignment contains 222 sequences compiled by S. H. Damberger and R. R. Gutell (unpublished results). Analyses were performed only on the core region comprising the stems P1, P3, P4, P6, P6a, P7, P8, a part of P5 and all intervening single-stranded segments. Intron sequences were classified into structurally distinct subgroups (IA, IB, IC and ID) according to the definitions of Michel & Westhof (1990). We further subdivided each subgroup using these criteria: (1) the sequences within each subgroup were ordered by the type of gene in which the intron was found (e.g. ATP9, SSU rRNA, etc.). 
(2) The specific site in that gene where the intron was found (e.g. SSU site 531). (3) Cellular location (e.g. nucleus, mitochondrion, chloroplast) of the intron. (4) A rough phylogenetic ordering of the organisms.

Structural data

Detailed base-triple information is available for six tRNA crystal structures: yeast tRNAPhe (Quigley & Rich, 1976; Sussman & Kim, 1976), Escherichia coli tRNAMetf (Woo et al., 1980), yeast tRNAAsp (Dumas et al., 1985), E. coli tRNAGln (Rould et al., 1989), yeast tRNAMeti (Basavappa & Sigler, 1991) and Thermus thermophilus tRNASer2 (GGA) (Biou et al., 1994). Although no crystal structures are available for group I introns, it has been suggested that triples form in the P4 and P6 helices (Michel et al., 1990; Michel & Westhof, 1990). The existence of both P4 triples and one of the proposed P6 triples is supported by
mutagenesis experiments (Michel et al., 1990; Green & Szostak, 1994). There is good evidence for the formation of base–base interactions in the P4 triples, but the nature of the interactions in the P6 triples remains unclear. NMR experiments on a model oligonucleotide that partially reproduced the P4/P6 domain suggested that triple interactions exist in the form of base–backbone contacts (Chastain & Tinoco, 1993). However, the applicability of these latter results in the group I intron context is uncertain, given that important parts of the P4/P6 triple domain are absent from the construct.

Programs

Sequence alignments were visualized and manipulated using the alignment editor AE2 (T. Macke, The Scripps Clinic, CA) available from the Ribosomal Database Project (Larsen et al., 1993), and studied using a comparative sequence analysis program developed in our laboratory (S. H. Damberger, D. Gautheret & R. R. Gutell, unpublished results). This software computes frequencies of bases, base-pairs and base-triples, performs pairwise correlation analyses using mutual information (Chiu & Kolodziejczak, 1991; Gutell et al., 1992), and computes various types of correlations based on χ² tests and phylogenetic event counting, as discussed above. Secondary structure graphics were produced using the program XRNA (B. Weiser & H. Noller, unpublished results).

Notation

We adopted the notation (X·Y)·Z to describe a triple interaction involving the secondary base-pair X·Y and position Z, where Z interacts with Y; and we use Z·(X·Y) when Z interacts with X. When interacting nucleotides are not well established, as in the group I intron, we always use the notation (X·Y)·Z. We use the term "base-triple" when only the bases interact, "nucleotide-triple" when base–backbone contacts are involved, and simply "triple" as the general term. Correlations between positions X and Y are noted X/Y. The numbering systems used are those of yeast tRNAPhe and the T. 
thermophila group I intron.

Acknowledgements

This work was supported by grants from the NIH (GM48207) and the Colorado RNA Center to R.R.G. We thank SUN Microsystems for their donation of computer equipment, and the W. M. Keck Foundation for its support of RNA Science on the Boulder campus. We also thank Dr T. Cech for comments on the manuscript, and Drs V. Rath and T. Steitz for sharing information on the tRNAGln structure.

References

Baron, C., Westhof, E., Böck, A. & Giegé, R. (1993). Solution structure of selenocysteine-inserting tRNASec from Escherichia coli. J. Mol. Biol. 231, 274–292.
Basavappa, R. & Sigler, P. B. (1991). The 3 Å crystal structure of yeast initiator tRNA: functional implications in initiator/elongator discrimination. EMBO J. 10, 3105–3111.
Bina-Stein, M. & Stein, A. (1976). Allosteric interpretations of the Mg2+ binding to the denaturable Escherichia coli tRNAGlu2. Biochemistry, 15, 3912–3917.
Biou, V., Yaremchuk, A., Tukalo, M. & Cusack, S. (1994). The 2.9 Å crystal structure of T. thermophilus seryl-tRNA synthetase complexed with tRNASer. Science, 263, 1404–1410.
Cech, T. R., Damberger, S. H. & Gutell, R. R. (1994). Representation of the secondary and tertiary structure of group I introns. Nature Struc. Biol. 1, 273–280.
Cedergren, R. J., LaRue, B. & Grosjean, H. (1981). The evolving tRNA molecule. CRC Crit. Rev. Biochem. 11, 35–104.
Chastain, M. & Tinoco, I., Jr (1993). Nucleoside triples from the group I intron. Biochemistry, 32, 14220–14228.
Chiu, D. K. Y. & Kolodziejczak, T. (1991). Inferring consensus structure from nucleic acid sequences. Comp. Appl. Biosci. 7, 347–352.
Couture, S., Ellington, A. D., Gerber, A. S., Cherry, J. M., Doudna, J. A., Green, R., Hanna, M., Pace, U., Rajagopal, J. & Szostak, J. W. (1990). Mutational analysis of conserved nucleotides in a self-splicing group I intron. J. Mol. Biol. 215, 345–358.
Dietrich, A., Romby, P., Maréchal-Drouard, L., Guillemaut, P. & Giegé, R. (1990). 
Solution conformation of several free tRNALeu species from bean, yeast and Escherichia coli, and interaction of these tRNAs with bean cytoplasmic leucyl-tRNA synthetase. A phosphate alkylation study with ethylnitrosourea. Nucl. Acids Res. 18, 2589–2597.
Dock-Bregeon, A. C., Westhof, E., Giegé, R. & Moras, D. (1989). Solution structure of a tRNA with a large variable region: yeast tRNASer. J. Mol. Biol. 206, 707–722.
Dumas, P., Ebel, J. P., Giegé, R., Moras, D., Thierry, J. C. & Westhof, E. (1985). Crystal structure of yeast tRNAAsp: atomic coordinates. Biochimie, 67, 597–606.
Green, R. & Szostak, J. W. (1994). In vitro genetic analysis of the hinge region between helical elements P5-P4-P6 and P7-P3-P8 in the sunY group I self-splicing intron. J. Mol. Biol. 235, 140–155.
Gutell, R. R. (1993). Comparative studies of RNA: inferring higher-order structure from patterns of sequence variation. Curr. Opin. Struct. Biol. 3, 313–322.
Gutell, R. R., Weiser, B., Woese, C. R. & Noller, H. F. (1985). Comparative anatomy of 16S-like ribosomal RNA. Progr. Nucl. Acid. Res. 32, 155–216.
Gutell, R. R., Power, A., Hertz, G. Z., Putz, E. J. & Stormo, G. D. (1992). Identifying constraints on the higher-order structure of RNA: continued development and application of comparative sequence analysis methods. Nucl. Acids Res. 20, 5785–5795.
Gutell, R. R., Larsen, N. & Woese, C. R. (1994). Lessons from an evolving rRNA: 16S and 23S rRNA structures from a comparative perspective. Microbiol. Rev. 58, 10–26.
Haselman, T., Chappelear, J. E. & Fox, G. E. (1988). Fidelity of secondary and tertiary interactions in tRNA. Nucl. Acids Res. 16, 5673–5684.
Hirsh, D. (1971). Tryptophan transfer RNA as the UGA suppressor. J. Mol. Biol. 58, 439–458.
Holbrook, S. R., Warrant, R. W., Church, G. M. & Kim, S. H. (1977). RNA–ligand interactions: [I] Magnesium binding sites in yeast tRNAPhe. Nucl. Acids Res. 4, 2811–2820.
Hou, Y. M. (1994). 
Structural elements that contribute to anunusual tertiary interaction in a transfer RNA.Biochemistry, 33, 4677–4681.Hou, Y. M. & Schimmel, P. (1989). Evidence that amajor determinant for the identity of a transferRNA is conserved in evolution. Biochemistry, 28,6800–6804.
Hou, Y. M., Westhof, E. & Giegé, R. (1993). An unusual RNA tertiary interaction has a role for the specific aminoacylation of a transfer RNA. Proc. Nat. Acad. Sci., U.S.A. 90, 6776–6780.
Jaeger, L., Michel, F. & Westhof, E. (1994). Involvement of a GNRA tetraloop in long-range RNA tertiary interactions. J. Mol. Biol. 236, 1271–1276.
Klug, A., Ladner, J. & Robertus, J. D. (1974). The structural geometry of co-ordinated base changes in transfer RNA. J. Mol. Biol. 89, 511–516.
Larsen, N., Olsen, G. J., Maidak, B. L., McCaughey, M. J., Overbeek, R. N., Macke, T. J., Marsh, T. L. & Woese, C. R. (1993). The ribosomal database project. Nucl. Acids Res. 21 Suppl., 3021–3023.
Levitt, M. (1969). Detailed model for transfer ribonucleic acid. Nature (London), 224, 759–763.
Major, F., Gautheret, D. & Cedergren, R. (1993). Reproducing the three-dimensional structure of a tRNA molecule from structural constraints. Proc. Nat. Acad. Sci., U.S.A. 90, 9408–9412.
Malhotra, A., Tan, R. K. & Harvey, S. C. (1990). Prediction of the three-dimensional structure of Escherichia coli 30S ribosomal subunit: a molecular mechanics approach. Proc. Nat. Acad. Sci., U.S.A. 87, 1950–1954.
McClain, W. H. (1993a). Identity of Escherichia coli tRNACys determined by nucleotides in three regions of tRNA tertiary structure. J. Biol. Chem. 268, 19398–19402.
McClain, W. H. (1993b). Rules that govern tRNA identity in protein synthesis. J. Mol. Biol. 234, 257–280.
McClain, W. H. & Foss, K. R. (1988). Changing the identity of a tRNA by introducing a G-U wobble pair near the 3′ acceptor end. Science, 240, 793–796.
McClain, W. H., Foss, K. R., Jenkins, R. A. & Schneider, J. (1991). Rapid determination of nucleotides that define tRNAGly acceptor identity. Proc. Nat. Acad. Sci., U.S.A. 88, 6147–6151.
Michel, F. & Westhof, E. (1990). Modelling of the three-dimensional architecture of group I catalytic introns based on comparative sequence analysis. J. Mol. Biol. 216, 585–610.
Michel, F., Ellington, A. D., Couture, S. & Szostak, J. W. (1990). Phylogenetic and genetic evidence for base triple formation in the catalytic domain of group I introns. Nature (London), 347, 578–580.
Ninio, J. (1982). Molecular Approaches to Evolution, pp. 24–27, Pitman Books Ltd., London, U.K.
Olsen, G. J. (1983). Comparative analysis of nucleotide sequence data, PhD dissertation, University of Colorado Health Sciences Center, CO.
Pütz, J., Puglisi, J. D., Florentz, C. & Giegé, R. (1991). Identity elements for specific aminoacylation of yeast tRNAAsp by cognate aspartyl-tRNA synthetase. Science, 252, 1696–1699.
Pyle, A. M., Murphy, F. L. & Cech, T. R. (1992). RNA substrate binding site in the catalytic core of the Tetrahymena ribozyme. Nature (London), 358, 123–128.
Quigley, G. J. & Rich, A. (1976). Structural domains of transfer RNA molecules. Science, 194, 796–806.
Rould, M. A., Perona, J. J., Söll, D. & Steitz, T. A. (1989). Structure of E. coli glutaminyl-tRNA synthetase complexed with tRNAGln and ATP at 2.8 Å resolution. Science, 246, 1135–1142.
Shultz, D. W. & Yarus, M. (1994). tRNA structure and ribosomal function. I. tRNA nucleotide 27 to 43 mutations enhance first position wobble. J. Mol. Biol. 235, 1381–1394.
Smith, D. & Yarus, M. (1989). Transfer RNA and coding specificity. II. A D-arm tertiary interaction that restricts coding range. J. Mol. Biol. 206, 503–511.
Sprinzl, M., Dank, N., Nock, S. & Schön, A. (1991). Compilation of tRNA sequences and sequences of tRNA genes. Nucl. Acids Res. 19 (Suppl.), 2127–2171.
Sussman, J. L. & Kim, S.-H. (1976). Three-dimensional structure of a transfer RNA in two crystal forms. Science, 192, 853–858.
Winker, S., Overbeek, R., Woese, C. R., Olsen, G. J. & Pfluger, N. (1990). Structure detection through automated covariance search. Comp. Appl. Biosci. 6, 365–371.
Woese, C. R. (1987). Bacterial evolution. Microbiol. Rev. 51, 221–271.
Woese, C. R. & Pace, N. R. (1993). Probing RNA structure, function and history by comparative analysis. In The RNA World (Gesteland, R. F. & Atkins, J. F., eds), pp. 91–117, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Woo, N. H., Roe, B. A. & Rich, A. (1980). Three-dimensional structure of Escherichia coli initiator tRNAMetf. Nature (London), 286, 346–351.
Yarus, M. (1982). Translational efficiency of transfer RNAs: uses of extended anticodon. Science, 218, 646–652.
Yarus, M., Illangesekare, M. & Christian, E. (1991). An axial binding site in the Tetrahymena precursor RNA. J. Mol. Biol. 222, 995–1012.

Edited by D. E. Draper
(Received 20 July 1994; accepted in revised form 20 January 1995)