Comparative studies of RNA: inferring higher-order structure from patterns of sequence variation

Robin R. Gutell
University of Colorado, Boulder, USA

RNA structural chemistry and evolutionary biology, long considered disparate fields of study, are the topics of this review. Evolution transcends all of the sciences, adding a dimension that enriches every science it touches, and in its enrichment of those fields contributes back to evolutionary thought. RNA, on the other hand, is a molecule with an inordinate number of possible conformations; knowing which to evaluate experimentally is problematic. Analysis of RNA sequences from an evolutionary perspective reveals patterns of variation and sequence constraints, and suggests how an RNA sequence is folded up into its higher-order structure.

Current Opinion in Structural Biology 1993, 3:313-322

Introduction

“How far back the historian wishes to place the origins and antecedents of the Glass Bead Game is, ultimately, a matter of his personal choice. For like every great idea it has no real beginning; rather, it has always been, at least the idea of it.” Hermann Hesse, Magister Ludi (The Glass Bead Game)

Imagine you’re a graduate student, and your thesis project is to fold up a large RNA molecule (let us say hypothetically, 16S and 23S ribosomal RNA [rRNA]) into its correct secondary and tertiary structure. And if you have some extra time, deduce something about the functionally important regions of these RNA molecules. Unfortunately, your own laboratory experimentation is producing little useful information. You ask: Can these intrinsically complex molecules be folded up? How will I be able to say something profound about their function? And most important, will I be able to obtain an advanced degree before the year 2000?

And then one day, as if in a Hermann Hesse novel, you are thrilled to learn that the experiments have been done for you (CR Woese, personal communication).
The task now is to find the notebooks containing these results, question how this information is stored, and wonder how to interpret it. To keep a long story short, the notebooks are found and are filled with homologous nucleic acid sequence information, and with the sequencing and computer information revolutions upon us, you find a large number of RNA sequences, all in computer-readable form.

The interpretation of this sequence information is at first glance not readily apparent. Upon reflecting on your graduate course in evolutionary biology, which at the time was taken simply for the fun of it, not for the serious molecular biologist, you come up with two simple but powerful conclusions:

(a) These RNA molecules evolved to their present form, starting from a simpler, less constrained state and being progressively defined and refined in structure and function. Each structure existing today is thus highly tuned for its own structure/function and for its complex and dynamic interactions with the outside world. (It should be noted that this refinement in RNA structure is far more subtle than we can fully appreciate with today’s experimental methodology [1,2].)

(b) Principles of RNA structure, specific functional constraints, and overall cellular mechanics mold the evolution of these molecules. Although not necessarily implicit in its RNA structure, these principles, constraints and mechanics are encoded in the actual sequence of nucleotides.

Decoding this collection of nucleic acid sequences begins with the realization that molecules performing similar functions (e.g. tRNAs) must have a similar three-dimensional structure. And knowing that different base-pair types (e.g. AU and GC) compose homologous structural elements, e.g. a helix, it follows that similar higher-order structures can be derived from different primary structures.
(The details and extent of these realizations and deductions remain to be elucidated and fully appreciated.)

The scientific logic used here to deduce structural and functional relationships is different from conventional experimental analysis. Instead of designing and executing experiments to test the hypothesis, here the experiments have already been done, and the sequences are the answers! Now we are asking: what are the questions, what are the hypotheses being tested, and how were the experiments done? These questions underlie the analysis of our homologous RNA sequence datasets. Patterns of conservation and variation are analyzed, our initial goal being to infer higher-order structure.

Abbreviations
RNase-ribonuclease; rRNA-ribosomal RNA.

© Current Biology Ltd ISSN 0959-440X
Nucleic acids

Comparative sequence methods in practice

“Since all of the sRNAs [soluble RNAs] in the process of transferring their respective amino acids to protein on the surface of ribosomes probably have to meet similar, if not identical, spatial constraints, it would seem reasonable to conclude that all sRNAs will be capable of adopting essentially identical three-dimensional structures.” [3]

“Figure 3 shows, however, that it is possible to construct very similar base-paired structures in spite of the limited similarities in sequences.”

Transfer RNA
Shortly after the first few tRNA sequences were determined, comparative sequence analysis was applied to these sequences, resulting in the well known cloverleaf secondary structure [3-6]. For this analysis, secondary structure helices containing only canonical (e.g. AU and GC) and GU pairings in common with all sequences in this dataset were identified and scored positively. The cloverleaf was the only structure to satisfy this condition for a molecule 76 (or so) nucleotides in length.

5S ribosomal RNA
Nearly a decade later, comparative sequence methods were again called upon to fold 5S rRNA, a molecule 120 nucleotides in length, into its secondary structure.
Criteria similar to those applied to tRNA were used here. Whereas experimental methods resulted in several different secondary structure models, these comparative methods, when applied to 10 phylogenetically diverse sequences, resulted in a single analogous secondary structure.

16S and 23S ribosomal RNA
In 1978 and 1980, the first complete sequences for Escherichia coli 16S and 23S rRNA were determined [8-10], and shortly thereafter, comparative structure studies were initiated to fold these RNAs of length 1542 and 2904 nucleotides, the largest RNAs so far for which structures have been provided using these methods. Criteria similar to those used on tRNA and 5S rRNA resulted in a single secondary structure shared by all members of their respective rRNA datasets (16S, [11-13]; 23S, [14-16]). During this time, a helix was considered comparatively proven if two or more base pairs within the potential helix each contained at least one covariation. As the number and diversity of sequences increased, the 16S and 23S rRNA secondary structure models were refined, and variations in the different rRNA models were largely resolved. The most recent refinements of the 16S and 23S rRNA structures are based on a large and diverse collection of sequences [17*,18*,19]. At this time, the question of comparative proof can be addressed for each base pair in the model; the vast majority of all base pairs in the higher-order structure can now be considered proven!

Although appreciated by some, this methodology was not widely accepted during the early 1980s and was even cast negatively by some: “I don’t know how you can suggest structure by just looking at sequences”. With the success and attention given to the rRNAs, attitudes changed. Many of us started (mis)pronouncing genus-species names we knew nothing about, except for some RNA sequence. Many of us became (un)certified microbiologists, protozoologists and the like.

Other RNAs
During the 1980s, comparative sequence analysis was applied to other functionally important RNA molecules, resulting in secondary structure models for each of these comparative sequence datasets. This list includes: group I [20,21] and II [22] introns, ribonuclease (RNase) P RNA [23,24], U-RNAs (U1, U2, U4, U5 and U6) [25], 7S SRP RNA [26], and telomerase RNA [27].

Are these comparatively derived structures congruent with formal laboratory experimentation? The quick answer is ‘yes’, although the question cannot be addressed for all RNAs noted above, and the long detailed answer is beyond the scope of this review. Suffice it to say, all of the comparatively inferred secondary structure base pairs were present in the yeast tRNAPhe crystal structure [28,29], revealing the authenticity of this approach. Chemical probing experiments of the entire 16S rRNA [30] were largely consistent with the comparatively derived secondary structure, suggesting such a methodology could also be applied to large RNAs, although it doesn’t by itself prove this model.
Other experiments, discussed in part in the following sections, lend additional support for the higher-order interactions proposed with these comparative studies.

Searching for tertiary interactions

Transfer RNAs
With the strong implication that comparative methods can correctly deduce secondary structure, we can now ask: can such a methodology also deduce tertiary interactions? The first attempt was made on tRNA, resulting in a few correct and a few incorrect tertiary interaction proposals (when compared with the crystal structure solution [28,29]). But with significantly larger and more diverse tRNA datasets, and refined correlation analysis methods, a larger proportion of the higher-order structure can now be correctly inferred [32,33,34*,35].
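
The helix criterion quoted earlier for the rRNAs (two or more base pairs in a potential helix, each with at least one covariation) can be sketched in a few lines of Python. This is only an illustrative sketch, not the program used in the studies cited here: the toy alignment, the function names, and the definition of a covariation as a change at both positions between two canonical or GU pair types are all assumptions made for this example.

```python
from itertools import combinations

# Pair types scored positively in the early comparative studies.
CANONICAL_OR_GU = {"AU", "UA", "GC", "CG", "GU", "UG"}


def pair_types(alignment, i, j):
    """Return the set of two-letter pair types seen at columns i and j (0-based)."""
    return {seq[i] + seq[j] for seq in alignment}


def has_covariation(alignment, i, j):
    """Assumed definition: two allowed pair types that differ at BOTH positions."""
    allowed = [p for p in pair_types(alignment, i, j) if p in CANONICAL_OR_GU]
    return any(p[0] != q[0] and p[1] != q[1] for p, q in combinations(allowed, 2))


def helix_is_proven(alignment, helix, min_pairs=2):
    """A helix counts as 'proven' if at least min_pairs of its pairs covary."""
    supported = sum(has_covariation(alignment, i, j) for i, j in helix)
    return supported >= min_pairs


# Hypothetical three-sequence alignment of a hairpin: a 3-bp helix around a 3-nt loop.
toy_alignment = ["GCAUUCUGC", "ACGUUCCGU", "GCGUUCUGC"]
toy_helix = [(0, 8), (1, 7), (2, 6)]

print(helix_is_proven(toy_alignment, toy_helix))
```

In this toy alignment, pairs (0,8) and (2,6) each show a compensatory change while (1,7) is invariant, so the three-pair helix meets the two-pair threshold; a helix made of (0,8) and (1,7) alone would not.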
16S and 23S ribosomal RNAs
A search for tertiary interactions in the 16S and 23S rRNAs was initiated in the early 1980s and resulted in several refinements in secondary-structure pairings and a few candidates for tertiary pairings. The best candidate involved positions 570 and 866 (E. coli numbering) in 16S rRNA [37], forming a pseudoknot structure (see Fig. 1). As the 16S and 23S rRNA databases grew in number and diversity, the number and variety of tertiary interactions increased (see Figs 1 and 2 and next section; for recent work on 16S and 23S rRNA, see [17*,18*,19,38]). Genetic and biochemical experimental analyses have addressed and substantiated a number of these higher-order interactions (see also next section) [39*,40,41*,42,43,44].

Fig. 1. Secondary structure diagram for E. coli 16S rRNA. All nucleotides are replaced with small open circles. Higher-order interactions more complex than the secondary structure helices are denoted with a thick line or a large filled circle. Adapted from [17*] (for details, see [17*,18*]).

Group I introns
Covariance analysis of the group I intron database has been most impressive, producing a well established secondary and tertiary structure model and forming the basis for a detailed three-dimensional model upon which functional experimental analysis can be based. Several proposed base-triples, pseudoknots, and non-canonical pairings have been substantiated using site-directed mutagenesis.

Ribonuclease P RNA
The RNA component of the RNase P ribonuclease has been extensively studied by comparative and experimental methodologies. An evaluation of the phylogenetic commonality and diversity in RNase P RNAs led to the development of a mini-P RNA, a minimally configured RNA with normal enzymatic activity.
Several comparatively derived base pairings in the two pseudoknot structures were tested and substantiated using site-directed mutagenesis [48*].

Emerging principles of RNA structure

“If you’ve seen one helix, you’ve seen them all.”
“If I see one more secondary structure model, I’ll scream!”
(Comments overheard at a ribosome conference)

Fortunately, there is more to higher-order structure than just secondary-structure helices. Now we can ask what are these additional RNA structure principles and can they be inferred from comparative methods? Alternatively we can ask what types of structural features are discernible using comparative methods? Before we address these questions, it is important to step back and evaluate these methods, albeit in a most brief fashion.

Initial searches for tertiary interactions have used a newer covariance analysis algorithm, different from those used for inferring RNA secondary structure. The most notable differences are: correlating positions are identified regardless of the pairing type, in contrast to previous methods that specifically looked for canonical and GU pairs; and the current algorithm only looks for correlating pairs, independent of surrounding structure, which is in sharp contrast with older methods that only identified pairings within a potential secondary-structure helix. Analysis of the 16S and 23S rRNA datasets using this newer algorithm, and without any knowledge of previous secondary-structure proposals, identified the vast majority of all previously suspected secondary-structure pairings (RR Gutell, unpublished data) [34*].
Thus, searching only for correlating positions resulted in the two basic and underlying principles of RNA structure: namely, AU, GC and GU pairings, and the antiparallel and contiguous arrangement of these pairings!

It is of interest to note that the majority of all pairings in the 16S and 23S rRNA datasets identified using our most recent analysis algorithms (RR Gutell, unpublished data) [34*] are canonical or GU, and that the majority of these are found in the conventional secondary structure helix. Only a small percentage of all correlating pairs are non-canonical. Only a small percentage of all correlating pairs lie outside of the secondary structure, and these usually form pseudoknot structures or helices with a single base pair. These exceptional 16S and 23S rRNA interactions are emphasized in Figs 1 and 2, and discussed below.
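
A minimal sketch of a search of this kind, one that ignores pairing type and surrounding structure, is given below. It scores every pair of alignment columns by mutual information, which is one common measure of correlation and is assumed here purely for illustration; it is not claimed to be the algorithm used for the analyses cited above. The toy alignment and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j (0-based)."""
    n = len(alignment)
    freq_i = Counter(seq[i] for seq in alignment)
    freq_j = Counter(seq[j] for seq in alignment)
    freq_ij = Counter((seq[i], seq[j]) for seq in alignment)
    return sum(
        (count / n) * log2((count / n) / ((freq_i[a] / n) * (freq_j[b] / n)))
        for (a, b), count in freq_ij.items()
    )


def top_correlations(alignment, k=3):
    """Score all column pairs, with no assumption about pair type or helices."""
    length = len(alignment[0])
    scored = [
        (mutual_information(alignment, i, j), i, j)
        for i, j in combinations(range(length), 2)
    ]
    # Highest score first; ties broken by column indices for reproducibility.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scored[:k]


# Hypothetical alignment: columns 0 and 3 swap between A:G and G:A,
# column 1 is invariant, and column 2 varies independently.
toy_alignment = ["ACUG", "GCUA", "ACAG", "GCAA"]

for score, i, j in top_correlations(toy_alignment):
    print(f"columns {i} and {j}: {score:.2f} bits")
```

Because the score depends only on how often bases co-occur, the non-canonical A:G/G:A exchange in columns 0 and 3 is ranked first without any pairing rules being supplied, which is the property the text emphasizes.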
Fig. 2. Secondary structure diagrams for E. coli 23S rRNA: (a) 5' half; (b) 3' half. All nucleotides are replaced with small open circles. Higher-order interactions more complex than the secondary structure helices are denoted with a thick line or a large filled circle. Dashed lines represent tentative tertiary interactions. Adapted from [17*] (for details, see [17*,18*]).

Non-canonical pairs
Of the 16 possible pairing types, the six canonical and GU pairings account for the vast majority of all comparatively derived base pairs. In the majority of phylogenetic base-pair replacements, one of these six types is replaced with another of these six. Ten of the 16 possible pairings occur infrequently; however, specific classes of pairing types and their phylogenetic replacements are beginning to emerge, with the most salient ones described below (others have been identified [17*]).

A:G↔G:A
Several examples of this replacement type occur in the rRNAs. One occurs in 16S rRNA between positions 1357 and 1365, at the end of a helix [2,17*]. Within (eu)bacteria, chloroplasts and mitochondria, this pair interchanges between AG and GA; within Archaea and Eucarya, the interchange occurs solely between canonical pairing types. A second good example is found in 23S rRNA between positions 2112 and 2169, which along with several canonical pairings forms a parallel structure (also see below) [17*,38]. Most interestingly, these same two nucleotides are associated with the E site in translation, suggesting that this unusual pairing and/or the parallel structure are functionally important as well as structurally unique.

A:A↔G:G
Correlations between these two pairing types have been found in 16S rRNA between positions 722 and 733, and in 5S rRNA between positions 76 and 100 (RR Gutell, unpublished data). A similar set of correlated pairings is present in the HIV-1 Rev binding region of an in vitro genetically selected RNA [50]. In these three cases, this pairing is found in an internal loop, immediately adjacent to a helical structure.
The Rev protein binding suggests that this non-canonical pairing could be a general protein recognition motif.

U:U↔C:C
Several examples of these correlated replacements are found in 16S and 23S rRNA. The two found in 16S lie in different internal loops, immediately adjacent to the end of the helix. These same types of correlated pairings associate domains 4 and 5 of 23S rRNA. Thermodynamic studies have addressed these pairing types and found that UU and CC+ pairs can stabilize a duplex. Interestingly, these two pairings can form an isomorphic structure when one of the cytosines is protonated, whereas the unprotonated form of the CC pair is destabilizing and not isomorphic in structure.

G:U↔A:C
A percentage of the GU helical base pairings are very conserved, and for a percentage of these, replacement yields an AC pair. The three best examples are found in 16S rRNA and all lie at the end of one helical element and in close proximity to the end of an associated helix [17*]. In the translational decoding site of 16S rRNA, there is a CA pairing at positions 1402 and 1500 in the overwhelming majority of all 16S (and 16S-like) rRNA sequences. A few phylogenetically distinct mitochondria change this pair to a UG base pair [18*]. A GU and an AC base pair can form isomorphic structures when the adenine (of the AC) is protonated. The thermostability of this A+C pair is close to that of an AU pair [52**], suggesting that AC pairs, with the adenine protonated, may well be paired in a specific structural context.

Tetra loops
The hairpin loop of four bases is a common feature in the rRNAs, occurring in over 50% and 40% of all 16S and 23S hairpin loops, respectively. Among the 256 different loop sequences of size four, there is a strong bias in the rRNAs for three major classes: UUCG, GNRA, and CUUG [53]. Thermodynamic analysis revealed that these loops are surprisingly stable [54,55], and structural analysis of these loops reveals an unusually compact structure [56,57].

Pseudoknots
Pseudoknots are a popular and fashionable class of RNA structure, defined as a set of base pairings that cross an existing secondary-structure helix. Comparative methods have been used to elucidate many examples in 16S and 23S rRNAs [17*], as well as in many other RNA molecules [58**].
Within the rRNAs, these structures vary from one to three base pairs in length, and are situated immediately adjacent to another helical structure on one or both ends of its helix, suggesting a possible coaxial stack. NMR studies of a simple pseudoknot structure have revealed coaxial stacking of the helices.

Site-directed mutagenesis has substantiated several of these pseudoknot interactions. In 16S rRNA, a helix of three base pairs formed between a side bulge at position 505 and the apex of the hairpin loop at position 525 is nested between other helices, forming a complex pseudoknotted structure (assuming each helix occurs simultaneously) with multiple coaxial stacking possibilities. This same region is highly conserved in primary and secondary structure and strongly implicated in translational function. A series of elegant site-directed mutagenesis experiments has addressed the structure of this proposed helix [39*] and revealed that this helix is not only structurally correct, but also is directly implicated in translational function, streptomycin binding, and binding to ribosomal protein S12 (also see below).

A complex pseudoknot structure situated with multiple coaxial stacking possibilities involves 23S rRNA positions 1343-1344 with 1403-1404 (E. coli numbering). This general region of domain III is the binding site for an essential early-assembling ribosomal protein. This helix has recently been experimentally altered in a Saccharomyces cerevisiae in vitro protein binding system, using site-directed mutagenesis. The results clearly show that canonical pairing in this short helix is required for proper protein binding [41*]. Other comparatively derived pseudoknot structures have been proposed in RNase P, group I introns, and telomerase RNA [60]. Some of these structural elements have been strongly implicated in catalytic function, and have been evaluated and substantiated by site-directed genetic analysis [46,48*].
It is interesting to note that the lengths of non-ribosomal RNA pseudoknot helices are usually greater than those found in rRNA. Other naturally occurring pseudoknot structures have been suggested by comparative and experimental criteria, and discussed in some detail [58**]. More recently, the analysis of a collection of sequences derived from an in vitro amplification and selection for binding to HIV-1 reverse transcriptase has identified a pseudoknot-motif binding site.

Coaxial stacking
The concept of comparative evidence for a coaxial stack was initially proposed a decade ago and states that two adjoining helices that vary in length might be coaxially stacked upon one another if their combined length remains constant. Based on simple spatial considerations, many secondary and pseudoknot helices in the 16S and 23S rRNA can potentially stack upon one another; however, comparative evidence is lacking for all but two that do satisfy this condition. The first example of coaxial stacking involves helices 500-504/541-545 and 511-515/536-540 in 16S rRNA; the second is at the base of the α-sarcin loop in 23S rRNA, involving helices 2646-2652/2668-2674 and 2675-2680/2727-2732 [17*,18*]. For both proposed coaxial stackings, the lengths of the underlying helices remain the same in all (eu)bacteria, chloroplasts and mitochondria, but are of different lengths in the Archaea and Eucarya phylogenetic domains. Both of these rRNA regions have been directly implicated in translational function [39**,49] and some of these functions are overlapping, suggesting that if these coaxial stacks do occur, they could be associated with ribosomal function and with each other in a coordinated manner. As noted earlier, the region of 16S rRNA between positions 500 and 545 is considered to be quite complex, with a pseudoknot structure and potential coaxial stacking.
The coaxial stack proposed here on the basis of comparative evidence would only make this region more complex, and for all of these suggested coaxial stackings to occur, conformational rearrangements would be required (i.e. not all of them can occur simultaneously).
This should be an interesting set of ideas to test experimentally.

Parallel interactions
Comparative evidence exists for two sets of parallel interactions in rRNA. The more interesting of the two is found in the 23S rRNA, involving three pairings arranged in parallel: 2112-2169, 2113-2170, and 2117-2172. The first pair covaries between an AG and a GA pairing, whereas the latter two change from one canonical pair to another [17*]. This region is structured further with the interaction between position 2111 and the first and last nucleotide of the tetra loop at 2144-2147. This unusual structure is associated with the translational E-site.

Base triples
Transfer RNA, a molecule 76 nucleotides in length, contains three base-triple interactions [28,29]. These have been partially predicted by comparative methods [31-33,34*,35]. No convincing base triples have been uncovered (so far) within the rRNAs, nor have any been identified (yet) in RNase P RNA and the U-RNAs. Comparative analysis of the group I introns has, however, revealed several base triples [21,45], and these have been substantiated by site-directed mutagenesis and modeled in three dimensions. More recently, these two group I intron base triples have been characterized and substantiated by NMR.

Conclusions

“The G:U base pair in the upper stem might be important. Is it possible that the activating enzymes could extract enough information from the G:U pair and other features of this double-stranded region for it to act as a recognition site?”

An important question to ask at this juncture is what type of information can be obtained from comparative sequence analysis? Several examples of large structures and various structural elements have been inferred from such analysis, and many of these are consistent with and/or proven with experimental methods.
Beyond these, are additional constraints present in RNA molecules, and if so, what are they? Do they suggest new types of higher-order structural motifs, recognition sites for proteins or other RNA molecules, or do they represent subtle thermodynamic and/or structural refinements? Will we be able to decode this information from sequence information alone?

Addressing such questions will require additional sequences and diversity for each RNA dataset (e.g. 16S rRNA). In parallel, the computer tools used for comparative sequence analysis will need to be expanded and refined. Quantitative correlation analysis algorithms are capable of uncovering subtle constraints [32,33,34*,35]. The most recent application of these methods is beginning to identify structural constraints beyond simple pairings (secondary and tertiary), and suggests in some cases that certain base-pair types (or simply bases) influence the types of pairings (or bases) in close three-dimensional proximity (i.e. context effect; see Figs 3 and 4) [34*]. At the moment, these quantitative correlation analysis methods do not incorporate the number of phylogenetic events underlying each coordinated base change (i.e. the number of compensatory changes that have occurred throughout the phylogenetic tree; the larger this number, the more significant the set of changes), although such events have been incorporated into a non-quantitative Prolog-based covariance analysis program. Incorporating a ‘phylogenetic events’ factor into quantitative correlation algorithms could well improve these quantitative methods and help identify or strengthen the argument for new structural elements and/or new structural principles.

Fig. 3. Secondary-structure diagram of tRNA (yeast Phe numbering) highlighting those positions correlating best with position 13 (identified with filled triangle).
Two large filled circles identify the two highest correlating positions (22 and 46), and smaller filled circles identify the next five highest correlating positions with position 13. Adapted from [34*].

What is called comparative sequence analysis (or phylogenetic analysis by many who are referring to structural analysis of the type discussed here) goes beyond the analysis of a single RNA molecule. It should be appreciated that such analysis can and should encompass the comparison of different RNA molecules or subsets of a given molecular database, when there is biological rationale for doing so. For example, the complete tRNA sequence database can be subdivided into the 20 amino acid acceptors and analyzed for subtle structural differences, which could be the recognition signals for the different aminoacyl synthetases [64,65]. The first attempt at this was completed in 1966, and is surprisingly good given the small sequence dataset used. An example of intermolecular analysis includes a 16S-23S rRNA comparison. These two molecules are believed to interact during translation, and thus any significant intermolecular correlation could well be pointing at a structurally and functionally important site.

Fig. 4. Stereo pair of the three-dimensional structure for this tRNA, with the seven best correlating positions with position 13 identified as in Fig. 3. Adapted from [34*].

Comparative structure analysis is not only generating higher-order structures that are widely accepted in various RNA fields, it is also establishing an agenda for various experimental designs. I have noted how this method has identified many of the significant RNA structure principles, including Watson-Crick and GU pairings, antiparallel and contiguous arrangement of these pairings, tetra loops, pseudoknots, several classes of non-canonical pairings (and their replacements), helix coaxial stacking, base-triple interactions, and sets of pairs that form parallel structures. The majority of these structures have now been evaluated and substantiated in one form or another using experimental methods. When a structure of interest is at a functional site, that function can be experimentally evaluated in the light of that structure.
Thus a comparatively derived higher-order structure can and should be considered a hypothesis, tested with each new rRNA sequence, evaluated experimentally, and subject to molecular modeling.

Superimposing homologous RNA structures, for example 16S and 16S-like rRNA, allows us the opportunity to evaluate more than its higher-order structure, presenting us with a glimpse of what structural features are conserved throughout evolution or part thereof, which in turn suggests structural elements of possible functional significance. When the number and diversity of structures is sufficient, as it is for 16S and 23S rRNA (there now exist over 2200 16S and 16S-like and over 200 23S and 23S-like sequences), evolutionary events and pathways can be mapped in great detail. Not only can the reconstruction of these events be played like a Disney movie flip book (i.e. a frame by frame snapshot animation of (r)RNA evolving), but underlying constraints on structure and function can be deduced, from which principles and refinements of RNA structure and function can be inferred.

Darwinian or natural evolution has generated a wonderfully diverse collection of RNA molecules for us to compare and contrast utilizing the comparative methods discussed above. Recently, the advent of new techniques in biochemistry has allowed the molecular biologist to practise a little evolutionary home brewing for themselves. Starting with a very large collection of random oligonucleotide sequences, one can subject these macromolecules to multiple rounds of selection and amplification, enriching for those sequences that best satisfy the constraint conditions. Such methodology now puts some of Mother Nature’s authority into the hands of the research scientist.
But, instead of designing terribly complex and obtuse experiments, as it appears to us mere mortals, the scientist can now select and enrich for something far simpler, such as a small ligand or protein-binding site on an RNA molecule, or RNA molecules capable of defined catalytic functionality [66**,67**]. Such newer methodology lets the genie out of the bottle. Although the sequences are the answers, we now have a better appreciation of how the experiments were done; for we ask the questions and know what the underlying hypothesis is. The sequences resulting from this work will now have a Yin associated with its Yang. The next few years should be an exciting and rewarding time. (And people like me can still say that the experiments have been done for us.)
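
As a purely illustrative toy, the rounds of selection and amplification described above can be mimicked in silico. Everything in this sketch is an assumption made for the example: the 'binding' score (best match to a hypothetical target motif), the pool size, and the rule of keeping the better-scoring half of the pool and copying each survivor twice. It is not a model of any published selection experiment.

```python
import random


def binding_score(seq, motif):
    """Hypothetical fitness: best count of matching bases over all windows."""
    width = len(motif)
    return max(
        sum(a == b for a, b in zip(seq[k:k + width], motif))
        for k in range(len(seq) - width + 1)
    )


def mean_score(pool, motif):
    """Average fitness of the whole pool."""
    return sum(binding_score(s, motif) for s in pool) / len(pool)


def select_and_amplify(pool, motif, rounds=4):
    """Repeat: keep the better-scoring half (selection), then double it (amplification)."""
    history = [mean_score(pool, motif)]
    for _ in range(rounds):
        ranked = sorted(pool, key=lambda s: binding_score(s, motif), reverse=True)
        survivors = ranked[: len(ranked) // 2]
        pool = survivors + survivors  # no mutation is modeled in this toy
        history.append(mean_score(pool, motif))
    return pool, history


# Hypothetical starting pool of 200 random 20-mers and an arbitrary target motif.
rng = random.Random(0)
start_pool = ["".join(rng.choice("ACGU") for _ in range(20)) for _ in range(200)]
final_pool, history = select_and_amplify(start_pool, "GGAUCC")

print([round(h, 2) for h in history])
```

Because only the better-scoring half of the pool is copied forward, the mean score recorded in `history` cannot decrease from one round to the next; real selections also involve mutation and experimental noise, which are omitted here.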
So, in closing, we can sit back and marvel about the rapid technological advances happening all around us. The sequencing revolution makes it possible to fill volumes of notebooks with homologous sequence information. Computers and their networks allow us to readily store, manipulate, access, and analyze these notebooks. The comparative sequence/structure analysis paradigm puts some meaning and dimensions to this information. It is a paradigm that is itself still being defined and developed.

Acknowledgements

This work was supported by the NIH (GM 48207). RR Gutell is an Associate in the Program in Evolutionary Biology of the Canadian Institute of Advanced Research. I wish to thank the WM Keck Foundation for their generous support of RNA science on the Boulder campus, SUN Microsystems for their donation of computer equipment, and Bryn Weiser, Tom Macke and others for developing much of the computer code used to analyze and present RNA structural information.

References and recommended reading

Papers of particular interest, published within the annual review period, have been highlighted as:
* of special interest
** of outstanding interest

1. WOESE CR: Just So Stories and Rube Goldberg Machines: Speculations on the Origin of the Protein Synthetic Machinery. In Ribosomes: Structure, Function, and Genetics. Edited by Chambliss G et al. Baltimore: University Park Press; 1980:357-373.
2. WOESE CR, GUTELL R, GUPTA R, NOLLER HF: Detailed Analysis of the Higher Order Structure of 16S-Like Ribosomal Ribonucleic Acids. Microbiol Rev 1983, 47:621-669.
3. RAJBHANDARY UL, STUART A, FAULKNER RD, CHANG SH, KHORANA HG: Nucleotide Sequence Studies on Yeast Phenylalanine sRNA. Cold Spring Harb Symp Quant Biol 1966, 31:425-434.
4. MADISON JT, EVERETT GA, KUNG HK: On the Nucleotide Sequence of Yeast Tyrosine Transfer RNA. Cold Spring Harb Symp Quant Biol 1966, 31:409-416.
5. HOLLEY RW, APGAR J, EVERETT GA, MADISON JT, MARQUISEE M, MERRILL SH, PENSWICK JR, ZAMIR A: Structure of a Ribonucleic Acid. Science 1965, 147:1462-1465.
6. ZACHAU HG, DÜTTING D, FELDMANN H, MELCHERS F, KARAU W: Serine Specific Transfer Ribonucleic Acids. XIV. Comparison of Nucleotide Sequences and Secondary Structure Models. Cold Spring Harb Symp Quant Biol 1966, 31:417-424.
7. FOX GE, WOESE CR: 5S RNA Secondary Structure. Nature 1975, 256:505-507.
8. BROSIUS J, PALMER ML, KENNEDY PJ, NOLLER HF: Complete Nucleotide Sequence of a 16S Ribosomal RNA Gene from Escherichia coli. Proc Natl Acad Sci USA 1978, 75:4801-4805.
9. BROSIUS J, DULL TJ, NOLLER HF: Complete Nucleotide Sequence of a 23S Ribosomal RNA Gene from Escherichia coli. Proc Natl Acad Sci USA 1980, 77:201-204.
10. BROSIUS J, DULL TJ, SLEETER DD, NOLLER HF: Gene Organization and Primary Structure of a Ribosomal RNA Operon from Escherichia coli. J Mol Biol 1981, 148:107-127.
11. WOESE CR, MAGRUM LJ, GUPTA R, SIEGEL RB, STAHL DA, KOP J, CRAWFORD N, BROSIUS J, GUTELL R, HOGAN JJ, NOLLER HF: Secondary Structure Model for Bacterial 16S Ribosomal RNA: Phylogenetic, Enzymatic and Chemical Evidence. Nucleic Acids Res 1980, 8:2275-2293.
12. STIEGLER P, CARBON P, EBEL JP, EHRESMANN C: A General Secondary Structure Model for Procaryotic and Eucaryotic RNAs of the Small Ribosomal Subunits. Eur J Biochem 1981, 120:487-495.
13. ZWIEB C, GLOTZ C, BRIMACOMBE R: Secondary Structure Comparisons between Small Subunit Ribosomal RNA Molecules from Six Different Species. Nucleic Acids Res 1981, 9:3621-3640.
14. NOLLER HF, KOP J, WHEATON V, BROSIUS J, GUTELL RR, KOPYLOV AM, DOHME F, HERR W, STAHL DA, GUPTA R, WOESE CR: Secondary Structure Model for 23S Ribosomal RNA. Nucleic Acids Res 1981, 9:6167-6189.
15. GLOTZ C, ZWIEB C, BRIMACOMBE R: Secondary Structure of the Large Subunit Ribosomal RNA from Escherichia coli, Zea mays Chloroplast, and Human and Mouse Mitochondrial Ribosomes. Nucleic Acids Res 1981, 9:3287-3306.
16. BRANLANT C, KROL A, MACHATT MA, POUYET J, EBEL JP, EDWARDS K, KÖSSEL H: Primary and Secondary Structures of Escherichia coli MRE 600 23S Ribosomal RNA. Comparison with Models of Secondary Structure for Maize Chloroplast 23S rRNA and for Large Portions of Mouse and Human 16S Mitochondrial rRNAs. Nucleic Acids Res 1981, 9:4303-4324.
17. GUTELL RR, LARSEN N, WOESE CR: Lessons from an Evolving Ribosomal RNA: 16S and 23S rRNA Structure from a Comparative Perspective. In Ribosomal RNA: Structure, Evolution, Gene Expression and Function in Protein Synthesis. Edited by Zimmermann RA, Dahlberg AE. Boca Raton: CRC Press; 1993: in press.
A review intermixed with original comparative information on 16S and 23S rRNA higher-order structure and rRNA structural motifs. Written in 1991/1992, already becoming a bit out of date, and the book is still not out yet (as of February 1993).
18. GUTELL RR: The Simplicity behind the Elucidation of Complex Structure in Ribosomal RNA. In The Translational Apparatus. Edited by Nierhaus KH et al. New York: Plenum Publishing Corporation; 1993: in press.
Another brief review of rRNA structure, laced with a few newer structural themes in rRNA higher-order constraints. Additional correlations for a few of the highly conserved and functionally significant regions of these RNAs are presented.
19. LARSEN N: Higher Order Interactions in 23S rRNA. Proc Natl Acad Sci USA 1992, 89:5044-5048.
20. CECH TR: Conserved Sequences and Structures of Group I Introns: Building an Active Site for RNA Catalysis - a Review. Gene 1988, 73:259-271.
21. MICHEL F, WESTHOF E: Modelling of the Three-Dimensional Architecture of Group I Catalytic Introns Based on Comparative Sequence Analysis. J Mol Biol 1990, 216:585-610.
22. MICHEL F, UMESONO K, OZEKI H: Comparative and Functional Anatomy of Group II Catalytic Introns - a Review. Gene 1989, 82:5-30.
23. JAMES BD, OLSEN GJ, LIU J, PACE NR: The Secondary Structure of Ribonuclease P RNA, the Catalytic Element of a Ribonucleoprotein Enzyme. Cell 1988, 52:19-26.
24. BROWN JW, HAAS ES, JAMES BD, HUNT DA, PACE NR: Phylogenetic Analysis and Evolution of RNase P RNA in Proteobacteria. J Bacteriol 1991, 173:3855-3863.
25. GUTHRIE C, PATTERSON B: Spliceosomal snRNAs. Annu Rev Genet 1988, 22:387-419.
26. ZWIEB C: Structure and Function of Signal Recognition Particle RNA. Prog Nucleic Acid Res Mol Biol 1989, 37:207-234.
27. ROMERO DP, BLACKBURN EH: A Conserved Secondary Structure for Telomerase RNA. Cell 1991, 67:343-353.
28. KIM SH: Three-Dimensional Structure of Transfer RNA. Prog Nucleic Acid Res Mol Biol 1976, 17:181-216.
29. QUIGLEY GJ, RICH A: Structural Domains of Transfer RNA Molecules. Science 1976, 194:796-806.
30. MOAZED D, STERN S, NOLLER HF: Rapid Chemical Probing of Conformation in 16S Ribosomal RNA and 30S Ribosomal Subunits Using Primer Extension. J Mol Biol 1986, 187:399-416.
31. LEVITT M: Detailed Molecular Model for Transfer Ribonucleic Acid. Nature 1969, 224:759-763.
32. OLSEN GJ: Comparative Analysis of Nucleotide Sequence Data [PhD Thesis]. University of Colorado Health Sciences Center; 1984.
33. HASELMAN T, CHAPPELEAR JE, FOX GE: Fidelity of Secondary and Tertiary Interactions in tRNA. Nucleic Acids Res 1988, 16:5673-5684.
34. GUTELL RR, POWER A, HERTZ GZ, PUTZ EJ, STORMO GD: Identifying Constraints on the Higher-Order Structure of RNA: Continued Development and Application of Comparative Sequence Analysis Methods. Nucleic Acids Res 1992, 20:5785-5795.
Refinements in quantitative correlation analysis methods and significantly larger RNA databases are revealing newer and subtle RNA structural constraints. This methods paper, with a sampling of results, suggests that further analysis with these methods will reveal more interesting structural constraints.
35. CHIU DKY, KOLODZIEJCZAK T: Inferring Consensus Structure from Nucleic Acid Sequences. Comput Appl Biosci 1991, 7:347-352.
36. GUTELL RR, WEISER B, WOESE CR, NOLLER HF: Comparative Anatomy of 16S-Like Ribosomal RNA. Prog Nucleic Acid Res Mol Biol 1985, 32:155-216.
37. GUTELL RR, NOLLER HF, WOESE CR: Higher Order Structure in Ribosomal RNA. EMBO J 1986, 5:1111-1113.
38. GUTELL RR, WOESE CR: Higher-Order Structural Elements in Ribosomal RNAs: Pseudoknots and the Use of Noncanonical Pairs. Proc Natl Acad Sci USA 1990, 87:663-667.
39. POWERS T, NOLLER HF: A Functional Pseudoknot in 16S Ribosomal RNA. EMBO J 1991, 10:2203-2214.
A revealing manuscript. Not only is the comparatively inferred pseudoknot helix substantiated, this structure is shown to be associated with translational function.
40. CUNNINGHAM PR, NURSE K, BAKIN A, WEITZMANN CJ, PFLUMM M, OFENGAND J: Interaction between the Two Conserved Single-Stranded Regions at the Decoding Site of Small Subunit Ribosomal RNA Is Essential for Ribosome Function. Biochemistry 1992, 31:12012-12022.
41. KOOI EA, RUTGERS CA, MULDER A, RIET JV, VENEMA J, RAUE HA: The Phylogenetically Conserved Doublet Tertiary Interaction in Domain III of the Large Subunit rRNA Is Crucial for Ribosomal Protein Binding. Proc Natl Acad Sci USA 1993, 90:213-216.
Another in a growing series of experimental works that substantiate a comparatively derived pseudoknot structure; in this case, this structure is shown to be involved in ribosomal protein binding.
42. BRIMACOMBE R, GREUER B, MITCHELL P, OSSWALD M, RINKE-APPEL J, SCHÜLER D, STADE K: Three-Dimensional Structure and Function of Escherichia coli 16S and 23S rRNA as Studied by Cross-Linking Techniques. In The Ribosome: Structure, Function & Evolution. Edited by Hill WE et al. Washington DC: American Society for Microbiology; 1990:93-106.
43. RYAN PC, LU M, DRAPER DE: Recognition of the Highly Conserved GTPase Center of 23S Ribosomal RNA by Ribosomal Protein L11 and the Antibiotic Thiostrepton. J Mol Biol 1991, 221:1257-1268.
44. DÖRING T, GREUER B, BRIMACOMBE R: The Topography of the 3' Terminal Region of Escherichia coli 16S Ribosomal RNA: an Intra-RNA Crosslinking Study. Nucleic Acids Res 1992, 20:1593-1597.
45. MICHEL F, ELLINGTON AD, COUTURE S, SZOSTAK JW: Phylogenetic and Genetic Evidence for Base-Triples in the Catalytic Domain of Group I Introns. Nature 1990, 347:578-580.
46. COUTURE S, ELLINGTON AD, GERBER AS, CHERRY JM, DOUDNA JA, GREEN R, HANNA M, PACE U, RAJAGOPAL J, SZOSTAK JW: Mutational Analysis of Conserved Nucleotides in a Self-Splicing Group I Intron. J Mol Biol 1990, 215:345-358.
47. WAUGH DS, GREEN CJ, PACE NR: The Design and Catalytic Properties of a Simplified Ribonuclease P RNA. Science 1989, 244:1569-1571.
48. HAAS ES, MORSE DP, BROWN JW, SCHMIDT FJ, PACE NR: Long-Range Structure in Ribonuclease P RNA. Science 1991, 254:853-856.
Another in the 'although you told me, I want to verify it for myself' series. This time, the pseudoknots in RNase P RNA are evaluated using site-directed mutagenesis.
49. NOLLER HF, MOAZED D, STERN S, POWERS T, ALLEN PN, ROBERTSON JM, WEISER B, TRIMAN K: Structure of rRNA and its Functional Interactions in Translation. In The Ribosome: Structure, Function & Evolution. Edited by Hill WE et al. Washington DC: American Society for Microbiology; 1990:73-92.
50. BARTEL DP, ZAPP ML, GREEN MR, SZOSTAK JW: HIV-1 Rev Regulation Involves Recognition of Non-Watson-Crick Base Pairs in Viral RNA. Cell 1991, 67:529-536.
51. SANTALUCIA J JR, KIERZEK R, TURNER DH: Stabilities of Consecutive A-C, C-C, G-G, U-C, and U-U Mismatches in RNA Internal Loops: Evidence for Stable Hydrogen-Bonded U-U and C-C+ Pairs. Biochemistry 1991, 30:8242-8251.
52. CHASTAIN M, TINOCO I JR: Structural Elements in RNA. Prog Nucleic Acid Res Mol Biol 1991, 41:131-177.
The most current and comprehensive review of RNA structural motifs, with a bias from the physical chemistry perspective. An abbreviated Systems Administration Guide to RNA structure.
53. WOESE CR, WINKER S, GUTELL RR: Architecture of Ribosomal RNA: Constraints on the Sequence of Tetraloops. Proc Natl Acad Sci USA 1990, 87:8467-8471.
54. TUERK C, GAUSS P, THERMES C, GROEBE DR, GAYLE M, GUILD N, STORMO G, D'AUBENTON-CARAFA Y, UHLENBECK OC, TINOCO I JR, BRODY EN, GOLD L: CUUCGG Hairpins: Extraordinarily Stable RNA Secondary Structures Associated with Various Biochemical Processes. Proc Natl Acad Sci USA 1988, 85:1364-1368.
55. ANTAO VP, LAI SY, TINOCO I JR: A Thermodynamic Study of Unusually Stable RNA and DNA Hairpins. Nucleic Acids Res 1991, 19:5901-5905.
56. VARANI G, CHEONG C, TINOCO I JR: Structure of an Unusually Stable RNA Hairpin. Biochemistry 1991, 30:3280-3289.
57. HEUS H, PARDI A: Structural Features that Give Rise to the Unusual Stability of RNA Hairpins Containing GNRA Loops. Science 1991, 253:191-194.
58. TEN DAM E, PLEIJ K, DRAPER D: Structural and Functional Aspects of RNA Pseudoknots. Biochemistry 1992, 31:11665-11676.
Everything (almost) you need to know about pseudoknots (but forgot to ask). Current in 1993, we should expect the 1995 edition to be much larger.
59. PUGLISI JD, WYATT JR, TINOCO I JR: Conformation of an RNA Pseudoknot. J Mol Biol 1990, 214:437-453.
60. TEN DAM E, BELKUM AV, PLEIJ CWA: A Conserved Pseudoknot in Telomerase RNA. Nucleic Acids Res 1991, 19:6951.
61. TUERK C, MACDOUGAL S, GOLD L: RNA Pseudoknots that Inhibit Human Immunodeficiency Virus Type 1 Reverse Transcriptase. Proc Natl Acad Sci USA 1992, 89:6988-6992.
62. CHASTAIN M, TINOCO I JR: A Base-Triple Structural Domain in RNA. Biochemistry 1992, 31:12733-12741.
63. WINKER S, OVERBEEK R, WOESE CR, OLSEN GJ, PFLUGER N: Structure Detection through Automated Covariance Search. Comput Appl Biosci 1990, 6:365-371.
64. MCCLAIN WH, FOSS K: Changing the Identity of a tRNA by Introducing a GU Wobble Pair near the 3' Acceptor End. Science 1988, 240:793-796.
65. SCHULMAN LH: Recognition of tRNAs by Aminoacyl-tRNA Synthetases. Prog Nucleic Acid Res Mol Biol 1991, 41:23-87.
66. ** SZOSTAK JW: In vitro Genetics. Trends Biochem Sci 1992, 17:89-93.
See [67**].
67. ** JOYCE GF: Directed Molecular Evolution. Sci Am 1992, 267:90-97.
These papers [66**,67**] set the stage for what's to come. These newer methods in biochemistry are changing the pace (no pun intended, Norm) at which sequence variation is analyzed to infer structure and function.

RR Gutell, MCD Biology, Campus Box 347, University of Colorado, Boulder, Colorado 80309-0347, USA