Dictionary of Bioinformatics (JM Hancock and MJ Zvelebil, Editors)
Entries by Jamie J. Cannone and Robin R. Gutell

RNA: General Categories (RNA)

Ribonucleic acid (RNA) is one of the two major forms of nucleic acids. Each individual element, or nucleotide, of RNA is composed of three parts: a ribose sugar, a phosphate, and a cyclic base. Four primary bases are found in RNA: adenine, guanine, cytosine, and uracil. Other bases and modified forms of these bases sometimes appear; the McCloskey Lab at the University of Utah has prepared a database of these modified nucleotides.

The major structural difference between RNA and DNA, the other major form of nucleic acid, is the sugar (deoxyribose in DNA, ribose in RNA). The ribose sugar makes RNA more susceptible to degradation than DNA. Thus, RNA is best suited to (relatively) short-term uses, while DNA is sufficiently stable to be a good medium for genetic inheritance.

While DNA’s primary functions are the storage of genetic information and the production of RNA, cellular RNA has several distinct functions. Three major forms of RNA are commonly discussed: messenger RNA, ribosomal RNA, and transfer RNA. (Ribosomal and transfer RNA are discussed separately.) Messenger RNA (mRNA) codes for proteins. The mRNA nucleotide sequence is translated to an amino acid sequence based on a specific mapping (the “genetic code” for an organism) between sets of three mRNA nucleotides (codons) and amino acids. mRNA is produced from DNA during transcription. mRNAs sometimes contain untranslated regions (UTRs) at their 5’ and 3’ ends that play several key roles in gene regulation and expression. mRNA is both quickly synthesized and quickly degraded as part of the regulation of protein synthesis.

In addition to the three major RNAs, other RNAs that have different properties have been identified. Some RNAs form ribonucleoprotein complexes, others have catalytic activities, and, in some viruses, RNA rather than DNA carries the genetic information for the organism.
Included among these other RNAs are intron RNAs, which must be excised from other genes so that those genes can function, RNase P, and the U RNAs. Many of these interesting RNAs have been characterized, and several examples appear below.

Noncoding RNAs (also referred to as small RNAs) are RNAs that do not code for proteins. (Technically, both ribosomal and transfer RNAs belong to this category.) Certain noncoding RNAs, such as the microRNAs (miRNAs) that have been isolated from plants and animals, have been characterized and implicated in regulatory roles. Other noncoding RNAs have been implicated in the destruction of mRNA via RNA interference (RNAi). Some additional noncoding RNAs that have been characterized are described below.

The bacterial tmRNA (also known as 10Sa RNA or SsrA) is a chimeric molecule with both tRNA-like and mRNA-like characteristics. An incomplete mRNA with a truncated 3’ end will cause the ribosome to “stall” with an incomplete protein attached. tmRNA “rescues” the ribosome by binding its tRNA-like portion to the ribosome. This binding
positions the mRNA-like portion of tmRNA so that the ribosome can resume translation, using the tmRNA as its template. The mRNA-like portion codes for a signal peptide that will be recognized by bacterial proteases and ensures that the partially-produced protein will be degraded.

Small nucleolar RNAs (snoRNAs) are typically 60-300 nucleotide RNAs that are abundant in the nucleolus of a broad variety of eukaryotes. The snoRNAs associate with proteins to form small nucleolar ribonucleoproteins (snoRNPs). snoRNAs come in two major structural forms, one containing the box C and D motifs and the second containing box H and ACA elements. Most snoRNAs guide nucleotide modification, either 2’-O-methylation (box C/D snoRNAs) or pseudouridylation (box H/ACA snoRNAs), in a wide range of RNAs by hybridizing to the region of the RNA that needs to be modified. Other snoRNAs are essential for nucleolytic cleavage of precursor RNAs. Two different mechanisms for the synthesis of snoRNAs have been observed; in vertebrates, snoRNAs are processed from previously-excised pre-mRNA introns, while in yeast and plants, the sources of snoRNAs are polycistronic snoRNA transcripts. Vertebrate telomerase is a box H/ACA snoRNP.

The signal recognition particle (SRP) RNA is involved in transport of newly-translated secretory proteins out of the cytosol. The SRP binds to a signal at the N-terminus of a protein as the protein is synthesized. The SRP then binds to a receptor that is bound to the membrane of the endoplasmic reticulum. The protein begins to traverse the membrane en route out of the cytosol, and protein synthesis continues, with the SRP recycled to assist with another protein-ribosome complex.

Guide RNAs (gRNAs) are a novel class of small noncoding RNA molecules that are transcribed from the maxicircles and minicircles of trypanosome mitochondria.
gRNAs contain the necessary information for proper editing (insertion or deletion of uridines) of mitochondrial precursor RNAs in the trypanosomes, resulting in functional RNAs. The 5’ end of a gRNA is complementary to the mRNA target sequence that lies 3’ of the modification site, serving as an “anchor” for the gRNA. The central portion of a gRNA is complementary to the mature, edited mRNA and thus serves as the editing template. The 3’ end of a gRNA is a posttranscriptionally-added oligo[U] tail; the function of this tail is not presently certain.

Websites

Modified RNA Nucleotides: http://medlib.med.utah.edu/RNAmods/
Noncoding RNAs: http://biobases.ibch.poznan.pl/ncRNA/
Noncoding RNAs in plants: http://www.prl.msu.edu/PLANTncRNAs/
The RNA World: http://www.imb-jena.de/RNA.html
RNABase: The RNA Structure Database: http://www.rnabase.org/
RNAi Database: http://formaggio.cshl.org/%7Emarco/fabio/index.html
RNase P Database: http://jwbrown.mbio.ncsu.edu/RNaseP/home.html
snoRNA Database: http://rna.wustl.edu/snoRNAdb/
SRP Database (Christian Zwieb): http://psyche.uthct.edu/dbs/SRPDB/SRPDB.html
tmRNA (Kelly Williams): http://www.indiana.edu/~tmrna/
tmRNA (Christian Zwieb): http://psyche.uthct.edu/dbs/tmRDB/tmRDB.html
Uridine Insertion/Deletion RNA Editing (gRNA): http://www.rna.ucla.edu/trypanosome/
UTR Database: http://bighost.area.ba.cnr.it/BIG/UTRHome/
Yeast snoRNA Database: http://www.bio.umass.edu/biochem/rna-sequence/Yeast_snoRNA_Database/snoRNA_DataBase.html

Further Reading

Bachellerie JP & Cavaillé J (1998). Small Nucleolar RNAs Guide the Ribose Methylations of Eukaryotic rRNAs. In: Modification and Editing of RNA. Grosjean H & Benne R, editors. ASM Press, Washington, DC.
Estévez AM & Simpson L (1999). Uridine insertion/deletion RNA editing in trypanosome mitochondria – a review. Gene 240:247-260.
Guthrie C & Patterson B (1988). Spliceosomal snRNAs. Annual Review of Genetics 22:387-419.
Hutvagner G & Zamore PD (2002). RNAi: nature abhors a double-strand. Current Opinion in Genetics and Development 12:225-232.
Kiss T (2002). Small Nucleolar RNAs: An Abundant Group of Noncoding RNAs with Diverse Cellular Functions. Cell 109:145-148.
Mattick JS (2001). Non-coding RNAs: the architects of eukaryotic complexity. EMBO Reports 2:986-991.
Reinhart BJ, Weinstein EG, Rhoades MW, Bartel B, & Bartel DP (2002). MicroRNAs in plants. Genes & Development 16:1616-1626.
Samarsky DA & Fournier MJ (1999). A comprehensive database for the small nucleolar RNAs from Saccharomyces cerevisiae. Nucleic Acids Research 27:161-164.

Ribosomal RNA (RNA)

Ribosomal RNAs (rRNAs) are specialized RNAs that provide the structural and catalytic core of the ribosome, the cellular structure that is the site for protein synthesis. The three major forms of rRNA are the 5S, small subunit (SSU, 16S, or 16S-like), and large subunit (LSU, 23S, or 23S-like) rRNAs. rRNA may comprise up to 90% of a cell’s RNA.

Ribosomal RNAs are organized into operons in the genome. In nuclear rRNA operons, the genes are separated by internal spacers, which are often used for phylogenetic analyses. In most (higher) eukaryotes, the large subunit rRNA is divided into two pieces, the 5.8S rRNA and a larger rRNA (varying in size between 25S and 28S), with a spacer between them in the genomic sequence. Certain organisms extend this theme by dividing their rRNA molecules into fragments that must be assembled correctly to produce a viable rRNA. The sizes of the rRNAs vary over a wide range; the Escherichia coli rRNAs, which were the first to be sequenced and serve as the reference sequences for comparative analysis of rRNA, are 120, 1542, and 2904 nucleotides for the 5S, SSU, and LSU rRNAs, respectively.

Size Variation in Ribosomal RNA: Approximate ranges of size (in nucleotides) for complete sequences are shown in the table. Unusual sequences (of vastly different length) are excluded.

Phylogenetic Domain/Cell Location   5S        SSU         LSU
Bacteria                            105-128   1470-1600   2750-3200
Archaea                             120-135   1320-1530   2900-3100
Eukaryota Nuclear                   115-125   1130-3725   2475-5450
Eukaryota Chloroplast               115-125   1425-1630   2675-3200
Eukaryota Mitochondria              115-125   685-2025    940-4500
Overall                             105-135   685-3725    940-5450

Structure models for the ribosomal RNAs have been proposed using comparative sequence analysis methods.
The majority of the base-pairs predicted by these methods are canonical (G:C, A:U, and G:U) base-pairs that are consecutive and antiparallel with each other, forming nested secondary structure helices. Many tertiary structure interactions were also proposed with comparative analysis. These interactions include base triples, non-canonical base-pairs, pseudoknots, and many RNA motifs (see “Motifs in RNA Tertiary Structure”). The most recent versions of the models for the Escherichia coli 16S and 23S rRNAs are shown in Figure XX.

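
The base-pair categories named above can be illustrated with a minimal Python sketch. The function names and the returned labels are illustrative assumptions, not part of any published tool; the sketch only separates the G:C and A:U pairs, the G:U wobble pair, and everything else.

```python
# Hypothetical helpers; the categories follow the G:C, A:U, and G:U pairs in the text.
WATSON_CRICK = {frozenset("GC"), frozenset("AU")}
WOBBLE = {frozenset("GU")}


def classify_base_pair(nt1: str, nt2: str) -> str:
    """Return 'watson-crick', 'wobble', or 'non-canonical' for two RNA bases."""
    pair = frozenset((nt1.upper(), nt2.upper()))
    if pair in WATSON_CRICK:
        return "watson-crick"
    if pair in WOBBLE:
        return "wobble"
    return "non-canonical"


def canonical_fraction(pairs):
    """Fraction of pairs that are G:C, A:U, or G:U; 0.0 for an empty list."""
    labels = [classify_base_pair(a, b) for a, b in pairs]
    canonical = sum(label != "non-canonical" for label in labels)
    return canonical / len(labels) if labels else 0.0
```

Applied to the base-pairs of a structure model, `canonical_fraction` gives the kind of summary statistic implied by "the majority of the base-pairs are canonical"; the input here is simply a list of two-letter tuples.
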
More than twenty years after the first 16S and 23S rRNA comparative structure models were proposed, the most recent structure models (Figure XX) were evaluated against the high-resolution crystal structures of both the small (Wimberly et al. 2000) and large (Ban et al. 2000) ribosomal subunits that were solved in 2000. The results were affirmative; approximately 97-98% of the 16S and 23S rRNA base-pairs, including nearly all of the tertiary structure base-pairs, predicted with covariation analysis were present in these crystal structures. In addition, many new motifs have been proposed and characterized based upon the crystal structures.

Figure XX. Comparative secondary structure diagrams for the Escherichia coli 16S and 23S rRNAs. A, 16S rRNA; B, 23S rRNA, 5’ half; C, 23S rRNA, 3’ half. Base-pair symbols: line, canonical (G:C or A:U); small filled circle, wobble (G:U); large open circle, G:A; large closed circle, all other non-canonical base-pairs. Colors of base-pair symbols indicate confidence based upon comparative analysis, with red representing high confidence, green less confidence, and black indicating that comparative analysis does not strongly support or argue against the base-pair. Grey base-pairs are (nearly) invariant, and blue base-pairs were predicted with comparative analysis but could not be scored using the current system.
See the Comparative RNA Web Site for more information.

[Figure XX: secondary structure diagrams, panels A (16S rRNA), B (23S rRNA, 5’ half), and C (23S rRNA, 3’ half); image not reproduced.]

Websites

The Comparative RNA Web (CRW) Site (all rRNAs): http://www.rna.icmb.utexas.edu/
European rRNA Database (SSU and LSU rRNAs): http://oberon.rug.ac.be:8080/rRNA/
5S rRNA Database: http://biobases.ibch.poznan.pl/5SData/
Ribosomal Database Project II (RDP-II): http://rdp.cme.msu.edu/html/
Ribosomal Internal Spacer Sequence Collection (RISSC): http://ulises.umh.es/RISSC/
RNABase Ribosomal RNA Entries: http://www.rnabase.org/listing/?cat=rrna

Further Reading

Ban N, Nissen P, Hansen J, Moore PB, & Steitz TA (2000). The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science 289:905-920.
Gutell RR, Lee JC, & Cannone JJ (2002). The Accuracy of Ribosomal RNA Comparative Structure Models. Current Opinion in Structural Biology 12:301-310.
Harms J, Schluenzen F, Zarivach R, Bashan A, Gat S, Agmon I, Bartels H, Franceschi F, & Yonath A (2001). High resolution structure of the large ribosomal subunit from a mesophilic eubacterium. Cell 107:679-688.
Schluenzen F, Tocilj A, Zarivach R, Harms J, Gluehmann M, Janell D, Bashan A, Bartels H, Agmon I, Franceschi F, & Yonath A (2000). Structure of functionally activated small ribosomal subunit at 3.3 Å resolution. Cell 102:615-623.
Wimberly BT, Brodersen DE, Clemons WM Jr, Morgan-Warren RJ, Carter AP, Vonhein C, Hartsch T, & Ramakrishnan V (2000). Structure of the 30 S ribosomal subunit. Nature 407:327-339.
Yusupov MM, Yusupova GZ, Baucom A, Lieberman K, Earnest TN, Cate JHD, & Noller HF (2001). Crystal structure of the ribosome at 5.5 Å resolution. Science 292:883-896.

Transfer RNA (RNA)

Transfer RNAs (tRNAs) are typically 70-90 nt in length in nuclear and chloroplast genomes and are directly involved in protein synthesis. The carboxyl group of an amino acid is specifically attached to the 3’ end of the tRNA (aminoacylation). These aminoacylated tRNAs are substrates in translation, interacting with a specific mRNA codon to position the attached amino acid for catalytic transfer to a growing polypeptide chain. Thus, tRNAs decode (or translate) the nucleotide sequence during protein synthesis.

tRNAs have a characteristic “cloverleaf” structure that was initially determined with comparative analysis. Crystal structures of tRNA substantiated this secondary structure and revealed that different tRNAs formed very similar tertiary structures, underscoring the key underlying principle of comparative analysis. The “variable loop” of tRNA is primarily responsible for length variation among tRNAs; some of the mitochondrial tRNAs are smaller than the typical tRNA, shortening or deleting the D or TΨC helices.

Figure YY. tRNA secondary structure (Saccharomyces cerevisiae phenylalanine tRNA). Structural features are labeled.

[Figure YY image not reproduced. Labeled features: acceptor stem, D stem, D loop, anticodon stem, anticodon loop, variable loop, TΨC stem, and TΨC loop.]

Non-mitochondrial tRNAs come in two types: type 1 and type 2. Structurally, the major difference between the two types is the addition of a stem-loop structure in the variable loop of type 2 tRNAs. The tRNA types do not correlate with the two classes of aminoacyl-tRNA synthetases, where class 1 synthetases attach amino acids to the 2’-OH and class 2 synthetases attach amino acids to the 3’-OH of the terminal nucleotide of the tRNA.

Over fifty modified nucleotides have been observed in different tRNA molecules.

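
The decoding step that tRNAs perform, reading mRNA three nucleotides at a time and supplying the matching amino acid, can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard genetic code (some organisms and organelles use variants) and a reading frame that starts at the first nucleotide; the names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code (an assumption; some organisms and organelles differ).
# Amino acids are listed in codon order UUU, UUC, UUA, UUG, UCU, ... GGG.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA reading frame codon by codon, stopping at a stop codon."""
    mrna = mrna.upper().replace("T", "U")  # accept DNA-style input as well
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]  # KeyError on characters outside ACGU
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate("AUGUUUGGCUAA")` returns `"MFG"`: methionine, phenylalanine, and glycine, followed by a stop codon. Any trailing one or two nucleotides that do not complete a codon are ignored.
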
Websites

Aminoacyl-tRNA Synthetases Database: http://rose.man.poznan.pl/aars/index.html
Genomic tRNA: http://lowelab.ucsc.edu/GtRNAdb/
Mathias Sprinzl’s tRNA compilation: http://www.uni-bayreuth.de/departments/biochemie/sprinzl/trna/
Modified RNA Nucleotides: http://medlib.med.utah.edu/RNAmods/

Further Reading

Holley RW, Apgar J, Everett GA, Madison JT, Marquisee M, Merrill SH, Penswick JR, & Zamir A (1965). Structure of a ribonucleic acid. Science 147:1462-1465.
Kim SH (1979). Crystal structure of yeast tRNAphe and general structural features of other tRNAs. In Transfer RNA: Structure, Properties, and Recognition (Schimmel PR, Soll D & Abelson JN, eds), pp. 83-100, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.
Kim SH, Suddath FL, Quigley GJ, McPherson A, Sussman JL, Wang AH, Seeman NC, & Rich A (1974). Three-dimensional tertiary structure of yeast phenylalanine transfer RNA. Science 185:435-440.
Levitt M (1969). Detailed molecular model for transfer ribonucleic acid. Nature 224:759-763.
Marck C & Grosjean H (2002). tRNomics: Analysis of tRNA genes from 50 genomes of Eukarya, Archaea, and Bacteria reveals anticodon-sparing strategies and domain-specific features. RNA 8:1189-1232.
Quigley GJ & Rich A (1976). Structural domains of transfer RNA molecules. Science 194:796-806.
Robertus JD, Ladner JE, Finch JT, Rhodes D, Brown RS, Clark BF, & Klug A (1974). Structure of yeast phenylalanine tRNA at 3 Å resolution. Nature 250:546-551.

RNA Secondary and Tertiary Structure (RNA Structural Elements)

For this discussion, we define secondary structure as two or more consecutive and canonical base-pairs that are nested and antiparallel with one another. All other structure, including non-canonical or non-nested base-pairs and RNA motifs, is considered to be tertiary structure. Paired nucleotides are involved in interactions in the structure model; all other nucleotides are considered to be unpaired.

Base-Pairs (Canonical and Non-Canonical) (RNA Structural Elements)

A base-pair is formed when two nucleotides hydrogen bond to each other, typically between the bases of the nucleotides. Most RNA base-pairs observed to date are oriented with the backbones of the two nucleotides in an antiparallel configuration. The canonical base-pairs, G:C and A:U, plus the G:U ("wobble") base-pair, were originally proposed by Watson and Crick. All other base-pairs are considered to be non-canonical. While "non-canonical" connotes an unusual or unlikely combination of two nucleotides, a significant number of non-canonical RNA base-pairs has been proposed in rRNA comparative structure models and substantiated by the ribosomal subunit crystal structures. An even larger number of non-canonical base-pairs that were not predicted with comparative analysis is present in the crystal structures.

Stems (RNA Structural Elements)

A stem is a set of base-pairs that are arranged adjacent to and antiparallel with one another. While an RNA helix is a collection of smaller stems connected by loops and bulges, the terms “stem” and “helix” are often used interchangeably.

The base-pairs that comprise a stem are nested; that is, drawn graphically, each base-pair either contains or is contained within its neighbors.
For two nested base-pairs, a:a’ and b:b’, where a < a’, b < b’, and a < b in the 5’ to 3’ numbering system for a given RNA molecule, the statement a < b < b’ < a’ is true. Figure ZZ shows nesting for tRNA in two different formats that represent the global arrangement of base-pairs. Nesting arrangements can be far more complicated in a larger RNA molecule. Most base-pairs are nested, and most helices are also nested. Base-pairs that are not nested form pseudoknots (see Pseudoknots).

Loops (RNA Structural Elements)

Unpaired nucleotides in a secondary structure model are commonly referred to as loops. Many of these loops close one end of an RNA stem and are called "hairpin loops"; phrased differently, the nucleotides in a hairpin loop are flanked by a single stem. Loops that are flanked by two stems come in several forms. A "bulge loop" occurs only in one strand in a stem; the second strand’s nucleotides are all forming base-pairs. An "internal loop" is formed by parallel bulges on opposing strands, interrupting the continuous base-pairing of the stem. Finally, a "multi-stem loop" forms when three or more stems intersect. Figure AA is a schematic RNA that shows each of these types
of loop. The sizes of each type of loop can vary; certain combinations of loop size and nucleotide composition have been shown to be more energetically stable.

Figure ZZ. Nested and non-nested base-pairs. The comparative model of the secondary and tertiary structure of tRNA is shown in two formats that highlight the helical and nesting relationships. In both panels, the secondary structure base-pairs are shown in blue, tertiary base-pairs in red, and base triples in green. Some tertiary base-pairs are nested (red lines do not cross blue lines); red lines representing pseudoknot base-pairs do cross blue lines. A. Histogram format, with the tRNA sequence shown as a “baseline” from left to right (5’ to 3’). Secondary structure elements are shown above the baseline and tertiary structure elements are shown below the baseline. The distance from the baseline to the interaction line is proportional to the distance between the two interacting positions within the RNA sequence. B. Circular format, with the sequence drawn clockwise (5’ to 3’) in a circle, starting at the top, and base-base interactions shown as lines traversing the circle. The tRNA structural elements are labeled.

[Figure ZZ image not reproduced. Panel A, “The Structure of tRNA”: histogram format with sequence positions 10-70 along the baseline. Panel B: circular format with the acceptor stem, D loop, anticodon loop, variable loop, and TΨC loop labeled.]

Figure AA. Loop types. This schematic RNA contains one example of each of the four loop types. Colors and labels: B, bulge loop (red); H, hairpin loop (purple); I, internal loop (blue); M, multi-stem loop (green). Stems are shown in gray.

[Figure AA image not reproduced. The schematic marks the 5’ and 3’ ends and the loops labeled H, B, I, and M.]

Pseudoknots (RNA Structural Elements)

A pseudoknot is an arrangement of helices and loops where the helices are not nested with respect to each other. Pseudoknots are so named due to the optical illusion of knotting evoked by secondary and tertiary structure representations. A simple pseudoknot is represented in Figure BB-A. For two non-nested (pseudoknot) base-pairs, a:a’ and b:b’, where a < a’, b < b’, and a < b in the 5’ to 3’ numbering system for a given RNA molecule, the following statement will be true: a < b < a’ < b’. Contrast this situation with a set of nested base-pairs (see Stems), where a < b < b’ < a’. Another descriptive explanation (Figure BB-B) of pseudoknot formation is when the hairpin loop nucleotides from a stem-loop structure form a helix with nucleotides outside the stem-loop. Figure ZZ-B shows pseudoknot interactions as red lines that cross the blue lines representing nested helices.

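
The two inequalities above translate directly into a test on sequence positions. The following Python sketch is one hedged way to apply them; the function names, the tuple representation of a base-pair, and the extra labels "side-by-side" and "overlapping" are assumptions introduced for illustration and do not come from the text.

```python
from itertools import combinations


def relation(pair1, pair2):
    """Classify two base-pairs, each given as a tuple of two sequence positions."""
    (a, a2), (b, b2) = sorted([tuple(sorted(pair1)), tuple(sorted(pair2))])
    if len({a, a2, b, b2}) < 4:
        return "overlapping"   # a shared position, e.g. in a base triple (assumed label)
    if a < b < b2 < a2:
        return "nested"        # a < b < b' < a'
    if a < b < a2 < b2:
        return "pseudoknot"    # a < b < a' < b'
    return "side-by-side"      # a < a' < b < b' (assumed label, not from the text)


def pseudoknot_pairs(pairs):
    """Return every combination of two base-pairs whose positions cross."""
    return [
        (p, q) for p, q in combinations(pairs, 2)
        if relation(p, q) == "pseudoknot"
    ]
```

With the hypothetical positions `[(1, 10), (2, 9), (5, 15)]`, the first two base-pairs are nested, and the pair 5:15 crosses both of them, which is the pattern drawn as crossing lines in the circular format of Figure ZZ.
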
Figure BB. Schematic drawings of a simple pseudoknot. Stems are shown in red and green; loops are shown in blue, orange, and purple. A. Standard format. B. Hairpin loop format (after Hilbers et al. 1998).

[Figure BB image not reproduced. Both panels label Stem 1, Stem 2, Loop 1, Loop 2, and Loop 3, with the 5’ and 3’ ends marked.]

Websites

Base-Pair Directory (IMB Jena): http://www.imb-jena.de/IMAGE_BPDIR.html
The Comparative RNA Web (CRW) Site (descriptions and images of structural elements): http://www.rna.icmb.utexas.edu/
Non-Canonical Base-Pair Database: http://prion.bchs.uh.edu/bp_type/
Pseudobase: http://wwwbio.leidenuniv.nl/~Batenburg/PKB.html

Further Reading

Chastain M & Tinoco I Jr (1994). Structural Elements in RNA. Progress in Nucleic Acid Research and Molecular Biology 44:131-177.
Hilbers CW, Michiels PJ, & Heus HA (1998). New developments in structure determination of pseudoknots. Biopolymers 48:137-153.
Pleij CWA (1994). RNA pseudoknots. Current Opinion in Structural Biology 4:337-344.
ten Dam E, Pleij K, & Draper D (1992). Structural and functional aspects of RNA pseudoknots. Biochemistry 31:11665-11676.

Motifs in RNA Tertiary Structure (RNA Structural Elements)

Underlying the complex and elaborate secondary and tertiary structures for different RNA molecules is a collection of different RNA building blocks, or structural motifs. Beyond the abundant G:C, A:U, and G:U base-pairs in the standard Watson-Crick conformation that are arranged into regular secondary structure helices, structural motifs are usually composed of non-canonical base-pairs (e.g. A:A) with non-standard base-pair conformations that are usually not consecutive and antiparallel with one another. Approximately twenty-seven RNA structural motifs have been identified by different research groups with differing methods and criteria; these motifs are listed in alphabetical order below.

• 2’-OH-Mediated Helical Interactions: extensive hydrogen bonding between the backbone of one strand and the minor groove of another in tightly-packed RNAs.
• A Story: unpaired adenosines in the covariation-based structure models.
• A-Minor: the minor groove faces of adenosines insert into the minor groove of another helix, forming hydrogen bonds with the 2’-OH groups of C:G base-pairs.
• AA.AG@helix.ends: A:A and A:G oppositions exchange at ends of helices.
• Adenosine Platform: two consecutive adenosines form a pseudo-base-pair that allows for additional stacking of bases.
• Base Triple: a base-pair interacts with a third nucleotide.
• Bulge-Helix-Bulge: Archaeal internal loop motif that is a target for splicing.
• Bulged-G: links a cross-strand A stack to an A-form helix.
• Coaxial Stacking of Helices: two neighboring helices are stacked end-to-end.
• Cross-Strand Purine Stack: consecutive adenosines from opposite strands of a helix are stacked.
• Dominant G:U Base-Pair: G:U is the dominant base-pair (50% or greater), exchanging with canonical base-pairs over a phylogenetic group, in particular structural locations.
• E Loop/S Turn: an asymmetric internal or multi-stem loop (with consensus sequence 5’-AGUA/RAA-3’) forms three non-canonical base-pairs.
• E-like Loop: a symmetric internal loop that resembles an E Loop (with consensus sequence 5’-GHA/GAA-3’) forms three non-canonical base-pairs.
• Kink-Turn: named for its kink in the RNA backbone; two helices joined by an internal loop interact via the A-minor motif.
• Kissing Hairpin Loop: two hairpin loops interact to form a pseudocontinuous, coaxially stacked three-stem helix.
• Lone Pair: a base-pair that has no consecutive, adjacent base-pair neighbors.
• Lonepair Triloop: a base-pair with no consecutive, adjacent base-pair neighbors encloses a three-nucleotide hairpin loop.
• Metal-Binding: guanosine and uracil residues can bind metal ions in the major groove of RNA.
• Metal-Core: specific nucleotide bases are exposed to the exterior to bind specific metal ions.
• Pseudoknots (see Pseudoknots)
• Ribose Zipper: as two helices dock, the ribose sugars from two RNA strands become interlaced.
• Tandem G:A Opposition: two consecutive G:A oppositions occur in an internal loop.
• Tetraloop: four-nucleotide hairpin loops with specific sequences.
• Tetraloop Receptor: a structural element with a propensity to interact with tetraloops, often involving another structural motif.
• Triplexes: stable “triple helix” observed only in model RNAs.
• tRNA D-Loop:T-Loop: conserved tertiary base pairs between the D and T loops of tRNA.
• U-turn: a loop with the sequences UNR or GNRA contains a sharp turn in its backbone, often followed immediately by other tertiary interactions.

Websites

The Comparative RNA Web (CRW) Site (descriptions and images of structural elements; motif-related publications): http://www.rna.icmb.utexas.edu/
Distribution of RNA Motifs in Natural Sequences: http://www.centrcn.umontreal.ca/~bourdeav/Ribonomics/
Pseudobase: http://wwwbio.leidenuniv.nl/~Batenburg/PKB.html
RNABase: The RNA Structure Database: http://www.rnabase.org/
SCOR (Structural Classification of RNA): http://scor.lbl.gov/domain_tert.html

Further Reading (Motifs)

Batey RT, Rambo RP, & Doudna JA (1999). Tertiary Motifs in RNA Structure and Folding. Angewandte Chemie (International ed. in English) 38:2326-2343. [review discussing multiple motifs: Adenosine Platform; Base Triple; Coaxial Stacking of Helices; Kissing Hairpin Loops; Pseudoknot; Tetraloop; Tetraloop Receptor; tRNA D-Loop:T-Loop]
Blake RD, Massoulié J, & Fresco JR (1967). Polynucleotides. 8. A spectral approach to the equilibria between polyriboadenylate and polyribouridylate and their complexes. Journal of Molecular Biology 30:291-308. [Triplexes]
Cate JH & Doudna JA (1996). Metal-binding sites in the major groove of a large ribozyme domain. Structure 4:1221-1229.
Cate JH, Gooding AR, Podell E, Zhou K, Golden BL, Kundrot CE, Cech TR, & Doudna JA (1996). Crystal structure of a group I ribozyme domain: principles of RNA packing. Science 273:1678-1685. [Metal-Core; Tetraloop Receptor]
Cate JH, Gooding AR, Podell E, Zhou K, Golden BL, Szewczak AA, Kundrot CE, Cech TR, & Doudna JA (1996). RNA tertiary structure mediation by adenosine platforms. Science 273:1696-1699. [Adenosine Platform; Tetraloop Receptor]
Cate JH, Hanna RL, & Doudna JA (1997). A magnesium ion core at the heart of a ribozyme domain. Nature Structural Biology 4:553-558. [Metal-Core; Tetraloop Receptor]
Chang KY & Tinoco I Jr (1997). The structure of an RNA "kissing" hairpin complex of the HIV TAR hairpin loop and its complement. Journal of Molecular Biology 269:52-66. [Kissing Hairpin Loops]
Correll CC, Freeborn B, Moore PB, & Steitz TA (1997). Metals, motifs, and recognition in the crystal structure of a 5S rRNA domain. Cell 91:705-712. [Cross-Strand Purine Stack; example of Metal-Binding]
Costa M & Michel F (1995). Frequent use of the same tertiary motif by self-folding RNAs. The EMBO Journal 14:1276-1285. [Tetraloop Receptor]
Costa M & Michel F (1997). Rules for RNA recognition of GNRA tetraloops deduced by in vitro selection: comparison with in vivo evolution. The EMBO Journal 16:3289-3302. [Tetraloop Receptor]
Diener JL & Moore PB (1998). Solution structure of a substrate for the archaeal pre-tRNA splicing endonucleases: the bulge-helix-bulge motif. Molecular Cell 1:883-894. [Bulge-Helix-Bulge]
Dirheimer G, Keith G, Dumas P, & Westhof E (1995). Primary, secondary, and tertiary structures of tRNAs. In: tRNA: Structure, Biosynthesis, and Function (Söll D & RajBhandary U, editors). American Society for Microbiology, Washington, DC, pp. 93-126. [tRNA D-Loop:T-Loop]
Doherty EA, Batey RT, Masquida B, & Doudna JA (2001). A universal mode of helix packing in RNA. Nature Structural Biology 8:339-343. [A-Minor]
Elgavish T, Cannone JJ, Lee JC, Harvey SC, & Gutell RR (2001). AA.AG@Helix.Ends: A:A and A:G Base-pairs at the Ends of 16 S and 23 S rRNA Helices. Journal of Molecular Biology 310:735-753. [AA.AG@helix.ends]
Gautheret D, Damberger SH, & Gutell RR (1995). Identification of base-triples in RNA using comparative sequence analysis. Journal of Molecular Biology 248:27-43. [Base Triples]
Gautheret D, Konings D, & Gutell RR (1994). A major family of motifs involving G-A mismatches in ribosomal RNA. Journal of Molecular Biology 242:1-8. [Tandem G:A Oppositions]
Gautheret D, Konings D, & Gutell RR (1995). GU base pairing motifs in ribosomal RNAs. RNA 1:807-814. [Dominant G:U Base-Pair]
Gutell RR, Cannone JJ, Shang Z, Du Y, & Serra MJ (2000). A Story: Unpaired Adenosine Bases in Ribosomal RNA. Journal of Molecular Biology 304:335-354. [A Story; Adenosine Platform; E Loop/S Turn; E-like Loop]
Gutell RR, Cannone JJ, Konings D, & Gautheret D (2000). Predicting U-turns in Ribosomal RNA with Comparative Sequence Analysis. Journal of Molecular Biology 300:791-803. [U Turn]
Gutell RR, Larsen N, & Woese CR (1994). Lessons from an evolving ribosomal RNA: 16S and 23S rRNA structure from a comparative perspective. Microbiological Reviews 58:10-26. [review of several motifs: Base Triple; Coaxial Stacking of Helices; Dominant G:U Base-Pair; Lone Pair; Pseudoknot; Tetraloop]
Ippolito JA & Steitz TA (1998). A 1.3-Å resolution crystal structure of the HIV-1 trans-activation response region RNA stem reveals a metal ion-dependent bulge conformation. Proceedings of the National Academy of Sciences USA 95:9819-9824. [Metal-Core; Tetraloop Receptor]
Jaeger L, Michel F, & Westhof E (1994). Involvement of a GNRA tetraloop in long-range RNA tertiary interactions. Journal of Molecular Biology 236:1271-1276. [Tetraloop Receptor]
Klein DJ, Schmeing TM, Moore PB, & Steitz TA (2001). The kink-turn: a new RNA secondary structure motif. The EMBO Journal 20:4214-4221. [Kink-Turn]
Lee JC, Cannone JJ, & Gutell RR (2003). The Lonepair Triloop: A New Motif in RNA Structure. Journal of Molecular Biology 325:65-83. [Lonepair Triloop]
Leonard GA, McAuley-Hecht KE, Ebel S, Lough DM, Brown T, & Hunter WN (1994). Crystal and molecular structure of r(CGCGAAUUAGCG): an RNA duplex containing two G(anti).A(anti) base pairs. Structure 2:483-494. [2’-OH-Mediated Helical Interactions]
Leontis NB & Westhof E (1998). A common motif organizes the structure of multi-helix loops in 16 S and 23 S ribosomal RNAs. Journal of Molecular Biology 283:571-583. [E Loop/S Turn]
Lietzke SE, Barnes CL, Berglund JA, & Kundrot CE (1996). The structure of an RNA dodecamer shows how tandem U-U base pairs increase the range of stable RNA structures and the diversity of recognition sites. Structure 4:917-930. [2’-OH-Mediated Helical Interactions]
Massoulié J (1968). [Associations of poly A and poly U in acid media. Irreversible phenomenon] (French). European Journal of Biochemistry 3:439-447. [Triplexes]
Moore PB (1999). Structural motifs in RNA. Annual Review of Biochemistry 68:287-300. [review of several motifs: Adenosine Platform; Bulge-Helix-Bulge; Bulged-G; Cross-Strand Purine Stack; Metal-Binding; Ribose Zipper; Tetraloop; Tetraloop Receptor; U-Turn]
Nissen P, Ippolito JA, Ban N, Moore PB, & Steitz TA (2001). RNA tertiary interactions in the large ribosomal subunit: the A-minor motif. Proceedings of the National Academy of Sciences USA 98:4899-4903. [A-Minor]
Pleij CWA (1994). RNA pseudoknots. Current Opinion in Structural Biology 4:337-344. [Pseudoknot]
SantaLucia J Jr, Kierzek R, & Turner DH (1990). Effects of GA mismatches on the structure and thermodynamics of RNA internal loops. Biochemistry 29:8813-8819. [Tandem G:A Oppositions]
Tamura M & Holbrook SR (2002). Sequence and structural conservation in RNA ribose zippers. Journal of Molecular Biology 320:455-474. [Ribose Zipper]
Traub W & Sussman JL (1982). Adenine-guanine base pairing ribosomal RNA. Nucleic Acids Research 10:2701-2708. [AA.AG@helix.ends]
Wimberly B (1994). A common RNA loop motif as a docking module and its function in the hammerhead ribozyme. Nature Structural Biology 1:820-827. [E Loop/S Turn]
Wimberly B, Varani G, & Tinoco I Jr (1993). Biochemistry 32:1078-1087. [Bulged-G]
Woese CR, Gutell R, Gupta R, & Noller HF (1983). Detailed analysis of the higher-order structure of 16S-like ribosomal ribonucleic acids. Microbiological Reviews 47:621-669. [AA.AG@helix.ends]
Woese CR, Winker S, & Gutell RR (1990). Architecture of ribosomal RNA: constraints on the sequence of "tetra-loops." Proceedings of the National Academy of Sciences USA 87:8467-8471. [Tetraloops]

Comparative Sequence Analysis (RNA Structure Prediction)
The two underlying principles of comparative sequence analysis are simple, but they have a profound influence on our prediction and understanding of RNA structure. The principles are: 1) different RNA sequences can have the potential to fold into the same secondary and tertiary structure, and 2) the specific structure and function of select RNA molecules are maintained during the evolutionary process of genetic mutation and natural selection. Typically for RNA structure prediction, homologous base-pairs that occur at the same positions in all of the sequences in the data set are identified with covariation analysis (see “Covariation Analysis”), resulting in a minimal structure model. Structural motifs (see “Motifs in RNA Tertiary Structure”) that have characteristic sequences at specific structural elements and are sufficiently conserved in the sequence data set are then identified, culminating in a final comparative structure model.
The comparative method has been used for a variety of RNA molecules, including tRNA, the three rRNAs (5S, 16S, and 23S), ITS and IVS rRNAs, group I and II introns, RNase P, tmRNA, SRP RNA, and telomerase RNA. For any one type of RNA, these methods are more accurate and the predicted structure is more detailed when the number of sequences is large and the diversity among these sequences is high.

Covariation Analysis (RNA Structure Prediction)
While comparative sequence analysis is based on the simple proposition that molecules with the same function will have similar secondary and tertiary structures, covariation analysis, a subset of comparative sequence analysis, identifies base-pairs that occur at the same positions in all of the RNA sequences in the data set. Covariation analysis searches for positions that have the same pattern of variation in an alignment of sequences (see Figure CC). The most recent implementation of this method usually base-pairs any two positions with the same pattern of variation, regardless of the types of base-pairs. While most of the base-pair types that are identified exchange between G:C, A:U, and G:U, covariation analysis has also identified exchanges between non-canonical base-pairs.
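The search for positions with the same pattern of variation can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes an ungapped alignment, treats two columns as covarying when the sequences fall into the same groups at both columns, and takes the five schematic sequences of Figure CC as input. The function names variation_pattern and covarying_pairs are hypothetical.

```python
from itertools import combinations

# The five schematic sequences of Figure CC, written without the spaces
# that separate the three blocks of seven positions.
ALIGNMENT = [
    "UAGCGAAnnnnnnnAUCGCUU",
    "UAACAAGnnnnnnnGUUGUUU",
    "CAGCAGGnnnnnnnGCUGCUC",
    "CAGGAGGnnnnnnnGCUCCUC",
    "UAAGAAAnnnnnnnAUUCUUU",
]


def variation_pattern(column):
    """Relabel nucleotides by order of first appearance: 'UUCCU' -> (0, 0, 1, 1, 0)."""
    labels = {}
    return tuple(labels.setdefault(nt, len(labels)) for nt in column)


def covarying_pairs(alignment):
    """Return 1-based position pairs whose columns vary with the same pattern."""
    length = len(alignment[0])
    columns = ["".join(seq[i] for seq in alignment) for i in range(length)]
    patterns = [variation_pattern(col) for col in columns]
    pairs = []
    for i, j in combinations(range(length), 2):
        varies = len(set(columns[i])) > 1  # invariant columns carry no signal
        if varies and patterns[i] == patterns[j]:
            pairs.append((i + 1, j + 1))
    return pairs
```

On this toy alignment, covarying_pairs(ALIGNMENT) includes 1:21, 3:19, 4:18, 5:17, 6:16, and 7:15, and it skips the invariant 2:20 pair. It also reports chance matches such as 1:6, because five sequences cannot separate columns 1, 6, 16, and 21. This is one reason a large and diverse alignment is needed in practice.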

Figure CC. Examples of covariation. A. Schematic alignment. Five sequences are shown from 5’ at left to 3’ at right. Black and red lines above the alignment show base-pairing. Nucleotide position numbers appear in blue at the bottom of the alignment. B. Summary of covariations from the alignment in panel A. Position numbers are followed by the observed base-pair types for the seven base-pairs.

A
Sequence 1:        UAGCGAA nnnnnnn AUCGCUU
Sequence 2:        UAACAAG nnnnnnn GUUGUUU
Sequence 3:        CAGCAGG nnnnnnn GCUGCUC
Sequence 4:        CAGGAGG nnnnnnn GCUCCUC
Sequence 5:        UAAGAAA nnnnnnn AUUCUUU
Position Numbers:            11111 1111122
                   1234567 8901234 5678901

B
1:21  U:U, C:C
2:20  A:U
3:19  G:C, A:U
4:18  C:G, G:C
5:17  G:C, A:U
6:16  A:U, G:C
7:15  A:A, G:G

For example, the following four sets of base-pair exchanges show covariation: 1) A:U <-> U:A <-> G:C <-> C:G, 2) G:U <-> A:C, 3) U:U <-> C:C, 4) A:A <-> G:G. By searching for these coordinated positional variations in a well-aligned collection of sequences, key elements of an RNA molecule’s core structure can be elucidated. Earlier covariation methods searched specifically for helices composed of canonical base-pairs. Improvements in covariation algorithms and an ever-growing collection of sequences make possible comprehensive searches that consider all base-pairing types in a context-independent manner. Because the two base-paired positions are required to have the same pattern of variation, covariation analysis will identify only a subset of the base-pairs that are common to different sequences; other comparative methods must be employed to detect the remainder.
Our confidence in a base-pair predicted with covariation analysis is directly proportional to the dependence of the two paired positions. Positions that vary independently of one another are less likely to form a base-pair that can be predicted with covariation analysis. In contrast, a greater extent of simultaneous variation at the two paired positions could indicate that the two positions are dependent on one another, and thus we are more confident in these predicted base-pairs. One of the family of methods that measures the dependence/independence between two positions proposed to be base-paired is the chi-square statistic, which gauges the types of base-pairs and their frequencies (see “Phylogenetic Event Counting”). The accuracy of the base-pair predictions with covariation analysis is very high: approximately 97-98% of the 16S and 23S rRNA base-pairs predicted with covariation analysis are present in the high-resolution crystal structures.
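The dependence between two alignment positions can be made concrete with a small Python sketch. The entry does not give the exact formula, so the code assumes the standard Pearson chi-square statistic for independence, computed on the table of nucleotide counts at the two columns; the function name chi_square is hypothetical.

```python
from collections import Counter


def chi_square(col_i, col_j):
    """Pearson chi-square for independence between two alignment columns.

    col_i and col_j are equal-length strings holding the nucleotide found
    at each of the two positions, one character per sequence.
    """
    n = len(col_i)
    observed = Counter(zip(col_i, col_j))   # counts of each base-pair type
    count_i = Counter(col_i)                # nucleotide counts at position i
    count_j = Counter(col_j)                # nucleotide counts at position j
    statistic = 0.0
    for a in count_i:
        for b in count_j:
            expected = count_i[a] * count_j[b] / n
            statistic += (observed.get((a, b), 0) - expected) ** 2 / expected
    return statistic
```

Two perfectly dependent columns, such as "CCCGG" paired with "GGGCC" (three C:G and two G:C, as at positions 4:18 of Figure CC), give a statistic of 5.0, while the independent columns "CCGG" and "GCGC" give 0.0. With so few sequences the value only illustrates the calculation and has no statistical weight.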
An important facet of covariation analysis is that the current covariation methods compare all positions in a sequence alignment, independently of previous predictions, structural context, and principles of RNA structure. In practice, the majority of covariations involve two single nucleotides exchanging to maintain a canonical base-pair and are arranged into standard secondary structure helices. Thus, covariation analyses have independently determined the two most fundamental principles in RNA structure: the Watson-Crick base-pairing relationship and the formation of helices from the antiparallel and consecutive arrangement of these base-pairs. In addition to these achievements, a significant number of examples of both covariation between canonical (A:U, G:C, and G:U) and non-canonical base-pairs and covariation between non-canonical base-pair types have been predicted for the rRNAs and proven correct by the ribosomal subunit crystal structures. Likewise, examples of tertiary base-pairs that are not part of a larger helix, in both short- and long-range interactions, have been predicted and shown to exist.

Phylogenetic Events (RNA Structure Prediction)
For covariation analysis, a more complete measure of the dependence or independence between two positions with the same pattern of variation is gauged by 1) the number of base-pair types that covary with one another (e.g., A:U, G:C), 2) the frequency of these base-pair types, and 3) phylogenetic events, or the number of times that a base-pair type was created during the evolution of that base-pair. The first two gauges can be measured with a chi-square statistic (see “Covariation Analysis”). The third gauge requires an understanding of the phylogenetic relationships between the sequences that are in the data set.
These three gauges are exemplified by three different base-pairs in 16S rRNA: 9:25, 502:543, and 245:283. The 9:25 base-pair has approximately 67% G:C and 33% C:G in the nuclear-encoded rRNA genes in the three primary forms of life, the Eucarya, Archaea, and Bacteria. The minimal number of times these base-pairs evolved (phylogenetic events) on the phylogenetic tree is about 4. In contrast, the 502:543 base-pair has 27% G:C, 30% C:G, and 42% A:U, with a minimum of 75 phylogenetic events. Last, the 245:283 base-pair has 38% C:C and 62% U:U in the same set of 16S rRNA sequences, with approximately 25 phylogenetic events.
The phylogenetic event counting method can be used to augment the results from the analysis of base-pair types and base-pair frequencies. It can add or subtract support for a base-pair predicted with covariation analysis. In some situations, this form of analysis can suggest a base-pair that would not have been predicted based on base-pair types and frequencies alone.

Websites
Gutell Lab Comparative RNA Web Site: http://www.rna.icmb.utexas.edu/METHODS/

Further Reading
Gutell RR, Larsen N, & Woese CR (1994). Lessons from an evolving rRNA: 16 S and 23 S rRNA structures from a comparative perspective. Microbiological Reviews 58:10-26.
Gutell RR, Lee JC, & Cannone JJ (2002). The accuracy of ribosomal RNA comparative structure models. Current Opinion in Structural Biology 12:301-310.
Michel F, Costa M, Massire I, & Westhof E (2000). Modeling RNA tertiary structure from patterns of sequence variation. Methods in Enzymology 317:491-510.
Woese CR & Pace NR (1993). Probing RNA structure, function, and history by comparative analysis. In: The RNA World (Gesteland RF & Atkins JF, editors), pp. 91-117, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
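
The minimal number of phylogenetic events described in the Phylogenetic Events entry above can be approximated by a parsimony count on a tree. The Python sketch below assumes Fitch's small-parsimony algorithm on a rooted binary tree written as nested tuples, with each leaf holding the base-pair type observed in one sequence. The entry does not state which counting procedure was used, and both example trees are invented for illustration.

```python
def fitch(tree):
    """Return (candidate_states, minimum_changes) for a nested-tuple binary tree.

    A leaf is a string such as "G:C"; an internal node is a 2-tuple of subtrees.
    """
    if isinstance(tree, str):
        return {tree}, 0
    left_states, left_changes = fitch(tree[0])
    right_states, right_changes = fitch(tree[1])
    shared = left_states & right_states
    if shared:
        # The children can agree on a state, so no change is needed here.
        return shared, left_changes + right_changes
    # The children disagree: at least one change occurred below this node.
    return left_states | right_states, left_changes + right_changes + 1


def phylogenetic_events(tree):
    """Minimum number of base-pair changes needed to explain the leaves."""
    return fitch(tree)[1]


# Two hypothetical trees with identical base-pair frequencies (4 G:C, 3 C:G).
CLUSTERED = (("G:C", ("G:C", ("G:C", "G:C"))), ("C:G", ("C:G", "C:G")))
SCATTERED = ((("G:C", "G:C"), ("C:G", "C:G")), (("G:C", "C:G"), "G:C"))
```

Here phylogenetic_events(CLUSTERED) is 1 and phylogenetic_events(SCATTERED) is 2, although both trees carry the same base-pair types at the same frequencies. This mirrors the point that event counts can add or subtract support beyond what types and frequencies show.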

RNA Folding (RNA Structure Prediction)
The structure and function of any given RNA are dependent on each other. Thus, an understanding of how an RNA folds into its functional form can provide insight into that function. Insight into any potential for fluidity or movement in an RNA structure can also be indicative of functional properties.
The “RNA Folding Problem” can be summarized as the challenge of folding an RNA’s primary structure into its active secondary and tertiary structure. Currently, no complete solution to this problem is available, although several approaches have provided insight. RNA folding algorithms search for secondary structure helices composed of consecutive, antiparallel canonical base-pairs. Another set of constraints comes from thermodynamics, where RNA is expected to fold into its most energetically stable structure. The kinetics of the folding process will also have an impact on the final result.

Energy Minimization (RNA Structure Prediction)
Traditionally, molecular biologists search for the most thermodynamically stable structures, using energy minimization techniques. Thermodynamic energy values have been experimentally determined for consecutive base-pairs and a few other simple structural elements. The assumption behind energy minimization in RNA folding is that the folding of an RNA molecule can be determined by summing the energy values for its simpler structural elements.
The present set of thermodynamic folding algorithms does not always predict a complete and correct secondary structure for an RNA molecule. This may indicate either that our understanding of the thermodynamic parameters is incomplete or that the process is based upon a flawed assumption. These algorithms also are unable to predict tertiary structure base-pairs.

Websites
Michael Zuker’s Home Page (includes mfold and links to his current research): http://www.bioinfo.rpi.edu/~zukerm/
Turner Group Home Page: http://rna.chem.rochester.edu/
From IMB-JENA (both available from the above link but less direct):
Algorithms and Thermodynamics for RNA Secondary Structure Prediction: A Practical Guide: http://www.bioinfo.rpi.edu/~zukerm/seqanal/
Free Energy and Enthalpy Tables for RNA Folding From the Turner Group: http://www.bioinfo.rpi.edu/~zukerm/rna/energy/

Further Reading
Burkard ME, Turner DH, & Tinoco I Jr (1998). The Interactions that Shape RNA. In: The RNA World, 2nd Edition (Gesteland RF, Atkins JF, & Cech TR, editors). Cold Spring Harbor Press, pp. 233-264.
Mathews DH, Sabina J, Zuker M, & Turner DH (1999). Expanded Sequence Dependence of Thermodynamic Parameters Provides Robust Prediction of RNA Secondary Structure. Journal of Molecular Biology 288:911-940.
Mathews DH, Turner DH, & Zuker M (2000). RNA Secondary Structure Prediction. In: Current Protocols in Nucleic Acid Chemistry (Beaucage S, Bergstrom DE, Glick GD, & Jones RA, editors), John Wiley & Sons, New York, 11.2.1-11.2.10.
Nagel JHA & Pleij CWA (2002). Self-induced structural switches in RNA. Biochimie 84:913-923.
Zuker M (1989). On finding all suboptimal foldings of an RNA molecule. Science 244:48-52.
Zuker M (2000). Calculating nucleic acid secondary structure. Current Opinion in Structural Biology 10:303-310.
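
The additive assumption described in the Energy Minimization entry above can be illustrated with a toy Python sketch. All energy values below are hypothetical placeholders, not the experimentally determined parameters, and the model is deliberately crude: it ignores base-pair orientation and G:U pairs, charges one fixed penalty for any hairpin loop, and only compares a few candidate helix lengths for a single hairpin. The function and constant names are invented for this example.

```python
# Hypothetical, illustrative energies in kcal/mol. They are NOT the
# experimentally determined values; they only demonstrate the bookkeeping.
STACK_ENERGY = {
    ("GC", "GC"): -3.0,
    ("GC", "AU"): -2.0,
    ("AU", "GC"): -2.0,
    ("AU", "AU"): -1.0,
}
HAIRPIN_LOOP_PENALTY = 4.0  # same hypothetical cost for every loop size


def pair_class(a, b):
    """Classify a Watson-Crick pair as 'GC' or 'AU', ignoring orientation."""
    pair = {a, b}
    if pair == {"G", "C"}:
        return "GC"
    if pair == {"A", "U"}:
        return "AU"
    raise ValueError(f"{a}:{b} is not handled by this toy table")


def hairpin_energy(sequence, helix_length):
    """Sum stack terms for a hairpin whose first helix_length bases pair with its last."""
    n = len(sequence)
    classes = [pair_class(sequence[k], sequence[n - 1 - k]) for k in range(helix_length)]
    stacks = sum(
        STACK_ENERGY[(classes[k], classes[k + 1])] for k in range(helix_length - 1)
    )
    return stacks + HAIRPIN_LOOP_PENALTY


def most_stable_helix_length(sequence, candidates):
    """Return the candidate helix length with the lowest summed energy."""
    return min(candidates, key=lambda length: hairpin_energy(sequence, length))
```

For the invented hairpin GGACGAAAGUCC, candidate helices of 2, 3, and 4 base-pairs sum to 1.0, -1.0, and -3.0 with these placeholder values, so most_stable_helix_length selects the 4 base-pair helix. Programs such as mfold apply the same additive idea with experimentally determined parameters and dynamic programming instead of this enumeration.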