Gutell 115.rna2dmap.bibm11.pp613-617.2011


Xu W., Wongsa A., Lee J., Shang L., Cannone J.J., and Gutell R.R. (2011).
RNA2DMap: A Visual Exploration Tool of the Information in RNA's Higher-Order Structure.
Proceedings of 2011 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2011), Atlanta, GA. November 12-15, 2011. IEEE Computer Society, Washington, DC, USA. pp. 613-617.

Published in: Technology, Education

RNA2DMap: A Visual Exploration Tool of the Information in RNA's Higher-Order Structure

Weijia Xu†, Ame Wongsa‡, Jung Lee*, Lei Shang*, Jamie J. Cannone*, Robin R. Gutell*
†Texas Advanced Computing Center, ‡School of Information, *Institute for Cellular and Molecular Biology
The University of Texas at Austin
Austin, Texas

2011 IEEE International Conference on Bioinformatics and Biomedicine
978-0-7695-4574-5/11 $26.00 © 2011 IEEE
DOI 10.1109/BIBM.2011.60

Abstract -- A new and emerging paradigm in molecular biology is revealing that RNA is implicated in nearly every aspect of the metabolism in the cell. To enhance our understanding of the function of these RNA molecules in the cell, it is essential that we have a complete understanding of their higher-order structures. While many computational tools have been developed to predict and analyse these higher-order RNA structures, few are able to visualize them for analytical purposes. In this paper, we present an interactive visualization tool of the secondary structure of RNA, named RNA2DMap. This program enables multiple dimensions of information about RNA structure to be selected, customized and displayed to visually identify patterns and relationships. RNA2DMap facilitates a comparative analysis and understanding of RNAs that cannot be readily obtained from the other graphical or text output of computer programs. Three use cases are presented to illustrate how RNA2DMap aids structural analysis.

Keywords: Biological Data Visualization; RNA Structural Analysis; Interactive Application

I. INTRODUCTION

RNA structure, function, and evolution are studied with experimental and computational methods. Comparative analysis, one of the computational methods, has been used to determine an RNA's higher-order structure with high accuracy and detail, principles of RNA structure, and the evolution of RNA and phylogenetic relationships. This analysis is dependent on the number and diversity of the RNA sequences and high-resolution crystal structures within any given RNA family, and the sophistication of the computational system to analyse the data.
Information related to RNA sequences is usually stored within a database or in text files with specific formats. We have developed rCAD (RNA Comparative Analysis Database), which utilizes the Microsoft SQL Server to organize, manipulate, and analyse these multiple dimensions of information [1] [2]. rCAD is the foundation used to effectively create inter-relationships between multiple types of information. However, from the patterns, biases and relationships in rCAD's data alone, we are unable to synthesize all of this knowledge into a complete understanding of RNA's structure, function, and evolution.

While most computational approaches are developed based on data mining and algorithmic approaches to understand the structure and functions of RNA molecules, visual analytic approaches can provide insights from the data. The secondary structure diagram of RNA is routinely used to visualize the base pairs, helices, and structural motifs of the RNA's higher-order structure. It is the reference point for discussion and analysis of sequence alignments, high-resolution crystal structures, and evolutionary relationships. Many RNA scientists map their experimental data and other relevant information onto these diagrams. However, this is usually done on a figure in a published paper, not as an interactive graphics display.

Our goal is to create a new application that allows researchers to dynamically navigate through these multiple dimensions of information. RNA2DMap, the focus of this paper, is a new foundation for the dynamic visual presentation of the growing number of data types that will enhance our knowledge of RNA structure, folding, and its evolution. RNA2DMap is akin to the Macroscope, an abstract computational system to facilitate an understanding of a complex system from all of its components, both physical and temporal. RNA2DMap is available as part of the Comparative RNA Web (CRW) Site [3].

II. BACKGROUND AND RELATED WORK

The initial attempts to visualize RNA secondary structure focused on automatic drawing algorithms to generate aesthetically pleasing secondary structure diagrams with very limited overlap of strands and interactions [4] [5] [6]. A radial drawing algorithm was implemented in RNAViz, which draws each helix and base in a multi-loop at regular angular distances [7]. JViz includes multiple drawing algorithms such as linear linked graph [8], circular representation and RNA dot plot [9]. RNAView is one of the first applications that displays the tertiary interactions with RNA secondary structure [10]. PseudoViewer is a tool for visualizing RNA secondary structure with pseudoknots, a special type of structure motif. PseudoViewer can efficiently visualize a large RNA sequence with any type of pseudoknot as a compact planar drawing. PseudoViewer is reported to be 10 times faster than the previous algorithm and to produce a more aesthetic structure [11].

Recent developments add features such as interactive editing, annotation, and comparison among a set of RNA
secondary structures. RNAMovies is a system for visualizing multiple sets of secondary structure data and creates an interpolated animation of the data [12]. RNAMovies is primarily used to show RNA secondary structure spaces for evaluating different secondary structure predictions. One of the commonly used tools is XRNA, an open source tool to create, annotate and display secondary structure diagrams of RNA sequences [13]. This software tool is designed as a user interface to create new or edit existing secondary structures in a special format and is limited in its capabilities to create visual representations of the data and to adapt to applications in analysis. A more recent visualization tool for RNA secondary structure is VARNA. This tool draws an RNA secondary structure automatically from a few different RNA structure file formats. VARNA implements four different drawing algorithms in Java [14]. The primary feature of VARNA is that biologists can interactively edit and annotate secondary structures and output the visualization as static images.

III. VISUALIZATION IMPLEMENTATIONS AND FEATURES

A. Main interface and basic interactions

The main application interface of RNA2DMap (Fig. 1) consists of two groups: control panels on the left side and the main visualization on the right.

Figure 1. Overview of the RNA2DMap interface.

The control panel is further divided into three parts: (1) a molecule selector, (2) nucleotide information, and (3) display options. Different RNA molecules can be selected with the molecule selector. Panel 2 reveals the available sequence and structure information for the nucleotides selected by left-click and/or hovered over by the mouse cursor. Panel 3 provides options for different views, which are discussed in detail in subsequent sections.

The main visualization panel at the right is composed of three parts: (4) the structure toolbar, (5) the main visualization window (Structure Navigator), and (6) the saved position toolbar. The structure toolbar at the top contains options to change the size of the secondary structure (zoom controls), to print the current view or the entire secondary structure, to locate nucleotides by position number, and to present all positions and their base pair partner, if they form a secondary or tertiary interaction, together with the conservation values for each position in a table. The saved position toolbar at the bottom lists those positions that have been selected, by double-clicking on the nucleotide or entering the position number on the bottom right, to be saved in their current state (e.g. RNA distance coloring).

B. Secondary Structure view

Figure 2. Highlighting the selected base pair groups and their conformations.

Currently the 5S, 16S, and 23S rRNA secondary structure diagrams for the high-resolution three-dimensional crystal structures (Thermus thermophilus (16S) and Haloarcula marismortui (5S and 23S)) and the comparative structure models (Escherichia coli) are available. In the future, many more RNA families will be included. RNA2DMap utilizes the coordinates for an RNA's secondary structure diagram and maps them within the main visualization window. The visualization can be zoomed in to show details or zoomed out to show an overview of the entire structure with a sliding bar, located at the top of the screen, that increases or decreases the zoom value. Users can also drag the white space of the visualization to move the current view.

The Display Options panel has two tabs, "Data Sets" and "Structure" (Fig. 2). The graphical format of the nucleotides and base pairs is modified in the "Structure" section. The base pair symbol by default reveals the conformation, as defined by the Lee & Gutell nomenclature [15]. Each conformation is shown with a glyph that has its own symbol and color. The legend is in the Display Options panel (Fig. 2). The tertiary interactions can be displayed or hidden. The position numbers for the crystal structure and the equivalent Escherichia coli (the typical reference species) positions can also be shown (or hidden), with tic marks every ten positions and numbers displayed every 50.

C. Visualization of additional information

The "Data Sets" tab in the Display Options panel includes six types of information: 1) the physical three-dimensional distances between the selected nucleotide and all other
nucleotides in the RNA molecule, 2) the conservation values for every nucleotide, 3) coaxial stacking of helices, 4) the base pair types, 5) the conformation of the base pairs, and 6) secondary structure pairing, conformation and motif.

Highlighting base pairs

Figure 3. Five motif types are highlighted in different colors.

To simultaneously render both the base pair group and the type of conformation, RNA2DMap uses a glyph-based visualization technique, a two-colored nested circle representation: the color of the outer circle reveals the base pair type while the color of the inner circle reveals the base pair's conformation (Fig. 3). Structural motifs are categorized based on their unique arrangement of base pair types and conformations with unpaired nucleotides. Five different structural motifs, displayed with large colored nucleotide circles, are shown in Fig. 3. The colored circles for motifs are larger than those used for base pair groups. Therefore, users can select multiple base pair groups, different types of conformations and motifs at the same time. The color used for any of the circles can be customized to emphasize the desired pattern.

Showing distance values in three-dimensional space

RNA2DMap can also show the distance values between nucleotides based on their position in the crystal structure (Fig. 4 - left). The distance values between zero and the user-specified upper limit are mapped to a continuous color map ranging from black to green, where black indicates the smallest distance values and green indicates the largest distance values.

Figure 4. (left) 3D distances are highlighted with a color map; (right) the conservation view of the 16S rRNA secondary structure.

Highlighting conservation values

The extent of nucleotide conservation and variation is determined with a modified Shannon equation [16]. These conservation values are associated with the color red and different shades of black and gray. Positions that are invariant have a conservation value of 2. Lower values indicate greater variation. Positions with values of 1.9 or greater are shown in red. Positions with values immediately below 1.9 are black. The density of the black decreases with lower conservation values (i.e. greater variation). Fig. 4 - right reveals the extent and locations of the highly conserved to highly variable regions of the RNA.

IV. EXEMPLAR USER SCENARIOS

A. Explore base pair types and conformation types

Figure 5. A (Top). Most frequent base pair groups and conformations. B (Bottom). Least frequent base pair groups and conformations (see text).

A total of 16 base pair types are possible, and each base pair type can form approximately 15 different conformations. The frequency of each base pair type, the conformations associated with each base pair type, and their proximity with one another are diagnostic of the type of structural motif they might be associated with and are fundamental to an RNA's higher-order structure. Visualizing the frequency and organization of the base pair types and their conformations is a very effective means to identify patterns that have been characterized and to
discover new patterns that could be a new motif or a variation of an existing motif.

The most frequent base pair groups (G:C, A:U, and G:U) and conformations (Watson-Crick (WC) and Wobble (Wb)), and the least frequent base pair types (G:G, G:A, A:A, A:C, C:C, C:U, and U:U) and conformations (nearly 40 in total), are shown in Figures 5A and 5B respectively. As expected, the vast majority of the most frequent base pair types and conformations occur within the regular secondary structure helices, although a few of these base pairs occur outside of the regular helix (Fig. 5A). In contrast, the vast majority of the least frequent base pair groups and conformations occur outside of the secondary structure helix (Fig. 5B). Within this latter group, the most common of the least frequent base pair types is A:G, and the most common of the least frequent conformations is sheared (S). The majority of the A:G base pairs have the sheared conformation. These usually occur immediately adjacent to the end of a secondary structure helix. A previous study revealed that the A:G base pairs at the end of a helix frequently change to A:A base pairs. These A:A base pairs usually form the same sheared conformation [17]. However, there are some exceptions where two consecutive A:G base pairs occur in the middle of the helix. This observation of tandem A:G pairs within a secondary structure helix is consistent with previous analysis [18].

B. Nucleotides within structure motifs form tertiary interactions

Figure 6. Tertiary interactions between nucleotides in the tetraloop, lone-pair tri-loop, and UAA/GAN motifs and their partners are drawn with a thicker line in RNA2DMap. Blue lines are base-base interactions and green lines are base-backbone interactions.

While comparative analysis accurately predicted the secondary structure for numerous RNAs, including the 5S, 16S, and 23S rRNAs [19], the high-resolution three-dimensional structures for the 30S and 50S ribosomal subunits determined with X-ray crystallography substantiated the comparative structure model for the rRNAs and identified the tertiary interactions that are drawn on our Thermus thermophilus 16S rRNA and Haloarcula marismortui 23S and 5S rRNA secondary structure diagrams [20] [21].

One of the grand challenges in biology is to accurately predict an RNA's secondary and three-dimensional structure. To achieve this very ambitious goal, an understanding of the fundamental rules that influence the correct folding of RNA is essential. A few structural motifs have been identified and characterized, including the tetraloop [22], lone-pair tri-loop [23], and UAA/GAN [24]. All contain specific bases that have a propensity to hydrogen bond to another nucleotide or to the sugar-phosphate backbone of the RNA molecule. Fig. 6 reveals a region of the 23S rRNA that has four examples of these three motifs: two UAA/GAN (reddish orange), one tetraloop (purple), and one lone-pair tri-loop (pink). The tertiary interactions that form base pairs or base-backbone bonding with nucleotides within these motifs are emphasized with a thicker line to distinguish them from the other tertiary structure interactions.

C. Investigation of non-base pairing constraints

Figure 7. The secondary structural map of T. thermophilus 16S rRNA highlights all neighbor effects (red lines connecting nucleotides).

While the strongest covariation scores for two nucleotide positions indicate a base pair, weaker but still statistically significant covariation scores have been observed for clusters of nucleotides. Our previous analysis of tRNA reveals that the positions involved in these associations are in proximity in the three-dimensional structure of tRNA or between two regions of the tRNA that are involved in a specific function during protein synthesis [25] [26]. Our more recent analysis of 16S rRNA reveals several regions of the rRNA secondary structure that have these clusterings of weaker covariation scores (also called neighbour effects). Fig. 7 reveals the nucleotides that are part of a network of weak covariations. Three sets of covariations are not immediately adjacent with one another on the secondary structure diagram. These shaded regions in Fig. 7 were evaluated with the RNA distance option in RNA2DMap to determine their absolute physical distance based on the coordinates in the high-resolution crystal structure of the T. thermophilus 16S rRNA. All three sets of nucleotides are within hydrogen bonding distance, suggesting that these
nucleotides might form a base pair or are sufficiently close in three-dimensional space to have structural constraints with the other nucleotides in these clusters of neighbour effects. This observation with RNA2DMap increases our confidence that these neighbour effects are true structural constraints and demonstrates that nucleotides that do not form a base pair can influence the evolution of other nucleotides that are physically close to them.

V. CONCLUSIONS

In this paper, we present RNA2DMap, an interactive tool for visualizing multiple types of information on an RNA secondary structure. The primary visualization is based on the standard RNA secondary structure diagram. Nested circles with different colors are used to reveal multiple dimensions of information, including base pair types, base pair conformations, conservation values of RNA sequences, and the physical three-dimensional distances between the selected nucleotide and all other nucleotides. RNA2DMap has the flexibility to show different combinations of information on the RNA secondary structure. We demonstrate three use cases. The first reveals the frequency and organization of the different base pair types and their conformations. The second reveals tertiary interactions associated with a few of the structural motifs. The third utilizes the RNA distance function to determine if different sets of positions with a weak covariation are sufficiently close in three-dimensional space to either form a base pair or impose spatial constraints on other nucleotides in this local region of the RNA structure. RNA2DMap can be adapted to work with any secondary structure diagram generated with other programs.

ACKNOWLEDGEMENT

This work was supported by NIH grants GM085337 (awarded to RG and WX) and GM067317 (awarded to RG), and the Welch Foundation F-1427 (awarded to RG).

REFERENCES

[1] W. Xu, S. Ozer, and R. R. Gutell, "Covariant Evolutionary Event Analysis for Base Interaction Prediction Using a Relational Database Management System for RNA," in Proceedings of Scientific and Statistical Database Management (SSDBM09), LNCS, Winslett, Ed.: Springer, pp. 200-216, 2009.
[2] S. Ozer, K. J. Doshi, W. Xu, and R. R. Gutell, "rCAD: A Novel Database Schema for the Comparative Analysis of RNA," to appear, IEEE e-Science Conference, Dec. 2011.
[3] K. Yamamoto, N. Sakurai, and H. Yoshikura, "Graphics of RNA secondary structure; towards an object-oriented algorithm," Comput Appl Biosci., vol. 3, no. 2, pp. 99-103, June 1987.
[4] O. Matzura and A. Wennborg, "RNAdraw: an integrated program for RNA secondary structure calculation and analysis under 32-bit Microsoft Windows," Computer Applications in the Biosciences, vol. 12, no. 3, pp. 247-9, 1996.
[5] R. M. Felciano, R. O. Chen, and R. B. Altman, "RNA Secondary Structure as a Reusable Interface to Biological Information Resources," SMI, SMI-96-0641, 1997.
[6] P. De Rijk, J. Wuyts, and R. De Wachter, "RnaViz 2: an improved representation of RNA secondary structure," Bioinformatics, vol. 19, no. 2, pp. 299-300, 2003.
[7] RNAfamily.
[8] K. C. Wiese, E. Glen, and A. Vasudevan, "JViz.Rna--a Java tool for RNA secondary structure visualization," IEEE Trans Nanobioscience, vol. 4, no. 3, pp. 212-8, Sep. 2005.
[9] H. Yang et al., "Tools for the automatic identification and classification of RNA base pairs," Nucleic Acids Research, vol. 31, no. 13, pp. 3450-60, 2003.
[10] Y. Byun and K. Han, "PseudoViewer3: generating planar drawings of large-scale RNA structures with pseudoknots," Bioinformatics, vol. 25, pp. 1435-7, 2009.
[11] R. Giegerich and D. J. Evers, "RNA Movies: visualizing RNA secondary structure spaces," Bioinformatics, vol. 15, no. 1, pp. 32-7, 1999.
[12] UCSC, XRNA.
[13] K. Darty, A. Denise, and Y. Ponty, "VARNA: Interactive drawing and editing of the RNA secondary structure," Bioinformatics, vol. 25, no. 15, pp. 1974-5, April 2009.
[14] J.C. Lee and R.R. Gutell, "Diversity of Base-pair Conformations and their Occurrence in rRNA Structure and RNA Structural Motifs," Journal of Molecular Biology, vol. 344, pp. 1225-1249, 2004.
[15] R.R. Gutell, B. Weiser, C.R. Woese, and H.F. Noller, "Comparative anatomy of 16S-like ribosomal RNA," Progress in Nucleic Acid Research and Molecular Biology, no. 32, pp. 155-216, 1985.
[16] T. Elgavish, J.J. Cannone, J.C. Lee, S.C. Harvey, and R.R. Gutell, "AA.AG@Helix.Ends: A:A and A:G Base-pairs at the Ends of 16 S and 23 S rRNA Helices," Journal of Molecular Biology, vol. 310, no. 4, pp. 735-753, 2001.
[17] D. Gautheret, D. Konings, and R.R. Gutell, "A major family of motifs involving G.A mismatches in ribosomal RNA," J. Mol. Biol., vol. 242, no. 1, pp. 1-8, 1994.
[18] R.R. Gutell, J.C. Lee, and J.J. Cannone, "The Accuracy of Ribosomal RNA Comparative Structure Models," Current Opinion in Structural Biology, vol. 12, no. 3, pp. 301-310, 2002.
[19] B. T. Wimberly et al., "Structure of the 30S ribosomal subunit," Nature, no. 407, pp. 327-339, 2000.
[20] N. Ban, P. Nissen, J. Hansen, P. B. Moore, and T. A. Steitz, "The complete atomic structure of the large ribosomal subunit at 2.4 A resolution," Science, no. 289, pp. 905-920, 2000.
[21] C.R. Woese, S. Winker, and R.R. Gutell, "Architecture of Ribosomal RNA: Constraints on the sequence of Tetra-loops," Proceedings of the National Academy of Sciences, vol. 87, no. 21, pp. 8467-8471, 1990.
[22] J.C. Lee, J.J. Cannone, and R.R. Gutell, "The Lonepair Triloop: A New Motif in RNA Structure," Journal of Molecular Biology, vol. 325, no. 1, pp. 65-83, 2003.
[23] J.C. Lee, R.R. Gutell, and R. Russell, "The UAA/GAN internal loop motif: a new RNA structural element that forms a cross-strand AAA stack and long-range tertiary interactions," Journal of Molecular Biology, vol. 360, no. 5, pp. 978-988, 2006.
[24] R.R. Gutell, A. Power, G. Hertz, E. Putz, and G. Stormo, "Identifying Constraints on the Higher-Order Structure of RNA: Continued Development and Application of Comparative Sequence Analysis Methods," Nucleic Acids Research, no. 20, pp. 5875-95, 1992.
[25] D. Gautheret, S.H. Damberger, and R.R. Gutell, "Identification of Base Triples in RNA Using Comparative Sequence Analysis," Journal of Molecular Biology, vol. 248, no. 1, pp. 27-43, 1995.
[26] J.J. Cannone, S. Subramanian, M.N. Schnare, J.R. Collett, L.M. D'Souza, Y. Du, B. Feng, N. Lin, L.V. Madabusi, K.M. Müller, N. Pande, Z. Shang, N. Yu, and R.R. Gutell, "The Comparative RNA Web (CRW) Site: An Online Database of Comparative Sequence and Structure Information for Ribosomal, Intron, and Other RNAs," BioMed Central Bioinformatics, 3:2, 2002.
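
Illustrative sketch of the color mappings in Section III.C

The paper describes two color mappings in prose: conservation values (red at 1.9 or greater, then black fading to lighter gray as conservation drops) and three-dimensional distances (black to green between zero and a user-specified upper limit). The Python sketch below shows one way such mappings could be written. It is not the RNA2DMap implementation. The conservation score here assumes the plain information-content form, log2(4) minus the Shannon entropy of the column, which gives 2 for an invariant position; the paper's "modified" equation may differ, for example in how gaps are treated. The function names, the lightest gray level (220) and the linear interpolation are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

MAX_BITS = math.log2(4)  # 2.0 bits for the four nucleotides A, C, G, U


def conservation(column):
    """Information content of one alignment column in bits (0 to 2).

    Assumption: gaps and other symbols are simply ignored.
    """
    bases = [b for b in column.upper() if b in "ACGU"]
    if not bases:
        return 0.0
    total = len(bases)
    counts = Counter(bases)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return MAX_BITS - entropy


def conservation_color(value, red_threshold=1.9):
    """Red at or above the threshold, otherwise a gray level (RGB).

    Just below the threshold the gray is black; it lightens linearly
    down to a hypothetical lightest level of 220 at value 0.
    """
    if value >= red_threshold:
        return (255, 0, 0)
    level = round(220 * (1 - value / red_threshold))
    return (level, level, level)


def distance_color(distance, upper_limit):
    """Linear black-to-green map; distances outside [0, limit] are clamped."""
    fraction = min(max(distance / upper_limit, 0.0), 1.0)
    return (0, round(255 * fraction), 0)
```

Under these assumptions, an invariant column such as "AAAA" scores 2 bits and is drawn red, while a column with all four nucleotides in equal proportion scores 0 and gets the lightest gray. A distance of zero maps to black and any distance at or beyond the upper limit maps to full green.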