Your SlideShare is downloading. ×
Benasque RNA 2012: RNA Motifs
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×

Introducing the official SlideShare app

Stunning, full-screen experience for iPhone and Android

Text the download link to your phone

Standard text messaging rates apply

Benasque RNA 2012: RNA Motifs

113
views

Published on

Published in: Technology, Business

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
113
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
0
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Hunting RNA motifsPaul GardnerJuly 23, 2012Paul Gardner Hunting RNA motifs
  • 2. Classification problemsWhat is this?The second most abundant M. tuberculosis transcriptNo hits to Rfam, no homologs have been identified, no knownfunction0e+001e+052e+053e+054e+055e+056e+057e+05Mycobacteria tuberculosis genome: RNA−seq and annotationsReadcounts+ve strand−ve strandCDSsPfam domainncRNAsterminators4099500 4100000 4100500 4101000 4101500 4102000genome coordinateRv3661HADmiscrna00343Rv3662cRank Reads Gene1 5,741K SSU rRNA2 689K sRNA / RUF?3 236K RNaseP RNA4 197K LSU rRNA5 119K AT P synthase6 119K HSP X7 115K tmRNA8 103K Secreted proteinArnvig et al.(2011) PLoSPathog.Paul Gardner Hunting RNA motifs
  • 3. Classification problemsWhat about this?The seventh most abundant N. gonorrhoeae transcriptNo hits to Rfam, no homologs have been identified, no knownfunction020000400006000080000Neisseria gonorrhoeae genome: RNA−seq and annotationsReadcounts+ve strand−ve strandCDSsPfam domainncRNAsterminators2137500 2138000 2138500 2139000genome coordinateNGO2161Inositol_Pterminator12061terminator12062 NGO2162SEC−Cterminator12063terminator12064Rank Reads Gene1 206K RNaseP RNA2 178K tRNA -Ser3 130K ZapA 3‘ UT R?4 120K P hage protein5‘ UT R?5 97K tmRNA6 93K BRO protein7 81K sRNA ?8 81K Hsp70 protein9 76K tRNA -A spIsabella & Clark(2011) BMCGenomics.Paul Gardner Hunting RNA motifs
  • 4. Classification problemsAnd this?The ninth most abundant C. difficile transcriptNo hits to Rfam, no homologs have been identified, no knownfunction01000200030004000Clostridium difficile genome: RNA−seq and annotationsReadcounts+ve strand−ve strandCDSsPfam domainncRNAsterminators3014000 3014500 3015000 3015500 3016000 3016500 3017000genome coordinatetoxB toxB toxBterminator0881terminator1760 trpSRank Reads Gene1 12K RNaseP RNA2 12K 6S RNA3 12K SRP RNA4 11K ldhA5 10K tmRNA6 6K spoVG7 6K Gly reductase8 5K slpA9 4K sRNA / RUF?10 4K hypothetical prot.11 4K sRNA / RUF?Deakin, Lawleyet al. (2012)Unpublished.Paul Gardner Hunting RNA motifs
  • 5. Classification problemsAnd this?The eleventh most abundant C. difficile transcriptNo hits to Rfam, no homologs have been identified, no knownfunction01000200030004000Clostridium difficile genome: RNA−seq and annotationsReadcounts+ve strand−ve strandCDSsPfam domainncRNAsterminators1760600 1760800 1761000 1761200 1761400 1761600genome coordinatefeoA2 terminator1637terminator1105 acetyltransferaseRank Reads Gene1 12K RNaseP RNA2 12K 6S RNA3 12K SRP RNA4 11K ldhA5 10K tmRNA6 6K spoVG7 6K Gly reductase8 5K slpA9 4K sRNA / RUF?10 4K hypothetical prot.11 4K sRNA / RUF?Deakin, Lawleyet al. (2012)Unpublished.Paul Gardner Hunting RNA motifs
  • 6. What is our next big challenge?Past:How many non-coding RNA genes are there?Future:Can we determine the functions, if any, for large sets of givenRNAs?Paul Gardner Hunting RNA motifs
  • 7. Will a “periodic table of RNA” be useful?My aim is to build an analog to the Periodic Table forclassifying RNA families and motifs, enabling researchers topredict function.base basepairRAUAGAU YACAUU5´YGAAR5´CUU CGG5´RUR R RY5´RRGCGUR ARAGCY5´RYGGAGYR RRRC RRGARR5´CGAAGYYRYY RRGGGRUGGAG5´CCRAYCCCRU CCGAACUYGG5´A N Y A G N R A U N C G T loop U t ur n k t r n1 k t r n2 tw istRCYRGGAACUGARCRUYAGUACGGGA R R A5´YYYAGU AG Y RAGGAARRR5´ RYGRYAAYCRYA YYAGRGAAYC5´RCAGG AGY5´ACAC UGRYRY G Y R R R RRYCARUY5´RAGCRCGRAGY AYGYYRGUUY5´AAAAAGCYRYY RRYGGYUUUUUUY U Y5´RRARR YYUUUUU U Y5´sar r ic1 sar r ic2 U A A G A N C sr C loop dom V t er m 1 t er m 2R Y Y Y YGCGAGCAGACGCARAACRCCCRRYRRYGGGYGUUYUGCGUCUGCUCGCR R R R5´YUYUCUCAACAGUGYUUGRRRAAY5´ YYYYYAUGAY GRYYYYAAA YYYYYRRGRRYC U GAUYYYRRR5´GGGUCUCUCUGYUAGACCAGAUCUGAGCCUGGGAGCUCUCUGGCUARCUAGGGAACCCAC5´ UGUAAACAUCCUY GACUGGAAGCUGURRRY R YRRRRGCUUUCAGUCGGAUGUUUGC5´ U CUUUGGUUAUCUAGCUGUAUGAGUGYY RCRUCAUAAAGCUAGAUACCGAARU5´ CYYRUCCCUGAGACCCUAACYUGUGAGYUYYYAGYUUCACARGURGGYUCUYGGGRCYRGG5´GCUAAAAGGAACGAUCGUUGUGAUAUGCGUURRUUYCGUUACAUAUCACAGUGAUUUUCCUUUAUARCGC5´ C Y GYGYYCAUCUUACYGRGCAGUGUUGGAUGYYY RRGYCUCUAAYACUGYCUGGUAAYGAUGRCRYC G G5´ Y Y Y Y R R GYACAURCUUCUUUAUAUCCCAUAYRAYRRRCUAUGGAAUGUAAAGAAGUAUGUAY Y Y G G Y5´ Y R R YYCRUCAAARUGGYUGUGAR UGUYRUCAUAUCACAGCCACUUUGAUGAGY U Y R R5´ Y A A RAAGGGAAYRGUUGCUGUGAURUAYYYA YYYYUYUAUAUCACAGUGGCUGUUCUUUUUG G U Y5´ YCRGGUGAGGUAGUAGGUUGUAUAGUURRRRYYYY YGGAGYAACURUACAAYCURCUACUUYCCUGR5´GGCUGGUCCGARRGUAGUGGGUUAYRUYAAYYYYUURYY Y YUCYCCCYCYCACU RCURYACUUGACURGCCUU U5´ YYYCUGYRRUGUCGUARYYYYYUGARCCRAYYYYYYGGGRGYYYYYRGGYAG CCCYYGGGAARCAARYRRRRYRCCC A CCURRRYRYRGGUUCARRRRYACGGCAYYRYGGRYYYY5´YYRCGRCCAUACRRRGRARCACCYGRUCCCAU CCGA ACYCRGAAGU UAAGCYYY YGGC YR RGUA C URG R YGRGRAYCCUGGGAARYRGGUGYYGYRRY5´GRUAGYYYARYGGY ARR R CRYYRGYUY AAYYRRRRYRRGG UUCRARUCCYYYYR5´RRAARYUCRYRRRRGYYACRRYGAGURYY RYRCUC YCYYYY G G G A A GGUC U G A GARGCCAYYRCCCUGGGGYRYYYYYYGRRRRGRRRR Y G R G Y YACCAGA A A YRR Y YYYRRGYUUGGAARRCUYRYGGCYRG Y R R Y UAGUCAAURYGRRYRRYYYRAACYCRAUUCAGACUAUCUYY5´T R I T I R E SE C I S m ir -T A R m ir -30 m ir -9 lin-4 m ir -5 m ir -8 m ir -1 m ir -2 m ir -6 let -7 Y R N A 6S 5S t R N A R N asePAURRGRYAGGYAUUGAACUGUAUUGUGCRCCUUGCAUARAGCUAAAGCACUAAAAAGGAGUAA5´AGUCAUGAUYGCUAUUCYY YAAAUAGUGAUUGUGAUAGCGAUGCGGYGUGUUG CGCACRYCGYAYCGCGC U5´AGAGGAARCRGGGGCCAYGCAGAAGCGUUCACGUCGCGGCCCCUGUCAGAUUCRGURAAUCUGCGAAUUCUGCU5´G A U ACAUAGGAACCUCCUC AAAGGAUUCUAUGGA C AGUCGAUGCAGGGAGGG A CRRCUCCCUGCAUCGGCGA U U U U5´ ACGRRGUR RARUGCGA U A A Y A YAAUAAUGAAAUUCCUCUU U G A CGGCCAAUAGCGAUAUUGGCCAUUUUUUU5´ RYCUUUAGCGGGYURRRUY A R U CURGYYGGYGUUUCGCCGRCY YURCYYUGAYRY5´RYYRYYCCGUGGUGAUUUGRYCGGCCGGCUUGCAGCCACGUUAAAYAAUCGCUAAARAGGCCGRGGRRR5´GUCGRRUY Y C ACUGA U G AG U C YU RARGACGAAAC5´ Y Y RAUYUAAARAAACA G CUU UCA AGU G CCU U U Y U GCA GUUYYYCARGAGCGCAAGAURGR U A5´RYGGYY GYUUGCCAUACGCCCYYY YYCGGC AGGUAUGGAARCACCCYCG Y A CGACUGGYYC GGACACYGYCGUCCCGCCAGAUC5´ CACAUCAGAU U UCCUGGUGUAA CGAAUUUUCAAGUGCU U CUUGCAUAAGCAAGUUURAUCCCGCYCCYYCGRGYCGGGAUUU5´ A U GGAGACAUGGCRUAAAGC C AGA RAG U R AGAACR U A A CYUAGACURUACUUGAACUGAUUYRCAUCUCAU U U U5´GCRCYGCAAAAUCRGRYGCC G G G AU UGGYAYCCCGRAYRRRRYRA R C G CYGCGYUUUUUU5´Y U R C G U G A C G A A G CGCGCGCAAAGUGG A CAAUAAAGCCURA G CRUYRAGUAGUCGYCAGACGCCGGUU A AGCCGGCGUUUU U U5´ YRYACGURYCYGUURURGYCCGGUUGCUUUGGUCGGUGACCGGRR R RRAGCCCRCUU GGUGGGYUUUU U5´GGYCRGCYCRCCC C CCRGRGCYGRCCG A C G G C C C C C G CU CCCCCCYGGCGGGGGYCGUCCCYY5´U U G G C G A U R UUUUUGGUU GGAAUGUAGUGYYYUUAR C A C U AAA CGC U GCCA C AAAUAACCUGUCAGUUAUUUCAYCAAAAAU A A A5´RYYRYUGCCCUCYGGGCGUUUCCUCCCUAGACUUGGCYYYYRRGGCCUUUUUUUUYYY5´SA M V sym R C P E B 3 F inP sr oB m sr SA M a H H 3 V m nt n3 livK D sr A C A E SA R isr K sr oD isr B 6C r spL suhBUYGCAUCCGCYAAYCGGUYA G C C GU G UCG C GG AA GGUUY YYAACCAG C URYY U Y Y G RAACRRAGRRAGGUGAGCG5´UGAAAGACGCGCAUUUGUU A U C A U CAUCC CUGU YCAGAGAUGYAAUUUGGCCACAGYRYGUGGCCUUUUC5´* UUCUACUGACUCUUUUAAAAUAAUUAUUCAUUGGAGG U UUAAUAUGAAUAUAA A G G A U G A G CAU AUAGAAGCGUUUGCUCYUUGUUAGAUCRGUUAGUAGGAA5´G A U U UGGURRCUGCGCUCUU C UAAGCCAGUUACCCGGUUCAAARAUUG C CAGCUUYGAACCUUCGAAAAACCACCUY CRRGGUGGUUUUUUCGU5´R R R R R R R RCUCRUAUAAYYYCRRRAAUAUG GYY Y G R R AGUU U C UACC R R G Y RC CGUAAAYRYYYGACUAYGAGRRR5´CGGCAUCCCCAUUA C CUAUGG ACACGGUGCCGC A R G C U C U G G R AG UUCGUYCCRGAGYYUGYYGGAARGGUUUUCCGUGUCCAG5´RRYGGARGCRRUGARYRYYYYUYAUYUG G GCACYUGRRRYRYGGAGCYAGU R GUGCAACCGRCCRYRRR5´GUUGUAACUAUGUUGCARYAR A C G AGAACCGAGUAUAGUUCAUGGGRU Y ACAUGAAUU G U UUAACURUCCUCUGGAU UCCCGUCCAUGRCAGUCGGUUC5´CUUACUGAGAGCACAAAGUUUCCCGUGCCAACAGGGAGUGUUAUAACGGUUUAUUAGUCUGGAGACGGCAGACUAUCCUCUUCCCGGUCCCCUAUGCCGGGUUUUUUUUAUGUC5´UURGRYUYRCCUGAAUGUGACUAUCACUUCAAACRRYGRGYAACCUCAGUAUCAUCRYRGAGYUAAACCCUCGCCGCCUGACGGYGAGGGUUUUCUUUUGGR5´U G U A A A A A A C A U Y A U U UAGCGUGAYUUUCUAUCAACAGC U A A CAAUUGUUAUUACUGCCUAAYGYUCAUAA G G G U A AUUUUAAAAAAGGG CGAUAAAAAACGAUUG G GGGAUGAGAYAUGAACGCUCA A G C A5´C C C A G A G G U A U U G A UUGGUGAURRCAYYU C URUGYUYAUUYAUURCACCAA C C U G C G C RGAUGCGCAGGUUUUUUUU5´ARRRYYYYYAAURYCAACYUUUAGCGCACGGCUCUYYAAGAGCCAUUYCCCUAGRCCAAACAGGAAUYGUUUGGYCUUUUUUU5´GGGCARGAUAUGUGAAGURGCYACCGCAAGCYGRUACYCUUCACYY Y C CUUA U UCG CUYGCUCAACGGRAUCYUGCUCUG C G A G G C Y5´GUGCRRYCYRAUUYYRGYYGYGCCYRYRARAACAUCAYAARAUACGGCRCRRCCACRAUUUCCCUGGUGUUGGCGCAGUAUUCGCGCACCCCGGUCUACC5´YUUYRYURRUUUYAUCARAYCU GUUUGAURRAAGYUARYGARR Y Y C A Y UAACRGCUYUYGCY GGCY Y GACCCGAGRYYGUUUU U U U5´RACGUUCAYCCYYYRGGRCGCAYRAYCARRYCAYGGAACGGGGRYYUGRR5´sucA Sr aD sxy R N A I P ur ine SA M -C hl cdiG M P 2 A nt i-Q G adY r nk ldr P r fA O m r A -B R yeB t r aJ 2 Sr aH 23Sm et h D S-p epU U C G G C C Y CGCRRCGYUU YUYCGYYGCCC U C U G C A YGCCGUCGCCGACGCAYUCCYAUUCGAA Y Y G UGCGAUCCUGUCGC CYUCCUGCGGCGCGGC5´ CGYRGCGCUUGUUAU UURYYG C UGUGUAG U GUCGUCYYRA R Y Y R G R R Y Y YAAACCCCGCCYUUYGGCGGGGUUUUGC U U U U U5´** CUUACCGGAGGYRUAUGGACCCUGA UCC C ACY C C UCUCCCC GAUGGAGAAUYYYUUUCCGGUAAGCC Y G Y C U Y YRCUGYYUUACCGG UGYGUAAGGCAGUGA C G U Y U5´GGRAGRYRYCUGGU G RYCGGCUUCA AACCGRY GRRGYRYYYYGGYRGG UUCGAYUCCYRYYCUYCC5´ UGACCCUUUA RCCRAGGGUCACC U A G C C A A C U G A C GUUGUUAGUGAAYYYAUGUUCAC ARAUARGCCAAUCGCUUUGCGRUUGGCU U U U U U U U U5´ C U U A A URAACAAGAAAACYAAR C GUACYUUCCY CCUGAGUUCAGGCUGGAAUGCGC ACAG C U RAU U G U U G A U AAG G G CUACUCAUACCGACAAGCCAGUGAAGCGAUGAAUGUCGGUUCCA C5´RUYYRCUGAYGAGUCCCAA AUAGGACGAAA C G CGCGUCYGRAU5´ CUCCAUGUAUCUUUGGGACCUGUCAGCUGUGGCAG UCUCCC UUCCUAGCCAUGGAAG A G C A U A U U C UUGUUUAUUGGCAAAGCUGUCACCAUUURAUUGGUAUCAGA U UCUGACUUGCACAAGUAACAUU C5´ C Y G G U U GGUGGCGCACUUCCYYACGGGCGGUGU RUYACGY R Y U R Y R R Y A G A R R R A Y A C CAGCCCGCYRRRAGCGGGCUUU U U U5´GUCAUACUACGGUGCAAYGYRRAAAGU AAACGAUGACC C YARGAACUCYRGG U AA AAURCRUAUCAAAAUGYAAAAUUGUY U G A C C U G G GRUYYUCCGGGUYRGYUYUUUU5´U R U G C U A A C U R R R A A YGUUGYA URYAACCCUUGRYGCUUAU YCCUUURYCAAGCA U A U U A Y ARCGRUCGYYAA A G G A G A A A U G5´U C R A A A G A A C AUGAAAUGGAGGAGAAAUUACAGCA A U U UAUCARC UGAAAUUAUAGGUGUAGACACAUGUCAGCR G UGGAAACAGUUUC U AUCA A A A UUA A AGUAUUUAGAGAUUUUCCUC AAAUUUCAAA U5´ACAGGGUARGGRYYYYYUURURRRRRYCCUUACCGGRUUUCUCAARUYGGRGYAAAYCCGRUUGRARUAUARAGGARG5´CGYGUUAUAUGCCUUUAUUGUCACARUUYUUUUUYYGYUGRYCAUUGGYAYYAUURAUUYCCAGCRAUAAAYGACAAGCCCGAACRYUGUUCGGGCUUUUUUUURRUYA5´Y Y Y AUGGYGGYGRGGGRRCCUUYG GG YYGCCGGUUCCYYRCCGGU Y U RCCAACCCYYRCYRCCACC Y5´AUGGAYRUGCGCAGGAAGCGCRAAGACARACAGGGACACRYAGGRACCCGGAUGGYGGRRYAGGAUGUCAGGRAACAGUCUGCAAAGCCCCGCYYYGGCGGGGUUUU5´P s-R ho r nk ps M gsens t R N A S Q r r isr C H H 1 SN R 24 T r p ldr gr eA pr eQ 12 H A R 1F T er m L eu M icC C 4 R sm Y R ib osom ePaul Gardner Hunting RNA motifs
  • 8. What is a motif?Escher, MC (1938) Sky and Water 1.Wiktionary definitions for motif:A recurring or dominant element; atheme.For the purposes of this work:an RNA motif is a recurring RNAsequence and/or secondary structurefound within larger structures thatcan be modelled by either acovariance model or a profile HMM.Paul Gardner Hunting RNA motifs
  • 9. What is a motif?The kink-turn RNA motifFound in rRNA, riboswitches, RNase P, snoRNAs, ...An asymmetric internal loopCauses a sharp kink between two flanking helical regionsCruz & Westhof (2011) Sequence-based identification of 3D structural modules in RNA with RMDetect. NatureMethods.Paul Gardner Hunting RNA motifs
  • 10. Can we build a seed alignment and CM resource of thesemotifs?For the hairpin motifs?YESFor the internal loopmotifs?MAYBEFor the sequence motifs?Not certain yetGNRAYGAAR5´k-turn-1RYGGAGYR RRRC RRGARR5´k-turn-2CGAAGYYRYY RRGGGRUGGAG5´Doshi et al. (2004) Evaluation of the suitability of free-energy minimization using nearest-neighbor energy pa-rameters for RNA secondary structure prediction. BMCBioinformatics.Paul Gardner Hunting RNA motifs
  • 11. The RNA Motifs: RMfamANYARAUAGAU YACAUU5´GNRAYGAAR5´T-loopRUR R RY5´UNCGCUU CGG5´U-turnRRGCGUR ARAGCY5´k-turn-1RYGGAGYR RRRC RRGARR5´k-turn-2CGAAGYYRYY RRGGGRUGGAG5´pK-turnA GGRRRYRRYR C YCGYYRYYYC5´sarcin-ricin-1RACCYRRYGAACUGAAR CAUCUCAGUARYRGAGGA R A A5´sarcin-ricin-2YYAGUAG Y RAGGAAR5´tandem-GAG CRRUGARRRRAYAAGARARYRYYYYRGAG5´twist_upCCGAYCCCAU CCGAACUCGG5´UAA_GANRYGRYAAYCRYA YYAGRGAAYC5´C-loopGACAC UGYRY R Y R RRRCARUY5´Csr-RsmRCAGG AGY5´domain-VRAGCRCGRAGY AYGYYRGUUY5´SRP_S_domainRGRAC YYRYCAGGYGR AARAGCAGYRAAY5´vapC_targetA U A UYGRRCRCCRGYCGC RYYG Y C5´terminator1AAAAAGCYRYY RRYGGYUUUUUU Y U Y5´terminator2RRARR YYUUUU U U Y5´TRITR Y Y Y YGCGAGCAGACGCARAACRCCCRRYRRYGGGYGUUYUGCGUCUGCUCGCR R R R5´R R A R G G R G R R Y A U G R R5´R R G G R R R Y A U G A R R5´A A G G R R A U G R5´Paul Gardner Hunting RNA motifs
  • 12. Breaking RulesPrefer motifs found inmultiple families &/ormultiple locations in thesame familyAligning analogousstructuresMotifs are allowed tooverlapModels are non-specificHigh false-positive ratesSequences are not edited,spliced or otherwiseinterfered withSequence sources arediverse (PDB, EMBL &publications)Paul Gardner Hunting RNA motifs
  • 13. An example entry# STOCKHOLM 1.0#=GF ID twist_up#=GF AU Gardner PP#=GF SE Gardner PP#=GF SS Published; PMID:21976732#=GF GA 19.00#=GF DR URL;http://rnafrabase.cs.put.poznan.pl/?act=pdbdetails&id=1S72#=GF RN [1]#=GF RM 21976732#=GF RT Clustering RNA structural motifs in ribosomal RNAs using#=GF RT secondary structural alignment.#=GF RA Zhong C, Zhang S#=GF RL Nucleic Acids Res. 2012;40:1307-17.2zkr_Y/27-46 CCGAUCUCGUC-UGAUCUCGG#=GR 2zkr_Y/27-46 SS <<<<---<_____>--->>>>2gv3_A/3-20 CCAGACUC---CCGAAUCUGG#=GR 2gv3_A/3-20 SS (((((.(.---....))))))3iz9_C/29-48 CGGAUCCCGUC-AGAACUCCG#=GR 3iz9_C/29-48 SS ((((...(...-.)...))))3bbo_B/31-50 CCAA-UCCAUCCCGAACUUGG#=GR 3bbo_B/31-50 SS .(.....<.....>.....).1nkw_9/33-53 CCCACCCCAUGCCGAACUGGG#=GR 1nkw_9/33-53 SS (((...............)))1c2x_C/31-51 CUGACCCCAUGCCGAACUCAG#=GR 1c2x_C/31-51 SS ((((...<.....>...))))1giy_B/33-53 CCGUUCCCAUUCCGAACACGG1S72_9/29-50 CCGUACCCAUCCCGAACACGG#=GR 1S72_9/29-50 SS ((((...(.....)...))))#=GR 1S72_9/29-50 MT ...xxxxx.....xxxxx...#=GC SS_cons ((((...(.....)...))))//Paul Gardner Hunting RNA motifs
  • 14. An example entry# STOCKHOLM 1.0#=GF ID twist_up#=GF AU Gardner PP#=GF SE Gardner PP#=GF SS Published; PMID:21976732#=GF GA 19.00#=GF DR URL;http://rnafrabase.cs.put.poznan.pl/?act=pdbdetails&id=1S72#=GF RN [1]#=GF RM 21976732#=GF RT Clustering RNA structural motifs in ribosomal RNAs using#=GF RT secondary structural alignment.#=GF RA Zhong C, Zhang S#=GF RL Nucleic Acids Res. 2012;40:1307-17.2zkr_Y/27-46 CCGAUCUCGUC-UGAUCUCGG#=GR 2zkr_Y/27-46 SS <<<<---<_____>--->>>>2gv3_A/3-20 CCAGACUC---CCGAAUCUGG#=GR 2gv3_A/3-20 SS (((((.(.---....))))))3iz9_C/29-48 CGGAUCCCGUC-AGAACUCCG#=GR 3iz9_C/29-48 SS ((((...(...-.)...))))3bbo_B/31-50 CCAA-UCCAUCCCGAACUUGG#=GR 3bbo_B/31-50 SS .(.....<.....>.....).1nkw_9/33-53 CCCACCCCAUGCCGAACUGGG#=GR 1nkw_9/33-53 SS (((...............)))1c2x_C/31-51 CUGACCCCAUGCCGAACUCAG#=GR 1c2x_C/31-51 SS ((((...<.....>...))))1giy_B/33-53 CCGUUCCCAUUCCGAACACGG1S72_9/29-50 CCGUACCCAUCCCGAACACGG#=GR 1S72_9/29-50 SS ((((...(.....)...))))#=GR 1S72_9/29-50 MT ...xxxxx.....xxxxx...#=GC SS_cons ((((...(.....)...))))//Paul Gardner Hunting RNA motifs
  • 15. An example annotated Rfam alignment# STOCKHOLM 1.0#=GF ID 5S_rRNA#=GF AC RF00001#...#=GF WK 5S_ribosomal_RNA#=GF SQ 712#=GF MT.G GNRA#=GF MT.s sarcin-ricin-2#=GF MT.t twist_upX62858.1/62-115 GCUGCG-U-UGUGGGUGUGUACUGCGGUUUU-UUG-CUGUGGGAA--GCCCACUUCA-CUGX01588.1/64-115 UCACGU--UAGUGGG---GCCGUGGAUACCGUGAGGAUCC-GCAG-CCCCACU-AAG-CUG#=GR X01588.1/64-115 MT.0 .......................GGGGGGGGGGGGGGGGG.....................X05870.1/363-414 UCACGU--UGGUGGG---GCCGUGGAUACCGUGAGGAUCC-GCAG-CCCCACU-AAG-CUG#=GR X05870.1/363-414 MT.0 .......................GGGGGGGGGGGGGGGGG.....................L27170.1/63-116 CCAGCG--UCCGGCA--AGUACUGGAGUGCGCGAGCCUCUGGGAA-AUCCGGU-UCG-CCG#=GR L27170.1/63-116 MT.0 ..ssss..sssssss..ssssssssssssssssssssssssssss.sssssss.sss.ss.L27163.1/61-115 CCAGCGUUCCGGGGA---GUACUGGAGUGCGCGACCCUCUGGGAA--ACCGGGUUCG-CCG#=GR L27163.1/61-115 SS ...(((((((((.........)))))))))(((((((...((......))))).))).)..#=GR L27163.1/61-115 MT.0 .................................GGGGGGGGGGGG..GGGGGGG.......#=GR L27163.1/61-115 MT.1 ..sssssssssssss...sssssssssssssssssssssssssss..ssssssssss.ss.L27343.1/60-114 CCAGCGAACCAGCUA---GUACUAGAGUGGGAGACCCUCUGGGAG-CGCUGGU-UCG-CCG#=GR L27343.1/60-114 MT.0 .......................GGGGGGGGGGGGGGGGG.....................#=GR L27343.1/60-114 MT.1 ....sssssssssss...sssssssssssssssssssssssssss.sssssss.sss.s..M33891.1/66-117 CCAGCG--CCGGAGA---GUACUGCGC-GGGCAACCGCGUGGGAG--GCGAGG-CCGCUCG#=GR M33891.1/66-117 MT.0 .......................GGGG.GGGGGGGGGGGG.....................X01484.1/62-114 GUCAGG--CCCAGUU--AGUACUGAGGUGGGCGACCACUUGGGAA-CACUGGG-U-G-CUG#=GR X01484.1/62-114 MT.0 ........................GGGGGGGGGGGGGGGG.....................#=GR X01484.1/62-114 MT.1 ...sss..sssssss..ssssssssssssssssssssssssssss.sssssss.s.s.ss.#=GC SS_cons ::::::::::<-<<--------<<-----<<____>>----->>----->>->::::::::#=GC RF cCaGcG..cccggga...GUAcUggGGUggGcgAcCcucUGGgAA.CaCccgg.uCG.CUG//YRCGRCCAUACRGRRCACCGUCCCR U YCGACCGRAGU YAAGCYGG C Y RGUA C UR R YG GGGACYYUGGGAAYRGGYGYYGYR5Paul Gardner Hunting RNA motifs
  • 16. An example annotated Rfam alignment# STOCKHOLM 1.0#=GF ID 5S_rRNA#=GF AC RF00001#...#=GF WK 5S_ribosomal_RNA#=GF SQ 712#=GF MT.G GNRA#=GF MT.s sarcin-ricin-2#=GF MT.t twist_upX62858.1/62-115 GCUGCG-U-UGUGGGUGUGUACUGCGGUUUU-UUG-CUGUGGGAA--GCCCACUUCA-CUGX01588.1/64-115 UCACGU--UAGUGGG---GCCGUGGAUACCGUGAGGAUCC-GCAG-CCCCACU-AAG-CUG#=GR X01588.1/64-115 MT.0 .......................GGGGGGGGGGGGGGGGG.....................X05870.1/363-414 UCACGU--UGGUGGG---GCCGUGGAUACCGUGAGGAUCC-GCAG-CCCCACU-AAG-CUG#=GR X05870.1/363-414 MT.0 .......................GGGGGGGGGGGGGGGGG.....................L27170.1/63-116 CCAGCG--UCCGGCA--AGUACUGGAGUGCGCGAGCCUCUGGGAA-AUCCGGU-UCG-CCG#=GR L27170.1/63-116 MT.0 ..ssss..sssssss..ssssssssssssssssssssssssssss.sssssss.sss.ss.L27163.1/61-115 CCAGCGUUCCGGGGA---GUACUGGAGUGCGCGACCCUCUGGGAA--ACCGGGUUCG-CCG#=GR L27163.1/61-115 SS ...(((((((((.........)))))))))(((((((...((......))))).))).)..#=GR L27163.1/61-115 MT.0 .................................GGGGGGGGGGGG..GGGGGGG.......#=GR L27163.1/61-115 MT.1 ..sssssssssssss...sssssssssssssssssssssssssss..ssssssssss.ss.L27343.1/60-114 CCAGCGAACCAGCUA---GUACUAGAGUGGGAGACCCUCUGGGAG-CGCUGGU-UCG-CCG#=GR L27343.1/60-114 MT.0 .......................GGGGGGGGGGGGGGGGG.....................#=GR L27343.1/60-114 MT.1 ....sssssssssss...sssssssssssssssssssssssssss.sssssss.sss.s..M33891.1/66-117 CCAGCG--CCGGAGA---GUACUGCGC-GGGCAACCGCGUGGGAG--GCGAGG-CCGCUCG#=GR M33891.1/66-117 MT.0 .......................GGGG.GGGGGGGGGGGG.....................X01484.1/62-114 GUCAGG--CCCAGUU--AGUACUGAGGUGGGCGACCACUUGGGAA-CACUGGG-U-G-CUG#=GR X01484.1/62-114 MT.0 ........................GGGGGGGGGGGGGGGG.....................#=GR X01484.1/62-114 MT.1 ...sss..sssssss..ssssssssssssssssssssssssssss.sssssss.s.s.ss.#=GC SS_cons ::::::::::<-<<--------<<-----<<____>>----->>----->>->::::::::#=GC RF cCaGcG..cccggga...GUAcUggGGUggGcgAcCcucUGGgAA.CaCccgg.uCG.CUG//YRCGRCCAUACRGRRCACCGUCCCR U YCGACCGRAGU YAAGCYGG C Y RGUA C UR R YG GGGACYYUGGGAAYRGGYGYYGYR5Paul Gardner Hunting RNA motifs
  • 17. Uses for a motif DB (I): Improving a RNA alignments andstructure prediction constraintsYRCGRCCAUACRGRRCACCGUCCCR U YCGACCGRAGU YAAGCYGG C Y RGUA C UR R YG GGGACYYUGGGAAYRGGYGYYGYR5Original RfammodelSarcin-ricinloopGNRAtetraloopRefined RfammodelYRCGGCCAUACRGRRCACCGUCCCR UYCGACCGAAGUYAAGCYGG C Y R GU A C UR R G G GGACCYUGGGAAYRGGYGYYGYR5Paul Gardner Hunting RNA motifs
  • 18. Uses for a motif DB (II): RNA structural motifsThe Lysine RiboswitchRRGUAGAGGYGCRRYYAAGUAYUYGRRRRY C R R UGAR R R YGA A R GGR R R R U Y G CCGA ARYRRRY YYYYGYU GGYRYRYG AARRRYRRRACUGU C A Y R RYYRUGGAGGCUAYY5´GAGYRRC RGA5´RGRYR ACY5´GRA5´UR5´32% K-turn11% U-turn13% T-loop 21% GNRATetraloopP1P2P3P4P5Paul Gardner Hunting RNA motifs
  • 19. Uses for a motif DB (II): RNA structural motifsThe Lysine RiboswitchPasteurella multocidaBacillus sp.Escherichia coliStaphylococcus epidermidisLactococcus lactis subsp. lactisStaphylococcus haemolyticusStaphylococcus saprophyticusActinobacillus pleuropneumoniaeHaemophilus ducreyiBacillus haloduransVibrio splendidusBacillus subtilisSerratia proteamaculansListeria monocytogenesVibrio sp.Vibrio vulnificusLactobacillus plantarumMannheimia succiniciproducensBacillus cereusKlebsiella pneumoniaeLactococcus lactis subsp. cremorisListeria innocuaHaemophilus influenzaeVibrio parahaemolyticusClostridium botulinumThermoanaerobacter tengcongensisLactobacillus brevisThermoanaerobacter sp.Bacillus licheniformisPectobacterium atrosepticumMannheimia haemolyticaYersinia frederikseniiStaphylococcus aureusHaemophilus somnusListeria welshimeriProteobacteriaFirmicutesPasteurell.VibrionalesEnterobac.ClostridiaLactobacil.BacillalesP2 stemRYGGAGYR RRRC RRGARR5´Kink-turnRRGCGUR ARAGCY5´U-turnP4 stemYGAAR5´GNRAtetrloopRUR R RY5´T-loopP2 P4Paul Gardner Hunting RNA motifs
  • 20. Uses for a motif DB (III): Functionally characterising novelRNAsYYURRUYYYA GRARYCRCAYGGA YG YG R Y A GRYGGCCAY G GAYGGCRYRURRCGGYYCG U YRRYRARYYYRR5YRGG AR5´Csr-Rsm bindingmotifTwoAYGGAY sRNA: 216 homologuesfound in Human gut metagenomesequences, γ-Proteobacteria andClostridiales speciesWeinberg Z et al. (2010)”Comparative genomics reveals 104candidate structured RNAs frombacteria, archaea and theirmetagenomes”. Genome Biol.Paul Gardner Hunting RNA motifs
  • 21. Csr-Rsm backgroundToledo-Arana et al. (2007) Small noncoding RNAs controlling pathogenesis. Current Opinion in Microbiology.Paul Gardner Hunting RNA motifs
  • 22. Uses for a motif DB (IV): Identifying and ranking ncRNAsof interest in transcriptome results0 50 100 150 2000.000.030.06Distribution of motif scores on S. typhi ncRNAsSum of motif scores (bits)Density Known RNAsRNA−seq ncRNAsShuffled0 50 100 150 2000.00.40.8CDF of motif scores on S. typhi ncRNAsSum of motif scores (bits)CDFK−S.test(Known RNA > Shuffled) => P=2.11e−37K−S.test(RNA−seq RNA > Shuffled) => P=1.47e−09Paul Gardner Hunting RNA motifs
  • 23. Uses for a motif DB (IV): Identifying and ranking ncRNAsof interest in transcriptome results0 500 1000 1500 20000.00000.0010Distribution of motif scores on S. typhi ncRNAs alignmentsSum of motif scores (bits)Density Known RNAsRNA−seq ncRNAsShuffled0 500 1000 1500 20000.00.40.8CDF of motif scores on S. typhi ncRNAs alignmentsSum of motif scores (bits)CDFK−S.test(Known RNA > Shuffled) => P=8.15e−06K−S.test(RNA−seq RNA > Shuffled) => P=0.53Paul Gardner Hunting RNA motifs
  • 24. 80 highest scoring matches between RMfam & RfamT−loopTerminatorCsr−RsmDomain−VGNRAtandem−GAk−turnSRP_SUNCGU−turntwist_upsarcin−ricinqqqqqqqqqqqqtmRNAcyano_tmRNAtRNA−SectRNAmascRNA−menRNARNaseP_bact_asuhBHis_leaderL10_leaderL20_leaderPhe_leaderThr_leaderrimPQrrIRES_ApthoCsrBPrrB_RsmZTwoAYGGAYU6Intron_gpIIRNaseP_archRNaseP_bact_bGEMM_RNA_motifalpha_tmRNAIntron_gpIgroup−II−D1D4−7group−II−D1D4−6group−II−D1D4−5group−II−D1D4−3group−II−D1D4−1MOCO_RNA_motifmanAClostridiales−1RtTserCT−boxSNORA73SAMArchaea_SRPBacteria_small_SRPBacteria_large_SRPFungi_SRPMetazoa_SRPPlant_SRPwcaGU3C4HIV_FETermite−leu5S_rRNASSU_rRNA_bacteriaSSU_rRNA_archaeaSSU_rRNA_eukaryaCobalaminFMNEntero_5_CREgroup−II−D1D4−4qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq q q q q qqqqqqqqqqqqPaul Gardner Hunting RNA motifs
  • 25. What about the highly expressed ncRNAs?U U R R C C U U G U G U G Y R G G U C G A G U A C A A A G G A G C A CGGAAGCYCGGRAGGCCAA GGYCGACCGGAA G AGAAGGYUCGAYCUCCCGAYCCGRG CYCCCA G C A C G G RC CG GY A CC C A C G C GGAGCR GCCGCR A YAARGGCARAAGYGU UGYGGGCCUGCGURA U U C G R ARYRRAC GG YCGACGR C C C UUUGGGUGGGGYUGCRRCCRRARGUCGYRRCGCCGAGGCCACCCA CGCAACCCRYYGCACGCUUGGUAACCGRGUCCGUGCURGCGGGCGGYGAYCRUYGGRCGCCGCCCGCUUYRYUR5RYGGAGYR RRRC RRGARR5´Possible K-turnRRARR YYUUUU U U Y5´Possible Rho-independentterminatorPossible Rho-independentterminatorPossible Rho-independentterminatorPossible Rho-independentterminatorPossible Rho-independentterminatorPossible Rho-independentterminatorN_gon_RUF7Y YGCUGRCCRCYGCYUUUAU C CCUUUUGGCRGUGCAGCA A G U A AGACGUUUUCC CGCAAUGUAUU GRCCAUUCAUACUUC CCUURGUAUGRAUGGUUUUUUUURYRRR A A A U G C C G U C U G A A YGUUCAGACGGCAUUURU5´C_dif_RUF9UCAGAAGGCUUGYUUUCUGAY U U U U U U A U U C A A A U C A A A A A Y A R G U A G U U A A Y U U U U A Y C U UAUUCUACUUUCC C CAAAGUAAGAAUA U Y A C Y Y U U RGAURRGCAGCYYAUCU U U U U U U A A A5´C_dif_RUF11UCAGAAGGCUUGYUUUCUGAY U U U U U U A U U C A A A U C A A A A A Y A R G U A G U U A A Y U U U U A Y C U UAUUCUACUUUCC C CAAAGUAAGAAUA U Y A C Y Y U U RGAURRGCAGCYYAUCU U U U U U U U A A R5´M_tub_RUF3Possible Rho-independentterminatorLow Seq. G+CRNA RNA code RNA z M otifs complexity sRNA (gen.)M .tub. RUF3 No RNA (prob=1.00) Terminator,k-turn No 59% (66%)N.gon. RUF7 No RNA (prob=0.94) Terminator No 43% (52%)C.dif. RUF9 ? (P =0.044) RNA (prob=1.00) Terminator No 30% (29%)C.dif. RUF11 ? (P =0.079) RNA (prob=1.00) Terminator Yes (23/ 170nts) 26% (29%)Paul Gardner Hunting RNA motifs
  • 26. What about the highly expressed ncRNAs?Are they antisense to any 5‘ UTRs0 5 10 15 20 25 300.00.40.8RNAup RNA−RNA energies between sRNAs and 5 UTRsnegenergy (−kcal/mol)Freq. M.tub3 (native)M.tub3 (shuffled)N.gon7 (native)N.gon7 (shuffled)C.dif9 (native)C.dif9 (shuffled)C.dif11 (native)C.dif11 (shuffled)0 5 10 15 20 25 300.000.10Difference between native and shuffled CDFsnegenergy (−kcal/mol)∆CDFK−S.test(M.tub3 RNA ~ Shuffled) => P=3.70e−07K−S.test(N.gon7 RNA ~ Shuffled) => P=0.01K−S.test(C.dif9 RNA ~ Shuffled) => P<2.2e−16K−S.test(C.dif11 RNA ~ Shuffled) => P<2.2e−16Paul Gardner Hunting RNA motifs
  • 27. What about the highly expressed ncRNAs?Are they antisense to any 3‘ UTRs0 5 10 15 200.00.40.8RNAup RNA−RNA energies between sRNAs and 3 UTRsnegenergy (−kcal/mol)Freq. M.tub3 (native)M.tub3 (shuffled)N.gon7 (native)N.gon7 (shuffled)C.dif9 (native)C.dif9 (shuffled)C.dif11 (native)C.dif11 (shuffled)0 5 10 15 200.000.10Difference between native and shuffled CDFsnegenergy (−kcal/mol)∆CDFK−S.test(M.tub3 RNA ~ Shuffled) => P=5.13e−09K−S.test(N.gon7 RNA ~ Shuffled) => P=8.21e−06K−S.test(C.dif9 RNA ~ Shuffled) => P<2.2e−16K−S.test(C.dif11 RNA ~ Shuffled) => P<2.2e−16Paul Gardner Hunting RNA motifs
  • 28. Discussion pointsA useful curation tool for building alignments and structuresCan provide testable function hypotheses, sometimesLimited use outside of alignments based on the SalmonellaTyphi resultsEven in alignments, false-positives & negatives are commonMore motifs (Terminators, Shine-Dalgarno, ...)Limit trivial motifs and prevent rediscovery of old motifsSnapshots of the data are available fromhttps://github.com/ppgardne/RMfamCan anyone here do better on MTB3, NGON7, CDIF9 &CDIF11?Paul Gardner Hunting RNA motifs
  • 29. Thanks!Lars Barquist, AlexBateman & Rob KnightPPG is supported by a Rutherford Discovery Fellowship from Government funding, administered by the RoyalSociety of New Zealand.Paul Gardner Hunting RNA motifs
  • 30. RNA motifsANYARAUAGAU YACAUU5´GNRAYGAAR5´UNCGCUU CGG5´T-loopRUR R RY5´U-turnRRGCGUR ARAGCY5´k-turn-1RYGGAGYR RRRC RRGARR5´k-turn-2CGAAGYYRYY RRGGGRUGGAG5´sarcin-ricin-1RACCYRRYGAACUGAAR CAUCUCAGUARYRGAGGA R A A5´sarcin-ricin-2YYAGUAG Y RAGGAAR5´twist_upCCGAYCCCAU CCGAACUCGG5´UAA_GANRYGRYAAYCRYA YYAGRGAAYC5´Csr-RsmRCAGG AGY5´C-loopGACAC UGYRY R Y R RRRCARUY5´domain-VRAGCRCGRAGY AYGYYRGUUY5´terminator1AAAAAGCYRYY RRYGGYUUUUUU Y U Y5´terminator2RRARR YYUUUU U U Y5´TRITR Y Y Y YGCGAGCAGACGCARAACRCCCRRYRRYGGGYGUUYUGCGUCUGCUCGCR R R R5´R R A R G G R G R R Y A U G R R5´R R G G R R R Y A U G A R R5´A A G G R R A U G R5´T−loopTerminatorCsr−RsmDomain−VGNRAtandem−GAk−turnSRP_SUNCGU−turntwist_upsarcin−ricinqqqqqqqqqqqqtmRNAcyano_tmRNAtRNA−SectRNAmascRNA−menRNARNaseP_bact_asuhBHis_leaderL10_leaderL20_leaderPhe_leaderThr_leaderrimPQrrIRES_ApthoCsrBPrrB_RsmZTwoAYGGAYU6Intron_gpIIRNaseP_archRNaseP_bact_bGEMM_RNA_motifalpha_tmRNAIntron_gpIgroup−II−D1D4−7group−II−D1D4−6group−II−D1D4−5group−II−D1D4−3group−II−D1D4−1MOCO_RNA_motifmanAClostridiales−1RtTserCT−boxSNORA73SAMArchaea_SRPBacteria_small_SRPBacteria_large_SRPFungi_SRPMetazoa_SRPPlant_SRPwcaGU3C4HIV_FETermite−leu5S_rRNASSU_rRNA_bacteriaSSU_rRNA_archaeaSSU_rRNA_eukaryaCobalaminFMNEntero_5_CREgroup−II−D1D4−4qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq q q q q qqqqqqqqqqqqPaul Gardner Hunting RNA motifs
  • 31. RNA classificationBackofen et al. (2007) RNAs everywhere: genome-wide annotation of structured RNAs. Journal of ExperimentalZoology.Paul Gardner Hunting RNA motifs