BIOL335: RNA bioinformatics

A brief introduction to RNA folding.

Published in: Science

  1. RNA bioinformatics. Paul Gardner, April 2, 2015.
  2. Main questions: How can we predict RNA structure?
  3. Why do we care about RNA? RNA is important for translation and gene regulation. Two-thirds of the ribosome is RNA, and ribosomal function is preserved even after amino-acid residues are deleted from the active site! Current estimates indicate that the number of ncRNA genes is comparable to the number of protein-coding genes. [Figure: the path from DNA (mDNA, uDNA, rDNA, tDNA) through pre-mRNA and mRNA to nascent and localised protein, via transcription, splicing, translation and transport, with the RNAs and ribonucleoproteins involved: spliceosome, ribosome, tRNA + RNase P, RNase MRP + snoRNP, snoRNP, SRP, tmRNA and RISC (miRNA).]
  4. RNA: why is this stuff interesting? An RNA world was an essential step towards modern protein- and DNA-based life (according to current reasonable models). Which came first, DNA or protein? RNA has catalytic potential (like protein) and carries hereditary information (like DNA). Image by James W. Brown, www.mbio.ncsu.edu/JWB/soup.html
  5. RNA interference. Image lifted from: http://en.wikipedia.org/wiki/RNA_interference
  6. RNA: structure. [Figure: tRNA structure. (A) Primary structure: 5′ GCGGAUUUAGCUCAGDDGGGAGAGCGCCAGACUGAAYA.CUGGAGGUCCUGUGT.CGAUCCACAGAAUUCGCACCA 3′. (B) Secondary structure (cloverleaf), with the acceptor stem, D loop, anticodon loop, variable loop and TΨC loop labelled. (C) Tertiary structure, with the same regions labelled.]
  7. RNA: base-pairing. Canonical (Watson-Crick) base-pairs: C·G and A·U. Non-canonical (wobble) base-pair: G·U. Note: other non-canonical base-pairs do occur, but these are “rare” and are generally re-defined as “tertiary” interactions. Central dogma of structural biology: structure is important for function. Images lifted from: http://en.wikipedia.org/wiki/Base_pair
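A minimal sketch of how these pairing rules can be applied to a predicted structure. It assumes structures are written in dot-bracket notation (matching parentheses mark base-pairs, dots mark unpaired bases), which the slides do not use; the function names are illustrative. Each pair is classified as Watson-Crick, wobble or non-canonical.

```python
from collections import Counter

WATSON_CRICK = {("C", "G"), ("G", "C"), ("A", "U"), ("U", "A")}
WOBBLE = {("G", "U"), ("U", "G")}


def dot_bracket_pairs(structure: str) -> list[tuple[int, int]]:
    """Return the base-pairs (i, j), 0-based with i < j, of a dot-bracket string."""
    stack, pairs = [], []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def classify_pairs(seq: str, structure: str) -> Counter:
    """Count Watson-Crick, wobble and non-canonical pairs in a structure."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    counts = Counter()
    for i, j in dot_bracket_pairs(structure):
        pair = (seq[i], seq[j])
        if pair in WATSON_CRICK:
            counts["Watson-Crick"] += 1
        elif pair in WOBBLE:
            counts["wobble"] += 1
        else:
            counts["non-canonical"] += 1
    return counts


print(classify_pairs("GGGAAAUCU", "(((...)))"))
```

Counting pairs this way is also a quick sanity check that a predicted structure only uses the pair types a folding algorithm is allowed to form.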
  8. RNA: base-pairing. Images lifted from: http://eternawiki.org/wiki/index.php5/Base_Pair
  9. RNA: base-pairing. Frequency of base-pair conformations (rows) for each combination of bases (columns) among rRNA contacts:
      Conformation  C:G     U:A     U:G     G:A     C:A     U:C     A:A     C:C     G:G     U:U     Total
      WC            49.8%   14.4%   0.01%   1.2%    0.1%    0.5%    -       -       -       -       66.1%
      Wb            0.06%   0.06%   7.1%    -       0.2%    -       0.3%    0.5%    0.2%    0.9%    9.6%
      Other         0.8%    5.8%    1.5%    9.4%    2.3%    0.6%    2.6%    0.5%    0.7%    0.3%    24.3%
      Total         50.7%   20.3%   8.7%    10.6%   2.6%    1.0%    2.9%    1.0%    0.9%    1.3%    100.0%
      (WC: Watson-Crick conformation; Wb: wobble conformation.) Just 71.3% of rRNA contacts (49.8% + 14.4% + 7.1%) are canonical or G:U wobble! Lee & Gutell (2004) Diversity of base-pair conformations and their occurrence in rRNA structure and RNA structural motifs. J Mol Biol.
  10. RNA stacking. Laurberg et al. (2008) Structural basis for translation termination on the 70S ribosome. Nature. Image lifted from: http://rna.ucsc.edu/pdbrestraints/index.html
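  11. RNA: number of structures. $A_N$ is the number of possible sequences of length N, and $S_N$ is the number of possible secondary structures of length N:
      $$A_N = 4^N$$
      $$S_0 = S_1 = 1, \qquad S_{N+1} = S_N + \sum_{j=1}^{N-m} S_{j-1}\,S_{N-j}, \qquad S_N \sim 1.8^N$$
      Base N+1 is either unpaired ($S_N$ ways) or paired with base j, which splits the sequence into independent segments of lengths j − 1 and N − j; m is the minimum number of unpaired bases in a hairpin loop. Although $S_N$ grows more slowly than $A_N$, it is still exponential, so enumerating every structure is infeasible for all but very short sequences. Hofacker et al. (1998) Combinatorics of RNA Secondary Structures. Discrete Applied Mathematics.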
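A minimal sketch of the counting recursion for the number of secondary structures. It adds a parameter m, the minimum number of unpaired bases enclosed by a hairpin (m = 3 by default, an assumption; with m = 0 the counts are the Motzkin numbers). The function name is illustrative; Python's arbitrary-precision integers keep the counts exact.

```python
def count_structures(n_max: int, m: int = 3) -> list[int]:
    """Return [S_0, S_1, ..., S_n_max] using
    S_0 = S_1 = 1 and S_{N+1} = S_N + sum_{j=1}^{N-m} S_{j-1} * S_{N-j}."""
    S = [1] * (n_max + 1)  # S_0 = S_1 = 1; higher entries are overwritten below
    for N in range(1, n_max):
        # Base N+1 is unpaired (S_N) or paired with base j, which leaves
        # independent segments of lengths j-1 (outside) and N-j (inside).
        # The sum is empty while N - m < 1, so short sequences keep S = 1.
        S[N + 1] = S[N] + sum(S[j - 1] * S[N - j] for j in range(1, N - m + 1))
    return S


print(count_structures(10))  # [1, 1, 1, 1, 1, 2, 4, 8, 16, 32, 65]
```

For comparison, `4 ** N` gives $A_N$; evaluating both for a few hundred nucleotides shows that both are far too large to enumerate.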
  12. How can we make a secondary structure prediction algorithm? Maximize the number of base-pairs in an RNA sequence? Nussinov et al. (1978) Algorithms for loop matching. SIAM J. Appl. Math.
  13. Structure prediction: Nussinov. Nussinov et al. (1978) Algorithms for loop matching. SIAM J. Appl. Math. Image from: Eddy SR (2004) How do RNA folding algorithms work? Nature Biotechnology.
  14. Structure prediction: Nussinov. Maximize the number of base-pairs in an RNA sequence Seq = $s_1 s_2 \cdots s_n$:
      $$N_{i,j} = 0 \quad \forall\; j - i < 3$$
      $$N_{i,j} = \max \begin{cases} N_{i+1,j-1} + \rho(i,j) & i, j \text{ pair} \\ N_{i+1,j} & i \text{ unpaired} \\ N_{i,j-1} & j \text{ unpaired} \\ \max_{i<k<j} \left[ N_{i,k} + N_{k+1,j} \right] & \text{bifurcation} \end{cases}$$
      where $\rho(i,j) = 1$ if $s_i$ and $s_j$ are complementary, otherwise $\rho(i,j) = 0$. The maximum number of base-pairs is $BP_{max} = N_{1,n}$. The algorithm is $O(n^3)$ in CPU time and $O(n^2)$ in memory. Nussinov et al. (1978) Algorithms for loop matching. SIAM J. Appl. Math.
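A minimal Python sketch of the Nussinov recursion, plus a traceback that recovers one optimal structure in dot-bracket notation (the traceback and the dot-bracket output are additions, not from the slide). As on the slide, cells with j − i < 3 are zero; "complementary" is assumed to mean Watson-Crick or G·U wobble, and indices are 0-based. All function names are illustrative.

```python
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def rho(a: str, b: str) -> int:
    """1 if the two bases can pair (Watson-Crick or G-U wobble), else 0."""
    return 1 if (a, b) in PAIRS else 0


def nussinov_fill(seq: str, min_span: int = 3) -> list[list[int]]:
    """N[i][j] = maximum number of base-pairs in seq[i..j]."""
    n = len(seq)
    N = [[0] * n for _ in range(n)]  # N[i][j] = 0 whenever j - i < min_span
    for span in range(min_span, n):  # O(n^2) cells ...
        for i in range(n - span):
            j = i + span
            best = max(
                N[i + 1][j - 1] + rho(seq[i], seq[j]),  # i, j pair
                N[i + 1][j],  # i unpaired
                N[i][j - 1],  # j unpaired
            )
            for k in range(i + 1, j):  # ... times O(n) bifurcation points
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best
    return N


def nussinov_traceback(seq: str, N: list[list[int]], min_span: int = 3) -> str:
    """Recover one optimal structure (the optimum is frequently not unique)."""
    structure = ["."] * len(seq)
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if j - i < min_span:
            continue
        if N[i][j] == N[i + 1][j]:
            stack.append((i + 1, j))
        elif N[i][j] == N[i][j - 1]:
            stack.append((i, j - 1))
        elif rho(seq[i], seq[j]) == 1 and N[i][j] == N[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if N[i][j] == N[i][k] + N[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return "".join(structure)


def nussinov(seq: str) -> tuple[int, str]:
    """Return (BP_max, one optimal dot-bracket structure)."""
    N = nussinov_fill(seq)
    bp_max = N[0][-1] if seq else 0
    return bp_max, nussinov_traceback(seq, N)


print(nussinov("GGGAAACCC"))  # (3, '(((...)))')
```

The traceback picks the first case that reproduces each cell's value, so when several structures share the maximum it returns just one of them.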
  15. Structure prediction: Nussinov. There are a few problems with this approach. The solution to Nussinov is frequently not unique: for example, the 77-nucleotide tRNA-His has 22 base-pairs in its phylogenetic structure, yet there are 149,126 structures with the maximal number of 26 base-pairs! The method also ignores stacking interactions. Fontana (2002) Modelling ‘evo-devo’ with RNA. BioEssays.
  16. Structure prediction: Zuker. Nearest neighbour model: a modified Nussinov algorithm that finds minimum free energy (most stable) structures. [Figure: a hairpin in which three base-pair stacks (S1, S2, S3) close a loop L.] Free energy = L + three stacking terms = 5.00 − 2.11 − 2.35 − 2.24 = −1.70 kcal/mol, where
      $$\Delta G_{\text{stack}} = \Delta H_{37,\text{stack}} - T\,\Delta S_{37,\text{stack}}$$
      $$\Delta G_{\text{loop}} = -T\,\Delta S_{37,\text{loop}}$$
      Tinoco et al. (1971) Estimation of secondary structure in RNA. Nature.
  17. Structure prediction: Zuker. Energies (ΔG in kcal/mol) of stacked base-pairs 5′ WX 3′ / 3′ YZ 5′; rows give the pair W·Y and columns the pair X·Z:
      W·Y \ X·Z   CG     GC     AU     UA     GU     UG
      CG          -3.26  -2.36  -2.11  -2.08  -1.41  -2.11
      GC          -3.42  -3.26  -2.35  -2.24  -1.53  -2.51
      AU          -2.24  -2.08  -0.93  -1.10  -0.55  -1.36
      UA          -2.35  -2.11  -1.33  -0.93  -1.00  -1.27
      GU          -2.51  -2.11  -1.27  -1.36  +0.47  +1.29
      UG          -1.53  -1.41  -1.00  -0.55  +0.30  +0.47
      Note that ΔG of a 5′ WX 3′ / 3′ YZ 5′ stack is the same as that of a 5′ ZY 3′ / 3′ XW 5′ stack. Mathews et al. (1999) Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. JMB.
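A minimal sketch of how the nearest-neighbour model uses this stacking table: the energies are stored in a dictionary keyed by the pair W·Y (row) and the pair X·Z (column) of the stack 5′ WX 3′ / 3′ YZ 5′, and a hairpin's free energy is a loop term plus one stacking term per pair of adjacent base-pairs. Only stacking and a single loop penalty are modelled (real energy models add many more terms). The example stem, with pairs G·C, U·A, C·G, A·U from the outside in, is an assumption: one stem whose three stacks give the −2.24, −2.35 and −2.11 kcal/mol terms of the Zuker worked example, with the same loop penalty of 5.00 kcal/mol.

```python
COLUMNS = ["CG", "GC", "AU", "UA", "GU", "UG"]  # pair X·Z
ROWS = {  # pair W·Y -> stacking free energy (kcal/mol) for each X·Z column
    "CG": [-3.26, -2.36, -2.11, -2.08, -1.41, -2.11],
    "GC": [-3.42, -3.26, -2.35, -2.24, -1.53, -2.51],
    "AU": [-2.24, -2.08, -0.93, -1.10, -0.55, -1.36],
    "UA": [-2.35, -2.11, -1.33, -0.93, -1.00, -1.27],
    "GU": [-2.51, -2.11, -1.27, -1.36, +0.47, +1.29],
    "UG": [-1.53, -1.41, -1.00, -0.55, +0.30, +0.47],
}
# STACK[(WY, XZ)] = ΔG of the stack 5' WX 3' / 3' YZ 5'
STACK = {(wy, xz): dg for wy, row in ROWS.items() for xz, dg in zip(COLUMNS, row)}

# The slide's symmetry: 5' WX 3'/3' YZ 5' equals 5' ZY 3'/3' XW 5'.
assert all(STACK[(xz[::-1], wy[::-1])] == dg for (wy, xz), dg in STACK.items())


def stem_loop_energy(stem: list[str], loop_penalty: float) -> float:
    """Loop penalty plus one stacking term per pair of adjacent base-pairs.
    Each entry of `stem` is a pair written as 5' base then its 3' partner,
    ordered from the outermost pair inwards (hypothetical input format)."""
    stacks = sum(STACK[(outer, inner)] for outer, inner in zip(stem, stem[1:]))
    return loop_penalty + stacks


dg = stem_loop_energy(["GC", "UA", "CG", "AU"], loop_penalty=5.00)
print(f"{dg:.2f} kcal/mol")  # -1.70 kcal/mol
```

The Zuker algorithm minimises a sum of such terms over all structures with a Nussinov-style dynamic programme, instead of evaluating one fixed structure as done here.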
  18. Suboptimal structures. “There is an embarrassing abundance of structures having a free energy near that of the optimum.” (McCaskill 1990) [Figure: free energy ΔG (kcal/mol) of suboptimal structures of a tRNA plotted against their base-pair distance d_BP(S_i, S_mfe) from the MFE structure; the biological, a suboptimal and the MFE structures are drawn.] Wuchty et al. (1999) Complete suboptimal folding of RNA and the stability of secondary structures. Biopolymers.
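The base-pair distance $d_{BP}(S_i, S_{mfe})$ counts the base-pairs that occur in exactly one of the two structures. A minimal sketch, assuming both structures are well-formed dot-bracket strings of the same length (the function names are illustrative):

```python
def pair_set(structure: str) -> set[tuple[int, int]]:
    """Base-pairs (i, j), 0-based, encoded by a well-formed dot-bracket string."""
    stack, pairs = [], set()
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.add((stack.pop(), j))
    return pairs


def bp_distance(s1: str, s2: str) -> int:
    """Number of base-pairs present in exactly one of the two structures."""
    if len(s1) != len(s2):
        raise ValueError("structures must have the same length")
    return len(pair_set(s1) ^ pair_set(s2))  # symmetric difference


print(bp_distance("((((...))))", "(((.....)))"))  # 1
```

Small energy differences can hide large structural differences: shifting a helix by a few positions already costs one unit of distance for every pair it contains.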
  19. Accuracy of MFE predictions. Non-independent benchmarks: Walter et al. (1994), mean sensitivity 63.6%; Mathews et al. (1999), mean sensitivity 72.9%. Independent benchmarks: Doshi et al. (2004), mean sensitivity 41%; Dowell & Eddy (2004), mean sensitivity 56%, mean PPV 48%; Gardner & Giegerich (2004), mean sensitivity 56%, mean PPV 46%. Data-sets: tRNA, SSU rRNA, LSU rRNA, SRP, RNase P, tmRNA.
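Sensitivity and PPV compare predicted base-pairs with a reference structure: sensitivity = TP / (TP + FN) is the fraction of reference pairs that were predicted, and PPV = TP / (TP + FP) is the fraction of predicted pairs that are correct. A minimal sketch that counts only exact pair matches (an assumption; benchmarks differ in such details), with dot-bracket input and illustrative function names:

```python
def pair_set(structure: str) -> set[tuple[int, int]]:
    """Base-pairs (i, j), 0-based, encoded by a well-formed dot-bracket string."""
    stack, pairs = [], set()
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.add((stack.pop(), j))
    return pairs


def sensitivity_ppv(reference: str, predicted: str) -> tuple[float, float]:
    """Return (sensitivity, PPV) of a predicted structure against a reference."""
    ref, pred = pair_set(reference), pair_set(predicted)
    tp = len(ref & pred)  # correctly predicted pairs
    fn = len(ref - pred)  # reference pairs that were missed
    fp = len(pred - ref)  # predicted pairs absent from the reference
    sensitivity = tp / (tp + fn) if ref else 0.0
    ppv = tp / (tp + fp) if pred else 0.0
    return sensitivity, ppv


print(sensitivity_ppv("((((...))))", "((.(...).))"))  # (0.75, 1.0)
```

Reporting both numbers matters: predicting very few pairs can give a high PPV but a low sensitivity, and vice versa.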
  20. Limitations of MFE predictions. Energy parameters: estimated at constant salt concentrations and temperatures. Energy model: models of loop energies are extrapolated from relatively few experiments; no pseudoknots; ... Cellular environment: contains proteins, RNAs, DNAs, sugars, etc. Post-transcriptional modifications: many functional RNAs have been covalently modified. Folding kinetics: RNAs fold along “pathways”, perhaps becoming trapped in sub-optimal conformations. Co-transcriptional folding: RNAs fold during transcription, and the transcriptional apparatus occludes the 3′ portion of the sequence. Transcription is jerky: transcriptional pausing can influence folding.
  21. Comparative sequence analysis. Input: a set of sequences with the same biological function, which are assumed to have approximately the same structure. Output: the common structural elements, the aligned sequences and a phylogeny that best explains the observed data. [Figure: five input sequences, their phylogeny and their alignment.]
      >1
      GCAUCCAUGGCUGAAUGGUUAAAGCGCCCAACUCAUAAUUGGCGAACUCGCGGGUUCAAUUCCUGCUGGAUGCA
      >2
      GCAUUGGUGGUUCAGUGGUAGAAUUCUCGCCUGCCACGCGGGAGGCCCGGGUUCGAUUCCCGGCCAAUGCA
      >3
      UGGGCUAUGGUGUAAUUGGCAGCACGACUGAUUCUGGUUCAGUUAGUCUAGGUUCGAGUCCUGGUAGCCCAG
      >4
      GAAGAUCGUCGUCUCCGGUGAGGCGGCUGGACUUCAAAUCCAGUUGGGGCCGCCAGCGGUCCCGGGCAGGUUCGACUCCUGUGAUCUUCCG
      >5
      CUAAAUAUAUUUCAAUGGUUAGCAAAAUACGCUUGUGGUGCGUUAAAUCUAAGUUCGAUUCUUAGUAUUUACC
      Alignment:
      1 GCAUCCAUGGCUGAAU-GGUU-AAAGCGCCCAACUCAUAAUUGGCGAA--
      2 GCAUUGGUGGUUCAGU-GGU--AGAAUUCUCGCCUGCCACGCGG-GAG--
      3 UGGGCUAUGGUGUAAUUGGC--AGCACGACUGAUUCUGGUUCAG-UUA--
      4 GAAGAUCGUCGUCUCC-GGUG-AGGCGGCUGGACUUCAAAUCCA-GU-UG
      5 CUAAAUAUAUUUCAAU-GGUUAGCAAAAUACGCUUGUGGUGCGU-UAA--

      1 ------------------CUCGCGGGUUCAAUUCCUGCUGGAUGC-A
      2 ------------------G-CCCGGGUUCGAUUCCCGGCCAAUGC-A
      3 ------------------G-UCUAGGUUCGAGUCCUGGUAGCCCA-G
      4 GGGCCGCCAGCGGUCCCG--GGCAGGUUCGACUCCUGUGAUCUUCCG
      5 ------------------A-UCUAAGUUCGAUUCUUAGUAUUUAC-C
  22. Comparative sequence analysis. Evolution of RNA sequences: base-pairs that covary have strong evolutionary support. [Figure: four sequences evolved from the ancestral sequence UACAAGAGUGCGUUUAAGUA, all folding into the same structure, (((..(((....)))..))):
      UACAAGAGUGCGCUUAAGUA
      UGCAAAAGUCCGUUUAAGCA
      UAUAACCUUUCGAGGAAAUA
      CAUAAUAAUGCGUUGAAGUG
      The figure also shows the consensus sequence YRYAASMGUSCGYKKAAGYR and the MIS YAUAANADUGCGUUGAAGUR.]
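A minimal sketch of the check behind this figure: for each base-pair of the shared structure, test whether every sequence can form it (Watson-Crick or G·U) and whether both of its columns vary, a crude flag for covariation. The four sequences and the structure are taken from the figure; the helper names are illustrative.

```python
PAIRABLE = {"AU", "UA", "CG", "GC", "GU", "UG"}  # Watson-Crick and G·U wobble


def pair_set(structure: str) -> set[tuple[int, int]]:
    """Base-pairs (i, j), 0-based, encoded by a well-formed dot-bracket string."""
    stack, pairs = [], set()
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.add((stack.pop(), j))
    return pairs


def covariation_report(seqs: list[str], structure: str) -> list:
    """For each base-pair (i, j): observed pair types, whether every sequence
    can form the pair, and whether both columns vary across sequences."""
    report = []
    for i, j in sorted(pair_set(structure)):
        observed = [s[i] + s[j] for s in seqs]
        compatible = all(p in PAIRABLE for p in observed)
        both_vary = len({p[0] for p in observed}) > 1 and len({p[1] for p in observed}) > 1
        report.append(((i, j), sorted(set(observed)), compatible, both_vary))
    return report


SEQS = [
    "UACAAGAGUGCGCUUAAGUA",
    "UGCAAAAGUCCGUUUAAGCA",
    "UAUAACCUUUCGAGGAAAUA",
    "CAUAAUAAUGCGUUGAAGUG",
]
STRUCTURE = "(((..(((....)))..)))"

for (i, j), types, compatible, both_vary in covariation_report(SEQS, STRUCTURE):
    print(f"{i + 1}-{j + 1}: {types} compatible={compatible} both columns vary={both_vary}")
```

Every pair in this example is formable in all four sequences even though both of its columns change, which is exactly the compensatory-mutation signal that supports the shared structure.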
  23. Alignment Folding: RNAalifold. Generate an alignment (e.g. with ClustalW), then find a consensus structure that is both energetically stable in all sequences and has covariation support. [Figure: aligned tRNA sequences (e.g. GCGGAAUUAGCUCAGUU_GGGAGAGCGCCAGACUGAAAAUCUGGAGGUCCCC_GGUUCGAAUCCCGGAAUCCGCA) drawn in their secondary structures, and the resulting consensus structure.] Hofacker et al. (2002) Secondary Structure Prediction for Aligned RNA Sequences. J. Mol. Biol.
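  24. Alignment Folding: RNAalifold. RNAalifold: energy + covariation.
      $$\beta_{i,j} = \frac{1}{N}\sum_{\alpha=1}^{N} Z^{\alpha}_{i,j} - C_{i,j}$$
      $$C_{i,j} = \frac{2}{N(N-1)} \sum_{\alpha<\beta} d_H\!\left(b^{\alpha}_i b^{\alpha}_j,\; b^{\beta}_i b^{\beta}_j\right) \Pi^{\alpha}_{ij}\,\Pi^{\beta}_{ij}$$
      where N is the number of sequences, $b^{\alpha}_i$ is the base of sequence α in column i, $Z^{\alpha}_{i,j}$ is the energy term of pair (i, j) in sequence α, $d_H$ is the Hamming distance between two base-pairs, and $\Pi^{\alpha}_{ij} = 1$ if sequence α can form the pair (i, j), otherwise 0. Hofacker et al. (2002) Secondary Structure Prediction for Aligned RNA Sequences. J. Mol. Biol.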
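A minimal sketch of the covariation term $C_{i,j}$ on this slide: the average, over all pairs of sequences α < β, of the Hamming distance between their base-pairs at columns (i, j), where a sequence contributes only if it can form the pair ($\Pi^{\alpha}_{ij} = 1$ for Watson-Crick or G·U, an assumption). This illustrates the formula only; it is not the RNAalifold implementation, which differs in details. The example reuses the four sequences of the covariation figure; names are illustrative and columns are 0-based.

```python
from itertools import combinations

PAIRABLE = {"AU", "UA", "CG", "GC", "GU", "UG"}  # Watson-Crick and G·U wobble


def covariation(seqs: list[str], i: int, j: int) -> float:
    """C_ij = 2 / (N(N-1)) * sum over alpha < beta of
    d_H(pair_alpha, pair_beta) * Pi_alpha * Pi_beta."""
    n = len(seqs)
    if n < 2:
        return 0.0
    pairs = [s[i] + s[j] for s in seqs]
    total = 0
    for a, b in combinations(pairs, 2):  # the N(N-1)/2 sequence pairs
        d_h = sum(x != y for x, y in zip(a, b))  # 0, 1 or 2 substitutions
        total += d_h * (a in PAIRABLE) * (b in PAIRABLE)  # Pi terms
    return 2 * total / (n * (n - 1))


SEQS = [
    "UACAAGAGUGCGCUUAAGUA",
    "UGCAAAAGUCCGUUUAAGCA",
    "UAUAACCUUUCGAGGAAAUA",
    "CAUAAUAAUGCGUUGAAGUG",
]
print(covariation(SEQS, 0, 19))  # 1.0
```

A column pair that never changes scores 0, and consistent double substitutions (e.g. U·A to C·G) score the maximum of 2 per sequence pair, so subtracting $C_{i,j}$ rewards pairs with compensatory mutations.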
  25. Covariation metrics. Lindgreen, Gardner & Krogh (2006) Measuring covariation in RNA alignments: physical realism improves information measures. Bioinformatics.
  26. Rfam: annotation hierarchy. Levels: Types, Clans, Families, Sequences. [Figure: the Rfam types, including Gene, Cis-reg. and Intron, and finer types such as snRNA, snoRNA, CD-box_snoRNA, HACA-box_snoRNA, scaRNA, splicing, rRNA, tRNA, miRNA, sRNA, antisense, ribozyme, CRISPR, riboswitch, thermoregulator, leader, IRES and frameshift_element.]
  27. Building an Rfam family. [Figure: a structure from the literature, and an Rfam family produced manually from the publication figures.]
  28. An example Rfam entry.
  29. Relevant reading. Reviews: Eddy SR (2004) How do RNA folding algorithms work? Nature Biotechnology. Methods: Hofacker et al. (2002) Secondary Structure Prediction for Aligned RNA Sequences. J. Mol. Biol.
  30. The End.