Energy Landscapes and Dynamics of Biopolymers

425 views

Published on

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
425
On SlideShare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
16
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Energy Landscapes and Dynamics of Biopolymers

  1. 1. Energy Landscapes and Dynamics of Biopolymers Michael Thomas Wolfinger University of Vienna 5 March 2012
  2. 2. Outline 1 Biopolymer structure 2 Energy landscapes 3 Folding kinetics 4 RNA refolding 5 Summary
  3. 3. The RNA model AA GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA G G U A CAU GC AU CG CG AG ✲ GG A G G ✲ G GAGC U U CUCG C UGA A C UU AG U A G UA C U AU A G UU C GU C C GC U AG CG GC A C C A ´´´´´´´ ºº ´´´´ ºººººººº µµµµ º ´´´´´ ººººººº µµµµµ ººººº ´´´´´ ººººººº µµµµµ µµµµµµµ ºººº A secondary structure is a list of base pairs that fulfills two constraints: A base may participate in at most one base pair. Base pairs must not cross, i.e., no two pairs (i, j) and (k, l) may have i < k < j < l. (no pseudo-knots) The optimal as well as the suboptimal structures can be computed recursively.
  4. 4. The HP-model F(l) L(l) H In this simplified model, a conformation is a F(f) R(r) self-avoiding walk (SAW) on a given lattice in 2 P or 3 dimensions. Each bond is a straight line, F(f) R(b) L(f) bond angles have a few discrete values. The 20 L(r) letter alphabet of amino acids (monomers) is reduced to a two letter alphabet, namely H and FRRLLFLF P. H represents hydrophobic monomers, P represents hydrophilic or polar monomers. L U L Advantages: B F B F lattice-independent folding algorithms R R D P F simple energy function W M Z hydrophobicity can be reasonably modeled Q Q L F R X B R N Y B P
  5. 5. Energy functions RNA Lattice Proteins The standard energy model expresses the free The energy function for a sequence with energy of a secondary structure S as the sum n residues S = s1 s2 . . . sn with si ∈ A = of the energies of its loops l {a1 , a2 , . . . , ab }, the alphabet of b residues, and an overall configuration x = (x1 , x2 , . . . , xn ) on a lattice L can be written as the sum of pair E (S) = ∑ E (l) l∈S potentials E (S, x) = ∑ Ψ[si , sj ]. i < j −1 |xi − xj | = 1 H P N X i j H P H −4 0 0 0 H −1 0 P 0 1 −1 0 N P 0 0 N 0 −1 1 0 1 G A U X 0 0 0 0 A G U U GC U A UG G A UUC C G U C G A U C UUG U U G A UCCC C G U A G G U GU GC A G G G AU C G G G G C A U U UAC G U GG E = −17.5kcal/mol E = −16
  6. 6. Folding landscape - energy landscape The energy landscape of a biopolymer molecule is a complex surface of the (free) energy versus the conformational degrees of freedom. Number of RNA secondary structures dim Lattice Type µ γ 3 SQ 2.63820 1.34275 cn ∼ 1.86n · n− 2 2 TRI 4.15076 1.343 dynamic programming algorithms available HEX 1.84777 1.345 SC 4.68391 1.161 Number of LP structures 3 BCC 6.53036 1.161 cn ∼ µ n · nγ −1 FCC 10.0364 1.162 problem is NP-hard Formally, three things are needed to construct an energy landscape: A set X of configurations an energy function f : X → R a symmetric neighborhood relation N : X × X
  7. 7. Folding landscape - energy landscape The energy landscape of a biopolymer molecule is a complex surface of the (free) energy versus the conformational degrees of freedom. Number of RNA secondary structures dim Lattice Type µ γ 3 SQ 2.63820 1.34275 cn ∼ 1.86n · n− 2 2 TRI 4.15076 1.343 dynamic programming algorithms available HEX 1.84777 1.345 SC 4.68391 1.161 Number of LP structures 3 BCC 6.53036 1.161 cn ∼ µ n · nγ −1 FCC 10.0364 1.162 problem is NP-hard Formally, three things are needed to construct an energy landscape: A set X of configurations an energy function f : X → R a symmetric neighborhood relation N : X × X
  8. 8. Folding landscape - energy landscape The energy landscape of a biopolymer molecule is a complex surface of the (free) energy versus the conformational degrees of freedom. Number of RNA secondary structures dim Lattice Type µ γ 3 SQ 2.63820 1.34275 cn ∼ 1.86n · n− 2 2 TRI 4.15076 1.343 dynamic programming algorithms available HEX 1.84777 1.345 SC 4.68391 1.161 Number of LP structures 3 BCC 6.53036 1.161 cn ∼ µ n · nγ −1 FCC 10.0364 1.162 problem is NP-hard Formally, three things are needed to construct an energy landscape: A set X of configurations an energy function f : X → R a symmetric neighborhood relation N : X × X
  9. 9. Folding landscape - energy landscape The energy landscape of a biopolymer molecule is a complex surface of the (free) energy versus the conformational degrees of freedom. Number of RNA secondary structures dim Lattice Type µ γ 3 SQ 2.63820 1.34275 cn ∼ 1.86n · n− 2 2 TRI 4.15076 1.343 dynamic programming algorithms available HEX 1.84777 1.345 SC 4.68391 1.161 Number of LP structures 3 BCC 6.53036 1.161 cn ∼ µ n · nγ −1 FCC 10.0364 1.162 problem is NP-hard Formally, three things are needed to construct an energy landscape: A set X of configurations an energy function f : X → R a symmetric neighborhood relation N : X × X
  10. 10. The move set o o o o o o o o o o o o o o o o o FRLLRLFRR FRLLFLFRR o o o o o FUDLUDLU FURLUDLU For each move there must be an inverse move Resulting structure must be in X Move set must be ergodic
  11. 11. Energy barriers and barrier trees Some topological definitions: A structure is a d local minimum if its energy is lower c b than the energy of all neighbors 5 a local maximum if its energy is higher 4 3 than the energy of all neighbors 2 saddle point if there are at least two E local minima thar can be reached by a 1* downhill walk starting at this point We further define a walk between two conformations x and y as a list of conformations x = x1 . . . xm+1 = y such that ∀1 ≤ i ≤ m : N(xi , xi+1 ) the lower part of the energy landscape (written as X ≤η ) as all conformations x such that E (S, x) ≤ η (with a predefined threshold η ). C. Flamm, I. L. Hofacker, P. F. Stadler, and M. T. Wolfinger. Barrier trees of degenerate landscapes. Z. Phys. Chem., 216:155–173, 2002.
  12. 12. Energy barriers and barrier trees Some topological definitions: A structure is a d local minimum if its energy is lower c b than the energy of all neighbors 5 a local maximum if its energy is higher 4 3 than the energy of all neighbors 2 saddle point if there are at least two E local minima thar can be reached by a 1* downhill walk starting at this point We further define a walk between two conformations x and y as a list of conformations x = x1 . . . xm+1 = y such that ∀1 ≤ i ≤ m : N(xi , xi+1 ) the lower part of the energy landscape (written as X ≤η ) as all conformations x such that E (S, x) ≤ η (with a predefined threshold η ). C. Flamm, I. L. Hofacker, P. F. Stadler, and M. T. Wolfinger. Barrier trees of degenerate landscapes. Z. Phys. Chem., 216:155–173, 2002.
  13. 13. The lower part of the energy landscape Two conformations x and y are mutually accessible at the level η (written as x η y ) if there is a walk from x to y such that all conformations z in the walk satisfy E (S, z) ≤ η . The saddle height ˆ f (x, y ) of x and y is defined by f (x, y ) = min{η | x ˆ η y} ≤η Given the set of all local minima Xmin below threshold η , the lower energy part X ≤η of the energy landscape can alternatively be written as ≤η ˆ X ≤η = {y | ∃x ∈ Xmin : f (x, y ) ≤ η } Given a restricted set of low-energy conformations, Xinit , and a reasonable value for η , the lower part of the energy landscape can be calculated. M. T. Wolfinger, S. Will, I. L. Hofacker, R. Backofen, and P. F. Stadler. Exploring the lower part of discrete polymer model energy landscapes. Europhys. Lett., 2006.
  14. 14. The Flooder approach E 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 Ep 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 D 11111111111111111111 00000000000000000000 D 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 C 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 C 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 B 11111111111111111111 00000000000000000000 B 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 A A
  15. 15. The concept of Barriers
  16. 16. The algorithm of Barriers Barriers Require: all suboptimal secondary structures within a certain energy range from mfe 1: B ⇐ 0 / 2: for all x ∈ subopt do 3: K ⇐0 / 4: N ⇐ generate neighbors(x) 5: for all y ∈ N do 6: if b ⇐ lookup hash(y ) then 7: K ⇐ K ∪b 8: end if 9: end for 10: if K = 0 then / 11: B ⇐ B ∪ {x} 12: end if 13: if |K | ≥ 2 then 14: merge basins(K ) 15: end if 16: write hash(x) 17: end for
  17. 17. The flooding algorithm 1
  18. 18. The flooding algorithm 1 2
  19. 19. The flooding algorithm 1 2 3
  20. 20. The flooding algorithm 1 2 3 4
  21. 21. Barrier tree example
  22. 22. Information from the barrier trees Local minima Saddle points Barrier heights Gradient basins Partition functions and free energies of (gradient) basins A gradient basin is the set of all initial points from which a gradient walk (steepest descent) ends in the same local minimum.
  23. 23. Folding kinetics Biomolecules may have kinetic traps which prevent them from reaching equilibrium within the lifetime of the molecule. Long molecules are often trapped in such meta-stable states during transcription. Possible solutions are Stochastic folding simulations (predict folding pathways) Predicting structures for growing fragments of the sequence Analysis of the energy landscape based on complete suboptimal folding
  24. 24. Biopolymer dynamics The probability distribution P of structures as a function of time is ruled by a set of forward equations, also known as the master equation dPt (x) = ∑ [Pt (y )kxy − Pt (x)kyx ] dt y =x Given an initial population distribution, how does the system evolve in time? (What is the population distribution after n time-steps?) d Pt = UPt =⇒ Pt = e tU P0 dt
  25. 25. Kinfold: A stochastic kinetic foling algorithm Simulate folding kinetics by a rejection-less Monte-Carlo type algorithm: Generate all neighbors using the move-set Assign rates to each move, e.g. P1 P2 P8 P3 P7 P4 ∆E P6 P5 Pi = min 1, exp − kT Select a move with probability proportional to its rate Advance clock 1/ ∑i Pi .
  26. 26. Treekin: Barrier tree kinetics Local minima Gradient basins Saddle points Partition functions Barrier heights Free energies of (gradient) basins With this information, a reduced dynamics can be formulated as a Markov process by means of macrostates (i.e. basins in the barrier tree) and Arrhenius-like transition rates between them. 1 d Pt = UPt =⇒ Pt = e tU P0 0.8 ✟✟ ✙ ✟ dt ❍❍ population percentage ❍❥ ❍ 0.6 ✟ macro-states form a partition of the ✟✟ ✙ ✟ full configuration space 0.4 transition rates between macro-states 0.2 ✠     ❍❍ ∗ ❍❥ ❍ rβ α = Γβ α exp −(Eβ α − Gα )/kT 0 -2 -1 0 1 2 3 4 5 6 10 10 10 10 10 10 10 10 10 time M. T. Wolfinger, W. A. Svrcek-Seiler, C. Flamm, I. L. Hofacker, and P. F. Stadler. Efficient computation of RNA folding dynamics. J. Phys. A: Math. Gen., 37(17):4731–4741, 2004.
  27. 27. Dynamics of a short artificial RNA CUGCGGCUUUGGCUCUAGCC n = 20 6.0 34 33 32 4.0 31 30 29 28 27 25 26 24 23 22 21 20 2.0 18 19 17 16 14 13 15 12 11 10 0.0 9 8 7 6 -2.0 5 4 3 2 -4.0 1 1 mfe 2 0.8 3 4 5 8 population probability 0.6 0.4 0.2 0 -2 0 2 3 -1 1 4 10 10 10 10 10 10 10 time
  28. 28. Dynamics of tRNA GCGGAUUUAGCUCAGDDGGGAGAGCGCCAGACUGAAYAUCUGGAGGUCCUGUGTPCGAUCCACAGAAUUCGCACCA 1 1 mfe 5 0.8 80 0.8 56 population probability population probability 0.6 0.6 0.4 0.4 0.2 0.2 0 0 0 3 5 6 7 8 9 1 2 3 4 5 6 7 8 4 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 time time
  29. 29. Dynamics of lattice proteins: HEX/TET lattice NNHHPPNNPHHHHPXP n = 16 2.0 1 NNHHPPNNPHHHHPXP n = 16 1.0 1 4 0.8 7 0.0 24 25 25 (pinfold) population probability -1.0 0.6 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 -2.0 13 14 15 16 17 18 19 20 21 22 23 0.4 -3.0 0.2 -4.0 9 10 8 11 12 -5.0 0 -3 2 3 4 5 6 7 1 -2 -1 0 1 2 3 4 5 10 10 10 10 10 10 10 10 10 time -2.0 1 -4.0 1 0.8 4 16 -6.0 17 population probability 37 0.6 37 (pinfold) -8.0 -10.0 0.4 -12.0 0.2 40 34 39 42 48 47 46 49 24 25 29 28 30 32 31 33 36 38 45 41 35 37 44 27 50 26 43 -14.0 9 4 5 11 10 20 19 6 18 8 7 12 15 14 17 16 22 13 21 23 3 2 1 0 -2 0 2 6 8 4 10 10 10 10 10 10 time
  30. 30. Barrier tree kinetics - problems and pitfalls The method works fine for moderately sized systems. Currently, we consider approx. 100 million structures within a single run of Barriers to calculate the topology of the landscape. However, we are interested in larger systems: biologically relevant RNA switches large 3D lattice proteins The next steps: use high-level diagonalization routines for sparse matrices calculate low-energy structures sample (thermodynamics properties of) individual basins sample low-enery refolding paths
  31. 31. Barrier tree kinetics - problems and pitfalls The method works fine for moderately sized systems. Currently, we consider approx. 100 million structures within a single run of Barriers to calculate the topology of the landscape. However, we are interested in larger systems: biologically relevant RNA switches large 3D lattice proteins The next steps: use high-level diagonalization routines for sparse matrices calculate low-energy structures sample (thermodynamics properties of) individual basins sample low-enery refolding paths
  32. 32. The PathFinder tool A heuristic approach to efficiently estimate low-energy refolding paths Overall procedure for direct paths: 1 Calculate distance bewteen start and target structure 2 Generate all neighbors of the start structure whose distance to the target is less than the distance of the start structure 3 Sort those neighbor structures by their energies 4 Take the n energetically best structures, take them as new starting points and repeat the procedure until the stop structure is found 5 If a path has been found, try to find another one with lower energy barrier Energy Energy START START STOP STOP 8 7 6 5 4 3 2 1 0 8 7 6 5 4 3 2 1 0 Distance Distance
  33. 33. PathFinder example - SV11 SV11 is a RNA switch of length 115 E = −69.2 kcal/mol E = −96.4 kcal/mol metastable stable template for Qβ replicase not a template
  34. 34. SV11 refolding path 1/3: Esaddle = −52.2 kcal/mol -50 ✛ ✛ -60 Energy (kcal/mol) -70 ■ ❅ ❅ ❅ ✕ ✁ ✁ ✁ -80 ✁ -90 ✲ -100 0 10 20 30 40 50 60 70 80 steps
  35. 35. SV11 refolding path 2/3: Esaddle = −57.7 kcal/mol -50 ✟ ✟✟ ✙ -60 Energy (kcal/mol) -70 ❄ ■ ❅ ❅ ❅ -80  ✒     -90 ✲ -100 0 10 20 30 40 50 60 70 80 steps
  36. 36. SV11 refolding path 3/3: Esaddle = −59.2 kcal/mol -50 ✟ ✟✟ ✙ -60 Energy (kcal/mol) -70 ❄ ■ ❅ ❅ ❅ GC GC A GC C U U AU CG CG CG CG CG -80 CG UAA UAA ✒   CG GC GC GC GC   GC GC UA CG AU CC G C U GG C GA CG A G C C U GC U G G C GUU U C C GA G AC C GC C C C U CU A U G A A G A U A A U A U A G GCU GGCC -90 ✲ -100 0 10 20 30 40 50 60 70 80 steps
  37. 37. SV11 refolding paths -50 -60 Energy (kcal/mol) -70 -80 -90 -100 0 10 20 30 40 50 60 70 80 steps
  38. 38. libPF - a generic path sampling library In practice, this path sampling heuristics is implemented as a C library All structures along a path are stored in a hash and therefore available for the next iterations Heuristics routines are strictly separated from model-dependent routines, i.e. the library is completely generic Currently, RNA secondary structures and lattice proteins are implemented It is easy to extend the functionality to other discrete systems
  39. 39. Conclusion Discrete models allow a detailed study of the energy surface Barrier trees represent the landscape topology The lower part of the energy landscape is accessible by a flooding approach Macrostate folding kinetics reduces simulation time drastically A path sampling approach yields low-energy refolding paths and is a valuable tool for further kinetics studies
  40. 40. Conclusion Discrete models allow a detailed study of the energy surface Barrier trees represent the landscape topology The lower part of the energy landscape is accessible by a flooding approach Macrostate folding kinetics reduces simulation time drastically A path sampling approach yields low-energy refolding paths and is a valuable tool for further kinetics studies
  41. 41. Conclusion Discrete models allow a detailed study of the energy surface Barrier trees represent the landscape topology The lower part of the energy landscape is accessible by a flooding approach Macrostate folding kinetics reduces simulation time drastically A path sampling approach yields low-energy refolding paths and is a valuable tool for further kinetics studies
  42. 42. Conclusion Discrete models allow a detailed study of the energy surface Barrier trees represent the landscape topology The lower part of the energy landscape is accessible by a flooding approach Macrostate folding kinetics reduces simulation time drastically A path sampling approach yields low-energy refolding paths and is a valuable tool for further kinetics studies
  43. 43. Conclusion Discrete models allow a detailed study of the energy surface Barrier trees represent the landscape topology The lower part of the energy landscape is accessible by a flooding approach Macrostate folding kinetics reduces simulation time drastically A path sampling approach yields low-energy refolding paths and is a valuable tool for further kinetics studies
  44. 44. Conclusion Discrete models allow a detailed study of the energy surface Barrier trees represent the landscape topology The lower part of the energy landscape is accessible by a flooding approach Macrostate folding kinetics reduces simulation time drastically A path sampling approach yields low-energy refolding paths and is a valuable tool for further kinetics studies
  45. 45. References Christoph Flamm, Ivo Hofacker, Peter Stadler Rolf Backofen, Sebastian Will, Martin Mann M. T. Wolfinger, W. A. Svrcek-Seiler, C. Flamm, I. L. Hofacker, and P. F. Stadler. Efficient computation of RNA folding dynamics. J. Phys. A: Math. Gen., 37(17):4731–4741, 2004. M. T. Wolfinger, S. Will, I. L. Hofacker, R. Backofen, and P. F. Stadler. Exploring the lower part of discrete polymer model energy landscapes. Europhys. Lett., 74(4):725–732, 2006. Ch. Flamm, W. Fontana, I.L. Hofacker, and P. Schuster. RNA folding at elementary step resolution. RNA, 6:325–338, 2000 C. Flamm, I. L. Hofacker, P. F. Stadler, and M. T. Wolfinger. Barrier trees of degenerate landscapes. Z. Phys. Chem., 216:155–173, 2002.

×