Upcoming SlideShare
×

# Energy Landscapes and Dynamics of Biopolymers

425 views

Published on

0 Likes
Statistics
Notes
• Full Name
Comment goes here.

Are you sure you want to Yes No
• Be the first to comment

• Be the first to like this

Views
Total views
425
On SlideShare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
16
0
Likes
0
Embeds 0
No embeds

No notes for slide

### Energy Landscapes and Dynamics of Biopolymers

1. 1. Energy Landscapes and Dynamics of Biopolymers Michael Thomas Wolﬁnger University of Vienna 5 March 2012
2. 2. Outline 1 Biopolymer structure 2 Energy landscapes 3 Folding kinetics 4 RNA refolding 5 Summary
3. 3. The RNA model AA GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA G G U A CAU GC AU CG CG AG ✲ GG A G G ✲ G GAGC U U CUCG C UGA A C UU AG U A G UA C U AU A G UU C GU C C GC U AG CG GC A C C A ´´´´´´´ ºº ´´´´ ºººººººº µµµµ º ´´´´´ ººººººº µµµµµ ººººº ´´´´´ ººººººº µµµµµ µµµµµµµ ºººº A secondary structure is a list of base pairs that fulﬁlls two constraints: A base may participate in at most one base pair. Base pairs must not cross, i.e., no two pairs (i, j) and (k, l) may have i < k < j < l. (no pseudo-knots) The optimal as well as the suboptimal structures can be computed recursively.
4. 4. The HP-model F(l) L(l) H In this simpliﬁed model, a conformation is a F(f) R(r) self-avoiding walk (SAW) on a given lattice in 2 P or 3 dimensions. Each bond is a straight line, F(f) R(b) L(f) bond angles have a few discrete values. The 20 L(r) letter alphabet of amino acids (monomers) is reduced to a two letter alphabet, namely H and FRRLLFLF P. H represents hydrophobic monomers, P represents hydrophilic or polar monomers. L U L Advantages: B F B F lattice-independent folding algorithms R R D P F simple energy function W M Z hydrophobicity can be reasonably modeled Q Q L F R X B R N Y B P
5. 5. Energy functions RNA Lattice Proteins The standard energy model expresses the free The energy function for a sequence with energy of a secondary structure S as the sum n residues S = s1 s2 . . . sn with si ∈ A = of the energies of its loops l {a1 , a2 , . . . , ab }, the alphabet of b residues, and an overall conﬁguration x = (x1 , x2 , . . . , xn ) on a lattice L can be written as the sum of pair E (S) = ∑ E (l) l∈S potentials E (S, x) = ∑ Ψ[si , sj ]. i < j −1 |xi − xj | = 1 H P N X i j H P H −4 0 0 0 H −1 0 P 0 1 −1 0 N P 0 0 N 0 −1 1 0 1 G A U X 0 0 0 0 A G U U GC U A UG G A UUC C G U C G A U C UUG U U G A UCCC C G U A G G U GU GC A G G G AU C G G G G C A U U UAC G U GG E = −17.5kcal/mol E = −16
6. 6. Folding landscape - energy landscape The energy landscape of a biopolymer molecule is a complex surface of the (free) energy versus the conformational degrees of freedom. Number of RNA secondary structures dim Lattice Type µ γ 3 SQ 2.63820 1.34275 cn ∼ 1.86n · n− 2 2 TRI 4.15076 1.343 dynamic programming algorithms available HEX 1.84777 1.345 SC 4.68391 1.161 Number of LP structures 3 BCC 6.53036 1.161 cn ∼ µ n · nγ −1 FCC 10.0364 1.162 problem is NP-hard Formally, three things are needed to construct an energy landscape: A set X of conﬁgurations an energy function f : X → R a symmetric neighborhood relation N : X × X
7. 7. Folding landscape - energy landscape The energy landscape of a biopolymer molecule is a complex surface of the (free) energy versus the conformational degrees of freedom. Number of RNA secondary structures dim Lattice Type µ γ 3 SQ 2.63820 1.34275 cn ∼ 1.86n · n− 2 2 TRI 4.15076 1.343 dynamic programming algorithms available HEX 1.84777 1.345 SC 4.68391 1.161 Number of LP structures 3 BCC 6.53036 1.161 cn ∼ µ n · nγ −1 FCC 10.0364 1.162 problem is NP-hard Formally, three things are needed to construct an energy landscape: A set X of conﬁgurations an energy function f : X → R a symmetric neighborhood relation N : X × X
8. 8. Folding landscape - energy landscape The energy landscape of a biopolymer molecule is a complex surface of the (free) energy versus the conformational degrees of freedom. Number of RNA secondary structures dim Lattice Type µ γ 3 SQ 2.63820 1.34275 cn ∼ 1.86n · n− 2 2 TRI 4.15076 1.343 dynamic programming algorithms available HEX 1.84777 1.345 SC 4.68391 1.161 Number of LP structures 3 BCC 6.53036 1.161 cn ∼ µ n · nγ −1 FCC 10.0364 1.162 problem is NP-hard Formally, three things are needed to construct an energy landscape: A set X of conﬁgurations an energy function f : X → R a symmetric neighborhood relation N : X × X
9. 9. Folding landscape - energy landscape The energy landscape of a biopolymer molecule is a complex surface of the (free) energy versus the conformational degrees of freedom. Number of RNA secondary structures dim Lattice Type µ γ 3 SQ 2.63820 1.34275 cn ∼ 1.86n · n− 2 2 TRI 4.15076 1.343 dynamic programming algorithms available HEX 1.84777 1.345 SC 4.68391 1.161 Number of LP structures 3 BCC 6.53036 1.161 cn ∼ µ n · nγ −1 FCC 10.0364 1.162 problem is NP-hard Formally, three things are needed to construct an energy landscape: A set X of conﬁgurations an energy function f : X → R a symmetric neighborhood relation N : X × X
10. 10. The move set o o o o o o o o o o o o o o o o o FRLLRLFRR FRLLFLFRR o o o o o FUDLUDLU FURLUDLU For each move there must be an inverse move Resulting structure must be in X Move set must be ergodic
11. 11. Energy barriers and barrier trees Some topological deﬁnitions: A structure is a d local minimum if its energy is lower c b than the energy of all neighbors 5 a local maximum if its energy is higher 4 3 than the energy of all neighbors 2 saddle point if there are at least two E local minima thar can be reached by a 1* downhill walk starting at this point We further deﬁne a walk between two conformations x and y as a list of conformations x = x1 . . . xm+1 = y such that ∀1 ≤ i ≤ m : N(xi , xi+1 ) the lower part of the energy landscape (written as X ≤η ) as all conformations x such that E (S, x) ≤ η (with a predeﬁned threshold η ). C. Flamm, I. L. Hofacker, P. F. Stadler, and M. T. Wolﬁnger. Barrier trees of degenerate landscapes. Z. Phys. Chem., 216:155–173, 2002.
12. 12. Energy barriers and barrier trees Some topological deﬁnitions: A structure is a d local minimum if its energy is lower c b than the energy of all neighbors 5 a local maximum if its energy is higher 4 3 than the energy of all neighbors 2 saddle point if there are at least two E local minima thar can be reached by a 1* downhill walk starting at this point We further deﬁne a walk between two conformations x and y as a list of conformations x = x1 . . . xm+1 = y such that ∀1 ≤ i ≤ m : N(xi , xi+1 ) the lower part of the energy landscape (written as X ≤η ) as all conformations x such that E (S, x) ≤ η (with a predeﬁned threshold η ). C. Flamm, I. L. Hofacker, P. F. Stadler, and M. T. Wolﬁnger. Barrier trees of degenerate landscapes. Z. Phys. Chem., 216:155–173, 2002.
13. 13. The lower part of the energy landscape Two conformations x and y are mutually accessible at the level η (written as x η y ) if there is a walk from x to y such that all conformations z in the walk satisfy E (S, z) ≤ η . The saddle height ˆ f (x, y ) of x and y is deﬁned by f (x, y ) = min{η | x ˆ η y} ≤η Given the set of all local minima Xmin below threshold η , the lower energy part X ≤η of the energy landscape can alternatively be written as ≤η ˆ X ≤η = {y | ∃x ∈ Xmin : f (x, y ) ≤ η } Given a restricted set of low-energy conformations, Xinit , and a reasonable value for η , the lower part of the energy landscape can be calculated. M. T. Wolﬁnger, S. Will, I. L. Hofacker, R. Backofen, and P. F. Stadler. Exploring the lower part of discrete polymer model energy landscapes. Europhys. Lett., 2006.
14. 14. The Flooder approach E 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 Ep 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 D 11111111111111111111 00000000000000000000 D 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 C 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 C 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 B 11111111111111111111 00000000000000000000 B 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 A A
15. 15. The concept of Barriers
16. 16. The algorithm of Barriers Barriers Require: all suboptimal secondary structures within a certain energy range from mfe 1: B ⇐ 0 / 2: for all x ∈ subopt do 3: K ⇐0 / 4: N ⇐ generate neighbors(x) 5: for all y ∈ N do 6: if b ⇐ lookup hash(y ) then 7: K ⇐ K ∪b 8: end if 9: end for 10: if K = 0 then / 11: B ⇐ B ∪ {x} 12: end if 13: if |K | ≥ 2 then 14: merge basins(K ) 15: end if 16: write hash(x) 17: end for
17. 17. The ﬂooding algorithm 1
18. 18. The ﬂooding algorithm 1 2
19. 19. The ﬂooding algorithm 1 2 3
20. 20. The ﬂooding algorithm 1 2 3 4
21. 21. Barrier tree example
22. 22. Information from the barrier trees Local minima Saddle points Barrier heights Gradient basins Partition functions and free energies of (gradient) basins A gradient basin is the set of all initial points from which a gradient walk (steepest descent) ends in the same local minimum.
23. 23. Folding kinetics Biomolecules may have kinetic traps which prevent them from reaching equilibrium within the lifetime of the molecule. Long molecules are often trapped in such meta-stable states during transcription. Possible solutions are Stochastic folding simulations (predict folding pathways) Predicting structures for growing fragments of the sequence Analysis of the energy landscape based on complete suboptimal folding
24. 24. Biopolymer dynamics The probability distribution P of structures as a function of time is ruled by a set of forward equations, also known as the master equation dPt (x) = ∑ [Pt (y )kxy − Pt (x)kyx ] dt y =x Given an initial population distribution, how does the system evolve in time? (What is the population distribution after n time-steps?) d Pt = UPt =⇒ Pt = e tU P0 dt
25. 25. Kinfold: A stochastic kinetic foling algorithm Simulate folding kinetics by a rejection-less Monte-Carlo type algorithm: Generate all neighbors using the move-set Assign rates to each move, e.g. P1 P2 P8 P3 P7 P4 ∆E P6 P5 Pi = min 1, exp − kT Select a move with probability proportional to its rate Advance clock 1/ ∑i Pi .
26. 26. Treekin: Barrier tree kinetics Local minima Gradient basins Saddle points Partition functions Barrier heights Free energies of (gradient) basins With this information, a reduced dynamics can be formulated as a Markov process by means of macrostates (i.e. basins in the barrier tree) and Arrhenius-like transition rates between them. 1 d Pt = UPt =⇒ Pt = e tU P0 0.8 ✟✟ ✙ ✟ dt ❍❍ population percentage ❍❥ ❍ 0.6 ✟ macro-states form a partition of the ✟✟ ✙ ✟ full conﬁguration space 0.4 transition rates between macro-states 0.2 ✠     ❍❍ ∗ ❍❥ ❍ rβ α = Γβ α exp −(Eβ α − Gα )/kT 0 -2 -1 0 1 2 3 4 5 6 10 10 10 10 10 10 10 10 10 time M. T. Wolﬁnger, W. A. Svrcek-Seiler, C. Flamm, I. L. Hofacker, and P. F. Stadler. Eﬃcient computation of RNA folding dynamics. J. Phys. A: Math. Gen., 37(17):4731–4741, 2004.
27. 27. Dynamics of a short artiﬁcial RNA CUGCGGCUUUGGCUCUAGCC n = 20 6.0 34 33 32 4.0 31 30 29 28 27 25 26 24 23 22 21 20 2.0 18 19 17 16 14 13 15 12 11 10 0.0 9 8 7 6 -2.0 5 4 3 2 -4.0 1 1 mfe 2 0.8 3 4 5 8 population probability 0.6 0.4 0.2 0 -2 0 2 3 -1 1 4 10 10 10 10 10 10 10 time
28. 28. Dynamics of tRNA GCGGAUUUAGCUCAGDDGGGAGAGCGCCAGACUGAAYAUCUGGAGGUCCUGUGTPCGAUCCACAGAAUUCGCACCA 1 1 mfe 5 0.8 80 0.8 56 population probability population probability 0.6 0.6 0.4 0.4 0.2 0.2 0 0 0 3 5 6 7 8 9 1 2 3 4 5 6 7 8 4 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 time time
29. 29. Dynamics of lattice proteins: HEX/TET lattice NNHHPPNNPHHHHPXP n = 16 2.0 1 NNHHPPNNPHHHHPXP n = 16 1.0 1 4 0.8 7 0.0 24 25 25 (pinfold) population probability -1.0 0.6 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 -2.0 13 14 15 16 17 18 19 20 21 22 23 0.4 -3.0 0.2 -4.0 9 10 8 11 12 -5.0 0 -3 2 3 4 5 6 7 1 -2 -1 0 1 2 3 4 5 10 10 10 10 10 10 10 10 10 time -2.0 1 -4.0 1 0.8 4 16 -6.0 17 population probability 37 0.6 37 (pinfold) -8.0 -10.0 0.4 -12.0 0.2 40 34 39 42 48 47 46 49 24 25 29 28 30 32 31 33 36 38 45 41 35 37 44 27 50 26 43 -14.0 9 4 5 11 10 20 19 6 18 8 7 12 15 14 17 16 22 13 21 23 3 2 1 0 -2 0 2 6 8 4 10 10 10 10 10 10 time
30. 30. Barrier tree kinetics - problems and pitfalls The method works ﬁne for moderately sized systems. Currently, we consider approx. 100 million structures within a single run of Barriers to calculate the topology of the landscape. However, we are interested in larger systems: biologically relevant RNA switches large 3D lattice proteins The next steps: use high-level diagonalization routines for sparse matrices calculate low-energy structures sample (thermodynamics properties of) individual basins sample low-enery refolding paths
31. 31. Barrier tree kinetics - problems and pitfalls The method works ﬁne for moderately sized systems. Currently, we consider approx. 100 million structures within a single run of Barriers to calculate the topology of the landscape. However, we are interested in larger systems: biologically relevant RNA switches large 3D lattice proteins The next steps: use high-level diagonalization routines for sparse matrices calculate low-energy structures sample (thermodynamics properties of) individual basins sample low-enery refolding paths
32. 32. The PathFinder tool A heuristic approach to eﬃciently estimate low-energy refolding paths Overall procedure for direct paths: 1 Calculate distance bewteen start and target structure 2 Generate all neighbors of the start structure whose distance to the target is less than the distance of the start structure 3 Sort those neighbor structures by their energies 4 Take the n energetically best structures, take them as new starting points and repeat the procedure until the stop structure is found 5 If a path has been found, try to ﬁnd another one with lower energy barrier Energy Energy START START STOP STOP 8 7 6 5 4 3 2 1 0 8 7 6 5 4 3 2 1 0 Distance Distance
33. 33. PathFinder example - SV11 SV11 is a RNA switch of length 115 E = −69.2 kcal/mol E = −96.4 kcal/mol metastable stable template for Qβ replicase not a template
34. 34. SV11 refolding path 1/3: Esaddle = −52.2 kcal/mol -50 ✛ ✛ -60 Energy (kcal/mol) -70 ■ ❅ ❅ ❅ ✕ ✁ ✁ ✁ -80 ✁ -90 ✲ -100 0 10 20 30 40 50 60 70 80 steps
35. 35. SV11 refolding path 2/3: Esaddle = −57.7 kcal/mol -50 ✟ ✟✟ ✙ -60 Energy (kcal/mol) -70 ❄ ■ ❅ ❅ ❅ -80  ✒     -90 ✲ -100 0 10 20 30 40 50 60 70 80 steps
36. 36. SV11 refolding path 3/3: Esaddle = −59.2 kcal/mol -50 ✟ ✟✟ ✙ -60 Energy (kcal/mol) -70 ❄ ■ ❅ ❅ ❅ GC GC A GC C U U AU CG CG CG CG CG -80 CG UAA UAA ✒   CG GC GC GC GC   GC GC UA CG AU CC G C U GG C GA CG A G C C U GC U G G C GUU U C C GA G AC C GC C C C U CU A U G A A G A U A A U A U A G GCU GGCC -90 ✲ -100 0 10 20 30 40 50 60 70 80 steps
37. 37. SV11 refolding paths -50 -60 Energy (kcal/mol) -70 -80 -90 -100 0 10 20 30 40 50 60 70 80 steps
38. 38. libPF - a generic path sampling library In practice, this path sampling heuristics is implemented as a C library All structures along a path are stored in a hash and therefore available for the next iterations Heuristics routines are strictly separated from model-dependent routines, i.e. the library is completely generic Currently, RNA secondary structures and lattice proteins are implemented It is easy to extend the functionality to other discrete systems
39. 39. Conclusion Discrete models allow a detailed study of the energy surface Barrier trees represent the landscape topology The lower part of the energy landscape is accessible by a ﬂooding approach Macrostate folding kinetics reduces simulation time drastically A path sampling approach yields low-energy refolding paths and is a valuable tool for further kinetics studies
40. 40. Conclusion Discrete models allow a detailed study of the energy surface Barrier trees represent the landscape topology The lower part of the energy landscape is accessible by a ﬂooding approach Macrostate folding kinetics reduces simulation time drastically A path sampling approach yields low-energy refolding paths and is a valuable tool for further kinetics studies
41. 41. Conclusion Discrete models allow a detailed study of the energy surface Barrier trees represent the landscape topology The lower part of the energy landscape is accessible by a ﬂooding approach Macrostate folding kinetics reduces simulation time drastically A path sampling approach yields low-energy refolding paths and is a valuable tool for further kinetics studies
42. 42. Conclusion Discrete models allow a detailed study of the energy surface Barrier trees represent the landscape topology The lower part of the energy landscape is accessible by a ﬂooding approach Macrostate folding kinetics reduces simulation time drastically A path sampling approach yields low-energy refolding paths and is a valuable tool for further kinetics studies
43. 43. Conclusion Discrete models allow a detailed study of the energy surface Barrier trees represent the landscape topology The lower part of the energy landscape is accessible by a ﬂooding approach Macrostate folding kinetics reduces simulation time drastically A path sampling approach yields low-energy refolding paths and is a valuable tool for further kinetics studies
44. 44. Conclusion Discrete models allow a detailed study of the energy surface Barrier trees represent the landscape topology The lower part of the energy landscape is accessible by a ﬂooding approach Macrostate folding kinetics reduces simulation time drastically A path sampling approach yields low-energy refolding paths and is a valuable tool for further kinetics studies
45. 45. References Christoph Flamm, Ivo Hofacker, Peter Stadler Rolf Backofen, Sebastian Will, Martin Mann M. T. Wolﬁnger, W. A. Svrcek-Seiler, C. Flamm, I. L. Hofacker, and P. F. Stadler. Eﬃcient computation of RNA folding dynamics. J. Phys. A: Math. Gen., 37(17):4731–4741, 2004. M. T. Wolﬁnger, S. Will, I. L. Hofacker, R. Backofen, and P. F. Stadler. Exploring the lower part of discrete polymer model energy landscapes. Europhys. Lett., 74(4):725–732, 2006. Ch. Flamm, W. Fontana, I.L. Hofacker, and P. Schuster. RNA folding at elementary step resolution. RNA, 6:325–338, 2000 C. Flamm, I. L. Hofacker, P. F. Stadler, and M. T. Wolﬁnger. Barrier trees of degenerate landscapes. Z. Phys. Chem., 216:155–173, 2002.