Energy Landscapes and Dynamics of Biopolymers
Upcoming SlideShare
Loading in...5
×
 

Energy Landscapes and Dynamics of Biopolymers

on

  • 224 views

 

Statistics

Views

Total Views
224
Views on SlideShare
224
Embed Views
0

Actions

Likes
0
Downloads
11
Comments
0

0 Embeds 0

No embeds

Accessibility

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Energy Landscapes and Dynamics of Biopolymers Energy Landscapes and Dynamics of Biopolymers Presentation Transcript

  • Energy Landscapes and Dynamics of Biopolymers Michael Thomas Wolfinger University of Vienna 5 March 2012
  • Outline 1 Biopolymer structure 2 Energy landscapes 3 Folding kinetics 4 RNA refolding 5 Summary
  • The RNA model AA GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA G G U A CAU GC AU CG CG AG ✲ GG A G G ✲ G GAGC U U CUCG C UGA A C UU AG U A G UA C U AU A G UU C GU C C GC U AG CG GC A C C A ´´´´´´´ ºº ´´´´ ºººººººº µµµµ º ´´´´´ ººººººº µµµµµ ººººº ´´´´´ ººººººº µµµµµ µµµµµµµ ºººº A secondary structure is a list of base pairs that fulfills two constraints: A base may participate in at most one base pair. Base pairs must not cross, i.e., no two pairs (i, j) and (k, l) may have i < k < j < l. (no pseudo-knots) The optimal as well as the suboptimal structures can be computed recursively. View slide
  • The HP-model F(l) L(l) H In this simplified model, a conformation is a F(f) R(r) self-avoiding walk (SAW) on a given lattice in 2 P or 3 dimensions. Each bond is a straight line, F(f) R(b) L(f) bond angles have a few discrete values. The 20 L(r) letter alphabet of amino acids (monomers) is reduced to a two letter alphabet, namely H and FRRLLFLF P. H represents hydrophobic monomers, P represents hydrophilic or polar monomers. L U L Advantages: B F B F lattice-independent folding algorithms R R D P F simple energy function W M Z hydrophobicity can be reasonably modeled Q Q L F R X B R N Y B P View slide
  • Energy functions RNA Lattice Proteins The standard energy model expresses the free The energy function for a sequence with energy of a secondary structure S as the sum n residues S = s1 s2 . . . sn with si ∈ A = of the energies of its loops l {a1 , a2 , . . . , ab }, the alphabet of b residues, and an overall configuration x = (x1 , x2 , . . . , xn ) on a lattice L can be written as the sum of pair E (S) = ∑ E (l) l∈S potentials E (S, x) = ∑ Ψ[si , sj ]. i < j −1 |xi − xj | = 1 H P N X i j H P H −4 0 0 0 H −1 0 P 0 1 −1 0 N P 0 0 N 0 −1 1 0 1 G A U X 0 0 0 0 A G U U GC U A UG G A UUC C G U C G A U C UUG U U G A UCCC C G U A G G U GU GC A G G G AU C G G G G C A U U UAC G U GG E = −17.5kcal/mol E = −16
  • Folding landscape - energy landscape The energy landscape of a biopolymer molecule is a complex surface of the (free) energy versus the conformational degrees of freedom. Number of RNA secondary structures dim Lattice Type µ γ 3 SQ 2.63820 1.34275 cn ∼ 1.86n · n− 2 2 TRI 4.15076 1.343 dynamic programming algorithms available HEX 1.84777 1.345 SC 4.68391 1.161 Number of LP structures 3 BCC 6.53036 1.161 cn ∼ µ n · nγ −1 FCC 10.0364 1.162 problem is NP-hard Formally, three things are needed to construct an energy landscape: A set X of configurations an energy function f : X → R a symmetric neighborhood relation N : X × X
  • Folding landscape - energy landscape The energy landscape of a biopolymer molecule is a complex surface of the (free) energy versus the conformational degrees of freedom. Number of RNA secondary structures dim Lattice Type µ γ 3 SQ 2.63820 1.34275 cn ∼ 1.86n · n− 2 2 TRI 4.15076 1.343 dynamic programming algorithms available HEX 1.84777 1.345 SC 4.68391 1.161 Number of LP structures 3 BCC 6.53036 1.161 cn ∼ µ n · nγ −1 FCC 10.0364 1.162 problem is NP-hard Formally, three things are needed to construct an energy landscape: A set X of configurations an energy function f : X → R a symmetric neighborhood relation N : X × X
  • Folding landscape - energy landscape The energy landscape of a biopolymer molecule is a complex surface of the (free) energy versus the conformational degrees of freedom. Number of RNA secondary structures dim Lattice Type µ γ 3 SQ 2.63820 1.34275 cn ∼ 1.86n · n− 2 2 TRI 4.15076 1.343 dynamic programming algorithms available HEX 1.84777 1.345 SC 4.68391 1.161 Number of LP structures 3 BCC 6.53036 1.161 cn ∼ µ n · nγ −1 FCC 10.0364 1.162 problem is NP-hard Formally, three things are needed to construct an energy landscape: A set X of configurations an energy function f : X → R a symmetric neighborhood relation N : X × X
  • Folding landscape - energy landscape The energy landscape of a biopolymer molecule is a complex surface of the (free) energy versus the conformational degrees of freedom. Number of RNA secondary structures dim Lattice Type µ γ 3 SQ 2.63820 1.34275 cn ∼ 1.86n · n− 2 2 TRI 4.15076 1.343 dynamic programming algorithms available HEX 1.84777 1.345 SC 4.68391 1.161 Number of LP structures 3 BCC 6.53036 1.161 cn ∼ µ n · nγ −1 FCC 10.0364 1.162 problem is NP-hard Formally, three things are needed to construct an energy landscape: A set X of configurations an energy function f : X → R a symmetric neighborhood relation N : X × X
  • The move set o o o o o o o o o o o o o o o o o FRLLRLFRR FRLLFLFRR o o o o o FUDLUDLU FURLUDLU For each move there must be an inverse move Resulting structure must be in X Move set must be ergodic
  • Energy barriers and barrier trees Some topological definitions: A structure is a d local minimum if its energy is lower c b than the energy of all neighbors 5 a local maximum if its energy is higher 4 3 than the energy of all neighbors 2 saddle point if there are at least two E local minima thar can be reached by a 1* downhill walk starting at this point We further define a walk between two conformations x and y as a list of conformations x = x1 . . . xm+1 = y such that ∀1 ≤ i ≤ m : N(xi , xi+1 ) the lower part of the energy landscape (written as X ≤η ) as all conformations x such that E (S, x) ≤ η (with a predefined threshold η ). C. Flamm, I. L. Hofacker, P. F. Stadler, and M. T. Wolfinger. Barrier trees of degenerate landscapes. Z. Phys. Chem., 216:155–173, 2002.
  • Energy barriers and barrier trees Some topological definitions: A structure is a d local minimum if its energy is lower c b than the energy of all neighbors 5 a local maximum if its energy is higher 4 3 than the energy of all neighbors 2 saddle point if there are at least two E local minima thar can be reached by a 1* downhill walk starting at this point We further define a walk between two conformations x and y as a list of conformations x = x1 . . . xm+1 = y such that ∀1 ≤ i ≤ m : N(xi , xi+1 ) the lower part of the energy landscape (written as X ≤η ) as all conformations x such that E (S, x) ≤ η (with a predefined threshold η ). C. Flamm, I. L. Hofacker, P. F. Stadler, and M. T. Wolfinger. Barrier trees of degenerate landscapes. Z. Phys. Chem., 216:155–173, 2002.
  • The lower part of the energy landscape Two conformations x and y are mutually accessible at the level η (written as x η y ) if there is a walk from x to y such that all conformations z in the walk satisfy E (S, z) ≤ η . The saddle height ˆ f (x, y ) of x and y is defined by f (x, y ) = min{η | x ˆ η y} ≤η Given the set of all local minima Xmin below threshold η , the lower energy part X ≤η of the energy landscape can alternatively be written as ≤η ˆ X ≤η = {y | ∃x ∈ Xmin : f (x, y ) ≤ η } Given a restricted set of low-energy conformations, Xinit , and a reasonable value for η , the lower part of the energy landscape can be calculated. M. T. Wolfinger, S. Will, I. L. Hofacker, R. Backofen, and P. F. Stadler. Exploring the lower part of discrete polymer model energy landscapes. Europhys. Lett., 2006.
  • The Flooder approach E 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 Ep 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 D 11111111111111111111 00000000000000000000 D 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 C 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 C 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 B 11111111111111111111 00000000000000000000 B 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 11111111111111111111 00000000000000000000 111111111111111111111 000000000000000000000 111111111111111111111 000000000000000000000 A A
  • The concept of Barriers
  • The algorithm of Barriers Barriers Require: all suboptimal secondary structures within a certain energy range from mfe 1: B ⇐ 0 / 2: for all x ∈ subopt do 3: K ⇐0 / 4: N ⇐ generate neighbors(x) 5: for all y ∈ N do 6: if b ⇐ lookup hash(y ) then 7: K ⇐ K ∪b 8: end if 9: end for 10: if K = 0 then / 11: B ⇐ B ∪ {x} 12: end if 13: if |K | ≥ 2 then 14: merge basins(K ) 15: end if 16: write hash(x) 17: end for
  • The flooding algorithm 1
  • The flooding algorithm 1 2
  • The flooding algorithm 1 2 3
  • The flooding algorithm 1 2 3 4
  • Barrier tree example
  • Information from the barrier trees Local minima Saddle points Barrier heights Gradient basins Partition functions and free energies of (gradient) basins A gradient basin is the set of all initial points from which a gradient walk (steepest descent) ends in the same local minimum.
  • Folding kinetics Biomolecules may have kinetic traps which prevent them from reaching equilibrium within the lifetime of the molecule. Long molecules are often trapped in such meta-stable states during transcription. Possible solutions are Stochastic folding simulations (predict folding pathways) Predicting structures for growing fragments of the sequence Analysis of the energy landscape based on complete suboptimal folding
  • Biopolymer dynamics The probability distribution P of structures as a function of time is ruled by a set of forward equations, also known as the master equation dPt (x) = ∑ [Pt (y )kxy − Pt (x)kyx ] dt y =x Given an initial population distribution, how does the system evolve in time? (What is the population distribution after n time-steps?) d Pt = UPt =⇒ Pt = e tU P0 dt
  • Kinfold: A stochastic kinetic foling algorithm Simulate folding kinetics by a rejection-less Monte-Carlo type algorithm: Generate all neighbors using the move-set Assign rates to each move, e.g. P1 P2 P8 P3 P7 P4 ∆E P6 P5 Pi = min 1, exp − kT Select a move with probability proportional to its rate Advance clock 1/ ∑i Pi .
  • Treekin: Barrier tree kinetics Local minima Gradient basins Saddle points Partition functions Barrier heights Free energies of (gradient) basins With this information, a reduced dynamics can be formulated as a Markov process by means of macrostates (i.e. basins in the barrier tree) and Arrhenius-like transition rates between them. 1 d Pt = UPt =⇒ Pt = e tU P0 0.8 ✟✟ ✙ ✟ dt ❍❍ population percentage ❍❥ ❍ 0.6 ✟ macro-states form a partition of the ✟✟ ✙ ✟ full configuration space 0.4 transition rates between macro-states 0.2 ✠     ❍❍ ∗ ❍❥ ❍ rβ α = Γβ α exp −(Eβ α − Gα )/kT 0 -2 -1 0 1 2 3 4 5 6 10 10 10 10 10 10 10 10 10 time M. T. Wolfinger, W. A. Svrcek-Seiler, C. Flamm, I. L. Hofacker, and P. F. Stadler. Efficient computation of RNA folding dynamics. J. Phys. A: Math. Gen., 37(17):4731–4741, 2004.
  • Dynamics of a short artificial RNA CUGCGGCUUUGGCUCUAGCC n = 20 6.0 34 33 32 4.0 31 30 29 28 27 25 26 24 23 22 21 20 2.0 18 19 17 16 14 13 15 12 11 10 0.0 9 8 7 6 -2.0 5 4 3 2 -4.0 1 1 mfe 2 0.8 3 4 5 8 population probability 0.6 0.4 0.2 0 -2 0 2 3 -1 1 4 10 10 10 10 10 10 10 time
  • Dynamics of tRNA GCGGAUUUAGCUCAGDDGGGAGAGCGCCAGACUGAAYAUCUGGAGGUCCUGUGTPCGAUCCACAGAAUUCGCACCA 1 1 mfe 5 0.8 80 0.8 56 population probability population probability 0.6 0.6 0.4 0.4 0.2 0.2 0 0 0 3 5 6 7 8 9 1 2 3 4 5 6 7 8 4 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 time time
  • Dynamics of lattice proteins: HEX/TET lattice NNHHPPNNPHHHHPXP n = 16 2.0 1 NNHHPPNNPHHHHPXP n = 16 1.0 1 4 0.8 7 0.0 24 25 25 (pinfold) population probability -1.0 0.6 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 -2.0 13 14 15 16 17 18 19 20 21 22 23 0.4 -3.0 0.2 -4.0 9 10 8 11 12 -5.0 0 -3 2 3 4 5 6 7 1 -2 -1 0 1 2 3 4 5 10 10 10 10 10 10 10 10 10 time -2.0 1 -4.0 1 0.8 4 16 -6.0 17 population probability 37 0.6 37 (pinfold) -8.0 -10.0 0.4 -12.0 0.2 40 34 39 42 48 47 46 49 24 25 29 28 30 32 31 33 36 38 45 41 35 37 44 27 50 26 43 -14.0 9 4 5 11 10 20 19 6 18 8 7 12 15 14 17 16 22 13 21 23 3 2 1 0 -2 0 2 6 8 4 10 10 10 10 10 10 time
  • Barrier tree kinetics - problems and pitfalls The method works fine for moderately sized systems. Currently, we consider approx. 100 million structures within a single run of Barriers to calculate the topology of the landscape. However, we are interested in larger systems: biologically relevant RNA switches large 3D lattice proteins The next steps: use high-level diagonalization routines for sparse matrices calculate low-energy structures sample (thermodynamics properties of) individual basins sample low-enery refolding paths
  • Barrier tree kinetics - problems and pitfalls The method works fine for moderately sized systems. Currently, we consider approx. 100 million structures within a single run of Barriers to calculate the topology of the landscape. However, we are interested in larger systems: biologically relevant RNA switches large 3D lattice proteins The next steps: use high-level diagonalization routines for sparse matrices calculate low-energy structures sample (thermodynamics properties of) individual basins sample low-enery refolding paths
  • The PathFinder tool A heuristic approach to efficiently estimate low-energy refolding paths Overall procedure for direct paths: 1 Calculate distance bewteen start and target structure 2 Generate all neighbors of the start structure whose distance to the target is less than the distance of the start structure 3 Sort those neighbor structures by their energies 4 Take the n energetically best structures, take them as new starting points and repeat the procedure until the stop structure is found 5 If a path has been found, try to find another one with lower energy barrier Energy Energy START START STOP STOP 8 7 6 5 4 3 2 1 0 8 7 6 5 4 3 2 1 0 Distance Distance
  • PathFinder example - SV11 SV11 is a RNA switch of length 115 E = −69.2 kcal/mol E = −96.4 kcal/mol metastable stable template for Qβ replicase not a template
  • SV11 refolding path 1/3: Esaddle = −52.2 kcal/mol -50 ✛ ✛ -60 Energy (kcal/mol) -70 ■ ❅ ❅ ❅ ✕ ✁ ✁ ✁ -80 ✁ -90 ✲ -100 0 10 20 30 40 50 60 70 80 steps
  • SV11 refolding path 2/3: Esaddle = −57.7 kcal/mol -50 ✟ ✟✟ ✙ -60 Energy (kcal/mol) -70 ❄ ■ ❅ ❅ ❅ -80  ✒     -90 ✲ -100 0 10 20 30 40 50 60 70 80 steps
  • SV11 refolding path 3/3: Esaddle = −59.2 kcal/mol -50 ✟ ✟✟ ✙ -60 Energy (kcal/mol) -70 ❄ ■ ❅ ❅ ❅ GC GC A GC C U U AU CG CG CG CG CG -80 CG UAA UAA ✒   CG GC GC GC GC   GC GC UA CG AU CC G C U GG C GA CG A G C C U GC U G G C GUU U C C GA G AC C GC C C C U CU A U G A A G A U A A U A U A G GCU GGCC -90 ✲ -100 0 10 20 30 40 50 60 70 80 steps
  • SV11 refolding paths -50 -60 Energy (kcal/mol) -70 -80 -90 -100 0 10 20 30 40 50 60 70 80 steps
  • libPF - a generic path sampling library In practice, this path sampling heuristics is implemented as a C library All structures along a path are stored in a hash and therefore available for the next iterations Heuristics routines are strictly separated from model-dependent routines, i.e. the library is completely generic Currently, RNA secondary structures and lattice proteins are implemented It is easy to extend the functionality to other discrete systems
  • Conclusion Discrete models allow a detailed study of the energy surface Barrier trees represent the landscape topology The lower part of the energy landscape is accessible by a flooding approach Macrostate folding kinetics reduces simulation time drastically A path sampling approach yields low-energy refolding paths and is a valuable tool for further kinetics studies
  • Conclusion Discrete models allow a detailed study of the energy surface Barrier trees represent the landscape topology The lower part of the energy landscape is accessible by a flooding approach Macrostate folding kinetics reduces simulation time drastically A path sampling approach yields low-energy refolding paths and is a valuable tool for further kinetics studies
  • Conclusion Discrete models allow a detailed study of the energy surface Barrier trees represent the landscape topology The lower part of the energy landscape is accessible by a flooding approach Macrostate folding kinetics reduces simulation time drastically A path sampling approach yields low-energy refolding paths and is a valuable tool for further kinetics studies
  • Conclusion Discrete models allow a detailed study of the energy surface Barrier trees represent the landscape topology The lower part of the energy landscape is accessible by a flooding approach Macrostate folding kinetics reduces simulation time drastically A path sampling approach yields low-energy refolding paths and is a valuable tool for further kinetics studies
  • Conclusion Discrete models allow a detailed study of the energy surface Barrier trees represent the landscape topology The lower part of the energy landscape is accessible by a flooding approach Macrostate folding kinetics reduces simulation time drastically A path sampling approach yields low-energy refolding paths and is a valuable tool for further kinetics studies
  • Conclusion Discrete models allow a detailed study of the energy surface Barrier trees represent the landscape topology The lower part of the energy landscape is accessible by a flooding approach Macrostate folding kinetics reduces simulation time drastically A path sampling approach yields low-energy refolding paths and is a valuable tool for further kinetics studies
  • References Christoph Flamm, Ivo Hofacker, Peter Stadler Rolf Backofen, Sebastian Will, Martin Mann M. T. Wolfinger, W. A. Svrcek-Seiler, C. Flamm, I. L. Hofacker, and P. F. Stadler. Efficient computation of RNA folding dynamics. J. Phys. A: Math. Gen., 37(17):4731–4741, 2004. M. T. Wolfinger, S. Will, I. L. Hofacker, R. Backofen, and P. F. Stadler. Exploring the lower part of discrete polymer model energy landscapes. Europhys. Lett., 74(4):725–732, 2006. Ch. Flamm, W. Fontana, I.L. Hofacker, and P. Schuster. RNA folding at elementary step resolution. RNA, 6:325–338, 2000 C. Flamm, I. L. Hofacker, P. F. Stadler, and M. T. Wolfinger. Barrier trees of degenerate landscapes. Z. Phys. Chem., 216:155–173, 2002.