Evolving energy function
      for protein structure prediction

                     Paweł Widera
           pawel.widera...
Outline


1   Introduction

2   Protein energy models

3   Genetic Programming problem formulation

4   Results

5   Concl...
Protein structure prediction
From 1D sequence to 3D structure

          LFSKELRCMMYGFGDDQNPYTESVDILEDLVIEFITEMTHKAMSIFSEE...
International prediction contest
Critical Assessment of techniques for protein Structure Prediction




CASP facts
   bian...
Ab initio predictor schema
From sequence to the final model

                                     Target sequence



      ...
Ab initio predictor schema
From sequence to the final model

                                     Target sequence



      ...
The algorithm of folding
Anfinsen’s thermodynamic hypothesis [Anfinsen, 1973]




                                          ...
Model assesment
Correlation between energy and similarity to native




                                                  ...
Model assesment
Correlation between energy and similarity to native




                                                  ...
All-atom force field
Folding simulation
                                         l
                                 bonds k...
All-atom force field
Folding simulation
                                         l
                                 bonds k...
Simplified knowledege-based potential
Protein structure prediction




                                     i −1           ...
Energy function
Weighted sum of terms vs. evolved function



                                                            ...
Energy function
Weighted sum of terms vs. evolved function



                                                            ...
Energy function
Weighted sum of terms vs. evolved function



                                                            ...
Fitness evaluation
Evolutionary objective




  1     construction of the reference ranking RR
        (decoys sorted by s...
Reference ranking construction
Correlation between energy and similarity to native



        R0        0       1         ...
Rankings comparison
Measure of distance between rankings




  43215                      Distance functions
  34152      ...
Rankings comparison
Measure of distance between rankings




  43215                      Distance functions
  34152      ...
Decoys sampling
Selection vs. noise reduction




                                                         Simple selectio...
Decoys sampling
Selection vs. noise reduction




                                                         Simple selectio...
Decoys sampling
Selection vs. noise reduction




                                                         Simple selectio...
Decoys sampling
Selection vs. noise reduction




                                                         Simple selectio...
Decoys sampling
Selection vs. noise reduction




                                                         Simple selectio...
Decoys sampling
Selection vs. noise reduction




                                                         Simple selectio...
Decoys sampling
Selection vs. noise reduction




                                                         Simple selectio...
Experiment design




  Pawel Widera   Evolving energy function for PSP   2009-06-30   15 / 26
Evolutionary progress
Fitness throughout generations



                          levenshtein-steadystate                 ...
Evolutionary progress
Fitness throughout generations



                      spearman-linear-ts8-generational            ...
Landscape analysis
Fitness distribution for the random walk




   Pawel Widera            Evolving energy function for PS...
Landscape analysis
Fitness distribution for the random walk


                   0.8




                   0.7




      ...
Population diveristy analysis
Genotype - Phenotype - Fitness mapping




                                                 ...
Population diveristy analysis
Genotype - Phenotype - Fitness mapping




   Pawel Widera          Evolving energy function...
Population diveristy analysis
Genotype - Phenotype - Fitness mapping




   Pawel Widera          Evolving energy function...
Improvement over random walk
Is the evolution any good?




                  decoys set         improvement            av...
Best evolved energy functions
Comparison to naive combination of energy terms




                                        ...
Best evolved energy functions
Comparison to naive combination of energy terms




                                        ...
Best evolved energy functions
Comparison to naive combination of energy terms




                                        ...
Best evolved energy functions
Comparison to naive combination of energy terms




                                        ...
Comparison to weighted sum of terms
Nelder-Mead downhill simplex optimisation




                        spearman-sigmoid...
Distribution of terminals and operators
Did the evolution discovered any knowledge?




      energy term         correlat...
Distribution of terminals and operators
Did the evolution discovered any knowledge?




      energy term         correlat...
Distribution of terminals and operators
Did the evolution discovered any knowledge?




      energy term         correlat...
Distribution of terminals and operators
Did the evolution discovered any knowledge?




      energy term         correlat...
Summary


Conclusions
   GP evolved function outperforms linear combination of weights
    GP choice of energy terms reflec...
Summary


Conclusions
   GP evolved function outperforms linear combination of weights
    GP choice of energy terms reflec...
Thank you!


Acknowledgements
This work was supported by Marie Curie
Action MEST-CT-2004-7597 under the
Sixth Framework Pr...
Publications


 1     P. Widera, J.M. Garibaldi, N. Krasnogor
       Evolutionary design of the energy function for protei...
References


  Anfinsen, C. (1973).
  Principles that Govern the Folding of Protein Chains.
  Science, 181(4096):223–30.

 ...
Upcoming SlideShare
Loading in …5
×

A Genetic Programming Challenge: Evolving the Energy Function for Protein Structure Prediction

1,102 views

Published on

In this talk I introduce a computational challenge for GP researchers, namely, the automated synthesis of energy functions for protein structure prediction.

Published in: Education, Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
1,102
On SlideShare
0
From Embeds
0
Number of Embeds
34
Actions
Shares
0
Downloads
30
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

A Genetic Programming Challenge: Evolving the Energy Function for Protein Structure Prediction

  1. 1. Evolving energy function for protein structure prediction Paweł Widera pawel.widera@cs.nott.ac.uk Natalio Krasnogor, Jonathan Garibaldi Department of Computer Science Ben-Gurion University of the Negev Beer Sheva, Israel 2009-06-30
  2. 2. Outline 1 Introduction 2 Protein energy models 3 Genetic Programming problem formulation 4 Results 5 Conclusions Pawel Widera Evolving energy function for PSP 2009-06-30 2 / 26
  3. 3. Protein structure prediction From 1D sequence to 3D structure LFSKELRCMMYGFGDDQNPYTESVDILEDLVIEFITEMTHKAMSIFSEEQLNRYEMYRRSAFPKAA IKRLIQSITGTSVSQNVVIAMSGISKVFVGEVVEEALDVCEKWGEMPPLQPKHMREAVRRLKSKGQIP Protein basics 20 aminoacid alphabet sequence encodes structure structure determines activity Pawel Widera Evolving energy function for PSP 2009-06-30 3 / 26
  4. 4. International prediction contest Critical Assessment of techniques for protein Structure Prediction CASP facts biannual competition started in 1994 parallel prediction and experimental verification model assesment by human experts Prediction difficulty comparative modelling (sequence similarity) fold recognition (new or existing) ab initio modelling (first principles) Pawel Widera Evolving energy function for PSP 2009-06-30 4 / 26
  5. 5. Ab initio predictor schema From sequence to the final model Target sequence PSIPRED Secondary Fold SAM-T02 PSI-BLAST structure recognition JUFO prediction and threading Initial ab initio prediction Optimisation Clustering Final models Pawel Widera Evolving energy function for PSP 2009-06-30 5 / 26
  6. 6. Ab initio predictor schema From sequence to the final model Target sequence PSIPRED Secondary Fold SAM-T02 PSI-BLAST structure recognition JUFO prediction and threading Initial ab initio prediction Optimisation Clustering Final models Pawel Widera Evolving energy function for PSP 2009-06-30 5 / 26
  7. 7. The algorithm of folding Anfinsen’s thermodynamic hypothesis [Anfinsen, 1973] Refolding experiment folds to the same native state native state is energetically stable Energy funnel roll down free energy hill avoid local minima traps [Dill and Chan, 1997] Pawel Widera Evolving energy function for PSP 2009-06-30 6 / 26
  8. 8. Model assesment Correlation between energy and similarity to native Similarity measure i=N 1 RMSD = δi2 N i=1 Decoys generated by I-TASSER [Wu et al., 2007] Robetta [Rohl et al., 2004] Pawel Widera Evolving energy function for PSP 2009-06-30 7 / 26
  9. 9. Model assesment Correlation between energy and similarity to native Similarity measure i=N 1 RMSD = δi2 N i=1 Decoys generated by I-TASSER [Wu et al., 2007] Robetta [Rohl et al., 2004] Pawel Widera Evolving energy function for PSP 2009-06-30 7 / 26
  10. 10. All-atom force field Folding simulation l bonds ki i 2 (l − li0 )+ angles kiθ i 2 (θ − θi0 )+ ω torsions Vi i 2 [1 + cos(ni ωi − φi )]+ σij 12 σij 6 qi qj N−1 N i=1 j=i+1 4 ij rij − rij + 4π 0 rij Intermolecular forces bond forces (stretching, bending, rotating) short range forces (Pauli repulsion, van der Waals’ interactions) electrostatic forces (Coulomb’s law) Rosetta@home in CASP7 140k computers (37 TFLOPS) — 500k CPU hours per domain Pawel Widera Evolving energy function for PSP 2009-06-30 8 / 26
  11. 11. All-atom force field Folding simulation l bonds ki i 2 (l − li0 )+ angles kiθ i 2 (θ − θi0 )+ ω torsions Vi i 2 [1 + cos(ni ωi − φi )]+ σij 12 σij 6 qi qj N−1 N i=1 j=i+1 4 ij rij − rij + 4π 0 rij Intermolecular forces bond forces (stretching, bending, rotating) short range forces (Pauli repulsion, van der Waals’ interactions) electrostatic forces (Coulomb’s law) Rosetta@home in CASP7 140k computers (37 TFLOPS) — 500k CPU hours per domain Pawel Widera Evolving energy function for PSP 2009-06-30 8 / 26
  12. 12. Simplified knowledege-based potential Protein structure prediction i −1 i +1 ˆ ni ˆ bi ˆ vi i Example Estiff = ˆ ˆ ˆ ˆ −λvi · vi+4 − λ bi · bi+2 − λΘ1 (i) + Θ2 (i) + Θ3 (i) i Eenv = i V (NPi , NAi , NOi , Ai ) Pawel Widera Evolving energy function for PSP 2009-06-30 9 / 26
  13. 13. Energy function Weighted sum of terms vs. evolved function GP input F (T ) = w1 ∗ T1 + . . . wn ∗ Tn [Zhang et al., 2003] terminals: T1 , . . . , T8 T1 ∗T3 T4 −w2 ∗T1 F (T ) = w1 ∗log(T2 ) + sin T5 ∗exp(cos(w1 ∗T3 )) functions: add sub mul div sin cos exp log random ephemerals in range [0,1] GP tree example size = 60 depth = 17 Pawel Widera Evolving energy function for PSP 2009-06-30 10 / 26
  14. 14. Energy function Weighted sum of terms vs. evolved function GP input F (T ) = w1 ∗ T1 + . . . wn ∗ Tn [Zhang et al., 2003] terminals: T1 , . . . , T8 T1 ∗T3 T4 −w2 ∗T1 F (T ) = w1 ∗log(T2 ) + sin T5 ∗exp(cos(w1 ∗T3 )) functions: add sub mul div sin cos exp log random ephemerals in range [0,1] GP tree example size = 60 depth = 17 Pawel Widera Evolving energy function for PSP 2009-06-30 10 / 26
  15. 15. Energy function Weighted sum of terms vs. evolved function GP input F (T ) = w1 ∗ T1 + . . . wn ∗ Tn [Zhang et al., 2003] terminals: T1 , . . . , T8 T1 ∗T3 T4 −w2 ∗T1 F (T ) = w1 ∗log(T2 ) + sin T5 ∗exp(cos(w1 ∗T3 )) functions: add sub mul div sin cos exp log random ephemerals in range [0,1] GP tree example size = 60 depth = 17 Pawel Widera Evolving energy function for PSP 2009-06-30 10 / 26
  16. 16. Fitness evaluation Evolutionary objective 1 construction of the reference ranking RR (decoys sorted by similarity to native) 2 ranking decoys using evolved energy function RE (decoys sorted by energy) 3 rankings comparison - RR vs. RE 4 fitness = average distance for all proteins Pawel Widera Evolving energy function for PSP 2009-06-30 11 / 26
  17. 17. Reference ranking construction Correlation between energy and similarity to native R0 0 1 2 3 4 5 6 7 RMSD 3.2 2.1 5.2 1.2 3.5 2.1 4.8 3.5 R1 3 1 5 0 4 7 6 2 R2 3.0 1.5 7.0 0.0 4.5 1.5 6.0 4.5 Ranking types R1 - permutation of indices R2 - averaged ranks Pawel Widera Evolving energy function for PSP 2009-06-30 12 / 26
  18. 18. Rankings comparison Measure of distance between rankings 43215 Distance functions 34152 Levenshtein edit distance - O(n) 1 1 1 4 3 → 10 Kendall Tau distance - O( n(n−1) ) 2 1 Spearman footrule distance - O( 2 n2 ) Ranks weighting 4 3 2 1 linear 1 5 5 5 5 → 4.6 sigmoid Pawel Widera Evolving energy function for PSP 2009-06-30 13 / 26
  19. 19. Rankings comparison Measure of distance between rankings 43215 Distance functions 34152 Levenshtein edit distance - O(n) 1 1 1 4 3 → 10 Kendall Tau distance - O( n(n−1) ) 2 1 Spearman footrule distance - O( 2 n2 ) Ranks weighting 4 3 2 1 linear 1 5 5 5 5 → 4.6 sigmoid Pawel Widera Evolving energy function for PSP 2009-06-30 13 / 26
  20. 20. Decoys sampling Selection vs. noise reduction Simple selection top uniform random Bin based selection equal size equal distance Pawel Widera Evolving energy function for PSP 2009-06-30 14 / 26
  21. 21. Decoys sampling Selection vs. noise reduction Simple selection top uniform random Bin based selection equal size equal distance Pawel Widera Evolving energy function for PSP 2009-06-30 14 / 26
  22. 22. Decoys sampling Selection vs. noise reduction Simple selection top uniform random Bin based selection equal size equal distance Pawel Widera Evolving energy function for PSP 2009-06-30 14 / 26
  23. 23. Decoys sampling Selection vs. noise reduction Simple selection top uniform random Bin based selection equal size equal distance Pawel Widera Evolving energy function for PSP 2009-06-30 14 / 26
  24. 24. Decoys sampling Selection vs. noise reduction Simple selection top uniform random Bin based selection equal size equal distance Pawel Widera Evolving energy function for PSP 2009-06-30 14 / 26
  25. 25. Decoys sampling Selection vs. noise reduction Simple selection top uniform random Bin based selection equal size equal distance Pawel Widera Evolving energy function for PSP 2009-06-30 14 / 26
  26. 26. Decoys sampling Selection vs. noise reduction Simple selection top uniform random Bin based selection equal size equal distance Pawel Widera Evolving energy function for PSP 2009-06-30 14 / 26
  27. 27. Experiment design Pawel Widera Evolving energy function for PSP 2009-06-30 15 / 26
  28. 28. Evolutionary progress Fitness throughout generations levenshtein-steadystate spearman-generational kendall-generational 0.0040 0.42 0.520 0.0035 0.40 0.515 0.0030 fitness fitness fitness 0.38 0.0025 0.510 0.0020 0.36 0.505 0.0015 0.34 0.0010 0.500 0 200 400 600 800 1000 0 200 400 600 800 1000 0 200 400 600 800 1000 generation generation generation Observations Round I - early saturation Round II - small but constant improvement Pawel Widera Evolving energy function for PSP 2009-06-30 16 / 26
  29. 29. Evolutionary progress Fitness throughout generations spearman-linear-ts8-generational spearman-sigmoid-ts8-elitism kendall-ts4-generational 0.44 0.54 0.525 0.42 0.52 0.520 0.40 0.50 0.515 fitness fitness fitness 0.38 0.48 0.510 0.36 0.46 0.505 0.34 0.44 0.32 0.42 0.500 0 200 400 600 800 1000 0 200 400 600 800 1000 0 200 400 600 800 1000 generation generation generation Observations Round I - early saturation Round II - small but constant improvement Pawel Widera Evolving energy function for PSP 2009-06-30 16 / 26
  30. 30. Landscape analysis Fitness distribution for the random walk Pawel Widera Evolving energy function for PSP 2009-06-30 17 / 26
  31. 31. Landscape analysis Fitness distribution for the random walk 0.8 0.7 0.6 fitness 0.5 0.4 0.3 d-100 d-58 f-100 f-42 random-100 top-100 uniform-100 all Pawel Widera Evolving energy function for PSP 2009-06-30 17 / 26
  32. 32. Population diveristy analysis Genotype - Phenotype - Fitness mapping Diversity measures F - fitness entropy (frequency of duplicates) P - root mean square distance between rankings G - number of unique trees <#T, #NT, depth> Pawel Widera Evolving energy function for PSP 2009-06-30 18 / 26
  33. 33. Population diveristy analysis Genotype - Phenotype - Fitness mapping Pawel Widera Evolving energy function for PSP 2009-06-30 18 / 26
  34. 34. Population diveristy analysis Genotype - Phenotype - Fitness mapping Pawel Widera Evolving energy function for PSP 2009-06-30 18 / 26
  35. 35. Improvement over random walk Is the evolution any good? decoys set improvement avg best all 0.78% 0.710 uniform-100 0.96% 0.711 random-100 1.28% 0.713 top-100 1.93% 0.702 s-42 7.76% 0.713 s-100 7.64% 0.772 d-58 8.21% 0.780 d-100 10.88% 0.804 Pawel Widera Evolving energy function for PSP 2009-06-30 19 / 26
  36. 36. Best evolved energy functions Comparison to naive combination of energy terms Correlation to RMSD d-100 0.76 (generational+ADF) all decoys 0.30 (steady-state+elitism) best single term 0.24 worst single term -0.20 naive combination of terms 0.12 original I-TASSER energy 0.44 (0.51/0.65) Pawel Widera Evolving energy function for PSP 2009-06-30 20 / 26
  37. 37. Best evolved energy functions Comparison to naive combination of energy terms Correlation to RMSD d-100 0.76 (generational+ADF) all decoys 0.30 (steady-state+elitism) best single term 0.24 worst single term -0.20 naive combination of terms 0.12 original I-TASSER energy 0.44 (0.51/0.65) Pawel Widera Evolving energy function for PSP 2009-06-30 20 / 26
  38. 38. Best evolved energy functions Comparison to naive combination of energy terms Correlation to RMSD d-100 0.76 (generational+ADF) all decoys 0.30 (steady-state+elitism) best single term 0.24 worst single term -0.20 naive combination of terms 0.12 original I-TASSER energy 0.44 (0.51/0.65) Pawel Widera Evolving energy function for PSP 2009-06-30 20 / 26
  39. 39. Best evolved energy functions Comparison to naive combination of energy terms Correlation to RMSD d-100 0.76 (generational+ADF) all decoys 0.30 (steady-state+elitism) best single term 0.24 worst single term -0.20 naive combination of terms 0.12 original I-TASSER energy 0.44 (0.51/0.65) Pawel Widera Evolving energy function for PSP 2009-06-30 20 / 26
  40. 40. Comparison to weighted sum of terms Nelder-Mead downhill simplex optimisation spearman-sigmoid correlation method d-100 all d-100 all simplex 0.734 0.638 0.650 0.166 GP 0.835 *0.714 0.740 *0.200 Pawel Widera Evolving energy function for PSP 2009-06-30 21 / 26
  41. 41. Distribution of terminals and operators Did the evolution discovered any knowledge? energy term correlation Use of energy terms T1 (E13 ) 0.03 ± 0.11 most frequent: T4 , T5 T2 (E14 ) 0.20 ± 0.17 T3 (E15 ) 0.15 ± 0.15 least frequent: T1 , T6 T4 (Estiff ) 0.24 ± 0.22 T5 (EHB ) −0.16 ± 0.20 Use of operators T6 (Epair ) 0.01 ± 0.14 most frequent T7 (Eelectro ) −0.20 ± 0.23 add, div T8 (Eenv ) 0.04 ± 0.16 least frequent average 0.06 sin, cos, log Pawel Widera Evolving energy function for PSP 2009-06-30 22 / 26
  42. 42. Distribution of terminals and operators Did the evolution discovered any knowledge? energy term correlation Use of energy terms T1 (E13 ) 0.03 ± 0.11 most frequent: T4 , T5 T2 (E14 ) 0.20 ± 0.17 T3 (E15 ) 0.15 ± 0.15 least frequent: T1 , T6 T4 (Estiff ) 0.24 ± 0.22 T5 (EHB ) −0.16 ± 0.20 Use of operators T6 (Epair ) 0.01 ± 0.14 most frequent T7 (Eelectro ) −0.20 ± 0.23 add, div T8 (Eenv ) 0.04 ± 0.16 least frequent average 0.06 sin, cos, log Pawel Widera Evolving energy function for PSP 2009-06-30 22 / 26
  43. 43. Distribution of terminals and operators Did the evolution discovered any knowledge? energy term correlation Use of energy terms T1 (E13 ) 0.03 ± 0.11 most frequent: T4 , T5 T2 (E14 ) 0.20 ± 0.17 T3 (E15 ) 0.15 ± 0.15 least frequent: T1 , T6 T4 (Estiff ) 0.24 ± 0.22 T5 (EHB ) −0.16 ± 0.20 Use of operators T6 (Epair ) 0.01 ± 0.14 most frequent T7 (Eelectro ) −0.20 ± 0.23 add, div T8 (Eenv ) 0.04 ± 0.16 least frequent average 0.06 sin, cos, log Pawel Widera Evolving energy function for PSP 2009-06-30 22 / 26
  44. 44. Distribution of terminals and operators Did the evolution discovered any knowledge? energy term correlation Use of energy terms T1 (E13 ) 0.03 ± 0.11 most frequent: T4 , T5 T2 (E14 ) 0.20 ± 0.17 T3 (E15 ) 0.15 ± 0.15 least frequent: T1 , T6 T4 (Estiff ) 0.24 ± 0.22 T5 (EHB ) −0.16 ± 0.20 Use of operators T6 (Epair ) 0.01 ± 0.14 most frequent T7 (Eelectro ) −0.20 ± 0.23 add, div T8 (Eenv ) 0.04 ± 0.16 least frequent average 0.06 sin, cos, log Pawel Widera Evolving energy function for PSP 2009-06-30 22 / 26
  45. 45. Summary Conclusions GP evolved function outperforms linear combination of weights GP choice of energy terms reflects their correlation to RMSD decoys from real prediction process are more difficult to asses bloat control is necessary to evolve more compact functions Ideas for the future more complex total fitness distance measured using ProCKSI consensus Rosetta generated decoys additional energy terms (SA, RCH) Pawel Widera Evolving energy function for PSP 2009-06-30 23 / 26
  46. 46. Summary Conclusions GP evolved function outperforms linear combination of weights GP choice of energy terms reflects their correlation to RMSD decoys from real prediction process are more difficult to asses bloat control is necessary to evolve more compact functions Ideas for the future more complex total fitness distance measured using ProCKSI consensus Rosetta generated decoys additional energy terms (SA, RCH) Pawel Widera Evolving energy function for PSP 2009-06-30 23 / 26
  47. 47. Thank you! Acknowledgements This work was supported by Marie Curie Action MEST-CT-2004-7597 under the Sixth Framework Programme of the European Community. Ben Gurion University of the Negev’s Distinguished Scientists Visitor Program and Prof. Moshe Sipper. Contact natalio.krasnogor@cs.nott.ac.uk Pawel Widera Evolving energy function for PSP 2009-06-30 24 / 26
  48. 48. Publications 1 P. Widera, J.M. Garibaldi, N. Krasnogor Evolutionary design of the energy function for protein structure prediction In IEEE Congress on Evolutionary Computation, CEC’09, p1305–1312, Trondheim, Norway, May 2009 2 P. Widera, J.M. Garibaldi, N. Krasnogor GP challange: evolving the energy function for protein structure prediction submitted to Genetic Programming and Evolvable Machines, 2008 Pawel Widera Evolving energy function for PSP 2009-06-30 25 / 26
  49. 49. References Anfinsen, C. (1973). Principles that Govern the Folding of Protein Chains. Science, 181(4096):223–30. Dill, K. A. and Chan, H. S. (1997). From Levinthal to pathways to funnels. Nat Struct Mol Biol, 4(1):10–19. Rohl, C. A., Strauss, C. E. M., Misura, K. M. S., and Baker, D. (2004). Protein Structure Prediction Using Rosetta. In Brand, L. and Johnson, M. L., editors, Numerical Computer Methods, Part D, volume Volume 383 of Methods in Enzymology, pages 66–93. Academic Press. Wu, S., Skolnick, J., and Zhang, Y. (2007). Ab initio modeling of small proteins by iterative TASSER simulations. BMC Biol, 5(1):17. Zhang, Y., Kolinski, A., and Skolnick, J. (2003). TOUCHSTONE II: A New Approach to Ab Initio Protein Structure Prediction. Biophys. J., 85(2):1145–1164. Pawel Widera Evolving energy function for PSP 2009-06-30 26 / 26

×