• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
A Genetic Programming Challenge: Evolving the Energy Function for Protein Structure Prediction
 

A Genetic Programming Challenge: Evolving the Energy Function for Protein Structure Prediction

on

  • 1,043 views

In this talk I introduce a computational challenge for GP researchers, namely, the automated synthesis of energy functions for protein structure prediction.

In this talk I introduce a computational challenge for GP researchers, namely, the automated synthesis of energy functions for protein structure prediction.

Statistics

Views

Total Views
1,043
Views on SlideShare
1,012
Embed Views
31

Actions

Likes
1
Downloads
22
Comments
0

4 Embeds 31

http://www.cs.nott.ac.uk 27
http://www.slideshare.net 2
http://localhost 1
http://www.slashdocs.com 1

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    A Genetic Programming Challenge: Evolving the Energy Function for Protein Structure Prediction A Genetic Programming Challenge: Evolving the Energy Function for Protein Structure Prediction Presentation Transcript

    • Evolving energy function for protein structure prediction Paweł Widera pawel.widera@cs.nott.ac.uk Natalio Krasnogor, Jonathan Garibaldi Department of Computer Science Ben-Gurion University of the Negev Beer Sheva, Israel 2009-06-30
    • Outline 1 Introduction 2 Protein energy models 3 Genetic Programming problem formulation 4 Results 5 Conclusions Pawel Widera Evolving energy function for PSP 2009-06-30 2 / 26
    • Protein structure prediction From 1D sequence to 3D structure LFSKELRCMMYGFGDDQNPYTESVDILEDLVIEFITEMTHKAMSIFSEEQLNRYEMYRRSAFPKAA IKRLIQSITGTSVSQNVVIAMSGISKVFVGEVVEEALDVCEKWGEMPPLQPKHMREAVRRLKSKGQIP Protein basics 20 aminoacid alphabet sequence encodes structure structure determines activity Pawel Widera Evolving energy function for PSP 2009-06-30 3 / 26
    • International prediction contest Critical Assessment of techniques for protein Structure Prediction CASP facts biannual competition started in 1994 parallel prediction and experimental verification model assesment by human experts Prediction difficulty comparative modelling (sequence similarity) fold recognition (new or existing) ab initio modelling (first principles) Pawel Widera Evolving energy function for PSP 2009-06-30 4 / 26
    • Ab initio predictor schema From sequence to the final model Target sequence PSIPRED Secondary Fold SAM-T02 PSI-BLAST structure recognition JUFO prediction and threading Initial ab initio prediction Optimisation Clustering Final models Pawel Widera Evolving energy function for PSP 2009-06-30 5 / 26
    • Ab initio predictor schema From sequence to the final model Target sequence PSIPRED Secondary Fold SAM-T02 PSI-BLAST structure recognition JUFO prediction and threading Initial ab initio prediction Optimisation Clustering Final models Pawel Widera Evolving energy function for PSP 2009-06-30 5 / 26
    • The algorithm of folding Anfinsen’s thermodynamic hypothesis [Anfinsen, 1973] Refolding experiment folds to the same native state native state is energetically stable Energy funnel roll down free energy hill avoid local minima traps [Dill and Chan, 1997] Pawel Widera Evolving energy function for PSP 2009-06-30 6 / 26
    • Model assesment Correlation between energy and similarity to native Similarity measure i=N 1 RMSD = δi2 N i=1 Decoys generated by I-TASSER [Wu et al., 2007] Robetta [Rohl et al., 2004] Pawel Widera Evolving energy function for PSP 2009-06-30 7 / 26
    • Model assesment Correlation between energy and similarity to native Similarity measure i=N 1 RMSD = δi2 N i=1 Decoys generated by I-TASSER [Wu et al., 2007] Robetta [Rohl et al., 2004] Pawel Widera Evolving energy function for PSP 2009-06-30 7 / 26
    • All-atom force field Folding simulation l bonds ki i 2 (l − li0 )+ angles kiθ i 2 (θ − θi0 )+ ω torsions Vi i 2 [1 + cos(ni ωi − φi )]+ σij 12 σij 6 qi qj N−1 N i=1 j=i+1 4 ij rij − rij + 4π 0 rij Intermolecular forces bond forces (stretching, bending, rotating) short range forces (Pauli repulsion, van der Waals’ interactions) electrostatic forces (Coulomb’s law) Rosetta@home in CASP7 140k computers (37 TFLOPS) — 500k CPU hours per domain Pawel Widera Evolving energy function for PSP 2009-06-30 8 / 26
    • All-atom force field Folding simulation l bonds ki i 2 (l − li0 )+ angles kiθ i 2 (θ − θi0 )+ ω torsions Vi i 2 [1 + cos(ni ωi − φi )]+ σij 12 σij 6 qi qj N−1 N i=1 j=i+1 4 ij rij − rij + 4π 0 rij Intermolecular forces bond forces (stretching, bending, rotating) short range forces (Pauli repulsion, van der Waals’ interactions) electrostatic forces (Coulomb’s law) Rosetta@home in CASP7 140k computers (37 TFLOPS) — 500k CPU hours per domain Pawel Widera Evolving energy function for PSP 2009-06-30 8 / 26
    • Simplified knowledege-based potential Protein structure prediction i −1 i +1 ˆ ni ˆ bi ˆ vi i Example Estiff = ˆ ˆ ˆ ˆ −λvi · vi+4 − λ bi · bi+2 − λΘ1 (i) + Θ2 (i) + Θ3 (i) i Eenv = i V (NPi , NAi , NOi , Ai ) Pawel Widera Evolving energy function for PSP 2009-06-30 9 / 26
    • Energy function Weighted sum of terms vs. evolved function GP input F (T ) = w1 ∗ T1 + . . . wn ∗ Tn [Zhang et al., 2003] terminals: T1 , . . . , T8 T1 ∗T3 T4 −w2 ∗T1 F (T ) = w1 ∗log(T2 ) + sin T5 ∗exp(cos(w1 ∗T3 )) functions: add sub mul div sin cos exp log random ephemerals in range [0,1] GP tree example size = 60 depth = 17 Pawel Widera Evolving energy function for PSP 2009-06-30 10 / 26
    • Energy function Weighted sum of terms vs. evolved function GP input F (T ) = w1 ∗ T1 + . . . wn ∗ Tn [Zhang et al., 2003] terminals: T1 , . . . , T8 T1 ∗T3 T4 −w2 ∗T1 F (T ) = w1 ∗log(T2 ) + sin T5 ∗exp(cos(w1 ∗T3 )) functions: add sub mul div sin cos exp log random ephemerals in range [0,1] GP tree example size = 60 depth = 17 Pawel Widera Evolving energy function for PSP 2009-06-30 10 / 26
    • Energy function Weighted sum of terms vs. evolved function GP input F (T ) = w1 ∗ T1 + . . . wn ∗ Tn [Zhang et al., 2003] terminals: T1 , . . . , T8 T1 ∗T3 T4 −w2 ∗T1 F (T ) = w1 ∗log(T2 ) + sin T5 ∗exp(cos(w1 ∗T3 )) functions: add sub mul div sin cos exp log random ephemerals in range [0,1] GP tree example size = 60 depth = 17 Pawel Widera Evolving energy function for PSP 2009-06-30 10 / 26
    • Fitness evaluation Evolutionary objective 1 construction of the reference ranking RR (decoys sorted by similarity to native) 2 ranking decoys using evolved energy function RE (decoys sorted by energy) 3 rankings comparison - RR vs. RE 4 fitness = average distance for all proteins Pawel Widera Evolving energy function for PSP 2009-06-30 11 / 26
    • Reference ranking construction Correlation between energy and similarity to native R0 0 1 2 3 4 5 6 7 RMSD 3.2 2.1 5.2 1.2 3.5 2.1 4.8 3.5 R1 3 1 5 0 4 7 6 2 R2 3.0 1.5 7.0 0.0 4.5 1.5 6.0 4.5 Ranking types R1 - permutation of indices R2 - averaged ranks Pawel Widera Evolving energy function for PSP 2009-06-30 12 / 26
    • Rankings comparison Measure of distance between rankings 43215 Distance functions 34152 Levenshtein edit distance - O(n) 1 1 1 4 3 → 10 Kendall Tau distance - O( n(n−1) ) 2 1 Spearman footrule distance - O( 2 n2 ) Ranks weighting 4 3 2 1 linear 1 5 5 5 5 → 4.6 sigmoid Pawel Widera Evolving energy function for PSP 2009-06-30 13 / 26
    • Rankings comparison Measure of distance between rankings 43215 Distance functions 34152 Levenshtein edit distance - O(n) 1 1 1 4 3 → 10 Kendall Tau distance - O( n(n−1) ) 2 1 Spearman footrule distance - O( 2 n2 ) Ranks weighting 4 3 2 1 linear 1 5 5 5 5 → 4.6 sigmoid Pawel Widera Evolving energy function for PSP 2009-06-30 13 / 26
    • Decoys sampling Selection vs. noise reduction Simple selection top uniform random Bin based selection equal size equal distance Pawel Widera Evolving energy function for PSP 2009-06-30 14 / 26
    • Decoys sampling Selection vs. noise reduction Simple selection top uniform random Bin based selection equal size equal distance Pawel Widera Evolving energy function for PSP 2009-06-30 14 / 26
    • Decoys sampling Selection vs. noise reduction Simple selection top uniform random Bin based selection equal size equal distance Pawel Widera Evolving energy function for PSP 2009-06-30 14 / 26
    • Decoys sampling Selection vs. noise reduction Simple selection top uniform random Bin based selection equal size equal distance Pawel Widera Evolving energy function for PSP 2009-06-30 14 / 26
    • Decoys sampling Selection vs. noise reduction Simple selection top uniform random Bin based selection equal size equal distance Pawel Widera Evolving energy function for PSP 2009-06-30 14 / 26
    • Decoys sampling Selection vs. noise reduction Simple selection top uniform random Bin based selection equal size equal distance Pawel Widera Evolving energy function for PSP 2009-06-30 14 / 26
    • Decoys sampling Selection vs. noise reduction Simple selection top uniform random Bin based selection equal size equal distance Pawel Widera Evolving energy function for PSP 2009-06-30 14 / 26
    • Experiment design Pawel Widera Evolving energy function for PSP 2009-06-30 15 / 26
    • Evolutionary progress Fitness throughout generations levenshtein-steadystate spearman-generational kendall-generational 0.0040 0.42 0.520 0.0035 0.40 0.515 0.0030 fitness fitness fitness 0.38 0.0025 0.510 0.0020 0.36 0.505 0.0015 0.34 0.0010 0.500 0 200 400 600 800 1000 0 200 400 600 800 1000 0 200 400 600 800 1000 generation generation generation Observations Round I - early saturation Round II - small but constant improvement Pawel Widera Evolving energy function for PSP 2009-06-30 16 / 26
    • Evolutionary progress Fitness throughout generations spearman-linear-ts8-generational spearman-sigmoid-ts8-elitism kendall-ts4-generational 0.44 0.54 0.525 0.42 0.52 0.520 0.40 0.50 0.515 fitness fitness fitness 0.38 0.48 0.510 0.36 0.46 0.505 0.34 0.44 0.32 0.42 0.500 0 200 400 600 800 1000 0 200 400 600 800 1000 0 200 400 600 800 1000 generation generation generation Observations Round I - early saturation Round II - small but constant improvement Pawel Widera Evolving energy function for PSP 2009-06-30 16 / 26
    • Landscape analysis Fitness distribution for the random walk Pawel Widera Evolving energy function for PSP 2009-06-30 17 / 26
    • Landscape analysis Fitness distribution for the random walk 0.8 0.7 0.6 fitness 0.5 0.4 0.3 d-100 d-58 f-100 f-42 random-100 top-100 uniform-100 all Pawel Widera Evolving energy function for PSP 2009-06-30 17 / 26
    • Population diveristy analysis Genotype - Phenotype - Fitness mapping Diversity measures F - fitness entropy (frequency of duplicates) P - root mean square distance between rankings G - number of unique trees <#T, #NT, depth> Pawel Widera Evolving energy function for PSP 2009-06-30 18 / 26
    • Population diveristy analysis Genotype - Phenotype - Fitness mapping Pawel Widera Evolving energy function for PSP 2009-06-30 18 / 26
    • Population diveristy analysis Genotype - Phenotype - Fitness mapping Pawel Widera Evolving energy function for PSP 2009-06-30 18 / 26
    • Improvement over random walk Is the evolution any good? decoys set improvement avg best all 0.78% 0.710 uniform-100 0.96% 0.711 random-100 1.28% 0.713 top-100 1.93% 0.702 s-42 7.76% 0.713 s-100 7.64% 0.772 d-58 8.21% 0.780 d-100 10.88% 0.804 Pawel Widera Evolving energy function for PSP 2009-06-30 19 / 26
    • Best evolved energy functions Comparison to naive combination of energy terms Correlation to RMSD d-100 0.76 (generational+ADF) all decoys 0.30 (steady-state+elitism) best single term 0.24 worst single term -0.20 naive combination of terms 0.12 original I-TASSER energy 0.44 (0.51/0.65) Pawel Widera Evolving energy function for PSP 2009-06-30 20 / 26
    • Best evolved energy functions Comparison to naive combination of energy terms Correlation to RMSD d-100 0.76 (generational+ADF) all decoys 0.30 (steady-state+elitism) best single term 0.24 worst single term -0.20 naive combination of terms 0.12 original I-TASSER energy 0.44 (0.51/0.65) Pawel Widera Evolving energy function for PSP 2009-06-30 20 / 26
    • Best evolved energy functions Comparison to naive combination of energy terms Correlation to RMSD d-100 0.76 (generational+ADF) all decoys 0.30 (steady-state+elitism) best single term 0.24 worst single term -0.20 naive combination of terms 0.12 original I-TASSER energy 0.44 (0.51/0.65) Pawel Widera Evolving energy function for PSP 2009-06-30 20 / 26
    • Best evolved energy functions Comparison to naive combination of energy terms Correlation to RMSD d-100 0.76 (generational+ADF) all decoys 0.30 (steady-state+elitism) best single term 0.24 worst single term -0.20 naive combination of terms 0.12 original I-TASSER energy 0.44 (0.51/0.65) Pawel Widera Evolving energy function for PSP 2009-06-30 20 / 26
    • Comparison to weighted sum of terms Nelder-Mead downhill simplex optimisation spearman-sigmoid correlation method d-100 all d-100 all simplex 0.734 0.638 0.650 0.166 GP 0.835 *0.714 0.740 *0.200 Pawel Widera Evolving energy function for PSP 2009-06-30 21 / 26
    • Distribution of terminals and operators Did the evolution discovered any knowledge? energy term correlation Use of energy terms T1 (E13 ) 0.03 ± 0.11 most frequent: T4 , T5 T2 (E14 ) 0.20 ± 0.17 T3 (E15 ) 0.15 ± 0.15 least frequent: T1 , T6 T4 (Estiff ) 0.24 ± 0.22 T5 (EHB ) −0.16 ± 0.20 Use of operators T6 (Epair ) 0.01 ± 0.14 most frequent T7 (Eelectro ) −0.20 ± 0.23 add, div T8 (Eenv ) 0.04 ± 0.16 least frequent average 0.06 sin, cos, log Pawel Widera Evolving energy function for PSP 2009-06-30 22 / 26
    • Distribution of terminals and operators Did the evolution discovered any knowledge? energy term correlation Use of energy terms T1 (E13 ) 0.03 ± 0.11 most frequent: T4 , T5 T2 (E14 ) 0.20 ± 0.17 T3 (E15 ) 0.15 ± 0.15 least frequent: T1 , T6 T4 (Estiff ) 0.24 ± 0.22 T5 (EHB ) −0.16 ± 0.20 Use of operators T6 (Epair ) 0.01 ± 0.14 most frequent T7 (Eelectro ) −0.20 ± 0.23 add, div T8 (Eenv ) 0.04 ± 0.16 least frequent average 0.06 sin, cos, log Pawel Widera Evolving energy function for PSP 2009-06-30 22 / 26
    • Distribution of terminals and operators Did the evolution discovered any knowledge? energy term correlation Use of energy terms T1 (E13 ) 0.03 ± 0.11 most frequent: T4 , T5 T2 (E14 ) 0.20 ± 0.17 T3 (E15 ) 0.15 ± 0.15 least frequent: T1 , T6 T4 (Estiff ) 0.24 ± 0.22 T5 (EHB ) −0.16 ± 0.20 Use of operators T6 (Epair ) 0.01 ± 0.14 most frequent T7 (Eelectro ) −0.20 ± 0.23 add, div T8 (Eenv ) 0.04 ± 0.16 least frequent average 0.06 sin, cos, log Pawel Widera Evolving energy function for PSP 2009-06-30 22 / 26
    • Distribution of terminals and operators Did the evolution discovered any knowledge? energy term correlation Use of energy terms T1 (E13 ) 0.03 ± 0.11 most frequent: T4 , T5 T2 (E14 ) 0.20 ± 0.17 T3 (E15 ) 0.15 ± 0.15 least frequent: T1 , T6 T4 (Estiff ) 0.24 ± 0.22 T5 (EHB ) −0.16 ± 0.20 Use of operators T6 (Epair ) 0.01 ± 0.14 most frequent T7 (Eelectro ) −0.20 ± 0.23 add, div T8 (Eenv ) 0.04 ± 0.16 least frequent average 0.06 sin, cos, log Pawel Widera Evolving energy function for PSP 2009-06-30 22 / 26
    • Summary Conclusions GP evolved function outperforms linear combination of weights GP choice of energy terms reflects their correlation to RMSD decoys from real prediction process are more difficult to asses bloat control is necessary to evolve more compact functions Ideas for the future more complex total fitness distance measured using ProCKSI consensus Rosetta generated decoys additional energy terms (SA, RCH) Pawel Widera Evolving energy function for PSP 2009-06-30 23 / 26
    • Summary Conclusions GP evolved function outperforms linear combination of weights GP choice of energy terms reflects their correlation to RMSD decoys from real prediction process are more difficult to asses bloat control is necessary to evolve more compact functions Ideas for the future more complex total fitness distance measured using ProCKSI consensus Rosetta generated decoys additional energy terms (SA, RCH) Pawel Widera Evolving energy function for PSP 2009-06-30 23 / 26
    • Thank you! Acknowledgements This work was supported by Marie Curie Action MEST-CT-2004-7597 under the Sixth Framework Programme of the European Community. Ben Gurion University of the Negev’s Distinguished Scientists Visitor Program and Prof. Moshe Sipper. Contact natalio.krasnogor@cs.nott.ac.uk Pawel Widera Evolving energy function for PSP 2009-06-30 24 / 26
    • Publications 1 P. Widera, J.M. Garibaldi, N. Krasnogor Evolutionary design of the energy function for protein structure prediction In IEEE Congress on Evolutionary Computation, CEC’09, p1305–1312, Trondheim, Norway, May 2009 2 P. Widera, J.M. Garibaldi, N. Krasnogor GP challange: evolving the energy function for protein structure prediction submitted to Genetic Programming and Evolvable Machines, 2008 Pawel Widera Evolving energy function for PSP 2009-06-30 25 / 26
    • References Anfinsen, C. (1973). Principles that Govern the Folding of Protein Chains. Science, 181(4096):223–30. Dill, K. A. and Chan, H. S. (1997). From Levinthal to pathways to funnels. Nat Struct Mol Biol, 4(1):10–19. Rohl, C. A., Strauss, C. E. M., Misura, K. M. S., and Baker, D. (2004). Protein Structure Prediction Using Rosetta. In Brand, L. and Johnson, M. L., editors, Numerical Computer Methods, Part D, volume Volume 383 of Methods in Enzymology, pages 66–93. Academic Press. Wu, S., Skolnick, J., and Zhang, Y. (2007). Ab initio modeling of small proteins by iterative TASSER simulations. BMC Biol, 5(1):17. Zhang, Y., Kolinski, A., and Skolnick, J. (2003). TOUCHSTONE II: A New Approach to Ab Initio Protein Structure Prediction. Biophys. J., 85(2):1145–1164. Pawel Widera Evolving energy function for PSP 2009-06-30 26 / 26