Protein Structure Prediction using Coarse Grain Force Fields


PhD defense talk - held on 12.02.2010

  1. Protein Structure Prediction using Coarse Grain Force Fields
     Nasir Mahmood, 12.02.2010
  2. Overview
     • Introduction
     • Probabilistic Ab Initio – Standard
       – Score Function
       – Search Method
       – Results
     • Probabilistic Ab Initio – Extended
       – Score Function: Introducing Solvation
       – Search Method: Bias Fix
       – Results
     • Outlook
     • Summary
  3. “All the information required by a protein to adopt its final conformation is encoded in its sequence”
     Christian B. Anfinsen (1916 - 1995). Source: http://nobelprize.org/
     • The information he referred to has not been decoded yet
     • Interestingly, these days we also know about proteins like ‘prions’
  4. Experimental Methods: X-Ray Crystallography, NMR Spectroscopy, Cryo-EM
     [Plot: N versus Time (year)]
  5. Experimental Methods: X-Ray Crystallography, NMR Spectroscopy, Cryo-EM
     [Plot: N versus Time (year)]
     More than 3 decades and only 60000+ structures
  6. Sequence Database Growth and Experimental Methods (X-Ray Crystallography, NMR Spectroscopy, Cryo-EM)
     [Plot: N versus Time (year); vertical axis from 10 × 10^6 to 100 × 10^6]
  7. Methods
     • Experimental Methods (Experimental Data, PDB): X-Ray Crystallography, NMR Spectroscopy, Cryo-EM
     • Computational Methods: Homology Modeling, Fold Recognition, Ab Initio Modeling
     • Diagram labels: Accuracy, Computation cost, PDB dependence, Physical Principles
  8. Ab Initio Methods
     • Monte Carlo Methods
     • Molecular Dynamics
     • Physics-based (Force fields)
       – Best but most difficult
       – Computationally expensive
     • Statistics-based
       – Boltzmann distributions: $P_i = e^{-\Delta E / k_B T}$
       – Statistical mechanical ensembles
     • We use Descriptive Statistics
       – Bayesian formulation
       – No hidden approximations
       – No energies but find distributions
  9. Our Ab Initio Method
     • Coarse Grained
       – reduced dimensionality
       – relies on dihedral angles
       – no side chains
       – 5-atoms representation
     • Purely Probabilistic Force Field
       – Mixture of Probabilities: Sequence, Structure, Solvation
       – No energies
       – No Boltzmann statistics
     • Simulated Annealing / Monte Carlo
       – Move set: biased & unbiased
       – Acceptance criterion: ratio of probabilities
     • Fragment Assembly
  10. Probabilistic Score Function
  11. 1. Sequence
         • Multi-way Bernoulli [Figure: amino acid one-letter codes]
      2. Structure
         • Representation: Reduced, Simplified
           – 5 atoms per amino acid
           – dihedral angles (phi, psi)
         • Bivariate Gaussian
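As an illustration of the two terms on slide 11, the following is a minimal sketch of how one residue could be scored under a single class: a categorical (multi-way Bernoulli) probability for the amino acid type plus a bivariate Gaussian density over (phi, psi). The function names, the per-class parameters, and the assumption that the two terms are independent within a class are all hypothetical and not taken from the talk. The sketch also ignores the periodicity of dihedral angles.

```python
import math

def bivariate_gaussian_logpdf(phi, psi, mean, cov):
    """Log density of a bivariate Gaussian at (phi, psi), angles in degrees.

    mean = (mu_phi, mu_psi); cov = ((s11, s12), (s12, s22)).
    Simplification: angle wrap-around at +/-180 is not handled.
    """
    (s11, s12), (_, s22) = cov
    det = s11 * s22 - s12 * s12
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    d_phi = phi - mean[0]
    d_psi = psi - mean[1]
    # Mahalanobis distance using the closed-form inverse of a 2x2 matrix
    maha = (s22 * d_phi ** 2 - 2 * s12 * d_phi * d_psi + s11 * d_psi ** 2) / det
    return -0.5 * maha - math.log(2 * math.pi) - 0.5 * math.log(det)

def residue_logprob(aa, phi, psi, aa_probs, mean, cov):
    """Log probability of one residue under one class (hypothetical layout).

    aa_probs maps one-letter amino acid codes to probabilities
    (the multi-way Bernoulli term); the structure term is the Gaussian.
    Assumes the two terms are independent given the class.
    """
    return math.log(aa_probs[aa]) + bivariate_gaussian_logpdf(phi, psi, mean, cov)
```

Working in log space keeps products of many small probabilities numerically stable when several residues are combined.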
  12. [Figure, panels (A) to (C): overlapping fragments at positions i, i+1 and i+2 (A S T C W R I; S T C W R I M; T C W R I M F; …; N P L E N R R V), each shown with its Sequence and Structure values; the figure also shows the number 1.5 × 10^6]
  13. Fragment Generation
      • Diagram labels: Fragment Library, Bayesian Classifier, Expectation Maximization, Statistical Models
      • Classified fragments: ACAD .., CCAD .., WFTG .., STST.., STDC .., WFDC .., DCWF .., GAEG .., GAEG .., GGGG ..
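The classification step on slide 13 can be illustrated with Bayes' rule for a mixture model. This is a minimal sketch, assuming each class has a prior weight and a likelihood for the fragment; the function name and inputs are hypothetical, and the talk does not specify the classifier's internals. The same posterior calculation is what an Expectation Maximization procedure would evaluate in its E-step.

```python
import math

def class_posteriors(log_weights, log_likelihoods):
    """Posterior class memberships for one fragment (hypothetical helper).

    log_weights[k]     : log prior weight of class k
    log_likelihoods[k] : log p(fragment | class k)
    Returns (posteriors, log_mixture), where log_mixture is the log of
    sum_k weight_k * p(fragment | class k).
    """
    joint = [w + l for w, l in zip(log_weights, log_likelihoods)]
    # log-sum-exp: subtract the maximum before exponentiating for stability
    top = max(joint)
    log_mixture = top + math.log(sum(math.exp(j - top) for j in joint))
    posteriors = [math.exp(j - log_mixture) for j in joint]
    return posteriors, log_mixture
```

For example, with two equally weighted classes and likelihoods 0.2 and 0.6, the memberships are 0.25 and 0.75, and the mixture probability of the fragment is 0.4.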
  14. [Figure: overlapping fragments (e.g. A S L T, S L T M, L T L T), each with Sequence and Structure values, assigned to class 0 through class 6; Classified fragments: ACAD .., CCAD .., WFTG .., STDC .., STST.., DCWF .., GAEG .., WFDC .., GGGG .., GAEG ..]
  15. Search Method
  16. Relative probabilities: $P_i = \frac{p(x_i)}{p(x_{i-1})}$
      • Normal methods: $P_i = e^{-\Delta E / k_B T}$
      [Figure: Probability over Conformational space, from the Initial (random) conformation through steps (i-1) and (i) to the Final Model]
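The acceptance rule on slide 16 replaces the Boltzmann factor with a ratio of conformation probabilities. The following is a minimal sketch of such a test in log space. The function name is hypothetical, and the way temperature enters (dividing the log ratio by it) is an assumption made for illustration; the slides do not state how annealing is coupled to the ratio.

```python
import math
import random

def accept_move(logp_new, logp_old, temperature=1.0, rng=random):
    """Metropolis-style test on the ratio p(x_i) / p(x_{i-1}).

    logp_new, logp_old : log probabilities of the proposed and current
                         conformations under the score function.
    temperature        : assumed annealing parameter; values above 1
                         flatten the ratio, values below 1 sharpen it.
    rng                : any object with a random() method.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    log_ratio = (logp_new - logp_old) / temperature
    if log_ratio >= 0:
        # The proposed conformation is at least as probable: always accept
        return True
    # Otherwise accept with probability equal to the (tempered) ratio
    return rng.random() < math.exp(log_ratio)
```

Passing `rng` explicitly makes a run reproducible, for instance with `random.Random(42)`.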
  17. • Unbiased: Random Angle Generator
      • Biased: Fragment Library (≈ 2 × 10^6 fragments) from the PDB
      [Figures: phi/psi plots with both axes from -180 to 180; a table of example phi and psi values]
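The two kinds of move on slide 17 can be sketched as follows, assuming a conformation is stored as a list of (phi, psi) pairs in degrees and the fragment library as a list of such lists. The function names and data layout are hypothetical; the talk does not describe how fragments or positions are chosen, so uniform random choice is used here purely for illustration.

```python
import random

def unbiased_move(angles, rng=random):
    """Replace the (phi, psi) pair of one random residue with uniform random angles."""
    new = list(angles)
    i = rng.randrange(len(new))
    new[i] = (rng.uniform(-180.0, 180.0), rng.uniform(-180.0, 180.0))
    return new

def biased_move(angles, fragment_library, rng=random):
    """Overwrite a random window with the angles of a random library fragment.

    fragment_library is a list of fragments; each fragment is a list of
    (phi, psi) pairs taken from known structures (assumed layout).
    """
    new = list(angles)
    fragment = rng.choice(fragment_library)
    if len(fragment) > len(new):
        raise ValueError("fragment is longer than the chain")
    start = rng.randrange(len(new) - len(fragment) + 1)
    new[start:start + len(fragment)] = fragment
    return new
```

Both functions return a new list and leave the input unchanged, so a rejected move needs no undo step.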
  18. Interplay of Cartesian Coordinates & Dihedral Angles
      Choi, V.: 2005, On Updating torsion angles of molecular conformations, J Chem Inf Model 46, 438–444.
  19. Results
  20. Results: 2hfq [Figure: Model and Native]
  21. Results: 2hd3 [Figure: Model and Native]
  22. Results: 2gzv [Figure: Model and Native; plot of Psi versus Phi]
  23. Results: 2hj1 [Figure: Model and Native; plots labelled Score, Time, Temperature]
  24. Results [Plots labelled Psi, Phi, Score, Time, Temperature]
  25. Score Function: Introducing Solvation
  26. (no slide text)
  27. PDB
  28. [Figure labels: PDB; Trp, Gly, Lys, Ser]
  29. 1. Sequence
         • Multi-way Bernoulli [Figure: amino acid one-letter codes]
      2. Structure
         • Representation: Reduced, Simplified
           – 5 atoms per amino acid
           – dihedral angles (phi, psi)
         • Bivariate Gaussian
      3. Solvation
         • Simple Gaussian
  30. • Mixture Models: Re-Classified
      • PDB: Connections, Residues, Geometry, Location in protein
      • Fragment descriptors: Sequence, Structure, Solvation [Figure: example fragments A S L T and S L T I with their numeric values]
      • Diagram labels: Fragment Library, Bayesian Classifier, Expectation Maximization, Statistical Models
      • Classified fragments: ACAD .., CCAD .., WFTG .., STST.., STDC .., WFDC .., DCWF .., GAEG .., GAEG .., GGGG ..
  31. Search Method: Bias Fix & Combining Fragments
  32. Bias Fix
  33. Combining Fragments and Probabilities
  34. Results
  35. Results: 1fsv, 2hep [Figure: Native and Model]
  36. Results: 2k4x, 1agt [Figure: Model and Native]
  37. Results: 2k53, 2k4n [Figure: Native and Model]
  38. Results: 2hf1 [Figure: Native and Model]
  39. Future Outlook
      • Introduce hydrogen bonds – as a probabilistic term
      • Hydrogen bond energies follow a normal distribution
      • Use Simple Gaussian model
      [Plot: N versus Hydrogen bond energy (kcal/mol)]
  40. Summary
      • Purely Probabilistic Approach for Protein Structure Prediction
      • Score function consists of a set of probability distributions
      • Conformation probabilities: mixture of probabilities, no energies at all
      • Generates protein/protein-like conformations
      • Long-range interactions not well represented
      • In future, hydrogen bond term could improve results
      • Application to sequence optimization
      • Rapid sampling – combine with other score functions
  41. Thanks for your attention!
