Computational Protein Design. 2. Computational Protein Design Techniques
 


    • Computational Protein Design. 2. Computational Protein Design Techniques. Pablo Carbonell (pablo.carbonell@issb.genopole.fr), iSSB, Institute of Systems and Synthetic Biology, Genopole, University of Évry-Val d'Essonne, France. mSSB: December 2010.
    • Outline
        1. Introduction
        2. Computational Protein Descriptors
        3. Sequence-based CPD
        4. Structure-based CPD
        5. Search Algorithms in CPD
        6. De Novo Design
        7. Challenges in Sequence and Structure-Based CPD
    • Computational Protein Design
    • A Blueprint of CPD Approaches (RS: research studies)
    • Molecular Signature Descriptors
        • A 2D representation of the molecular graph.
        • Atomic signature: the molecule is treated as an undirected colored graph G(V, E, C), with V: atoms, E: bonds, C: atom types. The signature descriptor of height h of atom x in the molecular graph G, ${}^{h}\sigma(x)$, is a canonical representation of the subgraph of G containing all atoms within distance h of x.
        • Molecular signature: the sum of the atomic signatures over all atoms,
          $${}^{h}\sigma(G) = \sum_{x \in V} {}^{h}\sigma(x) \qquad (1)$$
        • The signature is a systematic codification of the molecular graph [Faulon et al., 2004].
        • Example: σ(methylcyclopropane) =
            1 [C]([H][C]([H][H][C,0])[C,0]([H][H])[C]([H][H][H]))
            2 [C]([H][H][C]([H][C,0][C]([H][H][H]))[C,0]([H][H]))
            1 [C]([H][H][H][C]([H][C]([H][H][C,0])[C,0]([H][H])))
            1 [H]([C]([C]([H][H][C,0])[C,0]([H][H])[C]([H][H][H])))
            4 [H]([C]([H][C]([H][C,0][C]([H][H][H]))[C,0]([H][H])))
            3 [H]([C]([H][H][C]([H][C]([H][H][C,0])[C,0]([H][H]))))
    • Molecular Signature of Reactions and Proteins
        • Signature of a reaction. The signature of a reaction R
          $$S_1 + S_2 + \ldots + S_n \rightarrow P_1 + P_2 + \ldots + P_m \qquad (2)$$
          that transforms n substrates into m products is given by the difference between the signature of the products and the signature of the substrates:
          $${}^{h}\sigma(R) = \sum_{p \in P} {}^{h}\sigma(p) - \sum_{s \in S} {}^{h}\sigma(s) \qquad (3)$$
        • Signature of protein sequences. The protein P is represented by the linear chain given by its collapsed graph at residue level, a reduced molecular graph representation G(V, E, C) known as the string signature, where V: residues a ∈ A, E: contiguity in sequence, C: amino acid type:
          $${}^{h}\sigma(P) = \sum_{a \in A} {}^{h}\sigma(a) \qquad (4)$$
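The sums in Eqs. (1), (3) and (4) can be read as multiset arithmetic over canonical strings. The following is a minimal sketch under that reading, not the canonization procedure of Faulon et al.: atomic signatures are assumed to be already available as strings, the function names are hypothetical, and the residue-level signature is simplified to the sequence window of half-width h as written, without canonical ordering.

```python
from collections import Counter

def molecular_signature(atomic_signatures):
    """Eq. (1): sum atomic signatures (given as canonical strings) into a multiset."""
    return Counter(atomic_signatures)

def reaction_signature(substrates, products):
    """Eq. (3): signature of the products minus signature of the substrates.

    substrates, products: lists of Counter objects (molecular signatures).
    Counts may be negative; entries that cancel out are dropped.
    """
    sig = Counter()
    for p in products:
        sig.update(p)
    for s in substrates:
        sig.subtract(s)
    return {key: count for key, count in sig.items() if count != 0}

def string_signature(sequence, h):
    """Eq. (4), simplified: one entry per residue, the window within h positions."""
    sig = Counter()
    for i in range(len(sequence)):
        sig[sequence[max(0, i - h): i + h + 1]] += 1
    return sig
```

For instance, `string_signature("ACA", 1)` counts the three windows "AC", "ACA" and "CA" once each.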
    • Protein Contact Maps
        • The protein contact map is a graph representation of the 3D interactions at residue level G(V, E, C), where V: residues, E: contacts, C: amino acid type.
        • Two residues are considered to interact when atoms of the two residues are at a distance lower than a predetermined threshold (typically 4.5 to 5 Å).
        • Contact maps can account for long-range interactions and conformational states.
        • Figure from [Song et al., 2010]
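As an illustration of the contact definition above, here is a small sketch that builds the edge set of a contact map from atom coordinates. The input layout (one list of (x, y, z) tuples per residue) and the function name are assumptions made for the example; a real pipeline would read the coordinates from a structure file.

```python
import math

def contact_map(residues, threshold=4.5):
    """Return the set of residue index pairs (i, j), i < j, that are in contact.

    residues: list with one entry per residue, each a list of (x, y, z) atom
    coordinates in Å. Two residues are in contact when any pair of their atoms
    is closer than `threshold`.
    """
    contacts = set()
    for i in range(len(residues)):
        for j in range(i + 1, len(residues)):
            if any(math.dist(a, b) < threshold
                   for a in residues[i] for b in residues[j]):
                contacts.add((i, j))
    return contacts
```

Each returned pair is an edge E of the graph G(V, E, C); the amino acid types C would be carried alongside as node labels.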
    • Sequence and Structure-Based CPD: sequence-based CPD methods are in some cases a good trade-off between the complexity of the model and the accuracy of the predictions.
    • Sequence-based Knowledge-based Potentials
        • The simplest way to score a protein and to identify active regions is through amino acid scales or indexes.
        • AAindex is a database of:
            • 544 amino acid indexes
            • 94 amino acid matrices
            • 47 amino acid pair-wise contact potentials
        • Examples: hydrophobicity, accessibility, van der Waals volume, secondary structure propensity, flexibility.
        • This approach is widely used when analyzing conserved motifs and correlated mutations in protein fold families through multiple alignments.
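To make the idea of scoring with an amino acid scale concrete, the sketch below averages one index over a sliding window. The Kyte-Doolittle hydropathy scale is used purely as an example of such an index; the function name and the window size are assumptions.

```python
# Kyte-Doolittle hydropathy values, used here as one example of an amino acid index.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def window_scores(sequence, scale, window=3):
    """Average an amino acid index over a sliding window (window should be odd).

    Returns one score per position that has a full window around it.
    """
    half = window // 2
    scores = []
    for i in range(half, len(sequence) - half):
        segment = sequence[i - half: i + half + 1]
        scores.append(sum(scale[aa] for aa in segment) / window)
    return scores
```

Any other scale from a database such as AAindex could be passed as `scale`, provided it maps the 20 one-letter codes to numbers.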
    • Quantitative Structure-Activity Relationship (QSAR) Techniques
        • QSAR is a statistical method used extensively by the chemical and pharmaceutical industries in small-molecule and peptide optimization.
        • The goal is to model causal relationships between:
            • structures of interacting molecules
            • measurable properties of scientific or commercial interest, such as ADME/Tox (absorption, distribution, metabolism, excretion, and toxicity) of drugs
    • QSAR Model Evaluation
        • Model predictability is generally evaluated through the leave-one-out (LOO) cross-validation correlation coefficient q².
        • Partial least-squares (PLS) regression is commonly used.
        • Additional nonlinear terms can be added through the use of nonlinear regression or machine learning techniques (kernel methods, random forests, etc.).
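A minimal sketch of the LOO evaluation follows. It assumes the common definition q² = 1 − PRESS / SS_tot and, to stay self-contained, swaps PLS for an ordinary least-squares fit on a single descriptor; both choices are simplifications made for the example, and definitions of q² vary slightly between authors.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept for one descriptor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x

def loo_q2(xs, ys):
    """Leave-one-out q2: refit without each sample, then predict that sample."""
    press = 0.0  # predictive residual sum of squares
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot
```

A q² close to 1 indicates that held-out samples are predicted well; values near or below 0 indicate a model no better than predicting the mean.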
    • QSAR Modeling Workflow
    • The ProSAR Algorithm
        • An extension of SAR-based approaches to CPD.
        • It formalizes the decision-making processes about which mutations to include in combinatorial libraries:
          $$y = \sum_{i=1}^{N} \sum_{j \in A} c_{ij} x_{ij} \qquad (5)$$
        • y: the predicted function (activity) of the protein sequence
        • c_ij: the regression coefficients corresponding to the mutational effect of having residue j among the 20 amino acids A at position i
        • x_ij: binary variable indicating the presence or absence of residue j at position i
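Equation (5) can be evaluated directly once the coefficients are known. The sketch below assumes the regression has already been fitted and that the coefficients are stored in a dictionary keyed by (position, residue); the names and the toy coefficient values are hypothetical.

```python
def predict_activity(sequence, coeffs):
    """Eq. (5): y = sum over i, j of c_ij * x_ij.

    x_ij is 1 only for the residue actually present at position i, so the
    double sum reduces to one coefficient lookup per position. Pairs missing
    from `coeffs` are treated as having no effect (coefficient 0).
    """
    return sum(coeffs.get((i, aa), 0.0) for i, aa in enumerate(sequence))

def beneficial_mutations(coeffs):
    """List (position, residue) pairs with a positive fitted coefficient."""
    return sorted(key for key, c in coeffs.items() if c > 0)
```

With toy coefficients `{(0, "A"): 0.5, (0, "G"): -0.2, (1, "L"): 1.0}`, the sequence "AL" scores 1.5 and "GL" scores 0.8, so a library built on this model would keep A at position 0.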
    • Improving Catalytic Function by ProSAR-driven Enzyme Evolution
        • Statistical analysis of protein sequence-activity relationships
        • Bacterial biocatalysis of atorvastatin (Lipitor), a cholesterol-lowering drug
        • Codexis Inc.
    • Structure-based CPD
        • Energy functions and molecular force fields
        • Local conformational restrictions
        • Predicting entropic factors
        • Protein topological properties
        • Figure from [Narasimhan et al., 2010]
    • Energy Functions and Molecular Force Fields
        • In structure-based CPD, folds are usually represented by the spatial coordinates of the backbone atoms, or design scaffold.
        • Protein design is done by placing amino acid side chains along the scaffold.
        • Side chains are only permitted to assume a discrete set of statistically preferred conformations: rotamers.
        • Rotamer/backbone and rotamer/rotamer interaction energies are tabulated.
        • These potential energies can then be approximated by using any of the standard force fields: CHARMM, AMBER, GROMOS.
    • Molecular Force Fields
        • AMBER: a classical force field for energy and MD calculations:
          $$V(r^N) = \sum_{\text{bonds}} \frac{1}{2} k_b (l - l_0)^2 + \sum_{\text{angles}} \frac{1}{2} k_a (\theta - \theta_0)^2 + \sum_{\text{torsions}} \frac{1}{2} V_n \left[1 + \cos(n\omega - \gamma)\right] + \sum_{j=1}^{N-1} \sum_{i=j+1}^{N} \left\{ \epsilon_{i,j} \left[ \left(\frac{r_{0ij}}{r_{ij}}\right)^{12} - 2 \left(\frac{r_{0ij}}{r_{ij}}\right)^{6} \right] + \frac{q_i q_j}{4\pi\epsilon_0 r_{ij}} \right\} \qquad (6)$$
        1. $\sum_{\text{bonds}}(\cdot)$: energy between covalently bonded atoms.
        2. $\sum_{\text{angles}}(\cdot)$: energy due to the geometry of electron orbitals involved in covalent bonding.
        3. $\sum_{\text{torsions}}(\cdot)$: energy for twisting a bond due to bond order (e.g. double bonds) and neighboring bonds or lone pairs of electrons.
        4. $\sum_{j=1}^{N-1} \sum_{i=j+1}^{N}(\cdot)$: non-bonded energy between all atom pairs:
            1. van der Waals energies
            2. Electrostatic energies
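The non-bonded term of Eq. (6) for a single atom pair can be sketched as follows. The function name is hypothetical, and the default value of `coulomb_const`, which stands in for 1/(4πε₀) in units of kcal·Å/(mol·e²), is an assumed approximate value rather than the constant of any particular AMBER release.

```python
def nonbonded_pair(r, r0, eps, qi, qj, coulomb_const=332.06):
    """Non-bonded energy of one atom pair, following the last term of Eq. (6).

    r: interatomic distance (Å); r0: distance of the van der Waals minimum (Å);
    eps: well depth (kcal/mol); qi, qj: partial charges (units of e).
    """
    ratio6 = (r0 / r) ** 6
    van_der_waals = eps * (ratio6 ** 2 - 2.0 * ratio6)  # 12-6 form, minimum -eps at r = r0
    electrostatic = coulomb_const * qi * qj / r
    return van_der_waals + electrostatic
```

At r = r0 the van der Waals part equals −eps, which is why this form of the 12-6 potential carries the factor 2 on the attractive term.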
    • Structure-based Knowledge-based Potentials
        • They are built by performing a large-scale statistical study of structural databases such as the PDB (Protein Data Bank).
        • Rotamer libraries (∼150 rotameric states)
        • Binary patterning: only some types of amino acids are allowed, based on the hydrophobic environment
        • An implicit solvation model
        • Secondary structure propensity
        • Frequency of small segments in the PDB
        • Pairwise potentials
        • van der Waals interactions
        • Hydrogen bonding
        • Electrostatics
        • Entropy-based penalties for flexible side chains
        • From [Boas and Harbury, 2007]
    • Energy Functions
        • Design along the backbone or scaffold.
        • Rotamer/backbone and rotamer/rotamer interaction energies are tabulated.
        • They are precomputed from molecular force fields: CHARMM, AMBER, GROMOS.
        • Total energy of the protein:
          $$E_{TOT} = \sum_{k} E_k(r_k) + \sum_{k \neq l} E_{kl}(r_k, r_l) \qquad (7)$$
        • N: length of the protein
        • r_k: the rotamer of the kth side chain
        • E_k(r_k): the self-energy of a particular rotamer r_k
        • E_kl(r_k, r_l): the pair energy of rotamers r_k, r_l
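Equation (7) amounts to table lookups once the self and pair energies are tabulated. The sketch below assumes a particular table layout (a list of lists for self-energies, a dictionary keyed by position pairs k < l for pair energies) and counts each unordered pair once; the layout, the names and the toy numbers are all illustrative assumptions.

```python
def total_energy(assignment, self_energy, pair_energy):
    """Eq. (7) from precomputed tables, counting each position pair once.

    assignment: rotamer index chosen at each position.
    self_energy[k][r]: self-energy of rotamer r at position k.
    pair_energy[(k, l)][rk][rl]: pair energy for positions k < l.
    """
    n = len(assignment)
    energy = sum(self_energy[k][assignment[k]] for k in range(n))
    for k in range(n):
        for l in range(k + 1, n):
            energy += pair_energy[(k, l)][assignment[k]][assignment[l]]
    return energy

# Toy tables: 2 positions with 2 rotamers each (made-up numbers).
SELF_E = [[0.0, 1.0], [0.5, 0.0]]
PAIR_E = {(0, 1): [[2.0, 0.0], [0.0, -3.0]]}
```

With these toy tables, choosing rotamer 1 at both positions gives 1.0 + 0.0 − 3.0 = −2.0, the lowest of the four possible assignments.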
    • The Role of Dynamics
        • Besides protein structure, protein dynamics can play a direct role in molecular recognition.
        • Flexible proteins recognize their targets through induced fit or conformational selection, likely showing promiscuity.
        • Binding is commonly enthalpy-driven, but in some cases entropy is important, for instance:
            • Proteins with multiple binding sites
            • Small hydrophobic molecules
        • Two types of sources of protein motions:
            • Protein flexibility: intraconformational dynamics (fast time scale motions)
            • Conformational heterogeneity: interconformational dynamics
        • Gibbs free energy:
          $$\Delta G = \Delta H - T \Delta S \qquad (8)$$
          $$\Delta S = \Delta S_{solv} + \Delta S_{conf} + \Delta S_{rt} \qquad (9)$$
        • ΔS_conf: conformational entropy of protein and ligand
        • ΔS_rt: rotational and translational degrees of freedom
    • Predicting Side-chain Dynamics from Structural Descriptors
        • The Lipari-Szabo model-free approach quantifies motions from NMR experiments by computing the generalized order parameter S².
        • Protein backbone dynamics: ¹⁵N-H and ¹³Cα-H NMR relaxation methods
        • Protein side-chain methyl dynamics: ¹³Cα-H NMR relaxation methods (side-chain motions in the picosecond-to-nanosecond time regime)
        • From the BMRB we compiled S² data for 18 proteins, including 10 proteins in 2 or more different states: calmodulin, barnase, PDZ, MUP, DHFR, staphylococcal nuclease, Pin1, SH3 domain, MSG.
        • This technique only provides measurements for the Cα of methyl groups in side chains: ALA, LEU, ILE, MET, THR, VAL.
    • Structural Descriptors of Methyl Dynamics
        • We consider the following parameters influencing side-chain dynamics:
        • Packing density at the methyl site i and its neighboring residues j within a sphere of r = 5 Å:
          $$P_i = \sum_{r_{ij} < 5\,\text{Å}} C_j \, e^{-r_{ij}} = \sum_{r_{ij} < 5\,\text{Å}} \left( \sum_{r_{jk} < 5\,\text{Å}} e^{-r_{jk}} \right) e^{-r_{ij}} \qquad (10)$$
        • Side-chain stiffness: number of dihedral angles separating the backbone from the methyl carbon, weighted by the side-chain packing
        • Rotameric state: angular distance Δχ = χ − χ₀ to the closest rotameric state χ₀ in the library
        • Elongation: distance from the methyl site to the Cα
        • Pairwise contact potential: a knowledge-based potential of the frequency of contacts between residues at several distances, computed from the PDB
        • Solvation effect: DSSP accessibility and residue hydrophobicity
        • Van der Waals contacts
        • Hydrogen bonds (in the case of threonine)
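The packing density of Eq. (10) can be sketched as a double sum over neighbors. This example assumes every site is reduced to a single point, and that a site is excluded from its own neighbor list (so no r = 0 term appears); both are assumptions made for the illustration, as is the function name.

```python
import math

def packing_density(sites, i, cutoff=5.0):
    """Eq. (10): P_i = sum over neighbors j of C_j * exp(-r_ij),
    with C_j = sum over neighbors k of j of exp(-r_jk).

    sites: list of (x, y, z) coordinates in Å, one point per site.
    """
    def neighbors(j):
        # All other sites closer than the cutoff, with their distance to j.
        result = []
        for k in range(len(sites)):
            if k != j:
                r = math.dist(sites[j], sites[k])
                if r < cutoff:
                    result.append((k, r))
        return result

    total = 0.0
    for j, r_ij in neighbors(i):
        c_j = sum(math.exp(-r_jk) for _, r_jk in neighbors(j))
        total += c_j * math.exp(-r_ij)
    return total
```

For two sites 1 Å apart, site 0 has P = e⁻¹ · e⁻¹ = e⁻², since its only neighbor has C = e⁻¹.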
    • Predicting Methyl Side-chain Dynamics
        • Algorithm: neural network
        • Cross-validation: r = 0.71 ± 0.029 (p-value = 4.6 × 10⁻⁸⁷)
        • Example: experimental and predicted changes in ΔS² of barnase after binding barstar (ΔS² > 0: rigidification; ΔS² < 0: flexibilization)
        • Comparison with molecular dynamics (MD):

          Protein      MD method    r (MD)   r (nnet)
          ubiquitin    AMBER99SB    0.81     0.81
          TNfn3        CHARMM 22    0.62     0.79
          FNfn10       CHARMM 22    0.51     0.64
          barnase      OPLS-AA/L    0.55     0.64
          calmodulin   FDPB         0.60     0.72

        • [Carbonell and del Sol, 2009]
    • Search Algorithms in CPD
    • Search Algorithms
        • Objective: finding the best design within the space of all possible amino acid/rotameric states.
        • A vast search space: 20^N or p^N
            • N: number of positions to mutate
            • p: number of rotameric states
        • Strategies
            • Deterministic algorithms
                • Dead-end elimination (DEE) algorithm: a pruning method. Some accelerations of the DEE algorithm: upper-bound estimation; the “magic bullet” metric; conformational splitting; background optimization
            • Stochastic algorithms
                • Monte Carlo
                • Simulated annealing
                • Genetic algorithms
    • The DEE Algorithm
        • It assumes that the energy of the protein can be written as
          $$E_{TOT} = \sum_{k} E_k(r_k) + \sum_{k \neq l} E_{kl}(r_k, r_l) \qquad (11)$$
        • N: length of the protein
        • r_k: the rotamer of the kth side chain
        • E_k(r_k): the self-energy of a particular rotamer r_k
        • E_kl(r_k, r_l): the pair energy of the rotamers r_k, r_l
        • Complexity:
            • Single search scales quadratically with the total number of rotamers: O((p × N)²)
            • Pair search scales cubically: O((p × N)³)
            • Brute-force enumeration: O(p^N)
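For contrast with the pruning approach, brute-force enumeration over all p^N assignments can be sketched in a few lines. The table layout (self-energies as a list of lists, pair energies keyed by position pairs k < l, each pair counted once), the function names and the toy numbers are assumptions made for this example.

```python
from itertools import product

def total_energy(assignment, self_energy, pair_energy):
    """Energy of Eq. (11) from lookup tables, counting each position pair once."""
    n = len(assignment)
    energy = sum(self_energy[k][assignment[k]] for k in range(n))
    for k in range(n):
        for l in range(k + 1, n):
            energy += pair_energy[(k, l)][assignment[k]][assignment[l]]
    return energy

def brute_force(self_energy, pair_energy):
    """Enumerate every rotamer assignment and return the lowest-energy one."""
    choices = [range(len(rotamers)) for rotamers in self_energy]
    best = min(product(*choices),
               key=lambda a: total_energy(a, self_energy, pair_energy))
    return best, total_energy(best, self_energy, pair_energy)

def search_space_size(p, n):
    """Number of assignments for n positions with p states each."""
    return p ** n

# Toy tables: 2 positions with 2 rotamers each (made-up numbers).
SELF_E = [[0.0, 1.0], [0.5, 0.0]]
PAIR_E = {(0, 1): [[2.0, 0.0], [0.0, -3.0]]}
```

Enumeration is only feasible for toy cases: with 20 amino acids, even 3 positions already give 8000 sequences, and the count grows exponentially with N.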
    • The DEE Algorithm
        • Single rotamers and rotamer pairs are eliminated during the computational cycles.
        • Single elimination: eliminate rotamer $r_k^A$ if some other rotamer $r_k^B$ at the same position gives a better energy even in the worst case:
          $$E_k(r_k^A) + \sum_{l=1, l \neq k}^{N} \min_X E_{kl}(r_k^A, r_l^X) > E_k(r_k^B) + \sum_{l=1, l \neq k}^{N} \max_X E_{kl}(r_k^B, r_l^X) \qquad (12)$$
        • Pairs elimination: eliminate a pair of rotamers in two positions if there exists another pair that gives a better energy:
          $$U_{kl}^{AB} \stackrel{\text{def}}{=} E_k(r_k^A) + E_l(r_l^B) + E_{kl}(r_k^A, r_l^B) \qquad (13)$$
          $$U_{kl}^{AB} + \sum_{i=1}^{N} \min_X \left( E_{ki}(r_k^A, r_i^X) + E_{li}(r_l^B, r_i^X) \right) > U_{kl}^{CD} + \sum_{i=1}^{N} \max_X \left( E_{ki}(r_k^C, r_i^X) + E_{li}(r_l^D, r_i^X) \right) \qquad (14)$$
        • Values are precomputed and stored in energy matrices.
    • Stochastic Algorithms
        • They search the space of feasible designs by making a series of combinations of random and directed moves.
        • Monte Carlo Metropolis: a move consists of exchanging one rotamer for another at a randomly chosen position; a modification is accepted if it lowers the energy, and otherwise accepted with a probability that decreases exponentially with the energy increase.
        • Simulated annealing: starts at a high temperature, so that worse solutions can still be accepted and explored during the initial cycles of the search, and then cools down gradually.
        • Genetic algorithms: a population of models is propagated (evolved) throughout the course of the run, and genetic operators, such as recombination, are used to create new models from existing parents.
        • They are fast and can be scaled up to problems of large complexity.
        • They are not guaranteed to converge to the optimal solution.
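A minimal sketch of Metropolis Monte Carlo with a simulated-annealing schedule over rotamer assignments follows. The geometric cooling schedule, the parameter defaults, the table layout and all names are assumptions chosen for the illustration, not settings from any specific design package.

```python
import math
import random

def total_energy(assignment, self_energy, pair_energy):
    """Self plus pair energies from lookup tables, each position pair counted once."""
    n = len(assignment)
    energy = sum(self_energy[k][assignment[k]] for k in range(n))
    for k in range(n):
        for l in range(k + 1, n):
            energy += pair_energy[(k, l)][assignment[k]][assignment[l]]
    return energy

def anneal(self_energy, pair_energy, steps=2000, t_start=5.0, t_end=0.05, seed=0):
    """Metropolis moves under a geometric cooling schedule; returns the best state seen."""
    rng = random.Random(seed)
    n = len(self_energy)
    current = [rng.randrange(len(rotamers)) for rotamers in self_energy]
    e_cur = total_energy(current, self_energy, pair_energy)
    best, e_best = list(current), e_cur
    for step in range(steps):
        # Temperature decays from t_start to t_end over the run.
        t = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        k = rng.randrange(n)  # randomly chosen position
        trial = list(current)
        trial[k] = rng.randrange(len(self_energy[k]))  # exchange one rotamer
        delta = total_energy(trial, self_energy, pair_energy) - e_cur
        # Metropolis rule: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, e_cur = trial, e_cur + delta
            if e_cur < e_best:
                best, e_best = list(current), e_cur
    return best, e_best

# Toy tables: 2 positions with 2 rotamers each (made-up numbers).
SELF_E = [[0.0, 1.0], [0.5, 0.0]]
PAIR_E = {(0, 1): [[2.0, 0.0], [0.0, -3.0]]}
```

Recomputing the full energy for every trial keeps the sketch short; a practical implementation would update only the terms involving position k.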
    • The SCHEMA Algorithm
        • Equivalent to an in silico directed evolution.
        • Consists of scoring libraries of hybrid protein sequences against the parental sequences.
        • Scoring:
            • Calculate the number of interactions between residues (contacts within 4.5 Å) that are disrupted in the creation of hybrid proteins.
            • Hybrids are scored for stability by counting the number of disruptions.
        • The protein is partitioned into blocks that should not be interrupted by crossovers (analogous to genetic algorithms).
        • From [Meyer et al., 2006]
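The disruption count can be sketched as follows, assuming aligned parent sequences of equal length and a precomputed set of contacting position pairs. A contact is treated here as disrupted when the residue pair found in the hybrid does not occur at those two positions in any parent; this reading of the scoring rule, like the function name and the toy sequences, is an assumption of the example.

```python
def schema_disruption(hybrid, parents, contacts):
    """Count contacts (i, j) whose residue pair in the hybrid is seen in no parent.

    hybrid: hybrid sequence; parents: list of aligned parent sequences;
    contacts: iterable of position pairs (i, j) within the contact distance.
    """
    disrupted = 0
    for i, j in contacts:
        preserved = any(p[i] == hybrid[i] and p[j] == hybrid[j] for p in parents)
        if not preserved:
            disrupted += 1
    return disrupted
```

With parents "AAAA" and "BBBB" and contacts (0, 3) and (1, 2), the hybrid "AABB" breaks both contacts, whereas "ABBA" breaks none because each contacting pair is inherited from a single parent.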
    • The OPTCOM and IPRO Algorithms for Library Design
        • The OPTCOM algorithm:
            • Balances size and quality of the library
        • The IPRO algorithm:
            • Identifies point mutations in the parent sequences using energy-based scoring functions
            • Residue and rotamer choices are driven by a mixed-integer linear programming (MILP) formulation
        • From [Saraf et al., 2006]
    • Some Web Resources
        • IPRO: Iterative Protein Redesign and Optimization. http://maranas.che.psu.edu/IPRO.htm
        • EGAD: A Genetic Algorithm for protein Design. http://egad.ucsd.edu/software.php
        • RosettaDesign: A software package. http://rosettadesign.med.unc.edu/
        • SCHEMA: A pair-wise energy function for scoring protein chimeras made from homologous proteins. http://www.che.caltech.edu/groups/fha/schema-tools/schema-overview.html
        • SHARPEN: Systematic Hierarchical Algorithms for Rotamers and Proteins on an Extended Network. http://koko.che.caltech.edu/sharpenabout.html
        • WHAT IF: Software for protein modelling, design, validation, and visualisation. http://swift.cmbi.ru.nl/whatif/
        • FoldX: A force field for energy calculations and protein design. http://foldx.crg.es/
    • De Novo-Designed Proteins
        • In de novo designs, some assumptions are needed in order to make the search space tractable.
        • Usually we start from some basic motifs or domains as scaffolds for the design. Examples:
            • βαβ motif resembling a zinc finger
            • 3- and 4-helix bundles
            • Helical coiled-coils
        • Helix bundle motifs can be parametrized using a few global variables that describe the global structure.
        • Applications:
            • New metal-binding sites
            • Nonbiological cofactors for novel biomaterials and electromechanical devices
            • Novel enzymatic activities
    • Example: De Novo Design of a Metalloprotein
        • Computational de novo design of a four-helix bundle (108 residues) containing the non-biological cofactor iron diphenylporphyrin (DPP-Fe) [Bender et al., 2007].
        • The initial helix bundle was selected as a low-energy structure computed with MCSA.
        • STITCH: a program to select loops connecting helices from the PDB.
        • CHARMM and PROCHECK for removing overlaps.
        • Select 4 His and 4 Thr residues to support the 6-point coordination of the Fe(III) cations.
        • SCADS: provides site-dependent amino acid probabilities in each round.
    • Challenges in Sequence and Structure-Based CPD
        • Modeling
            • Greater availability of 3D protein structural information
            • More accurate energy functions
            • Improvement of rigid and flexible docking
        • Design
            • Improvement in search algorithms
            • Parametrization for non-natural amino acids
        • Prediction
            • Beyond additive models: using machine-learning algorithms
            • More complete environment descriptors
    • Bibliography
        • Gretchen M. Bender, Andreas Lehmann, Hongling Zou, Hong Cheng, H. Christopher Fry, Don Engel, Michael J. Therien, J. Kent Blasie, Heinrich Roder, Jeffrey G. Saven, and William F. DeGrado. De Novo Design of a Single-Chain Diphenylporphyrin Metalloprotein. Journal of the American Chemical Society, 129(35):10732–10740, September 2007. ISSN 0002-7863. doi: 10.1021/ja071199j. URL http://dx.doi.org/10.1021/ja071199j.
        • F. Edward Boas and Pehr B. Harbury. Potential energy functions for protein design. Current Opinion in Structural Biology, 17(2):199–204, April 2007. ISSN 0959-440X. doi: 10.1016/j.sbi.2007.03.006. URL http://dx.doi.org/10.1016/j.sbi.2007.03.006.
        • Pablo Carbonell and Antonio del Sol. Methyl side-chain dynamics prediction based on protein structure. Bioinformatics, pages btp463+, July 2009. doi: 10.1093/bioinformatics/btp463. URL http://dx.doi.org/10.1093/bioinformatics/btp463.
        • Jean-Loup L. Faulon, Michael J. Collins, and Robert D. Carr. The signature molecular descriptor. 4. Canonizing molecules using extended valence sequences. Journal of Chemical Information and Computer Sciences, 44(2):427–436, 2004. ISSN 0095-2338. doi: 10.1021/ci0341823. URL http://dx.doi.org/10.1021/ci0341823.
        • Michelle M. Meyer, Lisa Hochrein, and Frances H. Arnold. Structure-guided SCHEMA recombination of distantly related β-lactamases. Protein Engineering Design and Selection, 19(12):563–570, December 2006. ISSN 1741-0126. doi: 10.1093/protein/gzl045. URL http://dx.doi.org/10.1093/protein/gzl045.
        • Diwahar Narasimhan, Mark R. Nance, Daquan Gao, Mei-Chuan Ko, Joanne Macdonald, Patricia Tamburi, Dan Yoon, Donald M. Landry, James H. Woods, Chang-Guo Zhan, John J. G. Tesmer, and Roger K. Sunahara. Structural analysis of thermostabilizing mutations of cocaine esterase. Protein Engineering Design and Selection, 23(7):537–547, July 2010. doi: 10.1093/protein/gzq025. URL http://dx.doi.org/10.1093/protein/gzq025.
        • Manish C. Saraf, Gregory L. Moore, Nina M. Goodey, Vania Y. Cao, Stephen J. Benkovic, and Costas D. Maranas. IPRO: an iterative computational protein library redesign and optimization procedure. Biophysical Journal, 90(11):4167–4180, June 2006. ISSN 0006-3495. doi: 10.1529/biophysj.105.079277. URL http://dx.doi.org/10.1529/biophysj.105.079277.
        • Jiangning Song, Kazuhiro Takemoto, Hongbin Shen, Hao Tan, Michael M. Gromiha, and Tatsuya Akutsu. Prediction of Protein Folding Rates from Structural Topology and Complex Network Properties. IPSJ Transactions on Bioinformatics, 3:40–53, 2010. doi: 10.2197/ipsjtbio.3.40. URL http://dx.doi.org/10.2197/ipsjtbio.3.40.