Computational Protein Design. 3. Applications in Systems and Synthetic Biology

  1. Computational Protein Design. 3. Applications of Computational Protein Design. Pablo Carbonell, pablo.carbonell@issb.genopole.fr, iSSB, Institute of Systems and Synthetic Biology, Genopole, Université d’Évry-Val d’Essonne, France. mSSB: December 2010. Pablo Carbonell (iSSB) Computational Protein Design mSSB: December 2010 1 / 58
  2. Outline: 1 Applications in Systems and Synthetic Biology; 2 Protein Affinity Enhancement; 3 Protein Modular Design; 4 Protein Promiscuity Reengineering; 5 Conclusions.
  3. Outline (section 1: Applications in Systems and Synthetic Biology).
  4. Applications of CPD in Systems Biology. The challenge: robust and reliable methods for correlating and integrating information from high-throughput (HT) -omics networks, unveiling new relationships that close the gap between the molecular characteristics of proteins and other compounds within the cell and the systems-level characteristics of the cell as a whole. The structural interactome: computational intelligence algorithms for large-scale discovery studies; choosing the right set of descriptors; generating cellular interaction networks (the structural interactome).
  5. Applications of CPD in Synthetic Biology: engineering signal transduction (modifying the specificity of receptors); engineering genetic networks; modifying transcription; targeted gene repair and modification; novel biosensors; minimal cells and synthetic genomes; metabolic pathway engineering; feedback loop design and sensitivity analysis; programmable switches (allosteric, epigenetic, riboswitches); conditional delivery of drugs; modulation of signal transduction pathways; inhibition of protein function; adoption of a toxic conformation; cell-cell communication; orthogonal genes; mathematical dynamical models.
  6. Outline (section 2: Protein Affinity Enhancement).
  7. Antibody-Antigen Interactions. Antibodies are gamma globulin proteins found in the immune system of vertebrates. Basic structural units: two large heavy chains (H) and two small light chains (L). The Fab region (fragment antigen-binding) is the region of an antibody that binds to antigens. The Fc region (fragment crystallizable region) is the tail region that interacts with cell surface receptors. The Fv region is the variable domain.
  8. The Variable Domain Fv. The variable domain is the most important region for binding to antigens. The Fv contains 3 variable loops of β-strands on the light chain (VL) and 3 variable loops of β-strands on the heavy chain (VH). These loops are referred to as the complementarity determining regions (CDRs).
  9. In Silico Design of Immunodiagnostic Assays for Anti-TNF-α. Tumor necrosis factor-alpha (TNF-α), a cytokine involved in systemic inflammation, can induce several cell responses depending on the cellular context: activation of NF-κB-mediated proliferative programs, or programmed cell death. The early detection of unusual concentrations of TNF-α is a diagnostic biomarker of inflammatory conditions such as metabolic disorders (obesity), rheumatoid arthritis, tuberculosis, and cancer. Moreover, anti-TNF-α inhibitors have emerged in recent years as a new therapeutic approach for immune-mediated inflammatory diseases. The TNF-α inhibitory molecules currently in use are antibodies or soluble TNF receptors that sequester TNF-α.
  10. Computational Protein Affinity Design for Anti-TNF-α Antibodies
  11. Building the Model. No crystal structure of the TNF-α antibody-antigen complex is available, so our first step is to build a model of the complex through structural homology and docking. (Figures: TNF-α trimer; anti-TNF-α model from Swiss-Model.)
  12. Docking and Scoring. Docked complexes were generated with zDock (Accelrys Inc.), a Fast Fourier Transform (FFT)-based protein docking program that returns the top 2000 ranked predictions. The complexes were then scored with FastContact, a contact binding free energy scoring tool for protein-protein complex structures; its estimates are based on rigid bodies.
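To illustrate the FFT idea behind programs such as zDock, the sketch below scores every rigid translation of a ligand grid against a receptor grid in one pass via the correlation theorem, and checks the result against a brute-force scan. It is a minimal sketch assuming NumPy, cyclic grid boundaries and a Katchalski-Katzir-style toy encoding (receptor surface +1, receptor interior -15, ligand 1); zDock's real scoring function and rotational search are not reproduced, and all function names are hypothetical.

```python
import numpy as np

def fft_translation_scores(receptor: np.ndarray, ligand: np.ndarray) -> np.ndarray:
    """Score all cyclic translations t of the ligand at once:
    S(t) = sum_x receptor[x] * ligand[x - t],
    computed as IFFT(FFT(receptor) * conj(FFT(ligand))) (correlation theorem)."""
    f_rec = np.fft.fftn(receptor)
    f_lig = np.fft.fftn(ligand)
    return np.fft.ifftn(f_rec * np.conj(f_lig)).real

def brute_force_scores(receptor: np.ndarray, ligand: np.ndarray) -> np.ndarray:
    """Same score by explicitly shifting the ligand; only practical for tiny grids."""
    scores = np.zeros(receptor.shape)
    axes = tuple(range(receptor.ndim))
    for t in np.ndindex(*receptor.shape):
        scores[t] = np.sum(receptor * np.roll(ligand, shift=t, axis=axes))
    return scores

def toy_grids(n: int = 8):
    """Hypothetical toy shapes: a cubic receptor and a small cubic ligand."""
    receptor = np.zeros((n, n, n))
    receptor[1:6, 1:6, 1:6] = 1.0     # surface layer rewards contact
    receptor[2:5, 2:5, 2:5] = -15.0   # interior penalizes overlap
    ligand = np.zeros((n, n, n))
    ligand[0:2, 0:2, 0:2] = 1.0
    return receptor, ligand

receptor, ligand = toy_grids()
scores = fft_translation_scores(receptor, ligand)
best_shift = np.unravel_index(np.argmax(scores), scores.shape)
print(best_shift, scores[best_shift])
```

The FFT replaces a loop over all N³ translations, each costing N³ multiplications, by O(N³ log N) work; in a real run this scan would be repeated for each sampled ligand rotation, keeping the top-ranked poses.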
  13. Hot-spots and Energy Minimization. Predicting hot-spots: using FoldX, we performed an in silico alanine scan to predict consensus hot-spots for the models. These hot-spots were then verified in the laboratory by the experimental group. 3 initial models were selected based on different criteria: minimum predicted binding energy in FastContact and highest coverage of known hot-spots in anti-TNF-α. The energy of the complexes was then minimized using Discovery Studio (Accelrys Inc.).
  14. In Silico Combinatorial Library. In silico combinatorial libraries of mutants around the complementarity determining regions (CDRs) were built as follows: models for single-mutation variants were computed using Biopolymer and Builder (Accelrys Inc.) for rotamer selection and side-chain positioning; the mutants were then submitted to a cluster of 64 × 4-core nodes for local energy minimization of the CDRs using GROMACS.
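A single-mutant library grows by 19 substitutions per targeted position, which is what makes the cluster necessary. The sketch below only enumerates such a library; the sequence fragment, the 0-based CDR positions and the function name are hypothetical, and side-chain modeling and GROMACS minimization are outside its scope.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def single_mutants(sequence, positions):
    """Yield (label, mutant_sequence) for every substitution at the given
    0-based positions; labels use wild-type, 1-based position, mutant."""
    for pos in positions:
        wild_type = sequence[pos]
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                mutant = sequence[:pos] + aa + sequence[pos + 1:]
                yield f"{wild_type}{pos + 1}{aa}", mutant

# Hypothetical fragment and CDR positions, for illustration only.
library = list(single_mutants("GYTFTSYW", positions=[1, 5, 6]))
print(len(library))  # 3 positions x 19 substitutions = 57
```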
  15. Virtual Screening. The most beneficial mutations were selected to build a combinatorial library of double and triple mutants. Variants with the lowest predicted binding free energy (i.e., the highest predicted affinity) were shortlisted and compared with beneficial mutations reported in the literature. Computation time: 2 weeks on the 64-node × 4-core cluster. The 6 best mutations were transferred to the molecular biology laboratory to be tested by ELISA immunoprecipitation assays. A new round of virtual screening was then launched starting from the best predicted variants. After three rounds, improvements in binding affinity (measured as −log10 Kd) close to 3-fold were obtained.
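The step from beneficial single mutations to double and triple mutants can be sketched as follows. This is a minimal sketch under a loudly stated assumption: it ranks combinations by simply adding single-mutant ΔΔG values (additivity), which at best serves as a cheap prefilter before the energy calculations described above. Mutation labels and values are hypothetical.

```python
from itertools import combinations

def combinatorial_library(beneficial, max_order=3):
    """Combine beneficial single mutations into double..max_order mutants.

    beneficial: dict like {"S52A": -0.8}, predicted ddG in kcal/mol
    (negative = tighter binding). Combinations hitting the same residue
    are skipped. ASSUMES additivity: combined ddG = sum of the singles.
    """
    library = []
    for order in range(2, max_order + 1):
        for combo in combinations(sorted(beneficial), order):
            positions = {label[1:-1] for label in combo}
            if len(positions) < order:
                continue  # two substitutions at the same position
            library.append((combo, sum(beneficial[m] for m in combo)))
    return sorted(library, key=lambda item: item[1])  # most favorable first

# Hypothetical single-mutant predictions.
singles = {"S52A": -0.8, "S52G": -0.5, "Y101F": -0.6, "T30K": -0.3}
for combo, ddg in combinatorial_library(singles)[:3]:
    print(combo, round(ddg, 2))
```

Non-additive (epistatic) effects are exactly what this shortcut misses, which is one reason to recompute energies for the shortlisted variants.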
  16. Outline (section 3: Protein Modular Design).
  17. The Modular Organization of Binding Sites
  18. The Modular Distribution of Domain-Domain Binding. Why choose domains? Domains form independent structural and functional units; they are building blocks that can be rearranged to create proteins with different functions; and they are evolutionarily conserved: different organisms use the same domains in protein-protein interactions. Dataset: source iPfam; 330 protein domains; 370 domain-domain interactions; multiple alignments in 5 organisms (E. coli, S. cerevisiae, C. elegans, D. melanogaster, H. sapiens). Objective: large-scale topological analysis of binding domains. (Figure: binding site clustering.)
  19. Graph Modular Decomposition. Domains can be decomposed further into connectivity modules by clustering the domain contact map G(V, E, C), using the Girvan-Newman algorithm [PNAS (2002)] with a maximum-modularity stop rule [Kashtan and Alon, PNAS (2005)]. The modularity of a partition into K modules is
$$Q = \sum_{s=1}^{K}\left[\frac{l_s}{L} - \left(\frac{d_s}{2L}\right)^2\right] \qquad (1)$$
  where $l_s$ is the number of edges between nodes in module s, $d_s$ is the sum of node degrees in module s, and L is the total number of edges in the network. The algorithm:
      1. The betweenness of all existing edges in the network is calculated first (edge betweenness: the number of shortest paths between pairs of nodes that run along the edge).
      2. The edge with the highest betweenness is removed.
      3. The betweenness of all edges affected by the removal is recalculated.
      4. Steps 2 and 3 are repeated until the modularity Q of the K connected clusters in the network reaches its maximum.
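The decomposition steps above can be sketched in plain Python. This is a minimal sketch, not the authors' code: it recomputes the betweenness of all edges after each removal (step 3 only needs the affected ones), runs the removal to completion, and keeps the partition with the highest Q from Eq. (1), evaluated on the original contact map. The toy graph and function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def modularity(adj, modules):
    """Eq. (1): Q = sum_s [l_s/L - (d_s/(2L))^2], evaluated on the original graph."""
    L = sum(len(nbrs) for nbrs in adj.values()) / 2
    q = 0.0
    for module in modules:
        l_s = sum(1 for u, v in combinations(module, 2) if v in adj[u])
        d_s = sum(len(adj[u]) for u in module)
        q += l_s / L - (d_s / (2 * L)) ** 2
    return q

def edge_betweenness(adj):
    """Brandes-style shortest-path counting on an unweighted graph.
    Each undirected edge is counted from both ends, which does not change the argmax."""
    eb = {}
    for s in adj:
        dist, sigma, preds = {s: 0}, {s: 1}, {s: []}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[v] + 1, 0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(order, 0.0)
        for w in reversed(order):  # farthest nodes first
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                edge = frozenset((v, w))
                eb[edge] = eb.get(edge, 0.0) + share
                delta[v] += share
    return eb

def components(adj):
    """Connected components of the (pruned) graph."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            for w in adj[queue.popleft()]:
                if w not in seen:
                    seen.add(w)
                    part.add(w)
                    queue.append(w)
        parts.append(part)
    return parts

def girvan_newman_max_q(adj):
    """Remove highest-betweenness edges one at a time; keep the partition with max Q."""
    work = {u: set(nbrs) for u, nbrs in adj.items()}
    best = components(work)
    best_q = modularity(adj, best)
    while any(work.values()):
        eb = edge_betweenness(work)
        u, v = tuple(max(eb, key=eb.get))
        work[u].discard(v)
        work[v].discard(u)
        parts = components(work)
        q = modularity(adj, parts)
        if q > best_q:
            best, best_q = parts, q
    return best, best_q

# Hypothetical contact map: two triangles joined by a single bridge (2-3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
contact_map = {n: set() for n in range(6)}
for a, b in edges:
    contact_map[a].add(b)
    contact_map[b].add(a)
modules, q = girvan_newman_max_q(contact_map)
print(modules, round(q, 3))  # [{0, 1, 2}, {3, 4, 5}] 0.357
```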
  20. Modularity. The modularity $Q_s$ measures how tightly the members of a module s interact:
$$Q_s = \frac{l_s}{L} - \left(\frac{d_s}{2L}\right)^2 \qquad (2)$$
  where $l_s$ is the number of edges between nodes in module s, $d_s$ is the sum of node degrees in module s, and L is the total number of edges in the network. The term $l_s/L$ is the fraction of edges in the network that connect vertices in module s, and $(d_s/2L)^2$ is the expected value of the same quantity if edges fall at random, since the expected number of such edges is
$$\hat{l}_s = \frac{d_s}{2}\,p_s = \frac{d_s}{2}\cdot\frac{d_s}{2L} \qquad (3)$$
  where $p_s = d_s/2L$ is the probability that an edge connects to nodes in module s. In a randomly partitioned network, the expected modularity is $\hat{Q}_s = 0$.
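As a worked example of Eqs. (2) and (3), take a hypothetical network of two triangles joined by a single edge (L = 7) and split it into the two triangles, so that each module has $l_s = 3$ and $d_s = 7$:

```latex
\hat{l}_s = \frac{d_s}{2}\cdot\frac{d_s}{2L} = \frac{7}{2}\cdot\frac{7}{14} = 1.75,
\qquad
Q_s = \frac{3}{7} - \left(\frac{7}{14}\right)^2 = \frac{5}{28} \approx 0.179
```

Each module holds about 1.7 times as many internal edges as expected at random (3 vs 1.75), and the whole partition scores Q = 2 × 5/28 = 5/14 ≈ 0.357.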
  21. Binding Site and Modular Overlaps. Modular composition of binding site j, where $m_{jk}$ is the number of residues of binding site j located in module k:
$$\mathbf{m}_j = (m_{j1}, m_{j2}, \ldots, m_{jM}) \qquad (4)$$
  Similarity in modular composition between binding sites i and j:
$$M(i,j) = \frac{\sum_{k=1}^{M} m_{ik}\, m_{jk}}{|\mathbf{m}_i|\,|\mathbf{m}_j|} \qquad (5)$$
  Relative interface between i and j:
$$C(i,j) = \frac{1}{2}\left[\frac{n_i}{N_i} + \frac{n_j}{N_j}\right] \qquad (6)$$
  where $n_i$ ($n_j$) is the number of residues in i (j) with contacts in j (i), and $N_i$ ($N_j$) is the number of residues in binding site i (j). Example: Kringle domain (PF00051), binding site A (blue) and binding site B (red):
$$C(A,B) = \frac{1}{2}\left(\frac{4}{10} + \frac{3}{8}\right) \qquad (7)$$
$$M(A,B) = \frac{(2, 8, 0, 0, 0)\cdot(0, 2, 3, 3, 0)^T}{\sqrt{68}\,\sqrt{22}} \qquad (8)$$
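Eqs. (5) and (6) are a cosine similarity and an averaged contact fraction, so the Kringle example can be checked directly. A minimal sketch; the function names are hypothetical and the input numbers are the ones given on the slide.

```python
import math

def modular_similarity(m_i, m_j):
    """Eq. (5): cosine similarity between module-composition vectors."""
    dot = sum(a * b for a, b in zip(m_i, m_j))
    norm_i = math.sqrt(sum(a * a for a in m_i))
    norm_j = math.sqrt(sum(b * b for b in m_j))
    return dot / (norm_i * norm_j)

def relative_interface(n_i, N_i, n_j, N_j):
    """Eq. (6): mean fraction of each site's residues in contact with the other site."""
    return 0.5 * (n_i / N_i + n_j / N_j)

# Kringle domain example: residues of sites A and B per module.
m_A = (2, 8, 0, 0, 0)
m_B = (0, 2, 3, 3, 0)
print(round(modular_similarity(m_A, m_B), 3))    # 16 / (sqrt(68) * sqrt(22)) = 0.414
print(round(relative_interface(4, 10, 3, 8), 4))  # 0.3875
```

The two sites share little modular composition (M ≈ 0.41) even though they contact each other, which is the kind of comparison the slide uses to relate binding sites to modules.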
  22. The Modular Organization of Domain-Domain Interfaces. Non-overlapping binding sites are assigned to different modules. Modules with high modularity Q contain a significant percentage of binding site regions [Del Sol, Carbonell, PLOS Comp. Biology (2007)].
  23. Using Modularity to Identify Binding Regions. Modularity can be used to identify binding surfaces. The accuracy and coverage of modularity and of surface hydrophobic patches are greater than those of residue conservation. Combining modularity with the other two methods notably improves performance.
  24. Intra-Module Cooperativity and Inter-Module Independence. Human IL-4 is a cytokine that plays a regulatory role in the immune system. IL-4 contains 3 energetically independent clusters of hot-spots located in 3 modules. These hot-spots can be used to generate binding affinity and specificity.
  25. Intra-Module Cooperativity and Inter-Module Independence. TEM1 β-lactamase confers antibiotic resistance to E. coli. This enzyme is inhibited by BLIP. A mutagenesis study showed that there are 2 hot-spot clusters which are energetically independent; these clusters are located in different modules.
  26. Intra-Module Cooperativity and Inter-Module Independence. TCR hVβ2.1 (TSST-1 antibody): 2 cooperative, distant clusters of hot-spots around the binding site. hGHbp (human growth hormone binding protein): cooperative hot-spots located in 1 module distant to the binding site. CI-2 (serine protease chymotrypsin inhibitor): a cluster of hot-spots located far away from the binding interface. RI (ribonuclease inhibitor): hot-spots located in different modules are known to be independent.
  27. Modularity as a Measure of Residue Cooperativity. Protein domains can be decomposed into a set of modules that contain groups of specialized residues. Binding sites are usually located in highly cooperative modules. Modularity, combined with sequence conservation and surface patches, can be used to predict functional regions. This modular architecture confers robustness to protein structures and contributes to the determination of binding affinity and specificity.
  28. Energetic Determinants of Protein Binding Affinity. The modular decomposition of protein structures is a structural characterization of protein interactions. To learn more about the interplay between binding affinity and specificity, a thermodynamic characterization is necessary. In this study we focus on one specific interactome, the yeast interactome (main source: MIPS). Structural interactome: 259 hubs (>5 partners) participating in 877 different interactions.
  29. Binding Site Clustering: single and multiple interfaces. Binding sites correspond to residues interacting with the partner at a distance ≤ 5 Å. Binding sites are mapped onto the reference sequence of the hub and clustered using a version of the algorithm in Teyra et al. [2008]:
      1. Compute the N × N binary distance matrix D, where
$$D(i,j) = \delta_{ij} = \begin{cases} 1 & i \cap j \neq \emptyset \\ 0 & i \cap j = \emptyset \end{cases} \qquad (9)$$
      2. Start with k = N clusters.
      3. Compute the (k − 1)-means clustering of D.
      4. Recompute D for the k − 1 clusters.
      5. Repeat from step 3 while all binding sites within clusters overlap.
  Total interfaces: 539, involved in 1 to 5 interactions.
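The overlap matrix of Eq. (9) and the stopping condition (all binding sites within a cluster overlap) can be sketched as below. To stay short, this sketch swaps the (k − 1)-means step for a plainly different, greedy complete-linkage merge that only joins clusters whose members all pairwise overlap; it is a stand-in, not the Teyra et al. procedure. The residue sets are hypothetical.

```python
from itertools import combinations

def overlap_matrix(sites):
    """Eq. (9): D[i][j] = 1 if binding sites i and j share a residue, else 0."""
    n = len(sites)
    return [[1 if sites[i] & sites[j] else 0 for j in range(n)] for i in range(n)]

def cluster_sites(sites):
    """Greedy stand-in: merge two clusters only if every pair of members overlaps."""
    D = overlap_matrix(sites)
    clusters = [[i] for i in range(len(sites))]
    merged = True
    while merged:
        merged = False
        for a, b in combinations(range(len(clusters)), 2):
            if all(D[i][j] for i in clusters[a] for j in clusters[b]):
                clusters[a].extend(clusters.pop(b))  # b > a, so index a is unaffected
                merged = True
                break
    return clusters

# Hypothetical binding sites on one hub, as sets of residue numbers.
sites = [{10, 11, 12}, {12, 13}, {13, 14}, {40, 41}, {41, 42}]
print(cluster_sites(sites))  # [[0, 1], [2], [3, 4]]
```

Site 2 overlaps site 1 but not site 0, so it stays apart: every resulting cluster satisfies the slide's condition that all its binding sites overlap.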
  30. Protein Binding Affinity and Specificity. Binding energies and alanine scanning for each complex were estimated using FoldX [Schymkowitz et al., 2005]. Specific binding sites tend to bind their partners with higher affinity than promiscuous sites, and interactions involving promiscuous binding sites tend to be weaker:
      Interaction type            −ΔG [(kcal/mol)/residue]
      Specific-specific           0.93
      Promiscuous-promiscuous     0.85
      Specific-promiscuous        0.50
  31. Hot-Spots and Partner Motifs. A hot-spot is a residue with $|\Delta\Delta G_{bind}| = |\Delta G_{MUT \to ALA} - \Delta G_{WT}| \geq 2$ kcal/mol. Are hot-spots specific? In most cases, hot-spots are specific to one interaction, but some of them are promiscuous. Binding site motifs of interacting partners are determinants of specificity: as the promiscuity of the hot-spots increases, the number of common motifs in the partners increases, suggesting a common evolutionary origin of divergent partners in promiscuous binding.
      Number of interactions     Average number of common motifs
      of the hot-spot            interacting with the hot-spot
      1                          1.4
      2                          2.5
      3                          3.0
      4                          4.0
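The hot-spot definition translates directly into a filter over alanine-scanning output. A minimal sketch: the residue labels and energies below are hypothetical placeholders rather than FoldX results, and the function name is illustrative.

```python
HOT_SPOT_THRESHOLD = 2.0  # kcal/mol, as in the definition above

def hot_spots(dg_wt, dg_ala, threshold=HOT_SPOT_THRESHOLD):
    """Return residues whose alanine mutation changes the binding free energy
    by at least `threshold`: |ddG_bind| = |dG_MUT->ALA - dG_WT| >= threshold."""
    return sorted(res for res, dg_mut in dg_ala.items()
                  if abs(dg_mut - dg_wt) >= threshold)

# Hypothetical binding free energies (kcal/mol), not real data.
dg_wt = -12.0
dg_ala = {"Y33": -9.5, "K56": -11.2, "W104": -8.0, "D12": -14.5}
print(hot_spots(dg_wt, dg_ala))  # ['D12', 'W104', 'Y33']
```

Because the definition uses the absolute value, a mutation that strengthens binding by 2 kcal/mol or more (D12 here) is also flagged.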
  32. Hot-spots Modular Distribution and Specificity. We have already shown examples of energetic independence of hot-spots in modules. Furthermore, the relative number of binding site modules containing hot-spots increases with the number of partners. A small fraction of hot-spots participate in more than one interaction, probably acting as binding site anchors [Carbonell, Nussinov, Del Sol, Proteomics, 2009].
  33. Modular Distribution of Hot-spots and Specificity. Ubiquitin: a promiscuous protein with weak interactions. Cytochrome b: an example of a specific binding site. Calmodulin-dependent kinase: an example of a specific binding site. cdc42 GTPase: it contains a central module acting as a binding site anchor.
  34. The Role of Thermodynamics in Promiscuous Binding. In general, protein-protein interactions involving promiscuous binding sites are weaker. Proteins generally interact with partners with a similar degree of promiscuity. Hot-spots in promiscuous binding sites tend to be more distributed over different modules. Knowing the modular distribution of hot-spots involved in different interactions might allow us to rationally modify binding specificity and affinity.
  35. Large-scale Analysis Workflow
  36. Outline (section 4: Protein Promiscuity Reengineering).
  37. Applications in Synthetic Biology: Design of Metabolic Pathways. The Bio-RetroSynth project (ANR Chair d’Excellence, Faulon’s Lab).
  38. Tasks in the Bio-RetroSynth project. Bioretrosynthesis: graphs for heterologous compound production in E. coli. Computational protein design: machine learning to mine genomic databases for predicting protein function. Pathway design: ranking pathways to select the best ones to engineer. Quantitative Structure-Activity Relationship (QSAR) models for enzyme activity and inhibition, based on experimental databases and toxicity assays. Metabolic engineering: E. coli plasmids to construct combinatorial libraries of the highest-ranked heterologous pathways found to produce a target product. Engineering optimization: Flux Balance Analysis (FBA) and non-linear optimization methods to maximize target yield.
  39. The Signature Reaction Space σ(R)
  40. Examples of Retrosynthesis Graphs in the Reaction Signature Space. RetroPath: an online tool for retrosynthesis search of metabolic pathways [D. Fichera, P. Carbonell, J.L. Faulon, Predicting heterologous compound-forming reaction pathways through retrosynthesis hypergraphs, in preparation]. Examples: penicillin (antibiotic); galantamine (treatment of Alzheimer’s disease).
  42. Ranking Pathways. Criteria: gene heterogeneity; heterologous gene expression; enzyme performance for the specified reaction; compound toxicity; estimation of nominal fluxes; consistency of the predicted phenotype. The cost of a pathway p is
$$C(p) = \sum_{gene \in genes(p)} \left( \frac{1}{\mathrm{perf}(gene)} + \mathrm{het}(gene) + \sum_{prod \in prod(gene)} \mathrm{tox}(prod) \right) + \frac{1}{\mathrm{flux}} \qquad (10)$$
  and the selected pathway is
$$p^* = \arg\min_{p} C(p) \qquad (11)$$
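Eqs. (10) and (11) can be sketched as a cost function and an argmin over candidate pathways. This assumes the per-gene scores (perf, het, tox) and the pathway flux are already available as numbers; how they are estimated is not specified here, so the values and names below are hypothetical.

```python
def pathway_cost(genes, flux):
    """Eq. (10): sum over genes of 1/perf + het + product toxicities, plus 1/flux.
    genes: list of dicts with keys 'perf', 'het' and 'tox' (toxicity per product)."""
    per_gene = sum(1.0 / g["perf"] + g["het"] + sum(g["tox"]) for g in genes)
    return per_gene + 1.0 / flux

def best_pathway(pathways):
    """Eq. (11): p* = argmin_p C(p); pathways maps a name to (genes, flux)."""
    return min(pathways, key=lambda name: pathway_cost(*pathways[name]))

# Hypothetical scores for two candidate pathways.
pathways = {
    "p1": ([{"perf": 2.0, "het": 0.1, "tox": [0.2]}], 4.0),
    "p2": ([{"perf": 1.0, "het": 0.0, "tox": []},
            {"perf": 4.0, "het": 0.2, "tox": [0.1, 0.1]}], 10.0),
}
print(best_pathway(pathways))  # p1
```

Since performance and flux enter as reciprocals, a poorly performing enzyme or a low flux inflates the cost sharply, while heterogeneity and toxicity add linearly.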
  43. Predicting Compound Toxicity. MIC (IC50) assays in E. coli for commercial chemical compounds, including antibiotics. Molecular signature-based QSAR model [A.G. Planson, E. Paillard, F. Vogliolo, P. Carbonell, J.L. Faulon, unpublished].
  44. Enzyme Performance. Putative reactions $R^*$ discovered in the signature space ${}^h\sigma(R)$ by the retrosynthesis algorithm often lack annotated enzyme sequences in databases. A protein design procedure must therefore be implemented to identify the best heterologous enzyme sequence candidate to insert. Conceptually, the idea is to define: a metric in the reaction $\sigma(R)$ and sequence $\sigma(S)$ signature spaces; a convolution operation * between both spaces that generates the kernel function $k((R_1, S_1), (R_2, S_2))$; and a machine-learning algorithm. In practical terms, we are searching the sequence space S for enzymes with a putative level of promiscuity for the desired reaction $R^*$.
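One simple way to build a kernel on (reaction, sequence) pairs, assumed here purely for illustration, is the tensor-product kernel $k((R_1,S_1),(R_2,S_2)) = k_R(R_1,R_2)\,k_S(S_1,S_2)$, with linear kernels on signature counts. The slides do not say which combination was used; the reaction "signatures" below are placeholder strings, not real molecular signatures, and the sequence side uses plain k-mer counts.

```python
from collections import Counter

def spectrum(sequence, k=3):
    """k-mer counts of a protein sequence (a simple sequence signature)."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def dot(a, b):
    """Inner product of two sparse count vectors."""
    return sum(count * b.get(key, 0) for key, count in a.items())

def pair_kernel(r1, s1, r2, s2, k=3):
    """Tensor-product kernel on (reaction, sequence) pairs:
    k((R1, S1), (R2, S2)) = k_R(R1, R2) * k_S(S1, S2)."""
    return dot(r1, r2) * dot(spectrum(s1, k), spectrum(s2, k))

# Placeholder reaction signatures (not real molecular signatures).
r1 = Counter({"C-O": 2, "C=O": 1})
r2 = Counter({"C-O": 1, "N-H": 1})
print(pair_kernel(r1, "MKTAY", r2, "MKTAA"))  # 2 * 2 = 4
```

A product of valid kernels is itself a valid kernel, so this pair kernel can be plugged into an SVM to score how well a candidate sequence matches a target reaction.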
  45. Taking Advantage of Enzyme Promiscuity in Protein Engineering. Enzymes can potentially process multiple substrates or reactions. Studying enzyme promiscuity can help enhance enzyme efficiency through protein engineering techniques. Enzyme promiscuity is an intermediate step in directed evolution [Tracewell and Arnold, 2009].
  46. A Quantitative Definition of Enzyme Promiscuity. Definitions: enzyme multispecificity is the ability of enzymes to transform a broad range of closely related substrates; a promiscuous function is an enzyme activity other than the native one. Using reaction signatures to measure promiscuity: an enzyme is promiscuous if it catalyzes at least 2 reactions with different signatures. The reaction chemical diversity for reactions $R_A$ and $R_B$ at height h is
$$d^h(R_A, R_B) = 1 - \frac{\|{}^h\sigma(R_A)\cdot{}^h\sigma(R_B)\|}{\|{}^h\sigma(R_A)\|^2 + \|{}^h\sigma(R_B)\|^2 - \|{}^h\sigma(R_A)\cdot{}^h\sigma(R_B)\|} \qquad (12)$$
  Depending on the chosen range of h, it is possible to distinguish between catalytic promiscuity and substrate promiscuity.
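Eq. (12) is one minus the Tanimoto similarity of two signature count vectors. A minimal sketch, assuming each height-h signature is stored as a sparse mapping from fragment to count; the fragment keys here are hypothetical placeholders.

```python
def signature_distance(sig_a, sig_b):
    """Eq. (12): 1 - Tanimoto similarity of two height-h signature count vectors."""
    dot = sum(count * sig_b.get(key, 0) for key, count in sig_a.items())
    norm_a = sum(count * count for count in sig_a.values())
    norm_b = sum(count * count for count in sig_b.values())
    denominator = norm_a + norm_b - dot
    return 0.0 if denominator == 0 else 1.0 - dot / denominator

# Placeholder height-h signatures: fragment -> count.
print(signature_distance({"x": 1, "y": 1}, {"x": 1, "y": 1}))  # 0.0
print(signature_distance({"x": 1, "y": 1}, {"x": 1, "z": 1}))  # 1 - 1/3
```

Identical signatures give a distance of 0 and signatures with no shared fragments give 1, so a nonzero distance at the chosen height is what flags two reactions as different.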
  47. Catalytic and Substrate Promiscuity. Given two reactions $R_A$ and $R_B$ that an enzyme can process, the enzyme has catalytic promiscuity if
$${}^1\sigma(R_A) \neq {}^1\sigma(R_B) \qquad (13)$$
  (we look at the bonds that are created and/or broken by the chemical transformation), and it has substrate promiscuity if
$${}^{0-3}\sigma(R_A) \neq {}^{0-3}\sigma(R_B) \qquad (14)$$
  (we look at the chemical structures of the substrates).
  48. Molecular Signatures-Based Prediction of Enzyme Promiscuity: building the dataset.
  49. Support Vector Machine Algorithm. The signature space is highly dimensional: 2-mers: $20^2$; 3-mers: $20^3$; 4-mers: $20^4$; ... The SVM algorithm selects the weighted combination of data points (support vectors) that gives the best separation. From the support vectors we compute the contribution, or α-value, of each signature to the prediction of promiscuity.
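The sketch below shows k-mer signatures and one way a per-signature contribution can be read off support vectors. It assumes a linear kernel over k-mer counts, in which case each k-mer's weight is $w_f = \sum_i \alpha_i y_i x_{if}$; the slides do not state the kernel or software, so this is only an illustrative reading of "α-value", and the support vectors and coefficients are hypothetical.

```python
from collections import Counter, defaultdict

def kmer_counts(sequence, k):
    """Sequence signature: counts of overlapping k-mers (up to 20**k possible)."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def kmer_weights(support_vectors, dual_coefs):
    """Linear-kernel SVM weight per k-mer: w_f = sum_i (alpha_i * y_i) * x_if."""
    weights = defaultdict(float)
    for x, coef in zip(support_vectors, dual_coefs):
        for kmer, count in x.items():
            weights[kmer] += coef * count
    return dict(weights)

print(20 ** 4)  # number of possible 4-mers: 160000
# Hypothetical support vectors and dual coefficients (alpha_i * y_i).
svs = [kmer_counts("ALAA", 2), kmer_counts("GGGA", 2)]
print(kmer_weights(svs, [1.0, -0.5]))
```

Only k-mers that occur in support vectors get a nonzero weight, which keeps the ranking of signatures tractable even though the full space has $20^k$ dimensions.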
  50. Performance of the SVM Predictor. Accuracy reaches 85% for the whole dataset (eukaryotes 88%, prokaryotes 87%).
      4-mer    α-value    Frequency
      ALAA     10.9       13.9%
      AVAA     10.4       12.7%
      LAAA     11.3       11.4%
      ELAA     11.5       10.9%
      ...      ...        ...
  Distance to catalytic residues (Catalytic Site Atlas): the distribution of top k-mers provides insights into the promiscuous active regions of the enzyme; top k-mers are depleted around the catalytic sites of non-promiscuous enzymes.
  51. Secondary Structure Around Catalytic Sites. Secondary structure distribution:
                         Beta      Helix     Loop
      All residues       15.69%    40.64%    43.67%
      Catalytic sites    23.79%    32.15%    44.05%
      Non-promiscuous    20.85%    33.65%    45.50%
      Promiscuous        30.00%    29.00%    41.00%
  (Figure: average deviation from random.) Helices are in general underrepresented in catalytic residues; beta strands are significantly overrepresented in promiscuous enzymes.
  52. Top k-mers in Promiscuity
  53. Application: Reverse Engineering of a Promiscuous Transaminase. Promiscuity induced by directed evolution [Rothman and Kirsch, 2003]: AATase (EC 2.6.1.1) → TATase (EC 2.6.1.5). Signatures (k-mers) with the highest α-value change [Carbonell, P., Faulon, J.L., Bioinformatics, 2010].
  54. Outline (section 5: Conclusions).
  55. Conclusions. Computational analysis of biological networks can provide insights into the mechanisms of protein binding affinity and specificity. We use molecular graph descriptors in combination with systems-level characteristics to train machine-learning predictors of protein activity. Applications: protein optimization; understanding protein function and evolution; design of synthetic biological circuits.
  56. Acknowledgments. University of Evry / Genopole, National Museum of Natural History, iSSB - Faulon’s Lab, Promiscuity & Evolution, Metabolic Engineering & Synthetic Biology, Guillaume Lecointre, Jean-Loup Faulon, Anne-Gaelle Planson, National Cancer Institute (NIH), Davide Fichera, Ioana Popescu, Hot-spots & Specificity, Julio Peyroncely, Elodie Paillard, Florence Vogliolo, Chloe Sarnowski, Ruth Nussinov, Antoine Decrulle, University of North Carolina, Fujirebio, NMR spectroscopy, Structural Bioinformatics, Andrew Lee, Antonio del Sol, Hirotomo Fujihashi, Polytechnic University of Valencia, Dolors Amoros, Marcos Arauzo-Bravo, Computational Intelligence, Swiss Institute of Bioinformatics, Jose Luis Navarro, Adolfo Hilario, Peptide identification in HPLC/MS, Polytechnic Institute of NYU, Ron D. Appel, Alexandre Masselot, Nonlinear dynamics, Zhong-Ping Jiang, Shiwendra Panwar.
  58. Bibliography.
      S. C. Rothman and J. F. Kirsch. How does an enzyme evolved in vitro compare to naturally occurring homologs possessing the targeted function? Tyrosine aminotransferase from aspartate aminotransferase. Journal of Molecular Biology, 327(3):593–608, March 2003. ISSN 0022-2836. URL http://view.ncbi.nlm.nih.gov/pubmed/12634055.
      J. Schymkowitz, J. Borg, F. Stricher, R. Nys, F. Rousseau, and L. Serrano. The FoldX web server: an online force field. Nucleic Acids Research, 33(Web Server issue), July 2005. ISSN 1362-4962. doi: 10.1093/nar/gki387. URL http://dx.doi.org/10.1093/nar/gki387.
      J. Teyra, M. Paszkowski-Rogacz, G. Anders, and M. T. Pisabarro. SCOWLP classification: structural comparison and analysis of protein binding regions. BMC Bioinformatics, 9:9, January 2008. ISSN 1471-2105. doi: 10.1186/1471-2105-9-9. URL http://dx.doi.org/10.1186/1471-2105-9-9.
      C. A. Tracewell and F. H. Arnold. Directed enzyme evolution: climbing fitness peaks one amino acid at a time. Current Opinion in Chemical Biology, 13(1):3–9, February 2009. ISSN 1879-0402. doi: 10.1016/j.cbpa.2009.01.017. URL http://dx.doi.org/10.1016/j.cbpa.2009.01.017.
