1. INDIAN INSTITUTE OF TECHNOLOGY ROORKEE
PROTEIN ENGINEERING
STRATEGIES
Group 4:
Sourik Dey (Enrollment ID:18610023)
Yogiraj Jakkal (Enrollment ID:18610011)
Tiasa Sen (Enrollment ID:18610031)
Sushmita Chakraborty (Enrollment ID:18610028)
Jashaswi Basu (Enrollment ID:18610012)
Nishant Jyoti (Enrollment ID:19903005)
Anil Kumar Koundal (Enrollment ID:18903030)
2. WORKS OF FRANCES H. ARNOLD
• Frances H. Arnold received the Nobel Prize in Chemistry in 2018 for her work on the “directed evolution of enzymes”.
• She used the principles of genetic
change and selection to develop
proteins that can solve mankind’s
chemical problems.
• Directed evolution is an iterative procedure: a starting protein is identified, its gene is diversified, the variants are expressed and screened, and the best variants are re-diversified and re-screened, and so on, until a satisfactory performance level in terms of enzymatic activity, binding affinity or specificity is reached.
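The iterative cycle above can be sketched as a toy simulation. This is a minimal illustration under stated assumptions, not a model of any real experiment: proteins are plain strings, the hypothetical `toy_fitness` scores similarity to an arbitrary "ideal" sequence, and diversification is uniform random substitution.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def diversify(parent, rate, rng):
    """Diversification: substitute each residue with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa
                   for aa in parent)


def directed_evolution(parent, fitness, target, library_size=200,
                       rate=0.05, max_rounds=50, seed=1):
    """Iterate diversify -> screen -> select until fitness reaches `target`."""
    rng = random.Random(seed)
    best = parent
    for round_no in range(1, max_rounds + 1):
        library = [diversify(best, rate, rng) for _ in range(library_size)]
        hit = max(library, key=fitness)      # screening: keep the top variant
        if fitness(hit) > fitness(best):
            best = hit                       # the hit becomes the next parent
        if fitness(best) >= target:
            return best, round_no
    return best, max_rounds


# Hypothetical fitness: number of residues matching an arbitrary ideal sequence.
IDEAL = "MKTAYIAKQR"


def toy_fitness(seq):
    return sum(a == b for a, b in zip(seq, IDEAL))


variant, rounds = directed_evolution("MAAAAAAAAA", toy_fitness, target=len(IDEAL))
print(variant, toy_fitness(variant), rounds)
```

Keeping the parent unless a strictly better variant is found means fitness never decreases between rounds, mirroring the "select the best, then re-diversify" logic.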
4. Steps of Directed Evolution
• Random mutagenesis is applied to the gene encoding the protein.
• A selection regime is used to pick out variants that have the desired qualities.
• Further rounds of mutation and selection are then applied.
• The result is a library of variants possessing the desired properties.
5. Occurrence of Mutations
Mutations are introduced into the gene of interest in several ways:
□ Error-Prone PCR
□ DNA Shuffling
□ Saturation Mutagenesis
[Figure: path from the starting gene to the goal]
6. Error-Prone PCR
Error-prone PCR relies on the misincorporation of nucleotides by DNA polymerase to generate point mutations in a gene sequence.
Under certain conditions, the low fidelity of DNA polymerases generates point mutations during PCR amplification of a gene of interest.
Increased magnesium concentration, supplementation with manganese, or the use of mutagenic dNTPs reduces base-pairing fidelity and increases the mutation rate.
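As a rough illustration, the sketch below treats error-prone PCR as independent per-base misincorporation. It assumes a single effective error rate per base for the whole amplification (real protocols accumulate errors cycle by cycle and have polymerase-specific biases); the gene and the rate are hypothetical.

```python
import random

BASES = "ACGT"


def error_prone_pcr(template, error_rate, n_copies, seed=0):
    """Return mutant copies of `template`.

    Each base is replaced by a different base with probability `error_rate`;
    all PCR cycles are collapsed into this single per-base rate (an assumption).
    """
    rng = random.Random(seed)
    copies = []
    for _ in range(n_copies):
        copy = [
            rng.choice(BASES.replace(base, "")) if rng.random() < error_rate else base
            for base in template
        ]
        copies.append("".join(copy))
    return copies


def count_point_mutations(reference, variant):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(reference, variant))


gene = "ATG" + "GCT" * 299          # hypothetical 900 bp gene
library = error_prone_pcr(gene, error_rate=0.003, n_copies=1000)
mean = sum(count_point_mutations(gene, v) for v in library) / len(library)
print(f"mean mutations per gene: {mean:.2f}")  # expected value: 900 * 0.003 = 2.7
```

Raising `error_rate` plays the role of the Mg2+/Mn2+ or mutagenic-dNTP adjustments: it shifts the mean number of point mutations per gene.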
7. DNA Shuffling
Here DNase I is used to fragment a set of parent genes into pieces 50-100 bp in length.
This is followed by a polymerase chain reaction (PCR) without primers.
DNA fragments with sufficient overlapping homologous sequence anneal to each other and are then extended by DNA polymerase.
Several rounds of this PCR extension are allowed to occur, until some of the DNA molecules reach the size of the parental genes.
These genes can then be amplified in another PCR, this time with the addition of primers designed to complement the ends of the strands.
Because portions of the parent genes are recombined into hybrid or chimeric forms with unique properties, the method is called DNA shuffling.
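The reassembly logic can be approximated in code, assuming aligned parent genes of equal length. Crossovers are allowed only where all parents share an identical window of sequence, standing in for the requirement of "sufficient overlapping homologous sequence"; the window size and the sequences are hypothetical, and fragment sizes and PCR kinetics are ignored.

```python
import random


def homologous_sites(parents, window=4):
    """Positions where all (aligned, equal-length) parents share an identical window.

    Only at such positions can a fragment from one parent anneal to another and prime.
    """
    length = len(parents[0])
    return [i for i in range(1, length - window + 1)
            if len({p[i:i + window] for p in parents}) == 1]


def shuffle(parents, n_crossovers=2, window=4, seed=0):
    """Build one chimera by switching templates at randomly chosen homologous sites."""
    rng = random.Random(seed)
    sites = homologous_sites(parents, window)
    points = sorted(rng.sample(sites, min(n_crossovers, len(sites))))
    pieces, start = [], 0
    for end in points + [len(parents[0])]:
        donor = rng.choice(parents)          # parent that supplies this segment
        pieces.append(donor[start:end])
        start = end
    return "".join(pieces)


P1 = "ATGGCTAAACTGGAAGCTTTACGTGAA"  # hypothetical parent genes
P2 = "ATGGCTCAGCTGGAAGCTGTGCGTGAA"
for seed in range(3):
    print(shuffle([P1, P2], seed=seed))
```

Every position of a chimera comes from one of the parents, which is why shuffling recombines existing diversity rather than creating new point mutations.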
9. Sequence Saturation Mutagenesis
• Sequence saturation mutagenesis (SeSaM) is a chemo-enzymatic random
mutagenesis method applied for the directed evolution of proteins and enzymes.
• In four PCR-based reaction steps, phosphorothioate nucleotides are inserted into the gene sequence, the strands are cleaved at these positions, and the resulting fragments are elongated by universal or degenerate nucleotides.
• These nucleotides are then replaced by standard nucleotides, allowing for a broad
distribution of nucleic acid mutations spread over the gene sequence.
• The technique favours transversions and uniquely enables consecutive point mutations, both of which are difficult to generate by other mutagenesis techniques.
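To make the transition/transversion distinction concrete: a transition exchanges a purine for a purine (A↔G) or a pyrimidine for a pyrimidine (C↔T), while a transversion exchanges between the two classes. The small helper below is a sketch for illustration, not part of SeSaM itself; it classifies substitutions and shows that 8 of the 12 possible single-base substitutions are transversions.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref, alt):
    """Classify a single-base substitution as a transition or a transversion."""
    if ref == alt:
        raise ValueError("not a substitution")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


counts = {"transition": 0, "transversion": 0}
for ref, alt in permutations("ACGT", 2):     # all 12 ordered base changes
    counts[substitution_type(ref, alt)] += 1
print(counts)  # {'transition': 4, 'transversion': 8}
```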
10. STEP I of SeSaM
Universal SeSaM sequences are inserted by PCR with gene-specific primers that bind upstream and downstream of the gene of interest.
The gene of interest with its flanking regions is amplified to introduce these SeSaM-fwd and SeSaM-rev sequences.
The resulting forward and reverse templates are amplified in a PCR reaction with a predefined mixture of phosphorothioate and standard nucleotides, to ensure an even distribution of inserted mutations over the full length of the gene.
The PCR products of Step 1 are cleaved specifically at the phosphorothioate bonds, generating a pool of single-stranded DNA fragments of different lengths starting from the universal primer.
11. STEP II, III & IV of SeSaM
Step 2 - The single-stranded DNA fragments are elongated by one to several universal or degenerate bases, in a reaction catalyzed by terminal deoxynucleotidyl transferase (TdT). This is the key step for introducing the characteristic consecutive mutations that can randomly mutate entire codons.
Step 3 - PCR is performed recombining the single-stranded DNA fragments with the corresponding full-length reverse template, generating the full-length double-stranded gene with universal or degenerate bases in its sequence.
Step 4 - The universal/degenerate bases in the gene sequence are replaced by random standard nucleotides, generating a diverse array of full-length gene sequences with substitution mutations.
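The four SeSaM steps can be caricatured as a per-molecule simulation. This is a simplified sketch under several assumptions: cleavage sites are uniformly distributed, TdT adds 1 to 3 universal bases (the upper bound is hypothetical), the universal base is represented by the placeholder symbol `X`, and Step 4 replaces each universal base with a uniformly random nucleotide. It reproduces the feature emphasized above: substitutions clustered at consecutive positions.

```python
import random

UNIVERSAL = "X"  # placeholder symbol for a universal/degenerate base (assumption)


def sesam_variant(gene, rng, max_tail=3):
    """One simplified pass through SeSaM Steps 1-4 for a single gene copy."""
    # Step 1: cleavage at a randomly placed phosphorothioate bond -> fragment gene[:cut]
    cut = rng.randrange(1, len(gene))
    # Step 2: TdT appends 1..max_tail universal bases to the fragment's 3' end
    tail = min(rng.randint(1, max_tail), len(gene) - cut)
    # Step 3: the elongated fragment is completed on the full-length reverse template;
    #         the universal bases occupy the next `tail` positions of the gene
    full_length = gene[:cut] + UNIVERSAL * tail + gene[cut + tail:]
    # Step 4: universal bases are replaced by random standard nucleotides
    return "".join(rng.choice("ACGT") if b == UNIVERSAL else b for b in full_length)


def mutated_positions(reference, variant):
    return [i for i, (a, b) in enumerate(zip(reference, variant)) if a != b]


rng = random.Random(42)
gene = "ATGAAAGCTCTGGAACGTGGTTAA"  # hypothetical short gene
for _ in range(5):
    v = sesam_variant(gene, rng)
    print(v, mutated_positions(gene, v))
```

A random replacement can occasionally restore the original base, so some variants carry fewer mutations than universal bases added.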
12. PROTEIN STRUCTURE AND RELATION OF
CHROMOPHORE IN RHODOPSIN
Ref.: Herwig et al., Directed Evolution of a Bright Near-Infrared Fluorescent Rhodopsin Using a Synthetic Chromophore, Cell Chemical Biology 24, 415–425, March 16, 2017, Elsevier Ltd., http://dx.doi.org/10.1016/j.chembiol.2017.02.008
13. GENE CONSTRUCT AND METHODOLOGY
[Flowchart: Identification of gene → Construct preparation & integration → Error-prone PCR and selection]
Eukaryotic construct for expression of a fusion of GFP and an Arch mutant (Arch Mut) driven by a CaMKIIa promoter. TS: trafficking sequence. ES: export signal. WPRE: woodchuck hepatitis virus posttranscriptional enhancer.
14. RESULTS
15. ENZYME KINETICS AND FLUORESCENCE INTENSITY COMPARISON
17. BIOCATALYSIS
• Use of natural catalysts to speed up chemical reactions
• Biological sources: enzymes / whole cells
• Applications: pharmaceutical, chemical, food & agro-based industries
• Benefits over chemical catalysis:
o Toxic by-products are avoided, so processes are cleaner and need no toxin removal
o Enzymes are highly specific and function under mild conditions
o Enzymes are larger than traditional catalysts, giving more contact points between substrate and enzyme
o Enzymes are readily modified by protein engineering, e.g. so that an enzyme can work with a different substrate
18. • Major factors to be accounted for: reaction kinetics & stability
• Understanding the structure & function of enzymes in terms of:
o stability
o activity
o sustainability
o substrate specificity
19. PROCESS OF BIOCATALYSIS
Target
→ Primary screening: commercial enzymes; existing enzyme libraries; microorganisms
→ Suitable enzyme/whole cell for biocatalysis
→ Secondary screening: kinetics; high-level expression/metabolic engineering; selectivity/productivity; directed evolution
→ Optimized enzyme/whole cell
→ Application & process engineering: solubilized or immobilized process; aqueous or biphasic system; product recovery/enzyme/cofactor recycling; economics
→ Optimized bio or chemo-bio process
→ Scale-up: engineering; waste handling; environmental impact
→ Production plant
20. Biocatalysis of α-isophorone to ketoisophorone
• The monoterpenoid α-isophorone is sourced from renewable plant dry matter.
• It can be hydroxylated to 4-hydroxyisophorone, the main precursor for the synthesis of ketoisophorone.
• Ketoisophorone is a key intermediate for the production of carotenoids and vitamin E.
• Chemical route: α-isophorone → β-isophorone (isomerization at high temperature) → ketoisophorone. The isomerization equilibrium is shifted towards the substrate, so only a 2% yield is commonly obtained.
21. • Direct selective allylic oxidation of α-isophorone to ketoisophorone has also been demonstrated, but it:
o requires toxic heavy metals
o yields undesired toxic by-products
o requires harsh conditions
• A greener route: enzyme-catalysed hydroxylation of α-isophorone to 4-hydroxyisophorone (4HID), followed by oxidation of 4HID to ketoisophorone.
22. Biocatalysed oxidation of 4HID to ketoisophorone (KET) with a two-enzyme system: an alcohol dehydrogenase (ADHaa) oxidizes 4HID to KET while reducing NADP+ to NADPH, and an NAD(P)H oxidase (NOX) regenerates NADP+ from NADPH, using oxygen as a sacrificial substrate and releasing water.
[Scheme: 4HID (OH) + NADP+ → (ADHaa) → KET (C=O) + NADPH + H+; NADPH + H+ → (NOX, O2) → NADP+ + water]
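Why the NOX step matters can be seen from a toy stoichiometric count. This sketch assumes a water-forming NOX (as the "Water" label in the scheme suggests) and integer "units" of each species; the amounts are hypothetical. A substoichiometric amount of NADP+ supports full conversion only when it is recycled.

```python
def oxidize_4hid(hid, nadp, recycle):
    """Count KET formed from `hid` units of 4HID and `nadp` units of NADP+.

    ADHaa: 4HID + NADP+ -> KET + NADPH + H+
    NOX:   NADPH + H+ + 1/2 O2 -> NADP+ + H2O   (water-forming NOX assumed)
    """
    ket, nadph = 0, 0
    while hid > 0:
        if nadp == 0:
            if recycle and nadph > 0:
                nadph, nadp = nadph - 1, nadp + 1   # NOX regenerates NADP+
                continue
            break                                   # oxidized cofactor exhausted
        hid, ket = hid - 1, ket + 1                 # one ADHaa turnover
        nadp, nadph = nadp - 1, nadph + 1
    return ket


print(oxidize_4hid(100, 5, recycle=False))  # 5: stops once NADP+ is used up
print(oxidize_4hid(100, 5, recycle=True))   # 100: the 5 units of NADP+ are recycled
```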
23. CATALYTIC PROMISCUITY
• The early applications of directed evolution of enzymes aimed to optimize the stability
and performance under new reaction conditions.
• Arnold and co-workers have repeatedly shown that it is possible to evolve enzymes to
improve their activity under new conditions in terms of solution composition,
temperature, etc., and to change their catalytic activity towards new substrates and
reactions.
• This is possible as long as the enzyme that is chosen as a starting point has at least some
low level of activity for the intended reaction, i.e. some level of catalytic promiscuity.
24. DIRECTED EVOLUTION OF TRYPTOPHAN SYNTHASE
• Tryptophan synthase (TrpS) is a pyridoxal phosphate (PLP)-dependent enzyme that
catalyzes the condensation of indole and L-serine to form L-tryptophan.
• The enzyme consists of two subunits, α (TrpA) and β (TrpB), which have low catalytic efficiencies in isolation; their activities increase upon complex formation.
• Outside its native complex, TrpB loses up to 95% of its activity and is subject to inactivation.
• AIM: to test whether directed evolution could recover the activity lost when TrpA is removed and create a highly active stand-alone TrpB enzyme.
25. Directed Evolution of PfTrpB for Stand-Alone Function
• Selection of the parent enzyme, TrpB from Pyrococcus furiosus (PfTrpB)
• Recombination of the 12 most activating mutations from the first generation
• Screening of 1208 clones to identify PfTrpB4D11 and PfTrpB0B2
• Biochemical comparison of the evolved PfTrpB enzymes with PfTrpS
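As a back-of-the-envelope check on the recombination step, suppose (an assumption not stated on the slide) that the library contained every combination of the 12 mutations at equal frequency. It would then have 2^12 = 4096 members, and the expected fraction of distinct members seen among 1208 randomly picked clones follows from the standard sampling-with-replacement formula 1 − (1 − 1/N)^n.

```python
def expected_fraction_sampled(library_size, clones):
    """Expected fraction of distinct variants seen when picking `clones` at random
    (with replacement) from `library_size` equally likely variants."""
    return 1 - (1 - 1 / library_size) ** clones


n_mutations = 12
library_size = 2 ** n_mutations   # every subset of the 12 mutations
print(library_size, round(expected_fraction_sampled(library_size, 1208), 3))
# 4096 and roughly 0.255
```

Real recombination libraries are rarely uniform, so this is only an order-of-magnitude estimate of how much of the combinatorial space such a screen can reach.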
26. DIRECTED EVOLUTION OF CYTOCHROME P450
• In a series of studies, Arnold and co-workers changed the activity of cytochrome P450 to
catalyse a set of reactions for which no specific enzyme was previously available.
• Candidate new reactions for a given enzyme are chosen by intuition based on mechanistic, chemical and structural similarities.
27. • One such reaction is cyclopropanation. Cytochrome P450 shows catalytic promiscuity: it can catalyse, with very low efficiency, the cyclopropanation of styrene with ethyl diazoacetate (EDA).
• To optimize catalytic activity, a change of the iron-ligating residue from Cys to Ser or His
was included, leading to a shift in the characteristic 450-nm Soret peak in the
absorbance spectrum of the enzyme to 411 nm. Therefore, the evolved enzymes were
called cytochrome P411.
An evolved biocatalyst for
cyclopropanation. The cytochrome P411
variant of cytochrome P450 with the
protein backbone shown as ribbon
representation and sidechains as sticks.
Side-chains that were mutated in
engineered variants are shown in red.
29. STRUCTURAL CHARACTERIZATION OF ENGINEERED
PROTEINS
To visualize the protein-subunit interfaces involved in activity regulation, the active-site organization of the engineered enzymes, and the substrate- and cofactor-binding sites, the crystal structures of the engineered proteins are determined.
Visualizing the evolved protein variants at the molecular level tells the story behind beneficial mutations. These crystal structures underpin the research group's protein engineering efforts.
30. Directed Evolution Workflow
• Selection of the parent enzyme
• Comparison of the kinetics of the evolved enzyme with the wild type
• Structural characterization of the mutated enzyme
31. X-Ray Crystallography:
X-ray crystallography is a tool used for determining the atomic and molecular structure of a crystal, in which the crystalline atoms cause a beam of incident X-rays to diffract into many specific directions. By measuring the angles and intensities of these diffracted beams, a crystallographer can produce a three-dimensional picture of the density of electrons within the crystal. From this electron density map, the mean positions of the atoms in the crystal can be determined, as well as their chemical bonds, their disorder and various other information.
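The link between the measured diffraction angles and the crystal's structure is Bragg's law, nλ = 2d sin θ, which relates the X-ray wavelength λ and the angle θ to the spacing d between lattice planes. The sketch below applies it; the Cu Kα wavelength and the 2θ value are illustrative choices, not data from the studies discussed here.

```python
import math


def bragg_d_spacing(two_theta_deg, wavelength, order=1):
    """Lattice-plane spacing d from Bragg's law, n * wavelength = 2 * d * sin(theta).

    `two_theta_deg` is the measured scattering angle 2-theta in degrees;
    d is returned in the same length unit as `wavelength`.
    """
    theta = math.radians(two_theta_deg / 2)
    return order * wavelength / (2 * math.sin(theta))


# Illustrative: Cu K-alpha wavelength (1.5406 angstrom), reflection at 2-theta = 30 degrees.
print(round(bragg_d_spacing(30.0, 1.5406), 3))  # 2.976
```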
33. Directed evolution of an iron-containing enzymatic
catalyst—based on a cytochrome P450 monooxygenase—for
the highly enantioselective intermolecular amination of
benzylic C–H bonds by site-saturation mutagenesis
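Site-saturation libraries are commonly built with a degenerate codon such as NNK (N = A/C/G/T, K = G/T); that choice is an assumption here, since the slide does not state which codon was used. The sketch enumerates NNK against the standard genetic code and estimates how many clones must be screened for 95% coverage of one saturated site.

```python
import itertools
import math

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ("*" = stop).
AA_BY_CODON = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AA_BY_CODON[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
IUPAC = {"N": "ACGT", "K": "GT", "S": "CG"}


def expand(degenerate_codon):
    """All concrete codons encoded by a degenerate codon such as 'NNK'."""
    options = [IUPAC.get(ch, ch) for ch in degenerate_codon]
    return ["".join(c) for c in itertools.product(*options)]


def clones_for_coverage(n_codons, coverage=0.95):
    """Clones needed so that any given (equiprobable) codon is seen with probability `coverage`."""
    return math.ceil(math.log(1 - coverage) / math.log(1 - 1 / n_codons))


codons = expand("NNK")
amino_acids = {CODON_TABLE[c] for c in codons} - {"*"}
stops = sum(CODON_TABLE[c] == "*" for c in codons)
print(len(codons), len(amino_acids), stops, clones_for_coverage(len(codons)))
# 32 codons, 20 amino acids, 1 stop codon (TAG), 95 clones
```

NNK keeps all 20 amino acids while allowing only one stop codon, which is why degenerate codons of this kind are popular for saturating a single position.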
34. Active site view of the P-4 A82L
A78V F263L crystal structure,
showing the haem in white and the
iron atom in orange. Key active site
residues are labelled and shown
as sticks in blue. Residue S400
ligates the iron centre; mutations at
positions 78, 82, 263, and 267
enhance C–H amination activity
and/or selectivity. All beneficial
mutations identified in this study lie
in the P411 active site on the distal
face of the haem.
35. OTHER METHODS FOR STRUCTURE CHARACTERIZATION
□ Experimental Approaches:
▪ NMR Spectroscopy
▪ Cryo Electron Microscopy
□ Computational Approaches:
▪ Homology Modelling
▪ Fold Recognition
▪ Threading
36. References:
• Prier, C. K., Zhang, R. K., Buller, A. R., Brinkmann-Chen, S., & Arnold, F. H. (2017).
Enantioselective, intermolecular benzylic C–H amination catalysed by an engineered
iron-haem enzyme. Nature Chemistry, 9(7), 629.
• Wright, C. M., Majumdar, A., Tolman, J. R., & Ostermeier, M. (2010). NMR characterization
of an engineered domain fusion between maltose binding protein and TEM1 β‐lactamase
provides insight into its structure and allosteric mechanism. Proteins: Structure, Function,
and Bioinformatics, 78(6), 1423-1430.
• Siezen, R. J., de Vos, W. M., Leunissen, J. A., & Dijkstra, B. W. (1991). Homology modelling
and protein engineering strategy of subtilases, the family of subtilisin-like serine
proteinases. Protein Engineering, Design and Selection, 4(7), 719-737.
• García-Nafría, J., Lee, Y., Bai, X., Carpenter, B., & Tate, C. G. (2018). Cryo-EM structure of
the adenosine A2A receptor coupled to an engineered heterotrimeric G protein. eLife, 7,
e35946.
• http://fhalab.caltech.edu/
• https://en.wikipedia.org/wiki/X-ray_crystallography
38. Path towards Structure-Guided Recombination
• In natural evolution, genes from different individuals are mixed through mating or pollination.
• This combines beneficial mutations and eliminates less functional ones.
• Willem Stemmer used the test-tube equivalent of mating, i.e. DNA shuffling, to achieve the same goal.
• Using several cycles of DNA shuffling, he evolved enzymes that were much more effective than the original enzyme.
39. Schnepf, H. E., et al. (1998). Bacillus thuringiensis and its pesticidal crystal proteins. Microbiology and Molecular Biology Reviews, 62, 775-806.
40. Structure-Guided Recombination
• Recombination of homologous sequences is much more conservative than random mutation: the resulting chimeras are more likely to fold than random mutants.
• It produces chimeric enzymes that differ from one another in sequence but retain a high probability of folding.
• In this method, recombination is guided by structural information.
• A computer algorithm called SCHEMA was developed for this purpose.
41. SCHEMA Recombination
• The primary goal is to maximize the mutation level of the chimeras while keeping their probability of folding high, in order to promote functional evolution without disrupting structure.
• The algorithm selects crossovers that minimize the average disruption, E, of the library, subject to constraints on the length of each fragment.
• The SCHEMA disruption E counts the number of residue-residue interactions that are broken by recombination.
• Libraries of various proteins, such as arginases and β-lactamases, have been developed by this method.
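The disruption E and the crossover search described above can be sketched for a toy case. Assumptions: parents are aligned and of equal length, contacts are given as residue index pairs (in practice they come from the protein structure), and crossovers are found by exhaustive search rather than by a dedicated optimization algorithm. The parents and contacts are hypothetical.

```python
import itertools


def chimeras(parents, crossovers):
    """Every chimera obtained by taking each block (between crossovers) from any parent."""
    bounds = [0, *crossovers, len(parents[0])]
    n_blocks = len(bounds) - 1
    for choice in itertools.product(range(len(parents)), repeat=n_blocks):
        yield "".join(parents[p][bounds[b]:bounds[b + 1]] for b, p in enumerate(choice))


def disruption(chimera, parents, contacts):
    """SCHEMA E: contacts (i, j) whose residue pair occurs at (i, j) in no parent."""
    return sum(
        not any(chimera[i] == par[i] and chimera[j] == par[j] for par in parents)
        for i, j in contacts
    )


def average_disruption(parents, contacts, crossovers):
    library = list(chimeras(parents, crossovers))
    return sum(disruption(c, parents, contacts) for c in library) / len(library)


def best_crossovers(parents, contacts, n_crossovers, min_len):
    """Exhaustive search for crossovers minimizing average E, with a minimum block length."""
    length = len(parents[0])
    best = None
    for xs in itertools.combinations(range(1, length), n_crossovers):
        bounds = [0, *xs, length]
        if min(b - a for a, b in zip(bounds, bounds[1:])) < min_len:
            continue
        e = average_disruption(parents, contacts, xs)
        if best is None or e < best[0]:
            best = (e, xs)
    return best


# Hypothetical aligned parents and structural contacts (residue index pairs).
PARENTS = ["MKVLEA", "MRVIDA"]
CONTACTS = [(1, 4), (3, 4)]
print(best_crossovers(PARENTS, CONTACTS, n_crossovers=1, min_len=2))  # (0.5, (2,))
```

Placing the crossover so that both residues of a contact fall in the same block keeps that contact intact in every chimera, which is exactly what minimizing E rewards.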
42. Diverse Chimeras Created by Site-Directed Recombination
A. Site-directed recombination of three bacterial cytochromes P450, showing crossover sites chosen to minimize the number of disrupted contacts.
B. Sequences of the three parents and 97 folded P450 chimeras, with the number of amino acid changes relative to the closest parent.
Otey, C. R., et al. (2006). Structure-guided recombination creates an artificial family of cytochromes P450. PLoS
43. Advantages of Structure-Guided Recombination
• Helps in creating novel, highly functional protein diversity.
• Helps in understanding the benefits of recombination in evolution.
• Recombination in the test tube is limited neither to two parents nor to sequences from the same species.
• Enables the recombination of more distant parents.
45. Motivation
• Screening is the most laborious and resource-intensive step.
• The size of a mutant library grows exponentially with the number of residues mutated.
• Biophysical prediction methods are inadequate to map the mutation-function relationship.
• MD simulations require hundreds of hours of processing and a mechanistic understanding of the reaction.
• Machine learning is a powerful, efficient, and versatile tool for a variety of applications.
• It can leverage known data to guide future work.
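The exponential growth of the library can be made concrete: with all 20 amino acids allowed at each of n mutated positions, the number of variants is 20^n. A minimal sketch:

```python
def library_size(n_sites, alphabet=20):
    """Variants when each of `n_sites` positions can be any of `alphabet` residues."""
    return alphabet ** n_sites


for n in range(1, 7):
    print(n, library_size(n))
# 4 sites already give 160000 variants; 6 sites give 64000000
```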
47. Training Model
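• Protein fitness data for the GB1 protein (the B1 domain of streptococcal protein G) from Wu et al. (2016) were used.
• Directed evolution was simulated on this fitness landscape.
• For ML, 570 variants were used, i.e. 3-fold oversampling of the library size, giving about 95% library coverage.
• A single-mutation walk to identify mutations at 4 positions requires 4+3+2+1 = 10 libraries.
• At 57 variants per library (3-fold oversampling of the 19 possible substitutions at a site), this gives 10 × 57 = 570 total variants.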
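The single-mutation walk and the 570-variant budget can be reproduced in a toy sketch. Assumptions: each site-saturation library covers the 19 substitutions at a site and is screened at 3-fold oversampling (57 clones), the fitness function is hypothetical and additive, and the start sequence and target are placeholders.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SUBSTITUTIONS_PER_SITE = len(AMINO_ACIDS) - 1   # 19
OVERSAMPLING = 3                                 # assumed 3-fold per library


def single_mutation_walk(start, fitness):
    """Greedy walk: saturate every unfixed site, fix the best one, repeat.

    Returns the final sequence and the number of site-saturation libraries made.
    """
    current = start
    unfixed = list(range(len(start)))
    libraries = 0
    while unfixed:
        best = None                              # (score, site, residue)
        for site in unfixed:
            libraries += 1                       # one library per unfixed site
            for aa in AMINO_ACIDS:
                variant = current[:site] + aa + current[site + 1:]
                score = fitness(variant)
                if best is None or score > best[0]:
                    best = (score, site, aa)
        _, site, aa = best
        current = current[:site] + aa + current[site + 1:]
        unfixed.remove(site)
    return current, libraries


# Hypothetical additive fitness: residues matching an arbitrary target at 4 sites.
TARGET = "WLGA"


def toy_fitness(seq):
    return sum(a == b for a, b in zip(seq, TARGET))


final, n_libraries = single_mutation_walk("VDGV", toy_fitness)
variants_screened = n_libraries * SUBSTITUTIONS_PER_SITE * OVERSAMPLING
coverage = 1 - (1 - 1 / SUBSTITUTIONS_PER_SITE) ** (SUBSTITUTIONS_PER_SITE * OVERSAMPLING)
print(final, n_libraries, variants_screened, round(coverage, 3))
# WLGA 10 570, with a per-library coverage of about 0.95
```

On an additive landscape the greedy walk finds the optimum; on a rugged, epistatic landscape it can get stuck, which is part of the motivation for ML-guided screening.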
49. Application
• Rma NOD (a putative nitric oxide dioxygenase from Rhodothermus marinus) catalyzes a carbene-transfer reaction with the diazo compound Me-EDA.
• Rma NOD catalyzes carbon-silicon bond formation, yielding individual enantiomers with high selectivity.
• A mutated form of the enzyme was used.
• Enantiomeric excess (ee) was used as the fitness score.
• The ee for the (S)-enantiomer was increased from 76% to 93%.
• The ee for the (R)-enantiomer was found to be 79%.
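Since ee serves as the fitness score, it helps to see how it relates to the proportions of the two enantiomers: ee = (major − minor)/(major + minor) × 100%. The sketch converts between the two representations for the ee values quoted above.

```python
def enantiomeric_excess(major, minor):
    """ee (%) = (major - minor) / (major + minor) * 100, from amounts of each enantiomer."""
    return (major - minor) / (major + minor) * 100


def enantiomer_percentages(ee_percent):
    """Percentages (major, minor) of the two enantiomers implied by an ee."""
    major = (100 + ee_percent) / 2
    return major, 100 - major


for ee in (76, 79, 93):
    print(ee, enantiomer_percentages(ee))
# 76 (88.0, 12.0); 79 (89.5, 10.5); 93 (96.5, 3.5)
```

Raising ee from 76% to 93% therefore means cutting the minor enantiomer from 12% to 3.5% of the product.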
53. Results and Discussion
• ML can be used to quickly screen a full recombination library in silico.
• It sidesteps the need to understand the physico-chemical properties of novel proteins.
• It helps avoid negative epistatic mutational combinations.
• It can also give novel results.
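The in-silico screening idea can be sketched with a simple regression model. Everything here is a stand-in: a reduced 4-letter alphabet at 3 sites, a hypothetical fitness function with one epistatic term, one-hot features, and ordinary least squares in place of the more elaborate models discussed in the cited work. A random subset is "screened", the model is trained on it, the full combinatorial library is predicted, and the top-ranked variants are proposed for testing.

```python
import itertools

import numpy as np

ALPHABET = "ACDE"   # hypothetical reduced alphabet for a toy library
N_SITES = 3


def one_hot(variant):
    """Encode a variant as concatenated one-hot vectors, one block per site."""
    vec = np.zeros(N_SITES * len(ALPHABET))
    for site, aa in enumerate(variant):
        vec[site * len(ALPHABET) + ALPHABET.index(aa)] = 1.0
    return vec


def toy_fitness(variant):
    """Hypothetical landscape: additive site effects plus one epistatic bonus."""
    weights = {"A": 0.0, "C": 0.5, "D": 1.0, "E": 0.2}
    score = sum(weights[aa] * (site + 1) for site, aa in enumerate(variant))
    if variant[0] == "C" and variant[2] == "E":
        score += 1.5
    return score


def ml_guided_screen(n_train=20, top_k=5, seed=0):
    rng = np.random.default_rng(seed)
    library = ["".join(p) for p in itertools.product(ALPHABET, repeat=N_SITES)]
    train_idx = rng.choice(len(library), size=n_train, replace=False)   # "screened" subset
    X = np.array([one_hot(library[i]) for i in train_idx])
    y = np.array([toy_fitness(library[i]) for i in train_idx])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)                        # least-squares fit
    predictions = np.array([one_hot(v) for v in library]) @ coef        # predict all 64
    ranked = np.argsort(predictions)[::-1][:top_k]
    return [library[i] for i in ranked]


print(ml_guided_screen())
```

A purely additive model like this one cannot represent the epistatic bonus, which illustrates why more expressive models are preferred when epistasis matters.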
54. References
• Zachary Wu et al., Machine learning-assisted directed
protein evolution with combinatorial libraries (2019), PMID:
30979809.
• Kevin K. Yang et al., Machine-learning-guided directed
evolution for protein engineering (2019), PMID: 31308553.
• Nicholas C. Wu et al., Adaptation in protein fitness
landscapes is facilitated by indirect paths (2016), PMID:
27391790.