Introduction
The ability to predict the loop structure in a protein is
useful in many studies, including homology modeling,
protein design and docking.
Obtaining high-quality models becomes significantly
harder as the loop length increases.
The current research aims to overcome the challenges
caused by the ruggedness of the energy landscape
around a native protein structure, i.e. the presence of
high energy barriers immediately around the structure,
by locally manipulating the shape of the energy landscape
during certain steps of the conformational search.
Methodology
Sequence-Robust Loop Modeling with PyRosetta
Aparajita Dasgupta, Dr. Sheldon Park
Department of Chemical and Biological Engineering, University at Buffalo, SUNY, Email: adasgupt@buffalo.edu, sjpark6@buffalo.edu
PyRosetta is the Python-based interface to Rosetta, a
suite of software for computational protein structure
analysis. Within Rosetta, the kinematic closure (KIC)
loop algorithm allows prediction of the structure of
loops of up to twelve amino acids with high accuracy,
i.e. < 1 Å (Mandell et al., Nature Methods 2009,
6:551-2).
We note that protein structure, especially the main
chain conformation, is often robust against small
sequence variations. Transient mutations that smooth
the energy landscape can therefore be introduced
during the conformational search, with the possibility
of improving its results.
Figure 1: Procedure to improve conformational search by introducing transient mutations using KIC loop
protocol in PyRosetta
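The first step of the procedure, building the set of sequence variants to model, can be sketched in plain Python. This is a minimal sketch under stated assumptions: each transient mutation is taken to be a single alanine substitution at one loop position (the poster mentions alanine mutations and 12 mutants plus 1 wild type per protein, but does not spell out the scheme), and the function name `alanine_variants`, the example sequence and the loop boundaries are hypothetical. It works on sequence strings only and does not call PyRosetta.

```python
def alanine_variants(sequence, loop_start, loop_end):
    """Return the wild type plus one single-alanine variant per loop position.

    loop_start and loop_end are 1-based, inclusive residue numbers.
    Positions that are already alanine are skipped (assumption).
    """
    variants = [("WT", sequence)]
    for pos in range(loop_start, loop_end + 1):
        native = sequence[pos - 1]
        if native == "A":
            continue  # nothing to mutate at this position
        mutated = sequence[:pos - 1] + "A" + sequence[pos:]
        variants.append((f"{native}{pos}A", mutated))
    return variants


# Hypothetical 17-residue sequence with a 12-residue loop at positions 3-14.
example = alanine_variants("MKTGSLDEVNRQWYFHP", 3, 14)
for label, seq in example:
    print(label, seq)
```

Each variant would then be modeled with the KIC loop protocol, and the resulting models mutated back to the wild type sequence before comparison with the native structure.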
Results
Most protein structures yielded “funnel-shaped”
continuous graphs, while only some diverged from this
trend
Merely increasing the number of wild type structures
(structures without any alanine mutation) did not lead to
improved results
Figure 2: RMSD vs minimized energy for each of the 20 wild type (non-mutated) proteins. Each graph
represents 600 structures generated by the KIC loop protocol. Note the funnel-shaped contour in most
cases. For the proteins where the contour develops differently, prediction of loop structure is very difficult
because multiple conformations with different energies occur at the same RMSD.
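One rough way to put a number on how funnel-like an RMSD vs energy plot is would be to correlate the two quantities and check the RMSD of the lowest-energy model. The sketch below is an illustration only, not the analysis used on the poster: the Pearson coefficient is a swapped-in, simple proxy for funnel shape, the function names are hypothetical, and the decoy values are made-up toy numbers.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rmsd_of_lowest_energy(rmsds, energies):
    """RMSD of the model with the lowest energy (the model one would pick)."""
    best_index = min(range(len(energies)), key=lambda i: energies[i])
    return rmsds[best_index]


# Toy decoy set (not data from the poster): energy rises with RMSD.
toy_rmsd = [0.5, 1.0, 2.0, 3.0]
toy_energy = [-110.0, -105.0, -95.0, -85.0]
print(pearson(toy_rmsd, toy_energy))
print(rmsd_of_lowest_energy(toy_rmsd, toy_energy))
```

A coefficient close to 1 together with a low RMSD for the lowest-energy model would be consistent with a funnel; several conformations with different energies at the same RMSD would lower the coefficient.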
Figure 3: RMSD vs minimized energy for 3 wild type proteins (1cnv, 1t1d and 1i7p). Each graph represents
7500 wild type structures generated by the KIC loop protocol. Although the overall energy surface behaves
similarly to the case of 600 structures, there is no marked improvement in either minimizing energy or
predicting loop structure. Increasing the number of structures also did not yield the classic “funnel-shaped”
energy contour that is favorable for loop prediction, as is evident in the cases of 1cnv and 1i7p. Although the
number of conformations increases, the energy landscape is not smoothed, so structures that are possible
but are not reached because of intervening energy barriers (local maxima) are still not sampled. This
supports the conclusion that site directed mutagenesis is the right approach.
One-dimensional analysis of RMSD did not yield any
conclusive results identifying which amino acids (if any)
led to more difficult energy landscapes for modeling
purposes
Mutated structures led to lower energy and resulted in
better structure prediction
Figure 4: Boxplots depicting the distribution of LRMSD for each of the 20 amino acids. For each protein and its
13 versions (12 mutants and 1 wild type), the minimum RMSD was calculated and the mutated residue for
that particular structure was noted. Boxplots were plotted to see whether any clear trends appeared indicating
which amino acids posed an issue in de novo modeling. While some amino acids occur more often than
others, no clear trend was visible. The main conclusion drawn from this
exercise was that one-dimensional analysis does not yield any trends and that a two-dimensional analysis of
RMSD with another observable property (energy, in the current experiment) is vital to clearly understand the
bottlenecks associated with loop modeling.
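The grouping behind such boxplots can be sketched as follows: take the minimum RMSD for each protein and variant, then collect those minima by the residue type that was mutated. This is a minimal sketch under assumptions: the record layout, the function names and the toy values are hypothetical and not taken from the poster, and the summary reports only minimum, median and maximum rather than the full quartiles a boxplot draws.

```python
from collections import defaultdict
from statistics import median


def min_rmsd_by_residue(decoys):
    """Group the best (minimum) RMSD of each protein variant by mutated residue.

    decoys: iterable of (protein, variant_label, mutated_residue, rmsd).
    """
    best = {}
    for protein, variant, residue, rmsd in decoys:
        key = (protein, variant)
        if key not in best or rmsd < best[key][1]:
            best[key] = (residue, rmsd)
    groups = defaultdict(list)
    for residue, rmsd in best.values():
        groups[residue].append(rmsd)
    return dict(groups)


def summarize(groups):
    """Reduced boxplot summary per residue type: (min, median, max)."""
    return {res: (min(v), median(v), max(v)) for res, v in groups.items()}


# Toy records (not data from the poster).
toy = [
    ("1cnv", "T3A", "T", 2.0),
    ("1cnv", "T3A", "T", 1.2),
    ("1cnv", "G4A", "G", 3.1),
    ("1t1d", "T7A", "T", 0.8),
    ("1t1d", "T7A", "T", 1.5),
]
print(summarize(min_rmsd_by_residue(toy)))
```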
Figure 5: RMSD vs minimized energy for all 20 proteins for wild type and mutant structures. Each data point
on each graph represents a single average structure from the clusters formed for each type of
mutant. The blue data points are mutant structures while the purple data points are wild type structures. In all
cases the mutated structures had lower energy than the wild type structure. This leads us to the conclusion
that site directed mutagenesis can lead to improved de novo structure prediction when coupled with
the KIC loop protocol. Since energy and RMSD are significantly lower than for the wild type structures, the odds
of arriving at a correct structure increase greatly when using these mutated structures.
Future Work
While applying site directed mutagenesis led to better results,
there are still minor differences between the predicted structure
and the actual structure
Our initial approach was to combine all mutant and wild
type structures and determine whether this
smoothed the energy landscape further
However, this approach did not yield conclusive results
The current approach aims to linearize the RMSD-energy
relationship for each protein near the lowest energy
threshold, using linear regression techniques and
neural networks
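The linear regression part of this plan could look roughly like the sketch below: keep only the models whose energy lies within a window above the lowest energy found, then fit a straight line of energy against RMSD by ordinary least squares. The function name `fit_near_minimum`, the `window` parameter and the toy numbers are assumptions made for illustration; the poster does not specify how the threshold is chosen.

```python
def fit_near_minimum(rmsds, energies, window):
    """Least-squares fit energy = intercept + slope * rmsd near the energy minimum.

    Only models with energy <= min(energies) + window are used.
    Returns (slope, intercept, number_of_points_used).
    """
    e_min = min(energies)
    points = [(r, e) for r, e in zip(rmsds, energies) if e <= e_min + window]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two models inside the energy window")
    mean_r = sum(r for r, _ in points) / n
    mean_e = sum(e for _, e in points) / n
    sxx = sum((r - mean_r) ** 2 for r, _ in points)
    if sxx == 0:
        raise ValueError("all selected models have the same RMSD")
    sxy = sum((r - mean_r) * (e - mean_e) for r, e in points)
    slope = sxy / sxx
    intercept = mean_e - slope * mean_r
    return slope, intercept, n


# Toy values (not data from the poster); the last model falls outside the window.
slope, intercept, used = fit_near_minimum(
    [0.5, 1.0, 2.0, 3.0, 4.0],
    [-110.0, -105.0, -95.0, -85.0, -40.0],
    window=30.0,
)
print(slope, intercept, used)
```

A positive slope with small residuals near the minimum would indicate that energy tracks RMSD in the region that matters for selecting the final model.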
Acknowledgments
The authors would like to thank the UB School of Engineering
and Applied Science.
Figure 6: 1cnv native structure and minimum energy model mutated back to wild type. The RMSD is 3.3 Å for
this system. The current algorithm still leaves a few questions to be answered regarding the energy
function, the role of each type of amino acid and the characteristic energy landscape for each protein.
Citations
1. Mandell, D. J., Coutsias, E. A., & Kortemme, T.
(2009). Sub-angstrom accuracy in protein loop
reconstruction by robotics-inspired conformational
sampling. Nature Methods 6: 551–552.
2. Baugh, E. H., Lyskov, S., Weitzner, B. D., & Gray, J. J.
(2011). Real-Time PyMOL Visualization for Rosetta
and PyRosetta. PLoS ONE.
3. Das, R., & Baker, D. (2008). Macromolecular modeling
with Rosetta. Annual Review of Biochemistry 77: 363–382.