Bristol Centre for Complexity Science
Modelling the Folding and Stability of
de novo Hexameric Coiled-Coils
Adam Zienkiewicz
Supervisors:
Prof. Derek Wolfson (University of Bristol)
Dr. Richard Sessions (University of Bristol)
Prof. Noah Linden (University of Bristol)
A dissertation submitted to the University of Bristol in accordance with the requirements of the degree
of Master of Research by advanced study in Complexity Science in the Faculty of Engineering
Submitted: 7 January 2013
9,926 words
Abstract
The rational de novo design of proteins is making progressive inroads into expanding the huge
variety of intricate structures already provided by nature. The recent synthesis of a novel six-helix
coiled-coil, CC-Hex, represents an entirely new protein fold, possessing an internal pore with a
variety of potential bioengineering applications. Subtle modification to the peptide sequence results
in subdivision of the central channel and gives rise to conformational sensitivities with respect to
the ambient environment. Although the substitution of hydrophobic leucine residues with polar
aspartate residues is accommodated within the channel of the hexamer, the stability of the structure is
compromised with increasing pH. A major step in understanding the chemistry and future
applications of this unique protein is to be able to explain and predict the observed folding behaviour. The
inherent complexities of current ab initio approaches to protein structure prediction are reduced by
considering only the dominant physicochemical interactions specific to the L24D aspartate hexamer
mutant. Using a statistical description of transitional hexameric microstates, combined with
electrostatic energy models, we describe a new approach to predicting the macroscopic unfolding behaviour
of the L24D hexamer. The models described in this study present the foundations of a potentially
powerful new tool in the field of protein structure prediction.
Acknowledgements
This work was supported by the Engineering and Physical Sciences Research Council (EPSRC) [Grant
number: EP/I013717/1].
Author declaration
I declare that the work in this dissertation was carried out in accordance with the requirements of the
University’s Regulations and Code of Practice for Taught Postgraduate Programmes and that it has not
been submitted for any other academic award. Except where indicated by specific reference in the text,
this work is my own work. Work done in collaboration with, or with the assistance of others, is indicated
as such. I have identified all material in this dissertation which is not my own work through appropri-
ate referencing and acknowledgement. Where I have quoted from the work of others, I have included
the source in the references/bibliography. Any views expressed in the dissertation are those of the author.
SIGNED: (Signature of student)
DATE:
Contents
1 Introduction
2 pH dependence of hexamer oligomerisation
2.1 Modelling the hexamer energy landscape
2.2 Two dimensional, regular geometric model
2.2.1 Relative permittivity of the hexamer channel
2.3 Rigid backbone energy minimisation
3 A statistical approach to model hexamer unfolding
3.1 Folding free energy
3.2 Intrinsic pKa of L24D
3.3 Mathematical model
3.3.1 Expressing the pKa in terms of free energy
3.3.2 Concentration of a general configuration
3.4 Newton-Raphson method
4 Results and Discussion
4.1 Intrinsic aspartate pKa dependence
4.2 Folding free energy ∆Gf dependence
4.3 Dielectric constant εr dependence
4.4 Parameter space search
4.4.1 Best fit parameters - rmsd minimisation
4.5 Molecular dynamics
5 Summary and Conclusions
A Geometric Coulomb model of electrostatics
B Normalised energy and torsion angles for hexamer species fragments
C Log-concentration plot for L24D species
D L24D species concentrations for varying parameter values
D.1 Intrinsic aspartate pKa
D.2 Folding free energy ∆Gf
D.3 Dielectric constant (relative permittivity) εr
E Best-fit concentration data (rigid model energy landscape)
1 Introduction
The de novo design of peptides and proteins is a rapidly developing approach for investigating and
extending the huge natural repertoire of protein structures and functions. The combinatorially huge
number of amino acid sequences is limited to a greatly reduced subset of those which will fold quickly
and uniformly to a single native state. Through the processes of evolution by natural selection, nature
has explored a vast number of these sequences, devising proteins which interact within a vast web of
biological processes inside every living organism. It is well understood, however, that the structural space
which has been explored by nature forms only a very limited fraction of potential protein structures and
functions [1]. The development of de novo proteins is therefore fundamental to increasing our under-
standing of protein structures, with the potential to present novel scaffolds with functional properties of
interest to a wide variety of biotechnological applications.
The development of synthetic proteins is a diverse field, with a variety of techniques used to extend the
boundaries of natural structures. Some methods have successfully focussed on replicating specific protein
features, with subtle modifications to known architectures in order to produce novel protein folds. Other
approaches include generating novel structures in silico [2]; however, a major impediment to the
successful implementation of these designs is our limited understanding of the amino acid sequence-to-structure
relationship, generally known as the ‘protein folding problem’. In very few cases, the rational or de novo
design of proteins has been successful with the help of structure prediction algorithms which circumvent
the application of complex ab initio computational techniques [3]. For example, the sequence-to-structure
relationships of certain structural motifs are well established due to their ubiquity in natural proteins.
By compiling databases which classify proteins into ‘families’ based on conserved amino acid sequences,
a wide variety of bioinformatic techniques have been developed with the aim of predicting secondary
and higher order protein structures [4–6]. These techniques, known as homology methods, rely on
experimentally determined structure/sequence data, combined with the development of sophisticated and
efficient algorithms used to compare and evaluate new sequence data. Unlike ab initio methods, however,
which attempt to combine physically realistic forcefields with efficient methods to sample conformational
folding space in silico, homology methods are limited to known structural families and offer limited
solutions with regard to predicting and designing de novo proteins. Current ab initio methods are, however,
computationally intensive [7–9] and rely on incomplete or approximate physicochemical forcefields.
The focus of this research is to identify properties of a de novo protein which allow us to develop new
methods to predict folding behaviour with respect to specifically designed features of the protein. In
particular, we consider the pH response and structural stability of a novel six-helix coiled-coil protein by
developing a unique statistical approach to predict protein folding behaviour. The inherent complexities
of standard ab initio methods are avoided by exploiting the highly symmetric structure of certain pro-
teins such as coiled-coils, shedding light on new techniques which may be applied more broadly to tackle
the protein folding problem. This bold statement is somewhat justified in that coiled-coils, comprised of
a bundle of two or more alpha-helices, represent on average 3% of all protein-encoding regions across all
known genomes [10]. The universality of the coiled-coil motif, representing one of the simplest tertiary
protein structures, therefore provides a very useful example for protein folding and design investiga-
tions [11,12].
Coiled-coils are highly symmetric bundles of between two and five alpha-helices, wound together like the
strands of a rope, with natural proteins most commonly favouring dimeric and trimeric coils.
Coiled-coil regions are found in proteins involved in biologically important functions with great diversity,
from gene transcription to regulating muscle proteins - examples include the dimerisation region of the
‘leucine zipper’ (GCN4) [13,14] and the trimeric coiled-coil domain of the cartilage protein matrilin-1 [15],
shown in figure 1. Crucially for the purposes of protein design, there is a well established
sequence-to-structure relationship which drives the folding and assembly of all coiled-coils, taking the form of a
repeated heptad of hydrophobic and polar residues (HPPHPPP)n. Originally proposed by Pauling [16],
the sequence repeat encodes amphipathic helices which interact via spiralling hydrophobic seams to form
a helical bundle. The seminal work of Crick in 1953 [17] later verified this elegant relationship,
supplemented with a description of the highly specific helix side-chain interactions known as knobs-into-holes
(KIH) packing. Most recently, the consideration of these hallmark attributes has allowed the rational de
novo design of peptides which oligomerise into stable, six-helix (hexameric) coiled-coils, representing a
novel, stand-alone protein fold not found in nature [18].
Figure 1: Cartoon of NMR structure of the trimeric coiled-coil domain of chicken cartilage matrix
protein (matrilin-1) [PDB: 1AQ5] - taken from [15]
The models developed in this research focus on the de novo protein CC-Hex, a six-helix coiled-coil
designed and synthesised by the Woolfson group at the University of Bristol [18]. Unlike previously ob-
served coiled-coils, comprised of 5 or fewer component helices, CC-Hex possesses the remarkable feature
of a central channel running the length of the structure, approximately 6 Å in diameter. Early indications
from X-ray diffraction experiments (XRD) and electron density analysis suggest the presence of a chain
of water molecules occupying the channel. Whether or not the pore is strictly water permeable is still
debated, however the existence of a well defined channel provides strong motivation for the rational
design of tubular or ion-channel proteins. The additional feature of CC-Hex, providing a crucial basis
for the models developed in this study, is the mutability of the leucine residue at position 24 on the
peptide, accepting polar residues such as aspartic acid (D) and histidine (H) - where the amino-acid
sequences for CC-Hex and the respective CC-Hex-D24 and CC-Hex-H24 mutants can be found in figure
2. By crystallising the stable hexamers at low pH, the mutant structures were analysed with XRD to
reveal a partitioning of the central channel at the polar residue with a large interior chamber and a
smaller chamber at the N-terminus (shown in figure 4 in the following chapter). As a result of these
single residue substitutions however, the helical folding of the mutant peptide was found to be almost
completely compromised at neutral ambient pH, unlike the parent hexamer.
Figure 2: Amino-acid sequences for the de novo hexameric coiled-coil protein CC-Hex (red) and L24
mutants; (blue) the CC-Hex-D24 (L24D) mutant protein - [18]
The design and synthesis of the novel hexameric protein provides an exciting new scaffold with a variety
of functional applications. As a stand-alone structure possessing a well defined and potentially mutable
internal pore, CC-Hex presents novel opportunities for the development of drug delivery applications,
ion-channels and other membrane spanning proteins. It is important therefore that we explore how
the chemistry of the internal channel, via the different mutant forms, affects the overall structure and
stability of the protein. The key objective of this research is therefore to provide analytical models
which can explain and predict the folding behaviour and stability of the hexamer by highlighting the
dominant physical interactions between local structures. Specifically, in this research we consider the
folding behaviour of the aspartate CC-Hex-D24 mutant (L24D) protein with respect to experimental
observations.
2 pH dependence of hexamer oligomerisation
The mathematical models developed in this research are constructed in order to reproduce and quantify
the experimentally observed unfolding of the L24D hexamer as a function of solution pH. In particular the
experimentally observed helicity of the L24D mutant, unlike the parent CC-Hex, responds significantly
with respect to the background pH in solution. Circular dichroism (CD) spectroscopic experiments pro-
vide a direct measure of the bulk helicity of peptides in solution by measuring the relative directional
(L/R) absorption of circularly polarised light. This helicity data can then be used to infer the structure
and conformation of proteins, where the detected signal indicates the overall helicity due to secondary,
and higher order structural features. For example, the signal detected from a solution of oligomerised
coiled-coils combines components due both to the alpha-helical backbones (secondary structure) and
the (tertiary) helical coil of the multiple peptide backbones. Using helicity data gathered from these
experiments (figure 3), Zaccai & Chi et al. [18] inferred the denaturing, or unfolding, of the L24D hexamer
as they varied the pH of the solution - observing a transition from fully folded hexamers below pH 3 to
less than 20% folded above pH 7.4.
Figure 3: Helicity of hexamer mutants as judged by the CD signal at 222 nm as a function of pH at
20 °C. Plots indicate relative folding of L24D (red diamonds), L24H (blue squares) and a 1:1 mixture of
L24D:L24H (purple circles) in solution. Note that the parent CC-Hex (not shown) did not show any
appreciable changes in helicity or stability over this range of pH - data from [10,19]
The central mechanism affecting the stability of the L24D hexamer mutant, addressed in the following
sections, is thought to be primarily due to the repulsive electrostatic forces produced by proximal charged
acid groups in the hexamer channel, modulated by the pH of the background solution. The activity of
solvated hydrogen ions, in solution with the individual peptides, directly influences the electrostatic
forces between the individual peptide coils via the association / dissociation of hydrogen ions with the
polar side-chains of the aspartic acid (Asp) residues. In the context of the fully folded hexamer, the
helical backbones are oriented such that the Asp side-chains, located at residue 24, are directed towards
the inside of the central channel running down the length of the hexameric coiled-coil, shown in figure 4.
Figure 4: (left) Cartoon representation of the L24D hexamer looking down the central axis. The
helical peptide backbones (ribbons) are shown with the six aspartate residue side-chains (sticks) oriented
inwards towards the channel (≈ 6 Å in diameter). (right) Cutaway profile of hexamer with aspartate
‘ring’ identified within grey section - images created with PyMol from XRD data (PDB id: 3R46)
Crucial to the stability of the hexamer, the functional acid groups of the individual aspartate residues
respond to the pH of the environment by accepting or donating a single hydrogen ion (H+), with the effect
of modifying the net charge of the side-chain. More specifically, the aspartate residues are protonated,
or deprotonated, through the equilibrium reaction
$$\mathrm{HD} \rightleftharpoons \mathrm{D}^- + \mathrm{H}^+ \tag{1}$$
where D represents an aspartic acid residue. The equilibrium constant, or acid dissociation constant,
for this process is defined in the usual way, as the quotient of the concentrations of products and reactants
$$K_a = \frac{[\mathrm{D}^-][\mathrm{H}^+]}{[\mathrm{HD}]} \tag{2}$$
Due to the many orders of magnitude spanned by Ka, we can take the negative logarithm to obtain the
intrinsic pKa of the residue, defined as the pH for which the titrating site is 50% occupied, or protonated
$$\mathrm{p}K_a = -\log_{10} K_a \tag{3}$$
The reaction between an aspartic acid and its deprotonated, or conjugate base, state is shown in figure
5. When the pH of the solution is equal to the pKa, we expect to find approximately equal proportions
of both species, by definition. As the pH is varied, the concentration of solvated hydrogen ions changes,
according to
$$\mathrm{pH} = -\log_{10}[\mathrm{H}^+] \tag{4}$$
For example, as the pH increases (more basic) the concentration of H+ ions in solution decreases and the
solution acts as a ‘hydrogen sponge’, tipping the equilibrium in favour of the base residue. In this scenario,
deprotonation of the aspartic acid side-chain leads to a chemical resonance, in which some electrons, previously
shared covalently between a hydrogen and oxygen atom, become de-localised, resulting in a negatively
charged site at the location indicated in figure 5. Conversely, at low pH the balance is favoured in
the other direction, where an increased concentration of available H+ ions are ‘captured’ by otherwise
deprotonated base residues. It is this transition, between neutral, protonated aspartate groups at low
pH, to negatively charged deprotonated states at higher pH, which causes variability in the net charge
of the hexamer as a function of pH. Moreover, the proximity of multiple charged sites on the aspartate
side-chains, buried within the interior channel, is expected to provide sufficient electrostatic repulsion to
overcome the folding free energy of the hexamer itself, leading to the partial or complete unfolding of
the protein.
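As an illustration of how equations (2)-(4) translate pH into the charge state of a single, isolated aspartate, the short Python sketch below evaluates the fraction of deprotonated sites, [D−]/([HD] + [D−]) = 1/(1 + 10^(pKa − pH)), which follows directly from rearranging (2). The pKa value used here is an assumed, typical textbook figure for an aspartate side-chain and is not a parameter taken from this study; the function name is likewise only illustrative.

```python
def fraction_deprotonated(pH: float, pKa: float) -> float:
    """Fraction of titratable sites in the charged D- state.

    Rearranging Ka = [D-][H+]/[HD] with pH = -log10[H+] and pKa = -log10 Ka
    gives [D-]/[HD] = 10**(pH - pKa), hence the expression below.
    """
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))


# ASSUMPTION: illustrative pKa for an isolated aspartate side-chain,
# not a fitted or measured value from this work.
ASSUMED_PKA = 3.9

for pH in (2.0, 3.9, 5.0, 7.4):
    frac = fraction_deprotonated(pH, ASSUMED_PKA)
    print(f"pH {pH:4.1f}: fraction deprotonated = {frac:.3f}")
```

At pH = pKa the function returns exactly one half, consistent with the definition of the pKa given above; the buried residues of the hexamer are not expected to follow this isolated-site curve exactly, which is what motivates the models that follow.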
Figure 5: Acid dissociation reaction between neutral aspartate (AspØ) (left), and negatively charged
conjugate base (Asp⁻) (right)
2.1 Modelling the hexamer energy landscape
In the following sections we outline a simple geometric model of charge distribution within the hexamer
channel, which is then compared and validated with results using a more detailed molecular mechanics
forcefield. Once a consistent model describing the hexamer energy landscape is established, the
principles of equilibrium thermodynamics can then be applied to derive the relative concentrations of folded
and unfolded states. Prior to exploring a detailed statistical approach in the next chapter, it is
pertinent to develop simple models to predict the varying electrostatic energy of L24D with regard to its
specific folded conformation as a hexameric coiled-coil. From this foundation we can construct models
of increasing complexity, providing testable predictions of the folding behaviour and therefore valuable
information on the functional chemical properties of this novel protein.
As the background pH is increased, the equilibrium between neutral, protonated aspartate residues (AspØ)
shifts in favour of the negatively charged, deprotonated (Asp⁻) resonance. When considering the
ensemble of all peptides in solution, we therefore predict that the number of charged aspartate residues will
increase as a function of pH. At the microscopic level, however, individual hexamers form a discrete
distribution of uniquely charged states. By virtue of the regular structure of the folded hexameric peptides,
we are able to identify each of the possible charged ‘species’ and propose a simple model to approximate
their electrostatic contribution.
Given that each of the six aspartate residues within a hexamer can exist in either a neutral (AspØ)
or charged (Asp⁻) state, there are exactly 2^6 = 64 possible configurations of charges. However, when
considering a parallel homo-oligomerisation of indistinguishable peptides, the number of unique
arrangements of charged sites is reduced by symmetry to a total of 13 distinct species, identified with grey
shading in figure 6. To differentiate between the different charged hexamer species, a shorthand
nomenclature is introduced designating each neutral AspØ with lower-case d and each charged Asp⁻ with upper-case
D, given below for each of the 13 configurations:
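$$
\begin{aligned}
Q = 0 &: \ \mathtt{dddddd} \\
Q = -1 &: \ \mathtt{Dddddd} \\
Q = -2 &: \ \{\mathtt{DDdddd},\ \mathtt{DdDddd},\ \mathtt{DddDdd}\} \\
Q = -3 &: \ \{\mathtt{DDDddd},\ \mathtt{DddDDd},\ \mathtt{DdDdDd}\} \\
Q = -4 &: \ \{\mathtt{DDDDdd},\ \mathtt{DDDdDd},\ \mathtt{DDdDDd}\} \\
Q = -5 &: \ \mathtt{DDDDDd} \\
Q = -6 &: \ \mathtt{DDDDDD}
\end{aligned} \tag{5}
$$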
where the groupings Q indicate the total charge in natural (proton) units, from fully neutral to a full
complement of negative charges Q = −6. By considering the energetics of each charged species with respect
to the native folding energy of the hexamer, we can develop a statistical model allowing us to predict the
relative concentrations of each of the folded (hexameric) species as compared to the unfolded (monomer)
species - which in our nomenclature can be labelled [d] and [D] for concentrations of monomeric peptides
containing respectively AspØ or Asp⁻ residues.
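The reduction from 64 configurations to 13 species can be checked by brute force. The Python sketch below is one possible way to do this, under the assumption that two arrangements are equivalent whenever one can be turned into the other by rotating the ring of six sites or by reversing its direction (mirror image); with rotations alone the count would be 14 rather than 13. The function names are illustrative only, and the representative string chosen for each class may differ from the label used in (5) while describing the same species (for example DDdDdd and DddDDd).

```python
from itertools import product


def canonical(config: str) -> str:
    """Return one fixed representative of a ring configuration.

    All rotations of the string and of its reverse are generated and the
    lexicographically smallest is returned, so that symmetry-equivalent
    arrangements map to the same label.
    """
    n = len(config)
    images = []
    for seq in (config, config[::-1]):      # original ring and its mirror image
        for k in range(n):                  # all cyclic rotations
            images.append(seq[k:] + seq[:k])
    return min(images)


def unique_species(n_sites: int = 6) -> dict[int, list[str]]:
    """Group symmetry-distinct d/D configurations by total charge Q."""
    classes = {canonical("".join(c)) for c in product("dD", repeat=n_sites)}
    by_charge: dict[int, list[str]] = {}
    for rep in sorted(classes):
        by_charge.setdefault(-rep.count("D"), []).append(rep)
    return by_charge


species = unique_species()
for q in sorted(species, reverse=True):
    print(f"Q = {q:+d}: {species[q]}")
print("total species:", sum(len(v) for v in species.values()))
```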
Figure 6: The 64 possible arrangements of parallel hexamers with peptides containing either a neutral
AspØ (red) or negatively charged Asp⁻ (blue) at the 24th residue. The number of unique arrangements
is reduced to 13 (grey) by charge symmetry in both polar and longitudinal axes - figure reproduced
from [19]
2.2 Two dimensional, regular geometric model
For each of these 13 unique charge combinations we can derive an approximation of the electrostatic
potential, due to aspartate residues, by considering point charges positioned in sequence on the vertices of a
regular hexagon. For example, species ‘DDDdDd’ equates to the charge sequence [−1, −1, −1, 0, −1, 0]
and would be interpreted as a hexagonal arrangement of point charges with the same sequence of values
around the vertices. Using this simple approach, we imagine taking a two-dimensional slice across the
central axis of the hexamer, isolating the electrostatically active region, providing an approximation of
how the electric potential due to aspartate deprotonation affects the potential energy of the hexamer.
In this greatly simplified model we assume that the separation of charged sites is constrained within a
single plane, with positions fixed at the vertices of a regular hexagon and therefore ignore the potential
for the peptide backbones to flex in response to opposing electrostatic forces within the channel. As
a first approximation, we therefore expect the predicted electric potential energy of a given species to
represent an upper bound, or an overestimate of the actual molecular energy.
The total electric potential energy of multiple point charges is found by application of Coulomb’s law (in SI
units), calculated as the sum over all combinations of pairwise interactions between sites i ↔ j separated
by distance rij, with i, j = 1..n and n = 6:
$$U_E = \frac{1}{2}\,\frac{1}{4\pi\varepsilon_0\varepsilon_r}\sum_{i=1}^{n}\ \sum_{\substack{j=1 \\ j\neq i}}^{n}\frac{q_i q_j}{r_{ij}} \tag{6}$$
where ε0 and εr are respectively the permittivity of free space and the relative permittivity (dielectric
constant) of the medium. The distance parameters rij are found using simple geometry (see figure 8)
considering a regular hexagon with three unique inter-vertex distances, expressed in terms of the
distance between adjacent vertices R, obtained from XRD crystal structure data for L24D. The molecular
representation of the XRD data in figure 7 highlights the aspartate residues and the position of the
side-chains within the hexamer channel from which the six inter-aspartate distances are measured. From
this data the adjacent vertex distance is estimated to be R ≈ 4.9 Å.
The energy levels of the charged hexamer species, calculated from (6) with dielectric εr = 1, are plotted
in figure 10 in order of increasing energy. Precise energy values for each species can be found in appendix
A. As expected, the overall energy increases with the number of charged sites: every species with total
charge magnitude |Q| has a higher energy than all species carrying fewer charges. For species with the same
overall charge (Q = −2, −3, −4), it is also found that the energy increases across each grouping in a consistent way, from
lower energy in alternating charge arrangements to highest energy when charges are clustered together.
In this fixed arrangement of charges on a regular polygon, it is also noted that the ordering and relative
separation of the energy levels are independent of the distance R.
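The geometric model described above is simple enough to sketch in a few lines of Python. The example below evaluates (6) for the 13 species listed in (5), placing unit negative charges on the vertices of a regular hexagon with R = 4.9 Å. The conversion to kJ/mol, the function names and the use of CODATA constants typed in by hand are choices made for this illustration only; the values it prints are not the appendix A data and should be treated as a sanity check of the model rather than a reproduction of the original calculation.

```python
import math

# Physical constants (CODATA 2018 values, entered by hand for this sketch)
E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
N_A = 6.02214076e23          # Avogadro constant, 1/mol
ANGSTROM = 1.0e-10           # metres per angstrom

# The 13 species of equation (5), d = neutral Asp, D = charged Asp
SPECIES = [
    "dddddd", "Dddddd",
    "DDdddd", "DdDddd", "DddDdd",
    "DDDddd", "DddDDd", "DdDdDd",
    "DDDDdd", "DDDdDd", "DDdDDd",
    "DDDDDd", "DDDDDD",
]


def hexagon_distance(i: int, j: int, R: float) -> float:
    """Distance between vertices i and j of a regular hexagon of side R."""
    steps = min((i - j) % 6, (j - i) % 6)   # 1 adjacent, 2 alternate, 3 opposite
    return {1: R, 2: math.sqrt(3.0) * R, 3: 2.0 * R}[steps]


def coulomb_energy(species: str, R: float = 4.9, eps_r: float = 1.0) -> float:
    """Electrostatic energy of one species in kJ/mol (R in angstrom)."""
    q = [-1 if site == "D" else 0 for site in species]
    # Summing over i < j is equivalent to half the double sum over i != j in (6).
    pair_sum = sum(
        q[i] * q[j] / hexagon_distance(i, j, R)
        for i in range(6) for j in range(i + 1, 6)
    )
    prefactor = E_CHARGE**2 / (4.0 * math.pi * EPS0 * eps_r * ANGSTROM)  # J * angstrom
    return prefactor * pair_sum * N_A / 1000.0


energies = {s: coulomb_energy(s) for s in SPECIES}
for name, u in sorted(energies.items(), key=lambda item: item[1]):
    print(f"{name}  Q = {-name.count('D'):+d}  U = {u:8.1f} kJ/mol")
```

Because R enters only through the common factor 1/R, changing it rescales every level by the same amount, which is why the ordering and relative spacing of the levels do not depend on R.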
2.2.1 Relative permittivity of the hexamer channel
The value of the relative (static) permittivity εr is particularly significant in the calculation of
electrostatic energy, in this context expressing the amount to which charged interactions are screened by
polarisable molecules occupying the channel of the hexamer. With a diameter of the order 10 Å, allowing
for flexibility of the backbone structure, it is thought fairly likely that sufficient solvent molecules (water:
εr ≈ 80) are able to penetrate the channel and thus act to screen charge interactions with a dielectric
1 < εr < 80. When considering the possible applications of the de novo CC-Hex protein, an ability to
estimate the dielectric constant within the channel is highly desirable if we are to predict the possible
chemical interactions which can take place in the hexamer pore. The dielectric constant is particularly
difficult to measure experimentally, with most methods instead relying on fitting experimental data to
statistical models or on solving an appropriately parameterised Poisson’s equation [20]. For this reason,
the inclusion of the dielectric constant to moderate the effect of charge interactions is crucial to the
parameterisation of subsequent unfolding models and the comparison to the experimental data. As a
first approximation, the set of idealised species energy levels Uv = {U1..UN}, calculated under vacuum
conditions in the absence of explicit solvents (εr = 1), can be reduced by a factor εr ∈ R+
$$U_s = \frac{U_v}{\varepsilon_r} \tag{7}$$
producing a set of ‘solvated’ energy levels Us moderated by the dielectric medium. In reality, the
increased energy due to proximal charges may lead to flexing of the hexamer and variation of the channel
pore size, leading to a more complex non-linear relationship. However, given the rigid backbone models
described in this section, we assume the formula provided in (7), where the value of εr provides a linear
scaling parameter of the relative energy levels for each charged hexamer species.
Figure 7: Representation of a slice through the L24D hexamer, indicating the positions and approximate
distances between deprotonated aspartate side-chains surrounding the central pore. Distance between
adjacent aspartate resonances R ≈ 4.9 Å - image created with PyMol from XRD data
Figure 8: Geometric calculation of inter-vertex distances on a regular hexagon - adjacent vertices
separated by distance r12 = R with opposite vertices separated by r14 = 2R.
Interior angle α is calculated by the formula for a regular n-polygon (n = 6):
∠(α) = π(n − 2)/n = 2π/3
⇒ ∠(β) = (π − ∠(α))/2 = π/6.
Finally, application of the sine rule yields the distance between alternating vertices
r13 = R sin(α)/sin(β) = 2R sin(2π/3) = √3 R.
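The three inter-vertex distances can also be confirmed numerically from explicit vertex coordinates. The following Python sketch is a simple check under the assumption that the six sites sit exactly on a regular hexagon of side R (for which the circumradius equals the side length); the helper names are illustrative.

```python
import math


def hexagon_vertices(R: float) -> list[tuple[float, float]]:
    """Vertices of a regular hexagon of side R, centred on the origin.

    For a regular hexagon the circumradius equals the side length, so the
    vertices lie on a circle of radius R at 60 degree intervals.
    """
    return [
        (R * math.cos(k * math.pi / 3.0), R * math.sin(k * math.pi / 3.0))
        for k in range(6)
    ]


def distance(p: tuple[float, float], q: tuple[float, float]) -> float:
    """Euclidean distance between two points in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


R = 4.9  # angstrom, adjacent-vertex distance estimated from the XRD data
v = hexagon_vertices(R)
r12 = distance(v[0], v[1])   # adjacent vertices
r13 = distance(v[0], v[2])   # alternating vertices
r14 = distance(v[0], v[3])   # opposite vertices
print(f"r12 = {r12:.3f}, r13 = {r13:.3f}, r14 = {r14:.3f} (angstrom)")
```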
2.3 Rigid backbone energy minimisation
In the following discussion, the energy landscape of the charged hexamer species is refined by using a
more comprehensive energy calculation forcefield, applied to an artificially constructed hexamer fragment
with a cross section centred on the aspartate ring. Extending the rigid two dimensional representation in
the previous model, the charged aspartate groups are now considered in their specific local context. Using
a model with this intermediate complexity allows us to explore a more detailed energy landscape by
specifying a single degree of movement: evaluating the energetics as the aspartate side-chain groups are
rotated around the dihedral (torsion) angle χ2, defined in figure 9. By considering the energy minima,
we can estimate how charging of aspartate residues affects their relative (ensemble average) orientation
within the confinement of the hexamer channel, comparing the resulting electrostatics with values
calculated using the simple geometric model.
Figure 9: (left) Molecular representation of the artificially constructed hexamer channel fragment -
species DdDdDd, with interior ring of AspØ / Asp⁻ residues (thick sticks) at position 24, and isoleucine
residues (thin lines) at positions 20 and 27. (right inset) A single Asp residue indicating the allowed
torsion angle χ2
Included in the hexamer fragment description are the isoleucine residues located at positions 20 and 27,
one turn of the helix before and after the aspartate at position 24, with side-chains oriented into the
central channel above and below the aspartate ring. The fragment therefore only contains the two most
proximal residue ‘rings’ on either side of the aspartate groups which provide the strongest influence on
their native conformation in the context of the hexamer, after that of the charged sites themselves. After
stripping out all but three residues (Ile-Asp-Ile) from each helix in the hexamer XRD data, 13 separate
fragments were prepared in silico, corresponding to the different charged species configurations. For each
of these fragments, the six aspartate residues were then modified to reflect either the neutral AspØ or
charged Asp⁻ as per figure 5, according to the 13 unique charge sequences formalised in (5).
Each species fragment was analysed with the Discover (Accelrys) molecular mechanics (MM) software
using a consistent valence force-field model to calculate the intramolecular energy in terms of bond
energy and components for non-bonded Van der Waals and Coulomb (electrostatic) energies. Finally, the
energy landscape of each hexamer fragment was probed using an automated process to systematically
sample χ2 angle permutations with increasing resolution. Fragments were first sampled with an angle
resolution of ±24° across the full range [−180°, +180°], a total of 15^6 ≈ 11.4 million permutations. The
minimum energy conformations for each species were then used as mid-point values m to initialise a
second, higher resolution calculation to sample angle conformations in the range [m − 12°, m + 12°] with
±2° precision, a further 12^6 ≈ 3 million permutations. The full torsion and energy data produced by
these calculations can be found in appendix B.
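The two-stage sampling scheme can be sketched as a generic coarse-to-fine grid search. The Python example below is not the Discover workflow: the force-field energy is replaced by a hypothetical toy function with a known minimum, and only three angles are searched so that the example runs quickly, whereas the calculation described above uses six. The grid sizes (15 coarse values and 12 fine values per angle) are chosen to match the permutation counts quoted in the text.

```python
import math
from itertools import product


def toy_energy(angles, targets=(50, -70, 130)):
    """HYPOTHETICAL stand-in for a force-field energy.

    Each angle contributes 1 - cos(angle - target), so the energy is zero
    only when every torsion angle sits exactly on its target value.
    """
    return sum(1.0 - math.cos(math.radians(a - t)) for a, t in zip(angles, targets))


def grid_minimum(energy, axes):
    """Evaluate every combination of per-angle grid values; return the lowest."""
    return min(product(*axes), key=energy)


def coarse_to_fine(energy, n_angles, coarse_step=24, fine_step=2):
    """Two-stage exhaustive search over n_angles torsion angles (degrees)."""
    # Stage 1: coarse grid over the full circle, 360/24 = 15 values per angle.
    coarse_axis = range(-180, 180, coarse_step)
    m = grid_minimum(energy, [coarse_axis] * n_angles)
    # Stage 2: fine grid centred on each coarse minimum, 12 values per angle.
    half = coarse_step // 2
    fine_axes = [range(mi - half, mi + half, fine_step) for mi in m]
    return grid_minimum(energy, fine_axes)


best = coarse_to_fine(toy_energy, n_angles=3)
print("minimum-energy angles:", best, "energy:", toy_energy(best))
print("six-angle grid sizes:", 15**6, "coarse,", 12**6, "fine")
```

The refinement stage is only guaranteed to find the global minimum if the coarse grid lands in the correct basin, which is an inherent limitation of any such two-stage search.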
Having located the conformational energy minima of each fragment in this regime, the relative energies
of each species due to all non-bonded (i.e. electrostatic and VdW) interactions are isolated and rescaled
with respect to a background provided by the neutral (d⁶ ≡ dddddd) hexamer. The energy levels for each
species fragment are shown in figure 10, where they are compared with those calculated from the simple
hexagonal Coulomb model with R = 4.9 Å. Having found the most energetically favourable aspartate side
chain rotamers for all AspØ/Asp⁻ residues, it was found that the minimised energy levels almost exactly
match those derived from the simple geometric model, given a constant value of R estimated from the same
initial XRD data. The resulting atomic coordinate data for each energy minimised species fragment
indicated that charged site positions can still be very well approximated by a planar, regular hexagon and
thus the geometric model should indeed be expected to yield very similar energies. Of course any large
deviations from this regular arrangement are restricted due to the rigid placement of all other atoms in
the model, and crucially the peptide backbone itself.
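A minimal sketch of a hexagonal point-charge model of this kind is given below. It is an illustration under stated assumptions rather than the code used in the study: R = 4.9 Å is taken as the centre-to-vertex distance of the hexagon, each charged site carries a unit negative charge, and the Coulomb constant of 332.06 kcal.mol⁻¹.Å.e⁻² is an assumed conversion factor. The function names are hypothetical.

```python
import math

COULOMB_KCAL = 332.06  # assumed Coulomb constant, kcal.mol^-1.A.e^-2
R = 4.9                # hexagon radius in angstrom, value quoted in the text

def site_positions(radius=R):
    """Coordinates of six sites on a planar regular hexagon."""
    return [(radius * math.cos(k * math.pi / 3), radius * math.sin(k * math.pi / 3))
            for k in range(6)]

def coulomb_energy(species, radius=R, eps_r=1.0):
    """Pairwise Coulomb energy (kcal/mol) for a species string such as 'DdDddd'.

    'D' marks a site carrying charge -1 and 'd' a neutral site.
    """
    pos = site_positions(radius)
    charged = [p for p, s in zip(pos, species) if s == "D"]
    energy = 0.0
    for i in range(len(charged)):
        for j in range(i + 1, len(charged)):
            energy += COULOMB_KCAL / (eps_r * math.dist(charged[i], charged[j]))
    return energy
```

In this sketch a single charged site contributes no energy, and the three doubly charged species are ordered DddDdd < DdDddd < DDdddd because the charge separations are 2R, √3R and R respectively.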
Figure 10: Modelled energy landscape of the L24D hexamer - (blue squares) electrostatic energy calculated using geometric Coulomb model, (grey circles) non-bonded
energy calculated using hexamer fragment molecular mechanics with energy minimised Asp χ2 torsion angles. Bar plot indicates the absolute difference
between energy models.
3 A statistical approach to model hexamer unfolding
The following section addresses the problem of modelling the unfolding behaviour of the hexamer
as a function of pH, given that we have obtained approximations of the charged energy landscape. The
model described uses a statistical approach to derive the concentrations of each hexamer species as frac-
tions of the ensemble of possible neutral and charged peptide states. By calculation of the ratio of folded
(hexamer) species to the unfolded (single peptide) species, an unfolding curve similar to those found in
figure 3 can therefore be computed numerically, across a range of pH, and compared to experimental data.
By assuming a Boltzmann distribution of states across all folded hexamers and unfolded single peptides,
the individual species concentrations can be calculated based on two experimentally observed properties
of the (neutral) L24D hexamer: the folding free energy ∆Gf defining the energetic stability of the
hexamer; and the intrinsic (aspartate) pKa of the L24D peptide defining the balance between neutral
and charged species according to the pH. The third important variable which is considered is the dielectric
constant εr, the variation of which acts to modulate, or smooth the relative electrostatic energy levels.
Using these three parameters, we can quantify the effects of each variable on the resulting concentrations
produced by the model.
3.1 Folding free energy
As with all physical systems, biological molecules including proteins, seek to achieve a minimum of free
energy. More specifically, the Gibbs free energy is the chemical potential which is minimised at the point
when a system reaches an equilibrium with its environment at constant temperature and pressure. Al-
ternatively the free energy of a molecule or protein represents an amount of energy available to produce
thermodynamic work in a fully reversible reaction. For stable proteins therefore, the free energy of a
protein at equilibrium is negative, implying a spontaneous reaction causing the protein to fold into the
native state with minimum energy. To produce a conformational change in the structure of the protein, a
sufficient amount of energy must be added to drive the reaction in the other direction. Correspondingly,
if this additional energy is removed, the reaction is reversed and the protein returns to the favoured
native conformation.
The electrostatic energy levels which have been computed for the charged hexameric states have so far
been calculated with reference to a neutral background, increasing from zero. The electrostatic energy
however is only one contribution to the energy of a folded hexamer, with additional components due
to covalent and hydrogen bonding; along with hydrophobic and Van der Waals interactions. Assuming
that the protonation of L24D aspartate residues produces only significant variation in the electrostatic
contribution, the energy levels computed can be combined with the background energy equal to the total
free folding energy of the ‘parent’ hexamer - ∆Gf . A full normalisation of the energy levels with these
assumptions, including the effect of a dielectric medium becomes:
$$\Delta G(i) = \frac{U_v(i)}{\varepsilon_r} + \Delta G_f \tag{8}$$
where ∆G(i) is the renormalised folding energy of charged hexamer species i, given the computed
electrostatic energy Uv(i).
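As a brief illustration, the renormalisation in (8) amounts to a scale and a shift of the vacuum energy levels. The sketch below uses hypothetical vacuum energies chosen only for the example; they are not values from this study.

```python
def renormalise(u_vacuum, eps_r, dg_fold):
    """Apply equation (8): scale vacuum energies by 1/eps_r and add the folding free energy."""
    return [u / eps_r + dg_fold for u in u_vacuum]

# Hypothetical vacuum energies in kcal/mol (not taken from the study).
levels = renormalise([0.0, 0.0, 34.0, 68.0], eps_r=10.0, dg_fold=-46.0)
```

Species with no charge-charge interaction keep the background value ∆Gf, while each repulsive contribution raises the level by its vacuum energy divided by εr.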
Having introduced the L24D folding free energy in the form given above in (8), computed vacuum energy
levels are subsequently parameterised by two quantities: the dielectric εr and the folding free energy ∆Gf .
Whilst a useful estimate for the dielectric constant is not immediately available, the value of ∆Gf has
been estimated with urea denaturing experiments by Chi [19] with a value of ∆Gf ≈ −46 kcal mol⁻¹.
3.2 Intrinsic pKa of L24D
The significance of the aspartate pKa has already been discussed in section 2, representing the reaction
equilibrium constant between neutral protonated and charged deprotonated states. An estimated value
and the correct parameterisation of the pKa is therefore required in order to predict how the relative
concentrations of charged species change as a function of pH. A precise value of pKa = 3.86 for aspartic
acid is well documented, for an individual molecule in isolation. Although this value is useful for a
first approximation, it is understood that the pKa is dependent on numerous factors due to the specific
structural context. Experimental techniques such as H-NMR are able to determine pKa values; as
well as empirical methods, for example those offered by computational tools such as pKaTool [21] and
ProPKA [22]. For the purposes of this investigation, the intrinsic pKa is explored as a fundamental
system parameter, in conjunction with values for εr and ∆Gf , such that specific model predictions of
hexamer folding can be explored in terms of each quantity.
3.3 Mathematical model
In the following mathematical description, the notation already developed is used to identify the 13
different charged hexamer species with combinations of 6 letters, each either d (AspØ) or D (Asp⁻),
and isolated (unfolded) neutral d or charged D. For every L24D peptide, the reaction between either
state is written as
$$d \rightleftharpoons D + H^+ \tag{9}$$
with a reaction quotient Q given by the ratio of products to reactants
$$Q = \frac{[H^+]\,[D]}{[d]} \tag{10}$$
The general relation between the Gibbs free-energy ∆G of the reaction at any moment in time, and the
standard-state free-energy (∆G°) is given by
$$\Delta G = \Delta G^\circ + k_B T \ln(Q) \tag{11}$$
where the force driving the reaction is by definition zero (∆G = 0) at equilibrium when Q = K, thus
$$0 = \Delta G^\circ + k_B T \ln(K) \tag{12}$$
3.3.1 Expressing the pKa in terms of free energy
Using this general equilibrium equation, a formula for the equilibrium constant K can be derived in terms of the
specific reaction in (9) and the standard-state energy, expanded in terms of the reaction components
$$\Delta G^\circ = G(D) + G(H^+) - G(d)$$
$$K = \left(\frac{[H^+]\,[D]}{[d]}\right)_{eq} = \exp\left(-\frac{1}{k_B T}\left[G(D) + G(H^+) - G(d)\right]\right) \tag{13}$$
At this stage, the pH of the environment can be included explicitly by taking the natural logarithm and
using the substitution pH = −ln[H⁺]/(ln 10).
$$\ln\frac{[D]}{[d]} = (\ln 10)\,\mathrm{pH} - \frac{1}{k_B T}\left[G(D) + G(H^+) - G(d)\right] \tag{14}$$
The sum of the unknown free energy components can thus finally be expressed in terms of the intrinsic
pKa, which is related to the pH and the concentrations of d and D as follows
$$\mathrm{pH} = \mathrm{p}K_a + \frac{1}{\ln 10}\ln\frac{[D]}{[d]} \tag{15}$$
resulting in the following expression
$$G(D) + G(H^+) - G(d) = k_B T\,(\ln 10)\,\mathrm{p}K_a \tag{16}$$
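To give a sense of scale for (16), the short sketch below evaluates its right-hand side per mole. It assumes that energies are expressed in kcal.mol⁻¹, so the molar gas constant is used in place of kB, and it uses the isolated aspartic acid value pKa = 3.9 at T = 293.15 K purely as an example. The function name is hypothetical.

```python
import math

R_GAS = 1.987204e-3  # molar gas constant in kcal.mol^-1.K^-1 (k_B per mole)

def deprotonation_free_energy(pka, temperature_k):
    """Right-hand side of equation (16), per mole: R T ln(10) pKa."""
    return R_GAS * temperature_k * math.log(10) * pka

dg = deprotonation_free_energy(3.9, 293.15)
```

Under these assumptions the result is a little over 5 kcal.mol⁻¹, roughly 1.34 kcal.mol⁻¹ per pKa unit at this temperature.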
3.3.2 Concentration of a general configuration
Using the expressions developed so far, each of the 13 charged hexamer species is now considered to
have a generalised configuration “Dxdy” containing x ∈ {0, .., 6} charged D (Asp⁻) peptides with the
remaining y (= 6 − x) neutral d (AspØ) peptides. Using this notation, we can formalise a charge neutral
equilibrium between 6 unfolded neutral peptides, and the different folded hexamers
$$(x + y)\,d \rightleftharpoons D_x d_y + x\,H^+ \tag{17}$$
noting that the number of negatively charged Asp⁻ peptides in the hexamer is balanced by an equal
number of H⁺ ions. Following the same reasoning as before in (9) to (13), we can therefore assume the
following approximate reaction rate equation:
$$\begin{aligned}
\frac{[D_x d_y]_i\,[H^+]^x}{[d]^{x+y}} &= \Omega_i \exp\left(-\frac{1}{k_B T}\left[G(D_x d_y)_i + x\,G(H^+) - (x+y)\,G(d)\right]\right) \\
&= \Omega_i \exp\left(-\frac{1}{k_B T}\left[G(D_x d_y)_i - x\,G(D) - y\,G(d) + x\left(G(D) + G(H^+) - G(d)\right)\right]\right)
\end{aligned} \tag{18}$$
where the factor Ωi is an entropy term equal to the number of ways in which species (macrostates)
[Dxdy]i can be realised if we consider all indistinguishable microstates, as described in figure 6. For
example there are Ω = 6 ways of combining charges to create species Dddddd, whereas there are only Ω = 2 ways
of permuting sites to form species DdDdDd.
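The multiplicities Ωi can be checked by direct enumeration. The sketch below assumes that two charge arrangements belong to the same species when they are related by a rotation or reflection of the six-site ring, which is the equivalence that yields 13 species; the `canonical` helper is a name introduced here for illustration.

```python
from collections import Counter
from itertools import product

def canonical(state):
    """Smallest string among all rotations and reflections of a six-site ring."""
    variants = []
    for ring in (state, state[::-1]):
        variants.extend(ring[k:] + ring[:k] for k in range(len(ring)))
    return min(variants)

# Count how many of the 2^6 = 64 microstates map onto each symmetry-distinct species.
multiplicity = Counter(canonical("".join(s)) for s in product("dD", repeat=6))
```

Each key of `multiplicity` is a species label such as `Dddddd` and each value is the corresponding Ω; the values sum to the 64 possible microstates.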
Using the formula expansion above, appropriate free energy component terms can be collected such that
the rate equation can be expressed in terms of the folding free energies ∆Gi of the i’th hexamer species.
Crucially, these are the values which have been calculated using the electrostatic models and normalised
to the reference folding energy of the neutral hexamer as per (8). Applying the following substitution
∆Gi = G(Dxdy)i − xG(D) − yG(d) (19)
and replacing the remaining energy components using the expressions for the pKa derived in (16), we
can now rewrite the rate equation as
$$\frac{[D_x d_y]_i\,[H^+]^x}{[d]^{x+y}} = \Omega_i \exp\left(-\frac{\Delta G_i}{k_B T} - x\,(\ln 10)\,\mathrm{p}K_a\right) \tag{20}$$
To complete the description, the concentration of hydrogen ions [H⁺] can be replaced using the definition
of pH to obtain
$$[D_x d_y]_i = [d]^6\,\Omega_i \exp\left(-\frac{\Delta G_i}{k_B T} + x\,(\ln 10)\,(\mathrm{pH} - \mathrm{p}K_a)\right) \tag{21}$$
This last formula (21) therefore provides the key expression for calculating the concentration of the i’th
hexamer species [Dxdy]i in terms of neutral unfolded peptide concentration [d], as a function of pH and
the intrinsic pKa. In order to find particular solutions for a given species however, a fixed reference
concentration is required i.e. the conserved total concentration of L24D:
[L24D] = [d] + [D] + [hexamers] (22)
The conservation equation above can be rewritten solely in terms of concentration [d], firstly by substi-
tuting for [D] using (15)
$$[D] = [d]\,10^{(\mathrm{pH} - \mathrm{p}K_a)} \tag{23}$$
and secondly by replacing the total concentration of folded species using the key formula (21) - taking
the sum over i = [1..13] with corresponding energies ∆Gi
$$[\mathrm{hexamers}] = [d]^6 \sum_{i=1}^{13} \Omega_i \exp\left(-\frac{\Delta G_i}{k_B T} + (x_i \ln 10)(\mathrm{pH} - \mathrm{p}K_a)\right) \tag{24}$$
With these substitutions, the conservation equation in (22) becomes a 6th order polynomial in terms of
[d] of the form
$$\alpha = [d] + \beta[d] + \gamma[d]^6 \tag{25}$$
where {α, β, γ} ∈ R⁺ are constants which are known (α) or can be calculated directly (β, γ) for any
value of pH in the range 0 to 14, as outlined below.
• α = [L24D] - the total, conserved concentration of L24D, where a specific experimental value can
be used
• β = 10^(pH−pKa) - a factor relating the proportions of unfolded peptides [d] (neutral), and [D]
(charged) depending on the environment pH and the intrinsic pKa of the monomeric peptide
• γ is the summation of all exponential components, including the entropy factor Ωi, of hexamer
species from (24), given the energy level of each species ∆Gi and the total number of negatively
charged components of each species xi ∈ {0, .., 6}
Replacing the neutral peptide concentration [d] with x (here a concentration, not the number of charged
residues), the general equation we need to solve is written as
$$\gamma x^6 + (\beta + 1)\,x - \alpha = 0 \tag{26}$$
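The three coefficients can be assembled as in the following sketch. It is illustrative only: the function name and the `species` tuple layout are assumptions, and the molar gas constant is used in place of kB on the assumption that energies are given in kcal.mol⁻¹.

```python
import math

R_GAS = 1.987204e-3  # kcal.mol^-1.K^-1, used in place of k_B for molar energies

def coefficients(total, ph, pka, species, temperature_k=293.15):
    """Return (alpha, beta, gamma) of equation (26).

    `species` is a list of (dG_i, omega_i, x_i) tuples: folding free energy in
    kcal/mol, multiplicity, and number of charged residues.
    """
    rt = R_GAS * temperature_k
    alpha = total
    beta = 10.0 ** (ph - pka)
    gamma = sum(omega * math.exp(-dg / rt + x * math.log(10) * (ph - pka))
                for dg, omega, x in species)
    return alpha, beta, gamma
```

For example, at pH = pKa a toy system containing only a neutral species (Ω = 1, x = 0) and a singly charged species (Ω = 6, x = 1), both with ∆G = 0, gives β = 1 and γ = 7.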
Crucially, for any given pH, this equation can now be solved for the concentration [d], from which we
can then calculate the remaining concentration fractions composed of species [D], via (23) and each
of the folded hexameric species via (21). A final hurdle remains in that the high order of this polynomial
prevents us from simply obtaining a tractable analytic solution. Instead we can employ a standard
numerical method to evaluate the roots, specifically the Newton-Raphson method which is discussed in
the following section [3.4].
3.4 Newton-Raphson method
The standard Newton-Raphson method can be employed to find successively more accurate solutions xn
to an equation f(x) = 0, defined over the real numbers, via the following iterative procedure, given an
initial first guess x0 for a value of the root
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \tag{27}$$
Providing that we begin with an initial guess x0 close enough to the root value, with the extra condition
that f′(x0) ≠ 0, each iteration of the method produces a value of xn typically closer to the real root
with approximately quadratic convergence for well-behaved functions. For a full pedagogical review of
the Newton-Raphson method, its applications and limitations, the reader is referred to Burden & Faires
- Numerical Analysis [23].
An explicit requirement in the formulation of this method is that we can fully calculate the derivative
of the function f. Fortunately, the conservation equation which we need to solve in (26) is indeed both
continuous and easily differentiable, where f′ is found using simple calculus as shown below.
$$f(x) = \gamma x^6 + (\beta + 1)\,x - \alpha \;\Rightarrow\; f'(x) = 6\gamma x^5 + \beta + 1 \tag{28}$$
For any given value of the pH and energy levels ∆Gi, we are therefore able to calculate a numerical
approximation to the concentration x = [d] to arbitrary precision, halting the iterative process when the
absolute difference between successive approximations falls below a given threshold ε. To prevent the
possible occurrence of an infinite loop, a maximum number of iterations is prescribed such that if [d]
cannot be found to the required precision within nmax iterations, the algorithm outputs an ‘error-value’
or flag to unambiguously report that a suitable solution was not found.
Figure 11: General behaviour of function f(x) = γx⁶ + (β + 1)x − α, indicating a single unique solution
in the (shaded) x positive region
By considering the general properties of the function f(x), shown in figure 11, we are reassured that a
unique solution can be found using this method, given that we are interested only in positive solutions
of x corresponding to a physical concentration [d]. As coefficients α, β and γ are all positive real numbers,
f(x) is monotonically increasing for x ≥ 0 with a single positive root, identified in the shaded
region of figure 11. We can therefore choose a first guess value x0 for every calculation within the range:
0 ≤ x0 ≤ [L24D], i.e. positive and bounded by the total concentration of all species, with confidence
that [d] can be found within a finite number of iterations. A sensible value for x0 is thus always chosen to
be at the midpoint where x0 = [L24D]/2. To obtain sufficiently accurate solutions given potentially very
small species concentrations, a value of ε = 10⁻²⁰ was chosen. Subsequently for all parameterisations
of the model examined in this research, the N-R procedure requires fewer than nmax = 200 iterations to
obtain the desired precision ε over the entire pH range.
The essential steps of the Newton-Raphson method described are summarised in the following pseudocode,
generating increasingly more accurate solutions to the value of x, given the functions f(x)
and f′(x):
x0 = [L24D]/2
for n = 1 → nmax do
    xn ← xn−1 − f(xn−1)/f′(xn−1)
    if abs(xn − xn−1) < ε then
        return xn
    else if n = nmax then
        return errorvalue
    end if
end for
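A direct Python transcription of this pseudocode, specialised to the polynomial in (26), might look as follows. It is a sketch rather than the implementation used in this work: the function name is hypothetical and `None` is chosen here to play the role of the error value.

```python
def solve_neutral_peptide(alpha, beta, gamma, eps=1e-20, n_max=200):
    """Newton-Raphson root of f(x) = gamma*x**6 + (beta + 1)*x - alpha for x > 0.

    Returns None (the 'error value') if the tolerance is not met in n_max steps.
    """
    x = alpha / 2.0  # initial guess at the midpoint of [0, alpha]
    for _ in range(n_max):
        f = gamma * x**6 + (beta + 1.0) * x - alpha
        f_prime = 6.0 * gamma * x**5 + beta + 1.0
        x_new = x - f / f_prime
        if abs(x_new - x) < eps:
            return x_new
        x = x_new
    return None
```

One practical caveat, offered as a general observation about double-precision arithmetic and not a finding of this study: for concentrations near 10⁻⁴ M the spacing between adjacent floating-point numbers is itself of order 10⁻²⁰, so a relative tolerance may be the more robust stopping criterion in some implementations.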
4 Results and Discussion
As a first analysis of the model, the L24D species concentrations are computed using a set of basis pa-
rameters, chosen to provide an initial comparison with the experimental unfolding data supplied by [19].
Total L24D concentration: [L24D] = 100 µM [19]
Background folding free energy: ∆Gf = −46 kcal.mol⁻¹ [19]
Intrinsic pKa(D): pKa = 3.9 (value for isolated aspartic acid)
Dielectric constant: εr = 10 (first approximation)
Temperature: T = 20 °C (293.15 K)
The primary input to the calculation is the set of hexamer species energy level values, where the energy
minimised fragment model values have been used for the remainder of this analysis. The energy levels
are subsequently reshaped according to the parameters ∆Gf and εr, before being used to calculate
the respective species concentrations using the method described in the previous section. Using the
numerical approach described it is therefore possible to compute the individual concentrations at any
given pH across a specified range with arbitrary resolution. Figure 12 shows the results of
the model using the basis parameters, plotting the concentration of each species as a function of
pH, calculated in 0.1 unit intervals.
Figure 12: Unfolded and folded L24D species concentrations computed for a 100 µM solution with
parameters pKa = 3.9, ∆Gf = −46 kcal.mol⁻¹, εr = 10, T = 20 °C
The data in figure 12 shows the predicted concentrations of the two unfolded peptides [d], [D] along with
the 13 folded hexamer species. Shown at this scale however, only the 4 hexamer species with lowest
energies have concentrations above 1 µM across the pH range. The vanishingly small concentrations
of other hexamer species can be compared by plotting the log-concentration, shown in appendix C.
At very low pH the concentration of the all-neutral hexamer [dddddd] dominates, as expected given the stable
folding energy in a regime well below the pKa where deprotonation of any of the aspartate residues is
unlikely. As the pH is increased, the concentration of the singly charged [Dddddd] hexamer increases in
exact anti-phase to the decreasing neutral hexamer concentration. Since the electrostatic energy is unaffected
by an isolated charge site, the two hexamers have approximately equivalent folding free energy.
However, taking into account the Ω = 6 ways in which the latter hexamer can be constructed, versus
the single neutral construction, the two species reach equal concentration at a pH just below
(approx. −5/6 units) the intrinsic pKa specified.
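The size of this offset can be estimated from (21). The worked step below is an illustration rather than part of the original derivation; it assumes that the neutral and singly charged hexamers have exactly equal folding free energies, so that only the multiplicities Ω = 1 and Ω = 6 and the pH term differ:

$$\frac{[Dd_5]}{[d_6]} = \frac{6}{1}\,10^{(\mathrm{pH} - \mathrm{p}K_a)} = 1 \quad\Rightarrow\quad \mathrm{pH} = \mathrm{p}K_a - \log_{10} 6 \approx \mathrm{p}K_a - 0.78$$

Under this assumption the crossover lies roughly 0.8 pH units below the intrinsic pKa, in line with the offset read from figure 12.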
As the pH increases beyond the pKa, the concentration of the singly charged hexamer continues to rise
as that of the neutral ‘parent’ diminishes towards zero. The decreasing background concentration of
H⁺ ions acts as a proton sponge, promoting the deprotonation of aspartate residues. With the equilibrium
tipped increasingly in favour of proton dissociation however, at pH ∼ 5 the concentration of Dddddd
hexamers begins to decline as low concentrations (< 20 µM) of hexamers with two charged sites are
produced. Consistent with the energy level ordering found in the previous models, concentrations of the
lower energy DddDdd hexamers exceed those of the less energetically favourable DdDddd and DDdddd
species; the latter of which is not detectable on the scale of this plot. At the same pH where we see production
of doubly charged hexamers, there is a steady increase in concentration of unfolded [D] charged
peptides which rapidly saturates as the concentration of folded hexamers falls to zero above pH ∼ 7.5.
In terms of the folding behaviour of the L24D protein, the data calculated for this parameterisation
(∆Gf = −46 kcal.mol⁻¹, εr = 10) indicates that beyond this pH value, the packing of charged residues within a
folded hexamer becomes energetically less favourable than the production of isolated [D] peptides. It is
also clear from these preliminary results that, for this trial parameterisation, the predicted concentrations
of hexamers with more than 3 charged residues are vanishingly small and statistically very unlikely to
exist.
Unlike the numerical model which is able to compute concentration data for individual species, it is very
difficult to experimentally discriminate these concentrations uniquely in solution. Helicity measurements
obtained via CD spectroscopy can however be directly compared to the numerical data by considering
the ratio of unfolded/folded concentrations [d] + [D] : [hexamers]. Specifically, the experimental helicity
data in figure 3 can be normalised between 0 (fully unfolded) and 1 (fully folded), and compared with the
numerical unfolded ratio as shown in figure 13 below.
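The comparison can be sketched as below. Two assumptions are made for the purpose of illustration: the unfolded fraction is taken relative to the sum of concentrations in (22), and helicity is min-max rescaled and inverted so that it follows the convention of figure 13, where 0 denotes folded and 1 unfolded. Both function names are hypothetical.

```python
def unfolded_fraction(d, big_d, hexamers):
    """Fraction of unfolded species, ([d] + [D]) / ([d] + [D] + [hexamers])."""
    unfolded = d + big_d
    return unfolded / (unfolded + sum(hexamers))

def normalise_helicity(helicity):
    """Min-max rescale helicity so the most helical point is 0 (folded) and the least is 1."""
    lo, hi = min(helicity), max(helicity)
    return [(hi - h) / (hi - lo) for h in helicity]
```

For instance, `normalise_helicity([80, 50, 20])` maps the three readings to 0, 0.5 and 1, which can then be compared point by point with `unfolded_fraction` evaluated at the same pH values.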
Figure 13: Unfolding curve indicating the relative proportion of folded (0) and unfolded (1) species.
(Blue) Model data computed using the basis parameters pKa = 3.9, ∆Gf = −46 kcal.mol⁻¹, εr = 10,
T = 20 °C. (Black) normalised CD helicity data from [19]
The shape of the (blue) unfolding curve, produced from the numerical concentration data, clearly mirrors
the results already discussed. The smooth, alternate mixing of two folded hexamers with pH < 5 dominates
the concentration, indicated by a flat zero region of the unfolding curve at this pH. At intermediate
pH (5-7) doubly charged hexamers are present in measurable concentrations, however with the energetics
of these states rapidly becoming unfavourable, a steep inflection is observed with saturation of unfolded
peptides above pH ≈ 7.5.
The general shape of the unfolding curve, a sigmoid with a steep inflection zone between pH=4.5-7 is
comparable to the normalised helicity data, plotted with black squares in figure 13. However, the nu-
merical model produces a curve with an ‘apparent’ pKa ≈ 6.8, measured at the centre of the inflection,
which is shifted by approximately 1.6 pH units from the experimental value of 5.2, and by 2.9 pH units
from the specified intrinsic value of 3.9. It also appears that the rate of unfolding with respect to pH
above the apparent pKa is greater than that observed experimentally, producing a visibly steeper portion of
the curve in this region.
Given that the experimental species components are inaccessible, the next phase of the analysis concerns
how the shape of the unfolding curve varies with respect to the three main input parameters ∆Gf , εr
and the intrinsic Asp pKa.
4.1 Intrinsic aspartate pKa dependence
The intrinsic pKa unfolding dependence is studied by computing the model unfolding curve as before,
for a range of pKa values whilst keeping other parameters set at the default values provided earlier. The
unfolding curves produced for pKa = {2, 3, 4, 5} are shown in figure 14 below. The corresponding species
concentration plots can also be found in appendix D.1.
Figure 14: Unfolding curves for pKa = {2, 3, 4, 5} with fixed parameters ∆Gf = −46 kcal.mol⁻¹,
εr = 10, T = 20 °C - compared to experimental L24D folding curve
The pKa dependence observed is a precisely linear translation of the unfolding curve, shifting the ap-
parent pKa in direct proportion to the intrinsic value with a fixed difference of (apparent-intrinsic) =
2.9 pH units. Given that the intrinsic pKa defines the equilibrium between neutral and charged Asp
residues, with all other parameters fixed, transition equilibria between all species plotted in appendix
D.1 are shifted linearly by ∆pKa. In physical terms, a higher pKa means that deprotonation occurs more
favourably at a correspondingly higher pH and thus, the overall transition between folded and unfolded
states is shifted by the same amount.
In comparison to the experimental data, it is found that reducing the value of the intrinsic pKa in isolation
provides a better approximation to the observed L24D folding curve. The linear dependence
between the intrinsic and apparent pKa values indicates that pKa ≈ 2.5 provides a best fit to the
apparent pKa = 5.2 observed experimentally.
4.2 Folding free energy ∆Gf dependence
The dependence of the folding free energy of the hexamer is analysed by normalising the species energy
levels to varying background ∆Gf as per (8), prior to calculating species concentrations over the pH
range. A set of unfolding curves with ∆Gf = {−50, −45, −40, −35} are shown in figure 15 below, with
the corresponding species concentration found in appendix D.2.
Figure 15: Unfolding curves for ∆Gf = {−50, −45, −40, −35} kcal.mol⁻¹ with fixed parameters pKa = 3.9,
εr = 10, T = 20 °C - compared to experimental L24D folding curve
A more negative folding free energy indicates a more stable protein fold, such that a greater amount of
energy is required to overcome the energy associated with the folded conformation of the peptides
in the hexamer. If the folding free energy is less negative (less stable), it is expected that fewer proximal
charged sites can be accommodated within the hexamer due to the increased electrostatic potential. The
species concentration data provided by the model supports this notion, indicating a significant decrease in
the concentration of multiply charged hexamers above the pKa as ∆Gf is made less negative. Below the
intrinsic pKa where the majority of aspartate residues are protonated (neutral) the total concentration
is dominated by the neutral dddddd hexamer. As the folding energy is made increasingly less negative
however, the proportion of peptides in the unfolded [d] state increases, reflecting the decreased stability
of the folded hexamer. At increased folding energies above ∆Gf ≈ −35 kcal.mol⁻¹ the hexameric protein
is insufficiently stable to accommodate more than a single charged residue. In this regime, the apparent
pKa of the unfolding curve begins to rapidly converge on the intrinsic value, representing the equilibrium
between the two allowable and energetically similar folded species. The additional baseline increase of
unfolded [d] peptides below the pKa leads to the ‘lifting and flattening’ of the unfolding curve observed in
figure 15 as ∆Gf is increased.
4.3 Dielectric constant εr dependence
The role of the dielectric constant is analysed in a similar way, fixing the remaining parameters as before,
using values of εr in a prescribed range up to the approximate dielectric constant of water εr ≈ 80. The
computed unfolding curves for εr = {1, 10, 20, 30, 60, 80} are shown in figure 16 below and associated
species concentration plots in appendix D.3.
Figure 16: Unfolding curves for εr = {1, 10, 20, 30, 60, 80} with fixed parameters pKa = 3.9, ∆Gf =
−46 kcal.mol⁻¹, T = 20 °C - compared to experimental L24D folding curve
As the dielectric εr is increased above the nominal vacuum value εr = 1, the difference between succes-
sive energy levels Ui is reduced. In physical terms, a dielectric medium acts to shield charge interactions
thus reducing the electrostatic potential incurred by proximal charged residues in the hexamer. Consequently,
as the value of εr is increased, the slope of the unfolding curves produced by the model becomes
increasingly flattened due to relatively higher concentrations of more negatively charged folded species
at a given pH.
Given the basis parameterisation used for this analysis, successive increases in εr modify the energetically
favoured distribution of charged residues in such a way as to produce significant concentrations
of multiply charged hexamer species. With little or no dielectric screening (εr < 10) the concentrations
of hexamer species with more than 2 charged Asp residues are vanishingly dilute. Above εr ≈ 30
however, the model predicts experimentally significant concentrations of hexamer species with 4 charged
residues. For this reason, as the pH increases, charged residues are favourably
accommodated in hexameric species, delaying the onset of unfolded [D] production resulting in the more
shallow unfolding curves observed in figure D.3. At values close to the εr(water) ≈ 80, the energy
landscape has been smoothed considerably with an energy difference between neutral and fully charged
hexameric species U_D6 − U_d6 ∼ 10.5 kcal.mol⁻¹ (c.f. ∆Gf = −46 kcal.mol⁻¹). At this ‘extreme’ the
binding of 6 charged residues in a folded hexamer represents a more stable native state than for unfolded
[D] peptides, indicated by the red curve (figure 16) which remains more than 90% folded across the
entire pH range.
4.4 Parameter space search
Having considered each of the three key model parameters in isolation, a set of unique and physically
consistent effects on the unfolding behaviour has been established. It is also clear that model predictions
assuming experimental parameter values do not reproduce the experimentally observed unfolding data
(see figure 13). The previous analysis can therefore be extended by considering how different parameterisations
of the model vary the quality of fit with respect to the experimental data. In the following
analysis, the three model parameters pKa, ∆Gf and εr are systematically varied across an appropriate
range and resolution, with the unfolding curve computed for each permutation. Specifically the root-
mean-squared-deviation (rmsd) value for each curve is calculated with respect to the experimental data
points, providing a numerical comparison indicating the quality of fit for each parameter combination.
For each parameter combination β = (pKa, ∆Gf , εr), the unfolded ratio is computed numerically at
each experimental pH value. The rmsd values for each parametrisation β are calculated according to
$$\mathrm{rmsd}_\beta = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{exp} - y_\beta\right)_i^2}$$
where yexp, yβ are respectively the experimental and numerically predicted
unfolded fractions. The resulting matrix of parameters and associated rmsd values is visualised
by plotting points with (3D) coordinates given by the vector β and with a size according to the reciprocal
of the rmsd. Normalised rmsd values are then used to colour each point across a spectrum from blue
to red with increasing rmsd. The data in figure 17 highlights the results of a systematic search through
35,280 parameter combinations given the following value ranges for each of the three key variables:
Intrinsic Asp pKa: pKa = [2.0, 2.1, 2.2, .., 4]
Free folding energy: ∆Gf = [−50, −49, −48, .., −30]
Dielectric constant: εr = [1, 2, 3, .., 80]
(taking ∼ 1 minute of processing time on a typical desktop PC)
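The structure of such a search can be sketched as follows. The parameter ranges are those listed above; `best_fit` and its `model` argument are hypothetical names, with `model` standing in for whatever routine returns the predicted unfolded fraction at a given pH. The sketch shows the bookkeeping only and is not the code used to produce figure 17.

```python
import math
from itertools import product

def rmsd(y_exp, y_model):
    """Root-mean-squared deviation between two equal-length sequences."""
    n = len(y_exp)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_exp, y_model)) / n)

# Parameter ranges quoted in the text.
pka_values = [round(2.0 + 0.1 * k, 1) for k in range(21)]  # 2.0 .. 4.0
dg_values = list(range(-50, -29))                           # -50 .. -30 kcal/mol
eps_values = list(range(1, 81))                             # 1 .. 80

grid = list(product(pka_values, dg_values, eps_values))

def best_fit(grid, ph_exp, y_exp, model):
    """Return the (pKa, dG_f, eps_r) triple minimising the rmsd.

    `model(ph, pka, dg, eps)` is a hypothetical callable returning the predicted
    unfolded fraction at a single pH value.
    """
    return min(grid, key=lambda p: rmsd(y_exp, [model(ph, *p) for ph in ph_exp]))
```

The grid contains 21 × 21 × 80 = 35,280 combinations, matching the count quoted above.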
The resulting ‘4D’ plot in figure 17 indicates a very clear linear interdependence between values of the
intrinsic pKa and ∆Gf , with a proportionality moderated by the value of εr. The relationship between
the pKa and ∆Gf corresponds with the results from earlier analysis in that the apparent pKa shifts
linearly in response to both parameters but in opposition. The resulting interdependence is that as one
variable increases, the value of the other must decrease by an amount proportional to produce a similar
quality of fit to the experimental data.
The low rmsd region of parameter space is explored further by considering parameterisations which
produce rmsd values less than an arbitrary threshold. The interpolated contours in figure 18 show the
strong coupling between pKa and ∆Gf for the 565 parameterisations with rmsd < 0.05. Representing
the search space data in this way shows the best fit region of pKa and ∆Gf parameter space, contoured
in steps of εr = 8. At values of εr < 30 there are low rmsd parameterisations with values of pKa
and ∆Gf linearly distributed across the entire search range. As the value of εr increases above this
however, the low rmsd region shrinks significantly, identifying a more restricted range of both pKa and
∆Gf . For example, in the set of parameters considered, with a dielectric εr > 48 there are only
20 parameterisations which produce an rmsd < 0.05. In this regime it is found that the folding curve,
and hence the rmsd, is much more sensitive to small changes in the folding energy with a much smaller
range of ∆Gf and corresponding pKa values producing good approximations to experimental data. A
large dielectric results in an energy landscape flattened across all species towards ∆Gf , the value of
which therefore becomes increasingly influential in governing global species concentrations and thus the
unfolding behaviour.
Figure 17: 3-parameter space search: point coordinates indicate parameter combination (pKa, ∆Gf , εr)
with size proportional to the inverse of the rmsd calculated with respect to experimental L24D unfolding
data points. Points are coloured from red to blue by relative rmsd value
Figure 18: 2D (left) and 3D (right) plots for parameterisations with rmsd < 0.05 with respect to L24D
folding data. Parameter space is grid interpolated in terms of pKa and ∆Gf and contoured with respect
to εr. The strongly coupled interdependence of pKa and ∆Gf is evident with a range progressively
restricted by increasing εr.
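The threshold-based search described above can be sketched generically. The following is a minimal illustration only: the function names (`grid_search`, `toy_unfolding`), the two toy parameters and the synthetic data are all assumptions introduced here, and the sigmoid is a placeholder standing in for the statistical unfolding model, which is not reproduced.

```python
import itertools
import math


def rmsd(y_exp, y_model):
    """Unweighted root-mean-square deviation between two equal-length sequences."""
    n = len(y_exp)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_exp, y_model)) / n)


def grid_search(model, grids, x_data, y_data, threshold):
    """Evaluate `model` on every grid combination and keep those with rmsd < threshold.

    model : callable(x, **params) -> predicted value (hypothetical stand-in)
    grids : dict mapping parameter name -> list of trial values
    Returns a list of (rmsd, params) tuples sorted by increasing rmsd.
    """
    names = list(grids)
    hits = []
    for values in itertools.product(*(grids[name] for name in names)):
        params = dict(zip(names, values))
        y_model = [model(x, **params) for x in x_data]
        score = rmsd(y_data, y_model)
        if score < threshold:
            hits.append((score, params))
    return sorted(hits, key=lambda hit: hit[0])


def toy_unfolding(pH, midpoint, slope):
    """Placeholder sigmoid, NOT the statistical model of this study."""
    return 1.0 / (1.0 + 10.0 ** (slope * (midpoint - pH)))


# Synthetic "data" generated from the toy curve itself (illustration only).
pH_points = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_points = [toy_unfolding(pH, midpoint=5.0, slope=1.0) for pH in pH_points]

grids = {"midpoint": [4.0, 4.5, 5.0, 5.5, 6.0], "slope": [0.5, 1.0, 1.5]}
low_rmsd = grid_search(toy_unfolding, grids, pH_points, y_points, threshold=0.05)
best_score, best_params = low_rmsd[0]
```

Replacing `toy_unfolding` with a function of (pKa, ∆Gf , εr) and supplying three grids would give the same kind of filtered list as the parameterisations with rmsd < 0.05 discussed above.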
4.4.1 Best fit parameters - rmsd minimisation
By selecting the parameter values which minimise the rmsd, we can plot the ‘best fit’ unfolding curve
with a comparison to the normalised L24D helicity data. The systematic parameter search yields best
fit parameters β = (pKa = 2.8, ∆Gf = −41, εr = 37) resulting in the (blue) unfolding curve shown
in figure 19. The predicted unfolding behaviour provides a qualitatively good match to the normalised
experimental L24D data (black squares), with an apparent pKa of both curves ∼ 5.2. More detailed
inspection indicates that, as a result of a balanced (unweighted) fit to the experimental data points, the
unfolding behaviour either side of the apparent pKa is not captured particularly well. With parameters
β , the gradient of the unfolding transition is somewhat shallower than that of the experimental
curve, and the predicted unfolded fraction converges to ∼ 2% at low pH, whereas the experimental data
indicate a fully folded (∼ 0%) ensemble.
Figure 19: rmsd minimised ‘best-fit’ unfolding curves compared to experimental L24D data (black
squares). Three curves indicate weighted rmsd minimisation: (blue) - equal weighting, (red) - 5x weight
on first low pH data point, (green) - weighting prioritising three high pH data points
In order to assess how parameter variation can produce a more accurate fit to the low pH region
(pH < 5.2), the rmsd minimisation procedure was repeated using a scheme to give extra weight to
the first experimental data value (pH = 3.4), where

$$\mathrm{rmsd}_{\beta_L} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} w_i \left(y_{\mathrm{exp}} - y_{\beta_L}\right)_i^{2}}$$

with weighting vector w = [.5, .1, .1, .1, .1, .1]. By prioritising the initial ‘baseline’ value in this
way, a new parameter set βL = (pKa = 2.0, ∆Gf = −49, εr = 9) was found using the same parameter search
ranges as before.
The corresponding (red) unfolding curve is shown in figure 19. The curve produced with parameter
set βL provides a much closer fit to the experimental data, with a smaller convergent unfolded fraction
at low pH (< 0.2%) and a congruent transition slope up to ∼ pH 5.5. Above this pH however, the
predicted and experimental curves diverge with a steeper transition gradient predicted than observed
experimentally. Similarly, the higher pH region can be given priority using a different weighting vector,
e.g. w = [0, 0, 0, .25, .5, .25] leading to best fit parameters βH = (pKa = 3.4, ∆Gf = −36, εr = 67) and
the (green) unfolding curve.
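The weighted rmsd used above can be written as a short function. This is a sketch under the assumption that the measure is the square root of (1/n) Σ wi (yexp − ymodel)i², as the name rmsd suggests. The weighting vectors are those quoted in the text; the helicity values below are made up purely to show the effect of the weights.

```python
import math


def weighted_rmsd(y_exp, y_model, weights):
    """sqrt( (1/n) * sum_i w_i * (y_exp_i - y_model_i)^2 )."""
    if not (len(y_exp) == len(y_model) == len(weights)):
        raise ValueError("all inputs must have the same length")
    n = len(y_exp)
    total = sum(w * (a - b) ** 2 for w, a, b in zip(weights, y_exp, y_model))
    return math.sqrt(total / n)


# Weighting vectors quoted in the text (six data points, ordered by increasing pH).
W_BALANCED = [1, 1, 1, 1, 1, 1]
W_LOW_PH = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]
W_HIGH_PH = [0, 0, 0, 0.25, 0.5, 0.25]

# Made-up values: the model misses only the first (lowest pH) point, by 0.1.
y_exp = [0.0, 0.1, 0.5, 0.9, 1.0, 1.0]
y_model = [0.1, 0.1, 0.5, 0.9, 1.0, 1.0]

low = weighted_rmsd(y_exp, y_model, W_LOW_PH)    # penalises the low pH miss
high = weighted_rmsd(y_exp, y_model, W_HIGH_PH)  # ignores it entirely
```

With the high pH weighting, the first three data points carry zero weight, so a model can fit the upper part of the curve well while being unconstrained at low pH.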
In terms of the parameter sets β , βL and βH it is found that the combination of a more negative (stable)
folding energy, lower pKa and a reduced dielectric constant produces unfolding behaviour which better
matches experimental folding behaviour below the apparent pKa ≈ 5.2. Above the apparent pKa, the
transition gradient is matched more closely with a less negative (less stable) folding energy and a
significantly higher dielectric constant. Unfortunately, the freedom to choose arbitrary parameter
combinations to produce curves which are compared to only 6 data points prevents us from making direct
predictions for any individual physical parameter. However, it is encouraging that very close curve fits
can be obtained where all three parameters are found within a physically realistic range.
4.5 Molecular dynamics
To supplement the static (rigid backbone) energy models, a set of 13 molecular dynamics (MD) simulations
were performed with the aim of providing a more accurate description of the folded hexamer
energy landscape. By including explicit solvent molecules (water) in the dynamical simulation, we can
also implicitly include the effects of dielectric screening and thus reduce the number of free parameters of
the model. MD simulation of the full L24D hexamer, comprising around 3000 protein atoms and
approximately 92,000 solvent atoms, is computationally very demanding, and the simulation time was
limited to 20ns for this investigation. The results presented here therefore provide only a partial
and qualitative review of the simulated structures. Given the relatively short simulation time-frame, it is
not clear whether any of the test structures reached the energetic equilibrium required to compare their
energy landscapes with those of the rigid models, or to incorporate them into the statistical unfolding model.
Starting from the L24D crystal structure data, 13 hexameric coiled-coil species were prepared in silico,
according to the unique aspartate charge sequences. Each structure was then individually configured
using the energy minimised Asp side-chain rotamer conformations (χ2 angles) provided in appendix B.
Molecular dynamics simulations were carried out using the Gromacs (MPICH) package [24] in conjunction
with the AMBER forcefield and TIP3P water model. Using the Gromacs pre-processing tools, each
protein structure was combined with water solvent molecules, and subjected to a preliminary energy
minimisation with a short time-scale position-restraint MD simulation. Full (unrestrained) MD simulation
was then performed on the University of Bristol’s high-performance Blue Crystal cluster, utilising 32
(dual processor) parallel compute nodes.
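The count of 13 unique charge sequences can be reproduced by enumerating all 2^6 charge patterns on a six-membered ring and identifying patterns related by rotation or reflection. The sketch below assumes that both symmetries are identified (which is consistent with a total of 13); the helper names are introduced here for illustration, and the canonical labels it produces may be written differently from the species names used in the tables.

```python
from itertools import product


def canonical(pattern):
    """Smallest string among all rotations and reflections of a ring pattern."""
    n = len(pattern)
    variants = []
    for ring in (pattern, pattern[::-1]):
        for shift in range(n):
            variants.append(ring[shift:] + ring[:shift])
    return min(variants)


def unique_ring_species(n_sites=6, states="Dd"):
    """Distinct charge patterns on a ring, identifying rotations and reflections."""
    seen = {canonical("".join(p)) for p in product(states, repeat=n_sites)}
    # Sort by number of charged sites, then alphabetically for a stable order.
    return sorted(seen, key=lambda s: (s.count("D"), s))


species = unique_ring_species()
# Number of distinct species for each count of charged ('D') sites, 0 to 6.
per_charge = [sum(1 for s in species if s.count("D") == k) for k in range(7)]
```

Here 'D' denotes a charged (deprotonated) Asp site and 'd' a neutral one, following the notation of the text.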
During each of the 20ns MD simulations, energy components were recorded every 10ps. The data in table
1 give the electrostatic (Coulomb) energy of each structure, referenced to the electrostatic energy
of the neutral (d6) hexamer, averaged over different simulation time slices (0-1ns, 10-20ns & 0-20ns).
There are a number of key observations, especially in comparison with the rigid backbone model
predictions. We note that although most of the MD energy levels are within the same order of magnitude
as the predicted values, the species ordering of the energy landscape is not conserved. More specifically,
the energies of species with adjacent charges (DDd..) are not strictly greater than those with alternating
charged sites (DdD..), as predicted by the rigid models. Furthermore, the fully charged D6
hexamer is of intermediate energy, lower than that of all Q = −4 species, contrary to previous predictions.
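The time-slice averaging described above can be sketched as follows. This is an illustration under stated assumptions: the relative energy is taken as the difference of slice means between a species and the neutral reference, the function names are hypothetical, and the energy traces are synthetic stand-ins rather than simulation output.

```python
def slice_mean(times_ps, energies, t_start, t_end):
    """Mean of `energies` over samples with t_start <= t <= t_end (times in ps)."""
    window = [e for t, e in zip(times_ps, energies) if t_start <= t <= t_end]
    if not window:
        raise ValueError("no samples in the requested time slice")
    return sum(window) / len(window)


def relative_slice_means(times_ps, species_energy, reference_energy, slices):
    """Slice-averaged energy of a species minus that of the reference species."""
    return {
        (t0, t1): slice_mean(times_ps, species_energy, t0, t1)
        - slice_mean(times_ps, reference_energy, t0, t1)
        for (t0, t1) in slices
    }


# Synthetic traces sampled every 10 ps over 20 ns (2001 samples); values are made up.
times = [i * 10 for i in range(2001)]
reference = [-100.0 for _ in times]                        # stand-in: neutral hexamer
charged = [-70.0 if t <= 1000 else -80.0 for t in times]   # stand-in: charged species

# The three slices used in table 1, in ps: 0-1 ns, 10-20 ns and 0-20 ns.
slices = [(0, 1000), (10000, 20000), (0, 20000)]
relative = relative_slice_means(times, charged, reference, slices)
```

A trace that relaxes after the first nanosecond, as in this toy example, gives a larger 0-1ns average than 10-20ns average, which is the pattern seen for several species in table 1.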
Observation of the corresponding atomic (MD) trajectory data suggests that the reason for this is almost
certainly the flexibility of the alpha-helix backbone. In all species containing two or more adjacent
charged side-chains, the peptide backbone of one or more of the alpha-helices was found to unwind such
that the charged Asp residue points outwards, away from the interior channel. The additional potential
energy due to proximal charges is therefore found to be sufficient in some cases to provoke a significant
conformational change such that a lower energy minimum is reached. Indeed, from observations of the
trajectories of all charged species over the 20ns simulation time-frame, it was found that the C-terminus
of the hexamer suffers both extensive fraying (unwinding of the coiled-coil) and deformation of the central
channel. Across the species, the structural deformations include flattening, or pinching, of the central
channel; expansion of the channel; and partial or full twisting of peptide backbones to separate adjacent
charged sites near the 24th residue position. In contrast, the neutral d6 hexamer maintains
a stable, regular hexameric coiled-coil structure across the full length of the peptides.

Species    Uc − Uc(dddddd) (kcal.mol−1)           Uv (MM frag.)
           0 - 1 ns    10 - 20 ns    0 - 20 ns    (kcal.mol−1)
dddddd       0.00         0.00          0.00          0.00
Dddddd      28.44        17.45         20.79          0.00
DddDdd      45.89        21.03         28.20         36.35
DdDddd      25.10         5.98         16.49         41.97
DDdddd      22.47        18.88         24.86         72.69
DdDdDd      93.45        72.90         72.66        125.91
DddDDd      88.67        13.38         38.96        151.01
DDDddd      48.04        59.75         64.05        187.35
DDdDDd     103.01       127.15        115.44        302.01
DDDdDd     119.98       135.28        131.21        307.64
DDDDdd     117.83       129.54        125.00        338.36
DDDDDd      79.59       138.62        128.82        531.34
DDDDDD      98.23        86.52         97.51        797.01

Table 1: MD simulation results: relative electrostatic (Coulomb) energies of the 13 L24D hexamer
species, compared with rigid model vacuum energy approximations Uv computed using the rigid backbone
fragment model, with εr = 1
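The ordering observations made above can be checked directly from the values in table 1. The short sketch below uses the 0-20ns MD averages and the rigid fragment model energies Uv as listed; the variable and function names are introduced here for illustration.

```python
# Table 1 values in kcal/mol: (MD 0-20 ns average, rigid fragment model U_v).
TABLE_1 = {
    "dddddd": (0.00, 0.00),
    "Dddddd": (20.79, 0.00),
    "DddDdd": (28.20, 36.35),
    "DdDddd": (16.49, 41.97),
    "DDdddd": (24.86, 72.69),
    "DdDdDd": (72.66, 125.91),
    "DddDDd": (38.96, 151.01),
    "DDDddd": (64.05, 187.35),
    "DDdDDd": (115.44, 302.01),
    "DDDdDd": (131.21, 307.64),
    "DDDDdd": (125.00, 338.36),
    "DDDDDd": (128.82, 531.34),
    "DDDDDD": (97.51, 797.01),
}


def ranking(column):
    """Species sorted by increasing energy in the chosen column (0 = MD, 1 = rigid)."""
    return sorted(TABLE_1, key=lambda name: TABLE_1[name][column])


md_order = ranking(0)
rigid_order = ranking(1)

# Species whose rank differs between the two landscapes.
moved = [s for s in TABLE_1 if md_order.index(s) != rigid_order.index(s)]

# Q = -4 species (four charged sites) compared with the fully charged hexamer.
q4 = [s for s in TABLE_1 if s.count("D") == 4]
d6_below_all_q4 = all(TABLE_1["DDDDDD"][0] < TABLE_1[s][0] for s in q4)
```

In the rigid model the energies increase monotonically down the table, whereas in the MD averages the fully charged hexamer ranks below every Q = −4 species.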
Although the energy trajectories have not yet been fully analysed, the atomic position trajectories suggest
that after 20ns, the simulated hexamers are still far from equilibrium and their native state. In general
however, it is found that the electrostatic energy contribution for each species, averaged over the full
simulation period, is still significantly less than rigid backbone models predict, especially for Q < −3. Given
the observed structural distortions, a large contribution to the energy difference is due to the flexibility of
the peptide backbones allowing an increased separation between like charges. Additional charge shielding
is also expected due to the explicit inclusion of dielectric water molecules. However, an unexpected
deviation from this pattern is found for the singly charged hexamer (Dddddd), where full
protein MD simulation indicates a significant electrostatic energy increase (> 20kcal.mol−1) compared to
the neutral d6 hexamer. The MD data therefore imply that a single charged Asp residue increases the overall
electrostatic energy by interacting in a way which is not fully captured by either of the simplified models.
Evidence indicating that singly charged hexamers may provide an additional contribution to the
electrostatic energy is particularly significant when we consider the distribution of folded hexamer species
concentrations as a function of pH. The energy landscapes predicted by previous models implied the
existence of two folded hexamer species (dddddd & Dddddd) which are energetically equivalent (or very
close), providing a folded state for charged Asp residues to occupy as the pH increases with little or no
energy penalty. A Dddddd hexamer associated with a non-zero energy increase would therefore result in
lower concentrations of that species and a shift in the apparent pKa to lower pH. Although the energy
levels obtained from the MD study are not guaranteed to be fully equilibrated values, a preliminary
analysis can be made using the statistical model developed in this research.
Figure 20: Comparison between unfolding curves generated with different energy landscapes, using
basis parameters pKa = 3.9, ∆Gf = −46: (red) Molecular dynamics energy levels (0-20ns average),
(green) modified MD energy levels with U(Dddddd) set to zero, (blue) rigid fragment model energy levels
(εr = 10).
The data in figure 20 show the predicted unfolding behaviour using three different energy landscapes and
the basis (experimental) parameters pKa(Asp) = 3.9 and ∆Gf = −46. As expected, the (red) unfolding
curve based on UMD, the molecular dynamics energy levels, has an apparent pKa
much closer to the experimental curve than that obtained using the rigid (fragment) model energy landscape. If
we modify the UMD energy levels such that U(Dddddd) has zero energy, we obtain the (green) curve, which
shows only a very small deviation from the (blue) rigid fragment model curve. From this comparison, it
is found that the energy difference between the neutral d6 state and the singly charged Dddddd species
plays a dominant role in the unfolding behaviour of the hexamer.
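The scale of this effect can be illustrated with a bare Boltzmann factor. This is a simplified sketch, not the statistical model of this study: it ignores the pH and pKa dependent protonation terms and considers only the relative weight exp(−∆U/RT) associated with an energy gap ∆U. The temperature of 293.15 K is an assumption (20 °C), and the function name is introduced here for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def boltzmann_ratio(delta_u, temperature_k=293.15):
    """Relative population exp(-delta_u / RT) for an energy gap in kcal/mol."""
    return math.exp(-delta_u / (R_KCAL * temperature_k))


gap_md = 20.79  # U(Dddddd) - U(dddddd), MD 0-20 ns average from table 1
ratio_md = boltzmann_ratio(gap_md)   # strongly suppressed by the energy gap
ratio_zero = boltzmann_ratio(0.0)    # no energy penalty: equal weight
```

With RT ≈ 0.58 kcal.mol−1 at this temperature, a gap of around 20 kcal.mol−1 corresponds to a weight many orders of magnitude below unity, which is why setting U(Dddddd) to zero changes the predicted curve so markedly.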
Using species energy levels (UMD) obtained from the MD simulations (0-20ns average), a best-fit unfolding
curve was found with parameters [pKa = 3.0, ∆Gf = −47]. The dielectric parameter, now implicitly
included within UMD, was fixed throughout the search with εr = 1. The resulting parameter set
produces a very close fit with the experimental unfolding curve, with a folding energy also very close
to the experimental value ∆Gf (L24D) ≈ −46kcal.mol−1 [19]. Further analysis of the concentration
decomposition, shown in figure 21, indicates that the predicted unfolding transition is directly associated
with a transition from neutral d6 hexamers to unfolded, charged D peptides as
the pH is increased. Importantly, this suggests that concentrations of all charged hexamer species are
vanishingly small. This result contrasts with predictions made using the rigid (fragment) model energy
landscape (see appendix E) where significant (measurable) concentrations of charged folded species are
predicted. A practical experiment to measure concentrations of stable, folded hexamers with charged
residues, if they exist, would therefore be highly beneficial in discriminating between these differing predictions.
Figure 21: (top) best-fit unfolding curve using species energy levels derived from MD simulation (UMD),
(bottom) corresponding hexamer species concentration as a function of pH, indicating only d6 neutral
hexamer in significant concentrations
5 Summary and Conclusions
The main objective of this research has been to construct a variety of models which describe the electro-
static energy and corresponding folding behaviour of the coiled-coil L24D protein as a function of pH.
The assumptions of the model posit that the primary feature of the protein structure which responds to
a change in pH is the ring of polar aspartic acid side-chain groups at the L24 position of each peptide.
As the Asp residues become increasingly deprotonated at higher pH, the side-chains become negatively
charged with the associated electrostatic forces leading to unfolding of the hexamers.
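For an isolated acid group, the dependence of deprotonation on pH follows the standard Henderson-Hasselbalch relation, sketched below. This describes only a free, non-interacting site and is not the full concentration model of this study; the function name is introduced here for illustration, and the intrinsic Asp pKa of 3.9 is the value quoted in the text.

```python
def charged_fraction(pH, pKa):
    """Henderson-Hasselbalch fraction of deprotonated (charged) acid groups."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))


intrinsic_pKa = 3.9  # intrinsic Asp value quoted in the text
fractions = {pH: charged_fraction(pH, intrinsic_pKa) for pH in (2.9, 3.9, 4.9, 5.9)}
```

At pH equal to the pKa exactly half of the groups are charged, and each further pH unit changes the ratio of charged to neutral groups by a factor of ten.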
In order to obtain the energy landscape of the 13 uniquely charged hexamer species, two ‘rigid’
electrostatic models have been considered, in which the relative positions of the peptide bonds are fixed. The
simplest approximation is provided by a geometric Coulomb model which calculates the electrostatic
energy due to charge interactions on the vertices of a regular hexagon, parameterised by a single value
R - the distance between adjacent Asp sites. A second model, using partial atomic structures gained
from XRD experiments, was developed to find energy minimised side-chain conformation angles given
their local context, with electrostatic energies calculated using conventional molecular force-field meth-
ods. Using an inter-vertex distance R estimated from the XRD data, it was found that both the simple
analytical model and the more complex molecular mechanics model, provide an almost identical energy
landscape. The limited side-chain flexibility provided in the second model therefore provides only a very
minor refinement to the simple analytical model.
Having developed a statistical method to compute the concentrations of each hexamer species and
unfolded peptides from the energy landscape, it has been possible to generate unfolding curves which can be
compared to experimental (CD) helicity data for the L24D protein as a function of pH. Using
experimental values for the intrinsic (Asp) pKa = 3.9 and folding free energy ∆Gf = −46kcal.mol−1, it was found
that the apparent pKa of the predicted unfolding curve, where approximately 50% of the concentration
is unfolded, is displaced by +1.6 pH units from the experimental curve. Subsequent parameter space
searches, with direct comparison to helicity data using rmsd minimisation techniques, indicated that
model predictions using a comparable folding energy (∆Gf ≈ −49) and minimal dielectric screening
(εr ≈ 9) produced unfolding curves which are most consistent with experimental data, albeit with
an intrinsic pKa ≈ 2.0. The disagreement between this ‘predicted’ pKa value and the known aspartate
value (3.9) could be due to a number of factors which have not been accounted for in the model. For
example, in the derivation of the concentration model, the same value of the intrinsic pKa is used in two
different biological contexts: the pKa which denotes the acid dissociation constant for aspartate
residues in unfolded peptides, i.e. [d] to [D] transitions, and the value related to charge transitions
within the folded hexamer channel, for example [Dddddd] to [DDdddd]. As currently implemented, the
concentration model does not discriminate between these two values, which may indeed differ due to the
local environment of the Asp residue. A future modification to the current model could therefore be to
define the pKa for both contexts separately, incorporating a fixed intrinsic (unfolded) value and a
folded ‘channel context’ value left as a free parameter.
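The apparent pKa referred to above, the pH at which approximately 50% of the concentration is unfolded, can be read from a sampled unfolding curve by linear interpolation. The sketch below is one simple way of doing this; the function name is hypothetical and both curves are made-up numbers used only to show how a displacement between two curves would be measured.

```python
def apparent_pKa(pH_values, unfolded_fraction, level=0.5):
    """pH at which a monotonically increasing curve first crosses `level`,
    found by linear interpolation between neighbouring samples."""
    points = list(zip(pH_values, unfolded_fraction))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= level <= y1 and y1 > y0:
            return x0 + (level - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("curve does not cross the requested level")


# Two made-up curves sampled on the same pH grid (illustration only).
pH_grid = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
curve_a = [0.0, 0.1, 0.4, 0.9, 1.0, 1.0]
curve_b = [0.0, 0.0, 0.1, 0.3, 0.8, 1.0]

shift = apparent_pKa(pH_grid, curve_b) - apparent_pKa(pH_grid, curve_a)
```

With only six experimental points, the interpolated midpoint is sensitive to the two samples that bracket the 50% level, so small changes in either value move the estimate noticeably.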
Particularly useful insight has been gained by comparing the energy landscapes predicted via rigid
approximations to those calculated by molecular dynamics simulations of each hexamer species. After 20ns
of relaxation time in explicit water, MD trajectory data suggest that the majority of charged hexamers
suffer significant structural distortions, including C-terminus fraying, and may be unstable as hexameric
coiled-coils at longer time-scales. Overall, the data obtained with MD simulation indicate a much flatter
energy landscape than predicted via rigid models, consistent with the observed structural distortions in
response to proximal charges. Although the MD simulations performed almost certainly do not provide
a fully equilibrated energy landscape, there is good evidence that there is a significant energy difference
between neutral d6 hexamers and singly charged Dddddd species (∼ 20kcal.mol−1), contrary to rigid
model predictions. The effect of this energy gap is to shift the apparent pKa closer to the experimental
value, with a concentration model best-fit search yielding an intrinsic pKa = 3.0 and a folding free energy
∆Gf = −47, very close to the experimental value (−46kcal.mol−1).
Although there is still an obvious discrepancy between the intrinsic Asp value and the best-fit pKa,
it is clear that the electrostatic potential difference between d6 and Dddddd plays a dominant role in
the observed unfolding behaviour and should be clarified by further experiments in silico. Concentration
predictions using the currently available MD energy landscape indicate that concentrations of all charged
hexamers are vanishingly small across the entire pH range, with unfolding behaviour characterised by a
smooth transition between neutral d6 hexamers and unfolded D (charged) peptides. In this respect, it would be
interesting to design a practical experiment which could detect, or even discriminate between, concentrations of
charged hexamer species in solution. If significant concentrations of these species can be found, it would
clearly cast doubt on the latter prediction.
The models presented in this research provide a clear indication that a statistical approach can indeed
be used to make qualitative predictions of the folding dynamics of a complex, oligomerised protein. The
rapid calculation of ensemble microstate concentrations is made possible by assuming an approximate
energy landscape. The comparison between estimated energy landscapes, and those provided by more
complex molecular simulation, suggests however that more accurate energy models may be required to
predict unfolding behaviour comparable to experimental observations.
A Geometric coulomb model of electrostatics
Figure 22: Schematic diagrams of folded hexamer species and associated energy levels (kcal.mol−1)
calculated by 2D geometric electrostatic model, with radius R = 4.9 Å and dielectric εr = 1. Red vertices
indicate neutral AspØ residue sites, blue vertices indicate negatively charged Asp sites.
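The geometric model can be sketched as a sum of pairwise Coulomb terms between unit charges placed on the vertices of a regular hexagon. This is a minimal illustration under stated assumptions: each charged site carries a charge of −1 e, the conventional conversion factor of 332.06 kcal.Å.mol−1.e−2 is used, and the function name is introduced here. For a regular hexagon the circumradius equals the distance between adjacent vertices, so R = 4.9 Å serves as both.

```python
import math

COULOMB_KCAL = 332.06  # kcal*Angstrom/(mol*e^2), conventional conversion factor


def hexagon_energy(species, radius=4.9, eps_r=1.0):
    """Pairwise Coulomb energy (kcal/mol) of unit charges on a regular hexagon.

    species : six-character string, 'D' = charged site (-1 e), 'd' = neutral.
    radius  : circumradius in Angstrom (equal to the side length of the hexagon).
    """
    sites = [
        (radius * math.cos(k * math.pi / 3), radius * math.sin(k * math.pi / 3))
        for k in range(6)
    ]
    charged = [sites[k] for k, state in enumerate(species) if state == "D"]
    energy = 0.0
    for i in range(len(charged)):
        for j in range(i + 1, len(charged)):
            energy += COULOMB_KCAL / (eps_r * math.dist(charged[i], charged[j]))
    return energy


adjacent = hexagon_energy("DDdddd")    # neighbouring sites, separation R
alternate = hexagon_energy("DdDddd")   # separation sqrt(3) * R
opposite = hexagon_energy("DddDdd")    # separation 2 * R
```

In this sketch the neutral and singly charged species both have zero energy, since a single charge has no partner to interact with, and increasing `eps_r` scales every energy level down by the same factor.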
B Normalised energy and torsion angles for hexamer species fragments
Species    U − Udddddd     χ2(A)   χ2(B)   χ2(C)   χ2(D)   χ2(E)   χ2(F)
           (kcal.mol−1)    dihedral angle in degrees (±2°)
dddddd         0.00         -32     146     -32     146     -32     146
Dddddd        -0.57         -26     148     152     150     -34     -26
DddDdd        35.23         -28     150     -34     -28     150     -34
DdDddd        41.52         -26     -40     -30     150     -38     -30
DDdddd        68.18         -28     -38     152     154     -34     -30
DdDdDd       127.43         -30     -42     -30     -42     -32     -42
DddDDd       151.09         -32     154     -34     -32     -40     -44
DDDddd       182.86         -28     -42     -40     154     -36     -32
DDdDDd       299.41         -34     -38     -46     -36     -34     -46
DDDdDd       305.42         -32     -40     -40     -34     -36     -44
DDDDdd       332.40         -30     -40     -42     -40     158     -38
DDDDDd       525.83         -34     -40     -42     -42     -40     -48
DDDDDD       839.36         -40     -42     -42     -42     -42     -42

Table 2: Minimised energy levels and χ2 torsion angles for 13 charge species hexamer fragments.
Minimised energy conformations are calculated to ±2°. Energy values due to non-bonded interactions,
referenced to neutral species background.
C Log-concentration plot for L24D species
Figure 23: Log-concentration plot colour grouped by total charge Q, for unfolded and folded L24D
species - model parameters pKa = 3.9, ∆Gf = −46kcal.mol−1, εr = 10, T = 20°.
D L24D species concentrations for varying parameter values
D.1 Intrinsic aspartate pKa
[Figure panels: pKa = 2, pKa = 3, pKa = 4, pKa = 5]
Figure 24: Predicted L24D species concentration as a function of pH for varying values of intrinsic
pKa = {2, 3, 4, 5}. ∆Gf = −46kcal.mol−1, εr = 10, T = 20°
D.2 Folding free energy ∆Gf
[Figure panels: ∆Gf = −50kcal.mol−1, ∆Gf = −45kcal.mol−1, ∆Gf = −40kcal.mol−1, ∆Gf = −35kcal.mol−1]
Figure 25: Predicted L24D species concentration as a function of pH for varying values of the hexamer
folding free energy ∆Gf = {−50, −45, −40, −35}. pKa = 3.9, εr = 10, T = 20°
D.3 Dielectric constant (relative permittivity) εr
[Figure panels: εr = 20, εr = 30, εr = 60, εr = 80]
Figure 26: Predicted L24D species concentration as a function of pH for varying values of the dielectric
constant εr = {20, 30, 60, 80}. pKa = 3.9, ∆Gf = −46kcal.mol−1, T = 20°
E Best-fit concentration data (rigid model energy landscape)
Balanced weighting: w = [1, 1, 1, 1, 1, 1] - pKa = 2.8, ∆Gf = −41, εr = 37
Low pH weighting: w = [.5, .1, .1, .1, .1, .1] - pKa = 2.0, ∆Gf = −49, εr = 9
High pH weighting: w = [0, 0, 0, .25, .5, .25] - pKa = 3.4, ∆Gf = −36, εr = 67
Figure 27: Concentration decomposition for parameter search rmsd minimisation ‘best-fit’ unfolding
curves described in figure 19. Each parameter set was found by applying a different weighting vector w in
the calculation of the rmsd to prioritise fit to different parts of the unfolding curve (see section 4.4).
References
[1] Xi Liu, K Fan, and W Wang. The number of protein folds and their distribution over families in
nature. Proteins, 54(3):491–9, February 2004.
[2] J T MacDonald et al. De novo backbone scaffolds for protein design. Proteins, 78(5):1311–25, April
2010.
[3] B Kuhlman et al. Design of a novel globular protein fold with atomic-level accuracy. Science (New
York, N.Y.), 302(5649):1364–8, November 2003.
[4] S F Altschul et al. Basic local alignment search tool. J. Mol. Biol., 215:403–410, 1990.
[5] J Gough, K Karplus, R Hughey, and C Chothia. Assignment of homology to genome sequences
using a library of hidden Markov models that represent all proteins of known structure. Journal of
molecular biology, 313(4):903–19, November 2001.
[6] O Rackham et al. The evolution and structure prediction of coiled coils across all genomes. Journal
of molecular biology, 403(3):480–93, October 2010.
[7] T Narumi et al. A 55 TFLOPS Simulation of Amyloid-forming Peptides from Yeast Prion Sup35
with the Special-purpose Computer System MDGRAPE-3. Proceedings of the 2006 ACM/IEEE
conference on Supercomputing, pages 1–13, 2006.
[8] D Shaw et al. Millisecond-Scale Molecular Dynamics Simulations on Anton. Proceedings of the
Conference on High Performance Computing Networking, Storage and Analysis, 2009.
[9] L Pierce et al. Routine Access to Millisecond Time Scale Events with Accelerated Molecular
Dynamics. Journal of chemical theory and computation, 8(9):2997–3002, September 2012.
[10] N R Zaccai et al. A de novo peptide hexamer with a mutable channel (supplementary material).
Nature chemical biology, 7(12):935–41, December 2011.
[11] D N Woolfson. The Design of Coiled-Coil Structures and Assemblies. Advances in Protein Chemistry,
70:79–112, 2005.
[12] D N Woolfson. An Introduction to Coiled coils (http://www.lifesci.sussex.ac.uk/research/woolfson/html/).
[13] E K O’Shea et al. X-ray structure of the GCN4 leucine zipper, a two-stranded, parallel coiled coil.
Science, 254:539–544, 1991.
[14] P Konig and T J Richmond. The X-ray Structure of the GCN4-bZIP Bound to ATF/CREB Site
DNA Shows the Complex Depends on DNA Flexibility. Journal of Molecular Biology, 233(1):139–154,
1993.
[15] S A Dames et al. NMR structure of a parallel homotrimeric coiled coil. Nat.Struct.Biol, 5:687–691,
1998.
[16] L Pauling, R B Corey, and H R Branson. The structure of proteins: Two hydrogen-bonded helical
configurations of the polypeptide chain. PNAS, 37:205–211, 1951.
[17] F Crick. The packing of α-helices: simple coiled-coils. Acta Crystallographica, 6(8):689–697,
September 1953.
[18] N R Zaccai et al. A de novo peptide hexamer with a mutable channel. Nature chemical biology,
7(12):935–41, December 2011.
[19] H-C Chi. . PhD thesis, University of Bristol, 2012.
[20] J A Ng et al. Estimating the dielectric constant of the channel protein and pore. European biophysics
journal : EBJ, 37(2):213–22, February 2008.
[21] J E Nielsen. Analysing the pH-dependent properties of proteins using pKa calculations. Journal of
molecular graphics & modelling, 25(5):691–9, January 2007.
[22] H Li, A D Robertson, and J H Jensen. Very fast empirical prediction and rationalization of protein
pKa values. Proteins, 61(4):704–21, December 2005.
[23] R L Burden and J D Faires. Numerical Analysis. Thomson, 8 edition, 2005.
[24] B Hess, C Kutzner, D van der Spoel, and E Lindahl. Gromacs 4. J. Chem. Theory Comput.,
4:435–447, 2008.

Turvallista aikuista etsimässä uusi malli väkivaltaa perheessään kohdanneide...
 
Play school in delhi
Play school in delhiPlay school in delhi
Play school in delhi
 
Offer and acceptance
Offer and acceptanceOffer and acceptance
Offer and acceptance
 
Inferencias
InferenciasInferencias
Inferencias
 
Buisines model innovation - Is Healthcare falling behind?
Buisines model innovation - Is Healthcare falling behind?Buisines model innovation - Is Healthcare falling behind?
Buisines model innovation - Is Healthcare falling behind?
 
KERALA 06 Nights / 07 Days
KERALA 06 Nights / 07 Days KERALA 06 Nights / 07 Days
KERALA 06 Nights / 07 Days
 
ism certificates
ism certificatesism certificates
ism certificates
 

Similar to Final report - Adam Zienkiewicz

Protdock - Aatu Kaapro
Protdock - Aatu KaaproProtdock - Aatu Kaapro
Protdock - Aatu KaaproSwapnesh Singh
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Scienceresearchinventy
 
利用分子動力學電腦模擬研究聚穀氨醯胺及胰
利用分子動力學電腦模擬研究聚穀氨醯胺及胰利用分子動力學電腦模擬研究聚穀氨醯胺及胰
利用分子動力學電腦模擬研究聚穀氨醯胺及胰Hsin-Lin Chiang
 
Following the Evolution of New Protein Folds via Protodomains [Report]
Following the Evolution of New Protein Folds via Protodomains [Report]Following the Evolution of New Protein Folds via Protodomains [Report]
Following the Evolution of New Protein Folds via Protodomains [Report]Spencer Bliven
 
Photophysics of dendrimers colombi
Photophysics of dendrimers   colombiPhotophysics of dendrimers   colombi
Photophysics of dendrimers colombiGiorgio Colombi
 
Jacob Kleine undergrad. Thesis
Jacob Kleine undergrad. ThesisJacob Kleine undergrad. Thesis
Jacob Kleine undergrad. ThesisJacob Kleine
 
Structural and Functional Analysis of Conserved Amino Acid Residues in Phosph...
Structural and Functional Analysis of Conserved Amino Acid Residues in Phosph...Structural and Functional Analysis of Conserved Amino Acid Residues in Phosph...
Structural and Functional Analysis of Conserved Amino Acid Residues in Phosph...Sayeed Ali
 
Larry O'Connell - Thesis
Larry O'Connell - ThesisLarry O'Connell - Thesis
Larry O'Connell - ThesisLarry O'Connell
 
Molecular modelling for in silico drug discovery
Molecular modelling for in silico drug discoveryMolecular modelling for in silico drug discovery
Molecular modelling for in silico drug discoveryLee Larcombe
 
Geldenhuys model 20418
Geldenhuys model 20418Geldenhuys model 20418
Geldenhuys model 20418hajarchokri1
 
Submitted Report Final Draft
Submitted Report Final DraftSubmitted Report Final Draft
Submitted Report Final DraftOwen Walton
 

Similar to Final report - Adam Zienkiewicz (20)

Protdock - Aatu Kaapro
Protdock - Aatu KaaproProtdock - Aatu Kaapro
Protdock - Aatu Kaapro
 
Richard Allen Thesis
Richard Allen ThesisRichard Allen Thesis
Richard Allen Thesis
 
thesis.compressed
thesis.compressedthesis.compressed
thesis.compressed
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Science
 
Proquest Burye Thesis
Proquest Burye ThesisProquest Burye Thesis
Proquest Burye Thesis
 
利用分子動力學電腦模擬研究聚穀氨醯胺及胰
利用分子動力學電腦模擬研究聚穀氨醯胺及胰利用分子動力學電腦模擬研究聚穀氨醯胺及胰
利用分子動力學電腦模擬研究聚穀氨醯胺及胰
 
Following the Evolution of New Protein Folds via Protodomains [Report]
Following the Evolution of New Protein Folds via Protodomains [Report]Following the Evolution of New Protein Folds via Protodomains [Report]
Following the Evolution of New Protein Folds via Protodomains [Report]
 
Photophysics of dendrimers colombi
Photophysics of dendrimers   colombiPhotophysics of dendrimers   colombi
Photophysics of dendrimers colombi
 
Jacob Kleine undergrad. Thesis
Jacob Kleine undergrad. ThesisJacob Kleine undergrad. Thesis
Jacob Kleine undergrad. Thesis
 
NMacgearailt Sumit_thesis
NMacgearailt Sumit_thesisNMacgearailt Sumit_thesis
NMacgearailt Sumit_thesis
 
Structural and Functional Analysis of Conserved Amino Acid Residues in Phosph...
Structural and Functional Analysis of Conserved Amino Acid Residues in Phosph...Structural and Functional Analysis of Conserved Amino Acid Residues in Phosph...
Structural and Functional Analysis of Conserved Amino Acid Residues in Phosph...
 
CADD
CADDCADD
CADD
 
Larry O'Connell - Thesis
Larry O'Connell - ThesisLarry O'Connell - Thesis
Larry O'Connell - Thesis
 
DevanBicherThesis
DevanBicherThesisDevanBicherThesis
DevanBicherThesis
 
Molecular modelling for in silico drug discovery
Molecular modelling for in silico drug discoveryMolecular modelling for in silico drug discovery
Molecular modelling for in silico drug discovery
 
M.tech Thesis
M.tech ThesisM.tech Thesis
M.tech Thesis
 
Geldenhuys model 20418
Geldenhuys model 20418Geldenhuys model 20418
Geldenhuys model 20418
 
Diploma Thesis
Diploma ThesisDiploma Thesis
Diploma Thesis
 
Submitted Report Final Draft
Submitted Report Final DraftSubmitted Report Final Draft
Submitted Report Final Draft
 
replicación 4.pdf
replicación 4.pdfreplicación 4.pdf
replicación 4.pdf
 

Final report - Adam Zienkiewicz

  • 1. Bristol Centre for Complexity Science Modelling the Folding and Stability of de novo Hexameric Coiled-Coils Adam Zienkiewicz Supervisors: Prof. Derek Wolfson (University of Bristol) Dr. Richard Sessions (University of Bristol) Prof. Noah Linden (University of Bristol) A dissertation submitted to the University of Bristol in accordance with the requirements of the degree of Master of Research by advanced study in Complexity Science in the Faculty of Engineering Submitted: 7 January 2013 9,926 words
  • 2.
  • 3. Abstract The rational de novo design of proteins is making progressive inroads into expanding the huge variety of intricate structures already provided by nature. The recent synthesis of a novel six-helix coiled-coil, CC-Hex, represents an entirely new protein fold, possessing an internal pore with a variety of potential bioengineering applications. Subtle modification to the peptide sequence results in subdivision of the central channel and gives rise to conformational sensitivities with respect to the ambient environment. Although the substitution of hydrophobic leucine residues with polar aspartate residues is accommodated within the channel of the hexamer, the stability of the structure is compromised with increasing pH. A major step in understanding the chemistry and future applications of this unique protein is to be able to explain and predict the observed folding behaviour. The inherent complexities of current ab initio approaches to protein structure prediction are reduced by considering only the dominant physicochemical interactions, specific to the L24D aspartate hexamer mutant. Using a statistical description of transitional hexameric microstates, combined with electrostatic energy models, we describe a new approach to predicting the macroscopic unfolding behaviour of the L24D hexamer. The models described in this study present the foundations of a potentially powerful new tool in the field of protein structure prediction.
  • 4.
  • 5. Acknowledgements This work was supported by the Engineering and Physical Sciences Research Council (EPSRC) [Grant number: EP/I013717/1].
  • 6.
  • 7. Author declaration I declare that the work in this dissertation was carried out in accordance with the requirements of the University’s Regulations and Code of Practice for Taught Postgraduate Programmes and that it has not been submitted for any other academic award. Except where indicated by specific reference in the text, this work is my own work. Work done in collaboration with, or with the assistance of others, is indicated as such. I have identified all material in this dissertation which is not my own work through appropriate referencing and acknowledgement. Where I have quoted from the work of others, I have included the source in the references/bibliography. Any views expressed in the dissertation are those of the author. SIGNED: (Signature of student) DATE:
  • 8.
  • 9. Contents
1 Introduction
2 pH dependence of hexamer oligomerisation
    2.1 Modelling the hexamer energy landscape
    2.2 Two dimensional, regular geometric model
        2.2.1 Relative permittivity of the hexamer channel
    2.3 Rigid backbone energy minimisation
3 A statistical approach to model hexamer unfolding
    3.1 Folding free energy
    3.2 Intrinsic pKa of L24D
    3.3 Mathematical model
        3.3.1 Expressing the pKa in terms of free energy
        3.3.2 Concentration of a general configuration
    3.4 Newton-Raphson method
4 Results and Discussion
    4.1 Intrinsic aspartate pKa dependence
    4.2 Folding free energy ∆Gf dependence
    4.3 Dielectric constant εr dependence
    4.4 Parameter space search
        4.4.1 Best fit parameters - rmsd minimisation
    4.5 Molecular dynamics
5 Summary and Conclusions
A Geometric coulomb model of electrostatics
B Normalised energy and torsion angles for hexamer species fragments
C Log-concentration plot for L24D species
D L24D species concentrations for varying parameter values
    D.1 Intrinsic aspartate pKa
    D.2 Folding free energy ∆Gf
    D.3 Dielectric constant (relative permittivity) εr
E Best-fit concentration data (rigid model energy landscape)
  • 10.
  • 11. 1 Introduction
The de novo design of peptides and proteins is a rapidly developing approach for investigating and extending the huge natural repertoire of protein structures and functions. The combinatorially huge number of amino acid sequences is limited to a greatly reduced subset of those which will fold quickly and uniformly to a single native state. Through the processes of evolution by natural selection, nature has explored a vast number of these sequences, devising proteins which interact within a vast web of biological processes inside every living organism. It is well understood, however, that the structural space which has been explored by nature forms only a very limited fraction of potential protein structures and functions [1]. The development of de novo proteins is therefore fundamental to increasing our understanding of protein structures, with the potential to present novel scaffolds with functional properties of interest to a wide variety of biotechnological applications.
The development of synthetic proteins is a diverse field, with a variety of techniques used to extend the boundaries of natural structures. Some methods have successfully focussed on replicating specific protein features, with subtle modifications to known architectures in order to produce novel protein folds. Other approaches include generating novel structures in silico [2]; however, a major impediment to the successful implementation of these designs is our limited understanding of the amino acid sequence-to-structure relationship, generally known as the ‘protein folding problem’. In very few cases, the rational or de novo design of proteins has been successful with the help of structure prediction algorithms which circumvent the application of complex ab initio computational techniques [3]. For example, the sequence-to-structure relationships of certain structural motifs are well established due to their ubiquity in natural proteins.
By compiling databases which classify proteins into ‘families’ based on conserved amino acid sequences, a wide variety of bioinformatic techniques have been developed with the aim of predicting secondary and higher order protein structures [4–6]. These techniques, known as homology methods, rely on experimentally determined structure/sequence data, combined with the development of sophisticated and efficient algorithms used to compare and evaluate new sequence data. Unlike ab initio methods, however, which attempt to combine physically realistic forcefields with efficient methods to sample conformational folding space in silico, homology methods are limited to known structural families and offer limited solutions with regard to predicting and designing de novo proteins. Current ab initio methods are, however, computationally intensive [7–9] and rely on incomplete or approximate physicochemical forcefields.
The focus of this research is to identify properties of a de novo protein which allow us to develop new methods to predict folding behaviour with respect to specifically designed features of the protein. In particular, we consider the pH response and structural stability of a novel six-helix coiled-coil protein by developing a unique statistical approach to predict protein folding behaviour. The inherent complexities of standard ab initio methods are avoided by exploiting the highly symmetric structure of certain proteins such as coiled-coils, shedding light on new techniques which may be applied more broadly to tackle the protein folding problem. This bold statement is somewhat justified in that coiled-coils, comprising a bundle of two or more alpha-helices, represent on average 3% of all protein-encoding regions across all known genomes [10]. The universality of the coiled-coil motif, representing one of the simplest tertiary protein structures, therefore provides a very useful example for protein folding and design investigations [11,12].
Coiled-coils are highly symmetric bundles of between two and five alpha-helices, wound together like the strands of a rope, with natural proteins most commonly favouring dimeric and trimeric coils. Coiled-coil regions are found in proteins involved in biologically important functions with great diversity
  • 12. from gene transcription to regulating muscle proteins - examples include the dimerisation region of the ‘leucine zipper’ (GCN4) [13,14] and the trimeric coiled-coil domain of the cartilage protein matrilin-1 [15], shown in figure 1. Crucially for the purposes of protein design, there is a well established sequence-to-structure relationship which drives the folding and assembly of all coiled-coils, taking the form of a repeated heptad of hydrophobic and polar residues (HPPHPPP)n. Originally proposed by Pauling [16], the sequence repeat encodes amphipathic helices which interact via spiralling hydrophobic seams to form a helical bundle. The seminal work of Crick in 1953 [17] later verified this elegant relationship, supplemented with a description of the highly specific helix side-chain interactions known as knobs-into-holes (KIH) packing. Most recently, the consideration of these hallmark attributes has allowed the rational de novo design of peptides which oligomerise into stable, six-helix (hexameric) coiled-coils, representing a novel, stand-alone protein fold not found in nature [18].
Figure 1: Cartoon of the NMR structure of the trimeric coiled-coil domain of chicken cartilage matrix protein (matrilin-1) [PDB: 1AQ5] - taken from [15]
The models developed in this research focus on the de novo protein CC-Hex, a six-helix coiled-coil designed and synthesised by the Woolfson group at the University of Bristol [18]. Unlike previously observed coiled-coils, comprising five or fewer component helices, CC-Hex possesses the remarkable feature of a central channel running the length of the structure, approximately 6 Å in diameter. Early indications from X-ray diffraction experiments (XRD) and electron density analysis suggest the presence of a chain of water molecules occupying the channel.
Whether or not the pore is strictly water permeable is still debated; however, the existence of a well defined channel provides strong motivation for the rational design of tubular or ion-channel proteins. An additional feature of CC-Hex, providing a crucial basis for the models developed in this study, is the mutability of the leucine residue at position 24 on the peptide, which accepts polar residues such as aspartic acid (D) and histidine (H). The amino-acid sequences for CC-Hex and the respective CC-Hex-D24 and CC-Hex-H24 mutants can be found in figure 2. By crystallising the stable hexamers at low pH, the mutant structures were analysed with XRD to reveal a partitioning of the central channel at the polar residue, with a large interior chamber and a smaller chamber at the N-terminus (shown in figure 4 in the following chapter). As a result of these single residue substitutions, however, the helical folding of the mutant peptide was found to be almost completely compromised at neutral ambient pH, unlike the parent hexamer.
  • 13. Figure 2: Amino-acid sequences for de novo hexameric coiled-coil protein CC-Hex (red) and L24 mutants. (blue) The CC-Hex-D24 (L24D) mutant protein - [18]
The design and synthesis of the novel hexameric protein provides an exciting new scaffold with a variety of functional applications. As a stand-alone structure possessing a well defined and potentially mutable internal pore, CC-Hex presents novel opportunities for the development of drug delivery applications, ion-channels and other membrane spanning proteins. It is important therefore that we explore how the chemistry of the internal channel, via the different mutant forms, affects the overall structure and stability of the protein. The key objective of this research is therefore to provide analytical models which can explain and predict the folding behaviour and stability of the hexamer by highlighting the dominant physical interactions between local structures. Specifically, in this research we consider the folding behaviour of the aspartate CC-Hex-D24 mutant (L24D) protein with respect to experimental observations.
  • 14. 2 pH dependence of hexamer oligomerisation
The mathematical models developed in this research are constructed in order to reproduce and quantify the experimentally observed unfolding of the L24D hexamer as a function of solution pH. In particular, the experimentally observed helicity of the L24D mutant, unlike that of the parent CC-Hex, responds strongly to the background pH in solution. Circular dichroism (CD) spectroscopic experiments provide a direct measure of the bulk helicity of peptides in solution by measuring the differential absorption of left- and right-circularly polarised light. This helicity data can then be used to infer the structure and conformation of proteins, where the detected signal indicates the overall helicity due to secondary and higher order structural features. For example, the signal detected from a solution of oligomerised coiled-coils combines components due both to the alpha-helical backbones (secondary structure) and to the (tertiary) helical coil of the multiple peptide backbones. Using helicity data gathered from these experiments (figure 3), Zaccai & Chi et al. [18] inferred the denaturing, or unfolding, of the L24D hexamer as they varied the pH of the solution - observing a transition from fully folded hexamers below pH 3 to less than 20% folded above pH 7.4.
Figure 3: Helicity of hexamer mutants as judged by the CD signal at 222 nm as a function of pH at 20 °C. Plots indicate relative folding of L24D (red diamonds), L24H (blue squares) and a 1:1 mixture of L24D:L24H (purple circles) in solution.
Note that the parent CC-Hex (not shown) did not show any appreciable change in helicity or stability over this range of pH - data from [10,19]
The central mechanism affecting the stability of the L24D hexamer mutant, addressed in the following sections, is thought to be primarily the repulsive electrostatic force produced by proximal charged acid groups in the hexamer channel, modulated by the pH of the background solution. The activity of solvated hydrogen ions, in solution with the individual peptides, directly influences the electrostatic forces between the individual peptide coils via the association and dissociation of hydrogen ions with the polar side-chains of the aspartic acid (Asp) residues. In the context of the fully folded hexamer, the helical backbones are oriented such that the Asp side-chains, located at residue 24, are directed towards the inside of the central channel running down the length of the hexameric coiled-coil, shown in figure 4.
  • 15. Figure 4: (left) Cartoon representation of the L24D hexamer looking down the central axis. The helical peptide backbones (ribbons) are shown with the six aspartate residue side-chains (sticks) oriented inwards towards the channel (≈ 6 Å in diameter). (right) Cutaway profile of the hexamer with the aspartate ‘ring’ identified within the grey section - images created with PyMol from XRD data (PDB id: 3R46)
Crucial to the stability of the hexamer, the functional acid groups of the individual aspartate residues respond to the pH of the environment by accepting or donating a single hydrogen ion (H+), with the effect of modifying the net charge of the side-chain. More specifically, the aspartate residues are protonated, or deprotonated, through the equilibrium reaction
$$\mathrm{HD} \rightleftharpoons \mathrm{D^-} + \mathrm{H^+} \tag{1}$$
where D represents an aspartic acid residue. The equilibrium constant, or acid dissociation constant, for this process is defined in the usual way, as the quotient of the concentrations of products and reactants
$$K_a = \frac{[\mathrm{D^-}][\mathrm{H^+}]}{[\mathrm{HD}]} \tag{2}$$
Due to the many orders of magnitude spanned by Ka, we can take the negative logarithm to obtain the intrinsic pKa of the residue, defined as the pH for which the titrating site is 50% occupied, or protonated
$$\mathrm{p}K_a = -\log_{10} K_a \tag{3}$$
The reaction between an aspartic acid and its deprotonated, or conjugate base, state is shown in figure 5. When the pH of the solution is equal to the pKa, we expect to find equal proportions of both species, by definition. As the pH is varied, the concentration of solvated hydrogen ions changes, according to
$$\mathrm{pH} = -\log_{10}[\mathrm{H^+}] \tag{4}$$
For example, as the pH increases (more basic) the concentration of H+ ions in solution decreases, so that the solution acts as a ‘hydrogen sponge’, tipping the equilibrium in favour of the base residue.
In this scenario, deprotonation of the aspartic acid side-chain leads to a chemical resonance, in which some electrons, previously shared covalently between a hydrogen and oxygen atom, become de-localised, resulting in a negatively charged site at the location indicated in figure 5. Conversely, at low pH the balance is favoured in the other direction, where an increased concentration of available H+ ions are ‘captured’ by otherwise
  • 16. deprotonated base residues. It is this transition, from neutral, protonated aspartate groups at low pH to negatively charged deprotonated states at higher pH, which causes variability in the net charge of the hexamer as a function of pH. Moreover, the proximity of multiple charged sites on the aspartate side-chains, buried within the interior channel, is expected to provide sufficient electrostatic repulsion to overcome the folding free energy of the hexamer itself, leading to the partial or complete unfolding of the protein.
Figure 5: Acid dissociation reaction between neutral aspartate (AspØ) (left) and its negatively charged conjugate base (Asp⁻) (right)
2.1 Modelling the hexamer energy landscape
In the following sections we outline a simple geometric model of charge distribution within the hexamer channel, which is then compared and validated with results using a more detailed molecular mechanics forcefield. Once a consistent model describing the hexamer energy landscape is established, the principles of equilibrium thermodynamics can then be applied to derive the relative concentrations of folded and unfolded states. Prior to exploring a detailed statistical approach in the next chapter, it is pertinent to develop simple models to predict the varying electrostatic energy of L24D with regard to its specific folded conformation as a hexameric coiled-coil. From this foundation we can construct models of increasing complexity, providing testable predictions of the folding behaviour and therefore valuable information on the functional chemical properties of this novel protein.
As the background pH is increased, the equilibrium between neutral, protonated aspartate residues (AspØ) shifts in favour of the negatively charged, deprotonated (Asp⁻) resonance. When considering the ensemble of all peptides in solution, we therefore predict that the number of charged aspartate residues will increase as a function of pH.
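The ensemble-level prediction can be illustrated with a minimal sketch. Combining equations (2) to (4) gives [D⁻]/[HD] = 10^(pH − pKa), so the fraction of charged sites is 1 / (1 + 10^(pKa − pH)). The function name and the pKa value of 3.9 below are assumptions made for illustration only (a typical textbook figure for a free aspartate side-chain), not a parameter fitted in this work, and the sketch treats each site as independent, ignoring the electrostatic coupling between neighbouring residues in the channel.

```python
def fraction_deprotonated(pH: float, pKa: float) -> float:
    """Fraction of titrating sites in the charged (D-) state.

    Derived from equations (2)-(4): [D-]/[HD] = 10**(pH - pKa),
    assuming independent, non-interacting sites.
    """
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))


if __name__ == "__main__":
    # Illustrative value only: NOT a fitted parameter from this dissertation.
    PKA_ASSUMED = 3.9
    for pH in (2.0, 3.0, 3.9, 5.0, 7.4):
        frac = fraction_deprotonated(pH, PKA_ASSUMED)
        print(f"pH {pH:4.1f}: fraction charged = {frac:.3f}")
```

At pH = pKa the function returns exactly one half, matching the definition of the intrinsic pKa given with equation (3).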
At the microscopic level, however, individual hexamers form a discrete distribution of uniquely charged states. By virtue of the regular structure of the folded hexameric peptides, we are able to identify each of the possible charged ‘species’ and propose a simple model to approximate their electrostatic contribution.
Given that each of the six aspartate residues within a hexamer can exist in either a neutral (AspØ) or charged (Asp⁻) state, there are exactly 2^6 = 64 possible configurations of charges. However, when considering a parallel homo-oligomerisation of indistinguishable peptides, the number of unique arrangements of charged sites is reduced by symmetry to a total of 13 distinct species, identified with grey shading in figure 6. To differentiate between the different charged hexamer species, a shorthand nomenclature is introduced designating each neutral AspØ with lower-case d and each charged Asp⁻ with upper-case D, given below for each of the 13 configurations:
  • 17.
$$
\begin{aligned}
Q = 0 &: \ \text{dddddd} \\
Q = -1 &: \ \text{Dddddd} \\
Q = -2 &: \ \text{DDdddd},\ \text{DdDddd},\ \text{DddDdd} \\
Q = -3 &: \ \text{DDDddd},\ \text{DddDDd},\ \text{DdDdDd} \\
Q = -4 &: \ \text{DDDDdd},\ \text{DDDdDd},\ \text{DDdDDd} \\
Q = -5 &: \ \text{DDDDDd} \\
Q = -6 &: \ \text{DDDDDD}
\end{aligned} \tag{5}
$$
where the groupings Q indicate the total charge in natural (proton) units, from fully neutral to a full complement of negative charges, Q = −6. By considering the energetics of each charged species with respect to the native folding energy of the hexamer, we can develop a statistical model allowing us to predict the relative concentrations of each of the folded (hexameric) species as compared to the unfolded (monomer) species - which in our nomenclature can be labelled [d] and [D] for concentrations of monomeric peptides containing respectively AspØ or Asp⁻ residues.
Figure 6: The 64 possible arrangements of parallel hexamers with peptides containing either a neutral AspØ (red) or negatively charged Asp⁻ (blue) at the 24th residue. The number of unique arrangements is reduced to 13 (grey) by charge symmetry in both polar and longitudinal axes - figure reproduced from [19]
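The reduction from 64 configurations to 13 species can be checked by brute force. The sketch below is an illustration rather than the procedure used in this work: it assumes that two charge patterns are equivalent when one can be turned into the other by a rotation or a reflection of the six-membered ring, and picks the lexicographically smallest equivalent string as a representative. The function names are hypothetical, and the representative chosen for a class may differ from the label used in (5), for example `DDdDdd` in place of the equivalent `DddDDd`.

```python
from collections import defaultdict
from itertools import product


def equivalents(state: str):
    """Yield every rotation and reflection of a cyclic charge pattern."""
    n = len(state)
    for s in (state, state[::-1]):
        for k in range(n):
            yield s[k:] + s[:k]


def canonical(state: str) -> str:
    """Lexicographically smallest member of the symmetry class."""
    return min(equivalents(state))


def unique_species(n_sites: int = 6) -> dict:
    """Group symmetry-distinct species by total charge Q (in proton units)."""
    species = {canonical("".join(p)) for p in product("dD", repeat=n_sites)}
    by_charge = defaultdict(list)
    for s in sorted(species):
        by_charge[-s.count("D")].append(s)
    return dict(by_charge)


if __name__ == "__main__":
    groups = unique_species()
    for q in sorted(groups, reverse=True):
        print(f"Q = {q:+d}: {groups[q]}")
    print("total species:", sum(len(v) for v in groups.values()))
```

Under these assumptions the enumeration gives 13 classes, with one species each for Q = 0, −1, −5 and −6 and three each for Q = −2, −3 and −4, in agreement with (5).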
  • 18. 2.2 Two dimensional, regular geometric model
For each of these 13 unique charge combinations we can derive an approximation to the electrostatic potential, due to aspartate residues, by considering point charges positioned in sequence on the vertices of a regular hexagon. For example, species ‘DDDdDd’ equates to the charge sequence [−1, −1, −1, 0, −1, 0] and would be interpreted as a hexagonal arrangement of point charges with the same sequence of values around the vertices. Using this simple approach, we imagine taking a two-dimensional slice across the central axis of the hexamer, isolating the electrostatically active region, providing an approximation of how the electric potential due to aspartate deprotonation affects the potential energy of the hexamer. In this greatly simplified model we assume that the separation of charged sites is constrained within a single plane, with positions fixed at the vertices of a regular hexagon, and therefore ignore the potential for the peptide backbones to flex in response to opposing electrostatic forces within the channel. As a first approximation, we therefore expect the predicted electric potential energy of a given species to represent an upper bound, or an overestimate, of the actual molecular energy.
The total electric potential of multiple point charges is found by application of Coulomb’s law (in SI units), calculated as the sum over all combinations of pairwise interactions between sites i ↔ j separated by distance rij with i, j = 1..6:
$$U_E = \frac{1}{2}\,\frac{1}{4\pi\varepsilon_0\varepsilon_r} \sum_{i=1}^{n} \sum_{j \neq i}^{n} \frac{q_i q_j}{r_{ij}} \tag{6}$$
where ε0 and εr are respectively the permittivity of free space and the relative permittivity (dielectric constant) of the medium. The distance parameters rij are found using simple geometry (see figure 2.2) considering a regular hexagon with three unique inter-vertex distances, expressed in terms of the distance between adjacent vertices R, obtained from XRD crystal structure data for L24D.
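Equation (6) on the regular hexagon can be sketched in a few lines. The code below is illustrative only and works in reduced units of e²/(4πε0εr) per unit of R, so no physical constants are needed; the function name is hypothetical. The factor of one half in (6) compensates for counting each pair twice, which is equivalent to summing once over unordered pairs i < j as done here. For a regular hexagon with side R the three inter-vertex distances are R, √3 R and 2R.

```python
import math
from itertools import combinations


def coulomb_energy(species: str, R: float = 1.0) -> float:
    """Pairwise Coulomb sum of equation (6) for charges on a regular polygon.

    `species` uses 'D' for a charge of -1 and 'd' for a neutral site.
    The result is in units of e^2 / (4*pi*eps0*eps_r) per unit length of R.
    """
    n = len(species)
    charges = [-1 if c == "D" else 0 for c in species]
    total = 0.0
    for i, j in combinations(range(n), 2):
        steps = min(j - i, n - (j - i))  # shortest vertex separation on the ring
        # Chord length between vertices `steps` apart on a regular n-gon of side R.
        r_ij = R * math.sin(math.pi * steps / n) / math.sin(math.pi / n)
        total += charges[i] * charges[j] / r_ij
    return total


if __name__ == "__main__":
    for s in ("dddddd", "DddDdd", "DdDddd", "DDdddd", "DdDdDd", "DDDddd", "DDDDDD"):
        print(f"{s}: U = {coulomb_energy(s):.4f}")
```

Because every term scales as 1/R, changing R rescales all energies by the same factor and leaves their ordering unchanged.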
The molecular representation of the XRD data in figure 7 highlights the aspartate residues and the position of the side-chains within the hexamer channel, from which the six inter-aspartate distances are measured. From this data the adjacent vertex distance is estimated to be R ≈ 4.9 Å.

The energy levels of the charged hexamer species, calculated from (6) with dielectric εr = 1, are plotted in figure 10 in order of increasing energy. Precise energy values for each species can be found in appendix A. As expected, the overall energy increases with the number of charged sites: every species with a given number of charged sites has a higher energy than any species with fewer charged sites. For species with the same overall charge (Q = −2, −3, −4), the energy also increases across each grouping in a consistent way, from lowest energy in alternating charge arrangements to highest energy when charges are clustered together. In this fixed arrangement of charges on a regular polygon, the ordering and relative separation of the energy levels are independent of the distance R.

2.2.1 Relative permittivity of the hexamer channel

The value of the relative (static) permittivity εr is particularly significant in the calculation of electrostatic energy, in this context expressing the extent to which charge interactions are screened by polarisable molecules occupying the channel of the hexamer. With a channel diameter of the order of 10 Å, and allowing for flexibility of the backbone structure, it is thought fairly likely that sufficient solvent molecules (water: εr ≈ 80) are able to penetrate the channel and thus act to screen charge interactions with a dielectric 1 < εr < 80. When considering the possible applications of the de novo CC-Hex protein, an ability to estimate the dielectric constant within the channel is highly desirable if we are to predict the possible chemical interactions which can take place in the hexamer pore. The dielectric constant is particularly difficult to measure experimentally, with most methods instead relying on fitting experimental data to statistical models or on solving an appropriately parameterised Poisson’s equation [20]. For this reason, the inclusion of the dielectric constant to moderate the effect of charge interactions is crucial to the parameterisation of subsequent unfolding models and to the comparison with the experimental data.

Figure 7: Representation of a slice through the L24D hexamer, indicating the positions and approximate distances between deprotonated aspartate side-chains surrounding the central pore. Distance between adjacent aspartate resonances R ≈ 4.9 Å. Image created with PyMol from XRD data.

As a first approximation, the set of idealised species energy levels Uv = {U1..UN}, calculated under vacuum conditions in the absence of explicit solvents (εr = 1), can be reduced by a factor εr ∈ R+

$$
U_s = \frac{U_v}{\varepsilon_r} \tag{7}
$$

producing a set of ‘solvated’ energy levels Us moderated by the dielectric medium. In reality, the increased energy due to proximal charges may lead to flexing of the hexamer and variation of the channel pore size, leading to a more complex non-linear relationship. However, given the rigid backbone models described in this section, we assume the formula provided in (7), where the value of εr provides a linear scaling parameter of the relative energy levels for each charged hexamer species.
Figure 8: Geometric calculation of inter-vertex distances on a regular hexagon. Adjacent vertices are separated by distance r12 = R and opposite vertices by r14 = 2R. The interior angle α is calculated by the formula for a regular n-polygon (n = 6): ∠(α) = π(n − 2)/n = 2π/3 ⇒ ∠(β) = (π − ∠(α))/2 = π/6. Finally, application of the sine rule yields the distance between alternating vertices r13 = R sin(α)/sin(β) = 2R sin(2π/3) = √3 R.
2.3 Rigid backbone energy minimisation

In the following discussion, the energy landscape of the charged hexamer species is refined by using a more comprehensive force-field energy calculation, applied to an artificially constructed hexamer fragment with a cross section centred on the aspartate ring. Extending the rigid two-dimensional representation of the previous model, the charged aspartate groups are now considered in their specific local context. A model of this intermediate complexity allows us to explore a more detailed energy landscape by specifying a single degree of freedom: the energetics are evaluated as the aspartate side-chain groups are rotated around the dihedral (torsion) angle χ2, defined in figure 9. By considering the energy minima, we can estimate how charging of aspartate residues affects their relative (ensemble average) orientation within the confinement of the hexamer channel, and compare the resulting electrostatics with values calculated using the simple geometric model.

Figure 9: (left) Molecular representation of the artificially constructed hexamer channel fragment - species DdDdDd, with interior ring of AspØ / Asp⁻ residues (thick sticks) at position 24, and isoleucine residues (thin lines) at positions 20 and 27. (right inset) A single Asp residue indicating the allowed torsion angle χ2.

Included in the hexamer fragment description are the isoleucine residues located at positions 20 and 27, one turn of the helix before and after the aspartate at position 24, with side-chains oriented into the central channel above and below the aspartate ring. The fragment therefore contains only the two most proximal residue ‘rings’ on either side of the aspartate groups, which provide the strongest influence on their native conformation in the context of the hexamer, after that of the charged sites themselves.
After stripping out all but three residues (Ile-Asp-Ile) from each helix in the hexamer XRD data, 13 separate fragments were prepared in silico, corresponding to the different charged species configurations. For each of these fragments, the six aspartate residues were then modified to reflect either the neutral AspØ or charged Asp⁻ as per figure 5, according to the 13 unique charge sequences formalised in (5).

Each species fragment was analysed with the Discover (Accelrys) molecular mechanics (MM) software using a consistent valence force-field model to calculate the intramolecular energy in terms of bond energy and components for non-bonded Van der Waals and Coulomb (electrostatic) energies. Finally, the energy landscape of each hexamer fragment was probed using an automated process to systematically sample χ2 angle permutations with increasing resolution. Fragments were first sampled with an angle resolution of ±24° across the full range [−180°, +180°], a total of 15^6 ≈ 11.4 million permutations. The minimum energy conformations for each species were then used as mid-point values m to initialise a second, higher resolution calculation sampling angle conformations in the range [m − 12°, m + 12°] with ±2° precision, a further 12^6 ≈ 3 million permutations. The full torsion and energy data produced by these calculations can be found in appendix B.

Having located the conformational energy minima of each fragment in this regime, the relative energies of each species due to all non-bonded (i.e. electrostatic and VdW) interactions are isolated and rescaled with respect to a background provided by the neutral (d6 ≡ dddddd) hexamer. The energy levels for each species fragment are shown in figure 10, where they are compared with those calculated from the simple hexagonal Coulomb model with R = 4.9 Å.

Having found the most energetically favourable aspartate side-chain rotamers for all AspØ / Asp⁻ residues, it was found that the minimised energy levels almost exactly match those derived by the simple geometric model, given a constant value of R estimated from the same initial XRD data. The resulting atomic coordinate data for each energy minimised species fragment indicated that charged site positions can still be very well approximated by a planar, regular hexagon, and thus the geometric model should indeed be expected to yield very similar energies. Of course, any large deviations from this regular arrangement are restricted by the rigid placement of all other atoms in the model, and crucially of the peptide backbone itself.
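The two-stage sampling described above is a coarse-to-fine grid search. The sketch below illustrates the idea only: the real energies came from the Discover force-field calculation, which is not reproduced here, so `toy_energy` is a hypothetical stand-in, and the half-open grids (15 and 12 values per angle) are an assumption chosen to match the quoted permutation counts. The demonstration uses two angles rather than six so that it runs instantly.

```python
import math
from itertools import product


def grid(lo, hi, step):
    """Angles lo, lo + step, ... up to but excluding hi (degrees)."""
    return [lo + k * step for k in range(int(round((hi - lo) / step)))]


def coarse_to_fine(energy, n_angles, coarse_step=24, fine_step=2, window=12):
    """Exhaustive coarse scan, then a finer scan centred on the coarse minimum."""
    coarse_axis = grid(-180, 180, coarse_step)                 # 15 values per angle
    best = min(product(coarse_axis, repeat=n_angles), key=energy)
    fine_axes = [grid(m - window, m + window, fine_step)       # 12 values per angle
                 for m in best]
    return min(product(*fine_axes), key=energy)


# Number of permutations for six torsion angles at each stage
print(len(grid(-180, 180, 24)) ** 6)
print(len(grid(-12, 12, 2)) ** 6)


def toy_energy(angles):
    """Hypothetical stand-in energy with a single minimum at 50 degrees per angle."""
    return sum(1.0 - math.cos(math.radians(a - 50.0)) for a in angles)


print(coarse_to_fine(toy_energy, n_angles=2))
```

The second stage can only recover a minimum that lies within the window around the best coarse point, which is why the window is matched to half the coarse step.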
Figure 10: Modelled energy landscape of the L24D hexamer - (blue squares) electrostatic energy calculated using the geometric Coulomb model, (grey circles) non-bonded energy calculated using hexamer fragment molecular mechanics with energy minimised Asp χ2 torsion angles. The bar plot indicates the absolute difference between the energy models.
3 A statistical approach to model hexamer unfolding

The following section addresses the problem of modelling the unfolding behaviour of the hexamer as a function of pH, given that we have obtained approximations of the charged energy landscape. The model described uses a statistical approach to derive the concentrations of each hexamer species as fractions of the ensemble of possible neutral and charged peptide states. By calculating the ratio of folded (hexamer) species to unfolded (single peptide) species, an unfolding curve similar to those found in figure 3 can be computed numerically across a range of pH and compared to experimental data.

By assuming a Boltzmann distribution of states across all folded hexamers and unfolded single peptides, the individual species concentrations can be calculated based on two experimentally observed properties of the (neutral) L24D hexamer: the folding free energy ∆Gf, defining the energetic stability of the hexamer; and the intrinsic (aspartate) pKa of the L24D peptide, defining the balance between neutral and charged species according to the pH. The third important variable considered is the dielectric constant εr, the variation of which acts to modulate, or smooth, the relative electrostatic energy levels. Using these three parameters, we can quantify the effect of each variable on the resulting concentrations produced by the model.

3.1 Folding free energy

As with all physical systems, biological molecules, including proteins, seek to achieve a minimum of free energy. More specifically, the Gibbs free energy is the chemical potential which is minimised when a system reaches equilibrium with its environment at constant temperature and pressure. Alternatively, the free energy of a molecule or protein represents the amount of energy available to produce thermodynamic work in a fully reversible reaction.
For stable proteins, therefore, the free energy of folding at equilibrium is negative, implying a spontaneous reaction causing the protein to fold into the native state with minimum energy. To produce a conformational change in the structure of the protein, a sufficient amount of energy must be added to drive the reaction in the other direction. Correspondingly, if this additional energy is removed, the reaction is reversed and the protein returns to the favoured native conformation.

The electrostatic energy levels which have been computed for the charged hexameric states have so far been calculated with reference to a neutral background, increasing from zero. The electrostatic energy, however, is only one contribution to the energy of a folded hexamer, with additional components due to covalent and hydrogen bonding, along with hydrophobic and Van der Waals interactions. Assuming that the deprotonation of L24D aspartate residues produces significant variation only in the electrostatic contribution, the energy levels computed can be combined with a background energy equal to the total folding free energy of the ‘parent’ hexamer, ∆Gf. A full normalisation of the energy levels under these assumptions, including the effect of a dielectric medium, becomes:

$$
\Delta G(i) = \frac{U_v(i)}{\varepsilon_r} + \Delta G_f \tag{8}
$$

where ∆G(i) is the renormalised folding energy of charged hexamer species i, given the computed electrostatic energy Uv(i). Having introduced the L24D folding free energy in the form given in (8), the computed vacuum energy levels are subsequently parameterised by two quantities: the dielectric εr and the folding free energy ∆Gf. Whilst a useful estimate for the dielectric constant is not immediately available, the value of ∆Gf has been estimated with urea denaturing experiments by Chi [19] as ∆Gf ≈ −46 kcal mol⁻¹.

3.2 Intrinsic pKa of L24D

The significance of the aspartate pKa has already been discussed in section 2; it represents the reaction equilibrium constant between the neutral protonated and charged deprotonated states. An estimated value and the correct parameterisation of the pKa are therefore required in order to predict how the relative concentrations of charged species change as a function of pH. A precise value of pKa = 3.86 for aspartic acid is well documented for an individual molecule in isolation. Although this value is useful as a first approximation, it is understood that the pKa depends on numerous factors arising from the specific structural context. pKa values can be determined by experimental techniques such as H-NMR, as well as by empirical methods, for example those offered by computational tools such as pKaTool [21] and ProPKA [22]. For the purposes of this investigation, the intrinsic pKa is explored as a fundamental system parameter, in conjunction with values for εr and ∆Gf, such that specific model predictions of hexamer folding can be explored in terms of each quantity.

3.3 Mathematical model

In the following mathematical description, the notation already developed is used to identify the 13 different charged hexamer species with combinations of 6 letters, either d (AspØ) or D (Asp⁻), and the isolated (unfolded) neutral d or charged D peptides.
For every L24D peptide, the reaction between the two states is written as

$$
d \rightleftharpoons D + H^+ \tag{9}
$$

with a reaction quotient Q given by the ratio of products to reactants

$$
Q = \frac{[H^+][D]}{[d]} \tag{10}
$$

The general relation between the Gibbs free energy ∆G of the reaction at any moment in time and the standard-state free energy (∆G°) is given by

$$
\Delta G = \Delta G^\circ + k_B T \ln(Q) \tag{11}
$$

where the force driving the reaction is by definition zero (∆G = 0) at equilibrium, when Q = K, thus

$$
0 = \Delta G^\circ + k_B T \ln(K) \tag{12}
$$

3.3.1 Expressing the pKa in terms of free energy

Using this general equilibrium equation, a formula for the equilibrium constant K can be derived in terms of the specific reaction quotient in (10) and the standard-state energy, expanded in terms of the reaction components ∆G° = G(D) + G(H+) − G(d):

$$
K = \left(\frac{[H^+][D]}{[d]}\right)_{eq} = \exp\left\{-\frac{1}{k_B T}\left[G(D) + G(H^+) - G(d)\right]\right\} \tag{13}
$$

At this stage, the pH of the environment can be included explicitly by taking the natural logarithm and using the substitution pH = − ln[H+]/(ln 10).
$$
\ln\frac{[D]}{[d]} = (\ln 10)\,\mathrm{pH} - \frac{1}{k_B T}\left[G(D) + G(H^+) - G(d)\right] \tag{14}
$$

The sum of the unknown free energy components can thus finally be expressed in terms of the intrinsic pKa, which is related to the pH and the concentrations of d and D as follows

$$
\mathrm{pH} = \mathrm{p}K_a + \frac{1}{\ln 10}\ln\frac{[D]}{[d]} \tag{15}
$$

resulting in the following expression

$$
G(D) + G(H^+) - G(d) = k_B T (\ln 10)\,\mathrm{p}K_a \tag{16}
$$

3.3.2 Concentration of a general configuration

Using the expressions developed so far, each of the 13 charged hexamer species is now considered to have a generalised configuration “Dxdy” containing x = 0..6 charged D (Asp⁻) peptides, with the remaining y (= 6 − x) peptides being neutral d (AspØ). Using this notation, we can formalise a charge-neutral equilibrium between 6 unfolded neutral peptides and the different folded hexamers

$$
(x + y)\,d \rightleftharpoons D_x d_y + x H^+ \tag{17}
$$

noting that the number of negatively charged Asp⁻ peptides in the hexamer is balanced by an equal number of H+ ions. Following the same reasoning as before in (9) to (13), we can therefore assume the following approximate equilibrium relation:

$$
\begin{aligned}
\frac{[D_x d_y]_i\,[H^+]^x}{[d]^{x+y}}
&= \Omega_i \exp\left\{-\frac{1}{k_B T}\left[G(D_x d_y)_i + xG(H^+) - (x + y)G(d)\right]\right\} \\
&= \Omega_i \exp\left\{-\frac{1}{k_B T}\left[G(D_x d_y)_i - xG(D) - yG(d) + x\left(G(D) + G(H^+) - G(d)\right)\right]\right\}
\end{aligned} \tag{18}
$$

where the factor Ωi is an entropy term equal to the number of ways in which species (macrostate) [Dxdy]i can be realised if we consider all indistinguishable microstates, as described in figure 6. For example, there are Ω = 6 ways of combining charges to create species Dddddd, whereas there are only Ω = 2 ways of permuting sites to form species DdDdDd.

Using the expansion above, the appropriate free energy component terms can be collected such that the equilibrium relation can be expressed in terms of the folding free energies ∆Gi of the i’th hexamer species. Crucially, these are the values which have been calculated using the electrostatic models and normalised to the reference folding energy of the neutral hexamer as per (8).
Applying the following substitution

$$
\Delta G_i = G(D_x d_y)_i - xG(D) - yG(d) \tag{19}
$$

and replacing the remaining energy components using the expression for the pKa derived in (16), we can now rewrite the equation as

$$
\frac{[D_x d_y]_i\,[H^+]^x}{[d]^{x+y}} = \Omega_i \exp\left\{-\frac{\Delta G_i}{k_B T} - x(\ln 10)\,\mathrm{p}K_a\right\} \tag{20}
$$

To complete the description, the concentration of hydrogen ions [H+] can be replaced using the definition of pH to obtain
$$
[D_x d_y]_i = [d]^6\,\Omega_i \exp\left\{-\frac{\Delta G_i}{k_B T} + x(\ln 10)(\mathrm{pH} - \mathrm{p}K_a)\right\} \tag{21}
$$

This last formula (21) provides the key expression for calculating the concentration of the i’th hexamer species [Dxdy]i in terms of the neutral unfolded peptide concentration [d], as a function of pH and the intrinsic pKa. In order to find particular solutions for a given species, however, a fixed reference concentration is required, i.e. the conserved total concentration of L24D:

$$
[\mathrm{L24D}] = [d] + [D] + [\text{hexamers}] \tag{22}
$$

The conservation equation above can be rewritten solely in terms of the concentration [d], firstly by substituting for [D] using (15)

$$
[D] = [d]\,10^{(\mathrm{pH} - \mathrm{p}K_a)} \tag{23}
$$

and secondly by replacing the total concentration of folded species using the key formula (21), taking the sum over i = 1..13 with corresponding energies ∆Gi

$$
[\text{hexamers}] = [d]^6 \sum_{i=1}^{13} \Omega_i \exp\left\{-\frac{\Delta G_i}{k_B T} + x_i(\ln 10)(\mathrm{pH} - \mathrm{p}K_a)\right\} \tag{24}
$$

With these substitutions, the conservation equation in (22) becomes a 6th order polynomial in [d] of the form

$$
\alpha = [d] + \beta[d] + \gamma[d]^6 \tag{25}
$$

where {α, β, γ} ∈ R+ are constants which are known (α) or can be calculated directly (β, γ) for any value of pH in the range 0 to 14, as outlined below.
• α = [L24D] - the total, conserved concentration of L24D, for which a specific experimental value can be used
• β = 10^(pH−pKa) - a factor relating the proportions of unfolded peptides [d] (neutral) and [D] (charged), depending on the environment pH and the intrinsic pKa of the monomeric peptide
• γ - the summation of all exponential components of the hexamer species from (24), including the entropy factor Ωi, given the energy level ∆Gi of each species and the number of negatively charged peptides in each species, xi = {0, .., 6}

Replacing the neutral peptide concentration [d] with x, the general equation we need to solve is written as

$$
\gamma x^6 + (\beta + 1)\,x - \alpha = 0 \tag{26}
$$

Crucially, for any given pH, this equation can now be solved for the concentration [d], from which we can then calculate the remaining concentration fractions: species [D] via (23), and each of the folded hexameric species via (21). A final hurdle remains in that the high order of this polynomial prevents us from simply obtaining a tractable analytic solution. Instead we can employ a standard numerical method to evaluate the roots, specifically the Newton-Raphson method, which is discussed in the following section [3.4].
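The three coefficients can be assembled directly from (8), (23) and (24). The sketch below makes several assumptions that are not stated in the text: energies are in kcal mol⁻¹, so the molar gas constant R is used where the equations write kB; concentrations are in mol L⁻¹ with an implied 1 M standard state; and the function name and the `species` tuple layout (x, Ω, vacuum energy) are hypothetical. The two-species example list is purely illustrative.

```python
import math

R_GAS = 1.987204e-3  # gas constant in kcal/(mol K); molar counterpart of k_B


def coefficients(pH, pKa, total, species, eps_r, dG_f, T=293.15):
    """Return (alpha, beta, gamma) of equation (26).

    species is an iterable of (x, omega, U_vac): number of charged peptides,
    multiplicity, and vacuum electrostatic energy in kcal/mol.
    """
    RT = R_GAS * T
    alpha = total                                   # conserved [L24D]
    beta = 10.0 ** (pH - pKa)                       # [D]/[d], equation (23)
    gamma = 0.0
    for x, omega, u_vac in species:
        dG_i = u_vac / eps_r + dG_f                 # equation (8)
        gamma += omega * math.exp(-dG_i / RT + x * math.log(10.0) * (pH - pKa))
    return alpha, beta, gamma


# Illustrative input: the neutral and singly charged hexamers, neither of which
# has any pairwise Coulomb energy in the geometric model.
example_species = [(0, 1, 0.0), (1, 6, 0.0)]
alpha, beta, gamma = coefficients(pH=3.9, pKa=3.9, total=100e-6,
                                  species=example_species, eps_r=10.0, dG_f=-46.0)
print(alpha, beta, gamma)
```

Note how large γ becomes for a folding energy of this magnitude; it is balanced by the sixth power of a very small monomer concentration.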
3.4 Newton-Raphson method

The standard Newton-Raphson method can be employed to find successively more accurate approximations xn to a root of a function f(x) = 0, defined over the real numbers, via the following iterative procedure, given an initial guess x0 for the value of the root

$$
x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \tag{27}
$$

Provided that we begin with an initial guess x0 close enough to the root, with the extra condition that f′(x0) ≠ 0, each iteration of the method produces a value of xn typically closer to the real root, with approximately quadratic convergence for well-behaved functions. For a full pedagogical review of the Newton-Raphson method, its applications and limitations, the reader is referred to Burden & Faires - Numerical Analysis [23].

An explicit requirement in the formulation of this method is that we can calculate the derivative f′ of the function. Fortunately, the conservation equation which we need to solve in (26) is both continuous and easily differentiable, where f′ is found using simple calculus as shown below.

$$
f(x) = \gamma x^6 + (\beta + 1)\,x - \alpha \;\Rightarrow\; f'(x) = 6\gamma x^5 + \beta + 1 \tag{28}
$$

For any given value of the pH and energy levels ∆Gi, we are therefore able to calculate a numerical approximation to the concentration x = [d] to arbitrary precision, halting the iterative process when the absolute difference between successive approximations falls below a given threshold ε. To prevent the possible occurrence of an infinite loop, a maximum number of iterations is prescribed such that if [d] cannot be found to the required precision within nmax iterations, the algorithm outputs an ‘error-value’ or flag to report unambiguously that a suitable solution was not found.

Figure 11: General behaviour of the function f(x) = γx^6 + (β + 1)x − α, indicating a single unique solution in the (shaded) positive x region.
By considering the general properties of the function f(x), shown in figure 11, we are reassured that a unique solution can be found using this method, given that we are interested only in positive solutions of x corresponding to a physical concentration [d]. As the coefficients α, β and γ are all positive real numbers, f(x) is monotonically increasing for x ≥ 0 and, since f(0) = −α < 0, has a single positive root, identified in the shaded region of figure 11. We can therefore choose a first guess value x0 for every calculation within the range 0 ≤ x0 ≤ [L24D], i.e. positive and bounded by the total concentration of all species, with confidence that [d] can be found within a finite number of iterations. A sensible value for x0 is thus always chosen to be the midpoint, x0 = [L24D]/2. To obtain sufficiently accurate solutions given potentially very small species concentrations, a value of ε = 10^−20 was chosen. For all parameterisations of the model examined in this research, the N-R procedure requires fewer than nmax = 200 iterations to obtain the desired precision ε over the entire pH range.

The essential steps of the Newton-Raphson method described are summarised in the following pseudocode, generating increasingly accurate approximations to the value of x, given the functions f(x) and f′(x):

    x_0 ← [L24D]/2
    for n = 1 → n_max do
        x_n ← x_{n−1} − f(x_{n−1}) / f′(x_{n−1})
        if abs(x_n − x_{n−1}) < ε then
            return x_n
        else if n = n_max then
            return errorvalue
        end if
    end for
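The pseudocode translates almost line for line into Python. This is a sketch rather than the original implementation: the function name is hypothetical, `None` stands in for the ‘error-value’, and the coefficient values in the usage line are illustrative magnitudes only, not values taken from the study.

```python
def solve_monomer(alpha, beta, gamma, tol=1e-20, n_max=200):
    """Newton-Raphson solution of gamma*x**6 + (beta + 1)*x - alpha = 0 for x > 0.

    Starts from the midpoint x0 = alpha / 2 and returns None if the change
    between successive iterates has not dropped below tol after n_max steps.
    """
    def f(x):
        return gamma * x**6 + (beta + 1.0) * x - alpha

    def f_prime(x):
        return 6.0 * gamma * x**5 + beta + 1.0      # strictly positive for x >= 0

    x_prev = alpha / 2.0
    for _ in range(n_max):
        x_next = x_prev - f(x_prev) / f_prime(x_prev)
        if abs(x_next - x_prev) < tol:
            return x_next
        x_prev = x_next
    return None                                     # error value: no convergence


# Illustrative call: 100 micromolar total, pH equal to the pKa, large gamma
root = solve_monomer(alpha=100e-6, beta=1.0, gamma=1e34)
print(root)
```

Because f is convex and increasing for positive x, an iterate that starts above the root approaches it from above without overshooting, which is consistent with the modest iteration counts reported in the text.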
4 Results and Discussion

As a first analysis of the model, the L24D species concentrations are computed using a set of basis parameters, chosen to provide an initial comparison with the experimental unfolding data supplied by [19].

Total L24D concentration: [L24D] = 100 µM [19]
Background folding free energy: ∆Gf = −46 kcal mol⁻¹ [19]
Intrinsic pKa(D): pKa = 3.9 (value for isolated aspartic acid)
Dielectric constant: εr = 10 (first approximation)
Temperature: T = 20 °C (293.15 K)

The primary input to the calculation is the set of hexamer species energy levels; the energy minimised fragment model values are used for the remainder of this analysis. The energy levels are rescaled according to the parameters ∆Gf and εr, before being used to calculate the respective species concentrations using the method described in the previous section. Using the numerical approach described, it is therefore possible to compute the individual concentrations at any given pH across a specified range with arbitrary resolution. Figure 12 shows the results of the model using the basis parameters, plotting the concentration of each species as a function of pH, calculated in 0.1 unit intervals.

Figure 12: Unfolded and folded L24D species concentrations computed for a 100 µM solution with parameters pKa = 3.9, ∆Gf = −46 kcal mol⁻¹, εr = 10, T = 20 °C.

The data in figure 12 show the predicted concentrations of the two unfolded peptides [d], [D] along with the 13 folded hexamer species. Shown at this scale, however, only the 4 hexamer species with the lowest energies have concentrations above 1 µM across the pH range. The vanishingly small concentrations of the other hexamer species can be compared by plotting the log-concentration, shown in appendix C.
At very low pH the concentration of the all-neutral hexamer [dddddd] dominates, as expected given the stable folding energy in a regime well below the pKa, where deprotonation of any of the aspartate residues is unlikely. As the pH is increased, the concentration of the singly charged [Dddddd] hexamer increases in exact anti-phase to the decreasing neutral hexamer concentration. Since the electrostatic energy is unaffected by an isolated charge site, the two hexamers have approximately equivalent folding free energy. However, taking into account the Ω = 6 ways in which the latter hexamer can be constructed, versus the single neutral construction, the two species reach equal concentration at a pH just below (approx. −5/6 units) the specified intrinsic pKa.

As the pH increases beyond the pKa, the concentration of the singly charged hexamer continues to rise as that of the neutral ‘parent’ diminishes towards zero. The decreasing background concentration of H+ ions acts as a proton sponge, promoting the deprotonation of aspartate residues. With the equilibrium tipped increasingly in favour of proton dissociation, however, at pH ∼ 5 the concentration of Dddddd hexamers begins to decline as low concentrations (< 20 µM) of hexamers with two charged sites are produced.

Consistent with the energy level ordering found in the previous models, concentrations of the lower energy DddDdd hexamers exceed those of the less energetically favourable DdDddd and DDdddd species, the latter of which is not detectable on the scale of this plot. At the same pH at which doubly charged hexamers are produced, there is a steady increase in the concentration of unfolded charged [D] peptides, which rapidly saturates as the concentration of folded hexamers falls to zero above pH ∼ 7.5.

In terms of the folding behaviour of the L24D protein, the data calculated for this parameterisation (∆Gf = −46 kcal mol⁻¹, εr = 10) indicate that beyond this pH value, the packing of charged residues within a folded hexamer becomes energetically less favourable than the production of isolated [D] peptides. It is also clear from these preliminary results that, for this trial parameterisation, the predicted concentrations of hexamers with more than 3 charged residues are vanishingly small and such species are statistically very unlikely to exist. Unlike the numerical model, which is able to compute concentration data for individual species, it is very difficult to discriminate these concentrations uniquely in solution by experiment.
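The crossover between the neutral and singly charged hexamers, which the text places just below the intrinsic pKa, can be recovered from (21) as a worked check. The derivation assumes that the two species have exactly equal folding energies (∆G₀ = ∆G₁), which holds in the geometric Coulomb model because a single charge has no partner to interact with; with the fragment energies the equality is only approximate.

```latex
\frac{[\mathrm{Dddddd}]}{[\mathrm{dddddd}]}
  = \frac{\Omega_1}{\Omega_0}
    \exp\!\left(-\frac{\Delta G_1 - \Delta G_0}{k_B T}\right)
    10^{\,(\mathrm{pH} - \mathrm{p}K_a)}
  = 6 \times 10^{\,(\mathrm{pH} - \mathrm{p}K_a)}
\quad\Longrightarrow\quad
\mathrm{pH}_{\mathrm{equal}} = \mathrm{p}K_a - \log_{10} 6 \approx \mathrm{p}K_a - 0.78
```

The common factor [d]^6 cancels in the ratio, so under this assumption the offset depends only on the multiplicities and not on ∆Gf, εr or the total concentration.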
Helicity measurements obtained via CD spectroscopy can, however, be directly compared to the numerical data by considering the ratio of unfolded to folded concentrations, [d] + [D] : [hexamers]. Specifically, the experimental helicity data in figure 3 can be normalised between 0 (fully unfolded) and 1 (fully folded), and compared with the numerical unfolded ratio as shown in figure 13 below.

Figure 13: Unfolding curve indicating the relative proportion of folded (0) and unfolded (1) species. (Blue) Model data computed using the basis parameters pKa = 3.9, ∆Gf = −46 kcal mol⁻¹, εr = 10, T = 20 °C. (Black) Normalised CD helicity data from [19].

The shape of the (blue) unfolding curve, produced from the numerical concentration data, clearly mirrors the results already discussed. The smooth, alternate mixing of two folded hexamers dominates the concentration at pH < 5, indicated by a flat zero region of the unfolding curve at this pH. At intermediate pH (5-7) doubly charged hexamers are present in measurable concentrations; however, with the energetics of these states rapidly becoming unfavourable, a steep inflection is observed, with saturation of unfolded peptides above pH ≈ 7.5.

The general shape of the unfolding curve, a sigmoid with a steep inflection zone between pH 4.5 and 7, is comparable to the normalised helicity data, plotted with black squares in figure 13. However, the numerical model produces a curve with an ‘apparent’ pKa ≈ 6.8, measured at the centre of the inflection, which is shifted by approximately 1.6 pH units from the experimental value of 5.2, and by 2.9 pH units from the specified intrinsic value of 3.9. It also appears that the rate of unfolding with respect to pH above the apparent pKa is greater than that observed experimentally, producing a visibly steeper portion of the curve in this region. Given that the experimental species components are inaccessible, the next phase of the analysis concerns how the shape of the unfolding curve varies with respect to the three main input parameters ∆Gf, εr and the intrinsic Asp pKa.

4.1 Intrinsic aspartate pKa dependence

The dependence of unfolding on the intrinsic pKa is studied by computing the model unfolding curve as before for a range of pKa values, whilst keeping the other parameters set at the default values provided earlier. The unfolding curves produced for pKa = {2, 3, 4, 5} are shown in figure 14 below. The corresponding species concentration plots can be found in appendix D.1.
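The whole calculation, from species energies to an apparent pKa, can be sketched end to end. Several choices here are assumptions rather than the author's settings: the geometric Coulomb energies stand in for the fragment model values (which are in an appendix not reproduced here), the molar gas constant replaces kB because energies are in kcal mol⁻¹, a 1 M standard state is implied, the apparent pKa is taken as the pH at which the unfolded fraction ([d] + [D])/[L24D] reaches 0.5, and all function names are hypothetical.

```python
import math
from itertools import product

R_GAS = 1.987204e-3        # gas constant in kcal/(mol K)
COULOMB_KCAL = 332.06371   # e^2/(4 pi eps0) in kcal Angstrom/mol


def species_table(R=4.9):
    """Return (x, omega, U_vac) for each symmetry-distinct ring of six sites."""
    table = {}
    for ring in product((0, 1), repeat=6):               # 1 marks a charged Asp
        rotations = [ring[k:] + ring[:k] for k in range(6)]
        key = min(rotations + [r[::-1] for r in rotations])
        if key not in table:
            u_vac = sum(COULOMB_KCAL / (2.0 * R * math.sin(math.pi * (j - i) / 6))
                        for i in range(6) for j in range(i + 1, 6)
                        if ring[i] and ring[j])
            table[key] = [sum(ring), 0, u_vac]
        table[key][1] += 1                               # multiplicity omega
    return [tuple(entry) for entry in table.values()]


SPECIES = species_table()


def unfolded_fraction(pH, pKa=3.9, dG_f=-46.0, eps_r=10.0, total=100e-6,
                      T=293.15, tol=1e-20, n_max=200):
    """([d] + [D]) / [L24D], following equations (21) to (28)."""
    RT = R_GAS * T
    shift = math.log(10.0) * (pH - pKa)
    beta = math.exp(shift)
    gamma = sum(omega * math.exp(-(u / eps_r + dG_f) / RT + x * shift)
                for x, omega, u in SPECIES)
    d = total / 2.0
    for _ in range(n_max):                               # Newton-Raphson on (26)
        step = ((gamma * d**6 + (beta + 1.0) * d - total)
                / (6.0 * gamma * d**5 + beta + 1.0))
        d -= step
        if abs(step) < tol:
            return (1.0 + beta) * d / total
    raise RuntimeError("Newton-Raphson did not converge")


def apparent_pKa(**params):
    """pH at which half of the peptide is unfolded, by bisection on [0, 14]."""
    lo, hi = 0.0, 14.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if unfolded_fraction(mid, **params) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for pKa in (2.0, 3.0, 4.0, 5.0):
    print(pKa, round(apparent_pKa(pKa=pKa), 2))
```

In this formulation pH and pKa enter only through their difference, so changing the intrinsic pKa can do nothing except translate the curve along the pH axis.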
Figure 14: Unfolding curves for pKa = {2, 3, 4, 5} with fixed parameters ∆Gf = −46 kcal mol⁻¹, εr = 10, T = 20 °C, compared to the experimental L24D folding curve.

The pKa dependence observed is a precisely linear translation of the unfolding curve, shifting the apparent pKa in direct proportion to the intrinsic value with a fixed difference of (apparent − intrinsic) = 2.9 pH units. Given that the intrinsic pKa defines the equilibrium between neutral and charged Asp residues, with all other parameters fixed, the transition equilibria between all species plotted in appendix D.1 are shifted linearly by ∆pKa. In physical terms, a higher pKa means that deprotonation occurs more favourably at a correspondingly higher pH, and thus the overall transition between folded and unfolded states is shifted by the same amount.

In comparison to the experimental data, it is found that reducing the value of the intrinsic pKa in isolation provides a better approximation to the observed L24D folding curve. The linear dependence between the intrinsic and apparent pKa values indicates that an intrinsic pKa ≈ 2.5 provides the best fit to the apparent pKa = 5.2 observed experimentally.

4.2 Folding free energy ∆Gf dependence

The dependence on the folding free energy of the hexamer is analysed by normalising the species energy levels to a varying background ∆Gf as per (8), prior to calculating species concentrations over the pH range. A set of unfolding curves with ∆Gf = {−50, −45, −40, −35} kcal mol⁻¹ is shown in figure 15 below, with the corresponding species concentrations found in appendix D.2.

Figure 15: Unfolding curves for ∆Gf = {−50, −45, −40, −35} kcal mol⁻¹ with fixed parameters pKa = 3.9, εr = 10, T = 20 °C, compared to the experimental L24D folding curve.

A more negative folding free energy indicates a more stable protein fold, such that a greater amount of energy is required to overcome the energy associated with the folded conformation of the peptides in the hexamer. If the folding free energy is less negative (less stable), it is expected that fewer proximal charged sites can be accommodated within the hexamer due to the increased electrostatic potential. The species concentration data provided by the model support this notion, indicating a significant decrease in the concentration of multiply charged hexamers above the pKa as ∆Gf is made less negative. Below the intrinsic pKa, where the majority of aspartate residues are protonated (neutral), the total concentration is dominated by the neutral dddddd hexamer.
As the folding energy is made increasingly less negative however, the proportion of peptides in the unfolded [d] state increases, reflecting the decreased stability of the folded hexamer. At increased folding energies above ∆Gf ≈ −35 kcal.mol⁻¹ the hexameric protein is insufficiently stable to accommodate more than a single charged residue. In this regime, the apparent pKa of the unfolding curve begins to converge rapidly on the intrinsic value, representing the equilibrium between the two allowable and energetically similar folded species. The additional baseline increase of unfolded [d] peptides below the pKa leads to the ‘lifting and flattening’ of the unfolding curve observed in figure 15 as ∆Gf is increased.

4.3 Dielectric constant εr dependence

The role of the dielectric constant is analysed in a similar way, fixing the remaining parameters as before and using values of εr in a prescribed range up to the approximate dielectric constant of water, εr ≈ 80. The computed unfolding curves for εr = {1, 10, 20, 30, 60, 80} are shown in figure 16 below, with the associated species concentration plots in appendix D.3.

Figure 16: Unfolding curves for εr = {1, 10, 20, 30, 60, 80} with fixed parameters pKa = 3.9, ∆Gf = −46 kcal.mol⁻¹, T = 20°, compared to the experimental L24D folding curve.

As the dielectric εr is increased above the nominal vacuum value εr = 1, the difference between successive energy levels Ui is reduced. In physical terms, a dielectric medium acts to shield charge interactions, thus reducing the electrostatic potential incurred by proximal charged residues in the hexamer. Consequently, as the value of εr is increased, the slope of the unfolding curves produced by the model becomes increasingly flattened due to relatively higher concentrations of more negatively charged folded species at a given pH.

Given the basis parameterisation used for this analysis, successive increases in εr modify the energetically favoured distribution of charged residues in such a way as to produce significant concentrations of multiply charged hexamer species. With little or no dielectric screening (εr < 10) the concentrations of hexamer species with more than 2 charged Asp residues are vanishingly dilute. Above εr ≈ 30 however, the model predicts experimentally significant concentrations of hexamer species with 4 charged residues.
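The screening argument above can be illustrated with a minimal sketch. It assumes uniform screening, in which each electrostatic energy level simply scales as 1/εr, and it assumes that the rigid fragment energies listed in Table 2 (appendix B) correspond to εr = 1. The function name `screened_energy` is hypothetical and is not part of the original model code.

```python
# Rigid fragment energies relative to the neutral hexamer (kcal/mol),
# copied from Table 2 (appendix B); assumed here to be vacuum values.
U_VACUUM = {"dddddd": 0.00, "DDdddd": 68.18, "DDDDDD": 839.36}


def screened_energy(u_vacuum, eps_r):
    """Assumed uniform screening: Coulomb energy scales as 1 / eps_r."""
    if eps_r <= 0:
        raise ValueError("eps_r must be positive")
    return u_vacuum / eps_r


# Energy gap between the fully charged and neutral hexamers for the
# dielectric values used in figure 16.
for eps_r in (1, 10, 20, 30, 60, 80):
    gap = screened_energy(U_VACUUM["DDDDDD"], eps_r) - screened_energy(
        U_VACUUM["dddddd"], eps_r
    )
    print(f"eps_r = {eps_r:2d}: U(D6) - U(d6) = {gap:7.2f} kcal/mol")
```

Under these assumptions the gap between the neutral and fully charged hexamers falls to roughly 10.5 kcal.mol⁻¹ at εr = 80, which is small compared with the magnitude of the folding free energy.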
For this reason, as the pH increases, charged residues are favourably accommodated in hexameric species, delaying the onset of unfolded [D] production and resulting in the more shallow unfolding curves observed in figure D.3. At values close to εr(water) ≈ 80, the energy landscape has been smoothed considerably, with an energy difference between neutral and fully charged hexameric species of U(D6) − U(d6) ∼ 10.5 kcal.mol⁻¹ (c.f. ∆Gf = −46 kcal.mol⁻¹). At this ‘extreme’ the binding of 6 charged residues in a folded hexamer represents a more stable native state than unfolded [D] peptides, indicated by the red curve (figure 16) which remains more than 90% folded across the entire pH range.

4.4 Parameter space search

Having considered each of the three key model parameters in isolation, a set of unique and physically consistent effects on the unfolding behaviour has been established. It is also clear that model predictions assuming experimental parameter values do not reproduce the experimentally observed unfolding data (see figure 13). The previous analysis can therefore be extended by considering how different parameterisations of the model vary the quality of fit with respect to the experimental data. In the following analysis, the three model parameters pKa, ∆Gf and εr are systematically varied across an appropriate range and resolution, with the unfolding curve computed for each permutation. Specifically, the root-mean-squared-deviation (rmsd) value for each curve is calculated with respect to the experimental data points, providing a numerical comparison indicating the quality of fit for each parameter combination.

For each parameter combination β = (pKa, ∆Gf, εr), the unfolded ratio is computed numerically at each experimental pH value. The rmsd values for each parameterisation β are calculated according to

$\mathrm{rmsd}_{\beta} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{\mathrm{exp}} - y_{\beta}\right)_i^2}$,

where $y_{\mathrm{exp}}$ and $y_{\beta}$ are respectively the experimental and numerically predicted unfolded fractions. The resulting matrix of parameters and associated rmsd values is visualised by plotting points with (3D) coordinates given by the vector β and with a size according to the reciprocal of the rmsd. Normalised rmsd values are then used to colour each point across a spectrum from blue to red with increasing rmsd.
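As a minimal sketch of the rmsd calculation described above, the following function compares two equal-length lists of unfolded fractions. It is an illustration only: the function name and the example values are hypothetical and are not the experimental L24D data.

```python
import math


def rmsd(y_exp, y_model):
    """Root-mean-squared deviation between two equal-length sequences."""
    if len(y_exp) != len(y_model) or len(y_exp) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_exp)
    squared_errors = ((e - m) ** 2 for e, m in zip(y_exp, y_model))
    return math.sqrt(sum(squared_errors) / n)


# Hypothetical unfolded fractions at six pH values (illustration only).
example_exp = [0.00, 0.05, 0.30, 0.70, 0.95, 1.00]
example_model = [0.02, 0.10, 0.35, 0.65, 0.90, 0.98]
print(f"rmsd = {rmsd(example_exp, example_model):.4f}")
```

A lower value indicates a closer fit, so the reciprocal of the rmsd can be used directly as the point size in a plot such as figure 17.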
The data in figure 17 highlight the results of a systematic search through 35,280 parameter combinations, given the following value ranges for each of the three key variables:

Intrinsic Asp pKa: pKa = [2.0, 2.1, 2.2, ..., 4.0]
Free folding energy: ∆Gf = [−50, −49, −48, ..., −30]
Dielectric constant: εr = [1, 2, 3, ..., 80]

(taking ∼ 1 minute of processing time on a typical desktop PC)

The resulting ‘4D’ plot in figure 17 indicates a very clear linear interdependence between values of the intrinsic pKa and ∆Gf, with a proportionality moderated by the value of εr. The relationship between the pKa and ∆Gf corresponds with the results from the earlier analysis, in that the apparent pKa shifts linearly in response to both parameters but in opposition. The resulting interdependence is that as one variable increases, the value of the other must decrease by a proportional amount to produce a similar quality of fit to the experimental data.

The low rmsd region of parameter space is explored further by considering parameterisations which produce rmsd values less than an arbitrary threshold. The interpolated contours in figure 18 show the strong coupling between pKa and ∆Gf for the 565 parameterisations with rmsd < 0.05. Representing the search space data in this way shows the best fit region of pKa and ∆Gf parameter space, contoured in steps of εr = 8. At values of εr < 30 there are low rmsd parameterisations with values of pKa and ∆Gf linearly distributed across the entire search range. As the value of εr increases above this however, the low rmsd region shrinks significantly, identifying a more restricted range of both pKa and ∆Gf. For example, in the set of parameters considered, with a dielectric εr > 48 there are only 20 parameterisations which produce an rmsd < 0.05.
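The search grid described above can be sketched as follows. The three value ranges are taken from the text; everything else is an assumption made for illustration. In particular, `toy_score` is a hypothetical placeholder standing in for the rmsd of the real unfolding model, which is not reproduced here, and its minimum was chosen arbitrarily.

```python
import itertools

# Value ranges quoted in the text for the three model parameters.
PKA_VALUES = [round(2.0 + 0.1 * i, 1) for i in range(21)]  # 2.0, 2.1, ..., 4.0
DGF_VALUES = list(range(-50, -29))                         # -50, -49, ..., -30
EPS_VALUES = list(range(1, 81))                            # 1, 2, ..., 80


def parameter_grid():
    """Iterate over every (pKa, dGf, eps_r) combination."""
    return itertools.product(PKA_VALUES, DGF_VALUES, EPS_VALUES)


def grid_search(score):
    """Return the parameter tuple minimising score, with its score."""
    best = min(parameter_grid(), key=score)
    return best, score(best)


def toy_score(beta):
    """Hypothetical stand-in for the model rmsd (arbitrary minimum)."""
    pka, dgf, eps_r = beta
    return (pka - 3.0) ** 2 + ((dgf + 40) / 10) ** 2 + ((eps_r - 20) / 40) ** 2


n_combinations = sum(1 for _ in parameter_grid())
best_beta, best_score = grid_search(toy_score)
print(n_combinations, best_beta, best_score)
```

The grid contains 21 × 21 × 80 combinations, which matches the 35,280 parameterisations quoted above. An exhaustive scan of this size is cheap, so no optimiser is needed at this resolution.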
In this regime it is found that the folding curve, and hence the rmsd, is much more sensitive to small changes in the folding energy, with a much smaller range of ∆Gf and corresponding pKa values producing good approximations to experimental data. A large dielectric results in an energy landscape flattened across all species towards ∆Gf, the value of which therefore becomes increasingly influential in governing global species concentrations and thus the unfolding behaviour.

Figure 17: 3-parameter space search: point coordinates indicate parameter combination (pKa, ∆Gf, εr) with size proportional to the inverse of the rmsd calculated with respect to experimental L24D unfolding data points. Points are coloured from red to blue by relative rmsd value.

Figure 18: 2D (left) and 3D (right) plots for parameterisations with rmsd < 0.05 with respect to L24D folding data. Parameter space is grid interpolated in terms of pKa and ∆Gf and contoured with respect to εr. The strongly coupled interdependence of pKa and ∆Gf is evident, with a range progressively restricted by increasing εr.
4.4.1 Best fit parameters - rmsd minimisation

By selecting the parameter values which minimise the rmsd, we can plot the ‘best fit’ unfolding curve with a comparison to the normalised L24D helicity data. The systematic parameter search yields best fit parameters β = (pKa = 2.8, ∆Gf = −41, εr = 37), resulting in the (blue) unfolding curve shown in figure 19. The predicted unfolding behaviour provides a qualitatively good match to the normalised experimental L24D data (black squares), with an apparent pKa of both curves ∼ 5.2. More detailed inspection indicates that, as a result of a balanced (unweighted) fit to the experimental data points, the unfolding behaviour either side of the apparent pKa is not captured particularly well. With parameters β, the gradient of the unfolding transition is observed to be somewhat shallower than the experimental curve, converging to an unfolded fraction of ∼ 2% at low pH, compared to experimental data which indicate a fully folded (∼ 0%) ensemble.

Figure 19: rmsd minimised ‘best-fit’ unfolding curves compared to experimental L24D data (black squares). Three curves indicate weighted rmsd minimisation: (blue) - equal weighting, (red) - 5x weight on first low pH data point, (green) - weighting prioritising three high pH data points.

In order to assess how parameter variation can produce a more accurate fit to the low pH region (pH < 5.2), the rmsd minimisation procedure was repeated using a scheme to give extra weight to the first experimental data value (pH = 3.4), where

$\mathrm{rmsd}_{\beta_L} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} w_i \left(y_{\mathrm{exp}} - y_{\beta_L}\right)_i^2}$

with weighting vector w = [.5, .1, .1, .1, .1, .1]. By prioritising the initial ‘baseline’ value in this way, a new parameter set βL = (pKa = 2.0, ∆Gf = −49, εr = 9) was found using the same parameter search ranges as before. The corresponding (red) unfolding curve is shown in figure 19.
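The weighting scheme can be sketched in a few lines. The weighting vector is the one quoted above; the placement of the weights inside the sum and the function name are assumptions, and the residuals are hypothetical values chosen only to show how the weights change the penalty.

```python
import math


def weighted_rmsd(y_exp, y_model, weights):
    """Weighted rmsd, assuming each weight multiplies its squared error."""
    if not (len(y_exp) == len(y_model) == len(weights)) or len(y_exp) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_exp)
    total = sum(w * (e - m) ** 2 for w, e, m in zip(weights, y_exp, y_model))
    return math.sqrt(total / n)


W_LOW_PH = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]  # weighting vector from the text

# Hypothetical curves: the same 0.1 error placed at the first or last point.
reference = [0.0] * 6
error_at_low_ph = [0.1, 0.0, 0.0, 0.0, 0.0, 0.0]
error_at_high_ph = [0.0, 0.0, 0.0, 0.0, 0.0, 0.1]

low = weighted_rmsd(reference, error_at_low_ph, W_LOW_PH)
high = weighted_rmsd(reference, error_at_high_ph, W_LOW_PH)
print(f"error at first point: {low:.4f}, error at last point: {high:.4f}")
```

With this weighting an error at the first (lowest pH) data point is penalised more heavily than the same error elsewhere, which is what steers the search towards parameter sets that reproduce the low pH baseline.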
The curve produced with parameter set βL provides a much closer fit to the experimental data, with a smaller convergent unfolded fraction at low pH (< 0.2%) and a congruent transition slope up to ∼ pH 5.5. Above this pH however, the predicted and experimental curves diverge, with a steeper transition gradient predicted than observed experimentally. Similarly, the higher pH region can be given priority using a different weighting vector, e.g. w = [0, 0, 0, .25, .5, .25], leading to best fit parameters βH = (pKa = 3.4, ∆Gf = −36, εr = 67) and the (green) unfolding curve.

In terms of the parameter sets β, βL and βH, it is found that the combination of a more negative (stable) folding energy, lower pKa and a reduced dielectric constant produces unfolding behaviour which better matches experimental folding behaviour below the apparent pKa ≈ 5.2. Above the apparent pKa, the transition gradient is matched more closely with an increased (less stable) folding energy and a significantly higher dielectric constant. Unfortunately, the freedom to choose arbitrary parameter combinations to produce curves which are compared to only 6 data points prevents us from making direct predictions for any individual physical parameter. However, it is encouraging that very close curve fits can be obtained where all three parameters are found within a physically realistic range.

4.5 Molecular dynamics

To supplement the static (rigid backbone) energy models, a set of 13 molecular dynamics (MD) simulations were performed with the aim of providing a more accurate description of the folded hexamer energy landscape. By including explicit solvent molecules (water) in the dynamical simulation, we can also implicitly include the effects of dielectric screening and thus reduce the number of free parameters of the model. The molecular dynamics simulation of the full L24D hexamer, including around 3000 protein atoms and approximately 92,000 solvent atoms, is computationally very demanding, with a limited simulation time of 20ns for this investigation. The results presented here therefore provide only a partial and qualitative review of the simulated structures. Given the relatively short simulation time-frame, it is not clear whether any of the test structures reached the energetic equilibrium required for comparison with the rigid model energy landscapes and for incorporation into the statistical unfolding model.

Starting from the L24D crystal structure data, 13 hexameric coiled-coil species were prepared in silico, according to the unique aspartate charge sequences.
Each structure was then individually configured using the energy minimised Asp side-chain rotamer conformations (χ2 angles) provided in appendix B.

Molecular dynamics simulations were carried out using the Gromacs (MPICH) package [24] in conjunction with the AMBER forcefield and TIP3P water model. Using the Gromacs pre-processing tools, each protein structure was combined with water solvent molecules and subjected to a preliminary energy minimisation with a short time-scale position-restraint MD simulation. Full (unrestrained) MD simulation was then performed on the University of Bristol’s high-performance Blue Crystal cluster, utilising 32 (dual processor) parallel compute nodes. During each of the 20ns MD simulations, energy components were recorded every 10ps. The data in table 1 indicate the electrostatic (Coulomb) energy of each structure, referenced to the electrostatic energy of the neutral (d6) hexamer, averaged over different simulation time slices (0-1ns, 10-20ns & 0-20ns).

There are a number of key observations, especially in comparison to the rigid backbone model predictions. We note that although most of the MD energy levels are within the same order of magnitude as the predicted values, the species ordering of the energy landscape is not conserved. More specifically, the energies of species with adjacent charges (DDd..) are not strictly greater than those with alternating charged sites (DdD..), as predicted by the rigid models. Furthermore, the energy of the fully charged D6 hexamer is intermediate, and is less than that of all Q = −4 species, contrary to previous predictions. Observation of the corresponding atomic (MD) trajectory data suggests that the reason for this is almost certainly the flexibility of the alpha-helix backbone.
In all species containing two or more adjacent charged side-chains, the peptide backbone of one or more of the alpha-helices was found to unwind such that the charged Asp residue points outwards, away from the interior channel. The additional potential energy due to proximal charges is therefore found to be sufficient in some cases to provoke a significant conformational change such that a lower energy minimum is reached.

Species    Uc − Uc(dddddd)                  Uv (MM frag.)
           0-1 ns    10-20 ns    0-20 ns
dddddd       0.00      0.00        0.00        0.00
Dddddd      28.44     17.45       20.79        0.00
DddDdd      45.89     21.03       28.20       36.35
DdDddd      25.10      5.98       16.49       41.97
DDdddd      22.47     18.88       24.86       72.69
DdDdDd      93.45     72.90       72.66      125.91
DddDDd      88.67     13.38       38.96      151.01
DDDddd      48.04     59.75       64.05      187.35
DDdDDd     103.01    127.15      115.44      302.01
DDDdDd     119.98    135.28      131.21      307.64
DDDDdd     117.83    129.54      125.00      338.36
DDDDDd      79.59    138.62      128.82      531.34
DDDDDD      98.23     86.52       97.51      797.01
(all energies in kcal.mol⁻¹)

Table 1: MD simulation results: relative electrostatic (Coulomb) energies of the 13 L24D hexamer species, compared with rigid model vacuum energy approximations Uv computed using the rigid backbone fragment model, with εr = 1.

Indeed, from observations of the trajectories of all charged species over the 20ns simulation time-frame, it was found that the C-terminus of the hexamer suffers both extensive fraying (unwinding of the coiled-coil) and deformation of the central channel. Across the species, the structural deformations include flattening, or pinching, of the central channel; expansion of the channel; and partial or full twisting of peptide backbones to separate adjacent charged sites near the 24th residue position. In contrast, the neutral d6 hexamer maintains a stable, regular hexameric coiled-coil structure across the full length of the peptides. Although the energy trajectories have not yet been fully analysed, the atomic position trajectories suggest that after 20ns, the simulated hexamers are still far from equilibrium and their native state. In general however, it is found that the electrostatic energy contribution for each species, averaged over the full simulation period, is still significantly less than rigid backbone models predict, especially for Q < −3. Given the observed structural distortions, a large contribution to the energy difference is due to the flexibility of the peptide backbones allowing an increased separation between like charges.
Additional charge shielding is also expected due to the explicit inclusion of dielectric water molecules. However, an unexpected deviation from this pattern is found in the energy of the singly charged hexamer (Dddddd), where full protein MD simulation indicates a significant electrostatic energy increase (> 20 kcal.mol⁻¹) compared to the neutral d6 hexamer. The MD data therefore imply that a single Asp residue increases the overall electrostatic energy by interacting in a way which is not fully captured by either of the simplified models.

Evidence indicating that a singly charged hexamer may provide an additional contribution to the electrostatic energy is particularly significant when we consider the distribution of folded hexamer species concentrations as a function of pH. The energy landscapes predicted by previous models implied the existence of two folded hexamer species (dddddd & Dddddd) which are energetically equivalent (or very close), providing a folded state for charged Asp residues to occupy as the pH increases with little or no energy penalty. A Dddddd hexamer associated with a non-zero energy increase would therefore result in lower concentrations of that species and a shift in the apparent pKa to lower pH. Although the energy levels obtained from the MD study are not guaranteed to be fully equilibrated values, a preliminary analysis can be made using the statistical model developed in this research.
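The ordering observations made from table 1 can be checked directly against the tabulated values. The sketch below copies the 0-20ns MD averages and the rigid fragment energies Uv from table 1; the helper names are hypothetical, and the check says nothing about whether the MD values are equilibrated.

```python
# Relative electrostatic energies (kcal/mol) copied from table 1.
MD_0_20NS = {
    "dddddd": 0.00, "Dddddd": 20.79, "DddDdd": 28.20, "DdDddd": 16.49,
    "DDdddd": 24.86, "DdDdDd": 72.66, "DddDDd": 38.96, "DDDddd": 64.05,
    "DDdDDd": 115.44, "DDDdDd": 131.21, "DDDDdd": 125.00,
    "DDDDDd": 128.82, "DDDDDD": 97.51,
}
RIGID_VACUUM = {
    "dddddd": 0.00, "Dddddd": 0.00, "DddDdd": 36.35, "DdDddd": 41.97,
    "DDdddd": 72.69, "DdDdDd": 125.91, "DddDDd": 151.01, "DDDddd": 187.35,
    "DDdDDd": 302.01, "DDDdDd": 307.64, "DDDDdd": 338.36,
    "DDDDDd": 531.34, "DDDDDD": 797.01,
}


def charge(species):
    """Total charge Q: each charged Asp ('D') contributes -1."""
    return -species.count("D")


def ranking(energies):
    """Species names ordered from lowest to highest energy."""
    return sorted(energies, key=energies.get)


q4_species = [s for s in MD_0_20NS if charge(s) == -4]
d6_below_all_q4 = all(MD_0_20NS["DDDDDD"] < MD_0_20NS[s] for s in q4_species)
ordering_conserved = ranking(MD_0_20NS) == ranking(RIGID_VACUUM)

print("Q = -4 species:", q4_species)
print("D6 below all Q = -4 species:", d6_below_all_q4)
print("Species ordering conserved:", ordering_conserved)
```

For the 0-20ns averages the fully charged hexamer does lie below every Q = −4 species, and the rank order of the MD energies differs from that of the rigid fragment model.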
Figure 20: Comparison between unfolding curves generated with different energy landscapes, using basis parameters pKa = 3.9, ∆Gf = −46: (red) molecular dynamics energy levels (0-20ns average), (green) modified MD energy levels with U(Dddddd) set to zero, (blue) rigid fragment model energy levels (εr = 10).

The data in figure 20 show the predicted unfolding behaviour using three different energy landscapes and the basis (experimental) parameters pKa(Asp) = 3.9 and ∆Gf = −46. As expected, the (red) unfolding curve based on UMD, the molecular dynamics energy levels, has an apparent pKa much closer to the experimental curve than when using the rigid (fragment) model energy landscape. If we modify the UMD energy levels such that U(Dddddd) has zero energy, we obtain the (green) curve, which shows only a very small deviation from the (blue) rigid fragment model curve. From this comparison, it is found that the energy difference between the neutral d6 state and the singly charged Dddddd species plays a dominant role in the unfolding behaviour of the hexamer.

Using species energy levels (UMD) obtained from the MD simulations (0-20ns average), a best-fit unfolding curve was found with parameters [pKa = 3.0, ∆Gf = −47]. The dielectric parameter, now implicitly included within UMD, was fixed throughout the search with εr = 1. The resulting parameter set produces a very close fit to the experimental unfolding curve, with a folding energy also very close to the experimental value ∆Gf(L24D) ≈ −46 kcal.mol⁻¹ [19]. Further analysis of the concentration decomposition, shown in figure 21, indicates that the predicted unfolding transition is directly associated with a transition from neutral d6 hexamers to unfolded, charged D peptides as the pH is increased. Importantly, this suggests that concentrations of all charged hexamer species are vanishingly small.
This result contrasts with predictions made using the rigid (fragment) model energy landscape (see appendix E), where significant (measurable) concentrations of charged folded species are predicted. A practical experiment to measure concentrations of stable, folded hexamers with charged residues, if they exist, would therefore be highly beneficial in discriminating between these differing predictions.

Figure 21: (top) best-fit unfolding curve using species energy levels derived from MD simulation (UMD), (bottom) corresponding hexamer species concentration as a function of pH, indicating only the d6 neutral hexamer in significant concentrations.
5 Summary and Conclusions

The main objective of this research has been to construct a variety of models which describe the electrostatic energy and corresponding folding behaviour of the coiled-coil L24D protein as a function of pH. The assumptions of the model posit that the primary feature of the protein structure which responds to a change in pH is the ring of polar aspartic acid side-chain groups at the L24 position of each peptide. As the Asp residues become increasingly deprotonated at higher pH, the side-chains become negatively charged, with the associated electrostatic forces leading to unfolding of the hexamers.

In order to obtain the energy landscape of the 13 uniquely charged hexamer species, two ‘rigid’ electrostatic models have been considered, where the relative positions of the peptide bonds are fixed. The simplest approximation is provided by a geometric Coulomb model which calculates the electrostatic energy due to charge interactions on the vertices of a regular hexagon, parameterised by a single value R, the distance between adjacent Asp sites. A second model, using partial atomic structures gained from XRD experiments, was developed to find energy minimised side-chain conformation angles given their local context, with electrostatic energies calculated using conventional molecular force-field methods. Using an inter-vertex distance R estimated from the XRD data, it was found that the simple analytical model and the more complex molecular mechanics model provide an almost identical energy landscape. The limited side-chain flexibility provided in the second model therefore provides only a very minor refinement to the simple analytical model.
Having developed a statistical method to compute the concentrations of each hexamer species and unfolded peptides from the energy landscape, it has been possible to generate unfolding curves which can be compared to experimental (CD) helicity data for the L24D protein as a function of pH. Using experimental values for the intrinsic (Asp) pKa = 3.9 and folding free energy ∆Gf = −46 kcal.mol⁻¹, it was found that the apparent pKa of the predicted unfolding curve, where approximately 50% of the concentration is unfolded, is displaced by +1.6 pH units from the experimental curve. Subsequent parameter space searches, with direct comparison to helicity data using rmsd minimisation techniques, indicated that model predictions using a comparable folding energy (∆Gf ≈ −49) and minimal dielectric screening (εr ≈ 9) produced unfolding curves which are most consistent with experimental data, although with an intrinsic pKa ≈ 2.0.

The disagreement between this ‘predicted’ pKa value and the known aspartate value (3.9) could be due to a number of factors which have not been accounted for in the model. For example, in the derivation of the concentration model, the same value of the intrinsic pKa is used in two different biological contexts. The most obvious contextual difference is between the pKa value which denotes the acid dissociation constant for aspartate residues in unfolded peptides, i.e. [d] to [D] transitions, and the value related to charge transitions within the folded hexamer channel, for example [Dddddd] to [DDdddd]. As currently implemented, the concentration model does not discriminate between these two values, which may indeed differ due to the local environment of the Asp residue. A future modification to the current model could therefore be to define the pKa for both contexts separately, incorporating a fixed intrinsic (unfolded) value and a folded ‘channel context’ value left as a free parameter.
Particularly useful insight has been gained by comparing the energy landscapes predicted via rigid approximations to those calculated by molecular dynamics simulations of each hexamer species. After 20ns of relaxation time in explicit water, MD trajectory data suggest that the majority of charged hexamers suffer significant structural distortions, C-terminus fraying and possible instability as hexameric coiled-coils at longer time-scales. Overall, the data obtained with MD simulation indicate a much flatter energy landscape than predicted via rigid models, consistent with the observed structural distortions in response to proximal charges. Although the MD simulations performed almost certainly do not provide a fully equilibrated energy landscape, there is good evidence of a significant energy difference between neutral d6 hexamers and singly charged Dddddd species (∼ 20 kcal.mol⁻¹), contrary to rigid model predictions. The effect of this energy gap is to shift the apparent pKa closer to the experimental value, with a concentration model best-fit search yielding an intrinsic pKa = 3.0 and a folding free energy ∆Gf = −47, very close to the experimental value (−46 kcal.mol⁻¹). Although there is still an obvious discrepancy between the intrinsic Asp value and the best-fit pKa, it is clear that the electrostatic potential difference between d6 and Dddddd plays a dominant role in the observed unfolding behaviour and should be clarified by further experiments in silico.

Concentration predictions using the currently available MD energy landscape indicate that concentrations of all charged hexamers are vanishingly small across the entire pH range, with unfolding behaviour characterised by a smooth transition between neutral d6 hexamers and unfolded D (charged) peptides. In this respect, it would be interesting to design a practical experiment which could detect, or even discriminate between, concentrations of charged hexamer species in solution. If significant concentrations of these species can be found, it would clearly cast doubt on the latter prediction.

The models developed in this research provide a clear indication that a statistical approach can indeed be used to make qualitative predictions of the folding dynamics of a complex, oligomerised protein. The rapid calculation of ensemble microstate concentrations is made possible by assuming an approximate energy landscape.
The comparison between estimated energy landscapes and those provided by more complex molecular simulation suggests however that more accurate energy models may be required to predict unfolding behaviour comparable to experimental observations.
A Geometric coulomb model of electrostatics

Figure 22: Schematic diagrams of folded hexamer species and associated energy levels (kcal.mol⁻¹) calculated by the 2D geometric electrostatic model, with radius R = 4.9 Å and dielectric εr = 1. Red vertices indicate neutral AspØ residue sites, blue vertices indicate negatively charged Asp sites.
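A minimal sketch of a geometric Coulomb model of this kind is given below. It assumes unit point charges on the vertices of a regular hexagon whose adjacent-site distance (equal to the circumradius) is R = 4.9 Å, uses the standard conversion constant of about 332.06 kcal.Å.mol⁻¹.e⁻², and treats species related by rotation or reflection of the ring as identical. The function names are hypothetical, and the numbers it produces are not guaranteed to reproduce the values shown in figure 22.

```python
import itertools
import math

COULOMB_K = 332.06  # kcal.Å/(mol.e^2), standard electrostatic conversion
R = 4.9             # Å, adjacent-site distance (equals hexagon circumradius)


def canonical(state):
    """Smallest representative under rotations and reflections of the ring."""
    n = len(state)
    variants = []
    for s in (state, state[::-1]):
        for k in range(n):
            variants.append(s[k:] + s[:k])
    return min(variants)


def unique_species(n=6):
    """All distinct charge patterns (1 = charged, 0 = neutral) on the ring."""
    states = itertools.product((0, 1), repeat=n)
    return sorted({canonical(s) for s in states})


def coulomb_energy(state, radius=R, eps_r=1.0):
    """Pairwise Coulomb energy (kcal/mol) of unit charges on polygon vertices."""
    n = len(state)
    points = [
        (radius * math.cos(2 * math.pi * i / n),
         radius * math.sin(2 * math.pi * i / n))
        for i in range(n)
    ]
    energy = 0.0
    for i, j in itertools.combinations(range(n), 2):
        if state[i] and state[j]:
            energy += COULOMB_K / (eps_r * math.dist(points[i], points[j]))
    return energy


for species in unique_species():
    label = "".join("D" if c else "d" for c in species)
    print(f"{label}: {coulomb_energy(species):8.2f} kcal/mol")
```

Enumerating the ring patterns up to symmetry gives 13 species, in agreement with the number of uniquely charged hexamers considered in the text. Passing a larger `eps_r` rescales every level by the same factor.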
B Normalised energy and torsion angles for hexamer species fragments

Species    U − Udddddd      χ2(A)   χ2(B)   χ2(C)   χ2(D)   χ2(E)   χ2(F)
           (kcal.mol⁻¹)     dihedral angle (±2°)
dddddd        0.00           -32     146     -32     146     -32     146
Dddddd       -0.57           -26     148     152     150     -34     -26
DddDdd       35.23           -28     150     -34     -28     150     -34
DdDddd       41.52           -26     -40     -30     150     -38     -30
DDdddd       68.18           -28     -38     152     154     -34     -30
DdDdDd      127.43           -30     -42     -30     -42     -32     -42
DddDDd      151.09           -32     154     -34     -32     -40     -44
DDDddd      182.86           -28     -42     -40     154     -36     -32
DDdDDd      299.41           -34     -38     -46     -36     -34     -46
DDDdDd      305.42           -32     -40     -40     -34     -36     -44
DDDDdd      332.40           -30     -40     -42     -40     158     -38
DDDDDd      525.83           -34     -40     -42     -42     -40     -48
DDDDDD      839.36           -40     -42     -42     -42     -42     -42

Table 2: Minimised energy levels and χ2 torsion angles for the 13 charge species hexamer fragments. Minimised energy conformations are calculated to ±2°. Energy values are due to non-bonded interactions, referenced to the neutral species background.
C Log-concentration plot for L24D species

Figure 23: Log-concentration plot, colour grouped by total charge Q, for unfolded and folded L24D species. Model parameters pKa = 3.9, ∆Gf = −46 kcal.mol⁻¹, εr = 10, T = 20°.

D L24D species concentrations for varying parameter values

D.1 Intrinsic aspartate pKa

Figure 24: Predicted L24D species concentration as a function of pH for varying values of intrinsic pKa = {2, 3, 4, 5} (one panel per value). ∆Gf = −46 kcal.mol⁻¹, εr = 10, T = 20°.

D.2 Folding free energy ∆Gf

Figure 25: Predicted L24D species concentration as a function of pH for varying values of the hexamer folding free energy ∆Gf = {−50, −45, −40, −35} kcal.mol⁻¹ (one panel per value). pKa = 3.9, εr = 10, T = 20°.

D.3 Dielectric constant (relative permittivity) εr

Figure 26: Predicted L24D species concentration as a function of pH for varying values of the dielectric constant εr = {20, 30, 60, 80} (one panel per value). pKa = 3.9, ∆Gf = −46 kcal.mol⁻¹, T = 20°.
E Best-fit concentration data (rigid model energy landscape)

Balanced weighting: w = [1, 1, 1, 1, 1, 1] - pKa = 2.8, ∆Gf = −41, εr = 37
Low pH weighting: w = [.5, .1, .1, .1, .1, .1] - pKa = 2.0, ∆Gf = −49, εr = 9
High pH weighting: w = [0, 0, 0, .25, .5, .25] - pKa = 3.4, ∆Gf = −36, εr = 67

Figure 27: Concentration decomposition for the parameter search rmsd minimisation ‘best-fit’ unfolding curves described in figure 19. Each parameter set was found by applying a different weighting vector w in the calculation of the rmsd to prioritise the fit to different parts of the unfolding curve (see section 4.4).
References

[1] Xi Liu, K Fan, and W Wang. The number of protein folds and their distribution over families in nature. Proteins, 54(3):491–9, February 2004.
[2] J T MacDonald et al. De novo backbone scaffolds for protein design. Proteins, 78(5):1311–25, April 2010.
[3] B Kuhlman et al. Design of a novel globular protein fold with atomic-level accuracy. Science (New York, N.Y.), 302(5649):1364–8, November 2003.
[4] S F Altschul et al. Basic local alignment search tool. J. Mol. Biol., 215:403–410, 1990.
[5] J Gough, K Karplus, R Hughey, and C Chothia. Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. Journal of molecular biology, 313(4):903–19, November 2001.
[6] O Rackham et al. The evolution and structure prediction of coiled coils across all genomes. Journal of molecular biology, 403(3):480–93, October 2010.
[7] T Narumi et al. A 55 TFLOPS Simulation of Amyloid-forming Peptides from Yeast Prion Sup35 with the Special-purpose Computer System MDGRAPE-3. Proceedings of the 2006 ACM/IEEE conference on Supercomputing, pages 1–13, 2006.
[8] D Shaw et al. Millisecond-Scale Molecular Dynamics Simulations on Anton. Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2009.
[9] L Pierce et al. Routine Access to Millisecond Time Scale Events with Accelerated Molecular Dynamics. Journal of chemical theory and computation, 8(9):2997–3002, September 2012.
[10] N R Zaccai et al. A de novo peptide hexamer with a mutable channel (supplementary material). Nature chemical biology, 7(12):935–41, December 2011.
[11] D N Woolfson. The Design of Coiled-Coil Structures and Assemblies. Advances in Protein Chemistry, 70:79–112, 2005.
[12] D N Woolfson. An Introduction to Coiled coils (http://www.lifesci.sussex.ac.uk/research/woolfson/html/).
[13] E K O’Shea et al. X-ray structure of the GCN4 leucine zipper, a two-stranded, parallel coiled coil. Science, 254:539–544, 1991.
[14] P Konig and T J Richmond. The X-ray Structure of the GCN4-bZIP Bound to ATF/CREB Site DNA Shows the Complex Depends on DNA Flexibility. Journal of Molecular Biology, 233(1):139–154, 1993.
[15] S A Dames et al. NMR structure of a parallel homotrimeric coiled coil. Nat. Struct. Biol., 5:687–691, 1998.
[16] L Pauling, R B Corey, and H R Branson. The structure of proteins: Two hydrogen-bonded helical configurations of the polypeptide chain. PNAS, 37:205–211, 1951.
[17] F Crick. The packing of α-helices: simple coiled-coils. Acta Crystallographica, 6(8):689–697, September 1953.
[18] N R Zaccai et al. A de novo peptide hexamer with a mutable channel. Nature chemical biology, 7(12):935–41, December 2011.
[19] H-C Chi. . PhD thesis, University of Bristol, 2012.
[20] J A Ng et al. Estimating the dielectric constant of the channel protein and pore. European biophysics journal : EBJ, 37(2):213–22, February 2008.
[21] J E Nielsen. Analysing the pH-dependent properties of proteins using pKa calculations. Journal of molecular graphics & modelling, 25(5):691–9, January 2007.
[22] H Li, A D Robertson, and J H Jensen. Very fast empirical prediction and rationalization of protein pKa values. Proteins, 61(4):704–21, December 2005.
[23] R L Burden and J D Faires. Numerical Analysis. Thomson, 8 edition, 2005.
[24] B Hess, C Kutzner, D van der Spoel, and E Lindahl. Gromacs 4. J. Chem. Theory Comput., 4:435–447, 2008.