Computational Analysis of Nucleosome Core Particles Utilizing
Interactive Chromatin Modeling
Vernon D. Dutch∗
Louisiana Tech University and
305 Wisteria St, Ruston LA 71272
Dr. Thomas Bishop (Advisor)†
Louisiana Tech University
(Dated: May 7, 2014)
Abstract
The integration of bioinformatics and computational biology provides a powerful approach for
extending our understanding of genomic mechanisms. We are developing and using tools for
Interactive Chromatin Modeling (ICM) that draw on known material properties of DNA, such as
its conformations and fluctuations. ICM also incorporates information on nucleosome structure
and nucleosome positioning data. Specifically, this computational analysis is performed by
feeding ICM-generated files into the Large-scale Atomic/Molecular Massively Parallel Simulator
(LAMMPS), which allows us to model chromatin as coarse-grained molecules. Effective use of tools
like ICM is essential for the rational design of chromatin models and provides a foundation for
structural analysis of proposed molecular mechanisms. The main objective is to show how to
properly account for the spatial constraints imposed by DNA, and how the packing of nucleosomes
offers new insight into established biological mechanisms.
INTRODUCTION
Chromatin is a biomolecular complex composed of DNA and histones that folds DNA
so that it can fit in a cell nucleus.[10] Since a DNA sequence represents the instruction set
for life, chromatin is the entity that houses these instructions. As a material entity,
DNA is literally the common thread in every chromatin fiber. If the path of DNA through
chromatin is known, it can serve as the foundation for reconstructing the 3D structure of
chromatin. A piecewise description of this path was the motivation for the two-angle models
of chromatin.[1] The two angles correspond, respectively, to the total twist of the linker DNA
(a function of the linker length between nucleosomes) and to the angle between the initial
and final linker attachment sites on each nucleosome. Nucleosomes in this model are rigid
entities composed of 147 base pairs of DNA wrapped around a histone octamer. Theoretical and
computational analysis[2] of this simple model reveals chromatin topologies ranging from
ribbon-like structures to multi-start solenoids.[3] These results are consistent with multiple
experimental studies.[4–7] More advanced models or coarse-grained representations are
required to incorporate effects such as thermal fluctuation of linker DNA, sequence-specific
conformations of linker DNA, breathing motions of the nucleosome, non-canonical nucleosome
structures, or irregular spacing of nucleosomes. In all of these cases, DNA remains the
common thread in chromatin. The Interactive Chromatin Modeling Web server (ICM-Web)
demonstrates that models of chromatin containing tens of thousands of base pairs can be
generated in real time given only a DNA sequence.[8] (See Fig. 1.)
ICM WEB SERVER FUNCTIONALITY
The ICM web server is an interactive tool that allows users to rapidly assess nucleosome
stability and fold sequences of DNA into putative chromatin templates. These templates
are based on the sequence of nucleobases, the nitrogen-containing biological compounds
(nitrogenous bases) found within nucleotides. The primary nucleobases of DNA are adenine (A),
cytosine (C), guanine (G), and thymine (T); in RNA, thymine is replaced by uracil (U).
Nucleotides are the building blocks of both deoxyribonucleic acid (DNA) and ribonucleic acid
(RNA), and in genetics the nitrogenous bases are simply referred to as bases. Their ability
to form base pairs and to stack upon one another is the essential reason the helical
structures of DNA and RNA are possible. Templates are also defined by known helical
parameters at specific positions, and together these parameters serve as a model for
studying the conformation of DNA in a nucleosome core particle (see Fig. 2). Because the
covalent bonds of the sugar-phosphate backbones are rigid, successive base pairs are
strongly correlated with one another.
FIG. 1: A model of chromatin that can be generated and altered in real time with ICM tools.
DNA is modeled with sequence-specific conformation and fluctuations (yellow). The red spheres
are histone octamers and the smaller cyan spheres are 5-base-pair DNA segments. Chromatin
is modeled using known or predicted nucleosome positions, or folded into condensed chromatin
(left and right sides in turquoise).[8]
ICM Web takes a sequence composed of As, Cs, Gs, and Ts as input and generates
(i) a nucleosome energy level diagram, (ii) coarse-grained representations of free DNA and
chromatin, and (iii) plots of the helical parameters (Tilt, Roll, Twist, Shift, Slide and Rise)
as a function of position.[8] The user can choose among various energy models, nucleosome
structures, and methods for placing nucleosomes. ICM also includes a default energy model
that achieves a correlation coefficient of 0.7 against 100 experimentally determined values
of nucleosome stability. This energy model allows ICM to predict nucleosome locations in the
context of retrovirus development and other processes involving reverse transcription of RNA
into DNA. In addition to providing specific methods for nucleosome placement, ICM also plots
the nucleosome energy level for chromatin samples (see Fig. 3).
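A nucleosome energy level diagram of this kind can be sketched as a sliding-window sum over the sequence. The sketch below assumes a simple diagonal harmonic (elastic) model in which each dinucleotide step pays 0.5*k*(p - p0)^2 for every helical parameter it must move from its free-DNA value p0 to the nucleosome superhelix value p. The parameter tables, function names and stiffness values are hypothetical inputs; ICM's own energy models may differ, for example by coupling the six parameters through full force-constant matrices.

```python
from typing import Dict, List, Sequence

# Six step parameters in the order (tilt, roll, twist, shift, slide, rise).
Params = Sequence[float]


def step_energy(target: Params, free: Params, stiffness: Params) -> float:
    """Diagonal harmonic cost of deforming one step from `free` to `target`."""
    return 0.5 * sum(k * (t - f) ** 2 for k, t, f in zip(stiffness, target, free))


def nucleosome_energy_profile(seq: str,
                              free: Dict[str, Params],
                              stiffness: Dict[str, Params],
                              superhelix: List[Params]) -> List[float]:
    """Energy of wrapping each window of len(superhelix) steps onto the superhelix.

    free and stiffness map a dinucleotide such as "AG" to its free-DNA step
    parameters and force constants; superhelix lists the target step
    parameters along the nucleosome footprint. Entry i of the result is the
    energy of a footprint whose first base is seq[i].
    """
    steps = [seq[i:i + 2] for i in range(len(seq) - 1)]
    width = len(superhelix)
    profile = []
    for start in range(len(steps) - width + 1):
        e = 0.0
        for j, target in enumerate(superhelix):
            dinuc = steps[start + j]
            e += step_energy(target, free[dinuc], stiffness[dinuc])
        profile.append(e)
    return profile
```

With a 147 bp footprint the superhelix list has 146 entries, and the minima of the returned profile mark the most favorable nucleosome positions under this model.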
FIG. 2: The multiple models represent the basic helical and dimer parameters that are used
when modeling LAMMPS output files in VMD. The lower left corner shows the coordinate system
that defines the parameters spatially.[12]
ICM relies on the El Hassan algorithm to model multiple chromatin samples effectively. This
algorithm uses a local, Euler-angle-based scheme to assess the internal kinematics, or
geometry, of a general dinucleotide step in double-helical DNA. The geometry of a
dinucleotide step is completely defined by (1) the base-pair parameters, which describe the
position and orientation of one base relative to the other within a standard base pair, and
(2) the step parameters, which describe the relative position and orientation of the two
base pairs.[13] Using the El Hassan algorithm, ICM generates a DNA path from a set of DNA
helical parameters, and histone cores are docked to this DNA scaffold at the appropriate
locations. Given a DNA sequence, ICM can immediately generate equilibrium conformations of
DNA using sequence-specific conformations and thermal fluctuations obtained from theory or
experiment, according to the user's preference. Chromatin is generated by replacing the
helical parameter values for free DNA with those describing the nucleosome superhelix
wherever a nucleosome is positioned. These positions are obtained from a nucleosome
positioning algorithm, from the prediction algorithms of other groups, or from known
nucleosome positions from a previous model. This approach allows the generation of either
all-atom models or coarse-grained representations.
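The rebuilding step can be sketched as a chain of rigid-body transforms. The Python below is a minimal sketch assuming the rebuilding formulas described for 3DNA[12] within the El Hassan and Calladine scheme[13]: the combined bend Gamma = sqrt(roll^2 + tilt^2) is applied about a hinge at phase angle phi = atan2(tilt, roll), sandwiched between two half-twists, and shift, slide and rise are expressed in the mid-step frame. It is not ICM's source code; the function name and the parameter ordering (chosen to match the order listed in the ICM Web section) are our own, and sign conventions should be checked against 3DNA before relying on it.

```python
import math
from typing import List, Sequence, Tuple

Mat = List[List[float]]


def _rz(a: float) -> Mat:
    """Rotation by angle a (radians) about the z axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def _ry(a: float) -> Mat:
    """Rotation by angle a (radians) about the y axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def _mm(a: Mat, b: Mat) -> Mat:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def _mv(a: Mat, v: Sequence[float]) -> List[float]:
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def rebuild_path(steps: Sequence[Sequence[float]]) -> Tuple[List[List[float]], List[Mat]]:
    """Rebuild base-pair origins and frames from dinucleotide step parameters.

    Each step is (tilt, roll, twist, shift, slide, rise), angles in degrees and
    lengths in angstroms. Frames are 3x3 matrices whose columns are the x, y, z
    axes of each base pair; len(steps) + 1 origins and frames are returned.
    """
    origin = [0.0, 0.0, 0.0]
    frame = _rz(0.0)  # identity: the first base pair defines the global axes
    origins, frames = [origin], [frame]
    for tilt, roll, twist, shift, slide, rise in steps:
        tau, rho, omega = (math.radians(x) for x in (tilt, roll, twist))
        gamma = math.hypot(rho, tau)                      # combined roll-tilt bend
        phi = math.atan2(tau, rho) if gamma > 0 else 0.0  # phase of the bending hinge
        # Orientation of the next base pair relative to the current one.
        rot = _mm(_mm(_rz(omega / 2 - phi), _ry(gamma)), _rz(omega / 2 + phi))
        # Mid-step frame in which shift, slide and rise are measured.
        mid = _mm(_mm(_rz(omega / 2 - phi), _ry(gamma / 2)), _rz(phi))
        disp = _mv(_mm(frame, mid), (shift, slide, rise))
        origin = [o + d for o, d in zip(origin, disp)]
        frame = _mm(frame, rot)
        origins.append(origin)
        frames.append(frame)
    return origins, frames
```

In ICM's terms, chromatin would correspond to overwriting the free-DNA entries of steps with nucleosome superhelix values inside each footprint before calling the rebuild.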
Models generated by ICM may contain steric collisions between atoms. A primary goal
of modifying ICM-Web is to enforce a minimum separation between nucleosomes in order to
produce sterically plausible models for short segments of DNA that are heavily packed with
nucleosomes. Longer strands of DNA, even with few nucleosomes, have a greater chance of
clashes. The web server retains conformational freedom so that force fields can rapidly
produce acceptable structures with the necessary minimization. Using coarse-grained
force-field models helps ensure the stability of the models produced, an approach that has
also been used in other studies. This adaptation of the web server retains compatibility
with traditional all-atom molecular mechanics force fields. Currently we use LAMMPS for the
minimization and sampling of the structures generated by ICM.
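One simple way to enforce a minimum separation between nucleosomes is a greedy pass over a nucleosome energy profile. This is a hypothetical sketch, not ICM's placement method: it assumes lower energy means a more favorable position, visits candidate start positions from best to worst, and accepts a position only if its footprint stays at least min_linker base pairs away from every nucleosome already accepted.

```python
from typing import List, Sequence


def place_nucleosomes(energies: Sequence[float], footprint: int = 147,
                      min_linker: int = 0) -> List[int]:
    """Greedily pick low-energy nucleosome start positions with a minimum spacing.

    energies[i] is the (hypothetical) energy of a nucleosome starting at base i.
    Two chosen nucleosomes must start at least footprint + min_linker bases
    apart, so their footprints never overlap and every linker has at least
    min_linker base pairs.
    """
    spacing = footprint + min_linker
    chosen: List[int] = []
    # Visit candidate positions from most to least favorable (lowest energy first).
    for i in sorted(range(len(energies)), key=lambda k: energies[k]):
        if all(abs(i - j) >= spacing for j in chosen):
            chosen.append(i)
    return sorted(chosen)
```

Raising min_linker trades nucleosome occupancy for steric room, which matters most in short, heavily packed segments.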
UTILIZATION OF LAMMPS SOFTWARE
To improve ICM for research purposes, we use the Large-scale Atomic/Molecular Massively
Parallel Simulator (LAMMPS), which processes particle data efficiently so that the behavior
of the system can be simulated properly. LAMMPS is a classical molecular dynamics simulation
program that models an ensemble of particles in a liquid, solid, or gaseous state.[9] It was
developed at Sandia National Laboratories, a US Department of Energy (DOE) facility, with
funding from the DOE. LAMMPS is open-source code that can be distributed freely under the
terms of the GNU General Public License (GPL). The availability of the code allows users to
create their own DNA modeling samples for ICM without major compatibility issues. Given input
files containing the relevant atomic data, LAMMPS produces output describing the energies and
behavior of the molecule being assessed. These output files are then loaded into Visual
Molecular Dynamics (VMD) to view a three-dimensional model of the chromatin. VMD can also
animate the output, so the behavior of the chromatin can be observed in three-dimensional
space. Once the model is produced, it can be examined further to confirm that energy
minimization has removed any collisions between molecules (see Fig. 4).
FIG. 3: Nucleosome energy plotted as a function of base-pair position. The graph shows the
nucleosome energy for a stretch of free DNA over the indicated range of base pairs.[11]
FIG. 4: Comparison of chromatin models built with ICM tools. The left side shows a molecule
of free DNA; the right side shows chromatin, in which the user can zoom in on a nucleosome.
LAMMPS reads its instructions from an input script, which has four parts: initialization,
atom definition, settings, and simulation commands. Initialization contains parameters that
must be defined before the atomic model is created. It primarily defines the system of units
and the number of processors used for the calculation. It also selects the force-field
styles, such as the bond and angle styles, needed for modeling chromatin. Atoms can be
defined in LAMMPS in two ways: read from a separate file that contains molecular topology
information, or created on a lattice, with no molecular topology, using a dedicated set of
commands. Once the molecular topology and atom definition are finalized, the simulation
settings are specified. These include force-field coefficients and simulation parameters,
as well as any output options the user wants before the data are sampled in LAMMPS.
Finally, the simulation commands start the calculation in LAMMPS.[5]
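The four-part layout described above can be made concrete with a small generator. This Python sketch writes a minimal input script for a coarse-grained model; the file names, the choice of units real, the harmonic bond and angle styles, the soft pair potential and all numeric values are illustrative assumptions rather than the settings used in this work, and the data file is assumed to supply the bond and angle coefficients.

```python
from typing import List


def lammps_input(data_file: str, soft_a: float, soft_cut: float,
                 etol: float = 1.0e-4, ftol: float = 1.0e-6,
                 maxiter: int = 1000, maxeval: int = 10000) -> str:
    """Assemble a minimal LAMMPS input script from its four conventional parts."""
    sections = {
        # Units, processor grid, atom style, boundaries and force-field styles.
        "Initialization": [
            "units real",
            "processors * * *",
            "atom_style molecular",
            "boundary s s s",
            "bond_style harmonic",
            "angle_style harmonic",
        ],
        # Coordinates, topology and bond/angle coefficients come from a data file.
        "Atom definition": [f"read_data {data_file}"],
        # Pair coefficients and output options.
        "Settings": [
            f"pair_style soft {soft_cut}",
            f"pair_coeff * * {soft_a}",
            "thermo 100",
            "dump traj all atom 100 minimize.lammpstrj",
        ],
        # The commands that actually start the calculation.
        "Simulation": [
            "min_style cg",
            f"minimize {etol} {ftol} {maxiter} {maxeval}",
        ],
    }
    lines: List[str] = []
    for title, commands in sections.items():
        lines.append(f"# --- {title} ---")
        lines.extend(commands)
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file and passed to LAMMPS with its -in command-line option; the dump file it requests can then be opened in VMD.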
Although LAMMPS has proven to be an efficient computing engine for ICM Web, it should
be noted that it is not primarily designed for nucleosome and biomolecular simulations.
This creates some difficulties when integrating atomic and coarse-grained approaches.
ICM can potentially generate knotted structures that cannot be minimized. In practice our
research group rarely encounters knotted structures, because it is customary to select
parameters corresponding to sterically allowed regions of a two-angle phase map for
chromatin modeling. A significant task in our research involves working on a minimization
protocol for our coarse-grained force-field model that supports interactive chromatin
folding from arbitrary initial conformations. Such a protocol allows ICM to retain an
initial assembly step that restricts the initial models to known chromatin topologies,
making it easier for users to avoid knots.
LAMMPS MINIMIZATION
DNA conformations must be kept within a suitable energy range so that plausible chromatin
models are created without knotted structures. Understanding how to use the minimization
commands of LAMMPS provides a way to lower the energy of a chromatin model. An energy
minimization iteratively adjusts the atomic coordinates, and the iterations stop when one of
the stopping criteria (an energy tolerance, a force tolerance, or a maximum number of
iterations or force evaluations) is satisfied. At that point the configuration will ideally
be in a local potential energy minimum. More precisely, the final configuration approximates
a critical point of the objective function, which may or may not be a local minimum.
The minimization algorithm is selected with the min_style command, and its settings can be
adjusted with the min_modify command. Minimize commands can be interspersed with run commands
to alternate between relaxation and dynamics. The minimizers bound the distance atoms can
move in one iteration, so systems with highly overlapped atoms (large energies and forces)
can be relaxed by pushing the atoms off of each other.[9] Another way to relax a system is to
run dynamics with a small or limited timestep. Dynamics can also be run with the fix viscous
command, which imposes a damping force that slowly drains all kinetic energy from the system.
The pair_style soft potential can be used to un-overlap atoms while running dynamics.[9]
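The two ideas above, bounding how far atoms move per iteration and stopping on an energy or force tolerance, can be illustrated with a toy minimizer. This Python sketch is not the LAMMPS algorithm: it is plain steepest descent with no line search, applied to a soft pair potential of the form E = A[1 + cos(pi r / r_c)] for r < r_c (the functional form used by LAMMPS pair_style soft). The per-iteration displacement cap plays the role of min_modify dmax, and the tolerances are only loosely modeled on the etol and ftol arguments of minimize; function names and defaults are hypothetical.

```python
import math
from typing import List, Tuple

Vec = List[float]


def soft_energy_forces(pos: List[Vec], a: float, rc: float) -> Tuple[float, List[Vec]]:
    """Total energy and per-atom forces for E(r) = a * (1 + cos(pi * r / rc)), r < rc."""
    energy = 0.0
    forces = [[0.0, 0.0, 0.0] for _ in pos]
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            d = [pos[i][k] - pos[j][k] for k in range(3)]
            r = math.sqrt(sum(x * x for x in d))
            if r >= rc or r == 0.0:
                continue  # outside the cutoff, or coincident (zero force anyway)
            energy += a * (1.0 + math.cos(math.pi * r / rc))
            # -dE/dr is positive here, so the pair is pushed apart.
            f = a * math.pi / rc * math.sin(math.pi * r / rc)
            for k in range(3):
                forces[i][k] += f * d[k] / r
                forces[j][k] -= f * d[k] / r
    return energy, forces


def relax(pos: List[Vec], a: float, rc: float, step: float = 0.05, dmax: float = 0.1,
          etol: float = 1e-6, ftol: float = 1e-6,
          maxiter: int = 1000) -> Tuple[List[Vec], float, str]:
    """Steepest descent that never moves any atom farther than dmax per iteration."""
    pos = [list(p) for p in pos]
    energy, forces = soft_energy_forces(pos, a, rc)
    for _ in range(maxiter):
        fnorm = math.sqrt(sum(f * f for fv in forces for f in fv))
        if fnorm <= ftol:
            return pos, energy, "force tolerance"
        # scale * |f_i| <= scale * fnorm <= dmax for every atom i.
        scale = min(step, dmax / fnorm)
        pos = [[p[k] + scale * f[k] for k in range(3)] for p, f in zip(pos, forces)]
        new_energy, forces = soft_energy_forces(pos, a, rc)
        converged = abs(new_energy - energy) <= etol * 0.5 * (abs(new_energy) + abs(energy))
        energy = new_energy
        if converged:
            return pos, energy, "energy tolerance"
    return pos, energy, "max iterations"
```

Because the soft potential stays finite at r = 0, even fully overlapped beads produce bounded forces, which is why it is a common first stage before a stiffer model.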
Models returned by ICM depend only on the nucleosome footprints and the conformation of
DNA. ICM does not account for steric interactions, so the models may exhibit non-physical
steric overlap. To rectify this, our research group adopted the coarse-grained
nucleosome-bead model of Korolev et al.[14, 15] Effectively minimizing all structures
required a two-phase minimization strategy: Phase I used a simple soft repulsion model,
and Phase II continued with the nucleosome-bead model. This two-phase strategy allows even
the more problematic models created in ICM to be optimized into physically possible models.
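The two-phase strategy maps naturally onto a single LAMMPS input script, since a later pair_style command replaces the earlier one and each minimize command starts from the current coordinates. The Python below is a hypothetical sketch: the Phase II pair commands are supplied by the caller because the parameters of the nucleosome-bead model are not given here, and all file names and numeric values are placeholders.

```python
from typing import List, Sequence


def two_phase_input(data_file: str, soft_a: float, soft_cut: float,
                    phase2_pair: Sequence[str],
                    etol: float = 1.0e-4, ftol: float = 1.0e-6,
                    maxiter: int = 1000, maxeval: int = 10000) -> str:
    """Minimize first with soft repulsion, then with caller-supplied pair commands
    standing in for the nucleosome-bead model."""
    minimize = f"minimize {etol} {ftol} {maxiter} {maxeval}"
    lines: List[str] = [
        "units real",
        "atom_style molecular",
        "boundary s s s",
        "bond_style harmonic",
        "angle_style harmonic",
        f"read_data {data_file}",
        "min_style cg",
        "# Phase I: soft repulsion pushes overlapping beads apart",
        f"pair_style soft {soft_cut}",
        f"pair_coeff * * {soft_a}",
        minimize,
        "# Phase II: continue from the Phase I coordinates with the bead model",
        *phase2_pair,
        minimize,
        "write_data minimized.data",
    ]
    return "\n".join(lines) + "\n"
```

For a quick test, pair_style lj/cut 20.0 with pair_coeff * * 0.5 10.0 can be passed as a placeholder Phase II; it is not the Korolev et al. model.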
EVALUATION OF MINIMIZATION
Minimization data from the LAMMPS output files were compiled into a spreadsheet for
analysis. Minimization energies from several different chromatin samples were used to
determine the consistency of minimization. Using the two-phase method, we can compare the
data for several different chromosomes. Phase I gives the initial energies calculated in
LAMMPS for each chromosome (see Fig. 5). Phase II gives the nucleosome energies after
minimization with the nucleosome-bead model (see Fig. 6). The data used to analyze the
results of minimization are based on experimental models of other well-defined
chromosomes.[16–21] The sequences used for these chromosomes are labeled CHA, HIS, and PHO.
FIG. 5: Nucleosome energies for the individual soft-repulsion models of each chromosome.
On this semi-log graph, the nucleosome energies span a very large range for all the
sequences. Energies this high result in improbable models.[8]
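The compilation step described above can be scripted rather than done by hand. This Python sketch assumes each sample's LAMMPS log contains the "Energy initial, next-to-last, final =" line that LAMMPS prints in its minimization statistics, followed by a line holding the three values; the function names and CSV columns are hypothetical choices. With the two-phase method, each log yields two rows per sample, one per minimize command.

```python
import csv
import io
from typing import Dict, List


def minimization_energies(log_text: str) -> List[Dict[str, float]]:
    """Return the initial, next-to-last and final energies of every minimization in a log."""
    lines = log_text.splitlines()
    results = []
    for i, line in enumerate(lines):
        if "Energy initial, next-to-last, final" in line and i + 1 < len(lines):
            # The three values are printed on the line after the label.
            initial, prev, final = (float(x) for x in lines[i + 1].split()[:3])
            results.append({"initial": initial, "next_to_last": prev, "final": final})
    return results


def energies_csv(logs: Dict[str, str]) -> str:
    """Build spreadsheet-ready CSV text with one row per minimization per sample."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["sample", "run", "initial", "next_to_last", "final"])
    for sample, text in logs.items():
        for run, e in enumerate(minimization_energies(text), start=1):
            writer.writerow([sample, run, e["initial"], e["next_to_last"], e["final"]])
    return buf.getvalue()
```

The returned text can be saved with a .csv extension and opened in any spreadsheet program.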
CONCLUSION
We have described how chromatin modeling is implemented with ICM and how to interpret its
results. As previously reported, ICM uses a simple elastic model for real-time nucleosome
placement. The nucleosome energy level diagrams provide a passive way to monitor energy
differences associated with various footprint configurations. The current stability model in
ICM is basic, because it does not account for nucleosome-nucleosome or linker-linker
interactions, or for other higher-order nucleosome phenomena.
FIG. 6: Nucleosome energies for the individual nucleosome-bead models of each chromosome.
The trend on this semi-log graph reflects the minimized energies evaluated in LAMMPS.
Minimizing the nucleosome energies yields possible models of chromatin.[8]
Because it remains unclear what defines a particular chromatin structure, ICM Web serves
primarily to rapidly assemble models of chromatin that provide a physical example of
biophysical data. It is more a tool for exploring chromatin at the molecular level than for
predicting the structure of a specific DNA sequence. We plan to extend the functionality of
ICM with better-defined force constraints in order to maintain proper representations of
chromatin that can be used for more extensive biological and physical experiments.
∗ vdd001@latech.edu
† bishop@latech.edu
[1] Woodcock, C.L., Grigoryev, S.A., Horowitz, R.A. and Whitaker, N. (1993) A chromatin folding
model that incorporates linker variability generates fibers resembling the native structures.
Proc Natl Acad Sci U S A, 90, 9021-9025.
[2] Schiessel, H. (2003) The physics of chromatin. Journal of Physics-Condensed Matter, 15,
R699-R774.
[3] Diesinger, P.M. and Heermann, D.W. (2006) Two-angle model and phase diagram for chromatin.
Physical Review E, 74.
[4] Li, G. and Reinberg, D. (2011) Chromatin higher-order structures and gene regulation. Curr
Opin Genet Dev, 21, 175-186.
[5] Bednar, J. and Woodcock, C.L. (1999), Chromatin, Vol. 304, pp. 191-213.
[6] Rhodes, D., Sandin, S., Routh, A. and Robinson, P. (2009) Chromatin Higher Order Structure
and Regulation of its Compaction. Journal of Biomolecular Structure & Dynamics, 26, 181.
[7] Robinson, P.J.J., Fairall, L., Huynh, V.A.T. and Rhodes, D. (2006) EM measurements define
the dimensions of the "30-nm" chromatin fiber: Evidence for a compact, interdigitated
structure. Proceedings of the National Academy of Sciences of the United States of America,
103, 6506-6511.
[8] Stolz, R.C. and Bishop, T.C. (2010) ICM Web: the interactive chromatin modeling web server.
Nucleic Acids Res, 38, W254-261.
[9] Mayo, Olafson, and Goddard, III. "LAMMPS-ICMS Documentation." Journal of Physical
Chemistry (2014): 11-12. Sandia National Laboratories. Web. 13 Feb. 2014.
<http://lammps.sandia.gov>.
[10] Cooper, Geoffrey M. The Cell: A Molecular Approach. Washington, D.C.: ASM, 1997. Print.
[11] "Bishop Theoretical Molecular Modeling Lab - Louisiana Tech University." Bishop Theoretical
Molecular Modeling Lab - Louisiana Tech University. Ed. Thomas C. Bishop, Ph.D. N.p.,
n.d. Web. 24 Mar. 2014. <http://dna.engr.latech.edu/>.
[12] Lu, X.-J. "3DNA: A Software Package for the Analysis, Rebuilding and Visualization of
Three-dimensional Nucleic Acid Structures." Nucleic Acids Research 31.17 (2003): 5108-5121. Print.
[13] el Hassan, M.A. and Calladine, C.R. (1995) The assessment of the geometry of dinucleotide
steps in double-helical DNA; a new local calculation scheme. J Mol Biol, 251, 648-664.
[14] Yang et al. Biophysical Journal, 2009, Vol. 96, 2082-2094.
[15] Korolev et al. Current Opinion in Structural Biology, 2012, Vol. 22, 151-159.
doi:10.1016/j.sbi.2012.01.006
[16] Jiang, C. and Pugh, B.F. Nature Reviews Genetics 10, 161-172 (March 2009).
doi:10.1038/nrg2522
[17] Mavrich et al. Nature 2007, 446:572-576.
[18] Field et al. PLoS Comput Biol 2008, 4:e1000216.
[19] Shivaswamy et al. PLoS Biol 2008, 6:e65.
[20] Whitehouse et al. Nature 2007, 450:1031-1035.
[21] Lee et al. Nat Genet 2007, 39:1235-1244.