SAN DIEGO SUPERCOMPUTER CENTER
AMBER 14 and GPUs:
Creating the World's Fastest MD Package
on Commodity Hardware
Ross Walker, Associate Research Professor and NVIDIA CUDA Fellow 
San Diego Supercomputer Center and Department of Chemistry & Biochemistry

Adrian Roitberg, Professor, Quantum Theory Project,
Department of Chemistry, University of Florida	

1
SAN DIEGO SUPERCOMPUTER CENTER 2	

Presenters!
Ross Walker!
San Diego Supercomputer Center 

Department of Chemistry and
Biochemistry

UC San Diego!
Adrian Roitberg!
Quantum Theory Project

Department of Chemistry

University of Florida!
Scott Le Grand!
Senior Engineer!
Amazon Web Services!
SAN DIEGO SUPERCOMPUTER CENTER
Webinar Overview!
•  Part 1!
•  Overview of Amber & AmberTools!
•  New Features in Amber v14!
•  Overview of the Amber GPU Project!
•  New GPU features in Amber v14!
•  Recommended Hardware Configurations!
•  Performance!
•  Part 2!
•  Hydrogen Mass Repartitioning!
•  Constant pH MD!
•  Multi-dimensional Replica Exchange MD!
3
SAN DIEGO SUPERCOMPUTER CENTER
What is AMBER?!
An MD simulation package
•  Latest version is 14.0 as of April 2014
•  Distributed in two parts:
   -  AmberTools: preparatory and analysis programs, free under the GPL
   -  Amber: the main simulation programs, under academic licensing
•  Independent from the accompanying force fields

A set of MD force fields
•  Fixed-charge biomolecular force fields: ff94, ff99, ff99SB, ff03, ff11, ff12SB, ...
•  Experimental polarizable force fields, e.g. ff02EP
•  Parameters for general organic molecules, solvents, carbohydrates (Glycam), etc.
•  In the public domain
4
SAN DIEGO SUPERCOMPUTER CENTER
Development!
AMBER is developed by a loose community of academic groups, led by
Dave Case at Rutgers University, New Jersey, building on the earlier
work of Peter Kollman at UCSF!
SAN DIEGO SUPERCOMPUTER CENTER
The AMBER Development Team 

A Multi-Institutional Research Collaboration

Principal contributors to the current codes:

David A. Case (Rutgers)
Tom Darden (OpenEye)
Thomas E. Cheatham III (Utah)
Carlos Simmerling (Stony Brook)
Adrian Roitberg (Florida)
Junmei Wang (UT Southwestern Medical Center)
Robert E. Duke (NIEHS and UNC-Chapel Hill)
Ray Luo (UC Irvine)
Daniel R. Roe (Utah)
Ross C. Walker (SDSC, UCSD)
Scott Le Grand (Amazon)
Jason Swails (Rutgers)
David Cerutti (Rutgers)
Joe Kaus (UCSD)
Robin Betz (UCSD)
Romain M. Wolf (Novartis)
Kenneth M. Merz (Michigan State)
Gustavo Seabra (Recife, Brazil)
Pawel Janowski (Rutgers)
Andreas W. Götz (SDSC, UCSD)
István Kolossváry (Budapest and D. E. Shaw)
Francesco Paesani (UCSD)
Jian Liu (Berkeley)
Xiongwu Wu (NIH)
Thomas Steinbrecher (Karlsruhe)
Holger Gohlke (Düsseldorf)
Nadine Homeyer (Düsseldorf)
Qin Cai (UC Irvine)
Wes Smith (UC Irvine)
Dave Mathews (Rochester)
Romelia Salomon-Ferrer (SDSC, UCSD)
Celeste Sagui (North Carolina State)
Volodymyr Babin (North Carolina State)
Tyler Luchko (Rutgers)
Sergey Gusarov (NINT)
Andriy Kovalenko (NINT)
Josh Berryman (U. of Luxembourg)
Peter A. Kollman (UC San Francisco)
7
SAN DIEGO SUPERCOMPUTER CENTER
Availability!
•  AmberTools v14 – Available as a free download via the Amber
Website!
•  Amber v14 – Available for download after purchase from the Amber
Website!
http://ambermd.org/!
9
SAN DIEGO SUPERCOMPUTER CENTER
Distribution!
10	

v14 Released April 15, 2014!
AmberTools v14 – Free, OpenSource (BSD, LGPL etc.)!
Amber v14 – Academic ($500), Industry ($20,000 / $15,000), HPC Center ($2000)!
AmberTools v14 + Amber v14 (Amber v14 adds the PMEMD MD engine)
SAN DIEGO SUPERCOMPUTER CENTER
AmberTools v14: New Features!
•  The sander module, our workhorse simulation program, is now a
part of AmberTools!
•  Greatly expanded and improved cpptraj program for analyzing trajectories (see the short example after this list)!
•  New documentation and tools for inspecting and modifying Amber
parameter files!
•  Improved workflow for setting up and analyzing simulations;!
•  New capability for semi-empirical Born-Oppenheimer molecular
dynamics!
•  EMIL: a new absolute free energy method using TI!
•  New Free Energy Workflow (FEW) tool automates free energy
calculations (LIE, TI, and MM/PBSA-type calculations)!
•  QM/MM Improvements – External interface to many QM packages including TeraChem (for GPU-accelerated QM/MM).!
•  Completely reorganized Reference Manual!
11
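As a flavour of what the expanded cpptraj can do, here is a minimal analysis script; the topology, trajectory, and atom mask are placeholders rather than files from this webinar.

  # analysis.cpptraj -- run as: cpptraj -i analysis.cpptraj
  parm system.prmtop                    # load the Amber topology
  trajin production.nc                  # read the production trajectory
  autoimage                             # re-image molecules into the primary unit cell
  rms BackboneRMSD first :1-129@CA,C,N out backbone_rmsd.dat mass
  run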
SAN DIEGO SUPERCOMPUTER CENTER
Amber v14: New Features!
•  Major updates to PMEMD GPU support.!
•  Expanded methods for free energy calculations.!
•  Soft core TI supported in PMEMD.!
•  Support for Intel MIC (experimental).!
•  Support for Intel MPI.!
•  ScaledMD.!
•  Improved aMD support.!
•  Self Guided Langevin Support.!
12
SAN DIEGO SUPERCOMPUTER CENTER
GPU Accelerated MD!
13
SAN DIEGO SUPERCOMPUTER CENTER
The Project!
Develop a GPU-accelerated version of AMBER's PMEMD.

San Diego Supercomputer Center:  Ross C. Walker
University of Florida:           Adrian Roitberg
NVIDIA (now AWS):                Scott Le Grand

Funded in part by the NSF SI2-SSE Program.
14
SAN DIEGO SUPERCOMPUTER CENTER
Project Info!
•  AMBER Website: http://ambermd.org/gpus/!
Publications!
1.  Salomon-Ferrer, R.; Goetz, A. W.; Poole, D.; Le Grand, S.; Walker, R. C. "Routine microsecond molecular dynamics simulations with AMBER - Part II: Particle Mesh Ewald", J. Chem. Theory Comput., 2013, in press, DOI: 10.1021/ct400314y
2.  Goetz, A. W.; Williamson, M. J.; Xu, D.; Poole, D.; Le Grand, S.; Walker, R. C. "Routine microsecond molecular dynamics simulations with AMBER - Part I: Generalized Born", J. Chem. Theory Comput., 2012, 8 (5), pp 1542-1555, DOI: 10.1021/ct200909j
3.  Pierce, L. C. T.; Salomon-Ferrer, R.; de Oliveira, C. A. F.; McCammon, J. A.; Walker, R. C. "Routine access to millisecond timescale events with accelerated molecular dynamics", J. Chem. Theory Comput., 2012, 8 (9), pp 2997-3002, DOI: 10.1021/ct300284c
4.  Salomon-Ferrer, R.; Case, D. A.; Walker, R. C. "An overview of the Amber biomolecular simulation package", WIREs Comput. Mol. Sci., 2012, in press, DOI: 10.1002/wcms.1121
5.  Le Grand, S.; Goetz, A. W.; Walker, R. C. "SPFP: Speed without compromise - a mixed precision model for GPU accelerated molecular dynamics simulations", Comput. Phys. Commun., 2013, 184, pp 374-380, DOI: 10.1016/j.cpc.2012.09.022
15
SAN DIEGO SUPERCOMPUTER CENTER
Motivations!
16
SAN DIEGO SUPERCOMPUTER CENTER
Maximizing Impact!
Supercomputers!
•  Require expensive low 

latency interconnects.!
•  Large node counts needed.!
•  Vast power consumption!!
•  Politics controls access.!
Desktops and Small Clusters!
•  Cheap, ubiquitous.!
•  Easily maintained.!
•  Simple to use.!
! 17
SAN DIEGO SUPERCOMPUTER CENTER
Better Science?!
•  Bringing the tools researchers need into their own labs.!
•  Can we make a researcher's desktop look like a small cluster (remove the queue wait)?!
•  Can we make MD truly interactive (real-time feedback / experimentation)?!
•  Can we find a way for a researcher to cheaply increase the power of all their graduate students' workstations?!
•  Without having to worry about available power (power cost)?!
•  Without having to worry about applying for cluster time.!
•  Without having to have a full-time 'student' to maintain the group's clusters?!
•  GPUs offer a possible cost-effective solution.!
18
SAN DIEGO SUPERCOMPUTER CENTER
Design Goals!
Overriding Design Goal: Sampling for the 99%!
!
•  Focus on systems of up to ~2 million atoms.!
•  Maximize single workstation performance.!
•  Focus on minimizing costs.!
•  Be able to use very cheap nodes.!
•  Both gaming (GeForce) and Tesla cards.!
•  Transparent to user (same input, same output)!

19

The 0.0001%  |  The 1.0%  |  The 99.0%
SAN DIEGO SUPERCOMPUTER CENTER
Version History!
•  AMBER 10 – Released Apr 2008!
•  Implicit Solvent GB GPU support released as patch Sept 2009.!
•  AMBER 11 – Released Apr 2010!
•  Implicit and Explicit solvent supported internally on single GPU.!
•  Oct 2010 – Bugfix.9 doubled performance on single GPU, added
multi-GPU support.!
•  AMBER 12 – Released Apr 2012!
•  Added Umbrella Sampling Support, 1D-REMD, Simulated Annealing,
aMD, IPS and Extra Points.!
•  Aug 2012 – Bugfix.9: new SPFP precision model, support for Kepler I, GPU-accelerated NMR restraints, improved performance.!
•  Jan 2013 – Bugfix.14 support CUDA 5.0, Jarzynski on GPU, GBSA.
Kepler II support.!
20
SAN DIEGO SUPERCOMPUTER CENTER
New in AMBER 14 (GPU)!
•  ~20-30% performance improvement for single GPU runs.!
•  Peer to peer support for multi-GPU runs providing enhanced multi-GPU scaling.!
•  Hybrid bitwise reproducible fixed point precision model as standard (SPFP)!
•  Support for Extra Points in Multi-GPU runs.!
•  Jarzynski Sampling!
•  GBSA support!
•  Support for off-diagonal modifications to VDW parameters.!
•  Multi-dimensional Replica Exchange (Temperature and Hamiltonian)!
•  Support for CUDA 5.0, 5.5 and 6.0!
•  Support for latest generation GPUs.!
•  Monte Carlo barostat support providing NPT performance equivalent to NVT (see the input sketch after this list).!
•  ScaledMD support.!
•  Improved accelerated (aMD) MD support.!
•  Explicit solvent constant pH support.!
•  NMR restraint support on multiple GPUs.!
•  Improved error messages and checking.!
•  Hydrogen mass repartitioning support (4fs time steps).!
21
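To make the Monte Carlo barostat item above concrete, a minimal NPT &cntrl namelist might look like the sketch below; the values are illustrative, barostat=2 selects the MC barostat in Amber 14, and settings should be checked against the Reference Manual rather than copied blindly.

  &cntrl
    imin=0, irest=1, ntx=5,             ! restart from an equilibrated run
    nstlim=250000, dt=0.004,            ! 1 ns using 4 fs steps (requires an HMR topology)
    ntc=2, ntf=2, cut=8.0,              ! SHAKE on bonds to H, 8 Angstrom cutoff
    ntt=3, gamma_ln=1.0, temp0=300.0,   ! Langevin thermostat at 300 K
    ntb=2, ntp=1, barostat=2,           ! constant pressure with the Monte Carlo barostat
    ntpr=5000, ntwx=5000, ig=-1,        ! output frequency and random seed
  /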
SAN DIEGO SUPERCOMPUTER CENTER
Reproducibility

Critical for Debugging Software and
Hardware!
22	

•  SPFP precision model is bitwise reproducible.
•  Same simulation from same random seed = same
result.
•  Used to validate hardware and identify misbehaving GPUs (Exxact AMBER Certified Machines); a simple check is sketched below.
•  Successfully identified 2 GPU models with underlying hardware issues this way.
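A simple way to exploit this is to run the same inputs twice and diff the energy records; the sketch below uses placeholder file names and assumes the mdin sets an explicit random seed (e.g. ig=12345 rather than ig=-1).

  $AMBERHOME/bin/pmemd.cuda -O -i md.in -p system.prmtop -c system.inpcrd -o run1.out -r run1.rst -x run1.nc
  $AMBERHOME/bin/pmemd.cuda -O -i md.in -p system.prmtop -c system.inpcrd -o run2.out -r run2.rst -x run2.nc
  grep "Etot" run1.out > run1_etot.txt
  grep "Etot" run2.out > run2_etot.txt
  diff run1_etot.txt run2_etot.txt && echo "bitwise reproducible" || echo "outputs differ: suspect the GPU"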
SAN DIEGO SUPERCOMPUTER CENTER
Reproducibility!
Good GPU:
0.0:   Etot = -58229.3134
0.1:   Etot = -58229.3134
0.2:   Etot = -58229.3134
0.3:   Etot = -58229.3134
0.4:   Etot = -58229.3134
0.5:   Etot = -58229.3134
0.6:   Etot = -58229.3134
0.7:   Etot = -58229.3134
0.8:   Etot = -58229.3134
0.9:   Etot = -58229.3134
0.10:  Etot = -58229.3134

Bad GPU:
0.0:   Etot = -58229.3134
0.1:   Etot = -58227.1072
0.2:   Etot = -58229.3134
0.3:   Etot = -58218.9033
0.4:   Etot = -58217.2088
0.5:   Etot = -58229.3134
0.6:   Etot = -58228.3001
0.7:   Etot = -58229.3134
0.8:   Etot = -58229.3134
0.9:   Etot = -58231.6743
0.10:  Etot = -58229.3134
23	

Final Energy after 10^6 MD steps (~45 mins per run)
SAN DIEGO SUPERCOMPUTER CENTER
Supported GPUs!
•  Hardware Version 3.0 / 3.5 (Kepler I / Kepler II)!!
!Tesla K20/K20X/K40!
!Tesla K10!
!GTX-Titan / GTX-Titan-Black!
!GTX770 / 780 / 780Ti!
!GTX670 / 680 / 690!
!Quadro cards supporting SM3.0* or 3.5* !!
•  Hardware Version 2.0 (Fermi) !!
!Tesla M2090!
!Tesla C2050/C2070/C2075 (and M variants)!
!GTX560 / 570 / 580 / 590!
!GTX465 / 470 / 480!
!Quadro cards supporting SM2.0* !!
24
SAN DIEGO SUPERCOMPUTER CENTER
Supported System Sizes!
25

(Figure: supported system sizes for Explicit Solvent PME and Implicit Solvent GB.)
SAN DIEGO SUPERCOMPUTER CENTER
Running on GPUs!
•  Details provided on: http://ambermd.org/gpus/!
•  Compile (assuming the CUDA toolkit with nvcc >= 5.0 is installed)!
•  cd $AMBERHOME!
•  ./configure -cuda gnu!
•  make install!
•  make test!
•  Running on a GPU (a fuller example is sketched below)!
•  Just replace the executable pmemd with pmemd.cuda!
•  $AMBERHOME/bin/pmemd.cuda -O -i mdin …!
If GPUs are set to process-exclusive mode, pmemd just 'Does the right thing'™!
26
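A fuller single-GPU command line, with the usual input/output flags, might look like this (all file names are placeholders):

  export CUDA_VISIBLE_DEVICES=0        # optionally pin the job to a specific GPU
  $AMBERHOME/bin/pmemd.cuda -O \
      -i md.in -p system.prmtop -c system.inpcrd \
      -o md.out -r md.rst -x md.nc -inf md.info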
SAN DIEGO SUPERCOMPUTER CENTER
Running on Multiple GPUs!
•  Compile (assuming the CUDA toolkit with nvcc >= 5.0 is installed)!
•  cd $AMBERHOME!
•  ./configure -cuda -mpi gnu!
•  make install!

•  Running on Multiple GPUs!
•  export CUDA_VISIBLE_DEVICES=0,1!
•  mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i mdin …!
If the selected GPUs can 'talk' to each other via peer to peer, the code will
automatically use it (look for Peer to Peer : ENABLED in mdout; a quick check is sketched below).!
27
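A quick way to confirm that peer-to-peer was actually picked up after such a run (output file name is a placeholder):

  grep -i "Peer to Peer" md.out     # expect the "Peer to Peer : ENABLED" line mentioned above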
SAN DIEGO SUPERCOMPUTER CENTER
Recommended Hardware
Configurations!
•  DIY Systems!
•  Exxact AMBER Certified Systems!
•  Come with AMBER preinstalled.!
•  Tesla or GeForce – 3 year warranty.!
•  Free test drives available.!
•  See the following page for updates:!
!
!http://ambermd.org/gpus/recommended_hardware.htm#hardware!
28
SAN DIEGO SUPERCOMPUTER CENTER
DIY 2 GPU System!
Antec P280 Black ATX Mid Tower Case!
http://tinyurl.com/lh4d96s

$109.99!
!
Corsair RM Series 1000 Watt ATX/
EPS 80PLUS Gold-Certified Power
Supply

http://tinyurl.com/l3lchqu

$184.99!
!
Intel i7-4820K LGA 2011 64 Technology
CPU BX80633I74820K!
http://tinyurl.com/l3tql8m!
$324.99!
Corsair Vengeance 16GB
(4x4GB) DDR3 1866 MHZ
(CMZ16GX3M4X1866C9)!
http://tinyurl.com/losocqr!
$182.99!
!
Asus DDR3 2400 Intel LGA 2011
Motherboard P9X79-E WS

http://tinyurl.com/kbqknvv
$460.00!
Seagate Barracuda 7200 3 TB
7200RPM Internal Bare Drive
ST3000DM001!
http://tinyurl.com/m9nyqtk
$107.75!
2 of EVGA GeForce GTX780
SuperClocked 3GB GDDR5
384bit!
http://tinyurl.com/m5e6c43!
$509.99 each!
Or K20C or K40C!
Total Price : $2411.84	

29	

Intel Thermal Solution Air!
http://tinyurl.com/mjwdp5e!
$21.15!
SAN DIEGO SUPERCOMPUTER CENTER
AMBER Certified Desktops!
30	

Joint program with Exxact Inc.!
http://exxactcorp.com/index.php/solution/solu_list/65!
•  Custom Solutions
Available.!
•  Free Test Drives.!
•  GSA Compliant.!
•  Sole source
compliant.!
•  Peer to Peer ready.!
•  Turn key AMBER
GPU solutions.!
Contact Ross Walker (ross@rosswalker.co.uk) or Mike Chen (mchen@exxactcorp.com) for more info.!
SAN DIEGO SUPERCOMPUTER CENTER
AMBER Exxact Certified Clusters!
31	

Contact Ross Walker (ross@rosswalker.co.uk) or Mike Chen (mchen@exxactcorp.com) for more info.!
SAN DIEGO SUPERCOMPUTER CENTER
Performance!
AMBER 14!
32	

Realistic benchmark suite available from:

http://ambermd.org/gpus/benchmarks.htm!
33

DHFR (NVE) HMR 4 fs, 23,558 atoms - Performance (ns/day):
  2x E5-2660v2 CPU (16 cores)     30.21
  1x C2075                        81.26
  2x C2075                       129.79
  1x GTX 680                     184.67
  2x GTX 680                     270.99
  1x GTX 780                     251.43
  2x GTX 780                     361.33
  1x GTX 780Ti                   280.01
  2x GTX 780Ti                   389.63
  1x GTX Titan Black             280.54
  2x GTX Titan Black             383.32
  1x K20                         196.99
  2x K20                         263.85
  1x K40                         266.07
  2x K40                         364.67
34

DHFR (NPT) HMR 4 fs, 23,558 atoms - Performance (ns/day):
  2x E5-2660v2 CPU (16 cores)     29.01
  1x C2075                        79.49
  2x C2075                       126.26
  1x GTX 680                     179.45
  2x GTX 680                     261.23
  1x GTX 780                     239.14
  2x GTX 780                     334.29
  1x GTX 780Ti                   271.68
  2x GTX 780Ti                   366.51
  1x GTX Titan Black             270.39
  2x GTX Titan Black             359.91
  1x K20                         190.01
  2x K20                         248.40
  1x K40                         245.00
  2x K40                         335.25
35

DHFR (NVE) 2 fs, 23,558 atoms - Performance (ns/day):
  2x E5-2660v2 CPU (16 cores)     15.81
  1x C2075                        41.66
  2x C2075                        66.84
  1x GTX 680                      95.45
  2x GTX 680                     142.37
  1x GTX 780                     128.37
  2x GTX 780                     181.96
  1x GTX 780Ti                   146.46
  2x GTX 780Ti                   204.13
  1x GTX Titan Black             146.12
  2x GTX Titan Black             200.63
  1x K20                         102.67
  2x K20                         137.67
  1x K40                         138.33
  2x K40                         189.41
36

Cellulose (NVE) 2 fs, 408,609 atoms - Performance (ns/day):
  2x E5-2660v2 CPU (16 cores)      0.85
  1x C2075                         2.39
  2x C2075                         3.46
  1x GTX 680                       5.72
  2x GTX 680                       8.19
  1x GTX 780                       7.40
  2x GTX 780                      10.47
  1x GTX 780Ti                     9.59
  2x GTX 780Ti                    13.64
  1x GTX Titan Black               9.60
  2x GTX Titan Black              13.36
  1x K20                           6.73
  2x K20                           8.80
  1x K40                           9.46
  2x K40                          13.13
SAN DIEGO SUPERCOMPUTER CENTER
Cumulative Performance

(The real differentiator)!
37

~1.1 microseconds per day on a $6000 desktop!

CPUs are irrelevant to AMBER GPU performance: buy $100 CPUs.

A $6000 workstation gets you ~1100 ns/day aggregate performance for the DHFR NVE HMR benchmark (a sketch of launching such independent jobs follows below).

Amber cumulative performance, DHFR (NVE) HMR 4 fs, 23,558 atoms (ns/day):
  2x E5-2660v2 CPU (16 cores)                    30.21
  1x GTX Titan Black                            280.54
  2x individual GTX Titan Black    2 x 280.54 = 561.08
  3x individual GTX Titan Black    3 x 280.54 = 841.62
  4x individual GTX Titan Black    4 x 280.54 = 1122.16
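The aggregate numbers above come from running independent jobs, one per GPU, rather than one job spread across all GPUs; a rough sketch of such a setup (placeholder file names):

  # four independent copies of the same system, one per GPU
  for gpu in 0 1 2 3; do
    (
      export CUDA_VISIBLE_DEVICES=$gpu
      mkdir -p run$gpu && cd run$gpu
      $AMBERHOME/bin/pmemd.cuda -O -i ../md.in -p ../system.prmtop -c ../system.inpcrd \
          -o md.out -r md.rst -x md.nc
    ) &
  done
  wait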
  
SAN DIEGO SUPERCOMPUTER CENTER
Historical Single Node / Single GPU
Performance

(DHFR Production NVE 2fs)!
38

(Chart: PMEMD performance in ns/day from 2007 to 2014 for the DHFR production NVE 2 fs benchmark, with GPU and CPU series and a linear trend line for each.)
SAN DIEGO SUPERCOMPUTER CENTER
Cost Comparison!
Traditional Cluster:
  Nodes required:                 12
  Interconnect:                   QDR IB
  Time to complete simulations:   4.98 days
  Power consumption:              5.7 kW (681.3 kWh)
  System cost (per day):          $96,800 ($88.40)
  Simulation cost:                (681.3 * 0.18) + (88.40 * 4.98) = $562.87

39

Scenario: 4 simultaneous simulations, 23,000 atoms, 250 ns each, 5 days maximum time to solution.
SAN DIEGO SUPERCOMPUTER CENTER
Cost Comparison!
                                  Traditional Cluster                    GPU Workstation
  Nodes required:                 12                                     1 (4 GPUs)
  Interconnect:                   QDR IB                                 None
  Time to complete simulations:   4.98 days                              2.25 days
  Power consumption:              5.7 kW (681.3 kWh)                     1.0 kW (54.0 kWh)
  System cost (per day):          $96,800 ($88.40)                       $5200 ($4.75)
  Simulation cost:                (681.3 * 0.18) + (88.40 * 4.98)        (54.0 * 0.18) + (4.75 * 2.25)
                                  = $562.87                              = $20.41

40

Scenario: 4 simultaneous simulations, 25,000 atoms, 250 ns each, 5 days maximum time to solution
(power costed at $0.18 per kWh; arithmetic spelled out below).
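Spelling out the arithmetic (assuming the 0.18 factor is an electricity price of $0.18 per kWh):

  \begin{aligned}
  \text{Cluster: }     & 681.3\ \text{kWh} \times \$0.18 + \$88.40/\text{day} \times 4.98\ \text{days} = \$122.63 + \$440.23 \approx \$562.87\\
  \text{Workstation: } & 54.0\ \text{kWh} \times \$0.18 + \$4.75/\text{day} \times 2.25\ \text{days} = \$9.72 + \$10.69 \approx \$20.41
  \end{aligned}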

25x cheaper AND solution obtained in less than half the time
SAN DIEGO SUPERCOMPUTER CENTER
QA!
41
Replica Exchange Molecular Dynamics in the Age
of Heterogeneous Architectures.
The stuff dreams are made of…
Adrian E. Roitberg
Quantum Theory Project
Department of Chemistry
University of Florida
•  Conformational space has many local minima
•  Barriers hinder rapid escape from high-energy minima
•  How can we search and score all conformations?
(Figure: energy vs. conformational coordinate.)
Most of chemistry, and certainly all of biology, happens at temperatures substantially above 0 K.
Why is convergence so slow?
•  We want the average of observable properties, so we need to include in that weighted average all of the important values it might take.
•  Structural averages can be slow to converge: you need to wait for the structure to change, and to do so many, many times.
•  Sampling an event only once doesn't tell you the correct probability (or weight) for that event.
What is the proper weight? Boltzmann’s equation
$$P_i = \frac{e^{-E_i/k_B T}}{\sum_j e^{-E_j/k_B T}}$$
We need to sample phase space for a biomolecule. Our tool of choice is molecular dynamics: loop over time steps, and at each step compute forces and energies.

Within regular MD methodology there is only one way to sample more conformational space: run longer.

The usual (old) way to port code to new architectures was to focus on bottlenecks. In MD, this meant the non-bonded interactions (vdW + electrostatics), which scale as O(N^2) (potentially O(N log N)) and took ~90% of the time.
With GPUs we can run MUCH faster, for lower cost. So what? It is still a microsecond every 2 days, not enough to sample conformational space, except locally.

So we have code that runs FAST on one or two GPUs, and we have the emergence of large machines (Keeneland, Blue Waters, Stampede, etc.) with many GPUs. Are there methods we can use to leverage these two components?
Simple way, run MANY independent simulations, each fast. The
probability of something “interesting” happening increases linearly
with the number of runs. Good, but not good enough.
Can we run independent simulations, and have them ‘talk’ to each
other and increase overall sampling ?
Yes !
Can we do it according to the laws of statistical mechanics?
Yes !
Temperature Replica Exchange (Parallel Tempering)
Replica ladder: 300 K, 325 K, 350 K, 375 K (REMD)
Hansmann, U., CPL 1997; Sugita & Okamoto, CPL 1999
You do not EXCHANGE, you ask PERMISSION to exchange and then decide.
Exchange acceptance probability:
$$\rho = \min\Bigl(1,\ \exp\bigl\{(\beta_m - \beta_n)\,\bigl[E(q_i) - E(q_j)\bigr]\bigr\}\Bigr)$$

Velocities are rescaled on exchange:
$$v_{\mathrm{new}} = v_{\mathrm{old}}\,\sqrt{\frac{T_{\mathrm{new}}}{T_{\mathrm{old}}}}$$

Impose reversibility / detailed balance, with the desired (limiting) Boltzmann weighting imposed on the exchange calculation:
$$P(X)\,w(X \to X') = P(X')\,w(X' \to X), \qquad P(X) = e^{-\beta E_X}$$

Rewritten using only potential energies, by rescaling the kinetic energy of both replicas (300-375 K ladder).
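As a purely illustrative plug-in of numbers into the acceptance rule above (values invented for this example, not taken from the slides; k_B ≈ 0.0019872 kcal mol^-1 K^-1): for neighbouring replicas at 300 K and 325 K with a potential-energy difference of -5 kcal/mol,

  \beta_{300} - \beta_{325} = \frac{1}{k_B\,(300\,\mathrm{K})} - \frac{1}{k_B\,(325\,\mathrm{K})} \approx 1.677 - 1.548 = 0.129\ \mathrm{mol\ kcal^{-1}}

  \rho = \min\bigl(1,\ e^{\,0.129\times(-5)}\bigr) \approx e^{-0.645} \approx 0.52

so the exchange would be accepted roughly half the time.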
Replica Exchange Methods
=
Treasure hunt + bartering
Only works when the replicas explore phase space ‘well’ and
exchange configurations.
The only gain comes from exchanging configurations.
Large gains in sampling efficiency using REMD
Perfect for heterogeneous architectures: Need FAST execution
within a replica, and the exchanges involve very little data (e.g.
communication overhead is effectively zero). Scaling is very close
to perfect.
Important questions:
How many replicas?
Where do I put the replicas?
How often do I try to exchange ?
How do I measure convergence and speedup?
“Exchange frequency in replica exchange
molecular dynamics”.
Daniel Sindhikara, Yilin Meng and Adrian
E. Roitberg.
Journal of Chemical Physics. 128,
024103 (2008).
“Optimization of Umbrella Sampling
Replica Exchange Molecular Dynamics by
Replica Positioning” Sabri-Dashti, D.;
Roitberg, A. E. Journal of Chemical Theory
and Computation. 9, 4692. (2013)
REMD does not have to be restricted to T-space !
H-REMD is a powerful technique that uses different Hamiltonians for
each replica.
H1
H4
H2
H3
Within H-REMD, there is another 'split':
A.  Replicas in 'space' == umbrella sampling: each replica has a different umbrella, and replicas exchange. You can also use different restraints, different restraint weights for NMR refinement, tighter restraints… Be creative.
B.  Replicas in true Hamiltonian space: each replica is a different molecule, or small changes in protonation, or force fields. Imagine replicas where the rotational barriers (bottlenecks for sampling in polymers) are reduced from the regular value.
$$\mathrm{p}K_a(\mathrm{protein}) = \mathrm{p}K_a(\mathrm{model}) + \frac{1}{2.303\,k_B T}\Bigl[\Delta G\bigl(\mathrm{protein{-}AH}\to\mathrm{protein{-}A^-}\bigr) - \Delta G\bigl(\mathrm{AH}\to\mathrm{A^-}\bigr)\Bigr]$$
Application to a 'hard' pKa prediction: Asp 26 in thioredoxin,
pKa = 7.5 (a regular Asp has a pKa of 4.0).
Meng, Yilin; Dashti, Danial Sabri; Roitberg, Adrian E. Journal of
Chemical Theory and Computation. 7, 2721-2727. (2011)
REMD methods lend themselves to lego-style mix and match
No restriction to use only one replica ladder
One could do Multidimensional-REMD.
Multiscale REMD
T-umbrella
Umbrella-umbrella-umbrella
Radius of gyration-T
T-umbrella (QM/MM) (Get enthalpy/entropy !!)
Accelerated MD
Computational
Studies of
RNA Dynamics
“Multi-dimensional Replica Exchange
Molecular Dynamics Yields a Converged
Ensemble of an RNA Tetranucleotide”
Bergonzo, C.; Henriksen, N.; Roe, D.; Swails,
J.; Roitberg, A. E.; Cheatham III, T.
Journal of Chemical Theory and Computation.
10:492–499 (2014)
Experimental rGACC ensemble
•  NMR spectroscopy for single-stranded rGACC
•  Major and minor A-form-like structures (relative to an A-form reference): NMR Major 70-80%, NMR Minor 20-30%
Yildirim et al., J. Phys. Chem. B, 2011, 115, 9261.
Complete Sampling is Difficult
•  Conformational
variability is high,
even though the
structure is small
•  The populations of
different structures
(shown blue, green,
red, and yellow) are
still changing, even
after  5 µs of
simulation
Henriksen et al., 2013, J. Phys. Chem. B., 117, 4914.
Niel Henriksen
Enhanced Sampling using
Multi-Dimensional Replica Exchange MD
(M-REMD)
•  Parallel simulations
(“replicas”) run at different
Temperatures (277K –
400K), Hamiltonians (Full
dihedrals – 30% dihedrals)
•  Exchanges are attempted at
specific intervals
•  192 replicas, each run on its
own GPU
Sugita, Y.; Kitao, A.; Okamoto, Y.; 2000, J. Chem. Phys., 113, 6042.
Henriksen et al., J. Phys Chem B, 2013.117, 4014-27.; Bergonzo et al.; 2014, JCTC, 10, 492.
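To make the setup concrete, a temperature-REMD launch with the MPI GPU engine typically looks something like the sketch below; the -ng / -groupfile / -rem flags and the numexchg mdin variable are written from memory of the Amber manuals and should be verified there, all file names and the 4-replica ladder are placeholders, and multi-dimensional REMD adds a dimension file on top of this.

  # remd.groupfile: one line of pmemd arguments per replica;
  # each mdin_T<temp> sets its own temp0 and numexchg (number of exchange attempts)
  cat > remd.groupfile << 'EOF'
  -O -i mdin_T300 -p system.prmtop -c eq_T300.rst -o remd_T300.out -r remd_T300.rst -x remd_T300.nc
  -O -i mdin_T325 -p system.prmtop -c eq_T325.rst -o remd_T325.out -r remd_T325.rst -x remd_T325.nc
  -O -i mdin_T350 -p system.prmtop -c eq_T350.rst -o remd_T350.out -r remd_T350.rst -x remd_T350.nc
  -O -i mdin_T375 -p system.prmtop -c eq_T375.rst -o remd_T375.out -r remd_T375.rst -x remd_T375.nc
  EOF
  # one MPI rank (and one GPU) per replica; -rem 1 selects temperature REMD
  mpirun -np 4 $AMBERHOME/bin/pmemd.cuda.MPI -ng 4 -groupfile remd.groupfile -rem 1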
Criteria for Convergence
•  Independent runs with different starting structures sample the same conformations with the same probabilities
•  Kullback-Leibler divergence of the final probability distributions (of any simulation observable) between 2 runs reaches zero
Bergonzo et al.; 2014, JCTC, 10, 492.
Cyan indicates a 95% confidence level, where slope
and y-intercept ranges are denoted by α and β.
Top 3 clusters, populations (ff12, DAC):
  NMR Major           18.1%, 31.7%
  NMR Minor           24.6%, 23.4%
  Intercalated anti   15.9%, 10.7%

63
NMR Major, 1-syn*: 48.6%
*This structure is never sampled in Amber FF simulations.
pH-Replica Exchange Molecular Dynamics in proteins using a discrete protonation method
Each replica runs MD at a different pH (talk to me if interested), with exchanges between replicas. One can titrate biomolecules very efficiently.
Acid-range HEWL Titration (constant pH MD, implicit solvent)

  Residue     Calculated pKa    Exp. pKa
  Glu 7        3.6               2.6
  His 15       6.0               5.5
  Asp 18       2.3               2.8
  Glu 35       5.7               6.1
  Asp 48       1.2               1.4
  Asp 52       2.7               3.6
  Asp 66     -18.1               1.2
  Asp 87       2.5               2.2
  Asp 101      3.7               4.5
  Asp 119      2.3               3.5
  RMSE         6.5              ------

(1) Webb, H.; Tynan-Connolly, B. M.; Lee, G. M.; Farrell, D.; O'Meara, F.; Sondergaard, C. R.; Teilum, K.; Hewage, C.; McIntosh, L. P.; Nielsen, J. E. Proteins 2011, 79, 685–702.
pH Replica Exchange MD
Replica ladder: pH = 2, 3, 4, 5; neighbouring pH replicas attempt exchanges.
Itoh, S. G.; Damjanovic, A.; Brooks, B. R. Proteins 2011, 79, 3420–3436.
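For orientation, a minimal explicit-solvent constant-pH &cntrl fragment might look like the sketch below; icnstph=2, solvph, ntcnstph and ntrelax are the Amber 14 variable names as I recall them, the cpin file is prepared separately (e.g. with cpinutil.py), and a pH-REMD ladder is built by giving each replica in a groupfile its own solvph. Treat the names and values as assumptions to verify against the manual.

  &cntrl
    imin=0, irest=1, ntx=5, dt=0.002,
    ntc=2, ntf=2, cut=8.0, ntb=2, ntp=1,
    ntt=3, gamma_ln=5.0, temp0=300.0,
    icnstph=2,          ! constant pH in explicit solvent
    solvph=4.0,         ! solution pH for this replica
    ntcnstph=100,       ! MD steps between protonation-change attempts
    ntrelax=200,        ! solvent relaxation steps after an accepted change
    nstlim=500000, ntpr=1000, ntwx=1000, ig=-1,
  /
  (run with something like: pmemd.cuda -O -i mdin -p system.prmtop -c system.inpcrd -cpin system.cpin -cpout system.cpout -cprestrt system.cprst -o md.out)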
Protonation State Sampling
•  CpHMD: RSS = 2.33 × 10^-2;  pH-REMD: RSS = 3.60 × 10^-4 (REMD wins, not by much)

Protonation State Sampling
•  CpHMD: RSS = 1.33 × 10^-1;  pH-REMD: RSS = 2.51 × 10^-4 (REMD wins by a lot)
  Residue     CpHMD pKa    pH-REMD pKa    Experimental pKa
  Glu 7        3.6           3.8            2.6
  His 15       6.0           5.9            5.5
  Asp 18       2.3           2.0            2.8
  Glu 35       5.7           5.0            6.1
  Asp 48       1.2           2.0            1.4
  Asp 52       2.7           2.3            3.6
  Asp 66     -18.1           1.5            1.2
  Asp 87       2.5           2.7            2.2
  Asp 101      3.7           3.6            4.5
  Asp 119      2.3           2.4            3.5
  RMSE         6.5           0.89           ------

Substantially better results with pH-REMD
Acknowledgements:
David Case
Andy McCammon
John Mongan
Ross Walker
Tom Cheatham III
Dario Estrin
Marcelo Marti
Leo Boechi
$ + computer time
NSF/Blue Waters PRAC
NSF SI2
Keeneland Director’s allocation
Titan
NVIDIA
Dr. Yilin Meng, Natali Di Russo, Dr. Jason Swails, Dr. Danial Sabri-Dashti
Mass Repartitioning to
Increase MD Timestep
(double the fun, double the
pleasure)
Chad Hopkins, Adrian Roitberg
Background
•  Work based on:
–  "Improving Efficiency of Large Time-scale Molecular Dynamics Simulations of Hydrogen-rich Systems", Feenstra, Hess, Berendsen, JCC 1999
•  Recommended increasing H mass by a factor of 4,
repartitioning heavy atom masses to keep total
system mass constant
•  For technical reasons with the Amber code, we increase by a factor of 3
•  Waters are not repartitioned (unneeded due to different constraint handling); a topology-preparation sketch follows below
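One common way to build the repartitioned topology is ParmEd's HMassRepartition action (available in recent AmberTools), which by default scales solute hydrogens to 3.024 amu and subtracts the difference from the bonded heavy atom while leaving water untouched; the snippet below is a sketch with placeholder file names, and the exact parmed invocation may differ between AmberTools versions.

  parmed -p system.prmtop << 'EOF'
  HMassRepartition
  outparm system.hmr.prmtop
  EOF
  # then set dt=0.004 (4 fs) in the mdin &cntrl namelist when running with the new topology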
Repartitioning Example
Each hydrogen goes from 1.008 to 3.024 amu, and each bonded heavy atom gives up 2.016 amu per attached hydrogen, so the total mass is unchanged:
  H:                  1.008 → 3.024
  C (3 H attached):  12.01  → 5.962
  C (1 H attached):  12.01  → 9.994
  N (1 H attached):  14.01  → 11.99
  C (no H):          12.01  → 12.01
  O (no H):          16.00  → 16.00
Small peptide sampling (capped 3-ALA), 10 x 20 (40) ns simulations
(Figures: middle phi/psi populations with the normal topology at a 2 fs timestep and with the repartitioned topology at a 4 fs timestep.)

Average timings (Tesla M2070s):
  Normal topology, 2 fs timestep:          160 ns/day
  Repartitioned topology, 4 fs timestep:   305 ns/day
Dynamics test (peptide)
Free energy map generated from the MD trajectory for phi/psi near the N-terminus.
Counting hops (for equal numbers of frames in each case) across the barrier between the minima centered at (-60, 150) and (-60, -40):

                  Norm 2 fs    Repart 2 fs    Repart 4 fs
  Average hops       112           114            200
Per-residue RMSDs (HEWL)
C-N termini distance profile
Free energy calculation with mass
repartitioning
•  PMF from umbrella sampling for disulfide
linker dihedral in MTS spin label used in
EPR experiments
SAN DIEGO SUPERCOMPUTER CENTER
Acknowledgements!
San Diego Supercomputer Center
University of California San Diego
University of Florida
National Science Foundation
NSF SI2-SSE Program
NVIDIA Corporation
Hardware + People
People
Scott Le Grand Duncan Poole Mark Berger
Sarah Tariq Romelia Salomon Andreas Goetz
Robin Betz Ben Madej Jason Swails
Chad Hopkins
79
TEST DRIVE K40 GPU -
WORLD’S FASTEST GPU
"I am very glad that NVIDIA has provided the GPU Test Drive. Both my professor and I appreciate the opportunity as it profoundly helped our research."
- Mehran Maghoumi, Brock Univ., Canada
www.nvidia.com/GPUTestDrive
SPECIAL OFFER FOR AMBER USERS
Buy 3 K40 Get 4th GPU Free
Contact your HW vendor or NVIDIA representative for more details
www.nvidia.com/GPUTestDrive
UPCOMING GTC EXPRESS WEBINARS
May 14: CUDA 6: Performance Overview
May 20: Using GPUs to Accelerate Orthorectification,
Atmospheric Correction, and
Transformations for Big Data
May 28: An Introduction to CUDA Programming
June 3: The Next Steps for Folding@home
www.gputechconf.com/gtcexpress
NVIDIA GLOBAL IMPACT AWARD
•  $150,000 annual award
•  Categories include: disease
research, automotive safety,
weather prediction
•  Submission deadline: Dec. 12,
2014
•  Winner announced at GTC 2015
Recognizing groundbreaking work with GPUs in tackling
key social and humanitarian problems
impact.nvidia.com
