Nodes  Box size  Atoms     CPU time  CPU+GPU time  GPU speedup
1      1x1x1     32768     42.2      6.33          6.67x
8      2x2x2     262144    41.8      6.73          6.21x
27     3x3x3     884736    41.5      6.86          6.05x
64     4x4x4     2097152   41.5      7.18          5.78x
125    5x5x5     4096000   41.4      7.18          5.77x
216    6x6x6     7077888   42        7.66          5.48x
343    7x7x7     11239424  41.9      8.34          5.02x
512    8x8x8     16777216  42.3      8.41          5.03x
729    9x9x9     23887872  42.5      8.92          4.76x
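The speedup column in the table above is simply the CPU time divided by the CPU+GPU time. As a minimal sketch, the snippet below recomputes that column and also derives a weak-scaling efficiency (time on 1 node divided by time on N nodes). The efficiency metric and the function names are illustrative additions, not part of the original slides, and the times are assumed to be loop times in seconds.

```python
# Weak-scaling rows transcribed from the table: (nodes, cpu_time, cpu_gpu_time).
# Assumption: times are loop times in seconds; the table itself gives no unit.
ROWS = [
    (1, 42.2, 6.33),
    (8, 41.8, 6.73),
    (27, 41.5, 6.86),
    (64, 41.5, 7.18),
    (125, 41.4, 7.18),
    (216, 42.0, 7.66),
    (343, 41.9, 8.34),
    (512, 42.3, 8.41),
    (729, 42.5, 8.92),
]


def speedup(cpu_time, cpu_gpu_time):
    """GPU speedup as tabulated: CPU-only time over CPU+GPU time."""
    return cpu_time / cpu_gpu_time


def weak_scaling_efficiency(time_one_node, time_n_nodes):
    """Illustrative metric (not in the slides): 1.0 means perfect weak scaling."""
    return time_one_node / time_n_nodes


_, cpu_1, gpu_1 = ROWS[0]
for nodes, cpu, gpu in ROWS:
    print(
        f"{nodes:4d} nodes: speedup {speedup(cpu, gpu):.2f}x, "
        f"CPU efficiency {weak_scaling_efficiency(cpu_1, cpu):.2f}, "
        f"CPU+GPU efficiency {weak_scaling_efficiency(gpu_1, gpu):.2f}"
    )
```

Read this way, the CPU-only time stays nearly flat from 1 to 729 nodes while the CPU+GPU time grows from 6.33 to 8.92, which is why the speedup drifts from 6.67x down to 4.76x.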
Before we end this session I would like to tell you about GPU Test Drive. It is an excellent resource for computational chemistry researchers such as yourself to evaluate the benefits of GPU computing in speeding up your simulations. Most importantly, it is free.

NVIDIA, along with its partners, is offering access to a remotely hosted GPU cluster. You can run applications such as AMBER and NAMD to find out how your models speed up. You can also try code that you have developed to run on GPUs and see how it scales on an 8-GPU cluster. All you need to do is sign up and log in – it is really that easy! We have several partners who are demonstrating the GPU Test Drive on the GTC show floor. Please plan on visiting them.

Sign-up forms have been given out. If you are interested, please fill them out and return them to me.
LAMMPS Molecular Dynamics on GPU
LAMMPS, Dec. 2011 or later
Summary/Conclusions
Benefits of GPU Accelerated Computing
- Faster than CPU-only systems in all tests
- Large performance boost with small marginal price increase
- Energy usage cut in half
- GPUs scale very well within a node and over multiple nodes
- Tesla K20 GPU is our fastest and lowest power high performance GPU to date
Try GPU accelerated LAMMPS for free – www.nvidia.com/GPUTestDrive
More Science for Your Money
Embedded Atom Model
[Figure: bar chart. Y-axis: Speedup Compared to CPU Only (0 to 6). Bars: CPU Only, CPU + 1x K10, CPU + 1x K20, CPU + 1x K20X, CPU + 2x K10, CPU + 2x K20, CPU + 2x K20X. Data labels: 1.7, 2.47, 2.92, 3.3, 4.5, 5.5.]
Blue node uses 2x E5-2687W (8 cores and 150W per CPU).
Green nodes have 2x E5-2687W and 1 or 2 NVIDIA K10, K20, or K20X GPUs (235W).
Experience performance increases of up to 5.5x with Kepler GPU nodes.
K20X, the Fastest GPU Yet
[Figure: bar chart. Y-axis: Speedup Relative to CPU Alone (0 to 7). Bars: CPU Only, CPU + 2x M2090, CPU + K20X, CPU + 2x K20X.]
Blue node uses 2x E5-2687W (8 cores and 150W per CPU).
Green nodes have 2x E5-2687W and 2 NVIDIA M2090 or K20X GPUs (235W).
Experience performance increases of up to 6.2x with Kepler GPU nodes.
One K20X performs as well as two M2090s.
Get a CPU Rebate to Fund Part of Your GPU Budget
Acceleration in Loop Time Computation by Additional GPUs
Running NAMD version 2.9
[Figure: bar chart. Y-axis: Normalized to CPU Only (0 to 20). Bars: 1 Node, 1 Node + 1x M2090, 1 Node + 2x M2090, 1 Node + 3x M2090, 1 Node + 4x M2090. Data labels: 5.31, 9.88, 12.9, 18.2.]
The blue node contains dual X5670 CPUs (6 cores per CPU).
The green nodes contain dual X5570 CPUs (4 cores per CPU) and 1-4 NVIDIA M2090 GPUs.
Increase performance 18x when compared to CPU-only nodes.
Cheaper CPUs used with GPUs AND still faster overall performance when compared to more expensive CPUs!
Excellent Strong Scaling on Large Clusters
LAMMPS Gay-Berne 134M Atoms
[Figure: Loop Time (seconds, 0 to 600) versus Nodes (300 to 900). Series: GPU Accelerated XK6, CPU only XE6. Data labels: 3.55x, 3.48x, 3.45x.]
From 300 to 900 nodes, the NVIDIA GPU-powered XK6 maintained about 3.5x the performance of XE6 CPU nodes.
Each blue Cray XE6 node has 2x AMD Opteron CPUs (16 cores per CPU).
Each green Cray XK6 node has 1x AMD Opteron 1600 CPU (16 cores per CPU) and 1x NVIDIA X2090.
GPUs Sustain 5x Performance for Weak Scaling
Weak Scaling with 32K Atoms per Node
[Figure: Loop Time (seconds, 0 to 45) versus Nodes (1, 8, 27, 64, 125, 216, 343, 512, 729). Data labels: 6.7x, 5.8x, 4.8x.]
Performance of 4.8x-6.7x with GPU-accelerated nodes when compared to CPUs alone.
Each blue Cray XE6 node has 2x AMD Opteron CPUs (16 cores per CPU).
Each green Cray XK6 node has 1x AMD Opteron 1600 CPU (16 cores per CPU) and 1x NVIDIA X2090.
Faster, Greener – Worth It!
Energy Consumed in one loop of EAM
[Figure: bar chart. Y-axis: Energy Expended (kJ, 0 to 140), lower is better. Bars: 1 Node, 1 Node + 1 K20X, 1 Node + 2x K20X.]
GPU-accelerated computing uses 53% less energy than CPU only.
Energy Expended = Power x Time
Power calculated by combining the components' TDPs.
Blue node uses 2x E5-2687W (8 cores and 150W per CPU) and CUDA 4.2.9.
Green nodes have 2x E5-2687W and 1 or 2 NVIDIA K20X GPUs (235W) running CUDA 5.0.36.
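The energy estimate on this slide follows from Energy = Power x Time, with power taken as the sum of the component TDPs (150W per CPU, 235W per K20X). The sketch below works through that arithmetic. The function names are illustrative, and the speedup passed in is an input chosen by the reader, since the slide does not list the measured loop times.

```python
CPU_TDP_W = 150.0   # per E5-2687W CPU, from the slide
K20X_TDP_W = 235.0  # per K20X GPU, from the slide


def node_power_w(n_gpus, n_cpus=2):
    """Node power as the sum of component TDPs, as the slide describes."""
    return n_cpus * CPU_TDP_W + n_gpus * K20X_TDP_W


def energy_kj(power_w, loop_time_s):
    """Energy Expended = Power x Time, converted from joules to kilojoules."""
    return power_w * loop_time_s / 1000.0


def energy_saving_fraction(speedup, n_gpus):
    """Fraction of energy saved versus the CPU-only node.

    With time_gpu = time_cpu / speedup, the energy ratio reduces to
    (P_gpu / P_cpu) / speedup, independent of the absolute loop time.
    """
    power_ratio = node_power_w(n_gpus) / node_power_w(0)
    return 1.0 - power_ratio / speedup


# Assumption: the 5.5x peak EAM speedup quoted elsewhere in the deck
# applies to the node with 2x K20X.
print(f"CPU-only node power: {node_power_w(0):.0f} W")
print(f"Node + 2x K20X power: {node_power_w(2):.0f} W")
print(f"Estimated saving at 5.5x: {energy_saving_fraction(5.5, 2):.0%}")
```

Under that assumption the node draws 770W instead of 300W but finishes 5.5 times sooner, giving a saving of roughly 53%, which is consistent with the figure quoted on the slide.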
Molecular Dynamics with LAMMPS on a Hybrid Cray Supercomputer W. Michael Brown National Center for Computational Sciences Oak Ridge National Laboratory NVIDIA Technology Theater, Supercomputing 2012 November 14, 2012
Recommended GPU Node Configuration for LAMMPS Computational Chemistry

Workstation or Single Node Configuration
# of CPU sockets                2
Cores per CPU socket            6+
CPU speed (GHz)                 2.66+
System memory per socket (GB)   32
GPUs                            Kepler K10, K20, K20X; Fermi M2090, M2075, C2075
# of GPUs per CPU socket        1-2
GPU memory preference (GB)      6
GPU to CPU connection           PCIe 2.0 or higher
Server storage                  500 GB or higher
Network configuration           Gemini, InfiniBand

Scale to multiple nodes with same single node configuration
GPU Test Drive
Experience GPU Acceleration
- For Computational Chemistry Researchers, Biophysicists
- Preconfigured with Molecular Dynamics Apps
- Remotely Hosted GPU Servers
- Free & Easy – Sign up, Log in and See Results
www.nvidia.com/gputestdrive