
LAMMPS Molecular Dynamics on GPU

Benchmarks showing benefits of running LAMMPS Molecular Dynamics Application on GPUs

  • Loop time (s) by node configuration:
      CPU only         382.13
      CPU + 1x K10     225
      CPU + 2x K10     115.4
      CPU + 1x K20     154.6
      CPU + 2x K20      84.2
      CPU + 1x K20X    130.5
      CPU + 2x K20X     69.9
  • Loop time by configuration:
      2x X5670 (HP Z800)       2717.630
      1x M2090 (2x X5570)       511.750
      2x M2090 (2x X5570)       274.970
      3x M2090 (2x X5570)       210.430
      4x M2090 (2x X5570)       148.880
  • Nodes   CPU-only time   CPU+GPU time   GPU speedup ratio
      300     563.96          159.06         3.55
      400     423.83          118.62         3.57
      500     339.62           96.44         3.52
      600     281.58           81.03         3.48
      700     260.98           71.57         3.65
      800     220.83           63.76         3.46
      900     203.13           58.96         3.45
  • Nodes   Box size   Atoms      CPU time   CPU+GPU time   GPU speedup
      1       1x1x1      32768      42.2       6.33           6.67x
      8       2x2x2      262144     41.8       6.73           6.21x
      27      3x3x3      884736     41.5       6.86           6.05x
      64      4x4x4      2097152    41.5       7.18           5.78x
      125     5x5x5      4096000    41.4       7.18           5.77x
      216     6x6x6      7077888    42         7.66           5.48x
      343     7x7x7      11239424   41.9       8.34           5.02x
      512     8x8x8      16777216   42.3       8.41           5.03x
      729     9x9x9      23887872   42.5       8.92           4.76x
  • Configuration    Power (W)   Time   Energy spent
      CPU              300         382    114
      CPU + 1x K20X    535         130    69
      CPU + 2x K20X    770         70     54
  • Before we end this session I would like to tell you about GPU Test Drive. It is an excellent resource for computational chemistry researchers such as yourself to evaluate the benefits of GPU computing in speeding up your simulations. Most importantly, it is free. NVIDIA, along with its partners, is offering access to a remotely hosted GPU cluster. You can run applications such as AMBER and NAMD to find out how your models speed up. You can also try code that you have developed to run on GPUs and see how it scales on an 8-GPU cluster. All you need to do is sign up and log in; it is really that easy! We have several partners who are demonstrating the GPU Test Drive on the GTC show floor. Please plan on visiting them. Sign-up forms have been given out. If you are interested, please fill them out and return them to me.
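The speedup ratios in the scaling notes above follow directly from the listed loop times. A minimal Python sketch of that arithmetic, assuming the times are in seconds; the function names are illustrative, and the two efficiency figures are standard derived metrics that the slides themselves do not report:

```python
# Loop times copied from the strong-scaling note (300 to 900 nodes).
nodes = [300, 400, 500, 600, 700, 800, 900]
cpu_only = [563.96, 423.83, 339.62, 281.58, 260.98, 220.83, 203.13]
cpu_gpu = [159.06, 118.62, 96.44, 81.03, 71.57, 63.76, 58.96]

# CPU+GPU loop times copied from the weak-scaling note (1 to 729 nodes).
weak_nodes = [1, 8, 27, 64, 125, 216, 343, 512, 729]
weak_cpu_gpu = [6.33, 6.73, 6.86, 7.18, 7.18, 7.66, 8.34, 8.41, 8.92]


def speedup(t_reference, t_accelerated):
    """Ratio of reference loop time to accelerated loop time."""
    return t_reference / t_accelerated


def strong_scaling_efficiency(node_counts, times):
    """Efficiency relative to the smallest run: (t0 * n0) / (t * n)."""
    n0, t0 = node_counts[0], times[0]
    return [(t0 * n0) / (t * n) for n, t in zip(node_counts, times)]


def weak_scaling_efficiency(times):
    """Efficiency relative to the single-node run: t(1) / t(N)."""
    return [times[0] / t for t in times]


gpu_speedups = [speedup(c, g) for c, g in zip(cpu_only, cpu_gpu)]
gpu_strong_eff = strong_scaling_efficiency(nodes, cpu_gpu)
gpu_weak_eff = weak_scaling_efficiency(weak_cpu_gpu)

for n, s, e in zip(nodes, gpu_speedups, gpu_strong_eff):
    print(f"{n:4d} nodes: GPU speedup {s:.2f}x, strong-scaling efficiency {e:.0%}")

for n, e in zip(weak_nodes, gpu_weak_eff):
    print(f"{n:4d} nodes: weak-scaling efficiency {e:.0%}")
```

The computed speedups reproduce the 3.45 to 3.65 ratios listed in the note; the efficiency columns are an added view of the same data, not figures from the deck.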
  • Transcript of "LAMMPS Molecular Dynamics on GPU"

    1. LAMMPS, Dec. 2011 or later
    2. Summary/Conclusions
       Benefits of GPU Accelerated Computing:
         • Faster than CPU only systems in all tests
         • Large performance boost with small marginal price increase
         • Energy usage cut in half
         • GPUs scale very well within a node and over multiple nodes
         • Tesla K20 GPU is our fastest and lowest power high performance GPU to date
       Try GPU accelerated LAMMPS for free: www.nvidia.com/GPUTestDrive
    3. More Science for Your Money
       Embedded Atom Model
       Blue node uses 2x E5-2687W (8 Cores and 150W per CPU). Green nodes have 2x E5-2687W and 1 or 2 NVIDIA K10, K20, or K20X GPUs (235W).
       Speedup Compared to CPU Only:
         CPU + 1x K10     1.7
         CPU + 1x K20     2.47
         CPU + 1x K20X    2.92
         CPU + 2x K10     3.3
         CPU + 2x K20     4.5
         CPU + 2x K20X    5.5
       Experience performance increases of up to 5.5x with Kepler GPU nodes.
    4. K20X, the Fastest GPU Yet
       Blue node uses 2x E5-2687W (8 Cores and 150W per CPU). Green nodes have 2x E5-2687W and 2 NVIDIA M2090s or K20X GPUs (235W).
       [Chart: Speedup Relative to CPU Alone for CPU Only, CPU + 2x M2090, CPU + K20X, and CPU + 2x K20X]
       Experience performance increases of up to 6.2x with Kepler GPU nodes. One K20X performs as well as two M2090s.
    5. Get a CPU Rebate to Fund Part of Your GPU Budget
       Acceleration in Loop Time Computation by Additional GPUs, Running NAMD version 2.9
       The blue node contains Dual X5670 CPUs (6 Cores per CPU). The green nodes contain Dual X5570 CPUs (4 Cores per CPU) and 1-4 NVIDIA M2090 GPUs.
       Speedup Normalized to CPU Only:
         1 Node + 1x M2090     5.31
         1 Node + 2x M2090     9.88
         1 Node + 3x M2090    12.9
         1 Node + 4x M2090    18.2
       Increase performance 18x when compared to CPU-only nodes. Cheaper CPUs used with GPUs AND still faster overall performance when compared to more expensive CPUs!
    6. Excellent Strong Scaling on Large Clusters
       LAMMPS Gay-Berne 134M Atoms
       [Chart: Loop Time (seconds) versus Nodes, 300 to 900, for GPU Accelerated XK6 and CPU only XE6; labeled speedups 3.55x at 300 nodes, 3.48x at 600 nodes, 3.45x at 900 nodes]
       From 300-900 nodes, the NVIDIA GPU-powered XK6 maintained 3.5x performance compared to XE6 CPU nodes.
       Each blue Cray XE6 node has 2x AMD Opteron CPUs (16 Cores per CPU). Each green Cray XK6 node has 1x AMD Opteron 1600 CPU (16 Cores per CPU) and 1x NVIDIA X2090.
    7. GPUs Sustain 5x Performance for Weak Scaling
       Weak Scaling with 32K Atoms per Node
       [Chart: Loop Time (seconds) versus Nodes, 1 to 729; labeled speedups 6.7x, 5.8x, 4.8x]
       Performance of 4.8x-6.7x with GPU-accelerated nodes when compared to CPUs alone.
       Each blue Cray XE6 node has 2x AMD Opteron CPUs (16 Cores per CPU). Each green Cray XK6 node has 1x AMD Opteron 1600 CPU (16 Cores per CPU) and 1x NVIDIA X2090.
    8. Faster, Greener - Worth It!
       Energy Consumed in one loop of EAM
       [Chart: Energy Expended (kJ), lower is better, for 1 Node, 1 Node + 1 K20X, and 1 Node + 2x K20X]
       GPU-accelerated computing uses 53% less energy than CPU only.
       Energy Expended = Power x Time. Power calculated by combining the components' TDPs.
       Blue node uses 2x E5-2687W (8 Cores and 150W per CPU) and CUDA 4.2.9. Green nodes have 2x E5-2687W and 1 or 2 NVIDIA K20X GPUs (235W) running CUDA 5.0.36.
    9. Molecular Dynamics with LAMMPS on a Hybrid Cray Supercomputer
       W. Michael Brown, National Center for Computational Sciences, Oak Ridge National Laboratory
       NVIDIA Technology Theater, Supercomputing 2012, November 14, 2012
    10. Early Kepler Benchmarks on Titan
        [Figure: Time (s) versus Nodes for XK6, XK6+GPU, and XK7+GPU; panels: Atomic Fluid, Bulk Copper]
    11. Early Kepler Benchmarks on Titan
        [Figure: Time (s) versus Nodes for XK6, XK6+GPU, and XK7+GPU; panels: Protein, Liquid Crystal]
    12. Early Titan XK6/XK7 Benchmarks
        Speedup with Acceleration on XK6/XK7 Nodes (1 Node = 32K Particles, 900 Nodes = 29M Particles)
                             Atomic Fluid      Atomic Fluid      Bulk     Protein   Liquid
                             (cutoff = 2.5σ)   (cutoff = 5.0σ)   Copper             Crystal
          XK6 (1 Node)       1.92              4.33              2.12     2.6       5.82
          XK7 (1 Node)       2.90              8.38              3.66     3.36      15.70
          XK6 (900 Nodes)    1.68              3.96              2.15     1.56      5.60
          XK7 (900 Nodes)    2.75              7.48              2.86     1.95      10.14
    13. Recommended GPU Node Configuration for LAMMPS Computational Chemistry
        Workstation or Single Node Configuration:
          # of CPU sockets                  2
          Cores per CPU socket              6+
          CPU speed (GHz)                   2.66+
          System memory per socket (GB)     32
          GPUs                              Kepler K10, K20, K20X; Fermi M2090, M2075, C2075
          # of GPUs per CPU socket          1-2
          GPU memory preference (GB)        6
          GPU to CPU connection             PCIe 2.0 or higher
          Server storage                    500 GB or higher
          Network configuration             Gemini, InfiniBand
        Scale to multiple nodes with same single node configuration
    14. GPU Test Drive
        Experience GPU Acceleration
        For Computational Chemistry Researchers, Biophysicists
        Preconfigured with Molecular Dynamics Apps
        Remotely Hosted GPU Servers
        Free & Easy: Sign up, Log in and See Results
        www.nvidia.com/gputestdrive
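Slide 8 defines Energy Expended = Power x Time, with power taken as the sum of component TDPs. The sketch below works through that arithmetic in Python using the TDPs stated on the slide (150 W per CPU, 235 W per K20X) and the loop times from the speaker notes. It assumes the loop times are in seconds, which is what gives kJ; the function and variable names are illustrative and not part of the deck.

```python
CPU_TDP_W = 150  # per E5-2687W CPU, as stated on slide 8
GPU_TDP_W = 235  # per K20X GPU, as stated on slide 8


def node_power_w(n_cpus, n_gpus):
    """Node power estimated as the sum of component TDPs (slide 8's method)."""
    return n_cpus * CPU_TDP_W + n_gpus * GPU_TDP_W


def energy_kj(power_w, time_s):
    """Energy Expended = Power x Time, converted from joules to kilojoules."""
    return power_w * time_s / 1000.0


# (CPUs, GPUs, loop time) per configuration; loop times from the speaker notes.
configs = {
    "1 Node": (2, 0, 382.13),
    "1 Node + 1 K20X": (2, 1, 130.5),
    "1 Node + 2x K20X": (2, 2, 69.9),
}

energies = {
    name: energy_kj(node_power_w(cpus, gpus), loop_time)
    for name, (cpus, gpus, loop_time) in configs.items()
}

baseline = energies["1 Node"]
for name, e in energies.items():
    saving = 1.0 - e / baseline
    print(f"{name:18s} {e:7.1f} kJ  ({saving:.0%} less than CPU only)")
```

With two K20X GPUs the node draws more than twice the power of the CPU-only node, but the loop finishes so much sooner that the total energy drops by about 53%, matching the figure quoted on the slide.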