AMBER Molecular Dynamics on GPU


Benchmarks showing benefits of running AMBER Molecular Dynamics Application on GPUs

  • Configuration                  ns/day
    Dual E5-2687W CPUs               3.4
    Dual E5-2687W CPUs + M2090      11.9
    Dual E5-2687W CPUs + K10        18.9
    Dual E5-2687W CPUs + K20        22.4
    Dual E5-2687W CPUs + K20X       25.39
  • System        cpu ns/day   gpu ns/day
    TRPcage         210          420
    JAC NVE          12.47        68.6
    Factor IX         3.42        18.9
    Cellulose         0.74         3.73
    Myoglobin         6.12       122.3
    Nucleosome        0.1          2.4
  • System               cpu ns/day   gpu ns/day
    TRPcage GB             210.32       559.32
    JAC NVE PME             12.47        81.09
    Factor IX NVE PME        3.42        22.44
    Cellulose NVE PME        0.74         5.39
    Myoglobin GB             6.12       156.45
    Nucleosome GB            0.10         2.80
    SPFP, ECC off
  • System        cpu ns/day   gpu ns/day
    TRPcage         210          585
    JAC NVE          12.47        89.13
    Factor IX         3.42        25.4
    Cellulose         0.74         6.14
    Myoglobin         6.12       175.77
    Nucleosome        0.1          3.13
  • Nodes   cpu ns/day   gpu ns/day
    1         0.65         3.31
    2         1.14         4.13
    4         2.01         4.8
  • 1 CPU node (dual CPUs) = 12.47 ns/day
    1 CPU + GPU node (dual CPUs and GPUs) = 95.59 ns/day
  • Costs:
    CPU: 8 nodes * 2 CPU/node * $2000/CPU
    GPU: 2 CPUs + 1 GPU
    Performance:
    CPU: 40.44 ns/day from Gordon supercomputer
    GPU: 68.6 ns/day
  • Perflab (Cellulose, ns/day):
                  no gpu   k10    k20    k20x   2 k10   2 k20   2 k20x
    cell 1 cpu     0.37    4.44   5.4    6.16   6.37    6.93    7.67
    cell 2 cpu     0.74    4.34   5.39   6.14   6.4     6.78    7.5
  • Energy
    Configuration          TDP (W)   sec/ns   Energy (kJ)
    2x E5-2687               300      6928      2078
    2x E5-2687 + K10         535      1259       673
    2x E5-2687 + K20         535      1065       569
    2x E5-2687 + K20X        535       969       518
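
The energy figures in the notes follow from the relation stated on the energy slide, Energy Expended = Power x Time. A minimal Python sketch of that arithmetic, assuming TDP stands in for actual power draw (as the deck itself does) and using the JAC NVE throughput values from the notes above. The names `CONFIGS`, `energy_per_ns_kj`, and `energy_table` are illustrative and do not come from the original deck.

```python
SECONDS_PER_DAY = 86_400

# Nominal power per configuration in watts, using the TDP figures quoted in
# the deck: 2 x 150 W CPUs, plus 235 W for one GPU board (assumption: TDP
# approximates real power draw).
CPU_ONLY_W = 2 * 150
CPU_PLUS_GPU_W = CPU_ONLY_W + 235

# (power in W, JAC NVE throughput in ns/day); throughput from the slide notes.
CONFIGS = {
    "CPU only": (CPU_ONLY_W, 12.47),
    "CPU + K10": (CPU_PLUS_GPU_W, 68.6),
    "CPU + K20": (CPU_PLUS_GPU_W, 81.09),
    "CPU + K20X": (CPU_PLUS_GPU_W, 89.13),
}


def energy_per_ns_kj(power_w: float, ns_per_day: float) -> float:
    """Energy in kJ to simulate 1 ns: power (W) x wall-clock seconds per ns."""
    seconds_per_ns = SECONDS_PER_DAY / ns_per_day
    return power_w * seconds_per_ns / 1000.0


def energy_table() -> dict:
    """Map each configuration name to its energy per simulated ns in kJ."""
    return {name: energy_per_ns_kj(p, t) for name, (p, t) in CONFIGS.items()}


if __name__ == "__main__":
    table = energy_table()
    baseline = table["CPU only"]
    for name, kj in table.items():
        saving = 100.0 * (1.0 - kj / baseline)
        print(f"{name:<11} {kj:8.0f} kJ/ns  {saving:5.1f}% less than CPU only")
```

The results land within about 1 kJ of the table, which appears to truncate the seconds-per-ns values before multiplying.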

Transcript

  • 1. AMBER 12 GPU Support Revision 12.1, 12/21/2012
    AMBER Benchmark Report, Revision 2.0, dated Nov. 5, 2012
  • 2. Benefits of GPU AMBER Accelerated Computing
      • Faster than CPU only systems in all tests
      • Most major compute intensive aspects of classical MD ported
      • Large performance boost with marginal price increase
      • Energy usage cut by more than half
      • GPUs scale well within a node and over multiple nodes
      • K20 GPU is our fastest and lowest power high performance GPU yet
    Try GPU accelerated AMBER for free – www.nvidia.com/GPUTestDrive
  • 3. Protein Folding Simulation With AMBER Accelerated By GPUs
    4.7x Faster
    4 core CPU: 80 ns/day
    4 core CPU + Tesla M2070 GPU: 367 ns/day
    Data courtesy of AMBER.org
  • 4. Kepler - Our Fastest Family of GPUs Yet
    Factor IX, running AMBER 12 GPU Support Revision 12.1
    The blue node contains Dual E5-2687W CPUs (8 Cores per CPU).
    The green nodes contain Dual E5-2687W CPUs (8 Cores per CPU) and either 1x NVIDIA M2090, 1x K10 or 1x K20 for the GPU.
    Configuration            Nanoseconds / Day   Speedup
    1 CPU Node                     3.42
    1 CPU Node + M2090            11.85           3.5x
    1 CPU Node + K10              18.90           5.6x
    1 CPU Node + K20              22.44           6.6x
    1 CPU Node + K20X             25.39           7.4x
    GPU speedup/throughput increased from 3.5x (with M2090) to 7.4x (with K20X) when compared to a CPU only node
  • 5. K10 Accelerates Simulations of All Sizes
    Running AMBER 12 GPU Support Revision 12.1
    The blue node contains Dual E5-2687W CPUs (8 Cores per CPU).
    The green nodes contain Dual E5-2687W CPUs (8 Cores per CPU) and 1x NVIDIA K10 GPU.
    Speedup compared to CPU only:
    TRPcage GB             2.00
    JAC NVE PME            5.50
    Factor IX NVE PME      5.53
    Cellulose NVE PME      5.04
    Myoglobin GB          19.98
    Nucleosome GB         24.00
    Gain 24x performance by adding just 1 GPU (Nucleosome) when compared to dual CPU performance
  • 6. Run AMBER 28x Faster With Tesla K20 GPUs
    Running AMBER 12 GPU Support Revision 12.1, SPFP with CUDA 4.2.9, ECC Off
    The blue node contains 2x Intel E5-2687W CPUs (8 Cores per CPU).
    Each green node contains 2x Intel E5-2687W CPUs (8 Cores per CPU) plus 1x NVIDIA K20 GPU.
    Speedup compared to CPU only:
    CPU (all molecules)    1.00
    TRPcage GB             2.66
    JAC NVE PME            6.50
    Factor IX NVE PME      6.56
    Cellulose NVE PME      7.28
    Myoglobin GB          25.56
    Nucleosome GB         28.00
    Gain 28x throughput/performance by adding just one K20 GPU (Nucleosome) when compared to dual CPU performance
  • 7. K20X Accelerates Simulations of All Sizes
    Running AMBER 12 GPU Support Revision 12.1
    The blue node contains Dual E5-2687W CPUs (8 Cores per CPU).
    The green nodes contain Dual E5-2687W CPUs (8 Cores per CPU) and 1x NVIDIA K20X GPU.
    Speedup compared to CPU only:
    TRPcage GB             2.79
    JAC NVE PME            7.15
    Factor IX NVE PME      7.43
    Cellulose NVE PME      8.30
    Myoglobin GB          28.59
    Nucleosome GB         31.30
    Gain 31x performance by adding just one K20X GPU (Nucleosome) when compared to dual CPU performance
  • 8. K10 Strong Scaling over Nodes
    Cellulose 408K Atoms (NPT), running AMBER 12 with CUDA 4.2, ECC Off
    The blue nodes contain 2x Intel X5670 CPUs (6 Cores per CPU).
    The green nodes contain 2x Intel X5670 CPUs (6 Cores per CPU) plus 2x NVIDIA K10 GPUs.
    [Chart: Nanoseconds / Day vs. Number of Nodes, CPU Only and With GPU]
    GPU speedup over CPU only: 5.1x on 1 node, 3.6x on 2 nodes, 2.4x on 4 nodes
    GPUs significantly outperform CPUs while scaling over multiple nodes
  • 9. Kepler – Universally Faster
    Running AMBER 12 GPU Support Revision 12.1
    The CPU Only node contains Dual E5-2687W CPUs (8 Cores per CPU).
    The Kepler nodes contain Dual E5-2687W CPUs (8 Cores per CPU) and 1x NVIDIA K10, K20, or K20X GPUs.
    [Chart: Speedups Compared to CPU Only for JAC, Factor IX, and Cellulose on CPU Only, CPU + K10, CPU + K20, and CPU + K20X]
    The Kepler GPUs accelerated all simulations, up to 8x
  • 10. K10 Extreme Performance
    DHFR JAC 23K Atoms (NVE), running AMBER 12 GPU Support Revision 12.1
    The blue node contains Dual E5-2687W CPUs (8 Cores per CPU).
    The green node contains Dual E5-2687W CPUs (8 Cores per CPU) and 2x NVIDIA K10 GPUs.
    Nanoseconds / Day: 12.47 (CPU only node), 97.99 (node with 2x K10)
    Gain 7.8X performance by adding just 2 GPUs when compared to dual CPU performance
  • 11. K20 Extreme Performance
    DHFR JAC 23K Atoms (NVE), running AMBER 12 GPU Support Revision 12.1, SPFP with CUDA 4.2.9, ECC Off
    The blue node contains 2x Intel E5-2687W CPUs (8 Cores per CPU).
    Each green node contains 2x Intel E5-2687W CPUs (8 Cores per CPU) plus 2x NVIDIA K20 GPUs.
    Nanoseconds / Day: 12.47 (CPU only node), 95.59 (node with 2x K20)
    Gain > 7.5X throughput/performance by adding just 2 K20 GPUs when compared to dual CPU performance
  • 12. Replace 8 Nodes with 1 K20 GPU
    Running AMBER 12 GPU Support Revision 12.1, SPFP with CUDA 4.2.9, ECC Off
    The eight (8) blue nodes each contain 2x Intel E5-2687W CPUs (8 Cores, $2000 per CPU).
    Each green node contains 2x Intel E5-2687W CPUs (8 Cores per CPU) plus 1x NVIDIA K20 GPU ($2500 per GPU).
    [Chart, DHFR: Nanoseconds/Day and Cost]
    CPU only (8 nodes): 65.00 ns/day, $32,000.00
    GPU enabled (1 node): 81.09 ns/day, $6,500.00
    Cut down simulation costs to ¼ and gain higher performance
  • 13. Replace 7 Nodes with 1 K10 GPU
    Running AMBER 12 GPU Support Revision 12.1, SPFP with CUDA 4.2.9, ECC Off
    The eight (8) blue nodes each contain 2x Intel E5-2687W CPUs (8 Cores, $2000 per CPU).
    The green node contains 2x Intel E5-2687W CPUs (8 Cores per CPU) plus 1x NVIDIA K10 GPU ($3000 per GPU).
    [Charts, DHFR: Performance on JAC NVE (Nanoseconds / Day) and Cost, each for CPU Only vs. GPU Enabled]
    Cut down simulation costs to ¼ and increase performance by 70%
  • 14. Extra CPUs decrease Performance
    Cellulose NVE, running AMBER 12 GPU Support Revision 12.1
    The orange bars contain one E5-2687W CPU (8 Cores per CPU).
    The blue bars contain Dual E5-2687W CPUs (8 Cores per CPU).
    [Chart: Nanoseconds / Day for 1 E5-2687W vs. 2 E5-2687W, CPU Only and CPU with dual K20s]
    When used with GPUs, dual CPU sockets perform worse than single CPU sockets.
  • 15. Kepler - Greener Science
    Energy used in simulating 1 ns of DHFR JAC, running AMBER 12 GPU Support Revision 12.1
    The blue node contains Dual E5-2687W CPUs (150W each, 8 Cores per CPU).
    The green nodes contain Dual E5-2687W CPUs (8 Cores per CPU) and 1x NVIDIA K10, K20, or K20X GPUs (235W each).
    Energy Expended = Power x Time
    [Chart: Energy Expended (kJ), lower is better, for CPU Only, CPU + K10, CPU + K20, and CPU + K20X]
    The GPU Accelerated systems use 65-75% less energy
  • 16. Recommended GPU Node Configuration for AMBER Computational Chemistry
    Workstation or Single Node Configuration
    # of CPU sockets:                 2
    Cores per CPU socket:             4+ (1 CPU core drives 1 GPU)
    CPU speed (GHz):                  2.66+
    System memory per node (GB):      16
    GPUs:                             Kepler K10, K20, K20X; Fermi M2090, M2075, C2075
    # of GPUs per CPU socket:         1-2 (4 GPUs on 1 socket is good to do 4 fast serial GPU runs)
    GPU memory preference (GB):       6
    GPU to CPU connection:            PCIe 2.0 16x or higher
    Server storage:                   2 TB
    Network configuration:            Infiniband QDR or better
    Scale to multiple nodes with same single node configuration
  • 17. Benefits of GPU AMBER Accelerated Computing
      • Faster than CPU only systems in all tests
      • Most major compute intensive aspects of classical MD ported
      • Large performance boost with marginal price increase
      • Energy usage cut by more than half
      • GPUs scale well within a node and over multiple nodes
      • K20 GPU is our fastest and lowest power high performance GPU yet
    Try GPU accelerated AMBER for free – www.nvidia.com/GPUTestDrive
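
The cost comparisons on slides 12 and 13 come down to simple arithmetic on the quoted list prices. A rough Python sketch, assuming hardware cost is only the CPU and GPU prices given on those slides and that every node has two CPUs. It takes 65.00 ns/day as the eight-node CPU-only throughput on slide 12, which is a reading of the chart rather than something the deck states outright. The `Config` class and `compare` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

CPU_PRICE_USD = 2000  # per E5-2687W CPU, as quoted on slides 12 and 13


@dataclass
class Config:
    name: str
    nodes: int
    gpus_per_node: int
    gpu_price_usd: int
    ns_per_day: float  # JAC NVE (DHFR) throughput taken from the deck

    @property
    def cost_usd(self) -> int:
        # Counts only CPU and GPU list prices; every node has two CPUs.
        per_node = 2 * CPU_PRICE_USD + self.gpus_per_node * self.gpu_price_usd
        return self.nodes * per_node


def compare(cpu: Config, gpu: Config) -> tuple:
    """Return (GPU cost as a fraction of CPU cost, fractional throughput gain)."""
    cost_ratio = gpu.cost_usd / cpu.cost_usd
    perf_gain = gpu.ns_per_day / cpu.ns_per_day - 1.0
    return cost_ratio, perf_gain


# Slide 12: eight CPU-only nodes vs. one node with a K20 ($2500).
cpu_slide12 = Config("8 CPU-only nodes", 8, 0, 0, 65.00)  # 65.00 read off the chart
k20_node = Config("1 node + K20", 1, 1, 2500, 81.09)

# Slide 13: eight CPU-only nodes vs. one node with a K10 ($3000).
cpu_slide13 = Config("8 CPU-only nodes", 8, 0, 0, 40.44)
k10_node = Config("1 node + K10", 1, 1, 3000, 68.6)

if __name__ == "__main__":
    for cpu, gpu in [(cpu_slide12, k20_node), (cpu_slide13, k10_node)]:
        cost_ratio, perf_gain = compare(cpu, gpu)
        print(f"{gpu.name}: ${gpu.cost_usd} vs ${cpu.cost_usd} "
              f"({cost_ratio:.0%} of the cost), {perf_gain:+.0%} throughput")
```

Under these assumptions both GPU nodes cost roughly a fifth of the eight-node CPU setup, which the slides round to ¼.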