
20161121 open hyperscale#6


Introduction to NVIDIA GPU Management (NVIDIA GPUマネージメント入門)


  1. Introduction to NVIDIA GPU Management (NVIDIA GPUマネージメント入門). Mana Murakami (村上 真奈), Deep Learning Solution Architect and CUDA Engineer, NVIDIA. HyperScale Study Group (勉強会) #06.
  2. Recap of last time: the P2 instances are finally here! http://docs.aws.amazon.com/ja_jp/AWSEC2/latest/UserGuide/using_cluster_computing.html
  3. Recap of last time: apparently you should just run this command first. https://aws.amazon.com/jp/blogs/aws/new-p2-instance-type-for-amazon-ec2-up-to-16-gpus/
  4. So what is that command actually doing?
  5. Increase Performance with GPU Boost. Have you heard of GPU Boost for NVIDIA Tesla? A GPU has a base clock (the default clock) and a boost clock, and there can be more than one boost clock. (Figure quoted from the Tesla K40 GPU Boost whitepaper.) https://www.nvidia.com/content/PDF/kepler/nvidia-gpu-boost-tesla-k40-06767-001-v02.pdf
  6. Increase Performance with GPU Boost and K80 Autoboost. Parallel Forall (the NVIDIA developer blog) has a post on improving Tesla K80 performance that explains both GPU Boost and Autoboost: https://devblogs.nvidia.com/parallelforall/increase-performance-gpu-boost-k80-autoboost/
  7. NVIDIA SMI & NVML: Monitoring and Managing NVIDIA GPUs in Cluster Environments
     • NVIDIA Management Library (NVML SDK): http://developer.nvidia.com/tesla-deployment-kit
     • Python NVML bindings: http://pypi.python.org/pypi/nvidia-ml-py/
     • Perl NVML bindings: http://search.cpan.org/~nvbinding/nvidia-ml-pl/
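     A minimal sketch of basic NVML use through the Python bindings listed above (nvidia-ml-py / pynvml); it assumes the package is installed and an NVIDIA driver is loaded, and return types may vary between pynvml versions.

     # Query the driver version and GPU count through NVML (pynvml).
     from pynvml import (nvmlInit, nvmlShutdown, nvmlSystemGetDriverVersion,
                         nvmlDeviceGetCount)

     nvmlInit()                                   # load NVML; requires the NVIDIA driver
     try:
         version = nvmlSystemGetDriverVersion()   # e.g. "361.93.02"
         if isinstance(version, bytes):           # older pynvml returns bytes
             version = version.decode()
         print("Driver version: %s" % version)
         print("GPU count: %d" % nvmlDeviceGetCount())
     finally:
         nvmlShutdown()                           # always release NVML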
  8. The documentation covers a lot of ground, but...
  9. Let's just try it first.
  10. GPU performance configuration steps (GPU Boost):
     1. Check the number of GPUs in the cluster
     2. Check the current GPU clocks
     3. Enable persistence mode
     4. Check SUPPORTED_CLOCKS
     5. Change the clocks
  11. Step 1: check the number of GPUs in the cluster (GPU performance configuration).
     • Run nvidia-smi (if you only need the list of GPUs, nvidia-smi -L is enough).
     $ nvidia-smi
     Sun Nov 20 08:00:34 2016
     +------------------------------------------------------------------------------+
     | NVIDIA-SMI 361.93.02               Driver Version: 361.93.02                 |
     |-------------------------------+----------------------+-----------------------+
     | GPU  Name       Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC  |
     | Fan  Temp  Perf Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M.  |
     |===============================+======================+=======================|
     |   0  Tesla P100-SXM2...    On | 0000:06:00.0     Off |                    0  |
     | N/A   23C    P0   29W / 300W  |     0MiB / 16280MiB  |      0%      Default  |
     +-------------------------------+----------------------+-----------------------+
     |   1  Tesla P100-SXM2...    On | 0000:07:00.0     Off |                    0  |
     | N/A   25C    P0   29W / 300W  |     0MiB / 16280MiB  |      0%      Default  |
     +-------------------------------+----------------------+-----------------------+
     |   2  Tesla P100-SXM2...    On | 0000:0A:00.0     Off |                    0  |
     | N/A   26C    P0   28W / 300W  |     0MiB / 16280MiB  |      0%      Default  |
     +-------------------------------+----------------------+-----------------------+
     |   3  Tesla P100-SXM2...    On | 0000:0B:00.0     Off |                    0  |
     | N/A   24C    P0   29W / 300W  |     0MiB / 16280MiB  |      0%      Default  |
     +-------------------------------+----------------------+-----------------------+
     |   4  Tesla P100-SXM2...    On | 0000:85:00.0     Off |                    0  |
     | N/A   25C    P0   29W / 300W  |     0MiB / 16280MiB  |      0%      Default  |
     +-------------------------------+----------------------+-----------------------+
     |   5  Tesla P100-SXM2...    On | 0000:86:00.0     Off |                    0  |
     | N/A   25C    P0   28W / 300W  |     0MiB / 16280MiB  |      0%      Default  |
     +-------------------------------+----------------------+-----------------------+
     |   6  Tesla P100-SXM2...    On | 0000:89:00.0     Off |                    0  |
     | N/A   26C    P0   29W / 300W  |     0MiB / 16280MiB  |      0%      Default  |
     +-------------------------------+----------------------+-----------------------+
     |   7  Tesla P100-SXM2...    On | 0000:8A:00.0     Off |                    0  |
     | N/A   23C    P0   29W / 300W  |     0MiB / 16280MiB  |      0%      Default  |
     +-------------------------------+----------------------+-----------------------+
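     The same listing can also be scripted with the pynvml bindings from slide 7 (a sketch, assuming nvidia-ml-py is installed); it prints each GPU's index, name and PCI bus ID, similar to nvidia-smi -L.

     # Enumerate GPUs via NVML: index, name and PCI bus ID per device.
     from pynvml import (nvmlInit, nvmlShutdown, nvmlDeviceGetCount,
                         nvmlDeviceGetHandleByIndex, nvmlDeviceGetName,
                         nvmlDeviceGetPciInfo)

     def to_str(value):
         # Older pynvml versions return bytes, newer ones return str.
         return value.decode() if isinstance(value, bytes) else value

     nvmlInit()
     try:
         for i in range(nvmlDeviceGetCount()):
             handle = nvmlDeviceGetHandleByIndex(i)
             name = to_str(nvmlDeviceGetName(handle))
             bus_id = to_str(nvmlDeviceGetPciInfo(handle).busId)
             print("GPU %d: %s (%s)" % (i, name, bus_id))
     finally:
         nvmlShutdown()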
  12. Step 2: check the current GPU clocks (GPU performance configuration).
     • nvidia-smi -q -i [GPU ID or bus ID] -d CLOCK
     $ nvidia-smi -q -i 0 -d CLOCK
     ==============NVSMI LOG==============
     Timestamp                 : Sun Nov 20 20:23:45 2016
     Driver Version            : 361.93.02
     Attached GPUs             : 8
     GPU 0000:07:00.0
         Clocks
             Graphics          : 405 MHz
             SM                : 405 MHz
             Memory            : 715 MHz
             Video             : 835 MHz
         Applications Clocks
             Graphics          : 1328 MHz
             Memory            : 715 MHz
         Default Applications Clocks
             Graphics          : 1328 MHz
             Memory            : 715 MHz
         Max Clocks
             Graphics          : 1480 MHz
             SM                : 1480 MHz
             Memory            : 715 MHz
             Video             : 1480 MHz
         SM Clock Samples
             Duration          : 4499.47 sec
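     A pynvml sketch of the same query (assuming nvidia-ml-py is installed): it reads the current clocks and the application clock targets that nvidia-smi -d CLOCK reports.

     # Read current graphics/SM/memory clocks and the application clock targets of GPU 0.
     from pynvml import (nvmlInit, nvmlShutdown, nvmlDeviceGetHandleByIndex,
                         nvmlDeviceGetClockInfo, nvmlDeviceGetApplicationsClock,
                         NVML_CLOCK_GRAPHICS, NVML_CLOCK_SM, NVML_CLOCK_MEM)

     nvmlInit()
     try:
         handle = nvmlDeviceGetHandleByIndex(0)   # GPU 0, as in the slide
         print("Graphics clock     : %d MHz" % nvmlDeviceGetClockInfo(handle, NVML_CLOCK_GRAPHICS))
         print("SM clock           : %d MHz" % nvmlDeviceGetClockInfo(handle, NVML_CLOCK_SM))
         print("Memory clock       : %d MHz" % nvmlDeviceGetClockInfo(handle, NVML_CLOCK_MEM))
         print("App graphics clock : %d MHz" % nvmlDeviceGetApplicationsClock(handle, NVML_CLOCK_GRAPHICS))
         print("App memory clock   : %d MHz" % nvmlDeviceGetApplicationsClock(handle, NVML_CLOCK_MEM))
     finally:
         nvmlShutdown()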
  13. Step 3: enable persistence mode (GPU performance configuration).
     • Persistence mode keeps the NVIDIA driver loaded even while no CUDA application or X application is running.
     • If the driver is not loaded, it has to be loaded before the computation can start, which can add overhead.
     • sudo nvidia-smi -pm ENABLED -i [GPU ID or bus ID]
     $ nvidia-smi -pm ENABLED -i 0
     Persistence mode is already Enabled for GPU 0000:06:00.0.
     All done.
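     The pynvml equivalent looks roughly as follows (a sketch, assuming nvidia-ml-py is installed); as with nvidia-smi, changing the mode requires root.

     # Enable persistence mode on GPU 0 through NVML (root privileges required).
     from pynvml import (nvmlInit, nvmlShutdown, nvmlDeviceGetHandleByIndex,
                         nvmlDeviceGetPersistenceMode, nvmlDeviceSetPersistenceMode,
                         NVML_FEATURE_ENABLED)

     nvmlInit()
     try:
         handle = nvmlDeviceGetHandleByIndex(0)
         if nvmlDeviceGetPersistenceMode(handle) != NVML_FEATURE_ENABLED:
             nvmlDeviceSetPersistenceMode(handle, NVML_FEATURE_ENABLED)  # root only
         print("Persistence mode enabled for GPU 0")
     finally:
         nvmlShutdown()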
  14. Step 4: check SUPPORTED_CLOCKS (GPU performance configuration).
     • nvidia-smi -q -i [GPU ID or bus ID] -d SUPPORTED_CLOCKS
     Attached GPUs             : 8
     GPU 0000:06:00.0
         Supported Clocks
             Memory            : 715 MHz
                 Graphics      : 1480 MHz
                 Graphics      : 1468 MHz
                 Graphics      : 1455 MHz
                 Graphics      : 1442 MHz
                 Graphics      : 1430 MHz
                 Graphics      : 1417 MHz
                 Graphics      : 1404 MHz
                 Graphics      : 1392 MHz
                 Graphics      : 1379 MHz
                 Graphics      : 1366 MHz
                 Graphics      : 1354 MHz
                 Graphics      : 1341 MHz
                 Graphics      : 1328 MHz
                 Graphics      : 1316 MHz
                 Graphics      : 1303 MHz
                 Graphics      : 1290 MHz
                 Graphics      : 1278 MHz
                 Graphics      : 1265 MHz
                 Graphics      : 1252 MHz
                 Graphics      : 1240 MHz
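     The same information can be pulled through pynvml (a sketch, assuming nvidia-ml-py is installed): each supported memory clock has its own list of supported graphics clocks.

     # For each supported memory clock of GPU 0, list the supported graphics clocks.
     from pynvml import (nvmlInit, nvmlShutdown, nvmlDeviceGetHandleByIndex,
                         nvmlDeviceGetSupportedMemoryClocks,
                         nvmlDeviceGetSupportedGraphicsClocks)

     nvmlInit()
     try:
         handle = nvmlDeviceGetHandleByIndex(0)
         for mem_mhz in nvmlDeviceGetSupportedMemoryClocks(handle):
             gfx_mhz = nvmlDeviceGetSupportedGraphicsClocks(handle, mem_mhz)
             print("Memory %d MHz -> graphics clocks (MHz): %s" % (mem_mhz, gfx_mhz))
     finally:
         nvmlShutdown()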
  15. Step 5: change the clocks (GPU performance configuration).
     • The graphics clock rate depends on the memory clock rate, so both have to be specified together.
     • sudo nvidia-smi -ac <memory,graphics> -i [GPU ID or bus ID]
     $ sudo nvidia-smi -ac 715,1480 -i 0
     Applications clocks set to "(MEM 715, SM 1480)" for GPU 0000:06:00.0
     All done.
     $ sudo nvidia-smi -q -i 0 -d CLOCK
     ==============NVSMI LOG==============
     Timestamp                 : Sun Nov 20 21:16:54 2016
     Driver Version            : 361.93.02
     Attached GPUs             : 8
     GPU 0000:06:00.0
         Clocks
             Graphics          : 405 MHz
             SM                : 405 MHz
             Memory            : 715 MHz
             Video             : 835 MHz
         Applications Clocks
             Graphics          : 1480 MHz
             Memory            : 715 MHz
         Default Applications Clocks
             Graphics          : 1328 MHz
             Memory            : 715 MHz
         Max Clocks
             Graphics          : 1480 MHz
             SM                : 1480 MHz
             Memory            : 715 MHz
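     A pynvml sketch of the same step (assuming nvidia-ml-py is installed); like nvidia-smi -ac it needs root unless clock changes have been set to UNRESTRICTED (see the last slide).

     # Set the application clocks of GPU 0 to 715 MHz memory / 1480 MHz graphics.
     from pynvml import (nvmlInit, nvmlShutdown, nvmlDeviceGetHandleByIndex,
                         nvmlDeviceSetApplicationsClocks)

     MEM_MHZ, GFX_MHZ = 715, 1480   # must be a pair listed under SUPPORTED_CLOCKS

     nvmlInit()
     try:
         handle = nvmlDeviceGetHandleByIndex(0)
         nvmlDeviceSetApplicationsClocks(handle, MEM_MHZ, GFX_MHZ)  # memory clock first, then graphics
         print("Application clocks set to MEM %d / SM %d" % (MEM_MHZ, GFX_MHZ))
     finally:
         nvmlShutdown()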
  16. Let's run a benchmark.
  17. Benchmark: graphics clock = 1328 MHz
     dgxuser@dgx-1:~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody$ ./nbody -benchmark
     Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
         -fullscreen       (run n-body simulation in fullscreen mode)
         -fp64             (use double precision floating point values for simulation)
         -hostmem          (stores simulation data in host memory)
         -benchmark        (run benchmark to measure performance)
         -numbodies=<N>    (number of bodies (>= 1) to run in simulation)
         -device=<d>       (where d=0,1,2.... for the CUDA device to use)
         -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
         -compare          (compares simulation results running once on the default GPU and once on the CPU)
         -cpu              (run n-body simulation on the CPU)
         -tipsy=<file.bin> (load a tipsy model file for simulation)
     NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
     > Windowed mode
     > Simulation data stored in video memory
     > Single precision floating point simulation
     > 1 Devices used for simulation
     GPU Device 0: "Tesla P100-SXM2-16GB" with compute capability 6.0
     > Compute 6.0 CUDA device: [Tesla P100-SXM2-16GB]
     57344 bodies, total time for 10 iterations: 110.661 ms
     = 297.154 billion interactions per second
     = 5943.080 single-precision GFLOP/s at 20 flops per interaction
  18. Benchmark: graphics clock = 1480 MHz
     dgxuser@dgx-1:~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody$ ./nbody -benchmark
     Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
         -fullscreen       (run n-body simulation in fullscreen mode)
         -fp64             (use double precision floating point values for simulation)
         -hostmem          (stores simulation data in host memory)
         -benchmark        (run benchmark to measure performance)
         -numbodies=<N>    (number of bodies (>= 1) to run in simulation)
         -device=<d>       (where d=0,1,2.... for the CUDA device to use)
         -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
         -compare          (compares simulation results running once on the default GPU and once on the CPU)
         -cpu              (run n-body simulation on the CPU)
         -tipsy=<file.bin> (load a tipsy model file for simulation)
     NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
     > Windowed mode
     > Simulation data stored in video memory
     > Single precision floating point simulation
     > 1 Devices used for simulation
     GPU Device 0: "Tesla P100-SXM2-16GB" with compute capability 6.0
     > Compute 6.0 CUDA device: [Tesla P100-SXM2-16GB]
     57344 bodies, total time for 10 iterations: 103.895 ms
     = 316.505 billion interactions per second
     = 6330.091 single-precision GFLOP/s at 20 flops per interaction
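     A quick check of the two runs above: the application graphics clock rises by about 11% (1328 to 1480 MHz), while the measured nbody throughput rises by about 6.5%. The memory clock stays at 715 MHz in both runs, so a smaller-than-proportional gain is plausible (this reading is not stated on the slides).

     # Compare the two nbody runs above: clock ratio vs. measured speedup.
     gflops_1328 = 5943.080                 # GFLOP/s at 1328 MHz (slide 17)
     gflops_1480 = 6330.091                 # GFLOP/s at 1480 MHz (slide 18)
     clock_ratio = 1480.0 / 1328.0          # ~1.114
     speedup = gflops_1480 / gflops_1328    # ~1.065
     print("clock ratio: %.3f, measured speedup: %.3f" % (clock_ratio, speedup))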
  19. Other topics
  20. Other operations (GPU performance configuration):
     • Reset to the default application clocks: sudo nvidia-smi -rac -i [GPU ID or bus ID]
     • Set which users are allowed to change the clocks: sudo nvidia-smi -acp [UNRESTRICTED or RESTRICTED] -i [GPU ID or bus ID]
     • ...and so on.
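     A sketch of the corresponding NVML calls through pynvml (assuming nvidia-ml-py is installed; the constant names below should be checked against the installed pynvml version). Both operations require root.

     # Reset application clocks and restrict clock changes to root (nvidia-smi -rac / -acp RESTRICTED).
     from pynvml import (nvmlInit, nvmlShutdown, nvmlDeviceGetHandleByIndex,
                         nvmlDeviceResetApplicationsClocks, nvmlDeviceSetAPIRestriction,
                         NVML_RESTRICTED_API_SET_APPLICATION_CLOCKS, NVML_FEATURE_ENABLED)

     nvmlInit()
     try:
         handle = nvmlDeviceGetHandleByIndex(0)
         nvmlDeviceResetApplicationsClocks(handle)   # back to the default application clocks
         nvmlDeviceSetAPIRestriction(handle,
                                     NVML_RESTRICTED_API_SET_APPLICATION_CLOCKS,
                                     NVML_FEATURE_ENABLED)  # RESTRICTED: only root may change clocks
     finally:
         nvmlShutdown()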
  21. Thank you!
