Great Hiroshima with Python 170830

Chainer environment setup and performance comparison
Takuya Nishimoto (西本卓也, nishimotz), 2017-08-30
Great Hiroshima with Python (すごい広島 with Python) [5]

Published in: Technology

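  1. Chainer environment setup and performance comparison. Takuya Nishimoto (nishimotz), 2017-08-30. Great Hiroshima with Python (すごい広島 with Python) [5]
  2. PyCon JP 2017 Tutorial
  3. Interface (インターフェース), August 2017 issue
  4. GPUを支える技術 ("Technology That Supports GPUs")
  5. What "NVDA" brings to mind
     • NonVisual Desktop Access
     • NVIDIA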
  6. Chainer 1.24 environment setup
     • Windows + Vagrant + VirtualBox, ubuntu/xenial64
     $ sudo apt-get install python3-matplotlib
     Build the official Python 3.6.2 from source and run make install
     Then follow the steps in the Python Boot Camp text:
     $ python3.6 -m venv env
     $ . env/bin/activate
     $ pip install matplotlib
     $ pip install chainer==1.24.0
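After activating the virtual environment, a quick check can confirm that the expected interpreter is in use before installing Chainer. The following is a minimal sketch using only the standard library; the helper names `in_virtualenv` and `meets_minimum` are hypothetical and do not come from the slides.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when prefix differs from base_prefix, as it does inside a venv."""
    if prefix is None:
        prefix = sys.prefix
    if base_prefix is None:
        base_prefix = sys.base_prefix
    return prefix != base_prefix


def meets_minimum(version_info, minimum=(3, 6)):
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    # Report the interpreter that `python` resolves to in the current shell.
    print("interpreter:", sys.executable)
    print("inside venv:", in_virtualenv())
    print("Python >= 3.6:", meets_minimum(sys.version_info))
```

Running this with `python` after `. env/bin/activate` should report an interpreter path under `env/`, assuming the venv was created as on the slide.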
  7. train_mnist
     $ wget https://github.com/pfnet/chainer/archive/v1.24.0.tar.gz
     $ tar xzf v1.24.0.tar.gz
     $ python chainer-1.24.0/examples/mnist/train_mnist.py
     What is MNIST? See https://localab.jp/blog/mnist-for-ml-beginners/
  8. ThinkPad X260
     GPU: -1
     # unit: 1000
     # Minibatch-size: 100
     # epoch: 20
     epoch  main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
     1      0.191114    0.109617              0.942          0.9672                    28.4063
     2      0.0734576   0.0930718             0.97685        0.971                     60.5345
     3      0.0507312   0.0619334             0.983633       0.9811                    92.2415
     4      0.0359045   0.0770756             0.988717       0.9781                    124.72
     5      0.0277748   0.0769588             0.99085        0.9785                    160.511
     6      0.0226964   0.0846334             0.992467       0.978                     194.347
     7      0.0219887   0.0684169             0.99265        0.981                     226.282
  9. ThinkPad X260: 714 sec
     8      0.0176104   0.0667137             0.994433       0.9843                    259.557
     9      0.0170781   0.0892604             0.9948         0.9786                    292.731
     10     0.0147111   0.0833657             0.99545        0.9822                    327.362
     11     0.0161634   0.0842604             0.994533       0.9803                    361.649
     12     0.0106007   0.0931015             0.996767       0.9818                    397.067
     13     0.0112231   0.0903538             0.996517       0.9814                    434.085
     14     0.013213    0.0965812             0.996          0.982                     470.016
     15     0.0105413   0.0995516             0.9966         0.981                     507.188
     16     0.00924478  0.104709              0.997217       0.9818                    546.777
     17     0.00905827  0.101083              0.997067       0.9826                    586.079
     18     0.0108249   0.117545              0.996733       0.9812                    632.828
     19     0.0103275   0.0996102             0.997033       0.9827                    675.499
     20     0.00743735  0.0794613             0.997867       0.9852                    714.887
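The elapsed_time column on slides 8 and 9 is cumulative, so per-epoch durations have to be derived by differencing. The sketch below does that with the values copied from the ThinkPad X260 log; the function name `per_epoch_seconds` is a hypothetical helper, and the first entry also includes whatever startup time preceded epoch 1.

```python
# Cumulative elapsed_time (seconds) for epochs 1-20, copied from the ThinkPad X260 log.
ELAPSED = [
    28.4063, 60.5345, 92.2415, 124.72, 160.511, 194.347, 226.282,
    259.557, 292.731, 327.362, 361.649, 397.067, 434.085, 470.016,
    507.188, 546.777, 586.079, 632.828, 675.499, 714.887,
]


def per_epoch_seconds(elapsed):
    """Convert cumulative elapsed times into per-epoch durations."""
    durations = []
    previous = 0.0
    for t in elapsed:
        durations.append(t - previous)
        previous = t
    return durations


durations = per_epoch_seconds(ELAPSED)
mean_seconds = ELAPSED[-1] / len(ELAPSED)

print("mean: %.1f s/epoch" % mean_seconds)
print("fastest: %.1f s, slowest: %.1f s" % (min(durations), max(durations)))
```

Differencing makes it easier to see whether epochs slow down over the run, which the cumulative column hides.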
  10. result/accuracy.png
  11. result/loss.png
  12. Raspberry Pi 3: estimated 17 hours
      $ python chainer-1.24.0/examples/mnist/train_mnist.py
      GPU: -1
      # unit: 1000
      # Minibatch-size: 100
      # epoch: 20
      Downloading from http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz...
      Downloading from http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz...
      Downloading from http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz...
      Downloading from http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz...
      epoch main/loss validation/main/loss main/accuracy validation/main/accuracy elapsed_time
      total [#.................................................] 2.50%
      this epoch [#########################.........................] 50.00%
      300 iter, 0 epoch / 20 epochs
      0.18415 iters/sec. Estimated time to finish: 17:38:55.259967.
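The estimate in the Raspberry Pi 3 log can be reproduced by hand from the numbers it prints. The sketch below assumes the standard MNIST training set of 60,000 images, which is consistent with 300 iterations being 50% of one epoch at a minibatch size of 100; the variable names are illustrative only.

```python
import datetime

TRAIN_IMAGES = 60000     # MNIST training set size (assumption, matches the 50% mark)
BATCH_SIZE = 100         # "# Minibatch-size: 100" in the log
EPOCHS = 20              # "# epoch: 20" in the log
DONE_ITERS = 300         # "300 iter" in the progress line
ITERS_PER_SEC = 0.18415  # rate reported in the log

iters_per_epoch = TRAIN_IMAGES // BATCH_SIZE
total_iters = iters_per_epoch * EPOCHS
progress = DONE_ITERS / total_iters

# Remaining work divided by the current rate gives the time to finish.
remaining_seconds = (total_iters - DONE_ITERS) / ITERS_PER_SEC
eta = datetime.timedelta(seconds=round(remaining_seconds))

print("total iterations:", total_iters)
print("progress: %.2f%%" % (progress * 100))
print("estimated time to finish:", eta)
```

With the rounded rate shown in the log, this lands within a second of the reported 17:38:55, and the progress figure matches the 2.50% on the total bar.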
  13. Using a GPU on AWS EC2
      • Amazon Linux AMI with NVIDIA GRID GPU Driver
      • sudo CUDA_PATH=/opt/nvidia/cuda pip install chainer
      • http://qiita.com/unnonouno/items/78ca98cf4911b5135c6f
      • us-east-1
  14. g2.2xlarge: 0.65 USD/hr
  15. Amazon Linux + Python 3.6.2
      $ sudo yum -y groupinstall 'Development tools'
      $ sudo yum -y install openssl-devel sqlite-devel
      $ wget https://www.python.org/ftp/python/3.6.2/Python-3.6.2.tgz
      $ tar axvf ./Python-3.6.2.tgz
      $ cd ./Python-3.6.2/
      $ ./configure --with-ensurepip
      $ make
      $ sudo make install
      $ cd ..
  16. cuDNN is not installed
      $ python chainer-1.24.0/examples/mnist/train_mnist.py -g1
      GPU: 1
      # unit: 1000
      # Minibatch-size: 100
      # epoch: 20
      /home/ec2-user/env/lib/python3.6/site-packages/chainer/cuda.py:92: UserWarning: cuDNN is not enabled.
      Please reinstall chainer after you install cudnn
      (see https://github.com/pfnet/chainer#installation).
        'cuDNN is not enabled.\n'
      Traceback (most recent call last):
        File "chainer-1.24.0/examples/mnist/train_mnist.py", line 130, in <module>
          main()
        File "chainer-1.24.0/examples/mnist/train_mnist.py", line 67, in main
          chainer.cuda.get_device_from_id(args.gpu).use()
        File "cupy/cuda/device.pyx", line 89, in cupy.cuda.device.Device.use (cupy/cuda/device.cpp:2275)
        File "cupy/cuda/device.pyx", line 95, in cupy.cuda.device.Device.use (cupy/cuda/device.cpp:2227)
        File "cupy/cuda/runtime.pyx", line 178, in cupy.cuda.runtime.setDevice (cupy/cuda/runtime.cpp:2915)
        File "cupy/cuda/runtime.pyx", line 130, in cupy.cuda.runtime.check_status (cupy/cuda/runtime.cpp:2241)
      cupy.cuda.runtime.CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal
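The cuDNN message on slide 16 is only a warning; the run actually stops at `cudaErrorInvalidDevice: invalid device ordinal`, which CUDA reports when the requested device ID does not exist. CUDA device IDs start at 0, so on an instance with a single GPU `-g1` would point past the only device. The slide does not state the cause, so this reading is an assumption, though it is consistent with the later runs using `--gpu=0`. A minimal sketch of checking the ID up front follows; `validate_gpu_id` and its `device_count` argument are hypothetical and not part of Chainer or CuPy.

```python
def validate_gpu_id(gpu_id, device_count):
    """Return gpu_id if it can be used, otherwise raise ValueError.

    A negative ID selects the CPU, as in train_mnist.py ("GPU: -1" in the logs).
    device_count is supplied by the caller; how it is obtained is left open here.
    """
    if gpu_id < 0:
        return gpu_id
    if gpu_id >= device_count:
        raise ValueError(
            "GPU ID %d is out of range: valid IDs are 0 to %d"
            % (gpu_id, device_count - 1)
        )
    return gpu_id


if __name__ == "__main__":
    # With one GPU, ID 0 is accepted and ID 1 is rejected.
    print(validate_gpu_id(0, device_count=1))
    try:
        validate_gpu_id(1, device_count=1)
    except ValueError as error:
        print(error)
```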
  17. Bitfusion Ubuntu 14 Chainer
      • https://github.com/bitfusionio/amis/tree/master/awsmrkt-bfboost-ubuntu14-cuda75-chainer
  18. p2.xlarge: 0.99 USD/hr
  19. Python 2.7.6 + Chainer 1.21.0
      $ python train_mnist.py --gpu=0
      GPU: 0
      # unit: 1000
      # Minibatch-size: 100
      # epoch: 20
      epoch  main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
      1      0.191271    0.0934538             0.942817       0.9713                    86.1949
      2      0.0743666   0.0828909             0.977165       0.9742                    89.6567
      ...
      17     0.0108012   0.0939435             0.996982       0.9842                    138.588
      18     0.0121638   0.0951775             0.996616       0.9829                    141.862
      19     0.00975043  0.108709              0.997082       0.983                     145.137
      20     0.00649515  0.128444              0.998166       0.9793                    148.352
  20. Python 3.6.2 + Chainer 1.24.0
      GPU: 0
      # unit: 1000
      # Minibatch-size: 100
      # epoch: 20
      epoch  main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
      1      0.190091    0.107102              0.942867       0.9671                    43.8556
      2      0.0749841   0.100598              0.976716       0.9685                    47.4641
      ...
      17     0.0136074   0.106203              0.995732       0.9804                    100.789
      18     0.0109891   0.0939196             0.996682       0.9831                    104.359
      19     0.00688507  0.12296               0.997966       0.9806                    107.878
      20     0.0092544   0.0957009             0.997382       0.984                     111.401
  21. Checking the bill
  22. Summary
      • Python 3.6.2 + Chainer 1.24.0
      • ThinkPad X260 VirtualBox: 714 sec
      • Core i7-6500U 2.6 GHz
      • Bitfusion Ubuntu 14 Chainer: 111 sec
      • p2.xlarge (0.99 USD/hr) is about 6.4 times faster than the PC
      • For reference, Python 2.7 + Chainer 1.21: 148 sec
      • Jupyter is in fact available at http://aa.bb.cc.dd:8888
      • Raspberry Pi 3: 66484 sec
      • About 18 hours 30 minutes (1/600 the speed of the GPU, 1/93 the speed of the PC)
      • CPU load stayed at 25% throughout (single core)
      • The computed results were not compared
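The ratios quoted in the summary follow directly from the three wall-clock times. The sketch below recomputes them. It also gives a rough cost for one GPU run, which is an added illustration and assumes pro-rated billing at the listed hourly price (actual billing may round up to a larger increment). All variable names are illustrative.

```python
import datetime

PC_SEC = 714            # ThinkPad X260, VirtualBox
GPU_SEC = 111           # p2.xlarge with the Bitfusion image
PI_SEC = 66484          # Raspberry Pi 3
P2_USD_PER_HOUR = 0.99  # price listed on the slides


def speedup(slow_sec, fast_sec):
    """Return how many times faster the second run is than the first."""
    return slow_sec / fast_sec


gpu_vs_pc = speedup(PC_SEC, GPU_SEC)
gpu_vs_pi = speedup(PI_SEC, GPU_SEC)
pc_vs_pi = speedup(PI_SEC, PC_SEC)
pi_duration = datetime.timedelta(seconds=PI_SEC)

# Assumption: cost is pro-rated per second; real billing may round up.
gpu_run_usd = P2_USD_PER_HOUR * GPU_SEC / 3600

print("GPU vs PC: %.1fx" % gpu_vs_pc)
print("GPU vs Raspberry Pi 3: %.0fx" % gpu_vs_pi)
print("PC vs Raspberry Pi 3: %.0fx" % pc_vs_pi)
print("Raspberry Pi 3 run time:", pi_duration)
print("pro-rated GPU run cost: %.3f USD" % gpu_run_usd)
```

The results round to the figures on the slide: roughly 6.4 times, 1/600 and 1/93, with the Raspberry Pi 3 run lasting a little under 18 hours 30 minutes.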
