Setting up Chainer and comparing performance
Takuya Nishimoto (nishimotz)
2017-08-30
Sugoi Hiroshima ("Great Hiroshima") with Python [5]
PyCon JP 2017 Tutorial
Interface magazine, August 2017 issue
Book: GPUを支える技術 ("The technology behind GPUs")
NVDA could mean:
• NonVisual Desktop Access
• NVIDIA
Setting up Chainer 1.24
• Windows + Vagrant + VirtualBox
ubuntu/xenial64
$ sudo apt-get install python3-matplotlib
Build the official Python 3.6.2 from source and run make install
Then follow the steps in the Python BootCamp textbook:
$ python3.6 -m venv env
$ . env/bin/activate
$ pip install matplotlib
$ pip install chainer==1.24.0
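To confirm that the venv actually picks up Chainer 1.24.0, a small check like the one below can be run with the venv's python. It is a sketch: module_version and check are hypothetical helper names, and it relies only on the package exposing __version__ (Chainer does). It avoids importlib.metadata, which does not exist on Python 3.6.

```python
# Quick check that the active interpreter imports the intended Chainer version.
# Works on Python 3.6 (importlib.metadata only arrived in 3.8).
import importlib


def module_version(name):
    """Return module.__version__, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check(name, expected):
    """Return (found_version, ok), where ok is True only on an exact match."""
    found = module_version(name)
    return found, found == expected


if __name__ == "__main__":
    found, ok = check("chainer", "1.24.0")
    print("chainer", found or "not importable", "OK" if ok else "MISMATCH")
```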
train_mnist
$ wget https://github.com/pfnet/chainer/archive/v1.24.0.tar.gz
$ tar xzf v1.24.0.tar.gz
$ python chainer-1.24.0/examples/mnist/train_mnist.py
What is MNIST?
https://localab.jp/blog/mnist-for-ml-beginners/
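As far as I know the v1.24.0 example, train_mnist.py trains a small multilayer perceptron (two fully connected hidden layers of "unit" units with ReLU, then a 10-way output) on MNIST's 60,000 training images. The arithmetic below is an illustration under that assumption, sized for the defaults the script prints: # unit: 1000, minibatch 100, 20 epochs.

```python
# Back-of-the-envelope size of the train_mnist.py workload with its defaults.
# Assumption: the MLP is 784 -> unit -> unit -> 10, fully connected, with biases.
def mlp_param_count(n_in, n_units, n_out):
    layers = [(n_in, n_units), (n_units, n_units), (n_units, n_out)]
    # Each Linear layer has an in*out weight matrix plus an out-sized bias.
    return sum(i * o + o for i, o in layers)


def iteration_count(n_train, batch_size, epochs):
    """Return (iterations per epoch, total iterations)."""
    per_epoch = n_train // batch_size
    return per_epoch, per_epoch * epochs


if __name__ == "__main__":
    print(mlp_param_count(784, 1000, 10))
    print(iteration_count(60000, 100, 20))
```

This gives about 1.8 million parameters and 600 iterations per epoch (12,000 in total), which matches the Raspberry Pi progress bar later on, where 300 iterations are 50% of an epoch.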
ThinkPad X260
GPU: -1
# unit: 1000
# Minibatch-size: 100
# epoch: 20
epoch main/loss validation/main/loss main/accuracy validation/main/accuracy elapsed_time
1 0.191114 0.109617 0.942 0.9672 28.4063
2 0.0734576 0.0930718 0.97685 0.971 60.5345
3 0.0507312 0.0619334 0.983633 0.9811 92.2415
4 0.0359045 0.0770756 0.988717 0.9781 124.72
5 0.0277748 0.0769588 0.99085 0.9785 160.511
6 0.0226964 0.0846334 0.992467 0.978 194.347
7 0.0219887 0.0684169 0.99265 0.981 226.282
ThinkPad X260: 714 sec (continued)
8 0.0176104 0.0667137 0.994433 0.9843 259.557
9 0.0170781 0.0892604 0.9948 0.9786 292.731
10 0.0147111 0.0833657 0.99545 0.9822 327.362
11 0.0161634 0.0842604 0.994533 0.9803 361.649
12 0.0106007 0.0931015 0.996767 0.9818 397.067
13 0.0112231 0.0903538 0.996517 0.9814 434.085
14 0.013213 0.0965812 0.996 0.982 470.016
15 0.0105413 0.0995516 0.9966 0.981 507.188
16 0.00924478 0.104709 0.997217 0.9818 546.777
17 0.00905827 0.101083 0.997067 0.9826 586.079
18 0.0108249 0.117545 0.996733 0.9812 632.828
19 0.0103275 0.0996102 0.997033 0.9827 675.499
20 0.00743735 0.0794613 0.997867 0.9852 714.887
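The printed table is easy to summarise in a few lines of Python. This sketch parses the whitespace-separated output (only epochs 1 to 3 and 20 of the run above are copied into LOG) and reports the average time per epoch; parse_log and summarize are hypothetical helper names.

```python
# Parse the table that train_mnist.py prints each epoch and summarise it.
# LOG holds only a few rows (epochs 1-3 and 20) copied from the run above.
LOG = """\
epoch main/loss validation/main/loss main/accuracy validation/main/accuracy elapsed_time
1 0.191114 0.109617 0.942 0.9672 28.4063
2 0.0734576 0.0930718 0.97685 0.971 60.5345
3 0.0507312 0.0619334 0.983633 0.9811 92.2415
20 0.00743735 0.0794613 0.997867 0.9852 714.887
"""


def parse_log(text):
    """Turn the whitespace-separated table into a list of {column: float} dicts."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, map(float, row))) for row in rows]


def summarize(rows):
    """Total time, average time per epoch and final validation accuracy."""
    last = rows[-1]
    return {
        "epochs": int(last["epoch"]),
        "total_sec": last["elapsed_time"],
        "sec_per_epoch": last["elapsed_time"] / last["epoch"],
        "final_val_acc": last["validation/main/accuracy"],
    }


if __name__ == "__main__":
    print(summarize(parse_log(LOG)))
```

For the ThinkPad run this works out to about 35.7 s per epoch on the CPU.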
result/accuracy.png
result/loss.png
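The plots come from the same per-epoch values. As far as I know, Chainer's LogReport also writes them to result/log as a JSON list of dicts keyed like the printed table; assuming that format, this sketch (best_epochs is a hypothetical name) finds the epoch with the lowest validation loss and the one with the highest validation accuracy.

```python
# Find the best epochs in Chainer's LogReport output.
# Assumption: result/log is a JSON list of dicts keyed like the printed table.
import json
import os


def best_epochs(path):
    """Return (epoch with lowest val loss, epoch with highest val accuracy)."""
    with open(path) as f:
        entries = json.load(f)
    by_loss = min(entries, key=lambda e: e["validation/main/loss"])
    by_acc = max(entries, key=lambda e: e["validation/main/accuracy"])
    return by_loss["epoch"], by_acc["epoch"]


if __name__ == "__main__":
    path = "result/log"
    if os.path.exists(path):
        print(best_epochs(path))
```

In the ThinkPad table, validation loss bottoms out at epoch 3 (0.0619) while training loss keeps falling, and the best validation accuracy (0.9852) is at epoch 20; the rising validation loss suggests some overfitting.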
Raspberry Pi 3: estimated 17 hours
$ python chainer-1.24.0/examples/mnist/train_mnist.py
GPU: -1
# unit: 1000
# Minibatch-size: 100
# epoch: 20
Downloading from http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz...
Downloading from http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz...
Downloading from http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz...
Downloading from http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz...
epoch main/loss validation/main/loss main/accuracy validation/main/accuracy elapsed_time
total [#.................................................] 2.50%
this epoch [#########################.........................] 50.00%
300 iter, 0 epoch / 20 epochs
0.18415 iters/sec. Estimated time to finish: 17:38:55.259967.
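The estimate is consistent with simply dividing the remaining iterations by the current throughput. This sketch reproduces it from the printed numbers, assuming 600 iterations per epoch (60,000 images / minibatch 100) and 20 epochs; eta_seconds is a hypothetical helper.

```python
# Reproduce the progress-bar estimate from the numbers it printed.
from datetime import timedelta


def eta_seconds(total_iters, done_iters, iters_per_sec):
    """Remaining iterations divided by the current throughput."""
    return (total_iters - done_iters) / iters_per_sec


if __name__ == "__main__":
    total = 600 * 20  # 600 iterations per epoch x 20 epochs
    done = 300        # "300 iter, 0 epoch / 20 epochs"
    eta = eta_seconds(total, done, 0.18415)
    print(done / total)                 # 0.025, the "2.50%" total bar
    print(timedelta(seconds=int(eta)))  # 17:38:55
```

The full run eventually took 66484 sec, somewhat longer than this early estimate.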
Using a GPU on AWS EC2
• Amazon Linux AMI with NVIDIA GRID GPU Driver
• sudo CUDA_PATH=/opt/nvidia/cuda pip install chainer
• http://qiita.com/unnonouno/items/78ca98cf4911b5135c6f
• us-east-1
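Before running the pip command, it can help to check what CUDA_PATH actually points at. This is a sketch under an assumption: that cuDNN, if installed, was copied into the CUDA tree as include/cudnn.h (a common layout, not the only one). inspect_cuda is a hypothetical helper, and /opt/nvidia/cuda is the path from the command above.

```python
# Check that CUDA_PATH looks like a CUDA tree and whether cuDNN's header is there.
# Assumption: cuDNN lives inside the CUDA tree (include/cudnn.h).
import os
from pathlib import Path


def inspect_cuda(cuda_path):
    root = Path(cuda_path)
    return {
        "nvcc": (root / "bin" / "nvcc").is_file(),
        "cudnn_header": (root / "include" / "cudnn.h").is_file(),
    }


if __name__ == "__main__":
    path = os.environ.get("CUDA_PATH", "/opt/nvidia/cuda")
    print(path, inspect_cuda(path))
```

A False for cudnn_header would predict the "cuDNN is not enabled" warning that shows up later.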
g2.2xlarge : 0.65USD / hr
Amazon Linux + Python 3.6.2
$ sudo yum -y groupinstall 'Development tools'
$ sudo yum -y install openssl-devel sqlite-devel
$ wget https://www.python.org/ftp/python/3.6.2/Python-3.6.2.tgz
$ tar axvf ./Python-3.6.2.tgz
$ cd ./Python-3.6.2/
$ ./configure --with-ensurepip
$ make
$ sudo make install
$ cd ..
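openssl-devel and sqlite-devel are installed first because, without their headers, the source build silently skips the ssl and sqlite3 extension modules (and pip needs ssl to reach PyPI over HTTPS). A sketch of a post-build check, run with the new python3.6; missing_modules is a hypothetical helper.

```python
# After building Python from source, confirm that extension modules depending
# on the -devel packages were actually built (ssl: openssl-devel, sqlite3: sqlite-devel).
import importlib


def missing_modules(names):
    """Return the subset of module names that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_modules(["ssl", "sqlite3"]) or "all present")
```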
cuDNN is not installed
$ python chainer-1.24.0/examples/mnist/train_mnist.py -g1
GPU: 1
# unit: 1000
# Minibatch-size: 100
# epoch: 20
/home/ec2-user/env/lib/python3.6/site-packages/chainer/cuda.py:92: UserWarning: cuDNN is not enabled.
Please reinstall chainer after you install cudnn
(see https://github.com/pfnet/chainer#installation).
'cuDNN is not enabled.\n'
Traceback (most recent call last):
File "chainer-1.24.0/examples/mnist/train_mnist.py", line 130, in <module>
main()
File "chainer-1.24.0/examples/mnist/train_mnist.py", line 67, in main
chainer.cuda.get_device_from_id(args.gpu).use()
File "cupy/cuda/device.pyx", line 89, in cupy.cuda.device.Device.use (cupy/cuda/device.cpp:2275)
File "cupy/cuda/device.pyx", line 95, in cupy.cuda.device.Device.use (cupy/cuda/device.cpp:2227)
File "cupy/cuda/runtime.pyx", line 178, in cupy.cuda.runtime.setDevice (cupy/cuda/runtime.cpp:2915)
File "cupy/cuda/runtime.pyx", line 130, in cupy.cuda.runtime.check_status (cupy/cuda/runtime.cpp:2241)
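cupy.cuda.runtime.CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal
(The UserWarning above is about the missing cuDNN, but the crash itself is "invalid device ordinal": -g1 requests GPU 1, while a single-GPU instance such as the g2.2xlarge from the previous slides only has GPU 0. The later runs use --gpu=0.)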
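"invalid device ordinal" means the requested GPU ID is not smaller than the number of visible GPUs. A guard like the hypothetical validate_gpu_id below turns that into a readable message before Chainer calls setDevice. It is a sketch: the device count is passed in, and in real code it could come from cupy.cuda.runtime.getDeviceCount().

```python
# Validate a requested GPU id before handing it to Chainer.
def validate_gpu_id(gpu_id, device_count):
    """Return gpu_id if usable; a negative id means CPU (-1). Raise otherwise."""
    if gpu_id < 0:
        return -1
    if gpu_id >= device_count:
        raise ValueError(
            "GPU %d requested but only %d GPU(s) visible" % (gpu_id, device_count)
        )
    return gpu_id


if __name__ == "__main__":
    # 1 visible GPU is assumed here (a single-GPU instance).
    print(validate_gpu_id(0, 1))
    try:
        validate_gpu_id(1, 1)  # what -g1 asked for
    except ValueError as err:
        print(err)
```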
Bitfusion Ubuntu 14 Chainer
• https://github.com/bitfusionio/amis/tree/master/awsmrkt-bfboost-ubuntu14-cuda75-chainer
p2.xlarge : 0.99USD / hr
Python 2.7.6 + Chainer 1.21.0
$ python train_mnist.py --gpu=0
GPU: 0
# unit: 1000
# Minibatch-size: 100
# epoch: 20
epoch main/loss validation/main/loss main/accuracy validation/main/accuracy elapsed_time
1 0.191271 0.0934538 0.942817 0.9713 86.1949
2 0.0743666 0.0828909 0.977165 0.9742 89.6567
...
...
17 0.0108012 0.0939435 0.996982 0.9842 138.588
18 0.0121638 0.0951775 0.996616 0.9829 141.862
19 0.00975043 0.108709 0.997082 0.983 145.137
20 0.00649515 0.128444 0.998166 0.9793 148.352
Python 3.6.2 + Chainer 1.24.0
GPU: 0
# unit: 1000
# Minibatch-size: 100
# epoch: 20
epoch main/loss validation/main/loss main/accuracy validation/main/accuracy elapsed_time
1 0.190091 0.107102 0.942867 0.9671 43.8556
2 0.0749841 0.100598 0.976716 0.9685 47.4641
...
...
17 0.0136074 0.106203 0.995732 0.9804 100.789
18 0.0109891 0.0939196 0.996682 0.9831 104.359
19 0.00688507 0.12296 0.997966 0.9806 107.878
20 0.0092544 0.0957009 0.997382 0.984 111.401
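The two p2.xlarge runs differ by about 37 sec in total, and the elapsed_time column shows where. This sketch (split_timing is a hypothetical helper) separates the first epoch from the average of the remaining 19, using only the epoch 1 and epoch 20 values above.

```python
# Split total elapsed time into the first epoch and the steady-state epochs.
def split_timing(first_epoch_sec, last_epoch_sec, epochs):
    """Return (first-epoch seconds, average seconds for each later epoch)."""
    steady = (last_epoch_sec - first_epoch_sec) / (epochs - 1)
    return first_epoch_sec, steady


if __name__ == "__main__":
    runs = [
        ("Python 2.7.6 + Chainer 1.21.0", 86.1949, 148.352),
        ("Python 3.6.2 + Chainer 1.24.0", 43.8556, 111.401),
    ]
    for name, first, last in runs:
        first, steady = split_timing(first, last, 20)
        print("%s: first epoch %.1f s, then %.2f s/epoch" % (name, first, steady))
```

After the first epoch both setups run at roughly 3.3 to 3.6 s per epoch (the Python 2 run is even slightly faster there), so the gap comes almost entirely from the first epoch. What makes the first epoch slower was not measured here.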
Checking the bill
Summary
• Python 3.6.2 + Chainer 1.24.0
• ThinkPad X260 VirtualBox : 714sec
• Core i7-6500U 2.6GHz
• Bitfusion Ubuntu 14 Chainer : 111sec
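• p2.xlarge (0.99 USD/hr): about 6.4 times faster than the PC
• For reference, Python 2.7 + Chainer 1.21: 148 sec
• Jupyter is actually available at http://aa.bb.cc.dd:8888
• Raspberry Pi 3: 66484 sec
• About 18 h 30 min (1/600 the speed of the GPU, 1/93 that of the PC)
• CPU load stayed at 25% the whole time (a single core)
• The computed results were not compared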
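The ratios in the summary can be recomputed from the measured totals. This sketch uses the exact elapsed times from the logs (the summary rounds the GPU run to 111 sec) and adds the compute-time share of the p2.xlarge hourly price; speedup and run_cost_usd are hypothetical helpers.

```python
# Recompute the ratios quoted in the summary from the measured totals (seconds).
GPU_SEC = 111.401  # p2.xlarge, Python 3.6.2 + Chainer 1.24.0
PC_SEC = 714.887   # ThinkPad X260, CPU only, inside VirtualBox
PI_SEC = 66484.0   # Raspberry Pi 3, CPU only


def speedup(slow_sec, fast_sec):
    """How many times faster the fast run finished."""
    return slow_sec / fast_sec


def run_cost_usd(seconds, usd_per_hour):
    """Compute-time share of the hourly price; ignores billing granularity."""
    return seconds / 3600.0 * usd_per_hour


if __name__ == "__main__":
    print("GPU vs PC: %.1fx" % speedup(PC_SEC, GPU_SEC))
    print("PC vs Pi:  %.0fx" % speedup(PI_SEC, PC_SEC))
    print("GPU vs Pi: %.0fx" % speedup(PI_SEC, GPU_SEC))
    print("p2.xlarge, one run: about $%.3f" % run_cost_usd(GPU_SEC, 0.99))
```

This prints 6.4x, 93x and 597x (the summary's 1/600 uses the rounded 111 sec) and about $0.031 of p2.xlarge time per run; the actual bill also depends on how long the instance stays up.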