
Introduction to Chainer



Chainer is a flexible, intuitive, and powerful deep learning framework. This slide deck introduces some unique features of Chainer and its additional packages such as ChainerMN (distributed learning), ChainerCV (computer vision), and ChainerRL (reinforcement learning).

Published in: Engineering


  1. 1. Last update: 28 July, 2017
  2. 2. Chainer: a deep learning framework. Chainer provides a set of features required for research and development using deep learning, such as designing neural networks, training, and evaluation. (Figure labels: Designing a network; Training, evaluation; Data set)
  3. 3. Features and Characteristics of Chainer
     Powerful
       ☑ CUDA: Supports GPU calculation using CUDA
       ☑ cuDNN: High-speed training/inference by cuDNN
       ☑ NCCL: Supports fast multi-GPU learning using NCCL
     Versatile
       ☑ Convolutional Network: N-dimensional Convolution, Deconvolution, Pooling, BN, etc.
       ☑ Recurrent Network: RNN components such as LSTM, Bi-directional LSTM, GRU and Bi-directional GRU
       ☑ Many Other Components: Many layer definitions and various loss functions used in neural networks
       ☑ Various Optimizers: e.g., SGD, MomentumSGD, AdaGrad, RMSProp, Adam, etc.
     Intuitive
       ☑ Define-by-Run: Easy to write a complicated network
       ☑ High debuggability: User-friendly error messages. Easy to debug using pure Python debuggers.
       ☑ Simple APIs: Well-abstracted common tools for various NN learning, easy to write a set of learning flows
  4. 4. Popularity Growth of Chainer
  5. 5. Neural network = Computational graph NN can be interpreted as a computational graph that applies many linear and nonlinear functions to input vectors
  6. 6. How to handle a computational graph
     Static: A definition of the computational graph exists apart from the code that performs computation according to the definition.
     Dynamic: The actual code that performs computation is treated as the definition of the computational graph.
  7. 7. Chainer is the first deep-learning framework to adopt “Define-by-Run”*
     How about Chainer? → Dynamic
     ● Define-and-Run (static graph): consists of two steps: first build a computational graph, then feed data to it (Caffe, Theano, TensorFlow, etc.)
     ● Define-by-Run (dynamic graph): describing the forward-pass computation constructs the computational graph for the backward computation (Chainer, DyNet, PyTorch, etc.)
     * autograd adopted Define-by-Run, but it was not a framework for deep learning.
  8. 8. Define-and-Run and Define-by-Run
     Define-and-Run:
         # Building
         x = Variable('x')
         y = Variable('y')
         z = x + 2 * y
         # Evaluation
         for xi, yi in data:
             eval(z, (xi, yi))
     Define-by-Run:
         # Build, evaluate at the same time
         for xi, yi in data:
             x = Variable(xi)
             y = Variable(yi)
             z = x + 2 * y
     With Define-by-Run, you can branch to change the forward computation depending on the data.
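To make the Define-by-Run idea concrete, here is a toy sketch in plain Python. The `Var` class and `forward` function are hypothetical names invented for this illustration; they are not `chainer.Variable` and not how Chainer is implemented. The point is only that the graph is recorded while the forward code runs, so an ordinary `if` on the data changes the graph per sample.

```python
class Var:
    """Toy scalar that remembers how it was computed (not chainer.Variable)."""

    def __init__(self, value, parents=()):
        self.value = float(value)
        self.grad = 0.0
        # Each entry is (parent Var, local gradient of this node w.r.t. it).
        self.parents = parents

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __radd__ = __add__
    __rmul__ = __mul__

    def backward(self, upstream=1.0):
        # Walk back through the graph that the forward pass recorded.
        self.grad += upstream
        for parent, local in self.parents:
            parent.backward(upstream * local)


def forward(xi, yi):
    x, y = Var(xi), Var(yi)
    # Data-dependent branch: the recorded graph differs from sample to sample.
    if xi > 0:
        z = x + 2 * y
    else:
        z = x * y
    return x, y, z


for xi, yi in [(1.0, 3.0), (-1.0, 3.0)]:
    x, y, z = forward(xi, yi)
    z.backward()
    print(z.value, x.grad, y.grad)
```

No separate graph-building step exists here: calling `forward` is the graph definition, which is what the slide means by building and evaluating at the same time.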
  9. 9. How to write a Convolutional Network
     • Start writing a model by inheriting Chain class
     • Register parametric layers inside the init_scope
     • Write forward computation in __call__ method (no need to write backward computation)

         import chainer
         import chainer.links as L
         import chainer.functions as F

         class LeNet5(chainer.Chain):
             def __init__(self):
                 super(LeNet5, self).__init__()
                 with self.init_scope():
                     self.conv1 = L.Convolution2D(1, 6, 5, 1)
                     self.conv2 = L.Convolution2D(6, 16, 5, 1)
                     self.conv3 = L.Convolution2D(16, 120, 4, 1)
                     self.fc4 = L.Linear(None, 84)
                     self.fc5 = L.Linear(84, 10)

             def __call__(self, x):
                 h = F.sigmoid(self.conv1(x))
                 h = F.max_pooling_2d(h, 2, 2)
                 h = F.sigmoid(self.conv2(h))
                 h = F.max_pooling_2d(h, 2, 2)
                 h = F.sigmoid(self.conv3(h))
                 h = F.sigmoid(self.fc4(h))
                 return self.fc5(h)
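The kernel sizes in the LeNet5 slide can be checked with a little shape arithmetic. The sketch below assumes a 28x28 single-channel input (for example MNIST), which the slide does not state, and uses hypothetical helper names `conv_out`, `pool_out`, and `lenet5_trace`. It uses floor-mode output sizes; for the even sizes that occur here this should agree with the pooling result regardless of rounding mode.

```python
def conv_out(size, ksize, stride=1, pad=0):
    """Spatial output size of a convolution (floor mode)."""
    return (size + 2 * pad - ksize) // stride + 1


def pool_out(size, ksize, stride):
    """Spatial output size of a pooling window (floor mode)."""
    return (size - ksize) // stride + 1


def lenet5_trace(size=28):
    """Follow the spatial size through the layers of the slide's LeNet5."""
    sizes = [size]
    size = conv_out(size, 5)     # conv1: ksize=5, stride=1
    sizes.append(size)
    size = pool_out(size, 2, 2)  # max_pooling_2d(h, 2, 2)
    sizes.append(size)
    size = conv_out(size, 5)     # conv2: ksize=5, stride=1
    sizes.append(size)
    size = pool_out(size, 2, 2)  # max_pooling_2d(h, 2, 2)
    sizes.append(size)
    size = conv_out(size, 4)     # conv3: ksize=4, stride=1
    sizes.append(size)
    return sizes


print(lenet5_trace(28))
```

Under that input-size assumption, conv3 reduces the feature map to 1x1 with 120 channels, so `fc4` receives 120 values. Passing `None` as the input size of `L.Linear` leaves that number to be inferred at the first forward pass.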
  10. 10. Training models

          import chainer.links as L
          from chainer import iterators, optimizers, training

          model = LeNet5()  # the Chain defined on the previous slide
          model = L.Classifier(model)

          # Dataset is a list! ([] to access, having __len__)
          dataset = [(x1, t1), (x2, t2), ...]

          # Iterator that returns mini-batches retrieved from the dataset
          it = iterators.SerialIterator(dataset, batch_size=32)

          # Optimization methods (you can easily try various methods by changing
          # SGD to MomentumSGD, Adam, RMSprop, AdaGrad, etc.)
          opt = optimizers.SGD(lr=0.01)
          opt.setup(model)

          updater = training.StandardUpdater(it, opt, device=0)  # device=-1 if you use CPU
          trainer = training.Trainer(updater, stop_trigger=(100, 'epoch'))
          trainer.run()  # run the training loop

      For more details, refer to official examples:
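As a rough picture of what the iterator contributes to the training flow above, here is a toy mini-batch iterator in plain Python. `ToySerialIterator` is a hypothetical class written for this illustration, not Chainer's implementation; it only assumes the behavior described on the slide (a list-like dataset in, mini-batches out) plus an epoch counter that a stop trigger such as `(100, 'epoch')` could consult.

```python
import random


class ToySerialIterator:
    """Yield fixed-size mini-batches forever, counting completed epochs."""

    def __init__(self, dataset, batch_size, shuffle=True, seed=0):
        self.dataset = dataset  # must be non-empty and support [] and len()
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = random.Random(seed)
        self.epoch = 0
        self.is_new_epoch = False
        self._new_order()

    def _new_order(self):
        self.order = list(range(len(self.dataset)))
        if self.shuffle:
            self.rng.shuffle(self.order)
        self.pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        self.is_new_epoch = False
        batch = []
        while len(batch) < self.batch_size:
            batch.append(self.dataset[self.order[self.pos]])
            self.pos += 1
            if self.pos == len(self.order):
                # One full pass over the data is done: start the next epoch.
                self.epoch += 1
                self.is_new_epoch = True
                self._new_order()
        return batch


# Emulate a stop trigger of (2, 'epoch') on a ten-item dataset.
it = ToySerialIterator(list(range(10)), batch_size=4, shuffle=False)
n_batches = 0
while it.epoch < 2:
    batch = next(it)
    n_batches += 1
print(n_batches, it.epoch)
```

In this sketch a batch that reaches the end of the data is filled from the start of the next epoch, so every batch has the same size.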
  11. 11. Define-by-Run brings flexibility and intuitiveness
      “Forward computation” becomes a definition of network
      • Depending on data, it is easy to change a network structure
      • You can define a network itself by Python code = The network structure can be treated as a program instead of data.
      For Chainer, the “forward computation” can be written in Python
      • Enables you to write a network structure freely using the syntax of Python
      • Define-by-Run makes it easy to insert any process, like putting a print statement between network computations (with define-and-run, which compiles a network, this kind of debugging is difficult)
      • Easy to reuse code of the same network for other purposes with few changes (e.g. by just adding a conditional branch partially)
      • Easy to check intermediate values and the design of the network itself using external debugging tools etc.
  12. 12. Chainer v2.0.1: significantly reduced memory consumption and an API reorganized in response to user feedback
      • Aggressive Buffer Release reduces memory consumption during training
      • CuPy has been released as an independent library. This allows for array operations using GPU via an interface highly compatible with NumPy.
  13. 13. CuPy: an independent library that handles all GPU calculations in Chainer
      • Lower cost to migrate CPU code to GPU thanks to the NumPy-compatible API
      • Runs linear algebra algorithms such as singular value decomposition on the GPU
      • Rich in examples such as KMeans and Gaussian Mixture Model
      CPU (NumPy):
          import numpy as np
          x = np.random.rand(10)
          W = np.random.rand(10, 5)
          y = np.dot(x, W)
      GPU (CuPy):
          import cupy as cp
          x = cp.random.rand(10)
          W = cp.random.rand(10, 5)
          y = cp.dot(x, W)
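Because the two libraries share an interface, the same function can be written once against whichever array module is available. The sketch below is one possible pattern, not something taken from the slides: `pick_array_module` and `affine` are hypothetical helper names, and the code falls back to NumPy whenever CuPy cannot be imported or no CUDA device is visible.

```python
import numpy as np


def pick_array_module():
    """Return cupy when it imports and a CUDA device is visible, else numpy."""
    try:
        import cupy as cp
        if cp.cuda.runtime.getDeviceCount() > 0:
            return cp
    except Exception:
        # CuPy is missing or no usable GPU was found: stay on the CPU.
        pass
    return np


def affine(xp, n_in=10, n_out=5):
    """Run the slide's dot-product example with the given array module."""
    x = xp.random.rand(n_in)
    W = xp.random.rand(n_in, n_out)
    return xp.dot(x, W)


xp = pick_array_module()
y = affine(xp)
print(xp.__name__, y.shape)
```

Only the module object changes between the CPU and GPU paths; the body of `affine` is identical for both.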
  14. 14. Add-on packages for Chainer: distributed deep learning, deep reinforcement learning, computer vision
      • ChainerMN (Multi-Node): additional package for distributed deep learning. High scalability (100 times faster with 128 GPUs)
      • ChainerRL: deep reinforcement learning library. DQN, DDPG, A3C, ACER, NSQ, PCL, etc. OpenAI Gym support
      • ChainerCV: provides image recognition algorithms and dataset wrappers. Faster R-CNN, Single Shot Multibox Detector (SSD), SegNet, etc.
  15. 15. ChainerMN Chainer + Multi-Node
  16. 16. ChainerMN: Multi-node. ChainerMN keeps Chainer's ease of use while making it easy to speed up training on multiple nodes, each with multiple GPUs. (Diagram labels: GPU; InfiniBand; MPI; NVIDIA NCCL)
  17. 17. Distributed deep learning with ChainerMN: 100x speedup with 128 GPUs
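One way to read the "100x with 128 GPUs" figure is as a scaling efficiency, meaning the achieved speedup divided by the ideal linear speedup. The helper below, `scaling_efficiency`, is a hypothetical name used only for this arithmetic; the two input numbers come from the slide.

```python
def scaling_efficiency(speedup, n_workers):
    """Fraction of ideal linear scaling that was achieved."""
    return speedup / n_workers


efficiency = scaling_efficiency(100, 128)
print("{:.1%}".format(efficiency))
```

With the slide's numbers this comes to roughly 78% of perfect linear scaling.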
  18. 18. Comparison with other frameworks: in a comparison of elapsed time to train ResNet-50 on the ImageNet dataset for 100 epochs, ChainerMN was the fastest (May 2017)
  19. 19. Speedup without dropping the accuracy: we confirmed that almost the same accuracy can be achieved as the number of nodes increases
  20. 20. Scale-out test on Microsoft Azure
  21. 21. Easy-to-use API of ChainerMN: you can start using ChainerMN just by wrapping one line!
      Before:
          optimizer = chainer.optimizers.MomentumSGD()
      After:
          optimizer = chainermn.DistributedOptimizer(
              chainer.optimizers.MomentumSGD())
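Conceptually, wrapping the optimizer means each worker computes gradients on its own mini-batch, the gradients are averaged across workers, and every worker then applies the same update. The sketch below simulates that idea in plain Python with lists standing in for workers. `allreduce_mean` and `distributed_sgd_step` are hypothetical names; this is an assumption about the general data-parallel technique, not ChainerMN's code, and it involves no MPI or NCCL.

```python
def allreduce_mean(grads_per_worker):
    """Average gradients element-wise across workers."""
    n_workers = len(grads_per_worker)
    return [sum(column) / n_workers for column in zip(*grads_per_worker)]


def distributed_sgd_step(params, grads_per_worker, lr=0.01):
    """Apply one SGD update using the averaged gradient."""
    mean_grad = allreduce_mean(grads_per_worker)
    return [p - lr * g for p, g in zip(params, mean_grad)]


# Two simulated workers, each holding a gradient for the same two parameters.
grads = [[1.0, 2.0], [3.0, 4.0]]
print(allreduce_mean(grads))
print(distributed_sgd_step([1.0, 1.0], grads, lr=0.5))
```

Since every worker ends up with the same averaged gradient, the model replicas stay in sync without the user's training code having to change.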
  22. 22. ARM template will be announced soon ↑ Click this to make a master node ↑ Click this to make worker nodes
  23. 23. Scaling via web interface You can launch a scale-set of Azure instances super easily!
  24. 24. ChainerRL Chainer + Reinforcement Learning
  25. 25. ChainerRL: Deep Reinforcement Learning Library. Reinforcement learning: train an agent that interacts with the environment to maximize the rewards. (Diagram labels: Action; Env; Observation, Reward)
  26. 26. Reinforcement Learning with ChainerRL: 1. Create an environment (Diagram labels: Action; Env; Observation, Reward)
  27. 27. Reinforcement Learning with ChainerRL: 2. Define an agent model. Policy: Observation → Distribution of actions. Distribution: Softmax, Mellowmax, Gaussian, …
  28. 28. Reinforcement Learning with ChainerRL: 2. Define an agent model (contd.). Q-Function: Observation → Value of each action (expectation of the sum of future rewards). ActionValue: Discrete, Quadratic
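The quantity a Q-function estimates can be illustrated with a few lines of plain Python. `discounted_return` and `greedy_action` are hypothetical helpers written for this sketch, not ChainerRL functions. The discount factor `gamma` is an assumption added here; with `gamma=1.0` the return is the plain sum of future rewards that the slide describes.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of future rewards, each weighted by gamma ** (steps ahead)."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def greedy_action(q_values):
    """Index of the action with the highest estimated value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


print(discounted_return([1.0, 1.0, 1.0], gamma=1.0))
print(greedy_action([0.1, 0.7, 0.2]))
```

A discrete action-value output like the one named on the slide can be thought of as such a list of per-action values, from which the agent picks an action.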
  29. 29. Reinforcement Learning with ChainerRL: 3. Create an agent (Diagram labels: Action; Env; Observation, Reward)
  30. 30. Reinforcement Learning with ChainerRL: 4. Interact with the environment!
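The four steps (create an environment, define a model, create an agent, interact) follow the usual agent-environment loop. Below is a self-contained toy version of that loop. `ToyCorridorEnv`, `RandomAgent`, and `run_episode` are all hypothetical names for this illustration; the real library's classes and method names are not shown on these slides, so none of them are assumed here.

```python
import random


class ToyCorridorEnv:
    """Corridor of `length` cells; reward 1.0 for reaching the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action 1 moves right; any other action moves left (floor at cell 0).
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        done = self.pos >= self.length - 1
        reward = 1.0 if done else 0.0
        return self.pos, reward, done


class RandomAgent:
    """Stand-in for a learned agent: picks actions uniformly at random."""

    def __init__(self, n_actions, seed=0):
        self.n_actions = n_actions
        self.rng = random.Random(seed)

    def act(self, obs):
        return self.rng.randrange(self.n_actions)


def run_episode(env, agent, max_steps=1000):
    """Run one episode and return (total reward, number of steps taken)."""
    obs = env.reset()
    total = 0.0
    steps = 0
    for _ in range(max_steps):
        action = agent.act(obs)               # the agent sends an action
        obs, reward, done = env.step(action)  # the env returns obs and reward
        total += reward
        steps += 1
        if done:
            break
    return total, steps


print(run_episode(ToyCorridorEnv(5), RandomAgent(2)))
```

A learning agent would additionally use the observed rewards to update its policy or Q-function between steps; that part is omitted here.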
  31. 31. Algorithms provided by ChainerRL
      • Deep Q-Network (Mnih et al., 2015)
      • Double DQN (Hasselt et al., 2016)
      • Normalized Advantage Function (Gu et al., 2016)
      • (Persistent) Advantage Learning (Bellemare et al., 2016)
      • Deep Deterministic Policy Gradient (Lillicrap et al., 2016)
      • SVG(0) (Heess et al., 2015)
      • Asynchronous Advantage Actor-Critic (Mnih et al., 2016)
      • Asynchronous N-step Q-learning (Mnih et al., 2016)
      • Actor-Critic with Experience Replay (Wang et al., 2017) <- NEW!
      • Path Consistency Learning (Nachum et al., 2017) <- NEW!
      • etc.
  32. 32. ChainerRL Quickstart Guide • Define a Q-function in a Jupyter notebook and learn the Cart Pole Balancing problem with DQN
  33. 33. ChainerCV Chainer + Computer Vision
  34. 34. ChainerCV: makes running and training deep-learning models easier for Computer Vision tasks
      • Datasets: Pascal VOC, Caltech-UCSD Birds-200-2011, Stanford Online Products, CamVid, etc.
      • Models: Faster R-CNN, SSD, SegNet (will add more models!)
      • Training tools, Evaluation tools, Dataset Abstraction
      • Evaluate your model on popular datasets
      • Train popular models with your data
  35. 35. ChainerCV: start computer vision research using deep learning much more easily
      • Latest algorithms with your data: provides complete model code, training code, and inference code for segmentation algorithms (SegNet, etc.), object detection algorithms (Faster R-CNN, SSD, etc.), and so on
      • All code is confirmed to reproduce the results: all training code and model code reproduced the experimental results shown in the original paper
  36. 36. • If you want to see some examples of ChainerCV and the code reproducing some papers, please check the official GitHub repository (chainer/chainercv)
      • The figure on the right shows the result of the inference code of the Faster R-CNN example
      • The pre-trained weights are automatically downloaded!
          $ pip install chainercv
  39. 39. Intel Chainer
  40. 40. Intel Chainer with MKL-DNN Backend (Stack diagram labels: Chainer; NumPy, CuPy, MKL-DNN; BLAS, CUDA, cuDNN, MKL; CPU, NVIDIA GPU, Intel Xeon/Xeon Phi)
  41. 41. Intel Chainer with MKL-DNN Backend
      MKL-DNN
      • Neural Network library optimized for Intel architectures
      • Supported CPUs:
        ✓ Intel Atom(R) processor with Intel(R) SSE4.1 support
        ✓ 4th, 5th, 6th and 7th generation Intel(R) Core processor
        ✓ Intel(R) Xeon(R) processor E5 v3 family (code named Haswell)
        ✓ Intel(R) Xeon(R) processor E5 v4 family (code named Broadwell)
        ✓ Intel(R) Xeon(R) Platinum processor family (code named Skylake)
        ✓ Intel(R) Xeon Phi(TM) product family x200 (code named Knights Landing)
        ✓ Future Intel(R) Xeon Phi(TM) processor (code named Knights Mill)
      • MKL-DNN accelerates the computation of NN on the above CPUs
  42. 42. Intel Chainer with MKL-DNN Backend
      convnet-benchmarks* result:
                            Intel Chainer    Chainer with NumPy (MKL-Build)
        Alexnet Forward     429.16 ms        5041.91 ms
        Alexnet Backward    841.73 ms        5569.49 ms
        Alexnet Total       1270.89 ms       10611.40 ms
      ~8.35x faster than NumPy backend!
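The headline speedup can be reproduced from the timings on the slide. The snippet below only redoes that arithmetic; the dictionary layout and variable names are choices made for this sketch.

```python
# Timings in milliseconds as (Intel Chainer, Chainer with NumPy), from the slide.
timings_ms = {
    "forward": (429.16, 5041.91),
    "backward": (841.73, 5569.49),
}

# The "Total" row is the sum of the forward and backward rows.
total_intel = sum(intel for intel, _ in timings_ms.values())
total_numpy = sum(numpy_ms for _, numpy_ms in timings_ms.values())
timings_ms["total"] = (total_intel, total_numpy)

# Speedup = NumPy-backend time divided by Intel Chainer time.
speedups = {name: numpy_ms / intel
            for name, (intel, numpy_ms) in timings_ms.items()}

for name, value in speedups.items():
    print("{}: {:.2f}x".format(name, value))
```

The total ratio matches the ~8.35x quoted on the slide. The forward pass alone gains more (roughly 11.7x) than the backward pass (roughly 6.6x).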
  43. 43. Intel Chainer with MKL-DNN Backend Intel is developing Intel Chainer as a fork of Chainer v2
  44. 44. Applications using Chainer
  45. 45. Object Detection
  46. 46. Semantic Segmentation
  47. 47. Ponanza Chainer
      ● Won 2nd place at the 27th World Computer Shogi Championship
      ● Based on Ponanza, which was the champion for two years in a row (2015, 2016)
      ● “Ponanza Chainer” applied Deep Learning to order the possible next moves that “Ponanza” should think ahead on deeply
      ● “Ponanza Chainer” beats “Ponanza” with a probability of 80%
      (Labels: Team PFN; Issei Yamamoto; Akira Shimoyama; Team Ponanza)
  48. 48. Paints Chainer
      ● Auto Sketch Colorization
      ● Trains a neural network with a large dataset of paintings
      ● It takes a line drawing as input and outputs a colorized image!
      ● You can also give color hints that indicate preferred colors
  49. 49. Installation of Chainer
  50. 50. Chainer on Ubuntu
      1. Install CUDA Toolkit 8.0
      2. Install cuDNN v6.0 Library
      3. Install NCCL for Multi-GPUs
      4. Install CuPy and Chainer
          % pip install cupy
          % pip install chainer
      For more details, see the official installation guide:
  51. 51. Chainer on Windows with NVIDIA GPU
      1. Install Visual C++ 2015 Build Tools
      2. Install CUDA Toolkit 8.0
      3. Install cuDNN v6.0 Library for Windows 10
         Put all files under C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0
      4. Install Anaconda 4.3.1 Python 3.6 or 2.7
      5. Add environment variables
         - Add “C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin” to PATH variable
         - Add “C:\Program Files (x86)\Windows Kits\10\Include\10.0.10240.0\ucrt” to INCLUDE variable
      6. Install Chainer on Anaconda Prompt
          > pip install chainer
  52. 52. Chainer on Azure: use Data Science Virtual Machine for Linux (Ubuntu)
      • Ready for CUDA 8.0 & cuDNN 5.1
      • After ssh, run "pip install --user chainer"
  53. 53. Chainer Model Export
      tfchain: TensorFlow export (experimental)
      • Supports Linear, Convolution2D, MaxPooling2D, ReLU
      • Just add @totf decorator right before the forward method of the model
      Caffe-export: Caffe export (experimental)
      • Currently closed project
      • Supports Conv2D, Deconv2D, BatchNorm, ReLU, Concat, Softmax, Reshape
  54. 54. External Projects for Model Portability
      WebDNN
      • Model conversion for running models in a web browser; supports Chainer
      DLPack
      • MXNet, Torch, Caffe2 have joined to discuss guidelines for tensor memory layout and common operator interfaces
  55. 55. Companies supporting Chainer
  57. 57. Contributing to Chainer
  58. 58. Chainer is an open-source project.
      • You can send a PR from here:
      • Deep Learning research moves very fast, so to provide state-of-the-art technologies through Chainer, we continuously update the development plans:
      • Chainer v3.0.0 will be released on 26th September!
        • Will support gradient of gradient (higher order differentiation)
        • Will add the official Windows support ensured by Microsoft
      The release schedule after v2.0.1 (4th July)→