
Tokyo Webmining Talk1

Talk in Tokyo Webmining @FreakOut


  1. 1. Deep Learning Implementations and Frameworks (Very short version of PAKDD 2016 tutorial) Kenta Oono oono@preferred.jp Preferred Networks Inc. 25th Jun. 2016 Tokyo Webmining @FreakOut 1/31
  2. 2. Overview •1st session (8:30 ‒ 10:00) •Introduction (AK) •Basics of neural networks (AK) •Common design of neural network implementations (KO) •2nd session (10:30 ‒ 12:30) •Differences of deep learning frameworks (ST) •Coding examples of frameworks (KO & ST) •Conclusion (ST) 2/31
  3. 3. Full contents • Session 1 • Basics of Neural Networks • http://www.slideshare.net/atsu-kan/pakdd2016-tutorial-dlif-introduction-and-basics-63030841 • Common design of neural network implementations • http://www.slideshare.net/KentaOono/common-design-of-deep-learning-frameworks • Session 2 • Differences of deep learning frameworks • http://www.slideshare.net/beam2d/differences-of-deep-learning-frameworks • Coding examples of frameworks • will be available soon. 3/31
  4. 4. Basics of Neural Networks Atsunori Kanemura AIST, Japan 4/31
  5. 5. Mathematical model for a neuron • Compare the product of input and weights (parameters) with a threshold • Plasticity of the neuron = the change of parameters • f: a nonlinear transform • $y = f\left(\sum_{d=1}^{D} w_d x_d - b\right) = f(w^\top x - b)$ 5/31
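A minimal NumPy sketch of this neuron, assuming f is the logistic sigmoid (the names `w`, `b`, `neuron` below are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(a):
    # Logistic nonlinearity, used here as the nonlinear transform f.
    return 1.0 / (1.0 + np.exp(-a))

def neuron(x, w, b):
    # y = f(w^T x - b): weighted sum of the inputs compared against a threshold b.
    return sigmoid(np.dot(w, x) - b)

x = np.array([0.5, -1.2, 3.0])   # D-dimensional input
w = np.array([0.1, 0.4, -0.2])   # weights (parameters)
b = 0.3                          # threshold
print(neuron(x, w, b))
```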
  6. 6. Parameter update • Gradient Descent (GD) • Stochastic Gradient Descent (SGD) • Take several samples (say, 128) from the dataset (a mini-batch) and estimate the gradient from them • Theoretically motivated as the Robbins-Monro algorithm • From SGD to general gradient-based algorithms • Adam, AdaGrad, etc. • Use momentum and other techniques • Update rule: $w \leftarrow w - r \nabla_w J(w) = w - r \sum_{n} h(x_n, w)\, x_n$, where $h(x_n, w) \stackrel{\text{def}}{=} (f(w^\top x_n) - y^*_n)\, f(w^\top x_n)\,(1 - f(w^\top x_n))$ and the sum runs over the full dataset for GD or over a mini-batch for SGD 6/31
  7. 7. Gradient descent • The gradient of the loss for the 1-layer model is $\nabla_w J(w) = \frac{1}{2} \sum_{n=1}^{N} \nabla_w \big(f(w^\top x_n) - y^*_n\big)^2 = \sum_{n=1}^{N} \big(f(w^\top x_n) - y^*_n\big)\, \nabla_w f(w^\top x_n) = \sum_{n=1}^{N} \big(f(w^\top x_n) - y^*_n\big)\, f(w^\top x_n)\,(1 - f(w^\top x_n))\, x_n$ (using $f' = f(1-f)$ for the sigmoid) • The update rule (r is a constant learning rate): $w \leftarrow w - r \nabla_w J(w) = w - r \sum_{n=1}^{N} h(x_n, w)\, x_n$, where $h(x_n, w) \stackrel{\text{def}}{=} \big(f(w^\top x_n) - y^*_n\big)\, f(w^\top x_n)\,(1 - f(w^\top x_n))$ 7/31
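A minimal NumPy sketch of this rule for the 1-layer sigmoid model; the dataset `X`, targets `y_star`, and learning rate `r` are illustrative placeholders:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def grad_J(w, X, y_star):
    # h(x_n, w) = (f(w^T x_n) - y*_n) f(w^T x_n) (1 - f(w^T x_n))
    f = sigmoid(X @ w)
    h = (f - y_star) * f * (1.0 - f)
    # sum_n h(x_n, w) x_n
    return X.T @ h

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 5))           # N x D design matrix
y_star = (rng.random(128) > 0.5) * 1.0  # binary targets
w = np.zeros(5)
r = 0.1                                 # constant learning rate

# Gradient descent: w <- w - r * grad J(w)
for _ in range(100):
    w -= r * grad_J(w, X, y_star)

# SGD variant: estimate the gradient on a random mini-batch instead of the full sum.
idx = rng.choice(128, size=32, replace=False)
w -= r * grad_J(w, X[idx], y_star[idx])
```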
  8. 8. Neural networks • Multi-layered: $y^1 = f_1(W^{10} x)$, $y^2 = f_2(W^{21} y^1)$, $y^3 = f_3(W^{32} y^2)$, ..., $y^L = f_L(W^{L(L-1)} y^{L-1})$ (※ f works element-wise) • Minimize the loss $J(\{W\}) = \frac{1}{2} \sum_{n=1}^{N} \big(y^L(x_n) - y^*_n\big)^2$ to learn the parameters 8/31
  9. 9. Backprop • Use the chain rule to derive the gradient • E.g. the 2-layer case: $y^1_n = f(W^{10} x_n)$, $y^2_n = f(w^{21} \cdot y^1_n)$, $J(W^{10}, w^{21}) = \frac{1}{2} \sum_n (y^2_n - y^*_n)^2$, and $\frac{\partial J}{\partial W^{10}_{kl}} = \sum_{n,i} \frac{\partial J}{\partial y^1_{ni}} \frac{\partial y^1_{ni}}{\partial W^{10}_{kl}}$ • Calculate the gradient recursively from the top layer down to the bottom layer • Cf. gradient vanishing, ReLU 9/31
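Written out in NumPy for the 2-layer case above (sigmoid activations assumed; all array names and shapes are illustrative):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward_backward(W10, w21, X, y_star):
    # Forward: y1_n = f(W10 x_n), y2_n = f(w21 . y1_n)
    y1 = sigmoid(X @ W10.T)            # (N, H)
    y2 = sigmoid(y1 @ w21)             # (N,)
    loss = 0.5 * np.sum((y2 - y_star) ** 2)

    # Backward: apply the chain rule from the top layer down.
    d2 = (y2 - y_star) * y2 * (1.0 - y2)       # dJ w.r.t. layer-2 pre-activation
    g_w21 = y1.T @ d2                          # dJ/dw21
    d1 = np.outer(d2, w21) * y1 * (1.0 - y1)   # dJ w.r.t. layer-1 pre-activation
    g_W10 = d1.T @ X                           # dJ/dW10, summed over n
    return loss, g_W10, g_w21

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))
y_star = rng.random(8)
W10 = rng.normal(size=(3, 4))
w21 = rng.normal(size=3)
loss, g_W10, g_w21 = forward_backward(W10, w21, X, y_star)
```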
  10. 10. Common Design of Deep Learning Frameworks Kenta Oono <oono@preferred.jp> Preferred Networks Inc. 10/31
  11. 11. Steps for training neural networks • Prepare the training dataset • Initialize the Neural Network (NN) parameters • Repeat until meeting some criterion: prepare the next (mini-)batch → define how to compute the loss of this batch → compute the loss (forward prop) → compute the gradient (backprop) → update the NN parameters • Save the NN parameters 11/31
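These steps can be arranged into a schematic training loop; the sketch below uses a toy linear model in plain NumPy purely for illustration (nothing here is a framework API):

```python
import numpy as np

rng = np.random.default_rng(0)

# Prepare the training dataset.
X = rng.normal(size=(512, 10))
t = X @ rng.normal(size=10) + 0.1 * rng.normal(size=512)

# Initialize the NN parameters (a single linear layer here).
w = np.zeros(10)
lr, batch_size = 0.01, 32

# Repeat until meeting some criterion (here: a fixed number of epochs).
for epoch in range(10):
    for i in range(0, len(X), batch_size):
        # Prepare the next (mini-)batch.
        xb, tb = X[i:i + batch_size], t[i:i + batch_size]
        # Define how to compute the loss of this batch, and compute it (forward prop).
        err = xb @ w - tb
        loss = 0.5 * np.mean(err ** 2)
        # Compute the gradient (backprop; analytic for this linear model).
        grad = xb.T @ err / len(xb)
        # Update the NN parameters.
        w -= lr * grad

# Save the NN parameters.
np.save("params.npy", w)
```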
  12. 12. Technology stack of a DL framework, from top to bottom (layer (functions): examples) • Graphical interface: DIGITS, TensorBoard • Machine learning workflow management (dataset management, training loop): Keras, Lasagne, Blocks, TF Learn • Computational graph management (build computational graph, forward prop/backprop): Theano, TensorFlow, Torch.nn • Multi-dimensional array library (linear algebra): NumPy, CuPy, Eigen, torch (core) • Numerical computation package (matrix operation, convolution): BLAS, cuBLAS, cuDNN • Hardware: CPU, GPU 12/31
  13. 13. Technology stack of Chainer • Machine learning workflow management and computational graph management: Chainer • Multi-dimensional array library: NumPy (CPU), CuPy (GPU) • Numerical computation package: BLAS (CPU); cuBLAS, cuRAND, cuDNN (GPU) • Hardware: CPU, GPU 13/31
  14. 14. Neural Network as a Computational Graph • In its simplest form, an NN is represented as a computational graph (CG): a stack of bipartite DAGs (Directed Acyclic Graphs) consisting of data nodes and operator nodes • Example: y = x1 * x2, z = y - x3, with data nodes x1, x2, x3, y, z and operator nodes mul, sub 14/31
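For instance, with Chainer's define-by-run API this graph is constructed just by running the arithmetic on Variable objects; a minimal sketch, assuming Chainer is installed:

```python
import numpy as np
from chainer import Variable

# Data nodes: wrap arrays in Variables.
x1 = Variable(np.array([2.0], dtype=np.float32))
x2 = Variable(np.array([3.0], dtype=np.float32))
x3 = Variable(np.array([4.0], dtype=np.float32))

# Operator nodes are recorded as the expressions execute.
y = x1 * x2      # mul node
z = y - x3       # sub node

z.grad = np.ones((1,), dtype=np.float32)   # seed gradient, i.e. grad of z w.r.t. z = 1
z.backward()                               # backprop through the recorded graph
print(z.data, x1.grad, x2.grad, x3.grad)   # [2.] [3.] [2.] [-1.]
```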
  15. 15. Example: Multi-layer Perceptron (MLP) • x → Affine (W1, b1) → h1 → ReLU → a1 → Affine (W2, b2) → h2 → ReLU → a2 → Softmax → y → Cross Entropy Loss (with target t) • Whether the CG includes the weights and biases is an implementation choice. 15/31
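As an illustration, this MLP could be written in Chainer roughly as follows; a hedged sketch assuming a later (v2+) Chainer API, with arbitrary layer sizes (the 2016 version used a slightly different Chain constructor):

```python
import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L

class MLP(chainer.Chain):
    def __init__(self):
        super().__init__()
        with self.init_scope():
            self.l1 = L.Linear(784, 100)   # Affine with W1, b1
            self.l2 = L.Linear(100, 10)    # Affine with W2, b2

    def __call__(self, x):
        a1 = F.relu(self.l1(x))
        a2 = F.relu(self.l2(a1))
        return a2

model = MLP()
x = np.zeros((32, 784), dtype=np.float32)   # a dummy mini-batch
t = np.zeros(32, dtype=np.int32)            # dummy labels
# Softmax + Cross Entropy Loss in one call.
loss = F.softmax_cross_entropy(model(x), t)
```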
  16. 16. Automatic Differentiation • Computes the gradient of some specified data node (e.g. the loss) with respect to each data node. • Each operator node must have a backward operation that calculates gradients w.r.t. its inputs from gradients w.r.t. its outputs (a realization of the chain rule). • e.g. the Function class of Chainer has a backward method. • e.g. each layer class of Caffe has Backward_cpu and Backward_gpu methods. • e.g. Autograd has a thin wrapper that adds a gradient method as a closure to most NumPy functions. 16/31
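A minimal, framework-agnostic sketch of operator nodes that carry a backward operation (the Mul and Sub classes here are illustrative, not any framework's actual classes):

```python
class Mul:
    def forward(self, x1, x2):
        self.x1, self.x2 = x1, x2          # keep inputs for the backward pass
        return x1 * x2

    def backward(self, gy):
        # Gradients w.r.t. the inputs, given the gradient w.r.t. the output (chain rule).
        return gy * self.x2, gy * self.x1

class Sub:
    def forward(self, a, b):
        return a - b

    def backward(self, gz):
        return gz, -gz

# y = x1 * x2, z = y - x3
mul, sub = Mul(), Sub()
x1, x2, x3 = 2.0, 3.0, 4.0
y = mul.forward(x1, x2)
z = sub.forward(y, x3)

gz = 1.0                                   # seed: dz/dz = 1
gy, gx3 = sub.backward(gz)
gx1, gx2 = mul.backward(gy)
print(z, gx1, gx2, gx3)                    # 2.0 3.0 2.0 -1.0
```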
  17. 17. Backprop through CG • For y = x1 * x2, z = y - x3, backprop traverses the forward graph (x1, x2 → mul → y; y, x3 → sub → z) in reverse, starting from ∇z z = 1 and then computing ∇y z, ∇x1 z, and the remaining gradients node by node. 17/31
  18. 18. Backprop as extended graphs • For y = x1 * x2, z = y - x3, the forward-propagation graph is extended with backward-propagation nodes (id, neg, mul) whose outputs are the gradients dz, dy, dx1, dx2, dx3, so backprop itself becomes part of the graph. 18/31
  19. 19. Differences of Deep Learning Frameworks Seiya Tokui Preferred Networks, Inc. 19/31
  20. 20. Training of Neural Networks • Prepare the training dataset • Initialize the NN parameters • Repeat until meeting some criterion: prepare the next (mini-)batch → define how to compute the loss of this batch → compute the loss (forward prop) → compute the gradient (backprop) → update the NN parameters • Save the NN parameters • The framework automates the core of this loop (in particular the backprop step) 20/31
  21. 21. Framework Design Choices • The most crucial part of NN frameworks is • How to define the parameters • How to define the loss function of the parameters (= how to write computational graphs) • These also influence the APIs for forward prop, backprop, and parameter updates (i.e., numerical optimization) • And all of these are determined by how computational graphs are implemented • Other parts are also important, but are mostly common to implementations of other types of machine learning methods 21/31
  22. 22. Framework Comparison: Basic information*
     • Torch.nn**: 4,719 GitHub stars; started 2002; 97/26 open issues/PRs; main developers Facebook, Twitter, Google, etc.; core languages C/Lua; supported languages Lua
     • Theano***: 3,457 stars; started 2008; 525/105; Université de Montréal; C/Python; Python
     • Caffe: 9,590 stars; started 2013; 407/204; BVLC (U.C. Berkeley); C++; C++/Python/MATLAB
     • autograd (NumPy, Torch): 654 (N) / 554 (T) stars; started 2015; 9/0 (N), 3/1 (T); HIPS (Harvard Univ.) (N), Twitter (T); Python/Lua; Python/Lua
     • Chainer: 1,295 stars; started 2015; 95/25; Preferred Networks; Python; Python
     • MXNet: 3,316 stars; started 2015; 271/18; DMLC; C++; C++/Python/R/Julia/Go etc.
     • TensorFlow: 20,981 stars; started 2015; 330/33; Google; C++/Python; C++/Python
     * Data was taken on Apr. 12, 2016  ** Includes statistics of Torch7  *** There are many frameworks on top of Theano, though we omit them due to the space constraints 22/31
  23. 23. List of Important Design Choices Programming paradigms 1. How to write NNs in text format 2. How to build computational graphs 3. How to compute backprop 4. How to represent parameters 5. How to update parameters Performance improvements 6. How to achieve the computational performance 7. How to scale the computations 23/31
  24. 24. Framework Comparison: Design Choices
     1. NN definition: Torch.nn: Script (Lua); Theano-based: Script* (Python); Caffe: Data (protobuf); autograd: Script (Python, Lua); Chainer: Script (Python); MXNet: Script (many); TensorFlow: Script (Python)
     2. Graph construction: Torch.nn: Prebuild; Theano-based: Prebuild; Caffe: Prebuild; autograd: Dynamic; Chainer: Dynamic; MXNet: Prebuild**; TensorFlow: Prebuild
     3. Backprop: Torch.nn: Through graph; Theano-based: Extended graph; Caffe: Through graph; autograd: Extended graph; Chainer: Through graph; MXNet: Through graph; TensorFlow: Extended graph
     4. Parameters: Torch.nn: Hidden in operators; Theano-based: Separate nodes; Caffe: Hidden in operators; autograd: Separate nodes; Chainer: Separate nodes; MXNet: Separate nodes; TensorFlow: Separate nodes
     5. Update formula: Torch.nn: Outside of graphs; Theano-based: Part of graphs; Caffe: Outside of graphs; autograd: Outside of graphs; Chainer: Outside of graphs; MXNet: Outside of graphs**; TensorFlow: Part of graphs
     6. Optimization: Torch.nn: -; Theano-based: Advanced optimization; Caffe: -; autograd: -; Chainer: -; MXNet: -; TensorFlow: Simple optimization
     7. Parallel computation: Torch.nn: Multi GPU; Theano-based: Multi GPU (libgpuarray); Caffe: Multi GPU; autograd: Multi GPU (Torch); Chainer: Multi GPU; MXNet: Multi node / Multi GPU; TensorFlow: Multi node / Multi GPU
     * Some Theano-based frameworks use data (e.g. yaml)  ** Dynamic dependency analysis and optimization is supported (no autodiff support) 24/31
  25. 25. How to write NNs in text format • Write NNs in declarative configuration files: the framework builds the layers of NNs as written in the files (e.g. prototxt, YAML). E.g.: Caffe (prototxt), Pylearn2 (YAML) • Write NNs by procedural scripting: the framework provides scripting-language APIs to build NNs. E.g.: most other frameworks 25/31
  26. 26. 2. How to build computational graphs • Build once, run several times: define how to compute the loss once, before the training loop, then repeat {prepare the next (mini-)batch → compute the loss (forward prop) → compute the gradient (backprop) → update the NN parameters} • Build one graph at every iteration: define how to compute the loss inside the loop, each time a mini-batch is processed (dataset preparation, parameter initialization, and saving are the same in both cases) 26/31
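A hedged sketch of the two styles on the toy graph from earlier, assuming Theano (build once, run several times) and Chainer (build at every iteration) are installed:

```python
import numpy as np

# Style 1: prebuild (define-and-run), e.g. Theano.
# The graph is defined symbolically once, compiled, then run many times.
import theano
import theano.tensor as T

x1, x2 = T.scalar('x1'), T.scalar('x2')
y = x1 * x2
f = theano.function([x1, x2], y)       # compile the graph once
for a in range(3):
    print(f(float(a), 2.0))            # run it repeatedly

# Style 2: dynamic (define-by-run), e.g. Chainer.
# The graph is (re)built by simply executing the forward code at every iteration,
# so ordinary Python control flow can change the graph from batch to batch.
from chainer import Variable

for a in range(3):
    v = Variable(np.array([float(a)], dtype=np.float32))
    out = v * 2.0 if a % 2 == 0 else v * v   # the recorded graph differs per iteration
```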
  27. 27. 3. How to compute backprop • Backprop through graphs: the framework only builds the graph for forward prop and does backprop by backtracking that graph, starting from ∇z z = 1 (e.g. for y = a * b, z = y - c). E.g.: Torch.nn, Caffe, MXNet, Chainer • Backprop as extended graphs: the framework builds graphs for backprop (id, neg, mul nodes producing dz, dy, da, db, dc) as well as those for forward prop. E.g.: Theano, TensorFlow 27/31
  28. 28. 4. How to represent parameters • Parameters as part of operator nodes: parameters are owned by the operator nodes (e.g., convolution layers) and do not directly appear in the graphs, e.g. x → Affine (owning W and b) → y. E.g.: Torch.nn, Caffe, MXNet • Parameters as separate nodes in the graphs: parameters are represented as separate variable nodes, e.g. x, W, b → Affine → y. E.g.: Theano, Chainer, TensorFlow 28/31
  29. 29. 5. How to update parameters • Update parameters by their own routines outside of the graphs: update formulae are implemented directly using the backend array libraries. E.g.: Torch.nn, Caffe, MXNet, Chainer • Represent update formulae as a part of the graphs: update formulae are built as a part of the computational graphs. E.g.: Theano, TensorFlow 29/31
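A brief sketch contrasting the two, with illustrative names; the first half is plain NumPy (the kind of code an outside-the-graph optimizer runs), the second half assumes Theano is installed:

```python
import numpy as np

# Outside of the graphs (e.g. Torch.nn, Caffe, MXNet, Chainer):
# the optimizer mutates the parameter arrays directly via the backend array library.
W = np.random.randn(10, 10)
grad_W = np.random.randn(10, 10)   # stand-in for a gradient produced by backprop
lr = 0.01
W -= lr * grad_W                   # SGD step written against the array library

# Part of the graphs (e.g. Theano, TensorFlow):
# the update formula itself is a node of the compiled graph.
import theano
import theano.tensor as T

W_shared = theano.shared(np.zeros((10, 10)), name='W')
x = T.matrix('x')
loss = T.sum(x.dot(W_shared) ** 2)
g = T.grad(loss, W_shared)
train_step = theano.function([x], loss, updates=[(W_shared, W_shared - lr * g)])
train_step(np.random.randn(5, 10))  # running the graph also applies the update
```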
  30. 30. 6. How to achieve the computational performance • Transform the graphs to optimize the computations: there are many ways to optimize the computations; Theano supports various optimizations, TensorFlow does simple ones. • Provide easy ways to write custom operator nodes: users can write their own operator nodes optimized for their purposes. Torch, MXNet, and Chainer provide ways to write one code that runs both on CPU and GPU. Chainer also provides ways to write custom CUDA kernels without manual compilation steps. 30/31
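For example, CuPy, the GPU array library under Chainer, can compile a custom elementwise CUDA kernel written inline from Python; a minimal sketch, assuming CuPy and a CUDA-capable GPU are available (the kernel itself is illustrative):

```python
import cupy as cp

# Custom CUDA kernel defined inline; CuPy compiles it on first use,
# so no separate manual compilation step is needed.
scaled_relu = cp.ElementwiseKernel(
    'float32 x, float32 scale',    # input parameters
    'float32 y',                   # output parameter
    'y = x > 0 ? scale * x : 0',   # CUDA C body, executed per element
    'scaled_relu')

x = cp.random.randn(1024).astype(cp.float32)
y = scaled_relu(x, cp.float32(2.0))
```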
  31. 31. 7. How to scale the computations • Multi-GPU parallelization: nowadays, most popular frameworks have started to support multi-GPU computation; multi-GPU on one machine is enough for most use cases today. • Distributed computation (i.e., multi-node parallelization): some frameworks also support distributed computation to further scale the learning. MXNet uses a simple distributed key-value store. TensorFlow uses gRPC and will also support easy-to-use cloud environments. CNTK uses simple MPI. 31/31
  32. 32. Conclusion • We introduced the basics of NNs, typical designs of their implementations, and the pros/cons of various design choices. • Deep learning is an emerging field developing at an increasing speed, so quick trial and error is crucial for research and development in this field. • In that sense, it is important to use frameworks as highly reusable building blocks for NNs. • There is a growing number of frameworks in the world, and they differ in many aspects, so it is also important to choose one appropriate for your purpose. 32/31
