
Introduction to deep learning @ Startup.ML by Andres Rodriguez


Deep learning is unlocking tremendous economic value across many market sectors. Individual data scientists can draw on several open source frameworks and basic hardware resources during the initial investigative phases, but they quickly require significant hardware and software resources to build and deploy production models. Intel offers a range of software and hardware to support a diversity of workloads and user needs. Intel Nervana delivers a competitive deep learning platform that makes it easy for data scientists to start from the iterative, investigatory phase and take models all the way to deployment. The platform is designed for speed and scale, and serves as a catalyst for all types of organizations to benefit from the full potential of deep learning. Examples of supported applications include, but are not limited to, automotive speech interfaces, image search, language translation, agricultural robotics and genomics, financial document summarization, and finding anomalies in IoT data.



  1. Introduction to deep learning with neon. MAKING MACHINES SMARTER.™
  2. Agenda: • Intel Nervana overview • Machine learning basics • What is deep learning? • Basic deep learning concepts • Example: recognition of handwritten digits • Model ingredients in-depth • Deep learning with neon
  3. Intel Nervana's deep learning solution stack: solutions for images, video, text, speech, tabular data, and time series.
  4. Intel Nervana in action: healthcare (tumor detection), automotive (speech interfaces), finance (time-series search engine), agricultural robotics, oil & gas, and proteomics (sequence analysis).
  5. Agenda: • Intel Nervana overview • Machine learning basics • What is deep learning? • Basic deep learning concepts • Example: recognition of handwritten digits • Model ingredients in-depth • Deep learning with neon
  6. [Figure: training error on the training data]
  7. [Figure: training error vs. testing error on held-out data points]
  8. [Figure: the bias-variance trade-off. Training error keeps falling with training time while testing/validation error eventually rises, moving from underfitting to overfitting]
  9. Agenda: • Intel Nervana overview • Machine learning basics • What is deep learning? • Basic deep learning concepts • Model ingredients in-depth • Deep learning with neon
  10. Classical machine learning: hand-engineered features (f_1, f_2, ..., f_K), with K ≪ N extracted from an N × N image, are fed to classifiers such as SVMs, Random Forests, Naïve Bayes, Decision Trees, Logistic Regression, or ensemble methods (example output label: "Arjun").
  11. Deep learning: the raw N × N image is fed directly to a network with ~60 million parameters that outputs the label ("Arjun"). But the old practices still apply: data cleaning, data exploration, underfitting/overfitting, the right cost function, hyperparameters, etc.
  12. [Figure only]
  13. A single unit: inputs x_1, x_2, x_3 from units of the previous layer are combined through linear weights w_1, w_2, w_3 and a bias unit into a = Σ_j w_j x_j + b; an activation function g, e.g. max(a, 0) or tanh(a), produces the output of the unit, y = g(a).
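As a tiny numeric illustration of such a unit (the numbers below are made up, not from the slide):

```python
import numpy as np

# Made-up numbers for one unit with three inputs.
x = np.array([0.5, -1.0, 2.0])   # inputs x1, x2, x3 from the previous layer
w = np.array([0.2, 0.4, -0.1])   # linear weights w1, w2, w3
b = 0.3                          # bias unit

a = w @ x + b                    # weighted sum of inputs plus bias
y_relu = max(a, 0.0)             # output with activation g(a) = max(a, 0)
y_tanh = np.tanh(a)              # output with activation g(a) = tanh(a)
```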
  14. Affine layer: Linear + Bias + Activation (input layer → hidden layer → output layer).
  15. MNIST dataset: 70,000 images (28x28 pixels). Goal: classify each image into a digit 0-9. Input layer: N = 28 x 28 = 784 input units. Hidden layer: N = 100 hidden units (a user-defined parameter). Output layer: N = 10 output units (one for each digit), where unit i encodes the probability that the input image is digit i.
  16. Total parameters for layers i → j → k (784 → 100 → 10): W_i→j: 784 x 100 = 78,400; b_j: 100; W_j→k: 100 x 10 = 1,000; b_k: 10; total = 79,510.
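The total can be checked by summing the weight and bias shapes directly; a minimal sketch in plain Python, using the layer sizes from slides 15-16:

```python
# Parameter count for the 784 -> 100 -> 10 MLP.
layer_sizes = [784, 100, 10]

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    total += n_in * n_out   # weight matrix between the two layers
    total += n_out          # one bias per unit in the receiving layer

print(total)   # 78,400 + 100 + 1,000 + 10 = 79,510
```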
  17. Training procedure: 1. Randomly seed weights, 2. Forward pass, 3. Cost, 4. Backward pass, 5. Update weights.
  18. Step 1, randomly seed the weights: W_i→j, b_j ~ Gaussian(0, 1) and W_j→k, b_k ~ Gaussian(0, 1).
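A sketch of the same random seeding in NumPy (the array names are mine, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Weights and biases for layer i -> j (784 -> 100) and layer j -> k (100 -> 10),
# each drawn from Gaussian(0, 1) as on slide 18.
W_ij = rng.normal(0.0, 1.0, size=(784, 100))
b_j  = rng.normal(0.0, 1.0, size=(100,))
W_jk = rng.normal(0.0, 1.0, size=(100, 10))
b_k  = rng.normal(0.0, 1.0, size=(10,))
```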
  19. Step 2, forward pass: a 28x28 input propagates through input → hidden → output and yields a 10x1 output vector, e.g. [0.0, 0.1, 0.0, 0.3, 0.1, 0.1, 0.0, 0.0, 0.4, 0.0].
  20. Step 3, cost: the output [0.0, 0.1, 0.0, 0.3, 0.1, 0.1, 0.0, 0.0, 0.4, 0.0] is compared against the ground truth [0, 0, 0, 1, 0, 0, 0, 0, 0, 0] with a cost function c(output, truth).
  21. Step 4, backward pass: the cost c(output, truth) is propagated back through the network to produce the weight changes ΔW_j→k and ΔW_i→j.
  22. For every weight $W^*$, compute the gradient $\partial C / \partial W^*$ of the cost $C(y, \mathrm{truth})$.
  23. For the output layer, $C(y, \mathrm{truth}) = C\left(g\left(\sum W_{j \to k}\, x_k + b_k\right)\right)$.
  24. Chain rule for the output-layer weights: with $a(W_{j \to k}, x_k) = \sum W_{j \to k}\, x_k + b_k$ and $g(a) = \max(a, 0)$, so that $C = C\left(g(a(W_{j \to k}, x_k))\right)$, the gradient for $W^* = W_{j \to k}$ is $\frac{\partial C}{\partial W^*} = \frac{\partial C}{\partial g} \cdot \frac{\partial g}{\partial a} \cdot \frac{\partial a}{\partial W^*}$.
  25. Chain rule through both layers: $C(y, \mathrm{truth}) = C\left(g_k\left(a_k\left(W_{j \to k},\, g_j(a_j(W_{i \to j}, x_j))\right)\right)\right)$, where the hidden activation $g_j(a_j(W_{i \to j}, x_j)) = y_j$ is the input to the output layer, $a_k(W_{j \to k}, y_j)$. For $W^* = W_{i \to j}$: $\frac{\partial C}{\partial W^*} = \frac{\partial C}{\partial g_k} \cdot \frac{\partial g_k}{\partial a_k} \cdot \frac{\partial a_k}{\partial g_j} \cdot \frac{\partial g_j}{\partial a_j} \cdot \frac{\partial a_j}{\partial W^*}$.
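Putting slides 17-25 together, here is a minimal NumPy sketch of one forward pass, cost, backward pass, and weight update for the 784-100-10 network. The variable names and the choice of softmax plus cross entropy at the output are assumptions of mine rather than details stated on the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Network from the MNIST example: 784 -> 100 -> 10.
W_ij, b_j = rng.normal(0, 0.01, (784, 100)), np.zeros(100)
W_jk, b_k = rng.normal(0, 0.01, (100, 10)), np.zeros(10)

def relu(a):
    return np.maximum(a, 0.0)

def softmax(a):
    e = np.exp(a - a.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def train_step(x, t, lr=0.1):
    """One fprop / cost / bprop / weight update on a minibatch x (N x 784), t (N x 10 one-hot)."""
    global W_ij, b_j, W_jk, b_k
    N = x.shape[0]

    # Forward pass.
    a_j = x @ W_ij + b_j      # hidden pre-activation
    g_j = relu(a_j)           # hidden activation
    a_k = g_j @ W_jk + b_k    # output pre-activation
    y = softmax(a_k)          # output probabilities

    # Cost: multiclass cross entropy, -sum_k t_k log(y_k), averaged over the batch.
    cost = -np.mean(np.sum(t * np.log(y + 1e-12), axis=1))

    # Backward pass (chain rule, as on slides 24-25).
    d_ak = (y - t) / N                          # dC/da_k for softmax + cross entropy
    dW_jk, db_k = g_j.T @ d_ak, d_ak.sum(axis=0)
    d_gj = d_ak @ W_jk.T                        # dC/dg_j
    d_aj = d_gj * (a_j > 0)                     # through the ReLU derivative
    dW_ij, db_j = x.T @ d_aj, d_aj.sum(axis=0)

    # Update weights.
    W_jk -= lr * dW_jk; b_k -= lr * db_k
    W_ij -= lr * dW_ij; b_j -= lr * db_j
    return cost
```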
  26. Model design choices (Szegedy et al., 2015; Schmidhuber, 1997): • Activation functions • Weight initialization • Learning rule • Layer architecture (number of layers, layer types, depth, etc.)
  27. Each training example is run through fprop → cost → bprop, producing a gradient δW.
  28. Step 5, update weights: $\Delta W = \alpha \cdot \frac{1}{N} \sum \delta W$, where $\alpha$ is the learning rate and the gradients $\delta W$ are summed over the N examples in the batch.
  29. This repeats batch by batch: minibatch #1 → weight update, minibatch #2 → weight update, and so on.
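The minibatch loop on slides 27-29 can be sketched as follows; `grad` is a hypothetical helper that returns the per-example gradient δW, and the function and variable names are mine:

```python
def sgd_epoch(W, data, grad, alpha=0.001, batch_size=32):
    """One epoch of minibatch SGD: delta_W = alpha * (1/N) * sum of per-example gradients."""
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        # Accumulate the per-example gradients over the minibatch, then average.
        dW_sum = sum(grad(W, x, t) for x, t in batch)
        W = W - alpha * dW_sum / len(batch)   # one weight update per minibatch
    return W
```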
  30. Training runs over repeated passes through the data (epoch 0, epoch 1, ...). Sample numbers: • learning rate ~0.001 • batch sizes of 32-128 • 50-90 epochs.
  31. Gradient Descent vs. SGD (stochastic gradient descent).
  32. Model sizes in practice: Krizhevsky, 2012: 60 million parameters; Taigman, 2014: 120 million parameters.
  33. Agenda: • Intel Nervana overview • Machine learning basics • What is deep learning? • Basic deep learning concepts • Model ingredients in-depth • Deep learning with neon
  34. Model ingredients: Dataset, Model/Layers, Activation, Cost $C(y, t)$, Optimizer.
  35. A deep network stacks Filter + Non-Linearity and Pooling layers, followed by fully connected layers. Learned features progress from low-level features to mid-level features to object parts/phonemes to objects/words, e.g. "cat" or "how can I help you?" (*Hinton et al., LeCun, Zeiler, Fergus).
  36. Activation functions: Logistic, Tanh, Rectified Linear Unit, and Softmax, $g(a_i) = \frac{e^{a_i}}{\sum_k e^{a_k}}$.
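NumPy versions of the four activations listed above, as a quick sketch (function names are mine):

```python
import numpy as np

def logistic(a):
    return 1.0 / (1.0 + np.exp(-a))   # squashes to (0, 1)

def tanh(a):
    return np.tanh(a)                 # squashes to (-1, 1)

def rectified_linear(a):
    return np.maximum(a, 0.0)         # max(a, 0)

def softmax(a):
    e = np.exp(a - np.max(a))         # subtract the max for numerical stability
    return e / e.sum()                # g(a_i) = exp(a_i) / sum_k exp(a_k)
```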
  37. Weight initializers: Gaussian: Gaussian(mean, sd); GlorotUniform: Uniform(-k, k) with $k = \sqrt{6 / (d_{in} + d_{out})}$; Xavier: Uniform(-k, k) with $k = \sqrt{3 / d_{in}}$; Kaiming: Gaussian(0, $\sigma$) with $\sigma = \sqrt{2 / d_{in}}$.
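The three scaling rules written out in NumPy, as a sketch (neon's own initializer classes are listed on slide 48; these standalone functions are mine):

```python
import numpy as np

rng = np.random.default_rng(0)

def glorot_uniform(d_in, d_out):
    k = np.sqrt(6.0 / (d_in + d_out))        # k = sqrt(6 / (d_in + d_out))
    return rng.uniform(-k, k, size=(d_in, d_out))

def xavier(d_in, d_out):
    k = np.sqrt(3.0 / d_in)                  # k = sqrt(3 / d_in)
    return rng.uniform(-k, k, size=(d_in, d_out))

def kaiming(d_in, d_out):
    sigma = np.sqrt(2.0 / d_in)              # sigma = sqrt(2 / d_in)
    return rng.normal(0.0, sigma, size=(d_in, d_out))
```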
  38. Cost functions: • Cross Entropy Loss • Misclassification Rate • Mean Squared Error • L1 loss
  39. Cross entropy example: with output y = [0.0, 0.1, 0.0, 0.3, 0.1, 0.1, 0.0, 0.0, 0.4, 0.0] and ground truth t = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0], the loss is $-\sum_k t_k \log(y_k) = -\log(0.3)$.
  40. Misclassification rate vs. cross entropy: two sets of predictions can share the same misclassification rate but have different cross-entropy losses.
      Outputs [0.3, 0.3, 0.4], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7] vs. targets [0, 0, 1], [0, 1, 0], [1, 0, 0]: correct? Y, Y, N; loss $-(\log(0.4) + \log(0.4) + \log(0.1))/3 = 1.38$.
      Outputs [0.1, 0.2, 0.7], [0.1, 0.7, 0.2], [0.3, 0.4, 0.3] vs. the same targets: correct? Y, Y, N; loss $-(\log(0.7) + \log(0.7) + \log(0.3))/3 = 0.64$.
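The two losses above are easy to verify; a small NumPy sketch that reproduces both the shared misclassification rate and the 1.38 vs. 0.64 cross-entropy values:

```python
import numpy as np

targets   = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]])
outputs_a = np.array([[0.3, 0.3, 0.4], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7]])
outputs_b = np.array([[0.1, 0.2, 0.7], [0.1, 0.7, 0.2], [0.3, 0.4, 0.3]])

def cross_entropy(y, t):
    return -np.mean(np.sum(t * np.log(y), axis=1))

def misclassification(y, t):
    return np.mean(y.argmax(axis=1) != t.argmax(axis=1))

# Both sets get 2 of 3 correct, but cross entropy separates them.
print(misclassification(outputs_a, targets), cross_entropy(outputs_a, targets))  # 0.33..., ~1.38
print(misclassification(outputs_b, targets), cross_entropy(outputs_b, targets))  # 0.33..., ~0.64
```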
  41. Optimizers: • SGD with Momentum • RMS propagation • Adagrad • Adadelta • Adam
  42. The per-weight learning rate can be adapted from the recent updates $\Delta W_1, \Delta W_2, \Delta W_3, \Delta W_4$ over training time, e.g. $\alpha'_{t=4} = \frac{\alpha}{\sqrt{\Delta W_2^2 + \Delta W_3^2 + \Delta W_4^2}}$.
  43. Or from the full history of updates: $\alpha'_{t=T} = \frac{\alpha}{\sqrt{\sum_{t=0}^{T} \Delta W_t^2}}$.
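A sketch of this Adagrad-style per-weight learning rate in NumPy (names are mine; a small epsilon is added to avoid division by zero):

```python
import numpy as np

def adagrad_update(W, dW, accum, alpha=0.001, eps=1e-8):
    """Scale each weight's step by the accumulated history of its squared updates."""
    accum = accum + dW ** 2                         # running sum of squared gradients
    effective_lr = alpha / (np.sqrt(accum) + eps)   # alpha' = alpha / sqrt(sum dW_t^2)
    return W - effective_lr * dW, accum
```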
  44. Agenda: • Intel Nervana overview • Machine learning basics • What is deep learning? • Basic deep learning concepts • Model ingredients in-depth • Deep learning with neon
  45. [Figure only]
  46. [Figure only]
  47. Why Python? • Popular, well established, developer familiarity • Fast to prototype • Rich ecosystem of existing packages • Data science: pandas, pycuda, ipython, matplotlib, h5py, … • Good "glue" language: scriptable plus functional and OO support, plays well with other languages
  48. neon components:
      Backend: NervanaGPU, NervanaCPU
      Datasets: MNIST, CIFAR-10, Imagenet 1K, PASCAL VOC, Mini-Places2, IMDB, Penn Treebank, Shakespeare Text, bAbI, Hutter-prize, UCF101, flickr8k, flickr30k, COCO
      Initializers: Constant, Uniform, Gaussian, Glorot Uniform, Xavier, Kaiming, IdentityInit, Orthonormal
      Optimizers: Gradient Descent with Momentum, RMSProp, AdaDelta, Adam, Adagrad, MultiOptimizer
      Activations: Rectified Linear, Softmax, Tanh, Logistic, Identity, ExpLin
      Layers: Linear, Convolution, Pooling, Deconvolution, Dropout, Recurrent, Long Short-Term Memory, Gated Recurrent Unit, BatchNorm, LookupTable, Local Response Normalization, Bidirectional-RNN, Bidirectional-LSTM
      Costs: Binary Cross Entropy, Multiclass Cross Entropy, Sum of Squares Error
      Metrics: Misclassification (Top1, TopK), LogLoss, Accuracy, PrecisionRecall, ObjectDetection
  49. The neon workflow: 1. Generate backend, 2. Load data, 3. Specify model architecture, 4. Define training parameters, 5. Train model, 6. Evaluate.
  50. [Figure only]
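For reference, the six steps of slide 49 look roughly like the following in neon. This is a sketch modeled on neon's published MNIST multilayer perceptron example; exact module paths and argument names may differ between neon releases:

```python
from neon.backends import gen_backend
from neon.data import MNIST
from neon.initializers import Gaussian
from neon.layers import Affine, GeneralizedCost
from neon.models import Model
from neon.optimizers import GradientDescentMomentum
from neon.transforms import Rectlin, Softmax, CrossEntropyMulti, Misclassification
from neon.callbacks.callbacks import Callbacks

# 1. Generate backend
be = gen_backend(backend='cpu', batch_size=128)

# 2. Load data
mnist = MNIST(path='data/')
train_set, valid_set = mnist.train_iter, mnist.valid_iter

# 3. Specify model architecture: 784 -> 100 (ReLU) -> 10 (softmax)
init = Gaussian(loc=0.0, scale=0.01)
layers = [Affine(nout=100, init=init, activation=Rectlin()),
          Affine(nout=10, init=init, activation=Softmax())]
mlp = Model(layers=layers)

# 4. Define training parameters
cost = GeneralizedCost(costfunc=CrossEntropyMulti())
optimizer = GradientDescentMomentum(learning_rate=0.1, momentum_coef=0.9)
callbacks = Callbacks(mlp, eval_set=valid_set)

# 5. Train model
mlp.fit(train_set, optimizer=optimizer, num_epochs=10, cost=cost, callbacks=callbacks)

# 6. Evaluate
print('Misclassification error:', mlp.eval(valid_set, metric=Misclassification()))
```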
  51. NERVANA. Contact: andres.rodriguez@intel.com
