Intel Nervana Artificial Intelligence Meetup 1/31/17

Deep learning is unlocking tremendous economic value across many market sectors. Individual data scientists can draw on several open source frameworks and basic hardware resources during the initial investigative phase, but they quickly need significant hardware and software resources to build and deploy production models. Intel Nervana has built a competitive deep learning platform that makes it easy for data scientists to start from the iterative, investigatory phase and take models all the way to deployment. Nervana’s platform is designed for speed and scale, and serves as a catalyst for all types of organizations to benefit from the full potential of deep learning. Examples of supported applications include, but are not limited to, automotive speech interfaces, image search, language translation, agricultural robotics and genomics, financial document summarization, and finding anomalies in IoT data. In this talk, we will give an overview of Nervana’s DL platform and get some hands-on experience using it to train and execute deep learning models.

Speaker: Will Constable

Join our Meetup Group: https://www.meetup.com/SV-Deep-Learning/

  1. 1. Introduction to deep learning with neon. MAKING MACHINES SMARTER.™ Proprietary and confidential. Do not distribute.
  2. 2. Nervana Systems Proprietary • Intel Nervana overview • Machine learning basics • What is deep learning? • Basic deep learning concepts • Example: recognition of handwritten digits • Model ingredients in-depth • Deep learning with neon
  3. 3. Intel Nervana's deep learning solution stack. [Diagram labels: Images, Video, Text, Speech, Tabular, Time series; Solutions]
  4. 4. Model Zoo: Deep Dream, Autoencoders, Deep Speech 2, Skip-thought, SegNet, Fast-RCNN, Object Localization, Deep Reinforcement Learning, IMDB Sentiment Analysis, Video Activity Detection, Deep Residual Net, bAbI Q&A, AllCNN, AlexNet, GoogLeNet, VGG. https://github.com/NervanaSystems/ModelZoo
  5. 5. Intel Nervana in action • Healthcare: tumor detection • Automotive: speech interfaces • Finance: time-series search engine • Agricultural robotics • Oil & gas • Proteomics: sequence analysis [Image labels: Positive, Negative; Query, Results]
  6. 6. • Optimized AVX-2 and AVX-512 instructions • Intel® Xeon® processors and Intel® Xeon Phi™ processors • Optimized for common deep learning operations: GEMM (useful in RNNs and fully connected layers), convolutions, pooling, ReLU, batch normalization • Coming soon: LSTM, GRU, Winograd-based convolutions
  9. 9. • Supervised learning: data -> labels • Unsupervised learning: no labels; clustering; reducing dimensionality • Reinforcement learning: reward actions (e.g., robotics)
  11. 11. Classical machine learning: $N \times N$ image → $K \ll N$ features $(f_1, f_2, \ldots, f_K)$ → classifier (SVM, Random Forest, Naïve Bayes, Decision Trees, Logistic Regression, Ensemble methods) → label "Arjun"
  12. 12. [Figure: example classes: Animals, Faces, Chairs, Fruits, Vehicles]
  14. 14. [Figure: the same classes with sample points (x) illustrating training error and testing error]
  15. 15. Bias-variance trade-off. [Plot: error against training time; curves: training error, testing/validation error; regions: underfitting, overfitting]
  17. 17. Deep learning: $N \times N$ image → network with ~60 million parameters → label "Arjun". But old practices apply: data cleaning, underfitting/overfitting, data exploration, choosing the right cost function, hyperparameters, etc.
  18. 18. • Bigger data: image: 1000 KB / picture; audio: 5000 KB / song; video: 5,000,000 KB / movie • Better hardware: transistor density doubles every 18 months; cost / GB in 1995: $1000.00; cost / GB in 2015: $0.03 • Smarter algorithms: advances in algorithm innovation, including neural networks, leading to better accuracy in training models
  20. 20. • Intel Nervana overview • Machine learning basics • What is deep learning? • Basic deep learning concepts • Model ingredients in-depth • Deep learning with neon
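  21. 21. A single unit: the inputs $x_1, x_2, x_3$ (input from unit $j$) are multiplied by the linear weights $w_1, w_2, w_3$ and summed together with a bias unit, $a = \sum_j w_j x_j + b$. The activation function $g$, for example $\max(a, 0)$ or $\tanh(a)$, gives the output of the unit, $y = g(a)$.
  22. 22. Affine layer: linear + bias + activation. [Diagram: input, hidden, and output layers]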
  23. 23. MNIST dataset: 70,000 images (28x28 pixels). Goal: classify each image as a digit 0-9. Input: N = 28 x 28 pixels = 784 input units. Hidden: N = 100 hidden units (user-defined parameter). Output: N = 10 output units (one for each digit); each unit i encodes the probability that the input image is the digit i.
  24. 24. Layer $i$ (N=784) → layer $j$ (N=100) → layer $k$ (N=10). Total parameters: $W_{i \to j}$: 784 x 100; $b_j$: 100; $W_{j \to k}$: 100 x 10; $b_k$: 10; total = 78,400 + 100 + 1,000 + 10 = 79,510.
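The parameter count for the 784-100-10 network can be checked by summing the four terms listed on the slide. This is a minimal sketch; the helper name `affine_param_count` is made up for illustration and is not part of neon.

```python
def affine_param_count(n_in, n_out):
    """Parameters of one affine layer: an n_in x n_out weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out


# Layer sizes from the MNIST example: 784 inputs, 100 hidden units, 10 outputs.
layer_sizes = [784, 100, 10]

# One count per pair of adjacent layers (input->hidden, hidden->output).
per_layer = [affine_param_count(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer, total)  # [78500, 1010] 79510
```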
  25. 25. Training procedure (input → hidden → output): 1. Randomly seed weights 2. Forward pass 3. Cost 4. Backward pass 5. Update weights
  26. 26. Randomly seed weights: $W_{i \to j}, b_j \sim \mathrm{Gaussian}(0, 1)$; $W_{j \to k}, b_k \sim \mathrm{Gaussian}(0, 1)$
  27. 27. Forward pass: 28x28 input → hidden → output (10x1): [0.0, 0.1, 0.0, 0.3, 0.1, 0.1, 0.0, 0.0, 0.4, 0.0]
  28. 28. Cost: output (10x1) [0.0, 0.1, 0.0, 0.3, 0.1, 0.1, 0.0, 0.0, 0.4, 0.0] is compared with the ground truth [0, 0, 0, 1, 0, 0, 0, 0, 0, 0] by the cost function $c(\text{output}, \text{truth})$
  29. 29. Backward pass: the cost $c(\text{output}, \text{truth})$ is propagated back through the network to give the weight updates $\Delta W_{j \to k}$ and $\Delta W_{i \to j}$
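  30. 30. Given the cost $C(y, \text{truth})$ and a weight $W^*$, compute $\frac{\partial C}{\partial W^*}$.
  31. 31. For a weight $W^*$ in the output layer: $C(y, \text{truth}) = C\big(g\big(\sum (W_{j \to k} x_k + b_k)\big)\big)$
  32. 32. $C(y, \text{truth}) = C\big(g\big(\sum (W_{j \to k} x_k + b_k)\big)\big) = C\big(g(a(W_{j \to k}, x_k))\big)$, so by the chain rule $\frac{\partial C}{\partial W^*} = \frac{\partial C}{\partial g} \frac{\partial g}{\partial a} \frac{\partial a}{\partial W^*}$. [Plots: $g = \max(a, 0)$ against $a$, and $g'(a)$ against $a$]
  33. 33. For a weight $W^*$ in the hidden layer, the input to the output layer is the hidden output, $x_k = y_j$: $C(y, \text{truth}) = C\big(g_k(a_k(W_{j \to k}, x_k = y_j))\big) = C\big(g_k(a_k(W_{j \to k}, g_j(a_j(W_{i \to j}, x_j))))\big)$, so $\frac{\partial C}{\partial W^*} = \frac{\partial C}{\partial g_k} \frac{\partial g_k}{\partial a_k} \frac{\partial a_k}{\partial g_j} \frac{\partial g_j}{\partial a_j} \frac{\partial a_j}{\partial W^*}$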
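The chain rule above can be sanity-checked numerically on a single unit. The sketch below assumes a squared-error cost and a ReLU activation, both chosen only for illustration (the slides do not fix a cost here), and compares the analytic gradient with a finite-difference estimate. All function names are hypothetical.

```python
def relu(a):
    return max(a, 0.0)


def relu_grad(a):
    # Derivative of max(a, 0): 1 where the unit is active, 0 otherwise.
    return 1.0 if a > 0 else 0.0


def cost(y, t):
    # Squared error, an assumption made for this example.
    return 0.5 * (y - t) ** 2


def forward(w, b, x):
    # a = sum_j w_j * x_j + b, y = g(a)
    a = sum(wi * xi for wi, xi in zip(w, x)) + b
    return a, relu(a)


def analytic_grad(w, b, x, t):
    # dC/dw_i = dC/dg * dg/da * da/dw_i, with da/dw_i = x_i
    a, y = forward(w, b, x)
    dC_dg = y - t
    dg_da = relu_grad(a)
    return [dC_dg * dg_da * xi for xi in x]


def numeric_grad(w, b, x, t, eps=1e-6):
    # Central finite differences, one weight at a time.
    grads = []
    for i in range(len(w)):
        w_plus = list(w)
        w_plus[i] += eps
        w_minus = list(w)
        w_minus[i] -= eps
        c_plus = cost(forward(w_plus, b, x)[1], t)
        c_minus = cost(forward(w_minus, b, x)[1], t)
        grads.append((c_plus - c_minus) / (2 * eps))
    return grads


w, b, x, t = [0.5, -0.3, 0.8], 0.1, [1.0, 2.0, 3.0], 1.0
g_analytic = analytic_grad(w, b, x, t)
g_numeric = numeric_grad(w, b, x, t)
print(g_analytic, g_numeric)
```

The two gradients should agree to within the finite-difference error, which is the usual way to check a hand-written backward pass.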
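  34. 34. $J(\boldsymbol{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\boldsymbol{w}^{(0)}, \boldsymbol{x}_i)$ [Plot: $J$ against $\boldsymbol{w}$, with $\boldsymbol{w}^{(0)}$ marked]
  35. 35. $J(\boldsymbol{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\boldsymbol{w}^{(0)}, \boldsymbol{x}_i)$, with gradient $\frac{dJ(\boldsymbol{w}^{(0)})}{d\boldsymbol{w}}$ at $\boldsymbol{w}^{(0)}$
  36. 36. Update: $\boldsymbol{w}^{(1)} = \boldsymbol{w}^{(0)} - \frac{dJ(\boldsymbol{w}^{(0)})}{d\boldsymbol{w}}$
  37. 37. Update with learning rate $\alpha$: $\boldsymbol{w}^{(1)} = \boldsymbol{w}^{(0)} - \alpha \frac{dJ(\boldsymbol{w}^{(0)})}{d\boldsymbol{w}}$
  38. 38. $\boldsymbol{w}^{(1)} = \boldsymbol{w}^{(0)} - \alpha \frac{dJ(\boldsymbol{w}^{(0)})}{d\boldsymbol{w}}$ with $\alpha$ too small [Plot: $\boldsymbol{w}^{(1)}$ marked]
  39. 39. $\boldsymbol{w}^{(1)} = \boldsymbol{w}^{(0)} - \alpha \frac{dJ(\boldsymbol{w}^{(0)})}{d\boldsymbol{w}}$ with $\alpha$ too large [Plot: $\boldsymbol{w}^{(1)}$ marked]
  40. 40. $\boldsymbol{w}^{(1)} = \boldsymbol{w}^{(0)} - \alpha \frac{dJ(\boldsymbol{w}^{(0)})}{d\boldsymbol{w}}$ with $\alpha$ good enough [Plot: $\boldsymbol{w}^{(1)}$ marked]
  41. 41. $J(\boldsymbol{w}^{(1)}) = \sum_{i=1}^{N} \mathrm{cost}(\boldsymbol{w}^{(1)}, \boldsymbol{x}_i)$; $\boldsymbol{w}^{(2)} = \boldsymbol{w}^{(1)} - \alpha \frac{dJ(\boldsymbol{w}^{(1)})}{d\boldsymbol{w}}$
  42. 42. $J(\boldsymbol{w}^{(2)}) = \sum_{i=1}^{N} \mathrm{cost}(\boldsymbol{w}^{(2)}, \boldsymbol{x}_i)$; $\boldsymbol{w}^{(3)} = \boldsymbol{w}^{(2)} - \alpha \frac{dJ(\boldsymbol{w}^{(2)})}{d\boldsymbol{w}}$
  43. 43. $J(\boldsymbol{w}^{(3)}) = \sum_{i=1}^{N} \mathrm{cost}(\boldsymbol{w}^{(3)}, \boldsymbol{x}_i)$; $\boldsymbol{w}^{(4)} = \boldsymbol{w}^{(3)} - \alpha \frac{dJ(\boldsymbol{w}^{(3)})}{d\boldsymbol{w}}$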
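The effect of the learning rate in the update rule above can be seen on a toy one-dimensional problem. This sketch assumes a quadratic cost, cost(w, x_i) = 0.5 (w - x_i)^2, and three made-up data points; neither comes from the slides. The minimum of J is at the mean of the data, w = 2.

```python
def grad_J(w, xs):
    # J(w) = sum_i 0.5 * (w - x_i)^2, so dJ/dw = sum_i (w - x_i)
    return sum(w - x for x in xs)


def descend(w0, xs, alpha, steps):
    # Repeated update: w <- w - alpha * dJ/dw
    w = w0
    for _ in range(steps):
        w = w - alpha * grad_J(w, xs)
    return w


xs = [1.0, 2.0, 3.0]  # hypothetical data; J is minimized at w = 2

w_small = descend(0.0, xs, alpha=0.001, steps=20)  # too small: barely moves
w_large = descend(0.0, xs, alpha=0.7, steps=20)    # too large: overshoots and diverges
w_good = descend(0.0, xs, alpha=0.3, steps=20)     # good enough: converges to the minimum

print(w_small, w_large, w_good)
```

For this cost each step multiplies the distance to the minimum by (1 - 3α), which is why α = 0.7 diverges (the factor is -1.1) while α = 0.3 converges quickly (the factor is 0.1).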
  44. 44. For each training example (six shown): fprop → cost → bprop → $\delta W$
  45. 45. For each training example: fprop → cost → bprop → $\delta W$. Update weights via $\Delta W = \alpha \cdot \frac{1}{N} \sum \delta W$, where $\alpha$ is the learning rate.
  46. 46. The examples are split into minibatches: minibatch #1 → weight update; minibatch #2 → weight update
  47. 47. Epoch 0, Epoch 1. Sample numbers: • Learning rate ~0.001 • Batch sizes of 32-128 • 50-90 epochs
  48. 48. [Figure: SGD compared with Gradient Descent]
  49. 49. Krizhevsky, 2012: 60 million parameters. Taigman, 2014: 120 million parameters.
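The minibatch update, averaging the per-example gradients and scaling by the learning rate, can be sketched in plain Python. The model (a single weight fitting y = 3x), the data, and the hyperparameter values below are assumptions picked so the example runs quickly; they are not the sample numbers from the slide.

```python
import random


def minibatches(n_examples, batch_size, rng):
    # Shuffle the example indices once per epoch, then cut them into batches.
    idx = list(range(n_examples))
    rng.shuffle(idx)
    return [idx[i:i + batch_size] for i in range(0, n_examples, batch_size)]


def train(xs, ys, lr=0.01, batch_size=16, epochs=20, seed=0):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):  # one epoch = one pass over all examples
        for batch in minibatches(len(xs), batch_size, rng):
            # fprop, cost, bprop per example:
            # cost = 0.5 * (w*x - y)^2, so delta_w = (w*x - y) * x
            deltas = [(w * xs[i] - ys[i]) * xs[i] for i in batch]
            # Weight update: Delta W = alpha * (1/N) * sum(delta_w)
            w -= lr * sum(deltas) / len(deltas)
    return w


# Hypothetical data: 64 points on the line y = 3x.
xs = [i / 10 for i in range(1, 65)]
ys = [3.0 * x for x in xs]

w_fit = train(xs, ys)
print(w_fit)
```

With 64 examples and a batch size of 16, each epoch performs four weight updates instead of the single update per pass that full gradient descent would make.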
  51. 51. Model ingredients: Dataset, Model/Layers, Activation, Cost $C(y, t)$, Optimizer
  52. 52. Layers: filter + non-linearity → pooling → filter + non-linearity → pooling → … → fully connected layers → output ("cat", “how can I help you?”). Feature hierarchy: low level features → mid level features → object parts, phonemes → objects, words. *Hinton et al., LeCun, Zeiler, Fergus
  53. 53. Activations: Tanh (output between -1 and 1), Rectified Linear Unit, Logistic (output between 0 and 1), Softmax: $g(a_j) = \frac{e^{a_j}}{\sum_k e^{a_k}}$
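The four activation functions named above are short enough to write out directly. This is an illustrative sketch using only the standard library; the function names are chosen here and are not neon's API. The softmax subtracts the maximum input before exponentiating, a common trick that leaves the result unchanged but avoids overflow.

```python
import math


def tanh(a):
    # Output lies between -1 and 1.
    return math.tanh(a)


def rectlin(a):
    # Rectified linear unit: max(a, 0).
    return max(a, 0.0)


def logistic(a):
    # Output lies between 0 and 1.
    return 1.0 / (1.0 + math.exp(-a))


def softmax(a):
    # g(a_j) = exp(a_j) / sum_k exp(a_k), computed on shifted inputs for stability.
    m = max(a)
    exps = [math.exp(v - m) for v in a]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
print(probs, sum(probs))
```

Because the softmax outputs are positive and sum to one, they can be read as class probabilities, which is how the ten output units are interpreted in the MNIST example.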
  54. 54. Initializers: Gaussian: Gaussian(mean, sd) • GlorotUniform: Uniform(-k, k) with $k = \sqrt{\frac{6}{d_{in} + d_{out}}}$ • Xavier: Uniform(-k, k) with $k = \sqrt{\frac{3}{d_{in}}}$ • Kaiming: Gaussian(0, sigma) with $\sigma = \sqrt{\frac{2}{d_{in}}}$
  55. 55. Costs: • Cross Entropy Loss • Misclassification Rate • Mean Squared Error • L1 loss
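The three scaled initializers differ only in the distribution and the scale computed from the layer's fan-in and fan-out. The sketch below writes them as plain Python functions returning a nested list with d_in rows and d_out columns; the function names and that layout are assumptions for illustration and do not reproduce neon's classes.

```python
import math
import random


def glorot_uniform(d_in, d_out, rng):
    # Uniform(-k, k) with k = sqrt(6 / (d_in + d_out))
    k = math.sqrt(6.0 / (d_in + d_out))
    return [[rng.uniform(-k, k) for _ in range(d_out)] for _ in range(d_in)]


def xavier_uniform(d_in, d_out, rng):
    # Uniform(-k, k) with k = sqrt(3 / d_in)
    k = math.sqrt(3.0 / d_in)
    return [[rng.uniform(-k, k) for _ in range(d_out)] for _ in range(d_in)]


def kaiming_gaussian(d_in, d_out, rng):
    # Gaussian(0, sigma) with sigma = sqrt(2 / d_in)
    sigma = math.sqrt(2.0 / d_in)
    return [[rng.gauss(0.0, sigma) for _ in range(d_out)] for _ in range(d_in)]


rng = random.Random(0)  # fixed seed so the example is repeatable
w_glorot = glorot_uniform(784, 100, rng)
w_xavier = xavier_uniform(784, 100, rng)
w_kaiming = kaiming_gaussian(784, 100, rng)

print(len(w_glorot), len(w_glorot[0]))  # 784 100
```

All three shrink the initial weights as the layer gets wider, which keeps the summed input to each unit at a similar scale regardless of layer size.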
  56. 56. Cross entropy: output (10x1) [0.0, 0.1, 0.0, 0.3, 0.1, 0.1, 0.0, 0.0, 0.4, 0.0], ground truth [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]: $-\sum_k t_k \log(y_k) = -\log(0.3)$
  57. 57. Two sets of outputs with the same number of misclassifications but different cross-entropy loss:
      Outputs        Targets    Correct?
      0.3 0.3 0.4    0 0 1      Y
      0.3 0.4 0.3    0 1 0      Y
      0.1 0.2 0.7    1 0 0      N
      -(log(0.4) + log(0.4) + log(0.1))/3 = 1.38
      Outputs        Targets    Correct?
      0.1 0.2 0.7    0 0 1      Y
      0.1 0.7 0.2    0 1 0      Y
      0.3 0.4 0.3    1 0 0      N
      -(log(0.7) + log(0.7) + log(0.3))/3 = 0.64
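The cross-entropy numbers on the two slides above can be reproduced directly. In this sketch the helper names are hypothetical, and terms with a zero target are skipped so that outputs of exactly 0.0 do not trigger log(0).

```python
import math


def cross_entropy(outputs, targets):
    # Mean over examples of -sum_k t_k * log(y_k).
    total = 0.0
    for y, t in zip(outputs, targets):
        total += -sum(tk * math.log(yk) for yk, tk in zip(y, t) if tk > 0)
    return total / len(outputs)


def misclassification_rate(outputs, targets):
    # An example is wrong when the largest output is not at the target index.
    wrong = sum(1 for y, t in zip(outputs, targets) if y.index(max(y)) != t.index(max(t)))
    return wrong / len(outputs)


targets = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
outputs_a = [[0.3, 0.3, 0.4], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7]]
outputs_b = [[0.1, 0.2, 0.7], [0.1, 0.7, 0.2], [0.3, 0.4, 0.3]]

loss_a = cross_entropy(outputs_a, targets)
loss_b = cross_entropy(outputs_b, targets)
print(round(loss_a, 2), round(loss_b, 2))  # 1.38 0.64
print(misclassification_rate(outputs_a, targets), misclassification_rate(outputs_b, targets))

# Single MNIST-style example: the true class (index 3) was given probability 0.3.
mnist_out = [0.0, 0.1, 0.0, 0.3, 0.1, 0.1, 0.0, 0.0, 0.4, 0.0]
mnist_truth = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
loss_single = cross_entropy([mnist_out], [mnist_truth])  # equals -log(0.3)
```

Both sets of outputs get one example in three wrong, yet the second has less than half the cross-entropy loss, because it is more confident on the correct answers and less confident on the wrong one.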
  58. 58. Optimizers: • SGD with Momentum • RMS propagation • Adagrad • Adadelta • Adam
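  59. 59. Updates $\Delta W_1, \Delta W_2, \Delta W_3, \Delta W_4$ over training time; effective learning rate using the whole history: $\alpha'_{t=T} = \frac{\alpha}{\sqrt{\sum_{t=0}^{T} \Delta W_t^2}}$
  60. 60. Updates $\Delta W_1, \Delta W_2, \Delta W_3, \Delta W_4$ over training time; effective learning rate using only recent updates: $\alpha'_{t=4} = \frac{\alpha}{\sqrt{\Delta W_2^2 + \Delta W_3^2 + \Delta W_4^2}}$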
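The two adaptive learning-rate formulas above can be compared on a short gradient history. This is a sketch under assumptions: the gradient values are made up, the function names are hypothetical, and a small `eps` is added to the denominator (a common safeguard against division by zero that the slides do not show).

```python
import math


def full_history_rates(grads, alpha, eps=1e-8):
    # alpha'_T = alpha / sqrt(sum_{t<=T} grad_t^2): the rate only ever shrinks.
    rates, total = [], 0.0
    for g in grads:
        total += g * g
        rates.append(alpha / (math.sqrt(total) + eps))
    return rates


def windowed_rate(grads, alpha, window, eps=1e-8):
    # Same formula, but summing only the most recent `window` squared updates.
    recent = grads[-window:]
    return alpha / (math.sqrt(sum(g * g for g in recent)) + eps)


grads = [1.0, 2.0, 2.0, 4.0]  # hypothetical values for Delta W_1 .. Delta W_4
alpha = 0.1

rates = full_history_rates(grads, alpha)
recent_rate = windowed_rate(grads, alpha, window=3)
print(rates, recent_rate)
```

Summing over the full history makes the rate decay monotonically, while the windowed form forgets old updates, so the rate can recover if recent gradients become small.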
  64. 64. • Popular, well established, developer familiarity • Fast to prototype • Rich ecosystem of existing packages • Data science: pandas, pycuda, ipython, matplotlib, h5py, … • Good “glue” language: scriptable plus functional and OO support, plays well with other languages
  65. 65. neon components:
      Backend: NervanaGPU, NervanaCPU
      Datasets: MNIST, CIFAR-10, Imagenet 1K, PASCAL VOC, Mini-Places2, IMDB, Penn Treebank, Shakespeare Text, bAbI, Hutter-prize, UCF101, flickr8k, flickr30k, COCO
      Initializers: Constant, Uniform, Gaussian, Glorot Uniform, Xavier, Kaiming, IdentityInit, Orthonormal
      Optimizers: Gradient Descent with Momentum, RMSProp, AdaDelta, Adam, Adagrad, MultiOptimizer
      Activations: Rectified Linear, Softmax, Tanh, Logistic, Identity, ExpLin
      Layers: Linear, Convolution, Pooling, Deconvolution, Dropout, Recurrent, Long Short-Term Memory, Gated Recurrent Unit, BatchNorm, LookupTable, Local Response Normalization, Bidirectional-RNN, Bidirectional-LSTM
      Costs: Binary Cross Entropy, Multiclass Cross Entropy, Sum of Squares Error
      Metrics: Misclassification (Top1, TopK), LogLoss, Accuracy, PrecisionRecall, ObjectDetection
  66. 66. neon workflow: 1. Generate backend 2. Load data 3. Specify model architecture 4. Define training parameters 5. Train model 6. Evaluate
  68. 68. NERVANA andres.rodriguez@intel.com
