Proprietary and confidential. Do not distribute.
Introduction to Deep Learning and Neon
MAKING MACHINES SMARTER.™
Kyle H. Ambert, PhD

Senior Data Scientist
May 25th, 2017
@TheKyleAmbert
Nervana Systems Proprietary
About me & Intel’s Artificial Intelligence Products Group (AIPG)
Together, we create production deep learning solutions in multiple
domains, while advancing the field of applied analytics and optimization.
Intel’s Interest in Analytics
• To provide the infrastructure for the fastest time-to-insight
• To create tools that enable scientists to think about their research, rather than their process
• To enable users to ask bigger questions
Bigger Data: Image: 1000 KB / picture; Audio: 5000 KB / song; Video: 5,000,000 KB / movie
Better Hardware: Transistor density doubles every 18 months; Cost / GB in 1995: $1000.00; Cost / GB in 2015: $0.03
Smarter Algorithms: Advances in neural networks leading to better accuracy in training models
Great solutions require great hardware!
[Figure: Intel AI product stack]
Layers: HARDWARE (Compute, Memory/Storage, Fabric), LIBRARIES, FRAMEWORKS, PLATFORMS/TOOLS
Products shown: Intel® MKL, Intel® MKL-DNN, Intel® DAAL, Intel Distribution, BIGDL, Intel® Nervana™ Deep Learning Platform, Intel® Nervana™ Cloud, Intel® Nervana™ Graph
Other labels: MORE; UNLEASHING POTENTIAL; FULL SOLUTIONS
This Evening
1. Machine Learning and Data Science
2. Introduction to Deep Learning
3. Nervana!
4. Neon
5. Deep Learning Use Cases
AI? Machine Learning? Deep Learning?
Machine learning is the development and application of algorithms that can learn from data in an automated, semi-automated, or supervised setting.
Statistical Learning: algorithms which leverage statistical methods for estimating functions from examples (Naïve Bayes, SVM, GLM, tree-based, kNN)
Deep Learning: algorithms where multiple layers of neurons learn successively more complex representations of input data (CNN, RNN, DFF, RBM, LSTM)
Training: building a mathematical model based on input data
Classification (scoring): using a trained model to make predictions about new data
Workflow stages: Ingest Data, Clean Data, Engineer Features, Visualize, Query/Analyze, Structure Model, Train Model, Deploy
A Quite Brief History of Deep Learning
• 1949: The Organization of Behavior is published (Hebb!)
• 1960s: Neural networks used for binary classification
• 1970s: Neural network popularity dries up after not delivering on the hype (Minsky)
• 1980s: Backpropagation is used to train deep networks (Hinton)
• 1990s: Neural networks take a back seat to support vector machines due to their nice theoretical properties and guaranteed bounds (Vapnik)
• 2010s: Access to large datasets and more computation allows deep networks to return and achieve state-of-the-art results in speech, vision, and natural language processing
Today: Deep Learning is a fast-moving area of academic and applied analytics!
There are many opportunities for new discoveries!
ML v. DL: Practical Differences
 
Classical ML classifiers: SVM, Random Forest, Naïve Bayes, Decision Trees, Logistic Regression, Ensemble methods
Label: Harrison
End-to-End Deep learning
~60 million parameters
Label: Harrison
Workflows in Machine Learning
⟹ The same rules apply for deep learning!
➝ Preprocessing data
➝ Feature extraction
➝ Parsimony in model selection
⟹ How we go about some of this does change…
End-to-End Deep learning: Data Considerations
Labels: Harrison?
Transformations!
More data is always better!
Deep Learning: Networks of Artificial Neurons
 
 
 
$y_i = g\left(\sum_j w_{ij} x_j + b_i\right)$
$y_i$: output of unit $i$
$g$: activation function
$w_{ij}$: linear weights
$b_i$: bias unit
$x_j$: input from unit $j$
⟹ With an explosion of moving parts,
being able to understand and keep
track of what sort of model is being
built becomes even more important!
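A minimal sketch of a single artificial neuron, assuming the standard form suggested by the slide labels (weighted sum of inputs plus a bias, passed through an activation function). The function names and the example numbers are illustrative, not taken from the talk.

```python
def relu(z):
    """Rectified linear activation: max(z, 0)."""
    return max(z, 0.0)


def neuron_output(inputs, weights, bias, activation=relu):
    """Output of one unit: activation(sum_j w_j * x_j + bias)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Hypothetical inputs and weights, chosen only for illustration.
y = neuron_output(inputs=[1.0, 2.0], weights=[0.5, -1.0], bias=2.0)
print(y)  # 0.5 * 1.0 + (-1.0) * 2.0 + 2.0 = 0.5
```

A network is many of these units wired in layers, which is why the number of moving parts grows so quickly.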
Practical example: recognition of handwritten digits
MNIST dataset: 70,000 images (28x28 pixels)
Goal: classify images into a digit 0-9
Input: N = 28 x 28 pixels = 784 input units
Hidden: N = 100 hidden units (user-defined parameter)
Output: N = 10 output units (one for each digit). Each unit i encodes the probability that the input image is the digit i.
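To make the size of this small network concrete, the sketch below counts its trainable parameters. It assumes fully connected layers with one bias per unit, which the slide does not state explicitly.

```python
# Layer sizes from the slide: 784 inputs, 100 hidden units, 10 outputs.
layer_sizes = [28 * 28, 100, 10]


def count_parameters(sizes):
    """Weights plus biases for a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + one bias per unit
    return total


n_params = count_parameters(layer_sizes)
print(n_params)  # 784*100 + 100 + 100*10 + 10 = 79510
```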
Training procedure
1. Randomly seed weights
2. Forward pass
3. Cost
4. Backward pass
5. Update weights
Forward pass
Input (28x28) → Hidden → Output (10x1)
Output: [0.0, 0.1, 0.0, 0.3, 0.1, 0.1, 0.0, 0.0, 0.4, 0.0]
Cost
Input (28x28) → Hidden → Output (10x1)
Output: [0.0, 0.1, 0.0, 0.3, 0.1, 0.1, 0.0, 0.0, 0.4, 0.0]
Ground Truth: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
Cost function: computed from the output and the ground truth
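The slide's own cost formula did not survive extraction. As one possible illustration, the sketch below scores the output vector from the slide against the ground truth using multiclass cross-entropy; the choice of cross-entropy is an assumption, not necessarily the cost used in the talk.

```python
import math

# Output and ground truth vectors as shown on the slide.
output = [0.0, 0.1, 0.0, 0.3, 0.1, 0.1, 0.0, 0.0, 0.4, 0.0]
truth = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]


def cross_entropy(probs, target, eps=1e-12):
    """Multiclass cross-entropy; eps guards against log(0)."""
    return -sum(t * math.log(max(p, eps)) for p, t in zip(probs, target))


predicted_digit = max(range(len(output)), key=output.__getitem__)
true_digit = truth.index(1)
cost = cross_entropy(output, truth)

print(predicted_digit, true_digit)  # the network favors 8, the label is 3
print(round(cost, 4))               # -ln(0.3), about 1.204
```

The cost is driven only by the probability assigned to the correct digit, so it shrinks as that probability approaches 1.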
Backward pass
Input → Hidden → Output (10x1)
Output: [0.0, 0.1, 0.0, 0.3, 0.1, 0.1, 0.0, 0.0, 0.4, 0.0]
Ground Truth: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
Cost function → $\Delta W_{i \to j}$
Back-propagation
[Figure: back-propagation through the Input, Hidden, and Output layers; surviving label: compute]
[Plots: activation $g(z) = \max(z, 0)$ and its derivative $g'(z)$]
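A small sketch of the rectified linear activation and its derivative, the two curves plotted on this slide. The symbol names are illustrative, and the derivative at exactly zero is set to 0 here by convention.

```python
def relu(z):
    """Rectified linear unit: max(z, 0)."""
    return max(z, 0.0)


def relu_prime(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0


for z in (-2.0, 0.0, 3.0):
    print(z, relu(z), relu_prime(z))
```

During back-propagation the derivative acts as a gate: gradients pass through units that were active in the forward pass and are blocked at units that were not.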
Training
fprop → cost → bprop, repeated for each training example
Gradient descent
fprop → cost → bprop for every training example, then update weights via:
$W_{i \to j} \leftarrow W_{i \to j} - \eta \, \frac{\partial C}{\partial W_{i \to j}}$
where $\eta$ is the learning rate and $C$ is the cost
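A minimal sketch of the gradient descent update on a one-parameter toy cost, C(w) = (w - 3)^2. The toy cost, starting point, and learning rate are assumptions chosen so the behavior is easy to follow; a real network applies the same rule to every weight.

```python
def gradient_descent(grad, w0, learning_rate, steps):
    """Repeatedly apply w <- w - learning_rate * dC/dw."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


# Toy cost C(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
w_final = gradient_descent(lambda w: 2.0 * (w - 3.0),
                           w0=0.0, learning_rate=0.1, steps=100)
print(w_final)
```

With this learning rate each step shrinks the distance to the minimum by a constant factor; a rate that is too large would overshoot instead.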
Stochastic (minibatch) Gradient descent
minibatch #1: fprop → cost → bprop for each example, then weight update
minibatch #2: fprop → cost → bprop for each example, then weight update
Epoch 0
Epoch 1
Sample numbers:
• Learning rate ~0.001
• Batch sizes of 32-128
• 50-90 epochs
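The sketch below shows how one epoch can be split into shuffled minibatches, with one weight update per minibatch. The dataset size of 1000 and the helper name are hypothetical; the batch size of 128 is taken from the range on the slide.

```python
import random


def minibatches(n_examples, batch_size, rng):
    """Yield shuffled index lists that together cover one epoch."""
    indices = list(range(n_examples))
    rng.shuffle(indices)
    for start in range(0, n_examples, batch_size):
        yield indices[start:start + batch_size]


rng = random.Random(0)  # fixed seed so the run is repeatable
batches = list(minibatches(n_examples=1000, batch_size=128, rng=rng))
updates_per_epoch = len(batches)
print(updates_per_epoch)  # 8 weight updates; the last batch is smaller
```

Training for several epochs simply repeats this loop, reshuffling the examples each time.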
Why Does This Work at All?
Krizhevsky, 2012: 60 million parameters
Taigman, 2014: 120 million parameters
Nervana in 30 seconds. Possibly less.
neon deep learning framework: explore, train, deploy
nervana engine
2-3x speedup on Titan X GPUs
nervana cloud
neon framework
nervana cloud
Web Interface, Command Line
Ge(i)t Neon!
1. git clone https://github.com/NervanaSystems/neon.git
2. pip install h5py pyaml virtualenv
3. brew install opencv (or opencv3)
4. cd neon && make python2 (or make python3)
5. . .venv/bin/activate
6. examples/mnist_mlp.py
7. deactivate
Documentation: https://goo.gl/jZgfNg
Deep learning ingredients
Dataset, Model/Layers, Activation, Optimizer, Cost
neon overview
Backend: NervanaGPU, NervanaCPU, NervanaMGPU
Datasets: MNIST, CIFAR-10, Imagenet 1K, PASCAL VOC, Mini-Places2, IMDB, Penn Treebank, Shakespeare Text, bAbI, Hutter-prize, UCF101, flickr8k, flickr30k, COCO
Initializers: Constant, Uniform, Gaussian, Glorot Uniform, Xavier, Kaiming, IdentityInit, Orthonormal
Optimizers: Gradient Descent with Momentum, RMSProp, AdaDelta, Adam, Adagrad, MultiOptimizer
Activations: Rectified Linear, Softmax, Tanh, Logistic, Identity, ExpLin
Layers: Linear, Convolution, Pooling, Deconvolution, Dropout, Recurrent, Long Short-Term Memory, Gated Recurrent Unit, BatchNorm, LookupTable, Local Response Normalization, Bidirectional-RNN, Bidirectional-LSTM
Costs: Binary Cross Entropy, Multiclass Cross Entropy, Sum of Squares Error
Metrics: Misclassification (Top1, TopK), LogLoss, Accuracy, PrecisionRecall, ObjectDetection
Curated Models
• https://github.com/NervanaSystems/ModelZoo
• Pre-trained weights and models
Examples: SegNet, Deep Speech 2, Skip-thought, Autoencoders, Deep Dream
Neon workflow
1. Generate backend
2. Load data
3. Specify model architecture
4. Define training parameters
5. Train model
6. Evaluate
Interacting with Neon
1. Via command line
2. In a virtual environment
3. In an ipython/jupyter notebook
4. ncloud
Nervana Cloud
• Layers: convolution, rectified linear units, pooling, dropout, softmax
• Popular with 2D + depth (+ time) inputs
• Gray or RGB images
• Videos
• Synthetic aperture radar
• Spectrogram (speech)
• Convolution
• Use multiple copies of the same feature on the input (correlation)
• Use several features (aka kernels, filters)
• Reduces number of weights compared to fully connected layers
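A minimal sketch of the sliding-window correlation that a convolution layer computes, using a tiny 3x3 image and a 2x2 kernel. The numbers are made up for illustration, and the sketch ignores padding, strides, and multiple channels.

```python
def correlate2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]  # hypothetical filter: top-left minus bottom-right

feature_map = correlate2d_valid(image, kernel)
print(feature_map)  # [[-4, -4], [-4, -4]]

# The same 4 kernel weights are reused at every position, whereas a fully
# connected layer mapping 9 inputs to 4 outputs would need 9 * 4 weights.
shared_weights = len(kernel) * len(kernel[0])
dense_weights = 9 * 4
print(shared_weights, dense_weights)  # 4 36
```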
• Rectified linear units (ReLU)
• It is fast: no normalization or exponential computations
• Induces sparsity in the hidden units
• Pooling
• Downsampling
• Reduces the number of parameters
• Provides some translation invariance
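As an illustration of downsampling, the sketch below applies 2x2 max pooling with stride 2 to a 4x4 feature map. Max pooling with this window size is an assumption; the slide does not say which pooling variant is meant.

```python
def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]]

pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 2], [2, 8]]
```

The output has a quarter as many values, and shifting a strong activation by one pixel inside a window leaves the pooled result unchanged.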
• Dropout
• Reduces overfitting: prevents co-adaptation on training data
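A rough sketch of dropout on a list of activations. It uses the common "inverted dropout" scaling, which is an assumption about the variant rather than a description of any particular framework's layer; the function name and keep probability are illustrative.

```python
import random


def dropout(activations, keep_prob, rng, training=True):
    """Randomly zero units during training; pass values through otherwise."""
    if not training:
        return list(activations)
    # Scale the surviving units so the expected activation stays the same.
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0] * 1000

dropped = dropout(activations, keep_prob=0.5, rng=rng)
print(dropped.count(0.0))  # roughly half of the units are zeroed

unchanged = dropout(activations, keep_prob=0.5, rng=rng, training=False)
```

Because a different random subset of units is silenced on every pass, no unit can rely on a specific partner always being present.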
• Softmax
• aka “normalized exponential function”
• Normalizes vector to a probability distribution
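A minimal sketch of the normalized exponential function. Subtracting the maximum score before exponentiating is a standard numerical-stability step added here; it does not change the result. The example scores are made up.

```python
import math


def softmax(scores):
    """Map arbitrary scores to positive values that sum to 1."""
    m = max(scores)  # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
print(probs)       # larger scores receive larger probabilities
print(sum(probs))  # sums to 1 (up to floating-point rounding)
```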
Code!
DEEP LEARNING USE CASES!
Long Short-Term Memory (LSTM)
Why Recurrent Neural Networks?
Feed-forward networks (Input → Hidden → Output) assume:
• Independence
• Fixed length
Recurrent networks handle:
• Temporal dependencies
• Variable sequence length
Recurrent neuron
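The equations on this slide did not survive extraction. The sketch below assumes the textbook recurrent update, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b), with small made-up weight matrices; all names and values are hypothetical.

```python
import math


def rnn_step(x, h_prev, W_xh, W_hh, b):
    """One recurrent step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)."""
    h = []
    for i in range(len(b)):
        z = b[i]
        z += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        z += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h.append(math.tanh(z))
    return h


# Hypothetical weights: 2 input features, 2 hidden units.
W_xh = [[0.5, -0.5], [0.3, 0.8]]
W_hh = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]

h = [0.0, 0.0]
states = []
for x in ([1, 0], [0, 1], [1, 0]):
    h = rnn_step(x, h, W_xh, W_hh, b)
    states.append(h)

print(states)
```

The first and third inputs are identical, yet the hidden states differ, because each state also depends on what came before.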
 
 
 
 
 
 
   
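RNN: what is it good for?
Input “h”: input vector [1, 0, 0, 0], hidden [0.1, -0.4, 0.6], output [0.1, 0.7, 0.1, 0.1], target “e”
Input “e”: input vector [1, 0, 0, 0], hidden [-0.3, 0.6, 1.6], output [0.1, 0.3, 0.4, 0.2], target “l”
Input “l”: input vector [1, 0, 0, 0], hidden [0.7, -0.4, -0.4], output [0.3, 0.0, 0.6, 0.1], target “l”
Input “l”: input vector [1, 0, 0, 0], hidden [0.1, -0.8, 0.1], output [0.0, 0.0, 0.2, 0.8], target “o”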
Learned a language model!
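A small sketch of how the character sequence above can be turned into training pairs for a language model: each input character is one-hot encoded and the target is simply the next character. The vocabulary ordering is an assumption made for illustration.

```python
# Hypothetical vocabulary ordering for the four distinct characters.
vocab = ["h", "e", "l", "o"]


def one_hot(ch):
    """Vector with a single 1 at the character's vocabulary index."""
    vec = [0] * len(vocab)
    vec[vocab.index(ch)] = 1
    return vec


text = "hello"
inputs = [one_hot(ch) for ch in text[:-1]]  # "h", "e", "l", "l"
targets = list(text[1:])                    # "e", "l", "l", "o"

for vec, target in zip(inputs, targets):
    print(vec, "->", target)
```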
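RNN: what is it good for?
Input “cash”: input vector [1, 0, 0, 0], hidden [0.1, -0.4, 0.6], output [0.1, 0.7, 0.1, 0.1], target “flow”
Input “flow”: input vector [1, 0, 0, 0], hidden [-0.3, 0.6, 1.6], output [0.1, 0.3, 0.4, 0.2], target “is”
Input “is”: input vector [1, 0, 0, 0], hidden [0.7, -0.4, -0.4], output [0.4, 0.0, 0.5, 0.1], target “high” (candidates shown: “low”, “high”)
Input “high”: input vector [1, 0, 0, 0], hidden [0.1, -0.8, 0.1], output [0.0, 0.0, 0.2, 0.8], target “today”
Learned a language model!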
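RNN: what is it good for?
Sentiment classification: “this” “movie” “was” “bad” “and” “long” <eos> → NEGATIVE
Input “this”: input vector [1, 0, 0, 0], hidden [0.1, -0.4, 0.6]
Input “movie”: input vector [0, 1, 0, 0], hidden [-0.3, 0.6, 1.6]
Input “was”: input vector [0, 0, 1, 0], hidden [0.7, -0.4, -0.4]
Input “bad”: input vector [0, 0, 0, 1], hidden [0.1, -0.8, 0.1]
Input “and”: input vector [1, 0, 0, 0], hidden [0.1, -0.8, 0.1]
Input “long”: input vector [1, 0, 0, 0], hidden [0.7, -0.4, -0.4]
Input <eos>: input vector [0, 1, 0, 0], hidden [-0.3, 0.6, 1.6]
Final output: [0.2, 0.8]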
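RNN: what is it good for?
Translation: “neon” “is” “amazing” → “neon” “est” “incroyable” “!”
Source input vectors: [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]
Source-side hidden states: [0.1, -0.4, 0.6], [-0.3, 0.6, 1.6], [0.7, -0.4, -0.4], [0.1, -0.8, 0.1]
Target-side hidden states: [0.1, -0.8, 0.1], [0.7, -0.4, -0.4], [-0.3, 0.6, 1.6]
Outputs: [0.1, 0.7, 0.1, 0.1], [0.1, 0.3, 0.4, 0.2], [0.3, 0.0, 0.6, 0.1], [0.0, 0.0, 0.2, 0.8]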
Long Short-Term Memory (LSTM)
Manipulate memory cell:
1. “forget” (flush the memory)
2. “input” (add to memory)
3. “output” (get from memory)
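To show how the three gates manipulate the memory cell, here is a sketch of a single LSTM step for scalar inputs and states. It follows the commonly used LSTM formulation, which is an assumption since the slide's equations were lost; the parameter dictionary and its keys are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One scalar LSTM step; p maps hypothetical parameter names to values."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g    # "forget" old memory, "input" new content
    h = o * math.tanh(c)      # "output": read from memory
    return h, c


zeros = {k: 0.0 for k in ("wf", "uf", "bf", "wi", "ui", "bi",
                          "wo", "uo", "bo", "wg", "ug", "bg")}

# Forget gate wide open, input gate shut: the memory is carried over.
keep = dict(zeros, bf=100.0, bi=-100.0)
_, c_kept = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, p=keep)

# Forget gate shut as well: the memory is flushed.
flush = dict(zeros, bf=-100.0, bi=-100.0)
_, c_flushed = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, p=flush)

print(c_kept, c_flushed)
```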
Example – Sentiment analysis with LSTM
“Okay, sorry, but I loved this movie. I just love the whole 80’s genre of these kind of movies, because you don’t see many like this...” -~CupidGrl~ → POSITIVE
“The plot/writing is completely unrealistic and just dumb at times. Bond is dressed up in a white tux on an overnight train ride? eh, OK. But then they just show up at the villain’s compound like nothing bad is going to happen to them. How stupid is this Bond?” → NEGATIVE
Preprocessing
“Okay, sorry, but I loved this movie. I just love the whole 80’s genre of these kind of movies, because you don’t see many like this...” -~CupidGrl~
→ [5, 4, 940, 107, 14, 672, 1790, 333, 47, 11, 7890, …, 1]
Out-of-Vocab (e.g. CupidGrl)
• Limit vocab size to 20,000 words
• Truncate each example to 128 words [from the left]
• Pad shorter examples up to 128 words with whitespace (padding) tokens
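A rough sketch of the preprocessing steps listed above: map words to integer ids, send unknown words to an out-of-vocabulary id, truncate from the left, and pad to a fixed length. The id values, the whitespace tokenizer, the tiny vocabulary, and the choice to pad on the left are all assumptions made for illustration.

```python
PAD_ID = 0  # hypothetical id for padding
OOV_ID = 1  # hypothetical id for out-of-vocabulary words


def encode(text, vocab, max_len=128):
    """Convert a review to a fixed-length list of word ids."""
    ids = [vocab.get(word, OOV_ID) for word in text.lower().split()]
    ids = ids[-max_len:]  # truncate from the left: keep the last max_len words
    return [PAD_ID] * (max_len - len(ids)) + ids


# Toy vocabulary; a real one would hold the 20,000 most frequent words.
vocab = {"i": 5, "loved": 4, "this": 9, "movie": 7}

padded = encode("I loved this movie", vocab, max_len=6)
truncated = encode("I loved this movie", vocab, max_len=2)
with_oov = encode("CupidGrl loved this movie", vocab, max_len=4)

print(padded)     # [0, 0, 5, 4, 9, 7]
print(truncated)  # [9, 7]
print(with_oov)   # [1, 4, 9, 7]
```

Fixing the length this way lets every review in a batch share the same shape.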
Model
Input: [5, 4, 940, 107, 14, 672, 1790, 333, 47, 11, 7890, …, 1]
embedding layer (d=128) → LSTM (N=64, one step per word) → RecurrentSum → Affine (N=2) → POS / NEG
Data flow
embedding layer (d=128) → LSTM (n=64, one step per word) → RecurrentSum → Affine → output of shape (2, 1): POS / NEG
Data flow in batches with neon
Input: [5, 4, 940, 107, 14, 672, 1790, 333, 47, 11, 7890, …, 1]
embedding layer (d=128) → LSTM (n=64, one step per word) → RecurrentSum → Affine → output of shape (2, bsz): POS / NEG
Code! (LSTM)
More Code! (LSTM)
In Summary…
1. Deep learning methods are powerful and versatile
2. It’s important to understand how DL relates to traditional ML methods
3. The barrier to entry for using DL in practice is lowered with the neon framework on the Nervana ecosystem
kyle.h.ambert@intel.com
@TheKyleAmbert

Introduction to Deep Learning and neon at Galvanize
