Building Deep Learning Applications with Apache MXNet and Gluon
Cyrus Vahid <cyrusmv@amazon.com>
Principal Evangelist, AI Labs – MXNet
Aug 2018
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Background
Deductive Reasoning
P  Q  |  P ∧ Q  |  P ∨ Q  |  P → Q
T  T  |    T    |    T    |    T
T  F  |    F    |    T    |    F
F  T  |    F    |    T    |    T
F  F  |    F    |    F    |    T
• 𝑃 = 𝑇 ∧ 𝑄 = 𝑇 ∴ 𝑃 ∧ 𝑄 = 𝑇
• 𝑃 ∧ 𝑄 ∴ 𝑃 → 𝑄; ∼ 𝑃 ∴ 𝑃 → 𝑄
• P → Q
P
_________
∴ Q
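A quick sanity check of the table and of modus ponens, enumerating every truth assignment (an illustrative snippet, not from the original deck):

from itertools import product

for p, q in product([True, False], repeat=2):
    implies = (not p) or q                      # material implication P -> Q
    print(p, q, p and q, p or q, implies)       # reproduces the table above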
Rule-Based Programming
Plausible Reasoning
Programming with Data
1. Understand your data
2. Algorithmically discover hidden patterns
3. Generalize the solution into an algorithm
4. Apply the solution to unseen patterns
5. Make predictions
Fundamentals
Biological & Artificial Neuron
Source: http://cs231n.github.io/neural-networks-1/
Perceptron
[Diagram: inputs I1, I2 and bias B connect to output O through weights w1, w2, w3]

$f(x_i, w_i) = \Phi\left(b + \sum_i w_i x_i\right)$

$\Phi(x) = \begin{cases} 1, & \text{if } x \ge 0.5 \\ 0, & \text{if } x < 0.5 \end{cases}$
Perceptron
[Diagram: the same perceptron with weights w1 = 1, w2 = 1 and bias weight w3 = -1.5]

With $I_1 = I_2 = B_1 = 1$: $O_1 = 1 \cdot 1 + 1 \cdot 1 + (-1.5) \cdot 1 = 0.5 \;\therefore\; \Phi(O_1) = 1$

With $I_2 = 0$, $I_1 = B_1 = 1$: $O_1 = 1 \cdot 1 + 0 \cdot 1 + (-1.5) \cdot 1 = -0.5 \;\therefore\; \Phi(O_1) = 0$

With these weights the unit computes the logical AND of I1 and I2 (a worked sketch follows below).
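A minimal sketch of this unit in plain Python, using the slide's weights (1, 1, -1.5) and the step activation thresholded at 0.5:

def phi(x):
    return 1 if x >= 0.5 else 0                 # step activation from the previous slide

def perceptron(i1, i2, w1=1.0, w2=1.0, wb=-1.5, b=1):
    return phi(w1 * i1 + w2 * i2 + wb * b)

for i1 in (0, 1):
    for i2 in (0, 1):
        print(i1, i2, '->', perceptron(i1, i2)) # 1 only when both inputs are 1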
Non-Linearity
P  Q  |  P ∧ Q  |  P ⊕ Q
T  T  |    T    |    F
T  F  |    F    |    T
F  T  |    F    |    T
F  F  |    F    |    F
[Plots: P ∧ Q is linearly separable – a single line divides its true and false points; P ⊕ Q is not – no single line can separate them]
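No single threshold unit can compute XOR, but a two-layer composition can; the hand-picked weights below are illustrative, not from the deck:

def phi(x):
    return 1 if x >= 0.5 else 0

def unit(i1, i2, w1, w2, b):
    return phi(w1 * i1 + w2 * i2 + b)

def xor(p, q):
    h1 = unit(p, q, 1, 1, -0.5)                 # OR
    h2 = unit(p, q, -1, -1, 1.5)                # NAND
    return unit(h1, h2, 1, 1, -1.5)             # AND(OR, NAND) = XOR

for p in (0, 1):
    for q in (0, 1):
        print(p, q, '->', xor(p, q))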
Deep Learning
[Diagram: input layer → hidden layers → output]
A non-linearity is applied to the output of each hidden layer to transform the output into a continuous range.
The “Learning” in Deep Learning
[Diagram: an input X and its label pass forward through weights (e.g., 0.4, 0.3, 0.2, 0.9, ...) to produce a prediction X1; when X1 != X, backpropagation (gradient descent) adjusts each weight by a small step, e.g., 0.4 ± δ, producing new weights for the next pass]
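A minimal Gluon sketch of that loop, with a toy model and made-up data (not the deck's own code):

import mxnet as mx
from mxnet import autograd, gluon

net = gluon.nn.Dense(1)                          # toy one-layer model
net.initialize()
loss_fn = gluon.loss.L2Loss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

X = mx.nd.random.uniform(shape=(10, 2))          # toy inputs
y = X.sum(axis=1, keepdims=True)                 # toy labels

for epoch in range(5):
    with autograd.record():                      # record the forward pass
        loss = loss_fn(net(X), y)
    loss.backward()                              # backpropagate the error
    trainer.step(X.shape[0])                     # gradient-descent update -> new weights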
Activation Function (Φ)
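The activation plots did not survive the export; for reference, common choices for Φ computed with MXNet's ndarray API (an illustrative snippet):

import mxnet as mx

x = mx.nd.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(mx.nd.sigmoid(x))                          # squashes into (0, 1)
print(mx.nd.tanh(x))                             # squashes into (-1, 1)
print(mx.nd.relu(x))                             # zeroes negative inputs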
Inputs: Preprocessing, Batches, Epochs
Preprocessing
 Random separation of data into training, validation, and test sets
 Necessary for measuring the accuracy of the model
Batch
 The amount of data propagated through the network at each iteration
 Enables faster optimization through shorter iteration cycles
Epoch
 A complete pass through all of the training data
 Optimization runs for multiple epochs to reduce the error rate (see the sketch below)
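A hedged sketch of how batches and epochs map onto Gluon's data utilities (shapes and batch size are made up for illustration):

import mxnet as mx
from mxnet import gluon

X = mx.nd.random.uniform(shape=(1000, 28 * 28))
y = mx.nd.random.randint(0, 10, shape=(1000,))
dataset = gluon.data.ArrayDataset(X, y)
loader = gluon.data.DataLoader(dataset, batch_size=64, shuffle=True)

for epoch in range(3):                           # one epoch = one full pass
    for data, label in loader:                   # one iteration = one batch of 64
        pass                                     # forward/backward/update goes here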
Inputs: Encoding MNIST data
https://www.tensorflow.org/get_started/mnist/beginners
Inputs: Encoding Pictures into Data
A 7 × 7 × 3 matrix (width × height × RGB channels)
Classification with the Softmax Function
Softmax converts the output layer into probabilities – necessary for classification
Softmax Function
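The formula image did not survive the export; for reference, softmax maps logits $z$ to $\sigma(z)_j = e^{z_j} / \sum_k e^{z_k}$. A small sketch, shifted by the max for numerical stability:

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))                    # shift so exp() cannot overflow
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))        # probabilities summing to 1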
Loss Function
• An objective function that quantifies how successful the model was in its predictions
• A measure of the difference between a neural net's prediction and the actual value – that is, the error
• Typically we use cross-entropy loss, which adjusts the plain loss calculation to mitigate learning slowdown
• Backpropagation is performed to calculate the error contribution of each neuron after processing one batch (see the sketch below)
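A minimal Gluon sketch of the loss-plus-backpropagation step described above; the model and batch are placeholders:

import mxnet as mx
from mxnet import autograd, gluon

net = gluon.nn.Dense(10)                         # placeholder classifier head
net.initialize()
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

data = mx.nd.random.uniform(shape=(32, 64))      # one toy batch
label = mx.nd.random.randint(0, 10, shape=(32,)).astype('float32')

with autograd.record():
    loss = loss_fn(net(data), label)             # cross-entropy per example
loss.backward()                                  # error contribution of each parameter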
Gradient Descent
Iteratively update parameters to find the values that optimize the objective function
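Each iteration applies the update

$\theta \leftarrow \theta - \eta \, \nabla_\theta J(\theta)$

where $J$ is the objective (loss) function and $\eta$ is the learning rate discussed a few slides on.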
Weight Initialization
https://stats.stackexchange.com/questions/47590/what-are-good-initial-weights-in-a-neural-network
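In Gluon the scheme is chosen at initialization time; a one-line example with Xavier initialization, which keeps activation variance stable across layers:

import mxnet as mx
from mxnet import gluon

net = gluon.nn.Dense(64)
net.initialize(mx.init.Xavier())                 # Xavier-initialized weights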
Stochastic Gradient Descent
Gradient Descent
A single iteration of the parameter update runs through ALL of the training data.

Stochastic Gradient Descent
A single iteration of the parameter update runs through a BATCH of the training data.
Optimizers
http://imgur.com/a/Hqolp
Learning Rates
• Learning Rate: a scalar that determines how far to step in the direction of steepest descent
• Online Learning: weights are updated after every example (slow to learn)
• Batch Learning: weights are updated after all training data is processed (hard to optimize)
• Mini-Batch: a combination of both – we break the training set into smaller batches and update the weights after each mini-batch
Training and Validation Data
[Plot: training vs. validation error over the course of training; the best model is where validation error is lowest]
When accuracy is evaluated only on the training set, we run into overfitting.
Dropout
Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting", JMLR 2014
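In Gluon, dropout is just a layer; a minimal sketch with an illustrative rate of 0.5:

from mxnet import gluon

net = gluon.nn.Sequential()
with net.name_scope():
    net.add(gluon.nn.Dense(256, activation='relu'))
    net.add(gluon.nn.Dropout(0.5))               # randomly zero half the activations while training
    net.add(gluon.nn.Dense(10))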
MXNet
Computational Dependency/Graph
• 𝑧 = 𝑥 ⋅ 𝑦
• 𝑘 = 𝑎 ⋅ 𝑏
• 𝑡 = 𝜆𝑧 + 𝑘
[Graph: multiply nodes compute z = x · y and k = a · b independently; u = λ · z feeds an add node together with k to produce t, so the two products can execute in parallel]
Computational Dependency/Graph
import mxnet as mx

net = mx.sym.Variable('data')
net = mx.sym.FullyConnected(net, name='fc1', num_hidden=64)
net = mx.sym.Activation(net, name='relu1', act_type="relu")
net = mx.sym.FullyConnected(net, name='fc2', num_hidden=26)
net = mx.sym.SoftmaxOutput(net, name='softmax')
mx.viz.plot_network(net)
Scaling with MXNet
[Plot: throughput scaling vs. the ideal across 1–256 GPUs for Inception v3, ResNet, and AlexNet, reaching roughly 88% efficiency]
Imperative vs Symbolic Programming
Imperative
• Execution flow is the same as the flow of the code
• Flexible but inefficient
• Memory: 4 * 10 * 8 = 320 bytes
• Interim values are available
• No operation folding
• Familiar coding paradigm

Symbolic
• Abstract functions are defined and compiled first; data binding happens next
• Efficient
• Memory: 2 * 10 * 8 = 160 bytes
• Interim values are not available
• Operation folding: multiple operations are folded into one, so we run one op on the GPU instead of many – possible because the whole computation graph is available ahead of time (contrast sketch below)
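The contrast in one toy example (illustrative, not from the deck):

import mxnet as mx

# Imperative: every line executes immediately on real data
a = mx.nd.ones((10,)) * 2
b = a + 1                                        # interim value available right away
print(b.asnumpy())

# Symbolic: build an abstract graph first, bind data to it later
x = mx.sym.Variable('x')
y = x * 2 + 1                                    # nothing executes yet
ex = y.bind(ctx=mx.cpu(), args={'x': mx.nd.ones((10,))})
print(ex.forward()[0].asnumpy())                 # graph runs only now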
Gluon
Evolution of DL Frameworks
Advantages of the Gluon API
Simple, Easy-to-Understand Code
 Neural networks can be defined using simple, clear, concise code
 Plug-and-play neural network building blocks – including predefined layers, optimizers, and initializers

Flexible, Imperative Structure
 Eliminates the rigidity of neural network model definition and brings together the model with the training algorithm
 Intuitive, easy-to-debug, familiar code

Dynamic Graphs
 Neural networks can change in shape or size during the training process to address advanced use cases where the size of the data fed is variable
 An important area of innovation in Natural Language Processing (NLP)

High Performance
 No sacrifice with respect to training speed
 When it is time to move from prototyping to production, easily cache neural networks for high performance and a reduced memory footprint (a minimal example follows below)
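A minimal Gluon sketch illustrating the first two points (a toy MLP; shapes are inferred on the first forward pass):

import mxnet as mx
from mxnet import gluon

net = gluon.nn.Sequential()
with net.name_scope():
    net.add(gluon.nn.Dense(64, activation='relu'))
    net.add(gluon.nn.Dense(10))
net.initialize(mx.init.Xavier())
out = net(mx.nd.random.uniform(shape=(32, 784)))  # imperative call, like any function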
Code
https://github.com/cyrusmvahid/GluonBootcamp/tree/master/labs/fancy_mnist
What’s New
• GluonCV, a Deep Learning Toolkit for Computer Vision
• Features:
• Training scripts that reproduce SOTA results reported in the latest papers
• A large set of pre-trained models
• Carefully designed APIs and easy-to-understand implementations
• Community support
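An illustrative one-liner from the model zoo (the model name is one of many available):

from gluoncv import model_zoo

net = model_zoo.get_model('resnet18_v1', pretrained=True)   # downloads pre-trained weights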
What’s New
• GluonNLP, a Deep Learning Toolkit for Natural
Language Processing
• Features:
• Training scripts to reproduce SOTA results reported in research
papers.
• Pre-trained models for common NLP tasks.
• Carefully designed APIs that greatly reduce the implementation
complexity.
• Community support.
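An illustrative example of the pre-trained embeddings (the GloVe source name follows the GluonNLP docs):

import gluonnlp as nlp

glove = nlp.embedding.create('glove', source='glove.6B.50d')  # pre-trained GloVe vectors
print(glove['beautiful'].shape)                               # (50,)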
What’s New
• MXNet backend for Keras: Keras is a high-level neural networks API, written in Python and capable of running on top of Apache MXNet, TensorFlow, CNTK, and Theano.
• Performance: the MXNet backend is scalable and fast for both new projects and existing code, so it can improve the performance of existing models with minimal effort. For benchmarks, see:
https://github.com/awslabs/keras-apache-mxnet/tree/master/benchmark
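Per the keras-apache-mxnet README, switching is a matter of installing the keras-mxnet package and pointing ~/.keras/keras.json at the new backend (channels_first is the layout recommended there for MXNet performance):

{
    "backend": "mxnet",
    "image_data_format": "channels_first"
}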
References
• MXNet: http://mxnet.incubator.apache.org/
• Gluon 60-min crash course: https://gluon-crash-course.mxnet.io/
• Deep learning book based on Gluon: https://gluon.mxnet.io/
• GluonCV: https://gluon-cv.mxnet.io/
• GluonNLP: https://gluon-nlp.mxnet.io/
• Keras-mxnet: https://github.com/awslabs/keras-apache-mxnet
Thank you!
cyrusmv@amazon.com