SlideShare a Scribd company logo
1 of 24
©2017, Amazon Web Services, Inc. or its affiliates. All rights reserved
Deep Learning with Apache MXNet
Julien Simon, Principal Technical Evangelist
@julsimon
Selected customers running AI on AWS
Questions
• What’s the business problem my IT has failed to solve?
– That’s probably where Deep Learning can help
• Should I design and train my own Deep Learning model?
– Do I have the expertise?
– Do I have enough time, data & compute to train it?
• Should I use a pre-trained model?
– How well does it fit my use case?
– On what data was it trained? How close is this to my own data?
• Should I use a SaaS solution?
Amazon AI for every developer
More information at aws.amazon.com/ai
Neural Networks
Frank Rosenblatt
Artificial Intelligence pioneer
1957 – The Perceptron
Appropriate weights are applied to the inputs, and
the resulting weighted sum is passed to a function
that produces the output.
The neuron
x = [x1, x2, …. xI ]
w = [w1, w2, …. wI ]
u = x.w
Activation function
The neural network
x =
x11, x12, …. x1I
x21, x22, …. x2I
x…, x.., …. x.I
xm1, xm2, …. xmI
I features
m samples
y =
y1
y2
y…
ym
m labels
Total number of predictions
Accuracy =
Number of correct predictions
The training process
• The difference between the predicted output and the actual output (aka
ground truth) is called the prediction loss.
• There are different ways to compute it (loss functions).
• The purpose of training is to iteratively minimize loss and maximize
accuracy for a given data set.
• We need a way to adjust weights (aka parameters) in order to gradually
minimize loss
 Backpropagation + optimization algorithm
Paul Werbos
Artificial Intelligence pioneer
IEEE Neural Network Pioneer Award
1974 – Backpropagation
The back-propagation algorithm acts as an error
correcting mechanism at each neuron level,
thereby helping the network to learn effectively.
Stochastic Gradient Descent (SGD)
Imagine you stand on top of a mountain with
skis strapped to your feet. You want to get down
to the valley as quickly as possible, but there is
fog and you can only see your immediate
surroundings. How can you get down the
mountain as quickly as possible? You look
around and identify the steepest path down, go
down that path for a bit, again look around and
find the new steepest path, go down that path,
and repeat—this is exactly what gradient
descent does.
Tim Dettmers
University of Lugano
2015
Source: https://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-history-training/
The « step size » is called
the learning rate
There is such a thing as « learning too well »
• If a network is large enough and given enough time, it will perfectly
learn a data set (universal approximation theorem).
• But what about new samples? Can it also predict them correctly?
• Does the network generalize well or not?
• To prevent overfitting, we need to know when to stop training.
• The training data set is not enough.
Data sets
Full data set
Training data set
Validation data set
Test data set
Training set: this data set is used to
adjust the weights on the neural
network.
Validation set: this data set is used
to minimize overfitting. You're not
adjusting the weights of the network
with this data set, you're just
verifying that any increase in
accuracy over the training data set
actually yields an increase in
accuracy over a data set that has not
been shown to the network before.
Testing set: this data set is used
only for testing the final weights in
order to benchmark the actual
predictive power of the network.
Training
Training data set Training
Trained
neural network
Number of epochs
Batch size
Learning rate
Hyper parameters
Validation
Validation Data set Trained
neural network
Validation
accuracy
Prediction at
the end of
each epoch
Stop training when validation accuracy stops increasing
Saving parameters at the end of each epoch is a good idea ;)
Convolutional Neural Networks
Le Cun, 1998: handwritten digit recognition, 32x32 pixels
Feature extraction and downsampling allow smaller networks
https://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-core-concepts/
Apache MXNet
Apache MXNet
Programmable Portable High Performance
Near linear scaling
across hundreds of GPUs
Highly efficient
models for mobile
and IoT
Simple syntax,
multiple languages
Most Open Best On AWS
Optimized for
deep learning on
AWS
Accepted into the
Apache Incubator
import numpy as np
a = np.ones(10)
b = np.ones(10) * 2
c = b * a
d = c + 1
• Straightforward and flexible.
• Take advantage of language
native features (loop, condition,
debugger).
• E.g. Numpy, Matlab, Torch, …
• Hard to optimize
PROS
CONSEasy to tweak
in Python
Imperative Programming
• More chances for optimization
• Cross different languages
• E.g. TensorFlow, Theano,
Caffe
• Less flexible
PROS
CONS
C can share memory with D
because C is deleted later
A = Variable('A')
B = Variable('B')
C = B * A
D = C + 1
f = compile(D)
d = f(A=np.ones(10),
B=np.ones(10)*2)
A B
1
+
x
Declarative Programming
0
4
8
12
16
1 2 4 8 16
Ideal
Inception v3
Resnet
Alexnet
91%
Efficiency
Multi-GPU Scaling With MXNet
Input Output
1 1 1
1 0 1
0 0 0
3
mx. sym. Convol ut i on( dat a, ker nel =( 5, 5) , num_f i l t er =20)
mx. sym. Pool i ng( dat a, pool _t ype=" max" , ker nel =( 2, 2) ,
st r i de=( 2, 2)
l st m. l st m_unr ol l ( num_l st m_l ayer , seq_l en, l en, num_hi dden, num_embed)
4 2
2 0
4=Max
1
3
...
4
0.2
-0.1
...
0.7
mx. sym. Ful l yConnect ed( dat a, num_hi dden=128)
2
mx. symbol . Embeddi ng( dat a, i nput _di m, out put _di m = k)
0.2
-0.1
...
0.7
Queen
4 2
2 0
2=Avg
Input Weights
cos(w, queen ) = cos(w, k i n g) - cos(w, m an ) + cos(w, w om an )
mx. sym. Act i vat i on( dat a, act _t ype=" xxxx" )
" r el u"
" t anh"
" si gmoi d"
" sof t r el u"
Neural Art
Face Search
Image Segmentation
Image Caption
“ People Riding
Bikes”
Bicycle, People,
Road, Sport
Image Labels
Image
Video
Speech
Text
“ People Riding
Bikes”
Machine Translation
“ Οι άνθρωποι
ιππασίας ποδήλατα”
Events
mx. model . FeedFor war d model . f i t
mx. sym. Sof t maxOut put
Let’s code
1. Our first network
2. Using pre-trained models
3. Learning from scratch (MNIST, MLP, CNN)
4. Learning and fine-tuning (CIFAR-10, Resnext-101)
5. Using Apache MXNet as a Keras backend
6. Fine-tuning with Keras/MXNet (CIFAR-10, Resnet-50)
7. A quick word about Sockeye
http://mxnet.io
http://medium.com/@julsimon
https://github.com/juliensimon/aws/tree/master/mxnet
Thank you!
http://aws.amazon.com/evangelists/julien-simon
@julsimon

More Related Content

What's hot

Deep Learning for Developers (October 2017)
Deep Learning for Developers (October 2017)Deep Learning for Developers (October 2017)
Deep Learning for Developers (October 2017)Julien SIMON
 
Running BSD on AWS
Running BSD on AWSRunning BSD on AWS
Running BSD on AWSJulien SIMON
 
ECS for Amazon Deep Learning and Amazon Machine Learning
ECS for Amazon Deep Learning and Amazon Machine LearningECS for Amazon Deep Learning and Amazon Machine Learning
ECS for Amazon Deep Learning and Amazon Machine LearningAmanda Mackay (she/her)
 
Intro to the Distributed Version of TensorFlow
Intro to the Distributed Version of TensorFlowIntro to the Distributed Version of TensorFlow
Intro to the Distributed Version of TensorFlowAltoros
 
Optimizing training on Apache MXNet (January 2018)
Optimizing training on Apache MXNet (January 2018)Optimizing training on Apache MXNet (January 2018)
Optimizing training on Apache MXNet (January 2018)Julien SIMON
 
MCL333_Building Deep Learning Applications with TensorFlow on AWS
MCL333_Building Deep Learning Applications with TensorFlow on AWSMCL333_Building Deep Learning Applications with TensorFlow on AWS
MCL333_Building Deep Learning Applications with TensorFlow on AWSAmazon Web Services
 
Deep Learning for Developers (October 2017)
Deep Learning for Developers (October 2017)Deep Learning for Developers (October 2017)
Deep Learning for Developers (October 2017)Julien SIMON
 
Build, train, and deploy Machine Learning models at scale (May 2018)
Build, train, and deploy Machine Learning models at scale (May 2018)Build, train, and deploy Machine Learning models at scale (May 2018)
Build, train, and deploy Machine Learning models at scale (May 2018)Julien SIMON
 
Deep Learning Applications (dadada2017)
Deep Learning Applications (dadada2017)Deep Learning Applications (dadada2017)
Deep Learning Applications (dadada2017)Abhishek Thakur
 
Machine Learning Inference at the Edge
Machine Learning Inference at the EdgeMachine Learning Inference at the Edge
Machine Learning Inference at the EdgeJulien SIMON
 
MCL 322 Optimizing Training on Apache MXNet
MCL 322 Optimizing Training on Apache MXNet MCL 322 Optimizing Training on Apache MXNet
MCL 322 Optimizing Training on Apache MXNet Julien SIMON
 
Deep Learning for Developers (December 2017)
Deep Learning for Developers (December 2017)Deep Learning for Developers (December 2017)
Deep Learning for Developers (December 2017)Julien SIMON
 
Aran Khanna, Software Engineer, Amazon Web Services at MLconf ATL 2017
Aran Khanna, Software Engineer, Amazon Web Services at MLconf ATL 2017Aran Khanna, Software Engineer, Amazon Web Services at MLconf ATL 2017
Aran Khanna, Software Engineer, Amazon Web Services at MLconf ATL 2017MLconf
 
Optimizing training on Apache MXNet
Optimizing training on Apache MXNetOptimizing training on Apache MXNet
Optimizing training on Apache MXNetAmazon Web Services
 
Picking the right AWS backend for your application (September 2017)
Picking the right AWS backend for your application (September 2017)Picking the right AWS backend for your application (September 2017)
Picking the right AWS backend for your application (September 2017)Julien SIMON
 
Deep Learning at the Edge
Deep Learning at the EdgeDeep Learning at the Edge
Deep Learning at the EdgeJulien SIMON
 
FPGA on the Cloud
FPGA on the Cloud FPGA on the Cloud
FPGA on the Cloud jtsagata
 
Distributed deep learning optimizations - AI WithTheBest
Distributed deep learning optimizations - AI WithTheBestDistributed deep learning optimizations - AI WithTheBest
Distributed deep learning optimizations - AI WithTheBestgeetachauhan
 
Deep Learning을 위한 AWS 기반 인공 지능(AI) 서비스 (윤석찬)
Deep Learning을 위한  AWS 기반 인공 지능(AI) 서비스 (윤석찬)Deep Learning을 위한  AWS 기반 인공 지능(AI) 서비스 (윤석찬)
Deep Learning을 위한 AWS 기반 인공 지능(AI) 서비스 (윤석찬)Amazon Web Services Korea
 

What's hot (20)

Deep Learning for Developers (October 2017)
Deep Learning for Developers (October 2017)Deep Learning for Developers (October 2017)
Deep Learning for Developers (October 2017)
 
Running BSD on AWS
Running BSD on AWSRunning BSD on AWS
Running BSD on AWS
 
ECS for Amazon Deep Learning and Amazon Machine Learning
ECS for Amazon Deep Learning and Amazon Machine LearningECS for Amazon Deep Learning and Amazon Machine Learning
ECS for Amazon Deep Learning and Amazon Machine Learning
 
Intro to the Distributed Version of TensorFlow
Intro to the Distributed Version of TensorFlowIntro to the Distributed Version of TensorFlow
Intro to the Distributed Version of TensorFlow
 
Optimizing training on Apache MXNet (January 2018)
Optimizing training on Apache MXNet (January 2018)Optimizing training on Apache MXNet (January 2018)
Optimizing training on Apache MXNet (January 2018)
 
MCL333_Building Deep Learning Applications with TensorFlow on AWS
MCL333_Building Deep Learning Applications with TensorFlow on AWSMCL333_Building Deep Learning Applications with TensorFlow on AWS
MCL333_Building Deep Learning Applications with TensorFlow on AWS
 
Deep Learning for Developers (October 2017)
Deep Learning for Developers (October 2017)Deep Learning for Developers (October 2017)
Deep Learning for Developers (October 2017)
 
Deep Learning for Developers
Deep Learning for DevelopersDeep Learning for Developers
Deep Learning for Developers
 
Build, train, and deploy Machine Learning models at scale (May 2018)
Build, train, and deploy Machine Learning models at scale (May 2018)Build, train, and deploy Machine Learning models at scale (May 2018)
Build, train, and deploy Machine Learning models at scale (May 2018)
 
Deep Learning Applications (dadada2017)
Deep Learning Applications (dadada2017)Deep Learning Applications (dadada2017)
Deep Learning Applications (dadada2017)
 
Machine Learning Inference at the Edge
Machine Learning Inference at the EdgeMachine Learning Inference at the Edge
Machine Learning Inference at the Edge
 
MCL 322 Optimizing Training on Apache MXNet
MCL 322 Optimizing Training on Apache MXNet MCL 322 Optimizing Training on Apache MXNet
MCL 322 Optimizing Training on Apache MXNet
 
Deep Learning for Developers (December 2017)
Deep Learning for Developers (December 2017)Deep Learning for Developers (December 2017)
Deep Learning for Developers (December 2017)
 
Aran Khanna, Software Engineer, Amazon Web Services at MLconf ATL 2017
Aran Khanna, Software Engineer, Amazon Web Services at MLconf ATL 2017Aran Khanna, Software Engineer, Amazon Web Services at MLconf ATL 2017
Aran Khanna, Software Engineer, Amazon Web Services at MLconf ATL 2017
 
Optimizing training on Apache MXNet
Optimizing training on Apache MXNetOptimizing training on Apache MXNet
Optimizing training on Apache MXNet
 
Picking the right AWS backend for your application (September 2017)
Picking the right AWS backend for your application (September 2017)Picking the right AWS backend for your application (September 2017)
Picking the right AWS backend for your application (September 2017)
 
Deep Learning at the Edge
Deep Learning at the EdgeDeep Learning at the Edge
Deep Learning at the Edge
 
FPGA on the Cloud
FPGA on the Cloud FPGA on the Cloud
FPGA on the Cloud
 
Distributed deep learning optimizations - AI WithTheBest
Distributed deep learning optimizations - AI WithTheBestDistributed deep learning optimizations - AI WithTheBest
Distributed deep learning optimizations - AI WithTheBest
 
Deep Learning을 위한 AWS 기반 인공 지능(AI) 서비스 (윤석찬)
Deep Learning을 위한  AWS 기반 인공 지능(AI) 서비스 (윤석찬)Deep Learning을 위한  AWS 기반 인공 지능(AI) 서비스 (윤석찬)
Deep Learning을 위한 AWS 기반 인공 지능(AI) 서비스 (윤석찬)
 

Similar to Deep Learning with Apache MXNet (September 2017)

Deep Learning for Developers (Advanced Workshop)
Deep Learning for Developers (Advanced Workshop)Deep Learning for Developers (Advanced Workshop)
Deep Learning for Developers (Advanced Workshop)Amazon Web Services
 
Distributed Deep Learning on AWS with Apache MXNet
Distributed Deep Learning on AWS with Apache MXNetDistributed Deep Learning on AWS with Apache MXNet
Distributed Deep Learning on AWS with Apache MXNetAmazon Web Services
 
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 Separating Hype from Reality in Deep Learning with Sameer Farooqui Separating Hype from Reality in Deep Learning with Sameer Farooqui
Separating Hype from Reality in Deep Learning with Sameer FarooquiDatabricks
 
Apache MXNet ODSC West 2018
Apache MXNet ODSC West 2018Apache MXNet ODSC West 2018
Apache MXNet ODSC West 2018Apache MXNet
 
AI Class Topic 6: Easy Way to Learn Deep Learning AI Technologies
AI Class Topic 6: Easy Way to Learn Deep Learning AI TechnologiesAI Class Topic 6: Easy Way to Learn Deep Learning AI Technologies
AI Class Topic 6: Easy Way to Learn Deep Learning AI TechnologiesValue Amplify Consulting
 
A Platform for Accelerating Machine Learning Applications
 A Platform for Accelerating Machine Learning Applications A Platform for Accelerating Machine Learning Applications
A Platform for Accelerating Machine Learning ApplicationsNVIDIA Taiwan
 
MCL309_Deep Learning on a Raspberry Pi
MCL309_Deep Learning on a Raspberry PiMCL309_Deep Learning on a Raspberry Pi
MCL309_Deep Learning on a Raspberry PiAmazon Web Services
 
Designing your neural networks – a step by step walkthrough
Designing your neural networks – a step by step walkthroughDesigning your neural networks – a step by step walkthrough
Designing your neural networks – a step by step walkthroughLavanya Shukla
 
Synthetic dialogue generation with Deep Learning
Synthetic dialogue generation with Deep LearningSynthetic dialogue generation with Deep Learning
Synthetic dialogue generation with Deep LearningS N
 
Getting started with Machine Learning
Getting started with Machine LearningGetting started with Machine Learning
Getting started with Machine LearningGaurav Bhalotia
 
Startup.Ml: Using neon for NLP and Localization Applications
Startup.Ml: Using neon for NLP and Localization Applications Startup.Ml: Using neon for NLP and Localization Applications
Startup.Ml: Using neon for NLP and Localization Applications Intel Nervana
 
Training Neural Networks
Training Neural NetworksTraining Neural Networks
Training Neural NetworksDatabricks
 
Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)Julien SIMON
 

Similar to Deep Learning with Apache MXNet (September 2017) (20)

Deep Learning for Developers (Advanced Workshop)
Deep Learning for Developers (Advanced Workshop)Deep Learning for Developers (Advanced Workshop)
Deep Learning for Developers (Advanced Workshop)
 
Machine Learning in Action
Machine Learning in ActionMachine Learning in Action
Machine Learning in Action
 
Distributed Deep Learning on AWS with Apache MXNet
Distributed Deep Learning on AWS with Apache MXNetDistributed Deep Learning on AWS with Apache MXNet
Distributed Deep Learning on AWS with Apache MXNet
 
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 Separating Hype from Reality in Deep Learning with Sameer Farooqui Separating Hype from Reality in Deep Learning with Sameer Farooqui
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 
Apache MXNet ODSC West 2018
Apache MXNet ODSC West 2018Apache MXNet ODSC West 2018
Apache MXNet ODSC West 2018
 
AI Class Topic 6: Easy Way to Learn Deep Learning AI Technologies
AI Class Topic 6: Easy Way to Learn Deep Learning AI TechnologiesAI Class Topic 6: Easy Way to Learn Deep Learning AI Technologies
AI Class Topic 6: Easy Way to Learn Deep Learning AI Technologies
 
A Platform for Accelerating Machine Learning Applications
 A Platform for Accelerating Machine Learning Applications A Platform for Accelerating Machine Learning Applications
A Platform for Accelerating Machine Learning Applications
 
MCL309_Deep Learning on a Raspberry Pi
MCL309_Deep Learning on a Raspberry PiMCL309_Deep Learning on a Raspberry Pi
MCL309_Deep Learning on a Raspberry Pi
 
Deep learning
Deep learningDeep learning
Deep learning
 
MXNet Workshop
MXNet WorkshopMXNet Workshop
MXNet Workshop
 
Designing your neural networks – a step by step walkthrough
Designing your neural networks – a step by step walkthroughDesigning your neural networks – a step by step walkthrough
Designing your neural networks – a step by step walkthrough
 
Synthetic dialogue generation with Deep Learning
Synthetic dialogue generation with Deep LearningSynthetic dialogue generation with Deep Learning
Synthetic dialogue generation with Deep Learning
 
Deep learning
Deep learningDeep learning
Deep learning
 
Artificial Neural networks
Artificial Neural networksArtificial Neural networks
Artificial Neural networks
 
Neural Networks
Neural NetworksNeural Networks
Neural Networks
 
Deep Learning Demystified
Deep Learning DemystifiedDeep Learning Demystified
Deep Learning Demystified
 
Getting started with Machine Learning
Getting started with Machine LearningGetting started with Machine Learning
Getting started with Machine Learning
 
Startup.Ml: Using neon for NLP and Localization Applications
Startup.Ml: Using neon for NLP and Localization Applications Startup.Ml: Using neon for NLP and Localization Applications
Startup.Ml: Using neon for NLP and Localization Applications
 
Training Neural Networks
Training Neural NetworksTraining Neural Networks
Training Neural Networks
 
Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)
 

More from Julien SIMON

An introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceAn introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceJulien SIMON
 
Reinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersReinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersJulien SIMON
 
Building NLP applications with Transformers
Building NLP applications with TransformersBuilding NLP applications with Transformers
Building NLP applications with TransformersJulien SIMON
 
Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)Julien SIMON
 
Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)Julien SIMON
 
Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)Julien SIMON
 
An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)Julien SIMON
 
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...Julien SIMON
 
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)Julien SIMON
 
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...Julien SIMON
 
A pragmatic introduction to natural language processing models (October 2019)
A pragmatic introduction to natural language processing models (October 2019)A pragmatic introduction to natural language processing models (October 2019)
A pragmatic introduction to natural language processing models (October 2019)Julien SIMON
 
Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)Julien SIMON
 
Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)Julien SIMON
 
The Future of AI (September 2019)
The Future of AI (September 2019)The Future of AI (September 2019)
The Future of AI (September 2019)Julien SIMON
 
Building Machine Learning Inference Pipelines at Scale (July 2019)
Building Machine Learning Inference Pipelines at Scale (July 2019)Building Machine Learning Inference Pipelines at Scale (July 2019)
Building Machine Learning Inference Pipelines at Scale (July 2019)Julien SIMON
 
Train and Deploy Machine Learning Workloads with AWS Container Services (July...
Train and Deploy Machine Learning Workloads with AWS Container Services (July...Train and Deploy Machine Learning Workloads with AWS Container Services (July...
Train and Deploy Machine Learning Workloads with AWS Container Services (July...Julien SIMON
 
Optimize your Machine Learning Workloads on AWS (July 2019)
Optimize your Machine Learning Workloads on AWS (July 2019)Optimize your Machine Learning Workloads on AWS (July 2019)
Optimize your Machine Learning Workloads on AWS (July 2019)Julien SIMON
 
Deep Learning on Amazon Sagemaker (July 2019)
Deep Learning on Amazon Sagemaker (July 2019)Deep Learning on Amazon Sagemaker (July 2019)
Deep Learning on Amazon Sagemaker (July 2019)Julien SIMON
 
Automate your Amazon SageMaker Workflows (July 2019)
Automate your Amazon SageMaker Workflows (July 2019)Automate your Amazon SageMaker Workflows (July 2019)
Automate your Amazon SageMaker Workflows (July 2019)Julien SIMON
 
Build, train and deploy ML models with Amazon SageMaker (May 2019)
Build, train and deploy ML models with Amazon SageMaker (May 2019)Build, train and deploy ML models with Amazon SageMaker (May 2019)
Build, train and deploy ML models with Amazon SageMaker (May 2019)Julien SIMON
 

More from Julien SIMON (20)

An introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceAn introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging Face
 
Reinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersReinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face Transformers
 
Building NLP applications with Transformers
Building NLP applications with TransformersBuilding NLP applications with Transformers
Building NLP applications with Transformers
 
Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)
 
Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)
 
Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)
 
An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)
 
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
 
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
 
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
 
A pragmatic introduction to natural language processing models (October 2019)
A pragmatic introduction to natural language processing models (October 2019)A pragmatic introduction to natural language processing models (October 2019)
A pragmatic introduction to natural language processing models (October 2019)
 
Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)
 
Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)
 
The Future of AI (September 2019)
The Future of AI (September 2019)The Future of AI (September 2019)
The Future of AI (September 2019)
 
Building Machine Learning Inference Pipelines at Scale (July 2019)
Building Machine Learning Inference Pipelines at Scale (July 2019)Building Machine Learning Inference Pipelines at Scale (July 2019)
Building Machine Learning Inference Pipelines at Scale (July 2019)
 
Train and Deploy Machine Learning Workloads with AWS Container Services (July...
Train and Deploy Machine Learning Workloads with AWS Container Services (July...Train and Deploy Machine Learning Workloads with AWS Container Services (July...
Train and Deploy Machine Learning Workloads with AWS Container Services (July...
 
Optimize your Machine Learning Workloads on AWS (July 2019)
Optimize your Machine Learning Workloads on AWS (July 2019)Optimize your Machine Learning Workloads on AWS (July 2019)
Optimize your Machine Learning Workloads on AWS (July 2019)
 
Deep Learning on Amazon Sagemaker (July 2019)
Deep Learning on Amazon Sagemaker (July 2019)Deep Learning on Amazon Sagemaker (July 2019)
Deep Learning on Amazon Sagemaker (July 2019)
 
Automate your Amazon SageMaker Workflows (July 2019)
Automate your Amazon SageMaker Workflows (July 2019)Automate your Amazon SageMaker Workflows (July 2019)
Automate your Amazon SageMaker Workflows (July 2019)
 
Build, train and deploy ML models with Amazon SageMaker (May 2019)
Build, train and deploy ML models with Amazon SageMaker (May 2019)Build, train and deploy ML models with Amazon SageMaker (May 2019)
Build, train and deploy ML models with Amazon SageMaker (May 2019)
 

Recently uploaded

Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 

Recently uploaded (20)

Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 

Deep Learning with Apache MXNet (September 2017)

  • 1. ©2017, Amazon Web Services, Inc. or its affiliates. All rights reserved Deep Learning with Apache MXNet Julien Simon, Principal Technical Evangelist @julsimon
  • 3. Questions • What’s the business problem my IT has failed to solve? – That’s probably where Deep Learning can help • Should I design and train my own Deep Learning model? – Do I have the expertise? – Do I have enough time, data & compute to train it? • Should I use a pre-trained model? – How well does it fit my use case? – On what data was it trained? How close is this to my own data? • Should I use a SaaS solution?
  • 4. Amazon AI for every developer More information at aws.amazon.com/ai
  • 6. Frank Rosenblatt Artificial Intelligence pioneer 1957 – The Perceptron Appropriate weights are applied to the inputs, and the resulting weighted sum is passed to a function that produces the output.
  • 7. The neuron x = [x1, x2, …. xI ] w = [w1, w2, …. wI ] u = x.w Activation function
  • 8. The neural network x = x11, x12, …. x1I x21, x22, …. x2I x…, x.., …. x.I xm1, xm2, …. xmI I features m samples y = y1 y2 y… ym m labels Total number of predictions Accuracy = Number of correct predictions
  • 9. The training process • The difference between the predicted output and the actual output (aka ground truth) is called the prediction loss. • There are different ways to compute it (loss functions). • The purpose of training is to iteratively minimize loss and maximize accuracy for a given data set. • We need a way to adjust weights (aka parameters) in order to gradually minimize loss  Backpropagation + optimization algorithm
  • 10. Paul Werbos Artificial Intelligence pioneer IEEE Neural Network Pioneer Award 1974 – Backpropagation The back-propagation algorithm acts as an error correcting mechanism at each neuron level, thereby helping the network to learn effectively.
  • 11. Stochastic Gradient Descent (SGD) Imagine you stand on top of a mountain with skis strapped to your feet. You want to get down to the valley as quickly as possible, but there is fog and you can only see your immediate surroundings. How can you get down the mountain as quickly as possible? You look around and identify the steepest path down, go down that path for a bit, again look around and find the new steepest path, go down that path, and repeat—this is exactly what gradient descent does. Tim Dettmers University of Lugano 2015 Source: https://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-history-training/ The « step size » is called the learning rate
  • 12. There is such a thing as « learning too well » • If a network is large enough and given enough time, it will perfectly learn a data set (universal approximation theorem). • But what about new samples? Can it also predict them correctly? • Does the network generalize well or not? • To prevent overfitting, we need to know when to stop training. • The training data set is not enough.
  • 13. Data sets Full data set Training data set Validation data set Test data set Training set: this data set is used to adjust the weights on the neural network. Validation set: this data set is used to minimize overfitting. You're not adjusting the weights of the network with this data set, you're just verifying that any increase in accuracy over the training data set actually yields an increase in accuracy over a data set that has not been shown to the network before. Testing set: this data set is used only for testing the final weights in order to benchmark the actual predictive power of the network.
  • 14. Training Training data set Training Trained neural network Number of epochs Batch size Learning rate Hyper parameters
  • 15. Validation Validation Data set Trained neural network Validation accuracy Prediction at the end of each epoch Stop training when validation accuracy stops increasing Saving parameters at the end of each epoch is a good idea ;)
  • 16. Convolutional Neural Networks Le Cun, 1998: handwritten digit recognition, 32x32 pixels Feature extraction and downsampling allow smaller networks https://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-core-concepts/
  • 18. Apache MXNet Programmable Portable High Performance Near linear scaling across hundreds of GPUs Highly efficient models for mobile and IoT Simple syntax, multiple languages Most Open Best On AWS Optimized for deep learning on AWS Accepted into the Apache Incubator
  • 19. import numpy as np a = np.ones(10) b = np.ones(10) * 2 c = b * a d = c + 1 • Straightforward and flexible. • Take advantage of language native features (loop, condition, debugger). • E.g. Numpy, Matlab, Torch, … • Hard to optimize PROS CONSEasy to tweak in Python Imperative Programming
  • 20. • More chances for optimization • Cross different languages • E.g. TensorFlow, Theano, Caffe • Less flexible PROS CONS C can share memory with D because C is deleted later A = Variable('A') B = Variable('B') C = B * A D = C + 1 f = compile(D) d = f(A=np.ones(10), B=np.ones(10)*2) A B 1 + x Declarative Programming
  • 21. 0 4 8 12 16 1 2 4 8 16 Ideal Inception v3 Resnet Alexnet 91% Efficiency Multi-GPU Scaling With MXNet
  • 22. Input Output 1 1 1 1 0 1 0 0 0 3 mx. sym. Convol ut i on( dat a, ker nel =( 5, 5) , num_f i l t er =20) mx. sym. Pool i ng( dat a, pool _t ype=" max" , ker nel =( 2, 2) , st r i de=( 2, 2) l st m. l st m_unr ol l ( num_l st m_l ayer , seq_l en, l en, num_hi dden, num_embed) 4 2 2 0 4=Max 1 3 ... 4 0.2 -0.1 ... 0.7 mx. sym. Ful l yConnect ed( dat a, num_hi dden=128) 2 mx. symbol . Embeddi ng( dat a, i nput _di m, out put _di m = k) 0.2 -0.1 ... 0.7 Queen 4 2 2 0 2=Avg Input Weights cos(w, queen ) = cos(w, k i n g) - cos(w, m an ) + cos(w, w om an ) mx. sym. Act i vat i on( dat a, act _t ype=" xxxx" ) " r el u" " t anh" " si gmoi d" " sof t r el u" Neural Art Face Search Image Segmentation Image Caption “ People Riding Bikes” Bicycle, People, Road, Sport Image Labels Image Video Speech Text “ People Riding Bikes” Machine Translation “ Οι άνθρωποι ιππασίας ποδήλατα” Events mx. model . FeedFor war d model . f i t mx. sym. Sof t maxOut put
  • 23. Let’s code 1. Our first network 2. Using pre-trained models 3. Learning from scratch (MNIST, MLP, CNN) 4. Learning and fine-tuning (CIFAR-10, Resnext-101) 5. Using Apache MXNet as a Keras backend 6. Fine-tuning with Keras/MXNet (CIFAR-10, Resnet-50) 7. A quick word about Sockeye http://mxnet.io http://medium.com/@julsimon https://github.com/juliensimon/aws/tree/master/mxnet

Editor's Notes

  1. Execute operations step by step. c = b ⨉ a invokes a kernel operation Numpy programs are imperative
  2. Declares the computation Compiles into a function C = B ⨉ A  only specifies the requirement SQL is declarative