Deep Learning for Developers
16/12/2017
Julien Simon, AI Evangelist, EMEA
@julsimon
What to expect
• What is AI?
• An introduction to Deep Learning
• Common neural network architectures and use cases
• An introduction to Apache MXNet
• Demos using Jupyter notebooks on Amazon SageMaker
• Resources
• Artificial Intelligence: design software applications which exhibit
human-like behavior, e.g. speech, natural language processing,
reasoning or intuition
• Machine Learning: teach machines to learn without being
explicitly programmed
• Deep Learning: using neural networks, teach machines to learn
from complex data where features cannot be explicitly expressed
Myth: AI is dark magic
aka "You're not smart enough"
Fact: AI is math, code and chips
A bit of Science, a lot of Engineering
Amazon AI is based on Deep Learning
An introduction to Deep Learning
The neuron
u = \sum_{i=1}^{l} x_i \cdot w_i    ("Multiply and Accumulate")
The sum u is then passed through an activation function, which produces the neuron's output.
[Figure: common activation functions. Source: Wikipedia]
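To make the multiply-and-accumulate concrete, here is a single neuron sketched in NumPy — a toy illustration with made-up inputs and weights, using a sigmoid activation:

import numpy as np

def neuron(x, w):
    u = np.dot(x, w)                  # u = sum of x_i * w_i: multiply and accumulate
    return 1.0 / (1.0 + np.exp(-u))   # sigmoid activation applied to u

x = np.array([0.5, -1.0, 2.0])        # l = 3 inputs
w = np.array([0.1, 0.4, 0.2])         # one weight per input
print(neuron(x, w))                   # the neuron's output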
Neural networks
x = [ x11 x12 … x1I
      x21 x22 … x2I
      …
      xm1 xm2 … xmI ]   (m samples, I features)
y = [ y1, y2, …, ym ]   (m labels)
One-hot encoding: each label becomes a vector of N2 outputs, one per class, e.g.
0,0,1,0,0,…,0
1,0,0,0,0,…,0
…
0,0,0,0,1,…,0
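A minimal one-hot encoding sketch in NumPy, assuming integer class labels:

import numpy as np

def one_hot(labels, num_classes):
    encoded = np.zeros((len(labels), num_classes))   # m rows, one column per class
    encoded[np.arange(len(labels)), labels] = 1      # set the label's column to 1
    return encoded

y = np.array([2, 0, 4])       # m = 3 integer labels
print(one_hot(y, 6))
# [[0. 0. 1. 0. 0. 0.]
#  [1. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 1. 0.]]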
Neural networks
Same x (m samples, I features) and y (m one-hot encoded labels over N2 categories) as above.
Accuracy = Number of correct predictions / Total number of predictions
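The same formula in NumPy, on made-up predictions:

import numpy as np

predicted = np.array([2, 0, 4, 1])    # predicted classes
actual    = np.array([2, 0, 3, 1])    # true classes
print(np.mean(predicted == actual))   # 3 correct out of 4 -> 0.75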
Neural networks
Initially, the network will not predict correctly
f(X1) = Y’1
A loss function measures the difference between
the real label Y1 and the predicted label Y’1
error = loss(Y1, Y’1)
For a batch of samples:
batch error = \sum_{i=1}^{batch\ size} loss(Y_i, Y'_i)
The purpose of the training process is to
minimize error by gradually adjusting weights
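A sketch of the batch error, using a simple squared-error loss for illustration (classification networks typically use a cross-entropy loss instead):

import numpy as np

def loss(y, y_pred):
    return np.sum((y - y_pred) ** 2)                         # illustrative squared-error loss

batch_y      = [np.array([0, 1]), np.array([1, 0])]          # real labels Y_i
batch_y_pred = [np.array([0.2, 0.8]), np.array([0.4, 0.6])]  # predictions Y'_i
batch_error = sum(loss(y, yp) for y, yp in zip(batch_y, batch_y_pred))
print(batch_error)                                           # 0.08 + 0.72 = 0.8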
Training
Training data set → Training (backpropagation) → Trained neural network
Hyperparameters: batch size, learning rate, number of epochs
Stochastic Gradient Descent (SGD)
Imagine you stand on top of a mountain with skis
strapped to your feet. You want to get down to
the valley as quickly as possible, but there is fog
and you can only see your immediate
surroundings. How can you get down the
mountain as quickly as possible? You look
around and identify the steepest path down, go
down that path for a bit, again look around and
find the new steepest path, go down that path,
and repeat—this is exactly what gradient descent
does.
Tim Dettmers
University of Lugano
2015
https://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-history-training/
The "step size" is called the learning rate.
[Figure: gradient descent on a loss surface z = f(x, y)]
Alternatives to SGD: RMSProp, AdaGrad, AdaDelta, Adam, etc.
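A minimal gradient descent sketch on the toy loss z = f(x, y) = x² + y², with a hand-derived gradient; plain SGD adds mini-batch sampling on top of this idea:

def grad(x, y):
    return 2 * x, 2 * y              # analytic gradient of x^2 + y^2

x, y = 4.0, -3.0                     # start somewhere on the "mountain"
learning_rate = 0.1                  # the "step size"
for _ in range(50):
    gx, gy = grad(x, y)              # look around: steepest direction
    x -= learning_rate * gx          # take a step downhill
    y -= learning_rate * gy
print(x, y)                          # close to the minimum at (0, 0)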
Validation
At the end of each epoch, the trained neural network predicts on the validation data set to measure validation accuracy.
Early stopping
[Figure: accuracy and loss vs. epochs — training accuracy keeps climbing toward 100% while validation accuracy flattens and validation loss rises: OVERFITTING]
Once validation accuracy stops improving, further training only overfits. Save the model at the end of each epoch so you can go back to the best one.
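A runnable sketch of the early-stopping logic on a simulated validation curve; in real training each accuracy value would come from validating after an epoch, and the model would be checkpointed every epoch (the numbers below are made up):

best_accuracy, patience, bad_epochs = 0.0, 3, 0
val_accuracy = [0.62, 0.71, 0.78, 0.81, 0.83, 0.82, 0.83, 0.82, 0.81]  # simulated
for epoch, accuracy in enumerate(val_accuracy):
    # in real training: train one epoch here, then save a checkpoint
    # (MXNet's Module API, for instance, accepts an epoch_end_callback for this)
    if accuracy > best_accuracy:
        best_accuracy, bad_epochs = accuracy, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:   # validation accuracy stopped improving
            print("stopping early at epoch", epoch, "- best accuracy:", best_accuracy)
            break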
Common network architectures
and use cases
Convolutional Neural Networks (CNN)
LeCun, 1998: handwritten digit recognition on 32x32-pixel images
Convolution and pooling reduce dimensionality
https://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-core-concepts/
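Convolution and pooling shrink spatial dimensions in a predictable way; here is the standard output-size formula, with LeNet-style numbers:

def output_size(input_size, kernel, stride=1, padding=0):
    # standard formula for convolution/pooling output dimensions
    return (input_size - kernel + 2 * padding) // stride + 1

print(output_size(32, kernel=5))             # LeNet's first convolution: 32 -> 28
print(output_size(28, kernel=2, stride=2))   # 2x2 pooling, stride 2: 28 -> 14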
CNN: Object Classification
https://news.developer.nvidia.com/expedia-ranking-hotel-images-with-deep-learning/
• Expedia has over 10 million images from 300,000 hotels
• Using great images boosts conversion
• Using Keras and EC2 GPU instances, they fine-tuned a pre-trained Convolutional Neural Network on 100,000 images
• Hotel descriptions now automatically feature the best available images
CNN: Object Detection
https://github.com/precedenceguo/mx-rcnn https://github.com/zhreshold/mxnet-yolo
CNN: Object Detection
https://technology.condenast.com/story/handbag-brand-and-color-detection
• 17,000 images from Instagram
• 7 brands
• Deep Learning model pre-trained on ImageNet
• Fine-tuning with TensorFlow and EC2 GPU instances
• Additional work on color extraction
CNN: Object Segmentation
https://github.com/TuSimple/mx-maskrcnn
https://www.oreilly.com/ideas/self-driving-trucks-enter-the-fast-lane-using-deep-learn
Last June, TuSimple drove an autonomous truck for 200 miles from Yuma, AZ to San Diego, CA.
CNN: Face Detection
https://github.com/tornadomeet/mxnet-face
Real-Time Pose Estimation
https://github.com/dragonfly90/mxnet_Realtime_Multi-Person_Pose_Estimation
Long Short Term Memory Networks (LSTM)
• An LSTM neuron computes its output based on the current input and a previous state
• LSTM networks have memory
• They're great at predicting sequences, e.g. machine translation (see the recurrence sketch below)
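The core recurrence behind LSTMs, sketched in NumPy with toy weights; a real LSTM adds input, forget and output gates on top of this:

import numpy as np

rng = np.random.default_rng(0)
Wx, Wh = rng.normal(size=(4, 3)), rng.normal(size=(4, 4))   # toy weights
h = np.zeros(4)                       # initial state: no memory yet
for x in rng.normal(size=(5, 3)):     # a sequence of 5 input vectors
    h = np.tanh(Wx @ x + Wh @ h)      # new state mixes current input and memory
print(h)                              # the final state summarizes the sequence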
LSTM: Machine Translation
https://github.com/awslabs/sockeye
Generative Adversarial Networks (GAN)
• Goodfellow, 2014
• Dual-network architecture
• Detector network (usually called the discriminator)
• Sees real samples
• Learns to tell real samples from the fake ones created by the Generator network
• Generator network
• Doesn't see real samples
• Creates images from random data
• Receives weight updates based on how well its fakes fool the Detector
• Gradually learns how to generate better fake samples (training loop sketched below)
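To make the adversarial loop concrete, here is a self-contained NumPy sketch of a 1-D GAN: a linear Generator learns to mimic samples from N(4, 1) against a logistic-regression Detector, with gradients derived by hand for this toy setup (demo 5 below shows a real MXNet GAN on MNIST):

import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

# Detector: D(x) = sigmoid(w*x + c); Generator: G(z) = a*z + b
w, c = 0.0, 0.0
a, b = 1.0, 0.0
lr = 0.02

for step in range(5000):
    real = rng.normal(4.0, 1.0, 32)   # real samples: N(4, 1)
    z = rng.normal(0.0, 1.0, 32)
    fake = a * z + b                  # the Generator never sees real samples

    # 1) Update the Detector: push D(real) -> 1 and D(fake) -> 0
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    grad_w = np.mean(-(1 - d_real) * real + d_fake * fake)
    grad_c = np.mean(-(1 - d_real) + d_fake)
    w, c = w - lr * grad_w, c - lr * grad_c

    # 2) Update the Generator: push D(fake) -> 1 (fool the Detector)
    d_fake = sigmoid(w * fake + c)
    dfake = -(1 - d_fake) * w         # gradient of -log D(fake) w.r.t. fake
    a, b = a - lr * np.mean(dfake * z), b - lr * np.mean(dfake)

print(np.mean(a * rng.normal(0, 1, 1000) + b))   # fake mean should drift toward 4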
GAN: Welcome to the (un)real world, Neo
Generating new ”celebrity” faces
https://github.com/tkarras/progressive_growing_of_gans
From semantic map to 2048x1024 picture
https://tcwang0509.github.io/pix2pixHD/
Apache MXNet
Apache MXNet: Open Source library for Deep Learning
Programmable: simple syntax, multiple languages
Portable: highly efficient models for mobile and IoT
High Performance: near-linear scaling across hundreds of GPUs
Most Open: accepted into the Apache Incubator; MXNet 1.0 released on December 4th
Best on AWS: optimized for Deep Learning on AWS
Layers are declared with the Symbol API:

Convolution: mx.sym.Convolution(data, kernel=(5,5), num_filter=20)
The slide illustrates the "multiply and accumulate" at work: a 3x3 input patch
1 1 1
1 0 1
0 0 0
multiplied element-wise with the filter weights and summed gives the output value 3.

Pooling: mx.sym.Pooling(data, pool_type="max", kernel=(2,2), stride=(2,2))
e.g. pooling the 2x2 input
4 2
2 0
gives 4 with max pooling and 2 with average pooling.

LSTM: lstm.lstm_unroll(num_lstm_layer, seq_len, len, num_hidden, num_embed)

Fully connected layer: mx.sym.FullyConnected(data, num_hidden=128)
maps an input vector, e.g. [1, 3, ..., 4], to an output vector, e.g. [0.2, -0.1, ..., 0.7].

Embedding: mx.symbol.Embedding(data, input_dim, output_dim=k)
e.g. "Queen" maps to a k-dimensional vector such as [0.2, -0.1, ..., 0.7], supporting the classic analogy
cos(w, queen) = cos(w, king) - cos(w, man) + cos(w, woman)

Activation: mx.sym.Activation(data, act_type="xxxx")
where act_type is one of "relu", "tanh", "sigmoid" or "softrelu".
Use cases across image, video, speech, text and event data: Image Labels (Bicycle, People, Road, Sport), Image Captioning ("People Riding Bikes"), Machine Translation ("People Riding Bikes" → "Οι άνθρωποι ιππασίας ποδήλατα"), Neural Art, Face Search, Image Segmentation.
Training is driven by mx.model.FeedForward and model.fit, with mx.sym.SoftmaxOutput as the classification output layer.
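These Symbol calls chain into a trainable network; a minimal sketch of a multi-layer perceptron for 10-class classification (the layer sizes are made up):

import mxnet as mx

data = mx.sym.Variable("data")
fc1  = mx.sym.FullyConnected(data, num_hidden=128)   # hidden layer
act1 = mx.sym.Activation(fc1, act_type="relu")
fc2  = mx.sym.FullyConnected(act1, num_hidden=10)    # one output per class
mlp  = mx.sym.SoftmaxOutput(fc2, name="softmax")     # classification output layer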
https://github.com/awslabs/mxnet-model-server/
https://aws.amazon.com/blogs/ai/announcing-onnx-support-for-apache-mxnet/
The Apache MXNet API
• NDArray API: storing and accessing data in multi-dimensional arrays
• Symbol API: building models (layers, weights, activation functions)
• Iterators: serving data during training and validation
• Module API: training and using models
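A small sketch of the four APIs working together on synthetic data (the shapes and hyperparameters are made up):

import mxnet as mx
import numpy as np

X = mx.nd.array(np.random.rand(1000, 100))            # NDArray API: 1000 samples, 100 features
y = mx.nd.array(np.random.randint(0, 10, (1000,)))    # integer labels, 10 classes

train_iter = mx.io.NDArrayIter(X, y, batch_size=32)   # Iterator: serves batches

data = mx.sym.Variable("data")                        # Symbol API: define the model
fc   = mx.sym.FullyConnected(data, num_hidden=10)
net  = mx.sym.SoftmaxOutput(fc, name="softmax")

mod = mx.mod.Module(net)                              # Module API: train the model
mod.fit(train_iter, num_epoch=5,
        optimizer="sgd", optimizer_params={"learning_rate": 0.1})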
Demos
https://github.com/juliensimon/dlnotebooks
1) Hello World: learn a synthetic data set
2) Classify images with pre-trained models
3) Learn and predict MNIST with a Multi-Layer Perceptron and the LeNet CNN
4) Train, host and deploy a model with Amazon SageMaker
5) Generate MNIST samples with a GAN
Amazon SageMaker
Build: pre-built notebook instances; highly-optimized machine learning algorithms
Train: one-click training for ML, DL, and custom algorithms; easier training with hyperparameter optimization
Deploy: deployment without engineering effort; fully-managed hosting at scale
Resources
https://aws.amazon.com/machine-learning
https://aws.amazon.com/sagemaker
https://aws.amazon.com/blogs/ai
https://mxnet.incubator.apache.org
https://github.com/apache/incubator-mxnet
https://github.com/gluon-api
An overview of Amazon SageMaker https://www.youtube.com/watch?v=ym7NEYEx9x4
https://medium.com/@julsimon
Thank you!
Julien Simon, AI Evangelist, EMEA
@julsimon

Editor's Notes

  • #37 Deploy — Deployment without engineering effort: after training, SageMaker provides the model artifacts and scoring images to you for deployment to Amazon EC2 or anywhere else. When you're ready to deploy your model, you can launch it into a secure and elastically scalable environment, with one-click deployment from the SageMaker console. Fully managed: Amazon SageMaker handles all of the compute infrastructure on your behalf, with built-in Amazon CloudWatch monitoring and logging, to perform health checks, apply security patches and other routine maintenance, and to ensure updates to the supported deep learning frameworks as they become available.