The document provides an overview of deep learning fundamentals. It discusses key concepts like neural networks, convolutional neural networks, activation functions, backpropagation, and optimization techniques like stochastic gradient descent. Examples are given of deep learning applications in areas like computer vision, natural language processing, and medical imaging. The document also traces the history and growth of deep learning since 2012, driven by advances in hardware, software frameworks, and large datasets.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Deep Learning Fundamentals
Thomas Delteil, Machine Learning Scientist, Amazon AI
Soji Adeshina, Machine Learning Engineer, Amazon AI
2.
Why do machine learning?
How many cats?
Complex tasks where you can't code up explicit solutions
3.
Types of Machine Learning
Supervised
• Data & labels
• Classification, labeling
• Regression
Unsupervised
• Data, no labels
• Clustering
• Dimensionality reduction
Semi-supervised
• Data, some labels
• Active learning
• Reinforcement learning
4.
Situating Deep Learning
[Figure: nested circles, Deep Learning inside Machine Learning inside AI]
Can machines think? Can machines do what we can? (Turing, 1950)
Machine Learning: Data + Answers → Rules
5.
Linear and non-linear separability
6.
What is "Deep" Learning?
7.
Deep computational graph
Inception model: 100+ million learnable parameters
A computational graph composes simple operations, e.g.:
• z = x · y
• u = a · b
• t = k·z + u
[Figure: graph with inputs x, y, a, b and constant k feeding multiply and add nodes that produce z, u, and finally t]
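The three operations above (z = x·y, u = a·b, t = k·z + u) can be sketched as a forward pass through the graph; the input values below are illustrative, not from the slide:

```python
def forward(x, y, a, b, k):
    """Evaluate the small computational graph from the slide."""
    z = x * y        # first node: z = x * y
    u = a * b        # second node: u = a * b
    t = k * z + u    # third node combines the two intermediates
    return t

print(forward(x=1, y=1, a=2, b=3, k=2))  # 2*1 + 6 = 8
```

In a deep learning framework, each such node also records how to compute its gradient, which is what makes the parameters learnable.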
8.
What can Deep Learning do?
9.
10.
11.
And many more
- Action recognition
- Image super resolution
- Pose estimation
- Image generation
- Text to speech
- Speech to text
- Text recognition
- Robotics policy learning
- …
12.
Sea/Land segmentation via satellite images
DeepUNet: A Deep Fully Convolutional Network for Pixel-level Sea-Land Segmentation, Ruirui Li et al, 2017
13.
Automatic Galaxy classification
Deep Galaxy: Classification of Galaxies based on Deep Convolutional Neural Networks, Nour Eldeen M. Khalifa, 2017
14.
Medical Imaging, MRI, X-ray, surgical cameras
Review of MRI-based Brain Tumor Image Segmentation Using Deep Learning Methods, Ali Işın et al., 2016
15.
Stock market predictions
Deep Learning for Forecasting Stock Returns in the Cross-Section, Masaya Abe and Hideki Nakayama 2017
16.
How did it start?
17.
2012 - ImageNet Classification with Deep Convolutional Neural Networks
ImageNet Classification with Deep Convolutional Neural Networks, Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, Advances in Neural Information Processing Systems, 2012
AlexNet architecture
18.
ImageNet: classify images among 1,000 classes.
AlexNet dropped the Top-5 error rate from ~25% to ~16%!
19.
Actual photo of the reaction from the computer vision community*
*might just be a stock photo
20.
I told you
so!
21.
Why now?
22.
Hardware: GPUs!
Nvidia V100, float16 ops:
~120 TFLOPS, 5,000+ CUDA cores
(the #1 supercomputer in 2005 reached ~135 TFLOPS)
Source: Mathworks
23.
And more specialized hardware is being
developed as we speak:
- AWS Inferentia
- Intel Movidius
- FPGAs
- Apple A11 "neural engine"
- …
24.
Software
- Deep learning frameworks:
  - MXNet
  - TensorFlow
  - PyTorch
- Deep learning accelerators:
  - TensorRT
  - MKL-DNN
- Deep learning compilers:
  - TVM
  - Glow
- Deep learning APIs:
  - ONNX
  - Keras
25.
How does it work?
26.
Basic Terminology
Predict if a person earns >$50K

| Age | Education | Years of education | Marital status | Occupation   | Sex    | Label |
|-----|-----------|--------------------|----------------|--------------|--------|-------|
| 39  | Bachelors | 16                 | Single         | Adm-clerical | Male   | -1    |
| 31  | Masters   | 18                 | Married        | Engineering  | Female | +1    |

Training examples: rows
Input features: x
Label / ground truth: y
27.
Basic Terminology
| Age | Education | Years of education | Marital status | Occupation   | Sex    | Label |
|-----|-----------|--------------------|----------------|--------------|--------|-------|
| 39  | Bachelors | 16                 | Single         | Adm-clerical | Male   | -1    |
| 31  | Masters   | 18                 | Married        | Engineering  | Female | +1    |

One-hot encoding converts categorical features into numeric columns:

| Age | Edu_Bachelors | Edu_Masters | Years of education | Marital_Single | … | Label |
|-----|---------------|-------------|--------------------|----------------|---|-------|
| 39  | 1             | 0           | 16                 | 1              | … | -1    |
| 31  | 0             | 1           | 18                 | 0              | … | +1    |
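The encoding above can be sketched in a few lines of Python; the category lists and column names are illustrative, not part of the census dataset's API:

```python
def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the position of `value`."""
    return [1 if c == value else 0 for c in categories]

education_levels = ["Bachelors", "Masters"]
row = {"Age": 39, "Education": "Bachelors", "YearsOfEducation": 16}

# Numeric features pass through; categorical features expand into 0/1 columns.
encoded = [row["Age"]] + one_hot(row["Education"], education_levels) + [row["YearsOfEducation"]]
print(encoded)  # [39, 1, 0, 16]
```

Each category gets a fixed position in the vector, so the model never treats categories as if they were ordered numbers.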
28.
(Artificial) Neural Networks (ANN)
Inspired by the brain's neurons: we have ~100 billion of them, and ~1 quadrillion synapses.
[Figure: inputs x1 … xn, weights w1 … wn, bias b, summation Σ and activation σ producing the output y]
y = σ(Σ_{i=1..n} w_i x_i + b)
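The weighted-sum-plus-activation formula above can be sketched in plain Python; the sigmoid activation and the input values are illustrative choices:

```python
import math

def neuron(xs, ws, b):
    """y = sigma(sum_i w_i * x_i + b), with a sigmoid activation sigma."""
    z = sum(w * x for w, x in zip(ws, xs)) + b
    return 1 / (1 + math.exp(-z))

y = neuron([1.0, 2.0], [0.5, -0.25], 0.1)  # z = 0.5 - 0.5 + 0.1 = 0.1
```

With zero weights and bias the sigmoid sits exactly at its midpoint, 0.5, which is a handy sanity check.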
29.
Bias term
• Each neuron has a bias associated with it
• The bias moves the activation left or right along the x-axis
y = σ(Σ_{i=1..n} w_i x_i + b)
30.
Deep Learning:
the Multi-Layer Perceptron (MLP)
[Figure: input layer, hidden layers, and output layer, each followed by an activation]
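Stacking layers is just repeating the single-neuron computation. A minimal sketch of an MLP forward pass, assuming illustrative weights and a ReLU hidden activation:

```python
def relu(z):
    """ReLU activation: pass positives through, clamp negatives to 0."""
    return max(0.0, z)

def dense(xs, weights, biases, act):
    """One fully connected layer: each output is act(w . x + b)."""
    return [act(sum(w * x for w, x in zip(ws, xs)) + b)
            for ws, b in zip(weights, biases)]

# A tiny MLP: 2 inputs -> 2 hidden units (ReLU) -> 1 output (identity)
x = [1.0, -1.0]
h = dense(x, [[0.5, -0.5], [0.3, 0.8]], [0.0, 0.1], relu)
y = dense(h, [[1.0, -1.0]], [0.0], lambda z: z)
```

Each hidden layer feeds the next; without the nonlinear activation between layers the whole stack would collapse into one linear map.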
[Figure: a network is fed an input vector X (e.g. 0, 1, 0, 1, 1, …) with its label, and outputs activations such as 0.4, 0.3, 0.2, 0.9, …]
When the prediction ŷ ≠ y, backpropagation (gradient descent) nudges the weights (0.4 ± δ, 0.3 ± δ, …) to obtain new weights.
ŷ = Φ(Σ_{i=1..n} w_i x_i + b)
The learning in Deep Learning
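A minimal sketch of that weight-update idea, for a single linear neuron with squared loss (all values illustrative; real frameworks compute these gradients automatically):

```python
def sgd_step(w, b, x, y, lr=0.1):
    """One gradient-descent step for y_hat = w*x + b with loss (y_hat - y)^2."""
    y_hat = w * x + b
    grad_w = 2 * (y_hat - y) * x   # dL/dw
    grad_b = 2 * (y_hat - y)       # dL/db
    return w - lr * grad_w, b - lr * grad_b

w, b = sgd_step(0.0, 0.0, x=1.0, y=1.0)  # both parameters move toward the target
```

Repeating this step over many labeled examples is, in miniature, "the learning in Deep Learning".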
33.
Other layers: Convolutional Neural Networks
34.
Sharpening filter
Laplacian filter
Sobel x-axis filter
Used in Computer Vision for a long time
35.
A convolution is the cross-channel sum of the element-wise
multiplication of a convolutional filter (kernel/mask)
computed over a sliding window on an input tensor, given
a certain stride and padding, plus a bias term. The result
is called a feature map.

Input matrix (3x3), 1 channel, no padding:
| 2 | 2 | 1 |
| 3 | 1 | -1 |
| 4 | 3 | 2 |

Kernel (2x2), stride 1, bias = 2:
| 1 | -1 |
| -1 | 0 |

Feature map (2x2):
| -1 | 2 |
| 0 | 1 |

1*2 - 1*2 - 1*3 + 0*1 + 2 = -1
1*2 - 1*1 - 1*1 + 0*(-1) + 2 = 2
1*3 - 1*1 - 1*4 + 0*3 + 2 = 0
1*1 - 1*(-1) - 1*3 + 0*2 + 2 = 1
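The worked example above can be checked with a short sketch of a single-channel, no-padding 2-D convolution (cross-correlation, as deep learning frameworks implement it):

```python
def conv2d(inp, kernel, bias=0, stride=1):
    """Valid (no-padding) 2-D cross-correlation of a single-channel input."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(0, len(inp) - kh + 1, stride):
        row = []
        for j in range(0, len(inp[0]) - kw + 1, stride):
            # Element-wise multiply the window by the kernel, sum, add bias.
            s = sum(kernel[di][dj] * inp[i + di][j + dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s + bias)
        out.append(row)
    return out

inp = [[2, 2, 1], [3, 1, -1], [4, 3, 2]]
kernel = [[1, -1], [-1, 0]]
print(conv2d(inp, kernel, bias=2))  # [[-1, 2], [0, 1]]
```

A multi-channel convolution simply sums this result across input channels before adding the bias.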
How does it work?
36.
Spatial data: Convolutional Layers (images, etc)
37.
- Detect patterns at larger and larger scales by stacking
convolution layers on top of each other to grow the
receptive field
- Applicable to spatially correlated data
Source: AlexNet's first 96 learned filters, represented in RGB space
38.
Sharpening filter
Laplacian filter
Sobel x-axis filter
Used in Computer Vision for a long time,
now learnable!
39.
Source: ML Review, A guide to receptive field arithmetic
Deeper in the
network
Hierarchical learning: growing receptive field
40.
Another layer: Max pooling
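Max pooling keeps only the strongest activation in each window. A minimal sketch, assuming an illustrative 2x2 window with stride 2:

```python
def max_pool(inp, size=2, stride=2):
    """Slide a size x size window over `inp` and keep the max of each window."""
    out = []
    for i in range(0, len(inp) - size + 1, stride):
        row = []
        for j in range(0, len(inp[0]) - size + 1, stride):
            row.append(max(inp[i + di][j + dj]
                           for di in range(size) for dj in range(size)))
        out.append(row)
    return out

feature_map = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(max_pool(feature_map))  # [[6, 8], [14, 16]]
```

Pooling halves each spatial dimension here, which shrinks the feature map and adds a small amount of translation invariance.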
41.
A lot of other layers:
- Batch normalization layer
- Dropout layer
- Pooling layer
- Attention layer
- Recurrent layers (LSTM, GRU)
- …
42.
More deep learning concepts
43.
Overfitting
• The model learns the noise in the training data as well as the signal
• The model doesn't generalize
• Causes: too few data points, noisy data, or too large a network
44.
Parameters and Hyperparameters
• Parameters
  • Numeric values in the model: weights and biases
  • Learned during training
• Hyperparameters
  • Values set for the training session
  • Numeric, e.g. mini-batch size
  • Non-numeric, e.g. which algorithm to use for optimization
• Hyperparameter optimization
  • An outer layer of learning: searching over hyperparameters
45.
Accuracy vs. Loss
• Accuracy: a percentage
  • Each example is either correct or not
• Loss: calculated during training
  • How far off is the current model?
  • A continuous value
• Common loss functions
  • Mean squared error (regression)
  • Cross entropy: negative log of the probability assigned to the true label (classification)
• During training, an optimizer minimizes the loss
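Minimal sketches of the two loss functions named above, written over plain lists for illustration:

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average squared gap between target and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(y_true, y_pred):
    """Cross entropy for one example: y_true is one-hot, y_pred are probabilities."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

print(mse([1.0, 2.0], [1.5, 2.0]))            # small, continuous penalty
print(cross_entropy([0, 1], [0.5, 0.5]))      # -log(0.5), about 0.693
```

Note how cross entropy is continuous even when accuracy would not change: assigning the true class probability 0.6 instead of 0.5 lowers the loss although the prediction stays "correct".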
46.
Stochastic Gradient Descent
• Take a series of steps downhill along the loss surface
• Specify a learning rate:
• weight = weight - learning_rate * gradient
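The update rule can be sketched as a tiny gradient-descent loop on a toy one-dimensional function (the function, learning rate, and step count are illustrative):

```python
def minimize(grad, w0, lr=0.1, steps=100):
    """Plain gradient descent: repeatedly apply w <- w - lr * grad(w)."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

# Minimize f(w) = (w - 3)^2, whose gradient is 2*(w - 3); the minimum is at w = 3.
w = minimize(lambda w: 2 * (w - 3), w0=0.0)
```

Too small a learning rate and convergence is slow; too large and the iterates overshoot or diverge, which is why the learning rate is a key hyperparameter.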
47.
Other optimization rules:
48.
Conclusion
- Deep learning is a collection of techniques and algorithms
- Characterized by large computational graphs with learnable
parameters
- Trained using backward propagation of the gradients of a loss
- Usually requires large amounts of data
- Made possible by advances in hardware and software
- Applied to a variety of tasks across a large number of domains
#4 Semi-supervised?? Well, things that don't neatly fit into one of those buckets or the other.
#5 Sometimes here people say "ML and DL" as if they were two different things.
#6 Good examples:
Imagine x = longitude, y = latitude
Linearly separable: blue dots = people in the USA; green dots = Canada
Non-linearly separable: blue = people in a city, green = people outside the city
#13 Many fields of application; no need to be an expert
#27 A common ML educational task on old USA census data.
#28 All elements of the input feature vector must be numbers.
Assign categories to positions in the vector, not just arbitrary numbers.
There are more complex ways of encoding categorical features, e.g. embeddings.
#32 Without these activation functions we would just have linear combinations of features and weights.