The document provides an overview of deep learning fundamentals. It discusses key concepts like neural networks, convolutional neural networks, activation functions, backpropagation, and optimization techniques like stochastic gradient descent. Examples are given of deep learning applications in areas like computer vision, natural language processing, and medical imaging. The document also traces the history and growth of deep learning since 2012, driven by advances in hardware, software frameworks, and large datasets.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Deep Learning Fundamentals
Thomas Delteil, Machine Learning Scientist, Amazon AI
Soji Adeshina, Machine Learning Engineer, Amazon AI
2.
Why do machine learning?
How many cats?
Complex tasks where you can't code up explicit solutions
3.
Types of Machine Learning
Supervised
• Data & labels
• Classification, labeling
• Regression
Unsupervised
• Data, no labels
• Clustering
• Dimensionality reduction
Semi-supervised
• Data, some labels
• Active learning
• Reinforcement learning
4.
Situating Deep Learning
[Figure: nested circles, Deep Learning inside Machine Learning inside AI]
Can machines think? Can machines do what we can? (Turing, 1950)
Machine Learning: Data + Answers → Rules
5.
Linear and non-linear separability
6.
What is "Deep" Learning?
7.
Deep computational graph
Inception model: 100+ million learnable parameters
A computational graph composes simple operations, e.g.:
• z = x · y
• u = a · b
• t = k·z + u
[Figure: graph with inputs x, y, a, b and constant k feeding multiply and add nodes that produce z, u, and finally t]
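The three operations above (z = x·y, u = a·b, t = k·z + u) can be sketched as a forward pass through the graph; the input values below are illustrative, not from the slide:

```python
def forward(x, y, a, b, k):
    """Evaluate the small computational graph from the slide."""
    z = x * y        # first node: z = x * y
    u = a * b        # second node: u = a * b
    t = k * z + u    # third node combines the two intermediates
    return t

print(forward(x=1, y=1, a=2, b=3, k=2))  # 2*1 + 6 = 8
```

In a deep learning framework, each such node also records how to compute its gradient, which is what makes the parameters learnable.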
8.
What can Deep Learning do?
9.
10.
11.
And many more
- Action recognition
- Image super resolution
- Pose estimation
- Image generation
- Text to speech
- Speech to text
- Text recognition
- Robotics policy learning
- …
12.
Sea/Land segmentation via satellite images
DeepUNet: A Deep Fully Convolutional Network for Pixel-level Sea-Land Segmentation, Ruirui Li et al, 2017
13.
Automatic Galaxy classification
Deep Galaxy: Classification of Galaxies based on Deep Convolutional Neural Networks, Nour Eldeen M. Khalifa, 2017
14.
Medical Imaging, MRI, X-ray, surgical cameras
Review of MRI-based Brain Tumor Image Segmentation Using Deep Learning Methods, Ali Işın et al., 2016
15.
Stock market predictions
Deep Learning for Forecasting Stock Returns in the Cross-Section, Masaya Abe and Hideki Nakayama 2017
16.
How did it start?
17.
2012 - ImageNet Classification with Deep Convolutional Neural Networks
ImageNet Classification with Deep Convolutional Neural Networks, Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, Advances in Neural Information Processing Systems, 2012
AlexNet architecture
18.
ImageNet: classify images among 1,000 classes.
AlexNet dropped the Top-5 error rate from ~25% to ~16%!
19.
Actual photo of the reaction from the computer vision community*
*might just be a stock photo
20.
I told you
so!
21.
Why now?
22.
Hardware: GPUs!
Nvidia V100, float16 ops:
~120 TFLOPS, 5,000+ CUDA cores
(the #1 supercomputer in 2005 reached ~135 TFLOPS)
Source: Mathworks
23.
And more specialized hardware is being
developed as we speak:
- AWS Inferentia
- Intel Movidius
- FPGAs
- Apple A11 "neural engine"
- …
24.
Software
- Deep learning frameworks:
  - MXNet
  - TensorFlow
  - PyTorch
- Deep learning accelerators:
  - TensorRT
  - MKL-DNN
- Deep learning compilers:
  - TVM
  - Glow
- Deep learning APIs:
  - ONNX
  - Keras
25.
How does it work?
26.
Basic Terminology
Predict if a person earns >$50K

| Age | Education | Years of education | Marital status | Occupation   | Sex    | Label |
|-----|-----------|--------------------|----------------|--------------|--------|-------|
| 39  | Bachelors | 16                 | Single         | Adm-clerical | Male   | -1    |
| 31  | Masters   | 18                 | Married        | Engineering  | Female | +1    |

Training examples: rows
Input features: x
Label / ground truth: y
27.
Basic Terminology
| Age | Education | Years of education | Marital status | Occupation   | Sex    | Label |
|-----|-----------|--------------------|----------------|--------------|--------|-------|
| 39  | Bachelors | 16                 | Single         | Adm-clerical | Male   | -1    |
| 31  | Masters   | 18                 | Married        | Engineering  | Female | +1    |

One-hot encoding converts categorical features into numeric columns:

| Age | Edu_Bachelors | Edu_Masters | Years of education | Marital_Single | … | Label |
|-----|---------------|-------------|--------------------|----------------|---|-------|
| 39  | 1             | 0           | 16                 | 1              | … | -1    |
| 31  | 0             | 1           | 18                 | 0              | … | +1    |
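The encoding above can be sketched in a few lines of Python; the category lists and column names are illustrative, not part of the census dataset's API:

```python
def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the position of `value`."""
    return [1 if c == value else 0 for c in categories]

education_levels = ["Bachelors", "Masters"]
row = {"Age": 39, "Education": "Bachelors", "YearsOfEducation": 16}

# Numeric features pass through; categorical features expand into 0/1 columns.
encoded = [row["Age"]] + one_hot(row["Education"], education_levels) + [row["YearsOfEducation"]]
print(encoded)  # [39, 1, 0, 16]
```

Each category gets a fixed position in the vector, so the model never treats categories as if they were ordered numbers.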
28.
(Artificial) Neural Networks (ANN)
Inspired by the brain's neurons: we have ~100 billion of them, and ~1 quadrillion synapses.
[Figure: inputs x1 … xn, weights w1 … wn, bias b, summation Σ and activation σ producing the output y]
y = σ(Σ_{i=1..n} w_i x_i + b)
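The weighted-sum-plus-activation formula above can be sketched in plain Python; the sigmoid activation and the input values are illustrative choices:

```python
import math

def neuron(xs, ws, b):
    """y = sigma(sum_i w_i * x_i + b), with a sigmoid activation sigma."""
    z = sum(w * x for w, x in zip(ws, xs)) + b
    return 1 / (1 + math.exp(-z))

y = neuron([1.0, 2.0], [0.5, -0.25], 0.1)  # z = 0.5 - 0.5 + 0.1 = 0.1
```

With zero weights and bias the sigmoid sits exactly at its midpoint, 0.5, which is a handy sanity check.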
29.
Bias term
• Each neuron has a bias associated with it
• The bias moves the activation left or right along the x-axis
y = σ(Σ_{i=1..n} w_i x_i + b)
30.
Deep Learning:
the Multi-Layer Perceptron (MLP)
[Figure: input layer, hidden layers, and output layer, each followed by an activation]
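Stacking layers is just repeating the single-neuron computation. A minimal sketch of an MLP forward pass, assuming illustrative weights and a ReLU hidden activation:

```python
def relu(z):
    """ReLU activation: pass positives through, clamp negatives to 0."""
    return max(0.0, z)

def dense(xs, weights, biases, act):
    """One fully connected layer: each output is act(w . x + b)."""
    return [act(sum(w * x for w, x in zip(ws, xs)) + b)
            for ws, b in zip(weights, biases)]

# A tiny MLP: 2 inputs -> 2 hidden units (ReLU) -> 1 output (identity)
x = [1.0, -1.0]
h = dense(x, [[0.5, -0.5], [0.3, 0.8]], [0.0, 0.1], relu)
y = dense(h, [[1.0, -1.0]], [0.0], lambda z: z)
```

Each hidden layer feeds the next; without the nonlinear activation between layers the whole stack would collapse into one linear map.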
[Figure: a network is fed an input vector X (e.g. 0, 1, 0, 1, 1, …) with its label, and outputs activations such as 0.4, 0.3, 0.2, 0.9, …]
When the prediction ŷ ≠ y, backpropagation (gradient descent) nudges the weights (0.4 ± δ, 0.3 ± δ, …) to obtain new weights.
ŷ = Φ(Σ_{i=1..n} w_i x_i + b)
The learning in Deep Learning
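A minimal sketch of that weight-update idea, for a single linear neuron with squared loss (all values illustrative; real frameworks compute these gradients automatically):

```python
def sgd_step(w, b, x, y, lr=0.1):
    """One gradient-descent step for y_hat = w*x + b with loss (y_hat - y)^2."""
    y_hat = w * x + b
    grad_w = 2 * (y_hat - y) * x   # dL/dw
    grad_b = 2 * (y_hat - y)       # dL/db
    return w - lr * grad_w, b - lr * grad_b

w, b = sgd_step(0.0, 0.0, x=1.0, y=1.0)  # both parameters move toward the target
```

Repeating this step over many labeled examples is, in miniature, "the learning in Deep Learning".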
33.
Other layers: Convolutional Neural Networks
34.
Sharpening filter
Laplacian filter
Sobel x-axis filter
Used in Computer Vision for a long time
35.
A convolution is the cross-channel sum of the element-wise
multiplication of a convolutional filter (kernel/mask)
computed over a sliding window on an input tensor, given
a certain stride and padding, plus a bias term. The result
is called a feature map.

Input matrix (3x3), 1 channel, no padding:
| 2 | 2 | 1 |
| 3 | 1 | -1 |
| 4 | 3 | 2 |

Kernel (2x2), stride 1, bias = 2:
| 1 | -1 |
| -1 | 0 |

Feature map (2x2):
| -1 | 2 |
| 0 | 1 |

1*2 - 1*2 - 1*3 + 0*1 + 2 = -1
1*2 - 1*1 - 1*1 + 0*(-1) + 2 = 2
1*3 - 1*1 - 1*4 + 0*3 + 2 = 0
1*1 - 1*(-1) - 1*3 + 0*2 + 2 = 1
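The worked example above can be checked with a short sketch of a single-channel, no-padding 2-D convolution (cross-correlation, as deep learning frameworks implement it):

```python
def conv2d(inp, kernel, bias=0, stride=1):
    """Valid (no-padding) 2-D cross-correlation of a single-channel input."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(0, len(inp) - kh + 1, stride):
        row = []
        for j in range(0, len(inp[0]) - kw + 1, stride):
            # Element-wise multiply the window by the kernel, sum, add bias.
            s = sum(kernel[di][dj] * inp[i + di][j + dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s + bias)
        out.append(row)
    return out

inp = [[2, 2, 1], [3, 1, -1], [4, 3, 2]]
kernel = [[1, -1], [-1, 0]]
print(conv2d(inp, kernel, bias=2))  # [[-1, 2], [0, 1]]
```

A multi-channel convolution simply sums this result across input channels before adding the bias.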
How does it work?
36.
Spatial data: Convolutional Layers (images, etc)
37.
- Detect patterns at larger and larger scales by stacking
convolution layers on top of each other to grow the
receptive field
- Applicable to spatially correlated data
Source: AlexNet's first 96 learned filters, represented in RGB space
38.
Sharpening filter
Laplacian filter
Sobel x-axis filter
Used in Computer Vision for a long time,
now learnable!
39.
Source: ML Review, A guide to receptive field arithmetic
Deeper in the
network
Hierarchical learning: growing receptive field
40.
Another layer: Max pooling
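Max pooling keeps only the strongest activation in each window. A minimal sketch, assuming an illustrative 2x2 window with stride 2:

```python
def max_pool(inp, size=2, stride=2):
    """Slide a size x size window over `inp` and keep the max of each window."""
    out = []
    for i in range(0, len(inp) - size + 1, stride):
        row = []
        for j in range(0, len(inp[0]) - size + 1, stride):
            row.append(max(inp[i + di][j + dj]
                           for di in range(size) for dj in range(size)))
        out.append(row)
    return out

feature_map = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(max_pool(feature_map))  # [[6, 8], [14, 16]]
```

Pooling halves each spatial dimension here, which shrinks the feature map and adds a small amount of translation invariance.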
41.
A lot of other layers:
- Batch normalization layer
- Dropout layer
- Pooling layer
- Attention layer
- Recurrent layers (LSTM, GRU)
- …
42.
More deep learning concepts
43.
Overfitting
• The model learns the noise in the training data as well as the signal
• The model doesn't generalize
• Causes: too few data points, noisy data, or too large a network
44.
Parameters and Hyperparameters
• Parameters
  • Numeric values in the model: weights and biases
  • Learned during training
• Hyperparameters
  • Values set for the training session
  • Numeric, e.g. mini-batch size
  • Non-numeric, e.g. which algorithm to use for optimization
• Hyperparameter optimization
  • An outer layer of learning: searching over hyperparameters
45.
Accuracy vs. Loss
• Accuracy: a percentage
  • Each example is either correct or not
• Loss: calculated during training
  • How far off is the current model?
  • A continuous value
• Common loss functions
  • Mean squared error (regression)
  • Cross entropy: negative log of the probability assigned to the true label (classification)
• During training, an optimizer minimizes the loss
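Minimal sketches of the two loss functions named above, written over plain lists for illustration:

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average squared gap between target and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(y_true, y_pred):
    """Cross entropy for one example: y_true is one-hot, y_pred are probabilities."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

print(mse([1.0, 2.0], [1.5, 2.0]))            # small, continuous penalty
print(cross_entropy([0, 1], [0.5, 0.5]))      # -log(0.5), about 0.693
```

Note how cross entropy is continuous even when accuracy would not change: assigning the true class probability 0.6 instead of 0.5 lowers the loss although the prediction stays "correct".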
46.
Stochastic Gradient Descent
• Take a series of steps downhill along the loss surface
• Specify a learning rate:
• weight = weight - learning_rate * gradient
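The update rule can be sketched as a tiny gradient-descent loop on a toy one-dimensional function (the function, learning rate, and step count are illustrative):

```python
def minimize(grad, w0, lr=0.1, steps=100):
    """Plain gradient descent: repeatedly apply w <- w - lr * grad(w)."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

# Minimize f(w) = (w - 3)^2, whose gradient is 2*(w - 3); the minimum is at w = 3.
w = minimize(lambda w: 2 * (w - 3), w0=0.0)
```

Too small a learning rate and convergence is slow; too large and the iterates overshoot or diverge, which is why the learning rate is a key hyperparameter.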
47.
Other optimization rules:
48.
Conclusion
- Deep learning is a collection of techniques and algorithms
- Characterized by large computational graphs with learnable
parameters
- Trained using backward propagation of the gradients of a loss
- Usually requires large amounts of data
- Made possible by advances in hardware and software
- Applied to a variety of tasks across a large number of domains
#4 Semi-supervised?? Well, things that don't neatly fit into one of those buckets or the other.
#5 Sometimes here people say "ML and DL" as if they were two different things.
#6 Good examples:
Imagine x = longitude, y = latitude
Linearly separable: blue dots = people in the USA; green dots = Canada
Non-linearly separable: blue = people in a city, green = people outside the city
#13 Many fields of application; no need to be an expert
#27 A common ML educational task on old USA census data.
#28 All elements of the input feature vector must be numbers.
Assign categories to positions in the vector, not just arbitrary numbers.
There are more complex ways of encoding categorical features, e.g. embeddings.
#32 Without these activation functions we would just have linear combinations of features and weights.