7 steps for highly effective deep neural networks
Natalino Busa - Head of Data Science
O’Reilly Author and Speaker
Teradata EMEA Practice Lead on Open Source Technologies
Teradata Principal Data Scientist
ING Group Enterprise Architect: Cybersecurity, Marketing, Fintech
Cognitive Finance Group Advisory Board Member
Philips Senior Researcher, Data Architect
LinkedIn and Twitter: @natbusa
Linear Regression: how to best fit a line to some data
X: independent variable
Y: dependent variable
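As a warm-up, here is a minimal sketch (not from the slides) of fitting a line y = w*x + b with ordinary least squares in NumPy; the data is made up for illustration.

```python
import numpy as np

# toy data: y depends linearly on x, plus noise (made-up example)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + np.random.normal(scale=0.5, size=x.shape)

# least-squares fit of y = w*x + b
w, b = np.polyfit(x, y, deg=1)
print(f"fitted line: y = {w:.2f} * x + {b:.2f}")
```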
Classification: handwritten digits
Input: 784 numbers (28×28 pixels)
Output: 10 classes
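The digits task above matches the classic MNIST setup; a minimal sketch of loading and flattening the data with the Keras datasets API (assuming tf.keras is available):

```python
from tensorflow import keras

# load the handwritten-digit images: 28x28 grayscale, labels 0..9
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# flatten each 28x28 image into 784 numbers and scale to [0, 1]
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

# one-hot encode the 10 classes
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)
```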
Sharing the (not so) secret lore: Keras & TensorFlow
Some smaller projects: TFLearn, TensorLayer
http://keras.io/
1: Single Layer Perceptron
Diagram: “dendrites” (inputs), the axon’s response (output), and the activation function.
More activation functions:
https://en.wikipedia.org/wiki/Activation_function
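A few of the common activation functions, written out in NumPy as a quick reference (a sketch, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))           # squashes to (0, 1)

def tanh(z):
    return np.tanh(z)                          # squashes to (-1, 1)

def relu(z):
    return np.maximum(0.0, z)                  # zero for negative inputs

def elu(z, alpha=1.0):
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))  # smooth for z < 0
```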
1: Single Layer Perceptron (binary classifier)
A single-layer neural network takes n input features and maps them to a soft “binary” space.
Diagram: inputs x1, x2, ..., xn are weighted, summed (∑) and passed through an activation function f.
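A minimal NumPy sketch of that forward pass, with made-up weights, mapping n inputs to a soft binary output via a sigmoid:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# made-up parameters for n = 3 input features
w = np.array([0.4, -0.2, 0.1])    # one weight per input
b = 0.05                          # bias

x = np.array([1.0, 2.0, 3.0])     # one input sample
y = sigmoid(np.dot(w, x) + b)     # weighted sum, then activation f
print(y)                          # a value in (0, 1): the soft "binary" output
```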
1: Single Layer Perceptron (multi-class classifier)
From soft binary space to predicting probabilities: take n inputs and divide each predicted value by the sum of all predicted values (softmax).
The outputs are values between 0 and 1 and sum to 1, so they estimate a probability!
Diagram: inputs x1 ... xn feed several ∑/f units in parallel, followed by a softmax, e.g. ‘1’: 95%, ‘8’: 5%.
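A small NumPy sketch of the softmax normalization (made-up raw scores for the digits ‘1’ and ‘8’):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))    # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.94, 0.0])   # made-up raw scores for classes '1' and '8'
probs = softmax(scores)
print(probs)                     # roughly [0.95, 0.05]: values in (0, 1), summing to 1
```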
1: Single Layer Perceptron
Minimize costs. The cost function depends on:
- the parameters of the model
- how the model “composes”
Goal: reduce the mean probability error. Modify the parameters to reduce the error! Vintage math from last century.
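That “vintage math” is, presumably, gradient descent: nudge each parameter against the gradient of the cost. A toy NumPy sketch on a one-parameter squared-error cost (all values made up):

```python
import numpy as np

# toy cost: mean squared error of a single weight w on made-up data
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])        # the true relation is y = 2 * x

w = 0.0                              # initial parameter
lr = 0.05                            # learning rate

for step in range(100):
    err = w * x - y                  # prediction error
    grad = np.mean(2 * err * x)      # d(cost)/dw
    w -= lr * grad                   # move against the gradient
print(w)                             # converges towards 2.0
```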
Supervised Learning
Stack layers of perceptrons:
- Feed-forward network: scoring goes from input to output
- Back-propagate the error from output to input
Diagram: input parameters → feed-forward functions → softmax → classes (estimated probabilities); the cost function compares them with the actual (supervised) output, and the errors are back-propagated.
Let’s go!
1. Single Layer Perceptron
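The original slides step through the code; here is a minimal Keras sketch of the same idea (assuming the tf.keras API; the exact layer sizes and training settings on the slides may differ):

```python
from tensorflow import keras
from tensorflow.keras import layers

# single-layer perceptron: 784 inputs -> 10 softmax outputs
model = keras.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="sgd",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()

# training on the flattened, one-hot encoded digits prepared earlier:
# model.fit(x_train, y_train, batch_size=128, epochs=10,
#           validation_data=(x_test, y_test))
```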
TensorBoard
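The slides use TensorBoard to inspect the graph and the training curves; with Keras, one way to produce the logs is the TensorBoard callback (the log directory name here is arbitrary):

```python
from tensorflow import keras

# write logs that TensorBoard can display (launch with: tensorboard --logdir ./logs)
tensorboard_cb = keras.callbacks.TensorBoard(log_dir="./logs")

# pass the callback to training, e.g.:
# model.fit(x_train, y_train, epochs=10, callbacks=[tensorboard_cb])
```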
2. Multi Layer Perceptron
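A sketch of a multi-layer perceptron in Keras: hidden dense layers stacked in front of the softmax (the hidden sizes are illustrative, not the exact slide values):

```python
from tensorflow import keras
from tensorflow.keras import layers

# two hidden layers of perceptrons, then the softmax classifier
model = keras.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(512, activation="relu"),
    layers.Dense(256, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))
```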
3. Convolution
From Krizhevsky et al. (2012)
diagrams:
By Aphex34 - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=45659236
By Aphex34 - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=45673581
CC0, https://en.wikipedia.org/w/index.php?curid=48817276
Convolution, max pooling, ReLU / ELU
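A sketch of a small convolutional network in Keras combining these three ingredients (filter counts are illustrative); the images are kept as 28×28×1 tensors instead of flattened vectors:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),   # convolution + ReLU
    layers.MaxPooling2D((2, 2)),                    # max pooling
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# expects images reshaped to (-1, 28, 28, 1) rather than flattened to 784
```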
3. Batch Normalization
Example for an MLP, with sigmoid and with ReLU activation functions.
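A sketch of adding batch normalization to the MLP in Keras; a common pattern is dense layer, then BatchNormalization, then the activation (layer sizes are illustrative):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(512),
    layers.BatchNormalization(),       # normalize activations per mini-batch
    layers.Activation("relu"),         # swap in "sigmoid" here to compare
    layers.Dense(256),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```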
4. Regularization: Prevent overfitting in ANNs
- Batch normalization
- ReLU / ELU
- Residual / skip networks
- Drop layer
- Reduced precision (Huffman encoding)
In general, ANNs are parameter-rich; constraining the parameter space usually produces better results and speeds up learning (see the dropout sketch below).
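As one concrete example from the list above, a sketch of dropout between dense layers in Keras (rates are illustrative, and reading “drop layer” as dropout is my assumption):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.5),               # randomly drop 50% of the units during training
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```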
5. Inception architectures
The naive inception module cannot be stacked as-is: concatenating all the filter outputs makes the channel dimension explode.
Compression (1×1 convolutions) avoids the dimension explosion.
5. Inception architectures (top architecture)
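A sketch of a single inception module in the Keras functional API: parallel 1×1, 3×3 and 5×5 convolutions plus a pooling branch, with 1×1 convolutions as compression before the expensive filters (filter counts are illustrative, not those of the published architecture):

```python
from tensorflow import keras
from tensorflow.keras import layers

def inception_module(x, f1, f3_in, f3, f5_in, f5, fpool):
    # 1x1 branch
    b1 = layers.Conv2D(f1, (1, 1), padding="same", activation="relu")(x)
    # 1x1 compression, then 3x3
    b3 = layers.Conv2D(f3_in, (1, 1), padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f3, (3, 3), padding="same", activation="relu")(b3)
    # 1x1 compression, then 5x5
    b5 = layers.Conv2D(f5_in, (1, 1), padding="same", activation="relu")(x)
    b5 = layers.Conv2D(f5, (5, 5), padding="same", activation="relu")(b5)
    # pooling branch with a 1x1 projection
    bp = layers.MaxPooling2D((3, 3), strides=(1, 1), padding="same")(x)
    bp = layers.Conv2D(fpool, (1, 1), padding="same", activation="relu")(bp)
    # concatenate all branches along the channel axis
    return layers.concatenate([b1, b3, b5, bp])

inputs = layers.Input(shape=(28, 28, 1))
x = inception_module(inputs, 16, 16, 32, 8, 16, 16)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(10, activation="softmax")(x)
model = keras.Model(inputs, outputs)
```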
6. Residual Networks
https://culurciello.github.io/tech/2016/06/04/nets.html
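A sketch of a basic residual block in Keras: the input skips over two convolutions and is added back to their output (an identity shortcut; filter counts are illustrative):

```python
from tensorflow import keras
from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = x                                            # identity / skip connection
    y = layers.Conv2D(filters, (3, 3), padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, (3, 3), padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.add([y, shortcut])                           # add the skipped input back
    return layers.Activation("relu")(y)

inputs = layers.Input(shape=(28, 28, 1))
x = layers.Conv2D(16, (3, 3), padding="same", activation="relu")(inputs)
x = residual_block(x, 16)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(10, activation="softmax")(x)
model = keras.Model(inputs, outputs)
```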
6. Residual + Inception Networks
7. LSTM on Images
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
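One common way to apply an LSTM to the digit images, and presumably what these slides do, is to read each 28×28 image as a sequence of 28 rows of 28 pixels; a sketch (the unit count is illustrative):

```python
from tensorflow import keras
from tensorflow.keras import layers

# each image is read as a sequence of 28 rows, each row a vector of 28 pixels
model = keras.Sequential([
    layers.Input(shape=(28, 28)),
    layers.LSTM(128),                       # the final hidden state summarizes the sequence
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# expects images of shape (-1, 28, 28), i.e. not flattened
```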
7. LSTM on ConvNets (bonus slide)
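For the bonus combination, a sketch of stacking an LSTM on top of convolutional features: a small ConvNet is applied to every frame of a sequence via TimeDistributed, and an LSTM reads the resulting feature sequence (shapes and sizes are illustrative assumptions, not the slide's exact code):

```python
from tensorflow import keras
from tensorflow.keras import layers

# a sequence of 10 frames, each a 28x28x1 image
model = keras.Sequential([
    layers.Input(shape=(10, 28, 28, 1)),
    layers.TimeDistributed(layers.Conv2D(16, (3, 3), activation="relu")),
    layers.TimeDistributed(layers.MaxPooling2D((2, 2))),
    layers.TimeDistributed(layers.Flatten()),
    layers.LSTM(64),
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```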
All the codez :)
https://github.com/natbusa/deepnumbers
Meta-references
… just a few articles, but extremely dense in content.
A must read!
https://keras.io/
http://karpathy.github.io/neuralnets/
https://culurciello.github.io/tech/2016/06/04/nets.html
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
https://gab41.lab41.org/batch-normalization-what-the-hey-d480039a9e3b
