Supporting slides for Hidden Layers MeetUp (Deep Learning Study Group) - January 31st, 2017
The presentation covers common difficulties encountered when building a Deep Learning model (architecture design, back-propagation, vanishing gradients, etc.)
1. The Art Of Backpropagation
and other Bedtime Deep Learning Stories
Jennifer Prendki, @WalmartLabs
2. Why this talk?
• Deep Learning can solve many problems
• Deep Learning is trendy
• Deep Learning is applied in many different industries
Everybody is using it, or wants to use it
• But many people are using Deep Learning as a black-box
• There is no consistent theory regarding architecture building
3. Context: Neural Nets, Forward & Backward Feeds
• Back to the basics: what are Artificial Neural Nets?
The combination of:
• a training method
• an optimization method
A 2-phase cycle:
• propagation
• weight update
4. Deep Learning Glossary
• Input: the first layer (what is fed to the algorithm, the initial data columns)
• Output: what we want to compute (can be more than one value)
• Hidden layers: the neurons for the intermediate steps
• Forward propagation: propagation of a training pattern's input through the neural network
in order to generate the network's output value(s)
• Backward propagation: propagation of the output activations back through the neural
net, using the training pattern's target, in order to generate the deltas
• Deltas: the differences between the targeted and actual output values of all
output and hidden neurons
• Weight update: the process of multiplying the output delta and input activation
to compute the gradient of the weight.
• Learning rate: the ratio of the weight's gradient that is subtracted from the weight (in symbols below)
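In symbols (my notation, not the deck's), the weight update just described is:

```latex
\Delta w_{ij} = -\eta \, \delta_j \, a_i, \qquad w_{ij} \leftarrow w_{ij} + \Delta w_{ij}
```

where η is the learning rate, δ_j the delta of neuron j, and a_i the input activation feeding weight w_ij.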
5. Backpropagation Algorithm
• Propagation
Forward propagation of a training pattern's input through the neural network in order to
generate the network's output value(s).
Backward propagation of the output activations back through the neural
network using the training pattern's target in order to generate the deltas.
• Weight update
The weight's output delta and input activation are multiplied to find the gradient of the
weight.
A ratio of that gradient (the learning rate) is subtracted from the weight, as sketched in the code below.
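To make the two phases concrete, here is a minimal sketch (mine, not from the deck) of one propagation/weight-update cycle for a tiny network with one hidden layer, sigmoid activations, and a squared-error loss; all variable names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy network: 2 inputs -> 3 hidden neurons -> 1 output
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(2, 3))  # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(3, 1))  # hidden -> output weights

x = np.array([[0.5, -1.2]])  # one training pattern (1 x 2)
t = np.array([[1.0]])        # its target output
lr = 0.1                     # learning rate

# --- Propagation phase ---
# Forward pass: generate the network's output value(s)
h = sigmoid(x @ W1)          # hidden activations (1 x 3)
y = sigmoid(h @ W2)          # output activation  (1 x 1)

# Backward pass: generate the deltas
delta_out = (y - t) * y * (1 - y)             # output delta
delta_hid = (delta_out @ W2.T) * h * (1 - h)  # hidden deltas

# --- Weight update phase ---
# gradient = input activation (transposed) times output delta
W2 -= lr * (h.T @ delta_out)
W1 -= lr * (x.T @ delta_hid)
```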
6. Backpropagation Algorithm
Backpropagation can be explained
through the "Shoe Lace" analogy:
- Too little tension =
- not enough constraining, too loose
(unsatisfactory model)
- Too much tension =
- too much constraint (overtraining)
- taking too much time (slow process)
- higher likelihood of breaking
(non-convergence)
- Pulling more on one lace than the other =
- discomfort (bias)
7. Learning Rate
• Learning rate definition:
• Ratio of the weight's gradient that is subtracted from the weight
• Learning rate = Trade-Off
• Large ratios => fast training, but risk of overshooting the minimum
• Small ratios => more accurate training, but slower convergence
• Question: How do you choose the learning rate?
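To illustrate the trade-off, here is a sketch of my own (assuming a simple quadratic loss E(w) = w², not an example from the deck):

```python
def gradient_descent(lr, steps=20, w0=5.0):
    """Minimize E(w) = w**2 (gradient dE/dw = 2*w) with a fixed learning rate."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w  # w <- w - lr * dE/dw
    return w

for lr in (0.01, 0.1, 0.9, 1.1):
    print(f"lr={lr}: w after 20 steps = {gradient_descent(lr):.4f}")
# lr=0.01: slow but steady progress toward the minimum at w = 0
# lr=0.1 : fast, accurate convergence
# lr=0.9 : oscillates around 0 but still shrinks
# lr=1.1 : every step overshoots further -- training diverges
```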
8. Activation Function
• Backpropagation &
Supervised Learning
• Backpropagation is used in a
supervised context
• Backpropagation requires
the activation function to
be differentiable (see the sigmoid example below)
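As an example (mine, not from the deck), the sigmoid is a classic differentiable activation, and its derivative is exactly what the backward pass multiplies by:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: smooth, hence differentiable everywhere."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Its derivative, s'(z) = s(z) * (1 - s(z)), reused by the backward pass."""
    s = sigmoid(z)
    return s * (1 - s)
```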
9. Vanishing Gradient
• What is a vanishing gradient?
The case where the gradients flowing back through the network shrink toward 0,
so the weights of the early layers barely get updated
Lessons:
- The starting point for the weights matters
(the network can fall into a non-optimal minimum)
- Large architectures make the gradients harder to
control
- Hidden layers that no longer learn are expensive
memory-wise (and useless), as illustrated below
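A minimal sketch (mine, assuming a chain of sigmoid layers) of why the gradient vanishes: each layer multiplies the backpropagated signal by the local derivative s'(z) ≤ 0.25, so ten layers shrink it by at least a factor of 0.25¹⁰:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

grad = 1.0  # gradient magnitude at the output layer
for layer in range(10, 0, -1):
    s = sigmoid(0.0)       # pre-activation 0 gives the *largest* slope, 0.25
    grad *= s * (1 - s)    # chain rule: multiply by the local derivative
    print(f"layer {layer}: gradient magnitude <= {grad:.2e}")
# After 10 layers: 0.25**10 ≈ 9.5e-07 -- almost no signal is left to
# update the earliest weights, no matter how wrong they are.
```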
10. (image-only slide)
11. Let's Recap: What Is Hard/Tricky with DL?
• What decisions need to be made to build a DL model?
• ARCHITECTURE: overall architecture (RNN, etc.), number of layers, number of neurons,
number of inputs, number of outputs
• MODEL: learning rate, loss function, activation function, starting weights
• DATA: amount of data
• Conclusion
• Architecture building is sketchy and empirical
• Experimentation takes time and memory
Editor's Notes
LiveSlide Site: http://www.emergentmind.com/neural-network