Winter School on Deep Learning for Speech and Language. UPC BarcelonaTech ETSETB TelecomBCN.

The aim of this course is to train students in methods of deep learning for speech and language. Recurrent Neural Networks (RNN) will be presented and analyzed in detail to understand the potential of these state of the art tools for time series processing. Engineering tips and scalability issues will be addressed to solve tasks such as machine translation, speech recognition, speech synthesis or question answering. Hands-on sessions will provide development skills so that attendees can become competent in contemporary data analytics tools.

Published in:
Data & Analytics

- 1. Day 1 Lecture 5: Backpropagation in Deep Networks. Elisa Sayrol
- 2. From previous lectures: a network with L hidden layers is described by the hidden pre-activation (k>0), the hidden activation (k=1,…,L), and the output activation (k=L+1). Figure credit: Hugo Larochelle, NN course
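
The slide names three quantities without their formulas. One common way to write them, assuming weights $W^{(k)}$, biases $b^{(k)}$, a hidden activation function $g$ and an output activation function $o$ (the symbols $g$, $o$ and $f$ are assumptions introduced here, not taken from the slide), is:

```latex
\begin{aligned}
a^{(k)}(x) &= b^{(k)} + W^{(k)} h^{(k-1)}(x), \qquad h^{(0)}(x) = x, && k > 0 \\
h^{(k)}(x) &= g\big(a^{(k)}(x)\big), && k = 1, \dots, L \\
h^{(L+1)}(x) &= o\big(a^{(L+1)}(x)\big) = f(x), && k = L + 1
\end{aligned}
```
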
- 3. Backpropagation algorithm. The output of the network gives class scores that depend on the input and the parameters. • Define a loss function that quantifies our unhappiness with the scores across the training data. • Come up with a way of efficiently finding the parameters that minimize the loss function (optimization).
- 4. Forward pass. [Figure: x (input) → a2 → h2 (hidden) → a3 → h3 (hidden) → a4 → h4 (output) → Loss, with weights W1, W2, W3.] The output is the probability of each class given an input (softmax). Figure credit: Kevin McGuinness
- 5. Forward pass (continued). Loss function: e.g., negative log-likelihood (good for classification), plus a regularization term (L2 norm), a.k.a. weight decay. Figure credit: Kevin McGuinness
- 6. Forward pass (continued). Minimize the loss (plus some regularization term) with respect to the parameters over the whole training set. Figure credit: Kevin McGuinness
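
The forward pass on slides 4 to 6 can be sketched in a few lines of plain Python. This is a minimal illustration, not the lecturer's code: it assumes sigmoid hidden units, omits the biases, and uses made-up weights and layer sizes; the names `forward`, `loss` and `weight_decay` are hypothetical.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sigmoid(a):
    """Elementwise logistic sigmoid (assumed hidden activation)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in a]

def softmax(a):
    """Turn class scores into class probabilities."""
    m = max(a)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in a]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, W1, W2, W3):
    """x -> a2 -> h2 -> a3 -> h3 -> a4 -> h4, biases omitted."""
    a2 = matvec(W1, x)
    h2 = sigmoid(a2)
    a3 = matvec(W2, h2)
    h3 = sigmoid(a3)
    a4 = matvec(W3, h3)
    h4 = softmax(a4)
    return h4

def loss(x, y, W1, W2, W3, weight_decay=0.0):
    """Negative log-likelihood of class y plus an L2 penalty on the weights."""
    h4 = forward(x, W1, W2, W3)
    nll = -math.log(h4[y])
    l2 = sum(w * w for W in (W1, W2, W3) for row in W for w in row)
    return nll + 0.5 * weight_decay * l2

# Hypothetical toy problem: 2 inputs, hidden layers of size 3 and 2, 3 classes.
x = [1.0, -2.0]
W1 = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]
W2 = [[0.2, -0.1, 0.3], [0.0, 0.5, -0.4]]
W3 = [[0.7, -0.3], [-0.2, 0.1], [0.4, 0.6]]

probs = forward(x, W1, W2, W3)
print("class probabilities:", probs)
print("loss for class 2:", loss(x, 2, W1, W2, W3, weight_decay=1e-3))
```

The factor 0.5 in front of the L2 term is a convention chosen here so that its gradient is simply `weight_decay * w`.
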
- 7. Backpropagation algorithm. • We need a way to fit the model to data: find parameters (W(k), b(k)) of the network that (locally) minimize the loss function. • We can use stochastic gradient descent or, better yet, mini-batch stochastic gradient descent. • To do this, we need the gradient of the loss function with respect to all the parameters of the model (W(k), b(k)). • These gradients can be found using the chain rule of differentiation. • The calculations reveal that the gradient with respect to the parameters in layer k depends only on the error from the layer above and the output from the layer below. • This means that the gradients for each layer can be computed iteratively, starting at the last layer and propagating the error back through the network. This is known as the backpropagation algorithm. Slide credit: Kevin McGuinness
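
The dependence described in the last two bullets can be written out with the chain rule. The following is a sketch, assuming pre-activations $a^{(k)} = W^{(k)} h^{(k-1)} + b^{(k)}$, an elementwise activation $g$, and the error of layer $k$ defined as $\delta^{(k)} = \partial \ell / \partial a^{(k)}$; the symbols $\delta$, $\ell$, $g$ and $\odot$ (elementwise product) are notation introduced here, not taken from the slides:

```latex
\begin{aligned}
\delta^{(k)} &= \Big( \big(W^{(k+1)}\big)^{\top} \delta^{(k+1)} \Big) \odot g'\big(a^{(k)}\big) \\
\frac{\partial \ell}{\partial W^{(k)}} &= \delta^{(k)} \big(h^{(k-1)}\big)^{\top} \\
\frac{\partial \ell}{\partial b^{(k)}} &= \delta^{(k)}
\end{aligned}
```

For a softmax output with a negative log-likelihood loss, the recursion starts from $\delta^{(L+1)} = h^{(L+1)} - e_y$, where $e_y$ is the one-hot vector of the true class $y$.
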
- 8. Backward pass. 1. Find the error in the top layer. [Figure: the same network as in the forward pass, x → a2 → h2 → a3 → h3 → a4 → h4 → Loss L, with weights W1, W2, W3.] Figure credit: Kevin McGuinness
- 9. Backward pass (continued). 1. Find the error in the top layer. 2. Compute weight updates. To simplify, we don't consider the biases. Figure credit: Kevin McGuinness
- 10. Backward pass (continued). 1. Find the error in the top layer. 2. Compute weight updates. 3. Backpropagate the error to the layer below. To simplify, we don't consider the biases. Figure credit: Kevin McGuinness
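
The three backward-pass steps can be sketched as follows for the network in the figure. This is an illustrative implementation under stated assumptions (sigmoid hidden units, softmax output with negative log-likelihood, no biases, no regularization); the helper names and the toy weights are hypothetical. A finite-difference check is included because it is a simple way to confirm that hand-derived gradients are right.

```python
import math

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def matTvec(W, v):
    """Multiply the transpose of W by v."""
    return [sum(W[i][j] * v[i] for i in range(len(W))) for j in range(len(W[0]))]

def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]

def sigmoid(a):
    return [1.0 / (1.0 + math.exp(-z)) for z in a]

def softmax(a):
    m = max(a)
    exps = [math.exp(z - m) for z in a]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, W1, W2, W3):
    h2 = sigmoid(matvec(W1, x))
    h3 = sigmoid(matvec(W2, h2))
    h4 = softmax(matvec(W3, h3))
    return h2, h3, h4

def nll(x, y, W1, W2, W3):
    return -math.log(forward(x, W1, W2, W3)[2][y])

def backward(x, y, W1, W2, W3):
    h2, h3, h4 = forward(x, W1, W2, W3)
    # 1. Error in the top layer: for softmax + NLL it is h4 - onehot(y).
    d4 = [p - (1.0 if k == y else 0.0) for k, p in enumerate(h4)]
    # 2. Weight gradient: error from above times output from below.
    gW3 = outer(d4, h3)
    # 3. Backpropagate the error; the sigmoid derivative is h * (1 - h).
    d3 = [e * h * (1.0 - h) for e, h in zip(matTvec(W3, d4), h3)]
    gW2 = outer(d3, h2)
    d2 = [e * h * (1.0 - h) for e, h in zip(matTvec(W2, d3), h2)]
    gW1 = outer(d2, x)
    return gW1, gW2, gW3

def numeric_grad(x, y, Ws, k, i, j, eps=1e-6):
    """Central finite difference for the single weight Ws[k][i][j]."""
    old = Ws[k][i][j]
    Ws[k][i][j] = old + eps
    up = nll(x, y, *Ws)
    Ws[k][i][j] = old - eps
    down = nll(x, y, *Ws)
    Ws[k][i][j] = old  # restore the original weight
    return (up - down) / (2 * eps)

# Hypothetical toy problem: 2 inputs, hidden layers of size 3 and 2, 3 classes.
x, y = [1.0, -2.0], 2
Ws = [
    [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]],
    [[0.2, -0.1, 0.3], [0.0, 0.5, -0.4]],
    [[0.7, -0.3], [-0.2, 0.1], [0.4, 0.6]],
]
grads = backward(x, y, *Ws)
max_diff = max(
    abs(grads[k][i][j] - numeric_grad(x, y, Ws, k, i, j))
    for k in range(3)
    for i in range(len(Ws[k]))
    for j in range(len(Ws[k][0]))
)
print("largest analytic/numeric mismatch:", max_diff)
```

A weight update would then subtract a small multiple of each gradient from the corresponding weight matrix.
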
- 11. Optimization. Stochastic gradient descent; stochastic gradient descent with momentum; stochastic gradient descent with L2 regularization. The update rules are controlled by a learning rate and a weight decay coefficient. Recommended lectures: Backpropagation: http://cs231n.github.io/optimization-2/; Optimization: http://sebastianruder.com/optimizing-gradient-descent/ (Sebastian Ruder's blog)
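
The three update rules named on slide 11 are shown here in their common textbook form, which is an assumption: the slide's own formulas are not reproduced in this transcript. The function names and the parameters `lr`, `momentum` and `weight_decay` are hypothetical, and the toy objective uses an exact gradient where real training would use a mini-batch estimate.

```python
def sgd_step(w, grad, lr):
    """Plain gradient step: w <- w - lr * grad."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]

def sgd_momentum_step(w, v, grad, lr, momentum=0.9):
    """Velocity form: v <- momentum * v - lr * grad, then w <- w + v."""
    v = [momentum * vi - lr * gi for vi, gi in zip(v, grad)]
    w = [wi + vi for wi, vi in zip(w, v)]
    return w, v

def sgd_l2_step(w, grad, lr, weight_decay):
    """L2 regularization adds weight_decay * w to the gradient,
    which is the gradient of 0.5 * weight_decay * ||w||^2."""
    return [wi - lr * (gi + weight_decay * wi) for wi, gi in zip(w, grad)]

# Toy objective (an assumption for illustration): 0.5 * ||w - target||^2,
# whose gradient is simply w - target.
target = [3.0, -1.0]

def grad_fn(w):
    return [wi - ti for wi, ti in zip(w, target)]

w = [0.0, 0.0]
for _ in range(200):
    w = sgd_step(w, grad_fn(w), lr=0.1)
print("plain SGD:", w)

w_m, v = [0.0, 0.0], [0.0, 0.0]
for _ in range(200):
    w_m, v = sgd_momentum_step(w_m, v, grad_fn(w_m), lr=0.1)
print("SGD with momentum:", w_m)

w_l2 = [0.0, 0.0]
for _ in range(200):
    w_l2 = sgd_l2_step(w_l2, grad_fn(w_l2), lr=0.1, weight_decay=0.5)
print("SGD with L2 regularization:", w_l2)
```

On this objective the L2 variant settles at a point shrunk toward zero (here `target / 1.5`), which is the practical effect of weight decay.
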
- 12. Vanishing gradients. In the backward pass you might be in the flat part of the sigmoid (or of any other saturating activation function, such as tanh), so the derivative tends to zero and your training loss will not go down.
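
A short numerical sketch makes the effect concrete. It evaluates the sigmoid derivative at a few pre-activation values (chosen arbitrarily for illustration) and shows how the per-layer factors multiply across a hypothetical stack of 10 sigmoid layers.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def sigmoid_grad(a):
    """Derivative of the sigmoid: s * (1 - s), at most 0.25 (at a = 0)."""
    s = sigmoid(a)
    return s * (1.0 - s)

# The derivative shrinks quickly as the unit moves into the flat region.
for a in (0.0, 2.0, 5.0, 10.0):
    print(f"a = {a:5.1f}   sigmoid'(a) = {sigmoid_grad(a):.2e}")

# Backpropagation multiplies one such factor per layer, so even in the
# best case (every unit at a = 0) ten layers scale the error by 0.25 ** 10.
best_case = 0.25 ** 10
print(f"best-case factor over 10 layers: {best_case:.2e}")
```

This counts only the activation derivatives; the weight matrices also scale the error, so the actual behavior depends on their magnitudes as well.
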
