Regularization
in
Deep Neural Networks
Dr. Akhter Mohiuddin
Great Lakes Institute of Management
Regularization
• The complexity of a DNN can grow to the point where the training error keeps decreasing but the testing error does not.
• Regularization is a technique that makes slight modifications to the learning algorithm so that the model generalizes better.
• This in turn improves the model’s performance on unseen data as well.
Regularization techniques
• Regularization refers to a set of techniques that lower the complexity of a neural network model during training and thus prevent overfitting.
• The following are the regularization techniques:
1. L1 & L2
2. Dropout
3. Early stopping
Dropout
• Dropout works by causing hidden neurons of the neural network to
be unavailable during part of the training.
• Dropping part of the neural network causes the remaining portion to
be trained to still achieve a good score even without the dropped
neurons.
• This decreases co-adaptation between neurons, which results in less overfitting.
• Dropout layers will periodically drop some of their neurons during
training. You can use dropout layers on regular feedforward neural
networks.
• The following animation shows how dropout works (a small NumPy sketch of the mechanism follows below):
https://yusugomori.com/projects/deep-learning/dropout-relu
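As a rough illustration of the mechanism (a sketch, not the slide’s own code), the lines below apply “inverted” dropout to a layer’s activations with NumPy; the 0.5 rate and the random activation values are made-up examples.

import numpy as np

rng = np.random.default_rng(0)
rate = 0.5                           # assumed example: drop half the neurons
activations = rng.random((4, 6))     # pretend hidden-layer outputs: 4 samples, 6 neurons

# Training time: zero out a random subset of neurons and rescale the rest
# so the expected activation stays the same (inverted dropout).
mask = rng.random(activations.shape) >= rate
dropped = activations * mask / (1.0 - rate)

# Test time: all neurons are used unchanged, as the slides note.
print(dropped)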
Dropout layer
• The discarded neurons and their connections are shown as dashed lines.
• The input layer has two input neurons as well as a bias neuron.
• The second layer is a dense layer with three neurons as well as a bias
neuron.
• The third layer is a dropout layer with six regular neurons even though
the program has dropped 50% of them.
• While the program drops these neurons, it neither calculates nor trains
them. However, the final neural network will use all of these neurons for
the output. As previously mentioned, the program only temporarily
discards the neurons.
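A minimal Keras sketch of a layout like the one described above. In Keras, dropout is written as a layer that masks the outputs of the layer before it, so the figure’s “dropout layer with six neurons” appears here as a Dense layer of six neurons followed by Dropout(0.5); the activations and the single output neuron are assumptions.

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(2,)),             # two input neurons (the bias is handled internally)
    layers.Dense(3, activation="relu"),  # dense layer with three neurons
    layers.Dense(6, activation="relu"),  # the six neurons that dropout acts on
    layers.Dropout(0.5),                 # randomly drops 50% of those outputs, during training only
    layers.Dense(1),                     # assumed output neuron
])
model.summary()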
Dropout is like bootstrapping
• Bootstrapping is one of the most simple ensemble techniques.
• Bootstrapping simply trains a number of neural networks to perform exactly the
same task.
• However, each of these neural networks will perform differently because of randomness in training, such as the random numbers used to initialize the network weights.
• This process decreases overfitting through the consensus of differently trained
neural networks.
• Dropout works somewhat like bootstrapping.
• You might think of each neural network that results from a different set of neurons
being dropped out as an individual member in an ensemble.
• As training progresses, the program creates more neural networks in this way.
• However, dropout does not require the same amount of processing as does
bootstrapping.
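A rough sketch (not from the slides) of the bootstrapping idea described above: the same small network is trained several times so that only the random numbers differ, and the predictions are averaged. The toy data, architecture, and seeds are all assumptions for illustration.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy data standing in for a real task.
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = (X.sum(axis=1) > 2.0).astype("float32")

def make_model(seed):
    # Identical architecture each time; only the random initialization differs.
    keras.utils.set_random_seed(seed)
    m = keras.Sequential([
        keras.Input(shape=(4,)),
        layers.Dense(8, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    m.compile(optimizer="adam", loss="binary_crossentropy")
    return m

# Train several networks and take their consensus (average) prediction.
ensemble = [make_model(seed) for seed in (1, 2, 3)]
for m in ensemble:
    m.fit(X, y, epochs=5, verbose=0)
avg_pred = np.mean([m.predict(X, verbose=0) for m in ensemble], axis=0)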
L1 and L2 Regularization
• The most common type of regularization for deep learning models is
the one that keeps the weights of the network small.
• This type of regularization is called weight regularization and has two
different variations: L2 regularization and L1 regularization.
• In weight regularization, a penalty term is added to the loss function. This term is either the L2 norm (the sum of the squared values) of the weights or the L1 norm (the sum of the absolute values) of the weights.
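A minimal Keras sketch of weight regularization as described above (the 0.01 coefficients and layer sizes are made-up example values): the term added to the loss is 0.01 * (sum of squared weights) for L2 and 0.01 * (sum of absolute weights) for L1.

from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    keras.Input(shape=(20,)),
    # L2 penalty: adds 0.01 * (sum of squared weights) of this layer to the loss.
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(0.01)),
    # L1 penalty: adds 0.01 * (sum of absolute weights) of this layer to the loss.
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l1(0.01)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")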
Early Stopping
• Early stopping is a validation-based strategy where we keep one part of the training set aside as a validation set.
• When the performance on the validation set starts getting worse, we immediately stop training the model. This is known as early stopping.
• In the accompanying figure, we stop training at the dotted line, since after that point the model starts overfitting the training data.
Early stopping in Keras
• In Keras, we can apply early stopping using the EarlyStopping callback. Below is sample code for it.
• Here, monitor denotes the quantity that needs to be monitored, and ‘val_loss’ denotes the validation error.
• patience denotes the number of epochs with no further improvement after which training will be stopped.
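A minimal sketch of such code, with a toy model and data assumed only so the callback has something to run on; the patience value of 3 is also an assumed example.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy data and model (assumptions for illustration).
rng = np.random.default_rng(0)
X = rng.random((200, 8))
y = rng.integers(0, 2, 200).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(8,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# monitor: the quantity watched ('val_loss' = validation error);
# patience: epochs with no further improvement before training stops.
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=3)

model.fit(X, y, validation_split=0.2, epochs=100, callbacks=[early_stop])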
Thanks!
Any questions?
akhter.m@greatlakes.edu.in