2. DEFINE REGULARIZATION
Regularization is a technique that makes slight modifications to the
learning algorithm so that the model generalizes better. This in turn
improves the model’s performance on unseen data.
3. WHAT’S DROPOUT?
In machine learning, “dropout” refers to the practice of
ignoring certain nodes in a layer at random during training.
Dropout is a regularization approach that prevents overfitting by
ensuring that no units become codependent on one another.
This is one of the most interesting types of regularization
techniques. It also produces very good results and is consequently
among the most frequently used regularization techniques in the field
of deep learning.
4. DROPOUT REGULARIZATION
• When you have training data, if you train your model too much, it
might overfit, and when you get the actual test data for making
predictions, it will probably not perform well. Dropout regularization is
one technique used to tackle overfitting problems in deep learning.
• That’s what we are going to look into in this blog. We’ll go over some
theory first, then we’ll write Python code using TensorFlow, and
we’ll see how adding a dropout layer improves the performance of your
neural network.
5. WHAT IS A DROPOUT?
• The term “dropout” refers to dropping out nodes (in the input and hidden
layers) of a neural network (as seen in Figure 1). All the forward and
backward connections of a dropped node are temporarily removed,
thus creating a new network architecture out of the parent network.
Each node is dropped with a dropout probability of p. So what does
dropout do?
• At every iteration, it randomly selects some nodes and removes them,
along with all of their incoming and outgoing connections, leaving a
thinned sub-network (a code sketch follows below).
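As a minimal sketch of how this looks in code, using TensorFlow/Keras (the layer sizes and the drop probability p = 0.5 are illustrative assumptions, not values from the slides), a Dropout layer is simply inserted between the layers whose units should be randomly dropped:

import tensorflow as tf

# A small fully connected network with dropout between its layers.
# Layer sizes and rate=0.5 are illustrative choices.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),  # each unit is dropped with probability p = 0.5
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation="softmax"),
])
# Dropout is active only during training; at inference time all units are kept.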
8. WHY DOES DROPOUT WORK?
• By using dropout, at every iteration you effectively train a smaller,
randomly thinned sub-network of the full network, and this acts as a
form of regularization (a NumPy sketch follows below).
• Dropout also helps shrink the squared norm of the weights, and this
tends to reduce overfitting.
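To make the “smaller network per iteration” idea concrete, here is a sketch of inverted dropout applied by hand to one layer’s activations, in plain NumPy (the shapes and p = 0.5 are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(a, p, training=True):
    # Inverted dropout: zero each activation with probability p and scale
    # the survivors by 1/(1-p) so the expected activation is unchanged.
    if not training:
        return a  # at test time the full network is used
    keep_mask = rng.random(a.shape) >= p  # True for units that survive
    return a * keep_mask / (1.0 - p)

a = rng.standard_normal((4, 8))        # a batch of activations
print(dropout_forward(a, p=0.5))       # roughly half the units are zeroed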
9. • Overfitting is avoided by training with two dropout layers and a dropout probability of 25%
(see the sketch after this list). However, this lowers training accuracy, so the regularized network
must be trained for a longer period.
Leaving out nodes improves model generalization. Although the training accuracy is lower than
that of the unregularized network, the overall validation accuracy improves, which is why the
generalization error decreases.
Why will dropout help with overfitting?
• A unit can’t rely on any one input, as that input might be randomly dropped out.
• Neurons will not learn redundant details of the inputs.
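Below is a sketch of the setup described above: two Dropout layers with a 25% drop probability. The dataset (MNIST) and the layer sizes are assumptions made for illustration:

import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Expect training accuracy somewhat below an unregularized network,
# with validation accuracy holding up better (less overfitting).
history = model.fit(x_train, y_train, epochs=10,
                    validation_data=(x_test, y_test))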
10. THE DRAWBACKS OF DROPOUT
Although dropout is a potent tool, it has certain downsides. A
dropout network may take 2-3 times longer to train than a normal
network. One way to reap the benefits of dropout without slowing down
training is to find a regularizer that is virtually equivalent to a
dropout layer. For linear regression, this regularizer is a modified
variant of L2 regularization. An analogous regularizer for more complex
models has yet to be discovered.
11. 2. MAX-NORM REGULARIZATION
Max-norm regularization is another regularization technique that
constrains the weights of a neural network. The constraint it
imposes on the network is simple: the weight vector associated
with each neuron is forced to have an ℓ2 norm of at most r, where
r is a hyperparameter.
12. If this constraint is not satisfied, the weight vector is replaced by
the unit vector in the same direction, scaled by r. This may be
written as

w ← w · r / ‖w‖₂  whenever ‖w‖₂ > r,

typically applied after each training step.
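A minimal NumPy sketch of this update rule, where each column of W holds one neuron’s incoming weight vector (the shape and r = 2.0 are illustrative assumptions):

import numpy as np

def max_norm_clip(W, r):
    # Rescale each neuron's incoming weight vector (a column of W)
    # so that its l2 norm is at most r.
    norms = np.linalg.norm(W, axis=0, keepdims=True)       # one norm per neuron
    scale = np.minimum(1.0, r / np.maximum(norms, 1e-12))  # shrink only if norm > r
    return W * scale

W = np.random.randn(784, 128)
W = max_norm_clip(W, r=2.0)  # applied after each training step
assert np.all(np.linalg.norm(W, axis=0) <= 2.0 + 1e-9)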
13. Reducing r increases the amount of regularization and helps reduce
overfitting. Max-norm regularization can also help alleviate the
vanishing/exploding gradients problems (if you are not using Batch
Normalization).
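In TensorFlow/Keras, this constraint can be attached to a layer via its kernel_constraint argument; a minimal sketch (the layer sizes and r = 2.0 are illustrative assumptions):

import tensorflow as tf

r = 2.0  # the max-norm hyperparameter; smaller r means stronger regularization
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation="relu",
                          kernel_constraint=tf.keras.constraints.MaxNorm(r)),
    tf.keras.layers.Dense(10, activation="softmax",
                          kernel_constraint=tf.keras.constraints.MaxNorm(r)),
])
# After each training step, Keras rescales any neuron's incoming weight
# vector whose l2 norm exceeds r.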
14. WHY IS MAX-NORM REGULARIZATION AN EFFECTIVE
REGULARIZATION TECHNIQUE?
Throughout the training process it is common for certain weights in the network
to grow particularly large in order to fit specific examples in the training set. The presence
of large weights in a network usually causes the network to produce large variations in
output for small variations in input. As a result of this, large weights usually have an
adverse effect on the generalization capabilities of a network.
Large weights are generally characteristic of an overfitted model.
Max-norm regularization constrains the values of the weights in the network. This
prevents the network from using large weights to fit specific examples in the
training set at the expense of its ability to generalize effectively.
15. Max-norm is a somewhat more aggressive regularization technique
than ℓ1 and ℓ2 regularization in preventing the use of large weights.
Those techniques add a penalty term to the loss function; this penalty
term is a function of the network’s weights (the ℓ1 and ℓ2 norm,
respectively).
Unlike these techniques, max-norm regularization constrains the weights
of the network, providing the guarantee that their magnitude will not
exceed a given threshold value.
ℓ1 and ℓ2 regularization merely discourage the use of large weights, whereas
max-norm regularization prevents the norm of any neuron’s weight vector from
exceeding a given threshold value.
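The contrast shows up directly in how the two are specified in Keras: an ℓ1 or ℓ2 penalty is added to the loss via kernel_regularizer, while max-norm is enforced on the weights via kernel_constraint (the coefficients below are illustrative):

import tensorflow as tf

# l2 penalty: adds 0.01 * sum(w**2) to the loss, so large weights are
# discouraged but not strictly forbidden.
penalized = tf.keras.layers.Dense(
    64, activation="relu",
    kernel_regularizer=tf.keras.regularizers.L2(0.01))

# Max-norm constraint: after each update, any neuron's weight vector whose
# l2 norm exceeds r = 2.0 is rescaled, so the bound is guaranteed to hold.
constrained = tf.keras.layers.Dense(
    64, activation="relu",
    kernel_constraint=tf.keras.constraints.MaxNorm(2.0))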