AVOIDING OVERFITTING THROUGH
REGULARIZATION
UNIT-2 PART-4
DEFINE REGULARIZATION
Regularization is a technique that makes slight modifications to the
learning algorithm so that the model generalizes better. This in turn
improves the model's performance on unseen data.
WHAT’S DROPOUT?
In machine learning, "dropout" refers to the practice of
ignoring certain nodes in a layer at random during training.
Dropout is a regularization approach that prevents overfitting by
ensuring that no units become co-dependent on one another.
It is one of the most interesting regularization techniques. It
also produces very good results and is consequently the most
frequently used regularization technique in the field of deep
learning.
DROPOUT REGULARIZATION
• If you train your model too much on the training data, it might
overfit, and it will probably not perform well when you make
predictions on the actual test data. Dropout regularization is
one technique used to tackle overfitting problems in deep learning.
• That is what we will look at in this section: some theory first,
followed by Python code using TensorFlow showing how adding a
dropout layer improves the performance of a neural network.
WHAT IS A DROPOUT?
• The term "dropout" refers to dropping out nodes (in the input and
hidden layers) of a neural network (as seen in Figure 1). All the
forward and backward connections of a dropped node are temporarily
removed, thus creating a new network architecture out of the parent
network. Each node is dropped with a dropout probability p. So what
does dropout do?
• At every iteration, it randomly selects some nodes and removes them
along with all of their incoming and outgoing connections as shown
below.
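The per-iteration masking described above can be sketched in a few lines of NumPy. This is a minimal illustration of "inverted" dropout, the variant used by most frameworks; the function name and shapes are assumptions for this sketch, not code from the slides:

```python
import numpy as np

def dropout_forward(activations, p, training=True, rng=None):
    """Inverted dropout: drop each unit with probability p during training.

    Surviving activations are scaled by 1/(1-p) so the expected value of
    each unit is unchanged, and no rescaling is needed at test time.
    """
    if not training or p == 0.0:
        return activations
    rng = rng or np.random.default_rng()
    mask = rng.random(activations.shape) >= p   # keep with probability 1-p
    return activations * mask / (1.0 - p)

# Example: a batch of 4 activation vectors, dropout probability 0.5
rng = np.random.default_rng(0)
a = np.ones((4, 8))
out = dropout_forward(a, p=0.5, rng=rng)
# Roughly half the entries are zeroed; the survivors are scaled to 2.0
```

Because a fresh random mask is drawn every iteration, each forward pass effectively trains a different thinned sub-network of the parent network.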
Why does dropout work?
• With dropout, every iteration trains a smaller neural network than
the full one, and this sub-network sampling has a regularizing
effect.
• Dropout helps shrink the squared norm of the weights, which tends
to reduce overfitting.
• Overfitting is avoided by training with two dropout layers and a dropout probability of 25%.
However, this lowers training accuracy, so a regularized network must be trained
over a longer period.
Dropout improves model generalization: although the training accuracy is lower than that of
the unregularized network, the validation accuracy improves, which is why the
generalization error has decreased.
Why does dropout help with overfitting?
• A unit cannot rely on any single input, as that input might be randomly dropped out.
• Neurons will not learn redundant details of the inputs.
THE DRAWBACKS OF DROPOUT
Although dropout is a potent tool, it has certain downsides. A
dropout network may take 2-3 times longer to train than a normal
network. Finding a regularizer virtually comparable to a dropout layer
is one method to reap the benefits of dropout without slowing down
training. This regularizer is a modified variant of L2 regularization for
linear regression. An analogous regularizer for more complex models
has yet to be discovered until that time when doubt drops out.
2. MAX-NORM REGULARIZATION
Max-norm regularization is a regularization technique that
constrains the weights of a neural network. The constraint it
imposes on the network is simple: the weight vector w associated
with each neuron is forced to have an ℓ2 norm of at most r, where
r is the max-norm hyperparameter.
If this constraint is not satisfied, the weight vector is replaced
by the unit vector in the same direction, scaled by r. This may be
written as
w ← w · r / ‖w‖₂ whenever ‖w‖₂ > r
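The rescaling step can be sketched in NumPy. In this sketch (an illustration, not framework code) the weight matrix is laid out as (inputs, neurons), so column j holds neuron j's incoming weight vector:

```python
import numpy as np

def apply_max_norm(W, r):
    """Clip each neuron's incoming weight vector to an L2 norm of at most r.

    W has shape (inputs, neurons): column j is neuron j's weight vector.
    Columns whose norm exceeds r are rescaled back onto the radius-r ball;
    columns already inside the ball are left untouched.
    """
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    scale = np.minimum(1.0, r / np.maximum(norms, 1e-12))
    return W * scale

W = np.array([[3.0, 0.1],
              [4.0, 0.2]])            # neuron 0 has norm 5, neuron 1 has norm ~0.22
W_clipped = apply_max_norm(W, r=1.0)  # neuron 0 rescaled to norm 1, neuron 1 unchanged
```

In practice this projection is applied after each training step (for example via a weight constraint hook in a deep learning framework), so the constraint holds throughout training rather than only at the end.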
Reducing r increases the amount of regularization and helps reduce
overfitting. Max-norm regularization can also help alleviate the
vanishing/exploding gradients problems (if you are not using Batch
Normalization).
WHY IS MAX-NORM REGULARIZATION AN EFFECTIVE
REGULARIZATION TECHNIQUE?
Throughout the training process, it is common for certain weights in the network
to grow particularly large in order to fit specific examples in the training set. The presence
of large weights usually causes the network to produce large variations in
output for small variations in input. As a result, large weights usually have an
adverse effect on the generalization capabilities of a network.
Large weights are generally characteristic of an overfitted model.
Max-norm regularization constrains the values of the weights in the network. This
prevents the network from using large weights to fit specific examples in the
training set at the expense of its ability to generalize.
Max-norm is a somewhat more aggressive regularization technique
than ℓ1 and ℓ2 regularization in preventing the use of large weights.
Those techniques add a penalty term to the loss function; the penalty
is a function of the network's weights (the ℓ1 and ℓ2 norm,
respectively).
Unlike these techniques, max-norm regularization directly constrains
the weights of the network, guaranteeing that their magnitude will
not exceed a given threshold value.
In short, ℓ1 and ℓ2 regularization discourage the use of large weights,
whereas max-norm regularization prevents the magnitude of any weight
vector from exceeding a given threshold value.
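This difference between a soft penalty and a hard constraint can be seen in a toy experiment. The sketch below (the learning rate, decay strength, and stand-in random gradients are assumptions made for illustration) takes a few gradient steps on the same weight vector with ℓ2 weight decay versus a max-norm projection:

```python
import numpy as np

rng = np.random.default_rng(1)
w_l2 = rng.normal(size=5) * 10   # deliberately large starting weights
w_mn = w_l2.copy()               # same start for both schemes
r, lam, lr = 1.0, 0.01, 0.1      # max-norm radius, decay strength, step size

for _ in range(3):
    grad = rng.normal(size=5)    # stand-in for a task gradient
    # l2 weight decay: shrink toward zero, but with no hard bound
    w_l2 -= lr * (grad + lam * w_l2)
    # max-norm: take the same kind of step, then project onto the ball
    w_mn -= lr * grad
    norm = np.linalg.norm(w_mn)
    if norm > r:
        w_mn *= r / norm

# Only the max-norm weights are guaranteed to satisfy ||w|| <= r;
# the l2-decayed weights merely shrink gradually toward zero.
```

After just a few steps the max-norm vector already satisfies the bound, while the ℓ2-decayed vector, starting from large weights, is still far outside it.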
REFERENCES
https://www.analyticsvidhya.com/blog/2022/08/dropout-regularization-in-deep-learning/
https://machinelearningjourney.com/index.php/2021/01/15/max-norm-regularization/
