AVOIDING OVERFITTING THROUGH
REGULARIZATION
UNIT-2 PART-4
DEFINE REGULARIZATION
Regularization is a technique which makes slight modifications to the
learning algorithm so that the model generalizes better. This in turn
improves the model's performance on unseen data.
WHAT’S DROPOUT?
In machine learning, “dropout” refers to the practice of
ignoring certain nodes in a layer at random during training.
Dropout is a regularization approach that prevents overfitting by
ensuring that no units are codependent on one another.
This is one of the most interesting types of regularization
techniques. It also produces very good results and is consequently
among the most frequently used regularization techniques in the field
of deep learning.
DROPOUT REGULARIZATION
• When you have training data, if you train your model too much, it
might overfit, and when you get the actual test data for making
predictions, it will probably not perform well. Dropout regularization is
one technique used to tackle overfitting problems in deep learning.
• That's what we are going to look into here: we'll go over some
theory first, then write Python code using TensorFlow to see how
adding a dropout layer can improve the performance of your
neural network, as in the sketch below.
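A minimal TensorFlow/Keras version (the layer sizes, the 20-feature input, and the 0.5 dropout rate are illustrative assumptions, not values from these slides):

```python
import tensorflow as tf

# A small binary classifier with a dropout layer after each hidden layer.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),  # drops 50% of this layer's units each training step
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

Note that Keras applies dropout only during training; at inference time the layer is a no-op, so no manual rescaling is needed.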
WHAT IS A DROPOUT?
• The term “dropout” refers to dropping out nodes (in the input and hidden
layers) of a neural network (as seen in Figure 1). All the forward and
backward connections of a dropped node are temporarily removed,
thus creating a new network architecture out of the parent network.
Each node is dropped with a dropout probability p. So what does
dropout do?
• At every iteration, it randomly selects some nodes and removes them
along with all of their incoming and outgoing connections, as sketched
in the code below.
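This per-iteration removal can be written out directly in NumPy. The sketch below uses “inverted” dropout, where surviving activations are scaled by 1/(1 - p) so that their expected value is unchanged (the batch size and layer width are arbitrary):

```python
import numpy as np

def dropout_forward(activations, p, training=True):
    """Drop each unit with probability p during training (inverted dropout)."""
    if not training:
        return activations  # dropout is disabled at test time
    keep_prob = 1.0 - p
    mask = np.random.rand(*activations.shape) < keep_prob  # 1 = keep, 0 = drop
    # Scale survivors by 1/keep_prob so the expected activation is unchanged.
    return activations * mask / keep_prob

h = np.random.randn(4, 8)              # activations of one hidden layer, batch of 4
h_dropped = dropout_forward(h, p=0.25) # ~25% of entries zeroed, rest scaled up
```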
Why does dropout work?
• By using dropout, in every iteration you effectively train a smaller neural
network than the full one, and this by itself has a regularizing effect.
• Dropout also helps shrink the squared norm of the weights, which tends
to reduce overfitting.
• Overfitting can be avoided by training with two dropout layers and a dropout probability
of 25%. However, this lowers training accuracy, so the regularized network has to be
trained over a longer period.
Dropout improves model generalization: although the training accuracy is lower than that
of the unregularized network, the validation accuracy improves. This is why the
generalization error decreases.
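The comparison can be sketched on synthetic data (the data, layer sizes, and epoch count are illustrative assumptions; only the 25% rate and the two dropout layers come from the slides):

```python
import numpy as np
import tensorflow as tf

# Synthetic binary-classification data, purely for illustration.
X = np.random.randn(1000, 20).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")

def build(use_dropout):
    layers = [tf.keras.Input(shape=(20,)),
              tf.keras.layers.Dense(128, activation="relu")]
    if use_dropout:
        layers.append(tf.keras.layers.Dropout(0.25))  # first dropout layer
    layers.append(tf.keras.layers.Dense(64, activation="relu"))
    if use_dropout:
        layers.append(tf.keras.layers.Dropout(0.25))  # second dropout layer
    layers.append(tf.keras.layers.Dense(1, activation="sigmoid"))
    model = tf.keras.Sequential(layers)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# The regularized network typically shows lower training accuracy but
# equal or better validation accuracy, and may need more epochs.
for use_dropout in (False, True):
    hist = build(use_dropout).fit(X, y, epochs=20,
                                  validation_split=0.2, verbose=0)
    print(f"dropout={use_dropout}: "
          f"train acc {hist.history['accuracy'][-1]:.3f}, "
          f"val acc {hist.history['val_accuracy'][-1]:.3f}")
```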
Why will dropout help with overfitting?
• A neuron cannot rely on any one input, as that input might be randomly dropped out.
• Neurons will therefore not learn redundant details of the inputs.
THE DRAWBACKS OF DROPOUT
Although dropout is a potent tool, it has certain downsides. A
dropout network may take 2-3 times longer to train than a normal
network. One way to reap the benefits of dropout without slowing down
training is to find a regularizer that is virtually equivalent to a dropout
layer. For linear regression, such a regularizer is a modified variant of
L2 regularization; an analogous regularizer for more complex models
has yet to be discovered.
2. MAX-NORM REGULARIZATION
Max-norm regularization is a regularization technique that
constrains the weights of a neural network. The constraint
imposed on the network is simple: the vector w of incoming
weights of each neuron is forced to have an ℓ2 norm of at most r,
where r is the max-norm hyperparameter.
If this constraint is not satisfied after a training step, the weight
vector is replaced by the unit vector in the same direction, scaled
by r. This may be written as
w ← r · w / ∥w∥₂  whenever ∥w∥₂ > r.
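In code, this clipping step is just a couple of lines (a NumPy sketch; the function name is ours):

```python
import numpy as np

def max_norm_project(w, r):
    """Rescale w to have L2 norm at most r (the max-norm constraint)."""
    norm = np.linalg.norm(w)
    if norm > r:
        w = w * (r / norm)  # unit vector in the same direction, scaled by r
    return w
```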
Reducing r increases the amount of regularization and helps reduce
overfitting. Max-norm regularization can also help alleviate the
vanishing/exploding gradients problems (if you are not using Batch
Normalization).
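In Keras, the constraint is applied per layer through kernel_constraint; after each gradient update, any incoming weight vector whose norm exceeds r is rescaled back to norm r (the value r = 2.0 below is an illustrative choice):

```python
import tensorflow as tf

# A Dense layer whose incoming weight vectors are kept at L2 norm <= 2.0.
layer = tf.keras.layers.Dense(
    64,
    activation="relu",
    kernel_constraint=tf.keras.constraints.MaxNorm(max_value=2.0),
)
```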
WHY IS MAX-NORM REGULARIZATION AN EFFECTIVE
REGULARIZATION TECHNIQUE?
Throughout the training process it is common for certain weights in the network
to grow particularly large in order to fit specific examples in the training set. The presence
of large weights in a network usually causes the network to produce large variations in
output for small variations in input. As a result of this, large weights usually have an
adverse effect on the generalization capabilities of a network.
Large weights are generally characteristic of an overfitted model.
Max-norm regularization constrains the values of the weights in the network. This
prevents the network from using large weights to fit specific examples in the training
set at the expense of its ability to generalize effectively.
Max-norm is a somewhat more aggressive regularization technique
than ℓ1 and ℓ2 regularization in preventing the use of large weights.
These techniques add a penalty term to the loss function; the penalty
is a function of the network's weights (their ℓ1 or ℓ2 norm,
respectively).
Unlike these techniques, max-norm regularization constrains the weights
of the network, providing the guarantee that their magnitude will not
exceed a given threshold value.
ℓ1 and ℓ2 regularization discourage the use of large weights, whereas
max-norm regularization prevents the magnitude of any weight from
exceeding a given threshold value.
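The difference shows up directly in how each technique is attached to a layer: an ℓ2 penalty is added to the loss, while a max-norm constraint rescales the weights after every update (a Keras sketch; the 0.01 penalty weight and r = 2.0 are illustrative):

```python
import tensorflow as tf

# l2 penalty: adds 0.01 * ||w||^2 to the loss. Large weights are
# discouraged but not strictly bounded.
penalized = tf.keras.layers.Dense(
    64, activation="relu",
    kernel_regularizer=tf.keras.regularizers.L2(0.01),
)

# Max-norm: weights are rescaled after each update so that no incoming
# weight vector exceeds norm r -- a hard guarantee rather than a penalty.
constrained = tf.keras.layers.Dense(
    64, activation="relu",
    kernel_constraint=tf.keras.constraints.MaxNorm(max_value=2.0),
)
```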