Regularization in Machine Learning
Over time, we have come across various articles discussing the merits, demerits,
and applications of numerous algorithms in the domain of Machine Learning. The
striking feature of all these articles was that each one spoke about the
importance of avoiding overfitting. You might have encountered overfitting
yourself when your Machine Learning model performed exceptionally well on the
training data.
However, it lagged a great deal when you tested it with unseen data sets. These
situations degrade the overall performance of your model, leading to inaccuracy and
decreased efficiency. In an attempt to avoid overfitting, data scientists make use of a
technique called Regularization in machine learning.
In this PythonGeeks article, we will discuss this technique known as regularization.
We will walk through the various aspects of regularization: why we need it, how it
works, and the types of regularization. We will briefly observe the difference
between the two main types, and towards the end we will discuss what
regularization actually achieves. So, without further delay, let's dive straight
into the article.
Why Regularization?
We may encounter situations where our Machine Learning model operates well on the
training data but does not make the mark when we use it on unseen or test data.
This indicates that the model is not able to serve its purpose of predicting the
output or target column for unseen data, because it has picked up noise during
training. This makes the model highly inaccurate, and we call such models
"overfitted" models.
Let us try to understand what exactly we mean by the term noise in this context.
Noise refers to those data points in the dataset that don't really represent the
true properties of our data but exist in the dataset only due to random chance.
To put it simply, noise consists of the redundant data points of the dataset which
do not contribute meaningful signal and only produce errors in the final output.
So, to overcome the problem of overfitting, or in other words to avoid fitting the
noise, we take the help of regularization techniques while building any Machine
Learning model.
What is Regularization?
Regularization is one of the most crucial concepts of machine learning. To put it
simply, it is a technique to prevent a machine learning model from overfitting by
adding extra information to the model in the form of a penalty on its complexity.
We have already seen that the overfitting problem occurs when the machine learning
model performs well with the training data but is not able to produce the expected
output with the test data. It means the model cannot accurately predict the output
for unseen or test data because it has fit the noise in the training data, and
hence we say the model is overfitted. We can avoid this problem with the help of
regularization techniques and get more accurate results.
This technique mainly focuses on keeping all the crucial variables or features in
the model while reducing the magnitude of their coefficients to dampen the effect
of noise. Hence, it is able to maintain accuracy along with a generalization of the
model and avoid overfitting.

The technique tends to regularize, or shrink, the coefficients of features towards
zero. To put it more plainly, in regularization we try to reduce the magnitude of
the features' coefficients while keeping the same number of features, in an
attempt to avoid overfitting.
What is the Regularization Parameter in Machine Learning?
For linear regression, the regularized loss function consists of two main
components, namely, the Ordinary Least Squares (OLS) loss and the penalty term.
We can define the loss function of the regression in regularization by:
Loss function (regularization) = loss function (OLS) + penalty.
In the above equation, we can find the penalty using:

Penalty = λ * w

Here, λ is the regularization parameter, and w stands for the weights of the model
that the penalty acts on (in practice, the sum of their absolute values or of
their squares, depending on the technique).
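As a rough illustration, here is a minimal sketch of this penalized loss, written in NumPy with made-up weights, data, and λ (all the numbers are assumptions chosen only for demonstration):

```python
import numpy as np

# Hypothetical fitted weights and a tiny dataset (illustrative values only).
w = np.array([3.0, -2.0, 0.5])           # model weights
X = np.array([[1.0, 2.0, 0.0],
              [0.5, 1.0, 1.0],
              [2.0, 0.0, 3.0]])           # features
y = np.array([1.0, 2.0, 3.0])             # targets
lam = 0.1                                  # regularization parameter λ

ols_loss = np.sum((y - X @ w) ** 2)        # OLS loss (residual sum of squares)
penalty = lam * np.sum(np.abs(w))          # L1 penalty: λ * Σ|w| (use w**2 for L2)
regularized_loss = ols_loss + penalty
print(regularized_loss)
```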
The regularization parameter in machine learning is λ and has
the following features:
■ It controls the strength of the penalty term: the larger the value of λ,
the heavier the penalty imposed on large coefficients in the linear
regression.
■ It is a tuning parameter that controls the bias-variance trade-off.
■ λ can take values ranging from 0 to infinity.
■ If λ = 0, there is no difference between a model with and without
regularization; comparing the two therefore helps to find out whether the
model is overfitted or not.
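To see the effect of λ directly, here is a minimal sketch using scikit-learn's Ridge estimator (the synthetic data and the particular λ values are assumptions for illustration; note that scikit-learn calls the parameter alpha rather than λ):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first three features actually matter in this synthetic data.
y = X @ np.array([4.0, -3.0, 2.0, 0.0, 0.0]) + rng.normal(scale=0.5, size=100)

# Sweep the regularization parameter; alpha is scikit-learn's name for λ.
for alpha in [0.0, 1.0, 10.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    print(f"λ = {alpha:6.1f}  coefficients = {np.round(model.coef_, 2)}")
```

As λ grows, the printed coefficients shrink towards zero, exactly as the points above describe.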
How does Regularization work?
As we have seen in the above sections, Regularization tends to serve its purpose by
adding a penalty or complexity term to the complex model in an attempt to avoid
overfitting. Let us try to understand this by considering the simple linear regression
equation:
y = β1x1 + β2x2 + β3x3 + ⋯ + βnxn + b

In the above equation, y represents the value that the model needs to predict as
the output.

x1, x2, …, xn represent the features used to predict y.

β1, β2, …, βn represent the weights or coefficients associated with the respective
features, and b represents the intercept of the linear regression equation.
The main focus of linear regression is to optimize the weights β1, …, βn and the
intercept b so as to minimize the cost function of the model. For plain linear
regression, this cost function is the loss known as RSS, or Residual Sum of
Squares, which we can write as:

RSS = Σ (yi − (b + β1xi1 + β2xi2 + ⋯ + βnxin))²

where the sum runs over all the samples i in the dataset. Regularization then adds
a penalty term to this loss function, and we optimize the parameters against the
penalized loss in an attempt to build a model that predicts the value of y
accurately.
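As a quick, self-contained illustration (the data here are fabricated purely for demonstration), we can fit a plain linear model with scikit-learn and compute its RSS:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(50, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.3, size=50)

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)
rss = np.sum(residuals ** 2)        # residual sum of squares
print(f"RSS on the training data: {rss:.3f}")
```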
Regularization Techniques in Machine Learning
We can divide regularization into two main techniques, namely, Lasso Regression
and Ridge Regression.
1. Lasso Regression
Lasso regression is a type of regularization technique that aims to reduce the
complexity of an overfitted model. LASSO is the acronym for Least Absolute
Shrinkage and Selection Operator. It works in a similar fashion to Ridge
Regression, except that the penalty term in Lasso regression contains the absolute
values of the weights instead of their squares.

Because it takes absolute values, it is able to shrink a coefficient exactly to 0,
whereas Ridge Regression can only shrink it to a value near 0. It is also known by
the term L1 regularization. We can represent the cost function of Lasso regression
by:

Cost (Lasso) = RSS + λ * (|β1| + |β2| + ⋯ + |βn|)
Because some coefficients are driven exactly to zero, the corresponding features
are effectively dropped from the model. Due to this, Lasso regression helps us to
reduce overfitting and additionally gives us the benefit of feature selection.
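Here is a minimal sketch of that feature-selection effect, assuming scikit-learn is available (the synthetic data and the value of λ are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
# Only the first two of the six features actually influence y.
y = 5.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)    # alpha plays the role of λ
print(np.round(lasso.coef_, 3))        # irrelevant features come out exactly 0
```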
Limitations of Lasso Regression
a. Problematic to handle some types of datasets: If the number of predictors in a
given dataset is greater than the number of data points n, the Lasso regression
technique will select at most n predictors as non-zero, even if all the predictors
are relevant, and is hence inefficient in handling such datasets.
b. Multicollinearity problem: If two or more highly collinear variables exist in
the dataset, LASSO regression chooses one of them essentially at random, which is
not good for the interpretation of our model, since we would like the model to
retain the most appropriate variables for accurate results.
2. Ridge Regression
Ridge regression is a type of linear regression in which we introduce a small
amount of bias so that we can get better long-term prediction results from the
model. It is a regularization technique which we use to reduce the complexity of
the model in an attempt to avoid overfitting. We also know this regularization
technique by the name of L2 regularization.
In this technique, we alter the cost function by adding a penalty term to it. The
amount of bias added to the model is called the Ridge Regression penalty of that
particular model. We can calculate the Ridge Regression penalty by multiplying
lambda by the sum of the squared weights of the individual features present in the
dataset. We can represent the cost function of ridge regression by:

Cost (Ridge) = RSS + λ * (β1² + β2² + ⋯ + βn²)
In the above equation, the penalty term regularizes the coefficients of the model,
which in turn helps ridge regression reduce the magnitudes of the coefficients and
thereby decrease the complexity of the overfitted model.

As we can see from the above equation, as λ tends towards zero, the penalty term
vanishes and the equation reduces to the cost function of the plain linear
regression model. Hence, for very small values of λ, the model will resemble the
ordinary linear regression model.
As we already know, linear or polynomial regression generally fails if there is
high collinearity between the independent variables of the dataset. To solve such
problems, we can make use of Ridge regression. It also helps us when our dataset
comprises more parameters than the samples present in the dataset.
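A brief sketch of ridge regression stabilizing nearly collinear features (the data are fabricated for illustration; with plain OLS the two coefficients can blow up in opposite directions, while ridge keeps them small and stable):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)     # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.5, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
print("OLS coefficients:  ", np.round(ols.coef_, 2))    # unstable estimates
print("Ridge coefficients:", np.round(ridge.coef_, 2))  # shrunk, stable
```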
Usage of Ridge Regression
a. When we encounter a dataset whose independent variables have high collinearity
(the problem of multicollinearity) between them, general linear or polynomial
regression will fail to tackle such problems. We can make use of Ridge regression
to keep the model free of overfitting.
b. If we have more parameters associated with the data points than the samples
present in the dataset, then Ridge regression helps the model to solve the problems
of overfitting.
Limitations of Ridge Regression
a. Does not help in feature selection: Though ridge regression decreases the
complexity of a model, it does not reduce the number of independent variables.
This happens because it never drives a coefficient to exactly zero; rather, it
only shrinks the coefficients of the variables. Hence, we cannot make use of this
technique when feature selection is the goal.
b. Model interpretability: The major disadvantage of this technique is reduced
model interpretability, since it will shrink the coefficients of the least
important predictors very close to zero but will never make them exactly zero. In
simple words, the final model will include all the independent variables, also
known as predictors, and thus the final output will also carry noise to some
extent.
Key differences between Ridge and Lasso Regularization
a. We mostly make use of Ridge regression to reduce overfitting in the model. It
keeps all the features present in the model and reduces its complexity only by
shrinking the coefficients, unlike Lasso regression.
b. Lasso regression helps to reduce overfitting in the model and additionally
offers the benefit of feature selection, since it can drop features entirely. A
short comparison follows below.
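The sketch below (again with made-up data and λ values) contrasts the two behaviours directly: both shrink coefficients, but only Lasso zeroes some of them out:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 8))
y = 4.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=300)

ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=0.3).fit(X, y)
print("Ridge coefficients set to zero:", int(np.sum(ridge.coef_ == 0)))  # typically 0
print("Lasso coefficients set to zero:", int(np.sum(lasso.coef_ == 0)))  # several
```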
What does Regularization in Machine Learning achieve?
As we have studied earlier, a standard least-squares model tends to have some
variance present in it. It indicates that this model won’t be able to generalize well for
a data set that is different from its training data.
Regularization significantly reduces the variance of the model by shrinking its
coefficients, without a substantial increase in its bias. The tuning parameter λ
that we use in the regularization techniques described above controls this
trade-off between bias and variance.
As the value of λ rises, the values of the coefficients are reduced, which in turn
reduces the variance. Up to a point, this increase in λ proves beneficial for the
model, as it only reduces the variance (avoiding overfitting) without losing any
important properties of the data. However, beyond a certain value, the model
starts losing important properties, giving rise to bias in the model and thus
producing an underfitted model. Therefore, we need to carefully select the value
of λ for regularization to properly serve its purpose.
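In practice, this selection of λ is usually done by cross-validation. Here is a hedged sketch using scikit-learn's RidgeCV and LassoCV helpers (the candidate grid of λ values is an arbitrary assumption):

```python
import numpy as np
from sklearn.linear_model import RidgeCV, LassoCV

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)

alphas = np.logspace(-3, 3, 13)                 # candidate λ values
ridge = RidgeCV(alphas=alphas).fit(X, y)        # leave-one-out CV by default
lasso = LassoCV(alphas=alphas, cv=5).fit(X, y)  # 5-fold CV
print("Best λ for Ridge:", ridge.alpha_)
print("Best λ for Lasso:", lasso.alpha_)
```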
Conclusion
With this, we have reached the end of this article which talked about the basics of
regularization in machine learning. We came across the need for regularization as
well as the working of the regularization techniques. We also came across the various
regularization techniques, their limitations as well as their uses in machine learning.
Hope that this article from PythonGeeks was able to solve all your queries regarding
Regularization and its need.