L2 and L1 Regularization / Ridge and Lasso Regression
Subject: Machine Learning
Dr. Varun Kumar
Outline
1. Problem of Overfitting
2. Ridge Regression
   - Main idea behind Ridge regression
   - Working of Ridge regression
   - How it can solve the unsolvable?
3. Lasso Regression
4. References
Problem of Overfitting
In a real-world scenario, the model should perform well on unlabeled or
unknown data.
An overfitted model works well only on the training data.
Increasing the model complexity improves performance on the test data at
first, but the model underperforms when the complexity becomes very high.
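This trade-off can be reproduced numerically. The sketch below is not from the slides; the data-generating function, noise level, and polynomial degrees are illustrative assumptions. It fits polynomials of increasing degree and compares training and test MSE:

```python
# Minimal overfitting demo: as the polynomial degree grows, training MSE
# falls toward zero while test MSE eventually rises.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)                 # assumed true function
x_train = np.linspace(0, 1, 15)
x_test = np.linspace(0, 1, 100)
y_train = f(x_train) + rng.normal(0, 0.2, x_train.size)
y_test = f(x_test) + rng.normal(0, 0.2, x_test.size)

for degree in (1, 3, 10):
    coeffs = np.polyfit(x_train, y_train, degree)   # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```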
Ridge regression (L2 regularization)
Main idea
⇒ In a regression problem, we find the line that gives the minimum mean
square error (MSE) over the residuals.
⇒ By slightly changing the weight (slope) of that line, the MSE can also be
reduced for the test data.
Working of Ridge regression
⇒ Red dots are the training points.
⇒ Green dots are the testing points.
⇒ The residual error is zero for the training fit but not for the testing data.
⇒ The given red line (training fit) has zero bias but high variance.
Continued–
⇒ We introduce a small bias so that the MSE for the test data can be
minimized.
⇒ From the figure above:
size = y-axis intercept + slope × weight
⇒ From Ridge regression:
MSE = (1/m) · Σ_{i=n+1}^{n+m} (y_i − f̂(x_i))² + λ × Θ²
(a) Θ → Slope of the straight line.
(b) λ → Penalty factor, 0 ≤ λ < ∞
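The cost above is easy to evaluate directly. Below is a minimal sketch of it (the function and variable names are my own; following the slide, the sum runs over the m test points and the fitted line is size = intercept + Θ × weight):

```python
# Ridge cost from the slide: mean squared residual on the test points
# plus the penalty lambda * theta^2.
import numpy as np

def ridge_cost(theta, intercept, x_test, y_test, lam):
    residuals = y_test - (intercept + theta * x_test)   # y_i - f_hat(x_i)
    return np.mean(residuals ** 2) + lam * theta ** 2

# Toy usage with made-up weight/size numbers:
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.2, 2.1, 3.3])
print(ridge_cost(theta=1.05, intercept=0.0, x_test=x, y_test=y, lam=1.0))
```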
Example–
♦ Case 1 (least squares): Let λ = 1, Θ = 1.3 ⇒ MSE = 0 + λΘ² = 0 + 1.69 = 1.69
→ High variance
♦ Case 2 (Ridge regression): Let λ = 1, Θ = 0.8, e1 = 0.3, e2 = 0.1 ⇒
MSE = 0.3² + 0.1² + λΘ² = 0.74 → Low variance
♦ The Ridge regression line is less sensitive to the weight than the
least-squares line under the given penalty.
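The arithmetic of both cases checks out numerically (note that, as on the slide, the residuals here are summed rather than averaged over m):

```python
lam = 1.0

# Case 1: least-squares line, zero residuals on the training points.
mse_case1 = 0.0 + lam * 1.3 ** 2

# Case 2: Ridge line with slope 0.8 and residuals e1 = 0.3, e2 = 0.1.
mse_case2 = 0.3 ** 2 + 0.1 ** 2 + lam * 0.8 ** 2

print(f"Case 1 MSE = {mse_case1:.2f}")   # 1.69
print(f"Case 2 MSE = {mse_case2:.2f}")   # 0.74
```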
Penalty factor λ
⇒ MSE = (1/m) · Σ_{i=n+1}^{n+m} (y_i − f̂(x_i))² + λ × Θ² → Ridge regression
⇒ The higher the slope (Θ), the more sensitive the size is to the weight, as in
the figure above.
⇒ Penalty = λ × Θ²
If λ = 0, the penalty is 0 and Ridge regression reduces to least squares.
If λ₁ = 1 and λ₂ = 2, then for the same penalty value, Θ₂ < Θ₁.
If λ → ∞, the slope → 0.
⇒ For a very large λ, the size becomes insensitive to the weight.
Note: The discussion above used Ridge regression for a continuous variable
(size vs. weight). It is also applicable to discrete variables.
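The shrinking effect of λ on the slope can be seen with scikit-learn's Ridge, whose alpha parameter plays the role of λ here; the toy weight/size data below is an assumption for illustration:

```python
# As lambda grows, the fitted slope is driven toward zero.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
weight = rng.uniform(0, 10, 30).reshape(-1, 1)          # feature: weight
size = 0.9 * weight.ravel() + rng.normal(0, 1.0, 30)    # target: size

for lam in (0.0, 1.0, 10.0, 1000.0):
    slope = Ridge(alpha=lam).fit(weight, size).coef_[0]
    print(f"lambda = {lam:6.1f} -> slope = {slope:.4f}")
```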
Lasso regression (L1 regularization)
It is similar to Ridge regression, but there are some differences.
From Lasso regression:
MSE = (1/m) · Σ_{i=n+1}^{n+m} (y_i − f̂(x_i))² + λ × |Θ|
(a) Θ → Slope of the straight line.
(b) λ → Penalty factor, 0 ≤ λ < ∞
Difference between Ridge and Lasso regression
Lasso regression can shrink the coefficient of a useless variable exactly to
zero, excluding it from the equation.
Ridge regression only shrinks coefficients toward zero, so every variable,
including the more important ones, remains in the equation.
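A minimal sketch of this difference using scikit-learn; the data (where only the first feature matters) and the penalty strengths are illustrative assumptions:

```python
# With a useless second feature, Lasso typically zeroes its coefficient
# exactly, while Ridge only shrinks it toward zero.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))                    # feature 1 is useless
y = 3.0 * X[:, 0] + rng.normal(0, 0.5, 100)      # only feature 0 matters

ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)
print("Ridge coefficients:", ridge.coef_)        # both nonzero
print("Lasso coefficients:", lasso.coef_)        # second typically exactly 0.0
```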
References
E. Alpaydin, Introduction to Machine Learning. MIT Press, 2020.
T. M. Mitchell, The Discipline of Machine Learning. Carnegie Mellon University,
School of Computer Science, Machine Learning Department, 2006, vol. 9.
J. Grus, Data Science from Scratch: First Principles with Python. O'Reilly
Media, 2019.