L2 and L1 Regularization / Ridge and Lasso Regression
Subject: Machine Learning
Dr. Varun Kumar
Outline
1. Problem of Overfitting
2. Ridge Regression
   - Main idea behind Ridge regression
   - Working of Ridge regression
   - How it can solve the unsolvable?
3. Lasso Regression
4. References
Problem of Overfitting
In a real-world scenario, the model should perform well on unlabeled or
unknown data.
An overfitted model works well only on the training data.
Increasing the model complexity improves performance on the test data at
first, but the model underperforms when the complexity becomes very high.
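This trade-off can be reproduced numerically. The sketch below is not from the slides; the data-generating function, noise level, and polynomial degrees are illustrative assumptions. It fits polynomials of increasing degree and compares training and test MSE:

```python
# Minimal overfitting demo: as the polynomial degree grows, training MSE
# falls toward zero while test MSE eventually rises.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)                 # assumed true function
x_train = np.linspace(0, 1, 15)
x_test = np.linspace(0, 1, 100)
y_train = f(x_train) + rng.normal(0, 0.2, x_train.size)
y_test = f(x_test) + rng.normal(0, 0.2, x_test.size)

for degree in (1, 3, 10):
    coeffs = np.polyfit(x_train, y_train, degree)   # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```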
Ridge regression (L2 regularization)
Main idea
⇒ In a regression problem, we find the line that gives the minimum mean
square error (MSE) over the residuals.
⇒ By slightly changing the weight (slope) of that line, the MSE can also be
reduced for the test data.
Working of Ridge regression
⇒ Red dots are the training points.
⇒ Green dots are the testing points.
⇒ The residual error is zero for the training fit but not for the testing data.
⇒ The given red line (training fit) has zero bias but high variance.
Continued–
⇒ We introduce a small bias so that the MSE for the test data can be
minimized.
⇒ From the figure above:
size = y-axis intercept + slope × weight
⇒ From Ridge regression:
MSE = (1/m) · Σ_{i=n+1}^{n+m} (y_i − f̂(x_i))² + λ × Θ²
(a) Θ → Slope of the straight line.
(b) λ → Penalty factor, 0 ≤ λ < ∞
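The cost above is easy to evaluate directly. Below is a minimal sketch of it (the function and variable names are my own; following the slide, the sum runs over the m test points and the fitted line is size = intercept + Θ × weight):

```python
# Ridge cost from the slide: mean squared residual on the test points
# plus the penalty lambda * theta^2.
import numpy as np

def ridge_cost(theta, intercept, x_test, y_test, lam):
    residuals = y_test - (intercept + theta * x_test)   # y_i - f_hat(x_i)
    return np.mean(residuals ** 2) + lam * theta ** 2

# Toy usage with made-up weight/size numbers:
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.2, 2.1, 3.3])
print(ridge_cost(theta=1.05, intercept=0.0, x_test=x, y_test=y, lam=1.0))
```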
Example–
♦ Case 1 (least squares): Let λ = 1, Θ = 1.3 ⇒ MSE = 0 + λΘ² = 0 + 1.69 = 1.69
→ High variance
♦ Case 2 (Ridge regression): Let λ = 1, Θ = 0.8, e1 = 0.3, e2 = 0.1 ⇒
MSE = 0.3² + 0.1² + λΘ² = 0.74 → Low variance
♦ The Ridge regression line is less sensitive to the weight than the
least-squares line under the given penalty.
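The arithmetic of both cases checks out numerically (note that, as on the slide, the residuals here are summed rather than averaged over m):

```python
lam = 1.0

# Case 1: least-squares line, zero residuals on the training points.
mse_case1 = 0.0 + lam * 1.3 ** 2

# Case 2: Ridge line with slope 0.8 and residuals e1 = 0.3, e2 = 0.1.
mse_case2 = 0.3 ** 2 + 0.1 ** 2 + lam * 0.8 ** 2

print(f"Case 1 MSE = {mse_case1:.2f}")   # 1.69
print(f"Case 2 MSE = {mse_case2:.2f}")   # 0.74
```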
Penalty factor λ
⇒ MSE = (1/m) · Σ_{i=n+1}^{n+m} (y_i − f̂(x_i))² + λ × Θ² → Ridge regression
⇒ The higher the slope (Θ), the more sensitive the size is to the weight, as in
the figure above.
⇒ Penalty = λ × Θ²
If λ = 0, the penalty is 0 and Ridge regression reduces to least squares.
If λ₁ = 1 and λ₂ = 2, then for the same penalty value, Θ₂ < Θ₁.
If λ → ∞, the slope → 0.
⇒ For a very large λ, the size becomes insensitive to the weight.
Note: The discussion above used Ridge regression for a continuous variable
(size vs. weight). It is also applicable to discrete variables.
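The shrinking effect of λ on the slope can be seen with scikit-learn's Ridge, whose alpha parameter plays the role of λ here; the toy weight/size data below is an assumption for illustration:

```python
# As lambda grows, the fitted slope is driven toward zero.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
weight = rng.uniform(0, 10, 30).reshape(-1, 1)          # feature: weight
size = 0.9 * weight.ravel() + rng.normal(0, 1.0, 30)    # target: size

for lam in (0.0, 1.0, 10.0, 1000.0):
    slope = Ridge(alpha=lam).fit(weight, size).coef_[0]
    print(f"lambda = {lam:6.1f} -> slope = {slope:.4f}")
```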
Lasso regression (L1 regularization)
It is similar to Ridge regression, but there are some differences.
From Lasso regression:
MSE = (1/m) · Σ_{i=n+1}^{n+m} (y_i − f̂(x_i))² + λ × |Θ|
(a) Θ → Slope of the straight line.
(b) λ → Penalty factor, 0 ≤ λ < ∞
Difference between Ridge and Lasso regression
Lasso regression can shrink the coefficient of a useless variable exactly to
zero, excluding it from the equation.
Ridge regression only shrinks coefficients toward zero, so every variable,
including the more important ones, remains in the equation.
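A minimal sketch of this difference using scikit-learn; the data (where only the first feature matters) and the penalty strengths are illustrative assumptions:

```python
# With a useless second feature, Lasso typically zeroes its coefficient
# exactly, while Ridge only shrinks it toward zero.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))                    # feature 1 is useless
y = 3.0 * X[:, 0] + rng.normal(0, 0.5, 100)      # only feature 0 matters

ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)
print("Ridge coefficients:", ridge.coef_)        # both nonzero
print("Lasso coefficients:", lasso.coef_)        # second typically exactly 0.0
```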
References
E. Alpaydin, Introduction to Machine Learning. MIT Press, 2020.
T. M. Mitchell, The Discipline of Machine Learning. Carnegie Mellon University,
School of Computer Science, Machine Learning Department, 2006, vol. 9.
J. Grus, Data Science from Scratch: First Principles with Python. O'Reilly
Media, 2019.