Regularization in deep learning
1. L2 and L1 Regularization / Ridge and Lasso Regression
Subject- Machine Learning
Dr. Varun Kumar
Subject- Machine Learning Dr. Varun Kumar 1 / 10
2. Outline
1 Problem of Overfitting
2 Ridge Regression
Main idea behind Ridge regression
Working of Ridge regression
How can it solve the unsolvable?
3 Lasso Regression
4 References
3. Problem of Overfitting
In a real-world scenario, the model should perform well on unlabeled or
unknown data.
An overfitted model works well on the training data only.
Increasing the model complexity initially improves performance on the test
data as well, but at very high complexity the model underperforms on it.
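The train/test gap described above can be reproduced in a few lines. This is an illustrative sketch (not from the slides) that fits a degree-1 and a degree-9 polynomial to noisy linear data with NumPy: the high-degree fit interpolates the training points, driving training error toward zero while generalizing worse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy linear ground truth: y = 2x + noise
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(0, 0.2, size=10)
x_test = np.linspace(0.05, 0.95, 10)
y_test = 2 * x_test + rng.normal(0, 0.2, size=10)

def poly_mse(degree):
    # Fit a polynomial of the given degree on the training set,
    # then measure mean squared error on train and test data.
    coeffs = np.polyfit(x_train, y_train, degree)
    mse = lambda x, y: np.mean((np.polyval(coeffs, x) - y) ** 2)
    return mse(x_train, y_train), mse(x_test, y_test)

simple_train, simple_test = poly_mse(1)    # matches the true model complexity
complex_train, complex_test = poly_mse(9)  # interpolates all 10 training points

print(simple_train, simple_test)
print(complex_train, complex_test)
```

The degree-9 model's training MSE is essentially zero, yet its test MSE is not, which is exactly the overfitting pattern the slide describes.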
4. Ridge regression: (L2 regularization)
Main idea
⇒ In a regression problem, we find the line that minimizes the mean
squared error (MSE) of the residuals.
⇒ By slightly changing the weight (slope) value, the MSE on the test
data can also be reduced.
5. Working of Ridge regression
⇒ Red dots are the training points.
⇒ Green dots are the testing points.
⇒ The residual error is zero for the trained model but not for the testing data.
⇒ The given red line (training model) has high variance and zero bias.
6. Continued–
⇒ We introduce a small bias so that the MSE for the test data can be
minimized.
⇒ From the above figure:
size = y-axis intercept + Slope × weight
⇒ From Ridge regression:
MSE = (1/m) Σ_{i=n+1}^{n+m} (yᵢ − f̂(xᵢ))² + λ × Θ²
(a) Θ → slope of the straight line.
(b) λ → penalty factor, 0 < λ < ∞
7. Example–
♦ Case 1 (least squares): Let λ = 1, Θ = 1.3 ⇒ MSE = 0 + λΘ² = 0 + 1.69 = 1.69
→ High variance
♦ Case 2 (Ridge regression): Let λ = 1, Θ = 0.8, e₁ = 0.3, e₂ = 0.1 ⇒
MSE = 0.3² + 0.1² + λΘ² = 0.09 + 0.01 + 0.64 = 0.74 → Low variance
♦ Under a given penalty, the Ridge regression line is less sensitive to the
weight than the least-squares line.
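The arithmetic in both cases can be checked directly. A minimal sketch (note that in this worked example the slide sums the squared residuals without the 1/m averaging factor, so the helper below does the same):

```python
def ridge_cost(residuals, theta, lam=1.0):
    # Sum of squared residuals plus the L2 penalty lam * theta^2,
    # matching the slide's worked example (no 1/m factor applied).
    return sum(e ** 2 for e in residuals) + lam * theta ** 2

# Case 1: least-squares line, zero residuals but steep slope
case1 = ridge_cost([0.0, 0.0], theta=1.3)  # 0 + 1 * 1.3^2 = 1.69
# Case 2: ridge line, small residuals and a smaller slope
case2 = ridge_cost([0.3, 0.1], theta=0.8)  # 0.09 + 0.01 + 0.64 = 0.74

print(round(case1, 2), round(case2, 2))
```

The penalized cost of the ridge line (0.74) beats the least-squares line (1.69) even though the ridge line no longer fits the training points exactly.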
8. Penalty factor λ
⇒ MSE = (1/m) Σ_{i=n+1}^{n+m} (yᵢ − f̂(xᵢ))² + λ × Θ² → Ridge regression
⇒ The higher the slope (Θ), the more sensitive the size is to the weight
in the above figure.
⇒ Penalty = λ × Θ²
If λ = 0, the penalty is 0 and Ridge regression reduces to least squares.
If λ₁ = 1 and λ₂ = 2, then for the same penalty value, Θ₂ < Θ₁: a larger λ forces a smaller slope.
As λ → ∞, the slope → 0.
⇒ For very large λ, size becomes insensitive to the weight in the above
figure.
Note: The above discussion of Ridge regression used a continuous variable
(size vs. weight). It is also applicable to discrete variables.
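The shrinking effect of λ on the slope can be shown in closed form. For a single feature with no intercept, setting the derivative of (1/m)·Σ(yᵢ − Θxᵢ)² + λΘ² to zero gives Θ = Σxᵢyᵢ / (Σxᵢ² + mλ); this derivation is mine, but it is consistent with the slide's cost function. A small NumPy sketch:

```python
import numpy as np

def ridge_slope(x, y, lam):
    # Closed-form minimizer of (1/m) * sum((y - theta*x)^2) + lam * theta^2
    # for a single feature with no intercept.
    m = len(x)
    return np.dot(x, y) / (np.dot(x, x) + m * lam)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 1.5 * x  # exactly linear data: the true (least-squares) slope is 1.5

# lam = 0 recovers the least-squares slope; larger lam shrinks it toward 0
slopes = [ridge_slope(x, y, lam) for lam in (0.0, 1.0, 10.0, 1000.0)]
print(slopes)
```

As λ grows, the denominator dominates and the slope decays toward zero without ever reaching it, matching the "slope → 0 as λ → ∞" behavior above.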
9. Lasso regression: L1 regularization
It is similar to Ridge regression, but with some differences.
From Lasso regression:
MSE = (1/m) Σ_{i=n+1}^{n+m} (yᵢ − f̂(xᵢ))² + λ × |Θ|
(a) Θ → slope of the straight line.
(b) λ → penalty factor, 0 < λ < ∞
Difference between Ridge and Lasso regression
Lasso regression can shrink the coefficient of a useless variable all the
way to zero, excluding it from the equation.
Ridge regression keeps every variable in the equation; it only shrinks the
less important coefficients close to zero.
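This difference is easy to observe empirically. The sketch below (not from the slides) uses scikit-learn's `Ridge` and `Lasso` on synthetic data where only the first of two features matters; the regularization strengths 10.0 and 0.5 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
# Only feature 0 influences y; feature 1 is useless noise.
y = 3.0 * X[:, 0] + rng.normal(0, 0.1, size=n)

ridge = Ridge(alpha=10.0).fit(X, y)  # L2 penalty
lasso = Lasso(alpha=0.5).fit(X, y)   # L1 penalty

print("ridge coefficients:", ridge.coef_)  # both shrunk, neither exactly zero
print("lasso coefficients:", lasso.coef_)  # useless feature driven to exactly 0
```

Ridge shrinks the useless coefficient toward zero but leaves it nonzero, while Lasso's L1 penalty sets it to exactly zero, performing variable selection as described above.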
10. References
E. Alpaydin, Introduction to Machine Learning. MIT Press, 2020.
T. M. Mitchell, The Discipline of Machine Learning. Carnegie Mellon University,
School of Computer Science, Machine Learning, 2006, vol. 9.
J. Grus, Data Science from Scratch: First Principles with Python. O'Reilly Media,
2019.