SlideShare a Scribd company logo
1 of 13
Download to read offline
Bias Variance Trade-off in Machine Learning
Subject: Machine Learning
Dr. Varun Kumar
Subject: Machine Learning Dr. Varun Kumar Machine Learning 1 / 13
Outlines
1 Introduction to Under Fitting and Over Fitting
2 Introduction to Bias and Variance
3 Mathematical Intuition
4 Bias Variance Trade-off relation
5 References
Subject: Machine Learning Dr. Varun Kumar Machine Learning 2 / 13
Introduction to under fitting and over fitting and best fit
Regression problem: Underfitting, overfitting, best fitting
1 Underfitting: A model or a ML algorithm is said to have underfitting
when it cannot capture the underlying trend of the data.
2 Overfitting: A model or a ML algorithm is said to have overfitting
when it capture the underlying trend of the data very accurately.
3 Best fit: A model or a ML algorithm is said to have best fit when it
capture the underlying trend of the data moderately.
Subject: Machine Learning Dr. Varun Kumar Machine Learning 3 / 13
Bias and variance
1 Bias: It shows the degree of randomness of the training data.
(a) Based on the training data a suitable model may be created for the
regression or classification problem.
(b) Regression: Linear, logistic, polynomial
(c) Classification: Decision tree, random forest, naive baye’s, KNN, SVM
2 Variance: It shows the degree of randomness of the testing data.
(i) Testing data validates the accuracy of a model, that has been made
with the help of training data set.
(ii) Testing data is nothing but the unlabeled or unknown data.
Subject: Machine Learning Dr. Varun Kumar Machine Learning 4 / 13
Underfitting, overfitting, best fit and bias, variance
Note:
⇒ The objective of ML algorithm not only fit for the training data but
also fit for the testing data.
⇒ In other words, low bias and low variance is the appropriate solution.
Underfitting Overfitting Best fit
High bias Very low bias Low bias
High variance High variance Low variance
Subject: Machine Learning Dr. Varun Kumar Machine Learning 5 / 13
Underfitting, overfitting, and best fit in classification problem
A classifier (Decision tree, random forest, naive baye’s, KNN, SVM) works
on two types of data
Training data
Testing data
Example:
Classifier comes under the category of underfitting, overfitting, and best fit
based on its training and testing accuracy.
Underfitting Overfitting Best fit
Train error=25% Train error=1% Train error=8%
Test error= 27% Test error= 23% Test error=9%
Subject: Machine Learning Dr. Varun Kumar Machine Learning 6 / 13
Mathematical intuition
bias ˆ
f (x)

= E[ˆ
f (x)] − f (x) (1)
variance ˆ
f (x)

= E
h
ˆ
f (x) − E[ˆ
f (x)]
2
i
(2)
⇒ ˆ
f (x) → output observed through the training model
⇒ For linear model ˆ
f (x) = w1x + w0
⇒ For complex model ˆ
f (x) =
Pp
i=1 wi xi + w0
⇒ We don’t have idea regarding the true f (x).
⇒ Simple model: Low bias  high variance
⇒ Complex model: High bias  low variance
E[(y − ˆ
f (x))2
] = bias2
+ Variance + σ2
(Irreducible error) (3)
Subject: Machine Learning Dr. Varun Kumar Machine Learning 7 / 13
Continued–
⇒ Let there be n + m sample data in a given data set in which n and m
samples are taken for the training and testing (validation) purpose then
trainerr =
1
n
n
X
i=1
(yi − ˆ
f (xi ))2
testerr =
1
m
n+m
X
i=n+1
(yi − ˆ
f (xi ))2
⇒ If model complexity is increased the training model becomes too optimistic
gives a wrong picture of how close ˆ
f to f .
⇒ Let D = {xi , yi }n+m
1 is a given data set, where
yi = f (xi ) + i
⇒  ∼ N(0, σ2
)
Subject: Machine Learning Dr. Varun Kumar Machine Learning 8 / 13
Continued–
⇒ We use ˆ
f to approximate f , where training set T ⊂ D such that
yi = ˆ
f (xi )
⇒ We are interested to know
E
h
ˆ
f (xi ) − f (xi )
2
i
⇒ We cannot estimate the above expression directly, because we don’t know
f (xi )
E

(ŷi − yi )2

= E

(ˆ
f (xi ) − f (xi ) − i )2

⇐ yi = f (xi ) + i
= E

(ˆ
f (xi ) − f (xi ))2
− 2i (ˆ
f (xi ) − f (xi )) + 2
i

= E

(ˆ
f (xi ) − f (xi ))2

− 2E

i (ˆ
f (xi ) − f (xi ))

+ E

2
i

E

(ˆ
f (xi ) − f (xi ))2

= E

(ŷi − yi )2

+ 2E

i (ˆ
f (xi ) − f (xi ))

− E

2
i

(4)
Subject: Machine Learning Dr. Varun Kumar Machine Learning 9 / 13
Continued–
⇒ Empirical estimate: Let Z = {zi }n
i=1
E(Z) =
1
n
n
X
i=1
zi
⇒ We can empirically evaluate R.H.S using training or test observation
Case 1: Using test observation:
E

(ˆ
f (xi ) − f (xi ))2

| {z }
True error
= E

(ŷi − yi )2

+ 2E

i (ˆ
f (xi ) − f (xi ))

− E

2
i

=
1
m
n+m
X
i=n+1
(ŷi − yi )2
| {z }
Empirical error
−
1
m
n+m
X
i=n+1
(i )2
| {z }
Small constant
+ 2E
h
i (ˆ
f (xi ) − f (xi ))
i
| {z }
Covariance
∵ Cov(X, Y ) = E
h
(X − µx )(Y − µy )
i
let i = X and Y = (ˆ
f (xi ) − f (xi ))
Subject: Machine Learning Dr. Varun Kumar Machine Learning 10 / 13
Continued–
⇒ E
h
(X − µx )(Y − µy )
i
= E
h
X(Y − µy )
i
= E

XY

− µy E

X

= E[XY ]
⇒ None of the test data participated for estimating the ˆ
f (xi ).
⇒ ˆ
f (xi ) is estimated only using the training data.
⇒ ∴ i ⊥ (ˆ
f (xi ) − f (xi ))
∴ E[XY ] = E[X]E[Y ] = 0
⇒ True error=Empirical error+Small constant ← Test data
Case 2: For training observation:
E[XY ] 6= 0
Using Stein’s Lemma
1
n
n
X
i=1
i (ˆ
f (xi ) − f (xi )) =
σ2
n
n
X
i=1
∂ ˆ
f (xi )
∂yi
Subject: Machine Learning Dr. Varun Kumar Machine Learning 11 / 13
Bias and variance trade-off relation
Subject: Machine Learning Dr. Varun Kumar Machine Learning 12 / 13
References
E. Alpaydin, Introduction to machine learning. MIT press, 2020.
J. Grus, Data science from scratch: first principles with python. O’Reilly Media,
2019.
T. M. Mitchell, The discipline of machine learning. Carnegie Mellon University,
School of Computer Science, Machine Learning , 2006, vol. 9.
Subject: Machine Learning Dr. Varun Kumar Machine Learning 13 / 13

More Related Content

What's hot

Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
nextlib
 
Lecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation MaximizationLecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation Maximization
butest
 

What's hot (20)

Optimization in Deep Learning
Optimization in Deep LearningOptimization in Deep Learning
Optimization in Deep Learning
 
K mean-clustering algorithm
K mean-clustering algorithmK mean-clustering algorithm
K mean-clustering algorithm
 
Logistic regression in Machine Learning
Logistic regression in Machine LearningLogistic regression in Machine Learning
Logistic regression in Machine Learning
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
 
Machine Learning-Linear regression
Machine Learning-Linear regressionMachine Learning-Linear regression
Machine Learning-Linear regression
 
Optimization/Gradient Descent
Optimization/Gradient DescentOptimization/Gradient Descent
Optimization/Gradient Descent
 
Naive bayes
Naive bayesNaive bayes
Naive bayes
 
Lecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation MaximizationLecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation Maximization
 
2.4 rule based classification
2.4 rule based classification2.4 rule based classification
2.4 rule based classification
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision trees
 
Overfitting & Underfitting
Overfitting & UnderfittingOverfitting & Underfitting
Overfitting & Underfitting
 
Regularization in deep learning
Regularization in deep learningRegularization in deep learning
Regularization in deep learning
 
Understanding Bagging and Boosting
Understanding Bagging and BoostingUnderstanding Bagging and Boosting
Understanding Bagging and Boosting
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Feature selection
Feature selectionFeature selection
Feature selection
 
Machine Learning With Logistic Regression
Machine Learning  With Logistic RegressionMachine Learning  With Logistic Regression
Machine Learning With Logistic Regression
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
 
Feature Engineering in Machine Learning
Feature Engineering in Machine LearningFeature Engineering in Machine Learning
Feature Engineering in Machine Learning
 
Support vector machines (svm)
Support vector machines (svm)Support vector machines (svm)
Support vector machines (svm)
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision tree
 

Similar to Bias and variance trade off

Problem solving content
Problem solving contentProblem solving content
Problem solving content
Timothy Welsh
 
problem_solving in physics
 problem_solving in physics problem_solving in physics
problem_solving in physics
Timothy Welsh
 
isabelle_webinar_jan..
isabelle_webinar_jan..isabelle_webinar_jan..
isabelle_webinar_jan..
butest
 
Introduction to Machine Learning Aristotelis Tsirigos
Introduction to Machine Learning Aristotelis Tsirigos Introduction to Machine Learning Aristotelis Tsirigos
Introduction to Machine Learning Aristotelis Tsirigos
butest
 

Similar to Bias and variance trade off (20)

A comparison of three learning methods to predict N20 fluxes and N leaching
A comparison of three learning methods to predict N20 fluxes and N leachingA comparison of three learning methods to predict N20 fluxes and N leaching
A comparison of three learning methods to predict N20 fluxes and N leaching
 
Data Mining and Machine Learning Presentation
Data Mining and Machine Learning PresentationData Mining and Machine Learning Presentation
Data Mining and Machine Learning Presentation
 
Ensemble Learning and Boosting
Ensemble Learning and BoostingEnsemble Learning and Boosting
Ensemble Learning and Boosting
 
[update] Introductory Parts of the Book "Dive into Deep Learning"
[update] Introductory Parts of the Book "Dive into Deep Learning"[update] Introductory Parts of the Book "Dive into Deep Learning"
[update] Introductory Parts of the Book "Dive into Deep Learning"
 
151028_abajpai1
151028_abajpai1151028_abajpai1
151028_abajpai1
 
C3.2.2
C3.2.2C3.2.2
C3.2.2
 
Machine learning for computer vision part 2
Machine learning for computer vision part 2Machine learning for computer vision part 2
Machine learning for computer vision part 2
 
Problem solving content
Problem solving contentProblem solving content
Problem solving content
 
problem_solving in physics
 problem_solving in physics problem_solving in physics
problem_solving in physics
 
isabelle_webinar_jan..
isabelle_webinar_jan..isabelle_webinar_jan..
isabelle_webinar_jan..
 
Machine Learning, Financial Engineering and Quantitative Investing
Machine Learning, Financial Engineering and Quantitative InvestingMachine Learning, Financial Engineering and Quantitative Investing
Machine Learning, Financial Engineering and Quantitative Investing
 
Machine learning in science and industry — day 1
Machine learning in science and industry — day 1Machine learning in science and industry — day 1
Machine learning in science and industry — day 1
 
Ph.D. Dissertation Defense
Ph.D. Dissertation DefensePh.D. Dissertation Defense
Ph.D. Dissertation Defense
 
Lecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language TechnologyLecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language Technology
 
[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程
 
Regularization in deep learning
Regularization in deep learningRegularization in deep learning
Regularization in deep learning
 
Normal Distribution
Normal DistributionNormal Distribution
Normal Distribution
 
A short introduction to statistical learning
A short introduction to statistical learningA short introduction to statistical learning
A short introduction to statistical learning
 
MLU_DTE_Lecture_2.pptx
MLU_DTE_Lecture_2.pptxMLU_DTE_Lecture_2.pptx
MLU_DTE_Lecture_2.pptx
 
Introduction to Machine Learning Aristotelis Tsirigos
Introduction to Machine Learning Aristotelis Tsirigos Introduction to Machine Learning Aristotelis Tsirigos
Introduction to Machine Learning Aristotelis Tsirigos
 

More from VARUN KUMAR

More from VARUN KUMAR (20)

Distributed rc Model
Distributed rc ModelDistributed rc Model
Distributed rc Model
 
Electrical Wire Model
Electrical Wire ModelElectrical Wire Model
Electrical Wire Model
 
Interconnect Parameter in Digital VLSI Design
Interconnect Parameter in Digital VLSI DesignInterconnect Parameter in Digital VLSI Design
Interconnect Parameter in Digital VLSI Design
 
Introduction to Digital VLSI Design
Introduction to Digital VLSI DesignIntroduction to Digital VLSI Design
Introduction to Digital VLSI Design
 
Challenges of Massive MIMO System
Challenges of Massive MIMO SystemChallenges of Massive MIMO System
Challenges of Massive MIMO System
 
E-democracy or Digital Democracy
E-democracy or Digital DemocracyE-democracy or Digital Democracy
E-democracy or Digital Democracy
 
Ethics of Parasitic Computing
Ethics of Parasitic ComputingEthics of Parasitic Computing
Ethics of Parasitic Computing
 
Action Lines of Geneva Plan of Action
Action Lines of Geneva Plan of ActionAction Lines of Geneva Plan of Action
Action Lines of Geneva Plan of Action
 
Geneva Plan of Action
Geneva Plan of ActionGeneva Plan of Action
Geneva Plan of Action
 
Fair Use in the Electronic Age
Fair Use in the Electronic AgeFair Use in the Electronic Age
Fair Use in the Electronic Age
 
Software as a Property
Software as a PropertySoftware as a Property
Software as a Property
 
Orthogonal Polynomial
Orthogonal PolynomialOrthogonal Polynomial
Orthogonal Polynomial
 
Patent Protection
Patent ProtectionPatent Protection
Patent Protection
 
Copyright Vs Patent and Trade Secrecy Law
Copyright Vs Patent and Trade Secrecy LawCopyright Vs Patent and Trade Secrecy Law
Copyright Vs Patent and Trade Secrecy Law
 
Property Right and Software
Property Right and SoftwareProperty Right and Software
Property Right and Software
 
Investigating Data Trials
Investigating Data TrialsInvestigating Data Trials
Investigating Data Trials
 
Gaussian Numerical Integration
Gaussian Numerical IntegrationGaussian Numerical Integration
Gaussian Numerical Integration
 
Censorship and Controversy
Censorship and ControversyCensorship and Controversy
Censorship and Controversy
 
Romberg's Integration
Romberg's IntegrationRomberg's Integration
Romberg's Integration
 
Introduction to Censorship
Introduction to Censorship Introduction to Censorship
Introduction to Censorship
 

Recently uploaded

"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
mphochane1998
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
AldoGarca30
 

Recently uploaded (20)

Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
 
457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx
457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx
457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and properties
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
 
Learn the concepts of Thermodynamics on Magic Marks
Learn the concepts of Thermodynamics on Magic MarksLearn the concepts of Thermodynamics on Magic Marks
Learn the concepts of Thermodynamics on Magic Marks
 
Introduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfIntroduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdf
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 

Bias and variance trade off

  • 1. Bias Variance Trade-off in Machine Learning Subject: Machine Learning Dr. Varun Kumar Subject: Machine Learning Dr. Varun Kumar Machine Learning 1 / 13
  • 2. Outlines 1 Introduction to Under Fitting and Over Fitting 2 Introduction to Bias and Variance 3 Mathematical Intuition 4 Bias Variance Trade-off relation 5 References Subject: Machine Learning Dr. Varun Kumar Machine Learning 2 / 13
  • 3. Introduction to under fitting and over fitting and best fit Regression problem: Underfitting, overfitting, best fitting 1 Underfitting: A model or a ML algorithm is said to have underfitting when it cannot capture the underlying trend of the data. 2 Overfitting: A model or a ML algorithm is said to have overfitting when it capture the underlying trend of the data very accurately. 3 Best fit: A model or a ML algorithm is said to have best fit when it capture the underlying trend of the data moderately. Subject: Machine Learning Dr. Varun Kumar Machine Learning 3 / 13
  • 4. Bias and variance 1 Bias: It shows the degree of randomness of the training data. (a) Based on the training data a suitable model may be created for the regression or classification problem. (b) Regression: Linear, logistic, polynomial (c) Classification: Decision tree, random forest, naive baye’s, KNN, SVM 2 Variance: It shows the degree of randomness of the testing data. (i) Testing data validates the accuracy of a model, that has been made with the help of training data set. (ii) Testing data is nothing but the unlabeled or unknown data. Subject: Machine Learning Dr. Varun Kumar Machine Learning 4 / 13
  • 5. Underfitting, overfitting, best fit and bias, variance Note: ⇒ The objective of ML algorithm not only fit for the training data but also fit for the testing data. ⇒ In other words, low bias and low variance is the appropriate solution. Underfitting Overfitting Best fit High bias Very low bias Low bias High variance High variance Low variance Subject: Machine Learning Dr. Varun Kumar Machine Learning 5 / 13
  • 6. Underfitting, overfitting, and best fit in classification problem A classifier (Decision tree, random forest, naive baye’s, KNN, SVM) works on two types of data Training data Testing data Example: Classifier comes under the category of underfitting, overfitting, and best fit based on its training and testing accuracy. Underfitting Overfitting Best fit Train error=25% Train error=1% Train error=8% Test error= 27% Test error= 23% Test error=9% Subject: Machine Learning Dr. Varun Kumar Machine Learning 6 / 13
  • 7. Mathematical intuition bias ˆ f (x) = E[ˆ f (x)] − f (x) (1) variance ˆ f (x) = E h ˆ f (x) − E[ˆ f (x)] 2 i (2) ⇒ ˆ f (x) → output observed through the training model ⇒ For linear model ˆ f (x) = w1x + w0 ⇒ For complex model ˆ f (x) = Pp i=1 wi xi + w0 ⇒ We don’t have idea regarding the true f (x). ⇒ Simple model: Low bias high variance ⇒ Complex model: High bias low variance E[(y − ˆ f (x))2 ] = bias2 + Variance + σ2 (Irreducible error) (3) Subject: Machine Learning Dr. Varun Kumar Machine Learning 7 / 13
  • 8. Continued– ⇒ Let there be n + m sample data in a given data set in which n and m samples are taken for the training and testing (validation) purpose then trainerr = 1 n n X i=1 (yi − ˆ f (xi ))2 testerr = 1 m n+m X i=n+1 (yi − ˆ f (xi ))2 ⇒ If model complexity is increased the training model becomes too optimistic gives a wrong picture of how close ˆ f to f . ⇒ Let D = {xi , yi }n+m 1 is a given data set, where yi = f (xi ) + i ⇒ ∼ N(0, σ2 ) Subject: Machine Learning Dr. Varun Kumar Machine Learning 8 / 13
  • 9. Continued– ⇒ We use ˆ f to approximate f , where training set T ⊂ D such that yi = ˆ f (xi ) ⇒ We are interested to know E h ˆ f (xi ) − f (xi ) 2 i ⇒ We cannot estimate the above expression directly, because we don’t know f (xi ) E (ŷi − yi )2 = E (ˆ f (xi ) − f (xi ) − i )2 ⇐ yi = f (xi ) + i = E (ˆ f (xi ) − f (xi ))2 − 2i (ˆ f (xi ) − f (xi )) + 2 i = E (ˆ f (xi ) − f (xi ))2 − 2E i (ˆ f (xi ) − f (xi )) + E 2 i E (ˆ f (xi ) − f (xi ))2 = E (ŷi − yi )2 + 2E i (ˆ f (xi ) − f (xi )) − E 2 i (4) Subject: Machine Learning Dr. Varun Kumar Machine Learning 9 / 13
  • 10. Continued– ⇒ Empirical estimate: Let Z = {zi }n i=1 E(Z) = 1 n n X i=1 zi ⇒ We can empirically evaluate R.H.S using training or test observation Case 1: Using test observation: E (ˆ f (xi ) − f (xi ))2 | {z } True error = E (ŷi − yi )2 + 2E i (ˆ f (xi ) − f (xi )) − E 2 i = 1 m n+m X i=n+1 (ŷi − yi )2 | {z } Empirical error − 1 m n+m X i=n+1 (i )2 | {z } Small constant + 2E h i (ˆ f (xi ) − f (xi )) i | {z } Covariance ∵ Cov(X, Y ) = E h (X − µx )(Y − µy ) i let i = X and Y = (ˆ f (xi ) − f (xi )) Subject: Machine Learning Dr. Varun Kumar Machine Learning 10 / 13
  • 11. Continued– ⇒ E h (X − µx )(Y − µy ) i = E h X(Y − µy ) i = E XY − µy E X = E[XY ] ⇒ None of the test data participated for estimating the ˆ f (xi ). ⇒ ˆ f (xi ) is estimated only using the training data. ⇒ ∴ i ⊥ (ˆ f (xi ) − f (xi )) ∴ E[XY ] = E[X]E[Y ] = 0 ⇒ True error=Empirical error+Small constant ← Test data Case 2: For training observation: E[XY ] 6= 0 Using Stein’s Lemma 1 n n X i=1 i (ˆ f (xi ) − f (xi )) = σ2 n n X i=1 ∂ ˆ f (xi ) ∂yi Subject: Machine Learning Dr. Varun Kumar Machine Learning 11 / 13
  • 12. Bias and variance trade-off relation Subject: Machine Learning Dr. Varun Kumar Machine Learning 12 / 13
  • 13. References E. Alpaydin, Introduction to machine learning. MIT press, 2020. J. Grus, Data science from scratch: first principles with python. O’Reilly Media, 2019. T. M. Mitchell, The discipline of machine learning. Carnegie Mellon University, School of Computer Science, Machine Learning , 2006, vol. 9. Subject: Machine Learning Dr. Varun Kumar Machine Learning 13 / 13