Bias Variance Trade-off in Machine Learning
Subject: Machine Learning
Dr. Varun Kumar
Outline
1 Introduction to Underfitting and Overfitting
2 Introduction to Bias and Variance
3 Mathematical Intuition
4 Bias Variance Trade-off relation
5 References
Introduction to underfitting, overfitting, and best fit
Regression problem: underfitting, overfitting, best fit
1 Underfitting: A model or ML algorithm is said to underfit when it
cannot capture the underlying trend of the data.
2 Overfitting: A model or ML algorithm is said to overfit when it fits
the training data too closely, capturing the noise along with the
underlying trend.
3 Best fit: A model or ML algorithm has the best fit when it captures
the underlying trend of the data without fitting the noise.
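These three regimes can be reproduced with a small Python experiment (an illustrative sketch, not part of the slides; the data generator, the 1-nearest-neighbour "memoriser", and all constants are hypothetical choices):

```python
import random

random.seed(0)

def f(x):
    """Hypothetical true underlying trend (unknown in practice)."""
    return 2 * x + 1

def make_data(n):
    xs = [random.uniform(0, 10) for _ in range(n)]
    return [(x, f(x) + random.gauss(0, 1)) for x in xs]

train, test = make_data(30), make_data(30)

def mse(model, data):
    return sum((y - model(x)) ** 2 for x, y in data) / len(data)

# Underfit: ignores x entirely and always predicts the mean training label.
mean_y = sum(y for _, y in train) / len(train)
underfit = lambda x: mean_y

# Overfit: memorises the training set (1-nearest-neighbour lookup).
overfit = lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

# Best fit: least-squares line, matching the form of the true trend.
n = len(train)
sx = sum(x for x, _ in train); sy = sum(y for _, y in train)
sxx = sum(x * x for x, _ in train); sxy = sum(x * y for x, y in train)
w1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
w0 = (sy - w1 * sx) / n
bestfit = lambda x: w1 * x + w0

for name, m in [("underfit", underfit), ("overfit", overfit), ("best fit", bestfit)]:
    print(f"{name:9s} train MSE = {mse(m, train):6.2f}   test MSE = {mse(m, test):6.2f}")
```

The underfit model has large error on both sets, the memorising model has zero training error but a larger test error, and the least-squares line does well on both.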
Bias and variance
1 Bias: It reflects how far the model's average prediction is from the
true underlying function, as seen on the training data.
(a) Based on the training data, a suitable model may be created for the
regression or classification problem.
(b) Regression: linear, logistic, polynomial
(c) Classification: decision tree, random forest, naive Bayes, KNN, SVM
2 Variance: It reflects how sensitive the model is to the particular
training sample, which shows up as error on the testing data.
(i) Testing data validates the accuracy of a model that has been built
with the help of the training data set.
(ii) Testing data is the unseen (unknown) data.
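Both quantities can be made concrete by simulation: train the same model on many random training sets and measure E[f̂] − f and E[(f̂ − E[f̂])²]. This is an illustrative sketch, not from the slides; the mean-of-labels "model" and all constants are assumptions:

```python
import random
import statistics

random.seed(1)

true_value = 5.0           # the target f(x) we want to predict (hypothetical)
sigma, n_train = 2.0, 20   # noise level and training-set size (hypothetical)

def train_once():
    """'Train' a model on one random training set; here the model is
    simply the sample mean of the noisy labels."""
    labels = [true_value + random.gauss(0, sigma) for _ in range(n_train)]
    return sum(labels) / n_train

# f̂ over many independent training sets:
predictions = [train_once() for _ in range(5000)]
bias = statistics.mean(predictions) - true_value   # E[f̂] − f
variance = statistics.pvariance(predictions)       # E[(f̂ − E[f̂])²]
print(f"bias ≈ {bias:+.3f}, variance ≈ {variance:.3f} "
      f"(theory: 0 and {sigma**2 / n_train:.3f})")
```

For the sample mean, theory predicts zero bias and variance σ²/n, which the simulation recovers.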
Underfitting, overfitting, best fit and bias, variance
Note:
⇒ The objective of an ML algorithm is to fit not only the training data but
also the testing data.
⇒ In other words, low bias and low variance is the appropriate solution.
          Underfitting    Overfitting      Best fit
Bias      High bias       Very low bias    Low bias
Variance  High variance   High variance    Low variance
Underfitting, overfitting, and best fit in classification problem
A classifier (decision tree, random forest, naive Bayes, KNN, SVM) works
on two types of data:
Training data
Testing data
Example:
A classifier falls under the category of underfitting, overfitting, or best
fit based on its training and testing accuracy.
              Underfitting       Overfitting       Best fit
Train error   25%                1%                8%
Test error    27%                23%               9%
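The table can be turned into a small decision rule. A sketch in Python; the thresholds below are illustrative choices, not values given in the slides:

```python
def fit_regime(train_err, test_err, err_tol=0.10, gap_tol=0.05):
    """Classify the fitting regime from error rates.
    The thresholds err_tol and gap_tol are illustrative assumptions."""
    if train_err > err_tol:             # cannot even fit the training data
        return "underfitting"
    if test_err - train_err > gap_tol:  # fits training data, fails on test data
        return "overfitting"
    return "best fit"

# The three classifiers from the table above:
print(fit_regime(0.25, 0.27))   # underfitting
print(fit_regime(0.01, 0.23))   # overfitting
print(fit_regime(0.08, 0.09))   # best fit
```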
Mathematical intuition
bias(f̂(x)) = E[f̂(x)] − f(x)                        (1)

variance(f̂(x)) = E[(f̂(x) − E[f̂(x)])²]             (2)

⇒ f̂(x) → output observed through the trained model
⇒ For a linear model: f̂(x) = w₁x + w₀
⇒ For a complex model: f̂(x) = Σᵢ₌₁ᵖ wᵢxⁱ + w₀
⇒ We have no idea regarding the true f(x).
⇒ Simple model: high bias & low variance
⇒ Complex model: low bias & high variance

E[(y − f̂(x))²] = bias² + variance + σ² (irreducible error)   (3)
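Decomposition (3) can be checked by simulation. The sketch below (an illustration, not part of the slides) uses a deliberately biased estimator, a shrunken sample mean, so that all three terms are nonzero; the estimator and all constants are assumptions:

```python
import random
import statistics

random.seed(2)

f_x, sigma, n = 5.0, 1.0, 10   # true value, noise level, training size
shrink = 0.8                   # shrinkage factor that makes the estimator biased

def fit():
    """Train on one noisy training set; f̂ is a shrunken sample mean."""
    ys = [f_x + random.gauss(0, sigma) for _ in range(n)]
    return shrink * sum(ys) / n

preds = [fit() for _ in range(20000)]
bias = statistics.mean(preds) - f_x
variance = statistics.pvariance(preds)

# Left side of (3): expected squared error against a fresh noisy observation.
lhs = statistics.mean((f_x + random.gauss(0, sigma) - p) ** 2 for p in preds)
rhs = bias ** 2 + variance + sigma ** 2
print(f"E[(y − f̂)²] ≈ {lhs:.3f}   bias² + variance + σ² ≈ {rhs:.3f}")
```

With shrink = 0.8 the bias is about −1, so bias², variance, and σ² all contribute, and the two sides of (3) agree.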
Continued–
⇒ Let there be n + m sample data in a given data set, in which n and m
samples are taken for the training and testing (validation) purposes,
respectively. Then

train_err = (1/n) Σᵢ₌₁ⁿ (yᵢ − f̂(xᵢ))²

test_err = (1/m) Σᵢ₌ₙ₊₁ⁿ⁺ᵐ (yᵢ − f̂(xᵢ))²

⇒ If model complexity is increased, the training error becomes too optimistic
and gives a wrong picture of how close f̂ is to f.
⇒ Let D = {(xᵢ, yᵢ)}ᵢ₌₁ⁿ⁺ᵐ be the given data set, where
yᵢ = f(xᵢ) + εᵢ
⇒ εᵢ ∼ N(0, σ²)
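The two error formulas transcribe directly into Python. A minimal sketch, assuming a hypothetical true function and an already-fitted model (both are illustrative assumptions, not from the slides):

```python
import random

random.seed(3)

def f(x):                      # hypothetical true function
    return 3 * x

# n + m samples: the first n for training, the remaining m for testing.
n, m = 100, 50
data = [(x, f(x) + random.gauss(0, 1))
        for x in (random.uniform(0, 5) for _ in range(n + m))]
train, test = data[:n], data[n:]

f_hat = lambda x: 3.1 * x      # some already-fitted model (slope slightly off)

# train_err = (1/n) Σᵢ₌₁ⁿ (yᵢ − f̂(xᵢ))²
train_err = sum((y - f_hat(x)) ** 2 for x, y in train) / n
# test_err = (1/m) Σᵢ₌ₙ₊₁ⁿ⁺ᵐ (yᵢ − f̂(xᵢ))²
test_err = sum((y - f_hat(x)) ** 2 for x, y in test) / m
print(f"train_err ≈ {train_err:.3f}, test_err ≈ {test_err:.3f}")
```

Both errors hover near σ² = 1 plus a small term from the slope mismatch.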
Continued–
⇒ We use f̂ to approximate f, trained on a training set T ⊂ D, so that
ŷᵢ = f̂(xᵢ)
⇒ We are interested to know
E[(f̂(xᵢ) − f(xᵢ))²]
⇒ We cannot estimate this expression directly, because we do not know f(xᵢ).
However,

E[(ŷᵢ − yᵢ)²] = E[(f̂(xᵢ) − f(xᵢ) − εᵢ)²]          ⇐ yᵢ = f(xᵢ) + εᵢ
             = E[(f̂(xᵢ) − f(xᵢ))² − 2εᵢ(f̂(xᵢ) − f(xᵢ)) + εᵢ²]
             = E[(f̂(xᵢ) − f(xᵢ))²] − 2E[εᵢ(f̂(xᵢ) − f(xᵢ))] + E[εᵢ²]

Rearranging,

E[(f̂(xᵢ) − f(xᵢ))²] = E[(ŷᵢ − yᵢ)²] + 2E[εᵢ(f̂(xᵢ) − f(xᵢ))] − E[εᵢ²]   (4)
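Identity (4) rests on the pointwise algebraic expansion of (f̂ − f − ε)², so it holds exactly for every sample and can be verified numerically to machine precision. A sketch with hypothetical f and f̂ (both chosen here purely for illustration):

```python
import random

random.seed(4)

# With yᵢ = f(xᵢ) + εᵢ and ŷᵢ = f̂(xᵢ), the identity behind (4),
#   (f̂ − f)² = (ŷ − y)² + 2ε(f̂ − f) − ε²,
# holds exactly per sample, hence also in expectation.
f = lambda x: 2 * x + 1          # hypothetical true function
f_hat = lambda x: 2.3 * x + 0.5  # hypothetical fitted model

N = 1000
lhs = rhs = 0.0
for _ in range(N):
    x = random.uniform(0, 5)
    eps = random.gauss(0, 1)
    y, y_hat = f(x) + eps, f_hat(x)
    lhs += (f_hat(x) - f(x)) ** 2
    rhs += (y_hat - y) ** 2 + 2 * eps * (f_hat(x) - f(x)) - eps ** 2
print(abs(lhs - rhs) / N)        # ~0, up to floating-point round-off
```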
Continued–
⇒ Empirical estimate: Let Z = {zᵢ}ᵢ₌₁ⁿ. Then

E(Z) = (1/n) Σᵢ₌₁ⁿ zᵢ

⇒ We can empirically evaluate the R.H.S. of (4) using training or test
observations.

Case 1: Using test observations:

E[(f̂(xᵢ) − f(xᵢ))²]                                  ← true error
= E[(ŷᵢ − yᵢ)²] + 2E[εᵢ(f̂(xᵢ) − f(xᵢ))] − E[εᵢ²]
= (1/m) Σᵢ₌ₙ₊₁ⁿ⁺ᵐ (ŷᵢ − yᵢ)²                         ← empirical error
  − (1/m) Σᵢ₌ₙ₊₁ⁿ⁺ᵐ εᵢ²                              ← small constant
  + 2E[εᵢ(f̂(xᵢ) − f(xᵢ))]                            ← covariance term

∵ Cov(X, Y) = E[(X − μₓ)(Y − μᵧ)]; let X = εᵢ and Y = f̂(xᵢ) − f(xᵢ).
Continued–
⇒ E[(X − μₓ)(Y − μᵧ)] = E[X(Y − μᵧ)]                 (∵ μₓ = E[εᵢ] = 0)
= E[XY] − μᵧE[X] = E[XY]
⇒ None of the test data participated in estimating f̂(xᵢ).
⇒ f̂(xᵢ) is estimated only using the training data.
⇒ ∴ εᵢ ⊥ (f̂(xᵢ) − f(xᵢ)), so E[XY] = E[X]E[Y] = 0
⇒ True error = empirical error − small constant ← test data

Case 2: For training observations:
E[XY] ≠ 0
Using Stein's lemma,

(1/n) Σᵢ₌₁ⁿ εᵢ(f̂(xᵢ) − f(xᵢ)) = (σ²/n) Σᵢ₌₁ⁿ ∂f̂(xᵢ)/∂yᵢ
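For a model that is a linear function of the training labels, such as a least-squares line, Σᵢ ∂f̂(xᵢ)/∂yᵢ equals the trace of the hat matrix, i.e. the number of fitted parameters (here 2: slope and intercept). The sketch below checks this numerically; the data and step size are illustrative assumptions, not from the slides:

```python
import random

random.seed(5)

xs = [random.uniform(0, 10) for _ in range(20)]
ys = [2 * x + random.gauss(0, 1) for x in xs]

def fit_predict(ys):
    """Least-squares line on (xs, ys); returns predictions at the xs."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    w1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    w0 = (sy - w1 * sx) / n
    return [w1 * x + w0 for x in xs]

# Estimate Σᵢ ∂f̂(xᵢ)/∂yᵢ by perturbing each label in turn. The fit is
# linear in ys, so the finite difference is exact for any step h.
h = 0.01
base = fit_predict(ys)
trace = 0.0
for i in range(len(ys)):
    bumped = list(ys)
    bumped[i] += h
    trace += (fit_predict(bumped)[i] - base[i]) / h
print(trace)   # ≈ 2.0, the number of parameters
```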
Bias and variance trade-off relation
⇒ As model complexity increases, bias² decreases while variance increases;
the total error E[(y − f̂(x))²] in (3) is minimized at an intermediate
complexity where the two are balanced.
References
E. Alpaydin, Introduction to Machine Learning. MIT Press, 2020.
J. Grus, Data Science from Scratch: First Principles with Python. O'Reilly
Media, 2019.
T. M. Mitchell, The Discipline of Machine Learning. Carnegie Mellon
University, School of Computer Science, 2006, vol. 9.
