Lecture 3: Statistical Learning (Chapter 2.2)
Assess Model Accuracy
• No one method dominates all others over all possible data sets.
• Different methods require different model assumptions.
• It is an important task to decide, for any given set of data, which method
produces the best results.
Measuring the Quality of Fit
Suppose we fit a model $\hat{f}(x)$ to some training data $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, and
we wish to see how well it performs.
• Mean squared error (MSE):
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - \hat{f}(x_i)\bigr)^2.$$
• It is also called the training MSE, because it is computed using the
training data.
• However, it is not a valid measure of model fit, because overfit models
usually have a smaller training MSE. (If we look for the model with the smallest
training MSE, we usually pick an overfit model, whose variance is too
large.)
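The point about training MSE can be illustrated with a minimal sketch. The data-generating function, noise level, and polynomial degrees below are assumptions chosen only for illustration; they do not come from the lecture. Because the polynomial models are nested, the least-squares training MSE cannot increase as the degree grows, which is why it cannot be used on its own to pick a model.

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error: the average of the squared residuals."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.mean((y - y_hat) ** 2))

# Hypothetical training data: a smooth signal plus Gaussian noise.
rng = np.random.default_rng(0)
x_train = np.linspace(-1.0, 1.0, 30)
y_train = np.sin(3.0 * x_train) + rng.normal(scale=0.3, size=x_train.size)

# Fit polynomials of increasing degree (increasing flexibility) by least squares
# and record the MSE of each fit on the same data it was trained on.
degrees = [1, 3, 6, 9]
training_mse = {}
for d in degrees:
    poly = np.polynomial.Polynomial.fit(x_train, y_train, deg=d)
    training_mse[d] = mse(y_train, poly(x_train))

for d in degrees:
    print(f"degree {d}: training MSE = {training_mse[d]:.4f}")
```

The printed values shrink (or stay equal) as the degree grows, regardless of whether the more flexible fit predicts new data any better.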
Measuring the Quality of Fit
• Test data are data that were not used to train the statistical
model (i.e., not used to calculate $\hat{f}$).
• Test MSE. Instead of the training MSE, we should look at the test
MSE. Suppose we have the test data $\{(x_{T1}, y_{T1}), \ldots, (x_{Tm}, y_{Tm})\}$. Then
$$\mathrm{MSE}_T = \frac{1}{m} \sum_{i=1}^{m} \bigl(y_{Ti} - \hat{f}(x_{Ti})\bigr)^2.$$
• We'd like to select the model for which the test MSE is as small as possible.
• How do we calculate $\mathrm{MSE}_T$? If test data are available, we calculate
$\mathrm{MSE}_T$ directly. If not, we use cross-validation (Chapter 5).
Training MSE vs Test MSE
Left: Data simulated from f , shown in black. Three estimates of f are shown:
the linear regression line (orange curve), and two nonparametric fits (blue and
green curves). Right: Training MSE (grey curve), test MSE (red curve), and
minimum possible test MSE over all methods (dashed line). Squares represent
the training and test MSEs for the three fits shown in the left-hand panel.
Training MSE vs Test MSE
Here, we use a different true f that is much closer to linear. In this setting,
linear regression provides a very good fit to the data.
Training MSE vs Test MSE
We use an f that is far from linear. In this setting, linear regression provides a
very poor fit to the data.
Bias-Variance Trade-off
Suppose we have an estimator $\hat{f}(x)$ from the training data. Let $(x_0, y_0)$ be a test
observation drawn from the population.
If the true model is $Y = f(X) + \epsilon$ (with $f(x) = E(Y \mid X = x)$), then
$$E\bigl[(Y_0 - \hat{f}(X))^2 \mid X = x_0\bigr] = \underbrace{\mathrm{var}(\hat{f}(x_0))}_{\text{Variance}} + \bigl[\underbrace{E[\hat{f}(x_0)] - f(x_0)}_{\text{Bias}}\bigr]^2 + \underbrace{\mathrm{var}(\epsilon)}_{\text{Irreducible error}}.$$
The expectation is over the variability of $y_0$ as well as the training data.
$E[(Y_0 - \hat{f}(X))^2 \mid X = x_0]$ is called the expected test MSE at $x_0$.
Because the variance and the squared bias are both nonnegative, the expected test MSE is at least the irreducible error:
$E[(Y_0 - \hat{f}(X))^2 \mid X = x_0] \ge \mathrm{var}(\epsilon)$.
We want to select a learning method that minimizes the expected test MSE.
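The decomposition can be checked numerically with a small Monte Carlo sketch. Everything specific here is an assumption made for illustration: the true function `true_f`, the noise standard deviation `sigma`, the test point `x0`, and the use of a cubic polynomial fit as the estimator. Each repetition draws a fresh training set and a fresh $y_0$, matching the statement that the expectation is over both.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 0.3       # assumed noise standard deviation, so var(eps) = 0.09
x0 = 0.5          # assumed test point
n_train = 30      # size of each simulated training set
n_reps = 2000     # number of simulated training sets
degree = 3        # flexibility of the fitted polynomial

def true_f(x):
    # Hypothetical true regression function f(x) = E(Y | X = x).
    return np.sin(3.0 * x)

preds = np.empty(n_reps)   # f-hat(x0) for each training set
y0 = np.empty(n_reps)      # an independent test response at x0 for each repetition
for r in range(n_reps):
    x = rng.uniform(-1.0, 1.0, size=n_train)
    y = true_f(x) + rng.normal(scale=sigma, size=n_train)
    fit = np.polynomial.Polynomial.fit(x, y, deg=degree)
    preds[r] = fit(x0)
    y0[r] = true_f(x0) + rng.normal(scale=sigma)

variance = preds.var()                          # var(f-hat(x0))
bias_sq = (preds.mean() - true_f(x0)) ** 2      # [E f-hat(x0) - f(x0)]^2
irreducible = sigma ** 2                        # var(eps)
decomposed = variance + bias_sq + irreducible
direct = np.mean((y0 - preds) ** 2)             # Monte Carlo expected test MSE at x0

print(f"variance + bias^2 + var(eps) = {decomposed:.4f}")
print(f"direct estimate of expected test MSE = {direct:.4f}")
```

The two printed numbers should agree up to simulation noise, and both should sit above `sigma ** 2`.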
What is Bias-Variance Trade-off?
Variance: how much $\hat{f}$ would change if we estimated it using a different
training data set.
Bias: refers to the error that is introduced by approximating a real-life problem
(e.g., the real relationship between response and predictors is nonlinear, but we
fit a linear model, which causes bias).
Typically, as the flexibility of $\hat{f}$ increases (e.g., nonparametric methods), the
variance of $\hat{f}$ increases and its bias decreases. On the other hand, if $\hat{f}$ is less
flexible (e.g., linear model), the variance of $\hat{f}$ is usually small and the bias is
large.
So choosing the flexibility based on expected test MSE amounts to a
bias-variance trade-off.
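The trade-off can be seen in a simulation sketch along the following lines. The setup is hypothetical: a nonlinear `true_f`, a fixed design of 30 points that doubles as the set of evaluation points, and polynomial degree as the measure of flexibility. Squared bias and variance are averaged over the evaluation points.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 0.3                           # assumed noise standard deviation
x = np.linspace(-1.0, 1.0, 30)        # fixed design, reused as evaluation points
n_reps = 500                          # number of simulated training sets

def true_f(x):
    # Hypothetical nonlinear truth, chosen only for illustration.
    return np.sin(3.0 * x)

def bias_variance(degree):
    """Average squared bias and average variance of a degree-`degree` polynomial fit."""
    preds = np.empty((n_reps, x.size))
    for r in range(n_reps):
        y = true_f(x) + rng.normal(scale=sigma, size=x.size)
        preds[r] = np.polynomial.Polynomial.fit(x, y, deg=degree)(x)
    bias_sq = np.mean((preds.mean(axis=0) - true_f(x)) ** 2)
    variance = np.mean(preds.var(axis=0))
    return float(bias_sq), float(variance)

results = {d: bias_variance(d) for d in (1, 3, 9)}
for d, (b, v) in results.items():
    print(f"degree {d}: bias^2 = {b:.4f}, variance = {v:.4f}, "
          f"expected test MSE = {b + v + sigma ** 2:.4f}")
```

In this setup the linear fit shows the largest squared bias and the smallest variance, while the degree-9 fit shows the reverse; the sum of the two plus the noise variance is what the choice of flexibility tries to minimize.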
Example
Left: Data simulated from f , shown in black. Three estimates of f are shown:
the linear regression line (orange curve), and two nonparametric fits (blue and
green curves). Right: Training MSE (grey curve), test MSE (red curve), and
minimum possible test MSE over all methods (dashed line). Squares represent
the training and test MSEs for the three fits shown in the left-hand panel.
Example
Squared bias (green curve), variance (orange curve), $\mathrm{var}(\epsilon)$ (dashed line), and
test MSE (red curve). The vertical dotted line indicates the flexibility level
corresponding to the smallest test MSE.