2. Assessing Model Accuracy
- No single method dominates all others over all possible data sets.
- Different methods require different model assumptions.
- For any given data set, it is an important task to decide which method produces the best results.
3. Measuring the Quality of Fit
Suppose we fit a model $\hat{f}(x)$ to some training data $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, and we wish to see how well it performs.
- Mean squared error (MSE):
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - \hat{f}(x_i)\bigr)^2.$$
- It is also called the training MSE, because it is computed using the training data.
- However, it is not a valid measure of model fit, because overfit models usually have a smaller training MSE. (If we look for the model with the smallest training MSE, we usually pick an overfit model, which has too large a variance.)
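As a minimal sketch, the training MSE above is a direct average of squared residuals; the observed and fitted values below are made-up toy numbers, not from the figures in these slides.

```python
import numpy as np

def train_mse(y, y_hat):
    """Training MSE: average squared gap between observed y_i and fitted f_hat(x_i)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return np.mean((y - y_hat) ** 2)

# Hypothetical observed responses and fitted values, for illustration only.
y = [1.0, 2.0, 3.0]
y_hat = [1.1, 1.9, 3.2]
print(train_mse(y, y_hat))  # ≈ 0.02
```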
4. Measuring the Quality of Fit
- Test data refers to data that were not used to train the statistical model (i.e., not used to calculate $\hat{f}$).
- Test MSE. Instead of the training MSE, we should look at the test MSE. Suppose we have test data $\{(x_{T1}, y_{T1}), \ldots, (x_{Tm}, y_{Tm})\}$. Then
$$\mathrm{MSE}_T = \frac{1}{m} \sum_{i=1}^{m} \bigl(y_{Ti} - \hat{f}(x_{Ti})\bigr)^2.$$
- We'd like to select the model for which the test MSE is as small as possible.
- How do we calculate $\mathrm{MSE}_T$? If test data are available, we calculate $\mathrm{MSE}_T$ directly. If there are no test data, we use cross-validation (Chapter 5).
5. Training MSE vs Test MSE
Left: Data simulated from f , shown in black. Three estimates of f are shown:
the linear regression line (orange curve), and two nonparametric fits (blue and
green curves). Right: Training MSE (grey curve), test MSE (red curve), and
minimum possible test MSE over all methods (dashed line). Squares represent
the training and test MSEs for the three fits shown in the left-hand panel.
6. Training MSE vs Test MSE
Here, we use a different true f that is much closer to linear. In this setting,
linear regression provides a very good fit to the data.
7. Training MSE vs Test MSE
Here, we use an f that is far from linear. In this setting, linear regression provides a very poor fit to the data.
8. Bias-Variance Trade-off
Suppose we have an estimator $\hat{f}(x)$ obtained from the training data, and let $(x_0, y_0)$ be a test observation drawn from the population.
If the true model is $Y = f(X) + \varepsilon$ (with $f(x) = E(Y \mid X = x)$), then
$$E\bigl[(Y_0 - \hat{f}(X))^2 \mid X = x_0\bigr] = \underbrace{\mathrm{var}(\hat{f}(x_0))}_{\text{Variance}} + \underbrace{\bigl[E[\hat{f}(x_0)] - f(x_0)\bigr]^2}_{\text{Bias}^2} + \underbrace{\mathrm{var}(\varepsilon)}_{\text{Irreducible error}}.$$
The expectation is over the variability of $y_0$ as well as over the training data.
$E[(Y_0 - \hat{f}(X))^2 \mid X = x_0]$ is called the expected test MSE at $x_0$.
The expected test MSE is at least the irreducible error: $E[(Y_0 - \hat{f}(X))^2 \mid X = x_0] \ge \mathrm{var}(\varepsilon)$.
We want to select a learning method that minimizes the expected test MSE.
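The decomposition can be checked by simulation. The sketch below uses an assumed toy setup (not from these slides): $f(x_0) = 2$, noise sd $\sigma = 1$, and a deliberately biased estimator $\hat{f} = 0.8\,\bar{y}$. The Monte Carlo estimate of the expected test MSE should closely match variance + squared bias + $\mathrm{var}(\varepsilon)$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy setup: f(x0) = 2.0, noise sd sigma = 1.0, n = 10 training
# responses observed at x0, and the (deliberately biased) estimator 0.8 * mean(y).
f_x0, sigma, n, reps = 2.0, 1.0, 10, 200_000

y_train = f_x0 + rng.normal(0, sigma, (reps, n))  # reps independent training sets
f_hat = 0.8 * y_train.mean(axis=1)                # one estimate of f(x0) per set

y0 = f_x0 + rng.normal(0, sigma, reps)            # fresh test responses at x0
expected_test_mse = np.mean((y0 - f_hat) ** 2)    # Monte Carlo E[(Y0 - f_hat)^2]

variance = f_hat.var()                            # var(f_hat(x0)) ≈ 0.64 * sigma^2 / n
bias_sq = (f_hat.mean() - f_x0) ** 2              # [E f_hat(x0) - f(x0)]^2 ≈ 0.16
print(expected_test_mse, variance + bias_sq + sigma**2)  # the two agree closely
```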
9. What is Bias-Variance Trade-off?
Variance: how much $\hat{f}$ would change if we estimated it using a different training data set.
Bias: the error introduced by approximating a real-life problem (e.g., the true relationship between response and predictors is nonlinear, but we fit a linear model, which causes bias).
Typically, as the flexibility of $\hat{f}$ increases (e.g., nonparametric methods), its variance increases and its bias decreases. On the other hand, if $\hat{f}$ is less flexible (e.g., a linear model), its variance is usually small and its bias large.
So choosing the flexibility based on expected test MSE amounts to a
bias-variance trade-off.
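The trade-off can be illustrated with polynomial fits of increasing degree. This sketch assumes a toy truth $f(x) = \sin(2\pi x)$ with noise sd 0.3 (all values made up, not the simulation behind the figures).

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):                        # assumed true regression function
    return np.sin(2 * np.pi * x)

x_tr = rng.uniform(0, 1, 30)
y_tr = f(x_tr) + rng.normal(0, 0.3, 30)
x_te = rng.uniform(0, 1, 500)
y_te = f(x_te) + rng.normal(0, 0.3, 500)

results = {}
for d in (1, 3, 10):             # increasing flexibility
    c = np.polyfit(x_tr, y_tr, d)
    tr = np.mean((y_tr - np.polyval(c, x_tr)) ** 2)
    te = np.mean((y_te - np.polyval(c, x_te)) ** 2)
    results[d] = (tr, te)
    print(f"degree {d:2d}: train MSE {tr:.3f}, test MSE {te:.3f}")
```

In runs like this, the training MSE falls as the degree grows (nested least-squares fits can only improve on the training data), while the test MSE typically bottoms out at an intermediate degree, mirroring the U-shaped red curves in the figures.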
10. Example
Left: Data simulated from f , shown in black. Three estimates of f are shown:
the linear regression line (orange curve), and two nonparametric fits (blue and
green curves). Right: Training MSE (grey curve), test MSE (red curve), and
minimum possible test MSE over all methods (dashed line). Squares represent
the training and test MSEs for the three fits shown in the left-hand panel.
11. Example
Squared bias (green curve), variance (orange curve), $\mathrm{var}(\varepsilon)$ (dashed line), and test MSE (red curve). The vertical dotted line indicates the flexibility level corresponding to the smallest test MSE.