Assessing
Performance
Emily Fox & Carlos Guestrin
Machine Learning Specialization
University of Washington
©2015 Emily Fox & Carlos Guestrin
Make predictions, get $, right??
Model + algorithm → fitted function f
Predictions → decisions → outcome
Or, how much am I losing?
Example: Lost $ due to inaccurate listing price
- Too low → low offers
- Too high → few lookers + no/low offers
How much am I losing compared to perfection?
Perfect predictions: Loss = 0
My predictions: Loss = ???
Measuring loss
Loss function:
L(y,fŵ(x))
Examples:
(assuming loss for underpredicting = overpredicting)
Absolute error: L(y,fŵ(x)) = |y − fŵ(x)|
Squared error: L(y,fŵ(x)) = (y − fŵ(x))²
y = actual value
ŷ = fŵ(x) = predicted value
L(y,fŵ(x)) = cost of using ŵ at x when y is true
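As a sketch, the two losses above in code (function names are illustrative):

```python
def absolute_error(y, y_hat):
    """L(y, f(x)) = |y - f(x)|"""
    return abs(y - y_hat)

def squared_error(y, y_hat):
    """L(y, f(x)) = (y - f(x))^2 -- penalizes large mistakes much more heavily"""
    return (y - y_hat) ** 2

# True price $310,000, predicted price $300,000
print(absolute_error(310_000, 300_000))  # 10000
print(squared_error(310_000, 300_000))   # 100000000
```

Both are symmetric, matching the slide's assumption that the loss for underpredicting equals the loss for overpredicting.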
“Remember that all models are
wrong; the practical question is
how wrong do they have to be to
not be useful.” George Box, 1987.
Assessing the loss
Assessing the loss
Part 1: Training error
Define training data
[Plot: training data — price ($) vs. square feet (sq.ft.)]
Example:
Fit quadratic to minimize RSS
[Plot: quadratic fit to price ($) vs. square feet (sq.ft.)]
ŵ minimizes RSS of training data
Compute training error
1.  Define a loss function L(y,fŵ(x))
-  E.g., squared error, absolute error,…
2.  Training error
= avg. loss on houses in training set
= (1/N) Σᵢ₌₁ᴺ L(yᵢ, fŵ(xᵢ)),  with ŵ fit using training data
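The average-loss definition translates directly; a minimal sketch where `loss` is any per-house loss function (names illustrative):

```python
def training_error(ys, preds, loss):
    """(1/N) * sum_i L(y_i, f_w(x_i)) over the training set."""
    assert len(ys) == len(preds)
    return sum(loss(y, p) for y, p in zip(ys, preds)) / len(ys)

squared = lambda y, p: (y - p) ** 2
err = training_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0], squared)
print(err)  # (0 + 0.25 + 1.0) / 3
```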
Example:
Use squared error loss (y − fŵ(x))²
Training error (ŵ) = 1/N ×
[($train 1 − fŵ(sq.ft.train 1))²
+ ($train 2 − fŵ(sq.ft.train 2))²
+ ($train 3 − fŵ(sq.ft.train 3))²
+ … include all training houses]
Example:
Use squared error loss (y − fŵ(x))²
Training error (ŵ) = (1/N) Σᵢ₌₁ᴺ (yᵢ − fŵ(xᵢ))²

RMSE (ŵ) = √[ (1/N) Σᵢ₌₁ᴺ (yᵢ − fŵ(xᵢ))² ]
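A sketch of both formulas on synthetic sq.ft./price data (the data-generating numbers are made up), fitting the quadratic by least squares with `numpy.polyfit`:

```python
import numpy as np

rng = np.random.default_rng(0)
sqft = rng.uniform(500, 4000, size=50)
price = 50_000 + 100 * sqft + 0.01 * sqft**2 + rng.normal(0, 20_000, size=50)

# Fit a quadratic by least squares (i.e., w-hat minimizes RSS on the training data)
w = np.polyfit(sqft, price, deg=2)
preds = np.polyval(w, sqft)

train_err = np.mean((price - preds) ** 2)   # squared-error training error, units of $^2
rmse = np.sqrt(train_err)                   # RMSE, back in units of $
print(rmse)
```

RMSE is often preferred for reporting because it is in the same units as y (dollars here), unlike the squared-error training error.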
Training error vs.
model complexity
[Plot: Error vs. model complexity — training error decreases as complexity increases; insets show fits to price ($) vs. sq.ft. data]
Is training error a good measure
of predictive performance?
How do we expect to perform on
a new house?
Is training error a good measure
of predictive performance?
Is there something particularly bad
about having xt square feet???
  
Is training error a good measure
of predictive performance?
Issue: Training error is overly optimistic
because ŵ was fit to training data
Small training error ⇏ good predictions, unless
training data includes everything you might ever see
Assessing the loss
Part 2: Generalization (true) error
Generalization error
Really want an estimate of loss
over all possible (house, $) pairs
Lots of houses
in neighborhood,
but not in dataset
Distribution over houses
In our neighborhood, houses of what
# sq.ft. are we likely to see?
[Plot: distribution over square feet (sq.ft.)]
  
Distribution over sales prices
For houses with a given # sq.ft.,
what house prices $ are we likely to see?
[Plot: distribution over price ($), for a fixed # sq.ft.]
Generalization error definition
Really want an estimate of loss
over all possible (house, $) pairs
Formally:
generalization error = E_{x,y}[L(y,fŵ(x))]
ŵ fit using training data; the expectation averages over
all possible (x,y) pairs, weighted by how likely each is
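Generalization error can never be computed on real data, but on synthetic data, where we choose the true function and the (x, y) distribution ourselves, the expectation can be approximated by averaging the loss over a large fresh sample. A sketch (all distributions here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
f_true = lambda x: 50_000 + 150 * x          # assumed "true" price function

def sample(n):
    """Draw n (x, y) pairs from the assumed house distribution."""
    x = rng.uniform(500, 4000, n)
    y = f_true(x) + rng.normal(0, 20_000, n)
    return x, y

# Fit a line on a small training set
x_tr, y_tr = sample(30)
w = np.polyfit(x_tr, y_tr, deg=1)

# Approximate E_{x,y}[ (y - f_w(x))^2 ] with a large Monte Carlo sample
x_new, y_new = sample(200_000)
gen_err = np.mean((y_new - np.polyval(w, x_new)) ** 2)
print(gen_err)  # close to the noise variance (20_000^2), plus the fit's own error
```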
Generalization error vs.
model complexity
[Plot: generalization error vs. model complexity — decreases, then increases; insets show fits fŵ to price ($) vs. sq.ft. data]
Can’t compute!
Assessing the loss
Part 3: Test error
Approximating
generalization error
Wanted estimate of loss
over all possible (house, $) pairs
Approximate by looking at
houses not in training set
Forming a test set
Hold out some (house, $) pairs that are
not used for fitting the model
Training set
Test set
Proxy for “everything you
might see”
Compute test error
Test error
= avg. loss on houses in test set
= (1/N_test) Σ_{i in test set} L(yᵢ, fŵ(xᵢ)),
with ŵ fit using training data — it has never seen the test data; N_test = # test points
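A sketch of the test-error computation with a held-out set (data and split sizes invented):

```python
import numpy as np

rng = np.random.default_rng(2)
sqft = rng.uniform(500, 4000, 100)
price = 50_000 + 150 * sqft + rng.normal(0, 20_000, 100)

# Hold out 20 houses that are never used for fitting
idx = rng.permutation(100)
train, test = idx[:80], idx[80:]

w = np.polyfit(sqft[train], price[train], deg=1)  # w-hat has never seen the test data
test_preds = np.polyval(w, sqft[test])
test_err = np.mean((price[test] - test_preds) ** 2)  # (1/N_test) * sum of squared errors
print(test_err)
```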
Example: As before,
fit quadratic to training data
ŵ minimizes
RSS of
training data
Example: As before,
use squared error loss (y − fŵ(x))²
Test error (ŵ) = 1/N_test ×
[($test 1 − fŵ(sq.ft.test 1))²
+ ($test 2 − fŵ(sq.ft.test 2))²
+ ($test 3 − fŵ(sq.ft.test 3))²
+ … include all test houses]
Training, true, & test error vs.
model complexity
[Plot: training error decreases with model complexity; true/test error decreases, then increases]
Overfitting if there exists a model w′ such that:
- training_error(ŵ) < training_error(w′), but
- true_error(ŵ) > true_error(w′)
Training/test split
Training/test splits
Training set vs. test set: how many points in each?
Training/test splits
Too few training points → ŵ poorly estimated
Too few test points → test error is a bad approximation
of generalization error
Training/test splits
Typically, keep just enough test points to form a
reasonable estimate of generalization error.
If this leaves too few for training, use other
methods like cross validation (will see later…)
3 sources of error +
the bias-variance tradeoff
3 sources of error
In forming predictions, there
are 3 sources of error:
1.  Noise
2.  Bias
3.  Variance
Data inherently noisy
yᵢ = f_w(true)(xᵢ) + εᵢ
ε is inherent noise, with variance σ² — the irreducible error
Bias contribution
Assume we fit a constant function
[Plots: constant fits fŵ(train1) and fŵ(train2) to two different training sets — N house sales (house, $) vs. N other house sales]
Bias contribution
Over all possible size N training sets,
what do I expect my fit to be?
[Plot: specific fits fŵ(train1), fŵ(train2), fŵ(train3), their average f_w̄, and the true function f_w(true)]
Bias contribution
Bias(x) = fw(true)(x) - fw̄(x)
low complexity → high bias
Is our approach flexible
enough to capture fw(true)?
If not, error in predictions.
Variance contribution
How much do specific fits
vary from the expected fit?
low complexity → low variance
Can specific fits
vary widely?
If so, erratic
predictions
Variance of high-complexity models
Assume we fit a high-order polynomial
Variance of high-complexity models
high complexity → high variance
Bias of high-complexity models
high complexity → low bias
Bias-variance tradeoff
[Plot: as model complexity grows, bias² decreases and variance increases; their sum is minimized at an intermediate complexity]
Error vs. amount of data
[Plot: Error vs. # data points in training set — true error decreases, and training error increases, toward a common limit as N grows]
More in depth on the
3 sources of error…
OPTIONAL
Accounting for training set randomness
Training set was just a random sample of N houses sold
What if N other houses had been sold and recorded?
[Plots: fits fŵ(1) and fŵ(2) obtained from two different random samples of N house sales]
generalization error of ŵ(1) vs. generalization error of ŵ(2)
Accounting for training set randomness
Ideally, want performance averaged
over all possible training sets of size N
Expected prediction error
E_{training set}[generalization error of ŵ(training set)]
averaging over all training sets (weighted by how likely each is);
ŵ(training set) = parameters fit on a specific training set
Prediction error at target input
Start by considering:
1.  Loss at target xt (e.g., 2640 sq.ft.)
2.  Squared error loss L(y,fŵ(x)) = (y − fŵ(x))²
Sum of 3 sources of error
Average prediction error at xt
= σ² + [bias(fŵ(xt))]² + var(fŵ(xt))
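The decomposition can be checked numerically on synthetic data, where f_w(true) and σ are known. This sketch fits a constant function (the low-complexity example above) on many independent training sets and compares the simulated average prediction error at xt with σ² + bias² + variance:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 1.0
f_true = lambda x: 2.0 + 0.5 * x
xt = 1.5                                   # target input

# Fit a constant (the mean of y) on many independent size-N training sets
N, trials = 20, 20_000
fits = np.empty(trials)
for t in range(trials):
    x = rng.uniform(0.0, 2.0, N)
    y = f_true(x) + rng.normal(0.0, sigma, N)
    fits[t] = y.mean()                     # constant fit: f_hat(x) = mean(y)

bias = f_true(xt) - fits.mean()            # f_true(xt) - f_wbar(xt)
var = fits.var()                           # E[(f_hat(xt) - f_wbar(xt))^2]

# Simulated average prediction error at xt, over fresh (training set, y_t) pairs
y_t = f_true(xt) + rng.normal(0.0, sigma, trials)
avg_err = np.mean((y_t - fits) ** 2)

print(avg_err, sigma**2 + bias**2 + var)   # the two numbers should nearly agree
```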
Error variance of the model
Average prediction error at xt
= σ² + [bias(fŵ(xt))]² + var(fŵ(xt))
σ² = variance of the noise ε, where y = f_w(true)(x) + ε — the irreducible error
Bias of function estimator
Average prediction error at xt
= σ² + [bias(fŵ(xt))]² + var(fŵ(xt))
Bias of function estimator
Average estimated function = fw̄(x)
True function = fw(x)
f_w̄(x) = E_train[fŵ(train)(x)], averaging over all training sets of size N
bias(fŵ(xt)) = f_w(xt) − f_w̄(xt)
Variance of function estimator
var(fŵ(xt)) = E_train[(fŵ(train)(xt) − f_w̄(xt))²]
- expectation over all training sets of size N
- fŵ(train): fit on a specific training dataset
- f_w̄: what I expect to learn over all training sets
- the squared term: deviation of the specific fit from the expected fit at xt
Why 3 sources of error?
A formal derivation
OPTIONAL
Deriving expected
prediction error
Expected prediction error
= E_train[generalization error of ŵ(train)]
= E_train[E_{x,y}[L(y,fŵ(train)(x))]]
1.  Look at specific xt
2.  Consider L(y,fŵ(x)) = (y − fŵ(x))²
Expected prediction error at xt
= E_{train,yt}[(yt − fŵ(train)(xt))²]
Expected prediction error at xt
= E_{train,yt}[(yt − fŵ(train)(xt))²]
= E_{train,yt}[((yt − f_w(true)(xt)) + (f_w(true)(xt) − fŵ(train)(xt)))²]
= σ² + MSE[fŵ(train)(xt)]
(the cross term vanishes: the noise yt − f_w(true)(xt) has mean zero and is independent of the training set)
Equating MSE with
bias and variance
MSE[fŵ(train)(xt)]
= E_train[(f_w(true)(xt) − fŵ(train)(xt))²]
= E_train[((f_w(true)(xt) − f_w̄(xt)) + (f_w̄(xt) − fŵ(train)(xt)))²]
= [bias(fŵ(xt))]² + var(fŵ(xt))
(the cross term vanishes by the definition f_w̄ = E_train[fŵ(train)])
Putting it all together
Expected prediction error at xt
= σ² + MSE[fŵ(xt)]
= σ² + [bias(fŵ(xt))]² + var(fŵ(xt))
3 sources of error
Summary of tasks
The regression/ML workflow
1.  Model selection
Often, need to choose tuning
parameters λ controlling model
complexity (e.g. degree of polynomial)
2.  Model assessment
Having selected a model, assess
the generalization error
Hypothetical implementation
1.  Model selection
For each considered model complexity λ :
i.  Estimate parameters ŵλ on training data
ii.  Assess performance of ŵλ on test data
iii.  Choose λ* to be λ with lowest test error
2.  Model assessment
Compute test error of ŵλ* (fitted model for selected
complexity λ*) to approx. generalization error
  
Hypothetical implementation
1.  Model selection
For each considered model complexity λ :
i.  Estimate parameters ŵλ on training data
ii.  Assess performance of ŵλ on test data
iii.  Choose λ* to be λ with lowest test error
2.  Model assessment
Compute test error of ŵλ* (fitted model for selected
complexity λ*) to approx. generalization error
©2015	
  Emily	
  Fox	
  &	
  Carlos	
  Guestrin	
  
Overly optimistic!
Hypothetical implementation
Issue: Just like fitting ŵ and assessing its
performance both on training data
•  λ* was selected to minimize test error
(i.e., λ* was fit on test data)
•  If test data is not representative of the whole
world, then ŵλ* will typically perform worse than
test error indicates
Practical implementation
Solution: Create two “test” sets!
1.  Select λ* such that ŵλ* minimizes error on
validation set
2.  Approximate generalization error of ŵλ* using
test set
Training set | Validation set | Test set
Practical implementation
Training set: fit ŵλ
Validation set: test performance of ŵλ to select λ*
Test set: assess generalization error of ŵλ*
Typical splits
Training set | Validation set | Test set
    80%     |      10%       |   10%
    50%     |      25%       |   25%
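The full workflow — fit on training, select λ on validation, assess on test — can be sketched end to end, with polynomial degree playing the role of λ (data and split sizes invented):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0.0, 3.0, 300)
y = np.sin(x) + rng.normal(0.0, 0.1, 300)

# 80% / 10% / 10% split
idx = rng.permutation(300)
tr, va, te = idx[:240], idx[240:270], idx[270:]

def mse(w, ix):
    return np.mean((y[ix] - np.polyval(w, x[ix])) ** 2)

# 1. Model selection: choose degree (the tuning parameter) on the VALIDATION set
best_deg, best_err = None, np.inf
for deg in range(1, 9):
    w = np.polyfit(x[tr], y[tr], deg)      # parameters fit on training data only
    err = mse(w, va)
    if err < best_err:
        best_deg, best_err = deg, err

# 2. Model assessment: the untouched TEST set approximates generalization error
w_star = np.polyfit(x[tr], y[tr], best_deg)
print(best_deg, mse(w_star, te))
```

Reusing the test set to pick the degree would make the reported test error overly optimistic — exactly the issue the validation set avoids.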
Summary of
assessing performance
What you can do now…
•  Describe what a loss function is and give examples
•  Contrast training, generalization, and test error
•  Compute training and test error given a loss function
•  Discuss issue of assessing performance on training set
•  Describe tradeoffs in forming training/test splits
•  List and interpret the 3 sources of avg. prediction error
-  Irreducible error, bias, and variance
•  Discuss issue of selecting model complexity on test data
and then using test error to assess generalization error
•  Motivate use of a validation set for selecting tuning
parameters (e.g., model complexity)
•  Describe overall regression workflow

KHUSWANT SINGH.pptx ALL YOU NEED TO KNOW ABOUT KHUSHWANT SINGH
 
220711130100 udita Chakraborty Aims and objectives of national policy on inf...
220711130100 udita Chakraborty  Aims and objectives of national policy on inf...220711130100 udita Chakraborty  Aims and objectives of national policy on inf...
220711130100 udita Chakraborty Aims and objectives of national policy on inf...
 
Educational Technology in the Health Sciences
Educational Technology in the Health SciencesEducational Technology in the Health Sciences
Educational Technology in the Health Sciences
 
Accounting for Restricted Grants When and How To Record Properly
Accounting for Restricted Grants  When and How To Record ProperlyAccounting for Restricted Grants  When and How To Record Properly
Accounting for Restricted Grants When and How To Record Properly
 
THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...
THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...
THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...
 
INTRODUCTION TO HOSPITALS & AND ITS ORGANIZATION
INTRODUCTION TO HOSPITALS & AND ITS ORGANIZATION INTRODUCTION TO HOSPITALS & AND ITS ORGANIZATION
INTRODUCTION TO HOSPITALS & AND ITS ORGANIZATION
 
Temple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation resultsTemple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation results
 
220711130082 Srabanti Bag Internet Resources For Natural Science
220711130082 Srabanti Bag Internet Resources For Natural Science220711130082 Srabanti Bag Internet Resources For Natural Science
220711130082 Srabanti Bag Internet Resources For Natural Science
 
How to Setup Default Value for a Field in Odoo 17
How to Setup Default Value for a Field in Odoo 17How to Setup Default Value for a Field in Odoo 17
How to Setup Default Value for a Field in Odoo 17
 
Gender and Mental Health - Counselling and Family Therapy Applications and In...
Gender and Mental Health - Counselling and Family Therapy Applications and In...Gender and Mental Health - Counselling and Family Therapy Applications and In...
Gender and Mental Health - Counselling and Family Therapy Applications and In...
 
Simple-Present-Tense xxxxxxxxxxxxxxxxxxx
Simple-Present-Tense xxxxxxxxxxxxxxxxxxxSimple-Present-Tense xxxxxxxxxxxxxxxxxxx
Simple-Present-Tense xxxxxxxxxxxxxxxxxxx
 
A Free 200-Page eBook ~ Brain and Mind Exercise.pptx
A Free 200-Page eBook ~ Brain and Mind Exercise.pptxA Free 200-Page eBook ~ Brain and Mind Exercise.pptx
A Free 200-Page eBook ~ Brain and Mind Exercise.pptx
 

C2.3

  • 1. Machine Learning Specialization: Assessing Performance. Emily Fox & Carlos Guestrin, University of Washington. ©2015 Emily Fox & Carlos Guestrin
  • 2. Make predictions, get $, right?? Model + algorithm → fitted function f; predictions → decisions → outcome
  • 3. Or, how much am I losing? Example: lost $ due to an inaccurate listing price. Too low → low offers; too high → few lookers + no/low offers. How much am I losing compared to perfection? Perfect predictions: loss = 0. My predictions: loss = ???
  • 4. Measuring loss. Loss function L(y, fŵ(x)) = cost of using ŵ at x when y is the true value (y = actual value, fŵ(x) = predicted value ŷ). Examples (assuming the loss for underpredicting equals the loss for overpredicting): absolute error L(y, fŵ(x)) = |y − fŵ(x)|; squared error L(y, fŵ(x)) = (y − fŵ(x))²
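The two example losses can be sketched directly; a minimal illustration (the function names and dollar figures are mine, not the course's):

```python
def absolute_error(y, y_hat):
    """L(y, fw(x)) = |y - fw(x)|: cost grows linearly with the size of the miss."""
    return abs(y - y_hat)

def squared_error(y, y_hat):
    """L(y, fw(x)) = (y - fw(x))^2: large misses are penalized much more heavily."""
    return (y - y_hat) ** 2

# Hypothetical listing: true sale price $510,000, predicted $500,000.
print(absolute_error(510_000, 500_000))  # 10000
print(squared_error(510_000, 500_000))   # 100000000
```

Both losses are symmetric, matching the slide's assumption that under- and over-predicting cost the same.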
  • 5. "Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful." George Box, 1987.
  • 6. Assessing the loss
  • 7. Assessing the loss. Part 1: Training error
  • 8. Define training data [plot: price ($) vs. square feet (sq.ft.); x = sq.ft., y = price]
  • 9. Define training data [same plot, with the training observations marked]
  • 10. Example: fit a quadratic to minimize RSS [plot: ŵ minimizes RSS of the training data]
  • 11. Compute training error: 1. Define a loss function L(y, fŵ(x)), e.g., squared error, absolute error, … 2. Training error = avg. loss on houses in the training set = (1/N) Σ_{i=1}^{N} L(yᵢ, fŵ(xᵢ)), with ŵ fit using the training data
  • 12. Example: use squared error loss (y − fŵ(x))². Training error(ŵ) = 1/N · [($train1 − fŵ(sq.ft.train1))² + ($train2 − fŵ(sq.ft.train2))² + ($train3 − fŵ(sq.ft.train3))² + … include all training houses]
  • 13. Training error(ŵ) = (1/N) Σ_{i=1}^{N} (yᵢ − fŵ(xᵢ))²; RMSE(ŵ) = sqrt( (1/N) Σ_{i=1}^{N} (yᵢ − fŵ(xᵢ))² ), which is in the same units as y
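Slides 11–13 in code form; a small sketch assuming squared-error loss (the toy data and the quadratic f are made up for illustration):

```python
import math

def training_error(xs, ys, f):
    """(1/N) * sum over training points of (y_i - f(x_i))^2."""
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def rmse(xs, ys, f):
    """Root-mean-squared error: sqrt of the avg. squared error, in the units of y."""
    return math.sqrt(training_error(xs, ys, f))

f = lambda x: 2 * x ** 2                   # a fitted quadratic (stand-in for f_ŵ)
xs, ys = [1.0, 2.0, 3.0], [2.0, 9.0, 18.0]
print(training_error(xs, ys, f))           # (0^2 + 1^2 + 0^2) / 3
print(rmse(xs, ys, f))
```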
  • 14.–18. Training error vs. model complexity [sequence of plots: fits of increasing complexity to the price vs. sq.ft. data; training error decreases as model complexity increases]
  • 19. Is training error a good measure of predictive performance? How do we expect to perform on a new house?
  • 20. Is training error a good measure of predictive performance? Is there something particularly bad about having xₜ square feet???
  • 21. Issue: training error is overly optimistic, because ŵ was fit to the training data. Small training error does not imply good predictions, unless the training data includes everything you might ever see.
  • 22. Assessing the loss. Part 2: Generalization (true) error
  • 23. Generalization error: really want an estimate of the loss over all possible (house, $) pairs. Lots of houses in the neighborhood, but not in the dataset.
  • 24. Distribution over houses: in our neighborhood, houses of what # sq.ft. are we likely to see?
  • 25. Distribution over sales prices: for houses with a given # sq.ft., what house prices $ are we likely to see?
  • 26. Generalization error definition: generalization error = E_{x,y}[L(y, fŵ(x))], i.e., the average over all possible (x, y) pairs, weighted by how likely each is, with ŵ fit using the training data
  • 27.–32. Generalization error vs. model complexity [sequence of plots: generalization error first falls, then rises as complexity grows; but we can't compute it!]
  • 33. Assessing the loss. Part 3: Test error
  • 34. Approximating generalization error: wanted an estimate of the loss over all possible (house, $) pairs. Approximate it by looking at houses not in the training set.
  • 35. Forming a test set: hold out some (house, $) pairs that are not used for fitting the model. [Training set | Test set]
  • 36. Forming a test set: the held-out test set is a proxy for "everything you might see."
  • 37. Compute test error: test error = avg. loss on houses in the test set = (1/N_test) Σ_{i in test set} L(yᵢ, fŵ(xᵢ)), where ŵ was fit using the training data and has never seen the test data!
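A sketch of the whole train/test procedure on synthetic house-like data (all numbers invented); the key point is that the test houses are held out before fitting:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: price is a noisy linear function of square feet.
x = rng.uniform(500, 4000, size=200)
y = 100.0 * x + 50_000 + rng.normal(0, 20_000, size=200)

# Hold out 40 houses that are never used for fitting.
mask = np.zeros(200, dtype=bool)
mask[rng.choice(200, size=40, replace=False)] = True
x_tr, y_tr, x_te, y_te = x[~mask], y[~mask], x[mask], y[mask]

# Fit a line by least squares on the training houses only.
f = np.poly1d(np.polyfit(x_tr, y_tr, deg=1))

train_error = np.mean((y_tr - f(x_tr)) ** 2)  # avg. loss on training houses
test_error = np.mean((y_te - f(x_te)) ** 2)   # avg. loss on houses f has never seen
print(train_error, test_error)
```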
  • 38. Example: as before, fit a quadratic to the training data (ŵ minimizes RSS of the training data)
  • 39. Example: as before, use squared error loss (y − fŵ(x))². Test error(ŵ) = 1/N_test · [($test1 − fŵ(sq.ft.test1))² + ($test2 − fŵ(sq.ft.test2))² + ($test3 − fŵ(sq.ft.test3))² + … include all test houses]
  • 40. Training, true, & test error vs. model complexity. Overfitting if: training error is small but true (generalization) error is large.
  • 41. Training/test split
  • 42. Training/test splits: how many points in the training set vs. how many in the test set?
  • 43. Training set too small → ŵ poorly estimated
  • 44. Test set too small → test error is a bad approximation of generalization error
  • 45. Typically: just enough test points to form a reasonable estimate of generalization error. If this leaves too few for training, use other methods like cross validation (will see later…)
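The split itself is just a random partition of the data; a minimal pure-Python sketch (the helper name and fraction are my choices):

```python
import random

def train_test_split(data, test_fraction=0.2, seed=0):
    """Randomly hold out a fraction of the data as a test set; returns (train, test)."""
    rnd = random.Random(seed)
    shuffled = data[:]                    # copy so the caller's list is untouched
    rnd.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

houses = list(range(100))                 # stand-ins for (sq.ft., $) pairs
train, test = train_test_split(houses, test_fraction=0.2)
print(len(train), len(test))              # 80 20
```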
  • 46. 3 sources of error + the bias-variance tradeoff
  • 47. In forming predictions, there are 3 sources of error: 1. Noise 2. Bias 3. Variance
  • 48. Data are inherently noisy: yᵢ = f_{w(true)}(xᵢ) + εᵢ. The variance of the noise is irreducible error.
  • 49. Bias contribution: assume we fit a constant function. N house sales vs. N other house sales give two different fits, fŵ(train1) and fŵ(train2).
  • 50. Bias contribution: over all possible size-N training sets, what do I expect my fit to be? [plot: f_{w(true)}, the average fit f_w̄, and specific fits fŵ(train1), fŵ(train2), fŵ(train3)]
  • 51. Bias(x) = f_{w(true)}(x) − f_w̄(x). Is our approach flexible enough to capture f_{w(true)}? If not, there is error in the predictions. Low complexity → high bias.
  • 52.–53. Variance contribution: how much do specific fits vary from the expected fit f_w̄?
  • 54. Variance contribution: can specific fits vary widely? If so, predictions are erratic. Low complexity → low variance.
  • 55. Variance of high-complexity models: assume we fit a high-order polynomial; fits on different training sets differ wildly [plots: fŵ(train1), fŵ(train2)].
  • 56. Variance of high-complexity models [plot: f_w̄ with widely varying fŵ(train1), fŵ(train2), fŵ(train3)]
  • 57. Variance of high-complexity models: high complexity → high variance
  • 58. Bias of high-complexity models: high complexity → low bias [plot: f_w̄ close to f_{w(true)}]
  • 59. Bias-variance tradeoff [plot: as model complexity increases, bias² falls and variance rises; total error is minimized at an intermediate complexity]
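The tradeoff can be seen empirically by refitting on many simulated training sets, as in slides 49–58; a sketch using a sine stand-in for f_{w(true)} and np.polyfit (the degrees, sample sizes, and noise level are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)

def f_true(x):
    return np.sin(x)                      # stand-in for f_w(true)

def simulate(degree, n_sets=200, n=20, sigma=0.3, xt=1.5):
    """Fit a degree-`degree` polynomial on many random training sets and
    look at the spread of predictions at a target input xt."""
    preds = np.empty(n_sets)
    for s in range(n_sets):
        x = rng.uniform(0, np.pi, size=n)
        y = f_true(x) + rng.normal(0, sigma, size=n)
        preds[s] = np.polyval(np.polyfit(x, y, deg=degree), xt)
    bias = f_true(xt) - preds.mean()      # f_w(true)(xt) - f_w̄(xt)
    var = preds.var()                     # spread of specific fits around the mean fit
    return bias, var

bias_lo, var_lo = simulate(degree=0)      # constant fit: high bias, low variance
bias_hi, var_hi = simulate(degree=9)      # high-order polynomial: low bias, high variance
print(abs(bias_lo), var_lo)
print(abs(bias_hi), var_hi)
```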
  • 60. Error vs. amount of data [plot: as the # of data points in the training set grows, training error rises and true error falls, both approaching the same limit]
  • 61. More in depth on the 3 sources of error… (OPTIONAL)
  • 62. Accounting for training set randomness: the training set was just a random sample of N houses sold. What if N other houses had been sold and recorded? [plots: fŵ(1) and fŵ(2)]
  • 63. Accounting for training set randomness: each training set yields its own fit and its own generalization error (generalization error of ŵ(1), of ŵ(2)).
  • 64. Ideally, want performance averaged over all possible training sets of size N.
  • 65. Expected prediction error = E_{training set}[generalization error of ŵ(training set)], averaging over all training sets (weighted by how likely each is); ŵ(training set) = parameters fit on a specific training set.
  • 66. Prediction error at a target input: start by considering 1. the loss at a target xₜ (e.g., 2640 sq.ft.) and 2. squared error loss L(y, fŵ(x)) = (y − fŵ(x))².
  • 67. Sum of 3 sources of error: average prediction error at xₜ = σ² + [bias(fŵ(xₜ))]² + var(fŵ(xₜ))
  • 68. Error variance of the model: y = f_{w(true)}(x) + ε, with σ² = the "variance" of the noise; this is the irreducible error.
  • 69. Bias of the function estimator [plots: fits fŵ(train1), fŵ(train2) on two training sets]
  • 70. Average estimated function: f_w̄(x) = E_train[fŵ(train)(x)], averaging over all training sets of size N; true function = f_w(x).
  • 71. bias(fŵ(xₜ)) = f_w(xₜ) − f_w̄(xₜ)
  • 72. Bias of the function estimator: average prediction error at xₜ = σ² + [bias(fŵ(xₜ))]² + var(fŵ(xₜ))
  • 73. Variance of the function estimator [plot: f_w̄ with specific fits fŵ(train1), fŵ(train2), fŵ(train3)]
  • 74. Variance of the function estimator [plot: spread of fits around f_w̄ at xₜ]
  • 75. var(fŵ(xₜ)) = E_train[(fŵ(train)(xₜ) − f_w̄(xₜ))²]: the deviation of a specific fit (fit on a specific training dataset) from the expected fit (what I expect to learn over all training sets) at xₜ, averaged over all training sets of size N.
  • 76. Why 3 sources of error? A formal derivation (OPTIONAL)
  • 77. Deriving expected prediction error: expected prediction error = E_train[generalization error of ŵ(train)] = E_train[E_{x,y}[L(y, fŵ(train)(x))]]. 1. Look at a specific xₜ. 2. Consider L(y, fŵ(x)) = (y − fŵ(x))². Then: expected prediction error at xₜ = E_{train,yₜ}[(yₜ − fŵ(train)(xₜ))²]
  • 78. Expected prediction error at xₜ = E_{train,yₜ}[((yₜ − f_{w(true)}(xₜ)) + (f_{w(true)}(xₜ) − fŵ(train)(xₜ)))²]; the cross term has zero expectation since E[yₜ − f_{w(true)}(xₜ)] = E[ε] = 0, leaving σ² + MSE[fŵ(train)(xₜ)].
  • 79. Equating MSE with bias and variance: MSE[fŵ(train)(xₜ)] = E_train[(f_{w(true)}(xₜ) − fŵ(train)(xₜ))²] = E_train[((f_{w(true)}(xₜ) − f_w̄(xₜ)) + (f_w̄(xₜ) − fŵ(train)(xₜ)))²]; the cross term again vanishes in expectation, leaving [bias(fŵ(xₜ))]² + var(fŵ(xₜ)).
  • 80. Putting it all together: expected prediction error at xₜ = σ² + MSE[fŵ(xₜ)] = σ² + [bias(fŵ(xₜ))]² + var(fŵ(xₜ)): the 3 sources of error
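The decomposition on slide 80 can be checked by Monte Carlo with a deliberately-too-simple model (a constant fit to linear data; every number here is invented for the check):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, n, n_sets, xt = 0.3, 30, 2000, 1.8

def f_true(x):
    return 1.0 + 2.0 * x                  # stand-in true function f_w(true)

preds = np.empty(n_sets)                  # f_ŵ(xt) for each simulated training set
sq_err = np.empty(n_sets)                 # (yt - f_ŵ(xt))^2 for a fresh yt each time
for s in range(n_sets):
    x = rng.uniform(0, 2, size=n)
    y = f_true(x) + rng.normal(0, sigma, size=n)
    preds[s] = np.polyval(np.polyfit(x, y, deg=0), xt)   # constant fit = mean of y
    yt = f_true(xt) + rng.normal(0, sigma)               # fresh observation at xt
    sq_err[s] = (yt - preds[s]) ** 2

lhs = sq_err.mean()                                      # expected prediction error at xt
bias = f_true(xt) - preds.mean()
rhs = sigma**2 + bias**2 + preds.var()                   # sigma^2 + bias^2 + variance
print(lhs, rhs)                                          # the two should nearly agree
```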
  • 81. Summary of tasks
  • 82. The regression/ML workflow: 1. Model selection: often need to choose tuning parameters λ controlling model complexity (e.g., degree of polynomial). 2. Model assessment: having selected a model, assess its generalization error.
  • 83. Hypothetical implementation: 1. Model selection: for each considered model complexity λ: i. estimate parameters ŵλ on training data; ii. assess performance of ŵλ on test data; iii. choose λ* to be the λ with lowest test error. 2. Model assessment: compute the test error of ŵλ* (the fitted model for the selected complexity λ*) to approximate generalization error.
  • 84. Hypothetical implementation (continued): the resulting estimate is overly optimistic!
  • 85. Issue: just like fitting ŵ and assessing its performance both on training data. λ* was selected to minimize test error (i.e., λ* was fit on the test data). If the test data are not representative of the whole world, then ŵλ* will typically perform worse than the test error indicates.
  • 86. Practical implementation. Solution: create two "test" sets, splitting the data into training, validation, and test sets. 1. Select λ* such that ŵλ* minimizes error on the validation set. 2. Approximate the generalization error of ŵλ* using the test set.
  • 87. Practical implementation: training set → fit ŵλ; validation set → test performance of ŵλ to select λ*; test set → assess generalization error of ŵλ*.
  • 88. Typical splits: training / validation / test = 80% / 10% / 10%, or 50% / 25% / 25%.
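Slides 86–88 as a runnable sketch: an 80/10/10 split where the validation set picks the polynomial degree (my stand-in for λ) and the test set is touched only once, at the end (all data synthetic):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data and an 80% / 10% / 10% split as on slide 88.
n = 300
x = rng.uniform(0, 3, size=n)
y = np.sin(x) + rng.normal(0, 0.2, size=n)
idx = rng.permutation(n)
tr, va, te = idx[:240], idx[240:270], idx[270:]

def fit_and_score(degree, fit_idx, score_idx):
    """Fit a polynomial on one subset, return avg. squared error on another."""
    w = np.polyfit(x[fit_idx], y[fit_idx], deg=degree)
    pred = np.polyval(w, x[score_idx])
    return np.mean((y[score_idx] - pred) ** 2)

# 1. Model selection: choose the degree λ* with lowest validation error.
val_errs = {d: fit_and_score(d, tr, va) for d in range(0, 9)}
best = min(val_errs, key=val_errs.get)

# 2. Model assessment: report the test error of the selected model only.
test_err = fit_and_score(best, tr, te)
print(best, test_err)
```

Reusing the validation error of the winner as the final performance number would repeat the mistake of slide 85, which is why the test set enters only in step 2.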
  • 89. Summary of assessing performance
  • 90. What you can do now… • Describe what a loss function is and give examples • Contrast training, generalization, and test error • Compute training and test error given a loss function • Discuss the issue of assessing performance on the training set • Describe tradeoffs in forming training/test splits • List and interpret the 3 sources of avg. prediction error: irreducible error, bias, and variance • Discuss the issue of selecting model complexity on test data and then using test error to assess generalization error • Motivate the use of a validation set for selecting tuning parameters (e.g., model complexity) • Describe the overall regression workflow