This lecture introduces a central aspect of any machine learning application: the necessity of validating a machine learning method after it has been learned from training data.
CS-E3210 Machine Learning: Basic Principles
Lecture 7: Validation
slides by Alexander Jung, 2017
Department of Computer Science
Aalto University, School of Science
Autumn (Period I) 2017
Outline
Intro
A Simple Validation Method
Wrap Up
Background
this lecture is inspired by
the lecture notes http://cs229.stanford.edu/notes/cs229-notes4.pdf of Prof. Andrew Ng (Stanford)
a video lecture by Prof. Ng: https://www.youtube.com/watch?v=BpgnnS7mKKU
Chapter 5.2 of the course book [DLBook]
Ski Resort Marketing
you still did not find another job
thus, you still work in marketing at a ski resort
hard disk full of webcam snapshots (gigabytes of data)
want to group them into “winter” and “summer” images
you have only a few hours for this task ...
ML workflow so far...
create dataset $\mathbb{X} = \{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{N}$ by manual labeling
features $\mathbf{x}^{(i)} \in \mathcal{X}$ and label $y^{(i)} \in \mathcal{Y}$ of the $i$-th data point
define loss $L((\mathbf{x}, y), h(\cdot))$ (e.g., squared loss $L((\mathbf{x}, y), h(\cdot)) = (y - h(\mathbf{x}))^2$)
define hypothesis space $\mathcal{H}$ (e.g., linear maps $h(\mathbf{x}) = \mathbf{w}^{T}\mathbf{x}$)
learn predictor $h(\cdot): \mathcal{X} \to \mathcal{Y}$ by empirical risk minimization (ERM)
$$\min_{h(\cdot) \in \mathcal{H}} \mathcal{E}\{h(\cdot) \mid \mathbb{X}\} = \frac{1}{N} \sum_{i=1}^{N} L\big((\mathbf{x}^{(i)}, y^{(i)}), h(\cdot)\big)$$
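As a concrete illustration of this recipe, here is a minimal Python sketch (not from the original slides; the toy dataset, dimensions, and function names are invented for illustration). For squared loss and linear hypotheses, ERM reduces to ordinary least squares, so np.linalg.lstsq solves it in closed form:

```python
import numpy as np

def erm_linear(X, y):
    # Solve min_w (1/N) * sum_i (y_i - w^T x_i)^2; for squared loss and
    # linear hypotheses, ERM is an ordinary least-squares problem.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def empirical_risk(w, X, y):
    # Average squared loss of the linear predictor h(x) = w^T x on (X, y).
    return np.mean((y - X @ w) ** 2)

# Toy labeled dataset: N = 20 data points with d = 3 features each.
rng = np.random.default_rng(0)
X_data = rng.normal(size=(20, 3))
y_data = X_data @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)

w_opt = erm_linear(X_data, y_data)
print("empirical risk E{h_opt | X}:", empirical_risk(w_opt, X_data, y_data))
```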
So What?
want to predict the label $y$ from the features $\mathbf{x}$ of a new (unlabeled) data point, which does not belong to $\mathbb{X}$
how is $\mathcal{E}\{h(\cdot) \mid \mathbb{X}\}$ related to the average loss $L((\mathbf{x}, y), h(\cdot))$ incurred on such new data points?
i.e., how well does $h(\cdot)$ generalize from $\mathbb{X}$ to new data points?
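One standard way to make this question precise (an assumption left implicit on the slide, stated here explicitly) is to model data points as i.i.d. draws from a fixed probability distribution $p(\mathbf{x}, y)$; the quantity we would like $\mathcal{E}\{h(\cdot) \mid \mathbb{X}\}$ to track is then the expected loss:

```latex
% Expected loss ("risk") of a fixed predictor h under an assumed i.i.d.
% data model p(x, y). For data points NOT used when choosing h, the
% sample average of the loss is an unbiased estimate of this quantity.
\bar{L}\big(h(\cdot)\big)
  = \mathbb{E}_{(\mathbf{x}, y) \sim p}\Big[ L\big((\mathbf{x}, y), h(\cdot)\big) \Big]
```

For a predictor obtained by ERM on $\mathbb{X}$, the empirical risk $\mathcal{E}\{h(\cdot) \mid \mathbb{X}\}$ is an optimistically biased estimate of this expected loss, since $h(\cdot)$ was chosen precisely to make it small; this motivates evaluating on separate data, as the next slides do.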
Use Different Data for Training and Testing
1. run ERM on a dataset $\mathbb{X}^{(\mathrm{train})}$ to find the optimal predictor $h_{\mathrm{opt}}(\cdot)$
2. apply $h_{\mathrm{opt}}(\cdot)$ to another dataset $\mathbb{X}^{(\mathrm{test})}$ (of size $N'$) to get the average loss
$$\frac{1}{N'} \sum_{(\mathbf{x}, y) \in \mathbb{X}^{(\mathrm{test})}} L\big((\mathbf{x}, y), h_{\mathrm{opt}}(\cdot)\big)$$
Training and Testing
we randomly select and label $N$ data points to obtain $\mathbb{X}^{(\mathrm{train})}$
we randomly select and label $N'$ further data points to obtain $\mathbb{X}^{(\mathrm{test})}$
we learn the optimal classifier via ERM on the training set $\mathbb{X}^{(\mathrm{train})}$:
$$h_{\mathrm{opt}}(\cdot) = \operatorname*{argmin}_{h(\cdot) \in \mathcal{H}} \frac{1}{N} \sum_{(\mathbf{x}, y) \in \mathbb{X}^{(\mathrm{train})}} L\big((\mathbf{x}, y), h(\cdot)\big)$$
we then estimate the average loss using the test set $\mathbb{X}^{(\mathrm{test})}$:
$$\mathcal{E}\big(h_{\mathrm{opt}} \mid \mathbb{X}^{(\mathrm{test})}\big) = \frac{1}{N'} \sum_{(\mathbf{x}, y) \in \mathbb{X}^{(\mathrm{test})}} L\big((\mathbf{x}, y), h_{\mathrm{opt}}(\cdot)\big)$$
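A hedged Python sketch of this two-step procedure (the synthetic data generator, the sizes $N$ and $N'$, and all variable names are my own, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3                                   # number of features
w_true = rng.normal(size=d)             # "ground truth" behind the labels

def make_labeled_data(n):
    # Stand-in for randomly selecting and manually labeling n data points.
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.1 * rng.normal(size=n)
    return X, y

X_train, y_train = make_labeled_data(50)    # X^(train), N  = 50
X_test, y_test = make_labeled_data(50)      # X^(test),  N' = 50

# Step 1: ERM on the training set (least squares, as before).
w_opt, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Step 2: estimate the average loss on new data using the test set.
test_risk = np.mean((y_test - X_test @ w_opt) ** 2)
print("E(h_opt | X^(test)):", test_risk)
```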
Overfitting
using a test set allows us to diagnose “overfitting”
consider linear regression for predicting the daytime $y \in \mathbb{R}$ from $10 \times 10$ pixel snapshots $\mathbf{x} \in \mathbb{R}^{100}$
learn the predictor $h(\mathbf{x}) = \mathbf{w}^{T}\mathbf{x}$ using ERM on a dataset $\mathbb{X}^{(\mathrm{train})}$ containing $N = 4$ labeled data points $(\mathbf{x}^{(i)}, y^{(i)})$
how small can we make the empirical risk on $\mathbb{X}^{(\mathrm{train})}$?
since $\mathbf{w}$ has $100$ free parameters but must fit only $4$ data points, we can (generically) drive the empirical risk to zero, regardless of how the labels were obtained
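The sketch below (with made-up random "snapshots"; nothing here comes from the actual webcam data) illustrates the answer suggested above: with 100 parameters and only $N = 4$ training points, least squares drives the training risk to (numerically) zero while the test risk stays large:

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 4, 100                          # 4 training points, 10*10 = 100 features

X_train = rng.normal(size=(N, d))      # 4 flattened 10x10 "snapshots"
y_train = rng.uniform(0, 24, size=N)   # daytime labels in hours

# With 100 parameters and only 4 equations, least squares can (generically)
# interpolate the training labels exactly, whatever they are.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
print("training risk:", np.mean((y_train - X_train @ w) ** 2))   # ~ 0

# On fresh snapshots the interpolating predictor is typically far off.
X_test = rng.normal(size=(20, d))
y_test = rng.uniform(0, 24, size=20)
print("test risk:", np.mean((y_test - X_test @ w) ** 2))         # large
```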
Key Message Today
follow the basic ML recipe to get an optimal predictor/classifier
DO NOT STOP AFTER THE OPTIMAL PREDICTOR IS FOUND
validate the predictor using NEW TEST DATA!
a small training error combined with a large test error indicates overfitting!
What Happens Next?
next lecture: using validation for model selection
read the chapter “Cross Validation” of http://cs229.stanford.edu/notes/cs229-notes5.pdf
fill out the post-lecture questionnaire in MyCourses (it contributes to your grade!)