This lecture introduces a central aspect of any machine learning application: the necessity of validating a machine learning method after it has been learned from training data.
CS-E3210 Machine Learning: Basic Principles
Lecture 7: Validation
slides by Alexander Jung, 2017
Department of Computer Science
Aalto University, School of Science
Autumn (Period I) 2017
Outline
Intro
A Simple Validation Method
Wrap Up
Background
this lecture is inspired by
the lecture notes http://cs229.stanford.edu/notes/cs229-notes4.pdf of Prof. Andrew Ng (Stanford)
a video lecture by Prof. Ng: https://www.youtube.com/watch?v=BpgnnS7mKKU
Chapter 5.2 of the course book [DLBook]
Ski Resort Marketing
you still did not find another job
thus, you still work in marketing at a ski resort
hard disk full of webcam snapshots (gigabytes of data)
want to group them into “winter” and “summer” images
you have only a few hours for this task ...
ML workflow so far...
create dataset $\mathbb{X} = \{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{N}$ by manual labeling
features $\mathbf{x}^{(i)} \in \mathcal{X}$ and label $y^{(i)} \in \mathcal{Y}$ of the $i$-th data point
define loss $L((\mathbf{x}, y), h(\cdot))$ (e.g., squared loss $L((\mathbf{x}, y), h(\cdot)) = (y - h(\mathbf{x}))^2$)
define hypothesis space $\mathcal{H}$ (e.g., linear maps $h(\mathbf{x}) = \mathbf{w}^{T}\mathbf{x}$)
learn predictor $h(\cdot): \mathcal{X} \to \mathcal{Y}$ by empirical risk minimization (ERM)
$$\min_{h(\cdot) \in \mathcal{H}} \mathcal{E}\{h(\cdot) \mid \mathbb{X}\} = \frac{1}{N} \sum_{i=1}^{N} L\big((\mathbf{x}^{(i)}, y^{(i)}), h(\cdot)\big)$$
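As a concrete illustration of this recipe, here is a minimal Python sketch (not from the original slides; the toy dataset, dimensions, and function names are invented for illustration). For squared loss and linear hypotheses, ERM reduces to ordinary least squares, so np.linalg.lstsq solves it in closed form:

```python
import numpy as np

def erm_linear(X, y):
    # Solve min_w (1/N) * sum_i (y_i - w^T x_i)^2; for squared loss and
    # linear hypotheses, ERM is an ordinary least-squares problem.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def empirical_risk(w, X, y):
    # Average squared loss of the linear predictor h(x) = w^T x on (X, y).
    return np.mean((y - X @ w) ** 2)

# Toy labeled dataset: N = 20 data points with d = 3 features each.
rng = np.random.default_rng(0)
X_data = rng.normal(size=(20, 3))
y_data = X_data @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)

w_opt = erm_linear(X_data, y_data)
print("empirical risk E{h_opt | X}:", empirical_risk(w_opt, X_data, y_data))
```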
So What?
want to predict the label $y$ from the features $\mathbf{x}$ of a new (unlabeled) data point, which does not belong to $\mathbb{X}$
how is $\mathcal{E}\{h(\cdot) \mid \mathbb{X}\}$ related to the average loss $L((\mathbf{x}, y), h(\cdot))$ incurred on such new data points?
i.e., how well does $h(\cdot)$ generalize from $\mathbb{X}$ to new data points?
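One standard way to make this question precise (an assumption left implicit on the slide, stated here explicitly) is to model data points as i.i.d. draws from a fixed probability distribution $p(\mathbf{x}, y)$; the quantity we would like $\mathcal{E}\{h(\cdot) \mid \mathbb{X}\}$ to track is then the expected loss:

```latex
% Expected loss ("risk") of a fixed predictor h under an assumed i.i.d.
% data model p(x, y). For data points NOT used when choosing h, the
% sample average of the loss is an unbiased estimate of this quantity.
\bar{L}\big(h(\cdot)\big)
  = \mathbb{E}_{(\mathbf{x}, y) \sim p}\Big[ L\big((\mathbf{x}, y), h(\cdot)\big) \Big]
```

For a predictor obtained by ERM on $\mathbb{X}$, the empirical risk $\mathcal{E}\{h(\cdot) \mid \mathbb{X}\}$ is an optimistically biased estimate of this expected loss, since $h(\cdot)$ was chosen precisely to make it small; this motivates evaluating on separate data, as the next slides do.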
Use Different Data for Training and Testing
1. run ERM on a dataset $\mathbb{X}^{(\mathrm{train})}$ to find the optimal predictor $h_{\mathrm{opt}}(\cdot)$
2. apply $h_{\mathrm{opt}}(\cdot)$ to another dataset $\mathbb{X}^{(\mathrm{test})}$ (of size $N'$) to get the average loss
$$\frac{1}{N'} \sum_{(\mathbf{x}, y) \in \mathbb{X}^{(\mathrm{test})}} L\big((\mathbf{x}, y), h_{\mathrm{opt}}(\cdot)\big)$$
Training and Testing
we randomly select and label $N$ data points to obtain $\mathbb{X}^{(\mathrm{train})}$
we randomly select and label $N'$ further data points to obtain $\mathbb{X}^{(\mathrm{test})}$
we learn the optimal classifier via ERM on the training set $\mathbb{X}^{(\mathrm{train})}$:
$$h_{\mathrm{opt}}(\cdot) = \operatorname*{argmin}_{h(\cdot) \in \mathcal{H}} \frac{1}{N} \sum_{(\mathbf{x}, y) \in \mathbb{X}^{(\mathrm{train})}} L\big((\mathbf{x}, y), h(\cdot)\big)$$
we then estimate the average loss using the test set $\mathbb{X}^{(\mathrm{test})}$:
$$\mathcal{E}\big(h_{\mathrm{opt}} \mid \mathbb{X}^{(\mathrm{test})}\big) = \frac{1}{N'} \sum_{(\mathbf{x}, y) \in \mathbb{X}^{(\mathrm{test})}} L\big((\mathbf{x}, y), h_{\mathrm{opt}}(\cdot)\big)$$
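A hedged Python sketch of this two-step procedure (the synthetic data generator, the sizes $N$ and $N'$, and all variable names are my own, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3                                   # number of features
w_true = rng.normal(size=d)             # "ground truth" behind the labels

def make_labeled_data(n):
    # Stand-in for randomly selecting and manually labeling n data points.
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.1 * rng.normal(size=n)
    return X, y

X_train, y_train = make_labeled_data(50)    # X^(train), N  = 50
X_test, y_test = make_labeled_data(50)      # X^(test),  N' = 50

# Step 1: ERM on the training set (least squares, as before).
w_opt, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Step 2: estimate the average loss on new data using the test set.
test_risk = np.mean((y_test - X_test @ w_opt) ** 2)
print("E(h_opt | X^(test)):", test_risk)
```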
Overfitting
using a test set allows us to diagnose “overfitting”
consider linear regression for predicting the daytime $y \in \mathbb{R}$ from $10 \times 10$ pixel snapshots $\mathbf{x} \in \mathbb{R}^{100}$
learn the predictor $h(\mathbf{x}) = \mathbf{w}^{T}\mathbf{x}$ using ERM on a dataset $\mathbb{X}^{(\mathrm{train})}$ containing $N = 4$ labeled data points $(\mathbf{x}^{(i)}, y^{(i)})$
how small can we make the empirical risk on $\mathbb{X}^{(\mathrm{train})}$?
since $\mathbf{w}$ has $100$ free parameters but must fit only $4$ data points, we can (generically) drive the empirical risk to zero, regardless of how the labels were obtained
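The sketch below (with made-up random "snapshots"; nothing here comes from the actual webcam data) illustrates the answer suggested above: with 100 parameters and only $N = 4$ training points, least squares drives the training risk to (numerically) zero while the test risk stays large:

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 4, 100                          # 4 training points, 10*10 = 100 features

X_train = rng.normal(size=(N, d))      # 4 flattened 10x10 "snapshots"
y_train = rng.uniform(0, 24, size=N)   # daytime labels in hours

# With 100 parameters and only 4 equations, least squares can (generically)
# interpolate the training labels exactly, whatever they are.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
print("training risk:", np.mean((y_train - X_train @ w) ** 2))   # ~ 0

# On fresh snapshots the interpolating predictor is typically far off.
X_test = rng.normal(size=(20, d))
y_test = rng.uniform(0, 24, size=20)
print("test risk:", np.mean((y_test - X_test @ w) ** 2))         # large
```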
Key Message Today
follow the basic ML recipe to get an optimal predictor/classifier
DO NOT STOP AFTER THE OPTIMAL PREDICTOR IS FOUND
validate the predictor using NEW TEST DATA!
a small training error combined with a large test error indicates overfitting!
What Happens Next?
next lecture: using validation for model selection
read the chapter “Cross Validation” of http://cs229.stanford.edu/notes/cs229-notes5.pdf
fill out the post-lecture questionnaire in MyCourses (it contributes to your grade!)