Like-for-Like Comparisons of Machine Learning Algorithms - Dominik Dahlem, Boxever
1. Like-for-Like Comparison of Machine Learning
Algorithms
Sensitivity Analysis of ML Hyperparameters
Dominik Dahlem
2016-09-25 Sun
2. Who am I?
• Dominik Dahlem, Lead Data Scientist, Boxever
dominik.dahlem@boxever.com
http://ie.linkedin.com/in/ddahlem
http://github.com/dahlem
@dahlemd
4. Introduction
Boxever
• Boxever is a Data Science company with a Customer
Intelligence Cloud for Travel
• Our cloud analytical services need to be well-tuned and robust
• e.g., recommendation models, propensity models, etc.
6. Introduction
Goal
• Tuning
• ML algorithms tend to be governed by tunable parameters,
typically referred to as hyperparameters
• They are not trained
• Require trial-and-error fine tuning
• Sensitivity Analysis
• Does a small perturbation in the parameters change the output
dramatically?
• Visual inspection easy in ML algorithms with very few
hyperparameters
• But mathematical treatment necessary in high dimensions
8. Building ML Models
Evaluating a ML Model Hypothesis
• A hypothesis (given the hyperparameters) may overfit → How
do we know?
• We may observe a low training error while the model is still inaccurate on unseen data
• Test-driven development and debugging ↔ Statistical Diagnostics
1 With a given dataset, split into two sets: training and test
2 Fix hyperparameters
3 Learn model parameters and minimise the corresponding error
using the training set
4 Compute the test error using the test set
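The four steps above can be sketched as follows; the library (scikit-learn) and the ridge model are illustrative assumptions, not named in the talk:

```python
# Hypothetical sketch of steps 1-4 on a synthetic regression task.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# 1. Split the dataset into training and test sets
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# 2. Fix the hyperparameters (here: the regularisation strength)
model = Ridge(alpha=1.0)

# 3. Learn the model parameters by minimising the training error
model.fit(X_tr, y_tr)

# 4. Compute the test error on the held-out test set
test_error = mean_squared_error(y_te, model.predict(X_te))
```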
9. Building ML Models
Model Selection
• Without the validation set
• Optimise ML parameters using the training set for each
hypothesis (e.g., polynomial degree)
• Select the hypothesis with the smallest test error
• Estimating the generalisation error on that same test set yields
optimistic error estimates
• With validation set
• Optimise ML parameters using the training set for each
hypothesis (e.g., polynomial degree)
• Select the hypothesis with the smallest cross-validation error
• Estimate the generalisation error also using the test set
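The validation-set variant can be sketched with the polynomial-degree example from the slide; the data and degree range are made up for illustration:

```python
# Hedged sketch: choose the polynomial degree on validation error,
# then estimate the generalisation error once on the test set.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 300)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.1, size=300)

x_tr, y_tr = x[:180], y[:180]        # training set
x_va, y_va = x[180:240], y[180:240]  # validation set
x_te, y_te = x[240:], y[240:]        # test set

def fit_and_error(degree):
    # Optimise the model parameters on the training set only
    coeffs = np.polyfit(x_tr, y_tr, degree)
    err = float(np.mean((np.polyval(coeffs, x_va) - y_va) ** 2))
    return coeffs, err

# Select the hypothesis (degree) with the smallest validation error
candidates = {d: fit_and_error(d) for d in range(1, 8)}
best_degree = min(candidates, key=lambda d: candidates[d][1])

# Estimate the generalisation error on the test set, used only once
coeffs, _ = candidates[best_degree]
test_error = float(np.mean((np.polyval(coeffs, x_te) - y_te) ** 2))
```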
10. Building ML Models
General ML Pipeline
• Parameter search
• grid vs random vs active
learning
• We found a well-performing model,
but
• are the parameters sensitive to
minute changes?
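Grid and random search can be contrasted on an equal evaluation budget; the toy error surface below is a stand-in, not a real model:

```python
# Illustrative grid vs. random search with the same budget of 16 evaluations.
import itertools
import random

def cv_error(alpha, gamma):
    # Stand-in for a k-fold CV error surface (assumed, not a real model)
    return (alpha - 0.01) ** 2 + (gamma - 0.6) ** 2

# Grid search: a fixed 4 x 4 lattice of hyperparameter values
alphas = [0.0001, 0.01, 0.1, 0.3]
gammas = [0.1, 0.4, 0.7, 0.95]
grid_best = min(itertools.product(alphas, gammas), key=lambda p: cv_error(*p))

# Random search: the same budget, sampled uniformly from the ranges
rng = random.Random(0)
samples = [(rng.uniform(0.0001, 0.3), rng.uniform(0.01, 0.95)) for _ in range(16)]
rand_best = min(samples, key=lambda p: cv_error(*p))
```

Random search is not tied to the lattice, so it can land closer to the optimum along the dimensions that matter.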
11. Building ML Models
Sensitivity Analysis1
• Hyperparameter tuning using
e.g., Spearmint
• integrate uncertainty of the
k-fold CV
• model the parameter surface
on the mean error metric from
CV
• Characterise the nature of the
hyperparameter surface in the
vicinity of the optimal point (e.g.,
the one that minimised the error
of the ML algorithm)
1George E. P. Box and Norman R. Draper (2007). Response Surfaces, Mixtures, and Ridge Analyses. 2nd ed.
Wiley-Interscience.
12. Building ML Models
Bias vs. Variance
[Figure: J(θ) vs. polynomial degree d — Jtraining(θ) falls monotonically while JCV(θ) is U-shaped; underfitting (high bias) at low d, overfitting (high variance) at high d, optimum in between]
• What is the source of bad predictions?
13. Building ML Models
Regularisation and Bias/Variance
[Figure: J(θ) vs. regularisation strength λ — the roles reverse: overfitting (high variance) at small λ, underfitting (high bias) at large λ, with JCV(θ) minimised at an intermediate optimum]
14. Building ML Models
Learning Curves (High Bias)
[Figure: learning curves under high bias — Jtraining(θ) and Jtest(θ) converge quickly to a plateau above the desired error as N (training set size) grows]
• More training data will not help!
15. Building ML Models
Learning Curves (High Variance)
[Figure: learning curves under high variance — a large gap between Jtraining(θ) and Jtest(θ) that narrows slowly as N (training set size) grows]
• More training data will likely help!
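A learning curve can be computed by training on growing subsets and tracking training vs. held-out error; the ordinary-least-squares model and the data below are illustrative assumptions:

```python
# Sketch of computing a learning curve on a synthetic linear problem.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 5))
w_true = rng.normal(size=5)
y = X @ w_true + rng.normal(scale=0.2, size=500)
X_te, y_te = X[400:], y[400:]   # held-out evaluation set

def mse(w, A, b):
    return float(np.mean((A @ w - b) ** 2))

train_err, test_err = [], []
for n in (10, 25, 50, 100, 200, 400):
    # Fit ordinary least squares on the first n training examples
    w, *_ = np.linalg.lstsq(X[:n], y[:n], rcond=None)
    train_err.append(mse(w, X[:n], y[:n]))
    test_err.append(mse(w, X_te, y_te))
# Plotting train_err and test_err against n gives the learning curves;
# with more data the two curves converge towards the noise floor.
```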
17. Example: RL Gridworld
Overview
• Teach a computer to find a
path to a goal
• Actions: N, E, S, W
• Classifying grids?
• Trial and Error?
18. Example: RL Gridworld
SARSA(λ)
• SARSA update rule:
Q(s, a) ← Q(s, a) + α [r + γ Q(s′, a′) − Q(s, a)] . (1)
• s: the state, i.e., cell on the grid
• a: the action, i.e., N, E, S, W
• s′, a′: the successor state and the action chosen in it
• Q(s, a): state-action value function
• here: lookup table
• r: the reward received for performing action a in state s
• α: the learning rate
• γ: the discount factor
19. Example: RL Gridworld
RL Gridworld Pipeline
• Optimise the learning rate and
the discount factor
• α ∈ [0.0001, 0.3]
• γ ∈ [0.01, 0.95]
• Fixed parameters for brevity:
• greedy policy
• the eligibility traces λ
• episodes N = 2000
21. Example: RL Gridworld
Overview: Canonical Analysis2
• Find the optimum point using a
constrained optimisation method
that can escape local minima
• α = 0.0001, γ = 0.577
• Restrict the canonical analysis to
a subset of the parameter space
around the optimum value
• α ∈ [0.0001, 0.03]
• γ ∈ [0.48, 0.67]
• Trace the α, γ, and estimated
number of steps along the
maximum path
• eigen-system analysis of the
covariance matrix of the
hyperparameter surface
2George E. P. Box and Norman R. Draper (2007). Response Surfaces, Mixtures, and Ridge Analyses. 2nd ed.
Wiley-Interscience.
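The eigen-system step can be sketched by fitting a second-order response surface to (α, γ, error) samples near the optimum and inspecting the eigenvalues of the quadratic-term matrix; the synthetic error surface below is an assumption for illustration:

```python
# Hedged sketch of canonical analysis on a synthetic hyperparameter surface.
import numpy as np

rng = np.random.default_rng(3)
alpha = rng.uniform(0.0001, 0.03, 100)   # restricted region around the optimum
gamma = rng.uniform(0.48, 0.67, 100)
# Synthetic error surface with a minimum near (0.0001, 0.577)
err = (50 * (alpha - 0.0001) ** 2 + 5 * (gamma - 0.577) ** 2
       + rng.normal(scale=1e-4, size=100))

# Second-order model: err ~ b0 + b1*a + b2*g + b11*a^2 + b22*g^2 + b12*a*g
A = np.column_stack([np.ones_like(alpha), alpha, gamma,
                     alpha**2, gamma**2, alpha * gamma])
coef, *_ = np.linalg.lstsq(A, err, rcond=None)
b0, b1, b2, b11, b22, b12 = coef

# Canonical analysis: eigen-system of the symmetric quadratic-term matrix
B = np.array([[b11, b12 / 2], [b12 / 2, b22]])
eigvals, eigvecs = np.linalg.eigh(B)
# All-positive eigenvalues indicate a minimum; their magnitudes give the
# steepness of the surface along the principal (canonical) axes.
```

A large spread between the eigenvalues signals a ridge: the error is far more sensitive to perturbations along one canonical axis than the other.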
25. Summary
• Enable like-for-like ML model evaluations
• Tuning, e.g.,
• Spearmint: https://github.com/JasperSnoek/spearmint
• SMAC: http://www.cs.ubc.ca/labs/beta/Projects/SMAC/
• hyperopt: http://hyperopt.github.io/hyperopt/
• Canonical Analysis
• Sensitivity of the hyperparameters when subjected to small
perturbations around the optimum
• Assess the hyperparameter sensitivity across competing ML models
• Choose an ML model that does not exhibit minima that are
surrounded by very steep slopes in the hyperparameter surface