Towards automating [supervised] machine learning:
Benchmarking tools for hyperparameter tuning
Dr. Thorben Jensen
tjensen@informationsfabrik.de
INFORMATIONSFABRIK MÜNSTER | DIGITAL TREASURE HUNTERS
Finding optimal hyperparameters is important!
▪ Hyperparameters
are “parameters whose values [are] set before the learning process begins.
By contrast, the values of other parameters are derived via training.” (wikipedia.org)
▪ Hyperparameter example: the depth of a neural network (sketch below)
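To make the distinction concrete, a minimal sketch with scikit-learn's MLPClassifier; the dataset and settings are illustrative assumptions, not from the talk:

from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)

# Hyperparameters: set BEFORE the learning process begins
model = MLPClassifier(
    hidden_layer_sizes=(32, 32),  # depth and width of the network
    learning_rate_init=0.01,
    max_iter=500,
    random_state=0,
)

# Parameters: the weights, derived via training
model.fit(X, y)
print(len(model.coefs_), "weight matrices learned via training")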
Hyperparameter tuning in supervised machine learning
[Pipeline diagram: Data Science Lab (Idea & Data → Visualize & Explore → Preprocess → Tune & select models) feeding the Production Factory (Stack Models → Deploy → Monitor); the marker "we are here" points at Tune & select models]
A week in the life of a data scientist
[Comic from https://phdcomics.com/: the data scientist's week is "Tuning … Re-tuning … sigh. Re-tuning …" while requests keep arriving: "More data just arrived. Please update your model!" and "Just remove this one variable… OK?"]
Simple automation: Grid and Random Search
Grid search
1. Select a set of values for each hyperparameter to test
2. Try ALL combinations
Random search
- Varies the important hyperparameters more
- More efficient at model tuning (sketch of both below)
[Figure (Bergstra & Bengio, 2012): grid vs. random layouts of trials over parameter 1 / parameter 2]
http://jmlr.csail.mit.edu/papers/volume13/bergstra12a/bergstra12a.pdf
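A minimal sketch of both strategies under the same trial budget; the toy loss function and bounds are assumptions, not the talk's code:

import itertools
import random

def loss(lr, depth):
    return (lr - 0.3) ** 2 + 0.01 * (depth - 5) ** 2  # toy loss surface

# Grid search: try ALL combinations of pre-selected values (4 x 4 = 16 trials)
grid_lr = [0.01, 0.1, 0.3, 1.0]
grid_depth = [2, 4, 6, 8]
best_grid = min(itertools.product(grid_lr, grid_depth), key=lambda p: loss(*p))

# Random search: same budget, but every trial varies every hyperparameter
random.seed(0)
trials = [(random.uniform(0.01, 1.0), random.randint(2, 8)) for _ in range(16)]
best_random = min(trials, key=lambda p: loss(*p))

print("grid:", best_grid, "random:", best_random)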
Sequential model-based optimization (SMBO)
▪ Use a fast regression model as a proxy for the slow ML model (sketch below):
1. Evaluate some random sets of hyperparameters
2. Build a regression model: ‘hyperparameters -> loss’
3. Find the hyperparameters with the lowest loss, according to the fast proxy
4. Evaluate the real model for these hyperparameters, observe the loss
5. Update the regression model
▪ Popular options for the proxy model:
– Gaussian Processes
– Random Forests
– Tree-structured Parzen Estimators
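A minimal sketch of this loop, assuming a random-forest proxy over a single real-valued hyperparameter; the toy objective, bounds, and candidate sampling are illustrative assumptions:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def smbo(objective, bounds, n_init=5, n_iter=20, n_candidates=1000, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    # 1. Evaluate some random sets of hyperparameters
    X = rng.uniform(lo, hi, size=(n_init, 1))
    y = np.array([objective(x[0]) for x in X])
    proxy = RandomForestRegressor(n_estimators=10, random_state=seed)
    for _ in range(n_iter):
        # 2. Build/refit the fast regression model: 'hyperparameters -> loss'
        proxy.fit(X, y)
        # 3. Find the candidate with the lowest loss according to the proxy
        candidates = rng.uniform(lo, hi, size=(n_candidates, 1))
        best = candidates[np.argmin(proxy.predict(candidates))]
        # 4. Evaluate the real (slow) model there, observe the loss
        loss = objective(best[0])
        # 5. Update the data the regression model is trained on
        X = np.vstack([X, [best]])
        y = np.append(y, loss)
    return X[np.argmin(y), 0], y.min()

# Example: minimize a toy loss over one hyperparameter
print(smbo(lambda x: (x - 0.3) ** 2, bounds=(0.0, 1.0)))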
Gaussian Process
1. From previous samples, fit a Gaussian Process:
expected value
uncertainty range
2. Maximize the acquisition function: it rewards a good expected value and high uncertainty
3. Use the new observation to update the Gaussian Process
4. Repeat from step 2 … (sketch below)
https://arxiv.org/pdf/1012.2599.pdf
[Figure (Brochu et al., 2010): Gaussian Process posterior and acquisition function; x-axes: better hyperparameter →]
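A minimal sketch of this loop with skopt's gp_minimize (skopt is one of the libraries benchmarked below); the objective and search space are made-up examples:

from skopt import gp_minimize

def objective(params):
    x, = params                      # skopt passes one list per evaluation
    return (x - 0.3) ** 2            # toy loss to minimize

result = gp_minimize(
    objective,
    dimensions=[(0.0, 1.0)],         # one real-valued hyperparameter
    acq_func="EI",                   # acquisition: expected improvement
    n_calls=20,                      # total objective evaluations
    n_initial_points=5,              # random samples before fitting the GP
    random_state=0,
)
print(result.x, result.fun)          # best hyperparameter and its loss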
Gaussian Process live
https://github.com/fmfn/BayesianOptimization
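A hedged sketch of the linked library's basic API (it maximizes, so a loss would be negated; the target function and bounds are example values):

from bayes_opt import BayesianOptimization

def target(x, y):
    return -((x - 0.3) ** 2 + (y + 1.0) ** 2)  # negated toy loss

optimizer = BayesianOptimization(
    f=target,
    pbounds={"x": (0.0, 1.0), "y": (-2.0, 2.0)},  # box bounds per hyperparameter
    random_state=1,
)
optimizer.maximize(init_points=5, n_iter=15)  # random warm-up, then GP-guided steps
print(optimizer.max)                          # best parameters and target found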
Random Forest
▪ Differences to the Gaussian Process:
– Uses a decision tree ensemble, e.g. 10 trees
– Also works with categorical hyperparameters, e.g. the activation function of a neural net (sketch below)
http://www.cs.ubc.ca/labs/beta/Projects/SMAC/papers/11-LION5-SMAC.pdf
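The benchmark below uses smac for this approach; as a simpler illustration of a random-forest proxy handling a categorical hyperparameter, here is a sketch with skopt's forest_minimize (the objective and dimensions are made-up examples):

from skopt import forest_minimize
from skopt.space import Categorical, Real

def objective(params):
    lr, activation = params
    penalty = 0.0 if activation == "relu" else 0.1  # toy effect of the categorical choice
    return (lr - 0.3) ** 2 + penalty

result = forest_minimize(
    objective,
    dimensions=[Real(0.0, 1.0),
                Categorical(["relu", "tanh", "sigmoid"])],
    n_calls=20,
    random_state=0,
)
print(result.x, result.fun)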
Tree-structured Parzen Estimator
1. Test some hyperparameters
2. Separate them into:
best hyperparameters
worse hyperparameters
3. For new candidates (vertical lines), model the probability of falling into the good or the bad group
4. Expected improvement for a candidate: P(good) / P(bad) (sketch below)
http://neupy.com/2016/12/17/hyperparameter_optimization_for_neural_networks.html
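A minimal sketch with hyperopt, the TPE library benchmarked below (the search space and objective are example values):

from hyperopt import Trials, fmin, hp, tpe

space = {
    "lr": hp.loguniform("lr", -5, 0),           # continuous, on a log scale
    "depth": hp.choice("depth", [2, 4, 6, 8]),  # discrete hyperparameter
}

def objective(params):
    return (params["lr"] - 0.3) ** 2 + 0.01 * params["depth"]  # toy loss

trials = Trials()
best = fmin(objective, space, algo=tpe.suggest, max_evals=50, trials=trials)
print(best)  # note: for hp.choice, fmin reports the index of the chosen value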
Algorithms compared
▪ Gaussian Process
▪ Random forest
▪ Tree-structured Parzen Estimator
[Comparison table: each algorithm rated on support for discrete hyperparameters, ease of configuration, handling of interacting hyperparameters, and availability of a Python library]
Which one to choose? → Benchmark ‘em all!
▪ Benchmarked Python libraries:
– Random Search: own implementation
– Gaussian Process: bayes_opt, skopt
– Random forest: smac
– Tree-structured Parzen Estimator: hyperopt
▪ Compared to a human expert (3 manual tuning steps)
▪ Task: classification with the xgboost library, with early stopping (objective sketch below)
▪ 16-core CPU machines
▪ Structured datasets:
– Iris
– Real-life dataset for decision support in insurance
▪ Limitations:
– Only 2 datasets
– Varying both models and implementations
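A sketch of what such a benchmark objective can look like: xgboost classification with early stopping. The dataset, split, and tuned parameters are illustrative assumptions, not the talk's exact setup:

import xgboost as xgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
dtrain = xgb.DMatrix(X_tr, label=y_tr)
dval = xgb.DMatrix(X_val, label=y_val)

def objective(max_depth, learning_rate):
    params = {
        "objective": "multi:softprob",
        "num_class": 3,
        "max_depth": int(max_depth),   # tuners hand over floats; xgboost wants int
        "eta": learning_rate,
        "eval_metric": "mlogloss",
    }
    booster = xgb.train(
        params, dtrain, num_boost_round=500,
        evals=[(dval, "val")],
        early_stopping_rounds=10,      # stop once validation loss stalls
        verbose_eval=False,
    )
    return booster.best_score          # validation loss for the tuner to minimize

print(objective(max_depth=4, learning_rate=0.1))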
Result I/II: iris dataset
Result I/II: iris dataset – compared to manual tuning
Result II/II: real-life dataset
Result II/II: real-life dataset – compared to manual
Literature I/II
Gaussian Process beats Random Search
https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf
Literature II/II
Tree-structured Parzen Estimator beats Gaussian Process, Random Search, and Manual Search
https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf
Take home messages
▪ More free weekends: automated tuning beats manual
▪ SMBO algorithms do not differ much in performance, but in their constraints:
– Scaling of your hyperparameters
– Interactions between hyperparameters
– Algorithms for hyperparameter optimization have hyperparameters themselves …
▪ Tree-structured Parzen Estimator: simple, but surprisingly effective
▪ Random Search: simple, yet scalable and reasonably competitive
▪ Spend your time wisely! → Instead of manual tuning, get more data and features
Towards automating [supervised] machine learning?
[Pipeline diagram revisited: Data Science Lab (Idea & Data → Visualize & Explore → Preprocess → Tune & select models) and Production Factory (Stack Models → Deploy → Monitor), now annotated with question marks]
THANK YOU
谢谢
CHOKRANE
HVALA
VIELEN DANK
TAK
MERCI
TÄNAN
DANK U WEL
KÖSZÖNÖM
ขอบคุณครับ
DZIĘKUJĘ
ARIGATÔ
GRAZIE
СПАСИБО
TERIMA KASIH
TEŞEKKÜR EDERİM
GRACIAS
Special thanks to Tobias Rippel, Prasanth Sivabalasingam!
