Hyperparameter Optimization
(landscape) in Python
kuba@neptune.ml
@NeptuneML
https://medium.com/neptune-ml
Jakub Czakon
● Intro
● Methods
● Libraries + Evaluation Criteria
○ Scikit-Optimize
○ Optuna + Hyperopt
○ HpBandSter
● Results and Recommendations
Agenda
learning rate, depth, feature fraction + data -> Model -> score
Intro
Pipeline hyperparameters + data -> score:
● Data Cleaning: imputation method, scaling method
● Feature Engineering: bin_nr, groupby columns, lagging
● Model: learning rate, depth, feature fraction
● Post-processing: thresholds
objective(params, data=data) -> score
Intro
● Grid search
● Random search
● Guided search
● Grad student search (still best)
Methods
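Grid search and random search differ only in how they propose candidates. A minimal sketch, assuming a hypothetical toy `objective` in place of a real train-and-evaluate function (higher is better). Both searches below spend the same budget of 16 evaluations.

```python
import itertools
import random

def objective(params):
    # Hypothetical toy objective standing in for train_evaluate; higher is better.
    lr, depth = params["learning_rate"], params["max_depth"]
    return -(lr - 0.1) ** 2 - 0.001 * (depth - 7) ** 2

def grid_search(grid):
    # Evaluate every combination of the listed values.
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = objective(params)
        if best is None or score > best[0]:
            best = (score, params)
    return best

def random_search(n_trials, seed=0):
    # Sample each hyperparameter independently from its range.
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            "learning_rate": 10 ** rng.uniform(-2, -0.3),  # log-uniform in [0.01, ~0.5]
            "max_depth": rng.randint(1, 30),
        }
        score = objective(params)
        if best is None or score > best[0]:
            best = (score, params)
    return best

grid_best = grid_search({"learning_rate": [0.01, 0.05, 0.1, 0.5], "max_depth": [3, 7, 15, 30]})
random_best = random_search(n_trials=16)
```

Grid search only finds the optimum here because 0.1 and 7 happen to be on the grid; random search tries 16 distinct values per parameter instead of 4.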
● Better configuration proposal
○ Objective function is estimated with surrogate models
○ Evolutionary methods
○ ...
● Faster objective function calculation
○ Bandit methods
○ Pruning
○ Estimating a score from the learning curve of NN
○ ...
Methods
Methods: surrogate models
objective(params) -> score (expensive)
surrogate(params) -> est_score (cheap)
● Explore cheap: surrogate(params1) = est_score1, surrogate(params2) = est_score2, …, surrogate(params1000) = est_score1000
● Try expensive: objective(params2) = score2 for the most promising candidate
● Surrogate models: TPE, GP, RF
● Acquisition functions: EI, PI, LCB
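The "explore cheap, try expensive" loop above can be sketched in a few lines. This is an illustration under stated assumptions, not what TPE, GP or RF surrogates do internally: a crude k-nearest-neighbour model stands in for the surrogate, `expensive_objective` is a hypothetical 1-D function, and the acquisition is a lower confidence bound.

```python
import random

def expensive_objective(x):
    # Hypothetical 1-D objective to minimise; stands in for training and scoring a model.
    return (x - 0.3) ** 2

def surrogate(x, history, k=3):
    # Crude k-nearest-neighbour surrogate: the estimate is the mean score of the k
    # closest evaluated points, the uncertainty is the distance to the closest one.
    nearest = sorted(history, key=lambda h: abs(h[0] - x))[:k]
    est_score = sum(score for _, score in nearest) / len(nearest)
    uncertainty = abs(nearest[0][0] - x)
    return est_score, uncertainty

def lower_confidence_bound(x, history, kappa):
    est_score, uncertainty = surrogate(x, history)
    return est_score - kappa * uncertainty  # low = predicted good or still unexplored

def surrogate_search(n_calls=20, n_random_starts=5, n_candidates=1000, kappa=1.0, seed=0):
    rng = random.Random(seed)
    history = []  # (x, score) pairs from expensive evaluations
    for _ in range(n_random_starts):
        x = rng.uniform(0.0, 1.0)
        history.append((x, expensive_objective(x)))
    for _ in range(n_calls - n_random_starts):
        # Explore cheap: rank many candidates using only the surrogate.
        candidates = [rng.uniform(0.0, 1.0) for _ in range(n_candidates)]
        x_next = min(candidates, key=lambda c: lower_confidence_bound(c, history, kappa))
        # Try expensive: run the real objective on the single most promising candidate.
        history.append((x_next, expensive_objective(x_next)))
    return min(history, key=lambda h: h[1]), len(history)

(best_x, best_score), n_evaluations = surrogate_search()
```

Only 20 expensive evaluations are made, while the surrogate is queried 15,000 times.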
Methods: bandit methods
objective(params, data=data) -> score
objective(params, budget=full) -> score
objective(params, budget=low) -> score
Estimate the score with lower fidelity runs
Methods: bandit methods
● Budget options:
○ Dataset size
○ Number of epochs
○ Time
○ Number of features
○ Number of CV-folds
Methods: bandit methods
● Successive halving: set the resource, the budget and the number of runs
● Hyperband: random resource, grid search over the number of runs, within a set budget
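A minimal sketch of successive halving, assuming a hypothetical `toy_loss(config, budget)` where the budget could be epochs, dataset size or any option from the budget list. Every surviving config is evaluated at the current budget, the best 1/eta are kept, and the budget is multiplied by eta.

```python
def toy_loss(config, budget):
    # Hypothetical low-fidelity evaluation: the loss shrinks as the budget
    # (e.g. epochs) grows, and configs closer to 0.1 are better at every budget.
    return (config - 0.1) ** 2 + 1.0 / budget

def successive_halving(configs, evaluate, min_budget=1, eta=3):
    # Run all configs on a small budget, keep the best 1/eta of them,
    # multiply the budget by eta and repeat until a single config is left.
    budget = min_budget
    survivors = list(configs)
    spent = 0
    while len(survivors) > 1:
        losses = {}
        for config in survivors:
            losses[config] = evaluate(config, budget)
            spent += budget
        n_keep = max(1, len(survivors) // eta)
        survivors = sorted(survivors, key=lambda c: losses[c])[:n_keep]  # lower loss is better
        budget *= eta
    return survivors[0], spent

configs = [0.001, 0.01, 0.03, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5]
winner, spent = successive_halving(configs, toy_loss)
```

Here the nine configs cost 18 budget units in total (9 runs at budget 1, then 3 at budget 3), versus 81 for evaluating all nine at budget 9. The catch is that the low-budget ranking must be a reasonable proxy for the full-budget one.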
Methods: pruning
● Prune/abort runs that show little hope before they finish
● Isn’t it called early stopping?
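One way to tell the two apart: early stopping compares a run against its own earlier validation scores, while pruning compares it against the other runs of the search. A minimal sketch of a median rule, similar in spirit to (but simpler than) Optuna's MedianPruner; the loss curves are made-up illustration data.

```python
import statistics

def should_prune(step, value, completed_curves, n_warmup_steps=2):
    # Median rule: stop the run if its loss at `step` is worse than the median
    # loss that previously completed runs reported at the same step.
    if step < n_warmup_steps:
        return False
    previous = [curve[step] for curve in completed_curves if len(curve) > step]
    if not previous:
        return False
    return value > statistics.median(previous)  # lower loss is better

def run_with_pruning(curve, completed_curves):
    # Replays a (hypothetical) validation-loss curve and returns how many
    # steps were actually trained before the run finished or was pruned.
    for step, value in enumerate(curve):
        if should_prune(step, value, completed_curves):
            return step + 1
    return len(curve)

completed = [[0.9, 0.7, 0.5, 0.4], [0.8, 0.6, 0.45, 0.35], [1.0, 0.8, 0.6, 0.5]]
hopeless_steps = run_with_pruning([0.95, 0.9, 0.85, 0.8], completed)
promising_steps = run_with_pruning([0.85, 0.65, 0.4, 0.3], completed)
```

The hopeless run is stopped after 3 of its 4 steps, and the promising one trains to completion.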
● Scikit-Optimize
● Optuna
● Hyperopt (sort of)
● HpBandSter
● Ray.tune (future work)
● … and many more
Libraries
● Algorithm
● API / ease of use
● Documentation
● Speed / Parallelization
● Visualization suite
● Experimental results
Evaluation Criteria
Scikit-Optimize
Algorithm
● Objective function estimated with surrogate models
○ Random Forests
○ Gradient Boosted Trees
○ Gaussian process
● Next run params selected via acquisition function
○ Expected Improvement
○ Probability of Improvement
○ Lower Confidence Bound
● No objective func calculation speedup mechanism
API
search space
+
objective
+
{fun}_minimize
API: search space
● Basic options:
○ skopt.space.Real
○ skopt.space.Integer
○ skopt.space.Categorical
● No support for nested search spaces
API: search space
import skopt

SPACE = [skopt.space.Real(0.01, 0.5, name='learning_rate', prior='log-uniform'),
         skopt.space.Integer(1, 30, name='max_depth'),
         skopt.space.Integer(2, 100, name='num_leaves'),
         skopt.space.Integer(10, 1000, name='min_data_in_leaf'),
         skopt.space.Real(0.1, 1.0, name='feature_fraction', prior='uniform'),
         skopt.space.Real(0.1, 1.0, name='subsample', prior='uniform'),
         ]
API: objective
● Define a function to minimize!
● Decorate if you want to keep parameter names
API: objective
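# Without the decorator, skopt calls the objective with a list of values in SPACE order
def objective(values):
    params = {dim.name: value for dim, value in zip(SPACE, values)}
    return -1.0 * train_evaluate(X, y, **params)

# With the decorator, the same values arrive as keyword arguments named after SPACE
@skopt.utils.use_named_args(SPACE)
def objective(**params):
    return -1.0 * train_evaluate(X, y, **params)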
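What `use_named_args` does boils down to mapping the list of values the optimizer passes in onto keyword arguments by dimension name. A standard-library-only sketch of that idea; `named_args` and `SPACE_NAMES` are hypothetical names, not skopt's implementation.

```python
import functools

SPACE_NAMES = ["learning_rate", "max_depth"]  # hypothetical, mirrors the names in SPACE

def named_args(names):
    # Turn f(**params) into a function that accepts a plain list of values,
    # which is how skopt's minimizers call the objective.
    def decorator(func):
        @functools.wraps(func)
        def wrapper(values):
            return func(**dict(zip(names, values)))
        return wrapper
    return decorator

@named_args(SPACE_NAMES)
def objective(**params):
    # Stand-in for -1.0 * train_evaluate(X, y, **params)
    return params["learning_rate"] * params["max_depth"]

result = objective([0.1, 7])
```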
API: {fun}_minimize
● A few optimizers to choose from
○ skopt.forest_minimize
○ skopt.gbrt_minimize
○ skopt.gp_minimize
● Accepts callbacks
API: {fun}_minimize
results = skopt.forest_minimize(objective, SPACE,
                                n_calls=100,
                                n_random_starts=10,
                                base_estimator='ET',
                                acq_func='LCB',
                                xi=0.02,
                                kappa=1.96)
API: {fun}_minimize
def monitor(res):
    neptune.send_metric('run_score', res.func_vals[-1])

results = skopt.forest_minimize(..., callback=[monitor])
API: {fun}_minimize
● There are (hyper)hyperparameters
● Acquisition function:
○ 'EI', 'PI': expected improvement and probability of improvement (maximized)
○ 'LCB': lower confidence bound, the predicted objective minus kappa times its standard deviation (minimized)
● Exploration vs exploitation
○ xi for 'EI' and 'PI': low xi favors exploitation, high xi favors exploration
○ kappa for 'LCB': low kappa favors exploitation, high kappa favors exploration
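The three acquisition functions can be written out for a surrogate that predicts a Gaussian N(mu, sigma^2) at a candidate point. This is a standard-library sketch for minimisation, following the same formulas as skopt's `gaussian_ei`, `gaussian_pi` and `gaussian_lcb`. The two candidates are made up: a "safe" one with a confident prediction and a "risky" one with a worse mean but high uncertainty.

```python
import math

def _norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lcb(mu, sigma, kappa=1.96):
    # Lower confidence bound: smaller is more promising.
    return mu - kappa * sigma

def probability_of_improvement(mu, sigma, y_best, xi=0.01):
    # P(f(x) < y_best - xi) under the Gaussian prediction; larger is more promising.
    return _norm_cdf((y_best - xi - mu) / sigma)

def expected_improvement(mu, sigma, y_best, xi=0.01):
    # E[max(y_best - xi - f(x), 0)] under the Gaussian prediction; larger is more promising.
    improve = y_best - xi - mu
    z = improve / sigma
    return improve * _norm_cdf(z) + sigma * _norm_pdf(z)

safe, risky, y_best = (0.50, 0.01), (0.55, 0.20), 0.52
pi_low_xi = (probability_of_improvement(*safe, y_best, xi=0.0),
             probability_of_improvement(*risky, y_best, xi=0.0))
pi_high_xi = (probability_of_improvement(*safe, y_best, xi=0.05),
              probability_of_improvement(*risky, y_best, xi=0.05))
```

In this toy setup, raising xi from 0.0 to 0.05 flips PI's preference from the safe candidate to the uncertain one, and raising kappa does the same for LCB.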
Documentation
● Amazing!
● Functions have docstrings.
● A lot of examples.
Visualizations
● Options:
○ skopt.plots.plot_convergence - score improvement
○ skopt.plots.plot_evaluations - space search evolution
○ skopt.plots.plot_objective - sensitivity
● Beautiful and very useful.
Visualizations: plot_convergence
skopt.plots.plot_convergence(results)
Visualizations: plot_convergence
skopt.plots.plot_convergence(results_list)
Visualizations: plot_evaluations
skopt.plots.plot_evaluations(results)
Visualizations: plot_objective
skopt.plots.plot_objective(results)
Speed & Parallelization
● Runs sequentially and you cannot distribute it across many machines
● You can parallelize base estimator at every run with n_jobs
● If you have just 1 machine it is fast
Experimental Results
Conclusions: good
● Easy to use API and great documentation
● A lot of optimizers and tweaking options
● Awesome visualizations
● Solid gains over random search
● Fast if you are running sequentially on 1 machine
● Active project support
Conclusions: bad
● Search space doesn’t support nesting
● No support for distributed computing
Optuna
Algorithm
● Objective function estimated with Tree of Parzen Estimators
● Next run params selected via Expected Improvement
● Objective func calculation speedup via run pruning and
successive halving (optionally)
API
search space & objective
+
{fun}_minimize
API: search space & objective
from collections import OrderedDict

def objective(trial):
    params = OrderedDict([
        ('learning_rate', trial.suggest_loguniform('learning_rate', 0.01, 0.5)),
        ('max_depth', trial.suggest_int('max_depth', 1, 30)),
        ('num_leaves', trial.suggest_int('num_leaves', 2, 100)),
        ('min_data_in_leaf', trial.suggest_int('min_data_in_leaf', 10, 1000)),
        ('feature_fraction', trial.suggest_uniform('feature_fraction', 0.1, 1.0)),
        ('subsample', trial.suggest_uniform('subsample', 0.1, 1.0))])
    score = -1.0 * train_evaluate(X, y, params)
    return score
API: search space & objective
● Basic options:
○ suggest_categorical
○ suggest_int , suggest_discrete_uniform
○ suggest_uniform , suggest_loguniform
● Nested search spaces
● Defined at run time inside the objective (define-by-run, PyTorch-like)
API: search space & objective
def objective(trial):
    classifier_name = trial.suggest_categorical('classifier', ['SVC', 'RandomForest'])
    if classifier_name == 'SVC':
        svc_c = trial.suggest_loguniform('svc_c', 1e-10, 1e10)
        classifier_obj = sklearn.svm.SVC(C=svc_c)
    else:
        rf_max_depth = int(trial.suggest_loguniform('rf_max_depth', 2, 32))
        classifier_obj = sklearn.ensemble.RandomForestClassifier(max_depth=rf_max_depth)
    …
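Define-by-run means the search space is whatever the objective asks for while it runs, so the branch taken decides which hyperparameters exist in a given trial. A minimal sketch with a hypothetical `ToyTrial` that samples uniformly at random; it only mimics the two `suggest_*` calls used above and is not Optuna's `Trial`.

```python
import math
import random

class ToyTrial:
    # Hypothetical minimal trial object: samples at random and records what was asked for.
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.params = {}

    def suggest_categorical(self, name, choices):
        self.params[name] = self.rng.choice(choices)
        return self.params[name]

    def suggest_loguniform(self, name, low, high):
        self.params[name] = math.exp(self.rng.uniform(math.log(low), math.log(high)))
        return self.params[name]

def objective(trial):
    # Returns the sampled params instead of a score, to show which ones exist per trial.
    classifier_name = trial.suggest_categorical('classifier', ['SVC', 'RandomForest'])
    if classifier_name == 'SVC':
        trial.suggest_loguniform('svc_c', 1e-10, 1e10)
    else:
        trial.suggest_loguniform('rf_max_depth', 2, 32)
    return trial.params

sampled = [objective(ToyTrial(seed)) for seed in range(20)]
```

An SVC trial never gets an `rf_max_depth`, and a RandomForest trial never gets an `svc_c`. Expressing that in a flat skopt-style space is what "no support for nested search spaces" rules out.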
API: {fun}_minimize
● Allows pruning
● Handles exceptions in objective
● Handles callbacks
study = optuna.create_study()
study.optimize(objective, n_trials=100)
results = study.trials_dataframe()
API: {fun}_minimize
API: {fun}_minimize: pruning
from optuna.integration import LightGBMPruningCallback

def objective(trial):
    params = OrderedDict([
        ('max_depth', trial.suggest_int('max_depth', 1, 30)),
        ('num_leaves', trial.suggest_int('num_leaves', 2, 100))])
    pruning_callback = LightGBMPruningCallback(trial, 'auc')
    score = -1.0 * train_evaluate_with_pruning(X, y, params, pruning_callback)
    return score

def train_evaluate_with_pruning(X, y, params, callback):
    ...
    model = lgb.train(params, train_data, ... , callbacks=[callback])
    return model.best_score['valid']['auc']
API: {fun}_minimize: callbacks
def report_neptune(study, trial):
    neptune.send_metric('value', trial.value)
    neptune.send_metric('best_value', study.best_value)

study = optuna.create_study()
study.optimize(objective, n_trials=100, callbacks=[report_neptune])
Available in bleeding edge version from source*
Documentation
● Solid read-the-docs project,
● Docstrings, docstrings everywhere,
● A lot of examples.
Visualizations
● Options:
○ optuna.visualization.plot_intermediate_values
○ optuna.visualization.plot_optimization_history
● Basic monitoring
● Available in bleeding edge version from source*
Speed & Parallelization
● Can be easily distributed across one or many machines
● Has pruning to speed up unpromising runs
Speed & Parallelization: one
study.optimize(objective, n_trials=100, n_jobs=5)
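With `n_jobs`, several trials run at the same time inside one process (Optuna uses threads for this). A standard-library sketch of the same idea for plain random search, assuming a hypothetical `objective`. Threads only speed things up when the heavy lifting happens in native code that releases the GIL.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def objective(params):
    # Hypothetical objective; a real one would train and evaluate a model.
    return (params["learning_rate"] - 0.1) ** 2

def parallel_random_search(n_trials=100, n_jobs=5, seed=0):
    rng = random.Random(seed)
    # Random search can sample every configuration up front; a model-based sampler
    # such as TPE would instead propose each new trial from the finished ones.
    trials = [{"learning_rate": rng.uniform(0.01, 0.5)} for _ in range(n_trials)]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        scores = list(pool.map(objective, trials))
    best_score, best_params = min(zip(scores, trials), key=lambda pair: pair[0])
    return best_score, best_params, len(scores)

best_score, best_params, n_done = parallel_random_search()
```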
Speed & Parallelization: many
optuna_search.py
…
study = optuna.Study(study_name='distributed-search', storage='sqlite:///example.db')
study.optimize(objective, n_trials=100)
...
terminal 1
$ optuna create-study --study-name "distributed-search" --storage "sqlite:///example.db"
terminal 2
$ python optuna_search.py
terminal 3
$ python optuna_search.py
Experimental Results
Conclusions: good
● Easy to use API
● Great documentation
● Can be easily distributed over a cluster of machines
● Has pruning
● Has callbacks
● Search space supports nesting
● Active project support
Conclusions: bad
● Only TPE optimizer available
● Only some visualizations
● *No gains over random search (with a 100-iteration budget)
Optuna is hyperopt with:
● better api
● waaaay better documentation
● pruning (and halving available)
● exception handling
● simpler parallelization
● active project support
Should I replace hyperopt with optuna?
HpBandSter
https://www.automl.org/
● HyperBand on Steroids
● It has state-of-the-art algorithms
○ Hyperband
○ BOHB (Bayesian Optimization + Hyperband)
● Distributed-computing-first API
HpBandSter
Algorithm
● Objective function estimated with TPE
● Next run params selected via Expected Improvement
● Objective func calculation speedup via a bandit method with
random budgets (hyperband)
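Hyperband runs successive halving in several brackets. Each bracket trades the number of configurations against the budget they start from, so no single starting budget has to be chosen. A sketch of the bracket schedule from the original Hyperband formulation, assuming integer budgets where max_budget / min_budget is a power of eta. HpBandSter derives its own schedule from `min_budget`, `max_budget` and `eta`, which may differ in details. BOHB keeps this kind of schedule but proposes configurations with a model instead of sampling them uniformly at random.

```python
def hyperband_brackets(min_budget=1, max_budget=27, eta=3):
    # Largest s such that min_budget * eta**s <= max_budget (exact integer arithmetic).
    s_max = 0
    while min_budget * eta ** (s_max + 1) <= max_budget:
        s_max += 1
    brackets = []
    for s in range(s_max, -1, -1):
        n = -(-((s_max + 1) * eta ** s) // (s + 1))  # ceil((s_max + 1) * eta**s / (s + 1))
        r = max_budget // eta ** s  # starting budget of this bracket
        # Each round of successive halving: (number of configs, budget per config).
        rounds = [(n // eta ** i, r * eta ** i) for i in range(s + 1)]
        brackets.append(rounds)
    return brackets

brackets = hyperband_brackets()
```

The first bracket is aggressive (27 configs starting at budget 1). The last one is plain random search (4 configs, all at the full budget 27).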
API
server
+
worker
+
optimizer
API: server
● Workers communicate with server to:
○ get next parameter configuration
○ send results
● You have to define it even for the most basic setups/problems (weird)
API: server
import hpbandster.core.nameserver as hpns
NS = hpns.NameServer(run_id=RUN_ID, host=HOST, port=PORT, working_directory=WORKING_DIRECTORY)
ns_host, ns_port = NS.start()
API: worker: objective
from hpbandster.core.worker import Worker
class TrainEvalWorker(Worker):
    ...
    def compute(self, config, budget, working_directory, *args, **kwargs):
        loss = -1.0 * train_evaluate(self.X, self.y, budget, config)
        return ({'loss': loss,
                 'info': {'auxiliary_stuff': 'worked'}
                 })
API: worker: search space
● Basic options:
○ CSH.{Categorical/Ordinal}Hyperparameter
○ CSH.{Uniform/Normal}IntegerHyperparameter
○ CSH.{Uniform/Normal}FloatHyperparameter
● Nested (conditional) search spaces via conditions
API: worker: search space
import ConfigSpace as CS
import ConfigSpace.hyperparameters as CSH

class TrainEvalWorker(Worker):
    ...
    @staticmethod
    def get_configspace():
        cs = CS.ConfigurationSpace()
        learning_rate = CSH.UniformFloatHyperparameter('learning_rate',
            lower=0.01, upper=0.5, default_value=0.01, log=True)
        subsample = CSH.UniformFloatHyperparameter('subsample',
            lower=0.1, upper=1.0, default_value=0.5, log=False)
        cs.add_hyperparameters([learning_rate, subsample])
        return cs
API: worker: connecting to server
worker = TrainEvalWorker(run_id=RUN_ID, nameserver=ns_host, nameserver_port=ns_port)
worker.run(background=True)
API: optimizer
from hpbandster.optimizers import BOHB
optim = BOHB(configspace = worker.get_configspace(),
run_id = RUN_ID,
nameserver=ns_host,
nameserver_port=ns_port,
eta=3, min_budget=0.1, max_budget=1,
num_samples=64, top_n_percent=15,
min_bandwidth=1e-3, bandwidth_factor=3)
study = optim.run(n_iterations=100)
API: optimizer: callbacks
class NeptuneLogger:
    def new_config(self, *args, **kwargs):
        pass

    def __call__(self, job):
        neptune.send_metric('run_score', job.result['loss'])
        neptune.send_text('run_parameters', str(job.kwargs['config']))

optim = BOHB(configspace=worker.get_configspace(),
             run_id=RUN_ID,
             nameserver=ns_host,
             nameserver_port=ns_port,
             result_logger=NeptuneLogger())
Documentation
● Decent Read-the-docs project,
● Missing docstrings in a lot of places,
● A bunch of examples.
Visualizations
● Options:
○ hpvis.losses_over_time - score improvement
○ hpvis.concurrent_runs_over_time - speed/parallelization
○ hpvis.finished_runs_over_time - budget adjustment
○ hpvis.correlation_across_budgets - budget adjustment
○ hpvis.performance_histogram_model_vs_random - sanity check
● Very lib/debug-specific but can be useful for tweaking
Visualizations: losses_over_time
all_runs = results.get_all_runs()
hpvis.losses_over_time(all_runs);
Visualizations: correlation_across_budgets
hpvis.correlation_across_budgets(results);
Visualizations: performance_histogram_model_vs_random
all_runs = results.get_all_runs()
id2conf = results.get_id2config_mapping()
hpvis.performance_histogram_model_vs_random(all_runs, id2conf);
Speed & Parallelization
● Can be easily distributed across threads/processes/machines
Speed & Parallelization: threads
workers = []
for i in range(N_WORKERS):
    w = TrainEvalWorker(run_id=RUN_ID, id=i, sleep_interval=0.5,
                        nameserver=ns_host, nameserver_port=ns_port)
    w.run(background=True)
    workers.append(w)

optim = BOHB(configspace=TrainEvalWorker.get_configspace(),
             run_id=RUN_ID,
             nameserver=ns_host,
             nameserver_port=ns_port)
study = optim.run(n_iterations=100, min_n_workers=N_WORKERS)
Speed & Parallelization: processes
# Worker script: start one copy per process; WORKER_ID (hypothetical) must differ per process
w = TrainEvalWorker(run_id=RUN_ID, id=WORKER_ID, sleep_interval=0.5,
                    nameserver=ns_host, nameserver_port=ns_port)
w.run(background=False)  # runs in the foreground, serving jobs from the optimizer
exit(0)

# Master script
optim = BOHB(configspace=TrainEvalWorker.get_configspace(),
             run_id=RUN_ID,
             nameserver=ns_host,
             nameserver_port=ns_port)
study = optim.run(n_iterations=100, min_n_workers=N_WORKERS)
Speed & Parallelization: machines
Follow the example from the docs … but it is not obvious
Experimental Results
Conclusions: good
● State-of-the-art algorithm
● Can be distributed over a cluster of machines
● Useful visualizations
● Search space supports nesting
Conclusions: bad
● Project is not very active
● Complicated API
● Missing docstrings
Which one should I choose?
Results (mostly subjective)
Criterion | Scikit-Optimize | Optuna | HpBandSter | Hyperopt
API/ease of use | Great | Great | Difficult | Good
Documentation | Great | Great | Ok(ish) | Bad
Speed/Parallelization | Fast if sequential/None | Great | Good | Ok
Visualizations | Amazing | Basic | Very lib specific | Some
*Experimental results | 0.8566 (100) | 0.8419 (100), 0.8597 (10000) | 0.8629 (100) | 0.8420 (100)
Dream library
Scikit-Optimize Visualizations
+
Optuna API + Docs + Pruning + Callbacks +
Parallelization
+
HpBandSter Optimizers
Conversions between results objects are in neptune-contrib
import neptunecontrib.hpo.utils as hpo_utils
results = hpo_utils.optuna2skopt(study)
● If you don’t have a lot of resources - use Scikit-Optimize
● If you want to get SOTA and don’t care about API/Docs - use HpBandSter
● If you want good docs/api/parallelization - use Optuna
Recommendations
● Slides link on Twitter @NeptuneML or LinkedIn @neptune.ml
● Blog posts on Medium @jakub.czakon
● Experiments in Neptune tags skopt/optuna/hpbandster
○ Code
○ Best hyperparams and Hyper hyper params
○ learning curves
○ diagnostic charts
○ resource consumption charts
○ pickled results objects
Materials
Data science work sharing hub.
Track | Organize | Collaborate
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 

Hyperparameter optimization landscape Berlin ML Group meetup 8/2019

  • 1. Hyperparameter Optimization (landscape) in Python kuba@neptune.ml @NeptuneML https://medium.com/neptune-ml Jakub Czakon
  • 2. Agenda ● Intro ● Methods ● Libraries + Evaluation Criteria ○ Scikit-Optimize ○ Optuna + Hyperopt ○ HpBandSter ● Results and Recommendations
  • 4. Intro: every stage of the pipeline has hyperparameters ● Data Cleaning: imputation method, scaling method ● Feature Engineering: bin_nr, groupby columns, lagging ● Model: learning rate, depth, feature fraction ● Post-processing: thresholds ● All together: objective(params, data=data) -> score
  • 5. Methods ● Grid search ● Random search ● Guided search ● Grad student search (still best)
  • 6. Methods ● Better configuration proposal ○ Objective function is estimated with surrogate models ○ Evolutionary methods ○ ... ● Faster objective function calculation ○ Bandit methods ○ Pruning ○ Estimating a score from the learning curve of a NN ○ ...
  • 7. Methods: surrogate models ● objective(params) -> score is expensive ● surrogate(params) -> est_score is cheap ● Explore cheap: surrogate(params1) = est_score1, surrogate(params2) = est_score2, …, surrogate(params1000) = est_score1000 ● Try expensive: objective(params2) = score2
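  • 8. Methods: surrogate models ● objective(params) -> score ● surrogate(params) -> est_score ● surrogate(params1) = est_score1, surrogate(params2) = est_score2, …, surrogate(params1000) = est_score1000 ● objective(params2) = score2 ● Surrogate models: TPE (Tree of Parzen Estimators), GP (Gaussian process), RF (random forest) ● Acquisition functions: EI (expected improvement), PI (probability of improvement), LCB (lower confidence bound)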
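A toy version of the explore-cheap / try-expensive loop in plain Python. It is a sketch under loud assumptions, not TPE, GP or RF: the surrogate is a k-nearest-neighbour average of already evaluated scores, the distance to the closest evaluated point serves as a crude uncertainty, and the two are combined LCB-style. `objective`, `sample_params` and the two-parameter search space are hypothetical.

```python
import random


def objective(params):
    # Hypothetical "expensive" objective to minimize (stands in for training and
    # validating a model); its minimum is at learning_rate=0.1, max_depth=6.
    learning_rate, max_depth = params
    return 100 * (learning_rate - 0.1) ** 2 + (max_depth - 6) ** 2 / 10


def sample_params(rng):
    # Hypothetical search space: learning_rate in [0.01, 0.5], max_depth in 1..30.
    return (rng.uniform(0.01, 0.5), rng.randint(1, 30))


def distance(a, b):
    # Rescale both dimensions to roughly [0, 1] before comparing points.
    return (((a[0] - b[0]) / 0.49) ** 2 + ((a[1] - b[1]) / 29) ** 2) ** 0.5


def surrogate(params, history, k=3):
    # Cheap estimate: mean score of the k nearest evaluated points, plus the
    # distance to the closest one as a crude stand-in for uncertainty.
    nearest = sorted(history, key=lambda h: distance(params, h[0]))[:k]
    est_score = sum(score for _, score in nearest) / len(nearest)
    return est_score, distance(params, nearest[0][0])


def surrogate_search(n_calls=30, n_random_starts=5, n_candidates=1000, kappa=1.0, seed=0):
    rng = random.Random(seed)
    history = []  # (params, score) pairs from real, expensive evaluations
    for _ in range(n_random_starts):
        params = sample_params(rng)
        history.append((params, objective(params)))
    for _ in range(n_calls - n_random_starts):
        # Explore cheap: rank many candidates with the surrogate only.
        candidates = [sample_params(rng) for _ in range(n_candidates)]

        def lower_bound(p):
            est_score, uncertainty = surrogate(p, history)
            return est_score - kappa * uncertainty  # optimistic, LCB-style

        best_candidate = min(candidates, key=lower_bound)
        # Try expensive: one real objective evaluation per iteration.
        history.append((best_candidate, objective(best_candidate)))
    best_params, best_score = min(history, key=lambda h: h[1])
    return best_params, best_score, history


best_params, best_score, history = surrogate_search()
```

Each iteration scores 1000 candidates with the surrogate but calls `objective` only once, which is what matters when a real objective means training a model.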
  • 9. Methods: bandit methods ● objective(params, data=data) -> score ● objective(params, budget=full) -> score ● objective(params, budget=low) -> score ● Estimate the score with lower-fidelity runs
  • 10. Methods: bandit methods ● Budget options: ○ Dataset size ○ Number of epochs ○ Time ○ Number of features ○ Number of CV folds
  • 11. Methods: bandit methods ● Successive halving: set the resource, the budget and the number of runs ● Hyperband: random resource, grid search over the number of runs, within a set budget
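Successive halving fits in a few lines, assuming lower loss is better and that `evaluate(config, budget)` is a hypothetical objective whose `budget` is any of the resources listed above (epochs, data fraction, CV folds, ...). This is a minimal illustration, not the implementation used by any of the libraries below.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Run all configs on a small budget, keep the best 1/eta, repeat with eta times the budget."""
    budget = min_budget
    while len(configs) > 1:
        # Lower loss is better; sorted() calls evaluate once per config.
        ranked = sorted(configs, key=lambda config: evaluate(config, budget))
        configs = ranked[: max(1, len(configs) // eta)]
        budget *= eta
    return configs[0]


calls = []  # budgets actually spent, to show where the compute goes


def evaluate(learning_rate, budget):
    # Hypothetical low-fidelity objective: `budget` could be epochs, a data
    # fraction or CV folds; here it only shrinks a fixed penalty.
    calls.append(budget)
    return (learning_rate - 0.1) ** 2 + 0.01 / budget


configs = [0.01 * i for i in range(1, 28)]  # 27 candidate learning rates
best = successive_halving(configs, evaluate, min_budget=1, eta=3)
```

With 27 configurations and `eta=3`, each rung spends the same total budget (27 × 1, 9 × 3, 3 × 9) while the number of survivors shrinks. In this toy the cheap runs rank configurations perfectly; in practice low-budget rankings are noisy, and Hyperband hedges against that by trying several trade-offs between the number of runs and the budget per run.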
  • 12. Methods: pruning ● Prune/abort runs that show little hope before they finish ● Isn’t it called early stopping?
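A pruning rule can be as simple as comparing a run with its peers. The sketch below assumes a median rule, similar in spirit to Optuna's median pruner but written from scratch (it is not Optuna's code): once past a warm-up, a run is stopped if its latest validation loss is worse than the median of what already completed runs had at the same step. The learning curves are made up. One way to read the slide's question: classic early stopping compares a run with its own history, while this rule compares it with other runs.

```python
import statistics


def should_prune(intermediate_losses, completed_runs, warmup_steps=5):
    """Median rule: stop a run whose latest loss is worse than the median loss
    that already completed runs had at the same step."""
    step = len(intermediate_losses) - 1
    if step < warmup_steps:
        return False  # let every run get past a short warm-up first
    peers = [run[step] for run in completed_runs if len(run) > step]
    if not peers:
        return False
    return intermediate_losses[-1] > statistics.median(peers)


# Hypothetical learning curves (validation loss per epoch).
completed = [[1.0 - 0.05 * s for s in range(20)], [0.9 - 0.04 * s for s in range(20)]]
hopeless = [1.5] * 10
promising = [0.8 - 0.06 * s for s in range(10)]
```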
  • 13. Libraries ● Scikit-Optimize ● Optuna ● Hyperopt (sort of) ● HpBandSter ● Ray.tune (future work) ● … and many more
  • 14. Evaluation Criteria ● Algorithm ● API / ease of use ● Documentation ● Speed / Parallelization ● Visualization suite ● Experimental results
  • 16. Scikit-Optimize: Algorithm ● Objective function estimated with surrogate models ○ Random Forests ○ Gradient Boosted Trees ○ Gaussian Process ● Next run params selected via acquisition function ○ Expected Improvement ○ Probability of Improvement ○ Lower Confidence Bound ● No mechanism to speed up the objective function calculation
  • 18. API: search space ● Basic options: ○ skopt.space.Real ○ skopt.space.Integer ○ skopt.space.Categorical ● No support for nested search spaces
  • 19. API: search space SPACE = [skopt.space.Real(0.01, 0.5, name='learning_rate', prior='log-uniform'), skopt.space.Integer(1, 30, name='max_depth'), skopt.space.Integer(2, 100, name='num_leaves'), skopt.space.Integer(10, 1000, name='min_data_in_leaf'), skopt.space.Real(0.1, 1.0, name='feature_fraction', prior='uniform'), skopt.space.Real(0.1, 1.0, name='subsample', prior='uniform'), ]
  • 20. API: objective ● Define a function to minimize! ● Decorate if you want to keep parameter names
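  • 21. API: objective ● Without the decorator, skopt passes a plain list of values in SPACE order: def objective(params): named = {dim.name: value for dim, value in zip(SPACE, params)} return -1.0 * train_evaluate(X, y, **named) ● With the decorator, the parameter names are kept for you: @skopt.utils.use_named_args(SPACE) def objective(**params): return -1.0 * train_evaluate(X, y, **params)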
  • 22. API: {fun}_minimize ● A few optimizers to choose from ○ skopt.forest_minimize ○ skopt.gbrt_minimize ○ skopt.gp_minimize ● Accepts callbacks
  • 23. API: {fun}_minimize results = skopt.forest_minimize(objective, SPACE, n_calls=100, n_random_starts=10, base_estimator='ET', acq_func='LCB', xi=0.02, kappa=1.96)
  • 24. API: {fun}_minimize def monitor(res): neptune.send_metric('run_score', res.func_vals[-1]) results = skopt.forest_minimize(..., callback=[monitor])
  • 25. API: {fun}_minimize ● There are (hyper)hyperparameters ● Acquisition function: ○ 'EI', 'PI': expected improvement, probability of improvement (maximized) ○ 'LCB': lower confidence bound, the predicted mean of the objective minus kappa times its predicted standard deviation (minimized) ● Exploration vs exploitation ○ xi for 'EI' and 'PI': low xi favours exploitation, high xi favours exploration ○ kappa for 'LCB': low kappa favours exploitation, high kappa favours exploration
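For a surrogate that predicts a Gaussian with mean `mu` and standard deviation `sigma` at a candidate point, the three acquisition functions have standard closed forms. A sketch for minimization using those textbook formulas (not Scikit-Optimize's source); the `safe` and `risky` candidates are made up to show the direction of the trade-off: a small `xi` or `kappa` prefers the confident candidate, a large one the uncertain candidate.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# All three assume we MINIMIZE the objective and sigma > 0.

def probability_of_improvement(mu, sigma, y_best, xi=0.01):
    # Chance that the candidate beats the best score so far by at least xi.
    return norm_cdf((y_best - mu - xi) / sigma)


def expected_improvement(mu, sigma, y_best, xi=0.01):
    # Expected amount by which the candidate beats y_best - xi (0 if it does not).
    improvement = y_best - mu - xi
    z = improvement / sigma
    return improvement * norm_cdf(z) + sigma * norm_pdf(z)


def lower_confidence_bound(mu, sigma, kappa=1.96):
    # Optimistic estimate; the candidate with the LOWEST value is tried next.
    return mu - kappa * sigma


# Hypothetical surrogate predictions (mu, sigma) for two candidates.
safe, risky, y_best = (0.45, 0.01), (0.55, 0.1), 0.5
```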
  • 26. Documentation ● Amazing! ● Functions have docstrings. ● A lot of examples.
  • 27. Visualizations ● Options: ○ skopt.plots.plot_convergence - score improvement ○ skopt.plots.plot_evaluations - space search evolution ○ skopt.plots.plot_objective - sensitivity ● Beautiful and very useful.
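The convergence plot is, at heart, a running minimum of the scores in call order (Scikit-Optimize minimizes). A sketch of just that computation, with made-up scores and no plotting:

```python
from itertools import accumulate


def convergence_trace(scores):
    """Best (lowest) score seen after each call, in call order."""
    return list(accumulate(scores, min))


trace = convergence_trace([0.31, 0.27, 0.29, 0.22, 0.25])
```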
  • 32. Speed & Parallelization ● Runs sequentially; you cannot distribute it across many machines ● You can parallelize the base estimator at every run with n_jobs ● If you have just 1 machine, it is fast
  • 34. Conclusions: good ● Easy to use API and great documentation ● A lot of optimizers and tweaking options ● Awesome visualizations ● Solid gains over the random search ● Fast if you are running sequentially on 1 machine ● Active project support
  • 35. Conclusions: bad ● Search space doesn’t support nesting ● No support for distributed computing
  • 37. Optuna: Algorithm ● Objective function estimated with Tree of Parzen Estimators ● Next run params selected via Expected Improvement ● Objective function calculation sped up via run pruning and, optionally, successive halving
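A one-dimensional sketch of the Tree of Parzen Estimators idea, under simplifying assumptions (a single continuous parameter, a fixed Gaussian kernel bandwidth, `gamma=0.25`, a made-up quadratic objective): split past trials into "good" and "bad" by score, fit a Parzen density to each, and pick the candidate with the largest ratio l(x)/g(x). Optuna's and Hyperopt's samplers add, among other things, priors, adaptive bandwidths and handling of discrete and conditional parameters; this is not their code.

```python
import math
import random


def parzen(x, centers, bandwidth):
    # Gaussian kernel density estimate at x built from the observed points.
    norm = len(centers) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in centers) / norm


def tpe_suggest(history, low, high, rng, gamma=0.25, n_candidates=24):
    """Propose the next x for a 1-D minimization problem from (x, score) pairs."""
    ordered = sorted(history, key=lambda h: h[1])
    n_good = max(1, math.ceil(gamma * len(ordered)))
    good = [x for x, _ in ordered[:n_good]]         # l(x): density of the best trials
    bad = [x for x, _ in ordered[n_good:]] or good  # g(x): density of the rest
    bandwidth = (high - low) / 10                   # fixed bandwidth (simplification)
    # Draw candidates from l(x): pick a good point, perturb it, clip to bounds.
    candidates = [min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
                  for _ in range(n_candidates)]

    def ratio(x):
        # TPE prefers candidates that are likely under l(x) and unlikely under g(x).
        return parzen(x, good, bandwidth) / (parzen(x, bad, bandwidth) + 1e-12)

    return max(candidates, key=ratio)


def tpe_minimize(objective, low, high, n_trials=40, n_startup=10, seed=0):
    rng = random.Random(seed)
    history = []
    for i in range(n_trials):
        # Random warm-up before there is enough history to split good from bad.
        x = rng.uniform(low, high) if i < n_startup else tpe_suggest(history, low, high, rng)
        history.append((x, objective(x)))
    return min(history, key=lambda h: h[1])


best_x, best_score = tpe_minimize(lambda x: (x - 2) ** 2, low=-5.0, high=5.0)
```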
  • 38. API: search space & objective + {fun}_minimize
  • 39. API: search space & objective def objective(trial): params = OrderedDict([ ('learning_rate',trial.suggest_loguniform('learning_rate', 0.01, 0.5)), ('max_depth',trial.suggest_int('max_depth', 1, 30)), ('num_leaves',trial.suggest_int('num_leaves', 2, 100)), ('min_data_in_leaf',trial.suggest_int('min_data_in_leaf', 10, 1000)), ('feature_fraction',trial.suggest_uniform('feature_fraction', 0.1, 1.0)), ('subsample',trial.suggest_uniform('subsample', 0.1, 1.0))]) score = -1.0 * train_evaluate(X, y, params) return score
  • 40. API: search space & objective ● Basic options: ○ suggest_categorical ○ suggest_int, suggest_discrete_uniform ○ suggest_uniform, suggest_loguniform ● Nested search spaces ● Define-by-run: the space is defined inside the objective (PyTorch-like)
  • 41. API: search space & objective def objective(trial): classifier_name = trial.suggest_categorical('classifier', ['SVC', 'RandomForest']) if classifier_name == 'SVC': svc_c = trial.suggest_loguniform('svc_c', 1e-10, 1e10) classifier_obj = sklearn.svm.SVC(C=svc_c) else: rf_max_depth = int(trial.suggest_loguniform('rf_max_depth', 2, 32)) classifier_obj = sklearn.ensemble.RandomForestClassifier(max_depth=rf_max_depth) …
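To make "nested" concrete: in a conditional space, which parameters exist depends on earlier choices. A random-sampling sketch of the same space as the slide above, in plain Python rather than Optuna; the log-uniform draws mirror what `suggest_loguniform` is asked for, but nothing here is Optuna's API.

```python
import math
import random


def sample_nested(rng):
    """Draw one configuration from a conditional space: the keys depend on the branch."""
    params = {"classifier": rng.choice(["SVC", "RandomForest"])}
    if params["classifier"] == "SVC":
        # Log-uniform on [1e-10, 1e10]: uniform in the exponent.
        params["svc_c"] = 10 ** rng.uniform(-10, 10)
    else:
        # Log-uniform on [2, 32], truncated to an int as on the slide.
        params["rf_max_depth"] = int(math.exp(rng.uniform(math.log(2), math.log(32))))
    return params


rng = random.Random(42)
samples = [sample_nested(rng) for _ in range(200)]
```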
  • 42. API: {fun}_minimize ● Allows pruning ● Handles exceptions in objective ● Handles callbacks
  • 43. API: {fun}_minimize study = optuna.create_study() study.optimize(objective, n_trials=100) results = study.trials_dataframe()
  • 44. API: {fun}_minimize: pruning from optuna.integration import LightGBMPruningCallback def objective(trial): params = OrderedDict([ ('max_depth',trial.suggest_int('max_depth', 1, 30)), ('num_leaves',trial.suggest_int('num_leaves', 2, 100))]) pruning_callback = LightGBMPruningCallback(trial, 'auc', valid_name='valid') score = -1.0 * train_evaluate_with_pruning(X, y, params, pruning_callback) return score def train_evaluate_with_pruning(X, y, params, callback): ... model = lgb.train(params, train_data, ... , callbacks=[callback]) return model.best_score['valid']['auc']
  • 45. API: {fun}_minimize: callbacks def report_neptune(study, trial): neptune.send_metric('value', trial.value) neptune.send_metric('best_value', study.best_value) study = optuna.create_study() study.optimize(objective, n_trials=100, callbacks=[report_neptune]) Available in bleeding edge version from source*
  • 46. Documentation ● Solid read-the-docs project ● Docstrings, docstrings everywhere ● A lot of examples
  • 47. Visualizations ● Options: ○ optuna.visualization.plot_intermediate_values ○ optuna.visualization.plot_optimization_history ● Basic monitoring ● Available in bleeding edge version from source*
  • 48. Speed & Parallelization ● Can be easily distributed across one or many machines ● Has pruning to speed up unpromising runs
  • 49. Speed & Parallelization: one study.optimize(objective, n_trials=100, n_jobs=5)
  • 50. Speed & Parallelization: many ● optuna_search.py: … study = optuna.Study(study_name='distributed-search', storage='sqlite:///example.db') study.optimize(objective, n_trials=100) ... ● terminal 1: $ optuna create-study --study-name "distributed-search" --storage "sqlite:///example.db" ● terminal 2: $ python optuna_search.py ● terminal 3: $ python optuna_search.py
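The terminal setup above works because every process talks to the same storage rather than to each other. A minimal sketch of that pattern with the standard-library `sqlite3` module, with threads standing in for terminals; the table layout is an assumption for illustration, not Optuna's schema, and the workers do plain random search instead of reading past trials to guide sampling.

```python
import os
import random
import sqlite3
import tempfile
import threading


def init_storage(path):
    # Shared "study": one table of finished trials (layout is illustrative).
    conn = sqlite3.connect(path)
    with conn:
        conn.execute("CREATE TABLE IF NOT EXISTS trials (worker INTEGER, x REAL, score REAL)")
    conn.close()


def run_worker(path, worker_id, n_trials, objective):
    # Each worker (thread, process or machine) opens its own connection;
    # SQLite serializes the writes and the timeout waits out brief locks.
    rng = random.Random(worker_id)
    conn = sqlite3.connect(path, timeout=30)
    try:
        for _ in range(n_trials):
            x = rng.uniform(-5, 5)  # plain random search stands in for a sampler
            with conn:  # commit each finished trial immediately
                conn.execute("INSERT INTO trials VALUES (?, ?, ?)", (worker_id, x, objective(x)))
    finally:
        conn.close()


def best_trial(path):
    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT worker, x, score FROM trials ORDER BY score LIMIT 1").fetchone()
    finally:
        conn.close()


path = os.path.join(tempfile.mkdtemp(), "study.db")
init_storage(path)
threads = [threading.Thread(target=run_worker, args=(path, i, 20, lambda x: (x - 2) ** 2))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(best_trial(path))
```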
  • 52. Conclusions: good ● Easy to use API ● Great documentation ● Can be easily distributed over a cluster of machines ● Has pruning ● Has callbacks ● Search space supports nesting ● Active project support
  • 53. Conclusions: bad ● Only TPE optimizer available ● Only some visualizations ● *No gains over the random search (with 100 iterations budget)
  • 54. Optuna is hyperopt with: ● better api ● waaaay better documentation ● pruning (and halving available) ● exception handling ● simpler parallelization ● active project support
  • 55. Should I swap hyperopt with optuna?
  • 57. HpBandSter ● HyperBand on Steroids ● It has state-of-the-art algorithms ○ Hyperband ○ BOHB (Bayesian Optimization + Hyperband) ● Distributed-computing-first API
  • 59. Algorithm ● Objective function estimated with TPE ● Next run params selected via Expected Improvement ● Objective function calculation sped up via bandit methods with random budgets (Hyperband)
  • 62. API: server ● Workers communicate with server to: ○ get next parameter configuration ○ send results ● You have to define it even for the most basic setups/problems (weird)
  • 63. API: server import hpbandster.core.nameserver as hpns NS = hpns.NameServer(run_id=RUN_ID, host=HOST, port=PORT, working_directory=WORKING_DIRECTORY) ns_host, ns_port = NS.start()
  • 64. API: worker: objective from hpbandster.core.worker import Worker class TrainEvalWorker(Worker): ... def compute(self, config, budget, working_directory, *args, **kwargs): loss = -1.0 * train_evaluate(self.X, self.y, budget, config) return ({'loss': loss, 'info': { 'auxiliary_stuff': 'worked' } })
  • 65. API: worker: search space ● Basic options: ○ CSH.{Categorical/Ordinal}Hyperparameter ○ CSH.{Uniform/Normal}IntegerHyperparameter ○ CSH.{Uniform/Normal}FloatHyperparameter ● Nested search spaces with ifs
  • 66. API: worker: search space class TrainEvalWorker(Worker): ... @staticmethod def get_configspace(): cs = CS.ConfigurationSpace() learning_rate = CSH.UniformFloatHyperparameter('learning_rate', lower=0.01, upper=0.5, default_value=0.01, log=True) subsample = CSH.UniformFloatHyperparameter('subsample', lower=0.1, upper=1.0, default_value=0.5, log=False) cs.add_hyperparameters([learning_rate, subsample]) return cs
  • 67. API: worker: connecting to server worker = TrainEvalWorker(run_id=RUN_ID, nameserver=ns_host, nameserver_port=ns_port) worker.run(background=True)
  • 68. API: optimizer from hpbandster.optimizers import BOHB optim = BOHB(configspace = worker.get_configspace(), run_id = RUN_ID, nameserver=ns_host, nameserver_port=ns_port, eta=3, min_budget=0.1, max_budget=1, num_samples=64, top_n_percent=15, min_bandwidth=1e-3, bandwidth_factor=3) results = optim.run(n_iterations=100)
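With `eta=3`, `min_budget=0.1` and `max_budget=1` as in the BOHB call above, the Hyperband part decides how many configurations run at which budget. A sketch of the bracket schedule as usually described for Hyperband; HpBandSter's own rounding of the number of configurations per bracket may differ, and BOHB additionally proposes those configurations with its model instead of at random.

```python
def hyperband_brackets(min_budget, max_budget, eta=3):
    """Standard Hyperband schedule: one list of (n_configs, budget) rungs per bracket."""
    # s_max = floor(log_eta(max_budget / min_budget)), found without float logs.
    s_max = 0
    while min_budget * eta ** (s_max + 1) <= max_budget * (1 + 1e-9):
        s_max += 1
    brackets = []
    for s in range(s_max, -1, -1):
        # Most aggressive bracket first: many configs on a tiny budget.
        n = ((s_max + 1) * eta ** s + s) // (s + 1)  # ceil((s_max + 1) / (s + 1) * eta**s)
        # Each rung keeps n // eta**i configs and gives them eta times more budget.
        rungs = [(n // eta ** i, max_budget / eta ** (s - i)) for i in range(s + 1)]
        brackets.append(rungs)
    return brackets


# Values from the BOHB call above: eta=3, min_budget=0.1, max_budget=1.
for rungs in hyperband_brackets(min_budget=0.1, max_budget=1, eta=3):
    print(rungs)
```

Each bracket is a successive-halving run; the brackets differ only in how many configurations they start with and how small their first budget is.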
  • 69. API: optimizer: callbacks class NeptuneLogger: def new_config(self, *args, **kwargs): pass def __call__(self, job): neptune.send_metric('run_score', job.result['loss']) neptune.send_text('run_parameters', str(job.kwargs['config'])) optim = BOHB(configspace=worker.get_configspace(), run_id=RUN_ID, nameserver=ns_host, nameserver_port=ns_port, result_logger=NeptuneLogger())
  • 70. Documentation ● Decent read-the-docs project ● Missing docstrings in a lot of places ● A bunch of examples
  • 71. Visualizations ● Options: ○ hpvis.losses_over_time - score improvement ○ hpvis.concurrent_runs_over_time - speed/parallelization ○ hpvis.finished_runs_over_time - budget adjustment ○ hpvis.correlation_across_budgets - budget adjustment ○ hpvis.performance_histogram_model_vs_random - sanity check ● Very lib/debug-specific but can be useful for tweaking
  • 73. Visualizations: losses_over_time all_runs = results.get_all_runs() hpvis.losses_over_time(all_runs);
  • 77. Visualizations: performance_histogram_model_vs_random all_runs = results.get_all_runs() id2conf = results.get_id2config_mapping() hpvis.performance_histogram_model_vs_random(all_runs, id2conf);
  • 78. Speed & Parallelization ● Can be easily distributed across threads/processes/machines
  • 79. Speed & Parallelization: threads workers = [] for i in range(N_WORKERS): w = TrainEvalWorker(run_id=RUN_ID, id=i, sleep_interval=0.5, nameserver=ns_host, nameserver_port=ns_port) w.run(background=True) workers.append(w) optim = BOHB(configspace=TrainEvalWorker.get_configspace(), run_id=RUN_ID, nameserver=ns_host, nameserver_port=ns_port) results = optim.run(n_iterations=100, min_n_workers=N_WORKERS)
  • 80. Speed & Parallelization: processes ● In each of the N_WORKERS worker processes: w = TrainEvalWorker(run_id=RUN_ID, sleep_interval=0.5, nameserver=ns_host, nameserver_port=ns_port) w.run(background=False) exit(0) ● In the master process: optim = BOHB(configspace=TrainEvalWorker.get_configspace(), run_id=RUN_ID, nameserver=ns_host, nameserver_port=ns_port) results = optim.run(n_iterations=100, min_n_workers=N_WORKERS)
  • 81. Speed & Parallelization: machines Follow the example from the docs … but it is not obvious
  • 83. Conclusions: good ● State-of-the-art algorithm ● Can be distributed over a cluster of machines ● Useful visualizations ● Search space supports nesting
  • 84. Conclusions: bad ● Project is not very active ● Complicated API ● Missing docstrings
  • 85. Which one should I choose?
  • 86. Results (mostly subjective)
      Criterion | Scikit-Optimize | Optuna | HpBandSter | Hyperopt
      API/ease of use | Great | Great | Difficult | Good
      Documentation | Great | Great | Ok(ish) | Bad
      Speed/Parallelization | Fast if sequential/None | Great | Good | Ok
      Visualizations | Amazing | Basic | Very lib specific | Some
      *Experimental results (score, number of iterations in parentheses): 0.8566 (100), 0.8419 (100), 0.8597 (10000), 0.8629 (100), 0.8420 (100)
  • 87. Dream library Scikit-Optimize Visualizations + Optuna API + Docs + Pruning + Callbacks + Parallelization + HpBandSter Optimizers
  • 88. Dream library: conversions between results objects are in neptune-contrib import neptunecontrib.hpo.utils as hpo_utils results = hpo_utils.optuna2skopt(study)
  • 89. Recommendations ● If you don’t have a lot of resources, use Scikit-Optimize ● If you want to get SOTA and don’t care about API/docs, use HpBandSter ● If you want good docs/API/parallelization, use Optuna
  • 90. Materials ● Slides: link on Twitter @NeptuneML or LinkedIn @neptune.ml ● Blog posts on Medium @jakub.czakon ● Experiments in Neptune, tags skopt/optuna/hpbandster ○ Code ○ Best hyperparameters and hyper-hyperparameters ○ Learning curves ○ Diagnostic charts ○ Resource consumption charts ○ Pickled results objects
  • 91. Data science work sharing hub. Track | Organize | Collaborate kuba@neptune.ml @NeptuneML https://medium.com/neptune-ml Jakub Czakon