Hyperparameter optimization landscape Berlin ML Group meetup 8/2019


  1. 1. Hyperparameter Optimization (landscape) in Python kuba@neptune.ml @NeptuneML https://medium.com/neptune-ml Jakub Czakon
  2. 2. ● Intro ● Methods ● Libraries + Evaluation Criteria ○ Scikit-Optimize ○ Optuna + Hyperopt ○ HpBandster ● Results and Recommendations Agenda
  3. 3. Intro: data -> Model (learning rate, depth, feature fraction) -> score
  4. 4. Intro: data -> Data Cleaning (imputation method, scaling method) -> Feature Engineering (bin_nr, groupby columns, lagging) -> Model (learning rate, depth, feature fraction) -> Post-processing (thresholds) -> score; objective(params, data=data) -> score
  5. 5. ● Grid search ● Random search ● Guided search ● Grad student search (still best) Methods
  6. 6. ● Better configuration proposal ○ Objective function is estimated with surrogate models ○ Evolutionary methods ○ ... ● Faster objective function calculation ○ Bandit methods ○ Pruning ○ Estimating a score from the learning curve of NN ○ ... Methods
  7. 7. Methods: surrogate models
    objective(params) -> score (expensive)
    surrogate(params) -> est_score (cheap)
    explore cheaply: surrogate(params_1) = est_score_1, surrogate(params_2) = est_score_2, ..., surrogate(params_1000) = est_score_1000
    try the expensive one only where it looks promising: objective(params_2) = score_2
  8. 8. Methods: surrogate models
    objective(params) -> score
    surrogate(params) -> est_score (surrogate models: TPE, GP, RF)
    surrogate(params_1) = est_score_1, surrogate(params_2) = est_score_2, ..., surrogate(params_1000) = est_score_1000
    objective(params_2) = score_2 (next params picked via acquisition function: EI, PI, LCB)
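A minimal sketch of the loop slides 7-8 describe, not from the talk: a random-forest surrogate scores a large batch of candidate configurations cheaply, and only the most promising one is evaluated with the expensive objective. The toy objective, bounds, and constants are all illustrative.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def objective(params):
        # stands in for an expensive evaluation, e.g. training and scoring a model
        return (params[0] - 0.3) ** 2 + (params[1] - 0.7) ** 2

    rng = np.random.RandomState(0)
    X, y = [], []
    for _ in range(5):  # a few random warm-up evaluations of the expensive objective
        p = rng.uniform(0, 1, size=2)
        X.append(p); y.append(objective(p))

    for _ in range(20):
        surrogate = RandomForestRegressor(n_estimators=50, random_state=0).fit(np.array(X), np.array(y))
        candidates = rng.uniform(0, 1, size=(1000, 2))  # explore cheaply with the surrogate
        per_tree = np.stack([tree.predict(candidates) for tree in surrogate.estimators_])
        lcb = per_tree.mean(axis=0) - 1.96 * per_tree.std(axis=0)  # LCB-style acquisition (minimization)
        best = candidates[np.argmin(lcb)]
        X.append(best); y.append(objective(best))  # try only the promising candidate expensively

    print('best score found:', min(y))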
  9. 9. Methods: bandit methods objective(params, data=data) -> score objective(params, budget=full) -> score objective(params, budget=low) -> score Estimate the score with lower fidelity runs
  10. 10. Methods: bandit methods ● Budget options: ○ Dataset size ○ Number of epochs ○ Time ○ Number of features ○ Number of CV-folds
  11. 11. Methods: bandit methods ● Successive halving: set resource, set budget, set run nr link ● Hyperband: random resource, grid search run nr, within set budget link
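To make successive halving concrete, here is a toy sketch, not from the talk, with illustrative numbers: start many configurations on a small budget, keep the best third after each round, and triple the budget for the survivors.

    import numpy as np

    rng = np.random.RandomState(0)

    def objective(params, budget):
        # low-fidelity evaluation: a bigger budget (data fraction, epochs, ...) means less noise
        return (params - 0.3) ** 2 + rng.normal(scale=0.1 / budget)

    configs = list(rng.uniform(0, 1, size=27))  # many configurations, tiny budget each
    budget = 1.0
    while len(configs) > 1:
        scores = [objective(c, budget) for c in configs]
        keep = max(1, len(configs) // 3)  # keep the best third
        configs = [c for _, c in sorted(zip(scores, configs))][:keep]
        budget *= 3  # survivors get 3x the budget

    print('winning configuration:', configs[0])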
  12. 12. Methods: pruning ● Prune/abort runs that show little hope before they finish ● Isn’t it called early stopping?
  13. 13. ● Scikit-Optimize ● Optuna ● Hyperopt (sort of) ● HpBandSter ● Ray.tune (future work) ● … and many more Libraries
  14. 14. ● Algorithm ● API / ease of use ● Documentation ● Speed / Parallelization ● Visualization suite ● Experimental results Evaluation Criteria
  15. 15. Scikit-Optimize
  16. 16. Algorithm ● Objective function estimated with surrogate models ○ Random Forests ○ Gradient Boosted Trees ○ Gaussian process ● Next run params selected via acquisition function ○ Expected Improvement ○ Probability of Improvement ○ Lower Confidence Bound ● No objective func calculation speedup mechanism
  17. 17. API search space + objective + {fun}_minimize
  18. 18. API: search space ● Basic options: ○ skopt.space.Real ○ skopt.space.Integer ○ skopt.space.Categorical ● No support for nested search spaces
  19. 19. API: search space SPACE = [skopt.space.Real(0.01, 0.5, name='learning_rate', prior='log-uniform'), skopt.space.Integer(1, 30, name='max_depth'), skopt.space.Integer(2, 100, name='num_leaves'), skopt.space.Integer(10, 1000, name='min_data_in_leaf'), skopt.space.Real(0.1, 1.0, name='feature_fraction', prior='uniform'), skopt.space.Real(0.1, 1.0, name='subsample', prior='uniform'), ]
  20. 20. API: objective ● Define a function to minimize! ● Decorate if you want to keep parameter names
  21. 21. API: objective def objective(**params): return -1.0 * train_evaluate(X, y, **params) @skopt.utils.use_named_args(SPACE) def objective(**params): return -1.0 * train_evaluate(X, y, **params)
  22. 22. API: {fun}_minimize ● A few optimizers to choose from ○ skopt.forest_minimize ○ skopt.gbrt_minimize ○ skopt.gp_minimize ● Accepts callbacks
  23. 23. API: {fun}_minimize results = skopt.forest_minimize(objective, SPACE, n_calls=100, n_random_starts=10, base_estimator='ET', acq_func='LCB', xi=0.02, kappa=1.96)
  24. 24. API: {fun}_minimize def monitor(res): neptune.send_metric('run_score', res.func_vals[-1]) results = skopt.forest_minimize(..., callback=[monitor])
  25. 25. API: {fun}_minimize ● There are (hyper)hyperparameters ● Acquisition function: ○ ‘EI’, ‘PI’: expected improvement, probability of improvement (maximized) ○ ‘LCB’: lower confidence bound, combining the surrogate mean and its uncertainty ● Exploration vs exploitation ○ xi for ‘EI’ and ‘PI’: low xi exploitation, high xi exploration ○ kappa for ‘LCB’: low kappa exploitation, high kappa exploration
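A rough illustration of the kappa knob, not from the slides; objective and SPACE are the ones defined on the previous slides:

    import skopt

    # exploitative: small kappa mostly trusts the surrogate's mean prediction
    results_exploit = skopt.forest_minimize(objective, SPACE, n_calls=100,
                                            acq_func='LCB', kappa=0.1)

    # exploratory: large kappa favours points where the surrogate is uncertain
    results_explore = skopt.forest_minimize(objective, SPACE, n_calls=100,
                                            acq_func='LCB', kappa=10.0)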
  26. 26. Documentation ● Amazing! ● Functions have docstrings. ● A lot of examples. link
  27. 27. Visualizations ● Options: ○ skopt.plots.plot_convergence - score improvement ○ skopt.plots.plot_evaluations - space search evolution ○ skopt.plots.plot_objective - sensitivity ● Beautiful and very useful.
  28. 28. Visualizations: plot_convergence skopt.plots.plot_convergence(results)
  29. 29. Visualizations: plot_convergence skopt.plots.plot_convergence(results_list)
  30. 30. Visualizations: plot_evaluations skopt.plots.plot_evaluations(results)
  31. 31. Visualizations: plot_objective skopt.plots.plot_objective(results)
  32. 32. Speed & Parallelization ● Runs sequentially and you cannot distribute it across many machines ● You can parallelize the base estimator fit at every run with n_jobs ● If you have just 1 machine it is fast
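A sketch of the only parallelism on offer, assuming the forest base estimator accepts n_jobs; the search loop itself stays sequential:

    import skopt

    # one machine, sequential search, but the 'ET' surrogate is fit on all cores
    results = skopt.forest_minimize(objective, SPACE, n_calls=100,
                                    base_estimator='ET', n_jobs=-1)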
  33. 33. Experimental Results
  34. 34. Conclusions: good ● Easy to use API and great documentation ● A lot of optimizers and tweaking options ● Awesome visualizations ● Solid gains over the random search ● Fast if you are running sequentially on 1 machine ● Active project support
  35. 35. Conclusions: bad ● Search space doesn’t support nesting ● No support for distributed computing
  36. 36. Optuna
  37. 37. Algorithm ● Objective function estimated with Tree of Parzen Estimators ● Next run params selected via Expected Improvement ● Objective func calculation speedup via run pruning and successive halving (optionally)
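For reference, not on the slide: successive halving is opt-in through the study's pruner (objective is the one defined on the following slides).

    import optuna

    # pruning with successive halving is opt-in via the study's pruner
    study = optuna.create_study(pruner=optuna.pruners.SuccessiveHalvingPruner())
    study.optimize(objective, n_trials=100)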
  38. 38. API search space & objective + {fun}_minimize
  39. 39. API: search space & objective def objective(trial): params = OrderedDict([ ('learning_rate',trial.suggest_loguniform('learning_rate', 0.01, 0.5)), ('max_depth',trial.suggest_int('max_depth', 1, 30)), ('num_leaves',trial.suggest_int('num_leaves', 2, 100)), ('min_data_in_leaf',trial.suggest_int('min_data_in_leaf', 10, 1000)), ('feature_fraction',trial.suggest_uniform('feature_fraction', 0.1, 1.0)), ('subsample',trial.suggest_uniform('subsample', 0.1, 1.0))]) score = -1.0 * train_evaluate(X, y, params) return score
  40. 40. API: search space & objective ● Basic options: ○ suggest_categorical ○ suggest_int , suggest_discrete_uniform ○ suggest_uniform , suggest_loguniform ● Nested search spaces ● Defined in-run (pytorch-like)
  41. 41. API: search space & objective def objective(trial): classifier_name = trial.suggest_categorical('classifier', ['SVC', 'RandomForest']) if classifier_name == 'SVC': svc_c = trial.suggest_loguniform('svc_c', 1e-10, 1e10) classifier_obj = sklearn.svm.SVC(C=svc_c) else: rf_max_depth = int(trial.suggest_loguniform('rf_max_depth', 2, 32)) classifier_obj = sklearn.ensemble.RandomForestClassifier(max_depth=rf_max_depth) …
  42. 42. API: {fun}_minimize ● Allows pruning ● Handles exceptions in objective ● Handles callbacks
  43. 43. study = optuna.create_study() study.optimize(objective, n_trials=100) results = study.trials_dataframe() API: {fun}_minimize
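The exception handling mentioned on slide 42 can be sketched with the catch argument; the choice of ValueError is illustrative. Trials whose objective raises one of the listed exceptions are marked failed instead of stopping the whole study:

    study = optuna.create_study()
    study.optimize(objective, n_trials=100, catch=(ValueError,))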
  44. 44. API: {fun}_minimize: pruning from optuna.integration import LightGBMPruningCallback def objective(trial): params = OrderedDict([ ('max_depth',trial.suggest_int('max_depth', 1, 30)), ('num_leaves',trial.suggest_int('num_leaves', 2, 100))]) pruning_callback = LightGBMPruningCallback(trial, 'auc') score = -1.0 * train_evaluate_with_pruning(X, y, params, pruning_callback) return score def train_evaluate_with_pruning(X, y, params, callback): ... model = lgb.train(params, train_data, ... , callbacks=[callback]) return model.best_score['valid']['auc']
  45. 45. API: {fun}_minimize: callbacks study = optuna.create_study() study.optimize(objective, n_trials=100, callbacks=[report_neptune]) def report_neptune(study, trial): neptune.send_metric('value', trial.value) neptune.send_metric('best_value', study.best_value) Available in bleeding edge version from source*
  46. 46. Documentation ● Solid read-the-docs project, ● Docstrings, docstrings everywhere, ● A lot of examples. link
  47. 47. Visualizations ● Options: ○ optuna.visualization.plot_intermediate_values ○ optuna.visualization.plot_optimization_history ● Basic monitoring ● Available in bleeding edge version from source*
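A minimal usage sketch, assuming the plotly-based figures these functions return:

    from optuna.visualization import plot_optimization_history

    fig = plot_optimization_history(study)
    fig.show()  # interactive optimization-history plot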
  48. 48. Speed & Parallelization ● Can be easily distributed across one or many machines ● Has pruning to speed up unpromising runs
  49. 49. Speed & Parallelization: one study.optimize(objective, n_trials=100, n_jobs=5)
  50. 50. Speed & Parallelization: many
    optuna_search.py:
      …
      study = optuna.Study(study_name='distributed-search', storage='sqlite:///example.db')
      study.optimize(objective, n_trials=100)
      ...
    terminal 1:
      $ optuna create-study --study-name "distributed-search" --storage "sqlite:///example.db"
    terminal 2:
      $ python optuna_search.py
    terminal 3:
      $ python optuna_search.py
  51. 51. Experimental Results
  52. 52. Conclusions: good ● Easy to use API ● Great documentation ● Can be easily distributed over a cluster of machines ● Has pruning ● Has callbacks ● Search space supports nesting ● Active project support
  53. 53. Conclusions: bad ● Only TPE optimizer available ● Only some visualizations ● *No gains over the random search (with 100 iterations budget)
  54. 54. Optuna is hyperopt with: ● better api ● waaaay better documentation ● pruning (and halving available) ● exception handling ● simpler parallelization ● active project support
  55. 55. Should I swap hyperopt with optuna?
  56. 56. HpBandSter https://www.automl.org/
  57. 57. ● HyperBand on Steroids ● It has state-of-the-art algorithms ○ Hyperband link ○ BOHB (Bayesian Optimization + Hyperband) link ● Distributed-computing-first API HpBandSter
  58. 58. HpBandSter
  59. 59. Algorithm ● Objective function estimated with TPE ● Next run params selected via Expected Improvement ● Objective func calculation speedup via bandit methods with random budgets (hyperband)
  60. 60. API server + worker + optimizer
  62. 62. API: server ● Workers communicate with server to: ○ get next parameter configuration ○ send results ● You have to define it even for the most basic setups/problems (weird)
  63. 63. API: server import hpbandster.core.nameserver as hpns NS = hpns.NameServer(run_id=RUN_ID, host=HOST, port=PORT, working_directory=WORKING_DIRECTORY) ns_host, ns_port = NS.start()
  64. 64. API: worker: objective from hpbandster.core.worker import Worker class TrainEvalWorker(Worker): ... def compute(self, config, budget, working_directory, *args, **kwargs): loss = -1.0 * train_evaluate(self.X, self.y, budget, config) return ({'loss': loss, 'info': { 'auxiliary_stuff': 'worked' } })
  65. 65. API: worker: search space ● Basic options: ○ CSH.{Categorical/Ordinal}Hyperparameter ○ CSH.{Uniform/Normal}IntegerHyperparameter ○ CSH.{Uniform/Normal}FloatHyperparameter ● Nested search spaces with ifs
  66. 66. API: worker: search space class TrainEvalWorker(Worker): ... @staticmethod def get_configspace(): cs = CS.ConfigurationSpace() learning_rate = CSH.UniformFloatHyperparameter('learning_rate', lower=0.01, upper=0.5, default_value=0.01, log=True) subsample = CSH.UniformFloatHyperparameter('subsample', lower=0.1, upper=1.0, default_value=0.5, log=False) cs.add_hyperparameters([learning_rate, subsample]) return cs
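The "nested search spaces with ifs" from slide 65 are expressed as ConfigSpace conditions; a sketch whose hyperparameter names mirror the earlier Optuna example rather than the talk's code:

    import ConfigSpace as CS
    import ConfigSpace.hyperparameters as CSH

    cs = CS.ConfigurationSpace()
    classifier = CSH.CategoricalHyperparameter('classifier', ['SVC', 'RandomForest'])
    svc_c = CSH.UniformFloatHyperparameter('svc_c', lower=1e-10, upper=1e10, log=True)
    rf_max_depth = CSH.UniformIntegerHyperparameter('rf_max_depth', lower=2, upper=32)
    cs.add_hyperparameters([classifier, svc_c, rf_max_depth])

    # svc_c is only sampled when classifier == 'SVC', rf_max_depth only for 'RandomForest'
    cs.add_condition(CS.EqualsCondition(svc_c, classifier, 'SVC'))
    cs.add_condition(CS.EqualsCondition(rf_max_depth, classifier, 'RandomForest'))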
  67. 67. API: worker: connecting to server worker = TrainEvalWorker(run_id=RUN_ID, nameserver=ns_host, nameserver_port=ns_port) worker.run(background=True)
  68. 68. API: optimizer from hpbandster.optimizers import BOHB optim = BOHB(configspace = worker.get_configspace(), run_id = RUN_ID, nameserver=ns_host, nameserver_port=ns_port, eta=3, min_budget=0.1, max_budget=1, num_samples=64, top_n_percent=15, min_bandwidth=1e-3, bandwidth_factor=3) study = optim.run(n_iterations=100)
  69. 69. API: optimizer: callbacks class NeptuneLogger: def new_config(self, *args, **kwargs): pass def __call__(self, job): neptune.send_metric('run_score', job.result['loss']) neptune.send_text('run_parameters', str(job.kwargs['config'])) optim = BOHB(configspace=worker.get_configspace(), run_id=RUN_ID, nameserver=ns_host, nameserver_port=ns_port, result_logger=NeptuneLogger())
  70. 70. Documentation ● Decent Read-the-docs project, ● Missing docstrings in a lot of places, ● A bunch of examples. link
  71. 71. Visualizations ● Options: ○ hpvis.losses_over_time - score improvement ○ hpvis.concurrent_runs_over_time - speed/parallelization ○ hpvis.finished_runs_over_time - budget adjustment ○ hpvis.correlation_across_budgets - budget adjustment ○ hpvis.performance_histogram_model_vs_random - sanity check ● Very lib/debug-specific but can be useful for tweaking
  72. 72. Visualizations: losses_over_time
  73. 73. Visualizations: losses_over_time all_runs = results.get_all_runs() hpvis.losses_over_time(all_runs);
  74. 74. Visualizations: correlation_across_budgets
  75. 75. Visualizations: correlation_across_budgets hpvis.correlation_across_budgets(results);
  76. 76. Visualizations: performance_histogram_model_vs_random
  77. 77. Visualizations: performance_histogram_model_vs_random all_runs = results.get_all_runs() id2conf = results.get_id2config_mapping() hpvis.performance_histogram_model_vs_random(all_runs, id2conf);
  78. 78. Speed & Parallelization ● Can be easily distributed across threads/processes/machines
  79. 79. Speed & Parallelization: threads workers=[] for i in range(N_WORKERS): w = TrainEvalWorker(run_id=RUN_ID, id=i, sleep_interval=0.5, nameserver=ns_host, nameserver_port=ns_port) w.run(background=True) workers.append(w) optim = BOHB(configspace = TrainEvalWorker.get_configspace(), run_id = RUN_ID, nameserver=ns_host, nameserver_port=ns_port) study = optim.run(n_iterations=100, min_n_workers=N_WORKERS)
  80. 80. Speed & Parallelization: processes workers=[] for i in range(N_WORKERS): w = TrainEvalWorker(run_id=RUN_ID, id=i, sleep_interval=0.5, nameserver=ns_host, nameserver_port=ns_port) w.run(background=False) exit(0) optim = BOHB(configspace = TrainEvalWorker.get_configspace(), run_id = RUN_ID, nameserver=ns_host, nameserver_port=ns_port) study = optim.run(n_iterations=100, min_n_workers=N_WORKERS)
  81. 81. Speed & Parallelization: machines Follow the example from the docs … but it is not obvious
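A rough sketch of the cross-machine setup the docs describe; the IP address is an assumption, and RUN_ID, PORT, WORKING_DIRECTORY, N_WORKERS and TrainEvalWorker come from the earlier slides:

    # on the "master" machine: nameserver + optimizer
    NS = hpns.NameServer(run_id=RUN_ID, host='10.0.0.1', port=PORT,
                         working_directory=WORKING_DIRECTORY)
    ns_host, ns_port = NS.start()
    optim = BOHB(configspace=TrainEvalWorker.get_configspace(), run_id=RUN_ID,
                 nameserver=ns_host, nameserver_port=ns_port)
    study = optim.run(n_iterations=100, min_n_workers=N_WORKERS)

    # on each worker machine: connect to the master's nameserver and block
    w = TrainEvalWorker(run_id=RUN_ID, nameserver='10.0.0.1', nameserver_port=PORT)
    w.run(background=False)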
  82. 82. Experimental Results
  83. 83. Conclusions: good ● State-of-the-art algorithm ● Can be distributed over a cluster of machines ● Useful visualizations ● Search space supports nesting
  84. 84. Conclusions: bad ● Project is not very active ● Complicated API ● Missing docstrings
  85. 85. Which one should I choose?
  86. 86. Results (mostly subjective)
    API/ease of use:        Scikit-Optimize Great | Optuna Great | HpBandSter Difficult | Hyperopt Good
    Documentation:          Scikit-Optimize Great | Optuna Great | HpBandSter Ok(ish) | Hyperopt Bad
    Speed/Parallelization:  Scikit-Optimize Fast if sequential/None | Optuna Great | HpBandSter Good | Hyperopt Ok
    Visualizations:         Scikit-Optimize Amazing | Optuna Basic | HpBandSter Very lib specific | Hyperopt Some
    *Experimental results:  Scikit-Optimize 0.8566 (100) | Optuna 0.8419 (100), 0.8597 (10000) | HpBandSter 0.8629 (100) | Hyperopt 0.8420 (100)
  87. 87. Dream library: Scikit-Optimize visualizations + Optuna API + docs + pruning + callbacks + parallelization + HpBandSter optimizers
  88. 88. Conversions between results objects are in neptune-contrib import neptunecontrib.hpo.utils as hpo_utils results = hpo_utils.optuna2skopt(study) Dream library
  89. 89. ● If you don’t have a lot of resources - use Scikit-Optimize ● If you want to get SOTA and don’t care about API/Docs - use HpBandSter ● If you want good docs/api/parallelization - use Optuna Recommendations
  90. 90. ● Slides link on Twitter @NeptuneML or Linkedin @neptune.ml ● Blog posts on Medium @jakub.czakon ● Experiments in Neptune tags skopt/optuna/hpbandster ○ Code ○ Best hyperparams and (hyper)hyperparams ○ learning curves ○ diagnostic charts ○ resource consumption charts ○ pickled results objects Materials
  91. 91. Data science work sharing hub. Track | Organize | Collaborate kuba@neptune.ml @NeptuneML https://medium.com/neptune-ml Jakub Czakon
