Deep Dive into Hyperparameter Tuning
About Me
Shubhmay Potdar
Sr. Software Engineer @ eQ-Technologic
Contents
1. Introduction to Hyperparameter Tuning
2. Grid and Random Search
3. Sobol Sequences
4. Introduction to Sequential Model-Based Optimization
a. Bayesian Optimization
b. Tree of Parzen Estimator
5. Evolutionary Algorithms: CMA-ES
6. Particle Based Methods: Particle Swarm Optimization
7. Multi-Fidelity Methods: Successive Halving and Hyperband
8. Libraries and Services for Hyperparameter Tuning
9. Future Scope for Research
Hyperparameters
What are hyperparameters ?
In machine learning, hyperparameters are configuration settings assigned to the learning algorithm whose values cannot be estimated from the data.
1. Depth of tree (Decision Tree)
2. No. of trees (Random Forest)
3. Regularization Parameters (XGBoost)
4. No. of layers (Deep Neural Network)
Why are they required?
They define the complexity, learning capacity, and structure of the model.
Good combinations are likely to give the best results.
Choosing correct values helps reduce the chances of overfitting and underfitting.
Exploration Problem
Hyperparameter tuning
can be seen as an
exploration problem
The true structure of the
underlying function is
unknown
The aim is to explore as many regions as possible within some constraints
Four Steps in Hyperparam Tuning
1. Objective Function:
what we want to
minimize, in this case
the validation error of a
machine learning
model with respect to
the hyperparameters
2. Domain Space:
hyperparameter values
to search over
3. Optimization algorithm:
method for constructing
the surrogate model and
choosing the next
hyperparameter values
to evaluate
4. Result history:
stored outcomes from
evaluations of the
objective function
consisting of the
hyperparameters and
validation loss
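The same four pieces map directly onto most tuning libraries. Below is a minimal sketch using Hyperopt (listed later in the libraries slide); the quadratic objective and the search range are toy stand-ins for a real model's validation loss and hyperparameter space.

```python
from hyperopt import fmin, tpe, hp, Trials

# 1. Objective function: the quantity to minimize (stand-in for validation loss).
def objective(params):
    x = params["x"]
    return (x - 2.0) ** 2

# 2. Domain space: the hyperparameter values to search over.
space = {"x": hp.uniform("x", -10.0, 10.0)}

# 3. Optimization algorithm: here the Tree of Parzen Estimator.
# 4. Result history: the Trials object records every evaluation.
trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=50, trials=trials)
print(best, trials.best_trial["result"]["loss"])
```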
Grid Search
❖ Select values for each hyperparameter
to test and try all combinations
❖ Expensive to evaluate all combinations
Bergstra, James and Yoshua Bengio. “Random Search for Hyper-Parameter Optimization.” Journal of Machine Learning Research 13 (2012): 281-305.
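A minimal grid-search sketch with scikit-learn, assuming a synthetic dataset and an illustrative random-forest grid; the 3 × 3 grid is evaluated with 3-fold cross-validation, i.e. 27 model fits.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

param_grid = {
    "n_estimators": [50, 100, 200],   # no. of trees
    "max_depth": [3, 5, 10],          # depth of each tree
}
# 3 x 3 = 9 combinations, each evaluated with 3-fold cross-validation.
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```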
Random Search
❖ Select values randomly for every
hyperparameter
❖ Evaluations are independent and can be run in parallel
❖ Specify distribution of parameters for
effective sampling
Bergstra, James and Yoshua Bengio. “Random Search for Hyper-Parameter Optimization.” Journal of Machine Learning Research 13 (2012): 281-305.
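The corresponding random-search sketch, again with an illustrative dataset and distributions: each hyperparameter gets a sampling distribution rather than a fixed list, and the independent evaluations can run in parallel via n_jobs.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

param_distributions = {
    "n_estimators": randint(50, 300),    # uniform over the integers [50, 300)
    "max_features": uniform(0.1, 0.8),   # uniform over [0.1, 0.9)
}
search = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                            param_distributions, n_iter=20, cv=3,
                            n_jobs=-1, random_state=0)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```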
Sobol Sequences
The Sobol sequence is a low-discrepancy quasi-random sequence
Sobol sequences were designed to cover the
unit hypercube with lower discrepancy than
completely random sampling
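A small sketch of Sobol sampling using SciPy's quasi-Monte Carlo module (scipy.stats.qmc, SciPy >= 1.7); the two hyperparameter ranges used for scaling are illustrative.

```python
from scipy.stats import qmc

sampler = qmc.Sobol(d=2, scramble=True, seed=0)
unit_points = sampler.random_base2(m=5)        # 2**5 = 32 points in the unit square

# Scale to, e.g., learning rate in [1e-4, 1e-1] and max_depth in [2, 10].
points = qmc.scale(unit_points, l_bounds=[1e-4, 2], u_bounds=[1e-1, 10])

# Discrepancy of the Sobol design is lower than that of i.i.d. uniform sampling.
print(qmc.discrepancy(unit_points))
```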
Preview: SMBO
Can we do better than grid and random search?
Can we have a guided tour on our journey to finding optimal parameters?
We know that evaluating our training algorithm is expensive in most cases
And we have no guarantee that a given set of parameters will give the optimal solution
https://pixabay.com/en/light-bulb-ideas-sketch-i-think-487859/
Bayesian
Optimization
Bayesian optimization is a framework that is useful in the following scenarios:
❖ Objective function has no closed form
❖ No access to gradients
❖ Evaluations are noisy
❖ Objective function is expensive to evaluate
Bayesian Optimization - Main
Components
Surrogate Function:
Approximates the objective function; the optimizer works with the surrogate and picks new points according to some acquisition function
Common choices are Gaussian Processes, Random Forests, Gradient Boosted Machines
Acquisition function:
Helps select the next point for evaluation
Trades off exploring unknown regions against exploiting known good regions
Common choices are Expected
Improvement, Upper Confidence Bound,
Probability of Improvement, Thompson
Sampling etc.
Bayesian Optimization - Algorithm
Gaussian Process
Expected Improvement
f∗ - the current optimal (best observed) value
Quantify the improvement over f∗ if we sample a point x: I(x) = max(f∗ − Y, 0), where Y is the predicted objective value at x
If f is modelled using a GP with posterior mean μ(x) and standard deviation σ(x), the expected improvement has the closed form
EI(x) = (f∗ − μ(x)) Φ(Z) + σ(x) ϕ(Z), with Z = (f∗ − μ(x)) / σ(x) (and EI(x) = 0 when σ(x) = 0),
where ϕ, Φ are the PDF and CDF of the standard normal distribution, respectively
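A minimal sketch of one Bayesian-optimization step with a Gaussian Process surrogate and the Expected Improvement acquisition defined above (minimization); the 1-D toy objective, the Matern kernel, and the candidate grid are illustrative choices.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    return np.sin(3 * x) + 0.1 * x ** 2              # toy function to minimize

X_obs = np.array([[-2.0], [0.0], [1.5]])             # points evaluated so far
y_obs = objective(X_obs).ravel()

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(X_obs, y_obs)                                  # surrogate of the objective

def expected_improvement(X_cand, f_best):
    mu, sigma = gp.predict(X_cand, return_std=True)
    z = (f_best - mu) / np.maximum(sigma, 1e-9)
    return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

X_cand = np.linspace(-3, 3, 200).reshape(-1, 1)       # candidate points
ei = expected_improvement(X_cand, y_obs.min())
x_next = X_cand[np.argmax(ei)]                        # next point to evaluate
print(x_next)
```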
Challenges
How to design a surrogate function that models the objective function well and is also cheap to evaluate
How to design an acquisition function that guarantees a good trade-off between exploration and exploitation
https://pixabay.com/en/overcoming-stone-roll-slide-strong-2127669/
Drawbacks
❖ Complexity of GP is O(n^3)
❖ Hyperparameters for GP itself
❖ Difficult to parallelize
❖ Can get stuck at local minima
Tree of Parzen
Estimator
We tend to explore more in the regions where a high percentage of good values has been observed so far.
Algorithm
❖ Sample N candidates at random and evaluate the model
❖ Divide the N candidates into two groups
➢ Group 1 - the best observations
➢ Group 2 - all the rest
❖ Estimate the densities of both groups using a Parzen window density estimator
❖ Use Expected Improvement as the acquisition function
❖ Draw M samples from group 1
❖ Calculate EI = l(x)/g(x) for the M samples (where l(x) is the density of being in the first group and g(x) the density of being in the second group)
❖ Evaluate the model where EI is maximum
❖ Repeat from step 2 until the iteration budget is exhausted
Source: http://neupy.com/2016/12/17/hyperparameter_optimization_for_neural_networks.html
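A rough sketch of the density-ratio idea on a 1-D toy problem, assuming a Gaussian KDE as the Parzen window estimator and a 25% quantile split between the two groups; real TPE builds tree-structured densities over the whole search space.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

def objective(x):
    return (x - 2.0) ** 2                     # toy loss to minimize

# Step 1: random candidates, evaluated.
x_hist = rng.uniform(-10, 10, size=50)
y_hist = objective(x_hist)

# Step 2: split into the best observations (group 1) and the rest (group 2).
threshold = np.quantile(y_hist, 0.25)
good, rest = x_hist[y_hist <= threshold], x_hist[y_hist > threshold]

# Step 3: Parzen-window (KDE) density estimates of both groups.
l, g = gaussian_kde(good), gaussian_kde(rest)

# Steps 4-6: sample from the "good" density and pick the sample maximizing l(x)/g(x).
samples = l.resample(100).ravel()
ei = l(samples) / np.maximum(g(samples), 1e-12)
x_next = samples[np.argmax(ei)]
print(x_next, objective(x_next))
```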
TPE - Algorithm
Source: http://neupy.com/2016/12/17/hyperparameter_optimization_for_neural_networks.html
Evolutionary Algorithm
❖ Evaluate the objective function at
certain points
❖ Based on the fitness results of the current solutions, produce the next generation of candidate solutions, which is more likely to produce even better results than the current generation
❖ The iterative process stops once the best-known solution is satisfactory to the user
Source: http://blog.otoro.net/2017/10/29/visual-evolution-strategies/
Algorithm 1. Start with N candidates
2. Calculate the fitness score of each candidate solution
3. Isolate the best 25% of the population in the current generation
4. Using only the best solutions, along with the mean μ(g) of the current generation, calculate the covariance matrix C(g+1) of the next generation
5. Compute the updated mean μ(g+1) as the mean of the best solutions
6. Sample a new set of candidate solutions using the updated mean μ(g+1) and covariance matrix C(g+1)
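A simplified sketch of the loop described above (sample, select the best 25%, re-estimate mean and covariance, resample); full CMA-ES adds step-size control and rank-based weighting, and the 2-D sphere objective here is only a toy.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(x):
    return -np.sum(x ** 2, axis=1)            # maximize => minimize the sphere function

mu = np.zeros(2)                              # mean of generation g
cov = np.eye(2)                               # covariance of generation g
n_pop, n_elite = 40, 10                       # keep the best 25% of the population

for generation in range(30):
    # 1-2. sample candidates and score them
    pop = rng.multivariate_normal(mu, cov, size=n_pop)
    scores = fitness(pop)
    # 3. isolate the best 25%
    elite = pop[np.argsort(scores)[-n_elite:]]
    # 4. covariance of the next generation from the elites and the current mean
    diff = elite - mu
    cov = diff.T @ diff / n_elite + 1e-8 * np.eye(2)   # small ridge keeps it well-conditioned
    # 5-6. updated mean, used when sampling the next generation
    mu = elite.mean(axis=0)

print(mu)   # approaches the optimum at the origin
```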
CMA-ES
Schaffer-2D Function Rastrigin-2D Function
Source: http://blog.otoro.net/2017/10/29/visual-evolution-strategies/
Particle Swarm Optimization
❖ A heuristic optimization technique
❖ Simulates a set of particles moving around the search space
❖ For hyperparameter search, a particle's position represents a set of hyperparameters and its movement is influenced by how good the objective function value is
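A bare-bones PSO sketch with assumed inertia and cognitive/social coefficients (w = 0.7, c1 = c2 = 1.5); the 2-D sphere loss stands in for a model's validation error, and each particle's position would be a hyperparameter configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(x):
    return np.sum(x ** 2, axis=1)                        # toy objective to minimize

n_particles, dim = 20, 2
pos = rng.uniform(-5, 5, size=(n_particles, dim))        # one hyperparameter set per particle
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), loss(pos)                  # personal bests
gbest = pbest[np.argmin(pbest_val)]                       # global best

w, c1, c2 = 0.7, 1.5, 1.5
for _ in range(100):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    # velocity pulls each particle toward its own best and the swarm's best
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    val = loss(pos)
    improved = val < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], val[improved]
    gbest = pbest[np.argmin(pbest_val)]

print(gbest)
```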
Particle Swarm
Optimization
Algorithm
Particle Swarm Optimization
Source: https://pyswarms.readthedocs.io/en/latest/examples/visualization.html
Multi-Fidelity
Optimization
❖ The idea is to replace full evaluations with cheap approximations
➢ using a subset of the data
➢ cross-validation on fewer folds
➢ a few iterations of the algorithm
❖ Reject significantly worse-performing configurations early
Hyperband ❖ Employs a pure-exploration approach
❖ The idea is to try a large number of random configurations
❖ By allocating compute more efficiently, it can try more hyperparameter configurations
❖ Most machine learning algorithms are iterative
❖ If we are running a configuration and the progress looks terrible, it might be a good idea to quit and just try a new set of hyperparameters
Successive Halving
❖ One way to implement such a scheme is called successive halving
❖ First try out N hyperparameter settings for some fixed amount of time T
❖ Keep the N/2 best-performing configurations and run them for time 2T
❖ Repeating this procedure log2(M) times, we end up with N/M configurations, each run for time MT
Source: https://pdfs.semanticscholar.org/2442/ad6a385b9bcfcdca09b28e74b122eba8fdac.pdf
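A schematic successive-halving sketch; train(config, budget) is a hypothetical callable returning a validation loss after running a configuration for the given budget (iterations, epochs, or time), and the noisy quadratic used in the toy usage is purely illustrative.

```python
import numpy as np

def successive_halving(configs, train, budget=1, eta=2):
    configs = list(configs)
    while len(configs) > 1:
        losses = [train(c, budget) for c in configs]   # evaluate all survivors
        keep = max(len(configs) // eta, 1)             # keep the best 1/eta of them
        order = np.argsort(losses)[:keep]
        configs = [configs[i] for i in order]
        budget *= eta                                   # give survivors a larger budget
    return configs[0]

# Toy usage: "configs" are learning rates, "training" is a noisy quadratic
# whose noise shrinks as the budget grows.
rng = np.random.default_rng(0)
train = lambda lr, b: (lr - 0.1) ** 2 + rng.normal(scale=0.01 / b)
best = successive_halving(rng.uniform(0.001, 1.0, size=16), train)
print(best)
```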
max_iter = 81, eta = 3, B = 5 * max_iter

         s = 4       s = 3       s = 2       s = 1       s = 0
       n_i   r_i   n_i   r_i   n_i   r_i   n_i   r_i   n_i   r_i
        81     1    27     3     9     9     6    27     5    81
        27     3     9     9     3    27     2    81
         9     9     3    27     1    81
         3    27     1    81
         1    81
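The bracket sizes n_i and per-configuration budgets r_i in the table above can be derived as in the sketch below, following the published Hyperband recipe: successive halving is run with several trade-offs between the number of configurations and the budget each one receives.

```python
import math

max_iter, eta = 81, 3                         # maximum budget and downsampling rate
s_max = round(math.log(max_iter, eta))        # 4 -> brackets s = 4, 3, 2, 1, 0
B = (s_max + 1) * max_iter                    # total budget per bracket

for s in reversed(range(s_max + 1)):
    n = math.ceil(int(B / max_iter / (s + 1)) * eta ** s)   # initial no. of configs
    r = max_iter / eta ** s                                  # initial budget per config
    rows = [(int(n / eta ** i), int(r * eta ** i)) for i in range(s + 1)]
    print(f"s={s}:", rows)   # e.g. s=4: [(81, 1), (27, 3), (9, 9), (3, 27), (1, 81)]
```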
Suggestions If all hyperparameters are real-valued and one can only
afford a few dozen function evaluations, we recommend the
use of a Gaussian process-based Bayesian optimization
For large and conditional configuration spaces we suggest
either the random forest-based SMAC or TPE due to their
proven strong performance
For purely real-valued spaces and relatively cheap objective functions, for which we can afford more than hundreds of evaluations, use CMA-ES
Libraries
Optunity - https://optunity.readthedocs.io/en/latest/
Deap - https://github.com/DEAP/deap
Smac3 - https://github.com/automl/SMAC3
Tune - https://ray.readthedocs.io/en/latest/tune.html
GPyOpt - https://sheffieldml.github.io/GPyOpt/
Scikit-optimize - https://scikit-optimize.github.io/
Hyperopt - https://github.com/hyperopt/hyperopt
Hyperband - https://github.com/zygmuntz/hyperband
Thanks