Introduction to Parallel Bayesian Optimization
for Machine Learning
Kejia Shi
H2O.ai Inc.
Columbia University in the City of New York
kejia.shi@columbia.edu
August 16, 2017
Kejia Shi Parallel Bayesian Optimization August 16, 2017 1 / 49
Overview
1 Introduction
2 Fundamentals
3 Bayesian Optimization
Gaussian Process
Acquisition Function
4 Parallel Bayesian Optimization
Kejia Shi Parallel Bayesian Optimization August 16, 2017 2 / 49
Introduction
So you did all the data cleaning, feature engineering...
And you fit your first model with hyperparameters...
Ridge regression: λ...
Neural network: number of layers, batch size...
Random forest: number of trees, leaf size...
Now what are you going to do?
Kejia Shi Parallel Bayesian Optimization August 16, 2017 3 / 49
Introduction
θ∗ = arg min_{θ ∈ Θ} Φ(θ)
What is the best set of hyperparameters to optimize the chosen
model?
Kejia Shi Parallel Bayesian Optimization August 16, 2017 4 / 49
Problem of Hyperparameter Search
Figure: Typical List of a Convolutional Neural Network Training Hyperparameters
Kejia Shi Parallel Bayesian Optimization August 16, 2017 5 / 49
Problem of Hyperparameter Search
Kejia Shi Parallel Bayesian Optimization August 16, 2017 6 / 49
Automated Hyperparameter Search: Grid Search
Kejia Shi Parallel Bayesian Optimization August 16, 2017 7 / 49
Automated Hyperparameter Search: Grid Search
Kejia Shi Parallel Bayesian Optimization August 16, 2017 8 / 49
Automated Hyperparameter Search: Random Search
Kejia Shi Parallel Bayesian Optimization August 16, 2017 9 / 49
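To make the two automated strategies concrete, here is a minimal, hypothetical sketch using scikit-learn's GridSearchCV and RandomizedSearchCV; the model, grid, and candidate values are illustrative choices, not the ones behind the figures.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Illustrative data and model; the hyperparameter ranges are arbitrary.
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0)

# Grid search: exhaustively tries every combination in the grid.
grid = GridSearchCV(model, {"n_estimators": [50, 100, 200],
                            "min_samples_leaf": [1, 3, 5]}, cv=3).fit(X, y)

# Random search: samples a fixed budget of combinations from the same space.
rand = RandomizedSearchCV(model, {"n_estimators": [50, 100, 200, 400],
                                  "min_samples_leaf": [1, 2, 3, 5, 10]},
                          n_iter=8, cv=3, random_state=0).fit(X, y)

print(grid.best_params_, rand.best_params_)
```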
Fundamentals
Kejia Shi Parallel Bayesian Optimization August 16, 2017 10 / 49
Fundamentals: Review of Probability
Suppose we have two events, A and B, that may or may not be correlated,
A = It’s raining
B = The ground is wet
We can talk about probabilities of these events,
P(A) = Probability it is raining
P(B) = Probability the ground is wet
We can also talk about other related probabilities,
Conditional probabilities
P(B|A) = Probability the ground is wet given that it is raining
Joint probabilities
P(A, B) = Probability it is raining and the ground is wet
Kejia Shi Parallel Bayesian Optimization August 16, 2017 11 / 49
Fundamentals: Calculus of Probability
There are simple rules for moving from one probability to another
P(A, B) = P(A|B)P(B) = P(B|A)P(A)
P(A) = Σ_b P(A, B = b), and similarly P(B) = Σ_a P(A = a, B)
Kejia Shi Parallel Bayesian Optimization August 16, 2017 12 / 49
Fundamentals: Bayes Rule
We want to say something about the probability of B given that A
happened. Bayes rule says that the probability of B after knowing A is:
P(B|A) = P(A|B)P(B) / P(A).
Notice: P(B|A) is the "posterior", P(A|B) the "likelihood", P(B) the "prior",
and P(A) the "marginal" or "evidence".
Kejia Shi Parallel Bayesian Optimization August 16, 2017 13 / 49
Fundamentals: Gaussian Distribution
x ∼ N(µ, σ²)
Density: f(x | µ, σ²) = (1 / √(2πσ²)) exp{ −(x − µ)² / (2σ²) }
Log density: ln f(x | µ, σ²) = −(x − µ)² / (2σ²) + constant
Kejia Shi Parallel Bayesian Optimization August 16, 2017 14 / 49
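As a quick numeric check of the density and log-density above, a short sketch with numpy and scipy (the values of µ, σ and x are arbitrary):

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 0.0, 2.0
x = 1.5

# Density written out exactly as on the slide, compared against scipy.
manual_pdf = np.exp(-(x - mu)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
print(manual_pdf, norm.pdf(x, loc=mu, scale=sigma))          # identical values
print(np.log(manual_pdf), norm.logpdf(x, loc=mu, scale=sigma))
```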
Fundamentals: Probabilistic Model
In short, a probabilistic model is a set of probability distributions p(x|θ):
the distribution family p(·) is selected, but we don't know the
parameter θ.
For the Gaussian distribution, p(x|θ) with θ = {µ, Σ}.
Kejia Shi Parallel Bayesian Optimization August 16, 2017 15 / 49
Fundamentals: Maximum Likelihood
The independent and identically distributed (iid) assumption:
x_i ∼ p(x|θ) iid, for i = 1, ..., n.
With this, the joint density function becomes
p(x1, ..., xn | θ) = ∏_i p(x_i | θ).
The maximum likelihood approach seeks the value of θ that maximizes the
likelihood function: θ̂_ML = arg max_θ p(x1, ..., xn | θ).
How to solve it? Take gradients!
Kejia Shi Parallel Bayesian Optimization August 16, 2017 16 / 49
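A minimal illustration (not from the slides) of maximum likelihood for a Gaussian: minimize the negative log-likelihood numerically and compare with the closed-form sample mean and standard deviation. The synthetic data and starting point are arbitrary choices.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=500)   # synthetic iid data

# Negative log-likelihood of a Gaussian; minimizing it is equivalent to
# maximizing the likelihood (see the logarithm trick on the next slide).
def neg_log_lik(params):
    mu, log_sigma = params                      # optimize log(sigma) so sigma stays positive
    return -np.sum(norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

res = minimize(neg_log_lik, x0=[0.0, 0.0])
mu_ml, sigma_ml = res.x[0], np.exp(res.x[1])

# Closed-form MLE for comparison: sample mean and (biased) sample std.
print(mu_ml, sigma_ml)
print(x.mean(), x.std())
```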
Fundamentals: Logarithm Trick
Taking gradients of ∏_{i=1}^n f_i with f_i = p(x_i | θ) can be complicated. Therefore we
use the property of monotonically increasing functions on R+.
Specifically, maximizing g(y) and maximizing its monotone transformation
f(g(y)) yield the same maximizer.
Hence we have the equality
ln(∏_i f_i) = Σ_i ln(f_i).
Notice: the location of a maximum does not change, while the value of a
maximum does change:
max_y ln g(y) ≠ max_y g(y)
arg max_y ln g(y) = arg max_y g(y)
Kejia Shi Parallel Bayesian Optimization August 16, 2017 17 / 49
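A tiny numerical illustration (not from the slides) of why the logarithm trick matters in practice: the raw product of many small likelihood terms underflows, while the sum of logs stays finite.

```python
import numpy as np

rng = np.random.default_rng(1)
likelihoods = rng.uniform(1e-4, 1e-2, size=1000)   # 1000 per-point likelihood values

print(np.prod(likelihoods))          # 0.0 -- the product underflows
print(np.sum(np.log(likelihoods)))   # finite log-likelihood, safe to take gradients of
```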
Bayesian Optimization
Kejia Shi Parallel Bayesian Optimization August 16, 2017 18 / 49
Gaussian Process
Kejia Shi Parallel Bayesian Optimization August 16, 2017 19 / 49
Gaussian Process: From Linear Regression
The linear regression model may be written as:
y_i ≈ f(x_i; β) = β_0 + β_1 x_{i1} + β_2 x_{i2} + ... + β_d x_{id} = β_0 + Σ_{j=1}^d β_j x_{ij}.
We have our training data (x1, y1), ..., (xn, yn) and we want to use them to
learn a set of weights β_0, β_1, ..., β_d such that y_i ≈ f(x_i; β).
Our least squares objective function asks us to choose the β's that
minimize the sum of squared errors:
β_LS = arg min_β Σ_{i=1}^n (y_i − f(x_i; β))² ≡ arg min_β L.
Kejia Shi Parallel Bayesian Optimization August 16, 2017 20 / 49
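A minimal sketch (not from the slides) of the closed-form least-squares solution via the normal equations; the synthetic data are illustrative.

```python
import numpy as np

# beta_LS = (X^T X)^{-1} X^T y, with a column of ones prepended for the intercept beta_0.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = 1.0 + X @ np.array([0.5, -2.0, 3.0]) + 0.1 * rng.normal(size=100)

X1 = np.hstack([np.ones((X.shape[0], 1)), X])     # intercept column
beta_ls = np.linalg.solve(X1.T @ X1, X1.T @ y)    # solve the normal equations
print(beta_ls)                                    # approximately [1.0, 0.5, -2.0, 3.0]
```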
Gaussian Process: From Linear Regression
Least squares measures the vertical (response) error; with two features,
(β_0, β_1, β_2) defines the fitted plane. (It shares the same solution as
the maximum likelihood approach.)
Idea: imagine a single feature and a single response; you are then fitting a
straight line, under the assumption that the errors are normally distributed.
Extensions:
Regularization: l2 (Ridge), l1 (LASSO)
Priors: MAP
Kejia Shi Parallel Bayesian Optimization August 16, 2017 21 / 49
Gaussian Process: From Linear Regression
Probabilistic View
Maximum Likelihood
Likelihood term arising from squared errors: y ∼ N(Xβ, σ²I)
β_ML = arg max_β ln p(y | µ = Xβ, σ²)
     = arg max_β −(1/(2σ²)) ||y − Xβ||²    (1)
Maximum a Posteriori
With an extra Gaussian prior β ∼ N(0, λ⁻¹I).
Notice ln p(y|X) is unknown. From Bayes rule:
β_MAP = arg max_β ln p(y | β, X) + ln p(β) − ln p(y | X)
      = arg max_β −(1/(2σ²)) ||y − Xβ||² − (λ/2) ||β||²    (2)
Kejia Shi Parallel Bayesian Optimization August 16, 2017 22 / 49
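A minimal sketch (not from the slides) of the MAP/ridge solution implied by Equation (2); the noise variance and prior precision below are assumed values.

```python
import numpy as np

# Under prior beta ~ N(0, lambda^{-1} I) and likelihood y ~ N(X beta, sigma^2 I),
# setting the gradient of (2) to zero gives
#   beta_MAP = (lambda * sigma^2 * I + X^T X)^{-1} X^T y.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
y = X @ np.array([0.5, -2.0, 3.0]) + 0.1 * rng.normal(size=100)

lam, sigma2 = 1.0, 0.1**2                     # assumed prior precision and noise variance
d = X.shape[1]
beta_map = np.linalg.solve(lam * sigma2 * np.eye(d) + X.T @ X, X.T @ y)
print(beta_map)                               # shrunk slightly towards zero vs. least squares
```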
Gaussian Process: In a Bayesian Way
Moving from point estimates to full Bayesian inference:
How do we take into account the uncertainty associated with the
limited data, instead of just finding the MAP solution?
Keep the prior separate from the likelihood: build a Gaussian prior
(over weights) and a Gaussian likelihood model
(easy for computation of the weights).
The idea is to integrate over multiple straight lines.
Kejia Shi Parallel Bayesian Optimization August 16, 2017 23 / 49
Gaussian Process: Bayesian Linear Regression
Settings:
Prior: β ∼ N(0, λ⁻¹I)
Likelihood: y ∼ N(Xβ, σ²I)    (3)
The weights don't even have to be represented explicitly. Different
weights result in different possible functions related to your
predictions.
Bayesian linear regression takes into account the imperfect
information and makes subjective guesses with insufficient data.
Kejia Shi Parallel Bayesian Optimization August 16, 2017 24 / 49
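A minimal sketch (not from the slides) of the closed-form posterior over the weights under the settings in (3); the hyperparameters are assumed values.

```python
import numpy as np

# With prior beta ~ N(0, lambda^{-1} I) and likelihood y ~ N(X beta, sigma^2 I),
# the posterior over the weights is Gaussian:
#   Sigma_post = (lambda I + X^T X / sigma^2)^{-1},  mu_post = Sigma_post X^T y / sigma^2.
rng = np.random.default_rng(4)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.0, -1.0]) + 0.2 * rng.normal(size=50)

lam, sigma2 = 1.0, 0.2**2                             # assumed hyperparameters
d = X.shape[1]
Sigma_post = np.linalg.inv(lam * np.eye(d) + X.T @ X / sigma2)
mu_post = Sigma_post @ X.T @ y / sigma2

# Predictive uncertainty at a new point x*: variance = sigma^2 + x*^T Sigma_post x*.
x_star = np.array([0.5, 0.5])
print(x_star @ mu_post, sigma2 + x_star @ Sigma_post @ x_star)
```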
Inner Product
The inner product of two vectors a = [a1, a2, ..., an] and
b = [b1, b2, ..., bn] is defined as
a · b = Σ_{i=1}^n a_i b_i = a1 b1 + a2 b2 + ... + an bn.
The inner product can be viewed as the simplest kernel function:
K(x_n, x_m) = x_n^T x_m.
Kejia Shi Parallel Bayesian Optimization August 16, 2017 25 / 49
Gaussian Process: Kernelization
Kernel tricks add flexibility without changing the algorithm: they let you
frame all kinds of computation efficiently in terms of inner products.
Specifically, suppose we have some inputs that cannot easily be
represented as elements of R^d; we can still apply a
kernelized algorithm to these inputs, as long as we can define an inner
product on them.
This enables us to integrate over more complicated bases, not just
straight lines or some finite basis (e.g. a polynomial basis {1, x, x²}).
Kejia Shi Parallel Bayesian Optimization August 16, 2017 26 / 49
Gaussian Process: Kernelization
Common Kernels (a numpy sketch of two of them follows this slide)
Polynomial kernel: K(x_n, x_m) = (a x_n^T x_m + c)^d, a > 0, c ≥ 0, d ∈ Z+
Exponential of a quadratic kernel:
K(x_n, x_m) = θ_0 exp{ −(θ_1/2) (x_n − x_m)^T (x_n − x_m) } + θ_2 + θ_3 x_n^T x_m
Squared exponential kernel:
K(x_n, x_m) = τ² exp{ −(1/(2l²)) (x_n − x_m)^T (x_n − x_m) }
Matérn 3/2 kernel: K(x_n, x_m) = σ_f² (1 + √3 r / σ_l) exp(−√3 r / σ_l), where
r = (x_n − x_m)^T (x_n − x_m).
Matérn 5/2 kernel: K(x_n, x_m) = σ_f² (1 + √5 r / σ_l + 5r² / (3σ_l²)) exp(−√5 r / σ_l)
Kejia Shi Parallel Bayesian Optimization August 16, 2017 27 / 49
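A numpy sketch (not from the slides) of the squared exponential and Matérn 5/2 kernels. Note one assumption: following the usual convention, r is taken here as the Euclidean distance ||x_n − x_m|| rather than the squared form written on the slide.

```python
import numpy as np

def sq_exp_kernel(A, B, tau=1.0, l=1.0):
    # Pairwise squared distances between rows of A and rows of B.
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return tau**2 * np.exp(-0.5 * d2 / l**2)

def matern52_kernel(A, B, sigma_f=1.0, sigma_l=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    r = np.sqrt(np.maximum(d2, 0.0))          # Euclidean distance (assumed convention)
    s = np.sqrt(5.0) * r / sigma_l
    return sigma_f**2 * (1.0 + s + 5.0 * d2 / (3.0 * sigma_l**2)) * np.exp(-s)

X = np.random.default_rng(5).uniform(0, 1, size=(4, 2))
print(sq_exp_kernel(X, X))
print(matern52_kernel(X, X))
```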
Gaussian Process: Model and Inference
Gaussian Process Model:
Prior: f | X ∼ N(m, K)
Likelihood: y | f, σ² ∼ N(f, σ²I)
where the elements of the mean vector and kernel/covariance matrix
are defined as m_i = µ_0(x_i) and K_{i,j} = k(x_i, x_j).
Posterior:
Mean: µ_n(x) = µ_0(x) + k(x)^T (K + σ²I)⁻¹ (y − m)
Variance: σ_n²(x) = k(x, x) − k(x)^T (K + σ²I)⁻¹ k(x)
k(x) is a vector of covariance terms between x and x_{1:n}.
These posteriors are used to select the next query point x_{n+1} as well.
Kejia Shi Parallel Bayesian Optimization August 16, 2017 28 / 49
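A minimal sketch (not from the slides) of these posterior mean and variance formulas, with a zero prior mean and a squared exponential kernel; the data are synthetic.

```python
import numpy as np

def kernel(A, B, l=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / l**2)

def gp_posterior(X_train, y_train, X_test, noise=1e-2):
    K = kernel(X_train, X_train) + noise * np.eye(len(X_train))   # K + sigma^2 I
    K_s = kernel(X_train, X_test)                                  # k(x) for each test point
    K_ss = kernel(X_test, X_test)
    K_inv_y = np.linalg.solve(K, y_train)                          # (K + sigma^2 I)^{-1} (y - m), m = 0
    mean = K_s.T @ K_inv_y                                         # mu_n(x)
    var = np.diag(K_ss - K_s.T @ np.linalg.solve(K, K_s))          # sigma_n^2(x)
    return mean, var

rng = np.random.default_rng(6)
X_train = rng.uniform(-3, 3, size=(8, 1))
y_train = np.sin(X_train).ravel() + 0.1 * rng.normal(size=8)
X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
print(gp_posterior(X_train, y_train, X_test))
```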
Gaussian Process: Demo and Evaluation
Linear mean function:
- Represents an initial guess at the regression function, with the linear
model m(x) = Xβ corresponding to a convenient special case,
possibly with a hyperprior chosen for the regression coefficients β in
this mean function. The posterior concentrates close to the linear model
to the extent supported by the data
Covariance function:
- Controls the smoothness of realizations from the Gaussian process
and the degree of shrinkage towards the mean
Simplicity and flexibility in terms of conditioning and inference!
Kejia Shi Parallel Bayesian Optimization August 16, 2017 29 / 49
Gaussian Process: Demo and Evaluation
Advantages
Can fit a wide range of smooth surfaces while being computationally
tractable even for moderate to large numbers of predictors
Disadvantages
O(n3) complexity in terms of total number of observations from the
inversion of a dense covariance matrix
This becomes troublesome as the size of the search space grows with
the model complexity
Solution
Computation has become significantly more accessible, and it is possible
to train many models in parallel
Kejia Shi Parallel Bayesian Optimization August 16, 2017 30 / 49
Gaussian Process vs. Multivariate Gaussian
In short, Gaussian processes can be considered as an infinite-dimensional
generalization of multivariate Gaussian distributions.
Kejia Shi Parallel Bayesian Optimization August 16, 2017 31 / 49
Acquisition Function
Kejia Shi Parallel Bayesian Optimization August 16, 2017 32 / 49
Acquisition Function: Overview
What is an acquisition function?
An inexpensive function a(x), evaluated at a given point x, that
measures how desirable evaluating f at x is expected to be for
the minimization problem.
How do we query points to evaluate and update our hyperparameters?
Still select arbitrarily?
Utilize our model!
Kejia Shi Parallel Bayesian Optimization August 16, 2017 33 / 49
Acquisition Function: Overview
How to utilize the posterior model to create such acquisition functions?
A setting of the model hyperparameters θ
An arbitrary query point x and its corresponding function value
v = f(x)
Map them to a utility function U = U(x, v, θ)
The acquisition function can then be defined as an expectation over all of its
unknowns, i.e. v | x, θ and θ, guiding the selection of the (n + 1)-th point
given the previous data Dn:
a(x; Dn) = E_θ E_{v|x,θ}[U(x, v, θ)].
Kejia Shi Parallel Bayesian Optimization August 16, 2017 34 / 49
Acquisition Function: Marginalization
Dealing with a(x; Dn) = E_θ E_{v|x,θ}[U(x, v, θ)] is very complicated.
We start from the outer expectation over the unknown hyperparameters θ.
We wish to marginalize out our uncertainty about θ with
a_n(x) = E_{θ|Dn}[a(x; θ)] = ∫ a(x; θ) p(θ|Dn) dθ.
Notice: from Bayes rule, p(θ|Dn) = p(y|X, θ) p(θ) / p(Dn).
Two ways to tackle the marginalized acquisition function (sketched below):
Point estimate θ̂ = θ̂_ML or θ̂_MAP, therefore â_n(x) = a(x; θ̂_n).
Sampling from the posterior distribution p(θ|Dn), {θ_n^(i)}_{i=1}^S, therefore
E_{θ|Dn}[a(x; θ)] ≈ (1/S) Σ_{i=1}^S a(x; θ_n^(i)).
Idea: from here on we can ignore the θ dependence!
Kejia Shi Parallel Bayesian Optimization August 16, 2017 35 / 49
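A hypothetical sketch of the sampling approach: average the acquisition function over S posterior hyperparameter samples. Both acquisition and theta_samples below are stand-ins, not a real GP acquisition or MCMC sampler.

```python
import numpy as np

def marginalized_acquisition(x, acquisition, theta_samples):
    # a_n(x) ~= (1/S) * sum_i a(x; theta^(i)), averaging over posterior samples of theta.
    return np.mean([acquisition(x, theta) for theta in theta_samples])

# Toy stand-ins: a fake acquisition depending on a length-scale hyperparameter,
# and a short list pretending to be samples from p(theta | D_n).
acquisition = lambda x, theta: np.exp(-x**2 / theta["length_scale"])
theta_samples = [{"length_scale": ls} for ls in (0.5, 1.0, 2.0)]
print(marginalized_acquisition(0.3, acquisition, theta_samples))
```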
Acquisition Function: Categories
Improvement-based Policies: how much has been improved from
current optimal?
Optimistic Policies: based on our optimistic estimation, what would
this give us? (exploitation and exploration)
Information-based Policies: how much does evaluating this point
help us reduce the entropy?
Kejia Shi Parallel Bayesian Optimization August 16, 2017 36 / 49
Acquisition Function: Improvement-based Policies
Favors points that are likely to improve upon an incumbent target τ!
Notice that in practice τ is set to the best observed value, i.e. max_{i=1:n} y_i.
Probability of improvement (PI):
Earliest idea
a_PI(x; Dn) = P(v > τ) = Φ((µ_n(x) − τ) / σ_n(x))
Expected improvement (EI):
Improvement function: I(x, v, θ) = (v − τ) I(v > τ)
a_EI(x; Dn) = E[I(x, v, θ)] = (µ_n(x) − τ) Φ((µ_n(x) − τ) / σ_n(x)) + σ_n(x) φ((µ_n(x) − τ) / σ_n(x))
Kejia Shi Parallel Bayesian Optimization August 16, 2017 37 / 49
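A minimal sketch (not from the slides) of the PI and EI formulas above, given the GP posterior mean µ_n(x) and standard deviation σ_n(x) at a few candidate points; the numbers are illustrative.

```python
import numpy as np
from scipy.stats import norm

def probability_of_improvement(mu, sigma, tau):
    z = (mu - tau) / sigma
    return norm.cdf(z)                                   # Phi((mu_n(x) - tau) / sigma_n(x))

def expected_improvement(mu, sigma, tau):
    z = (mu - tau) / sigma
    return (mu - tau) * norm.cdf(z) + sigma * norm.pdf(z)

mu, sigma = np.array([0.2, 0.8, 1.1]), np.array([0.5, 0.1, 0.3])
tau = 1.0                                                # best observed value so far
print(probability_of_improvement(mu, sigma, tau))
print(expected_improvement(mu, sigma, tau))
```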
Acquisition Function: Optimistic Policies
As mentioned, the acquisition function balances exploration of the
search space against exploitation of currently promising areas; the
upper confidence bound (UCB) method is one popular way to do this. This
category is optimistic in the face of uncertainty, as the original method
uses the upper confidence bound as its acquisition function. Since the
posterior at any arbitrary point is Gaussian, any quantile of the distribution
can be computed with its corresponding value of β_n as follows:
a_UCB(x; Dn) = µ_n(x) + β_n σ_n(x).
There are additional strategies for choosing the hyperparameter β_n.
Kejia Shi Parallel Bayesian Optimization August 16, 2017 38 / 49
Acquisition Function: Information-based Policies
What's different? Consider the posterior distribution over the unknown
minimizer x∗, denoted p∗(x|Dn).
One example is the Entropy Search (ES) technique, which aims to
reduce the uncertainty in the location x∗ by selecting points that are
expected to cause the largest reduction in entropy of distribution
p∗(x|Dn).
Information gain: U(x, y, θ) = H(x∗|Dn) − H(x∗|Dn ∪ {(x, y)})
aES (x; Dn) = H(x∗|Dn) − Ey|Dn,xH(x∗|Dn ∪ {(x, y)})
H(x∗|Dn) is the differential entropy of the posterior distribution.
Kejia Shi Parallel Bayesian Optimization August 16, 2017 39 / 49
Acquisition Function: Demo
Kejia Shi Parallel Bayesian Optimization August 16, 2017 40 / 49
Parallelization
Kejia Shi Parallel Bayesian Optimization August 16, 2017 41 / 49
Parallelization
And more Bayesian optimization packages (scikit-optimize, GPyOpt...)
Kejia Shi Parallel Bayesian Optimization August 16, 2017 42 / 49
Parallelization
1. On Query Methods (or, mostly, the Acquisition Function):
Most common: Expected Improvements
Snoek et al. 2012, Spearmint: Monte Carlo Acquisition Function
Wang et al. 2016, MOE (Yelp) [12]: q-EI, arg max_{x1, ..., xq} EI(x1, ..., xq)
- Construct an unbiased estimator of EI(x1, ..., xq) using infinitesimal
perturbation analysis (IPA)
- Turn it into a gradient estimation problem under certain assumptions
Kejia Shi Parallel Bayesian Optimization August 16, 2017 43 / 49
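As a hypothetical illustration of the q-EI idea (not MOE's actual implementation), one can estimate EI(x1, ..., xq) by Monte Carlo from the joint GP posterior over the batch; the mean vector and covariance matrix below are made-up values.

```python
import numpy as np

def monte_carlo_qei(mu, cov, tau, n_samples=10000, rng=None):
    rng = rng or np.random.default_rng(0)
    # Sample joint function values at the q candidate points from the GP posterior.
    samples = rng.multivariate_normal(mu, cov, size=n_samples)
    # Improvement of the batch = best sampled value beyond the incumbent tau.
    improvement = np.maximum(samples.max(axis=1) - tau, 0.0)
    return improvement.mean()

mu = np.array([0.9, 1.05])                       # posterior means at two candidates
cov = np.array([[0.04, 0.01], [0.01, 0.09]])     # posterior covariance between them
print(monte_carlo_qei(mu, cov, tau=1.0))
```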
Parallelization
2. Entropy Search
Consider a distribution over the minimum of the function and iteratively
evaluate the points that will most decrease the entropy of this distribution.
Swersky et al. 2013, Multi-task:
Motivation: fit a single model on multiple datasets; perform low-cost
exploration of a promising location on the cheaper task before risking
an evaluation of the expensive task
Kejia Shi Parallel Bayesian Optimization August 16, 2017 44 / 49
Parallelization: Demo
Constant Liar: a constant target value is chosen for all pending new
evaluations (Chevalier, Ginsbourger 2012)
Kejia Shi Parallel Bayesian Optimization August 16, 2017 45 / 49
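A hypothetical sketch of the constant-liar loop (not the authors' code): repeatedly maximize single-point EI, assigning the constant "lie" to each pending point before refitting. Here fit_gp is a placeholder for any GP fit/predict routine (e.g. the gp_posterior sketch shown earlier), and candidates is a finite set of candidate points.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, tau):
    z = (mu - tau) / np.maximum(sigma, 1e-12)
    return (mu - tau) * norm.cdf(z) + sigma * norm.pdf(z)

def constant_liar_batch(fit_gp, X, y, candidates, q, lie=None):
    X, y = X.copy(), y.copy()
    lie = y.max() if lie is None else lie          # constant value assigned to pending points
    batch = []
    for _ in range(q):
        mu, sigma = fit_gp(X, y, candidates)       # assumed: posterior mean/std at candidates
        x_next = candidates[np.argmax(expected_improvement(mu, sigma, y.max()))]
        batch.append(x_next)
        X = np.vstack([X, x_next])                 # pretend the pending point was observed...
        y = np.append(y, lie)                      # ...with the lied value, then refit
    return np.array(batch)
```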
References
0. Fundamentals borrow a lot from Prof. John Paisley's Machine Learning slides.
Retrieved from http://www.columbia.edu/~jwp2128/Teaching/W4721/Spring2017/W4721Spring2017.html
1. Adams, R. P. (2014). A Tutorial on Bayesian Optimization for Machine Learning.
Retrieved from https://www.iro.umontreal.ca/~bengioy/.../Ryan adams 140814 bayesopt ncap.pdf
2. Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization.
Journal of Machine Learning Research, 13(Feb), 281-305.
3. Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B.
(2014). Bayesian Data Analysis (Vol. 2). Boca Raton, FL: CRC press.
4. Hennig, P., & Schuler, C. J. (2012). Entropy Search for Information-Efficient Global
Optimization. Journal of Machine Learning Research, 13(Jun), 1809–1837.
5. Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., & De Freitas, N. (2016). Taking
the Human Out of the Loop: A Review of Bayesian Optimization. Proceedings of the
IEEE, 104(1), 148–175. http://doi.org/10.1109/JPROC.2015.2494218
Kejia Shi Parallel Bayesian Optimization August 16, 2017 46 / 49
References
6. Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., & Talwalkar, A. (2016).
Hyperband: A novel bandit-based approach to hyperparameter optimization. arXiv
preprint arXiv:1603.06560.
7. scikit-optimize Package Documentation. Retrieved from
https://scikit-optimize.github.io
8. Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical Bayesian optimization of
machine learning algorithms. In Advances in neural information processing systems (pp.
2951-2959).
9. Swersky, K., Snoek, J., & Adams, R. P. (2013). Multi-task Bayesian optimization. In
Advances in neural information processing systems (pp. 2004-2012).
Kejia Shi Parallel Bayesian Optimization August 16, 2017 47 / 49
References
Pictures:
1. How CSI Works/Grid Search: https://s-media-cache-ak0.pinimg.com/originals/3b/dd/d2/3bddd27d048504c1bdb556b17f9ae86e.jpg
2. Catalhoyuk, The team excavates in the South Area on buildings in multiple levels to
better understand the stratigraphy. Retrieved from https://www.flickr.com/photos/catalhoyuk/19080999299
3. A. Honchar, Neural Networks for Algorithmic Trading - Hyperparameters optimization.
Retrieved from https://medium.com/alexrachnog/neural-networks-for-algorithmic-trading-hyperparameters-optimization-cb2b4a29b8ee
4. Grid Search and Random Search Demo Plots. Retrieved from
http://blog.kaggle.com/2015/07/16/scikit-learn-video-8-efficiently-searching-for-optimal-tuning-parameters/
Kejia Shi Parallel Bayesian Optimization August 16, 2017 48 / 49
The End
Kejia Shi Parallel Bayesian Optimization August 16, 2017 49 / 49
More Related Content

What's hot

accurate ABC Oliver Ratmann
accurate ABC Oliver Ratmannaccurate ABC Oliver Ratmann
accurate ABC Oliver Ratmannolli0601
 
comments on exponential ergodicity of the bouncy particle sampler
comments on exponential ergodicity of the bouncy particle samplercomments on exponential ergodicity of the bouncy particle sampler
comments on exponential ergodicity of the bouncy particle samplerChristian Robert
 
Approximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-LikelihoodsApproximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-LikelihoodsStefano Cabras
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Valentin De Bortoli
 
ABC based on Wasserstein distances
ABC based on Wasserstein distancesABC based on Wasserstein distances
ABC based on Wasserstein distancesChristian Robert
 
prior selection for mixture estimation
prior selection for mixture estimationprior selection for mixture estimation
prior selection for mixture estimationChristian Robert
 
Patch Matching with Polynomial Exponential Families and Projective Divergences
Patch Matching with Polynomial Exponential Families and Projective DivergencesPatch Matching with Polynomial Exponential Families and Projective Divergences
Patch Matching with Polynomial Exponential Families and Projective DivergencesFrank Nielsen
 
Divergence center-based clustering and their applications
Divergence center-based clustering and their applicationsDivergence center-based clustering and their applications
Divergence center-based clustering and their applicationsFrank Nielsen
 
Divergence clustering
Divergence clusteringDivergence clustering
Divergence clusteringFrank Nielsen
 
Computational Information Geometry: A quick review (ICMS)
Computational Information Geometry: A quick review (ICMS)Computational Information Geometry: A quick review (ICMS)
Computational Information Geometry: A quick review (ICMS)Frank Nielsen
 
Classification with mixtures of curved Mahalanobis metrics
Classification with mixtures of curved Mahalanobis metricsClassification with mixtures of curved Mahalanobis metrics
Classification with mixtures of curved Mahalanobis metricsFrank Nielsen
 
Igh maa-2015 nov
Igh maa-2015 novIgh maa-2015 nov
Igh maa-2015 novZach Zhang
 
Understanding distributed calculi in Haskell
Understanding distributed calculi in HaskellUnderstanding distributed calculi in Haskell
Understanding distributed calculi in HaskellPawel Szulc
 
Meta-learning and the ELBO
Meta-learning and the ELBOMeta-learning and the ELBO
Meta-learning and the ELBOYoonho Lee
 
The dual geometry of Shannon information
The dual geometry of Shannon informationThe dual geometry of Shannon information
The dual geometry of Shannon informationFrank Nielsen
 

What's hot (20)

accurate ABC Oliver Ratmann
accurate ABC Oliver Ratmannaccurate ABC Oliver Ratmann
accurate ABC Oliver Ratmann
 
comments on exponential ergodicity of the bouncy particle sampler
comments on exponential ergodicity of the bouncy particle samplercomments on exponential ergodicity of the bouncy particle sampler
comments on exponential ergodicity of the bouncy particle sampler
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
talk MCMC & SMC 2004
talk MCMC & SMC 2004talk MCMC & SMC 2004
talk MCMC & SMC 2004
 
Approximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-LikelihoodsApproximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-Likelihoods
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...
 
QMC: Operator Splitting Workshop, A Splitting Method for Nonsmooth Nonconvex ...
QMC: Operator Splitting Workshop, A Splitting Method for Nonsmooth Nonconvex ...QMC: Operator Splitting Workshop, A Splitting Method for Nonsmooth Nonconvex ...
QMC: Operator Splitting Workshop, A Splitting Method for Nonsmooth Nonconvex ...
 
ABC based on Wasserstein distances
ABC based on Wasserstein distancesABC based on Wasserstein distances
ABC based on Wasserstein distances
 
prior selection for mixture estimation
prior selection for mixture estimationprior selection for mixture estimation
prior selection for mixture estimation
 
Athens workshop on MCMC
Athens workshop on MCMCAthens workshop on MCMC
Athens workshop on MCMC
 
Patch Matching with Polynomial Exponential Families and Projective Divergences
Patch Matching with Polynomial Exponential Families and Projective DivergencesPatch Matching with Polynomial Exponential Families and Projective Divergences
Patch Matching with Polynomial Exponential Families and Projective Divergences
 
Divergence center-based clustering and their applications
Divergence center-based clustering and their applicationsDivergence center-based clustering and their applications
Divergence center-based clustering and their applications
 
Divergence clustering
Divergence clusteringDivergence clustering
Divergence clustering
 
Computational Information Geometry: A quick review (ICMS)
Computational Information Geometry: A quick review (ICMS)Computational Information Geometry: A quick review (ICMS)
Computational Information Geometry: A quick review (ICMS)
 
Classification with mixtures of curved Mahalanobis metrics
Classification with mixtures of curved Mahalanobis metricsClassification with mixtures of curved Mahalanobis metrics
Classification with mixtures of curved Mahalanobis metrics
 
Igh maa-2015 nov
Igh maa-2015 novIgh maa-2015 nov
Igh maa-2015 nov
 
Understanding distributed calculi in Haskell
Understanding distributed calculi in HaskellUnderstanding distributed calculi in Haskell
Understanding distributed calculi in Haskell
 
Meta-learning and the ELBO
Meta-learning and the ELBOMeta-learning and the ELBO
Meta-learning and the ELBO
 
The dual geometry of Shannon information
The dual geometry of Shannon informationThe dual geometry of Shannon information
The dual geometry of Shannon information
 

Similar to Parallel Bayesian Optimization

2013 IEEE International Symposium on Information Theory
2013 IEEE International Symposium on Information Theory2013 IEEE International Symposium on Information Theory
2013 IEEE International Symposium on Information TheoryJoe Suzuki
 
Gaussian processing
Gaussian processingGaussian processing
Gaussian processing홍배 김
 
Bayesian regression models and treed Gaussian process models
Bayesian regression models and treed Gaussian process modelsBayesian regression models and treed Gaussian process models
Bayesian regression models and treed Gaussian process modelsTommaso Rigon
 
MLHEP 2015: Introductory Lecture #4
MLHEP 2015: Introductory Lecture #4MLHEP 2015: Introductory Lecture #4
MLHEP 2015: Introductory Lecture #4arogozhnikov
 
Quasi-Optimal Recombination Operator
Quasi-Optimal Recombination OperatorQuasi-Optimal Recombination Operator
Quasi-Optimal Recombination Operatorjfrchicanog
 
Cheatsheet supervised-learning
Cheatsheet supervised-learningCheatsheet supervised-learning
Cheatsheet supervised-learningSteve Nouri
 
Expectation propagation
Expectation propagationExpectation propagation
Expectation propagationDong Guo
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheetJoachim Gwoke
 
Radial Basis Function Interpolation
Radial Basis Function InterpolationRadial Basis Function Interpolation
Radial Basis Function InterpolationJesse Bettencourt
 
Error Estimates for Multi-Penalty Regularization under General Source Condition
Error Estimates for Multi-Penalty Regularization under General Source ConditionError Estimates for Multi-Penalty Regularization under General Source Condition
Error Estimates for Multi-Penalty Regularization under General Source Conditioncsandit
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheetSuvrat Mishra
 
Metropolis-Hastings MCMC Short Tutorial
Metropolis-Hastings MCMC Short TutorialMetropolis-Hastings MCMC Short Tutorial
Metropolis-Hastings MCMC Short TutorialRalph Schlosser
 
Regression on gaussian symbols
Regression on gaussian symbolsRegression on gaussian symbols
Regression on gaussian symbolsAxel de Romblay
 
More on randomization semi-definite programming and derandomization
More on randomization semi-definite programming and derandomizationMore on randomization semi-definite programming and derandomization
More on randomization semi-definite programming and derandomizationAbner Chih Yi Huang
 
8517ijaia06
8517ijaia068517ijaia06
8517ijaia06ijaia
 

Similar to Parallel Bayesian Optimization (20)

Naive Bayes Presentation
Naive Bayes PresentationNaive Bayes Presentation
Naive Bayes Presentation
 
2013 IEEE International Symposium on Information Theory
2013 IEEE International Symposium on Information Theory2013 IEEE International Symposium on Information Theory
2013 IEEE International Symposium on Information Theory
 
Gaussian processing
Gaussian processingGaussian processing
Gaussian processing
 
Bayesian regression models and treed Gaussian process models
Bayesian regression models and treed Gaussian process modelsBayesian regression models and treed Gaussian process models
Bayesian regression models and treed Gaussian process models
 
MLHEP 2015: Introductory Lecture #4
MLHEP 2015: Introductory Lecture #4MLHEP 2015: Introductory Lecture #4
MLHEP 2015: Introductory Lecture #4
 
Quasi-Optimal Recombination Operator
Quasi-Optimal Recombination OperatorQuasi-Optimal Recombination Operator
Quasi-Optimal Recombination Operator
 
Cheatsheet supervised-learning
Cheatsheet supervised-learningCheatsheet supervised-learning
Cheatsheet supervised-learning
 
Expectation propagation
Expectation propagationExpectation propagation
Expectation propagation
 
SASA 2016
SASA 2016SASA 2016
SASA 2016
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheet
 
Radial Basis Function Interpolation
Radial Basis Function InterpolationRadial Basis Function Interpolation
Radial Basis Function Interpolation
 
Origins of Free
Origins of FreeOrigins of Free
Origins of Free
 
Error Estimates for Multi-Penalty Regularization under General Source Condition
Error Estimates for Multi-Penalty Regularization under General Source ConditionError Estimates for Multi-Penalty Regularization under General Source Condition
Error Estimates for Multi-Penalty Regularization under General Source Condition
 
Lecture2 xing
Lecture2 xingLecture2 xing
Lecture2 xing
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheet
 
Metropolis-Hastings MCMC Short Tutorial
Metropolis-Hastings MCMC Short TutorialMetropolis-Hastings MCMC Short Tutorial
Metropolis-Hastings MCMC Short Tutorial
 
Regression on gaussian symbols
Regression on gaussian symbolsRegression on gaussian symbols
Regression on gaussian symbols
 
More on randomization semi-definite programming and derandomization
More on randomization semi-definite programming and derandomizationMore on randomization semi-definite programming and derandomization
More on randomization semi-definite programming and derandomization
 
ABC workshop: 17w5025
ABC workshop: 17w5025ABC workshop: 17w5025
ABC workshop: 17w5025
 
8517ijaia06
8517ijaia068517ijaia06
8517ijaia06
 

More from Sri Ambati

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxSri Ambati
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek Sri Ambati
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thSri Ambati
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionSri Ambati
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Sri Ambati
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMsSri Ambati
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the WaySri Ambati
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OSri Ambati
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Sri Ambati
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersSri Ambati
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Sri Ambati
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Sri Ambati
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...Sri Ambati
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability Sri Ambati
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email AgainSri Ambati
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Sri Ambati
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...Sri Ambati
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...Sri Ambati
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneySri Ambati
 

More from Sri Ambati (20)

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation Journey
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 

Recently uploaded (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 

Parallel Bayesian Optimization

  • 1. Introduction to Parallel Bayesian Optimization for Machine Learning Kejia Shi H2O.ai Inc. Columbia University in the City of New York kejia.shi@columbia.edu August 16, 2017 Kejia Shi Parallel Bayesian Optimization August 16, 2017 1 / 49
  • 2. Overview 1 Introduction 2 Fundamentals 3 Bayesian Optimization Gaussian Process Acquisition Function 4 Parallel Bayesian Optimization Kejia Shi Parallel Bayesian Optimization August 16, 2017 2 / 49
  • 3. Introduction So you did all the data cleaning, feature engineering... And you fit your first model with hyperparameters... Ridge regression: λ... Neural network: number of layers, batch size... Random forest: number of trees, leaf size... Now what you are going to do? Kejia Shi Parallel Bayesian Optimization August 16, 2017 3 / 49
  • 4. Introduction θ∗ = arg minθ Φ(θ), for θ ∈ Θ What is the best set of hyperparameters to optimize the chosen model? Kejia Shi Parallel Bayesian Optimization August 16, 2017 4 / 49
  • 5. Problem of Hyperparameter Search Figure: Typical List of a Convolutional Neural Network Training Hyperparameters Kejia Shi Parallel Bayesian Optimization August 16, 2017 5 / 49
  • 6. Problem of Hyperparameter Search Kejia Shi Parallel Bayesian Optimization August 16, 2017 6 / 49
  • 7. Automated Hyperparameter Search: Grid Search Kejia Shi Parallel Bayesian Optimization August 16, 2017 7 / 49
  • 8. Automated Hyperparameter Search: Grid Search Kejia Shi Parallel Bayesian Optimization August 16, 2017 8 / 49
  • 9. Automated Hyperparameter Search: Random Search Kejia Shi Parallel Bayesian Optimization August 16, 2017 9 / 49
  • 10. Fundamentals Kejia Shi Parallel Bayesian Optimization August 16, 2017 10 / 49
  • 11. Fundamentals: Review of Probability Suppose we have two events, A and B that may or may not be correlated, A = It’s raining B = The ground is wet We can talk about probabilities of these events, P(A) = Probability it is raining P(B) = Probability the ground is wet We can also talk about their other probabilities, Conditional probabilities P(B|A) = Probability the ground is wet given that it is raining Joint probabilities P(A, B) = Probability it is raining and the ground is wet Kejia Shi Parallel Bayesian Optimization August 16, 2017 11 / 49
  • 12. Fundamentals: Calculus of Probability There are simple rules from moving from one probability to another P(A, B) = P(A|B)P(B) = P(B|A)P(A) P(A) = ΣbP(A, B = b) and similarly P(B) = ΣaP(A = a, B) Kejia Shi Parallel Bayesian Optimization August 16, 2017 12 / 49
  • 13. Fundamentals: Bayes Rule We want to say something about the probability of B given that A happened, Bayes rule says that the probability of B after knowing A is: P(B|A) = P(A|B)P(B) P(A) . Notice: P(B|A) – ”posterior”, P(A|B) – ”likelihood”, P(B) – ”prior” and P(A) – ”marginal” or ”evidence”. Kejia Shi Parallel Bayesian Optimization August 16, 2017 13 / 49
  • 14. Fundamentals: Gaussian Distribution x ∼ N(µ, σ2 ) Density: f (x|µ, σ2 ) = 1 √ 2πσ2 exp{− (x − µ)2 2σ2 } Log Density: ln f (x|µ, σ2 ) = − (x − µ)2 2σ2 + constant Kejia Shi Parallel Bayesian Optimization August 16, 2017 14 / 49
  • 15. Fundamentals: Probabilistic Model In short, probabilistic model is a set of probability distribution p(x|). the distribution family p(·) is selected but we don’t know the parameter θ. In Gaussian distribution, p(x|θ), θ = {µ, Σ}. Kejia Shi Parallel Bayesian Optimization August 16, 2017 15 / 49
  • 16. Fundamentals: Maximum Likelihood The independent and identically distributed (iid) assumption: xi iid ∼ p(x|θ), i = 1, ..., n. With this, the joint density function becomes p(x1, ..., xn|θ) = i p(xi |θ). Maximum likelihood approach seeks the value of θ that mximize the likelihood function: ˆθML arg maxθ p(x1, ..., xn|θ). How to solve it? Take gradients! Kejia Shi Parallel Bayesian Optimization August 16, 2017 16 / 49
  • 17. Fundamentals: Logarithm Trick Taking gradients of n i=1 fi p(xi |θ) can be complicated. Therefore we use the property of monotonically increasing functions on R+. Specifically, maximizing g(y) and its monotone transformation f (g(y)) is the same. Hence we have the equality ln( i fi ) = Σi ln(fi ). Notice: location of a maximum does not change, while value of a maximum does change. max y ln g(y) = max y g(y) arg max y ln g(y) = arg max y g(y) Kejia Shi Parallel Bayesian Optimization August 16, 2017 17 / 49
  • 18. Bayesian Optimization Kejia Shi Parallel Bayesian Optimization August 16, 2017 18 / 49
  • 19. Gaussian Process Kejia Shi Parallel Bayesian Optimization August 16, 2017 19 / 49
  • 20. Gaussian Process: From Linear Regression The linear regression model may be written as: yi ≈ f (xi ; β) = β0 + β1xi1 + β2xi2 + ... + βd xid = β0 + Σd j=1βj xij . We have our training data (x1, y1), ..., (xn, yn) and we want to use them to learn a set of weights β0, β1, ..., βd such that yi ≈ f (xi ; β). Our least square objective function asks us to choose the β’s that minimize the sum of squared errors βLS = argminβΣn i=1(yi − f (xi ; β))2 ≡ argminβL. Kejia Shi Parallel Bayesian Optimization August 16, 2017 20 / 49
  • 21. Gaussian Process: From Linear Regression Vertical length error. And now (β0, β1, β2) defines this plane. (Share the same solution as maximum likelihood approach.) Idea: Imagine only one feature one response, now you are fitting to a straight line, which leads to the assumption that errors are normally distributed. Extension: Regularization: l2, Ridge, l1, LASSO Priors: MAP Kejia Shi Parallel Bayesian Optimization August 16, 2017 21 / 49
  • 22. Gaussian Process: From Linear Regression Probabistic View Maximum Likelihood Likelihood term rising from squared errors y ∼ N(Xβ, σ2I) βML = argmaxβ ln p(y|µ = Xβ, σ2 ) = argmaxβ − 1 2σ2 ||y − Xβ||2 (1) Maximum a Posteriori With extra Gaussian prior β ∼ N(0, λ−1I) Notice ln p(y|X) is unknown. From Bayes Rule: βMAP = argmaxβ ln p(y|β, X) + ln p(β) − ln p(y|X) = argmaxβ − 1 2σ2 ||y − Xβ||2 − λ 2 ||β||2 (2) Kejia Shi Parallel Bayesian Optimization August 16, 2017 22 / 49
  • 23. Gaussian Process: In a Bayesian Way Moving from point estimates to total Bayesian inference: How do we take into account the uncertainty associated with the limited data instead of just finding the MAP solution? Take off prior from the likelihood! Instead, build a Gaussian prior (over weights), Gaussian likelihood model. (easy for computation of the weights) The idea is to integrate over multiple straight lines. Kejia Shi Parallel Bayesian Optimization August 16, 2017 23 / 49
  • 24. Gaussian Process: Bayesian Linear Regression Settings: Prior : β ∼ N(0, λ−1 I) Likelihood : y ∼ N(Xβ, σ2 I) (3) The weights don’t even have to be represented explicitly. Different weights result in different possible possible functions related to your predictions. Bayesian linear regression takes into account the imperfect information and makes subjective guesses with insufficient data. Kejia Shi Parallel Bayesian Optimization August 16, 2017 24 / 49
  • 25. Inner Product The inner product of two vectors a = [a1, a2, ..., an] and b = [b1, b2, ..., bn] is defined as a · b = Σn i=1ai bi = a1b1 + a2b2 + ... + anbn. Inner product operation can be viewed as the simplest kernel function. K(xn, xm) = xT n xm. Kejia Shi Parallel Bayesian Optimization August 16, 2017 25 / 49
  • 26. Gaussian Process: Kernelization Kernel tricks add more flexibility without changing the algorithm by efficiently letting you use ”inner product” to frame all kinds of computation. Specifically, suppose we have some inputs that cannot be easily represented as elements of Rd elements, we can still apply a kernelized algorithm to these inputs, as long as we can define an inner product on them. This enable us to integrate over more complicated bases, not just straight lines to some finite basis (e.g. a polynomial basis {1, x, x2}). Kejia Shi Parallel Bayesian Optimization August 16, 2017 26 / 49
  • 27. Gaussian Process: Kernelization Common Kernels Polynomial kernel: K(xn, xm) = (axT n xm + c)d , a > 0, c 0, d ∈ Z+ Exponential of a quadratic kernel: K(xn, xm) = θ0 exp −θ1 2 (xn − xm)T (xn − xm) + θ2 + θ3xT n xm Squared exponential kernel: K(xn, xm) = τ2 exp − 1 2l2 (xn − xm)T (xn − xm) Mat´ern 3/2 kernel: K(xn, xm) = σ2 f 1 + √ 3r σl exp − √ 3r σl where, r = (xn − xm)T (xn − xm). Mat´ern 5/2 kernel: K(xn, xm) = σ2 f 1 + √ 5r σl + 5r2 3σ2 l exp − √ 5r σl Kejia Shi Parallel Bayesian Optimization August 16, 2017 27 / 49
  • 28. Gaussian Process: Model and Inference Gaussian Process Model: Prior: f|X ∼ N(m, K); Likelihood: y|f, σ² ∼ N(f, σ²I), where the elements of the mean vector and kernel/covariance matrix are defined as m_i = µ_0(x_i) and K_{i,j} = k(x_i, x_j). Posterior: Mean: µ_n(x) = µ_0(x) + k(x)^T(K + σ²I)⁻¹(y − m); Variance: σ_n²(x) = k(x, x) − k(x)^T(K + σ²I)⁻¹k(x), where k(x) is the vector of covariances between x and x_{1:n}. These posteriors are used to select the next query point x_{n+1} as well. Kejia Shi Parallel Bayesian Optimization August 16, 2017 28 / 49
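A self-contained sketch of these posterior formulas, assuming a zero prior mean µ_0 ≡ 0 and a squared exponential kernel; the toy objective is made up for the example.

```python
import numpy as np

def rbf_kernel(A, B, tau=1.0, length=0.3):
    """Squared exponential kernel matrix between the row vectors of A and B."""
    sq_dist = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2 * A @ B.T
    return tau ** 2 * np.exp(-0.5 * sq_dist / length ** 2)

def gp_posterior(X, y, X_star, sigma2=1e-4, kernel=rbf_kernel):
    """Posterior mean and variance at X_star, assuming a zero prior mean (mu_0 = 0)."""
    K = kernel(X, X) + sigma2 * np.eye(len(X))        # K + sigma^2 I
    k_star = kernel(X, X_star)                        # covariances between x_{1:n} and x
    alpha = np.linalg.solve(K, y)                     # (K + sigma^2 I)^{-1} (y - m), with m = 0
    mean = k_star.T @ alpha                           # mu_n(x)
    v = np.linalg.solve(K, k_star)
    var = np.diag(kernel(X_star, X_star)) - np.sum(k_star * v, axis=0)   # sigma_n^2(x)
    return mean, var

# Noisy observations of a toy objective f(x) = sin(3x)
X = np.array([[0.1], [0.4], [0.7], [0.9]])
y = np.sin(3 * X).ravel() + 0.01 * np.random.default_rng(3).normal(size=4)
X_star = np.linspace(0, 1, 5).reshape(-1, 1)
mu, var = gp_posterior(X, y, X_star)
print(mu, var)
```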
  • 29. Gaussian Process: Demo and Evaluation Linear mean function: represents an initial guess at the regression function, with the linear model m(x) = Xβ as a convenient special case, possibly with a hyperprior on the regression coefficients β in this mean function. The posterior concentrates close to the linear model to the extent supported by the data. Covariance function: controls the smoothness of realizations from the Gaussian process and the degree of shrinkage towards the mean. Simplicity and flexibility in terms of conditioning and inference! Kejia Shi Parallel Bayesian Optimization August 16, 2017 29 / 49
  • 30. Gaussian Process: Demo and Evaluation Advantages: can fit a wide range of smooth surfaces while remaining computationally tractable even for moderate to large numbers of predictors. Disadvantages: O(n³) complexity in the total number of observations, from the inversion of a dense covariance matrix; this becomes troublesome as the size of the search space grows with model complexity. Solution: with parallelization, the computation becomes significantly more accessible and it is possible to train many models in parallel. Kejia Shi Parallel Bayesian Optimization August 16, 2017 30 / 49
  • 31. Gaussian Process vs. Multivariate Gaussian In short, Gaussian processes can be considered an infinite-dimensional generalization of multivariate Gaussian distributions. Kejia Shi Parallel Bayesian Optimization August 16, 2017 31 / 49
  • 32. Acquisition Function Kejia Shi Parallel Bayesian Optimization August 16, 2017 32 / 49
  • 33. Acquisition Function: Overview What does acquisition function mean? An inexpensive function a(x), evaluable at any given point x, that indicates how desirable evaluating f at x is expected to be for the minimization problem. How do we pick the points to evaluate and update our hyperparameters? Still select them arbitrarily? Utilize our model! Kejia Shi Parallel Bayesian Optimization August 16, 2017 33 / 49
  • 34. Acquisition Function: Overview How do we utilize the posterior model to create such acquisition functions? Take a setting of the model hyperparameters θ, an arbitrary query point x and its corresponding function value v = f(x), and map them to a utility function U = U(x, v, θ). The acquisition function can then be defined by integrating over the unknowns, i.e. v|x, θ and θ, to guide the selection of the (n + 1)-th point given the previous data D_n: a(x; D_n) = E_θ E_{v|x,θ}[U(x, v, θ)]. Kejia Shi Parallel Bayesian Optimization August 16, 2017 34 / 49
  • 35. Acquisition Function: Marginalization Dealing with a(x; D_n) = E_θ E_{v|x,θ}[U(x, v, θ)] directly is very complicated. We start from the outer expectation over the unknown hyperparameters θ: we wish to marginalize out our uncertainty about θ with a_n(x) ≜ E_{θ|D_n}[a(x; θ)] = ∫ a(x; θ) p(θ|D_n) dθ. Notice, from Bayes' rule, p(θ|D_n) = p(y|X, θ) p(θ) / p(D_n). Two ways to tackle the marginalized acquisition function: a point estimate θ̂ = θ̂_ML or θ̂_MAP, giving â_n(x) = a(x; θ̂_n); or sampling from the posterior distribution p(θ|D_n), {θ_n^(i)}_{i=1}^S, giving E_{θ|D_n}[a(x; θ)] ≈ (1/S) Σ_{i=1}^S a(x; θ_n^(i)). Idea: from here on we can ignore the θ dependence! Kejia Shi Parallel Bayesian Optimization August 16, 2017 35 / 49
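A minimal sketch of the sampling route: average the acquisition value over S hyperparameter samples. The gp_posterior and acquisition helpers below are hypothetical stand-ins (the acquisition is a simple mean-minus-standard-deviation score for minimization), just to make the averaging step concrete.

```python
import numpy as np

def gp_posterior(X, y, X_star, length, sigma2=1e-4):
    """Posterior mean/variance for a zero-mean GP with an RBF kernel of the given length-scale."""
    def k(A, B):
        d = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-0.5 * d / length ** 2)
    K = k(X, X) + sigma2 * np.eye(len(X))
    ks = k(X, X_star)
    mean = ks.T @ np.linalg.solve(K, y)
    var = np.diag(k(X_star, X_star)) - np.sum(ks * np.linalg.solve(K, ks), axis=0)
    return mean, var

def acquisition(mean, var):
    # Hypothetical score favouring low posterior mean and high uncertainty (minimization).
    return -(mean - np.sqrt(np.maximum(var, 0.0)))

X = np.array([[0.1], [0.5], [0.9]])
y = np.array([0.3, -0.2, 0.5])
X_star = np.linspace(0, 1, 101).reshape(-1, 1)

# Hyperparameter samples theta^(i) (here: length-scales), drawn e.g. via MCMC in practice.
length_samples = [0.1, 0.2, 0.4]

# Marginalized acquisition: average a(x; theta^(i)) over the S samples.
a_bar = np.mean([acquisition(*gp_posterior(X, y, X_star, l)) for l in length_samples], axis=0)
x_next = X_star[np.argmax(a_bar)]
print(x_next)
```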
  • 36. Acquisition Function: Categories Improvement-based policies: how much do we improve over the current optimum? Optimistic policies: based on an optimistic estimate, what could this point give us? (exploitation and exploration) Information-based policies: how much does evaluating this point help us reduce the entropy? Kejia Shi Parallel Bayesian Optimization August 16, 2017 36 / 49
  • 37. Acquisition Function: Improvement-based Policies Favors points that are likely to improve upon an incumbent target τ! Notice that in practice τ is set to the best observed value, i.e. max_{i=1:n} y_i. Probability of improvement (PI), the earliest idea: a_PI(x; D_n) ≜ P(v > τ) = Φ((µ_n(x) − τ)/σ_n(x)). Expected improvement (EI), via the improvement function I(x, v, θ) ≜ (v − τ) I(v > τ): a_EI(x; D_n) ≜ E[I(x, v, θ)] = (µ_n(x) − τ) Φ((µ_n(x) − τ)/σ_n(x)) + σ_n(x) φ((µ_n(x) − τ)/σ_n(x)). Kejia Shi Parallel Bayesian Optimization August 16, 2017 37 / 49
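A minimal sketch of these two formulas with SciPy, following the maximization convention used on this slide (τ is the best observed value so far); mu and sigma would come from the GP posterior at the candidate points, and the example values are made up.

```python
import numpy as np
from scipy.stats import norm

def probability_of_improvement(mu, sigma, tau):
    """a_PI(x) = Phi((mu_n(x) - tau) / sigma_n(x))."""
    z = (mu - tau) / np.maximum(sigma, 1e-12)
    return norm.cdf(z)

def expected_improvement(mu, sigma, tau):
    """a_EI(x) = (mu - tau) Phi(z) + sigma phi(z), with z = (mu - tau) / sigma."""
    sigma = np.maximum(sigma, 1e-12)
    z = (mu - tau) / sigma
    return (mu - tau) * norm.cdf(z) + sigma * norm.pdf(z)

# Posterior at three candidate points, incumbent tau = 1.0
mu = np.array([0.8, 1.1, 0.9])
sigma = np.array([0.3, 0.05, 0.5])
tau = 1.0
print(probability_of_improvement(mu, sigma, tau))
print(expected_improvement(mu, sigma, tau))
```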
  • 38. Acquisition Function: Optimistic Policies As mentioned, an acquisition function balances exploration of the search space against exploitation of currently promising areas, and the upper confidence bound (UCB) method is one popular way to do so. This category is optimistic in the face of uncertainty, as the original method uses the upper confidence bound itself as the acquisition function. Since the posterior at any arbitrary point is Gaussian, any quantile of the distribution can be computed via its corresponding value of β_n: a_UCB(x; D_n) ≜ µ_n(x) + β_n σ_n(x). There are additional strategies for choosing the hyperparameter β_n. Kejia Shi Parallel Bayesian Optimization August 16, 2017 38 / 49
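The corresponding sketch is essentially a one-liner; β_n trades off exploration (large β_n) against exploitation (small β_n), and schedules where β_n grows slowly with n are among the common choices.

```python
import numpy as np

def upper_confidence_bound(mu, sigma, beta=2.0):
    """a_UCB(x) = mu_n(x) + beta_n * sigma_n(x); larger beta explores more."""
    return mu + beta * sigma

mu = np.array([0.8, 1.1, 0.9])
sigma = np.array([0.3, 0.05, 0.5])
print(upper_confidence_bound(mu, sigma, beta=2.0))
```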
  • 39. Acquisition Function: Information-based Policies What's different? Consider the posterior distribution over the unknown minimizer x∗, denoted p∗(x|D_n). One example is the Entropy Search (ES) technique, which aims to reduce the uncertainty about the location x∗ by selecting points that are expected to cause the largest reduction in the entropy of the distribution p∗(x|D_n). Information gain: U(x, y, θ) = H(x∗|D_n) − H(x∗|D_n ∪ {(x, y)}), so a_ES(x; D_n) = H(x∗|D_n) − E_{y|D_n,x} H(x∗|D_n ∪ {(x, y)}), where H(x∗|D_n) is the differential entropy of the posterior distribution. Kejia Shi Parallel Bayesian Optimization August 16, 2017 39 / 49
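As a rough, hedged sketch of the quantity ES works with: one simple way to approximate p∗(x|D_n) on a discretized search space is to draw sample functions from the GP posterior and histogram where each sample attains its minimum; full ES would additionally fantasize an observation (x, y) and compare the entropy before and after. The helper name and toy posterior below are hypothetical.

```python
import numpy as np
from scipy.stats import entropy

def minimizer_entropy(mean, cov, n_samples=2000, rng=None):
    """Approximate H(x*|D_n) on a grid: sample posterior functions, histogram their argmins."""
    rng = rng or np.random.default_rng(0)
    samples = rng.multivariate_normal(mean, cov, size=n_samples)    # sampled functions on the grid
    argmins = np.argmin(samples, axis=1)                            # where each sample is minimized
    p_star = np.bincount(argmins, minlength=len(mean)) / n_samples  # empirical p*(x|D_n)
    return entropy(p_star), p_star

# Toy posterior over a 5-point grid (mean vector and a small covariance matrix).
mean = np.array([0.5, 0.1, 0.0, 0.2, 0.6])
cov = 0.05 * np.eye(5)
H, p_star = minimizer_entropy(mean, cov)
print(H, p_star)
```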
  • 40. Acquisition Function: Demo Kejia Shi Parallel Bayesian Optimization August 16, 2017 40 / 49
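The demo figure itself is not reproduced in this transcript; as a substitute, here is a minimal end-to-end Bayesian optimization loop on a made-up 1-D toy problem, using scikit-learn's GP regressor and the EI formula from slide 37 rewritten for minimization (the improvement is τ − v, with τ the best value found so far). This is a sketch under those assumptions, not the demo shown in the talk.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    """Toy black-box function to minimize on [0, 1]."""
    return np.sin(6 * x) + 0.3 * x ** 2

def expected_improvement_min(mu, sigma, tau):
    """EI for minimization: (tau - mu) Phi(z) + sigma phi(z), z = (tau - mu) / sigma."""
    sigma = np.maximum(sigma, 1e-12)
    z = (tau - mu) / sigma
    return (tau - mu) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(3, 1))          # a few initial design points
y = objective(X).ravel()
candidates = np.linspace(0, 1, 500).reshape(-1, 1)

for it in range(10):
    # Fit the GP surrogate, then pick the candidate maximizing EI.
    gp = GaussianProcessRegressor(kernel=Matern(length_scale=0.2, nu=2.5),
                                  alpha=1e-6, normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    x_next = candidates[np.argmax(expected_improvement_min(mu, sigma, y.min()))]
    X = np.vstack([X, [x_next]])
    y = np.append(y, objective(x_next))

print("best x:", X[np.argmin(y)], "best y:", y.min())
```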
  • 41. Parallelization Kejia Shi Parallel Bayesian Optimization August 16, 2017 41 / 49
  • 42. Parallelization And more Bayesian optimization packages (scikit-optimize, GPyOpt...) Kejia Shi Parallel Bayesian Optimization August 16, 2017 42 / 49
  • 43. Parallelization 1. On Query Methods (Or, Mostly, the Acquisition Function): Most common: Expected Improvement. Snoek et al. 2012, Spearmint: Monte Carlo acquisition function. Wang et al. 2016, MOE (Yelp) [12]: q-EI, argmax_{x_1,...,x_q} EI(x_1, ..., x_q); construct an unbiased estimator of EI(x_1, ..., x_q) using infinitesimal perturbation analysis (IPA), turning it into a gradient-estimation problem under certain assumptions. Kejia Shi Parallel Bayesian Optimization August 16, 2017 43 / 49
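A hedged sketch of the plain Monte Carlo estimator of q-EI (not the IPA gradient estimator used by MOE): draw joint samples of the q function values from the GP posterior and average the best improvement over the incumbent, here under a minimization convention. The toy data and batch are made up; sample_y is scikit-learn's joint posterior sampler.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def q_ei_monte_carlo(gp, X_batch, incumbent, n_samples=5000, seed=0):
    """Monte Carlo estimate of q-EI for a batch X_batch (minimization convention)."""
    # Joint posterior samples of f at the q batch points: shape (q, n_samples).
    samples = gp.sample_y(X_batch, n_samples=n_samples, random_state=seed)
    best_in_batch = samples.min(axis=0)                       # best sampled value per draw
    improvement = np.maximum(incumbent - best_in_batch, 0.0)  # improvement over the incumbent
    return improvement.mean()

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(6, 1))
y = (np.sin(6 * X) + 0.3 * X ** 2).ravel()

gp = GaussianProcessRegressor(kernel=Matern(length_scale=0.2, nu=2.5),
                              alpha=1e-6, normalize_y=True).fit(X, y)

X_batch = np.array([[0.2], [0.5], [0.8]])     # a candidate batch of q = 3 points
print(q_ei_monte_carlo(gp, X_batch, incumbent=y.min()))
```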
  • 44. Parallelization 2. Entropy Search: consider a distribution over the minimum of the function and iteratively evaluate the points that most decrease the entropy of this distribution. Swersky et al. 2013, Multi-task: motivation is a single model over multiple datasets; perform low-cost exploration of a promising location on the cheaper task before risking an evaluation of the expensive task. Kejia Shi Parallel Bayesian Optimization August 16, 2017 44 / 49
  • 45. Parallelization: Demo Constant Liar: a constant target value is assumed for all pending evaluations while the batch is being built (Chevalier & Ginsbourger, 2012). Kejia Shi Parallel Bayesian Optimization August 16, 2017 45 / 49
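A minimal sketch of the constant-liar heuristic under the same minimization setup as the earlier loop: after choosing each point of the batch by maximizing EI, pretend it returned a constant "lie" (here the current best value) and refit before choosing the next one. The helper names and toy data are the same hypothetical ones used above.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement_min(mu, sigma, tau):
    sigma = np.maximum(sigma, 1e-12)
    z = (tau - mu) / sigma
    return (tau - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def constant_liar_batch(X, y, candidates, q=4, lie=None):
    """Select a batch of q points; pending points get the constant 'lie' value (default: y.min())."""
    X_fant, y_fant = X.copy(), y.copy()
    lie = y.min() if lie is None else lie
    batch = []
    for _ in range(q):
        gp = GaussianProcessRegressor(kernel=Matern(length_scale=0.2, nu=2.5),
                                      alpha=1e-6, normalize_y=True).fit(X_fant, y_fant)
        mu, sigma = gp.predict(candidates, return_std=True)
        x_next = candidates[np.argmax(expected_improvement_min(mu, sigma, y_fant.min()))]
        batch.append(x_next)
        # Pretend the pending point returned the lie, so the next pick avoids re-selecting it.
        X_fant = np.vstack([X_fant, [x_next]])
        y_fant = np.append(y_fant, lie)
    return np.array(batch)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(5, 1))
y = (np.sin(6 * X) + 0.3 * X ** 2).ravel()
candidates = np.linspace(0, 1, 500).reshape(-1, 1)
print(constant_liar_batch(X, y, candidates, q=4))   # q points that can be evaluated in parallel
```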
  • 46. References 0. Fundamentals borrow a lot from Prof. John Paisley's Machine Learning slides. Retrieved from http://www.columbia.edu/~jwp2128/Teaching/W4721/Spring2017/W4721Spring2017.html 1. Adams, R. P. (2014). A Tutorial on Bayesian Optimization for Machine Learning. Retrieved from https://www.iro.umontreal.ca/~bengioy/.../Ryan adams 140814 bayesopt ncap.pdf 2. Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(Feb), 281-305. 3. Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014). Bayesian Data Analysis (Vol. 2). Boca Raton, FL: CRC Press. 4. Hennig, P., & Schuler, C. J. (2012). Entropy Search for Information-Efficient Global Optimization. Journal of Machine Learning Research, 13(Jun), 1809-1837. 5. Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., & De Freitas, N. (2016). Taking the Human Out of the Loop: A Review of Bayesian Optimization. Proceedings of the IEEE, 104(1), 148-175. http://doi.org/10.1109/JPROC.2015.2494218 Kejia Shi Parallel Bayesian Optimization August 16, 2017 46 / 49
  • 47. References 6. Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., & Talwalkar, A. (2016). Hyperband: A novel bandit-based approach to hyperparameter optimization. arXiv preprint arXiv:1603.06560. 7. scikit-optimize Package Documentation. Retrieved from https://scikit-optimize.github.io 8. Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical bayesian optimization of machine learning algorithms. In Advances in neural information processing systems (pp. 2951-2959). 9. Swersky, K., Snoek, J., & Adams, R. P. (2013). Multi-task bayesian optimization. In Advances in neural information processing systems (pp. 2004-2012). Kejia Shi Parallel Bayesian Optimization August 16, 2017 47 / 49
  • 48. References Pictures: 1. How CSI Works/Grid Search: https://s-media-cache-ak0.pinimg.com/originals/3b/dd/d2/3bddd27d048504c1bdb556b17f9ae86e.jpg 2. Catalhoyuk, The team excavates in the South Area on buildings in multiple levels to better understand the stratigraphy. Retrieved from https://www.flickr.com/photos/catalhoyuk/19080999299 3. A. Honchar, Neural Networks for Algorithmic Trading - Hyperparameters optimization. Retrieved from https://medium.com/alexrachnog/neural-networks-for-algorithmic-trading-hyperparameters-optimization-cb2b4a29b8ee 4. Grid Search and Random Search Demo Plots. Retrieved from http://blog.kaggle.com/2015/07/16/scikit-learn-video-8-efficiently-searching-for-optimal-tuning-parameters/ Kejia Shi Parallel Bayesian Optimization August 16, 2017 48 / 49
  • 49. The End Kejia Shi Parallel Bayesian Optimization August 16, 2017 49 / 49