Explaining the Basics of Mean Field Variational Approximation for Statisticians

Explaining “Explaining Variational
Approximation”
Based on Paper
“Explaining Variational Approximation”
JT Ormerod, MP Wand (2010)
Presentation by Wayne Tai Lee

My Goal
● Convert the paper into a short presentation
● Not covering the examples (really helpful!)
● Intuition and motivation only

Why do we want to use
variational approximations?
● In Statistics, Bayesian solutions always
involve the posterior:
p(Θ|data) = p(data | Θ) p(Θ) / p(data)

p(Θ|data) : posterior, belief after updating with data

p(data | Θ): likelihood, data generation

p(Θ): prior, belief before updating with data

p(data): “normalizing constant” to ensure posterior is
a density function

p(data | Θ): likelihood, specified by you
p(Θ): prior, specified by you
p(data): a nasty integral that cannot be calculated
explicitly in general

● Consequence:
– Posterior often has no analytical expression
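Written out, the normalizing constant is the likelihood averaged over the prior, so it is an integral over every component of Θ. The following only restates the definition; nothing here goes beyond Bayes' rule as given above.

```latex
p(\text{data}) \;=\; \int p(\text{data} \mid \Theta)\, p(\Theta)\, d\Theta
```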

Most popular alternative
● To obtain the posterior or any related statistic
– Sample the posterior via MCMC methods
● Pros:
– Can get arbitrarily close to the posterior with enough
samples (resource/time intensive)
● Cons:
– Lots of tuning necessary
– Time consuming to run

Variational Approximation
● Intuition:
– Approximate the posterior with a class of
functions that are easier to deal with
mathematically
– Find the function in this class that minimizes the
KL divergence from the posterior
● Pros:
– Suuuuper fast
● Cons:
– No guarantees on closeness

Big Picture: Method to get Posterior

               MCMC                            Variational Method
Strategy       Sampling                        Optimization
Solution       Asymptotically exact            Approximation with no bounds
Speed          Often slow                      Fast
The “catch”    Tuning and convergence          Need tractable mathematical
               assessment require experience   setup

Explaining Variational
Approximation
● Change notation: p(y) = p(data)
● Use q(Θ) to approximate p(Θ|y)
● Will assume a product-form family of functions for q(Θ)
– q(Θ) = q1(Θ1)q2(Θ2)...qp(Θp)
– Each qi(Θi) is a density

Max Lower Bound =
Min KL-Divergence
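The identity behind this title can be written out as follows. This is the standard decomposition used in Ormerod and Wand (2010), stated here for an arbitrary density q(Θ); the underbrace labels are added for readability.

```latex
\log p(y)
  \;=\;
  \underbrace{\int q(\Theta)\,
      \log\frac{p(y,\Theta)}{q(\Theta)}\, d\Theta}_{\text{lower bound on } \log p(y)}
  \;+\;
  \underbrace{\int q(\Theta)\,
      \log\frac{q(\Theta)}{p(\Theta \mid y)}\, d\Theta}_{\mathrm{KL}\left(q \,\|\, p(\cdot \mid y)\right)\;\ge\;0}
```

Since log p(y) does not depend on q, raising the first term must lower the second by the same amount, so maximizing the lower bound over q is the same as minimizing the KL divergence.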

Sanity Check: Optimal Solution is
THE solution
● Optimal q(Θ) is p(Θ|y) when q(Θ) is left in general form
(the KL divergence is then zero)
● Important: this is a very general solution for
arbitrary dependence/distribution of Θ and y
● Product form of q(Θ) allows us to divide and
conquer!

Focus on each Θi separately
● Lower bound = (terms involving q1(Θ1)) + (terms not involving q1(Θ1))

Our assumptions so far
● Product form of q(Θ) allowed us to optimize
each term separately
● Each qi(Θi) being a density allows the other factors
to integrate out nicely (each integrates to 1)
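A sketch of how these two assumptions are used, following the argument in Ormerod and Wand (2010). Here E_{-Θ1} denotes the expectation over q2(Θ2)...qp(Θp), i.e. over every factor except q1; that shorthand is introduced only for this block.

```latex
\begin{aligned}
\int q(\Theta)\,\log\frac{p(y,\Theta)}{q(\Theta)}\,d\Theta
 &= \int \prod_{i=1}^{p} q_i(\Theta_i)
    \Big\{ \log p(y,\Theta) - \sum_{i=1}^{p} \log q_i(\Theta_i) \Big\}\, d\Theta \\
 &= \int q_1(\Theta_1)\, E_{-\Theta_1}\!\big[\log p(y,\Theta)\big]\, d\Theta_1
    \;-\; \int q_1(\Theta_1)\,\log q_1(\Theta_1)\, d\Theta_1
    \;+\; \text{terms not involving } q_1
\end{aligned}
```

The product form turns the joint expectation into nested expectations, and because each q_j integrates to 1, the log q_j terms with j ≠ 1 collapse into constants with respect to q1.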

How to convert into an optimization
problem that we can solve?

We've only learned one trick...
● Optimal q1(Θ1) is then

To get a density, just normalize
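The "one trick" is the KL divergence again. As a sketch, write E_{-Θ1} for the expectation over all factors except q1, and define an auxiliary density \tilde{p} (a name used only here) by normalizing exp{E_{-Θ1} log p(y, Θ)}. The part of the lower bound that involves q1 is then a negative KL divergence plus a constant:

```latex
\begin{aligned}
\tilde{p}(\Theta_1) &= \frac{\exp\{E_{-\Theta_1}\log p(y,\Theta)\}}{Z},
\qquad Z = \int \exp\{E_{-\Theta_1}\log p(y,\Theta)\}\, d\Theta_1, \\[4pt]
\int q_1(\Theta_1)\, E_{-\Theta_1}\!\big[\log p(y,\Theta)\big]\, d\Theta_1
 - \int q_1(\Theta_1)\log q_1(\Theta_1)\, d\Theta_1
 &= \log Z \;-\; \mathrm{KL}\big(q_1 \,\|\, \tilde{p}\big), \\[4pt]
\text{so}\quad q_1^{*}(\Theta_1) &= \tilde{p}(\Theta_1)
 \;\propto\; \exp\{E_{-\Theta_1}\log p(y,\Theta)\}.
\end{aligned}
```

The KL term is zero exactly when q1 equals \tilde{p}, which is why normalizing the exponentiated expectation gives the optimal factor.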

Repeat for Θi
● General Mean Field Variational Approximation
Solution:
The density qi(Θi) that is proportional to
exp{ E_{-Θi}[ log p(y, Θ) ] }, where E_{-Θi} is the expectation
over all qj(Θj) with j ≠ i

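Similarity to Full Conditional in
Gibbs Sampling
● Optimal qi(Θi) is proportional to
exp{ E_{-Θi}[ log p(Θi | y, all other Θj) ] }, the exponentiated
expected log of the full conditional (expectation over the other qj)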
● Need to do algebra until this is “tractable”
– i.e. something we recognize as a standard
distribution that is easily normalized
– This is where the “setup” becomes important

For example
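● If exp{ E_{-Θi}[ log p(y, Θ) ] }, viewed as a function of Θi
(expectation over the other qj), resembles
exp(c·Θi^2 + d·Θi) with c < 0, then we know qi(Θi)
must be a Gaussian density!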
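The recognition step is just completing the square. As a worked sketch, suppose the unnormalized factor is exp(cΘi² + dΘi) with c < 0; the coefficients c and d are placeholders for whatever the algebra produces in a given model.

```latex
\exp\!\big(c\,\Theta_i^2 + d\,\Theta_i\big)
 \;\propto\;
 \exp\!\Big\{-\frac{(\Theta_i - \mu)^2}{2\sigma^2}\Big\},
 \qquad
 \sigma^2 = -\frac{1}{2c},
 \quad
 \mu = -\frac{d}{2c} = d\,\sigma^2
```

So the mean and variance of the Gaussian factor can be read directly off the coefficients, with no integration needed.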

Final Solution
● The product of all qi(Θi) is then the approximation to
p(Θ|y)
● Naturally doesn't do well when there's strong
dependence between the Θi
● You should try the examples in the paper!
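A small sketch of why strong dependence hurts, under an illustrative assumption that is not taken from the slides: the true posterior is exactly a bivariate Gaussian with unit variances and correlation rho. For a Gaussian target, the optimal product-form factors are known to be Gaussians whose variances are the reciprocals of the diagonal of the precision matrix. The function names below are hypothetical.

```python
import numpy as np


def mean_field_gaussian_variances(cov):
    """Variances of the optimal mean-field factors for a Gaussian target.

    Assumes the target is N(mu, cov); each optimal factor q_i is then
    Gaussian with variance 1 / precision[i, i].
    """
    precision = np.linalg.inv(np.asarray(cov, dtype=float))
    return 1.0 / np.diag(precision)


def compare_variances(rho):
    """Return (true marginal variances, mean-field variances) for correlation rho."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    return np.diag(cov), mean_field_gaussian_variances(cov)


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9, 0.99):
        true_var, mf_var = compare_variances(rho)
        print(f"rho={rho:4.2f}  true var={true_var[0]:.3f}  mean-field var={mf_var[0]:.3f}")
```

In this setting the mean-field variance works out to 1 - rho², so the approximation matches the true marginals when rho = 0 and becomes increasingly overconfident as the correlation grows.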

First Example
● Data generated as
– Y | μ, σ^2 ~ N(μ,σ^2)
● Priors
– μ ~ N(m,s^2)
– σ^2 ~ InvGamma(a, b)
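For this model the mean field recipe gives a Gaussian factor for μ and an Inverse-Gamma factor for σ^2, each depending on the other, so the two are updated in turn until they stop changing. The following is a minimal sketch of that iteration, assuming the product form q(μ, σ^2) = q(μ) q(σ^2) and an InvGamma(shape, scale) parameterization; the function name, argument names, starting value, and stopping rule are illustrative choices, not taken from the slides.

```python
import numpy as np


def cavi_normal(y, m, s2, a, b, n_iter=100, tol=1e-10):
    """Mean-field updates for y_i ~ N(mu, sigma^2), mu ~ N(m, s2),
    sigma^2 ~ InvGamma(a, b), with q(mu, sigma^2) = q(mu) q(sigma^2).

    Returns (mu_q, s2_q, a_q, b_q) where q(mu) = N(mu_q, s2_q) and
    q(sigma^2) = InvGamma(a_q, b_q).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    a_q = a + n / 2.0      # shape of q(sigma^2); does not change across iterations
    b_q = float(b)         # starting value for the scale of q(sigma^2) (assumption)
    mu_q, s2_q = y.mean(), 1.0

    for _ in range(n_iter):
        e_prec = a_q / b_q                           # E_q[1 / sigma^2]
        # Update q(mu): Gaussian, obtained by completing the square in mu.
        s2_q = 1.0 / (n * e_prec + 1.0 / s2)
        mu_q = (e_prec * y.sum() + m / s2) * s2_q
        # Update q(sigma^2): uses E_q[sum (y_i - mu)^2] = sum (y_i - mu_q)^2 + n * s2_q.
        b_new = b + 0.5 * (np.sum((y - mu_q) ** 2) + n * s2_q)
        converged = abs(b_new - b_q) <= tol * abs(b_new)
        b_q = b_new
        if converged:
            break

    return mu_q, s2_q, a_q, b_q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.normal(3.0, 2.0, size=100)               # simulated data, for illustration only
    mu_q, s2_q, a_q, b_q = cavi_normal(y, m=0.0, s2=1e4, a=0.01, b=0.01)
    print("q(mu) mean, variance:", mu_q, s2_q)
    print("q(sigma^2) mean:", b_q / (a_q - 1.0))
```

Each pass is a handful of arithmetic operations, which is where the speed advantage over sampling comes from; the price is that q(μ) and q(σ^2) are forced to be independent.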

Gibbs Sampling vs Variational
Samples
[Figure: Gibbs samples compared with variational samples; panels for N=100 and N=20]

Discussion
● Hard to know when the approximation is poor
relative to the true posterior...
