Slide 5: Variational Inference
Kullback–Leibler (KL) Divergence
Also called Relative Entropy (because the distance is measured through entropy).
It measures the "distance" between two probability distributions, though it is not a true metric: in general KL(q ‖ p) ≠ KL(p ‖ q).
KL(q ‖ p) = Σ q(z) log( q(z) / p(z) )
KL(q ‖ p) is large when q and p are very different, and small when they are similar.
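To make this concrete, here is a minimal sketch of computing KL(q ‖ p) for two discrete distributions; the numbers below are made-up examples, not from the slides:

```python
import numpy as np

def kl_divergence(q, p):
    """KL(q || p) = sum_i q_i * log(q_i / p_i)."""
    q, p = np.asarray(q), np.asarray(p)
    return np.sum(q * np.log(q / p))

p = np.array([0.5, 0.3, 0.2])
q_close = np.array([0.45, 0.35, 0.20])  # similar to p -> KL is small
q_far = np.array([0.05, 0.05, 0.90])    # very different -> KL is large

print(kl_divergence(q_close, p))  # ~0.007
print(kl_divergence(q_far, p))    # ~1.15
```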
Slide 10: Variational Inference
When approximating p(z|x) with q(z), we can use the KL divergence to measure the quality of the approximation.
From the definition of conditional probability we know that p(z|x) = p(x, z) / p(x).
Slide 13: Variational Inference
Since x is given, p(x) is "fixed": log p(x) does not change with respect to q(z).
Rearranging the KL divergence gives the decomposition log p(x) = L(q) + KL(q(z) ‖ p(z|x)), where L(q) = E_q[log p(x, z) − log q(z)].
L(q) is called the Lower Bound because KL ≥ 0 forces L(q) ≤ log p(x); and because p(x) is fixed, L(q) controls the KL divergence.
As the Lower Bound gets larger and larger, the KL divergence gets smaller and smaller.
When we reduce the KL divergence, q(z) becomes a better and better approximation of p(z|x).
Slide 14: Variational Inference
Here is the KEY POINT: when we approximate a conditional probability p(z|x) with q(z), instead of minimizing the KL divergence we can maximize the Lower Bound; the two are equivalent.
And the Lower Bound is easier to deal with than the KL divergence, because it does not contain the unknown p(z|x).
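As a sanity check of the decomposition log p(x) = L(q) + KL(q ‖ p(z|x)), here is a tiny numeric example with a discrete latent variable; the joint table is invented for illustration:

```python
import numpy as np

p_joint = np.array([0.2, 0.1, 0.1])  # p(x, z) at the observed x, for z in {0, 1, 2}
p_x = p_joint.sum()                  # evidence p(x) (easy here)
p_post = p_joint / p_x               # exact posterior p(z|x)

q = np.array([0.5, 0.2, 0.3])        # an arbitrary variational distribution q(z)

elbo = np.sum(q * (np.log(p_joint) - np.log(q)))  # L(q) = E_q[log p(x,z) - log q(z)]
kl = np.sum(q * (np.log(q) - np.log(p_post)))     # KL(q || p(z|x)) >= 0

print(np.log(p_x), elbo + kl)        # the two numbers agree
```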
Slide 16: Variational Inference
Why Variational Inference?
[Figure: a graphical model over variables x1, x2, x3, x4, x5]
It is easy to write down the joint pdf, but it is a pain to get a conditional distribution, because conditioning requires integrating the joint over the remaining variables.
Solving these integrals is often intractable, and sometimes not even possible.
Is it possible to get the conditional without needing to solve the integrals?
Slide 17: Variational Inference
This is the point of Variational Inference: solve for the conditional directly from the joint pdf, bypassing the need to solve the integrals.
Slide 18: Variational Inference
VI is not the only way to solve this problem. Three common options:
1. Metropolis–Hastings: solution by sampling; more accurate; takes longer to compute; easier to understand.
2. Variational Inference: deterministic solution; less accurate; takes less time to compute; harder to understand; a good approximation.
3. Laplace approximation: deterministic solution; much less accurate; takes less time to compute; easier to understand; a poor approximation.
Slide 19: Variational Inference
The conditional p(z|x) is unknown and difficult; the joint p(x, z) is known.
To find p(z|x), we can approximate it using q(z).
We want the KL divergence between p(z|x) and q(z) to be as small as possible; this is the same as making the Lower Bound as large as possible.
Therefore: to find p(z|x), Variational Inference finds the q(z) that maximizes the Lower Bound.
This is still a hard problem!! How do we know which q(z) to pick to make L as large as possible?
Slide 20: Variational Inference
Let's say there are two sets of variables: the observed x and the latent z = (z1, z2, z3) (three variables picked at random).
They have a joint pdf p(x, z1, z2, z3).
Now we want to find the conditional distribution p(z1, z2, z3 | x), or equivalently p(z | x).
Slide 21: Variational Inference
If the integral is easy, we can just take the integral: p(x) = ∫ p(x, z) dz, and then p(z|x) = p(x, z) / p(x).
But what if the integral is really hard?
We already know: the joint p(x, z1, z2, z3).
We want to know: the conditional p(z1, z2, z3 | x).
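For intuition, here is a sketch of the easy case, where the "integral" is just a small sum and the conditional falls out of the joint directly; the joint table below is made up:

```python
import numpy as np

# p(z|x) = p(x, z) / sum_z p(x, z): conditioning is mechanical when the sum is small.
p_xz = np.array([[0.10, 0.25],   # rows: x in {0, 1}
                 [0.40, 0.25]])  # cols: z in {0, 1}

x_obs = 1
p_x = p_xz[x_obs].sum()          # marginal p(x=1), a sum over z
p_z_given_x = p_xz[x_obs] / p_x  # exact conditional

print(p_z_given_x)               # [0.615..., 0.384...]
# With many latent variables this sum has exponentially many terms
# (or becomes a high-dimensional integral), which is where VI comes in.
```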
Slide 26: Variational Inference
Instead of looking for the entire q(z) at once, maybe we can solve for the factors one at a time!!!
What if we just solve for q(z1)? Let's assume we already know q(z2) and q(z3).
The mean-field update then gives log q*(z1) = E_{q(z2) q(z3)}[log p(x, z)] + const.
Slide 35: Variational Inference
We take the expectation over all of the factors except the one we are currently solving for.
But we don't know q(z1), q(z2), or q(z3) yet. It's like a chicken-and-egg problem.
Slide 38: Variational Inference
The recipe for one factor:
1. Find the expectation.
2. Make the result look like some known distribution.
3. Read off ("guess") the rest of the parameters.
Using the same logic for the other factors gives q(x), q(y), and q(z).
This was a simple example, but what if we can't get rid of E[y] and E[z]?
The trick: in that case, we pick E[y] and E[z] randomly. That gives us q(x); we then update the other factors the same way and iterate, as in the sketch below.
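Here is a minimal sketch of this iterate-to-a-fixed-point trick as coordinate updates for a mean-field factorization q(x)q(y)q(z). The joint table is randomly generated, and for simplicity it approximates a full joint p(x, y, z) rather than a conditional (with observed data, p would be the joint evaluated at the observations):

```python
import numpy as np

rng = np.random.default_rng(1)
p = rng.random((2, 2, 2))
p /= p.sum()                 # a random joint p(x, y, z) over binary variables
logp = np.log(p)

# Start from arbitrary guesses for each factor (the "pick randomly" step).
qx, qy, qz = np.full(2, 0.5), np.full(2, 0.5), np.full(2, 0.5)

for _ in range(50):
    # log q*(x) = E_{q(y)q(z)}[log p(x,y,z)] + const, and similarly for y and z.
    qx = np.exp(np.einsum('xyz,y,z->x', logp, qy, qz)); qx /= qx.sum()
    qy = np.exp(np.einsum('xyz,x,z->y', logp, qx, qz)); qy /= qy.sum()
    qz = np.exp(np.einsum('xyz,x,y->z', logp, qx, qy)); qz /= qz.sum()

print(qx, qy, qz)            # converged mean-field factors
```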
Slide 39: Variational Bayesian Inference
So far we have been concentrating on inferring the latent variables zi, assuming the parameters θ of the model are known.
Now suppose we want to infer the parameters themselves.
If we make a fully factorized approximation, q(θ|D) ≈ Πk q(θk|D), we get a method known as variational Bayes.
Slide 40: Variational Bayesian Inference
If we want to infer both latent variables and parameters, and we make an approximation of the form q(z1:N, θ | D) ≈ q(θ|D) Πi q(zi|D), we get a method known as variational Bayes EM.
This includes mixture models, GMM, PCA, HMMs, etc.
In VBEM, we alternate between updating q(zi|D) (the variational E step) and updating q(θ|D) (the variational M step).
Slide 41: Variational Bayesian Inference
Gaussian Bayesian density estimation.
Consider a simple Bayesian model consisting of a set of i.i.d. observations from a Gaussian distribution, with unknown mean and variance.
We are given N data points X = {x1, ..., xN}.
The joint probability of all variables can be rewritten as p(X, μ, τ) = p(X | μ, τ) p(μ | τ) p(τ).
Slide 47: Variational Bayesian Inference
In each case, the parameters for the distribution over one of the variables depend on expectations taken with respect to the other variable.
We expand the expectations using the standard formulas for the moments of the Gaussian and gamma distributions: E[μ] = μN, E[μ²] = 1/λN + μN², and E[τ] = aN / bN.
Slide 50: Variational Bayesian Inference
Note that there are circular dependencies among the formulas for λN and bN.
This naturally suggests an EM-like algorithm:
1. Compute Σ xn and Σ xn²; use these values to compute μN and aN.
2. Initialize λN to some arbitrary value.
3. Use the current value of λN, along with the known values of the other parameters, to compute bN.
4. Use the current value of bN, along with the known values of the other parameters, to compute λN.
5. Repeat the last two steps until convergence.
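Putting the algorithm together, here is a sketch of this loop for the Gaussian example, assuming the standard conjugate Normal-Gamma updates for this model; the prior hyperparameter values and the data are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=500)   # i.i.d. observations
N, xbar, sx, sx2 = len(x), x.mean(), x.sum(), np.sum(x**2)

mu0, lam0, a0, b0 = 0.0, 1.0, 1e-3, 1e-3       # assumed prior hyperparameters

# Step 1: quantities that never change during the iteration.
mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
a_N = a0 + (N + 1) / 2

lam_N = 1.0                                     # step 2: arbitrary initialization
for _ in range(100):
    # Step 3: update b_N using the moments of the current q(mu).
    E_mu, E_mu2 = mu_N, 1.0 / lam_N + mu_N**2
    b_N = b0 + 0.5 * (sx2 - 2 * E_mu * sx + N * E_mu2
                      + lam0 * (E_mu2 - 2 * mu0 * E_mu + mu0**2))
    # Step 4: update lam_N using E[tau] = a_N / b_N from the current q(tau).
    lam_N = (lam0 + N) * (a_N / b_N)

print("E[mu] =", mu_N, " E[tau] =", a_N / b_N, " (true tau =", 1 / 1.5**2, ")")
```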
Slide 51: References
David M. Blei, Alp Kucukelbir, and Jon D. McAuliffe. Variational Inference: A Review for Statisticians. 2016.
Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
C. Fox and S. Roberts. A Tutorial on Variational Bayes. Artificial Intelligence Review, 2012.
David J.C. MacKay. Information Theory, Inference, and Learning Algorithms (online textbook); provides an introduction to variational methods.
https://www.cs.cmu.edu/~epxing/Class/10708-15/notes/10708_scribe_lecture13.pdf