Slide 5: Variational Inference
Kullback–Leibler (KL) Divergence
Also called Relative Entropy (because the distance is measured through entropy).
It measures the "distance" between two probability distributions, though it is not a true metric: in general KL(q ‖ p) ≠ KL(p ‖ q).
KL(q ‖ p) = Σ q(z) log( q(z) / p(z) )
KL(q ‖ p) is large when q and p are very different, and small when they are similar.
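To make this concrete, here is a minimal sketch of computing KL(q ‖ p) for two discrete distributions; the numbers below are made-up examples, not from the slides:

```python
import numpy as np

def kl_divergence(q, p):
    """KL(q || p) = sum_i q_i * log(q_i / p_i)."""
    q, p = np.asarray(q), np.asarray(p)
    return np.sum(q * np.log(q / p))

p = np.array([0.5, 0.3, 0.2])
q_close = np.array([0.45, 0.35, 0.20])  # similar to p -> KL is small
q_far = np.array([0.05, 0.05, 0.90])    # very different -> KL is large

print(kl_divergence(q_close, p))  # ~0.007
print(kl_divergence(q_far, p))    # ~1.15
```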
Slide 10: Variational Inference
When approximating p(z|x) with q(z), we can use the KL divergence to measure the quality of the approximation.
From the definition of conditional probability we know that p(z|x) = p(x, z) / p(x).
Slide 13: Variational Inference
Since x is given, p(x) is "fixed": log p(x) does not change with respect to q(z).
Rearranging the KL divergence gives the decomposition log p(x) = L(q) + KL(q(z) ‖ p(z|x)), where L(q) = E_q[log p(x, z) − log q(z)].
L(q) is called the Lower Bound because KL ≥ 0 forces L(q) ≤ log p(x); and because p(x) is fixed, L(q) controls the KL divergence.
As the Lower Bound gets larger and larger, the KL divergence gets smaller and smaller.
When we reduce the KL divergence, q(z) becomes a better and better approximation of p(z|x).
Slide 14: Variational Inference
Here is the KEY POINT: when we approximate a conditional probability p(z|x) with q(z), instead of minimizing the KL divergence we can maximize the Lower Bound; the two are equivalent.
And the Lower Bound is easier to deal with than the KL divergence, because it does not contain the unknown p(z|x).
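As a sanity check of the decomposition log p(x) = L(q) + KL(q ‖ p(z|x)), here is a tiny numeric example with a discrete latent variable; the joint table is invented for illustration:

```python
import numpy as np

p_joint = np.array([0.2, 0.1, 0.1])  # p(x, z) at the observed x, for z in {0, 1, 2}
p_x = p_joint.sum()                  # evidence p(x) (easy here)
p_post = p_joint / p_x               # exact posterior p(z|x)

q = np.array([0.5, 0.2, 0.3])        # an arbitrary variational distribution q(z)

elbo = np.sum(q * (np.log(p_joint) - np.log(q)))  # L(q) = E_q[log p(x,z) - log q(z)]
kl = np.sum(q * (np.log(q) - np.log(p_post)))     # KL(q || p(z|x)) >= 0

print(np.log(p_x), elbo + kl)        # the two numbers agree
```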
Slide 16: Variational Inference
Why Variational Inference?
[Figure: a graphical model over variables x1, x2, x3, x4, x5]
It is easy to write down the joint pdf, but it is a pain to get a conditional distribution, because conditioning requires integrating the joint over the remaining variables.
Solving these integrals is often intractable, and sometimes not even possible.
Is it possible to get the conditional without needing to solve the integrals?
Slide 17: Variational Inference
This is the point of Variational Inference: solve for the conditional directly from the joint pdf, bypassing the need to solve the integrals.
Slide 18: Variational Inference
VI is not the only way to solve this problem. Three common options:
1. Metropolis–Hastings: solution by sampling; more accurate; takes longer to compute; easier to understand.
2. Variational Inference: deterministic solution; less accurate; takes less time to compute; harder to understand; a good approximation.
3. Laplace approximation: deterministic solution; much less accurate; takes less time to compute; easier to understand; a poor approximation.
Slide 19: Variational Inference
The conditional p(z|x) is unknown and difficult; the joint p(x, z) is known.
To find p(z|x), we can approximate it using q(z).
We want the KL divergence between p(z|x) and q(z) to be as small as possible; this is the same as making the Lower Bound as large as possible.
Therefore: to find p(z|x), Variational Inference finds the q(z) that maximizes the Lower Bound.
This is still a hard problem!! How do we know which q(z) to pick to make L as large as possible?
Slide 20: Variational Inference
Let's say there are two sets of variables: the observed x and the latent z = (z1, z2, z3) (three variables picked at random).
They have a joint pdf p(x, z1, z2, z3).
Now we want to find the conditional distribution p(z1, z2, z3 | x), or equivalently p(z | x).
Slide 21: Variational Inference
If the integral is easy, we can just take the integral: p(x) = ∫ p(x, z) dz, and then p(z|x) = p(x, z) / p(x).
But what if the integral is really hard?
We already know: the joint p(x, z1, z2, z3).
We want to know: the conditional p(z1, z2, z3 | x).
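For intuition, here is a sketch of the easy case, where the "integral" is just a small sum and the conditional falls out of the joint directly; the joint table below is made up:

```python
import numpy as np

# p(z|x) = p(x, z) / sum_z p(x, z): conditioning is mechanical when the sum is small.
p_xz = np.array([[0.10, 0.25],   # rows: x in {0, 1}
                 [0.40, 0.25]])  # cols: z in {0, 1}

x_obs = 1
p_x = p_xz[x_obs].sum()          # marginal p(x=1), a sum over z
p_z_given_x = p_xz[x_obs] / p_x  # exact conditional

print(p_z_given_x)               # [0.615..., 0.384...]
# With many latent variables this sum has exponentially many terms
# (or becomes a high-dimensional integral), which is where VI comes in.
```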
Slide 26: Variational Inference
Instead of looking for the entire q(z) at once, maybe we can solve for the factors one at a time!!!
What if we just solve for q(z1)? Let's assume we already know q(z2) and q(z3).
The mean-field update then gives log q*(z1) = E_{q(z2) q(z3)}[log p(x, z)] + const.
Slide 35: Variational Inference
We take the expectation over all of the factors except the one we are currently solving for.
But we don't know q(z1), q(z2), or q(z3) yet. It's like a chicken-and-egg problem.
Slide 38: Variational Inference
The recipe for one factor:
1. Find the expectation.
2. Make the result look like some known distribution.
3. Read off ("guess") the rest of the parameters.
Using the same logic for the other factors gives q(x), q(y), and q(z).
This was a simple example, but what if we can't get rid of E[y] and E[z]?
The trick: in that case, we pick E[y] and E[z] randomly. That gives us q(x); we then update the other factors the same way and iterate, as in the sketch below.
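Here is a minimal sketch of this iterate-to-a-fixed-point trick as coordinate updates for a mean-field factorization q(x)q(y)q(z). The joint table is randomly generated, and for simplicity it approximates a full joint p(x, y, z) rather than a conditional (with observed data, p would be the joint evaluated at the observations):

```python
import numpy as np

rng = np.random.default_rng(1)
p = rng.random((2, 2, 2))
p /= p.sum()                 # a random joint p(x, y, z) over binary variables
logp = np.log(p)

# Start from arbitrary guesses for each factor (the "pick randomly" step).
qx, qy, qz = np.full(2, 0.5), np.full(2, 0.5), np.full(2, 0.5)

for _ in range(50):
    # log q*(x) = E_{q(y)q(z)}[log p(x,y,z)] + const, and similarly for y and z.
    qx = np.exp(np.einsum('xyz,y,z->x', logp, qy, qz)); qx /= qx.sum()
    qy = np.exp(np.einsum('xyz,x,z->y', logp, qx, qz)); qy /= qy.sum()
    qz = np.exp(np.einsum('xyz,x,y->z', logp, qx, qy)); qz /= qz.sum()

print(qx, qy, qz)            # converged mean-field factors
```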
Slide 39: Variational Bayesian Inference
So far we have been concentrating on inferring the latent variables zi, assuming the parameters θ of the model are known.
Now suppose we want to infer the parameters themselves.
If we make a fully factorized approximation, q(θ|D) ≈ Πk q(θk|D), we get a method known as variational Bayes.
Slide 40: Variational Bayesian Inference
If we want to infer both latent variables and parameters, and we make an approximation of the form q(z1:N, θ | D) ≈ q(θ|D) Πi q(zi|D), we get a method known as variational Bayes EM.
This includes mixture models, GMM, PCA, HMMs, etc.
In VBEM, we alternate between updating q(zi|D) (the variational E step) and updating q(θ|D) (the variational M step).
Slide 41: Variational Bayesian Inference
Gaussian Bayesian density estimation.
Consider a simple Bayesian model consisting of a set of i.i.d. observations from a Gaussian distribution, with unknown mean and variance.
We are given N data points X = {x1, ..., xN}.
The joint probability of all variables can be rewritten as p(X, μ, τ) = p(X | μ, τ) p(μ | τ) p(τ).
Slide 47: Variational Bayesian Inference
In each case, the parameters for the distribution over one of the variables depend on expectations taken with respect to the other variable.
We expand the expectations using the standard formulas for the moments of the Gaussian and gamma distributions: E[μ] = μN, E[μ²] = 1/λN + μN², and E[τ] = aN / bN.
Slide 50: Variational Bayesian Inference
Note that there are circular dependencies among the formulas for λN and bN.
This naturally suggests an EM-like algorithm:
1. Compute Σ xn and Σ xn²; use these values to compute μN and aN.
2. Initialize λN to some arbitrary value.
3. Use the current value of λN, along with the known values of the other parameters, to compute bN.
4. Use the current value of bN, along with the known values of the other parameters, to compute λN.
5. Repeat the last two steps until convergence.
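Putting the algorithm together, here is a sketch of this loop for the Gaussian example, assuming the standard conjugate Normal-Gamma updates for this model; the prior hyperparameter values and the data are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=500)   # i.i.d. observations
N, xbar, sx, sx2 = len(x), x.mean(), x.sum(), np.sum(x**2)

mu0, lam0, a0, b0 = 0.0, 1.0, 1e-3, 1e-3       # assumed prior hyperparameters

# Step 1: quantities that never change during the iteration.
mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
a_N = a0 + (N + 1) / 2

lam_N = 1.0                                     # step 2: arbitrary initialization
for _ in range(100):
    # Step 3: update b_N using the moments of the current q(mu).
    E_mu, E_mu2 = mu_N, 1.0 / lam_N + mu_N**2
    b_N = b0 + 0.5 * (sx2 - 2 * E_mu * sx + N * E_mu2
                      + lam0 * (E_mu2 - 2 * mu0 * E_mu + mu0**2))
    # Step 4: update lam_N using E[tau] = a_N / b_N from the current q(tau).
    lam_N = (lam0 + N) * (a_N / b_N)

print("E[mu] =", mu_N, " E[tau] =", a_N / b_N, " (true tau =", 1 / 1.5**2, ")")
```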
Slide 51: References
David M. Blei, Alp Kucukelbir, and Jon D. McAuliffe. Variational Inference: A Review for Statisticians. 2016.
Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
C. Fox and S. Roberts. A Tutorial on Variational Bayes. Artificial Intelligence Review, 2012.
David J.C. MacKay. Information Theory, Inference, and Learning Algorithms (online textbook); provides an introduction to variational methods.
https://www.cs.cmu.edu/~epxing/Class/10708-15/notes/10708_scribe_lecture13.pdf