8803-09-lec16.pdf
1. Introduction Variational Inference Mixture of Gaussians Exponential Family Expectation Propagation Summary
Approximate Inference
Henrik I. Christensen
Robotics & Intelligent Machines @ GT
Georgia Institute of Technology,
Atlanta, GA 30332-0280
hic@cc.gatech.edu
Henrik I. Christensen (RIM@GT) Approximate Inference 1 / 36
Outline
1 Introduction
2 Variational Inference
3 Variational Mixture of Gaussians
4 Exponential Family
5 Expectation Propagation
6 Summary
Introduction
We are often required to estimate a (conditional) posterior of the form
p(Z|X)
The solution might be intractable:
1 There might not be a closed-form solution
2 The integration over X or a parameter space θ might be computationally challenging
3 The set of possible outcomes might be large/exponential
Two strategies
1 Deterministic Approximation Methods
2 Stochastic Sampling (Monte Carlo Techniques)
Today we will talk about deterministic techniques
Variational Inference
In general we have a Bayesian model as seen earlier, i.e.
ln p(X) = ln p(X, Z) − ln p(Z|X)
We can rewrite this as
ln p(X) = L(q) + KL(q||p)
where
L(q) = ∫ q(Z) ln{ p(X, Z) / q(Z) } dZ
KL(q||p) = − ∫ q(Z) ln{ p(Z|X) / q(Z) } dZ
So L(q) is a lower bound on ln p(X) built from the joint distribution, and KL(q||p) ≥ 0 is the Kullback-Leibler divergence between q(Z) and the posterior p(Z|X).
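As a quick sanity check (not from the slides), the decomposition can be verified numerically for a discrete latent variable; the joint table below is made up purely for illustration:

```python
import numpy as np

# Toy joint p(x, Z) over a discrete latent Z with 3 states,
# for one fixed observation x (values are illustrative).
p_joint = np.array([0.2, 0.1, 0.3])        # p(x, Z=k)
p_x = p_joint.sum()                        # p(x)
p_post = p_joint / p_x                     # p(Z|x)

q = np.array([0.5, 0.25, 0.25])            # any valid q(Z)

L = np.sum(q * np.log(p_joint / q))        # lower bound L(q)
KL = -np.sum(q * np.log(p_post / q))       # KL(q || p(Z|x)) >= 0

assert np.isclose(L + KL, np.log(p_x))     # decomposition holds exactly
```

The identity holds for any choice of q, which is why maximizing L(q) is equivalent to minimizing the KL term.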
Factorized Distributions
Assume for now that we can factorize Z into disjoint groups so that
q(Z) = ∏_{i=1}^{M} q_i(Z_i)
In physics a similar model has been adopted, termed mean field theory
We can then optimize L(q) through a component-wise optimization
L(q) = ∫ ∏_i q_i { ln p(X, Z) − ∑_j ln q_j } dZ
     = ∫ q_j ln p̃(X, Z_j) dZ_j − ∫ q_j ln q_j dZ_j + const
where
ln p̃(X, Z_j) = E_{i≠j}[ln p(X, Z)] + c = ∫ ln p(X, Z) ∏_{i≠j} q_i dZ_i + c
Factorized distributions
The optimal solution is now
ln q*_j(Z_j) = E_{i≠j}[ln p(X, Z)] + const
i.e. each factor is set to the expected log joint under all the other factors, which maximizes L(q) with respect to q_j
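A minimal sketch of this update (illustrative, not from the slides; the joint table `log_p` is made up): coordinate ascent over two discrete factors, where each step applies ln q*_j = E_{i≠j}[ln p] + const and can only increase L(q):

```python
import numpy as np

rng = np.random.default_rng(0)
# Unnormalized joint p(X, Z1, Z2) for one fixed X: a 3x4 table over (z1, z2).
log_p = np.log(rng.random((3, 4)) + 0.1)

q1 = np.full(3, 1 / 3)                   # initial factors q1(z1), q2(z2)
q2 = np.full(4, 1 / 4)

def elbo(q1, q2, log_p):
    q = np.outer(q1, q2)                 # factorized q(z1, z2)
    return np.sum(q * (log_p - np.log(q)))

bounds = []
for _ in range(20):
    # ln q1*(z1) = E_{q2}[ln p(X, Z)] + const, then normalize
    log_q1 = log_p @ q2
    q1 = np.exp(log_q1 - log_q1.max()); q1 /= q1.sum()
    # ln q2*(z2) = E_{q1}[ln p(X, Z)] + const, then normalize
    log_q2 = q1 @ log_p
    q2 = np.exp(log_q2 - log_q2.max()); q2 /= q2.sum()
    bounds.append(elbo(q1, q2, log_p))

# Each coordinate update can only increase the bound L(q).
assert all(b2 >= b1 - 1e-9 for b1, b2 in zip(bounds, bounds[1:]))
```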
Variational Mixture of Gaussians
We encounter mixtures of Gaussians all the time
Examples are multi-wall modelling, ambiguous localization, ...
We have:
a set of observed data X,
a set of latent variables, Z that describe the mixture
Mixture of Gaussians - Modelling
We can model the mixture assignments as
p(Z|π) = ∏_{n=1}^{N} ∏_{k=1}^{K} π_k^{z_nk}
We can also derive the observed conditional
p(X|Z, µ, Λ) = ∏_{n=1}^{N} ∏_{k=1}^{K} N(x_n|µ_k, Λ_k^{-1})^{z_nk}
We will for now assume that the mixture weights follow a Dirichlet prior
p(π) = Dir(π|α_0) = C(α_0) ∏_{k=1}^{K} π_k^{α_0 − 1}
Mixture of Gaussians - Modelling
The component parameters can be modelled as a Gaussian-Wishart
p(µ, Λ) = p(µ|Λ)p(Λ) = ∏_{k=1}^{K} N(µ_k|m_0, (β_0 Λ_k)^{-1}) W(Λ_k|W_0, ν_0)
i.e. a total model with the graphical structure: observations x_n and latent assignments z_n in a plate over N, governed by the parameters π, µ, and Λ
Mixtures of Gaussians - Variational
The full joint model can be decomposed as
p(X, Z, π, µ, Λ) = p(X|Z, µ, Λ) p(Z|π) p(π) p(µ|Λ) p(Λ)
Only X is observed
We can now consider the selection of a variational distribution
q(Z, π, µ, Λ) = q(Z) q(π, µ, Λ)
this is clearly an assumption of independence.
We can use the general result of component-wise optimization
ln q*(Z) = E_{π,µ,Λ}[ln p(X, Z, π, µ, Λ)] + const
Decomposition gives us
ln q*(Z) = E_π[ln p(Z|π)] + E_{µ,Λ}[ln p(X|Z, µ, Λ)] + const
         = ∑_{n=1}^{N} ∑_{k=1}^{K} z_nk ln ρ_nk + const
Mixtures of Gaussians - Variational
We can further derive
ln ρ_nk = E[ln π_k] + (1/2) E[ln |Λ_k|] − (D/2) ln 2π − (1/2) E_{µ_k,Λ_k}[(x_n − µ_k)^T Λ_k (x_n − µ_k)] + c
Taking the exponential we have
q*(Z) ∝ ∏_{k=1}^{K} ∏_{n=1}^{N} ρ_nk^{z_nk}
Using normalization we arrive at
q*(Z) = ∏_{k=1}^{K} ∏_{n=1}^{N} r_nk^{z_nk}
where
r_nk = ρ_nk / ∑_j ρ_nj
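In practice ρ_nk is computed in log space, so the normalization above is a row-wise softmax; a small sketch (function name and numbers are mine, for illustration):

```python
import numpy as np

def responsibilities(log_rho):
    """r_nk = rho_nk / sum_j rho_nj, computed stably in log space."""
    log_rho = log_rho - log_rho.max(axis=1, keepdims=True)  # avoid overflow
    rho = np.exp(log_rho)
    return rho / rho.sum(axis=1, keepdims=True)

log_rho = np.array([[-1.0, -2.0, -3.0],
                    [-0.5, -0.5, -5.0]])
r = responsibilities(log_rho)
assert np.allclose(r.sum(axis=1), 1.0)   # each row sums to one
```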
Mixtures of Gaussians - Variational
Just as we saw for EM we can define
N_k = ∑_{n=1}^{N} r_nk
x̄_k = (1/N_k) ∑_{n=1}^{N} r_nk x_n
S_k = (1/N_k) ∑_{n=1}^{N} r_nk (x_n − x̄_k)(x_n − x̄_k)^T
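These statistics can be computed directly from the responsibility matrix; a sketch, with made-up data just for the check:

```python
import numpy as np

def mixture_statistics(X, R):
    """N_k, xbar_k, S_k from data X (N x D) and responsibilities R (N x K)."""
    Nk = R.sum(axis=0)                       # (K,) effective counts
    xbar = (R.T @ X) / Nk[:, None]           # (K, D) weighted means
    K, D = len(Nk), X.shape[1]
    S = np.zeros((K, D, D))
    for k in range(K):
        diff = X - xbar[k]                   # (N, D) centered data
        S[k] = (R[:, k, None] * diff).T @ diff / Nk[k]
    return Nk, xbar, S

X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]])
R = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # hard assignments
Nk, xbar, S = mixture_statistics(X, R)
assert np.allclose(Nk, [2.0, 1.0])
assert np.allclose(xbar[0], [1.0, 0.0])
```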
Mixtures of Gaussians - Parameters/Mixture
Let's now consider q(π, µ, Λ) to arrive at
ln q*(π, µ, Λ) = ln p(π) + ∑_{k=1}^{K} ln p(µ_k, Λ_k) + E_Z[ln p(Z|π)] + ∑_{k=1}^{K} ∑_{n=1}^{N} E[z_nk] ln N(x_n|µ_k, Λ_k^{-1}) + c
We can partition the problem into
q(π, µ, Λ) = q(π) ∏_{k=1}^{K} q(µ_k, Λ_k)
We can derive
ln q*(π) = (α_0 − 1) ∑_{k=1}^{K} ln π_k + ∑_{k=1}^{K} ∑_{n=1}^{N} r_nk ln π_k + c
so that
q*(π) = Dir(π|α)   where   α_k = α_0 + N_k
Mixtures of Gaussians - Parameters/Mixture
We can then derive
q*(µ_k, Λ_k) = N(µ_k|m_k, (β_k Λ_k)^{-1}) W(Λ_k|W_k, ν_k)
where
β_k = β_0 + N_k
m_k = (1/β_k)(β_0 m_0 + N_k x̄_k)
W_k^{-1} = W_0^{-1} + N_k S_k + (β_0 N_k / (β_0 + N_k)) (x̄_k − m_0)(x̄_k − m_0)^T
ν_k = ν_0 + N_k + 1
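A sketch of these four updates for a single component k (function name and the example values are mine, purely illustrative):

```python
import numpy as np

def gauss_wishart_update(Nk, xbar_k, S_k, beta0, m0, W0_inv, nu0):
    """Gaussian-Wishart hyperparameter updates for one mixture component."""
    beta_k = beta0 + Nk
    m_k = (beta0 * m0 + Nk * xbar_k) / beta_k
    d = xbar_k - m0
    W_k_inv = W0_inv + Nk * S_k + (beta0 * Nk / (beta0 + Nk)) * np.outer(d, d)
    nu_k = nu0 + Nk + 1                  # as stated on the slide
    return beta_k, m_k, W_k_inv, nu_k

# Made-up sufficient statistics and prior for a 2-D component:
beta_k, m_k, W_k_inv, nu_k = gauss_wishart_update(
    Nk=4.0, xbar_k=np.array([2.0, 0.0]), S_k=0.5 * np.eye(2),
    beta0=1.0, m0=np.zeros(2), W0_inv=np.eye(2), nu0=2.0)
assert beta_k == 5.0 and np.allclose(m_k, [1.6, 0.0])
```

Note how m_k interpolates between the prior mean m_0 and the weighted sample mean x̄_k, with weights β_0 and N_k.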
Mixtures of Gaussians - Parameters
We can now evaluate the required expectations
E_{µ_k,Λ_k}[(x_n − µ_k)^T Λ_k (x_n − µ_k)] = D β_k^{-1} + ν_k (x_n − m_k)^T W_k (x_n − m_k)
ln Λ̃_k = E[ln |Λ_k|] = ∑_{i=1}^{D} ψ((ν_k + 1 − i)/2) + D ln 2 + ln |W_k|
ln π̃_k = E[ln π_k] = ψ(α_k) − ψ(α̂),   with α̂ = ∑_k α_k
Here ψ(·) is defined as d/da ln Γ(a), also known as the digamma function. The last two results are given by the Gaussian-Wishart distribution.
Mixtures of Gaussians - Parameters
We can finally find the responsibilities
r_nk ∝ π_k |Λ_k|^{1/2} exp{ −(1/2)(x_n − µ_k)^T Λ_k (x_n − µ_k) }
evaluated using the expectations from the previous slide
The optimization is stepwise
1 Estimate µ, Λ and then rnk
2 Estimate π and Z
3 Check for convergence - return to 1 if not converged
Mixture of Gaussians - Example
[Figure: variational mixture-of-Gaussians fit after 0, 15, 60, and 120 iterations]
MoG - Variational Lower Bound
We can evaluate the lower bound to monitor the fit
L = E[ln p(X|Z, µ, Λ)] + E[ln p(Z|π)] + E[ln p(π)] + E[ln p(µ, Λ)] − E[ln q(Z)] − E[ln q(π)] − E[ln q(µ, Λ)]
E[ln p(X|Z, µ, Λ)] = (1/2) ∑_k N_k { ln Λ̃_k − D β_k^{-1} − ν_k Tr(S_k W_k) − ν_k (x̄_k − m_k)^T W_k (x̄_k − m_k) − D ln 2π }
E[ln p(Z|π)] = ∑_n ∑_k r_nk ln π̃_k
E[ln p(π)] = ln C(α_0) + (α_0 − 1) ∑_k ln π̃_k
... (remaining terms in the book)
Exponential Family Distribution
Recall from the 3rd lecture:
Exponential family
p(x|η) = h(x) g(η) exp{ η^T u(x) }
where η represents the "natural parameters"
g(η) is the normalization "factor"
u(x) is some general function of the data
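As a concrete instance (my example, not from the slides): the Bernoulli distribution fits this form with u(x) = x, h(x) = 1, η = ln(µ/(1−µ)) and g(η) = 1/(1 + e^η):

```python
import numpy as np

def bernoulli_exp_family(x, eta):
    """p(x|eta) = h(x) g(eta) exp(eta * u(x)) with u(x) = x, h(x) = 1,
    g(eta) = 1 / (1 + exp(eta)), eta = ln(mu / (1 - mu))."""
    g = 1.0 / (1.0 + np.exp(eta))        # normalization factor g(eta)
    return g * np.exp(eta * x)

mu = 0.7
eta = np.log(mu / (1 - mu))              # natural parameter (log-odds)
assert np.isclose(bernoulli_exp_family(1, eta), mu)
assert np.isclose(bernoulli_exp_family(0, eta), 1 - mu)
```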
Exponential Family Distribution
The joint distribution for observed and latent variables is then
p(X, Z|η) = ∏_{n=1}^{N} h(x_n, z_n) g(η) exp{ η^T u(x_n, z_n) }
The conjugate prior for η is then
p(η|ν_0, χ_0) = f(ν_0, χ_0) g(η)^{ν_0} exp{ ν_0 η^T χ_0 }
where ν_0 is the prior number of observations and χ_0 the prior sufficient statistics (moments)
Exponential Family Distribution - Variational
As before we can compute
ln q*(Z) = E_η[ln p(X, Z|η)] + const
         = ∑_n { ln h(x_n, z_n) + E[η^T] u(x_n, z_n) } + const
i.e. a sum of independent terms
Taking the exponential on both sides we have
q*(z_n) = h(x_n, z_n) g(E[η]) exp{ E[η^T] u(x_n, z_n) }
Exponential Family Distribution - Variational
Similarly the natural parameters can be optimized by
ln q*(η) = ln p(η|ν_0, χ_0) + E_Z[ln p(X, Z|η)] + const
which expands to
ln q*(η) = ν_0 ln g(η) + ν_0 η^T χ_0 + ∑_n { ln g(η) + η^T E_{z_n}[u(x_n, z_n)] } + const
Using the trick of exponentials on both sides we have
q*(η) = f(ν_N, χ_N) g(η)^{ν_N} exp{ η^T χ_N }
where
ν_N = ν_0 + N,   χ_N = ν_0 χ_0 + ∑_n E_{z_n}[u(x_n, z_n)]
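For a fully observed Bernoulli model (my illustrative special case: with no latent variables, E_{z_n}[u(x_n, z_n)] reduces to x_n) the update is plain pseudo-count accumulation:

```python
import numpy as np

# Fully observed Bernoulli: u(x_n) = x_n, so the conjugate update is just
# nu_N = nu_0 + N and chi_N = nu_0 * chi_0 + sum_n x_n.
x = np.array([1, 0, 1, 1, 0, 1])          # N = 6 observations, 4 successes
nu0, chi0 = 2.0, 0.5                      # prior: 2 pseudo-observations, mean 0.5
nu_N = nu0 + len(x)                       # 8.0
chi_N = nu0 * chi0 + x.sum()              # 1.0 + 4 = 5.0
posterior_mean = chi_N / nu_N             # 0.625, shrunk toward the prior mean
assert nu_N == 8.0 and np.isclose(posterior_mean, 0.625)
```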
Exponential Family Distribution - Variational
As expected the solution is iterative
q*(z_n) and q*(η) are coupled.
In the E step compute the sufficient statistics E[u(x_n, z_n)] and use them to update q*(η)
In the M step use the resulting E[η^T] to update q*(z_n)
Expectation Propagation
Fundamentally we are trying to match distributions to the data and match up the natural parameters, i.e. find the "best" family of distributions and at the same time fit its parameters.
In the end we are trying to minimize the Kullback-Leibler (KL) divergence with respect to q(z)
Consider for a minute KL(p||q) where p(z) is fixed and q(z) is a member of the exponential family
q(z) = h(z) g(η) exp{ η^T u(z) }
Expectation Propagation - Optimization
The Kullback-Leibler divergence is then
KL(p||q) = − ln g(η) − η^T E_{p(z)}[u(z)] + const
The extremum is then given by
−∇ ln g(η) = E_{p(z)}[u(z)]
i.e. the best estimate is to match q(z) to p(z) by setting the "natural parameters" so that the sufficient statistics agree (moment matching).
E.g. q(z) = N(z|µ, Σ) with µ, Σ set to the mean and covariance of p(z)
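A numerical illustration (the mixture parameters are made up): minimizing KL(p||q) for a Gaussian q over a fixed 1-D mixture p reduces to matching the first two moments of p:

```python
import numpy as np

# Fixed p(z): a two-component 1-D Gaussian mixture; q(z) = N(mu, sigma^2).
# Minimizing KL(p||q) over (mu, sigma) just matches E_p[z] and Var_p[z].
z = np.linspace(-20, 20, 200001)
dz = z[1] - z[0]

def normal(z, m, s):
    return np.exp(-0.5 * ((z - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

p = 0.3 * normal(z, -2.0, 1.0) + 0.7 * normal(z, 3.0, 2.0)

mu = np.sum(z * p) * dz                    # E_p[z] = 0.3*(-2) + 0.7*3 = 1.5
var = np.sum((z - mu) ** 2 * p) * dz       # matched second central moment

assert np.isclose(mu, 1.5, atol=1e-6)
# Mixture variance: sum_k w_k (s_k^2 + m_k^2) - mu^2 = 8.35
assert np.isclose(var, 0.3 * (1 + 4) + 0.7 * (4 + 9) - 1.5 ** 2, atol=1e-5)
```

Note the matched Gaussian covers both modes of p, the characteristic behavior of minimizing KL(p||q) rather than KL(q||p).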
Expectation Propagation - Modelling
Consider a model with factorized probabilities
p(D, θ) = ∏_i f_i(θ)
where f_i(θ) = p(x_i|θ), and there might be a prior f_0(θ) = p(θ).
The posterior is then
p(θ|D) = (1/p(D)) ∏_i f_i(θ)
The model evidence is given by
p(D) = ∫ ∏_i f_i(θ) dθ
Expectation Propagation - Computing
The approximation is then
q(θ) = (1/Z) ∏_i f̃_i(θ)
q(θ) is factorized so that each term f̃_i(θ) can be optimized in turn
Through optimization factor-by-factor it is possible to generate an estimate - take one factor out, optimize it, and put it back
Expectation Propagation - Algorithm
Initialize the factor approximations f̃_i(θ)
Initialize the posterior estimate q(θ) ∝ ∏_i f̃_i(θ)
Iterate:
1 Choose a factor f̃_j(θ) to refine
2 Remove f̃_j(θ) from the posterior to form the cavity q^{\j}(θ) = q(θ) / f̃_j(θ)
3 Evaluate the new posterior / sufficient statistics (moment matching)
4 Update the factor f̃_j(θ)
5 Evaluate the approximation
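The loop above can be sketched for a degenerate but instructive case (my construction, not from the slides): every true factor is already Gaussian, so the moment-matching step is exact and a single pass recovers the exact posterior. Working in natural parameters (precision, precision×mean) makes the cavity step a subtraction:

```python
import numpy as np

# EP sketch where each factor f_i(theta) = N(obs_i | theta, 1) is Gaussian,
# so matching the moments of the tilted distribution is exact.
obs = np.array([1.0, 2.0, 0.5])
# Site approximations as (precision, precision*mean); site 0 is the prior
# N(theta | 0, 100), the likelihood sites start flat.
f_site = [(0.01, 0.0)] + [(0.0, 0.0)] * len(obs)

def q_params(sites):
    """Posterior q is the product of all sites: natural parameters add."""
    return sum(s[0] for s in sites), sum(s[1] for s in sites)

for j, x in enumerate(obs, start=1):
    lam_q, h_q = q_params(f_site)
    cav = (lam_q - f_site[j][0], h_q - f_site[j][1])   # remove site j (cavity)
    # Tilted = cavity * true factor; both Gaussian here, so match exactly:
    tilted = (cav[0] + 1.0, cav[1] + x)                # add precision 1, mean x
    f_site[j] = (tilted[0] - cav[0], tilted[1] - cav[1])  # refined site

lam, h = q_params(f_site)
# Exact posterior: precision 0.01 + 3, mean sum(obs) / 3.01
assert np.isclose(lam, 3.01) and np.isclose(h / lam, 3.5 / 3.01)
```

With non-Gaussian factors (as in the clutter example that follows) the tilted moments must be computed by integration, and several passes are needed.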
Expectation Propagation - Example
[Figure: observations x generated around θ together with background clutter]
p(x|θ) = (1 − w) N(x|θ, I) + w N(x|0, aI)
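A 1-D version of this clutter likelihood is easy to evaluate (the values of w and a are my choices; the slide's I and aI become scalar variances):

```python
import numpy as np

def clutter_density(x, theta, w=0.2, a=10.0):
    """1-D version of p(x|theta) = (1-w) N(x|theta, 1) + w N(x|0, a)."""
    def normal(x, m, var):
        return np.exp(-0.5 * (x - m) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return (1 - w) * normal(x, theta, 1.0) + w * normal(x, 0.0, a)

x = np.linspace(-30, 30, 60001)
p = clutter_density(x, theta=2.0)
# Sanity check: the mixture is a valid density.
assert np.isclose(p.sum() * (x[1] - x[0]), 1.0, atol=1e-6)
```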
Expectation Propagation - Example
[Figure: the resulting approximation of the posterior over θ, shown in two panels]
Summary
Computation of the complete model is often a challenge
Two ways to approximate computations
Deterministic Approximations
Sampling Based Methods
Many tricks for approximation
Factorization is typically a first strategy
Iterative optimization of factors
Next time we will talk about sampling based methods