A Note on the Derivation of the Variational Inference Updates for
DILN [2]
Tomonari MASADA @ Nagasaki University
August 30, 2013
1
Let M, Nm, and T be the number of documents, the number of word tokens appearing in the mth document, and the truncation level, respectively. Xmn denotes the word appearing as the nth token of the mth document, and Cmn denotes the latent topic of the nth token of the mth document. The definitions of the other symbols can be found in the original paper [2].
The joint distribution can be written as follows:
\[
p(X, Z, C, w, \eta, V, \alpha, \beta, m, K) = p(X|C,\eta)\,p(Z|V,w,\beta)\,p(C|Z)\,p(w|m,K)\,p(\eta)\,p(V|\alpha)\,p(\alpha)\,p(\beta)\,p(m)\,p(K). \tag{1}
\]
A lower bound of the log evidence can be obtained by using Jensen’s inequality as follows:
\begin{align*}
\ln p(X) &= \ln \int \sum_C p(X, Z, C, w, \eta, V, \alpha, \beta, m, K)\, dZ\,dw\,d\eta\,dV\,d\alpha\,d\beta\,dm\,dK \\
&= \ln \int \sum_C q(Z)q(C)q(w)q(\eta)q(V)q(\alpha)q(\beta)q(m)q(K) \\
&\qquad\quad \cdot \frac{p(X|C,\eta)p(Z|V,w,\beta)p(C|Z)p(w|m,K)p(\eta)p(V|\alpha)p(\alpha)p(\beta)p(m)p(K)}{q(Z)q(C)q(w)q(\eta)q(V)q(\alpha)q(\beta)q(m)q(K)}\, dZ\,dw\,d\eta\,dV\,d\alpha\,d\beta\,dm\,dK \\
&\geq \int \sum_C q(Z)q(C)q(w)q(\eta)q(V)q(\alpha)q(\beta)q(m)q(K) \\
&\qquad\quad \cdot \ln \frac{p(X|C,\eta)p(Z|V,w,\beta)p(C|Z)p(w|m,K)p(\eta)p(V|\alpha)p(\alpha)p(\beta)p(m)p(K)}{q(Z)q(C)q(w)q(\eta)q(V)q(\alpha)q(\beta)q(m)q(K)}\, dZ\,dw\,d\eta\,dV\,d\alpha\,d\beta\,dm\,dK \\
&= \int \sum_C q(C)q(\eta)\ln p(X|C,\eta)\,d\eta + \int q(Z)q(V)q(w)q(\beta)\ln p(Z|V,w,\beta)\,dZ\,dV\,dw\,d\beta \\
&\quad + \int \sum_C q(C)q(Z)\ln p(C|Z)\,dZ + \int q(w)q(m)q(K)\ln p(w|m,K)\,dw\,dm\,dK \\
&\quad + \int q(\eta)\ln p(\eta)\,d\eta + \int q(V)\ln p(V|\alpha)\,dV + \int q(\alpha)\ln p(\alpha)\,d\alpha \\
&\quad + \int q(\beta)\ln p(\beta)\,d\beta + \int q(m)\ln p(m)\,dm + \int q(K)\ln p(K)\,dK \\
&\quad - \int q(Z)\ln q(Z)\,dZ - \sum_C q(C)\ln q(C) - \int q(w)\ln q(w)\,dw \\
&\quad - \int q(\eta)\ln q(\eta)\,d\eta - \int q(V)\ln q(V)\,dV - \int q(\alpha)\ln q(\alpha)\,d\alpha \\
&\quad - \int q(\beta)\ln q(\beta)\,d\beta - \int q(m)\ln q(m)\,dm - \int q(K)\ln q(K)\,dK. \tag{2}
\end{align*}
Since q(V ) = δV , q(m) = δm, q(K) = δK, q(α) = δα, q(β) = δβ, we can rewrite the right hand side
of Eq. (2) as follows:
\begin{align*}
\ln p(X) &\geq \int \sum_C q(C)q(\eta)\ln p(X|C,\eta)\,d\eta + \int q(Z)q(w)\ln p(Z|V,w,\beta)\,dZ\,dw \\
&\quad + \int \sum_C q(C)q(Z)\ln p(C|Z)\,dZ + \int q(w)\ln p(w|m,K)\,dw + \int q(\eta)\ln p(\eta)\,d\eta + \ln p(V|\alpha) \\
&\quad + \ln p(\alpha) + \ln p(\beta) + \ln p(m) + \ln p(K) \\
&\quad - \int q(Z)\ln q(Z)\,dZ - \sum_C q(C)\ln q(C) - \int q(w)\ln q(w)\,dw - \int q(\eta)\ln q(\eta)\,d\eta. \tag{3}
\end{align*}
2
We examine each term of the right hand side of Eq. (3).
\begin{align*}
\int \sum_C q(C)q(\eta)\ln p(X|C,\eta)\,d\eta
&= \sum_{m=1}^M \sum_{n=1}^{N_m} \sum_{k=1}^T \phi_{mnk} \int \frac{\Gamma(\sum_d \gamma'_{kd})}{\prod_d \Gamma(\gamma'_{kd})} \prod_{d=1}^D \eta_{kd}^{\gamma'_{kd}-1} \ln \eta_{kX_{mn}}\, d\eta_k \\
&= \sum_{m=1}^M \sum_{n=1}^{N_m} \sum_{k=1}^T \phi_{mnk} \big\{ \psi(\gamma'_{kX_{mn}}) - \psi(\gamma'_k) \big\}, \tag{4}
\end{align*}
where $\gamma'_k \equiv \sum_{d=1}^D \gamma'_{kd}$.
\begin{align*}
&\int q(Z)q(w)\ln p(Z|V,w,\beta)\,dZ\,dw \\
&= \sum_m \sum_k \int q(Z_{mk})q(w_{mk}) \ln \bigg\{ \frac{(e^{-w_{mk}})^{\beta p_k}}{\Gamma(\beta p_k)}\, Z_{mk}^{\beta p_k - 1}\, e^{-e^{-w_{mk}} Z_{mk}} \bigg\}\, dZ_{mk}\,dw_{mk} \\
&= -\sum_k \beta p_k \sum_m \int q(w_{mk})\, w_{mk}\,dw_{mk} - \sum_k \ln\Gamma(\beta p_k) \\
&\quad + \sum_k (\beta p_k - 1) \sum_m \int q(Z_{mk})\ln Z_{mk}\,dZ_{mk} - \sum_m \sum_k \int q(Z_{mk})q(w_{mk})\, e^{-w_{mk}} Z_{mk}\,dZ_{mk}\,dw_{mk}, \tag{5}
\end{align*}
where
\begin{align*}
\int q(w_{mk})\, e^{-w_{mk}}\,dw_{mk}
&= \int \frac{1}{\sqrt{2\pi v_{mk}}} \exp\bigg\{ -\frac{(w_{mk}-\mu_{mk})^2}{2 v_{mk}} - w_{mk} \bigg\}\, dw_{mk} \\
&= \int \frac{1}{\sqrt{2\pi v_{mk}}} \exp\bigg( -\frac{w_{mk}^2 - 2\mu_{mk} w_{mk} + 2 v_{mk} w_{mk} + \mu_{mk}^2}{2 v_{mk}} \bigg)\, dw_{mk} \\
&= \int \frac{1}{\sqrt{2\pi v_{mk}}} \exp\bigg\{ -\frac{(w_{mk}-\mu_{mk}+v_{mk})^2}{2 v_{mk}} - \mu_{mk} + \frac{v_{mk}}{2} \bigg\}\, dw_{mk}
= \exp\Big( -\mu_{mk} + \frac{v_{mk}}{2} \Big). \tag{6}
\end{align*}
Note that $v_{mk}$ is a variance. Consequently, we have
\begin{align*}
\int q(Z)q(w)\ln p(Z|V,w,\beta)\,dZ\,dw
&= -\sum_k \beta p_k \sum_m \mu_{mk} - \sum_k \ln\Gamma(\beta p_k) + \sum_k (\beta p_k - 1) \sum_m \big\{ \psi(a_{mk}) - \ln b_{mk} \big\} \\
&\quad - \sum_m \sum_k \frac{a_{mk}}{b_{mk}} \exp\Big( -\mu_{mk} + \frac{v_{mk}}{2} \Big). \tag{7}
\end{align*}
Note that $p_k \equiv V_k \prod_{j=1}^{k-1}(1-V_j)$.
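The closed form in Eq. (6), $\mathbb{E}_q[e^{-w_{mk}}] = \exp(-\mu_{mk} + v_{mk}/2)$, can be sanity-checked by Monte Carlo; the values of $\mu$ and $v$ below are arbitrary toy values, not parameters from a real run.

```python
import numpy as np

# Arbitrary toy values for one (m, k) pair.
mu, v = 0.3, 0.5

# Draw w ~ q(w) = N(mu, v) and average e^{-w}.
rng = np.random.default_rng(0)
w = rng.normal(mu, np.sqrt(v), size=1_000_000)
mc = np.exp(-w).mean()            # Monte Carlo estimate of E_q[e^{-w}]

closed = np.exp(-mu + v / 2)      # closed form from Eq. (6)
```

The two values agree to Monte Carlo accuracy, confirming that the mean of a log-normal variable $e^{-w}$ is $\exp(-\mu + v/2)$.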
\begin{align*}
\int \sum_C q(C)q(Z)\ln p(C|Z)\,dZ
&= \sum_m \sum_n \int q(Z_m) \sum_k \phi_{mnk} \ln \frac{Z_{mk}}{\sum_{j=1}^T Z_{mj}}\, dZ_m \\
&= \sum_m \sum_k \Big( \sum_n \phi_{mnk} \Big) \int q(Z_{mk}) \ln Z_{mk}\, dZ_{mk} - \sum_m N_m \int q(Z_m) \ln\Big( \sum_{j=1}^T Z_{mj} \Big)\, dZ_m. \tag{8}
\end{align*}
Since $\ln x \leq \frac{x}{\xi} - 1 + \ln\xi$ for any $\xi > 0$,
\[
\int q(Z_m) \ln\Big( \sum_{j=1}^T Z_{mj} \Big)\, dZ_m \leq \int q(Z_m) \bigg( \frac{\sum_j Z_{mj}}{\xi_m} - 1 + \ln\xi_m \bigg)\, dZ_m = \frac{1}{\xi_m} \sum_k \frac{a_{mk}}{b_{mk}} - 1 + \ln\xi_m. \tag{9}
\]
Therefore,
\[
\int \sum_C q(C)q(Z)\ln p(C|Z)\,dZ = \sum_m \sum_k \Big( \sum_n \phi_{mnk} \Big) \big\{ \psi(a_{mk}) - \ln b_{mk} \big\} - \sum_m \frac{N_m}{\xi_m} \sum_k \frac{a_{mk}}{b_{mk}} + \sum_m N_m - \sum_m N_m \ln\xi_m. \tag{10}
\]
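The first-order bound $\ln x \leq x/\xi - 1 + \ln\xi$ used in Eq. (9) is the tangent line of $\ln$ at $\xi$, tight at $x = \xi$; it can be checked numerically. The grid and the contact point below are arbitrary.

```python
import numpy as np

# Tangent bound on the concave function ln x:
#   ln x <= x/xi - 1 + ln xi  for any xi > 0, with equality at x = xi.
x = np.linspace(0.1, 10.0, 1000)
xi = 2.5                              # arbitrary positive contact point

bound = x / xi - 1 + np.log(xi)
gap = bound - np.log(x)               # nonnegative everywhere

gap_at_contact = xi / xi - 1 + np.log(xi) - np.log(xi)   # exactly zero
```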
\begin{align*}
\int q(w)\ln p(w|m,K)\,dw &= \sum_m \int q(w_m) \ln p(w_m|m,K)\,dw_m \\
&= \sum_m \bigg[ -\frac{D}{2}\ln 2\pi - \frac{1}{2}\ln|K| - \frac{1}{2}\int q(w_m)\,(w_m - m)^T K^{-1} (w_m - m)\,dw_m \bigg] \\
&= -\frac{MD\ln 2\pi}{2} - \frac{M\ln|K|}{2} - \frac{1}{2}\sum_m \bigg\{ \sum_k (\mu_{mk}^2 + v_{mk})\, K^{-1}_{k:k} - 2\sum_k m_k \mu_{mk}\, K^{-1}_{k:k} + \sum_k m_k^2\, K^{-1}_{k:k} \\
&\qquad + \sum_k \sum_{j \neq k} (\mu_{mk}\mu_{mj} - 2\mu_{mk} m_j + m_k m_j)\, K^{-1}_{k:j} \bigg\} \\
&= -\frac{MD\ln 2\pi}{2} - \frac{M\ln|K|}{2} - \frac{1}{2}\sum_m \bigg\{ \sum_k v_{mk}\, K^{-1}_{k:k} + \sum_k \sum_j (\mu_{mk} - m_k)(\mu_{mj} - m_j)\, K^{-1}_{k:j} \bigg\} \tag{11}
\end{align*}
\begin{align*}
\int q(\eta)\ln p(\eta)\,d\eta
&= \sum_k \int \frac{\Gamma(\sum_d \gamma'_{kd})}{\prod_d \Gamma(\gamma'_{kd})} \prod_{d=1}^D \eta_{kd}^{\gamma'_{kd}-1} \bigg\{ \ln\Gamma(D\gamma) - D\ln\Gamma(\gamma) + \sum_d (\gamma - 1)\ln\eta_{kd} \bigg\}\, d\eta_k \\
&= T\ln\Gamma(D\gamma) - TD\ln\Gamma(\gamma) + (\gamma - 1)\sum_k \sum_d \big\{ \psi(\gamma'_{kd}) - \psi(\gamma'_k) \big\} \tag{12}
\end{align*}
\[
\ln p(V|\alpha) = T\ln\Gamma(\alpha+1) - T\ln\Gamma(\alpha) + (\alpha - 1)\sum_k \ln(1 - V_k) \tag{13}
\]
\[
\int q(Z)\ln q(Z)\,dZ = -\sum_m \sum_k \big\{ \ln\Gamma(a_{mk}) - (a_{mk}-1)\psi(a_{mk}) - \ln b_{mk} + a_{mk} \big\} \tag{14}
\]
\[
\sum_C q(C)\ln q(C) = \sum_m \sum_n \sum_k \phi_{mnk} \ln\phi_{mnk} \tag{15}
\]
\[
\int q(w)\ln q(w)\,dw = -\frac{MT(1+\ln 2\pi)}{2} - \sum_m \sum_k \frac{\ln v_{mk}}{2} \tag{16}
\]
\[
\int q(\eta)\ln q(\eta)\,d\eta = \sum_k \bigg[ \sum_d (\gamma'_{kd} - 1)\big\{ \psi(\gamma'_{kd}) - \psi(\gamma'_k) \big\} + \ln\Gamma(\gamma'_k) - \sum_d \ln\Gamma(\gamma'_{kd}) \bigg] \tag{17}
\]
Consequently, we obtain a lower bound of the log evidence as follows:
\begin{align*}
\ln p(X) &\geq \sum_{m=1}^M \sum_{n=1}^{N_m} \sum_{k=1}^T \phi_{mnk}\big\{ \psi(\gamma'_{kX_{mn}}) - \psi(\gamma'_k) \big\} \\
&\quad - \sum_{k=1}^T \bigg\{ \beta V_k \prod_{j=1}^{k-1}(1-V_j) \bigg\} \sum_{m=1}^M \mu_{mk} - \sum_{k=1}^T \ln\Gamma\bigg( \beta V_k \prod_{j=1}^{k-1}(1-V_j) \bigg) \\
&\quad + \sum_{k=1}^T \bigg\{ \beta V_k \prod_{j=1}^{k-1}(1-V_j) - 1 \bigg\} \sum_{m=1}^M \big\{ \psi(a_{mk}) - \ln b_{mk} \big\} - \sum_{m=1}^M \sum_{k=1}^T \frac{a_{mk}}{b_{mk}} \exp\Big( -\mu_{mk} + \frac{v_{mk}}{2} \Big) \\
&\quad + \sum_{m=1}^M \sum_{k=1}^T \bigg( \sum_{n=1}^{N_m} \phi_{mnk} \bigg)\big\{ \psi(a_{mk}) - \ln b_{mk} \big\} - \sum_{m=1}^M \frac{N_m}{\xi_m} \sum_{k=1}^T \frac{a_{mk}}{b_{mk}} + \sum_{m=1}^M N_m - \sum_{m=1}^M N_m \ln\xi_m \\
&\quad - \frac{MD\ln 2\pi}{2} - \frac{M\ln|K|}{2} - \frac{1}{2}\sum_{m=1}^M \bigg\{ \sum_{k=1}^T v_{mk}\, K^{-1}_{k:k} + \sum_{k=1}^T \sum_{j=1}^T (\mu_{mk}-m_k)(\mu_{mj}-m_j)\, K^{-1}_{k:j} \bigg\} \\
&\quad + T\ln\Gamma(D\gamma) - TD\ln\Gamma(\gamma) + (\gamma-1)\sum_{k=1}^T \sum_{d=1}^D \big\{ \psi(\gamma'_{kd}) - \psi(\gamma'_k) \big\} \\
&\quad + T\ln\Gamma(\alpha+1) - T\ln\Gamma(\alpha) + (\alpha-1)\sum_{k=1}^T \ln(1-V_k) \\
&\quad + \sum_{m=1}^M \sum_{k=1}^T \big\{ \ln\Gamma(a_{mk}) - (a_{mk}-1)\psi(a_{mk}) - \ln b_{mk} + a_{mk} \big\} \\
&\quad - \sum_{m=1}^M \sum_{n=1}^{N_m} \sum_{k=1}^T \phi_{mnk}\ln\phi_{mnk} + \frac{MT(1+\ln 2\pi)}{2} + \sum_{m=1}^M \sum_{k=1}^T \frac{\ln v_{mk}}{2} \\
&\quad - \sum_{k=1}^T \bigg[ \sum_{d=1}^D (\gamma'_{kd}-1)\big\{ \psi(\gamma'_{kd}) - \psi(\gamma'_k) \big\} + \ln\Gamma(\gamma'_k) - \sum_{d=1}^D \ln\Gamma(\gamma'_{kd}) \bigg] \\
&\quad + \ln p(\alpha) + \ln p(\beta) + \ln p(m) + \ln p(K). \tag{18}
\end{align*}
We assume that p(m) and p(K) are uniform distributions and that p(α) and p(β) are Gamma distributions.
3 Inference Algorithm
3.1 Update q(Cmn)
Let $L$ denote the right hand side of Eq. (18).
\[
\frac{\partial L}{\partial \phi_{mnk}} = \psi(\gamma'_{kX_{mn}}) - \psi(\gamma'_k) + \psi(a_{mk}) - \ln b_{mk} - \ln\phi_{mnk} - 1
\]
\[
\therefore\ \phi_{mnk} \propto \exp\big\{ \psi(\gamma'_{kX_{mn}}) - \psi(\gamma'_k) + \psi(a_{mk}) - \ln b_{mk} \big\} \tag{19}
\]
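As a sketch, update (19) for a single token can be computed in log space and then normalized over topics. All the variational parameters below are toy values, not outputs of a real run.

```python
import numpy as np
from scipy.special import digamma

T, D = 4, 6                       # truncation level, vocabulary size
rng = np.random.default_rng(1)
gamma_p = rng.gamma(2.0, 1.0, size=(T, D)) + 1.0   # toy gamma'_{kd}
a = rng.gamma(2.0, 1.0, size=T) + 1.0              # toy a_{mk} for one m
b = rng.gamma(2.0, 1.0, size=T) + 1.0              # toy b_{mk} for one m
x = 3                                              # word id X_{mn}

# Unnormalized log phi from Eq. (19), over k for one token (m, n).
log_phi = digamma(gamma_p[:, x]) - digamma(gamma_p.sum(axis=1)) \
          + digamma(a) - np.log(b)
log_phi -= log_phi.max()          # stabilize before exponentiating
phi = np.exp(log_phi)
phi /= phi.sum()                  # normalize over topics
```

Subtracting the maximum before exponentiating avoids overflow without changing the normalized result.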
3.2 Update q(Zmk)
\[
\frac{\partial L}{\partial \xi_m} = \frac{N_m}{\xi_m^2} \sum_k \frac{a_{mk}}{b_{mk}} - \frac{N_m}{\xi_m}, \qquad \therefore\ \xi_m = \sum_k \frac{a_{mk}}{b_{mk}}. \tag{20}
\]
\begin{align*}
\frac{\partial L}{\partial b_{mk}} &= -\bigg\{ \beta V_k \prod_{j=1}^{k-1}(1-V_j) - 1 \bigg\}\frac{1}{b_{mk}} + \frac{a_{mk}}{b_{mk}^2}\exp\Big( -\mu_{mk} + \frac{v_{mk}}{2} \Big) \\
&\quad - \bigg( \sum_{n=1}^{N_m}\phi_{mnk} \bigg)\frac{1}{b_{mk}} + \frac{N_m}{\xi_m}\,\frac{a_{mk}}{b_{mk}^2} - \frac{1}{b_{mk}} \tag{21}
\end{align*}
$\frac{\partial L}{\partial b_{mk}} = 0$ gives
\[
0 = -b_{mk}\bigg\{ \beta V_k \prod_{j=1}^{k-1}(1-V_j) + \sum_{n=1}^{N_m}\phi_{mnk} \bigg\} + a_{mk}\bigg\{ \exp\Big( -\mu_{mk} + \frac{v_{mk}}{2} \Big) + \frac{N_m}{\xi_m} \bigg\}. \tag{22}
\]
Therefore,
\[
b_{mk} = a_{mk} \cdot \frac{\exp\big( -\mu_{mk} + \frac{v_{mk}}{2} \big) + \frac{N_m}{\xi_m}}{\beta V_k \prod_{j=1}^{k-1}(1-V_j) + \sum_{n=1}^{N_m}\phi_{mnk}}. \tag{23}
\]
\begin{align*}
\frac{\partial L}{\partial a_{mk}} &= \bigg\{ \beta V_k \prod_{j=1}^{k-1}(1-V_j) - 1 \bigg\}\psi'(a_{mk}) - \frac{1}{b_{mk}}\exp\Big( -\mu_{mk} + \frac{v_{mk}}{2} \Big) + \bigg( \sum_{n=1}^{N_m}\phi_{mnk} \bigg)\psi'(a_{mk}) \\
&\quad - \frac{N_m}{\xi_m}\,\frac{1}{b_{mk}} - (a_{mk}-1)\psi'(a_{mk}) + 1 \\
&= \bigg\{ \beta V_k \prod_{j=1}^{k-1}(1-V_j) + \sum_{n=1}^{N_m}\phi_{mnk} - a_{mk} \bigg\}\psi'(a_{mk}) - \frac{1}{b_{mk}}\bigg\{ \exp\Big( -\mu_{mk} + \frac{v_{mk}}{2} \Big) + \frac{N_m}{\xi_m} \bigg\} + 1 \tag{24}
\end{align*}
By using the result for $b_{mk}$, we obtain
\begin{align*}
\frac{\partial L}{\partial a_{mk}} &= \bigg\{ \beta V_k \prod_{j=1}^{k-1}(1-V_j) + \sum_{n=1}^{N_m}\phi_{mnk} - a_{mk} \bigg\}\psi'(a_{mk}) - \frac{\beta V_k \prod_{j=1}^{k-1}(1-V_j) + \sum_{n=1}^{N_m}\phi_{mnk}}{a_{mk}} + 1 \\
&= \bigg\{ \beta V_k \prod_{j=1}^{k-1}(1-V_j) + \sum_{n=1}^{N_m}\phi_{mnk} - a_{mk} \bigg\}\bigg\{ \psi'(a_{mk}) - \frac{1}{a_{mk}} \bigg\} \\
\therefore\ a_{mk} &= \beta V_k \prod_{j=1}^{k-1}(1-V_j) + \sum_{n=1}^{N_m}\phi_{mnk}, \qquad b_{mk} = \exp\Big( -\mu_{mk} + \frac{v_{mk}}{2} \Big) + \frac{N_m}{\xi_m}. \tag{25}
\end{align*}
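A minimal sketch of the closed-form updates (20) and (25) for one document $m$, with toy stand-ins for the other variational parameters: Eq. (25) gives $a_{mk}$ directly, while $\xi_m$ and $b_{mk}$ depend on each other, so we alternate them until they stabilize.

```python
import numpy as np

rng = np.random.default_rng(2)
T, N_m, beta = 5, 10, 2.0
V = rng.uniform(0.1, 0.5, size=T)
p = V * np.cumprod(np.concatenate(([1.0], 1 - V[:-1])))  # p_k = V_k prod_{j<k}(1 - V_j)
phi_sum = rng.dirichlet(np.ones(T)) * N_m                # toy sum_n phi_{mnk}
mu = rng.normal(0.0, 1.0, size=T)                        # toy mu_{mk}
v = rng.uniform(0.1, 1.0, size=T)                        # toy v_{mk}

a = beta * p + phi_sum                                   # Eq. (25): a_{mk}
b = np.ones(T)                                           # initial guess for b_{mk}
for _ in range(500):
    xi = np.sum(a / b)                                   # Eq. (20): xi_m
    b = np.exp(-mu + v / 2) + N_m / xi                   # Eq. (25): b_{mk}
```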
3.3 Update q(wmk)
\[
\frac{\partial L}{\partial \mu_{mk}} = \frac{a_{mk}}{b_{mk}}\exp\Big( -\mu_{mk} + \frac{v_{mk}}{2} \Big) - \bigg\{ \beta V_k \prod_{j=1}^{k-1}(1-V_j) \bigg\} - \sum_{j=1}^T (\mu_{mj} - m_j)\, K^{-1}_{k:j} \tag{26}
\]
\[
\frac{\partial L}{\partial v_{mk}} = \frac{1}{2}\bigg\{ -\frac{a_{mk}}{b_{mk}}\exp\Big( -\mu_{mk} + \frac{v_{mk}}{2} \Big) - K^{-1}_{k:k} + \frac{1}{v_{mk}} \bigg\} \tag{27}
\]
The plus and minus signs on the right hand side of the second line of Eq. (22) in the original paper differ from those given above. We can use L-BFGS to update $\mu_{mk}$ and $v_{mk}$.
3.4 Update q(ηk)
\begin{align*}
\frac{\partial L}{\partial \gamma'_{kd}}
&= \sum_m \sum_n I(X_{mn}=d)\,\phi_{mnk}\,\psi'(\gamma'_{kd}) - \sum_m \sum_n \phi_{mnk}\,\psi'(\gamma'_k) + (\gamma-1)\psi'(\gamma'_{kd}) - (\gamma-1)\sum_d \psi'(\gamma'_k) \\
&\quad - \psi(\gamma'_{kd}) + \psi(\gamma'_k) - (\gamma'_{kd}-1)\psi'(\gamma'_{kd}) + \sum_d (\gamma'_{kd}-1)\psi'(\gamma'_k) - \psi(\gamma'_k) + \psi(\gamma'_{kd}) \\
&= \sum_m \sum_n I(X_{mn}=d)\,\phi_{mnk}\,\psi'(\gamma'_{kd}) - \sum_m \sum_n \phi_{mnk}\,\psi'(\gamma'_k) + (\gamma-\gamma'_{kd})\psi'(\gamma'_{kd}) - \sum_d (\gamma-\gamma'_{kd})\psi'(\gamma'_k) \\
&= \psi'(\gamma'_{kd})\bigg\{ \sum_m \sum_n I(X_{mn}=d)\,\phi_{mnk} + \gamma - \gamma'_{kd} \bigg\} - \psi'(\gamma'_k)\sum_d \bigg\{ \sum_m \sum_n I(X_{mn}=d)\,\phi_{mnk} + \gamma - \gamma'_{kd} \bigg\} \\
\therefore\ \gamma'_{kd} &= \gamma + \sum_m \sum_n I(X_{mn}=d)\,\phi_{mnk} \tag{28}
\end{align*}
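Update (28) is simply the prior $\gamma$ plus the expected count of word $d$ assigned to topic $k$. A sketch over a toy corpus (random word ids and random $q(C_{mn})$, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
T, D, gamma = 4, 10, 0.1
# Toy corpus: 3 documents of random word ids, and toy phi_{mnk} rows.
X = [rng.integers(0, D, size=int(rng.integers(5, 15))) for _ in range(3)]
phi = [rng.dirichlet(np.ones(T), size=len(x)) for x in X]

# Eq. (28): gamma'_{kd} = gamma + sum_{m,n} I(X_{mn} = d) phi_{mnk}.
gamma_p = np.full((T, D), gamma)
for x_m, phi_m in zip(X, phi):
    for n, d in enumerate(x_m):
        gamma_p[:, d] += phi_m[n]
```

Since each row of $\phi$ sums to one, the total added mass equals the total number of tokens.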
3.5 Update q(Vk)
\begin{align*}
\frac{\partial L}{\partial V_k}
&= -\frac{\alpha-1}{1-V_k} - \beta\prod_{j=1}^{k-1}(1-V_j)\sum_{m=1}^M \big\{ \mu_{mk} - \psi(a_{mk}) + \ln b_{mk} \big\} \\
&\quad - \frac{1}{1-V_k}\sum_{\hat{k}=k+1}^T \bigg\{ \beta V_{\hat{k}}\prod_{j=1}^{\hat{k}-1}(1-V_j) \bigg\}\sum_{m=1}^M \big\{ \mu_{m\hat{k}} - \psi(a_{m\hat{k}}) + \ln b_{m\hat{k}} \big\} \\
&\quad - \beta\prod_{j=1}^{k-1}(1-V_j)\,\psi\bigg( \beta V_k\prod_{j=1}^{k-1}(1-V_j) \bigg) - \sum_{\hat{k}=k+1}^T \frac{1}{1-V_k}\,\beta V_{\hat{k}}\prod_{j=1}^{\hat{k}-1}(1-V_j)\,\psi\bigg( \beta V_{\hat{k}}\prod_{j=1}^{\hat{k}-1}(1-V_j) \bigg) \\
&= -\frac{\alpha-1}{1-V_k} - \beta\prod_{j=1}^{k-1}(1-V_j)\bigg[ \sum_{m=1}^M \big\{ \mu_{mk} - \psi(a_{mk}) + \ln b_{mk} \big\} + \psi\bigg( \beta V_k\prod_{j=1}^{k-1}(1-V_j) \bigg) \bigg] \\
&\quad - \beta\prod_{j=1}^{k-1}(1-V_j)\sum_{\hat{k}=k+1}^T \bigg\{ V_{\hat{k}}\prod_{j=k+1}^{\hat{k}-1}(1-V_j) \bigg\}\bigg[ \sum_{m=1}^M \big\{ \mu_{m\hat{k}} - \psi(a_{m\hat{k}}) + \ln b_{m\hat{k}} \big\} + \psi\bigg( \beta V_{\hat{k}}\prod_{j=1}^{\hat{k}-1}(1-V_j) \bigg) \bigg] \\
&= -\frac{\alpha-1}{1-V_k} - \frac{\beta p_k}{V_k}\bigg[ \sum_{m=1}^M \big\{ \mu_{mk} - \psi(a_{mk}) + \ln b_{mk} \big\} + \psi(\beta p_k) \bigg] \\
&\quad - \sum_{j=k+1}^T \frac{\beta p_j}{1-V_k}\bigg[ \sum_{m=1}^M \big\{ \mu_{mj} - \psi(a_{mj}) + \ln b_{mj} \big\} + \psi(\beta p_j) \bigg] \tag{29}
\end{align*}
I think that Vk on the second line of Eq. (24) in the original paper is not required.
3.6 Update q(K)
With respect to K, we maximize the following function:
\[
L(K) = -\frac{M}{2}\ln|K| - \frac{1}{2}\sum_{m=1}^M \sum_{k=1}^T v_{mk}\, K^{-1}_{k:k} - \frac{1}{2}\sum_{m=1}^M (\mu_m - m)^T K^{-1} (\mu_m - m), \tag{30}
\]
where $(\mu_m - m)^T K^{-1}(\mu_m - m) = \sum_{k=1}^T \sum_{j=1}^T (\mu_{mk}-m_k)(\mu_{mj}-m_j)\, K^{-1}_{k:j}$.
The derivative of the first term of the right hand side in Eq. (30) is obtained based on the following identity (cf. Eq. (51) of The Matrix Cookbook¹):
\[
\frac{\partial \ln|K|}{\partial K} = K^{-1}. \tag{31}
\]
For the second term of the right hand side in Eq. (30), it holds that $\sum_k v_{mk} K^{-1}_{k:k} = \mathrm{Tr}[K^{-1}\,\mathrm{diag}(v_m)]$, where $\mathrm{diag}(v_m)$ is a diagonal matrix whose $k$th diagonal entry is $v_{mk}$. By using the following identity (cf. Eq. (16) in Old and New Matrix Algebra Useful for Statistics²):
\[
\frac{\partial\, \mathrm{Tr}[A\Sigma^{-1}B]}{\partial \Sigma} = -\Sigma^{-1}BA\Sigma^{-1}, \tag{32}
\]
we obtain
\[
\frac{\partial \sum_m \sum_k v_{mk}\, K^{-1}_{k:k}}{\partial K} = -K^{-1}\Big\{ \sum_m \mathrm{diag}(v_m) \Big\} K^{-1}.
\]

¹http://orion.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf
²http://research.microsoft.com/en-us/um/people/minka/papers/matrix/minka-matrix.pdf
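Identity (31) can be sanity-checked by central finite differences on a random symmetric positive definite matrix; the matrix below is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(4, 4))
K = A @ A.T + 4 * np.eye(4)       # random symmetric positive definite matrix

# Central finite-difference approximation of d ln|K| / dK_{ij}.
eps = 1e-6
grad = np.empty_like(K)
for i in range(4):
    for j in range(4):
        E = np.zeros_like(K)
        E[i, j] = eps
        grad[i, j] = (np.linalg.slogdet(K + E)[1]
                      - np.linalg.slogdet(K - E)[1]) / (2 * eps)
```

For symmetric K, the result matches $K^{-1}$ entrywise, as Eq. (31) states.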
For the last term in Eq. (30), it holds that
\[
(\mu_m - m)^T K^{-1} (\mu_m - m) = \mathrm{Tr}\big[ (\mu_m - m)^T K^{-1} (\mu_m - m) \big]. \tag{33}
\]
Therefore, by using Eq. (32), we obtain
\[
\frac{\partial\, (\mu_m - m)^T K^{-1} (\mu_m - m)}{\partial K} = -K^{-1} (\mu_m - m)(\mu_m - m)^T K^{-1}.
\]
Consequently, we have
\[
\frac{\partial L(K)}{\partial K} = -\frac{M}{2}K^{-1} + \frac{1}{2}K^{-1}\Big\{ \sum_m \mathrm{diag}(v_m) \Big\} K^{-1} + \frac{1}{2}K^{-1}\sum_m \big\{ (\mu_m - m)(\mu_m - m)^T \big\} K^{-1}. \tag{34}
\]
$\frac{\partial L(K)}{\partial K} = 0$ holds when
\[
K^{-1} = \frac{1}{M} K^{-1} \sum_m \big\{ \mathrm{diag}(v_m) + (\mu_m - m)(\mu_m - m)^T \big\} K^{-1}. \tag{35}
\]
By multiplying both sides of the above equation by K from the left and from the right, we obtain
\[
K = \frac{1}{M}\sum_m \big\{ \mathrm{diag}(v_m) + (\mu_m - m)(\mu_m - m)^T \big\}. \tag{36}
\]
This derivation is identical to that of CTM [1].
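A sketch of the closed-form update (36), with toy variational parameters $\mu_m$ and $v_m$ and with $m$ set to the mean of the $\mu_m$:

```python
import numpy as np

rng = np.random.default_rng(5)
M, T = 30, 4
mu = rng.normal(size=(M, T))               # toy mu_m
v = rng.uniform(0.1, 1.0, size=(M, T))     # toy v_m (variances)
m = mu.mean(axis=0)                        # mean parameter

# Eq. (36): K = (1/M) sum_m { diag(v_m) + (mu_m - m)(mu_m - m)^T }.
diff = mu - m
K = (np.einsum('mk,mj->kj', diff, diff) + np.diag(v.sum(axis=0))) / M
```

The scatter part is positive semidefinite and the diagonal part is strictly positive, so the updated K is symmetric positive definite, as a covariance matrix must be.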
3.7 Update q(m)
\[
\frac{\partial L}{\partial m_k} = \sum_{m=1}^M \sum_{j=1}^T (\mu_{mj} - m_j)\, K^{-1}_{k:j}.
\]
Since $K^{-1}$ is invertible, setting this derivative to zero for all $k$ gives
\[
\therefore\ m_k = \frac{1}{M}\sum_{m=1}^M \mu_{mk} \tag{37}
\]
3.8 Update q(α)
With respect to α, we maximize the following function:
\[
L(\alpha) = T\ln\Gamma(\alpha+1) - T\ln\Gamma(\alpha) + (\alpha - 1)\sum_{k=1}^T \ln(1 - V_k) \tag{38}
\]
We use the following bound (cf. Eqs. (120), (121), and (122) in Estimating a Dirichlet distribution³):
\[
\frac{\Gamma(n+x)}{\Gamma(x)} \geq c\, x^a \quad \text{if } n \geq 1, \tag{39}
\]
\[
a = \big\{ \psi(n+\hat{x}) - \psi(\hat{x}) \big\}\hat{x}, \tag{40}
\]
\[
c = \frac{\Gamma(n+\hat{x})}{\Gamma(\hat{x})}\,\hat{x}^{-a}. \tag{41}
\]
Then we obtain:
\[
L(\alpha) \geq T\big\{ \psi(\hat{\alpha}+1) - \psi(\hat{\alpha}) \big\}\hat{\alpha}\ln\alpha + (\alpha - 1)\sum_{k=1}^T \ln(1 - V_k) + \text{const.} \tag{42}
\]
We maximize this lower bound, which we again denote by $L(\alpha)$.
\[
\frac{\partial L(\alpha)}{\partial \alpha} = \frac{1}{\alpha}\, T\big\{ \psi(\hat{\alpha}+1) - \psi(\hat{\alpha}) \big\}\hat{\alpha} + \sum_{k=1}^T \ln(1 - V_k) \tag{43}
\]
\[
\therefore\ \alpha = \hat{\alpha} \cdot \frac{T\big\{ \psi(\hat{\alpha}+1) - \psi(\hat{\alpha}) \big\}}{-\sum_{k=1}^T \ln(1 - V_k)} \tag{44}
\]

³http://research.microsoft.com/en-us/um/people/minka/papers/dirichlet/

This is a multiplicative update, where $\hat{\alpha}$ denotes the current value of α.
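A sketch of the multiplicative update (44), iterated to a fixed point; V below is a toy draw of stick-breaking proportions, not inferred values.

```python
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(6)
T = 20
V = rng.beta(1.0, 3.0, size=T)            # toy stick-breaking proportions
s = -np.log(1 - V).sum()                  # -sum_k ln(1 - V_k) > 0

alpha = 1.0
for _ in range(10):
    # Eq. (44): alpha_new = alpha * T {psi(alpha+1) - psi(alpha)} / s
    alpha = alpha * T * (digamma(alpha + 1) - digamma(alpha)) / s
```

Since $\psi(\alpha+1) - \psi(\alpha) = 1/\alpha$, the fixed point is $\alpha = T / (-\sum_k \ln(1-V_k))$, which is also the direct maximizer of Eq. (38) (as $\ln\Gamma(\alpha+1) - \ln\Gamma(\alpha) = \ln\alpha$).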
When we apply a Gamma prior $p(\alpha) = \frac{b_0^{a_0}}{\Gamma(a_0)}\alpha^{a_0-1}e^{-b_0\alpha}$ to α, we have the following result:
\[
\frac{\partial L(\alpha)}{\partial \alpha} = \frac{1}{\alpha}\, T\big\{ \psi(\hat{\alpha}+1) - \psi(\hat{\alpha}) \big\}\hat{\alpha} + \sum_{k=1}^T \ln(1 - V_k) + (a_0 - 1)\frac{1}{\alpha} - b_0 \tag{45}
\]
\[
\therefore\ \alpha = \frac{a_0 - 1 + \hat{\alpha}\, T\big\{ \psi(\hat{\alpha}+1) - \psi(\hat{\alpha}) \big\}}{b_0 - \sum_{k=1}^T \ln(1 - V_k)} \tag{46}
\]
3.9 Update q(β)
With respect to β, we maximize the following function $L(\beta)$:
\begin{align*}
L(\beta) &= -\sum_{k=1}^T \bigg\{ \beta V_k \prod_{j=1}^{k-1}(1-V_j) \bigg\}\sum_{m=1}^M \mu_{mk} - \sum_{k=1}^T \ln\Gamma\bigg( \beta V_k \prod_{j=1}^{k-1}(1-V_j) \bigg) \\
&\quad + \sum_{k=1}^T \bigg\{ \beta V_k \prod_{j=1}^{k-1}(1-V_j) \bigg\}\sum_{m=1}^M \big\{ \psi(a_{mk}) - \ln b_{mk} \big\} \\
&= -\sum_{k=1}^T \beta p_k \sum_{m=1}^M \mu_{mk} - \sum_{k=1}^T \ln\Gamma(\beta p_k) + \sum_{k=1}^T \beta p_k \sum_{m=1}^M \big\{ \psi(a_{mk}) - \ln b_{mk} \big\} \tag{47}
\end{align*}
The first and second derivatives are obtained as follows:
\[
\frac{\partial L(\beta)}{\partial \beta} = -\sum_{k=1}^T p_k \bigg[ \psi(\beta p_k) + \sum_{m=1}^M \big\{ \mu_{mk} - \psi(a_{mk}) + \ln b_{mk} \big\} \bigg], \qquad
\frac{\partial^2 L(\beta)}{\partial \beta^2} = -\sum_{k=1}^T p_k^2\, \psi'(\beta p_k) \tag{48}
\]
We can use Newton's method to update β.
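A sketch of the Newton iteration for β using the derivatives in Eq. (48); the variational parameters below are toy values standing in for $\mu$, $a$, $b$, and $V$. Since $\psi'(\beta p_k) > 0$, the second derivative is always negative, so $L(\beta)$ is concave and the iteration is well behaved once β is kept positive.

```python
import numpy as np
from scipy.special import digamma, polygamma

rng = np.random.default_rng(7)
T, M = 5, 10
V = rng.beta(1.0, 2.0, size=T)
p = V * np.cumprod(np.concatenate(([1.0], 1 - V[:-1])))  # p_k
mu = rng.normal(0.0, 0.1, size=(M, T))                   # toy mu_{mk}
a = rng.gamma(5.0, 1.0, size=(M, T))                     # toy a_{mk}
b = rng.gamma(5.0, 1.0, size=(M, T))                     # toy b_{mk}

inner = (mu - digamma(a) + np.log(b)).sum(axis=0)        # sum_m {mu - psi(a) + ln b}

beta = 1.0
for _ in range(50):
    g = -(p * (digamma(beta * p) + inner)).sum()         # dL/dbeta, Eq. (48)
    h = -(p ** 2 * polygamma(1, beta * p)).sum()         # d^2L/dbeta^2, Eq. (48)
    beta = max(beta - g / h, 1e-6)                       # Newton step, beta > 0
```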
When we apply a Gamma prior $p(\beta) = \frac{d_0^{c_0}}{\Gamma(c_0)}\beta^{c_0-1}e^{-d_0\beta}$ to β, we have the following result:
\[
\frac{\partial L(\beta)}{\partial \beta} = -\sum_{k=1}^T p_k \bigg[ \psi(\beta p_k) + \sum_{m=1}^M \big\{ \mu_{mk} - \psi(a_{mk}) + \ln b_{mk} \big\} \bigg] + (c_0 - 1)\frac{1}{\beta} - d_0, \qquad
\frac{\partial^2 L(\beta)}{\partial \beta^2} = -\sum_{k=1}^T p_k^2\, \psi'(\beta p_k) - (c_0 - 1)\frac{1}{\beta^2} \tag{49}
\]
References
[1] David M. Blei and John D. Lafferty. Correlated topic models. In NIPS, 2005.
[2] John Paisley, Chong Wang, and David Blei. The discrete infinite logistic normal distribution for
mixed-membership modeling. In AISTATS, 2011.
Poisson factorizationPoisson factorization
Poisson factorization
 
A Simple Stochastic Gradient Variational Bayes for the Correlated Topic Model
A Simple Stochastic Gradient Variational Bayes for the Correlated Topic ModelA Simple Stochastic Gradient Variational Bayes for the Correlated Topic Model
A Simple Stochastic Gradient Variational Bayes for the Correlated Topic Model
 
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet AllocationA Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
 
Word count in Husserliana Volumes 1 to 28
Word count in Husserliana Volumes 1 to 28Word count in Husserliana Volumes 1 to 28
Word count in Husserliana Volumes 1 to 28
 
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet AllocationA Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
 
FDSE2015
FDSE2015FDSE2015
FDSE2015
 
A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...
A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...
A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...
 

Recently uploaded

How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 

Recently uploaded (20)

How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 

A Note on the Derivation of the Variational Inference Updates for DILN

∫ q(η) ln q(η)dη. (3)

2

We examine each term of the right hand side of Eq. (3).

∫ ∑C q(C)q(η) ln p(X|C, η)dη
= ∑m ∑n ∑k ϕmnk ∫ {Γ(∑d γ′kd) / ∏d Γ(γ′kd)} ∏d ηkd^(γ′kd − 1) ln ηkXmn dηk
= ∑m ∑n ∑k ϕmnk {ψ(γ′kXmn) − ψ(γ′k)}, (4)

where γ′k ≡ ∑d γ′kd.

∫ q(Z)q(w) ln p(Z|V, w, β)dZdw
= ∑m ∑k ∫ q(Zmk)q(wmk) ln { (e^(−wmk))^(βpk) / Γ(βpk) · Zmk^(βpk − 1) e^(−e^(−wmk) Zmk) } dZmk dwmk
= −∑k βpk ∑m ∫ q(wmk)wmk dwmk − ∑k ln Γ(βpk) + ∑k (βpk − 1) ∑m ∫ q(Zmk) ln Zmk dZmk
− ∑m ∑k ∫ q(Zmk)q(wmk) e^(−wmk) Zmk dZmk dwmk, (5)

where

∫ q(wmk) e^(−wmk) dwmk
= ∫ (1/√(2πvmk)) exp{ −(wmk − µmk)²/(2vmk) − wmk } dwmk
= ∫ (1/√(2πvmk)) exp{ −(wmk² − 2µmkwmk + 2vmkwmk + µmk²)/(2vmk) } dwmk
= ∫ (1/√(2πvmk)) exp{ −(wmk − µmk + vmk)²/(2vmk) } dwmk · exp( −µmk + vmk/2 )
= exp( −µmk + vmk/2 ). (6)

Note that vmk is a variance. Consequently, we have

∫ q(Z)q(w) ln p(Z|V, w, β)dZdw
= −∑k βpk ∑m µmk − ∑k ln Γ(βpk) + ∑k (βpk − 1) ∑m { ψ(amk) − ln bmk }
− ∑m ∑k (amk/bmk) exp( −µmk + vmk/2 ). (7)

Note that pk ≡ Vk ∏j<k (1 − Vj).
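Eq. (6) is the Gaussian moment identity E[e^(−wmk)] = exp(−µmk + vmk/2) for wmk ~ N(µmk, vmk). As a sanity check (this sketch is mine, not part of the note; the function name and quadrature grid are arbitrary choices), the closed form can be compared against direct numerical integration:

```python
import math

def expectation_exp_neg(mu, v, lo=-15.0, hi=15.0, n=60000):
    # Trapezoidal approximation of E[exp(-w)] for w ~ N(mu, v).
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        w = lo + i * h
        density = math.exp(-(w - mu) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
        weight = 0.5 if i in (0, n) else 1.0  # trapezoid endpoint weights
        total += weight * density * math.exp(-w) * h
    return total

mu, v = 0.3, 0.7
closed_form = math.exp(-mu + v / 2.0)   # Eq. (6)
numeric = expectation_exp_neg(mu, v)
```

The two values agree up to quadrature error, which confirms the sign pattern −µmk + vmk/2 in the exponent.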
∫ ∑C q(C)q(Z) ln p(C|Z)dZ
= ∑m ∑n ∫ q(Zm) ∑k ϕmnk ln { Zmk / ∑j Zmj } dZm
= ∑m ∑k (∑n ϕmnk) ∫ q(Zmk) ln Zmk dZmk − ∑m Nm ∫ q(Zm) ln (∑j Zmj) dZm. (8)

Since ln x ≤ x/ξ − 1 + ln ξ for any ξ > 0,

∫ q(Zm) ln (∑j Zmj) dZm ≤ ∫ q(Zm) { (∑j Zmj)/ξm − 1 + ln ξm } dZm = (1/ξm) ∑k amk/bmk − 1 + ln ξm. (9)

Therefore,

∫ ∑C q(C)q(Z) ln p(C|Z)dZ
≥ ∑m ∑k (∑n ϕmnk){ ψ(amk) − ln bmk } − ∑m (Nm/ξm) ∑k amk/bmk + ∑m Nm − ∑m Nm ln ξm. (10)

∫ q(w) ln p(w|m, K)dw = ∑m ∫ q(wm) ln p(wm|m, K)dwm
= ∑m [ −(T/2) ln 2π − (1/2) ln |K| − (1/2) ∫ q(wm)(wm − m)ᵀK⁻¹(wm − m)dwm ]
= −(MT/2) ln 2π − (M/2) ln |K|
− (1/2) ∑m { ∑k (µmk² + vmk)K⁻¹k:k − 2∑k mkµmkK⁻¹k:k + ∑k mk²K⁻¹k:k + ∑k ∑j≠k (µmkµmj − 2µmkmj + mkmj)K⁻¹k:j }
= −(MT/2) ln 2π − (M/2) ln |K| − (1/2) ∑m { ∑k vmkK⁻¹k:k + ∑k ∑j (µmk − mk)(µmj − mj)K⁻¹k:j }. (11)

∫ q(η) ln p(η)dη
= ∑k ∫ {Γ(∑d γ′kd) / ∏d Γ(γ′kd)} ∏d ηkd^(γ′kd − 1) { ln Γ(Dγ) − D ln Γ(γ) + (γ − 1) ∑d ln ηkd } dηk
= T ln Γ(Dγ) − TD ln Γ(γ) + (γ − 1) ∑k ∑d { ψ(γ′kd) − ψ(γ′k) }. (12)

ln p(V|α) = T ln Γ(α + 1) − T ln Γ(α) + (α − 1) ∑k ln(1 − Vk). (13)

∫ q(Z) ln q(Z)dZ = −∑m ∑k { ln Γ(amk) − (amk − 1)ψ(amk) − ln bmk + amk }. (14)

∑C q(C) ln q(C) = ∑m ∑n ∑k ϕmnk ln ϕmnk. (15)

∫ q(w) ln q(w)dw = −MT(1 + ln 2π)/2 − ∑m ∑k (ln vmk)/2. (16)

∫ q(η) ln q(η)dη = ∑k [ ∑d (γ′kd − 1){ ψ(γ′kd) − ψ(γ′k) } + ln Γ(γ′k) − ∑d ln Γ(γ′kd) ]. (17)
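The inequality used in Eq. (9), ln x ≤ x/ξ − 1 + ln ξ for any ξ > 0, is the first-order (tangent) bound on the logarithm, tight at ξ = x. A minimal numerical illustration (mine, not the note's; the grid of test points is arbitrary):

```python
import math

def log_bound(x, xi):
    # Tangent upper bound on ln x at the point xi > 0, as used in Eq. (9).
    return x / xi - 1.0 + math.log(xi)

# The bound holds for every positive x and xi, with equality at xi = x.
grid = [0.1, 0.5, 1.0, 2.0, 10.0]
gaps = [log_bound(x, xi) - math.log(x) for x in grid for xi in grid]
```

Maximizing the resulting lower bound over ξm then recovers the update ξm = ∑k amk/bmk of Eq. (20).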
Consequently, we obtain a lower bound of the log evidence as follows (recall pk ≡ Vk ∏j<k (1 − Vj)):

ln p(X) ≥ ∑m ∑n ∑k ϕmnk { ψ(γ′kXmn) − ψ(γ′k) }
− ∑k βpk ∑m µmk − ∑k ln Γ(βpk) + ∑k (βpk − 1) ∑m { ψ(amk) − ln bmk }
− ∑m ∑k (amk/bmk) exp( −µmk + vmk/2 )
+ ∑m ∑k (∑n ϕmnk){ ψ(amk) − ln bmk } − ∑m (Nm/ξm) ∑k amk/bmk + ∑m Nm − ∑m Nm ln ξm
− (MT/2) ln 2π − (M/2) ln |K| − (1/2) ∑m { ∑k vmkK⁻¹k:k + ∑k ∑j (µmk − mk)(µmj − mj)K⁻¹k:j }
+ T ln Γ(Dγ) − TD ln Γ(γ) + (γ − 1) ∑k ∑d { ψ(γ′kd) − ψ(γ′k) }
+ T ln Γ(α + 1) − T ln Γ(α) + (α − 1) ∑k ln(1 − Vk)
+ ∑m ∑k { ln Γ(amk) − (amk − 1)ψ(amk) − ln bmk + amk }
− ∑m ∑n ∑k ϕmnk ln ϕmnk + MT(1 + ln 2π)/2 + ∑m ∑k (ln vmk)/2
− ∑k [ ∑d (γ′kd − 1){ ψ(γ′kd) − ψ(γ′k) } + ln Γ(γ′k) − ∑d ln Γ(γ′kd) ]
+ ln p(α) + ln p(β) + ln p(m) + ln p(K). (18)

We assume that p(m) and p(K) are uniform distributions, and that p(α) and p(β) are Gamma distributions.

3 Inference Algorithm

3.1 Update q(Cmn)

Let L denote the right hand side of Eq. (18). Then

∂L/∂ϕmnk = ψ(γ′kXmn) − ψ(γ′k) + ψ(amk) − ln bmk − ln ϕmnk − 1,
∴ ϕmnk ∝ exp{ ψ(γ′kXmn) − ψ(γ′k) + ψ(amk) − ln bmk }. (19)

3.2 Update q(Zmk)

∂L/∂ξm = (Nm/ξm²) ∑k amk/bmk − Nm/ξm, ∴ ξm = ∑k amk/bmk. (20)

∂L/∂bmk = −(βpk − 1)/bmk + (amk/bmk²) exp( −µmk + vmk/2 ) − (∑n ϕmnk)/bmk + (Nm/ξm)(amk/bmk²) − 1/bmk. (21)
∂L/∂bmk = 0 gives

0 = −bmk { βpk + ∑n ϕmnk } + amk { exp( −µmk + vmk/2 ) + Nm/ξm }. (22)

Therefore,

bmk = amk · { exp( −µmk + vmk/2 ) + Nm/ξm } / { βpk + ∑n ϕmnk }. (23)

∂L/∂amk = (βpk − 1)ψ′(amk) − (1/bmk) exp( −µmk + vmk/2 ) + (∑n ϕmnk)ψ′(amk) − (Nm/ξm)(1/bmk) − (amk − 1)ψ′(amk) + 1
= { βpk + ∑n ϕmnk − amk } ψ′(amk) − (1/bmk){ exp( −µmk + vmk/2 ) + Nm/ξm } + 1. (24)

By using the result for bmk, we obtain

∂L/∂amk = { βpk + ∑n ϕmnk − amk } ψ′(amk) − { βpk + ∑n ϕmnk }/amk + 1
= { βpk + ∑n ϕmnk − amk } { ψ′(amk) − 1/amk }.

Since ψ′(a) > 1/a for a > 0, setting this derivative to zero gives

∴ amk = βpk + ∑n ϕmnk, bmk = exp( −µmk + vmk/2 ) + Nm/ξm. (25)

3.3 Update q(wmk)

∂L/∂µmk = (amk/bmk) exp( −µmk + vmk/2 ) − βpk − ∑j (µmj − mj)K⁻¹k:j. (26)

∂L/∂vmk = (1/2) { −(amk/bmk) exp( −µmk + vmk/2 ) − K⁻¹k:k + 1/vmk }. (27)

The plus and minus signs on the right hand side of the second line of Eq. (22) in the original paper are different from those given above. We may use L-BFGS to update µmk and vmk.

3.4 Update q(ηk)

∂L/∂γ′kd = ∑m ∑n I(Xmn = d)ϕmnk ψ′(γ′kd) − ∑m ∑n ϕmnk ψ′(γ′k)
+ (γ − 1)ψ′(γ′kd) − (γ − 1) ∑d ψ′(γ′k)
− ψ(γ′kd) + ψ(γ′k) − (γ′kd − 1)ψ′(γ′kd) + ∑d (γ′kd − 1)ψ′(γ′k) − ψ(γ′k) + ψ(γ′kd)
= ∑m ∑n I(Xmn = d)ϕmnk ψ′(γ′kd) − ∑m ∑n ϕmnk ψ′(γ′k) + (γ − γ′kd)ψ′(γ′kd) − ∑d (γ − γ′kd)ψ′(γ′k)
= ψ′(γ′kd) { ∑m ∑n I(Xmn = d)ϕmnk + γ − γ′kd } − ψ′(γ′k) ∑d { ∑m ∑n I(Xmn = d)ϕmnk + γ − γ′kd }.

∴ γ′kd = γ + ∑m ∑n I(Xmn = d)ϕmnk. (28)
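The closed forms of Eq. (25) can be checked by substituting them back into the derivative of Eq. (21), which should then vanish. A small numerical sketch (mine, not the note's; all parameter values below are made up):

```python
import math

def dL_db(a, b, beta_pk, phi_sum, mu, v, N, xi):
    # Eq. (21): derivative of the lower bound with respect to b_mk.
    e = math.exp(-mu + v / 2.0)          # E[exp(-w_mk)], Eq. (6)
    return (-(beta_pk - 1.0) / b + a * e / b ** 2
            - phi_sum / b + (N / xi) * a / b ** 2 - 1.0 / b)

# Arbitrary test values for beta*p_k, sum_n phi_mnk, mu_mk, v_mk, N_m, xi_m.
beta_pk, phi_sum, mu, v, N, xi = 0.8, 3.5, 0.2, 0.5, 40, 7.0
a_opt = beta_pk + phi_sum                       # Eq. (25)
b_opt = math.exp(-mu + v / 2.0) + N / xi        # Eq. (25)
residual = dL_db(a_opt, b_opt, beta_pk, phi_sum, mu, v, N, xi)
```

The residual is zero up to rounding, as expected from the factorization in Eq. (24).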
3.5 Update q(Vk)

Since ∂pk/∂Vk = pk/Vk and ∂pj/∂Vk = −pj/(1 − Vk) for j > k,

∂L/∂Vk = −(α − 1)/(1 − Vk)
− β(pk/Vk) [ ∑m { µmk − ψ(amk) + ln bmk } + ψ(βpk) ]
+ (β/(1 − Vk)) ∑j>k pj [ ∑m { µmj − ψ(amj) + ln bmj } + ψ(βpj) ]. (29)

I think that Vk on the second line of Eq. (24) in the original paper is not required.

3.6 Update q(K)

With respect to K, we maximize the following function:

L(K) = −(M/2) ln |K| − (1/2) ∑m ∑k vmkK⁻¹k:k − (1/2) ∑m (µm − m)ᵀK⁻¹(µm − m), (30)

where (µm − m)ᵀK⁻¹(µm − m) = ∑k ∑j (µmk − mk)(µmj − mj)K⁻¹k:j. The derivative of the first term of the right hand side in Eq. (30) is obtained from the following identity (cf. Eq. (51) of The Matrix Cookbook, http://orion.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf):

∂ ln |K| / ∂K = K⁻¹. (31)

For the second term of the right hand side in Eq. (30), it holds that ∑k vmkK⁻¹k:k = Tr[K⁻¹ diag(vm)], where diag(vm) is a diagonal matrix whose kth diagonal entry is vmk. By using the following identity (cf. Eq. (16) in Old and New Matrix Algebra Useful for Statistics, http://research.microsoft.com/en-us/um/people/minka/papers/matrix/minka-matrix.pdf):

∂ Tr[AΣ⁻¹B] / ∂Σ = −Σ⁻¹BAΣ⁻¹, (32)
we obtain

∂(∑m ∑k vmkK⁻¹k:k)/∂K = −K⁻¹ { ∑m diag(vm) } K⁻¹.

For the last term in Eq. (30), it holds that

(µm − m)ᵀK⁻¹(µm − m) = Tr[ K⁻¹ (µm − m)(µm − m)ᵀ ]. (33)

Therefore, by using Eq. (32), we obtain ∂(µm − m)ᵀK⁻¹(µm − m)/∂K = −K⁻¹(µm − m)(µm − m)ᵀK⁻¹. Consequently, we have

∂L(K)/∂K = −(M/2)K⁻¹ + (1/2)K⁻¹ { ∑m diag(vm) } K⁻¹ + (1/2)K⁻¹ { ∑m (µm − m)(µm − m)ᵀ } K⁻¹. (34)

∂L(K)/∂K = 0 holds when

K⁻¹ = (1/M) K⁻¹ { ∑m [ diag(vm) + (µm − m)(µm − m)ᵀ ] } K⁻¹. (35)

By multiplying both sides of the above equation by K from the left and from the right, we obtain

K = (1/M) ∑m { diag(vm) + (µm − m)(µm − m)ᵀ }. (36)

This derivation is completely the same as that of CTM [1].

3.7 Update q(m)

∂L/∂mk = ∑m ∑j (µmj − mj)K⁻¹k:j, ∴ mk = (1/M) ∑m µmk. (37)

3.8 Update q(α)

With respect to α, we maximize the following function:

L(α) = T ln Γ(α + 1) − T ln Γ(α) + (α − 1) ∑k ln(1 − Vk). (38)

We use the following bound (cf. Eqs. (120)–(122) in Estimating a Dirichlet distribution, http://research.microsoft.com/en-us/um/people/minka/papers/dirichlet/):

Γ(n + x)/Γ(x) ≥ c x^a if n ≥ 1, where (39)
a = { ψ(n + x̂) − ψ(x̂) } x̂, (40)
c = { Γ(n + x̂)/Γ(x̂) } x̂^(−a). (41)

Then we obtain

L(α) ≥ T { ψ(α̂ + 1) − ψ(α̂) } α̂ ln α + (α − 1) ∑k ln(1 − Vk) + const. (42)

We maximize this lower bound, which we again denote as L(α):

∂L(α)/∂α = (1/α) T { ψ(α̂ + 1) − ψ(α̂) } α̂ + ∑k ln(1 − Vk). (43)

∴ α = α̂ · T { ψ(α̂ + 1) − ψ(α̂) } / { −∑k ln(1 − Vk) }. (44)
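Since ψ(α + 1) − ψ(α) = 1/α, the multiplicative update of Eq. (44) reaches the stationary point T/{−∑k ln(1 − Vk)} of Eq. (38) in a single step and then stays fixed. A numerical sketch (the digamma helper and the test values of Vk are mine):

```python
import math

def digamma(x):
    # psi(x) via the recurrence psi(x) = psi(x+1) - 1/x plus an asymptotic series.
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return acc + math.log(x) - 0.5 * inv - inv2 * (1.0/12 - inv2 * (1.0/120 - inv2 / 252))

def alpha_update(alpha, V):
    # Multiplicative update of Eq. (44) (no Gamma prior on alpha).
    T = len(V)
    return alpha * T * (digamma(alpha + 1.0) - digamma(alpha)) / (-sum(math.log(1.0 - v) for v in V))

V = [0.3, 0.5, 0.2]
exact = len(V) / (-sum(math.log(1.0 - v) for v in V))  # stationary point of Eq. (38)
alpha1 = alpha_update(1.0, V)
alpha2 = alpha_update(alpha1, V)
```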
This is a multiplicative update. When we apply a Gamma prior p(α) = (b0^a0 / Γ(a0)) α^(a0 − 1) e^(−b0 α) to α, we have the following result:

∂L(α)/∂α = (1/α) T { ψ(α̂ + 1) − ψ(α̂) } α̂ + ∑k ln(1 − Vk) + (a0 − 1)/α − b0. (45)

∴ α = { a0 − 1 + α̂ T (ψ(α̂ + 1) − ψ(α̂)) } / { b0 − ∑k ln(1 − Vk) }. (46)

3.9 Update q(β)

With respect to β, we maximize the following function L(β):

L(β) = −∑k βpk ∑m µmk − ∑k ln Γ(βpk) + ∑k βpk ∑m { ψ(amk) − ln bmk }. (47)

The first and the second derivatives are obtained as follows:

∂L(β)/∂β = −∑k pk [ ψ(βpk) + ∑m { µmk − ψ(amk) + ln bmk } ],
∂²L(β)/∂β² = −∑k pk² ψ′(βpk). (48)

We can use Newton's method to update β. When we apply a Gamma prior p(β) = (d0^c0 / Γ(c0)) β^(c0 − 1) e^(−d0 β) to β, we have the following result:

∂L(β)/∂β = −∑k pk [ ψ(βpk) + ∑m { µmk − ψ(amk) + ln bmk } ] + (c0 − 1)/β − d0,
∂²L(β)/∂β² = −∑k pk² ψ′(βpk) − (c0 − 1)/β². (49)

References

[1] David M. Blei and John D. Lafferty. Correlated topic models. In NIPS, 2005.
[2] John Paisley, Chong Wang, and David Blei. The discrete infinite logistic normal distribution for mixed-membership modeling. In AISTATS, 2011.
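As a closing numerical check (not part of the original note), the Newton update of Section 3.9 built from the derivatives in Eq. (48) can be sketched as follows. The weights pk, the statistics sk standing in for ∑m{µmk − ψ(amk) + ln bmk}, and the step-halving positivity guard are illustrative choices of mine:

```python
import math

def digamma(x):
    # psi(x) via recurrence plus an asymptotic series (adequate for x > 0).
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return acc + math.log(x) - 0.5 * inv - inv2 * (1.0/12 - inv2 * (1.0/120 - inv2 / 252))

def trigamma(x):
    # psi'(x) via recurrence plus an asymptotic series (adequate for x > 0).
    acc = 0.0
    while x < 6.0:
        acc += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return acc + inv * (1.0 + 0.5 * inv + inv2 * (1.0/6 - inv2 * (1.0/30 - inv2 / 42)))

def newton_beta(p, s, beta=1.0, iters=50):
    # Maximize Eq. (47); first and second derivatives are Eq. (48).
    for _ in range(iters):
        grad = -sum(pk * (digamma(beta * pk) + sk) for pk, sk in zip(p, s))
        hess = -sum(pk * pk * trigamma(beta * pk) for pk in p)
        step = grad / hess
        new = beta - step
        while new <= 0.0:      # halve the step to keep beta positive
            step *= 0.5
            new = beta - step
        beta = new
    return beta

p = [0.5, 0.3, 0.2]      # p_k = V_k prod_{j<k} (1 - V_j), made-up values
s = [-1.0, -0.5, -2.0]   # stands in for sum_m {mu_mk - psi(a_mk) + ln b_mk}
beta_hat = newton_beta(p, s)
grad_at_opt = -sum(pk * (digamma(beta_hat * pk) + sk) for pk, sk in zip(p, s))
```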