The document describes methods for collapsed stochastic variational inference in topic models:
1) Classic collapsed LDA inference sweeps over every word of every document: it removes the word's topic responsibilities from the expected counts, recomputes them, and adds them back.
2) Practical collapsed variational inference for the HDP (Sato et al., 2012) extends this with updates of the stick-breaking parameters u_k and v_k, which determine the topic weights π_k.
3) Stochastic practical collapsed variational inference for the HDP (Bleier, 2013) adds step-size-weighted stochastic updates of the counts and of u_k and v_k.
4) Stochastic variational inference for LDA (Foulds et al., 2013) applies step-size-weighted stochastic updates to the document-topic and topic-word counts.
5) An evaluation compares the predictive performance of PCSVB0, SCVB0 and PCVB0 by measuring perplexity on held-out data from the Associated Press (TREC-1) corpus.
2. Classic Collapsed LDA Inference
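repeat
  for each document d do
    for each word i in d do
      $n_{dk} = n_{dk} - z_{dik}$
      $n_{k w_{di}} = n_{k w_{di}} - z_{dik}$
      $z_{dik} \propto (n_{dk} + \alpha \pi_k) \, \frac{n_{k w_{di}} + \beta}{n_{k\cdot} + V\beta}$
      $n_{dk} = n_{dk} + z_{dik}$
      $n_{k w_{di}} = n_{k w_{di}} + z_{dik}$
    end for
  end for
until stopping criterion is met.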
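The sweep above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `init_state` and `cvb0_sweep`, the list-of-lists layout, the random initialisation, and the explicit bookkeeping of the row totals n_k. are all hypothetical. `alpha_pi[k]` stands for the product of α and π_k.

```python
import random


def init_state(docs, K, V, seed=0):
    """Random responsibilities plus the expected counts they imply (assumed init)."""
    rng = random.Random(seed)
    z, n_dk = [], []
    n_kw = [[0.0] * V for _ in range(K)]
    for doc in docs:
        z_doc, counts = [], [0.0] * K
        for w in doc:
            raw = [rng.random() + 1e-3 for _ in range(K)]
            total = sum(raw)
            resp = [r / total for r in raw]
            for k in range(K):
                counts[k] += resp[k]
                n_kw[k][w] += resp[k]
            z_doc.append(resp)
        z.append(z_doc)
        n_dk.append(counts)
    n_k = [sum(row) for row in n_kw]
    return z, n_dk, n_kw, n_k


def cvb0_sweep(docs, z, n_dk, n_kw, n_k, alpha_pi, beta, V):
    """One pass over all tokens; counts and responsibilities change in place."""
    K = len(n_k)
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            old = z[d][i]
            # Remove this token's current contribution from the counts.
            for k in range(K):
                n_dk[d][k] -= old[k]
                n_kw[k][w] -= old[k]
                n_k[k] -= old[k]
            # Unnormalised responsibilities, then normalise so they sum to 1.
            new = [
                (n_dk[d][k] + alpha_pi[k]) * (n_kw[k][w] + beta) / (n_k[k] + V * beta)
                for k in range(K)
            ]
            total = sum(new)
            new = [x / total for x in new]
            # Add the refreshed contribution back.
            for k in range(K):
                n_dk[d][k] += new[k]
                n_kw[k][w] += new[k]
                n_k[k] += new[k]
            z[d][i] = new
```

Because each token's responsibilities sum to one, every row of `n_dk` keeps summing to the document length, which is a convenient sanity check while iterating.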
3. Practical Collapsed Variational Inference for the HDP
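repeat
  for each document d do
    for each word i in d do
      $n_{dk} = n_{dk} - z_{dik}$
      $n_{k w_{di}} = n_{k w_{di}} - z_{dik}$
      $z_{dik} \propto (n_{dk} + \alpha \pi_k) \, \frac{n_{k w_{di}} + \beta}{n_{k\cdot} + V\beta}$
      $n_{dk} = n_{dk} + z_{dik}$
      $n_{k w_{di}} = n_{k w_{di}} + z_{dik}$
    end for
    $u_k = 1 + \sum_{d} E[\mathbb{1}(n_{dk} \ge 1)]$
    $v_k = \gamma + \sum_{l=k+1}^{T} \sum_{d} E[\mathbb{1}(n_{dl} \ge 1)]$
    $\pi_k = \bar{\pi}_k \prod_{l=1}^{k-1} (1 - \bar{\pi}_l)$; with $\bar{\pi}_k = \frac{u_k}{u_k + v_k}$
  end for
until stopping criterion is met.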
Sato, I., Kurihara, K., Nakagawa, H.: Practical Collapsed Variational Bayes Inference for Hierarchical Dirichlet Process. KDD (2012)
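The stick-weight step of this panel can be illustrated in Python. This is a sketch under assumptions: the function names are hypothetical, γ is taken to be the concentration parameter in the v_k update, and the expected indicator E[1(n_dk >= 1)] is computed as one minus the probability that no token of the document uses topic k, which presumes the tokens are independent under the variational distribution.

```python
def prob_topic_used(z_doc, k):
    """E[1(n_dk >= 1)] for one document, assuming independent tokens."""
    p_unused = 1.0
    for resp in z_doc:
        p_unused *= 1.0 - resp[k]
    return 1.0 - p_unused


def stick_parameters(occupancy, gamma):
    """u_k and v_k from occupancy[d][k] = E[1(n_dk >= 1)]."""
    T = len(occupancy[0])
    totals = [sum(row[k] for row in occupancy) for k in range(T)]
    u = [1.0 + totals[k] for k in range(T)]
    # v_k collects the occupancy of all topics with a larger index than k.
    v = [gamma + sum(totals[k + 1:]) for k in range(T)]
    return u, v


def stick_breaking_weights(u, v):
    """pi_k = pi_bar_k * prod_{l<k} (1 - pi_bar_l), pi_bar_k = u_k / (u_k + v_k)."""
    pi = []
    remaining = 1.0  # stick length left after the first k-1 breaks
    for u_k, v_k in zip(u, v):
        pi_bar = u_k / (u_k + v_k)
        pi.append(pi_bar * remaining)
        remaining *= 1.0 - pi_bar
    return pi
```

With a finite truncation level T the weights sum to slightly less than one; the leftover mass is the unbroken remainder of the stick.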
4. Stochastic Practical Collapsed Variational Inference for the HDP
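repeat
  for each document d do
    for each word i in d do
      $n_{dk} = n_{dk} - z_{dik}$
      $n_{k w_{di}} = n_{k w_{di}} - z_{dik}$
      $z_{dik} \propto (n_{dk} + \alpha \pi_k) \, \frac{n_{k w_{di}} + \beta}{n_{k\cdot} + V\beta}$
      $n_{dk} = n_{dk} + z_{dik}$
      $n_{k w_{di}} = n_{k w_{di}} + z_{dik}$
      $n_{dk} \leftarrow (1 - \rho^d_t) \, n_{dk} + \rho^d_t N_d z_{dik}$
      $n_{kw} \leftarrow (1 - \rho^c_t) \, n_{kw} + \rho^c_t N z_{dik} [w_{di} = w]$
    end for
    $u_k = 1 + \sum_{d} E[\mathbb{1}(n_{dk} \ge 1)]$
    $u_k \leftarrow (1 - \rho^h_t) \, u_k + \rho^h_t \left(1 + D \, E[\mathbb{1}(n_{dk} \ge 1)]\right)$
    $v_k = \gamma + \sum_{l=k+1}^{T} \sum_{d} E[\mathbb{1}(n_{dl} \ge 1)]$
    $v_k \leftarrow (1 - \rho^h_t) \, v_k + \rho^h_t \left(\gamma + D \sum_{l=k+1}^{T} E[\mathbb{1}(n_{dl} \ge 1)]\right)$
    $\pi_k = \bar{\pi}_k \prod_{l=1}^{k-1} (1 - \bar{\pi}_l)$; with $\bar{\pi}_k = \frac{u_k}{u_k + v_k}$
  end for
until stopping criterion is met.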
Bleier, A.: Practical Collapsed Stochastic Variational Inference for the HDP. NIPS Topic Models Workshop (2013)
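The step-size-weighted update of u_k and v_k can be sketched as follows. The function name `stochastic_stick_update` and its argument layout are hypothetical, and γ is again assumed to be the concentration parameter. `occ_d[k]` holds E[1(n_dk >= 1)] for the current document only, and scaling it by the number of documents D stands in for the sum over the whole corpus.

```python
def stochastic_stick_update(u, v, occ_d, D, gamma, rho_h):
    """Blend the old u, v with single-document estimates using step size rho_h."""
    T = len(u)
    new_u = [0.0] * T
    new_v = [0.0] * T
    tail = 0.0  # running sum of occ_d[l] for l > k
    for k in reversed(range(T)):
        u_hat = 1.0 + D * occ_d[k]
        v_hat = gamma + D * tail
        new_u[k] = (1.0 - rho_h) * u[k] + rho_h * u_hat
        new_v[k] = (1.0 - rho_h) * v[k] + rho_h * v_hat
        tail += occ_d[k]
    return new_u, new_v
```

Setting `rho_h` to 0 leaves the parameters unchanged and setting it to 1 replaces them with the single-document estimate, so the step size controls how strongly one document can move the stick weights.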
5. Stochastic Variational Inference for LDA
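repeat
  for each document d do
    for each word i in d do
      $n_{dk} = n_{dk} - z_{dik}$
      $n_{k w_{di}} = n_{k w_{di}} - z_{dik}$
      $z_{dik} \propto (n_{dk} + \alpha \pi_k) \, \frac{n_{k w_{di}} + \beta}{n_{k\cdot} + V\beta}$
      $n_{dk} = n_{dk} + z_{dik}$
      $n_{k w_{di}} = n_{k w_{di}} + z_{dik}$
      $n_{dk} \leftarrow (1 - \rho^d_t) \, n_{dk} + \rho^d_t N_d z_{dik}$
      $n_{kw} \leftarrow (1 - \rho^c_t) \, n_{kw} + \rho^c_t N z_{dik} [w_{di} = w]$
    end for
  end for
until stopping criterion is met.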
Foulds, J., Boyles, L., Smyth, P., Welling, M.: Stochastic Collapsed Variational Bayesian Inference for LDA. KDD (2013)
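A per-token version of the step-size-weighted count updates might look like the sketch below. It is an illustration under assumptions, not the published algorithm: the function name, the in-place list layout, and the separate bookkeeping of the row totals n_k. are hypothetical, and the sketch applies only the weighted updates to the counts. `N_d` is the document length and `N` the number of tokens in the corpus.

```python
def stochastic_token_update(n_dk_d, n_kw, n_k, w, N_d, N, alpha_pi, beta, rho_d, rho_c):
    """Update the counts in place from one token with word id w; returns z."""
    K = len(n_k)
    V = len(n_kw[0])
    weights = [
        (n_dk_d[k] + alpha_pi[k]) * (n_kw[k][w] + beta) / (n_k[k] + V * beta)
        for k in range(K)
    ]
    total = sum(weights)
    z = [x / total for x in weights]
    for k in range(K):
        # Document counts move towards N_d * z with step size rho_d.
        n_dk_d[k] = (1.0 - rho_d) * n_dk_d[k] + rho_d * N_d * z[k]
        # Every word's count decays; only the observed word w receives new mass.
        for word in range(V):
            hit = 1.0 if word == w else 0.0
            n_kw[k][word] = (1.0 - rho_c) * n_kw[k][word] + rho_c * N * z[k] * hit
        n_k[k] = (1.0 - rho_c) * n_k[k] + rho_c * N * z[k]
    return z
```

Updating all V entries per token keeps the sketch close to the formula but is costly; an implementation would more plausibly accumulate the estimates over a minibatch before blending them in.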
6. Evaluation
Predictive performance of the algorithm on the Associated
Press (TREC-1) data. For the evaluation we compared
perplexity against the number of documents seen for
PCSVB0, SCVB0 and PCVB0. We trained the model on
80% of the documents. Each held-out document was split:
70% of its tokens were used to estimate the document
parameters, and the remaining 30% were used to compute
the perplexity.
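The held-out protocol can be sketched in Python as follows. Several details are assumptions, since the text does not fix them: the split is taken by token position, the document parameters are estimated with a simple fixed-point fold-in against fixed topics, and `phi` is a hypothetical K x V matrix of strictly positive topic-word probabilities. All function names are illustrative.

```python
import math


def split_tokens(tokens, observed_percent=70):
    """Split one held-out document into an estimation part and a scoring part."""
    cut = len(tokens) * observed_percent // 100
    return tokens[:cut], tokens[cut:]


def estimate_theta(observed, phi, alpha_pi, iters=20):
    """Fold-in: fit document-topic proportions with the topics held fixed."""
    K = len(phi)
    n = [len(observed) / K] * K
    for _ in range(iters):
        new_n = [0.0] * K
        for w in observed:
            weights = [(n[k] + alpha_pi[k]) * phi[k][w] for k in range(K)]
            total = sum(weights)
            for k in range(K):
                new_n[k] += weights[k] / total
        n = new_n
    norm = len(observed) + sum(alpha_pi)
    return [(n[k] + alpha_pi[k]) / norm for k in range(K)]


def perplexity(held_out_docs, phi, alpha_pi):
    """exp of the negative mean log-likelihood of the scoring tokens."""
    log_lik, n_tokens = 0.0, 0
    for tokens in held_out_docs:
        observed, scored = split_tokens(tokens)
        theta = estimate_theta(observed, phi, alpha_pi)
        for w in scored:
            log_lik += math.log(sum(t * row[w] for t, row in zip(theta, phi)))
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)
```

Lower perplexity means the model assigns higher probability to the unseen 30% of each document; a model that spreads its mass uniformly over V words scores a perplexity of V.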
Step-size schedule: $\rho_t = \frac{s}{(\tau + t)^{0.9}}$
Configuration:
corpus-wide schedule $\rho^c_t$: $s = 10$, $\tau = 1000$
document schedule $\rho^d_t$: $s = 1$, $\tau = 10$
hyperparameter & stick weights $\rho^h_t$: $s = 5$, $\tau = 100$
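The schedule and its three configurations can be written down directly. In this sketch the names `SCHEDULES` and `step_size` and the dictionary keys are hypothetical; the values of s and τ and the exponent 0.9 are the ones listed in the configuration.

```python
# s and tau per schedule, as listed in the configuration.
SCHEDULES = {
    "corpus": {"s": 10.0, "tau": 1000.0},
    "document": {"s": 1.0, "tau": 10.0},
    "hyper": {"s": 5.0, "tau": 100.0},
}


def step_size(t, s, tau, kappa=0.9):
    """rho_t = s / (tau + t) ** kappa; decreases as the iteration count t grows."""
    return s / (tau + t) ** kappa


if __name__ == "__main__":
    for name, params in SCHEDULES.items():
        print(name, [round(step_size(t, **params), 4) for t in (0, 10, 100, 1000)])
```

A larger τ flattens the early part of the schedule, so the corpus-wide counts start with small steps while the document counts, with τ = 10, adapt quickly.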