A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation

ICCSA 2016 @ Beijing

  1. A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
     Tomonari MASADA (正田备也), Nagasaki University (长崎大学), masada@nagasaki-u.ac.jp
  2. Aim
     • Obtain an informative summary of a large set of documents
     • by extracting word lists, each relating to a specific topic
     → Topic modeling
  4. Contribution
     • We propose a new posterior estimation method for latent Dirichlet allocation (LDA) [Blei+ 03]
     • by applying stochastic gradient variational Bayes (SGVB) [Kingma+ 14] to LDA
  6. LDA [Blei+ 03]
     • Achieves a clustering of word tokens by assigning each word token to one of the 𝐾 topics.
     • 𝑧_𝑑𝑖 (discrete): the topic to which the 𝑖-th word token in document 𝑑 is assigned.
     • 𝜃_𝑑𝑘 (continuous): how often topic 𝑘 is talked about in document 𝑑, i.e., the topic probability distribution of each document.
     • 𝜙_𝑘𝑣 (continuous): how often word 𝑣 is used to talk about topic 𝑘, i.e., the word probability distribution of each topic.
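     A minimal numpy sketch of the generative process this slide describes; the
     sizes and hyperparameter values below are illustrative assumptions, not the
     paper's settings:

         import numpy as np

         rng = np.random.default_rng(0)
         K, V, D, N = 5, 1000, 3, 50    # topics, vocabulary size, docs, tokens per doc
         alpha, beta = 0.1, 0.01        # symmetric Dirichlet hyperparameters (assumed)

         phi = rng.dirichlet(beta * np.ones(V), size=K)     # phi[k, v] = p(word v | topic k)
         theta = rng.dirichlet(alpha * np.ones(K), size=D)  # theta[d, k] = p(topic k | doc d)

         docs = []
         for d in range(D):
             z = rng.choice(K, size=N, p=theta[d])               # z_di: topic of each token
             w = np.array([rng.choice(V, p=phi[k]) for k in z])  # the word tokens themselves
             docs.append(w)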
  7. Variational Bayesian (VB) inference = maximization of the evidence lower bound (ELBO)
     • VB tries to approximate the true posterior.
     • An approximate posterior 𝑞(𝒛, 𝚯) is introduced when the ELBO is obtained by applying Jensen's inequality to the evidence.
     • 𝒛: discrete hidden variables (topic assignments)
     • 𝚯: continuous hidden variables (multinomial parameters)
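     The bound this slide refers to is the standard Jensen bound on the evidence;
     reconstructed in LaTeX, in the slide's notation:

         \log p(\boldsymbol{w})
           = \log \mathbb{E}_{q(\boldsymbol{z},\boldsymbol{\Theta})}\!\left[
               \frac{p(\boldsymbol{w},\boldsymbol{z},\boldsymbol{\Theta})}{q(\boldsymbol{z},\boldsymbol{\Theta})}\right]
           \ge \mathbb{E}_{q(\boldsymbol{z},\boldsymbol{\Theta})}\!\left[
               \log p(\boldsymbol{w},\boldsymbol{z},\boldsymbol{\Theta})
               - \log q(\boldsymbol{z},\boldsymbol{\Theta})\right]
           = \mathrm{ELBO}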
  8. Factorization assumption
     • We assume the approximate posterior 𝑞(𝒛, 𝚯) factorizes as 𝑞(𝒛)𝑞(𝚯) to make the inference tractable.
     • The ELBO can then be written in a correspondingly factorized form (see below).
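     The slide's equation is not preserved in the transcript; under the
     factorization 𝑞(𝒛, 𝚯) = 𝑞(𝒛)𝑞(𝚯), the ELBO takes the standard form:

         \mathrm{ELBO}
           = \mathbb{E}_{q(\boldsymbol{z})q(\boldsymbol{\Theta})}\!\left[\log p(\boldsymbol{w},\boldsymbol{z}\mid\boldsymbol{\Theta})\right]
           + \mathbb{E}_{q(\boldsymbol{\Theta})}\!\left[\log p(\boldsymbol{\Theta})\right]
           - \mathbb{E}_{q(\boldsymbol{z})}\!\left[\log q(\boldsymbol{z})\right]
           - \mathbb{E}_{q(\boldsymbol{\Theta})}\!\left[\log q(\boldsymbol{\Theta})\right]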
  9. Stochastic gradient variational Bayes (SGVB) [Kingma+ 14]
     • A general framework for estimating the evidence lower bound (ELBO) in variational Bayes (VB)
     • Only applicable to continuous distributions 𝑞(𝚯)
  10. (SGVB) Monte Carlo integration
      • Using Monte Carlo integration, the ELBO can be estimated with 𝐿 random samples (see the estimator below).
      • The discrete part 𝑞(𝒛) is estimated in a similar manner to the original VB for LDA [Blei+ 03].
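     The slide's estimator is likewise not in the transcript; the generic SGVB
     estimator, consistent with the factorized ELBO above, would read:

         \mathrm{ELBO} \approx \frac{1}{L}\sum_{l=1}^{L}
             \left[\mathbb{E}_{q(\boldsymbol{z})}\!\left[\log p(\boldsymbol{w},\boldsymbol{z}\mid\boldsymbol{\Theta}^{(l)})\right]
             + \log p(\boldsymbol{\Theta}^{(l)}) - \log q(\boldsymbol{\Theta}^{(l)})\right]
           - \mathbb{E}_{q(\boldsymbol{z})}\!\left[\log q(\boldsymbol{z})\right],
         \qquad \boldsymbol{\Theta}^{(l)} \sim q(\boldsymbol{\Theta})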
  11. (SGVB) Reparameterization
      • SGVB can be applied "under certain mild conditions."
      • We use logistic normal distributions to approximate the true posteriors of 𝜃_𝑑𝑘 (the per-document topic probability distributions) and 𝜙_𝑘𝑣 (the per-topic word probability distributions).
      • We can sample efficiently from the logistic normal with reparameterization.
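     A minimal numpy sketch of the reparameterized logistic-normal draw described
     here; the function and parameter names are illustrative:

         import numpy as np

         def sample_logistic_normal(mu, log_sigma, rng):
             # Reparameterization: eps is parameter-free noise, so gradients
             # w.r.t. mu and log_sigma pass through the sample deterministically.
             eps = rng.standard_normal(mu.shape)
             x = mu + np.exp(log_sigma) * eps   # Gaussian draw
             e = np.exp(x - x.max())            # numerically stable softmax
             return e / e.sum()                 # a point on the probability simplex

         rng = np.random.default_rng(0)
         theta_d = sample_logistic_normal(np.zeros(5), np.full(5, -1.0), rng)
         # theta_d is a valid topic distribution: nonnegative and sums to 1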
  12. Maximize the ELBO using gradient ascent
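     The slide's update equations are not in the transcript; as a stand-in, a toy
     sketch of stochastic gradient ascent on a reparameterized ELBO, fitting
     q = N(mu, sigma^2) to the target p = N(0, 1) with hand-derived gradients
     (an illustration, not the paper's LDA objective):

         import numpy as np

         rng = np.random.default_rng(0)
         mu, log_sigma = 2.0, 1.0   # variational parameters (illustrative start)
         lr = 0.05                  # learning rate (assumed)

         for step in range(2000):
             eps = rng.standard_normal()             # parameter-free noise
             sigma = np.exp(log_sigma)
             theta = mu + sigma * eps                # reparameterized sample, theta ~ q
             # Single-sample gradients of ELBO = E_q[log p(theta)] + H(q):
             grad_mu = -theta                        # from d(-theta^2/2)/d mu
             grad_log_sigma = -theta * sigma * eps + 1.0   # chain rule + entropy term
             mu += lr * grad_mu                      # gradient ascent on the ELBO
             log_sigma += lr * grad_log_sigma

         print(mu, np.exp(log_sigma))               # ends up near (0.0, 1.0)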
  14. 14. "Stochastic" gradient VB •The expectation integrations in ELBO are estimated by Monte Carlo method. •The derivatives of ELBO depend on random samples. •Randomness is incorporated into maximization. • SGVB = VB where gradients are stochastic. • (Observation) It seems easier to avoid poor local minima. 14
  15. Without randomness = with zero standard deviation
      • A special case of the proposed method, obtained with zero standard deviation, is quite similar to CVB0 [Asuncion+ 09].
      • This places our method in the context of existing inference algorithms for LDA.
  16. Data sets for evaluation

      Data set    # docs    # vocabulary words
      NYT          99,932               46,263
      MOVIE        27,859               62,408
      NSF         128,818               21,471
      MED         125,490               42,830
  21. Not that efficient in time…
      • 500 iterations on the NYT data set with 𝐾 = 200:
        • LNV: 43 hours
        • CGS: 14 hours
        • VB: 23 hours
      • However, parallelization with a GPU works.
      • (A TensorFlow implementation is in preparation.)
  22. Conclusion
      • We incorporate randomness into variational inference for LDA by applying SGVB.
      • The proposed method gives perplexities comparable to those of existing inference methods for LDA.
  23. Future work
      • SGVB is a general framework for devising posterior inference for probabilistic models.
      • We have already applied SGVB to CTM [Blei+ 05].
        • This will be presented as a poster at APWeb'16.
      • SGVB is also applicable to other document models.
        • NVDM [Miao+ 16]: document modeling with an MLP