A Simple Stochastic Gradient Variational Bayes for the Correlated Topic Model
Poster presentation, APWeb 2016 @ Suzhou, China. Published in: Engineering.
  1. A Simple SGVB (Stochastic Gradient Variational Bayes) for the CTM (Correlated Topic Model)
     Tomonari MASADA (正田备也), Nagasaki University (长崎大学), masada@nagasaki-u.ac.jp
     APWeb 2016 @ Suzhou
  2. Aim
     • Make an informative summary of large document sets by extracting word lists, each relating to a distinct, particular topic.
     → Topic modeling
  3. Contribution
     • We propose a new posterior estimation method for the correlated topic model (CTM) [Blei+ 07],
     • an extension of LDA [Blei+ 03] for modeling topic correlations,
     • with stochastic gradient variational Bayes (SGVB) [Kingma+ 14].
  4. LDA [Blei+ 03]
     • Clustering word tokens by assigning each word token to one of the $K$ topics.
     • $z_{di}$: To which topic is the $i$-th word token in document $d$ assigned? (discrete variables)
     • $\theta_{dk}$: How often is topic $k$ talked about in document $d$? A multinomial distribution for each $d$. (continuous variables)
     • $\phi_{kv}$: How often is word $v$ used to talk about topic $k$? A multinomial distribution for each $k$. (continuous variables)
     (A generative sketch follows below.)
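For intuition, here is a minimal NumPy sketch of LDA's generative process. The sizes K, V, doc_len and the Dirichlet hyperparameters are placeholder assumptions for illustration, not values from the poster.

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, doc_len = 10, 1000, 50            # assumed sizes, for illustration only

alpha, beta = 0.1, 0.01                  # assumed symmetric Dirichlet hyperparameters
phi = rng.dirichlet(beta * np.ones(V), size=K)   # phi[k]: word distribution of topic k

theta_d = rng.dirichlet(alpha * np.ones(K))      # theta_d: topic proportions of one document
z_d = rng.choice(K, size=doc_len, p=theta_d)     # z_di: topic assignment of each token
w_d = np.array([rng.choice(V, p=phi[z]) for z in z_d])  # observed word tokens
```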
  5. CTM [Blei+ 05]
     • Clustering word tokens by assigning each word token to one of the $K$ topics.
     • $z_{di}$: To which topic is the $i$-th word token in document $d$ assigned? (discrete variables)
     • $\theta_{dk}$: How often is topic $k$ talked about in document $d$?
       $\boldsymbol{\theta}_d = f(\boldsymbol{\eta}_d)$ where $\boldsymbol{\eta}_d \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ (logistic normal distribution; see the sketch below)
     • $\phi_{kv}$: How often is word $v$ used to talk about topic $k$? A multinomial distribution for each $k$. (continuous variables)
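The only change from LDA is how $\boldsymbol{\theta}_d$ arises: it is the softmax of a correlated Gaussian draw. A minimal sketch, with $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ as placeholder values:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 10                                   # assumed number of topics

mu = np.zeros(K)                         # placeholder mean
A = rng.standard_normal((K, K))
Sigma = A @ A.T + K * np.eye(K)          # placeholder covariance (positive definite)

eta_d = rng.multivariate_normal(mu, Sigma)       # eta_d ~ N(mu, Sigma)
theta_d = np.exp(eta_d) / np.exp(eta_d).sum()    # theta_d = f(eta_d): softmax map
```

Because $\boldsymbol{\Sigma}$ is a full covariance matrix, topics can co-occur in correlated ways, which LDA's Dirichlet prior cannot express.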
  6. Variational Bayes
     Maximization of the ELBO (evidence lower bound)
     • VB (variational Bayes) approximates the true posterior.
     • An approximate posterior $q(\boldsymbol{z}, \boldsymbol{\Theta})$ is introduced when the ELBO is obtained from the log evidence by Jensen's inequality (see below).
     • $\boldsymbol{z}$: discrete hidden variables (topic assignments)
     • $\boldsymbol{\Theta}$: continuous hidden variables (multinomial parameters)
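The bound itself did not survive extraction from the slide; the following is a standard reconstruction of the Jensen step from the quantities named on the slide (log evidence, approximate posterior $q(\boldsymbol{z}, \boldsymbol{\Theta})$):

```latex
\log p(\boldsymbol{w})
  = \log \int \sum_{\boldsymbol{z}}
      q(\boldsymbol{z}, \boldsymbol{\Theta})\,
      \frac{p(\boldsymbol{w}, \boldsymbol{z}, \boldsymbol{\Theta})}
           {q(\boldsymbol{z}, \boldsymbol{\Theta})}\, d\boldsymbol{\Theta}
  \;\ge\;
  \mathbb{E}_{q(\boldsymbol{z}, \boldsymbol{\Theta})}
    \left[ \log p(\boldsymbol{w}, \boldsymbol{z}, \boldsymbol{\Theta})
           - \log q(\boldsymbol{z}, \boldsymbol{\Theta}) \right]
  \;=\; \mathrm{ELBO}
```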
  7. Factorization assumption
     • We assume the approximate posterior $q(\boldsymbol{z}, \boldsymbol{\Theta})$ factorizes as $q(\boldsymbol{z})\,q(\boldsymbol{\Theta})$.
     • Then the ELBO splits into a discrete part and a continuous part (reconstructed below).
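The slide's own equation was an image and is lost; this is a reconstruction of the standard factorized form. The "discrete" and "continuous" labels on the slide refer to the $q(\boldsymbol{z})$ and $q(\boldsymbol{\Theta})$ terms:

```latex
\mathrm{ELBO}
  = \mathbb{E}_{q(\boldsymbol{z})\,q(\boldsymbol{\Theta})}
      \left[ \log p(\boldsymbol{w}, \boldsymbol{z} \mid \boldsymbol{\Theta}) \right]
    + \mathbb{E}_{q(\boldsymbol{\Theta})}\left[ \log p(\boldsymbol{\Theta}) \right]
    - \mathbb{E}_{q(\boldsymbol{z})}\left[ \log q(\boldsymbol{z}) \right]
    - \mathbb{E}_{q(\boldsymbol{\Theta})}\left[ \log q(\boldsymbol{\Theta}) \right]
```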
  8. SGVB [Kingma+ 14]
     • SGVB (stochastic gradient variational Bayes) is a general framework for estimating the ELBO in VB.
     • SGVB is applicable only to continuous distributions $q(\boldsymbol{\Theta})$.
     • Monte Carlo integration for expectations (sketched below)
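A generic sketch of the Monte Carlo step SGVB relies on; the names f, sample_q, and L are illustrative, and L = 1 is common in practice:

```python
import numpy as np

def mc_expectation(f, sample_q, L=1):
    """Estimate E_q[f(Theta)] by averaging f over L draws from q."""
    return np.mean([f(sample_q()) for _ in range(L)], axis=0)
```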
  9. Reparameterization
     • We use the diagonal logistic normal to approximate the true posterior of $\boldsymbol{\theta}_d$.
     • We can efficiently sample from the logistic normal with reparameterization (sketched below).
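A minimal sketch of the reparameterized draw for the diagonal logistic normal; the variational parameter names m_d (mean) and log_s_d (log standard deviations) are illustrative, not the poster's notation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_theta(m_d, log_s_d):
    """Draw theta_d from a diagonal logistic normal via reparameterization.

    eta_d = m_d + s_d * eps with eps ~ N(0, I): the randomness is isolated
    in eps, so gradients flow through m_d and log_s_d deterministically.
    """
    eps = rng.standard_normal(m_d.shape)
    eta_d = m_d + np.exp(log_s_d) * eps
    eta_d -= eta_d.max()                    # subtract max for numerical stability
    theta_d = np.exp(eta_d) / np.exp(eta_d).sum()
    return theta_d
```

Isolating the randomness in eps is what lets the ELBO gradient pass through the sampling step to the variational parameters.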
 10. Monte Carlo integration
     • The ELBO is estimated with a sample from the approximate posterior.
     • The discrete part $q(\boldsymbol{z})$ is estimated as in the original VB (see the sketch below).
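Given a sampled $\boldsymbol{\theta}_d$, the per-token responsibilities can be computed as in standard VB. A hedged sketch of the textbook form; the poster does not show its exact update, so the details here are an assumption:

```python
import numpy as np

def responsibilities(theta_d, phi, w_d):
    """q(z_di = k) proportional to theta_dk * phi[k, w_di] for each token.

    theta_d: (K,) sampled topic proportions; phi: (K, V) topic-word
    distributions; w_d: (N,) word ids of the document's tokens.
    """
    r = theta_d[None, :] * phi[:, w_d].T    # unnormalized, shape (N, K)
    return r / r.sum(axis=1, keepdims=True)
```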
 11. Parameter updates
     • No explicit matrix inversion is required (only a Cholesky factorization; sketched below).
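One way to realize "no explicit inversion": the Gaussian prior term over $\boldsymbol{\eta}_d$ needs $\boldsymbol{\Sigma}^{-1}$ and $\log|\boldsymbol{\Sigma}|$, and both come out of a single Cholesky factorization. A SciPy sketch under that reading; the poster's actual update equations are not reproduced here:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gaussian_log_density(eta_d, mu, Sigma):
    """log N(eta_d | mu, Sigma) without forming Sigma^{-1} explicitly."""
    c, lower = cho_factor(Sigma)               # one Cholesky factorization of Sigma
    diff = eta_d - mu
    maha = diff @ cho_solve((c, lower), diff)  # diff' Sigma^{-1} diff via triangular solves
    logdet = 2.0 * np.sum(np.log(np.diag(c)))  # log|Sigma| from the factor's diagonal
    K = len(mu)
    return -0.5 * (maha + logdet + K * np.log(2.0 * np.pi))
```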
 12. "Stochastic" gradient
     • The expectation integrals are estimated by the Monte Carlo method.
     • The derivatives of the ELBO therefore depend on the samples.
     • Randomness is incorporated into the maximization of the ELBO.
     • Does this make it easier to avoid local minima?
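The deck notes that only Adam was tested as the optimizer. For concreteness, a generic NumPy Adam step on a noisy gradient, using the common default hyperparameters (an assumption, not the poster's settings):

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update on a noisy gradient (for ascent, pass the gradient of -ELBO)."""
    m = b1 * m + (1 - b1) * grad            # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad**2         # second-moment estimate
    m_hat = m / (1 - b1**t)                 # bias correction, t = step count from 1
    v_hat = v / (1 - b2**t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```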
 13. Data sets
             # docs    # word types
     NYT     149,890   46,650
     MOVIE    27,859   62,408
     NSF     128,818   21,471
     MED     125,490   42,830
 14. Conclusion
     • We incorporate randomness into the posterior inference for the CTM by using SGVB.
     • The proposed method gives perplexities comparable to those achieved by LDA.
 15. Pro/Con
     • Pro: No explicit inversion of the covariance matrix is required.
     • Con: Careful tuning of gradient descent seems to be required.
     • Con: Only Adam was tested.
 16. Future work
     • Online learning for topic models with NNs (neural networks)
     • An NN may achieve a better approximate posterior.
     • SGVB can be used to estimate the ELBO in a similar manner.
     • Document batches can be fed to VB indefinitely.
     • Topic word lists are then updated indefinitely.
