A Simple Stochastic Gradient Variational Bayes for the Correlated Topic Model

Tomonari Masada

poster presentation APWeb 2016 @ Suzhou, China

A Simple SGVB (Stochastic Gradient Variational Bayes) for the CTM (Correlated Topic Model)
Tomonari MASADA (正田备也)
Nagasaki University (长崎大学)
masada@nagasaki-u.ac.jp
APWeb 2016 @ Suzhou

Aim
•Make an informative summary of large document sets by extracting word lists, each relating to a different and particular topic.
→ Topic modeling

Contribution
•We propose a new posterior estimation method for the correlated topic model (CTM) [Blei+ 07],
•an extension of LDA [Blei+ 03] for modeling topic correlations,
•with stochastic gradient variational Bayes (SGVB) [Kingma+ 14].

LDA [Blei+ 03]
•Clustering word tokens by assigning each word token to one among the $K$ topics.
• $z_{di}$ (discrete variable): To which topic is the $i$-th word token in document $d$ assigned?
• $\theta_{dk}$ (continuous variable): How often is topic $k$ talked about in document $d$?
• Multinomial distribution for each $d$
• $\phi_{kv}$ (continuous variable): How often is word $v$ used to talk about topic $k$?
• Multinomial distribution for each $k$

CTM [Blei+ 05]
•Clustering word tokens by assigning each word token to one among the $K$ topics.
• $z_{di}$ (discrete variable): To which topic is the $i$-th word token in document $d$ assigned?
• $\theta_{dk}$ (continuous variable): How often is topic $k$ talked about in document $d$?
• $\boldsymbol{\theta}_d = f(\boldsymbol{\eta}_d)$ where $\boldsymbol{\eta}_d \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ (logistic normal distribution)
• $\phi_{kv}$ (continuous variable): How often is word $v$ used to talk about topic $k$?
• Multinomial distribution for each $k$
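
The generative side of the CTM described above can be sketched in a few lines. This is only an illustration under stated assumptions: $f$ is taken to be the softmax map (as in the standard CTM), the covariance $\boldsymbol{\Sigma}$ is supplied through a hypothetical lower-triangular factor `chol` with $\boldsymbol{\Sigma} = LL^\top$, and the toy values of `mu`, `chol`, and `phi` are invented for the example.

```python
import math
import random


def softmax(eta):
    """Map a real vector to the probability simplex (assumed form of f)."""
    m = max(eta)
    exps = [math.exp(e - m) for e in eta]
    total = sum(exps)
    return [e / total for e in exps]


def sample_topic_proportions(mu, chol, rng):
    """Draw eta ~ N(mu, L L^T) as mu + L eps, then return softmax(eta)."""
    n_topics = len(mu)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n_topics)]
    eta = [
        mu[k] + sum(chol[k][j] * eps[j] for j in range(k + 1))
        for k in range(n_topics)
    ]
    return softmax(eta)


def generate_document(mu, chol, phi, n_tokens, rng):
    """Sample theta_d, then a topic z and a word w for every token."""
    theta = sample_topic_proportions(mu, chol, rng)
    topics = list(range(len(mu)))
    vocab = list(range(len(phi[0])))
    tokens = []
    for _ in range(n_tokens):
        z = rng.choices(topics, weights=theta)[0]   # topic assignment
        w = rng.choices(vocab, weights=phi[z])[0]   # word drawn from topic z
        tokens.append((z, w))
    return theta, tokens


rng = random.Random(0)
mu = [0.0, 0.0, 0.0]
# Toy factor: topics 0 and 1 are positively correlated, topic 2 is independent.
chol = [[1.0, 0.0, 0.0],
        [0.8, 0.6, 0.0],
        [0.0, 0.0, 1.0]]
phi = [[0.7, 0.1, 0.1, 0.1],
       [0.1, 0.7, 0.1, 0.1],
       [0.1, 0.1, 0.1, 0.7]]
theta, tokens = generate_document(mu, chol, phi, n_tokens=10, rng=rng)
```

The off-diagonal entry of `chol` is what distinguishes this from LDA: a Dirichlet prior cannot express that two topics tend to appear together, while a full covariance can.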

Variational Bayes
Maximization of ELBO (evidence lower bound)
•VB (variational Bayes) approximates the true posterior.
•An approximate posterior $q(\boldsymbol{z}, \boldsymbol{\Theta})$ is introduced when ELBO, a lower bound on the log evidence, is obtained by Jensen's inequality.
• $\boldsymbol{z}$: discrete hidden variables (topic assignments)
• $\boldsymbol{\Theta}$: continuous hidden variables (multinomial parameters)
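
For reference, the bound can be written out as follows. This is the standard textbook form, with $\boldsymbol{x}$ standing for the observed word tokens (a symbol assumed here), and its notation may differ from the equation on the original slide.

```latex
\log p(\boldsymbol{x})
  = \log \sum_{\boldsymbol{z}} \int q(\boldsymbol{z}, \boldsymbol{\Theta})
      \frac{p(\boldsymbol{x}, \boldsymbol{z}, \boldsymbol{\Theta})}
           {q(\boldsymbol{z}, \boldsymbol{\Theta})} \, d\boldsymbol{\Theta}
  \geq \sum_{\boldsymbol{z}} \int q(\boldsymbol{z}, \boldsymbol{\Theta})
      \log \frac{p(\boldsymbol{x}, \boldsymbol{z}, \boldsymbol{\Theta})}
                {q(\boldsymbol{z}, \boldsymbol{\Theta})} \, d\boldsymbol{\Theta}
```

The left-hand side is the log evidence, and the right-hand side is the ELBO. The inequality follows because the logarithm is concave.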

Factorization assumption
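•We assume the approximate posterior $q(\boldsymbol{z}, \boldsymbol{\Theta})$ factorizes as $q(\boldsymbol{z}) \times q(\boldsymbol{\Theta})$, a discrete factor times a continuous factor.
•Then ELBO can be written with separate terms for $q(\boldsymbol{z})$ (discrete) and $q(\boldsymbol{\Theta})$ (continuous).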
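
Under this factorization, the ELBO takes the following generic form. It is a sketch in assumed notation ($\boldsymbol{x}$ for the observed word tokens, $\mathcal{L}$ for the ELBO), not a transcription of the slide's equation.

```latex
\mathcal{L}
  = \mathbb{E}_{q(\boldsymbol{z})\, q(\boldsymbol{\Theta})}
      \left[ \log p(\boldsymbol{x}, \boldsymbol{z}, \boldsymbol{\Theta}) \right]
  - \mathbb{E}_{q(\boldsymbol{z})} \left[ \log q(\boldsymbol{z}) \right]
  - \mathbb{E}_{q(\boldsymbol{\Theta})} \left[ \log q(\boldsymbol{\Theta}) \right]
```

The second term involves only the discrete factor and the third only the continuous one, which is what allows the two parts to be handled by different estimation techniques.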

SGVB [Kingma+ 14]
•SGVB (stochastic gradient variational Bayes) is a general framework for estimating ELBO in VB.
•SGVB is only applicable to continuous distributions $q(\boldsymbol{\Theta})$.
•Monte Carlo integration is used for the expectations.

Reparameterization
•We use the diagonal logistic normal for approximating the true posterior of $\boldsymbol{\theta}_d$.
•We can efficiently sample from the logistic normal with reparameterization.
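
A minimal sketch of such a reparameterized draw, assuming the diagonal logistic normal is parameterized by a mean vector and a vector of log standard deviations (the names `mean` and `log_std` are hypothetical, as is the choice of softmax for the logistic map):

```python
import math
import random


def softmax(eta):
    """Numerically stable softmax onto the probability simplex."""
    m = max(eta)
    exps = [math.exp(e - m) for e in eta]
    total = sum(exps)
    return [e / total for e in exps]


def sample_logistic_normal(mean, log_std, rng):
    """Reparameterized draw: eta = mean + exp(log_std) * eps, eps ~ N(0, I).

    The noise eps does not depend on the variational parameters, so the
    returned theta is a deterministic, differentiable function of
    (mean, log_std) once eps is fixed.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in mean]
    eta = [m + math.exp(s) * e for m, s, e in zip(mean, log_std, eps)]
    return softmax(eta), eps


rng = random.Random(1)
mean = [0.5, -0.2, 0.0]
log_std = [-1.0, -1.0, -1.0]
theta, eps = sample_logistic_normal(mean, log_std, rng)
```

Because the covariance is diagonal, each coordinate needs only one scalar multiplication, which is why sampling is cheap.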

Monte Carlo integration
•ELBO is estimated with a sample from the approximate posterior.
• The discrete part $q(\boldsymbol{z})$ is estimated as in the original VB.
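
As an illustration of Monte Carlo integration here, the sketch below estimates an expectation of the kind that typically appears in a topic-model ELBO, $\mathbb{E}_q[\log \theta_{dk}]$, from samples of a diagonal logistic normal. Which expectations the poster actually estimates this way is not stated in the transcript, so the target quantity and the function name `mc_expected_log_theta` are assumptions.

```python
import math
import random


def log_softmax(eta):
    """Return log(softmax(eta)) computed stably via log-sum-exp."""
    m = max(eta)
    log_norm = m + math.log(sum(math.exp(e - m) for e in eta))
    return [e - log_norm for e in eta]


def mc_expected_log_theta(mean, std, n_samples, rng):
    """Average log(theta_k) over n_samples reparameterized draws."""
    totals = [0.0] * len(mean)
    for _ in range(n_samples):
        eta = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]
        for k, value in enumerate(log_softmax(eta)):
            totals[k] += value
    return [t / n_samples for t in totals]


rng = random.Random(2)
# A single sample, as on the slide: noisy but cheap.
one_sample = mc_expected_log_theta([0.0, 0.0, 0.0], [0.5, 0.5, 0.5], 1, rng)
# Sanity check: with zero variance the estimate is exactly log(1/3) per topic.
no_noise = mc_expected_log_theta([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], 1, rng)
```

Using more samples per step lowers the variance of the estimate at a proportional cost in computation.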

Parameter updates
No explicit inversion of the covariance matrix (only Cholesky factorization)
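
The update equations themselves are not part of this transcript. The general trick of avoiding an explicit inverse can still be illustrated: to obtain $\boldsymbol{\Sigma}^{-1}\boldsymbol{b}$, factor $\boldsymbol{\Sigma} = LL^\top$ and solve two triangular systems. The sketch below is a plain-Python version of that idea with a made-up 3×3 matrix; it does not claim to reproduce the poster's updates.

```python
import math


def cholesky(a):
    """Return lower-triangular L with a = L L^T (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def solve_with_cholesky(a, b):
    """Solve a x = b without ever forming the inverse of a."""
    n = len(a)
    low = cholesky(a)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T x = y
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


sigma = [[4.0, 2.0, 0.0],
         [2.0, 3.0, 1.0],
         [0.0, 1.0, 2.0]]
b = [2.0, 1.0, 3.0]
x = solve_with_cholesky(sigma, b)
residual = [sum(sigma[i][j] * x[j] for j in range(3)) - b[i] for i in range(3)]
```

The same factor also gives the log-determinant as twice the sum of the logs of its diagonal entries, so one factorization can serve both purposes.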

"Stochastic" gradient
•The expectation integrals are estimated by the Monte Carlo method.
•The derivatives of ELBO depend on the samples.
•Randomness is thus incorporated into the maximization of ELBO.
•Does this make it easier to avoid poor local optima?
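
To make the role of the samples concrete, here is a toy example that is deliberately unrelated to the CTM objective: it maximizes $\mathbb{E}_{\epsilon \sim N(0,1)}[-(\mu + \epsilon - 3)^2]$ over $\mu$ with a one-sample reparameterized gradient per step. The objective, step size, and step count are all assumptions chosen for illustration.

```python
import random


def noisy_gradient(mu, rng):
    """One-sample gradient of -(mu + eps - 3)^2 with respect to mu."""
    eps = rng.gauss(0.0, 1.0)
    return -2.0 * (mu + eps - 3.0)


def stochastic_gradient_ascent(mu0, learning_rate, n_steps, rng):
    """Plain stochastic gradient ascent; every step uses a fresh sample."""
    mu = mu0
    for _ in range(n_steps):
        mu += learning_rate * noisy_gradient(mu, rng)
    return mu


rng = random.Random(3)
mu_final = stochastic_gradient_ascent(mu0=0.0, learning_rate=0.01,
                                      n_steps=3000, rng=rng)
```

The exact maximizer is $\mu = 3$. With a constant step size the iterate does not settle exactly there but keeps fluctuating around it, which is the randomness the slide refers to.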

Data sets
         # docs    # word types
NYT      149,890   46,650
MOVIE     27,859   62,408
NSF      128,818   21,471
MED      125,490   42,830

Conclusion
•We incorporate randomness into the posterior inference for the CTM by using SGVB.
•The proposed method gives perplexities comparable to those achieved by LDA.
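
For readers unfamiliar with the metric, perplexity is commonly defined as the exponential of the negative mean per-token log-likelihood. The sketch below uses that common definition; how the poster's perplexities were computed (for example, which held-out protocol was used) is not stated in the transcript.

```python
import math


def perplexity(token_log_probs):
    """exp(-mean log-likelihood per token), with natural logarithms."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# A model that assigns probability 1/8 to every token has perplexity close to 8.
uniform = perplexity([math.log(1.0 / 8.0)] * 5)
```

Lower values are better: a perplexity of 8 means the model is, on average, as uncertain as a uniform choice among 8 words.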

Pro/Con
•Pro: No explicit inversion of the covariance matrix is required.
•Con: Careful tuning of gradient descent seems to be required.
•Only Adam was tested.
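
For context, a scalar version of the Adam update rule is sketched below, written as an ascent step because the ELBO is maximized. The hyperparameters are the commonly cited defaults plus an arbitrary learning rate, not the settings used for the poster, and the quadratic objective is a stand-in chosen for illustration.

```python
import math


def adam_ascent(grad_fn, x0, n_steps, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """Maximize a scalar objective with Adam, given its gradient function."""
    x = x0
    m = 0.0  # running mean of gradients
    v = 0.0  # running mean of squared gradients
    for t in range(1, n_steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        x += lr * m_hat / (math.sqrt(v_hat) + eps)  # plus sign: ascent
    return x


# Toy objective -(x - 3)^2, whose gradient is -2 (x - 3).
x_final = adam_ascent(lambda x: -2.0 * (x - 3.0), x0=0.0, n_steps=2000)
```

The learning rate and the two decay rates are the knobs that "careful tuning" would typically involve.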

Future work
•Online learning for topic models with neural networks (NN)
•NN may achieve a better approximate posterior.
•SGVB can be used to estimate ELBO in a similar manner.
•Document batches can be fed to VB indefinitely.
•Topic word lists are then updated indefinitely.

