2. Each word comes from different topics (bag of words: ignore order)
[Figure: each topic k has a mixture weight, and a multinomial distribution over ALL words for topic k]
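One way to write down what this figure encodes (θ and φ are my notation, not from the slide): the probability of word w in document d mixes the K topic multinomials with document-specific weights.

```latex
p(w \mid d) \;=\; \sum_{k=1}^{K}
  \underbrace{\theta_{d,k}}_{\text{mixture weight for topic } k}\,
  \underbrace{\phi_{k,w}}_{\text{multinomial over all words, topic } k}
```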
3. It is a mixture model
[Figure: K topic-word multinomials, Topic 1 ... Topic K, each a distribution over the vocabulary: data, love, date, life, computer, java]
4. It is a mixture model
[Figure: generating the documents "Big Data" and "Machine Learning" from Topic 1 ... Topic K. For each word: 1) pick a topic, 2) pick a word from that topic's distribution over data, love, date, life, computer, java. A sketch of this two-step process follows below.]
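A minimal sketch of this two-step generative process in Python (the topic-word probabilities and mixture weights below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["data", "love", "date", "life", "computer", "java"]
K = 2  # number of topics

# Topic-word multinomials (each row sums to 1); values are illustrative only.
phi = np.array([
    [0.4, 0.05, 0.05, 0.1, 0.2, 0.2],        # Topic 1: "tech" words dominate
    [0.05, 0.35, 0.3, 0.25, 0.025, 0.025],   # Topic K: "romance" words dominate
])

def generate_document(theta, n_words):
    """theta: per-document mixture weights over the K topics."""
    words = []
    for _ in range(n_words):
        z = rng.choice(K, p=theta)          # 1) pick a topic
        w = rng.choice(vocab, p=phi[z])     # 2) pick a word from that topic
        words.append(w)
    return words

print(generate_document(theta=np.array([0.9, 0.1]), n_words=8))
```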
5. It is a mixture model
[Figure: the same mixture, now with the latent variable made explicit: the chosen topic Z selects which multinomial (Topic 1 ... Topic K) generates a word from data, love, date, life, computer, java]
6. It is a mixture model
[Figure: for the documents "Big Data" and "Machine Learning", each word has its own chosen topic Z, which selects the topic (Topic 1 ... Topic K) it is drawn from: data, love, date, life, computer, java]
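With the latent topic Z made explicit (θ and φ as in the earlier formula, again my notation):

```latex
p(w \mid d) = \sum_{z=1}^{K} p(Z = z \mid d)\, p(w \mid Z = z),
\quad\text{where } p(Z = z \mid d) = \theta_{d,z}, \;\; p(w \mid Z = z) = \phi_{z,w}.
```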
16. The Dirichlet hyperparameters (α for the document-topic weights, β for the topic-word weights) control the "sparsity" of the weights for the multinomials; a small demo follows the list below.
Implications: a priori we assume
- Topics have few key words
- Documents only have a small subset of topics
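A minimal demo of this sparsity effect (the concentration values here are arbitrary): small Dirichlet parameters concentrate mass on a few components, large ones spread it out.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 10  # number of topics

# Small concentration -> sparse weights (a few topics dominate).
sparse = rng.dirichlet(alpha=np.full(K, 0.1))
# Large concentration -> near-uniform weights.
dense = rng.dirichlet(alpha=np.full(K, 10.0))

print(np.round(sparse, 2))  # most entries near 0, one or two large
print(np.round(dense, 2))   # all entries near 1/K
```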
21. The numerator can be computed by summing the joint distribution over every possible instantiation of the hidden topic structure. However, the number of possible topic structures grows exponentially with the document length, so this brute-force method is not feasible; the sum is written out below.
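Concretely (my notation, assuming N words and K topics): marginalizing over every topic assignment z = (z_1, ..., z_N) requires K^N terms.

```latex
p(\mathbf{w} \mid \alpha, \beta)
  = \sum_{\mathbf{z}} p(\mathbf{w}, \mathbf{z} \mid \alpha, \beta)
  = \sum_{z_1=1}^{K} \cdots \sum_{z_N=1}^{K} p(\mathbf{w}, \mathbf{z} \mid \alpha, \beta)
\qquad (K^N \text{ terms})
```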
22. Two main ways to get the posterior (a minimal example of each follows the list):
- Sampling methods
  - Asymptotically correct
  - Time consuming
- Variational methods
  - Faster
  - Need math skills
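A minimal sketch of the sampling route: a collapsed Gibbs sampler for LDA (hyperparameter values and iteration count are arbitrary choices for illustration, not tuned).

```python
import numpy as np

def gibbs_lda(docs, V, K, alpha=0.1, beta=0.01, iters=200, seed=0):
    """docs: list of lists of word ids in [0, V). Returns count matrices
    from which the doc-topic and topic-word distributions are estimated."""
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), K))  # doc-topic counts
    nkw = np.zeros((K, V))          # topic-word counts
    nk = np.zeros(K)                # total words assigned to each topic
    z = [rng.integers(K, size=len(doc)) for doc in docs]  # random init
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # Collapsed conditional: p(z_i = k | z_-i, w) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return ndk, nkw
```

For the variational route, off-the-shelf packages exist; gensim's LdaModel, for example, implements online variational Bayes.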
23. Summary:
- An intuitively appealing Bayesian unsupervised learning model
- Training is difficult
  - Lots of packages exist; the main issue is scalability
- Validation is difficult
  - Usually cast into a supervised learning framework (a sketch follows this slide)
- Presentation is difficult
  - Visualization for the Bayesian model is hard
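A minimal sketch of that supervised validation idea (the corpus and labels are toy placeholders): fit LDA, use per-document topic proportions as features, and score a downstream classifier.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Toy corpus and labels, purely illustrative.
docs = ["data computer java data", "love date life love",
        "java computer data code", "life love date romance"] * 5
labels = [0, 1, 0, 1] * 5

# Topic proportions become features for a classifier; classification
# accuracy then serves as an indirect validation of the learned topics.
pipeline = make_pipeline(
    CountVectorizer(),
    LatentDirichletAllocation(n_components=2, random_state=0),
    LogisticRegression(),
)
print(cross_val_score(pipeline, docs, labels, cv=5).mean())
```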
24. Reference:
Blei, D., Ng, A., Jordan, M. Latent Dirichlet allocation. J. Mach. Learn. Res. 3 (January 2003), 993–1022.