LDA Beginner's Tutorial

Introduction to Latent Dirichlet Allocation (LDA). We cover the basic ideas necessary to understand LDA, then construct the model from its generative process. Intuitions are emphasized, but little guidance is given for fitting the model, which is not very insightful.

Speaker notes

  • Take home: validation is difficult; there is no true answer here.
  • Clustering documents is difficult because many of the same words are reused across documents, and a document may be similar to different documents on different topics, so we may want clustering that allows mixed membership.
  • Word generation is a two-stage process: pick a topic, then pick a word from that topic.
  • Example: the word "professional" probably has higher usage under a "professional network" topic than under a "social network" topic.

Transcript

    1. Latent Dirichlet Allocation (LDA) - for the ML-IR Discussion Group. Prepared by Wayne Tai Lee and Satpreet Singh.
    2. Latent Dirichlet Allocation: A Bayesian Unsupervised Learning Model. Roadmap: • Unsupervised learning • Bayesian statistics • Mixture models • LDA: theory and intuition • LDA: practice and applications.
    3. Unsupervised Learning: learning patterns with no labels. • Clustering is a form of unsupervised learning • Classification is known as supervised learning • Validation is difficult.
    4. How would you cluster?
    5. Documents from Wikipedia. Now try clustering these!
    6. Bayesian Statistics: a framework for updating your beliefs. • Probabilities as beliefs • Update your beliefs as data are observed • Requires a model that describes how the data are generated.
    7. Example: evaluating candidates. Candidate potential.
    8. Example: evaluating candidates. Candidate potential; observed signals: schooling, experience, interview, internship.
    9. Example: evaluating candidates. Candidate potential; observed signals: schooling, experience, interview, internship. How to update?!
    10. (Figure only; no text on this slide.)
    11. Model for candidates / model for data generation.
    12. Mixture Models: a popular statistical model. • An easy way to build hierarchical relationships.
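
       A minimal Python sketch of the two-stage sampling idea behind a mixture model, using the
       candidate example from the slides that follow; the component weights, means, and spreads
       are made-up numbers for illustration, not values from the deck.

       import numpy as np

       rng = np.random.default_rng(0)

       # Hypothetical two-component mixture for "candidate performance":
       # quality is the latent component (high/low), performance is observed.
       weights = np.array([0.3, 0.7])    # P(quality = high), P(quality = low)
       means = np.array([80.0, 55.0])    # mean performance for each quality level
       stds = np.array([5.0, 10.0])      # spread of performance for each quality level

       def sample_candidate():
           quality = rng.choice(2, p=weights)                        # stage 1: pick the latent component
           performance = rng.normal(means[quality], stds[quality])   # stage 2: draw the observation
           return quality, performance

       print([sample_candidate() for _ in range(5)])
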
    13. Mixture models visualized: candidate quality, high vs. low.
    14. Marginal distribution of candidate performance, ignoring quality.
    15. Distribution of candidate performance (figure).
    16. Distribution of candidate performance, with the mixture weights labeled (figure).
    17. Distribution of candidate performance, with the mixture weights labeled (figure).
    18. Distribution of candidate performance: which component did each draw come from? (marked with ?s in the figure)
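
       Written out, the picture in slides 14-18 is just a weighted average of the per-quality
       densities, with a generic weight \pi_k for each quality level (a standard mixture identity,
       not a formula copied from the slides):

       p(\text{performance}) \;=\; \sum_{k \in \{\text{high},\,\text{low}\}} \pi_k \, p(\text{performance} \mid \text{quality} = k),
       \qquad \sum_k \pi_k = 1 .
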
    19. How are words in a document generated?
    20. One possibility: each word comes from a (possibly different) topic (bag of words: word order is ignored).
    21. How are words in a document generated? Each word comes from a (possibly different) topic. θd,k: the mixture weight for topic k in document d; βk: a multinomial distribution over ALL words for topic k.
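
       In symbols (using the standard θ/β notation, since the slide's own symbols did not survive
       extraction), the probability of drawing word w in document d under this mixture is:

       p(w \mid \theta_d, \beta) \;=\; \sum_{k=1}^{K} \theta_{d,k}\, \beta_{k,w},
       \qquad \sum_{k} \theta_{d,k} = 1, \quad \sum_{w} \beta_{k,w} = 1 .
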
    22. Just a mixture model (diagram): a Word is drawn from one of Topic 1 … Topic K, e.g. Leadership, Big Data, Machine Learning.
    23. Just a mixture model (same diagram): 1) Pick a topic, 2) Pick a word.
    24. Just a mixture model (same diagram): the chosen topic is Z.
    25. Just a mixture model (same diagram): so we really want to know 1) Z, 2) θ, 3) β.
    26. Just a mixture model (same diagram): so we really want to know 1) Z (the topic/cluster for each word), 2) θ (the document composition), 3) β (the key words of each topic).
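
       A small Python sketch of slides 22-26, assuming θ (document composition) and β (per-topic
       word distributions) are already known; the topic names, vocabulary, and numbers are
       placeholders, not values from the slides.

       import numpy as np

       rng = np.random.default_rng(1)

       topics = ["Leadership", "Big Data", "Machine Learning"]                  # K = 3 toy topics
       vocab = ["leadership", "team", "hadoop", "spark", "model", "training"]   # toy vocabulary

       theta = np.array([0.5, 0.2, 0.3])   # document composition over the K topics (sums to 1)
       beta = np.array([                   # one row per topic: a distribution over ALL words
           [0.40, 0.40, 0.05, 0.05, 0.05, 0.05],
           [0.05, 0.05, 0.40, 0.40, 0.05, 0.05],
           [0.05, 0.05, 0.05, 0.05, 0.40, 0.40],
       ])

       def generate_word():
           z = rng.choice(len(topics), p=theta)   # 1) pick a topic Z
           w = rng.choice(len(vocab), p=beta[z])  # 2) pick a word from that topic
           return topics[z], vocab[w]

       print([generate_word() for _ in range(8)])
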
    27. Review! Z → W (the word W is drawn given its topic Z).
    28. In plate notation: Zd,n and Wd,n, with k = 1…K, n = 1,…,Nd, d = 1,…,D. K: number of topics; Nd: number of words in document d; D: number of documents.
    29. Same plate diagram. Bayesian: but what about the distributions for θ and β?
    30. Same plate diagram. Bayesian: but what about the distributions for θ and β?
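
       The answer on the next slide is to put Dirichlet priors on both sets of multinomial
       weights; written here with a common choice of hyperparameter symbols (α for documents,
       η for topics), since the slides' own symbols were lost in extraction:

       \theta_d \sim \mathrm{Dirichlet}(\alpha), \quad d = 1, \dots, D
       \qquad\qquad
       \beta_k \sim \mathrm{Dirichlet}(\eta), \quad k = 1, \dots, K
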
    31. α and η control the “sparsity” of the weights for the multinomials. Implications: a priori we assume that topics have only a few key words and that documents contain only a small subset of topics.
    32. Dirichlet distribution with different sparsity parameters (figure).
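
       A quick way to reproduce the effect in slide 32: Dirichlet draws with concentration below 1
       put most of their mass on a few components, while concentration above 1 spreads mass nearly
       evenly (the dimensions and parameter values here are arbitrary).

       import numpy as np

       rng = np.random.default_rng(2)
       K = 10  # number of components (e.g. topics)

       sparse = rng.dirichlet(np.full(K, 0.1), size=3)    # concentration < 1: sparse weight vectors
       dense = rng.dirichlet(np.full(K, 10.0), size=3)    # concentration > 1: near-uniform weight vectors

       np.set_printoptions(precision=2, suppress=True)
       print("concentration 0.1:\n", sparse)   # a few large entries per row, the rest near zero
       print("concentration 10:\n", dense)     # every entry close to 1/K = 0.1
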
    33. Latent Dirichlet Allocation!!! The full plate diagram: Zd,n, Wd,n; k = 1…K; n = 1,…,Nd.
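
       Putting the pieces together, the generative process the plate diagram encodes is (again
       using α and η for the Dirichlet hyperparameters):

       \begin{aligned}
       &\beta_k \sim \mathrm{Dirichlet}(\eta), && k = 1, \dots, K \\
       &\theta_d \sim \mathrm{Dirichlet}(\alpha), && d = 1, \dots, D \\
       &z_{d,n} \sim \mathrm{Categorical}(\theta_d), && n = 1, \dots, N_d \\
       &w_{d,n} \sim \mathrm{Categorical}(\beta_{z_{d,n}})
       \end{aligned}
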
    34. How do we fit this model? We want the posterior p(θ, β, Z | W). The worst part of Bayesian analysis… personally speaking~
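
       Concretely (reconstructed from the standard LDA setup, since the slide's formula did not
       survive extraction), the posterior and the denominator that makes it intractable are:

       p(\theta, \beta, Z \mid W, \alpha, \eta)
         = \frac{p(\theta, \beta, Z, W \mid \alpha, \eta)}{p(W \mid \alpha, \eta)},
       \qquad
       p(W \mid \alpha, \eta) = \int\!\!\int \sum_{Z} p(\theta, \beta, Z, W \mid \alpha, \eta)\, d\theta\, d\beta .
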
    35. Two main ways to get the posterior: sampling methods (asymptotically correct, but time-consuming, with lots of black magic in the sampling tricks) and variational methods, the practical solution (an approximation with no guarantees, but faster; requires math skills).
    36. Variational Bayes (specifically mean-field variational Bayes). What’s crazy? It assumes all the latent variables are independent. What’s not crazy? It finds the “best” model within this crazy class, where “best” means closest under KL divergence. Empirically it has shown promising results! For “sufficient” details, see “Explaining Variational Approximations” by Ormerod and Wand.
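
       In symbols, the mean-field assumption and the objective look like this (a standard
       statement of the method, not copied from the slides):

       q(\theta, \beta, Z) = \prod_{k} q(\beta_k) \prod_{d} q(\theta_d) \prod_{d,n} q(z_{d,n}),
       \qquad
       q^{*} = \arg\min_{q} \, \mathrm{KL}\!\left( q(\theta, \beta, Z) \,\|\, p(\theta, \beta, Z \mid W) \right)
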
    37. LDA take home: an intuitively appealing Bayesian unsupervised learning model. Training is difficult: lots of packages exist, and the main issue is scalability. Validation is difficult: it is usually cast into a supervised learning framework. Presentation is difficult: visualization for the Bayesian model is hard.
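
       To make the “lots of packages exist” point concrete, here is a minimal sketch of fitting
       LDA with scikit-learn, whose LatentDirichletAllocation estimator uses (online) variational
       Bayes; the toy documents and parameter choices are illustrative only, not from the slides.

       from sklearn.feature_extraction.text import CountVectorizer
       from sklearn.decomposition import LatentDirichletAllocation

       docs = [
           "machine learning model training data",
           "big data hadoop spark cluster",
           "leadership team management hiring",
       ]

       # Bag-of-words counts (word order is ignored, matching the slides).
       vectorizer = CountVectorizer()
       X = vectorizer.fit_transform(docs)

       lda = LatentDirichletAllocation(n_components=2, random_state=0)  # K = 2 topics
       doc_topic = lda.fit_transform(X)   # theta-like document compositions (rows sum to ~1)
       topic_word = lda.components_       # unnormalized beta-like topic-word weights

       words = vectorizer.get_feature_names_out()
       print(doc_topic.round(2))
       for k, row in enumerate(topic_word):
           top = row.argsort()[::-1][:3]
           print(f"topic {k}:", [words[i] for i in top])
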