# LDA Beginner's Tutorial

Introduction to Latent Dirichlet Allocation (LDA). We cover the basic ideas needed to understand LDA and then construct the model from its generative process. Intuitions are emphasized; little guidance is given on fitting the model, which is not very insightful.

Published in: Education, Technology

• Take home: validation is difficult; there is no true answer here.
• Clustering documents is difficult because the same words are reused across many documents, and some documents may be similar to one another on different topics. So we might want a clustering that allows each document partial membership in several clusters.
• Two-stage process: first pick a topic, then pick a word from that topic.
• Example: the word “professional” is probably used more often in a “professional network” topic than in a “social network” topic.

2. Latent Dirichlet Allocation: A Bayesian Unsupervised Learning Model (©2013 LinkedIn Corporation. All Rights Reserved.)
Roadmap:
• Unsupervised learning
• Bayesian statistics
• Mixture models
• LDA – theory and intuition
• LDA – practice and applications

3. Unsupervised Learning: learning patterns with no labels
• Clustering is a form of unsupervised learning.
• Classification, by contrast, is supervised learning: it learns from labeled examples.
• Validation is difficult, because without labels there is no ground truth to check against.
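As the speaker notes suggest, plain clustering puts each document in exactly one group, while LDA lets a document belong to several topics at once. The Python sketch below contrasts the two views on a hypothetical four-word vocabulary with two made-up topics echoing the notes' “professional network” vs “social network” example; the vocabulary, the probabilities, and the helper names are illustrative assumptions, not values from the slides. Under hard clustering a document's word probabilities are those of its single topic; under mixed membership they are the weighted average p(w | d) = Σ_k θ_k · p(w | topic k).

```python
# Hypothetical toy vocabulary and topics (illustrative values, not from the slides).
VOCAB = ["professional", "career", "friends", "photos"]

# Each topic is a probability distribution over the whole vocabulary.
TOPICS = {
    "professional_network": [0.4, 0.4, 0.1, 0.1],
    "social_network": [0.1, 0.1, 0.4, 0.4],
}


def hard_cluster_word_probs(topic):
    """Hard clustering: the document belongs to exactly one topic."""
    return dict(zip(VOCAB, TOPICS[topic]))


def mixed_membership_word_probs(weights):
    """Mixed membership: p(w | d) = sum over k of weights[k] * p(w | topic k)."""
    return {
        word: sum(theta_k * TOPICS[k][i] for k, theta_k in weights.items())
        for i, word in enumerate(VOCAB)
    }


if __name__ == "__main__":
    hard = hard_cluster_word_probs("professional_network")
    mixed = mixed_membership_word_probs(
        {"professional_network": 0.7, "social_network": 0.3}
    )
    for word in VOCAB:
        print(f"{word:>12}  hard={hard[word]:.2f}  mixed={mixed[word]:.2f}")
```

With weights 0.7 and 0.3, “professional” gets probability 0.7 × 0.4 + 0.3 × 0.1 = 0.31, which lies between its probability under the professional-network topic (0.4) and under the social-network topic (0.1).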

21. How are words in a document generated?
Each word can come from a different topic.
• Mixture weight for topic k
• Multinomial distribution over ALL words, based on topic k
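The two-stage process behind this slide (pick a topic for each word using the document's mixture weights, then pick the word from that topic's multinomial distribution) can be sketched in Python as follows. This is a minimal illustration, assuming the standard LDA priors: a Dirichlet(α) draw for the document's topic weights θ and, in the usage block, a Dirichlet(β) draw for each topic's word distribution. The Dirichlet draw uses the usual normalized-Gamma construction. The function names, the toy vocabulary, and the α and β values are hypothetical. The sketch only generates data; it does not fit the model.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(n_words, alpha, phi, rng):
    """Sample one document from the LDA generative process (illustrative sketch).

    alpha: Dirichlet prior over topic weights, one entry per topic (assumed).
    phi:   phi[k] is topic k's distribution over the vocabulary.
    Returns the document's topic weights, per-word topics, and word ids.
    """
    n_topics, vocab_size = len(phi), len(phi[0])
    # Per-document mixture weights over topics: theta ~ Dirichlet(alpha).
    theta = sample_dirichlet(alpha, rng)
    topics, words = [], []
    for _ in range(n_words):
        # Stage 1: choose a topic for this word, z ~ Multinomial(theta).
        z = rng.choices(range(n_topics), weights=theta)[0]
        # Stage 2: choose the word from that topic's distribution, w ~ Multinomial(phi[z]).
        w = rng.choices(range(vocab_size), weights=phi[z])[0]
        topics.append(z)
        words.append(w)
    return theta, topics, words


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    vocab = ["professional", "career", "friends", "photos"]  # hypothetical
    n_topics = 2
    beta = [0.1] * len(vocab)  # hypothetical topic-word prior
    # Each topic's word distribution is itself drawn from Dirichlet(beta).
    phi = [sample_dirichlet(beta, rng) for _ in range(n_topics)]
    theta, topics, words = generate_document(8, [0.5] * n_topics, phi, rng)
    print("topic weights:", [round(t, 2) for t in theta])
    print([(vocab[w], z) for w, z in zip(words, topics)])
```

Keeping the topic draw and the word draw as two separate steps mirrors the two annotated terms on the slide: θ supplies the mixture weight for topic k, and phi[k] is the multinomial over all words for that topic.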