# LDA Beginner's Tutorial

An introduction to Latent Dirichlet Allocation (LDA). We cover the basic ideas needed to understand LDA, then construct the model from its generative process. Intuitions are emphasized, but little guidance is given on fitting the model, since the fitting procedure itself is not very insightful.

• Take-home: validation is difficult because there is no true answer to compare against.
• Clustering documents is difficult because many of the same words are repeated across documents. Some documents may be similar to one another on different topics, so we might want a clustering that allows mixed membership.
• Example: the word “professional” is probably used more often in a topic about professional networks than in a topic about social networks.

2. ©2013 LinkedIn Corporation. All Rights Reserved. Latent Dirichlet Allocation: A Bayesian Unsupervised Learning Model. Roadmap: • Unsupervised learning • Bayesian statistics • Mixture models • LDA – theory and intuition • LDA – practice and applications
3. Unsupervised Learning: learning patterns with no labels • Clustering is a form of “unsupervised learning” • Classification is known as supervised learning • Validation is difficult
21. How are words in a document generated? Each word comes from different topics • Mixture weight for topic k • Multinomial distribution over ALL words based on topic k
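
The word-generation step on slide 21 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the deck's own code: the names `theta` (mixture weights over topics), `beta` (one distribution over the whole vocabulary per topic), the two toy topics, and the four-word vocabulary are all hypothetical and chosen to echo the “professional” example above.

```python
import random


def generate_document(theta, beta, n_words, rng):
    """Sample (topic, word) ids for one document from a topic mixture.

    theta: list of K mixture weights over topics (sums to 1).
    beta:  K lists; beta[k] is topic k's distribution over ALL words.
    """
    topics, words = [], []
    for _ in range(n_words):
        # Each word first picks its own topic using the mixture weights...
        k = rng.choices(range(len(theta)), weights=theta)[0]
        # ...then is drawn from that topic's distribution over the vocabulary.
        w = rng.choices(range(len(beta[k])), weights=beta[k])[0]
        topics.append(k)
        words.append(w)
    return topics, words


def word_probabilities(theta, beta):
    """Marginal probability of each word: sum over k of theta[k] * beta[k][w]."""
    vocab_size = len(beta[0])
    return [
        sum(theta[k] * beta[k][w] for k in range(len(theta)))
        for w in range(vocab_size)
    ]


# Hypothetical toy setup: 2 topics and a 4-word vocabulary.
vocab = ["professional", "resume", "friend", "party"]
beta = [
    [0.50, 0.40, 0.05, 0.05],  # topic 0: "professional network"
    [0.05, 0.05, 0.50, 0.40],  # topic 1: "social network"
]
theta = [0.7, 0.3]  # this document is mostly about topic 0

topics, words = generate_document(theta, beta, n_words=10, rng=random.Random(0))
print([vocab[w] for w in words])
print(word_probabilities(theta, beta))
```

Here the mixture weights are fixed by hand to keep the sketch short; in the full LDA model each document's weights are themselves drawn from a Dirichlet prior.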