
LDA and Its Applications



LDA is usually used for topic modelling.

Published in: Data & Analytics


  1. LDA and Its Applications (AI HACKERS)
  2. What is LDA?
     - LDA stands for latent Dirichlet allocation.
     - It models each topic k as a probability distribution over words, and each document d as a probability distribution over topics (for example, K = 50 topics across D = 5000 documents).
     - Mechanism: it uses a special kind of distribution called the Dirichlet distribution, which is the multivariate generalization of the Beta distribution.
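The two kinds of distribution on this slide can be drawn directly from a Dirichlet with NumPy. This is a minimal sketch: K = 50 and D = 5000 come from the slide, while the vocabulary size V and the symmetric concentration values are hypothetical choices for illustration. The last lines check the Beta connection: with two components, the first coordinate of a Dirichlet(a, b) sample follows Beta(a, b).

```python
import numpy as np

# K topics and D documents follow the slide; V, alpha and beta are assumptions.
K, D, V = 50, 5000, 1000
alpha, beta = 0.1, 0.1  # hypothetical symmetric Dirichlet concentrations

rng = np.random.default_rng(0)

# phi[k] is topic k's probability distribution over the V vocabulary words.
phi = rng.dirichlet(np.full(V, beta), size=K)      # shape (K, V)

# theta[d] is document d's probability distribution over the K topics.
theta = rng.dirichlet(np.full(K, alpha), size=D)   # shape (D, K)

print(phi.shape, theta.shape)   # (50, 1000) (5000, 50)

# With two components the Dirichlet reduces to the Beta distribution:
# the first coordinate of Dir(2, 5) is distributed as Beta(2, 5).
pair = rng.dirichlet([2.0, 5.0], size=100_000)
print(pair[:, 0].mean())        # close to the Beta(2, 5) mean, 2 / 7
```

Small concentration values push each row toward a few dominant entries, which is why sparse priors are a common choice when each document should cover only a handful of topics.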
  3. LDA in Layman Terms
     Sentence 1: I spent the evening watching football.
     Sentence 2: I ate nachos and guacamole.
     Sentence 3: I spent the evening watching football while eating nachos and guacamole.
     LDA might say something like:
     - Sentence 1 is 100% Topic 1.
     - Sentence 2 is 100% Topic 2.
     - Sentence 3 is 65% Topic 1 and 35% Topic 2.
     It also describes each topic: Topic 1 is about football (50%) and evening (50%); Topic 2 is about nachos (50%) and guacamole (50%).
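The numbers in the layman example combine in a simple way: the probability of a word in a document is the document's topic weights multiplied by each topic's word probabilities, summed over topics. The sketch below uses the slide's illustrative percentages for the third sentence; the dictionary layout and the function name `word_probability` are hypothetical.

```python
# Illustrative topic-word distributions and the third sentence's topic mix,
# taken from the slide's example percentages.
topic_words = {
    "Topic 1": {"football": 0.5, "evening": 0.5},
    "Topic 2": {"nachos": 0.5, "guacamole": 0.5},
}
sentence3_topics = {"Topic 1": 0.65, "Topic 2": 0.35}


def word_probability(word, doc_topics, topic_words):
    """p(word | doc) = sum over topics k of p(k | doc) * p(word | k)."""
    return sum(
        weight * topic_words[topic].get(word, 0.0)
        for topic, weight in doc_topics.items()
    )


for w in ["football", "evening", "nachos", "guacamole"]:
    print(w, round(word_probability(w, sentence3_topics, topic_words), 3))
# football 0.325
# evening 0.325
# nachos 0.175
# guacamole 0.175
```

The four probabilities sum to 1, so the mixed sentence is itself a valid distribution over the vocabulary.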
  4. Bayesian Network Example
  5. LDA as a Bayesian Network of Probability Distributions
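Read as a Bayesian network, LDA describes how a document could be generated: draw the document's topic mixture from a Dirichlet, then for each word draw a topic from that mixture and a word from that topic. The sketch below follows that generative story with NumPy. For readability the topic-word matrix `phi` is fixed by hand (echoing the layman example) instead of being drawn from its own Dirichlet prior as in full LDA; the function name `generate_corpus` and all sizes are hypothetical.

```python
import numpy as np


def generate_corpus(n_docs, doc_length, phi, alpha, rng):
    """Sample documents from LDA's generative process.

    phi:   (K, V) array; row k is topic k's distribution over words.
    alpha: length-K Dirichlet parameter for per-document topic mixtures.
    """
    docs = []
    for _ in range(n_docs):
        theta = rng.dirichlet(alpha)                  # theta_d ~ Dir(alpha)
        words = []
        for _ in range(doc_length):
            z = rng.choice(len(theta), p=theta)       # z_dn ~ Categorical(theta_d)
            w = rng.choice(phi.shape[1], p=phi[z])    # w_dn ~ Categorical(phi_z)
            words.append(int(w))
        docs.append(words)
    return docs


rng = np.random.default_rng(42)
vocab = ["football", "evening", "nachos", "guacamole"]
# Hand-set topics: topic 0 is "football/evening", topic 1 is "nachos/guacamole".
phi = np.array([[0.5, 0.5, 0.0, 0.0],
                [0.0, 0.0, 0.5, 0.5]])
docs = generate_corpus(3, 6, phi, alpha=np.array([0.5, 0.5]), rng=rng)
for doc in docs:
    print([vocab[i] for i in doc])
```

Fitting LDA is the reverse problem: given only the words, infer the hidden topic assignments and mixtures that this network would most plausibly have produced.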
  6. LDA History: Andrew Ng, David Blei, Michael I. Jordan
  7. A Simple LDA
  8. Packages Used in Python
     - sudo pip install nltk
     - sudo pip install gensim
     - sudo pip install stop-words
  9. Stop Words
     - Stop words are commonly occurring words, such as "the", "and", and "or", that don't contribute to topic modelling.
     - However, removing stop words sometimes hurts topic modelling. For example, "Thor The Ragnarok" is a single topic, but a stop-word filter removes "The" and breaks the name apart.
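Stop-word removal is a set-membership filter over tokens. The sketch below uses a tiny hypothetical stop list (the stop-words package from the previous slide ships a much longer English list) and shows both the intended effect and the side effect on a multi-word name. The function name `remove_stop_words` and the crude tokenization are assumptions.

```python
# Tiny hypothetical stop list; real lists contain a few hundred entries.
STOP_WORDS = {"i", "the", "and", "or", "a", "an", "of", "while"}


def remove_stop_words(text):
    """Lowercase, drop periods, split on whitespace, and filter stop words."""
    tokens = text.lower().replace(".", "").split()
    return [t for t in tokens if t not in STOP_WORDS]


print(remove_stop_words("I ate nachos and guacamole."))
# ['ate', 'nachos', 'guacamole']
print(remove_stop_words("Thor The Ragnarok"))
# ['thor', 'ragnarok']   the name is broken apart
```

One common workaround is to join known multi-word names into a single token before filtering.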
  10. Porter's Stemmer Algorithm
     - A common NLP technique that reduces topically similar words to their root. For example, "stemming", "stemmer", and "stemmed" all have similar meanings; stemming reduces those terms to "stem".
     - Important for topic modeling, which would otherwise treat those terms as separate entities and reduce their importance in the model.
     - It is a set of rules for reducing a word:
       - sses -> ss
       - ies -> i
       - ational -> ate
       - tional -> tion
       - s -> ∅
     - When rules conflict, the longest matching suffix wins.
     - It is usually a bad idea to use it unless you customize it.
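The rule list above can be applied as "pick the longest matching suffix and rewrite it". This is a deliberately simplified sketch, not the full Porter algorithm: it merges rules that Porter places in separate steps and omits Porter's "measure" conditions on the remaining stem, so it over-stems words such as "rational". It also adds Porter's `ss -> ss` rule so that "caress" is not cut to "cares". The names `RULES` and `simple_stem` are hypothetical; NLTK's `nltk.stem.PorterStemmer` implements the complete algorithm.

```python
# Suffix rules from the slide, plus Porter's "ss -> ss" guard rule.
# Measure conditions and step ordering are deliberately omitted.
RULES = [
    ("sses", "ss"),
    ("ies", "i"),
    ("ss", "ss"),
    ("ational", "ate"),
    ("tional", "tion"),
    ("s", ""),
]


def simple_stem(word):
    """Apply the single longest matching suffix rule, if any."""
    matches = [(suffix, repl) for suffix, repl in RULES if word.endswith(suffix)]
    if not matches:
        return word
    suffix, repl = max(matches, key=lambda rule: len(rule[0]))
    return word[: len(word) - len(suffix)] + repl


for w in ["caresses", "ponies", "caress", "relational", "conditional", "cats"]:
    print(w, "->", simple_stem(w))
# caresses -> caress
# ponies -> poni
# caress -> caress
# relational -> relate
# conditional -> condition
# cats -> cat
```

Outputs such as "poni" show why stems are not necessarily real words, which is the gap lemmatization addresses.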
  11. Porter's Stemmer Algorithm: Flowchart (figures: "Arabic Stemming Process", "Simple Stemming Process")
  12. Lemmatization
     - It goes one step further than stemming.
     - It obtains grammatically correct words and distinguishes words by their word sense using a vocabulary (e.g., "type" can mean "write" or "category").
     - It is a much more difficult and expensive process than stemming.
  13. Lemmatization: Example
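The key difference from stemming is the vocabulary lookup keyed by part of speech. The sketch below uses a tiny hand-written, hypothetical lookup table to show two things suffix rules cannot do: map an irregular form to its dictionary word ("better" to "good") and give different lemmas to the same surface form depending on its role ("meeting" as a noun versus a verb). In practice NLTK's `WordNetLemmatizer` plays this role using the WordNet vocabulary.

```python
# Hypothetical lookup table keyed by (word, part of speech).
LEMMAS = {
    ("better", "ADJ"): "good",
    ("ran", "VERB"): "run",
    ("meeting", "VERB"): "meet",
    ("meeting", "NOUN"): "meeting",
}


def lemmatize(word, pos):
    """Return the dictionary form for (word, pos), or the word itself if unknown."""
    word = word.lower()
    return LEMMAS.get((word, pos), word)


print(lemmatize("Better", "ADJ"))    # good
print(lemmatize("ran", "VERB"))      # run
print(lemmatize("meeting", "VERB"))  # meet
print(lemmatize("meeting", "NOUN"))  # meeting
```

Needing a part-of-speech tag for every token is part of why lemmatization is more expensive than stemming.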
  14. Bag of Words
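A bag of words keeps only how often each vocabulary word occurs in a document and discards word order; this count vector is the input LDA works from. The sketch below builds a vocabulary and produces sorted `(word_id, count)` pairs, the same shape of output gensim's `Dictionary.doc2bow` returns. The helper names and the toy documents are hypothetical.

```python
from collections import Counter


def build_vocabulary(docs):
    """Map each distinct token to an integer id (sorted for reproducibility)."""
    return {word: i for i, word in enumerate(sorted({w for d in docs for w in d}))}


def bag_of_words(doc, vocab):
    """Return sorted (word_id, count) pairs; word order is discarded."""
    counts = Counter(vocab[w] for w in doc if w in vocab)
    return sorted(counts.items())


docs = [
    ["evening", "football", "football"],
    ["nachos", "guacamole"],
]
vocab = build_vocabulary(docs)
print(vocab)                         # {'evening': 0, 'football': 1, 'guacamole': 2, 'nachos': 3}
print(bag_of_words(docs[0], vocab))  # [(0, 1), (1, 2)]
print(bag_of_words(docs[1], vocab))  # [(2, 1), (3, 1)]
```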
  15. Word2Vec
  16. CBOW vs. Skip-gram
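The two word2vec variants differ in which direction the prediction runs inside a context window: skip-gram uses the center word to predict each neighbor, while CBOW (continuous bag of words) uses the neighbors together to predict the center word. The sketch below only generates the training examples each variant would see; the neural network that learns the embeddings is omitted, and the function names and window size are hypothetical.

```python
def skipgram_pairs(tokens, window):
    """Skip-gram: each center word predicts every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window):
    """CBOW: the surrounding context words together predict the center word."""
    examples = []
    for i, center in enumerate(tokens):
        context = [
            tokens[j]
            for j in range(max(0, i - window), min(len(tokens), i + window + 1))
            if j != i
        ]
        examples.append((context, center))
    return examples


tokens = ["watching", "football", "eating", "nachos"]
print(skipgram_pairs(tokens, window=1))
# [('watching', 'football'), ('football', 'watching'), ('football', 'eating'),
#  ('eating', 'football'), ('eating', 'nachos'), ('nachos', 'eating')]
print(cbow_examples(tokens, window=1))
# [(['football'], 'watching'), (['watching', 'eating'], 'football'),
#  (['football', 'nachos'], 'eating'), (['eating'], 'nachos')]
```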
  17. LDA2Vec: What Really Happens?
     - The lda2vec model adds skip-grams to LDA. As in word2vec, a word predicts other words in the same window, but the model also has a context vector with a document-level component that changes only from document to document, as in LDA.
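One way to picture the combination: the document-level part is an LDA-like mixture (topic proportions that sum to 1) applied to a set of topic vectors, and it is added to the pivot word's word2vec-style vector to form the context used for skip-gram prediction. The sketch below follows that description of lda2vec as we understand it from Moody's original write-up; it shows a single forward computation with random, untrained vectors, and all sizes and variable names are hypothetical. Training (negative sampling plus a sparsity penalty on the proportions) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_topics, dim, n_targets = 3, 8, 5   # hypothetical sizes

topic_vectors = rng.normal(size=(n_topics, dim))    # one vector per topic
doc_weights = rng.normal(size=n_topics)             # unnormalized, one set per document
word_vector = rng.normal(size=dim)                  # pivot word's embedding
target_vectors = rng.normal(size=(n_targets, dim))  # candidate nearby words


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


doc_proportions = softmax(doc_weights)           # LDA-like topic mixture, sums to 1
doc_vector = doc_proportions @ topic_vectors     # weighted sum of topic vectors
context_vector = word_vector + doc_vector        # word-level plus document-level part

# Skip-gram style prediction: score each candidate nearby word.
probs = softmax(target_vectors @ context_vector)
print(doc_proportions.round(3))
print(context_vector.shape)   # (8,)
```

Because `doc_proportions` lives on the simplex like an LDA topic mixture, each document can still be read as a percentage breakdown over topics, which is what lda2vec borrows from LDA.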
  18. Lda2Vec: PyTorch Code
     - Source:
     - Go to 20newsgroups/.
     - Run get_windows.ipynb to prepare the data.
     - Run python for training.
     - Run explore_trained_model.ipynb.
     - To use this on your own data, you need to edit get_windows.ipynb. There are also hyperparameters in 20newsgroups/, utils/, utils/
  19. Thank you