LDA and its applications

Babu Priyavrat

LDA is usually used for topic modelling.

What is LDA?
• LDA stands for Latent Dirichlet Allocation.
• It models each document d (say, 5000 documents) as a mixture of topics, and each topic k (say, 50 topics) as a probability distribution over words.
• Mechanism: it uses the Dirichlet distribution, the multivariate generalization of the Beta distribution, as the prior over these distributions.
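The generative story behind LDA can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not a training routine: the vocabulary, the number of topics, and the prior values (0.5) are assumptions chosen for the example. A Dirichlet sample is drawn by normalizing Gamma draws; with two components, the first coordinate follows a Beta distribution.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topic_word, vocab, alpha, n_words, rng):
    """Generate one document: pick a topic mixture, then a topic and a word per position."""
    theta = sample_dirichlet(alpha, rng)  # topic proportions for this document
    words = []
    for _ in range(n_words):
        k = rng.choices(range(len(theta)), weights=theta)[0]  # topic for this word
        words.append(rng.choices(vocab, weights=topic_word[k])[0])
    return theta, words

rng = random.Random(0)  # fixed seed so the sketch is repeatable
vocab = ["football", "evening", "nachos", "guacamole"]  # illustrative vocabulary
num_topics = 2  # illustrative; the slide suggests something like 50

# One word distribution per topic, each drawn from a Dirichlet prior.
topic_word = [sample_dirichlet([0.5] * len(vocab), rng) for _ in range(num_topics)]
theta, doc = generate_document(topic_word, vocab, [0.5] * num_topics, 8, rng)
print(theta, doc)
```

Fitting LDA means running this story in reverse: given only the documents, infer the topic mixtures and the per-topic word distributions.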

LDA in layman terms
Sentence 1: I spent the evening watching football.
Sentence 2: I ate nachos and guacamole.
Sentence 3: I spent the evening watching football while eating nachos and guacamole.
LDA might say something like:
Sentence 1 is 100% about Topic 1
Sentence 2 is 100% about Topic 2
Sentence 3 is 65% about Topic 1 and 35% about Topic 2
It also tells us that
Topic 1 is about football (50%) and evening (50%),
Topic 2 is about nachos (50%) and guacamole (50%).
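To see how the two kinds of percentages combine, the numbers from this example can be plugged into a short calculation. The probability of a word in a sentence is the sum, over topics, of the topic's share in the sentence times the word's share in the topic. The dictionary layout and the function name are illustrative only.

```python
# Word distributions per topic, taken from the example above.
topics = {
    "Topic 1": {"football": 0.5, "evening": 0.5},
    "Topic 2": {"nachos": 0.5, "guacamole": 0.5},
}

# Topic mixture of the third sentence, taken from the example above.
sentence_3_mix = {"Topic 1": 0.65, "Topic 2": 0.35}

def word_probability(word, mix):
    """P(word | sentence) = sum over topics of P(topic | sentence) * P(word | topic)."""
    return sum(weight * topics[name].get(word, 0.0) for name, weight in mix.items())

p_football = word_probability("football", sentence_3_mix)  # 0.65 * 0.5
p_nachos = word_probability("nachos", sentence_3_mix)      # 0.35 * 0.5
print(p_football, p_nachos)
```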

LDA is a Bayesian network of probability distributions

LDA history
Andrew Ng, David Blei, Michael I. Jordan

A simple LDA
https://ai.stanford.edu/~ang/papers/nips01-lda.pdf

Packages used in Python
• sudo pip install nltk
• sudo pip install gensim
• sudo pip install stop-words

Stop words
• Stop words are commonly occurring words that don’t contribute to topic modelling.
• Examples: the, and, or
• However, removing stop words can sometimes hurt topic modelling.
• For example, “Thor The Ragnarok” is a single topic, but if we apply stop-word removal, “The” will be removed from it.
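The effect described above is easy to reproduce. The sketch below uses a three-word stop list taken from the slide and a hypothetical `protected_phrases` argument (not part of any library) that joins a known phrase into one token before filtering, which is one possible workaround.

```python
STOP_WORDS = {"the", "and", "or"}  # tiny list from the slide; real lists are much longer

def remove_stop_words(text, protected_phrases=()):
    """Lowercase, optionally glue protected phrases into single tokens, then drop stop words."""
    lowered = text.lower()
    for phrase in protected_phrases:
        phrase = phrase.lower()
        lowered = lowered.replace(phrase, phrase.replace(" ", "_"))
    return [token for token in lowered.split() if token not in STOP_WORDS]

sentence = "I watched Thor The Ragnarok and the match"
plain = remove_stop_words(sentence)
protected = remove_stop_words(sentence, protected_phrases=["Thor The Ragnarok"])
print(plain)      # the title loses "the"
print(protected)  # the title survives as one token
```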

Porter’s Stemmer algorithm
• A common NLP technique to reduce topically similar words to their root. For example, “stemming,” “stemmer,” and “stemmed” all have similar meanings; stemming reduces those terms to “stem.”
• Important for topic modeling, which would otherwise treat those terms as separate entities and reduce their importance in the model.
• It is a set of rules for reducing a word:
• sses -> ss
• ies -> i
• ational -> ate
• tional -> tion
• s -> ∅
• When rules conflict, the longest matching suffix wins.
• Bad idea unless you customize it.
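The "longest rule wins" idea can be sketched with the handful of suffix rules listed above. This is a simplified illustration, not the full Porter algorithm: it omits Porter's multi-step structure and its measure conditions. It uses sses -> ss, as in Porter's published rule list, and adds the ss -> ss rule from the same list so that words like "caress" are left alone.

```python
# (suffix, replacement) pairs; a subset of Porter's rules, for illustration only.
RULES = [
    ("ational", "ate"),
    ("tional", "tion"),
    ("sses", "ss"),
    ("ies", "i"),
    ("ss", "ss"),  # keeps a trailing "ss" intact
    ("s", ""),
]

def apply_longest_rule(word):
    """Apply the rule whose suffix is the longest one matching the end of the word."""
    matches = [(suffix, repl) for suffix, repl in RULES if word.endswith(suffix)]
    if not matches:
        return word
    suffix, repl = max(matches, key=lambda rule: len(rule[0]))
    return word[: len(word) - len(suffix)] + repl

for w in ["caresses", "ponies", "relational", "conditional", "cats", "caress"]:
    print(w, "->", apply_longest_rule(w))
```

Because "relational" ends in both "ational" and "tional", the longer suffix is chosen and the result is "relate" rather than "relation".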

Porter’s Stemmer algorithm - Flowchart
[Figures: Arabic Stemming Process; Simple Stemming Process]

Lemmatization
• It goes one step further than stemming.
• It obtains grammatically correct words and distinguishes words by their word sense with the use of a vocabulary (e.g., “type” can mean “write” or “category”).
• It is a much more difficult and expensive process than stemming.
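A toy version of vocabulary-based lemmatization makes the difference from stemming concrete. The lexicon below is hypothetical and hand-written for this example; a real lemmatizer relies on a large dictionary and a part-of-speech tagger. The lookup key is the word plus its part of speech, which is what lets the same surface form resolve to different senses.

```python
# Hypothetical mini-lexicon: (word, part of speech) -> (lemma, sense).
LEXICON = {
    ("types", "NOUN"): ("type", "category"),
    ("types", "VERB"): ("type", "write"),
    ("typed", "VERB"): ("type", "write"),
    ("better", "ADJ"): ("good", "of higher quality"),
}

def lemmatize(word, pos):
    """Return (lemma, sense); unknown words fall back to the lowercased word with no sense."""
    key = (word.lower(), pos)
    return LEXICON.get(key, (word.lower(), None))

print(lemmatize("types", "NOUN"))
print(lemmatize("types", "VERB"))
print(lemmatize("better", "ADJ"))
```

A suffix-stripping stemmer cannot map "better" to "good", since no suffix rule relates the two words; a vocabulary lookup can.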

CBOW vs. skip-gram
https://arxiv.org/pdf/1301.3781.pdf
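The two architectures differ mainly in the direction of prediction, which shows up in how training examples are built from a sentence. The sketch below only constructs the examples (no model is trained); the function names and the window size are illustrative.

```python
def context_windows(tokens, window):
    """Yield (center, context) where context holds up to `window` tokens on each side."""
    for i, center in enumerate(tokens):
        left = tokens[max(0, i - window): i]
        right = tokens[i + 1: i + 1 + window]
        yield center, left + right

def cbow_examples(tokens, window=2):
    """CBOW: the surrounding words jointly predict the center word."""
    return [(tuple(ctx), center) for center, ctx in context_windows(tokens, window)]

def skipgram_examples(tokens, window=2):
    """Skip-gram: the center word predicts each surrounding word separately."""
    return [(center, c) for center, ctx in context_windows(tokens, window) for c in ctx]

tokens = ["i", "ate", "nachos"]
cbow = cbow_examples(tokens, window=1)
skipgram = skipgram_examples(tokens, window=1)
print(cbow)
print(skipgram)
```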

LDA2VEC – what really happens?
https://arxiv.org/pdf/1605.02019.pdf
The lda2vec model adds skip-grams to LDA.
A word predicts another word in the same window, as in word2vec, but the model also has a context vector that changes only at the document level, as in LDA.
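As a rough sketch of the construction in the linked paper: each document has a weight per topic, a softmax turns those weights into proportions, the document vector is the proportion-weighted sum of topic vectors, and the context used to predict nearby words is the word vector plus the document vector. The vectors and weights below are made-up two-dimensional values for illustration, and nothing here is trained.

```python
import math

def softmax(values):
    """Turn arbitrary weights into proportions that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def document_vector(doc_weights, topic_vectors):
    """Mix topic vectors using the document's topic proportions."""
    proportions = softmax(doc_weights)
    dim = len(topic_vectors[0])
    return [sum(p * t[i] for p, t in zip(proportions, topic_vectors)) for i in range(dim)]

def context_vector(word_vector, doc_vector):
    """Context = word vector + document vector, element by element."""
    return [w + d for w, d in zip(word_vector, doc_vector)]

topic_vectors = [[1.0, 0.0], [0.0, 1.0]]  # made-up topic embeddings
doc_weights = [0.0, 0.0]                  # equal weights, so proportions are 50/50
word_vector = [0.2, -0.1]                 # made-up embedding of the pivot word

doc_vec = document_vector(doc_weights, topic_vectors)
ctx = context_vector(word_vector, doc_vec)
print(doc_vec, ctx)
```

The document vector is shared by every word in the same document, which is the document-level part of the context; only the word vector changes from one window to the next.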

Lda2Vec – PyTorch code
• Source: https://github.com/TropComplique/lda2vec-pytorch
• Go to 20newsgroups/.
• Run get_windows.ipynb to prepare data.
• Run python train.py for training.
• Run explore_trained_model.ipynb.
• To use this on your own data, edit get_windows.ipynb. There are also hyperparameters in 20newsgroups/train.py, utils/training.py, and utils/lda2vec_loss.py.
