
Minor Project

Latent Dirichlet Allocation for Topic Modelling.

Published in: Education, Technology

Transcript of "Minor Project"

  1. Context Based Search
     By Shatabdi Kundu (2010EET2553), Computer Technology, M.Tech, IIT Delhi. Email ID: shatabdikundu@live.com
     Project Guide: Prof. Santanu Chaudhury, Electrical Engineering Department, IIT Delhi. Email ID: santanuc@ee.iitd.ac.in
     09 MAY 2011
  2. Outline
     • Introduction to Topic Models: Probabilistic Modelling
     • Latent Dirichlet Allocation
     • Topic Discovery using WordNet
     • Work Done
     • Results
     • Conclusion and Future Work
     • References
  3. Probabilistic Modelling
     • Treat data as observations that arise from a generative probabilistic process that includes hidden variables.
       For documents, the hidden variables reflect the thematic structure of the collection.
     • Infer the hidden structure using posterior inference.
       What are the topics that describe this collection?
     • Situate new data into the estimated model.
       How does this query or new document fit into the estimated topic structure?
  4. Intuition behind LDA
  5. Generative Process
     • Cast these intuitions into a generative probabilistic process.
     • Each document is a random mixture of corpus-wide topics.
     • Each word is drawn from one of those topics.
  6. Graphical Models
     • Nodes are random variables.
     • Edges denote possible dependence.
     • Observed variables are shaded.
     • Plates denote replicated structure.
  7. Graphical Models
     • The structure of the graph defines the pattern of conditional dependence between the ensemble of random variables.
     • E.g. this graph corresponds to
       $p(y, x_1, \ldots, x_N) = p(y) \prod_{n=1}^{N} p(x_n \mid y)$   (1)
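The factorisation in equation (1) can be checked numerically. The sketch below assumes binary variables and made-up probability tables (`p_y` and `p_x_given_y` are illustrative names and values, not taken from the slides); it evaluates the joint for one configuration and confirms that the joint sums to 1 over all configurations.

```python
from itertools import product
from math import prod

def joint_probability(p_y, p_x_given_y, y, xs):
    """Evaluate p(y, x_1..x_N) = p(y) * prod_n p(x_n | y)."""
    return p_y[y] * prod(p_x_given_y[y][x] for x in xs)

# Hypothetical tables for a binary y and binary x_n (assumed values).
p_y = {0: 0.6, 1: 0.4}
p_x_given_y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}

# Joint probability of y=1 with observations x = (1, 1, 0).
joint = joint_probability(p_y, p_x_given_y, y=1, xs=[1, 1, 0])

# Sanity check: summing over every (y, x_1, x_2, x_3) gives 1.
total = sum(
    joint_probability(p_y, p_x_given_y, y, xs)
    for y in (0, 1)
    for xs in product((0, 1), repeat=3)
)
```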
  8. Latent Dirichlet Allocation
     1. Draw each topic $\beta_k \sim \mathrm{Dir}(\eta)$, for $k \in \{1, \ldots, K\}$.
     2. For each document $d$:
        1. Draw topic proportions $\theta_d \sim \mathrm{Dir}(\alpha)$.
        2. For each word $n$:
           1. Draw $Z_{d,n} \sim \mathrm{Mult}(\theta_d)$.
           2. Draw $W_{d,n} \sim \mathrm{Mult}(\beta_{Z_{d,n}})$.
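The generative process on this slide can be simulated directly. The following is a minimal sketch using only the Python standard library; it assumes symmetric Dirichlet priors, a fixed document length, and illustrative names and default values (`generate_corpus`, `eta=0.5`, `alpha=0.5`) that do not come from the slides. It only generates synthetic data and does not perform inference.

```python
import random

def sample_dirichlet(rng, concentration, size):
    """Draw from a symmetric Dirichlet by normalising Gamma(concentration, 1) draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_corpus(vocab, num_topics, num_docs, doc_length,
                    eta=0.5, alpha=0.5, seed=0):
    """Follow the LDA generative process and return (topics, thetas, docs)."""
    rng = random.Random(seed)
    # Step 1: draw each topic beta_k ~ Dir(eta), a distribution over the vocabulary.
    topics = [sample_dirichlet(rng, eta, len(vocab)) for _ in range(num_topics)]
    thetas, docs = [], []
    for _ in range(num_docs):
        # Step 2.1: draw per-document topic proportions theta_d ~ Dir(alpha).
        theta = sample_dirichlet(rng, alpha, num_topics)
        words = []
        for _ in range(doc_length):
            # Step 2.2.1: draw a topic assignment Z_{d,n} ~ Mult(theta_d).
            z = rng.choices(range(num_topics), weights=theta)[0]
            # Step 2.2.2: draw the word W_{d,n} ~ Mult(beta_{Z_{d,n}}).
            words.append(rng.choices(vocab, weights=topics[z])[0])
        thetas.append(theta)
        docs.append(words)
    return topics, thetas, docs

# Toy vocabulary (assumed); 3 topics, 5 documents of 20 words each.
vocab = ["tree", "maple", "leaf", "bank", "loan", "money"]
topics, thetas, docs = generate_corpus(vocab, num_topics=3, num_docs=5, doc_length=20)
```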
  9. Latent Dirichlet Allocation
     • From a collection of documents, infer:
       • Per-word topic assignment $Z_{d,n}$
       • Per-document topic proportions $\theta_d$
       • Per-corpus topic distributions $\beta_k$
     • Use posterior expectations to perform the task at hand, e.g. information retrieval, document similarity, etc.
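For reference, the three quantities listed on this slide are the hidden variables of the joint distribution implied by the generative process on slide 8. A sketch using the same symbols, where $D$ (number of documents) and $N_d$ (number of words in document $d$) are assumed names that the slides do not introduce:

```latex
p(\beta_{1:K}, \theta_{1:D}, Z, W \mid \eta, \alpha)
  = \prod_{k=1}^{K} p(\beta_k \mid \eta)
    \prod_{d=1}^{D} \Big( p(\theta_d \mid \alpha)
    \prod_{n=1}^{N_d} p(Z_{d,n} \mid \theta_d)\, p(W_{d,n} \mid \beta_{Z_{d,n}}) \Big)

p(\beta_{1:K}, \theta_{1:D}, Z \mid W, \eta, \alpha)
  = \frac{p(\beta_{1:K}, \theta_{1:D}, Z, W \mid \eta, \alpha)}{p(W \mid \eta, \alpha)}
```

The denominator $p(W \mid \eta, \alpha)$ cannot be computed exactly for realistic corpora, which is why the posterior is approximated in practice, for example with variational inference or Gibbs sampling.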
  10. Topic Discovery using WordNet
      • Lexical relations are used to find the latent topics.
      • Synsets (synonym sets) are the basic units.
      • Hyponymy: a semantic relation between word meanings. E.g. {maple} is a hyponym of {tree}.
      • Hypernymy: the inverse of hyponymy. E.g. {tree} is a hypernym of {maple}.
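The hyponym/hypernym relation can be illustrated with a toy taxonomy. The sketch below does not use WordNet itself: `TOY_HYPERNYMS` is a hand-written, hypothetical dictionary in which each word has a single parent, whereas real WordNet synsets can have several hypernyms and several senses per word.

```python
# Hypothetical child -> parent map standing in for WordNet's hypernym links.
TOY_HYPERNYMS = {"maple": "tree", "oak": "tree", "tree": "plant"}

def hypernym_chain(word, parent_of=TOY_HYPERNYMS):
    """Return the ancestors of `word`, nearest first."""
    chain = []
    while word in parent_of:
        word = parent_of[word]
        chain.append(word)
    return chain

def is_hyponym(word, candidate, parent_of=TOY_HYPERNYMS):
    """True if `word` is a (possibly indirect) hyponym of `candidate`."""
    return candidate in hypernym_chain(word, parent_of)

def is_hypernym(word, candidate, parent_of=TOY_HYPERNYMS):
    """Hypernymy is the inverse relation: swap the arguments."""
    return is_hyponym(candidate, word, parent_of)
```

With this map, `is_hyponym("maple", "tree")` and `is_hypernym("tree", "maple")` both hold, matching the example on the slide.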
  11. Work Done
      • I took a collection of 10 documents with a total of around 28K words.
      • I removed stop words and rare words, along with punctuation marks and numbers.
      • I then fitted a 7-topic LDA model to this corpus.
      • This gave 7 topics, each represented by its 5 most probable words.
      • I then used the lexical relations of WordNet to identify the hidden topics, using the common parents of all the words in each topic.
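The preprocessing and top-word steps described on this slide might look like the sketch below. The stop-word list, the rare-word threshold `min_count`, and the function names are all assumptions made for illustration; the slides do not say which list, threshold, or toolkit was used.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumed; a real list would be much longer).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "to"}

def preprocess(documents, stop_words=STOP_WORDS, min_count=2):
    """Lowercase, keep alphabetic tokens only, drop stop words and rare words."""
    # The regex keeps runs of letters, which discards punctuation and numbers.
    tokenised = [re.findall(r"[a-z]+", doc.lower()) for doc in documents]
    counts = Counter(token for doc in tokenised for token in doc)
    return [
        [t for t in doc if t not in stop_words and counts[t] >= min_count]
        for doc in tokenised
    ]

def top_words(word_probs, n=5):
    """Return the n most probable words of one topic (word -> probability map)."""
    ranked = sorted(word_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:n]]

docs = ["The maple is a tree, planted in 2011.", "An oak tree and a maple tree!"]
cleaned = preprocess(docs)
```

Here `cleaned` keeps only "maple" and "tree", because every other token is a stop word, a number, or occurs fewer than `min_count` times.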
  12. Results after training LDA model
      • The LDA model only selects the appropriate words within each topic; it does not name the topic.
      • The topic name is discovered using WordNet.
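One way to name a topic from its top words is to look for their nearest shared ancestor in the hypernym hierarchy. The sketch below assumes a hand-written, single-parent toy taxonomy (`TOY_HYPERNYMS` is hypothetical) rather than the real WordNet graph, and the function names are illustrative; it is not necessarily the exact procedure used in the project.

```python
# Hypothetical child -> parent map standing in for WordNet's hypernym links.
TOY_HYPERNYMS = {
    "maple": "tree", "oak": "tree", "tree": "plant",
    "rose": "flower", "flower": "plant", "plant": "organism",
    "dog": "animal", "animal": "organism",
}

def path_to_root(word, parent_of):
    """Return [word, parent, grandparent, ...] up to the root."""
    path = [word]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path

def common_parent(words, parent_of=TOY_HYPERNYMS):
    """Return the nearest ancestor shared by all words, or None if there is none."""
    if not words:
        return None
    paths = [path_to_root(w, parent_of) for w in words]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walk upwards from the first word so the most specific shared node wins.
    for node in paths[0]:
        if node in shared:
            return node
    return None
```

For example, the topic words `["maple", "oak", "rose"]` would be labelled "plant" under this toy taxonomy.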
  13. Results after applying to WordNet
      • The result above gives the hidden topic names for the words that made up the documents.
      • This kind of model can be used to identify the topic when given only a single word.
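As a rough illustration of identifying a topic from a single word, the sketch below picks the named topic with the highest posterior probability for that word under a uniform prior over topics. The topic names, word probabilities, and the function `most_likely_topic` are all hypothetical; the slides do not specify how the lookup is done.

```python
def most_likely_topic(word, topic_word_probs, topic_prior=None):
    """Return (best_topic, posterior) with p(topic | word) ∝ p(word | topic) p(topic)."""
    names = list(topic_word_probs)
    if topic_prior is None:
        # Assume a uniform prior over topics when none is given.
        topic_prior = {name: 1.0 / len(names) for name in names}
    scores = {
        name: topic_word_probs[name].get(word, 0.0) * topic_prior[name]
        for name in names
    }
    total = sum(scores.values())
    if total == 0.0:
        # The word does not occur in any topic.
        return None, {}
    posterior = {name: score / total for name, score in scores.items()}
    return max(posterior, key=posterior.get), posterior

# Hypothetical named topics with made-up word probabilities.
topic_word_probs = {
    "plant": {"tree": 0.4, "maple": 0.3, "branch": 0.3},
    "finance": {"bank": 0.5, "loan": 0.3, "branch": 0.2},
}
best, posterior = most_likely_topic("branch", topic_word_probs)
```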
  14. Conclusion and Future Work
      • Next, we will work on topic-based (context-based) search using this model.
      • In particular, we will deal with the geo-intent of queries and decide which topic each query belongs to, for better information retrieval.
  15. References
      • D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, January 2003.
      • Jun Fu Cai, Wee Sun Lee, Yee Whye Teh. NUS-ML: Improving Word Sense Disambiguation Using Topic Features. SEMEVAL (2007).
      • David M. Blei, Jon D. McAuliffe. Supervised Topic Models. NIPS (2007).
      • Wordnet. http://www.shiffman.net/teaching/a2z/wordnet
  16. Thank You