Language Model Information Retrieval with Document Expansion

  1. A paper by Tao Tao, Xuanhui Wang, Qiaozhu Mei, and ChengXiang Zhai. Presented by Kumar Ashish. INF384H/CS395T: Concepts of Information Retrieval (and Web Search), Fall 2011.
  2.  Zero-count problem: a term that is a plausible word for the information need may not occur in the document at all.  General estimation problem: terms that occur only once are overestimated, even though their occurrence was partly due to chance.  To address both problems, high-quality extra data is needed to enlarge the sample that a document provides.
  3. This gives the average logarithmic distance between the probabilities that a word would be observed at random from the unigram query language model and from the unigram document language model.
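The scoring formula on this slide is image-only; the description matches the standard KL-divergence ranking function used in language-modeling retrieval, reconstructed here with query model Θ_Q and document model Θ_D:

```latex
\mathrm{score}(q, d) \;=\; -\,D\!\left(\Theta_Q \,\|\, \Theta_D\right)
  \;=\; -\sum_{w} p(w \mid \Theta_Q)\,
        \log \frac{p(w \mid \Theta_Q)}{p(w \mid \Theta_D)}
```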
  4. c(w, d) is the number of times word w occurs in document d, and |d| is the length of the document. Problem: this estimate assigns zero probability to any word not present in the document, which causes trouble when scoring a document with KL-divergence.
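The estimator itself appears only as an image on the slide; given the definitions of c(w, d) and |d| above, it is the maximum-likelihood unigram model:

```latex
p_{ml}(w \mid \Theta_d) = \frac{c(w, d)}{|d|}
```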
  5.  Jelinek-Mercer (JM) Smoothing  Dirichlet Smoothing
  6.  Uses a fixed parameter λ to control the interpolation.  Interpolates with the probability of word w given by the collection model Θ_C.
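The interpolation formula is an image on the slide; it is the standard Jelinek-Mercer estimate, mixing the document's maximum-likelihood model with the collection model:

```latex
p_{\lambda}(w \mid d) = (1 - \lambda)\, p_{ml}(w \mid \Theta_d) + \lambda\, p(w \mid \Theta_C)
```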
  7.  Uses a document-dependent coefficient (parameterized by μ) to control the interpolation.
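The missing formula is the standard Dirichlet-prior estimate; it is equivalent to JM smoothing with a document-dependent coefficient λ = μ / (|d| + μ):

```latex
p_{\mu}(w \mid d) = \frac{c(w, d) + \mu\, p(w \mid \Theta_C)}{|d| + \mu}
```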
  8.  Uses clustering information to smooth a document.  Divides all documents into K clusters.  First smooths the cluster model with the collection model using Dirichlet smoothing.  Then takes the smoothed cluster model as a new reference model to smooth the document using JM smoothing.
  9. Θ_{L_d} stands for document d's cluster model, and λ, β are smoothing parameters.
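The CBDM equation on this slide is image-only. The following reconstruction follows the two-stage description on the previous slide (Dirichlet-smoothed cluster model, then JM interpolation with the document); the exact parameterization in the original CBDM work may differ slightly:

```latex
p(w \mid \Theta_{L_d}) = \frac{c(w, L_d) + \beta\, p(w \mid \Theta_C)}{|L_d| + \beta},
\qquad
p(w \mid d) = (1 - \lambda)\, p_{ml}(w \mid \Theta_d) + \lambda\, p(w \mid \Theta_{L_d})
```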
  10.  Better than JM or Dirichlet smoothing alone: it expands a document with more data from its cluster instead of relying on the same collection language model for every document.
  11. (Figure) Cluster D may be good for smoothing document a, but not for document d. Ideally, each document should have its own cluster centered around itself.
  12.  Expand each document using its probabilistic neighborhood to estimate a virtual document d'.  Apply any interpolation-based method (e.g., JM or Dirichlet) to this virtual document, treating the word counts it gives as if they were the original word counts.
  13.  Cosine similarity can be used to determine the documents in the neighborhood of the original document.  Problems: ◦ A narrowly defined neighborhood would contain only a few documents, whereas a widely defined one may include the whole collection. ◦ Neighbor documents cannot be treated as if they were sampled exactly like the original document.
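As an illustration of the neighborhood step, here is a minimal sketch (ours, not the paper's code) of cosine similarity over bag-of-words term-count vectors, which can rank candidate neighbor documents:

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = sqrt(sum(c * c for c in a.values()))
    nb = sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0

d = Counter("the cat sat on the mat".split())
b = Counter("a cat on a mat".split())
print(round(cosine(d, b), 3))  # 3 / sqrt(8 * 7) ≈ 0.401
```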
  14.  Associates a confidence value with every document in the collection. ◦ This confidence value reflects the belief that the document is sampled from the same underlying model as the original one.
  15.  A confidence value γ_d is associated with every document b to indicate how strongly b appears to be sampled from d's underlying model.  The confidence value should follow a normal (decaying) shape:
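The formula on this slide is image-only. One normal-shaped kernel over cosine distance that fits the description is sketched below; the exact functional form and the role of σ are our assumption, not taken from the paper:

```latex
\gamma_{d}(b) = \exp\!\left(-\frac{\bigl(1 - \operatorname{sim}(d, b)\bigr)^{2}}{2\sigma^{2}}\right)
```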
  87.  Shorter documents require more help from their neighbors, while longer documents can rely more on themselves.  A parameter α is introduced to control this balance.
  17. For efficiency, the pseudo term counts can be calculated using only the top M closest neighbors (since the confidence value follows a decaying shape).
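The expansion described on slides 12-17 can be sketched end to end. Everything below is an illustrative reconstruction, not the paper's method: the `confidence` kernel, the value of `sigma`, and the normalization of neighbor weights are assumptions; `alpha` balances the document's own counts against the neighborhood, and only the `top_m` closest neighbors contribute:

```python
from collections import Counter
from math import exp, sqrt

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def confidence(sim, sigma=0.3):
    # Normal-shaped decay in cosine distance (one plausible kernel choice).
    return exp(-((1.0 - sim) ** 2) / (2.0 * sigma ** 2))

def expand(doc, corpus, alpha=0.5, top_m=100):
    """Pseudo term counts for the virtual document d':
    alpha * own counts + (1 - alpha) * confidence-weighted neighbor counts."""
    scored = [(cosine(doc, b), b) for b in corpus if b is not doc]
    scored.sort(key=lambda t: t[0], reverse=True)
    scored = scored[:top_m]                       # keep only the top M neighbors
    total = sum(confidence(s) for s, _ in scored) or 1.0
    pseudo = Counter({w: alpha * c for w, c in doc.items()})
    for s, b in scored:
        weight = (1.0 - alpha) * confidence(s) / total
        for w, c in b.items():
            pseudo[w] += weight * c
    return pseudo

docs = [Counter("cat mat".split()),
        Counter("cat dog".split()),
        Counter("dog bone".split())]
pseudo = expand(docs[0], docs, alpha=0.5, top_m=2)
```

The virtual counts returned by `expand` would then replace c(w, d) inside JM or Dirichlet smoothing, as slide 12 describes.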
  18.  For performance comparison: ◦ Uses four TREC data sets  AP (Associated Press news, 1988-90)  LA (LA Times)  WSJ (Wall Street Journal, 1987-92)  SJMN (San Jose Mercury News, 1991)  For testing how the algorithm scales up: ◦ Uses TREC8.  For testing the effect on short documents: ◦ Uses DOE (Department of Energy).
  19. Comparison of DELM + (Dirichlet/JM) with plain Dirichlet/JM. λ for JM and μ for Dirichlet are set to their optimal values, and the same λ or μ is used for DELM without further tuning; M is 100 and α is 0.5 for DELM. DELM outperforms JM and Dirichlet on every data set, with improvement as high as 15% on the Associated Press news (AP) collection.
  20. Compares precision values at different levels of recall on the AP data set. DELM + Dirichlet outperforms Dirichlet at every precision point. (Figure: precision-recall curve on AP data.)
  21. Compares the performance trend with respect to M (the top M closest neighbors for each document). Conclusion: neighborhood information improves retrieval accuracy, and performance becomes insensitive to M once M is sufficiently large. (Figure: performance change with respect to M.)
  22. Comparison of DELM + Dirichlet with CBDM: DELM + Dirichlet outperforms CBDM in MAP on all four data sets.
  23. Documents in AP88-89 were shortened to 30% of their original length in the first run, 50% in the second, and 70% in the third. Results show that DELM helps shorter documents more than longer ones (41% improvement on the 30%-length corpus versus 16% on the full-length corpus).
  24. Performance change with respect to α: the optimal point migrates as documents become shorter (the 100%-length corpus is optimal at α = 0.4, but the 30%-length corpus has to use α = 0.2).
  25. Combination of DELM with pseudo feedback: DELM is combined with the model-based feedback proposed in (Zhai and Lafferty, 2001a). Experiment:  Retrieve documents with the DELM method.  Choose the top five documents for model-based feedback.  Use the expanded query model to retrieve documents again. Result: DELM can be combined with pseudo feedback to further improve performance.
  26. 26.  References: ◦ http://sifaka.cs.uiuc.edu/czhai/pub/hlt06-exp.pdf ◦ http://nlp.stanford.edu/IR-book/pdf/12lmodel.pdf ◦ http://krisztianbalog.com/files/sigir2008-csiro.pdf
