Constructing Query Models from Elaborate Query Formulations
A Few Examples Go A Long Way

Krisztian Balog (kbalog@science.uva.nl)
Wouter Weerkamp (weerkamp@science.uva.nl)
Maarten de Rijke (mdr@science.uva.nl)
ISLA, University of Amsterdam

Presented by Tanvi Motwani

AIM
- This paper introduces and compares several methods for sampling expansion terms, using both query-independent and query-dependent techniques.
- Along with the query, the method takes sample documents as input: additional information provided by the user, consisting of a small number of "key references" (pages that should be linked to by a good overview page on the topic).
- The aim is to increase "aspect recall" by uncovering aspects of the information need that are not captured by the query but are present in the sample documents.

Aspect Retrieval
Query: What are current applications of robotics? Find as many different applications as possible.

Aspect judgments (documents against aspects A1 … Ak):
d1: 1 1 0 0 … 0 0
d2: 0 1 1 1 … 0 0
d3: 0 0 0 0 … 1 0
…
dk: 1 0 1 0 … 0 1

Example aspects:
A1: spot-welding robotics
A2: controlling inventory
A3: pipe-laying robots
A4: talking robot
A5: robots for loading & unloading memory tapes
A6: robot telephone operators
A7: robot cranes
…

Overview: Retrieval Model | Experimental Setup | Query Representation | Baseline Parameters | Experimental Evaluation

Overview: Retrieval Model | Experimental Setup | Query Representation | Baseline Parameters | Experimental Evaluation
  (Query Likelihood | Document Modeling | Query Modeling)

Overview: Retrieval Model | Experimental Setup | Query Representation | Baseline Parameters | Experimental Evaluation
  (Query Likelihood | Document Modeling | Query Modeling)

Query (Q): "What is a Rainforest?"
Documents ranked by query likelihood:
P(D1|Q) = 0.32
P(D2|Q) = 0.26
P(D3|Q) = 0.19
P(D4|Q) = 0.12
P(D5|Q) = 0.09

Query Likelihood
Derivation steps: apply Bayes' rule; ignore P(Q), which is constant across documents; assume independence of the query terms; take the log; use the query and document models.
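
The equations on this slide are not preserved in the transcript; the standard query-likelihood derivation that these steps refer to is, in LaTeX form:

    % Query-likelihood ranking (reconstruction of the standard derivation)
    P(D \mid Q) = \frac{P(Q \mid D)\,P(D)}{P(Q)} \propto P(Q \mid D)\,P(D)     % Bayes' rule; P(Q) is constant per query
    P(Q \mid D) = \prod_{i=1}^{k} P(q_i \mid D)                                % assume query terms are independent
    \log P(D \mid Q) \stackrel{rank}{=} \log P(D) + \sum_{i=1}^{k} \log P(q_i \mid D)   % take logs; plug in the document model

Each P(q_i | D) is taken from a smoothed document language model, as described on the Document Modeling slide.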

Relevance Model
(Figure: the query Q, "What is a Rainforest?", and the documents.)

Underlying Relevance Model
The query and the relevant documents are random samples from an underlying relevance model R.
Documents are ranked based on the similarity of their models to the query model.
The Kullback-Leibler divergence between the query model and the document model can be used to provide this ranking.
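
A minimal sketch of KL-divergence ranking, assuming the query model and each document model are plain dicts mapping terms to probabilities (the function and variable names are illustrative, not from the paper):

    import math

    def kl_divergence(query_model, doc_model, epsilon=1e-12):
        """KL(query || document); a lower divergence means a better-matching document."""
        score = 0.0
        for term, p_q in query_model.items():
            p_d = doc_model.get(term, epsilon)  # smoothed document probability
            score += p_q * math.log(p_q / p_d)
        return score

    def rank_documents(query_model, doc_models):
        """Return document ids ordered by increasing divergence from the query model."""
        return sorted(doc_models, key=lambda d: kl_divergence(query_model, doc_models[d]))

Because the entropy of the query model is constant for a given query, ranking by KL divergence is rank-equivalent to ranking by the cross entropy between the query model and the document model.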

Overview: Retrieval Model | Experimental Setup | Query Representation | Baseline Parameters | Experimental Evaluation
  (Query Likelihood | Document Modeling | Query Modeling)

Document Modeling
Maximum likelihood estimate; smoothing of the ML estimate.
A document that never mentions "Rain" has P("Rain"|D) = 0 under the ML estimate, so smoothing is required.
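
A minimal sketch of the two estimates, assuming Jelinek-Mercer (linear interpolation) smoothing against a collection model; the slide does not show which smoothing method is used, so treat the interpolation below as one common choice rather than the paper's exact formula:

    from collections import Counter

    def ml_estimate(doc_tokens):
        """Maximum likelihood estimate: P(t|D) = count(t, D) / |D|."""
        counts = Counter(doc_tokens)
        total = len(doc_tokens)
        return {t: c / total for t, c in counts.items()}

    def smoothed_estimate(doc_tokens, collection_model, lam=0.5):
        """Jelinek-Mercer smoothing: P(t|D) = (1 - lam) * P_ml(t|D) + lam * P(t|C).
        Terms unseen in D now receive non-zero probability from the collection model."""
        p_ml = ml_estimate(doc_tokens)
        return {t: (1 - lam) * p_ml.get(t, 0.0) + lam * p_c
                for t, p_c in collection_model.items()}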

Query Modeling
P(t|Q) is extremely sparse, so query expansion is necessary.
A relevant document may not contain the words "Rain" and "Forest" but may contain related words such as "Wild Life"; expanding the query brings in these different "aspects" of the topic.

Overview: Retrieval Model | Experimental Setup | Query Representation | Baseline Parameters | Experimental Evaluation

Experimental Setup
- CSIRO Enterprise Research Collection (CERC): a crawl of the *.csiro.au web site conducted in March 2007
- 370,715 documents
- Size: 4.2 gigabytes
- 50 topics
- Judgments made on a 3-point scale:
  2: highly relevant "key reference"
  1: candidate key page
  0: not a "key reference"

Overview: Retrieval Model | Experimental Setup | Query Representation | Baseline Parameters | Experimental Evaluation
  (Maximizing Average Precision (MAX_AP) | Maximizing Query Log-Likelihood (MAX_QLL) | Best Empirical Estimate (EMP_BEST))

Parameter Estimation
- Maximizing Average Precision (MAX_AP)
- Maximizing Query Log-Likelihood (MAX_QLL)
- Best Empirical Estimate (EMP_BEST)
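
A minimal sketch of how such a parameter can be tuned, assuming a single interpolation weight is swept over a grid and scored per query; the evaluate callback (average precision for MAX_AP, query log-likelihood for MAX_QLL) and the grid itself are illustrative assumptions:

    def tune_weight(candidate_weights, queries, evaluate):
        """Pick the weight that maximizes the chosen criterion averaged over the queries."""
        best_weight, best_score = None, float("-inf")
        for w in candidate_weights:
            score = sum(evaluate(w, q) for q in queries) / len(queries)
            if score > best_score:
                best_weight, best_score = w, score
        return best_weight, best_score

    # e.g. tune_weight([i / 10 for i in range(11)], training_queries, average_precision)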

Evaluation
- The maximum AP score is reached when the weight is 0.6.
- MAX_QLL performs slightly better than MAX_AP.

Overview: Retrieval Model | Experimental Setup | Query Representation | Baseline Parameters | Experimental Evaluation
  (Query Model from Sample Documents | Feedback Using Relevance Models | Relevance Models from Sample Documents)

Query Representation
- The expanded query terms are combined with the original query terms.
- This prevents the topic from drifting away from the original user information need.
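
A minimal sketch of this combination step, assuming both the original query and the expansion terms are represented as term-probability dicts and interpolated linearly; the weight name and its value are illustrative assumptions, not taken from the slides:

    def combine_query_models(original, expanded, lam=0.5):
        """P(t|Q') = lam * P(t|Q_original) + (1 - lam) * P(t|Q_expanded).
        Keeping lam > 0 anchors the expanded query to the original information need."""
        terms = set(original) | set(expanded)
        return {t: lam * original.get(t, 0.0) + (1 - lam) * expanded.get(t, 0.0)
                for t in terms}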

Overview: Retrieval Model | Experimental Setup | Query Representation | Baseline Parameters | Experimental Evaluation
  (Query Model from Sample Documents | Feedback Using Relevance Models | Relevance Models from Sample Documents)

Feedback Using Relevance Models
The joint probability of observing t together with the query terms q1, q2, ..., qk, divided by the joint probability of the query terms.
- RM1: t and the qi are assumed to be sampled independently and identically from the same distribution.
- RM2: the query terms q1, q2, ..., qk are sampled dependent on t, but independently of each other.

RM1 (worked example)
Assume the smoothing weight is 0.
"wild" appears 5 times in this document, "rain" appears 20 times, and "forest" appears 30 times.
The document contains 150 unique terms, M is just this single document, and P(D1) = 1/5.
P("wild", "rain", "forest") = 1/5 * 5/150 * 20/150 * 30/150

RM2 (worked example)
Given the term "wild", first pick a document from the set M with probability P(D|t), then sample the query words from that document.
Assume P(D | "wild") = 0.7, the document contains 10 "rain" words and 20 "forest" words, the document has 200 unique words, P("wild") = 0.2, and M is just this document.
P("wild", "rain", "forest") = 0.2 * 0.7 * 20/200 * 10/200
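
A minimal sketch of the RM1 estimate over a feedback set M, assuming each document model is a dict of smoothed term probabilities and a uniform prior P(D) = 1/|M| (the names and the uniform prior are illustrative assumptions):

    def rm1(query_terms, doc_models, vocabulary):
        """RM1: P(t, q1..qk) = sum_D P(D) * P(t|D) * prod_i P(qi|D),
        normalized over the vocabulary to obtain P(t|R)."""
        prior = 1.0 / len(doc_models)                   # uniform P(D) over the feedback set M
        joint = {}
        for t in vocabulary:
            total = 0.0
            for model in doc_models:                    # each model: dict of smoothed P(term|D)
                p_q = 1.0
                for q in query_terms:
                    p_q *= model.get(q, 0.0)
                total += prior * model.get(t, 0.0) * p_q
            joint[t] = total
        norm = sum(joint.values()) or 1.0
        return {t: p / norm for t, p in joint.items()}  # P(t|R) is approximately P(t, Q) / P(Q)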

Overview: Retrieval Model | Experimental Setup | Query Representation | Baseline Parameters | Experimental Evaluation
  (Query Model from Sample Documents | Feedback Using Relevance Models | Relevance Models from Sample Documents)

Relevance Models from Sample Documents
- Apply the relevance models to the sample documents instead of the feedback documents, i.e. set M = S.
- For RM1, assume P(D) = 1/|S|.

Overview: Retrieval Model | Experimental Setup | Query Representation | Baseline Parameters | Experimental Evaluation
  (Query Model from Sample Documents | Feedback Using Relevance Models | Relevance Models from Sample Documents)

Query Model from Sample Documents
The top K terms with the highest probability P(t|S) are used to formulate the expanded query.
Given the sample document set S: select a document D from S with probability P(D|S), generate term t from that document with probability P(t|D), and sum over all sample documents to obtain P(t|S).
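
A minimal sketch of this estimate and of the top-K selection, assuming each sample document model is a dict of term probabilities and the document weights P(D|S) are given (names are illustrative):

    def expansion_terms(sample_doc_models, doc_weights, k=10):
        """P(t|S) = sum_D P(t|D) * P(D|S); return the k highest-probability terms."""
        p_t_s = {}
        for doc_id, model in sample_doc_models.items():   # doc id -> {term: P(term|D)}
            w = doc_weights[doc_id]                        # doc id -> P(D|S)
            for term, p in model.items():
                p_t_s[term] = p_t_s.get(term, 0.0) + w * p
        return sorted(p_t_s.items(), key=lambda kv: kv[1], reverse=True)[:k]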

Query Model from Sample Documents
Options for estimating the term probabilities:
- Maximum likelihood estimate of a term (EX-QM-ML)
- Smoothed estimate of a term (EX-QM-SM)
- The ranking function proposed by Ponte and Croft for unsupervised query expansion (EX-QM-EXP)

Query Model from Sample Documents
Three options for estimating P(D|S):
- Uniform
- Query-biased
- Inverse query-biased

Overview: Retrieval Model | Experimental Setup | Query Representation | Baseline Parameters | Experimental Evaluation
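
The formulas behind these three options are not preserved in the transcript. The sketch below uses assumed readings, where "query-biased" weights a sample document by its query likelihood P(Q|D) and "inverse query-biased" by the reciprocal, both normalized over S; these interpretations are assumptions for illustration, not the paper's definitions:

    def doc_importance(sample_docs, query_terms, doc_models, mode="uniform"):
        """Assign P(D|S) to each sample document under one of three illustrative schemes."""
        def query_likelihood(d):
            p = 1.0
            for q in query_terms:
                p *= doc_models[d].get(q, 1e-12)   # smoothed per-term probability
            return max(p, 1e-300)                  # floor to avoid division by zero
        if mode == "uniform":
            raw = {d: 1.0 for d in sample_docs}
        elif mode == "query-biased":
            raw = {d: query_likelihood(d) for d in sample_docs}        # favor docs that explain the query well
        elif mode == "inverse-query-biased":
            raw = {d: 1.0 / query_likelihood(d) for d in sample_docs}  # assumed reading: favor the opposite
        else:
            raise ValueError(mode)
        norm = sum(raw.values())
        return {d: v / norm for d, v in raw.items()}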

Expanded Query Models

Combination with Original Query

Importance of Sample Document

Topic Level Comparison

Topic Level Comparison

Sampling conditioned on query

Conclusion
- Introduced a method for sampling query expansion terms in a query-independent way, based on sample documents that reflect "aspects" of the user's information need that are not captured by the query.
- Introduced several versions of this expansion-term selection method, based on different term selection and document importance weighting schemes, and compared them against more traditional query expansion performed in a query-biased manner.

Questions/Discussion
- Every topic needs a sample document set; is this method feasible in a real-world setting where the number of possible topics is essentially unbounded?
- Aspect recall is obtained from the sample documents; aren't we then dependent on the "goodness" of the sample documents, i.e. on how many different aspects they cover, to obtain a high aspect recall?
- The increase in MAP over BFB-RM2 is small (around 0.07); will an end user notice any difference in experience, and is such a small gain in MAP worth the high cost of obtaining sample documents?
