Efficient Diversification of Web        Search Results    G. Capannini, F. M. Nardini, R. Perego, and F. Silvestri          ...
Introduction: SE Results             Diversification• Query: “Vinci”, what’s the user’s intent?   • Information on Leonardo...
Introduction: SE Results             Diversification• Query: “Vinci”, what’s the user’s intent?   • Information on Leonardo...
Introduction: SE Results             Diversification• Query: “Vinci”, what’s the user’s intent?   • Information on Leonardo...
Query Diversification as a            Coverage Problem• Hypothesis: • For each user’s query I can tell what’s the set of al...
State-of-the-Art Methods•   IASelect: •   Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, and Samuel Ieong. 2009. Div...
Diversify (k)F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   5
Diversify (k)                                                                       intentsF. Silvestri - Efficient Diversi...
Diversify (k)                                                                                                         the ...
Diversify (k)                                                                                                         the ...
Diversify (k)                                                                                                         the ...
Diversify (k)                                                                                                         the ...
Diversify (k)                                                                                                         the ...
Known Results• Diversify(k) is NP-hard: • Reduction from max-weight coverage• Diversify(k)’s objective function is sub-mod...
Known Results• Diversify(k) is NP-hard: • Reduction from max-weight coverage• Diversify(k)’s objective function is sub-mod...
It looks reasonable, but...•   ... we might not diversify, at all!•   Consider a query returning a set Rd={a,b,c} of docum...
It looks reasonable, but...•   ... we might not diversify, at all!•   Consider a query returning a set Rd={a,b,c} of docum...
xQuAD_Diversify(k)F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   8
xQuAD_Diversify(k)F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   8
xQuAD_Diversify(k)                                                                       Same problem as before...        ...
Our Proposal:                   MaxUtilityF. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk ...
Vinci                     Our Proposal:                           MaxUtility        F. Silvestri - Efficient Diversification...
Leonardo da VinciVinci      Vinci Town                      Our Proposal:           Vinci Group                      MaxUt...
Leonardo da VinciVinci      Vinci Town                    1/3                          5/12                               ...
Leonardo da VinciVinci      Vinci Town                    1/3                          5/12                               ...
Leonardo da VinciVinci      Vinci Town                    1/3                          5/12                               ...
MaxUtility_Diversify(k)F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Mos...
MaxUtility_Diversify(k)                                                                                                   ...
MaxUtility_Diversify(k)                                                                                                   ...
Why it is Efficient?• By using a simple arithmetic argument we can show that:• Therefore we can find the optimal set S of di...
OptSelectF. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   12
OptSelectF. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   12
The Specialization Set Sq• It is crucial for OptSelect to  have the set of specialization  available for each query.• Our ...
Probability EstimationF. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Mosc...
Usefulness of a ResultF. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Mosc...
Usefulness of a ResultF. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Mosc...
Experiments: Settings• TREC 2009 Web tracks Diversity Task framework: • ClueWeb-B, the subset of the TREC ClueWeb09 datase...
Experiments: QualityF. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow...
Experiments: EfficiencyF. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Mosc...
Conclusions and Future Work• We studied the problem of search results diversification from an efficiency point of  view• We ...
Question Time                                     Fabrizio Silvestri                                   ISTI-CNR, Pisa Ital...
Upcoming SlideShare
Loading in …5
×

"Efficient Diversification of Web Search Results"

3,844 views

Published on

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
3,844
On SlideShare
0
From Embeds
0
Number of Embeds
3,144
Actions
Shares
0
Downloads
8
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

"Efficient Diversification of Web Search Results"

  1. 1. Efficient Diversification of Web Search Results G. Capannini, F. M. Nardini, R. Perego, and F. Silvestri ISTI - CNR, Pisa, Italy
  2. 2. Introduction: SE Results Diversification• Query: “Vinci”, what’s the user’s intent? • Information on Leonardo da Vinci? • Information on Vinci the small village in Tuscany? • Information on Vinci the company? • Others? F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 2
  3. 3. Introduction: SE Results Diversification• Query: “Vinci”, what’s the user’s intent? • Information on Leonardo da Vinci? • Information on Vinci the small village in Tuscany? • Information on Vinci the company? • Others? F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 2
  4. 4. Introduction: SE Results Diversification• Query: “Vinci”, what’s the user’s intent? • Information on Leonardo da Vinci? • Information on Vinci the small village in Tuscany? • Information on Vinci the company? • Others? F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 2
  5. 5. Query Diversification as a Coverage Problem• Hypothesis: • For each user’s query I can tell what’s the set of all possible intents • For each document in the collection I can tell what are all the possible user’s intents it represents • each intent for each document is, possibly, weighted by a value representing how much that intent is represented by that document (e.g., 1/2 of document D is related to the intent of “digital photography techniques”)• Goal: • Select the set of k documents in the collection covering the maximum amount of intent weight. I.e., maximize the number of satisfied users. F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 3
  6. 6. State-of-the-Art Methods• IASelect: • Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, and Samuel Ieong. 2009. Diversifying search results. In Proceedings of the Second ACM International Conference on Web Search and Data Mining (WSDM 09), Ricardo Baeza- Yates, Paolo Boldi, Berthier Ribeiro-Neto, and B. Barla Cambazoglu (Eds.). ACM, New York, NY, USA, 5-14.• xQuAD: • Rodrygo L. T. Santos, Craig Macdonald, and Iadh Ounis. Exploiting query reformulations for Web search result diversification. In Proceedings of the 19th International Conference on World Wide Web, pages 881-890, Raleigh, NC, USA, 2010. ACM. F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 4
  7. 7. Diversify (k)F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 5
  8. 8. Diversify (k) intentsF. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 5
  9. 9. Diversify (k) the weight intentsF. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 5
  10. 10. Diversify (k) the weight intents the weight is the probability of being relative to intent cF. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 5
  11. 11. Diversify (k) the weight intents the weight is the probability of being relative to intent c d is not pertinent to cF. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 5
  12. 12. Diversify (k) the weight intents the weight is the probability of being relative to intent c d is not pertinent to c no doc is pertinent to cF. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 5
  13. 13. Diversify (k) the weight intents the weight is the probability of being relative to intent c d is not pertinent to c at least one doc is no doc is pertinent to c pertinent to cF. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 5
  14. 14. Known Results• Diversify(k) is NP-hard: • Reduction from max-weight coverage• Diversify(k)’s objective function is sub-modular: • Admits a (1-1/e)-approx. algorithm. • The algorithm works by inserting one result at a time, we insert the result with the max marginal utility. • Quadratic complexity in the number of results to consider: • at each iteration scan the complete list of not-yet-inserted results. F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 6
  15. 15. Known Results• Diversify(k) is NP-hard: • Reduction from max-weight coverage• Diversify(k)’s objective function is sub-modular: • Admits a (1-1/e)-approx. algorithm. • The algorithm works by inserting one result at a time, we insert the result with the max marginal utility. • Quadratic complexity in the number of results to consider: • at each iteration scan the complete list of not-yet-inserted results. F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 6
  16. 16. It looks reasonable, but...• ... we might not diversify, at all!• Consider a query returning a set Rd={a,b,c} of documents and two possible categories g,h.• The query is pertaining to each document with the same probability, i.e., P(g|q) = P(h|q) = 1/2. dV V(x|q,g) V(x|q,h) a 1 0 b 1 0 c 1/2 1/2• The optimal selection is S={a,b}, replacing either a or b with c will make the objective function decrease its value. F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 7
  17. 17. It looks reasonable, but...• ... we might not diversify, at all!• Consider a query returning a set Rd={a,b,c} of documents and two possible categories g,h.• The query is pertaining to each document with the same probability, i.e., P(g|q) = P(h|q) = 1/2. dV V(x|q,g) V(x|q,h) a 1 0 b 1 0 c 1/2 1/2• The optimal selection is S={a,b}, replacing either a or b with c will make the objective function decrease its value. F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 7
  18. 18. xQuAD_Diversify(k)F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 8
  19. 19. xQuAD_Diversify(k)F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 8
  20. 20. xQuAD_Diversify(k) Same problem as before... It may not diversify, at all.F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 8
  21. 21. Our Proposal: MaxUtilityF. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 9
  22. 22. Vinci Our Proposal: MaxUtility F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 9
  23. 23. Leonardo da VinciVinci Vinci Town Our Proposal: Vinci Group MaxUtility F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 9
  24. 24. Leonardo da VinciVinci Vinci Town 1/3 5/12 Our Proposal: Vinci Group 1/4 MaxUtility F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 9
  25. 25. Leonardo da VinciVinci Vinci Town 1/3 5/12 Our Proposal: Vinci Group 1/4 MaxUtility Rq S F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 9
  26. 26. Leonardo da VinciVinci Vinci Town 1/3 5/12 Our Proposal: Vinci Group 1/4 MaxUtility Rq S F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 9
  27. 27. MaxUtility_Diversify(k)F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 10
  28. 28. MaxUtility_Diversify(k) Probability of query q’ being a specialization for query qF. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 10
  29. 29. MaxUtility_Diversify(k) Probability of query q’ being a specialization for query q Set of possible query specializationsF. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 10
  30. 30. Why it is Efficient?• By using a simple arithmetic argument we can show that:• Therefore we can find the optimal set S of diversified documents by using a sort-based approach. F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 11
  31. 31. OptSelectF. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 12
  32. 32. OptSelectF. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 12
  33. 33. The Specialization Set Sq• It is crucial for OptSelect to have the set of specialization available for each query.• Our method is, thus, query log- based. • we use a query recommender system to obtain a set of queries from which Sq is built by including the most popular (i.e., freq. in query log > f(q) / s) recommendations: F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 13
  34. 34. Probability EstimationF. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 14
  35. 35. Usefulness of a ResultF. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 15
  36. 36. Usefulness of a ResultF. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 15
  37. 37. Experiments: Settings• TREC 2009 Web tracks Diversity Task framework: • ClueWeb-B, the subset of the TREC ClueWeb09 dataset • The 50 topics (i.e., queries) provided by TREC • We evaluate α-NDCG and IA-P• All the tests were conducted on a Intel Core 2 Quad PC with 8Gb of RAM and Ubuntu Linux 9.10 (kernel 2.6.31-22). F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 16
  38. 38. Experiments: QualityF. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 17
  39. 39. Experiments: EfficiencyF. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 18
  40. 40. Conclusions and Future Work• We studied the problem of search results diversification from an efficiency point of view• We derived a diversification method (OptSelect): • same (or better) quality of the state of the art • up to 100 times faster• Future work: • the exploitation of users search history for personalizing result diversification • the use of click-through data to improve our effectiveness results, and • the study of a search architecture performing the diversification task in parallel with the document scoring phase (Done! See DDR2011 paper) F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 19
  41. 41. Question Time Fabrizio Silvestri ISTI-CNR, Pisa Italy http://hpc.isti.cnr.it/~fabriziosilvestri f.silvestri@isti.cnr.itF. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 20

×