SIGIR 2012 - Explicit Relevance Models in Intent-Oriented Information Retrieval Diversification

Intent-oriented search diversification methods developed in the field so far tend to build on generative views of the retrieval system to be diversified. Core algorithm components, in particular redundancy assessment, are expressed in terms of the probability of observing documents rather than the probability that the documents are relevant. This has sometimes been described as a view that considers the selection of a single document in the underlying task model. In this paper we propose an alternative formulation of aspect-based diversification algorithms which explicitly includes a formal relevance model. We develop means for the effective computation of the new formulation, and we test the resulting algorithm empirically. We report experiments on search and recommendation tasks showing performance competitive with or better than the original diversification algorithms. The relevance-based formulation has further interesting properties, such as unifying two well-known state-of-the-art algorithms into a single version. The relevance-based approach also opens possibilities for further formal connections and developments as natural extensions of the framework. We illustrate this by modeling tolerance to redundancy as an explicit configurable parameter, which can be set to better suit the characteristics of the IR task or the evaluation metrics, as we show empirically.

  1. Explicit Relevance Models in Intent-Aware IR Diversification
     Saúl Vargas, Pablo Castells and David Vallet, IR Group @ UAM, Universidad Autónoma de Madrid, http://ir.ii.uam.es
     35th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012), Portland, OR, 13 August 2012
  2. Outline
     • Context: IR diversification formulation and algorithms
     • Proposed approach: relevance-based reformulation of diversification algorithms
     • Experiments
     • Adjustable tolerance to redundancy
     • Conclusion
  3. IR diversity – Brief recap
     Figure: example interpretations of an ambiguous query (nutrition / health, appliance, chemical element, golf, mining / metallurgy).
  4. IR diversity – Brief recap
     • Diversity as a means to address uncertainty in user queries: the same query may have different intents or aspects in the underlying information need
     • Revision of document relevance independence: the marginal utility of additional relevant documents decreases fast
     • Trade diminishing marginal utility for increased intent coverage, and thus maximize the number of users who obtain at least some useful document
  5. IR diversification – Problem statement
     Given a query $q$ on a collection, find a subset $S$ of the collection, of given size, maximizing $p(\text{some } d \in S \text{ relevant} \mid q)$. The problem is NP-hard (Agrawal 2009, Santos 2010, Chen 2006, …).
     Greedy approximation: starting from the baseline ranking $R$, scored by $p(d \mid q)$, build the diversified ranking $S$ by repeatedly selecting
     $$\arg\max_{d \in R - S} \varphi(d, S \mid q), \qquad \varphi(d, S \mid q) \propto p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$$
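The greedy approximation on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `greedy_diversify` and the toy objective in the usage note are hypothetical, and any objective `phi(d, S)` can be plugged in.

```python
from typing import Callable, Hashable, List, Sequence


def greedy_diversify(
    baseline: Sequence[Hashable],
    phi: Callable[[Hashable, List[Hashable]], float],
    n: int,
) -> List[Hashable]:
    """Greedily re-rank `baseline` (R) into a diversified list S of size <= n.

    At each step the document in R - S with the highest phi(d, S) is appended
    to S. Ties are broken in favor of the baseline order.
    """
    selected: List[Hashable] = []
    remaining = list(baseline)
    while remaining and len(selected) < n:
        best = max(remaining, key=lambda d: phi(d, selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```

As a usage example, with a toy objective that multiplies a baseline score by 0.1 when the document's topic is already covered, `greedy_diversify(["a", "b", "c"], toy_phi, 3)` promotes the document on the uncovered topic ahead of a redundant, higher-scored one.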
  6. IR diversity – Instantiations of objective function
     State-of-the-art aspect-based approaches, defined over explicit query aspects $z$:
     • IA-Select scheme (Agrawal 2009):
       $$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \bigl(1 - p(z \mid d')\, p(d' \mid q)\bigr)$$
     • xQuAD scheme (Santos 2010):
       $$\varphi(d, S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \bigl(1 - p(d' \mid q, z)\bigr)$$
  7. Same formulas, highlighting query aspect coverage.
  8. Same formulas, highlighting document “relevance” for query aspect.
  9. Same formulas, highlighting redundancy penalization (the product over $d' \in S$).
  10. Same formulas, highlighting the mixture with the baseline: $\lambda$ sets the degree of diversification.
  11. Same formulas, set against the target $\varphi(d, S \mid q) \propto p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$: both schemes are built on the probability to observe documents.
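The two document-based objectives can be written down directly as functions. The sketch below is a minimal illustration under stated assumptions: the dictionary layout and all names (`p_d_q`, `p_z_q`, `p_z_d`, `p_d_qz`, `lam`) are hypothetical, and the IA-Select product term is read as $1 - p(z \mid d')\, p(d' \mid q)$.

```python
from math import prod


def ia_select_phi(d, S, p_d_q, p_z_q, p_z_d):
    """IA-Select objective: sum_z p(z|q) p(z|d) p(d|q) prod_{d' in S} (1 - p(z|d') p(d'|q)).

    p_d_q: {doc: p(d|q)}, p_z_q: {aspect: p(z|q)}, p_z_d: {doc: {aspect: p(z|d)}}.
    """
    return sum(
        p_z_q[z]
        * p_z_d[d].get(z, 0.0)
        * p_d_q[d]
        * prod(1.0 - p_z_d[dp].get(z, 0.0) * p_d_q[dp] for dp in S)
        for z in p_z_q
    )


def xquad_phi(d, S, p_d_q, p_z_q, p_d_qz, lam):
    """xQuAD objective: (1 - lam) p(d|q) + lam sum_z p(z|q) p(d|q,z) prod_{d' in S} (1 - p(d'|q,z)).

    p_d_qz: {aspect: {doc: p(d|q,z)}}; lam in [0, 1] is the degree of diversification.
    """
    diversity = sum(
        p_z_q[z]
        * p_d_qz[z].get(d, 0.0)
        * prod(1.0 - p_d_qz[z].get(dp, 0.0) for dp in S)
        for z in p_z_q
    )
    return (1.0 - lam) * p_d_q[d] + lam * diversity
```

With `lam = 0` the xQuAD function returns the baseline score unchanged, and with an empty `S` both products are empty and equal 1, so no penalization applies to the first pick.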
  12. IR diversity – Relevance-based instantiation of objective function (our proposal)
      Target: $\varphi(d, S \mid q) \propto p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$. Both schemes are rewritten in terms of the probability of relevance:
      • IA-Select scheme, relevance-based:
        $$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$$
      • xQuAD scheme, relevance-based:
        $$\varphi(d, S \mid q) = (1 - \lambda)\, p(r \mid d, q) + \lambda\, p(r_d, \neg r_S \mid q) = (1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$$
  13. Same formulas: a more literal interpretation of the initial problem statement.
  14. Same formulas: the two relevance-based schemes are equivalent for $\lambda = 1$.
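The equivalence of the two relevance-based schemes at $\lambda = 1$ can be checked numerically. The sketch below is illustrative only: the function names, the dictionary layout and the sample probabilities are assumptions, not values from the talk.

```python
from math import prod


def rel_ia_select_phi(d, S, p_z_q, p_r_dqz):
    """Relevance-based IA-Select: sum_z p(z|q) p(r|d,q,z) prod_{d' in S} (1 - p(r|d',q,z)).

    p_z_q: {aspect: p(z|q)}, p_r_dqz: {aspect: {doc: p(r|d,q,z)}}.
    """
    return sum(
        p_z_q[z]
        * p_r_dqz[z].get(d, 0.0)
        * prod(1.0 - p_r_dqz[z].get(dp, 0.0) for dp in S)
        for z in p_z_q
    )


def rel_xquad_phi(d, S, p_r_dq, p_z_q, p_r_dqz, lam):
    """Relevance-based xQuAD: mixture of p(r|d,q) and the IA-Select term, weighted by lam."""
    return (1.0 - lam) * p_r_dq[d] + lam * rel_ia_select_phi(d, S, p_z_q, p_r_dqz)


# Hypothetical sample values; relevance probabilities need not sum to 1 over documents.
p_z_q = {"z1": 0.7, "z2": 0.3}
p_r_dqz = {"z1": {"d1": 0.9, "d2": 0.8}, "z2": {"d2": 0.2, "d3": 0.6}}
p_r_dq = {"d1": 0.7, "d2": 0.6, "d3": 0.3}

# At lam = 1 the baseline term vanishes and both objectives coincide.
same_at_one = all(
    rel_xquad_phi(d, ["d1"], p_r_dq, p_z_q, p_r_dqz, 1.0)
    == rel_ia_select_phi(d, ["d1"], p_z_q, p_r_dqz)
    for d in ("d2", "d3")
)
```

Writing xQuAD as a thin wrapper around the IA-Select term makes the unification explicit: the only difference is the mixture with the baseline relevance estimate.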
  15. Relevance distribution vs. document distribution
      $p(r \mid d, \cdot)$ vs. $p(d \mid \cdot)$: the difference does matter (in this context).
      $$\sum_d p(d \mid q, z) = 1 \qquad \text{vs.} \qquad \sum_d p(r \mid d, q, z) = \mathrm{E}[\text{nr. relevant docs}] \ge 1$$
      • Different potential behavior, e.g. stronger redundancy penalization
      • Potential rank equivalences do not apply here:
        $$(1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$$
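A small numeric illustration of why the normalization matters for redundancy penalization. The three relevance probabilities below are made up for the example, and the document distribution is obtained here simply by normalizing them, which is an assumption of this sketch rather than the estimation procedure used in the talk.

```python
# Hypothetical per-aspect relevance probabilities p(r|d,q,z); they sum to more than 1.
p_r = {"d1": 0.9, "d2": 0.8, "d3": 0.7}

# A document distribution p(d|q,z) over the same documents must sum to 1,
# so each individual value is necessarily much smaller.
total = sum(p_r.values())
p_d = {d: v / total for d, v in p_r.items()}

# Factor left on aspect z after d1 has been selected, i.e. (1 - p(.|d1)).
remaining_rel = 1.0 - p_r["d1"]  # relevance-based penalization
remaining_doc = 1.0 - p_d["d1"]  # document-based penalization
```

After selecting `d1`, the relevance-based factor leaves about 10% of the aspect's weight for further documents, while the document-based factor leaves over 60%, so the relevance-based scheme penalizes redundancy much more strongly in this toy case.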
  16. Relevance-based greedy diversification
      Relevance-based reformulation of diversification algorithm:
      1. Need to estimate $p(r \mid d, q, z)$
      2. Does it work? Test empirically
      3. Further development: parameterized tolerance to redundancy
  17. Aspect-based relevance model
      Estimate $p(r \mid d, q, z)$. We cannot use odds, logs, constant removal… or any other rank-preserving step, because the specific values are needed.
      $p(r \mid d, q, z)$ is built from the following components:
      • $p(r \mid d, q)$: positional relevance, $p(r \mid \text{rank}(d, q))$
      • $p(z \mid d)$, $p(z \mid q)$, $p(z)$: estimate $p(z \mid d)$ or $p(z \mid q)$ depending on available observations ($z$ as document classes, e.g. ODP, or $z$ as subqueries, e.g. reformulations), then derive the other two parameters
      • $p(d \mid q)$: normalized baseline IR system score (as in e.g. Bache 2009)
  18. Positional relevance distribution estimate
      $$p(r \mid d, q) \sim p(r \mid \text{rank}(d, q)) = p(r \mid k)$$
      Figure: $p(r \mid k)$ on a logarithmic scale (1E-05 to 1E+00) against rank $k$ (0 to 200). Series: pLSA and Lemur (precision estimates), AOL (click log statistics).
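One way to obtain a positional estimate from relevance judgments is sketched below. The slides do not spell out the estimator, so this is an assumption: $p(r \mid k)$ is taken as the fraction of training queries whose document at rank $k$ is judged relevant. The function name and data layout are hypothetical.

```python
from typing import Dict, List, Set


def estimate_p_r_given_k(
    rankings: Dict[str, List[str]],
    qrels: Dict[str, Set[str]],
    max_rank: int,
) -> List[float]:
    """Estimate p(r|k) for k = 1..max_rank (index 0 holds rank 1).

    rankings: {query: baseline ranking of doc ids}
    qrels: {query: set of doc ids judged relevant}
    Ranks that no training query reaches get probability 0.0.
    """
    relevant_at = [0] * max_rank
    seen_at = [0] * max_rank
    for query, ranking in rankings.items():
        relevant = qrels.get(query, set())
        for k, doc in enumerate(ranking[:max_rank]):
            seen_at[k] += 1
            if doc in relevant:
                relevant_at[k] += 1
    return [r / s if s else 0.0 for r, s in zip(relevant_at, seen_at)]
```

The resulting list can then be used as a lookup from the baseline rank of a document to its relevance probability. Note that the literal values matter here, so a real estimate would likely need many queries, and possibly smoothing, to avoid zeros at deep ranks.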
  19. Relevance-based greedy diversification
      Relevance-based reformulation of diversification algorithm:
      1. Need to estimate $p(r \mid d, q, z)$
      2. Does it work? Test empirically
      3. Further development: parameterized tolerance to redundancy
  20. Experiments – Search diversity
      • Collection: ClueWeb09 category B (50M documents)
      • Query/subtopic set: TREC 2009/10 diversity task (100 queries)
      • Baseline ranking: Lemur Indri search engine (Web service)
      • Diversified top $n$: 100
      • Query aspect space: a) ODP categories level 4 (~7K categories); b) TREC subtopics (oracle for reference)
      Specific parameter estimates:
      • $p(z \mid q)$: uniform
      • $p(z \mid d)$: for ODP categories, semi-supervised text classification by Textwise; for TREC subtopics, Indri search system run on $z$ as if it were a query
      • $p(r \mid k)$: i. P@k estimates with TREC relevance judgments (2-fold 2009/10 cross-validation); ii. click statistics from the AOL log (thus a different IR system)
  21. Experiments – Search diversity on TREC
      Figure: ERR-IA as a function of $\lambda$ for the xQuAD scheme, comparing the version based on $p(r \mid d, q, z)$ (with $p(r \mid k)$ from qrels) against the version based on $p(d \mid q, z)$. Panels: ODP categories, TREC subtopics.
  22. Experiments – Search diversity on TREC

      | Aspect space       | Diversifier                | λ   | α-nDCG@20 | ERR-IA@20 | nDCG-IA@20 | S-recall@20 |
      |--------------------|----------------------------|-----|-----------|-----------|------------|-------------|
      | (baseline)         | Lemur                      | -   | 0.2587    | 0.1630    | 0.2396     | 0.4636      |
      | a) ODP categories  | IA-Select                  | -   | 0.2651    | 0.1681    | 0.2423     | 0.4483      |
      | a) ODP categories  | xQuAD                      | 0.9 | 0.2675    | 0.1656    | 0.2451     | 0.4864      |
      | a) ODP categories  | Rel-based xQuAD, i. Qrels  | 0.1 | 0.2858△▲  | 0.1828△▲  | 0.2655△▲   | 0.4898▲△    |
      | a) ODP categories  | Rel-based xQuAD, ii. Clicks| 0.4 | 0.2841▲△  | 0.1831△△  | 0.2605△▲   | 0.4830▲▽    |
      | b) TREC subtopics  | IA-Select                  | -   | 0.3541    | 0.2346    | 0.3213     | 0.5787      |
      | b) TREC subtopics  | xQuAD                      | 1.0 | 0.3445    | 0.2241    | 0.3127     | 0.5704      |
      | b) TREC subtopics  | Rel-based xQuAD, i. Qrels  | 1.0 | 0.3543△△  | 0.2349△△  | 0.3192▽△   | 0.5782▽△    |
      | b) TREC subtopics  | Rel-based xQuAD, ii. Clicks| 1.0 | 0.3512▽△  | 0.2320▽△  | 0.3166▽△   | 0.5748▽△    |

      λ is chosen by “informally” maximizing ERR-IA in 0.1 steps for each diversifier. Filled triangles (▲▼) mark differences with $p < 0.05$.
  23. Experiments – Recommendation diversity
      • Dataset 1: MovieLens 1M. Collection: 6K users, 4K movies, 1M ratings. Subtopic set: 10 movie genres
      • Dataset 2: Last.fm crawl. Collection: 1K users, 175K artists, 20M playcounts. Subtopic set: 120K social tags on artists by Last.fm users
      Adaptation of the IR diversity paradigm (Vargas, Castells & Vallet, SIGIR 2011):
      • Queries → users
      • Documents → items (movies, music artists)
      • Subtopics → item features (genres, tags)
      • Relevance judgments → test ratings from data split
      Baseline rankings: a) pLSA; b) popularity-based recommendation. Diversified top $n$: 100.
      Specific parameter estimates:
      • $p(z \mid q)$: uniform
      • $p(z \mid d)$: uniform on $d$ (based on binary aspect/item association)
      • $p(r \mid k)$: P@k estimates with 2-fold cross-validation on test users
  24. Experiments – Recommendation diversity on MovieLens and Last.fm
      Figure: ERR-IA as a function of $\lambda$ on MovieLens 1M and Last.fm, for the pLSA recommender and for recommendation by item popularity, comparing the version based on $p(r \mid d, q, z)$ against the version based on $p(d \mid q, z)$.
  25. Relevance-based greedy diversification
      Relevance-based reformulation of diversification algorithm:
      1. Need to estimate $p(r \mid d, q, z)$
      2. Does it work? Test empirically
      3. Further development: parameterized tolerance to redundancy
  26. Adjustable tolerance to redundancy
      • Generalization of the relevance-based diversification scheme
      • Formally support adjustable redundancy penalization
      • Approach: generalize relevance to a browsing model
      $$\varphi(d, S \mid q) = (1 - \lambda)\, p(r \mid d, q) + \lambda\, p(r_d, \neg \mathit{stop}_S \mid q) = \cdots = (1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, z, q) \prod_{d' \in S} \bigl(1 - p(r \mid d', z, q)\, p(\mathit{stop} \mid r)\bigr)$$
      • Adjustable redundancy tolerance parameter $p(\mathit{stop} \mid r) \in [0, 1]$
        – High $p(\mathit{stop} \mid r)$ for aggressive penalization, low for e.g. high-recall searches
        – In this view, the original formulations would implicitly assume $p(\mathit{stop} \mid r) = 1$, i.e. a single relevant document is sought
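The effect of the tolerance parameter can be sketched in a few lines. This is a minimal illustration of the generalized objective, with hypothetical names (`phi_with_stop`, `p_stop`) and dictionary layout; it is not the authors' code.

```python
from math import prod


def phi_with_stop(d, S, p_r_dq, p_z_q, p_r_dqz, lam, p_stop):
    """Relevance-based xQuAD objective with adjustable redundancy tolerance.

    Each selected d' in S reduces the weight of aspect z by the factor
    (1 - p(r|d',z,q) * p_stop), where p_stop stands for p(stop|r) in [0, 1].
    p_r_dq: {doc: p(r|d,q)}, p_z_q: {aspect: p(z|q)},
    p_r_dqz: {aspect: {doc: p(r|d,z,q)}}.
    """
    diversity = sum(
        p_z_q[z]
        * p_r_dqz[z].get(d, 0.0)
        * prod(1.0 - p_r_dqz[z].get(dp, 0.0) * p_stop for dp in S)
        for z in p_z_q
    )
    return (1.0 - lam) * p_r_dq[d] + lam * diversity
```

Setting `p_stop = 1` recovers the relevance-based xQuAD objective, while `p_stop = 0` removes the penalization entirely, so the score of a document no longer depends on what has already been selected. Intermediate values interpolate between the two behaviors.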
  27. Adjustable tolerance to redundancy
      Empirical observation: $p(\mathit{stop} \mid r)$ vs. $\alpha$ in α-nDCG.
      Figure: two panels, search task (Lemur on TREC / subtopics) and recommendation task (pLSA on MovieLens / genres), with $\alpha$ from 0 to 1 on the horizontal axis and $p(\mathit{stop} \mid r)$ up to 1 on the vertical axis. For each $\alpha$ (column), the best and the worst α-nDCG values are marked.
  28. Conclusion
      • Alternative, relevance-based formulation of greedy aspect-based diversification
        – Unifies two previous aspect-based algorithms
        – More literal expression of the formal problem statement (and metrics?)
      • $p(r \mid d, q, z)$ vs. $p(d \mid q, z)$
        – Literal value estimates needed (rather than rank-equivalent approximations)
        – Estimate based on positional relevance (relevance or click data needed)
      • Seems to perform well empirically
        – Light requirements on relevance or click data for training positional relevance
        – Improvement trend, but needs to be tested under further optimizations
      • Formal support for redundancy tolerance adjustment
