SIGIR 2012 - Explicit Relevance Models in Intent-Oriented Information Retrieval Diversification


The intent-oriented search diversification methods developed in the field so far tend to build on generative views of the retrieval system to be diversified. Core algorithm components –in particular redundancy assessment– are expressed in terms of the probability to observe documents, rather than the probability that the documents be relevant. This has been sometimes described as a view considering the selection of a single document in the underlying task model. In this paper we propose an alternative formulation of aspect-based diversification algorithms which explicitly includes a formal relevance model. We develop means for the effective computation of the new formulation, and we test the resulting algorithm empirically. We report experiments on search and recommendation tasks showing competitive or better performance than the original diversification algorithms. The relevance-based formulation has further interesting properties, such as unifying two well-known state of the art algorithms into a single version. The relevance-based approach opens alternative possibilities for further formal connections and developments as natural extensions of the framework. We illustrate this by modeling tolerance to redundancy as an explicit configurable parameter, which can be set to better suit the characteristics of the IR task, or the evaluation metrics, as we illustrate empirically.


Transcript

  • 1. 35th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012). Explicit Relevance Models in Intent-Aware IR Diversification. Saúl Vargas, Pablo Castells and David Vallet, Universidad Autónoma de Madrid (IR Group @ UAM), http://ir.ii.uam.es. Portland, OR, 13 August 2012
  • 2. Outline
      • Context: IR diversification formulation and algorithms
      • Proposed approach: relevance-based reformulation of diversification algorithms
      • Experiments
      • Adjustable tolerance to redundancy
      • Conclusion
  • 3. IR diversity – Brief recap [Figure labels: Nutrition / Health, Appliance, Chemical element, Golf, Mining / Metallurgy]
  • 4. IR diversity – Brief recap [Figure labels: Nutrition / Health, Appliance, Chemical element, Golf, Mining / Metallurgy]
      • Diversity as a means to address uncertainty in user queries
          – The same query may have different intents or aspects in the information need underneath
      • Revision of document relevance independence
          – Marginal utility of additional relevant documents decreases fast
      • Trade diminishing marginal utility for increased intent coverage
          – Thus maximize the number of users who obtain at least some useful document
  • 5. IR diversification – Problem statement
      • Given a query $q$ on a collection, find a subset $S$ of the collection, of given size, maximizing $p(\text{some } d \in S \text{ relevant} \mid q)$ (Agrawal 2009, Santos 2010, Chen 2006, …)
      • The problem is NP-hard
      • Greedy approximation: a baseline ranking $R$, scored by $p(d \mid q)$, is turned into a diversified ranking $S$ by repeatedly selecting $\arg\max_{d \in R - S} \varphi(d, S \mid q)$
      • $\varphi(d, S \mid q) \propto p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$
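The greedy approximation on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `greedy_diversify`, the `phi` callback signature and the toy numbers are assumptions made for the example, not part of the talk. Any objective $\varphi(d, S \mid q)$ can be plugged in as `phi`.

```python
from typing import Callable, Hashable, List, Sequence


def greedy_diversify(
    baseline: Sequence[Hashable],
    phi: Callable[[Hashable, List[Hashable]], float],
    n: int,
) -> List[Hashable]:
    """Greedily build a diversified ranking S from a baseline ranking R."""
    remaining = list(baseline)      # R - S, kept in baseline order
    selected: List[Hashable] = []   # S, the diversified ranking so far
    while remaining and len(selected) < n:
        # max() returns the first maximal item, so ties keep baseline order.
        best = max(remaining, key=lambda d: phi(d, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy objective with hypothetical numbers: a document loses 90% of its
# value once its topic is already covered by the selected set.
toy_relevance = {"a": 0.9, "b": 0.8, "c": 0.5}
toy_topic = {"a": "x", "b": "x", "c": "y"}


def toy_phi(d, selected):
    covered = {toy_topic[s] for s in selected}
    return toy_relevance[d] * (0.1 if toy_topic[d] in covered else 1.0)


print(greedy_diversify(["a", "b", "c"], toy_phi, n=3))  # ['a', 'c', 'b']
```

Document "c" overtakes "b" because "b" shares its topic with the already selected "a". Each step rescans the remaining candidates, so the sketch costs on the order of n times |R| objective evaluations.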
  • 6. IR diversity – Instantiations of objective function. State of the art aspect-based approaches
      • IA-Select scheme (Agrawal 2009):
        $\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \bigl(1 - p(z \mid d')\, p(d' \mid q)\bigr)$
      • xQuAD scheme (Santos 2010):
        $\varphi(d, S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \bigl(1 - p(d' \mid q, z)\bigr)$
      • Callout on both formulas: explicit query aspects ($z$)
  • 7. IR diversity – Instantiations of objective function (same IA-Select and xQuAD formulas as slide 6). Callout: query aspect coverage
  • 8. IR diversity – Instantiations of objective function (same IA-Select and xQuAD formulas as slide 6). Callout: document "relevance" for query aspect
  • 9. IR diversity – Instantiations of objective function (same IA-Select and xQuAD formulas as slide 6). Callout: redundancy penalization
  • 10. IR diversity – Instantiations of objective function (same IA-Select and xQuAD formulas as slide 6). Callout: mixture with baseline; $\lambda$ sets the degree of diversification
  • 11. IR diversity – Instantiations of objective function
      • $\varphi(d, S \mid q) \propto p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$
      • IA-Select scheme (Agrawal 2009):
        $\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \bigl(1 - p(z \mid d')\, p(d' \mid q)\bigr)$
      • xQuAD scheme (Santos 2010):
        $\varphi(d, S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \bigl(1 - p(d' \mid q, z)\bigr)$
      • Callout: both schemes use the probability to observe documents
  • 12. IR diversity – Relevance-based instantiation of objective function (our proposal)
      • $\varphi(d, S \mid q) \propto p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$
      • IA-Select scheme, relevance-based:
        $\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$
      • xQuAD scheme, relevance-based:
        $\varphi(d, S \mid q) = (1 - \lambda)\, p(r_d \mid q) + \lambda\, p(r_d, \neg r_S \mid q) = (1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$
      • Callout: probability of relevance
  • 13. IR diversity – Relevance-based instantiation of objective function (same relevance-based formulas as slide 12). Callout: more literal interpretation of initial problem statement
  • 14. IR diversity – Relevance-based instantiation of objective function (same relevance-based formulas as slide 12). Callout: the two schemes are equivalent for $\lambda = 1$
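As a hedged illustration of the relevance-based xQuAD objective on these slides, the sketch below evaluates $(1-\lambda)\,p(r \mid d,q) + \lambda \sum_z p(z \mid q)\,p(r \mid d,q,z) \prod_{d' \in S}(1 - p(r \mid d',q,z))$ for one candidate. The function name, the dictionary layout and all probabilities are made up for the example; the talk does not prescribe an implementation.

```python
from math import prod
from typing import Dict, List


def rel_xquad_score(
    d: str,
    selected: List[str],
    lam: float,
    p_rel: Dict[str, float],                    # p(r | d, q)
    p_aspect: Dict[str, float],                 # p(z | q)
    p_rel_aspect: Dict[str, Dict[str, float]],  # p(r | d, q, z)
) -> float:
    """Relevance-based xQuAD objective for candidate d given selected set S."""
    diversity = sum(
        p_z
        * p_rel_aspect[d][z]
        # Probability that no already selected document is relevant to z.
        * prod(1.0 - p_rel_aspect[dp][z] for dp in selected)
        for z, p_z in p_aspect.items()
    )
    return (1.0 - lam) * p_rel[d] + lam * diversity


# Hypothetical estimates for three documents and two aspects.
p_aspect = {"z1": 0.5, "z2": 0.5}
p_rel_aspect = {
    "d1": {"z1": 0.8, "z2": 0.0},
    "d2": {"z1": 0.6, "z2": 0.0},
    "d3": {"z1": 0.0, "z2": 0.5},
}
p_rel = {"d1": 0.40, "d2": 0.30, "d3": 0.25}

# After d1 is selected, d3 (new aspect) outscores d2 (same aspect as d1).
for doc in ("d2", "d3"):
    print(doc, rel_xquad_score(doc, ["d1"], 1.0, p_rel, p_aspect, p_rel_aspect))
```

With `lam=1.0` only the diversity term remains, which is the relevance-based IA-Select form; with `lam=0.0` the score falls back to the baseline relevance estimate.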
  • 15. Relevance distribution vs. document distribution
      • $p(r \mid d, \cdot)$ vs. $p(d \mid \cdot)$: the difference does matter (in this context)
      • $\sum_d p(d \mid q, z) = 1$
      • $\sum_d p(r \mid d, q, z) = E[\text{nr relevant docs}] \geq 1$
      • Different potential behavior, e.g. stronger redundancy penalization
      • Potential rank equivalences do not apply here:
        $(1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$
  • 16. Relevance-based greedy diversification. Relevance-based reformulation of diversification algorithm
      1. Need to estimate $p(r \mid d, q, z)$
      2. Does it work? Test empirically
      3. Further development: parameterized tolerance to redundancy
  • 17. Aspect-based relevance model
      • Estimate $p(r \mid d, q, z)$
      • Cannot use odds, logs, constant removal… or any other rank-preserving step (we need the specific values)
      • Quantities in the estimation diagram:
          – $p(r \mid d, q)$: positional relevance $p(r \mid \mathrm{rank}(d, q))$
          – $p(z \mid d)$, $p(z \mid q)$, $p(z)$: estimate $p(z \mid d)$ or $p(z \mid q)$ depending on available observations ($z$ as document classes, e.g. ODP; $z$ as subqueries, e.g. reformulations), then derive the other two parameters
          – $p(d \mid q)$: normalized baseline IR system score (as in e.g. Bache 2009)
  • 18. Positional relevance distribution estimate: $p(r \mid d, q) \sim p(r \mid \mathrm{rank}(d, q)) = p(r \mid k)$
      [Figure: $p(r \mid k)$ on a log scale (1E-05 to 1E+00) against rank $k$ (0 to 200); curves: pLSA and Lemur (precision estimates), AOL (click log statistics)]
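The slides say only that $p(r \mid k)$ is estimated from precision or click statistics. One simple reading, assumed here and not confirmed by the talk, is the fraction of training queries whose document at rank $k$ is judged relevant. The function name `positional_relevance` and the toy rankings and judgments are hypothetical.

```python
from typing import Dict, List, Set


def positional_relevance(
    rankings: Dict[str, List[str]],  # query id -> ranked document ids
    qrels: Dict[str, Set[str]],      # query id -> relevant document ids
    depth: int,
) -> List[float]:
    """Estimate p(r | k) for k = 1..depth; the value for rank k is at index k-1."""
    hits = [0] * depth  # relevant documents observed at each rank
    seen = [0] * depth  # rankings that reach each rank
    for query, ranking in rankings.items():
        relevant = qrels.get(query, set())
        for k, doc in enumerate(ranking[:depth]):
            seen[k] += 1
            hits[k] += int(doc in relevant)
    # Ranks never reached by any training ranking get an estimate of 0.0.
    return [h / s if s else 0.0 for h, s in zip(hits, seen)]


toy_rankings = {"q1": ["a", "b", "c"], "q2": ["d", "e"]}
toy_qrels = {"q1": {"a", "c"}, "q2": {"d", "e"}}
print(positional_relevance(toy_rankings, toy_qrels, depth=3))  # [1.0, 0.5, 1.0]
```

The estimate returns actual probabilities rather than rank-equivalent scores, which matches the requirement on slide 17 that specific values are needed. In practice the raw frequencies would likely need smoothing at deep ranks, where few observations are available.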
  • 19. Relevance-based greedy diversification. Relevance-based reformulation of diversification algorithm
      1. Need to estimate $p(r \mid d, q, z)$
      2. Does it work? Test empirically
      3. Further development: parameterized tolerance to redundancy
  • 20. Experiments: search diversity
      • Collection: ClueWeb09 category B (50M documents)
      • Query/subtopic set: TREC 2009/10 diversity task (100 queries)
      • Baseline ranking: Lemur Indri search engine (Web service)
      • Diversified top $n$: 100
      • Query aspect space:
          – a) ODP categories level 4 (~7K categories)
          – b) TREC subtopics (oracle for reference)
      • Specific parameter estimates:
          – $p(z \mid q)$: uniform
          – $p(z \mid d)$: for ODP categories, semi-supervised text classification by Textwise; for TREC subtopics, Indri search system run on $z$ as if a query
          – $p(r \mid k)$: i. P@k estimates with TREC relevance judgments (2-fold 2009/10 cross validation); ii. click statistics from AOL log (thus different IR system)
  • 21. Experiments – Search diversity on TREC, xQuAD scheme
      [Figure: ERR-IA against λ, one panel for ODP categories and one for TREC subtopics; curves: based on $p(r \mid d, q, z)$ with $p(r \mid k)$ from qrels, and based on $p(d \mid q, z)$]
  • 22. Experiments – Search diversity on TREC

      | Aspect space      | Diversifier                | λ   | α-nDCG@20 | ERR-IA@20 | nDCG-IA@20 | S-recall@20 |
      | (baseline)        | Lemur                      | -   | 0.2587    | 0.1630    | 0.2396     | 0.4636      |
      | a) ODP categories | IA-Select                  | -   | 0.2651    | 0.1681    | 0.2423     | 0.4483      |
      | a) ODP categories | xQuAD                      | 0.9 | 0.2675    | 0.1656    | 0.2451     | 0.4864      |
      | a) ODP categories | Rel-based xQuAD i. Qrels   | 0.1 | 0.2858△▲  | 0.1828△▲  | 0.2655△▲   | 0.4898▲△    |
      | a) ODP categories | Rel-based xQuAD ii. Clicks | 0.4 | 0.2841▲△  | 0.1831△△  | 0.2605△▲   | 0.4830▲▽    |
      | b) TREC subtopics | IA-Select                  | -   | 0.3541    | 0.2346    | 0.3213     | 0.5787      |
      | b) TREC subtopics | xQuAD                      | 1.0 | 0.3445    | 0.2241    | 0.3127     | 0.5704      |
      | b) TREC subtopics | Rel-based xQuAD i. Qrels   | 1.0 | 0.3543△△  | 0.2349△△  | 0.3192▽△   | 0.5782▽△    |
      | b) TREC subtopics | Rel-based xQuAD ii. Clicks | 1.0 | 0.3512▽△  | 0.2320▽△  | 0.3166▽△   | 0.5748▽△    |

      λ chosen by "informally" maximizing ERR-IA by 0.1 steps for each diversifier. Filled markers (▲▼) indicate $p < 0.05$. Best values were highlighted in bold green on the slide.
  • 23. Experiments: recommendation diversity
      • Dataset 1: MovieLens 1M
          – Collection: 6K users, 4K movies, 1M ratings
          – Subtopic set: 10 movie genres
      • Dataset 2: Last.fm crawl
          – Collection: 1K users, 175K artists, 20M playcounts
          – Subtopic set: 120K social tags on artists by Last.fm users
      • Adaptation of IR diversity paradigm (Vargas, Castells & Vallet SIGIR 2011):
          – Queries → users
          – Documents → items (movies, music artists)
          – Subtopics → item features (genres, tags)
          – Relevance judgments → test ratings from data split
      • Baseline rankings: a) pLSA; b) popularity-based recommendation
      • Diversified top $n$: 100
      • Specific parameter estimates:
          – $p(z \mid q)$: uniform
          – $p(z \mid d)$: uniform on $d$ (based on binary aspect/item association)
          – $p(r \mid k)$: P@k estimates with 2-fold cross-validation on test users
  • 24. Experiments – Recommendation diversity on MovieLens and Last.fm
      [Figure: ERR-IA against λ; panels for MovieLens 1M and Last.fm, with the pLSA recommender and recommendation by item popularity as baselines; curves: based on $p(r \mid d, q, z)$ and based on $p(d \mid q, z)$]
  • 25. Relevance-based greedy diversification. Relevance-based reformulation of diversification algorithm
      1. Need to estimate $p(r \mid d, q, z)$
      2. Does it work? Test empirically
      3. Further development: parameterized tolerance to redundancy
  • 26. Adjustable tolerance to redundancy
      • Generalization of relevance-based diversification scheme
      • Formally support adjustable redundancy penalization
      • Approach: generalize relevance to browsing model
        $\varphi(d, S \mid q) = (1 - \lambda)\, p(r \mid d, q) + \lambda\, p(r_d, \neg \mathit{stop}_S \mid q) = \cdots = (1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\, p(\mathit{stop} \mid r)\bigr)$
        (callout on $p(\mathit{stop} \mid r)$: tolerance to redundancy)
      • Adjustable redundancy tolerance parameter $p(\mathit{stop} \mid r) \in [0, 1]$
          – High $p(\mathit{stop} \mid r)$ for aggressive penalization, low for e.g. high-recall searches
          – In this view, original formulations would implicitly assume $p(\mathit{stop} \mid r) = 1$, i.e. a single relevant document is sought
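To make the role of $p(\mathit{stop} \mid r)$ concrete, here is a minimal sketch of the generalized objective with the stopping probability as an explicit argument. The function name `phi_with_tolerance`, the data layout and the numbers are assumptions for illustration only.

```python
from math import prod
from typing import Dict, List


def phi_with_tolerance(
    d: str,
    selected: List[str],
    lam: float,
    p_stop: float,                              # p(stop | r), in [0, 1]
    p_rel: Dict[str, float],                    # p(r | d, q)
    p_aspect: Dict[str, float],                 # p(z | q)
    p_rel_aspect: Dict[str, Dict[str, float]],  # p(r | d, q, z)
) -> float:
    """Relevance-based objective with adjustable redundancy penalization."""
    diversity = sum(
        p_z
        * p_rel_aspect[d][z]
        # The user continues past d' unless d' is relevant AND they stop there.
        * prod(1.0 - p_rel_aspect[dp][z] * p_stop for dp in selected)
        for z, p_z in p_aspect.items()
    )
    return (1.0 - lam) * p_rel[d] + lam * diversity


# Hypothetical estimates: d1 and d2 share aspect z1.
tol_p_aspect = {"z1": 0.5, "z2": 0.5}
tol_p_rel_aspect = {
    "d1": {"z1": 0.8, "z2": 0.0},
    "d2": {"z1": 0.6, "z2": 0.0},
}
tol_p_rel = {"d1": 0.40, "d2": 0.30}

# Score of the redundant d2 after d1 was selected, for three tolerance levels.
for p_stop in (1.0, 0.5, 0.0):
    score = phi_with_tolerance(
        "d2", ["d1"], 1.0, p_stop, tol_p_rel, tol_p_aspect, tol_p_rel_aspect
    )
    print(p_stop, round(score, 4))
```

Setting `p_stop=1.0` reproduces the plain relevance-based scheme, while `p_stop=0.0` removes the redundancy penalty altogether, which is the behavior the slide associates with high-recall searches.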
  • 27. Adjustable tolerance to redundancy. Empirical observation: $p(\mathit{stop} \mid r)$ vs. λ in α-nDCG
      [Figure: two panels, search task (Lemur on TREC / subtopics) and recommendation task (pLSA on MovieLens / genres); axes: λ from 0 to 1 and $p(\mathit{stop} \mid r)$ up to 1; for each λ, the best and the worst α-nDCG value of the column are marked]
  • 28. Conclusion
      • Alternative, relevance-based formulation of greedy aspect-based diversification
          – Unifies two previous aspect-based algorithms
          – More literal expression of formal problem statement (and metrics?)
      • $p(r \mid d, q, z)$ vs. $p(d \mid q, z)$
          – Literal value estimates needed (rather than rank-equivalent approximations)
          – Estimate based on positional relevance (relevance or click data needed)
      • Seems to perform well empirically
          – Light requirements on relevance or click data for training positional relevance
          – Improvement trend, but needs to be tested under further optimizations
      • Formal support for redundancy tolerance adjustment
