SIGIR 2012 - Explicit Relevance Models in Intent-Oriented Information Retrieval Diversification

35th Annual International ACM SIGIR Conference on Research
and Development in Information Retrieval (SIGIR 2012)

Explicit Relevance Models
in Intent-Aware IR Diversification
Saúl Vargas, Pablo Castells and David Vallet
Universidad Autónoma de Madrid
http://ir.ii.uam.es

Portland, OR, 13 August 2012

IRG
Explicit Relevance Models in Intent-Aware IR Diversification
35th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012)
IR Group @ UAM Portland, OR, 13 August 2012

Outline

• Context: IR diversification formulation and algorithms
• Proposed approach: relevance-based reformulation of diversification algorithms
• Experiments
• Adjustable tolerance to redundancy
• Conclusion


IR diversity – Brief recap

[Figure: example intents of the same query: Nutrition / Health, Appliance, Chemical element, Golf, Mining / Metallurgy]

• Diversity as a means to address uncertainty in user queries
  – The same query may have different intents or aspects in the underlying information need
• Revision of document relevance independence
  – Marginal utility of additional relevant documents decreases fast
• Trade diminishing marginal utility for increased intent coverage
  – Thus maximize the number of users who obtain at least some useful document


IR diversification – Problem statement

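Given a query $q$ on a collection $\mathcal{D}$, find $S \subset \mathcal{D}$ of given size maximizing

$p(\text{some } d \in S \text{ relevant} \mid q)$

(Agrawal 2009, Santos 2010, Chen 2006, …). The problem is NP-hard.

Greedy approximation: starting from the baseline ranking $R$, ordered by $p(d \mid q)$, and an empty $S$, repeatedly move from $R - S$ to $S$ the document

$\arg\max_{d \in R - S} \varphi(d, S \mid q)$

so that $S$ becomes the diversified ranking, where

$\varphi(d, S \mid q) \propto p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$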
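The greedy approximation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `greedy_diversify`, the toy documents and the objective `toy_phi` are all hypothetical, and any of the objective functions discussed in the talk could be passed in as `phi`.

```python
from typing import Callable, Hashable, List, Sequence

Doc = Hashable


def greedy_diversify(
    baseline: Sequence[Doc],
    phi: Callable[[Doc, List[Doc]], float],
    n: int,
) -> List[Doc]:
    """Build S by repeatedly moving the arg max of phi(d, S) from R - S to S."""
    remaining = list(baseline)  # R - S, kept in baseline order
    selected: List[Doc] = []    # S, the diversified ranking
    while remaining and len(selected) < n:
        # max() returns the first maximal element, so ties keep baseline order
        best = max(remaining, key=lambda d: phi(d, selected))
        remaining.remove(best)
        selected.append(best)
    return selected


# Toy data (hypothetical): two documents cover intent "a", one covers "b".
intent_of = {"d1": "a", "d2": "a", "d3": "b"}
score = {"d1": 0.5, "d2": 0.4, "d3": 0.1}


def toy_phi(d, S):
    # Hypothetical objective: the baseline score, cut to a tenth when the
    # intent of d is already covered by a selected document.
    covered = {intent_of[s] for s in S}
    return score[d] * (0.1 if intent_of[d] in covered else 1.0)


ranking = greedy_diversify(["d1", "d2", "d3"], toy_phi, n=3)
print(ranking)
```

With these toy numbers the low-scored document of the uncovered intent is promoted above the second document of the already covered intent.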


IR diversity – Instantiations of objective function

State-of-the-art aspect-based approaches, both defined over explicit query aspects $z$:

• IA-Select scheme (Agrawal 2009)

$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \bigl(1 - p(z \mid d')\, p(d' \mid q)\bigr)$

• xQuAD scheme (Santos 2010)

$\varphi(d, S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q)$
$\qquad = (1 - \lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \bigl(1 - p(d' \mid q, z)\bigr)$

Components of the two schemes:
• Query aspect coverage: the sum over aspects $z$, weighted by $p(z \mid q)$
• Document "relevance" for query aspect: $p(z \mid d)\, p(d \mid q)$ in IA-Select, $p(d \mid q, z)$ in xQuAD
• Redundancy penalization: the product over $d' \in S$
• Mixture with baseline: $\lambda$ sets the degree of diversification (xQuAD)

Both schemes are built on the probability to observe documents, $p(d \mid \cdot)$, whereas the problem statement is about relevance:

$\varphi(d, S \mid q) \propto p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$
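As a rough illustration of how the two objective functions are evaluated for one candidate document, here is a Python sketch. The dictionary layout (`p_z_d[d][z]`, `p_d_qz[z][d]`), the function names and all the toy probabilities are assumptions made for this example only; they are not taken from the IA-Select or xQuAD papers or from any library.

```python
from math import prod


def ia_select_phi(d, S, p_z_q, p_z_d, p_d_q):
    """sum_z p(z|q) p(z|d) p(d|q) prod_{d' in S} (1 - p(z|d') p(d'|q))."""
    return sum(
        p_z_q[z] * p_z_d[d].get(z, 0.0) * p_d_q[d]
        * prod(1.0 - p_z_d[s].get(z, 0.0) * p_d_q[s] for s in S)
        for z in p_z_q
    )


def xquad_phi(d, S, lam, p_z_q, p_d_qz, p_d_q):
    """(1 - lam) p(d|q) + lam sum_z p(z|q) p(d|q,z) prod_{d' in S} (1 - p(d'|q,z))."""
    diversity = sum(
        p_z_q[z] * p_d_qz[z].get(d, 0.0)
        * prod(1.0 - p_d_qz[z].get(s, 0.0) for s in S)
        for z in p_z_q
    )
    return (1.0 - lam) * p_d_q[d] + lam * diversity


# Toy distributions (hypothetical): two aspects, three documents.
p_z_q = {"a": 0.5, "b": 0.5}
p_d_q = {"d1": 0.5, "d2": 0.3, "d3": 0.2}
p_z_d = {"d1": {"a": 1.0}, "d2": {"a": 1.0}, "d3": {"b": 1.0}}
p_d_qz = {"a": {"d1": 0.6, "d2": 0.4}, "b": {"d3": 1.0}}

S = ["d1"]  # d1 (aspect "a") has already been selected
ia_d2 = ia_select_phi("d2", S, p_z_q, p_z_d, p_d_q)
ia_d3 = ia_select_phi("d3", S, p_z_q, p_z_d, p_d_q)
xq_d2 = xquad_phi("d2", S, 0.5, p_z_q, p_d_qz, p_d_q)
xq_d3 = xquad_phi("d3", S, 0.5, p_z_q, p_d_qz, p_d_q)
print(ia_d2, ia_d3, xq_d2, xq_d3)
```

In this toy setting both schemes score `d3`, the only document of the uncovered aspect, above `d2`, even though `d2` has the higher baseline probability.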

IR diversity – Relevance-based instantiation of objective function

Our proposal: replace the document distributions with the probability of relevance $r$.

• IA-Select scheme – relevance-based

$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$

• xQuAD scheme – relevance-based

$\varphi(d, S \mid q) = (1 - \lambda)\, p(r \mid d, q) + \lambda\, p(r|d,\ \neg r|S \mid q)$
$\qquad = (1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$

Here $p(r|d,\ \neg r|S \mid q)$ is the probability, given $q$, that $d$ is relevant and no document in $S$ is relevant.

• More literal interpretation of the initial problem statement:

$\varphi(d, S \mid q) \propto p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$

• The two relevance-based schemes are equivalent for $\lambda = 1$
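The equivalence for λ = 1 can be checked numerically with a short Python sketch. The function names, the layout `p_r[z][d]` for $p(r \mid d, q, z)$ and the toy values are assumptions of this example, not part of the talk.

```python
from math import prod


def rel_ia_select_phi(d, S, p_z_q, p_r):
    """sum_z p(z|q) p(r|d,q,z) prod_{d' in S} (1 - p(r|d',q,z))."""
    return sum(
        p_z_q[z] * p_r[z].get(d, 0.0)
        * prod(1.0 - p_r[z].get(s, 0.0) for s in S)
        for z in p_z_q
    )


def rel_xquad_phi(d, S, lam, p_z_q, p_r, p_r_dq):
    """(1 - lam) p(r|d,q) + lam * (relevance-based IA-Select term)."""
    return (1.0 - lam) * p_r_dq[d] + lam * rel_ia_select_phi(d, S, p_z_q, p_r)


# Toy values (hypothetical). Unlike p(d|q,z), these need not sum to 1 over d.
p_z_q = {"a": 0.5, "b": 0.5}
p_r = {"a": {"d1": 0.9, "d2": 0.8}, "b": {"d3": 0.6}}
p_r_dq = {"d1": 0.45, "d2": 0.4, "d3": 0.3}

S = ["d1"]
ia_d2 = rel_ia_select_phi("d2", S, p_z_q, p_r)
xq_d2_lam1 = rel_xquad_phi("d2", S, 1.0, p_z_q, p_r, p_r_dq)
xq_d2_lam0 = rel_xquad_phi("d2", S, 0.0, p_z_q, p_r, p_r_dq)
print(ia_d2, xq_d2_lam1, xq_d2_lam0)
```

At λ = 1 the baseline term vanishes and the xQuAD value coincides with the IA-Select value; at λ = 0 only the baseline relevance estimate remains.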

Relevance distribution vs. document distribution

$p(r \mid d, \cdot)$ vs. $p(d \mid \cdot)$ – The difference does matter (in this context)

$\sum_d p(d \mid q, z) = 1$

$\sum_d p(r \mid d, q, z) = \mathrm{E}[\text{number of relevant documents}] \geq 1$

• Different potential behavior, e.g. stronger redundancy penalization in

$(1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$

• Potential rank equivalences do not apply here
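A small numeric illustration of why the difference matters for the redundancy product. All numbers are made up for the example: 100 documents, a uniform document distribution and a constant relevance probability of 0.5.

```python
from math import prod

n_docs = 100
p_d = [1.0 / n_docs] * n_docs  # p(d|q,z): one distribution over documents
p_r = [0.5] * n_docs           # p(r|d,q,z): one probability per document

selected = range(3)  # indices of three documents already in S

# Factor left by the redundancy product for an aspect after three selections
keep_doc = prod(1.0 - p_d[i] for i in selected)  # document-based
keep_rel = prod(1.0 - p_r[i] for i in selected)  # relevance-based

print(sum(p_d), sum(p_r), keep_doc, keep_rel)
```

Because the document distribution spreads a total mass of 1 over all documents, each factor stays close to 1 and the aspect is barely discounted. The relevance probabilities are not constrained to sum to 1, so the same three selections shrink the product to 0.125.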

Relevance-based greedy diversification

Relevance-based reformulation of diversification algorithm

1. Need to estimate 𝑝 𝑟 𝑑, 𝑞, 𝑧

2. Does it work? Test empirically

3. Further development: parameterized tolerance to redundancy


Aspect-based relevance model

Estimate $p(r \mid d, q, z)$

Cannot use odds, logs, constant removal… or any other rank-preserving step
(we need the specific values)

$p(r \mid d, q, z)$ is built from:
• $p(r \mid d, q)$: positional relevance, $p(r \mid \mathrm{rank}(d, q))$
• $p(z \mid d)$, $p(z \mid q)$, $p(z)$: estimate $p(z \mid d)$ or $p(z \mid q)$ depending on available observations, then derive the other two parameters
  – $z$ as document classes (e.g. ODP)
  – $z$ as subqueries (e.g. reformulations)
• $p(d \mid q)$: normalized baseline IR system score (as in e.g. Bache 2009)
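The slide does not spell out how the remaining parameters are derived. One plausible route, shown here purely as an assumption, is to normalize the baseline scores into $p(d \mid q)$ and then marginalize, $p(z \mid q) = \sum_d p(z \mid d)\, p(d \mid q)$, which treats $z$ as independent of $q$ given $d$. The function names and toy values are hypothetical.

```python
def normalize_scores(scores):
    """Turn non-negative retrieval scores into a distribution p(d|q)."""
    total = sum(scores.values())
    return {d: s / total for d, s in scores.items()}


def aspect_given_query(p_z_d, p_d_q):
    """Assumed derivation: p(z|q) = sum_d p(z|d) p(d|q)."""
    p_z_q = {}
    for d, pd in p_d_q.items():
        for z, pz in p_z_d.get(d, {}).items():
            p_z_q[z] = p_z_q.get(z, 0.0) + pz * pd
    return p_z_q


# Toy inputs (hypothetical)
scores = {"d1": 3.0, "d2": 1.0}
p_z_d = {"d1": {"a": 1.0}, "d2": {"a": 0.5, "b": 0.5}}

p_d_q = normalize_scores(scores)
p_z_q = aspect_given_query(p_z_d, p_d_q)
print(p_d_q, p_z_q)
```

The sketch assumes non-negative scores with a positive sum; scores on a log scale would need to be transformed before normalizing.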


Positional relevance distribution estimate

$p(r \mid d, q) \sim p(r \mid \mathrm{rank}(d, q)) = p(r \mid k)$

[Figure: $p(r \mid k)$ on a log scale (1E-05 to 1E+00) against rank $k$ (0 to 200); curves: precision estimates for pLSA and Lemur, click log statistics for AOL]
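One simple reading of a precision-based estimate of $p(r \mid k)$ is the fraction of training queries whose result at rank $k$ is judged relevant. The sketch below follows that reading; the function name, the data layout and the toy runs are assumptions, and the estimator actually used for the curves may differ (e.g. in smoothing).

```python
from typing import Dict, List, Set


def positional_relevance(
    runs: Dict[str, List[str]],
    qrels: Dict[str, Set[str]],
    depth: int,
) -> List[float]:
    """Estimate p(r|k) for k = 1..depth (list index k - 1)."""
    estimates = []
    for k in range(depth):
        # Only queries that returned at least k + 1 results count at this rank
        ranked_here = [q for q, docs in runs.items() if len(docs) > k]
        hits = sum(1 for q in ranked_here if runs[q][k] in qrels.get(q, set()))
        estimates.append(hits / len(ranked_here) if ranked_here else 0.0)
    return estimates


# Toy training data (hypothetical): baseline rankings and relevance judgments
runs = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"]}
qrels = {"q1": {"a", "c"}, "q2": {"d"}}

p_r_k = positional_relevance(runs, qrels, depth=3)
print(p_r_k)
```

Given such a table, $p(r \mid d, q)$ for a document at rank $k$ in the baseline ranking is simply read off as `p_r_k[k - 1]`.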


Experiments
Search diversity

Collection: ClueWeb09 category B (50M documents)
Query/subtopic set: TREC 2009/10 diversity task (100 queries)
Baseline ranking: Lemur Indri search engine (Web service)
Diversified top n: 100
Query aspect space:
  a) ODP categories level 4 (~7K categories)
  b) TREC subtopics (oracle for reference)
Specific parameter estimates:
  $p(z \mid q)$: Uniform
  $p(z \mid d)$:
    ODP categories: semi-supervised text classification by Textwise
    TREC subtopics: Indri search system run on $z$ as if a query
  $p(r \mid k)$:
    i. P@k estimates with TREC relevance judgments (2-fold 2009/10 cross validation)
    ii. Click statistics from AOL log (thus different IR system)


Experiments – Search diversity on TREC

[Figure: ERR-IA against λ for the xQuAD scheme, based on $p(r \mid d, q, z)$ (with $p(r \mid k)$ from qrels) vs. based on $p(d \mid q, z)$; panels: ODP categories, TREC subtopics]


Experiments – Search diversity on TREC

Aspect space        Diversifier                   λ     α-nDCG@20   ERR-IA@20   nDCG-IA@20   S-recall@20
(baseline)          Lemur                         -     0.2587      0.1630      0.2396       0.4636
a) ODP categories   IA-Select                     -     0.2651      0.1681      0.2423       0.4483
                    xQuAD                         0.9   0.2675      0.1656      0.2451       0.4864
                    Rel-based xQuAD, i. Qrels     0.1   0.2858△▲    0.1828△▲    0.2655△▲     0.4898▲△
                    Rel-based xQuAD, ii. Clicks   0.4   0.2841▲△    0.1831△△    0.2605△▲     0.4830▲▽
b) TREC subtopics   IA-Select                     -     0.3541      0.2346      0.3213       0.5787
                    xQuAD                         1.0   0.3445      0.2241      0.3127       0.5704
                    Rel-based xQuAD, i. Qrels     1.0   0.3543△△    0.2349△△    0.3192▽△     0.5782▽△
                    Rel-based xQuAD, ii. Clicks   1.0   0.3512▽△    0.2320▽△    0.3166▽△     0.5748▽△

λ: "informally" maximizing ERR-IA by 0.1 steps for each diversifier
Best value in bold green on the original slide
▲▼: p < 0.05
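Of the four metrics in the table, S-recall@k (subtopic recall) is the simplest to state: the fraction of a query's subtopics covered by at least one of the top k documents. The Python sketch below uses that standard definition; the function name, the data layout and the toy judgments are hypothetical, and official evaluation tools may differ in details such as handling of subtopics without relevant documents.

```python
def s_recall_at_k(ranking, subtopics_of, n_subtopics, k):
    """Fraction of the query's subtopics covered by the top k documents."""
    covered = set()
    for d in ranking[:k]:
        covered |= subtopics_of.get(d, set())
    return len(covered) / n_subtopics


# Toy judgments (hypothetical): the query has 4 subtopics
subtopics_of = {"d1": {1}, "d2": {1}, "d3": {2, 3}}
ranking = ["d1", "d2", "d3"]

at_2 = s_recall_at_k(ranking, subtopics_of, n_subtopics=4, k=2)
at_3 = s_recall_at_k(ranking, subtopics_of, n_subtopics=4, k=3)
print(at_2, at_3)
```

The second document adds nothing at cutoff 2 because it repeats subtopic 1, which is exactly the redundancy that diversification tries to avoid.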


Experiments
Recommendation diversity

Dataset 1: MovieLens 1M
  Collection: 6K users, 4K movies, 1M ratings
  Subtopic set: 10 movie genres
Dataset 2: Last.fm crawl
  Collection: 1K users, 175K artists, 20M playcounts
  Subtopic set: 120K social tags on artists by Last.fm users

Adaptation of IR diversity paradigm (Vargas, Castells & Vallet SIGIR 2011):
  Queries → users
  Documents → items (movies, music artists)
  Subtopics → item features (genres, tags)
  Relevance judgments → test ratings from data split

Baseline rankings:
  a) pLSA
  b) Popularity-based recommendation
Diversified top n: 100
Specific parameter estimates:
  $p(z \mid q)$: Uniform
  $p(z \mid d)$: Uniform on $d$ (based on binary aspect/item association)
  $p(r \mid k)$: P@k estimates with 2-fold cross-validation on test users
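The uniform aspect estimates can be sketched as follows. This assumes one reading of "uniform on $d$": each item spreads its probability mass equally over the features it is associated with. The function names and the toy items are hypothetical.

```python
def uniform_aspect_given_item(features_of):
    """p(z|d) = 1 / |features(d)| for every feature z associated with item d."""
    return {
        item: {z: 1.0 / len(fs) for z in fs}
        for item, fs in features_of.items()
        if fs  # items without any feature get no distribution
    }


def uniform_aspect_given_user(all_features):
    """p(z|q) = 1 / |Z|, the same for every user (query)."""
    return {z: 1.0 / len(all_features) for z in all_features}


# Toy binary item/genre associations (hypothetical)
features_of = {"m1": {"Horror", "Sci-Fi"}, "m2": {"Crime"}, "m3": set()}

p_z_d = uniform_aspect_given_item(features_of)
p_z_q = uniform_aspect_given_user({"Horror", "Sci-Fi", "Crime", "Drama"})
print(p_z_d, p_z_q)
```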


Experiments – Recommendation diversity on MovieLens and Last.fm

[Figure: ERR-IA against λ, based on $p(r \mid d, q, z)$ vs. based on $p(d \mid q, z)$; rows: pLSA recommender, recommendation by item popularity; columns: MovieLens 1M, Last.fm]


Adjustable tolerance to redundancy

• Generalization of relevance-based diversification scheme
• Formally support adjustable redundancy penalization
• Approach: generalize relevance to browsing model

$\varphi(d, S \mid q) = (1 - \lambda)\, p(r \mid d, q) + \lambda\, p(r|d,\ \neg \mathit{stop}|S \mid q) = \cdots$
$\qquad = (1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, z, q) \prod_{d' \in S} \bigl(1 - p(r \mid d', z, q)\, p(\mathit{stop} \mid r)\bigr)$

• Adjustable redundancy tolerance parameter $p(\mathit{stop} \mid r) \in [0, 1]$
  – High $p(\mathit{stop} \mid r)$ for aggressive penalization, low for e.g. high-recall searches
  – In this view, original formulations would implicitly assume $p(\mathit{stop} \mid r) = 1$, i.e. a single relevant document is sought
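The effect of the tolerance parameter can be seen by evaluating the generalized objective for a redundant candidate at several values of $p(\mathit{stop} \mid r)$. This is an illustrative sketch only: the function name, the layout `p_r[z][d]` for $p(r \mid d, z, q)$ and the toy probabilities are assumptions.

```python
from math import prod


def tolerant_xquad_phi(d, S, lam, p_stop, p_z_q, p_r, p_r_dq):
    """Relevance-based xQuAD with each redundancy factor scaled by p(stop|r)."""
    diversity = sum(
        p_z_q[z] * p_r[z].get(d, 0.0)
        * prod(1.0 - p_r[z].get(s, 0.0) * p_stop for s in S)
        for z in p_z_q
    )
    return (1.0 - lam) * p_r_dq[d] + lam * diversity


# Toy values (hypothetical): d1 and d2 are both relevant to aspect "a"
p_z_q = {"a": 0.5, "b": 0.5}
p_r = {"a": {"d1": 0.9, "d2": 0.8}, "b": {"d3": 0.6}}
p_r_dq = {"d1": 0.45, "d2": 0.4, "d3": 0.3}

S = ["d1"]
# Score of the redundant candidate d2 at lambda = 1 for three tolerance levels
no_penalty = tolerant_xquad_phi("d2", S, 1.0, 0.0, p_z_q, p_r, p_r_dq)
half = tolerant_xquad_phi("d2", S, 1.0, 0.5, p_z_q, p_r, p_r_dq)
full = tolerant_xquad_phi("d2", S, 1.0, 1.0, p_z_q, p_r, p_r_dq)
print(no_penalty, half, full)
```

With $p(\mathit{stop} \mid r) = 1$ the sketch reduces to the relevance-based xQuAD formula, and with $p(\mathit{stop} \mid r) = 0$ the already selected documents no longer discount the candidate at all.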


Adjustable tolerance to redundancy

Empirical observation: $p(\mathit{stop} \mid r)$ vs. λ in α-nDCG

[Figure: two panels, Search task (Lemur on TREC / Subtopics) and Recommendation task (pLSA on MovieLens / Genres); horizontal axis λ from 0 to 1, vertical axis $p(\mathit{stop} \mid r)$ from 0 to 1; for each λ, the best and the worst α-nDCG value of the column are marked]


Conclusion

• Alternative, relevance-based formulation of greedy aspect-based diversification
  – Unifies two previous aspect-based algorithms
  – More literal expression of formal problem statement (and metrics?)
• $p(r \mid d, q, z)$ vs. $p(d \mid q, z)$
  – Literal value estimates needed (rather than rank-equivalent approximations)
  – Estimate based on positional relevance (relevance or click data needed)
• Seems to perform well empirically
  – Light requirements on relevance or click data for training positional relevance
  – Improvement trend, but needs to be tested under further optimizations
• Formal support for redundancy tolerance adjustment
