ACM SAC 2018, Pau, France
LiMe: Linear Methods for Pseudo-Relevance Feedback
Daniel Valcarce Javier Parapar Álvaro Barreiro
@dvalcarce @jparapar @AlvaroBarreiroG
Information Retrieval Lab
University of A Coruña
Spain
Outline
1. Pseudo-Relevance Feedback
2. Our proposal: LiMe
3. Experiments
4. Conclusions and Future Directions
Pseudo-Relevance Feedback
Pseudo-Relevance Feedback (I)
Pseudo-Relevance Feedback provides an automatic method for query expansion:
First retrieval with the original query
◦ Top retrieved documents are assumed to be relevant (pseudo-relevant set)
Expand the query with terms from the pseudo-relevant set
Second retrieval with the expanded query
◦ The expanded query usually performs better than the original one (see the sketch below)
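As a concrete illustration, the loop above fits in a few lines of Python. This is a minimal sketch, not the authors' implementation; `search` and `expand` are hypothetical placeholders for an arbitrary retrieval system and an arbitrary expansion method (LiMe being one choice for the latter):

```python
# Minimal pseudo-relevance feedback loop (sketch).
# search() and expand() are hypothetical placeholders.

def prf_retrieval(search, expand, query, k=10, depth=1000):
    # 1) First retrieval with the original query.
    first_ranking = search(query, depth=depth)

    # 2) Assume the top-k documents are relevant (pseudo-relevant set).
    pseudo_relevant = first_ranking[:k]

    # 3) Expand the query with terms from the pseudo-relevant set.
    expanded_query = expand(query, pseudo_relevant)

    # 4) Second retrieval with the expanded query.
    return search(expanded_query, depth=depth)
```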
Pseudo-Relevance Feedback (II)
[Diagram: an information need is expressed as a query and sent to the Retrieval System; the top-ranked documents feed a Query Expansion step, and the expanded query is submitted to the Retrieval System again]
Our proposal: LiMe
LiMe
LiMe, as a PRF technique:
models the PRF task as a matrix decomposition problem
employs linear methods to provide a solution
is able to learn inter-term similarities
jointly models the query and the pseudo-relevant set
admits different feature schemes to represent documents and queries
is agnostic to the retrieval model
Notation
Some notation:
A user issues a query Q
The collection C is composed of documents D
V denotes the vocabulary and is formed of terms t
We denote the pseudo-relevant set by F
And the extended pseudo-relevant set by F̄ = {Q} ∪ F
◦ Its cardinality is m = |F̄| = |F| + 1
◦ And its vocabulary V_F̄ has size n = |V_F̄| ≤ |V|
LiMe: Matrix Formulation
Let X ∈ R^{m×n} be the extended pseudo-relevant set matrix, whose rows represent Q, D1, …, Dm−1. We aim to find an inter-term similarity matrix W ∈ R^{n×n}_+ such that:

    X ≈ X × W    s.t. diag(W) = 0, W ≥ 0
LiMe: Feature Schemes (I)
How do we fill matrix X (rows Q, D1, …, Dm−1)?

    x_ij = s(t_j, Q)         if i = 1 and f(t_j, Q) > 0,
           s(t_j, D_{i−1})   if i > 1 and f(t_j, D_{i−1}) > 0,
           0                 otherwise

s(t, D): weighting function of term t in D (or Q)
f(t, D): #occurrences of term t in D (or Q)
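A minimal sketch of how X could be assembled from bags of words, assuming dictionaries mapping terms to frequencies and a pluggable weighting function s (all names are illustrative, not taken from the paper):

```python
import numpy as np

def build_X(query_bow, doc_bows, s):
    """Assemble the extended pseudo-relevant set matrix X (sketch).

    query_bow : dict term -> frequency for the query Q
    doc_bows  : list of dicts, one per pseudo-relevant document D1..Dm-1
    s         : weighting function s(term, bow) -> float (e.g. TF or TF-IDF)
    """
    rows = [query_bow] + doc_bows                       # row 0 is the query
    vocab = sorted({t for bow in rows for t in bow})    # feedback vocabulary V_F̄
    index = {t: j for j, t in enumerate(vocab)}

    X = np.zeros((len(rows), len(vocab)))
    for i, bow in enumerate(rows):
        for term, freq in bow.items():
            if freq > 0:                                # x_ij is non-zero only for occurring terms
                X[i, index[term]] = s(term, bow)
    return X, vocab
```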
LiMe: Feature Schemes (II)
We tested two well-known Information Retrieval weighting functions:

TF:      s_tf(w, D) = 1 + log₂ f(w, D)
TF-IDF:  s_tf-idf(w, D) = (1 + log₂ f(w, D)) × log₂(|C| / df(w))
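The two schemes as Python, assuming `bow` is a term-to-frequency dict for a document (or the query) and that collection statistics df and |C| are available; a sketch of the formulas above, not the original code:

```python
import math

def s_tf(term, bow):
    # TF: 1 + log2 of the within-document frequency.
    # Assumes the term occurs (f > 0), matching the piecewise definition of x_ij.
    return 1.0 + math.log2(bow[term])

def make_s_tfidf(df, num_docs):
    # TF-IDF: the TF weight multiplied by log2(|C| / df(term)).
    def s_tfidf(term, bow):
        return (1.0 + math.log2(bow[term])) * math.log2(num_docs / df[term])
    return s_tfidf
```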
LiMe: Optimization Problem

    W* = argmin_W  (1/2) ‖X − XW‖²_F + β₁ ‖W‖_{1,1} + (β₂/2) ‖W‖²_F
         s.t. diag(W) = 0, W ≥ 0                                        (1)

This is a bound-constrained least squares optimization problem with an elastic net (ℓ1 and ℓ2 regularization) penalty, which decomposes into one independent problem per column:

    w*_{·j} = argmin_{w_{·j}}  (1/2) ‖x_{·j} − X w_{·j}‖²₂ + β₁ ‖w_{·j}‖₁ + (β₂/2) ‖w_{·j}‖²₂
              s.t. w_jj = 0, w_{·j} ≥ 0                                 (2)
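Once column j is removed from the predictors, problem (2) is an ordinary non-negative elastic-net regression, so it can be solved with off-the-shelf tools. Below is a sketch using scikit-learn's ElasticNet (a tooling assumption, not the authors' solver); since ElasticNet divides the squared loss by the number of rows m, dividing objective (2) by m gives the mapping alpha = (β1 + β2)/m and l1_ratio = β1/(β1 + β2) without changing the minimizer:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def fit_W(X, beta1, beta2):
    """Solve problem (2) column by column (sketch; assumes beta1 + beta2 > 0)."""
    m, n = X.shape
    W = np.zeros((n, n))
    alpha = (beta1 + beta2) / m
    l1_ratio = beta1 / (beta1 + beta2)
    for j in range(n):
        others = [c for c in range(n) if c != j]       # drop column j, so w_jj = 0
        model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                           positive=True,              # enforces w >= 0
                           fit_intercept=False)
        model.fit(X[:, others], X[:, j])               # regress x_.j on the other columns
        W[others, j] = model.coef_
    return W
```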
LiMe: Query Expansion
To expand the original query, we reconstruct the first row of X (the query row) using W*:

    x̂_{1·} = x_{1·} × W*                                               (3)

We compute a probabilistic estimate of a term t_j given the feedback model θ_F:

    p(t_j | θ_F) = x̂_{1j} / Σ_{t_v ∈ V_F̄} x̂_{1v}   if t_j ∈ V_F̄,
                   0                                 otherwise          (4)
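A sketch of equations (3) and (4) in NumPy; the optional truncation to the e highest-scoring terms is my assumption about how the expansion-term count e used in the experiments could be applied:

```python
import numpy as np

def feedback_model(X, W, vocab, num_terms=None):
    """Reconstruct the query row and normalise it into p(t | theta_F) (sketch)."""
    x_hat = X[0, :] @ W                      # equation (3): reconstructed query row
    if num_terms is not None:                # keep only the e largest entries (assumption)
        keep = np.argsort(x_hat)[::-1][:num_terms]
        truncated = np.zeros_like(x_hat)
        truncated[keep] = x_hat[keep]
        x_hat = truncated
    p = x_hat / x_hat.sum()                  # equation (4): normalise over V_F̄
    return dict(zip(vocab, p))
```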
LiMe: Second Retrieval
The second retrieval is performed by interpolating the original query model with the feedback model:

    p(t | θ'_Q) = (1 − α) p(t | θ_Q) + α p(t | θ_F)                     (5)

The hyperparameter α controls the interpolation
This is a standard procedure in state-of-the-art PRF techniques
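Equation (5) as a small helper over term distributions (a sketch; `p_query` and `p_feedback` are assumed to be dicts from terms to probabilities):

```python
def interpolate(p_query, p_feedback, alpha):
    """Mix the original query model with the feedback model (equation 5)."""
    terms = set(p_query) | set(p_feedback)
    return {t: (1 - alpha) * p_query.get(t, 0.0) + alpha * p_feedback.get(t, 0.0)
            for t in terms}
```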
Experiments
State-of-the-art Baselines
Retrieval model:
◦ LM: Language Models (µ = 1000) [Ponte & Croft, SIGIR ’98]
Based on language modelling:
◦ RM3: Relevance-Based Language Models [Lavrenko & Croft, SIGIR ’01]
◦ MEDMM: Maximum-Entropy Divergence Minimisation Models [Lv & Zhai, CIKM ’09]
Based on matrix factorization:
◦ RFMF: Relevance Feedback Matrix Factorisation [Zamani et al., CIKM ’16]
Test Collections

Collection     #docs    Avg doc length   Training topics   Test topics
AP88-89          165k            284.7           51-100       101-150
TREC-678         528k            297.1          301-350       351-400
Robust04         528k             28.3          301-450       601-700
WT10G          1,692k            399.3          451-500       501-550
GOV2          25,205k            647.9          701-750       751-800
Evaluation Metrics
We produce a ranking of 1000 documents per query:
MAP: Mean Average Precision
nDCG: Normalised Discounted Cumulative Gain
RI: Robustness Index = (#topics improved − #topics degraded) / #topics (a small sketch follows below)
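A sketch of the Robustness Index computed from per-topic scores of the baseline run and the feedback run (the dict names are illustrative):

```python
def robustness_index(baseline_ap, feedback_ap):
    """RI = (#topics improved - #topics degraded) / #topics (sketch).

    baseline_ap, feedback_ap: dicts mapping topic id -> average precision.
    """
    improved = sum(feedback_ap[t] > baseline_ap[t] for t in baseline_ap)
    degraded = sum(feedback_ap[t] < baseline_ap[t] for t in baseline_ap)
    return (improved - degraded) / len(baseline_ap)
```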
Results

           Metric   LM       RFMF     MEDMM    RM3      LiMe-TF   LiMe-TF-IDF
AP         MAP      0.2349   0.2774   0.3010   0.3002   0.3062    0.3149
           nDCG     0.5637   0.5749   0.5955   0.6005   0.6003    0.6085
           RI       −        0.42     0.42     0.50     0.38      0.52
TREC       MAP      0.1931   0.2072   0.2327   0.2235   0.2267    0.2357
           nDCG     0.4518   0.4746   0.5115   0.4987   0.5051    0.5198
           RI       −        0.23     0.26     0.40     0.48      0.46
Robust     MAP      0.2914   0.3130   0.3447   0.3488   0.3388    0.3517
           nDCG     0.5830   0.5884   0.6227   0.6251   0.6223    0.6294
           RI       −        0.07     0.32     0.37     0.23      0.37
WT10G      MAP      0.2194   0.2389   0.2472   0.2470   0.2484    0.2476
           nDCG     0.5212   0.5262   0.5324   0.5352   0.5416    0.5398
           RI       −        0.30     0.36     0.20     0.32      0.30
GOV2       MAP      0.3310   0.3580   0.3790   0.3755   0.3776    0.3830
           nDCG     0.6325   0.6453   0.6653   0.6618   0.6656    0.6698
           RI       −        0.42     0.66     0.60     0.68      0.62
Sensitivity of the ℓ1 regularization (β1)
[Plot: MAP as a function of β1 (10⁻⁵ to 10³, log scale) on AP88-89, WT2G, TREC678 and WT10G]
Sensitivity of the ℓ2 regularization (β2)
[Plot: MAP as a function of β2 (0 to 500) on AP88-89, TREC-678, Robust04, WT10G and GOV2]
Sensitivity of the number of pseudo-relevant documents (k)
[Plot: MAP as a function of k (5 to 100) on AP88-89, TREC678, Robust04, WT10G and GOV2]
Sensitivity of the number of terms (e)
[Plot: MAP as a function of e (5 to 100) on AP88-89, TREC678, Robust04, WT10G and GOV2]
Sensitivity of the query interpolation (α)
[Plot: MAP as a function of α (0.0 to 1.0) on AP88-89, TREC678, Robust04, WT10G and GOV2]
Conclusions and Future Directions
Conclusions
LiMe:
is a PRF technique that shows state-of-the-art performance
can be plugged on top of any retrieval model
accepts different feature schemes
models inter-term similarities
Future work
Alternative feature schemes based on:
retrieval features
query logs
Explore the connection with Translation Models, which also rely on inter-term similarities:
learnt from training data [Berger & Lafferty, SIGIR ’99]
based on mutual information [Karimzadehgan & Zhai,
SIGIR ’10]
Thank you!
@dvalcarce
http://www.dc.fi.udc.es/~dvalcarce