This document discusses balancing exploration and exploitation when learning ranking functions online from user interactions. It presents a method that uses dueling bandit gradient descent with k-greedy exploration to compare document lists. Experiments with simulated clicks show that balancing exploration and exploitation improves online performance across all click models and datasets tested, with the best results at an exploration rate of two exploratory documents per result list. Future work includes validating the simulation assumptions, evaluating on click logs, and developing new algorithms that balance exploration and exploitation for online learning to rank.
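The exploration scheme in the summary above can be sketched as follows: mix an exploitative ranking with k documents drawn from an exploratory ranking. The function name and the simple slot-injection policy are illustrative assumptions, not the paper's exact interleaving procedure.

```python
import random

def k_greedy_list(exploit_ranking, explore_ranking, k, list_len, seed=None):
    """Build a result list that is mostly exploitative but contains up to
    k exploratory documents (a sketch of the k-greedy idea, not the
    paper's exact procedure)."""
    rng = random.Random(seed)
    # choose k slots in the list to fill from the exploratory ranking
    slots = set(rng.sample(range(list_len), k))
    result, used = [], set()
    exploit_iter = iter(exploit_ranking)
    explore_iter = iter(explore_ranking)
    for pos in range(list_len):
        source = explore_iter if pos in slots else exploit_iter
        for doc in source:
            if doc not in used:      # skip documents already placed
                used.add(doc)
                result.append(doc)
                break
    return result
```

With k=2 and a list length of four, two positions come from the exploratory ranker and the rest from the exploitative one, matching the best-performing setting reported above.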
Characterising the Emergent Semantics in Twitter Lists, Oscar Corcho
This document summarizes research analyzing the emergent semantics of lists and list names on Twitter. The researchers investigated whether related keywords can be identified from list names according to how they are used by different user roles (curators, subscribers, members). They used a dataset of over 297,000 lists to extract keywords from list names and model their relationships based on these user roles. Their experiments analyzed the semantics of related keyword pairs using techniques like WordNet searches and found that relationships identified based on members had the highest percentage of direct semantic relations like synonyms.
DynaLearn: Problem-based learning supported by semantic techniques, Oscar Corcho
This document describes a system that supports problem-based learning through semantic techniques. The system grounds learner models in semantic repositories to enable semantic-based feedback. It analyzes learner models and reference models to identify discrepancies in terminology, taxonomy, and qualitative reasoning structures. Suggestions are generated and filtered based on agreement across multiple reference models. The system aims to bridge gaps between learner and expert terminology and provide automated feedback to support the learning process.
An energy audit involves verifying, monitoring, and analyzing an organization's energy usage to identify opportunities to improve energy efficiency. There are two main types of energy audits: preliminary audits provide a quick overview of energy consumption and potential savings areas, while detailed audits provide a comprehensive evaluation of all major energy systems to accurately estimate savings. The methodology for a detailed audit involves creating an energy balance by inventorying energy usage and operating conditions, then identifying and calculating potential savings from projects.
Pump and cooling tower energy performance, maulik610
This document provides an overview of pumps and cooling towers used in industrial applications. It discusses the main components, types, and operating characteristics of pumps, including centrifugal pumps which account for 75% of installed pumps. The document also examines how to assess pump performance by calculating parameters like pump shaft power and hydraulic power. For cooling towers, it outlines the components and types, and explains how to evaluate cooling tower performance using metrics such as range, approach, effectiveness, cooling capacity, and evaporation loss. The document concludes by identifying opportunities to improve the energy efficiency of pumps and cooling towers through equipment selection and optimization.
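The performance parameters mentioned above can be computed from standard formulas: hydraulic power from flow, head, and fluid density; pump efficiency as the ratio of hydraulic to shaft power; and cooling tower range, approach, and effectiveness from water and wet-bulb temperatures. A minimal sketch, assuming water at 1000 kg/m³ and metric units:

```python
def hydraulic_power_kw(flow_m3_s, head_m, density=1000.0, g=9.81):
    """Hydraulic power delivered to the fluid, in kW."""
    return density * g * flow_m3_s * head_m / 1000.0

def pump_efficiency(hydraulic_kw, shaft_kw):
    """Pump efficiency = hydraulic power / measured shaft power."""
    return hydraulic_kw / shaft_kw

def cooling_tower_metrics(t_hot_in, t_cold_out, t_wet_bulb):
    """Range, approach and effectiveness of a cooling tower (deg C).

    range         = hot water in - cold water out
    approach      = cold water out - ambient wet-bulb temperature
    effectiveness = range / (range + approach)
    """
    rng = t_hot_in - t_cold_out
    approach = t_cold_out - t_wet_bulb
    effectiveness = rng / (rng + approach)
    return rng, approach, effectiveness
```

For example, 0.05 m³/s pumped against a 40 m head gives 19.62 kW of hydraulic power; if the shaft draws 28 kW, the pump runs at roughly 70% efficiency.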
This document contains 25 quotes from Steve Jobs on a variety of topics. Some of the key themes that emerge are Jobs' focus on excellence and innovation, his belief that quality should take priority over quantity, and his vision that technology could be used to change people's lives. He also expressed confidence in Apple's future leadership and his ongoing connection to the company even if he wasn't present at all times.
RCOMM 2011 - Sentiment Classification with RapidMiner, bohanairl
This document summarizes a presentation on sentiment classification using supervised machine learning approaches and RapidMiner. It discusses how sentiment analysis can be used for search, recommendations, market research and ad placement. A case study is described that uses RapidMiner to classify movie reviews from IMDB as positive or negative based on word vectors. Additional features like part-of-speech tags, sentiment lexicons, and document statistics are shown to improve accuracy from 85% to 86%.
The document discusses a lightning talk presentation on social learning analytics. It includes an agenda with topics like visualizing social ties in SocialLearn by topic and type, visualizing social learning in the SocialLearn environment, and prototyping learning power modeling in SocialLearn. There are also brief biographies of several presenters.
This document provides an overview of link mining and collective classification algorithms. It discusses how link mining can be used for tasks like node labeling, link prediction, entity resolution, and group detection on graph-structured data. It presents relational classifiers and collective classification as two common link mining algorithms. Relational classifiers extend traditional classifiers by incorporating relational features between linked nodes, while collective classification iteratively propagates predictions between linked nodes. The document provides examples of how these algorithms have been applied to problems like predicting ad click-through rates and friendships. It also discusses entity resolution and how relational clustering algorithms can leverage links between entities to improve resolution.
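The collective classification idea described above can be sketched as an iterative loop in which each unlabeled node repeatedly adopts the majority label of its neighbours. This is a minimal label-propagation sketch on a toy adjacency-list graph, not a full relational classifier:

```python
from collections import Counter

def iterative_classification(graph, labels, iterations=10):
    """Toy collective classification: unlabeled nodes repeatedly take
    the majority label of their neighbours; seed labels stay fixed."""
    current = dict(labels)               # known (seed) labels
    for _ in range(iterations):
        updated = dict(current)
        for node, neighbours in graph.items():
            if node in labels:           # never overwrite seed labels
                continue
            votes = Counter(current[n] for n in neighbours if n in current)
            if votes:
                updated[node] = votes.most_common(1)[0][0]
        current = updated                # synchronous update per round
    return current
```

Predictions propagate outward from the seeds: a node whose neighbours are all unlabeled in round one can still be labeled in a later round once those neighbours have received labels.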
This document summarizes an approach to generate semantic user profiles from informal communication exchanges like emails, chats and meeting records. It extracts keywords, named entities and concepts from the communications to build user profiles and measure similarity between users. The profiles are used for information retrieval, recommender systems and visualizing interaction networks. An experiment on a university mailing list showed profiles based on concepts best correlated with human judgements of user similarity. Future work could involve long-term trials in organizations and linking profiles to external linked data.
SemEval - Aspect Based Sentiment Analysis, Aditya Joshi
SemEval is an ongoing series of evaluations of computational semantic analysis systems that evolved from the Senseval word sense disambiguation evaluations. SemEval 2014 included several tasks, including aspect-based sentiment analysis (Task 4), which had four subtasks: (1) aspect term extraction, (2) aspect term polarity classification, (3) aspect category detection, and (4) aspect category polarity classification. The top-performing system for this task used a semi-Markov tagger for aspect term extraction and SVMs trained on lexical, syntactic, and semantic features for the other subtasks.
This document provides an overview of recommender systems. It discusses several key points:
1. Recommender systems use collaborative filtering, content-based filtering, or knowledge-based techniques to predict items users may like based on their preferences.
2. Collaborative filtering finds users with similar tastes and recommends items liked by similar users. It can be memory-based or model-based.
3. Content-based filtering recommends additional similar items to those a user has liked based on item characteristics.
4. The document also discusses challenges like data sparsity and cold start problems faced by recommender systems.
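The memory-based collaborative filtering described in point 2 can be sketched in a few lines: compute similarity between users' rating vectors, then predict an unseen rating as a similarity-weighted average over users who rated the item. Cosine similarity on sparse rating dicts is one common choice (an illustrative sketch, not the only formulation):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts {item: rating}."""
    common = set(u) & set(v)
    num = sum(u[i] * v[i] for i in common)
    den = (math.sqrt(sum(x * x for x in u.values()))
           * math.sqrt(sum(x * x for x in v.values())))
    return num / den if den else 0.0

def predict(ratings, user, item):
    """Predict user's rating for item as a similarity-weighted average
    over the other users who have rated that item."""
    sims = [(cosine(ratings[user], ratings[o]), ratings[o][item])
            for o in ratings if o != user and item in ratings[o]]
    total = sum(s for s, _ in sims)
    return sum(s * r for s, r in sims) / total if total else None
```

The data sparsity and cold start problems from point 4 show up directly here: `predict` returns `None` when no similar user has rated the item.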
A brief description of the Opinion-Based Entity Ranking paper published in the Information Retrieval Journal, Volume 15, Number 2, 2012.
Slides by Kavita Ganesan.
Extracting Semantic User Networks from Informal Communication Exchanges, Suvodeep Mazumdar
This document summarizes an approach to generate semantic user profiles from informal communication exchanges like emails, meetings, and chats. It extracts keywords, named entities, and concepts from communications to represent user profiles. Similarities between user profiles are then calculated to infer relationships. An experiment on email data found profiles based on concepts best correlated with human judgments of user similarity, outperforming profiles from keywords and entities alone. Future work involves applying the approach to organizations and connecting profiles to linked open data.
The document discusses how the Common Core State Standards (CCSS) emphasize higher-level thinking skills and preparation for new online skills needed in the 21st century. It notes that the CCSS blend the new literacies of online research and comprehension into standards at every grade level. However, no state currently measures students' ability to perform online reading comprehension skills like evaluating online information. The document suggests that failing to address these new literacies may disadvantage students without access to technology outside of school.
Intelligent Tutoring Systems: The DynaLearn Approach, Wouter Beek
The document describes the DynaLearn approach to developing intelligent tutoring systems. It focuses on using conceptual modeling to help students construct knowledge about systems. Students build qualitative models and receive feedback to improve their understanding. The approach includes several interactive learning spaces to provide guidance, diagnosis of errors, and engagement through virtual characters. The goal is to develop an environment that supports open-ended conceptual modeling to address declines in science education.
Lak12 - Leeds - Deriving Group Profiles from Social Media, lydia-lau
1. The presentation discusses deriving group profiles from social media data to help design simulated learning environments.
2. An experimental study combined semantics and machine learning to profile groups based on their digital traces from a job interview domain.
3. Preliminary results found the group profiles could help training professionals identify learning needs, and domain concepts could augment learner models. However, improving profile quality and demographic data accuracy requires further work.
This document summarizes a project to identify fake reviews in the Yelp dataset for New York City restaurants. It describes the dataset, containing over 350,000 reviews labeled as true or fake. Preprocessing steps included merging datasets, handling missing values, and text processing to remove stopwords and stem words. Behavioral and text features were extracted, including sentiment scores, review length, and the number of capitalized words. Classification methods such as logistic regression, naive Bayes, and KNN were applied and their results presented. References to related work on detecting fake reviews on Yelp were also provided.
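The text-feature extraction step mentioned above can be sketched as a small function that maps a review to a feature dict. The specific features here (length, all-caps words, exclamation marks, average word length) are a hypothetical subset in the spirit of the project, not its exact feature set:

```python
def review_features(text):
    """Simple text features for fake-review classification
    (an illustrative feature set, not the project's exact one)."""
    words = text.split()
    return {
        "length": len(words),                     # review length in words
        "capital_words": sum(1 for w in words
                             if w.isupper() and len(w) > 1),
        "exclamations": text.count("!"),
        "avg_word_len": (sum(len(w) for w in words) / len(words)
                         if words else 0.0),
    }
```

Feature dicts like this can be vectorized and fed to any of the classifiers listed above (logistic regression, naive Bayes, KNN).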
This document summarizes tag-based recommenders and social tagging systems. It discusses:
1) Social tagging systems allow users to collaboratively tag and categorize content. Popular social tagging sites include Delicious, Flickr, YouTube, etc. Tagging systems have features like tag sharing and selection.
2) Tag recommenders aim to encourage tagging and reuse of common tags. Recommender techniques discussed include most popular, collaborative filtering, tensor factorization, and graph-based methods.
3) The document presents the speaker's work on tag-based collaborative filtering which improves neighbor selection by considering tag semantic similarity between users. Their IUI 2008 paper shows their tag-based approach improves recommendation performance over traditional collaborative filtering.
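The neighbour-selection step in point 3 can be sketched by scoring users on the overlap of their tag vocabularies and keeping the top k. Plain Jaccard overlap stands in here for the paper's semantic similarity measure (an assumption for illustration):

```python
def tag_jaccard(tags_u, tags_v):
    """Jaccard similarity between two users' tag sets."""
    u, v = set(tags_u), set(tags_v)
    return len(u & v) / len(u | v) if u | v else 0.0

def neighbours_by_tags(user_tags, user, k=2):
    """Pick the k users whose tag vocabulary is most similar to the
    target user's (sketch of tag-based neighbour selection)."""
    scores = [(tag_jaccard(user_tags[user], tags), other)
              for other, tags in user_tags.items() if other != user]
    return [other for _, other in sorted(scores, reverse=True)[:k]]
```

The resulting neighbourhood then replaces the rating-overlap neighbourhood in an otherwise standard collaborative filtering pipeline.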
The document provides 8 lessons learned from deploying a content discovery solution at Orange in France.
Lesson 1 is to have a dedicated group of real users for testing. Lesson 2 is that avoiding bad recommendations is more important than getting perfect ones. Lesson 3 is that using multiple recommendation engines helps overcome filter bubbles. Lesson 4 is that collaborative filtering is biased towards popularity while users prefer novelty. Lesson 5 is that changing factors like language can impact recommendations. Lesson 6 addresses dealing with cold starts for users, content, and systems. Lesson 7 is that laziness often wins over more complex solutions. Lesson 8 emphasizes that privacy matters in how profiles and recommendations are handled.
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative..., CITE
5 March 2010 (Friday) | 09:00 - 12:30 | http://citers2010.cite.hku.hk/abstract/69 | Dr. Kwok Ping CHAN, Associate Professor, Department of Computer Science, HKU
CBL - Creating an iOS App in the Classroom, Douglas Kiang
Here are the key steps I took to resolve the issue:
1. I reviewed the error message closely to understand what was causing the problem
2. I searched online to see if others had similar issues and what solutions worked for them
3. I checked that I had the latest version of the software in case it was a known bug
4. I simplified the code to isolate the issue - commenting out sections until it worked
5. Once I identified the problematic line, I debugged it step-by-step to find the error
6. I used print statements to check variable values and trace program flow
7. I got help from classmates by sharing my code and asking them to review it
8. As a last
A lot of people talk about Data Mining, Machine Learning and Big Data. It clearly must be important, right?
A lot of people are also trying to sell you snake oil - sometimes half-arsed and overpriced products or solutions promising a world of insight into your customers or users if you hand over your data to them. Instead, understanding your own data and what you could do with it should be the first thing you look at.
In this talk, we'll introduce some basic terminology around data and text mining as well as machine learning, and we'll look at what you can do on your own to understand more about your data and discover patterns in it.
Increasing Social Media ROI Using Gladwell's Tipping Point Framework, Colleen Carrington
Inspired by some of the brightest thought-leaders in social media, this deck explores how to increase social media ROI using Gladwell's tipping point framework: the right people, a sticky idea, the right context. It is designed for online viewing without having to be presented in person. Enjoy!
A self training framework for exploratory discourse detection final, Zhongyu Wei
The document describes a self-training framework for detecting exploratory discourse in online conversations. It involves initially training a classifier on a small set of annotated data, then using the classifier to annotate additional unlabeled data and adding it to the training set. This allows the classifier to be retrained and improved without requiring manual annotation of large amounts of data. The framework is evaluated on chat data from an Open University conference, and a feature-based self-training approach is shown to improve performance over supervised classifiers and other baselines. Applications for visualizing discourse and participation are also discussed.
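The self-training loop described above can be sketched generically: train on the labeled set, pseudo-label the unlabeled items the model is confident about, add them to the training set, and repeat. The function signature and the confidence threshold are illustrative assumptions, not the paper's exact setup:

```python
def self_train(train_fn, labeled, unlabeled, rounds=3, threshold=0.8):
    """Generic self-training loop.

    train_fn(labeled) must return a predictor x -> (label, confidence).
    Confidently predicted items are pseudo-labeled and moved into the
    training set; the rest stay in the pool for the next round.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        model = train_fn(labeled)
        keep = []
        for x in pool:
            label, conf = model(x)
            if conf >= threshold:
                labeled.append((x, label))   # pseudo-label and add
            else:
                keep.append(x)
        pool = keep
        if not pool:                         # nothing left to label
            break
    return labeled, pool
```

Any supervised classifier can be plugged in as `train_fn`; the paper's feature-based variant corresponds to retraining with additional features over the growing pseudo-labeled set.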
Metaphors as design points for collaboration 2012, KM Chicago
The document discusses optimization cycles for improving collaboration and search practices, noting that multiple factors should be maintained in constant ratios to achieve predictable outcomes in key metrics, and experiments allow measuring these figures of merit to identify tradeoffs and potential improvements. It provides examples of how aspects of metadata and measuring effective precision can facilitate collaboration.
How to teach robots to test web interfaces. Artem Eroshenko, Ilya Katsev, Ya...yaevents
Artem Eroshenko, Yandex
Graduated from the Faculty of Mathematics and Mechanics of Saint Petersburg State University; currently a third-year PhD student in Control Theory. Since 2008 he has worked on test automation for search results and search-related services at Yandex. Since 2011 he has coordinated the test tools development group.
Ilya Katsev, Yandex
Graduated from the Faculty of Mathematics and Mechanics of Saint Petersburg State University and defended a PhD dissertation in game theory at VU University Amsterdam (Netherlands). At Yandex he works on test automation (simulating user actions and analyzing the results).
Talk topic
How to teach robots to test web interfaces.
Abstract
The talk is about a tool that checks web interfaces for errors on its own. Its key quality is the ability to automatically discover related elements on a page and build models that can then be tested automatically. We will not only propose ideas for using and extending this system, but also demonstrate a prototype.
Similar to Adapting Rankers Online, Maarten de Rijke (20)
How to teach robots to test web interfaces. Artem Eroshenko, Ilya Katsev, Ya...yaevents
Artem Eroshenko, Yandex
Graduated from the Faculty of Mathematics and Mechanics at Saint Petersburg State University; currently a third-year postgraduate student in Control Theory. Since 2008 he has worked at Yandex on test automation for search results and search-related services. Since 2011 he has coordinated the test-tooling development group.
Ilya Katsev, Yandex
Graduated from the Faculty of Mathematics and Mechanics at Saint Petersburg State University and defended a PhD thesis in game theory at VU University Amsterdam (Netherlands). At Yandex he works on test automation (simulating user actions and analyzing the results).
Presentation topic:
How to teach robots to test web interfaces.
Key points:
The talk presents a tool that checks web interfaces for errors on its own. Its key property is the ability to automatically discover related elements on a page and build models that can then be tested automatically. We will not only propose ideas for using and extending this system, but also demonstrate its prototype.
Building compound blocks in the bemhtml template engine. Sergey Berezhnoy, Ya...yaevents
Sergey Berezhnoy, Yandex
A web developer at Yandex since 2005. Over that time he has contributed to a number of services, including Blog Search, Ya.ru, Yandex.Mail, Search, Images, and Video. Beyond public-facing projects, he actively develops internal tools covering the full site-building cycle. Above all he loves his wife and programming.
Presentation topic:
Building compound blocks in the bemhtml template engine.
Key points:
bemhtml is a domain-specific template engine for writing block templates according to the BEM methodology. Compilation produces fast plain-JavaScript templates that can run on both the server and the client. The technology is used in the bem-bl block library and on several Yandex services. This master class demonstrates one of bemhtml's advantages: building compound blocks. You will learn the idea and syntax of the template engine, get ready-made recipes for common tasks, and see an analysis of bemhtml's capabilities.
i-bem.js: JavaScript in BEM terms. Elena Glukhova, Varvara Stepanova, Yandexyaevents
Elena Glukhova, Yandex
Markup and web interface developer. At Yandex since 2008.
Varvara Stepanova, Yandex
Graduated from Petrozavodsk State University. An interface developer at Yandex since 2008, she has worked on Yandex.Answers and Yandex.Fotki. For the past year and a half, Elena Glukhova and Varvara Stepanova have worked together on an internal interface framework that helps build Yandex services consistently. Lately they have also been developing a similar open-source interface framework.
Presentation topic:
i-bem.js: JavaScript in BEM terms.
Key points:
When building sites with the BEM methodology, we use a single domain vocabulary across all technologies: CSS, templates, and JavaScript. To make this possible, the bem-bl block library implements the core of a client-side JS framework that lets you work with a page in BEM terms, one abstraction level above the DOM. This master class shows the key points of using this approach to write client-side JS. We build a compound block that reuses the JS functionality of the small blocks it contains. The result: everything works, with no copy-paste.
A house from ready-made bricks. A block library, tuning, and tools. Elena Glukho...yaevents
Elena Glukhova, Yandex
Markup and web interface developer. At Yandex since 2008.
Varvara Stepanova, Yandex
Graduated from Petrozavodsk State University. An interface developer at Yandex since 2008, she has worked on Yandex.Answers and Yandex.Fotki. For the past year and a half, Elena Glukhova and Varvara Stepanova have worked together on an internal interface framework that helps build Yandex services consistently. Lately they have also been developing a similar open-source interface framework.
Presentation topic:
A house from ready-made bricks. A block library, tuning, and tools.
Key points:
All websites resemble each other a little. After years of web development, you accumulate practices and standard solutions to common tasks. The result of our accumulation is the open-source block library bem-bl, which we develop on GitHub. The library follows the BEM methodology and lets you build web pages from blocks that already ship with template, CSS, and JS implementations. The master class demonstrates how to use ready-made blocks from the library and how to adapt them to your site's needs. The bem-tools console utilities are used for working with the library's files.
Models in professional software engineering and testing. Alexander Petren...yaevents
Alexander Petrenko, ISP RAS
Professor, Doctor of Physical and Mathematical Sciences, head of the Software Engineering Technologies department at the Institute for System Programming of the Russian Academy of Sciences (ISP RAS), and professor at the CMC faculty of Moscow State University. His main work covers requirements formalization and test generation from formalized requirements and formal models (model-based testing, MBT). Applications include testing operating systems and distributed systems, compiler testing, microprocessor design verification, and formalizing standards for operating-system APIs and telecommunication protocols. Co-chair of the organizing committees of the International MBT workshop (http://www.mbrworkshop.org/), the Spring Young Researchers Colloquium on Software Engineering, SYRCoSE (http://syrocose.ispras.ru), and the city seminar on program development and analysis technologies TRAP/SDAT (http://sdat.ispras.ru/).
Presentation topic:
Models in professional software engineering and testing.
Key points:
Model-Based Software Engineering (MBSE) extends the model-driven approach to software development. Unlike, say, MDA (Model Driven Architecture), MBSE pays substantial attention not only to design and code development proper, but also to other lifecycle phases: requirements analysis, verification and validation, and requirements management throughout the lifecycle. Model-Based Testing (MBT) appeared well before MBSE and MDA, but its place in software development only fully emerged with the rise of MBSE, so MBT and MBSE should be considered together. The talk covers the MBSE-MDA-MBT concepts, the main sources and kinds of models used in these approaches, model-based test generation methods, and well-known tools for
Administering small services, or One for all and 100 against one. Roman ...yaevents
Roman Andriadi, Yandex
Has worked in Yandex's operations department since 2005. Since 2010 he has led the administration group for communication, content, and internal services.
Presentation topic:
Administering small services, or One for all and 100 against one.
Key points:
Administration of communication services began in 2004 with a dozen servers hosting a dozen services. Over time the number of services grew, as did the tasks around them, and the dozen servers became a fleet of hundreds of machines split into many heterogeneous clusters. The talk describes how administration practices evolved as the cluster grew, what tools were used, how we wrote our own management tool, and how it has learned to help us over the years.
Stories about website development. Sergey Berezhnoy, Yandexyaevents
Sergey Berezhnoy, Yandex
A web developer at Yandex since 2005. Over that time he has contributed to a number of services, including Blog Search, Ya.ru, Yandex.Mail, Search, Images, and Video. Beyond public-facing projects, he actively develops internal tools covering the full site-building cycle. Above all he loves his wife and programming.
Presentation topic:
Stories about website development.
Key points:
We will talk about the website-development tasks that arose at Yandex at different times and how we solved them. The talk is meant as a dialogue with developers facing similar problems. The result should be a small collection of technology stories to reflect on.
Developing Android applications in C++. Yuri Bereza, Shturmannyaevents
Yuri Bereza, Shturmann
Graduated from the Instrument Engineering faculty of the Moscow State Academy of Instrument Engineering and Informatics. In 2004 he joined the mobile development department at Maccentre. He has developed for a great many mobile platforms: Windows Mobile, Symbian, Android, embedded Linux, and iOS. He currently leads a group at Content Master, where he works on the Shturmann car-navigation product.
Presentation topic:
Developing Android applications in C++.
Key points:
The Android platform grows more popular every year. Although Java is the primary language for Android application development, programmers often have to use C or C++ to write cross-platform applications or to use third-party libraries. Unfortunately, C++ development for Android is rather sparsely documented, and finding the right information can take a lot of time. The talk answers the main questions across the whole development cycle: how to write C++ code that runs on Android, how to debug it and locate errors when applications crash, whether code can be profiled, and where to look for further information.
Cross-platform development for mobile devices. Dmitry Zhestilevsky...yaevents
Dmitry Zhestilevsky, Yandex
Graduated from the Faculty of Experimental and Theoretical Physics at the Moscow Engineering Physics Institute in 2011. Since 2006 he has developed applications (games, business applications) for mobile devices on J2ME, BREW, Windows Mobile, Android, and iOS. At Yandex since 2010, he works on the architecture of mobile mapping services. Interests: cross-platform mobile development and 3D visualization.
Presentation topic:
Cross-platform development for mobile devices.
Key points:
Development for embedded devices is heavily fragmented by the abundance of operating systems (Android, iOS, WM, WP7, Symbian, Bada). Developing for each platform independently leads to proportional growth in both the number of people involved and the size of the maintained codebase. Introducing shared code that runs on all platforms through a Platform Abstraction Layer with a unified interface can cut these costs, while platform-specific pieces such as the UI can still be used to give the application a native look and feel. The talk walks through introducing shared components into Yandex's mobile applications using Street Panoramas as an example, along with the difficulties we ran into during development and how we solved them.
The most sophisticated techniques used by bootkits and polymorphic viruses. Vyacheslav Z...yaevents
Vyacheslav Zakorzhevsky, Kaspersky Lab
Joined Kaspersky Lab in mid-2007 as a virus analyst. In late 2008 he became a senior virus analyst in the heuristic detection group. His interests include research on polymorphic viruses and heavily mutating malware. He also follows current trends in obfuscation, anti-emulation, and other techniques used by malicious software.
Presentation topic:
The most sophisticated techniques used by bootkits and polymorphic viruses.
Key points:
There is a common belief that modern malware is fairly simple and written by amateurs. This talk aims to dispel that myth. The presentation describes three malware samples that use non-trivial, sophisticated techniques. In particular, we examine how modern bootkits, which keep gaining momentum, operate. Two further examples illustrate the inventiveness of virus writers trying to make life as hard as possible for researchers and antivirus companies: in one case they used their own virtual machine together with an EPO infection technique; in the other, they mapped null virtual addresses to store their data.
Vulnerability scanning, Yandex-flavored. Taras Ivashchenko, Yandexyaevents
Taras Ivashchenko, Yandex
An information security administrator at Yandex. Information security specialist, free-software advocate, author of Termite and xCobra, and a contributor to the W3AF project.
Presentation topic:
Vulnerability scanning, Yandex-flavored.
Key points:
The talk describes how Yandex introduced vulnerability scanning of its services as one of the security controls within the SDLC (Secure Development Life Cycle). It covers scanning for vulnerabilities during service testing as well as scanning services already in production. We review the problems we encountered and explain why we chose open-source software (the w3af vulnerability scanner), adapted to our needs, as the core mechanism.
Hadoop scalability at Facebook. Dmitriy Molkov, Facebookyaevents
Dmitriy Molkov, Facebook
Bachelor of Applied Mathematics from Taras Shevchenko National University of Kyiv (2007). Master of Computer Science from Stony Brook University (2009). Hadoop HDFS committer since 2011. Member of the Hadoop team at Facebook since 2009.
Presentation topic:
Hadoop scalability at Facebook.
Key points:
Hadoop and Hive are an excellent toolkit for storing and analyzing petabytes of data at Facebook. Working at that scale, Facebook's Hadoop team confronts Hadoop scalability and efficiency problems daily. The talk covers some of the optimizations across Facebook's Hadoop infrastructure that make a high-quality service possible: for example, optimizing storage costs in multi-petabyte HDFS clusters, increasing system throughput, and reducing downtime through High Availability work on HDFS.
Controlling the beasts: tools for managing and monitoring distributed syst...yaevents
Alexander Kozlov, Cloudera Inc.
Alexander Kozlov is a senior architect at Cloudera Inc., working with large companies, many of them in the Fortune 500, on systems for analyzing massive amounts of data. He completed graduate studies at the Physics faculty of Moscow State University and then earned a Ph.D. at Stanford. After finishing his studies and before Cloudera, he worked on statistical data analysis and related computing technologies at SGI, Hewlett-Packard, and the startup Turn.
Presentation topic:
Controlling the beasts: Cloudera's tools for managing and monitoring distributed systems.
Key points:
Maintaining distributed systems made up of thousands of computers is a hard task. Cloudera, a company specializing in distributed technologies, has developed a toolset for centrally managing distributed Hadoop/HBase clusters. Hadoop and HBase are Apache Software Foundation projects, and their adoption for analyzing semi-structured data is accelerating worldwide. This talk covers SCM, a system for configuring, tuning, and managing Hadoop/HBase, and Activity Monitor, a system for monitoring a range of OS and Hadoop/HBase metrics, as well as how Cloudera's approach differs from existing monitoring solutions (Tivoli, xCat, Ganglia, Nagios, etc.).
Unit testing and Google Mock. Vlad Losev, Googleyaevents
Vladimir Losev, Google
Graduated from the Faculty of Mathematics and Mechanics at Saint Petersburg State University in 1995. He has worked at Motorola, Fair Isaac, and Yahoo. Since 2008 he has worked at Google, in a group focused on engineering productivity.
Presentation topic:
Unit testing and Google Mock.
Key points:
In unit tests, each element of a program is tested separately, in isolation from the others. Such tests run very fast, so they can be run at any time, which helps catch defects at the earliest stages of development. However, testing an object in isolation requires imitating the behavior of the objects it depends on, which in C++ is rather tedious. Google Mock, a library developed at Google for creating and using mock objects, greatly simplifies this process and speeds up test writing. The talk covers the library's principles and features, examples of its use, and its internal design.
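The mock-object idea the talk describes can be illustrated with Python's standard unittest.mock, which automates the same pattern Google Mock provides for C++. The PaymentGateway collaborator and checkout function below are hypothetical, invented only for the sketch.

```python
# Isolating the code under test by mocking its collaborator.
from unittest import mock

class PaymentGateway:
    """Collaborator we want to isolate away in the unit test (hypothetical)."""
    def charge(self, amount):
        raise NotImplementedError("talks to a real payment service")

def checkout(gateway, amount):
    # Code under test: delegates the real work to the collaborator.
    return "ok" if gateway.charge(amount) else "declined"

# Replace the collaborator with a mock and program its behavior,
# much like ON_CALL/EXPECT_CALL in Google Mock.
gateway = mock.Mock(spec=PaymentGateway)
gateway.charge.return_value = True

result = checkout(gateway, 42)

assert result == "ok"
gateway.charge.assert_called_once_with(42)  # verify the interaction
```

The point in both libraries is the same: the test exercises checkout alone, while the mock records and verifies how it interacted with its dependency.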
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...yaevents
Dave Abrahams, BoostPro Computing
He is a founding member of Boost.org and an active participant in the ISO C++ standards committee. His broad range of experience in the computer industry includes shrink-wrap software development, embedded systems design and natural language processing. He has authored eight Boost libraries and has made contributions to numerous others. Dave made his mark on C++ standardization by developing a conceptual framework for understanding exception-safety and applying it to the C++ standard library. He created the first exception-safe standard library implementation and, with Greg Colvin, drafted the proposals that eventually became the standard library’s exception safety guarantees.
Presentation topic:
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abrahams, BoostPro Computing.
Key points:
The ISO C++ standardization committee has just unanimously approved its final draft international standard, and it's chock full of new features. Though a few of the features have been available for years, some are brand new, and nobody really knows what it's like to program in this new C++ language. As with C++03, Boost.org is expected to take a leading role in exploiting C++11. In this talk, I'll give an overview of the most important new developments.
Why an ordinary programmer should know languages almost nobody writes in. Ale...yaevents
Alexey Voinov, Yandex
Graduated from Bauman Moscow State Technical University in 1998. He has devoted part of his life to free software. Known for his love of languages, both algorithmic and human, natural and constructed. At Yandex since 2009, working on Yandex.Mail.
Presentation topic:
Why an ordinary programmer should know languages almost nobody writes in.
Key points:
There is a class of programming languages that most programmers consider strange at best: Haskell, the *ML family, Lisp, Q. These "strange" languages do not take root in industrial software development because they do not let you write standard "industrial" code. They are, however, very good for inventing techniques that improve industrial code, and many such techniques later become industry standards. Knowing "strange" languages is especially useful when external constraints rule out making industrial code radically better, but it can still be improved in small steps.
In search of mathematics. Mikhail Denisenko, Nigmayaevents
Mikhail Denisenko, Nigma
Graduated from the Faculty of Computational Mathematics and Cybernetics at Moscow State University. He is finishing a dissertation on mathematical aspects of information security. He did research on video-sequence processing and computer security at Intel. Since 2009 he has been a senior developer of the mathematics service at Nigma.ru, and since 2011 a system architect of the ITim.vn search engine.
Presentation topic:
In search of mathematics.
Key points:
Nigma-Mathematics is a service that lets users solve various mathematical problems (simplifying expressions, solving equations and systems of equations, and so on) by typing them into the search box as plain text. The system recognizes more than a thousand physical and mathematical constants and units of measurement, so users can operate on quantities (including solving equations) and get answers in the requested units. Beyond equations, the system handles all the tasks typical of search-engine calculators and currency converters. The talk describes the overall architecture of the service and the basic and new algorithms of the symbolic computation system (solving equations and inequalities, accounting for domains of definition, analyzing functions, etc.). It also covers speeding up the service, distributing the load, detecting whether a query is mathematical, currency conversion, and metric quantities.
Using classifiers to compute similarities between face images. Prof. Lior Wol...yaevents
Prof. Lior Wolf, Tel-Aviv University
He is a faculty member at the School of Computer Science at Tel-Aviv University. Previously, he was a post-doctoral associate in Prof. Poggio's lab at MIT. He graduated from the Hebrew University, Jerusalem, where he worked under the supervision of Prof. Shashua. He was awarded the 2008 Sackler Career Development Chair, the Colton Excellence Fellowship for new faculty (2006-2008), the Max Shlumiuk award for 2004, and the Rothchild fellowship for 2004. His joint work with Prof. Shashua in ECCV 2000 received the best paper award, and their work in ICCV 2001 received the Marr prize honorable mention. He was also awarded the best paper award at the post ICCV workshop on eHeritage 2009. In addition, Lior has held several development, consulting and advisory positions in computer vision companies including face.com and superfish, and is a co-founder of FDNA.
Presentation topic:
Using classifiers to compute similarities between images of faces.
Key points:
The One-Shot-Similarity (OSS) is a framework for classifier-based similarity functions. It is based on the use of background samples and was shown to excel in tasks ranging from face recognition to document analysis. In this talk we will present the framework as well as the following results: (1) when using a version of LDA as the underlying classifier, this score is a Conditionally Positive Definite kernel and may be used within kernel-methods (e.g., SVM), (2) OSS can be efficiently computed, and (3) a metric learning technique that is geared toward improved OSS performance.
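A toy version of the symmetric OSS computation might look like this. Note the hedges: a simple centroid-based linear classifier stands in for the LDA model the talk describes, and the vectors and background set are invented for illustration.

```python
# One-Shot-Similarity sketch: train a classifier separating x from a set of
# background samples, score y with it, then swap roles and average.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mean(vectors):
    n = len(vectors)
    return [sum(xs) / n for xs in zip(*vectors)]

def half_oss(x, y, background):
    """Linear classifier of x vs. the background set; returns the signed
    score of y relative to the decision midpoint."""
    mu = mean(background)
    w = [a - b for a, b in zip(x, mu)]          # separating direction
    mid = [(a + b) / 2 for a, b in zip(x, mu)]  # midpoint as threshold
    return dot(w, [a - b for a, b in zip(y, mid)])

def one_shot_similarity(x, y, background):
    # Symmetrize: score y against a model of x, and x against a model of y.
    return 0.5 * (half_oss(x, y, background) + half_oss(y, x, background))

background = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1]]  # unrelated negatives
same = one_shot_similarity([1.0, 1.0], [0.9, 1.1], background)
diff = one_shot_similarity([1.0, 1.0], [-1.0, -0.9], background)
assert same > diff  # images of the same face should score higher
```

Swapping in actual LDA (with a shared within-class covariance estimated from the background set) recovers the variant the talk analyzes as a Conditionally Positive Definite kernel.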
What is an RPA CoE? Session 1 – CoE VisionDianaGray10
In the first session, we will review the organization's vision and how this has an impact on the COE Structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
Must Know Postgres Extension for DBA and Developer during MigrationMydbops
Mydbops Opensource Database Meetup 16
Topic: Must-Know PostgreSQL Extensions for Developers and DBAs During Migration
Speaker: Deepak Mahto, Founder of DataCloudGaze Consulting
Date & Time: 8th June | 10 AM - 1 PM IST
Venue: Bangalore International Centre, Bangalore
Abstract: Discover how PostgreSQL extensions can be your secret weapon! This talk explores how key extensions enhance database capabilities and streamline the migration process for users moving from other relational databases like Oracle.
Key Takeaways:
* Learn about crucial extensions like oracle_fdw, pgtt, and pg_audit that ease migration complexities.
* Gain valuable strategies for implementing these extensions in PostgreSQL to achieve license freedom.
* Discover how these key extensions can empower both developers and DBAs during the migration process.
* Don't miss this chance to gain practical knowledge from an industry expert and stay updated on the latest open-source database trends.
Mydbops Managed Services specializes in taking the pain out of database management while optimizing performance. Since 2015, we have been providing top-notch support and assistance for the top three open-source databases: MySQL, MongoDB, and PostgreSQL.
Our team offers a wide range of services, including assistance, support, consulting, 24/7 operations, and expertise in all relevant technologies. We help organizations improve their database's performance, scalability, efficiency, and availability.
Contact us: info@mydbops.com
Visit: https://www.mydbops.com/
Follow us on LinkedIn: https://in.linkedin.com/company/mydbops
For more details and updates, please follow up the below links.
Meetup Page : https://www.meetup.com/mydbops-databa...
Twitter: https://twitter.com/mydbopsofficial
Blogs: https://www.mydbops.com/blog/
Facebook(Meta): https://www.facebook.com/mydbops/
GlobalLogic Java Community Webinar #18 “How to Improve Web Application Perfor...GlobalLogic Ukraine
In this talk we answer why application performance needs improving and which approaches are most effective. We also discuss what a cache is, what kinds of caches exist and, most importantly, how to find a performance bottleneck.
Video and event details: https://bit.ly/45tILxj
What is an RPA CoE? Session 2 – CoE RolesDianaGray10
In this session, we will review the players involved in the CoE and how each role impacts opportunities.
Topics covered:
• What roles are essential?
• What place in the automation journey does each role play?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving
Manufacturing custom quality metal nameplates and badges involves several standard operations. Processes include sheet prep, lithography, screening, coating, punch press and inspection. All decoration is completed in the flat sheet with adhesive and tooling operations following. The possibilities for creating unique durable nameplates are endless. How will you create your brand identity? We can help!
"Scaling RAG Applications to serve millions of users", Kevin GoedeckeFwdays
How we managed to grow and scale a RAG application from zero to thousands of users in 7 months. Lessons from technical challenges around managing high load for LLMs, RAGs and Vector databases.
In our second session, we shall learn all about the main features and fundamentals of UiPath Studio that enable us to use the building blocks for any automation project.
📕 Detailed agenda:
Variables and Datatypes
Workflow Layouts
Arguments
Control Flows and Loops
Conditional Statements
💻 Extra training through UiPath Academy:
Variables, Constants, and Arguments in Studio
Control Flow in Studio
"What does it really mean for your system to be available, or how to define w...Fwdays
We will talk about system monitoring from a few different angles. We will start by covering the basics, then discuss SLOs, how to define them, and why understanding the business well is crucial for success in this exercise.
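As a concrete illustration of the kind of arithmetic an SLO definition implies, here is a small worked example. The target, window, and request counts are invented numbers, not figures from the talk.

```python
# Defining an availability SLO and computing its error budget.
slo_target = 0.999             # "three nines" availability objective
period_minutes = 30 * 24 * 60  # a 30-day rolling window

# Error budget: how much unavailability the SLO tolerates per window.
error_budget_minutes = (1 - slo_target) * period_minutes  # 43.2 minutes

# Measured availability from request counts over the window.
total_requests = 1_000_000
failed_requests = 700
availability = 1 - failed_requests / total_requests  # 0.9993

slo_met = availability >= slo_target
assert round(error_budget_minutes, 1) == 43.2
assert slo_met  # 99.93% clears the 99.9% objective
```

Whether "failed" means HTTP 5xx, latency over a threshold, or something business-specific is exactly the definitional question the talk is about, and it has to be settled before numbers like these mean anything.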
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...AlexanderRichford
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation Functions to Prevent Interaction with Malicious QR Codes.
Aim of the Study: The goal of this research was to develop a robust hybrid approach for identifying malicious and insecure URLs derived from QR codes, ensuring safe interactions.
This is achieved through:
Machine Learning Model: Predicts the likelihood of a URL being malicious.
Security Validation Functions: Ensures the derived URL has a valid certificate and proper URL format.
This innovative blend of technology aims to enhance cybersecurity measures and protect users from potential threats hidden within QR codes 🖥 🔒
This study was my first introduction to using ML which has shown me the immense potential of ML in creating more secure digital environments!
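The hybrid decision described above might be sketched like this. Everything here is an illustrative assumption rather than the study's implementation: the function names, the lexical features, and the threshold are invented, and the certificate check (which needs a network call) is omitted.

```python
# Hybrid check on a URL decoded from a QR code: security-validation
# functions plus a (stub) ML maliciousness score.
from urllib.parse import urlparse

def valid_url_format(url):
    """Security validation: require an https scheme and a hostname."""
    parts = urlparse(url)
    return parts.scheme == "https" and bool(parts.hostname)

def ml_malicious_score(url):
    """Stand-in for the trained model: score a few lexical features
    commonly associated with phishing URLs."""
    score = 0.0
    if len(url) > 75:
        score += 0.4          # unusually long URL
    if url.count("-") > 3:
        score += 0.3          # many hyphens in host/path
    if "@" in url:
        score += 0.3          # userinfo trick hides the real host
    return min(score, 1.0)

def qr_url_is_safe(url, threshold=0.5):
    # Hybrid decision: both the validators and the model must pass.
    return valid_url_format(url) and ml_malicious_score(url) < threshold

assert qr_url_is_safe("https://example.com/login")
assert not qr_url_is_safe("http://example.com")  # fails the format check
```

The structure, not the feature list, is the point: the validation functions give hard guarantees the learned model cannot, and the model catches malicious URLs that are syntactically unremarkable.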
Introducing BoxLang : A new JVM language for productivity and modularity!Ortus Solutions, Corp
Just like life, our code must adapt to the ever changing world we live in. From one day coding for the web, to the next for our tablets or APIs or for running serverless applications. Multi-runtime development is the future of coding, the future is to be dynamic. Let us introduce you to BoxLang.
Dynamic. Modular. Productive.
BoxLang redefines development with its dynamic nature, empowering developers to craft expressive and functional code effortlessly. Its modular architecture prioritizes flexibility, allowing for seamless integration into existing ecosystems.
Interoperability at its Core
With 100% interoperability with Java, BoxLang seamlessly bridges the gap between traditional and modern development paradigms, unlocking new possibilities for innovation and collaboration.
Multi-Runtime
From the tiny 2m operating system binary to running on our pure Java web server, CommandBox, Jakarta EE, AWS Lambda, Microsoft Functions, Web Assembly, Android and more. BoxLang has been designed to enhance and adapt according to its runtime.
2. Joint work with Katja Hofmann and Shimon Whiteson
Adapting Rankers Online 2
3. Growing complexity of search engines
Current methods for optimization mostly work offline
4. Online learning to rank
No distinction between training and operating
Search engine observes users’ natural interactions with the search interface, infers information from them, and improves its ranking function automatically
Expensive data collection not required; the collected data matches target users and target setting
5. Users’ natural interactions with the search interface
Behavior category (refers to the purpose of the observed behavior) by minimum scope (refers to the smallest possible scope of the item being acted upon):

Behavior category | Segment                           | Object                                   | Class
Examine           | View, Listen, Scroll, Find, Query | Select                                   | Browse
Retain            | Print                             | Bookmark, Save, Delete, Purchase, Email  | Subscribe
Reference         | Copy-and-paste, Quote             | Forward, Reply, Link, Cite               |
Annotate          | Mark up                           | Rate, Publish                            | Organize
Create            | Type, Edit                        | Author                                   |

Oard and Kim, 2001; Kelly and Teevan, 2004
6. Users’ interactions
Relevance feedback
History goes back close to forty years
Typically used for query expansion, user profiling
Explicit feedback
Users explicitly give feedback
Keywords, selecting or marking documents,
answering questions
Natural explicit feedback can be difficult to obtain
“Unnatural” explicit feedback through TREC assessors and crowdsourcing
7. Users’ interactions (2)
Implicit feedback for learning, query expansion and
user profiling
Observe users’ natural interactions with system
Reading time, saving, printing, bookmarking,
selecting, clicking, …
Thought to be less accurate than explicit measures
Available in very large quantities at no cost
8. Learning to rank online
Using online learning to rank approaches, retrieval systems can learn directly from implicit feedback, while they are running
Algorithms need to explore new solutions to obtain feedback for effective learning, and to exploit what has been learned to produce results acceptable to users
Interleaved comparison methods can use implicit feedback to detect small differences between rankers and can be used to learn ranking functions online
9. Agenda
Balancing exploration and exploitation
Inferring preferences from clicks
10. Recent work
Balancing Exploitation and Exploration
K. Hofmann et al. (2011), Balancing exploration and exploitation. In: ECIR ’11.
11. Challenges
Generalize over queries and documents
Learn from implicit feedback that is …
noisy
relative
rank-biased
Keep users happy while learning
12. Learning document pair-wise preferences
Insight: infer preferences from clicks
[Figure: example result list for the query “Vienna”, with clicked results indicating preferences]
Joachims, T. (2002). Optimizing search engines using clickthrough data. In KDD '02, pages 133-142.
13. Learning document pair-wise preferences
Input: feature vectors constructed from document pairs, (x(q, d_i), x(q, d_j)) ∈ R^n × R^n
Output: y ∈ {−1, +1}, correct / incorrect order
Learning method: supervised learning, e.g., SVM
Joachims, T. (2002). Optimizing search engines using clickthrough data. In KDD '02, pages 133-142.
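The pairwise setup above can be sketched in a few lines. Joachims (2002) trains a ranking SVM on these pairs; as a self-contained stand-in, the sketch below trains a simple perceptron on difference vectors, with toy feature data (all names and values are hypothetical).

```python
# Pairwise learning sketch: each training example is the difference of two
# query-document feature vectors, labeled +1 if the first document should be
# ranked above the second. Joachims (2002) uses a ranking SVM; a perceptron
# stands in here to keep the example dependency-free.

def pairwise_examples(prefs, features):
    """prefs: (query, preferred_doc, other_doc) triples inferred from clicks.
    features: maps (query, doc) to a feature vector x(q, d)."""
    examples = []
    for q, di, dj in prefs:
        diff = [a - b for a, b in zip(features[(q, di)], features[(q, dj)])]
        examples.append((diff, +1))                # d_i should outrank d_j
        examples.append(([-v for v in diff], -1))  # the reversed pair
    return examples

def train_perceptron(examples, epochs=20):
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:  # misclassified
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

# Toy data: two features per query-document pair (e.g. a BM25-like score
# and one other signal).
features = {("q", "d1"): [0.9, 0.1], ("q", "d2"): [0.2, 0.3]}
w = train_perceptron(pairwise_examples([("q", "d1", "d2")], features))

def score(q, d):
    return sum(wi * xi for wi, xi in zip(w, features[(q, d)]))

assert score("q", "d1") > score("q", "d2")  # learned order matches the click
```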
14. Challenges
Generalize over queries and documents
Learn from implicit feedback that is …
noisy
relative
rank-biased
Keep users happy while learning
15. Dueling bandit gradient descent
Learns a ranking function, consisting of a weight vector for a linear weighted combination of feature vectors, from feedback about the relative quality of rankings
Outcome: weights w for ranking documents by S(d) = w · x(q, d)
Approach:
Maintain a current “best” ranking function candidate w
On each incoming query:
  Generate a new candidate ranking function
  Compare to the current “best”
  If the candidate is better, update the “best” ranking function
[Figure: current best w and a candidate shown as points in weight space (x1, x2)]
Yue, Y. and Joachims, T. (2009). Interactively optimizing information retrieval systems as a dueling bandits problem. In ICML '09.
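The DBGD loop above can be sketched as follows. The comparison oracle here is synthetic — a candidate “wins” when it is closer to a hidden target vector — whereas in the real algorithm that outcome comes from an interleaved click comparison; the step sizes delta and alpha are illustrative tuning parameters, not values from the talk.

```python
import random

# Sketch of Dueling Bandit Gradient Descent (Yue & Joachims, 2009).
# delta controls how far the candidate steps away from the current best,
# alpha controls how far the best moves when the candidate wins the duel.

def dbgd(compare, dim, queries=1000, delta=1.0, alpha=0.1, seed=0):
    rng = random.Random(seed)
    w = [0.0] * dim                                  # current "best" ranker
    for _ in range(queries):
        u = [rng.gauss(0, 1) for _ in range(dim)]    # random unit direction
        norm = sum(x * x for x in u) ** 0.5
        u = [x / norm for x in u]
        candidate = [wi + delta * ui for wi, ui in zip(w, u)]
        if compare(candidate, w):                    # candidate wins the duel
            w = [wi + alpha * ui for wi, ui in zip(w, u)]
    return w

# Synthetic oracle: the hidden ideal weight vector is `target`; a candidate
# wins when it is closer to it. Real systems infer this outcome from clicks
# on an interleaved result list.
target = [1.0, -0.5]
dist = lambda w: sum((a - b) ** 2 for a, b in zip(w, target))
w = dbgd(lambda cand, best: dist(cand) < dist(best), dim=2)
assert dist(w) < dist([0.0, 0.0])  # learning moved w toward the target
```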
16. Challenges
Generalize over queries and documents
Learn from implicit feedback that is …
noisy
relative
rank-biased
Keep users happy while learning
17. Exploration and exploitation
Exploration: need to learn effectively from rank-biased feedback
Exploitation: need to present high-quality results while learning
Previous approaches are either purely exploratory or purely exploitative
18. Questions
Can we improve online performance by balancing
exploration and exploitation?
How much exploration is needed for effective
learning?
19. Problem formulation
Reinforcement learning
No explicit labels
Learn from feedback from the environment in response to actions (document lists)
Contextual bandit problem
[Diagram: the retrieval system tries something (presents documents) to the environment (the user) and gets feedback (clicks)]
20. Our method
Learning based on Dueling Bandit Gradient Descent
Relative evaluations of quality of two document
lists
Infers such comparisons from implicit feedback
Balance exploration and exploitation with k-greedy
comparison of document lists
21. k-greedy exploration
To compare document lists, interleave them
An exploration rate k influences the relative number of documents contributed by each list
[Figure: an interleaved comparison at exploration rate k = 0.5, won by the blue list]

22. k-greedy exploration
[Figure: interleaved result lists at exploration rates k = 0.5 and k = 0.2; lower k means fewer exploratory documents]
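k-greedy interleaving can be sketched as below, under the assumption that both lists rank the same document set: at each slot a biased coin (the exploration rate k) decides which list contributes its next not-yet-placed document. Document IDs and the seed are hypothetical.

```python
import random

# Sketch of k-greedy interleaving: with probability k the next document
# comes from the exploratory list, otherwise from the exploitative list.
# k = 0.5 yields a balanced mix; k = 0.2 yields roughly two exploratory
# documents in a 10-item result list.
# Assumes both lists are permutations of the same document set.

def k_greedy_interleave(exploit, explore, k, rng):
    result, origin = [], []
    i = j = 0                          # consumption pointers per list
    while len(result) < len(exploit):
        pick_explore = rng.random() < k
        source = explore if pick_explore else exploit
        idx = j if pick_explore else i
        while source[idx] in result:   # skip documents already placed
            idx += 1
        result.append(source[idx])
        origin.append("explore" if pick_explore else "exploit")
        if pick_explore:
            j = idx + 1
        else:
            i = idx + 1
    return result, origin

rng = random.Random(42)
exploit = ["d1", "d2", "d3", "d4"]
explore = ["d3", "d4", "d1", "d2"]
result, origin = k_greedy_interleave(exploit, explore, k=0.5, rng=rng)
assert sorted(result) == sorted(exploit)  # every document placed exactly once
```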
23. Evaluation
Simulated interactions
We need to
observe clicks on arbitrary result lists
measure online performance
Simulate clicks and measure online performance
Probabilistic click model: assume dependent click
model and define click and stop probabilities based
on standard learning to rank data sets
Measure cumulative reward of the rankings
displayed to the user
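The simulated interactions can be sketched with a dependent click model: scan the result list top-down, click with probability P(c|R) or P(c|NR) depending on relevance, and after a click stop with probability P(s|R) or P(s|NR). The probability table below is the “navigational” instantiation from the following slides; the ranking and relevance sets are toy data.

```python
import random

# Dependent click model sketch: the simulated user scans top-down, clicks
# depending on the document's relevance, and may stop after each click.

def simulate_clicks(ranking, relevant, p_click, p_stop, rng):
    clicks = []
    for rank, doc in enumerate(ranking):
        rel = doc in relevant
        if rng.random() < p_click[rel]:    # P(c|R) vs. P(c|NR)
            clicks.append(rank)
            if rng.random() < p_stop[rel]: # P(s|R) vs. P(s|NR)
                break
    return clicks

rng = random.Random(7)
# "Navigational" instantiation: P(c|R)=0.95, P(c|NR)=0.05, P(s|R)=0.9, P(s|NR)=0.2
p_click = {True: 0.95, False: 0.05}
p_stop = {True: 0.9, False: 0.2}
clicks = simulate_clicks(["d1", "d2", "d3"], {"d2"}, p_click, p_stop, rng)
```

With the “perfect” instantiation (P(c|R)=1.0 and all other probabilities 0), the simulated user deterministically clicks every relevant document and nothing else.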
24. Experiments
Vary exploration rate k
Three click models
“perfect”
“navigational”
“informational”
Evaluate on nine data sets (LETOR 3.0 and 4.0)
25. “Perfect” click model
Click model: P(c|R) = 1.0, P(c|NR) = 0.0, P(s|R) = 0.0, P(s|NR) = 0.0
[Figure: final performance over time (0–1000 queries) for data set NP2003 under the perfect click model]
Provides an upper bound
26. “Perfect” online performance

           k = 0.5  k = 0.4  k = 0.3  k = 0.2  k = 0.1
HP2003      119.91   125.71   129.99   130.55   128.50
HP2004      109.21   111.57   118.54   119.86   116.46
NP2003      108.74   113.61   117.44   120.46   119.06
NP2004      112.33   119.34   124.47   126.20   123.70
TD2003       82.00    84.24    88.20    89.36    86.20
TD2004       85.67    90.23    91.71    91.00    88.98
OHSUMED     128.12   130.40   131.16   133.37   131.93
MQ2007       96.02    97.48    98.54   100.28    98.32
MQ2008       90.97    92.99    94.03    95.59    95.14

Best performance with only two exploratory documents for top-10 result lists (k = 0.2)
Darker shades indicate higher performance; dark borders indicate significant improvements over the k = 0.5 baseline
27. “Navigational” click model
Click model: P(c|R) = 0.95, P(c|NR) = 0.05, P(s|R) = 0.9, P(s|NR) = 0.2
Simulates realistic but reliable interaction
[Figure: final performance over time for data set NP2003 under the navigational click model]
28. “Navigational” online performance

           k = 0.5  k = 0.4  k = 0.3  k = 0.2  k = 0.1
HP2003      102.58   109.78   118.84   116.38   117.52
HP2004       89.61    97.08    99.03   103.36   105.69
NP2003       90.32   100.94   105.03   108.15   110.12
NP2004       99.14   104.34   110.16   112.05   116.00
TD2003       70.93    75.20    77.64    77.54    75.70
TD2004       78.83    80.17    82.40    83.54    80.98
OHSUMED     125.35   126.92   127.37   127.94   127.21
MQ2007       95.50    94.99    95.70    96.02    94.94
MQ2008       89.39    90.55    91.24    92.36    92.25

Best performance with little exploration and lots of exploitation
Darker shades indicate higher performance; dark borders indicate significant improvements over the k = 0.5 baseline
29. “Informational” click model
Click model: P(c|R) = 0.9, P(c|NR) = 0.4, P(s|R) = 0.5, P(s|NR) = 0.1
Simulates very noisy interaction
[Figure: final performance over time for data set NP2003 under the informational click model, for k = 0.5, k = 0.2, and k = 0.1]
30. “Informational” online performance

           k = 0.5  k = 0.4  k = 0.3  k = 0.2  k = 0.1
HP2003       59.53    63.91    61.43    70.11    71.19
HP2004       41.12    52.88    55.88    58.40    63.23
NP2003       48.54    53.63    53.64    57.60    69.90
NP2004       55.16    60.59    64.17    69.96    63.38
TD2003       51.58    52.78    52.95    57.30    59.75
TD2004       55.76    58.49    61.43    62.88    63.37
OHSUMED     121.39   123.26   124.01   125.40   126.76
MQ2007       91.57    92.00    91.66    90.79    90.19
MQ2008       86.06    87.26    85.83    87.62    86.29

Highest improvements with low exploration rates: interaction between noise and data set
Darker shades indicate higher performance; dark borders indicate significant improvements over the k = 0.5 baseline
31. Summary
What?
Developed the first method for balancing exploration and exploitation in online learning to rank
Devised an experimental framework for simulating user interactions and measuring online performance
And so?
Balancing exploration and exploitation improves online performance for all click models and all data sets
Best results are achieved with 2 exploratory documents per result list
32. What’s next here?
Validate simulation assumptions
Evaluate using click logs
Develop new algorithms for online learning to rank for IR that can balance exploration and exploitation
33. Ongoing work
Inferring Preferences from Clicks
34. Interleaved ranker comparison methods
Use implicit feedback (“clicks”), not to infer absolute
judgments, but to compare two rankers by observing
clicks on an interleaved result list
Interleave two ranked lists (“outputs of two rankers”)
Use click data to detect even very small differences
between rankers
Examine three existing methods for interleaving,
identify issues with them and propose a new one
35. Three methods (1)
Balanced interleave method
Interleaved list is generated for each query based
on the two rankers
User’s clicks on interleaved list are attributed to
each ranker based on how they ranked the clicked
docs
Ranker that obtains more clicks is deemed
superior
Joachims, Evaluating retrieval performance using clickthrough data. In: Text Mining, 2003
36. 1) Interleaving, 2) Comparison (balanced interleave example)
List l1: d1, d2, d3, d4. List l2: d2, d3, d4, d1.
Two possible interleaved lists l: (d1, d2, d3, d4) and (d2, d1, d3, d4).
First comparison: clicks observed on d2 and d4; k = min(4, 3) = 3; click counts c1 = 1, c2 = 2.
Second comparison: clicks observed on d1 and d4; k = min(4, 4) = 4; click counts c1 = 2, c2 = 2.
l2 wins the first comparison, and the lists tie for the second. In expectation l2 wins.
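The balanced interleave comparison step can be sketched as follows, under the interpretation that k is the shorter of the two prefixes needed to cover every clicked document (this reproduces the k = min(4, 3) = 3 computation on the slide); the document IDs follow the slide's example.

```python
# Sketch of the balanced interleave comparison (Joachims, 2003). Ranks are
# 1-based; k is the shorter of the two prefixes needed to cover all clicked
# documents, and each ranker is credited with the clicks in its top-k.

def balanced_comparison(l1, l2, clicked):
    rank1 = {d: r for r, d in enumerate(l1, start=1)}
    rank2 = {d: r for r, d in enumerate(l2, start=1)}
    k = min(max(rank1[d] for d in clicked), max(rank2[d] for d in clicked))
    c1 = sum(1 for d in clicked if rank1[d] <= k)
    c2 = sum(1 for d in clicked if rank2[d] <= k)
    return c1, c2  # the ranker credited with more clicks is deemed superior

# Slide example: clicks on d2 and d4 give k = min(4, 3) = 3 and a win for l2;
# clicks on d1 and d4 give k = min(4, 4) = 4 and a tie.
l1 = ["d1", "d2", "d3", "d4"]
l2 = ["d2", "d3", "d4", "d1"]
assert balanced_comparison(l1, l2, {"d2", "d4"}) == (1, 2)
assert balanced_comparison(l1, l2, {"d1", "d4"}) == (2, 2)
```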
37. Three methods (2)
Team draft method
Create an interleaved list following the model of
“team captains” selecting their team from a set of
players
For each pair of documents to be placed in the
interleaved list, a coin flip determines which list
gets to select a document first
Record which list contributed which document
Radlinski et al., How does click-through data reflect retrieval quality? 2008
38. 1) Interleaving, 2) Comparison (team draft example)
List l1: d1, d2, d3, d4. List l2: d2, d3, d4, d1.
Four possible interleaved lists l, with different assignments a:
a) d1 (1), d2 (2), d3 (1), d4 (2)
b) d2 (2), d1 (1), d3 (1), d4 (2)
c) d2 (2), d1 (1), d3 (2), d4 (1)
d) d1 (1), d2 (2), d3 (2), d4 (1)
With a click on d3: for interleaved lists a) and b), l1 wins the comparison; l2 wins in the other two cases.
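Team-draft interleaving can be sketched as below, assuming both lists rank the same document set; the per-slot assignment is what the comparison step counts clicks against. Document IDs and the seed are hypothetical.

```python
import random

# Sketch of team-draft interleaving (Radlinski et al., 2008): for each pair
# of slots a coin flip decides which ranker picks first; each ranker then
# contributes its highest not-yet-placed document, and the assignment records
# which "team" each slot belongs to.
# Assumes both lists rank the same document set.

def team_draft(l1, l2, rng):
    total = len(set(l1) | set(l2))
    interleaved, team = [], []
    while len(interleaved) < total:
        first = rng.randint(1, 2)            # coin flip: who drafts first
        for t in (first, 3 - first):
            source = l1 if t == 1 else l2
            doc = next(d for d in source if d not in interleaved)
            interleaved.append(doc)
            team.append(t)
            if len(interleaved) == total:
                break
    return interleaved, team

def team_draft_comparison(team, clicked_ranks):
    """clicked_ranks: 0-based positions of clicks in the interleaved list."""
    wins1 = sum(1 for r in clicked_ranks if team[r] == 1)
    wins2 = sum(1 for r in clicked_ranks if team[r] == 2)
    return wins1, wins2

rng = random.Random(3)
interleaved, team = team_draft(["d1", "d2", "d3", "d4"],
                               ["d2", "d3", "d4", "d1"], rng)
assert sorted(interleaved) == ["d1", "d2", "d3", "d4"]
```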
39. Three methods (3)
Document-constraint method
Result lists are interleaved and clicks observed as
for the balanced interleaved method
Infer constraints on pairs of individual documents
based on clicks and ranks
For each pair of a clicked document and a higher-ranked non-clicked document, a constraint is inferred that requires the former to be ranked higher than the latter
The original list that violates fewer constraints is deemed superior
He et al., Evaluation of methods for relative comparison of retrieval systems based on clickthroughs, 2009
40. 1) Interleaving, 2) Comparison (document constraint example)
List l1: d1, d2, d3, d4. List l2: d2, d3, d4, d1.
Two possible interleaved lists l: (d1, d2, d3, d4) and (d2, d1, d3, d4).
First comparison: clicks on d2 and d3 give inferred constraints d2 ≻ d1 (violated by l1) and d3 ≻ d1 (violated by l1); l2 violates neither.
Second comparison: clicks on d1 and d3 give constraints d1 ≻ d2 (violated by l2) and d3 ≻ d2 (violated by both).
l2 wins the first comparison, and loses the second. In expectation l2 wins.
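The document-constraint comparison can be sketched as follows: constraints are inferred from each clicked document and the non-clicked documents shown above it, and each original list is scored by how many constraints it violates. The toy data reproduces the first comparison above.

```python
# Sketch of the document-constraint comparison (He et al., 2009): each click
# implies the clicked document should outrank every non-clicked document
# shown above it; the original list violating fewer constraints wins.

def constraint_comparison(l1, l2, interleaved, clicked):
    constraints = []
    for rank, doc in enumerate(interleaved):
        if doc in clicked:
            for above in interleaved[:rank]:
                if above not in clicked:
                    constraints.append((doc, above))  # doc should beat above
    def violations(lst):
        rank = {d: r for r, d in enumerate(lst)}
        return sum(1 for better, worse in constraints
                   if rank[better] > rank[worse])
    return violations(l1), violations(l2)

# Slide example: interleaved list (d1, d2, d3, d4) with clicks on d2 and d3
# yields constraints d2 > d1 and d3 > d1; l1 violates both, l2 violates none,
# so l2 wins this comparison.
v1, v2 = constraint_comparison(["d1", "d2", "d3", "d4"],
                               ["d2", "d3", "d4", "d1"],
                               ["d1", "d2", "d3", "d4"], {"d2", "d3"})
assert (v1, v2) == (2, 0)
```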
41. Assessing comparison methods
Bias
Don’t prefer either ranker when clicks are random
Sensitivity
The ability of a comparison method to detect
differences in the quality of rankings
Balanced interleave and document constraint are
biased
Team draft may suffer from insensitivity
42. A new proposal
Briefly
Based on team draft
Instead of interleaving deterministically, model the
interleaving process as random sampling from
softmax functions that define probability
distributions over documents
Derive an estimator that is unbiased and sensitive
to small ranking changes
Marginalize over all possible assignments to make
estimates more reliable
43. 1) Probabilistic interleaving, 2) Probabilistic comparison
Each ranker li is converted into a softmax distribution si over documents, e.g., P(d at rank 1) = 0.85, P(rank 2) = 0.10, P(rank 3) = 0.03, P(rank 4) = 0.02
For each rank of the interleaved list l, draw one of {s1, s2} and sample a document d; all permutations of the documents in D are possible
For an incoming query, the system generates an interleaved list and observes clicks
The comparison marginalizes over all possible assignments a: every assignment is generated, and its probability P(a | li, qi) and outcome o(ci, a) are computed
This is expensive, but only needs to be done down to the rank of the lowest observed click
In the slide’s example, s2 (based on l2) wins the observed comparison, while s1 and s2 tie in expectation
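The interleaving step of the proposed method can be sketched as follows. The softmax here is a rank-based 1/r^tau form (tau = 3 approximately reproduces the 0.85 / 0.10 / 0.03 / 0.02 probabilities shown on the slide); the marginalization over assignments in the comparison step is omitted to keep the sketch short, and all document IDs are hypothetical.

```python
import random

# Sketch of probabilistic interleaving: each ranker becomes a softmax
# distribution over its documents, P(d at rank r) proportional to 1 / r^tau.
# At each position of the interleaved list, one ranker is drawn at random
# and a document is sampled from its renormalized softmax.

def softmax(ranking, tau=3):
    weights = {d: 1.0 / (r ** tau) for r, d in enumerate(ranking, start=1)}
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}

def sample(dist, placed, rng):
    candidates = {d: p for d, p in dist.items() if d not in placed}
    total = sum(candidates.values())     # renormalize over unplaced docs
    u, acc = rng.random() * total, 0.0
    for d, p in candidates.items():
        acc += p
        if u <= acc:
            return d
    return d  # guard against floating-point round-off

def probabilistic_interleave(l1, l2, rng):
    s1, s2 = softmax(l1), softmax(l2)
    interleaved, assignment = [], []
    while len(interleaved) < len(l1):
        which = rng.randint(1, 2)        # which ranker contributes this slot
        doc = sample(s1 if which == 1 else s2, interleaved, rng)
        interleaved.append(doc)
        assignment.append(which)
    return interleaved, assignment

rng = random.Random(1)
interleaved, assignment = probabilistic_interleave(
    ["d1", "d2", "d3", "d4"], ["d2", "d3", "d4", "d1"], rng)
assert sorted(interleaved) == ["d1", "d2", "d3", "d4"]
```

Because every document has nonzero probability at every rank, any permutation of D can appear, which is what makes the marginalized estimator both unbiased and sensitive to small ranking changes.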
44. Question
Do analytical differences between the methods
translate into performance differences?
45. Evaluation
Set-up
Simulation based on dependent click model
Perfect and realistic instantiations
Not binary, but with relevance levels
MSLR-WEB30k Microsoft learning to rank data set
136 doc features (i.e., rankers)
Three experiments
Exhaustive comparison of all distinct ranker pairs
9,180 distinct pairs
Selection of small subsets for detailed analysis
Add noise
46. Results (1)
Experiment 1
Accuracy
Percentage of pairs of rankers for which a comparison
method identified the better ranker after 1000 queries
Method Accuracy
balanced interleave 0.881
team draft 0.898
document constraint 0.857
new 0.914
47. Results (2): overview
“Problematic” pairs
For most pairs of rankers, all methods correctly identified the better one; three methods achieved perfect accuracy on these within 1000 queries
For each method, the incorrectly judged pair with the highest difference in NDCG was examined in detail
50. Summary
What?
Methods for evaluating rankers using implicit
feedback
Analysis of interleaved comparison methods in
terms of bias and sensitivity
And so?
Introduced a new probabilistic interleaved
comparison method, unbiased and sensitive
Experimental analysis: more accurate, with
substantially fewer observed queries, more robust
51. What’s next here?
Evaluate in a real-life setting in the future
With more reliable and faster convergence, our
approach can pave the way for online learning to
rank methods that require many comparisons
53. Online learning to rank
Emphasis on implicit feedback collected during
normal operation of the search engine
Balancing exploration and exploitation
Probabilistic method for inferring preferences from
clicks
54. Information retrieval observatory
Academic experiments on online learning and
implicit feedback used simulators
Need to validate the simulators
What’s really needed
Move away from artificial explicit feedback to
natural implicit feedback
Shared experimental environment for observing
users in the wild as they interact with systems
57. Bias
(Backup slide: repeats the balanced interleave example from slide 36; l2 wins in expectation, illustrating the method’s bias.)
58. Sensitivity
(Backup slide: repeats the team draft example from slide 38; a single click on d3 makes the comparison outcome depend on the random assignment, illustrating potential insensitivity.)