Learning To Rank: From Pairwise Approach to Listwise Approach
Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li

Hasan Hüseyin Topcu
Outline

• Related Work
• Learning System
• Learning to Rank
• Pairwise vs. Listwise Approach
• Experiments
• Conclusion
Related Work

• Pairwise Approach: the learning task is formalized as classification of object pairs into two categories (correctly ranked and incorrectly ranked).
• Classification methods used:
  • Ranking SVM (Herbrich et al., 1999); Joachims (2002) applied Ranking SVM to Information Retrieval
  • RankBoost (Freund et al., 1998)
  • RankNet (Burges et al., 2005)
Learning System

Training data, data preprocessing, ...
How are objects identified?
How are instances modeled?
SVM, ANN, Boosting
Evaluate with test data

Adapted from Pattern Classification (Duda, Hart, Stork)
Ranking

Learning to Rank

• A number of queries are provided.
• Each query is associated with a perfect ranking list of documents (the ground truth).
• A ranking function is created using the training data, such that the model can precisely predict the ranking lists.
• Learning tries to optimize a loss function. Note that the loss function for ranking is slightly different in the sense that it makes use of sorting.
Training Process

Data Labeling

• Explicit human judgment (Perfect, Excellent, Good, Fair, Bad)
• Implicit relevance judgment: derived from click data (search log data)
• Ordered pairs between documents (A > B)
• Lists of judgments (scores)
Features

Pairwise Approach

• Training data instances are document pairs.

Pairwise Approach

• Collects document pairs from the ranking list and assigns a label to each pair.
• Data labels: +1 if the score of A > the score of B, and -1 if A < B.
• Formalizes the problem of learning to rank as binary classification (see the sketch below).
• Ranking SVM, RankBoost, and RankNet
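A minimal sketch of this pair-generation step, assuming graded judgments per document; the helper name and the toy documents are illustrative, not from the paper:

```python
from itertools import combinations

def make_pairwise_instances(docs, scores):
    """Turn one query's judged documents into labeled pairs.

    docs:   the documents (toy names here; feature vectors in practice)
    scores: graded relevance judgments aligned with docs
    Returns ((d_j, d_k), label) instances, label +1 if d_j should rank
    above d_k and -1 otherwise.
    """
    instances = []
    for j, k in combinations(range(len(docs)), 2):
        if scores[j] == scores[k]:
            continue  # equal grades give no preference pair
        label = +1 if scores[j] > scores[k] else -1
        instances.append(((docs[j], docs[k]), label))
    return instances

# Example: three documents with judgments A=2 > B=1 > C=0
print(make_pairwise_instances(["A", "B", "C"], [2, 1, 0]))
# [(('A', 'B'), 1), (('A', 'C'), 1), (('B', 'C'), 1)]
```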
  
Pairwise Approach Drawbacks

• The objective of learning is formalized as minimizing errors in classification of document pairs, rather than minimizing errors in ranking of documents.
• The training process is computationally costly, as the number of document pairs is very large.
Pairwise Approach Drawbacks

• Treats document pairs across different grades (labels) equally (Ex. 1).
• The number of generated document pairs varies largely from query to query, which results in training a model biased toward queries with more document pairs (Ex. 2; a count sketch follows below).
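To see the scale of Ex. 2: with all grades distinct, a query with n judged documents yields n(n-1)/2 pairs, so long lists dominate the pairwise training set. A tiny illustration:

```python
def pair_count(n):
    # document pairs generated for a query with n judged documents,
    # assuming all relevance grades are distinct
    return n * (n - 1) // 2

print(pair_count(10))    # 45
print(pair_count(1000))  # 499500: this query dominates the training set
```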
  
Listwise Approach

• Training data instances are document lists.
• The objective of learning is formalized as minimization of the total losses with respect to the training data.
• The listwise loss function uses probability models: Permutation Probability and Top One Probability.
(Excerpt from the paper:) ... documents d^(i') are given, we construct feature vectors x^(i') from them and use the trained ranking function to assign scores to the documents d^(i'). Finally we rank the documents d^(i') in descending order of the scores. We call the learning problem described above the listwise approach to learning to rank.

By contrast, in the pairwise approach, a new training data set T' is created from T, in which each feature vector pair x_j^(i) and x_k^(i) forms a new instance, where j ≠ k, and +1 is assigned to the pair if y_j^(i) is larger than y_k^(i), otherwise -1. It turns out that the training data T' is a data set of binary classification. A classification model like SVM can be created.

The permutation probability of a permutation π, given the list of scores s, is defined as

    P_s(π) = ∏_{j=1}^{n} φ(s_{π(j)}) / Σ_{k=j}^{n} φ(s_{π(k)}),

where s_{π(j)} is the score of the object ranked at position j of π. For example, for three objects with scores s = (s1, s2, s3), the probability of the permutation π = ⟨1, 2, 3⟩ is:

    P_s(π) = φ(s1)/(φ(s1) + φ(s2) + φ(s3)) · φ(s2)/(φ(s2) + φ(s3)) · φ(s3)/φ(s3)
Permutation Probability

• Objects: {A, B, C}; permutations: ABC, ACB, BAC, BCA, CAB, CBA
• Suppose a ranking function assigns scores sA, sB, and sC to the objects.
• Permutation probability: the likelihood of a permutation given the scores.
• P(ABC) > P(CBA) if sA > sB > sC (see the sketch below)
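A minimal sketch of the permutation probability defined in the excerpt above, taking φ = exp as in the paper; the scores are illustrative:

```python
import math

def permutation_probability(scores):
    """P_s(pi) for objects listed in permutation order.

    scores: s_{pi(1)}, ..., s_{pi(n)}, the scores in the order the
    permutation ranks the objects; phi = exp.
    """
    phis = [math.exp(s) for s in scores]
    p = 1.0
    for j in range(len(phis)):
        p *= phis[j] / sum(phis[j:])
    return p

# With sA > sB > sC, the permutation ABC is more likely than CBA
sA, sB, sC = 3.0, 2.0, 1.0
print(permutation_probability([sA, sB, sC]))  # highest of the 6 permutations
print(permutation_probability([sC, sB, sA]))  # lowest of the 6 permutations
```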
  
Top One Probability

• Objects: {A, B, C}; permutations: ABC, ACB, BAC, BCA, CAB, CBA
• Suppose a ranking function assigns scores sA, sB, and sC to the objects.
• The top one probability of an object is the probability of its being ranked on top, given the scores of all the objects:
  • P(A) = P(ABC) + P(ACB)
  • P(B) = P(BAC) + P(BCA)
  • P(C) = P(CBA) + P(CAB)
• Notice that to calculate the n top one probabilities this way, we still need to calculate n! permutation probabilities (see the brute-force sketch below).
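A brute-force sketch of top one probabilities over all n! permutations, with φ = exp; the final check reflects the paper's result that the top one probability reduces to the softmax φ(s_j)/Σ_k φ(s_k), so the n! enumeration can be avoided:

```python
import math
from itertools import permutations

def permutation_probability(scores):
    # P_s(pi) with phi = exp; scores given in permutation order
    phis = [math.exp(s) for s in scores]
    p = 1.0
    for j in range(len(phis)):
        p *= phis[j] / sum(phis[j:])
    return p

def top_one_probabilities(scores):
    """P(object j ranks first), summed over all n! permutations."""
    probs = [0.0] * len(scores)
    for perm in permutations(range(len(scores))):
        probs[perm[0]] += permutation_probability([scores[i] for i in perm])
    return probs

s = [3.0, 2.0, 1.0]  # sA, sB, sC
print(top_one_probabilities(s))
# The closed form from the paper: softmax of the scores
z = sum(math.exp(x) for x in s)
print([math.exp(x) / z for x in s])
```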
  
Listwise Loss Function

• With the use of top one probability, given two lists of scores, we can use any metric to represent the distance between the two score lists.
• For example, when we use Cross Entropy as the metric, the listwise loss function becomes L(y, z) = -Σ_j P_y(j) · log P_z(j), the cross entropy between the top one distributions of the ground truth y and the model output z.
• Ground Truth: ABCD  vs.  Ranking Output: ACBD or ABDC (compared in the sketch below)
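A minimal sketch of this loss on the ABCD example; the ground-truth and model scores are illustrative stand-ins for the graded judgments and ranking outputs:

```python
import math

def top_one(scores):
    # top one probabilities via their softmax closed form, phi = exp
    z = sum(math.exp(s) for s in scores)
    return [math.exp(s) / z for s in scores]

def listwise_loss(truth_scores, model_scores):
    """Cross entropy L(y, z) = -sum_j P_y(j) * log P_z(j)."""
    return -sum(p * math.log(q)
                for p, q in zip(top_one(truth_scores), top_one(model_scores)))

truth = [4.0, 3.0, 2.0, 1.0]  # ground truth ABCD
acbd  = [4.0, 2.0, 3.0, 1.0]  # output ACBD: B and C swapped
abdc  = [4.0, 3.0, 1.0, 2.0]  # output ABDC: C and D swapped
print(listwise_loss(truth, acbd))  # larger loss: the error is nearer the top
print(listwise_loss(truth, abdc))  # smaller loss
```

The comparison shows why the loss is list-aware: swapping B and C costs more than swapping C and D, because top one probabilities weight positions near the top more heavily.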
  	
  
ListNet

• Learning method: ListNet
• Optimizes the listwise loss function based on top one probability, with a Neural Network as the model and Gradient Descent as the optimization algorithm.
• A linear network model is used for simplicity: y = w^T x + b (see the training sketch below)
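A compact sketch of one ListNet-style update under these choices (linear scoring, top one probabilities, cross-entropy loss, plain gradient descent); the features, learning rate, and iteration count are illustrative:

```python
import math

def softmax(scores):
    m = max(scores)  # shift by max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def listnet_step(w, b, X, y, lr=0.01):
    """One gradient-descent step on a single query's document list.

    X: feature vectors; y: ground-truth scores; model score = w.x + b.
    Loss: cross entropy between top one distributions of y and the model.
    For softmax cross entropy, dL/ds_j = q_j - p_j.
    """
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in X]
    p, q = softmax(y), softmax(scores)
    for j, x in enumerate(X):
        g = q[j] - p[j]
        for i, xi in enumerate(x):
            w[i] -= lr * g * xi
        b -= lr * g
    return w, b

# Two documents, two features; the first should score higher
X, y = [[1.0, 0.5], [0.2, 0.9]], [2.0, 0.0]
w, b = [0.0, 0.0], 0.0
for _ in range(200):
    w, b = listnet_step(w, b, X, y)
print(w, b)  # the first document now scores above the second
```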
  
ListNet	
  
Ranking Accuracy

• ListNet vs. RankNet, Ranking SVM, and RankBoost
• 3 datasets: TREC 2003, OHSUMED, and CSearch
  • TREC 2003: relevance judgments (Relevant and Irrelevant), 20 features extracted
  • OHSUMED: relevance judgments (Definitely Relevant, Positively Relevant, and Irrelevant), 30 features
  • CSearch: relevance judgments from 4 ('Perfect Match') to 0 ('Bad Match'), 600 features
• Evaluation measures: Normalized Discounted Cumulative Gain (NDCG) and Mean Average Precision (MAP); an NDCG sketch follows below
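A small sketch of NDCG@n using a common formulation (gain 2^rel - 1 with a log2 position discount); the graded labels below are illustrative:

```python
import math

def dcg_at_n(rels, n):
    # rels: graded judgments in the order the system ranked the documents
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:n]))

def ndcg_at_n(rels, n):
    ideal = dcg_at_n(sorted(rels, reverse=True), n)
    return dcg_at_n(rels, n) / ideal if ideal > 0 else 0.0

# A ranking that puts a mildly relevant document above a perfect one
print(ndcg_at_n([1, 2, 0, 1], n=4))  # below 1.0
print(ndcg_at_n([2, 1, 1, 0], n=4))  # exactly 1.0: the ideal ordering
```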
  
	
  
	
  
Experiments

• NDCG@n on TREC

Experiments

• NDCG@n on OHSUMED

Experiments

• NDCG@n on CSearch
Conclusion

• Discussed:
  • Learning to Rank
  • The pairwise approach and its drawbacks
  • The listwise approach, which outperforms the existing pairwise approaches
• Evaluation of the paper:
  • A linear neural network model is used. What about a non-linear model?
  • The listwise loss function is the key issue (probability models).
References

• Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li. 2007. Learning to rank: from pairwise approach to listwise approach. In Proceedings of the 24th International Conference on Machine Learning (ICML '07), Zoubin Ghahramani (Ed.). ACM, New York, NY, USA, 129-136. DOI=10.1145/1273496.1273513 http://doi.acm.org/10.1145/1273496.1273513
• Hang Li: A Short Introduction to Learning to Rank. IEICE Transactions 94-D(10): 1854-1862 (2011)
• Learning to Rank. Hang Li. Microsoft Research Asia. ACL-IJCNLP 2009 Tutorial. Aug. 2, 2009. Singapore.