1. The less is more binary ranking: ClimF
The less is more binary ranking: ClimF
Xudong Sun, sun@aisbi.de
DSOR-AISBI
2. The less is more binary ranking: ClimF
Outline
1 Introduction
3. The less is more binary ranking: ClimF
Introduction
Objective functions in recommender systems
Reciprocal rank: captures how early the first relevant result appears
Mean Average Precision
4. The less is more binary ranking: ClimF
Mean Average Precision
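Trade-off between precision and recall
Recall at 5:
$$\text{Recall@}5 = \frac{\text{number of hits in the top-5 list}}{\text{number of items the user likes}}$$
Recall at (number of items the user likes) = ?
Recall at $n$ $-$ Recall at $(n-1)$ = ?
Precision at 5:
$$\text{Precision@}5 = \frac{\text{number of hits in the top-5 list}}{\text{number of items in the list}}$$
Average Precision: summarizes the precision-recall curve $p(r)$, obtained by pairing the precision $P(k)$ with the recall $r(k)$ at each cutoff $k$:
$$\text{AveP} = \sum_{k=1}^{n} P(k)\,\Delta r(k)$$
Note that $\Delta r(k) = \frac{1}{\text{number of items the user likes}}$ if the $k$-th item is a hit, and $\Delta r(k) = 0$ otherwise, so
$$\text{AveP} = \frac{\sum_{k=1}^{n} P(k)\,\mathrm{rel}(k)}{\text{number of items the user likes}}$$
where $\mathrm{rel}(k)$ is an indicator variable denoting whether the $k$-th item is a hit.
Mean Average Precision:
$$\text{MAP} = \frac{\sum_{i=1}^{N} \text{AveP}(q_i)}{N}$$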
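The definitions on this slide can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names and the toy data are assumptions made for the example, not part of the slides.

```python
def hits_in_top_k(ranked, liked, k):
    """Count how many of the first k recommended items the user likes."""
    return sum(1 for item in ranked[:k] if item in liked)


def precision_at_k(ranked, liked, k):
    """Hits in the top-k list divided by the list length k."""
    return hits_in_top_k(ranked, liked, k) / k


def recall_at_k(ranked, liked, k):
    """Hits in the top-k list divided by the number of items the user likes."""
    return hits_in_top_k(ranked, liked, k) / len(liked)


def average_precision(ranked, liked):
    """Sum P(k) over the positions k that are hits, divided by the number of liked items."""
    total = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in liked:  # rel(k) = 1
            total += precision_at_k(ranked, liked, k)
    return total / len(liked)


def mean_average_precision(queries):
    """Average AveP over a list of (ranked list, liked set) pairs."""
    return sum(average_precision(r, l) for r, l in queries) / len(queries)


# Hypothetical toy data: one ranked list and the set of items the user likes.
ranked = ["a", "b", "c", "d", "e"]
liked = {"a", "c", "f"}
print(precision_at_k(ranked, liked, 5))
print(recall_at_k(ranked, liked, 5))
print(average_precision(ranked, liked))
```

In the toy data the hits are at positions 1 and 3, so AveP is (1 + 2/3) / 3 = 5/9; the liked item "f" is never recommended, which is why the recall at 5 stays at 2/3.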
5. The less is more binary ranking: ClimF
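Mean Reciprocal Rank
$$\text{Reciprocal rank} = \frac{1}{\text{rank of the highest-ranked relevant hit}}$$
The best value is 1; when does the worst value occur?
Relationship with MAP?
Suppose we have $N$ queries as an evaluation set, and $\text{rank}_i$ is the rank of the first relevant hit for query $i$:
$$\text{MRR} = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{\text{rank}_i}$$
$\frac{1}{\text{MRR}}$ is the harmonic mean of the ranks.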
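A small Python sketch can make the harmonic-mean remark concrete. The helper names and the three example ranks are assumptions chosen for illustration; `statistics.harmonic_mean` is from the standard library.

```python
from statistics import harmonic_mean


def first_relevant_rank(ranked, liked):
    """Return the 1-based position of the first liked item, or None if there is none."""
    for position, item in enumerate(ranked, start=1):
        if item in liked:
            return position
    return None


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over the evaluation queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks of the first relevant hit for three queries.
ranks = [1, 2, 4]
mrr = mean_reciprocal_rank(ranks)
print(mrr)                    # (1 + 1/2 + 1/4) / 3
print(1.0 / mrr)              # reciprocal of the MRR
print(harmonic_mean(ranks))   # agrees with 1 / MRR up to rounding
```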
6. The less is more binary ranking: ClimF
Smoothing the reciprocal rank
$$RR_i = \sum_{j=1}^{N} \frac{Y_{ij}}{R_{ij}} \prod_{k=1}^{N} \bigl(1 - Y_{ik}\, I(R_{ik} < R_{ij})\bigr)$$
$Y_{ij}$ indicates whether user $i$ likes item $j$.
$N$ is the total number of items.
$R_{ij}$: rank of item $j$ in user $i$'s recommended list, ordered by relevance score; the lower, the better.
$I(R_{ik} < R_{ij})$ is 1 when item $k$ is ranked as more relevant than item $j$, and 0 otherwise.
When $Y_{ik} = 1$ and $R_{ik} < R_{ij}$, i.e. item $k$ is relevant to user $i$ and item $j$ has a lower predicted affinity with user $i$ than item $k$, the factor for $k$ is 0 and so the whole product is 0. So for an item $j$ to be taken into consideration, it must be the highest-ranked relevant item according to the predicted affinity function. This is equivalent to considering only the highest-ranked relevant item for the user.
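To check the claim that only the highest-ranked relevant item survives the product, the formula can be evaluated directly on a small example. This is a sketch with made-up data; the function name and the lists `Y_i` and `R_i` are illustrative assumptions.

```python
def reciprocal_rank_product(Y_i, R_i):
    """Evaluate RR_i = sum_j Y_ij / R_ij * prod_k (1 - Y_ik * I(R_ik < R_ij))."""
    n_items = len(Y_i)
    total = 0.0
    for j in range(n_items):
        product = 1.0
        for k in range(n_items):
            indicator = 1 if R_i[k] < R_i[j] else 0
            product *= 1 - Y_i[k] * indicator
        total += Y_i[j] / R_i[j] * product
    return total


# Hypothetical user: items 1 and 3 are relevant, with ranks 3 and 2.
Y_i = [0, 1, 0, 1]
R_i = [1, 3, 4, 2]
print(reciprocal_rank_product(Y_i, R_i))
```

Here the relevant item at rank 3 is zeroed out by the relevant item at rank 2, so the sum reduces to 1/2, the reciprocal of the best rank among the relevant items.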
7. The less is more binary ranking: ClimF
Approximating reciprocal rank
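[Figure: the logistic function $g(f_{ik} - f_{ij}) = \frac{1}{1 + e^{-(f_{ik} - f_{ij})}}$ plotted against $f_{ik} - f_{ij}$ on $[-6, 6]$, rising from 0 to 1; it is used to approximate $I(R_{ik} < R_{ij})$.]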
$$I(R_{ik} < R_{ij}) \approx g(f_{ik} - f_{ij})$$
$$\frac{1}{R_{ik}} \approx g(f_{ik})$$
Strictly speaking, $R_{ik}$ is not a number, but here we define it to be one in a way that is consistent with our ranking comparison.
8. The less is more binary ranking: ClimF
Recall the reciprocal rank and its two approximations:
$$RR_i = \sum_{j=1}^{N} \frac{Y_{ij}}{R_{ij}} \prod_{k=1}^{N} \bigl(1 - Y_{ik}\, I(R_{ik} < R_{ij})\bigr)$$
$$I(R_{ik} < R_{ij}) \approx g(f_{ik} - f_{ij}), \qquad \frac{1}{R_{ik}} \approx g(f_{ik})$$
Substituting both gives the smoothed reciprocal rank:
$$RR_i \approx \sum_{j=1}^{N} Y_{ij}\, g(f_{ij}) \prod_{k=1}^{N} \bigl(1 - Y_{ik}\, g(f_{ik} - f_{ij})\bigr)$$
where $f_{ik} = \langle U_i, V_k \rangle$. How many operations do we need to compute the derivative with respect to a latent item factor?
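A direct, unoptimized evaluation of the smoothed reciprocal rank might look like the sketch below. The function names, the two-dimensional factors, and the toy data are assumptions for illustration only.

```python
import math


def g(x):
    """Logistic function used to smooth the indicator and the reciprocal rank."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    """Inner product <u, v> of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def smoothed_rr(U_i, V, Y_i):
    """Evaluate sum_j Y_ij g(f_ij) prod_k (1 - Y_ik g(f_ik - f_ij)) with f_ik = <U_i, V_k>."""
    f = [dot(U_i, V_k) for V_k in V]
    total = 0.0
    for j, y_ij in enumerate(Y_i):
        if not y_ij:
            continue  # irrelevant items contribute nothing to the sum
        product = 1.0
        for k, y_ik in enumerate(Y_i):
            product *= 1.0 - y_ik * g(f[k] - f[j])
        total += g(f[j]) * product
    return total


# Hypothetical user factor, item factors, and relevance labels.
U_i = [1.0, 0.0]
V = [[2.0, 0.0], [0.0, 1.0]]
Y_i = [1, 0]
print(smoothed_rr(U_i, V, Y_i))
```

Written this way, the double loop over items evaluates the sigmoid on the order of $N^2$ times per user, which gives a feel for why the cost of differentiating this expression is worth asking about.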
9. The less is more binary ranking: ClimF
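Approximating the smoothed reciprocal rank
Define $n_i^+ = \sum_{l=1}^{N} Y_{il}$, the number of items relevant to user $i$. Since $\ln$ is increasing and $\frac{1}{n_i^+}$ is a positive constant, the maximizer is unchanged:
$$U_i, V = \arg\max_{U_i, V} \{RR_i\} = \arg\max_{U_i, V} \Bigl\{\ln\Bigl(\frac{1}{n_i^+} RR_i\Bigr)\Bigr\} = \arg\max_{U_i, V} \Bigl\{\ln\Bigl(\sum_{j=1}^{N} \frac{Y_{ij}}{n_i^+}\, g(f_{ij}) \prod_{k=1}^{N} \bigl(1 - Y_{ik}\, g(f_{ik} - f_{ij})\bigr)\Bigr)\Bigr\}$$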
10. The less is more binary ranking: ClimF
Deriving lower bound for smoothed reciprocal ranking
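Convex transform: for a convex function $\varphi$ and weights $\lambda_i \ge 0$ with $\sum_{i=1}^{n} \lambda_i = 1$,
$$\varphi\Bigl(\sum_{i=1}^{n} \lambda_i x_i\Bigr) \le \sum_{i=1}^{n} \lambda_i\, \varphi(x_i)$$
Jensen's inequality for the concave function $\log$ (the inequality reverses), with $x_i > 0$:
$$\log\Bigl(\frac{\sum_{i=1}^{n} x_i}{n}\Bigr) \ge \frac{\sum_{i=1}^{n} \log(x_i)}{n}$$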
[Figure: the convex function $f(x) = x^2 - x + 4$ plotted for $x \in [-6, 6]$.]
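Both directions of Jensen's inequality can be checked numerically on a few sample points. The sketch below uses the quadratic $f(x) = x^2 - x + 4$ from the plot as the convex example and $\log$ as the concave one; the sample values are arbitrary assumptions.

```python
import math


def f(x):
    """Convex quadratic from the slide's plot."""
    return x * x - x + 4


# Arbitrary positive sample points with equal weights 1/n.
xs = [0.5, 2.0, 8.0]
mean_x = sum(xs) / len(xs)

# Convex case: f(mean) <= mean of f.
f_of_mean = f(mean_x)
mean_of_f = sum(f(x) for x in xs) / len(xs)

# Concave case: log(mean) >= mean of log.
log_of_mean = math.log(mean_x)
mean_of_logs = sum(math.log(x) for x in xs) / len(xs)

print(f_of_mean, mean_of_f)
print(log_of_mean, mean_of_logs)
```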
11. The less is more binary ranking: ClimF
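Deriving the lower bound for the objective function
Note that $\sum_{j=1}^{N} \frac{Y_{ij}}{n_i^+} = 1$, so the weights $\frac{Y_{ij}}{n_i^+}$ serve as the Jensen coefficients:
$$\begin{aligned}
\ln\Bigl(\sum_{j=1}^{N} \frac{Y_{ij}}{n_i^+}\, g(f_{ij}) \prod_{k=1}^{N} \bigl(1 - Y_{ik}\, g(f_{ik} - f_{ij})\bigr)\Bigr)
&\ge \frac{1}{n_i^+} \sum_{j=1}^{N} Y_{ij} \ln\Bigl(g(f_{ij}) \prod_{k=1}^{N} \bigl(1 - Y_{ik}\, g(f_{ik} - f_{ij})\bigr)\Bigr) \\
&= \frac{1}{n_i^+} \sum_{j=1}^{N} Y_{ij} \Bigl(\ln g(f_{ij}) + \ln \prod_{k=1}^{N} \bigl(1 - Y_{ik}\, g(f_{ik} - f_{ij})\bigr)\Bigr) \\
&= \frac{1}{n_i^+} \sum_{j=1}^{N} Y_{ij} \Bigl(\ln g(f_{ij}) + \sum_{k=1}^{N} \ln\bigl(1 - Y_{ik}\, g(f_{ik} - f_{ij})\bigr)\Bigr)
\end{aligned}$$
If an item $j$ is relevant, $\langle U_i, V_j \rangle$ should be large (the $\ln g(f_{ij})$ term).
Among all the relevant items, only one excels; the others are suppressed (the $\ln(1 - Y_{ik}\, g(f_{ik} - f_{ij}))$ terms).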
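The bound can be sanity-checked numerically by computing both sides for one user. This is a sketch under assumed inputs: the scores `f` stand in for $f_{ij}$ and, like the function names, are made up for the example.

```python
import math


def g(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def log_smoothed_rr(f, Y_i):
    """Left-hand side: ln of the smoothed reciprocal rank scaled by 1 / n_i^+."""
    n_plus = sum(Y_i)
    total = 0.0
    for j, y_ij in enumerate(Y_i):
        product = 1.0
        for k, y_ik in enumerate(Y_i):
            product *= 1.0 - y_ik * g(f[k] - f[j])
        total += (y_ij / n_plus) * g(f[j]) * product
    return math.log(total)


def lower_bound(f, Y_i):
    """Right-hand side: the Jensen lower bound, a weighted sum of logarithms."""
    n_plus = sum(Y_i)
    total = 0.0
    for j, y_ij in enumerate(Y_i):
        if not y_ij:
            continue
        total += math.log(g(f[j]))
        for k, y_ik in enumerate(Y_i):
            total += math.log(1.0 - y_ik * g(f[k] - f[j]))
    return total / n_plus


# Hypothetical predicted scores and relevance labels for one user.
f = [2.0, -1.0, 0.5, 0.0]
Y_i = [1, 0, 1, 1]
print(log_smoothed_rr(f, Y_i), lower_bound(f, Y_i))
```

With a single relevant item the weighted sum has only one term, so the two sides coincide; the gap opens up as soon as several relevant items have different scores.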
12. The less is more binary ranking: ClimF
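New objective function
$$\begin{aligned}
F(U, V) &= \sum_{i=1}^{M} \frac{1}{n_i^+} \sum_{j=1}^{N} Y_{ij} \Bigl(\ln g(f_{ij}) + \sum_{k=1}^{N} \ln\bigl(1 - Y_{ik}\, g(f_{ik} - f_{ij})\bigr)\Bigr) + \text{regTerm} \\
&= \sum_{i=1}^{M} \frac{1}{n_i^+} \sum_{j=1}^{N} Y_{ij} \Bigl(\ln g(U_i^T V_j) + \sum_{k=1}^{N} \ln\bigl(1 - Y_{ik}\, g(U_i^T V_k - U_i^T V_j)\bigr)\Bigr) - \frac{\lambda}{2}\bigl(\|U\|^2 + \|V\|^2\bigr)
\end{aligned}$$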
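As a rough illustration, the objective can be evaluated with nested loops in plain Python. The function name, the choice to skip users without relevant items, and the toy matrices are assumptions of this sketch, not something the slides specify.

```python
import math


def g(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def climf_objective(U, V, Y, lam):
    """Evaluate F(U, V): per-user lower bound summed over users, minus the L2 penalty."""
    total = 0.0
    for U_i, Y_i in zip(U, Y):
        n_plus = sum(Y_i)
        if n_plus == 0:
            continue  # assumption: users with no relevant items are skipped
        f = [dot(U_i, V_j) for V_j in V]
        user_sum = 0.0
        for j, y_ij in enumerate(Y_i):
            if not y_ij:
                continue
            user_sum += math.log(g(f[j]))
            for k, y_ik in enumerate(Y_i):
                user_sum += math.log(1.0 - y_ik * g(f[k] - f[j]))
        total += user_sum / n_plus
    sq_norm = sum(x * x for row in U for x in row) + sum(x * x for row in V for x in row)
    return total - 0.5 * lam * sq_norm


# Hypothetical data: one user, three items, two latent dimensions.
U = [[0.0, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [[1, 0, 1]]
print(climf_objective(U, V, Y, lam=0.1))
```

With a zero user factor every score is 0 and every sigmoid is 1/2, so the data term for this user is $3 \ln \tfrac{1}{2}$ and the penalty is $0.05 \cdot 4 = 0.2$, which makes the value easy to verify by hand.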
13. The less is more binary ranking: ClimF
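Gradient Optimization
Properties of the sigmoid function:
$$g'(x) = g(x)\bigl(1 - g(x)\bigr) = g(x)\, g(-x), \quad \text{i.e.} \quad g(-x) = \frac{g'(x)}{g(x)}$$
$$F(U, V) = \sum_{i=1}^{M} \frac{1}{n_i^+} \sum_{j=1}^{N} Y_{ij} \Bigl[\ln g(U_i^T V_j) + \sum_{k=1}^{N} \ln\bigl(1 - Y_{ik}\, g(U_i^T V_k - U_i^T V_j)\bigr)\Bigr] - \frac{\lambda}{2}\bigl(\|U\|^2 + \|V\|^2\bigr)$$
Only the terms of user $i$ depend on $U_i$, so the outer sum over users drops out. With $f_{ij} = U_i^T V_j$:
$$\frac{\partial F(U, V)}{\partial U_i} = \frac{1}{n_i^+} \sum_{j=1}^{N} Y_{ij} \Bigl[g(-f_{ij})\, V_j + \sum_{k=1}^{N} \frac{Y_{ik}\, g'(f_{ik} - f_{ij})}{1 - Y_{ik}\, g(f_{ik} - f_{ij})}\, (V_j - V_k)\Bigr] - \lambda U_i$$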
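One way to gain confidence in the gradient with respect to $U_i$ is to compare it with central finite differences of the part of $F$ that depends on $U_i$. The sketch below does this in plain Python; the function names, step size, and toy data are all assumptions made for the check.

```python
import math


def g(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def user_objective(U_i, V, Y_i, lam):
    """Part of F that depends on U_i: user i's data term minus (lam / 2) * ||U_i||^2."""
    n_plus = sum(Y_i)
    f = [dot(U_i, V_j) for V_j in V]
    total = 0.0
    for j, y_ij in enumerate(Y_i):
        if not y_ij:
            continue
        total += math.log(g(f[j]))
        for k, y_ik in enumerate(Y_i):
            total += math.log(1.0 - y_ik * g(f[k] - f[j]))
    return total / n_plus - 0.5 * lam * dot(U_i, U_i)


def grad_user(U_i, V, Y_i, lam):
    """Analytic gradient of user_objective with respect to U_i."""
    n_plus = sum(Y_i)
    f = [dot(U_i, V_j) for V_j in V]
    grad = [0.0] * len(U_i)
    for j, y_ij in enumerate(Y_i):
        if not y_ij:
            continue
        for d in range(len(U_i)):
            grad[d] += g(-f[j]) * V[j][d]
        for k, y_ik in enumerate(Y_i):
            s = g(f[k] - f[j])
            # Y_ik * g'(x) / (1 - Y_ik * g(x)), using g'(x) = g(x) * (1 - g(x))
            weight = y_ik * s * (1.0 - s) / (1.0 - y_ik * s)
            for d in range(len(U_i)):
                grad[d] += weight * (V[j][d] - V[k][d])
    return [gd / n_plus - lam * u for gd, u in zip(grad, U_i)]


def numeric_grad(U_i, V, Y_i, lam, eps=1e-6):
    """Central finite-difference approximation of the same gradient."""
    result = []
    for d in range(len(U_i)):
        up = list(U_i)
        down = list(U_i)
        up[d] += eps
        down[d] -= eps
        result.append((user_objective(up, V, Y_i, lam) - user_objective(down, V, Y_i, lam)) / (2 * eps))
    return result


# Hypothetical user factor, item factors, labels, and regularization weight.
U_i = [0.3, -0.2]
V = [[0.5, 0.1], [-0.4, 0.8], [0.2, -0.6], [0.9, 0.3]]
Y_i = [1, 0, 1, 1]
lam = 0.01
analytic = grad_user(U_i, V, Y_i, lam)
numeric = numeric_grad(U_i, V, Y_i, lam)
print(analytic)
print(numeric)
```

Because $F$ is maximized, a gradient ascent step would move $U_i$ in the direction of this gradient; the step size and stopping rule are left open here.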
14. The less is more binary ranking: ClimF
Introduction