ACM RecSys 2013, Hong Kong

FABIO AIOLLI
UNIVERSITY OF PADOVA (ITALY)

Efficient Top-N Recommendation for
Very Large Scale Binary Rated Datasets

16/10/2013

F. Aiolli – Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets

Abstract

 Very large datasets: n users, m items, both on the order of millions
 Top-N type of prediction
 Implicit feedback: only information about what people have already rated

Efficiency:
 Efficient MB-like scoring function tailored to implicit feedback that avoids
the computation of the whole m × m (or n × n) similarity matrix
Effectiveness:
 Asymmetric similarity matrix
 Asymmetric scoring function
 Calibration
 Ranking Aggregation
The MSD Challenge @kaggle
 Very large scale music recommendation challenge
 Predict which songs a user will listen to, given the listening history of the user
 Based on the MSD (Million Song Dataset), a freely available collection of
meta-data for one million contemporary songs
 The challenge was actually based on a subset (the Taste Profile Subset) of more
than 48 million (user, song) rating pairs. The data consists of about 1.2 million
users and covers more than 380,000 songs
 The User-Song matrix is very sparse (density 0.01%)
 153 teams participated
 We had the full listening history for about 1M users, plus half of the listening
history for 110K users, for which we were required to predict the missing half
Why not use Matrix Factorization?
MF is recognized as a state-of-the-art technique in CF, but…
 Model building is very expensive
 The regression setting does not match the implicit setting exactly
 Gradient descent issues: local minima and slow convergence rate
 Too many parameters to optimize ((n + m) × k), a very sparse matrix, and
no prior knowledge used → overfitting
 …

MF-based solutions provided by the organizers at the beginning of the
challenge, and MF-based entries by other teams, showed really
poor results on this task.
Memory-based Models
In standard memory-based NN models the entire
matrix R is used to generate a prediction

pros
 Prediction is performed on-the-fly and no model has to be constructed
 Independent predictions, can be easily parallelized!

cons
 Only a few external parameters (lack of flexibility)
 Needs the complete computation of similarities for every
user-user (or item-item) pair in order to compute the NNs
Memory-based Collaborative Filtering
We first define a modified version of the standard MB model, tailored to
CF with implicit feedback in that it uses rated information only
 User based:
 Item based:

[Scoring formula images not recovered.]

q represents a locality parameter whose role is similar to taking the NNs: a
bigger q corresponds to fewer nearest neighbors being considered.
Note that, like in the MF case, the scores can be written in a factorized form.

For each user, the N top-score items are recommended.
 User based: only U similarity computations (U = avg # of users per item)
 Item based: only I similarity computations (I = avg # of items per user)
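The locality effect of q can be sketched with dense NumPy arrays. This is a sketch of the idea only: the names are mine, and the real computation works on sparse data and, as stated above, avoids materializing the full similarity matrix.

```python
import numpy as np

def item_based_scores(R, S, q=3):
    """Item-based memory-based scoring with a locality exponent.

    R : (n_users, n_items) binary rating matrix
    S : (n_items, n_items) item-item similarity matrix
    q : raising similarities to the power q shrinks small values
        toward zero, so a bigger q acts like keeping fewer NNs.
    """
    return R @ (S ** q)

def top_n(scores, rated, n=5):
    """Recommend the n highest-scoring items not already rated."""
    scores = np.where(rated.astype(bool), -np.inf, scores)
    return np.argsort(-scores)[:, :n]
```

For example, with one user who rated only item 0, the unrated items are ranked by their (exponentiated) similarity to item 0; increasing q sharpens the gap between close and distant neighbors without changing which items are excluded.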
Asymmetric Cosine based CF
Given two variables and their (binary) vector representations, we
define the asymmetric cosine similarity:

[Similarity formula image not recovered.]

AsymC has a probabilistic interpretation as an asymmetric
product of conditionals
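Since the formula image was not recovered, here is a sketch assuming the usual binary-vector form of the asymmetric cosine, |X ∩ Y| / (|X|^α · |Y|^(1−α)), which equals the asymmetric product of conditionals P(y|x)^α · P(x|y)^(1−α) and reduces to the standard cosine at α = 1/2:

```python
import numpy as np

def asym_cosine(x, y, alpha=0.5):
    """Asymmetric cosine between two binary vectors.

    alpha skews the normalization between the two supports:
    alpha = 1/2 recovers the ordinary cosine similarity, while
    alpha near 1 (0) emphasizes the conditional P(y|x) (P(x|y)).
    """
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    nx, ny = x.sum(), y.sum()
    if nx == 0 or ny == 0:
        return 0.0
    inter = np.logical_and(x, y).sum()              # |X ∩ Y|
    return float(inter / (nx ** alpha * ny ** (1.0 - alpha)))
```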

Locality effect: Item-based

IS (α=0)   mAP@500      IS (α=1/2)   mAP@500
q=1        0.12224      q=1          0.16439
q=2        0.16581      q=2          0.16214
q=3        0.17144      q=3          0.15587
q=4        0.17004      q=4          0.15021
q=5        0.16830      q=5          0.14621
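The tables report mAP@500, the challenge metric. Per user, average precision at k can be sketched as below (the exact challenge normalization may differ slightly; mAP@500 averages this over all evaluation users):

```python
def ap_at_k(recommended, relevant, k=500):
    """Average precision at k for one user: mean of the precisions
    at each rank where a held-out (relevant) item is retrieved."""
    relevant = set(relevant)
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    denom = min(len(relevant), k)
    return score / denom if denom else 0.0
```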
Locality effect: User-based

US (α=0)   mAP@500      US (α=1/2)   mAP@500
q=3        0.12479      q=3          0.12532
q=4        0.13289      q=4          0.13779
q=5        0.13400      q=5          0.14355
q=6        0.13187      q=6          0.14487
q=7        0.12878      q=7          0.14352
AsymC similarity effect

[Figure: mAP@500 as α varies, item-based (left) and user-based (right).]
Asymmetric Scoring function
 User based:
 Item based:

[Scoring formula images not recovered.]

Unfortunately, the norm-of-the-weights term is inefficient to compute exactly;
whenever the number of items is very large, we suggest estimating it from the data.
AsymC scoring effect on user-based recommendation

US, α=0.5   mAP@500   Best β   mAP@500
q=1         0.07679   0.3      0.14890
q=2         0.10436   0.5      0.15801
q=3         0.12532   0.6      0.16132
q=4         0.13779   0.7      0.16229
q=5         0.14355   0.8      0.16152
q=6         0.14487   0.9      0.15975
q=7         0.14352   0.9      0.15658
Calibration
Analysis of the predicted scores for items that are actually rated in the
training set: different items can be mapped onto different scales
Calibration
The scores are calibrated by a simple piece-wise linear function

[Figure: piece-wise linear calibration curve, y-axis ticks 0.5 and 1.0; legend: type, parameters.]

       uncalibrated       calibrated
IS     mAP@500=0.1773     mAP@500=0.1811*
US     mAP@500=0.1623     mAP@500=0.1649
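A piece-wise linear calibration of this kind can be sketched with np.interp; the knot positions below are purely illustrative (the slides do not give the ones actually used):

```python
import numpy as np

def calibrate(scores, knots_x, knots_y):
    """Map raw scores through a piece-wise linear function defined
    by control points (knots), putting scores of different items or
    strategies on a comparable scale."""
    return np.interp(scores, knots_x, knots_y)

# Illustrative knots only: stretch low raw scores, compress high ones.
raw = [0.10, 0.25, 0.90]
cal = calibrate(raw, knots_x=[0.0, 0.5, 1.0], knots_y=[0.0, 0.8, 1.0])
```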
Ranking Aggregation
 Assuming that
 the strategies are precision oriented, meaning that each one tends to make
good recommendations for the songs on which it is more confident
 different strategies are diverse and can recommend different songs

… then aggregating different rankings can improve the results

 Aggregating item-based and user-based strategies
 Stochastic aggregation: recommended items are chosen stochastically
from the lists
 Linear aggregation: recommended items are chosen based on a
combination of scores on the different lists
 Borda aggregation: recommended items are chosen based on a variant of
the Borda Count algorithm

 More details in the paper!
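Of the three, linear aggregation is the simplest to sketch (a hypothetical helper, not the challenge implementation; the table that follows sweeps exactly this weight between the item-based and user-based lists):

```python
def aggregate_linear(scores_a, scores_b, w=0.8, topn=500):
    """Blend two scoring strategies (e.g. item-based and user-based)
    with weight w on the first, then recommend the top-N items.
    Items missing from one strategy get a score of 0 there."""
    items = set(scores_a) | set(scores_b)
    blended = {i: w * scores_a.get(i, 0.0) + (1 - w) * scores_b.get(i, 0.0)
               for i in items}
    return sorted(blended, key=blended.get, reverse=True)[:topn]
```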
Ranking aggregation results

IS, α=0.15, q=3   US, α=0.3, q=5   mAP@500
0.0               1.0              0.14098
0.1               0.9              0.14813
0.2               0.8              0.15559
0.3               0.7              0.16248
0.4               0.6              0.16859
0.5               0.5              0.17362
0.6               0.4              0.17684
0.7               0.3              0.17870
0.8               0.2              0.17896
0.9               0.1              0.17813
1.0               0.0              0.17732
Final MSD Challenge Results

RANK   TEAM NAME             mAP@500
1      Aio                   0.17910
2      Learner               0.17196
3      Nohair                0.15892
4      Team Ubuntu           0.15695
5      TheMiner              0.15639
…      …                     …
135    Songs by Popularity   0.02079
…
151    Random                0.00002
Final discussion
 The best-ranked teams all used approaches based on CF
 The 2nd-ranked team used an approach similar to ours to create a set of features
to use in a learning-to-rank algorithm
 The 5th-ranked team used the Absorption algorithm by YouTube (graph based,
random walks) to get their best public score
 Based on my own and other participants' opinions and experiments
 Metadata did not help (the implicit information contained in user histories is
far richer than the explicit information in the metadata)
 Matrix factorization did not help either
 Additional experiments on the MovieLens1M dataset can be found in the paper
 In the future we want to study further how to exploit rich metadata
information, especially in a cold-start setting
Thank you! Questions are welcome

The MSD competition (info and data):
http://www.kaggle.com/c/msdchallenge
Python code I used for the challenge can be found at
http://www.math.unipd.it/~aiolli/CODE/MSD/
