Incremental Item-based Collaborative Filtering

João Marques da Silva
Palco Workshop - May 13, 2009
Item Similarity

                          Clã       Xutos      Gift    DaWeasel
            Ana            1          1         0           0
            Miguel         1          1         1           0
            Ivo            0          1         0           1
            Paula          0          0         1           0
            Joana          1          0         0           0

Take columns as vectors:

  $v_{\text{Clã}} = (1, 1, 0, 0, 1)$  and  $v_{\text{Gift}} = (0, 1, 0, 1, 0)$

Similarity between Clã and Gift (cosine measure):

  $\mathrm{sim}(\text{Clã}, \text{Gift}) = \cos(v_{\text{Clã}}, v_{\text{Gift}}) = \dfrac{v_{\text{Clã}} \cdot v_{\text{Gift}}}{\|v_{\text{Clã}}\| \, \|v_{\text{Gift}}\|} = \dfrac{1}{\sqrt{3}\,\sqrt{2}} \approx 0.41$

Similarity Matrix
S matrix: M×M, with M = number of items

             Clã     Xutos    Gift    DaWeasel
  Clã         1
  Xutos      ...       1
  Gift       0.41     ...      1
  DaWeasel    0       ...      0         1


How do we keep S up-to-date?

•  Rebuild S at each new session:
   O(m²n) for m items and n users.
•  Incrementally update S with session data:
   O(km) for k items in the session.
Algorithm
Cosine measure for binary ratings:

  $\cos(i, j) = \dfrac{\#(I \cap J)}{\sqrt{\#I \times \#J}}$   where $I$, $J$ are the sets of users that rated items $i$, $j$

A cache matrix Int stores #(I ∩ J) for all item pairs (i, j):

  $Int_{i,j} = \#(I \cap J)$
  $Int_{i,i} = \#I$

For each new session:
   –  Increment $Int_{i,j}$ by 1 for each item pair (i, j) in the session.
   –  For each item $i$ in the session, update the corresponding row/column of S:

  $S_{i,j} = \dfrac{Int_{i,j}}{\sqrt{Int_{i,i}} \, \sqrt{Int_{j,j}}}$   for all items $j$
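A sketch of this update in R, under assumptions not stated in the slides: Int is a dense M×M count matrix, session is a vector of item indices, and each item occurs at most once per session (names are illustrative, not the original Palco code):

  # Absorb one session: bump co-rating counts, then refresh the
  # affected rows/columns of the similarity matrix S.
  update_similarity <- function(Int, S, session) {
    # every pair of session items, and each diagonal entry, gains one count
    Int[session, session] <- Int[session, session] + 1
    d <- sqrt(diag(Int))                 # sqrt(#I) for every item
    d[d == 0] <- 1                       # items never rated: avoid 0/0
    for (i in session) {
      S[i, ] <- Int[i, ] / (d[i] * d)    # S[i,j] = Int[i,j] / (sqrt(Int[i,i]) * sqrt(Int[j,j]))
      S[, i] <- S[i, ]                   # keep S symmetric
    }
    list(Int = Int, S = S)
  }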
Forgetting

•  Usage and content change!
   –  News content quickly becomes obsolete
   –  Music/Movies/Books: popularity is often volatile

•  How can CF adapt to change?
   –  Forget older data
   –  Two methods: sliding windows and fading factors
Forgetting: Sliding Windows

[Figure: session weight vs. session index. A window of fixed length covers the most recent sessions; data inside the window (up to the current session) keeps full weight, older sessions are dropped.]

Good for the non-incremental approach:
rebuild S with the data in the window (a sketch follows).
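A sketch of the windowed rebuild in R, assuming sessions is a time-ordered list of item-index vectors and M is the number of items (a hypothetical layout, not the original code):

  # Rebuild S from the last w sessions only
  rebuild_windowed <- function(sessions, M, w) {
    recent <- tail(sessions, w)
    R <- matrix(0, nrow = length(recent), ncol = M)   # session-by-item matrix
    for (s in seq_along(recent)) R[s, recent[[s]]] <- 1
    Int <- crossprod(R)                               # Int[i,j] = #(I ∩ J) within the window
    d <- sqrt(diag(Int))
    d[d == 0] <- 1                                    # items unseen in the window
    Int / outer(d, d)                                 # cosine similarity matrix S
  }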
Forgetting: Fading Factors

[Figure: session weight vs. session index. Session weight decays gradually with age; the current session carries full weight, older sessions count progressively less.]

Good for the incremental approach. Before updating S:

    S = αS,  0 < α < 1

α = 1 corresponds to no forgetting (the non-fading case). A sketch follows.
Implementation


•  Implementation in R
   –  Code available from previous work (C. Miranda)
   –  Algorithms adapted to use the forgetting mechanisms
   –  Improvements: sparse matrix handling (see the sketch below)
   –  Limitations of R: speed
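One possible shape of the sparse-matrix improvement, using R's Matrix package (illustrative; not the original code):

  library(Matrix)

  M   <- 1285                         # number of items, as in PALCO
  Int <- sparseMatrix(i = integer(0), j = integer(0), x = numeric(0),
                      dims = c(M, M)) # co-rating counts, initially all zero
  session <- c(3, 41, 97)             # hypothetical item indices from one session
  Int[session, session] <- Int[session, session] + 1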




Experiments

•  Aims
   –  Forgetting: is it useful?
   –  Sliding windows vs. fading factors
   –  Is item-based better than user-based?

•  Evaluation method
   –  All-but-one protocol (training, test and hidden sets)
   –  Artificial disturbances in the datasets
   –  Accuracy: precision/recall for binary ratings (a sketch of the protocol follows)
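A sketch of the protocol in R, under assumptions not in the slides: one item per test session is hidden, candidate items are scored by their summed similarity to the observed items, and a hit means the hidden item appears among the top-N recommendations:

  # All-but-one evaluation with precision/recall at N
  evaluate_all_but_one <- function(S, test_sessions, N = 5) {
    hits <- 0
    for (session in test_sessions) {
      if (length(session) < 2) next
      hidden   <- session[length(session)]            # hold one item out
      observed <- session[-length(session)]
      scores <- colSums(S[observed, , drop = FALSE])  # similarity to observed items
      scores[observed] <- -Inf                        # don't re-recommend seen items
      topN <- order(scores, decreasing = TRUE)[1:N]
      hits <- hits + (hidden %in% topN)
    }
    # with one hidden item per session, recall@N equals the hit rate
    c(recall = hits / length(test_sessions),
      precision = hits / (N * length(test_sessions)))
  }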
Experiments: datasets

2 sequential datasets:

  Dataset    Origin            # sessions    # items
  PALCO      Palco Principal          725       1285
  ART        Artificial              1500          4



PALCO: listened tracks on Palco Principal
ART: artificial dataset with an abrupt change, {a,b,c} → {a,b,d} at session 500 (a possible generator is sketched below)
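One plausible generator for such a stream in R (an illustrative assumption: every session contains the full item set of the current regime):

  # 1500 sessions over 4 items, with an abrupt change at session 500
  make_art <- function(n = 1500, change_at = 500) {
    lapply(seq_len(n), function(s)
      if (s < change_at) c("a", "b", "c") else c("a", "b", "d"))
  }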



Results (so far)
•  Matrix update time
   –  Update time < rebuild time
   –  Item-based is better when #users > #items
   –  PALCO: user-based performs better
   –  Non-incremental is good with small windows
•  Recommendation time
   –  Item-based is faster
•  Recovery from drifts
   –  ART: α < 1 recovers faster than α = 1 (as expected)
   –  PALCO: α = 1 still better, even with 90% drift!
Accuracy IBFF w/ ART

[Figure: accuracy of IBFF (item-based, fading factors) on the ART dataset.]
Accuracy UBSW, UBFF w/ PALCO

[Figure: accuracy of UBSW and UBFF (user-based, sliding windows / fading factors) on the PALCO dataset.]
Issues

•  Forgetting
   –  Not good for the PALCO dataset?
   –  Good on the ART dataset, but ART is not realistic
   –  Other datasets (e.g. news)?
•  Long-term effects → larger-scale experiments
   –  Better hardware: on the way
   –  Other implementations (Java, C, SQL…)
•  Palco Principal
   –  More items than users!
   –  Item-based is possibly better for artist recommendations.
Thank you!





More Related Content

Similar to Incremental Item-based Collaborative Filtering:

  H2O.ai's Distributed Deep Learning by Arno Candel 04/03/14 (Sri Ambati)
  H2O Open Source Deep Learning, Arno Candel 03-20-14 (Sri Ambati)
  San Francisco Hadoop User Group Meetup Deep Learning (Sri Ambati)
  H2O Distributed Deep Learning by Arno Candel 071614 (Sri Ambati)
  [PR12] PR-036 Learning to Remember Rare Events (Taegyun Jeon)
  SVD and the Netflix Dataset (Ben Mabey)
  H2O Deep Learning at Next.ML (Sri Ambati)
  Fast Distributed Online Classification (Prasad Chalasani)
  The Back Propagation Learning Algorithm (ESCOM)
  H2ODeepLearningThroughExamples021215 (Sri Ambati)
  Yulia Honcharenko "Application of metric learning for logo recognition" (Fwdays)
  Single shot multibox detectors (지현 백)
  Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams (Albert Bifet)
  Large data with Scikit-learn - Boston Data Mining Meetup - Alex Perrier (Alexis Perrier)
  Efficient Data Stream Classification via Probabilistic Adaptive Windows (Albert Bifet)
  Efficient Similarity Computation for Collaborative Filtering in Dynamic Envir... (Olivier Jeunen)
  A scalable collaborative filtering framework based on co clustering (AllenWu)
  Making BIG DATA smaller (Tony Tran)
  Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt (Eugene Yan Ziyou)
