Nearest Bi-Clusters Collaborative Filtering
(’06 paper by Symeonidis, Nanopoulos, Papadopoulos)
Presenter: Sarp Coskun
Outline
• What is CF?
• What does NBCF provide that is unique?
• How does NBCF work, with an example?
• My implementation demo
What is Collaborative Filtering (CF)?
• CF is a successful recommendation technique.
• CF helps a customer find what s/he is interested in.
Related Work on CF
• User-based (UB) algorithm: based on user similarities
• Item-based (IB) algorithm: based on item similarities
• K-means clustering algorithm: builds a model over user ratings
Challenges with CF Algorithms
• Accuracy of the recommendations: users should be happy with the suggestions.
• Scalability: algorithms face performance problems once the data size increases.
User-Based & Item-Based Approaches
UB and IB are both one-sided approaches: they ignore the duality between users and items.
Problems of UB and IB
• UB and IB are not scalable to very large datasets.
• UB and IB cannot detect partial matching; they just find the least dissimilar users/items.
• Users can even have negative similarity under UB and IB, so partial matches are missed.
Problems of the K-Means Algorithm
K-means and hierarchical clustering algorithms again ignore the duality of the data (a one-sided approach).
What is Different in NBCF?
• Biclustering discloses the duality between users and items by grouping them in both dimensions simultaneously.
• A nearest-biclusters CF algorithm uses a new similarity measure to achieve partial matching of users’ preferences.
Steps in NBCF
• Step 1: the data preprocessing step (optional)
• Step 2: the biclustering process
• Step 3: the nearest-biclusters algorithm
Example
[Tables: training data set and test data set of user–item ratings]
Step 1
• Threshold the training dataset at Pτ = 2 (ratings above Pτ count as positive).
• Binary discretization of the training set; a minimal sketch follows.
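A minimal sketch of the discretization step, assuming a small NumPy ratings matrix. The matrix values and variable names are illustrative, not the paper's actual example data.

```python
import numpy as np

P_TAU = 2  # positive-rating threshold from the slides

# Hypothetical users-by-items rating matrix; 0 marks a missing rating.
ratings = np.array([
    [4, 0, 5, 2, 3],
    [0, 1, 4, 5, 0],
    [5, 4, 0, 0, 1],
])

# Binary discretization: 1 where the rating exceeds P_TAU, else 0.
binary = (ratings > P_TAU).astype(int)
print(binary)
```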
Step 2 (Bimax Biclustering)
• Four biclusters are found.
• Biclusters may overlap.
• The degree of overlap can be tuned.
• Min. number of users: 2
• Min. number of items: 2
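Bimax itself is an efficient divide-and-conquer search for inclusion-maximal all-ones submatrices; the brute-force sketch below only illustrates what it computes on toy data. This is not the Bimax algorithm, and the function name is mine.

```python
from itertools import combinations

def naive_biclusters(binary, min_users=2, min_items=2):
    """Enumerate inclusion-maximal all-ones biclusters by brute force.

    `binary` is the discretized users-by-items matrix. Exponential in
    the number of items, so suitable only for tiny examples.
    """
    n_users, n_items = binary.shape
    found = []
    for size in range(min_items, n_items + 1):
        for items in combinations(range(n_items), size):
            # All users with a positive entry for every chosen item.
            users = tuple(u for u in range(n_users)
                          if all(binary[u, i] for i in items))
            if len(users) >= min_users:
                found.append((users, items))
    # Keep only biclusters not strictly contained in another one.
    return [b for b in found
            if not any(b != o and set(b[0]) <= set(o[0])
                       and set(b[1]) <= set(o[1]) for o in found)]
```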
Evaluating the Recommendations
• Precision is the ratio of R to N (R is the number of relevant items among the top-N list).
• Recall is the ratio of R to the total number of relevant items for the test user (all items rated higher than Pτ by him).
• F1 = (2 · precision · recall) / (precision + recall)
• [Plot: F1 against the number of users and items per bicluster]
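These are the standard top-N evaluation metrics; a small helper matching the slide's definitions (the function and argument names are my own):

```python
def precision_recall_f1(recommended, relevant):
    """Top-N list quality.

    `recommended`: the top-N recommendation list.
    `relevant`: items the test user rated above P_tau.
    R = hits, the relevant items that appear in the top-N list.
    """
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```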
Step 3 – Part 1
To find the k-nearest biclusters of a test user:
• Divide the number of items they have in common by the sum of the items they have in common and the number of items in which they differ.
• Similarity values range in [0, 1].
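Read literally, that ratio is the Jaccard coefficient on item sets. A sketch under that reading; if the paper intends "differ" to count only the bicluster's items the user lacks, drop the user-only side of the symmetric difference.

```python
def sim(user_items, bicluster_items):
    """Similarity of a test user and a bicluster, per the slide:
    common items / (common items + differing items)."""
    u, b = set(user_items), set(bicluster_items)
    common = len(u & b)
    differ = len(u ^ b)  # symmetric difference: my reading of 'differ'
    return common / (common + differ) if (common + differ) else 0.0
```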
Step 3 – Part 2
To generate the top-N recommendation list:
• The Weighted Frequency (WF) of an item i in a bicluster b is the product of |Ub| (the number of users in b) and the similarity measure sim(u, b).
• This weights the contribution of each bicluster by its size, in addition to its similarity with the test user.
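Putting the two parts together, a minimal sketch that ranks unseen items by WF summed over the k nearest biclusters. It reuses the `sim` helper above; all names are mine, and this is my reading of the slides rather than the paper's reference implementation.

```python
def recommend_top_n(user_items, biclusters, k=2, n=2):
    """Top-N recommendation from the k nearest biclusters.

    `biclusters` is a list of (users, items) pairs. Items the test
    user has already rated positively are skipped.
    """
    nearest = sorted(biclusters,
                     key=lambda b: sim(user_items, b[1]),
                     reverse=True)[:k]
    rated = set(user_items)
    scores = {}
    for users, items in nearest:
        s = sim(user_items, items)
        for i in items:
            if i not in rated:
                # WF(i, b) = |Ub| * sim(u, b), accumulated across biclusters.
                scores[i] = scores.get(i, 0.0) + len(users) * s
    return sorted(scores, key=scores.get, reverse=True)[:n]
```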
Results of the Example
• Among the four biclusters, the k = 2 nearest are used.
• U9 has rated only two items positively (I1, I3). Its similarity to the four biclusters is (0.5, 0.5, 0, 0), respectively, so the nearest neighbors come from the first two biclusters.
• Recommended items: I7 and I5.
Netflix Contest
• Any algorithm that predicts 10% better than Cinematch wins $1M.
• AT&T Labs Research: 5% in 6 weeks; 8.6% after the first year; 9.4% after the second year; 10.06% in the third year (after adding 2 new teams), Sept 2009.
• How? By taking the average of 800 different algorithms (150 pages).
Solution that I Liked
Train on the dataset with the different available algorithms and pick the best one!
How Does the System Work?
[Diagram: candidate methods (e.g., global-average, user-average) are trained on the training data and scored with RMSE (root mean square error) and MAE (mean absolute error); the best method is then applied to the test data.]
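RMSE and MAE are the standard error metrics named on the slide; minimal implementations over paired prediction/actual lists (helper names are mine, and non-empty inputs are assumed):

```python
import math

def rmse(preds, actuals):
    """Root mean square error over paired predictions and actuals."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(preds, actuals))
                     / len(preds))

def mae(preds, actuals):
    """Mean absolute error over paired predictions and actuals."""
    return sum(abs(p - a) for p, a in zip(preds, actuals)) / len(preds)
```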
