2. Outline
• What is CF?
• What does NBCF uniquely provide?
• How does NBCF work, with an example?
• My implementation demo
3. What is Collaborative Filtering (CF)?
• CF is a successful recommendation technique.
• CF helps a customer find what he or she is interested in.
4. Related Work on CF
• User-based algorithm: based on user similarities
• Item-based algorithm: based on item similarities
• K-means clustering algorithm: builds a model on user ratings
5. Challenges with CF Algorithms
• Accuracy of the recommendations: users should be happy with the suggestions.
• Scalability: algorithms face performance problems as the data size increases.
6. User-Based &amp; Item-Based Approaches
• UB and IB are both one-sided approaches (they ignore the duality between users and items).
7. Problems of UB and IB
• UB and IB are not scalable for very large datasets.
• UB and IB cannot detect partial matching (they just find the least dissimilar users/items).
• Users can have negative similarity in UB and IB, so partial matching is missed.
8. Problems of the K-Means Algorithm
• K-means and hierarchical clustering algorithms again ignore the duality of the data (one-sided approaches).
9. What is Different in NBCF?
• Biclustering discloses the duality between users and items by grouping them in both dimensions simultaneously.
• A nearest-biclusters CF algorithm uses a new similarity measure to achieve partial matching of users' preferences.
10. Steps in NBCF
• Step 1: The data preprocessing step (optional)
• Step 2: The biclustering process
• Step 3: The nearest-biclusters algorithm
12. Step 1
• Take the training dataset with positive-rating threshold Pτ = 2.
• Binary discretization of the training set: ratings higher than Pτ become 1, everything else 0.
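The discretization step above can be sketched as follows (a minimal sketch; the matrix values and variable names are illustrative, not from the slides):

```python
import numpy as np

# Toy user-item rating matrix (0 = unrated); values are illustrative.
ratings = np.array([
    [4, 0, 5, 1],
    [3, 3, 0, 4],
    [1, 5, 2, 0],
])

P_TAU = 2  # positive-rating threshold from the slides: ratings > P_TAU count as "liked"

# Binary discretization: 1 where the rating exceeds P_TAU, else 0.
binary = (ratings > P_TAU).astype(int)
print(binary)
```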
13. Step 2 (Bimax Clustering)
• Four biclusters found.
• There is overlapping between biclusters; the amount of overlap can be tuned.
• Minimum number of users: 2; minimum number of items: 2
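Bimax itself uses a divide-and-conquer strategy; purely as an illustration of what "biclusters of a binary matrix with at least 2 users and 2 items" means, a brute-force sketch (all names are mine, and this only scales to toy inputs) could look like:

```python
from itertools import combinations

def brute_force_biclusters(matrix, min_users=2, min_items=2):
    """Enumerate inclusion-maximal all-ones biclusters of a binary matrix (toy sizes only)."""
    n_items = len(matrix[0])
    found = []
    for k in range(min_items, n_items + 1):
        for cols in combinations(range(n_items), k):
            # All users who rated every item in this column set positively.
            rows = tuple(u for u, r in enumerate(matrix) if all(r[c] for c in cols))
            if len(rows) >= min_users:
                found.append((rows, cols))
    # Keep only biclusters not strictly contained in a larger one.
    maximal = [b for b in found
               if not any((set(b[0]) <= set(o[0]) and set(b[1]) < set(o[1])) or
                          (set(b[0]) < set(o[0]) and set(b[1]) <= set(o[1]))
                          for o in found)]
    return maximal

bics = brute_force_biclusters([[1, 1, 0],
                               [1, 1, 0],
                               [0, 1, 1],
                               [0, 1, 1]])
```

Here the two maximal 2x2 biclusters are users {0, 1} on items {0, 1} and users {2, 3} on items {1, 2}.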
14. Precision is the ratio of R to N. Recall is the ratio of R to the total number of relevant items for the test user (all items rated higher than Pτ by that user).
F1 = (2 · recall · precision) / (recall + precision)
[Plot: F1 for varying #users &amp; #items per bicluster]
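These metrics can be computed directly from the top-N list (a minimal sketch; the function and variable names are mine):

```python
def precision_recall_f1(recommended, relevant):
    """Precision = R/N, recall = R/|relevant|, F1 = harmonic mean of the two."""
    hits = len(set(recommended) & set(relevant))   # R: recommended items that are relevant
    precision = hits / len(recommended)            # N: length of the top-N list
    recall = hits / len(relevant)                  # relevant: items rated higher than P_tau
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * recall * precision / (recall + precision)
    return precision, recall, f1

# One hit in a top-2 list, out of two relevant items.
p, r, f = precision_recall_f1(["I7", "I5"], ["I7", "I2"])
```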
15. Step 3 – Part 1
• To find the k nearest biclusters of a test user: divide the number of items they have in common by the sum of the items they have in common and the number of items on which they differ.
• Similarity values range in [0, 1].
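Read with "differ" as the symmetric difference, this is the Jaccard coefficient between the user's positively rated items and the bicluster's items. A sketch under that assumption (function and argument names are mine):

```python
def sim(user_items, bicluster_items):
    """|common| / (|common| + |differ|) — the Jaccard coefficient, assuming
    'differ' means the symmetric difference of the two item sets."""
    user_items, bicluster_items = set(user_items), set(bicluster_items)
    common = len(user_items & bicluster_items)
    differ = len(user_items ^ bicluster_items)
    if common + differ == 0:
        return 0.0
    return common / (common + differ)
```

For example, a user with items {I1, I3} and a bicluster with items {I1, I3, I5, I7} share 2 items and differ on 2, giving a similarity of 0.5.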
16. Step 3 – Part 2
• To generate the top-N recommendation list: the Weighted Frequency (WF) of an item i in a bicluster b is the product of |Ub| and the similarity measure sim(u, b).
• This weights the contribution of each bicluster by its size, in addition to its similarity with the test user.
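A sketch of the top-N step under these definitions (the data layout and names are my assumptions): each nearest bicluster b contributes WF(i, b) = |Ub| · sim(u, b) to every unseen item i it contains, and items are ranked by accumulated weight.

```python
from collections import defaultdict

def top_n(user_items, nearest_biclusters, sims, n=2):
    """nearest_biclusters: list of (user_set, item_set) pairs;
    sims: precomputed sim(u, b) for each of those biclusters."""
    scores = defaultdict(float)
    for (users_b, items_b), s in zip(nearest_biclusters, sims):
        for item in items_b:
            if item not in user_items:             # recommend only unseen items
                scores[item] += len(users_b) * s   # WF(i, b) = |Ub| * sim(u, b)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n]

# Illustrative data: a user who liked I1 and I3, and two nearest biclusters.
user = {"I1", "I3"}
bics = [({"U1", "U2", "U3"}, {"I1", "I3", "I5", "I7"}),
        ({"U4", "U5"}, {"I1", "I3", "I5"})]
recs = top_n(user, bics, sims=[0.5, 0.5])
```

With this toy data, I5 accumulates 3·0.5 + 2·0.5 = 2.5 and I7 accumulates 1.5, so I5 ranks first.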
17. Results of the Example
• All four biclusters, with k = 2 nearest biclusters.
• U9 has rated only two items positively (I1, I3).
• Its similarity with each of the four biclusters is (0.5, 0.5, 0, 0), respectively, so the nearest neighbors come from the first two biclusters.
• Recommended items: I7 and I5.
18. Netflix Contest
• Any algorithm that predicts 10% better than Cinematch wins $1M.
• AT&amp;T Labs researchers: 5% in 6 weeks; 8.6% after the first year; 9.4% after the second year; 10.06% in the third year (after adding 2 new teams), September 2009.
• How? By taking the average of 800 different algorithms (a 150-page description).
19. Solution that I Liked
• Train on the dataset with the different available algorithms and pick the best one!