Collaborative Filtering Survey

  1. Nearest bi-clusters collaborative filtering (’06 paper by Symeonidis, Nanopoulos, Papadopoulos)
     Presenter: Sarp Coskun
  2. Outline
     What is CF?
     What does NBCF uniquely provide?
     How does NBCF work? (with an example)
     My implementation demo
  3. What is Collaborative Filtering (CF)?
     CF is a successful recommendation technique.
     CF helps a customer find what s/he is interested in.
  4. Related Works on CF
     User-based (UB) algorithm: based on user similarities
     Item-based (IB) algorithm: based on item similarities
     K-means clustering algorithm: builds a model of user ratings
  5. Challenges with CF Algorithms
     Accuracy of the recommendation: users should be happy with the suggestions
     Scalability: algorithms face performance problems as the data size increases
  6. User-Based & Item-Based Approaches
     UB and IB are both one-sided approaches: they ignore the duality between users and items.
  7. Problems of UB and IB
     UB and IB are not scalable for very large datasets.
     UB and IB cannot detect partial matching; they just find the least dissimilar users/items.
     Users may have negative similarity in UB and IB, so partial matching is missed.
  8. Problems of the K-Means Algorithm
     K-means and hierarchical clustering algorithms again ignore the duality of the data (one-sided approaches).
  9. What is Different in NBCF?
     Biclustering discloses the duality between users and items by grouping them in both dimensions simultaneously.
     A nearest-biclusters CF algorithm uses a new similarity measure to achieve partial matching of users’ preferences.
  10. Steps in NBCF
      Step 1: The data preprocessing step (optional)
      Step 2: The biclustering process
      Step 3: The nearest-biclusters algorithm
  11. Example
      Training Data Set
      Test Data Set
  12. Step 1
      Training dataset with Pτ > 2
      Binary discretization of the Training Set
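A minimal sketch of this discretization step, assuming a small NumPy ratings matrix with 0 for missing ratings; the matrix values below are illustrative, not the paper's example:

```python
import numpy as np

# Hypothetical user-item rating matrix; 0 means the item is unrated.
ratings = np.array([
    [4, 0, 5, 0, 3],
    [5, 1, 4, 0, 0],
    [0, 2, 0, 4, 5],
])

P_TAU = 2  # positive-rating threshold: ratings above it count as positive

# Binary discretization: 1 where the rating exceeds P_TAU, else 0.
binary = (ratings > P_TAU).astype(int)
print(binary)
```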
  13. Step 2 (Bimax Clustering)
      Four biclusters found.
      Overlapping between biclusters is allowed.
      The amount of overlap is well-tuned.
      Min. number of users: 2
      Min. number of items: 2
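Bimax itself is a divide-and-conquer algorithm; as a stand-in for intuition only, here is a naive brute-force enumeration of inclusion-maximal all-ones biclusters under the same size constraints. This toy sketch is exponential in the number of items and is not the Bimax algorithm used in the paper:

```python
from itertools import combinations

import numpy as np

def naive_biclusters(binary, min_users=2, min_items=2):
    """Enumerate inclusion-maximal all-ones biclusters by brute force.

    Exponential in the number of items -- a toy stand-in for Bimax,
    usable only on very small matrices.
    """
    n_items = binary.shape[1]
    candidates = []
    for size in range(min_items, n_items + 1):
        for items in combinations(range(n_items), size):
            # Users who rated every item of this column subset positively.
            users = np.where(binary[:, list(items)].all(axis=1))[0]
            if len(users) >= min_users:
                candidates.append((frozenset(users), frozenset(items)))
    # Keep only biclusters not contained in a strictly larger one.
    return [
        (u, i) for (u, i) in candidates
        if not any(u <= u2 and i <= i2 and (u, i) != (u2, i2)
                   for (u2, i2) in candidates)
    ]
```

For the toy matrix above, `naive_biclusters(binary)` finds the single bicluster of users {0, 1} on items {0, 2}: the only group of at least two users agreeing positively on at least two items.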
  14. Precision is the ratio of R to N.
      Recall is the ratio of R to the total number of relevant items for the test user (all items rated higher than Pτ by him).
      F1 = (2 · recall · precision) / (recall + precision)
      [Figure: #users & #items in bicluster]
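These evaluation measures translate directly into a small helper; here `recommended` is the top-N list and `relevant` is the set of items the test user rated above Pτ:

```python
def precision_recall_f1(recommended, relevant):
    """Top-N evaluation: R = relevant items among the N recommended."""
    r = len(set(recommended) & set(relevant))
    precision = r / len(recommended) if recommended else 0.0
    recall = r / len(relevant) if relevant else 0.0
    f1 = 2 * recall * precision / (recall + precision) if r else 0.0
    return precision, recall, f1
```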
  15. Step 3 – Part 1
      To find the k-nearest biclusters of a test user:
      Divide the number of items they have in common by the sum of the items they have in common and the number of items in which they differ. Similarity values range over [0, 1].
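Read literally, this is a Jaccard-style coefficient; a sketch under that interpretation (the paper may count the differing items asymmetrically, so treat the symmetric difference here as an assumption):

```python
def sim(user_items, bicluster_items):
    """Items in common, divided by (items in common + items that differ)."""
    user_items, bicluster_items = set(user_items), set(bicluster_items)
    common = len(user_items & bicluster_items)
    differ = len(user_items ^ bicluster_items)  # symmetric difference
    return common / (common + differ) if common + differ else 0.0
```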
  16. Step 3 – Part 2
      To generate the top-N recommendation list:
      The Weighted Frequency (WF) of an item i in a bicluster b is the product of |Ub| and the similarity measure sim(u, b).
      This weights the contribution of each bicluster by its size, in addition to its similarity with the test user.
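Putting the pieces together, a sketch of the top-N step, assuming `biclusters` is a list of `(users, items)` pairs and reusing the `sim` helper above:

```python
from collections import defaultdict

def top_n(user_items, biclusters, k=2, n=2):
    """Rank unseen items by WF(i, b) = |U_b| * sim(u, b) over the k nearest biclusters."""
    nearest = sorted(biclusters, key=lambda b: sim(user_items, b[1]), reverse=True)[:k]
    scores = defaultdict(float)
    for users, items in nearest:
        weight = len(users) * sim(user_items, items)  # WF weight of this bicluster
        for item in items:
            if item not in user_items:                # recommend only unseen items
                scores[item] += weight
    return sorted(scores, key=scores.get, reverse=True)[:n]
```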
  17. Results of the Example
      All four biclusters, with 2 nearest biclusters (k = 2)
      U9 has rated positively only two items (I1, I3).
      Similarity with each of the biclusters is (0.5, 0.5, 0, 0), respectively.
      Thus, the nearest neighbors come from the first 2 biclusters.
      Recommended items: I7 and I5.
  18. Netflix Contest
      Any algorithm that predicts 10% better than Cinematch wins $1M.
      AT&T Labs researchers:
      In 6 weeks: 5%
      First year: 8.6%
      Second year: 9.4%
      Third year (after adding 2 new teams): 10.06%, Sept 2009
      How? By taking the average of 800 different algorithms (a 150-page description).
  19. Solution that I Liked
      Train on the dataset with the different available algorithms and pick the best one!
  20. How The System Works?
      Error metrics: RMSE (root mean square error) and MAE (mean absolute error)
      Candidate algorithms:
      - global-average
      - user-average
      - item-average
      - SocialMF
      - matrix-factorization
      - biased-matrix-factorization
      - user-kNN-pearson
      - user-kNN-cosine
      - item-kNN-pearson
      - item-kNN-cosine
      - item-attribute-kNN
      - user-item-baseline
      [Diagram: data collected from the WEB into a DB; the candidate algorithms are trained on the training data, scored on the test data by RMSE/MAE, and the best method is selected.]
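The two error measures that pick the best method are one-liners; a minimal sketch over paired arrays of predictions and held-out ratings:

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean square error between predicted and actual ratings."""
    predicted, actual = np.asarray(predicted, float), np.asarray(actual, float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    predicted, actual = np.asarray(predicted, float), np.asarray(actual, float)
    return float(np.mean(np.abs(predicted - actual)))
```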
  21. Thank you …
      If you want to try the system yourself, visit ewenty.com
      References
      http://www.youtube.com/watch?v=ImpV70uLxyw