Nearest Bi-Clusters Collaborative Filtering
(’06 paper by Symeonidis, Nanopoulos, Papadopoulos)
Presenter: Sarp Coskun
Outline
• What is CF?
• What does NBCF provide that is unique?
• How does NBCF work, with an example?
• My implementation demo
What is Collaborative Filtering (CF)?
• CF is a successful recommendation technique.
• CF helps a customer find what s/he is interested in.
Related Work on CF
• User-based (UB) algorithm: based on user similarities
• Item-based (IB) algorithm: based on item similarities
• K-means clustering algorithm: builds a model over user ratings
Challenges with CF Algorithms
• Accuracy of the recommendations: users should be happy with the suggestions.
• Scalability: algorithms face performance problems once the data size increases.
User-Based & Item-Based Approaches
UB and IB are both one-sided approaches: they ignore the duality between users and items.
Problems of UB and IB
• UB and IB are not scalable to very large datasets.
• UB and IB cannot detect partial matching; they just find the least dissimilar users/items.
• Users can even have negative similarity under UB and IB, so partial matches are missed.
Problems of the K-Means Algorithm
K-means and hierarchical clustering algorithms again ignore the duality of the data (a one-sided approach).
What is Different in NBCF?
• Biclustering discloses the duality between users and items by grouping them in both dimensions simultaneously.
• A nearest-biclusters CF algorithm uses a new similarity measure to achieve partial matching of users’ preferences.
Steps in NBCF
• Step 1: the data preprocessing step (optional)
• Step 2: the biclustering process
• Step 3: the nearest-biclusters algorithm
Example
[Tables: training data set and test data set of user–item ratings]
Step 1
• Threshold the training dataset at Pτ = 2 (ratings above Pτ count as positive).
• Binary discretization of the training set; a minimal sketch follows.
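A minimal sketch of the discretization step, assuming a small NumPy ratings matrix. The matrix values and variable names are illustrative, not the paper's actual example data.

```python
import numpy as np

P_TAU = 2  # positive-rating threshold from the slides

# Hypothetical users-by-items rating matrix; 0 marks a missing rating.
ratings = np.array([
    [4, 0, 5, 2, 3],
    [0, 1, 4, 5, 0],
    [5, 4, 0, 0, 1],
])

# Binary discretization: 1 where the rating exceeds P_TAU, else 0.
binary = (ratings > P_TAU).astype(int)
print(binary)
```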
Step 2 (Bimax Biclustering)
• Four biclusters are found.
• Biclusters may overlap.
• The degree of overlap can be tuned.
• Min. number of users: 2
• Min. number of items: 2
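Bimax itself is an efficient divide-and-conquer search for inclusion-maximal all-ones submatrices; the brute-force sketch below only illustrates what it computes on toy data. This is not the Bimax algorithm, and the function name is mine.

```python
from itertools import combinations

def naive_biclusters(binary, min_users=2, min_items=2):
    """Enumerate inclusion-maximal all-ones biclusters by brute force.

    `binary` is the discretized users-by-items matrix. Exponential in
    the number of items, so suitable only for tiny examples.
    """
    n_users, n_items = binary.shape
    found = []
    for size in range(min_items, n_items + 1):
        for items in combinations(range(n_items), size):
            # All users with a positive entry for every chosen item.
            users = tuple(u for u in range(n_users)
                          if all(binary[u, i] for i in items))
            if len(users) >= min_users:
                found.append((users, items))
    # Keep only biclusters not strictly contained in another one.
    return [b for b in found
            if not any(b != o and set(b[0]) <= set(o[0])
                       and set(b[1]) <= set(o[1]) for o in found)]
```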
Evaluating the Recommendations
• Precision is the ratio of R to N (R is the number of relevant items among the top-N list).
• Recall is the ratio of R to the total number of relevant items for the test user (all items rated higher than Pτ by him).
• F1 = (2 · precision · recall) / (precision + recall)
• [Plot: F1 against the number of users and items per bicluster]
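These are the standard top-N evaluation metrics; a small helper matching the slide's definitions (the function and argument names are my own):

```python
def precision_recall_f1(recommended, relevant):
    """Top-N list quality.

    `recommended`: the top-N recommendation list.
    `relevant`: items the test user rated above P_tau.
    R = hits, the relevant items that appear in the top-N list.
    """
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```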
Step 3 – Part 1
To find the k-nearest biclusters of a test user:
• Divide the number of items they have in common by the sum of the items they have in common and the number of items in which they differ.
• Similarity values range in [0, 1].
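Read literally, that ratio is the Jaccard coefficient on item sets. A sketch under that reading; if the paper intends "differ" to count only the bicluster's items the user lacks, drop the user-only side of the symmetric difference.

```python
def sim(user_items, bicluster_items):
    """Similarity of a test user and a bicluster, per the slide:
    common items / (common items + differing items)."""
    u, b = set(user_items), set(bicluster_items)
    common = len(u & b)
    differ = len(u ^ b)  # symmetric difference: my reading of 'differ'
    return common / (common + differ) if (common + differ) else 0.0
```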
Step 3 – Part 2
To generate the top-N recommendation list:
• The Weighted Frequency (WF) of an item i in a bicluster b is the product of |Ub| (the number of users in b) and the similarity measure sim(u, b).
• This weights the contribution of each bicluster by its size, in addition to its similarity with the test user.
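Putting the two parts together, a minimal sketch that ranks unseen items by WF summed over the k nearest biclusters. It reuses the `sim` helper above; all names are mine, and this is my reading of the slides rather than the paper's reference implementation.

```python
def recommend_top_n(user_items, biclusters, k=2, n=2):
    """Top-N recommendation from the k nearest biclusters.

    `biclusters` is a list of (users, items) pairs. Items the test
    user has already rated positively are skipped.
    """
    nearest = sorted(biclusters,
                     key=lambda b: sim(user_items, b[1]),
                     reverse=True)[:k]
    rated = set(user_items)
    scores = {}
    for users, items in nearest:
        s = sim(user_items, items)
        for i in items:
            if i not in rated:
                # WF(i, b) = |Ub| * sim(u, b), accumulated across biclusters.
                scores[i] = scores.get(i, 0.0) + len(users) * s
    return sorted(scores, key=scores.get, reverse=True)[:n]
```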
Results of the Example
• Among the four biclusters, the k = 2 nearest are used.
• U9 has rated only two items positively (I1, I3). Its similarity to the four biclusters is (0.5, 0.5, 0, 0), respectively, so the nearest neighbors come from the first two biclusters.
• Recommended items: I7 and I5.
Netflix Contest
• Any algorithm that predicts 10% better than Cinematch wins $1M.
• AT&T Labs Research: 5% in 6 weeks; 8.6% after the first year; 9.4% after the second year; 10.06% in the third year (after adding 2 new teams), Sept 2009.
• How? By taking the average of 800 different algorithms (150 pages).
Solution that I Liked
Train on the dataset with the different available algorithms and pick the best one!
How Does the System Work?
[Diagram: candidate methods (e.g., global-average, user-average) are trained on the training data and scored with RMSE (root mean square error) and MAE (mean absolute error); the best method is then applied to the test data.]
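RMSE and MAE are the standard error metrics named on the slide; minimal implementations over paired prediction/actual lists (helper names are mine, and non-empty inputs are assumed):

```python
import math

def rmse(preds, actuals):
    """Root mean square error over paired predictions and actuals."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(preds, actuals))
                     / len(preds))

def mae(preds, actuals):
    """Mean absolute error over paired predictions and actuals."""
    return sum(abs(p - a) for p, a in zip(preds, actuals)) / len(preds)
```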
