Published on: Presentation Slides - 20th ICANN (2010)

k-Separability Presentation

  1. An Efficient Collaborative Recommender System based on k-separability
     Georgios Alexandridis, Georgios Siolas, Andreas Stafylopatis
     Department of Electrical and Computer Engineering, National Technical University of Athens
     20th International Conference on Artificial Neural Networks (ICANN 2010)
  2. Outline
     1. Current Trends in Recommender Systems
        - Recommender Systems
        - Design Issues
     2. Theoretical & Practical Aspects of our Contribution
        - k-Separability
        - System Architecture
     3. Evaluating our System
        - Experiment
        - Results
        - Conclusions
  3. What are Recommender Systems?
     Recommender Systems attempt to present information items (e.g. movies, music, books, news stories) that are likely to be of interest to the user.
     Some implementations:
        - Amazon: "Customers Who Bought This Item Also Bought"
        - Google News: "Recommended Stories"
        - Online audio broadcasters: last.fm, Pandora
  4. Taxonomy of Recommender Systems
     Criterion: how are the predictions made?
        - Content-Based Recommenders: locate "similar" items
        - Collaborative Recommenders: find "like-minded" users
        - Hybrid Recommenders: a combination of the two
     Which method is the best?
        - An open academic subject, highly dependent on the application domain
     We followed the Collaborative Recommender approach:
        - Computationally simpler than the Hybrid approach
        - A user rating is more than a mere number; it is an aggregation of various characteristics
  5. Collaborative Recommender Systems
     Key component: the user ratings' matrix
     Ratings indicate how much a user likes an item:
        - "like" / "dislike"
        - 1 star up to 5 stars
     Example matrix over items I1-I4 (unrated items are left empty):
        U1: 5, 3, 2   U2: 3, 5, 2   U3: 1, 2   U4: 2, 3
     Users become each other's predictors, by locating positive and negative correlations among them.
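The "users as each other's predictors" idea can be made concrete with a small correlation sketch. The ratings below are illustrative (None marks an unrated item), not the slide's exact matrix, and `pearson` is a hypothetical helper computing the standard Pearson coefficient over co-rated items only:

```python
def pearson(u, v):
    """Pearson correlation computed only over items both users rated."""
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    if len(pairs) < 2:
        return 0.0
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) *
           sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den if den else 0.0

u1 = [5, 3, None, 2]    # one user's ratings on items I1..I4
u2 = [3, None, 5, 2]    # another user's ratings on items I1..I4
print(pearson(u1, u2))  # positive correlation on the co-rated items I1, I4
```

A positive coefficient makes the second user a useful (direct) predictor for the first; a negative one is equally informative, used with its sign flipped.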
  6. Challenges in Collaborative Recommender System Design
     1. The cold-start problem
        - Recommendations cannot be made unless a user has provided some ratings
        - Solutions: recommend the most popular items; explicitly ask the user to rate some items prior to making recommendations
     2. The sparsity problem
        - The ratings matrix is sparse: more than 90% of its elements are empty
        - Solution: dimensionality-reduction techniques. Singular Value Decomposition (SVD) yields good results
           - Pros: the resulting matrix is substantially smaller and denser
           - Cons: the dataset becomes very "noisy"; most elements assume values that are only marginally larger than zero
     Conclusion: we are in need of techniques that can "learn" noisy datasets!
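The SVD step can be sketched as a truncated reconstruction. The matrix below is made up for illustration (zeros stand for missing ratings), and keeping only the largest singular values yields the denser but "noisy" result the slide describes:

```python
import numpy as np

# Illustrative ratings matrix; zeros are missing ratings.
R = np.array([
    [5, 3, 0, 2],
    [3, 0, 5, 2],
    [0, 1, 2, 0],
    [2, 3, 0, 0],
], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2                                         # keep the 2 largest singular values
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k reconstruction

# The reconstruction is low-rank and dense: formerly empty cells now hold
# small non-zero values, i.e. the added "noise" the slide warns about.
print(np.round(R_k, 2))
```

This is why the authors pair SVD with a learner that tolerates noisy, weakly separable data.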
  7. "Noisy" Datasets
     The added noise in the dataset hinders the discovery of patterns in the data: data clusters become difficult to separate.
     Machine-learning techniques for highly non-separable datasets:
        - Support Vector Machines, Radial Basis Functions: computing the support vectors (or estimating the surface . . . ) can be a computationally intensive task
        - Evolutionary Algorithms: meaningful recommendations are not always guaranteed (evolutionary dead-ends)
     Our approach: use k-separability!
        - Originally proposed by W. Duch [1]
        - A special case of the more general method of Projection Pursuit
        - Applied to feed-forward ANNs
        - Extends linear separability of data clusters into k > 2 segments on the discriminating hyperplane
     [1] W. Duch, K-separability. Lecture Notes in Computer Science 4131 (2006) 188-197
  8. Extending linear separability to 3-separability: the 2-bit XOR problem
     - A highly non-separable dataset
     - It can be learned by a 2-layered perceptron, or ...
     - ... by a single-layer perceptron that implements k-separability!
     - The activation function must partition the input space into 3 distinct areas
     [Figures: (a) Input Space Partitioning; (b) Soft-Windowed Activation Function]
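A minimal sketch of this idea on XOR: one linear projection w·x plus a soft "window" activation that fires only in the middle of the three segments. The weights, window bounds, and steepness below are hand-picked for illustration, not learned:

```python
import math

def window(t, a=0.5, b=1.5, beta=8.0):
    """Soft-windowed activation: ~1 for a < t < b, ~0 outside."""
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    return sig(beta * (t - a)) - sig(beta * (t - b))

def xor_net(x1, x2):
    t = 1.0 * x1 + 1.0 * x2          # projection onto the discriminating line
    return 1 if window(t) > 0.5 else 0

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_net(x1, x2))   # reproduces the XOR truth table
```

The projection sends the four XOR points to t = 0, 1, 1, 2; the window keeps only the middle segment, so a single unit suffices where linear separability fails.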
  9. Generalizing to k-separability
     Complex datasets:
        - Combine the output of two neurons (or more . . . )
        - e.g. a 5-separable dataset can be learned by the combined output of 2 neurons
     Generalization by induction:
        - m-neuron output ⇒ 2m + 1 regions on the discriminating line ⇒ a k = 2m + 1-separable dataset
     Use in a recommendation engine:
        - Create a 2-layered perceptron: n-sized input vector, m-sized hidden layer, single output layer; overall, an n → m → 1 projection
        - Build a model (NN) for each user
        - Input: the ratings of the n "neighbors" of the target user on an item he hasn't evaluated
        - Output: a "score" for the unseen item
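The n → m → 1 forward pass can be sketched as m window units sharing one linear projection, whose summed output labels the 2m + 1 alternating segments. All weights and window bounds below are illustrative, not a trained model:

```python
import math

def sig(z):
    return 1.0 / (1.0 + math.exp(-z))

def window(t, a, b, beta=8.0):
    """Soft window: ~1 for a < t < b, ~0 outside."""
    return sig(beta * (t - a)) - sig(beta * (t - b))

def forward(x, w, windows):
    t = sum(wi * xi for wi, xi in zip(w, x))      # n -> 1 projection
    h = sum(window(t, a, b) for a, b in windows)  # m hidden window units
    return 1 if h > 0.5 else 0                    # single output unit

# Two window units => up to 2*2 + 1 = 5 segments (a 5-separable dataset).
w = [1.0, 1.0, 1.0]
windows = [(0.5, 1.5), (2.5, 3.5)]
print(forward([1, 0, 0], w, windows))  # t = 1: inside the first window
print(forward([1, 1, 0], w, windows))  # t = 2: in the gap between windows
print(forward([1, 1, 1], w, windows))  # t = 3: inside the second window
```

Each added window unit carves two more boundaries on the discriminating line, which is exactly the 2m + 1 induction on the slide.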
  10. Implementation Details
      The index of separability (k) is not known a priori:
         - Setting k to a fixed value is of little help: it can lead either to overspecialization or to large training errors
         - Therefore, k is a problem parameter: it has to be estimated
      Dynamic network architecture:
         - Sparse user ratings' matrix ⇒ small overall network size ⇒ Constructive Network Algorithm
         - Our constructive network algorithm was derived from the New Constructive Algorithm [2]
      [2] Islam MM et al. A new constructive algorithm for architectural and functional adaptation of artificial neural networks. IEEE Trans Syst Man Cybern B Cybern. 2009 Dec;39(6):1590-605
  11. Constructive Network Algorithm
      1. Create a minimal architecture
      2. Train the network in two phases on the whole training set
      3. Iteratively add neurons to the hidden layer:
         - Create new training sets based on the classification error (boosting algorithm)
         - Only the newly added neuron's weights are adapted; all others remain "frozen"
      4. Stop network construction when the classification error stabilizes
      Boosting algorithm:
         - Inspired by AdaBoost and used in network training as a way of avoiding local minima
         - Functionality: unlearned samples ⇒ new neurons in the hidden layer ⇒ new clusters discovered
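A toy sketch of this loop: start minimal, then repeatedly add one hidden unit fitted on a boosted (error-weighted) training set while all earlier units stay frozen, and stop (discarding the last unit) once the classification error no longer improves. The "unit" here is a trivial 1-D threshold and the data is made up; none of this is the paper's actual training code:

```python
def predict(units, x):
    # Parity of thresholds crossed labels alternating segments on the line.
    return sum(x > t for t in units) % 2

def error(units, data):
    return sum(predict(units, x) != y for x, y in data) / len(data)

def boost(data, units):
    """Deterministic stand-in for the boosted resample: repeat the errors."""
    return data + [(x, y) for x, y in data if predict(units, x) != y]

def fit_unit(units, sample):
    """Best new threshold given the frozen units (exhaustive 1-D search)."""
    cands = sorted({x for x, _ in sample})
    return min(cands, key=lambda t: sum(predict(units + [t], x) != y
                                        for x, y in sample))

# 3-separable toy data: class 1 only in the middle segment of the line.
data = [(x / 10, int(0.3 <= x / 10 < 0.7)) for x in range(10)]

units, prev_err = [], error([], data)
for _ in range(5):
    units.append(fit_unit(units, boost(data, units)))
    err = error(units, data)
    if err >= prev_err:
        units.pop()          # the last unit did not help: stop construction
        break
    prev_err = err

print(units, error(units, data))  # two thresholds suffice for 3 segments
```

The boosted set over-represents unlearned samples, so each new unit is pushed toward the clusters the frozen network still gets wrong, mirroring the "unlearned samples ⇒ new neurons ⇒ new clusters" chain on the slide.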
  12. Our Collaborative Recommender System
      Input: the user ratings' matrix and the target user
      Output: a model (NN) for the target user
      Steps:
         1. Pick from the user ratings' matrix all the co-raters of the target user
         2. Compute the SVD of the co-raters matrix, retaining only the non-zero singular values
         3. Partition the resulting matrix into 3 different sets: the training set, the validation set and the test set
         4. Train a constructive ANN architecture (as discussed previously)
         5. Compute the performance metrics on the test set
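The five steps can be sketched as a short pipeline. `train_net` and `score` are stand-ins for the constructive network and the metric computation; every function name, split ratio, and shape below is an assumption for illustration, not the paper's code:

```python
import numpy as np

def build_model_for(target, R, train_net, score):
    # 1. Keep only users who co-rated at least one item with the target.
    co = [u for u in range(R.shape[0])
          if u != target and np.any((R[u] > 0) & (R[target] > 0))]
    M = R[co].astype(float)

    # 2. SVD of the co-raters matrix, keeping only non-zero singular values.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    k = int(np.sum(s > 1e-10))
    items = s[:k, None] * Vt[:k]      # k-dim representation of each item

    # 3. Partition the items into training / validation / test sets.
    idx = np.arange(items.shape[1])
    tr, va, te = np.split(idx, [int(0.6 * len(idx)), int(0.8 * len(idx))])

    # 4. Train the constructive ANN on (neighbor features, target's ratings).
    model = train_net(items[:, tr], R[target, tr], items[:, va], R[target, va])

    # 5. Compute the performance metrics on the held-out test set.
    return score(model, items[:, te], R[target, te])
```

Each item becomes a training sample whose features are the (SVD-reduced) co-raters' ratings and whose label is the target user's own rating, matching the per-user model described on the slide.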
  13. Experiment: The MovieLens Database
      - Contains the ratings of 943 users on 1682 movies
      - Sparse matrix (6.3% non-zero elements)
      - Each user has rated at least 20 movies (106 on average), but ...
      - ... ratings follow a discrete exponential distribution: 60% of all users have rated 100 movies or less; 40% of all users have rated 50 movies or less
      [Figure: (a) Rated items per user]
      We followed a purely collaborative strategy, taking into account only the user ratings and not any other demographic information.
  14. Experiment: Test Sets & Metrics
      Many users rate only a few movies. How would our system perform?
         - Group A: the few-raters user group; contains all users who have rated 20-50 movies
      How would our system perform in the average case?
         - Group B: the moderate-raters user group; contains all users who have rated 51-100 movies
         - May be used in comparisons to other implementations
      We randomly picked 20 users from each group (40 users in total). The results were averaged for each group.
      Metrics:
         1. Precision
         2. Recall
         3. F-measure
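The slide does not spell out the formulas, so for reference, these are the usual definitions of the three metrics over "recommended vs. relevant" item sets (the item names are made up):

```python
def precision(recommended, relevant):
    """Fraction of recommended items that were actually relevant."""
    return len(recommended & relevant) / len(recommended)

def recall(recommended, relevant):
    """Fraction of relevant items that the system managed to recommend."""
    return len(recommended & relevant) / len(relevant)

def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0

rec, rel = {"a", "b", "c", "d"}, {"b", "c", "e"}
p, r = precision(rec, rel), recall(rec, rel)
print(round(p, 2), round(r, 2), round(f_measure(p, r), 2))  # 0.5 0.67 0.57
```

Because the F-measure is a harmonic mean, a system only scores well when precision and recall are balanced, which is the trade-off the results table below rewards.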
  15. Results
      Table: Performance Results

      | Methodology                                 | Precision | Recall | F-measure |
      |---------------------------------------------|-----------|--------|-----------|
      | Our System: User Group B (moderate ratings) | 75.38%    | 82.21% | 79.37%    |
      | Our System: User Group A (few ratings)      | 74.07%    | 88.86% | 78.97%    |
      | MovieMagician Clique-based                  | 74%       | 73%    | 74%       |
      | MovieLens                                   | 66%       | 74%    | 70%       |
      | SVD/ANN                                     | 67.9%     | 69.7%  | 68.8%     |
      | MovieMagician Feature-based                 | 61%       | 75%    | 67%       |
      | MovieMagician Hybrid                        | 73%       | 56%    | 63%       |
      | Correlation                                 | 64.4%     | 46.8%  | 54.2%     |

      Observations:
         - Our system achieves good results in both user groups and outperforms the other approaches
         - Recall is higher in the few-raters group because these users seem to rate only the movies they like; therefore the recommender cannot generalize
  16. Conclusions
      - We have presented a complete Collaborative Recommender System that is specifically suited to cases where information is limited
      - Our system achieves a good trade-off between Precision and Recall, a basic requirement for recommenders
      - This is because k-separability is able to uncover complex statistical dependencies (positive and negative)
      - We don't need to filter the neighborhood of the target user as other systems do (e.g. by using the Pearson correlation formula); all "neighbors" are considered, which is extremely useful for sparse datasets