k-Separability Presentation

An Efficient Collaborative Recommender System
based on k-separability

Georgios Alexandridis Georgios Siolas Andreas Stafylopatis

Department of Electrical and Computer Engineering
National Technical University of Athens

20th International Conference on Artificial Neural Networks
(ICANN 2010)

Alexandridis, Siolas, Stafylopatis (NTUA), k-separability Collaborative Recommender, ICANN’10

Outline

1 Current Trends in Recommender Systems
Recommender Systems
Design Issues

2 Theoretical & Practical Aspects of our Contribution
k-Separability
System Architecture

3 Evaluating our System
Experiment
Results
Conclusions


What are Recommender Systems?

Recommender Systems attempt to present information items (e.g.
movies, music, books, news stories) that are likely to be of interest
to the user.
Some implementations
Amazon
"Customers Who Bought This Item Also Bought"
Google News
"Recommended Stories"
Online Audio Broadcasters
last.fm
Pandora


Taxonomy of Recommender Systems

Criterion: How are the predictions made?
Content-Based Recommenders
Locate "similar" items
Collaborative Recommenders
Find "like-minded" users
Hybrid Recommenders
Combination of the two
Which method is the best?
Open academic subject
Highly dependent on the application domain
We followed the Collaborative Recommender approach
Computationally simpler than the Hybrid approach
A user rating is more than a mere number; it is an aggregation of
various characteristics


Collaborative Recommender Systems

Key Component: The User Ratings’ Matrix



Ratings
Indicate how much a user likes an item
"like" "dislike"
1-star up to 5-stars



I1 I2 I3 I4
U1 5 3 2
U2 3 5 2
U3 1 2
U4 2 3

Users become each other’s predictors
By locating positive and negative correlations among them.
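
A minimal sketch of how such correlations can be measured, assuming the Pearson coefficient over co-rated items (one common choice; the slides do not state which measure, if any, is computed explicitly). The ratings below are hypothetical and are not the matrix shown above.

```python
from math import sqrt

# Hypothetical ratings (user -> {item: rating}); items missing from a dict are unrated.
ratings = {
    "U1": {"I1": 5, "I2": 3, "I4": 2},
    "U2": {"I1": 4, "I2": 2, "I4": 1},
    "U3": {"I1": 1, "I2": 3, "I4": 5},
}

def pearson(a, b):
    """Pearson correlation of two users over the items both have rated."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0  # not enough overlap to estimate a correlation
    xs = [a[i] for i in common]
    ys = [b[i] for i in common]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a user with constant ratings carries no correlation signal
    return cov / (sx * sy)

like_minded = pearson(ratings["U1"], ratings["U2"])   # strongly positive
opposite = pearson(ratings["U1"], ratings["U3"])      # strongly negative
```

Both kinds of neighbor are informative: a strongly negative correlation predicts the target user's taste as reliably as a strongly positive one, only with the sign flipped.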


Challenges in Collaborative Recommender System Design
1 The cold-start problem

2 The sparsity problem


The cold-start problem
Recommendations cannot be made unless a user has provided
some ratings
Solutions:
Recommend the most popular items
Explicitly ask the user to rate some items prior to making
recommendations
The sparsity problem
The ratings matrix is sparse
Empty elements: More than 90%
Solution: Dimensionality Reduction techniques
Singular Value Decomposition (SVD) yields good results
Pros: The resultant matrix is substantially smaller & denser
Cons: The dataset becomes very "noisy"
Most elements assume values that are marginally larger than zero
Conclusion: We need techniques that can "learn" noisy
datasets!
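
A small illustration of why a truncated SVD turns a sparse matrix into a dense one. This is only a sketch under stated assumptions: the matrix is hypothetical, only the leading singular triplet is kept (rank 1, for brevity), and it is computed with plain power iteration rather than a full SVD routine.

```python
from math import sqrt

# Hypothetical sparse ratings matrix (rows: users, columns: items); 0 = unrated.
R = [
    [5, 3, 0, 0],
    [4, 0, 2, 0],
    [0, 2, 0, 1],
    [0, 0, 3, 4],
]

def rank1_svd(A, iters=200):
    """Leading singular triplet (sigma, u, v) of A via power iteration on A^T A."""
    n_rows, n_cols = len(A), len(A[0])
    v = [1.0 / sqrt(n_cols)] * n_cols
    for _ in range(iters):
        u_raw = [sum(A[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]      # A v
        v_raw = [sum(A[i][j] * u_raw[i] for i in range(n_rows)) for j in range(n_cols)]  # A^T (A v)
        norm = sqrt(sum(x * x for x in v_raw))
        v = [x / norm for x in v_raw]
    u_raw = [sum(A[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    sigma = sqrt(sum(x * x for x in u_raw))
    u = [x / sigma for x in u_raw]
    return sigma, u, v

sigma, u, v = rank1_svd(R)
approx = [[sigma * ui * vj for vj in v] for ui in u]   # rank-1 reconstruction of R

empty_before = sum(1 for row in R for x in row if x == 0)
empty_after = sum(1 for row in approx for x in row if x == 0)
```

Every cell that was empty in `R` now holds a non-zero value in `approx`. These filled-in values are the "noise" that the learning stage has to cope with.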

"Noisy" Datasets
The added noise in the dataset hinders the discovery of patterns
in data
Data clusters become difficult to separate

Machine Learning techniques for highly non-separable datasets
Support Vector Machines, Radial Basis Functions

Evolutionary Algorithms

Computing the support vectors (or estimating the surface...) can be a
computationally intensive task
Meaningful Recommendations are not always guaranteed
(evolutionary dead-ends)
Our approach: Use k-separability!
Originally proposed by W. Duch [1]
Special case of the more general method of Projection Pursuit
Application to Feed-Forward ANNs
Extends linear separability of data clusters into k > 2 segments on
the discriminating hyperplane

[1] W. Duch, K-separability. Lecture Notes in Computer Science 4131 (2006) 188-197

Extending linear separability to 3-separability
The 2-bit XOR problem
A highly non-separable dataset
It can be learned by a 2-layered perceptron, or...
...by a single-layer perceptron that implements k-separability!


The activation function must partition the input space into 3
distinct areas
Soft-Windowed Activation Functions

Figure: (a) Input Space Partitioning; (b) Soft-Windowed Activation Function
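
A minimal sketch of the idea on the 2-bit XOR problem. The soft window is built here as the difference of two sigmoids, which is one possible construction and an assumption of this example. The weights w = (1, 1), the window edges 0.5 and 1.5, and the slope beta are hand-picked rather than learned.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def soft_window(y, a=0.5, b=1.5, beta=10.0):
    """Smooth bump: close to 1 for a < y < b, close to 0 elsewhere."""
    return sigmoid(beta * (y - a)) - sigmoid(beta * (y - b))

def xor_neuron(x1, x2):
    """Single neuron: project the input onto w = (1, 1), then apply the window."""
    y = 1.0 * x1 + 1.0 * x2      # projections: 0 for (0,0), 1 for (0,1)/(1,0), 2 for (1,1)
    return soft_window(y)

outputs = {(x1, x2): xor_neuron(x1, x2) for x1 in (0, 1) for x2 in (0, 1)}
predictions = {x: int(o > 0.5) for x, o in outputs.items()}
```

The projection line is cut into three intervals (below, inside and above the window), so the four XOR points become 3-separable with a single neuron.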

Generalizing to k-separability

Complex Datasets
Combine the output of two neurons (or more...)
e.g. A 5-separable dataset can be learned by the combined output
of 2 neurons



Generalization by Induction
m-neuron output ⇒ 2m + 1 regions on the discriminating line
⇒ k-separable dataset, with k = 2m + 1
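
The count can be checked numerically for m = 2. The sketch below assumes soft windows built as differences of sigmoids and uses two hypothetical, hand-picked, non-overlapping windows. It samples the projection line and counts the maximal runs of equal predicted class.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def soft_window(y, a, b, beta=20.0):
    return sigmoid(beta * (y - a)) - sigmoid(beta * (y - b))

# Hypothetical, hand-picked windows for m = 2 hidden neurons.
windows = [(1.0, 2.0), (3.0, 4.0)]

def combined_output(y):
    """Sum of the m window neurons evaluated at projection value y."""
    return sum(soft_window(y, a, b) for a, b in windows)

def count_segments(ys):
    """Number of maximal runs of equal predicted class along the line."""
    labels = [int(combined_output(y) > 0.5) for y in ys]
    return 1 + sum(1 for p, q in zip(labels, labels[1:]) if p != q)

# Sample the projection line on (0, 5), avoiding the window edges themselves.
ys = [0.05 + 0.1 * i for i in range(50)]
segments = count_segments(ys)
```

Each window contributes two boundaries, so m disjoint windows give 2m boundaries and therefore 2m + 1 alternating segments.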
Use in a Recommendation Engine
Create a 2-layered perceptron
n-sized input vector, m-sized hidden layer, single output layer
Overall, an n → m → 1 projection
Build a model (NN) for each user
Input: The ratings of the n "neighbors" of the target user on an item
the target user has not yet rated
Output: A "score" for the unseen item
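
A sketch of the n → m → 1 projection for one target user, assuming window-type hidden neurons and a linear output. All weights, window edges and ratings below are hypothetical. In the system described here they would be learned per user.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def soft_window(y, a, b, beta=10.0):
    return sigmoid(beta * (y - a)) - sigmoid(beta * (y - b))

def score(neighbor_ratings, hidden, out_weights):
    """n -> m -> 1: n neighbor ratings, m window neurons, one output score."""
    total = 0.0
    for (w, a, b), v in zip(hidden, out_weights):
        y = sum(wi * xi for wi, xi in zip(w, neighbor_ratings))  # projection of the input
        total += v * soft_window(y, a, b)                        # windowed, then weighted
    return total

# n = 3 neighbors' ratings of one unseen item (0 = that neighbor has not rated it).
neighbor_ratings = [5, 4, 0]
# m = 2 hidden neurons, each given as (input weights, window start, window end).
hidden = [([0.5, 0.5, 0.0], 3.5, 5.5), ([0.0, 0.0, 1.0], 3.5, 5.5)]
out_weights = [1.0, 1.0]

item_score = score(neighbor_ratings, hidden, out_weights)
```

Here the first neuron fires because the first two neighbors rated the item highly, while the second stays silent because the third neighbor has no rating.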


Implementation Details

The index of separability (k) is not known a priori
Setting k to a ﬁxed value is of little help
It can lead to either overspecialization or to large training errors



Therefore, k is a problem parameter: it has to be estimated



Dynamic Network Architecture



Sparse user ratings’ matrix ⇒ small overall network size ⇒
Constructive Network Algorithm
Our constructive network algorithm was derived from the New
Constructive Algorithm [2]

[2] M. M. Islam et al., A new constructive algorithm for architectural and functional adaptation of artificial neural
networks. IEEE Trans Syst Man Cybern B Cybern. 2009 Dec;39(6):1590-1605

1 Create a minimal architecture
2 Train the network in two phases on the whole Training Set
3 Iteratively add neurons in the hidden layer
Create new Training Sets based on the Classification Error
(Boosting Algorithm)
Only the newly added neuron’s weights are adapted; all others
remain "frozen"
4 Stop network construction when the Classification Error stabilizes
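
The control flow of the steps above can be sketched on a toy problem. Everything here is an illustrative assumption: samples are already projected onto a line, a "neuron" is a hard interval, and "training" is a greedy pick of the longest run of still-uncovered positive samples. It does not reproduce the two-phase training or the boosting step of the actual algorithm.

```python
# Hypothetical 1-D samples: (projection value, class label).
samples = [(0.5, 0), (1.2, 1), (1.5, 1), (1.8, 1), (2.5, 0), (3.2, 1), (3.6, 1), (4.5, 0)]

def covered(y, units):
    return any(a < y < b for a, b in units)

def train_unit(frozen, eps=0.1):
    """Toy 'training': cover the longest run of positives that no frozen unit covers."""
    best, run = [], []
    for y, label in sorted(samples):
        if label == 1 and not covered(y, frozen):
            run.append(y)
            if len(run) > len(best):
                best = list(run)
        else:
            run = []
    if not best:
        return (0.0, 0.0)          # empty interval: covers nothing
    return (best[0] - eps, best[-1] + eps)

def error_of(units):
    wrong = sum(1 for y, label in samples if int(covered(y, units)) != label)
    return wrong / len(samples)

def build_network(max_units=10, tol=1e-3):
    """Grow the hidden layer one unit at a time until the error stabilizes."""
    units = [train_unit([])]                  # minimal architecture
    history = [error_of(units)]
    while len(units) < max_units:
        candidate = train_unit(units)         # existing units stay frozen
        new_error = error_of(units + [candidate])
        if history[-1] - new_error < tol:     # error has stabilized: stop growing
            break
        units = units + [candidate]
        history.append(new_error)
    return units, history

units, history = build_network()
```

On this toy data the loop stops at two units, one per cluster of positive samples, which matches the intuition that each added neuron uncovers one more cluster.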

Boosting Algorithm
Inspired by AdaBoost and used in Network Training as a way of
avoiding local minima
Functionality
Unlearned samples ⇒ New neurons in the hidden layer ⇒ New
clusters discovered
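
For reference, a sketch of the standard AdaBoost re-weighting step that the boosting idea draws on. The variant actually used in the system may differ, and the four-sample example is hypothetical. Weights are assumed to sum to 1 on entry.

```python
from math import exp, log

def adaboost_reweight(weights, misclassified):
    """One AdaBoost round: raise the weight of misclassified samples, then normalize."""
    eps = sum(w for w, bad in zip(weights, misclassified) if bad)   # weighted error
    if eps <= 0.0 or eps >= 0.5:
        return list(weights)   # nothing to emphasize, or learner no better than chance
    alpha = 0.5 * log((1.0 - eps) / eps)
    raw = [w * exp(alpha if bad else -alpha) for w, bad in zip(weights, misclassified)]
    total = sum(raw)
    return [w / total for w in raw]

weights = [0.25, 0.25, 0.25, 0.25]
misclassified = [False, False, False, True]    # only the last sample is unlearned
new_weights = adaboost_reweight(weights, misclassified)
```

After the update the unlearned sample carries half of the total weight, so a training set resampled from `new_weights` concentrates on what the current network gets wrong.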


Our Collaborative Recommender System

Input: The user ratings’ matrix and the target user



Output: A model (NN) for the target user
Steps
1 Pick from the user ratings’ matrix all the co-raters of the target user
2 Compute the SVD of the co-raters matrix, retaining only the
non-zero Singular Values
3 Partition the resultant matrix into 3 different sets: the Training Set, the
Validation Set and the Test Set
4 Train a Constructive ANN Architecture (as discussed previously...)
5 Compute the Performance Metrics on the Test Set
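
Steps 1 and 3 can be sketched as follows. The definition of a co-rater (any user sharing at least one rated item with the target), the 60/20/20 split and all data are assumptions made for illustration. The SVD and training steps are omitted.

```python
import random

# Hypothetical ratings (user -> {item: rating}).
ratings = {
    "U1": {"I1": 5, "I2": 3},
    "U2": {"I2": 4, "I3": 2},
    "U3": {"I4": 1},
    "U4": {"I1": 2, "I4": 3},
}

def co_raters(ratings, target):
    """Users who have rated at least one item that the target user also rated."""
    target_items = set(ratings[target])
    return sorted(u for u, r in ratings.items()
                  if u != target and target_items & set(r))

def split(rows, train=0.6, val=0.2, seed=0):
    """Shuffle and cut into Training, Validation and Test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train)
    n_val = int(len(rows) * val)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

neighbors = co_raters(ratings, "U1")
rows = ["I%d" % k for k in range(1, 11)]       # ten hypothetical rows to partition
train_set, val_set, test_set = split(rows)
```

Keeping the Validation Set apart from the Test Set lets the stopping decision during network construction be made without touching the data used for the final metrics.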


Experiment
The MovieLens Database
Contains the ratings of 943 users on 1682 movies
Sparse matrix (6.3% non-zero elements)
Each user has rated at least 20 movies (106 on average), but...
Discrete Exponential Distribution
60% of all users have rated 100 movies or less
40% of all users have rated 50 movies or less
We followed a purely Collaborative Strategy
Taking into account only the user ratings and not any other
demographic information

Figure (a): Rated items per user
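
The quoted figures are mutually consistent, as a quick check using only the numbers on this slide shows.

```python
users, movies = 943, 1682
avg_ratings_per_user = 106                       # average quoted on the slide

cells = users * movies                           # total entries in the ratings matrix
approx_ratings = users * avg_ratings_per_user    # approximate number of ratings
density = approx_ratings / cells                 # same as 106 / 1682

density_percent = round(100 * density, 1)
```

About 106 rated movies out of 1682 per user gives a density of roughly 6.3%, so almost 94% of the matrix is empty.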

Experiment
Test Sets & Metrics

Many users rate only a few movies. How would our system
perform?
Group A: The few raters user group.
Contains all users who have rated 20-50 movies
How would our system perform on the average case?
Group B: The moderate raters user group.
May be used in comparisons to other implementations
We randomly picked 20 users from each group (40 users in total).
The results were averaged for each group
Metrics
1 Precision
2 Recall
3 F-measure
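
A sketch of how the three metrics are commonly computed for one user, assuming the F-measure is the balanced F1 score (harmonic mean of Precision and Recall). The slides do not state the exact variant, and the item lists are hypothetical.

```python
def precision_recall_f(recommended, relevant):
    """Precision, Recall and F1 of a recommendation list against the liked items."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f

# The system recommends four items; the user actually likes three items.
p, r, f = precision_recall_f(["I1", "I2", "I3", "I4"], ["I2", "I3", "I5"])
```

Two of the four recommendations are relevant (Precision 1/2) and two of the three liked items are found (Recall 2/3). When such per-user values are averaged over a group, the averaged F-measure need not equal the F-measure computed from the averaged Precision and Recall.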


Results

Table: Performance Results
Methodology                                   Precision   Recall   F-measure
Our System: User Group B (moderate ratings)   75.38%      82.21%   79.37%
Our System: User Group A (few ratings)        74.07%      88.86%   78.97%
MovieMagician Clique-based                    74%         73%      74%
Movielens                                     66%         74%      70%
SVD/ANN                                       67.9%       69.7%    68.8%
MovieMagician Feature-based                   61%         75%      67%
MovieMagician Hybrid                          73%         56%      63%
Correlation                                   64.4%       46.8%    54.2%

Observations
Our system achieves good results in both user groups and
outperforms the other approaches
Recall is higher in the few-raters group because these users seem to rate
only the movies they like
Therefore, the recommender cannot generalize


Conclusions

We have presented a complete Collaborative Recommender
System that is particularly suited to cases where information is
limited
Our system achieves a good trade-off between Precision and
Recall, a basic requirement for Recommenders
This is because k-separability is able to uncover
complex statistical dependencies (positive and negative)
We don’t need to filter the neighborhood of the target user as other
systems do (e.g. by using the Pearson Correlation Formula).
All "neighbors" are considered
Extremely useful in cases of sparse datasets

