COLLABORATIVE FILTERING
USING KNN ALGORITHM
Şeyda HATİPOĞLU 11.06.2013
Recommender Systems
• Software tools and techniques that suggest items
likely to be of use to a user
• Recommender systems analyze patterns of user interest in
items or products to provide personalized recommendations
of items that will suit a user’s taste
Item - What the system recommends to the user
(CD, news, books, movies...)
User preferences - ratings for products
User actions - user browsing history
RS Techniques
• Collaborative-Filtering system
– recommends to the active user the items that
other users with similar tastes liked in the past
• Content-based system
– recommends items that are similar to the ones that
the user liked in the past
• Hybrid-Collaborative Filtering
• Tagging: recommends items using tags
assigned by different users
Collaborative Filtering
• Tries to predict the opinion the user will have of
different items, and recommends the “best” items to
each user based on the user’s previous preferences
and the opinions of other like-minded users.
Collaborative Filtering
• The task of a CF algorithm is to find an item likeliness,
which can take two forms:
Prediction – a numerical value expressing the predicted
likeliness of an item for the active user
Recommendation – a list of the N items that the active user
will like the most
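The two output forms can be illustrated with a minimal Python sketch. The helper name `top_n` and the predicted scores below are hypothetical, made up for illustration only; they are not results from the application.

```python
def top_n(predicted, n):
    """Return the n item names with the highest predicted rating."""
    ranked = sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, _ in ranked[:n]]


# Hypothetical predicted ratings for one active user (illustrative values).
predicted = {"Toy Story": 4.6, "Lion King": 3.9, "Kung Fu Panda": 4.8}

prediction = predicted["Lion King"]     # Prediction: a single numerical value
recommendation = top_n(predicted, n=2)  # Recommendation: a list of N items
```

A recommendation list is thus just the predictions sorted and truncated to N items.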
K Nearest Neighbour Algorithm
• A distance measure is needed to determine the
“closeness” of instances
• Classify an instance by finding its nearest neighbors
and picking the most popular class among the
neighbors
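A minimal Python sketch of the two points above, assuming Euclidean distance as the distance measure and a plain majority vote. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Distance measure: straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query, examples, k=3):
    """Classify `query` by majority vote among its k nearest examples.

    `examples` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(examples, key=lambda ex: euclidean(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy training data (illustrative only): two clusters labelled "A" and "B".
examples = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
    ((4.0, 4.2), "B"), ((3.8, 4.0), "B"), ((4.1, 3.9), "B"),
]
label = knn_classify((1.1, 0.9), examples, k=3)
```

Here two of the three nearest neighbors of the query carry label "A", so the vote returns "A".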
Rating Prediction
          Mega Mind   Toy Story   Despicable Me   Lion King   Kung Fu Panda
Zeynep        4           5             3             2             4
Funda         3           3             2             3             5
Pınar         3           3             4             2             3
Gülten        4           4             5             4             5
Yağız         4           5             ?             4             5
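One way to fill in the missing rating is sketched below in Python. It assumes Euclidean distance over the four movies Yağız has already rated and an unweighted average of the k nearest users' ratings for Despicable Me. The choice of distance and of k is an assumption; the application may compute it differently.

```python
import math

# Ratings from the table, in column order:
# Mega Mind, Toy Story, Despicable Me, Lion King, Kung Fu Panda
ratings = {
    "Zeynep": [4, 5, 3, 2, 4],
    "Funda":  [3, 3, 2, 3, 5],
    "Pınar":  [3, 3, 4, 2, 3],
    "Gülten": [4, 4, 5, 4, 5],
}
active = [4, 5, None, 4, 5]  # Yağız; None marks the unknown rating
target = 2                   # column index of Despicable Me


def distance(a, b, skip):
    """Euclidean distance over all columns except the one being predicted."""
    return math.sqrt(sum((x - y) ** 2
                         for i, (x, y) in enumerate(zip(a, b)) if i != skip))


def predict(active, ratings, target, k):
    """Average the target-column ratings of the k users closest to `active`."""
    neighbors = sorted(ratings,
                       key=lambda u: distance(active, ratings[u], target))[:k]
    return sum(ratings[u][target] for u in neighbors) / k, neighbors


print(predict(active, ratings, target, k=1))  # nearest user only
print(predict(active, ratings, target, k=2))  # average of two nearest users
```

With k=1 the nearest user is Gülten (distance 1), giving a predicted rating of 5.0; with k=2 Zeynep joins and the average drops to 4.0, which shows how sensitive the prediction is to the choice of k.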
Application
• MovieLens Database (1M)
 3883 movies
 6040 users
 1000209 ratings
• Technologies
 ASP.Net 4.0
 MS SQL Server 2008
RATING PREDICTION DATABASE DIAGRAM
• Movies: MovieID, Title, Genre
• Ratings: ID, UserID, MovieID, Rating, Timestamp
• Users: UserID, Gender, Age, Occupation, ZipCode
• Age: Id, Description
• Occupation: Id, Description
• Predictions: ID, UserID, MostSimilarUserID, Difference,
TimeElapsed, MovieID, PredictedRating, ActualRating
Error Measurement
Mean Squared Error (MSE) = 0.975
Mean Absolute Error (MAE) = 0.679
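Both measures compare predicted ratings with actual ratings, as stored in the PredictedRating and ActualRating columns. A minimal Python sketch with made-up sample values (not the MovieLens results above):

```python
def mse(predicted, actual):
    """Mean squared error: average of the squared differences."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def mae(predicted, actual):
    """Mean absolute error: average of the absolute differences."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Illustrative values only.
predicted = [4, 3, 5, 2]
actual = [5, 3, 4, 4]

print(mse(predicted, actual))  # (1 + 0 + 1 + 4) / 4 = 1.5
print(mae(predicted, actual))  # (1 + 0 + 1 + 2) / 4 = 1.0
```

MSE penalizes large misses more heavily than MAE because each error is squared before averaging.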
DEMO
KNN Algorithm
Pro
• Simple to implement and use
• Comprehensible – easy to explain a prediction
• Robust to noisy data by averaging over the k nearest
neighbors
Con
• Cold-start problem
• Storage: all training examples are kept in memory
• Time: to classify x, you need to loop over all
training examples (x’, y’) to compute the distance
between x and x’
Conclusion
• Recommendation and personalization are important
approaches to combating information overload.
• Machine Learning is an important part of systems for
these tasks.
• Collaborative Filtering has its own problems
(for example, cold start).
• Better results could be achieved by also using
content, tags, and better-optimized similarity
functions.
Thank you
