Based on:
Recommender Systems
by Prem Melville & Vikas Sindhwani
Presented by:
Vijayindu Gamage
Udith Gunaratna
Pubudu Gun...
Published in: Technology, Education
  • In neighborhood-based CF, every user must be considered when searching for neighbors. When the number of users is small, neighborhood-based collaborative filtering works well.
  • When the number of users is large, the computational complexity is high and neighbors are difficult to find. An alternative is item-based collaborative filtering.
  • Item-based CF was proposed in 2003 by Linden, Smith, and York. It does not match similar users, as neighborhood-based CF does. Instead, it matches a user's rated items to similar items. Research shows that this leads to faster online systems and also results in improved recommendations.
  • Pearson correlation is used to find the similarity between two items i and j. U is the set of users who have rated both items i and j, r(u,i) is the rating of user u on item i, and r'(i) is the average rating of item i across all users. The rating of item i for user u is then predicted using a weighted average.
  • It is common for the active user to have highly correlated neighbors that are based on very few co-rated (overlapping) items. Neighbors based on a small number of overlapping items tend to be bad predictors. One approach to this problem, significance weighting, is to multiply the similarity weight by a significance weighting factor, which devalues correlations based on few co-rated items. Another approach, default voting, applies a default value to unrated items, so that correlation can be computed over the union of items rated by the users being matched, as opposed to the intersection. Some items are universally loved or hated, and they are bad for predictions. To account for this, a value called inverse user frequency is calculated and the original rating is multiplied by this value. All of these neighborhood-based methods generate recommendations based on statistical notions of similarity between users, or between items.
  • Model-based CF uses statistical models for predictions. Latent factor models assume that the similarity between users and items is simultaneously induced by some hidden lower-dimensional structure in the data. For example, the rating that a user gives to a movie might be assumed to depend on a few implicit factors, such as the user's taste across various movie genres. These statistical models are developed using data mining and machine learning algorithms. Currently, latent factor and matrix factorization models are widely used. Netflix, a popular movie web site, held a competition (the Netflix Prize) to design the best collaborative filtering algorithm for predicting user ratings of films. In 2009, the grand prize of US$1,000,000 was awarded to the team that bested Netflix's own rating-prediction algorithm by 10.06%. The final winning solution was a complex ensemble of different models, including several enhancements to basic matrix factorization models.
  • So far we have discussed collaborative filtering. The second type of recommender system is content-based recommending. Pure CF techniques treat users and items as atomic units and make predictions without regard to the specifics of individual users or items. Better predictions can be made by using underlying information about users or items: for example, demographic information about users (age group, gender, ethnicity, languages, etc.) or movie genres (action, comedy, horror, drama, romance, etc.).
  • Assume that a particular user has liked Star Wars and Star Trek. When the content of those movies is analyzed, we find that the genre is sci-fi. Based on that, we can recommend another sci-fi movie to the user, such as Oblivion.
  • Content-based recommending is mainly focused on items with associated textual content, such as web pages, books, and movies. There are two approaches to this problem. In the first, the recommendation problem is treated as an information retrieval task: the user's preferences are treated as a query, and the unrated documents are scored by relevance/similarity to this query. In the second, the recommendation problem is treated as a classification task: each example represents the content of an item, and a user's past ratings are used as labels for these examples.
  • In order to leverage the strengths of content-based and collaborative recommenders, people have come up with hybrid approaches that combine the two. A simple approach is to let both content-based and collaborative filtering methods produce separate ranked lists of recommendations, and then merge their results to produce one final list. This can be improved by combining the two predictions using an adaptive weighted average, where the weight of the collaborative component increases as the number of users accessing an item increases. In content-boosted collaborative filtering, content-based predictions are applied to convert a sparse user ratings matrix into a full ratings matrix, and then a CF method is used to provide recommendations.
  • The quality of a recommender system can be evaluated by comparing recommendations to a test set of known user ratings. These systems are typically measured using predictive accuracy metrics, where the predicted ratings are directly compared to actual user ratings. The most commonly used metrics are Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE).
  • The MAE measures the average magnitude of the errors in a set of forecasts, without considering their direction. It measures accuracy for continuous variables.
  • The RMSE is a quadratic scoring rule which measures the average magnitude of the error. Expressing the formula in words, the differences between forecast and corresponding observed values are each squared and then averaged over the sample. Finally, the square root of the average is taken. Since the errors are squared before they are averaged, the RMSE gives a relatively high weight to large errors. This means the RMSE is most useful when large errors are particularly undesirable.
  • New items and new users pose a significant challenge to recommender systems. Collectively, these problems are referred to as the cold-start problem.
  • Stated simply, most users do not rate most items and, hence, the user ratings matrix is typically very sparse. This is a problem for collaborative filtering systems, since it decreases the probability of finding a set of users with similar ratings. This problem often occurs when a system has a very high item-to-user ratio, or when the system is in the initial stages of use. One solution is to use additional domain information about items, for example information supplied when a new movie is added to the system. Another is to make assumptions about the data generation process that allow for high-quality imputation.
  • New items and new users pose a significant challenge to recommender systems. Collectively, these problems are referred to as the cold-start problem. The first of these problems arises in collaborative filtering systems, where an item cannot be recommended unless some user has rated it before. Since content-based approaches do not rely on ratings from other users, they can be used to produce recommendations for all items, provided attributes of the items are available. The new-user problem is difficult to tackle, since without a user's previous preferences it is not possible to find similar users or to build a content-based profile. One solution is to select items to be rated by a user so as to rapidly improve recommendation performance with the least user feedback.
  • As recommender systems are increasingly adopted by commercial websites, they have started to play a significant role in affecting the profitability of sellers. This has led many vendors to engage in different forms of fraud, cheating the recommender systems to increase their own profits. In push attacks, vendors increase the ratings of their own products. In nuke attacks, they lower the ratings of their competitors.
  • Now let us see which method is better. CF can perform in domains where there is not much content associated with items. CF can also perform when content is difficult for a computer to analyze. A CF system has the ability to provide serendipitous recommendations.
  • Recommender Systems
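
The three neighborhood-based steps described in the notes above (weight all users, select the k most similar, combine their ratings) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than something taken from the slides: the toy `RATINGS` data, the helper names `mean`, `pearson` and `predict`, the choice of Pearson correlation over co-rated items as the user-user weight, and the mean-centered form of the weighted combination.

```python
from math import sqrt

# Hypothetical toy data: RATINGS[user][item] = rating on a 1-5 scale.
RATINGS = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 2, "C": 3, "D": 4},
    "carol": {"A": 1, "B": 5, "C": 2, "D": 1},
    "dave":  {"A": 5, "B": 3, "C": 5, "D": 5},
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def pearson(ratings, a, u):
    """Pearson correlation between users a and u over their co-rated items."""
    common = sorted(set(ratings[a]) & set(ratings[u]))
    if len(common) < 2:
        return 0.0  # too little overlap to say anything
    mean_a = mean(ratings[a][i] for i in common)
    mean_u = mean(ratings[u][i] for i in common)
    num = sum((ratings[a][i] - mean_a) * (ratings[u][i] - mean_u)
              for i in common)
    den = sqrt(sum((ratings[a][i] - mean_a) ** 2 for i in common)
               * sum((ratings[u][i] - mean_u) ** 2 for i in common))
    return num / den if den else 0.0


def predict(ratings, active, item, k=2):
    """Predict the active user's rating of item from the k most similar users."""
    # Step 1: assign a weight to every other user who has rated the item.
    weights = [(pearson(ratings, active, u), u)
               for u in ratings if u != active and item in ratings[u]]
    # Step 2: keep the k users with the highest similarity (the neighborhood).
    neighbors = sorted(weights, reverse=True)[:k]
    # Step 3: weighted combination of the neighbors' mean-centered ratings.
    base = mean(ratings[active].values())
    norm = sum(abs(w) for w, _ in neighbors)
    if norm == 0:
        return base  # no usable neighbors: fall back to the user's own mean
    offset = sum(w * (ratings[u][item] - mean(ratings[u].values()))
                 for w, u in neighbors)
    return base + offset / norm


if __name__ == "__main__":
    print(round(predict(RATINGS, "alice", "D", k=2), 3))
```

In this toy data, bob and dave rate like alice while carol rates in the opposite direction, so with k=2 the prediction for item D is pulled above alice's own average. The weighting loop visits every user, which is the cost the notes point to when the number of users is large.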

    1. Based on: Recommender Systems by Prem Melville & Vikas Sindhwani. Presented by: Vijayindu Gamage, Udith Gunaratna, Pubudu Gunatilaka
    2. Recommender Systems
    3. Structure of a Recommender System
    4. Classification: Collaborative Filtering; Neighborhood-based Collaborative Filtering; Item-based Collaborative Filtering; Model-based Collaborative Filtering; Content-based Recommending; Hybrid Approaches
    5. Neighborhood-based Collaborative Filtering. Basic steps: assign a weight to all users with respect to similarity with the active user; select the k users that have the highest similarity with the active user (the neighborhood)
    6. Compute a prediction from a weighted combination of the selected neighbors' ratings.
    7. Neighborhood-based CF - Problem: FEW users … neighbors are EASY to find!
    8. Neighborhood-based CF - Problem: MANY users … neighbors are HARD to find!
    9. Item-based Collaborative Filtering: proposed in 2003; DOES NOT match similar users; DOES match similar items; leads to faster online systems; results in improved recommendations
    10. Item-based Collaborative Filtering: Pearson correlation is used; the rating of item i for user a is predicted
    11. More Extensions. Highly correlated neighbors based on very few co-rated items. Significance Weighting: multiply the similarity weight by a significance weighting factor. Default Voting: assume a default value for the rating of items that have not been explicitly rated. Inverse User Frequency: universally loved/hated items are bad for predictions
    12. Model-based Collaborative Filtering: uses statistical models for predictions; based on data mining and machine learning algorithms; latent factor and matrix factorization models have emerged as a state-of-the-art methodology; Netflix Prize competition
    13. Content-based Recommending. Pure collaborative filtering recommenders treat all users and items as atomic units. Better personalized recommendations can be made by knowing more about a user or an item: demographic information; movie genres; literary genres
    14. Content-based Recommending. [Diagram: User liked & Movie Genre, leading to Recommendation]
    15. Content-based Recommending. Focused on recommending items with associated textual content. Two approaches: treat as an Information Retrieval (IR) task; treat as a classification task
    16. Hybrid Approaches. Used to leverage the strengths of content-based and collaborative recommenders. Merging the result lists to produce a final list. Content-boosted collaborative filtering.
    17. Evaluation Metrics. Evaluation metrics are used to measure the quality of a recommender system. These systems are typically measured using predictive accuracy metrics: 1. Mean Absolute Error (MAE); 2. Root Mean Squared Error (RMSE)
    18. Mean Absolute Error
    19. Root Mean Squared Error (RMSE)
    20. Challenges and Limitations: Sparsity; Cold-Start Problem; Fraud (push attacks, nuke attacks)
    21. Sparsity. The user ratings matrix is typically very sparse. This affects collaborative filtering systems. The problem occurs when the system has a very high item-to-user ratio, or when the system is in the initial stages of use. Solution: making assumptions about the data generation process
    22. Cold-Start Problem. New items and new users pose a significant challenge to recommender systems. New-item problem: use a content-based approach to produce recommendations for all items. New-user problem: select items to be rated by a user so as to rapidly improve recommendation performance with the least user feedback
    23. Fraud. Push attacks: increase the rating of their own products. Nuke attacks: lower the ratings of their competitors. Item-based collaborative filtering is more robust to these attacks. Content-based methods are unaffected by profile injection attacks.
    24. Content-based or Collaborative Filtering. Advantages of CF over CB: CF can perform in domains where there is not much content associated with items. CF can also perform when content is difficult for a computer to analyze. A CF system has the ability to provide serendipitous recommendations.
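
The two predictive accuracy metrics named on the Evaluation Metrics slides can be computed as in the following minimal Python sketch, which follows their standard definitions. The function names `mae` and `rmse` and the sample rating lists are hypothetical, chosen only to show how a single large error affects the two metrics differently.

```python
from math import sqrt


def mae(predicted, actual):
    """Mean Absolute Error: average magnitude of the errors, ignoring sign."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def rmse(predicted, actual):
    """Root Mean Squared Error: square the errors, average, take the root."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical predicted and held-out test ratings for four user-item pairs.
predicted = [4.0, 3.0, 5.0, 2.0]
actual = [4.0, 4.0, 3.0, 2.0]

if __name__ == "__main__":
    print(mae(predicted, actual))   # errors 0, 1, 2, 0 average to 0.75
    print(rmse(predicted, actual))  # sqrt((0 + 1 + 4 + 0) / 4), about 1.118
```

Because the errors are squared before averaging, the single error of 2 dominates the RMSE, which is why it comes out larger than the MAE on the same data.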
