Recommender Systems.pptx

  1. Recommender Systems. Based on Rajaraman and Ullman, Mining of Massive Datasets, and Francesco Ricci et al., Recommender Systems Handbook.
  2. Recommender System
  3. Recommender System • Predict ratings for unrated items • Recommend top-𝑘 items
  4. RS – Major Approaches • Basic question: given the ratings matrix $R_{|U| \times |I|}$ (highly incomplete/sparse) and a pair $(u, i)$, predict $r_{u,i}$. (Slide shows a sparse example matrix over users 𝑢1–𝑢5 and items 𝑖1–𝑖6, with only about half of the entries filled in.)
  5. RS – Approaches • Content-based: how similar is 𝑖 to items 𝑢 has rated/liked in the past? – Use item metadata to measure similarity. + Works even when no ratings are available for the items in question. − Requires metadata! • Collaborative Filtering: identify an item (or user) with its rating vector; no metadata needed; but cold start is a problem.
  6. RS – Approaches • CF can be memory-based (as sketched on slide 5): item 𝑖’s characteristics are captured by the ratings it has received (its rating vector). • Or it can be model-based: model user/item behavior via latent factors (to be learned from data). – Dimensionality reduction – the original ratings matrix is usually of (very) low rank. → Matrix completion: • using singular value decomposition (SVD); • using matrix factorization (MF) [and variants] – see the sketch below. • MovieLens – an example of an RS using CF.
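To make the model-based route concrete, here is a minimal sketch of matrix factorization trained by SGD on the observed ratings only; the hyperparameters (k, lr, reg, epochs) and the toy data are illustrative assumptions, not from the slides.

```python
# A minimal sketch of model-based CF via matrix factorization, trained by SGD
# on the observed entries only (missing ratings are never touched).
import numpy as np

def factorize(observed, n_users, n_items, k=2, lr=0.02, reg=0.05, epochs=300, seed=0):
    """observed: list of (user_index, item_index, rating) triples."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, k))   # user latent factors
    Q = 0.1 * rng.standard_normal((n_items, k))   # item latent factors
    for _ in range(epochs):
        for u, i, r in observed:
            p, q = P[u].copy(), Q[i].copy()
            err = r - p @ q                        # error on this observed rating
            P[u] += lr * (err * q - reg * p)       # gradient step with L2 regularization
            Q[i] += lr * (err * p - reg * q)
    return P, Q

# Toy 3x4 ratings matrix with 7 observed entries; P @ Q.T "completes" the rest.
observed = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 2), (2, 1, 1), (2, 2, 5), (2, 3, 4)]
P, Q = factorize(observed, n_users=3, n_items=4)
print(np.round(P @ Q.T, 2))   # completed (approximate) ratings matrix
```

P @ Q.T fills in the missing cells, which is exactly the matrix-completion view mentioned above.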
  7. Collaborative Filtering
  8. Key concepts/questions • How is user feedback expressed: explicit ratings or implicit feedback? • How to measure similarity? • How many nearest neighbors to pick (if memory- or neighborhood-based)? • How to predict unknown ratings? • Distinguished (also called active) user and (target) item.
  9. A Naïve Algorithm (memory-based) • Find the top-ℓ most similar neighbors of the distinguished user 𝑢 (using a chosen similarity or proximity measure). • For every item 𝑖 rated by sufficiently many of these neighbors, compute 𝑟𝑢𝑖 by aggregating those neighbors’ ratings. • Sort the items by predicted rating and recommend the top-𝑘 items to 𝑢 (sketch below).
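A hedged sketch of this naïve algorithm; cosine similarity, the simple-average aggregation, and the parameters ell, k, min_support are illustrative choices, since the slide leaves the similarity measure and the aggregation open.

```python
# Sketch of the naïve memory-based algorithm. `ratings` maps each user to a
# dict {item: rating}; the similarity and thresholds below are assumptions.
import numpy as np

def cosine(ru, rv):
    common = set(ru) & set(rv)
    if not common:
        return 0.0
    num = sum(ru[j] * rv[j] for j in common)
    den = np.sqrt(sum(x * x for x in ru.values())) * np.sqrt(sum(x * x for x in rv.values()))
    return num / den

def recommend(ratings, u, ell=3, k=2, min_support=1):
    # 1. top-ℓ most similar neighbors of the distinguished user u
    sims = {v: cosine(ratings[u], ratings[v]) for v in ratings if v != u}
    neighbors = sorted(sims, key=sims.get, reverse=True)[:ell]
    # 2. for items u has not rated, aggregate the neighbors' ratings (simple average here)
    pool = {}
    for v in neighbors:
        for i, r in ratings[v].items():
            if i not in ratings[u]:
                pool.setdefault(i, []).append(r)
    preds = {i: float(np.mean(rs)) for i, rs in pool.items() if len(rs) >= min_support}
    # 3. sort by predicted rating and return the top-k items
    return sorted(preds, key=preds.get, reverse=True)[:k]
```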
  10. An Example • Ratings over items 𝑖1–𝑖7 (A = 𝑢1, B = 𝑢2, C = 𝑢3): 𝑢1: 𝑖1=4, 𝑖4=5, 𝑖5=1; 𝑢2: 𝑖1=5, 𝑖2=5, 𝑖3=4; 𝑢3: 𝑖4=2, 𝑖5=4, 𝑖6=5; 𝑢4: 𝑖3=3, 𝑖7=3. • Jaccard(A,B) = 1/5 < 2/4 = Jaccard(A,C)! • cos(A,B) = (4×5)/(‖A‖·‖B‖) ≈ 0.380 > 0.322 ≈ cos(A,C). – OK, but this ignores users’ internal “rating scales” → easy vs. hard graders. • See the Rajaraman et al. book for “rounded” Jaccard/cosine. • A more principled approach: subtract from each rating the corresponding user’s mean rating, then apply Jaccard/cosine.
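The numbers on this slide can be checked directly; the sketch below treats missing ratings as absent for Jaccard and as zero for cosine (A = u1, B = u2, C = u3).

```python
# Reproduces the slide's numbers: Jaccard on rated-item sets, cosine on raw ratings.
import numpy as np

A = {'i1': 4, 'i4': 5, 'i5': 1}          # u1
B = {'i1': 5, 'i2': 5, 'i3': 4}          # u2
C = {'i4': 2, 'i5': 4, 'i6': 5}          # u3

def jaccard(x, y):
    X, Y = set(x), set(y)
    return len(X & Y) / len(X | Y)

def cos(x, y):
    num = sum(x[j] * y[j] for j in set(x) & set(y))
    return num / (np.sqrt(sum(v * v for v in x.values())) * np.sqrt(sum(v * v for v in y.values())))

print(jaccard(A, B), jaccard(A, C))   # 0.2 vs 0.5
print(cos(A, B), cos(A, C))           # ≈ 0.380 vs ≈ 0.322
```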
  11. An Example • After subtracting each user’s mean rating: 𝑢1: 𝑖1=2/3, 𝑖4=5/3, 𝑖5=−7/3; 𝑢2: 𝑖1=1/3, 𝑖2=1/3, 𝑖3=−2/3; 𝑢3: 𝑖4=−5/3, 𝑖5=1/3, 𝑖6=4/3; 𝑢4: 𝑖3=0, 𝑖7=0. • See what just happened to the ratings! • Users’ behaviors (and items) are now better separated. • Cosine can now be positive or negative: check (A,B) and (A,C).
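Repeating the cosine check after mean-centering: (A, B) stays slightly positive while (A, C) turns negative, illustrating the slide's point (values are approximate).

```python
# Mean-centered cosine on the same example: the sign now carries information.
import numpy as np

def center(r):
    mu = np.mean(list(r.values()))
    return {j: v - mu for j, v in r.items()}

def cos(x, y):
    num = sum(x[j] * y[j] for j in set(x) & set(y))
    return num / (np.sqrt(sum(v * v for v in x.values())) * np.sqrt(sum(v * v for v in y.values())))

A = center({'i1': 4, 'i4': 5, 'i5': 1})   # u1
B = center({'i1': 5, 'i2': 5, 'i3': 4})   # u2
C = center({'i4': 2, 'i5': 4, 'i6': 5})   # u3

print(cos(A, B))   # ≈ +0.09  (mildly similar)
print(cos(A, C))   # ≈ -0.56  (dissimilar: opposite opinions on the shared items)
```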
  12. Prediction using Memory/Neighborhood-based approaches • A popular approach uses the Pearson correlation coefficient: $\hat{r}_{u,i} = \bar{r}_u + K \sum_{v \in N(u) \cap U(i)} w_{u,v}\,(r_{v,i} - \bar{r}_v)$, where $w_{u,v} = \dfrac{\sum_{j \in I(u) \cap I(v)} (r_{u,j} - \bar{r}_u)(r_{v,j} - \bar{r}_v)}{\sqrt{\sum_{j \in I(u) \cap I(v)} (r_{u,j} - \bar{r}_u)^2}\,\sqrt{\sum_{j \in I(u) \cap I(v)} (r_{v,j} - \bar{r}_v)^2}}$, $N(u)$ is 𝑢’s neighborhood, $U(i)$ the set of users who rated 𝑖, $I(u)$ the set of items rated by 𝑢, and $K$ a normalization factor.
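A minimal sketch of this Pearson-weighted prediction; the slide leaves K abstract, so the code assumes the common choice K = 1 / Σ|w_uv| over the contributing neighbors.

```python
# Sketch of the Pearson-weighted user-based prediction above.
# K is taken as 1 / sum(|w_uv|) over contributing neighbors (an assumption).
import numpy as np

def pearson(ru, rv):
    common = set(ru) & set(rv)
    if len(common) < 2:
        return 0.0
    mu, mv = np.mean(list(ru.values())), np.mean(list(rv.values()))
    num = sum((ru[j] - mu) * (rv[j] - mv) for j in common)
    du = np.sqrt(sum((ru[j] - mu) ** 2 for j in common))
    dv = np.sqrt(sum((rv[j] - mv) ** 2 for j in common))
    return num / (du * dv) if du and dv else 0.0

def predict(ratings, u, i, neighbors):
    """ratings: user -> {item: rating}; neighbors: candidate users for N(u)."""
    mu = np.mean(list(ratings[u].values()))
    num = den = 0.0
    for v in neighbors:
        if v != u and i in ratings[v]:
            w = pearson(ratings[u], ratings[v])
            mv = np.mean(list(ratings[v].values()))
            num += w * (ratings[v][i] - mv)   # weighted, mean-centered neighbor rating
            den += abs(w)
    return mu + num / den if den else mu      # fall back to u's mean if no evidence
```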
  13. User-User vs Item-Item. • User-User CF: what we just discussed! • Item-Item – dual in principle: find items most similar to distinguished item 𝑖; for every user 𝑢 who did not rate the distinguished item but rated sufficiently many from the similarity group, compute 𝑟𝑢𝑖. • In practice, item-item has been found to be better than user-user.
  14. Simpler Alternatives for Rating Estimation • Simple average of the ratings by the most similar neighbors. • Weighted average. • User’s mean plus an offset corresponding to the weighted average of the offsets by the most similar neighbors (Pearson!). • Or use the popular vote among the most similar neighbors: e.g., 𝑢 has 5 most similar neighbors who have rated 𝑖. – 𝑣1, 𝑣2 rated 1; 𝑣3 rated 3; 𝑣4 rated 4; 𝑣5 rated 5. – Simple majority: 𝑟𝑢𝑖 = 1. – Suppose 𝑤𝑢𝑣1 = 𝑤𝑢𝑣2 = 0.2, 𝑤𝑢𝑣3 = 0.3, 𝑤𝑢𝑣4 = 0.8, 𝑤𝑢𝑣5 = 1.0. Then the weighted vote gives 𝑟𝑢𝑖 = 5. Ties are broken arbitrarily (sketch below).
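The vote-based alternative from the last bullet, as a small sketch using the slide's numbers.

```python
# Simple majority vs. similarity-weighted vote among the 5 neighbors from the slide.
from collections import Counter, defaultdict

votes = {'v1': 1, 'v2': 1, 'v3': 3, 'v4': 4, 'v5': 5}          # neighbors' ratings of item i
w = {'v1': 0.2, 'v2': 0.2, 'v3': 0.3, 'v4': 0.8, 'v5': 1.0}    # similarities w_uv

# Simple majority (plurality of raw ratings): rating 1 wins with two votes.
print(Counter(votes.values()).most_common(1)[0][0])             # -> 1

# Weighted vote: each rating value accumulates its voters' similarities; 5 wins (1.0 > 0.8 > 0.4 > 0.3).
score = defaultdict(float)
for v, r in votes.items():
    score[r] += w[v]
print(max(score, key=score.get))                                # -> 5
```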
  15. Item-based CF • Dual to user-based CF, in principle. • “People who bought 𝑆 also bought 𝑇”. • Natural connection to association rules (each user = a transaction). • Predict the unknown rating of user 𝑢 on item 𝑖 as an aggregate of 𝑢’s ratings on items similar to 𝑖. • E.g., using mean-centering and Pearson correlation for item–item similarity, $\hat{r}_{u,i} = \bar{r}_i + K \sum_{j \in I(u) \cap N(i)} w_{i,j}\,(r_{u,j} - \bar{r}_j)$, where $\bar{r}_i$ is the mean rating of item 𝑖 across users, $w_{i,j}$ is the similarity between items 𝑖 and 𝑗, $N(i)$ is 𝑖’s neighborhood of similar items, and $K$ is the usual normalization factor.
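A sketch of this item-based prediction; the precomputed similarity dictionary `sim` and the positive-similarity neighborhood cutoff are assumptions for illustration.

```python
# Item-based dual of the user-based formula: predict r_ui from u's ratings on items similar to i.
def predict_item_based(ratings, item_means, sim, u, i):
    """ratings: user -> {item: rating}; item_means: item -> mean rating;
    sim: precomputed item-pair similarities, e.g. sim[(i, j)] = w_ij."""
    num = den = 0.0
    for j, r_uj in ratings[u].items():
        w = sim.get((i, j), 0.0)
        if w > 0:                                 # restrict to i's neighborhood N(i)
            num += w * (r_uj - item_means[j])     # mean-centered contribution of item j
            den += abs(w)
    return item_means[i] + num / den if den else item_means[i]
```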
  16. Item-based CF Computation Illustrated • Similarities: computing the similarity between all pairs of items is prohibitive! • But do we need to? • How efficiently can we compute the similarity of all pairs of items for which the similarity is positive? (Slide figure: a sparse ratings matrix highlighting an item 𝑖 and a user 𝑢; a sketch follows.)
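One possible answer, sketched under the assumption that a pair of items can only have nonzero (hence positive) similarity if at least one user rated both: enumerate item pairs per user, so only pairs that share a rater are ever touched, instead of all |I|² pairs.

```python
# Enumerate only co-rated item pairs (a standard inverted-index trick; sketch only).
from collections import defaultdict
from itertools import combinations

def cooccurring_pairs(ratings):
    """ratings: user -> {item: rating}. Returns co-rating counts per item pair."""
    counts = defaultdict(int)
    for u, items in ratings.items():
        for i, j in combinations(sorted(items), 2):   # pairs of items this user rated
            counts[(i, j)] += 1
    return counts   # similarity need only be computed for these pairs

ratings = {'u1': {'i1': 4, 'i4': 5, 'i5': 1},
           'u2': {'i1': 5, 'i2': 5, 'i3': 4},
           'u3': {'i4': 2, 'i5': 4, 'i6': 5}}
print(cooccurring_pairs(ratings))
```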
  17. Item-based CF – Recommendation Generation • (Slide figure: a sparse ratings matrix highlighting a user 𝑢, the items they have rated, and the question “similar items?” for each.) • How efficiently can we generate recommendations for a given user?
  18. Some empirical facts re. user-based vs. item-based CF • User profiles are typically thinner than item profiles; this depends on the application domain. – Certainly holds for movies (Netflix). • → As users provide more ratings, user–user similarities can change more dynamically than item–item similarities. • Can we precompute item–item similarities and speed up prediction computation? • What about refreshing the similarities against updates? Can we do it incrementally? How often should we do this? • Why not do this for user–user?
  19. User & Item-based CF are both personalized • Non-personalized would estimate an unknown rating as a global average. • Every user gets the same recommendation list, modulo items s/he may have already rated. • Personalized clearly leads to better predictions.