Mar. 24, 2023


Data & Analytics

Finance: Boosting is used with deep learning models to automate critical tasks, including fraud detection, pricing analysis, and more. For example, boosting methods in credit card fraud detection and financial products pricing analysis improve the accuracy of analyzing massive data sets to minimize financial losses.

Namrata Vij


- Recommender Systems. Based on Rajaraman and Ullman, Mining of Massive Datasets, and Francesco Ricci et al., Recommender Systems Handbook.
- Recommender System
- Recommender System
  - Predict ratings for unrated items
  - Recommend top-k items
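- RS – Major Approaches
  - Basic question: given a ratings matrix $R$ of size $|U| \times |I|$ (highly incomplete/sparse) and given $u$, $i$, predict $r_{u,i}$.
  - Example: a sparse ratings matrix over users $u_1, \dots, u_5$ and items $i_1, \dots, i_6$, in which only some cells hold a rating (between 1 and 5) and the rest are to be predicted.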
- RS – Approaches
  - Content-based: how similar is $i$ to items $u$ has rated/liked in the past?
    - Uses metadata to measure similarity.
    - (+) Works even when no ratings are available on the affected items.
    - (−) Requires metadata!
  - Collaborative Filtering: identify items (users) with their rating vectors; no need for metadata, but cold-start is a problem.
- RS – Approaches
  - CF can be memory-based (as sketched on p. 5): item $i$'s “characteristics” are captured by the ratings it has received (its rating vector).
  - Or it can be model-based: model user/item behavior via latent factors (to be learned from data).
    - Dimensionality reduction: the original ratings matrix is usually (very) low rank.
    - Matrix completion:
      - using singular value decomposition (SVD);
      - using matrix factorization (MF) [and variants].
  - MovieLens: an example of an RS using CF.
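To make the model-based idea concrete, here is a minimal sketch of matrix factorization trained by stochastic gradient descent on the observed ratings only. The toy triples, the function names (`factorize`, `squared_error`), and the hyperparameters (`rank`, `lr`, `reg`, `epochs`) are illustrative assumptions, not taken from the slides.

```python
import random

def dot(p, q):
    """Inner product of two latent-factor vectors."""
    return sum(a * b for a, b in zip(p, q))

def squared_error(triples, P, Q):
    """Sum of squared errors over the observed (user, item, rating) triples."""
    return sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in triples)

def factorize(triples, n_users, n_items, rank=2, lr=0.01, reg=0.02,
              epochs=1000, seed=0):
    """Learn user factors P and item factors Q so that P[u].Q[i] ~ r_ui."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in triples:
            err = r - dot(P[u], Q[i])
            for f in range(rank):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

# Hypothetical toy data: (user index, item index, rating).
triples = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 1),
           (2, 1, 5), (2, 2, 2), (3, 0, 1), (3, 2, 5)]
P, Q = factorize(triples, n_users=4, n_items=3)
print("training squared error:", squared_error(triples, P, Q))
print("predicted rating of user 0 on unrated item 2:", dot(P[0], Q[2]))
```

Only observed cells contribute to the loss, which is what distinguishes this from a plain SVD of a matrix with the gaps filled in.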
- Collaborative Filtering
- Key concepts/questions
  - How is user feedback expressed: explicit ratings or implicit signals?
  - How to measure similarity?
  - How many nearest neighbors to pick (if memory- or neighborhood-based)?
  - How to predict unknown ratings?
  - Terminology: the distinguished (also called active) user and the (target) item.
- A Naïve Algorithm (memory-based)
  - Find the top-$\ell$ most similar neighbors of the distinguished user $u$ (using the chosen similarity or proximity measure).
  - For every item $i$ rated by sufficiently many of these neighbors, compute $r_{ui}$ by aggregating those neighbors' ratings of $i$.
  - Sort the items by predicted rating and recommend the top-$k$ items to $u$.
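The three steps of the naïve algorithm can be sketched as follows. This is one possible instantiation, assuming cosine similarity on raw ratings and a simple average as the aggregate; the toy data, the function names, and the parameters `num_neighbors`, `min_support`, and `top_k` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine of two sparse rating vectors (dicts item -> rating)."""
    dot = sum(a[i] * b[i] for i in a.keys() & b.keys())
    norm = math.sqrt(sum(r * r for r in a.values())) * \
           math.sqrt(sum(r * r for r in b.values()))
    return dot / norm if norm else 0.0

def recommend(ratings, user, num_neighbors, min_support, top_k):
    target = ratings[user]
    # Step 1: the most similar neighbors of the distinguished user.
    sims = {v: cosine(target, r) for v, r in ratings.items() if v != user}
    neighbors = sorted(sims, key=sims.get, reverse=True)[:num_neighbors]
    # Step 2: aggregate the neighbors' ratings of items the user has not rated.
    votes = {}
    for v in neighbors:
        for item, r in ratings[v].items():
            if item not in target:
                votes.setdefault(item, []).append(r)
    predicted = {item: sum(rs) / len(rs)
                 for item, rs in votes.items() if len(rs) >= min_support}
    # Step 3: sort by predicted rating (item name breaks ties) and keep top-k.
    ranked = sorted(predicted.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]

toy = {
    "ann": {"a": 5, "b": 4},
    "bob": {"a": 5, "b": 5, "c": 4, "d": 2},
    "cat": {"a": 4, "b": 4, "c": 5},
    "dan": {"d": 5, "e": 5},
}
print(recommend(toy, "ann", num_neighbors=2, min_support=2, top_k=1))
# [('c', 4.5)]
```

Here `min_support` plays the role of "rated by sufficiently many" neighbors: item `d` is dropped because only one of the two chosen neighbors rated it.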
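- An Example

  Ratings matrix (the utility-matrix example from Rajaraman and Ullman; A, B, C below are $u_1$, $u_2$, $u_3$):

  |       | $i_1$ | $i_2$ | $i_3$ | $i_4$ | $i_5$ | $i_6$ | $i_7$ |
  |-------|-------|-------|-------|-------|-------|-------|-------|
  | $u_1$ | 4     |       |       | 5     | 1     |       |       |
  | $u_2$ | 5     | 5     | 4     |       |       |       |       |
  | $u_3$ |       |       |       | 2     | 4     | 5     |       |
  | $u_4$ |       | 3     |       |       |       |       | 3     |

  - $\mathrm{Jaccard}(A,B) = 1/5 < 2/4 = \mathrm{Jaccard}(A,C)$!
  - $\cos(A,B) = \dfrac{4 \times 5}{\|A\|\,\|B\|} \approx 0.380 > 0.322 \approx \cos(A,C)$.
    - OK, but this ignores users' internal “rating scales” (easy vs. hard graders).
  - See the Rajaraman et al. book for “rounded” Jaccard/cosine.
  - A more principled approach: subtract from each rating the corresponding user's mean rating, then apply Jaccard/cosine.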
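The similarity values quoted in this example can be reproduced with a short sketch. It assumes the cell positions of the Rajaraman and Ullman utility matrix ($u_1$ rated $i_1, i_4, i_5$; $u_2$ rated $i_1, i_2, i_3$; $u_3$ rated $i_4, i_5, i_6$; $u_4$ rated $i_2, i_7$) and treats missing ratings as 0 for the cosine; the function names are illustrative.

```python
import math

# user -> {item: rating}; u1, u2, u3 are the users called A, B, C.
ratings = {
    "u1": {"i1": 4, "i4": 5, "i5": 1},
    "u2": {"i1": 5, "i2": 5, "i3": 4},
    "u3": {"i4": 2, "i5": 4, "i6": 5},
    "u4": {"i2": 3, "i7": 3},
}

def jaccard(a, b):
    """Overlap of the sets of rated items; rating values are ignored."""
    items_a, items_b = set(a), set(b)
    return len(items_a & items_b) / len(items_a | items_b)

def cosine(a, b):
    """Cosine of two sparse rating vectors; missing ratings count as 0."""
    dot = sum(a[i] * b[i] for i in a.keys() & b.keys())
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b)

A, B, C = ratings["u1"], ratings["u2"], ratings["u3"]
print(jaccard(A, B), jaccard(A, C))                    # 0.2 0.5
print(round(cosine(A, B), 3), round(cosine(A, C), 3))  # 0.38 0.322
```

Jaccard ranks C closer to A than B because it only counts which items were rated, even though A and C disagree on both items they share.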
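- An Example

  The same matrix after subtracting each user's mean rating:

  |       | $i_1$ | $i_2$ | $i_3$  | $i_4$  | $i_5$  | $i_6$ | $i_7$ |
  |-------|-------|-------|--------|--------|--------|-------|-------|
  | $u_1$ | 2/3   |       |        | 5/3    | -7/3   |       |       |
  | $u_2$ | 1/3   | 1/3   | -2/3   |        |        |       |       |
  | $u_3$ |       |       |        | -5/3   | 1/3    | 4/3   |       |
  | $u_4$ |       | 0     |        |        |        |       | 0     |

  - See what just happened to the ratings!
  - Behavior and items are now better separated.
  - Cosine can now be positive or negative: check (A,B) and (A,C).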
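The check suggested for (A,B) and (A,C) can be sketched like this, again assuming the cell positions of the Rajaraman and Ullman utility matrix. The helper names are illustrative, and returning 0 for an all-zero vector (such as $u_4$ after centering) is a convention chosen here, not something the slides specify.

```python
import math

ratings = {
    "u1": {"i1": 4, "i4": 5, "i5": 1},
    "u2": {"i1": 5, "i2": 5, "i3": 4},
    "u3": {"i4": 2, "i5": 4, "i6": 5},
    "u4": {"i2": 3, "i7": 3},
}

def mean_center(user_ratings):
    """Subtract the user's own mean rating from each of their ratings."""
    mean = sum(user_ratings.values()) / len(user_ratings)
    return {item: r - mean for item, r in user_ratings.items()}

def cosine(a, b):
    """Cosine of two sparse vectors; defined as 0 if either has zero norm."""
    dot = sum(a[i] * b[i] for i in a.keys() & b.keys())
    norm = math.sqrt(sum(x * x for x in a.values())) * \
           math.sqrt(sum(x * x for x in b.values()))
    return dot / norm if norm else 0.0

centered = {u: mean_center(r) for u, r in ratings.items()}
A, B, C = centered["u1"], centered["u2"], centered["u3"]
print(round(cosine(A, B), 3))  # 0.092
print(round(cosine(A, C), 3))  # -0.559
```

After centering, A and B remain mildly similar while A and C come out clearly dissimilar, which matches the intuition that A and C rate their shared items in opposite ways.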
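- Prediction using Memory/Neighborhood-based approaches
  - A popular approach uses the Pearson correlation coefficient.
  - $r_{ui} = \bar{r}_u + K \sum_{v \in N(u) \cap U(i)} w_{uv}\,(r_{vi} - \bar{r}_v)$, where
    $w_{uv} = \dfrac{\sum_{j \in I(u) \cap I(v)} (r_{uj} - \bar{r}_u)(r_{vj} - \bar{r}_v)}{\sqrt{\sum_{j \in I(u) \cap I(v)} (r_{uj} - \bar{r}_u)^2}\;\sqrt{\sum_{j \in I(u) \cap I(v)} (r_{vj} - \bar{r}_v)^2}}$.
  - Here $N(u)$ is the set of neighbors of $u$, $U(i)$ the users who rated $i$, $I(u)$ the items rated by $u$, $\bar{r}_u$ the mean rating of $u$, and $K$ a normalization factor (commonly $K = 1/\sum_{v} |w_{uv}|$).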
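A minimal sketch of this Pearson-based prediction follows. It assumes that each user's mean is taken over all of that user's ratings and that $K = 1/\sum |w_{uv}|$; the toy ratings and the function names are hypothetical.

```python
import math

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pearson(ru, rv):
    """Pearson weight w_uv over the items rated by both users."""
    common = ru.keys() & rv.keys()
    mu, mv = mean(ru.values()), mean(rv.values())
    num = sum((ru[j] - mu) * (rv[j] - mv) for j in common)
    den = math.sqrt(sum((ru[j] - mu) ** 2 for j in common)) * \
          math.sqrt(sum((rv[j] - mv) ** 2 for j in common))
    return num / den if den else 0.0

def predict(ratings, u, i, neighbors):
    """r_ui = mean(u) + K * sum of w_uv * (r_vi - mean(v)), K = 1 / sum |w_uv|."""
    base = mean(ratings[u].values())
    num = norm = 0.0
    for v in neighbors:
        if i not in ratings[v]:
            continue  # only neighbors who rated i contribute
        w = pearson(ratings[u], ratings[v])
        num += w * (ratings[v][i] - mean(ratings[v].values()))
        norm += abs(w)
    return base + num / norm if norm else base

toy = {
    "u": {"a": 5, "b": 3, "c": 1},
    "v": {"a": 4, "b": 3, "c": 2, "d": 5},
    "w": {"a": 1, "b": 3, "c": 5, "d": 1},
}
print(predict(toy, "u", "d", neighbors=["v", "w"]))
```

In this toy case `v` agrees with `u` and rated `d` above its own mean, while `w` disagrees with `u` and rated `d` below its own mean, so both neighbors push the prediction above `u`'s mean of 3 (to about 4.5).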
- User-User vs Item-Item
  - User-User CF: what we just discussed!
  - Item-Item is dual in principle: find the items most similar to the distinguished item $i$; for every user $u$ who did not rate the distinguished item but rated sufficiently many items from the similarity group, compute $r_{ui}$.
  - In practice, item-item has been found to be better than user-user.
- Simpler Alternatives for Rating Estimation
  - Simple average of the ratings by the most similar neighbors.
  - Weighted average.
  - User's mean plus an offset, where the offset is the weighted average of the most similar neighbors' offsets from their own means (Pearson!).
  - Or take the popular vote among the most similar neighbors. E.g., $u$ has 5 most similar neighbors who have rated $i$:
    - $v_1, v_2$ rated 1; $v_3$ rated 3; $v_4$ rated 4; $v_5$ rated 5.
    - Simple majority: $r_{ui} = 1$.
    - Weighted vote: suppose $w_{uv_1} = w_{uv_2} = 0.2$, $w_{uv_3} = 0.3$, $w_{uv_4} = 0.8$, $w_{uv_5} = 1.0$. Then $r_{ui} = 5$, since rating 5 collects weight 1.0 versus 0.8 for rating 4, 0.4 for rating 1, and 0.3 for rating 3. Tie-breaking is arbitrary.
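The voting example, together with the simple and weighted averages, can be sketched as below. The function names are illustrative, and breaking ties toward the smallest rating is an arbitrary choice made here.

```python
from collections import defaultdict

def simple_average(neighbor_ratings):
    return sum(neighbor_ratings.values()) / len(neighbor_ratings)

def weighted_average(neighbor_ratings, weights):
    total = sum(weights[v] for v in neighbor_ratings)
    return sum(weights[v] * r for v, r in neighbor_ratings.items()) / total

def vote(neighbor_ratings, weights=None):
    """Return the rating with the largest (optionally weighted) vote."""
    tally = defaultdict(float)
    for v, rating in neighbor_ratings.items():
        tally[rating] += 1.0 if weights is None else weights[v]
    best = max(tally.values())
    # Ties are broken toward the smallest rating (an arbitrary choice).
    return min(r for r, t in tally.items() if t == best)

neighbor_ratings = {"v1": 1, "v2": 1, "v3": 3, "v4": 4, "v5": 5}
weights = {"v1": 0.2, "v2": 0.2, "v3": 0.3, "v4": 0.8, "v5": 1.0}

print(vote(neighbor_ratings))           # 1
print(vote(neighbor_ratings, weights))  # 5
print(simple_average(neighbor_ratings), weighted_average(neighbor_ratings, weights))
```

On the same five neighbors the four estimators disagree noticeably: the two averages give about 2.8 and 3.8, while the votes give 1 and 5.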
- Item-based CF
  - Dual to user-based CF, in principle.
  - “People who bought $S$ also bought $T$.”
  - Natural connection to association rules (each user = a transaction).
  - Predict the unknown rating of user $u$ on item $i$ as the aggregate of the ratings by $u$ on items similar to $i$.
  - E.g., using mean-centering and Pearson correlation for item-item similarity, $r_{ui} = \bar{r}_i + K \sum_{j \in I(u) \cap N(i)} w_{ij}\,(r_{uj} - \bar{r}_j)$, where $\bar{r}_i$ is the mean rating of $i$ by various users, $w_{ij}$ is the similarity between $i$ and $j$, and $K$ is the usual normalization factor.
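The item-based prediction formula can be sketched on its own, with the item-item similarities taken as given. This assumes $K = 1/\sum |w_{ij}|$; the function name, the `sims` dictionary keyed by item pairs, and the toy numbers are hypothetical.

```python
def predict_item_based(user_ratings, item_means, sims, i, neighbors):
    """r_ui = mean(i) + K * sum over rated neighbors j of w_ij * (r_uj - mean(j))."""
    num = norm = 0.0
    for j in neighbors:
        if j not in user_ratings:
            continue  # only items the user has actually rated contribute
        w = sims[(i, j)]
        num += w * (user_ratings[j] - item_means[j])
        norm += abs(w)
    offset = num / norm if norm else 0.0
    return item_means[i] + offset

# Hypothetical inputs: the user rated j1 and j2 but not j3 or the target item i.
user_ratings = {"j1": 5, "j2": 2}
item_means = {"i": 3.0, "j1": 4.0, "j2": 3.0, "j3": 2.0}
sims = {("i", "j1"): 0.75, ("i", "j2"): 0.25, ("i", "j3"): 0.5}

print(predict_item_based(user_ratings, item_means, sims, "i", ["j1", "j2", "j3"]))
# 3.5
```

The user liked `j1` more than its average and `j2` less; because `j1` is the more similar item, the prediction lands above the mean rating of `i`.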
- Item-based CF Computation Illustrated
  - Similarities: computing the similarity between all pairs of items is prohibitive!
  - But do we need to?
  - How efficiently can we compute the similarity of all pairs of items for which the similarity is positive?
  - [Figure: ratings matrix with X marks for observed ratings; item $i$ and user $u$ are labeled.]
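One possible answer to the question above, sketched under the assumption of cosine similarity on raw non-negative ratings: two items can only have positive similarity if some user rated both, so it is enough to walk each user's rating list and accumulate dot products for the item pairs found there. The function name is illustrative, and the data assumes the cell positions of the Rajaraman and Ullman example.

```python
import math
from collections import defaultdict
from itertools import combinations

def item_cosines(ratings):
    """Cosine similarity for every pair of items co-rated by at least one user."""
    dots = defaultdict(float)      # (item_a, item_b) -> partial dot product
    sq_norms = defaultdict(float)  # item -> squared norm of its rating column
    for user_ratings in ratings.values():
        for item, r in user_ratings.items():
            sq_norms[item] += r * r
        # Only pairs inside one user's profile can have a non-zero dot product.
        for (a, ra), (b, rb) in combinations(sorted(user_ratings.items()), 2):
            dots[(a, b)] += ra * rb
    return {pair: d / math.sqrt(sq_norms[pair[0]] * sq_norms[pair[1]])
            for pair, d in dots.items()}

ratings = {
    "u1": {"i1": 4, "i4": 5, "i5": 1},
    "u2": {"i1": 5, "i2": 5, "i3": 4},
    "u3": {"i4": 2, "i5": 4, "i6": 5},
    "u4": {"i2": 3, "i7": 3},
}
sims = item_cosines(ratings)
print(len(sims), "of 21 item pairs have positive similarity")
print(round(sims[("i4", "i5")], 3))
```

The work is proportional to the sum of squared profile sizes rather than to the number of item pairs, which helps when user profiles are thin.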
- Item-based CF – Recommendation Generation
  - [Figure: ratings matrix with X marks for observed ratings; item $i$ and user $u$ are labeled, with groups of entries annotated “similar items?”.]
  - How efficiently can we generate recommendations for a given user?
- Some empirical facts re. user-based vs. item-based CF
  - User profiles are typically thinner than item profiles; this depends on the application domain.
    - Certainly holds for movies (Netflix).
  - As users provide more ratings, user-user similarities can change more dynamically than item-item similarities.
  - Can we precompute item-item similarities and speed up prediction computation?
  - What about refreshing similarities against updates? Can we do it incrementally? How often should we do this?
  - Why not do this for user-user?
- User & Item-based CF are both personalized
  - Non-personalized would estimate an unknown rating as a global average.
  - Every user gets the same recommendation list, modulo items they may have already rated.
  - Personalized clearly leads to better predictions.
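For contrast, a non-personalized baseline can be sketched in a few lines. It interprets "global average" as each item's mean rating over all users, which is an assumption, and the function names are illustrative; the data assumes the cell positions of the Rajaraman and Ullman example.

```python
def item_means(ratings):
    """Mean rating of each item over all users who rated it."""
    totals, counts = {}, {}
    for user_ratings in ratings.values():
        for item, r in user_ratings.items():
            totals[item] = totals.get(item, 0) + r
            counts[item] = counts.get(item, 0) + 1
    return {item: totals[item] / counts[item] for item in totals}

def recommend_non_personalized(ratings, user, top_k):
    """Same global ranking for everyone, minus items the user already rated."""
    means = item_means(ratings)
    seen = ratings.get(user, {})
    ranked = sorted((item for item in means if item not in seen),
                    key=lambda item: (-means[item], item))
    return ranked[:top_k]

ratings = {
    "u1": {"i1": 4, "i4": 5, "i5": 1},
    "u2": {"i1": 5, "i2": 5, "i3": 4},
    "u3": {"i4": 2, "i5": 4, "i6": 5},
    "u4": {"i2": 3, "i7": 3},
}
print(recommend_non_personalized(ratings, "u1", top_k=2))  # ['i6', 'i2']
print(recommend_non_personalized(ratings, "u4", top_k=2))  # ['i6', 'i1']
```

The two lists differ only because each user's already-rated items are filtered out; the underlying ranking is identical for everyone.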