Collaborative Filtering 1: User-based CF
1. Recommender Systems & Collaborative Filtering
Yusuke Yamamoto
Faculty of Informatics
Senior Lecturer
yusuke_yamamoto@acm.org
Data Engineering (Recommender System 1)
2019.10.16
12. Why Use Recommender Systems?
Value for users
● Find things that are interesting
● Narrow down the set of choices
Value for providers
● Increase trust and customer loyalty
● Increase sales, click rates, conversion etc.
● Discover new things
● Opportunities for promotion
13. Definition of Recommender System
Explicit preference info.: rating, comment, retweet, favorite artist/genre/brand settings, …
+
Implicit preference info.: purchase, dwell time, clickthrough, bookmark, …
→ Predicts a user preference model and decides which items should be recommended.
14. Definition of Recommender System
[Figure: items of many kinds (products, music, web pages, events, ads, …) recommended to a user]
Predicts a user preference model and decides which items should be recommended.
15. Paradigms of Recommender Systems
(Adapted from Dietmar Jannach’s slides for Recommender Systems: An Introduction)
[Figure: the recommender system takes the user profile & context together with (1) community data, (2) item features, and (3) a user model, and outputs a recommendation list for the target user, e.g.:]
item    score
item1   0.9
item2   1
item3   0.3
…       …
16. Three main approaches to recommendation
Collaborative filtering
Decides which items should be recommended based on the past behavior logs of similar users
Content-based filtering
Decides which items should be recommended based on item features and metadata
Knowledge-based filtering
Decides which items should be recommended based on preference information that users state explicitly
17. Problem Definition
Input
• User u’s behavior data set: $B_u = \{b_1, b_2, \dots, b_n\}$
• Item set: $I = \{i_1, i_2, \dots, i_m\}$
Output
• Ranked list of the item set $I$ ($\forall i \in I$), based on $\mathrm{Rel}(p_u, i)$
where
• User u’s profile (user model): $p_u$
• Relevance between $p_u$ and item $i$: $\mathrm{Rel}(p_u, i)$
Point
• How to model user profiles?
• How to compute relevance?
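A minimal sketch of this problem definition, assuming only that some relevance function Rel(p_u, i) is available: every item in I is scored and sorted by descending relevance. The profile representation (a set of liked genres), the item genres in `ITEM_GENRES`, and the genre-overlap `rel` are hypothetical choices for illustration; the two "Point" questions are exactly about replacing them with real models.

```python
from typing import Callable, Iterable, List, Set

def rank_items(p_u: Set[str], I: Iterable[str],
               rel: Callable[[Set[str], str], float]) -> List[str]:
    """Return every item i in I, sorted by Rel(p_u, i), highest first."""
    return sorted(I, key=lambda i: rel(p_u, i), reverse=True)

# Hypothetical item metadata: item -> set of genres.
ITEM_GENRES = {"item1": {"rock"}, "item2": {"pop"}, "item3": {"rock", "jazz"}}

def rel(p_u: Set[str], i: str) -> float:
    """Hypothetical relevance: number of genres shared by profile and item."""
    return float(len(p_u & ITEM_GENRES[i]))

p_u = {"rock", "jazz"}  # hypothetical user profile
print(rank_items(p_u, ITEM_GENRES.keys(), rel))  # ['item3', 'item1', 'item2']
```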
18. Contents of this lecture
1. Collaborative Filtering (CF)
2. Content-based Filtering
3. Link analysis
4. Advanced CF
Lecture + programming exercises, so that you can see how the methods work
21. Collaborative Filtering (CF)
Approach
Uses the preferences of a community of users to recommend items
Basic assumptions
• Users give appropriate ratings to items
• Patterns in the rating data help us predict unknown ratings
Practical points
• Used by large commercial e-commerce sites
• Well understood
• Applicable in many domains, as long as rating data can be obtained
22. Example
Q. How much does Alice like Item 5?
[Figure: Alice has purchased and rated Items 1–4 (✓); Item 5 is un-purchased, so her rating for it is unknown (?)]
23. Let’s observe other users’ ratings
Q. Can we predict Alice’s rating using others’ ratings?
       Item1  Item2  Item3  Item4  Item5
Alice    5      3      4      4      ?
User1    3      1      2      3      3
User2    4      3      4      3      5
User3    3      3      1      5      4
User4    1      5      5      2      1
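One way to hold this rating matrix in code, as a sketch: a dict of dicts where `None` marks the unknown rating. Each user's average is taken over the ratings that user actually gave; these averages (4.0 for Alice, 2.4 for User1, 3.8 for User2) match the "Average rating" column used in the prediction example on slide 33. The names `RATINGS` and `mean_rating` are illustrative, not from the lecture.

```python
from typing import Dict, Optional

# Rating matrix from slide 23; None marks a missing (unknown) rating.
RATINGS: Dict[str, Dict[str, Optional[int]]] = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4, "Item5": None},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def mean_rating(user: str) -> float:
    """Average of the ratings the user actually gave (missing ones skipped)."""
    given = [r for r in RATINGS[user].values() if r is not None]
    return sum(given) / len(given)

for user in RATINGS:
    print(user, round(mean_rating(user), 2))
# Alice 4.0, User1 2.4, User2 3.8, User3 3.2, User4 2.8 (one per line)
```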
24. Let’s observe other users’ ratings
Q. Can we predict Alice’s rating using others’ ratings?
(Same rating table as slide 23, now with users marked as Similar or Dissimilar to Alice: e.g., User1 and User2 rate Items 1–4 in a pattern similar to Alice’s, while User4’s pattern is roughly the opposite.)
25. Let’s observe other users’ ratings
Q. Can we predict Alice’s rating using others’ ratings?
(Same rating table as slide 23.) Will Alice give Item 5 a rating of about 5?
26. User-based Collaborative Filtering
Idea: Use the ratings of users with similar preferences
Point
1. How to compute user similarity?
2. How do we combine the ratings of the similar users to predict Alice’s rating?
3. Which/how many similar users’ ratings should we consider?
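27. Similarity between users (1/3)
Pearson Correlation Coefficient
$$\mathrm{sim}(u_a, u_b) = \frac{\sum_{i \in I} (r_{a,i} - \bar{r}_a)(r_{b,i} - \bar{r}_b)}{\sqrt{\sum_{i \in I} (r_{a,i} - \bar{r}_a)^2}\,\sqrt{\sum_{i \in I} (r_{b,i} - \bar{r}_b)^2}}$$
$u_a, u_b$: users a and b
$r_{a,i}$: user a’s rating of item i
$I$: set of items rated by both users
$\bar{r}_a, \bar{r}_b$: average ratings of users a and b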
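A sketch of the Pearson similarity above, assuming the averages $\bar{r}_a, \bar{r}_b$ are taken over the co-rated items $I$ only; under that assumption it reproduces the values on the next slides (0.85, 0.71, -0.79). Returning 0.0 when there are fewer than two co-rated items or a user's ratings are constant is an assumption of this sketch, not something the lecture specifies.

```python
import math
from typing import Dict, Optional

Ratings = Dict[str, Optional[float]]  # item -> rating, None = not rated

def pearson_sim(a: Ratings, b: Ratings) -> float:
    """Pearson correlation of two users over the items both have rated."""
    common = [i for i in a if a[i] is not None and b.get(i) is not None]
    if len(common) < 2:
        return 0.0  # assumption: too little overlap -> no correlation
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = (math.sqrt(sum((a[i] - mean_a) ** 2 for i in common))
           * math.sqrt(sum((b[i] - mean_b) ** 2 for i in common)))
    return num / den if den > 0 else 0.0  # assumption: constant ratings -> 0

alice = {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4, "Item5": None}
user1 = {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3}
user2 = {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5}
user4 = {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1}

print(round(pearson_sim(alice, user1), 2))  # 0.85
print(round(pearson_sim(alice, user2), 2))  # 0.71
print(round(pearson_sim(alice, user4), 2))  # -0.79
```

Because Item5 is missing for Alice, it is excluded from $I$, so each comparison uses Items 1–4 only.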
30. Pearson correlation (1/2)
A measure of the linear correlation between two variables X and Y
[Figure: rating scores (0–6) of Alice, User1, and User4 for Item1–Item4]
sim(Alice, User1) = 0.85
sim(Alice, User4) = -0.79
(It takes differences in users’ rating behavior into account.)
31. Pearson correlation (2/2)
A measure of the linear correlation between two variables X and Y
[Figure: rating scores (0–6) of Alice, User1, and User2 for Item1–Item4]
sim(Alice, User2) = 0.71
(It takes differences in users’ rating behavior into account.)
32. Predicting rating scores based on user similarity (1/3)
A typical prediction function
$$\mathrm{predict}(u_a, i) = \bar{r}_a + \frac{\sum_{u \in U_a} \mathrm{sim}(u_a, u)\,(r_{u,i} - \bar{r}_u)}{\sum_{u \in U_a} \mathrm{sim}(u_a, u)}$$
$u_a$: target user a
$i$: target item
$r_{u,i}$: rating score of user u for item i
$\bar{r}_a$: user a’s average rating score
$\bar{r}_u$: user u’s average rating score
$U_a$: set of users similar to $u_a$
33. Predicting rating scores based on user similarity (2/3)
Applying the typical prediction function (slide 32) to Alice and Item5, with User1 and User2 as the similar users:
        Item5  sim   Average rating
Alice     ?    1     4
User1     3    0.85  2.4
User2     5    0.71  3.8
Predicted rating score of Alice for Item5:
$$\mathrm{predict}(\text{Alice}, \text{Item5}) = 4.0 + \frac{0.85 \times (3 - 2.4) + 0.71 \times (5 - 3.8)}{0.85 + 0.71} = 4.87$$
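The prediction function can be sketched directly from its formula, taking the target user's average and, for each similar user, a (similarity, rating of the target item, average rating) triple. Feeding in the table values reproduces 4.87. Falling back to the user's own average when the similarity sum is zero is an assumption added here to avoid division by zero.

```python
from typing import List, Tuple

def predict(mean_a: float, neighbors: List[Tuple[float, float, float]]) -> float:
    """mean_a + sum(sim * (r_ui - mean_u)) / sum(sim) over similar users."""
    num = sum(sim * (r_ui - mean_u) for sim, r_ui, mean_u in neighbors)
    den = sum(sim for sim, _, _ in neighbors)
    if den == 0:
        return mean_a  # assumption: no usable neighbors -> user's own average
    return mean_a + num / den

# (sim, rating for Item5, average rating) for User1 and User2, from the table.
neighbors = [(0.85, 3, 2.4), (0.71, 5, 3.8)]
print(round(predict(4.0, neighbors), 2))  # 4.87
```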
34. Predicting rating scores based on user similarity(3/3)
Q. How should we choose the similar users?
35. How to decide “similar users” (nearest neighbors)?
Set a threshold for user similarity
• If a user’s similarity is higher than the threshold, he/she is regarded as a “similar” user
• In the worst case, no similar users are found
Focus on the top K similar users (kNN method)
• If a user ranks in the top K by similarity, he/she is regarded as a similar user
• K is often set between 50 and 200
• In the worst case, the system uses the ratings of users with low similarity
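Both selection strategies can be sketched in one helper that sorts users by similarity and then applies a threshold, a top-K cut, or both. The similarities for User1, User2, and User4 come from the slides; User3's value of 0.0 is computed with the same Pearson formula and is not shown on the slides. The tiny K and thresholds here are only for this toy data, and `select_neighbors` is a hypothetical name.

```python
from typing import Dict, List, Optional

def select_neighbors(sims: Dict[str, float],
                     threshold: Optional[float] = None,
                     k: Optional[int] = None) -> List[str]:
    """Return candidate users ordered by similarity, filtered by threshold and/or top K."""
    ranked = sorted(sims, key=sims.get, reverse=True)
    if threshold is not None:
        ranked = [u for u in ranked if sims[u] > threshold]  # "higher than" the threshold
    if k is not None:
        ranked = ranked[:k]  # kNN: keep only the top K
    return ranked

sims = {"User1": 0.85, "User2": 0.71, "User3": 0.0, "User4": -0.79}
print(select_neighbors(sims, threshold=0.5))  # ['User1', 'User2']
print(select_neighbors(sims, threshold=0.9))  # [] (worst case: nobody is similar)
print(select_neighbors(sims, k=3))            # ['User1', 'User2', 'User3'] (includes a low-similarity user)
```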
36. Summary of User-based Collaborative Filtering
Basic Approach
• User similarities are obtained from a rating matrix
• Based on the rating scores of similar users, the system predicts the target user’s rating score for a target item
Similarity Calculation
The Pearson correlation coefficient is often used
Selection of Similar Users
The top K users with the highest similarity are often selected as similar users