This document outlines an item-based collaborative filtering recommendation algorithm that has been scaled up to run on Hadoop. It first discusses collaborative filtering techniques and how they work. It then describes scaling up the item-based collaborative filtering approach by dividing it into two steps: similarity computation and prediction/recommendation. The key computations involve calculating average item ratings, similarity between item pairs, and predicted ratings for target users. An experiment tested the scaled approach on a Hadoop cluster with 3 nodes.
4. Collaborative Filtering
✤ Collaborative filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating). (Source: Wikipedia)
5. Collaborative Filtering
1. Weight all users with respect to similarity with active user
2. Select a subset of users to use as a set of predictors
3. Compute a prediction from a weighted combination of selected neighbors’ ratings
6. Example: step 1 (weight users by similarity)
A simple example, with each user's ratings of three items:
Nathan [5,1,5]
Joe [5,2,5]
John [2,5,2.5]
Al [2,2,4]
Use cosine similarity to compare Nathan with each other user:
cos(Nathan, Joe) = 0.99
cos(Nathan, John) = 0.64
cos(Nathan, Al) = 0.91
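As a quick check of these numbers, the cosine similarities can be recomputed in a few lines of Python. This is a minimal sketch; the helper name `cosine` is illustrative. The exact values are about 0.991, 0.649 and 0.915, so the 0.64 for John appears to be truncated rather than rounded (0.65); the final prediction of 3.03 is the same either way.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Each user's ratings of the same three items, as listed above.
ratings = {
    "Nathan": [5, 1, 5],
    "Joe": [5, 2, 5],
    "John": [2, 5, 2.5],
    "Al": [2, 2, 4],
}

for name in ("Joe", "John", "Al"):
    # Joe ≈ 0.99, John ≈ 0.65, Al ≈ 0.91
    print(name, round(cosine(ratings["Nathan"], ratings[name]), 4))
```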
7. Example: steps 2 and 3 (select predictors, compute the prediction)
All three users are kept as predictors. Nathan's rating of the item he has not rated is predicted from their ratings of that item (Joe 4, John 3, Al 2), each weighted by that user's cosine similarity to Nathan:
(0.99*4 + 0.64*3 + 0.91*2) / (0.99 + 0.64 + 0.91) = 7.70 / 2.54 ≈ 3.03
[Figure: Nathan's unknown rating (?) = 3.03, computed from neighbors with similarity weights 0.99, 0.91 and 0.64]
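Steps 2 and 3 reduce to a similarity-weighted average of the selected neighbors' ratings. The sketch below reproduces the slide's arithmetic; `weighted_prediction` is a hypothetical helper name.

```python
def weighted_prediction(neighbors):
    """Similarity-weighted average of neighbor ratings.

    neighbors: list of (similarity, rating) pairs, one per selected neighbor
    that rated the target item.
    """
    numerator = sum(sim * rating for sim, rating in neighbors)
    denominator = sum(sim for sim, _ in neighbors)
    return numerator / denominator


# (similarity to Nathan, neighbor's rating of the target item) for Joe, John, Al
neighbors = [(0.99, 4), (0.64, 3), (0.91, 2)]
print(round(weighted_prediction(neighbors), 2))  # 3.03
```

This follows the slide's formula exactly. With a similarity measure that can go negative, such as Pearson correlation, the denominator is often taken as the sum of absolute similarities instead; this sketch does not do that.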
9. Collaborative Filtering
✤ User-Based CF
Compute similarity between users.
To predict user A's rating of item4: user B rated item4 as 5 and user F rated it as 1, so A's predicted rating is their ratings weighted by each user's similarity to A:
$$\hat{r}_{A,\mathrm{item4}} = \frac{5 \cdot \mathrm{sim}(A, B) + 1 \cdot \mathrm{sim}(A, F)}{\mathrm{sim}(A, B) + \mathrm{sim}(A, F)}$$
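The three user-based steps can be combined into one predictor. A minimal sketch, assuming ratings are stored as nested dicts, similarity is cosine over co-rated items, and the k most similar users who rated the target item serve as predictors. The names `predict_user_based`, `item1` to `item4` are hypothetical; the item4 ratings (4, 3, 2) are the neighbor ratings from the Nathan example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two users over the items both have rated."""
    common = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_user_based(ratings, user, item, k=3):
    """Predict `user`'s rating of `item` from the k most similar users."""
    # Step 1: weight every other user who rated `item` by similarity to `user`.
    candidates = [
        (cosine(ratings[user], theirs), theirs[item])
        for other, theirs in ratings.items()
        if other != user and item in theirs
    ]
    # Step 2: keep the k most similar users as the set of predictors.
    neighbors = sorted(candidates, key=lambda pair: pair[0], reverse=True)[:k]
    # Step 3: similarity-weighted average of the predictors' ratings of `item`.
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return None  # no usable neighbors
    return sum(sim * rating for sim, rating in neighbors) / total


# Hypothetical item names; the item4 ratings come from the worked example.
ratings = {
    "Nathan": {"item1": 5, "item2": 1, "item3": 5},
    "Joe": {"item1": 5, "item2": 2, "item3": 5, "item4": 4},
    "John": {"item1": 2, "item2": 5, "item3": 2.5, "item4": 3},
    "Al": {"item1": 2, "item2": 2, "item3": 4, "item4": 2},
}

print(round(predict_user_based(ratings, "Nathan", "item4"), 2))  # 3.03
```

With `k=2` only Joe and Al are used, and the prediction moves slightly to about 3.04.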
10. Collaborative Filtering
✤ Item-Based CF
Compute similarity between items.
To predict user A's rating of item4: user A rated item2 as 1 and item3 as 1, so the prediction is A's own ratings weighted by each item's similarity to item4:
$$\hat{r}_{A,\mathrm{item4}} = \frac{1 \cdot \mathrm{sim}(\mathrm{item2}, \mathrm{item4}) + 1 \cdot \mathrm{sim}(\mathrm{item3}, \mathrm{item4})}{\mathrm{sim}(\mathrm{item2}, \mathrm{item4}) + \mathrm{sim}(\mathrm{item3}, \mathrm{item4})}$$
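The item-based variant swaps the roles: similarities are computed between items, and the prediction averages the target user's own ratings of other items. A minimal sketch under the same assumptions as the user-based case (nested-dict ratings, cosine similarity over users who rated both items); `item_cosine` and `predict_item_based` are hypothetical names, and the data reuses the four users from the Nathan example with hypothetical item names.

```python
import math


def item_cosine(ratings, i, j):
    """Cosine similarity between items i and j over users who rated both."""
    both = [r for r in ratings.values() if i in r and j in r]
    dot = sum(r[i] * r[j] for r in both)
    norm_i = math.sqrt(sum(r[i] ** 2 for r in both))
    norm_j = math.sqrt(sum(r[j] ** 2 for r in both))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0


def predict_item_based(ratings, user, item):
    """Weight `user`'s ratings of other items by their similarity to `item`."""
    weighted = [
        (item_cosine(ratings, other_item, item), rating)
        for other_item, rating in ratings[user].items()
        if other_item != item
    ]
    total = sum(sim for sim, _ in weighted)
    if total == 0:
        return None  # no similar items rated by this user
    return sum(sim * rating for sim, rating in weighted) / total


ratings = {
    "Nathan": {"item1": 5, "item2": 1, "item3": 5},
    "Joe": {"item1": 5, "item2": 2, "item3": 5, "item4": 4},
    "John": {"item1": 2, "item2": 5, "item3": 2.5, "item4": 3},
    "Al": {"item1": 2, "item2": 2, "item3": 4, "item4": 2},
}

print(round(predict_item_based(ratings, "Nathan", "item4"), 2))  # 3.75
```

On this toy data the item-based estimate (about 3.75) differs from the 3.03 user-based prediction worked out earlier, because it averages Nathan's own ratings (5, 1, 5) rather than his neighbors' ratings. It also explains the slide's example: when user A's only ratings are both 1, any positively weighted average of them is 1.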
11. scaling-up item-based CF
Divide the CF algorithm into two steps:
1. Similarity computation
2. Prediction and recommendation
Similarity measure: Pearson correlation, which ranges from -1 to 1
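A minimal Python sketch of the similarity step, assuming the standard Pearson form below, with sums over the users U who rated both items i and j and item means taken over all of each item's raters (which is how the averages on slide 14 appear to be computed). The helper names and the toy data are hypothetical.

$$\mathrm{sim}(i,j) = \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_i)(R_{u,j} - \bar{R}_j)}{\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_i)^2}\,\sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_j)^2}}$$

```python
import math


def item_mean(ratings, item):
    """Average rating of `item` over every user who rated it."""
    values = [r[item] for r in ratings.values() if item in r]
    return sum(values) / len(values)


def pearson_items(ratings, i, j):
    """Pearson correlation between items i and j, a value in [-1, 1].

    Sums run over users who rated both items; deviations are measured from
    each item's mean over all of its raters (an assumption).
    """
    mean_i, mean_j = item_mean(ratings, i), item_mean(ratings, j)
    both = [r for r in ratings.values() if i in r and j in r]
    num = sum((r[i] - mean_i) * (r[j] - mean_j) for r in both)
    den_i = math.sqrt(sum((r[i] - mean_i) ** 2 for r in both))
    den_j = math.sqrt(sum((r[j] - mean_j) ** 2 for r in both))
    if den_i == 0 or den_j == 0:
        return 0.0  # correlation undefined; treat as "no similarity"
    return num / (den_i * den_j)


# Hypothetical toy data: y rises with x, z falls as x rises.
toy = {
    "u1": {"x": 1, "y": 2, "z": 6},
    "u2": {"x": 3, "y": 4, "z": 4},
    "u3": {"x": 5, "y": 6, "z": 2},
}
print(pearson_items(toy, "x", "y"))  # close to 1.0
print(pearson_items(toy, "x", "z"))  # close to -1.0
```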
14. scaling-up item-based CF
Similarity computation
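          apple (j)   milk   toast (i)
sam (u)       2         0        4
john          5         5        3
tim           2         4        ?
Rows are users and columns are items; u = sam, j = apple, i = toast, and ? is tim's unknown rating of toast.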
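$\bar{R}_u = (2 + 0 + 4)/3 = 2$ (average rating of user sam)
$\bar{R}_j = (2 + 5 + 2)/3 = 3$ (average rating of apple)
$\bar{R}_i = (4 + 3)/2 = 3.5$ (average rating of toast; tim's unknown rating is excluded)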
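These averages can be reproduced directly from the table. A small sketch, assuming `None` marks the unknown rating and, as on the slide, sam's 0 for milk counts as a real rating; the variable names are illustrative.

```python
# Ratings from the table above; None marks the unknown rating to predict.
table = {
    "sam": {"apple": 2, "milk": 0, "toast": 4},
    "john": {"apple": 5, "milk": 5, "toast": 3},
    "tim": {"apple": 2, "milk": 4, "toast": None},
}
items = ["apple", "milk", "toast"]


def mean(values):
    """Average of the known (non-None) ratings."""
    known = [v for v in values if v is not None]
    return sum(known) / len(known)


user_avg = {user: mean(row.values()) for user, row in table.items()}
item_avg = {item: mean(row[item] for row in table.values()) for item in items}

print(user_avg["sam"])    # 2.0
print(item_avg["apple"])  # 3.0
print(item_avg["toast"])  # 3.5
```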
15. scaling-up item-based CF
The three computationally intensive parts are:
(1) computing the average rating of each item
(2) computing the similarity between item pairs
(3) computing predicted ratings for the target user
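The MapReduce jobs themselves are not shown here, so the following is only a sketch of one plausible decomposition, simulated locally in plain Python rather than with any Hadoop API. Each job is a map function, a shuffle (sort and group by key) and a reduce function, mirroring the Hadoop Streaming model in which mappers and reducers exchange key/value lines. Part (1) is a job keyed by item; part (2) is a job keyed by item pair that uses the item means from part (1). `run_job` and the other names are hypothetical, the data is the apple/milk/toast table from slide 14, and part (3) (a further job keyed by user and item) is omitted.

```python
import math
from collections import defaultdict
from itertools import combinations, groupby
from operator import itemgetter


def run_job(records, mapper, reducer):
    """Simulate one MapReduce job in memory: map, shuffle (sort + group), reduce."""
    emitted = [kv for record in records for kv in mapper(record)]
    emitted.sort(key=itemgetter(0))
    return {
        key: reducer(key, [value for _, value in group])
        for key, group in groupby(emitted, key=itemgetter(0))
    }


# Part (1): average rating of each item. Input records are (user, item, rating).
def avg_mapper(record):
    _user, item, rating = record
    yield item, rating


def avg_reducer(_item, ratings):
    return sum(ratings) / len(ratings)


# Part (2): Pearson similarity per item pair, using the item means from part (1).
# Input records are (user, {item: rating}); each user emits every pair they rated.
def pair_mapper(record):
    _user, user_ratings = record
    for (i, r_i), (j, r_j) in combinations(sorted(user_ratings.items()), 2):
        yield (i, j), (r_i, r_j)


def make_pair_reducer(item_means):
    def pair_reducer(pair, co_ratings):
        i, j = pair
        dev_i = [r_i - item_means[i] for r_i, _ in co_ratings]
        dev_j = [r_j - item_means[j] for _, r_j in co_ratings]
        num = sum(a * b for a, b in zip(dev_i, dev_j))
        den = math.sqrt(sum(a * a for a in dev_i)) * math.sqrt(sum(b * b for b in dev_j))
        return num / den if den else 0.0
    return pair_reducer


# Known ratings from the apple/milk/toast table (tim's toast rating is unknown).
triples = [
    ("sam", "apple", 2), ("sam", "milk", 0), ("sam", "toast", 4),
    ("john", "apple", 5), ("john", "milk", 5), ("john", "toast", 3),
    ("tim", "apple", 2), ("tim", "milk", 4),
]

item_means = run_job(triples, avg_mapper, avg_reducer)

# On a cluster, grouping ratings by user would be its own job; here it is in memory.
by_user = defaultdict(dict)
for user, item, rating in triples:
    by_user[user][item] = rating

sims = run_job(by_user.items(), pair_mapper, make_pair_reducer(item_means))
print(item_means)
print({pair: round(s, 3) for pair, s in sims.items()})
```

Keying part (2) by item pair means each reducer sees only the co-ratings for one pair, which is what lets the similarity computation spread across the cluster's nodes.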