2015/2/27
Scaling-up Item-based Collaborative Filtering Recommendation Algorithm based on Hadoop
Jing Jiang, Jie Lu, Guangquan Zhang, Guodong Long
2011 IEEE World Congress on Services
outline
✤ Collaborative Filtering
✤ scaling-up item-based CF
✤ experimentation and evaluation
Collaborative Filtering
✤ Collaborative filtering (CF) techniques have achieved
widespread success in E-commerce nowadays.
Collaborative Filtering
✤ Collaborative filtering is a method of making
automatic predictions (filtering) about the interests of
a user by collecting preferences or taste information
from many users (collaborating). (Source: Wikipedia)
Collaborative Filtering
1. Weight all users with respect to similarity with active user
2. Select a subset of users to use as a set of predictors
3. Compute a prediction from a weighted combination of selected
neighbors’ ratings
Simple example (step 1): rating vectors
Nathan [5,1,5]
Joe [5,2,5]
John [2,5,2.5]
Al [2,2,4]
Use cosine similarity to weight each user against the active user, Nathan:
cos (Nathan,Joe) 0.99
cos (Nathan,John) 0.64
cos (Nathan,Al) 0.91
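The cosine step can be checked with a few lines of Python. This is a minimal sketch, not code from the paper: the function name `cosine` and the dictionary layout are illustrative assumptions, and only the four rating vectors come from the slide.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Rating vectors taken from the slide.
ratings = {
    "Nathan": [5, 1, 5],
    "Joe": [5, 2, 5],
    "John": [2, 5, 2.5],
    "Al": [2, 2, 4],
}

# Similarity of every other user to the active user, Nathan.
similarities = {
    name: cosine(ratings["Nathan"], vector)
    for name, vector in ratings.items()
    if name != "Nathan"
}
for name, value in similarities.items():
    print(f"cos(Nathan, {name}) = {value:.4f}")
```

The printed values agree with the slide's 0.99, 0.64 and 0.91 to within 0.01; the slide appears to truncate rather than round the value for John.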
Simple example (step 3): predict Nathan's unknown rating as the similarity-weighted average of the neighbors' ratings of that item (Joe 4, John 3, Al 2)
? = (0.99*4+0.64*3+0.91*2)/(0.99+0.64+0.91) = 3.03
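The weighted combination can be reproduced directly. The sketch below assumes the neighbors' ratings 4, 3 and 2 that appear in the slide's formula; the name `weighted_prediction` is hypothetical.

```python
def weighted_prediction(neighbors):
    """Similarity-weighted average of neighbor ratings.

    neighbors: list of (similarity, rating) pairs.
    """
    numerator = sum(sim * rating for sim, rating in neighbors)
    denominator = sum(sim for sim, _ in neighbors)
    return numerator / denominator

# (similarity to Nathan, neighbor's rating of the unknown item)
neighbors = [(0.99, 4), (0.64, 3), (0.91, 2)]
prediction = weighted_prediction(neighbors)
print(round(prediction, 2))  # 3.03
```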
Collaborative Filtering
✤ User-Based CF: compute similarity based on users
✤ Item-Based CF: compute similarity based on items
Collaborative Filtering
✤ User-Based CF
compute similarity based on users
To predict user A's rating of item4:
user B's rating of item4 is 5
user F's rating of item4 is 1
user A's rating of item4 =
(5 * similarity(user A, user B) + 1 * similarity(user A, user F)) / (similarity(user A, user B) + similarity(user A, user F))
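To see how the user-based prediction moves between the two neighbors' ratings, the formula can be evaluated for a few similarity values. The similarity values below are hypothetical (the slide does not give any), and `predict_user_based` is an illustrative name.

```python
def predict_user_based(sim_ab, sim_af, rating_b=5, rating_f=1):
    """User A's predicted rating of item4 from users B and F."""
    return (rating_b * sim_ab + rating_f * sim_af) / (sim_ab + sim_af)

# Hypothetical similarity pairs: (sim(A, B), sim(A, F)).
results = [
    (sim_ab, sim_af, predict_user_based(sim_ab, sim_af))
    for sim_ab, sim_af in [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]
]
for sim_ab, sim_af, predicted in results:
    print(sim_ab, sim_af, round(predicted, 2))
```

The more similar user A is to user B, the closer the prediction gets to B's rating of 5; with equal similarities it is the plain average, 3.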
Collaborative Filtering
✤ Item-Based CF
compute similarity based on items
To predict user A's rating of item4:
user A's rating of item2 is 1
user A's rating of item3 is 1
user A's rating of item4 =
(1 * similarity(item2, item4) + 1 * similarity(item3, item4)) / (similarity(item2, item4) + similarity(item3, item4))
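The item-based version uses the same weighted average, but over the items the user has already rated. In the sketch below the item similarities are hypothetical placeholders and `predict_item_based` is an illustrative name; only user A's two ratings come from the slide.

```python
def predict_item_based(user_ratings, sims_to_target):
    """Predict a rating from the user's own ratings of similar items.

    user_ratings: {item: rating} for the target user.
    sims_to_target: {item: similarity(item, target item)}.
    """
    rated = [item for item in user_ratings if item in sims_to_target]
    numerator = sum(user_ratings[i] * sims_to_target[i] for i in rated)
    denominator = sum(sims_to_target[i] for i in rated)
    return numerator / denominator

user_a = {"item2": 1, "item3": 1}              # from the slide
sims_to_item4 = {"item2": 0.8, "item3": 0.4}   # hypothetical values
predicted = predict_item_based(user_a, sims_to_item4)
print(predicted)
```

Because user A gave both neighboring items the same rating, the prediction is 1 for any positive similarities: the weights cancel out.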
scaling-up item-based CF
The CF algorithm is divided into two steps:
1. Similarity computation
2. Prediction and recommendation
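Pearson correlation (range -1 to 1) between items i and j, where U is the set of users who rated both items and \bar{R}_i is the average rating of item i:
$$\mathrm{sim}(i,j) = \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_i)(R_{u,j} - \bar{R}_j)}{\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_i)^2}\,\sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_j)^2}}$$
scaling-up item-based CF
The numerator of the Pearson correlation is the covariance term.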
scaling-up item-based CF
Similarity computation
        apple  milk  toast
sam       2     0     4
john      5     5     3
tim       2     4     ?
Rows are users (u); columns are items (i, j). With i = apple and j = toast:
Ri = (2+5+2)/3 = 3
Rj = (4+3)/2 = 3.5
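The averages above, and a Pearson similarity built from them, can be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: each item mean is taken over every user who rated the item (as on the slide), the sums run over users who rated both items, sam's 0 for milk is treated as a real rating, and the names `item_mean` and `pearson` are hypothetical.

```python
import math

# Ratings from the slide; tim's toast rating is unknown, so it is omitted.
ratings = {
    "sam":  {"apple": 2, "milk": 0, "toast": 4},
    "john": {"apple": 5, "milk": 5, "toast": 3},
    "tim":  {"apple": 2, "milk": 4},
}

def item_mean(item):
    """Average rating of an item over the users who rated it."""
    values = [r[item] for r in ratings.values() if item in r]
    return sum(values) / len(values)

def pearson(item_i, item_j):
    """Pearson similarity of two items over their co-rating users."""
    mean_i, mean_j = item_mean(item_i), item_mean(item_j)
    common = [u for u, r in ratings.items() if item_i in r and item_j in r]
    cov = sum((ratings[u][item_i] - mean_i) * (ratings[u][item_j] - mean_j)
              for u in common)
    dev_i = math.sqrt(sum((ratings[u][item_i] - mean_i) ** 2 for u in common))
    dev_j = math.sqrt(sum((ratings[u][item_j] - mean_j) ** 2 for u in common))
    # Assumes neither deviation is zero; a real job would guard this case.
    return cov / (dev_i * dev_j)

print(item_mean("apple"), item_mean("toast"))  # 3.0 3.5
print(round(pearson("apple", "toast"), 3))     # -0.949
```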
scaling-up item-based CF
Similarity computation
        apple  milk  toast
sam       2     0     4
john      5     5     3
tim       2     4     ?
Here j = apple and i = toast, and Ru is the average rating given by user u:
Ru(sam) = (2+0+4)/3 = 2
Rj = (2+5+2)/3 = 3
Ri = (4+3)/2 = 3.5
scaling-up item-based CF
The three computation-intensive parts are:
(1) computing the average rating for each item
(2) computing the similarity between item pairs
(3) computing predicted items for the target user
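[Figure: the three MapReduce phases]
Phase 1: input is the rating of item i by user j; map on item i.
Phase 2: similarity; map on user j. The similarity of item k and item l is computed over the set of users who rated both item k and item l.
Phase 3: map on user j.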
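The three phases can be imitated in memory to make the data flow concrete. The sketch below is not Hadoop code and not the paper's implementation: `run_mapreduce` is a hypothetical stand-in for a MapReduce job, the toy ratings reuse the apple/milk/toast table, and phase 3 uses a mean-centred weighted sum, which is an assumption chosen here so that negative similarities still give a sensible prediction.

```python
import math
from collections import defaultdict
from itertools import combinations

def run_mapreduce(records, mapper, reducer):
    """Tiny in-memory stand-in for one MapReduce job (not Hadoop)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# (user, item, rating) triples from the apple/milk/toast table.
triples = [
    ("sam", "apple", 2), ("sam", "milk", 0), ("sam", "toast", 4),
    ("john", "apple", 5), ("john", "milk", 5), ("john", "toast", 3),
    ("tim", "apple", 2), ("tim", "milk", 4),
]

# Phase 1: map on item, reduce to the average rating of each item.
item_mean = run_mapreduce(
    triples,
    mapper=lambda t: [(t[1], t[2])],
    reducer=lambda item, rs: sum(rs) / len(rs),
)

# Helper job: collect each user's ratings so phases 2 and 3 can map on user.
by_user = run_mapreduce(
    triples,
    mapper=lambda t: [(t[0], (t[1], t[2]))],
    reducer=lambda user, pairs: dict(pairs),
)

# Phase 2: map on user, emit mean-centred rating pairs for each item pair,
# then reduce each item pair to a Pearson similarity.
def pair_mapper(record):
    _, rated = record
    for k, l in combinations(sorted(rated), 2):
        yield (k, l), (rated[k] - item_mean[k], rated[l] - item_mean[l])

def pearson_reducer(pair, devs):
    cov = sum(dk * dl for dk, dl in devs)
    norm = (math.sqrt(sum(dk * dk for dk, _ in devs))
            * math.sqrt(sum(dl * dl for _, dl in devs)))
    return cov / norm if norm else 0.0

similarity = run_mapreduce(by_user.items(), pair_mapper, pearson_reducer)

def sim(k, l):
    """Look up a similarity regardless of item order."""
    return similarity.get((min(k, l), max(k, l)), 0.0)

# Phase 3: map on user, predict every item the user has not rated.
# Assumption: mean-centred weighted sum, normalised by absolute similarity.
def predict_mapper(record):
    user, rated = record
    for target in item_mean:
        if target in rated:
            continue
        num = sum(sim(target, l) * (r - item_mean[l]) for l, r in rated.items())
        den = sum(abs(sim(target, l)) for l in rated)
        if den:
            yield user, (target, item_mean[target] + num / den)

predictions = run_mapreduce(by_user.items(), predict_mapper,
                            lambda user, ps: dict(ps))
print(predictions)
```

Only tim has an unrated item, so the output holds a single prediction for tim's toast rating, close to the toast average of 3.5. On a real cluster each `run_mapreduce` call would be a separate job whose output is written to the distributed file system for the next phase.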
experimentation and evaluation
3 nodes, each with an Intel P4 CPU, 1 GB RAM, and an 80 GB disk.
All the machines were connected through one 100 Mbps switch.