On Top-k Recommendation using Social Networks


Xiwang Yang, Harald Steck*+, Yang Guo* and Yong Liu

Polytechnic Institute of NYU
*Bell Labs
+Netflix Inc.


Outline
 Background & Motivation
 Social network based top-k recommendation
 Related Work: AllRank, SoRec, STE, SocialMF, Trust-cf

 Top-k recommender using social networks
 Top-k MF using Social Networks
 Nearest Neighbor Methods

 Evaluation
 Conclusion


Social Recommenders Everywhere


Social network based top-k recommendation

[Figure: a recommender produces a list of top movies for a target customer]

Social network based top-k recommendation is not well studied

Social Top-K Recommendation
 Top-k recommendation:
 More realistic RS task
 Integrate social network information into RS
 Matrix Factorization (MF)
• SoRec, STE, SocialMF: optimize RMSE
• AllRank: optimizes top-k recommendation, but without social network information
• Our approach directly optimizes social network based
top-k recommendation
 Nearest Neighbor (NN)
• Trust-cf (RecSys'09)
– Combines the CF neighborhood with the social neighborhood;
items rated by the combined neighborhood are considered,
their ratings are averaged, and items are ranked by
predicted rating to form the top-k recommendation
• Our approach employs a new neighborhood construction and
a voting mechanism

AllRank (Steck, KDD'10)
 Use AllRank to optimize top-k recommendation
 Users' selection bias causes the observed feedback (e.g. ratings,
purchases, clicks) in the data to be missing not at random (MNAR)
(RecSys'09)
 Lower ratings are missing with higher probability
 Missing ratings tend to indicate that a user does not like the item
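 Prediction: $\hat{R}_{u,i} = r_m + Q_u P_i^T$
 Objective:

$$\sum_{\text{all } u} \sum_{\text{all } i} W_{u,i} \left( R^{o\&i}_{u,i} - \hat{R}_{u,i} \right)^2 + \lambda \left( \|P\|_F^2 + \|Q\|_F^2 \right)$$

$$W_{u,i} = \begin{cases} 1 & \text{if } R_{u,i} \text{ observed} \\ w_m & \text{otherwise} \end{cases} \qquad R^{o\&i}_{u,i} = \begin{cases} R_{u,i} & \text{if } R_{u,i} \text{ observed} \\ r_m & \text{otherwise} \end{cases}$$

 $w_m > 0$: training on all items
 BaseMF: $w_m = 0$, training on observed ratings only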
 Rank items based on predicted rating to form top-k list
 Tailor existing social-trust enhanced MF model for top-k
recommendation
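
As a rough illustration of the AllRank objective on this slide, the following pure-Python sketch evaluates the weighted squared error over all user-item pairs. The function name `allrank_objective`, the dictionary layout of `R_obs`, and the toy numbers are assumptions made for this example, not part of the original talk.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def allrank_objective(R_obs, P, Q, r_m, w_m, lam):
    """Weighted squared error over ALL (user, item) pairs plus an L2 penalty.

    R_obs: dict {(u, i): rating} of observed ratings (hypothetical layout)
    P, Q:  item and user latent factors, each a list of vectors
    """
    loss = 0.0
    for u, q_u in enumerate(Q):
        for i, p_i in enumerate(P):
            observed = (u, i) in R_obs
            weight = 1.0 if observed else w_m            # W_{u,i}
            target = R_obs[(u, i)] if observed else r_m  # R^{o&i}_{u,i}
            pred = r_m + dot(q_u, p_i)                   # predicted rating
            loss += weight * (target - pred) ** 2
    frob = sum(x * x for row in P + Q for x in row)      # ||P||_F^2 + ||Q||_F^2
    return loss + lam * frob


Q = [[1.0], [0.0]]        # two users, one latent dimension
P = [[1.0], [2.0]]        # two items
R_obs = {(0, 0): 5.0}     # a single observed rating
all_items = allrank_objective(R_obs, P, Q, r_m=4.0, w_m=0.5, lam=0.1)
base_mf = allrank_objective(R_obs, P, Q, r_m=4.0, w_m=0.0, lam=0.1)
```

With `w_m = 0` the missing pairs drop out of the loss, which corresponds to the BaseMF setting; with `w_m > 0` the unobserved pair (0, 1) is pulled toward the imputed value `r_m`.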


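SoRec
 Prediction:

$$\hat{R}_{u,i} = r_m + Q_u P_i^T, \qquad \hat{S}^*_{u,v} = s_m + Q_u Z_v^T$$

 Objective (optimizes RMSE):

$$\sum_{(u,i)\ \text{obs.}} \left( R_{u,i} - \hat{R}_{u,i} \right)^2 + \gamma \sum_{(u,v)\ \text{obs.}} \left( S^*_{u,v} - \hat{S}^*_{u,v} \right)^2 + \lambda \left( \|P\|_F^2 + \|Q\|_F^2 + \|Z\|_F^2 \right)$$

 Modified objective (optimizes top-k hit rate):

$$\sum_{\text{all } u} \sum_{\text{all } i} W_{u,i} \left( R^{o\&i}_{u,i} - \hat{R}_{u,i} \right)^2 + \sum_{\text{all } u} \sum_{\text{all } v} W^{(S)}_{u,v} \left( S^{*(o\&i)}_{u,v} - \hat{S}^*_{u,v} \right)^2 + \lambda \left( \|P\|_F^2 + \|Q\|_F^2 + \|Z\|_F^2 \right)$$

$$W_{u,i} = \begin{cases} 1 & \text{if } R_{u,i} \text{ observed} \\ w_m > 0 & \text{otherwise} \end{cases} \qquad R^{o\&i}_{u,i} = \begin{cases} R_{u,i} & \text{if } R_{u,i} \text{ observed} \\ r_m & \text{otherwise} \end{cases}$$

$$W^{(S)}_{u,v} = \gamma \begin{cases} 1 & \text{if } S_{u,v} \text{ observed} \\ w_m^{(S)} > 0 & \text{otherwise} \end{cases} \qquad S^{*(o\&i)}_{u,v} = \begin{cases} S^*_{u,v} & \text{if } S_{u,v} \text{ observed} \\ s_m & \text{otherwise} \end{cases}$$

Top-k list generated based on ranking of predicted ratings of all items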
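
The social part of the modified SoRec objective treats missing trust links the same way AllRank treats missing ratings. The sketch below computes only that social term; the name `sorec_social_term`, the argument `w_m_s` (standing in for the weight on missing trust entries), and the dictionary layout of `S_obs` are assumptions for illustration.

```python
def sorec_social_term(S_obs, Q, Z, s_m, w_m_s, gamma):
    """Weighted squared error on trust values over ALL (u, v) pairs.

    S_obs: dict {(u, v): normalized trust value} for observed links (hypothetical layout)
    Q, Z:  user latent factors and social factor vectors, each a list of vectors
    """
    total = 0.0
    for u, q_u in enumerate(Q):
        for v, z_v in enumerate(Z):
            observed = (u, v) in S_obs
            weight = gamma * (1.0 if observed else w_m_s)    # W^{(S)}_{u,v}
            target = S_obs[(u, v)] if observed else s_m      # imputed trust s_m
            pred = s_m + sum(a * b for a, b in zip(q_u, z_v))
            total += weight * (target - pred) ** 2
    return total


# Toy data: user 0 trusts user 1; every other pair is missing.
social_loss = sorec_social_term({(0, 1): 1.0}, [[1.0], [0.0]], [[1.0], [1.0]],
                                s_m=0.0, w_m_s=0.5, gamma=2.0)
```

Setting `w_m_s = 0` reduces the sum to the observed links only, as in the RMSE-oriented objective.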

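 STE: $\hat{R}_{u,i} = r_m + \alpha Q_u P_i^T + (1 - \alpha) \sum_v S_{u,v} Q_v P_i^T$
 Modified objective (optimizes top-k hit rate):

$$\sum_{\text{all } u} \sum_{\text{all } i} W_{u,i} \left( R^{o\&i}_{u,i} - \hat{R}_{u,i} \right)^2 + \lambda \left( \|P\|_F^2 + \|Q\|_F^2 \right)$$

$$W_{u,i} = \begin{cases} 1 & \text{if } R_{u,i} \text{ observed} \\ w_m > 0 & \text{otherwise} \end{cases} \qquad R^{o\&i}_{u,i} = \begin{cases} R_{u,i} & \text{if } R_{u,i} \text{ observed} \\ r_m & \text{otherwise} \end{cases}$$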
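
The STE prediction blends a user's own taste with the tastes of trusted users. A minimal sketch, assuming trust weights are stored as a nested dictionary `S[u][v]` (a hypothetical layout) and using made-up toy factors:

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def ste_predict(u, i, P, Q, S, r_m, alpha):
    """r_m + alpha * Q_u P_i^T + (1 - alpha) * sum_v S_{u,v} Q_v P_i^T."""
    own = dot(Q[u], P[i])
    # Users without any trust links contribute nothing to the social part.
    social = sum(s_uv * dot(Q[v], P[i]) for v, s_uv in S.get(u, {}).items())
    return r_m + alpha * own + (1 - alpha) * social


Q = [[1.0], [3.0]]       # two users
P = [[2.0]]              # one item
S = {0: {1: 1.0}}        # user 0 trusts user 1 with weight 1
pred_with_trust = ste_predict(0, 0, P, Q, S, r_m=4.0, alpha=0.5)
pred_no_trust = ste_predict(1, 0, P, Q, S, r_m=4.0, alpha=0.5)
```

With `alpha = 1` the prediction falls back to the plain MF form `r_m + Q_u P_i^T`.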

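 SocialMF: $\hat{R}_{u,i} = r_m + Q_u P_i^T$
 Modified objective (optimizes top-k hit rate):

$$\sum_{\text{all } u} \sum_{\text{all } i} W_{u,i} \left( R^{o\&i}_{u,i} - \hat{R}_{u,i} \right)^2 + \beta \sum_{\text{all } u} \left( \Big( Q_u - \sum_v S^*_{u,v} Q_v \Big) \Big( Q_u - \sum_v S^*_{u,v} Q_v \Big)^T \right) + \lambda \left( \|P\|_F^2 + \|Q\|_F^2 \right)$$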
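
The β term in the SocialMF objective penalizes the distance between a user's latent vector and the trust-weighted combination of the vectors of the users they trust. The sketch below computes only that term; `socialmf_social_term` and the nested-dictionary layout of `S` are assumptions for this example.

```python
def socialmf_social_term(Q, S, beta):
    """beta * sum_u || Q_u - sum_v S*_{u,v} Q_v ||^2.

    Q: user latent factors, a list of vectors
    S: dict {u: {v: normalized trust weight}} (hypothetical layout)
    """
    total = 0.0
    for u, q_u in enumerate(Q):
        blend = [0.0] * len(q_u)                 # sum_v S*_{u,v} Q_v
        for v, s_uv in S.get(u, {}).items():
            for d, x in enumerate(Q[v]):
                blend[d] += s_uv * x
        total += sum((a - b) ** 2 for a, b in zip(q_u, blend))
    return beta * total


Q = [[1.0, 0.0], [3.0, 4.0]]
S = {0: {1: 1.0}}                                # user 0 trusts user 1
social_penalty = socialmf_social_term(Q, S, beta=2.0)
```

Note that, taken literally, the formula charges users without trust links the squared norm of their own vector; whether such users are excluded in practice is not stated on the slide.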

Nearest Neighbor Methods
 CF-ULF approach
 Use AllRank to obtain user latent features
 Cluster users by PCC (Pearson correlation coefficient) in the latent feature space
 Select the k1 nearest neighbors of target user u
 Relevant items of these nearest neighbors are voted to the
target user; the voting weight is the PCC similarity

$$\text{Vote}_{u,i} = \sum_{v \in N_u} \text{sim}(u, v)\, \delta_{i \in I_v}$$

 Top-k list is generated based on voting value
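
The CF-ULF voting step can be sketched as follows. This is a simplified illustration: it picks the k1 neighbors by brute-force PCC over all users rather than by clustering, and the names `pcc`, `cf_ulf_votes`, `top_k`, and the `relevant` dictionary are assumptions for this example.

```python
from math import sqrt


def pcc(a, b):
    """Pearson correlation coefficient between two latent feature vectors."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = sqrt(sum((x - ma) ** 2 for x in a)) * sqrt(sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0


def cf_ulf_votes(u, Q, relevant, k1):
    """Vote_{u,i} = sum over the k1 nearest neighbors v of sim(u, v) * [i in I_v]."""
    sims = [(pcc(Q[u], Q[v]), v) for v in range(len(Q)) if v != u]
    neighbors = sorted(sims, reverse=True)[:k1]          # highest PCC first
    votes = {}
    for sim, v in neighbors:
        for i in relevant.get(v, ()):                    # I_v: relevant items of v
            votes[i] = votes.get(i, 0.0) + sim
    return votes


def top_k(votes, k):
    """Items with the k largest voting values."""
    return sorted(votes, key=votes.get, reverse=True)[:k]


Q = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 3.0, 2.0], [3.0, 2.0, 1.0]]
relevant = {1: {"a"}, 2: {"a", "b"}, 3: {"c"}}
votes = cf_ulf_votes(0, Q, relevant, k1=2)
```

In this toy data users 1 and 2 are the two nearest neighbors of user 0, so item "a" collects votes from both while item "c" (liked only by the anti-correlated user 3) receives none.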

 PureTrust approach
 Breadth-first search (BFS) in the social network to
find k2 users trusted by the target user u
 Relevant items of these trusted users are voted to the
target user; the voting weight is proportional to $1/d_v$

$$\text{Vote}_{u,i} = \sum_{v \in N_u^{(t)}} w_t(u, v)\, \delta_{i \in I_v}$$

 $N_u^{(t)}$ is the set of trusted users of u
 $w_t(u, v) = 1/d_v$ is the voting weight from user v
 $d_v$ is the depth of user v in the BFS tree rooted at
user u
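
A minimal sketch of the PureTrust voting step, assuming the trust network is an adjacency dictionary `trust[x]` listing the users that x trusts (a hypothetical layout). The BFS stops as soon as k2 trusted users have been collected, and each casts votes with weight 1/d_v.

```python
from collections import deque


def pure_trust_votes(u, trust, relevant, k2):
    """Vote_{u,i} = sum over trusted users v of (1 / d_v) * [i in I_v]."""
    depth = {u: 0}
    queue = deque([u])
    trusted = []                       # (v, d_v) in BFS discovery order
    while queue and len(trusted) < k2:
        x = queue.popleft()
        for v in trust.get(x, ()):
            if v not in depth:
                depth[v] = depth[x] + 1
                trusted.append((v, depth[v]))
                queue.append(v)
                if len(trusted) == k2:
                    break
    votes = {}
    for v, d in trusted:
        for i in relevant.get(v, ()):  # I_v: relevant items of v
            votes[i] = votes.get(i, 0.0) + 1.0 / d
    return votes


trust = {"u": ["a", "b"], "a": ["c"]}
relevant = {"a": {"x"}, "b": {"x", "y"}, "c": {"y"}}
votes_k3 = pure_trust_votes("u", trust, relevant, k2=3)
votes_k2 = pure_trust_votes("u", trust, relevant, k2=2)
```

Direct friends sit at depth 1 and vote with full weight; the friend-of-a-friend "c" sits at depth 2 and contributes only half a vote.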

 Trust-CF-ULF approach
 Combination of the CF-ULF approach and PureTrust
 Find k1 nearest neighbors from the CF-ULF neighborhood
 Find k2 nearest neighbors from the trust neighborhood that
are not in the k1 set (k2 = k1)
 Relevant items of these users are voted to the target user
 Top-k list is generated based on voting value

 Trust-CF-ULF-best approach
 Given the total neighborhood size, dynamically tune the values of
k1 and k2 to obtain the best recall
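
The combined neighborhood can be sketched as below. The slide does not spell out which weight each neighbor uses in the combined vote; this sketch assumes CF neighbors keep their PCC similarity and trust neighbors keep 1/d_v. The function name and input layouts are likewise assumptions.

```python
def trust_cf_ulf_votes(cf_neighbors, trusted, relevant, k1, k2):
    """Merge k1 CF neighbors with k2 trusted users not already selected, then vote.

    cf_neighbors: [(v, sim)] sorted by decreasing similarity
    trusted:      [(v, depth)] in BFS order from the target user
    """
    weights = dict(cf_neighbors[:k1])            # CF neighbors vote with sim(u, v)
    added = 0
    for v, d in trusted:
        if added == k2:
            break
        if v not in weights:                     # skip users already in the k1 set
            weights[v] = 1.0 / d                 # assumed weight for trust neighbors
            added += 1
    votes = {}
    for v, w in weights.items():
        for i in relevant.get(v, ()):
            votes[i] = votes.get(i, 0.0) + w
    return votes


cf_neighbors = [("a", 0.9), ("b", 0.5)]
trusted = [("a", 1), ("c", 1), ("d", 2)]
relevant = {"a": {"x"}, "c": {"x", "y"}, "d": {"y"}}
combined = trust_cf_ulf_votes(cf_neighbors, trusted, relevant, k1=1, k2=1)
```

Here user "a" is both the nearest CF neighbor and a trusted user, so the trust slot goes to "c" instead. The Trust-CF-ULF-best variant would repeat this for different splits of a fixed k1 + k2 and keep the split with the highest recall.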


Evaluation Metrics
 Top-k hit rate (recall)
 The fraction of relevant items in the test set that are in the
top-k of the ranking list

 RMSE

$$\text{RMSE} = \sqrt{\frac{\sum_{(u,i) \in R_{\text{test}}} \left( R_{u,i} - \hat{R}_{u,i} \right)^2}{|R_{\text{test}}|}}$$
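
Both metrics are short to compute. A minimal sketch for a single user's ranked list and a list of (actual, predicted) rating pairs; the function names and toy inputs are assumptions for this example.

```python
from math import sqrt


def recall_at_k(ranked_items, relevant_test, k):
    """Fraction of relevant test items that appear in the top-k of the ranking."""
    if not relevant_test:
        return 0.0
    hits = sum(1 for i in ranked_items[:k] if i in relevant_test)
    return hits / len(relevant_test)


def rmse(pairs):
    """Root mean squared error over (actual, predicted) test ratings."""
    return sqrt(sum((r - p) ** 2 for r, p in pairs) / len(pairs))


hit_rate = recall_at_k(["a", "b", "c"], {"a", "c", "z"}, k=2)
error = rmse([(5.0, 4.0), (3.0, 3.0)])
```

In the toy call only "a" of the three relevant items lands in the top 2, so the hit rate is one third; RMSE, in contrast, looks only at rating values on observed test pairs and ignores ranking.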


Top-k hit rate on Epinions Dataset
 71K users, 104K items, 571K item reviews, 509K trust statements

 Up to ~10× increase compared with training on observed ratings only
 The social network is very helpful for top-k recommendation,
especially for cold-start users
 Modified SoRec outperforms modified No Trust (AllRank) by 23.1% in
overall recall and by 101.8% in cold user recall
 In SoRec, recall for cold users is better than recall for all users
 Items rated by cold users have received 102 ratings on average
 Items rated by all users have received 93 ratings on average

RMSE on Epinions Dataset
 Set j0 = 10, λ = 0.1, rm = 4.0, wm = 0
 RMSE = 1.174 for BaseMF
 RMSE = 1.095 for SocialMF (β = 20)
 RMSE = 1.157 for STE (α = 0.5)
 RMSE = 1.117 for SoRec (γ = 50 and $w_m^{(S)} = 0$)

 Consistent with RMSE results in published literature
 SocialMF performs best in terms of RMSE but
worst in terms of top-k hit rate


Experiments on Epinions Dataset-NN

 Greatly outperforms existing work (Trust-cf)
 Trust-cf predicts the target user's rating as the average of
the ratings given by the user's neighbors, which is
based on the observed ratings only
 Our CF neighbors are derived from user latent features obtained
from AllRank, which accounts for MNAR data by training on all items
 Voting is the simplest possible way of accounting for all
ratings, i.e. by counting 0 for an absent rating and counting 1
for an observed relevant rating

Experiments on Flixster Dataset
 ~1M Users, 49K movies, 8.2M ratings,
26.7M connections
 Results are similar


Impact of Dimensionality and Top-k

 Top-k hit rate on the Flixster data is much better than on the
Epinions data
 The Epinions dataset has about twice as many items as the
Flixster dataset, while recall on Flixster is more than twice
that on Epinions for top-5 to top-500 recommendations
 Epinions is multi-category data (cars, movies, books, etc.)
 Users in the Flixster dataset have, on average, more
social connections and item ratings

Conclusion
 Comprehensive study on improving the accuracy of
top-k recommendation using social networks
 Tailored existing social-trust enhanced MF models for top-k
recommendation by considering missing ratings

 Proposed an NN-based top-k recommendation method that
combines users' neighborhoods in the trust network with
their neighborhoods in the latent feature space and uses
voting instead of average rating to consider all ratings

 Among social recommenders that consider missing feedback,
the one that works best for minimizing RMSE works worst for
maximizing the hit rate, and vice versa
 First developing a good RMSE approach and then modifying
the training for top-k is not necessarily a viable strategy for
obtaining a good top-k approach
