1. On Top-k Recommendation using Social Networks
Xiwang Yang, Harald Steck*+, Yang Guo* and Yong Liu
Polytechnic Institute of NYU
*Bell Labs
+Netflix Inc.
2. Outline
Background & Motivation
Social network based top-k recommendation
Related Work: AllRank, SoRec, STE, SocialMF, Trust-cf
Top-k recommender using social networks
Top-k MF using Social Networks
Nearest Neighbor Methods
Evaluation
Conclusion
4. Social network based top-k recommendation
[Figure: a recommender produces a list of top movies for a target customer]
Social network based top-k recommendation is not well studied
5. Social Top-K Recommendation
Top-k recommendation:
More realistic RS task
Integrate social network information into RS
Matrix Factorization (MF)
• SoRec, STE, SocialMF: optimize RMSE
• AllRank: no social network information
• Our approach directly optimizes social network based
top-k recommendation
Nearest Neighbor (NN)
• Trust-cf (RecSys’09)
– Combines the CF neighborhood with the social neighborhood;
considers the items rated by the combined neighborhood,
predicts each rating as the neighbors’ average rating, and ranks
items by predicted rating to form the top-k recommendation
• Our approach employs a new neighborhood construction +
a voting mechanism
6. AllRank (Steck, KDD’10)
Use AllRank to optimize top-k recommendation
Users’ selection bias causes the observed feedback (e.g. ratings,
purchases, clicks) in the data to be missing not at random (MNAR)
(RecSys’09)
Lower ratings are missing with higher probability
Missing ratings tend to indicate that a user does not like the item
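Prediction:
$$\hat{R}_{u,i} = r_m + Q_u P_i^T$$
Objective:
$$\sum_{\text{all } u} \sum_{\text{all } i} W_{u,i} \left( R^{o\&i}_{u,i} - \hat{R}_{u,i} \right)^2 + \lambda \left( \|P\|_F^2 + \|Q\|_F^2 \right)$$
$$W_{u,i} = \begin{cases} 1 & \text{if } R_{u,i} \text{ observed} \\ w_m & \text{otherwise} \end{cases} \qquad R^{o\&i}_{u,i} = \begin{cases} R_{u,i} & \text{if } R_{u,i} \text{ observed} \\ r_m & \text{otherwise} \end{cases}$$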
AllRank: w_m > 0, training on all items
BaseMF: w_m = 0, training on observed ratings only
Rank items based on predicted rating to form top-k list
Tailor existing social-trust enhanced MF model for top-k
recommendation
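A minimal NumPy sketch of the AllRank objective on this slide, assuming dense matrices and a boolean mask of observed ratings. The function name `allrank_loss` and the default values of `r_m`, `w_m` and `lam` are illustrative assumptions, not values from the talk.

```python
import numpy as np

def allrank_loss(R, observed, P, Q, r_m=2.0, w_m=0.05, lam=0.1):
    """Weighted squared error over ALL user-item pairs plus regularization.

    R        : (n_users, n_items) ratings; entries with observed == False are ignored
    observed : boolean mask with the same shape as R
    P        : (n_items, d) item factors
    Q        : (n_users, d) user factors
    """
    R_hat = r_m + Q @ P.T                    # prediction: r_m + Q_u P_i^T
    target = np.where(observed, R, r_m)      # R^{o&i}: observed rating or imputed r_m
    W = np.where(observed, 1.0, w_m)         # weight 1 if observed, w_m otherwise
    data_term = np.sum(W * (target - R_hat) ** 2)
    reg_term = lam * (np.sum(P ** 2) + np.sum(Q ** 2))
    return data_term + reg_term
```

Calling the function with `w_m=0` gives the BaseMF case, where missing entries contribute nothing to the loss.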
7. Outline
Background & Motivation
Social network based top-k recommendation
Related Work: AllRank, SoRec, STE, SocialMF
Top-k recommender using social networks
Top-k MF using Social Networks
Nearest Neighbor Methods
Evaluation
Conclusion
8. SoRec
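Prediction:
$$\hat{R}_{u,i} = r_m + Q_u P_i^T \qquad \hat{S}^*_{u,v} = s_m + Q_u Z_v^T$$
Objective (optimizes RMSE):
$$\sum_{(u,i)\ \text{obs.}} \left( R_{u,i} - \hat{R}_{u,i} \right)^2 + \gamma \sum_{(u,v)\ \text{obs.}} \left( S^*_{u,v} - \hat{S}^*_{u,v} \right)^2 + \lambda \left( \|P\|_F^2 + \|Q\|_F^2 + \|Z\|_F^2 \right)$$
Modified objective (optimizes top-k hit rate):
$$\sum_{\text{all } u} \sum_{\text{all } i} W_{u,i} \left( R^{o\&i}_{u,i} - \hat{R}_{u,i} \right)^2 + \sum_{\text{all } u} \sum_{\text{all } v} W^{(S)}_{u,v} \left( S^{*(o\&i)}_{u,v} - \hat{S}^*_{u,v} \right)^2 + \lambda \left( \|P\|_F^2 + \|Q\|_F^2 + \|Z\|_F^2 \right)$$
$$W_{u,i} = \begin{cases} 1 & \text{if } R_{u,i} \text{ observed} \\ w_m > 0 & \text{otherwise} \end{cases} \qquad R^{o\&i}_{u,i} = \begin{cases} R_{u,i} & \text{if } R_{u,i} \text{ observed} \\ r_m & \text{otherwise} \end{cases}$$
$$W^{(S)}_{u,v} = \gamma \cdot \begin{cases} 1 & \text{if } S^*_{u,v} \text{ observed} \\ w^{(S)}_m > 0 & \text{otherwise} \end{cases} \qquad S^{*(o\&i)}_{u,v} = \begin{cases} S^*_{u,v} & \text{if } S^*_{u,v} \text{ observed} \\ s_m & \text{otherwise} \end{cases}$$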
Top-k list generated based on ranking of predicted ratings of all items
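The modified SoRec objective can be sketched in NumPy as follows, assuming dense rating and trust matrices with boolean masks. The names `weighted_sq_error` and `sorec_topk_loss` are hypothetical. As a simplification, the sketch sums over all (u, v) pairs including u = v.

```python
import numpy as np

def weighted_sq_error(target, mask, pred, imputed, w_missing):
    """Squared error over all entries: weight 1 if observed, w_missing otherwise."""
    filled = np.where(mask, target, imputed)   # observed value or imputed value
    W = np.where(mask, 1.0, w_missing)
    return np.sum(W * (filled - pred) ** 2)

def sorec_topk_loss(R, R_obs, S, S_obs, P, Q, Z,
                    r_m, s_m, w_m, w_m_s, gamma, lam):
    """Rating term + gamma * social term + regularization; Q is shared by both."""
    rating_term = weighted_sq_error(R, R_obs, r_m + Q @ P.T, r_m, w_m)
    social_term = gamma * weighted_sq_error(S, S_obs, s_m + Q @ Z.T, s_m, w_m_s)
    reg_term = lam * (np.sum(P ** 2) + np.sum(Q ** 2) + np.sum(Z ** 2))
    return rating_term + social_term + reg_term
```

Setting `gamma=0` drops the social term, so that Q is learned from the ratings alone.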
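9. STE:
$$\hat{R}_{u,i} = r_m + \alpha Q_u P_i^T + (1 - \alpha) \sum_v S_{u,v} Q_v P_i^T$$
Modified objective (optimizes top-k hit rate):
$$\sum_{\text{all } u} \sum_{\text{all } i} W_{u,i} \left( R^{o\&i}_{u,i} - \hat{R}_{u,i} \right)^2 + \lambda \left( \|P\|_F^2 + \|Q\|_F^2 \right)$$
$$W_{u,i} = \begin{cases} 1 & \text{if } R_{u,i} \text{ observed} \\ w_m > 0 & \text{otherwise} \end{cases} \qquad R^{o\&i}_{u,i} = \begin{cases} R_{u,i} & \text{if } R_{u,i} \text{ observed} \\ r_m & \text{otherwise} \end{cases}$$
SocialMF:
$$\hat{R}_{u,i} = r_m + Q_u P_i^T$$
Modified objective (optimizes top-k hit rate):
$$\sum_{\text{all } u} \sum_{\text{all } i} W_{u,i} \left( R^{o\&i}_{u,i} - \hat{R}_{u,i} \right)^2 + \beta \sum_{\text{all } u} \left( Q_u - \sum_v S^*_{u,v} Q_v \right) \left( Q_u - \sum_v S^*_{u,v} Q_v \right)^T + \lambda \left( \|P\|_F^2 + \|Q\|_F^2 \right)$$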
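A short NumPy sketch of the two social ingredients on this slide: the STE prediction and the SocialMF penalty term. It assumes `S` is a dense (n_users, n_users) trust matrix. The function names are hypothetical.

```python
import numpy as np

def ste_predict(Q, P, S, r_m, alpha):
    """STE: blend a user's own taste with the taste of the users they trust."""
    own = Q @ P.T              # Q_u P_i^T
    social = (S @ Q) @ P.T     # sum_v S_{u,v} Q_v P_i^T
    return r_m + alpha * own + (1.0 - alpha) * social

def socialmf_penalty(Q, S, beta):
    """SocialMF: penalize the distance between Q_u and the trust-weighted
    average of the latent features of the users that u trusts."""
    diff = Q - S @ Q
    return beta * np.sum(diff ** 2)
```

With `alpha=1` the STE prediction reduces to the plain MF prediction r_m + Q_u P_i^T.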
10. Nearest Neighbor Methods
CF-ULF approach
Use AllRank to obtain user latent features
Cluster users by PCC (Pearson correlation coefficient) in the latent feature space
Select the k1 nearest neighbors of target user u
The relevant items of these nearest neighbors are voted to the
target user; the voting weight is the PCC similarity
$$\text{Vote}_{u,i} = \sum_{v \in N_u} \text{sim}(u, v)\, \delta_{i \in I_v}$$
(N_u: nearest neighbors of u; I_v: relevant items of user v)
Top-k list is generated based on voting value
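The CF-ULF voting step can be sketched as below, assuming the user latent features `Q` have already been learned (for example with AllRank) and that `relevant` is a 0/1 user-item matrix of relevant items. The function name `cf_ulf_votes` is hypothetical. The sketch does not filter out items the target user has already rated.

```python
import numpy as np

def cf_ulf_votes(Q, relevant, u, k1):
    """Vote_{u,i} = sum over the k1 nearest neighbors v of sim(u, v) * [i in I_v].

    Q        : (n_users, d) user latent features
    relevant : (n_users, n_items) 0/1 matrix, 1 if the item is relevant to the user
    """
    sims = np.corrcoef(Q)[u].copy()        # PCC between user u and every user
    sims[u] = -np.inf                      # a user is not their own neighbor
    neighbors = np.argsort(-sims)[:k1]     # k1 most similar users
    votes = sims[neighbors] @ relevant[neighbors]
    return votes, neighbors

def top_k_items(votes, k):
    """Indices of the k items with the highest vote value."""
    return np.argsort(-votes)[:k]
```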
11. Nearest Neighbor Methods
PureTrust approach
Breadth-first search (BFS) in the social network to
find the k2 users trusted by the target user u
The relevant items of these trusted users are voted to the
target user; the voting weight is proportional to 1/d_v
$$\text{Vote}_{u,i} = \sum_{v \in N_u^{(t)}} w_t(u, v)\, \delta_{i \in I_v}$$
N_u^{(t)} is the set of trusted users of u
w_t(u, v) is the voting weight from user v:
$$w_t(u, v) = \frac{1}{d_v}$$
d_v is the depth of user v in the BFS tree rooted at user u
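A minimal sketch of the PureTrust procedure, assuming the trust network is given as a dictionary mapping each user to the list of users they trust. The function name and data layout are illustrative assumptions.

```python
from collections import deque, defaultdict

def pure_trust_votes(trust, relevant_items, u, k2):
    """BFS from u collects up to k2 trusted users; each votes with weight 1/d_v.

    trust          : dict user -> list of users that this user trusts
    relevant_items : dict user -> set of relevant items
    """
    depth = {u: 0}
    queue = deque([u])
    trusted = []                               # trusted users in BFS order
    while queue and len(trusted) < k2:
        x = queue.popleft()
        for v in trust.get(x, []):
            if v not in depth:                 # first visit fixes the BFS depth
                depth[v] = depth[x] + 1
                trusted.append(v)
                queue.append(v)
                if len(trusted) == k2:
                    break
    votes = defaultdict(float)
    for v in trusted:
        for item in relevant_items.get(v, ()):
            votes[item] += 1.0 / depth[v]      # w_t(u, v) = 1 / d_v
    return dict(votes), trusted
```

Directly trusted users (depth 1) vote with weight 1, and users two hops away vote with weight 0.5.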
12. Nearest Neighbor Methods
Trust-CF-ULF approach
Combination of the CF-ULF and PureTrust approaches
Find k1 nearest neighbors from the CF-ULF neighborhood
Find k2 nearest neighbors from the trust neighborhood that
are not in the k1 set (k2 = k1)
The relevant items of these users are voted to the target user
Top-k list is generated based on voting value
Trust-CF-ULF-best approach
Given the total neighborhood size, dynamically tune the values of
k1 and k2 to obtain the best recall
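The neighborhood combination can be sketched as follows. The slide does not state how votes from the two neighborhoods are weighted against each other, so this sketch assumes each neighbor simply keeps the weight from its own method (PCC similarity or 1/d_v). The function names and the `(user, weight)` list layout are hypothetical.

```python
def trust_cf_ulf_votes(cf_ranked, trust_ranked, relevant_items, k1, k2):
    """Combine k1 CF-ULF neighbors with k2 trust neighbors not already chosen.

    cf_ranked, trust_ranked : lists of (user, weight), best neighbor first
    relevant_items          : dict user -> set of relevant items
    """
    chosen = dict(cf_ranked[:k1])          # k1 nearest CF-ULF neighbors
    added = 0
    for v, w in trust_ranked:
        if added == k2:
            break
        if v not in chosen:                # skip users already in the k1 set
            chosen[v] = w
            added += 1
    votes = {}
    for v, w in chosen.items():
        for item in relevant_items.get(v, ()):
            votes[item] = votes.get(item, 0.0) + w
    return votes

def top_k(votes, k):
    """Items with the highest vote value; ties are broken by item name."""
    return sorted(votes, key=lambda item: (-votes[item], item))[:k]
```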
13. Outline
Background & Motivation
Social network based top-k recommendation
Related Work: AllRank, SoRec, STE, SocialMF
Top-k recommender using social networks
Top-k MF using Social Networks
Nearest Neighbor Methods
Evaluation
Conclusion
14. Evaluation Metrics
Top-k hit rate (recall)
The fraction of relevant items in the test set that are in the
top-k of the ranking list
RMSE
$$\text{RMSE} = \sqrt{\frac{\sum_{(u,i) \in R_{test}} \left( R_{u,i} - \hat{R}_{u,i} \right)^2}{|R_{test}|}}$$
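Both metrics can be sketched in a few lines of Python. The recall function computes the per-user ratio N(k,u)/N(u), using the quantities named in the speaker notes. How per-user values are aggregated across users is not stated on the slide and is left out here. The function names are hypothetical.

```python
import math

def recall_at_k(ranked_items, relevant_items, k):
    """Per-user top-k hit rate: N(k,u) / N(u)."""
    if not relevant_items:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / len(relevant_items)

def rmse(pairs):
    """Root mean square error over (true rating, predicted rating) test pairs."""
    pairs = list(pairs)
    return math.sqrt(sum((r - p) ** 2 for r, p in pairs) / len(pairs))
```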
15. Top-k hit rate on Epinions Dataset
71K users, 104K items, 571K item reviews, 509K trust statements
Up to ~10× increase compared with training on observed ratings only
The social network is very helpful for top-k recommendation,
especially for cold start users
Modified SoRec outperforms modified No Trust (AllRank) by 23.1% in
terms of overall recall and 101.8% in terms of cold user recall
In SoRec, recall for cold users is better than for all users
An item rated by a cold user has received 102 ratings on average
Over all users, a rated item has received 93 ratings on average
16. RMSE on Epinions Dataset
Set j0 = 10, λ = 0.1, r_m = 4.0, w_m = 0
RMSE = 1.174 for BaseMF
RMSE = 1.095 for SocialMF (β = 20)
RMSE = 1.157 for STE (α = 0.5)
RMSE = 1.117 for SoRec (γ = 50 and w_m^{(S)} = 0)
Consistent with RMSE results in published literature
SocialMF performs best in terms of RMSE but
worst in terms of top-k hit rate
17. Experiments on Epinions Dataset-NN
Greatly outperforms existing work (Trust-cf)
Trust-cf predicts the rating value of the target user as
the average rating value of the user’s neighbors, which is
based on the observed ratings only
Our CF neighbors are derived from user latent features obtained
from AllRank, which accounts for data being MNAR by training on all items
Voting is the simplest possible way of accounting for all
ratings, i.e. by counting 0 for an absent rating and counting 1
for an observed relevant rating
18. Experiments on Flixster Dataset
~1M users, 49K movies, 8.2M ratings,
26.7M connections
Results are similar to those on the Epinions dataset
19. Impact of Dimensionality and Top-k
Top-k hit rate on Flixster data is much better than on
Epinions data
The number of items in the Epinions dataset is about twice that of
the Flixster dataset, while recall on Flixster is more than twice
that on Epinions for top-5 to top-500 recommendations
Epinions is multi-category data (cars, movies, books, etc.)
Users in the Flixster dataset have more social connections
and item ratings on average
20. Conclusion
Comprehensive study on improving the accuracy of
top-k recommendation using social networks
Tailored existing social-trust enhanced MF models for top-k
recommendation by considering missing ratings
Proposed an NN based top-k recommendation method
combining users’ neighborhoods in the trust network with
their neighborhoods in the latent feature space, using
voting instead of average rating to consider all ratings
Among social recommenders that consider missing feedback, the approach
that works best for minimizing RMSE works worst for
maximizing the hit rate, and vice versa
First developing a good RMSE approach and then modifying
the training for top-k is not necessarily a viable strategy for
obtaining a good top-k approach
Brief introduction to social network based top-k recommendation
Due to their great commercial value, social recommender systems have been widely deployed in industry, for example social photo sharing at Pinterest, the social music community site last.fm, and the social product review website Epinions. In Epinions, users review various items, such as cars, movies, books, software, etc., and assign ratings to the items. Users also assign trust values to other users if they find their product reviews or ratings valuable. Flixster is a social network site where users can rate movies and share movie reviews.
What is social network based top-k recommendation? Most recent work on social network based recommendation focuses on minimizing RMSE. Social network based top-k recommendation is not well studied, and it is what we study in this paper.
Why should we study top-k recommendation instead of RMSE oriented recommendation? Netflix and Amazon present each user with a list of items that the user may like, so top-k recommendation is the more realistic recommendation task. Points to cover: top-k is the more relevant task; existing top-k work using social networks; MF and NN methods. Keep the structure clear.
Let's look at this part: w_m is the weight for missing ratings, and r_m is the imputed value for missing ratings. The idea of AllRank is to impute a low value for missing ratings, but with low confidence. It is crucial that the weight assigned to the imputed ratings is positive. In contrast, the usual optimization of the RMSE (root mean square error) is obtained by training with w_m = 0. This seemingly small difference has the important effect that the AllRank model is trained on all items, while RMSE approaches are trained only on the observed ratings. That a missing rating tends to indicate a lack of interest is captured by an imputed value r_m below the global average rating. That it deserves less confidence than an observed rating is captured by a weight 0 < w_m < 1.
The matrix Q is shared between the two equations. Due to this constraint, Q (i.e. the user profile Q_u of each user u) reflects information from both the ratings and the social network, so as to achieve accurate predictions for both. γ ≥ 0 determines the weight of the social network information relative to the rating data. γ = 0 corresponds to the extreme case where the social network is ignored when learning the matrices P and Q. As γ increases, the influence of the social network increases. By adding W_{u,i} with w_m > 0, we train on all user-item pairs. By adding W_{u,v}^{(S)} with w_m^{(S)} > 0, we train on all user-user pairs. For a missing user-item rating, we impute the value r_m. For a missing user-user interaction, we impute the value s_m. Note that here we train not only on all user-item interactions, but also on all user-user interactions.
In the STE model, the predicted rating of user u for item i consists of two parts. The first part, Q_u P_i^T, reflects user u’s own taste. The second part is determined by u’s followees, whose weighted average latent factors contribute to the final result. The parameter α controls the contribution of the two parts. For the modified STE and SocialMF models, the optimization procedure easily gets stuck in a local minimum. We propose some tricks to escape local minima; the details can be found in our technical report.
Trust-CF-ULF-best approach: given the neighborhood size, dynamically tune the values of k1 and k2 so as to obtain the best recall. Items are recommended to the user in the same way as in the CF-ULF approach.
N(u): number of relevant items of user u. N(k,u): number of relevant items in the top-k list for user u.
Epinions is a social product review website. In Epinions, users review various items, such as cars, movies, books, software, etc., and assign ratings to the items. Users also assign trust values to other users whose reviews and/or ratings they find valuable. Compare with the original training.
This illustrates that approaches that work well for the vastly popular RMSE are not necessarily useful for optimizing the more realistic top-k hit ratio or recall.
Despite the different properties of the Epinions and Flixster data sets, the results on the Flixster data confirm our results on the Epinions data.
Epinions is a multi-category dataset that contains items from many categories (cars, movies, books, software, etc.), while the items in Flixster are all movies, which makes recommendation easier in general. Furthermore, users in the Flixster dataset have more social connections and item ratings on average than users in the Epinions dataset.
Existing social-trust enhanced Matrix Factorization (MF) models can be tailored for top-k recommendation by including both observed and missing ratings in their training objective functions. We found that the technical approach for combining feedback data (e.g. ratings) with social network information that works best for minimizing RMSE works poorly for maximizing the hit ratio, and vice versa.