On Top-k Recommendation using Social Networks

  • Brief introduction of social network based top-k recommendation.
  • Due to their great commercial value, social recommender systems have been widely deployed in industry, e.g., social photo sharing at Pinterest, the social music community site Last.fm, and the social product review website Epinions. In Epinions, users review various items, such as cars, movies, books, and software, and assign ratings to them. Users also assign trust values to other users whose product reviews or ratings they find valuable. Flixster is a social network site where users can rate movies and share movie reviews.
  • What is social network based top-k recommendation? Most recent work on social network based recommendation focuses on minimizing RMSE, and social network based top-k recommendation is not well studied. This paper studies social network based top-k recommendation.
  • Why study top-k recommendation instead of RMSE-oriented recommendation? Netflix and Amazon present each user with a list of items they may like, so top-k recommendation is the more realistic recommendation task. Top-k is the more relevant task; existing top-k approaches using social networks fall into two families, MF and NN. Keep this structure clear.
  • Look at this part: w_m is the weight for missing ratings, and r_m is the imputed value for missing ratings. The idea of AllRank is to impute a low value for missing ratings, but with low confidence. It is crucial that the weight assigned to the imputed ratings is positive. In contrast, the usual optimization of RMSE (Root Mean Square Error) corresponds to training with w_m = 0. This seemingly small difference has the important effect that the AllRank model is trained on all items, while RMSE approaches are trained only on the observed ratings. Missing ratings tend to indicate that the user is not interested; this is captured by an imputed value r_m below the global average rating. The lower confidence compared with observed ratings is captured by a weight 0 < w_m < 1.
  • The matrix Q is shared between the two equations. Due to this constraint, Q (i.e., the user profile Q_u of each user u) reflects information from both the ratings and the social network, so as to achieve accurate predictions for both. Gamma ≥ 0 determines the weight of the social network information relative to the rating data. Gamma = 0 corresponds to the extreme case where the social network is ignored when learning the matrices P and Q; as Gamma increases, the influence of the social network increases. By adding W_{u,i} with w_m > 0, we train on all user-item pairs; by adding W_{u,v}^{(S)} with w_m^{(S)} > 0, we train on all user-user pairs. For a missing user-item rating we impute the value r_m, and for a missing user-user interaction we impute the value s_m. Note that here we train not only on all user-item interactions but also on all user-user interactions.
  • In the STE model, the predicted rating of user u for item i consists of two parts: the first part, Q_u·P_i, is inferred from user u's own taste; the other part comes from u's followees, whose weighted average latent factors contribute to the final result. The parameter alpha controls the contribution of the two parts. For the modified STE and SocialMF models, the optimization procedure easily gets stuck in local minima; we propose some tricks to escape them, detailed in our technical report.
  • Trust-CF-ULF-best approach: given the neighborhood size, dynamically tune the values of k1 and k2 to obtain the best recall. Recommendation of items to users is the same as in the CF-ULF approach.
  • N(u): number of relevant items of user u; N(k,u): number of relevant items in the top-k list of user u.
  • Epinions: a social product review website. In Epinions, users review various items, such as cars, movies, books, and software, and assign ratings to them. Users also assign trust values to other users whose reviews and/or ratings they find valuable. Compare with the original training on observed ratings only.
  • This illustrates that approaches that work well for the vastly popular RMSE are not necessarily useful for optimizing the more realistic top-k hit ratio or recall.
  • Despite the different properties of the Epinions and Flixster data sets, the results on the Flixster data confirm our results on the Epinions data.
  • The Epinions data is multi-category, containing items from many categories (cars, movies, books, software, etc.), while all items in Flixster are movies, which generally makes recommendation easier. Furthermore, users in the Flixster data set have, on average, more social connections and item ratings than users in the Epinions data set.
  • Existing social-trust enhanced Matrix Factorization (MF) models can be tailored for top-k recommendation by including both observed and missing ratings in their training objective functions. We found that the technical approach for combining feedback data (e.g., ratings) with social network information that works best for minimizing RMSE works poorly for maximizing the hit ratio, and vice versa.

    1. On Top-k Recommendation using Social Networks
       Xiwang Yang, Harald Steck*+, Yang Guo* and Yong Liu
       Polytechnic Institute of NYU; *Bell Labs; +Netflix Inc.
    2. Outline
       • Background & Motivation
       • Social network based top-k recommendation
       • Related Work: AllRank, SoRec, STE, SocialMF, Trust-cf
       • Top-k recommender using social networks
       • Top-k MF using Social Networks
       • Nearest Neighbor Methods
       • Evaluation
       • Conclusion
    3. Social Recommenders Everywhere
    4. Social network based top-k recommendation
       [Figure: a recommender producing a list of top movies for a target customer]
       • Social network based top-k recommendation is not well studied
    5. Social Top-K Recommendation
       • Top-k recommendation: a more realistic RS task
       • Integrate social network information into RS
         - Matrix Factorization (MF)
           · SoRec, STE, SocialMF: optimize RMSE
           · AllRank: no social network information
           · Our approach: directly optimizes social network based top-k recommendation
         - Nearest Neighbor (NN)
           · Trust-cf (RecSys'09): combines the CF neighborhood with the social neighborhood; items rated by the combined neighborhood are considered, their average rating serves as the predicted rating, and items are ranked by predicted rating to form the top-k recommendation
           · Our approach: employs a new neighborhood construction and uses a voting mechanism
    6. AllRank (Steck, KDD'10)
       • Use AllRank to optimize top-k recommendation
         - The user's selection bias causes the observed feedback (e.g., ratings, purchases, clicks) in the data to be missing not at random (MNAR) (RecSys'09)
         - Lower ratings are missing with higher probability
         - Missing ratings tend to indicate that a user does not like the item
       • Prediction: $\hat{R}_{u,i} = r_m + Q_u P_i^T$
       • Objective: $\sum_{\text{all } u} \sum_{\text{all } i} W_{u,i} \bigl( R^{o\&i}_{u,i} - \hat{R}_{u,i} \bigr)^2 + \lambda \bigl( \|P\|_F^2 + \|Q\|_F^2 \bigr)$
         with $W_{u,i} = \begin{cases} 1 & \text{if } R_{u,i} \text{ observed} \\ w_m & \text{otherwise} \end{cases}$ and $R^{o\&i}_{u,i} = \begin{cases} R_{u,i} & \text{if } R_{u,i} \text{ observed} \\ r_m & \text{otherwise} \end{cases}$
         - $w_m > 0$: training on all items
         - BaseMF: $w_m = 0$, training on observed ratings only
       • Rank items by predicted rating to form the top-k list
       • Tailor existing social-trust enhanced MF models for top-k recommendation
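A minimal pure-Python sketch of evaluating the AllRank objective from slide 6 on a toy problem, plus ranking all items by predicted rating. The function names `allrank_objective` and `top_k_items`, the list-of-lists representation of P and Q, and the dict of observed ratings are assumptions for illustration, not the authors' code; no optimizer is shown.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def allrank_objective(P: Matrix, Q: Matrix, observed: Dict[Tuple[int, int], float],
                      r_m: float, w_m: float, lam: float) -> float:
    """Weighted squared error over ALL user-item pairs plus Frobenius regularization.

    Observed pairs use weight 1 and target R_{u,i}; missing pairs use
    weight w_m and the imputed target r_m. Setting w_m = 0 gives BaseMF.
    """
    loss = 0.0
    for u, q_u in enumerate(Q):
        for i, p_i in enumerate(P):
            pred = r_m + dot(q_u, p_i)  # \hat R_{u,i} = r_m + Q_u P_i^T
            if (u, i) in observed:
                loss += (observed[(u, i)] - pred) ** 2
            else:
                loss += w_m * (r_m - pred) ** 2
    frob = sum(x * x for row in P for x in row) + sum(x * x for row in Q for x in row)
    return loss + lam * frob


def top_k_items(P: Matrix, Q: Matrix, u: int, r_m: float, k: int) -> List[int]:
    """Rank all items for user u by predicted rating, highest first."""
    scores = [(r_m + dot(Q[u], p_i), i) for i, p_i in enumerate(P)]
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))[:k]]


P = [[1.0], [0.0]]        # two items, one latent dimension
Q = [[1.0], [0.5]]        # two users
observed = {(0, 0): 5.0}  # only user 0 rated item 0
print(allrank_objective(P, Q, observed, r_m=2.0, w_m=0.1, lam=0.0))  # about 4.025
print(allrank_objective(P, Q, observed, r_m=2.0, w_m=0.0, lam=0.0))  # 4.0
print(top_k_items(P, Q, u=1, r_m=2.0, k=1))                          # [0]
```

With `w_m = 0` the missing pair (1, 0) no longer contributes, which is exactly the "training on observed ratings only" behavior of BaseMF.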
    7. Outline
       • Background & Motivation
       • Social network based top-k recommendation
       • Related Work: AllRank, SoRec, STE, SocialMF
       • Top-k recommender using social networks
       • Top-k MF using Social Networks
       • Nearest Neighbor Methods
       • Evaluation
       • Conclusion
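    8. SoRec
       • Prediction: $\hat{R}_{u,i} = r_m + Q_u P_i^T$, $\quad \hat{S}^*_{u,v} = s_m + Q_u Z_v^T$
       • Objective (optimizes RMSE):
         $\sum_{(u,i)\,\text{obs.}} \bigl( R_{u,i} - \hat{R}_{u,i} \bigr)^2 + \gamma \sum_{(u,v)\,\text{obs.}} \bigl( S^*_{u,v} - \hat{S}^*_{u,v} \bigr)^2 + \lambda \bigl( \|P\|_F^2 + \|Q\|_F^2 + \|Z\|_F^2 \bigr)$
       • Modified objective (optimizes top-k hit rate):
         $\sum_{\text{all } u} \sum_{\text{all } i} W_{u,i} \bigl( R^{o\&i}_{u,i} - \hat{R}_{u,i} \bigr)^2 + \sum_{\text{all } u} \sum_{\text{all } v} W^{(S)}_{u,v} \bigl( S^{*(o\&i)}_{u,v} - \hat{S}^*_{u,v} \bigr)^2 + \lambda \bigl( \|P\|_F^2 + \|Q\|_F^2 + \|Z\|_F^2 \bigr)$
         with $W_{u,i} = \begin{cases} 1 & \text{if } R_{u,i} \text{ observed} \\ w_m & \text{otherwise} \end{cases}$, $\quad R^{o\&i}_{u,i} = \begin{cases} R_{u,i} & \text{if } R_{u,i} \text{ observed} \\ r_m & \text{otherwise} \end{cases}$,
         $W^{(S)}_{u,v} = \gamma \begin{cases} 1 & \text{if } S^*_{u,v} \text{ observed} \\ w^{(S)}_m & \text{otherwise} \end{cases}$, $\quad S^{*(o\&i)}_{u,v} = \begin{cases} S^*_{u,v} & \text{if } S^*_{u,v} \text{ observed} \\ s_m & \text{otherwise} \end{cases}$,
         where $w_m > 0$ and $w^{(S)}_m > 0$
       • The top-k list is generated by ranking the predicted ratings of all items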
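To make the shared user matrix Q concrete, here is a minimal sketch of evaluating the modified SoRec objective from slide 8. The helper names (`weighted_loss`, `sorec_topk_objective`), the dict-based encoding of observed ratings and trust values, and the toy numbers are assumptions for illustration only. Because $W^{(S)}_{u,v}$ is $\gamma$ times the per-pair weight, the sketch multiplies the whole social term by `gamma`.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Observed = Dict[Tuple[int, int], float]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sq_frobenius(M: Matrix) -> float:
    return sum(x * x for row in M for x in row)


def weighted_loss(A: Matrix, B: Matrix, observed: Observed, m: float, w_miss: float) -> float:
    """Sum over ALL (a, b) of W * (target - (m + A_a . B_b))^2.

    Observed pairs: weight 1, target = observed value.
    Missing pairs: weight w_miss, target = imputed value m (r_m or s_m).
    """
    total = 0.0
    for a, x in enumerate(A):
        for b, y in enumerate(B):
            pred = m + dot(x, y)
            if (a, b) in observed:
                total += (observed[(a, b)] - pred) ** 2
            else:
                total += w_miss * (m - pred) ** 2
    return total


def sorec_topk_objective(P: Matrix, Q: Matrix, Z: Matrix, ratings: Observed, trust: Observed,
                         r_m: float, s_m: float, w_m: float, w_m_s: float,
                         gamma: float, lam: float) -> float:
    # Rating part: \hat R_{u,i} = r_m + Q_u P_i^T over all user-item pairs.
    rating_term = weighted_loss(Q, P, ratings, r_m, w_m)
    # Social part: \hat S*_{u,v} = s_m + Q_u Z_v^T over all user-user pairs;
    # Q is the same matrix in both terms.
    social_term = gamma * weighted_loss(Q, Z, trust, s_m, w_m_s)
    reg = lam * (sq_frobenius(P) + sq_frobenius(Q) + sq_frobenius(Z))
    return rating_term + social_term + reg


P = [[1.0]]               # one item
Q = [[1.0], [0.0]]        # two users (shared profiles)
Z = [[0.5], [1.0]]        # two users as trustees
ratings = {(0, 0): 4.0}
trust = {(0, 1): 1.0}
print(sorec_topk_objective(P, Q, Z, ratings, trust, r_m=2.0, s_m=0.0,
                           w_m=0.1, w_m_s=0.2, gamma=2.0, lam=0.0))  # about 1.1
```

Setting `gamma=0.0` drops the social term entirely, matching the speaker-note remark that Gamma = 0 ignores the social network.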
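    9. STE and SocialMF
       • STE: $\hat{R}_{u,i} = r_m + \alpha\, Q_u P_i^T + (1 - \alpha) \sum_{v} S_{u,v} Q_v P_i^T$
         - Modified objective (optimizes top-k hit rate): $\sum_{\text{all } u} \sum_{\text{all } i} W_{u,i} \bigl( R^{o\&i}_{u,i} - \hat{R}_{u,i} \bigr)^2 + \lambda \bigl( \|P\|_F^2 + \|Q\|_F^2 \bigr)$
           with $W_{u,i} = \begin{cases} 1 & \text{if } R_{u,i} \text{ observed} \\ w_m > 0 & \text{otherwise} \end{cases}$ and $R^{o\&i}_{u,i} = \begin{cases} R_{u,i} & \text{if } R_{u,i} \text{ observed} \\ r_m & \text{otherwise} \end{cases}$
       • SocialMF: $\hat{R}_{u,i} = r_m + Q_u P_i^T$
         - Modified objective (optimizes top-k hit rate): $\sum_{\text{all } u} \sum_{\text{all } i} W_{u,i} \bigl( R^{o\&i}_{u,i} - \hat{R}_{u,i} \bigr)^2 + \beta \sum_{\text{all } u} \Bigl( Q_u - \sum_v S^*_{u,v} Q_v \Bigr) \Bigl( Q_u - \sum_v S^*_{u,v} Q_v \Bigr)^T + \lambda \bigl( \|P\|_F^2 + \|Q\|_F^2 \bigr)$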
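The two models on slide 9 inject the social network differently: STE mixes followees' latent factors into the prediction, while SocialMF keeps the plain prediction and adds a penalty pulling $Q_u$ toward its followees' weighted average. A minimal sketch of both pieces follows; the names `ste_predict` and `socialmf_penalty`, the list-of-dicts trust encoding, and skipping users without followees in the penalty are assumptions of this sketch, not details from the slides.

```python
from typing import Dict, List

Matrix = List[List[float]]
Trust = List[Dict[int, float]]  # S[u] maps followee v -> weight S_{u,v}


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def ste_predict(u: int, i: int, P: Matrix, Q: Matrix, S: Trust,
                r_m: float, alpha: float) -> float:
    """STE: r_m + alpha * Q_u P_i^T + (1 - alpha) * sum_v S_{u,v} Q_v P_i^T."""
    own = dot(Q[u], P[i])                                          # u's own taste
    social = sum(s_uv * dot(Q[v], P[i]) for v, s_uv in S[u].items())  # followees
    return r_m + alpha * own + (1 - alpha) * social


def socialmf_penalty(Q: Matrix, S: Trust, beta: float) -> float:
    """SocialMF term: beta * sum_u ||Q_u - sum_v S*_{u,v} Q_v||^2.

    Users without followees are skipped (an assumption of this sketch).
    """
    total = 0.0
    for u, q_u in enumerate(Q):
        if not S[u]:
            continue
        neighbor_avg = [0.0] * len(q_u)
        for v, s_uv in S[u].items():
            for d, q in enumerate(Q[v]):
                neighbor_avg[d] += s_uv * q
        # (Q_u - avg)(Q_u - avg)^T for a row vector is its squared norm
        total += sum((a - b) ** 2 for a, b in zip(q_u, neighbor_avg))
    return beta * total


Q = [[1.0], [3.0]]   # user 0 follows user 1
P = [[2.0]]
S = [{1: 1.0}, {}]
print(ste_predict(0, 0, P, Q, S, r_m=1.0, alpha=0.5))  # 5.0
print(socialmf_penalty(Q, S, beta=0.5))                # 2.0
```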
    10. Nearest Neighbor Methods: CF-ULF approach
        • Use AllRank to obtain user latent features
        • Cluster users by PCC in the latent feature space
        • Select the k1 nearest neighbors of target user u
        • Relevant items of these nearest neighbors are voted to the target user; the voting weight is the PCC similarity:
          $\mathrm{Vote}_{u,i} = \sum_{v \in N_u} \mathrm{sim}(u,v)\, \delta^v_i$, where $N_u$ is the neighborhood of $u$ and $\delta^v_i = 1$ if item $i$ is in $I_v$ (the relevant items of $v$), else 0
        • The top-k list is generated based on the voting value
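The CF-ULF voting on slide 10 can be sketched as follows, assuming the user latent vectors Q were already learned with AllRank. For brevity this sketch picks the k1 most PCC-similar users directly rather than running a separate clustering step, and it skips items the target user already has; both are assumptions, as are the names `pearson` and `cf_ulf_topk`.

```python
from math import sqrt
from typing import List, Set


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient (PCC) of two latent vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def cf_ulf_topk(u: int, Q: List[List[float]], relevant: List[Set[int]],
                k1: int, k: int) -> List[int]:
    """Vote the relevant items of u's k1 most PCC-similar users in latent space."""
    sims = sorted(((pearson(Q[u], Q[v]), v) for v in range(len(Q)) if v != u),
                  reverse=True)
    votes = {}
    for sim, v in sims[:k1]:
        for i in relevant[v]:
            if i in relevant[u]:
                continue  # assumption: do not re-recommend u's own items
            votes[i] = votes.get(i, 0.0) + sim  # Vote_{u,i} += sim(u, v)
    return sorted(votes, key=lambda i: (-votes[i], i))[:k]


Q = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]  # toy latent features
relevant = [set(), {10, 11}, {12}]
print(cf_ulf_topk(0, Q, relevant, k1=1, k=2))  # [10, 11]
```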
    11. Nearest Neighbor Methods: PureTrust approach
        • Breadth-first search (BFS) in the social network finds k2 trusted users of the target user u
        • Relevant items of these trusted users are voted to the target user; the voting weight is proportional to 1/d_v:
          $\mathrm{Vote}_{u,i} = \sum_{v \in N^t_u} w_t(u,v)\, \delta^v_i$, where $\delta^v_i = 1$ if item $i$ is a relevant item of $v$, else 0
          - $N^t_u$ is the set of trusted users of u
          - $w_t(u,v) = 1/d_v$ is the voting weight from user v
          - $d_v$ is the depth of user v in the BFS tree rooted at user u
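A minimal sketch of PureTrust from slide 11: a breadth-first search over trust edges collects up to k2 users with their BFS depth $d_v$, and each one votes for its relevant items with weight $1/d_v$. The adjacency-dict encoding of the trust network, the function names, and skipping the target user's own items are assumptions made for this illustration.

```python
from collections import deque
from typing import Dict, Hashable, List, Set, Tuple


def bfs_trusted(u: int, trust: Dict[int, List[int]], k2: int) -> List[Tuple[int, int]]:
    """Up to k2 (user, depth) pairs, in BFS order from u over trust edges."""
    depth = {u: 0}
    found: List[Tuple[int, int]] = []
    queue = deque([u])
    while queue and len(found) < k2:
        x = queue.popleft()
        for v in trust.get(x, []):
            if v in depth:
                continue
            depth[v] = depth[x] + 1
            found.append((v, depth[v]))
            queue.append(v)
            if len(found) == k2:
                break
    return found


def pure_trust_topk(u: int, trust: Dict[int, List[int]],
                    relevant: Dict[int, Set[Hashable]], k2: int, k: int) -> List[Hashable]:
    """Vote relevant items of trusted users with weight w_t(u, v) = 1 / d_v."""
    votes: Dict[Hashable, float] = {}
    own = relevant.get(u, set())
    for v, d_v in bfs_trusted(u, trust, k2):
        for i in relevant.get(v, set()):
            if i not in own:  # assumption: skip items u already has
                votes[i] = votes.get(i, 0.0) + 1.0 / d_v
    return sorted(votes, key=lambda i: (-votes[i], i))[:k]


trust = {0: [1, 2], 1: [3], 2: [], 3: []}
relevant = {1: {"a"}, 2: {"b"}, 3: {"b", "c"}}
print(bfs_trusted(0, trust, 3))                # [(1, 1), (2, 1), (3, 2)]
print(pure_trust_topk(0, trust, relevant, 3, 2))  # ['b', 'a']
```

Item "b" wins because it collects a depth-1 vote (1.0) and a depth-2 vote (0.5).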
    12. Nearest Neighbor Methods: Trust-CF-ULF approach
        • Combination of the CF-ULF approach and PureTrust
        • Find the k1 nearest neighbors from the CF-ULF neighborhood
        • Find k2 nearest neighbors from the trust neighborhood that are not in the k1 set (k2 = k1)
        • Relevant items of these users are voted to the target user
        • The top-k list is generated based on the voting value
        Trust-CF-ULF-best approach
        • Given the total neighborhood size, dynamically tune the values of k1 and k2 to obtain the best recall
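The neighborhood merge on slide 12 can be sketched as below. The slides do not say how votes from the two neighborhoods are weighted, so this sketch simply lets each neighbor keep its own weight (PCC for CF neighbors, $1/d_v$ for trust neighbors); that choice, the helper names, and the recall callback in `best_split` are all assumptions. `best_split` mirrors Trust-CF-ULF-best by trying every k1 with k2 = total - k1.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple

Neighbors = List[Tuple[int, float]]  # (user, voting weight), best first


def combine_neighbors(cf_ranked: Neighbors, trust_ranked: Neighbors,
                      k1: int, k2: int) -> Neighbors:
    """Take k1 CF-ULF neighbors, then trust neighbors not already chosen."""
    chosen = list(cf_ranked[:k1])
    taken = {v for v, _ in chosen}
    for v, w in trust_ranked:
        if len(chosen) >= k1 + k2:
            break
        if v not in taken:
            chosen.append((v, w))
            taken.add(v)
    return chosen


def vote(neighbors: Neighbors, relevant: Dict[int, Set[Hashable]],
         exclude: Set[Hashable], k: int) -> List[Hashable]:
    """Sum each neighbor's weight over its relevant items; return the top k."""
    votes: Dict[Hashable, float] = {}
    for v, w in neighbors:
        for i in relevant.get(v, set()):
            if i not in exclude:
                votes[i] = votes.get(i, 0.0) + w
    return sorted(votes, key=lambda i: (-votes[i], i))[:k]


def best_split(total: int, recall_of: Callable[[int, int], float]) -> int:
    """Trust-CF-ULF-best: choose k1 in 0..total (k2 = total - k1) maximizing recall."""
    return max(range(total + 1), key=lambda k1: recall_of(k1, total - k1))


cf = [(1, 0.9), (2, 0.8)]
trust = [(2, 1.0), (3, 1.0), (4, 0.5)]
merged = combine_neighbors(cf, trust, k1=2, k2=2)
print(merged)  # [(1, 0.9), (2, 0.8), (3, 1.0), (4, 0.5)]
print(vote(merged, {1: {"x"}, 3: {"x", "y"}}, exclude=set(), k=2))  # ['x', 'y']
```

In practice `recall_of` would train/evaluate on held-out data; here it is only a placeholder callback.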
    13. Outline
        • Background & Motivation
        • Social network based top-k recommendation
        • Related Work: AllRank, SoRec, STE, SocialMF
        • Top-k recommender using social networks
        • Top-k MF using Social Networks
        • Nearest Neighbor Methods
        • Evaluation
        • Conclusion
    14. Evaluation Metrics
        • Top-k hit rate (recall): the fraction of relevant items in the test set that are in the top-k of the ranking list
        • RMSE: $\mathrm{RMSE} = \sqrt{ \dfrac{ \sum_{(u,i) \in R_{test}} \bigl( R_{u,i} - \hat{R}_{u,i} \bigr)^2 }{ |R_{test}| } }$
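Both metrics on slide 14 are easy to compute; the sketch below pools hits over users, i.e. $\sum_u N(k,u) / \sum_u N(u)$ with N(u) and N(k,u) as defined in the speaker notes. Pooling (rather than averaging per-user recall) is an assumption that matches the "fraction of relevant items in the test set" wording; the function names and toy data are illustrative.

```python
from math import sqrt
from typing import Dict, Hashable, List, Set, Tuple


def topk_recall(top_k: Dict[int, List[Hashable]],
                test_relevant: Dict[int, Set[Hashable]]) -> float:
    """Pooled top-k hit rate: sum_u N(k,u) / sum_u N(u)."""
    hits = sum(len(set(top_k.get(u, [])) & items) for u, items in test_relevant.items())
    total = sum(len(items) for items in test_relevant.values())
    return hits / total if total else 0.0


def rmse(pairs: List[Tuple[float, float]]) -> float:
    """pairs holds (R_{u,i}, predicted R_{u,i}) for every (u, i) in R_test."""
    return sqrt(sum((r - p) ** 2 for r, p in pairs) / len(pairs))


test_relevant = {0: {1, 2}, 1: {3}}      # N(0) = 2, N(1) = 1
top_k = {0: [1, 5], 1: [4, 6]}           # N(2,0) = 1, N(2,1) = 0
print(topk_recall(top_k, test_relevant))  # 1 hit out of 3 relevant items
print(rmse([(4.0, 3.0), (2.0, 2.0)]))     # sqrt(0.5)
```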
    15. Top-k hit rate on Epinions Dataset
        • 71K users, 104K items, 571K item reviews, 509K trust statements
        • Up to ~10× improvement compared with training on observed ratings
        • The social network is very helpful for top-k recommendation, especially for recommendations to cold-start users
          - Modified SoRec outperforms modified No Trust (AllRank) by 23.1% in overall recall and by 101.8% in cold-user recall
        • Recall of cold users in SoRec is better than recall of all users
          - Items rated by cold users have received 102 ratings on average
          - Items rated by all users have received 93 ratings on average
    16. RMSE on Epinions Dataset
        • Settings: $j_0 = 10$, $\lambda = 0.1$, $r_m = 4.0$, $w_m = 0$
        • RMSE = 1.174 for BaseMF
        • RMSE = 1.095 for SocialMF ($\beta = 20$)
        • RMSE = 1.157 for STE ($\alpha = 0.5$)
        • RMSE = 1.117 for SoRec ($\gamma = 50$ and $w^{(S)}_m = 0$)
        • Consistent with RMSE results in the published literature
        • SocialMF performs best in RMSE but worst in top-k hit rate
    17. Experiments on Epinions Dataset: NN
        • Greatly outperforms existing work (Trust-cf)
          - Trust-cf predicts the target user's rating as the average of the neighbors' rating values, which is based on the observed ratings only
          - Our CF neighbors are derived from user latent features obtained from AllRank, which accounts for data MNAR by training on all items
          - Voting is the simplest possible way of accounting for all ratings, i.e., counting 0 for an absent rating and 1 for an observed relevant rating
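A toy contrast between the two scoring rules on slide 17 shows why averaging observed ratings can mislead: an item rated highly by a single neighbor outranks an item that most neighbors found relevant. This is a simplified stand-in for Trust-cf's averaging, not its actual implementation; the relevance threshold of 4.0 and the function names are assumptions for illustration.

```python
from typing import Dict, Hashable, List

Ratings = Dict[int, Dict[Hashable, float]]  # user -> {item: observed rating}


def average_rating_scores(neighbors: List[int], ratings: Ratings) -> Dict[Hashable, float]:
    """Average observed rating per item among neighbors (observed ratings only)."""
    sums: Dict[Hashable, float] = {}
    counts: Dict[Hashable, int] = {}
    for v in neighbors:
        for i, r in ratings.get(v, {}).items():
            sums[i] = sums.get(i, 0.0) + r
            counts[i] = counts.get(i, 0) + 1
    return {i: sums[i] / counts[i] for i in sums}


def voting_scores(neighbors: List[int], ratings: Ratings,
                  threshold: float) -> Dict[Hashable, int]:
    """Count 1 per neighbor with a relevant rating (>= threshold), 0 if absent."""
    votes: Dict[Hashable, int] = {}
    for v in neighbors:
        for i, r in ratings.get(v, {}).items():
            if r >= threshold:
                votes[i] = votes.get(i, 0) + 1
    return votes


neighbors = [1, 2, 3]
ratings = {1: {"niche": 5.0, "hit": 5.0}, 2: {"hit": 4.0}, 3: {"hit": 4.0}}
avg = average_rating_scores(neighbors, ratings)
votes = voting_scores(neighbors, ratings, threshold=4.0)
print(max(avg, key=avg.get), max(votes, key=votes.get))  # niche hit
```

Averaging ignores that two of the three neighbors never rated "niche"; voting implicitly counts those absences as 0.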
    18. Experiments on Flixster Dataset
        • ~1M users, 49K movies, 8.2M ratings, 26.7M connections
        • Results are similar
    19. Impact of Dimensionality and Top-k
        • The top-k hit rate on the Flixster data is much better than on the Epinions data
          - The Epinions data set has about twice as many items as Flixster, while the recall on Flixster is more than twice that on Epinions for top-5 to top-500 recommendations
          - Epinions is multi-category data (cars, movies, books, etc.)
          - Users in the Flixster data set have, on average, more social connections and item ratings
    20. Conclusion
        • Comprehensive study on improving the accuracy of top-k recommendation using social networks
        • Tailored existing social-trust enhanced MF models for top-k recommendation by considering missing ratings
        • Proposed an NN based top-k recommendation method that combines users' neighborhoods in the trust network with their neighborhoods in the latent feature space, and uses voting instead of average ratings to account for all ratings
        • Among social recommenders that consider missing feedback, the approach that works best for minimizing RMSE works worst for maximizing the hit rate, and vice versa
        • First developing a good RMSE approach and then modifying its training for top-k is not necessarily a viable strategy for obtaining a good top-k approach
    21. Thanks! Q&A
