Empirical Evaluation of Active Learning in Recommender Systems

  1. Empirical Evaluation of Active Learning in Recommender Systems Mehdi Elahi Postdoc Researcher Politecnico di Milano, Italy 1 seminar @ Politecnico di Milano July 2015 www.linkedin.com/in/mehdielahi
  2. My Previous Research Group 2 Ph.D. Adviser: Francesco Ricci Full Professor Dean of the faculty of CS https://www.inf.unibz.it/idse/
  3. My Current Research Group 3 Research Adviser: Paolo Cremonesi Associate Professor DEIB @ Politecnico di Milano http://recsys.deib.polimi.it
  4. Outline ¤ Introduction ¤ Active Learning in RS ¤ Offline Evaluation and Results ¤ Application to Mobile RS ¤ Conclusion and Future Works 4
  5. Introduction ¤ Recommender Systems (RSs) are tools that support users' decision making by suggesting products that may be of interest to them. ¤ Examples of Recommender Systems: 5
  6. Introduction ¤ Collaborative Filtering: ¤ A technique to predict unknown ratings by exploiting the ratings given by users, and to recommend the items with the highest predicted ratings. 6
  7. Sparsity of the Data ¤ In Netflix: 98.8 % of the ratings are unknown ¤ In Movielens: 95.7 % of the ratings are unknown [Figure: a sparse user-item rating matrix, with most entries missing] 7
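The sparsity percentages on this slide follow directly from the user, item and rating counts; a minimal sketch (using the rounded MovieLens counts from these slides, so the result is only approximate):

```python
def sparsity(n_users, n_items, n_ratings):
    """Fraction of the user-item matrix with no known rating."""
    return 1.0 - n_ratings / (n_users * n_items)

# Rounded MovieLens counts from these slides: 6040 users, 3900 items,
# 1M ratings -> roughly 96% of the matrix entries are unknown.
movielens_sparsity = sparsity(6040, 3900, 1_000_000)
```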
  8. Active Learning for Collaborative Filtering ¤ Active Learning: ¤ Requests and tries to collect more ratings from the users before offering recommendations 8
  9. Which Items should be chosen? ¤ Not all ratings are equally useful, i.e., they do not bring equal information to the system. ¤ To minimize the user's rating effort, only some of them should be requested and acquired 9
  10. Definition of AL Strategy ¤ An active learning strategy for Collaborative Filtering is a set of rules for choosing the best items for the users to rate 10
  11. Non-Personalized Strategies ¤  Random: selects items randomly (baseline) ¤  Popularity: scores an item according to the frequency of its ratings and then chooses the highest scored items (Carenini, 2003) ¤  Entropy: scores each item with the entropy of its ratings and then chooses the highest scored items (Rashid, 2002 and 2008) ¤  Variance: scores each item with the variance of its ratings and then chooses the highest scored items (Rubens, 2011) ¤  log(Popularity)*Entropy: combines the popularity and entropy scores and then chooses the highest scored items (Rashid, 2002 and 2008) 11
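The non-personalized strategies above can be sketched as scoring functions over the already-collected rating triples (a hedged sketch; function and variable names are illustrative, not from the paper):

```python
import math
from collections import defaultdict

def scores(ratings, strategy):
    """ratings: list of (user, item, value) triples; returns {item: score}."""
    by_item = defaultdict(list)
    for _, item, value in ratings:
        by_item[item].append(value)
    out = {}
    for item, vals in by_item.items():
        n = len(vals)
        if strategy == "popularity":
            out[item] = n                      # how often the item was rated
        elif strategy == "variance":
            mean = sum(vals) / n
            out[item] = sum((v - mean) ** 2 for v in vals) / n
        elif strategy in ("entropy", "log(pop)*entropy"):
            h = 0.0                            # Shannon entropy of the ratings
            for r in set(vals):
                p = vals.count(r) / n
                h -= p * math.log2(p)
            out[item] = h if strategy == "entropy" else math.log(n) * h
    return out

def top_items(ratings, strategy, k=10):
    """Items the strategy would ask the user to rate."""
    s = scores(ratings, strategy)
    return sorted(s, key=s.get, reverse=True)[:k]
```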
  12. Personalized Single Strategies ¤  Highest Predicted: scores an item according to the prediction of its ratings and then chooses the highest scored items (Elahi, 2011) ¤  Lowest Predicted: scores an item according to the prediction of its ratings and then chooses the lowest scored items (Elahi, 2011) ¤  Highest-Lowest Predicted: combines the highest predicted and lowest predicted scores and chooses the highest and lowest scored items (Elahi, 2011) ¤  Binary Prediction: scores an item according to the prediction of its ratings (using a binarized user-item matrix) and then chooses the highest scored items (Elahi, 2011) ¤  Personality-based Binary Prediction: extends the binary prediction strategy by using user attributes, such as the scores for the Big Five personality traits on a scale from 1 to 5 (Elahi, 2013). 12
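The matrix transformation behind Binary Prediction can be sketched like this (a hypothetical helper; the actual strategy then trains a prediction model on the transformed matrix and requests the items with the highest predicted scores):

```python
def binarize(ratings, users, items):
    """Binary user-item matrix: 1 if the user rated the item, 0 otherwise.
    A model trained on it estimates how likely each user is to have
    experienced (and hence be able to rate) each item."""
    rated = {(u, i) for (u, i, _) in ratings}
    return {(u, i): int((u, i) in rated) for u in users for i in items}
```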
  13. Personalized Combined Strategies ¤ Combined with Voting: scores an item according to the votes given by a committee of different strategies and then chooses the highest scored items (Elahi, 2011) ¤ Combined with Switching: adaptively selects a strategy from a pool of individual AL strategies, based on the estimation of how well each strategy is able to cope with the conditions at hand. Then the selected strategy scores an item according to its criterion (Elahi, 2012) 13
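The Combined with Voting idea can be sketched as a simple committee vote over the ranked lists produced by the individual strategies (an illustrative sketch, not the paper's exact scoring rule):

```python
from collections import Counter

def combined_with_voting(committee_picks, k=10):
    """committee_picks: one ranked item list per member strategy.
    An item's score is the number of strategies that proposed it;
    the k most-voted items are requested from the user."""
    votes = Counter()
    for picks in committee_picks:
        votes.update(picks)
    return [item for item, _ in votes.most_common(k)]
```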
  14. Offline Evaluation (A) ¤  Datasets are partitioned into three subsets: ¤ Known (K): contains the rating values that are considered to be known by the system at a certain point in time. ¤ Unknown (X): contains the ratings that are considered to be known to the users but not to the system. These ratings are incrementally elicited, i.e., they are transferred into K if the system asks the (simulated) users for them. ¤ Test (T): contains the ratings that are never elicited and are used only to test the recommendation effectiveness after the system has acquired the new elicited ratings. Netflix: 480,189 users; 17,770 items; 100M ratings (we used the first 1M); time span 1998–2005. Movielens: 6,040 users; 3,900 items; 1M ratings; time span 2000–2003. 14
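The K/X/T partitioning can be simulated as below (the split fractions are illustrative, not the ones used in the evaluation):

```python
import random

def partition(ratings, known_frac=0.1, test_frac=0.2, seed=42):
    """Split rating triples into Known (K), Unknown (X) and Test (T) sets."""
    rng = random.Random(seed)
    shuffled = ratings[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_known = int(n * known_frac)
    n_test = int(n * test_frac)
    K = shuffled[:n_known]                     # visible to the system
    T = shuffled[n_known:n_known + n_test]     # held out for evaluation
    X = shuffled[n_known + n_test:]            # elicitable from users
    return K, X, T
```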
  15. Learning Iteration The system computes the scores for all the items that can be scored (according to a strategy). [Table: item scores, e.g., item 1 → 151, item 8 → 55, item 2 → 44, …] 15
  16. Learning Iteration The system selects the top 10 scored items and presents them to the simulated user. [Table: the top 10 items and their scores] 16
  17. Learning Iteration The items that are rated in the unknown set (X) are found and transferred to the known set (K). [Figure: the rated items, e.g., 1, 2, 5, 75 and 13, moving from X to K] 17
  18. Learning Iteration The items that are rated in the unknown set (X) are found and transferred to the known set (K) 18
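Slides 15-18 describe one simulated elicitation round; a minimal sketch (names are illustrative; `score_fn` stands for any AL strategy applied to the known set):

```python
def learning_iteration(K, X, score_fn, n_requests=10):
    """One elicitation round: score candidate items from the known set K,
    request the top-N, and move every requested rating that exists in the
    unknown set X (i.e., the simulated user can rate it) into K."""
    item_scores = score_fn(K)                  # {item: score}
    requested = set(sorted(item_scores, key=item_scores.get,
                           reverse=True)[:n_requests])
    elicited = [t for t in X if t[1] in requested]
    return K + elicited, [t for t in X if t[1] not in requested]
```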
  19. System-wide vs User-centered We have conducted a system-wide evaluation. 19
  20. Results: MAE ¤ Mean Absolute Error ¤ The lower the better. ¤ Measures the average absolute deviation of the predicted rating from the user's true rating: MAE = (1/|T|) Σ(u,i)∈T |r̂ui − rui| [Plot: MAE vs. # of iterations for the random, popularity, lowest-pred, highest-pred and voting strategies (Elahi, 2011)] 20
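In code, the MAE on the test set T reads (illustrative `predict` signature):

```python
def mae(predict, test_set):
    """Mean Absolute Error: average absolute deviation of the predicted
    rating from the user's true rating, over the test set T.
    predict(u, i) -> predicted rating; test_set: (user, item, true_rating)."""
    return sum(abs(predict(u, i) - r) for u, i, r in test_set) / len(test_set)
```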
  21. Effect on data distribution [Figure: the user-item rating matrix before and after rating elicitation — elicitation fills in many previously unknown entries] 21
  22. Histogram of Known Set ¤ Prediction Bias ¤ Since the majority of the ratings added by the highest-predicted strategy are ratings with high values, the prediction for the test set is biased. [Plots: histograms of rating values (1–5) in the known set at iterations 1, 20, 40 and 60] 22
  23. Evaluation: NDCG ¤ Normalized Discounted Cumulative Gain: ¤ The higher the better. ¤ The recommendations for u are sorted according to the predicted rating values, then DCGu is computed (the true rating at rank k is discounted by log2(k+1)) and normalized by the ideal DCG: NDCGu = DCGu / IDCGu. [Plot: NDCG vs. # of iterations for the random, popularity, lowest-pred, highest-pred and voting strategies (Elahi, 2011)] 23
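A minimal NDCG sketch for a single user, using the common log2 discount (the exact gain and discount used in the paper may differ slightly):

```python
import math

def ndcg(true_ratings_in_predicted_order):
    """NDCG for one user: the user's true ratings, listed in the order
    induced by the predicted ratings. DCG discounts the gain at rank k
    (1-based) by log2(k + 1); IDCG is the DCG of the ideal ordering."""
    def dcg(rs):
        return sum(r / math.log2(k + 2) for k, r in enumerate(rs))
    ideal = dcg(sorted(true_ratings_in_predicted_order, reverse=True))
    return dcg(true_ratings_in_predicted_order) / ideal if ideal else 0.0
```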
  24. Evaluation: Precision ¤ Precision: the percentage of items with rating values (as in T) equal to 4 or 5 among the top 10 recommended items. [Plot: precision vs. # of iterations for the random, popularity, lowest-pred, highest-pred and voting strategies (Elahi, 2011)] 24
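The precision measure on this slide can be sketched as (hypothetical helper names):

```python
def precision_at_10(recommended, test_ratings, n=10, threshold=4):
    """Fraction of the top-n recommended items whose true rating in the
    test set T is >= threshold (here 4 or 5 on a 1-5 scale)."""
    top = recommended[:n]
    hits = sum(1 for i in top if test_ratings.get(i, 0) >= threshold)
    return hits / len(top)
```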
  25. Successful Requests ¤ The ratio of the ratings acquired over those requested at different iterations.
  26. Offline Evaluation (B) 26
  27. Offline Evaluation (B) ¤ All the strategies show non-monotone behavior with many fluctuations, since the test set changes dynamically every week. ¤ However, the proposed strategies still perform very well in this situation compared to the baseline. [Plots: MAE vs. # of iterations in the traditional evaluation setting, without natural acquisition (Elahi, 2011), and MAE vs. # of weeks in the proposed setting with natural acquisition (Elahi, 2012), for the random, highest-pred, log(pop)*entropy, voting and switching strategies] 27
  28. Evaluation: MAE (normalized) ¤  The highest-predicted strategy (the default strategy of RSs) does not perform very differently from the natural acquisition of ratings. ¤  In fact, it acquires few ratings beyond those collected by the natural rating acquisition, i.e., the users rate these items on their own initiative. [Plot: normalized MAE vs. # of weeks for natural acquisition and the random, highest-pred, log(pop)*entropy, voting and switching strategies (Elahi, 2012)] 28
  29. Evaluation: NDCG (normalized) ¤ Our proposed Voting and Switching strategies both perform very well. [Plot: normalized NDCG vs. # of weeks for natural acquisition and the random, highest-pred, log(pop)*entropy, voting and switching strategies (Elahi, 2012)] 29
  30. Conclusion of Offline Evaluations ¤  We demonstrate that it is possible to adapt to changes in the characteristics of the rating set by proposing two novel AL strategies: ¤  Combined with Voting ¤  Combined with Switching ¤  We propose a more realistic active learning evaluation setting in which ratings are added not only by the AL strategies, but also by users without being prompted to rate (natural rating acquisition). ¤  Our results show that the natural rating acquisition considerably influences and changes the performance of the AL strategies. 30
  31. Application: South Tyrol Suggests (STS) ¤ A mobile Android context-aware RS that recommends places of interest (POIs) in the South Tyrol region. ¤ The system was in an extreme cold-start situation (only 700 ratings for a total of 27,000 POIs). 31
  32. STS: Personality Questionnaire [Figure: the Big Five personality traits — Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism] 32
  33. STS: Personality Questionnaire [Figure: the Big Five personality traits — Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism] 33
  34. STS: Active Learning ¤ Using the personality of the user in the prediction model, the system estimates which POIs the user likely has experienced, and hence, can rate. 34
  35. STS: Contextual Factors 35
  36. STS: Recommendations ¤ STS computes rating predictions for all POIs from the database, using the personality information of the users and the ratings they have given to the POIs. 36
  37. User Study Research Hypotheses: ¤ Our proposed personality-based active learning strategy leads to a larger number of acquired user ratings and related contextual conditions. ¤ The prediction accuracy and context-awareness of the recommendation model improves the most when utilizing our proposed active learning strategy. 37
  38. Results: MAE [Plot: MAE of the compared strategies] 38
  39. Results: Ratio of the Rating Acquisition
  Pairs of strategies | Means | p-value | # of ratings
  Random / log(popularity)*entropy | 1.35 / 2.07 | < 0.001 | 73 / 112
  Random / personality-based binary prediction | 1.35 / 2.31 | < 0.001 | 73 / 125
  Personality-based binary prediction / log(popularity)*entropy | 2.31 / 2.07 | 0.005 | 125 / 112
  39
  40. Results: Ratio of the Rating Acquisition [Plots: # of acquired ratings over users over time for the Random, Log(popularity)*Entropy and Personality-Based Binary Prediction strategies, plus a comparison of their regression lines] 40
  41. Results: Context-Awareness
  | log(popularity)*entropy | personality-based binary pred.
  Q1 | 3.58 | 3.56
  Q2 | 2.95 | 3.31
  # of contexts | 1.01 | 1.52
  41
  42. Results: Context-Awareness Comparison of MAE (the lower the better) and nDCG (the higher the better) 42
  43. Conclusion of User Study In a live user study, we have: ü  shown that user personality has an important impact on users' rating behavior. ü  successfully verified both research hypotheses, i.e., the personality-based active learning strategy acquired more ratings and improved the rating prediction accuracy the most. 43
  44. Main Contributions ¤  Proposing several novel personalized active learning strategies for collaborative filtering. ¤  Offline evaluation of several active learning strategies with regard to their system-wide effectiveness. ¤  Comprehensive evaluation of active learning strategies with regard to several evaluation measures. ¤  Evaluation of active learning strategies with and without natural acquisition of ratings. ¤  Application of active learning in an up-and-running mobile context-aware recommender system. 44
  45. Future Works ¤ Gamification in Active Learning for RS: making the rating process more fun and enjoyable for the user. [Figure: game mockup — "Shoot the ball to the place you visited and liked the most"] 45
  46. Future Works ¤ Active Learning for Relevant Context Selection: how to select context factors that are relevant to the items. [Figure: "Which contextual condition is more relevant to this item?"] 46
  47. Future Works 47 ¤ Sequential Active Learning: selecting and presenting the items to the user to rate incrementally. ¤ Hence the system can immediately adapt to the remaining rating requests. item 1 item 2 item 3 item 4
  48. Future Works [Figure: cross-domain active learning — for a new user in the target domain, ratings from an auxiliary source domain and the user's personality guide the rating requests] 48
  49. Future Works [Figure: examples of low color variance vs. high color variance] 49
  50. Publications on AL Book Chapter: 2015 ¤  N. Rubens, M. Elahi, M. Sugiyama, and D. Kaplan. Active Learning in Recommender Systems. Book chapter in Recommender Systems Handbook, Springer Verlag, 2015 Journal: 2016 ¤  M. Elahi, F. Ricci, and N. Rubens. A survey of active learning in collaborative filtering recommender systems. Computer Science Review, 2016, Elsevier ¤  I. Fernández-Tobías, M. Braunhofer, M. Elahi, F. Ricci, and I. Cantador. Alleviating the New User Problem in Collaborative Filtering by Exploiting Personality Information. User Modeling and User-Adapted Interaction (UMUAI), Personality in Personalized Systems, 2016, Springer 2014 ¤  M. Braunhofer, M. Elahi, and F. Ricci. Techniques for cold-starting context-aware mobile recommender systems for tourism. Intelligenza Artificiale, 8(2):129–143, 2014 2013 ¤  M. Elahi, F. Ricci, and N. Rubens. Active learning strategies for rating elicitation in collaborative filtering: A system-wide perspective. ACM Transactions on Intelligent Systems and Technology (TIST), 5(1):13, 2013 50 Full list @ www.researchgate.net/profile/Mehdi_Elahi2
  51. Publications on AL Conference: 2015 ¤  M. Braunhofer, M. Elahi, and F. Ricci. User personality and the new user problem in a context-aware points of interest recommender system. In Information and Communication Technologies in Tourism 2015. Springer International Publishing, 2015 2014 ¤  M. Elahi, F. Ricci, and N. Rubens. Active learning in collaborative filtering recommender systems. In E-Commerce and Web Technologies (EC-Web), pages 113–124. Springer International Publishing, 2014 ¤  M. Braunhofer, M. Elahi, M. Ge, and F. Ricci. Context dependent preference acquisition with personality-based active learning in mobile recommender systems. In Learning and Collaboration Technologies. Technology-Rich Environments for Learning and Collaboration, pages 105–116. Springer International Publishing, 2014 2013 ¤  M. Elahi, M. Braunhofer, F. Ricci, and M. Tkalcic. Personality-based active learning for collaborative filtering recommender systems. In AI*IA 2013: Advances in Artificial Intelligence, pages 360–371. Springer International Publishing, 2013 51
  52. Publications on AL Conference: 2012 ¤  M. Elahi, F. Ricci, and N. Rubens. Adapting to natural rating acquisition with combined active learning strategies. In Foundations of Intelligent Systems, pages 254–263. Springer Berlin Heidelberg, 2012 2011 ¤  M. Elahi, V. Repsys, and F. Ricci. Rating elicitation strategies for collaborative filtering. In E-Commerce and Web Technologies (EC-Web), pages 160–171. Springer Berlin Heidelberg, 2011 52
  53. Thank you! 53 seminar @ Politecnico di Milano July 2015