
Tutorial: BPOCF (Collaborative Filtering with Binary, Positive-Only Data)

Traditional recommender systems assume the availability of explicit ratings of items from users. In many applications, however, this is not the case: only binary, positive-only user feedback is available, in the form of likes on Facebook, items bought on Amazon, videos watched on Netflix, ads/links clicked on Google, tags assigned to a photo on Flickr, and so on. Recently, the number of publications on designing recommender systems that handle binary, positive-only feedback has been growing fast. In this tutorial we discuss why collaborative filtering with binary, positive-only feedback is fundamentally different from collaborative filtering with rating data. We give an overview of the algorithms suitable for this task, with an emphasis on surprising commonalities and key differences. We show, for example, that even the nearest-neighbors method can be elegantly described in the matrix factorization framework.

Published in: Data & Analytics

  1. Collaborative Filtering with Binary, Positive-only Data. Tutorial @ ECML PKDD, September 2015, Porto. Koen Verstrepen, Kanishka Bhaduri, Bart Goethals
  2. Agenda: • Introduction • Algorithms • Netflix
  3. Agenda: • Introduction • Algorithms • Netflix
  4. Binary, Positive-Only Data [slide figure: users U, items I, feedback matrix R]
  5. Collaborative Filtering
  6. Movies
  7. Music
  8. Social Networks
  9. Tagging / Annotation (example tags: Paris, New York, Porto; Statue of Liberty, Eiffel Tower)
  10. Also Explicit Feedback
  11. Matrix Representation [slide figure: the user-item matrix R, with a 1 wherever a user gave positive feedback and blanks elsewhere]
  12. Unknown = 0, no negative information [slide figure: the same matrix with every unknown entry filled in as 0; a 0 means unknown, not disliked]
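The matrix representation on slides 11-12 can be sketched in a few lines of numpy. The event log and all names below are hypothetical toy data; the point is that unknowns are stored as 0 even though 0 means "unknown", not "disliked":

```python
import numpy as np

# Hypothetical toy log of positive-only feedback: (user, item) pairs.
events = [("ann", "matrix"), ("ann", "inception"),
          ("bob", "matrix"), ("bob", "up"),
          ("cat", "inception")]

users = sorted({u for u, _ in events})
items = sorted({i for _, i in events})

# R[u, i] = 1 if user u gave positive feedback on item i; all unknown
# entries are stored as 0, even though 0 here means "unknown".
R = np.zeros((len(users), len(items)), dtype=int)
for u, i in events:
    R[users.index(u), items.index(i)] = 1

print(R.sum())  # number of known positives: 5
```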
  13. Different Data:
      - Ratings (e.g. 1-5 stars): movies, music, ...
      - Graded relevance, positive-only: minutes watched, times clicked, times listened, money spent, visits/week, ...
      - Binary, positive-only: seen, bought, watched, clicked, ...
  14. Sparse: roughly 10 known entries in 10,000
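At roughly 10 known entries per 10,000 cells, storing R densely wastes memory; in practice a sparse format that keeps only the known positives is used. A minimal sketch, assuming SciPy is available (indices are made up):

```python
from scipy.sparse import csr_matrix

# Hypothetical (user, item) indices of the known positives.
rows = [0, 0, 1, 2]
cols = [1, 3, 1, 0]

# Only the 4 nonzero entries are stored, not all 3 * 5 = 15 cells.
R = csr_matrix(([1] * 4, (rows, cols)), shape=(3, 5))

print(R.nnz)  # 4 stored entries
```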
  15. Agenda: • Introduction • Algorithms – Elegant example – Models – Deviation functions – Difference with rating-based algorithms – Parameter inference • Netflix
  16. Agenda: • Introduction • Algorithms – Elegant example – Models – Deviation functions – Difference with rating-based algorithms – Parameter inference • Netflix
  17. pLSA: an elegant example [Hofmann 2004]
  18. pLSA: probabilistic Latent Semantic Analysis
  19. pLSA: latent interests [slide figure: D latent interest dimensions d = 1 ... d = D between users U and items I]
  20. pLSA: generative model [slide figure: user u picks a latent interest d with probability p(d|u), and the interest generates item i with probability p(i|d)]
  21. pLSA: probabilistic weights. p(d|u) >= 0, p(i|d) >= 0, sum_{d=1..D} p(d|u) = 1, sum_{i in I} p(i|d) = 1
  22. pLSA: compute like-probability. p(i|u) = sum_{d=1..D} p(i|d) * p(d|u)
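The like-probability on this slide, p(i|u) = sum_d p(i|d) * p(d|u), is just a matrix product of the two factor matrices. A sketch with toy, randomly normalized parameters standing in for learned pLSA weights (theta and phi are my names, not the slide's):

```python
import numpy as np

rng = np.random.default_rng(0)
D, n_users, n_items = 2, 3, 4

# theta[u, d] = p(d|u): each user's distribution over latent interests.
theta = rng.random((n_users, D))
theta /= theta.sum(axis=1, keepdims=True)

# phi[d, i] = p(i|d): each latent interest's distribution over items.
phi = rng.random((D, n_items))
phi /= phi.sum(axis=1, keepdims=True)

# p(i|u) = sum_d p(i|d) * p(d|u) for all (u, i) at once.
p_item_given_user = theta @ phi

# Each user's distribution over items sums to 1.
print(np.allclose(p_item_given_user.sum(axis=1), 1.0))  # True
```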
  23. pLSA: computing the weights. Maximize the likelihood of the observed positives, max sum over (u,i) with R_ui = 1 of log p(i|u), with (tempered) Expectation-Maximization (EM)
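The objective on this slide, max sum_{R_ui = 1} log p(i|u), is optimized with EM. Below is a minimal sketch of plain (untempered) EM on a toy matrix; the tempering heuristic from Hofmann's paper is omitted, and the array shapes and names are my own:

```python
import numpy as np

rng = np.random.default_rng(0)
R = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 1, 1]])  # binary, positive-only feedback
n_users, n_items = R.shape
D = 2                         # number of latent interests

theta = rng.random((n_users, D))               # p(d|u)
theta /= theta.sum(1, keepdims=True)
phi = rng.random((D, n_items))                 # p(i|d)
phi /= phi.sum(1, keepdims=True)

def log_likelihood():
    p = theta @ phi                            # p(i|u)
    return float(np.sum(np.log(p[R == 1])))

prev = log_likelihood()
for _ in range(50):
    # E-step: responsibilities q[u, d, i] = p(d | u, i) for observed pairs.
    q = theta[:, :, None] * phi[None, :, :]
    q /= q.sum(axis=1, keepdims=True)
    q = q * R[:, None, :]                      # keep observed pairs only
    # M-step: renormalize the expected counts.
    theta = q.sum(axis=2)
    theta /= theta.sum(1, keepdims=True)
    phi = q.sum(axis=0)
    phi /= phi.sum(1, keepdims=True)

# EM never decreases the log-likelihood of the observed positives.
print(log_likelihood() >= prev)  # True
```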
  24. 24. pLSA → General
  25. 25. pLSA recap
p(i|u) = Σ_{d=1}^{D} p(i|d) · p(d|u)
  26. 26. pLSA recap
p(i|u) = Σ_{d=1}^{D} p(i|d) · p(d|u)
user weights: p(d1|u), p(d2|u), ..., p(dD|u)
  27. 27. pLSA recap
p(i|u) = Σ_{d=1}^{D} p(i|d) · p(d|u)
item components: p(i|d1), p(i|d2), ..., p(i|dD)
  28. 28. pLSA matrix factorization notation
p(i|u) = Σ_{d=1}^{D} p(i|d) · p(d|u)
p(d | u) ≥ 0, p(i | d) ≥ 0
Σ_{d=1}^{D} p(d | u) = 1, Σ_{i∈I} p(i | d) = 1
max Σ_{Rui=1} log p(i | u)
dimensions: |U| × |I| = (|U| × D) (D × |I|)
  29. 29. pLSA matrix factorization notation
S = S^(1) S^(2), with S^(1) a |U| × D matrix and S^(2) a D × |I| matrix
S_ui = S^(1)_{u*} · S^(2)_{*i}
  30. 30. Scores = Matrix Factorization
S = S^(1) S^(2)
S_ui = S^(1)_{u*} · S^(2)_{*i}
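The factorization on this slide is straightforward to sketch numerically. The following is a minimal numpy sketch with made-up toy dimensions (3 users, 4 items, D = 2 latent clusters), where S1 plays the role of p(d|u) and S2 of p(i|d):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, D = 3, 4, 2  # toy sizes, chosen arbitrarily

# S1[u, d] plays the role of p(d | u): non-negative rows summing to 1.
S1 = rng.random((n_users, D))
S1 /= S1.sum(axis=1, keepdims=True)

# S2[d, i] plays the role of p(i | d): non-negative rows summing to 1.
S2 = rng.random((D, n_items))
S2 /= S2.sum(axis=1, keepdims=True)

# p(i | u) = sum_d p(i | d) * p(d | u) is exactly one matrix product.
S = S1 @ S2

# Consequently every user's score vector is a distribution over items.
assert np.allclose(S.sum(axis=1), 1.0)
```

The final assertion illustrates the point of the pLSA constraints: because both factors are row-stochastic, so is their product.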
  31. 31. Deviation Function
max Σ_{Rui=1} log p(i|u)
= max Σ_{Rui=1} log S_ui
= min -Σ_{Rui=1} log S_ui
= min D(S, R), with D(S, R) = -Σ_{Rui=1} log S_ui
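The deviation function D(S, R) = -Σ_{Rui=1} log S_ui takes only a few lines to sketch; the matrices below are made up for illustration:

```python
import numpy as np

def deviation(S, R):
    """D(S, R) = - sum of log S_ui over the (u, i) pairs with R_ui = 1."""
    return -np.sum(np.log(S[R == 1]))

# Toy example: two users, two items, made-up scores.
R = np.array([[1, 0],
              [0, 1]])
S = np.array([[0.9, 0.1],
              [0.2, 0.8]])

d = deviation(S, R)  # -(log 0.9 + log 0.8), roughly 0.33
```

Minimizing this deviation over the factors of S is exactly the maximum-likelihood objective of the previous slides.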
  32. 32. Summary: 2 Basic Building Blocks
•  Factorization Model
•  Deviation Function
  33. 33. Agenda •  Introduction •  Algorithms – Elegant example – Models – Deviation functions – Parameter inference •  Netflix
  34. 34. Tour of The Models
  35. 35. pLSA soft clustering interpretation
user-item scores = (user-cluster affinity) × (item-cluster affinity), with mixed clusters
[Hofmann 2004] [Hu et al. 2008] [Pan et al. 2008] [Sindhwani et al. 2010] [Yao et al. 2014] [Pan and Scholz 2009] [Rendle et al. 2009] [Shi et al. 2012] [Takàcs and Tikk 2012]
  36. 36. pLSA soft clustering interpretation
[figure: worked example with D = 4 clusters, showing user-item scores, user-cluster affinity, and item-cluster affinity]
  37. 37. Hard Clustering
user-item scores; user-uCluster membership; uCluster-iCluster similarity; item-iCluster membership; item probabilities
[Hofmann 2004] [Hofmann 1999] [Ungar and Foster 1998]
  38. 38. Item Similarity dense
user-item scores = (original rating matrix) × (item-item similarity)
[Rendle et al. 2009] [Aiolli 2013]
  39. 39. Item Similarity sparse
user-item scores = (original rating matrix) × (sparse item-item similarity)
[Deshpande and Karypis 2004] [Sigurbjörnsson and Van Zwol 2008] [Ning and Karypis 2011]
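A minimal sketch of such a sparse item-similarity model. The choices of cosine similarity between item columns and keeping only each item's k nearest neighbours are illustrative assumptions, not prescribed by the slide, and the rating matrix is made up:

```python
import numpy as np

# Made-up binary rating matrix R (rows = users, columns = items).
R = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1]], dtype=float)

# Illustrative choice: cosine similarity between item columns.
norms = np.linalg.norm(R, axis=0)
sim = (R.T @ R) / np.outer(norms, norms)
np.fill_diagonal(sim, 0.0)  # an item is not its own neighbour

# Sparsify: keep only each item's k most similar items (k = 1 here).
k = 1
for i in range(sim.shape[1]):
    weakest = np.argsort(sim[:, i])[:-k]
    sim[weakest, i] = 0.0

# Scores: S = R @ sim, i.e. every known item of a user votes for
# the candidate items it is similar to.
S = R @ sim
```

Written this way, the nearest-neighbors method is itself a matrix factorization: S factorizes into the rating matrix and a constrained (sparse, non-negative) similarity matrix.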
  40. 40. User Similarity sparse
user-item scores (column normalized) = (sparse user-user similarity) × (original rating matrix, row normalized)
[Sarwar et al. 2000]
  41. 41. User Similarity dense
user-item scores (column normalized) = (dense user-user similarity) × (original rating matrix, row normalized)
[Aiolli 2014] [Aiolli 2013]
  42. 42. User+Item Similarity [Verstrepen and Goethals 2014]
  43. 43. Factored Item Similarity symmetrical
user-item scores = (original rating matrix) × (item-item similarity), with similarity given by the dot product of identical item profiles (item clusters, item-cluster affinity)
[Weston et al. 2013b]
  44. 44. Factored Item Similarity asymmetrical + bias
user-item scores from the row-normalized original rating matrix, one item profile if known preference, a different item profile if candidate, item biases, and user biases
[Kabbur et al. 2013]
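A hedged sketch of this asymmetric factored model in the spirit of [Kabbur et al. 2013]: the item-item similarity is never materialized; instead two distinct low-rank profile matrices P (known-preference side) and Q (candidate side) plus bias terms are used. All numbers, and the simple 1/|u| normalization of the user profile, are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n_items, rank = 4, 2  # toy sizes

# Two *different* low-rank item profiles (the asymmetry of the slide):
# P[j] describes item j as a known preference, Q[i] as a candidate.
P = rng.normal(size=(n_items, rank))
Q = rng.normal(size=(n_items, rank))
b_item = rng.normal(size=n_items)  # item biases
b_user = 0.3                       # this user's bias (made up)

R_u = np.array([1, 0, 1, 0], dtype=float)  # one user's known preferences

# Aggregate the known-preference profiles; the 1/|u| normalization is
# one possible modelling choice, used here for simplicity.
profile = (R_u @ P) / R_u.sum()

# Score of every candidate item i: biases + dot(Q[i], profile).
scores = b_user + b_item + Q @ profile
```

The asymmetry (P ≠ Q) lets an item contribute differently as evidence than as a candidate, which a single symmetric profile matrix cannot express.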
  45. 45. Higher Order Item Similarity inner product
user-item scores = (extended rating matrix, with selected higher order itemsets) × (itemset-item similarity)
[Christakopoulou and Karypis 2014] [Deshpande and Karypis 2004] [Menezes et al. 2010] [van Leeuwen and Puspitaningrum 2012] [Lin et al. 2002]
  46. 46. Higher Order Item Similarity max product
replace the inner product (sum of products) by the maximum of the products
[Sarwar et al. 2001] [Mobasher et al. 2001]
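The difference between inner-product pooling and max-product pooling fits in a few lines; the itemset memberships and similarity values below are made up:

```python
import numpy as np

# One user, three (made-up) higher-order itemsets; R_u marks which
# itemsets occur in the user's history, sim holds the itemset-to-
# candidate similarities for a single candidate item.
R_u = np.array([1.0, 0.0, 1.0])
sim = np.array([0.1, 0.5, 0.4])

score_inner = np.sum(R_u * sim)  # inner product: 0.1 + 0.4
score_max = np.max(R_u * sim)    # max product: max(0.1, 0.0, 0.4)
```

Max pooling lets one strongly matching itemset dominate the score instead of being diluted by weaker matches.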
  47. 47. Higher Order User Similarity inner product
user-item scores = (user-userset similarity) × (extended rating matrix, with selected higher order usersets)
[Lin et al. 2002]
  48. 48. Best of few user models non linearity by max, ~ 3 models/user [Weston et al. 2013a]
  49. 49. Best of all user models efficient max out of 2^|u| models/user [Verstrepen and Goethals 2015]
  50. 50. Combination item vectors can be shared [Kabbur and Karypis 2014]
  51. 51. Sigmoid link function for probabilistic frameworks [Johnson 2014]
  52. 52. PDF over parameters instead of point estimation [Koeningstein et al. 2012] [Paquet and Koeningstein 2013]
  53. 53. Summary: 2 Basic Building Blocks Factorization Model Deviation Function
  54. 54. Summary: 2 Basic Building Blocks Factorization Model Deviation Function a.k.a. What do we minimize in order to find the parameters in the factor matrices?
  55. 55. Agenda •  Introduction •  Algorithms – Elegant example – Models – Deviation functions – Difference with rating-based algorithms – Parameter inference •  Netflix
  56. 56. Tour of Deviation Functions
  57. 57. Local Minima depending on initialisation
  58. 58. Max Likelihood high scores for known preferences: \max \sum_{R_{ui}=1} \log S_{ui} is equivalent to \min -\sum_{R_{ui}=1} \log S_{ui}, i.e. D(S, R) = -\sum_{R_{ui}=1} \log S_{ui} [Hofmann 1999] [Hofmann 2004]
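As a concrete check of this deviation function, a toy computation (R and S are made-up values; only the known preferences contribute):

```python
import numpy as np

R = np.array([[1, 0],
              [0, 1]], dtype=float)
# Hypothetical model scores S in (0, 1), e.g. produced by a factorization.
S = np.array([[0.9, 0.2],
              [0.1, 0.8]])

# D(S, R) = - sum over the known preferences (R_ui = 1) of log S_ui.
D = -np.sum(np.log(S[R == 1]))
```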
  59. 59. Reconstruction D(S, R) = \sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \lambda_{tf} \|S^{(t,f)}\|_F^2
  60. 60. Reconstruction 'Ridge' regularization: D(S, R) = \sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \lambda_{tf} \|S^{(t,f)}\|_F^2 [Kabbur et al. 2013] [Kabbur and Karypis 2014]
  61. 61. Reconstruction Elastic net regularization: D(S, R) = \sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \lambda_{tf} ( \|S^{(t,f)}\|_F^2 + \|S^{(t,f)}\|_1 ) [Ning and Karypis 2011] [Christakopoulou and Karypis 2014] [Kabbur et al. 2013] [Kabbur and Karypis 2014]
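A toy computation of the reconstruction error with ridge and elastic net regularization. For illustration only, a single shared lambda is used instead of the per-matrix lambda_{tf} on the slides:

```python
import numpy as np

# Toy rating matrix and a random two-factor model S = S1 @ S2.
R = np.array([[1, 0, 1],
              [0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
S1 = rng.normal(size=(2, 2))
S2 = rng.normal(size=(2, 3))
S = S1 @ S2

lam = 0.1  # one regularization constant for all factor matrices

sq_err = np.sum((R - S) ** 2)                                  # reconstruction
ridge = sq_err + lam * (np.sum(S1 ** 2) + np.sum(S2 ** 2))     # + L2 (ridge)
elastic = ridge + lam * (np.abs(S1).sum() + np.abs(S2).sum())  # + L1 on top
```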
  62. 62. Reconstruction between AMAU and AMAN. AMAN: all missing values are interpreted as an absence of preference, D(S, R) = \sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2
  63. 63. Reconstruction between AMAU and AMAN. AMAU: D(S, R) = \sum_{u \in U} \sum_{i \in I} R_{ui} (R_{ui} - S_{ui})^2, which ignores all missing values. AMAN: D(S, R) = \sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2
  64. 64. Reconstruction between AMAU and AMAN. Middle way [Hu et al. 2008] [Pan et al. 2008]: D(S, R) = \sum_{u \in U} \sum_{i \in I} W_{ui} (R_{ui} - S_{ui})^2, with W assigning a confidence weight to every value in R
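The three deviation functions on slides 62-64 differ only in how missing values are weighted; a toy comparison with made-up values for S and the confidence weights W:

```python
import numpy as np

R = np.array([[1, 0],
              [1, 1]], dtype=float)
S = np.array([[0.8, 0.3],
              [0.6, 0.9]])

# AMAU: only the known preferences contribute to the deviation.
D_amau = np.sum(R * (R - S) ** 2)
# AMAN: every missing value is treated as an absence of preference (a zero).
D_aman = np.sum((R - S) ** 2)
# Middle way: confidence weights W (hypothetical values: high confidence
# in the observed ones, low confidence in the zeros).
W = np.where(R == 1, 5.0, 1.0)
D_mid = np.sum(W * (R - S) ** 2)
```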
  65. 65. Reconstruction choosing W: e.g. W_{ui} = \alpha if R_{ui} = 1 and W_{ui} = 1 if R_{ui} = 0
  66. 66. Reconstruction regularization: D(S, R) = \sum_{u \in U} \sum_{i \in I} W_{ui} (R_{ui} - S_{ui})^2 + \lambda ( \|S^{(1)}\|_F + \|S^{(2)}\|_F ), a squared reconstruction error term plus a regularization term, with \lambda the regularization hyperparameter [Hu et al. 2008] [Pan et al. 2008] [Pan and Scholz 2009]
  67. 67. Reconstruction more complex: D(S, R) = \sum_{u \in U} \sum_{i \in I} W_{ui} ( (R_{ui} - S_{ui})^2 + \lambda ( \|S^{(1)}_{u*}\|_F + \|S^{(2)}_{*i}\|_F ) ); because the deviation function is defined over all user-item pairs, direct minimization with stochastic gradient descent seems unfeasible
  68. 68. Reconstruction rewritten: D(S, R) = \sum_{u \in U} \sum_{i \in I} R_{ui} W_{ui} (1 - S_{ui})^2 + \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} (0 - S_{ui})^2 + \lambda ( \|S^{(1)}\|_F + \|S^{(2)}\|_F )
  69. 69. Reconstruction rewritten: a known-preference term \sum_{u \in U} \sum_{i \in I} R_{ui} W_{ui} (1 - S_{ui})^2 plus a missing-feedback term \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} (0 - S_{ui})^2, plus the regularization term \lambda ( \|S^{(1)}\|_F + \|S^{(2)}\|_F )
  70. 70. Reconstruction guess unknown = 0: the term \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} (0 - S_{ui})^2 reconstructs every unknown as a 0
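Because R is binary, the combined weighted squared error and the rewritten two-term form on slides 68-70 are algebraically identical, which a quick numpy check confirms (all values made up):

```python
import numpy as np

R = np.array([[1, 0, 1],
              [0, 1, 0]], dtype=float)
S = np.array([[0.7, 0.2, 0.9],
              [0.1, 0.6, 0.4]])
W = np.where(R == 1, 4.0, 1.0)  # hypothetical confidence weights

# Combined weighted squared error ...
combined = np.sum(W * (R - S) ** 2)
# ... split into a known-preference term and a missing-feedback term.
rewritten = (np.sum(R * W * (1 - S) ** 2)
             + np.sum((1 - R) * W * (0 - S) ** 2))
# The two forms agree exactly because every R_ui is either 0 or 1.
```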
  71. 71. Reconstruction unknown can also be 1 [Yao et al. 2014]: \sum_{u \in U} \sum_{i \in I} R_{ui} W_{ui} (1 - S_{ui})^2 + \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} ( p (1 - S_{ui})^2 + (1 - p) (0 - S_{ui})^2 ) + \lambda ( \|S^{(1)}\|_F + \|S^{(2)}\|_F )
  72. 72. Reconstruction fewer assumptions, more parameters: replace the global p by a parameter P_{ui} per user-item pair: \sum_{u \in U} \sum_{i \in I} R_{ui} W_{ui} (1 - S_{ui})^2 + \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} ( P_{ui} (1 - S_{ui})^2 + (1 - P_{ui}) (0 - S_{ui})^2 ) + \lambda ( \|S^{(1)}\|_F + \|S^{(2)}\|_F )
  73. 73. Reconstruction more regularization: add a term \alpha \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) H(P_{ui}) that regularizes the parameters P_{ui}
  74. 74. Reconstruction more (flexible) parameters [Sindhwani et al. 2010]: \sum_{u \in U} \sum_{i \in I} R_{ui} W_{ui} (1 - S_{ui})^2 + \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} ( P_{ui} (1 - S_{ui})^2 + (1 - P_{ui}) (0 - S_{ui})^2 ) + \lambda ( \|S^{(1)}\|_F + \|S^{(2)}\|_F ) + \alpha \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) H(P_{ui})
  75. 75. Reconstruction conceptual flaw: (1 - 0)^2 = 1 = (1 - 2)^2, i.e. if R_{ui} = 1, the square loss penalizes S_{ui} = 2 as much as S_{ui} = 0, although S_{ui} = 2 is a much better prediction
  76. 76. Log likelihood similar idea [C. Johnson 2014]: \max \log p(S | R), i.e. \max \log \prod_{u \in U} \prod_{i \in I} S_{ui}^{\alpha R_{ui}} (1 - S_{ui})
  77. 77. Log likelihood similar idea [C. Johnson 2014]: \max \sum_{u \in U} \sum_{i \in I} \alpha R_{ui} \log S_{ui} + \log(1 - S_{ui}) - \lambda ( \|S^{(1)}\|_F^2 + \|S^{(2)}\|_F^2 ), with zero-mean, spherical Gaussian priors on the factor matrices
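A toy evaluation of this regularized log likelihood. All values below are hypothetical; in the actual method the scores S would come from a factorization through a sigmoid link (slide 51):

```python
import numpy as np

R = np.array([[1, 0],
              [0, 1]], dtype=float)
S = np.array([[0.8, 0.3],
              [0.2, 0.7]])           # scores in (0, 1)
S1 = np.array([[0.5], [0.3]])        # toy factor matrices, used here
S2 = np.array([[0.4, 0.6]])          # only for the regularization term
alpha, lam = 2.0, 0.1

# Regularized log likelihood to be maximized.
loglik = (np.sum(alpha * R * np.log(S) + np.log(1 - S))
          - lam * (np.sum(S1 ** 2) + np.sum(S2 ** 2)))
```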
  78. 78. Maximum Margin not all preferences equally preferred [Pan and Scholz 2009]: \tilde{R}_{ui} = 1 if R_{ui} = 1, \tilde{R}_{ui} = -1 if R_{ui} = 0
  79. 79. Maximum Margin not all preferences equally preferred [Pan and Scholz 2009]: D(S, \tilde{R}) = \sum_{u \in U} \sum_{i \in I} W_{ui} h( \tilde{R}_{ui} \cdot S_{ui} ) + \lambda \|S\|_\Sigma, with \|\cdot\|_\Sigma the trace norm, \lambda a regularization hyperparameter, and h the smooth hinge loss [Rennie and Srebro 2005]
  80. 80. Maximum Margin not all preferences equally preferred [Pan and Scholz 2009]: since the degree of preference is considered unknown, a score S_{ui} > 1 for a known preference is not penalized
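The smooth hinge h from Rennie and Srebro (2005), used in the maximum margin deviation, can be written down directly. The data below is a toy example and the trace-norm regularization term is omitted:

```python
import numpy as np

def smooth_hinge(z):
    """Smooth hinge loss of Rennie and Srebro (2005):
    0.5 - z for z <= 0, 0.5 * (1 - z)^2 for 0 < z < 1, and 0 for z >= 1."""
    z = np.asarray(z, dtype=float)
    return np.where(z >= 1, 0.0,
                    np.where(z <= 0, 0.5 - z, 0.5 * (1 - z) ** 2))

# Toy data: known preference -> +1, missing feedback -> -1.
R = np.array([[1, 0]], dtype=float)
R_tilde = np.where(R == 1, 1.0, -1.0)
S = np.array([[0.4, -0.2]])          # hypothetical model scores
W = np.ones_like(R)                  # hypothetical uniform confidence

# Deviation without the trace-norm term.
D = np.sum(W * smooth_hinge(R_tilde * S))
```

Note that smooth_hinge is zero for arguments >= 1, so a score above 1 for a known preference is indeed not penalized.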
  81. 81. AUC directly optimize the ranking [Rendle et al. 2009]: the computed scores are only used to rank all items for every user, so it is natural to directly optimize that ranking
  82. 82. AUC directly optimize the ranking [Rendle et al. 2009]: AUC = \frac{1}{|U|} \sum_{u \in U} \frac{1}{|u| \cdot (|I| - |u|)} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \delta(S_{ui} > S_{uj}), with \delta(true) = 1 and \delta(false) = 0; a higher AUC means the pairwise rankings of the model S are more in line with the observed data R
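The AUC formula can be evaluated directly on toy data; every (known preference, missing feedback) pair of one user is compared:

```python
import numpy as np

R = np.array([[1, 0, 1, 0]], dtype=float)
S = np.array([[0.9, 0.4, 0.3, 0.1]])   # hypothetical model scores

auc = 0.0
for u in range(R.shape[0]):
    pos = S[u, R[u] == 1]              # scores of known preferences
    neg = S[u, R[u] == 0]              # scores of missing feedback
    # Fraction of correctly ordered (preference, missing) pairs;
    # the mean implements 1 / (|u| * (|I| - |u|)).
    auc += np.mean(pos[:, None] > neg[None, :])
auc /= R.shape[0]
```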
  83. 83. AUC non-differentiable [Rendle et al. 2009]: the indicator \delta(S_{ui} > S_{uj}) makes the AUC non-differentiable, so it cannot be maximized directly with gradient-based methods
  84. 84. AUC smooth approximation [Rendle et al. 2009]: D(S, R) = \sum_{u \in U} \sum_{R_{ui}=1} \sum_{R_{uj}=0} -\log \sigma(S_{ui} - S_{uj}) + \lambda_1 \|S^{(1)}\|_F^2 + \lambda_2 \|S^{(2)}\|_F^2, with \sigma the sigmoid function and \lambda_1, \lambda_2 regularization constants
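A sketch of evaluating this smooth surrogate on a toy two-factor model. It follows the standard BPR form, minus log sigmoid of the score differences; the slide's exact sign conventions may differ slightly:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

R = np.array([[1, 0, 1, 0]], dtype=float)
# Toy two-factor model with hypothetical dimensions and values.
S1 = np.array([[0.5, -0.2]])
S2 = np.array([[1.0, 0.1, 0.7, -0.2],
               [-0.3, 0.8, 0.0, 0.5]])
S = S1 @ S2
lam1 = lam2 = 0.01

# Negative log sigmoid over all (preferred, missing) score differences,
# plus Frobenius regularization of both factor matrices.
loss = 0.0
for u in range(R.shape[0]):
    for i in np.flatnonzero(R[u] == 1):
        for j in np.flatnonzero(R[u] == 0):
            loss -= np.log(sigmoid(S[u, i] - S[u, j]))
loss += lam1 * np.sum(S1 ** 2) + lam2 * np.sum(S2 ** 2)
```

In practice this sum is not evaluated exhaustively; Rendle et al. sample (u, i, j) triples with stochastic gradient descent.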
  85. 85. Pairwise Ranking 2 similar to AUC [Kabbur et al. 2013]: D(S, \tilde{R}) = \sum_{u \in U} \sum_{R_{ui}=1} \sum_{R_{uj}=0} ( (R_{ui} - R_{uj}) - (S_{ui} - S_{uj}) )^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \lambda_{tf} \|S^{(t,f)}\|_F^2
  86. 86. Pairwise Ranking 3 no regularization, also 1 to 1 [Takàcs and Tikk 2012]: D(S, \tilde{R}) = \sum_{u \in U} \sum_{i \in I} R_{ui} \sum_{j \in I} w(j) ( (S_{ui} - S_{uj}) - (R_{ui} - R_{uj}) )^2, with w(j) a user-defined item weighting, e.g. w(j) = \sum_{u \in U} R_{uj}
  87. 87. MRR focus on top of the ranking [Shi et al. 2012]: MRR = \frac{1}{|U|} \sum_{u \in U} r_>( \max_{R_{ui}=1} S_{ui} \mid S_{u*} )^{-1}, the mean reciprocal rank of each user's highest-scored known preference
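Evaluating the MRR on toy data, reading r_> as the rank of a score within the user's full score list sorted descending (an assumption consistent with the formula):

```python
import numpy as np

R = np.array([[1, 0, 0, 1],
              [0, 1, 0, 0]], dtype=float)
S = np.array([[0.2, 0.9, 0.4, 0.7],
              [0.8, 0.6, 0.1, 0.3]])   # hypothetical model scores

mrr = 0.0
for u in range(R.shape[0]):
    best = np.max(S[u, R[u] == 1])   # highest-scored known preference
    rank = 1 + np.sum(S[u] > best)   # its rank in the user's full ranking
    mrr += 1.0 / rank
mrr /= R.shape[0]
```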
  88. 88. MRR non-differentiable [Shi et al. 2012]: the rank function r_> makes the MRR non-differentiable
