
[2018 Taiwan AI Academy (台灣人工智慧學校) Alumni Annual Meeting] Practical experience in mining and evaluating information systems / Hung-Hsuan Chen (陳弘軒)


Hung-Hsuan Chen (陳弘軒) / Assistant Professor, Department of Computer Science and Information Engineering, National Central University



  1. Practical lessons in mining and evaluating information systems
     Hung-Hsuan Chen, National Central University
  2. (Slide footer: ITRI CONFIDENTIAL DOCUMENT. DO NOT COPY OR DISTRIBUTE. Copyright 2015 ITRI.) Data Analytics Research Team (DART)
     • Discover the problems or needs (need)
     • Have the programming and math skills and the domain knowledge to solve the problem (skill)
     • Have the passion to realize the plan (passion)
  3. My background
     • An engineer wearing a scientist's hat?
     • Deep learning and ensemble learning on recommender systems (2014–2018)
     • Academic search engine CiteSeerX (2008–2013)
       § 4M+ documents
       § 87M+ citations
       § 2M–4M hits per day
       § 300K+ monthly downloads
       § 100K documents added monthly
  4. Outline
     • I will present four common pitfalls in training and evaluating recommender systems
     • These pitfalls have appeared in many previous studies on recommender systems and information systems
     • Details are in the following paper
       § Chen, H.-H., Chung, C.-A., Huang, H.-C., & Tsui, W. (2017). Common pitfalls in training and evaluating recommender systems. ACM SIGKDD Explorations Newsletter, 19(1), 37–45.
  5. A typical flow to build a recommender system (figure)
  6. Timeline of data collection (figure: timeline with marks t0, ts, t1, t2)
     • t0–ts: no-recommendation period; the logs (e.g., clickstream) of this period are used to train the initial recommendation algorithm Rorig
     • ts–t1: the initial recommendation algorithm Rorig is applied online; the data of this period are used to train the new recommendation algorithm Rnew and to re-train the original algorithm Rorig
     • t1–t2: the logs of this period are used as test data to compare the initial algorithm Rorig and the new algorithm Rnew
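The timeline above amounts to a chronological train/test split. A minimal sketch with hypothetical data (the `temporal_split` helper and the event tuples are illustrative, not from the deck):

```python
from datetime import datetime

# Hypothetical clickstream log: (timestamp, user_id, item_id) events.
log = [
    (datetime(2018, 1, 5), "u1", "p3"),
    (datetime(2018, 2, 10), "u2", "p7"),
    (datetime(2018, 3, 15), "u1", "p9"),
]

def temporal_split(events, cutoff):
    """Split events chronologically: train on everything before the
    cutoff, test on everything at or after it. Never shuffle randomly,
    or future behavior leaks into the training set."""
    train = [e for e in events if e[0] < cutoff]
    test = [e for e in events if e[0] >= cutoff]
    return train, test

train, test = temporal_split(log, datetime(2018, 3, 1))
```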
  7. Issue 1: the trained model may be biased toward highly reachable products
  8. Clicks resulting from in-page direct links

       Day         Day 1      Day 2
       Percentage  19.3150%   21.2812%

     • If we use the clickstreams to generate the positive samples, then by rearranging the layout of the pages or the link targets in the pages, approximately 1/5 of the positive training instances are likely to be different.
  9. Percentage of promoted products in the recommendation list

       Method     MC      CategoryTP  TotalTP  ICF-U2I  ICF-I2I  NMF-U2I  NMF-I2I
       train-all  100%    1.48%       1.84%    93.22%   1.40%    1.48%    1.34%
       train-sel  1.08%   0.86%       0.98%    14.46%   1.28%    1.32%    1.24%

     • When using train-all as the training data, several algorithms recommend many of the "promoted products"
       § We seem to learn the "layout" of the product page (i.e., the direct links from one product page to another) instead of the intrinsic relatedness between the products
 10. Lessons learned
     • The common wisdom that the clickstream represents a user's interest or habit could be problematic
       § Clickstreams are highly influenced by the reachability of the products and the layouts of the product pages
     • Training a recommender system on the clickstreams is likely to learn
       § The "layout" of the pages
       § The recommendation rules of the online recommender system
     • We need to select the training data more carefully
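One way to select the training data more carefully, in the spirit of this slide, is to drop positive samples whose clicked item was directly linked from the page being viewed. A hypothetical sketch (`direct_links`, `clicks`, and `select_training_clicks` are illustrative names, not from the paper):

```python
# Hypothetical page-layout map: each page and the items it links to directly.
direct_links = {
    "p1": {"p2", "p3"},  # p1's page contains direct links to p2 and p3
    "p2": {"p4"},
}

# Hypothetical click log: (user, source page, clicked item).
clicks = [
    ("u1", "p1", "p2"),
    ("u1", "p1", "p5"),
    ("u2", "p2", "p4"),
]

def select_training_clicks(clicks, direct_links):
    """Keep only clicks whose target is NOT a direct link on the source
    page, so the model learns product relatedness rather than layout."""
    return [
        (user, src, dst)
        for (user, src, dst) in clicks
        if dst not in direct_links.get(src, set())
    ]

selected = select_training_clicks(clicks, direct_links)
```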
 11. Issue 2: the online recommendation algorithm affects the distribution of the test data
 12. CTRs when using different online recommendation algorithms (figure)
 13. Lessons learned
     • Previous studies sometimes use all the available test data as the ground truth for evaluation
     • Unfortunately, such an evaluation process inevitably favors the algorithms that suggest products similar to those of the online recommendation algorithm
     • We should carefully select the test dataset to perform a fairer evaluation
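A minimal sketch of one possible test-set selection, assuming the log records each click's traffic source (the field names and events here are hypothetical, not the deck's data or the paper's exact method):

```python
# Hypothetical test events, each tagged with the traffic source that
# produced the click.
test_events = [
    {"user": "u1", "item": "p2", "source": "recommendation_panel"},
    {"user": "u2", "item": "p6", "source": "search"},
    {"user": "u3", "item": "p9", "source": "category_browse"},
]

# Exclude clicks generated by the deployed recommender itself, so offline
# evaluation does not automatically favor algorithms that mimic it.
unbiased_test = [e for e in test_events if e["source"] != "recommendation_panel"]
```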
 14. Issue 3: click-through rates are a mediocre proxy for recommendation revenue
 15. CTR vs. recommendation revenue (scatter plot of CTR against recommendation revenue)
     • Based on roughly one year of logs
     • The coefficient of determination (R²) is only 0.089
       § A weak positive relationship
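For reference, the coefficient of determination for a simple linear relationship is the squared Pearson correlation. A minimal sketch with synthetic numbers (not the deck's data):

```python
from math import sqrt

# Synthetic daily CTR and recommendation-revenue figures, for illustration only.
ctr = [0.010, 0.012, 0.011, 0.015, 0.009, 0.014]
revenue = [120.0, 90.0, 150.0, 130.0, 100.0, 160.0]

def r_squared(x, y):
    """Squared Pearson correlation: the fraction of variance in y
    explained by a simple linear fit on x. Values near 0 mean x
    explains little of the variance in y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return (cov / sqrt(vx * vy)) ** 2

r2 = r_squared(ctr, revenue)
```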
 16. Lessons learned
     • Comparing recommendation algorithms based on user-centric metrics (e.g., CTR) may fail to capture the business owner's satisfaction (e.g., revenue)
     • Unfortunately, studies on recommender systems mostly perform comparisons based on user-centric metrics
     • Even if a recommendation algorithm attracts many clicks, we cannot be sure that the algorithm will bring a large amount of revenue to the website
 17. Issue 4: evaluating recommendation revenue is not straightforward
 18. Comparing the number of purchases (two time-series plots over 1/25–2/18: total orders and orders via recommendation)
     • Green line: the channel with a recommendation panel
     • Blue line: the channel without a recommendation panel
 19. Lessons learned
     • Although a recommendation module may help users discover their needs, these users, even without the recommendations, may still be able to locate the desired products through other processes
     • It is not clear whether a recommendation module brings extra purchases or simply redirects users from other purchasing processes to the recommendations
     • A/B testing might be necessary
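A hypothetical A/B-test sketch: compare per-user purchase counts between a group shown the recommendation panel and a control group without it, using a two-sample z-test on the means (synthetic data; large-sample normality assumed, and the function name is illustrative):

```python
from math import sqrt

def z_score(a, b):
    """z statistic for the difference in means of two independent
    samples, using sample variances (Welch-style standard error)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / sqrt(va / na + vb / nb)

# Synthetic per-user purchase counts, for illustration only.
with_panel = [2, 0, 1, 3, 1, 2, 0, 1, 2, 1]
without_panel = [1, 0, 1, 1, 0, 1, 0, 2, 1, 0]

z = z_score(with_panel, without_panel)
# |z| > 1.96 would suggest a significant difference at the 5% level
```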
 20. Discussion
     • We discussed four pitfalls in training and evaluating recommender systems
     • The first two issues are due to biased data collection for the training and test datasets
     • The third issue concerns the proper selection of evaluation metrics
     • The fourth issue concerns extra purchases vs. redirected purchases through the recommender system
 21. Hung-Hsuan Chen
     • Questions?