Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

Stijn Geuens, Koen W. De Bock, Kristof Coussement

Recommendation Systems: Examples
AMS World Marketing Congress 2016, 07/20/2016

How to Calculate Recommendations
[Bobadilla et al. 2013; Adomavicius et al. 2008]
 Classification based on calculation paradigm:
 Memory-based [Goldberg, 1992]
 Model-based [Koren, 2008]
 Classification based on input data:
 Socio-demographic information → Demographic RecSys [e.g. Pazzani 1999; Porcel et al. 2012]
 Product characteristics → Content-based RecSys [e.g. Lang 1995; Meteren and Someren 2000]
 Real-time navigation information → Knowledge-based RecSys [e.g. Burke 2000]
 Behavioral history → Collaborative filtering RecSys [e.g. Herlocker et al. 2004]
 Hybrid solutions [e.g. Burke 2002; Preece and Shneiderman 2009]

A Shift Towards Hybrid Algorithms
 Single data source systems: advantages and disadvantages [Bobadilla et al. 2013]
 Hybridization resolves these issues and leads to better performance [Bobadilla et al. 2013]
 Algorithm combination vs. data source combination [Bobadilla et al. 2013]
 Burke’s classification [Burke, 2002]:
 Weighting
 Feature combination

Contributions
 Go beyond the creation of a hybrid algorithm by:
 Creating a decision framework that guides marketing academics and professionals in their efforts to create recommendation systems
 Opening the black box of recommendation systems by introducing the concept of feature importance

Research Questions
 Data:
RQ1.a. Do recommendation systems based on different single data sources differ in performance?
RQ1.b. Does combining different data sources add predictive performance?
 Recommendation Calculation:
RQ2. Which hybridization technique performs best for algorithms with the optimal number of data sources?
 Feature Importance:
RQ3. Which are the most important predictors in the best performing algorithm?

Framework
[Figure: framework diagram, built up over several slides; the diagram itself is not recoverable. Citations attached to its stages, in order of appearance: Song, 2000; Kohavi et al., 2004; Rendle, 2010; Burke, 2002; Adomavicius & Tuzhilin, 2005; Lipton, 2014; Herlocker et al., 2004; Breiman, 2003]

Experimental Setup
 8 different company-specific datasets
Product Category       Visitors   Products
Shoes                    31,536     11,712
Children's Clothing      16,752      3,956
Decoration               12,747      5,054
Lingerie                 11,672      3,514
Furniture                20,507      6,481
Men's Clothing            8,412      4,737
Women's Clothing         50,336     12,979
Household linen          12,376      2,934

Experimental Setup
 Evaluation metric: F1@5 [Lipton, 2014]
 Method of analysis:

Experimental Setup
 Evaluation: Data and Recommendation Calculation
 Friedman aligned rank test with Li’s procedure for post hoc testing [García, 2010]
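As an illustration of the test named on this slide, the sketch below computes the Friedman aligned ranks statistic for a table of scores with one row per dataset and one column per algorithm. It is a minimal sketch, not the authors' implementation: the function name and input layout are assumptions, the p-value step is left out (under the null hypothesis the statistic is compared to a chi-square distribution with k - 1 degrees of freedom), and Li's post hoc procedure is not covered.

```python
def friedman_aligned_ranks(scores):
    """Return the Friedman aligned ranks statistic.

    scores[i][j] is the performance (e.g. F1@5) of algorithm j on dataset i.
    The function name and input layout are illustrative assumptions.
    """
    n = len(scores)        # number of datasets
    k = len(scores[0])     # number of algorithms

    # Align: subtract each dataset's mean score to remove the dataset effect.
    aligned = []
    for i, row in enumerate(scores):
        mean = sum(row) / k
        for j, value in enumerate(row):
            aligned.append((value - mean, i, j))

    # Rank all k*n aligned values jointly; exact ties share the average rank.
    aligned.sort(key=lambda t: t[0])
    ranks = [[0.0] * k for _ in range(n)]
    pos = 0
    while pos < len(aligned):
        end = pos
        while end + 1 < len(aligned) and aligned[end + 1][0] == aligned[pos][0]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for _, i, j in aligned[pos:end + 1]:
            ranks[i][j] = avg_rank
        pos = end + 1

    col_totals = [sum(ranks[i][j] for i in range(n)) for j in range(k)]
    row_totals = [sum(row) for row in ranks]

    numerator = (k - 1) * (
        sum(r * r for r in col_totals) - (k * n * n / 4) * (k * n + 1) ** 2
    )
    denominator = (
        k * n * (k * n + 1) * (2 * k * n + 1) / 6
        - sum(r * r for r in row_totals) / k
    )
    return numerator / denominator
```

With 8 datasets and, say, 4 competing algorithms, `scores` would be an 8 x 4 table of F1@5 values; a large statistic suggests that at least one algorithm differs from the others.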

Experimental Setup
 Interpretation: Variable importance
 Implementation of Breiman’s (2003) method developed for random forests
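\[
\mathit{FeatImp}_{i} = \frac{\mathrm{F1@5}_{\mathit{Full}} - \mathrm{F1@5}_{\mathit{Random\ permutation}\ i}}{\mathrm{F1@5}_{\mathit{Full}}}
\]
\[
\mathit{FeatImp}^{\mathit{aggr}}_{i} = \frac{1}{D} \sum_{d=1}^{D} \mathit{FeatImp}_{i,d}
\]
where d presumably indexes the D datasets.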
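The metric and the two importance formulas above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: `score_fn` is a hypothetical callable that returns the mean F1@5 of an already fitted model on a list of rows, precision is computed over k positions, and the permutation is drawn once with a fixed seed.

```python
import random


def f1_at_k(recommended, relevant, k=5):
    """F1 of the top-k recommended items against the set of relevant items."""
    hits = len(set(recommended[:k]) & set(relevant))
    if hits == 0:
        return 0.0
    precision = hits / k            # assumption: precision over k positions
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


def feature_importance(score_fn, rows, feature, seed=0):
    """Relative drop in score after randomly permuting one feature column.

    score_fn(rows) is a hypothetical callable returning the mean F1@5 of the
    fitted model on rows (a list of dicts mapping feature name to value).
    Assumes the score on the unpermuted data is greater than zero.
    """
    full = score_fn(rows)
    values = [row[feature] for row in rows]
    random.Random(seed).shuffle(values)
    permuted = [{**row, feature: value} for row, value in zip(rows, values)]
    return (full - score_fn(permuted)) / full


def aggregate_importance(per_dataset_importances):
    """Average one feature's importance over the datasets."""
    return sum(per_dataset_importances) / len(per_dataset_importances)
```

A feature the model ignores yields an importance of 0, while a feature whose permutation destroys all predictive power yields an importance of 1.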

Results: Data
 RQ1.a. Do recommendation systems based on different single data sources differ in performance?
[Figure not recoverable; dashed lines (----) indicate a non-significant difference at the 95% confidence level]
 Yes, single data source recommendation systems differ in performance
A company building a single data source recommender system is best served by basing it on raw behavioral data (RBD) or product data (PD)

Results: Data
 RQ1.b. Does combining different data sources add predictive performance?
[Figure not recoverable; dotted lines (......) indicate a marginally significant difference]
 Yes, performance increases when adding data sources
It is worthwhile for a company to investigate data source combination to improve the performance of recommendation systems

Results: Recommendation Calculation
 RQ2. Which hybridization technique performs best for algorithms with the optimal number of data sources?
 Factorization machines outperform an a posteriori weighting of single data source algorithms
It is worthwhile for a company to investigate advanced hybridization techniques to improve the performance of recommendation systems
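For comparison, the baseline in this result, an a posteriori weighting of single data source algorithms, can be sketched as a weighted sum of per-source item scores. The slides do not say how the weights are chosen or how scores are normalized, so the function name, the fixed weights, and the assumption that scores are already on a comparable scale are all illustrative.

```python
def weighted_hybrid(score_lists, weights, top_n=5):
    """Combine per-source item scores with fixed weights; return the top-n items.

    score_lists: one dict {item: score} per single data source recommender.
    weights: one weight per recommender (hypothetical values, assumed given).
    """
    combined = {}
    for scores, weight in zip(score_lists, weights):
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weight * score
    # Highest combined score first; ties are broken by item name.
    ranked = sorted(combined, key=lambda item: (-combined[item], item))
    return ranked[:top_n]
```

For example, with scores from an RBD-based and a PD-based recommender, `weighted_hybrid([rbd_scores, pd_scores], [0.7, 0.3])` would favor the behavioral signal while still letting product similarity contribute.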

Results: Feature Importance
 RQ3. Which are the most important predictors in the best performing algorithm?
 Within the best performing algorithm (RQ1 and RQ2), the data sources differ in importance: RBD > PD > CD > ABD
[Figure: bar chart of importance per data source, x-axis from 0% to 40%; bars for Aggregated Behavioral Data (ABD), Customer Data (CD), Product Data (PD), and Raw Behavioral Data (RBD)]

Results: Feature Importance
[Figure: bar chart of importance per feature, x-axis from 0% to 14%; legend: RBD, PD, CD, ABD. Features shown: Number of total purchases, Mean product rating, Total value of purchases, Length of relationship, Time since last purchase, Internal vs external, Value-based segmentation, Mean product rating, Explicit ratings, Number of children, Marital status, Place of residence, Age of children, Brand, Gender, Age, Internal search, Product Division 3, Product Division 2, Product Division 1, Purchases, Addition to cart, Views]

Conclusions
 A framework to guide marketing professionals and academics in their efforts to create recommendation systems
 Empirical validation of the framework on 8 datasets:
 Single data source recommendation systems differ in performance
 Combining data sources adds to the performance of recommendation systems
 An advanced combination technique based on feature combination outperforms a posteriori weighting of single data source algorithms
 RBD is the most important data source in the best performing model, followed by PD, CD, and finally ABD

Future Work
 Incorporation of other evaluation metrics in the framework
 Field test → evaluation of different recommendation strategies in terms of business metrics
 Identification of the relationship between ‘academic’ metrics and business metrics

References
 J. Bobadilla, F. Ortega, A. Hernando, A. Gutierrez, Recommender systems survey, Knowl.-Based Syst., 46 (2013) 109-132
 G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, IEEE Trans. Knowl. Data Eng., 17 (2005) 734-749
 Y. Koren, Factorization meets the neighborhood: A multifaceted collaborative filtering model, 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Las Vegas, NV, 2008, pp. 426-434
 M.J. Pazzani, A framework for collaborative, content-based and demographic filtering, Artif. Intell. Rev., 13 (1999) 393-408
 C. Porcel, A. Tejeda-Lorente, M.A. Martinez, E. Herrera-Viedma, A hybrid recommender system for the selective dissemination of research resources in a technology transfer office, Inform. Sciences, 184 (2012) 1-19
 R. Burke, Hybrid recommender systems: Survey and experiments, User Modeling and User-Adapted Interaction, 12 (2002) 331-370

References
 J.L. Herlocker, J.A. Konstan, L.G. Terveen, J.T. Riedl, Evaluating collaborative filtering recommender systems, ACM Trans. Inf. Syst., 22 (2004) 5-53
 I.-Y. Song, Database Design for Real-World E-Commerce Systems, IEEE Data Engineering Bulletin, 23 (2000) 23-28
 R. Kohavi, L. Mason, R. Parekh, Z. Zheng, Lessons and Challenges from Mining Retail E-Commerce Data, Mach. Learn., 57 (2004) 83-113
 S. Rendle, Factorization Machines, IEEE International Conference on Data Mining, Sydney, Australia, 2010
 Z.C. Lipton, C. Elkan, B. Naryanaswamy, Optimal thresholding of classifiers to maximize F1 measure, in: T. Calders, F. Esposito, E. Hüllermeier, R. Meo (Eds.) Machine Learning and Knowledge Discovery in Databases, Springer Berlin Heidelberg 2014, pp. 225-239
 L. Breiman, Random forests, Mach. Learn., 45 (2001) 5-32

Thank you for your attention
Contact:
Stijn Geuens
IESEG School of Management
3 Rue de la Digue
F-59000 Lille
(0)3.20.545.892
s.geuens@ieseg.fr
fr.linkedin.com/pub/stijn-geuens/
stijn.geuens

Appendix 1: Advantages and disadvantages of different systems
[Burke, 2002]
Columns: Collaborative Filtering | Content-based | Knowledge-Based | Demographic
Pros (assignment of cells to columns not recoverable): No metadata engineering needed; Comparison between items possible; Deterministic; No metadata engineering needed; Serendipity in results; No metadata engineering needed; No cold-start; Serendipity in results; Adaptive; Adaptive
Cons (assignment of cells to columns not recoverable): Scalability; Overspecialization; Knowledge engineering required; Long tail; Cold start for new users and items; Cold start for new users; Subjective; Cold start for new users; Long tail problem; Collection of product information; Static; Static; Stability

Appendix 2: Experimental Framework: Data
 Product Data: Three main product divisions; Brand; Mean product rating; Internal vs. external; Availability on the web
 Customer Data: Age; Gender; Marital status; Place of residence; Number of children; Age of children
 Aggregated Behavioral Data: RFM (Time since last purchase; Number of total purchases; Total value of purchases); Relationship features (Length of relationship; Value-based segmentation); Mean product rating
 Raw Behavioral Data: Explicit ratings; Purchases; Internal search; Addition to cart; Views

Appendix 3: Experimental Framework: Recommendation Calculation
 Factorization Machines
 Introduced by Rendle (2010)
 Combine the advantages of Support Vector Machines (SVMs) and factorization models
 SVM: works with any real-valued feature vector, which allows different data sources to be integrated
 Factorization models: variable interactions are calculated from factorized parameters, which allows interactions to be estimated under huge sparsity, where SVMs fail
 General FM model equation of degree 2:
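For reference, Rendle (2010) gives the degree-2 model as y(x) = w_0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, where v_i is a k-dimensional factor vector for feature i. The sketch below evaluates that prediction for one feature vector using the linear-time reformulation of the pairwise term from the same paper. Parameter names and the plain-list data layout are assumptions for illustration, and learning the parameters is not shown.

```python
def fm_predict(x, w0, w, v):
    """Degree-2 factorization machine prediction for one feature vector.

    x:  list of n real-valued features (e.g. concatenated data sources)
    w0: global bias
    w:  list of n linear weights
    v:  n x k nested list, one k-dimensional factor vector per feature
    """
    n, k = len(x), len(v[0])
    linear = sum(w[i] * x[i] for i in range(n))

    # Pairwise term in O(k * n) instead of O(k * n^2):
    # sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f [ (sum_i v[i][f] * x_i)^2 - sum_i (v[i][f] * x_i)^2 ]
    pairwise = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)

    return w0 + linear + pairwise
```

Because interactions are expressed through the factor vectors rather than one free weight per feature pair, a pair of features that never co-occurs in the training data can still receive a non-zero interaction estimate.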