My PhD trajectory

Factorization Machines for Hybrid Recommendation Systems Based on Behavioral, Product, and Customer Data
Stijn Geuens

Agenda
• PhD Trajectory
• Goals
• Research Questions
• Progress
• Future Work
RecSys 2015 s.geuens@ieseg.fr

PhD Trajectory
[Diagram: Computer Science, Math & Statistics, Business Expertise, Machine Learning, Data Engineering, Business Analytics, Data Science]

Research Goals
Machine Learning
Data Engineering
Business Analytics

Research Questions
Machine Learning: What is the added value of combining different data sources?
• More data beats better models (Halevy, Norvig, Pereira, 2009)
• Rich database
– Explicit Ratings
– Implicit Ratings
– Customer Data
– Product Data
– Context Data
• Different combination methods

Research Questions
Business Analytics: How can we evaluate recommender systems in online settings using business metrics?
• Collaboration with a company
• Which metric to optimize?
– Click rates
– Conversion
– Turnover
– Loyalty
– Etc.
• Does a RecSys affect these business performance metrics?

Current Study
Factorization Machines for Hybrid Recommendation Systems Based on Behavioral, Product, and Customer Data

Motivation
• Typologies of systems using different input data:
– Collaborative filtering, content-based, and hybrid (Adomavicius, Tuzhilin, 2005)
– Collaborative filtering, content-based, demographic, knowledge-based, and hybrid (Burke, 2000; Bobadilla et al., 2013)
• Each system has its own advantages and disadvantages
• Hybridization addresses these weaknesses and can lead to better performance
• More data trumps better models (Halevy, Norvig, Pereira, 2009)
• This study: hybridization by feature combination, i.e., combining different data sources (customer, product, and behavioral data) in a single state-of-the-art algorithm, factorization machines (FM)
→ Combining all of these data sources in one algorithm has not been done before, especially not in factorization machines research

Factorization Machines (FM)
• Introduced by Rendle (2010)
• Combines the advantages of Support Vector Machines (SVM) and factorization models
• SVM: works with any real-valued feature vector, so different data sources can be integrated
• Factorization models: variable interactions are estimated from factorized parameters, which makes it possible to estimate interactions under heavy sparsity, where SVMs fail
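• General FM model equation of degree 2 (Rendle, 2010):
$$\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j, \qquad \langle \mathbf{v}_i, \mathbf{v}_j \rangle = \sum_{f=1}^{k} v_{i,f} \, v_{j,f}$$
where w_0 is the global bias, w_i the weight of feature i, and v_i ∈ ℝ^k the latent factor vector of feature i.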
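To illustrate why the factorized interactions stay cheap, here is a minimal pure-Python sketch of degree-2 FM prediction, ŷ(x) = w0 + Σ_i w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j. It computes the pairwise term once naively and once with the O(kn) rewriting given by Rendle (2010). The function names and toy parameters are hypothetical, and training is not shown.

```python
from typing import Sequence


def linear_part(x: Sequence[float], w0: float, w: Sequence[float]) -> float:
    """Global bias plus the weighted sum of the features."""
    return w0 + sum(wi * xi for wi, xi in zip(w, x))


def fm_predict_naive(x: Sequence[float], w0: float, w: Sequence[float],
                     V: Sequence[Sequence[float]]) -> float:
    """Degree-2 FM, summing every pair (i, j) explicitly: O(k * n^2)."""
    y = linear_part(x, w0, w)
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(vif * vjf for vif, vjf in zip(V[i], V[j]))  # <v_i, v_j>
            y += dot * x[i] * x[j]
    return y


def fm_predict(x: Sequence[float], w0: float, w: Sequence[float],
               V: Sequence[Sequence[float]]) -> float:
    """Same model via 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]: O(k * n)."""
    y = linear_part(x, w0, w)
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += s * s - s_sq
    return y + 0.5 * pairwise


# Toy example: 4 features, k = 2 latent factors (made-up numbers).
x = [1.0, 0.0, 1.0, 0.5]
w0 = 0.1
w = [0.2, -0.1, 0.3, 0.05]
V = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4], [0.05, 0.05]]
print(fm_predict_naive(x, w0, w, V), fm_predict(x, w0, w, V))
```

Both functions return the same value; with sparse inputs only the nonzero x_i need to be visited, which is what makes FMs practical for one-hot encoded users and items.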

Algorithms
• 4 factorization machines
– 3 single data source FMs
• Behavioral data (FMBD)
• Customer data (FMCD)
• Product data (FMPD)
– 1 hybrid FM based on the 3 distinct data sources (FMPD/CD/BD)
• 1 benchmark: the hybrid CF model currently used by the company
– Input user-item matrix (M), where each element is defined as follows:
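Hybridization by feature combination means each training row is one long feature vector built from several sources. The sketch below assumes, for illustration only, that the behavioral part is a one-hot user and item indicator and that customer and product data are appended as numeric blocks; the slides do not give the study's actual encoding, and all names and values are hypothetical.

```python
from typing import Dict, List, Sequence


def one_hot(index: int, size: int) -> List[float]:
    """Indicator vector with a single 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def build_fm_row(blocks: Sequence[Sequence[float]]) -> Dict[int, float]:
    """Concatenate feature blocks into one sparse FM input row {index: value}.

    Zero entries are dropped, so one-hot blocks stay sparse.
    """
    row: Dict[int, float] = {}
    offset = 0
    for block in blocks:
        for j, value in enumerate(block):
            if value != 0.0:
                row[offset + j] = float(value)
        offset += len(block)
    return row


# Hypothetical setup: 3 users, 4 items, 2 customer and 2 product attributes.
N_USERS, N_ITEMS = 3, 4
user_block = one_hot(1, N_USERS)   # behavioral: which user
item_block = one_hot(2, N_ITEMS)   # behavioral: which item
customer_block = [0.35, 1.0]       # e.g. scaled age, gender flag (made up)
product_block = [0.6, 0.0]         # e.g. scaled price, on-sale flag (made up)

variants = {
    "FMBD": [user_block, item_block],
    "FMPD/CD/BD": [user_block, item_block, customer_block, product_block],
}
for name, blocks in variants.items():
    print(name, build_fm_row(blocks))
```

Because the hybrid row is just a longer feature vector, the same FM equation applies unchanged; a variant is simply a different list of blocks.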

Data
• 2 distinct data sets:
– Furniture: 5,368 users and 2,601 items
– Children’s clothing: 5,999 users and 4,372 items

Results
• Evaluation: Recall@5 to Recall@100 (in steps of 5)
• Friedman test with Holm's procedure (Demšar, 2006):
– Dependent variable = Recall
– Independent variable = Algorithm
– Cases = combinations of selection size and product category
Algorithm      FMPD/CD/BD   FMBD   CF     FMCD   FMPD
Average rank   1            2.38   2.77   3.95   4.90
(NS = not significant)
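A minimal, dependency-free sketch of the Demšar (2006) procedure named above: per-case ranks (1 = best), average ranks, the Friedman χ² statistic, and Holm's step-down comparison of each algorithm against a control using the normal approximation. The recall matrix is toy data, not the study's results; a p-value for the Friedman statistic itself would come from a χ² (or Iman-Davenport F) distribution and is omitted here.

```python
import math
from typing import Dict, List, Sequence


def average_ranks(scores: Sequence[Sequence[float]]) -> List[float]:
    """scores[c][a] = recall of algorithm a in case c (higher is better).

    Ranks each case (1 = best, ties share the average rank) and averages over cases.
    """
    n_cases, k = len(scores), len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda a: -row[a])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            tied_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
            for p in range(pos, end + 1):
                totals[order[p]] += tied_rank
            pos = end + 1
    return [t / n_cases for t in totals]


def friedman_statistic(avg_ranks: Sequence[float], n_cases: int) -> float:
    """Friedman chi-square statistic computed from average ranks."""
    k = len(avg_ranks)
    return 12 * n_cases / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)


def holm_vs_control(avg_ranks: Sequence[float], n_cases: int, control: int,
                    alpha: float = 0.05) -> Dict[int, bool]:
    """Holm step-down test of each algorithm against `control`.

    Returns {algorithm index: True if its difference from the control is significant}.
    """
    k = len(avg_ranks)
    se = math.sqrt(k * (k + 1) / (6 * n_cases))
    pvals = []
    for a in range(k):
        if a != control:
            z = (avg_ranks[a] - avg_ranks[control]) / se
            pvals.append((math.erfc(abs(z) / math.sqrt(2)), a))  # two-sided p
    pvals.sort()
    m = len(pvals)
    result: Dict[int, bool] = {}
    rejecting = True
    for i, (p, a) in enumerate(pvals):
        rejecting = rejecting and p < alpha / (m - i)  # stop at first non-rejection
        result[a] = rejecting
    return result


# Toy recall values: 3 cases x 3 algorithms (not the study's data).
scores = [[0.5, 0.3, 0.1], [0.6, 0.2, 0.4], [0.7, 0.5, 0.5]]
ranks = average_ranks(scores)
print(ranks)                                   # → [1.0, 2.5, 2.5]
print(friedman_statistic(ranks, len(scores)))  # → 4.5
print(holm_vs_control(ranks, len(scores), control=0))  # → {1: False, 2: False}
```

In a setting like the one described, `scores` would hold one row per selection-size/category case and one column per algorithm, with the best-ranked algorithm as the control.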

Results
[Figure: Recall (0% to 100%) against selection size (5 to 100) for FMPD, FMCD, FMBD, FMPD/CD/BD, and CF; one panel for the furniture category and one for the children's clothing category]
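The curves plot recall against selection size. A common definition, assumed here because the slides do not spell it out, is the share of a user's held-out items that appear in the top-k recommendations, averaged over users. The data and function names below are hypothetical.

```python
from typing import Dict, Iterable, Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of a user's held-out items that appear in the top-k recommendations."""
    if not relevant:
        raise ValueError("recall is undefined for a user without held-out items")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def recall_curve(rankings: Dict[str, Sequence[str]],
                 held_out: Dict[str, Set[str]],
                 ks: Iterable[int]) -> Dict[int, float]:
    """Mean recall@k over users with at least one held-out item, for each k."""
    users = [u for u, items in held_out.items() if items and u in rankings]
    if not users:
        raise ValueError("no users to evaluate")
    return {k: sum(recall_at_k(rankings[u], held_out[u], k) for u in users) / len(users)
            for k in ks}


rankings = {"u1": ["a", "b", "c", "d"], "u2": ["c", "a", "b", "d"]}
held_out = {"u1": {"b", "d"}, "u2": {"a"}}
print(recall_curve(rankings, held_out, ks=[1, 2, 4]))  # → {1: 0.0, 2: 0.75, 4: 1.0}
# For curves like the ones above, ks would be range(5, 101, 5).
```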

Future Work: This study
• Perform a grid search to identify which data sources are the most important (at the data-type level and at the individual-variable level)
• Create a benchmark hybrid algorithm that combines the results of separate systems, each built on one of the data sources
• Evaluate with other theoretical metrics (precision, F1, AUC, diversity, novelty, etc.)
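One way to read the planned grid search at the data-type level is an exhaustive search over every non-empty combination of sources, followed by each source's average marginal gain. This is a sketch, not the author's stated design: `evaluate` is a hypothetical callback standing in for "train an FM on these sources and return validation recall", and the toy scores are invented.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Sequence

Subset = FrozenSet[str]


def evaluate_subsets(sources: Sequence[str],
                     evaluate: Callable[[Subset], float]) -> Dict[Subset, float]:
    """Score every non-empty combination of data sources (2^n - 1 runs)."""
    scores: Dict[Subset, float] = {}
    for r in range(1, len(sources) + 1):
        for combo in combinations(sources, r):
            subset = frozenset(combo)
            scores[subset] = evaluate(subset)
    return scores


def marginal_gain(scores: Dict[Subset, float], source: str) -> float:
    """Average score change when `source` is added to a subset that lacks it."""
    gains = [scores[s | {source}] - score
             for s, score in scores.items() if source not in s]
    if not gains:
        raise ValueError("need at least two sources to measure a marginal gain")
    return sum(gains) / len(gains)


# Hypothetical additive stand-in for "train an FM and measure recall".
weights = {"BD": 0.30, "CD": 0.05, "PD": 0.02}


def toy_evaluate(subset: Subset) -> float:
    return sum(weights[s] for s in subset)


scores = evaluate_subsets(["BD", "CD", "PD"], toy_evaluate)
best = max(scores, key=scores.get)
print(sorted(best), {s: round(marginal_gain(scores, s), 3) for s in weights})
```

At the individual-variable level the same loop applies with variables in place of sources, but the 2^n - 1 runs quickly become infeasible, so a greedy or sampled search would likely be needed.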

Future Work: PhD
• Implement the model at the company and perform real-life A/B tests
– Email system
– Webshop
• Evaluate the implemented algorithm in terms of business metrics (click rates, conversion rates, turnover, loyalty, etc.)
• Investigate which (combination of) business metrics maximize(s) the economic value of the RecSys in both the short and the long term
• Investigate the impact of a RecSys on the economic performance of a company
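Rate-type business metrics such as conversion are often compared between A/B groups with a two-proportion z-test. The sketch below shows that standard test as one hedged option, not the author's stated plan; the counts are invented, and metrics such as turnover or loyalty would need different tests.

```python
import math
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int,
                          conv_b: int, n_b: int) -> Tuple[float, float, float]:
    """Compare conversion rates of control (A) and treatment (B).

    Returns (difference in conversion rate, z statistic, two-sided p-value),
    using the pooled-variance normal approximation.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in conversions; the test is undefined")
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


# Made-up counts: webshop visitors without (A) and with (B) recommendations.
uplift, z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"uplift={uplift:.3f}, z={z:.2f}, p={p:.4f}")
```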

Thank you for your attention

Contact:
Stijn Geuens
IESEG School of Management
3 Rue de la Digue
F-59000 Lille
(0)3.20.545.892
s.geuens@ieseg.fr
fr.linkedin.com/pub/stijn-geuens/
stijn.geuens

Advantages and disadvantages of different systems
• Collaborative filtering
– Pros: no metadata engineering needed; serendipity in results; adaptive
– Cons: scalability; cold start for new users and items; long-tail problem; stability
• Content-based
– Pros: comparison between items possible; no metadata engineering needed; adaptive
– Cons: overspecialization; cold start for new users; collection of product information needed
• Knowledge-based
– Pros: deterministic; no cold start
– Cons: knowledge engineering required; subjective; static
• Demographic
– Pros: no metadata engineering needed; serendipity in results
– Cons: long tail; cold start for new users; static
