Factorization Machines (FM): a novel but proven and promising approach
Evgeniy Marinov
July 2018
Introduction to FM
In 2010, Steffen Rendle (currently a senior research scientist at Google) introduced Factorization Machines in a seminal paper [Rendle - FM].
Few algorithms of comparable impact have been introduced in the world of ML / Recommender Systems since.
FMs are a model class that combines the advantages of Support Vector Machines (SVM) / polynomial regression and factorization models.
Kaggle Competitions
Kaggle Competitions won with Factorization Machines:
Criteo’s CTR on display ads contest - 2014
https:
//www.kaggle.com/c/criteo-display-ad-challenge
http://www.csie.ntu.edu.tw/~r01922136/
kaggle-2014-criteo.pdf
Avazu’s CTR prediction contest
https://www.kaggle.com/c/avazu-ctr-prediction
http://www.csie.ntu.edu.tw/~r01922136/slides/
kaggle-avazu.pdf
In both competitions the LogLoss function was used to evaluate the classifiers; the aim is to minimize the Logarithmic Loss. For binary classification:
$$\mathrm{LogLoss} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]$$
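As a small aside (not in the original slides), the LogLoss above is easy to compute directly; `y_true` and `p_pred` below are hypothetical labels and predicted click probabilities:

```python
import numpy as np

def log_loss(y_true, p_pred, eps=1e-15):
    """Binary logarithmic loss: -1/N * sum(y*log(p) + (1-y)*log(1-p))."""
    p = np.clip(p_pred, eps, 1 - eps)      # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# hypothetical labels and predicted click probabilities
y_true = np.array([1, 0, 1])
p_pred = np.array([0.8, 0.3, 0.6])
print(log_loss(y_true, p_pred))
```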
Task Description
We are given a (training and testing) dataset (images, documents, ...)
$$D = \{(x^{(l)}, y^{(l)})\}_{l \in L} \subset \mathbb{R}^n \times T$$
The aim is to find a function $\hat{y} : \mathbb{R}^n \rightarrow T$ (the target space) which generalizes to the testing data (unknown to the trained model).
In recommender systems (online advertising) we deal with sparse input vectors $x^{(l)} \in \mathbb{R}^n$.
$T$ can be $\{-1, 1\}$ or $\{0, 1\}$ (classification), $\mathbb{R}$ (regression), or some categorical space (ranks for an item).
Ad Classification
Classification from implicit feedback: the user does not rate explicitly.
But we can "guess" which item features the user likes or does not care about.
Country | Day    | Ad Type | Clicked?
USA     | 3/3/15 | MOVIE   | 1
China   | 1/7/14 | GAME    | 0
China   | 3/3/15 | GAME    | 1
Standard (dummy) encoding
USA | China | 3/3/15 | 1/7/14 | MOVIE | GAME | Clicked?
 1  |   0   |   1    |   0    |   1   |   0  |    1
 0  |   1   |   0    |   1    |   0   |   1  |    0
 0  |   1   |   1    |   0    |   0   |   1  |    1
Very large feature space
Very sparse samples (feature values)
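A minimal sketch of how a table like the one above can be dummy-encoded with pandas (the toy values come from the slide; a real CTR dataset yields a far wider and sparser matrix):

```python
import pandas as pd

# toy ad-click table from the slide
df = pd.DataFrame({
    "Country": ["USA", "China", "China"],
    "Day":     ["3/3/15", "1/7/14", "3/3/15"],
    "AdType":  ["MOVIE", "GAME", "GAME"],
    "Clicked": [1, 0, 1],
})

# one-hot (dummy) encode every categorical column
X = pd.get_dummies(df[["Country", "Day", "AdType"]])
y = df["Clicked"]
print(X)   # one binary column per (feature, value) pair
```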
(De)motivation!
Often features are more important in pairs:
"Country == USA" & "Day == Thanksgiving"
What if we create a new feature (a conjunction) for every pair of features?
Feature space: insanely large.
If we start with $n$ features, we get an additional $\binom{n}{2} = \frac{n(n-1)}{2}$ features!
Samples: still sparse
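To see the blow-up concretely, the sketch below materializes all pairwise interaction features with scikit-learn; the toy matrix `X` is hypothetical and dense, which is exactly why this approach does not scale:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

n = 1000                                   # number of original features
X = np.random.rand(5, n)                   # tiny toy sample
pairs = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_pairs = pairs.fit_transform(X)
print(X_pairs.shape[1])                    # n + n*(n-1)/2 = 500500 columns
```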
Factorization Models
SVD with ALS
Singular Value Decomposition (Matrix Factorization) with Alternating Least Squares.
The most basic matrix factorization model for recommender systems models the rating $\hat{r}_{ui}$ a user $u$ would give to an item $i$ by
$$\hat{r}_{ui} = x_u^T y_i,$$
where
$x_u^T = (x_u^1, \cdots, x_u^N)$ - factor vector associated with the user
$y_i^T = (y_i^1, \cdots, y_i^N)$ - factor vector associated with the item
The dimension $N$ of the factors $x_u$, $y_i$ is the rank of the model (factorization).
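A compact illustration of the alternating-least-squares idea for this model follows; it is only a sketch, not the implementation of any particular library, and `R`, `mask`, and the hyperparameters are hypothetical:

```python
import numpy as np

def als_mf(R, mask, N=10, reg=0.1, iters=20):
    """Alternating Least Squares for R ~ X Y^T on the observed entries (mask == 1).

    R    : (n_users, n_items) rating matrix (zeros where unobserved)
    mask : (n_users, n_items) 0/1 matrix of observed entries
    N    : rank of the factorization
    """
    n_users, n_items = R.shape
    X = np.random.normal(scale=0.1, size=(n_users, N))   # user factors x_u
    Y = np.random.normal(scale=0.1, size=(n_items, N))   # item factors y_i
    I = np.eye(N)
    for _ in range(iters):
        # fix Y, solve a ridge regression for every user factor
        for u in range(n_users):
            obs = mask[u] == 1
            A = Y[obs].T @ Y[obs] + reg * I
            b = Y[obs].T @ R[u, obs]
            X[u] = np.linalg.solve(A, b)
        # fix X, solve a ridge regression for every item factor
        for i in range(n_items):
            obs = mask[:, i] == 1
            A = X[obs].T @ X[obs] + reg * I
            b = X[obs].T @ R[obs, i]
            Y[i] = np.linalg.solve(A, b)
    return X, Y   # predicted rating: r_hat[u, i] = X[u] @ Y[i]
```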
Factorization Models
Collaborative Filtering:
- Neighborhood Methods
- Latent Factor Methods
Advantages of Factorization Models
Factorization models:
Factorization of higher order interactions
Efficient parameter estimation and superior performance
Even for sparse data where only a few or no observations of those higher-order effects are available
Main applications:
Collaborative Filtering (in Recommender Systems)
Link Prediction (in Social Networks)
Disadvantages of Factorization Models
BUT they are not general prediction models. There are many factorization models designed for specific tasks.
Standard models:
Matrix Factorization (MF)
Tensor Factorization (TF)
Specific tasks:
SVD++ [Koren, Bell - Advances in CF]
timeSVD++ [Koren, Bell - Advances in CF]
PITF (Pairwise Interaction Tensor Factorization)
FPMC (Factorizing Personalized Markov Chains)
And others...
Advantages of FMs
FMs allow parameter estimation under very sparse data where
SVMs fail.
FMs have linear (and even "almost constant"!) complexity and do not rely on support vectors like SVMs.
FMs are a general predictor that can work with any real
valued feature vector. In contrast, other state-of-the-art
factorization models work only on very restricted input data.
Through suitable feature engineering, FMs can mimic
state-of-the-art models like MF, SVD++, timeSVD++,
PARAFAC, PITF, FPMC and others.
Notations
For any $x \in \mathbb{R}^n$ let us define:
$m(x)$ to be the number of nonzero elements of the feature vector;
$\bar{m}_D$ to be the average of $m(x^{(l)})$ over all $l \in L$.
Since we deal with huge sparsity, we have $\bar{m}_D \ll n$, the dimensionality of the feature space.
One reason for huge sparsity is that the underlying problem
deals with large categorical variable domains.
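With a scipy.sparse design matrix these quantities are straightforward to inspect; the matrix below is a hypothetical stand-in for a one-hot-encoded dataset:

```python
import scipy.sparse as sp

X = sp.random(1000, 100000, density=0.0001, format="csr")  # toy sparse data
m_x = X.getnnz(axis=1)            # m(x): nonzeros per sample
m_bar = m_x.mean()                # \bar{m}_D: average number of nonzeros
print(m_bar, "<<", X.shape[1])    # huge sparsity: m_bar << n
```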
Definition of Factorization Machine Model of degree 2
$$\hat{y}(x) := w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle \, x_i x_j \qquad (1)$$
where the model parameters that have to be estimated are:
$w_0 \in \mathbb{R}$ - global bias (mean rating over all items/users)
$w = (w_1, \ldots, w_n) \in \mathbb{R}^n$ - weights of the corresponding features
$v_i := (v_{i,1}, \ldots, v_{i,k})$, the $i$-th row of $V \in \mathbb{R}^{n \times k}$, describes the $i$-th variable (feature) with $k$ factors.
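To make the parameters of equation (1) concrete, here is a literal (and deliberately slow, $O(kn^2)$) NumPy transcription; `w0`, `w`, and `V` are assumed to be already estimated:

```python
import numpy as np

def fm_predict_naive(x, w0, w, V):
    """Degree-2 FM, literal double sum over i < j (O(k * n^2))."""
    n = len(x)
    y = w0 + w @ x
    for i in range(n):
        for j in range(i + 1, n):
            y += V[i] @ V[j] * x[i] * x[j]   # <v_i, v_j> x_i x_j
    return y
```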
Complexity of the model equation
The model equation $\hat{y}(x)$ can, for every $x \in D$, be computed in linear time $O(kn)$, and under sparsity in $O(k\, m(x))$.
Proof:
$$\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \langle v_i, v_j \rangle x_i x_j - \frac{1}{2} \sum_{i=1}^{n} \langle v_i, v_i \rangle x_i x_i$$
$$= \frac{1}{2} \left[ \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{f=1}^{k} v_{i,f} v_{j,f} x_i x_j - \sum_{i=1}^{n} \sum_{f=1}^{k} v_{i,f} v_{i,f} x_i x_i \right]$$
$$= \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f} x_i \right) \left( \sum_{j=1}^{n} v_{j,f} x_j \right) - \sum_{i=1}^{n} v_{i,f}^2 x_i^2 \right]$$
$$= \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f} x_i \right)^2 - \sum_{i=1}^{n} v_{i,f}^2 x_i^2 \right]$$
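The last line of the proof translates directly into an $O(kn)$ implementation; the sketch below (with hypothetical random parameters) checks it against the literal double sum of equation (1):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Degree-2 FM using the O(k n) reformulation from the proof."""
    s  = V.T @ x                 # sum_i v_{i,f} x_i, one value per factor f
    s2 = (V ** 2).T @ (x ** 2)   # sum_i v_{i,f}^2 x_i^2
    return w0 + w @ x + 0.5 * np.sum(s ** 2 - s2)

# sanity check against the literal double sum of equation (1)
rng = np.random.default_rng(0)
n, k = 20, 4
x, w0, w, V = rng.random(n), 0.1, rng.random(n), rng.random((n, k))
naive = w0 + w @ x + sum(V[i] @ V[j] * x[i] * x[j]
                         for i in range(n) for j in range(i + 1, n))
assert np.isclose(fm_predict(x, w0, w, V), naive)
```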
Complexity of parameter updates
The quantities needed for the partial derivatives of $\hat{y}(x)$ are already obtained while computing $\hat{y}(x)$ itself. Therefore,
all parameter updates for a case $(x, y)$ can be done in $O(kn)$, and under sparsity in $O(k\, m(x))$.
$$\frac{\partial \hat{y}(x)}{\partial \theta} = \begin{cases} 1, & \text{if } \theta \text{ is } w_0 \\ x_i, & \text{if } \theta \text{ is } w_i \\ x_i \sum_{j=1}^{n} v_{j,f}\, x_j - v_{i,f}\, x_i^2, & \text{if } \theta \text{ is } v_{i,f} \end{cases} \qquad (2)$$
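As an illustration of how (2) is used, a single SGD step with squared loss could look like the sketch below; the learning rate and regularization values are illustrative, not taken from the slides:

```python
import numpy as np

def sgd_step(x, y, w0, w, V, lr=0.01, reg=0.01):
    """One SGD update for a single case (x, y), squared loss, gradients from (2)."""
    s = V.T @ x                               # s_f = sum_j v_{j,f} x_j, reused below
    y_hat = w0 + w @ x + 0.5 * np.sum(s ** 2 - (V ** 2).T @ (x ** 2))
    err = y_hat - y                           # d(0.5*(y_hat - y)^2) / d(y_hat)

    w0_new = w0 - lr * err                    # d y_hat / d w0 = 1
    w_new  = w - lr * (err * x + reg * w)     # d y_hat / d w_i = x_i
    grad_V = err * (np.outer(x, s) - V * (x ** 2)[:, None])  # x_i*s_f - v_{i,f}*x_i^2
    V_new  = V - lr * (grad_V + reg * V)
    return w0_new, w_new, V_new

# toy usage with hypothetical data
rng = np.random.default_rng(0)
n, k = 6, 3
x, y = rng.random(n), 1.0
w0, w, V = 0.0, np.zeros(n), rng.normal(scale=0.1, size=(n, k))
w0, w, V = sgd_step(x, y, w0, w, V)
```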
General applications of FM
FMs can be applied to a variety of prediction tasks.
Regression: $\hat{y}(x)$ can be directly applied as a regression predictor.
Binary classification: The sign of $\hat{y}(x)$ is used and the parameters are optimized for hinge loss or logit loss.
Ranking: Input vectors $x$ are ordered by the score of $\hat{y}(x)$ and optimization is done over pairs of instance vectors with a pairwise classification loss ([Joachims - Ranking CTR]).
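The ranking case can be made concrete with a small sketch: for a pair where `x_pos` should score above `x_neg`, a pairwise logistic loss on the score difference can be minimized. This is a generic illustration in the spirit of [Rendle - BPR], not the exact loss of [Joachims - Ranking CTR]:

```python
import numpy as np

def pairwise_loss(score_pos, score_neg):
    """Logistic loss on the score difference: -log sigmoid(s_pos - s_neg)."""
    return np.log1p(np.exp(-(score_pos - score_neg)))

# x_pos should be ranked above x_neg, so we want score_pos >> score_neg
print(pairwise_loss(2.0, -1.0))   # small loss: correct order
print(pairwise_loss(-1.0, 2.0))   # large loss: wrong order
```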
Inference
Inference algorithms:
SGD (Stochastic Gradient Descent)
ALS (Alternating Least Squares)
MCMC (Markov Chain Monte Carlo)
Library: libFM (C++)
Author: Steffen Rendle
Web site: http://www.libfm.org/
(very poor description of the library)
Source code (C++): https://github.com/srendle/libfm
Very professionally written source code! (Linux and MacOS X)
libFM features
stochastic gradient descent (SGD)
alternating least squares (ALS)
Bayesian inference using Markov Chain Monte Carlo (MCMC)
Library: fastFM (Python)
Author: Immanuel Bayer
Web site: http://ibayer.github.io/fastFM/
Very detailed tutorial for the usage of the library!
Source code (Python):
https://github.com/ibayer/fastFM
Python (2.7 and 3.5) with the well known scikit-learn API.
Performance-critical code is written in C and wrapped with Cython.
Task           | Solver         | Loss
Regression     | ALS, SGD, MCMC | Square Loss
Classification | ALS, SGD, MCMC | Probit (MAP), Probit, Sigmoid
Ranking        | SGD            | [Rendle - BPR]
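Assuming the scikit-learn-style API described in the fastFM documentation, a minimal regression example might look like the sketch below; the toy data and hyperparameters are illustrative, and the input must be a scipy sparse matrix:

```python
import numpy as np
import scipy.sparse as sp
from fastFM import als

# toy sparse design matrix and regression targets
X = sp.csr_matrix(np.array([[1, 0, 1, 0],
                            [0, 1, 0, 1],
                            [0, 1, 1, 0]], dtype=np.float64))
y = np.array([5.0, 1.0, 3.0])

fm = als.FMRegression(n_iter=100, rank=2, l2_reg_w=0.1, l2_reg_V=0.5)
fm.fit(X, y)
print(fm.predict(X))
```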
Library: LIBFFM (C++)
A library for Field-aware Factorization Machines from
the Machine Learning group at National Taiwan University
Author: Yu-Chin Juan and colleagues
Web site:
https://www.csie.ntu.edu.tw/~cjlin/libffm/
(very poor description of the library)
Source code (C++):
https://github.com/guestwalk/libffm
FFM is prone to overfitting and not very stable at the moment.
Thank you for your attention!
References
Steffen Rendle. Factorization Machines. 2010. http://www.ismll.uni-hildesheim.de/pub/pdfs/Rendle2010FM.pdf
Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. BPR: Bayesian Personalized Ranking from Implicit Feedback. UAI '09, 2009.
Y. Koren and R. Bell. Advances in Collaborative Filtering.
M. Blondel. Convex Factorization Machines. http://www.mblondel.org/publications/mblondel-ecmlpkdd2015.pdf
Thorsten Joachims. Optimizing Search Engines using Clickthrough Data. 2002.