Title: Factorization Machines
Abstract:
Developing an accurate recommender system for a specific problem setting is typically a complicated and time-consuming task: models have to be defined, learning algorithms derived, and implementations written. In this talk, I present the factorization machine (FM) model, a generic factorization approach that can be adapted to new problems through feature engineering. Efficient FM learning algorithms are discussed, among them SGD, ALS/CD, and MCMC inference with automatic hyperparameter selection. I will show on several tasks, including the Netflix Prize and KDD Cup 2012, that FMs are flexible and achieve highly competitive accuracy. With FMs, these results can be obtained with simple data preprocessing and without any tuning of regularization parameters or learning rates.
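As an illustrative sketch (function and variable names are my own, not from the talk), the second-order FM model y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j can be evaluated in O(k*n) time using Rendle's reformulation of the pairwise term:

```python
def fm_predict(x, w0, w, V):
    """Second-order factorization machine prediction:
    y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j.
    V[i] is the k-dimensional factor vector of feature i."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    k = len(V[0])
    # O(k*n) trick:
    # sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f [ (sum_i v_{i,f} x_i)^2 - sum_i (v_{i,f} x_i)^2 ]
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s2 = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s2)
    return linear + pairwise
```

Because the pairwise term is linear in each single parameter, the same reformulation also yields cheap gradients for SGD, ALS/CD, and MCMC.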
Matrix Factorization in Recommender Systems (Yong Zheng)
The document discusses matrix factorization techniques for recommender systems. It begins with an overview of recommender systems and their use of matrix factorization for dimensionality reduction. Principal component analysis and singular value decomposition are described as early linear algebra techniques used for this purpose. The document then focuses on how these techniques evolved into basic and extended matrix factorization methods in recommender systems, using the Netflix Prize competition as an example.
Tutorial: Context in Recommender Systems (Yong Zheng)
This document provides an overview of a tutorial on context-aware recommender systems. The tutorial will cover traditional recommendation techniques, context-aware recommendation which incorporates additional contextual information such as time and location, and context suggestion. It includes an agenda with topics, background information on recommender systems and evaluation metrics, and descriptions of techniques for context-aware recommendation including context filtering and modeling.
What really are recommendation engines nowadays?
This presentation introduces the foundations of recommendation algorithms and covers common approaches as well as some of the most advanced techniques. Although more focused on efficiency than on theoretical properties, it uses basics of matrix algebra and optimization-based machine learning throughout.
Table of Contents:
1. Collaborative Filtering
1.1 User-User
1.2 Item-Item
1.3 User-Item
* Matrix Factorization
* Stochastic Gradient Descent (SGD)
* Truncated Singular Value Decomposition (SVD)
* Alternating Least Squares (ALS)
* Deep Learning
2. Content Extraction
* Item-Item Similarities
* Deep Content Extraction: NLP, CNN, LSTM
3. Hybrid Models
4. In Production
4.1 Challenges
4.2 Solutions
4.3 Tools
1. Deep learning techniques such as convolutional neural networks, recurrent neural networks, and autoencoders can be applied to recommender systems.
2. Convolutional neural networks are commonly used to extract features from images, audio, and video that can then be used for recommendation. Recurrent neural networks can model user sessions as sequences of clicks.
3. Autoencoders learn lower-dimensional representations of items that capture similarities and can be used to make recommendations, especially for cold start problems where little is known about new users or items.
Matrix Factorization Techniques for Recommender Systems (Lei Guo)
The document discusses matrix factorization techniques for recommender systems. It begins by describing common recommender system strategies like content-based and collaborative filtering approaches. It then introduces matrix factorization methods, which characterize both users and items by vectors of latent factors inferred from rating patterns. The basic matrix factorization model approximates user ratings as the inner product of user and item vectors in the joint latent factor space. Learning algorithms like stochastic gradient descent and alternating least squares are used to compute the user and item vectors by minimizing a regularized error function on known ratings.
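The stochastic gradient descent learner described above can be sketched in a few lines; this is a minimal illustration with made-up hyperparameters and function names (not code from the slides), looping over observed (user, item, rating) triples and minimizing the regularized squared error:

```python
import random

def mf_sgd(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=1000, seed=0):
    """Learn user factors P and item factors Q by SGD on observed
    (u, i, r) triples, minimizing (r - p_u . q_i)^2 + reg*(|p_u|^2 + |q_i|^2)."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            err = r - pred
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)  # gradient step on p_u
                Q[i][f] += lr * (err * pu - reg * qi)  # gradient step on q_i
    return P, Q
```

Alternating least squares differs only in the inner step: it fixes Q and solves exactly for each p_u (and vice versa) instead of taking a gradient step.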
The document discusses attention models and their applications. Attention models allow a model to focus on specific parts of the input that are important for predicting the output. This is unlike traditional models that use the entire input equally. Three key applications are discussed: (1) Image captioning models that attend to relevant regions of an image when generating each word of the caption, (2) Speech recognition models that attend to different audio fragments when predicting text, and (3) Visual attention models for tasks like saliency detection and fixation prediction that learn to focus on important regions of an image. The document also covers techniques like soft attention, hard attention, and spatial transformer networks.
The document provides an introduction to word embeddings and two related techniques: Word2Vec and Word Movers Distance. Word2Vec is an algorithm that produces word embeddings by training a neural network on a large corpus of text, with the goal of producing dense vector representations of words that encode semantic relationships. Word Movers Distance is a method for calculating the semantic distance between documents based on the embedded word vectors, allowing comparison of documents with different words but similar meanings. The document explains these techniques and provides examples of their applications and properties.
Talk with Yves Raimond at the GPU Tech Conference on March 28, 2018 in San Jose, CA.
Abstract:
In this talk, we will survey how Deep Learning methods can be applied to personalization and recommendations. We will cover why standard Deep Learning approaches don't perform better than typical collaborative filtering techniques. Then we will go over recently published research at the intersection of Deep Learning and recommender systems, looking at how it integrates new types of data, explores new models, or changes the recommendation problem statement. We will also highlight some of the ways that neural networks are used at Netflix and how we can use GPUs to train recommender systems. Finally, we will highlight promising new directions in this space.
Deep Learning for NLP (without Magic) - Richard Socher and Christopher Manning (BigDataCloud)
The document discusses deep learning for natural language processing. It provides 5 reasons why deep learning is well-suited for NLP tasks: 1) it can automatically learn representations from data rather than relying on human-designed features, 2) it uses distributed representations that address issues with symbolic representations, 3) it can perform unsupervised feature and weight learning on unlabeled data, 4) it learns multiple levels of representation that are useful for multiple tasks, and 5) recent advances in methods like unsupervised pre-training have made deep learning models more effective for NLP. The document outlines some successful applications of deep learning to tasks like language modeling and speech recognition.
GloVe is an unsupervised learning algorithm for obtaining vector representations of words. It combines the advantages of global matrix factorization and local context window models by training only on the nonzero elements of a word-word co-occurrence matrix. The GloVe model represents word meanings as vectors such that the ratio of the probabilities of any two words appearing together is approximated by the ratio of the dot product of their vector representations. Experiments show GloVe outperforms other models on word analogy, similarity and named entity recognition tasks.
Recommender systems are software agents that analyze a user's preferences through transactions and provide personalized recommendations accordingly. There are several recommendation paradigms including non-personalized rules, personalized rules based on user data, and transaction-based collaborative filtering that learns from user interactions. Context-based recommender systems also consider additional information like time, location, or device to provide adaptive recommendations. Common techniques used in recommender systems include content-based filtering that recommends similar items, collaborative filtering that finds users with similar tastes, and demographic-based recommendations.
BPR: Bayesian Personalized Ranking from Implicit Feedback (Park JunPyo)
This document discusses recommendation systems and the Bayesian Personalized Ranking (BPR) framework. It introduces the goal of recommendation systems to increase product sales through relevance, novelty, serendipity and diversity. It also discusses different recommendation approaches, including collaborative filtering, content-based filtering and knowledge-based filtering. A key part of the document is describing the BPR framework, which uses a Bayesian approach to learn a personalized ranking model from implicit feedback data. It formalizes the recommendation problem as optimizing a posterior distribution over the preferences of users through matrix factorization.
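A rough sketch of a single BPR optimization step, assuming a matrix factorization scorer x_ui = <p_u, q_i> and a sampled triple (u, i, j) where user u interacted with item i but not with item j; names and hyperparameters here are illustrative, not from the document:

```python
import math

def bpr_step(P, Q, u, i, j, lr=0.05, reg=0.01):
    """One SGD step of BPR: push the score x_ui above x_uj by gradient
    ascent on ln sigmoid(x_ui - x_uj), with L2 regularization."""
    x_uij = sum(pu * (qi - qj) for pu, qi, qj in zip(P[u], Q[i], Q[j]))
    # d ln sigmoid(x) / dx = 1 - sigmoid(x) = 1 / (1 + e^x)
    g = 1.0 / (1.0 + math.exp(x_uij))
    for f in range(len(P[u])):
        pu, qi, qj = P[u][f], Q[i][f], Q[j][f]
        P[u][f] += lr * (g * (qi - qj) - reg * pu)
        Q[i][f] += lr * (g * pu - reg * qi)   # observed item moves up
        Q[j][f] += lr * (-g * pu - reg * qj)  # unobserved item moves down
```

The key point of BPR is visible in the update: it optimizes the *relative order* of item pairs rather than absolute rating values, which matches the implicit-feedback setting.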
This Edureka Recurrent Neural Networks tutorial will help you understand why we need Recurrent Neural Networks (RNNs) and what exactly they are. It also explains a few issues with training a Recurrent Neural Network and how to overcome those challenges using LSTMs. The last section includes a use case of an LSTM predicting the next word in a sample short story.
Below are the topics covered in this tutorial:
1. Why Not Feedforward Networks?
2. What Are Recurrent Neural Networks?
3. Training A Recurrent Neural Network
4. Issues With Recurrent Neural Networks - Vanishing And Exploding Gradient
5. Long Short-Term Memory Networks (LSTMs)
6. LSTM Use-Case
Deep generative models can generate synthetic images, speech, text and other data types. There are three popular types: autoregressive models which generate data step-by-step; variational autoencoders which learn the distribution of latent variables to generate data; and generative adversarial networks which train a generator and discriminator in an adversarial game to generate high quality samples. Generative models have applications in image generation, translation between domains, and simulation.
This document provides an overview of Word2Vec, a neural network model for learning word embeddings developed by researchers led by Tomas Mikolov at Google in 2013. It describes the goal of reconstructing word contexts, different word embedding techniques like one-hot vectors, and the two main Word2Vec models - Continuous Bag of Words (CBOW) and Skip-Gram. These models map words to vectors in a neural network and are trained to predict words from contexts or predict contexts from words. The document also discusses Word2Vec parameters, implementations, and other applications that build upon its approach to word embeddings.
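The Skip-Gram training data mentioned above is easy to illustrate: each word predicts the words within a fixed window around it. A minimal sketch of the pair-generation step (function name is mine):

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs as in the Skip-Gram
    model: each word is paired with every word within `window`
    positions of it."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs
```

CBOW simply reverses the direction: the context words jointly predict the center word. In both cases the learned input weights become the word embeddings.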
This document provides an overview of representation learning techniques for natural language processing (NLP). It begins with introducing the speakers and objectives of the workshop, which is to provide a deep dive into state-of-the-art text representation techniques and how to apply them to solve NLP problems. The workshop covers four modules: 1) archaic techniques, 2) word vectors, 3) sentence/paragraph/document vectors, and 4) character vectors. It emphasizes that representation learning is key to NLP as it transforms raw text into a numeric form that machine learning models can understand.
The document discusses the BERT model for natural language processing. It begins with an introduction to BERT and how it achieved state-of-the-art results on 11 NLP tasks in 2018. The document then covers related work on language representation models including ELMo and GPT. It describes the key aspects of the BERT model, including its bidirectional Transformer architecture, pre-training using masked language modeling and next sentence prediction, and fine-tuning for downstream tasks. Experimental results are presented showing BERT outperforming previous models on the GLUE benchmark, SQuAD 1.1, SQuAD 2.0, and SWAG. Ablation studies examine the importance of the pre-training tasks and the effect of model size.
This document proposes a calibrated recommendations approach that aims to provide recommendations that reflect all of a user's interests in correct proportions. Standard recommender systems trained for accuracy can lead to unbalanced recommendations that amplify a user's main interests and crowd out lesser interests. The calibrated recommendations approach uses a post-processing re-ranking step to optimize a submodular calibration metric, balancing accuracy and fairness by recommending items from all a user's interests in their correct proportions. Experiments on MovieLens data show that calibration can be improved significantly without degrading accuracy much.
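The calibration metric can be illustrated with a KL divergence between the genre distribution p of a user's history and the distribution q of the recommended list, with q smoothed toward p so the divergence stays finite, as in Steck's formulation. This sketch assumes genres given as plain strings (names are mine):

```python
import math

def calibration_kl(history_genres, rec_genres, alpha=0.01):
    """KL(p || q~) between the genre distribution p of a user's history
    and the genre distribution q of the recommendation list, with
    q~ = (1 - alpha) * q + alpha * p to avoid division by zero.
    Zero means perfectly calibrated; larger means more skewed."""
    def dist(items):
        return {g: items.count(g) / len(items) for g in set(items)}
    p, q = dist(history_genres), dist(rec_genres)
    kl = 0.0
    for g, pg in p.items():
        qg = (1 - alpha) * q.get(g, 0.0) + alpha * pg
        kl += pg * math.log(pg / qg)
    return kl
```

The re-ranking step then greedily builds the recommendation list to trade off accuracy against this divergence, which is where the submodularity argument comes in.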
Deep Learning in Recommender Systems - RecSys Summer School 2017 (Balázs Hidasi)
This is the presentation accompanying my tutorial about deep learning methods in the recommender systems domain. The tutorial consists of a brief general overview of deep learning and the introduction of the four most prominent research direction of DL in recsys as of 2017. Presented during RecSys Summer School 2017 in Bolzano, Italy.
Understanding how high powered ML models arrive at their predictions is an important aspect of Machine Learning, and SHAP is a powerful tool that enables practitioners to understand how different features combine to help a model arrive at a prediction.
This slidedeck is from a presentation given at pydata global on the theoretical foundations of SHAP as well as how to use its library. Link to the presentation can be found here: https://pydata.org/global2021/schedule/presentation/3/behind-the-black-box-how-to-understand-any-ml-model-using-shap/
This document discusses machine learning algorithms for ranking problems. It introduces supervised learning to rank methods including pointwise, pairwise and listwise approaches. Pointwise methods predict relevance scores independently but don't consider order. Pairwise approaches consider relative order but have high computational costs. Listwise methods aim to optimize entire orderings but have complexity issues. Practical challenges include defining objective metrics, generating training labels, and handling new items with limited data. Semi-supervised learning and matrix factorization can help address labeling problems.
Matrix factorization techniques can be used to address some of the limitations of traditional collaborative filtering approaches for recommender systems. Matrix factorization decomposes the user-item rating matrix into the product of two lower-dimensional matrices, one representing latent factors for users and the other for items. This reduced dimensionality addresses data sparsity and scalability issues. Specifically, singular value decomposition is often used to perform this matrix factorization, which can approximate the original rating matrix while ignoring less important singular values and factor vectors. The decomposed matrices can then be multiplied to predict unknown user ratings.
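The idea of keeping only the dominant singular factors can be sketched with plain power iteration, which extracts the leading singular triple of a small dense matrix; a truncated SVD keeps the top k such triples and drops the rest. This is an illustrative toy (function name mine), not a production SVD:

```python
def top_singular_triple(A, iters=100):
    """Power iteration for the leading singular triple of a small dense
    matrix A (list of rows), so that A is approximated by s * u v^T."""
    m, n = len(A), len(A[0])
    v = [1.0] * n
    for _ in range(iters):
        # v <- normalize(A^T A v)
        Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        v = [sum(A[i][j] * Av[i] for i in range(m)) for j in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
    s = sum(x * x for x in Av) ** 0.5  # leading singular value
    u = [x / s for x in Av]            # leading left singular vector
    return u, s, v
```

In the recommender setting, the rows of the truncated U (scaled by the singular values) act as user latent vectors and the rows of V as item latent vectors, and their products give the predicted ratings.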
This document introduces Factorization Machines, a general model that can mimic many successful factorization models. Factorization Machines allow feature vectors to be easily input and enjoy benefits of factorizing interactions between variables. The model has properties like expressiveness, multi-linearity, and scalable complexity. It relates to models like matrix factorization, tensor factorization, SVD++, and nearest neighbor models. Experiments show Factorization Machines outperform other models on rating prediction, context-aware recommendation, and tag recommendation tasks.
Recommender Systems (Machine Learning Summer School 2014 @ CMU) (Xavier Amatriain)
The document summarizes a presentation on recommender systems given by Xavier Amatriain. It begins with introductions to recommender systems and collaborative filtering. Traditional collaborative filtering approaches include user-based and item-based methods. User-based CF finds similar users to a target user and recommends items they liked. Item-based CF finds similar items to those a target user liked and predicts ratings. Both approaches address sparsity and scalability challenges with dimensionality reduction techniques.
Building a Recommendation Engine - An example of a product recommendation engine (NYC Predictive Analytics)
This document provides an example of building a predictive model for product recommendations. It outlines using a k-Nearest Neighbor (kNN) algorithm and singular value decomposition (SVD) for dimensionality reduction on order history and cart event data to create a recommendation engine. It discusses selecting features from the data, normalizing the data, using kNN to find similar items, and reducing the data dimensions with SVD before applying kNN. It also introduces using a synthetic dataset to test and tune the model and compares different experimental setups like random, kNN, and SVD+kNN recommendations. The goal is to increase business metrics like revenue, conversion rate, and average order value through effective product recommendations.
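The kNN step described above amounts to ranking items by similarity to a user's profile vector. A minimal cosine-similarity sketch, assuming dense feature vectors (the names and toy data are mine, not from the document):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def knn_recommend(user_vec, item_vecs, k=2):
    """Return the ids of the k items most similar to the user's
    profile vector, by cosine similarity."""
    ranked = sorted(item_vecs, key=lambda iid: cosine(user_vec, item_vecs[iid]),
                    reverse=True)
    return ranked[:k]
```

The SVD+kNN variant in the document applies the same ranking, but after projecting both user and item vectors into the reduced latent space, which dampens noise and sparsity.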
Recommender system algorithm and architecture - Liang Xiang
1) The document discusses recommender system algorithms and architecture. It covers common recommendation techniques like collaborative filtering, content-based filtering, and graph-based recommendations.
2) It also discusses challenges like cold starts for new users and items. For new users, it recommends using demographic data or initial feedback to understand interests. For new items, it suggests using content information or initial user feedback.
3) The document proposes a feature-based recommendation framework that connects users, items, and latent features to address challenges like heterogeneous data and cold starts. This framework provides explanations but does not support user-based methods.
Ted Dunning, Chief Application Architect, MapR at MLconf SF - MLconf
The document discusses techniques for generating recommendations based on item co-occurrence analysis. It describes how to build a user-item history matrix from log files and transform it into an item-item co-occurrence matrix. It discusses using anomalous co-occurrences as indicators to make recommendations and scaling the analysis using interaction cuts and frequency limits. It also describes how to update the co-occurrence matrix incrementally in real-time to enable online recommendations.
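The core transformation - user-item history to item-item co-occurrence - is a single matrix product. A toy sketch (the matrix H and its values are hypothetical; as the talk describes, production systems additionally score co-occurrences, e.g. with a log-likelihood-ratio test, to keep only the anomalous ones):

```python
import numpy as np

# H[u, i] = 1 if user u interacted with item i (built from log files)
H = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]])

# C[i, j] = number of users who interacted with both item i and item j
C = H.T @ H
np.fill_diagonal(C, 0)   # drop trivial self co-occurrence
```

Incremental updates follow directly: appending a new user row to H adds the outer product of that row to C, which is what makes the real-time variant feasible.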
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF - MLconf
Abstract: How graphs became just another big data primitive
Graph-shaped data is used in product recommendation systems, social network analysis, network threat detection, image de-noising, and many other important applications. And, a growing number of these applications will benefit from parallel distributed processing for graph feature engineering, model training, and model serving. But today’s graph tools are riddled with limitations and shortcomings, such as a lack of language bindings, streaming support, and seamless integration with other popular data services. In this talk, we’ll argue that the key to doing more with graphs is doing less with specialized systems and more with systems already good at handling data of other shapes. We’ll examine some practical data science workflows to further motivate this argument and we’ll talk about some of the things that Intel is doing with the open source community and industry to make graphs just another big data primitive.
Scott Clark, Software Engineer, Yelp at MLconf SF - MLconf
Abstract: Introducing the Metric Optimization Engine (MOE); an open source, black box, Bayesian Global Optimization engine for optimal experimental design.
In this talk we will introduce MOE, the Metric Optimization Engine. MOE is an efficient way to optimize a system’s parameters, when evaluating parameters is time-consuming or expensive. It can be used to help tackle a myriad of problems including optimizing a system’s click-through or conversion rate via A/B testing, tuning parameters of a machine learning prediction method or expensive batch job, designing an engineering system or finding the optimal parameters of a real-world experiment.
MOE is ideal for problems in which the optimization problem’s objective function is a black box, not necessarily convex or concave, derivatives are unavailable, and we seek a global optimum, rather than just a local one. This ability to handle black-box objective functions allows us to use MOE to optimize nearly any system, without requiring any internal knowledge or access. To use MOE, we simply need to specify some objective function, some set of parameters, and any historical data we may have from previous evaluations of the objective function. MOE then finds the set of parameters that maximize (or minimize) the objective function, while evaluating the objective function as few times as possible. This is done internally using Bayesian Global Optimization on a Gaussian Process model of the underlying system and finding the points of highest Expected Improvement to sample next. MOE provides easy to use Python, C++, CUDA and REST interfaces to accomplish these goals and is fully open source. We will present the motivation and background, discuss the implementation and give real-world examples.
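The sampling criterion described - Expected Improvement under a Gaussian Process posterior - can be sketched in a few lines. This is not MOE's implementation, just an illustrative 1-D version with an RBF kernel and hypothetical function names:

```python
import numpy as np
from math import erf, exp, pi, sqrt

def _phi(z):   # standard normal pdf
    return exp(-0.5 * z * z) / sqrt(2 * pi)

def _Phi(z):   # standard normal cdf
    return 0.5 * (1 + erf(z / sqrt(2)))

def expected_improvement(x_cand, x_obs, y_obs, noise=1e-6):
    """EI (for maximization) at 1-D candidates under a GP with RBF kernel."""
    def k(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)
    K = k(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = k(x_cand, x_obs)
    mu = Ks @ np.linalg.solve(K, y_obs)                       # posterior mean
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    sd = np.sqrt(np.maximum(var, 1e-12))                      # posterior std dev
    best = y_obs.max()
    z = (mu - best) / sd
    # EI = sd * (z * Phi(z) + phi(z)); large where the model is uncertain
    # or predicts improvement over the incumbent best
    return np.array([s * (zi * _Phi(zi) + _phi(zi)) for zi, s in zip(z, sd)])
```

The optimizer then evaluates the black-box objective at the candidate with the highest EI, refits the GP, and repeats, which is how the loop keeps the number of expensive evaluations small.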
Lise Getoor, Professor, Computer Science, UC Santa Cruz at MLconf SF - MLconf
Abstract:
One of the challenges in big data analytics lies in being able to reason collectively about extremely large, heterogeneous, incomplete, noisy interlinked data. We need data science techniques which can represent and reason effectively with this form of rich and multi-relational graph data. In this presentation, I will describe some common collective inference patterns needed for graph data including: collective classification (predicting missing labels for nodes in a network), link prediction (predicting potential edges), and entity resolution (determining when two nodes refer to the same underlying entity). I will describe three key capabilities required: relational feature construction, collective inference, and scaling. Finally, I will briefly describe some of the cutting edge analytic tools being developed within the machine learning, AI, and database communities to address these challenges.
Introduction to the Factorization Machines model with an example. Motivations - why you should have it in your toolbox, the model and its expressiveness, a use case for context-aware recommendations, and Field-Aware Factorization Machines.
Quoc Le, Software Engineer, Google at MLconf SF - MLconf
Title: Deep Learning for Language Understanding
Abstract:
Many current language understanding algorithms rely on expert knowledge to engineer models and features. In this talk, I will discuss how to use Deep Learning to understand texts without much prior knowledge. In particular, our algorithms will learn the vector representations of words. These vector representations can be used to solve word analogy or translate unknown words between languages. Our algorithms also learn vector representations of sentences and documents. These vector representations preserve the semantics of sentences and documents and therefore can be used for machine translation, text classification, information retrieval and sentiment analysis.
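The word-analogy use of vector representations works by simple vector arithmetic: the vector b - a + c should land near the answer. A toy sketch with made-up embeddings (real vectors would come from a trained model such as word2vec; the values here are invented for illustration):

```python
import numpy as np

# Toy embeddings; in practice these come from a trained model.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.2, 0.9, 0.1]),
    "woman": np.array([0.2, 0.1, 0.9]),
}

def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' by nearest cosine neighbor of b - a + c."""
    target = emb[b] - emb[a] + emb[c]
    best, best_sim = None, -1.0
    for w, v in emb.items():
        if w in (a, b, c):            # exclude the query words themselves
            continue
        sim = v @ target / (np.linalg.norm(v) * np.linalg.norm(target))
        if sim > best_sim:
            best_sim, best = sim, w
    return best
```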
MLconf - Distributed Deep Learning for Classification and Regression Problems... - Sri Ambati
Video recording (no audio?): http://new.livestream.com/accounts/7874891/events/3565981/videos/68114143 from 32:00 to 54:30
Deep Learning has been dominating recent machine learning competitions with better predictions. Unlike the neural networks of the past, modern Deep Learning methods have cracked the code for training stability and generalization. Deep Learning is not only the leader in image and speech recognition tasks, but is also emerging as the algorithm of choice for highest predictive performance in traditional business analytics. This talk introduces Deep Learning and implementation concepts in the open-source H2O in-memory prediction engine. Designed for the solution of business-critical problems on distributed compute clusters, it offers advanced features such as adaptive learning rate, dropout regularization, parameter tuning and a fully-featured R interface. World record performance on the classic MNIST dataset, best-in-class accuracy for a high-dimensional eBay text classification problem and other relevant datasets showcase the power of this game-changing technology. A whole new ecosystem of Intelligent Applications is emerging with Deep Learning at its core.
Bio:
Prior to joining 0xdata as Physicist & Hacker, Arno was a founding Senior MTS at Skytree where he designed and implemented high-performance machine learning algorithms. He has over a decade of experience in HPC with C++/MPI and had access to the world’s largest supercomputers as a Staff Scientist at SLAC National Accelerator Laboratory where he participated in US DOE scientific computing initiatives. While at SLAC, he authored the first curvilinear finite-element simulation code for space-charge dominated relativistic free electrons and scaled it to thousands of compute nodes. He also led a collaboration with CERN to model the electromagnetic performance of CLIC, a ginormous e+e- collider and potential successor of LHC. Arno has authored dozens of scientific papers and was a sought-after academic conference speaker. He holds a PhD and Masters summa cum laude in Physics from ETH Zurich. Arno was named 2014 Big Data All-Star by Fortune Magazine.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Ameet Talwalkar, assistant professor of Computer Science, UCLA at MLconf SF - MLconf
Abstract:
Apache Spark’s MLlib is a terrific library for fitting large-scale machine learning models. However, translating high-level problem statements like “learn a classifier” into a working model presently requires significant manual effort (via ad hoc parameter tuning) and computational resources (to fit several models). We present our work on the MLbase optimizer – a system designed on top of Spark to quickly and automatically search through a hyperparameter space and find a good model. By leveraging performance enhancements, better search algorithms, and statistical heuristics, our system offers an order of magnitude speedup over standard methods.
Evan Estola – Data Scientist, Meetup.com at MLconf ATL - MLconf
Beyond Collaborative Filtering: using Machine Learning to power recommendations at Meetup
Collaborative filtering and other common recommendation algorithms are a powerful technique for some scenarios. I will cover how to design a recommendation system from the ground up using an ensemble classifier and supervised learning to avoid some of the pitfalls of collaborative filtering. From sampling to deployment, we’ve had to invent our approach with few non-academic and non-toy examples to follow. At Meetup we’re all about sharing information and empowering communities, so I’ll present the details of our model as well as some of the new features we are still developing.
To download slides:
http://www.intelligentmining.com/category/knowledge-base/
These are my notes for a presentation I did internally at IM. It covers both the multinomial and multi-variate Bernoulli event models in Naive Bayes text classification.
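A minimal sketch of the multinomial event model with Laplace smoothing (my own illustrative code, not the slides'; the multi-variate Bernoulli variant would instead model each vocabulary word as a per-document binary indicator):

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, alpha=1.0):
    """docs: list of (label, tokens). Returns log-priors and log-likelihoods."""
    vocab = {t for _, toks in docs for t in toks}
    class_docs = Counter(lbl for lbl, _ in docs)
    token_counts = defaultdict(Counter)
    for lbl, toks in docs:
        token_counts[lbl].update(toks)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    log_lik = {}
    for c in class_docs:
        total = sum(token_counts[c].values())
        # Laplace (add-alpha) smoothing over the vocabulary
        log_lik[c] = {t: math.log((token_counts[c][t] + alpha) /
                                  (total + alpha * len(vocab))) for t in vocab}
    return log_prior, log_lik

def classify(tokens, log_prior, log_lik):
    # Multinomial event model: every token occurrence contributes a term;
    # tokens outside the training vocabulary are simply ignored here.
    return max(log_prior, key=lambda c: log_prior[c] +
               sum(log_lik[c].get(t, 0.0) for t in tokens))
```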
Steffen Rendle, Research Scientist, Google at MLconf SF - MLconf
Abstract:
Developing accurate recommender systems for a specific problem setting seems to be a complicated and time-consuming task: models have to be defined, learning algorithms derived and implementations written. In this talk, I present the factorization machine (FM) model, a generic factorization approach that can be adapted to new problems by feature engineering. Efficient FM learning algorithms are discussed, among them SGD, ALS/CD and MCMC inference, including automatic hyperparameter selection. I will show on several tasks, including the Netflix prize and KDDCup 2012, that FMs are flexible and achieve highly competitive accuracy. With FMs these results can be achieved by simple data preprocessing and without any tuning of regularization parameters or learning rates.
This document summarizes RecSys 2016, a conference on recommender systems held in Boston from September 15-19, 2016. It describes the keynote speakers, sessions on deep learning and algorithms, workshops on topics like cold-starting recommendations, and lessons learned from building real-life recommender systems. The document highlights sessions on embedding techniques, context-driven recommender systems, and using navigation data to improve recommendations in real-time.
Ben Hamner, Co-founder and CTO, Kaggle at MLconf SF - 11/13/15 - MLconf
Lessons learned from Running Hundreds of Kaggle Competitions: At Kaggle, we've run hundreds of machine learning competitions and seen over 80,000 data scientists make submissions. One thing is clear: winning competitions isn't random. We've learned that certain tools and methodologies work consistently well on different types of problems. Many participants make common mistakes (such as overfitting) that should be actively avoided. Similarly, competition hosts have their own set of pitfalls (such as data leakage).
In this talk, I'll share what goes into a winning competition toolkit along with some war stories on what to avoid. Additionally, I’ll share what we’re seeing on the collaborative side of competitions. Our community is showing an increasing amount of collaboration in developing machine learning models and analytic solutions. I'll showcase examples of this and discuss how these types of collaboration will improve how data science is learned and applied.
To download please go to: http://www.intelligentmining.com/category/knowledge-base/
Slides as presented by Alex Lin to the NYC Predictive Analytics Meetup group: http://www.meetup.com/NYC-Predictive-Analytics/ on Dec. 10, 2009.
A product family with a common platform paradigm can increase the flexibility and responsiveness of the product-manufacturing process and help take away market share from competitors that develop one product at a time. The recently developed Comprehensive Product Platform Planning (CP3) method allows (i) the formation of sub-families of products, and (ii) the simultaneous identification and quantification of platform/scaling design variables. The CP3 model is founded on a generalized commonality matrix representation of the product-platform-plan. In this paper, a new commonality index is developed and introduced in CP3 to simultaneously account for the degree of inter-product commonalities and for the overlap between groups of products sharing different platform variables. To maximize both the performance of the product family and the new commonality measure, we develop and apply an advanced mixed-discrete Particle Swarm Optimization (MDPSO) algorithm. In the MDPSO algorithm, the discrete variables are updated using a deterministic nearest-feasible-vertex criterion after each iteration of the conventional PSO. Such an approach is expected to avoid the undesirable discrepancy in the rate of evolution of discrete and continuous variables. To prevent a premature stagnation of solutions (likely in conventional PSO), while solving the high dimensional MINLP problem presented by CP3, we introduce a new adaptive diversity-preservation technique. This technique first characterizes the population diversity and then applies a stochastic update of the discrete variables based on the estimated diversity measure. The potential of the new CP3 optimization methodology is illustrated through its application to design a family of universal electric motors. The optimized platform plans provide helpful insights into the importance of accounting for the overlap between different product platforms, when quantifying the effective commonality in the product family.
Paper presented at the 6th International Work-Conference on Ambient Assisted Living.
Abstract: Due to the increasing demand for multi-camera setups and long-term monitoring in vision applications, real-time multi-view action recognition has gained great interest in recent years. In this paper, we propose a multiple kernel learning based fusion framework that employs a motion-based person detector for finding regions of interest and local descriptors with bag-of-words quantisation for feature representation. The experimental results on a multi-view action dataset suggest that the proposed framework significantly outperforms simple fusion techniques and state-of-the-art methods.
Development of a family of products that satisfies different sectors of the market introduces significant challenges to today’s manufacturing industries – from development time to aftermarket services. A product family with a common platform paradigm offers a powerful solution to these daunting challenges. The Comprehensive Product Platform Planning (CP3) framework formulates a flexible product family model that (i) seeks to eliminate traditional boundaries between modular and scalable families, (ii) allows the formation of sub-families of products, and (iii) yields the optimal depth and number of platforms. In this paper, the CP3 framework introduces a solution strategy that obviates common assumptions, namely (i) that the identification of platform/non-platform design variables and the determination of variable values are separate processes, and (ii) that the cost reduction of creating product platforms is independent of the total number of each product manufactured. A new Cost Decay Function (CDF) is developed to approximate the reduction in cost with increasing commonalities among products, for a specified capacity of production. The Mixed Integer Non-Linear Programming (MINLP) problem, presented by the CP3 model, is solved using a novel Platform Segregating Mapping Function (PSMF). The proposed CP3 framework is implemented on a family of universal electric motors.
The document discusses applications of machine learning for robot navigation and control. It describes how surrogate models can be used for predictive modeling in engineering applications like aircraft design. Dimension reduction techniques are used to reduce high-dimensional design parameters to a lower-dimensional space for faster surrogate model evaluation. For robot navigation, regression models on image manifolds are used for visual localization by mapping images to robot positions. Manifold learning is also applied to find low-dimensional representations of valid human hand poses from images to enable easier robot control.
This document summarizes Kanika Anand's master's thesis which examines global optimization of noisy computer simulators using surrogate models. The thesis compares two improvement functions - one proposed by Picheny et al. and one by Ranjan - for choosing new points to minimize a simulator output observed with noise. Gaussian process and Bayesian additive regression tree models are used as surrogates. Four test functions acting as simulators are optimized using either a one-shot design or genetic algorithm to find new points. Results show how well the surrogate minimum matches the true minimum and distance between the two minimizers under different settings.
Importance sampling has been widely used to improve the efficiency of deterministic computer simulations where the simulation output is uniquely determined, given a fixed input. To represent complex system behavior more realistically, however, stochastic computer models are gaining popularity. Unlike deterministic computer simulations, stochastic simulations produce different outputs even at the same input. This extra degree of stochasticity presents a challenge for reliability assessment in engineering system designs. Our study tackles this challenge by providing a computationally efficient method to estimate a system's reliability. Specifically, we derive the optimal importance sampling density and allocation procedure that minimize the variance of a reliability estimator. The application of our method to a computationally intensive, aeroelastic wind turbine simulator demonstrates the benefits of the proposed approaches.
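The idea of importance sampling for reliability - sample from a density shifted toward the failure region and reweight by the likelihood ratio - can be illustrated on a 1-D toy problem (the shift, densities, and function names here are my illustrative choices, not the paper's derived optimal ones):

```python
import math
import random

def failure_prob_is(g, threshold, n=100_000, shift=3.0, seed=0):
    """Estimate P(g(X) > threshold) for X ~ N(0,1) by importance sampling:
    draw X ~ N(shift, 1) and reweight each failure by p(x)/q(x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(shift, 1.0)
        if g(x) > threshold:
            # likelihood ratio of N(0,1) to N(shift,1) at x
            total += math.exp(-0.5 * x * x + 0.5 * (x - shift) ** 2)
    return total / n
```

Shifting the sampling density toward the failure region means roughly half the draws hit the rare event instead of ~0.1% of them, which is why the variance of the estimator drops so sharply.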
This document discusses methods for estimating the output gap and decomposing it into observable components. It provides a unified framework by formulating most output gap estimation methods as linear filters. This allows the output gap estimate to be expressed as a weighted average of observed macroeconomic data over time. The document demonstrates how to decompose an output gap estimate into the contributions made by different data series, like output, inflation, and unemployment. It also shows how to analyze how output gap estimates are revised as new data is incorporated using this linear filter framework. The framework provides insight into which data each method uses and how it weights them to estimate an unobserved output gap.
This document discusses methods for estimating the output gap and decomposing it into observable components. It provides a unified framework by representing most estimation methods as linear filters. This allows the output gap estimate to be expressed as a weighted average of observed macroeconomic data over time. The document demonstrates how to decompose an output gap estimate into the contributions made by different data series, like output, inflation, and unemployment. It also shows how to analyze how estimates are revised as new data is incorporated. Understanding estimates as linear filters provides insight into which data drives the estimate and how sensitive it is to data revisions. The document applies these concepts to specific estimation techniques, including univariate filters, multivariate filters, VAR models, and DSGE models.
The study on mining temporal patterns and related applications in dynamic soc... - Thanh Hieu
The document provides a curriculum vitae for Yi-Cheng Chen that includes basic information, education history, and research interests. It notes that Chen received a B.S. from Yuan Ze University in 2000, an M.S. from National Taiwan University of Science and Technology in 2002, and a Ph.D. from National Chiao Tung University in 2012 under the advisement of Professors Suh-Yin Lee and Wen-Chih Peng. Chen's Ph.D. dissertation focused on time interval-based sequential pattern mining. The CV outlines Chen's current research interests as temporal pattern mining, social network analysis, smart home applications, and cloud computing.
This document discusses the need for and challenges of 3D optical proximity correction (OPC) models. It explains that traditional 2D OPC models have limitations, particularly for post-tone development resists where resist profiles vary in height. A 3D model can better account for effects like vertical acid diffusion during processing. However, calibration of such models is difficult due to limitations in metrology data, which typically only provides critical dimension measurements and not full resist profile information.
This document discusses image analysis using wavelet transformation. It provides an overview of digital image processing and compares Fourier transforms, short-term Fourier transforms, and wavelet transforms. Wavelet transforms provide better time-frequency localization than Fourier transforms. The document demonstrates Haar wavelets and how they can be used to decompose an image into different frequency subbands. It discusses applications of wavelet transforms such as image compression, denoising, and feature extraction. The document includes MATLAB code for performing wavelet decomposition on an image.
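Although the document's code is in MATLAB, the single-level 2-D Haar decomposition it demonstrates is easy to sketch in Python (an averaging-based variant for clarity; the orthonormal Haar transform would scale pairs by 1/sqrt(2) instead of 1/2):

```python
import numpy as np

def haar_2d_level(img):
    """One level of the 2-D Haar transform: returns LL, LH, HL, HH subbands.

    img: 2-D array with even height and width.
    """
    a = img.astype(float)
    # Horizontal pass: averages and differences of adjacent pixel pairs.
    lo = (a[:, 0::2] + a[:, 1::2]) / 2.0
    hi = (a[:, 0::2] - a[:, 1::2]) / 2.0
    # Vertical pass on both outputs yields the four subbands.
    ll = (lo[0::2, :] + lo[1::2, :]) / 2.0   # coarse approximation
    lh = (lo[0::2, :] - lo[1::2, :]) / 2.0   # horizontal detail
    hl = (hi[0::2, :] + hi[1::2, :]) / 2.0   # vertical detail
    hh = (hi[0::2, :] - hi[1::2, :]) / 2.0   # diagonal detail
    return ll, lh, hl, hh
```

Recursing on the LL band gives the multi-level decomposition used for compression and denoising, where small detail coefficients can be thresholded away.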
A walk through the intersection between machine learning and mechanistic mode... - JuanPabloCarbajal3
Talk at EURECOM, France.
It overviews regression in several of its forms: regularized, constrained, and mixed. It builds the bridge between machine learning and dynamical models.
The document discusses modeling nonlinear digital integrated circuits (ICs) using system identification techniques. It explores using parametric models with different representations including local linear state-space models. Models are estimated from input-output port measurements and validated. Local linear state-space models provided the best results with good accuracy, a unique solution, and verified local stability, while also allowing efficient simulation. The models were successfully applied to simulate a mobile data link system.
This work was presented at 51st AIAA/SDM conference, Apr 14, 2010 in Orlando. The work presented in this paper was performed in collaboration with Prof. Achille Messac and Dr. Ritesh Khire.
This document discusses probabilistic error bounds for order reduction of smooth nonlinear models. It begins with motivation for using reduced order models (ROM) in computationally intensive applications and the need for error metrics. It then provides background on Dixon's theory for probabilistic error bounds, which has mostly been used for linear models. The document outlines snapshot and gradient-based reduction algorithms to reduce the response and parameter interfaces of a model. It defines different types of errors that can occur from reducing these interfaces and discusses propagating the errors across interfaces using Dixon's theory. Numerical tests and results are briefly mentioned along with conclusions.
Propagation of Error Bounds Across reduction interfaces - Mohammad
This document summarizes the motivation, background, algorithms, and theory behind developing probabilistic error bounds for order reduction of smooth nonlinear models. It discusses how reduced order models (ROM) play an important role in computationally intensive applications and the need to provide error metrics with ROM predictions. It then describes snapshot and gradient-based reduction algorithms used at the response and parameter interfaces, respectively. It introduces different types of errors that can occur from reducing the response space only, parameter space only, or both spaces simultaneously, and how Dixon's theory can be used to estimate these relative errors.
Special Plenary Lecture at the International Conference on VIBRATION ENGINEERING AND TECHNOLOGY OF MACHINERY (VETOMAC), Lisbon, Portugal, September 10 - 13, 2018
http://www.conf.pt/index.php/v-speakers
Propagation of uncertainties in complex engineering dynamical systems is receiving increasing attention. When uncertainties are taken into account, the equations of motion of discretised dynamical systems can be expressed by coupled ordinary differential equations with stochastic coefficients. The computational cost for the solution of such a system mainly depends on the number of degrees of freedom and the number of random variables. Among various numerical methods developed for such systems, the polynomial chaos based Galerkin projection approach shows significant promise because it is more accurate compared to the classical perturbation based methods and computationally more efficient compared to the Monte Carlo simulation based methods. However, the computational cost increases significantly with the number of random variables and the results tend to become less accurate over longer time horizons. In this talk, novel approaches will be discussed to address these issues. Reduced-order Galerkin projection schemes in the frequency domain will be discussed to address the problem of a large number of random variables. Practical examples will be given to illustrate the application of the proposed Galerkin projection techniques.
Applying Model Checking Approach with Floating Point Arithmetic for Verificat... - Sergey Staroletov
The document discusses applying model checking to verify a hybrid model of an air collision avoidance maneuver that involves floating point calculations. It proposes representing floating point numbers as integers in Promela to enable modeling the maneuver's dynamics and implementing trigonometric and other functions. The goal is to model check the system's safety property that the distance between aircraft remains above a safe threshold during the maneuver.
The Comprehensive Product Platform Planning (CP3) framework presents a flexible mathematical model of the platform planning process, which allows (i) the formation of sub-families of products, and (ii) the simultaneous identification and quantification of platform/scaling design variables. The CP3 model is founded on a generalized commonality matrix that represents the product platform plan, and yields a mixed binary-integer non-linear programming problem. In this paper, we develop a methodology to reduce the high dimensional binary integer problem to a more tractable integer problem, where the commonality matrix is represented by a set of integer variables. Subsequently, we determine the feasible set of values for the integer variables in the case of families with 3-7 kinds of products. The cardinality of the feasible set is found to be orders of magnitude smaller than the total number of unique combinations of the commonality variables. In addition, we also present the development of a generalized approach to Mixed-Discrete Non-Linear Optimization (MDNLO) that can be implemented through standard non-gradient based optimization algorithms. This MDNLO technique is expected to provide a robust and computationally inexpensive optimization framework for the reduced CP3 model. The generalized approach to MDNLO uses continuous optimization as the primary search strategy, however, evaluates the system model only at the feasible locations in the discrete variable space.
Similar to Steffen Rendle, Research Scientist, Google at MLconf SF
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments... - MLconf
Understanding Human Impact: Social and Equity Assessments for AI Technologies
Social and Equity Impact Assessments have broad applications but can be a useful tool to explore and mitigate for Machine Learning fairness issues and can be applied to product specific questions as a way to generate insights and learnings about users, as well as impacts on society broadly as a result of the deployment of new and emerging technologies.
In this presentation, my goal is to advocate for and highlight the need to consult community and external stakeholder engagement to develop a new knowledge base and understanding of the human and social consequences of algorithmic decision making and to introduce principles, methods and process for these types of impact assessments.
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding - MLconf
The Brain’s Guide to Dealing with Context in Language Understanding
Like the visual cortex, the regions of the brain involved in understanding language represent information hierarchically. But whereas the visual cortex organizes things into a spatial hierarchy, the language regions encode information into a hierarchy of timescale. This organization is key to our uniquely human ability to integrate semantic information across narratives. More and more, deep learning-based approaches to natural language understanding embrace models that incorporate contextual information at varying timescales. This has not only led to state-of-the art performance on many difficult natural language tasks, but also to breakthroughs in our understanding of brain activity.
In this talk, we will discuss the important connection between language understanding and context at different timescales. We will explore how different deep learning architectures capture timescales in language and how closely their encodings mimic the brain. Along the way, we will uncover some surprising discoveries about what depth does and doesn’t buy you in deep recurrent neural networks. And we’ll describe a new, more flexible way to think about these architectures and ease design space exploration. Finally, we’ll discuss some of the exciting applications made possible by these breakthroughs.
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re... - MLconf
Applying Computer Vision to Reduce Contamination in the Recycling Stream
With China’s recent refusal of most foreign recyclables, North American waste haulers are scrambling to figure out how to make on-shore recycling cost-effective in order to continue providing recycling services. Recyclables that were once being shipped to China for manual sorting are now primarily being redirected to landfills or incinerators. Without a solution, a nearly $5 billion annual recycling market could come to a halt.
Purity in the recycling stream is key to this effort as contaminants in the stream can increase the cost of operations, damage equipment and reduce the ability to create pure commodities suitable for creating recycled goods. This market disruption as a result of China’s new regulations, however, provides us the chance to re-examine and improve our current disposal & collection habits with modern monitoring & artificial intelligence technology.
Using images from our in-dumpster cameras, Compology has developed an ML-based process that helps identify, measure and alert for contaminants in recycling containers before they are picked-up, helping keep the recycling stream clean.
Our convolutional neural network flags potential instances of contamination inside a dumpster, enabling garbage haulers to know which containers have the wrong type of material inside. This allows them to provide targeted, timely education, and when appropriate, assess fines, to improve recycling compliance at the businesses and residences they serve, helping keep recycling services financially viable.
In this presentation, we will walk through our ML-based contamination measurement and scoring process, showing how Waste Management, a national waste hauler, has achieved a 57% reduction in contamination across nearly 2,000 containers over six months. This progress represents significant strides toward financially viable recycling services.
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushMLconf
Quantum Computing: a Treasure Hunt, not a Gold Rush
Quantum computers promise a significant step up in computational power over conventional computers, but also suffer a number of counterintuitive limitations --- both in their computational model and in leading lab implementations. In this talk, we review how quantum computers compete with conventional computers and how conventional computers try to hold their ground. Then we outline what stands in the way of successful quantum ML applications.
Josh Wills - Data Labeling as Religious ExperienceMLconf
The document discusses obtaining labeled data and introduces weak supervision as an alternative to full manual labeling. It notes that weak supervision uses labeling functions to generate noisy training labels at scale, which can then be combined using a generative model to infer true labels. The document also briefly mentions Snorkel, a system for creating labeling functions, and Snuba, its successor which focuses on scaling to very large datasets.
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...MLconf
Project GaitNet: Ushering in the ImageNet moment for human Gait kinematics
The emergence of the upright human bipedal gait can be traced back 4 to 2.8 million years, to the now-extinct hominin Australopithecus afarensis. Fine-grained analysis of gait using the modern MEMS sensors found on all smartphones not only reveals a lot about a person's orthopedic and neuromuscular health status, but also contains enough idiosyncratic clues to be harnessed as a passive biometric. While the machine learning community has made many siloed attempts to model bipedal gait sensor data, these were done with small datasets, often collected in restricted academic environs. In this talk, we will introduce the ImageNet moment for human gait analysis by presenting 'Project GaitNet', the largest planet-scale motion-sensor-based human bipedal gait dataset ever curated. We'll also present the associated state-of-the-art results in classifying humans using novel deep neural architectures, and the related success stories we have enjoyed in transfer learning to disparate domains of human kinematics analysis.
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...MLconf
Machine Learning Methods in Detecting Alzheimer’s Disease from Speech and Language
Alzheimer's disease affects millions of people worldwide, and it is important to predict the disease as early and as accurately as possible. In this talk, I will discuss the development of novel ML models that help distinguish healthy people from those who develop Alzheimer's, using short samples of human speech. As input to the model, features of different modalities are extracted from speech audio samples and transcriptions: (1) syntactic measures, such as production rules extracted from syntactic parse trees; (2) lexical measures, such as features of lexical richness and complexity and lexical norms; and (3) acoustic measures, such as standard Mel-frequency cepstral coefficients. I will present an ML model that detects cognitive impairment by reaching agreement among modalities. The resulting model is able to achieve state-of-the-art performance in both supervised and semi-supervised settings, using manual transcripts of human speech. Additionally, I will discuss potential limitations of any fully automated speech-based Alzheimer's disease detection model, focusing mostly on the impact of a not-so-accurate automatic speech recognition (ASR) system on classification performance. To illustrate this, I will present experiments with controlled amounts of artificially generated ASR errors and explain why deletion errors affect Alzheimer's detection performance the most, due to their impact on features of syntactic and lexical complexity.
Meghana Ravikumar - Optimized Image Classification on the CheapMLconf
Optimized Image Classification on the Cheap
In this talk, we anchor on building an image classifier trained on the Stanford Cars dataset to evaluate two approaches to transfer learning, fine-tuning and feature extraction, and the impact of hyperparameter optimization on these techniques. Once we determine the most performant transfer learning technique for Stanford Cars, we will double the size of the dataset through image augmentation to boost the classifier's performance. We will use Bayesian optimization to learn the hyperparameters associated with image transformations, using the downstream image classifier's performance as the guide. In conjunction with model performance, we will also focus on the features of these augmented images and the downstream implications for our image classifier.
To both maximize model performance on a budget and explore the impact of optimization on these methods, we apply a particularly efficient implementation of Bayesian optimization to each of these architectures in this comparison. Our goal is to draw on a rigorous set of experimental results that can help us answer the question: how can resource-constrained teams make trade-offs between efficiency and effectiveness using pre-trained models?
Noam Finkelstein - The Importance of Modeling Data CollectionMLconf
The Importance of Modeling Data Collection
Data sets used in machine learning are often collected in a systematically biased way - certain data points are more likely to be collected than others. We call this "observation bias". For example, in health care, we are more likely to see lab tests when the patient is feeling unwell than otherwise. Failing to account for observation bias can, of course, result in poor predictions on new data. By contrast, properly accounting for this bias allows us to make better use of the data we do have.
In this presentation, we discuss practical and theoretical approaches to dealing with observation bias. When the nature of the bias is known, there are simple adjustments we can make to nonparametric function estimation techniques, such as Gaussian Process models. We also discuss the scenario where the data collection model is unknown. In this case, there are steps we can take to estimate it from observed data. Finally, we demonstrate that having a small subset of data points that are known to be collected at random - that is, in an unbiased way - can vastly improve our ability to account for observation bias in the rest of the data set.
My hope is that attendees of this presentation will be aware of the perils of observation bias in their own work, and be equipped with tools to address it.
The Uncanny Valley of ML
Every so often, the conundrum of the Uncanny Valley re-emerges as advanced technologies evolve from clearly experimental products to refined accepted technologies. We have seen its effects in robotics, computer graphics, and page load times. The debate of how to handle the new technology detracts from its benefits. When machine learning is added to human decision systems a similar effect can be measured in increased response time and decreased accuracy. These systems include radiology, judicial assignments, bus schedules, housing prices, power grids and a growing variety of applications. Unfortunately, the Uncanny Valley of ML can be hard to detect in these systems and can lead to degraded system performance when ML is introduced, at great expense. Here, we'll introduce key design principles for introducing ML into human decision systems to navigate around the Uncanny Valley and avoid its pitfalls.
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksMLconf
Deep Learning Architectures for Semantic Relation Detection Tasks
Recognizing and distinguishing specific semantic relations from other types of semantic relations is an essential part of language understanding systems. Identifying expressions with similar and contrasting meanings is valuable for NLP systems that go beyond recognizing semantic relatedness and need to identify specific semantic relations. In this talk, I will first present novel techniques for creating the labelled datasets required for training deep learning models to classify semantic relations between phrases. I will then present neural network architectures that integrate morphological features into combined path-based and distributional relation detection algorithms, and demonstrate that this model outperforms state-of-the-art models in distinguishing semantic relations and is capable of efficiently handling multi-word expressions.
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...MLconf
This document discusses Netflix's global deep learning recommender system model. It describes how Netflix recommends content to over 150 million members across 190 countries using personalized recommendations. The system utilizes collaborative filtering techniques like soft clustering models to group users with similar tastes and generate weighted popularity votes. It also leverages topic models to model users' tastes as distributions over topics and content. The challenges of scaling these models globally to account for factors like country-specific catalogs and trends over time are discussed. The solution presented is to incrementally train the models by first censoring unavailable content and adding contextual variables, then periodically training warm start models with new embeddings and parameters to efficiently update the models at scale.
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldMLconf
The Voice: New Challenges in a Zero UI World
The adoption of voice-enabled devices has seen explosive growth in the last few years, and music consumption is among the most popular use cases. Music personalization and recommendation play a major role at Pandora in providing a daily delightful listening experience for millions of users. In turn, providing the same perfectly tailored listening experience through these novel voice interfaces brings new interesting challenges and exciting opportunities. In this talk we will describe how we apply personalization and recommendation techniques in three common voice scenarios, defined in terms of request types: known-item, thematic, and broad open-ended. We will describe how we use deep learning slot-filling techniques and query classification to interpret the user intent and identify the main concepts in the query.
We will also present the differences and challenges regarding evaluation of voice powered recommendation systems. Since pure voice interfaces do not contain visual UI elements, relevance labels need to be inferred through implicit actions such as play time, query reformulations or other types of session level information. Another difference is that while the typical recommendation task corresponds to recommending a ranked list of items, a voice play request translates into a single item play action. Thus, some considerations about closed feedback loops need to be made. In summary, improving the quality of voice interactions in music services is a relatively new challenge and many exciting opportunities for breakthroughs still remain. There are many new aspects of recommendation system interfaces to address to bring a delightful and effortless experience for voice users. We will share a few open challenges to solve for the future.
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...MLconf
The document discusses challenges related to building AI systems at scale using large, multi-modal datasets. It presents an approach for efficient classification of datasets with an extremely large number of classes. The key challenges are handling data scale, selecting relevant information, and ensuring safety. An objective function is designed for training tree-based classifiers that favors balanced, pure splits, leading to efficient trees with logarithmic depth and small error. This approach allows online training and can be used for classification or density estimation problems while learning representations.
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...MLconf
This document discusses using AI to detect illicit online sales of opioids through social media analysis. It provides background on laws targeting online drug sales without prescriptions. While policy guidelines aim to regulate this, internet effects remain inadequately addressed. The document then presents a pipeline using natural language processing and machine learning to analyze over 1 million tweets, isolate topics related to illicit online pharmacies, and identify characteristics of relevant tweets to build models that can automatically detect emerging bad actors selling drugs online. The goal is to analyze social media content quickly to help address this important problem.
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...MLconf
This document discusses using a Bayesian neural network to classify light curves from the Transiting Exoplanet Survey Satellite (TESS) mission to identify exoplanet candidates. It describes challenges in classifying large numbers of light curves, and how a Bayesian neural network approach provides probabilistic predictions and confidence levels to help identify promising exoplanet candidates while avoiding many false positives seen in other methods. The Bayesian network achieved 91% accuracy and 83% precision in tests on simulated TESS data.
Neel Sundaresan - Teaching a machine to codeMLconf
1. Recommend using the 'AdamOptimizer' class to optimize the loss since it is commonly used for training neural networks.
2. Suggest mapping the input data to floating point tensors using 'tf.cast()' for compatibility with TensorFlow operations.
3. Advise normalizing the input data to speed up training by using 'tf.keras.utils.normalize()'
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareMLconf
Soumith Chintala of Facebook AI presented on PyTorch, an open source machine learning framework. PyTorch allows users to define neural networks as Python programs and supports automatic differentiation to calculate gradients. Key features include GPU acceleration for tensors, distributed training across hundreds of GPUs, and TorchScript for optimizing Python models and deploying to C++. PyTorch aims to bridge the gap between research prototyping and production use through tools like TorchScript that transition eager Python code to a static graph mode optimized for deployment.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, as a test automation solution, with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features provide convenience and capability at the expense of security. This best practices guide outlines steps users can take to better protect their personal devices and information.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Presentation of the OECD Artificial Intelligence Review of Germany
Steffen Rendle, Research Scientist, Google at MLconf SF
1. Factorization Models | Polynomial Regression | Factorization Machines | Applications | Summary

Factorization Machines

Steffen Rendle
Current affiliation: Google Inc.
Work was done at the University of Konstanz

MLConf, November 14, 2014

Steffen Rendle 1 / 53
3. Matrix Factorization

Example data (user-movie rating matrix):

         TI   NH   SW   ST   ...
  A       5    3    1    ?   ...
  B       ?    ?    4    5   ...
  C       1    ?    5    ?   ...
  ...

Matrix Factorization:

$\hat{Y} := W H^t$, with $W \in \mathbb{R}^{|U| \times k}$, $H \in \mathbb{R}^{|I| \times k}$

$\hat{y}(u, i) = \hat{y}_{u,i} = \sum_{f=1}^{k} w_{u,f}\, h_{i,f} = \langle w_u, h_i \rangle$

$k$ is the rank of the reconstruction.

Steffen Rendle 3 / 53
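The two formulas above (full reconstruction and single-entry prediction) can be sketched in a few lines of numpy; the sizes below are illustrative, not from the slides:

```python
import numpy as np

# Illustrative sizes: |U| = 3 users, |I| = 4 items, rank k = 2.
rng = np.random.default_rng(0)
n_users, n_items, k = 3, 4, 2
W = rng.normal(size=(n_users, k))  # user factors, W in R^{|U| x k}
H = rng.normal(size=(n_items, k))  # item factors, H in R^{|I| x k}

# Full reconstruction: Y_hat = W H^t
Y_hat = W @ H.T

def y_hat(u, i):
    # Single entry: <w_u, h_i> = sum_f w_{u,f} * h_{i,f}
    return W[u] @ H[i]
```

The single-entry inner product and the (u, i) entry of the matrix product agree by construction.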
5. Matrix Factorization Extensions

Example data: the same user-movie rating matrix as before, optionally with a time dimension (the rating matrix evolving over time).

Examples for models:

$\hat{y}^{\text{MF}}(u, i) := \sum_{f=1}^{k} v_{u,f}\, v_{i,f} = \langle v_u, v_i \rangle$

$\hat{y}^{\text{SVD++}}(u, i) := \Big\langle v_u + \sum_{j \in N(u)} v_j,\; v_i \Big\rangle$

$\hat{y}^{\text{Fact-KNN}}(u, i) := \frac{1}{|R(u)|} \sum_{j \in R(u)} r_{u,j}\, \langle v_i, v_j \rangle$

$\hat{y}^{\text{timeSVD}}(u, i, t) := \langle v_u + v_{u,t},\; v_i \rangle$

$\hat{y}^{\text{timeTF}}(u, i, t) := \sum_{f=1}^{k} v_{u,f}\, v_{i,f}\, v_{t,f}$

...

Steffen Rendle 4 / 53
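A hedged numpy sketch of two of these extension formulas. For brevity it reuses one shared set of item vectors for both the explicit and implicit parts, whereas the published models typically keep separate factor sets; sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items, k = 3, 4, 2
V_user = rng.normal(size=(n_users, k))  # v_u
V_item = rng.normal(size=(n_items, k))  # v_i (also reused for v_j here)

def y_svdpp(u, i, N_u):
    # SVD++: <v_u + sum_{j in N(u)} v_j, v_i>
    return (V_user[u] + V_item[list(N_u)].sum(axis=0)) @ V_item[i]

def y_fact_knn(u, i, R_u, r):
    # Fact-KNN: (1 / |R(u)|) * sum_{j in R(u)} r_{u,j} * <v_i, v_j>
    return sum(r[u, j] * (V_item[i] @ V_item[j]) for j in R_u) / len(R_u)
```

Here `N_u` is the set of items with implicit feedback for user u, and `r` maps (user, item) pairs to known ratings.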
8. Tensor Factorization

Example data: triples of (subject, predicate, object).

Examples for models:

$\hat{y}^{\text{PARAFAC}}(s, p, o) := \sum_{f=1}^{k} v_{s,f}\, v_{p,f}\, v_{o,f}$

$\hat{y}^{\text{PITF}}(s, p, o) := \langle v_s, v_p \rangle + \langle v_s, v_o \rangle + \langle v_p, v_o \rangle$

...

Steffen Rendle 5 / 53

[illustration from Drumond et al. 2012]
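The two tensor models can be sketched directly from their formulas. The sizes are illustrative, and one shared factor set per mode is an assumed simplification (PITF as published uses pairwise-specific factor sets):

```python
import numpy as np

rng = np.random.default_rng(2)
k = 3
Vs = rng.normal(size=(5, k))  # subject factors
Vp = rng.normal(size=(4, k))  # predicate factors
Vo = rng.normal(size=(6, k))  # object factors

def y_parafac(s, p, o):
    # PARAFAC: sum_f v_{s,f} * v_{p,f} * v_{o,f}
    return float(np.sum(Vs[s] * Vp[p] * Vo[o]))

def y_pitf(s, p, o):
    # PITF: <v_s, v_p> + <v_s, v_o> + <v_p, v_o>
    return float(Vs[s] @ Vp[p] + Vs[s] @ Vo[o] + Vp[p] @ Vo[o])
```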
9. Sequential Factorization Models

Example data: per-user sequences of baskets $B^{t-3}, B^{t-2}, B^{t-1}, B^{t}$ of items (a, b, c, ...), where the next basket to predict is marked "?".

Examples for models:

$\hat{y}^{\text{FMC}}(u, i, t) := \sum_{l \in B^{t-1}} \langle v_i, v_l \rangle$

$\hat{y}^{\text{FPMC}}(u, i, t) := \langle v_u, v_i \rangle + \sum_{l \in B^{t-1}} \langle v_i, v_l \rangle$

...

Steffen Rendle 6 / 53
10. Factorization Models: Discussion

Advantages:
- Can estimate interactions between two (or more) variables even if the cross is not observed.
- E.g. user × movie, current product × next product, user × query × url, ...

Downsides:
- Factorization models are usually built specifically for each problem.
- Learning algorithms and implementations are tailored to individual models.

Steffen Rendle 7 / 53
14. Data and Variable Representation

Many standard ML approaches work with real-valued feature vectors as input. This makes it possible to represent, e.g.:
- any number of variables
- categorical domains, using dummy indicator variables
- numerical domains
- set-categorical domains, using dummy indicator variables

Using this representation allows applying a wide variety of standard models (e.g. linear regression, SVM, etc.).

Steffen Rendle 9 / 53
15. Linear Regression

Let $x \in \mathbb{R}^p$ be an input vector with $p$ predictor variables.

Model equation:

$\hat{y}(x) := w_0 + \sum_{i=1}^{p} w_i x_i$

Model parameters: $w_0 \in \mathbb{R}$, $w \in \mathbb{R}^p$, i.e. $O(p)$ model parameters.

Steffen Rendle 10 / 53
16. Polynomial Regression

Let $x \in \mathbb{R}^p$ be an input vector with $p$ predictor variables.

Model equation (degree 2):

$\hat{y}(x) := w_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j \geq i}^{p} w_{i,j}\, x_i x_j$

Model parameters: $w_0 \in \mathbb{R}$, $w \in \mathbb{R}^p$, $W \in \mathbb{R}^{p \times p}$, i.e. $O(p^2)$ model parameters.

Steffen Rendle 11 / 53
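A minimal sketch of the degree-2 model equation above, with random placeholder weights (only the upper triangle of W, j >= i, is used, matching the double sum):

```python
import numpy as np

rng = np.random.default_rng(3)
p = 4
w0 = rng.normal()
w = rng.normal(size=p)        # O(p) linear weights
W2 = rng.normal(size=(p, p))  # O(p^2) pairwise weights; only j >= i is used

def y_poly(x):
    # w0 + sum_i w_i x_i + sum_i sum_{j >= i} w_{i,j} x_i x_j
    pairwise = sum(W2[i, j] * x[i] * x[j]
                   for i in range(p) for j in range(i, p))
    return w0 + w @ x + pairwise
```

With one-hot encoded categorical data, most products x_i x_j are zero, which is what makes the O(p^2) pairwise weights hard to estimate from sparse data.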
18. Factorization Models Polynomial Regression Factorization Machines Applications Summary
Representation: Matrix/ Tensor vs. Feature Vectors
Matrix/ Tensor data can be represented by feature vectors:
Movie
TI NH SW ST ...
5 3 1 ? ...
? ? 4 5 ...
1 ? 5 ? ...
... ... ... ... ...
A
B
C
...
User
Steen Rendle 13 / 53
The same matrix, written as a table of observed cases:

  #  User     Movie         Rating
  1  Alice    Titanic       5
  2  Alice    Notting Hill  3
  3  Alice    Star Wars     1
  4  Bob      Star Wars     4
  5  Bob      Star Trek     5
  6  Charlie  Titanic       1
  7  Charlie  Star Wars     5
  …  …        …             …
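The step from the table to one-hot feature vectors can be sketched in a few lines. This is a minimal sketch; the variable names and the index layout (user block first, then movie block) are our own choices, not from the talk:

```python
# Encode each (user, movie, rating) row as a sparse one-hot feature vector x
# with target y, as on the slide.
rows = [
    ("Alice", "Titanic", 5), ("Alice", "Notting Hill", 3),
    ("Alice", "Star Wars", 1), ("Bob", "Star Wars", 4),
    ("Bob", "Star Trek", 5), ("Charlie", "Titanic", 1),
    ("Charlie", "Star Wars", 5),
]
users = sorted({u for u, _, _ in rows})
movies = sorted({m for _, m, _ in rows})

def encode(user, movie):
    # one-hot block over users, followed by a one-hot block over movies
    x = [0] * (len(users) + len(movies))
    x[users.index(user)] = 1
    x[len(users) + movies.index(movie)] = 1
    return x

X = [encode(u, m) for u, m, _ in rows]
y = [r for _, _, r in rows]
```

Every resulting vector has exactly two non-zero entries, which is the sparsity the later complexity results exploit.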
Each table row is then encoded as a sparse feature vector:

[Figure: feature vectors x(1)…x(7) with targets y(1)…y(7); each x has a one-hot block over users (A, B, C, …) and a one-hot block over movies (TI, NH, SW, ST, …). E.g. x(1) (Alice, Titanic) has ones at positions A and TI, with target y(1) = 5.]
Application to Sparse Feature Vectors

[Figure: the sparse feature vectors x(1)…x(7) and targets y(1)…y(7) from the previous slide.]

Applying regression models to this data leads to:
I Linear regression: $\hat{y}(x) = w_0 + w_u + w_i$
I Polynomial regression: $\hat{y}(x) = w_0 + w_u + w_i + w_{u,i}$
I Matrix factorization: $\hat{y}(u, i) = \langle w_u, h_i \rangle$
Application to Sparse Feature Vectors

For the data of the example:
I Linear regression has no user-item interaction.
  ⇒ Linear regression is not expressive enough.
I Polynomial regression includes pairwise interactions but cannot estimate them from the data:
I $n \ll p^2$: the number of cases is much smaller than the number of model parameters.
I The maximum-likelihood estimator for a pairwise effect is
  $w_{u,i} = \begin{cases} y - w_0 - w_u - w_i, & \text{if } (u, i, y) \in S \\ \text{not defined}, & \text{else} \end{cases}$
  ⇒ Polynomial regression cannot generalize to any unobserved pairwise effect.
Factorization Machine (FM)

I Let $x \in \mathbb{R}^p$ be an input vector with $p$ predictor variables.
I Model equation (degree 2):
  $\hat{y}(x) := w_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j=i+1}^{p} \langle v_i, v_j \rangle\, x_i x_j$
I Model parameters: $w_0 \in \mathbb{R}$, $w \in \mathbb{R}^p$, $V \in \mathbb{R}^{p \times k}$
[Rendle 2010, Rendle 2012]
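The degree-2 model equation can be transcribed directly. A sketch in NumPy (the function name is ours) that computes the pairwise term with the naive double loop:

```python
import numpy as np

def fm_predict_naive(x, w0, w, V):
    """Degree-2 FM model equation, computed term by term in O(p^2 k).
    V has shape (p, k); row V[i] is the factor vector v_i."""
    p = len(x)
    y_hat = w0 + float(w @ x)
    for i in range(p):
        for j in range(i + 1, p):            # pairwise interactions, j > i
            y_hat += float(V[i] @ V[j]) * x[i] * x[j]
    return y_hat
```

For $x = (1, 0, 1)$ only the (0, 2) interaction survives, so the prediction is the linear part plus $\langle v_0, v_2 \rangle$.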
Compared to polynomial regression, the FM replaces the free pairwise weight $w_{i,j}$ by the factorized weight $\langle v_i, v_j \rangle$:
I Polynomial regression: $\hat{y}(x) := w_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j=i+1}^{p} w_{i,j}\, x_i x_j$, with $W \in \mathbb{R}^{p \times p}$.
I FM: $\hat{y}(x) := w_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j=i+1}^{p} \langle v_i, v_j \rangle\, x_i x_j$, with $V \in \mathbb{R}^{p \times k}$.
[Rendle 2010, Rendle 2012]
Factorization Machine (FM)

I Let $x \in \mathbb{R}^p$ be an input vector with $p$ predictor variables.
I Model equation (degree 3):
  $\hat{y}(x) := w_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j=i+1}^{p} \langle v_i, v_j \rangle\, x_i x_j + \sum_{i=1}^{p} \sum_{j=i+1}^{p} \sum_{l=j+1}^{p} \sum_{f=1}^{k} v^{(3)}_{i,f}\, v^{(3)}_{j,f}\, v^{(3)}_{l,f}\, x_i x_j x_l$
I Model parameters: $w_0 \in \mathbb{R}$, $w \in \mathbb{R}^p$, $V \in \mathbb{R}^{p \times k}$, $V^{(3)} \in \mathbb{R}^{p \times k}$
[Rendle 2010, Rendle 2012]
Factorization Machines: Discussion
I FMs work with real-valued input.
I FMs include variable interactions like polynomial regression.
I Model parameters for interactions are factorized.
I The number of model parameters is $O(k\,p)$ (instead of $O(p^2)$ for polynomial regression).
Computation Complexity

Factorization Machine model equation:
  $\hat{y}(x) := w_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j=i+1}^{p} \langle v_i, v_j \rangle\, x_i x_j$
I Trivial computation: $O(p^2\, k)$
I Efficient computation can be done in $O(p\, k)$.
I Making use of the many zeros in $x$, even in $O(N_z(x)\, k)$, where $N_z(x)$ is the number of non-zero elements in the vector $x$.
Efficient Computation

The model equation of an FM can be computed in $O(p\, k)$.

Proof:
  $\hat{y}(x) := w_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j=i+1}^{p} \langle v_i, v_j \rangle\, x_i x_j$
  $= w_0 + \sum_{i=1}^{p} w_i x_i + \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{p} x_i v_{i,f} \right)^2 - \sum_{i=1}^{p} (x_i v_{i,f})^2 \right]$

I In the sums over $i$, only the non-zero elements $x_i$ have to be summed up ⇒ $O(N_z(x)\, k)$.
I (The complexity of polynomial regression is $O(N_z(x)^2)$.)
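The identity in the proof evaluates the pairwise term in one pass over the features. A sketch in NumPy (names are ours); the usage below checks it against the naive double loop:

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """O(p k) FM prediction via the identity
    sum_{i<j} <v_i,v_j> x_i x_j
      = 1/2 * sum_f [ (sum_i v_{i,f} x_i)^2 - sum_i (v_{i,f} x_i)^2 ]."""
    s = V.T @ x                      # s[f] = sum_i v_{i,f} x_i
    s_sq = (V ** 2).T @ (x ** 2)     # sum_i (v_{i,f} x_i)^2 per factor f
    return w0 + float(w @ x) + 0.5 * float(np.sum(s ** 2 - s_sq))
```

Restricting the two inner sums to the non-zero entries of `x` (e.g. with a sparse vector type) gives the $O(N_z(x)\,k)$ cost stated on the slide.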
Multilinearity

FMs are multilinear:
  $\forall\, \theta \in \Theta = \{w_0, w_1, \ldots, w_p, v_{1,1}, \ldots, v_{p,k}\}: \quad \hat{y}(x; \theta) = g_{(\theta)}(x) + \theta\, h_{(\theta)}(x)$
where $g_{(\theta)}$ and $h_{(\theta)}$ do not depend on the value of $\theta$.

E.g. for second-order effects ($\theta = v_{l,f}$):
  $\hat{y}(x; v_{l,f}) := \underbrace{w_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j=i+1}^{p} \sum_{\substack{f'=1 \\ (f' \neq f)\, \lor\, (l \notin \{i,j\})}}^{k} v_{i,f'}\, v_{j,f'}\, x_i x_j}_{g_{(v_{l,f})}(x)} + v_{l,f} \underbrace{x_l \sum_{i=1,\, i \neq l}^{p} v_{i,f}\, x_i}_{h_{(v_{l,f})}(x)}$
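Multilinearity can be checked numerically: holding everything else fixed, $\hat{y}$ is an affine function of any single parameter, so equal steps in the parameter give equal steps in the prediction. A sketch (helper names and the random data are ours):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """O(p k) degree-2 FM prediction."""
    s = V.T @ x
    return w0 + float(w @ x) + 0.5 * float(np.sum(s ** 2 - (V ** 2).T @ (x ** 2)))

def y_at(theta, x, w0, w, V, l=0, f=0):
    """Prediction as a function of the single parameter theta = V[l, f]."""
    V2 = V.copy()
    V2[l, f] = theta
    return fm_predict(x, w0, w, V2)

rng = np.random.default_rng(1)
x, w, V = rng.normal(size=4), rng.normal(size=4), rng.normal(size=(4, 2))
a, b, c = (y_at(t, x, 0.0, w, V) for t in (0.0, 1.0, 2.0))
# affine in theta: the two unit steps change y_hat by the same amount,
# namely the slope h_(theta)(x)
assert abs((b - a) - (c - b)) < 1e-9
```

The diagonal-free pairwise sum is what makes this hold: no $v_{l,f}^2$ term survives the subtraction in the efficient form.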
Learning

Using these properties, learning algorithms can be developed:
I L2-regularized regression and classification:
  I Stochastic gradient descent (SGD) [Rendle, 2010]
  I Alternating least squares / coordinate descent [Rendle et al., 2011; Rendle, 2012]
  I Markov chain Monte Carlo (for Bayesian FMs) [Freudenthaler et al., 2011; Rendle, 2012]
I L2-regularized ranking:
  I Stochastic gradient descent [Rendle, 2010]

All the proposed learning algorithms have a runtime of $O(k\, N_z(X)\, i)$, where $i$ is the number of iterations and $N_z(X)$ the number of non-zero elements in the design matrix $X$.
Stochastic Gradient Descent (SGD)

I For each training case $(x, y) \in S$, SGD updates each FM model parameter $\theta$ using:
  $\theta' = \theta - \eta \left( (\hat{y}(x) - y)\, h_{(\theta)}(x) + \lambda_{(\theta)}\, \theta \right)$
I $\eta$ is the learning rate / step size.
I $\lambda_{(\theta)}$ is the regularization value of the parameter $\theta$.
I SGD can easily be applied to other loss functions.
[Rendle, 2010]
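The update above can be written out per parameter group once $h_{(\theta)}(x)$ is known: it is $1$ for $w_0$, $x_i$ for $w_i$, and $x_i(\sum_j v_{j,f} x_j - v_{i,f} x_i)$ for $v_{i,f}$. A sketch for squared loss (the step size, regularization value and function names are our choices, and $w_0$ is left unregularized here):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    s = V.T @ x
    return w0 + float(w @ x) + 0.5 * float(np.sum(s ** 2 - (V ** 2).T @ (x ** 2)))

def sgd_epoch(data, w0, w, V, eta=0.05, lam=0.01):
    """One pass of theta' = theta - eta * ((y_hat - y) * h_theta(x) + lam * theta)."""
    for x, y in data:
        err = fm_predict(x, w0, w, V) - y
        s = V.T @ x                                       # s[f] = sum_j v_{j,f} x_j
        w0 -= eta * err                                   # h_{w0}(x) = 1
        w -= eta * (err * x + lam * w)                    # h_{w_i}(x) = x_i
        grad_V = np.outer(x, s) - (x ** 2)[:, None] * V   # h_{v_{i,f}}(x)
        V -= eta * (err * grad_V + lam * V)
    return w0, w, V
```

Only the non-zero coordinates of `x` contribute, which yields the $O(k\,N_z(X))$ per-iteration cost stated on the Learning slide.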
Coordinate Descent (CD)

I CD updates each FM model parameter $\theta$ using:
  $\theta' = \frac{\sum_{(x,y) \in S} \left( y - g_{(\theta)}(x) \right) h_{(\theta)}(x)}{\sum_{(x,y) \in S} h_{(\theta)}^2(x) + \lambda_{(\theta)}}$
I Using caches of intermediate results, the runtime for updating all model parameters is $O(k\, N_z(X))$.
I CD can be extended to classification.
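The coordinate update is a one-dimensional regularized least-squares solve. A sketch (the function name is ours), taking the per-case values of $h_{(\theta)}$ and $g_{(\theta)}$ as vectors:

```python
import numpy as np

def cd_update(h, g, y, lam):
    """Closed-form coordinate-descent update for one FM parameter theta:
    theta' = sum (y - g(x)) h(x) / (sum h(x)^2 + lambda)."""
    return float(h @ (y - g)) / (float(h @ h) + lam)
```

With `lam = 0` this is exactly the least-squares fit of the residual $y - g_{(\theta)}(x)$ on $h_{(\theta)}(x)$; a positive `lam` shrinks the update toward zero.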
Gibbs Sampling (MCMC)

I Gibbs sampling with a block for each FM model parameter $\theta$:
  $\theta \mid S, \Theta \setminus \{\theta\} \sim \mathcal{N}\!\left( \frac{\sum_{(x,y) \in S} \left( y - g_{(\theta)}(x) \right) h_{(\theta)}(x)}{\sum_{(x,y) \in S} h_{(\theta)}^2(x) + \lambda_{(\theta)}},\; \frac{1}{\sum_{(x,y) \in S} h_{(\theta)}^2(x) + \lambda_{(\theta)}} \right)$
I The mean is the same as for CD ⇒ the computational complexity is also $O(k\, N_z(X))$.
I MCMC can be extended to classification using link functions.
[Freudenthaler et al. 2011, Rendle 2012]
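One Gibbs draw therefore reuses the CD computation and adds posterior noise. A minimal sketch of the conditional above (names are ours; a unit noise precision is assumed):

```python
import numpy as np

def gibbs_draw(h, g, y, lam, rng):
    """Draw one FM parameter from its conditional posterior:
    Normal(CD mean, 1 / (sum h(x)^2 + lambda))."""
    prec = float(h @ h) + lam
    mean = float(h @ (y - g)) / prec
    return rng.normal(mean, np.sqrt(1.0 / prec))
```

With many cases the variance shrinks and the draw concentrates at the CD update, which is why the two methods share the same per-iteration cost.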
Learning Regularization Values

[Figure: two graphical models. Left: standard FM with Gaussian priors on $w_0$, $w_j$ and $v_j$ governed by fixed hyperparameters. Right: two-level FM in which the prior parameters themselves receive hyperpriors, so the regularization values are learned.]
Standard FM with priors. Two-level FM with hyperpriors.
[Freudenthaler et al., 2011]
libFM Software

libFM is an implementation of FMs:
I Model: second-order FMs
I Learning/inference: SGD, ALS, MCMC
I Classification and regression
I Uses the same data format as LIBSVM, LIBLINEAR [Lin et al.], SVMlight [Joachims].
I Supports variable grouping.
I Open source: GPLv3.
[http://www.libfm.org/]
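The LIBSVM-style input is a sparse text format: one case per line, the target first, then `index:value` pairs for the non-zero features only. A small sketch that writes one of the rating cases from earlier in that format (the helper name and the index layout are ours):

```python
def to_libsvm(y, x):
    """Format one case as 'target index:value ...', listing only non-zeros,
    as consumed by LIBSVM/LIBLINEAR and hence by libFM."""
    feats = " ".join(f"{i}:{v:g}" for i, v in enumerate(x) if v != 0)
    return f"{y} {feats}"

# Alice rates Titanic 5: one-hot user block (A, B, C) + movie block (TI, NH, SW, ST)
line = to_libsvm(5, [1, 0, 0, 1, 0, 0, 0])
```

For the one-hot data above, every line has exactly two `index:value` pairs, regardless of how large $p$ grows.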
Outline

Factorization Models
  Polynomial Regression
  Factorization Machines
Applications
  Recommender Systems
  Link Prediction in Social Networks
  Clickthrough Prediction
  Personalized Ranking
  Student Performance Prediction
  Kaggle Competitions
Summary
(Context-aware) Rating Prediction

I Main variables:
  I User ID (categorical)
  I Item ID (categorical)
I Additional variables:
  I time
  I mood
  I user profile
  I item meta data
  I . . .
I Examples: Netflix Prize, Movielens, KDDCup 2011

[Figure: a rating event combining Song, User, Time and Mood.]
Netflix Prize

[Figure: Netflix Prize prediction error, public-leaderboard RMSE on a 0.86-0.90 scale, for SGD matrix factorization and FMs with the predictor-variable sets (user, movie), (user, movie, day), (user, movie, impl.), (user, movie, day, impl.) and (user, movie, day, impl., freq., lin. day); richer variable sets move the RMSE toward the $1M-prize line.]

I $k = 128$ factors, 512 MCMC samples (no burn-in phase, initialization from random)
I MCMC inference (no hyperparameters (learning rate, regularization) to specify)
Netflix Prize

Method (Name)                                      | Ref.     | Learning Method   | k    | Quiz RMSE
Models using user ID and item ID:
Probabilistic Matrix Factorization                 | [14, 13] | Batch GD          | 40   | *0.9170
Probabilistic Matrix Factorization                 | [14, 13] | Batch GD          | 150  | 0.9211
Matrix Factorization                               | [6]      | Variational Bayes | 30   | *0.9141
Matchbox                                           | [15]     | Variational Bayes | 50   | *0.9100
ALS-MF                                             | [7]      | ALS               | 100  | 0.9079
ALS-MF                                             | [7]      | ALS               | 1000 | *0.9018
SVD/MF                                             | [3]      | SGD               | 100  | 0.9025
SVD/MF                                             | [3]      | SGD               | 200  | *0.9009
Bayesian Probabilistic Matrix Factorization (BPMF) | [13]     | MCMC              | 150  | 0.8965
Bayesian Probabilistic Matrix Factorization (BPMF) | [13]     | MCMC              | 300  | *0.8954
FM, pred. var.: user ID, movie ID                  | -        | MCMC              | 128  | 0.8937
Models using implicit feedback:
Probabilistic Matrix Factorization with Constraints | [14]    | Batch GD          | 30   | *0.9016
SVD++                                              | [3]      | SGD               | 100  | 0.8924
SVD++                                              | [3]      | SGD               | 200  | *0.8911
BSRM/F                                             | [18]     | MCMC              | 100  | 0.8926
BSRM/F                                             | [18]     | MCMC              | 400  | *0.8874
FM, pred. var.: user ID, movie ID, impl.           | -        | MCMC              | 128  | 0.8865
Netflix Prize

Method (Name)                                      | Ref. | Learning Method | k    | Quiz RMSE
Models using time information:
Bayesian Probabilistic Tensor Factorization (BPTF) | [17] | MCMC            | 30   | *0.9044
FM, pred. var.: user ID, movie ID, day             | -    | MCMC            | 128  | 0.8873
Models using time and implicit feedback:
timeSVD++                                          | [5]  | SGD             | 100  | 0.8805
timeSVD++                                          | [5]  | SGD             | 200  | *0.8799
FM, pred. var.: user ID, movie ID, day, impl.      | -    | MCMC            | 128  | 0.8809
FM, pred. var.: user ID, movie ID, day, impl.      | -    | MCMC            | 256  | 0.8794
Assorted models:
BRISMF/UM NB corrected                             | [16] | SGD             | 1000 | *0.8904
BMFSI plus side information                        | [8]  | MCMC            | 100  | *0.8875
timeSVD++ plus frequencies                         | [4]  | SGD             | 200  | 0.8777
timeSVD++ plus frequencies                         | [4]  | SGD             | 2000 | *0.8762
FM, pred. var.: user ID, movie ID, day, impl., freq., lin. day | - | MCMC    | 128  | 0.8779
FM, pred. var.: user ID, movie ID, day, impl., freq., lin. day | - | MCMC    | 256  | 0.8771
Link Prediction in Social Networks

I Main variables:
  I Actor A ID
  I Actor B ID
I Additional variables:
  I profiles
  I actions
  I . . .

[Figure: predicting a link between Actor A and Actor B.]
KDDCup 2012: Track 1

[Figure: mean average precision @3, 0.32-0.42, on the public and private leaderboards for FMs with different attribute sets (none; gender, age, ...; keywords; friends; all), compared against the Top 1, Top 5, Top 10 and Top 100 leaderboard entries.]

I $k = 22$ factors, 512 MCMC samples (no burn-in phase, initialization from random)
I MCMC inference (no hyperparameters (learning rate, regularization) to specify)
[Awarded 2nd place (out of 658 teams)]
Clickthrough Prediction

I Main variables:
  I User ID
  I Query ID
  I Ad/Link ID
I Additional variables:
  I query tokens
  I user profile
  I . . .

[Figure: a user issuing a keyword query and a ranked list of ad links.]
KDDCup 2012: Track 2

Model                         | Inference | wAUC (public) | wAUC (private)
ID-based model (k = 0)        | SGD       | 0.78050       | 0.78086
Attribute-based model (k = 8) | MCMC      | 0.77409       | 0.77555
Mixed model (k = 8)           | SGD       | 0.79011       | 0.79321
Final ensemble                | n/a       | 0.79857       | 0.80178

Ensemble:
I Rank positions (not predicted clickthrough rates) are used.
I The MCMC attribute-based model and different variations of the SGD models are included.
[Awarded 3rd place (out of 171 teams)]
ECML/PKDD Discovery Challenge 2013

I Problem: Recommend given names.
I Main variables:
  I User ID
  I Name ID
I Additional variables:
  I session info
  I string representation for each name
  I . . .
I The FM approach won 1st place (online track) and 2nd place (offline track).
Student Performance Prediction

I Main variables:
  I Student ID
  I Question ID
I Additional variables:
  I question hierarchy
  I sequence of questions
  I skills required
  I . . .
I Examples: KDDCup 2010, Grockit Challenge (FM placed 1st of 241), http://www.kaggle.com/c/WhatDoYouKnow

[Figure: a student answering a question.]
Kaggle Competitions

FMs have been successfully applied to several Kaggle competitions:
I Criteo Display Advertising Challenge: 1st place (team '3 idiots').
I Blue Book for Bulldozers: 1st place (team 'Leustagos Titericz').
I EMI Music Data Science Hackathon: 2nd place (team 'lns').
Summary

I Factorization machines combine linear/polynomial regression with factorization models.
I Feature interactions are learned with a low-rank representation.
I Estimation of unobserved interactions is possible.
I Factorization machines can be computed efficiently and have high prediction quality.
References

L. Drumond, S. Rendle, and L. Schmidt-Thieme. Predicting RDF triples in incomplete knowledge bases with tensor factorization. In Proceedings of the 27th Annual ACM Symposium on Applied Computing, SAC '12, pages 326-331, New York, NY, USA, 2012. ACM.

C. Freudenthaler, L. Schmidt-Thieme, and S. Rendle. Bayesian factorization machines. In NIPS Workshop on Sparse Representation and Low-rank Approximation, 2011.

Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In KDD '08: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 426-434, New York, NY, USA, 2008. ACM.

Y. Koren. The BellKor solution to the Netflix grand prize. 2009.

Y. Koren. Collaborative filtering with temporal dynamics. In KDD '09: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 447-456, New York, NY, USA, 2009. ACM.

Y. J. Lim and Y. W. Teh. Variational Bayesian approach to movie rating prediction. In Proceedings of KDD Cup and Workshop, 2007.

I. Pilaszy, D. Zibriczky, and D. Tikk. Fast ALS-based matrix factorization for explicit and implicit feedback datasets. In RecSys '10: Proceedings of the Fourth ACM Conference on Recommender Systems, pages 71-78, New York, NY, USA, 2010. ACM.

I. Porteous, A. Asuncion, and M. Welling. Bayesian matrix factorization with side information and Dirichlet process mixtures. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, 2010.

S. Rendle. Factorization machines. In Proceedings of the 2010 IEEE International Conference on Data Mining, ICDM '10, pages 995-1000, Washington, DC, USA, 2010. IEEE Computer Society.

S. Rendle. Factorization machines with libFM. ACM Trans. Intell. Syst. Technol., 3(3):57:1-57:22, May 2012.

S. Rendle, C. Freudenthaler, and L. Schmidt-Thieme. Factorizing personalized Markov chains for next-basket recommendation. In WWW '10: Proceedings of the 19th International Conference on World Wide Web, pages 811-820, New York, NY, USA, 2010. ACM.

S. Rendle, Z. Gantner, C. Freudenthaler, and L. Schmidt-Thieme. Fast context-aware recommendations with factorization machines. In Proceedings of the 34th ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2011.

R. Salakhutdinov and A. Mnih. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In Proceedings of the 25th International Conference on Machine Learning, ICML '08, pages 880-887, New York, NY, USA, 2008. ACM.

R. Salakhutdinov and A. Mnih. Probabilistic matrix factorization. In J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 1257-1264, Cambridge, MA, 2008. MIT Press.

D. H. Stern, R. Herbrich, and T. Graepel. Matchbox: large scale online Bayesian recommendations. In Proceedings of the 18th International Conference on World Wide Web, WWW '09, pages 111-120, New York, NY, USA, 2009. ACM.

G. Takacs, I. Pilaszy, B. Nemeth, and D. Tikk. Scalable collaborative filtering approaches for large recommender systems. J. Mach. Learn. Res., 10:623-656, June 2009.

L. Xiong, X. Chen, T.-K. Huang, J. Schneider, and J. G. Carbonell. Temporal collaborative filtering with Bayesian probabilistic tensor factorization. In Proceedings of the SIAM International Conference on Data Mining, pages 211-222. SIAM, 2010.

S. Zhu, K. Yu, and Y. Gong. Stochastic relational models for large-scale dyadic data using MCMC. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 1993-2000, 2009.