This document describes two Thompson sampling approaches for online matrix factorization in bandit settings:
1. SGLD-based Thompson sampling, which uses Stochastic Gradient Langevin Dynamics to update the posterior distributions of the user and item latent factors.
2. Pólya-Gamma-based Thompson sampling, which leverages the conjugacy between the likelihood and prior distributions afforded by the Pólya-Gamma augmentation to update the posteriors in closed form.
Both approaches aim to address the limitations of existing solutions, such as the need for local conjugacy between priors and likelihoods, or for additional contextual information. The document outlines the mathematical foundations and provides details on the algorithms and experimental results.
1. Efficient Bayesian Inference for Online Matrix Factorization in Bandit Settings
Vijay Pal Jat
Supervisor: Dr. Piyush Rai
Department of Computer Science and Engineering
Indian Institute of Technology Kanpur
January 26, 2020
Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 1 / 37
6. Online Interactive Recommender System
Challenges:
We don't know about users a priori.
Users' preferences may change over time.
We don't know the correlations between items.
Approach: matrix factorization combined with a bandit algorithm.
8. Matrix Factorization
Decompose the rating matrix R into low-rank user (U) and item (V) feature vector matrices:
R ≈ U V^T, where K ≪ min(N, M).
An entry in the matrix is represented as:
r_ij ≈ u_i^T v_j
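The rank-K decomposition above can be illustrated with a truncated SVD on a small synthetic matrix (an illustrative sketch only; the model presented later is Bayesian, not SVD-based):

```python
import numpy as np

# Build an exactly rank-K rating matrix from random factors.
np.random.seed(0)
N, M, K = 10, 8, 3
U_true = np.random.randn(N, K)
V_true = np.random.randn(M, K)
R = U_true @ V_true.T  # exactly rank K

# Truncated SVD: keep the top-K singular directions.
Uw, s, Vt = np.linalg.svd(R, full_matrices=False)
U = Uw[:, :K] * np.sqrt(s[:K])        # user factors, N x K
V = Vt[:K, :].T * np.sqrt(s[:K])      # item factors, M x K

R_hat = U @ V.T                       # R ≈ U V^T
# single entry: r_ij ≈ u_i^T v_j
entry = U[2] @ V[5]
```

Since R was constructed to be exactly rank K, the rank-K reconstruction recovers it up to floating-point error.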
15. Contextual Bandit Setting
Let v_1, v_2, ..., v_M be the set of arms (items).
Given a user u_i, the model decides to pull an arm v_j, j ∈ [M].
The model receives a reward r_ij, determined by the user and the pulled arm.
Objective: maximize the cumulative reward.
Strategy: play according to the explore-exploit (EE) trade-off.
Exploration: gather more information about each arm.
Exploitation: pull the seemingly most rewarding arm.
Contextual information ⇒ user and item latent feature vectors, which are not given explicitly.
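The interaction loop above can be sketched with a simplified per-arm variant (no latent factors yet): each arm's mean reward gets an independent Gaussian posterior, and a posterior draw decides which arm to pull, balancing exploration and exploitation automatically. All names here are illustrative, not the presentation's notation.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 5
true_means = rng.normal(size=M)        # unknown expected rewards
mu, prec = np.zeros(M), np.ones(M)     # N(mu, 1/prec) posterior per arm

cumulative = 0.0
for t in range(2000):
    sample = rng.normal(mu, 1.0 / np.sqrt(prec))  # explore: draw from posteriors
    j = int(np.argmax(sample))                    # exploit: pull sampled best arm
    r = true_means[j] + rng.normal()              # observe a noisy reward
    # conjugate Gaussian update (known unit noise variance)
    mu[j] = (prec[j] * mu[j] + r) / (prec[j] + 1.0)
    prec[j] += 1.0
    cumulative += r
```

Arms whose posteriors are still wide occasionally produce the largest draw, so they keep being explored; once the posteriors concentrate, the loop mostly exploits.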
18. Existing Solutions
Matrix Factorization based solutions:
A good matrix completion algorithm does not imply a good recommender system.
It is useless to know how well the worst items are predicted, since we never recommend them; a rough estimate of their ratings is enough.
Multi-Armed Bandit algorithm based solutions:
23. Existing Solutions
UCB based solutions:
Use additional information about users and items to improve the convergence rate, e.g. [5].
Thompson sampling based solutions:
Learn point estimates of both factors U and V, or learn a point estimate of one factor (e.g. V) and update the posterior distribution of the other (U), e.g. ICF.
Require local conjugacy between the likelihood and the prior distribution.
Work best when the reward is real-valued.
27. Our Model
Our approach:
Does not require local conjugacy between likelihood and prior.
Does not use additional information.
Works for any type of reward.
Updates the posteriors of both U and V simultaneously.
30. Mathematical Foundation
The reward is a stochastic function of the action and the unknown true contexts of items (V*) and users (U*).
Choose the item that maximizes the expected reward:
J*(i) = max_j E(r_ij | v*_j, u*_i)
Exploitation case (maximizing the immediate reward):
E(r_ij) = ∫∫ E(r_ij | v_j, u_i) p(v_j, u_i | D) du_i dv_j
J*(i) = max_j E(r_ij)
34. Mathematical Foundation
For the EE setting, the probability matching heuristic randomly selects an item according to its probability of being optimal:
p(J*(i_t) = j) = ∫∫ I[ E(r_ij | u_i, v_j) = max_j' E(r_ij' | u_i, v_j') ] p(u_i | D) p(v_j | D) du_i dv_j
The above integral is approximated by drawing random parameters from their distributions [1].
Regret (optimal reward minus obtained reward):
Δ_t = max_j u*_i^T v*_j − max_j u_i^T v_j
Cumulative regret: Σ_{τ=1}^{t} Δ_τ
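The regret bookkeeping above can be sketched as follows. Here the instantaneous regret compares the best achievable true reward with the true reward of the arm chosen from sampled factors (the standard evaluation; all variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
K, M, T = 3, 6, 4
u_star = rng.normal(size=K)            # true user factors u*_i
V_star = rng.normal(size=(M, K))       # true item factors v*_j

regrets = []
for t in range(T):
    # stand-in for posterior draws of the factors
    u_hat = u_star + 0.1 * rng.normal(size=K)
    V_hat = V_star + 0.1 * rng.normal(size=(M, K))
    j = int(np.argmax(V_hat @ u_hat))  # arm chosen from the sampled factors
    # Delta_t: best true reward minus true reward of the chosen arm
    delta = np.max(V_star @ u_star) - V_star[j] @ u_star
    regrets.append(delta)
cumulative_regret = float(np.sum(regrets))
```

Each Δ_t is non-negative by construction, so the cumulative regret is non-decreasing in t.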
39. Distribution of Parameters and Hyperparameters
The likelihood of rewards and the priors on the parameters are defined as:
p(R | U, V, α) = Π_{i=1}^{N} Π_{j=1}^{M} [ N(r_ij | u_i^T v_j, α^{-1}) ]^{I_ij}
p(U | Λ_U) = Π_{i=1}^{N} N(u_i | 0, Λ_U^{-1})
p(V | Λ_V) = Π_{j=1}^{M} N(v_j | 0, Λ_V^{-1})
Priors on the hyperparameters:
Λ_{U_d} ~ Γ(Λ_{U_d} | α_0, β_0)
Λ_{V_d} ~ Γ(Λ_{V_d} | α_0, β_0)
Λ_U, Λ_V are diagonal matrices, and Λ_{U_d}, Λ_{V_d} are their d-th diagonal elements.
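The likelihood and priors above combine into a log-joint that later updates (and the SGLD gradients) are built on. A minimal sketch, treating α and the diagonal precisions as fixed scalars for simplicity:

```python
import numpy as np

def log_joint(R, I, U, V, alpha, lam_u, lam_v):
    """log p(R|U,V,alpha) + log p(U|lam_u) + log p(V|lam_v),
    up to additive constants. I is the 0/1 observation mask I_ij."""
    E = R - U @ V.T
    ll = -0.5 * alpha * np.sum(I * E**2)   # Gaussian likelihood on observed entries
    lp_u = -0.5 * lam_u * np.sum(U**2)     # N(0, Lambda_U^{-1}) prior
    lp_v = -0.5 * lam_v * np.sum(V**2)     # N(0, Lambda_V^{-1}) prior
    return ll + lp_u + lp_v

rng = np.random.default_rng(0)
N, M, K = 4, 3, 2
U = rng.normal(size=(N, K))
V = rng.normal(size=(M, K))
R = U @ V.T                                # noiseless ratings at the true factors
I = np.ones((N, M))
lj = log_joint(R, I, U, V, alpha=2.0, lam_u=1.0, lam_v=1.0)
```

At the true factors with noiseless R, the likelihood term vanishes and only the prior terms remain.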
42. Distribution of Parameters and Hyperparameters
The conditional posterior distribution of user u_i:
p(u_i | R, V, α, Λ_U) = N(u_i | μ_{u_i}, Σ_{u_i})
Σ_{u_i} = ( α Σ_{j=1}^{M} I_ij v_j v_j^T + Λ_U )^{-1}
μ_{u_i} = α Σ_{u_i} Σ_{j=1}^{M} I_ij r_ij v_j
The conditional posterior distribution of the items is defined similarly.
Due to the conjugacy between the Gaussian's precision and the Gamma distribution, the posteriors of the hyperparameters are also Gamma:
Λ_{U_d} ~ Γ(Λ_{U_d} | α_0 + N/2, β_0 + (1/2) Σ_{i=1}^{N} u_{di}^2)
Λ_{V_d} ~ Γ(Λ_{V_d} | α_0 + M/2, β_0 + (1/2) Σ_{j=1}^{M} v_{dj}^2)
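The conditional Gaussian update for a single user can be sketched directly from the formulas above (function and variable names are illustrative, not from the presentation):

```python
import numpy as np

def sample_user(r_i, I_i, V, alpha, Lam_U, rng):
    """Draw u_i ~ N(mu_{u_i}, Sigma_{u_i}) given item factors V,
    the user's observation mask I_i, and observed rewards r_i."""
    obs = I_i.astype(bool)
    Vo = V[obs]                               # factors of the rated items only
    prec = alpha * Vo.T @ Vo + Lam_U          # posterior precision
    Sigma = np.linalg.inv(prec)               # Sigma_{u_i}
    mu = alpha * Sigma @ (Vo.T @ r_i[obs])    # mu_{u_i}
    return rng.multivariate_normal(mu, Sigma)

# Sanity check: with many low-noise observations the draw concentrates near truth.
rng = np.random.default_rng(0)
K, M = 2, 50
V = rng.normal(size=(M, K))
u_true = np.array([1.0, -2.0])
r = V @ u_true + 0.01 * rng.normal(size=M)    # noise std 0.01 ⇒ alpha = 1e4
I = np.ones(M)
u = sample_user(r, I, V, alpha=1e4, Lam_U=np.eye(K), rng=rng)
```

The item-side update is the mirror image: swap the roles of U and V and sum over the users who rated item j.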
46. SGLD
Assume:
X = {x_i}_{i=1}^{N} is a set of N data points
θ ← parameter vector
p(θ) ← prior on the parameter
p(x | θ) ← likelihood model
The posterior distribution of the parameter:
p(θ | X) ∝ p(θ) · Π_{i=1}^{N} p(x_i | θ)
SGD update equation [4] for MAP estimation:
θ_{t+1} = θ_t + (ε_t / 2) ( ∇ log p(θ_t) + (N/n) Σ_{i=1}^{n} ∇ log p(x_{ti} | θ_t) )
The gradient is calculated on a mini-batch of size n.
MAP estimation does not capture uncertainty in the parameter.
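SGLD (Welling and Teh) addresses this by injecting Gaussian noise η_t ~ N(0, ε_t) into the MAP update above, so the iterates become approximate posterior samples rather than a point estimate. A minimal sketch on a conjugate toy problem (inferring the mean θ of x_i ~ N(θ, 1) under a N(0, 1) prior; step-size schedule and constants are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 1000, 50
x = rng.normal(2.0, 1.0, size=N)                 # data with true mean 2.0

theta, samples = 0.0, []
for t in range(1, 5001):
    eps = 1e-3 / t ** 0.55                       # decaying step size eps_t
    batch = rng.choice(x, size=n, replace=False)
    grad_prior = -theta                          # d/dtheta log N(theta | 0, 1)
    grad_lik = (N / n) * np.sum(batch - theta)   # rescaled mini-batch gradient
    noise = rng.normal(0.0, np.sqrt(eps))        # Langevin noise, Var = eps_t
    theta = theta + 0.5 * eps * (grad_prior + grad_lik) + noise
    if t > 1000:                                 # discard burn-in iterates
        samples.append(theta)
```

Without the `noise` term this is exactly the SGD/MAP update; with it, the post-burn-in iterates approximately follow p(θ | X), whose mean here is the shrunk sample mean N·x̄/(N+1).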
56. Results
If an algorithm explores forever or exploits forever, it will incur linear total regret.
SGLD based Thompson sampling achieves sub-linear cumulative regret.
Figure: Cumulative regret on synthetic data of different sizes; panels (a) N=10, M=10, K=1; (b) N=10, M=10, K=3; (c) N=20, M=20, K=1; (d) N=30, M=30, K=1.
61. Pólya-Gamma
If rewards are binary or count-valued, the likelihood is not conjugate with the prior.
It is then difficult to update the posterior distribution in an online manner.
The PG augmentation scheme yields an inference algorithm that looks like simple Bayesian linear regression.
63. Pólya-Gamma data augmentation
The Pólya-Gamma augmentation scheme allows us to rewrite the Negative Binomial likelihood as follows:
(exp(ψ))^a / (1 + exp(ψ))^b = 2^{-b} e^{κψ} ∫_0^∞ e^{-ωψ²/2} p(ω) dω
The log-likelihood Q(β) is a quadratic form in β [3]:
Q(β) = Σ_{n=1}^{N} ( -(1/2) ω_n ψ_n² + κ_n ψ_n )
where κ_n = a_n − b_n/2, ψ_n = x_n^T β, and ω_n ~ PG(b_n, 0).
65. Pólya-Gamma data augmentation
The logistic-Bernoulli likelihood:
Q(ξ_ij | ω_ij) ∝ exp( -(ω_ij/2) (κ_ij/ω_ij − ξ_ij)² )
where κ_ij = y_ij − 0.5 and ξ_ij = u_i^T v_j.
Posterior on ω:
p(ω | b, ψ̂) = exp(−ω ψ̂²/2) p(ω | b, 0) / ∫_0^∞ exp(−ω ψ̂²/2) p(ω | b, 0) dω = PG(b, ψ̂)
where p(ω | b, 0) is the Pólya-Gamma density.
66. Pólya-Gamma Sampling Scheme
To draw a sample from the posterior of the parameters, simply iterate the following two steps:
(ω_ij | u_i) ~ PG(1, v_j^T u_i)  ∀ j ∈ Ω_i
(u_i | R_i, ω_i) ~ N(μ_{u_i}, Σ_{u_i})
where
Σ_{u_i} = ( Σ_{j∈Ω_i} ω_ij v_j v_j^T + λI )^{-1}
μ_{u_i} = Σ_{u_i} ( Σ_{j∈Ω_i} κ_ij v_j )
and N(0, λ^{-1} I) is the prior distribution over the user feature vectors.
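The two-step scheme above can be sketched for a single user as follows. The PG(b, c) draw uses the truncated infinite-sum-of-Gammas representation of the distribution (an approximation; exact samplers exist, e.g. in pypolyagamma); function names and the truncation level are illustrative choices.

```python
import numpy as np

def sample_pg(b, c, rng, trunc=200):
    """Approximate PG(b, c) draw via the truncated sum representation:
    omega = (1 / (2 pi^2)) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),
    with g_k ~ Gamma(b, 1)."""
    k = np.arange(1, trunc + 1)
    g = rng.gamma(b, 1.0, size=trunc)
    return np.sum(g / ((k - 0.5) ** 2 + (c / (2 * np.pi)) ** 2)) / (2 * np.pi ** 2)

def gibbs_user_step(u, V_obs, y_obs, lam, rng):
    """One iteration of the two steps above for one user; V_obs and y_obs
    are the factors and binary rewards of that user's rated items."""
    omega = np.array([sample_pg(1.0, v @ u, rng) for v in V_obs])        # step 1
    kappa = y_obs - 0.5                                                  # kappa_ij
    Sigma = np.linalg.inv((V_obs * omega[:, None]).T @ V_obs + lam * np.eye(len(u)))
    mu = Sigma @ (V_obs.T @ kappa)
    return rng.multivariate_normal(mu, Sigma)                            # step 2

# Toy run: binary rewards from a logistic model, 50 Gibbs sweeps.
rng = np.random.default_rng(0)
K, M = 2, 200
V = rng.normal(size=(M, K))
u_true = np.array([1.5, -1.0])
y = (rng.uniform(size=M) < 1 / (1 + np.exp(-(V @ u_true)))).astype(float)
u = np.zeros(K)
for _ in range(50):
    u = gibbs_user_step(u, V, y, lam=1.0, rng=rng)
```

Every conditional is a standard distribution, so the whole update looks like Bayesian linear regression with per-observation precisions ω_ij, as claimed on the previous slide.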
75. Summary and Future Work
Presented a fast and general SGLD based Thompson sampling algorithm:
Can handle any type of reward.
Can handle constraints on the parameter space.
Presented an efficient algorithm for binary or count-valued data.
Future work: analysis of the regret bound.
77. References I
[1] O. Chapelle and L. Li. An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems, pages 2249–2257, 2011.
[2] R. M. Neal et al. MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 2:113–162, 2011.
[3] N. G. Polson, J. G. Scott, and J. Windle. Bayesian inference for logistic models using Pólya-Gamma latent variables. Journal of the American Statistical Association, 108(504):1339–1349, 2013.
[4] H. Robbins and S. Monro. A stochastic approximation method. The Annals of Mathematical Statistics, pages 400–407, 1951.
[5] H. Wang, Q. Wu, and H. Wang. Factorization bandits for interactive recommendation. 2017.
78. References II
[6] M. Welling and Y. W. Teh. Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 681–688, 2011.
79. The End