This document describes two Thompson sampling approaches for online matrix factorization in bandit settings:
1. SGLD-based Thompson sampling, which uses Stochastic Gradient Langevin Dynamics to update the posterior distributions of the user and item latent factors.
2. Pólya-Gamma-based Thompson sampling, which leverages the conjugacy between the likelihood and prior distributions afforded by the Pólya-Gamma augmentation to update the posteriors in closed form.
Both approaches aim to address the limitations of existing solutions, such as the need for local conjugacy between priors and likelihoods, or for additional contextual information. The document outlines the mathematical foundations and provides details on the algorithms and experimental results.
1. Efficient Bayesian Inference for Online Matrix Factorization in Bandit Settings
Vijay Pal Jat
Supervisor: Dr. Piyush Rai
Department of Computer Science and Engineering
Indian Institute of Technology Kanpur
January 26, 2020
Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 1 / 37
6. Online Interactive Recommender System
Challenges:
We don't know about users a priori.
Users' preferences may change over time.
We don't know the correlations between items.
Approach: matrix factorization combined with a bandit algorithm.
8. Matrix Factorization
Decompose the rating matrix R into low-rank user (U) and item (V) feature vector matrices:
R ≈ U V^T, where K ≪ min(N, M).
An entry in the matrix is represented as:
r_ij ≈ u_i^T v_j
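The rank-K decomposition above can be illustrated with a truncated SVD on a small synthetic matrix (an illustrative sketch only; the model presented later is Bayesian, not SVD-based):

```python
import numpy as np

# Build an exactly rank-K rating matrix from random factors.
np.random.seed(0)
N, M, K = 10, 8, 3
U_true = np.random.randn(N, K)
V_true = np.random.randn(M, K)
R = U_true @ V_true.T  # exactly rank K

# Truncated SVD: keep the top-K singular directions.
Uw, s, Vt = np.linalg.svd(R, full_matrices=False)
U = Uw[:, :K] * np.sqrt(s[:K])        # user factors, N x K
V = Vt[:K, :].T * np.sqrt(s[:K])      # item factors, M x K

R_hat = U @ V.T                       # R ≈ U V^T
# single entry: r_ij ≈ u_i^T v_j
entry = U[2] @ V[5]
```

Since R was constructed to be exactly rank K, the rank-K reconstruction recovers it up to floating-point error.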
15. Contextual Bandit Setting
Let v_1, v_2, ..., v_M be the set of arms (items).
Given a user u_i, the model decides to pull an arm v_j, j ∈ [M].
The model receives a reward r_ij, determined by the user and the pulled arm.
Objective: maximize the cumulative reward.
Strategy: play according to the explore-exploit (EE) trade-off.
Exploration: gather more information about each arm.
Exploitation: pull the seemingly most rewarding arm.
Contextual information ⇒ user and item latent feature vectors, which are not given explicitly.
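The interaction loop above can be sketched with a simplified per-arm variant (no latent factors yet): each arm's mean reward gets an independent Gaussian posterior, and a posterior draw decides which arm to pull, balancing exploration and exploitation automatically. All names here are illustrative, not the presentation's notation.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 5
true_means = rng.normal(size=M)        # unknown expected rewards
mu, prec = np.zeros(M), np.ones(M)     # N(mu, 1/prec) posterior per arm

cumulative = 0.0
for t in range(2000):
    sample = rng.normal(mu, 1.0 / np.sqrt(prec))  # explore: draw from posteriors
    j = int(np.argmax(sample))                    # exploit: pull sampled best arm
    r = true_means[j] + rng.normal()              # observe a noisy reward
    # conjugate Gaussian update (known unit noise variance)
    mu[j] = (prec[j] * mu[j] + r) / (prec[j] + 1.0)
    prec[j] += 1.0
    cumulative += r
```

Arms whose posteriors are still wide occasionally produce the largest draw, so they keep being explored; once the posteriors concentrate, the loop mostly exploits.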
18. Existing Solutions
Matrix Factorization based solutions:
A good matrix completion algorithm does not imply a good recommender system.
It is useless to know how well the worst items are predicted, since we never recommend them; a rough estimate of their ratings is enough.
Multi-Armed Bandit algorithm based solutions:
23. Existing Solutions
UCB based solutions:
Use additional information about users and items to improve the convergence rate, e.g. [5].
Thompson sampling based solutions:
Learn point estimates of both factors U and V, or learn a point estimate of one factor (e.g. V) and update the posterior distribution of the other (U), e.g. ICF.
Require local conjugacy between the likelihood and the prior distribution.
Work best when the reward is real-valued.
27. Our Model
Our approach:
Does not require local conjugacy between likelihood and prior.
Does not use additional information.
Works for any type of reward.
Updates the posteriors of both U and V simultaneously.
30. Mathematical Foundation
The reward is a stochastic function of the action and the unknown true contexts of items (V*) and users (U*).
Choose the item that maximizes the expected reward:
J*(i) = max_j E(r_ij | v*_j, u*_i)
Exploitation case (maximizing the immediate reward):
E(r_ij) = ∫∫ E(r_ij | v_j, u_i) p(v_j, u_i | D) du_i dv_j
J*(i) = max_j E(r_ij)
34. Mathematical Foundation
For the EE setting, the probability matching heuristic randomly selects an item according to its probability of being optimal:
p(J*(i_t) = j) = ∫∫ I[ E(r_ij | u_i, v_j) = max_j' E(r_ij' | u_i, v_j') ] p(u_i | D) p(v_j | D) du_i dv_j
The above integral is approximated by drawing random parameters from their distributions [1].
Regret (optimal reward minus obtained reward):
Δ_t = max_j u*_i^T v*_j − max_j u_i^T v_j
Cumulative regret: Σ_{τ=1}^{t} Δ_τ
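The regret bookkeeping above can be sketched as follows. Here the instantaneous regret compares the best achievable true reward with the true reward of the arm chosen from sampled factors (the standard evaluation; all variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
K, M, T = 3, 6, 4
u_star = rng.normal(size=K)            # true user factors u*_i
V_star = rng.normal(size=(M, K))       # true item factors v*_j

regrets = []
for t in range(T):
    # stand-in for posterior draws of the factors
    u_hat = u_star + 0.1 * rng.normal(size=K)
    V_hat = V_star + 0.1 * rng.normal(size=(M, K))
    j = int(np.argmax(V_hat @ u_hat))  # arm chosen from the sampled factors
    # Delta_t: best true reward minus true reward of the chosen arm
    delta = np.max(V_star @ u_star) - V_star[j] @ u_star
    regrets.append(delta)
cumulative_regret = float(np.sum(regrets))
```

Each Δ_t is non-negative by construction, so the cumulative regret is non-decreasing in t.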
39. Distribution of Parameters and Hyperparameters
The likelihood of rewards and the priors on the parameters are defined as:
p(R | U, V, α) = Π_{i=1}^{N} Π_{j=1}^{M} [ N(r_ij | u_i^T v_j, α^{-1}) ]^{I_ij}
p(U | Λ_U) = Π_{i=1}^{N} N(u_i | 0, Λ_U^{-1})
p(V | Λ_V) = Π_{j=1}^{M} N(v_j | 0, Λ_V^{-1})
Priors on the hyperparameters:
Λ_{U_d} ~ Γ(Λ_{U_d} | α_0, β_0)
Λ_{V_d} ~ Γ(Λ_{V_d} | α_0, β_0)
Λ_U, Λ_V are diagonal matrices, and Λ_{U_d}, Λ_{V_d} are their d-th diagonal elements.
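The likelihood and priors above combine into a log-joint that later updates (and the SGLD gradients) are built on. A minimal sketch, treating α and the diagonal precisions as fixed scalars for simplicity:

```python
import numpy as np

def log_joint(R, I, U, V, alpha, lam_u, lam_v):
    """log p(R|U,V,alpha) + log p(U|lam_u) + log p(V|lam_v),
    up to additive constants. I is the 0/1 observation mask I_ij."""
    E = R - U @ V.T
    ll = -0.5 * alpha * np.sum(I * E**2)   # Gaussian likelihood on observed entries
    lp_u = -0.5 * lam_u * np.sum(U**2)     # N(0, Lambda_U^{-1}) prior
    lp_v = -0.5 * lam_v * np.sum(V**2)     # N(0, Lambda_V^{-1}) prior
    return ll + lp_u + lp_v

rng = np.random.default_rng(0)
N, M, K = 4, 3, 2
U = rng.normal(size=(N, K))
V = rng.normal(size=(M, K))
R = U @ V.T                                # noiseless ratings at the true factors
I = np.ones((N, M))
lj = log_joint(R, I, U, V, alpha=2.0, lam_u=1.0, lam_v=1.0)
```

At the true factors with noiseless R, the likelihood term vanishes and only the prior terms remain.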
42. Distribution of Parameters and Hyperparameters
The conditional posterior distribution of user u_i:
p(u_i | R, V, α, Λ_U) = N(u_i | μ_{u_i}, Σ_{u_i})
Σ_{u_i} = ( α Σ_{j=1}^{M} I_ij v_j v_j^T + Λ_U )^{-1}
μ_{u_i} = α Σ_{u_i} Σ_{j=1}^{M} I_ij r_ij v_j
The conditional posterior distribution of the items is defined similarly.
Due to the conjugacy between the Gaussian's precision and the Gamma distribution, the posteriors of the hyperparameters are also Gamma:
Λ_{U_d} ~ Γ(Λ_{U_d} | α_0 + N/2, β_0 + (1/2) Σ_{i=1}^{N} u_{di}^2)
Λ_{V_d} ~ Γ(Λ_{V_d} | α_0 + M/2, β_0 + (1/2) Σ_{j=1}^{M} v_{dj}^2)
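The conditional Gaussian update for a single user can be sketched directly from the formulas above (function and variable names are illustrative, not from the presentation):

```python
import numpy as np

def sample_user(r_i, I_i, V, alpha, Lam_U, rng):
    """Draw u_i ~ N(mu_{u_i}, Sigma_{u_i}) given item factors V,
    the user's observation mask I_i, and observed rewards r_i."""
    obs = I_i.astype(bool)
    Vo = V[obs]                               # factors of the rated items only
    prec = alpha * Vo.T @ Vo + Lam_U          # posterior precision
    Sigma = np.linalg.inv(prec)               # Sigma_{u_i}
    mu = alpha * Sigma @ (Vo.T @ r_i[obs])    # mu_{u_i}
    return rng.multivariate_normal(mu, Sigma)

# Sanity check: with many low-noise observations the draw concentrates near truth.
rng = np.random.default_rng(0)
K, M = 2, 50
V = rng.normal(size=(M, K))
u_true = np.array([1.0, -2.0])
r = V @ u_true + 0.01 * rng.normal(size=M)    # noise std 0.01 ⇒ alpha = 1e4
I = np.ones(M)
u = sample_user(r, I, V, alpha=1e4, Lam_U=np.eye(K), rng=rng)
```

The item-side update is the mirror image: swap the roles of U and V and sum over the users who rated item j.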
46. SGLD
Assume:
X = {x_i}_{i=1}^{N} is a set of N data points
θ ← parameter vector
p(θ) ← prior on the parameter
p(x | θ) ← likelihood model
The posterior distribution of the parameter:
p(θ | X) ∝ p(θ) · Π_{i=1}^{N} p(x_i | θ)
SGD update equation [4] for MAP estimation:
θ_{t+1} = θ_t + (ε_t / 2) ( ∇ log p(θ_t) + (N/n) Σ_{i=1}^{n} ∇ log p(x_{ti} | θ_t) )
The gradient is calculated on a mini-batch of size n.
MAP estimation does not capture uncertainty in the parameter.
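SGLD (Welling and Teh) addresses this by injecting Gaussian noise η_t ~ N(0, ε_t) into the MAP update above, so the iterates become approximate posterior samples rather than a point estimate. A minimal sketch on a conjugate toy problem (inferring the mean θ of x_i ~ N(θ, 1) under a N(0, 1) prior; step-size schedule and constants are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 1000, 50
x = rng.normal(2.0, 1.0, size=N)                 # data with true mean 2.0

theta, samples = 0.0, []
for t in range(1, 5001):
    eps = 1e-3 / t ** 0.55                       # decaying step size eps_t
    batch = rng.choice(x, size=n, replace=False)
    grad_prior = -theta                          # d/dtheta log N(theta | 0, 1)
    grad_lik = (N / n) * np.sum(batch - theta)   # rescaled mini-batch gradient
    noise = rng.normal(0.0, np.sqrt(eps))        # Langevin noise, Var = eps_t
    theta = theta + 0.5 * eps * (grad_prior + grad_lik) + noise
    if t > 1000:                                 # discard burn-in iterates
        samples.append(theta)
```

Without the `noise` term this is exactly the SGD/MAP update; with it, the post-burn-in iterates approximately follow p(θ | X), whose mean here is the shrunk sample mean N·x̄/(N+1).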
56. Results
If an algorithm explores forever or exploits forever, it will incur linear total regret.
SGLD based Thompson sampling achieves sub-linear cumulative regret.
Figure: Cumulative regret on synthetic data of different sizes; panels (a) N=10, M=10, K=1; (b) N=10, M=10, K=3; (c) N=20, M=20, K=1; (d) N=30, M=30, K=1.
61. Pólya-Gamma
If rewards are binary or count-valued, the likelihood is not conjugate with the prior.
It is then difficult to update the posterior distribution in an online manner.
The PG augmentation scheme yields an inference algorithm that looks like simple Bayesian linear regression.
63. Pólya-Gamma data augmentation
The Pólya-Gamma augmentation scheme allows us to rewrite the Negative Binomial likelihood as follows:
(exp(ψ))^a / (1 + exp(ψ))^b = 2^{-b} e^{κψ} ∫_0^∞ e^{-ωψ²/2} p(ω) dω
The log-likelihood Q(β) is a quadratic form in β [3]:
Q(β) = Σ_{n=1}^{N} ( -(1/2) ω_n ψ_n² + κ_n ψ_n )
where κ_n = a_n − b_n/2, ψ_n = x_n^T β, and ω_n ~ PG(b_n, 0).
65. Pólya-Gamma data augmentation
The logistic-Bernoulli likelihood:
Q(ξ_ij | ω_ij) ∝ exp( -(ω_ij/2) (κ_ij/ω_ij − ξ_ij)² )
where κ_ij = y_ij − 0.5 and ξ_ij = u_i^T v_j.
Posterior on ω:
p(ω | b, ψ̂) = exp(−ω ψ̂²/2) p(ω | b, 0) / ∫_0^∞ exp(−ω ψ̂²/2) p(ω | b, 0) dω = PG(b, ψ̂)
where p(ω | b, 0) is the Pólya-Gamma density.
66. Pólya-Gamma Sampling Scheme
To draw a sample from the posterior of the parameters, simply iterate the following two steps:
(ω_ij | u_i) ~ PG(1, v_j^T u_i)  ∀ j ∈ Ω_i
(u_i | R_i, ω_i) ~ N(μ_{u_i}, Σ_{u_i})
where
Σ_{u_i} = ( Σ_{j∈Ω_i} ω_ij v_j v_j^T + λI )^{-1}
μ_{u_i} = Σ_{u_i} ( Σ_{j∈Ω_i} κ_ij v_j )
and N(0, λ^{-1} I) is the prior distribution over the user feature vectors.
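The two-step scheme above can be sketched for a single user as follows. The PG(b, c) draw uses the truncated infinite-sum-of-Gammas representation of the distribution (an approximation; exact samplers exist, e.g. in pypolyagamma); function names and the truncation level are illustrative choices.

```python
import numpy as np

def sample_pg(b, c, rng, trunc=200):
    """Approximate PG(b, c) draw via the truncated sum representation:
    omega = (1 / (2 pi^2)) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),
    with g_k ~ Gamma(b, 1)."""
    k = np.arange(1, trunc + 1)
    g = rng.gamma(b, 1.0, size=trunc)
    return np.sum(g / ((k - 0.5) ** 2 + (c / (2 * np.pi)) ** 2)) / (2 * np.pi ** 2)

def gibbs_user_step(u, V_obs, y_obs, lam, rng):
    """One iteration of the two steps above for one user; V_obs and y_obs
    are the factors and binary rewards of that user's rated items."""
    omega = np.array([sample_pg(1.0, v @ u, rng) for v in V_obs])        # step 1
    kappa = y_obs - 0.5                                                  # kappa_ij
    Sigma = np.linalg.inv((V_obs * omega[:, None]).T @ V_obs + lam * np.eye(len(u)))
    mu = Sigma @ (V_obs.T @ kappa)
    return rng.multivariate_normal(mu, Sigma)                            # step 2

# Toy run: binary rewards from a logistic model, 50 Gibbs sweeps.
rng = np.random.default_rng(0)
K, M = 2, 200
V = rng.normal(size=(M, K))
u_true = np.array([1.5, -1.0])
y = (rng.uniform(size=M) < 1 / (1 + np.exp(-(V @ u_true)))).astype(float)
u = np.zeros(K)
for _ in range(50):
    u = gibbs_user_step(u, V, y, lam=1.0, rng=rng)
```

Every conditional is a standard distribution, so the whole update looks like Bayesian linear regression with per-observation precisions ω_ij, as claimed on the previous slide.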
75. Summary and Future Work
Presented a fast and general SGLD based Thompson sampling algorithm:
Can handle any type of reward.
Can handle constraints on the parameter space.
Presented an efficient algorithm for binary or count-valued data.
Future work: analysis of the regret bound.
77. References I
[1] O. Chapelle and L. Li. An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems, pages 2249–2257, 2011.
[2] R. M. Neal et al. MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 2:113–162, 2011.
[3] N. G. Polson, J. G. Scott, and J. Windle. Bayesian inference for logistic models using Pólya-Gamma latent variables. Journal of the American Statistical Association, 108(504):1339–1349, 2013.
[4] H. Robbins and S. Monro. A stochastic approximation method. The Annals of Mathematical Statistics, pages 400–407, 1951.
[5] H. Wang, Q. Wu, and H. Wang. Factorization bandits for interactive recommendation. 2017.
78. References II
[6] M. Welling and Y. W. Teh. Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 681–688, 2011.
79. The End