Efficient Bayesian Inference for Online Matrix Factorization in Bandit Settings
Vijay Pal Jat
Supervisor: Dr. Piyush Rai
Department of Computer Science and Engineering
Indian Institute of Technology Kanpur
January 26, 2020
Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 1 / 37
Overview
1 Introduction
  Online Interactive Recommender System
  Matrix Factorization
  Existing Solutions
  Thompson Sampling
2 SGLD based Thompson Sampling
  SGLD
  Results
3 Pólya-Gamma based Thompson Sampling
  Pólya-Gamma
  Algorithm
4 Results
5 Conclusion
Online Interactive Recommender System
Challenges:
We do not know the users a priori.
Users' preferences may change over time.
We do not know the correlations between items.
Approach: Matrix Factorization combined with a bandit algorithm.
Matrix Factorization
Decompose the rating matrix $R$ into low-rank user ($U$) and item ($V$) latent feature matrices:
$$R \approx U V^T, \qquad K \ll \min(N, M)$$
An individual entry of the matrix is represented as
$$r_{ij} \approx u_i^T v_j$$
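As a quick numerical illustration of this decomposition (sizes and random factors are hypothetical), a rank-$K$ product reconstructs every entry as an inner product of latent vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: N users, M items, rank K << min(N, M)
N, M, K = 20, 30, 3
U = rng.normal(size=(N, K))    # user latent feature matrix
V = rng.normal(size=(M, K))    # item latent feature matrix

R_hat = U @ V.T                # low-rank reconstruction, R ≈ U V^T

# A single entry is an inner product of latent vectors: r_ij ≈ u_i^T v_j
i, j = 4, 7
r_ij = float(U[i] @ V[j])
```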
Contextual Bandit Setting
Let $v_1, v_2, \ldots, v_M$ be the set of arms (items).
Given a user $u_i$, the model decides to pull an arm $v_j$, $j \in [M]$.
The model receives a reward $r_{i,j}$ determined by the user and the pulled arm.
Objective: Maximize the cumulative reward.
Strategy: Play according to the Explore-Exploit (EE) trade-off.
Exploration: Gather more information about each arm.
Exploitation: Pull the seemingly most rewarding arm.
Contextual information ⇒ user and item latent feature vectors, which are not given explicitly.
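The EE trade-off above can be sketched with a minimal ε-greedy loop over M independent arms (a deliberately simpler strategy than the Thompson sampling used later in the thesis; all quantities here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

M = 5                                   # number of arms (items)
true_means = rng.uniform(size=M)        # unknown expected rewards (hypothetical)
counts = np.zeros(M)
estimates = np.zeros(M)
eps, total_reward = 0.1, 0.0

for t in range(5000):
    if rng.random() < eps:              # exploration: gather information about an arm
        j = int(rng.integers(M))
    else:                               # exploitation: pull the seemingly best arm
        j = int(np.argmax(estimates))
    r = true_means[j] + rng.normal(scale=0.1)        # stochastic reward
    counts[j] += 1
    estimates[j] += (r - estimates[j]) / counts[j]   # running-mean update
    total_reward += r
```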
Existing Solutions
Matrix Factorization based solutions:
A good matrix completion algorithm does not imply a good recommender system.
It is useless to know how well the worst items are predicted, since we never recommend them; a rough estimate of their ratings is enough.
Multi-Armed Bandit algorithm based solutions
Existing Solutions
UCB based solutions:
Use additional information about users and items to improve the convergence rate, e.g. [5].
Thompson sampling based solutions:
Learn point estimates of both factors U and V, or learn a point estimate of one factor (e.g. V) and update the posterior distribution of the other (U), e.g. ICF.
These need local conjugacy between the likelihood and the prior distribution.
They work better if the reward is real-valued.
Our Model
Our Approach:
Does not require local conjugacy between the likelihood and the prior.
Does not use additional information.
Works well for any type of reward.
Updates the posteriors of both U and V simultaneously.
Mathematical Foundation
The reward is a stochastic function of the action and the unknown true latent features of the items ($V^*$) and users ($U^*$).
Choose the item that maximizes the expected reward:
$$J^*(i) = \max_j \mathbb{E}(r_{i,j} \mid v_j^*, u_i^*)$$
Exploitation case (maximizing the immediate reward):
$$\mathbb{E}(r_{i,j}) = \int \mathbb{E}(r_{i,j} \mid v_j, u_i)\, p(v_j, u_i \mid \mathcal{D})\, du_i\, dv_j$$
$$J^*(i) = \max_j \mathbb{E}(r_{i,j})$$
Mathematical Foundation
In the EE setting, the probability matching heuristic is to randomly select an item according to its probability of being optimal:
$$p(J^*(i_t) = j) = \int \mathbb{I}\Big[\mathbb{E}(r_{i,j} \mid u_i, v_j) = \max_{j'} \mathbb{E}(r_{i,j'} \mid u_i, v_{j'})\Big]\, p(u_i \mid \mathcal{D})\, p(v_j \mid \mathcal{D})\, du_i\, dv_j$$
The integral above is approximated by drawing random parameters from their posterior distributions [1].
Regret (optimal minus obtained reward):
$$\Delta_t = \max_j {u_i^*}^T v_j^* - \max_j u_i^T v_j$$
Cumulative regret:
$$\sum_{\tau=1}^{t} \Delta_\tau$$
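Under these definitions, cumulative regret is straightforward to compute; the sketch below uses noisy copies of the true factors as a stand-in for posterior samples (sizes, noise level, and the interpretation of $\Delta_t$ as the gap between the best true value and the true value of the played item are my choices):

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, K, T = 10, 10, 2, 100

U_true = rng.normal(size=(N, K))   # u_i^*
V_true = rng.normal(size=(M, K))   # v_j^*

regrets = []
for t in range(T):
    i = int(rng.integers(N))                     # user arriving at round t
    # stand-in for posterior samples of the factors (hypothetical noise level)
    U_hat = U_true + rng.normal(scale=0.5, size=(N, K))
    V_hat = V_true + rng.normal(scale=0.5, size=(M, K))
    j_played = int(np.argmax(U_hat[i] @ V_hat.T))   # item chosen from sampled factors
    best = float(np.max(U_true[i] @ V_true.T))      # max_j u_i*^T v_j*
    obtained = float(U_true[i] @ V_true[j_played])  # true value of the played item
    regrets.append(best - obtained)

cum_regret = np.cumsum(regrets)   # sum_{tau=1}^{t} Delta_tau
```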
Thompson Sampling
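For intuition, here is a minimal Thompson sampling loop for independent Bernoulli arms with conjugate Beta posteriors (a textbook special case; the thesis generalizes the idea to latent factor posteriors, and the success probabilities below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)

M = 4
true_p = np.array([0.2, 0.5, 0.7, 0.4])   # hypothetical Bernoulli success probabilities
alpha = np.ones(M)                        # Beta(1, 1) priors, one per arm
beta = np.ones(M)

for t in range(2000):
    theta = rng.beta(alpha, beta)         # one posterior draw per arm
    j = int(np.argmax(theta))             # play the arm whose draw is largest
    r = int(rng.random() < true_p[j])     # Bernoulli reward
    alpha[j] += r                         # conjugate Beta update
    beta[j] += 1 - r

pulls = alpha + beta - 2                  # how often each arm was played
```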
Distribution of Parameters and Hyperparameters
The likelihood of the rewards and the priors on the parameters are defined as:
$$p(R \mid U, V, \alpha) = \prod_{i=1}^{N} \prod_{j=1}^{M} \big[\mathcal{N}(r_{ij} \mid u_i^T v_j, \alpha^{-1})\big]^{I_{ij}}$$
$$p(U \mid \Lambda_U) = \prod_{i=1}^{N} \mathcal{N}(u_i \mid 0, \Lambda_U^{-1})$$
$$p(V \mid \Lambda_V) = \prod_{j=1}^{M} \mathcal{N}(v_j \mid 0, \Lambda_V^{-1})$$
Priors on the hyperparameters:
$$\Lambda_{U_d} \sim \Gamma(\Lambda_{U_d} \mid \alpha_0, \beta_0), \qquad \Lambda_{V_d} \sim \Gamma(\Lambda_{V_d} \mid \alpha_0, \beta_0)$$
$\Lambda_U, \Lambda_V$ are diagonal matrices and $\Lambda_{U_d}, \Lambda_{V_d}$ are their $d$-th diagonal elements.
Distribution of Parameters and Hyperparameters
The conditional posterior distribution of user $u_i$:
$$p(u_i \mid R, V, \alpha, \Lambda_U) = \mathcal{N}(u_i \mid \mu_{u_i}, \Sigma_{u_i})$$
$$\Sigma_{u_i} = \Big(\alpha \sum_{j=1}^{M} I_{ij}\, v_j v_j^T + \Lambda_U\Big)^{-1}$$
$$\mu_{u_i} = \alpha\, \Sigma_{u_i} \sum_{j=1}^{M} I_{ij}\, r_{ij} v_j$$
Similarly, the conditional posterior distribution of the items can be defined.
Due to the conjugacy between the Gaussian precision and the Gamma distribution, the posteriors of the hyperparameters are also Gamma:
$$\Lambda_{U_d} \sim \Gamma\Big(\Lambda_{U_d} \,\Big|\, \alpha_0 + \frac{N}{2},\; \beta_0 + \frac{1}{2}\sum_{i=1}^{N} u_{di}^2\Big)$$
$$\Lambda_{V_d} \sim \Gamma\Big(\Lambda_{V_d} \,\Big|\, \alpha_0 + \frac{M}{2},\; \beta_0 + \frac{1}{2}\sum_{j=1}^{M} v_{dj}^2\Big)$$
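The conditional posterior above translates directly into one Gibbs sampling step per user (numpy sketch; `R`, `V`, `alpha`, `Lambda_U`, and the observation mask are hypothetical stand-ins):

```python
import numpy as np

rng = np.random.default_rng(4)
N, M, K, alpha = 8, 12, 3, 2.0

V = rng.normal(size=(M, K))                 # current item factors
R = rng.normal(size=(N, M))                 # observed rewards (toy values)
I_mask = rng.random((N, M)) < 0.5           # I_ij: which ratings are observed
Lambda_U = np.eye(K)                        # diagonal prior precision

def sample_user(i):
    """One Gibbs draw from p(u_i | R, V, alpha, Lambda_U)."""
    obs = np.flatnonzero(I_mask[i])
    Vo = V[obs]
    Sigma = np.linalg.inv(alpha * Vo.T @ Vo + Lambda_U)   # posterior covariance
    mu = alpha * Sigma @ (Vo.T @ R[i, obs])               # posterior mean
    return rng.multivariate_normal(mu, Sigma)

u3 = sample_user(3)
```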
SGLD
Assume:
$X = \{x_i\}_{i=1}^N$ — a set of $N$ data points
$\theta$ — parameter vector
$p(\theta)$ — prior on the parameter
$p(x \mid \theta)$ — likelihood model
The posterior distribution of the parameter:
$$p(\theta \mid X) \propto p(\theta) \prod_{i=1}^{N} p(x_i \mid \theta)$$
SGD update equation [4] for MAP estimation:
$$\theta_{t+1} = \theta_t + \frac{\epsilon_t}{2}\Big(\nabla \log p(\theta_t) + \frac{N}{n} \sum_{i=1}^{n} \nabla \log p(x_{t_i} \mid \theta_t)\Big)$$
The gradient is calculated on a mini-batch of size $n$.
MAP estimation does not capture uncertainty in the parameter.
SGLD
Langevin Dynamics [2]:
$$\theta_{t+1} = \theta_t + \frac{\epsilon}{2}\Big(\nabla \log p(\theta_t) + \sum_{i=1}^{N} \nabla \log p(x_i \mid \theta_t)\Big) + \eta_t, \qquad \eta_t \sim \mathcal{N}(0, \epsilon)$$
The gradient is computed over the whole dataset.
Requires a Metropolis-Hastings (MH) accept-reject test.
SGLD [6]:
$$\theta_{t+1} = \theta_t + \frac{\epsilon_t}{2}\Big(\nabla \log p(\theta_t) + \frac{N}{n} \sum_{i=1}^{n} \nabla \log p(x_{t_i} \mid \theta_t)\Big) + \eta_t, \qquad \eta_t \sim \mathcal{N}(0, \epsilon_t)$$
As the step size $\epsilon_t$ tends to zero, the acceptance rate tends to one, so no MH accept-reject test is needed.
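A minimal SGLD sketch for a toy problem: sampling the posterior of a Gaussian mean with known unit variance from mini-batches (the fixed step size, prior scale, and data are my choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)

N, n = 1000, 50                              # dataset size, mini-batch size
x = rng.normal(loc=2.0, scale=1.0, size=N)   # data with unknown mean

theta = 0.0                                  # parameter; prior N(0, 10^2)
eps = 1e-3                                   # fixed step size (SGLD anneals it in theory)
samples = []
for t in range(2000):
    batch = rng.choice(x, size=n, replace=False)
    grad_prior = -theta / 100.0                    # grad of log N(theta | 0, 10^2)
    grad_lik = (N / n) * np.sum(batch - theta)     # rescaled mini-batch gradient (sigma = 1)
    theta += (eps / 2) * (grad_prior + grad_lik) + rng.normal(scale=np.sqrt(eps))
    if t >= 1000:                                  # discard burn-in
        samples.append(theta)
```

The injected noise of variance $\epsilon$ is what turns the SGD iterates into (approximate) posterior samples.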
SGLD based Thompson Sampling
Sample the user and item feature vectors using SGLD:
$$\Delta u_i = \frac{\epsilon_t}{2}\Big(\alpha \sum_{j=1}^{|\Omega_i|} \big(r_{i,j} - f(u_i^T v_j)\big)\, v_j - \Lambda_U u_i\Big) + \eta_t, \qquad \eta_t \sim \mathcal{N}(0, \epsilon_t)$$
$$\Delta v_j = \frac{\epsilon_t}{2}\Big(\alpha \sum_{i=1}^{|\Omega_j|} \big(r_{i,j} - f(u_i^T v_j)\big)\, u_i - \Lambda_V v_j\Big) + \eta_t, \qquad \eta_t \sim \mathcal{N}(0, \epsilon_t)$$
Sample the hyperparameters using Gibbs:
$$\Lambda_{U_d} \sim \Gamma\Big(\Lambda_{U_d} \,\Big|\, \alpha_0 + \frac{N}{2},\; \beta_0 + \frac{1}{2}\sum_{i=1}^{N} u_{di}^2\Big), \qquad \Lambda_{V_d} \sim \Gamma\Big(\Lambda_{V_d} \,\Big|\, \alpha_0 + \frac{M}{2},\; \beta_0 + \frac{1}{2}\sum_{j=1}^{M} v_{dj}^2\Big)$$
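One joint SGLD update of all user and item vectors, following the two equations above, might look as follows (identity link $f$, fully observed toy ratings, and all constants are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
N, M, K = 6, 8, 2
alpha, eps = 2.0, 1e-3

U = rng.normal(size=(N, K))
V = rng.normal(size=(M, K))
R = U @ V.T + rng.normal(scale=0.1, size=(N, M))   # toy real-valued rewards
Lambda_U = np.eye(K)
Lambda_V = np.eye(K)

def sgld_step(U, V):
    """One SGLD update of U and V (identity link f; every (i, j) pair observed)."""
    err = R - U @ V.T                               # r_ij - f(u_i^T v_j)
    dU = (eps / 2) * (alpha * err @ V - U @ Lambda_U) \
         + rng.normal(scale=np.sqrt(eps), size=U.shape)
    dV = (eps / 2) * (alpha * err.T @ U - V @ Lambda_V) \
         + rng.normal(scale=np.sqrt(eps), size=V.shape)
    return U + dU, V + dV

U1, V1 = sgld_step(U, V)
```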
Results
If an algorithm explores forever or exploits forever, it will have linear total regret.
SGLD based Thompson sampling achieves sub-linear cumulative regret.
Figure: Cumulative regret on synthetic data of different sizes (panels: (a) N=10, M=10, K=1; (b) N=10, M=10, K=3; (c) N=20, M=20, K=1; (d) N=30, M=30, K=1).
Pólya-Gamma
If the rewards are binary or count-valued, the likelihood is not conjugate with the prior.
It is difficult to update the posterior distribution in an online manner.
The Pólya-Gamma (PG) augmentation scheme allows us to derive an inference algorithm that looks like simple Bayesian linear regression.
Pólya-Gamma data augmentation
The Pólya-Gamma augmentation scheme allows us to rewrite the Negative Binomial likelihood as:
$$\frac{(\exp(\psi))^a}{(1 + \exp(\psi))^b} = 2^{-b} e^{\kappa \psi} \int_0^\infty e^{-\omega \psi^2 / 2}\, p(\omega)\, d\omega$$
The log-likelihood $Q(\beta)$ is a quadratic form in $\beta$ [3]:
$$Q(\beta) = \sum_{n=1}^{N} \Big(-\frac{1}{2}\, \omega_n \psi_n^2 + \kappa_n \psi_n\Big)$$
where $\kappa_n = a_n - b_n/2$, $\psi_n = x_n^T \beta$, and $\omega_n \sim \mathrm{PG}(b_n, 0)$.
Pólya-Gamma data augmentation
Logistic-Bernoulli likelihood:
$$Q(\xi_{ij} \mid \omega_{ij}) \propto \exp\Big(-\frac{1}{2}\, \omega_{ij} \big(\kappa_{ij}/\omega_{ij} - \xi_{ij}\big)^2\Big)$$
where $\kappa_{ij} = y_{ij} - 0.5$ and $\xi_{ij} = u_i^T v_j$.
Posterior on $\omega$:
$$p(\omega \mid b, \hat{\psi}) = \frac{\exp(-\omega \hat{\psi}^2 / 2)\, p(\omega \mid b, 0)}{\int_0^\infty \exp(-\omega \hat{\psi}^2 / 2)\, p(\omega \mid b, 0)\, d\omega} = \mathrm{PG}(b, \hat{\psi})$$
where $p(\omega \mid b, 0)$ is the Pólya-Gamma density.
Pólya-Gamma Sampling Scheme
To draw a sample from the posterior of the parameters, simply iterate the following two steps:
$$(\omega_{ij} \mid u_i) \sim \mathrm{PG}(1, v_j^T u_i) \qquad \forall j \in \Omega_i$$
$$(u_i \mid R_i, \omega_i) \sim \mathcal{N}(\mu_{u_i}, \Sigma_{u_i})$$
where
$$\Sigma_{u_i} = \Big(\sum_{j=1}^{|\Omega_i|} v_j v_j^T \omega_{ij} + \lambda I\Big)^{-1}, \qquad \mu_{u_i} = \Sigma_{u_i} \Big(\sum_{j=1}^{|\Omega_i|} v_j \kappa_{ij}\Big)$$
with an $\mathcal{N}(0, \lambda^{-1} I)$ prior over the user feature vectors.
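A PG(1, c) draw can be approximated by truncating the Pólya-Gamma series representation $\omega = \frac{1}{2\pi^2}\sum_k \frac{g_k}{(k - 1/2)^2 + c^2/(4\pi^2)}$ with $g_k \sim \Gamma(1, 1)$; the two-step scheme above then becomes a short Gibbs loop (the truncation level and the toy data are my choices, not the thesis's):

```python
import numpy as np

rng = np.random.default_rng(7)

def sample_pg1(c, trunc=200):
    """Approximate PG(1, c) draw via the truncated series representation."""
    k = np.arange(1, trunc + 1)
    g = rng.gamma(shape=1.0, scale=1.0, size=trunc)
    return float(np.sum(g / ((k - 0.5) ** 2 + (c / (2 * np.pi)) ** 2)) / (2 * np.pi ** 2))

K, lam = 2, 1.0
V = rng.normal(size=(5, K))        # item vectors the user has interacted with
y = np.array([1, 0, 1, 1, 0])      # binary rewards
kappa = y - 0.5
u = np.zeros(K)

for sweep in range(50):            # alternate the two conditional draws
    omega = np.array([sample_pg1(float(V[j] @ u)) for j in range(len(y))])
    Sigma = np.linalg.inv((V.T * omega) @ V + lam * np.eye(K))   # Sigma_{u_i}
    mu = Sigma @ (V.T @ kappa)                                   # mu_{u_i}
    u = rng.multivariate_normal(mu, Sigma)
```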
Algorithm
Algorithm 1: Pólya-Gamma Stochastic Gradient Descent (PGSGD)
Input: λ
1: Initialize: A ← λI; b ← 0
2: for t = 1, …, T do
3:   Receive user i_t
4:   Σ_{i_t} = A^{-1}; μ_{i_t} = A^{-1} b
5:   Sample u_{i_t} from N(μ_{i_t}, Σ_{i_t})
6:   Get the MAP solution of item v_j, ∀ j ∈ Ω_{u_{i_t}}
7:   Choose item j*(i_t) = argmax_j u_{i_t}^T v_j
8:   Receive the reward r_{i_t, j*(i_t)}
9:   Sample ω_{ij} ∼ PG(1, u_{i_t}^T v_{j*(i_t)})
10:  κ_{ij} = r_{i_t, j*(i_t)} − 1/2
11:  Update A ← A + v_{j*(i_t)} v_{j*(i_t)}^T ω_{ij}
12:  Update b ← b + v_{j*(i_t)} κ_{ij}
13:  Update the item feature vectors at the MAP solution using SGD
14: end for
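A sketch of this loop for a single user, with the item vectors held fixed (step 13 omitted), PG draws approximated by the truncated series, and all data hypothetical:

```python
import numpy as np

rng = np.random.default_rng(8)

def sample_pg1(c, trunc=200):
    # Truncated series approximation of a PG(1, c) draw
    k = np.arange(1, trunc + 1)
    g = rng.gamma(shape=1.0, scale=1.0, size=trunc)
    return float(np.sum(g / ((k - 0.5) ** 2 + (c / (2 * np.pi)) ** 2)) / (2 * np.pi ** 2))

K, M, lam, T = 2, 6, 1.0, 300
u_true = rng.normal(size=K)            # the user's unknown true features
V = rng.normal(size=(M, K))            # item features (fixed here; step 13 would SGD-update them)

A = lam * np.eye(K)                    # running posterior precision
b = np.zeros(K)

for t in range(T):
    Sigma = np.linalg.inv(A)
    mu = Sigma @ b
    u = rng.multivariate_normal(mu, Sigma)         # Thompson draw for the user (step 5)
    j = int(np.argmax(V @ u))                      # pull the best-looking arm (step 7)
    p = 1.0 / (1.0 + np.exp(-float(V[j] @ u_true)))
    r = float(rng.random() < p)                    # binary reward (step 8)
    omega = sample_pg1(float(V[j] @ u))            # step 9
    A += np.outer(V[j], V[j]) * omega              # precision update (step 11)
    b += V[j] * (r - 0.5)                          # kappa = r - 1/2 (steps 10, 12)
```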
Results
Figure: Jester real-valued dataset
Figure: Jester binary-valued dataset
Summary and Future Work
Presented a fast and generalized SGLD based Thompson sampling algorithm.
Can handle any type of reward.
Can handle constraints on the parameter space.
Presented an efficient algorithm for binary or count-valued data.
Future work: analysis of the regret bound.
Questions?
References
[1] O. Chapelle and L. Li. An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems, pages 2249–2257, 2011.
[2] R. M. Neal. MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 2:113–162, 2011.
[3] N. G. Polson, J. G. Scott, and J. Windle. Bayesian inference for logistic models using Pólya-Gamma latent variables. Journal of the American Statistical Association, 108(504):1339–1349, 2013.
[4] H. Robbins and S. Monro. A stochastic approximation method. The Annals of Mathematical Statistics, pages 400–407, 1951.
[5] H. Wang, Q. Wu, and H. Wang. Factorization bandits for interactive recommendation. 2017.
[6] M. Welling and Y. W. Teh. Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 681–688, 2011.

More Related Content

What's hot

Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...Edureka!
 
Scaling Multinomial Logistic Regression via Hybrid Parallelism
Scaling Multinomial Logistic Regression via Hybrid ParallelismScaling Multinomial Logistic Regression via Hybrid Parallelism
Scaling Multinomial Logistic Regression via Hybrid ParallelismParameswaran Raman
 
Comparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for RegressionComparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for RegressionSeonho Park
 
CPS(M): Constraint Satisfaction Problem over Models (a.k.a rule based design ...
CPS(M): Constraint Satisfaction Problem over Models (a.k.a rule based design ...CPS(M): Constraint Satisfaction Problem over Models (a.k.a rule based design ...
CPS(M): Constraint Satisfaction Problem over Models (a.k.a rule based design ...Ákos Horváth
 
Two-layered Summaries for Mobile Search: Does the Evaluation Measure Reflect ...
Two-layered Summaries for Mobile Search: Does the Evaluation Measure Reflect ...Two-layered Summaries for Mobile Search: Does the Evaluation Measure Reflect ...
Two-layered Summaries for Mobile Search: Does the Evaluation Measure Reflect ...kt.mako
 
Introduction into machine learning
Introduction into machine learningIntroduction into machine learning
Introduction into machine learningmohamed Naas
 
Learning to Reconstruct
Learning to ReconstructLearning to Reconstruct
Learning to ReconstructJonas Adler
 
Data-driven Analysis for Multi-agent Trajectories in Team Sports
Data-driven Analysis for Multi-agent Trajectories in Team SportsData-driven Analysis for Multi-agent Trajectories in Team Sports
Data-driven Analysis for Multi-agent Trajectories in Team SportsKeisuke Fujii
 
Context-guided Learning to Rank Entities
Context-guided Learning to Rank EntitiesContext-guided Learning to Rank Entities
Context-guided Learning to Rank Entitieskt.mako
 
Ml1 introduction to-supervised_learning_and_k_nearest_neighbors
Ml1 introduction to-supervised_learning_and_k_nearest_neighborsMl1 introduction to-supervised_learning_and_k_nearest_neighbors
Ml1 introduction to-supervised_learning_and_k_nearest_neighborsankit_ppt
 
Spectrum Analytic Approach for Cooperative Navigation of Connected and Autono...
Spectrum Analytic Approach for Cooperative Navigation of Connected and Autono...Spectrum Analytic Approach for Cooperative Navigation of Connected and Autono...
Spectrum Analytic Approach for Cooperative Navigation of Connected and Autono...M. Ilhan Akbas
 
Result analysis of mining fast frequent itemset using compacted data
Result analysis of mining fast frequent itemset using compacted dataResult analysis of mining fast frequent itemset using compacted data
Result analysis of mining fast frequent itemset using compacted dataijistjournal
 
Machine Learning for Dummies (without mathematics)
Machine Learning for Dummies (without mathematics)Machine Learning for Dummies (without mathematics)
Machine Learning for Dummies (without mathematics)ActiveEon
 
Deep Neural Networks for Multimodal Learning
Deep Neural Networks for Multimodal LearningDeep Neural Networks for Multimodal Learning
Deep Neural Networks for Multimodal LearningMarc Bolaños Solà
 
Machine learning algorithm for classification of activity of daily life’s
Machine learning algorithm for classification of activity of daily life’sMachine learning algorithm for classification of activity of daily life’s
Machine learning algorithm for classification of activity of daily life’sSiddharth Chakravarty
 
LOG MESSAGE ANOMALY DETECTION WITH OVERSAMPLING
LOG MESSAGE ANOMALY DETECTION WITH OVERSAMPLINGLOG MESSAGE ANOMALY DETECTION WITH OVERSAMPLING
LOG MESSAGE ANOMALY DETECTION WITH OVERSAMPLINGijaia
 
A preliminary study of diversity in ELM ensembles (HAIS 2018)
A preliminary study of diversity in ELM ensembles (HAIS 2018)A preliminary study of diversity in ELM ensembles (HAIS 2018)
A preliminary study of diversity in ELM ensembles (HAIS 2018)Carlos Perales
 
25 Machine Learning Unsupervised Learaning K-means K-centers
25 Machine Learning Unsupervised Learaning K-means K-centers25 Machine Learning Unsupervised Learaning K-means K-centers
25 Machine Learning Unsupervised Learaning K-means K-centersAndres Mendez-Vazquez
 

What's hot (20)

Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
 
Scaling Multinomial Logistic Regression via Hybrid Parallelism
Scaling Multinomial Logistic Regression via Hybrid ParallelismScaling Multinomial Logistic Regression via Hybrid Parallelism
Scaling Multinomial Logistic Regression via Hybrid Parallelism
 
Comparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for RegressionComparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for Regression
 
CPS(M): Constraint Satisfaction Problem over Models (a.k.a rule based design ...
CPS(M): Constraint Satisfaction Problem over Models (a.k.a rule based design ...CPS(M): Constraint Satisfaction Problem over Models (a.k.a rule based design ...
CPS(M): Constraint Satisfaction Problem over Models (a.k.a rule based design ...
 
Two-layered Summaries for Mobile Search: Does the Evaluation Measure Reflect ...
Two-layered Summaries for Mobile Search: Does the Evaluation Measure Reflect ...Two-layered Summaries for Mobile Search: Does the Evaluation Measure Reflect ...
Two-layered Summaries for Mobile Search: Does the Evaluation Measure Reflect ...
 
Introduction into machine learning
Introduction into machine learningIntroduction into machine learning
Introduction into machine learning
 
Learning to Reconstruct
Learning to ReconstructLearning to Reconstruct
Learning to Reconstruct
 
Data-driven Analysis for Multi-agent Trajectories in Team Sports
Data-driven Analysis for Multi-agent Trajectories in Team SportsData-driven Analysis for Multi-agent Trajectories in Team Sports
Data-driven Analysis for Multi-agent Trajectories in Team Sports
 
Context-guided Learning to Rank Entities
Context-guided Learning to Rank Entities
Ml1 introduction to-supervised_learning_and_k_nearest_neighbors
Spectrum Analytic Approach for Cooperative Navigation of Connected and Autono...
Result analysis of mining fast frequent itemset using compacted data
ML Basics
Machine Learning for Dummies (without mathematics)
Deep Neural Networks for Multimodal Learning
Machine learning algorithm for classification of activity of daily life’s
Machine Learning
LOG MESSAGE ANOMALY DETECTION WITH OVERSAMPLING
A preliminary study of diversity in ELM ensembles (HAIS 2018)
25 Machine Learning Unsupervised Learaning K-means K-centers

Similar to Efficient Bayesian Inference for Online Matrix Factorization in Bandit Settings

Handout simulasi computer
Learning from Computer Simulation to Tackle Real-World Problems
Sbst15 tooldemo.ppt
What if computers invigilate examinations - Cypher 2018
Sbst16 tooldemo.ppt
Introduction to reinforcement learning
[Paper Review] Personalized Top-N Sequential Recommendation via Convolutional...
11 Machine Learning Important Issues in Machine Learning
final report (ppt)
Introduction to behavior based recommendation system
Programing Slicing and Its applications
方策勾配型強化学習の基礎と応用
Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search


Efficient Bayesian Inference for Online Matrix Factorization in Bandit Settings

  • 1. Efficient Bayesian Inference for Online Matrix Factorization in Bandit settings Vijay Pal Jat Supervisor: Dr. Piyush Rai Department of Computer Science and Engineering Indian Institute of Technology Kanpur January 26, 2020 Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 1 / 37
  • 2. Overview 1 Introduction Online Interactive Recommender System Matrix Factorization Existing Solutions Thompson Sampling 2 SGLD based Thompson Sampling SGLD Results 3 Pólya-Gamma based Thompson Sampling Pólya-Gamma Algorithm 4 Results 5 Conclusion Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 2 / 37
  • 3. Outline 1 Introduction Online Interactive Recommender System Matrix Factorization Existing Solutions Thompson Sampling 2 SGLD based Thompson Sampling SGLD Results 3 Pólya-Gamma based Thompson Sampling Pólya-Gamma Algorithm 4 Results 5 Conclusion Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 3 / 37
  • 4. Online Interactive Recommender System Online Interactive Recommender System Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 4 / 37
  • 5. Online Interactive Recommender System Online Interactive Recommender System Challenges: We don’t know about users a priori. Users’ preferences may change over time. We don’t know the correlation between items. Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 4 / 37
  • 6. Online Interactive Recommender System Online Interactive Recommender System Challenges: We don’t know about users a priori. Users’ preferences may change over time. We don’t know the correlation between items. Approach: Matrix Factorization technique with Bandit algorithm Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 4 / 37
  • 7. Outline 1 Introduction Online Interactive Recommender System Matrix Factorization Existing Solutions Thompson Sampling 2 SGLD based Thompson Sampling SGLD Results 3 Pólya-Gamma based Thompson Sampling Pólya-Gamma Algorithm 4 Results 5 Conclusion Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 5 / 37
  • 8. Matrix Factorization Decompose the rating matrix R into low-rank user (U) and item (V) feature matrices: R ≈ UV^T, with U ∈ R^(N×K), V ∈ R^(M×K) and K ≪ min(N, M). An entry of the matrix is represented as r_ij ≈ u_i^T v_j. Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 6 / 37
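The decomposition on this slide can be sketched numerically; the sizes N, M, K below are made-up toy values, not the thesis's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 10, 8, 3  # toy sizes: N users, M items, rank K << min(N, M)

# Low-rank factors: each user and each item gets a K-dimensional latent vector.
U = rng.normal(size=(N, K))
V = rng.normal(size=(M, K))

R = U @ V.T          # full rating matrix, R ≈ U V^T
r_ij = U[2] @ V[5]   # a single entry: r_ij ≈ u_i^T v_j
```

The rank of R is bounded by K, which is what makes storing and updating the factors cheap compared to the full N×M matrix.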
  • 9. Contextual Bandit Setting Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 7 / 37
  • 10. Contextual Bandit Setting Let v1, v2, .., vM be the set of arms (items). Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 7 / 37
  • 11. Contextual Bandit Setting Let v1, v2, ..., vM be the set of arms (items). Given a user ui, the model decides to pull an arm vj, j ∈ [M]. Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 7 / 37
  • 12. Contextual Bandit Setting Let v1, v2, ..., vM be the set of arms (items). Given a user ui, the model decides to pull an arm vj, j ∈ [M]. The model receives a reward ri,j, which is determined by the user and the pulled arm. Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 7 / 37
  • 13. Contextual Bandit Setting Let v1, v2, ..., vM be the set of arms (items). Given a user ui, the model decides to pull an arm vj, j ∈ [M]. The model receives a reward ri,j, which is determined by the user and the pulled arm. Objective: Maximize the cumulative reward. Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 7 / 37
  • 14. Contextual Bandit Setting Let v1, v2, ..., vM be the set of arms (items). Given a user ui, the model decides to pull an arm vj, j ∈ [M]. The model receives a reward ri,j, which is determined by the user and the pulled arm. Objective: Maximize the cumulative reward. Strategy: Play according to the Explore-Exploit (EE) trade-off. Exploration: Gather more information about each arm. Exploitation: Pull the seemingly most rewarding arm. Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 7 / 37
  • 15. Contextual Bandit Setting Let v1, v2, ..., vM be the set of arms (items). Given a user ui, the model decides to pull an arm vj, j ∈ [M]. The model receives a reward ri,j, which is determined by the user and the pulled arm. Objective: Maximize the cumulative reward. Strategy: Play according to the Explore-Exploit (EE) trade-off. Exploration: Gather more information about each arm. Exploitation: Pull the seemingly most rewarding arm. Contextual information ⇒ user and item latent feature vectors, which are not given explicitly. Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 7 / 37
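The interaction loop described on these slides can be sketched as below. This is only an illustrative toy: the true contexts, the noise scales, and the perturbation used as a stand-in for posterior sampling are all assumptions, not the thesis's algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, K, T = 5, 6, 2, 200

U_true = rng.normal(size=(N, K))  # unknown true user contexts (toy data)
V_true = rng.normal(size=(M, K))  # unknown true item contexts: the "arms"

total_reward = 0.0
for t in range(T):
    i = rng.integers(N)  # a user arrives
    # Explore-exploit: here exploration is mimicked by perturbing the true
    # parameters; a real agent would instead sample from its posterior.
    U_hat = U_true + rng.normal(scale=0.1, size=U_true.shape)
    V_hat = V_true + rng.normal(scale=0.1, size=V_true.shape)
    j = int(np.argmax(V_hat @ U_hat[i]))  # pull the seemingly best arm
    # Observe a noisy reward r_ij determined by the user and the pulled arm.
    total_reward += U_true[i] @ V_true[j] + rng.normal(scale=0.1)
```

The randomness in the chosen arm j is what provides exploration; as the perturbation shrinks, the loop degenerates to pure exploitation.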
  • 16. Outline 1 Introduction Online Interactive Recommender System Matrix Factorization Existing Solutions Thompson Sampling 2 SGLD based Thompson Sampling SGLD Results 3 P´olya-Gamma based Thompson Sampling P´olya-Gamma Algorithm 4 Results 5 Conclusion Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 8 / 37
  • 17. Existing Solutions Matrix Factorization based solution: A good matrix completion algorithm does not imply a good recommender system. It is useless to know how well the worst items are predicted; we never recommend them. A rough estimate of the ratings of the worst items is enough. Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 9 / 37
  • 18. Existing Solutions Matrix Factorization based solution: A good matrix completion algorithm does not imply a good recommender system. It is useless to know how well the worst items are predicted; we never recommend them. A rough estimate of the ratings of the worst items is enough. Multi-Armed Bandit algorithm based solutions Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 9 / 37
  • 19. Existing Solutions UCB based solution: Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 10 / 37
  • 20. Existing Solutions UCB based solution: Use additional information about users and items to improve convergence rate. e.g. [5] Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 10 / 37
  • 21. Existing Solutions UCB based solution: Use additional information about users and items to improve the convergence rate, e.g. [5]. Thompson sampling based solution: Learn point estimates of both factors U and V, or learn a point estimate of one of the factors (e.g. V) and update the posterior distribution of the other (U), e.g. ICF. Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 10 / 37
  • 22. Existing Solutions UCB based solution: Use additional information about users and items to improve the convergence rate, e.g. [5]. Thompson sampling based solution: Learn point estimates of both factors U and V, or learn a point estimate of one of the factors (e.g. V) and update the posterior distribution of the other (U), e.g. ICF. Requires local conjugacy between the likelihood and the prior distribution. Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 10 / 37
  • 23. Existing Solutions UCB based solution: Use additional information about users and items to improve the convergence rate, e.g. [5]. Thompson sampling based solution: Learn point estimates of both factors U and V, or learn a point estimate of one of the factors (e.g. V) and update the posterior distribution of the other (U), e.g. ICF. Requires local conjugacy between the likelihood and the prior distribution. Works better if the reward is real-valued. Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 10 / 37
  • 24. Our Model Our Approach: Does not require local conjugacy between likelihood and prior. Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 11 / 37
  • 25. Our Model Our Approach: Does not require local conjugacy between likelihood and prior. Does not use additional information. Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 11 / 37
  • 26. Our Model Our Approach: Does not require local conjugacy between likelihood and prior. Does not use additional information. Works better for any type of rewards. Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 11 / 37
  • 27. Our Model Our Approach: Does not require local conjugacy between likelihood and prior. Does not use additional information. Works better for any type of rewards. Updates both posterior U and V simultaneously. Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 11 / 37
  • 28. Mathematical Foundation Reward is a stochastic function of the action and the unknown true contexts of items (V*) and users (U*). Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 12 / 37
  • 29. Mathematical Foundation Reward is a stochastic function of the action and the unknown true contexts of items (V*) and users (U*). Choose the item that maximizes the expected reward: J*(i) = max_j E(r_ij | v_j*, u_i*). Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 12 / 37
  • 30. Mathematical Foundation Reward is a stochastic function of the action and the unknown true contexts of items (V*) and users (U*). Choose the item that maximizes the expected reward: J*(i) = max_j E(r_ij | v_j*, u_i*). Exploitation case (maximizing the immediate reward): E(r_ij) = ∫ E(r_ij | v_j, u_i) p(v_j, u_i | D) du_i dv_j, and J*(i) = max_j E(r_ij). Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 12 / 37
  • 31. Mathematical Foundation For the EE setting, the probability matching heuristic is to randomly select an item according to its probability of being optimal: p(J*(i_t) = j) = ∫ I[E(r_ij | u_i, v_j) = max_j' E(r_ij' | u_i, v_j')] p(u_i | D) p(v_j | D) du_i dv_j. Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 13 / 37
  • 32. Mathematical Foundation For the EE setting, the probability matching heuristic is to randomly select an item according to its probability of being optimal: p(J*(i_t) = j) = ∫ I[E(r_ij | u_i, v_j) = max_j' E(r_ij' | u_i, v_j')] p(u_i | D) p(v_j | D) du_i dv_j. The above integral is approximated by drawing random parameters from their distributions [1]. Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 13 / 37
  • 33. Mathematical Foundation For the EE setting, the probability matching heuristic is to randomly select an item according to its probability of being optimal: p(J*(i_t) = j) = ∫ I[E(r_ij | u_i, v_j) = max_j' E(r_ij' | u_i, v_j')] p(u_i | D) p(v_j | D) du_i dv_j. The above integral is approximated by drawing random parameters from their distributions [1]. Regret (optimal minus obtained reward): Δ_t = max_j u_i*^T v_j* − max_j u_i^T v_j. Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 13 / 37
  • 34. Mathematical Foundation For the EE setting, the probability matching heuristic is to randomly select an item according to its probability of being optimal: p(J*(i_t) = j) = ∫ I[E(r_ij | u_i, v_j) = max_j' E(r_ij' | u_i, v_j')] p(u_i | D) p(v_j | D) du_i dv_j. The above integral is approximated by drawing random parameters from their distributions [1]. Regret (optimal minus obtained reward): Δ_t = max_j u_i*^T v_j* − max_j u_i^T v_j. Cumulative regret: Σ_{τ=1}^{t} Δ_τ. Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 13 / 37
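The regret definitions above can be made concrete on toy data. The sizes and the shrinking estimation noise (standing in for a posterior that concentrates over time) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
M, K = 8, 3
u_true = rng.normal(size=K)          # fixed user's true context
V_true = rng.normal(size=(M, K))     # true item contexts

def instant_regret(u_est, V_est):
    """Per-round regret: expected reward of the truly best arm minus the
    expected reward of the arm chosen from the estimated parameters."""
    j_opt = int(np.argmax(V_true @ u_true))  # best arm under the true model
    j_sel = int(np.argmax(V_est @ u_est))    # arm the agent actually pulls
    return V_true[j_opt] @ u_true - V_true[j_sel] @ u_true

# As the estimates converge to the truth, per-round regret drops toward
# zero, which is what makes cumulative regret grow sub-linearly.
deltas = [instant_regret(u_true + rng.normal(scale=1.0 / (t + 1), size=K),
                         V_true)
          for t in range(100)]
cumulative_regret = np.cumsum(deltas)
```

Per-round regret is non-negative by construction, so the cumulative curve is non-decreasing; a flattening curve indicates the agent has learned the optimal arm.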
  • 35. Outline 1 Introduction Online Interactive Recommender System Matrix Factorization Existing Solutions Thompson Sampling 2 SGLD based Thompson Sampling SGLD Results 3 Pólya-Gamma based Thompson Sampling Pólya-Gamma Algorithm 4 Results 5 Conclusion Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 14 / 37
  • 36. Thompson Sampling Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 15 / 37
  • 37. Distribution of Parameters and Hyperparameters The likelihood of the rewards and the priors on the parameters are defined as: p(R | U, V, α) = Π_{i=1}^{N} Π_{j=1}^{M} N(r_ij | u_i^T v_j, α^{-1})^{I_ij}; p(U | Λ_U) = Π_{i=1}^{N} N(u_i | 0, Λ_U^{-1}); p(V | Λ_V) = Π_{j=1}^{M} N(v_j | 0, Λ_V^{-1}) Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 16 / 37
  • 38. Distribution of Parameters and Hyperparameters The likelihood of the rewards and the priors on the parameters are defined as: p(R | U, V, α) = Π_{i=1}^{N} Π_{j=1}^{M} N(r_ij | u_i^T v_j, α^{-1})^{I_ij}; p(U | Λ_U) = Π_{i=1}^{N} N(u_i | 0, Λ_U^{-1}); p(V | Λ_V) = Π_{j=1}^{M} N(v_j | 0, Λ_V^{-1}) Priors on the hyperparameters: Λ_Ud ~ Γ(α_0, β_0), Λ_Vd ~ Γ(α_0, β_0) Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 16 / 37
  • 39. Distribution of Parameters and Hyperparameters The likelihood of the rewards and the priors on the parameters are defined as: p(R | U, V, α) = Π_{i=1}^{N} Π_{j=1}^{M} N(r_ij | u_i^T v_j, α^{-1})^{I_ij}; p(U | Λ_U) = Π_{i=1}^{N} N(u_i | 0, Λ_U^{-1}); p(V | Λ_V) = Π_{j=1}^{M} N(v_j | 0, Λ_V^{-1}) Priors on the hyperparameters: Λ_Ud ~ Γ(α_0, β_0), Λ_Vd ~ Γ(α_0, β_0). Λ_U, Λ_V are diagonal matrices and Λ_Ud, Λ_Vd are their d-th diagonal elements. Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 16 / 37
  • 40. Distribution of Parameters and Hyperparameters The conditional posterior distribution of user ui: p(u_i | R, V, α, Λ_U) = N(u_i | μ_ui, Σ_ui), with precision Σ_ui^{-1} = α Σ_{j=1}^{M} v_j v_j^T I_ij + Λ_U and mean μ_ui = α Σ_ui Σ_{j=1}^{M} r_ij v_j I_ij. Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 17 / 37
  • 41. Distribution of Parameters and Hyperparameters The conditional posterior distribution of user ui: p(u_i | R, V, α, Λ_U) = N(u_i | μ_ui, Σ_ui), with precision Σ_ui^{-1} = α Σ_{j=1}^{M} v_j v_j^T I_ij + Λ_U and mean μ_ui = α Σ_ui Σ_{j=1}^{M} r_ij v_j I_ij. Similarly, the conditional posterior distribution of the items can be defined. Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 17 / 37
  • 42. Distribution of Parameters and Hyperparameters The conditional posterior distribution of user ui: p(u_i | R, V, α, Λ_U) = N(u_i | μ_ui, Σ_ui), with precision Σ_ui^{-1} = α Σ_{j=1}^{M} v_j v_j^T I_ij + Λ_U and mean μ_ui = α Σ_ui Σ_{j=1}^{M} r_ij v_j I_ij. Similarly, the conditional posterior distribution of the items can be defined. Due to conjugacy between the Gaussian's precision and the Gamma distribution, the posterior of each hyperparameter is also Gamma: Λ_Ud ~ Γ(α_0 + N/2, β_0 + (1/2) Σ_{i=1}^{N} u_di²), Λ_Vd ~ Γ(α_0 + M/2, β_0 + (1/2) Σ_{j=1}^{M} v_dj²) Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 17 / 37
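The Gaussian conditional update for a single user can be sketched directly from the formulas above. The sizes, the fixed prior precision, and the simulated ratings are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
M, K, alpha = 6, 2, 4.0
Lambda_U = np.eye(K)                  # prior precision (hyperparameters fixed here)
V = rng.normal(size=(M, K))           # current item factors
I_i = rng.integers(0, 2, size=M)      # I_ij = 1 iff rating r_ij is observed
r_i = V @ rng.normal(size=K) + rng.normal(scale=0.2, size=M)  # simulated ratings

# Conditional posterior of u_i: Gaussian with
#   precision = Lambda_U + alpha * sum_j I_ij v_j v_j^T
#   mean      = alpha * precision^{-1} * sum_j I_ij r_ij v_j
precision = Lambda_U + alpha * (V.T * I_i) @ V
mean = alpha * np.linalg.solve(precision, (V.T * I_i) @ r_i)

# Thompson sampling draws u_i from this conditional posterior.
u_sample = rng.multivariate_normal(mean, np.linalg.inv(precision))
```

Note that only the observed entries (I_ij = 1) contribute to the precision and the mean, which is what makes the update cheap in the sparse-feedback bandit setting.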
  • 43. Outline 1 Introduction Online Interactive Recommender System Matrix Factorization Existing Solutions Thompson Sampling 2 SGLD based Thompson Sampling SGLD Results 3 Pólya-Gamma based Thompson Sampling Pólya-Gamma Algorithm 4 Results 5 Conclusion Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 18 / 37
  • 44. SGLD Assume X = {x_i}_{i=1}^{N} is a set of N data points, θ is the parameter vector, p(θ) is the prior on the parameter, and p(x | θ) is the likelihood model. Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 19 / 37
  • 45. SGLD Assume X = {x_i}_{i=1}^{N} is a set of N data points, θ is the parameter vector, p(θ) is the prior on the parameter, and p(x | θ) is the likelihood model. The posterior distribution of the parameter: p(θ | X) ∝ p(θ) · Π_{i=1}^{N} p(x_i | θ) Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 19 / 37
  • 46. SGLD Assume X = {x_i}_{i=1}^{N} is a set of N data points, θ is the parameter vector, p(θ) is the prior on the parameter, and p(x | θ) is the likelihood model. The posterior distribution of the parameter: p(θ | X) ∝ p(θ) · Π_{i=1}^{N} p(x_i | θ) SGD update equation [4] for MAP estimation: θ_{t+1} = θ_t + (ε_t/2) (∇ log p(θ_t) + (N/n) Σ_{i=1}^{n} ∇ log p(x_ti | θ_t)) The gradient is calculated on a mini-batch of size n. MAP estimation does not capture uncertainty in the parameter. Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 19 / 37
  • 47. SGLD Langevin Dynamics [2]: θ_{t+1} = θ_t + (ε/2) (∇ log p(θ_t) + Σ_{i=1}^{N} ∇ log p(x_i | θ_t)) + η_t; η_t ∼ N(0, ε) Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 20 / 37
  • 48. SGLD Langevin Dynamics [2]: θ_{t+1} = θ_t + (ε/2) (∇ log p(θ_t) + Σ_{i=1}^{N} ∇ log p(x_i | θ_t)) + η_t; η_t ∼ N(0, ε) The gradient is computed over the whole dataset. Requires a Metropolis-Hastings (MH) accept-reject test. Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 20 / 37
  • 49. SGLD Langevin Dynamics [2]: θ_{t+1} = θ_t + (ε/2) (∇ log p(θ_t) + Σ_{i=1}^{N} ∇ log p(x_i | θ_t)) + η_t; η_t ∼ N(0, ε) The gradient is computed over the whole dataset. Requires a Metropolis-Hastings (MH) accept-reject test. SGLD [6]: θ_{t+1} = θ_t + (ε_t/2) (∇ log p(θ_t) + (N/n) Σ_{i=1}^{n} ∇ log p(x_ti | θ_t)) + η_t; η_t ∼ N(0, ε_t) Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 20 / 37
  • 50. SGLD Langevin Dynamics [2]: θ_{t+1} = θ_t + (ε/2) (∇ log p(θ_t) + Σ_{i=1}^{N} ∇ log p(x_i | θ_t)) + η_t; η_t ∼ N(0, ε) The gradient is computed over the whole dataset. Requires a Metropolis-Hastings (MH) accept-reject test. SGLD [6]: θ_{t+1} = θ_t + (ε_t/2) (∇ log p(θ_t) + (N/n) Σ_{i=1}^{n} ∇ log p(x_ti | θ_t)) + η_t; η_t ∼ N(0, ε_t) As the step size ε_t tends to zero, the acceptance rate tends to one, so there is no need for the MH accept-reject test. Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 20 / 37
  • 51. SGLD based Thompson Sampling Sample the user and item feature vectors using SGLD: Δu_i = (ε_t/2) (α Σ_{j∈Ω_i} (r_ij − f(u_i^T v_j)) v_j − Λ_U u_i) + η_t; η_t ∼ N(0, ε_t) Δv_j = (ε_t/2) (α Σ_{i∈Ω_j} (r_ij − f(u_i^T v_j)) u_i − Λ_V v_j) + η_t; η_t ∼ N(0, ε_t) Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 21 / 37
  • 52. SGLD based Thompson Sampling Sample the user and item feature vectors using SGLD: Δu_i = (ε_t/2) (α Σ_{j∈Ω_i} (r_ij − f(u_i^T v_j)) v_j − Λ_U u_i) + η_t; η_t ∼ N(0, ε_t) Δv_j = (ε_t/2) (α Σ_{i∈Ω_j} (r_ij − f(u_i^T v_j)) u_i − Λ_V v_j) + η_t; η_t ∼ N(0, ε_t) Sample the hyperparameters using Gibbs: Λ_Ud ~ Γ(α_0 + N/2, β_0 + (1/2) Σ_{i=1}^{N} u_di²), Λ_Vd ~ Γ(α_0 + M/2, β_0 + (1/2) Σ_{j=1}^{M} v_dj²) Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 21 / 37
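One SGLD step of the update rule above can be sketched as follows; the toy dimensions, the identity link f, and the fixed hyperparameters are illustrative assumptions (a single observed (i, j) pair stands in for the sums over Ω_i and Ω_j):

```python
import numpy as np

rng = np.random.default_rng(4)
N, M, K, alpha = 5, 6, 2, 2.0
eps_t = 1e-3                          # step size epsilon_t
U = rng.normal(scale=0.1, size=(N, K))
V = rng.normal(scale=0.1, size=(M, K))
Lambda_U = Lambda_V = np.eye(K)       # hyperparameters held fixed here
f = lambda x: x                       # identity link for real-valued rewards

def sgld_step(i, j, r_ij):
    """One SGLD update of u_i and v_j after observing reward r_ij.
    The injected Gaussian noise N(0, eps_t) is what turns the SGD step
    into an (approximate) posterior sample."""
    err = r_ij - f(U[i] @ V[j])
    grad_u = alpha * err * V[j] - Lambda_U @ U[i]   # likelihood + prior term
    grad_v = alpha * err * U[i] - Lambda_V @ V[j]
    U[i] += 0.5 * eps_t * grad_u + rng.normal(scale=np.sqrt(eps_t), size=K)
    V[j] += 0.5 * eps_t * grad_v + rng.normal(scale=np.sqrt(eps_t), size=K)

sgld_step(0, 1, 1.0)
```

Thompson sampling then treats the current noisy iterates (U, V) as a posterior draw and recommends argmax_j u_i^T v_j.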
  • 53. Outline 1 Introduction Online Interactive Recommender System Matrix Factorization Existing Solutions Thompson Sampling 2 SGLD based Thompson Sampling SGLD Results 3 Pólya-Gamma based Thompson Sampling Pólya-Gamma Algorithm 4 Results 5 Conclusion Vijay Pal Jat (IIT Kanpur) Online Matrix Factorization 22 / 37
  • 56. Results

If an algorithm explores forever or exploits forever, it will incur linear total regret. SGLD based Thompson sampling achieves sub-linear cumulative regret.

Figure: Cumulative regret on synthetic data of different sizes: (a) N=10, M=10, K=1; (b) N=10, M=10, K=3; (c) N=20, M=20, K=1; (d) N=30, M=30, K=1.
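The explore-forever vs. exploit-forever trade-off behind this claim shows up even on a toy multi-armed bandit, away from the matrix-factorization setting. The sketch below is my own illustration (not the thesis experiment): it compares uniformly random arm selection, whose expected regret grows linearly, against Beta-Bernoulli Thompson sampling.

```python
import numpy as np

def run_bandit(policy, probs, T, rng):
    """Play T rounds of a Bernoulli bandit; return the cumulative
    expected (pseudo-)regret after each round."""
    best = probs.max()
    S = np.zeros(len(probs))          # per-arm success counts
    F = np.zeros(len(probs))          # per-arm failure counts
    regret, total = np.zeros(T), 0.0
    for t in range(T):
        if policy == "random":        # explore forever
            a = int(rng.integers(len(probs)))
        else:                         # Thompson sampling with Beta(1, 1) priors
            a = int(np.argmax(rng.beta(S + 1.0, F + 1.0)))
        r = float(rng.random() < probs[a])
        S[a] += r
        F[a] += 1.0 - r
        total += best - probs[a]      # per-round expected regret
        regret[t] = total
    return regret
```

With arms (0.2, 0.5, 0.8), random selection accumulates about 0.3 regret per round on average, while the Thompson sampling curve flattens as the posterior concentrates on the best arm.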
  • 57. Outline
1 Introduction
  Online Interactive Recommender System
  Matrix Factorization
  Existing Solutions
  Thompson Sampling
2 SGLD based Thompson Sampling
  SGLD
  Results
3 Pólya-Gamma based Thompson Sampling
  Pólya-Gamma
  Algorithm
4 Results
5 Conclusion
  • 61. Pólya-Gamma

If rewards are binary- or count-valued, the likelihood is not conjugate to the prior.
This makes it difficult to update the posterior distribution in an online manner.
The Pólya-Gamma (PG) augmentation scheme lets us derive an inference algorithm that looks like simple Bayesian linear regression.
  • 63. Pólya-Gamma data augmentation

The Pólya-Gamma augmentation scheme allows us to rewrite the Negative Binomial likelihood as:

(exp(ψ))^a / (1 + exp(ψ))^b = 2^{−b} e^{κψ} ∫_0^∞ e^{−ωψ²/2} p(ω) dω

The log-likelihood Q(β) is then a quadratic form in β [3]:

Q(β) = Σ_{n=1}^{N} [ −(1/2) ω_n ψ_n² + κ_n ψ_n ]

where κ_n = a_n − b_n/2, ψ_n = x_n^T β, and ω_n ∼ PG(b_n, 0).
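A PG(b, c) random variable is defined by an infinite convolution of gammas, PG(b, c) = (1/2π²) Σ_{k≥1} g_k / ((k − 1/2)² + c²/(4π²)) with g_k ∼ Gamma(b, 1), so it can be sampled approximately by truncating the series. The sketch below is an approximation for illustration; production code would use an exact sampler such as the one in Polson et al. [3].

```python
import numpy as np

def sample_pg(b, c, rng, K=200):
    """Approximate draw from PG(b, c) by truncating its infinite
    gamma-convolution representation at K terms."""
    k = np.arange(1, K + 1)
    g = rng.gamma(b, 1.0, size=K)                        # g_k ~ Gamma(b, 1)
    denom = (k - 0.5) ** 2 + (c / (2.0 * np.pi)) ** 2    # (k - 1/2)^2 + c^2/(4 pi^2)
    return float((g / denom).sum() / (2.0 * np.pi ** 2))
```

A quick sanity check: E[PG(b, c)] = (b/2c) tanh(c/2), which reduces to b/4 at c = 0, so sample means from this sketch should sit near those values.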
  • 65. Pólya-Gamma data augmentation

Augmented logistic-Bernoulli likelihood:

Q(ξ_ij | ω_ij) ∝ exp( −(ω_ij/2) (κ_ij/ω_ij − ξ_ij)² )

where κ_ij = y_ij − 0.5 and ξ_ij = u_i^T v_j.

Posterior on ω:

p(ω | b, ψ̂) = exp(−ω ψ̂²/2) p(ω; b, 0) / ∫_0^∞ exp(−ω ψ̂²/2) p(ω; b, 0) dω = PG(b, ψ̂)

where p(ω; b, 0) is the Pólya-Gamma density.
  • 66. Pólya-Gamma Sampling Scheme

With a N(0, λ^{−1} I) prior on the user feature vectors, a sample from the posterior is drawn by simply iterating the following two steps:

ω_ij | u_i ∼ PG(1, v_j^T u_i),  ∀ j ∈ Ω_i
u_i | R_i, ω_i ∼ N(μ_{u_i}, Σ_{u_i})

where

Σ_{u_i} = ( Σ_{j=1}^{|Ω_i|} v_j v_j^T ω_ij + λ I )^{−1}
μ_{u_i} = Σ_{u_i} ( Σ_{j=1}^{|Ω_i|} v_j κ_ij )
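One sweep of this two-step sampler for a single user can be sketched in numpy as follows. The truncated-series PG sampler and the toy logistic check are my own illustrative stand-ins; an exact PG sampler would normally be used.

```python
import numpy as np

def sample_pg1(c, rng, K=200):
    """Approximate PG(1, c) draws, vectorized over c, via the truncated
    gamma-series representation of the Polya-Gamma distribution."""
    c = np.atleast_1d(c).astype(float)
    k = np.arange(1, K + 1)
    g = rng.gamma(1.0, 1.0, size=(c.size, K))
    denom = (k - 0.5) ** 2 + (c[:, None] / (2.0 * np.pi)) ** 2
    return (g / denom).sum(axis=1) / (2.0 * np.pi ** 2)

def gibbs_user_sweep(u_i, V_obs, y_i, lam, rng):
    """One sweep: draw every omega_ij given u_i, then redraw u_i.

    V_obs : (|Omega_i|, K) item vectors user i has interacted with
    y_i   : (|Omega_i|,)   binary rewards; lam is the prior precision
    """
    omega = sample_pg1(V_obs @ u_i, rng)           # omega_ij ~ PG(1, v_j^T u_i)
    kappa = y_i - 0.5                               # kappa_ij = y_ij - 1/2
    # Sigma = (sum_j omega_ij v_j v_j^T + lam I)^{-1}; mu = Sigma sum_j kappa_ij v_j
    Sigma = np.linalg.inv((V_obs.T * omega) @ V_obs + lam * np.eye(u_i.size))
    mu = Sigma @ (V_obs.T @ kappa)
    return rng.multivariate_normal(mu, Sigma)
```

Because the conditional on u_i is Gaussian, each sweep is as cheap as a Bayesian linear-regression update, which is exactly the appeal of the augmentation.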
  • 67. Outline
1 Introduction
  Online Interactive Recommender System
  Matrix Factorization
  Existing Solutions
  Thompson Sampling
2 SGLD based Thompson Sampling
  SGLD
  Results
3 Pólya-Gamma based Thompson Sampling
  Pólya-Gamma
  Algorithm
4 Results
5 Conclusion
  • 68. Algorithm

Algorithm 1: Pólya-Gamma Stochastic Gradient Descent (PGSGD)
Input: λ
1: Initialize: A ← λI; b ← 0
2: for t = 1, ..., T do
3:   Receive user i_t
4:   Σ_{i_t} = A^{−1}; μ_{i_t} = A^{−1} b
5:   Sample u_{i_t} from N(μ_{i_t}, Σ_{i_t})
6:   Get the MAP solution of the item vectors v_j, ∀ j ∈ Ω_{u_{i_t}}
7:   Choose item j*(i_t) = argmax_j u_{i_t}^T v_j
8:   Receive the reward r_{i_t, j*(i_t)}
9:   ω_{ij} ∼ PG(1, u_{i_t}^T v_{j*(i_t)})
10:  κ_{ij} = r_{i_t, j*(i_t)} − 1/2
11:  Update A ← A + v_{j*(i_t)} v_{j*(i_t)}^T ω_{ij}
12:  Update b ← b + v_{j*(i_t)} κ_{ij}
13:  Update the item feature vectors at the MAP solution using SGD
14: end for
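Under several simplifying assumptions of mine (a single user, binary rewards simulated from a fixed `true_u`, a truncated PG sampler, and my own variable names), the PGSGD loop might be sketched as:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_pg1(c, rng, K=200):
    # Approximate PG(1, c) draw via the truncated gamma-series representation.
    k = np.arange(1, K + 1)
    g = rng.gamma(1.0, 1.0, size=K)
    return float((g / ((k - 0.5) ** 2 + (c / (2.0 * np.pi)) ** 2)).sum()
                 / (2.0 * np.pi ** 2))

def pgsgd(V, true_u, lam, T, lr, rng):
    """Single-user PGSGD sketch: Thompson-sample u from its Gaussian posterior,
    recommend, observe a simulated binary reward, do the PG rank-1 update of
    (A, b), then take an SGD step on the chosen item's vector."""
    K = V.shape[1]
    A = lam * np.eye(K)                  # posterior precision of u
    b = np.zeros(K)
    rewards = []
    for t in range(T):
        Sigma = np.linalg.inv(A)
        mu = Sigma @ b                                    # mu = A^{-1} b
        u = rng.multivariate_normal(mu, Sigma)            # Thompson draw
        j = int(np.argmax(V @ u))                         # j* = argmax_j u^T v_j
        r = float(rng.random() < sigmoid(V[j] @ true_u))  # simulated reward
        omega = sample_pg1(float(V[j] @ u), rng)          # omega ~ PG(1, u^T v_j*)
        A = A + omega * np.outer(V[j], V[j])              # A += v v^T omega
        b = b + (r - 0.5) * V[j]                          # b += v kappa
        # SGD step on the chosen item's vector (logistic-loss gradient)
        V[j] = V[j] + lr * (r - sigmoid(V[j] @ mu)) * mu
        rewards.append(r)
    return np.array(rewards), A, b
```

Here A is the posterior precision, initialized to λI so that Σ = A^{−1} matches the N(0, λ^{−1} I) prior before any feedback; the rank-1 precision updates keep the per-round Thompson draw cheap.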
  • 69. Results
Figure: Jester real-valued dataset
  • 70. Results
Figure: Jester binary-valued dataset
  • 75. Summary and Future Work

Presented a fast and general SGLD based Thompson sampling algorithm.
  Can handle any type of reward.
  Can handle constraints on the parameter space.
Presented an efficient algorithm for binary- or count-valued data.
Future work: a regret-bound analysis of the proposed algorithms.
  • 76. Questions?
  • 77. References I
[1] O. Chapelle and L. Li. An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems, pages 2249–2257, 2011.
[2] R. M. Neal. MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 2:113–162, 2011.
[3] N. G. Polson, J. G. Scott, and J. Windle. Bayesian inference for logistic models using Pólya-Gamma latent variables. Journal of the American Statistical Association, 108(504):1339–1349, 2013.
[4] H. Robbins and S. Monro. A stochastic approximation method. The Annals of Mathematical Statistics, pages 400–407, 1951.
[5] H. Wang, Q. Wu, and H. Wang. Factorization bandits for interactive recommendation. 2017.
  • 78. References II
[6] M. Welling and Y. W. Teh. Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 681–688, 2011.
  • 79. The End