1. Bayesian Dark Knowledge and Matrix Factorization
Masatoshi Uehara
Mentor: Oono Kenta, Brian Vogel
October 27, 2016
2. Contents
1 Introduction
2 Bayesian Dark Knowledge with various SG-MCMC methods
3 Matrix Factorization
(JPN) Masatoshi October 27, 2016 2 / 18
3. Introduction
Introduction
SG-MCMC is a family of sampling algorithms for large datasets.
We apply a variety of SG-MCMC methods to Bayesian Dark
Knowledge.
We combine GANs with Bayesian Dark Knowledge.
We apply SG-MCMC and neural networks to matrix factorization.
4. Introduction
SGLD
SGLD
SGLD is a method combining SGD with the Langevin algorithm (a gradient-based sampling method):
θ_{t+1} ← θ_t − ε_t D ∇Ũ(θ_t) + N(0, 2 ε_t D)
In the case of a Bayesian neural network, the update is as follows:
Δθ_t = (ε_t / 2) [ ∇log p(θ_t) + (N/n) Σ_{i=1}^{n} ∇log p(y_{ti} | x_{ti}, θ_t) ] + η_t,   η_t ∼ N(0, ε_t)
Note that SGD is recovered by removing the noise term η_t.
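As a sketch, one SGLD step for a Bayesian neural network can be written as follows (a minimal NumPy illustration; the function name and the toy gradients are assumptions, not the slide's implementation):

```python
import numpy as np

def sgld_step(theta, grad_log_prior, grad_log_lik_sum, N, n, eps, rng):
    # Delta theta_t = (eps_t/2) [ grad log p(theta_t)
    #                 + (N/n) * sum_i grad log p(y_ti | x_ti, theta_t) ] + eta_t
    drift = 0.5 * eps * (grad_log_prior + (N / n) * grad_log_lik_sum)
    eta = rng.normal(0.0, np.sqrt(eps), size=theta.shape)  # eta_t ~ N(0, eps_t)
    return theta + drift + eta

# Toy usage: standard-normal prior gradient, zero minibatch likelihood gradient.
rng = np.random.default_rng(0)
theta = np.zeros(3)
theta = sgld_step(theta, grad_log_prior=-theta,
                  grad_log_lik_sum=np.zeros(3), N=1000, n=10,
                  eps=1e-4, rng=rng)
```

Running the step repeatedly yields (approximate) posterior samples of θ; dropping `eta` turns the update back into plain SGD.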
5. Bayesian Dark Knowledge with various SG-MCMC methods
Bayesian Dark Knowledge Overview
Overview
Bayesian Dark Knowledge is a method combining SGLD with the concept of distillation.
SGLD is a useful method for learning Bayesian deep networks.
The problem is that SGLD needs to store many copies of the parameters.
The motivation is to replace this ensemble of networks with a single deep network.
We can then estimate predictive confidence even when the dataset is small.
6. Bayesian Dark Knowledge with various SG-MCMC methods
Method
The teacher network (posterior predictive) is denoted p(y|x, D_N).
The student network is denoted S(y|x, ω).
In the distillation phase, the following loss is minimized.
Distillation loss
L(ω) = − ∫ p(x) Σ_y p(y|x, D_N) log S(y|x, ω) dx
     ≈ − (1/|Θ|) (1/|D'|) Σ_{θ∈Θ} Σ_{x'∈D'} Σ_y p(y|x', θ) log S(y|x', ω)
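A sketch of the Monte Carlo estimate of this loss: the average cross-entropy between teacher predictive probabilities and the student's log-probabilities (the function and array names are assumptions for illustration):

```python
import numpy as np

def distillation_loss(teacher_probs, student_log_probs):
    """Monte Carlo distillation loss.

    teacher_probs: (S, D, K) -- S posterior samples theta in Theta,
        D unlabeled inputs x' in D', K classes; entries p(y | x', theta).
    student_log_probs: (D, K) -- log S(y | x', omega).
    """
    S, D, _ = teacher_probs.shape
    # -(1/|Theta|)(1/|D'|) sum_theta sum_x' sum_y p(y|x',theta) log S(y|x',omega)
    return -np.sum(teacher_probs * student_log_probs[None, :, :]) / (S * D)

# Toy check: a uniform student against a fully confident teacher.
teacher = np.tile(np.array([1.0, 0.0]), (2, 3, 1))   # (S=2, D=3, K=2)
student = np.log(np.full((3, 2), 0.5))               # log S(y|x',omega) = log 0.5
loss = distillation_loss(teacher, student)           # = -log 0.5 ≈ 0.693
```

Minimizing this in ω trains the student to match the averaged teacher predictions.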
7. Bayesian Dark Knowledge with various SG-MCMC methods
Algorithm
Algorithm
Note that the student network is trained online, so we do not have to
store many copies of the parameters.
8. Bayesian Dark Knowledge with various SG-MCMC methods
How to improve?
We want to obtain a more diverse set of teachers.
→ Use other SG-MCMC methods.
How do we obtain an unlabeled data set for distillation?
→ Use GANs.
9. Bayesian Dark Knowledge with various SG-MCMC methods
SG-HMC and SG-NHT
SG-HMC
θ_{t+1} ← θ_t + ε_t M^{-1} r_t
r_{t+1} ← r_t − ε_t ∇Ũ(θ_t) − ε_t C M^{-1} r_t + N(0, ε_t (2C − ε_t B̂_t))
SG-NHT
θ_{t+1} ← θ_t + ε_t r_t
r_{t+1} ← r_t − ε_t ∇Ũ(θ_t) − ε_t ζ_t r_t + N(0, ε_t (2C − ε_t B̂_t))
ζ_{t+1} ← ζ_t + ε_t ( (1/d) r_t^T r_t − 1 )
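One SG-NHT step can be sketched as follows (a minimal illustration following the slide's order of updates; the gradient-noise estimate B̂_t is taken to be 0, and the function names are assumptions):

```python
import numpy as np

def sgnht_step(theta, r, zeta, grad_U, eps, C, rng):
    """One SG-NHT update: position, momentum with thermostat friction
    zeta, then the thermostat variable itself (B_hat assumed 0)."""
    theta = theta + eps * r
    noise = rng.normal(0.0, np.sqrt(2.0 * C * eps), size=r.shape)
    r = r - eps * grad_U(theta) - eps * zeta * r + noise
    zeta = zeta + eps * ((r @ r) / r.size - 1.0)
    return theta, r, zeta

# Toy usage: sample from N(0, I), i.e. U(theta) = ||theta||^2 / 2.
rng = np.random.default_rng(0)
theta, r, zeta = np.zeros(2), np.zeros(2), 1.0
for _ in range(100):
    theta, r, zeta = sgnht_step(theta, r, zeta, lambda t: t, 0.01, 1.0, rng)
```

The thermostat ζ adapts the friction so the momentum's kinetic energy stays near its target, which compensates for unknown stochastic-gradient noise.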
10. Bayesian Dark Knowledge with various SG-MCMC methods
Bayesian Dark Knowledge with GANs
GANs can mimic the empirical
distribution.
In the distillation phase, we use GANs
as a simulator.
How do we remove poor generated images?
11. Bayesian Dark Knowledge with various SG-MCMC methods
Anomaly detection by GANs
(Figures: uLSIF, GAN)
12. Bayesian Dark Knowledge with various SG-MCMC methods
Result : MNIST
Setting: 800 labeled samples from MNIST; epochs: 2000; burn-in interval: 200; thinning interval: 5.
Network: 784-1200-1200-10; activation: ReLU.
Result
13. Matrix Factorization
Matrix Factorization
A rating matrix is given.
u_i: user feature vector, v_j: item feature vector, R_{ij}: rating matrix entry.
For learning, use SGD.
u_i ← u_i − η ∇_{u_i} [ (R_{ij} − u_i^T v_j)^2 + λ ‖u_i‖^2 ]
v_j ← v_j − η ∇_{v_j} [ (R_{ij} − u_i^T v_j)^2 + λ ‖v_j‖^2 ]
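A sketch of one SGD step on the per-rating objective (step size η and the function name are illustrative assumptions):

```python
import numpy as np

def mf_sgd_step(u_i, v_j, r_ij, lr, lam):
    """One SGD step on (r_ij - u_i^T v_j)^2 + lam * (||u_i||^2 + ||v_j||^2)."""
    err = r_ij - u_i @ v_j
    grad_u = -2.0 * err * v_j + 2.0 * lam * u_i
    grad_v = -2.0 * err * u_i + 2.0 * lam * v_j
    return u_i - lr * grad_u, v_j - lr * grad_v

# Toy usage: one step should reduce the squared error on this rating.
u, v = np.array([0.1, 0.1]), np.array([0.1, 0.1])
u2, v2 = mf_sgd_step(u, v, r_ij=1.0, lr=0.05, lam=0.01)
```

In practice the step is applied over observed (i, j) pairs sampled from the rating matrix.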
14. Matrix Factorization
Matrix Factorization with SGLD
p(R | U, V, τ) = Π_{i=1}^{L} Π_{j=1}^{M} [ N(R_{ij} | U_i^T V_j, τ^{-1}) ]^{I_{ij}}
p(U | λ_U) = Π_{i=1}^{L} N(U_i | 0, λ_U^{-1} I)
p(V | λ_V) = Π_{j=1}^{M} N(V_j | 0, λ_V^{-1} I)
λ_{U_d} ∼ Gamma(α_0, β_0),   λ_{V_d} ∼ Gamma(α_0, β_0)
Use Gibbs sampling for the hyperparameters.
When updating U and V, SGLD is used.
λ is tuned automatically.
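The Gibbs step for the precisions is conjugate: with U_{id} ∼ N(0, λ_{U_d}^{-1}) and a Gamma(α_0, β_0) prior (shape/rate), the conditional is Gamma(α_0 + L/2, β_0 + ½ Σ_i U_{id}²). A sketch (function name is an assumption):

```python
import numpy as np

def gibbs_lambda_U(U, alpha0, beta0, rng):
    """Sample each per-dimension precision lambda_{U_d} from its
    conjugate conditional Gamma(alpha0 + L/2, beta0 + 0.5 * sum_i U_id^2)."""
    L, D = U.shape
    shape = alpha0 + 0.5 * L
    rate = beta0 + 0.5 * np.sum(U ** 2, axis=0)
    return rng.gamma(shape, 1.0 / rate)  # numpy's gamma takes scale = 1/rate

# Toy usage: features drawn with true precision 1 should give lambda near 1.
rng = np.random.default_rng(0)
U = rng.normal(0.0, 1.0, size=(5000, 2))
lam = gibbs_lambda_U(U, alpha0=1.0, beta0=1.0, rng=rng)
```

Sampling λ this way is what makes the regularization strength self-tuning during the chain.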
15. Matrix Factorization
Neural Network Matrix Factorization
Estimate X_{n,m} by the equation X̂_{n,m} = f_θ(U_n, V_m).
Cost function (summed over observed entries):
(X_{n,m} − X̂_{n,m})^2 + λ [ ‖U_n‖_2^2 + ‖V_m‖_2^2 ]
Update θ, U_n, and V_m at the same time.
NNMF is reported to reach state-of-the-art accuracy.
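A sketch of the prediction and per-entry cost; the one-hidden-layer ReLU architecture for f_θ is an assumption, since the slide does not specify it:

```python
import numpy as np

def nnmf_predict(u_n, v_m, W1, b1, w2, b2):
    """X_hat_{n,m} = f_theta(U_n, V_m): an MLP on the concatenated features
    (hypothetical architecture for illustration)."""
    h = np.maximum(0.0, W1 @ np.concatenate([u_n, v_m]) + b1)  # ReLU hidden layer
    return float(w2 @ h + b2)

def nnmf_loss(x, x_hat, u_n, v_m, lam):
    # (X_{n,m} - X_hat_{n,m})^2 + lam * (||U_n||_2^2 + ||V_m||_2^2)
    return (x - x_hat) ** 2 + lam * (u_n @ u_n + v_m @ v_m)

# Toy usage: 2-dim features per user/item, 4 hidden units.
rng = np.random.default_rng(0)
u, v = rng.normal(size=2), rng.normal(size=2)
W1, b1 = rng.normal(size=(4, 4)), np.zeros(4)
w2, b2 = rng.normal(size=4), 0.0
x_hat = nnmf_predict(u, v, W1, b1, w2, b2)
loss = nnmf_loss(3.0, x_hat, u, v, lam=0.01)
```

Gradients of this loss flow into θ (the MLP weights) and into U_n, V_m simultaneously, which is the "update at the same time" above.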
16. Matrix Factorization
Results
Use the ML-100K and ML-1M data sets.
Evaluate by root mean squared error (RMSE).
Unfortunately, the state-of-the-art accuracy was not reproduced.
17. Matrix Factorization
Discussion
Does data generated by GANs help classifiers?
What is a good method of combining Neural Networks with matrix
factorization?
18. Matrix Factorization
References
Large-Scale Distributed Bayesian Matrix Factorization using
Stochastic Gradient MCMC
Neural Network Matrix Factorization
A Complete Recipe for Stochastic Gradient MCMC
Bayesian Dark Knowledge
Probabilistic Matrix Factorization