
# 20191026 bayes dl


"Bayesian Deep Learning" (ベイズ深層学習), Sections 4.2–4.6

Published in: Engineering


1. **Bayesian Deep Learning (ベイズ深層学習), Sections 4.2–4.6**
2. **Variational Bayes**

   $$
   \begin{aligned}
   D_{\mathrm{KL}}\left[q(Z;\xi) \,\|\, p(Z \mid X)\right]
   &= -\int q(Z;\xi)\ln\frac{p(Z \mid X)}{q(Z;\xi)}\,dZ \\
   &= -\int q(Z;\xi)\ln\frac{p(Z, X)}{q(Z;\xi)\,p(X)}\,dZ \\
   &= \ln p(X) - \int q(Z;\xi)\ln\frac{p(Z, X)}{q(Z;\xi)}\,dZ \\
   &= \ln p(X) - \mathcal{L}(\xi)
   \end{aligned}
   $$

   where $\mathcal{L}(\xi)$ is the evidence lower bound (ELBO).

   - Maximizing the ELBO minimizes the KL divergence.
   - The normalization constant is not required to compute the ELBO.
   - $D_{\mathrm{KL}}(q \,\|\, p) \ge 0 \;\Rightarrow\; \ln p(X) \ge \mathcal{L}(\xi)$.
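As a sanity check, the identity $\ln p(X) = \mathcal{L}(\xi) + D_{\mathrm{KL}}[q \,\|\, p(Z \mid X)]$ can be verified numerically for a toy discrete latent variable; the joint table and $q$ below are made-up values, not from the slides:

```python
import numpy as np

# Toy discrete model: latent Z in {0, 1, 2}, one fixed observation X.
p_joint = np.array([0.2, 0.1, 0.05])   # p(Z = k, X), unnormalized over Z
p_X = p_joint.sum()                    # evidence p(X)
p_post = p_joint / p_X                 # posterior p(Z | X)

# Arbitrary variational distribution q(Z; xi).
q = np.array([0.5, 0.3, 0.2])

elbo = np.sum(q * np.log(p_joint / q))  # L(xi): needs only the joint
kl = np.sum(q * np.log(q / p_post))     # KL[q || p(Z|X)]

# ln p(X) = L(xi) + KL, and ln p(X) >= L(xi).
print(np.log(p_X), elbo + kl)
```

Note that `elbo` is computed from the unnormalized joint only, illustrating why the normalization constant is not needed.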
3. **Linear dimensionality reduction**

   Observation model:
   $$p(X \mid Z, W) = \prod_{n=1}^{N} p(x_n \mid z_n, W) = \prod_{n=1}^{N} \mathcal{N}\left(x_n \mid W z_n, \sigma_x^2 I\right)$$

   Priors:
   $$p(Z) = \prod_{n=1}^{N} \mathcal{N}(z_n \mid 0, I), \qquad p(W) = \prod_{i,j} \mathcal{N}\left(w_{ij} \mid 0, \sigma_w^2\right)$$

   Variational posterior:
   $$p(Z, W \mid X) \approx q(Z)\,q(W)$$
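A minimal sketch of sampling from this generative model; the sizes ($N$, observed dimension $D$, latent dimension $M$) and noise scales are illustrative assumptions, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes and scales for illustration.
N, D, M = 100, 5, 2
sigma_x, sigma_w = 0.1, 1.0

Z = rng.standard_normal((N, M))                       # z_n ~ N(0, I)
W = sigma_w * rng.standard_normal((D, M))             # w_ij ~ N(0, sigma_w^2)
X = Z @ W.T + sigma_x * rng.standard_normal((N, D))   # x_n ~ N(W z_n, sigma_x^2 I)
print(X.shape)
```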
4. **Objective**

   $$\mathcal{L} = \mathbb{E}_{q(Z)q(W)}\left[\ln p(X \mid Z, W)\right] - D_{\mathrm{KL}}\left[q(Z) \,\|\, p(Z)\right] - D_{\mathrm{KL}}\left[q(W) \,\|\, p(W)\right]$$

   (expected log-likelihood plus KL regularizers).

   Variational M-step: maximize $\mathcal{L}_{q_i(Z)}$ with respect to $q_{i+1}(W)$:
   $$
   \begin{aligned}
   \mathcal{L}_{q_i(Z)}
   &= \mathbb{E}_{q_i(Z)q_{i+1}(W)}\left[\ln p(X \mid Z, W)\right] - D_{\mathrm{KL}}\left[q_{i+1}(W) \,\|\, p(W)\right] + \text{const.} \\
   &= \mathbb{E}_{q_{i+1}(W)}\left[\ln \frac{\exp\left(\mathbb{E}_{q_i(Z)}\left[\ln p(X \mid Z, W)\right]\right) p(W)}{q_{i+1}(W)}\right] + \text{const.} \\
   &= -D_{\mathrm{KL}}\left[q_{i+1}(W) \,\|\, r_i(W)\right] + \text{const.}
   \end{aligned}
   $$
   $$\therefore\; q_{i+1}(W) = r_i(W) \propto \exp\left(\mathbb{E}_{q_i(Z)}\left[\ln p(X \mid Z, W)\right]\right) p(W)$$
5. **Variational E-step:** similarly, maximizing with respect to $q_{i+1}(Z)$ (holding $q_i(W)$ fixed) gives
   $$q_{i+1}(Z) = r_i(Z) \propto \exp\left(\mathbb{E}_{q_i(W)}\left[\ln p(X \mid Z, W)\right]\right) p(Z)$$
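The two updates can be run as coordinate ascent. Below is a sketch for the linear model above with fixed $\sigma_x$, $\sigma_w$: each mean-field factor is Gaussian, and its mean and covariance follow from the $r_i$ expressions. All sizes and hyperparameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data from the linear model (sizes and noise are assumed).
N, D, M = 200, 8, 3
sx2, sw2 = 0.1 ** 2, 1.0
W_true = rng.standard_normal((D, M))
X = rng.standard_normal((N, M)) @ W_true.T + 0.1 * rng.standard_normal((N, D))

# Mean-field factors: q(Z) = prod_n N(z_n | m_z[n], S_z),
#                     q(W) = prod_d N(w_d | m_w[d], S_w)  (w_d = d-th row of W)
m_w, S_w = rng.standard_normal((D, M)), np.eye(M)
S_z = np.eye(M)

for _ in range(50):
    # E-step: q(Z) ∝ exp(E_q(W)[ln p(X|Z,W)]) p(Z)
    E_WtW = m_w.T @ m_w + D * S_w
    S_z = np.linalg.inv(np.eye(M) + E_WtW / sx2)
    m_z = (X @ m_w / sx2) @ S_z
    # M-step: q(W) ∝ exp(E_q(Z)[ln p(X|Z,W)]) p(W)
    E_ZtZ = m_z.T @ m_z + N * S_z
    S_w = np.linalg.inv(np.eye(M) / sw2 + E_ZtZ / sx2)
    m_w = (X.T @ m_z / sx2) @ S_w

err = np.mean((X - m_z @ m_w.T) ** 2)  # reconstruction error at the means
print(err)
```

After a few dozen sweeps the posterior-mean reconstruction error approaches the observation noise level.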
6. **Gaussian mixture model**

   Observation model:
   $$p(X \mid S, W) = \prod_{n=1}^{N} p(x_n \mid s_n, W) = \prod_{n=1}^{N} \mathcal{N}\left(x_n \mid W s_n, \sigma_x^2 I\right),$$
   $$s_n \in \{0, 1\}^K, \qquad \sum_{k=1}^{K} s_{n,k} = 1$$

   Prior:
   $$p(S) = \prod_{n=1}^{N} \mathrm{Cat}\left(s_n \mid \pi\right)$$

   Variational posterior:
   $$p(S, W \mid X) \approx q(S)\,q(W)$$
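Sampling from this mixture observation model, where the one-hot $s_n$ selects a column of $W$ as the component mean; sizes and mixing weights are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed sizes and mixing weights.
N, D, K = 100, 2, 3
sigma_x = 0.3
pi = np.array([0.2, 0.5, 0.3])
W = rng.standard_normal((D, K))   # column k = mean of component k

ks = rng.choice(K, size=N, p=pi)  # component indices ~ Cat(pi)
S = np.eye(K)[ks]                 # one-hot s_n
X = S @ W.T + sigma_x * rng.standard_normal((N, D))  # x_n ~ N(W s_n, sigma_x^2 I)
print(X.shape)
```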
7. **Laplace approximation**

   Quadratic approximation of the posterior around $Z_{\mathrm{MAP}}$:
   $$p(Z \mid X) \approx \mathcal{N}\left(Z \mid Z_{\mathrm{MAP}}, \Lambda(Z_{\mathrm{MAP}})^{-1}\right), \qquad \Lambda(Z) = -\nabla_Z^2 \ln p(Z \mid X)$$

   because
   $$\ln p(Z \mid X) \approx \ln p(Z_{\mathrm{MAP}} \mid X) + \frac{1}{2}\left(Z - Z_{\mathrm{MAP}}\right)^\top \left.\nabla_Z^2 \ln p(Z \mid X)\right|_{Z = Z_{\mathrm{MAP}}} \left(Z - Z_{\mathrm{MAP}}\right),$$
   where the first-order term vanishes since $\left.\nabla_Z \ln p(Z \mid X)\right|_{Z = Z_{\mathrm{MAP}}} = 0$.
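A 1-d sketch: find the mode with Newton's method, then read the Gaussian approximation off the negative Hessian. The Gamma-shaped target below is an assumed toy example with known mode $(a-1)/b = 2$:

```python
import numpy as np

# Toy unnormalized log posterior: ln p(z | X) = (a-1) ln z - b z, z > 0.
a, b = 5.0, 2.0

def grad(z):   # d/dz ln p(z | X)
    return (a - 1) / z - b

def hess(z):   # d^2/dz^2 ln p(z | X)
    return -(a - 1) / z ** 2

# Newton's method for the MAP estimate (the gradient vanishes at the mode).
z = 1.0
for _ in range(50):
    z -= grad(z) / hess(z)

z_map = z              # analytic mode: (a - 1) / b = 2
lam = -hess(z_map)     # precision Λ(z_MAP) = -∇² ln p(z|X)
var = 1.0 / lam        # Laplace approximation: N(z | z_MAP, Λ^{-1})
print(z_map, var)
```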
8. **Moment matching**

   Approximate $p(z)$ by an exponential-family distribution
   $$q(z; \eta) = h(z)\exp\left(\eta^\top t(z) - a(\eta)\right).$$

   $$
   \begin{aligned}
   D_{\mathrm{KL}}\left(p(z) \,\|\, q(z;\eta)\right)
   &= -\mathbb{E}_{p(z)}\left[\ln q(z;\eta)\right] + \mathbb{E}_{p(z)}\left[\ln p(z)\right] \\
   &= -\eta^\top \mathbb{E}_{p(z)}\left[t(z)\right] + a(\eta) + \text{const.}
   \end{aligned}
   $$
   $$\nabla_\eta D_{\mathrm{KL}}\left(p(z) \,\|\, q(z;\eta)\right) = -\mathbb{E}_{p(z)}\left[t(z)\right] + \nabla_\eta a(\eta) = -\mathbb{E}_{p(z)}\left[t(z)\right] + \mathbb{E}_{q(z;\eta)}\left[t(z)\right] = 0$$
   $$\therefore\; \mathbb{E}_{q(z;\eta)}\left[t(z)\right] = \mathbb{E}_{p(z)}\left[t(z)\right]$$
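For a Gaussian $q$ the sufficient statistics are $t(z) = (z, z^2)$, so moment matching reduces to matching mean and variance. A sketch against an assumed two-component Gaussian mixture target, using the mixture's closed-form moments:

```python
import numpy as np

# Assumed target p(z): mixture of two Gaussians (weights, means, variances).
w = np.array([0.3, 0.7])
mu = np.array([-2.0, 1.0])
var = np.array([0.5, 1.5])

# E_p[z] and E_p[z^2] in closed form for the mixture.
Ez = np.sum(w * mu)
Ez2 = np.sum(w * (var + mu ** 2))

# The moment-matched Gaussian q(z) = N(q_mean, q_var):
q_mean = Ez
q_var = Ez2 - Ez ** 2
print(q_mean, q_var)
```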
9. **Assumed density filtering**

   Consider estimation for a sequence of data $\mathcal{D}_1, \mathcal{D}_2, \dots$

   With a conjugate prior, the posteriors $p_i(\theta)\ (i = 0, 1, \dots)$ all stay in the same family:
   $$p_{i+1}(\theta) = Z_{i+1}^{-1}\, p(\mathcal{D}_{i+1} \mid \theta)\, p_i(\theta)$$

   With a non-conjugate prior, approximate each update by moment matching (with $q_0(\theta) = p_0(\theta)$):
   $$q_{i+1}(\theta) \approx r_{i+1}(\theta) = Z_{i+1}^{-1}\, p(\mathcal{D}_{i+1} \mid \theta)\, q_i(\theta) = Z_{i+1}^{-1}\, f_{i+1}(\theta)\, q_i(\theta)$$
10. **MM with a 1-dim Gaussian distribution**

    Let $q_i(\theta) = \mathcal{N}(\theta \mid \mu_i, v_i)$. Normalization constant:
    $$Z_{i+1} = \int f_{i+1}(\theta)\, \frac{1}{\sqrt{2\pi v_i}}\exp\left(-\frac{(\theta - \mu_i)^2}{2 v_i}\right) d\theta$$

    Differentiating $\ln Z_{i+1}$ with respect to $\mu_i$:
    $$\frac{\partial}{\partial \mu_i}\ln Z_{i+1} = \frac{1}{Z_{i+1}}\int f_{i+1}(\theta)\, \mathcal{N}(\theta \mid \mu_i, v_i)\, \frac{\theta - \mu_i}{v_i}\, d\theta = \frac{\mathbb{E}_{r_{i+1}}[\theta] - \mu_i}{v_i}$$
    $$\therefore\; \mu_{i+1} = \mathbb{E}_{r_{i+1}}[\theta] = \mu_i + v_i\, \frac{\partial}{\partial \mu_i}\ln Z_{i+1}$$
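The update $\mu_{i+1} = \mu_i + v_i\,\partial \ln Z_{i+1}/\partial \mu_i = \mathbb{E}_{r_{i+1}}[\theta]$ can be checked numerically; the logistic factor $f$ and all constants below are assumptions for illustration:

```python
import numpy as np

# Assumed likelihood factor f(theta): a logistic sigmoid.
def f(theta):
    return 1.0 / (1.0 + np.exp(-theta))

def gauss(theta, mu, v):
    return np.exp(-(theta - mu) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

theta = np.linspace(-20.0, 20.0, 200001)
dx = theta[1] - theta[0]
mu_i, v_i = 0.5, 2.0

def ln_Z(mu):   # ln of the normalization constant, by numerical integration
    return np.log(np.sum(f(theta) * gauss(theta, mu, v_i)) * dx)

# E_{r_{i+1}}[theta] with r_{i+1} ∝ f(theta) q_i(theta).
r = f(theta) * gauss(theta, mu_i, v_i)
E_r = np.sum(theta * r) / np.sum(r)

# mu_{i+1} = mu_i + v_i * d ln Z / d mu_i, derivative by central differences.
eps = 1e-5
mu_next = mu_i + v_i * (ln_Z(mu_i + eps) - ln_Z(mu_i - eps)) / (2 * eps)
print(E_r, mu_next)
```

The two routes to $\mu_{i+1}$ agree to numerical precision.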
11. **Differentiating $\ln Z_{i+1}$ with respect to $v_i$** (analogous derivation for the variance update)
12. **MM with a Gamma distribution**
13. **MM for probit regression**

    $$p(Y \mid X, w) = \prod_{n=1}^{N} \phi\left(y_n \mid x_n, w\right), \qquad p(w) = \mathcal{N}\left(w \mid 0, v_0\right)$$

    The marginal likelihood is intractable:
    $$Z = \int p(Y \mid X, w)\, p(w)\, dw$$

    Instead, apply the recursive update
    $$q_{i+1}(w) = Z_{i+1}^{-1}\, p(\mathcal{D}_{i+1} \mid w)\, q_i(w)$$
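A grid-based sketch of this recursion for a 1-d probit weight, moment-matching a Gaussian $q_i(w)$ after each data point; the data, prior variance, and grid are assumptions for illustration:

```python
import numpy as np
from math import erf

rng = np.random.default_rng(2)
Phi = np.vectorize(lambda t: 0.5 * (1.0 + erf(t / 2.0 ** 0.5)))  # probit CDF

# Synthetic 1-d probit data (assumed toy setup): labels y_n in {-1, +1}.
w_true = 1.5
x = rng.standard_normal(200)
y = np.where(rng.random(200) < Phi(w_true * x), 1.0, -1.0)

# q_0(w) = p(w) = N(0, v0); each step moment-matches r_{i+1} on a grid.
mu, v = 0.0, 4.0
w_grid = np.linspace(-10.0, 10.0, 4001)
dw = w_grid[1] - w_grid[0]

for xn, yn in zip(x, y):
    r = Phi(yn * xn * w_grid) * np.exp(-(w_grid - mu) ** 2 / (2 * v))
    r /= np.sum(r) * dw                      # r_{i+1}(w), normalized
    mu = np.sum(w_grid * r) * dw             # matched mean
    v = np.sum((w_grid - mu) ** 2 * r) * dw  # matched variance

print(mu, v)  # mu should land near w_true; v shrinks as data accumulates
```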
14. **Expectation propagation**