
20191026 bayes dl

"Bayesian Deep Learning" (ベイズ深層学習), Sections 4.2 - 4.6


  1. Bayesian Deep Learning (ベイズ深層学習) 4.2 ∼ 4.6
  2. Variational Bayes
     $$D_{\mathrm{KL}}\left[q(Z;\xi)\,\|\,p(Z|X)\right]
       = -\int q(Z;\xi)\ln\frac{p(Z|X)}{q(Z;\xi)}\,dZ
       = -\int q(Z;\xi)\ln\frac{p(Z,X)}{q(Z;\xi)\,p(X)}\,dZ
       = \ln p(X) - \int q(Z;\xi)\ln\frac{p(Z,X)}{q(Z;\xi)}\,dZ
       = \ln p(X) - \mathcal{L}(\xi)$$
     • Maximizing the ELBO $\mathcal{L}(\xi)$ is equivalent to minimizing the KL divergence.
     • The normalization constant $p(X)$ is not required for computing the ELBO.
     • $D_{\mathrm{KL}}(q\,\|\,p) \ge 0 \;\Rightarrow\; \ln p(X) \ge \mathcal{L}(\xi)$: the evidence lower bound (ELBO).
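A quick numerical check of the identity above (not from the slides; the toy model, names, and sample sizes are my own): for $z \sim \mathcal{N}(0,1)$, $x \mid z \sim \mathcal{N}(z,1)$ the evidence $\ln p(x)$ is available in closed form, so a Monte Carlo ELBO can be compared against it. The ELBO equals $\ln p(x)$ when $q$ matches the true posterior and is strictly smaller otherwise.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Toy model (my own example): z ~ N(0,1), x | z ~ N(z,1).
# Then p(x) = N(x | 0, 2) and p(z|x) = N(x/2, 1/2), so the ELBO can be
# checked against the exact log evidence.
x = 1.3

def elbo(m, s, n_samples=100_000):
    """Monte Carlo estimate of L(xi) = E_q[ln p(x|z) + ln p(z) - ln q(z)]."""
    z = rng.normal(m, s, size=n_samples)
    log_joint = norm.logpdf(x, loc=z, scale=1.0) + norm.logpdf(z, 0.0, 1.0)
    log_q = norm.logpdf(z, m, s)
    return np.mean(log_joint - log_q)

log_evidence = norm.logpdf(x, 0.0, np.sqrt(2.0))
print("ln p(x)              :", log_evidence)
print("ELBO at q = posterior:", elbo(x / 2, np.sqrt(0.5)))   # ~ ln p(x)
print("ELBO at a worse q    :", elbo(0.0, 1.0))              # strictly smaller
```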
  3. Linear dimensionality reduction
     Observation model: $p(X \mid Z, W) = \prod_{n=1}^{N} p(x_n \mid z_n, W) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid W z_n, \sigma_x^2 I)$
     Prior: $p(Z) = \prod_{n=1}^{N} \mathcal{N}(z_n \mid 0, I)$, $\quad p(W) = \prod_{i,j} \mathcal{N}(w_{ij} \mid 0, \sigma_w^2)$
     Variational posterior: $p(Z, W \mid X) \approx q(Z)\,q(W)$
  4. Objective
     $$\mathcal{L} = \mathbb{E}_{q(Z)q(W)}[\ln p(X \mid Z, W)] - D_{\mathrm{KL}}[q(Z)\,\|\,p(Z)] - D_{\mathrm{KL}}[q(W)\,\|\,p(W)]$$
     (an expected log-likelihood term plus two KL regularizers)
     Variational M-step: maximize $\mathcal{L}_{q_i(Z)}$ with respect to $q_{i+1}(W)$
     $$\mathcal{L}_{q_i(Z)}
       = \mathbb{E}_{q_i(Z)\,q_{i+1}(W)}[\ln p(X \mid Z, W)] - D_{\mathrm{KL}}[q_{i+1}(W)\,\|\,p(W)] + \text{const.}
       = \mathbb{E}_{q_{i+1}(W)}\!\left[\ln\frac{\exp\!\left(\mathbb{E}_{q_i(Z)}[\ln p(X \mid Z, W)]\right) p(W)}{q_{i+1}(W)}\right] + \text{const.}
       = -D_{\mathrm{KL}}\!\left[q_{i+1}(W)\,\|\,r_i(W)\right] + \text{const.}$$
     $$\therefore\; q_{i+1}(W) = r_i(W) \propto \exp\!\left(\mathbb{E}_{q_i(Z)}[\ln p(X \mid Z, W)]\right) p(W)$$
  5. Variational E-step: maximize $\mathcal{L}_{q_i(W)}$ with respect to $q_{i+1}(Z)$
     $$q_{i+1}(Z) = r_i(Z) \propto \exp\!\left(\mathbb{E}_{q_i(W)}[\ln p(X \mid Z, W)]\right) p(Z)$$
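A sketch of the resulting alternating updates for the linear dimensionality-reduction model of slide 3, assuming the mean-field factorization $q(Z)q(W)$ with Gaussian factors. The closed-form updates below are the standard conjugate results implied by the $r_i$ expressions above; the dimensions, hyperparameters, and iteration count are my own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data generated from the model itself (sizes are illustrative only).
N, D, K = 200, 10, 3
sigma_x, sigma_w = 0.1, 1.0
W_true = rng.normal(0, sigma_w, size=(D, K))
Z_true = rng.normal(0, 1, size=(N, K))
X = Z_true @ W_true.T + rng.normal(0, sigma_x, size=(N, D))

# Mean-field factors: q(Z) = prod_n N(z_n | Mz[n], Sz), q(W) = prod_d N(w_d | Mw[d], Sw)
Mz, Sz = rng.normal(size=(N, K)), np.eye(K)
Mw, Sw = rng.normal(size=(D, K)), np.eye(K)

for it in range(50):
    # E-step: update q(Z) with q(W) fixed
    EWtW = D * Sw + Mw.T @ Mw                        # E[W^T W]
    Sz = np.linalg.inv(np.eye(K) + EWtW / sigma_x**2)
    Mz = (X @ Mw / sigma_x**2) @ Sz
    # M-step: update q(W) with q(Z) fixed
    EZtZ = N * Sz + Mz.T @ Mz                        # sum_n E[z_n z_n^T]
    Sw = np.linalg.inv(np.eye(K) / sigma_w**2 + EZtZ / sigma_x**2)
    Mw = (X.T @ Mz / sigma_x**2) @ Sw

recon = Mz @ Mw.T
print("mean squared reconstruction error:", np.mean((X - recon) ** 2))
```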
  6. Gaussian mixture model
     Observation model: $p(X \mid S, W) = \prod_{n=1}^{N} p(x_n \mid s_n, W) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid W s_n, \sigma_x^2 I)$,
     where $s_n \in \{0,1\}^K$ and $\sum_{k=1}^{K} s_{n,k} = 1$
     Prior: $p(S) = \prod_{n=1}^{N} \mathrm{Cat}(s_n \mid \pi)$
     Variational posterior: $p(S, W \mid X) \approx q(S)\,q(W)$
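A small sketch of sampling from this observation model (parameters are my own toy choices): because $s_n$ is one-hot, $W s_n$ simply selects one column of $W$ as the cluster mean of $x_n$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sampling from the mixture observation model: s_n is one-hot, so W s_n picks
# one column of W as the mean of x_n. All numbers are illustrative only.
N, D, K = 500, 2, 3
sigma_x = 0.3
pi = np.array([0.5, 0.3, 0.2])                     # Cat(s_n | pi)
W = rng.normal(0, 2.0, size=(D, K))                # column k = mean of cluster k

S = rng.multinomial(1, pi, size=N)                 # one-hot assignments, shape (N, K)
X = S @ W.T + rng.normal(0, sigma_x, size=(N, D))  # x_n ~ N(W s_n, sigma_x^2 I)
print("cluster counts:", S.sum(axis=0))
```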
  7. Laplace approximation
     Quadratic approximation of the log posterior around $Z_{\mathrm{MAP}}$:
     $$p(Z \mid X) \approx \mathcal{N}\!\left(Z \mid Z_{\mathrm{MAP}},\, \Lambda(Z_{\mathrm{MAP}})^{-1}\right), \qquad \Lambda(Z) = -\nabla_Z^2 \ln p(Z \mid X)$$
     $$\because\; \ln p(Z \mid X) \approx \ln p(Z_{\mathrm{MAP}} \mid X) + \tfrac{1}{2}\,(Z - Z_{\mathrm{MAP}})^\top \left.\nabla_Z^2 \ln p(Z \mid X)\right|_{Z=Z_{\mathrm{MAP}}} (Z - Z_{\mathrm{MAP}})$$
     $$\because\; \left.\nabla_Z \ln p(Z \mid X)\right|_{Z=Z_{\mathrm{MAP}}} = 0 \quad \text{(the first-order term vanishes)}$$
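A minimal sketch of the Laplace approximation for a scalar $z$, assuming a toy Beta-shaped posterior of my own choosing: the MAP point is found numerically and the precision $\Lambda(z_{\mathrm{MAP}})$ is taken from a finite-difference second derivative.

```python
import numpy as np
from scipy import optimize, stats

# Toy target (my own example): an unnormalized log-posterior over a scalar z,
# here ln p(z | X) + const for a Beta(5, 3)-shaped posterior on (0, 1).
a, b = 5.0, 3.0
log_post = lambda z: (a - 1) * np.log(z) + (b - 1) * np.log(1 - z)

# MAP estimate: the gradient of ln p(z|X) vanishes at z_MAP
res = optimize.minimize_scalar(lambda z: -log_post(z), bounds=(1e-6, 1 - 1e-6),
                               method="bounded")
z_map = res.x

# Precision Lambda(z_MAP) = -d^2/dz^2 ln p(z|X), by central finite differences
h = 1e-5
lam = -(log_post(z_map + h) - 2 * log_post(z_map) + log_post(z_map - h)) / h**2

approx = stats.norm(loc=z_map, scale=np.sqrt(1.0 / lam))
print("z_MAP =", z_map, " variance =", 1.0 / lam)
print("exact posterior density at z_MAP :", stats.beta(a, b).pdf(z_map))
print("Laplace approximation at z_MAP   :", approx.pdf(z_map))
```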
  8. Moment matching
     Approximate $p(z)$ by an exponential-family distribution $q(z;\eta) = h(z)\exp\!\left(\eta^\top t(z) - a(\eta)\right)$:
     $$D_{\mathrm{KL}}\!\left(p(z)\,\|\,q(z;\eta)\right) = -\mathbb{E}_{p(z)}[\ln q(z;\eta)] + \mathbb{E}_{p(z)}[\ln p(z)]
       = -\eta^\top \mathbb{E}_{p(z)}[t(z)] + a(\eta) + \text{const.}$$
     $$\nabla_\eta D_{\mathrm{KL}}\!\left(p(z)\,\|\,q(z;\eta)\right) = -\mathbb{E}_{p(z)}[t(z)] + \nabla_\eta a(\eta)
       = -\mathbb{E}_{p(z)}[t(z)] + \mathbb{E}_{q(z;\eta)}[t(z)] = 0$$
     $$\therefore\; \mathbb{E}_{q(z;\eta)}[t(z)] = \mathbb{E}_{p(z)}[t(z)]$$
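A sketch of moment matching with a Gaussian $q(z;\eta)$, whose sufficient statistics are $t(z) = (z, z^2)$; the target $p(z)$ here is a toy Gaussian mixture of my own choosing, with samples standing in for the expectations under $p$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target p(z): a two-component Gaussian mixture (my own toy example).
# Samples stand in for expectations under p(z).
z = np.where(rng.random(200_000) < 0.3,
             rng.normal(-2.0, 0.5, 200_000),
             rng.normal(1.0, 1.0, 200_000))

# For a Gaussian q(z; eta), the sufficient statistics are t(z) = (z, z^2).
# Setting E_q[t(z)] = E_p[t(z)] fixes the mean and variance of q.
mean_q = z.mean()
var_q = z.var()          # E_p[z^2] - E_p[z]^2
print("moment-matched q(z) = N(%.3f, %.3f)" % (mean_q, var_q))
```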
  9. Assumed density filtering
     Consider estimation for a sequence of data $\mathcal{D}_1, \mathcal{D}_2, \dots$
     With a conjugate prior, the posteriors $p_i(\theta)\ (i = 0, 1, \dots)$ stay in the same family:
     $$p_{i+1}(\theta) = Z_{i+1}^{-1}\, p(\mathcal{D}_{i+1} \mid \theta)\, p_i(\theta)$$
     With a non-conjugate prior, approximate each updated posterior by moment matching:
     $$q_{i+1}(\theta) \approx r_{i+1}(\theta) = Z_{i+1}^{-1}\, p(\mathcal{D}_{i+1} \mid \theta)\, q_i(\theta) = Z_{i+1}^{-1}\, f_{i+1}(\theta)\, q_i(\theta), \qquad q_0(\theta) = p_0(\theta)$$
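For contrast, a sketch of the conjugate case, where the recursion $p_{i+1}(\theta) \propto p(\mathcal{D}_{i+1} \mid \theta)\, p_i(\theta)$ stays inside one family and no moment matching is needed; the Beta-Bernoulli pairing and the data below are my own toy example.

```python
from scipy import stats

# Conjugate case (Beta prior, Bernoulli likelihood): the recursion
# p_{i+1}(theta) ∝ p(D_{i+1} | theta) p_i(theta) stays in the Beta family,
# so assumed density filtering is exact here.
a, b = 1.0, 1.0                      # p_0(theta) = Beta(1, 1)
data = [1, 0, 1, 1, 0, 1]            # D_1, D_2, ...
for x in data:
    a, b = a + x, b + (1 - x)        # closed-form posterior update
    print("posterior after this point: Beta(%g, %g), mean %.3f"
          % (a, b, stats.beta(a, b).mean()))
```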
  10. MM with a 1-dim Gaussian distribution
      $q_i(\theta) = \mathcal{N}(\theta \mid \mu_i, v_i)$
      Normalization constant:
      $$Z_{i+1} = \int f_{i+1}(\theta)\, \frac{1}{\sqrt{2\pi v_i}} \exp\!\left(-\frac{(\theta - \mu_i)^2}{2 v_i}\right) d\theta$$
      Differentiate $\ln Z_{i+1}$ with respect to $\mu_i$:
      $$\frac{\partial}{\partial \mu_i} \ln Z_{i+1}
        = \frac{1}{Z_{i+1}} \int f_{i+1}(\theta)\, \mathcal{N}(\theta \mid \mu_i, v_i)\, \frac{\theta - \mu_i}{v_i}\, d\theta
        = \frac{\mathbb{E}_{r_{i+1}}[\theta] - \mu_i}{v_i}$$
      $$\therefore\; \mu_{i+1} = \mathbb{E}_{r_{i+1}}[\theta] = \mu_i + v_i \frac{\partial}{\partial \mu_i} \ln Z_{i+1}$$
  11. Differentiate $\ln Z_{i+1}$ with respect to $v_i$
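A sketch of one full Gaussian ADF step covering slides 10 and 11: both derivatives of $\ln Z_{i+1}$ are taken numerically, and the variance update uses the standard identity $v_{i+1} = v_i - v_i^2\left[\left(\partial \ln Z_{i+1}/\partial \mu_i\right)^2 - 2\, \partial \ln Z_{i+1}/\partial v_i\right]$. The factor $f(\theta)$ and all numbers are my own illustration; the result is checked against the exact moments of $r_{i+1}(\theta)$ computed by quadrature.

```python
import numpy as np
from scipy import integrate, stats

# One ADF step with a Gaussian q_i(theta) = N(mu, v) and a single factor
# f(theta); the factor below (a probit-style term, Phi(theta)) and all
# numbers are illustrative. The updates use the derivative-of-log-normalizer
# identities:
#   mu' = mu + v * d lnZ / d mu
#   v'  = v - v^2 * [ (d lnZ/d mu)^2 - 2 * d lnZ/d v ]
f = stats.norm.cdf
mu, v = 0.5, 2.0

def log_Z(mu, v):
    integrand = lambda th: f(th) * stats.norm.pdf(th, mu, np.sqrt(v))
    return np.log(integrate.quad(integrand, -np.inf, np.inf)[0])

h = 1e-5
dlnZ_dmu = (log_Z(mu + h, v) - log_Z(mu - h, v)) / (2 * h)
dlnZ_dv = (log_Z(mu, v + h) - log_Z(mu, v - h)) / (2 * h)

mu_new = mu + v * dlnZ_dmu
v_new = v - v**2 * (dlnZ_dmu**2 - 2 * dlnZ_dv)

# Check against the exact moments of r(theta) ∝ f(theta) q_i(theta)
Z = np.exp(log_Z(mu, v))
m1 = integrate.quad(lambda th: th * f(th) * stats.norm.pdf(th, mu, np.sqrt(v)),
                    -np.inf, np.inf)[0] / Z
m2 = integrate.quad(lambda th: th**2 * f(th) * stats.norm.pdf(th, mu, np.sqrt(v)),
                    -np.inf, np.inf)[0] / Z
print("moment matching:", mu_new, v_new)
print("exact moments  :", m1, m2 - m1**2)
```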
  12. MM with a Gamma distribution
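A sketch of moment matching within the Gamma family, whose sufficient statistics are $t(\theta) = (\ln\theta, \theta)$: matching $\mathbb{E}[\theta]$ and $\mathbb{E}[\ln\theta]$ pins down the shape through the digamma function. The target distribution and the solver bracket below are my own choices.

```python
import numpy as np
from scipy import optimize, special

rng = np.random.default_rng(0)

# Target p(theta): samples from some positive-valued distribution
# (a log-normal here, my own toy choice).
theta = rng.lognormal(mean=0.5, sigma=0.4, size=200_000)

# Gamma(a, b) has sufficient statistics t(theta) = (ln theta, theta), with
# E[ln theta] = psi(a) - ln b and E[theta] = a / b, so moment matching means
# solving  psi(a) - ln a = E_p[ln theta] - ln E_p[theta]  for the shape a.
m, m_log = theta.mean(), np.log(theta).mean()
g = lambda a: special.digamma(a) - np.log(a) - (m_log - np.log(m))
a = optimize.brentq(g, 1e-3, 1e6)
b = a / m
print("moment-matched Gamma: shape a = %.3f, rate b = %.3f" % (a, b))
```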
  13. MM for probit regression
      $$p(Y \mid X, w) = \prod_{n=1}^{N} \phi(y_n \mid x_n, w), \qquad p(w) = \mathcal{N}(w \mid 0, v_0)$$
      The marginal likelihood $Z = \int p(Y \mid X, w)\, p(w)\, dw$ is intractable.
      Instead, apply the recursive update $q_{i+1}(\theta) = Z_{i+1}^{-1}\, p(\mathcal{D}_{i+1} \mid \theta)\, q_i(\theta)$.
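A sketch of ADF for a one-dimensional probit regression weight, assuming the closed-form normalizer $Z = \Phi\!\left(y x \mu / \sqrt{1 + x^2 v}\right)$ for a probit factor under a Gaussian; the data, prior variance, and step details are my own.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# ADF for 1-D probit regression (scalar weight w), a toy setup:
# likelihood Phi(y_n * x_n * w), prior q_0(w) = N(0, v0). Each data point is
# absorbed with the Gaussian moment-matching updates via derivatives of ln Z,
# using the closed-form normalizer Z = Phi(y * x * mu / sqrt(1 + x^2 v)).
w_true = 1.5
x = rng.normal(size=100)
y = np.where(rng.random(100) < stats.norm.cdf(w_true * x), 1, -1)

mu, v = 0.0, 10.0                                   # q_0(w) = N(0, v0)
def log_Z(mu, v, x_n, y_n):
    return stats.norm.logcdf(y_n * x_n * mu / np.sqrt(1.0 + x_n**2 * v))

h = 1e-6
for x_n, y_n in zip(x, y):
    d_mu = (log_Z(mu + h, v, x_n, y_n) - log_Z(mu - h, v, x_n, y_n)) / (2 * h)
    d_v = (log_Z(mu, v + h, x_n, y_n) - log_Z(mu, v - h, x_n, y_n)) / (2 * h)
    mu, v = mu + v * d_mu, v - v**2 * (d_mu**2 - 2 * d_v)

print("ADF posterior over w: mean %.3f, variance %.4f (true w = %.1f)"
      % (mu, v, w_true))
```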
  14. Expectation propagation
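And a minimal EP sketch for the same toy probit model as above, assuming Gaussian site approximations stored in natural parameters: unlike ADF, each likelihood factor is revisited and refined against its cavity distribution over several sweeps.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Expectation propagation for the 1-D probit model from the previous sketch.
# Each likelihood factor Phi(y_n x_n w) gets a Gaussian "site" in natural
# parameters (tau_n, nu_n); every site is repeatedly refined against its
# cavity distribution. All data and settings are illustrative only.
w_true = 1.5
x = rng.normal(size=100)
y = np.where(rng.random(100) < stats.norm.cdf(w_true * x), 1, -1)

v0 = 10.0
tau_site = np.zeros_like(x)          # site precisions
nu_site = np.zeros_like(x)           # site precision-times-mean
tau, nu = 1.0 / v0, 0.0              # global natural parameters

def log_Z(mu, v, x_n, y_n):
    return stats.norm.logcdf(y_n * x_n * mu / np.sqrt(1.0 + x_n**2 * v))

h = 1e-6
for sweep in range(10):
    for n in range(len(x)):
        # cavity distribution: remove site n from the current posterior
        tau_c, nu_c = tau - tau_site[n], nu - nu_site[n]
        mu_c, v_c = nu_c / tau_c, 1.0 / tau_c
        # moment-match cavity * exact factor (same identities as in ADF)
        d_mu = (log_Z(mu_c + h, v_c, x[n], y[n]) - log_Z(mu_c - h, v_c, x[n], y[n])) / (2 * h)
        d_v = (log_Z(mu_c, v_c + h, x[n], y[n]) - log_Z(mu_c, v_c - h, x[n], y[n])) / (2 * h)
        mu_new = mu_c + v_c * d_mu
        v_new = v_c - v_c**2 * (d_mu**2 - 2 * d_v)
        # new site = matched posterior minus cavity, then refresh the global fit
        tau_site[n] = 1.0 / v_new - tau_c
        nu_site[n] = mu_new / v_new - nu_c
        tau, nu = 1.0 / v_new, mu_new / v_new

print("EP posterior over w: mean %.3f, variance %.4f" % (nu / tau, 1.0 / tau))
```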
