Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

深層生成モデルを用いたマルチモーダルデータの半教師あり学習

JSAI2017での発表資料

  • Login to see the comments

深層生成モデルを用いたマルチモーダルデータの半教師あり学習

  1. 1. 2017/05/23
  2. 2. ¤ ¤ ¤ ¤ ¤ ¤ →
  3. 3. ¤ ¤ [Guillaumin+ 10] [Cheng+ 16] ¤ end-to-end ¤ VAE [Kingma+ 14][Maaløe+ 16] GAN [Salimans+ 16] ¤ JMVAE[Suzuki+ 16]
  4. 4. ¤ Guillaumin [Guillaumin+ 10] ¤ ¤ 2 ¤ MKL ¤ ¤ Cheng [Cheng+ 16] ¤ RGB-D ¤ Co-training ¤ ¤ ->
  5. 5. ¤ Semi-Supervised Learning with Deep Generative Models [Kingma+ 14] ¤ ¤ ! " # ℒ = ℒ " + ℒ ", # + ( ) *[−log01 # " ] 34 " #, ! 01 ! ", # 01 # "
  6. 6. ¤ Joint multimodal variational autoencoders (JMVAE)[Suzuki+ 16] ¤ joint 3(", 6) ¤ " 6 ! joint representation ¤ それらの生成過程を次のように考える. z ∼ pθ(z) (1) x, w ∼ pθ(x, w|z) (2) ,それぞれのドメインのデータについて条件付き独立と仮定する. pθ(x, w|z) = pθx (x|z)pθw (w|z) (3) x z w 図 1 両方のドメインが観測されたときの TrVB モデルの変分下界 L は,次のようになる. 2 01 ! ", # 34 ", 6 ! 34 ", 6 ! = 34("|!)34(6|!)
  7. 7. ¤ ¤ ¤ ¤ ¤ Sea [blue, sky, sand,…]
  8. 8. SS-MVAE ¤ Semi-Supervised Multimodal Variational AutoEncoder (SS-MVAE) ¤ JMVAE ¤ L = {(x1, w1, y1, ), ..., xi wi ∈ {0, 1}C N , wN )} q(y|x, w) (a) SS-MVAE (b) SS-HMVAE 1: 2 34 ", 6 #, !01 # ", 6 34 ", 6 #, ! = 34 " #, ! 34 6 #, ! 01 ! ", 6, #
  9. 9. SS-MVAE ¤ SS-MVAE ¤ ¤ = µ + σ2 ⊙ ϵ 12 Maddison 16] 1 y)dz Ll(x, w, y) = L(x, w, y) − α · log qφ(y|x, w) (4) α α = 0.5 · M+N M J = (xi,wi,yi)∈DL Ll(xi, wi, yi) + (xj ,wj )∈DU U(xj, wj) (5) JMVAE φ θ qφ(y|x) Semi-Supervised Multimodal Variational AutoEncoder SS-MVAE 3.4 SS-HMVAE SS-MVAE a p(x, w, y) = pθ(x|a)pθ(w|a)pθ(a|z, y)p(z)p(y)dadz 1 (a) (b) SS-MVAE y z x w y z a x w |z)pθ(z)dz θ z)pθ(z) w) ] (1) µ + σ2 ⊙ ϵ 12 ddison 16] log p(x, w) = log pθ(x, w, z, y)dzdy ≥ Eqφ(z,y|x,w)[log pθ(x|z, y)pθ(w|z, y)pθ(z) qφ(z, y|x, w) ] ≡ −U(x, w) (3) qφ(z, y|x, w) = qφ(z|x, w, y)qφ(y|x, w) qφ(y|x) 2 Ll(x, w, y) = L(x, w, y) − α · log qφ(y|x, w) (4) α α = 0.5 · M+N M J = (xi,wi,yi)∈DL Ll(xi, wi, yi) + (xj ,wj )∈DU U(xj, wj) (5) JMVAE φ θ qφ(y|x) Gumbel softmax[Jang 16, Maddison 16] φ θ 1 3.3 SS-MVAE JMVAE y p(x, w, y) = pθ(x|z, y)pθ(w|z, y)p(z)p(y)dz 1(a) log p(x, w, y) = log pθ(x, w, z, y)dz ≥ Eqφ(z|x,w,y)[log pθ(x|z, y)pθ(w|z, y)pθ(z) qφ(z|x, w, y) ] ≡ −L(x, w, y) (2) ∗1 C JMVAE qφ( Se Variational AutoEncoder SS-M 3.4 SS-MVAE a pθ(x|a)pθ(w|a)pθ(a|z, y)p(z) (a) (b) SS-MVAE q(a, z|x, w, y) = q(z|x, w, y) = p(z|x, w, y) Gulrajani 16] 2 12 Gumbel softmax[Jang 16, Maddison 16] φ θ 1 3.3 SS-MVAE JMVAE y p(x, w, y) = pθ(x|z, y)pθ(w|z, y)p(z)p(y)dz 1(a) log p(x, w, y) = log pθ(x, w, z, y)dz ≥ Eqφ(z|x,w,y)[log pθ(x|z, y)pθ(w|z, y)pθ(z) qφ(z|x, w, y) ] ≡ −L(x, w, y) (2) ∗1 C (xi,wi,yi)∈ Variational Au 3.4 S a pθ(x|a)pθ(w (a) (b) p(z|x, w Gulrajani 16] 2 Gumbel softmax[Jang 16, Maddison 16] φ θ 1 3.3 SS-MVAE JMVAE y p(x, w, y) = pθ(x|z, y)pθ(w|z, y)p(z)p(y)dz 1(a) log p(x, w, y) = log pθ(x, w, z, y)dz ≥ Eqφ(z|x,w,y)[log pθ(x|z, y)pθ(w|z, y)pθ(z) qφ(z|x, w, y) ] ≡ −L(x, w, y) (2) ∗1 C Va 3. a (a G (a) SS-MVAE (b) SS-HMVAE 1: qφ(y|x, w) log p(x, w) = log pθ(x, w, z, y)dzdy ≥ Eqφ(z,y|x,w)[log pθ(x|z, y)pθ(w|z, y)pθ(z) qφ(z, y|x, w) ] ≡ −U(x, w) (3) qφ(z, y|x, w) = qφ(z|x, w, y)qφ(y|x, w) qφ(y|x) 2 x1, w1, y1, ), ..., xi wi q(y|x, w) oder JMVAE x w pθ(w|z)pθ(z)dz θ (w|z)pθ(z) (a) SS-MVAE (b) SS-HMVAE 1: qφ(y|x, w) log p(x, w) = log pθ(x, w, z, y)dzdy ≥ Eqφ(z,y|x,w)[log pθ(x|z, y)pθ(w|z, y)pθ(z) qφ(z, y|x, w) ] ≡ −U(x, w) (3) qφ(z, y|x, w) = qφ(z|x, w, y)qφ(y|x, w) qφ(y|x) [Kingma 14a, Rezende 14] 12 Gumbel softmax[Jang 16, Maddison 16] φ θ 1 3.3 SS-MVAE JMVAE y p(x, w, y) = pθ(x|z, y)pθ(w|z, y)p(z)p(y)dz 1(a) log p(x, w, y) = log pθ(x, w, z, y)dz ≥ Eqφ(z|x,w,y)[log pθ(x|z, y)pθ(w|z, y)pθ(z) qφ(z|x, w, y) ] J = (xi,wi,yi)∈ Variational Au 3.4 S a pθ(x|a)pθ( (a) (b) = {(x1, w1, y1, ), ..., xi wi {0, 1}C wN )} q(y|x, w) utoencoder JMVAE x w pθ(x|z)pθ(w|z)pθ(z)dz θ )dz (x|z)pθ(w|z)pθ(z) qφ(z|x, w) ] (a) SS-MVAE (b) SS-HMVAE 1: qφ(y|x, w) log p(x, w) = log pθ(x, w, z, y)dzdy ≥ Eqφ(z,y|x,w)[log pθ(x|z, y)pθ(w|z, y)pθ(z) qφ(z, y|x, w) ] ≡ −U(x, w) (3) qφ(z, y|x, w) = qφ(z|x, w, y)qφ(y|x, w) qφ(y|x) 2 0
  10. 10. (a) SS-MVAE (b) SS-HMVAE 1: SS-HMVAE ¤ Semi-Supervised Hierarchical Multimodal Variational AutoEn- coder (SS-HMVAE) ¤ ¤ 2 34 9 #, ! 34 ", 6 9 01(9|", 6) 01(!|9, #) 01(#|", 6)
  11. 11. 2 ¤ SS-HMVAE 9 ¤ auxiliary variables ¤ ¤ [Maaløe+ 16] DL = {(x1, w1, y1, ), ..., xi wi y ∈ {0, 1}C , (xN , wN )} N q(y|x, w) ional autoencoder JMVAE x w w) = pθ(x|z)pθ(w|z)pθ(z)dz θ E(x, w) (a) SS-MVAE (b) SS-HMVAE 1: qφ(y|x, w) log p(x, w) = log pθ(x, w, z, y)dzdy ≥ Eqφ(z,y|x,w)[log pθ(x|z, y)pθ(w|z, y)pθ(z) qφ(z, y|x, w) ] L = {(x1, w1, y1, ), ..., xi wi ∈ {0, 1}C N , wN )} q(y|x, w) l autoencoder JMVAE x w = pθ(x|z)pθ(w|z)pθ(z)dz θ w) (a) SS-MVAE (b) SS-HMVAE 1: qφ(y|x, w) log p(x, w) = log pθ(x, w, z, y)dzdy ≥ Eqφ(z,y|x,w)[log pθ(x|z, y)pθ(w|z, y)pθ(z) qφ(z, y|x, w) ] ≡ −U(x, w) (3) SS-MVAE SS-HMVAE q(z|x, w, y) = Z q(a, z|x, w, y)da
  12. 12. ¤ ¤ ¤ ¤ Gumbel softmax[Jang+ 2016] ¤ 15,000 10,000 975,000 M = 15, 000 N = 975, 000 4.2 x w R3857 {0, 1}2000 pθ(x|z, y) = N(x|µθ(z, y), diag(σ2 θ (z, y))) (8) pθ(w|z, y) = Ber(w|πθ(z, y)) (9) pθ(x|a) = N(x|µθ(a), diag(σ2 θ (a))) (10) pθ(w|a) = Ber(w|πθ(a)) (11) y {0, 1}38 qφ(y|x, w) = Ber(y|πθ(x, w)) (12) SS-MVAE SS-HMVAE ∗2 http://www.flickr.com ∗3 http://www.cs.toronto.edu/˜nitish/multimodal/index.html SS-HMVAE MC=10 SS-MVAE MAP MAP HMVAE 5. 2 ∗4 https://github.com/Thean ∗5 https://github.com/Lasag ∗6 https://github.com/masa- ∗7 [ 16] LRAP MAP 3 4.2 x w R3857 {0, 1}2000 pθ(x|z, y) = N(x|µθ(z, y), diag(σ2 θ (z, y))) (8) pθ(w|z, y) = Ber(w|πθ(z, y)) (9) pθ(x|a) = N(x|µθ(a), diag(σ2 θ (a))) (10) pθ(w|a) = Ber(w|πθ(a)) (11) y {0, 1}38 qφ(y|x, w) = Ber(y|πθ(x, w)) (12) SS-MVAE SS-HMVAE ∗2 http://www.flickr.com ∗3 http://www.cs.toronto.edu/˜nitish/multimodal/index.html MAP MAP HMVAE 5. 2 ∗4 https://github.com/Thea ∗5 https://github.com/Lasag ∗6 https://github.com/masa ∗7 [ 16] LRAP MAP 3 Semi- ultimodal Variational AutoEn- |a)pθ(w|a)pθ(a|z, y)p(z)p(y) qφ(a, z|x, w, y) ] (6) p(z) = N(z|0, I) (13) p(y) = Ber(y|π) (14) pθ(a|z, y) = N(a|µθ(z, y), diag(σ2 θ (z, y))) (15) qφ(a|x, w) = N(z|µθ(x, w), diag(σ2 θ (x, w))) (16) qφ(z|a, y) = N(z|µθ(a, y), diag(σ2 θ (a, y))) (17) rectified linear unit Adam [Kingma 14b]
  13. 13. ¤ Tars ¤ Tars ¤ ¤ ¤ Github https://github.com/masa-su/Tars P(A,B,C,D)=P(A)P(B∣A)P(C∣A)P(D∣A,B)
  14. 14. Tars ¤ VAE x = InputLayer((None,n_x)) q_0 = DenseLayer(x,num_units=512,nonlinearity=activation) q_1 = DenseLayer(q_0,num_units=512,nonlinearity=activation) q_mean = DenseLayer(q_1,num_units=n_z,nonlinearity=linear) q_var = DenseLayer(q_1,num_units=n_z,nonlinearity=softplus) q = Gauss(q_mean,q_var,given=[x]) 0(!|") z = InputLayer((None,n_z)) p_0 = DenseLayer(z,num_units=512,nonlinearity=activation) p_1 = DenseLayer(p_0,num_units=512,nonlinearity=activation) p_mean = DenseLayer(p_1,num_units=n_x,nonlinearity=sigmoid) p = Bernoulli(p_mean,given=[z]) 3("|!) model = VAE(q, p, n_batch=n_batch, optimizer=adam) lower_bound_train = model.train([train_x])
  15. 15. Tars ¤ ¤ z = q.sample_given_x(x) # z = q.sample_mean_given_x(x) # log_likelihood = q.log_likelihood_given_x(x, z) • • !~0(!|") log 0 (!|")
  16. 16. ¤ Flickr25k ¤ ¤ 38 one-hot ¤ 3,857 2,000 ¤ ¤ 100 2 5000 -> 97 5000 desert, nature, landscape, sky rose, pink clouds, plant life, sky, tree flower, plant life
  17. 17. ¤ ¤ SS-MVAE ¤ SS-HMVAE ¤ ¤ SVM DBN Autoencoder DBM JMVAE ¤ mean average precision (mAP) ¤ 3. 3.1 DL = {(x1, w1, y1, ), ..., (xM , wM , yM )} xi wi y ∈ {0, 1}C ∗1 DU = {(x1, w1, y1, ), ..., (xN , wN )} M << N q(y|x, w) 3.2 JMVAE joint multimodal variational autoencoder JMVAE [Suzuki 16][ 16] z x w p(x, w) = pθ(x|z)pθ(w|z)pθ(z)dz VAE JMVAE θ θ −UJMV AE(x, w) log p(x, w) = log pθ(x, w, z)dz ≥ Eqφ(z|x,w)[log pθ(x|z)pθ(w|z)pθ(z) qφ(z|x, w) ] ≡ −UJMV AE(x, w) (1) qφ(z|x, w) φ (a) SS-MVAE (b) SS-HMVAE 1: qφ(y|x, w) log p(x, w) = log pθ(x, w, z, y)dzdy ≥ Eqφ(z,y|x,w)[log pθ(x|z, y)pθ(w|z, y)pθ(z) qφ(z, y|x, w) ] ≡ −U(x, w) (3) qφ(z, y|x, w) = qφ(z|x, w, y)qφ(y|x, w) qφ(y|x) 2 3. 3.1 DL = {(x1, w1, y1, ), ..., (xM , wM , yM )} xi wi y ∈ {0, 1}C ∗1 DU = {(x1, w1, y1, ), ..., (xN , wN )} M << N q(y|x, w) 3.2 JMVAE joint multimodal variational autoencoder JMVAE [Suzuki 16][ 16] z x w p(x, w) = pθ(x|z)pθ(w|z)pθ(z)dz VAE JMVAE θ θ −UJMV AE(x, w) log p(x, w) = log pθ(x, w, z)dz ≥ Eqφ(z|x,w)[log pθ(x|z)pθ(w|z)pθ(z) qφ(z|x, w) ] ≡ −UJMV AE(x, w) (1) qφ(z|x, w) φ (a) SS-MVAE (b) SS-HMVAE 1: qφ(y|x, w) log p(x, w) = log pθ(x, w, z, y)dzdy ≥ Eqφ(z,y|x,w)[log pθ(x|z, y)pθ(w|z, y)pθ(z) qφ(z, y|x, w) ] ≡ −U(x, w) (3) qφ(z, y|x, w) = qφ(z|x, w, y)qφ(y|x, w) qφ(y|x) 2 SS-MVAE SS-HMVAE
  18. 18. mAP SVM [Huiskes+] 0.475 DBN [Srivastava+]* 0.609 Autoencoder [Ngiam+]* 0.612 DBM [Srivastava+]* 0.622 JMVAE [Suzuki+] 0.618 SS-MVAE (MC=1) 0.612 SS-MVAE (MC=10) 0.626 SS-HMVAE (MC=1) 0.632 SS-HMVAE (MC=10) 0.628 • • SS-HMVAE • • • • * • MC
  19. 19. ¤ mAP validation curve MAP 0.618 SS-MVAE (MC=1) 0.612 SS-HMVAE (MC=1) 0.632 SS-MVAE (MC=10) 0.626 SS-HMVAE (MC=10) 0.628 2: MAP Flickr retrie internation trieval, pp. [Ioffe 15] Ioffe Acceleratin covariate sh [Jang 16] Jan cal Repara preprint ar [Kingma 13] Auto-encod arXiv:1312 [Kingma 14a] and Wellin generative Processing [Kingma 14b] stochastic o (2014) [Maaløe 16] M • SS-HMVAE • SS-MVAE JMVAE
  20. 20. ¤ MIR Flickr25k ¤ ¤ ¤ ¤ ¤ ¤ RGB-D
  21. 21. ¤ ¤ JMVAE SS-HMVAE SS-MVAE ¤ Tars ¤ ¤ SS-HMVAE ¤ ¤ ¤ ¤ GAN VAT

×