Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Upcoming SlideShare
Loading in …5
×

# 深層生成モデルを用いたマルチモーダルデータの半教師あり学習

JSAI2017での発表資料

• Full Name
Comment goes here.

Are you sure you want to Yes No
Your message goes here
• Login to see the comments

### 深層生成モデルを用いたマルチモーダルデータの半教師あり学習

1. 1. 2017/05/23
2. 2. ¤ ¤ ¤ ¤ ¤ ¤ →
3. 3. ¤ ¤ [Guillaumin+ 10] [Cheng+ 16] ¤ end-to-end ¤ VAE [Kingma+ 14][Maaløe+ 16] GAN [Salimans+ 16] ¤ JMVAE[Suzuki+ 16]
4. 4. ¤ Guillaumin [Guillaumin+ 10] ¤ ¤ 2 ¤ MKL ¤ ¤ Cheng [Cheng+ 16] ¤ RGB-D ¤ Co-training ¤ ¤ ->
5. 5. ¤ Semi-Supervised Learning with Deep Generative Models [Kingma+ 14] ¤ ¤ ! " # ℒ = ℒ " + ℒ ", # + ( ) *[−log01 # " ] 34 " #, ! 01 ! ", # 01 # "
6. 6. ¤ Joint multimodal variational autoencoders (JMVAE)[Suzuki+ 16] ¤ joint 3(", 6) ¤ " 6 ! joint representation ¤ それらの生成過程を次のように考える. z ∼ pθ(z) (1) x, w ∼ pθ(x, w|z) (2) ，それぞれのドメインのデータについて条件付き独立と仮定する． pθ(x, w|z) = pθx (x|z)pθw (w|z) (3) x z w 図 1 両方のドメインが観測されたときの TrVB モデルの変分下界 L は，次のようになる． 2 01 ! ", # 34 ", 6 ! 34 ", 6 ! = 34("|!)34(6|!)
7. 7. ¤ ¤ ¤ ¤ ¤ Sea [blue, sky, sand,…]
8. 8. SS-MVAE ¤ Semi-Supervised Multimodal Variational AutoEncoder (SS-MVAE) ¤ JMVAE ¤ L = {(x1, w1, y1, ), ..., xi wi ∈ {0, 1}C N , wN )} q(y|x, w) (a) SS-MVAE (b) SS-HMVAE 1: 2 34 ", 6 #, !01 # ", 6 34 ", 6 #, ! = 34 " #, ! 34 6 #, ! 01 ! ", 6, #
9. 9. SS-MVAE ¤ SS-MVAE ¤ ¤ = µ + σ2 ⊙ ϵ 12 Maddison 16] 1 y)dz Ll(x, w, y) = L(x, w, y) − α · log qφ(y|x, w) (4) α α = 0.5 · M+N M J = (xi,wi,yi)∈DL Ll(xi, wi, yi) + (xj ,wj )∈DU U(xj, wj) (5) JMVAE φ θ qφ(y|x) Semi-Supervised Multimodal Variational AutoEncoder SS-MVAE 3.4 SS-HMVAE SS-MVAE a p(x, w, y) = pθ(x|a)pθ(w|a)pθ(a|z, y)p(z)p(y)dadz 1 (a) (b) SS-MVAE y z x w y z a x w |z)pθ(z)dz θ z)pθ(z) w) ] (1) µ + σ2 ⊙ ϵ 12 ddison 16] log p(x, w) = log pθ(x, w, z, y)dzdy ≥ Eqφ(z,y|x,w)[log pθ(x|z, y)pθ(w|z, y)pθ(z) qφ(z, y|x, w) ] ≡ −U(x, w) (3) qφ(z, y|x, w) = qφ(z|x, w, y)qφ(y|x, w) qφ(y|x) 2 Ll(x, w, y) = L(x, w, y) − α · log qφ(y|x, w) (4) α α = 0.5 · M+N M J = (xi,wi,yi)∈DL Ll(xi, wi, yi) + (xj ,wj )∈DU U(xj, wj) (5) JMVAE φ θ qφ(y|x) Gumbel softmax[Jang 16, Maddison 16] φ θ 1 3.3 SS-MVAE JMVAE y p(x, w, y) = pθ(x|z, y)pθ(w|z, y)p(z)p(y)dz 1(a) log p(x, w, y) = log pθ(x, w, z, y)dz ≥ Eqφ(z|x,w,y)[log pθ(x|z, y)pθ(w|z, y)pθ(z) qφ(z|x, w, y) ] ≡ −L(x, w, y) (2) ∗1 C JMVAE qφ( Se Variational AutoEncoder SS-M 3.4 SS-MVAE a pθ(x|a)pθ(w|a)pθ(a|z, y)p(z) (a) (b) SS-MVAE q(a, z|x, w, y) = q(z|x, w, y) = p(z|x, w, y) Gulrajani 16] 2 12 Gumbel softmax[Jang 16, Maddison 16] φ θ 1 3.3 SS-MVAE JMVAE y p(x, w, y) = pθ(x|z, y)pθ(w|z, y)p(z)p(y)dz 1(a) log p(x, w, y) = log pθ(x, w, z, y)dz ≥ Eqφ(z|x,w,y)[log pθ(x|z, y)pθ(w|z, y)pθ(z) qφ(z|x, w, y) ] ≡ −L(x, w, y) (2) ∗1 C (xi,wi,yi)∈ Variational Au 3.4 S a pθ(x|a)pθ(w (a) (b) p(z|x, w Gulrajani 16] 2 Gumbel softmax[Jang 16, Maddison 16] φ θ 1 3.3 SS-MVAE JMVAE y p(x, w, y) = pθ(x|z, y)pθ(w|z, y)p(z)p(y)dz 1(a) log p(x, w, y) = log pθ(x, w, z, y)dz ≥ Eqφ(z|x,w,y)[log pθ(x|z, y)pθ(w|z, y)pθ(z) qφ(z|x, w, y) ] ≡ −L(x, w, y) (2) ∗1 C Va 3. a (a G (a) SS-MVAE (b) SS-HMVAE 1: qφ(y|x, w) log p(x, w) = log pθ(x, w, z, y)dzdy ≥ Eqφ(z,y|x,w)[log pθ(x|z, y)pθ(w|z, y)pθ(z) qφ(z, y|x, w) ] ≡ −U(x, w) (3) qφ(z, y|x, w) = qφ(z|x, w, y)qφ(y|x, w) qφ(y|x) 2 x1, w1, y1, ), ..., xi wi q(y|x, w) oder JMVAE x w pθ(w|z)pθ(z)dz θ (w|z)pθ(z) (a) SS-MVAE (b) SS-HMVAE 1: qφ(y|x, w) log p(x, w) = log pθ(x, w, z, y)dzdy ≥ Eqφ(z,y|x,w)[log pθ(x|z, y)pθ(w|z, y)pθ(z) qφ(z, y|x, w) ] ≡ −U(x, w) (3) qφ(z, y|x, w) = qφ(z|x, w, y)qφ(y|x, w) qφ(y|x) [Kingma 14a, Rezende 14] 12 Gumbel softmax[Jang 16, Maddison 16] φ θ 1 3.3 SS-MVAE JMVAE y p(x, w, y) = pθ(x|z, y)pθ(w|z, y)p(z)p(y)dz 1(a) log p(x, w, y) = log pθ(x, w, z, y)dz ≥ Eqφ(z|x,w,y)[log pθ(x|z, y)pθ(w|z, y)pθ(z) qφ(z|x, w, y) ] J = (xi,wi,yi)∈ Variational Au 3.4 S a pθ(x|a)pθ( (a) (b) = {(x1, w1, y1, ), ..., xi wi {0, 1}C wN )} q(y|x, w) utoencoder JMVAE x w pθ(x|z)pθ(w|z)pθ(z)dz θ )dz (x|z)pθ(w|z)pθ(z) qφ(z|x, w) ] (a) SS-MVAE (b) SS-HMVAE 1: qφ(y|x, w) log p(x, w) = log pθ(x, w, z, y)dzdy ≥ Eqφ(z,y|x,w)[log pθ(x|z, y)pθ(w|z, y)pθ(z) qφ(z, y|x, w) ] ≡ −U(x, w) (3) qφ(z, y|x, w) = qφ(z|x, w, y)qφ(y|x, w) qφ(y|x) 2 0
10. 10. (a) SS-MVAE (b) SS-HMVAE 1: SS-HMVAE ¤ Semi-Supervised Hierarchical Multimodal Variational AutoEn- coder (SS-HMVAE) ¤ ¤ 2 34 9 #, ! 34 ", 6 9 01(9|", 6) 01(!|9, #) 01(#|", 6)
11. 11. 2 ¤ SS-HMVAE 9 ¤ auxiliary variables ¤ ¤ [Maaløe+ 16] DL = {(x1, w1, y1, ), ..., xi wi y ∈ {0, 1}C , (xN , wN )} N q(y|x, w) ional autoencoder JMVAE x w w) = pθ(x|z)pθ(w|z)pθ(z)dz θ E(x, w) (a) SS-MVAE (b) SS-HMVAE 1: qφ(y|x, w) log p(x, w) = log pθ(x, w, z, y)dzdy ≥ Eqφ(z,y|x,w)[log pθ(x|z, y)pθ(w|z, y)pθ(z) qφ(z, y|x, w) ] L = {(x1, w1, y1, ), ..., xi wi ∈ {0, 1}C N , wN )} q(y|x, w) l autoencoder JMVAE x w = pθ(x|z)pθ(w|z)pθ(z)dz θ w) (a) SS-MVAE (b) SS-HMVAE 1: qφ(y|x, w) log p(x, w) = log pθ(x, w, z, y)dzdy ≥ Eqφ(z,y|x,w)[log pθ(x|z, y)pθ(w|z, y)pθ(z) qφ(z, y|x, w) ] ≡ −U(x, w) (3) SS-MVAE SS-HMVAE q(z|x, w, y) = Z q(a, z|x, w, y)da
12. 12. ¤ ¤ ¤ ¤ Gumbel softmax[Jang+ 2016] ¤ 15,000 10,000 975,000 M = 15, 000 N = 975, 000 4.2 x w R3857 {0, 1}2000 pθ(x|z, y) = N(x|µθ(z, y), diag(σ2 θ (z, y))) (8) pθ(w|z, y) = Ber(w|πθ(z, y)) (9) pθ(x|a) = N(x|µθ(a), diag(σ2 θ (a))) (10) pθ(w|a) = Ber(w|πθ(a)) (11) y {0, 1}38 qφ(y|x, w) = Ber(y|πθ(x, w)) (12) SS-MVAE SS-HMVAE ∗2 http://www.ﬂickr.com ∗3 http://www.cs.toronto.edu/˜nitish/multimodal/index.html SS-HMVAE MC=10 SS-MVAE MAP MAP HMVAE 5. 2 ∗4 https://github.com/Thean ∗5 https://github.com/Lasag ∗6 https://github.com/masa- ∗7 [ 16] LRAP MAP 3 4.2 x w R3857 {0, 1}2000 pθ(x|z, y) = N(x|µθ(z, y), diag(σ2 θ (z, y))) (8) pθ(w|z, y) = Ber(w|πθ(z, y)) (9) pθ(x|a) = N(x|µθ(a), diag(σ2 θ (a))) (10) pθ(w|a) = Ber(w|πθ(a)) (11) y {0, 1}38 qφ(y|x, w) = Ber(y|πθ(x, w)) (12) SS-MVAE SS-HMVAE ∗2 http://www.ﬂickr.com ∗3 http://www.cs.toronto.edu/˜nitish/multimodal/index.html MAP MAP HMVAE 5. 2 ∗4 https://github.com/Thea ∗5 https://github.com/Lasag ∗6 https://github.com/masa ∗7 [ 16] LRAP MAP 3 Semi- ultimodal Variational AutoEn- |a)pθ(w|a)pθ(a|z, y)p(z)p(y) qφ(a, z|x, w, y) ] (6) p(z) = N(z|0, I) (13) p(y) = Ber(y|π) (14) pθ(a|z, y) = N(a|µθ(z, y), diag(σ2 θ (z, y))) (15) qφ(a|x, w) = N(z|µθ(x, w), diag(σ2 θ (x, w))) (16) qφ(z|a, y) = N(z|µθ(a, y), diag(σ2 θ (a, y))) (17) rectiﬁed linear unit Adam [Kingma 14b]
13. 13. ¤ Tars ¤ Tars ¤ ¤ ¤ Github https://github.com/masa-su/Tars P(A,B,C,D)=P(A)P(B∣A)P(C∣A)P(D∣A,B)
14. 14. Tars ¤ VAE x = InputLayer((None,n_x)) q_0 = DenseLayer(x,num_units=512,nonlinearity=activation) q_1 = DenseLayer(q_0,num_units=512,nonlinearity=activation) q_mean = DenseLayer(q_1,num_units=n_z,nonlinearity=linear) q_var = DenseLayer(q_1,num_units=n_z,nonlinearity=softplus) q = Gauss(q_mean,q_var,given=[x]) 0(!|") z = InputLayer((None,n_z)) p_0 = DenseLayer(z,num_units=512,nonlinearity=activation) p_1 = DenseLayer(p_0,num_units=512,nonlinearity=activation) p_mean = DenseLayer(p_1,num_units=n_x,nonlinearity=sigmoid) p = Bernoulli(p_mean,given=[z]) 3("|!) model = VAE(q, p, n_batch=n_batch, optimizer=adam) lower_bound_train = model.train([train_x])
15. 15. Tars ¤ ¤ z = q.sample_given_x(x) # z = q.sample_mean_given_x(x) # log_likelihood = q.log_likelihood_given_x(x, z) • • !~0(!|") log 0 (!|")
16. 16. ¤ Flickr25k ¤ ¤ 38 one-hot ¤ 3,857 2,000 ¤ ¤ 100 2 5000 -> 97 5000 desert, nature, landscape, sky rose, pink clouds, plant life, sky, tree flower, plant life
17. 17. ¤ ¤ SS-MVAE ¤ SS-HMVAE ¤ ¤ SVM DBN Autoencoder DBM JMVAE ¤ mean average precision (mAP) ¤ 3. 3.1 DL = {(x1, w1, y1, ), ..., (xM , wM , yM )} xi wi y ∈ {0, 1}C ∗1 DU = {(x1, w1, y1, ), ..., (xN , wN )} M << N q(y|x, w) 3.2 JMVAE joint multimodal variational autoencoder JMVAE [Suzuki 16][ 16] z x w p(x, w) = pθ(x|z)pθ(w|z)pθ(z)dz VAE JMVAE θ θ −UJMV AE(x, w) log p(x, w) = log pθ(x, w, z)dz ≥ Eqφ(z|x,w)[log pθ(x|z)pθ(w|z)pθ(z) qφ(z|x, w) ] ≡ −UJMV AE(x, w) (1) qφ(z|x, w) φ (a) SS-MVAE (b) SS-HMVAE 1: qφ(y|x, w) log p(x, w) = log pθ(x, w, z, y)dzdy ≥ Eqφ(z,y|x,w)[log pθ(x|z, y)pθ(w|z, y)pθ(z) qφ(z, y|x, w) ] ≡ −U(x, w) (3) qφ(z, y|x, w) = qφ(z|x, w, y)qφ(y|x, w) qφ(y|x) 2 3. 3.1 DL = {(x1, w1, y1, ), ..., (xM , wM , yM )} xi wi y ∈ {0, 1}C ∗1 DU = {(x1, w1, y1, ), ..., (xN , wN )} M << N q(y|x, w) 3.2 JMVAE joint multimodal variational autoencoder JMVAE [Suzuki 16][ 16] z x w p(x, w) = pθ(x|z)pθ(w|z)pθ(z)dz VAE JMVAE θ θ −UJMV AE(x, w) log p(x, w) = log pθ(x, w, z)dz ≥ Eqφ(z|x,w)[log pθ(x|z)pθ(w|z)pθ(z) qφ(z|x, w) ] ≡ −UJMV AE(x, w) (1) qφ(z|x, w) φ (a) SS-MVAE (b) SS-HMVAE 1: qφ(y|x, w) log p(x, w) = log pθ(x, w, z, y)dzdy ≥ Eqφ(z,y|x,w)[log pθ(x|z, y)pθ(w|z, y)pθ(z) qφ(z, y|x, w) ] ≡ −U(x, w) (3) qφ(z, y|x, w) = qφ(z|x, w, y)qφ(y|x, w) qφ(y|x) 2 SS-MVAE SS-HMVAE
18. 18. mAP SVM [Huiskes+] 0.475 DBN [Srivastava+]* 0.609 Autoencoder [Ngiam+]* 0.612 DBM [Srivastava+]* 0.622 JMVAE [Suzuki+] 0.618 SS-MVAE (MC=1) 0.612 SS-MVAE (MC=10) 0.626 SS-HMVAE (MC=1) 0.632 SS-HMVAE (MC=10) 0.628 • • SS-HMVAE • • • • * • MC
19. 19. ¤ mAP validation curve MAP 0.618 SS-MVAE (MC=1) 0.612 SS-HMVAE (MC=1) 0.632 SS-MVAE (MC=10) 0.626 SS-HMVAE (MC=10) 0.628 2: MAP Flickr retrie internation trieval, pp. [Ioﬀe 15] Ioﬀe Acceleratin covariate sh [Jang 16] Jan cal Repara preprint ar [Kingma 13] Auto-encod arXiv:1312 [Kingma 14a] and Wellin generative Processing [Kingma 14b] stochastic o (2014) [Maaløe 16] M • SS-HMVAE • SS-MVAE JMVAE
20. 20. ¤ MIR Flickr25k ¤ ¤ ¤ ¤ ¤ ¤ RGB-D
21. 21. ¤ ¤ JMVAE SS-HMVAE SS-MVAE ¤ Tars ¤ ¤ SS-HMVAE ¤ ¤ ¤ ¤ GAN VAT