
Deep generative model.pdf

A tutorial on VAE and vanilla GAN.

  1. 1. DEEP GENERATIVE MODELS: VAE & GANs TUTORIAL (조형주, DeepBio)
  2. 2. DEEP GENERATIVE MODELS
  3. 3. WHAT IS A GENERATIVE MODEL? p̂_θ(x) = g_θ(z), where z is a point in the latent (feature) space. https://blog.openai.com/generative-models/
  4. 4. WHY GENERATIVE? A new way of simulating in applied math/engineering domains; can be combined with reinforcement learning; good for semi-supervised learning; can work with multi-modal outputs; can generate realistic data. (Points from Ian Goodfellow's GAN tutorial talk.)
  5. 5. (Figure from https://blog.openai.com/generative-models/)
  6. 6. TAXONOMIC TREE OF GENERATIVE MODELS (figure from Ian Goodfellow's GAN_Tutorial)
  7. 7. TOY EXAMPLE
  8. 8. Generative model: p̂_θ(x) = g_θ(z). Let z ∼ N(0, 1), let g be a neural network with transpose convolutional layers, and let x ∼ X be the MNIST dataset. Train with an L2 loss (mean squared error). (Note: if we assume p(x|z) = N(g_θ(z), σ²), then maximizing the log-likelihood Σ_i log p(x_i|z_i) = −N log(√(2π)σ) − (1/(2σ²)) Σ_i (x_i − g_θ(z_i))² is equivalent to minimizing the squared error.)
  9. 9. Generator (TF-code; the code screenshot did not survive this transcript)
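     The slide shows the toy generator as a code screenshot that was lost in extraction. A minimal sketch of what it might look like, assembled from the Sugartensor idioms used on the latent-code and decoder slides later in this deck (layer sizes, names, and the commented-out loss line are assumptions, not the slide's code):

         import sugartensor as tf   # Sugartensor wraps TensorFlow and is imported as tf

         batch_size = 32
         z = tf.random_normal((batch_size, 50))         # z ~ N(0, 1)

         with tf.sg_context(name='generator', size=4, stride=2, act='relu'):
             gen = (z
                    .sg_dense(dim=1024)                  # MLP
                    .sg_dense(dim=7*7*128)               # MLP
                    .sg_reshape(shape=(-1, 7, 7, 128))   # reshape to a 4-D tensor
                    .sg_upconv(dim=64)                   # transpose conv, 7x7 -> 14x14
                    .sg_upconv(dim=1, act='sigmoid'))    # transpose conv, 14x14 -> 28x28

         # an L2 (MSE) reconstruction loss against a batch of MNIST images x would then be
         # loss = gen.sg_mse(target=x)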
  10. 10. Results... Maybe we need more conditions (something is wrong).
  11. 11. VARIATIONAL AUTO-ENCODER
  12. 12. Notations. x: observed data, z: latent variable. p(x): evidence, p(z): prior, p(x|z): likelihood, p(z|x): posterior. The probabilistic model is defined as the joint distribution of x and z, p(x, z).
  13. 13. Model: p(x, z) = p(x|z)p(z). Our interest is the posterior p(z|x), i.e. inferring a good value of z given x: p(z|x) = p(x|z)p(z) / p(x), where p(x) = ∫ p(x, z) dz = ∫ p(x|z)p(z) dz. p(x) is hard to calculate (intractable), so we approximate the posterior.
  14. 14. Variational Inference. Pick a family of distributions over the latent variables with its own variational parameters, q_ϕ(z|x), and find the ϕ that makes q close to the posterior of interest. (This turns a sampling problem into an optimization problem; e.g. ϕ = (μ, σ) for a Gaussian or (x_min, x_max) for a uniform.)
  15. 15. KULLBACK-LEIBLER DIVERGENCE. A measure of the non-symmetric difference between two probability distributions P and Q, defined only when Q(i) = 0 implies P(i) = 0 for all i. KL(P||Q) = ∫ p(x) log(p(x)/q(x)) dx = ∫ p(x) log p(x) dx − ∫ p(x) log q(x) dx, i.e. cross-entropy minus entropy.
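     A minimal numerical sketch (plain numpy, added here rather than taken from the slides) of the discrete form KL(P||Q) = Σ_i P(i) log(P(i)/Q(i)):

         import numpy as np

         def kl_divergence(p, q):
             # discrete KL(P||Q); assumes q[i] > 0 wherever p[i] > 0, and 0*log(0) is taken as 0
             p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
             mask = p > 0
             return np.sum(p[mask] * np.log(p[mask] / q[mask]))

         p = np.array([0.7, 0.2, 0.1])
         q = np.array([0.5, 0.3, 0.2])
         print(kl_divergence(p, q), kl_divergence(q, p))   # both >= 0, and not equal (non-symmetric)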
  16. 16. Property: the Kullback-Leibler divergence is always non-negative, KL(P||Q) ≥ 0.
  17. 17. Proof. Since X − 1 ≥ log X, we have log(1/X) ≥ 1 − X. Using this, KL(P||Q) = ∫ p(x) log(p(x)/q(x)) dx ≥ ∫ p(x) (1 − q(x)/p(x)) dx = ∫ {p(x) − q(x)} dx = ∫ p(x) dx − ∫ q(x) dx = 1 − 1 = 0.
  18. 18. Relationship with Maximum Likelihood Estimation. KL(P||Q; ϕ) = ∫ p(x) log(p(x)/q(x; ϕ)) dx = ∫ p(x) log p(x) dx − ∫ p(x) log q(x; ϕ) dx. The first term does not depend on ϕ, so minimizing the KL divergence gives ϕ* = argmin_ϕ ( −∫ p(x) log q(x; ϕ) dx ). (If p(x) = q(x; ϕ), then KL(P||Q) = 0.)
  19. 19. Maximizing likelihood is equivalent to minimizing KL divergence: ϕ* = argmin_ϕ ( −∫ p(x) log q(x; ϕ) dx ) = argmax_ϕ ∫ p(x) log q(x; ϕ) dx = argmax_ϕ E_{x∼p(x)}[log q(x; ϕ)] ≈ argmax_ϕ (1/N) Σ_{i=1..N} log q(x_i; ϕ).
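     A small numpy illustration (added here, not from the slides): maximizing the average log-likelihood of samples drawn from p over the mean of a unit-variance Gaussian recovers the sample mean, the same answer as minimizing the empirical KL divergence.

         import numpy as np

         rng = np.random.default_rng(0)
         x = rng.normal(loc=1.5, scale=1.0, size=10_000)   # samples from p(x)

         def avg_log_likelihood(mu):
             # average of log N(x; mu, 1) over the samples, dropping the additive constant
             return np.mean(-0.5 * (x - mu) ** 2)

         grid = np.linspace(-3.0, 3.0, 601)
         mu_hat = grid[np.argmax([avg_log_likelihood(m) for m in grid])]
         print(mu_hat, x.mean())   # both close to 1.5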
  20. 20. JENSEN'S INEQUALITY. For a concave function, f(E[x]) ≥ E[f(x)]. For a convex function, f(E[x]) ≤ E[f(x)].
  21. 21. Evidence Lower BOund. log p(x) = log ∫_z p(x, z) dz = log ∫_z q(z) (p(x, z)/q(z)) dz = log E_q[p(x, z)/q(z)] ≥ E_q[log p(x, z)] − E_q[log q(z)], where the last step uses Jensen's inequality (log is concave).
  22. 22. Variational Distribution. q*_ϕ(z|x) = argmin_ϕ KL(q_ϕ(z|x) || p_θ(z|x)): choose a family of variational distributions q, then fit the parameters ϕ to minimize the distance between the two distributions (the reverse KL divergence).
  23. 23. KL Divergence. KL(q_ϕ(z|x) || p_θ(z|x)) = E_{q_ϕ}[log(q_ϕ(z|x)/p_θ(z|x))] = E_{q_ϕ}[log q_ϕ(z|x) − log p_θ(z|x)] = E_{q_ϕ}[log q_ϕ(z|x) − log(p_θ(z|x) p_θ(x)/p_θ(x))] = E_{q_ϕ}[log q_ϕ(z|x) − log p_θ(x, z) + log p_θ(x)] = E_{q_ϕ}[log q_ϕ(z|x) − log p_θ(x, z)] + log p_θ(x).
  24. 24. Objective. q*_ϕ(z|x) = argmin_ϕ ( E_{q_ϕ}[log q_ϕ(z|x) − log p_θ(x, z)] + log p_θ(x) ). This is the negative ELBO plus the log marginal probability of x. Since log p_θ(x) does not depend on q_ϕ, minimizing the KL divergence is the same as maximizing the ELBO: q*_ϕ(z|x) = argmax_ϕ ELBO.
  25. 25. Variational Lower Bound. For each data point x_i, the marginal likelihood of the individual data point satisfies log p_θ(x_i) ≥ L(θ, ϕ; x_i) = E_{q_ϕ(z|x_i)}[−log q_ϕ(z|x_i) + log p_θ(x_i, z)] = E_{q_ϕ(z|x_i)}[log p_θ(x_i|z)p_θ(z) − log q_ϕ(z|x_i)] = E_{q_ϕ(z|x_i)}[log p_θ(x_i|z) − (log q_ϕ(z|x_i) − log p_θ(z))] = E_{q_ϕ(z|x_i)}[log p_θ(x_i|z)] − E_{q_ϕ(z|x_i)}[log(q_ϕ(z|x_i)/p_θ(z))] = E_{q_ϕ(z|x_i)}[log p_θ(x_i|z)] − KL(q_ϕ(z|x_i) || p_θ(z)).
  26. 26. ELBO. L(θ, ϕ; x_i) = E_{q_ϕ(z|x_i)}[log p_θ(x_i|z)] − KL(q_ϕ(z|x_i) || p_θ(z)), where q_ϕ(z|x_i) is the proposal distribution (the posterior approximation) and p_θ(z) is the prior (our belief). How to choose a good proposal distribution: it should be easy to sample from and differentiable (for backprop); a Gaussian is the usual choice.
  27. 27. Maximizing ELBO - I. L(ϕ; x_i) = E_{q_ϕ(z|x_i)}[log p(x_i|z)] − KL(q_ϕ(z|x_i) || p(z)). For the first term, ϕ* = argmax_ϕ E_{q_ϕ(z|x_i)}[log p(x_i|z)]. E_{q_ϕ(z|x_i)}[log p(x_i|z)] is a log-likelihood (not a loss): maximize the likelihood to maximize the ELBO (do not minimize it!).
  28. 28. Log-Likelihood. In case p(x|z) is a Bernoulli distribution, E_{q_ϕ(z|x)}[log p(x|z)] = Σ_{i=1..n} [x_i log p(y_i) + (1 − x_i) log(1 − p(y_i))]. To maximize it, minimize the negative log-likelihood: Loss = −(1/n) Σ_{i=1..n} [x_i log x̂_i + (1 − x_i) log(1 − x̂_i)], already known as the sigmoid (binary) cross-entropy, where x̂_i is the output of the decoder, normalized to [0, 1]. We call it the reconstruction loss. (For a Gaussian p(x|z), the reconstruction loss is the L2/MSE loss instead.)
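     A minimal numpy sketch (added for illustration, not the slide's code) of this Bernoulli reconstruction loss for one flattened, binarized MNIST-sized image:

         import numpy as np

         def bernoulli_recon_loss(x, x_hat, eps=1e-7):
             # negative log-likelihood of x under Bernoulli(x_hat), i.e. binary cross-entropy
             x_hat = np.clip(x_hat, eps, 1 - eps)   # avoid log(0)
             return -np.mean(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat))

         x = np.random.binomial(1, 0.5, size=784).astype(float)   # stand-in for a binarized image
         x_hat = np.full(784, 0.5)                                # decoder output of all 0.5
         print(bernoulli_recon_loss(x, x_hat))                    # log(2) ~= 0.693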
  29. 29. Maximizing ELBO - II. L(ϕ; x_i) = E_{q_ϕ(z|x_i)}[log p(x_i|z)] − KL(q_ϕ(z|x_i) || p(z)). For the second term, ϕ* = argmin_ϕ KL(q_ϕ(z|x_i) || p(z)). Assume that the prior and the posterior approximation are Gaussian (actually not a critical issue); then the KL divergence can be written in closed form from its definition. Let the prior be N(0, 1). How about q_ϕ(z|x_i)?
  30. 30. Posterior. The posterior approximation is Gaussian, q_ϕ(z|x_i) = N(μ_i, σ_i²), where (μ_i, σ_i) is the output of the encoder (one pair per latent dimension).
  31. 31. Minimizing the KL Divergence. KL(q_ϕ(z|x) || p(z)) = ∫ q_ϕ(z) log q_ϕ(z) dz − ∫ q_ϕ(z) log p(z) dz. ∫ q_ϕ(z) log q_ϕ(z|x) dz = ∫ N(μ_i, σ_i²) log N(μ_i, σ_i²) dz = −(N/2) log 2π − (1/2) Σ_{i=1..N} (1 + log σ_i²). ∫ q_ϕ(z) log p(z) dz = ∫ N(μ_i, σ_i²) log N(0, 1) dz = −(N/2) log 2π − (1/2) Σ_{i=1..N} (μ_i² + σ_i²). Therefore, KL(q_ϕ(z|x) || p(z)) = −(1/2) Σ_{i=1..N} (1 + log σ_i² − μ_i² − σ_i²).
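     A quick numpy sanity check (added here, not from the slides) that this closed form matches a Monte Carlo estimate of KL(N(μ, σ²) || N(0, 1)) for a single latent dimension:

         import numpy as np

         mu, sigma = 0.8, 0.5
         closed_form = -0.5 * (1 + np.log(sigma**2) - mu**2 - sigma**2)

         rng = np.random.default_rng(0)
         z = rng.normal(mu, sigma, size=1_000_000)
         log_q = -0.5 * np.log(2 * np.pi * sigma**2) - (z - mu) ** 2 / (2 * sigma**2)
         log_p = -0.5 * np.log(2 * np.pi) - z ** 2 / 2
         monte_carlo = np.mean(log_q - log_p)

         print(closed_form, monte_carlo)   # both approximately 0.638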
  32. 32. AUTO-ENCODER. Encoder: MLPs to infer (μ_i, σ_i) for q_ϕ(z|x_i). Decoder: MLPs to infer x̂ using latent variables z ∼ N(μ, σ²). Is it differentiable (= possible to backprop)?
  33. 33. REPARAMETERIZATION TRICK (figure from "Tutorial on Variational Autoencoders"). Sampling z directly is not a differentiable operation, so we cannot backprop through it; after the trick, the sampling noise is just a constant input, independent of the model parameters.
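     A minimal sketch of the trick in plain numpy (an added illustration, not the slide's code): instead of sampling z ∼ N(μ, σ²) directly, sample ε ∼ N(0, 1) and transform it deterministically, so gradients can flow through μ and σ.

         import numpy as np

         rng = np.random.default_rng(0)

         mu = np.array([0.3, -1.2])            # encoder outputs, one pair per latent dimension
         sigma = np.array([0.5, 2.0])

         eps = rng.standard_normal(mu.shape)   # noise with no dependence on model parameters
         z = mu + sigma * eps                  # z ~ N(mu, sigma^2), differentiable w.r.t. mu and sigma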
  34. 34. Latent Code (all code is written using Sugartensor, a TF wrapper):
         batch_size = 32
         rand_dim = 50                                    # number of z variables
         z = tf.random_normal((batch_size, rand_dim))     # normal distribution
     Data load:
         # MNIST input tensor (with QueueRunner)
         data = tf.sg_data.Mnist(batch_size=32)
         # input images
         x = data.train.image
  35. 35. Encoder:
         # assume that std = 1
         with tf.sg_context(name='encoder', size=4, stride=2, act='relu'):
             mu = (x
                   .sg_conv(dim=64)                        # down-sampling x1/2
                   .sg_conv(dim=128)                       # down-sampling x1/2
                   .sg_flatten()
                   .sg_dense(dim=1024)                     # MLP
                   .sg_dense(dim=num_dim, act='linear'))   # MLP
         # re-parameterization trick with random gaussian (sigma = 1 assumed)
         z = mu + tf.random_normal(mu.get_shape())
  36. 36. Decoder:
         with tf.sg_context(name='decoder', size=4, stride=2, act='relu'):
             xx = (z
                   .sg_dense(dim=1024)                  # MLP
                   .sg_dense(dim=7*7*128)               # MLP
                   .sg_reshape(shape=(-1, 7, 7, 128))   # reshape to a 4-D tensor
                   .sg_upconv(dim=64)                   # transpose convnet
                   .sg_upconv(dim=1, act='sigmoid'))    # transpose convnet
  37. 37. Losses:
         # reconstruction loss (per-example mean squared error)
         loss_recon = xx.sg_mse(target=x, name='recon').sg_mean(axis=[1, 2, 3])   # axis list truncated in the source; [1, 2, 3] assumed
         # KL divergence term (sigma = 1 assumed, so only mu^2 remains)
         loss_kld = tf.square(mu).sg_sum(axis=1) / (28 * 28)
         tf.sg_summary_loss(loss_kld, name='kld')
         loss = loss_recon + loss_kld * 0.5
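     For reference, if the encoder also predicted log σ² instead of assuming σ = 1, the closed-form KL term from slide 31 would look roughly like this numpy sketch (log_var is a hypothetical encoder output, not part of the slide's model):

         import numpy as np

         def kl_to_standard_normal(mu, log_var):
             # closed-form KL( N(mu, exp(log_var)) || N(0, I) ), summed over latent dimensions
             return -0.5 * np.sum(1 + log_var - np.square(mu) - np.exp(log_var), axis=-1)

         mu = np.array([[0.0, 0.8]])         # hypothetical encoder outputs for one example
         log_var = np.array([[0.0, -1.0]])
         print(kl_to_standard_normal(mu, log_var))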
  38. 38. Train:
         # do training
         tf.sg_train(loss=loss, log_interval=10, ep_size=data.train.num_batch,
                     save_dir='asset/train/vae')
  39. 39. Results: blurry images.
  40. 40. Features. Advantages: fast and easy to train; we can monitor the loss and evaluate. Disadvantages: low sample quality; even if q reaches its optimal point, it can still be quite different from p. Issues: choice of reconstruction loss (cross-entropy, L1, L2, ...), MLP structure, regularizer loss (sometimes without the log term, sometimes using exp, ...).
  41. 41. GENERATIVE ADVERSARIAL NETWORKS
  42. 42. (Figure from Terry Um's Facebook page.)
  43. 43. (figure)
  44. 44. (figure)
  45. 45. Value Function. min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))]. For the second term, E_{z∼p_z(z)}[log(1 − D(G(z)))]: D wants to maximize it (do not be fooled), while G wants to minimize it (fool D).
  46. 46. Example (figure).
  47. 47. Global Optimality of p_g = p_data. For any given generator G, the optimal discriminator is D*_G(x) = p_data(x) / (p_data(x) + p_g(x)).
  48. 48. Proof. For G fixed, V(G, D) = ∫_x p_r(x) log(D(x)) dx + ∫_z p_z(z) log(1 − D(G(z))) dz = ∫_x [p_r(x) log(D(x)) + p_g(x) log(1 − D(x))] dx. Let X = D(x), a = p_r(x), b = p_g(x), so the integrand is V = a log X + b log(1 − X). Find the X that maximizes the value function via ∇_X V.
  49. 49. Proof. ∇_X V = ∇_X (a log X + b log(1 − X)) = ∇_X a log X + ∇_X b log(1 − X) = a/X − b/(1 − X) = [a(1 − X) − bX] / [X(1 − X)] = [a − aX − bX] / [X(1 − X)] = [a − (a + b)X] / [X(1 − X)].
  50. 50. Proof. Find the solution of f(X) = a − (a + b)X = 0.
  51. 51. Proof. Solving f(X) = a − (a + b)X = 0 gives (a + b)X = a, i.e. X = a/(a + b). Since f(X) is monotone decreasing, the gradient changes sign from positive to negative there, so X = a/(a + b) is the maximum point of the value function, i.e. D*(x) = p_r(x)/(p_r(x) + p_g(x)).
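     A small numpy check (added for illustration) that a·log X + b·log(1 − X) indeed peaks at X = a/(a + b):

         import numpy as np

         a, b = 0.3, 0.7                          # stand-ins for p_r(x) and p_g(x)
         X = np.linspace(1e-4, 1 - 1e-4, 100_000)
         V = a * np.log(X) + b * np.log(1 - X)
         print(X[np.argmax(V)], a / (a + b))      # both approximately 0.3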
  52. 52. Theorem. The global minimum of the virtual training criterion L(D, g_θ) is achieved if and only if p_g = p_r. At that point, L(D, g_θ) achieves the value −log 4.
  53. 53. Proof. With the optimal discriminator, L(D*, g_θ) = max_D V(G, D)
     = E_{x∼p_r}[log D*_G(x)] + E_{z∼p_z}[log(1 − D*_G(G(z)))]
     = E_{x∼p_r}[log D*_G(x)] + E_{x∼p_g}[log(1 − D*_G(x))]
     = E_{x∼p_r}[log(p_r(x)/(p_r(x) + p_g(x)))] + E_{x∼p_g}[log(p_g(x)/(p_r(x) + p_g(x)))]
     = E_{x∼p_r}[log(p_r(x)/(p_r(x) + p_g(x)))] + E_{x∼p_g}[log(p_g(x)/(p_r(x) + p_g(x)))] + log 4 − log 4
     = E_{x∼p_r}[log(p_r(x)/(p_r(x) + p_g(x)))] + log 2 + E_{x∼p_g}[log(p_g(x)/(p_r(x) + p_g(x)))] + log 2 − log 4
     = E_{x∼p_r}[log(2 p_r(x)/(p_r(x) + p_g(x)))] + E_{x∼p_g}[log(2 p_g(x)/(p_r(x) + p_g(x)))] − log 4
  54. 54. Continuing,
     = E_{x∼p_r}[log(p_r(x) / ((p_r(x) + p_g(x))/2))] + E_{x∼p_g}[log(p_g(x) / ((p_r(x) + p_g(x))/2))] − log 4
     = KL[p_r(x) || (p_r(x) + p_g(x))/2] + KL[p_g(x) || (p_r(x) + p_g(x))/2] − log 4
     = −log 4 + 2·JS(p_r(x) || p_g(x)),
     where JS is the Jensen-Shannon divergence, defined as JS(P||Q) = (1/2) KL(P||M) + (1/2) KL(Q||M) with M = (P + Q)/2. Since JS ≥ 0 always, −log 4 is the global minimum.
  55. 55. Jensen-Shannon Divergence. JS(P||Q) = (1/2) KL(P||M) + (1/2) KL(Q||M), with M = (P + Q)/2. Two types of KL divergence: KL(P||Q) corresponds to maximum likelihood and yields approximations Q that over-generalise P; KL(Q||P), the reverse KL divergence, tends to favour under-generalisation, where the optimal Q typically describes only the single largest mode of P well. The Jensen-Shannon divergence exhibits behaviour roughly halfway between these two extremes.
  56. 56. (figure)
  57. 57. (figure)
  58. 58. Training. Cost function for D: J^(D) = −(1/2) E_{x∼p_data}[log D(x)] − (1/2) E_z[log(1 − D(G(z)))], the typical cross-entropy with labels 1 and 0 (Bernoulli). Cost function for G: J^(G) = −(1/2) E_z[log D(G(z))], i.e. maximize log D(G(z)) instead of minimizing log(1 − D(G(z))), which causes vanishing gradients; this is also a standard cross-entropy with label 1. Is this really a good way to do it?
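     A tiny numpy illustration (added here, not from the slides) of why the non-saturating loss helps early in training. Write D = σ(a), where a is the discriminator's logit, and look at the gradient with respect to a when the discriminator confidently rejects fakes (D(G(z)) ≈ 0):

         import numpy as np

         def sigmoid(a):
             return 1.0 / (1.0 + np.exp(-a))

         a = -6.0           # very negative logit: the discriminator easily rejects the fake
         D = sigmoid(a)

         grad_saturating = -D            # d/da of log(1 - sigmoid(a)):  ~ 0, vanishing signal
         grad_non_saturating = -(1 - D)  # d/da of -log(sigmoid(a)):     ~ -1, useful signal
         print(D, grad_saturating, grad_non_saturating)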
  59. 59. Secret of the G Loss (following Martin Arjovsky's analysis). We already know that E_z[∇_θ log(1 − D*(g_θ(z)))] = ∇_θ 2·JS(P_r || P_g). Furthermore, with D* the optimal discriminator for the generator at θ_0,
     KL(P_{g_θ} || P_r) = E_x[log(p_{g_θ}(x)/p_r(x))]
     = E_x[log(p_{g_θ}(x)/p_{g_{θ_0}}(x))] − E_x[log(p_r(x)/p_{g_{θ_0}}(x))]
     = KL(P_{g_θ} || P_{g_{θ_0}}) − E_x[log(D*(x)/(1 − D*(x)))]
     = KL(P_{g_θ} || P_{g_{θ_0}}) − E_z[log(D*(g_θ(z))/(1 − D*(g_θ(z))))],
     where the expectations over x are taken with x ∼ P_{g_θ}.
  60. 60. Taking derivatives in θ at θ_0 we get
     ∇_θ KL(P_{g_θ} || P_r) = ∇_θ KL(P_{g_θ} || P_{g_{θ_0}}) − ∇_θ E_z[log(D*(g_θ(z))/(1 − D*(g_θ(z))))]
     = E_z[−∇_θ log(D*(g_θ(z))/(1 − D*(g_θ(z))))],
     since the first gradient term vanishes at θ = θ_0. Subtracting this last equation from the result for the JSD,
     E_z[−∇_θ log D*(g_θ(z))] = ∇_θ [KL(P_{g_θ} || P_r) − 2·JS(P_{g_θ} || P_r)].
     The JS term pushes the distributions to be different, which seems like a fault in the update, and the (reverse) KL term appearing here assigns an extremely high cost to generating fake-looking samples and an extremely low cost to mode dropping.
  61. 61. (figure)
  62. 62. Latent Code (Sugartensor code):
         batch_size = 32
         rand_dim = 50
         z = tf.random_normal((batch_size, rand_dim))
     Data load:
         data = tf.sg_data.Mnist(batch_size=batch_size)
         x = data.train.image
         y_real = tf.ones(batch_size)    # real label = 1
         y_fake = tf.zeros(batch_size)   # fake label = 0
  63. 63. Model D:
         def discriminator(tensor):
             # reuse flag (line truncated in the transcript; completion assumed)
             reuse = len([t for t in tf.global_variables()
                          if t.name.startswith('discriminator')]) > 0
             # activation argument truncated in the source; leaky_relu assumed
             with tf.sg_context(name='discriminator', size=4, stride=2,
                                act='leaky_relu', reuse=reuse):
                 res = (tensor
                        .sg_conv(dim=64, name='conv1')
                        .sg_conv(dim=128, name='conv2')
                        .sg_flatten()
                        .sg_dense(dim=1024, name='fc1')
                        .sg_dense(dim=1, act='linear', bn=False, name='fc2')
                        .sg_squeeze())
                 return res
  64. 64. Model G:
         def generator(tensor):
             # reuse flag (line truncated in the transcript; completion assumed)
             reuse = len([t for t in tf.global_variables()
                          if t.name.startswith('generator')]) > 0
             with tf.sg_context(name='generator', size=4, stride=2,
                                act='leaky_relu', reuse=reuse):
                 # generator network
                 res = (tensor
                        .sg_dense(dim=1024, name='fc1')
                        .sg_dense(dim=7*7*128, name='fc2')
                        .sg_reshape(shape=(-1, 7, 7, 128))
                        .sg_upconv(dim=64, name='conv1')
                        .sg_upconv(dim=1, act='sigmoid', bn=False, name='conv2'))
                 return res
  65. 65. Call:
         # generator
         gen = generator(z)
         # discriminator
         disc_real = discriminator(x)
         disc_fake = discriminator(gen)
  66. 66. Losses:
         # discriminator loss
         loss_d_r = disc_real.sg_bce(target=y_real, name='disc_real')
         loss_d_f = disc_fake.sg_bce(target=y_fake, name='disc_fake')
         loss_d = (loss_d_r + loss_d_f) / 2
         # generator loss
         loss_g = disc_fake.sg_bce(target=y_real, name='gen')
  67. 67. Train:
         # train ops (default optimizer: MaxProp)
         train_disc = tf.sg_optim(loss_d, lr=0.0001, category='discriminator')
         train_gen = tf.sg_optim(loss_g, lr=0.001, category='generator')
         # define alternating training func
         @tf.sg_train_func
         def alt_train(sess, opt):
             l_disc = sess.run([loss_d, train_disc])[0]   # training discriminator
             l_gen = sess.run([loss_g, train_gen])[0]     # training generator
             return np.mean(l_disc) + np.mean(l_gen)
         # do training (save_dir truncated in the source; path assumed)
         alt_train(ep_size=data.train.num_batch, early_stop=False,
                   save_dir='asset/train/gan')
  68. 68. (figure)
  69. 69. Results
  70. 70. Features. Advantage: improved sample quality. Disadvantages: unstable training, mode collapsing. Issues: simple network structure, loss selection (alternatives), other conditions?
  71. 71. DCGAN
  72. 72. (figure)
  73. 73. Network structure
  74. 74. Tips
  75. 75. Z Vector
  76. 76. (figure)
  77. 77. GAN HACKS
  78. 78. Normalizing the Input: normalize the images to between −1 and 1, and use tanh as the last layer of the generator output. A Modified Loss Function: e.g. maximizing D(G(z)) instead of minimizing 1 − D(G(z)). Use a spherical Z: sample from a Gaussian distribution rather than a uniform one (see the sketch below).
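     A small numpy sketch of the first and third tips (added illustration; shapes and variable names are hypothetical):

         import numpy as np

         rng = np.random.default_rng(0)

         # normalize uint8 images from [0, 255] to [-1, 1], to pair with a tanh generator output
         images = rng.integers(0, 256, size=(32, 28, 28, 1)).astype(np.float32)
         images = images / 127.5 - 1.0

         # "spherical" z: sample the latent code from a Gaussian rather than a uniform
         z = rng.standard_normal((32, 50)).astype(np.float32)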
  79. 79. Use a Norm: one label per mini-batch (i.e. all-real or all-fake batches); batch norm, layer norm, instance norm, or batch renorm... Avoid Sparse Gradients (ReLU, MaxPool): the stability of the GAN game suffers if you have sparse gradients; LeakyReLU is good (in both G and D). For down-sampling, use average pooling or strided conv; for up-sampling, use transposed conv or PixelShuffle.
  80. 80. Use Soft and Noisy Labels: real 1 -> 0.7 ~ 1.2, fake 0 -> 0.0 ~ 0.3, and occasionally flip the labels for the discriminator (a sketch follows below). ADAM is good: SGD for D, ADAM for G. If you have labels, use them: go to the conditional GAN.
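     A minimal numpy sketch of soft, noisy, occasionally flipped labels (added illustration, not the slide's code; the 5% flip rate is an assumption):

         import numpy as np

         rng = np.random.default_rng(0)
         batch_size = 32

         # soft labels: real in [0.7, 1.2], fake in [0.0, 0.3]
         y_real = rng.uniform(0.7, 1.2, size=batch_size)
         y_fake = rng.uniform(0.0, 0.3, size=batch_size)

         # occasionally flip the labels shown to the discriminator
         flip = rng.random(batch_size) < 0.05
         y_real[flip], y_fake[flip] = y_fake[flip], y_real[flip]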
  81. 81. Add noise to the inputs, and decay it over time: add some artificial noise to the inputs of D, or add Gaussian noise to every layer of G. Use dropout in G in both the train and test phases: provide noise in the form of dropout, applied to several layers of G at both training and test time.
  82. 82. GANs in Medical Imaging
  83. 83. Tumor segmentation
  84. 84. Metal artifact reduction
  85. 85. Thank you
