GAN and Energy-Based Models
Shohei Taniguchi, Matsuo Lab (M2)
Overview
Based on the following three papers, this talk summarizes the relationship between GANs and EBMs.


Deep Directed Generative Models with Energy-Based Probability Estimation

https://arxiv.org/abs/1606.03439


Maximum Entropy Generators for Energy-Based Models

https://arxiv.org/abs/1901.08508


Your GAN is Secretly an Energy-based Model and You Should Use Discriminator Driven Latent Sampling

https://arxiv.org/abs/2003.06060
Outline
Prerequisites


• Generative Adversarial Networks


• Energy-based Models


Similarities between GANs and EBMs


Paper introductions
Generative Adversarial Networks
[Goodfellow et al., 2014]
A minimax game between a discriminator Dθ and a generator Gϕ:

Dθ : ℝ^dx → [0, 1],  Gϕ : ℝ^dz → ℝ^dx

ℒ(θ, ϕ) = 𝔼_{p(x)}[log Dθ(x)] + 𝔼_{p(z)}[log(1 − Dθ(Gϕ(z)))]

The discriminator maximizes ℒ, and the generator minimizes ℒ.
Training GANs
The GAN update rules are

ℒ(θ, ϕ) = Σ_{i=1}^N [log Dθ(x_i) + log(1 − Dθ(Gϕ(z_i)))]
θ ← θ + η_θ ∇_θ ℒ(θ, ϕ)
ϕ ← ϕ − η_ϕ ∇_ϕ ℒ(θ, ϕ)
z_i ∼ Normal(0, I)
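As a numeric sanity check (my own sketch, not from the slides), the minimax objective can be evaluated for a hypothetical discriminator that outputs 0.5 everywhere; each term then contributes log(1/2), so the per-sample objective equals −2 log 2, its value at the GAN equilibrium:

```python
import numpy as np

def gan_objective(d_real, d_fake):
    """Per-sample minimax objective: log D(x) + log(1 - D(G(z)))."""
    return float(np.mean(np.log(d_real) + np.log(1.0 - d_fake)))

# Hypothetical discriminator outputs at equilibrium (batch of 4).
d_real = np.full(4, 0.5)  # D on data samples
d_fake = np.full(4, 0.5)  # D on generated samples
loss = gan_objective(d_real, d_fake)
print(loss)  # -2 log 2 ≈ -1.3863
```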
A common interpretation of GANs
The discriminator as a density-ratio estimator
The discriminator acts as an estimator of the density ratio between the data distribution p(x) and the distribution of generated samples pϕ(x) = 𝔼_{p(z)}[δ(x − Gϕ(z))].


i.e., when the discriminator is optimal,

D*_θ(x) = p(x) / (p(x) + pϕ(x))
A common interpretation of GANs
Generator training minimizes the JS divergence
When the discriminator is optimal,

ℒ(θ, ϕ) = 2 JS(p(x) ∥ pϕ(x)) − 2 log 2

JS(p(x) ∥ pϕ(x)) = (1/2) KL(p(x) ∥ (p(x) + pϕ(x))/2) + (1/2) KL(pϕ(x) ∥ (p(x) + pϕ(x))/2)

so the generator Gϕ is trained by minimizing the Jensen-Shannon divergence to the data distribution.
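To make the constant concrete, here is a small numeric sketch (my own, with made-up discrete distributions standing in for p(x) and pϕ(x)): evaluating the objective at the optimal discriminator D*(x) = p/(p + pϕ) recovers exactly 2·JS − 2 log 2:

```python
import numpy as np

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def js(p, q):
    m = (p + q) / 2
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Made-up discrete stand-ins for p(x) and p_phi(x).
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.3, 0.4, 0.3])

# Objective at the optimal discriminator D*(x) = p / (p + q).
d_star = p / (p + q)
objective = float(np.sum(p * np.log(d_star)) + np.sum(q * np.log(1 - d_star)))
print(np.isclose(objective, 2 * js(p, q) - 2 * np.log(2)))  # True
```

When p = q, JS vanishes and the objective bottoms out at −2 log 2, matching the equilibrium value above.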
The −log D trick
Because the original loss easily suffers from vanishing gradients, a trick that replaces the second term as follows is commonly used:

ℒ(θ, ϕ) = 𝔼_{p(x)}[log Dθ(x)] − 𝔼_{p(z)}[log Dθ(Gϕ(z))]

However, in this case the interpretation based on density-ratio estimation no longer holds.
GAN variants
Changing the divergence to the data distribution from JS to something else yields a variety of GAN variants.


Example: Wasserstein GAN

ℒ(θ, ϕ) = 𝔼_{p(x)}[Dθ(x)] − 𝔼_{p(z)}[Dθ(Gϕ(z))]

where Dθ is a 1-Lipschitz function (Dθ : ℝ^dx → ℝ).


In this case, generator training minimizes the 1-Wasserstein distance.
Energy-based Models
A probabilistic model is represented by an energy function Eθ(x):

pθ(x) = exp(−Eθ(x)) / Z(θ),   Z(θ) = ∫ exp(−Eθ(x)) dx

Eθ(x) matches the negative log-likelihood −log pθ(x) up to an additive constant.
Training EBMs
Contrastive Divergence
The gradient of the EBM log-likelihood is

∇θ log pθ(x) = −∇θ Eθ(x) + ∇θ log Z(θ)
             = −∇θ Eθ(x) + 𝔼_{x′∼pθ(x)}[∇θ Eθ(x′)]

i.e., lower the energy of the training data and raise the energy of samples from the model.
Sampling from EBMs
Langevin dynamics
A gradient-based MCMC method:

x ← x − η ∇x Eθ(x) + ϵ,   ϵ ∼ Normal(0, 2ηI)

Repeatedly sampling with this update rule, the distribution of the samples converges to pθ(x).
It is gradient descent with noise added.
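A minimal sketch of this update (my own toy example, not from the slides): sampling from the energy E(x) = x²/2, whose normalized density is the standard normal, and checking that the chain's statistics converge accordingly:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_E(x):
    # Energy E(x) = x^2 / 2 corresponds to p(x) = Normal(0, 1).
    return x

eta = 0.01
x = 5.0  # deliberately bad initialization
samples = []
for t in range(60_000):
    # Langevin update: x <- x - eta * grad E(x) + Normal(0, 2*eta) noise
    x = x - eta * grad_E(x) + rng.normal(0.0, np.sqrt(2 * eta))
    if t >= 10_000:  # discard burn-in
        samples.append(x)

samples = np.array(samples)
print(samples.mean(), samples.var())  # near 0 and 1, respectively
```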
Training EBMs
Contrastive Divergence
Putting this together, the EBM update rules are

ℒ(θ, x′) = Σ_{i=1}^N [−Eθ(x_i) + Eθ(x′_i)]
θ ← θ + η_θ ∇θ ℒ(θ, x′)
x′_i ← x′_i − η_{x′} ∇_{x′} ℒ(θ, x′) + ϵ
ϵ ∼ Normal(0, 2ηI)
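A toy sketch of these update rules (my own construction, not from the slides): with the hypothetical energy Eθ(x) = (x − θ)²/2, the θ update lowers the energy of data centered at 3 while raising the energy of persistent Langevin negatives, so θ should drift toward the data mean:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(3.0, 1.0, size=256)  # training data, mean ≈ 3

theta = 0.0                              # E_theta(x) = (x - theta)^2 / 2
x_neg = rng.normal(0.0, 1.0, size=256)   # persistent negative samples
eta_theta, eta_x = 0.05, 0.1

for _ in range(300):
    # Langevin step on the negatives: x' <- x' - eta * dE/dx' + noise
    x_neg = x_neg - eta_x * (x_neg - theta) \
        + rng.normal(0.0, np.sqrt(2 * eta_x), x_neg.shape)
    # CD gradient: -grad_theta E(data) + grad_theta E(negatives);
    # with dE/dtheta = theta - x, this is mean(data) - mean(x_neg)
    grad = (data - theta).mean() - (x_neg - theta).mean()
    theta = theta + eta_theta * grad

print(theta)  # close to the data mean, 3
```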
A closer look: this resembles the GAN updates.
The GAN update rules

ℒ(θ, ϕ) = Σ_{i=1}^N [log Dθ(x_i) + log(1 − Dθ(Gϕ(z_i)))]
θ ← θ + η_θ ∇θ ℒ(θ, ϕ)
ϕ ← ϕ − η_ϕ ∇ϕ ℒ(θ, ϕ)
z_i ∼ Normal(0, I)
The GAN update rules with the −log D trick

ℒ(θ, ϕ) = Σ_{i=1}^N [log Dθ(x_i) − log Dθ(Gϕ(z_i))]
θ ← θ + η_θ ∇θ ℒ(θ, ϕ)
ϕ ← ϕ − η_ϕ ∇ϕ ℒ(θ, ϕ)
z_i ∼ Normal(0, I)
The GAN update rules with the −log D trick
Setting Eθ(x) = −log Dθ(x), this becomes

ℒ(θ, ϕ) = Σ_{i=1}^N [−Eθ(x_i) + Eθ(Gϕ(z_i))]
θ ← θ + η_θ ∇θ ℒ(θ, ϕ)
ϕ ← ϕ − η_ϕ ∇ϕ ℒ(θ, ϕ)
z_i ∼ Normal(0, I)
Similarities between GANs and EBMs

GAN with the −log D trick:
ℒ(θ, ϕ) = Σ_{i=1}^N [−Eθ(x_i) + Eθ(Gϕ(z_i))]
θ ← θ + η_θ ∇θ ℒ(θ, ϕ)
ϕ ← ϕ − η_ϕ ∇ϕ ℒ(θ, ϕ)
z_i ∼ Normal(0, I)

EBM:
ℒ(θ, x′) = Σ_{i=1}^N [−Eθ(x_i) + Eθ(x′_i)]
θ ← θ + η_θ ∇θ ℒ(θ, x′)
x′_i ← x′_i − η_{x′} ∇_{x′} ℒ(θ, x′) + ϵ
ϵ ∼ Normal(0, 2ηI)

Very similar, but slightly different.
The differences: the EBM update adds noise, and the GAN, instead of updating the samples directly, updates a function Gϕ that generates samples from noise.
Paper introductions
Can EBMs be trained with a generator, as in GANs?


➡ Papers 1 and 2


If the GAN discriminator is viewed as an energy function, can the discriminator be used at generation time?


➡ Paper 3
Taesup Kim, Yoshua Bengio (Université de Montréal)
Deep Directed Generative Models with Energy-Based Probability Estimation
https://arxiv.org/abs/1606.03439
Training the EBM
Contrastive Divergence
In the gradient of the EBM log-likelihood,

∇θ log pθ(x) = −∇θ Eθ(x) + 𝔼_{x′∼pθ(x)}[∇θ Eθ(x′)]
             ≈ −∇θ Eθ(x) + 𝔼_{z∼p(z)}[∇θ Eθ(Gϕ(z))]

sampling from pθ(x) is replaced with sampling from Gϕ(z).
Training the generator
Writing pϕ(x) = 𝔼_{p(z)}[δ(x − Gϕ(z))], we want pθ(x) = pϕ(x), so the generator is trained by minimizing the KL divergence between the two distributions:

KL(pϕ ∥ pθ) = 𝔼_{pϕ}[−log pθ(x)] − H(pϕ)

The first term lowers the energy of the generated samples; the second raises their entropy.
Training the generator
Why the entropy term is needed


Without the entropy term, the generator would learn to produce only the samples with minimum energy (i.e., maximum density).


‣ A phenomenon similar to mode collapse in GANs.


The entropy term is needed to prevent this.

KL(pϕ ∥ pθ) = 𝔼_{pϕ}[−log pθ(x)] − H(pϕ)
Training the generator
The gradient of the first term is easy to compute:

∇ϕ 𝔼_{pϕ}[−log pθ(x)] = 𝔼_{z∼p(z)}[∇ϕ Eθ(Gϕ(z))]
Training the generator
The entropy in the second term cannot be obtained analytically.


The paper substitutes for it by treating the scale parameters of batch normalization as the variances of normal distributions and computing their entropy:

H(pϕ) ≈ Σ_{a_i} H(𝒩(μ_{a_i}, σ_{a_i})) = Σ_{a_i} (1/2) log(2eπ σ²_{a_i})
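As a quick check of the Gaussian entropy identity used here (a standard formula, not specific to the paper), the closed form ½ log(2eπσ²) can be compared against a Monte-Carlo estimate of −𝔼[log 𝒩(x; μ, σ)] with made-up μ and σ:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 0.5, 1.7  # made-up Gaussian parameters

# Closed form: H = (1/2) log(2 e pi sigma^2)
h_closed = 0.5 * np.log(2 * np.e * np.pi * sigma**2)

# Monte-Carlo estimate: H = -E[log p(x)] under p = Normal(mu, sigma)
x = rng.normal(mu, sigma, size=200_000)
log_p = -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
h_mc = -log_p.mean()

print(h_closed, h_mc)  # agree to about two decimal places
```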
Advantages over GANs
An energy function is learned in place of the discriminator, so it can be used for density-ratio estimation and similar tasks.
(Figure: generated samples)
Rithesh Kumar, Sherjil Ozair, Anirudh Goyal, Aaron Courville, Yoshua Bengio (Université de Montréal)
Maximum Entropy Generators for Energy-Based Models
https://arxiv.org/abs/1901.08508
Computing the entropy
Paper 1 computed the entropy H(pϕ) in

KL(pϕ ∥ pθ) = 𝔼_{pϕ}[−log pθ(x)] − H(pϕ)

from the batch-normalization scale parameters, but that is a heuristic with no theoretical justification.
Computing the entropy
Consider the mutual information between the latent variable z and the generator output x = Gϕ(z):

I(x, z) = H(x) − H(x ∣ z)
Computing the entropy
When Gϕ is a deterministic function, H(x ∣ z) = 0, so

H(pϕ) = H(x) = I(x, z)

That is, it suffices to maximize the mutual information instead of the entropy.
Estimating the mutual information
Many mutual-information estimators have been proposed in recent years; here, the JS-divergence-based estimator proposed in Deep InfoMax is used:

I_JSD(x, z) = sup_{T∈𝒯} 𝔼_{p(x,z)}[−sp(−T(x, z))] − 𝔼_{p(x)p(z)}[sp(T(x, z))]

where sp is the softplus function and T is a discriminator that distinguishes samples from p(x, z) from samples from p(x)p(z), trained jointly.
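A small sketch (my own discrete example) connecting this estimator back to the earlier JS slide: since −sp(−T) = log σ(T) and −sp(T) = log(1 − σ(T)), plugging in the optimal critic T*(x, z) = log[p(x, z)/(p(x)p(z))] makes the bound evaluate to 2·JS(p(x, z) ∥ p(x)p(z)) − 2 log 2:

```python
import numpy as np

def sp(t):  # softplus
    return np.log1p(np.exp(t))

def js(p, q):
    m = (p + q) / 2
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Made-up 2x2 joint distribution over (x, z).
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
prod = np.outer(joint.sum(axis=1), joint.sum(axis=0))  # p(x) p(z)

t_star = np.log(joint / prod)  # optimal critic
bound = float(np.sum(joint * (-sp(-t_star))) - np.sum(prod * sp(t_star)))
print(np.isclose(bound, 2 * js(joint.ravel(), prod.ravel()) - 2 * np.log(2)))  # True
```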
Density estimation
Even complex distributions are approximated well.
Mode collapse
An experiment comparing how many modes are captured when training on data with 1000 (or 10000) modes (StackedMNIST).


MEG (the proposed method) captures all modes; mode collapse does not occur.
Image generation
CIFAR-10
Samples drawn from the EBM by MCMC achieve better IS and FID than WGAN-GP.
Tong Che, Ruixiang Zhang, Jascha Sohl-Dickstein, Hugo Larochelle, Liam Paull, Yuan Cao, Yoshua Bengio (Université de Montréal, Google Brain)
Your GAN is Secretly an Energy-based Model and You Should Use Discriminator Driven Latent Sampling
https://arxiv.org/abs/2003.06060
A common interpretation of GANs
The discriminator as a density-ratio estimator
Let σ(⋅) be the sigmoid function and dθ(x) = σ⁻¹(Dθ(x)) the discriminator's logit. Then

Dθ(x) = p(x) / (p(x) + pϕ(x))  ⇒  p(x) ∝ pϕ(x) exp(dθ(x))

i.e., the data distribution p(x) is proportional to the product of the generator distribution pϕ(x) and exp(dθ(x)).


➡ Sampling from this distribution with a trained GAN should improve generation quality.
MCMC in latent space
Discriminator Driven Latent Sampling (DDLS)
We want to sample from pϕ(x) exp(dθ(x)), but MCMC in data space is inefficient and difficult.


Instead, run MCMC (Langevin dynamics) in the latent space of the generator Gϕ(z):

E(z) = −log p(z) − dθ(Gϕ(z))
z ← z − η ∇z E(z) + ϵ,   ϵ ∼ Normal(0, 2ηI)
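A toy sketch of DDLS (my own 1-D construction, not from the paper): with prior p(z) = 𝒩(0, 1) and a made-up linear logit dθ(Gϕ(z)) = a·z, the latent energy E(z) = z²/2 − a·z corresponds to 𝒩(a, 1), which the latent Langevin chain should recover:

```python
import numpy as np

rng = np.random.default_rng(3)
a = 2.0  # made-up slope of the discriminator logit along the latent direction

def grad_E(z):
    # E(z) = -log p(z) - d(G(z)) = z^2/2 - a*z (up to a constant)
    return z - a

eta = 0.01
z = 0.0
samples = []
for t in range(60_000):
    # Latent-space Langevin update
    z = z - eta * grad_E(z) + rng.normal(0.0, np.sqrt(2 * eta))
    if t >= 10_000:  # discard burn-in
        samples.append(z)

samples = np.array(samples)
print(samples.mean(), samples.var())  # near a = 2 and 1, respectively
```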
Experiments
Simply applying DDLS to a trained GAN considerably improves IS and FID.
Summary
GANs and EBMs are deeply related.


Combining insights from both yields approaches that take the best of each:


• using a generator for EBM sampling


• using MCMC for GAN sampling


Similar approaches are likely to keep appearing.

[DL Paper Reading Group] GAN and Energy-Based Models