Published by T. Kim, M. Cha, H. Kim, J. K. Lee, and J. Kim
Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017
Seongcheol Baek
Reading Circle Presentation @ Hikihara Lab
Department of Electrical Engineering, Kyoto University
2019/07/19
Learning to Discover Cross-Domain Relations
with Generative Adversarial Networks
Focus of this presentation
- Recently emerging issues around GAN
- Introduction of generative adversarial networks (GAN)
- What is DiscoGAN?
Problems of interest / model architecture / mode collapse problem /
experiments / summaries / comments
Recent generative technologies
[Figure: timeline, 2014 to 2019. Milestones shown:]
- Ian J. Goodfellow invented the "generative adversarial network" (GAN)
- Original AlphaGo beat a professional Go player
- Deep Convolutional GAN (DCGAN)
- Semi-Supervised GAN
- Least Squares GAN (LSGAN)
- StackGAN
- Auxiliary Classifier GAN (ACGAN)
- DiscoGAN
- CycleGAN
- BW clips into color (Source: Nvidia)
- GauGAN (Source: Nvidia)
- Samsung deepfake AI fabricates a video from a single profile picture
Recent issues around deepfakes – security, art, etc.
A viral video in which Obama appears to insult Donald Trump was fabricated with FakeApp (Photo: YouTube)
A deepfake clip of Mark Zuckerberg
is being allowed to remain on Instagram
(Photo: Bill Poster UK)
- US lawmakers say AI deepfakes ‘have the potential to disrupt every facet of our society’
- At the individual level, deepfakes can be used for cyberbullying, defamation and blackmail
Edmond de Belamy: The first piece
of AI-generated art
(created by GAN in 2018)
What is GAN?
- Two neural networks contest with each other in a game. Given a training set, GAN learns to
generate new data with the same statistics as the training set.
- Minimax two-player game (generative model vs. discriminative model)
Minimax Problem of GAN
$\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$
$V(G, D) = \int_x p_{data}(x) \log D(x)\,dx + \int_z p_z(z) \log(1 - D(G(z)))\,dz$
$D \in [0, 1]$: data is fake (0) / real (1)
Training of generator: $\min_G [1 - D(G(z))] = 0$
Training of discriminator: $\max_D D(x) = 1$ (discriminant for real data), $\max_D [1 - D(G(z))] = 1$ (discriminant for generated data)
- $V(G, D)$ has a saddle point at $D(G(z)) = \frac{1}{2}$
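The saddle point can be checked numerically on a small discrete example. The sketch below is not from the slides: it assumes distributions over a finite set of three outcomes (the values in `p_data` and `p_g_bad` are made up) and uses the optimal discriminator $D^*(x) = p_{data}(x) / (p_{data}(x) + p_g(x))$ from the original GAN paper. The names `value_function` and `optimal_discriminator` are illustrative.

```python
import math

def value_function(p_data, p_g, d):
    """V(G, D) on a finite set; d[i] is the discriminator output on outcome i.

    G only enters through the distribution p_g of its samples, so the second
    expectation is taken over generated outcomes instead of over z.
    """
    real_term = sum(p * math.log(d_i) for p, d_i in zip(p_data, d) if p > 0)
    fake_term = sum(q * math.log(1.0 - d_i) for q, d_i in zip(p_g, d) if q > 0)
    return real_term + fake_term

def optimal_discriminator(p_data, p_g):
    """Pointwise maximizer of V for a fixed generator (assumes p + q > 0)."""
    return [p / (p + q) for p, q in zip(p_data, p_g)]

# Made-up data distribution over three outcomes.
p_data = [0.5, 0.3, 0.2]

# Case 1: the generator reproduces the data distribution exactly.
d_star = optimal_discriminator(p_data, p_data)
v_equilibrium = value_function(p_data, p_data, d_star)

# Case 2: the generator over-produces the first outcome.
p_g_bad = [0.8, 0.1, 0.1]
d_bad = optimal_discriminator(p_data, p_g_bad)
v_bad = value_function(p_data, p_g_bad, d_bad)

print(d_star)          # every entry is 0.5
print(v_equilibrium)   # equals -log(4)
print(v_bad)           # larger than -log(4): the discriminator can still tell
```

When the generator matches the data, the best discriminator outputs 1/2 everywhere and the value is $-\log 4$; any mismatch leaves the discriminator with a larger value, which is what the generator then tries to reduce.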
Discover Cross-Domain Relations with GAN
Training on 2 different data sets without explicitly paired labelling
Results of domain transfer
- Previous AI could also transfer data from one domain to another while preserving key attributes
- Previous training methods (~2016) required paired data, which is costly and hard to collect
- DiscoGAN trains on 2 different data sets without any paired data, and its results show better performance with robustness to the mode collapse problem
[Figure: $G_{AB}$ maps Domain A to Domain B; $G_{BA}$ maps Domain B to Domain A]
Network Models – DiscoGAN & Previous GANs
Standard GAN with GAN loss
GAN with a reconstruction loss & GAN loss
DiscoGAN
- Each generator consists of encoder-decoder pair (input and output are images)
- The GAN loss (and the reconstruction loss, where used) is minimized during training
- In DiscoGAN, 2 coupled GANs map each domain to its counterpart domain (bijective)
Problem Formulation (1)
- Reconstruction loss measures how well the original input is reconstructed after a sequence of two generations: $L_{CONST_A} = d(x_{ABA}, x_A)$, where $x_{AB} = G_{AB}(x_A)$, $x_{ABA} = G_{BA}(x_{AB})$, and $d$ is a distance such as $L_1$, $L_2$, or Huber loss
- GAN loss measures how realistic the generated image is in domain B: $L_{GAN_B} = -\mathbb{E}_{x_A \sim P_A(x)}[\log D_B(x_{AB})]$
- Relaxed (soft) constraints are minimized instead of enforcing bijection and domain transition exactly
- Bijection: ideally $G_{AB}^{-1} = G_{BA}$ → $\min_{G_{AB}} (L_{CONST_A})$, $\min_{G_{BA}} (L_{CONST_B})$
- Domain transition: ideally $x_{AB} \in \mathbb{R}_B$, $x_{BA} \in \mathbb{R}_A$ → $\min_{D_B} (L_{D_B})$, $\min_{D_A} (L_{D_A})$
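The three candidate distances for the reconstruction loss can be compared on a toy input. This is a minimal sketch, not the authors' code: images are stood in for by flat lists of floats, each distance is averaged over elements, and the Huber threshold `delta=1.0` and the sample values are assumptions chosen for illustration.

```python
def l1_distance(x, y):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

def l2_distance(x, y):
    """Mean squared error between two equally long sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def huber_distance(x, y, delta=1.0):
    """Quadratic for residuals up to delta, linear beyond it."""
    total = 0.0
    for a, b in zip(x, y):
        r = abs(a - b)
        if r <= delta:
            total += 0.5 * r * r
        else:
            total += delta * (r - 0.5 * delta)
    return total / len(x)

# Hypothetical original x_A and its reconstruction x_ABA = G_BA(G_AB(x_A)).
x_a = [0.0, 0.5, 1.0, 3.0]
x_aba = [0.0, 0.7, 1.0, 1.0]

print(l1_distance(x_a, x_aba))
print(l2_distance(x_a, x_aba))
print(huber_distance(x_a, x_aba))
```

The single large residual (3.0 vs. 1.0) dominates the squared error, while the Huber loss treats it linearly, which is the usual reason to consider it as a compromise between $L_1$ and $L_2$.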
Problem Formulation (2)
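For each model: training of generator (in case of $G_{AB}$), training of discriminator (in case of $D_B$), constraints level

(a) Standard GAN with GAN loss
- Generator: $L_{G_{AB}} = -\mathbb{E}_{x_A \sim P_A}[\log D_B(G_{AB}(x_A))]$
- Discriminator: $L_{D_B} = -\mathbb{E}_{x_B \sim P_B}[\log D_B(x_B)] - \mathbb{E}_{x_A \sim P_A}[\log(1 - D_B(G_{AB}(x_A)))]$
- Constraints level: n/a

(b) GAN with a reconstruction loss & GAN loss
- Generator: $L_{G_{AB}} = L_{GAN_B} + L_{CONST_A} = -\mathbb{E}_{x_A \sim P_A}[\log D_B(G_{AB}(x_A))] + d(x_{ABA}, x_A)$
- Discriminator: $L_{D_B} = -\mathbb{E}_{x_B \sim P_B}[\log D_B(x_B)] - \mathbb{E}_{x_A \sim P_A}[\log(1 - D_B(G_{AB}(x_A)))]$
- Constraints level: doubled degrees of freedom (DOF) from (a), weaker than (a)

(c) DiscoGAN
- Generator: $L_G = L_{G_{AB}} + L_{G_{BA}} = L_{GAN_B} + L_{CONST_A} + L_{GAN_A} + L_{CONST_B} = -\mathbb{E}_{x_A \sim P_A}[\log D_B(G_{AB}(x_A))] + d(x_{ABA}, x_A) - \mathbb{E}_{x_B \sim P_B}[\log D_A(G_{BA}(x_B))] + d(x_{BAB}, x_B)$
- Discriminator: $L_D = L_{D_A} + L_{D_B} = -\mathbb{E}_{x_A \sim P_A}[\log D_A(x_A)] - \mathbb{E}_{x_B \sim P_B}[\log(1 - D_A(G_{BA}(x_B)))] - \mathbb{E}_{x_B \sim P_B}[\log D_B(x_B)] - \mathbb{E}_{x_A \sim P_A}[\log(1 - D_B(G_{AB}(x_A)))]$
- Constraints level: doubled DOF from (b), weaker than (b)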
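To see how the four generator terms and the four discriminator terms of case (c) fit together, the losses can be assembled for scalar toy "networks". Everything concrete below is a hypothetical stand-in, not the paper's models: `g_ab` and `g_ba` are fixed shifts, `d_a` and `d_b` are hand-written sigmoid scores, `dist` is a squared error, and expectations are replaced by batch means.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Hypothetical scalar stand-ins: domain A lies around 0, domain B around 2.
def g_ab(x):
    return x + 2.0

def g_ba(x):
    return x - 2.0

def d_a(x):
    return sigmoid(1.0 - abs(x))        # scores points near 0 as "real A"

def d_b(x):
    return sigmoid(1.0 - abs(x - 2.0))  # scores points near 2 as "real B"

def dist(x, y):
    return (x - y) ** 2                 # reconstruction distance d

def generator_loss(batch_a, batch_b):
    """L_G = L_GAN_B + L_CONST_A + L_GAN_A + L_CONST_B."""
    gan_b = -mean(math.log(d_b(g_ab(x))) for x in batch_a)
    const_a = mean(dist(g_ba(g_ab(x)), x) for x in batch_a)
    gan_a = -mean(math.log(d_a(g_ba(x))) for x in batch_b)
    const_b = mean(dist(g_ab(g_ba(x)), x) for x in batch_b)
    return gan_b + const_a + gan_a + const_b

def discriminator_loss(batch_a, batch_b):
    """L_D = L_D_A + L_D_B."""
    loss_d_a = (-mean(math.log(d_a(x)) for x in batch_a)
                - mean(math.log(1.0 - d_a(g_ba(x))) for x in batch_b))
    loss_d_b = (-mean(math.log(d_b(x)) for x in batch_b)
                - mean(math.log(1.0 - d_b(g_ab(x))) for x in batch_a))
    return loss_d_a + loss_d_b

batch_a = [-0.5, 0.0, 0.5]
batch_b = [1.5, 2.0, 2.5]
loss_g = generator_loss(batch_a, batch_b)
loss_d = discriminator_loss(batch_a, batch_b)
print(loss_g, loss_d)
```

Because the two toy generators are exact inverses of each other, both reconstruction terms vanish here and only the two GAN terms contribute to `loss_g`.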
Architecture of Generator
- Each generator takes an image and feeds it through an encoder-decoder pair
- The number of layers ranges from 4 to 5 depending on the domain
[Figure: image from domain A (resp. B) → encoder (convolution layers) → decoder (deconvolution layers) → image in domain B (resp. A)]
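The effect of stacking 4 or 5 encoder and decoder layers on the spatial size can be traced with the standard output-size formulas for convolution and transposed convolution. The slide does not give layer hyperparameters, so the 64-pixel input, kernel size 4, stride 2 and padding 1 below are assumptions for illustration only.

```python
def conv_output_size(n, kernel=4, stride=2, padding=1):
    """Spatial size after a convolution layer."""
    return (n + 2 * padding - kernel) // stride + 1

def deconv_output_size(n, kernel=4, stride=2, padding=1):
    """Spatial size after a transposed convolution (deconvolution) layer."""
    return (n - 1) * stride - 2 * padding + kernel

def trace_generator(input_size=64, num_layers=4):
    """Sizes through num_layers encoder layers, then num_layers decoder layers."""
    sizes = [input_size]
    for _ in range(num_layers):
        sizes.append(conv_output_size(sizes[-1]))
    for _ in range(num_layers):
        sizes.append(deconv_output_size(sizes[-1]))
    return sizes

print(trace_generator(64, 4))
print(trace_generator(64, 5))
```

Under these assumed settings each encoder layer halves the resolution and each decoder layer doubles it, so the output image has the same size as the input regardless of whether 4 or 5 layers are used.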
Architecture of Discriminator
- Each discriminator feeds an image through convolution layers
- The discriminator outputs a scalar through a sigmoid, indicating how real the input image is
Toy Experiment – Domain Transition Test
- In DiscoGAN, discriminator B is perfectly fooled by translated samples from domain A
- DiscoGAN prevents mode collapse by translating into distinct, well-bounded regions that do not overlap
[Figure panels: initial state / standard GAN / GAN with $L_{CONST}$ / DiscoGAN]
Colored points: samples in domain A
Black x's: target modes in domain B
Mode Collapse Problem
The gradients are biased towards the mode from which a higher number of samples is drawn to form the real training data
- The generator outputs unintended images from a different mode; this occurs frequently in GANs
- GANs usually mitigate this problem with additional loss terms, but it has not been fully resolved
- Other examples: communication system, cryptography, automaton, etc.
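One simple way to quantify mode collapse in a 2-D toy setting is to count what fraction of the target modes receive at least one generated sample. The metric, the `radius` threshold, and all coordinates below are illustrative assumptions, not something taken from the slides or the paper.

```python
import math

def nearest_mode(point, modes):
    """Index of the mode closest to a 2-D point."""
    return min(
        range(len(modes)),
        key=lambda i: math.hypot(point[0] - modes[i][0], point[1] - modes[i][1]),
    )

def mode_coverage(samples, modes, radius=0.5):
    """Fraction of modes that have at least one sample within radius."""
    covered = set()
    for p in samples:
        i = nearest_mode(p, modes)
        if math.hypot(p[0] - modes[i][0], p[1] - modes[i][1]) <= radius:
            covered.add(i)
    return len(covered) / len(modes)

# Four hypothetical target modes in domain B.
modes = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]

# A collapsed generator sends every input near the same mode.
collapsed = [(0.9, 1.1), (1.05, 0.95), (1.0, 1.2)]

# A diverse generator spreads its outputs over all modes.
diverse = [(0.9, 1.1), (1.1, -0.9), (-1.0, 1.2), (-0.8, -1.0)]

print(mode_coverage(collapsed, modes))
print(mode_coverage(diverse, modes))
```

A coverage well below 1.0 means several target modes are never produced, which is the symptom described in the bullets above.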
Why is DiscoGAN robust to mode collapse?
- In DiscoGAN, two coupled models are trained together simultaneously. The two copies of $G_{AB}$ share parameters, as do the two copies of $G_{BA}$
- The constraints from the coupled reconstruction losses push the mapping toward a strict bijection
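The link between reconstruction loss and bijection can be illustrated with a small calculation: if $G_{AB}$ sends two different inputs to the same output, no $G_{BA}$ can reconstruct both, so the reconstruction loss stays strictly positive. The sketch below assumes scalar data and a squared-error distance, for which the best reconstruction of a group of collapsed inputs is their mean. The mappings `injective` and `collapsed` are hypothetical.

```python
def best_reconstruction_error(domain_a, g_ab):
    """Smallest mean squared reconstruction error any G_BA can reach for a fixed G_AB.

    Inputs mapped to the same output must share one reconstruction; for squared
    error the best shared value is the mean of the group.
    """
    groups = {}
    for x in domain_a:
        groups.setdefault(g_ab(x), []).append(x)
    total = 0.0
    for xs in groups.values():
        center = sum(xs) / len(xs)
        total += sum((x - center) ** 2 for x in xs)
    return total / len(domain_a)

domain_a = [0.0, 1.0, 2.0, 3.0]

def injective(x):
    """One-to-one mapping: every input has its own output."""
    return 10.0 + x

def collapsed(x):
    """Mode-collapsed mapping: four inputs share only two outputs."""
    return 10.0 if x < 2.0 else 11.0

print(best_reconstruction_error(domain_a, injective))
print(best_reconstruction_error(domain_a, collapsed))
```

The one-to-one mapping admits zero reconstruction error, while the collapsed mapping cannot go below a positive floor, so minimizing the reconstruction loss penalizes collapse.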
Real Domain Experiment – Car to Car, Face to Face
[Figure columns: input data / standard GAN / GAN with $L_{CONST}$ / DiscoGAN; rows: car to car, face to face]
- Reconstruction tests
- Results from DiscoGAN show higher correlations (robust to mode collapse)
Real Domain Experiment – Face Conversion
Translation of gender
Blond to black,
Black to blond hair
Glasses to non-glasses,
non-glasses to glasses
- DiscoGAN translates a specific feature while preserving other facial features
Cross-Domain Experiment (1)
Chair to car Car to face
- Note that training is performed without any paired data
- The main attribute (azimuth) is preserved
Cross-Domain Experiment (2)
- 1-to-N problem
Handbag to sketches
Sketches to shoes
Sketches to handbags
Cross-Domain Experiment (3)
- Same style is discovered
Handbag to shoes
Shoes to handbag
Summaries
- DiscoGAN is proposed as a learning method to discover cross-domain relations without any paired labels
- Results showed better performance with robustness to mode collapse. The symmetry granted by coupling 2 GANs is considered to be a key factor for the dynamical robustness
Comments
- The strategy to couple two GAN models reminded me of the symmetry of dynamics. Some
correlations could be drawn to handle the stability problem…?
- This paper is giving me many ideas. It is very pleasant.
Thank you!
- Source code for simulations
Official implementation of "Learning to Discover Cross-Domain Relations with Generative Adversarial Networks" (Github) ... https://github.com/SKTBrain/DiscoGAN
- This presentation is also available on:
https://www.slideshare.net/SeongcheolBaek/introduction-of-discogan
References
- Crux of Presentation
T. Kim, et al., Learning to Discover Cross-Domain Relations with Generative Adversarial Networks (arXiv) ... https://arxiv.org/abs/1703.05192
- Recent generative technologies
Apple announces Animoji (The Verge) … https://www.theverge.com/2017/9/12/16290210/new-iphone-emoji-animated-animoji-apple-ios-11-update
AI Can Convert Black and White Clips into Color (Nvidia Developer) ... https://news.developer.nvidia.com/ai-can-convert-black-and-white-clips-into-color/
Nvidia’s latest AI software turns rough doodles into realistic landscapes (The Verge) ... https://www.theverge.com/2019/3/19/18272602/ai-art-generation-gan-nvidia-doodle-landscapes
Deepfakes are getting easier than ever to make (The Verge) … https://www.theverge.com/2019/5/23/18637373/deepfakes-samsung-ai-research-results-single-photo-algorithm
- Recent issues around deepfakes – security, art, etc.
A viral video that appeared to show Obama calling Trump a 'dips---' shows a disturbing new trend called 'deepfakes' (Business Insider) … https://www.businessinsider.com/obama-deepfake-video-insulting-trump-2018-4
New deepfake tech turns a single photo and audio file into a singing video portrait (The Verge) … https://www.theverge.com/2019/6/20/18692671/deepfake-technology-singing-talking-video-portrait-from-a-single-image-imperial-college-samsung
US lawmakers say AI deepfakes ‘have the potential to disrupt every facet of our society’ (The Verge) … https://www.theverge.com/2018/9/14/17859188/ai-deepfakes-national-security-threat-lawmakers-letter-intelligence-community
Deepfakes: A Threat to Individuals and National Security (Lionbridge) … https://lionbridge.ai/articles/deepfakes-a-threat-to-individuals-and-national-security/
A deepfake clip of Mark Zuckerberg is being allowed to remain on Instagram (iNews) … https://inews.co.uk/news/technology/a-deepfake-clip-of-mark-zuckerberg-is-being-allowed-to-remain-on-instagram/
Portrait of Edmond Belamy created by GAN (Wikipedia) … https://en.wikipedia.org/wiki/Edmond_de_Belamy
- Generative Adversarial Network
I. J. Goodfellow, Generative Adversarial Nets (arXiv) … https://arxiv.org/abs/1406.2661
Tutorial on Generative Adversarial Networks … https://www.slideshare.net/ckmarkohchang/generative-adversarial-networks
- Mode Collapse Problem
A. Ghosh, et al., Multi-Agent Diverse Generative Adversarial Networks (Research Gate) … https://www.researchgate.net/publication/315882247_Multi-Agent_Diverse_Generative_Adversarial_Networks
