CycleGAN
Finding connections among images
Taesung Park, UC Berkeley
Face to Ramen?
Generated images in the reverse direction
How does it work?
Roadmap: pix2pix → GAN → CycleGAN
pix2pix (Isola et al., 2017)
[Figure: paired train/test examples: input x, generator G, output G(x), ground truth y]
• Supervised
• Loss: Minimize the difference between output G(x) and ground truth y
Data from [Tylecek, 2013]
pix2pix (Isola et al., 2017)
Loss: Minimize the difference between output G(x) and the ground truth y
$\sum_{(x,y)} \|y - G(x)\|_1$
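A minimal sketch of this objective in PyTorch (the framework and the names here are illustrative assumptions, not from the original slides):

import torch

def pix2pix_l1_loss(G: torch.nn.Module, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # ||y - G(x)||_1 averaged over the batch: the supervised pix2pix term.
    return torch.abs(y - G(x)).mean()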
pix2pix (Isola et al., 2017)
Loss: Minimize the difference between output G(x) and the ground truth y
Let another deep network point out the difference
Roadmap: pix2pix → GAN → CycleGAN
Generative Adversarial Network (GAN) (Goodfellow et al., 2014)
Let’s skip
Generative Adversarial Network (GAN) (Goodfellow et al., 2014)
Loss: Minimize the difference between output G(x) and the ground truth y
Let another deep network point out the difference
Generator
[Goodfellow et al., 2014] (slide credit: Phillip Isola)
D tries to identify the fakes
G tries to synthesize fake images that fool D
Generator Discriminator
real or fake?
[Goodfellow et al., 2014] (slide credit: Phillip Isola)
[Figure: discriminator scores on example images: fake (0.9), real (0.1)]
[Goodfellow et al., 2014] (slide credit: Phillip Isola)
G tries to synthesize fake images that fool D: real or fake?
[Goodfellow et al., 2014] (slide credit: Phillip Isola)
G tries to synthesize fake images that fool the best D: real or fake?
[Goodfellow et al., 2014] (slide credit: Phillip Isola)
Loss Function
G’s perspective: D is a loss function.
Rather than being hand-designed, it is learned.
G and D grow together via competition
[Isola et al., 2017]
[Goodfellow et al., 2014]
(slide credit: Phillip Isola)
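A minimal sketch of this learned loss in PyTorch (illustrative, not the official implementation): D is trained to tell real from fake, and G is scored by how well its outputs fool D.

import torch
import torch.nn.functional as F

def discriminator_loss(D, real, fake):
    # D should assign high logits to real images and low logits to fakes.
    real_logits = D(real)
    fake_logits = D(fake.detach())  # detach: do not backprop into G here
    return (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
            + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))

def generator_loss(D, fake):
    # G is rewarded when D labels its output as real: D acts as G's loss function.
    fake_logits = D(fake)
    return F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))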
pix2pix (Isola et al., 2017)
Loss: Minimize the difference between output G(x) and the ground truth y:
$\sum_{(x,y)} \|y - G(x)\|_1 + L_{\text{GAN}}(G(x), y)$
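Combining the two terms (a sketch reusing the imports and generator_loss above; the L1 weight of 100 follows Isola et al., 2017):

def pix2pix_generator_objective(G, D, x, y, lam: float = 100.0):
    fake_y = G(x)
    l1 = torch.abs(y - fake_y).mean()   # ||y - G(x)||_1
    adv = generator_loss(D, fake_y)     # L_GAN(G(x), y)
    return adv + lam * l1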
Roadmap: pix2pix → GAN → CycleGAN
CycleGAN (Zhu et al., 2017)
[Figure: pix2pix trains on paired data; CycleGAN trains on unpaired collections, so the pairing is unknown (?)]
CycleGAN (Zhu et al., 2017)
Loss: $L_{\text{GAN}}(G(x), y)$
G(x) should just look photorealistic
CycleGAN (Zhu et al., 2017)
Loss: $L_{\text{GAN}}(G(x), y)$
G(x) should just look photorealistic and be able to reconstruct x
CycleGAN (Zhu et al., 2017)
Loss: $L_{\text{GAN}}(G(x), y) + \|F(G(x)) - x\|_1$
G(x) should just look photorealistic, and F(G(x)) = x should hold, where F is the inverse deep network
CycleGAN (Zhu et al., 2017)
Forward cycle: $L_{\text{GAN}}(G(x), y) + \|F(G(x)) - x\|_1$
CycleGAN (Zhu et al., 2017)
Backward cycle: $L_{\text{GAN}}(F(y), x) + \|G(F(y)) - y\|_1$
CycleGAN (Zhu et al., 2017)
Full objective: $L_{\text{GAN}}(G(x), y) + \|F(G(x)) - x\|_1 + L_{\text{GAN}}(F(y), x) + \|G(F(y)) - y\|_1$
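Putting both directions together (a sketch under the same assumptions as above; F_net avoids shadowing torch.nn.functional as F, and the cycle weight of 10 follows the paper):

def cyclegan_generator_objective(G, F_net, D_Y, D_X, x, y, lam: float = 10.0):
    fake_y = G(x)      # X -> Y translation
    fake_x = F_net(y)  # Y -> X translation
    adv = generator_loss(D_Y, fake_y) + generator_loss(D_X, fake_x)
    cycle = (torch.abs(F_net(fake_y) - x).mean()   # ||F(G(x)) - x||_1
             + torch.abs(G(fake_x) - y).mean())    # ||G(F(y)) - y||_1
    return adv + lam * cycle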
[Figure: input/output examples in both directions]
[Zhu, Park, Isola, Efros, submitted] (slide credit: Phillip Isola)
[Figure: a photo rendered in Monet, Van Gogh, Ukiyo-e, and Cezanne styles]
Ablation Study on the Cityscapes dataset
$L_{\text{GAN}}(G(x), y) + \|F(G(x)) - x\|_1 + L_{\text{GAN}}(F(y), x) + \|G(F(y)) - y\|_1$
Reconstructed Images
[Figure columns: Original CG, Fake Photo, Reconstruction, Real Photo]
Generating High Quality Images using CycleGAN
Training Details: Generator G
• Encoder-decoder / U-Net [Ronneberger et al.] vs. ResNet [He et al.; Johnson et al.]
• Both have skip connections
• ResNet: fewer parameters, better for ill-posed problems
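For illustration, one residual generator block in the style of Johnson et al. (a sketch; reflection padding and instance norm match the published CycleGAN architecture, but treat the details as assumptions):

import torch.nn as nn

class ResnetBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(dim, dim, kernel_size=3),
            nn.InstanceNorm2d(dim),
            nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1),
            nn.Conv2d(dim, dim, kernel_size=3),
            nn.InstanceNorm2d(dim),
        )

    def forward(self, x):
        # Additive skip connection: low-level image structure passes straight through.
        return x + self.block(x)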
Training Details: Objective
• GANs with the cross-entropy loss: vanishing gradients
• Least-squares GANs [Mao et al. 2016]: stable training + better results
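The least-squares variant swaps cross-entropy for a squared-error penalty toward the real/fake targets (a sketch, reusing torch and F from the GAN sketch above):

def lsgan_discriminator_loss(D, real, fake):
    real_out, fake_out = D(real), D(fake.detach())
    # Push D(real) toward 1 and D(fake) toward 0 with squared error [Mao et al. 2016].
    return (F.mse_loss(real_out, torch.ones_like(real_out))
            + F.mse_loss(fake_out, torch.zeros_like(fake_out)))

def lsgan_generator_loss(D, fake):
    fake_out = D(fake)
    # The quadratic penalty keeps gradients alive even when D is confident.
    return F.mse_loss(fake_out, torch.ones_like(fake_out))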
Combine with as much L1 loss as possible
• L1 loss as a stable guiding force in GAN training
Combine with as much L1 loss as possible
• L1 loss as a stable guiding force in GAN training
• pix2pix has paired ground truth for its L1 term; CycleGAN does not (?), so the cycle-consistency losses play that role
Training Details: replay buffer
[Figure: discriminator output over epochs]
• Show the discriminator a buffer of past generated images rather than only the latest ones, because the discriminators can take very different trajectories in training
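A sketch of such a buffer (the released CycleGAN code keeps a pool of 50 past images; the details here are illustrative): each query either returns the new image or swaps it for a stored older one, so D also trains on earlier generator outputs.

import random
import torch

class ImagePool:
    def __init__(self, size: int = 50):
        self.size = size
        self.images = []

    def query(self, image: torch.Tensor) -> torch.Tensor:
        # Fill the buffer first; afterwards, with probability 0.5 return a
        # stored older image and replace it with the new one.
        if len(self.images) < self.size:
            self.images.append(image)
            return image
        if random.random() < 0.5:
            idx = random.randrange(self.size)
            old = self.images[idx]
            self.images[idx] = image
            return old
        return image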
Failure cases
[Figure: ImageNet "Wild horse" example]
Application to Domain Adaptation
Inspired by [Johnson et al. 2011]. Slides from Jun-Yan Zhu
CG2Real: GTA5 → real street view
[Figure: GTA5 (CG) input and translated output]
Real2CG: real street view → GTA5
[Figure: Cityscapes input and translated output]
Slides from Jun-Yan Zhu
[Richter*, Vineet*, et al. 2016]
[Figure: GTA5 images with their segmentation labels]
Slides from Jun-Yan Zhu
Use CG data to train recognition systems
Train on 'free' synthetic data (GTA5), test on real images

                                   Per-class accuracy   Per-pixel accuracy
Oracle (train and test on Real)          60.3                 93.1
Train on CG, test on Real                17.9                 54.0

Slides from Jun-Yan Zhu
State-of-the-art domain adaptation method
Adapt: train on 'free' synthetic data (GTA5), test on real images

                                   Per-class accuracy   Per-pixel accuracy
Oracle (train and test on Real)          60.3                 93.1
Train on CG, test on Real                17.9                 54.0
FCN in the wild [Hoffman et al.]         27.1 (+6.0)          -

VGG-FCN: 17.9 [Long et al. 2015]; Dilated-VGG-FCN: 21.1 [Yu and Koltun 2015]
Slides from Jun-Yan Zhu
Domain Adaptation with CycleGAN
Train on CycleGAN-translated data, test on real images

                                   Per-class accuracy   Per-pixel accuracy
Oracle (train and test on Real)          60.3                 93.1
Train on CG, test on Real                17.9                 54.0
FCN in the wild [Hoffman et al.]         27.1 (+6.0)          -
Train on CycleGAN, test on Real          34.8 (+16.9)         82.8

[Tzeng et al.] In submission. Slides from Jun-Yan Zhu
Thank you!
Jun-Yan Zhu Dr. Phillip Isola Prof. Alexei Efros