CycleGAN
Finding connections among images
Taesung Park, UC Berkeley
Face to Ramen?
Generated images in the reverse direction
How does it work?
Roadmap: pix2pix → GAN → CycleGAN
pix2pix (Isola et al., 2017)
[Figure: paired train/test examples: input x, generator G, output G(x), ground truth y]
• Supervised
• Loss: Minimize the difference between output G(x) and ground truth y
Data from [Tylecek, 2013]
pix2pix (Isola et al., 2017)
Loss: Minimize the difference between output G(x) and the ground truth y
$\sum_{(x,y)} \|y - G(x)\|_1$
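A minimal sketch of this objective in PyTorch (the framework and the names here are illustrative assumptions, not from the original slides):

import torch

def pix2pix_l1_loss(G: torch.nn.Module, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # ||y - G(x)||_1 averaged over the batch: the supervised pix2pix term.
    return torch.abs(y - G(x)).mean()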
pix2pix (Isola et al., 2017)
Loss: Minimize the difference between output G(x) and the ground truth y
Let another deep network point out the difference
Roadmap: pix2pix → GAN → CycleGAN
Generative Adversarial Network (GAN) (Goodfellow et al., 2014)
Let’s skip
Generative Adversarial Network (GAN) (Goodfellow et al., 2014)
Loss: Minimize the difference between output G(x) and the ground truth y
Let another deep network point out the difference
Generator
[Goodfellow et al., 2014] (slide credit: Phillip Isola)
D tries to identify the fakes
G tries to synthesize fake images that fool D
Generator Discriminator
real or fake?
[Goodfellow et al., 2014] (slide credit: Phillip Isola)
[Figure: discriminator scores on example images: fake (0.9), real (0.1)]
[Goodfellow et al., 2014] (slide credit: Phillip Isola)
G tries to synthesize fake images that fool D: real or fake?
[Goodfellow et al., 2014] (slide credit: Phillip Isola)
G tries to synthesize fake images that fool the best D: real or fake?
[Goodfellow et al., 2014] (slide credit: Phillip Isola)
Loss Function
G’s perspective: D is a loss function.
Rather than being hand-designed, it is learned.
G and D grow together via competition
[Isola et al., 2017]
[Goodfellow et al., 2014]
(slide credit: Phillip Isola)
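A minimal sketch of this learned loss in PyTorch (illustrative, not the official implementation): D is trained to tell real from fake, and G is scored by how well its outputs fool D.

import torch
import torch.nn.functional as F

def discriminator_loss(D, real, fake):
    # D should assign high logits to real images and low logits to fakes.
    real_logits = D(real)
    fake_logits = D(fake.detach())  # detach: do not backprop into G here
    return (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
            + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))

def generator_loss(D, fake):
    # G is rewarded when D labels its output as real: D acts as G's loss function.
    fake_logits = D(fake)
    return F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))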
pix2pix (Isola et al., 2017)
Loss: Minimize the difference between output G(x) and the ground truth y:
$\sum_{(x,y)} \|y - G(x)\|_1 + L_{\text{GAN}}(G(x), y)$
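Combining the two terms (a sketch reusing the imports and generator_loss above; the L1 weight of 100 follows Isola et al., 2017):

def pix2pix_generator_objective(G, D, x, y, lam: float = 100.0):
    fake_y = G(x)
    l1 = torch.abs(y - fake_y).mean()   # ||y - G(x)||_1
    adv = generator_loss(D, fake_y)     # L_GAN(G(x), y)
    return adv + lam * l1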
Roadmap: pix2pix → GAN → CycleGAN
CycleGAN (Zhu et al., 2017)
[Figure: pix2pix trains on paired data; CycleGAN trains on unpaired collections, so the pairing is unknown (?)]
CycleGAN (Zhu et al., 2017)
Loss: $L_{\text{GAN}}(G(x), y)$
G(x) should just look photorealistic
CycleGAN (Zhu et al., 2017)
Loss: $L_{\text{GAN}}(G(x), y)$
G(x) should just look photorealistic and be able to reconstruct x
CycleGAN (Zhu et al., 2017)
Loss: $L_{\text{GAN}}(G(x), y) + \|F(G(x)) - x\|_1$
G(x) should just look photorealistic, and F(G(x)) = x should hold, where F is the inverse deep network
CycleGAN (Zhu et al., 2017)
Forward cycle: $L_{\text{GAN}}(G(x), y) + \|F(G(x)) - x\|_1$
CycleGAN (Zhu et al., 2017)
Backward cycle: $L_{\text{GAN}}(F(y), x) + \|G(F(y)) - y\|_1$
CycleGAN (Zhu et al., 2017)
Full objective: $L_{\text{GAN}}(G(x), y) + \|F(G(x)) - x\|_1 + L_{\text{GAN}}(F(y), x) + \|G(F(y)) - y\|_1$
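Putting both directions together (a sketch under the same assumptions as above; F_net avoids shadowing torch.nn.functional as F, and the cycle weight of 10 follows the paper):

def cyclegan_generator_objective(G, F_net, D_Y, D_X, x, y, lam: float = 10.0):
    fake_y = G(x)      # X -> Y translation
    fake_x = F_net(y)  # Y -> X translation
    adv = generator_loss(D_Y, fake_y) + generator_loss(D_X, fake_x)
    cycle = (torch.abs(F_net(fake_y) - x).mean()   # ||F(G(x)) - x||_1
             + torch.abs(G(fake_x) - y).mean())    # ||G(F(y)) - y||_1
    return adv + lam * cycle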
[Figure: input/output examples in both directions]
[Zhu, Park, Isola, Efros, submitted] (slide credit: Phillip Isola)
[Figure: a photo rendered in Monet, Van Gogh, Ukiyo-e, and Cezanne styles]
Ablation Study on the Cityscapes dataset
$L_{\text{GAN}}(G(x), y) + \|F(G(x)) - x\|_1 + L_{\text{GAN}}(F(y), x) + \|G(F(y)) - y\|_1$
Reconstructed Images
[Figure columns: Original CG, Fake Photo, Reconstruction, Real Photo]
Generating High Quality Images using CycleGAN
Training Details: Generator G
• Encoder-decoder / U-Net [Ronneberger et al.] vs. ResNet [He et al.; Johnson et al.]
• Both have skip connections
• ResNet: fewer parameters, better for ill-posed problems
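For illustration, one residual generator block in the style of Johnson et al. (a sketch; reflection padding and instance norm match the published CycleGAN architecture, but treat the details as assumptions):

import torch.nn as nn

class ResnetBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(dim, dim, kernel_size=3),
            nn.InstanceNorm2d(dim),
            nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1),
            nn.Conv2d(dim, dim, kernel_size=3),
            nn.InstanceNorm2d(dim),
        )

    def forward(self, x):
        # Additive skip connection: low-level image structure passes straight through.
        return x + self.block(x)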
Training Details: Objective
• GANs with the cross-entropy loss: vanishing gradients
• Least-squares GANs [Mao et al. 2016]: stable training + better results
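The least-squares variant swaps cross-entropy for a squared-error penalty toward the real/fake targets (a sketch, reusing torch and F from the GAN sketch above):

def lsgan_discriminator_loss(D, real, fake):
    real_out, fake_out = D(real), D(fake.detach())
    # Push D(real) toward 1 and D(fake) toward 0 with squared error [Mao et al. 2016].
    return (F.mse_loss(real_out, torch.ones_like(real_out))
            + F.mse_loss(fake_out, torch.zeros_like(fake_out)))

def lsgan_generator_loss(D, fake):
    fake_out = D(fake)
    # The quadratic penalty keeps gradients alive even when D is confident.
    return F.mse_loss(fake_out, torch.ones_like(fake_out))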
Combine with as much L1 loss as possible
• L1 loss as a stable guiding force in GAN training
Combine with as much L1 loss as possible
• L1 loss as a stable guiding force in GAN training
• pix2pix has paired ground truth for its L1 term; CycleGAN does not (?), so the cycle-consistency losses play that role
Training Details: replay buffer
[Figure: discriminator output over epochs]
• Show the discriminator a buffer of past generated images rather than only the latest ones, because the discriminators can take very different trajectories in training
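A sketch of such a buffer (the released CycleGAN code keeps a pool of 50 past images; the details here are illustrative): each query either returns the new image or swaps it for a stored older one, so D also trains on earlier generator outputs.

import random
import torch

class ImagePool:
    def __init__(self, size: int = 50):
        self.size = size
        self.images = []

    def query(self, image: torch.Tensor) -> torch.Tensor:
        # Fill the buffer first; afterwards, with probability 0.5 return a
        # stored older image and replace it with the new one.
        if len(self.images) < self.size:
            self.images.append(image)
            return image
        if random.random() < 0.5:
            idx = random.randrange(self.size)
            old = self.images[idx]
            self.images[idx] = image
            return old
        return image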
Failure cases
[Figure: ImageNet "Wild horse" example]
Application to Domain Adaptation
Inspired by [Johnson et al. 2011]. Slides from Jun-Yan Zhu
CG2Real: GTA5 → real street view
[Figure: GTA5 (CG) input and translated output]
Real2CG: real street view → GTA5
[Figure: Cityscapes input and translated output]
Slides from Jun-Yan Zhu
[Richter*, Vineet*, et al. 2016]
[Figure: GTA5 images with their segmentation labels]
Slides from Jun-Yan Zhu
Use CG data to train recognition systems
Train on 'free' synthetic data (GTA5), test on real images

                                   Per-class accuracy   Per-pixel accuracy
Oracle (train and test on Real)          60.3                 93.1
Train on CG, test on Real                17.9                 54.0

Slides from Jun-Yan Zhu
State-of-the-art domain adaptation method
Adapt: train on 'free' synthetic data (GTA5), test on real images

                                   Per-class accuracy   Per-pixel accuracy
Oracle (train and test on Real)          60.3                 93.1
Train on CG, test on Real                17.9                 54.0
FCN in the wild [Hoffman et al.]         27.1 (+6.0)          -

VGG-FCN: 17.9 [Long et al. 2015]; Dilated-VGG-FCN: 21.1 [Yu and Koltun 2015]
Slides from Jun-Yan Zhu
Domain Adaptation with CycleGAN
Train on CycleGAN-translated data, test on real images

                                   Per-class accuracy   Per-pixel accuracy
Oracle (train and test on Real)          60.3                 93.1
Train on CG, test on Real                17.9                 54.0
FCN in the wild [Hoffman et al.]         27.1 (+6.0)          -
Train on CycleGAN, test on Real          34.8 (+16.9)         82.8

[Tzeng et al.] In submission. Slides from Jun-Yan Zhu
Thank you!
Jun-Yan Zhu Dr. Phillip Isola Prof. Alexei Efros