The document summarizes Junho Cho's presentation on image translation using generative adversarial networks (GANs). It discusses several papers on this topic, including pix2pix, which uses conditional GANs to perform supervised image-to-image translation on paired datasets; Domain Transfer Network (DTN), which uses an unsupervised method to perform cross-domain image generation; and CycleGAN and DiscoGAN, which can perform unpaired image-to-image translation using cycle-consistent adversarial networks. The presentation provides an overview of each method and shows examples of their applications to tasks such as semantic segmentation, style transfer, and domain adaptation.
1. Image Translation with GAN
Presenter: Junho Cho
2. Problem statement of Image Translation
Learn a mapping G : X → Y
that converts an image of a source domain X into an image of a target domain Y
3. Image Translation: X and Y are pair-wise labeled
4. Image Translation: X and Y are not paired
17. Two major problems of Image Translation
1. Convert to which domain?
• which mapping G : X → Y should be learned?
2. How to learn from the dataset?
• how to properly form the dataset?
• pair-wise supervised, or unsupervised?
18. Today, presenting state-of-the-art Image Translation papers:
- pix2pix: Image-to-Image Translation with Conditional Adversarial Networks (CVPR2017)
- Domain Transfer Network: Unsupervised Cross-Domain Image Generation (ICLR2017)
- CycleGAN: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
- DiscoGAN: Learning to Discover Cross-Domain Relations with Generative Adversarial Networks
19. 1. Image-to-Image Translation with Conditional Adversarial Networks (pix2pix)
CVPR2017
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros
23. Learn pair-wise images of X and Y
- BW & Color image
- Street Scene & Label
- Facade & Label
- Aerial & Map
- Day & Night
- Edges & Photo
source image x and target image (label) y are paired,
thus it is Supervised Learning
24. Generator of pix2pix
G(x, z) → y, where x : input image and z : noise
Use a U-Net-shaped network
- known to be powerful at segmentation tasks
- uses spatial information from bottom-layer features via skip connections
- uses dropout as the noise z in the decoder part
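To make the U-Net idea concrete, here is a minimal PyTorch sketch (toy layer sizes of my own choosing, not the authors' exact architecture): skip connections pass spatial detail from encoder to decoder, and dropout in the decoder plays the role of the noise z.

```python
# Minimal sketch of a pix2pix-style U-Net generator (PyTorch).
# Layer sizes are illustrative; the point is the skip connection and
# dropout-as-noise in the decoder, as described on the slide above.
import torch
import torch.nn as nn

class TinyUNetGenerator(nn.Module):
    def __init__(self, in_ch=3, out_ch=3, nf=64):
        super().__init__()
        # Encoder: downsample while keeping features for the skip connection
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, nf, 4, 2, 1), nn.LeakyReLU(0.2))
        self.enc2 = nn.Sequential(nn.Conv2d(nf, nf * 2, 4, 2, 1),
                                  nn.BatchNorm2d(nf * 2), nn.LeakyReLU(0.2))
        # Decoder: upsample; dropout acts as the noise z instead of an explicit z input
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(nf * 2, nf, 4, 2, 1),
                                  nn.BatchNorm2d(nf), nn.Dropout(0.5), nn.ReLU())
        # The skip connection doubles the channel count before the final layer
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(nf * 2, out_ch, 4, 2, 1), nn.Tanh())

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        d1 = self.dec1(e2)
        # U-Net skip: concatenate encoder features so low-level spatial info reaches the output
        return self.dec2(torch.cat([d1, e1], dim=1))
```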
37. Baseline model
D : discriminator, G : generator,
f : context encoder. f outputs a feature vector (128-dim)
38. • f-constancy : does G(x) have a similar context to x, i.e. f(G(x)) ≈ f(x)?
39. 1. R_GAN : the adversarial risk
2. R_CONST : the f-constancy risk
• d : distance metric, e.g. MSE
• f : "pretrained" context encoder, parameters fixed
• f can be pretrained with a classification task on the source domain S
• Minimize the two risks : R_GAN and R_CONST
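Written out (a reconstruction roughly following the baseline formulation of the DTN paper, not a verbatim quote), the two risks are

R_GAN   = max_D  E_{x ∈ T}[ log D(x) ] + E_{x ∈ S}[ log(1 − D(G(x))) ]
R_CONST = Σ_{x ∈ S}  d( f(x), f(G(x)) )

i.e. G must produce images that D accepts as target-domain samples while preserving the context f(x) of its input.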
40. Experimentally,
the baseline model didn't produce desirable results.
Thus, a similar but more elaborate architecture is proposed
42. Two Differences from the Baseline
First, G = g ∘ f : the context encoder f now encodes x as f(x), then g
generates the output from it : G(x) = g(f(x))
- g focuses on generating from the given context
43. Two Differences from the Baseline
Second, for x ∈ T, x is also encoded by f and G is applied
- f "pretrained on S" would not be as good as one trained on T, but it is enough for context-encoding purposes
- L_TID : G(x) should be similar to x for x ∈ T
- D also takes G(x) for x ∈ T and performs ternary (3-class) classification (one real class, two fake classes)
45. D's three classes: 1) generated from S? / 2) generated from T? / 3) a real sample from T?
46. Generator G : Adversarial Loss
Fool D into classifying G(x) as a real sample from T
47. Generator G : f-constancy and Identity preserving
L_CONST : d(f(x), f(G(x))) for x ∈ S, at the feature level
L_TID : d(x, G(x)) for x ∈ T, at the pixel level
d is used as MSE in this work
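Collecting the terms from slides 46-47, a hedged PyTorch sketch of the generator-side objective could look as follows (function and weight names are placeholders, and D is simplified to a binary real/fake scorer even though the paper's D is ternary):

```python
# Hedged sketch of the DTN generator-side loss terms: adversarial + L_CONST + L_TID.
import torch
import torch.nn.functional as F

def dtn_generator_loss(x_src, x_tgt, f_net, g_net, disc, alpha=1.0, beta=1.0):
    # f_net: frozen pretrained context encoder, g_net: decoder, disc: discriminator.
    g_src = g_net(f_net(x_src))          # G(x) = g(f(x)) for source samples
    g_tgt = g_net(f_net(x_tgt))          # G(x) for target samples

    # Adversarial term: fool D into scoring generated images as real target samples
    logits = torch.cat([disc(g_src), disc(g_tgt)])
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))

    # L_CONST: f-constancy, compared in feature space (MSE, as in this work)
    l_const = F.mse_loss(f_net(g_src), f_net(x_src))

    # L_TID: identity on target samples, compared in pixel space
    l_tid = F.mse_loss(g_tgt, x_tgt)

    # alpha and beta are illustrative weights, not the paper's values
    return adv + alpha * l_const + beta * l_tid
```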
49. Experiments
1. Street View House Numbers (SVHN) → MNIST
2. Face → Emoji
In both cases, the S and T domains differ considerably
51. f (context encoder)
• 4 convs (with 64, 128, 256, 128 filters) / max pooling / ReLU
• input: RGB image / output: 128-dim vector
• does not need to be a very powerful classifier
• achieves 4.95% error on the SVHN test set
• weaker on T : 23.92% error on MNIST
• learns the analogy from unlabeled examples
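An illustrative PyTorch version of this small encoder f (kernel sizes and the final pooling are my assumptions; the slide only fixes the filter counts, max pooling, ReLU, and the 128-dim output):

```python
# Sketch of the context encoder f: 4 conv blocks (64/128/256/128 filters),
# each followed by ReLU and max pooling, ending in a 128-dim vector.
import torch.nn as nn

def make_f_encoder():
    layers, in_ch = [], 3  # RGB input
    for out_ch in (64, 128, 256, 128):
        layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                   nn.ReLU(),
                   nn.MaxPool2d(2)]
        in_ch = out_ch
    # Collapse the remaining spatial grid so the output is a 128-dim vector per image
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten()]
    return nn.Sequential(*layers)
```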
52. g (generator)
• inspired by DCGAN
• input: the SVHN-trained f's 128-D representation
• four blocks of deconv, BN, ReLU; Tanh at the final layer
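And a rough sketch of the DCGAN-style decoder g (channel counts and the 32×32 single-channel output are assumptions for the SVHN→MNIST setting, not the paper's exact configuration):

```python
# g turns the 128-dim f-representation (viewed as a 128x1x1 tensor) into an image
# through deconv + BN + ReLU blocks with Tanh at the end, as the slide describes.
import torch.nn as nn

g_net = nn.Sequential(
    nn.ConvTranspose2d(128, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(),  # 1x1 -> 4x4
    nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),  # 4x4 -> 8x8
    nn.ConvTranspose2d(128, 64, 4, 2, 1),  nn.BatchNorm2d(64),  nn.ReLU(),  # 8x8 -> 16x16
    nn.ConvTranspose2d(64, 1, 4, 2, 1),    nn.Tanh(),                       # 16x16 -> 32x32
)
```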
54. Evaluate DTN
Train a digit classifier on MNIST.
- architecture same as f
- MNIST performance: 99.4% on the test set
Evaluate by testing the MNIST classifier on G(x) for SVHN inputs x,
using the SVHN label of x as the ground-truth label.
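In code, this evaluation protocol amounts to the sketch below (dtn_generator, mnist_classifier and svhn_loader are assumed to exist; the names are mine):

```python
# Run an MNIST-trained classifier over transferred SVHN digits G(x)
# and score it against the original SVHN labels.
import torch

@torch.no_grad()
def transfer_accuracy(dtn_generator, mnist_classifier, svhn_loader):
    correct, total = 0, 0
    for x_svhn, y_svhn in svhn_loader:
        fake_mnist = dtn_generator(x_svhn)             # SVHN digit rendered in MNIST style
        pred = mnist_classifier(fake_mnist).argmax(dim=1)
        correct += (pred == y_svhn).sum().item()
        total += y_svhn.numel()
    return correct / total
```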
56. Unseen Digits
Study the ability of DTN to overcome the omission of a class in the samples.
For example, class '3'.
Ablation applied on:
- the S domain when training DTN
- the T domain when training DTN
- the training of f.
But '3' still exists when testing DTN! Compare the results.
(a) The input images. (b) Results of our DTN. (c) 3 was not in SVHN. (d) 3 was not in MNIST. (e) 3 was
not shown in either SVHN or MNIST. (f) The digit 3 was not shown in SVHN, in MNIST, or during the
training of f.
59. Domain Adaptation
S : labeled, T : unlabeled; we want to train a classifier for T
Train a k-NN classifier
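One plausible reading of this slide, sketched with scikit-learn (my interpretation, not necessarily the paper's exact protocol): transfer the labeled S images into the T domain with the trained generator G, then classify T images by nearest neighbours among the transferred, labeled images.

```python
# Hedged sketch of k-NN domain adaptation via domain transfer.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def knn_domain_adaptation(G, x_src, y_src, x_tgt, k=1):
    # G maps source-domain images into target-style images; flatten for a simple pixel-space k-NN
    transferred = np.stack([np.ravel(G(x)) for x in x_src])
    knn = KNeighborsClassifier(n_neighbors=k).fit(transferred, y_src)
    return knn.predict(np.stack([np.ravel(x) for x in x_tgt]))
```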
60. Face → Emoji
• faces from Facescrub/CelebA
• emoji obtained from bitmoji.com, not publicized
• emoji preprocessed with heuristics; faces aligned
• f comes from a pretrained DeepFace network
• (Taigman et al. 2014) — the author's previous work
• f(x) is 256-dim
• g outputs the generated emoji image
• SR (Dong et al. 2015) to upscale the final output
61. Results!
chosen via validation
62. The original style transfer method can't solve this task.
DTN can also do style transfer.
DTN is more general than the Style Transfer method.
63. Limitations
• f usually can be trained in only one domain, thus the method is asymmetric.
• Handles the two domains differently.
• Performance is bounded by f: a bad f gives bad results. Needs a pre-trained context encoder.
• Any better way to learn context without pretraining?
• Any more tasks?
64. Conclusion
1. Demonstrates Domain Transfer as an unsupervised method.
• Can be generalized to various problems.
2. f-constancy to maintain the context across domains S & T
3. Simple domain adaptation with good performance
• inspiring work for future domain adaptation research
More open reviews at OpenReview.net
65. 3. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (CycleGAN)
UC Berkeley (pix2pix upgrade)
&
Learning to Discover Cross-Domain Relations with Generative Adversarial Networks (DiscoGAN)
SK T-Brain
66. DiscoGAN & CycleGAN
Almost identical concept.
DiscoGAN came 15 days earlier; low resolution (64×64)
CycleGAN has better qualitative results (256×256) and quantitative experiments.
Difference from DTN
• No f-constancy. Does not need a pre-trained context encoder
• Only needs datasets X and Y
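What replaces f-constancy here is cycle consistency: two generators G_xy : X→Y and G_yx : Y→X are trained so that translating and mapping back reproduces the input. A minimal sketch of that term (L1 with λ = 10 as in CycleGAN; DiscoGAN uses an analogous reconstruction loss):

```python
# Cycle-consistency term: X -> Y -> X and Y -> X -> Y should reproduce the inputs.
import torch.nn.functional as F

def cycle_consistency_loss(real_x, real_y, G_xy, G_yx, lam=10.0):
    forward_cycle = F.l1_loss(G_yx(G_xy(real_x)), real_x)   # X -> Y -> X
    backward_cycle = F.l1_loss(G_xy(G_yx(real_y)), real_y)  # Y -> X -> Y
    return lam * (forward_cycle + backward_cycle)
```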
69. Without cross-domain matching, a GAN has mode collapse:
it learns a projection onto a single mode in domain B, even though the two domains have a one-to-one relation
70. Typical GAN issue: Mode collapse
top is the ideal case, bottom is the mode-collapse failure case
71. Toy problem with a 2-dim Gaussian mixture model
• 5 modes of domain A to 10 modes of domain B
GAN and GAN + const show an injective mapping & mode collapse
DiscoGAN shows a bijective mapping & generates all 10 modes of B.
78. Code and more results at:
https://github.com/SKTBrain/DiscoGAN
https://github.com/carpedm20/DiscoGAN-pytorch
79. CycleGAN
Uses more GAN techniques: LSGAN, and an image buffer of previously generated samples
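The image buffer keeps a history of generated samples and, with probability 0.5, feeds the discriminator an older sample instead of the newest one. A small sketch (the 50-image size and 0.5 probability follow the common CycleGAN implementation; the class itself is my own illustration):

```python
# History buffer of previously generated images used when updating the discriminator.
import random

class ImageBuffer:
    def __init__(self, size=50):
        self.size = size
        self.images = []

    def query(self, image):
        if len(self.images) < self.size:      # still filling the buffer
            self.images.append(image)
            return image
        if random.random() < 0.5:             # return an old image, store the new one
            idx = random.randrange(self.size)
            old, self.images[idx] = self.images[idx], image
            return old
        return image                          # otherwise pass the new image through
```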
86. CycleGAN demonstrates more experiments!
project page : https://junyanz.github.io/CycleGAN/
code available in Torch and PyTorch