The document summarizes Junho Cho's presentation on image translation using generative adversarial networks (GANs). It discusses several papers on this topic, including pix2pix, which uses conditional GANs to perform supervised image-to-image translation on paired datasets; Domain Transfer Network (DTN), which uses an unsupervised method to perform cross-domain image generation; and CycleGAN and DiscoGAN, which can perform unpaired image-to-image translation using cycle-consistent adversarial networks. The presentation provides an overview of each method and shows examples of their applications to tasks such as semantic segmentation, style transfer, and domain adaptation.
1. Image Translation with GAN
Presenter: Junho Cho
2. Problem statement of Image Translation
Learn a mapping G : X → Y
that converts an image of a source domain X into an image of a target domain Y
3. Image Translation: X and Y are pair-wise labeled
4. Image Translation: X and Y are not paired
17. Two major problems of Image Translation
1. Convert to which domain?
• which mapping G : X → Y should be learned?
2. How to learn from the dataset?
• how to properly form the dataset?
• pair-wise supervised, or unsupervised?
18. Today, presenting state-of-the-art Image Translation papers:
- pix2pix: Image-to-Image Translation with Conditional Adversarial Networks (CVPR2017)
- Domain Transfer Network: Unsupervised Cross-Domain Image Generation (ICLR2017)
- CycleGAN: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
- DiscoGAN: Learning to Discover Cross-Domain Relations with Generative Adversarial Networks
19. 1. Image-to-Image Translation with Conditional Adversarial Networks (pix2pix)
CVPR2017
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros
23. Learn pair-wise images of X and Y
- BW & Color image
- Street Scene & Label
- Facade & Label
- Aerial & Map
- Day & Night
- Edges & Photo
source image x and target image (label) y are paired,
thus it is Supervised Learning
24. Generator of pix2pix
G(x, z) → y, where x : input image and z : noise
Use a U-Net-shaped network
- known to be powerful at segmentation tasks
- uses spatial information from bottom-layer features via skip connections
- uses dropout as the noise z in the decoder part
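To make the U-Net idea concrete, here is a minimal PyTorch sketch (toy layer sizes of my own choosing, not the authors' exact architecture): skip connections pass spatial detail from encoder to decoder, and dropout in the decoder plays the role of the noise z.

```python
# Minimal sketch of a pix2pix-style U-Net generator (PyTorch).
# Layer sizes are illustrative; the point is the skip connection and
# dropout-as-noise in the decoder, as described on the slide above.
import torch
import torch.nn as nn

class TinyUNetGenerator(nn.Module):
    def __init__(self, in_ch=3, out_ch=3, nf=64):
        super().__init__()
        # Encoder: downsample while keeping features for the skip connection
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, nf, 4, 2, 1), nn.LeakyReLU(0.2))
        self.enc2 = nn.Sequential(nn.Conv2d(nf, nf * 2, 4, 2, 1),
                                  nn.BatchNorm2d(nf * 2), nn.LeakyReLU(0.2))
        # Decoder: upsample; dropout acts as the noise z instead of an explicit z input
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(nf * 2, nf, 4, 2, 1),
                                  nn.BatchNorm2d(nf), nn.Dropout(0.5), nn.ReLU())
        # The skip connection doubles the channel count before the final layer
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(nf * 2, out_ch, 4, 2, 1), nn.Tanh())

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        d1 = self.dec1(e2)
        # U-Net skip: concatenate encoder features so low-level spatial info reaches the output
        return self.dec2(torch.cat([d1, e1], dim=1))
```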
37. Baseline model
D : discriminator, G : generator,
f : context encoder. f outputs a feature vector (128-dim)
38. • f-constancy : does G(x) have a similar context to x, i.e. f(G(x)) ≈ f(x)?
39. 1. R_GAN : the adversarial risk
2. R_CONST : the f-constancy risk
• d : distance metric, e.g. MSE
• f : "pretrained" context encoder, parameters fixed
• f can be pretrained with a classification task on the source domain S
• Minimize the two risks : R_GAN and R_CONST
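Written out (a reconstruction roughly following the baseline formulation of the DTN paper, not a verbatim quote), the two risks are

R_GAN   = max_D  E_{x ∈ T}[ log D(x) ] + E_{x ∈ S}[ log(1 − D(G(x))) ]
R_CONST = Σ_{x ∈ S}  d( f(x), f(G(x)) )

i.e. G must produce images that D accepts as target-domain samples while preserving the context f(x) of its input.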
40. Experimentally,
the baseline model didn't produce desirable results.
Thus, a similar but more elaborate architecture is proposed
42. Two Differences from the Baseline
First, G = g ∘ f : the context encoder f now encodes x as f(x), then g
generates the output from it : G(x) = g(f(x))
- g focuses on generating from the given context
43. Two Differences from the Baseline
Second, for x ∈ T, x is also encoded by f and G is applied
- f "pretrained on S" would not be as good as one trained on T, but it is enough for context-encoding purposes
- L_TID : G(x) should be similar to x for x ∈ T
- D also takes G(x) for x ∈ T and performs ternary (3-class) classification (one real class, two fake classes)
45. D's three classes: 1) generated from S? / 2) generated from T? / 3) a real sample from T?
46. Generator G : Adversarial Loss
Fool D into classifying G(x) as a real sample from T
47. Generator G : f-constancy and Identity preserving
L_CONST : d(f(x), f(G(x))) for x ∈ S, at the feature level
L_TID : d(x, G(x)) for x ∈ T, at the pixel level
d is used as MSE in this work
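Collecting the terms from slides 46-47, a hedged PyTorch sketch of the generator-side objective could look as follows (function and weight names are placeholders, and D is simplified to a binary real/fake scorer even though the paper's D is ternary):

```python
# Hedged sketch of the DTN generator-side loss terms: adversarial + L_CONST + L_TID.
import torch
import torch.nn.functional as F

def dtn_generator_loss(x_src, x_tgt, f_net, g_net, disc, alpha=1.0, beta=1.0):
    # f_net: frozen pretrained context encoder, g_net: decoder, disc: discriminator.
    g_src = g_net(f_net(x_src))          # G(x) = g(f(x)) for source samples
    g_tgt = g_net(f_net(x_tgt))          # G(x) for target samples

    # Adversarial term: fool D into scoring generated images as real target samples
    logits = torch.cat([disc(g_src), disc(g_tgt)])
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))

    # L_CONST: f-constancy, compared in feature space (MSE, as in this work)
    l_const = F.mse_loss(f_net(g_src), f_net(x_src))

    # L_TID: identity on target samples, compared in pixel space
    l_tid = F.mse_loss(g_tgt, x_tgt)

    # alpha and beta are illustrative weights, not the paper's values
    return adv + alpha * l_const + beta * l_tid
```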
49. Experiments
1. Street View House Numbers (SVHN) → MNIST
2. Face → Emoji
In both cases, the S and T domains differ considerably
51. f (context encoder)
• 4 convs (with 64, 128, 256, 128 filters) / max pooling / ReLU
• input: RGB image / output: 128-dim vector
• does not need to be a very powerful classifier
• achieves 4.95% error on the SVHN test set
• weaker on T : 23.92% error on MNIST
• learns the analogy from unlabeled examples
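An illustrative PyTorch version of this small encoder f (kernel sizes and the final pooling are my assumptions; the slide only fixes the filter counts, max pooling, ReLU, and the 128-dim output):

```python
# Sketch of the context encoder f: 4 conv blocks (64/128/256/128 filters),
# each followed by ReLU and max pooling, ending in a 128-dim vector.
import torch.nn as nn

def make_f_encoder():
    layers, in_ch = [], 3  # RGB input
    for out_ch in (64, 128, 256, 128):
        layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                   nn.ReLU(),
                   nn.MaxPool2d(2)]
        in_ch = out_ch
    # Collapse the remaining spatial grid so the output is a 128-dim vector per image
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten()]
    return nn.Sequential(*layers)
```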
52. g (generator)
• inspired by DCGAN
• input: the SVHN-trained f's 128-D representation
• four blocks of deconv, BN, ReLU; Tanh at the final layer
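And a rough sketch of the DCGAN-style decoder g (channel counts and the 32×32 single-channel output are assumptions for the SVHN→MNIST setting, not the paper's exact configuration):

```python
# g turns the 128-dim f-representation (viewed as a 128x1x1 tensor) into an image
# through deconv + BN + ReLU blocks with Tanh at the end, as the slide describes.
import torch.nn as nn

g_net = nn.Sequential(
    nn.ConvTranspose2d(128, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(),  # 1x1 -> 4x4
    nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),  # 4x4 -> 8x8
    nn.ConvTranspose2d(128, 64, 4, 2, 1),  nn.BatchNorm2d(64),  nn.ReLU(),  # 8x8 -> 16x16
    nn.ConvTranspose2d(64, 1, 4, 2, 1),    nn.Tanh(),                       # 16x16 -> 32x32
)
```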
54. Evaluate DTN
Train a digit classifier on MNIST.
- architecture same as f
- MNIST performance: 99.4% on the test set
Evaluate by testing the MNIST classifier on G(x) for SVHN inputs x,
using the SVHN label of x as the ground-truth label.
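In code, this evaluation protocol amounts to the sketch below (dtn_generator, mnist_classifier and svhn_loader are assumed to exist; the names are mine):

```python
# Run an MNIST-trained classifier over transferred SVHN digits G(x)
# and score it against the original SVHN labels.
import torch

@torch.no_grad()
def transfer_accuracy(dtn_generator, mnist_classifier, svhn_loader):
    correct, total = 0, 0
    for x_svhn, y_svhn in svhn_loader:
        fake_mnist = dtn_generator(x_svhn)             # SVHN digit rendered in MNIST style
        pred = mnist_classifier(fake_mnist).argmax(dim=1)
        correct += (pred == y_svhn).sum().item()
        total += y_svhn.numel()
    return correct / total
```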
56. Unseen Digits
Study the ability of DTN to overcome the omission of a class in the samples.
For example, class '3'.
Ablation applied on:
- the S domain when training DTN
- the T domain when training DTN
- the training of f.
But '3' still exists when testing DTN! Compare the results.
(a) The input images. (b) Results of our DTN. (c) 3 was not in SVHN. (d) 3 was not in MNIST. (e) 3 was
not shown in either SVHN or MNIST. (f) The digit 3 was not shown in SVHN, in MNIST, or during the
training of f.
59. Domain Adaptation
S : labeled, T : unlabeled; we want to train a classifier for T
Train a k-NN classifier
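One plausible reading of this slide, sketched with scikit-learn (my interpretation, not necessarily the paper's exact protocol): transfer the labeled S images into the T domain with the trained generator G, then classify T images by nearest neighbours among the transferred, labeled images.

```python
# Hedged sketch of k-NN domain adaptation via domain transfer.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def knn_domain_adaptation(G, x_src, y_src, x_tgt, k=1):
    # G maps source-domain images into target-style images; flatten for a simple pixel-space k-NN
    transferred = np.stack([np.ravel(G(x)) for x in x_src])
    knn = KNeighborsClassifier(n_neighbors=k).fit(transferred, y_src)
    return knn.predict(np.stack([np.ravel(x) for x in x_tgt]))
```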
60. Face → Emoji
• faces from Facescrub/CelebA
• emoji obtained from bitmoji.com, not publicized
• emoji preprocessed with heuristics; faces aligned
• f comes from a pretrained DeepFace network
• (Taigman et al. 2014) — the author's previous work
• f(x) is 256-dim
• g outputs the generated emoji image
• SR (Dong et al. 2015) to upscale the final output
61. Results!
chosen via validation
62. The original style transfer method can't solve this task.
DTN can also do style transfer.
DTN is more general than the Style Transfer method.
63. Limitations
• f usually can be trained in only one domain, thus the method is asymmetric.
• Handles the two domains differently.
• Performance is bounded by f: a bad f gives bad results. Needs a pre-trained context encoder.
• Any better way to learn context without pretraining?
• Any more tasks?
64. Conclusion
1. Demonstrates Domain Transfer as an unsupervised method.
• Can be generalized to various problems.
2. f-constancy to maintain the context across domains S & T
3. Simple domain adaptation with good performance
• inspiring work for future domain adaptation research
More open reviews at OpenReview.net
65. 3. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (CycleGAN)
UC Berkeley (pix2pix upgrade)
&
Learning to Discover Cross-Domain Relations with Generative Adversarial Networks (DiscoGAN)
SK T-Brain
66. DiscoGAN & CycleGAN
Almost identical concept.
DiscoGAN came 15 days earlier; low resolution (64×64)
CycleGAN has better qualitative results (256×256) and quantitative experiments.
Difference from DTN
• No f-constancy. Does not need a pre-trained context encoder
• Only needs datasets X and Y
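What replaces f-constancy here is cycle consistency: two generators G_xy : X→Y and G_yx : Y→X are trained so that translating and mapping back reproduces the input. A minimal sketch of that term (L1 with λ = 10 as in CycleGAN; DiscoGAN uses an analogous reconstruction loss):

```python
# Cycle-consistency term: X -> Y -> X and Y -> X -> Y should reproduce the inputs.
import torch.nn.functional as F

def cycle_consistency_loss(real_x, real_y, G_xy, G_yx, lam=10.0):
    forward_cycle = F.l1_loss(G_yx(G_xy(real_x)), real_x)   # X -> Y -> X
    backward_cycle = F.l1_loss(G_xy(G_yx(real_y)), real_y)  # Y -> X -> Y
    return lam * (forward_cycle + backward_cycle)
```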
69. Without cross-domain matching, a GAN has mode collapse:
it learns a projection onto a single mode in domain B, even though the two domains have a one-to-one relation
70. Typical GAN issue: Mode collapse
top is the ideal case, bottom is the mode-collapse failure case
71. Toy problem with a 2-dim Gaussian mixture model
• 5 modes of domain A to 10 modes of domain B
GAN and GAN + const show an injective mapping & mode collapse
DiscoGAN shows a bijective mapping & generates all 10 modes of B.
78. Code and more results at:
https://github.com/SKTBrain/DiscoGAN
https://github.com/carpedm20/DiscoGAN-pytorch
79. CycleGAN
Uses more GAN techniques: LSGAN, and an image buffer of previously generated samples
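The image buffer keeps a history of generated samples and, with probability 0.5, feeds the discriminator an older sample instead of the newest one. A small sketch (the 50-image size and 0.5 probability follow the common CycleGAN implementation; the class itself is my own illustration):

```python
# History buffer of previously generated images used when updating the discriminator.
import random

class ImageBuffer:
    def __init__(self, size=50):
        self.size = size
        self.images = []

    def query(self, image):
        if len(self.images) < self.size:      # still filling the buffer
            self.images.append(image)
            return image
        if random.random() < 0.5:             # return an old image, store the new one
            idx = random.randrange(self.size)
            old, self.images[idx] = self.images[idx], image
            return old
        return image                          # otherwise pass the new image through
```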
86. CycleGAN demonstrates more experiments!
project page : https://junyanz.github.io/CycleGAN/
code available in Torch and PyTorch