2. Overview
1. What is Image-to-Image Translation?
2. The Problem of Image-to-Image Translation
3. Pix2Pix GAN for Image-to-Image Translation
4. Pix2Pix Architectural Details
5. Applications of the Pix2Pix GAN
6. Conclusion & References
6. Problem Definition
• Image-to-image translation is a challenging problem that typically requires a
specialized model and a hand-crafted loss function tailored to the specific
translation task being performed.
• Classical approaches use per-pixel classification or regression models.
• Ideally, a general technique is required, meaning that the same model and
loss function can be reused across many different image-to-image translation
tasks.
8. Pix2Pix
Pix2Pix is a generative adversarial network (GAN) model designed for general-purpose
image-to-image translation. The approach was presented by Phillip Isola, et al. in
their 2016 paper titled "Image-to-Image Translation with Conditional Adversarial
Networks" and presented at CVPR in 2017.
10. GAN and cGAN
The GAN architecture is an approach to training a generator model, typically
used for generating images. A discriminator model is trained to classify
images as real or fake, and the generator is trained to fool the discriminator
model.
The Conditional GAN (cGAN) is an extension of the GAN architecture that provides
control over the image that is generated, e.g. allowing an image of a given
class to be generated.
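To make the conditioning idea concrete, below is a minimal PyTorch sketch of a
conditional generator, assuming an illustrative 10-class problem and a
100-dimensional noise vector (these numbers, and the network itself, are not
from the slides):

    import torch
    import torch.nn as nn

    class ConditionalGenerator(nn.Module):
        def __init__(self, n_classes=10, latent_dim=100, img_dim=28 * 28):
            super().__init__()
            # Embed the class label so it can be concatenated with the noise.
            self.label_emb = nn.Embedding(n_classes, n_classes)
            self.net = nn.Sequential(
                nn.Linear(latent_dim + n_classes, 256),
                nn.ReLU(),
                nn.Linear(256, img_dim),
                nn.Tanh(),  # pixel values in [-1, 1]
            )

        def forward(self, z, labels):
            # Conditioning: the label embedding steers which class is generated.
            x = torch.cat([z, self.label_emb(labels)], dim=1)
            return self.net(x)

    gen = ConditionalGenerator()
    z = torch.randn(4, 100)
    labels = torch.tensor([0, 1, 2, 3])
    fake = gen(z, labels)  # shape (4, 784), one image per requested class

Pix2Pix builds on the same idea, with an input image taking the place of the
class label as the conditioning signal.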
11. Pix2Pix GAN
The generator model is provided with a given image as input and generates a
translated version of the image.
The discriminator model is given an input image and a real or generated
paired image and must determine whether the paired image is real or fake.
Finally, the generator model is trained both to fool the discriminator model
and to minimize the loss between the generated image and the expected
target image.
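A minimal sketch of how the discriminator sees its paired input, assuming the
input image and the candidate target are concatenated along the channel axis
(shapes and tensors here are illustrative stand-ins):

    import torch

    input_img = torch.randn(1, 3, 256, 256)    # image before translation
    real_target = torch.randn(1, 3, 256, 256)  # ground-truth translated image
    fake_target = torch.randn(1, 3, 256, 256)  # generator output (stand-in)

    # The discriminator receives 6 channels: input image + candidate target.
    real_pair = torch.cat([input_img, real_target], dim=1)  # (1, 6, 256, 256)
    fake_pair = torch.cat([input_img, fake_target], dim=1)  # (1, 6, 256, 256)
    # discriminator(real_pair) should score "real";
    # discriminator(fake_pair) should score "fake".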
12. Dataset Required for Pix2Pix
The Pix2Pix GAN must be trained on image datasets composed of paired input
images (before translation) and output or target images (after translation).
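A sketch of a paired-image dataset loader, assuming the common Pix2Pix
convention of storing each input/target pair side by side in a single image
file; the path and the class name PairedImageDataset are hypothetical:

    import glob
    from PIL import Image
    from torch.utils.data import Dataset
    import torchvision.transforms.functional as TF

    class PairedImageDataset(Dataset):
        def __init__(self, root="data/facades/train"):  # illustrative path
            self.files = sorted(glob.glob(f"{root}/*.jpg"))

        def __len__(self):
            return len(self.files)

        def __getitem__(self, idx):
            combined = Image.open(self.files[idx]).convert("RGB")
            w, h = combined.size
            # Left half = input (before translation), right half = target.
            input_img = TF.to_tensor(combined.crop((0, 0, w // 2, h)))
            target_img = TF.to_tensor(combined.crop((w // 2, 0, w, h)))
            return input_img, target_img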
14. Pix2Pix Architecture
The Pix2Pix GAN architecture involves specifying the following:
• The generator model
• The discriminator model
• The model optimization procedure
Both the generator and discriminator models use the standard
Convolution-BatchNormalization-ReLU blocks of layers.
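A minimal sketch of one such Convolution-BatchNormalization-ReLU block in
PyTorch; kernel size 4 and stride 2 follow the paper's downsampling
convention, and the helper name conv_block is hypothetical:

    import torch.nn as nn

    def conv_block(in_ch, out_ch):
        # Convolution -> BatchNorm -> ReLU, halving spatial resolution.
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    block = conv_block(3, 64)  # e.g. first downsampling step

Note that the paper uses LeakyReLU in the encoder and discriminator and ReLU
in the decoder; the plain-ReLU version above is a simplification.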
15. U-Net Generator Model
A U-Net architecture is used for the generator instead of the common
encoder-decoder model. It is similar to the encoder-decoder model in that it
involves downsampling to a bottleneck and upsampling again to an output
image, but links, or skip connections, are added between layers of the same
size in the encoder and the decoder.
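A simplified PyTorch sketch of the idea: downsample to a bottleneck, upsample
back, and concatenate same-size encoder and decoder features (the skip
connections). The depth and channel widths here are illustrative; the paper's
generator is much deeper:

    import torch
    import torch.nn as nn

    class TinyUNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.down1 = nn.Sequential(nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2))
            self.down2 = nn.Sequential(nn.Conv2d(64, 128, 4, 2, 1),
                                       nn.BatchNorm2d(128), nn.LeakyReLU(0.2))
            self.up1 = nn.Sequential(nn.ConvTranspose2d(128, 64, 4, 2, 1),
                                     nn.BatchNorm2d(64), nn.ReLU())
            # Input channels double (64 + 64) because of the skip connection.
            self.up2 = nn.Sequential(nn.ConvTranspose2d(128, 3, 4, 2, 1), nn.Tanh())

        def forward(self, x):
            d1 = self.down1(x)           # encoder feature, kept for the skip
            d2 = self.down2(d1)          # bottleneck
            u1 = self.up1(d2)
            u1 = torch.cat([u1, d1], 1)  # skip: link same-size encoder/decoder layers
            return self.up2(u1)

    out = TinyUNet()(torch.randn(1, 3, 256, 256))  # out.shape == (1, 3, 256, 256)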
16. PatchGAN Discriminator Model
Unlike the standard GAN model that uses a deep convolutional neural network
to classify images, the Pix2Pix model uses a PatchGAN, which is a deep
convolutional neural network designed to classify patches of an input image as
real or fake, rather than the entire image. The output of the network is a single
feature map of real/fake predictions that can be averaged to give a single
score.
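A sketch of a PatchGAN-style discriminator: a fully convolutional network whose
output is a grid of real/fake scores, each covering only a patch of the input.
This shallow version is illustrative; the paper's default discriminator has a
70×70 receptive field:

    import torch
    import torch.nn as nn

    class PatchDiscriminator(nn.Module):
        def __init__(self, in_ch=6):  # 6 = input image + candidate target
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(in_ch, 64, 4, 2, 1), nn.LeakyReLU(0.2),
                nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
                nn.Conv2d(128, 1, 4, 1, 1),  # 1-channel map of patch scores
            )

        def forward(self, pair):
            return self.net(pair)

    scores = PatchDiscriminator()(torch.randn(1, 6, 256, 256))
    print(scores.shape)          # torch.Size([1, 1, 63, 63]): one score per patch
    print(scores.mean().item())  # averaging gives a single image-level score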
17. Composite Adversarial and L1 Loss
The discriminator model is trained in a standalone manner in the same way as
a standard GAN model.
The generator model is trained using both the adversarial loss for the
discriminator model and the mean absolute pixel difference between the
generated image and the expected image.
Generator Loss = Adversarial Loss + λ × L1 Loss
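A sketch of this composite loss in PyTorch, assuming a binary cross-entropy
adversarial term on the discriminator's patch scores and λ = 100 as used in
the paper:

    import torch
    import torch.nn as nn

    bce = nn.BCEWithLogitsLoss()
    l1 = nn.L1Loss()
    lam = 100.0  # weight on the L1 term, following the paper

    def generator_loss(disc_fake_scores, fake_img, target_img):
        # Adversarial term: the generator wants the discriminator to say "real".
        adv = bce(disc_fake_scores, torch.ones_like(disc_fake_scores))
        # L1 term: mean absolute pixel difference to the expected target image.
        pix = l1(fake_img, target_img)
        return adv + lam * pix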
19. Applications
• Semantic labels ⇔ photo
• Architectural labels ⇒ photo
• Map ⇔ aerial photo
• Black and white ⇒ color photo
• Edges ⇒ photo
• Sketch ⇒ photo
• Day ⇒ night photographs
• Thermal ⇒ color photo
• Photo with missing pixels ⇒ inpainted photo
20. Conclusion
• Image-to-image translation often requires specialized models and hand-
crafted loss functions.
• The Pix2Pix GAN provides a general-purpose model and loss function for
image-to-image translation.
• The Pix2Pix GAN was demonstrated on a wide variety of image generation
tasks, including translating photographs from day to night and product
sketches to photographs.
21. References
1. P. Isola, J. Zhu, T. Zhou and A. A. Efros, "Image-to-Image Translation with
Conditional Adversarial Networks," 2017 IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), 2017, pp. 5967–5976.
2. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair,
et al., "Generative Adversarial Nets," NIPS, 2014, pp. 2672–2680.
3. J. Brownlee, Generative Adversarial Networks with Python: Deep Learning
Generative Models for Image Synthesis and Image Translation.
4. A. Solanki, A. Nayyar and M. Naved (eds.), Generative Adversarial Networks
for Image-to-Image Translation.