In real-life applications, some of the images used are corrupted, with damaged or missing pixels, which increases the complexity of computer vision tasks. In this paper, a deep learning architecture is proposed for image completion and enhancement. Generative Adversarial Networks (GANs) have proved helpful in image completion tasks. Here, a Wasserstein GAN architecture is used for image completion: it generates coarse patches to fill the missing region in the corrupted image, and an enhancement network then further refines the resulting images using residual learning, giving better completed images for computer vision applications. Experimental results show that the proposed approach improves the Peak Signal-to-Noise Ratio and Structural Similarity Index values by 2.45% and 4% respectively when compared to recently reported results.
2. ICCCNT | 2019
Objectives
To develop a well-trained Wasserstein GAN model capable of completing masked regions in an image.
To further enhance the completed image using an enhancement network for better computer vision applications.
3. Novelty of the proposal
In most of the existing completion techniques, the completed region is noisy, blurry, and not of satisfactory quality. To overcome this, the Wasserstein GAN is first trained to complete the image, and the completed image is then passed through the enhancement network, which further enhances its quality, thus giving a better inpainting solution than the present techniques.
5. DATA PREPROCESSING
IMAGE MASKING:
To create corrupted images for training the Wasserstein GAN, a binary mask with values 0 or 1 is used: 0 corresponds to the corrupted region, while 1 corresponds to the uncorrupted region of the image. This binary mask is applied to all images to corrupt them, and the resulting images serve as input to the training process.
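As a minimal sketch of this masking step (NumPy-based; the central square hole and helper names are illustrative assumptions, not the paper's exact preprocessing code):

```python
import numpy as np

def apply_mask(image, mask):
    """Zero out the corrupted region: mask is 1 for kept pixels, 0 for missing.
    image: (H, W, C) array; mask: (H, W) binary array, broadcast over channels."""
    return image * mask[..., None]

def center_hole_mask(h, w, hole):
    """Build a binary mask with a central hole: 0 inside the hole, 1 elsewhere."""
    mask = np.ones((h, w), dtype=np.float32)
    top, left = (h - hole) // 2, (w - hole) // 2
    mask[top:top + hole, left:left + hole] = 0.0
    return mask
```

Applying `apply_mask` to every training image produces the corrupted inputs described above, while the original images remain available as targets.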
1. The method is evaluated on the CelebA-HQ dataset.
2. Each face image in the dataset is resized to 64 × 64 × 3 pixels to train the Wasserstein GAN model.
3. The enhancement network is trained using 1000 image pairs containing blurry images and their corresponding clean images.
6. Implementation Details
THE METHODOLOGY CAN BE DIVIDED INTO TWO STEPS:
In the first step, a Wasserstein GAN based model is developed to complete the missing pixels in the image. The image completion GAN produces a completed image with the missing area filled in and contextually similar to the input image.
In the second step, the output of the generator is passed through the enhancement network to further refine the completed image.
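The two steps above can be sketched as a simple pipeline (hypothetical function names: `generator` stands for the trained WGAN completion model and `enhancer` for the residual enhancement network):

```python
def inpaint_pipeline(corrupted, mask, generator, enhancer):
    """Step 1: WGAN completion; Step 2: enhancement refinement.
    generator(corrupted, mask) -> coarse completed image
    enhancer(y) -> predicted residual R(y); clean estimate is y - R(y)."""
    completed = generator(corrupted, mask)      # coarse completion
    refined = completed - enhancer(completed)   # residual learning: x = y - R(y)
    return refined
```

The subtraction in the second step follows the residual-learning formulation used by the enhancement network described later in the deck.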
7. Implementation Details
WASSERSTEIN GAN:
A GAN trains two networks simultaneously: the generator network G, which learns the distribution of the training data, and the critic network C, which distinguishes generated samples from original samples.
The GAN architecture used is the Wasserstein GAN, which uses the Wasserstein distance to train the generator so that it captures the training data distribution and generates images similar to those in the training data.
(Contd.)
8. Implementation Details
WASSERSTEIN DISTANCE AS WASSERSTEIN GAN LOSS FUNCTION:
The Wasserstein distance loss function L used to train the generator can be mathematically represented as:

L = E_{x̃~P_g}[C(x̃)] − E_{x~P_r}[C(x)]

Here, the first term represents the expectation of the critic output over the distribution generated by the generator, and the second term represents the expectation over the real training data distribution.
(Contd.)
9. Implementation Details
WASSERSTEIN DISTANCE AS WASSERSTEIN GAN LOSS FUNCTION:
By minimizing the difference between the two expectations, the generator learns to generate samples whose probability distribution is similar to the training data distribution.
To make the learning faster and more stable, a gradient penalty term is added, so the overall loss function L becomes:

L = E_{x̃~P_g}[C(x̃)] − E_{x~P_r}[C(x)] + λ E_{x̂~P_x̂}[(‖∇_x̂ C(x̂)‖₂ − 1)²]

where x̂ is sampled along straight lines between pairs of real and generated samples, and λ is the penalty coefficient.
(Contd.)
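As an illustration of these loss terms, here is a minimal NumPy sketch using a toy linear critic (an assumption made for tractability; the paper's critic is a neural network whose gradient would come from backpropagation):

```python
import numpy as np

def critic(x, w):
    # toy linear critic for illustration: C(x) = w . x
    return x @ w

def wgan_gp_loss(real, fake, w, lam=10.0, eps=1e-8):
    """Critic loss with gradient penalty, for a toy linear critic."""
    # Wasserstein term: E[C(fake)] - E[C(real)]
    wdist = critic(fake, w).mean() - critic(real, w).mean()
    # interpolate between real and fake samples
    alpha = np.random.rand(real.shape[0], 1)
    x_hat = alpha * real + (1 - alpha) * fake
    # for a linear critic, grad_x C(x) = w for every sample
    grad = np.broadcast_to(w, x_hat.shape)
    grad_norm = np.sqrt((grad ** 2).sum(axis=1) + eps)
    penalty = ((grad_norm - 1.0) ** 2).mean()
    return wdist + lam * penalty
```

With a unit-norm `w`, the penalty vanishes and the loss reduces to the plain Wasserstein term, which is the behaviour the penalty encourages in a trained critic.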
10. Implementation Details
IMAGE COMPLETION WITH
WASSERSTEIN GAN:
After training the generator to generate
samples which look real, the next aim is to
ensure that the missing region generated
has similar context to the non-missing
region so that the model gives sensible
looking completed images as output.
(Contd.)
12. Implementation Details
ENHANCEMENT NETWORK USING RESIDUAL LEARNING:
The enhancement network consists of the following layers:
Conv + ReLU: creates feature maps, with ReLU adding non-linearity.
Conv + BN + ReLU: these layers add batch normalization between the convolution and ReLU.
Conv: used to produce the output residual image.
(Contd.)
13. Implementation Details
TRAINING THE ENHANCEMENT NETWORK:
The input to the network is a blurry image y = x + v, where x is the clean image and v is the added blur. The residual network is trained to learn the mapping R(y) ≈ v, so that the clean image is recovered as x = y − R(y). The loss function used to learn the trainable parameters θ of the enhancement network is:

ℓ(θ) = (1 / 2N) Σ_{i=1}^{N} ‖R(y_i; θ) − (y_i − x_i)‖²

where N represents the total number of training images.
(Contd.)
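The residual-learning objective above can be sketched as follows (NumPy; `R_y` stands for the enhancement network's output R(y), which is assumed to be given rather than computed by a real network):

```python
import numpy as np

def residual_loss(R_y, y, x):
    """Residual loss: R(y) should predict the residual y - x (the blur v).
    Averages 0.5 * squared error over the N training images in the batch."""
    n = y.shape[0]
    diff = R_y - (y - x)
    return 0.5 / n * np.sum(diff ** 2)

def restore(y, R_y):
    """Recover the clean estimate: x_hat = y - R(y)."""
    return y - R_y
```

When the network predicts the blur exactly (R(y) = v), the loss is zero and `restore` returns the clean image x.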
14. Implementation Details
IMAGE COMPLETION WITH WASSERSTEIN GAN:
Let x_reconstructed represent the completed image produced by our model, M the binary mask, y the original image, and G(z') the image from the generator G for some z' that gives a reasonable reconstruction of the missing portions. We find the z' that suitably completes the image by minimizing L(z), and the completed image is formed as:

x_reconstructed = M ⊙ y + (1 − M) ⊙ G(z')
(Contd.)
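A minimal sketch of the mask-based blending and of a contextual term of L(z) (NumPy; the masked L1 form of the contextual loss is an assumption based on common semantic-inpainting formulations, since the slide does not state it):

```python
import numpy as np

def complete_image(y, M, g_z):
    """Blend known pixels of y (mask M == 1) with generated pixels (M == 0).
    y, g_z: image arrays; M: binary mask of the same (broadcastable) shape."""
    return M * y + (1 - M) * g_z

def contextual_loss(g_z, y, M):
    """L1 distance between generated and original pixels on the known region."""
    return np.abs(M * (g_z - y)).sum()
```

Minimizing the contextual term over z' keeps the generated image consistent with the uncorrupted region before the blending step produces the final completion.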
15. Results
The following plot was obtained by training the enhancement network on 1000 CelebA-HQ image pairs of clean images and their corresponding blurry images.
Figure: Enhancement network training plot
16. Results
The following Wasserstein distance values and plot were obtained while training the Wasserstein GAN on around 15000 CelebA-HQ images for 10000 epochs with a batch size of 128.
(Contd.)
17. Results
Contextual loss plot
for image completion
Perceptual loss plot
for image completion
Total loss plot for
image completion
The above contextual, perceptual, and total loss plots were obtained while training the Wasserstein GAN for image completion for 1250 epochs on 15000 CelebA-HQ images.
(Contd.)
18. Results
The following two evaluation metrics were used to evaluate the quality of the output images of the model:
Peak Signal-to-Noise Ratio (PSNR):

PSNR = 10 · log₁₀(MAX² / MSE)

where MAX is the maximum possible pixel value and MSE is the mean squared error between the original and completed images. PSNR is measured in decibels (dB). The higher the PSNR, the better the image has been completed to match the original image.
(Contd.)
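The PSNR metric can be computed as follows (a straightforward NumPy sketch of the standard definition):

```python
import numpy as np

def psnr(original, completed, max_val=255.0):
    """Peak Signal-to-Noise Ratio in decibels; higher is better."""
    mse = np.mean((original.astype(np.float64) - completed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```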
19. Results
Structural Similarity Index (SSIM):
SSIM is used to measure the similarity between two images:

SSIM(x, y) = [I(x, y)]^α · [C(x, y)]^β · [S(x, y)]^γ

where C represents the contrast term, I the luminance term, S the structural term, x the original image, and y the completed image. The parameters α > 0, β > 0, and γ > 0 adjust the relative importance of the three components.
(Contd.)
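A simplified single-window SSIM following the luminance/contrast/structure form above (the stabilizing constants c1, c2, c3 are the commonly used defaults, an assumption since the slide does not state them; production implementations compute SSIM over local windows rather than the whole image):

```python
import numpy as np

def ssim_global(x, y, max_val=1.0, alpha=1.0, beta=1.0, gamma=1.0):
    """Single-window SSIM: product of luminance, contrast, structure terms."""
    c1 = (0.01 * max_val) ** 2
    c2 = (0.03 * max_val) ** 2
    c3 = c2 / 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    lum = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)      # I(x, y)
    con = (2 * np.sqrt(vx) * np.sqrt(vy) + c2) / (vx + vy + c2)  # C(x, y)
    stru = (cov + c3) / (np.sqrt(vx) * np.sqrt(vy) + c3)     # S(x, y)
    return (lum ** alpha) * (con ** beta) * (stru ** gamma)
```

For identical images all three terms equal 1, so the SSIM is 1, its maximum value.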
20. Results
The following PSNR and SSIM values were obtained through the model:
Comparison of PSNR values
Comparison of SSIM values
(Contd.)
21. Output
Some of the results obtained through the presented image completion technique are as follows (columns: ORIGINAL, INPUT, OUTPUT):
22. CONCLUSION
In most of the existing completion techniques, the completed images are blurry due to the noise that is inevitably introduced by the generator.
To overcome this, the WGAN is first trained to generate the missing patches in the image, and the completed image given by the WGAN is then passed through an enhancement network to remove the blur and provide a better inpainting solution.
However, in this approach, the overall training is highly dependent on the data used for training.
23. References
Deepak Pathak, Philipp Krähenbühl, Jeff Donahue, Trevor Darrell, and Alexei A. Efros. Context encoders: Feature learning by inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2536–2544, 2016.
Yizhen Chen and Haifeng Hu. An improved method for semantic image inpainting with GANs: Progressive inpainting. Neural Processing Letters, Springer, pages 1–13, Jun 2018.
Ruijun Liu, Rui Yang, Shanxi Li, Yuqian Shi, and Xin Jin. Painting completion with generative translation models. Multimedia Tools and Applications, Springer, pages 1–14, 2018.
Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein GAN. arXiv preprint arXiv:1701.07875, 2017.