3. Taxonomy of Machine Learning
[Figure: the four branches of machine learning - Supervised Learning, Unsupervised Learning, Semi-supervised Learning, Reinforcement Learning]
Supervised learning
• Find a function f that maps data to a label: y = f(x), x: data, y: label
• In practice we predict the probability of each label for the given data instead of the label itself, e.g. f(image) → {Cat: 0.98, Cake: 0.02, Dog: 0.00}
[Figure: supervised tasks - image classification, image segmentation, object detection, image captioning; unsupervised models - variational auto-encoders (VAE), generative adversarial networks, autoregressive models]
4. Supervised Learning
• Mathematical notation of classification (greedy policy)
  y: label, x: data, θ*: fixed optimal parameter

  $y^* = \arg\max_y P(y \mid x; \theta^*)$

  Read: the optimal label prediction $y^*$ is the $y$ for which the probability $P$, parameterized by $\theta^*$ and given $x$, is maximum.
• Linear model: $y = w_1 x + w_2$
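As a concrete sketch of this greedy rule, the snippet below scores each class with a linear model, turns the scores into probabilities with a softmax, and takes the argmax. The weights, input sizes, and data are illustrative assumptions; only the class names come from the slide.

```python
import numpy as np

# Hypothetical linear classifier: scores = W x + b, probabilities via softmax,
# prediction via the greedy rule y* = argmax_y P(y | x; theta*).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # 3 classes, 4 input features (made-up shapes)
b = rng.normal(size=3)
x = rng.normal(size=4)        # one data point

scores = W @ x + b
probs = np.exp(scores - scores.max())   # softmax (shifted for numerical stability)
probs /= probs.sum()

classes = ["Cat", "Cake", "Dog"]        # label names taken from the slide example
y_star = classes[int(np.argmax(probs))] # greedy prediction
print(dict(zip(classes, probs.round(2))), "->", y_star)
```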
6. Taxonomy of Machine Learning
[Figure: the same four-branch taxonomy - Supervised, Unsupervised, Semi-supervised, Reinforcement Learning]
Unsupervised Learning
• Find a deterministic function f that maps data to a latent vector: z = f(x), x: data, z: latent vector, e.g. z = [0.1, 0.3, -0.8, 0.4, …]
• A second function g can map the latent vector to an output: y = g(z), e.g. g(z) → {Cat: 0.98, Cake: 0.02, Dog: 0.00}
• One use: weight initialization for a downstream supervised model.
• Supervised learning: limited (labeled) data. Unsupervised learning: practically unlimited data.
7. Self Learning
• Use the data itself as the label (self learning): this converts unsupervised learning into a reconstruction task.
• z = f(x), x' = g(z), so the model learns x ≈ g(f(x)), where f is the encoder and g is the decoder.
• Training reduces to supervised learning with an L2 loss (= MSE, mean squared error) between x and its reconstruction x'.
• This is the stacked auto-encoder: x → f → z → g → x', with latent vector z, e.g. z = [0.1, 0.3, -0.8, 0.4, …]
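A minimal sketch of this reconstruction setup in PyTorch, assuming a flattened 784-dimensional input and a 4-dimensional latent code (both sizes are illustrative, not from the slides):

```python
import torch
import torch.nn as nn

# Encoder f: x -> z and decoder g: z -> x', trained so that g(f(x)) ≈ x.
f = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 4))  # encoder
g = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 784))  # decoder

opt = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-3)
mse = nn.MSELoss()               # L2 reconstruction loss

x = torch.rand(32, 784)          # stand-in batch; real use: flattened images
for _ in range(100):
    z = f(x)                     # latent vector, e.g. [0.1, 0.3, -0.8, 0.4, ...]
    x_hat = g(z)                 # reconstruction x'
    loss = mse(x_hat, x)         # supervised learning with the data as its own label
    opt.zero_grad()
    loss.backward()
    opt.step()
```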
8. Taxonomy of Machine Learning
I. Random image generation (e.g. random face generation)
II. Image-to-image translation (cross-modality synthesis), with either paired or unpaired data
[Figure: the same four-branch taxonomy - Supervised, Unsupervised, Semi-supervised, Reinforcement Learning]
9. Random Image Generation
• "The generative model can be thought of as analogous to a team of counterfeiters, trying to produce fake currency and use it without detection, while the discriminative model is analogous to the police, trying to detect the counterfeit currency." (From the GAN paper, NIPS 2014)
• G: generator, D: discriminator, z: random vector, x: real data
• Gaussian noise z is the input for G; D receives both G(z) and real data x and decides: real or fake? In the test phase only the generator is used.
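A sketch of that test phase: draw Gaussian noise and pass it through the generator. The generator here is a placeholder module standing in for a trained G; any trained network with a matching input size would do.

```python
import torch
import torch.nn as nn

# Placeholder generator G; in practice this would be a trained network.
G = nn.Sequential(nn.Linear(100, 784), nn.Tanh())

z = torch.randn(16, 100)      # Gaussian noise as the input for G
with torch.no_grad():         # test phase: no discriminator, no gradients
    fake = G(z)               # 16 generated samples (here: flat 784-dim vectors)
print(fake.shape)             # torch.Size([16, 784])
```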
10. Image-to-Image Translation
• Same encoder-decoder idea as the stacked auto-encoder: x → f (encoder) → z (latent vector) → g (decoder) → y', trained as supervised learning with an L2 loss (= MSE, mean squared error) against the target image y. (VAE: variational auto-encoder)
• As in self learning, the data itself provides the supervision: z = f(x), x' = g(z), so x ≈ g(f(x)) converts unsupervised learning into reconstruction.
11. Generative Models
Taxonomy under maximum likelihood:
• Explicit density
  - Tractable density: fully visible belief nets, neural autoregressive distribution estimator (NADE), masked autoencoder for distribution estimation (MADE), PixelRNN, change-of-variables models (nonlinear ICA), …
  - Approximate density:
    · Variational: variational autoencoder (VAE)
    · Markov chain: Boltzmann machine
• Implicit density
  - Markov chain: generative stochastic networks
  - Direct: generative adversarial networks (GAN)
12. Generative Models
• Three image generation approaches are dominating the field:
  - VAE. Pros: efficient inference with approximate latent variables. Cons: generated samples tend to be blurry.
  - GAN. Pros: generates sharp images; no need for any Markov chain or approximate network during sampling. Cons: difficult to optimize due to unstable training dynamics.
  - Autoregressive models. Pros: very simple and stable training process; currently gives the best log-likelihood; tractable likelihood. Cons: relatively inefficient during sampling.

Variational Auto-Encoders (VAE): encoder $q(z \mid x)$, decoder $p(x \mid z)$, prior $z \sim p(z)$.

Generative Adversarial Networks (GAN): $z \to G \to$ fake sample; $D$ decides real or fake:
$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$

Autoregressive models: $p(x) = \prod_{i=1}^{n^2} p(x_i \mid x_1, \ldots, x_{i-1})$
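To make the autoregressive factorization concrete, here is a minimal sketch that accumulates $\log p(x) = \sum_i \log p(x_i \mid x_{<i})$ over binary pixels. The GRU model is a made-up toy, not the PixelRNN architecture itself.

```python
import torch
import torch.nn as nn

# Toy autoregressive model over n*n binary pixels: a GRU reads pixels one at a
# time and predicts the Bernoulli probability of the *next* pixel.
n = 4
gru = nn.GRU(input_size=1, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)

x = torch.randint(0, 2, (1, n * n, 1)).float()   # one fake 4x4 binary image
start = torch.zeros(1, 1, 1)                     # dummy context before pixel 1
inp = torch.cat([start, x[:, :-1]], dim=1)       # x_{<i} for each position i
h, _ = gru(inp)
p = torch.sigmoid(head(h))                       # p(x_i = 1 | x_{<i})

log_p = (x * p.clamp_min(1e-7).log()
         + (1 - x) * (1 - p).clamp_min(1e-7).log()).sum()
print(float(log_p))   # log p(x) = sum_i log p(x_i | x_1, ..., x_{i-1})
```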
15. Why do we need GAN?
Most mainstream neural nets can easily be fooled into misclassifying things by adding only a small amount of noise to the original data. Often the model is even more confident in the wrong prediction after the noise is added than it was in the correct one.
One reason for this adversarial vulnerability is that most machine learning models learn from a limited amount of data, which is a huge drawback, as it makes them prone to overfitting.
Also, the mapping between the input and the output is almost linear, so even a small change at a point in the feature space can lead to misclassification.
16. How GAN works
Generative: the goal is to learn a generative model, which describes how data is generated in terms of a probabilistic model.
Adversarial: the training of the model is done in an adversarial setting.
Networks: deep neural networks are used as the artificial intelligence (AI) algorithms for training.
17. Generative Adversarial Neural Network (GAN)
Deep learning: the purpose of deep learning is to discover rich, hierarchical models that represent probability distributions over the kinds of data encountered in artificial intelligence applications, such as natural images, audio waveforms containing speech, and symbols in natural language.
18. Generative Adversarial Networks (GANs) are a powerful class of neural networks used for unsupervised learning. They were developed and introduced by Ian J. Goodfellow in 2014.
GANs enable multi-modal outputs.
GANs are made up of a system of two competing neural network models which are able to analyze, capture, and copy the variations within a dataset.
GANs can create new samples of whatever data we feed them, as they learn, generate, and improve.
19. Convolutional Neural Networks:
To understand GANs, we first need a little understanding of Convolutional Neural Networks (CNNs).
CNNs are trained to classify images with respect to their labels. When an image is fed to a CNN, it is analyzed pixel by pixel and passed through the nodes of the CNN's hidden layers; as an output, the network tells what the image is about, i.e. what it sees in the image.
20. For example:
If a CNN is trained to classify dogs and cats and an image is fed to it, it can tell whether there is a dog or a cat in that image. A CNN can therefore also be called a classification algorithm.
21. In GANs, there is a generator and a discriminator.
The generator generates fake samples of data and tries to fool the discriminator. The discriminator, on the other hand, tries to distinguish between the real and fake samples.
The generator and the discriminator are both neural networks, and they run in competition with each other during the training phase. The steps are repeated many times, and with each repetition the generator and the discriminator get better at their respective jobs.
22. Here, the generative model captures the distribution of the data and is trained in such a manner that it tries to maximize the probability of the discriminator making a mistake. The discriminator, on the other hand, is based on a model that estimates the probability that the sample it received came from the training data and not from the generator.
23. Generator
• A generator G is a network. Given $z$ drawn from a normal distribution, it outputs $x = G(z)$, where $x$ is an image (a high-dimensional vector). The network thereby defines a probability distribution $P_G$.
• Goal: make $P_G(x)$ as close as possible to the real data distribution $P_{data}(x)$:
  $G^* = \arg\min_G Div(P_G, P_{data})$
  where $Div(P_G, P_{data})$ is the divergence between the distributions $P_G$ and $P_{data}$.
• How to compute the divergence?
25. Basic Idea of GAN: Generator
• The generator is a neural network (NN), or a function: it takes a vector as input and outputs a high-dimensional vector (an image).
• Each dimension of the input vector represents some characteristic of the output, e.g. changing one component gives longer hair, another blue hair, another an open mouth.
[Figure: several input vectors such as [0.1, −3, …, 2.4, 0.9] fed to the generator, each producing a face image that varies in one attribute]
26. A generator is an inverse convolutional neural net: it does exactly the opposite of what a CNN does. In a CNN an actual image is given as input and a classified label is expected as output, whereas in a generator a random noise vector (a vector of values, to be precise) is given as input to this inverse CNN and an actual image is expected as output.
In simple terms, it generates data from a piece of data using its own imagination.
27. Basic Idea of GAN: Discriminator
• The discriminator is also a neural network (NN), or a function: it takes an image as input and outputs a scalar.
• A larger output value means the input looks real; a smaller value means it looks fake.
[Figure: realistic face images scored near 1.0 by the discriminator, poor ones scored near 0.1]
28. Discriminator
$G^* = \arg\min_G Div(P_G, P_{data})$
• Although we do not know the distributions $P_G$ and $P_{data}$ in closed form, we can sample from them:
  - Sampling from $P_{data}$: draw real examples from the database.
  - Sampling from $P_G$: draw vectors from the normal distribution and pass them through G.
29. Discriminator
$G^* = \arg\min_G Div(P_G, P_{data})$
• Train the discriminator on data sampled from $P_{data}$ (labeled real) and data sampled from $P_G$ (labeled fake).
• Example objective function for D (G is fixed):
  $V(G, D) = \mathbb{E}_{x \sim P_{data}}[\log D(x)] + \mathbb{E}_{x \sim P_G}[\log(1 - D(x))]$
  $D^* = \arg\max_D V(D, G)$
• Training with this example objective function is exactly the same as training a binary classifier. The maximum objective value is related to the JS divergence.
30. $G^* = \arg\min_G \max_D V(G, D)$
The maximum objective value $\max_D V(G, D)$ is related to the JS divergence, so minimizing it over G drives $P_G$ toward $P_{data}$.
• Initialize generator and discriminator.
• In each training iteration:
  Step 1: Fix generator G, and update discriminator D: $D^* = \arg\max_D V(D, G)$
  Step 2: Fix discriminator D, and update generator G.
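A minimal sketch of this alternating procedure in PyTorch. The two-layer networks, layer widths, and optimizer settings are illustrative assumptions, and real data is stood in for by random tensors:

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
eps = 1e-7  # numerical safety inside the logs

for step in range(1000):
    real = torch.rand(64, 784) * 2 - 1   # stand-in for x ~ P_data
    z = torch.randn(64, 100)             # z ~ normal distribution

    # Step 1: fix G, update D to maximize V = E[log D(x)] + E[log(1 - D(G(z)))]
    fake = G(z).detach()                 # detach so G stays fixed
    loss_D = -(torch.log(D(real) + eps).mean()
               + torch.log(1 - D(fake) + eps).mean())
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Step 2: fix D, update G to minimize E[log(1 - D(G(z)))]
    loss_G = torch.log(1 - D(G(z)) + eps).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```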
31. The discriminator is a convolutional neural network consisting of many hidden layers and one output layer; the major difference here is that the output layer of a GAN discriminator has only two possible outputs.
The output of the discriminator is either 1 or 0 because of a specifically chosen activation function for this task: if the output is 1, the provided data is real, and if the output is 0, the data is classified as fake.
The discriminator is trained on the real data so it learns to recognize what actual data looks like and which features the data should have to be classified as real.
32. A loss function for training generative models:
• Discriminator: tell G(z) apart as fake data from the real samples x, i.e. maximize $\log D(x) + \log(1 - D(G(z)))$.
• Generator: make G(z) indistinguishable from real data for D, i.e. minimize $\log(1 - D(G(z)))$.
[Figure: random noise z ~ p_Z → G → fake samples G(z); real samples x ~ p_X; both fed to D → 1/0]
33. MMGAN (Minimax GAN)
• Ian J. Goodfellow, et al., "Generative Adversarial Nets," NIPS 2014.
$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$
G: generator
D: discriminator
x ~ p_data(x): real data distribution
z ~ p_z(z): random vector (Gaussian or uniform distribution)
P: probability that the input is real data (0-1)
Both G and D are fully connected neural networks: noise z of shape (N, 100) is mapped by G to images of shape (N, 28, 28, 1), and D maps an image of shape (N, 28, 28, 1) to a probability via a Sigmoid() output.
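A sketch of such fully connected networks, matching the shapes on the slide ((N, 100) noise in, (N, 28, 28, 1) images out); the hidden-layer widths are my assumption:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Fully connected G: (N, 100) noise -> (N, 28, 28, 1) image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(100, 256), nn.ReLU(),
                                 nn.Linear(256, 28 * 28), nn.Sigmoid())
    def forward(self, z):
        return self.net(z).view(-1, 28, 28, 1)

class Discriminator(nn.Module):
    """Fully connected D: (N, 28, 28, 1) image -> P(real), in (0, 1)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(28 * 28, 256), nn.LeakyReLU(0.2),
                                 nn.Linear(256, 1), nn.Sigmoid())
    def forward(self, x):
        return self.net(x.view(-1, 28 * 28))

G, D = Generator(), Discriminator()
print(D(G(torch.randn(8, 100))).shape)   # torch.Size([8, 1])
```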
34. Working with both generator and discriminator together
The discriminator is trained on actual data to classify whether given data is true or not, so the discriminator's job is to tell what's real and what's fake.
The generator starts generating data from a random input, and that generated data is then passed to the discriminator. The discriminator analyzes the data and checks how close it is to being classified as real. If the generated data does not contain enough features to be classified as real by the discriminator, then this data and the weights associated with it are sent back to the generator via backpropagation,
36. so that it can re-adjust the weights associated with the data and create new data that is better than the previous batch. This freshly generated data is again passed to the discriminator, and the cycle continues.
This process keeps repeating as long as the discriminator keeps classifying the generated data as fake. With every backpropagation the quality of the generated data gets better and better, until eventually the generator becomes so accurate that it is tough to distinguish between the real data and the data it generates.
37. The discriminator is a trained expert who can tell what's real and what's fake, and the generator tries to fool the discriminator into believing that the generated data is real. With each unsuccessful attempt the generator learns and improves itself to produce more realistic data.
The whole setup can thus be stated as a competition between the generator and the discriminator.
38. Use Cases of Generative Adversarial Networks
Generative Adversarial Networks (GANs) are most popular for generating images from a given dataset of images, but GANs are now being used for a variety of applications. They are a class of neural network with a discriminator block and a generator block that work together, and they can produce new samples rather than just classifying or predicting the class of a sample.
In short: GANs can generate new data.
39. In the area of data security:
Artificial intelligence has proved to be a boon to many industries, but it is also surrounded by the problem of cyber threats. GANs have proved to be a great help in handling adversarial attacks, which use a variety of techniques to fool deep learning architectures. By creating fake examples and training the model to identify them, we can counter these attacks.
40. Generating data using GANs:
Data is the most important ingredient for any deep learning algorithm: in general, the more data, the better the performance. But in many cases, such as health diagnostics, the amount of data is restricted; in such cases there is a need to generate good-quality data, and GANs are being used for this.
41. Privacy preservation using GANs:
There are many cases in which our data needs to be kept confidential, especially in defense and military applications. We have many data encryption schemes, but each has its own limitations; in such cases GANs can be useful.
Recently, Google opened a new research path on using the GAN competitive framework for encryption, where two networks had to compete in creating a code and cracking it.
42. Data manipulation using GANs:
We can use GANs for pseudo style transfer, i.e. modifying part of a subject without a complete style transfer. For example, in many applications we want to add a smile to an image, or work only on the eyes. This can also be extended to other domains such as natural language processing and speech processing; for example, we can rework selected words of a paragraph without modifying the whole paragraph.
43. A GAN consists of two neural nets: a generator and a discriminator.
A generative model (G) captures the data distribution, and a discriminative model (D) estimates the probability that a sample came from the training data rather than from G.
The training procedure for G is to maximize the probability of D making a mistake.
The generator creates fake images; the discriminator tries to discern whether each image is real or fake. Over time, both get better.
Recently, NVIDIA found that progressively adding new layers to both networks improved output quality and sped up training.
44. Deep Convolutional GAN (DCGAN)
DCGAN is a generative adversarial network architecture based on CNNs. It uses a couple of guidelines, in particular:
• Replacing pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
• Using batch norm in both the generator and the discriminator.
• Removing fully connected hidden layers for deeper architectures.
• Using ReLU activation in the generator for all layers except the output, which uses tanh.
• Using LeakyReLU activation in the discriminator for all layers.
The network takes in a 100x1 noise vector, denoted z, and maps it to the 64x64x3 output G(z). The shapes progress as 100x1 → 1024x4x4 → 512x8x8 → 256x16x16 → 128x32x32 → 64x64x3, as in the sketch below.
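A sketch of a generator following these guidelines and the shape progression above (channel-first PyTorch convention, so the 100x1 noise becomes an (N, 100, 1, 1) tensor; the kernel/stride/padding settings are the usual DCGAN choices but should be treated as an assumption):

```python
import torch
import torch.nn as nn

# Fractional-strided (transposed) convolutions with batch norm and ReLU
# everywhere except the tanh output:
# 100 -> 1024x4x4 -> 512x8x8 -> 256x16x16 -> 128x32x32 -> 3x64x64.
G = nn.Sequential(
    nn.ConvTranspose2d(100, 1024, kernel_size=4, stride=1, padding=0, bias=False),
    nn.BatchNorm2d(1024), nn.ReLU(True),                  # (N, 1024, 4, 4)
    nn.ConvTranspose2d(1024, 512, 4, 2, 1, bias=False),
    nn.BatchNorm2d(512), nn.ReLU(True),                   # (N, 512, 8, 8)
    nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False),
    nn.BatchNorm2d(256), nn.ReLU(True),                   # (N, 256, 16, 16)
    nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),
    nn.BatchNorm2d(128), nn.ReLU(True),                   # (N, 128, 32, 32)
    nn.ConvTranspose2d(128, 3, 4, 2, 1, bias=False),
    nn.Tanh(),                                            # (N, 3, 64, 64)
)

z = torch.randn(8, 100, 1, 1)
print(G(z).shape)   # torch.Size([8, 3, 64, 64])
```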
46. Conditional GAN (CGAN)
• Key: feed conditions y to both G and D.
• The generator now generates samples based on some condition: G(z, y), with random noise z ~ p_Z and condition y ~ p_Y.
• The discriminator now examines whether a pair (x, y) or (G(z, y), y) is real or not, with real pairs (x, y) ~ p_{X,Y}.
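A sketch of the conditioning trick, assuming class labels as the condition: embed the label and concatenate it to the generator's noise and to the discriminator's input. The sizes and the embedding choice are illustrative.

```python
import torch
import torch.nn as nn

n_classes, z_dim, x_dim = 10, 100, 784
embed = nn.Embedding(n_classes, 16)   # condition y -> 16-dim vector

# G sees [z ; y], D sees [x ; y]: the condition is fed to both networks.
G = nn.Sequential(nn.Linear(z_dim + 16, 256), nn.ReLU(),
                  nn.Linear(256, x_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(x_dim + 16, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())

z = torch.randn(8, z_dim)
y = torch.randint(0, n_classes, (8,))
e = embed(y)
fake = G(torch.cat([z, e], dim=1))        # G(z, y)
score = D(torch.cat([fake, e], dim=1))    # is the pair (G(z, y), y) real?
print(fake.shape, score.shape)
```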
50. Cycle-consistent GAN (CycleGAN)
• Two GANs are trained together: GAN 1 with generator G_{X→Y} and discriminator D_Y (real samples y ~ p_Y vs. fake samples G_{X→Y}(x)), and GAN 2 with generator G_{Y→X} and discriminator D_X (real samples x ~ p_X vs. fake samples G_{Y→X}(y)).
• CycleGAN is a technique that involves the automatic training of image-to-image translation models without paired examples; see the sketch after this slide.
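The extra ingredient beyond the two adversarial losses is a cycle-consistency term that pushes G_{Y→X}(G_{X→Y}(x)) back to x and vice versa. A minimal sketch with toy linear "generators" (made up only to show how the loss is wired; real CycleGAN uses convolutional nets):

```python
import torch
import torch.nn as nn

G_xy = nn.Linear(64, 64)   # toy G_{X->Y} between two 64-dim domains
G_yx = nn.Linear(64, 64)   # toy G_{Y->X}
l1 = nn.L1Loss()

x = torch.randn(8, 64)     # unpaired samples from domain X (e.g. photos)
y = torch.randn(8, 64)     # unpaired samples from domain Y (e.g. paintings)

# Cycle consistency: X -> Y -> X should recover x, and Y -> X -> Y recover y.
cycle_loss = l1(G_yx(G_xy(x)), x) + l1(G_xy(G_yx(y)), y)

# Full objective (not computed here) adds the adversarial losses from
# D_X and D_Y to lambda * cycle_loss.
print(float(cycle_loss))
```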
51. (CycleGAN) We only need a collection of Monet's paintings and a collection of photos, which are much easier to acquire than paired examples.
[Figure: CycleGAN samples]
52. Issues with GANs
Training GANs is a non-trivial problem because of the min-max game formulation of the GAN architecture.
Common failure modes in training GANs are:
• Non-convergence: the model parameters oscillate, destabilize, and never converge.
• Mode collapse: the generator collapses and produces only a limited variety of samples.
• Diminished gradient: the discriminator becomes so successful that the generator's gradient vanishes and it learns nothing.
• Imbalance between the generator and the discriminator causes overfitting.
• High sensitivity to hyperparameter selection.
53. Organ Segmentation
• Organ segmentation is a crucial step to obtain effective computer-aided detection on chest X-rays (CXR).
[Figure: original CXR image of shape (N, 400, 400, 1); lung field annotation as (N, 400, 400, 4) label maps; the adversarial network also sees the image concatenated with the masks, (N, 400, 400, 1+4)]
Dai, Wei, et al. "SCAN: Structure Correcting Adversarial Network for Chest X-rays Organ Segmentation." arXiv preprint arXiv:1703.08770 (2017).
54. Results on JSRT (4 labels) and Montgomery (3 labels).
Dai, Wei, et al. "SCAN: Structure Correcting Adversarial Network for Chest X-rays Organ Segmentation." arXiv preprint arXiv:1703.08770 (2017).
55. Conclusion
• GANs are an active area of research.
• The GAN architecture is flexible enough to support a variety of learning problems.
• GANs are not guaranteed to converge.
• GANs can capture perceptual similarity and generate better images than VAEs.
• A lot of work is still needed on the theoretical foundations of these networks.
• Evaluation of GANs is still an open research problem.