The presentation focuses on the use of different deep generative models for synthetic image generation. These models include Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Auto-Regressive Models, and Flow-Based Models.
Firstly, the presentation introduces VAEs, which are probabilistic models that aim to encode input images into a latent space and generate new images by sampling from this latent space. It explains the underlying principles of VAEs and their ability to generate diverse and realistic synthetic images.
Next, the presentation delves into GANs, which involve two competing neural networks: a generator network and a discriminator network. The generator network generates synthetic images, while the discriminator network learns to distinguish between real and synthetic images. The presentation describes the training process and the theoretical basis of GANs.
The presentation further explores Auto-Regressive Models, which model the joint probability distribution of the image pixels conditioned on previous pixels. It discusses how these models leverage the dependencies among pixels to generate coherent and high-quality synthetic images.
Flow-Based Models, another class of generative models, are then introduced. These models learn a bijective transformation between a simple base distribution and the target distribution of images. The presentation explains how these models can generate images by sampling from the base distribution and applying the inverse transformation.
Finally, the presentation highlights the Triple GAN, a specific GAN variant that outperforms existing GANs in synthetic image generation. It discusses the unique characteristics of Triple GAN, such as its improved stability and ability to generate high-resolution images. The presentation supports these claims with mathematical proofs and implementation results that demonstrate the superior performance of Triple GAN in generating realistic and diverse synthetic images.
Overall, the presentation covers various deep generative models, their principles, and their applications in synthetic image generation. It emphasizes the superiority of Triple GAN, supported by mathematical proofs and implementation results, showcasing its advancements in this field.
3. GENERATIVE MODELS
Auto-Regressive Models
Variational Autoencoders (VAE)
Generative Adversarial Networks (GAN)
Flow-Based Generative Models
4. Auto-Regressive Models
•Need to be trained on a large dataset of images
•Learn the conditional distribution of each pixel given the previous pixels
•The model generates the image one pixel at a time, conditioning on the previous pixels (see the sketch below)
•May suffer from the problem of exposure bias
•Can generate high-quality samples that are similar to the original data
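As a concrete illustration of the pixel-at-a-time generation above, here is a minimal Python/PyTorch sampling sketch; pixel_model is a hypothetical PixelCNN-style network, assumed to map a partially filled image to per-pixel logits over 256 intensity values:

```python
import torch

def sample_autoregressive(pixel_model, height=28, width=28):
    """Generate one image pixel by pixel, each pixel conditioned
    on all previously generated pixels (PixelCNN-style sampling)."""
    img = torch.zeros(1, 1, height, width)
    with torch.no_grad():
        for i in range(height):
            for j in range(width):
                logits = pixel_model(img)  # assumed shape: (1, 256, H, W)
                probs = torch.softmax(logits[0, :, i, j], dim=0)
                # Sample this pixel's intensity given the previous pixels.
                pixel = torch.multinomial(probs, num_samples=1)
                img[0, 0, i, j] = pixel.item() / 255.0
    return img
```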
Flow-Based Models
•Learn a bijective mapping between a simple base distribution and the target data distribution (see the sketch below)
•The mapping is implemented as a series of invertible transformations
•Fast sampling, stable training, no need for specialized inference, easier to implement
•Limited modelling power, large memory requirements, lack of diversity
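A minimal sketch of the bijective-mapping idea, using a single invertible affine transformation as a stand-in for a full stack of invertible layers (names and sizes here are illustrative):

```python
import torch

class AffineFlow:
    """Toy invertible transform: x = z * exp(log_scale) + shift."""

    def __init__(self, dim):
        self.log_scale = torch.zeros(dim, requires_grad=True)
        self.shift = torch.zeros(dim, requires_grad=True)

    def forward(self, z):
        # Base distribution -> data space (the generation direction).
        return z * torch.exp(self.log_scale) + self.shift

    def inverse(self, x):
        # Data -> base space (used when computing likelihoods).
        return (x - self.shift) * torch.exp(-self.log_scale)

flow = AffineFlow(dim=4)
z = torch.randn(8, 4)   # sample from the simple base distribution
x = flow.forward(z)     # map the samples into the data space
```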
5. VAE
•VAEs learn a probabilistic mapping from data to a low-dimensional latent space
•Generate output vectors that are similar, but not identical, to the source images
•Introduce variability with a mean and standard deviation layer while maintaining similarity to the source images (see the sketch below)
•Tend to generate blurry, unrealistic outputs
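The mean and standard deviation layer mentioned above is the reparameterization step; a minimal PyTorch sketch (layer sizes are illustrative, not from the presentation):

```python
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, in_dim=784, latent_dim=16):
        super().__init__()
        self.enc = nn.Linear(in_dim, 64)
        self.mu = nn.Linear(64, latent_dim)       # mean layer
        self.log_var = nn.Linear(64, latent_dim)  # (log) std-dev layer
        self.dec = nn.Sequential(nn.Linear(latent_dim, in_dim), nn.Sigmoid())

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, log_var = self.mu(h), self.log_var(h)
        # Reparameterization: inject Gaussian noise for variability
        # while staying close to the encoded source image.
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
        return self.dec(z), mu, log_var
```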
GAN
•GANs learn a non-probabilistic mapping from random noise to data
•Can generate highly realistic variations of real-world data
•GAN models can fail to converge
•GAN models can suffer from mode collapse and are computationally intensive
8. GAN: THE ADVERSARIAL GAME (TWO-PLAYER FORMULATION)
● GAN is formulated as a two-player game
● The generator G takes a random noise z as input and produces a sample G(z) in the data space
● The discriminator D identifies whether a given sample comes from the true data distribution p(x) or from the generator
● Both G and D are parameterized as deep neural networks, and the training procedure solves a minimax problem
● pz(z) is a simple distribution
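The minimax problem referred to above is the standard GAN objective, written in the same notation as the later slides:

minG maxD V(D, G) = Ep(x) [log D(x)] + Epz(z) [log(1 − D(G(z)))]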
11. ❖ argmaxD V(G, D): the optimal discriminator is
D*G(x) = p(x)/(pg(x) + p(x))
in the nonparametric setting, and the global equilibrium of this game is achieved if and only if pg(x) = p(x).
❖ argminG maxD V(G, D): substituting D*G into V gives
V(G, D*G) = ∫ p(x) log( p(x)/(pg(x) + p(x)) ) dx + ∫ pg(x) log( pg(x)/(pg(x) + p(x)) ) dx
= −log 4 + 2 JSD(p(x) || pg(x)),
which is minimized exactly when pg(x) = p(x).
12. PROBLEMS WITH GAN
(Figure panels: Mode Collapse; Problem with Counting)
Mode collapse occurs when the generator overfits to a particular sample or feature that fools the discriminator, so it keeps outputting it; it typically results from overtraining.
13. PROBLEMS WITH GAN
(Figure panels: Problem with Perspective; Vanishing Gradient)
GANs are affected by the vanishing gradient problem: after running for many iterations, some features lose their importance over time.
15. PROBLEMS WITH GAN – AND DISCUSSIONS
❖ Two alternative training objectives work well for either classification or image generation in SSL, but not both. They are:
1. Feature matching, which works well in classification but fails to generate indistinguishable samples
2. Minibatch discrimination, which is good at realistic image generation but cannot predict labels accurately
❖ Disentangling meaningful physical factors, like the object category, from the latent representations with limited supervision is of general interest
Reason: a single discriminator network has the sole role of distinguishing whether a data-label pair is from the real labeled dataset or not.
16. Existing GANs in SSL have two problems:
(1) the generator and the discriminator (i.e., the classifier) may not be optimal at the same time;
(2) the generator cannot control the semantics of the generated samples.
The problems essentially arise from the two-player formulation, where a single discriminator shares the incompatible roles of identifying fake samples and predicting labels, and it only estimates the data without considering the labels.
18. TRIPLE GAN
The triple generative adversarial network (Triple-GAN) is a framework for both classification and class-conditional image generation with limited supervision.
Two conditional networks, a classifier and a generator, generate fake labels given real data and fake data given real labels; they perform the classification and class-conditional generation tasks respectively. To jointly justify the quality of the samples from the conditional networks, we define a discriminator network whose sole role is to distinguish whether a data-label pair comes from the real labeled dataset or not. The resulting model is called Triple-GAN because we consider three networks as well as three joint distributions, i.e., the true data-label distribution and the distributions defined by the two conditional networks.
19. OBJECTIVES OF TRIPLE GAN
Characterize the processes of:
● Classification
● Class-conditional generation in SSL
20. COMPONENTS OF TRIPLE GAN
● A classifier C that (approximately) characterizes the conditional distribution pc(y|x) ≈ p(y|x);
● A class-conditional generator G that (approximately) characterizes the conditional distribution in the other direction, pg(x|y) ≈ p(x|y); and
● A discriminator D that distinguishes whether a pair of data (x, y) comes from the true distribution p(x, y).
All the components are parameterized as neural networks (a minimal sketch follows).
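The minimal sketch referenced above: the three networks as placeholder PyTorch MLPs. The architectures and sizes here are illustrative, not the ones used in the Triple-GAN paper:

```python
import torch
import torch.nn as nn

X_DIM, Y_DIM, Z_DIM = 784, 10, 100  # illustrative sizes

# Classifier C: x -> logits over labels, defining pc(y|x).
C = nn.Sequential(nn.Linear(X_DIM, 256), nn.ReLU(), nn.Linear(256, Y_DIM))

# Conditional generator G: (y, z) -> x, defining pg(x|y).
G = nn.Sequential(nn.Linear(Y_DIM + Z_DIM, 256), nn.ReLU(),
                  nn.Linear(256, X_DIM), nn.Sigmoid())

# Discriminator D: (x, y) -> probability the pair is from the
# real labeled dataset p(x, y).
D = nn.Sequential(nn.Linear(X_DIM + Y_DIM, 256), nn.ReLU(),
                  nn.Linear(256, 1), nn.Sigmoid())
```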
21. THREE-PLAYER FORMULATION
In the game, after a sample x is drawn from p(x), C produces a fake label y given x following the conditional distribution pc(y|x). The fake input-label pair is a sample from the joint distribution pc(x, y) = p(x)pc(y|x).
Similarly, a fake input-label pair is sampled from G by first drawing y ∼ p(y) and then drawing x|y ∼ pg(x|y), hence from the joint distribution pg(x, y) = p(y)pg(x|y). For pg(x|y), we assume that x is transformed by the latent style variables z given the label y, namely x = G(y, z), z ∼ pz(z), where pz(z) is a simple distribution (e.g., uniform or standard normal).
The fake input-label pairs (x, y) generated by both C and G are sent to the discriminator D. D can also access the input-label pairs from the true data distribution as positive samples.
Our desired equilibrium is that the joint distributions defined by the classifier and the generator both converge to the true data distribution.
22. LOSS FUNCTION
The objective function of the game, written as adversarial losses:
minC,G maxD U(C, G, D) = Ep(x,y) [log D(x, y)] + αEpc(x,y) [log(1 − D(x, y))] + (1 − α)Epg(x,y) [log(1 − D(G(y, z), y))]
where α ∈ (0, 1) is a constant that controls the relative importance of classification and generation; for convenience, α = 1/2.
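As an illustration, a sketch of how the discriminator's side of this objective might be computed in PyTorch, reusing the C, G, D modules sketched earlier. Assumptions beyond the slides: y_real is one-hot, D outputs probabilities, and gumbel_softmax is merely one convenient way to draw a differentiable label sample from C (the paper's exact sampling scheme may differ):

```python
import torch
import torch.nn.functional as F

def discriminator_loss(C, G, D, x_real, y_real, alpha=0.5, z_dim=100):
    """One discriminator step for the Triple-GAN utility U(C, G, D)."""
    batch, n_classes = y_real.shape

    # Real labeled pair (x, y) ~ p(x, y): D should assign high probability.
    d_real = D(torch.cat([x_real, y_real], dim=1))

    # Fake pair from the classifier: (x_real, y ~ pc(y|x)).
    y_c = F.gumbel_softmax(C(x_real), hard=True)
    d_c = D(torch.cat([x_real, y_c], dim=1))

    # Fake pair from the generator: y ~ p(y), z ~ pz(z), x = G(y, z).
    y_g = F.one_hot(torch.randint(0, n_classes, (batch,)), n_classes).float()
    z = torch.randn(batch, z_dim)
    x_g = G(torch.cat([y_g, z], dim=1))
    d_g = D(torch.cat([x_g, y_g], dim=1))

    # D maximizes U; equivalently, minimize its negation.
    return -(torch.log(d_real) + alpha * torch.log(1 - d_c)
             + (1 - alpha) * torch.log(1 - d_g)).mean()
```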
23. For any fixed C and G, the optimal D of the game defined by the utility function U(C, G, D) is:
D*C,G(x, y) = p(x, y)/(p(x, y) + pα(x, y)),
where pα(x, y) := (1 − α)pg(x, y) + αpc(x, y) is a mixture distribution for α ∈ (0, 1).
V(C, G) = maxD U(C, G, D)
V(C, G) = −log 4 + 2 JSD(p(x, y) || pα(x, y))
24. pα(x, y) := (1 − α)pg(x, y) + αpc(x, y)
The equilibrium indicates that if one of C and G tends to the data distribution, the other will also go towards the data distribution, which addresses the competing problem.
Given p(x, y) = pα(x, y), the marginal distributions are the same for p, pc and pg, i.e., p(x) = pg(x) = pc(x) and p(y) = pg(y) = pc(y).
However, the equilibrium may not be unique, and we should minimize an additional objective to ensure uniqueness.
25. The full objective extends the adversarial losses with two regularizers:
minC,G maxD Ep(x,y) [log D(x, y)] + αEpc(x,y) [log(1 − D(x, y))] + (1 − α)Epg(x,y) [log(1 − D(G(y, z), y))] + RC + αP RP
where α ∈ (0, 1).
● Because label information is extremely insufficient in SSL, we propose the pseudo-discriminative loss
RP = Epg [− log pc(y|x)]
● The cross-entropy loss applied to C:
RC = E(x,y)∼p(x,y) [− log pc(y|x)]
A sketch of these two terms follows.
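A minimal sketch of the two regularizers, assuming C outputs logits and labels are given as integer class indices; x_fake and its conditioning labels come from the generator step sketched earlier:

```python
import torch.nn.functional as F

def triple_gan_regularizers(C, x_real, y_real_idx, x_fake, y_fake_idx):
    # RC: supervised cross-entropy on real labeled pairs (x, y) ~ p(x, y).
    r_c = F.cross_entropy(C(x_real), y_real_idx)
    # RP: pseudo-discriminative loss, RP = E_pg[-log pc(y|x)] --
    # C is trained to recover the label that G was conditioned on.
    r_p = F.cross_entropy(C(x_fake.detach()), y_fake_idx)
    return r_c, r_p
```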
27. DISENTANGLE THE CLASSES AND STYLES OF THE INPUT AND TRANSFER SMOOTHLY IN THE DATA SPACE VIA INTERPOLATION IN THE LATENT SPACE, CLASS-CONDITIONALLY
In simpler terms:
● To disentangle the classes and styles of the input means to separate the different categories (such as dogs or cats) and visual features (such as colour or shape) of the images that are given to the model.
● To transfer smoothly in the data space means to create new images that look realistic and natural by changing some aspects of the original images gradually.
● To do this via interpolation in the latent space means to use a mathematical technique that finds intermediate values between two points in a hidden representation of the data that captures its essential characteristics (see the sketch after this list).
● To do this class-conditionally means to do it only for images that belong to the same category (such as dogs or cats), and not across different categories.
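The sketch referenced above: class-conditional interpolation with the conditional generator G sketched earlier, holding the one-hot class label fixed while sliding between two latent style codes z1 and z2:

```python
import torch

def interpolate_class_conditional(G, y_onehot, z1, z2, steps=8):
    """Generate images along the line between two latent style codes,
    holding the class label fixed (class-conditional interpolation)."""
    images = []
    for t in torch.linspace(0.0, 1.0, steps):
        z = (1 - t) * z1 + t * z2  # intermediate point in latent space
        x = G(torch.cat([y_onehot, z], dim=1))
        images.append(x)
    return torch.stack(images)
```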
33. Deep generative modeling remains an active area of research with many challenges, such as evaluating the quality of generated samples and preventing mode collapse, which occurs when the generator starts producing similar or identical samples, collapsing the modes of the data distribution.