The document summarizes research on using generative adversarial networks (GANs) to edit images using text. It discusses Text-Adaptive GAN, which can manipulate images based on natural language descriptions, and Editable GAN, which can simultaneously generate and edit faces. It then proposes a model called Editable Text-Adaptive GAN that combines aspects of these two models to allow generating and editing images using natural language descriptions. Key aspects discussed include the model structure, use of a connection network and text-adaptive discriminator, and potential limitations and areas for improvement.
Table of Contents
1. Introduction
2. Related Work
   1. Conditional GAN
   2. AttGAN
3. Text-Adaptive GAN
   1. Reference
   2. Motivation
   3. Model Structure
   4. Text-Adaptive Discriminator
   5. Formulation
   6. Implementation Details
   7. Experiments
   8. Limitations
4. Editable GAN
   1. Reference
   2. Motivation
   3. Model Structure
   4. Connection Network
   5. Formulation
   6. Experiments
   7. Limitations
5. Editable Text-Adaptive GAN
   1. Comparison of Existing GANs
   2. Key Idea
   3. Model Structure
   4. Formulation
   5. Experiments
   6. Conclusion
6. Discussion
Introduction
1. Editing Images
: Editing methods aim to manipulate single or multiple attributes of an original image, i.e., to generate a new image with the desired attributes while preserving other details.
(Figure: an original image and the edited images generated from it.)
Introduction
2. Approaches to Introduce
1) Text-Adaptive Generative Adversarial Networks: Manipulating Images with Natural Language [2]
2) Editable Generative Adversarial Networks: Generating and Editing Faces Simultaneously [1]
3) Editable GAN + Text-Adaptive GAN (my suggestion)
(Figure: a) Text-Adaptive GAN, b) Editable GAN)
[1] Baek, Kyungjune, et al. “Editable Generative Adversarial Networks: Generating and Editing Faces Simultaneously.”
[2] Nam, Seonghyeon, et al. “Text-Adaptive Generative Adversarial Networks: Manipulating Images with Natural Language.”
Related Works
1. Conditional GAN
: Conditional GAN introduces a framework for controlling the semantics of generated samples; it formulates the problem as reproducing the conditional data distribution by training a conditional model distribution.
https://github.com/hwalsuklee/tensorflow-generative-model-collections
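The conditioning idea can be sketched in a few lines. This is an illustrative NumPy fragment (not from any of the cited papers): the generator input is simply the noise vector concatenated with a one-hot class label, which is the simplest way a conditional model distribution is fed its condition.

```python
import numpy as np

rng = np.random.default_rng(0)

def conditional_generator_input(z, y_label, n_classes):
    """Concatenate noise z with a one-hot class vector y (cGAN-style conditioning)."""
    y = np.zeros(n_classes)
    y[y_label] = 1.0
    return np.concatenate([z, y])

z = rng.standard_normal(100)  # noise vector
g_in = conditional_generator_input(z, y_label=3, n_classes=10)
print(g_in.shape)  # (110,)
```

The same conditioning vector is also given to the discriminator, so both sides see which class the sample is supposed to belong to.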
Related Works
2. AttGAN
: AttGAN aims to generate a new face with desired attributes while preserving other details. Introduced in He, Z., et al., “Arbitrary facial attribute editing: Only change what you want,” in arXiv.
(Figure: examples of adding glasses and changing hair to blond.)
Related Works
2. AttGAN
: Built on an encoder-decoder architecture, AttGAN applies an attribute classification constraint to the generated image to guarantee the correct changes of the desired attributes, i.e., to “change what you want”.
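A minimal sketch of such an attribute classification constraint, assuming a binary cross-entropy form; the attribute names and probability values below are made up for illustration, and the exact loss in the AttGAN paper may differ in weighting and details.

```python
import numpy as np

def attribute_classification_loss(pred_probs, target_attrs, eps=1e-8):
    """Binary cross-entropy pushing the classifier's predictions on the
    generated image toward the requested target attribute vector."""
    p = np.clip(pred_probs, eps, 1 - eps)
    return float(-np.mean(target_attrs * np.log(p)
                          + (1 - target_attrs) * np.log(1 - p)))

target = np.array([1.0, 0.0, 1.0])  # e.g. [blond hair, glasses, smiling] (hypothetical)
good = np.array([0.9, 0.1, 0.8])    # classifier agrees with the requested edits
bad = np.array([0.2, 0.9, 0.3])     # classifier disagrees
print(attribute_classification_loss(good, target) <
      attribute_classification_loss(bad, target))  # True
```

Minimizing this term forces the generator to actually realize the requested attribute changes, while a separate reconstruction term keeps everything else intact.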
Text-Adaptive GAN
1. Reference
: Nam, Seonghyeon, et al. “Text-Adaptive Generative Adversarial
Networks: Manipulating Images with Natural Language” in NeurIPS 2018
Text-Adaptive GAN
2. Motivation
: Text-Adaptive GAN aims to semantically modify visual attributes of an object in an image according to text describing the new visual appearance.
(Figure: given “This is a black bird with gray and white wings and a bright yellow belly and chest,” existing methods synthesize novel images rather than manipulate the original, and do not fully preserve text-irrelevant contents; the proposed method edits the original image.)
Text-Adaptive GAN
3. Model Structure
: A simplified architecture of Text-Adaptive GAN.
(Figure: the generator takes an image together with an encoding of the text “She has blond hair”; the discriminator, with its own independently learned text encoder, answers “Real / Fake?”, while the text-adaptive discriminator answers “Has described attributes?”.)
Text-Adaptive GAN
4. Text-Adaptive Discriminator
: The text-adaptive discriminator classifies each attribute independently using word-level local discriminators. By doing so, the generator receives feedback from each local discriminator for each visual attribute.
1) Each local discriminator determines whether the visual attribute related to its word exists in the image.
2) Word-level attention (softmax values computed against u, the temporal average of the word vectors w_i) reduces the impact of less important words.
3) The word-level scores are aggregated into a final score.
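The word-level scoring above can be sketched in NumPy. This is a toy illustration under the assumption that the final score is an attention-weighted geometric mean of the local scores, with attention from each word's similarity to the temporal average u; the dimensions, similarity choice, and score values are all made up.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def text_adaptive_score(local_scores, word_vecs):
    """Combine word-level local discriminator scores into one score.
    Attention weights come from each word's similarity to u, the temporal
    average of the word vectors; the final score is the attention-weighted
    geometric mean of the local scores."""
    u = word_vecs.mean(axis=0)                   # temporal average of w_i
    attn = softmax(word_vecs @ u)                # word-level attention weights
    return float(np.prod(local_scores ** attn))  # weighted geometric mean

words = rng.standard_normal((5, 16))             # 5 word embeddings (toy)
scores = np.array([0.9, 0.8, 0.95, 0.7, 0.85])   # local discriminator outputs
s = text_adaptive_score(scores, words)
print(0.0 < s < 1.0)  # True
```

Because each word contributes its own score, the generator gets per-attribute feedback instead of one global real/fake signal.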
Text-Adaptive GAN
5. Formulation
(Figure: the original image x with its positive text t, “This is a brown bird”; a negative text t̃, “This is a black bird with gray and white wings and a bright yellow belly and chest”; and the generated image G(x, t̃).)
a) Discriminator: trained to decide whether x has the classes described in t, and whether G(x, t̃) has the classes described in t̃.
b) Generator: trained to fool the discriminator on the generated image, via a term of the form log D(G(x, t̃)).
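One common reading of this two-sided objective can be sketched with scalar scores. The score values below are invented, and the exact weighting of terms in the paper may differ; this only shows the direction each term pushes.

```python
import numpy as np

def bce_real(score, eps=1e-8):
    """Loss for samples the network should accept (score -> 1)."""
    return float(-np.log(np.clip(score, eps, 1.0)))

def bce_fake(score, eps=1e-8):
    """Loss for samples the network should reject (score -> 0)."""
    return float(-np.log(np.clip(1.0 - score, eps, 1.0)))

# Scores D(x, t): does image x carry the attributes described in text t?
d_real_pos = 0.9  # real image x paired with its matching (positive) text t
d_real_neg = 0.2  # real image x paired with a mismatched (negative) text t~
d_fake = 0.3      # generated image G(x, t~) scored against t~

# a) Discriminator: accept (x, t), reject (x, t~) and the generated image
loss_d = bce_real(d_real_pos) + bce_fake(d_real_neg) + bce_fake(d_fake)
# b) Generator: make D believe G(x, t~) has the classes described in t~
loss_g = bce_real(d_fake)
print(loss_d > 0 and loss_g > 0)  # True
```

Training with mismatched negative text is what teaches the discriminator to check text-image agreement rather than realism alone.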
Text-Adaptive GAN
6. Implementation Details
- A bidirectional RNN encodes the whole text.
- The conditioning augmentation method (introduced with StackGAN) is used for a smooth text representation and diversity of generated outputs: latent variables are randomly sampled from an independent Gaussian distribution N(μ(φ), σ(φ)).
- fastText is used for word embeddings.
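Conditioning augmentation amounts to a reparameterized Gaussian sample around the text embedding. In this sketch the μ and σ maps are plain linear projections standing in for the learned layers, and all dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def conditioning_augmentation(phi, mu_w, sigma_w):
    """Reparameterized sample c = mu(phi) + sigma(phi) * eps, eps ~ N(0, I).
    mu and sigma are linear maps here purely for illustration."""
    mu = phi @ mu_w
    sigma = np.exp(0.5 * (phi @ sigma_w))  # predict log-variance, keep sigma > 0
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

phi = rng.standard_normal(128)               # sentence embedding from the RNN (toy)
mu_w = rng.standard_normal((128, 16)) * 0.01
sigma_w = rng.standard_normal((128, 16)) * 0.01
c = conditioning_augmentation(phi, mu_w, sigma_w)
print(c.shape)  # (16,)
```

Sampling around the embedding, instead of using it directly, smooths the text manifold and yields diverse outputs for the same description.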
Text-Adaptive GAN
8. Limitations
- Cannot edit properly for objects whose attributes do not match those in the dataset, e.g., “This flower is blue” gives a bad result.
- Works well for only a few attributes, e.g., “This bird has very small wings” gives a bad result.
Editable GAN
1. Reference
: Baek, Kyungjune, et al. “Editable Generative Adversarial
Networks: Generating and Editing Faces Simultaneously.” in ACCV 2018
Editable GAN
2. Motivation
: Develop a single unified model that can simultaneously create and edit high-quality face images with desired attributes.
(Figure: existing methods such as IcGAN, VAE/GAN, AttGAN, and cGAN handle editing an attribute and generating a novel image as separate tasks; the proposed single model shares both, e.g., generating a novel face and adding blond hair.)
Editable GAN
3. Model Structure
: A simplified architecture of Editable GAN.
(Figure: the generator maps a latent vector z and attributes y to an image x; the discriminator answers “Real / Fake?” and the attribute classifier answers “Has described attributes?”; the connection network takes structural information from the discriminator and attribute information from the classifier to estimate the latent vector.)
Editable GAN
3. Model Structure
: Generate novel images with specific attributes.
(Figure: z is sampled from a uniform distribution and an attribute vector a = [0, 1, …, 0, …] (blond hair) is given as y; the generator produces x_gen, which the discriminator and attribute classifier check, while the connection network estimates its latent vector.)
Editable GAN
3. Model Structure
: Manipulate images with specific attributes.
(Figure: the connection network estimates the latent vector z̃ of the original image x_origin from the discriminator and classifier features; the generator then decodes z̃ with the attribute vector a = [0, 1, …, 0, …] (blond hair) to produce the edited image x_edit, which is again checked for “Real / Fake?” and “Has described attributes?”.)
Editable GAN
4. Connection Network
: The connection network performs the inverse generation process. Taking as input f_d from the discriminator (structural information, the feature used to detect fake images) and f_c from the classifier (attribute information, the feature used to check classes), both output feature vectors of the last fully connected layer, it estimates the latent vector.
By using the connection network, the model bypasses the disadvantage of the encoder-decoder architecture, which overloads the generator training.
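An illustrative sketch of the connection network as a small MLP on the concatenated features, trained with a latent reconstruction loss; all dimensions and weights here are made up, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def relu(x):
    return np.maximum(x, 0.0)

def connection_network(f_d, f_c, w1, w2):
    """Estimate the latent vector z~ from the discriminator feature f_d
    (structural information) and the classifier feature f_c (attribute
    information): a small MLP over their concatenation."""
    h = relu(np.concatenate([f_d, f_c]) @ w1)
    return h @ w2

f_d = rng.standard_normal(64)   # last-FC feature of the discriminator (toy)
f_c = rng.standard_normal(32)   # last-FC feature of the classifier (toy)
w1 = rng.standard_normal((96, 128)) * 0.05
w2 = rng.standard_normal((128, 100)) * 0.05
z_hat = connection_network(f_d, f_c, w1, w2)

# Training would minimize the latent reconstruction loss ||z - z_hat||^2
z_true = rng.standard_normal(100)
loss_z = float(np.mean((z_true - z_hat) ** 2))
print(z_hat.shape, loss_z > 0)  # (100,) True
```

Because inversion is learned from features the discriminator and classifier already compute, no encoder is bolted onto the generator, which is the overload the slide refers to.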
Editable GAN
5. Formulation
a) Discriminator: L_adv + L_class.
b) Generator: ensure reality with the correct classes (fool the discriminator while the classifier confirms the right classes), for both the novel image G(z, y) and the edited image.
c) Connection Network: estimate the image's latent vector z̃ from the features f_d and f_c of G(z, y) (or of a real image x); the decoder then combines the estimated z̃ with classes y to edit the image.
Editable GAN
6. Experiments
a) Image quality and reconstruction performance
b) Image editing
c) Image generation
d) Controlling the strength of the attribute effect
Editable GAN
7. Limitations
- Compared with other methods, Editable GAN is not very good at editing. Compared to AttGAN, it does not properly preserve other details.
(Figure: a) Editable GAN's edit does not match the original's details; b) AttGAN's does.)
Editable Text-Adaptive GAN?
1. Comparison of Existing GANs
: Comparison of some GANs known to perform well in editing or generating arbitrary images.

Feature                             | CGAN | AttGAN | AttnGAN | SISGAN | TAGAN | EditableGAN | ???
Implicit Classes (Natural Language) |  X   |   X    |    O    |   O    |   O   |      X      |  O
Generating Arbitrary Image          |  O   |   X    |    O    |   X    |   X   |      O      |  O
Editing Image                       |  X   |   O    |    X    |   O    |   O   |      O      |  O
Editable Text-Adaptive GAN?
Q. Is it possible to make a novel framework that can generate and edit images simultaneously using natural language, with reference to Editable GAN [1] and Text-Adaptive GAN [2]?
Editable Text-Adaptive GAN?
2. Key Idea
: Combine Editable GAN [1] and Text-Adaptive GAN [2]. The proposed framework uses the connection network and the text-adaptive discriminator, and includes some components of both models.
(Figure: Editable GAN, which generates and edits simultaneously, plus Text-Adaptive GAN, which edits images with natural language.)
Editable Text-Adaptive GAN?
3. Model Structure
(Figure: the generator is a text encoder plus decoder mapping z and the encoded text to an image x; the discriminator combines a classifier producing the unconditional (adversarial) loss, an attribute classifier, and a text-adaptive classifier with its own text encoder producing the conditional loss; the connection network (CN) takes f_d and f_c.)
Editable Text-Adaptive GAN?
3. Model Structure
: Generate novel images with specific attributes in natural language.
(Figure: z is sampled from a uniform distribution and the text “She has blond hair” is encoded; the decoder produces x_gen, which is scored by the unconditional (adversarial) loss and, through the text-adaptive classifier, by the conditional loss.)
Editable Text-Adaptive GAN?
3. Model Structure
: Manipulate images with specific attributes in natural language.
(Figure: the connection network estimates z̃ of the original image x_origin from f_d and f_c; the decoder combines z̃ with the encoded text “She has blond hair” to produce the edited image x_edit, scored by the unconditional and conditional losses.)
Editable Text-Adaptive GAN?
4. Formulation
a) Discriminator with the text-adaptive classifier:
   L_D = L_adv + λ_cond · L_C^real
b) Generator: ensure reality with the correct classes, and preserve other details in editing:
   L_G = L_adv + λ_cond · (L_C^edit + L_C^gen) + λ_recon · L_rec^image
   The image reconstruction term makes the model more suitable for editing (only change what you want).
c) Connection Network: estimate the image's latent vector:
   L_CN = λ_recon · L_rec^z
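The three objectives can be assembled as plain weighted sums of their terms. The λ values and loss magnitudes below are arbitrary placeholders, not the ones used in training.

```python
def total_losses(l_adv, l_c_real, l_c_edit, l_c_gen, l_rec_image, l_rec_z,
                 lam_cond=1.0, lam_recon=10.0):
    """Assemble the discriminator, generator, and connection-network objectives."""
    loss_d = l_adv + lam_cond * l_c_real
    loss_g = l_adv + lam_cond * (l_c_edit + l_c_gen) + lam_recon * l_rec_image
    loss_cn = lam_recon * l_rec_z
    return loss_d, loss_g, loss_cn

ld, lg, lcn = total_losses(l_adv=0.7, l_c_real=0.4, l_c_edit=0.5,
                           l_c_gen=0.6, l_rec_image=0.02, l_rec_z=0.03)
print(round(ld, 2), round(lg, 2), round(lcn, 2))  # 1.1 2.0 0.3
```

Only the generator carries the image reconstruction term, which is what pushes edits to change the described attribute while leaving everything else alone.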
Editable Text-Adaptive GAN?
5. Experiments
: Still training and fine-tuning; the results below are images produced during learning. “Not good, but it works anyway.”
a) Recognizing attributes in text
b) Generating and editing images simultaneously with a given text
(Figure: original, edited, and novel images for the descriptions “This flower has petals that are white and has patches of yellow” and “A light pink flower with pointed petals and a yellow circle”; the novel and edited images are given the same description.)
Editable Text-Adaptive GAN?
6. Conclusion
: The limitations of Editable Text-Adaptive GAN, and what I learned.
- It has all the problems of the two existing models (Editable GAN, Text-Adaptive GAN): the text-adaptive discriminator and the connection network work independently, so combining the two models does not help solve those problems.
- Reconstruction loss on images, not just on latent vectors, works effectively even though this model is not based on an encoder-decoder architecture.
- f_c probably contains enough attribute information: there was no major problem in learning without feeding y into the connection network.
Discussion
- Is it possible to generate and edit images simultaneously without losing the original information?
- Can we improve performance by integrating the structures of other models, such as StackGAN, which was introduced for working with natural language?
- We need metrics to compare the modified image with the original one (reconstruction loss increases greatly when the color or structure of the image changes).