Harnessing the power of Generative Adversarial Networks (GANs) for supervised learning

Olga Petrova
Machine Learning Engineer @ Scaleway

OUTLINE
1. INTRODUCTION
a) Generating content with AI
b) Supervised vs. Unsupervised learning
2. DEEP LEARNING PIPELINE
a) The building blocks of a DL project
b) How can a GPU help you?
3. FACE FRONTALIZATION GAN

INTRODUCTION
a) Generating content with AI
b) Supervised vs. Unsupervised learning

THIS SLIDE DOES NOT EXIST
ThisPersonDoesNotExist.com
These images were generated via StyleGAN,
an artificial neural network by NVIDIA.

WHY USE A GPU FOR THIS?
NVIDIA
• Graphics Processing Unit manufacturer
• Training: very computationally intensive
• GPUs are optimised for Deep Learning
Scaleway GPU offer
• Dedicated 16-GB NVIDIA Tesla P100 GPU
• 10 CPU cores
• 45 GB of RAM
• 400 GB of local NVMe storage

SUPERVISED vs. UNSUPERVISED
Unsupervised learning
ThisPersonDoesNotExist GAN:
- Show the model a lot of pictures of people
(~ 70 000 images from Flickr)
- The model learns how to generate new pictures of faces
Unlabelled data: unsupervised learning
The training set: instead of (input, output) pairs, only (input)
The original use of GANs was unsupervised learning

Supervised learning
Labeled data:
(input, output) pairs
- Dog vs cat: inputs = images,
outputs = class labels
- Super resolution: inputs = low-resolution images,
outputs = super-resolved images
Typically, GANs were not used for supervised learning
Low resolution input vs. Super resolution output
“Photo-Realistic Single Image Super-Resolution Using a Generative
Adversarial Network” by Twitter (2017)

Super resolution vs. Super resolution w/ GAN
“Photo-Realistic Single Image Super-Resolution Using a Generative
Adversarial Network” by Twitter (2017)

FACE FRONTALIZATION
Supervised learning
Inputs:
profile images at an angle
Outputs:
frontal images of the face
“Beyond Face Rotation: Global and Local Perception GAN for
Photorealistic and Identity Preserving Frontal View Synthesis” by
R. Huang et al (2017)
Figure: input, generated output, ground truth

FACE FRONTALIZATION
Scaleway’s Face Frontalization GAN
Figure: input, generated output, ground truth

DEEP LEARNING PIPELINE
a) The building blocks of a DL project
b) How can a GPU help you?

FACE FRONTALIZATION
Input = profile images; output = frontal images
Model = program ← this is the product

FACE FRONTALIZATION
THE MODEL
• Architecture + hyperparameters ← ML engineer’s job
• Trainable parameters ← learned numerical values

FACE FRONTALIZATION
TWO REGIMES
1. Training: learn the right parameters for the model
2. Inference: use the trained model to infer outputs for new inputs

TRAINING THE MODEL I
Training data: correct (input, output) pairs
1. Feed inputs into the model
2. Compare the generated output to ground truth
3. Adjust trainable parameters to generate better outputs
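
The three steps above can be sketched in plain Python on a toy problem. The one-parameter model y = w * x, the squared-error comparison, and the learning rate are illustrative assumptions, not the frontalization model itself.

```python
# Toy illustration of the three training steps.
# The model y = w * x and all numbers here are hypothetical.
def train(pairs, w=0.0, learning_rate=0.05, steps=200):
    """Fit a single trainable parameter w on (input, output) pairs."""
    for _ in range(steps):
        grad = 0.0
        for x, target in pairs:
            generated = w * x                    # 1. feed the input into the model
            error = generated - target           # 2. compare with the ground truth
            grad += 2 * error * x / len(pairs)   # gradient of the mean squared error
        w -= learning_rate * grad                # 3. adjust the trainable parameter
    return w

# Ground truth for this toy dataset: output = 3 * input
training_data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
learned_w = train(training_data)
print(round(learned_w, 3))
```

A real network repeats the same loop with millions of parameters, which is where the computational cost comes from.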

TRAINING THE MODEL II
• Training done in mini-batches:
analyse a few images, then update parameters
• 1 pass through training dataset = 1 training epoch
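
A minimal sketch of how mini-batches relate to an epoch, assuming the dataset fits in a Python list. The helper name `mini_batches` and the sizes are hypothetical.

```python
def mini_batches(dataset, batch_size):
    """Yield consecutive slices of the dataset, one mini-batch at a time."""
    for start in range(0, len(dataset), batch_size):
        yield dataset[start:start + batch_size]

dataset = list(range(10))    # stand-in for 10 training samples
updates_per_epoch = 0
for batch in mini_batches(dataset, batch_size=4):
    updates_per_epoch += 1   # a real loop would update the parameters here

# One full pass over the dataset is one epoch: batches of size 4, 4 and 2.
print(updates_per_epoch)
```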

TECHNICAL CHALLENGE: COMPUTATION
Heavy computations:
Trainable parameters: ~ 8 000 000 in the frontalization GAN
Process many arithmetic operations in parallel

TECHNICAL CHALLENGE: COMPUTATION
SOLUTION: GPU
GPUs are optimised for such calculations
Example: the NVIDIA Tesla P100 has 3584 CUDA cores

CPU vs GPU performance in training
                          GP1-XS (4 vCPUs)    Scaleway Tesla P100 GPU
Pricing                   €39/month           €500/month
                          (€0.078/hour)       (€1/hour)
Training time per epoch   8.5 hours           18 minutes
Cost per epoch            €0.66               €0.30
GPU: over 28 times faster for less than half the price
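
The per-epoch costs and the speedup follow directly from the hourly prices and training times quoted in the comparison. A quick check, using only those figures (the variable names are illustrative):

```python
# Figures taken from the comparison above.
cpu_hours_per_epoch = 8.5
gpu_hours_per_epoch = 18 / 60    # 18 minutes
cpu_price_per_hour = 0.078       # EUR, GP1-XS
gpu_price_per_hour = 1.0         # EUR, Tesla P100 instance

cpu_cost = cpu_hours_per_epoch * cpu_price_per_hour
gpu_cost = gpu_hours_per_epoch * gpu_price_per_hour
speedup = cpu_hours_per_epoch / gpu_hours_per_epoch

print(f"CPU: EUR {cpu_cost:.2f}, GPU: EUR {gpu_cost:.2f}, speedup: {speedup:.1f}x")
```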

TECHNICAL CHALLENGE: HEAVY I/O
Feed batches of (input, output) pairs of images
Frontalization training set size: ~700 000 samples, 13+ GB

TECHNICAL CHALLENGE: HEAVY I/O
SOLUTION: Local Storage
Scaleway GPU instances come with
400 GB of local NVMe SSD storage

FACE FRONTALIZATION GAN
a) GAN: Generative Adversarial Network
b) Generator: Encoder + Decoder
c) Training and Inference

GANs: Generative Adversarial Nets I
Generative Adversarial Nets by Ian Goodfellow et
al. (2014)
Yann LeCun (Director of Facebook AI):
“the most interesting idea in the last 10 years in
Machine Learning”
Fig: http://skymind.ai/wiki/generative-adversarial-network-gan
Generative: there is a Generator part
Adversarial: there is also a Discriminator. You train the two against each other

GANs: Generative Adversarial Nets II
1. Generator: generates output images
2. Discriminator: has two objectives
- accept images from the training set (Real)
and
- reject the generated images (Fake)
The purpose of training:
the Generator gets good enough to be able to fool the Discriminator
into accepting the generated images as Real
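
The two Discriminator objectives and the Generator's opposing goal can be written as binary cross entropy terms, the loss this deck names later for the Discriminator. This is a sketch on single hypothetical scores in (0, 1), where 1 means Real; the function names are illustrative.

```python
import math

def bce(prediction, label):
    """Binary cross entropy for one prediction in (0, 1) and a 0/1 label."""
    return -(label * math.log(prediction) + (1 - label) * math.log(1 - prediction))

def discriminator_loss(score_on_real, score_on_generated):
    # Objective 1: accept training images (label 1 = Real).
    # Objective 2: reject generated images (label 0 = Fake).
    return bce(score_on_real, 1) + bce(score_on_generated, 0)

def generator_adversarial_loss(score_on_generated):
    # The Generator wants its images to be scored as Real.
    return bce(score_on_generated, 1)

# Hypothetical scores: a Discriminator that spots the fake vs. one that is fooled.
confident = discriminator_loss(score_on_real=0.9, score_on_generated=0.1)
fooled = discriminator_loss(score_on_real=0.9, score_on_generated=0.9)
print(confident < fooled)
```

The same generated score pulls the two losses in opposite directions, which is what makes the training adversarial.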

GENERATOR I
Input image: 3 x 128 x 128 = 49152 numbers
Perhaps we do not need all the 49152 values to describe a face?

GENERATOR II
ENCODER: Analyse the face → 512 numbers that describe it
DECODER: 512 numbers → Reconstruct the face
ENCODER + DECODER = GENERATOR
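
To make the bottleneck concrete, here is a toy encoder and decoder operating on a flat list of 49152 values. Averaging and repeating chunks is a stand-in chosen for brevity; it is an assumption for illustration and not the architecture of the actual Generator.

```python
def encode(values, latent_size=512):
    """Toy encoder: average consecutive chunks down to latent_size numbers."""
    chunk = len(values) // latent_size
    return [sum(values[i * chunk:(i + 1) * chunk]) / chunk for i in range(latent_size)]

def decode(latent, output_size):
    """Toy decoder: repeat each latent number to restore the original size."""
    chunk = output_size // len(latent)
    return [value for value in latent for _ in range(chunk)]

def generator(values):
    # ENCODER + DECODER = GENERATOR
    return decode(encode(values), len(values))

image = [0.5] * (3 * 128 * 128)   # 49152 numbers
code = encode(image)              # 512 numbers
restored = generator(image)       # back to 49152 numbers

print(len(image), len(code), len(restored), len(image) // len(code))
```

In this sketch every latent number summarises 96 input values; a trained encoder learns a far more useful summary than an average.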

TRAINING
Model: only the Generator (not the Discriminator)
To train the GAN: train both Discriminator and Generator
To see the benefit of the GAN, consider only the Generator first

TRAINING: ONLY THE GENERATOR
• The ML engineer needs a measure of how far the generated result is
from the ideal
• Example: pixelwise loss function
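
One common pixelwise loss is the mean absolute difference between corresponding pixels. The slides only say "pixelwise", so the choice of absolute difference here is an assumption, and the four-pixel "images" are made up for illustration.

```python
def pixelwise_l1(generated, ground_truth):
    """Mean absolute difference between two equally sized lists of pixel values."""
    if len(generated) != len(ground_truth):
        raise ValueError("images must have the same number of values")
    return sum(abs(g - t) for g, t in zip(generated, ground_truth)) / len(generated)

ground_truth = [0.0, 0.5, 1.0, 0.5]
perfect = pixelwise_l1(ground_truth, ground_truth)
blurry = pixelwise_l1([0.25, 0.5, 0.75, 0.5], ground_truth)

print(perfect, blurry)
```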

Top: faces generated after 1 epoch (~700 000 training samples)
Bottom: ground truth frontal face photographs

Top: faces generated after 10 epochs
Bottom: actual frontal face photographs



• Why does this work?
• We have a lot of trainable parameters (~ 5 000 000)

INFERENCE
Model after one training epoch: results are blurry
• Training for too long leads to overfitting the training data:
the model does not generalise well at inference time
• How can we get good training results faster?
Top: input
Middle: generated output
Bottom: ground truth
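
One common way to avoid training for too long is to monitor the loss on held-out data and stop once it no longer improves. The slides do not describe this procedure, so the sketch below is an assumption; the loss values are made up and `best_epoch` is a hypothetical helper.

```python
def best_epoch(validation_losses, patience=2):
    """Return the 1-based epoch with the lowest validation loss,
    stopping once the loss has not improved for `patience` epochs."""
    best_index, best_loss = 0, float("inf")
    for index, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_index, best_loss = index, loss
        elif index - best_index >= patience:
            break   # no improvement for `patience` epochs: likely overfitting
    return best_index + 1

# Made-up losses: improving at first, then getting worse as the model overfits.
made_up_losses = [0.90, 0.60, 0.45, 0.40, 0.43, 0.47, 0.55]
stop_at = best_epoch(made_up_losses)
print(stop_at)
```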

INFERENCE
Model after 50 training epochs: getting better
Top: input

INFERENCE
Model after 400 epochs: generated images are getting worse
Top: input

TRAINING THE GAN:
GENERATOR + DISCRIMINATOR
Train Generator + Discriminator. Two objectives for the Generator:
1. Minimize the pixelwise loss as before
2. Fool the Discriminator into believing the generated images
are Real
Top: faces generated after 1 epoch (old model)

TRAINING THE GAN:
Old model + GAN after 1 training epoch:
fine features are sharpened much faster

TRAINING THE GAN:
Generated output after 10 training epochs
Only the Generator vs. GAN = Generator + Discriminator

TRAINING THE GAN:
Only the Generator vs. GAN = Generator + Discriminator
1. With the GAN we get better visual quality on the training set faster
2. We can stop training earlier
3. The final model will generalise better

COMBINED LOSS FUNCTION
Loss = the difference between the generated output and the desired output
1. Pixelwise loss for the generator:
how close is the output to ground truth?
2. Binary Cross Entropy loss for the discriminator:
Fake or Real
Figure: input, generated output, ground truth
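
A minimal sketch of how the two terms might be combined into one Generator loss. The weighted sum and the value of `adversarial_weight` are assumptions made for illustration; the slides do not state how the terms are weighted.

```python
import math

def pixelwise_loss(generated, ground_truth):
    """Mean absolute difference between generated and ground truth pixels."""
    return sum(abs(g - t) for g, t in zip(generated, ground_truth)) / len(generated)

def adversarial_loss(discriminator_score):
    """Binary cross entropy against the label Real; small when the
    Discriminator is fooled (score close to 1)."""
    return -math.log(discriminator_score)

def generator_loss(generated, ground_truth, discriminator_score, adversarial_weight=0.001):
    # adversarial_weight is a hypothetical hyperparameter.
    return (pixelwise_loss(generated, ground_truth)
            + adversarial_weight * adversarial_loss(discriminator_score))

generated = [0.25, 0.5, 0.75, 0.5]
ground_truth = [0.0, 0.5, 1.0, 0.5]

# Same pixelwise error, different Discriminator verdicts (hypothetical scores).
loss_when_fooling = generator_loss(generated, ground_truth, discriminator_score=0.9)
loss_when_caught = generator_loss(generated, ground_truth, discriminator_score=0.1)
print(loss_when_fooling < loss_when_caught)
```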

COMBINED LOSS FUNCTION
1. Fine features: contribute little to the pixelwise loss
2. Discriminator: uses fine features to
categorise images as Real or Fake
3. Result: better visual quality of the output
Super resolution vs. Super resolution w/ GAN
“Photo-Realistic Single Image Super-Resolution Using a
Generative Adversarial Network” by Twitter (2017)
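
Why fine features matter so little to a pixelwise loss can be seen on a made-up one-dimensional "image" of 100 pixels: dropping a sharp two-pixel detail costs far less than a barely visible global brightness shift. All values are illustrative assumptions.

```python
def mean_abs_diff(a, b):
    """Pixelwise loss: mean absolute difference between two pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

ground_truth = [0.5] * 100
ground_truth[40] = 1.0   # a fine feature, two pixels wide
ground_truth[41] = 1.0

missing_detail = [0.5] * 100                         # the fine feature is blurred away
brightness_shift = [v - 0.1 for v in ground_truth]   # detail kept, slightly darker

loss_missing_detail = mean_abs_diff(missing_detail, ground_truth)
loss_brightness = mean_abs_diff(brightness_shift, ground_truth)
print(loss_missing_detail < loss_brightness)
```

A Discriminator, by contrast, can reject the image with the missing detail outright, which pushes the Generator to restore it.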

CONCLUSION
1. Using GANs in Supervised Learning can be a good idea
2. Training such complex deep networks benefits greatly from a GPU

Thank You
Stay tuned for exclusive how-to's and updates, follow us on Twitter and LinkedIn
@Scaleway
You can also follow me on LinkedIn www.linkedin.com/in/olga-p-petrova/
The face frontalization GAN code can be found on www.github.com/scaleway/frontalization
