Slide 3
Interactive visual manipulation (object removal/addition, changing the object category)
Generate diverse results from the same input, allowing users to edit the object appearance interactively
Goal
<Interactive editing results> <Editing interface>
https://github.com/NVIDIA/pix2pixHD
Slide 4
Related Work – Pix2Pix [21]
Image-to-Image Translation with Conditional Adversarial Networks (CVPR 2017)
cGAN: {x , z} → y
x: observed image (condition)
z: random noise vector
y: generated output
Slide 5
Related Work – Cascade Refinement Networks [5]
Photographic Image Synthesis with Cascaded Refinement Networks (ICCV 2017)
• GANs: training instability and optimization issues
• First model able to synthesize HD images
• Proposes a cascade of refinement modules
• Direct regression objective with a perceptual loss
• Weakness: lacks fine details and realistic textures
pix2pixHD
Slide 6
From semantic label map to neural photo
Pix2Pix [21]: training is unstable and the generated quality is unsatisfactory
Conditional GAN Framework
Slide 11
Improving Photorealism and Resolution
<Coarse-to-fine Generator>
Perceptual Losses for Real-Time Style Transfer and Super-Resolution (ECCV 2016) [22]
G1 : Global Generator
G2 : Local Enhancer Generator
G1 Input : 1024x512 → G1 Output : 1024x512
G2 Input : 2048x1024 → G2 Output : 2048x1024
Element-wise sum of the two feature maps (G1's last feature map + G2's front-end features)
Training :
1. Train the global generator
2. Train the local enhancer
3. Jointly fine-tune all the networks together
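The coarse-to-fine fusion above can be sketched with toy numpy tensors. `global_generator` and `local_enhancer` below are hypothetical scalar stand-ins for G1 and G2 (the real networks are convolutional); the sketch only shows how G2's front-end features and G1's output features are combined by element-wise sum before G2's back end restores full resolution:

```python
import numpy as np

def global_generator(half_res):
    # Hypothetical stand-in for G1: in pix2pixHD this is a conv net that maps
    # the 2x-downsampled input to a feature map at the same (half) resolution.
    return half_res * 0.5

def local_enhancer(full_res, g1_features):
    # Hypothetical stand-in for G2's front end: features from the
    # full-resolution input, brought down to G1's resolution.
    front = full_res[::2, ::2] * 0.25
    fused = front + g1_features          # element-wise sum of the two feature maps
    # G2's back end brings the fused features back to full resolution
    # (nearest-neighbour repeat here; learned upsampling in the real model).
    return np.repeat(np.repeat(fused, 2, axis=0), 2, axis=1)

full_res = np.ones((8, 8))                      # toy stand-in for a 2048x1024 input
g1_out = global_generator(full_res[::2, ::2])   # G1 operates on the half-res input
out = local_enhancer(full_res, g1_out)          # G2 output matches the input size
```

This mirrors why G1 is trained first: G2 only refines residual detail on top of G1's already-plausible features.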
Slide 12
Semantic label map vs Instance Map
<Input Image> <Semantic Label Map> <Instance Label Map>
A semantic label map cannot distinguish objects of the same class.
An instance label map assigns a unique ID to each individual object.
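pix2pixHD turns the instance IDs into an instance boundary map (a pixel is 1 when its ID differs from any of its 4-connected neighbours) and feeds that map to the network. A minimal numpy sketch, assuming a 2-D integer ID array:

```python
import numpy as np

def instance_boundary_map(inst):
    # Mark a pixel when its instance ID differs from any 4-connected neighbour.
    b = np.zeros_like(inst, dtype=bool)
    b[:, 1:]  |= inst[:, 1:] != inst[:, :-1]   # differs from left neighbour
    b[:, :-1] |= inst[:, 1:] != inst[:, :-1]   # differs from right neighbour
    b[1:, :]  |= inst[1:, :] != inst[:-1, :]   # differs from top neighbour
    b[:-1, :] |= inst[1:, :] != inst[:-1, :]   # differs from bottom neighbour
    return b.astype(np.uint8)

# Toy instance map: two cars (IDs 1, 2) over a road (ID 3).
inst = np.array([[1, 1, 2],
                 [1, 1, 2],
                 [3, 3, 3]])
edges = instance_boundary_map(inst)
```

The boundary map is what lets the generator separate adjacent objects of the same class, which a semantic map alone cannot express.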
Slide 14
Improving Photorealism and Resolution
<Multi-scale Discriminator>
To differentiate high-resolution real and synthesized images,
the discriminator needs a large receptive field. Two naive options:
1. A deeper network
2. Larger convolutional kernels
→ both increase network capacity and risk overfitting
Multi-scale discriminators :
3 discriminators with an identical network structure, operating at different image scales
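A minimal sketch of the multi-scale setup: the image is downsampled by factors of 2 and 4 with average pooling, and one discriminator of identical architecture scores each scale. `toy_discriminator` below is a hypothetical scalar stand-in for a PatchGAN discriminator:

```python
import numpy as np

def avg_pool2(x):
    # 2x average-pool downsampling, used to build the image pyramid for D2 and D3.
    return 0.25 * (x[::2, ::2] + x[1::2, ::2] + x[::2, 1::2] + x[1::2, 1::2])

def toy_discriminator(x):
    # Hypothetical stand-in for one discriminator: returns a scalar score.
    return float(x.mean())

img = np.random.rand(16, 16)   # toy stand-in for a high-resolution image
scores = []
for _ in range(3):             # D1 on full resolution, D2 on 1/2, D3 on 1/4
    scores.append(toy_discriminator(img))
    img = avg_pool2(img)
```

The coarsest-scale discriminator effectively sees a large receptive field and guides global consistency, while the finest scale enforces detail, without deepening any single network.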
Slide 15
Improving Photorealism and Resolution
<Improved Adversarial Loss> Improve the GAN loss by incorporating a feature matching loss based on the discriminator.
(Dk(i) denotes the i-th-layer feature extractor of discriminator Dk)
Adding the VGG perceptual loss yields a slight further performance improvement.
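The feature matching loss compares real and synthesized images through the discriminator's intermediate layers with an L1 distance summed over layers. A simplified numpy sketch (the paper normalizes each layer by its element count, which the per-layer mean below approximates):

```python
import numpy as np

def feature_matching_loss(feats_real, feats_fake):
    # L1 distance between real and fake discriminator features,
    # averaged within each layer, then averaged over the T layers.
    T = len(feats_real)
    return sum(np.abs(fr - ff).mean()
               for fr, ff in zip(feats_real, feats_fake)) / T

# Toy features from two discriminator layers (different spatial sizes).
real_feats = [np.ones((4, 4)), np.full((2, 2), 2.0)]
fake_feats = [np.zeros((4, 4)), np.full((2, 2), 2.0)]
loss = feature_matching_loss(real_feats, fake_feats)   # (1.0 + 0.0) / 2
```

Matching intermediate features gives the generator a denser training signal than the scalar real/fake score alone, which stabilizes high-resolution training.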
Slide 16
Learning an Instance-level Feature Embedding
To generate diverse images and allow instance-level control:
add low-dimensional feature channels, produced by a feature encoder, to the generator input.
Training time :
1. Train the feature encoder jointly with the generator and discriminator
2. Record the encoded features for every instance in the training data
3. Run k-means clustering on the features within each semantic category
Inference time :
1. For each object instance, randomly select one of the cluster centers and use it as the encoded feature
2. When editing, the user can choose one of the K modes to select a different style
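Training steps 2-3 and inference step 1 above can be sketched in numpy; the feature data and the small k-means routine here are illustrative stand-ins, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 3-D encoded features recorded for every instance of one
# semantic category (e.g. "car") across the training set.
car_features = rng.normal(size=(200, 3))

def kmeans(x, k, iters=20):
    # Plain Lloyd's algorithm: assign points to the nearest center, recompute means.
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = np.argmin(((x[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = x[assign == j].mean(axis=0)
    return centers

centers = kmeans(car_features, k=10)   # K = 10 modes per category
# Inference: pick one cluster center at random as the instance's encoded
# feature; for editing, the user picks one of the K modes instead.
chosen = centers[rng.integers(10)]
```

Because each cluster center corresponds to a typical appearance mode (e.g. a car color), swapping centers changes an object's style without touching the rest of the image.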
Slide 18
Implementation details
• LSGAN
• 𝜆 = 10 (weight on the feature matching and VGG losses)
• K = 10 for k-means
• 3-dimensional vectors to encode features
• Ours : GAN loss + Feature Matching Loss + VGG Perceptual Loss
• Ours(w/o VGG loss) : GAN loss + Feature Matching Loss
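Under these settings the generator objective combines the three terms, with λ = 10 weighting the auxiliary losses. A simplified scalar sketch of this assumed form, covering both the "Ours" and "Ours (w/o VGG loss)" variants:

```python
def total_generator_loss(gan_loss, fm_loss, vgg_loss, lam=10.0, use_vgg=True):
    # Combined objective: GAN loss plus lambda-weighted feature matching
    # loss, optionally plus the lambda-weighted VGG perceptual loss.
    loss = gan_loss + lam * fm_loss
    if use_vgg:
        loss += lam * vgg_loss
    return loss

full = total_generator_loss(1.0, 0.1, 0.2)                  # "Ours"
ablated = total_generator_loss(1.0, 0.1, 0.2, use_vgg=False)  # "Ours (w/o VGG loss)"
```

The ablation in the experiments toggles exactly this `use_vgg` term, isolating the contribution of the perceptual loss.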
Datasets
• Cityscapes, NYU Indoor RGBD, ADE20K, Helen Face
Baseline
• pix2pix, CRN
Experimental Results
Slide 19
Quantitative Comparisons
• Semantic segmentation score: run PSPNet on the generated images and compare the predicted labels against the ground truth
Experimental Results
<Different Methods>
<Different Generators>
<Different Discriminators>
Slide 20
Human Perceptual Study
• A/B tests deployed on Amazon Mechanical Turk
• Unlimited-time setting
• Limited-time setting : from 1/8 second to 8 seconds
Experimental Results
<Preference Rates>