2021-03-02 SPADE
1.
2. Semantic Segmentation
"Conditional Random Fields Meet Deep Neural Network for Semantic Segmentation"
: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8254255
3. ▪ Proposed the spatially-adaptive normalization
-> An effective layer for synthesizing photorealistic images given an input semantic layout.
▪ Problem with previous methods: normalization layers tend to "wash away" semantic information.
▪ Finally, the model allows user control over both semantic content and style.
Abstract
4. m ∈ L^(H×W)
: Semantic Segmentation Mask
where L is a set of integers denoting the semantic labels, and H and W are the image height and width.
Each entry in m denotes the semantic label of a pixel.
We aim to learn a mapping function that can convert an input segmentation mask m to a
photorealistic image.
Semantic Image Synthesis
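In practice, a mask m ∈ L^(H×W) is fed to the network as a one-hot label map with one channel per label. A minimal pure-Python sketch of that encoding (the label set and sizes here are made up for illustration, not taken from the paper):

```python
# One-hot encode a semantic segmentation mask m in L^(H x W).
# Labels and sizes below are illustrative only.
def one_hot(mask, num_labels):
    """mask: H x W list of integer labels -> num_labels x H x W binary map."""
    H, W = len(mask), len(mask[0])
    return [[[1 if mask[i][j] == c else 0 for j in range(W)]
             for i in range(H)]
            for c in range(num_labels)]

mask = [[0, 0, 1],
        [2, 1, 1]]          # H = 2, W = 3, labels in {0, 1, 2}
onehot = one_hot(mask, num_labels=3)

# Each pixel is 1 in exactly one channel.
assert all(sum(onehot[c][i][j] for c in range(3)) == 1
           for i in range(2) for j in range(3))
```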
10. SPADE generator
(Figure: generator architecture - a noise vector is the input; the segmentation map is downsampled and fed to each SPADE layer.)
With the SPADE, there is no need to feed the segmentation map to the first layer of the generator, since
the learned modulation parameters have encoded enough information about the label layout.
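The modulation described above can be sketched for a single activation channel. In the actual layer, gamma and beta are spatial maps produced by small convnets applied to the (resized) segmentation mask; the per-label lookup tables below are a hypothetical stand-in for those convnets, purely for illustration:

```python
import math

# SPADE-style modulation for one channel of activations x (H x W).
# gamma_of / beta_of are hypothetical per-label lookup tables standing in
# for the small convnets the paper learns from the segmentation mask.
def spade_channel(x, mask, gamma_of, beta_of, eps=1e-5):
    H, W = len(x), len(x[0])
    n = H * W
    mu = sum(v for row in x for v in row) / n
    var = sum((v - mu) ** 2 for row in x for v in row) / n
    std = math.sqrt(var + eps)
    # Normalize (parameter-free), then modulate with spatially-varying
    # gamma/beta looked up from the mask at each pixel.
    return [[gamma_of[mask[i][j]] * (x[i][j] - mu) / std + beta_of[mask[i][j]]
             for j in range(W)] for i in range(H)]

x = [[1.0, 2.0], [3.0, 4.0]]
mask = [[0, 0], [1, 1]]
y = spade_channel(x, mask, gamma_of={0: 1.0, 1: 2.0}, beta_of={0: 0.0, 1: 0.5})
```

The key point is that, unlike a plain normalization layer, the affine parameters vary per pixel according to the mask, so the label layout survives normalization.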
13. ▪ A short answer is that it better preserves semantic information than common
normalization layers.
▪ Specifically, while normalization layers such as InstanceNorm are essential pieces of
almost all state-of-the-art conditional image synthesis models, they tend to wash away
semantic information when applied to uniform or flat segmentation masks.
Why does the SPADE work better?
When a mask in which every pixel has the same label is fed to the generator,
all values become zero after passing through InstanceNorm.
-> The pix2pix generator takes the semantic mask itself as input,
so the mask passes through InstanceNorm.
-> The SPADE generator applies no normalization to the semantic mask;
only the layer activations are normalized.
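The "wash away" effect is easy to demonstrate numerically: instance-normalizing a channel whose values are all equal (a flat mask region) yields all zeros, so the label information is gone. A minimal sketch:

```python
import math

# Instance normalization of one channel: subtract mean, divide by std (+eps).
def instance_norm(x, eps=1e-5):
    n = len(x)
    mu = sum(x) / n
    std = math.sqrt(sum((v - mu) ** 2 for v in x) / n + eps)
    return [(v - mu) / std for v in x]

# A "flat" mask channel: every pixel carries the same label value.
flat = [1.0] * 8
print(instance_norm(flat))  # every output is 0.0 -> the label is washed away
```

Whatever constant the flat region held, the output is identical zeros, which is why feeding the mask through normalization layers (as in pix2pix-style generators) loses the layout.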
15. ▪ By using a random vector as the input of the
generator, our architecture provides a simple
way for multi-modal synthesis.
▪ Namely, one can attach an encoder that
processes a real image into a random vector,
which is then fed to the generator.
▪ The encoder and generator form a VAE.
▪ For training, we add a KL-divergence loss
term.
Multi-modal synthesis
KL divergence measures how different two probability distributions are.
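For discrete distributions, KL divergence is D_KL(p || q) = Σ p_i log(p_i / q_i); in the VAE setup it regularizes the encoder's output distribution toward the prior. A minimal sketch with toy distributions:

```python
import math

# KL divergence D_KL(p || q) between two discrete distributions.
# Terms with p_i = 0 contribute 0 by convention.
def kl_div(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
assert kl_div(p, p) == 0.0   # zero iff the distributions match
assert kl_div(p, q) > 0      # positive when they differ
```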
17. ▪ Apply the Spectral Norm to all the layers in both the Generator and the Discriminator.
▪ Learning rates: Generator - 0.0001, Discriminator - 0.0004
▪ ADAM: β1 = 0, β2 = 0.999
▪ Synchronized BatchNorm - statistics are collected from all the GPUs.
Experiments - Implementation details
18. ▪ If the output images are realistic, a well-trained semantic segmentation model should be
able to predict the ground-truth labels from them.
▪ Cascaded Refinement Network (CRN)
▪ Semiparametric Image Synthesis method (SIMS)
▪ For a fair evaluation:
▪ CRN, pix2pixHD -> used the models implemented by the original authors
▪ SIMS -> took the results reported in its paper
Experiments - Performance metrics
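The segmentation-based evaluation above is typically reported as mean Intersection-over-Union (mIoU): run a pretrained segmentation network on the synthesized image and compare its prediction against the input mask. A minimal sketch of the mIoU computation itself (the flattened label arrays are illustrative):

```python
# Mean Intersection-over-Union between predicted and ground-truth label maps
# (both flattened to 1-D lists of integer labels).
def mean_iou(pred, gt, labels):
    ious = []
    for c in labels:
        inter = sum(p == c and g == c for p, g in zip(pred, gt))
        union = sum(p == c or g == c for p, g in zip(pred, gt))
        if union:  # skip labels absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)

pred = [0, 0, 1, 1, 2, 2]
gt   = [0, 0, 1, 2, 2, 2]
# class 0: 2/2, class 1: 1/2, class 2: 2/3 -> mean = (1 + 0.5 + 2/3) / 3
miou = mean_iou(pred, gt, labels=[0, 1, 2])
```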
20. Experiments - Human evaluation
We use Amazon Mechanical Turk (AMT) to compare the perceived visual fidelity of our method against
existing approaches.
Specifically, we give the AMT workers an input segmentation mask and two synthesis outputs from
different methods and ask them to choose the output image that looks more like a corresponding image of
the segmentation mask.
We randomly generate 500 questions for each dataset.
22. ▪ Proposed the spatially-adaptive normalization, which utilizes the input semantic layout while
performing the affine transformation in the normalization layers.
▪ The proposed normalization leads to the first semantic image synthesis model that can
produce photorealistic outputs for diverse scenes, including indoor, outdoor, landscape, and
street scenes.
▪ We further demonstrate its application to multi-modal synthesis and guided image synthesis.
Conclusion