2021-03-02 SPADE
1.
2. Semantic Segmentation
"Conditional Random Fields Meet Deep Neural Network for Semantic Segmentation"
: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8254255
3. ▪ Proposed the spatially-adaptive normalization
-> An effective layer for synthesizing photorealistic images given an input semantic layout.
▪ Problem with previous methods: normalization layers tend to "wash away" semantic information.
▪ Finally, the model allows user control over both semantic content and style.
Abstract
4. m ∈ L^(H×W)
: Semantic Segmentation Mask
where L is a set of integers denoting the semantic labels, and H and W are the image height and width.
Each entry in m denotes the semantic label of a pixel.
We aim to learn a mapping function that can convert an input segmentation mask m to a
photorealistic image.
Semantic Image Synthesis
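In practice, a mask m ∈ L^(H×W) is fed to the network as a one-hot label map with one channel per label. A minimal pure-Python sketch of that encoding (the label set and sizes here are made up for illustration, not taken from the paper):

```python
# One-hot encode a semantic segmentation mask m in L^(H x W).
# Labels and sizes below are illustrative only.
def one_hot(mask, num_labels):
    """mask: H x W list of integer labels -> num_labels x H x W binary map."""
    H, W = len(mask), len(mask[0])
    return [[[1 if mask[i][j] == c else 0 for j in range(W)]
             for i in range(H)]
            for c in range(num_labels)]

mask = [[0, 0, 1],
        [2, 1, 1]]          # H = 2, W = 3, labels in {0, 1, 2}
onehot = one_hot(mask, num_labels=3)

# Each pixel is 1 in exactly one channel.
assert all(sum(onehot[c][i][j] for c in range(3)) == 1
           for i in range(2) for j in range(3))
```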
10. SPADE generator
(Figure: generator architecture - a noise vector is the input; the segmentation map is downsampled and fed to each SPADE layer.)
With the SPADE, there is no need to feed the segmentation map to the first layer of the generator, since
the learned modulation parameters have encoded enough information about the label layout.
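The modulation described above can be sketched for a single activation channel. In the actual layer, gamma and beta are spatial maps produced by small convnets applied to the (resized) segmentation mask; the per-label lookup tables below are a hypothetical stand-in for those convnets, purely for illustration:

```python
import math

# SPADE-style modulation for one channel of activations x (H x W).
# gamma_of / beta_of are hypothetical per-label lookup tables standing in
# for the small convnets the paper learns from the segmentation mask.
def spade_channel(x, mask, gamma_of, beta_of, eps=1e-5):
    H, W = len(x), len(x[0])
    n = H * W
    mu = sum(v for row in x for v in row) / n
    var = sum((v - mu) ** 2 for row in x for v in row) / n
    std = math.sqrt(var + eps)
    # Normalize (parameter-free), then modulate with spatially-varying
    # gamma/beta looked up from the mask at each pixel.
    return [[gamma_of[mask[i][j]] * (x[i][j] - mu) / std + beta_of[mask[i][j]]
             for j in range(W)] for i in range(H)]

x = [[1.0, 2.0], [3.0, 4.0]]
mask = [[0, 0], [1, 1]]
y = spade_channel(x, mask, gamma_of={0: 1.0, 1: 2.0}, beta_of={0: 0.0, 1: 0.5})
```

The key point is that, unlike a plain normalization layer, the affine parameters vary per pixel according to the mask, so the label layout survives normalization.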
13. ▪ A short answer is that it better preserves semantic information than common
normalization layers.
▪ Specifically, while normalization layers such as InstanceNorm are essential pieces of
almost all state-of-the-art conditional image synthesis models, they tend to wash away
semantic information when applied to uniform or flat segmentation masks.
Why does the SPADE work better?
When a mask in which every pixel has the same label is fed to the generator,
all values become zero after passing through InstanceNorm.
-> The pix2pix generator takes the semantic mask itself as input,
so the mask passes through InstanceNorm.
-> The SPADE generator applies no normalization to the semantic mask;
only the layer activations are normalized.
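The "wash away" effect is easy to demonstrate numerically: instance-normalizing a channel whose values are all equal (a flat mask region) yields all zeros, so the label information is gone. A minimal sketch:

```python
import math

# Instance normalization of one channel: subtract mean, divide by std (+eps).
def instance_norm(x, eps=1e-5):
    n = len(x)
    mu = sum(x) / n
    std = math.sqrt(sum((v - mu) ** 2 for v in x) / n + eps)
    return [(v - mu) / std for v in x]

# A "flat" mask channel: every pixel carries the same label value.
flat = [1.0] * 8
print(instance_norm(flat))  # every output is 0.0 -> the label is washed away
```

Whatever constant the flat region held, the output is identical zeros, which is why feeding the mask through normalization layers (as in pix2pix-style generators) loses the layout.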
15. ▪ By using a random vector as the input of the
generator, our architecture provides a simple
way for multi-modal synthesis.
▪ Namely, one can attach an encoder that
processes a real image into a random vector,
which is then fed to the generator.
▪ The encoder and generator form a VAE.
▪ For training, we add a KL-divergence loss
term.
Multi-modal synthesis
KL divergence measures how different two probability distributions are.
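For discrete distributions, KL divergence is D_KL(p || q) = Σ p_i log(p_i / q_i); in the VAE setup it regularizes the encoder's output distribution toward the prior. A minimal sketch with toy distributions:

```python
import math

# KL divergence D_KL(p || q) between two discrete distributions.
# Terms with p_i = 0 contribute 0 by convention.
def kl_div(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
assert kl_div(p, p) == 0.0   # zero iff the distributions match
assert kl_div(p, q) > 0      # positive when they differ
```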
17. ▪ Apply the Spectral Norm to all the layers in both the Generator and the Discriminator.
▪ Learning rates: Generator - 0.0001, Discriminator - 0.0004
▪ ADAM: β1 = 0, β2 = 0.999
▪ Synchronized BatchNorm - statistics are collected from all the GPUs.
Experiments - Implementation details
18. ▪ If the output images are realistic, a well-trained semantic segmentation model should be
able to predict the ground-truth labels from them.
▪ Cascaded Refinement Network (CRN)
▪ Semiparametric Image Synthesis method (SIMS)
▪ For a fair evaluation:
▪ CRN, pix2pixHD -> used the models implemented by the original authors
▪ SIMS -> took the results reported in its paper
Experiments - Performance metrics
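The segmentation-based evaluation above is typically reported as mean Intersection-over-Union (mIoU): run a pretrained segmentation network on the synthesized image and compare its prediction against the input mask. A minimal sketch of the mIoU computation itself (the flattened label arrays are illustrative):

```python
# Mean Intersection-over-Union between predicted and ground-truth label maps
# (both flattened to 1-D lists of integer labels).
def mean_iou(pred, gt, labels):
    ious = []
    for c in labels:
        inter = sum(p == c and g == c for p, g in zip(pred, gt))
        union = sum(p == c or g == c for p, g in zip(pred, gt))
        if union:  # skip labels absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)

pred = [0, 0, 1, 1, 2, 2]
gt   = [0, 0, 1, 2, 2, 2]
# class 0: 2/2, class 1: 1/2, class 2: 2/3 -> mean = (1 + 0.5 + 2/3) / 3
miou = mean_iou(pred, gt, labels=[0, 1, 2])
```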
20. Experiments - Human evaluation
We use Amazon Mechanical Turk (AMT) to compare the perceived visual fidelity of our method against
existing approaches.
Specifically, we give the AMT workers an input segmentation mask and two synthesis outputs from
different methods and ask them to choose the output image that looks more like a corresponding image of
the segmentation mask.
We randomly generate 500 questions for each dataset.
22. ▪ Proposed the spatially-adaptive normalization, which utilizes the input semantic layout while
performing the affine transformation in the normalization layers.
▪ The proposed normalization leads to the first semantic image synthesis model that can
produce photorealistic outputs for diverse scenes, including indoor, outdoor, landscape, and
street scenes.
▪ We further demonstrate its application to multi-modal synthesis and guided image synthesis.
Conclusion