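ARTIFICIAL INTELLIGENCE RESEARCH INSTITUTE
Wednesday Seminar
Semantic Image Synthesis with Spatially-Adaptive Normalization (GauGAN, SPADE)
AI Research Institute, 정정민
arXiv: 2019.03.18
2019.06.12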

Intro Video
[Video panels: label, style]

Contents
• Introduction
• Related Work
• Proposed Method: SPADE
• Network Architecture
• Experiments
• Conclusion
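
Introduction
• Background and problem
  ▪ Image-to-image translation networks commonly stack conv blocks (conv → normalize → activation)
  ▪ The normalization layer in these blocks causes information in the input image to be “lost”
    ➢ Normalization layers tend to “wash away” the information in the input image, especially when the input is a semantic mask with large uniform regions
• Goal and contributions
  ▪ Semantic (segmentation mask) image → photorealistic image
  ▪ Propose a new normalization method that does not “lose” the information in the input image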
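A minimal pure-Python sketch of the “wash away” effect, assuming a mask that contains a single label everywhere, a hypothetical 1×1 convolution with arbitrary weights, and instance normalization without a learned affine step. Whatever the label, the convolved map is constant, so normalization sends it to (numerically) zero and the two masks become indistinguishable.

```python
def one_hot(label_map, num_classes):
    """Turn an HxW grid of integer labels into num_classes binary HxW channels."""
    return [[[1.0 if v == c else 0.0 for v in row] for row in label_map]
            for c in range(num_classes)]


def conv1x1(channels, weights):
    """Single-output 1x1 convolution: a weighted sum across input channels."""
    h, w = len(channels[0]), len(channels[0][0])
    return [[sum(wc * ch[i][j] for wc, ch in zip(weights, channels))
             for j in range(w)] for i in range(h)]


def instance_norm(channel, eps=1e-5):
    """Normalize one HxW channel to zero mean / unit variance (no affine step)."""
    vals = [v for row in channel for v in row]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return [[(v - mean) / (var + eps) ** 0.5 for v in row] for row in channel]


# Hypothetical 3-class setup: 0 = sky, 1 = grass, 2 = road; weights are arbitrary.
weights = [0.7, -1.3, 2.0]
sky_mask = [[0] * 4 for _ in range(4)]
grass_mask = [[1] * 4 for _ in range(4)]

sky_out = instance_norm(conv1x1(one_hot(sky_mask, 3), weights))
grass_out = instance_norm(conv1x1(one_hot(grass_mask, 3), weights))

# Both outputs are essentially zero (only floating-point residue remains),
# so the network can no longer tell "sky" from "grass".
print(max(abs(v) for row in sky_out + grass_out for v in row))
```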

Related Work
• Deep Generative Model
• Conditional Image Synthesis
• Unconditional Normalization Layers
• Conditional Normalization Layers

Related Work - Deep Generative Model
• Built on the GAN architecture
  ▪ Generator, Discriminator

Related Work – Conditional Image Synthesis
• Generating an image that satisfies a given condition
  ▪ Given category labels
  ▪ Given text
  ▪ Given an image
    ➢ Image-to-image translation
    ➢ Semantic mask image → photorealistic image

Related Work – Unconditional Normalization Layers
• Normalize using only the activations propagated through the layer itself, without depending on external data
  ▪ Batch Norm
  ▪ Weight Norm
  ▪ Layer Norm
  ▪ Instance Norm
  ▪ Group Norm
  ▪ etc.
Image source: http://mlexplained.com/2018/11/30/an-overview-of-normalization-methods-in-deep-learning/
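The activation-normalizing layers above differ mainly in which axes the mean and variance are computed over. The sketch below illustrates this for Batch, Layer, and Instance Norm on an N×C×H×W tensor stored as nested lists. The function and constant names are our own, the learnable affine step is omitted, and Group Norm (statistics over groups of channels) and Weight Norm (which reparameterizes weights rather than normalizing activations) are not covered.

```python
from itertools import product


def normalize(x, reduce_axes, eps=1e-5):
    """Zero-mean / unit-variance normalization of a 4-D nested list x[N][C][H][W].

    Statistics are computed over the axes in reduce_axes; a separate mean and
    variance is kept for every index combination of the remaining axes.
    """
    shape = (len(x), len(x[0]), len(x[0][0]), len(x[0][0][0]))
    keep = [a for a in range(4) if a not in reduce_axes]

    # Group every element index by the axes we do NOT reduce over.
    groups = {}
    for idx in product(*(range(s) for s in shape)):
        groups.setdefault(tuple(idx[a] for a in keep), []).append(idx)

    out = [[[[0.0] * shape[3] for _ in range(shape[2])]
            for _ in range(shape[1])] for _ in range(shape[0])]
    for members in groups.values():
        vals = [x[n][c][h][w] for n, c, h, w in members]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        for (n, c, h, w), v in zip(members, vals):
            out[n][c][h][w] = (v - mean) / (var + eps) ** 0.5
    return out


BATCH_NORM = (0, 2, 3)     # per channel, over batch and spatial positions
LAYER_NORM = (1, 2, 3)     # per sample, over channels and spatial positions
INSTANCE_NORM = (2, 3)     # per sample and channel, over spatial positions only

# Toy batch: N=2 samples, C=2 channels, 2x2 spatial.
x = [[[[1.0, 2.0], [3.0, 4.0]], [[10.0, 10.0], [10.0, 30.0]]],
     [[[0.0, 0.0], [0.0, 8.0]], [[5.0, 6.0], [7.0, 8.0]]]]
y = normalize(x, INSTANCE_NORM)
```

Only the grouping changes between variants, which is why the same helper covers all three.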

Related Work – Conditional Normalization Layers
• Depend on external data
  ▪ Mainly used in image synthesis
  ▪ Conditional Batch Norm
  ▪ Adaptive Instance Norm (AdaIN)
• Method
  1. Normalize (zero mean, unit SD) using the internal data (the layer's own activations)
  2. De-normalize using a mean and SD derived from the external data
    ➢ An affine transformation whose parameters are inferred from external data
[Figure: normalize using data A, then de-normalize using data B]
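As a concrete instance of the two-step method, here is a minimal sketch of AdaIN for a single sample, assuming each channel is given as a flat list of spatial values. Step 1 uses the content features' own per-channel statistics; step 2 uses the style image's per-channel mean and standard deviation. The helper names are illustrative, not from a specific library.

```python
def mean_std(values, eps=1e-5):
    """Mean and (eps-stabilized) standard deviation of a flat list."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, (var + eps) ** 0.5


def adain(content, style, eps=1e-5):
    """AdaIN for one sample: content and style are lists of flattened channels."""
    out = []
    for c_vals, s_vals in zip(content, style):
        c_mean, c_std = mean_std(c_vals, eps)   # step 1: internal statistics
        s_mean, s_std = mean_std(s_vals, eps)   # step 2: external statistics
        out.append([s_std * (v - c_mean) / c_std + s_mean for v in c_vals])
    return out


content = [[1.0, 2.0, 3.0, 4.0]]      # one channel, 2x2 spatial values flattened
style = [[10.0, 20.0, 30.0, 40.0]]
out = adain(content, style)
print(mean_std(out[0], eps=0.0))      # close to the style channel's mean and std
```

Note that the scale and bias are single numbers per channel, so the same shift is applied at every spatial position.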

SPatially-Adaptive DEnormalization (SPADE)
• Instead of deriving scalar (per-channel) mean and variance values from the conditional (mask) image, SPADE derives γ and β as tensors from the mask and applies an element-wise affine transformation
• Significance
  ▪ Each (x, y) position of γ and β carries meaningful information about the corresponding region of the input mask
  ▪ Why does SPADE work better? Because it “better preserves semantic information”
• SPADE generalizes other conditional normalization layers
  ▪ Conditional Batch Norm
    ➢ A class label instead of the mask image
    ➢ All values of the γ tensor are identical, as are all values of the β tensor (spatially invariant)
  ▪ AdaIN
    ➢ A style image instead of the mask image
    ➢ All values of the γ tensor are identical, as are all values of the β tensor (spatially invariant)
    ➢ Mini-batch size set to 1
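A minimal sketch of the SPADE computation, out = γ(m)·(h − μ)/σ + β(m), where γ and β vary with pixel position through the mask m. Two simplifying assumptions, both ours: μ and σ are parameter-free batch-norm-style statistics (per channel, over batch and spatial positions), and γ and β come from a hypothetical per-class lookup table instead of the convolutional network SPADE actually uses to predict them from the mask.

```python
def spade(h, mask, gamma_table, beta_table, eps=1e-5):
    """Spatially-adaptive denormalization on nested lists.

    h:           activations h[N][C][H][W]
    mask:        integer label maps mask[N][H][W] at the same resolution as h
    gamma_table: gamma_table[c][k] = scale for channel c at pixels of class k
    beta_table:  beta_table[c][k]  = bias for channel c at pixels of class k
    (Both tables are a hypothetical stand-in for the conv net that predicts
    gamma and beta from the mask.)
    """
    N, C, H, W = len(h), len(h[0]), len(h[0][0]), len(h[0][0][0])
    out = [[[[0.0] * W for _ in range(H)] for _ in range(C)] for _ in range(N)]
    for c in range(C):
        # Parameter-free, batch-norm-style statistics for channel c.
        vals = [h[n][c][y][x] for n in range(N) for y in range(H) for x in range(W)]
        mu = sum(vals) / len(vals)
        sigma = (sum((v - mu) ** 2 for v in vals) / len(vals) + eps) ** 0.5
        for n in range(N):
            for y in range(H):
                for x in range(W):
                    k = mask[n][y][x]                      # class at this pixel
                    normed = (h[n][c][y][x] - mu) / sigma
                    # gamma and beta vary with (y, x) through the mask.
                    out[n][c][y][x] = gamma_table[c][k] * normed + beta_table[c][k]
    return out


# Constant activations: plain normalization would output zeros everywhere,
# but the spatially varying beta re-injects the layout of the mask.
h = [[[[0.0, 0.0], [0.0, 0.0]]]]          # N=1, C=1, 2x2
mask = [[[0, 1], [0, 1]]]                 # left column class 0, right column class 1
gamma_table = [[1.0, 1.0]]
beta_table = [[-1.0, 2.0]]
print(spade(h, mask, gamma_table, beta_table)[0][0])   # [[-1.0, 2.0], [-1.0, 2.0]]
```

If every class k shared the same γ and β for a channel, the modulation would become spatially invariant, which mirrors the Conditional Batch Norm special case listed on this slide.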

Network Architecture – Overview
• Input: a real image and its segmentation image
• Output: the segmentation image rendered in the style of the real image, i.e., an image that reproduces the original real image
• Components
  ▪ Image Encoder
  ▪ Generator
  ▪ Discriminator
• SPADE is used only in the generator
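
Network Architecture - Image Encoder
• Encodes the real image
• Input: real image
• Output: two 256-dimensional vectors (mean and variance)
  ▪ These are used to sample a latent vector via the reparameterization trick
• Trained so that the output follows a normal distribution
  ▪ Uses a KL-divergence loss
  ▪ Enables multi-modal synthesis
• If training succeeds, the output represents the style distribution of the input image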
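The reparameterization trick and the KL term can be sketched as follows. We assume the encoder's second output is a log-variance rather than a raw variance (a common choice for numerical stability); the function names are illustrative.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """z = mu + sigma * eps with eps ~ N(0, 1) and sigma = exp(0.5 * logvar).

    Writing the sample this way keeps z differentiable w.r.t. mu and logvar.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))


rng = random.Random(0)
mu = [0.0] * 256          # hypothetical encoder outputs (256-dim, as on the slide)
logvar = [0.0] * 256
z = reparameterize(mu, logvar, rng)
print(len(z), kl_to_standard_normal(mu, logvar))
```

The KL loss is zero exactly when the encoder outputs a standard normal, which is what allows sampling z from N(0, I) at test time.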

Network Architecture - Generator
• Styles the mask image starting from a normally distributed random vector (an arbitrary style)
• Input:
  ▪ A sampled 256-dimensional vector
  ▪ Mask image
• Output: the styled mask image (the synthesized image)
• Components
  ▪ SPADE ResBlk
  ▪ Upsample
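Because each SPADE ResBlk runs at a different resolution, the mask has to be resized to match the features at every stage. The sketch below assumes nearest-neighbour interpolation (which never blends label ids) and a hypothetical schedule that doubles the resolution after each upsample; the starting size is illustrative, not the paper's configuration.

```python
def nearest_resize(grid, out_h, out_w):
    """Nearest-neighbour resize of an HxW grid (label map or feature map)."""
    in_h, in_w = len(grid), len(grid[0])
    return [[grid[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def generator_resolutions(start, target):
    """Feature-map sizes visited when doubling from start up to target."""
    sizes = [start]
    while sizes[-1] < target:
        sizes.append(sizes[-1] * 2)
    return sizes


mask = [[0, 0, 1, 1],
        [0, 0, 1, 1],
        [2, 2, 3, 3],
        [2, 2, 3, 3]]
for s in generator_resolutions(2, 4):
    # Each SPADE ResBlk would receive the mask resized to its own resolution.
    print(s, nearest_resize(mask, s, s))
```

The same helper also performs a 2× nearest-neighbour upsample of features, one simple choice for the Upsample step.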

Generator - SPADE, SPADE ResBlk

Network Architecture - Discriminator
• Decides whether its input is ‘real image + real mask image’ or ‘generated image + real mask image’
• Input: concat(image, mask image)
• Output: real / fake (True / False)
• Loss function: hinge loss
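The hinge loss can be sketched on raw discriminator scores. This assumes the standard hinge-GAN formulation (discriminator: mean(relu(1 − D(real))) + mean(relu(1 + D(fake))); generator: −mean(D(fake))); the score values are made up for illustration.

```python
def relu(v):
    return v if v > 0.0 else 0.0


def d_hinge_loss(d_real, d_fake):
    """Discriminator hinge loss on raw scores for real and generated inputs."""
    real_term = sum(relu(1.0 - s) for s in d_real) / len(d_real)
    fake_term = sum(relu(1.0 + s) for s in d_fake) / len(d_fake)
    return real_term + fake_term


def g_hinge_loss(d_fake):
    """Generator loss paired with the hinge discriminator: -mean(D(fake))."""
    return -sum(d_fake) / len(d_fake)


# Hypothetical raw discriminator scores for concat(image, mask) inputs.
d_real = [2.0, 0.5]
d_fake = [-2.0, 0.0]
print(d_hinge_loss(d_real, d_fake), g_hinge_loss(d_fake))   # 0.75 1.0
```

Scores beyond the margin (real above +1, fake below −1) contribute nothing to the discriminator loss, which is what distinguishes hinge loss from the original log-loss GAN objective.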

Network Architecture – Overview

Results
[Figure: results on COCO-Stuff, ADE20K, and Cityscapes]

Experiments
• Comparison of SPADE with other baselines
  ▪ Quantitative comparison
  ▪ Qualitative comparison
• Effectiveness of SPADE
• Multi-modal synthesis

Dataset
• COCO-Stuff
  ▪ 118,000 train / 5,000 validation / 182 semantic classes
• ADE20K
  ▪ 20,210 train / 2,000 validation / 150 semantic classes
• ADE20K-outdoor (outdoor-scene subset of ADE20K)
• Cityscapes
  ▪ 3,000 train / 500 validation / 19 semantic classes
• Flickr Landscapes
  ▪ 41,000 train / 1,000 validation / 130 semantic classes

Dataset examples
[Figure: examples from COCO-Stuff, ADE20K, Cityscapes, and Flickr Landscapes]
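
Performance Metrics
• Segmentation accuracy
  ▪ A pretrained semantic segmentation network is run on each generated image; a well-generated image should yield a segmentation map close to the original input map
  ▪ mIoU (mean Intersection over Union)
  ▪ Pixel accuracy
• Fréchet Inception Distance (FID)
  ▪ Distance between the distribution of generated images and that of real images (lower is better)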
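The two segmentation metrics are easy to compute once a segmentation network has produced a label map for the generated image; the sketch below uses tiny made-up label maps. Real FID fits Gaussians to Inception-v3 features and needs a matrix square root, so only its one-dimensional special case is shown, as an illustration of the formula rather than a usable FID implementation.

```python
def pixel_accuracy(pred, gt):
    """Fraction of pixels whose predicted label equals the ground-truth label."""
    pairs = [(p, g) for pr, gr in zip(pred, gt) for p, g in zip(pr, gr)]
    return sum(p == g for p, g in pairs) / len(pairs)


def mean_iou(pred, gt, num_classes):
    """Mean IoU over the classes that appear in pred or gt."""
    pairs = [(p, g) for pr, gr in zip(pred, gt) for p, g in zip(pr, gr)]
    ious = []
    for c in range(num_classes):
        inter = sum(p == c and g == c for p, g in pairs)
        union = sum(p == c or g == c for p, g in pairs)
        if union > 0:               # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)


def fid_1d(mu1, sigma1, mu2, sigma2):
    """Scalar special case of FID between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    The general form is ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2));
    in one dimension the trace term reduces to (sigma1 - sigma2)^2.
    """
    return (mu1 - mu2) ** 2 + (sigma1 - sigma2) ** 2


gt = [[0, 1, 1],
      [1, 1, 2]]
pred = [[0, 0, 1],     # stand-in for a segmentation net's output on a generated image
        [1, 1, 2]]
print(pixel_accuracy(pred, gt), mean_iou(pred, gt, 3), fid_1d(0.0, 1.0, 1.0, 1.0))
```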

Experiments – Comparison of SPADE with other baselines
• Baselines
  ▪ pix2pixHD
  ▪ Cascaded Refinement Network (CRN)
  ▪ Semi-parametric IMage Synthesis model (SIMS)
[Figure: SIMS]

Quantitative comparison
• SPADE performs best on most datasets (higher mIoU and pixel accuracy, lower FID)
• SIMS's external memory exists only for outdoor scenes, so SIMS is compared only on ADE20K-outdoor and Cityscapes
• Because SIMS draws on the general appearance of external labeled data (buildings, cars, roads, etc.), its FID is comparable to or better than SPADE's
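
Qualitative comparison
• Conducted on Amazon Mechanical Turk
• Each worker is shown a segmentation image and two synthesized images (SPADE vs. a baseline)
• “Which synthesized image better matches the given segmentation image?”
• 500 questions per dataset
• In every case, workers preferred SPADE over the baseline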
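
Effectiveness of SPADE
[Panels ① and ②]
• Compares performance with and without SPADE
• pix2pixHD++: a strengthened version of the pix2pixHD baseline
• ①: decoder (generator) variants
  ▪ SPADE conveys semantic meaning better than concatenation
  ▪ Even the compact SPADE model mostly outperforms not only the concat variant but also the other models in ②
• ②: SPADE applied to the pix2pixHD model
  ▪ Adding SPADE further improves the existing model's performance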
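
Multi-modal Synthesis
• Because the output of a well-trained image encoder follows a normal distribution, feeding in an arbitrary point sampled from that normal distribution lets the generator render the segmentation map in many different styles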

Conclusion
• Proposes a new normalization method (SPADE) to address the loss of input-image information that occurs as features pass through normalization layers in image-to-image translation
• Quantitative and qualitative experiments show that it outperforms existing models on the semantic image synthesis task

Thank you
