ARTIFICIAL
INTELLIGENCE
RESEARCH
INSTITUTE
High-Resolution Image Synthesis and
Semantic Manipulation with
Conditional GANs
Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu,
Andrew Tao, Jan Kautz, Bryan Catanzaro
Artificial Intelligence Research Institute
이광희
Goal
• Synthesize high-resolution (e.g. 2048x1024) photo-realistic images from semantic label maps
https://github.com/NVIDIA/pix2pixHD
Goal
• Interactive visual manipulation (removing/adding objects, changing the object category)
• Generate diverse results from the same input, allowing users to edit object appearance interactively
[Figures: editing interface; interactive editing results]
Related Work – Pix2Pix [21]
Image-to-Image Translation with Conditional Adversarial Networks (CVPR 2017)
cGAN: {x, z} → y
x: observed image (condition)
z: random noise vector
y: generated output
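For reference, the mapping above is usually trained with the conditional GAN objective as written for pix2pix [21]. The block below is a restatement in standard notation rather than something taken from the slide; in the objective, y denotes the real target image paired with x, and λ is the weight on the L1 term.

```latex
\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\big[\log D(x, y)\big] + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big]

G^{*} = \arg\min_{G}\max_{D}\; \mathcal{L}_{cGAN}(G, D) + \lambda\,\mathcal{L}_{L1}(G),
\qquad
\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\big[\lVert y - G(x, z) \rVert_{1}\big]
```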
Related Work – Cascaded Refinement Networks (CRN) [5]
Photographic Image Synthesis with Cascaded Refinement Networks (ICCV 2017)
• GANs: training instability and optimization issues
• First model that can synthesize HD images
• Proposes a cascade of refinement modules
• Direct regression objective with a perceptual loss
• Weakness: results lack fine details and realistic textures
pix2pixHD
Conditional GAN Framework
• From semantic label map to neural photo
• pix2pix [21]: training is unstable and the output quality is unsatisfactory
Improving Photorealism and Resolution
<Coarse-to-fine Generator>
G1: Global Generator, built on the architecture of "Perceptual Losses for Real-Time Style Transfer and Super-Resolution" (ECCV 2016) [22]
G2: Local Enhancer Generator
G1 input: 1024x512
G1 output: 1024x512
G2 input: 2048x1024
G2 output: 2048x1024
G2's residual blocks take the element-wise sum of two feature maps: the output of G2's front end and the last feature map of G1.
Training:
1. Train the global generator
2. Train the local enhancer
3. Jointly fine-tune all the networks together
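The data flow through the two generators can be sketched with toy stand-ins. This is only an illustration of the resolutions and the element-wise sum, assuming (as I read the paper) that G1 receives a 2x-downsampled copy of the label map; the functions `g1_features`, `g2_front_end` and `g2_back_end` are hypothetical placeholders, not the real convolutional networks.

```python
# Toy sketch of the coarse-to-fine data flow; images are 2-D lists of floats.
# The "networks" below are stand-in functions, not the real generators.

def downsample2x(img):
    """Average-pool a 2-D grid by a factor of 2 (height and width must be even)."""
    h, w = len(img), len(img[0])
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]

def g1_features(low_res):
    """Stand-in for the last feature map of G1 (same size as its low-res input)."""
    return [[2.0 * v for v in row] for row in low_res]

def g2_front_end(high_res):
    """Stand-in for G2's downsampling front end (halves the resolution)."""
    return downsample2x(high_res)

def g2_back_end(features):
    """Stand-in for G2's residual blocks and upsampling (nearest-neighbour 2x)."""
    out = []
    for row in features:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def coarse_to_fine(label_map):
    low_res = downsample2x(label_map)          # G1 works at half resolution
    f_global = g1_features(low_res)            # last feature map of G1
    f_local = g2_front_end(label_map)          # G2 front-end feature map
    fused = [[a + b for a, b in zip(ra, rb)]   # element-wise sum of the two maps
             for ra, rb in zip(f_global, f_local)]
    return g2_back_end(fused)                  # back to full resolution

# An 8x4 grid stands in for the 2048x1024 label map.
label_map = [[float((r + c) % 3) for c in range(8)] for r in range(4)]
output = coarse_to_fine(label_map)
```

The output has the same size as the full-resolution input, which mirrors the slide: G1 works at 1024x512 and G2 at 2048x1024.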
Semantic Label Map vs. Instance Map
<Input Image> <Semantic Label Map> <Instance Label Map>
A semantic label map cannot distinguish between objects of the same class.
An instance label map contains a unique ID for each individual object.
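Using Instance Maps
concat: the instance boundary map is concatenated channel-wise with the semantic label map before being fed to the generator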
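A minimal sketch of how an instance boundary map could be computed and concatenated, assuming the rule described in the paper (a pixel is marked 1 if its object ID differs from any of its 4-neighbours). The helper name `boundary_map` and the tiny example grid are illustrative only, and the semantic map is kept as a single ID channel here rather than a one-hot encoding.

```python
def boundary_map(instance_ids):
    """Return a 0/1 grid marking pixels whose ID differs from any 4-neighbour."""
    h, w = len(instance_ids), len(instance_ids[0])
    edges = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < h and 0 <= nc < w
                if inside and instance_ids[nr][nc] != instance_ids[r][c]:
                    edges[r][c] = 1
                    break
    return edges

# Two objects of the same class side by side: identical semantic labels,
# different instance IDs (1 and 2).
semantic_map = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
instance_ids = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
]
edges = boundary_map(instance_ids)

# Channel-wise concatenation: label channel followed by boundary channel.
generator_input = [semantic_map, edges]
```

The semantic map alone is uniform here, so only the boundary channel tells the generator where one object ends and the next begins.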
Improving Photorealism and Resolution
<Multi-scale Discriminator>
To differentiate high-resolution real and synthesized images, the discriminator needs a large receptive field. Two options:
1. A deeper network
2. Larger convolutional kernels
Both increase network capacity and the risk of overfitting.
Multi-scale discriminators:
3 discriminators with an identical network structure, operating at different image scales
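The multi-scale setup can be sketched as an image pyramid with one discriminator per level. This assumes the scheme from the paper (downsampling by factors of 2 and 4 to get three scales); `toy_discriminator` is a hypothetical stand-in, and in the real model each scale has its own weights even though the structure is identical.

```python
def downsample2x(img):
    """Average-pool a 2-D grid by a factor of 2 (height and width must be even)."""
    h, w = len(img), len(img[0])
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]

def image_pyramid(img, num_scales=3):
    """Return [full, 1/2, 1/4, ...] resolution copies of the image."""
    levels = [img]
    for _ in range(num_scales - 1):
        levels.append(downsample2x(levels[-1]))
    return levels

def toy_discriminator(img):
    """Stand-in for one discriminator: returns the mean pixel value as a score."""
    total = sum(sum(row) for row in img)
    return total / (len(img) * len(img[0]))

image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
pyramid = image_pyramid(image)
sizes = [(len(level), len(level[0])) for level in pyramid]
scores = [toy_discriminator(level) for level in pyramid]  # one score per scale
```

The discriminator at the coarsest scale sees the whole image through a small grid, which gives it a large effective receptive field without a deeper network or larger kernels.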
Improving Photorealism and Resolution
<Improved Adversarial Loss>
Improve the GAN loss by incorporating a feature matching loss based on the discriminator.
i-th-layer feature extractor of the discriminator (used by the feature matching loss)
Adding a VGG perceptual loss gives a slight further improvement.
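A minimal sketch of a feature matching loss for a single discriminator, assuming the form used in the paper: for each layer, take the L1 distance between the features of the real image and of the synthesized image, normalize by the number of elements in that layer, and sum over layers. The feature values below are made up for illustration.

```python
def l1_mean(a, b):
    """Mean absolute difference between two equally sized feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def feature_matching_loss(real_feats, fake_feats):
    """Sum over layers of the mean L1 distance between real and synthesized features.

    real_feats / fake_feats: one flat list of activations per discriminator layer.
    """
    return sum(l1_mean(r, f) for r, f in zip(real_feats, fake_feats))

# Hypothetical activations from two discriminator layers.
real_feats = [[1.0, 2.0], [0.5, 0.5, 1.0, 1.0]]
fake_feats = [[0.0, 2.0], [0.5, 1.5, 1.0, 0.0]]
loss_fm = feature_matching_loss(real_feats, fake_feats)
```

With several discriminators, the same quantity would be computed per discriminator and summed.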
Learning an Instance-level Feature Embedding
To generate diverse images and allow instance-level control, add low-dimensional feature channels as an additional input to the generator.
Training time:
1. Train the discriminator, generator, and feature encoder jointly.
2. Record the features of every instance in the training data.
3. Run k-means clustering on the features within each semantic category.
Inference time:
1. For each object instance, randomly pick one of the cluster centers and use it as the encoded feature.
2. When editing, let the user choose one of the K modes to select a different style.
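The inference-time steps above can be sketched as follows. The cluster centers, instance IDs, and category names are hypothetical values chosen for illustration; in practice the centers would come from k-means over the recorded encoder features.

```python
import random

def sample_instance_features(instances, cluster_centers, rng):
    """Pick one cluster center at random for every object instance.

    instances: list of (instance_id, category) pairs.
    cluster_centers: dict mapping category -> list of feature vectors.
    """
    return {inst_id: rng.choice(cluster_centers[category])
            for inst_id, category in instances}

# Hypothetical 3-dimensional centers, two per category for brevity.
cluster_centers = {
    "car": [(0.1, 0.2, 0.3), (0.9, 0.8, 0.7)],
    "road": [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)],
}
instances = [(101, "car"), (102, "car"), (7, "road")]
features = sample_instance_features(instances, cluster_centers, random.Random(0))

# Editing: the user overrides the random pick with a chosen mode for one instance.
features[101] = cluster_centers["car"][1]
```

Each instance ends up with one feature vector, which would then be broadcast over that instance's pixels and passed to the generator alongside the label map.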
Experimental Results
• Implementation details
  • LSGAN
  • λ = 10 (weight of the feature matching loss)
  • K = 10 for K-means
  • 3-dimensional vectors to encode features
  • Ours: GAN loss + feature matching loss + VGG perceptual loss
  • Ours (w/o VGG loss): GAN loss + feature matching loss
• Datasets
  • Cityscapes, NYU Indoor RGBD, ADE20K, Helen Face
• Baselines
  • pix2pix, CRN
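The loss configuration listed above can be sketched as follows, using the standard least-squares GAN formulation. The parameter names are my own, and weighting the VGG term with the same value of 10 is an assumption based on my reading of the paper, not something stated on the slide.

```python
def lsgan_d_loss(d_real, d_fake):
    """Least-squares discriminator loss: push real scores to 1 and fake scores to 0."""
    real_term = sum((s - 1.0) ** 2 for s in d_real) / len(d_real)
    fake_term = sum(s ** 2 for s in d_fake) / len(d_fake)
    return 0.5 * (real_term + fake_term)

def lsgan_g_loss(d_fake):
    """Least-squares generator loss: push scores of synthesized images to 1."""
    return 0.5 * sum((s - 1.0) ** 2 for s in d_fake) / len(d_fake)

def generator_objective(gan_loss, fm_loss, vgg_loss=0.0, lam_fm=10.0, lam_vgg=10.0):
    """GAN loss + weighted feature matching loss (+ optional VGG perceptual loss).

    lam_vgg = 10 is an assumption; pass vgg_loss=0.0 for the "w/o VGG loss" variant.
    """
    return gan_loss + lam_fm * fm_loss + lam_vgg * vgg_loss

# Hypothetical discriminator scores for a batch of two images.
d_real = [0.9, 1.1]
d_fake = [0.2, -0.2]
d_loss = lsgan_d_loss(d_real, d_fake)
g_loss = lsgan_g_loss(d_fake)
total_without_vgg = generator_objective(g_loss, fm_loss=0.3)
```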
Experimental Results
• Quantitative comparisons
  • Run PSPNet on the generated images and compare the predicted labels with the ground truth
<Different Methods>
<Different Generators>
<Different Discriminators>
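The comparison between predicted and ground-truth label maps is typically summarized with pixel accuracy and mean intersection-over-union. The sketch below assumes those two metrics and uses tiny made-up label maps; the prediction stands in for the output of a segmentation network such as PSPNet on a synthesized image.

```python
def _pairs(pred, truth):
    """Flatten two label maps into a list of (predicted, true) label pairs."""
    return [(p, t) for pr, tr in zip(pred, truth) for p, t in zip(pr, tr)]

def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label equals the ground-truth label."""
    pairs = _pairs(pred, truth)
    return sum(p == t for p, t in pairs) / len(pairs)

def mean_iou(pred, truth):
    """Average intersection-over-union across all classes present in either map."""
    pairs = _pairs(pred, truth)
    classes = sorted({p for p, _ in pairs} | {t for _, t in pairs})
    ious = []
    for k in classes:
        inter = sum(p == k and t == k for p, t in pairs)
        union = sum(p == k or t == k for p, t in pairs)
        ious.append(inter / union)
    return sum(ious) / len(ious)

truth = [[0, 0, 1],
         [0, 1, 1]]
pred = [[0, 0, 1],
        [1, 1, 1]]   # hypothetical segmentation of a synthesized image
acc = pixel_accuracy(pred, truth)
miou = mean_iou(pred, truth)
```

A synthesized image that preserves the input layout well should score close to a real image under both metrics.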
Experimental Results
• Human perceptual study
  • A/B tests deployed on Amazon Mechanical Turk
  • Unlimited time
  • Limited time: 1/8 second to 8 seconds
<Preference Rates>
Experimental Results
• NYU dataset
• ADE20K dataset
• Diverse results on the Helen Face dataset
Thank you

Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 

PR-065 : High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs

  • 1. ARTIFICIAL INTELLIGENCE RESEARCH INSTITUTE. High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs. Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro. Presented by 이광희, Artificial Intelligence Research Institute (인공지능연구원).
  • 2. Goal: High-resolution (e.g. 2048x1024) photo-realistic images from a semantic label map. https://github.com/NVIDIA/pix2pixHD
  • 3. Goal: Interactive visual manipulation (removing/adding objects, changing the object category). Generate diverse results given the same input, allowing users to edit the object appearance interactively. Figures: Interactive editing results; Editing interface. https://github.com/NVIDIA/pix2pixHD
  • 4. Related Work – Pix2Pix [21]: Image-to-Image Translation with Conditional Adversarial Networks (CVPR 2017). cGAN: {x, z} → y, where x: observed image (condition), z: random noise vector, y: generated output.
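For reference, the mapping {x, z} → y on slide 4 corresponds to the conditional GAN objective as it is usually written for pix2pix. The slide text does not reproduce the formula, so the block below is a restatement with the slide's symbols; the L1 term and its weight λ belong to pix2pix and are distinct from the λ listed later for pix2pixHD.

```latex
\mathcal{L}_{cGAN}(G, D) =
  \mathbb{E}_{x,y}\big[\log D(x, y)\big]
  + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big]

G^{*} = \arg\min_{G}\max_{D}\; \mathcal{L}_{cGAN}(G, D) + \lambda\,\mathcal{L}_{L1}(G),
\qquad
\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\big[\lVert y - G(x, z) \rVert_{1}\big]
```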
  • 5. Related Work – Cascaded Refinement Networks [5]: Photographic Image Synthesis with Cascaded Refinement Networks (ICCV 2017). • GAN: training instability and optimization issues • First model that can synthesize HD images • Proposes a cascade of refinement modules • Direct regression objective with perceptual loss • Weakness: lacks fine details and realistic textures. pix2pixHD
  • 6. Conditional GAN Framework: From semantic label map to neural photo. Pix2Pix [21]: training is unstable and the quality is unsatisfactory.
  • 11. Improving Photorealism and Resolution <Coarse-to-fine Generator>. Reference: Perceptual losses for real-time style transfer and super-resolution (ECCV 2016) [22]. G1: Global Generator (input 1024x512, output 1024x512). G2: Local Enhancer Generator (input 2048x1024, output 2048x1024). The two are joined by an element-wise sum of two feature maps. Training: 1. Train the global generator. 2. Train the local enhancer. 3. Jointly fine-tune all the networks together.
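The data flow on slide 11 can be sketched in a few lines of plain Python. This is a toy illustration of the shapes only, not the NVIDIA implementation: average pooling and nearest-neighbour upsampling stand in for the strided and transposed convolutions, `global_generator_features` is a hypothetical placeholder for G1's last feature map, and an 8x4 grid stands in for the 2048x1024 input.

```python
def downsample2(x):
    """Average-pool a 2D list by a factor of 2 (stand-in for a stride-2 conv)."""
    h, w = len(x), len(x[0])
    return [[(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]


def upsample2(x):
    """Nearest-neighbour upsampling by 2 (stand-in for a transposed conv)."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def elementwise_sum(a, b):
    """Add two feature maps of identical size, entry by entry."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def global_generator_features(x_half):
    # Hypothetical placeholder for G1's last feature map (same size as its input).
    return [[2.0 * v for v in row] for row in x_half]


def coarse_to_fine(x_full):
    x_half = downsample2(x_full)                  # G1 works at half resolution
    g1_feat = global_generator_features(x_half)   # G1 feature map
    g2_feat = downsample2(x_full)                 # G2 front end: full -> half
    fused = elementwise_sum(g1_feat, g2_feat)     # element-wise sum of two feature maps
    return upsample2(fused)                       # G2 back end: half -> full


# Toy "label map": 4 rows by 8 columns instead of 1024 rows by 2048 columns.
x = [[float(i + j) for j in range(8)] for i in range(4)]
y = coarse_to_fine(x)
print(len(y), len(y[0]))
```

The output has the same size as the full-resolution input, which is the property the G2 input/output sizes on the slide describe.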
  • 12. Semantic Label Map vs Instance Map. <Input Image> <Semantic Label Map> <Instance Label Map>. A semantic label map cannot distinguish between objects of the same class. An instance label map contains a unique ID for each individual object.
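A small example may make the distinction on slide 12 concrete. Two adjacent cars share one semantic label but have different instance IDs. The boundary rule used here (a pixel is 1 if any 4-neighbour has a different object ID) follows the description in the pix2pixHD paper rather than this slide, and the label values are made up for illustration.

```python
def boundary_map(ids):
    """Mark pixels whose ID differs from at least one of their 4-neighbours."""
    h, w = len(ids), len(ids[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w and ids[ni][nj] != ids[i][j]:
                    out[i][j] = 1
                    break
    return out


# Two cars side by side: same class (1), different instance IDs (7 and 8).
semantic = [[1, 1, 1, 1],
            [1, 1, 1, 1]]
instance = [[7, 7, 8, 8],
            [7, 7, 8, 8]]

print(boundary_map(semantic))  # no edge: the two cars merge into one region
print(boundary_map(instance))  # an edge appears where the two cars touch
```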
  • 14. Improving Photorealism and Resolution <Multi-scale Discriminator>. To differentiate high-resolution real and synthesized images, the discriminator needs to have a large receptive field. Options: 1. A deeper network 2. Larger convolutional kernels. Both increase network capacity and can cause overfitting. Multi-scale discriminators: 3 discriminators that have an identical network structure but operate at different image scales.
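The multi-scale setup on slide 14 can be sketched as an image pyramid with one discriminator per level. In the paper the real and synthesized images are downsampled by factors of 2 and 4 to give 3 scales. The sketch below assumes average pooling for the downsampling, and `toy_discriminator` is a hypothetical stand-in for the real convolutional discriminators.

```python
def downsample2(img):
    """Average-pool a 2D list by a factor of 2."""
    h, w = len(img), len(img[0])
    return [[(img[i][j] + img[i][j + 1] + img[i + 1][j] + img[i + 1][j + 1]) / 4.0
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]


def image_pyramid(img, num_scales=3):
    """Return [full, 1/2, 1/4, ...] resolution copies of img."""
    pyramid = [img]
    for _ in range(num_scales - 1):
        pyramid.append(downsample2(pyramid[-1]))
    return pyramid


def toy_discriminator(img):
    # Hypothetical stand-in: the same function is applied at every scale,
    # mirroring "identical network structure". A real D_k is a conv net.
    values = [v for row in img for v in row]
    return sum(values) / len(values)


img = [[float(i + j) for j in range(8)] for i in range(4)]
pyramid = image_pyramid(img, num_scales=3)
shapes = [(len(level), len(level[0])) for level in pyramid]
scores = [toy_discriminator(level) for level in pyramid]
print(shapes)
```

The discriminator at the coarsest level sees the largest part of the scene per pixel, which is how the design obtains a large receptive field without a deeper network or larger kernels.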
  • 15. Improving Photorealism and Resolution <Improved Adversarial Loss>. Improve the GAN loss by incorporating a feature matching loss based on the discriminator, whose i-th layer serves as the feature extractor. Adding a VGG perceptual loss gives a slight further improvement in performance.
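The equations on slide 15 did not survive in the transcript. As given in the pix2pixHD paper, the feature matching loss and the full objective read as follows, where s is the semantic label map, x the real image, D_k^{(i)} the i-th layer feature extractor of discriminator D_k, T the number of layers, and N_i the number of elements in layer i.

```latex
\mathcal{L}_{FM}(G, D_k) =
  \mathbb{E}_{(s,x)} \sum_{i=1}^{T} \frac{1}{N_i}
  \Big[ \big\lVert D_k^{(i)}(s, x) - D_k^{(i)}(s, G(s)) \big\rVert_{1} \Big]

\min_{G} \Bigg( \Big( \max_{D_1, D_2, D_3} \sum_{k=1,2,3} \mathcal{L}_{GAN}(G, D_k) \Big)
  + \lambda \sum_{k=1,2,3} \mathcal{L}_{FM}(G, D_k) \Bigg)
```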
  • 16. Learning an Instance-level Feature Embedding. To generate diverse images and allow instance-level control: add low-dimensional feature channels as additional input to the generator. Training time: 1. Train the discriminator, generator, and feature encoder jointly. 2. Record the features of every instance in the training data. 3. Run k-means clustering on the features belonging to each semantic category. Inference time: 1. For each object instance, randomly pick one of the cluster centers and use it as the encoded feature. 2. When editing, let the user pick one of the k modes to select a different style.
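The inference-time step on slide 16 can be sketched as follows. It assumes the per-category cluster centers have already been computed by k-means; the `centers` values, the label values, and the function name `sample_feature_map` are all made up for illustration. Every pixel of an instance receives the same randomly chosen center of its category.

```python
import random


def sample_feature_map(semantic, instance, centers, rng):
    """Paint each instance with one randomly chosen cluster center of its class."""
    h, w = len(instance), len(instance[0])
    chosen = {}  # instance ID -> feature vector picked for that instance
    feat = [[None] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            inst_id = instance[i][j]
            if inst_id not in chosen:
                chosen[inst_id] = rng.choice(centers[semantic[i][j]])
            feat[i][j] = chosen[inst_id]
    return feat


# Two cars (class 1) with instance IDs 7 and 8.
semantic = [[1, 1, 1, 1],
            [1, 1, 1, 1]]
instance = [[7, 7, 8, 8],
            [7, 7, 8, 8]]
# Hypothetical k-means output: 2 centers of 3-dimensional features for class 1.
centers = {1: [(0.1, 0.2, 0.3), (0.9, 0.8, 0.7)]}

feat = sample_feature_map(semantic, instance, centers, random.Random(0))
print(feat[0][0], feat[0][2])
```

For interactive editing, the random choice would be replaced by the index the user selects among the k modes.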
  • 17. Learning an Instance-level Feature Embedding
  • 18. Experimental Results. Implementation details: • LSGAN • λ = 10 • K = 10 for K-means • 3-dimensional vectors to encode features • Ours: GAN loss + Feature Matching Loss + VGG Perceptual Loss • Ours (w/o VGG loss): GAN loss + Feature Matching Loss. Datasets: • Cityscapes, NYU Indoor RGBD, ADE20K, Helen Face. Baselines: • pix2pix, CRN
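How the loss terms on slide 18 fit together can be sketched as below. The least-squares terms follow the standard LSGAN formulation with targets 1 for real and 0 for fake, including the 1/2 factor from the LSGAN paper, which an actual implementation may omit. Applying the same λ to the VGG term is an assumption of this sketch, and the function names are illustrative.

```python
def lsgan_d_loss(d_real, d_fake):
    """Least-squares discriminator loss: push real scores to 1, fake scores to 0."""
    real_term = sum((v - 1.0) ** 2 for v in d_real) / len(d_real)
    fake_term = sum(v ** 2 for v in d_fake) / len(d_fake)
    return 0.5 * (real_term + fake_term)


def lsgan_g_loss(d_fake):
    """Least-squares generator loss: push fake scores towards 1."""
    return 0.5 * sum((v - 1.0) ** 2 for v in d_fake) / len(d_fake)


def generator_objective(gan_loss, fm_loss, vgg_loss=None, lam=10.0):
    """'Ours' when vgg_loss is given, 'Ours (w/o VGG loss)' otherwise."""
    total = gan_loss + lam * fm_loss
    if vgg_loss is not None:
        total += lam * vgg_loss  # assumption: same weight as feature matching
    return total


# Toy discriminator scores for a batch of two fake images.
g_gan = lsgan_g_loss([0.0, 1.0])
print(generator_objective(g_gan, fm_loss=0.5))
print(generator_objective(g_gan, fm_loss=0.5, vgg_loss=0.1))
```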
  • 19. Experimental Results. Quantitative Comparisons: • Ground-truth labels vs. PSPNet segmentation of the generated image. Tables: <Different Methods> <Different Generators> <Different Discriminators>
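The comparison on slide 19 amounts to scoring a predicted label map against the ground-truth one. The paper reports pixel accuracy and mean intersection-over-union for this; the sketch below computes both on a made-up 2x4 example, with `pred` standing in for the PSPNet output on a generated image.

```python
def pixel_accuracy(pred, gt):
    """Fraction of pixels whose predicted label matches the ground truth."""
    pairs = [(p, g) for rp, rg in zip(pred, gt) for p, g in zip(rp, rg)]
    return sum(1 for p, g in pairs if p == g) / len(pairs)


def mean_iou(pred, gt, num_classes):
    """Average intersection-over-union over classes present in pred or gt."""
    pairs = [(p, g) for rp, rg in zip(pred, gt) for p, g in zip(rp, rg)]
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in pairs if p == c and g == c)
        union = sum(1 for p, g in pairs if p == c or g == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)


gt = [[0, 0, 1, 1],
      [0, 0, 1, 1]]
pred = [[0, 0, 1, 1],
        [0, 1, 1, 1]]  # one pixel mislabelled

print(pixel_accuracy(pred, gt))
print(mean_iou(pred, gt, num_classes=2))
```

A generated image that a segmentation network can parse back into the input labels is taken as evidence that the synthesized content is realistic and consistent with the label map.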
  • 20. Experimental Results. Human Perceptual Study: • A/B tests deployed on Amazon Mechanical Turk • Unlimited time • Limited time: 1/8 second to 8 seconds. <Preference Rates>
  • 25. Experimental Results: Diverse Results on the Helen Face dataset