Encoding in Style: a StyleGAN Encoder
for Image-to-Image Translation
2021. 11. 21
김준철, 고형권, 김상현, 전선영, 조경진, 허다운
Elad Richardson, Yuval Alaluf, Or Patashnik, Yotam Nitzan, Yaniv Azar, Stav Shapiro, Daniel Cohen-Or
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
2
Contents
• Background
• Introduction
• Related Work
• The pSp Framework
• Applications and Experiments
• Discussion
• Conclusion
3
Background: StyleGAN
[Figure: PGGAN, StyleGAN, and StyleGAN2 architectures (G: generator, D: discriminator)]
4
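Background: W and W+ Spaces
• W space: 512 dimensions
A single vector produced from the latent vector z by the mapping network
• W+ space: 18 × 512 dimensions
A separate w vector for each of the generator's 18 style inputs; each one passes through its layer's affine transform before being applied as a style
[Figure: StyleGAN synthesis network]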
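To make the shape difference concrete, here is a minimal sketch in plain Python lists. The helper `sample_w` is hypothetical: it draws Gaussian noise as a stand-in for the mapping-network output, since only the shapes matter here. It shows that a W code embeds in W+ by repeating the same 512-d vector 18 times, whereas an encoder such as pSp is free to predict a different vector for every row.

```python
import random

W_DIM = 512       # size of one w vector
NUM_STYLES = 18   # style inputs of a 1024x1024 StyleGAN generator


def sample_w(dim=W_DIM, seed=None):
    """Hypothetical stand-in for the mapping network output w = f(z).

    A real w comes from StyleGAN's mapping network; here we just draw
    Gaussian noise so that the shapes can be inspected.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def w_to_wplus(w, num_styles=NUM_STYLES):
    """Embed a W code in W+ by repeating it once per style input."""
    return [list(w) for _ in range(num_styles)]


w = sample_w(seed=0)
w_plus = w_to_wplus(w)
print(len(w))                        # 512
print(len(w_plus), len(w_plus[0]))   # 18 512
```

Because each row is an independent copy, editing one style row does not silently change the others.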
5
Introduction: Structure
1. A novel encoder architecture that maps an image directly into W+ space
2. The encoder is built on a Feature Pyramid Network
3. A fixed, pre-trained StyleGAN generator
6
Introduction: Problems with Previous Work
1. The input must be invertible
pSp: a model that can translate even inputs whose features have no corresponding latent code
2. Previous models can solve only a single task
pSp: a generic framework in the spirit of pix2pix
3. An adversarial discriminator must be trained
pSp: a model that needs no discriminator for training
4. Residual feature maps are fed explicitly to the generator, which introduces a locality bias
pSp: passing style vectors instead mitigates the locality bias
7
Related Work
01 GAN Inversion
02 Latent Space Manipulation
03 Image-to-Image
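8
Related Work: GAN Inversion
GAN inversion: having a GAN regenerate an image similar to a given input image
• Previous work
1. Optimizing a latent vector for each single image
2. Learning a mapping from images to the latent space
These approaches perform well but are slow, particularly per-image optimization.
pSp: a model that efficiently converts an image into a W+ vector
• No additional optimization required
• Trained without a discriminator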
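Why is per-image optimization slow? The toy sketch below makes the cost visible. The "generator" is a hypothetical 2×2 linear map standing in for StyleGAN. Inverting one image means running a gradient-descent loop on the reconstruction error, and that whole loop must be repeated for every new image. An encoder-based method such as pSp replaces the loop with a single forward pass.

```python
# Toy fixed "generator": the linear map G(w) = A w with A = [[2, 1], [1, 3]].
# This is a hypothetical stand-in, not StyleGAN.
A = [[2.0, 1.0], [1.0, 3.0]]


def generator(w):
    return [A[0][0] * w[0] + A[0][1] * w[1],
            A[1][0] * w[0] + A[1][1] * w[1]]


def invert_by_optimization(x, steps=500, lr=0.05):
    """Gradient descent on 0.5 * ||G(w) - x||^2, starting from w = 0."""
    w = [0.0, 0.0]
    for _ in range(steps):
        y = generator(w)
        r = [y[0] - x[0], y[1] - x[1]]              # residual G(w) - x
        grad = [A[0][0] * r[0] + A[1][0] * r[1],    # gradient = A^T r
                A[0][1] * r[0] + A[1][1] * r[1]]
        w = [w[0] - lr * grad[0], w[1] - lr * grad[1]]
    return w


target = generator([1.0, -2.0])       # "image" produced by a known latent
w_hat = invert_by_optimization(target)
print([round(v, 4) for v in w_hat])   # [1.0, -2.0]
```

Even this two-parameter toy needs hundreds of iterations per target; with a real generator each iteration is a full forward and backward pass.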
9
Related Work: Latent Space Manipulation
Latent space manipulation: editing an image by manipulating its latent code
• Previous work
1. Searching for linear latent directions that correspond to attributes
2. Training semantic face edits on top of a pre-trained model
3. Searching the latent space using image transformations (zoom, rotate)
4. Unsupervised PCA of an intermediate activation space
5. Editing by changing the latent code
[Figure: image editing in the latent space]
Previous approaches: “invert first, edit later”
pSp: solve both in a single step
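The linear-direction approach (item 1) can be sketched as moving a latent code along a direction n: w' = w + αn. Under "invert first, edit later", the code is first obtained by inverting the image and only then edited. The direction used below is hypothetical, and the 2×2 code is a shrunken stand-in for an 18 × 512 W+ code.

```python
def edit_latent(w_plus, direction, alpha, layers=None):
    """Return a copy of a W+ code moved alpha units along `direction`.

    layers: indices of the style rows to edit (default: all rows).
    """
    if layers is None:
        layers = range(len(w_plus))
    edited = [list(row) for row in w_plus]   # leave the input untouched
    for i in layers:
        edited[i] = [x + alpha * d for x, d in zip(edited[i], direction)]
    return edited


# Tiny 2-style, 2-dimensional example instead of 18 x 512.
code = [[0.0, 0.0], [1.0, 1.0]]
smile = [1.0, -1.0]                               # hypothetical attribute direction
print(edit_latent(code, smile, 2.0))              # [[2.0, -2.0], [3.0, -1.0]]
print(edit_latent(code, smile, 2.0, layers=[1]))  # [[0.0, 0.0], [3.0, -1.0]]
```

Restricting `layers` edits only some style rows, which is one way W+ allows more localized control than a single shared w.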
10
Related Work: Image-to-Image
Image-to-image: translation between image domains
• Previous work: a new model had to be developed for each domain translation.
pSp: a single model can solve many different tasks.
11
Q & A
12
pSp Framework
01 Architecture
02 Loss Function
03 Benefits of StyleGAN
13
The pSp Framework: Architecture
• Styles produced from only the encoder's last feature map failed to preserve fine details.
• pSp therefore applies a map2style network at each level of the feature pyramid (coarse, medium, fine).
[Figure: pSp architecture]
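The sketch below shows how the 18 style vectors divide across the three pyramid levels. It follows our reading of the pSp paper: styles 0-2 come from the coarse 16×16 feature map, styles 3-6 from the medium 32×32 map, and styles 7-17 from the fine 64×64 map. Each map2style block shrinks its map with stride-2 convolutions down to a single 512-d vector. The counting helper `num_stride2_convs` is our own illustration, not code from the paper.

```python
import math

NUM_STYLES = 18
# Feature-map resolution feeding each pyramid level (per our reading of the paper).
LEVEL_RESOLUTION = {"coarse": 16, "medium": 32, "fine": 64}


def pyramid_level(style_idx):
    """Map a style index (0-17) to the level whose map2style block produces it."""
    if not 0 <= style_idx < NUM_STYLES:
        raise ValueError(f"style index must be in [0, {NUM_STYLES})")
    if style_idx < 3:      # styles 0-2
        return "coarse"
    if style_idx < 7:      # styles 3-6
        return "medium"
    return "fine"          # styles 7-17


def num_stride2_convs(resolution):
    """Stride-2 convolutions needed to shrink a square map down to 1x1."""
    return int(math.log2(resolution))


for level in ("coarse", "medium", "fine"):
    count = sum(pyramid_level(i) == level for i in range(NUM_STYLES))
    res = LEVEL_RESOLUTION[level]
    print(level, count, res, num_stride2_convs(res))
# coarse 3 16 4
# medium 4 32 5
# fine 11 64 6
```

The split means the coarse styles, which drive pose and overall structure in StyleGAN, are predicted from the most abstract features, while the many fine styles see the highest-resolution map.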
14
The pSp Framework: Loss Function
• L2 loss
• LPIPS loss
• Regularization loss
• ID loss
F: perceptual feature extractor
E: encoder
R: ArcFace network
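The slide's equations were images and did not survive extraction. The formulation below follows the pSp paper as we understand it, using the slide's symbols F, E, and R. Here pSp(x) = G(E(x)), where G is the fixed StyleGAN generator, w̄ is the average latent vector of the pre-trained generator, and λ1 to λ4 are task-dependent weighting hyperparameters. Since the ArcFace embeddings are normalized, the inner product in the ID loss is a cosine similarity.

```latex
\begin{aligned}
\mathcal{L}_2(\mathbf{x}) &= \lVert \mathbf{x} - \mathrm{pSp}(\mathbf{x}) \rVert_2 \\
\mathcal{L}_{\mathrm{LPIPS}}(\mathbf{x}) &= \lVert F(\mathbf{x}) - F(\mathrm{pSp}(\mathbf{x})) \rVert_2 \\
\mathcal{L}_{\mathrm{reg}}(\mathbf{x}) &= \lVert E(\mathbf{x}) - \bar{\mathbf{w}} \rVert_2 \\
\mathcal{L}_{\mathrm{ID}}(\mathbf{x}) &= 1 - \langle R(\mathbf{x}), R(\mathrm{pSp}(\mathbf{x})) \rangle \\
\mathcal{L}(\mathbf{x}) &= \lambda_1 \mathcal{L}_2(\mathbf{x}) + \lambda_2 \mathcal{L}_{\mathrm{LPIPS}}(\mathbf{x}) + \lambda_3 \mathcal{L}_{\mathrm{ID}}(\mathbf{x}) + \lambda_4 \mathcal{L}_{\mathrm{reg}}(\mathbf{x})
\end{aligned}
```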
15
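The pSp Framework: Loss Function
Regularization loss
• Pulls the encoder output toward the mean w+ vector of the pre-trained StyleGAN
ID loss
• Addresses a StyleGAN limitation: the generator can only follow the distribution of its training data
• Makes the model robust on real images
- Uses ArcFace, a network used for face recognition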
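A minimal sketch of these two losses on plain Python lists. The embedding vectors are hypothetical stand-ins for ArcFace features R(x) and R(pSp(x)). `w_avg` stands for the generator's average latent, repeated for every style row. Taking the L2 norm over the whole flattened W+ difference is our assumption about how the norm is computed.

```python
import math


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


def cosine_similarity(a, b):
    na, nb = l2_norm(a), l2_norm(b)
    if na == 0.0 or nb == 0.0:
        raise ValueError("embeddings must be non-zero")
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def id_loss(emb_input, emb_output):
    """1 - cos(R(x), R(pSp(x))); the embeddings stand in for ArcFace features."""
    return 1.0 - cosine_similarity(emb_input, emb_output)


def reg_loss(w_plus, w_avg):
    """||E(x) - w_avg||_2 over the whole W+ code, w_avg repeated for every row."""
    diff = [x - m for row in w_plus for x, m in zip(row, w_avg)]
    return l2_norm(diff)


print(id_loss([1.0, 0.0], [1.0, 0.0]))     # 0.0 (same identity)
print(id_loss([1.0, 0.0], [0.0, 1.0]))     # 1.0 (orthogonal embeddings)
print(reg_loss([[3.0, 4.0]], [0.0, 0.0]))  # 5.0
```

The ID loss is zero only when the reconstruction's face embedding points in the same direction as the input's, which is why it targets identity rather than pixel agreement.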
16
The pSp Framework: The Benefits of the StyleGAN Domain
1. Moving beyond local, pixel-focused operations makes global operations possible.
The model is freed from the locality bias.
2. Because disentanglement is inherited from StyleGAN, semantic attributes are easy to control.
This makes multi-modal synthesis possible.
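Multi-modal synthesis in pSp relies on style mixing. Some style rows of the encoded code are replaced by, or blended with, the rows of a randomly sampled latent. The structure from the input is kept, while appearance varies with each random sample. In the sketch below, the choice of rows 7-17 (the fine level) and the shrunken dimension of 4 are assumptions for illustration. `alpha` controls how strongly the random code is injected.

```python
import random


def style_mix(w_plus, w_random, mix_layers, alpha=1.0):
    """Blend chosen style rows of an encoded W+ code with a random W+ code.

    alpha=1.0 fully replaces those rows; smaller values interpolate.
    """
    mixed = [list(row) for row in w_plus]
    for i in mix_layers:
        mixed[i] = [alpha * r + (1.0 - alpha) * x
                    for x, r in zip(w_plus[i], w_random[i])]
    return mixed


NUM_STYLES, DIM = 18, 4                  # DIM shrunk from 512 for readability
FINE_LAYERS = range(7, NUM_STYLES)       # assumed choice of rows to mix

encoded = [[0.0] * DIM for _ in range(NUM_STYLES)]   # stand-in for E(x)
rng = random.Random(42)
w_rand = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
random_code = [list(w_rand) for _ in range(NUM_STYLES)]  # a random W code in W+

mixed = style_mix(encoded, random_code, FINE_LAYERS)
half = style_mix(encoded, [[1.0] * DIM] * NUM_STYLES, FINE_LAYERS, alpha=0.5)
print(mixed[0] == encoded[0], mixed[17] == random_code[17])  # True True
print(half[17])                                              # [0.5, 0.5, 0.5, 0.5]
```

Feeding each mixed code to the fixed generator would give one output per random sample, all sharing the coarse and medium styles taken from the input.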
17
Applications and Experiments
18
Applications and Experiments: StyleGAN Inversion
19
Applications and Experiments: StyleGAN Inversion
• Ablation study
20
Applications and Experiments: Face Frontalization
21
Applications and Experiments: Conditional Image Synthesis
22
Extending to Other Applications: Others
23
Going Beyond the Facial Domain
• pSp can be applied to any domain on which a StyleGAN has been trained.
24
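Discussion: Limitations
The ID loss improved identity preservation, but because pSp ultimately relies on StyleGAN, it struggles to generate features that StyleGAN never learned.
• Weak on backgrounds other than the face
• Weak on side-view (profile) images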
25
Conclusion
• Directly maps a real image into the W+ latent space, with no optimization required
• Proposes a generic framework for solving various image-to-image translation tasks
• In contrast to “invert first, edit later”, encodes these translation tasks directly into StyleGAN's latent space
26
Q & A