Encoding in Style: a StyleGAN Encoder
for Image-to-Image Translation
2021. 11. 21
김준철, 고형권, 김상현, 전선영, 조경진, 허다운
Elad Richardson, Yuval Alaluf, Or Patashnik, Yotam Nitzan, Yaniv Azar, Stav Shapiro, Daniel Cohen-Or
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
Contents
• Background
• Introduction
• Related Work
• The pSp Framework
• Applications and Experiments
• Discussion
• Conclusion
Background: StyleGAN
[Figure: PGGAN, StyleGAN, and StyleGAN2, with generator (G) and discriminator (D)]
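Background: W and W+ space
• W space: 512 dimensions
A single vector produced from the latent vector z by the mapping network
• W+ space: 18 × 512 dimensions
One 512-dimensional vector per style input; each vector passes through an affine layer before it is applied to the generator as a style
[Figure: synthesis network]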
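The two shapes can be sketched in a few lines. This is a minimal illustration, not StyleGAN's actual code: `toy_mapping` is a hypothetical random stand-in for the mapping network, and only the array shapes are meant to be taken literally.

```python
import numpy as np

LATENT_DIM = 512   # dimensionality of z and w
NUM_STYLES = 18    # number of style inputs assumed here (18 x 512, as on the slide)

rng = np.random.default_rng(0)
# Hypothetical stand-in for the mapping network (not the real architecture).
mapping_weights = rng.standard_normal((LATENT_DIM, LATENT_DIM)) / np.sqrt(LATENT_DIM)

def toy_mapping(z):
    """Map a latent z of shape (512,) to a w vector of shape (512,)."""
    return np.tanh(z @ mapping_weights)

z = rng.standard_normal(LATENT_DIM)
w = toy_mapping(z)                          # a point in W: shape (512,)

# A code in W reuses the same w for every style input ...
w_repeated = np.tile(w, (NUM_STYLES, 1))    # shape (18, 512), all rows identical

# ... whereas a code in W+ may hold a different 512-d vector per style input.
w_plus = w_repeated + 0.1 * rng.standard_normal((NUM_STYLES, LATENT_DIM))

print(w.shape, w_repeated.shape, w_plus.shape)
```

The extra freedom of W+ (18 independent rows instead of one repeated row) is what lets an encoder describe a real image more precisely than a single w could.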
Introduction: Structure
1. Novel encoder architecture (maps an image directly into the W+ space)
2. Encoder built on a Feature Pyramid Network
3. Fixed, pre-trained StyleGAN generator
Introduction: Problems with Previous Work
1. The input must be invertible
pSp: a model that can also translate inputs that have no corresponding latent code
2. Previous models each solve only a single task
pSp: a generic framework in the spirit of pix2pix
3. An adversarial discriminator must be trained
pSp: a model that needs no discriminator during training
4. Feeding the generator residual feature maps explicitly introduces a locality bias
pSp: passing style vectors instead alleviates the locality bias
Related Work
01 GAN Inversion
02 Latent Space Manipulation
03 Image-to-Image
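Related Work: GAN Inversion
GAN inversion: finding a latent code from which the GAN regenerates an image similar to the input image
• Previous Work
1. Latent vector optimization for a single image
2. Image-to-latent-space mapping (learned encoder)
The approaches above perform well but take a long time.
pSp: a model that efficiently converts an image into a W+ vector
• No additional optimization
• Trained without a discriminator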
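The difference between the two families of inversion methods can be sketched on a toy problem. Everything here is an assumption made for illustration: the "generator" is a fixed linear map rather than a StyleGAN, and the "encoder" is its pseudo-inverse rather than a trained network. The point is only the contrast between an iterative per-image search and a single forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, IMAGE_DIM = 8, 64

# Toy "generator": a fixed linear map from latent code to flattened image.
A = rng.standard_normal((IMAGE_DIM, LATENT_DIM))

def generator(w):
    return A @ w

def invert_by_optimization(x, steps=500):
    """Approach 1: optimize a latent code for one image by gradient descent."""
    w = np.zeros(LATENT_DIM)
    lr = 1.0 / np.linalg.norm(A, ord=2) ** 2   # safe step size for this quadratic loss
    for _ in range(steps):
        grad = A.T @ (generator(w) - x)        # gradient of 0.5 * ||G(w) - x||^2
        w -= lr * grad
    return w

# Approach 2: an encoder maps the image to a latent code in one forward pass.
# The pseudo-inverse is a stand-in for a trained encoder network.
encoder_weights = np.linalg.pinv(A)

def invert_by_encoder(x):
    return encoder_weights @ x

w_true = rng.standard_normal(LATENT_DIM)
x = generator(w_true)
w_opt = invert_by_optimization(x)   # hundreds of generator evaluations
w_enc = invert_by_encoder(x)        # one matrix multiplication
print(np.abs(w_opt - w_true).max(), np.abs(w_enc - w_true).max())
```

With a real generator the optimization loop must backpropagate through the whole network at every step, which is where the per-image cost comes from.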
Related Work: Latent Space Manipulation
Latent space manipulation: editing an image by operating on its latent code
• Previous Work
1. Find linear directions that correspond to attributes
2. Learn semantic face edits with a pre-trained model
3. Find latent-space paths that correspond to image transformations (zoom, rotation)
4. Apply PCA to an intermediate activation space in an unsupervised manner
5. Edit by modifying the latent code
Image editing in previous work: “invert first, edit later”
pSp: solve both in a single step
[Figure: latent space]
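The "edit later" half of “invert first, edit later” is often a shift along a linear direction in latent space. A minimal sketch, assuming the inversion has already produced an 18 × 512 code; `smile_direction` is a hypothetical, randomly generated placeholder for a direction that would normally be learned from labeled data.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_STYLES, LATENT_DIM = 18, 512

def edit_latent(w_plus, direction, strength):
    """Shift every style vector along a unit-normalized attribute direction."""
    unit = direction / np.linalg.norm(direction)
    return w_plus + strength * unit   # broadcasts over the 18 style vectors

w_plus = rng.standard_normal((NUM_STYLES, LATENT_DIM))   # stand-in for an inverted image
smile_direction = rng.standard_normal(LATENT_DIM)        # hypothetical attribute direction

edited = edit_latent(w_plus, smile_direction, strength=3.0)
print(edited.shape)
```

Feeding `edited` to the generator would then produce the modified image; a negative `strength` moves the attribute the other way.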
Related Work: Image-to-Image
Image-to-image translation: converting images from one domain to another
• Previous Work: each domain translation required developing a new model.
pSp: a single framework can handle a variety of tasks.
Q & A
pSp Framework
01 Architecture
02 Loss Function
03 Benefits of StyleGAN
The pSp Framework: Architecture
• Styles derived only from the encoder's last feature map failed to preserve fine details.
• A map2style network is therefore applied at each level of the feature pyramid (coarse, medium, fine).
[Figure: pSp architecture]
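The per-level idea can be sketched as follows. Several things are assumptions made for illustration: the feature maps are random arrays, the split of the 18 styles into 3 coarse, 4 medium, and 11 fine is assumed, and each `map2style` head is simplified to global average pooling plus a linear layer instead of a small convolutional network.

```python
import numpy as np

rng = np.random.default_rng(0)
CHANNELS, NUM_STYLES = 512, 18

# Assumed split of the 18 styles across pyramid levels (coarse, medium, fine).
LEVEL_OF_STYLE = ["coarse"] * 3 + ["medium"] * 4 + ["fine"] * 11

# Stand-in feature pyramid: (channels, height, width) per level.
features = {
    "coarse": rng.standard_normal((CHANNELS, 16, 16)),
    "medium": rng.standard_normal((CHANNELS, 32, 32)),
    "fine": rng.standard_normal((CHANNELS, 64, 64)),
}

# One small head per style. Simplification: pooling plus a linear layer,
# not the convolutional map2style block of the actual model.
heads = [rng.standard_normal((CHANNELS, CHANNELS)) / np.sqrt(CHANNELS)
         for _ in range(NUM_STYLES)]

def map2style(feature_map, weights):
    pooled = feature_map.mean(axis=(1, 2))   # (512,) spatial average
    return weights @ pooled                  # (512,) style vector

w_plus = np.stack([map2style(features[LEVEL_OF_STYLE[i]], heads[i])
                   for i in range(NUM_STYLES)])
print(w_plus.shape)   # (18, 512)
```

Because the fine styles read from the highest-resolution feature map, they can carry detail that the last (coarsest) feature map alone would have lost.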
The pSp Framework: Loss Function
• L2 loss
• LPIPS loss
• Regularization loss
• ID loss
F: perceptual feature extractor
E: encoder
R: ArcFace network
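The first two losses compare the input with its reconstruction, once in pixel space and once in a feature space. A minimal sketch under stated assumptions: `perceptual_features` is a hypothetical random projection standing in for the perceptual extractor F (a real LPIPS network is a pre-trained CNN), the L2 loss is written as a Euclidean norm although implementations often use mean squared error, and the weights in the final sum are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the perceptual feature extractor F.
projection = rng.standard_normal((32, 3 * 8 * 8))

def perceptual_features(image):
    return np.maximum(projection @ image.ravel(), 0.0)

def l2_loss(x, y):
    """Pixel-wise reconstruction loss."""
    return float(np.linalg.norm(x - y))

def perceptual_loss(x, y):
    """Distance between feature representations, in the style of LPIPS."""
    return float(np.linalg.norm(perceptual_features(x) - perceptual_features(y)))

x = rng.random((3, 8, 8))                     # input image
y = x + 0.05 * rng.standard_normal(x.shape)   # stand-in for the reconstruction

total = 1.0 * l2_loss(x, y) + 0.8 * perceptual_loss(x, y)   # illustrative weights
print(l2_loss(x, y), perceptual_loss(x, y), total)
```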
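The pSp Framework: Loss Function
Regularization loss
• Compares the model output with the mean of the pre-trained generator's W+ vectors
ID loss
• Limitation of StyleGAN
- It can only follow the distribution of the data it was trained on.
• A model that is robust on real images
- Uses the ArcFace loss from face recognition.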
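Both losses reduce to simple vector comparisons. A minimal sketch, assuming the regularization loss is a distance between the predicted latent code and the generator's average latent, and the ID loss is one minus the cosine similarity of two face embeddings. The embeddings here are random placeholders for the outputs of the ArcFace network R, and all dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_STYLES, LATENT_DIM, EMBED_DIM = 18, 512, 128

def regularization_loss(w_pred, w_avg):
    """Distance between the predicted latent code and the average latent."""
    return float(np.linalg.norm(w_pred - w_avg))

def id_loss(embed_x, embed_y):
    """One minus the cosine similarity of two face-recognition embeddings."""
    cos = embed_x @ embed_y / (np.linalg.norm(embed_x) * np.linalg.norm(embed_y))
    return float(1.0 - cos)

w_avg = rng.standard_normal(LATENT_DIM)   # stand-in for the generator's mean latent
w_pred = w_avg + 0.1 * rng.standard_normal((NUM_STYLES, LATENT_DIM))

embed_input = rng.standard_normal(EMBED_DIM)                        # stand-in for R(input)
embed_output = embed_input + 0.1 * rng.standard_normal(EMBED_DIM)   # stand-in for R(output)

print(regularization_loss(w_pred, w_avg), id_loss(embed_input, embed_output))
```

The ID loss is 0 when the two embeddings point in the same direction and grows toward 2 as they become opposite, so minimizing it pushes the output face to be recognized as the same person as the input.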
The pSp Framework: The Benefits of the StyleGAN Domain
1. The model moves beyond pixel-focused local operations and can perform global operations.
It is freed from the limits imposed by locality bias.
2. Because it inherits the disentanglement learned by StyleGAN, semantic attributes are easy to adjust.
This makes multi-modal synthesis possible.
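Multi-modal synthesis can be sketched as style mixing: keep the styles that control structure and replace the remaining ones with a randomly sampled latent. The index `FINE_START` and the blending parameter `alpha` are assumptions made for illustration, and the encoded and sampled latents are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_STYLES, LATENT_DIM = 18, 512
FINE_START = 7   # assumed index where the fine styles begin

def style_mix(w_encoded, w_random, start=FINE_START, alpha=1.0):
    """Blend the styles from `start` onward toward a randomly sampled latent."""
    mixed = w_encoded.copy()
    mixed[start:] = alpha * w_random + (1.0 - alpha) * w_encoded[start:]
    return mixed

w_encoded = rng.standard_normal((NUM_STYLES, LATENT_DIM))   # stand-in for the encoder output

# Three different outputs for the same input: shared coarse and medium
# styles, different fine styles.
samples = [style_mix(w_encoded, rng.standard_normal(LATENT_DIM)) for _ in range(3)]
print(len(samples), samples[0].shape)
```

Each entry of `samples` would be passed to the generator to obtain one of several plausible outputs for the same input.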
Applications and Experiments
Applications and Experiments: StyleGAN Inversion
Applications and Experiments: StyleGAN Inversion
• Ablation Study
Applications and Experiments: Face Frontalization
Applications and Experiments: Conditional Image Synthesis
Extending to Other Applications: Others
Going Beyond the Facial Domain
• Applicable to any domain on which a StyleGAN has been trained.
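Discussion: Limitations
The ID loss improved identity preservation, but because the method ultimately relies on StyleGAN, it is limited in generating features the generator was not trained on.
• Weak on backgrounds outside the face
• Weak on side-view (profile) images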
Conclusion
• Directly maps a real image into the W+ latent space, with no optimization required
• Proposes a generic framework for solving various image-to-image translation tasks
• In contrast to the “invert first, edit later” approach, directly encodes these translation tasks into StyleGAN
Q & A
