Image-to-Image Translation:
Methods and Applications
Yingxue Pang, Jianxin Lin, Tao Qin, and Zhibo Chen
IEEE Transactions on Multimedia, 2021
2022/6/17
◼ Image-to-Image translation (I2I)
I. Preliminaries
i. Generative models
◼ Image-to-Image translation (I2I)
• The task of learning a mapping that converts an image in a source domain into a corresponding image in a target domain
• Covers tasks such as style transfer, colorization, and semantic image synthesis
• Most methods build on deep generative models, chiefly VAEs and GANs
◼ Variational Auto-Encoder (VAE)
[Kingma & Welling, arXiv2014]
• A decoder 𝑝𝜃(𝓍|𝓏) generates an image 𝓍 from a latent variable 𝓏
• The marginal likelihood 𝑝𝜃(𝓍) is intractable to compute directly
• Training maximizes a variational lower bound on log 𝑝𝜃(𝓍) using an encoder 𝑞𝜙(𝓏|𝓍)
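For reference, the evidence lower bound (ELBO) that VAE training maximizes, restated in its standard form (not specific to this survey):

```latex
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x \mid z) \right]
\;-\; D_{\mathrm{KL}}\!\left( q_\phi(z \mid x) \,\|\, p(z) \right)
```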
◼ Generative Adversarial Networks (GAN)
[Goodfellow+, NeurIPS2014]
• A generator G maps a noise vector z to a fake image G(z)
• A discriminator D is trained to distinguish real images from generated ones
• G and D are trained adversarially in a minimax game until generated samples become hard to tell apart from real data
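The standard minimax objective from the original GAN paper, for reference:

```latex
\min_G \max_D \;
\mathbb{E}_{x \sim p_{\mathrm{data}}}\left[ \log D(x) \right]
+ \mathbb{E}_{z \sim p(z)}\left[ \log\left( 1 - D(G(z)) \right) \right]
```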
Generative Adversarial Network (GAN)
◼ Unconditional GAN
• GAN [Goodfellow+, NeurIPS2014]
• DCGAN [Radford+, ICLR2016]
• Generate images from random noise alone, with no conditioning signal
◼ Conditional GAN [Mirza & Osindero, arXiv2014]
• Both the generator and the discriminator receive auxiliary information as input
• The condition can be a class label or an image
◼ Image-to-Image translation (I2I)
I. Preliminaries
ii. Evaluation metrics
◼ Peak signal-to-noise ratio (PSNR)
• Measures pixel-level fidelity between a generated image and its reference via the mean squared error; higher is better
◼ Structural similarity index (SSIM) [Wang+, IEEE2004]
• Compares luminance, contrast, and structure between a generated image and the ground truth
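A minimal sketch of computing both metrics with scikit-image (the channel_axis argument assumes scikit-image >= 0.19); the random arrays are placeholders for a translated image and its ground truth:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Placeholder images in [0, 1]; in practice, the ground truth and the output.
ref = np.random.rand(256, 256, 3)
out = np.random.rand(256, 256, 3)

psnr = peak_signal_noise_ratio(ref, out, data_range=1.0)  # higher is better
ssim = structural_similarity(ref, out, data_range=1.0,
                             channel_axis=-1)             # 1.0 means identical
print(f"PSNR = {psnr:.2f} dB, SSIM = {ssim:.3f}")
```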
◼ Inception score (IS) [Salimans+, NeurIPS2016]
• Scores the quality and diversity of generated images with a pretrained Inception classifier
◼ Mask-SSIM and Mask-IS [Ma+, NeurIPS2017]
• SSIM and IS computed inside a foreground mask
• Evaluate only the translated region while ignoring the background
◼ Conditional inception score (CIS) [Huang+, ECCV2018]
• An IS variant for multimodal I2I
• Measures the diversity of outputs conditioned on a single input image
◼ Perceptual distance (PD) [Johnson+, ECCV2016]
• Distance between deep (e.g., VGG) features of two images; smaller means more perceptually similar
◼ Fréchet inception distance (FID) [Heusel+, NeurIPS2017]
• Fits Gaussians to Inception features of real and generated images and measures the Fréchet distance between them; lower is better (sketched below)
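A sketch of the FID computation given pre-extracted features; obtaining the (N, D) Inception-v3 pool features for the real and generated sets is assumed to happen elsewhere:

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(feats_real, feats_fake):
    # FID between two (N, D) feature arrays:
    # ||mu_r - mu_f||^2 + Tr(C_r + C_f - 2 (C_r C_f)^(1/2))
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):        # discard tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))
```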
◼ Kernel inception distance (KID) [Binkowski+, ICLR2018]
• Squared maximum mean discrepancy (MMD) between features from the Inception network [Szegedy+, CVPR2016]
• Unbiased, and more reliable than FID when few samples are available
◼ Single image Fréchet inception distance (SIFID) [Shaham+, ICCV2019]
• Compares the internal feature statistics of a single real image and a single generated image
• A FID variant introduced with SinGAN
◼ Learned Perceptual Image Patch Similarity (LPIPS)
[Zhang+, CVPR2018]
• Distance between deep features, calibrated to match human perceptual judgments
• Lower LPIPS means two images look more similar
• Average pairwise LPIPS among outputs is also used to quantify the diversity of multimodal methods
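A usage sketch with the authors' lpips package (pip install lpips); inputs are expected in [-1, 1]:

```python
import torch
import lpips  # pip install lpips

loss_fn = lpips.LPIPS(net='alex')          # AlexNet backbone
img0 = torch.rand(1, 3, 256, 256) * 2 - 1  # placeholder images in [-1, 1]
img1 = torch.rand(1, 3, 256, 256) * 2 - 1
dist = loss_fn(img0, img1)                 # lower = more perceptually similar
```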
◼ FCN scores [Isola+, CVPR2017]
• Run a pretrained segmentation network on images generated from a semantic map
• Compare its predictions against the input map using per-pixel accuracy and IoU
◼ Classification accuracy [Liu+, NeurIPS2017]
• Train a classifier on the target domain
• Check whether translated images are classified into the intended domain
◼ Density and Coverage (DC) [Naeem+, PMLR2020]
• Density: how densely generated samples fall within neighborhoods of real samples (fidelity)
• Coverage: the fraction of real samples whose neighborhoods contain a generated sample (diversity)
◼ Image-to-Image translation (I2I)
II. Two-domain I2I and multi-domain I2I
i. Two-Domain Image-To-Image Translation
Two-Domain Image-To-Image Translation
◼ Representative two-domain I2I tasks
• Image style transfer [Zhu+, ICCV2017]
• Semantic segmentation [Park+, CVPR2017]
• Image colorization [Suárez+, CVPR2017]
◼ Two-domain I2I methods are grouped by supervision
• Supervised
• Trained on paired images from the two domains
• Unsupervised
• Trained on unpaired image collections, one per domain
• Semi-supervised
• A small number of paired images plus abundant unpaired data
• Few-shot
• Only a few training images are available
(Figures: image style transfer, image colorization)
Supervised I2I Single-modal Output
◼ Single-modal Output
• One deterministic output image is produced per input
◼ Pix2pix [Isola+, CVPR2017]
• A conditional GAN trained on paired data
• Combines the adversarial loss with an L1 reconstruction loss against the ground truth (sketched below)
◼ Pix2PixHD [Wang+, CVPR2018]
• Scales pix2pix to high-resolution (2048×1024) synthesis with coarse-to-fine generators and multi-scale discriminators
◼ SPADE [Park+, CVPR2019]
• Spatially-adaptive normalization modulates activations with the input semantic map so the layout is not washed out by normalization
(Figure: SPADE [Park+, CVPR2019])
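A minimal sketch of the pix2pix generator objective (adversarial term plus weighted L1; the paper uses lambda = 100). G and D are placeholder networks, with D conditional, i.e., scoring channel-concatenated (input, output) pairs:

```python
import torch
import torch.nn.functional as F

def pix2pix_generator_loss(G, D, x, y, lam=100.0):
    # Adversarial loss + lam * L1 reconstruction, as in pix2pix.
    fake = G(x)
    pred_fake = D(torch.cat([x, fake], dim=1))  # conditional discriminator
    adv = F.binary_cross_entropy_with_logits(
        pred_fake, torch.ones_like(pred_fake))  # try to fool D
    rec = F.l1_loss(fake, y)                    # stay close to ground truth
    return adv + lam * rec
```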
Supervised I2I Multimodal Output
◼ Multimodal Outputs
• One input can correspond to many plausible outputs
• The model should capture this one-to-many distribution instead of a single deterministic mapping
◼ BicycleGAN [Zhu+, NeurIPS2017]
• Combines cVAE-GAN [Hinton & Salakhutdinov, Science2006] and cLR-GAN [Chen+, NeurIPS2016]
• Enforces a bijection between latent codes and outputs to discourage mode collapse
• Produces diverse and realistic multimodal I2I results
◼ PixelNN [Bansal+, ICLR2018]
• A nearest-neighbor approach that composes the output from matched training patches
• Different retrieved neighbors yield different (multimodal) outputs
(Figure: PixelNN [Bansal+, ICLR2018])
Unsupervised I2I Single-modal Output
◼ Translation using a Cycle-consistency Constraint
• With unpaired data there is no ground-truth output to supervise against
• The cycle-consistency loss requires that translating to the other domain and back reconstructs the input
• This reconstruction loss constrains the mapping enough to learn from unpaired data (sketched below)
• CycleGAN [Zhu+, ICCV2017]
• Trains two generator-discriminator pairs, one per translation direction
• U-GAT-IT [Kim+, ICLR2019]
• Adds attention modules that focus on the regions that discriminate the two domains
• Introduces AdaLIN, which adaptively mixes instance and layer normalization
(Figure: CycleGAN [Zhu+, ICCV2017])
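A sketch of the cycle-consistency loss with two placeholder generators, G_xy (X to Y) and G_yx (Y to X):

```python
import torch.nn.functional as F

def cycle_consistency_loss(G_xy, G_yx, x, y):
    # x -> Y -> X should reconstruct x; symmetrically for y.
    loss_x = F.l1_loss(G_yx(G_xy(x)), x)  # forward cycle X -> Y -> X
    loss_y = F.l1_loss(G_xy(G_yx(y)), y)  # backward cycle Y -> X -> Y
    return loss_x + loss_y
```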
Unsupervised I2I Single-modal Output
◼ Translation beyond the Cycle-consistency Constraint
• Cycle consistency forces the output to retain enough information to reconstruct the input, which can over-constrain the translation
• TraVeLGAN [Amodio & Krishnaswamy, CVPR2019]
• Replaces cycle consistency with a siamese network that preserves the vector relations between pairs of images across domains
• CUT [Park+, ECCV2020]
• Uses patchwise contrastive learning: an output patch is pulled toward the corresponding input patch and pushed away from other patches
• Enables one-sided translation without a backward generator
Unsupervised I2I Single-modal Output
◼ Translation of Fine-grained Objects
• Attention mechanisms restrict the translation to the relevant object regions while preserving the background
◼ DAGAN [Ma+, CVPR2018]
◼ Attention GAN
[Chen+, ECCV2018]
◼ Attention-guided GAN
[Mejjati+, NeurIPS2018]
Unsupervised I2I Single-modal Output
◼ Translation by combining knowledge from other fields
• Imports techniques from neighboring areas into unsupervised I2I
◼ RevGAN [Ouderaa & Worrall, CVPR2019]: invertible (reversible) networks give the backward mapping for free
◼ Art2Real [Tomei+, CVPR2019]: translates paintings to photo-realistic images with the help of retrieved real-image details
◼ GDWCT [Cho+, CVPR2019]: group-wise deep whitening-and-coloring transformation borrowed from style transfer
◼ NICE-GAN [Chen+, CVPR2020]: reuses part of the discriminator as the generator's encoder
(Figure: Art2Real [Tomei+, CVPR2019])
Unsupervised I2I Multimodal Output
◼ CycleGAN-based extensions
• DSVIB [Kazemi+, NeurIPS2018]
• Augmented CycleGAN [Almahairi+, PMLR2018]: augments both domains with latent codes so one input can map to many outputs
◼ Disentangled representations
• Separate a domain-shared content code from a domain-specific style code; recombining codes yields multimodal outputs
• cd-GAN [Lin+, CVPR2018]
• MUNIT [Huang+, ECCV2018]
• DRIT [Lee+, ECCV2018]
• EGSC-IT [Ma+, ICLR2018]
◼ Further extensions
• INIT [Shen+, CVPR2019]
• DSMAP [Chang+, ECCV2020]
(Figure: DRIT [Lee+, ECCV2018])
Semi-supervised Image-to-Image Translation
◼ TCR-SSIT [Mustafa & Mantiuk, ECCV2020]
• Transformation Consistency Regularization (TCR): predictions for transformed unlabeled images must match the transformed predictions
• Combines a small set of paired images with abundant unlabeled data
Few-Shot Image-to-Image Translation
◼ Few-shot I2I: adapting translation when only a few target-domain images exist
◼ Transferring GAN
[Wang+, ECCV2018]
• Fine-tunes a GAN pretrained on a large source dataset
• Knowledge transfer works even with very few target images
◼ MT-GAN [Lin+, arXiv2019]
• Meta-learns across multiple translation tasks
• Adapts to a new I2I task from a handful of examples
One-shot Image-to-Image Translation
◼ One image in the source domain, many in the target domain
• OST [Benaim & Wolf, NeurIPS2018]
• Shares an encoder across domains and uses augmentation with selective backpropagation to avoid overfitting the single source image
• Translates in one direction only
• BiOST [Cohen & Wolf, ICCV2019]
• Extends OST to translate in both directions
• Removes the need for OST's specialized augmentation
◼ One image in each domain
• TuiGAN [Lin+, ECCV2020]
• Trains on just the two given images
• Translates progressively in a coarse-to-fine manner with a pyramid of generators
◼ Image-to-Image translation (I2I)
II. Two-domain I2I and multi-domain I2I
ii. Multi-Domain Image-To-Image Translation
Unsupervised I2I Single-modal Output
◼ Training with multi-modules
• Building multi-domain translation out of two-domain I2I models needs a generator pair for every pair of domains
• Multi-module methods share components to avoid this quadratic cost
• Domain-Bank [Hui+, ICPR2018]
• Uses one shared backbone with n domain-specific banks instead of n(n-1) separate translators
• ModularGAN [Zhao+, ECCV2018]
• Composes reusable encoder, transformer, and decoder modules at test time to build the required translation
Training with one generator and discriminator pair
◼ Training with one generator and discriminator pair
• A single generator serves all domains by taking the target-domain label as an extra input (sketched below)
• The discriminator additionally classifies the domain of its input
• StarGAN [Choi+, CVPR2018]
• Concatenates a target-domain label to the input image
• An auxiliary domain classifier on the discriminator guides translation toward the target domain
• AttGAN [He+, IEEE TIP2019]
• Edits facial attributes by decoding an encoded image together with the desired attribute vector
• Uses an attribute-classification constraint plus a reconstruction loss
• RelGAN [Wu+, ICCV2019], STGAN [Liu+, CVPR2019]
• Take the difference between source and target attribute vectors as input
• Change only the attributes that actually differ
(Figure: StarGAN, RelGAN)
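A sketch of StarGAN-style conditioning: the target-domain label is replicated over the spatial grid and concatenated to the image channels, so one generator (a placeholder G here) serves all domains:

```python
import torch

def translate_with_domain_label(G, x, target_label):
    # x: (B, C, H, W) images; target_label: (B, K) one-hot domain codes.
    b, _, h, w = x.shape
    lab = target_label.view(b, -1, 1, 1).expand(-1, -1, h, w)
    return G(torch.cat([x, lab], dim=1))  # generator sees C + K channels
```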
Training by combining knowledge in other fields
◼ Training by combining knowledge in other fields
• Auxiliary constraints and parameterizations improve multi-domain training
• Fixed-Point GAN [Siddiquee+, ICCV2019]
• Adds a fixed-point constraint: translating an image to its own domain must leave it unchanged
• SGN [Chang+, ICCV2019]
• Mixes several objectives via a sym-parameter shared between the loss weights and the generator input
• INIT [Cao+, ECCV2020]
• ADSPM [Wu+, ICCV2019]
• Learns attribute-driven spontaneous motion for face editing
(Figure: SGN structure)
Unsupervised I2I Multimodal Output
◼ DosGAN [Lin+, arXiv2019]
• Pretrains a CNN domain classifier and reuses its features as explicit domain (style) representations
• A single generator combines domain-independent content features with these domain features
◼ GANimation [Pumarola+, ECCV2018]
• Conditions on continuous facial action-unit activations rather than discrete labels
• Renders a continuum of expressions using an attention mask
◼ Disentanglement assumption
• UFDN [Liu+, NeurIPS2018]
• DMIT [Yu+, NeurIPS2019]
• StarGAN v2 [Choi+, CVPR2020]
• DRIT++ [Lee+, ECCV2018]
• GMM-UNIT [Liu+, arXiv2020]
Semi-Supervised Multi-domain I2I
◼ SEMIT [Wang+, CVPR2020]
• Semi-supervised few-shot I2I
• Assigns pseudo-labels to unlabeled images with a noise-tolerant classifier
◼ AGUIT [Li+, arXiv2019]
1. Trains on a mix of labeled and unlabeled data
2. Decomposes images into content and style and injects style with AdaIN [Huang+, ICCV2017] (formula below)
3. Combines a cycle-consistency loss with a feature-consistency loss
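For reference, AdaIN aligns the channel-wise mean and standard deviation of a content feature x to those of a style feature y:

```latex
\mathrm{AdaIN}(x, y) = \sigma(y)\,\frac{x - \mu(x)}{\sigma(x)} + \mu(y)
```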
Few-shot Multi-domain I2I
◼ FUNIT [Liu+, ICCV2019]
• Trains on many source classes, then translates to an unseen class given only a few example images at test time
• Combines a content encoder with a class encoder computed from the few examples
• Extends I2I to previously unseen target classes
◼ COCO-FUNIT [Saito+, ECCV2020]
• Adds a content-conditioned style encoder to suppress content leakage from the style examples
• Preserves the input structure better than FUNIT
◼ ZstGAN [Lin+, arXiv2019]
• Zero-shot I2I: target domains are specified by semantic descriptions instead of images
• Aligns visual and semantic features so unseen domains can be targeted
◼ Image-to-Image translation (I2I)
II. Two-domain I2I and multi-domain I2I
iii. Experimental evaluation
Experimental Evaluation
◼ Two-Domain I2I
• Pix2pix [Isola+, CVPR2017]
• BicycleGAN [Zhu+, NeurIPS2017]
• CycleGAN [Zhu+, ICCV2017]
• U-GAT-IT [Kim+, ICLR2019]
• GDWCT [Cho+, CVPR2019]
• CUT [Park+, ECCV2020]
• MUNIT [Huang+, ECCV2018]
◼ Multi-Domain I2I
• StarGAN [Choi+, CVPR2018]
• AttGAN [He+, IEEE TIP2019]
• STGAN [Liu+, CVPR2019]
• DosGAN [Lin+, arXiv2019]
• StarGANv2 [Choi+, CVPR2020]
◼ UT-Zap50K [Yu & Grauman, CVPR2019]
• Shoe images, commonly paired with edge maps for translation
• 49,826 images
• 200 images held out for evaluation
• Resolution 256×256
◼ CelebA [Liu, ICCV2015]
• 202,599 face images
• 40 binary attribute annotations per image
• Split into train, val, and test at 8:1:1
• 178×218 aligned images, center-cropped
• Resized for training (see the sketch below)
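A plausible torchvision pipeline matching the preprocessing on this slide; the 128×128 target size and the [-1, 1] normalization are assumptions, not stated in the deck:

```python
from torchvision import datasets, transforms

tfm = transforms.Compose([
    transforms.CenterCrop(178),                  # crop the 178x218 aligned faces
    transforms.Resize(128),                      # assumed training resolution
    transforms.ToTensor(),
    transforms.Normalize([0.5] * 3, [0.5] * 3),  # map to [-1, 1]
])
celeba = datasets.CelebA(root="data", split="train",
                         transform=tfm, download=True)
```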
◼ Inception score (IS) [Salimans+, NeurIPS2016]
• Measures quality and diversity; higher is better
◼ Fréchet inception distance (FID) [Heusel+, NeurIPS2017]
• Measures distance to the real-image distribution; lower is better
◼ LPIPS [Zhang+, CVPR2018]
• Average LPIPS between pairs of outputs for the same input measures output diversity
• Higher average LPIPS indicates more diverse multimodal outputs
Two-Domain I2I
(Table: quantitative results for single-modal and multimodal methods)
Two-Domain I2I
◼ Single-modal
• Among the compared methods, CUT achieves the best FID and IS
• A quality gap remains relative to unconditional generators such as StyleGAN
[Karras+, CVPR2019]
◼ Multimodal
• Output diversity measured by average LPIPS reaches 0.047
Multi-Domain I2I
(Table: quantitative results for single-modal and multimodal methods)
Multi-Domain I2I
◼ StarGANv2
• StarGANv2 achieves the best FID, IS, and LPIPS among the compared multi-domain methods
• Its average LPIPS of 0.336 indicates the most diverse outputs
◼ Image-to-Image translation (I2I)
III. Applications and summary
i. Applications of I2I
ii. Summary
Application
◼ Image synthesis and completion
• [Park+, CVPR2019]
• [Yan+, ACM2017]
• [Isola+, CVPR2017]
• [Almahairi+, PMLR2018]
• [Mao+, CVPR2019]
• [Shocher+, CVPR2020]
• [Pathak+, CVPR2016]
• [Zheng+, AAAI2019]
◼ Cartoon and illustration generation
• [Hicsinmez+, Image and Vision Computing, 2020]
• [Taigman+, ICLR2019]
◼ Style transfer
• [Zhu+, ICCV2017]
• [Kim+, ICML2017]
◼ Image editing and manipulation
• [Yi+, ICCV2017]
• [Park+, NeurIPS2020]
◼ Image restoration (super-resolution, denoising, dehazing, deblurring)
• [Yuan+, CVPR2018]
• [Manakov+, DART2019]
• [Zhang+, TCSVT2020]
• [Engin+, CVPRW2019]
• [Madam+, ECCV2018]
◼ Summary
• The survey organizes I2I methods into two-domain and multi-domain settings, further divided by supervision
• It also reviews evaluation metrics and representative applications
◼ AMT perceptual studies
• Human evaluation via Amazon Mechanical Turk (AMT)
• Workers judge which of two images looks more realistic, or whether an image is real or generated
• Complements automatic metrics with human perception
(Figure: taxonomy of two-domain and multi-domain I2I)
◼ Use of Inception score (IS) and Fréchet inception distance (FID) across the surveyed work
• Two-Domain & Single-modal
• Reported metrics vary across papers
• Two-Domain & Multimodal
• Of 19 papers, FID is the most commonly reported
• Multi-Domain & Single-modal
• FID is the most commonly reported
• Multi-Domain & Multimodal
• Of 19 papers, FID is the most commonly reported