LFI-CAM: Learning Feature Importance for Better Visual Explanation (광희 이)
LFI-CAM is a novel neural network architecture that performs image classification and visual explanation in an end-to-end manner. It uses a Feature Importance Network to learn feature importance rather than directly generating an attention map, resulting in more reliable and consistent explanations. Experiments show LFI-CAM matches or exceeds baseline models on classification accuracy while generating higher quality attention maps.
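The summary does not pin down the architecture; as a toy illustration of learning feature importance for a CAM, rather than the paper's exact design, the sketch below predicts per-channel importance weights with a small network and uses them to combine the backbone's feature maps. All module names and sizes are hypothetical.

```python
import torch
import torch.nn as nn

class FeatureImportanceCAM(nn.Module):
    """Toy sketch: predict per-channel importance scores from pooled
    features, then build an attention map as the weighted sum of the
    backbone's feature maps (names and sizes are illustrative)."""
    def __init__(self, channels: int = 512):
        super().__init__()
        self.importance_net = nn.Sequential(
            nn.Linear(channels, channels // 4),
            nn.ReLU(inplace=True),
            nn.Linear(channels // 4, channels),
            nn.Softmax(dim=1),  # normalized feature-importance weights
        )

    def forward(self, feature_maps: torch.Tensor) -> torch.Tensor:
        # feature_maps: (B, C, H, W) from the backbone's last conv layer
        pooled = feature_maps.mean(dim=(2, 3))             # (B, C)
        weights = self.importance_net(pooled)              # (B, C)
        cam = (weights[:, :, None, None] * feature_maps).sum(dim=1)  # (B, H, W)
        cam = torch.relu(cam)
        cam = cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)  # scale to [0, 1]
        return cam

cam_head = FeatureImportanceCAM(channels=512)
attention = cam_head(torch.randn(2, 512, 7, 7))  # (2, 7, 7) attention maps
```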
Learning Disentangled Representation for Robust Person Re-identification (NAVER Engineering)
We address the problem of person re-identification (reID), that is, retrieving person images from a large dataset given a query image of the person of interest. The key challenge is to learn person representations robust to intra-class variations, as different persons can share the same attribute and the same person's appearance looks different with viewpoint changes. Recent reID methods focus on learning discriminative features that are robust to only a particular factor of variation (e.g., human pose), which requires corresponding supervisory signals (e.g., pose annotations). To tackle this problem, we propose to disentangle identity-related and -unrelated features from person images. Identity-related features contain information useful for specifying a particular person (e.g., clothing), while identity-unrelated ones hold other factors (e.g., human pose, scale changes). To this end, we introduce a new generative adversarial network, dubbed identity shuffle GAN (IS-GAN), that factorizes these features using identification labels without any auxiliary information. We also propose an identity shuffling technique to regularize the disentangled features. Experimental results demonstrate the effectiveness of IS-GAN, largely outperforming the state of the art on standard reID benchmarks including Market-1501, CUHK03, and DukeMTMC-reID. Our code and models will be available online at the time of publication.
1) The document discusses using data in deep learning models, including understanding the limitations of data and how it is acquired.
2) It describes techniques for image matching using multi-view geometry, including finding corresponding points across images and triangulating them to determine camera pose.
3) Recent works aim to improve localization of objects in images using multiple instance learning approaches that can learn without full supervision or through more stable optimization methods like linearizing sampling operations.
Cross-domain complementary learning with synthetic data for multi-person part... (哲东 郑)
This document proposes a cross-domain complementary learning method with synthetic data for multi-person part segmentation. The method trains two modules interchangeably: one on synthetic data to predict keypoints and part segmentation, and one on real data to predict keypoints. By sharing parameters between the modules and leveraging the common skeleton representation in both domains, the method is able to transfer knowledge between synthetic and real data to improve part segmentation performance without requiring real part labels. Experimental results show the method outperforms alternatives that only use synthetic or real data, demonstrating it can relax labeling requirements for multi-person part segmentation tasks.
Synthesizing pseudo-2.5D content from monocular videos for mixed reality (NAVER Engineering)
Free-viewpoint video (FVV) is an advanced form of media that provides a more immersive user experience than traditional media. Because users can view the content from any desired viewpoint, it allows genuine interaction and is emerging as a next-generation medium.
Existing systems for creating FVV content require complex, specialized capturing equipment and offer low end-user usability, since considerable expertise is needed to operate them. This is an obstacle for individuals or small organizations who want to create content, limits the end user's ability to create FVV-based user-generated content (UGC), and inhibits the creation and sharing of diverse content.
To tackle these problems, this work proposes ParaPara, an end-to-end system that uses a simple yet effective method to generate pseudo-2.5D FVV content from monocular videos, unlike previously proposed systems. First, the system detects persons in the monocular video with a deep neural network, calculates the real-world homography matrix from minimal user interaction, and estimates the pseudo-3D positions of the detected persons. Then, person textures are extracted using standard image processing algorithms and placed at the estimated real-world positions. Finally, the pseudo-2.5D content is synthesized from these elements. Content synthesized by the proposed system runs on Microsoft HoloLens; the user can freely place it in the real world and watch it from any viewpoint.
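The homography step described above can be sketched with OpenCV; the four clicked points and their real-world ground-plane coordinates below are hypothetical stand-ins for the system's minimal user interaction.

```python
import numpy as np
import cv2

# Four ground-plane correspondences from minimal user interaction
# (pixel coordinates -> real-world metres); values are made up.
image_pts = np.float32([[320, 700], [960, 690], [900, 400], [380, 410]])
world_pts = np.float32([[0.0, 0.0], [5.0, 0.0], [5.0, 8.0], [0.0, 8.0]])

H, _ = cv2.findHomography(image_pts, world_pts)

# A detected person's foot point in the image (e.g. bottom-centre of the
# bounding box) maps to a pseudo-3D position on the ground plane.
foot = np.float32([[[640, 650]]])                    # shape (1, 1, 2)
ground_xy = cv2.perspectiveTransform(foot, H)[0, 0]  # (x, y) in metres
print("pseudo-3D ground position:", ground_xy)
```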
Backbone can not be trained at once: rolling back to pre-trained network for p... (NAVER Engineering)
This document discusses a technique called "rolling back" to pre-trained networks for improving person re-identification (ReID) in deep learning models. ReID aims to match images of the same person across non-overlapping camera views. The technique involves fine-tuning a pre-trained convolutional neural network on a ReID dataset, but periodically rolling back higher-level layers to their original pre-trained weights to allow lower-level layers to train more. This incremental rolling back approach leads to better generalization performance compared to standard fine-tuning, achieving state-of-the-art results on ReID benchmarks without using additional data or model structures.
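A minimal sketch of the rolling-back schedule in PyTorch, assuming a torchvision ResNet backbone; which layers count as "high-level" and the rollback period are illustrative choices, not the paper's exact recipe.

```python
import copy
import torchvision.models as models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
pretrained_state = copy.deepcopy(model.state_dict())

# Treat layer4 and the classifier head as the "high-level" part that is
# periodically rolled back to its ImageNet-pretrained weights.
HIGH_LEVEL_PREFIXES = ("layer4", "fc")

def roll_back(model, pretrained_state, prefixes=HIGH_LEVEL_PREFIXES):
    """Restore high-level layers to their pre-trained weights so that
    lower layers keep adapting while upper layers restart."""
    state = model.state_dict()
    for name, tensor in pretrained_state.items():
        if name.startswith(prefixes):
            state[name] = tensor.clone()
    model.load_state_dict(state)

# Hypothetical schedule: roll back every 10 epochs of fine-tuning.
for epoch in range(30):
    # train_one_epoch(model, reid_loader)  # fine-tuning step (omitted)
    if epoch > 0 and epoch % 10 == 0:
        roll_back(model, pretrained_state)
```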
Seminar presentation about:
Automatic Image Annotation structures, shallow and deep;
pros and cons of different features and classification methods in AIA; and
useful information about databases, toolboxes, and authors.
STEP is a new framework for video action detection that uses progressive learning with spatial refinement and temporal extension. It aims to effectively model temporal information while efficiently detecting actions using a small number of proposals. The approach starts with initial proposals and refines their spatial boundaries and temporally extends the tubelets in progressive steps. Experiments on UCF101-24 and AVA datasets show it achieves state-of-the-art performance using only 11 proposals, demonstrating its efficiency. Ablation studies validate the importance of temporal modeling and adaptive temporal extension.
Modeling perceptual similarity and shift invariance in deep networks (NAVER Engineering)
Abstract: While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions, and fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on ImageNet classification have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new dataset of human perceptual similarity judgments. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by large margins on our dataset. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.
Despite their strong transfer performance, deep convolutional representations surprisingly lack a basic low-level property -- shift-invariance, as small input shifts or translations can cause drastic changes in the output. Commonly used downsampling methods, such as max-pooling, strided-convolution, and average-pooling, ignore the sampling theorem. The well-known signal processing fix is anti-aliasing by low-pass filtering before downsampling. However, simply inserting this module into deep networks degrades performance; as a result, it is seldom used today. We show that when integrated correctly, it is compatible with existing architectural components, such as max-pooling and strided-convolution. We observe increased accuracy in ImageNet classification, across several commonly-used architectures, such as ResNet, DenseNet, and MobileNet, indicating effective regularization. Furthermore, we observe better generalization, in terms of stability and robustness to input corruptions. Our results demonstrate that this classical signal processing technique has been undeservedly overlooked in modern deep networks.
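A simplified stand-in for the anti-aliased downsampling described above (blur with a fixed binomial low-pass kernel, then subsample), written in PyTorch; the kernel size and placement are illustrative, not the paper's exact BlurPool.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurPool2d(nn.Module):
    """Anti-aliased downsampling: low-pass filter with a fixed binomial
    kernel, then subsample with the given stride. Simplified sketch."""
    def __init__(self, channels: int, stride: int = 2):
        super().__init__()
        k = torch.tensor([1., 2., 1.])
        kernel = torch.outer(k, k)
        kernel = (kernel / kernel.sum())[None, None]     # (1, 1, 3, 3)
        # One identical low-pass filter per channel (depthwise conv).
        self.register_buffer("kernel", kernel.repeat(channels, 1, 1, 1))
        self.stride = stride
        self.channels = channels

    def forward(self, x):
        x = F.pad(x, (1, 1, 1, 1), mode="reflect")
        return F.conv2d(x, self.kernel, stride=self.stride,
                        groups=self.channels)

# Anti-aliased replacement for MaxPool2d(kernel_size=2, stride=2):
# take the dense max (stride 1), then blur-and-subsample.
pool = nn.Sequential(nn.MaxPool2d(2, stride=1), BlurPool2d(channels=64))
out = pool(torch.randn(1, 64, 32, 32))  # -> (1, 64, 16, 16)
```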
These slides discuss some milestone results in image classification using deep convolutional neural networks and present our results on obscenity detection in images using deep convolutional neural networks and transfer learning on ImageNet models.
This document discusses deep learning techniques for person re-identification. It begins with an overview of supervised and unsupervised person re-identification. It then discusses the challenges of annotation cost and data size for re-ID. Next, it covers active learning approaches for person re-ID using human-in-the-loop feedback to incrementally train models. Finally, it discusses relationships between person re-ID and attribute learning, person detection, and multi-target multi-camera tracking.
Face Detection System on AdaBoost Algorithm Using Haar Classifiers (IJMER)
This paper presents a hardware architecture for real-time face detection using the AdaBoost algorithm and Haar features. The architecture generates integral images and classifies sub-windows using optimized parallel processing. It was designed in Verilog HDL and implemented on an FPGA. The measured performance showed a 35x speed-up over a software implementation on a general-purpose processor. Key aspects of the architecture include optimized generation of integral images, parallel classification with multiple Haar classifiers, and scalability to configurable devices.
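For context, the integral image that the hardware pipeline generates lets any rectangular sum, and hence any Haar feature, be evaluated in constant time from four lookups. A NumPy sketch of that data structure (names are illustrative):

```python
import numpy as np

def integral_image(img: np.ndarray) -> np.ndarray:
    """ii[y, x] = sum of img[:y, :x]; zero-padded so rectangle lookups
    need no bounds checks."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, y, x, h, w):
    """Sum of pixels in the h-by-w rectangle with top-left corner (y, x),
    computed in constant time from four lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

img = np.arange(16, dtype=np.int64).reshape(4, 4)
ii = integral_image(img)
assert rect_sum(ii, 1, 1, 2, 2) == img[1:3, 1:3].sum()
# A two-rectangle Haar feature is just the difference of two such sums.
```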
[CVPR2020] Simple but effective image enhancement techniques (JaeJun Yoo)
The document discusses several image enhancement techniques:
1. WCT2, which uses wavelet transforms for photorealistic style transfer, achieving faster and lighter models than previous techniques.
2. CutBlur, a new data augmentation method that improves performance on super-resolution and other low-level vision tasks by cutting a patch from an image and pasting in its low-resolution (blurred) counterpart, and vice versa (a minimal sketch follows this list).
3. SimUSR, a simple but strong baseline for unsupervised super-resolution that achieves state-of-the-art results using only a single low-resolution image during training.
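A minimal sketch of the CutBlur idea referenced in point 2, assuming the HR image and its bicubic-upsampled LR counterpart have the same shape; the patch size and application probability are illustrative.

```python
import numpy as np

def cutblur(hr: np.ndarray, lr_up: np.ndarray, prob: float = 0.7) -> np.ndarray:
    """Replace a random patch of the HR image with the corresponding patch
    of the upsampled LR image (the reverse direction also works).
    hr and lr_up must have the same shape, e.g. (H, W, C)."""
    if np.random.rand() >= prob:
        return hr
    h, w = hr.shape[:2]
    ch, cw = int(h * 0.5), int(w * 0.5)          # patch size (illustrative)
    cy = np.random.randint(0, h - ch + 1)
    cx = np.random.randint(0, w - cw + 1)
    out = hr.copy()
    out[cy:cy + ch, cx:cx + cw] = lr_up[cy:cy + ch, cx:cx + cw]
    return out

hr = np.random.rand(64, 64, 3)
lr_up = np.random.rand(64, 64, 3)  # stands in for a bicubic-upsampled LR image
augmented = cutblur(hr, lr_up)
```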
Color based image processing, tracking and automation using MATLAB (Kamal Pradhan)
Image processing is a form of signal processing in which the input is an image, such as a photograph or video frame. The output of image processing may be either an image or a set of characteristics or parameters related to the image. Most image-processing techniques involve treating the image as a two-dimensional signal and applying standard signal-processing techniques to it. This project aims at processing real-time images captured by a webcam for motion detection, color recognition, and system automation using MATLAB programming.
In color-based image processing we work with colors instead of objects. Color provides powerful information for object recognition. A simple and effective recognition scheme is to represent and match images on the basis of color histograms.
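The project itself is in MATLAB; an equivalent Python/OpenCV sketch of the color-histogram matching scheme (file names and the match threshold are placeholders):

```python
import cv2

def hue_histogram(bgr_image, bins=32):
    """Normalized hue histogram, a compact color signature for matching."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0], None, [bins], [0, 180])
    return cv2.normalize(hist, hist).flatten()

# Compare a query object against a reference by histogram correlation.
ref = hue_histogram(cv2.imread("reference_object.png"))
query = hue_histogram(cv2.imread("webcam_frame.png"))
similarity = cv2.compareHist(ref, query, cv2.HISTCMP_CORREL)
print("match" if similarity > 0.8 else "no match", similarity)
```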
Tracking refers to detecting the path of the color: once the color-based processing is done, the color becomes the object to be tracked, which can be very helpful for security purposes.
Automation refers to an automated system, i.e., any system that does not require human intervention. In this project I have automated the mouse so that it works with our gestures and performs the desired tasks.
Generative adversarial networks (GANs) show promise for enhancing computer vision in low visibility conditions. GANs can learn to translate images from low visibility domains like hazy or low-light conditions to clear images without paired training data. Recent work has incorporated hyperspectral guidance to improve image-to-image translation for tasks like dehazing. A domain-aware model was proposed to address the distributional discrepancy between RGB and hyperspectral images. Additionally, optimizing the spectral profile in translation helps mitigate spectral aberrations in results. These techniques push the limits of machine learning for analyzing visual data in challenging conditions with applications like autonomous vehicles and medical imaging.
IRJET- Concepts, Methods and Applications of Neural Style Transfer: A Rev... (IRJET Journal)
This document summarizes a research article that reviews concepts, methods, and applications of neural style transfer. It begins by defining neural style transfer as a technique that allows copying the style of one image and applying it to the content of another image. It then reviews relevant literature on neural style transfer and identifies gaps. Applications discussed include artistic image generation, data augmentation, and potentially machine creativity. The document outlines an implementation of neural style transfer using deep convolutional networks and analyzes results. It concludes that neural style transfer can provide insights into human visual perception and has promising applications.
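Implementations of neural style transfer typically match Gram-matrix statistics of convolutional features; a minimal sketch of that style loss, under the assumption that the review's implementation follows the standard Gatys-style formulation:

```python
import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    """Channel-by-channel correlations of a conv feature map (B, C, H, W);
    the Gram matrix summarizes style while discarding spatial layout."""
    b, c, h, w = features.shape
    flat = features.view(b, c, h * w)
    return flat @ flat.transpose(1, 2) / (c * h * w)

def style_loss(generated_feats, style_feats):
    return torch.mean((gram_matrix(generated_feats) - gram_matrix(style_feats)) ** 2)

# Content loss is a plain MSE between feature maps; the total objective is
# content_loss + weight * style_loss, optimized over the image pixels.
gen = torch.randn(1, 64, 32, 32, requires_grad=True)
sty = torch.randn(1, 64, 32, 32)
loss = style_loss(gen, sty)
loss.backward()
```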
Review: Structure Boundary Preserving Segmentation for Medical Image with Am... (Dongmin Choi)
Paper title : Structure Boundary Preserving Segmentation for Medical Image with Ambiguous Boundary (CVPR2020)
Paper link : https://openaccess.thecvf.com/content_CVPR_2020/papers/Lee_Structure_Boundary_Preserving_Segmentation_for_Medical_Image_With_Ambiguous_Boundary_CVPR_2020_paper.pdf
In recent times, steganography techniques have been broadly used for secret data communication. Steganography is the art of hiding secret data in other objects, such as images, videos, graphics, and documents, to obtain a stego (steganographic) object that is not visibly affected by the insertion. In this paper, we introduce a new methodology in which the security of the stego-image is increased by embedding the even and odd parts of the secret image into the R, G, and B planes of the cover image using LSB and ISB techniques. As shown in the results section, the values of PSNR and NCC increase while the value of MSE decreases.
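As background for the method, plain LSB embedding, the primitive that the even/odd R-G-B scheme builds on, looks like this in NumPy:

```python
import numpy as np

def embed_lsb(cover: np.ndarray, secret_bits: np.ndarray) -> np.ndarray:
    """Write one secret bit into the least significant bit of each pixel
    (cover: uint8 array; secret_bits: 0/1 uint8 array, one bit per pixel)."""
    flat = cover.flatten()
    stego = flat.copy()
    n = secret_bits.size
    stego[:n] = (flat[:n] & 0xFE) | secret_bits   # clear LSB, then set it
    return stego.reshape(cover.shape)

def extract_lsb(stego: np.ndarray, n_bits: int) -> np.ndarray:
    return stego.flatten()[:n_bits] & 1

cover = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
bits = np.random.randint(0, 2, 16).astype(np.uint8)
stego = embed_lsb(cover, bits)
assert np.array_equal(extract_lsb(stego, 16), bits)
# Maximum per-pixel change is 1, which is why PSNR stays high and MSE low.
```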
IRJET- Transformation of Realistic Images and Videos into Cartoon Images and ... (IRJET Journal)
This document summarizes research on using a Generative Adversarial Network (GAN) called CartoonGAN to transform real-world images and videos into cartoon images and videos. The researchers trained CartoonGAN on 3000 real-world images to learn how to generate cartoon images using content and adversarial loss functions. They were able to successfully convert both individual images and video clips into cartoon/animated versions. For video, they used the OpenCV library to divide videos into frames, pass each frame through the trained CartoonGAN model, and then recombine the cartoonized frames into an output cartoon video. The researchers concluded that CartoonGAN is an effective method for automatically transforming real media into cartoons and aims to improve the quality and resolution of the output.
Image Maximization Using Multi Spectral Image Fusion Technique (dbpublications)
This paper reports a detailed study of a set of image fusion algorithms and their implementation. It explains the theory and implementation of the fusion algorithms and presents the experimental results, and the algorithms are evaluated using several image quality metrics. In this study, two image fusion techniques, principal component analysis (PCA) and wavelet transform (WT), were applied to combine low-spatial-resolution hyperspectral satellite images with images of high spatial but low spectral resolution, to obtain a fused image with increased spatial resolution while preserving as much spectral information as possible. MATLAB is used to build the GUI that applies the image fusion algorithms and renders their results. Subjective (visual) and objective evaluations of the fused images were carried out to assess the success of each method. The objective evaluation measures include the correlation coefficient (CC), root mean square error (RMSE), and relative global dimensional synthesis error (ERGAS). The results show that the PCA method performs better at preserving spectral information but is less successful in increasing spatial resolution, while WT, performed after an IHS transformation, improves spatial resolution while better preserving spectral information.
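Two of the objective measures named above are easy to state precisely; a NumPy sketch (ERGAS is omitted, since it additionally needs band means and the resolution ratio):

```python
import numpy as np

def correlation_coefficient(a: np.ndarray, b: np.ndarray) -> float:
    """CC between a fused band and a reference band; 1.0 is a perfect match."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

def rmse(a: np.ndarray, b: np.ndarray) -> float:
    """Root mean square error between fused and reference bands."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))

reference = np.random.rand(128, 128)             # placeholder bands
fused = reference + 0.01 * np.random.randn(128, 128)
print(correlation_coefficient(reference, fused), rmse(reference, fused))
```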
This project aims to present a new method for recognizing pedestrians in image datasets. The method combines two state-of-the-art object recognition techniques: the cortex model and bag of words with spatial pyramid. The cortex model imitates the hierarchical structure of the primate visual cortex. This project improves the cortex model by training on both positive and negative examples. It also uses the outputs of the first cortex model layer instead of SIFT features. The results show this new method performs better than the cortex model alone for pedestrian recognition.
This document provides an introduction to computer vision. It summarizes the state of the field, including popular challenges like PASCAL VOC and SRVC. It describes commonly used algorithms like SIFT for feature extraction and bag-of-words models. It also discusses machine learning methods applied to computer vision like support vector machines, randomized forests, boosting, and Viola-Jones face detection. Examples of results from applying these techniques to object classification problems are also provided.
Road signs detection using Viola-Jones algorithm with the help of OpenCV (MohdSalim34)
This document provides an introduction and overview of a project to develop an automatic road sign detection system using the Viola-Jones object detection framework. It discusses the motivation for the project: addressing safety concerns from drivers missing road signs. The document outlines the contributions of the project, which are to train a classifier using OpenCV to detect German road signs in images by implementing the Viola-Jones algorithm. It also provides details on the Viola-Jones algorithm, which combines Haar features, integral images, AdaBoost training, and cascaded classifiers to rapidly detect objects in real time.
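Running a trained Viola-Jones cascade takes only a few lines of OpenCV. The sketch below loads the stock frontal-face cascade bundled with OpenCV, since the project's own road-sign cascade file is not distributed here; file names are placeholders.

```python
import cv2

# Any Viola-Jones cascade XML works here; OpenCV ships face cascades,
# while the project would load its own trained road-sign cascade instead.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("street_scene.jpg")           # placeholder file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
detections = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in detections:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", img)
```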
Leveraging Deep Learning Representation for search-based Image Annotation (mahyamk)
The document presents a method for leveraging deep learning representations for search-based image annotation. The proposed method uses convolutional neural network (CNN) features extracted from pre-trained models to represent images. These features are then used for tag assignment through a nearest neighbor search. The method is evaluated on several datasets and achieves better performance than previous approaches, demonstrating the advantage of using rich CNN representations for image annotation. Experimental results show improved precision, recall, and F1 scores over existing methods.
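A minimal sketch of the retrieval step under stated assumptions: a torchvision ResNet with the classifier removed acts as the CNN feature extractor, and tags would be borrowed from the nearest neighbors found by cosine similarity.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Pretrained CNN with the classification head removed -> 2048-D features.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()
backbone.eval()

@torch.no_grad()
def extract(batch: torch.Tensor) -> torch.Tensor:
    feats = backbone(batch)                       # (N, 2048)
    return nn.functional.normalize(feats, dim=1)  # unit length for cosine search

# Hypothetical annotated database of 100 images and one query image
# (random tensors stand in for preprocessed 224x224 images).
database = extract(torch.randn(100, 3, 224, 224))
query = extract(torch.randn(1, 3, 224, 224))

similarity = query @ database.T                   # cosine similarities
top_k = similarity.topk(5).indices[0]             # 5 nearest neighbors
print("borrow tags from database images:", top_k.tolist())
```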
Strategy for Foreground Movement Identification Adaptive to Background Variat... (IJECEIAES)
Video processing has gained significance because of its applications in various areas of research, including monitoring movements in public places for surveillance. Video sequences from standard datasets such as I2R, CAVIAR, and UCSD are often used for video processing applications and research. Identifying actors as well as their movements in video sequences must be accomplished with both static and dynamic backgrounds. The significance of research in video processing lies in identifying the foreground movement of actors and objects in video sequences. Foreground identification can be done with a static or dynamic background, but it becomes complex when detecting movements in video sequences with a dynamic background. For identifying foreground movement in video sequences with a dynamic background, two algorithms are proposed in this article, termed Frame Difference between Neighboring Frames using Hue, Saturation and Value (FDNF-HSV) and Frame Difference between Neighboring Frames using Greyscale (FDNF-G). With regard to F-measure, recall, and precision, the proposed algorithms are evaluated against state-of-the-art techniques, and the results show enhanced performance.
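The greyscale variant (FDNF-G) reduces to thresholded differences between neighboring frames; a simplified OpenCV sketch, with an arbitrary threshold and a placeholder video path:

```python
import cv2

cap = cv2.VideoCapture("surveillance.avi")   # placeholder video path
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Absolute difference between neighboring frames, then threshold:
    diff = cv2.absdiff(gray, prev_gray)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=2)  # fill small holes
    # 'mask' now marks foreground movement for this pair of frames.
    prev_gray = gray
cap.release()
```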
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/may-2015-embedded-vision-summit-baidu
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Dr. Ren Wu, former distinguished scientist at Baidu's Institute of Deep Learning (IDL), presents the keynote talk, "Enabling Ubiquitous Visual Intelligence Through Deep Learning," at the May 2015 Embedded Vision Summit.
Deep learning techniques have been making headlines lately in computer vision research. Using techniques inspired by the human brain, deep learning employs massive replication of simple algorithms which learn to distinguish objects through training on vast numbers of examples. Neural networks trained in this way are gaining the ability to recognize objects as accurately as humans.
Some experts believe that deep learning will transform the field of vision, enabling the widespread deployment of visual intelligence in many types of systems and applications. But there are many practical problems to be solved before this goal can be reached. For example, how can we create the massive sets of real-world images required to train neural networks? And given their massive computational requirements, how can we deploy neural networks into applications like mobile and wearable devices with tight cost and power consumption constraints?
In this talk, Ren shares an insider’s perspective on these and other critical questions related to the practical use of neural networks for vision, based on the pioneering work being conducted by his former team at Baidu.
Note 1: Regarding the ImageNet results included in this presentation, the organizers of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) have said: “Because of the violation of the regulations of the test server, these results may not be directly comparable to results obtained and reported by other teams.” (http://www.image-net.org/challenges/LSVRC/announcement-June-2-2015)
Note 2: The presenter, Ren Wu, has told the Embedded Vision Alliance that “There was some ambiguity with the rules. According to the ‘official’ interpretation of the rules, there should be no more than 52 submissions within a half year. For us, we achieved the reported results after 200 tests total within a half year. We believe there is no way to obtain any measurable gains, nor did we try to obtain any gains, from an 'extra' hundred tests as our networks have billions of parameters and are trained by tens of billions of training samples.”
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z... (Maurice Nsabimana)
Volunteers around the world increasingly act as human sensors to collect millions of data points. A team from the World Bank trained deep learning models, using Apache Spark and BigDL, to confirm that photos gathered through a crowdsourced data collection pilot matched the goods for which observations were submitted.
In this talk, Maurice Nsabimana, a statistician at the World Bank, and Jiao Wang, a software engineer on the Big Data Technology team at Intel, demonstrate a collaborative project to design and train large-scale deep learning models using crowdsourced images from around the world. BigDL is a distributed deep learning library designed from the ground up to run natively on Apache Spark. It enables data engineers and scientists to write deep learning applications in Scala or Python as standard Spark programs, without having to explicitly manage distributed computations. Attendees of this session will learn how to get started with BigDL, which runs in any Apache Spark environment, whether on-premises or in the cloud.
This project contains an application to empower Industry 4.0 in Pakistan through computer vision techniques and approaches; a smart environment is to be created using the different techniques shown in the slides. Do get in touch for more details.
How to use transfer learning to bootstrap image classification and question a... (Wee Hyong Tok)
1. The presentation discusses how to use transfer learning to bootstrap image classification and question answering tasks. Transfer learning allows leveraging knowledge from existing models trained on large datasets and applying it to new tasks with less data.
2. For image classification, the presentation recommends using features from convolutional neural networks pretrained on ImageNet as general-purpose image features. Fine-tuning the top layers of these networks on smaller datasets can achieve good accuracy (a minimal sketch follows this list).
3. For natural language processing tasks, transfer learning techniques like using pretrained word embeddings, language models like ULMFiT and ELMo, and models trained on question answering datasets can help bootstrap tasks with less text data.
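A minimal sketch of the image-classification case from point 2, assuming torchvision: freeze the ImageNet-pretrained backbone and retrain only the replaced head on the new, smaller dataset.

```python
import torch.nn as nn
import torchvision.models as models

num_classes = 10  # hypothetical new task

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for param in model.parameters():
    param.requires_grad = False          # freeze the ImageNet features

# Replace and train only the classification head.
model.fc = nn.Linear(model.fc.in_features, num_classes)

trainable = [p for p in model.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable), "trainable parameters")
# optimizer = torch.optim.Adam(trainable, lr=1e-3)  # then train as usual
```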
This document provides an overview of deep generative models for images. It discusses generative adversarial networks (GANs) which define generative modeling as an adversarial game between a generator and discriminator. Conditional GANs can generate images from text or translate between image domains. Variational autoencoders (VAEs) learn latent representations of the data. Fully convolutional models use transposed convolutions in the decoder. CycleGAN can perform unpaired image-to-image translation using cycle consistency losses. Overall, generative models aim to understand data distributions in order to generate new, realistic samples.
HDF-EOS has been used extensively in the development of geospatial data web services and earth science data distribution systems at the CSISS center. Several popular open-source web application servers, e.g. Tomcat, are based on Java technology. Therefore, a suite of Java interfaces for calling the HDF-EOS C library has been developed to facilitate programming. JNI (Java Native Interface) is used to bridge the C library and the Java hierarchical wrap-up. In terms of implementation, all HDF-EOS 2.12 interfaces have been built for Java programming, and those for HDF5-EOS are still under development.
Next, objects, e.g. grid, field, and band, are developed hierarchically on top of these Java interfaces. Many of the conversion considerations for accommodating the different data types between C and Java are similar to those encountered in the HDF Java product.
This document discusses various computer vision topics including convolutional neural networks (CNNs), popular CNN architectures, data augmentation, transfer learning, object detection, neural style transfer, generative adversarial networks (GANs), and variational autoencoders (VAEs). It provides overviews and explanations of each topic with examples. The goals are to introduce new concepts, discuss practical use cases, develop and improve intuitions, and provide tips for working on projects and participating in the community.
The document discusses efficient image processing techniques for Android, focusing on the RenderScript framework. It provides an overview of RenderScript, how to write kernels in C, and how to call them from Java. Examples are given for common image processing tasks like grayscale conversion, bloom effects, and local adjustments. Caching strategies and handling low-memory devices are also covered to ensure performance across all hardware.
Transfer learning enables you to use pretrained deep neural networks trained on various large datasets (ImageNet, CIFAR, WikiQA, SQUAD, and more) and adapt them for various deep learning tasks (e.g., image classification, question answering, and more).
Wee Hyong Tok and Danielle Dean share the basics of transfer learning and demonstrate how to use the technique to bootstrap the building of custom image classifiers and custom question-answering (QA) models. You’ll learn how to use the pretrained CNNs available in various model libraries to custom build a convolution neural network for your use case. In addition, you’ll discover how to use transfer learning for question-answering tasks, with models trained on large QA datasets (WikiQA, SQUAD, and more), and adapt them for new question-answering tasks.
Topics include:
An introduction to convolution neural networks and question-answering problems
Using pretrained CNNs and the last fully connected layer as a featurizer (Once the features are extracted, any existing classifier can be used for image classification, using the extracted features as inputs.)
Fine-tuning the pretrained models and adapting them for the new images
Using pretrained QA models trained on large QA datasets (WikiQA, SQUAD) and applying transfer learning for QA tasks
Deep learning techniques can be used to learn features from data rather than relying on hand-crafted features. This allows neural networks to be applied to problems in computer vision, natural language processing, and other domains. Transfer learning techniques take advantage of features learned from one task and apply them to another related task, even when limited data is available for the second task. Deploying machine learning models in production requires techniques for serving predictions through scalable APIs and caching layers to meet performance requirements.
Learn to Build an App to Find Similar Images using Deep Learning (Piotr Teterwak, PyData)
This document discusses using deep learning and deep features to build an app that finds similar images. It begins with an overview of deep learning and how neural networks can learn complex patterns in data. The document then discusses how pre-trained neural networks can be used as feature extractors for other domains through transfer learning. This reduces data and tuning requirements compared to training new deep learning models. The rest of the document focuses on building an image similarity service using these techniques, including training a model with GraphLab Create and deploying it as a web service with Dato Predictive Services.
Let's paint a Picasso - A Look at Generative Adversarial Networks (GAN) and i... (Catalina Arango)
This document provides an overview of Generative Adversarial Networks (GANs) and their applications. It explains the basic concepts of GANs including how they use generative and discriminative neural networks in an adversarial game-theory framework to generate new realistic data. Several types and applications of GANs are described, such as using GANs to generate images conditioned on text, edit images while preserving realism, and generate images of human poses. Challenges with GANs and potential future applications are also discussed.
This document discusses techniques for fine-tuning large pre-trained language models without access to a supercomputer. It describes the history of transformer models and how transfer learning works. It then outlines several techniques for reducing memory usage during fine-tuning, including reducing batch size, gradient accumulation, gradient checkpointing, mixed precision training, and distributed data parallelism approaches like ZeRO and pipelined parallelism. Resources for implementing these techniques are also provided.
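Two of the listed techniques, gradient accumulation and mixed-precision training, combine naturally in one loop. A hedged PyTorch AMP sketch; the accumulation factor of 8 is an arbitrary example:

```python
import torch

scaler = torch.cuda.amp.GradScaler()
accum_steps = 8   # effective batch = micro-batch size * accum_steps

def train_epoch(model, loader, optimizer, loss_fn):
    optimizer.zero_grad()
    for step, (x, y) in enumerate(loader):
        x, y = x.cuda(), y.cuda()
        with torch.cuda.amp.autocast():       # fp16 forward pass
            loss = loss_fn(model(x), y) / accum_steps
        scaler.scale(loss).backward()         # accumulate scaled gradients
        if (step + 1) % accum_steps == 0:     # update only every accum_steps
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad()
```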
This document summarizes the DiscoGAN model, which uses generative adversarial networks to discover relations between image domains without paired training examples. It introduces GANs and the DiscoGAN model, which uses two generators and discriminators with reconstruction and adversarial losses to learn bijective mappings between domains. Experiments show DiscoGAN can discover relations like azimuth angle between car images and translate attributes like gender between faces while maintaining other features. Code links for TensorFlow and PyTorch implementations are also provided.
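The reconstruction losses that enforce DiscoGAN's bijective mappings are compact enough to sketch directly; the adversarial terms are elided here, and the generator interfaces are assumptions:

```python
import torch.nn.functional as F

def disco_recon_losses(G_ab, G_ba, x_a, x_b):
    """Cycle reconstruction: A -> B -> A (and B -> A -> B) should
    recover the original image, which pushes the two mappings toward
    being inverses of each other."""
    recon_a = G_ba(G_ab(x_a))
    recon_b = G_ab(G_ba(x_b))
    return F.mse_loss(recon_a, x_a) + F.mse_loss(recon_b, x_b)
```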
2. Abstract
• Traditional I2I translation
✓ Trains on data from two or more domains together
✓ Requires large amounts of computational resources
✓ Produces lower-quality results with many artifacts
✓ Training can be unstable when the data in different domains are not balanced
✓ Mode collapse is more likely to happen
• Proposed: a new I2I translation method
✓ Generates a new model in the target domain via a series of model transformations on a pretrained StyleGAN2 model in the source domain
✓ Also proposes an inversion method for finding the latent code of an input image
3. Related Works
Image Translation taxonomy
• Paired
✓ Unimodal: Pix2Pix, Pix2PixHD
✓ Multi-modal: BicycleGAN
• Unpaired
✓ Unimodal: CycleGAN, DiscoGAN, UNIT
✓ Multi-modal: MUNIT, DRIT
✓ Multi-domain: StarGAN
✓ Multi-mapping: DRIT++, DMIT, SDIT, StarGAN v2
6. Related Works-Multi-modal Translation
• MUNIT (ECCV 2018)
• DRIT (ECCV 2018)
Lee, Hsin-Ying, et al. "Diverse Image-to-Image Translation via Disentangled Representations." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
7. Related Works-Multi-mapping Translation
• DRIT++ (arXiv 2019)
Lee, Hsin-Ying, et al. "DRIT++: Diverse Image-to-Image Translation via Disentangled Representations." arXiv preprint arXiv:1905.01270 (2019).
8. Related Works-Exemplar-Guided I2I
Wang, Miao, et al. "Example-Guided Style-Consistent Image Synthesis From Semantic Labeling." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
Zhu, Zhen, et al. "Progressive Pose Attention Transfer for Person Image Generation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
9. Related Works-Current I2I
• Require online training on at least two domains
• Use separate, domain-specific generators and discriminators
• High demand on training resources in terms of both time and memory
• Mode collapse when the training data are not balanced across domains
11. Related Works-GAN Inversion Mapping
Zhu, Jiapeng, et al. "In-Domain GAN Inversion for Real Image Editing." European Conference on Computer Vision. Springer, Cham, 2020.
Goal:
• Image Editing
• Image-to-Image Translation
12. Major Contributions
• Define a distance between two models that measures the semantic similarity between two images generated by the two models from the same input latent vector (a sketch follows below)
• Propose an unsupervised I2I translation method built on a pre-trained StyleGAN2 model
• Support multi-modal and multi-domain I2I translation
• Drastically improve the results while requiring far fewer training resources
• Propose an inversion method based on an embedded GAN space that provides a boundary constraint for searching the latent code of the input image
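A hedged sketch of the model-distance idea from the first contribution: sample shared latent codes, generate with both models, and average the LPIPS distance between the paired outputs. The `lpips` package usage is real, but the generator call signature and the sample count are assumptions:

```python
import torch
import lpips

@torch.no_grad()
def model_distance(G_src, G_tgt, n=100, z_dim=512, device="cuda"):
    """Average perceptual distance between paired outputs of two models
    driven by the same latent codes (smaller = more semantically similar)."""
    metric = lpips.LPIPS(net="alex").to(device)
    total = 0.0
    for _ in range(n):
        z = torch.randn(1, z_dim, device=device)
        img_s = G_src(z)   # assumed: latent in, image in [-1, 1] out
        img_t = G_tgt(z)
        total += metric(img_s, img_t).item()
    return total / n
```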
13. Method
• Assumption: good translation results can be obtained when the model distance (the semantic gap) between the source-domain and target-domain models is small.
• Workflow (a sketch of the full pipeline follows below)
• Pretrain a StyleGAN2 model on the source-domain dataset Ds
• Fine-tune the target-domain model Gt from the source-domain model Gs using the target dataset Dt
• Given a source-domain image, find its latent code through the source-domain model
• Feed the recovered latent code into the fine-tuned target-domain model to generate the translated image
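The workflow condenses to two calls. Every name below is a hypothetical placeholder for the components described above:

```python
def translate(image, G_src, G_tgt, invert):
    """End-to-end translation sketch; all names are placeholders."""
    # Step 1 (inversion): search the source model's latent space for a
    # code that reconstructs the input image.
    w = invert(G_src, image)
    # Step 2 (generation): because the two models share one embedded
    # space, the same code decodes to the translated image in Gt.
    return G_tgt(w)
```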
14. Method
• Fine-tune with data in the target domain Dt
• Freeze the FC (mapping) layers during the fine-tuning process
• so that the fine-tuned model keeps the same embedded latent space as the base model (see the sketch below)
(Figure: the fine-tuning process and the resulting fine-tuned model)
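A minimal sketch of the freezing step, assuming a StyleGAN2 port whose generator exposes a `mapping` submodule (the attribute name is an assumption):

```python
import torch

def prepare_target_model(G_src, G_tgt, lr=2e-3):
    """Clone source weights and freeze the FC mapping network so both
    models keep the same embedded latent space during fine-tuning."""
    G_tgt.load_state_dict(G_src.state_dict())   # start from the source model
    for p in G_tgt.mapping.parameters():        # `mapping` name is assumed
        p.requires_grad = False
    # The optimizer sees only the unfrozen synthesis parameters.
    return torch.optim.Adam(
        (p for p in G_tgt.parameters() if p.requires_grad), lr=lr)
```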
15. Method
• During fine-tuning, the semantic similarity decreases due to domain difference.
• Layer-swapping: preserves more features of the source domain
• It reduces the model distance between the source-domain and target-domain models (a weight-swapping sketch follows below)
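A hedged sketch of layer-swapping as weight copying: the coarse-resolution synthesis layers are taken from the source model so the swapped model stays structurally close to it, shrinking the model distance. Matching parameters by resolution substrings in their names is an assumption about the checkpoint layout:

```python
def layer_swap(G_src, G_tgt, swap_res=("8x8", "16x16", "32x32")):
    """Copy coarse-resolution synthesis weights from source into target."""
    tgt_state = G_tgt.state_dict()
    for name, w_src in G_src.state_dict().items():
        # Parameters whose names carry a swapped resolution tag come
        # from the source model; everything else stays fine-tuned.
        if any(r in name for r in swap_res):
            tgt_state[name] = w_src.clone()
    G_tgt.load_state_dict(tgt_state)
    return G_tgt
```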
21. Method
• Multi-modal and Multi-domain I2I translation
• Style codes injected at the higher layers (8×8, 16×16, …) can change the major structure of the output (identity, hair style, face shape)
• Style codes applied at the lower layers modify only minor features (color, lighting conditions, and other micro-structures)
• Multi-modal (generates diverse styles; see the sketch below)
• Multi-domain (generates outputs in multiple domains)
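A sketch of per-layer style injection for the multi-modal case: keep the content code at the structure-controlling layers and randomize styles only at the remaining layers, so identity stays fixed while color and texture vary. `num_ws`, the split point, and the `mapping`/`synthesis` interfaces are assumptions:

```python
import torch

def multimodal_sample(G_tgt, w_content, n_styles=4, split=4, num_ws=14):
    """Generate several styles of one content code (interfaces assumed)."""
    outs = []
    for _ in range(n_styles):
        z = torch.randn_like(w_content)
        w_style = G_tgt.mapping(z)                 # assumed: z -> w mapping
        ws = w_content.unsqueeze(1).repeat(1, num_ws, 1)
        ws[:, split:] = w_style.unsqueeze(1)       # new style on later layers
        outs.append(G_tgt.synthesis(ws))           # assumed synthesis call
    return outs
```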
22. Experiment
• Baseline models
• CycleGAN
• MUNIT
• DRIT++
• Different scenarios
• Face2portrait
• Face2cartoon
• Face2anime
• Cat2dog
• Cat2wild
• FFHQ for face cases, AFHQ for cats, dogs and wild animals
• Portrait (WikiArt), anime (Danbooru2018), and cartoon (Toonify)
23. Experiment
• Implementation Details
• A2B translation
• Freeze the FC part of the A (source) model
• Fine-tuning: 12,000 iterations (low-resolution cases), 20,000 iterations (high-resolution cases)
• 1024×1024 (face2portrait, face2cartoon), 256×256 (face2anime)
• One 2080 Ti GPU (11 GB); fine-tuning takes about 2 days
• Kept the training strategy and loss functions the same as those in the original StyleGAN2
• Layer-swap: 0.3 ms, applied at the 8×8, 16×16, and 32×32 layers
• Inversion process: 1,000 iterations, 0.8–1 s
30. Experiment-Model Distance
[55] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. "The Unreasonable Effectiveness of Deep Features as a Perceptual Metric." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018.
32. Experiment
• 2,000 test images chosen at random
• LPIPSd (diversity): LPIPS between two randomly selected images among the generated results
• LPIPSs (semantic similarity): LPIPS between the generated image and the input (a metric sketch follows below)
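A hedged sketch of both metrics using the `lpips` package; averaging over all pairs stands in for the random pair sampling described above:

```python
import itertools
import torch
import lpips

metric = lpips.LPIPS(net="alex").cuda()

@torch.no_grad()
def lpips_d(generated):
    """Diversity: mean LPIPS over pairs of generated images (higher = more diverse)."""
    pairs = list(itertools.combinations(range(len(generated)), 2))
    return sum(metric(generated[i], generated[j]).item()
               for i, j in pairs) / len(pairs)

@torch.no_grad()
def lpips_s(inputs, outputs):
    """Semantic similarity: mean LPIPS between each input and its translation."""
    return sum(metric(x, y).item()
               for x, y in zip(inputs, outputs)) / len(inputs)
```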