5. There are many image translation techniques, but...
Generative Adversarial Networks(GANs)
Artistic/Photo Style Transfer
Deep Image Analogy
Variational Autoencoder (VAE)
Image Warping / Morphing
Texture Synthesis
…
26. Multi-modal Translation
MUNIT (ECCV2018)
Huang, Xun, et al. "Multimodal unsupervised image-to-image translation." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
DRIT (ECCV2018)
Lee, Hsin-Ying, et al. "Diverse image-to-image translation via disentangled representations." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
35. Task taxonomy (figure): Style Transfer / Object Transfiguration with Geometry Change / Object Transfiguration without Geometry Change, each on Aligned Data vs. Unaligned Data (In the Wild)
36.–37. (Same taxonomy figure) Works best!
38.–39. (Same figure) The background color does change, but it still works fairly well.
40. (Same figure) Works well on Aligned Data, but fails on Wild Images.
41.–42. (Same figure) There are also problems that mix these categories.
48. Attention GAN
Chen, Xinyuan, et al. "Attention-GAN for object transfiguration in wild images." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
50. TransGaGa
Wu, Wayne, et al. "Transgaga: Geometry-aware unsupervised image-to-image translation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
63. U-GAT-IT:
Unsupervised Generative ATtentional Networks with Adaptive
Layer-Instance Normalization for Image-to-Image Translation
Junho Kim, MinJae Kim, Hyeonwoo Kang, Kwang Hee Lee
https://github.com/taki0112/UGATIT
https://arxiv.org/abs/1907.10830
65. Summary
- Proposes a novel I2I translation model with an Attention module and AdaLIN.
- Attention module: distinguishes between the source and target domains
- Guides the translation to focus on the more important regions
- AdaLIN: flexibly controls the amount of change in shape and texture without tuning the model architecture or hyperparameters
- Learnable normalization function
66. Background: Class Activation Mapping (CAM)
Zhou, Bolei, et al. "Learning deep features for discriminative localization." Proceedings of
the IEEE conference on computer vision and pattern recognition. 2016.
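The mechanism behind CAM fits in a few lines: the feature maps just before global average pooling are weighted by the final classifier's weights for the chosen class and summed, highlighting the regions that drove that class's score. A minimal NumPy sketch follows; the function name `class_activation_map` and the toy shapes are illustrative assumptions, not the paper's code.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Class Activation Mapping (Zhou et al., 2016), simplified.

    features:   (C, H, W) conv feature maps before global average pooling
    fc_weights: (num_classes, C) weights of the final linear classifier
    class_idx:  class whose discriminative regions we want to localize
    """
    w = fc_weights[class_idx]                # (C,) importance of each channel
    cam = np.tensordot(w, features, axes=1)  # (H, W) weighted sum of maps
    cam -= cam.min()                         # rescale to [0, 1] for display
    if cam.max() > 0:
        cam /= cam.max()
    return cam

# Toy example: 4 channels, 8x8 maps, a 2-class classifier head
rng = np.random.default_rng(0)
feats = rng.random((4, 8, 8))
fc_w = rng.random((2, 4))
cam = class_activation_map(feats, fc_w, class_idx=1)
```

U-GAT-IT reuses exactly this weighting idea as an attention map inside both the generator and discriminator.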
69. Background: BIN (Batch-Instance Normalization)
Nam, Hyeonseob, and Hyo-Eun Kim. "Batch-instance normalization for adaptively style-invariant neural networks." Advances in Neural Information Processing Systems. 2018.
70. Background: BIN (Batch-Instance Normalization)
Combining BN and IN well has the effect of preserving the important styles while removing the unnecessary ones.
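The combination can be sketched with NumPy: BIN interpolates between batch-normalized and instance-normalized features with a learnable per-channel gate ρ that is clipped to [0, 1] after each update. Function names and shapes below are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalize over (N, H, W) per channel; x has shape (N, C, H, W)
    mu = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def instance_norm(x, eps=1e-5):
    # Normalize over (H, W) per sample and channel
    mu = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def batch_instance_norm(x, rho, gamma, beta):
    """BIN: learnable per-channel interpolation of BN and IN."""
    rho = np.clip(rho, 0.0, 1.0)              # (C,) gate kept in [0, 1]
    mixed = (rho[:, None, None] * batch_norm(x)
             + (1 - rho[:, None, None]) * instance_norm(x))
    return gamma[:, None, None] * mixed + beta[:, None, None]

# Toy usage: batch of 2 images, 3 channels, 4x4 maps
x = np.random.default_rng(1).random((2, 3, 4, 4))
out = batch_instance_norm(x, rho=np.full(3, 0.5),
                          gamma=np.ones(3), beta=np.zeros(3))
```

Channels whose gate learns ρ → 1 keep their (style-carrying) batch statistics, while ρ → 0 channels have their instance-level style normalized away.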
71. Motivation: IN & LN
Instance Normalization
▪ Performs well mainly for image generation / style transfer.
▪ Uses channel-wise statistics.
▪ Preserves the structure of the source-domain content relatively well.
Layer Normalization
▪ Uses global statistics.
▪ Preserves the source-domain content structure less, but reflects the target-domain style better.
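The difference in statistics can be seen directly in NumPy: IN computes one (μ, σ) per channel, while LN computes a single (μ, σ) over all channels of the tensor. The array below is a toy example, not data from the paper.

```python
import numpy as np

# One feature map x of shape (C, H, W) from a single image.
x = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)

# Instance Norm: channel-wise statistics, one (mu, sigma) per channel.
in_mu = x.mean(axis=(1, 2), keepdims=True)   # shape (2, 1, 1)
in_std = x.std(axis=(1, 2), keepdims=True)
x_in = (x - in_mu) / (in_std + 1e-5)

# Layer Norm: global statistics, one (mu, sigma) over the whole tensor.
ln_mu = x.mean()
ln_std = x.std()
x_ln = (x - ln_mu) / (ln_std + 1e-5)
```

After IN every channel is zero-mean, which erases per-channel (style) differences but keeps spatial structure; after LN the channels keep their relative offsets, so content structure is held more loosely.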
72. Motivation: IN & LN
If we combine them well, as in BIN, couldn't we perform the appropriate normalization depending on the task?
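That question leads directly to AdaLIN: mix IN and LN with a learnable gate ρ, then apply an affine transform whose γ and β are produced from the style code. A minimal NumPy sketch of the idea under assumed shapes; in the paper γ and β come from fully connected layers and ρ is clipped to [0, 1] after each update.

```python
import numpy as np

def adalin(x, rho, gamma, beta, eps=1e-5):
    """AdaLIN sketch: learnable mix of Instance Norm and Layer Norm.

    x:           (C, H, W) decoder feature map for one image
    rho:         (C,) learnable gate, kept in [0, 1]
    gamma, beta: (C,) affine parameters (style-dependent in the paper)
    """
    rho = np.clip(rho, 0.0, 1.0)[:, None, None]
    # Instance Norm branch: per-channel statistics
    mu_i = x.mean(axis=(1, 2), keepdims=True)
    var_i = x.var(axis=(1, 2), keepdims=True)
    x_in = (x - mu_i) / np.sqrt(var_i + eps)
    # Layer Norm branch: statistics over all channels
    mu_l = x.mean()
    var_l = x.var()
    x_ln = (x - mu_l) / np.sqrt(var_l + eps)
    x_hat = rho * x_in + (1 - rho) * x_ln
    return gamma[:, None, None] * x_hat + beta[:, None, None]

# Toy usage: 3 channels, 5x5 feature map
rng = np.random.default_rng(2)
x = rng.random((3, 5, 5))
out = adalin(x, rho=np.full(3, 0.9), gamma=np.ones(3), beta=np.zeros(3))
```

With ρ near 1 the layer behaves like IN (structure-preserving); with ρ near 0 it behaves like LN (style-reflecting), so the trade-off is learned per task instead of hand-tuned.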
83. Conclusion and Future Work
- Proposes an Attention module and AdaLIN, a new learnable normalization technique.
- The CAM-based attention module guides the model to better change the regions where the two domains differ most.
- AdaLIN adjusts the amount of shape and style change without modifying the model or hyper-parameters.
- Future work: translation between wilder images, and multi-mapping image translation.