GAN-based selfie-to-pokemon

Contents
• Task introduction
• Summary and introduction of related works
• Test results
• Areas worth developing further

selfie-to-pokemon
• Shape: taken from human facial landmarks
• Texture: the texture common to pokemon instances
• Output: style-transferred generated data

GAN
https://ysbsb.github.io/gan/2020/06/17/GAN-newbie-guide.html

DCGAN – insights on architecture

* Img2Img translation
• Supervised, unimodal: Pix2Pix
• Supervised, multimodal: BicycleGAN
• Unsupervised, unimodal: CycleGAN, UNIT
• Unsupervised, multimodal: StarGAN, MUNIT

cGAN
https://blog.naver.com/laonple/221306150417
• The condition y acts as a guide for the noise z
• Adjusting y generates data of the desired class
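
As a minimal sketch of how y can guide z, the condition is often one-hot encoded and concatenated with the noise before it reaches the generator. The helper names, the vector sizes, and the choice of concatenation are assumptions for illustration, not details taken from the slides.

```python
import numpy as np

def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot condition vector y."""
    y = np.zeros(num_classes, dtype=np.float32)
    y[label] = 1.0
    return y

def generator_input(z, label, num_classes):
    """Concatenate noise z with condition y (hypothetical helper).

    The joint vector is what a cGAN generator would consume."""
    return np.concatenate([z, one_hot(label, num_classes)])

rng = np.random.default_rng(0)
z = rng.standard_normal(8).astype(np.float32)  # latent noise, size 8 is arbitrary

# Same noise, different condition: only the y part of the input changes.
g_in_a = generator_input(z, label=2, num_classes=5)
g_in_b = generator_input(z, label=4, num_classes=5)
```

Keeping z fixed while changing the label is the usual way to check that the class is controlled by y alone.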

Pix2Pix – Pair image
• Learns the correspondence between paired condition images and target images
• Output image is conditional on an input
• Paired input x controls modes of G(x)

Contribution (U-GAT-IT)
• Good results even under shape shift (not only local texture shift)
• Applies AdaLIN to flexibly control shape & texture changes
• No need to change the architecture & hyperparameters per dataset
• Applies an attention module

Goal is to learn mapping G
• G translates images from source domain X_s to target domain X_t using only unpaired samples from each domain

Generator
① The attention module uses the attention map obtained from the auxiliary classifier to focus on the important regions, i.e. the differences between the source domain and the target domain.
② AdaLIN (Adaptive Layer-Instance Normalization) lets the generator handle both shape and texture (style) changes.
https://comlini8-8.tistory.com/48
Translates both S domain & T domain images into T domain images

Generator - Encoder
• Feeding S and T domain images in together produces a feature map

Generator – Auxiliary Map
• Computes the probability that the input image's feature map comes from the source domain
• Decides how much to attend to each channel depending on the domain

Generator – Auxiliary Map
• Uses CAM as the attention module
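
The CAM-style attention described in these slides can be sketched as below: the auxiliary classifier's per-channel weights score the pooled feature map, and the same weights re-scale the channels. This is a simplified illustration that uses only global average pooling; the function name, shapes, and weight values are hypothetical.

```python
import numpy as np

def cam_attention(features, w):
    """CAM-style channel attention (simplified sketch).

    features: (C, H, W) encoder feature map
    w:        (C,) auxiliary classifier weights (hypothetical values)
    """
    pooled = features.mean(axis=(1, 2))          # global average pooling -> (C,)
    logit = float(pooled @ w)                    # auxiliary classifier score
    prob_source = 1.0 / (1.0 + np.exp(-logit))   # P(input comes from source domain)
    weighted = features * w[:, None, None]       # re-weight each channel by importance
    heatmap = weighted.sum(axis=0)               # (H, W) class activation map
    return prob_source, weighted, heatmap

rng = np.random.default_rng(0)
features = rng.standard_normal((4, 6, 6))
w = np.array([2.0, 0.0, -1.0, 0.5])              # channel 1 is ignored entirely
prob, weighted, heatmap = cam_attention(features, w)
```

The weighted feature map is what the decoder would consume, while the heatmap shows which spatial regions drove the classifier's decision.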

Generator - Decoder
• Generates a T domain image from the weighted feature map
• Applies AdaLIN

Generator – AdaLIN
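• Mixes Layer Norm and Instance Norm in a suitable ratio
Layer Norm: normalizes per sample in the batch (across all channels); stronger than IN at transferring detailed style, but the source shape is often not preserved
Instance Norm: normalizes per channel & per sample; tends to extract only a single style, and is strong at preserving image shape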
https://comlini8-8.tistory.com/48

Generator – AdaLIN
• Given the importance-weighted feature map, determines the ratio in which IN and LN are used
• The degree of shape change and style transfer differs per channel
https://www.slideshare.net/jungminchung/ugatit-unsupervised-generative-attentional-networks-with-adaptive-layerinstance-normalization-for-imagetoimage-translation-173206999
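
A minimal NumPy sketch of the IN/LN mixing described above, assuming the usual form rho * IN(x) + (1 - rho) * LN(x) followed by a scale gamma and shift beta. Here rho, gamma, and beta are passed in as plain per-channel vectors for simplicity; in a real model they would be learned or predicted by the network.

```python
import numpy as np

def adalin(x, rho, gamma, beta, eps=1e-5):
    """Adaptive Layer-Instance Normalization (simplified sketch).

    x: (N, C, H, W) feature map; rho, gamma, beta: (C,) per-channel vectors.
    """
    # Instance Norm: statistics per sample and per channel, over H and W.
    in_mean = x.mean(axis=(2, 3), keepdims=True)
    in_var = x.var(axis=(2, 3), keepdims=True)
    x_in = (x - in_mean) / np.sqrt(in_var + eps)

    # Layer Norm: statistics per sample, over C, H and W.
    ln_mean = x.mean(axis=(1, 2, 3), keepdims=True)
    ln_var = x.var(axis=(1, 2, 3), keepdims=True)
    x_ln = (x - ln_mean) / np.sqrt(ln_var + eps)

    # rho in [0, 1] sets the IN/LN ratio independently for each channel.
    rho = np.clip(rho, 0.0, 1.0).reshape(1, -1, 1, 1)
    mixed = rho * x_in + (1.0 - rho) * x_ln
    return gamma.reshape(1, -1, 1, 1) * mixed + beta.reshape(1, -1, 1, 1)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 4, 4))
ones, zeros = np.ones(3), np.zeros(3)

out_in = adalin(x, rho=ones, gamma=ones, beta=zeros)   # rho = 1: pure IN
out_ln = adalin(x, rho=zeros, gamma=ones, beta=zeros)  # rho = 0: pure LN
```

Setting rho to 1 or 0 recovers the two extremes, which makes the per-channel trade-off between shape preservation and style transfer easy to inspect.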

Discriminator – Auxiliary Map
• Unlike in the generator, its purpose is to distinguish real T domain images from generated images
• Guides G to focus on the regions that matter for generating images similar to real ones

Loss function
• 1) Adversarial Loss: so that the translated image lies in the T domain
• 2) Cycle Loss
• 3) Identity Loss: so that input & output have similar color tones
• 4) CAM Loss: (plays the auxiliary map role)
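
The four terms can be sketched on toy arrays as follows. The least-squares adversarial form, the binary cross-entropy CAM form, the uniform weights, and the toy generators `g_st` / `g_ts` are all assumptions for illustration; the slides do not specify them.

```python
import numpy as np

def l1(a, b):
    """Mean absolute difference between two images."""
    return float(np.mean(np.abs(a - b)))

def adversarial_loss(d_fake):
    """Least-squares form (an assumption): push D's score on fakes toward 1."""
    return float(np.mean((d_fake - 1.0) ** 2))

def cycle_loss(x_s, g_st, g_ts):
    """S -> T -> S should reconstruct the original source image."""
    return l1(g_ts(g_st(x_s)), x_s)

def identity_loss(x_t, g_st):
    """A T domain image fed to G(S->T) should come out unchanged."""
    return l1(g_st(x_t), x_t)

def cam_loss(prob_source, is_source):
    """Binary cross-entropy on the auxiliary classifier output (an assumption)."""
    p = np.clip(prob_source, 1e-7, 1.0 - 1e-7)
    return float(-(is_source * np.log(p) + (1.0 - is_source) * np.log(1.0 - p)))

def total_loss(adv, cyc, idt, cam, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four terms; the weights here are placeholders."""
    w_adv, w_cyc, w_idt, w_cam = weights
    return w_adv * adv + w_cyc * cyc + w_idt * idt + w_cam * cam

# Toy stand-ins for the two generators (hypothetical, exact inverses of each other).
g_st = lambda x: x + 0.1
g_ts = lambda x: x - 0.1

x_s = np.full((3, 4, 4), 0.5)
x_t = np.full((3, 4, 4), 0.2)

adv = adversarial_loss(np.array([0.5, 0.5]))
cyc = cycle_loss(x_s, g_st, g_ts)
idt = identity_loss(x_t, g_st)
cam = cam_loss(0.5, 1.0)
total = total_loss(adv, cyc, idt, cam)
```

Because the toy generators invert each other, the cycle term vanishes while the identity term still penalizes the constant shift applied to a T domain input.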

Dataset
• Face
• https://afad-dataset.github.io/
• Pokemon
• https://www.kaggle.com/kvpratama/pokemon-images-dataset

Limitation
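• 1) Large shape shift (landmarks are not intuitive)
• 2) Insufficient understanding of the T domain distribution (small dataset)
• 3) Needs something better than CAM (attention is not perfect)
• = Rather than using the face landmarks, the model seems to collapse them and merely generate something new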

Beyond Img2img translation
• 1) Recast the task as a selfie vs. pokemon retrieval problem instead
• 2) A sketch to pokemon cGAN model
• 3) VAE-based style transfer
• (i) Map both domains into a joint embedding space
• (ii) Extract the style s(x) from selfie x (via disentanglement)
• (iii) (As in a VAE) sample a similar z from the joint embedding of x and generate from it
• (iv) Style-transfer the generated pokemon G(z) with s(x)
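
For the retrieval idea in 1), a minimal sketch is nearest-neighbor search by cosine similarity, assuming selfies and pokemon have already been mapped into a joint embedding space as in step (i). The embedding values and the function name are made up for illustration.

```python
import numpy as np

def cosine_retrieve(query, gallery, k=1):
    """Return indices and scores of the k gallery rows most similar to query."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q                      # cosine similarity to every gallery item
    order = np.argsort(-sims)[:k]     # highest similarity first
    return order, sims[order]

# Hypothetical joint embeddings: one row per pokemon image.
pokemon_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
])
selfie_embedding = np.array([0.9, 0.1, 0.0])

idx, scores = cosine_retrieve(selfie_embedding, pokemon_embeddings, k=2)
```

The same lookup could also serve step (iii), where a sample z close to the embedding of x is needed before generation.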
