16. Method - Auto-Encoder
- Downsampling: CNN
- AdaIN: the parameters of the normalization layers represent styles
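The AdaIN operation above can be sketched in a few lines. This is a minimal NumPy illustration: the model applies it inside the decoder, with the scale/shift parameters produced from the style code, but the function name and array shapes here are assumptions for illustration.

```python
import numpy as np

def adain(content, gamma, beta, eps=1e-5):
    """Adaptive Instance Normalization (sketch).

    content: (C, H, W) feature map from the content encoder.
    gamma, beta: (C,) style parameters (scale and shift per channel).
    """
    # Normalize each channel over its spatial dimensions...
    mu = content.mean(axis=(1, 2), keepdims=True)
    sigma = content.std(axis=(1, 2), keepdims=True)
    normalized = (content - mu) / (sigma + eps)
    # ...then re-scale and re-shift with the style parameters.
    return gamma[:, None, None] * normalized + beta[:, None, None]
```

After AdaIN, each channel has mean ≈ beta and std ≈ gamma, so the style is carried entirely by the normalization parameters while the spatial structure comes from the content features.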
17. Method - Auto-Encoder
- Discriminator
- LSGAN objective
- multi-scale discriminators
- to learn realistic details
- to learn correct global structure
- Domain-invariant perceptual loss
- extends the perceptual loss, which normally requires a supervised (paired) setting, to the unsupervised setting
- a distance in the VGG feature space between the output and the reference image
- helps training on high-resolution images.
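The domain-invariant perceptual loss above can be sketched as follows, assuming the VGG features of the output and reference images are precomputed and passed in as arrays. Applying Instance Normalization to the features before comparing them is what removes the domain-specific mean/variance statistics; this is a hedged reconstruction, not the paper's exact code.

```python
import numpy as np

def instance_norm(feat, eps=1e-5):
    # feat: (C, H, W); normalize each channel over its spatial dimensions
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True)
    return (feat - mu) / (sigma + eps)

def domain_invariant_perceptual_loss(feat_out, feat_ref):
    """Distance in (assumed precomputed) VGG feature space.

    Instance Normalization strips per-channel statistics that differ
    between domains, so the distance is usable on unpaired domains.
    """
    return float(np.mean((instance_norm(feat_out) - instance_norm(feat_ref)) ** 2))
```

Note that the loss is (nearly) invariant to per-channel affine changes of the feature statistics, which is exactly what makes it domain-invariant rather than a plain supervised perceptual loss.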
23. Evaluation - Human Preference
- to evaluate the quality
- Amazon Mechanical Turk
- 500 questions/worker
- 1 source image
- 2 translated images from different methods
24. Evaluation - LPIPS Distances
- to evaluate diversity
- a weighted L2 distance between pairs of deep features of
randomly-sampled translated images from the same input
- deep feature extractor: ImageNet-pretrained AlexNet
- correlate well with human perceptual similarity
- 1900 pairs
- 100 input images
- x 19 output pairs/input
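The diversity protocol above can be illustrated with a simplified stand-in. Real LPIPS uses ImageNet-pretrained AlexNet features with learned per-channel weights; this sketch only mirrors the weighted-L2 formula on precomputed feature dictionaries, and every name and shape here is an assumption.

```python
import numpy as np

def lpips_like_distance(feat_a, feat_b, weights):
    """Weighted L2 distance between deep features (simplified sketch).

    feat_a, feat_b: dict mapping layer name -> (C, H, W) feature array.
    weights: dict mapping layer name -> (C,) per-channel weights.
    """
    d = 0.0
    for layer in feat_a:
        diff = weights[layer][:, None, None] * (feat_a[layer] - feat_b[layer])
        d += np.mean(np.sum(diff ** 2, axis=0))  # channel-wise L2, spatially averaged
    return d

def average_pairwise_diversity(features_list, weights, rng, n_pairs=19):
    """Average distance over randomly sampled pairs of translated outputs
    generated from the same input image (19 pairs/input in the paper)."""
    pairs = [rng.choice(len(features_list), size=2, replace=False)
             for _ in range(n_pairs)]
    return float(np.mean([lpips_like_distance(features_list[i], features_list[j], weights)
                          for i, j in pairs]))
```

A higher average distance means the model produces more diverse translations for the same input.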
28. Evaluation - (C)IS=(Conditional) Inception Score
- popular for image generation
- to evaluate quality and diversity
- IS: diversity of all output images
- the more easily Inception-v3 can classify the images, the higher the score.
- CIS: diversity of outputs conditioned on a single input image
- more suited for evaluating multi-modal mapping
- e.g. if a single cat image is translated into a nearly perfect dog image, the IS will be high. However, if every input is translated into the same dog image (i.e. the mapping is not multi-modal), the IS stays high but the CIS becomes low.
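The IS/CIS contrast above can be made concrete with a small sketch, built from the standard IS formula IS = exp(E_x KL(p(y|x) || p(y))). The real metrics use Inception-v3 class probabilities; here the probability arrays are stand-ins.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """IS from class probabilities; probs: (N, K) rows p(y|x)."""
    marginal = probs.mean(axis=0, keepdims=True)  # p(y)
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))

def conditional_inception_score(probs_per_input):
    """CIS: the same score, but the marginal p(y) is computed per input,
    over the translations of that single input image. Equivalently the
    geometric mean of the per-input Inception Scores."""
    mean_kl = np.mean([np.log(inception_score(p)) for p in probs_per_input])
    return float(np.exp(mean_kl))
```

With mode-collapsed translations (each input always mapped to the same output class), the overall IS can still be high because different inputs hit different classes, while the CIS drops to 1 because there is no diversity within any single input — exactly the failure case described above.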
29. Evaluation - (C)IS=(Conditional) Inception Score
- x1: source image
- x2: target image
- x1->2: image translated from domain 1 to domain 2
- y: class=mode (e.g. Pomeranian, Shiba Inu, Siberian Husky if X2 is a set of dogs)
30. Results - unsupervised - quantitative
- Decisively outperforms the existing unsupervised approaches (the higher the better).