■ Text-to-image task
■ Generates an image from an input sentence
Image synthesis via GANs
Conditional Image Synthesis
Text-to-image
Label-to-image
Background
Example input sentence: "People riding on elephants that are walking through a river."
Ref. 5 [Seunghoon Hong et al., 2018]
■ The discriminator is the same multi-scale discriminator as in pix2pixHD (PatchGAN-based)
(adversarial loss + feature-matching loss + perceptual loss)
■ The least-squares loss is replaced with a hinge loss (see the sketch below)
■ No SPADE layers are inserted into the discriminator (the SPADE layer itself is sketched just before the references)
Implementation details
Ref. 1 [Taesung Park et al., 2019]
Ref. 7 [Ting-Chun Wang et al., 2017]
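As a minimal illustration of this change (not the authors' code), here is a PyTorch sketch of the hinge adversarial loss applied per scale of a multi-scale discriminator. It assumes the discriminator returns a list of patch-wise logit tensors, one per scale; the function names are hypothetical.

import torch

def d_hinge_loss(real_logits, fake_logits):
    # Discriminator hinge loss, averaged over discriminator scales.
    loss = torch.zeros(())
    for real, fake in zip(real_logits, fake_logits):
        loss = loss + torch.relu(1.0 - real).mean()  # real patches: push logits above +1
        loss = loss + torch.relu(1.0 + fake).mean()  # fake patches: push logits below -1
    return loss / len(real_logits)

def g_hinge_loss(fake_logits):
    # Generator adversarial term: raise the discriminator's score on fakes.
    return -sum(f.mean() for f in fake_logits) / len(fake_logits)

In the full objective, this adversarial term is combined with the feature-matching and perceptual losses listed above.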
■ Spectral normalization is applied to both the generator and the discriminator
■ Generator LR = 0.0001, discriminator LR = 0.0004
■ Adam with β1 = 0, β2 = 0.999 (set up as in the sketch after the dataset list)
■ Datasets
⁃ COCO-Stuff: 118,000 training / 5,000 validation images, 182 classes
⁃ ADE20K: 20,210 training / 2,000 validation images, 150 classes
⁃ Cityscapes: 3,000 training / 500 validation images
⁃ Flickr Landscapes: 40,000 training / 1,000 validation images (labels generated with DeepLabV2)
Implementation details
Ref. 11 [Holger Caesar et al., 2018]
Ref. 12 [Bolei Zhou et al., 2016]
Ref. 13 [Marius Cordts et al., 2016]
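The optimizer bullets above translate directly into code. A minimal sketch, assuming PyTorch's built-in spectral_norm wrapper and tiny placeholder networks standing in for the real SPADE generator and multi-scale discriminator:

import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

def add_spectral_norm(module):
    # Recursively wrap every Conv2d/Linear weight with spectral normalization.
    for name, child in module.named_children():
        if isinstance(child, (nn.Conv2d, nn.Linear)):
            setattr(module, name, spectral_norm(child))
        else:
            add_spectral_norm(child)
    return module

# Placeholder networks (stand-ins for the SPADE generator / multi-scale discriminator).
generator = add_spectral_norm(nn.Sequential(nn.Conv2d(3, 64, 3, padding=1)))
discriminator = add_spectral_norm(nn.Sequential(nn.Conv2d(3, 64, 4, stride=2)))

# Two learning rates (G: 1e-4, D: 4e-4) and Adam with beta1 = 0, beta2 = 0.999.
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.0, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=4e-4, betas=(0.0, 0.999))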
■ Baselines:
① pix2pixHD: the state-of-the-art GAN-based approach
② CRN: a feedforward approach that feeds in progressively higher-resolution semantic maps
③ SIMS: an approach that composites segments drawn from a database of real images
Baselines
Ref. 7 [Ting-Chun Wang et al., 2017]
Ref. 14 [Qifeng Chen et al., 2017]
Ref. 15 [Xiaojuan Qi et al., 2018]
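For context before the references: the SPADE layer itself, which the generator of [1] uses in place of ordinary normalization layers (and which, per the implementation details above, is left out of the discriminator). A minimal PyTorch illustration; the hidden width and kernel sizes are illustrative, not the paper's exact configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SPADE(nn.Module):
    def __init__(self, num_features, label_channels, hidden=128):
        super().__init__()
        # Normalize activations without learned affine parameters ...
        self.norm = nn.BatchNorm2d(num_features, affine=False)
        # ... then modulate them with per-pixel gamma/beta maps predicted
        # from the segmentation map.
        self.shared = nn.Sequential(
            nn.Conv2d(label_channels, hidden, 3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(hidden, num_features, 3, padding=1)
        self.beta = nn.Conv2d(hidden, num_features, 3, padding=1)

    def forward(self, x, segmap):
        # Resize the (one-hot) segmentation map to the feature resolution.
        segmap = F.interpolate(segmap, size=x.shape[2:], mode="nearest")
        h = self.shared(segmap)
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)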
References
■ [1] Taesung Park et al. Semantic Image Synthesis with Spatially-Adaptive Normalization, 2019
https://arxiv.org/abs/1903.07291
https://youtu.be/9GR8V-VR4Qg?t=614
■ [2] Tero Karras et al. Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2018
https://arxiv.org/abs/1710.10196
https://youtu.be/XOxxPcy5Gr4
■ [3] Alec Radford et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, 2015
https://arxiv.org/abs/1511.06434
■ [4] Takeru Miyato et al. cGANs with Projection Discriminator, 2018
https://arxiv.org/abs/1802.05637
■ [5] Seunghoon Hong et al. Inferring Semantic Layout for Hierarchical Text-to-Image Synthesis, 2018
https://arxiv.org/abs/1801.05091
■ [6] Phillip Isola et al. Image-to-Image Translation with Conditional Adversarial Networks, 2016
https://arxiv.org/abs/1611.07004
■ [7] Ting-Chun Wang et al. High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs, 2017
https://arxiv.org/abs/1711.11585
https://youtu.be/3AIpPlzM_qs
■ [8] Qifeng Chen et al. Photographic Image Synthesis with Cascaded Refinement Networks, 2017
https://arxiv.org/abs/1707.09405
■ [9] Xun Huang et al. Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization, 2017
https://arxiv.org/abs/1703.06868
■ [10] Harm de Vries et al. Modulating early visual processing by language, 2017
https://arxiv.org/abs/1707.00683
■ [11] Holger Caesar et al. COCO-Stuff: Thing and Stuff Classes in Context, 2018
https://arxiv.org/abs/1612.03716
■ [12] Bolei Zhou et al. Semantic Understanding of Scenes through the ADE20K Dataset, 2016
https://arxiv.org/abs/1608.05442
■ [13] Marius Cordts et al. The Cityscapes Dataset for Semantic Urban Scene Understanding, 2016
https://arxiv.org/abs/1604.01685
■ [14] Qifeng Chen et al. Photographic Image Synthesis with Cascaded Refinement Networks, 2017
https://arxiv.org/abs/1707.09405
■ [15] Xiaojuan Qi et al. Semi-parametric Image Synthesis, 2018
https://arxiv.org/abs/1804.10992