Be Your Own Prada:
Fashion Synthesis With Structural Coherence
DeepFashion: the person faces toward the camera, and the image background is not severely cluttered.
Given an input image of a
person and a sentence
describing a different outfit,
our model “redresses” the
person as desired, while at the
same time keeping the wearer
and her/his pose unchanged.
Notation Of FashionGAN
Training data: one photo per user where each photo has a sentence description of the outfit.
Original image: I_o
Segmentation map: S_o, pixel-wise class labels such as hair, face, upper-clothes, pants/shorts, etc.
Vector of binary attributes: a, covering gender, long/short hair, wearing/not wearing sunglasses, and wearing/not wearing a hat
Body and skin information should be preserved during generation.
Description v: encoding of the outfit sentence, generated by a text encoder
Design coding d = (a, v)
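As a rough illustration of how the design coding could be assembled, the sketch below turns the attribute flags into a binary vector a and joins it with a description vector v. The attribute order, the function names, the 0/1 meaning of the gender flag, and the three-number placeholder for v are all assumptions made for this example; the notes only state that d = (a, v), and joining by concatenation is one plausible reading of that pair.

```python
# Hypothetical attribute order; the notes list the attributes but not their positions.
ATTRIBUTE_NAMES = ["gender", "long_hair", "sunglasses", "hat"]


def encode_attributes(flags):
    """Turn a dict of boolean flags into a binary attribute vector a."""
    return [1.0 if flags[name] else 0.0 for name in ATTRIBUTE_NAMES]


def design_coding(a, v):
    """Join the attribute vector a and the description vector v into d.

    Concatenation is an assumption; the notes only give d = (a, v).
    """
    return list(a) + list(v)


a = encode_attributes(
    {"gender": True, "long_hair": True, "sunglasses": False, "hat": False}
)
v = [0.12, -0.40, 0.88]  # placeholder values standing in for a text-encoder output
d = design_coding(a, v)
```

Here the first four entries of d carry the wearer's attributes and the remaining entries carry the outfit description.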
S_o: background, hair, face, upper-clothes, pants/shorts, legs, and arms.
m(S_o): background, hair, face, and rest (all clothing pixels). It captures only the body, not the clothing.
Reason: this resolves the contradiction between S_o, which still encodes the original clothing, and d, which describes the new outfit.
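The merging step m(S_o) can be sketched as a simple label lookup. This is a minimal illustration, not the authors' code: the numeric label ids and the function name are hypothetical, and the sketch assumes that legs and arms also fall into "rest", since the merged map lists only four classes.

```python
# Hypothetical label ids; the notes name the classes but not their numeric order.
FULL_LABELS = [
    "background", "hair", "face", "upper-clothes", "pants/shorts", "legs", "arms",
]
MERGED_LABELS = ["background", "hair", "face", "rest"]


def merge_segmentation(seg_map):
    """Map every full-label id in a 2D list to its merged-label id."""
    rest_id = MERGED_LABELS.index("rest")
    # Classes kept in the merged map keep their own id; all others become "rest".
    lookup = [
        MERGED_LABELS.index(name) if name in MERGED_LABELS else rest_id
        for name in FULL_LABELS
    ]
    return [[lookup[label] for label in row] for row in seg_map]


# Toy 4x4 segmentation map S_o using the ids above.
s_o = [
    [0, 1, 1, 0],
    [0, 2, 2, 0],
    [6, 3, 3, 5],
    [0, 4, 4, 0],
]
merged = merge_segmentation(s_o)
```

After merging, the map no longer distinguishes upper-clothes from pants/shorts, so it cannot contradict a design coding that asks for a different outfit.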
Goal: generate a new segmentation map such that
(1) Attributes are consistent with the design coding
(2) Pose is consistent with the original image
(3) Segmentation is consistent with the new
description in the design coding.
Each pixel in the map is subject to a probability simplex constraint, so a Softmax activation is applied to each
pixel at the end of the generator, making the generated fake shape map comparable with the real segmentation map.
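The per-pixel Softmax can be sketched in plain Python as follows. This is only an illustration of the constraint, assuming a generator that outputs one raw score per class at every pixel; the function names and the toy values are made up for the example, and a real model would use a tensor library rather than nested lists.

```python
import math


def pixel_softmax(logits):
    """Softmax over the class scores of a single pixel."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_map(logit_map):
    """Apply the softmax independently at every pixel of an H x W x C nested list."""
    return [[pixel_softmax(pixel) for pixel in row] for row in logit_map]


# Toy 2x2 map with 4 class scores per pixel.
logit_map = [
    [[2.0, 0.5, -1.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
    [[-3.0, 4.0, 1.0, 1.0], [1.0, 2.0, 3.0, 4.0]],
]
shape_map = softmax_map(logit_map)
```

Every pixel of `shape_map` is non-negative and sums to 1, so it lies on the probability simplex, just as a one-hot real segmentation map does.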
DeepFashion contains over 800,000 diverse fashion images ranging from well-posed shop images to
unconstrained consumer photos.
DeepFashion is annotated with rich information of clothing items. Each image in this dataset is labeled
with 50 categories, 1,000 descriptive attributes, a bounding box, and clothing landmarks.
DeepFashion contains over 300,000 cross-pose/cross-domain image pairs.
Four tasks: Attribute Prediction, Consumer-to-shop Clothes Retrieval, In-shop Clothes
Retrieval, Landmark Detection.