Ukrainian Catholic University
Faculty of Applied Sciences
Data Science Master Program
January 22nd
Abstract. In recent years, Generative Adversarial Networks (GANs) have become one of the biggest trends in the computer vision domain. GANs are used for generating face images and computer game scenes, transferring artwork style, visualizing designs, creating super-resolution images, translating text to images, etc. We want to present a model that solves a specific image problem: generating new outfits on images of people. This task appears to be important for offline and online retail and for the fashion industry.
Changing clothing in images of people is not a trivial task. The generated part of the image should be of high quality, without blurring. Another problem is generating long sleeves on images of people wearing T-shirts, for example. As a result, well-known models are not suitable for this task. In the master project, we are going to reproduce a model for changing clothing in images of people based on existing approaches and improve it to obtain better image quality.
4. RESEARCH QUESTIONS
1. Can GAN models be used to change clothing in images of people?
2. Which model should be used for the best possible visual results?
3. How can the model be improved to get better results?
6. RELATED WORKS ANALYSIS
I selected the following models for research.
1) When a target clothing image is available:
Virtual Try-on Network,
SwapNet (results could not be reproduced)
2) When a target person image is available:
Liquid Warping GAN
8. VIRTUAL TRY-ON NETWORK: TOM TRAINING
Figure: TOM training diagram. Labels: warped cloth (C), U-Net encoder-decoder network, output try-on image (Io), ground truth.
9. VIRTUAL TRY-ON NETWORK: MODIFIED GMM LOSS FUNCTION
Original GMM loss function:
Modified GMM loss function.
10. VIRTUAL TRY-ON NETWORK: MODIFIED TOM LOSS FUNCTION
Modified TOM loss function
Original TOM loss function
11. DATASET AND TRAINING DETAILS
Zalando dataset (commonly used in similar works).
16,253 pairs, each consisting of a person image and the corresponding clothing image;
192x256 - image size;
Adam optimizer parameters: beta1 = 0.5 and beta2 = 0.999.
Training for 200k steps.
14,221/2,032 - training/test pairs.
Learning rate starts from 0.0001, linearly decays after 100k steps.
Total training time (Tesla V100, 16 GB memory): ~10h.
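The learning-rate schedule listed above can be sketched as a small function. This is a minimal illustration, not the training code used in the project: the function name `learning_rate` and its parameter names are hypothetical, and it assumes that the rate decays linearly to zero over the second 100k steps, so that it reaches zero at the end of the 200k-step run.

```python
def learning_rate(step, base_lr=1e-4, keep_steps=100_000, decay_steps=100_000):
    """Learning rate at a given training step (0-based).

    The rate stays at `base_lr` for the first `keep_steps` steps and then
    decreases linearly. Assumption: it reaches zero after `decay_steps`
    further steps, i.e. at the end of the 200k-step run.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    decayed = max(0, step - keep_steps)          # steps spent in the decay phase
    factor = max(0.0, 1.0 - decayed / decay_steps)
    return base_lr * factor


if __name__ == "__main__":
    for step in (0, 100_000, 150_000, 200_000):
        print(step, learning_rate(step))
```

In a PyTorch setup, the multiplicative factor computed here could be supplied to `torch.optim.lr_scheduler.LambdaLR`, with the Adam optimizer created using `betas=(0.5, 0.999)`.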
14. LIQUID WARPING GAN: BACKGROUND GENERALIZATION
Images from Place2 dataset
Original iPER dataset
15. LIQUID WARPING GAN TRAINING DETAILS
Adam optimizer
Training for 30 epochs
8/2 train/test ratio;
Learning rate starts from 0.0002, linearly decays every epoch.
Total training time (Tesla P4, 8 GB memory): ~70h.
Total training time (Tesla V100, 16 GB memory): ~20h.
22. LIQUID WARPING GAN: RESULTS WITH BACKGROUND - 1
Figure columns: source person, target person, LWGAN, LWGAN + Place2.
23. LIQUID WARPING GAN: PROBLEM
Figure columns: source person, target person, LWGAN, LWGAN + Place2.
24. LIQUID WARPING GAN: RESULTS WITH BACKGROUND - 3
Figure columns: source person, target person, LWGAN, LWGAN + Place2.
25. LIQUID WARPING GAN: SUCCESSFUL CASES
Figure columns: source person, target person, LWGAN, LWGAN + Place2.
26. LIQUID WARPING GAN: FAILED CASES
Figure columns: source person, target person, LWGAN, LWGAN + Place2.
27. LIQUID WARPING GAN: FAILED CASES
Figure columns: source person, target person, LWGAN, LWGAN + Place2.
28. ANSWERS TO THE RESEARCH QUESTIONS
1. Can GANs be used to change clothing in images of people?
In general, yes. The trained models produce acceptable results on the test
data and, in some cases, on random images. However, improvements are required.
2. Which model should be used for the best possible visual results?
Based on the obtained visual results and the intended practical application, the CP-VTON
model.
3. How can the model be improved to get better results (further steps)?
- handle full-height person images;
- handle cases where the hands are along the body or in another unusual position;
- transform the source image to the clothing-agnostic person representation within the
pipeline.
29. CONTRIBUTIONS
My contributions can be summarized as follows:
1) modified the loss functions of the GMM and TOM modules and
trained the corresponding model (VITON-GAN);
2) trained Liquid Warping GAN with background
generalization (Place2 dataset);
3) compared the results of CP-VTON and VITON-GAN on
the Zalando dataset;
4) tested Liquid Warping GAN on the Zalando dataset;
5) compared the results of Liquid Warping GAN and Liquid
Warping GAN + background generalization on images
from the internet.
30. REVIEW COMMENTS
1) The choice of Liquid Warping GAN is not sufficiently explained
2) Additional explanation of the TOM module loss function is
required
3) The Place2 dataset is only mentioned
4) The criteria for collecting test images for Liquid
Warping GAN from the internet are not presented
5) No information about tuning of the models' hyperparameters
6) GAN results evaluation
7) LWGAN was trained on a different dataset, so the conclusion on
the architecture benchmark remains questionable
31. REVIEW DISCUSSION
1. Unusual hand positions remain an important problem for
garment swapping. The clothing-agnostic person representation
of the CP-VTON and VITON-GAN models contains a “face and hair
mask” that remains unchanged between the source and the
target images. Do you think it is a good idea to extend the “face
and hair mask” with a “hand mask” so that all three remain
unchanged between the source and the target images? What are
the possible drawbacks of this approach?
2. What are the global trends in applying generative models to
the fashion industry?
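The mask extension raised in question 1 can be illustrated with a toy example. This is only a sketch of the idea, not part of CP-VTON or VITON-GAN: the label ids, the helper name `preserved_mask`, and the 4x4 parsing map are all hypothetical, and a real human-parsing model would use its own label numbering.

```python
# Hypothetical label ids for a human-parsing map (assumption: real parsers
# use their own numbering; 0 is background, 5 is the upper-body garment).
FACE, HAIR, LEFT_HAND, RIGHT_HAND = 1, 2, 3, 4


def preserved_mask(parsing, keep_labels):
    """Return a binary mask: 1 where the pixel is copied unchanged from the source."""
    keep = set(keep_labels)
    return [[1 if label in keep else 0 for label in row] for row in parsing]


# Toy 4x4 parsing map: hair on top, face below it, hands beside the garment.
parsing = [
    [0, 2, 2, 0],
    [0, 1, 1, 0],
    [3, 5, 5, 4],
    [0, 5, 5, 0],
]

# Mask as described for the existing models: face and hair only.
face_hair = preserved_mask(parsing, {FACE, HAIR})

# Extended mask from the question: face, hair, and both hands.
face_hair_hands = preserved_mask(parsing, {FACE, HAIR, LEFT_HAND, RIGHT_HAND})
```

The extended mask differs from the original one only in the pixels labeled as hands, so its effect depends entirely on how reliably the parsing step finds them.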
32. THANK YOU FOR YOUR ATTENTION
I AM LOOKING FORWARD TO
YOUR QUESTIONS
34. ANSWERS TO REVIEW QUESTIONS: 2
The most recent papers directly related to the topic:
- SwapNet (2018);
- Virtual Try-on (2018);
- Liquid Warping GAN (2019).
Previous UCU master's theses related to the topic:
- Mykola Mykhailych: Application of Generative Neural
Models for Style Transfer Learning in Fashion (2017)
- Andriy Kusyy: Color and style transfer using generative
adversarial networks (2018)