Title: VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization
Authors: Seunghwan Choi*, Sunghyun Park*, Minsoo Lee*, Jaegul Choo (*: equal contributions)
Abstract:
The task of image-based virtual try-on aims to transfer a target clothing item onto the corresponding region of a person, which is commonly tackled by fitting the item to the desired body part and fusing the warped item with the person. While an increasing number of studies have been conducted, the resolution of synthesized images is still limited to low (e.g., 256x192), which acts as the critical limitation against satisfying online consumers. We argue that the limitation stems from several challenges: as the resolution increases, the artifacts in the misaligned areas between the warped clothes and the desired clothing regions become noticeable in the final results; the architectures used in existing methods have low performance in generating high-quality body parts and maintaining the texture sharpness of the clothes. To address the challenges, we propose a novel virtual try-on method called VITON-HD that successfully synthesizes 1024x768 virtual try-on images. Specifically, we first prepare the segmentation map to guide our virtual try-on synthesis, and then roughly fit the target clothing item to a given person's body. Next, we propose ALIgnment-Aware Segment (ALIAS) normalization and ALIAS generator to handle the misaligned areas and preserve the details of 1024x768 inputs. Through rigorous comparison with existing methods, we demonstrate that VITON-HD highly surpasses the baselines in terms of synthesized image quality both qualitatively and quantitatively.
[CVPR 2021] VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization
1. VITON-HD: High-Resolution Virtual Try-On
via Misalignment-Aware Normalization
Seunghwan Choi1* Sunghyun Park1* Minsoo Lee1* Jaegul Choo1
1Korea Advanced Institute of Science and Technology (KAIST)
2. Image-based Virtual Try-on
• Virtual try-on refers to an image generation task of changing the clothing item on a person into a different item.
• Various image-based virtual try-on methods follow two processes in common:
1. Warping the clothing image to fit the human body.
2. Fusing the warped clothes with the image of the person, including pixel-level refinement, to generate the final result.
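The two-step pipeline above can be sketched in code. This is a toy illustration only: the function names, the translation-only "warp", and the 4x4 images are hypothetical simplifications, not the actual method (real systems learn a TPS warp and a refinement network).

```python
import numpy as np

def warp_clothing(clothing: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Step 1 (illustrative): fit the clothing image to the body.
    Real methods estimate a TPS transform; here a simple integer
    translation stands in for the warp."""
    dy, dx = theta.astype(int)
    return np.roll(clothing, shift=(dy, dx), axis=(0, 1))

def fuse(person: np.ndarray, warped: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Step 2 (illustrative): composite the warped clothing onto the
    person inside the clothing mask; a learned generator would also
    refine pixels here."""
    return np.where(mask[..., None].astype(bool), warped, person)

# Toy 4x4 RGB images: black person image, white clothing item.
person = np.zeros((4, 4, 3), dtype=np.uint8)
clothing = np.full((4, 4, 3), 255, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 1  # torso region to receive the clothing

result = fuse(person, warp_clothing(clothing, np.array([0, 0])), mask)
```

The composite keeps the person's pixels outside the mask and the warped clothing inside it, which is exactly where misalignment artifacts appear when the warp is imperfect.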
3. Limitation of Existing Virtual Try-On Approach
• Misalignment occurs between the warped clothes and a person's body.
• The misaligned regions become more noticeable as the size of the image increases.
• U-Net is insufficient for synthesizing initially occluded body parts in the final high-resolution image.
4. Key Contributions
• We propose a novel virtual try-on network called VITON-HD, which is the first model to successfully synthesize high-resolution (1024x768) images.
• We introduce a clothing-agnostic person representation that removes the dependency on the clothing item originally worn.
• To address the misalignment problem, we propose ALIAS normalization and the ALIAS generator, which are effective in maintaining the details of clothes.
• We demonstrate the superior performance of our method through experiments with baselines on a newly collected dataset.
6. VITON-HD
• Given a reference image I containing a target person, we predict the segmentation map S and the pose map P.
• We utilize S and P to produce a clothing-agnostic person image I_a and segmentation S_a, in which the information of the clothing item originally worn is completely eliminated.
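A minimal sketch of building (I_a, S_a), assuming a toy label scheme: the label ids, the grey fill value, and the choice of regions to erase are all hypothetical here; the paper's actual label set and masking rules may differ.

```python
import numpy as np

# Hypothetical label ids for the segmentation map S.
CLOTH_LABEL = 5
ARM_LABEL = 6  # arms can reveal the silhouette of the original sleeves

def clothing_agnostic(image, seg, removed_labels=(CLOTH_LABEL, ARM_LABEL), fill=128):
    """Return (I_a, S_a): a person image and segmentation with the
    originally worn clothing region erased, so that no shape or
    texture of the old item can leak into the try-on result."""
    remove = np.isin(seg, removed_labels)
    I_a = image.copy()
    I_a[remove] = fill           # grey out removed pixels in all channels
    S_a = seg.copy()
    S_a[remove] = 0              # reassign removed pixels to background
    return I_a, S_a

# Toy 4x4 example: uniform grey person with a 2x2 clothing patch.
seg = np.zeros((4, 4), dtype=np.int64)
seg[1:3, 1:3] = CLOTH_LABEL
img = np.full((4, 4, 3), 200, dtype=np.uint8)
I_a, S_a = clothing_agnostic(img, seg)
```

Erasing the region in both the image and the segmentation is what makes the representation truly clothing-agnostic: downstream modules see neither the old pixels nor the old label layout.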
7. VITON-HD
• Given the clothing-agnostic person representation (S_a, P) and the target clothing item c, the segmentation generator predicts the segmentation map Ŝ.
8. VITON-HD
• Using the clothing-agnostic person representation (I_a, P) and Ŝ_c as inputs, the Geometric Matching Module predicts the TPS transformation parameter θ, and then c is warped by θ.
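The warping step can be illustrated with a much simpler transform than TPS. The sketch below uses inverse mapping with a 2x3 affine matrix and nearest-neighbour sampling; this is only a stand-in for the sampling mechanics, not the TPS warp the Geometric Matching Module actually predicts.

```python
import numpy as np

def warp(image, theta):
    """Warp `image` with a 2x3 affine matrix `theta` by inverse
    mapping: for each target pixel, compute its source location and
    sample the nearest source pixel (clipped at the border)."""
    H, W = image.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])
    src = theta @ coords                       # source (x, y) per target pixel
    sx = np.clip(np.round(src[0]).astype(int), 0, W - 1)
    sy = np.clip(np.round(src[1]).astype(int), 0, H - 1)
    return image[sy, sx].reshape(image.shape)

# Identity transform: the warped image equals the input.
theta = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
img = np.arange(16, dtype=np.float64).reshape(4, 4)
out = warp(img, theta)
```

A TPS transform generalizes this by letting the source coordinates be a smooth nonlinear function of the target coordinates, controlled by the parameters θ that the module regresses.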
9. VITON-HD
• The ALIAS generator synthesizes the final output image Î via ALIAS normalization and multi-scale refinement.
• The warped clothing image W(c, θ) and (I_a, P) are injected into each layer of the generator to preserve the clothing details.
• In addition, each layer of the generator utilizes ALIAS normalization.
10. ALIAS Normalization
• ALIAS normalization standardizes the regions of M_misalign and the other regions of the activation h_i separately, and then modulates the standardized activation using affine transformation parameters.
• The rationale behind this strategy is to remove the misleading information in the misaligned regions.
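A minimal NumPy sketch of the region-wise standardization idea: the misaligned region and the rest of the activation are standardized separately per channel, then modulated by spatially varying affine parameters. In the paper gamma/beta are inferred from the segmentation map by convolution layers; here they are simply passed in, so this is a simplification of the normalization step, not the full module.

```python
import numpy as np

def alias_norm(h, mask, gamma, beta, eps=1e-5):
    """Region-wise standardization (sketch of ALIAS normalization).
    h:     activation, shape (C, H, W)
    mask:  misalignment mask M_misalign, shape (H, W), 1 = misaligned
    gamma, beta: spatially varying affine parameters, shape (C, H, W)
    Each region gets its own per-channel mean/std, so statistics of
    the misaligned area cannot contaminate the rest of the feature map."""
    out = np.empty_like(h, dtype=np.float64)
    for region in (mask.astype(bool), ~mask.astype(bool)):
        for c in range(h.shape[0]):            # per-channel statistics
            vals = h[c][region]
            mu, sigma = vals.mean(), vals.std()
            out[c][region] = (vals - mu) / (sigma + eps)
    return gamma * out + beta

C, H, W = 2, 4, 4
h = np.random.default_rng(0).normal(size=(C, H, W))
mask = np.zeros((H, W))
mask[:2, :2] = 1                               # pretend misaligned area
gamma = np.ones((C, H, W))
beta = np.zeros((C, H, W))
y = alias_norm(h, mask, gamma, beta)
```

With gamma = 1 and beta = 0, each region of each channel ends up with approximately zero mean and unit variance, which is exactly why misleading statistics from the misaligned area are prevented from leaking into the well-aligned area.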
11. Experiments
Figure: Qualitative comparison of the segmentation generators of ACGPN and VITON-HD. The clothing-agnostic segmentation map used by each model is also reported.
13. Experiments
Table: Quantitative comparison with baselines across different resolutions. VITON-HD* is a VITON-HD variant where the standardization in ALIAS normalization is replaced by channel-wise standardization as in the original instance normalization. For SSIM, higher is better; for LPIPS and FID, lower is better.
14. Experiments
Figure: Effects of ALIAS normalization. The orange colored areas in the enlarged images indicate the misaligned regions.