Cross-domain complementary learning with synthetic data for multi-person part segmentation

Presented by Kevin Lin



  1. Cross-domain Complementary Learning with Synthetic Data for Multi-Person Part Segmentation. Kevin Lin, Lijuan Wang, Kun Luo, Yinpeng Chen, Zicheng Liu, Ming-Ting Sun. University of Washington, Seattle / Microsoft, Redmond. International Conference on Computer Vision (ICCV), Demonstration, 2019
  2. Outline • Introduction • Related works • Proposed method • Experiments • On-going work and conclusion
  3. Human part segmentation • Human part segmentation aims at partitioning the persons in an image into multiple semantically consistent regions. Typically 14 parts: head, torso, left upper-arm, right upper-arm, left lower-arm, right lower-arm, left hand, right hand, left thigh, right thigh, left shank, right shank, left foot, right foot; see the hypothetical label map below. (Figure: an input image and its part segmentation.)
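For concreteness, a label map for this 14-part scheme might look like the following. The index ordering and the reserved background index are assumptions for illustration, not any dataset's official layout:

```python
# Hypothetical label map for the 14-part scheme above; index 0 is reserved
# for background by a common convention in semantic segmentation.
PART_LABELS = [
    "background",
    "head", "torso",
    "left_upper_arm", "right_upper_arm",
    "left_lower_arm", "right_lower_arm",
    "left_hand", "right_hand",
    "left_thigh", "right_thigh",
    "left_shank", "right_shank",
    "left_foot", "right_foot",
]
NUM_CLASSES = len(PART_LABELS)  # 15 = 14 body parts + background
```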
  4. Challenges • Pixel-level labeling of training data is very expensive and labor-intensive.
  5. Previous works • People have explored synthetic data as an alternative, training deep CNNs on the synthetic data. (Figure: samples of the synthetic training data and the synthetic labels [CVPR17].)
  6. Previous works • Their method works well only in well-controlled, single-person scenarios. (Figure: input images and output results from Learning from Synthetic Humans, CVPR 2017.)
  7. The domain gap • The discrepancy in pixel-value distributions between synthetic and real data makes transferring knowledge from the synthetic to the real domain challenging. (Figure: a synthetic image next to real images.)
  8. Related works on street-view segmentation • People have also used graphics simulation to train segmentation models for street-view images, and they observe the same domain-gap issue. Zhang et al., Fully Convolutional Adaptation Networks for Semantic Segmentation, CVPR 2018.
  9. Related works on street-view segmentation • Previous studies address the domain gap with adversarial training: a discriminator is trained to distinguish whether the input comes from the source domain (graphics simulation) or the target domain (real-world images), as in the sketch below. [Tsai et al., ICCV 2019], [Tsai et al., CVPR 2018], [Ren et al., CVPR 2018], [Tzeng et al., CVPR 2017], [Ganin et al., ICML 2015]
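The following is a minimal PyTorch sketch of this kind of adversarial alignment on segmentation outputs, in the spirit of Tsai et al., CVPR 2018. The layer shapes, the `DomainDiscriminator` name, and the 0.001 loss weight are illustrative assumptions, not the cited authors' implementations:

```python
# Minimal sketch of adversarial output-space alignment with a domain
# discriminator. All shapes and the loss weight are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DomainDiscriminator(nn.Module):
    """Predicts, per spatial location, whether a softmax segmentation map
    comes from the source (synthetic) or target (real) domain."""
    def __init__(self, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_classes, 64, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 1, 4, stride=2, padding=1),  # domain logits
        )

    def forward(self, seg_softmax):
        return self.net(seg_softmax)

def segmenter_loss(segmenter, discriminator, src_img, src_label, tgt_img):
    """Supervised loss on source data plus an adversarial term that pushes
    target-domain predictions to look source-like. The symmetric update
    that trains the discriminator itself is omitted for brevity."""
    src_logits = segmenter(src_img)                    # (B, C, H, W)
    seg_loss = F.cross_entropy(src_logits, src_label)  # labels exist only for source

    tgt_logits = segmenter(tgt_img)
    d_out = discriminator(F.softmax(tgt_logits, dim=1))
    # Fool the discriminator: pretend target predictions are from the source (label 0).
    adv_loss = F.binary_cross_entropy_with_logits(d_out, torch.zeros_like(d_out))
    return seg_loss + 0.001 * adv_loss
```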
  10. Challenges • Can we learn human part segmentation without data labeling? • How can we learn human part segmentation from graphics simulations such that the resulting model works well in real-world scenarios? We propose a new approach, named cross-domain complementary learning (CDCL), to address these challenges.
  11. Our multi-person synthetic data • We create a new multi-person synthetic dataset which contains multiple persons performing various actions in a 3D room.
  12. The idea • We observe that real and synthetic humans both have a skeleton (pose) representation.
  13. Proposed method • We propose to bridge the two domains with skeletons and learn part segmentation from synthetic data.
  14. Proposed network: Module 1 • Real inputs → backbone (ResNet-101) → head networks → keypoint maps and part affinity fields → skeletons. The network architecture is similar to "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields," CVPR 2017.
  15. Proposed network: Module 2 • Synthetic inputs → backbone (ResNet-101) → head networks → keypoint maps, part affinity fields, and body part maps → skeletons and body part segmentation.
  16. Two modules are trained interchangeably • Module 1 (real inputs) and Module 2 (synthetic inputs) share parameters: the backbone (ResNet-101) and the head networks for keypoint maps and part affinity fields. Module 2 additionally predicts body part maps for body part segmentation. A sketch of this alternating schedule follows below.
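A minimal sketch of the alternating two-module training, assuming a single shared ResNet-101 backbone with 1×1 convolutional heads, MSE losses on pose targets, and targets pre-resized to the feature resolution. These head designs, losses, and the simple batch alternation are simplifications for illustration, not the paper's exact training recipe:

```python
# Sketch of two-module training with a shared backbone. Assumptions: 1x1
# conv heads, MSE pose losses, cross-entropy part loss, matched-resolution
# targets in each batch dict.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class CDCLNet(nn.Module):
    def __init__(self, num_keypoints=18, num_pafs=38, num_parts=15):
        super().__init__()
        resnet = torchvision.models.resnet101(weights=None)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])  # shared by both modules
        self.keypoint_head = nn.Conv2d(2048, num_keypoints, 1)  # keypoint maps
        self.paf_head = nn.Conv2d(2048, num_pafs, 1)            # part affinity fields
        self.part_head = nn.Conv2d(2048, num_parts, 1)          # body part maps (Module 2 only)

    def forward(self, x):
        feat = self.backbone(x)
        return self.keypoint_head(feat), self.paf_head(feat), self.part_head(feat)

def train_step(model, optimizer, batch, module):
    """Module 1: pose supervision on a real batch.
    Module 2: pose + part supervision on a synthetic batch."""
    kp, paf, parts = model(batch["image"])
    loss = F.mse_loss(kp, batch["kp_target"]) + F.mse_loss(paf, batch["paf_target"])
    if module == 2:  # only synthetic batches carry part labels
        loss = loss + F.cross_entropy(parts, batch["part_target"])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage: alternate real and synthetic batches so the shared weights see both domains.
# model = CDCLNet(); optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
# for real_batch, syn_batch in zip(real_loader, synthetic_loader):
#     train_step(model, optimizer, real_batch, module=1)
#     train_step(model, optimizer, syn_batch, module=2)
```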
  17. Evaluation metric • Intersection over Union (IoU) is one of the most commonly used metrics in semantic segmentation. • IoU is calculated for each body part category separately. • We average over all categories to provide a mean IoU (mIoU). IoU = |Prediction ∩ Ground truth| / |Prediction ∪ Ground truth|. A minimal implementation follows below.
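A minimal NumPy implementation of per-class IoU and mIoU as defined above; skipping classes absent from both prediction and ground truth is one common convention, not necessarily the one used in the paper's evaluation:

```python
# Per-class IoU and mean IoU over integer label maps of the same shape.
import numpy as np

def mean_iou(pred, gt, num_classes):
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union == 0:  # class absent from both maps: skip it
            continue
        ious.append(inter / union)
    return float(np.mean(ious))

# Example: two tiny 2x2 label maps with classes {0, 1}.
pred = np.array([[0, 1], [1, 1]])
gt   = np.array([[0, 1], [0, 1]])
print(mean_iou(pred, gt, num_classes=2))  # (1/2 + 2/3) / 2 ≈ 0.583
```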
  18. Evaluation benchmarks • Pascal-Person-Part dataset: 1,716 training images, 1,817 test images • COCO-DensePose dataset: 26,151 training images, 1,508 test images
  19.–23. Comparison on Pascal and COCO (mIoU, %) • A bar chart compares Synthetic Only, Adversarial Training, Chen et al. TPAMI18, Gong et al. CVPR17, and Fang et al. CVPR18 (the latter three use real part labels) against Ours and Ours + real part labels (which uses additional real part labels, the ideal setting). Successive builds of the chart highlight the performance gap between the synthetic-only and adversarial-training baselines and the fully supervised methods, show that our method relaxes the labeling requirements, and mark "Ours + real part labels" as our performance upper bound.
  24. Qualitative comparison • Training with synthetic data only [CVPR17] vs. ours.
  25. Qualitative comparison • Domain adaptation with adversarial training [CVPR18] vs. ours.
  26. Ablation study
  27. Synthetic training data analysis
  28.–29. Qualitative comparison with [1] Learning from Synthetic Humans, CVPR17.
  30. General approach • Our proposed cross-domain training approach is general and can be extended to other applications, such as novel keypoint detection: we can simply generate new labels on the synthetic data.
  31. Novel keypoint detection • Some applications need to detect other keypoints (e.g., joints) such as hand tips, toes, pelvis, and spine. • We create novel keypoints in the graphics simulator and train our model to detect a new human skeleton that includes keypoints on the hands and feet, as in the illustrative definition below. (Figure: the definition of our newly created novel keypoints.)
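As an illustration, an extended skeleton definition might look like the following. The keypoint names and counts here are assumptions; in practice the full list would come from the graphics simulator's rig:

```python
# Hypothetical extended skeleton: standard body joints plus the novel
# keypoints named above (hand tips, toes, pelvis, spine).
BODY_KEYPOINTS = [
    "nose", "neck",
    "right_shoulder", "right_elbow", "right_wrist",
    "left_shoulder", "left_elbow", "left_wrist",
    "right_hip", "right_knee", "right_ankle",
    "left_hip", "left_knee", "left_ankle",
]

NOVEL_KEYPOINTS = [
    "pelvis", "spine",
    "left_hand_tip", "right_hand_tip",
    "left_toe", "right_toe",
]

EXTENDED_SKELETON = BODY_KEYPOINTS + NOVEL_KEYPOINTS
```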
  32. Qualitative results
  33. Conclusion • We discover that human pose is very effective for bridging the real and synthetic domains for multi-person part segmentation. • We introduce an effective framework that leverages information in both real and synthetic images for multi-person part segmentation. • Our method can be extended to generate labels for keypoints, such as those on the hands and feet, in real images without human labeling.
  34. On-going work and future directions • Reconstruct 3D human meshes from a single image without ground-truth training labels.
  35. On-going work and future directions • Training data labeling for 3D body shape is very expensive. First stage: ask workers to label parts. Second stage: ask workers to label the corresponding points on a 3D human model, where the points are uniformly sampled within each part. Güler et al., "DensePose: Dense Human Pose Estimation in the Wild," CVPR 2018.
  36. On-going work and future directions • We plan to explore different approaches to learning human 3D body shape from graphics simulations.
  37. Thank you
