
Human parsing

Seminar 27-01-2018
Human Parsing
By Yawei Luo

Published in: Science

  1. Human Parsing, Yawei Luo
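  2. Problem description
     • Human parsing aims to segment an image of a person into multiple semantic parts (e.g., hair, face, upper clothes, arms, legs).
     • It is a pixel-wise labeling problem: every pixel is assigned one part label.
     • It is a supervised machine learning problem: training images come with pixel-level ground-truth label maps.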
  3. Challenges
     • Occlusion (especially by other people)
     • Multiple scales
     • Cross-domain variation
     • Label conflicts
     • Blur
     • Cavities
     • …
     The main conflict is the need for both a larger field of view and more accurate localization ("deeper or denser?"): some of these challenges call for a larger field of view, others for denser, more accurate localization.
  4. Related works: atrous convolution, e.g., DeepLab
  5. Related works: atrous convolution, e.g., DeepLab
  6. Related works: skip nets, e.g., U-Net (top figure) and FCN (bottom figure)
  7. Related works: edge + pixel voting, e.g., Co-CNN
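  8. Baseline: DeepLabV2 (ResNet-101 backbone with atrous convolution, followed by ASPP)
     [Figure: input 3×256×256; backbone feature maps 64×128×128, 256×64×64, 512×32×32, 1024×16×16, 2048×16×16; ASPP stage 8192×16×16; predicted and ground-truth label maps 20×256×256 ("fake" and "real"). Legend: ResNet-101 block, ResNet-101 block with atrous conv, tensor transfer, upsampling.]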
  9. Two GANs
     • Patch GAN focuses on low-level, local features, which encourages sharp and clear label maps.
     • Pose GAN focuses on high-level, global features, which helps generate label maps that are consistent with human pose priors.
  10. Architecture with Patch GAN
      [Figure: the baseline network extended with a patch discriminator (Patch D). Losses: Patch GAN loss, shallow NLL loss, deep NLL loss, combined into a total loss. Operations: resize, concat, copy. Tensor shapes shown: 3×256×256 RGB image, 20×256×256 label maps ("fake"/"real"), 20×16×16 coarse label maps ("fake"/"real"), plus the backbone shapes of the baseline.]
  11. Experimental results with Patch GAN (LIP)
  12. Experimental results with Patch GAN (LIP)
  13. Architecture with two GANs
      [Figure: the Patch GAN architecture extended with a pose discriminator (Pose D). Human pose is estimated with OpenPose and represented as 19×16×16 maps. Losses: Patch GAN loss, Pose GAN loss, shallow NLL loss, deep NLL loss, combined into a total loss. Operations: resize, concat, copy. Tensor shapes shown: 3×256×256 RGB image, 20×256×256 label maps, 20×16×16 coarse label maps ("fake"/"real"), 19×16×16 pose maps, plus the backbone shapes of the baseline.]
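  14. Difference between the two discriminators
      • Patch GAN: the target is a matrix with one entry per patch, $\begin{pmatrix}1&\cdots&1\\\vdots&\ddots&\vdots\\1&\cdots&1\end{pmatrix}$ for real and $\begin{pmatrix}0&\cdots&0\\\vdots&\ddots&\vdots\\0&\cdots&0\end{pmatrix}$ for fake.
      • Pose GAN: the target is a single scalar, 1 for real and 0 for fake.
      [Figure legend: RGB image, pose, label map, feature map.]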
  15. Experimental results with two GANs (LIP)
  16. Experimental results with two GANs (LIP)
  17. Experimental results with two GANs (LIP): total loss
  18. Experimental results with two GANs (LIP): D_loss and G_loss
  19. Contributions
      • We propose PP-GAN, an effective model for human parsing that employs two conditional GANs as supplementary supervision on the shallow, fine layers and the deep, coarse layers of the network, respectively. Our model explicitly divides human parsing into "what" and "where" subtasks within a unified framework and improves parsing performance at both the image level and the semantic level.
      • To the best of our knowledge, this is the first attempt to integrate human pose information into a conditional GAN framework for human parsing, and it significantly reduces the structural errors of the parsing results.
      • In the proposed framework, the discrimination process is naturally divided into two easier tasks, each handled by its own discriminator. The experiments show that multiple discriminators, each focusing on its own area, outperform a single discriminator, which is prone to saturation when facing a complex task.
      • The proposed PP-GAN significantly surpasses previous methods on both the challenging LIP and XXX benchmark datasets.
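Slide 2 describes human parsing as supervised, pixel-wise labeling. The minimal Python sketch below (plain lists, no deep learning library) illustrates what that means at inference and evaluation time: each pixel takes the class with the highest score, and the result is compared pixel by pixel with a ground-truth label map. The toy scores and the label names in the comments are hypothetical; a parser trained on LIP would produce 20 score maps instead of 3.

```python
from typing import List

# scores[c][y][x]: score of class c at pixel (y, x); label maps are [y][x] class ids.
Scores = List[List[List[float]]]
LabelMap = List[List[int]]


def label_map(scores: Scores) -> LabelMap:
    """Assign every pixel the index of its highest-scoring class (per-pixel argmax)."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


def pixel_accuracy(pred: LabelMap, target: LabelMap) -> float:
    """Fraction of pixels whose predicted label matches the ground truth."""
    total = correct = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            total += 1
            correct += int(p == t)
    return correct / total


if __name__ == "__main__":
    # Hypothetical 3-class scores on a 2x2 image (0 = background, 1 = hair, 2 = face).
    scores = [
        [[0.9, 0.1], [0.2, 0.7]],
        [[0.05, 0.8], [0.1, 0.2]],
        [[0.05, 0.1], [0.7, 0.1]],
    ]
    pred = label_map(scores)
    print(pred)  # [[0, 1], [2, 0]]
    print(pixel_accuracy(pred, [[0, 1], [2, 2]]))  # 0.75
```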
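Slide 3 frames the core tension as "deeper or denser?". One way to make it concrete is the standard receptive-field recurrence for a stack of convolutions: a stride multiplies the spacing between output units (losing resolution), while a dilation enlarges a kernel's footprint without downsampling. The sketch below illustrates that arithmetic only; the layer configurations are hypothetical and are not taken from the slides.

```python
from typing import List, Tuple

# Each layer is described as (kernel_size, stride, dilation).
Layer = Tuple[int, int, int]


def receptive_field(layers: List[Layer]) -> int:
    """Receptive field, in input pixels, of one output unit after a stack of conv layers."""
    rf = 1    # a single input pixel
    jump = 1  # distance in input pixels between adjacent units of the current layer
    for kernel, stride, dilation in layers:
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf


def output_stride(layers: List[Layer]) -> int:
    """Downsampling factor of the final feature map relative to the input."""
    factor = 1
    for _, stride, _ in layers:
        factor *= stride
    return factor


if __name__ == "__main__":
    deeper = [(3, 1, 1)] * 3                     # plain 3x3 convs
    strided = [(3, 2, 1)] * 3                    # 3x3 convs with stride 2
    dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]  # 3x3 atrous convs, rates 1, 2, 4
    for name, stack in [("deeper", deeper), ("strided", strided), ("dilated", dilated)]:
        print(name, receptive_field(stack), output_stride(stack))
    # deeper 7 1 / strided 15 8 / dilated 15 1
```

The strided and the dilated stacks both reach a 15-pixel receptive field, but only the dilated one keeps an output stride of 1, i.e. dense, full-resolution predictions.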
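Slides 4 and 5 cite atrous (dilated) convolution as used in DeepLab. The 1-D sketch below shows only the sampling pattern: a filter with rate r reads inputs r samples apart, so a K-tap filter covers r*(K-1)+1 samples while still having K weights. DeepLab applies the same idea in 2-D inside the backbone; this code is an illustrative stand-in, not DeepLab's implementation.

```python
from typing import List


def atrous_conv1d(signal: List[float], weights: List[float], rate: int) -> List[float]:
    """1-D atrous (dilated) convolution with zero padding that keeps the output length.

    Tap j of the filter reads the input `rate * j` samples after the window start,
    so the filter spans rate * (len(weights) - 1) + 1 samples with no extra weights.
    Implemented as cross-correlation (no kernel flip), as in most deep learning frameworks.
    """
    span = rate * (len(weights) - 1)
    left = span // 2  # centre the dilated window on each output position
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = i - left + j * rate
            if 0 <= idx < len(signal):  # zero padding outside the signal
                acc += w * signal[idx]
        out.append(acc)
    return out


if __name__ == "__main__":
    impulse = [0.0] * 4 + [1.0] + [0.0] * 4
    weights = [1.0, 2.0, 3.0]
    print(atrous_conv1d(impulse, weights, rate=1))  # nonzero at indices 3, 4, 5
    print(atrous_conv1d(impulse, weights, rate=2))  # nonzero at indices 2, 4, 6
```

Feeding an impulse through the same three weights at rate 1 and rate 2 makes the "holes" visible: the response spreads over 3 consecutive samples at rate 1 and over 5 samples, every other one, at rate 2.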
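Slide 6 contrasts two skip-connection styles. In FCN, upsampled coarse predictions are added element-wise to predictions from a finer layer; in U-Net, upsampled decoder features are concatenated with encoder features along the channel axis. The sketch below uses nearest-neighbour upsampling as a stand-in for the learned (or bilinear) upsampling those networks actually use, and omits the convolutions around the fusion; it only illustrates sum versus concatenation.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # [channel][row][col]


def upsample2x(fmap: FeatureMap) -> FeatureMap:
    """Nearest-neighbour 2x upsampling of every channel."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # repeat each column twice
            rows.append(wide)
            rows.append(list(wide))                     # repeat each row twice
        out.append(rows)
    return out


def fuse_sum(coarse: FeatureMap, fine: FeatureMap) -> FeatureMap:
    """FCN-style skip: upsample the coarse map and add it element-wise (same channel count)."""
    up = upsample2x(coarse)
    return [
        [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(ca, cb)]
        for ca, cb in zip(up, fine)
    ]


def fuse_concat(coarse: FeatureMap, fine: FeatureMap) -> FeatureMap:
    """U-Net-style skip: upsample the coarse map and stack it with the fine map along channels."""
    return upsample2x(coarse) + fine


if __name__ == "__main__":
    coarse = [[[1.0, 2.0]]]                 # 1 channel, 1x2
    fine = [[[10.0] * 4, [20.0] * 4]]       # 1 channel, 2x4
    print(fuse_sum(coarse, fine))           # still 1 channel, 2x4
    print(len(fuse_concat(coarse, fine)))   # 2 channels
```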
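Slide 7 only names "edge + pixel voting" (Co-CNN) without details. As a hedged illustration of the voting part, the sketch below assumes the image has already been split into regions, for example by an edge detector or a superpixel method (not shown), and replaces every pixel's label with the majority label of its region. This is a generic smoothing step, not a reproduction of Co-CNN.

```python
from collections import Counter
from typing import Dict, List


def region_vote(labels: List[List[int]], regions: List[List[int]]) -> List[List[int]]:
    """Replace each pixel's label by the most frequent label inside its region.

    `regions[y][x]` is a region id (hypothetically produced from edges or superpixels).
    Ties go to the label seen first in raster order, because Counter.most_common
    keeps first-encountered order for equal counts.
    """
    votes: Dict[int, Counter] = {}
    for label_row, region_row in zip(labels, regions):
        for label, region in zip(label_row, region_row):
            votes.setdefault(region, Counter())[label] += 1
    winner = {region: counts.most_common(1)[0][0] for region, counts in votes.items()}
    return [[winner[region] for region in region_row] for region_row in regions]


if __name__ == "__main__":
    noisy = [[1, 1, 2],
             [1, 3, 2]]
    regions = [[0, 0, 1],
               [0, 0, 1]]
    print(region_vote(noisy, regions))  # [[1, 1, 2], [1, 1, 2]]
```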
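The backbone tensor shapes on slide 8 follow from the network's strides. The sketch below recomputes them under two assumptions: each listed stage halves the resolution except the last, where atrous convolution replaces the stride (stride 1), and the output has 20 classes (LIP's 19 part labels plus background). The stage names follow common ResNet naming and are labels for this example only.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)

# (stage name, output channels, stride). Channel counts come from the slide; names are illustrative.
# In a standard ResNet the stride-2 step before "res2" is the max-pooling layer.
# Assumption: the last stage uses atrous convolution instead of a stride, so its stride is 1.
STAGES: List[Tuple[str, int, int]] = [
    ("conv1", 64, 2),
    ("res2", 256, 2),
    ("res3", 512, 2),
    ("res4", 1024, 2),
    ("res5_atrous", 2048, 1),
]
NUM_CLASSES = 20  # LIP: 19 part labels + background


def trace_shapes(size: int = 256) -> List[Tuple[str, Shape]]:
    """Propagate a square RGB input through the stages and record each output shape."""
    shapes: List[Tuple[str, Shape]] = [("input", (3, size, size))]
    for name, channels, stride in STAGES:
        size //= stride
        shapes.append((name, (channels, size, size)))
    return shapes


def output_shape(input_size: int = 256) -> Tuple[Shape, int]:
    """Final label-map shape and the upsampling factor needed to reach it."""
    feature_size = trace_shapes(input_size)[-1][1][1]
    return (NUM_CLASSES, input_size, input_size), input_size // feature_size


if __name__ == "__main__":
    for name, shape in trace_shapes():
        print(f"{name:12s} {shape}")
    print(output_shape())  # ((20, 256, 256), 16)
```

Under these assumptions the 16×16 score maps need 16× upsampling to reach the 20×256×256 label map.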
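Slide 10 lists three training signals, a shallow NLL loss, a deep NLL loss and a Patch GAN loss, combined into a total loss, with a resize step in the figure. The sketch below shows one plausible reading: per-pixel negative log-likelihood on softmax probabilities, nearest-neighbour resizing of a ground-truth label map to the coarse 16×16 resolution (class ids must not be interpolated), and a weighted sum. The weight `gan_weight` and the exact pairing of losses with branches are assumptions, not values from the slides.

```python
import math
from typing import List

Probs = List[List[List[float]]]  # probs[c][y][x]: softmax probability of class c at (y, x)
Labels = List[List[int]]         # labels[y][x]: ground-truth class id


def nll_loss(probs: Probs, target: Labels) -> float:
    """Mean per-pixel negative log-likelihood of the ground-truth class."""
    total, count = 0.0, 0
    for y, row in enumerate(target):
        for x, c in enumerate(row):
            total -= math.log(probs[c][y][x])
            count += 1
    return total / count


def resize_labels(target: Labels, out_h: int, out_w: int) -> Labels:
    """Nearest-neighbour resize of a label map; class ids are copied, never averaged."""
    in_h, in_w = len(target), len(target[0])
    return [
        [target[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def total_loss(shallow_nll: float, deep_nll: float, patch_gan: float, gan_weight: float) -> float:
    """Sum of the two NLL terms plus a weighted adversarial term (the weight is hypothetical)."""
    return shallow_nll + deep_nll + gan_weight * patch_gan


if __name__ == "__main__":
    # Two classes on a 1x2 image; both pixels belong to class 0.
    probs = [[[0.5, 1.0]], [[0.5, 0.0]]]
    print(nll_loss(probs, [[0, 0]]))  # log(2) / 2, about 0.3466
    print(resize_labels([[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 3, 3], [2, 2, 3, 3]], 2, 2))
    print(total_loss(1.0, 2.0, 3.0, gan_weight=0.5))  # 4.5
```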
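Slide 14 contrasts the two discriminators' targets: the Patch GAN discriminator outputs one real/fake score per patch and is trained toward an all-ones or all-zeros matrix, whereas the Pose GAN discriminator outputs a single score trained toward 1 or 0. The sketch below computes binary cross-entropy for both cases on hypothetical probabilities; averaging over patches and the clamping constant `eps` are implementation choices assumed here.

```python
import math
from typing import List


def bce(prob: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one predicted probability, clamped to avoid log(0)."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def patch_d_loss(patch_probs: List[List[float]], real: bool) -> float:
    """Patch discriminator: one probability per patch; the target is an all-ones (real)
    or all-zeros (fake) matrix, and the loss is averaged over patches."""
    target = 1.0 if real else 0.0
    losses = [bce(p, target) for row in patch_probs for p in row]
    return sum(losses) / len(losses)


def pose_d_loss(prob: float, real: bool) -> float:
    """Pose discriminator: a single probability for the whole input; the target is 1 or 0."""
    return bce(prob, 1.0 if real else 0.0)


if __name__ == "__main__":
    patch_probs = [[0.9, 0.8], [0.7, 0.95]]  # hypothetical 2x2 patch scores for a real pair
    print(patch_d_loss(patch_probs, real=True))
    print(pose_d_loss(0.2, real=False))  # -log(0.8), about 0.223
```

For the generator's adversarial term, a common choice is to evaluate the same functions on predicted label maps with `real=True`; the slides do not state which form PP-GAN uses.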
