
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognition II

ICME2019 Tutorial: Human Behavior Understanding: From Human-Oriented Analysis to Action Recognition II

Human Behavior Understanding: From Human-Oriented Analysis to Action Recognition II

  1. 1. 1 Human Behavior Understanding: From Human-Oriented Analysis to Action Recognition. Wu Liu, CV Lab, JD AI Research (liuwu1@jd.com)
  2. 2. 2 Human Behavior Understanding: Human-Oriented Analysis (Pose, Parsing, PoseTrack)
  3. 3. 3 Human Behavior Understanding: Human-Oriented Analysis (Pose, Parsing, PoseTrack)
  4. 4. 4 Introduction • Human pose estimation Single person Multi person 1. Right_Shoulder 2. Right_Elbow 3. Right_Wrist 4. Left_Shoulder 5. Left_Elbow 6. Left_Wrist 7. Right_Hip 8. Right_Knee 9. Right_Ankle 10. Left_Hip 11. Left_Knee 12. Left_Ankle 13. Head 14. Neck 15. Spine 16. Pelvis
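For concreteness, the 16-keypoint layout listed on this slide can be stored as a simple array-based structure. A minimal sketch follows; the (x, y, visibility) convention is a common choice and an assumption here, not something specified on the slide.

```python
import numpy as np

# 16-joint keypoint layout as listed on the slide (MPII-style ordering).
KEYPOINT_NAMES = [
    "right_shoulder", "right_elbow", "right_wrist",
    "left_shoulder", "left_elbow", "left_wrist",
    "right_hip", "right_knee", "right_ankle",
    "left_hip", "left_knee", "left_ankle",
    "head", "neck", "spine", "pelvis",
]

# A single pose is commonly stored as an (N_joints, 3) array: x, y, visibility.
example_pose = np.zeros((len(KEYPOINT_NAMES), 3), dtype=np.float32)
```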
  5. 5. 5 Applications • Human action recognition • Human-computer interaction • Animation • Intelligent Retail, such as self-service supermarket and intelligent warehouses
  6. 6. 6 Challenges • Various appearances and low-resolutions • Diverse human poses and views • Occluded or invisible key points • Crowded background
  7. 7. 7 Top-down Methods [1] Stacked hourglass networks for human pose estimation. [Newell, ECCV2016] [2] Towards accurate multi-person pose estimation in the wild. [Papandreou, CVPR2017] [3] RMPE: Regional Multi-Person Pose Estimation. [Fang, ICCV2017] [4] Simple Baselines for Human Pose Estimation and Tracking. [Xiao, ECCV2018] [5] Cascaded Pyramid Network for Multi-Person Pose Estimation. [Chen, CVPR2018] [6] HRNet: Deep High-Resolution Representation Learning for Human Pose Estimation. [Sun, CVPR2019] Human detection + single-person keypoint detection. Advantage: state-of-the-art accuracy. Problems: lower speed; accuracy depends on the human detector.
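The top-down recipe summarized above (detect people, then run a single-person pose network on each crop) can be sketched as follows. `person_detector` and `single_person_pose_net` are hypothetical placeholder callables, not any specific library API.

```python
import numpy as np

def top_down_pose_estimation(image, person_detector, single_person_pose_net):
    """Generic top-down pipeline sketch: detect people, crop each box,
    run a single-person pose network, and map heatmap peaks back to the image."""
    poses = []
    for (x1, y1, x2, y2, score) in person_detector(image):
        crop = image[int(y1):int(y2), int(x1):int(x2)]
        heatmaps = single_person_pose_net(crop)          # (K, h, w) array
        K, h, w = heatmaps.shape
        flat = heatmaps.reshape(K, -1)
        idx = flat.argmax(axis=1)
        ys, xs = np.unravel_index(idx, (h, w))
        # Map heatmap coordinates back into original-image coordinates.
        kx = x1 + xs * (x2 - x1) / w
        ky = y1 + ys * (y2 - y1) / h
        poses.append(np.stack([kx, ky, flat.max(axis=1)], axis=1))
    return poses
```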
  8. 8. 8 Bottom-Up Methods [1] Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields. [Cao, CVPR2017] [2] Associative Embedding: End-to-End Learning for Joint Detection and Grouping. [Newell, NeurIPS 2017] [3] MultiPoseNet: Fast multi-person pose estimation using pose residual network. [Kocabas, ECCV2018] [4] PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model. [Papandreou, ECCV2018] [5] PifPaf: Composite Fields for Human Pose Estimation. [Kreiss, CVPR2019] [6] Multi-person Articulated Tracking with Spatial and Temporal Embeddings. [Jin, CVPR2019] Detecting keypoints + assembling them into human bodies. Advantages: higher speed; does not rely on human detection. Problem: lower accuracy.
  9. 9. 9 • Single person: stacked hourglass – basic network backbone [1] • Each hourglass first subsamples the feature maps, and then upsamples them, combining higher-resolution features from the bottom layers. • This bottom-up, top-down processing is repeated several times. Single Person Alejandro Newell, Kaiyu Yang, Jia Deng: Stacked Hourglass Networks for Human Pose Estimation. ECCV (8) 2016: 483-499.
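A minimal PyTorch sketch of the repeated subsample-then-upsample idea behind an hourglass block. This uses plain convolutions instead of the paper's residual blocks, so it is an illustration of the structure rather than the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Hourglass(nn.Module):
    """Minimal hourglass block: downsample, recurse, upsample, and add a
    skip branch kept at the higher resolution."""
    def __init__(self, channels, depth=4):
        super().__init__()
        self.skip = nn.Conv2d(channels, channels, 3, padding=1)
        self.down = nn.Conv2d(channels, channels, 3, padding=1)
        self.inner = (Hourglass(channels, depth - 1) if depth > 1
                      else nn.Conv2d(channels, channels, 3, padding=1))
        self.up = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        skip = self.skip(x)                       # high-resolution branch
        y = F.max_pool2d(x, 2)                    # subsample
        y = F.relu(self.down(y))
        y = self.inner(y)                         # recurse to lower scales
        y = F.relu(self.up(y))
        y = F.interpolate(y, size=skip.shape[-2:], mode="nearest")
        return skip + y                           # combine with skip features

# Stacking several such blocks with intermediate supervision yields the
# full stacked-hourglass network described in the paper.
```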
  10. 10. 10 • Single person: feature pyramid module [2] • Feature pyramid representation can provide sufficient context information, especially for the occluded and invisible key points. • The residual blocks are substituted by feature pyramid modules. Each module consists of bottlenecks at different resolutions. Learning feature pyramids for human pose estimation. W. Yang, S. Li, W. Ouyang, et al. ICCV 2017. Single Person https://github.com/bearpaw/PyraNet
  11. 11. 11 Top-down Methods • George Papandreou, Tyler Zhu, Nori Kanazawa, Alexander Toshev, Jonathan Tompson, Chris Bregler, Kevin Murphy: Towards Accurate Multi-person Pose Estimation in the Wild. CVPR 2017: 3711-3719
  12. 12. 12 • Haoshu Fang, Shuqin Xie, Yu-Wing Tai, Cewu Lu: RMPE: Regional Multi-person Pose Estimation. ICCV 2017: 2353-2362 Top-down Methods • Handle inaccurate bounding boxes and redundant detections • Symmetric Spatial Transformer Network (SSTN) • Parametric Pose Non-Maximum-Suppression (NMS) • Pose-Guided Proposals Generator (PGPG) https://cvsjtu.wordpress.com/rmpe-regional-multi-person-pose-estimation/
  13. 13. 13 • Haoshu Fang, Shuqin Xie, Yu-Wing Tai, Cewu Lu: RMPE: Regional Multi-person Pose Estimation. ICCV 2017: 2353-2362 Top-down Methods https://cvsjtu.wordpress.com/rmpe-regional-multi-person-pose-estimation/ (Figures: the problem of bounding-box localization errors; the Symmetric Spatial Transformer Network.) • Symmetric Spatial Transformer Network (SSTN) • Parametric Pose Non-Maximum-Suppression (NMS) • Pose-Guided Proposals Generator (PGPG)
  14. 14. 14 • Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun: Cascaded Pyramid Network for Multi-Person Pose Estimation. CVPR 2018: 7103-7112 Top-down Methods • The model applies pyramid features: in GlobalNet, features from different levels are added together to give a rough prediction of keypoint positions. • RefineNet takes GlobalNet's output, upsamples the pyramid features, and uses online hard keypoint mining to improve accuracy.
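The online hard keypoint mining used by RefineNet can be illustrated with a short loss sketch over (B, K, H, W) heatmaps. This is a simplified version of the idea (keep only the hardest joints per sample), not the authors' exact implementation.

```python
import torch

def ohkm_l2_loss(pred, target, topk=8):
    """L2 heatmap loss with online hard keypoint mining: average the loss
    over only the `topk` hardest joints of each sample.
    pred, target: (B, K, H, W) heatmaps; requires K >= topk."""
    per_joint = ((pred - target) ** 2).mean(dim=(2, 3))   # (B, K) per-joint loss
    hard, _ = per_joint.topk(topk, dim=1)                 # hardest joints only
    return hard.mean()
```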
  15. 15. 15 • Bin Xiao, Haiping Wu, Yichen Wei: Simple Baselines for Human Pose Estimation and Tracking. ECCV (6) 2018: 472-487 Top-down Methods https://github.com/leoxiaobin/pose.pytorch How high-resolution feature maps are generated: this method combines the upsampling and convolutional parameters into deconvolutional layers in a much simpler way, without using skip-layer connections.
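A hedged sketch of such a deconvolution head in PyTorch; the layer count and width follow commonly used settings and are assumptions here, not a quote of the repository code.

```python
import torch.nn as nn

def deconv_head(in_channels, num_joints, num_layers=3, width=256):
    """Simple-Baselines-style head: a few transposed-convolution layers
    upsample backbone features, then a 1x1 conv predicts one heatmap per joint."""
    layers = []
    c = in_channels
    for _ in range(num_layers):
        layers += [
            nn.ConvTranspose2d(c, width, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(width),
            nn.ReLU(inplace=True),
        ]
        c = width
    layers.append(nn.Conv2d(width, num_joints, kernel_size=1))
    return nn.Sequential(*layers)
```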
  16. 16. 16 • Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang: Deep High-Resolution Representation Learning for Human Pose Estimation. CVPR 2019 Top-down Methods 1. The proposed human pose estimation network maintains high-resolution representations through the whole process. 2. It starts from a high-resolution subnetwork as the first stage, gradually adds high-to-low resolution subnetworks one by one to form more stages, and connects the multi-resolution subnetworks in parallel. 3. Repeated multi-scale fusions let each of the high-to-low resolution representations receive information from the other parallel representations over and over, leading to rich high-resolution representations. https://github.com/leoxiaobin/deep-high-resolution-net.pytorch
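A toy two-branch exchange unit illustrates the parallel multi-resolution fusion idea (strided convolution to go down, 1x1 convolution plus upsampling to go up). It is only a sketch of the fusion step, not the full HRNet module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoResolutionFusion(nn.Module):
    """Toy multi-resolution fusion for two parallel branches: each branch
    receives the other branch resampled to its own resolution."""
    def __init__(self, c_high, c_low):
        super().__init__()
        self.high_to_low = nn.Conv2d(c_high, c_low, 3, stride=2, padding=1)
        self.low_to_high = nn.Conv2d(c_low, c_high, 1)

    def forward(self, x_high, x_low):
        # Low-resolution branch -> high resolution: 1x1 conv + upsample.
        up = F.interpolate(self.low_to_high(x_low),
                           size=x_high.shape[-2:], mode="nearest")
        # High-resolution branch -> low resolution: strided conv (resized to match).
        down = F.interpolate(self.high_to_low(x_high),
                             size=x_low.shape[-2:], mode="nearest")
        return x_high + up, x_low + down
```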
  17. 17. 17 Bottom-Up Methods [1] Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields. [Cao, CVPR2017] [2] Associative Embedding: End-to-End Learning for Joint Detection and Grouping. [Newell, NeurIPS 2017] [3] MultiPoseNet: Fast multi-person pose estimation using pose residual network. [Kocabas, ECCV2018] [4] PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model. [Papandreou, ECCV2018] [5] PifPaf: Composite Fields for Human Pose Estimation. [Kreiss, CVPR2019] [6] Multi-person Articulated Tracking with Spatial and Temporal Embeddings. [Jin, CVPR2019] Detecting keypoints + assembling them into human bodies. Advantage: higher speed. Problem: lower accuracy.
  18. 18. 18 • Shih-En Wei, Varun Ramakrishna, Takeo Kanade, Yaser Sheikh: Convolutional Pose Machines. CVPR 2016: 4724-4732 • Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh: Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields. CVPR 2017: 1302-1310 • Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei, Yaser Sheikh: OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. CoRR abs/1812.08008 (2018) Bottom-Up Methods OpenPose
  19. 19. 19 Bottom-Up Methods OpenPose Part association strategies. Architecture of the two-branch multi-stage CNN. Graph matching.
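The part-association step can be illustrated by the line-integral score OpenPose computes over the part affinity field between two candidate keypoints; limbs are then assigned by bipartite matching per limb type. The following is a simplified NumPy sketch of the scoring, not the reference implementation.

```python
import numpy as np

def paf_limb_score(paf_x, paf_y, p_a, p_b, num_samples=10):
    """Score a candidate limb between keypoints p_a=(x, y) and p_b=(x, y) by
    sampling the part affinity field along the segment and averaging the dot
    product with the unit vector from a to b."""
    p_a, p_b = np.asarray(p_a, float), np.asarray(p_b, float)
    v = p_b - p_a
    v = v / (np.linalg.norm(v) + 1e-8)
    scores = []
    for t in np.linspace(0.0, 1.0, num_samples):
        x, y = (p_a + t * (np.asarray(p_b, float) - p_a)).round().astype(int)
        # Alignment between the field direction and the limb direction.
        scores.append(paf_x[y, x] * v[0] + paf_y[y, x] * v[1])
    return float(np.mean(scores))
```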
  20. 20. 20 • Associative Embedding: End-to-end Learning for Joint Detection and Grouping. Alejandro Newell, Zhiao Huang, and Jia Deng. Neural Information Processing Systems (NIPS), 2017. Bottom-Up Methods https://github.com/princeton-vl/pose-ae-train Detection + Grouping
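The grouping idea can be illustrated with a sketch of the associative-embedding loss: pull the tags of one person's joints toward their mean, and push different people's mean tags apart. This is a simplified scalar-tag version for illustration.

```python
import torch

def associative_embedding_loss(tags, joints_per_person, sigma=1.0):
    """`tags`: 1-D tensor with one scalar embedding per detected joint.
    `joints_per_person`: list of index tensors, one per ground-truth person."""
    means = [tags[idx].mean() for idx in joints_per_person]
    # Pull term: joints of the same person should share a tag value.
    pull = sum(((tags[idx] - m) ** 2).mean()
               for idx, m in zip(joints_per_person, means))
    # Push term: mean tags of different people should be far apart.
    push = 0.0
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            push = push + torch.exp(-(means[i] - means[j]) ** 2 / (2 * sigma ** 2))
    return pull + push
```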
  21. 21. 21 • Muhammed Kocabas, Salih Karagoz, Emre Akbas: MultiPoseNet: Fast Multi-Person Pose Estimation Using Pose Residual Network. ECCV (11) 2018: 437-453 Bottom-Up Methods https://github.com/mkocabas/pose-residual-network MultiPoseNet can jointly handle person detection, keypoint detection, person segmentation and pose estimation problems.
  22. 22. 22 • George Papandreou, Tyler Zhu, Liang-Chieh Chen, Spyros Gidaris, Jonathan Tompson, Kevin Murphy: PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model. ECCV (14) 2018: 282-299 Bottom-Up Methods • The PersonLab system consists of a CNN model that predicts: (1) keypoint heatmaps, (2) short-range offsets, (3) mid-range pairwise offsets, (4) person segmentation maps, and (5) long-range offsets. • The first three predictions are used by the Pose Estimation Module to detect human poses. • The latter two, along with the human pose detections, are used by the Instance Segmentation Module to predict person instance segmentation masks.
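A much-simplified sketch of the first decoding step (thresholded heatmap peaks refined by the short-range offsets); the real Pose Estimation Module additionally uses Hough voting and groups keypoints into people via the mid-range offsets.

```python
import numpy as np

def decode_keypoints(heatmaps, short_offsets, threshold=0.5):
    """heatmaps: (K, H, W); short_offsets: (K, 2, H, W) holding (dy, dx).
    Returns (joint_id, y, x, score) candidates refined by the offsets."""
    keypoints = []
    K, H, W = heatmaps.shape
    for k in range(K):
        ys, xs = np.where(heatmaps[k] > threshold)
        for y, x in zip(ys, xs):
            dy, dx = short_offsets[k, :, y, x]
            keypoints.append((k, y + dy, x + dx, heatmaps[k, y, x]))
    return keypoints
```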
  23. 23. 24 Pose Estimation Datasets
      Dataset | Single person | Multi-person | Num of Kpts | Num of Persons
      LSP | Y | N | 14 | ~2K
      FLIC | Y | N | 9 | ~20K
      MPII | Y | Y | 16 | ~25K
      COCO | N | Y | 17 | ~100K
      AI Challenger | N | Y | 14 | ~700K
      PoseTrack | N | Y | 15 | ~160K
  24. 24. 25 Pose Estimation COCO leaderboard
  25. 25. 26 Pose Estimation Paper Leaderboard
      COCO (mAP):
      Bottom-up Methods: OpenPose (CVPR2017) 61.8 | Associative Embedding (NeurIPS 2017) 65.5 | MultiPoseNet (ECCV2018) 69.6 | PersonLab (ECCV2018) 68.7 | PifPaf (CVPR2019) 66.7 | Multi-person Articulated Tracking (CVPR2019) 68.0
      Top-down Methods: G-RMI (CVPR2017) 64.9 | Mask RCNN (ICCV2017) 63.1 | RMPE (ICCV2017) 72.3 | Simple Baseline (ECCV2018) 73.7 | CPN (CVPR2018) 72.1 | HRNet (CVPR2019) 75.5
      MPII (PCKh@50):
      Bottom-up Methods: OpenPose (CVPR2017) 75.6 | Associative Embedding (NeurIPS 2017) 77.5
      Top-down Methods: RMPE (ICCV2017) 82.1 | Simple Baseline (ECCV2018) 91.5 | HRNet (CVPR2019) 92.3
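The COCO keypoint mAP values above are computed from Object Keypoint Similarity (OKS). A minimal sketch of the OKS score for one predicted/ground-truth pose pair:

```python
import numpy as np

def oks(pred, gt, visibility, area, kappas):
    """Object Keypoint Similarity.
    pred, gt: (K, 2) keypoint coordinates; visibility: (K,), >0 for labeled
    joints; area: object scale s^2; kappas: (K,) per-keypoint constants."""
    d2 = ((pred - gt) ** 2).sum(axis=1)          # squared distances
    e = d2 / (2.0 * area * kappas ** 2 + 1e-8)   # scale- and joint-normalized
    mask = visibility > 0
    return float(np.exp(-e)[mask].mean()) if mask.any() else 0.0
```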
  26. 26. 27 Human Pose Estimation API @ Neuhub (1) CVPR 2018 LIP Challenge, Single Human Pose Estimation: 1st place. (2) CVPR 2018 LIP Challenge, Multi-Human Pose Estimation: 1st place.
  27. 27. 28 ‘Finger Heart & 618’ Gesture for AR Scan; WeChat Mini Program for Halloween; WeChat Mini Program for POPMART. Human Pose Estimation API @ Neuhub
  28. 28. 29 Human Behavior Understanding: Human-Oriented Analysis (Pose, Parsing, PoseTrack)
  29. 29. 30 PoseTrack • Mykhaylo Andriluka, Google Research, Zürich, Switzerland • Umar Iqbal, University of Bonn, Germany • Anton Milan, Amazon • Christoph Lassner, Amazon • Eldar Insafutdinov, MPI for Informatics, Saarbrücken, Germany • Leonid Pishchulin, MPI for Informatics, Saarbrücken, Germany • Juergen Gall, University of Bonn, Germany • Bernt Schiele, MPI for Informatics, Saarbrücken, Germany PoseTrack is a joint project of the Max Planck Institute for Informatics, University of Bonn and the PoseTrack team.
  30. 30. 31 PoseTrack Key Figures • 1,356 video sequences • 46K annotated video frames • 276K body pose annotations. Two challenges: • Multi-Person Pose Estimation • Multi-Person Pose Tracking
  31. 31. 32 Challenges • Large pose and scale variations • Fast motions • A varying number of persons • Partially visible body parts due to occlusion or truncation
  32. 32. 33 Related Work Bottom-up Methods [1] Umar Iqbal, Anton Milan, and Juergen Gall. PoseTrack: Joint Multi-person Pose Estimation and Tracking. In CVPR 2017 & CVPR 2018. [2] Eldar Insafutdinov, Mykhaylo Andriluka, Leonid Pishchulin, Siyu Tang, Evgeny Levinkov, Bjoern Andres, and Bernt Schiele. ArtTrack: Articulated Multi-Person Tracking in the Wild. In CVPR 2017. [3] Andreas Doering, Umar Iqbal, and Juergen Gall. JointFlow: Temporal Flow Fields for Multi Person Pose Tracking. In BMVC 2018. [4] Matteo Fabbri, Fabio Lanzi, Simone Calderara, Andrea Palazzi, Roberto Vezzani, and Rita Cucchiara. Learning to Detect and Track Visible and Occluded Body Joints in a Virtual World. In ECCV 2018. [5] Sheng Jin, Wentao Liu, Wanli Ouyang, Chen Qian: Multi-person Articulated Tracking with Spatial and Temporal Embeddings. CVPR 2019. Top-down Methods [1] Rohit Girdhar, Georgia Gkioxari, Lorenzo Torresani, Manohar Paluri, and Du Tran. Detect-and-Track: Efficient Pose Estimation in Videos. In CVPR 2018. [2] Yuliang Xiu, Jiefeng Li, Haoyu Wang, Yinghong Fang, and Cewu Lu. Pose Flow: Efficient Online Pose Tracking. In BMVC 2018. [3] Bin Xiao, Haiping Wu, and Yichen Wei. Simple Baselines for Human Pose Estimation and Tracking. In ECCV 2018.
  33. 33. 34 Top-down Methods Rohit Girdhar, Georgia Gkioxari, Lorenzo Torresani, Manohar Paluri, and Du Tran. Detect-and-Track: Efficient Pose Estimation in Videos. In CVPR 2018. https://github.com/facebookresearch/DetectAndTrack They propose a two-stage approach to keypoint estimation and tracking in videos: 1) a novel video pose estimation formulation, 3D Mask R-CNN, that takes a short video clip as input and produces a tubelet per person and keypoints within it; 2) a lightweight optimization to link the detections over time.
  34. 34. 35 Top-down Methods Yuliang Xiu, Jiefeng Li, Haoyu Wang, Yinghong Fang, and Cewu Lu. Pose Flow: Efficient Online Pose Tracking. In BMVC 2018. https://github.com/YuliangXiu/PoseFlow • Overall pipeline: 1) Pose Estimator, 2) Pose Flow Builder, 3) Pose Flow NMS. • First, they estimate multi-person poses. • Second, they build pose flows by maximizing overall confidence and purify them with Pose Flow NMS. • Finally, reasonable multi-pose trajectories are obtained.
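Cross-frame association of this kind can be sketched with a simple greedy matcher. This illustrates the general linking step, not the exact Pose Flow builder or its NMS; `similarity` is any pose-to-pose score such as OKS or box IoU and is a placeholder here.

```python
import numpy as np

def link_poses_across_frames(prev_poses, curr_poses, similarity, threshold=0.5):
    """Greedy cross-frame association: match each current pose to its most
    similar unmatched pose in the previous frame; unmatched poses start new tracks."""
    matches, used = [], set()
    for j, cp in enumerate(curr_poses):
        scores = [similarity(pp, cp) if i not in used else -1.0
                  for i, pp in enumerate(prev_poses)]
        if scores and max(scores) > threshold:
            i = int(np.argmax(scores))
            used.add(i)
            matches.append((i, j))       # current pose j continues track i
        else:
            matches.append((None, j))    # start a new track
    return matches
```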
  35. 35. 36 Top-down Methods Bin Xiao, Haiping Wu, and Yichen Wei. Simple Baselines for Human Pose Estimation and Tracking. In ECCV 2018 https://github.com/microsoft/human-pose-estimation.pytorch
  36. 36. 37 Related Work Bottom-up Methods [1] Umar Iqbal, Anton Milan, and Juergen Gall. PoseTrack: Joint Multi-person Pose Estimation and Tracking. In CVPR 2017 & CVPR 2018. [2] Eldar Insafutdinov, Mykhaylo Andriluka, Leonid Pishchulin, Siyu Tang, Evgeny Levinkov, Bjoern Andres, and Bernt Schiele. ArtTrack: Articulated Multi-Person Tracking in the Wild. In CVPR 2017. [3] Andreas Doering, Umar Iqbal, and Juergen Gall. JointFlow: Temporal Flow Fields for Multi Person Pose Tracking. In BMVC 2018. [4] Matteo Fabbri, Fabio Lanzi, Simone Calderara, Andrea Palazzi, Roberto Vezzani, and Rita Cucchiara. Learning to Detect and Track Visible and Occluded Body Joints in a Virtual World. In ECCV 2018. [5] Sheng Jin, Wentao Liu, Wanli Ouyang, Chen Qian: Multi-person Articulated Tracking with Spatial and Temporal Embeddings. CVPR 2019. Top-down Methods [1] Rohit Girdhar, Georgia Gkioxari, Lorenzo Torresani, Manohar Paluri, and Du Tran. Detect-and-Track: Efficient Pose Estimation in Videos. In CVPR 2018. [2] Yuliang Xiu, Jiefeng Li, Haoyu Wang, Yinghong Fang, and Cewu Lu. Pose Flow: Efficient Online Pose Tracking. In BMVC 2018. [3] Bin Xiao, Haiping Wu, and Yichen Wei. Simple Baselines for Human Pose Estimation and Tracking. In ECCV 2018.
  37. 37. 38 Bottom-up Methods • Umar Iqbal, Anton Milan, and Juergen Gall. PoseTrack: Joint Multi-person Pose Estimation and Tracking. In CVPR 2017. • Mykhaylo Andriluka, Umar Iqbal, Anton Milan, Eldar Insafutdinov, Leonid Pishchulin, Juergen Gall, and Bernt Schiele. PoseTrack: A Benchmark for Human Pose Estimation and Tracking. In CVPR 2018. OpenPose / DeepCut + Graph partition
  38. 38. 39 Bottom-up Methods • Eldar Insafutdinov, Mykhaylo Andriluka, Leonid Pishchulin, Siyu Tang, Evgeny Levinkov, Bjoern Andres, and Bernt Schiele. ArtTrack: Articulated Multi-Person Tracking in the Wild. In CVPR 2017. https://github.com/eldar/pose-tensorflow
  39. 39. 40 Bottom-up Methods • Andreas Doering, Umar Iqbal, and Juergen Gall. JointFlow: Temporal Flow Fields for Multi Person Pose Tracking. In BMVC 2018.
  40. 40. 41 Bottom-up Methods Matteo Fabbri, Fabio Lanzi, Simone Calderara, Andrea Palazzi, Roberto Vezzani, and Rita Cucchiara. Learning to Detect and Track Visible and Occluded Body Joints in a Virtual World. In ECCV 2018.
  41. 41. 42 Bottom-up Methods • Andreas Doering, Umar Iqbal, and Juergen Gall. JointFlow: Temporal Flow Fields for Multi Person Pose Tracking. In BMVC 2018.
  42. 42. 43 Bottom-up Methods Matteo Fabbri, Fabio Lanzi, Simone Calderara, Andrea Palazzi, Roberto Vezzani, and Rita Cucchiara. Learning to Detect and Track Visible and Occluded Body Joints in a Virtual World. In ECCV 2018.
  43. 43. 44 Bottom-up Methods Sheng Jin, Wentao Liu, Wanli Ouyang, Chen Qian: Multi-person Articulated Tracking with Spatial and Temporal Embeddings. CVPR 2019. • A unified framework for pose estimation and tracking • A bottom-up method • State-of-the-art results. Part-level grouping: part appearance + geometric information. Temporal grouping: human embedding + temporal embedding. Pose tracking: bipartite graph matching.
  44. 44. 45 Bottom-up Methods Sheng Jin, Wentao Liu, Wanli Ouyang, Chen Qian: Multi-person Articulated Tracking with Spatial and Temporal Embeddings. CVPR 2019. Hourglass backbone. Human Embedding (HE): human-level representation. Temporal Instance Embedding (TIE): temporal representation for ID association.
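ID association with such embeddings is typically cast as bipartite matching between consecutive frames. The sketch below uses the Hungarian algorithm; the embedding vectors stand in for the HE/TIE features and the distance threshold is an assumption, so this is an illustration rather than the paper's exact procedure.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_by_embedding(prev_embs, curr_embs, max_dist=1.0):
    """prev_embs: (N, D), curr_embs: (M, D) person embeddings from two frames.
    Returns (prev_index, curr_index) pairs whose embedding distance is small."""
    cost = np.linalg.norm(prev_embs[:, None, :] - curr_embs[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)          # Hungarian matching
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]
```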
  45. 45. 46 PoseTrack in JD AI Research 1. An end-to-end POINet: feature extraction and identity association in a unified network. 2. Pose-guided feature extraction network: pose information + part-alignment attention in hierarchical convolution features. 3. Ovonic insight network to learn the identity matching and switching across frames. [ACM MM 2019]
  46. 46. 47 Human pose estimation + human tracking (PoseTrack leaderboard) https://posetrack.net/leaderboard.php
  47. 47. 48
  48. 48. 49 Human Behavior Understanding: Human-Oriented Analysis (Pose, Parsing, PoseTrack)
  49. 49. 50 What is Human Parsing? Single Human Parsing Multiple Human Parsing Instance-level Human Parsing Fine-grained Human Parsing 59 Categories
  50. 50. 51 Human Parsing Applications (Human Parsing + X): SnapShot, Fashion Analysis, Recommendation, Fashion Captioning, Clothing Search. Example fashion analysis output: popularity index ★★★★★, elegance index ★★★★☆, attractiveness index ★★★★☆. Example generated caption: "Flowing long hair radiates youth and vitality; the swan-yellow long dress shows off a slender figure, and the brown coat and bag add a touch of elegance."
  51. 51. 52 Challenges of Human Parsing • Intrinsic: varied person appearance, ambiguity of clothing, complexity of clothing, low efficiency, small targets, data imbalance • Extrinsic: occlusion, clutter
  52. 52. 53 Human Parsing History: Clothing Parsing and Human & Object Parsing, from constrained to unconstrained settings. Pedestrian Parsing [Bo et al., CVPR11]; Fashion Parsing [Yamaguchi et al., CVPR12] [Liu et al., MM14, TMM14, MM15] [Liang et al., ICCV15, TPAMI15, ECCV16]
  53. 53. 54 Related Work • Single Human parsing [Bo et al., CVPR11] • Unsupervised superpixels • Shape-based matching • Spatial constraints. Conventional methods: Yihang Bo, Charless C. Fowlkes: Shape-based pedestrian parsing. CVPR 2011: 2265-2272
  54. 54. 55 Related Work • Single Human parsing • Conventional methods: • Yamaguchi, Kota, et al. "Parsing clothing in fashion photographs." CVPR, 2012. • Yamaguchi, Kota, M. Hadi Kiapour, and Tamara L. Berg. "Paper doll parsing: Retrieving similar styles to parse clothing items." ICCV, 2013. • Dong, Jian, et al. "A deformable mixture parsing model with parselets." ICCV, 2013. Pose Parsing
  55. 55. 56 Related Work • Single Human parsing • Conventional methods: • Liu, Si, et al. "Fashion parsing with video context." MM2014, TMM2015. • Liu, Si, et al. "Fashion parsing with weak color-category labels." TMM, 2014. weak supervision
  56. 56. 57 Related Work • Single Human parsing • Deep learning-based methods before 2017: • Luo, Ping, Xiaogang Wang, and Xiaoou Tang. "Pedestrian parsing via deep decompositional network." ICCV, 2013. Hog + DNN Deep Decompositional Network
  57. 57. 58 Related Work • Single Human parsing • Deep learning-based methods before 2017: • Liu, Si, et al. "Matching-cnn meets knn: Quasi-parametric human parsing." CVPR. 2015. • Liang, Xiaodan, et al. "Deep human parsing with active template regression." TPAMI, 2015 Parsing by Matching
  58. 58. 59 Related Work • Single Human parsing • Deep learning-based methods before 2017: • Liang, Xiaodan, et al. "Human parsing with contextualized convolutional neural network." ICCV2015, TPAMI2017. Parsing Image-level Label Edge Superpixel
  59. 59. 60 Related Work • Single Human parsing • Deep learning-based methods in 2017 • Gong, Ke, et al. "Look into Person: Self-Supervised Structure-Sensitive Learning and a New Benchmark for Human Parsing." CVPR. 2017. SSL: Self-supervised Structure-sensitive Learning https://github.com/Engineering-Course/LIP_SSL
  60. 60. 61 Related Work • Single Human parsing • Deep learning-based methods in 2017 • Liang, Xiaodan, et al. "Look into Person: Joint Body Parsing & Pose Estimation Network and A New Benchmark." TPAMI, 2018. JPP-Net: Joint Body Parsing & Pose Estimation Network Pose Parsing https://github.com/Engineering-Course/LIP_JPPNet
  61. 61. 62 Related Work • Single Human parsing • Deep learning-based methods in 2018 • Luo, Yawei, et al. "Macro-micro adversarial network for human parsing." ECCV. 2018. MMAN: Macro-Micro Adversarial Network Parsing GAN https://github.com/RoyalVane/MMAN
  62. 62. 63 Related Work • Single Human parsing • Deep learning-based methods in 2018 • Liu, Si, et al. "Cross-domain human parsing via adversarial feature and label adaptation." AAAI, 2018. Cross-domain Human Parsing Parsing GAN https://github.com/mathfinder/Cross-domain-Human-Parsing-via-Adversarial-Feature-and-Label-Adaptation
  63. 63. 64 Related Work • Single Human parsing • Deep learning-based methods in 2018 • Luo, Xianghui, et al. "Trusted Guidance Pyramid Network for Human Parsing." ACMMM, 2018 TGPNet: Trusted Guidance Pyramid Network
  64. 64. 65 Related Work • Multi Human parsing • Li, Qizhu, Anurag Arnab, and Philip HS Torr. "Holistic, Instance-level Human Parsing." BMVC, 2017. Detector + FCN ("parsing-by-detection")
  65. 65. 67 Related Work • Multi Human parsing • Fang, Hao-Shu, et al. “Weakly and Semi Supervised Human Body Part Parsing via Pose-Guided Knowledge Transfer.” CVPR, 2018. Parsing Pose RefineNet https://github.com/MVIG-SJTU/WSHP
  66. 66. 68 Related Work • Multi Human parsing • Gong, Ke, et al. "Instance-level human parsing via part grouping network." ECCV, 2018 Parsing Edge https://github.com/Engineering-Course/CIHP_PGN
  67. 67. 69 Related Work • Multi Human parsing • Zhao, Jian, et al. "Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and A New Benchmark for Multi-Human Parsing." ACMMM, 2018, Best Student Paper. https://github.com/ZhaoJ9014/Multi-Human-Parsing
  68. 68. 70 Related Work • Multi Human parsing • Zhao, Jian, et al. "Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and A New Benchmark for Multi-Human Parsing." ACMMM, 2018, Best Student Paper. Parsing GAN: semantic saliency prediction, instance-agnostic parsing, instance-aware clustering. https://github.com/ZhaoJ9014/Multi-Human-Parsing
  69. 69. 71 Related Work • Multi Human parsing • Li, Jianshu, et al. "Multi-Human Parsing Machines." ACM MM, 2018. GAN Instance Segmentation Parsing
  70. 70. 72 Related Work • Multi Human parsing • Tao Ruan, Ting Liu, et al. "Devil in the details: Towards accurate single and multiple human parsing." AAAI, 2019. Parsing Edge Context Embedding with Edge Perceiving PSPNet U-Net Edge-Net https://github.com/liutinglt/CE2P CE2P
  71. 71. 73 Related Work • Multi Human parsing • Liu, Ting, et al. "Devil in the details: Towards accurate single and multiple human parsing." AAAI, 2019. Parsing Mask R-CNN
  72. 72. 74 Related Work • Multi Human parsing • Gong, Ke et al. "Graphonomy: Universal Human Parsing via Graph Transfer Learning." CVPR, 2019. Universal Human Parsing: One Model for Different Datasets. Parsing Graph Transfer Learning https://github.com/Gaoyiminggithub/Graphonomy
  73. 73. 75 Related Work • Multi Human parsing • Yang, Lu et al. "Parsing R-CNN for Instance-Level Human Analysis." CVPR, 2019. An End-to-end Framework for Multi-Human Parsing: FPN, RPN, Non-Local, Parsing R-CNN
  74. 74. 76 Related Work • Video Human parsing • Zhou, Qixian, et al. “Adaptive Temporal Encoding Network for Video Instance-level Human Parsing.” ACMMM, 2018. https://github.com/HCPLab-SYSU/ATEN
  75. 75. 77 Related Work • Multi Human parsing • Xinchen Liu, et al. "BraidNet: Braiding Semantics and Details for Accurate Human Parsing." ACM MM, 2019. • A Braiding Network with two sub-nets: a deep-and-narrow net to learn semantic knowledge, and a shallow-but-wide net to capture local structures. • A novel Braiding Module: exchanges information between the two sub-nets and learns robust and effective features for small targets. • Pairwise Hard Region Embedding: differentiates ambiguous parsing targets through a hard-aware regional metric learning loss.
  76. 76. 78 Datasets
      Type | Dataset | Total | Train | Val | Test | Class | Instance
      Single | Fashionista | 685 | 456 | - | 229 | 56 | 1
      Single | ATR | 17,700 | 16,000 | 700 | 1,000 | 18 | 1
      Single | LIP | 50,462 | 30,462 | 10,000 | 10,000 | 20 | 1
      Single | JD-Fashion | 16,497 | 16,317 | 180 | - | 21 | 1
      Multiple | PASCAL-Person-Part | 3,533 | 1,716 | - | 1,817 | 7 | ×
      Multiple | CIHP | 38,280 | 28,280 | 5,000 | 5,000 | 20 | √
      Multiple | MHP v1.0 | 4,980 | 3,000 | 1,000 | 980 | 19 | √
      Multiple | MHP v2.0 | 25,403 | 15,403 | 5,000 | 5,000 | 59 | √
      Video | Indoor (1 frame labeled) | 700 | 400 | 200 | 100 | 13 | 1
      Video | Outdoor (1 frame labeled) | 741 | 421 | 120 | 200 | 13 | 1
      Video | VIP (1/25 frames labeled) | 404 | 354 | - | 50 | 20 | √
  77. 77. 79 Evaluation Metric • Single Human Parsing • Pixel accuracy • Mean pixel accuracy • Mean IoU • Frequency weighted IoU • F1-score: F1 = 2·P·R / (P + R)
  78. 78. 80 Evaluation Metric • Multi Human Parsing • Mean IoU • APr & mAP • Percentage of Correctly Parsed (PCP) • Video Human Parsing • Similar to Single & Multi Human Parsing • Additional: FPS
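A minimal sketch of two of the metrics above, pixel accuracy and mean IoU, computed from a confusion matrix over integer label maps:

```python
import numpy as np

def parsing_metrics(pred, gt, num_classes):
    """pred, gt: integer label maps of the same shape.
    Returns (pixel accuracy, mean IoU over classes present in the data)."""
    mask = (gt >= 0) & (gt < num_classes)
    cm = np.bincount(num_classes * gt[mask].astype(int) + pred[mask],
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    pixel_acc = np.diag(cm).sum() / cm.sum()
    union = cm.sum(axis=0) + cm.sum(axis=1) - np.diag(cm)
    iou = np.diag(cm) / np.maximum(union, 1)
    return pixel_acc, iou[union > 0].mean()
```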
  79. 79. 81 Results of Single Human Parsing • On ATR (Method | Pub | Pixel Acc | F1-score)
      Paper Doll | CVPR13 | 88.96 | 44.76
      M-CNN | CVPR15 | 89.57 | 62.81
      ATR | PAMI15 | 91.11 | 64.38
      Deeplab-v2 (vgg16) | PAMI16 | 94.42 | 73.53
      PSPNet (resnet101) | CVPR17 | 95.20 | 75.84
      Co-CNN | ICCV15 | 95.23 | 76.95
      Attention (vgg16) | CVPR16 | 95.41 | 77.23
      Deeplab-v3+ | ECCV18 | 95.96 | 79.49
      LG-LSTM | CVPR16 | 96.18 | 80.97
      TGPNet | MM18 | 96.45 | 81.76
      Graph-LSTM | ECCV16 | 97.60 | 83.76
      Graphonomy | CVPR19 | 98.32 | 90.89
  80. 80. 82 Results of Single Human Parsing • On LIP validation (Method | Pub | Pixel Acc | mIoU)
      SegNet | PAMI17 | 69.04 | 18.17
      FCN-8s | CVPR15 | 76.06 | 28.29
      DeepLabV2 | ICLR15 | 82.66 | 41.64
      Attention | CVPR16 | 83.43 | 42.92
      DeepLabV2 + SSL | CVPR17 | 83.16 | 42.44
      Attention + SSL | CVPR17 | 84.36 | 44.73
      SS-NAN | CVPRW17 | 87.59 | 47.92
      MMAN | ECCV18 | - | 46.81
      JPPNet | PAMI18 | 86.39 | 51.37
      CE2P | AAAI19 | 87.37 | 53.10
      BraidNet | MM19 | 87.60 | 54.42
  81. 81. 83 Results of Multi Human Parsing • On CIHP (Method | Pub | mIoU | AP@0.5 | AP@0.6 | AP@0.7 | mAP)
      PGN | ECCV18 | 55.8 | 35.8 | 28.6 | 20.5 | 33.6
      DMNet | CVPR18 | 61.51 | 46.12 | 41.50 | - | -
      M-CE2P | AAAI19 | 59.50 | 48.69 | 40.13 | 29.74 | 42.83
      Graphonomy | CVPR19 | 58.58 | - | - | - | -
      Parsing RCNN | CVPR19 | 59.8 | - | - | - | -
      BraidNet | MM19 | 60.62 | 48.99 | 41.67 | 32.71 | 43.59
  82. 82. 84 Results of Multi Human Parsing • On MHP v2 (Method | Pub | PCP@0.5 | AP@0.5 | AP@0.6 | AP@0.7 | mAP)
      Mask R-CNN | ICCV17 | 25.11 | 14.90 | - | - | -
      MH-Parser | MM18 | 26.98 | 17.99 | - | - | -
      PGN | ECCV18 | 32.25 | 25.14 | - | - | 41.78
      S-LAB | CVPR18 | 38.27 | 31.47 | - | - | 40.71
      CE2P | AAAI19 | 41.82 | 33.34 | - | - | 42.25
      Parsing RCNN | CVPR19 | 44.2 | - | - | - | 40.3
  83. 83. 85 Thinking in Human Parsing • Methodology: • Multi-task learning: parsing + pose + edge • Multi-granularity supervision: low + middle + high; un/semi-supervised • Improve efficiency: toward real-time • Cross-domain: fashion → surveillance • Multi-modality: image → video
  84. 84. 86 Thanks! liuwu1@jd.com Computer Vision and Multimedia Lab AI Platform and Research
