IATMSI 2022 Presentation Format.pptx

IEEE International Conference on Interdisciplinary Approaches in Technology and
Management for Social Innovation (Hybrid)
December 21 – 23, 2022, Gwalior, India
Enabling the Change! Social Innovation
for sustainable societies
Paper Title: Human Pose Estimation: Benchmarking Deep
Learning-based Methods
Authors and Affiliation
Mayank Lovanshi and Vivek Tiwari, IIIT-Naya Raipur
Paper ID: 8399
Track No.: 3
Presented by
Mayank Lovanshi, IIIT-Naya Raipur

Content
• Introduction
• Related Work
• Methodology: Human Pose Estimation Models
• Dataset Used
• Experiment & Results
• Conclusion
• References

INTRODUCTION
• Human Pose Estimation (HPE): identifying and classifying the joints in the human body [1,2].
• Each joint (arm, head, torso, etc.) is captured as a set of coordinates known as key points [2,3].
• The connection between two key points is known as a pair [1,2,3].
• Angle information between the body joints can then be extracted [2,3].
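The angle between body joints mentioned above follows directly from three key points. The sketch below assumes 2D (x, y) key points and measures the angle at the middle joint, e.g. the elbow angle from shoulder, elbow, and wrist; the function name and the example coordinates are hypothetical.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by the segments b->a and b->c.

    a, b, c are (x, y) key point coordinates, e.g. shoulder, elbow, wrist.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("coincident key points: angle is undefined")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))

# Hypothetical elbow: upper arm along +x, forearm along +y.
shoulder, elbow, wrist = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
print(round(joint_angle(shoulder, elbow, wrist), 1))  # 90.0
```

The same function works unchanged on 3D key points if the vectors are extended with a z component.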
Fig.1: Sample of Pose Estimation
Source: https://www.quickerhire.com/blogs/human-pose-estimation-for-multiple-subjects-with-machine-learning

Cont…
Fig.2: Human body modeling: a) skeleton-based, b) contour-based, c) volume-based
Source: https://shop62004.afacetoreframe.org/content?c=body%20pose%20estimation&id=1
Three types of human body models are used in HPE:
• The skeleton-based model consists of a set of key points (joints) such as ankles, knees, shoulders, and elbows [1].
• The contour-based model consists of the contour and rough width of the body, torso, and limbs [1].
• The volume-based model covers popular 3D human body models, which represent body shape and pose with geometric meshes [1].
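A skeleton-based model is naturally a graph: key points are the nodes and pairs are the edges. The sketch below encodes a small illustrative skeleton and computes limb lengths for whichever pairs have both key points detected. The joint names and the pair list are hypothetical labels for this example, not the schema of any particular dataset.

```python
import math

# Hypothetical skeleton: key points (nodes) and pairs (edges / limbs).
KEYPOINTS = ["head", "neck", "r_shoulder", "r_elbow", "r_wrist",
             "l_shoulder", "l_elbow", "l_wrist", "hip",
             "r_knee", "r_ankle", "l_knee", "l_ankle"]

PAIRS = [("head", "neck"), ("neck", "r_shoulder"), ("r_shoulder", "r_elbow"),
         ("r_elbow", "r_wrist"), ("neck", "l_shoulder"), ("l_shoulder", "l_elbow"),
         ("l_elbow", "l_wrist"), ("neck", "hip"), ("hip", "r_knee"),
         ("r_knee", "r_ankle"), ("hip", "l_knee"), ("l_knee", "l_ankle")]

def limb_lengths(pose):
    """Return the length of every pair whose two key points are both present.

    pose maps key point name -> (x, y); missing (undetected) joints are skipped.
    """
    lengths = {}
    for a, b in PAIRS:
        if a in pose and b in pose:
            (xa, ya), (xb, yb) = pose[a], pose[b]
            lengths[(a, b)] = math.hypot(xb - xa, yb - ya)
    return lengths

pose = {"neck": (0.0, 0.0), "r_shoulder": (3.0, 0.0), "r_elbow": (3.0, 4.0)}
print(limb_lengths(pose))
```

Because every joint has exactly one parent, this skeleton is a tree: it has one fewer pair than it has key points.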

Cont…
2D Pose Estimation: 2D human pose estimation uses images or video to estimate the 2D spatial location of the human body's key points [2,3].
3D Pose Estimation: 3D human pose estimation locates human joints in 3D space [2,3].
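One way to see how the two settings relate is to project a 3D pose onto the image plane. The sketch below assumes an ideal pinhole camera with focal length f and principal point (cx, cy) in pixels and ignores lens distortion; the joint coordinates and intrinsics are made-up values. It also shows why 3D estimation from a single image is harder: joints lying on the same camera ray map to the same 2D key point, so depth is lost.

```python
def project(point_3d, f, cx, cy):
    """Project a camera-frame 3D joint (X, Y, Z) to pixel coordinates (u, v).

    Assumes an ideal pinhole camera; lens distortion is ignored.
    """
    x, y, z = point_3d
    if z <= 0:
        raise ValueError("joint must lie in front of the camera (Z > 0)")
    return (f * x / z + cx, f * y / z + cy)

# Hypothetical 3D pose in metres (camera coordinates) and camera intrinsics.
pose_3d = {"neck": (0.0, -0.5, 4.0), "r_wrist": (0.4, 0.0, 4.0), "l_wrist": (-0.4, 0.0, 2.0)}
pose_2d = {name: project(p, f=1000.0, cx=320.0, cy=240.0) for name, p in pose_3d.items()}
print(pose_2d)

# Depth ambiguity: twice as far and twice as wide gives the same pixel.
print(project((0.8, 0.0, 8.0), 1000.0, 320.0, 240.0))
```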
Fig.3: 2D vs 3D Pose Estimation sample

Related Work
1. DeepPose: Human Pose Estimation via Deep Neural Networks, A. Toshev et al. (2014) [8]
Problem statement: Extract 2D/3D key point information using a deep learning-based HPE algorithm.
Method: PosePipe, an open-source deep learning model used to extract 2D/3D key points.
Limitation: Difficult to apply to video-based datasets.

2. Human Pose Estimation via Convolutional Part Heatmap Regression, A. Bulat et al. (2016) [10]
Problem statement: Extract human body joints using a deep learning-based approach.
Method: A convolutional neural network (CNN)-based approach used to identify the human pose.
Limitation: The CNN-based approach does not work for part-based posture identification.

3. Combining Local Appearance and Holistic View: Dual-Source Deep Neural Networks for Human Pose Estimation, Xiaochuan Fan et al. (2018) [12]
Problem statement: Extract local part pose information to improve human posture evaluation.
Method: A Dual-Source Deep Convolutional Neural Network (DS-CNN) used for posture evaluation.
Limitation: Pose estimation accuracy is limited.

4. Deep Learning Based 2D Human Pose Estimation: A Survey, Q. Dang et al. (2019) [2]
Problem statement: Identify 2D/3D human pose using a kinematic model.
Method: Two families of estimation algorithms are used: model-free and model-based.
Limitation: It does not work on RGB-D images.

5. End-to-End Recovery of Human Shape and Pose, A. Kanazawa et al. (2018) [11]
Problem statement: Identify human posture from joint-angle and key point information.
Method: Human Mesh Recovery (HMR), an end-to-end system for generating a complete 3D/2D mesh.
Limitation: A 3D mesh cannot be extracted from depth (RGB-D) images or video.

6. Hand Pose Estimation from RGB Images Based on Deep Learning: A Survey, Y. Liu et al. (2021) [1]
Problem statement: 2D human pose estimation.
Method: DeepPose, a cascaded deep learning-based regressor.
Limitation: It does not extract 3D human pose.

HUMAN POSE ESTIMATION MODELS
1. OpenPose [4]
2. ViTPose [13]
3. HRNet [6]
4. AlphaPose [5]
5. DenseNet [14]
6. EfficientPose [15,16]
7. DensePose [17]
8. Hourglass [18]
Fig. 4: Architecture of our proposed work

1. OpenPose:
• OpenPose uses a VGG-19-based convolutional neural network as its feature extractor.
• It comprises four parts: input, part confidence maps, bipartite matching, and output image.
2. ViTPose:
• Uses plain (non-hierarchical) vision transformers as backbones.
• Its decoder has two deconvolution layers and one prediction layer.
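The part-confidence-map and bipartite-matching stages of OpenPose can be illustrated in miniature. The sketch below (a) picks candidate joints as local maxima of a confidence map and (b) pairs candidates of two joint types greedily by connection score, never reusing a candidate. In Cao et al. [9] the connection scores come from integrating part affinity fields along each candidate limb; here the heatmap values and scores are invented, and greedy matching is a common approximation rather than necessarily the exact matcher of the cited implementation.

```python
def find_peaks(heatmap, threshold=0.5):
    """Return (row, col) cells that exceed threshold and are >= their 4-neighbours."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            v = heatmap[r][c]
            if v < threshold:
                continue
            neighbours = [heatmap[rr][cc]
                          for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                          if 0 <= rr < h and 0 <= cc < w]
            if all(v >= n for n in neighbours):
                peaks.append((r, c))
    return peaks

def greedy_match(scores):
    """Greedy bipartite matching for one limb type.

    scores maps (i, j) -> connection score between candidate i of joint A and
    candidate j of joint B. Highest scores are accepted first; each candidate
    is used at most once.
    """
    used_a, used_b, matches = set(), set(), []
    for (i, j), s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if i not in used_a and j not in used_b:
            matches.append((i, j, s))
            used_a.add(i)
            used_b.add(j)
    return matches

# Hypothetical confidence map with two people's "neck" peaks.
hm = [[0.0, 0.1, 0.0, 0.1, 0.0],
      [0.1, 0.9, 0.2, 0.8, 0.1],
      [0.0, 0.1, 0.0, 0.1, 0.0]]
print(find_peaks(hm))  # [(1, 1), (1, 3)]

# Hypothetical neck->shoulder connection scores for two candidates each.
print(greedy_match({(0, 0): 0.9, (0, 1): 0.7, (1, 0): 0.8, (1, 1): 0.3}))
# [(0, 0, 0.9), (1, 1, 0.3)]
```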
Fig. 5: Image extraction through the OpenPose method
Fig. 6: Framework of the ViTPose method
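The ViTPose decoder's job of turning patch tokens back into heatmaps can be checked with a little shape arithmetic. The sketch assumes a 256x192 input, a patch size of 16 (so the backbone output is at 1/16 resolution), and deconvolution layers with kernel 4, stride 2, and padding 1; these values are assumptions for illustration, since the slide only states the number of layers. The final prediction layer is taken to keep the spatial size and emit one heatmap per key point.

```python
def deconv_out(size, kernel=4, stride=2, padding=1, output_padding=0):
    """Output length of a transposed convolution along one spatial axis (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

def vitpose_heatmap_size(height, width, patch=16, num_deconv=2):
    """Heatmap (h, w) for an assumed ViT backbone plus deconvolution decoder."""
    # Plain ViT backbone: one token per non-overlapping patch.
    h, w = height // patch, width // patch
    # Each assumed deconvolution doubles the spatial size.
    for _ in range(num_deconv):
        h, w = deconv_out(h), deconv_out(w)
    return h, w

print(vitpose_heatmap_size(256, 192))  # (64, 48)
```

Under these assumptions the heatmaps land at 1/4 of the input resolution, the same output stride commonly used by CNN-based heatmap methods.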

3. HRNet:
• Uses a convolutional neural network backbone that maintains high-resolution representations through parallel multi-resolution branches.
• Also used for semantic segmentation, object recognition, and image categorisation.
4. AlphaPose:
• Uses a Symmetric Spatial Transformer Network (SSTN) together with a Single-Person Pose Estimator (SPPE).
Fig. 7: Framework of HRNet method
Fig. 8: Image extraction through the AlphaPose method
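HRNet [6] keeps a high-resolution branch and adds lower-resolution branches in parallel, repeatedly exchanging information between them. The sketch below does only the shape bookkeeping for that exchange. It assumes a 256x192 input, a stem that reduces resolution to 1/4, and a base width of 32 that doubles per branch (as in the HRNet-W32 variant); in the HRNet design, a lower-resolution branch is brought up with a 1x1 convolution and nearest-neighbour upsampling, and a higher-resolution branch is brought down with stride-2 3x3 convolutions. No learned weights are involved here.

```python
def branch_shapes(height, width, base_channels=32, num_branches=4):
    """(channels, h, w) for each branch: branch i has 1/2**i the resolution
    and 2**i times the channels of branch 0."""
    # Assumed stem: two stride-2 convolutions bring the input to 1/4 resolution.
    h0, w0 = height // 4, width // 4
    return [(base_channels * 2 ** i, h0 // 2 ** i, w0 // 2 ** i) for i in range(num_branches)]

def fusion_plan(num_branches):
    """How branch j's features are resized before being summed into branch i."""
    plan = {}
    for i in range(num_branches):
        for j in range(num_branches):
            if j > i:    # lower resolution -> higher resolution
                plan[(j, i)] = f"1x1 conv, then nearest upsample x{2 ** (j - i)}"
            elif j < i:  # higher resolution -> lower resolution
                plan[(j, i)] = f"{i - j} stride-2 3x3 conv(s)"
    return plan

print(branch_shapes(256, 192))
# [(32, 64, 48), (64, 32, 24), (128, 16, 12), (256, 8, 6)]
for (src, dst), op in sorted(fusion_plan(2).items()):
    print(f"branch {src} -> branch {dst}: {op}")
```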

5. DenseNet:
• The backbone is a densely connected CNN, related to ResNet, in which each layer receives the feature maps of all preceding layers.
• Dense connectivity mitigates the vanishing-gradient problem.
6. EfficientPose:
• The backbone model is a convolutional neural network.
• It comprises two main parts: an efficient backbone and an efficient head.
Fig. 9: Layered structure of DenseNet method
Fig. 10: Architecture of the EfficientPose method
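The DenseNet idea of feeding every layer with all earlier feature maps is easiest to see by tracking channel counts through one dense block. In the sketch, concatenation means each layer's input width grows by a fixed growth rate per preceding layer; the block input of 64 channels, growth rate of 32, and 6 layers are illustrative values, not settings taken from [14].

```python
def dense_block_channels(in_channels, growth_rate, num_layers):
    """Track channel counts through a DenseNet-style dense block.

    Each layer sees the concatenation of the block input and every earlier
    layer's output, and contributes growth_rate new channels.
    Returns (input width of each layer, output width of the block).
    """
    inputs_per_layer = []
    channels = in_channels
    for _ in range(num_layers):
        inputs_per_layer.append(channels)  # width of the concatenated input
        channels += growth_rate            # concatenate this layer's output
    return inputs_per_layer, channels

print(dense_block_channels(64, 32, 6))
# ([64, 96, 128, 160, 192, 224], 256)
```

Because earlier features are concatenated rather than overwritten, gradients have a short path back to every layer, which is the mechanism behind the vanishing-gradient benefit.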

7. DensePose:
• Uses a fully convolutional Dense Regression (DenseReg) design.
• Combines DenseReg with Mask R-CNN to improve pose estimation.
8. Hourglass:
• Built from stacked fully convolutional hourglass modules.
• Each hourglass module follows a conv-deconv (encoder-decoder) design with skip connections.
Fig. 11: Architecture of the DensePose method
Fig. 12: Architecture of the Hourglass method
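The encoder-decoder data flow of a single hourglass module can be sketched recursively: downsample, process the lower resolution with an inner hourglass, upsample, and add a skip branch kept at the original resolution. In this pure-Python sketch the learned residual blocks of a real hourglass are replaced by the identity (an intentional simplification), 2x2 max pooling stands in for the downsampling step, and the input side length must be divisible by 2**depth.

```python
def max_pool2(x):
    """2x2 max pooling with stride 2 (encoder / downsampling step)."""
    return [[max(x[2 * i][2 * j], x[2 * i][2 * j + 1], x[2 * i + 1][2 * j], x[2 * i + 1][2 * j + 1])
             for j in range(len(x[0]) // 2)]
            for i in range(len(x) // 2)]

def upsample2(x):
    """Nearest-neighbour upsampling by 2 (decoder step)."""
    return [[x[i // 2][j // 2] for j in range(2 * len(x[0]))] for i in range(2 * len(x))]

def add(a, b):
    """Element-wise sum of two equally sized 2D grids."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def hourglass(x, depth):
    """Recursive hourglass: downsample, recurse, upsample, add the skip branch.

    Learned layers are replaced by the identity, so only the data flow is shown.
    """
    if depth == 0:
        return x                                  # bottleneck at lowest resolution
    skip = x                                      # skip branch at this resolution
    low = hourglass(max_pool2(x), depth - 1)      # encoder + inner hourglass
    return add(skip, upsample2(low))              # decoder merges with skip

out = hourglass([[1.0] * 8 for _ in range(8)], depth=3)
print(len(out), len(out[0]), out[0][0])  # 8 8 4.0
```

The output has the same size as the input, which is what lets hourglass modules be stacked one after another.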

DATASET USED
A. COCO [20]:
• Images: 66,808
• Annotations: 273,469
• Key points: 17
B. MPII [12]:
• Images: 40,000
• Annotations: 223,589
• Classes: 410 activity classes
Fig.13: COCO Dataset: sample images [20]
Fig.14: MPII Dataset: sample images [12]
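COCO stores the 17 key points of each person as one flat list [x1, y1, v1, ..., x17, y17, v17], where the visibility flag v is 0 for not labelled, 1 for labelled but occluded, and 2 for labelled and visible. The sketch below unpacks that list into named triples; the annotation values are made up, and the helper name is hypothetical.

```python
COCO_KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def parse_coco_keypoints(flat):
    """Split COCO's flat keypoint list into {name: (x, y, v)}."""
    if len(flat) != 3 * len(COCO_KEYPOINT_NAMES):
        raise ValueError("expected 51 values (17 key points x 3)")
    return {name: tuple(flat[3 * k:3 * k + 3]) for k, name in enumerate(COCO_KEYPOINT_NAMES)}

# Hypothetical annotation: only the nose (visible) and left eye (occluded) are labelled.
flat = [120, 80, 2, 130, 75, 1] + [0, 0, 0] * 15
kps = parse_coco_keypoints(flat)
num_labelled = sum(1 for _, _, v in kps.values() if v > 0)
print(kps["nose"], num_labelled)  # (120, 80, 2) 2
```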

RESULTS
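Results are evaluated using the following metrics:
1. Average Precision (AP): the weighted mean of the precisions at each threshold, where the weight is the increase in recall from the prior threshold [7,8]. In integral form, AP is the area under the precision-recall curve:
$\mathrm{AP}@\alpha = \int_{0}^{1} p(r)\,dr$
where p(r) is the precision at recall r and α is the matching threshold.
2. Mean Average Precision (mAP): the average of the AP values over n different IoU thresholds [9] (for COCO key points, object keypoint similarity (OKS) plays the role of IoU):
$\mathrm{mAP}@\alpha = \frac{1}{n}\sum_{i=1}^{n}\mathrm{AP}_i$
3. Percentage of Correct Key Points (PCK): the fraction of predicted key points that lie within a given distance of the corresponding ground-truth joint [10,11].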
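The AP definition (precision weighted by the increase in recall) and the mAP average can be computed directly from ranked detections. The sketch uses all-point interpolation, where precision is first made non-increasing in recall; this is one common choice, and the COCO evaluator instead samples 101 recall points, so values can differ slightly. The detections and the true-positive labels are hypothetical; in practice a detection is a true positive when its IoU (or OKS) with an unmatched ground truth exceeds the threshold α.

```python
def average_precision(detections, num_positives):
    """AP from (confidence, is_true_positive) pairs for one threshold.

    All-point interpolation: precision is made monotonically non-increasing,
    then summed with the increase in recall as weight.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_positives)
        precisions.append(tp / (tp + fp))
    # Precision envelope: best precision at this recall or any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p  # weight = increase in recall
        prev_recall = r
    return ap

def mean_average_precision(ap_values):
    """mAP: arithmetic mean of AP over n thresholds (or classes)."""
    return sum(ap_values) / len(ap_values)

# Hypothetical ranked detections with 2 ground-truth instances.
dets = [(0.9, True), (0.8, False), (0.7, True), (0.6, False)]
ap = average_precision(dets, num_positives=2)
print(round(ap, 4))                                   # 0.8333
print(round(mean_average_precision([ap, 0.5]), 4))    # 0.6667
```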

TABLE I: BENCHMARKING WITH SOTA POSE ESTIMATION NETWORKS ON COCO DATASET BASED ON AP & mAP
Algorithm | AP | AP0.5 | AP0.75 | APM | APL | mAP
OpenPose [4] | 60.5 | 83.4 | 66.4 | 55.1 | 68.1 | 65.9
AlphaPose [5] | 73.3 | 89.2 | 79.1 | 69.0 | 78.6 | 77.84
HRNet [6] | 77.4 | 92.6 | 84 | 73.6 | 83.7 | 82.3
ViTPose-B [13] | 81.1 | 95.0 | 88.2 | 87.8 | 86.0 | 85.6
DenseNet [14] | 77.1 | 93.3 | 83.6 | 72.2 | 83.6 | 82.6
EfficientPose [15,16] | 70.5 | 91.1 | 79.0 | 67.3 | 76.2 | 76.1
DensePose [17] | 55.8 | 83.7 | 56.3 | 42.2 | 53.8 | 61.1
Hourglass [18] | 65.6 | 88.8 | 69.3 | - | - | 74.5
4×RSN-50 [19] | 78.6 | 94.6 | 86.6 | 83.3 | 75.5 | 83.8
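On COCO, the AP variants in Table I rely on object keypoint similarity (OKS) rather than box IoU: AP0.5 and AP0.75 count a predicted pose as correct when its OKS with the ground truth is at least 0.5 or 0.75, AP averages over thresholds 0.50 to 0.95 in steps of 0.05, and APM and APL restrict evaluation to medium and large persons. The sketch implements the OKS formula, the mean over labelled key points of exp(-d²/(2·s²·k²)), with s² the object area and k a per-key-point constant. The uniform k = 0.1 and the poses are hypothetical; COCO defines its own per-key-point constants, which are not reproduced here.

```python
import math

def oks(pred, gt, visibility, area, k):
    """Object keypoint similarity between one predicted and one ground-truth pose.

    pred, gt: lists of (x, y); visibility: COCO-style flags (0 = not labelled);
    area: ground-truth object area (s**2); k: per-key-point falloff constants.
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, ki in zip(pred, gt, visibility, k):
        if v == 0:
            continue  # unlabelled key points are ignored
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        total += math.exp(-d2 / (2 * area * ki ** 2))
        count += 1
    return total / count if count else 0.0

# Hypothetical 3-key-point poses; the third key point is unlabelled.
gt = [(10, 10), (20, 20), (30, 30)]
pred = [(10, 10), (20, 21), (0, 0)]
score = oks(pred, gt, visibility=[2, 2, 0], area=100.0, k=[0.1] * 3)

# Which COCO-style OKS thresholds (0.50, 0.55, ..., 0.95) this prediction passes.
thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
passed = [t for t in thresholds if score >= t]
print(round(score, 3), len(passed))  # 0.803 7
```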

TABLE II: BENCHMARKING WITH SOTA POSE ESTIMATION NETWORKS ON MPII DATASET BASED ON PCK OF BODY PART & AVERAGE PCK
Algorithm | Ankle | Knee | Hip | Wrist | Elbow | Shoulder | Head | Avg. PCK
OpenPose [4] | 79.87 | 87.17 | 93.0 | 79.15 | 89.03 | 95.97 | 96.11 | 88.73
AlphaPose [5] | 72.4 | 79.9 | 80.3 | 76.4 | 84.0 | 90.5 | 91.3 | 82.1
HRNet [6] | 82.5 | 86.1 | 89.1 | 85.9 | 90.5 | 85.9 | 96.9 | 90.0
ViTPose-B [13] | 88.3 | 91.9 | 92.4 | 90.1 | 93.7 | 97.4 | 97.6 | 93.4
EfficientPose [15,16] | 83.9 | 87.5 | 90.3 | 87.5 | 91.7 | 96.0 | 98.2 | 91.2
Hourglass [18] | 89.3 | 92.2 | 93.2 | 91.2 | 94.4 | 97.5 | 98.8 | 94.1
4×RSN-50 [19] | 86.8 | 90.6 | 92.0 | 89.9 | 93.9 | 97.3 | 98.5 | 93.0
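Each per-part entry in Table II is a PCK value: the share of predicted joints of that type that fall within a distance threshold of the ground truth. The sketch assumes the threshold is alpha times a per-person reference length; head-segment size with alpha = 0.5 (PCKh@0.5) is a common choice on MPII, but the exact normalisation behind Table II is not stated here. The wrist coordinates are hypothetical.

```python
import math

def pck(pred, gt, ref_lengths, alpha=0.5):
    """Percentage of correct key points for one joint type over a set of people.

    A prediction is correct when its distance to the ground truth is at most
    alpha * that person's reference length (e.g. head-segment size for PCKh).
    """
    correct = 0
    for (px, py), (gx, gy), ref in zip(pred, gt, ref_lengths):
        if math.hypot(px - gx, py - gy) <= alpha * ref:
            correct += 1
    return 100.0 * correct / len(gt)

# Hypothetical wrist predictions for 4 people, reference length 10 px each.
gt = [(0, 0)] * 4
pred = [(1, 0), (0, 3), (6, 0), (0, 0)]
print(pck(pred, gt, ref_lengths=[10.0] * 4))  # 75.0
```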

CONCLUSION
• This study compares how accurately deep learning-based human pose estimation methods localise key points, using average precision (AP), mean average precision (mAP), and percentage of correct key points (PCK).
• Experiments were conducted on two datasets: COCO and MPII.
• ViTPose-B achieved the best score in every AP variant on the COCO dataset, likely because it uses a transformer backbone instead of a convolutional one.
• DensePose and OpenPose had the lowest AP and mAP on the COCO dataset.
• On the MPII dataset, the Hourglass model achieved the highest average PCK and the highest PCK for every body part.
• AlphaPose had the lowest average PCK on the MPII dataset.

WORK DONE (based on the discussed work)
Task 1: Human Skeleton Pose and Spatio-Temporal Feature-based Activity Recognition using ST-GCN
• Human activity recognition using a pose estimation algorithm.
• Normalise human activity sequences with a Gaussian filter.
• Investigate the ST-GCN model for extracting spatial and temporal features.
Task 2: 3D Skeleton-based Human Motion Prediction using Dynamic Multi-scale Spatiotemporal Graph Recurrent Neural Networks
• Human motion prediction using a graph recurrent neural network.
• Investigate a novel DMST-GRNN model that exploits multi-scale variation to extract spatial and temporal features.
• Validate human motion prediction on time-series 3D sequential datasets.
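The Gaussian-filter normalisation step in Task 1 is not specified further. One plausible reading, assumed here, is temporal smoothing: each key point coordinate is convolved over time with a normalised Gaussian kernel to suppress frame-to-frame jitter from the pose estimator. The sigma, the 3-sigma kernel radius, and the edge clamping below are illustrative choices, not the authors' settings.

```python
import math

def gaussian_kernel(sigma, radius=None):
    """Normalised 1D Gaussian weights over offsets -radius..radius."""
    if radius is None:
        radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(t * t) / (2 * sigma * sigma)) for t in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth_sequence(seq, sigma=1.0):
    """Smooth every coordinate of a key point sequence along the time axis.

    seq: list of frames, each a list of floats (e.g. flattened x, y per joint).
    Frames beyond either end are clamped to the first/last frame.
    """
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(seq)
    out = []
    for t in range(n):
        frame = []
        for d in range(len(seq[0])):
            acc = 0.0
            for k, w in enumerate(kernel):
                src = min(max(t + k - radius, 0), n - 1)  # clamp at sequence edges
                acc += w * seq[src][d]
            frame.append(acc)
        out.append(frame)
    return out

# Hypothetical 1D trajectory with a single-frame jitter spike.
seq = [[0.0], [0.0], [0.0], [10.0], [0.0], [0.0], [0.0]]
print([round(f[0], 2) for f in smooth_sequence(seq, sigma=1.0)])
```

Smoothing spreads the spike over neighbouring frames while keeping a constant trajectory unchanged.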

REFERENCES
[1] Y. Liu, J. Jiang, and J. Sun, “Hand Pose Estimation from RGB Images Based on Deep Learning: A Survey.” 2021 IEEE 7th International Conference on
Virtual Reality (ICVR), 2021.
[2] Q. Dang, J. Yin, B. Wang, and W. Zheng, “Deep learning based 2D human pose estimation: A survey.” Tsinghua Science and Technology, vol. 24, no. 6,
pp. 663-676, 2019.
[3] M. Choudhary, V. Tiwari, and S. Jain, “Person reidentification using deep Siamese network with multi-layer similarity constraints.” Multimedia Tools and Applications, pp. 1-17, 2021.
[4] D. Osokin, “Real-time 2D Multi-Person Pose Estimation on CPU: Lightweight OpenPose.” Proceedings of the 8th International Conference on Pattern
Recognition Applications and Methods, 2019.
[5] H.-S. Fang, S. Xie, Y.-W. Tai, and C. Lu, “RMPE: Regional Multiperson Pose Estimation.” 2017 IEEE International Conference on Computer Vision (ICCV),
2017.
[6] K. Sun, B. Xiao, D. Liu, and J. Wang, “Deep High-Resolution Representation Learning for Human Pose Estimation.” 2019 IEEE/CVF Conference on
Computer Vision and Pattern Recognition (CVPR), 2019.
[7] W. Li, R. Du, and S. Chen, “Skeleton-Based Spatio-Temporal UNetwork for 3D Human Pose Estimation in Video.” Sensors, vol. 22, no. 7, p. 2573, 2022.
[8] A. Toshev and C. Szegedy, “DeepPose: Human Pose Estimation via Deep Neural Networks.” 2014 IEEE Conference on Computer Vision and Pattern
Recognition, 2014.
[9] Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, “Real-time Multi-person 2D Pose Estimation Using Part Affinity Fields.” 2017 IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), 2017.
[10] A. Bulat and G. Tzimiropoulos, “Human Pose Estimation via Convolutional Part Heatmap Regression.” Computer Vision – ECCV 2016, pp. 717-732,
2016.
[11] A. Kanazawa, M. J. Black, D. W. Jacobs, and J. Malik, “End-to-end recovery of human shape and pose.” 2018 IEEE/CVF Conference on Computer
Vision and Pattern Recognition, 2018.
[12] M. Andriluka, L. Pishchulin, P. Gehler, and B. Schiele, “2D Human Pose Estimation: New Benchmark and State of the Art Analysis.” 2014 IEEE
Conference on Computer Vision and Pattern Recognition, 2014.
[13] Y. Xu, J. Zhang, Q. Zhang, and D. Tao, “ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation.” Computer Vision – ECCV 2022,
2022.
[14] S. W. Chu, Y. Song, J. J. Zouo, and W. Cai, “Human Pose Estimation Using Deep Convolutional Densenet Hourglass Network with Intermediate
Points Voting.” 2019 IEEE International Conference on Image Processing (ICIP), 2019.
[15] J. Li, C. Wang, H. Zhu, Y. Mao, H.-S. Fang, and C. Lu, “CrowdPose: Efficient Crowded Scenes Pose Estimation and a New Benchmark.” 2019
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[16] D. Groos, H. Ramampiaro, and E. A. Ihlen, “EfficientPose: Scalable single-person pose estimation.” Applied Intelligence, vol. 51, no. 4, pp. 2518-
2533, 2020.
[17] R. A. Guler, N. Neverova, and I. Kokkinos, “DensePose: Dense Human Pose Estimation in the Wild.” 2018 IEEE/CVF Conference on Computer
Vision and Pattern Recognition, 2018.
[18] T. Xu and W. Takano, “Graph Stacked Hourglass Networks for 3D Human Pose Estimation.” 2021 IEEE/CVF Conference on Computer Vision and
Pattern Recognition (CVPR), 2021.
[19] Y. Cai, “Learning Delicate Local Representations for Multi-person Pose Estimation.” Computer Vision – ECCV 2020.
[20] T.-Y. Lin, “Microsoft COCO: Common Objects in Context.” Computer Vision – ECCV 2014, pp. 740-755, 2014.