1. VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera
2017.08.14
Yunkyu Choi
2. Contents
● Overview
● Process
● 3D Pose Estimation
○ CNN Regression
○ Kinematic Skeleton Fitting
● Result
● Limitations
● Conclusion
3. Overview
● Full global 3D skeleton pose
○ global: positioned in camera space, not a local 3D pose relative to a bounding box
● real-time
○ 30Hz
● a single RGB Camera
● CNN based pose regressor + kinematic skeleton fitting
○ CNN based on https://arxiv.org/pdf/1611.09813.pdf, reduced from 100 layers to 50 layers
○ Does not require a tightly cropped input frame
4. Process
● CNN to regress 2D and 3D joint positions
○ trained on annotated 3D human pose datasets => Joint Positions
● Kinematic Skeleton Fitting
○ Optional: skeleton initialization by height
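The slide does not say how the height-based initialization is done. As a minimal sketch of one plausible reading, a template skeleton can be scaled uniformly by the ratio of the subject's height to a reference height. The template values, bone names, and the function `init_skeleton_by_height` are hypothetical and are not taken from the VNect implementation.

```python
from typing import Dict

# Hypothetical template: bone lengths in metres for a 1.70 m reference skeleton.
# These numbers are illustrative only, not measured or taken from the paper.
TEMPLATE_HEIGHT_M = 1.70
TEMPLATE_BONES_M: Dict[str, float] = {
    "spine": 0.50,
    "upper_arm": 0.30,
    "forearm": 0.25,
    "thigh": 0.45,
    "shin": 0.42,
}


def init_skeleton_by_height(height_m: float) -> Dict[str, float]:
    """Scale every template bone length by height_m / TEMPLATE_HEIGHT_M."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    scale = height_m / TEMPLATE_HEIGHT_M
    return {bone: length * scale for bone, length in TEMPLATE_BONES_M.items()}


# Usage: initialize the skeleton for a 1.87 m tall subject.
skeleton = init_skeleton_by_height(1.87)
for bone, length in skeleton.items():
    print(f"{bone}: {length:.3f} m")
```

Uniform scaling assumes body proportions are the same for every subject, which is why this kind of initialization would be optional rather than a replacement for fitting.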
5. 3D Pose Estimation
● I => P^G
○ I: image
○ P^G: global pose
○ P^G(θ, d): joint angles θ, global position in camera space d
○ P^L: root-relative 3D joint positions
○ K: 2D keypoints
● CNN Pose Regression
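To make the relation between these quantities concrete, here is a minimal sketch that places root-relative 3D joints into camera space using the global position d, then projects them to 2D keypoints K. It assumes a pinhole camera with focal length `f` and principal point `(cx, cy)`, and it ignores the joint angles θ. The function names and all numeric values are hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]


def to_camera_space(root_relative: List[Vec3], d: Vec3) -> List[Vec3]:
    """Translate root-relative joints by the global root position d."""
    return [(x + d[0], y + d[1], z + d[2]) for (x, y, z) in root_relative]


def project(points: List[Vec3], f: float, cx: float, cy: float) -> List[Vec2]:
    """Pinhole projection of camera-space points (z > 0) to pixel coordinates."""
    return [(f * x / z + cx, f * y / z + cy) for (x, y, z) in points]


# Hypothetical root-relative joints in metres: root, neck, one shoulder.
root_relative_joints: List[Vec3] = [
    (0.0, 0.0, 0.0),
    (0.0, -0.5, 0.0),
    (0.2, -0.4, 0.1),
]
d: Vec3 = (0.1, 0.0, 3.0)  # assumed root position, 3 m in front of the camera

camera_space_joints = to_camera_space(root_relative_joints, d)
K = project(camera_space_joints, f=1000.0, cx=320.0, cy=240.0)
print(K)
```

Skeleton fitting works in the opposite direction: given predicted 2D keypoints and root-relative 3D joints, it searches for the θ and d whose projection and 3D configuration agree with both.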
12. Limitations
● Depth estimation from a single image => ill-posed
● Temporal jitter; possible additional constraints to reduce it:
○ Floor constraint
○ Head angle and pose from an HMD
● Implausible 3D poses caused by mispredictions
● Very fast motion
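The tension between temporal jitter and very fast motion can be illustrated with a generic low-pass filter on per-frame joint positions. The sketch below uses a plain exponential moving average, which is a swapped-in textbook technique and not the filtering used by VNect. The class name `JointSmoother` and the parameter `alpha` are hypothetical.

```python
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


class JointSmoother:
    """Exponential moving average over per-frame 3D joint positions."""

    def __init__(self, alpha: float) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.state: Optional[List[Vec3]] = None

    def update(self, joints: List[Vec3]) -> List[Vec3]:
        """Blend the new frame into the running estimate and return it."""
        if self.state is None:
            # First frame: nothing to blend with yet.
            self.state = list(joints)
        else:
            a = self.alpha
            self.state = [
                (
                    a * new[0] + (1.0 - a) * old[0],
                    a * new[1] + (1.0 - a) * old[1],
                    a * new[2] + (1.0 - a) * old[2],
                )
                for new, old in zip(joints, self.state)
            ]
        return self.state


# Usage: one joint that jumps between two consecutive frames.
smoother = JointSmoother(alpha=0.5)
smoother.update([(0.0, 0.0, 0.0)])
smoothed = smoother.update([(1.0, 2.0, 4.0)])
print(smoothed)
```

A smaller `alpha` suppresses more jitter but makes the estimate lag behind the true pose, so heavy smoothing works against tracking very fast motion.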
13. Conclusion
● Full global 3D skeleton pose
● Single RGB camera
● 30Hz realtime
● Fully-convolutional CNN => Regress 2D and 3D Joint positions
● Skeleton fitting
● Temporally stable
● No strict bounding boxes required