VNect

VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera
2017.08.14
Yunkyu Choi

Contents
● Overview
● Process
● 3D Pose Estimation
○ CNN Regression
○ Kinematic Skeleton Fitting
● Result
● Limitation
● Conclusion

Overview
● Full global 3D skeleton pose
○ global: in camera space, not a local 3D pose relative to a bounding box
● Real-time
○ 30 Hz
● Single RGB camera
● CNN-based pose regressor + kinematic skeleton fitting
○ CNN based on https://arxiv.org/pdf/1611.09813.pdf, reduced from 100 layers to 50 layers
○ Does not require a tightly cropped input frame

Process
● CNN to regress 2D and 3D joint positions
○ Trained on annotated 3D human pose datasets => joint positions
● Kinematic Skeleton Fitting
○ Optional: skeleton initialization from the person's height

3D Pose Estimation
● I => P^G
○ I: input image
○ P^G: global pose
○ P^G(θ, d): joint angles θ and global position d in camera space
○ P^L: root-relative 3D joint positions
○ K: 2D keypoints
● CNN Pose Regression
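To make the role of d concrete, here is a minimal Python sketch that recovers only the global position d from the root-relative joints P^L and the 2D keypoints K, assuming a pinhole camera with known focal length and principal point. It is a simplification under stated assumptions, not VNect's fitting step: the skeleton fitting in the slides also solves for the joint angles θ, whereas here P^L is held fixed. The function name estimate_root_translation and its parameters are hypothetical. Each joint gives two equations that become linear in d once the projection is multiplied out, so d follows from ordinary least squares.

```python
import numpy as np


def estimate_root_translation(p_local, k2d, focal, center):
    """Least-squares global translation d such that P^L + d projects onto K.

    Hypothetical helper (simplified sketch, not VNect's full fitting):
    p_local: (J, 3) root-relative 3D joints in camera axes
    k2d:     (J, 2) 2D keypoints in pixels
    focal:   focal length in pixels (assumed known, square pixels)
    center:  (cx, cy) principal point in pixels (assumed known)
    """
    p_local = np.asarray(p_local, dtype=float)
    k2d = np.asarray(k2d, dtype=float)
    # Pixel offsets from the principal point.
    u = k2d[:, 0] - center[0]
    v = k2d[:, 1] - center[1]
    X, Y, Z = p_local[:, 0], p_local[:, 1], p_local[:, 2]
    J = len(p_local)
    # Pinhole: u = f (X + dx) / (Z + dz)  =>  f*dx - u*dz = u*Z - f*X,
    # and likewise for v with Y and dy. Both are linear in d = (dx, dy, dz).
    A = np.zeros((2 * J, 3))
    b = np.zeros(2 * J)
    A[0::2, 0] = focal
    A[0::2, 2] = -u
    b[0::2] = u * Z - focal * X
    A[1::2, 1] = focal
    A[1::2, 2] = -v
    b[1::2] = v * Z - focal * Y
    d, *_ = np.linalg.lstsq(A, b, rcond=None)
    return d
```

With noise-free keypoints the system is consistent and d is recovered exactly; with noisy CNN predictions the least-squares solution averages the error over all joints.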

CNN Regression
● Location map
○ No skeletal structure imposed on the predictions
○ 3D joint position relative to the root
Loss Function
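As a sketch of how location maps are used (our reading of the paper, to be checked against it): for each joint, the root-relative 3D position is read from its X, Y and Z location maps at the pixel where that joint's 2D heatmap peaks, and the location-map loss is an L2 error weighted by the ground-truth heatmap, so only pixels near the true joint contribute. The exact loss form, the function names and the (J, H, W) array layout are assumptions for illustration.

```python
import numpy as np


def read_joints_from_location_maps(heatmaps, loc_x, loc_y, loc_z):
    """Read per-joint 3D positions at each heatmap maximum.

    All inputs are (J, H, W) arrays (assumed layout).
    Returns 2D keypoints (J, 2) as (col, row) and root-relative 3D joints (J, 3).
    """
    J, H, W = heatmaps.shape
    # Pixel of maximum heatmap response for each joint.
    flat_idx = heatmaps.reshape(J, -1).argmax(axis=1)
    rows, cols = np.unravel_index(flat_idx, (H, W))
    joints = np.arange(J)
    # Sample the three location maps of each joint at that pixel.
    p_local = np.stack([loc_x[joints, rows, cols],
                        loc_y[joints, rows, cols],
                        loc_z[joints, rows, cols]], axis=1)
    k2d = np.stack([cols, rows], axis=1)
    return k2d, p_local


def location_map_loss(pred_map, gt_map, gt_heatmap):
    """L2 error of one location map, weighted by the ground-truth heatmap
    (assumed form), so errors far from the true joint are ignored."""
    return np.linalg.norm(gt_heatmap * (pred_map - gt_map))
```

Weighting by the heatmap means the network only has to be accurate near each joint, which fits the fact that the maps are only read at the heatmap maximum.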

CNN Regression
● Training
○ Pretrained for 2D pose estimation on MPII and LSP
○ 3D pose:
■ MPI-INF-3DHP: 100k image samples
■ Human3.6M (excluding subjects S9 and S11): 75k image samples
● Bounding Box Tracker
○ The CNN does not require a bounding box (BB)
○ but the CNN's runtime depends on the input image size, so a tracked BB around the person keeps the input small
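A bounding-box tracker of this kind could look like the following hypothetical Python sketch: the box for the next frame is the padded extent of the current 2D keypoints, clamped to the image and smoothed over time so the crop does not jitter. The class BoxTracker, the padding fraction and the smoothing weight are assumptions for illustration; the slides do not say how VNect's tracker is parameterized.

```python
import numpy as np


class BoxTracker:
    """Hypothetical BB tracker: padded keypoint extent, clamped and smoothed."""

    def __init__(self, image_w, image_h, pad=0.2, smoothing=0.5):
        self.image_w = image_w
        self.image_h = image_h
        self.pad = pad              # fractional padding per side (assumed value)
        self.smoothing = smoothing  # weight on the previous box; 0 = no smoothing
        self.box = None             # (x0, y0, x1, y1) in pixels

    def update(self, k2d):
        k2d = np.asarray(k2d, dtype=float)
        x0, y0 = k2d.min(axis=0)
        x1, y1 = k2d.max(axis=0)
        # Pad by a fraction of the box size so limbs that move between
        # frames are still inside next frame's crop.
        pw = self.pad * (x1 - x0)
        ph = self.pad * (y1 - y0)
        new = np.array([x0 - pw, y0 - ph, x1 + pw, y1 + ph])
        # Keep the box inside the image.
        new = np.clip(new, 0, [self.image_w, self.image_h, self.image_w, self.image_h])
        # Exponential smoothing against the previous box.
        if self.box is None:
            self.box = new
        else:
            self.box = self.smoothing * self.box + (1 - self.smoothing) * new
        return self.box
```

The cropped region is then resized to the CNN input size, so runtime no longer depends on the full frame resolution.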

Kinematic Skeleton Fitting
● The 2D predictions K are temporally filtered
○ and then used to obtain the 3D joint coordinates
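For the temporal filtering of K, one common real-time choice is the 1€ filter (Casiez et al., 2012), an adaptive low-pass filter that smooths heavily when a keypoint moves slowly and less when it moves fast, trading jitter against lag. The Python sketch below applies it elementwise to a keypoint array; using it here is an assumption, and the values of min_cutoff and beta are placeholders, not VNect's settings.

```python
import math

import numpy as np


class OneEuroFilter:
    """1-euro low-pass filter (Casiez et al., 2012), applied elementwise."""

    def __init__(self, freq, min_cutoff=1.0, beta=0.0, d_cutoff=1.0):
        self.freq = freq              # sampling rate in Hz, e.g. 30
        self.min_cutoff = min_cutoff  # cutoff (Hz) when the signal is still
        self.beta = beta              # how fast the cutoff grows with speed
        self.d_cutoff = d_cutoff      # cutoff (Hz) for the speed estimate
        self.x_prev = None
        self.dx_prev = None

    @staticmethod
    def _alpha(cutoff, freq):
        # Smoothing factor of a first-order low-pass filter.
        tau = 1.0 / (2 * math.pi * cutoff)
        te = 1.0 / freq
        return 1.0 / (1.0 + tau / te)

    def __call__(self, x):
        x = np.asarray(x, dtype=float)
        if self.x_prev is None:
            self.x_prev = x
            self.dx_prev = np.zeros_like(x)
            return x
        # Low-pass filtered speed of the signal.
        dx = (x - self.x_prev) * self.freq
        a_d = self._alpha(self.d_cutoff, self.freq)
        dx_hat = a_d * dx + (1 - a_d) * self.dx_prev
        # Faster motion -> higher cutoff -> less lag; slow motion -> more smoothing.
        cutoff = self.min_cutoff + self.beta * np.abs(dx_hat)
        a = self._alpha(cutoff, self.freq)
        x_hat = a * x + (1 - a) * self.x_prev
        self.x_prev = x_hat
        self.dx_prev = dx_hat
        return x_hat
```

Calling the filter once per frame with the (J, 2) keypoint array returns the smoothed keypoints for that frame.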

Result
For details, see the video and the paper.

Limitations
● Depth estimation from a single image is ill-posed
● Temporal jitter
○ Floor constraint
○ Head orientation and position from an HMD
● Implausible 3D poses from mispredictions
● Very fast motion
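The first limitation can be checked numerically. Under a pinhole camera, scaling every 3D point by the same factor (a larger person standing proportionally farther away) leaves the 2D projection unchanged, so depth and scale cannot both be recovered from one image; this is one reason a known body size, such as the optional height-based skeleton initialization, helps. The project helper and the intrinsics below are illustrative assumptions.

```python
import numpy as np


def project(points, focal=1000.0, center=(320.0, 240.0)):
    """Pinhole projection of (N, 3) camera-space points to (N, 2) pixels.
    Focal length and principal point are illustrative values."""
    points = np.asarray(points, dtype=float)
    return focal * points[:, :2] / points[:, 2:3] + np.asarray(center)


rng = np.random.default_rng(0)
# 17 joints of a person roughly 4 m in front of the camera.
pose = rng.uniform(-0.5, 0.5, size=(17, 3)) + np.array([0.0, 0.0, 4.0])
bigger_and_farther = 1.5 * pose  # 1.5x larger person, 1.5x farther away

# Both 3D hypotheses give the same image, so one view cannot tell them apart.
print(np.allclose(project(pose), project(bigger_and_farther)))  # True
```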

Conclusion
● Global 3D skeleton
● Single RGB camera
● Real-time at 30 Hz
● Fully convolutional CNN => regresses 2D and 3D joint positions
● Skeleton fitting
● Temporally stable
● No strict bounding box required
