Estimating Human Pose from Occluded Images (ACCV 2009)
We address the problem of recovering 3D human pose from single 2D images, in which the pose estimation problem is formulated as a direct nonlinear regression from the image observation to 3D joint positions. One key issue that has not been addressed in the literature is how to estimate 3D pose when humans in the scenes are partially or heavily occluded. When occlusions occur, features extracted from the image observations (e.g., silhouette-based shape features, histograms of oriented gradients, etc.) are seriously corrupted, and consequently the regressor (trained on un-occluded images) is unable to estimate pose states correctly. In this paper, we present a method that handles occlusions using sparse signal representations, in which each test sample is represented as a compact linear combination of training samples. The sparsest solution can then be efficiently obtained by solving a convex optimization problem with certain norms (such as the l1-norm). The corrupted test image can be recovered with a sparse linear combination of un-occluded training images, which can then be used for estimating the human pose correctly (as if no occlusions existed). We also show that the proposed approach implicitly performs relevant feature selection with un-occluded test images. Experimental results on synthetic and real data sets bear out our theory that, with sparse representations, 3D human pose can be robustly estimated when humans are partially or heavily occluded in the scenes.


Transcript

  • 1. Estimating Human Pose from Occluded Images Jia-Bin Huang and Ming-Hsuan Yang {jbhuang,mhyang}@ieee.org Electrical Engineering and Computer Science University of California at Merced ACCV, Xi'an, September 2009 1 / 25
  • 2. What this talk is about Estimating human poses from single 2D images as a direct nonlinear regression Recovering poses even when humans in the scenes are partially or heavily occluded via sparse approximation Achieving (implicitly) relevant feature selection 2 / 25
  • 3. Outline 1 Introduction 2 Recovering Poses via Sparse Approximation 3 Experimental Setups 4 Conclusions 3 / 25
  • 4. Introduction 4 / 25
  • 5. Motivation - Applications Motion Analysis Diagnostics of orthopedic patients Analysis and optimization of an athlete's performance Content-based retrieval and compression of video Auto safety: control of airbags, drowsiness detection, pedestrian detection, etc. Surveillance People counting or crowd flux, flow, and congestion analysis Analysis of actions, activities, and behaviors both for crowds and individuals Human Computer Interaction Virtual Reality 5 / 25
  • 8. Key Challenges Why is this problem difficult? Ambiguity: loss of depth information Variations in the shape and appearance of the human body Background clutter Occlusions In this paper, we address the problems of occlusions and background clutter. 6 / 25
  • 9. Related Work Model-based (Generative) Employ a known model (e.g., tree structure) based on prior knowledge Include two parts: 1) Modeling, 2) Estimation [Felzenszwalb, Huttenlocher '00], [Ioffe et al. '01], [Ronfard et al. '02], [Ramanan, Forsyth '03], [Mori, Malik '04] Model-free (Discriminative) Example-based [Shakhnarovich et al. '03], [Poppe '07] Learning-based [Rosales, Sclaroff '01], [Agarwal, Triggs '06], [Sminchisescu et al. '07], [Bissacco et al. '07], [Okada, Soatto '08] 7 / 25
  • 10. Recovering Poses via Sparse Approximation 8 / 25
  • 11. Problem Formulation Given: N training samples {x1 , y1 }, {x2 , y2 } · · · , {xN , yN } Input: test sample b Output: pose descriptor vector y Input: single image Output: pose 9 / 25
  • 12. Test Image as a Sparse Linear Combination of Training Images Given N training samples x1, x2, ..., xN ∈ R^m, we represent a test sample b as b = ω1 x1 + ω2 x2 + ... + ωN xN. Let A = [x1, x2, ..., xN] ∈ R^(m×N); then b = Aω, where ω = [ω1, ω2, ..., ωN]^T is the coefficient vector. We want to find the sparsest solution of this linear system. 10 / 25
  • 13. Goal: Find the sparsest solution The l0 formulation min ||ω||_0 subject to b = Aω is NP-hard. Recover the sparsest solution via l1-norm minimization instead, a convex optimization problem: min ||ω||_1 subject to b = Aω. Relaxing the equality constraint to allow small noise gives min ||ω||_1 subject to ||b − Aω||_2 ≤ ε. 11 / 25
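The relaxed l1 problem on this slide can be sketched, in its Lagrangian form min 0.5||b − Aω||² + λ||ω||_1, with a simple iterative soft-thresholding (ISTA) loop. The talk itself uses SOCP solvers such as l1-magic, so this NumPy version is only an illustrative stand-in; all dimensions and data here are invented for the toy example.

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of t * ||.||_1: elementwise shrinkage toward zero.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def l1_recover(A, b, lam=1e-3, n_iter=5000):
    # ISTA for min_w 0.5*||b - A w||_2^2 + lam*||w||_1, a Lagrangian
    # stand-in for min ||w||_1 subject to ||b - A w||_2 <= eps.
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L, L = Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        w = soft_threshold(w - step * (A.T @ (A @ w - b)), step * lam)
    return w

# Toy example: b is an exact 3-sparse combination of 100 "training samples".
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))          # columns = training feature vectors
w_true = np.zeros(100)
w_true[[3, 40, 77]] = [1.0, -0.8, 0.6]
b = A @ w_true
w = l1_recover(A, b)
print(sorted(np.argsort(-np.abs(w))[:3]))   # indices of the dominant coefficients
```

With a random Gaussian A and only 3 active coefficients out of 100, the l1 solution concentrates its weight on the true support, which is the behavior the slide relies on.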
  • 14. Coping with Background Clutter and Occlusions When do errors occur? 1) Misalignment, 2) Background clutter, 3) Occlusions. Introduce an error term e: b = Aω + e = [A I] [ω; e] = Bv. Solve the extended linear system: min ||v||_1 subject to ||b − Bv||_2 ≤ ε. The recovered test sample bR can then be represented as bR = Aω. 12 / 25
  • 15. Occlusion Recovery: An Example Figure: Occlusion recovery on a synthetic dataset. (a)(b) The original input image and its feature. (c) Corrupted feature after adding a random block. (d) Recovered feature via finding the sparsest solution. (e) Reconstruction error. 13 / 25
  • 16. Handling Background Clutter: An Example Figure: Feature selection example. (a) Original test image. (b) The HOG feature descriptor computed from (a). (c) Recovered feature vector by our algorithm. (d) The reconstruction error. 14 / 25
  • 17. Experimental Results 15 / 25
  • 18. Experimental Results Settings 1.8 GHz PC with 2 GB RAM Matlab implementation l1 solvers: l1-magic (second-order cone programming, SOCP), SparseLab, and many other packages Regressor: maps image features to the pose vector Gaussian Process Regressor Relevance Vector Regressor [Agarwal, Triggs PAMI '06] Support Vector Regressor 16 / 25
  • 21. Robustness to Occlusions Synthetic data [Agarwal, Triggs PAMI '06] 1927 silhouette images for training and 418 images for testing Image descriptors: PCA and downsampled images Pose vector: 55-dimensional vector describing the joint angles (in degrees) Sample test images with occlusion ratios (a) 0.1 (b) 0.2 (c) 0.3 (d) 0.4 (e) 0.5 (f) 0.6 17 / 25
  • 22. Robustness to Occlusions Results on the synthetic data set Estimating pose from original (blue), corrupted (green), and recovered (red) test images Figure: Average error of pose estimation on the synthetic data set using different features: (a) PCA with 20 coefficients. (b) Downsampled (20×20) images. Localized features are more suitable for error correction! 18 / 25
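The two descriptors compared in the figure can be sketched as follows: a global 20-coefficient PCA code versus a localized 20×20 downsampled image. The exact preprocessing used in the talk is not spelled out, so the source image size (40×40) and the block-averaging downsampler here are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in binary silhouettes, 200 images of 40x40 (sizes are assumed).
silhouettes = rng.random((200, 40, 40)) > 0.5
X = silhouettes.reshape(200, -1).astype(float)   # (200, 1600) flattened

# Global descriptor: first 20 PCA coefficients of the flattened images.
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
pca_feat = (X - mean) @ Vt[:20].T                # (200, 20)

# Localized descriptor: 20x20 downsampled image via 2x2 block averaging.
down_feat = X.reshape(200, 20, 2, 20, 2).mean(axis=(2, 4)).reshape(200, -1)  # (200, 400)

print(pca_feat.shape, down_feat.shape)           # (200, 20) (200, 400)
```

Each PCA coefficient mixes every pixel, so an occluding block perturbs all 20 coefficients at once; each downsampled cell depends only on a 2×2 neighborhood, so the corruption stays sparse in the feature vector, which is why the localized descriptor suits the sparse error-correction model better.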
  • 23. Robustness to Occlusions Real database: HumanEva I [Sigal, Black TR '06] Synchronized image and motion capture data Four views of four subjects Six predefined actions (walking, jogging, gesturing, throwing/catching, boxing, combo) We use the common walking sequences of three subjects from the first camera Synthesized test images For each image, we randomly generate two occluding blocks at various corruption levels to synthesize images with occlusions. 19 / 25
  • 25. Robustness to Occlusions HumanEva I: Sample test images with occlusion ratios (a) 0.1 (b) 0.2 (c) 0.3 (d) 0.4 (e) 0.5 (f) 0.6 20 / 25
  • 26. Robustness to Occlusions Results on the HumanEva data set Estimating pose from original (blue), corrupted (green), and recovered (red) test images Figure: Results of pose estimation on the HumanEva I data set, walking sequences. (a) Subject 1. (b) Subject 2. (c) Subject 3. 21 / 25
  • 27. Robustness to Background Clutter Implicit relevant feature selection Compared with the original feature representation (HOG), the estimates obtained from the recovered features are better. The improvements in mean position error (mm) are 4.89, 10.84, and 7.87 for S1, S2, and S3, respectively Figure: Mean 3D error plots for the walking sequences (S2). 22 / 25
  • 28. Conclusions 23 / 25
  • 29. Conclusions Sparse approximation can be used to recover errors resulting from occlusions when estimating human pose from occluded images Without occlusions, the recovered features are still better than the original ones The recovery of the feature representation is independent of the learning process, i.e., our method can be adopted as a preprocessing module for other model-free pose estimation approaches 24 / 25
  • 30. Thank You! Questions? 25 / 25
