Silhouette Analysis-Based Action Recognition via Exploiting Human Poses


We propose a novel scheme for human action recognition that combines the advantages of both local and global representations.
We explore human silhouettes for action representation by taking into account the correlation between sequential poses in an action.



  1. Silhouette Analysis-Based Action Recognition via Exploiting Human Poses
  2. Contents
     • Abstract & Objective
     • Introduction
     • Software Requirement
     • Hardware Requirement
     • Existing System & Disadvantages
     • Proposed System & Advantages
     • Literature Survey
     • Application
     • Conclusion
     • References
  3. Abstract
     • In this paper, we propose a novel scheme for human action recognition that combines the advantages of both local and global representations.
     • We explore human silhouettes for action representation by taking into account the correlation between sequential poses in an action.
     • A modified bag-of-words model, named bag of correlated poses (BoCP), is introduced to encode temporally local features of actions.
     • To utilize the property of visual word ambiguity, we reduce the dimensionality of our model.
     • To compensate for the loss of structural information, we propose an extended motion template, i.e., an extension of the motion history image (MHI), to capture holistic structural features.
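The motion history image (MHI) that the extended motion template builds on can be sketched in a few lines. This is a toy NumPy illustration, not the deck's MATLAB implementation; the binary motion masks, frame size, and decay duration `tau` are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary motion masks, one per frame (real masks would come from
# silhouette differencing; size and count here are assumptions).
masks = rng.random((20, 16, 16)) > 0.8

tau = 20                                  # history length (assumed)
mhi = np.zeros((16, 16))
for mask in masks:
    # Moving pixels are reset to tau; all others decay linearly toward 0.
    mhi = np.where(mask, tau, np.maximum(mhi - 1, 0))

print(mhi.shape)
```

Recently moving pixels end up bright and older motion fades, which is how the MHI encodes holistic motion structure in a single image.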
  4. Objective
     • The objective of vision-based human action recognition is to label the video sequence with its corresponding action category.
  5. Software Requirement
     • Operating System: Windows XP
     • Language: MATLAB
     • Version: MATLAB 7.9
  6. Hardware Requirement
     • Pentium IV, 2.7 GHz
     • 1 GB DDR RAM
     • 250 GB Hard Disk
  7. Existing System
     • Detects STIPs using a temporal Gabor filter and a spatial Gaussian filter.
     • Uses STIP detectors such as Harris3D, Cuboid, 3D-Hessian, dense sampling, and spatio-temporal regularity-based features, with descriptors such as HOG/HOF, HOG3D, extended SURF, and MoSIFT.
     • Applies a PageRank-based centrality measure to select key poses according to the recovered geometric structure.
     • Utilizes properties of the solution to the Poisson equation to extract space-time features.
     • Calculates the differences between frames and uses them as intermediate features.
     • Fuses local 3D-SIFT descriptors and holistic Zernike motion energy image (MEI) features in an action recognition framework.
  8. Disadvantages
     • Segmentation and tracking are not possible.
     • Computing the feature points is too time-consuming.
     • Sparse representations, such as bag of visual words (BoVW), discard the geometric relationships of the features and are less discriminative.
     • Hard-assignment quantization is used during the codebook construction for BoVW.
  9. Proposed System
     • We propose a method to recognize actions from human silhouettes.
     • We extract a BoCP (bag of correlated poses) feature.
     • The BoCP feature is extracted in a sequence of steps:
       – PCA feature extraction, followed by k-means clustering and correlogram matrix construction.
       – The correlogram dimensionality is reduced using LDA.
     • The BoCP feature descriptor and the extended MHI together form the feature vector.
     • An SVM (support vector machine) is trained on the features and predicts the result.
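The step sequence above (PCA, then k-means pose words, then correlogram construction) can be sketched end to end. This is a toy NumPy illustration rather than the deck's MATLAB code; the per-frame descriptor size, codebook size `K`, and temporal offsets are assumptions, and the LDA and SVM stages are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for silhouette descriptors of one video: T frames x D dims.
frames = rng.normal(size=(60, 32))

# 1) PCA: project onto the top principal components (via SVD).
centered = frames - frames.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
reduced = centered @ vt[:8].T          # 60 x 8

# 2) k-means: assign each frame to its nearest of K pose "words".
K = 4
codebook = reduced[rng.choice(len(reduced), K, replace=False)]
for _ in range(10):                    # a few Lloyd iterations
    labels = np.argmin(((reduced[:, None] - codebook) ** 2).sum(-1), axis=1)
    for k in range(K):
        if np.any(labels == k):
            codebook[k] = reduced[labels == k].mean(axis=0)

# 3) Correlogram: count pose-word pairs (i, j) separated by a temporal
#    offset d, capturing the correlation between sequential poses.
offsets = (1, 2, 4)
correlogram = np.zeros((len(offsets), K, K))
for o, d in enumerate(offsets):
    for t in range(len(labels) - d):
        correlogram[o, labels[t], labels[t + d]] += 1

bocp = correlogram.ravel()             # flattened BoCP feature (before LDA)
print(bocp.shape)
```

In the full pipeline this vector would be reduced with LDA, concatenated with the extended-MHI features, and fed to the SVM.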
  10. Advantages
     • Reduces computational complexity and quantization error.
     • Takes advantage of both local and global features.
     • Provides a discriminative representation for human actions.
  11. Literature Survey
  12. Action recognition using context and appearance distribution features
     • We first propose a new spatio-temporal context feature of interest points for human action recognition.
     • Each action video is expressed as a set of relative XYT coordinates between pairwise interest points in a local region.
     • We learn a global GMM (referred to as a Universal Background Model, UBM) using the relative coordinate features from all the training videos, and then represent each video as the normalized parameters of a video-specific GMM adapted from the global GMM.
     • To capture the spatio-temporal relationships at different levels, multiple GMMs are utilized to describe the context distributions of interest points over multi-scale local regions.
     • To describe the appearance information of an action video, we also propose to use a GMM to characterize the distribution of local appearance features from the cuboids centered around the interest points.
     • Accordingly, an action video can be represented by two types of distribution features:
       – 1) multiple GMM distributions of spatio-temporal context;
       – 2) a GMM distribution of local video appearance.
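The UBM idea above (fit one global GMM, then adapt its means to each video and use the adapted parameters as a fixed-length descriptor) can be sketched as a pure-NumPy toy. The component count, EM iteration count, and MAP relevance factor are all assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy relative (x, y, t) coordinates pooled from all "training videos".
all_coords = rng.normal(size=(500, 3))

# 1) Fit a global GMM (the UBM) with a few EM steps, diagonal covariances.
K = 4
means = all_coords[rng.choice(len(all_coords), K, replace=False)]
var = np.ones((K, 3))
weights = np.full(K, 1.0 / K)
for _ in range(5):
    # E-step: responsibilities under diagonal Gaussians (constants cancel).
    diff = all_coords[:, None] - means              # N x K x 3
    logp = -0.5 * ((diff ** 2 / var).sum(-1) + np.log(var).sum(-1))
    logp += np.log(weights)
    logp -= logp.max(axis=1, keepdims=True)
    resp = np.exp(logp)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means, variances.
    nk = resp.sum(axis=0)
    means = (resp.T @ all_coords) / nk[:, None]
    var = (resp.T @ (all_coords ** 2)) / nk[:, None] - means ** 2 + 1e-6
    weights = nk / nk.sum()

# 2) MAP-adapt the UBM means to one video; the concatenated adapted means
#    form the video's fixed-length descriptor.
video = rng.normal(size=(40, 3))
diff = video[:, None] - means
logp = -0.5 * ((diff ** 2 / var).sum(-1) + np.log(var).sum(-1)) + np.log(weights)
logp -= logp.max(axis=1, keepdims=True)
resp = np.exp(logp)
resp /= resp.sum(axis=1, keepdims=True)
nk = resp.sum(axis=0)
r = 16.0                                            # relevance factor (assumed)
alpha = nk / (nk + r)
adapted = (alpha[:, None] * (resp.T @ video) / np.maximum(nk, 1e-8)[:, None]
           + (1 - alpha)[:, None] * means)
descriptor = adapted.ravel()                        # length K * 3
print(descriptor.shape)
```

Components well supported by the video move toward its data; the rest stay near the global model, so every video yields a comparable fixed-length vector.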
  13. Action Recognition using Space-time Shape Difference Images
     • In this paper, we present a novel motion representation based on difference images.
     • We have presented a new method of extracting useful features from human action videos for action recognition.
     • We show that this representation exploits the dynamics of motion, and demonstrate its effectiveness in action recognition.
     • We compared our results against other well-established algorithms, showing that our algorithm has competitive accuracy, is fast, and is not very sensitive to video resolution, partial shape deformation of actions, or the number of clusters used.
     • Future work can include combining other features containing additional shape information and improving the quality of silhouette extraction.
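A minimal sketch of the difference-image idea, assuming toy binary silhouette frames (NumPy used purely for illustration; the real method's shape descriptors and clustering are omitted):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy silhouette video: T binary frames of H x W (sizes are assumptions).
video = (rng.random((10, 16, 16)) > 0.5).astype(float)

# A difference image marks the pixels that changed between consecutive
# frames; the stack of them captures the dynamics of the motion.
diffs = np.abs(np.diff(video, axis=0))    # (T-1, H, W)
motion_energy = diffs.sum(axis=0)         # where motion accumulated over time

print(diffs.shape, motion_energy.shape)
```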
  14. Making action recognition robust to occlusions and viewpoint changes
     • We propose a novel approach to providing robustness to both occlusions and viewpoint changes that yields significant improvements over existing techniques.
     • At its heart is a local partitioning and hierarchical classification of the 3D Histogram of Oriented Gradients (HOG) descriptor, used to represent sequences of images that have been concatenated into a data volume.
     • We achieve robustness to occlusions and viewpoint changes by combining training data from all viewpoints to train classifiers that estimate action labels independently over sets of HOG blocks.
     • A top-level classifier combines these local labels into a global action class decision.
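The 3D HOG descriptor at the heart of this approach bins space-time gradient orientations within a block. Here is a simplified single-block sketch in NumPy; the volume size and bin counts are assumptions, and the paper's local partitioning and hierarchical classifiers are omitted:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy space-time block (T x H x W) cut from a concatenated data volume.
vol = rng.random((8, 16, 16))

# 3D gradients along the time, vertical, and horizontal axes.
gt, gy, gx = np.gradient(vol)

# Describe each gradient by its in-plane azimuth and its elevation toward
# the time axis, then accumulate magnitude into an orientation histogram.
mag = np.sqrt(gx ** 2 + gy ** 2 + gt ** 2)
azimuth = np.arctan2(gy, gx)                          # [-pi, pi]
elevation = np.arctan2(gt, np.sqrt(gx ** 2 + gy ** 2))  # [-pi/2, pi/2]

az_bins = np.clip(((azimuth + np.pi) / (2 * np.pi) * 8).astype(int), 0, 7)
el_bins = np.clip(((elevation + np.pi / 2) / np.pi * 4).astype(int), 0, 3)

hist = np.zeros((8, 4))
np.add.at(hist, (az_bins.ravel(), el_bins.ravel()), mag.ravel())
hist /= hist.sum() + 1e-8          # normalized 3D-HOG-style block descriptor
print(hist.shape)
```

In the full method, many such block histograms would be classified locally and their labels fused by the top-level classifier.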
  15. Action recognition using correlogram of body poses and spectral regression
     • In this paper, we propose a novel representation for human actions using the Correlogram of Body Poses (CBP), which takes advantage of both the probabilistic distribution and the temporal relationship of human poses.
     • To reduce the high dimensionality of the CBP representation, an efficient subspace learning technique called Spectral Regression Discriminant Analysis (SRDA) is explored.
     • Experimental results on the challenging IXMAS dataset show that the proposed algorithm outperforms the state-of-the-art methods on action recognition.
  16. Evaluation of local spatio-temporal features for action recognition
     • The purpose of this paper is to evaluate and compare previously proposed space-time features in a common experimental setup.
     • In particular, we consider four different feature detectors and six local feature descriptors, and use a standard bag-of-features SVM approach for action recognition.
     • We investigate the performance of these methods on a total of 25 action classes distributed over three datasets with varying difficulty.
     • Among interesting conclusions, we demonstrate that regular sampling of space-time features consistently outperforms all tested space-time interest point detectors for human actions in realistic settings.
     • We also demonstrate a consistent ranking for the majority of methods over different datasets and discuss their advantages and limitations.
  17. Applications
     • Video Surveillance
     • Robotics
     • Human–Computer Interaction
     • User Interface Design
     • Multimedia Video Retrieval
  18. Conclusion
     • In this paper, we proposed two new representations, namely BoCP and the extended MHI, for action recognition.
     • BoCP is a temporally local feature descriptor and the extended MHI is a holistic motion descriptor.
     • The extension of the MHI compensates for information loss in the original approach, and we verified the conjecture that local and holistic features are complementary to each other.
     • Our system showed promising performance and produced better results than previously published methods on the IXMAS dataset.
     • With more sophisticated feature descriptors and advanced dimensionality reduction methods, we expect even better performance.
  19. Future Work
     • We propose to replace PCA (principal component analysis) feature extraction with ICA (independent component analysis), so that recognition accuracy can be improved.
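One way the proposed PCA-to-ICA swap could look is a FastICA pass after whitening. This is a pure-NumPy sketch (tanh nonlinearity, symmetric decorrelation) on toy mixed signals; the sources, mixing matrix, and iteration count are all assumptions, not part of the deck:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data: two mixed non-Gaussian sources (mixing matrix is assumed).
t = np.linspace(0, 8, 400)
sources = np.c_[np.sign(np.sin(3 * t)), np.sin(5 * t)]
mixed = sources @ np.array([[1.0, 0.5], [0.4, 1.0]])

# Whiten first (this is exactly the PCA step ICA builds on).
x = mixed - mixed.mean(axis=0)
d, e = np.linalg.eigh(np.cov(x.T))
white = x @ e @ np.diag(d ** -0.5) @ e.T

# Symmetric FastICA with the tanh nonlinearity.
w = rng.normal(size=(2, 2))
for _ in range(100):
    g = np.tanh(white @ w.T)                     # N x 2
    g_prime = 1 - g ** 2
    w_new = (g.T @ white) / len(white) - g_prime.mean(axis=0)[:, None] * w
    u, _, vt = np.linalg.svd(w_new)              # symmetric decorrelation:
    w = u @ vt                                   # W <- (W W^T)^(-1/2) W

recovered = white @ w.T                          # estimated independent comps
print(recovered.shape)
```

Where PCA keeps directions of maximal variance, ICA rotates the whitened data toward statistically independent components, which is the property the slide argues could improve recognition accuracy.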
  20. References
     • X. Wu, D. Xu, L. Duan, and J. Luo, “Action recognition using context and appearance distribution features.”
     • H. Qu, L. Wang, and C. Leckie, “Action recognition using space-time shape difference images.”
     • D. Weinland, M. Özuysal, and P. Fua, “Making action recognition robust to occlusions and viewpoint changes.”
     • L. Shao, D. Wu, and X. Chen, “Action recognition using correlogram of body poses and spectral regression.”
     • H. Wang, M. Ullah, A. Klaser, I. Laptev, and C. Schmid, “Evaluation of local spatio-temporal features for action recognition.”