Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

  1. 1. Presented in Partial Fulfillment of the Requirements of the Degree of Master of Science in the School of Communication and Information Technology. Fadwa Fawzy Fouad. Supervisor: Dr. Moataz M. Abdelwahab
  2. 2. Agenda: Introduction • Quick overview • 2DHOOF/2DPCA Contour Based Optical Flow Algorithm • Human Gesture Recognition Employing Radon Transform/2DPCA
  3. 3. Introduction • Importance & Applications • Action vs. Activity • Challenges & characteristics of the domain
  4. 4. Importance & Applications: Human action/activity recognition is one of the most promising applications of computer vision. Interest in this topic is motivated by the promise of many applications, including • character animation for games and movies • advanced intelligent user interfaces • biomechanical analysis of actions for sports and medicine • automatic surveillance
  5. 5. Action vs. Activity. Action: single person, simple motion pattern, short time duration. Activity: complex sequence of actions, single/multiple person(s), long time duration.
  6. 6. Challenges and characteristics of the domain: The difficulty of the recognition process is associated with multiple variation sources: inter- and intra-class variations, environmental variations and capturing conditions, and temporal variations.
  7. 7. • Intra-class variations (variations within a single class): variations in the performance of a certain action due to anthropometric differences between individuals. For example, running movements can differ in speed and stride length. • Inter-class variations (variations across different classes): overlap between different action classes due to similarity in how the actions are performed.
  8. 8. • Environmental variations: distractions originating from the actor's surroundings, including dynamic or cluttered environments, illumination variation, and body occlusion. • Capturing conditions: depend on the method used to capture the scene, whether a single or multiple static/dynamic camera(s) system. • Temporal variations: include changes in the performance rate from one person to another, as well as changes in the recording rate (frames/sec).
  9. 9. Agenda: Introduction • Quick overview • 2DHOOF/2DPCA Contour Based Optical Flow Algorithm • Human Gesture Recognition Employing Radon Transform/2DPCA
  10. 10. Overview: The main structure of the action recognition system
  11. 11. The main structure of the action recognition system: The structure is typically hierarchical, proceeding from capturing the input video, to human detection & segmentation, to extraction of the action descriptors, to action classification.
  12. 12. Capture the input video: For a single camera, the scene is captured from only one viewpoint, so it cannot provide enough information about the performed action in the case of a poor viewpoint, and it cannot handle the occlusion problem. (Figure: sample single-camera videos 1–4.)
  13. 13. Multi-camera systems can capture the same scene from different poses, so they provide sufficient information to alleviate the occlusion problem. (Figure: Cameras 0–3.)
  14. 14. The new technology of the Kinect depth camera can be utilized to capture the performed actions. The device has an RGB camera, a depth sensor, and a multi-array microphone. It provides full-body 3D motion capture, facial recognition, and voice recognition capabilities. Furthermore, the depth information can be used for segmentation. (Figure: RGB and depth information from the Kinect depth camera.)
  15. 15. Human detection & segmentation: It is the first step of the full process of human sequence evaluation. Techniques can be divided into: • Background subtraction techniques • Motion-based techniques • Appearance-based techniques • Depth-based segmentation. A background-subtraction sketch is shown below.
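To make the first family of techniques concrete, here is a minimal background-subtraction sketch using OpenCV's MOG2 model; this is an illustrative stand-in, not the segmentation method used in this work, and the video path and parameters are hypothetical.

```python
import cv2

# Hypothetical input; any clip with a mostly static background will do.
cap = cv2.VideoCapture("input_video.avi")
subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=25)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Non-zero pixels in the mask correspond to moving regions (the actor).
    fg_mask = subtractor.apply(frame)
    # Remove small speckle noise before extracting the actor's blob.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN, kernel)

cap.release()
```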
  16. 16. Extraction of the action descriptors: Input videos consist of massive amounts of information in the form of spatiotemporal pixel intensity variations, but most of this information is not directly relevant to the task of understanding and identifying the activity occurring in the video. In this work we used non-parametric approaches, in which a set of features is extracted per video frame; these features are then accumulated and matched to stored templates. Example: Motion Energy Image & Motion History Image.
  17. 17. Action classification: Once the extracted features are available for an input video, human action recognition becomes a classification problem. Dimensionality reduction is a common step before the actual classification and is discussed first. Dimensionality reduction: Image representations are often high-dimensional, which makes the matching task computationally more expensive, and the representation might contain noisy features. This problem triggered the idea of obtaining a more compact, robust feature representation by reducing the image representation to a lower-dimensional space. Example: one-/two-dimensional principal component analysis (PCA).
  18. 18. Nearest neighbor classification: k-nearest neighbor (k-NN) classifiers use the distance between the features of an observed sequence and those in a training set. The most common label among the k closest training sequences is chosen as the classification. NN classification can be performed either at the frame level or for whole video sequences; in the latter case, issues with different frame lengths need to be resolved. In our work we used 1-NN with Euclidean distance to classify the tested actions, as in the sketch below.
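A minimal sketch of the 1-NN rule described above; the feature vectors and labels are dummy placeholders.

```python
import numpy as np

def nn_classify(test_feature, train_features, train_labels):
    """Return the label of the closest training sample (1-NN, Euclidean distance)."""
    dists = np.linalg.norm(train_features - test_feature, axis=1)
    return train_labels[int(np.argmin(dists))]

# Dummy usage: three training videos described by 2-D feature vectors.
train = np.array([[0.1, 0.9], [0.8, 0.2], [0.75, 0.3]])
labels = np.array(["wave", "run", "run"])
print(nn_classify(np.array([0.7, 0.25]), train, labels))  # -> "run"
```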
  19. 19. Agenda: Introduction • Quick overview • 2DHOOF/2DPCA Contour Based Optical Flow Algorithm • Human Gesture Recognition Employing Radon Transform/2DPCA
  20. 20. 2DHOOF/2DPCA Contour Based Optical Flow Algorithm: • Dense vs. sparse OF • Alignment issues with OF • The calculation of the 2D Histogram of Optical Flow (2DHOOF) • Overall system description • Experimental results
  21. 21. Dense vs. sparse OF: In practice, dense OF is not the best choice for obtaining the OF. Besides its high computational complexity, it is not accurate for homogeneous moving objects (the aperture problem). A sparse-OF sketch follows below.
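A sketch of sparse optical flow with OpenCV's pyramidal Lucas–Kanade tracker; in this work the flow is computed only at the actor's contour pixels, which are assumed here to come from some contour extractor, and the window size and pyramid depth are arbitrary choices.

```python
import cv2
import numpy as np

def sparse_contour_flow(prev_gray, next_gray, contour_points):
    """Track contour pixels between two frames and return their flow vectors."""
    pts = contour_points.astype(np.float32).reshape(-1, 1, 2)
    next_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, pts, None, winSize=(15, 15), maxLevel=2)
    good = status.ravel() == 1
    flow = (next_pts - pts).reshape(-1, 2)[good]   # (dx, dy) per tracked point
    return flow, pts.reshape(-1, 2)[good]          # flow and the points it belongs to
```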
  22. 22. Alignment issues with OF: We had two choices for the order of actor alignment: align the actor and then calculate the OF, or calculate the OF and then align it.
  23. 23. Jumping & Transition effects in Running action
  24. 24. (Figure: calculate the OF then align it vs. align the actor then calculate the OF.)
  25. 25. The calculation of the 2D Histogram of Optical Flow (2DHOOF): the calculated OF is binned into histogram layers of size W/m × H/m × n.
  26. 26. An example of obtaining the n-layer 2DHOOF for any two successive frames
  27. 27. Accumulated 2DHOOF that represents the whole video (a sketch of the binning and accumulation follows below)
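A sketch of one plausible reading of the 2DHOOF layout: each contour flow vector votes into a spatial cell of a W/m × H/m grid, in the orientation layer (out of n) selected by its direction. The magnitude weighting and the summation of per-frame histograms into a single video descriptor are assumptions.

```python
import numpy as np

def two_d_hoof(flow, positions, frame_shape, m=8, n=8):
    """n-layer 2D histogram of optical flow of size (H//m, W//m, n).

    flow:      (N, 2) array of (dx, dy) vectors at the contour points.
    positions: (N, 2) array of the (x, y) pixel coordinates of those points.
    Magnitude weighting is an assumption; the slides only give the layer layout.
    """
    H, W = frame_shape
    hist = np.zeros((H // m, W // m, n))
    angles = np.arctan2(flow[:, 1], flow[:, 0]) + np.pi            # range 0 .. 2*pi
    layers = np.minimum((angles / (2 * np.pi) * n).astype(int), n - 1)
    mags = np.linalg.norm(flow, axis=1)
    for (x, y), layer, mag in zip(positions.astype(int), layers, mags):
        hist[min(y // m, H // m - 1), min(x // m, W // m - 1), layer] += mag
    return hist

# The per-frame histograms are summed over all frames to obtain the accumulated
# 2DHOOF that represents the whole video.
```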
  28. 28. 1DHOOF vs. 2DHOOF
  29. 29. Confusion between the Wave and Bend actions when using 1DHOOF (figure: Wave vs. Bend)
  30. 30. Overall system description. Training mode: segmentation & contour extraction → sparse OF → 2DHOOF → 2DPCA → extract the dominant vectors → store the extracted features. Testing mode: segmentation & contour extraction → sparse OF → 2DHOOF → projection on the dominant vectors → classification and voting scheme.
  31. 31. Training mode: segmentation & contour extraction → sparse OF → 2DHOOF → 2DPCA → extract the dominant vectors → store the extracted features.
  32. 32. Segmentation & contour extraction (Method 1): geodesic segmentation, where xi are the stroke pixels (black), x are the other pixels (white), and I is the image intensity. Pipeline: input video frame → face detection → initial stroke → blob extraction → final contour.
  33. 33. Segmentation & contour extraction (Method 2): contour extraction from the magnitude of the dense OF. An edge pixel satisfies specific criteria based on its 3 × 3 neighboring pixels.
  34. 34. Applying the edge criteria to the magnitude of the dense OF
  35. 35. Steps of contour extraction from dense OF
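The slides do not spell out the 3 × 3 edge rule, so the sketch below uses one plausible criterion as an assumption: a pixel is kept as a contour pixel when its OF magnitude exceeds a threshold while at least one of its 3 × 3 neighbours does not.

```python
import cv2
import numpy as np

def contour_from_dense_flow(prev_gray, next_gray, mag_thresh=1.0):
    """Contour of the moving object from the magnitude of dense (Farneback) optical flow."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)
    fg = (mag > mag_thresh).astype(np.uint8)           # moving (foreground) pixels
    eroded = cv2.erode(fg, np.ones((3, 3), np.uint8))  # interior pixels only
    return fg - eroded   # foreground pixels that have at least one background neighbour
```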
  36. 36. Training mode: segmentation & contour extraction → sparse OF → 2DHOOF → 2DPCA → extract the dominant vectors → store the extracted features.
  37. 37. 2DHOOF/2DPCA feature extraction: the 2DHOOF of the training videos is projected onto the dominant vectors to obtain the final features.
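A sketch of standard 2DPCA applied to the accumulated feature matrices, one matrix per training video; the number of retained dominant vectors d and the treatment of the histogram layers as separate 2-D matrices are assumptions.

```python
import numpy as np

def train_2dpca(training_mats, d=10):
    """Dominant eigenvectors of the image covariance matrix G = mean((A - Abar)^T (A - Abar))."""
    A = np.stack(training_mats)          # shape (M, H, W)
    A_bar = A.mean(axis=0)
    G = np.zeros((A.shape[2], A.shape[2]))
    for Ai in A:
        diff = Ai - A_bar
        G += diff.T @ diff
    G /= len(A)
    _, eigvecs = np.linalg.eigh(G)       # eigenvalues in ascending order
    X = eigvecs[:, ::-1][:, :d]          # d dominant eigenvectors, shape (W, d)
    return X, A_bar

def project_2dpca(mat, X):
    """Project a feature matrix onto the dominant vectors: Y = A X, shape (H, d)."""
    return mat @ X
```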
  38. 38. Training mode: segmentation & contour extraction → sparse OF → 2DHOOF → 2DPCA → extract the dominant vectors → store the extracted features.
  39. 39. Testing mode: segmentation & contour extraction → sparse OF → 2DHOOF → projection on the dominant vectors → classification and voting scheme.
  40. 40. Projection on the dominant vectors
  41. 41. Classification: the final decision is based on the minimum distance D.
  42. 42. Experimental results: Two experiments were conducted to evaluate the performance of the proposed algorithm. • For the first experiment, the Weizmann dataset was used to measure the performance of low-resolution single-camera operation. • For the second experiment, the IXMAS multi-view dataset was used to evaluate the performance of the parallel camera structure. Both experiments were conducted using the Leave-One-Actor-Out (LOAO) technique, to be consistent with the most recent algorithms. Both datasets provide RGB frames and the actors' silhouettes.
  43. 43. Weizmann dataset: The Weizmann dataset consists of 90 low-resolution video sequences showing 9 different actors, each performing 10 natural actions: walk, run, jump forward, gallop sideways, bend, wave with one hand (wave1), wave with two hands (wave2), jump in place (pjump), jump-jack, and skip. (Figure: bend, run, jump, jump-jack, gallop.)
  44. 44. The confusion matrix for this experiment (2DHOOF/2DPCA) shows that the average recognition accuracy is 97.78%, with eight actions recognized with 100% accuracy.
  45. 45. On the other hand, using 1DHOOF with 1DPCA decreases the accuracy to 63.34% because of the large confusion between actions (as discussed before). (Confusion matrix: 1DHOOF/1DPCA.)
  46. 46. Comparison with the most recent algorithms: • Recognition accuracy: Previous Contribution 98.89%, Our Algorithm 97.79%, Shah et al. 95.57%, Yang et al. 92.8%, Yuan et al. 92.22%. • Average testing time: Our Algorithm 66.11 msec, Previous Contribution 113.00 msec, Shah et al. 18.65 sec, Blank et al. 30 sec.
  47. 47. Samples from the calculated contour OF: Walk, Skip, P-jump
  48. 48. IXMAS dataset: The proposed parallel-structure algorithm was applied to the IXMAS multi-view dataset. Each camera is considered an independent system, and a voting scheme is then carried out between the four cameras to obtain the final decision. The dataset consists of 5 cameras capturing the scene and 12 actors, each performing 13 natural actions 3 times, with the actors free to change their orientation in each scenario. The actions include: check watch, cross arms, scratch head, sit down, get up, turn around, walk, wave, punch, kick, and pick up and throw. (Diagram: Cameras 0–3 → our algorithm per camera → voting scheme → final decision.)
  49. 49. Example from the IXMAS multi-camera dataset. Action: Pick up and Throw (Cameras 0–3).
  50. 50. The confusion matrix for the IXMAS dataset shows that the average accuracy is 87.12%, where SH = Scratch head, CW = Check watch, CA = Cross arms, SD = Sit down, GU = Get up, TA = Turn around, PU = Pick up.
  51. 51. Comparison with the best reported accuracies shows that we achieved the highest accuracy, with an enhancement of about 3%. Per-camera and overall voting accuracies (Method, actors #, Cam(0)–Cam(3) %, overall vote %): Proposed Algorithm (12 actors): 97.29, 79.04, 72.47, 78.53, overall 87.12. Previous Contribution (12): 78.9, 78.61, 80.93, 77.38, overall 84.59. Weinland et al. (10): 65.04, 70.00, 54.30, 66.00, overall 81.30. Srivastava et al. (10): N/A per camera, overall 81.40. Shah et al. (12): 72.00, 53.00, 68.00, 63.00, overall 78.00. (N/A = not available in published reports.)
  52. 52. Samples from the calculated contour OF: Walk, Sit down, Kick
  53. 53. Published paper: F. Fawzy, M. Abdelwahab, and W. Mikhael, "2DHOOF-2DPCA Contour Based Optical Flow Algorithm for Human Activity Recognition," IEEE International Midwest Symposium on Circuits and Systems (MWSCAS 2013), Ohio, USA.
  54. 54. Agenda: Introduction • Quick overview • 2DHOOF/2DPCA Contour Based Optical Flow Algorithm • Human Gesture Recognition Employing Radon Transform/2DPCA
  55. 55. Human Gesture Recognition Employing Radon Transform/2DPCA: • Radon Transform (RT) • Overall system description
  56. 56. Radon Transform: The RT computes projections of an image matrix along specified directions. A projection of a two-dimensional function f(x,y) is a set of line integrals along parallel paths, or beams.
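For reference, a minimal sketch with scikit-image's radon function, which computes exactly this set of parallel-beam line integrals; the toy image stands in for a motion template.

```python
import numpy as np
from skimage.transform import radon

# Toy binary image standing in for an MEI/MHI template.
image = np.zeros((64, 64))
image[20:44, 28:36] = 1.0

# One column of the sinogram per angle; each column holds the parallel line integrals.
theta = np.arange(0.0, 180.0, 1.0)
sinogram = radon(image, theta=theta)
print(sinogram.shape)   # (projection length, number of angles)
```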
  57. 57. Overall system description: The proposed system is designed and tested for gesture recognition and can be extended to regular action recognition. The algorithm has two modes, a training mode and a testing mode, and both include a pre-processing step before feature extraction.
  58. 58. Training Mode
  59. 59. Pre-processing step 1) Input videos: The One Shot Learning ChaLearn Gesture Dataset was used for this experiment. In this dataset, a single user facing a fixed Kinect™ camera and interacting with a computer by performing gestures was captured. Videos are represented by RGB and depth images. Each actor has from 8 to 15 different gestures (the vocabulary) for training, and 47 input videos, each containing from 1 to 5 gestures, for testing. We applied our algorithm to a subset of this dataset consisting of 37 different actors.
  60. 60. The dataset can be divided into two main groups: standing actors and sitting actors. In this experiment we used a subset of the standing-actor group, in which the actors use their whole body to perform the gesture and make significant motion that can be captured by the MEI and MHI. (Figure: standing vs. sitting actors.)
  61. 61. Also, we used only the depth videos as input. Depth information makes the segmentation task easier than using RGB or grayscale videos, especially when the actor's clothes have the same color as the background or when the background is textured.
  62. 62. Training Mode
  63. 63. Pre-processing step 2) Segmentation & blob extraction: We used the basic global thresholding algorithm to extract the actor's blob (see the sketch below).
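A sketch of the basic global thresholding algorithm on a depth frame: the threshold is repeatedly moved to the midpoint of the means of the two groups it separates until it stabilizes. Treating the nearer (smaller-depth) group as the actor is an assumption.

```python
import numpy as np

def basic_global_threshold(depth, eps=0.5):
    """Iteratively refine T until it converges to the midpoint of the two group means."""
    T = float(depth.mean())
    while True:
        near = depth[depth <= T]
        far = depth[depth > T]
        new_T = 0.5 * (near.mean() + far.mean())
        if abs(new_T - T) < eps:
            return new_T
        T = new_T

# blob = (depth <= basic_global_threshold(depth)).astype(np.uint8)  # actor assumed nearer
```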
  64. 64. In some cases the resultant blob contains additional objects. This noise comes from objects that were at the same depth as the actor. (Figure: cases 1–3.)
  65. 65. In this situation we perform a noise-elimination step. (Figure: cases 1–3 after noise elimination.)
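The slides do not name the noise-elimination method; one common choice, shown here purely as an assumption, is to keep only the largest connected component of the thresholded blob.

```python
import cv2
import numpy as np

def keep_largest_blob(mask):
    """Keep only the largest connected component of a binary mask."""
    num, labels, stats, _ = cv2.connectedComponentsWithStats(mask.astype(np.uint8))
    if num <= 1:                                            # nothing but background
        return mask
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])    # skip the background label 0
    return (labels == largest).astype(np.uint8)
```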
  66. 66. Training Mode
  67. 67. Alignment using the RT of the first frame: • Vertical alignment using the projection on the y-axis (90° from the RT)
  68. 68. • Horizontal alignment using the projection on the x-axis (0° from the RT)
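A sketch of the alignment idea: for a binary blob the 90° and 0° Radon projections reduce to its row and column sums, so the actor can be re-centred from the extent of those two profiles. Centring on the midpoint of the non-zero extent is an assumption.

```python
import numpy as np

def align_blob(blob):
    """Centre the actor's blob using its axis projections (0 and 90 degree RT of a binary image)."""
    rows = blob.sum(axis=1)          # projection onto the y-axis (90 degrees)
    cols = blob.sum(axis=0)          # projection onto the x-axis (0 degrees)
    ys, xs = np.nonzero(rows)[0], np.nonzero(cols)[0]
    cy = (ys[0] + ys[-1]) // 2       # vertical midpoint of the blob
    cx = (xs[0] + xs[-1]) // 2       # horizontal midpoint of the blob
    dy = blob.shape[0] // 2 - cy
    dx = blob.shape[1] // 2 - cx
    return np.roll(np.roll(blob, dy, axis=0), dx, axis=1), (dy, dx)
```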
  69. 69. Training Mode
  70. 70. Calculate the MEI and MHI. (Figure: MEI and MHI for the whole body and for the body parts.)
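A sketch of the standard motion templates computed from the per-frame binary silhouettes; using silhouette frame differencing as the motion indicator and setting the history length tau to the video length are assumptions.

```python
import numpy as np

def motion_templates(silhouettes, tau=None):
    """Motion Energy Image (MEI) and Motion History Image (MHI) from binary silhouettes."""
    tau = tau or len(silhouettes)
    mhi = np.zeros(silhouettes[0].shape, dtype=float)
    prev = silhouettes[0].astype(bool)
    for sil in silhouettes[1:]:
        cur = sil.astype(bool)
        moving = cur ^ prev                     # pixels that changed between frames
        mhi = np.where(moving, float(tau), np.maximum(mhi - 1.0, 0.0))
        prev = cur
    mei = (mhi > 0).astype(np.uint8)            # where any recent motion occurred
    return mei, mhi
```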
  71. 71. Training Mode
  72. 72. Get the Radon Transform of the MEI and MHI
  73. 73. Basically, the difference between the RT of the whole body and the RT of the body parts is the white portion in the center, representing the projection of the actor's body.
  74. 74. Training Mode
  75. 75. Testing Mode
  76. 76. Video chopping: As mentioned, the testing videos may contain from 1 to 5 different gestures per video, so we need to separate these gestures into one gesture per video before testing. We do that in two main steps: 1. Calculate the plot that represents the moving area per frame. 2. Apply the local-minima criteria to this plot.
  77. 77. 1. Calculate the plot that represents the moving area/frame
  78. 78. 2. Apply the local-minima criteria: we are searching for a frame i that satisfies the following conditions: a) the number of frames before i is greater than or equal to the frame threshold; b) the amount of decrease in the area at i is greater than 50% of the peak value; c) the area at i−1 and at i+1 is greater than the area at i, to ensure that i is a local minimum between two peaks.
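A sketch of the chopping rule under stated assumptions: the moving area per frame is taken to be the count of changed silhouette pixels, condition (a) is counted from the previous cut, and frame_threshold and drop_ratio are hypothetical parameters standing in for the values used in the experiments.

```python
import numpy as np

def gesture_cuts(moving_area, frame_threshold=15, drop_ratio=0.5):
    """Return the frame indices that separate consecutive gestures."""
    peak = float(np.max(moving_area))
    cuts, last_cut = [], 0
    for i in range(1, len(moving_area) - 1):
        enough_frames = (i - last_cut) >= frame_threshold                     # condition (a)
        big_drop = (peak - moving_area[i]) > drop_ratio * peak                # condition (b)
        local_min = moving_area[i - 1] > moving_area[i] < moving_area[i + 1]  # condition (c)
        if enough_frames and big_drop and local_min:
            cuts.append(i)
            last_cut = i
    return cuts

# moving_area[t] could be, e.g., np.count_nonzero(silhouette[t] ^ silhouette[t - 1]).
```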
  79. 79. Good Results
  80. 80. Bad Results
  81. 81. Experimental results: We conducted four one-shot-learning (OSL) experiments. (Diagram: OSL experiments; Radon transform: Experiments I, II; MEI/MHI: Experiments III, IV; each branch evaluated with 2DPCA and with direct correlation.)
  82. 82. Recognition accuracy (%) of the four experiments, with columns RT (whole body, body parts) and MEI/MHI (MEI, MHI): Experiment I: 71, 69, 82, 81.5. Experiment II: 70, 70, 81.7, 81.6. Experiment III: 70, 68, 82, 81.7. Experiment IV: 71.24, 68.7, 83.33, 82.9. Comparison between the 2DPCA features obtained with RT and with MEI/MHI directly (without RT): RT maintains 99% of the energy and requires 72 Mbytes of storage, while MEI/MHI maintains 88% and requires 102 Mbytes, so the RT features are the better choice on both counts.
