The document proposes a method for human action recognition based on spatio-temporal features. It extracts optical-flow motion features on a fixed grid over the region of interest and uses Viola-Jones features to extract shape features. These features are combined over time to form spatio-temporal descriptors, which are classified into different action classes using AdaBoost. The method is tested on a custom dataset with 7 actions and on the Weizmann dataset, achieving an overall error rate of 2.17%.
Presenter: Inwoong Lee (Ph.D. student, Yonsei University)
Date: December 2017
Overview:
Methods for recognizing human actions in video can broadly be divided into those that extract action labels directly from the video and those that extract action labels from pose information.
This talk gives a general overview of action recognition, focuses on pose-based approaches, and in particular presents an action recognition technique using the Temporal Sliding LSTM network published at ICCV 2017. Specifically, it covers the issues in skeleton-based action recognition, the proposed method and its experimental results, and promising new research directions going forward.
ADAPTIVE, SCALABLE, TRANSFORM-DOMAIN GLOBAL MOTION ESTIMATION FOR VIDEO STABILIZATION (cscpconf)
Video stabilization, which is important for better analysis and user experience, is typically done through Global Motion Estimation (GME) and compensation. GME can be performed in the image domain using many techniques, or in the transform domain using the well-known phase correlation methods, which relate motion to phase shift in the spectrum. While image-domain methods are generally slower (due to dense vector-field computations), they can estimate both global and local motion. Transform-domain methods cannot normally estimate local motion, but they are faster, more accurate on homogeneous images, and resilient even to rapid illumination changes and large motion. However, both approaches can become very time-consuming when more accuracy and smoothness are needed, because of the nature of the tradeoff. We show here that wavelet transforms can be used in a novel way to achieve very smooth stabilization along with a significant speedup in the Fourier-domain computation, without sacrificing accuracy. We do this by adaptively selecting and combining motion computed on a specific pair of sub-bands using the wavelet interpolation capability. Our approach yields a smooth, scalable, fast, and adaptive algorithm (based on the time requirement and recent motion history) with significantly better accuracy than a single-level wavelet-decomposition approach.
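The phase-shift relation the abstract relies on can be sketched with a minimal phase-correlation routine. This is a generic illustration of the transform-domain idea (integer translations only, numpy only), not the paper's adaptive wavelet method:

```python
import numpy as np

def phase_correlation(frame_a, frame_b):
    """Estimate the integer (dy, dx) translation between two grayscale
    frames from the peak of the normalized cross-power spectrum."""
    F_a = np.fft.fft2(frame_a)
    F_b = np.fft.fft2(frame_b)
    cross_power = F_a * np.conj(F_b)
    cross_power /= np.abs(cross_power) + 1e-12  # keep phase, drop magnitude
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Wrap peak coordinates into signed shifts.
    shifts = [p if p <= s // 2 else p - s
              for p, s in zip(peak, correlation.shape)]
    return tuple(shifts)

# Synthetic check: shift an image by (5, -3) and recover the motion.
rng = np.random.default_rng(0)
img = rng.random((64, 64))
shifted = np.roll(img, shift=(5, -3), axis=(0, 1))
print(phase_correlation(shifted, img))  # (5, -3)
```

Because the correlation surface is computed with FFTs, the cost is O(N log N) regardless of the shift magnitude, which is why transform-domain methods tolerate large motion well.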
Geometric wavelet transform for optical flow estimation algorithm (ijcga)
This paper describes an algorithm for computing the optical flow (OF) vectors of a moving object in a video sequence based on the geometric wavelet transform (GWT). The method estimates the motion between two successive frames by projecting the OF vectors onto a basis of geometric wavelets. Using the GWT for OF estimation has been attracting much attention. The approach takes advantage of the geometric wavelet filter properties and requires only two frames. The algorithm is fast and able to estimate the OF with low complexity. The technique is suitable for video compression and can also be used for stereo vision and image registration.
【ITSC2015】Fine-grained Walking Activity Recognition via Driving Recorder Dataset (Hirokatsu Kataoka)
ITSC2015
http://www.itsc2015.org/
The paper presents fine-grained walking activity recognition aimed at inferring pedestrian intention, an important topic for predicting and avoiding dangerous pedestrian behavior. Fine-grained activity recognition distinguishes activities that differ only subtly, such as walking in different directions. We believe a change in a pedestrian's activity is significant for grasping the pedestrian's intention. However, the task is challenging for several reasons: (i) the in-vehicle camera is always moving, (ii) the pedestrian region is too small to capture motion and shape features, and (iii) a change of pedestrian activity (e.g., from walking straight to turning) produces only a small feature difference. To tackle these problems, we apply a vision-based approach to classify pedestrian activities. The dense trajectories (DT) method is employed for high-level recognition to capture detailed differences. Moreover, we additionally extract a detection-based region of interest (ROI) for higher performance in fine-grained activity recognition. We evaluated our approach on a self-collected dataset and a near-miss driving recorder (DR) dataset, divided into several activities: crossing, walking straight, turning, standing, and riding a bicycle. Our proposal achieved 93.7% on the self-collected NTSEL traffic dataset and 77.9% on the near-miss DR dataset.
Noise Removal in SAR Images using Orthonormal Ridgelet Transform (IJERA Editor)
Reducing speckle noise in digital and satellite images is a challenging task for image processing applications, and many algorithms have previously been proposed to de-speckle digital images. In this article we present experimental results on de-speckling Synthetic Aperture Radar (SAR) images. SAR images have wide applications in remote sensing and in mapping planetary surfaces; SAR can also be implemented as "inverse SAR" by observing a moving target over a substantial time with a stationary antenna. Denoising SAR images is therefore an essential task for viewing the information they contain. We introduce a transform called the ridgelet transform, an extension of the wavelet transform. Ridgelet analysis proceeds much as wavelet analysis does in the Radon domain, translating singularities along lines into point singularities at different frequencies. Simulation results show that the proposed method is more reliable than other de-speckling processes; the quality of the de-speckled image is measured in terms of Peak Signal-to-Noise Ratio (PSNR) and Mean Square Error (MSE).
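The two quality metrics named at the end, MSE and PSNR, are standard definitions; a minimal sketch of how they are typically computed for 8-bit images (generic formulas, not the paper's code):

```python
import numpy as np

def mse(reference, test):
    """Mean squared error between two images of equal shape."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    return np.mean(diff ** 2)

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB; higher means less distortion."""
    error = mse(reference, test)
    if error == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / error)

# Example: a uniform offset of 5 gray levels gives MSE = 25.
clean = np.full((32, 32), 100, dtype=np.uint8)
noisy = clean + 5
print(mse(clean, noisy))   # 25.0
print(psnr(clean, noisy))  # 10*log10(255^2/25) ≈ 34.15 dB
```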
SIGGRAPH 2014 Course on Computational Cameras and Displays, part 4 (Matthew O'Toole)
Recent advances in both computational photography and displays have given rise to a new generation of computational devices. Computational cameras and displays provide a visual experience that goes beyond the capabilities of traditional systems by adding computational power to optics, lights, and sensors. These devices are breaking new ground in the consumer market, including lightfield cameras that redefine our understanding of pictures (Lytro), displays for visualizing 3D/4D content without special eyewear (Nintendo 3DS), motion-sensing devices that use light coded in space or time to detect motion and position (Kinect, Leap Motion), and a movement toward ubiquitous computing with wearable cameras and displays (Google Glass).
This short (1.5 hour) course serves as an introduction to the key ideas and an overview of the latest work in computational cameras, displays, and light transport.
Reading group - Week 2 - Trajectory Pooled Deep-Convolutional Descriptors (TDD) (Saimunur Rahman)
This presentation was prepared for the ViPr reading group at Multimedia University, Cyberjaya. Its goal was to make lab members aware of recent advancements in action recognition.
DTAM: Dense Tracking and Mapping in Real-Time, Robot Vision Group (Lihang Li)
These are the slides on DTAM for my group meeting report; I hope they help anyone who wants to implement DTAM and needs to understand it deeply.
This slide deck was presented in an invited talk at mCube Inc. last month in Taipei, Taiwan, and is shared here for those who may be interested. You are also welcome to invite me to speak on related topics.
Magnetic tracking is one of many motion capture methods, and perhaps the oldest. However, its working principle is rarely introduced in detail, perhaps because its early adoption was confined to the military and medical industries. Out of my interest in VR and animation MoCap, I have spent some time digging into the depths of it and would like to share some non-confidential knowledge with you.
In this slide deck, a short history of magnetic tracking is visited, followed by its working principle and an algorithm simulation. Hope you enjoy it.
If you want to discuss something in depth with me, please don't hesitate to contact me via: dibao.wang@gmail.com
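As a rough illustration of the physics behind magnetic tracking (not taken from the slides themselves): these trackers measure the field of a source coil, which at a distance behaves like a magnetic dipole whose field falls off as 1/r^3, and invert that relation to recover position. A sketch of the standard point-dipole formula:

```python
import numpy as np

MU0 = 4e-7 * np.pi  # vacuum permeability, T*m/A

def dipole_field(m, r):
    """Magnetic flux density B (tesla) of a point dipole with moment
    m (A*m^2), evaluated at displacement r (metres) from the dipole:
    B = mu0/(4*pi) * (3*(m . r_hat)*r_hat - m) / |r|^3."""
    m = np.asarray(m, dtype=float)
    r = np.asarray(r, dtype=float)
    r_norm = np.linalg.norm(r)
    r_hat = r / r_norm
    return MU0 / (4 * np.pi) * (3 * np.dot(m, r_hat) * r_hat - m) / r_norm**3

# The 1/r^3 falloff: doubling the range cuts the field magnitude by 8x.
m = np.array([0.0, 0.0, 1.0])
b1 = np.linalg.norm(dipole_field(m, [0.0, 0.0, 0.1]))
b2 = np.linalg.norm(dipole_field(m, [0.0, 0.0, 0.2]))
print(b1 / b2)  # ≈ 8.0
```

This steep falloff is why magnetic trackers are very precise near the emitter but lose accuracy quickly with range.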
Automatic identification of animals using visual and motion saliency (eSAT Publishing House)
IJRET: International Journal of Research in Engineering and Technology is an international, peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of engineering and technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching, and research in the fields of engineering and technology. We bring together scientists, academicians, field engineers, scholars, and students of related fields of engineering and technology.
Discovering Anomalies Based on Saliency Detection and Segmentation in Surveillance System (ijtsrd)
This paper proposes extracting salient objects from motion fields. Salient object detection is an important technique for many content-based applications, but it becomes challenging when handling cluttered saliency maps, which cannot completely highlight salient object regions or suppress background regions. We present algorithms for recognizing activity in monocular video sequences based on a discriminative gradient random field. Surveillance videos capture the behavioral activities of the objects accessing the surveillance system. Some behaviors are frequent sequences of events, while others deviate from the known frequent sequences; the latter are termed anomalies and may be linked to criminal activities. Past work focused on discovering known abnormal events; here, unknown abnormal activities are detected and alerted so that early action can be taken. K. Shankar, Dr. S. Srinivasan, Dr. T. S. Sivakumaran, and K. Madhavi Priya, "Discovering Anomalies Based on Saliency Detection and Segmentation in Surveillance System", published in the International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN 2456-6470, Volume 2, Issue 1, December 2017. URL: http://www.ijtsrd.com/papers/ijtsrd5871.pdf http://www.ijtsrd.com/engineering/computer-engineering/5871/discovering-anomalies-based-on-saliency-detection-and-segmentation-in-surveillance-system/k-shankar
Real-time Moving Object Detection using SURF (iosrjce)
IOSR Journal of Computer Engineering (IOSR-JCE) is a double-blind, peer-reviewed international journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes high-quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high-quality technical notes are invited for publication.
HUMAN ACTION RECOGNITION IN VIDEOS USING STABLE FEATURES (sipij)
Human action recognition is still a challenging problem, and researchers are investigating it using different techniques. We propose a robust approach for human action recognition, achieved by extracting stable spatio-temporal features in the form of the pairwise local binary pattern (P-LBP) and the scale-invariant feature transform (SIFT). These features are used to train an MLP neural network during the training stage, and the action classes are inferred from the test videos during the testing stage. The proposed features match the motion of individuals well, and their consistency and accuracy are higher on a challenging dataset. The experimental evaluation is conducted on a benchmark dataset commonly used for human action recognition. In addition, we show that our approach outperforms the individual features, i.e., using only spatial or only temporal features.
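The abstract does not spell out the P-LBP construction; as a hedged sketch, the basic 3x3 local binary pattern it builds on assigns the centre pixel an 8-bit code from threshold comparisons with its neighbours (the clockwise bit ordering here is an assumption, conventions vary):

```python
import numpy as np

def lbp_8neighbour(patch):
    """Basic 3x3 local binary pattern code for the centre pixel:
    each neighbour >= centre contributes one bit, clockwise from
    the top-left corner."""
    assert patch.shape == (3, 3)
    centre = patch[1, 1]
    # Clockwise neighbour order starting at the top-left corner.
    neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= centre:
            code |= 1 << bit
    return code

patch = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]])
# Neighbours >= 6 sit at bits 0, 4, 5, 6, 7: 1+16+32+64+128 = 241.
print(lbp_8neighbour(patch))  # 241
```

A histogram of these codes over an image region is the usual LBP texture descriptor; the "pairwise" variant in the paper presumably compares codes across frame pairs, but that detail is not given in the abstract.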
Event recognition image & video segmentation (eSAT Journals)
Abstract: This paper takes a clear look at the segmentation process at a basic level. Segmentation is done at multiple levels so that different results are obtained. Segmentation of relative motion descriptors gives a clear picture of the segmentation performed for a given input video. Relative motion computation and histogram incrementation are used to evaluate this approach. We also give complete information about related research on how segmentation can be done for both images and videos. Keywords: Image Segmentation, Video Segmentation.
Human Action Recognition Based on Spatio-temporal Features - Poster
1. Human Action Recognition Based on Spatio-temporal Features
Nikhil Sawant, K. K. Biswas
Department of Computer Science and Engineering, Indian Institute of Technology, Delhi

Target Localization: The possible search space is the xyt cube, which is reduced using target localization. The action and the actor are localized in space and time. Background subtraction helps localize the actor, and a region of interest (ROI) is marked around the actor. Only the ROI is processed; everything else is ignored.

Motion Features: We make use of optical flow, the pattern of relative motion between object feature points and the viewer/camera. We use the Lucas-Kanade two-frame differential method, which yields comparatively robust and dense optical flow.

Fixed-sized grid: A fixed-sized grid is overlaid on the region of interest. The dimension of the grid is Xdiv x Ydiv; the ROI is divided into blocks bij with centres at cij.

Organizing Optical Flows: Unorganized optical flows are organized over the grid by simple averaging or weighted averaging.

Noise Reduction: Noise is removed by averaging. Optical flows with magnitude > C * Omean are ignored, where C is a constant in [1.5, 2] and Omean is the mean optical-flow magnitude within the ROI.

Shape Feature: The shape of the person gives information about the action being performed. Viola-Jones box features are used to obtain shape features; we use 2-rectangle and 4-rectangle features, where the foreground pixels in the white region are subtracted from the foreground pixels in the grey region. These features are applied at all possible locations on the rectangular grid.

Spatio-temporal Descriptor: Shape and motion features are combined over a span of time to form spatio-temporal features. TSPAN is the offset between consecutive video frames and TLEN is the number of video frames used; together they allow us to capture a large change in a possibly small number of frames.

Learning with AdaBoost: We use the standard AdaBoost algorithm, a state-of-the-art learning method. In AdaBoost the strong hypothesis is a weighted sum of weak hypotheses; we use linear decision stumps as the weak classifiers. We prepare mutually exclusive training and testing datasets. The system is trained first on the set of actions, and each given video is then classified into one of the action classes for which it was trained.

Dataset: We constructed our own dataset with 7 actions and 8 actors; the videos are shot in daylight against a stable background. The recorded actions are walk, run, wave1, wave2, bend, sit-down, and stand-up. We also benchmark our method on the standard Weizmann dataset, which contains 10 actions performed by 9 actors: bend, jack, jump, pjump, run, side, skip, walk, wave1, and wave2.

Results and conclusion: On our own dataset we observe only 10% error in the waving, stand-up, and bending actions; all other actions show 0% error. On the Weizmann dataset, error is observed only in the run and wave1 actions; all other actions are unambiguous. We report an overall error rate of 2.17%. We conclude that spatio-temporal features combining motion and shape can be used effectively for action recognition, and that AdaBoost successfully classifies the descriptors formed from them.
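The noise-removal and grid-averaging steps described on the poster can be sketched as follows. This is an illustrative implementation; the grid layout and the choice C = 1.75 (within the poster's stated [1.5, 2] range) are assumptions:

```python
import numpy as np

def organize_flows(flow, xdiv, ydiv, c=1.75):
    """Average optical-flow vectors over a fixed (xdiv x ydiv) grid on
    the ROI, discarding flows whose magnitude exceeds c * mean magnitude
    (the poster's noise-removal rule). `flow` is an (H, W, 2) array of
    per-pixel (dx, dy) vectors."""
    mag = np.linalg.norm(flow, axis=2)
    keep = mag <= c * mag.mean()
    h, w = flow.shape[:2]
    cells = np.zeros((ydiv, xdiv, 2))
    ys = np.linspace(0, h, ydiv + 1, dtype=int)
    xs = np.linspace(0, w, xdiv + 1, dtype=int)
    for i in range(ydiv):
        for j in range(xdiv):
            block_keep = keep[ys[i]:ys[i+1], xs[j]:xs[j+1]]
            block_flow = flow[ys[i]:ys[i+1], xs[j]:xs[j+1]]
            if block_keep.any():
                cells[i, j] = block_flow[block_keep].mean(axis=0)
    return cells  # one averaged flow vector per grid cell

# Uniform rightward flow with one large outlier: the outlier is ignored.
flow = np.zeros((8, 8, 2))
flow[..., 0] = 1.0
flow[0, 0] = (50.0, 50.0)   # spurious vector, far above c * mean magnitude
cells = organize_flows(flow, xdiv=2, ydiv=2)
print(cells[0, 0])  # [1. 0.]
```

Flattening `cells` over TLEN frames (sampled TSPAN apart) and appending the shape features gives one spatio-temporal descriptor per clip.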
2. Learning: Classification with AdaBoost can be binary or multiclass; we use multiclass classification. We give n action classes to the AdaBoost system, which trains itself to detect the patterns produced by the different actions. The features extracted during training are provided to the learning system so that the patterns produced by the action classes are learned. Once the system has been trained with a variety of samples from each class, it is ready for action detection, and each given video is classified into one of the action classes for which the system was trained. A confusion matrix is reported for the Weizmann dataset.
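The boosting-with-stumps setup can be sketched with scikit-learn. This is a binary toy example (the poster uses the multiclass variant), with synthetic feature vectors standing in for the real spatio-temporal descriptors; the depth-1 trees are the decision stumps:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy descriptors: two synthetic "action" classes whose 20-dim feature
# vectors differ in mean. Real descriptors would concatenate the grid
# flow averages and Viola-Jones shape features over TLEN frames.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 20)),
               rng.normal(2.0, 1.0, size=(100, 20))])
y = np.array([0] * 100 + [1] * 100)

# Depth-1 trees are the linear decision stumps used as weak classifiers;
# AdaBoost combines them into a weighted-sum strong hypothesis.
clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                         n_estimators=50)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy on the toy data
```

In practice the training and testing descriptors would come from mutually exclusive video sets, as the poster specifies.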