The document proposes a method for human action recognition based on spatio-temporal features. It extracts optical-flow motion features on a fixed grid over the region of interest and uses Viola-Jones features to extract shape features. These features are combined over time to form spatio-temporal descriptors, which are classified into different action classes using AdaBoost. The method is tested on a custom dataset with 7 actions and on the Weizmann dataset, achieving an overall error rate of 2.17%.
Presenter: Inwoong Lee (Ph.D. student, Yonsei University)
Date: December 2017
Overview:
Methods for recognizing human actions in video can broadly be divided into those that extract action labels directly from the video and those that extract action labels from pose information.
This talk gives a general overview of action recognition, focuses on pose-based approaches, and in particular presents an action recognition technique using the Temporal Sliding LSTM network published at ICCV 2017. Specifically, it covers the issues in skeleton-based action recognition, the proposed method and its experimental results, and promising new research directions going forward.
ADAPTIVE, SCALABLE, TRANSFORM-DOMAIN GLOBAL MOTION ESTIMATION FOR VIDEO STABILIZATION (cscpconf)
Video stabilization, which is important for better analysis and user experience, is typically done through Global Motion Estimation (GME) and compensation. GME can be performed in the image domain using many techniques, or in the transform domain using the well-known phase correlation methods, which relate motion to phase shift in the spectrum. While image-domain methods are generally slower (due to dense vector-field computations), they can estimate both global and local motion. Transform-domain methods cannot normally estimate local motion, but they are faster, more accurate on homogeneous images, and resilient even to rapid illumination changes and large motion. However, both approaches can become very time-consuming when more accuracy and smoothness are needed, because of the nature of the tradeoff. We show here that wavelet transforms can be used in a novel way to achieve very smooth stabilization along with a significant speedup in the Fourier-domain computation, without sacrificing accuracy. We do this by adaptively selecting and combining motion computed on a specific pair of sub-bands using the wavelet interpolation capability. Our approach yields a smooth, scalable, fast, and adaptive algorithm (based on the time requirement and recent motion history) with significantly better accuracy than a single-level wavelet-decomposition approach.
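The phase-shift relation the abstract relies on can be sketched with a minimal phase-correlation routine. This is a generic illustration of the transform-domain idea (integer translations only, numpy only), not the paper's adaptive wavelet method:

```python
import numpy as np

def phase_correlation(frame_a, frame_b):
    """Estimate the integer (dy, dx) translation between two grayscale
    frames from the peak of the normalized cross-power spectrum."""
    F_a = np.fft.fft2(frame_a)
    F_b = np.fft.fft2(frame_b)
    cross_power = F_a * np.conj(F_b)
    cross_power /= np.abs(cross_power) + 1e-12  # keep phase, drop magnitude
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Wrap peak coordinates into signed shifts.
    shifts = [p if p <= s // 2 else p - s
              for p, s in zip(peak, correlation.shape)]
    return tuple(shifts)

# Synthetic check: shift an image by (5, -3) and recover the motion.
rng = np.random.default_rng(0)
img = rng.random((64, 64))
shifted = np.roll(img, shift=(5, -3), axis=(0, 1))
print(phase_correlation(shifted, img))  # (5, -3)
```

Because the correlation surface is computed with FFTs, the cost is O(N log N) regardless of the shift magnitude, which is why transform-domain methods tolerate large motion well.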
Geometric wavelet transform for optical flow estimation algorithm (ijcga)
This paper describes an algorithm for computing the optical flow (OF) vectors of a moving object in a video sequence based on the geometric wavelet transform (GWT). The method estimates the motion between two successive frames by projecting the OF vectors onto a basis of geometric wavelets. Using the GWT for OF estimation has been attracting much attention. The approach takes advantage of the geometric wavelet filter properties and requires only two frames. The algorithm is fast and able to estimate the OF with low complexity. The technique is suitable for video compression and can also be used for stereo vision and image registration.
【ITSC2015】Fine-grained Walking Activity Recognition via Driving Recorder Dataset (Hirokatsu Kataoka)
ITSC2015
http://www.itsc2015.org/
The paper presents fine-grained walking activity recognition aimed at inferring pedestrian intention, an important topic for predicting and avoiding dangerous pedestrian behavior. Fine-grained activity recognition distinguishes activities that differ only subtly, such as walking in different directions. We believe a change in a pedestrian's activity is significant for grasping the pedestrian's intention. However, the task is challenging for several reasons: (i) the in-vehicle camera is always moving, (ii) the pedestrian region is too small to capture motion and shape features, and (iii) a change of pedestrian activity (e.g., from walking straight to turning) produces only a small feature difference. To tackle these problems, we apply a vision-based approach to classify pedestrian activities. The dense trajectories (DT) method is employed for high-level recognition to capture detailed differences. Moreover, we additionally extract a detection-based region of interest (ROI) for higher performance in fine-grained activity recognition. We evaluated our approach on a self-collected dataset and a near-miss driving recorder (DR) dataset, divided into several activities: crossing, walking straight, turning, standing, and riding a bicycle. Our proposal achieved 93.7% on the self-collected NTSEL traffic dataset and 77.9% on the near-miss DR dataset.
Noise Removal in SAR Images using Orthonormal Ridgelet Transform (IJERA Editor)
Reducing speckle noise in digital and satellite images is a challenging task for image processing applications, and many algorithms have previously been proposed to de-speckle digital images. In this article we present experimental results on de-speckling Synthetic Aperture Radar (SAR) images. SAR images have wide applications in remote sensing and in mapping planetary surfaces; SAR can also be implemented as "inverse SAR" by observing a moving target over a substantial time with a stationary antenna. Denoising SAR images is therefore an essential task for viewing the information they contain. We introduce a transform called the ridgelet transform, an extension of the wavelet transform. Ridgelet analysis proceeds much as wavelet analysis does in the Radon domain, translating singularities along lines into point singularities at different frequencies. Simulation results show that the proposed method is more reliable than other de-speckling processes; the quality of the de-speckled image is measured in terms of Peak Signal-to-Noise Ratio (PSNR) and Mean Square Error (MSE).
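The two quality metrics named at the end, MSE and PSNR, are standard definitions; a minimal sketch of how they are typically computed for 8-bit images (generic formulas, not the paper's code):

```python
import numpy as np

def mse(reference, test):
    """Mean squared error between two images of equal shape."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    return np.mean(diff ** 2)

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB; higher means less distortion."""
    error = mse(reference, test)
    if error == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / error)

# Example: a uniform offset of 5 gray levels gives MSE = 25.
clean = np.full((32, 32), 100, dtype=np.uint8)
noisy = clean + 5
print(mse(clean, noisy))   # 25.0
print(psnr(clean, noisy))  # 10*log10(255^2/25) ≈ 34.15 dB
```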
SIGGRAPH 2014 Course on Computational Cameras and Displays, part 4 (Matthew O'Toole)
Recent advances in both computational photography and displays have given rise to a new generation of computational devices. Computational cameras and displays provide a visual experience that goes beyond the capabilities of traditional systems by adding computational power to optics, lights, and sensors. These devices are breaking new ground in the consumer market, including lightfield cameras that redefine our understanding of pictures (Lytro), displays for visualizing 3D/4D content without special eyewear (Nintendo 3DS), motion-sensing devices that use light coded in space or time to detect motion and position (Kinect, Leap Motion), and a movement toward ubiquitous computing with wearable cameras and displays (Google Glass).
This short (1.5 hour) course serves as an introduction to the key ideas and an overview of the latest work in computational cameras, displays, and light transport.
Reading group - Week 2 - Trajectory Pooled Deep-Convolutional Descriptors (TDD) (Saimunur Rahman)
This presentation was prepared for the ViPr reading group at Multimedia University, Cyberjaya. Its goal was to make lab members aware of recent advancements in action recognition.
DTAM: Dense Tracking and Mapping in Real-Time, Robot Vision Group (Lihang Li)
These are the slides on DTAM for my group meeting report; I hope they help anyone who wants to implement DTAM and needs to understand it deeply.
This slide deck was presented in an invited talk at mCube Inc. last month in Taipei, Taiwan, and is shared here for those who may be interested. You are also welcome to invite me to speak on related topics.
Magnetic tracking is one of many motion capture methods, and perhaps the oldest. However, its working principle is rarely introduced in detail, perhaps because its early adoption was confined to the military and medical industries. Out of my interest in VR and animation MoCap, I have spent some time digging into the depths of it and would like to share some non-confidential knowledge with you.
In this slide deck, a short history of magnetic tracking is visited, followed by its working principle and an algorithm simulation. Hope you enjoy it.
If you want to discuss something in depth with me, please don't hesitate to contact me via: dibao.wang@gmail.com
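As a rough illustration of the physics behind magnetic tracking (not taken from the slides themselves): these trackers measure the field of a source coil, which at a distance behaves like a magnetic dipole whose field falls off as 1/r^3, and invert that relation to recover position. A sketch of the standard point-dipole formula:

```python
import numpy as np

MU0 = 4e-7 * np.pi  # vacuum permeability, T*m/A

def dipole_field(m, r):
    """Magnetic flux density B (tesla) of a point dipole with moment
    m (A*m^2), evaluated at displacement r (metres) from the dipole:
    B = mu0/(4*pi) * (3*(m . r_hat)*r_hat - m) / |r|^3."""
    m = np.asarray(m, dtype=float)
    r = np.asarray(r, dtype=float)
    r_norm = np.linalg.norm(r)
    r_hat = r / r_norm
    return MU0 / (4 * np.pi) * (3 * np.dot(m, r_hat) * r_hat - m) / r_norm**3

# The 1/r^3 falloff: doubling the range cuts the field magnitude by 8x.
m = np.array([0.0, 0.0, 1.0])
b1 = np.linalg.norm(dipole_field(m, [0.0, 0.0, 0.1]))
b2 = np.linalg.norm(dipole_field(m, [0.0, 0.0, 0.2]))
print(b1 / b2)  # ≈ 8.0
```

This steep falloff is why magnetic trackers are very precise near the emitter but lose accuracy quickly with range.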
Automatic identification of animals using visual and motion saliency (eSAT Publishing House)
IJRET: International Journal of Research in Engineering and Technology is an international, peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of engineering and technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching, and research in the fields of engineering and technology. We bring together scientists, academicians, field engineers, scholars, and students of related fields of engineering and technology.
Discovering Anomalies Based on Saliency Detection and Segmentation in Surveillance System (ijtsrd)
This paper proposes extracting salient objects from motion fields. Salient object detection is an important technique for many content-based applications, but it becomes challenging when handling cluttered saliency maps, which cannot completely highlight salient object regions or suppress background regions. We present algorithms for recognizing activity in monocular video sequences based on a discriminative gradient random field. Surveillance videos capture the behavioral activities of the objects accessing the surveillance system. Some behaviors are frequent sequences of events, while others deviate from the known frequent sequences; the latter are termed anomalies and may be linked to criminal activities. Past work focused on discovering known abnormal events; here, unknown abnormal activities are detected and alerted so that early action can be taken. K. Shankar, Dr. S. Srinivasan, Dr. T. S. Sivakumaran, and K. Madhavi Priya, "Discovering Anomalies Based on Saliency Detection and Segmentation in Surveillance System", published in the International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN 2456-6470, Volume 2, Issue 1, December 2017. URL: http://www.ijtsrd.com/papers/ijtsrd5871.pdf http://www.ijtsrd.com/engineering/computer-engineering/5871/discovering-anomalies-based-on-saliency-detection-and-segmentation-in-surveillance-system/k-shankar
Real-time Moving Object Detection using SURF (iosrjce)
IOSR Journal of Computer Engineering (IOSR-JCE) is a double-blind, peer-reviewed international journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes high-quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high-quality technical notes are invited for publication.
HUMAN ACTION RECOGNITION IN VIDEOS USING STABLE FEATURES (sipij)
Human action recognition is still a challenging problem, and researchers are investigating it using different techniques. We propose a robust approach for human action recognition, achieved by extracting stable spatio-temporal features in the form of the pairwise local binary pattern (P-LBP) and the scale-invariant feature transform (SIFT). These features are used to train an MLP neural network during the training stage, and the action classes are inferred from the test videos during the testing stage. The proposed features match the motion of individuals well, and their consistency and accuracy are higher on a challenging dataset. The experimental evaluation is conducted on a benchmark dataset commonly used for human action recognition. In addition, we show that our approach outperforms the individual features, i.e., using only spatial or only temporal features.
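The abstract does not spell out the P-LBP construction; as a hedged sketch, the basic 3x3 local binary pattern it builds on assigns the centre pixel an 8-bit code from threshold comparisons with its neighbours (the clockwise bit ordering here is an assumption, conventions vary):

```python
import numpy as np

def lbp_8neighbour(patch):
    """Basic 3x3 local binary pattern code for the centre pixel:
    each neighbour >= centre contributes one bit, clockwise from
    the top-left corner."""
    assert patch.shape == (3, 3)
    centre = patch[1, 1]
    # Clockwise neighbour order starting at the top-left corner.
    neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= centre:
            code |= 1 << bit
    return code

patch = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]])
# Neighbours >= 6 sit at bits 0, 4, 5, 6, 7: 1+16+32+64+128 = 241.
print(lbp_8neighbour(patch))  # 241
```

A histogram of these codes over an image region is the usual LBP texture descriptor; the "pairwise" variant in the paper presumably compares codes across frame pairs, but that detail is not given in the abstract.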
Event recognition image & video segmentation (eSAT Journals)
Abstract: This paper takes a clear look at the segmentation process at a basic level. Segmentation is done at multiple levels so that different results are obtained. Segmentation of relative motion descriptors gives a clear picture of the segmentation performed for a given input video. Relative motion computation and histogram incrementation are used to evaluate this approach. We also give complete information about related research on how segmentation can be done for both images and videos. Keywords: Image Segmentation, Video Segmentation.
Human Action Recognition Based on Spatio-temporal Features - Poster
1. Human Action Recognition Based on Spatio-temporal Features
Nikhil Sawant, K. K. Biswas
Department of Computer Science and Engineering, Indian Institute of Technology, Delhi

Target Localization: The possible search space is the xyt cube, which is reduced using target localization. The action and the actor are localized in space and time. Background subtraction helps localize the actor, and a region of interest (ROI) is marked around the actor. Only the ROI is processed; everything else is ignored.

Motion Features: We make use of optical flow, the pattern of relative motion between object feature points and the viewer/camera. We use the Lucas-Kanade two-frame differential method, which yields comparatively robust and dense optical flow.

Fixed-sized grid: A fixed-sized grid is overlaid on the region of interest. The dimension of the grid is Xdiv x Ydiv; the ROI is divided into blocks bij with centres at cij.

Organizing Optical Flows: Unorganized optical flows are organized over the grid by simple averaging or weighted averaging.

Noise Reduction: Noise is removed by averaging. Optical flows with magnitude > C * Omean are ignored, where C is a constant in [1.5, 2] and Omean is the mean optical-flow magnitude within the ROI.

Shape Feature: The shape of the person gives information about the action being performed. Viola-Jones box features are used to obtain shape features; we use 2-rectangle and 4-rectangle features, where the foreground pixels in the white region are subtracted from the foreground pixels in the grey region. These features are applied at all possible locations on the rectangular grid.

Spatio-temporal Descriptor: Shape and motion features are combined over a span of time to form spatio-temporal features. TSPAN is the offset between consecutive video frames and TLEN is the number of video frames used; together they allow us to capture a large change in a possibly small number of frames.

Learning with AdaBoost: We use the standard AdaBoost algorithm, a state-of-the-art learning method. In AdaBoost the strong hypothesis is a weighted sum of weak hypotheses; we use linear decision stumps as the weak classifiers. We prepare mutually exclusive training and testing datasets. The system is trained first on the set of actions, and each given video is then classified into one of the action classes for which it was trained.

Dataset: We constructed our own dataset with 7 actions and 8 actors; the videos are shot in daylight against a stable background. The recorded actions are walk, run, wave1, wave2, bend, sit-down, and stand-up. We also benchmark our method on the standard Weizmann dataset, which contains 10 actions performed by 9 actors: bend, jack, jump, pjump, run, side, skip, walk, wave1, and wave2.

Results and conclusion: On our own dataset we observe only 10% error in the waving, stand-up, and bending actions; all other actions show 0% error. On the Weizmann dataset, error is observed only in the run and wave1 actions; all other actions are unambiguous. We report an overall error rate of 2.17%. We conclude that spatio-temporal features combining motion and shape can be used effectively for action recognition, and that AdaBoost successfully classifies the descriptors formed from them.
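The noise-removal and grid-averaging steps described on the poster can be sketched as follows. This is an illustrative implementation; the grid layout and the choice C = 1.75 (within the poster's stated [1.5, 2] range) are assumptions:

```python
import numpy as np

def organize_flows(flow, xdiv, ydiv, c=1.75):
    """Average optical-flow vectors over a fixed (xdiv x ydiv) grid on
    the ROI, discarding flows whose magnitude exceeds c * mean magnitude
    (the poster's noise-removal rule). `flow` is an (H, W, 2) array of
    per-pixel (dx, dy) vectors."""
    mag = np.linalg.norm(flow, axis=2)
    keep = mag <= c * mag.mean()
    h, w = flow.shape[:2]
    cells = np.zeros((ydiv, xdiv, 2))
    ys = np.linspace(0, h, ydiv + 1, dtype=int)
    xs = np.linspace(0, w, xdiv + 1, dtype=int)
    for i in range(ydiv):
        for j in range(xdiv):
            block_keep = keep[ys[i]:ys[i+1], xs[j]:xs[j+1]]
            block_flow = flow[ys[i]:ys[i+1], xs[j]:xs[j+1]]
            if block_keep.any():
                cells[i, j] = block_flow[block_keep].mean(axis=0)
    return cells  # one averaged flow vector per grid cell

# Uniform rightward flow with one large outlier: the outlier is ignored.
flow = np.zeros((8, 8, 2))
flow[..., 0] = 1.0
flow[0, 0] = (50.0, 50.0)   # spurious vector, far above c * mean magnitude
cells = organize_flows(flow, xdiv=2, ydiv=2)
print(cells[0, 0])  # [1. 0.]
```

Flattening `cells` over TLEN frames (sampled TSPAN apart) and appending the shape features gives one spatio-temporal descriptor per clip.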
2. Learning: Classification with AdaBoost can be binary or multiclass; we use multiclass classification. We give n action classes to the AdaBoost system, which trains itself to detect the patterns produced by the different actions. The features extracted during training are provided to the learning system so that the patterns produced by the action classes are learned. Once the system has been trained with a variety of samples from each class, it is ready for action detection, and each given video is classified into one of the action classes for which the system was trained. A confusion matrix is reported for the Weizmann dataset.
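The boosting-with-stumps setup can be sketched with scikit-learn. This is a binary toy example (the poster uses the multiclass variant), with synthetic feature vectors standing in for the real spatio-temporal descriptors; the depth-1 trees are the decision stumps:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy descriptors: two synthetic "action" classes whose 20-dim feature
# vectors differ in mean. Real descriptors would concatenate the grid
# flow averages and Viola-Jones shape features over TLEN frames.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 20)),
               rng.normal(2.0, 1.0, size=(100, 20))])
y = np.array([0] * 100 + [1] * 100)

# Depth-1 trees are the linear decision stumps used as weak classifiers;
# AdaBoost combines them into a weighted-sum strong hypothesis.
clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                         n_estimators=50)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy on the toy data
```

In practice the training and testing descriptors would come from mutually exclusive video sets, as the poster specifies.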