Human action recognition remains a challenging problem, and researchers continue to investigate it using different techniques. We propose a robust approach for human action recognition, achieved by extracting stable spatio-temporal features in terms of the pairwise local binary pattern (P-LBP) and the scale-invariant feature transform (SIFT). These features are used to train an MLP neural network during the training stage, and the action classes are inferred from the test videos during the testing stage. The proposed features match the motion of individuals well, and their consistency and accuracy remain high on a challenging dataset. The experimental evaluation is conducted on a benchmark dataset commonly used for human action recognition. In addition, we show that our approach outperforms the individual features, i.e., using only spatial or only temporal features.
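The P-LBP feature builds on the standard local binary pattern operator, which encodes each pixel's neighbourhood as a bit string. As an illustrative sketch (not the paper's P-LBP implementation), a basic 8-neighbour LBP code for a 3x3 patch can be computed as:

```python
import numpy as np

def lbp_code(patch):
    """Basic 8-neighbour LBP code for a 3x3 patch: each neighbour
    greater than or equal to the centre contributes one bit, read
    clockwise from the top-left pixel."""
    center = patch[1, 1]
    # clockwise neighbour order starting at the top-left pixel
    neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= center:
            code |= 1 << bit
    return code

patch = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]])
code = lbp_code(patch)  # bits set for the 6, 7, 8, 9, 7 neighbours
```

A histogram of such codes over an image region forms the texture descriptor; the pairwise variant would relate codes across frames.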
Effective Object Detection and Background Subtraction by using M.O.I (IJMTST Journal)
This paper proposes efficient motion detection and people counting based on background subtraction with a dynamic-threshold approach and mathematical morphology. Several methods are applied to object detection and their performance is compared in terms of detection accuracy; the techniques used are frame differencing and dynamic-threshold-based detection. After foreground detection, parameters of the motion such as speed and velocity are determined. Most previous methods rely on the assumption that the background is static over short time periods. In dynamic-threshold-based object detection, morphological processing and filtering are also applied to remove unwanted pixels from the background, and the background frame is updated by comparing the current frame intensities with a reference frame. Alongside the dynamic threshold, mathematical morphology is used, which can greatly attenuate colour variations generated by background motion while still highlighting moving objects. Finally, simulation results show that the approximate-median approach combined with mathematical morphology outperforms prior background-subtraction methods in dynamic-texture scenes, and performance parameters of the moving object, such as sensitivity, speed and velocity, are evaluated using M.O.I.
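The detect-then-clean pipeline above can be sketched minimally. The dynamic threshold is assumed here to be mean + k·std of the frame difference, and a plain 3x3 binary erosion stands in for the paper's morphological clean-up (both are illustrative assumptions, not the authors' exact formulas):

```python
import numpy as np

def foreground_mask(frame, background, k=2.0):
    """Dynamic threshold: pixels whose absolute difference from the
    background exceeds mean + k*std of the difference image."""
    diff = np.abs(frame.astype(float) - background.astype(float))
    thresh = diff.mean() + k * diff.std()
    return diff > thresh

def erode(mask):
    """3x3 binary erosion via shifted logical ANDs: a pixel survives
    only if its whole 3x3 neighbourhood is foreground."""
    out = mask.copy()
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out &= np.roll(np.roll(mask, dy, axis=0), dx, axis=1)
    return out
```

Erosion followed by dilation (opening) would remove isolated noise pixels while roughly preserving the blob shape.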
Deconstructing SfM-Net architecture and beyond
SfM-Net is a geometry-aware neural network for motion estimation in videos that decomposes frame-to-frame pixel motion in terms of scene and object depth, camera motion, and 3D object rotations and translations. Given a sequence of frames, SfM-Net predicts depth, segmentation, camera and rigid object motions, converts those into a dense frame-to-frame motion field (optical flow), differentiably warps frames in time to match pixels, and back-propagates.
Alternative download:
https://www.dropbox.com/s/aezl7ro8sy2xq7j/sfm_net_v2.pdf?dl=0
Motion planning and controlling algorithm for grasping and manipulating movin... (ijscai)
Much robotic grasping research has focused on stationary objects. For dynamic, moving objects, researchers have used real-time captured images to locate objects dynamically. However, this approach to controlling the grasping process is quite costly, requiring considerable resources and image processing, so it is desirable to seek a simpler method. In this paper, we detail the requirements for manipulating a humanoid robot arm with 7 degrees of freedom to grasp and handle moving objects in a 3-D environment, with or without obstacles, and without using cameras. We use the OpenRAVE simulation environment together with a robot arm instrumented with the Barrett hand. We also describe a randomized planning algorithm, an extension of RRT-JT, that combines exploration, using a Rapidly-exploring Random Tree, with exploitation, using Jacobian-based gradient descent, to instruct a 7-DoF WAM robotic arm to grasp a moving target while avoiding obstacles encountered along the way. We present a simulation of a scenario that starts with tracking a moving mug, then grasps it and finally places it in a determined position, achieving a high success rate in reasonable time.
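The exploitation half of RRT-JT steps the joint angles down the task-space error using the arm's Jacobian. A minimal sketch for a planar 2-link arm (illustrative only; the paper's WAM arm has 7 DoF and a different kinematic model) using the Jacobian-transpose rule:

```python
import numpy as np

def fk(q, l1=1.0, l2=1.0):
    """End-effector position of a planar 2-link arm with joint angles q."""
    return np.array([l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),
                     l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])])

def jacobian(q, l1=1.0, l2=1.0):
    """Analytic Jacobian d(fk)/dq of the 2-link arm."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])

def step_towards(q, target, alpha=0.1):
    """One Jacobian-transpose gradient-descent step toward the target."""
    err = target - fk(q)
    return q + alpha * jacobian(q).T @ err
```

Iterating `step_towards` pulls the end effector toward a (possibly moving) target; the RRT part would take over when this local descent stalls near obstacles or singularities.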
Remote sensing image fusion using contourlet transform with sharp frequency l... (Zac Darcy)
This paper addresses four different aspects of remote sensing image fusion: i) the image fusion method, ii) quality analysis of the fusion results, iii) effects of the image decomposition level, and iv) the importance of image registration. First, a new contourlet-based image fusion method is presented, which is an improvement over wavelet-based fusion. This fusion method is then utilized within the main fusion process to analyze the final fusion results. The fusion framework, scheme and datasets used in the study are discussed in detail. Second, quality analysis of the fusion results is discussed using various quantitative metrics for both spatial and spectral analyses. Our results indicate that the proposed contourlet-based fusion method performs better than conventional wavelet-based fusion methods in terms of both spatial and spectral analyses. Third, we conducted an analysis of the effects of the image decomposition level and observed that a decomposition level of 3 produced better fusion results than both smaller and greater numbers of levels. Last, we created four different fusion scenarios to examine the importance of image registration. As a result, feature-based image registration using the edge features of the source images produced better fusion results than intensity-based image registration.
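Multiresolution fusion methods of this kind typically combine the two images' transform coefficients with simple per-coefficient rules. A hedged sketch of two common rules (not necessarily the paper's exact scheme), applied to generic subband arrays:

```python
import numpy as np

def fuse_detail(a, b):
    """Detail (high-frequency) subbands: keep, per coefficient, the
    source with the larger magnitude, preserving the stronger edge."""
    return np.where(np.abs(a) >= np.abs(b), a, b)

def fuse_approx(a, b):
    """Approximation (low-frequency) subbands: average the two sources
    to retain the overall spectral content."""
    return (a + b) / 2.0
```

In a contourlet pipeline, these rules would be applied subband by subband before the inverse transform reconstructs the fused image.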
Multiple Person Tracking with Shadow Removal Using Adaptive Gaussian Mixture ... (IJSRD)
Person detection is a major concern in real-world video surveillance systems, with applications such as abnormal-event detection, congestion analysis, human gait characterization, fall detection, person identification, gender classification and elderly care. In this work, we use the Gaussian Mixture Model (GMM), a popular background-subtraction method that gives real-time object detection, for multi-person detection. Since GMM alone is not robust, multi-person tracking with shadow removal fills this gap: a hybrid HOG-LBP approach combined with the GMM algorithm is presented for multi-person tracking with shadow removal.
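A full GMM keeps several Gaussians per pixel; as a reduced, illustrative stand-in, a single running Gaussian per pixel already shows the detect-then-update loop that the mixture model generalizes:

```python
import numpy as np

class RunningGaussian:
    """Per-pixel single-Gaussian background model (a simplified stand-in
    for the full mixture): pixels far from the mean are foreground, and
    background pixels update the model with learning rate lr."""

    def __init__(self, first_frame, lr=0.05, k=2.5):
        self.mean = first_frame.astype(float)
        self.var = np.full(first_frame.shape, 15.0 ** 2)
        self.lr, self.k = lr, k

    def apply(self, frame):
        frame = frame.astype(float)
        d2 = (frame - self.mean) ** 2
        fg = d2 > (self.k ** 2) * self.var      # Mahalanobis-style test
        bg = ~fg
        # adapt mean and variance only where the pixel matched the model
        self.mean[bg] += self.lr * (frame[bg] - self.mean[bg])
        self.var[bg] += self.lr * (d2[bg] - self.var[bg])
        return fg
```

The mixture version keeps K such Gaussians per pixel with weights, matching each new pixel against them in order; shadow removal would add a chromaticity test on the foreground pixels.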
Image Registration using NSCT and Invariant Moment (CSCJournals)
Image registration is the process of matching images taken at different times, from different sensors or from different viewpoints. It is an important step for a great variety of applications such as computer vision, stereo navigation, medical image analysis, pattern recognition and watermarking. In this paper, an improved feature-point selection and matching technique for image registration is proposed. The technique is based on the ability of the nonsubsampled contourlet transform (NSCT) to extract significant features irrespective of feature orientation. The correspondence between the extracted feature points of the reference image and the sensed image is then established using Zernike moments, and the matched feature-point pairs are used to estimate the transformation parameters mapping the sensed image to the reference image. Experimental results illustrate the registration accuracy over a wide range of panning and zooming movements, as well as the robustness of the proposed algorithm to noise. Apart from image registration, the proposed method can be used for shape matching and object classification.
Keywords: Image Registration, NSCT, Contourlet Transform, Zernike Moment.
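Once feature-point pairs are matched, the sensed-to-reference mapping can be estimated by least squares. A sketch for the simplest useful case, a similarity transform covering the panning and zooming motions mentioned above (the paper may estimate a richer model):

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares similarity transform (scale+rotation a, b and
    translation tx, ty) mapping src -> dst:
    dst ~ [[a, -b], [b, a]] @ src + [tx, ty]."""
    A, y = [], []
    for (x, yv), (u, v) in zip(src, dst):
        A.append([x, -yv, 1, 0]); y.append(u)
        A.append([yv,  x, 0, 1]); y.append(v)
    p, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(y, float),
                            rcond=None)
    return p  # a, b, tx, ty
```

Each correspondence contributes two linear equations, so three or more non-collinear matches overdetermine the four parameters and average out matching noise.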
Uniform and non uniform single image deblurring based on sparse representatio... (ijma)
Exploiting the sparseness property of images, a sparse-representation-based iterative deblurring method is presented for single-image deblurring under uniform and non-uniform motion blur. The approach is based on sparse and redundant representations over dictionaries adaptively trained from the single blurred, noisy image itself. The K-SVD algorithm is used to obtain a dictionary that describes the image content effectively. Comprehensive experimental evaluation demonstrates that the proposed framework, integrating the sparseness property of images, adaptive dictionary training and an iterative deblurring scheme, significantly improves deblurring performance, is comparable with state-of-the-art deblurring algorithms, and offers a powerful solution to an ill-conditioned inverse problem.
Performance analysis on color image mosaicing techniques on FPGA (IJECEIAES)
Today, surveillance and other monitoring systems capture image sequences that can be combined into a mosaiced image or a combined image sequence. However, the captured images may have quality issues such as brightness, alignment (correlation), resolution and manual image-registration problems. An existing technique such as cross-correlation can offer good image mosaicing but suffers from brightness issues. This paper therefore introduces two methods for mosaicing on a Field Programmable Gate Array (FPGA): (a) Sliding Window Module (SWM) based Color Image Mosaicing (CIM) and (b) Discrete Cosine Transform (DCT) based CIM. The SWM-based CIM performs corner detection on the two images and automatic image registration, while the DCT-based CIM handles both local and global alignment of the images using a phase-correlation approach. Finally, the performance of the two methods is analyzed by comparing parameters such as PSNR, MSE, device utilization and execution time. The analysis concludes that the DCT-based CIM offers significantly better results than the SWM-based CIM.
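The phase-correlation alignment used in the DCT-based CIM can be sketched with FFTs. This assumes integer, circular shifts between the two images; the FPGA implementation would of course differ:

```python
import numpy as np

def phase_correlation(ref, moved):
    """Estimate the integer (circular) translation of `moved` relative
    to `ref` from the peak of the normalized cross-power spectrum."""
    cross = np.fft.fft2(moved) * np.conj(np.fft.fft2(ref))
    cross /= np.abs(cross) + 1e-12          # keep phase, discard magnitude
    corr = np.fft.ifft2(cross).real         # impulse at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    return int(dy), int(dx)
```

Because only the phase is kept, the peak location is insensitive to global brightness differences between the two images, which is exactly the weakness of plain cross-correlation noted above.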
International Journal of Engineering Research and Development (IJERD Editor)
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
VIDEO SEGMENTATION FOR MOVING OBJECT DETECTION USING LOCAL CHANGE & ENTROPY B... (csandit)
Motion detection and object segmentation are important research areas of image/video processing and computer vision. The techniques and mathematical models used to detect and segment region-of-interest (ROI) objects form the algorithmic modules of various high-level techniques in video analysis, object extraction, classification, and recognition. Moving-object detection is significant in many tasks, such as video surveillance and moving-object tracking. The design of a video surveillance system is directed at automatic identification of events of interest, especially tracking and classification of moving objects. An entropy-based, real-time, adaptive, non-parametric window-thresholding algorithm for change detection is proposed in this research. Based on an estimate of the spread of change regions in a difference image, a threshold for every image block is computed discriminatively using an entropy measure, and the global threshold is then obtained by averaging the thresholds of all image blocks in the frame. The block threshold is computed differently for regions of change and for background. Experimental results show that the proposed thresholding algorithm performs well for change detection with high efficiency.
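One concrete way to realize a per-block entropy threshold (the paper's exact entropy structure may differ) is Kapur's maximum-entropy criterion, with the per-block results averaged into the global threshold:

```python
import numpy as np

def kapur_threshold(block, bins=64):
    """Maximum-entropy (Kapur) threshold of one block: choose the cut
    that maximizes the summed entropies of the two classes."""
    hist, edges = np.histogram(block, bins=bins, range=(0, 256))
    p = hist / max(hist.sum(), 1)
    cdf = np.cumsum(p)
    best_t, best_h = edges[1], -np.inf
    for t in range(1, bins):
        w0, w1 = cdf[t - 1], 1.0 - cdf[t - 1]
        if w0 <= 0 or w1 <= 0:
            continue
        p0, p1 = p[:t] / w0, p[t:] / w1
        h = -(p0[p0 > 0] * np.log(p0[p0 > 0])).sum() \
            - (p1[p1 > 0] * np.log(p1[p1 > 0])).sum()
        if h > best_h:
            best_h, best_t = h, edges[t]
    return best_t

def global_threshold(diff_image, block=8):
    """Average the per-block entropy thresholds over the frame."""
    h, w = diff_image.shape
    ts = [kapur_threshold(diff_image[i:i + block, j:j + block])
          for i in range(0, h, block) for j in range(0, w, block)]
    return float(np.mean(ts))
```

Thresholding the difference image at `global_threshold(...)` then yields the change mask; the paper additionally treats change and background blocks differently.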
ADAPTIVE, SCALABLE, TRANSFORM-DOMAIN GLOBAL MOTION ESTIMATION FOR VIDEO STABIL... (cscpconf)
Video stabilization, which is important for better analysis and user experience, is typically done through Global Motion Estimation (GME) and compensation. GME can be done in the image domain using many techniques, or in the transform domain using the well-known phase-correlation methods, which relate motion to phase shift in the spectrum. While image-domain methods are generally slower (due to dense vector-field computations), they can do global as well as local motion estimation. Transform-domain methods cannot normally do local motion, but are faster and more accurate on homogeneous images, and are resilient even to rapid illumination changes and large motion. However, both approaches can become very time consuming if one needs more accuracy and smoothness, because of the nature of the tradeoff. We show here that wavelet transforms can be used in a novel way to achieve very smooth stabilization along with a significant speedup in the Fourier-domain computation, without sacrificing accuracy. We do this by adaptively selecting and combining motion computed on a specific pair of sub-bands using the wavelet interpolation capability. Our approach yields a smooth, scalable, fast and adaptive algorithm (based on the time requirement and recent motion history) with significantly better accuracy than a single-level wavelet-decomposition approach.
Feature Tracking of Objects in Underwater Video Sequences (IDES Editor)
Feature tracking is a key, underlying component in many approaches to 3D reconstruction, detection, localization and recognition of underwater objects. Over the past few years, underwater vision has been attracting researchers to investigate suitable feature tracking techniques for underwater applications. Many feature tracking techniques, such as KLT, SIFT and SURF, have been developed to track features in video sequences for general applications, but the literature survey reveals that there is no standard feature tracker suited to the underwater environment. In this paper, we propose to adapt the SIFT technique for tracking features of objects in underwater video sequences. SIFT extracts features that are invariant to scale, rotation and affine transformations. We have compared and evaluated SIFT against popular techniques such as KLT and SURF on captured video sequences of underwater objects. The experimental results show that the adapted SIFT works well for underwater video sequences.
Complex Background Subtraction Using Kalman Filter (IJERA Editor)
This system addresses background subtraction from a dynamic background. At each location of the scene, it extracts a sequence of regular video bricks, i.e., video volumes spanning both the spatial and temporal domains. Background modeling is thus posed as pursuing subspaces within the video bricks while adapting to scene variations. For each sequence of video bricks, the subspace is pursued using the auto-regressive moving-average model, which jointly characterizes the appearance consistency and temporal coherence of the observations. During online processing, a Kalman-filter tracking algorithm is used for background/foreground classification, and the subspaces are incrementally updated to cope with disturbances from foreground objects and scene changes.
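The Kalman predict/correct cycle at the heart of such tracking can be illustrated in its simplest scalar form (a constant-state model with process noise q and measurement noise r; the system above tracks much richer state):

```python
import numpy as np

def kalman_1d(measurements, q=1e-3, r=0.5):
    """Scalar constant-state Kalman filter: predict (state unchanged,
    variance grows by q), then correct against each noisy measurement."""
    x, p = measurements[0], 1.0
    estimates = []
    for z in measurements:
        p = p + q                 # predict: uncertainty grows
        k = p / (p + r)           # Kalman gain
        x = x + k * (z - x)       # correct toward the measurement
        p = (1 - k) * p           # uncertainty shrinks after correction
        estimates.append(x)
    return estimates
```

The gain k automatically balances trust in the model against trust in the measurement, which is what lets the filter smooth noisy foreground observations online.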
Leader follower formation control of ground vehicles using camshift based gui... (ijma)
Autonomous ground vehicles have been designed for formation control that relies on range and bearing information received from a forward-looking camera. A visual guidance control algorithm is designed in which real-time image processing provides the feedback signals. The vision subsystem and control subsystem work in parallel to accomplish formation control. Proportional navigation and line-of-sight guidance laws are used to estimate the range and bearing of the leader vehicle using the vision subsystem. The vision detection and localization algorithms used here are similar to approaches in many computer-vision tasks, such as face tracking and detection, that are based on color and texture features, and the non-parametric Continuously Adaptive Mean-shift (CAMShift) algorithm is used to keep track of the leader. This is proposed for the first time in the leader-follower framework. The algorithms are simple but effective in real time and provide an alternative to traditional approaches such as the Viola-Jones algorithm. Further, to stabilize the follower on the leader trajectory, a sliding-mode controller is used to dynamically track the leader. The performance is demonstrated in simulation and in practical experiments.
Tracking Chessboard Corners Using Projective Transformation for Augmented Rea... (CSCJournals)
Augmented reality has been a topic of intense research for several years across many applications. It consists of inserting a virtual object into a real scene; the virtual object must be accurately positioned in the desired place. Some measurements (calibration) are thus required, and a set of correspondences between points on the calibration target and the camera images must be found. In this paper, we present a tracking technique based on detection of chessboard corners and a least-squares method; the objective is to estimate the perspective transformation matrix for the current view of the camera. This technique does not require any information about, or computation of, the camera parameters; it can be used in real time without any initialization, and the user can change the camera focal length without any fear of losing alignment between the real and virtual objects.
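The perspective transformation for the current view can be estimated from chessboard-corner correspondences by least squares. A standard direct linear transform (DLT) sketch, with the homography normalized so its bottom-right entry is 1 (not necessarily the authors' exact formulation):

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares homography H (3x3, h33 = 1) from four or more
    point correspondences via the direct linear transform."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float),
                            rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def apply_h(H, pt):
    """Map a 2D point through H using homogeneous coordinates."""
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]
```

With many detected corners, the overdetermined system averages out localization noise, which is what keeps the virtual overlay stable from frame to frame.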
NUMBER PLATE IMAGE DETECTION FOR FAST MOTION VEHICLES USING BLUR KERNEL ESTIM... (paperpublications3)
Abstract: Many recent advancements have been introduced in both hardware and software technologies; among these, image processing has attracted a great deal of interest. As the unique identifier of a vehicle, the number plate is a key clue for discovering stolen and over-speeding vehicles. Images captured from the camera are typically low resolution and suffer severe loss of edge information, which poses a great challenge to existing blind deblurring methods. The blur kernel can be modeled as a linear uniform convolution parameterized by angle and length. In this paper, sparse representation is used to identify the blur kernel, and the length of the motion kernel is then estimated with the Radon transform in the Fourier domain. We evaluate our approach on real-world images and compare it with several popular blind image deblurring algorithms; the results obtained demonstrate the superiority of our proposed approach in terms of efficacy.
Keywords: Image Segmentation, Angle Estimation, Length Estimation, Deconvolution Algorithm, Artificial Neural Network.
Title: NUMBER PLATE IMAGE DETECTION FOR FAST MOTION VEHICLES USING BLUR KERNEL ESTIMATION AND ANN
Author: S. KALAIVANI, K. PRAVEENA, T. PREETHI, N. PUNITHA
ISSN 2350-1022
International Journal of Recent Research in Mathematics Computer Science and Information Technology
Paper Publications
A Pattern Classification Based approach for Blur Classification (ijeei-iaes)
Blur type identification is one of the most crucial steps of image restoration. In blind restoration, it is generally assumed that the blur type is known prior to restoration, which is not practical in real applications. Blur type identification is therefore highly desirable before applying a blind restoration technique to a blurred image. An approach to categorize blur into three classes, namely motion, defocus, and combined blur, is presented in this paper. Curvelet-transform-based energy features are used as features of the blur patterns, and a neural network is designed for classification. The simulation results show the preciseness of the proposed approach.
Silhouette analysis based action recognition via exploiting human poses (AVVENIRE TECHNOLOGIES)
We propose a novel scheme for human action recognition that combines the advantages of both local and global representations.
We explore human silhouettes for human action representation by taking into account the correlation between sequential poses in an action.
Symbolic representation and recognition of gait an approach based on lbp of ... (sipij)
Gait is a biometric technique used to identify an individual from a distance by his or her walking style. Gait can be recognized by studying the static and dynamic variations of the body contour during walking. In this paper, an interval-value-based representation and recognition of gait using local binary patterns (LBP) of split gait energy images is proposed. The gait energy image (GEI) of a subject is split into four equal regions, the LBP technique is applied to each region to extract features, and the extracted features are organized. The proposed representation captures variations in gait due to change of clothing, carrying a bag, and different instances of normal walking conditions more effectively. Experiments are conducted on a standard and considerably large database (CASIA database B) and the newly created University of Mysore (UOM) gait dataset to study the efficacy of the proposed gait recognition system. Being robust to such variations, the proposed system shows significant improvement in recognition rate.
CHARACTERIZING HUMAN BEHAVIOURS USING STATISTICAL MOTION DESCRIPTOR (sipij)
Identifying human behaviors is a challenging research problem due to the complexity and variation of appearances and postures, camera settings, and view angles. In this paper, we address the problem of human behavior identification by introducing a novel motion descriptor based on statistical features. The method first divides the video into N temporal segments. For each segment, we compute dense optical flow, which provides instantaneous velocity information for all pixels. We then compute a Histogram of Optical Flow (HOOF) weighted by the norm and quantized into 32 bins, and derive statistical features from the obtained HOOF, forming a descriptor vector of 192 dimensions. Finally, we train a non-linear multi-class SVM that classifies different human behaviors with an accuracy of 72.1%. We evaluate our method on a publicly available human action dataset. Experimental results show that our proposed method outperforms state-of-the-art methods.
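The magnitude-weighted, 32-bin HOOF computation described above can be sketched directly, assuming the flow is given as per-pixel x and y components (the paper's binning convention may differ in origin and direction):

```python
import numpy as np

def hoof(flow_x, flow_y, bins=32):
    """Histogram of optical-flow orientations, weighted by the flow
    magnitude (norm), quantized into `bins` bins, and L1-normalized."""
    mag = np.hypot(flow_x, flow_y)
    ang = np.arctan2(flow_y, flow_x)                   # in [-pi, pi]
    idx = ((ang + np.pi) / (2 * np.pi) * bins).astype(int) % bins
    hist = np.bincount(idx.ravel(), weights=mag.ravel(), minlength=bins)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

Statistics (e.g. mean, variance, higher moments) of such histograms across the N segments would then be stacked into the final descriptor vector.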
Shot Boundary Detection using Radon Projection Method (IDES Editor)
The detection of shot boundaries provides a base for nearly all video abstraction and high-level video segmentation approaches. Solving the shot-boundary detection problem is therefore one of the major prerequisites for revealing higher-level video content structure. As a crucial step in video indexing and retrieval, accurate shot boundary detection plays an important role in organizing and summarizing video content into meaningful shots for further scene analysis. Many algorithms have been proposed for detecting video shot boundaries and classifying shots and shot-transition types. In this paper, we propose a novel technique for shot boundary detection using the Radon transform. We first remove the effect of illumination using the DCT and DWT. The shot boundary is then detected using the Radon transform, which is based on projecting image intensity along a radial line oriented at a specific angle. Where the projection of the current frame differs from the projection of the previous frame, a shot boundary is detected. To verify the performance of the algorithm, experiments have been carried out with news, documentary and movie sequences. Experimental results demonstrate the efficiency of the proposed shot-boundary detection technique.
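The projection comparison can be illustrated with the two axis-aligned Radon projections (0 and 90 degrees), i.e. column and row sums; a real system would use more angles and the illumination-removal step, and the relative threshold here is an assumed placeholder:

```python
import numpy as np

def projection_signature(frame):
    """Radon-style projections at 0 and 90 degrees: column sums and
    row sums of image intensity, concatenated into one signature."""
    return np.concatenate([frame.sum(axis=0), frame.sum(axis=1)]).astype(float)

def is_shot_boundary(prev_frame, frame, rel_thresh=0.3):
    """Declare a boundary when the projections change by more than
    rel_thresh relative to the previous frame's projection mass."""
    a = projection_signature(prev_frame)
    b = projection_signature(frame)
    change = np.abs(a - b).sum() / (np.abs(a).sum() + 1e-9)
    return change > rel_thresh
```

Because each projection integrates a whole row or column, small object motion perturbs the signature only slightly, while a cut replaces it wholesale.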
Uniform and non uniform single image deblurring based on sparse representatio...ijma
Ā
Considering the sparseness property of images, a sparse representation based iterative deblurring method
is presented for single image deblurring under uniform and non-uniform motion blur. The approach taken
is based on sparse and redundant representations over adaptively training dictionaries from single
blurred-noisy image itself. Further, the K-SVD algorithm is used to obtain a dictionary that describes the
image contents effectively. Comprehensive experimental evaluation demonstrate that the proposed
framework integrating the sparseness property of images, adaptive dictionary training and iterative
deblurring scheme together significantly improves the deblurring performance and is comparable with the
state-of-the art deblurring algorithms and seeks a powerful solution to an ill-conditioned inverse problem.
Performance analysis on color image mosaicing techniques on FPGAIJECEIAES
Ā
Today, the surveillance systems and other monitoring systems are considering the capturing of image sequences in a single frame. The captured images can be combined to get the mosaiced image or combined image sequence. But the captured image may have quality issues like brightness issue, alignment issue (correlation issue), resolution issue, manual image registration issue etc. The existing technique like cross correlation can offer better image mosaicing but faces brightness issue in mosaicing. Thus, this paper introduces two different methods for mosaicing i.e., (a) Sliding Window Module (SWM) based Color Image Mosaicing (CIM) and (b) Discrete Cosine Transform (DCT) based CIM on Field Programmable Gate Array (FPGA). The SWM based CIM adopted for corner detection of two images and perform the automatic image registration while DCT based CIM aligns both the local as well as global alignment of images by using phase correlation approach. Finally, these two methods performances are analyzed by comparing with parameters like PSNR, MSE, device utilization and execution time. From the analysis it is concluded that the DCT based CIM can offers significant results than SWM based CIM.
International Journal of Engineering Research and Development (IJERD Editor)

Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture Engineering,
Aerospace Engineering.
VIDEO SEGMENTATION FOR MOVING OBJECT DETECTION USING LOCAL CHANGE & ENTROPY B... (csandit)

Motion detection and object segmentation are important research areas in image and video processing and computer vision. The techniques and mathematical models used to detect and segment region-of-interest (ROI) objects form the algorithmic modules of various high-level techniques in video analysis, object extraction, classification, and recognition. The detection of moving objects is significant in many tasks, such as video surveillance and moving object tracking. The design of a video surveillance system is directed at automatic identification of events of interest, especially at tracking and classification of moving objects. An entropy-based real-time adaptive non-parametric window thresholding algorithm for change detection is proposed in this research. Based on an approximation of the scatter of regions of change in a difference image, a threshold for every image block is calculated discriminatively using an entropy measure, and the global threshold is then obtained by averaging the thresholds of all image blocks in the frame. The block threshold is calculated differently for regions of change and for background. Experimental results show that the proposed thresholding algorithm performs well for change detection with high efficiency.
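The block-wise entropy thresholding described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: Kapur's maximum-entropy criterion is used as the per-block entropy measure, and the block size is a free parameter.

```python
import numpy as np

def kapur_threshold(block, bins=64):
    """Entropy-based (Kapur) threshold for one block of a difference image."""
    hist, edges = np.histogram(block, bins=bins, range=(0, 256))
    p = hist.astype(float) / max(hist.sum(), 1)
    cum = np.cumsum(p)
    best_t, best_h = edges[1], -np.inf
    for t in range(1, bins):
        p0, p1 = cum[t - 1], 1.0 - cum[t - 1]
        if p0 <= 0 or p1 <= 0:
            continue
        w0 = p[:t] / p0                      # class distributions below/above t
        w1 = p[t:] / p1
        h0 = -np.sum(w0[w0 > 0] * np.log(w0[w0 > 0]))
        h1 = -np.sum(w1[w1 > 0] * np.log(w1[w1 > 0]))
        if h0 + h1 > best_h:                 # maximize total class entropy
            best_h, best_t = h0 + h1, edges[t]
    return best_t

def global_threshold(diff, block=16):
    """Average the per-block entropy thresholds over the whole frame."""
    h, w = diff.shape
    ts = [kapur_threshold(diff[i:i + block, j:j + block])
          for i in range(0, h, block) for j in range(0, w, block)]
    return float(np.mean(ts))
```

Applying `global_threshold` to an absolute frame-difference image yields a single change/no-change cut-off for the frame, as the abstract describes.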
ADAPTIVE, SCALABLE, TRANSFORM-DOMAIN GLOBAL MOTION ESTIMATION FOR VIDEO STABIL... (cscpconf)

Video stabilization, which is important for better analysis and user experience, is typically done through Global Motion Estimation (GME) and compensation. GME can be done in the image domain using many techniques, or in the transform domain using the well-known phase-correlation methods, which relate motion to phase shift in the spectrum. While image-domain methods are generally slower (due to dense vector field computations), they can do global as well as local motion estimation. Transform-domain methods cannot normally estimate local motion, but are faster and more accurate on homogeneous images, and are resilient to even rapid illumination changes and large motion. However, both approaches can become very time consuming if one needs more accuracy and smoothness, because of the nature of the tradeoff. We show here that wavelet transforms can be used in a novel way to achieve very smooth stabilization along with a significant speedup in the Fourier-domain computation without sacrificing accuracy. We do this by adaptively selecting and combining motion computed on a specific pair of sub-bands using the wavelet interpolation capability. Our approach yields a smooth, scalable, fast and adaptive algorithm (based on time requirements and recent motion history) that achieves significantly better accuracy than a single-level wavelet decomposition based approach.
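Phase correlation, the transform-domain core of such GME methods, recovers a global translation from the normalized cross-power spectrum. Below is a minimal single-level sketch (integer shifts only; the adaptive wavelet sub-band selection of the paper is not reproduced):

```python
import numpy as np

def phase_correlation(f1, f2):
    """Estimate the integer translation taking frame f1 to frame f2.

    A spatial shift becomes a phase ramp in the spectrum; the inverse FFT
    of the normalized cross-power spectrum peaks at the displacement.
    """
    F1, F2 = np.fft.fft2(f1), np.fft.fft2(f2)
    R = F2 * np.conj(F1)
    R /= np.abs(R) + 1e-12          # keep only the phase
    corr = np.fft.ifft2(R).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    if dy > h // 2:                 # wrap to signed displacements
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)
```

Because only the phase is kept, the estimate is largely insensitive to global illumination changes, which is the resilience property the abstract mentions.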
Feature Tracking of Objects in Underwater Video Sequences (IDES Editor)

Feature tracking is a key underlying component in many approaches to 3D reconstruction, detection, localization and recognition of underwater objects. Over the past few years, underwater vision has attracted researchers to investigate suitable feature tracking techniques for underwater applications. Many feature tracking techniques, such as KLT, SIFT and SURF, have been developed to track features in video sequences for general applications, but the literature survey reveals that there is no standard feature tracker suitable for the underwater environment. In this paper, we propose to adapt the SIFT technique for tracking features of objects in underwater video sequences. SIFT extracts features that are invariant to scale, rotation and affine transformations. We have compared and evaluated SIFT against popular techniques such as KLT and SURF on captured video sequences of underwater objects. The experimental results show that the adapted SIFT works well for underwater video sequences.
Complex Background Subtraction Using Kalman Filter (IJERA Editor)

This work addresses background subtraction from a dynamic background. At each location of the scene, the system extracts a sequence of regular video bricks, i.e., video volumes spanning both the spatial and temporal domains. Background modeling is thus posed as pursuing subspaces within the video bricks while adapting to scene variations. For each sequence of video bricks, the subspace is pursued by employing an autoregressive moving average model that jointly characterizes the appearance consistency and temporal coherence of the observations. During online processing, a Kalman filter tracking algorithm is used for background/foreground classification, and the subspaces are incrementally updated to cope with disturbances from foreground objects and scene changes.
Leader follower formation control of ground vehicles using camshift based gui... (ijma)

Autonomous ground vehicles have been designed for formation control that relies on range and bearing information received from a forward-looking camera. A visual guidance control algorithm is designed in which real-time image processing provides the feedback signals. The vision subsystem and control subsystem work in parallel to accomplish formation control. Proportional navigation and line-of-sight guidance laws are used to estimate the range and bearing of the leader vehicle from the vision subsystem. The algorithms for vision detection and localization used here are similar to approaches for many computer vision tasks, such as face tracking and detection, that are based on color and texture features, and a non-parametric Continuously Adaptive Mean-shift (CAMShift) algorithm is used to keep track of the leader. This is proposed for the first time in the leader-follower framework. The algorithms are simple yet effective in real time, and provide an alternative to traditional approaches such as the Viola-Jones algorithm. Further, to stabilize the follower onto the leader trajectory, a sliding mode controller is used to dynamically track the leader. The performance is demonstrated in simulation and in practical experiments.
Tracking Chessboard Corners Using Projective Transformation for Augmented Rea... (CSCJournals)

Augmented reality has been a topic of intense research for several years for many applications. It consists of inserting a virtual object into a real scene. The virtual object must be accurately positioned in a desired place. Some measurements (calibration) are thus required, and a set of correspondences between points on the calibration target and the camera images must be found. In this paper, we present a tracking technique based on both detection of chessboard corners and a least squares method; the objective is to estimate the perspective transformation matrix for the current view of the camera. This technique does not require any information about, or computation of, the camera parameters; it can be used in real time without any initialization, and the user can change the camera focal length without any fear of losing alignment between the real and virtual objects.
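The least-squares estimation of a perspective transformation (homography) from point correspondences can be sketched with the standard Direct Linear Transform. This is a generic reconstruction of the idea, not the authors' exact tracker:

```python
import numpy as np

def estimate_homography(src, dst):
    """Least-squares estimate of the 3x3 perspective matrix H mapping
    src points to dst points, via the standard DLT system A h = 0.

    Each correspondence (x, y) -> (u, v) contributes two linear rows;
    with >= 4 correspondences, h is the right singular vector of A
    with the smallest singular value.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]          # fix the projective scale
```

With chessboard corners as `src` (known board layout) and their detected image positions as `dst`, the recovered H places the virtual object without any explicit camera parameters, matching the abstract's claim.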
NUMBER PLATE IMAGE DETECTION FOR FAST MOTION VEHICLES USING BLUR KERNEL ESTIM... (paperpublications3)

Abstract: Many recent advancements have been introduced in both hardware and software technologies; among these, image processing is of particular interest. As the unique identification of a vehicle, the number plate is a key clue for discovering stolen and over-speeding vehicles. The images captured from the camera are usually of low resolution and suffer severe loss of edge information, which poses a great challenge to existing blind deblurring methods. The blur kernel can be modeled as a linear uniform convolution with an angle and length to be estimated. In this paper, sparse representation is used to identify the blur kernel, and the length of the motion kernel is then estimated with the Radon transform in the Fourier domain. We evaluate our approach on real-world images and compare it with several popular blind image deblurring algorithms. The results obtained demonstrate the supremacy of our proposed approach in terms of efficacy.
Keywords: Image Segmentation, Angle Estimation, Length Estimation, Deconvolution Algorithm, Artificial Neural Network.
Title: NUMBER PLATE IMAGE DETECTION FOR FAST MOTION VEHICLES USING BLUR KERNEL ESTIMATION AND ANN
Author: S.KALAIVANI, K.PRAVEENA, T.PREETHI, N.PUNITHA
ISSN 2350-1022
International Journal of Recent Research in Mathematics Computer Science and Information Technology
Paper Publications
A Pattern Classification Based Approach for Blur Classification (ijeei-iaes)

Blur type identification is one of the most crucial steps of image restoration. In blind restoration, it is generally assumed that the blur type is known prior to restoration; however, this is not practical in real applications. Blur type identification is therefore highly desirable before a blind restoration technique is applied to restore a blurred image. An approach to categorize blur into three classes, namely motion, defocus, and combined blur, is presented in this paper. Curvelet transform based energy features are utilized as features of the blur patterns, and a neural network is designed for classification. The simulation results show the preciseness of the proposed approach.
Silhouette analysis based action recognition via exploiting human poses (AVVENIRE TECHNOLOGIES)

We propose a novel scheme for human action recognition that combines the advantages of both local and global representations.
We explore human silhouettes for human action representation by taking into account the correlation between sequential poses in an action.
Symbolic representation and recognition of gait: an approach based on LBP of ... (sipij)

Gait is one of the biometric techniques used to identify an individual from a distance by his or her walking style. Gait can be recognized by studying the static and dynamic variations of an individual's body contour during walking. In this paper, an interval-value-based representation and recognition of gait using the local binary pattern (LBP) of split gait energy images is proposed. The gait energy image (GEI) of a subject is split into four equal regions, and the LBP technique is applied to each region to extract features, which are then organized into the representation. The proposed representation technique is capable of effectively capturing variations in gait due to change of clothing, carrying a bag, and different instances of normal walking conditions. Experiments are conducted on a standard and considerably large database (CASIA database B) and the newly created University of Mysore (UOM) gait dataset to study the efficacy of the proposed gait recognition system. The proposed system, being robust to such variations, has shown significant improvement in recognition rate.
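The gait energy image itself is just the pixel-wise average of aligned binary silhouettes over a gait cycle; splitting it into four regions can then be sketched as below. The split direction (equal horizontal bands) is our assumption, since the abstract only says "four equal regions":

```python
import numpy as np

def gait_energy_image(silhouettes):
    """GEI: pixel-wise mean of aligned binary silhouettes over one gait cycle."""
    return np.stack(silhouettes, axis=0).mean(axis=0)

def split_regions(gei, n=4):
    """Split the GEI into n equal bands; in the paper's pipeline, each band
    is described by its own LBP features."""
    h = gei.shape[0] // n
    return [gei[i * h:(i + 1) * h] for i in range(n)]
```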
CHARACTERIZING HUMAN BEHAVIOURS USING STATISTICAL MOTION DESCRIPTOR (sipij)

Identifying human behaviors is a challenging research problem due to the complexity and variation of appearances and postures, and the variation of camera settings and view angles. In this paper, we address the problem of human behavior identification by introducing a novel motion descriptor based on statistical features. The method first divides the video into N temporal segments. For each segment, we compute dense optical flow, which provides instantaneous velocity information for all pixels. We then compute a Histogram of Optical Flow (HOOF) weighted by the norm and quantized into 32 bins, and derive statistical features from the obtained HOOF, forming a descriptor vector of 192 dimensions. Finally, we train a non-linear multi-class SVM that classifies different human behaviors with an accuracy of 72.1%. We evaluate our method on a publicly available human action dataset. Experimental results show that our proposed method outperforms state-of-the-art methods.
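The per-segment descriptor computation can be sketched as follows. The 32-bin, magnitude-weighted orientation histogram follows the text; the particular six statistics used here to reach 192 = 6 x 32 dimensions are our assumption, since the abstract does not enumerate them:

```python
import numpy as np

def hoof(flow_x, flow_y, bins=32):
    """Magnitude-weighted histogram of optical-flow orientations (HOOF)."""
    mag = np.hypot(flow_x, flow_y)
    ang = np.arctan2(flow_y, flow_x)              # orientation in [-pi, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi), weights=mag)
    s = hist.sum()
    return hist / s if s > 0 else hist

def stats_descriptor(hoofs):
    """Stack six simple statistics over the per-segment HOOFs; with 32 bins
    this yields the 192-dimensional vector mentioned in the text."""
    H = np.vstack(hoofs)
    feats = [H.mean(0), H.std(0), H.min(0), H.max(0),
             np.median(H, 0), H.max(0) - H.min(0)]
    return np.concatenate(feats)
```

The resulting 192-dimensional vectors would then be fed to the multi-class SVM described in the abstract.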
Shot Boundary Detection using Radon Projection Method (IDES Editor)

The detection of shot boundaries provides a base for nearly all video abstraction and high-level video segmentation approaches. Therefore, solving the problem of shot boundary detection is one of the major prerequisites for revealing higher-level video content structure. As a crucial step in video indexing and retrieval, accurate shot boundary detection plays an important role in organizing and summarizing video content into meaningful shots for further scene analysis. Many algorithms have been proposed for detecting video shot boundaries and classifying shots and shot transition types. In this paper, we propose a novel technique for shot boundary detection using the Radon transform. We first remove the effect of illumination using the DCT and DWT. The shot boundary is then detected using the Radon transform, which is based on the projection of image intensity along a radial line oriented at a specific angle. A shot boundary is detected where the projection of image intensity for the current frame differs from that of the previous frame. To verify the performance of the algorithm, experiments have been carried out on news, documentary and movie footage. Experimental results demonstrate the efficiency of the proposed shot boundary detection technique.
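A stripped-down version of projection-based boundary detection is sketched below, using only the 0 and 90 degree Radon projections (column and row sums) and a relative L1 distance with an assumed threshold; the paper's illumination removal and full radial projections are omitted:

```python
import numpy as np

def projections(frame):
    """Row and column intensity projections (Radon transform at 0 and 90 deg)."""
    return np.concatenate([frame.sum(axis=0), frame.sum(axis=1)])

def shot_boundaries(frames, thresh=0.3):
    """Flag frame i as a boundary when its projection profile differs from
    frame i-1 by more than `thresh` (relative L1 distance)."""
    cuts = []
    prev = projections(frames[0])
    for i, f in enumerate(frames[1:], start=1):
        cur = projections(f)
        d = np.abs(cur - prev).sum() / (np.abs(prev).sum() + 1e-9)
        if d > thresh:
            cuts.append(i)
        prev = cur
    return cuts
```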
Exploration of Normalized Cross Correlation to Track the Object through Vario... (iosrjce)

Object tracking is a process devoted to locating the pathway of a moving object in a succession of frames. Object tracking has emerged as a challenging facet in the fields of robot navigation, military applications, traffic monitoring, video surveillance, etc. In the first phase of contributions, tracking of the object is exercised by matching a template against the exhaustive image through Normalized Cross Correlation (NCCR). In order to update the template, moving objects are detected using a frame-difference technique at regular frame intervals. Subsequently, the NCCR, Principal Component Analysis (PCA) or Histogram Regression Line (HRL) of the template and the moving objects is estimated to find the best match for updating the template. The second phase discusses tracking between the template and a partitioned image through NCCR with a reduced computational burden; the updating schemes remain the same. An exploration over varied benchmark datasets has been carried out, together with a comparative analysis of the proposed systems under the different updating schemes (NCCR, PCA and HRL). The offered systems considerably reveal the capability to track an object indisputably under diverse illumination conditions.
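Normalized cross-correlation template matching, the core of the first phase, can be sketched as an exhaustive search. This is the textbook NCC; the paper's NCCR variant, partitioned-image search and template-updating schemes are not reproduced:

```python
import numpy as np

def ncc(patch, template):
    """Normalized cross-correlation score between two equally sized patches.
    Mean subtraction makes the score robust to global brightness shifts."""
    a = patch - patch.mean()
    b = template - template.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def track(image, template):
    """Exhaustive NCC search: return the top-left corner of the best match."""
    th, tw = template.shape
    H, W = image.shape
    best, pos = -2.0, (0, 0)
    for y in range(H - th + 1):
        for x in range(W - tw + 1):
            s = ncc(image[y:y + th, x:x + tw], template)
            if s > best:
                best, pos = s, (y, x)
    return pos, best
```

The mean-and-norm normalization is what gives the method the illumination robustness claimed at the end of the abstract.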
Action Genome: Actions as Composition of Spatio-Temporal Scene Graphs (Sangmin Woo)

Jingwei Ji, Ranjay Krishna, Li Fei-Fei, and Juan Carlos Niebles. Action genome: Actions as composition of spatio-temporal scene graphs. arXiv preprint arXiv:1912.06992, 2019.
Integrated Hidden Markov Model and Kalman Filter for Online Object Tracking (ijsrd.com)

A visual prior learned from generic real-world images can represent the objects in a scene. Existing work presented an online tracking algorithm that transfers a visual prior learned offline to online object tracking: a complete dictionary is learned to represent the visual prior from a collection of real-world images. This prior knowledge is generic, and the training image set does not contain any observation of the target object. The learned visual prior is transferred to construct the object representation using sparse coding and multiscale max pooling. A linear classifier is learned online to distinguish the target from the background, and to identify target and background appearance variations over time. Tracking is carried out within a Bayesian inference framework, with the learned classifier used to construct the observation model. A particle filter is used to estimate the tracking result sequentially; however, it is unable to work efficiently in noisy scenes, and time-shift variance is not appropriate for tracking the target object given prior information about its structure. We propose an HMM-based Kalman filter to improve online target tracking in noisy sequential image frames. The covariance vector is measured to identify noisy scenes, and discrete time steps are evaluated for separating the target object from the background. Experiments are conducted on challenging scene sequences, and the performance of the object tracking algorithm is evaluated in terms of tracking success rate, center location error, number of scenes, learned object sizes, and tracking latency.
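The Kalman stage can be illustrated with a standard constant-velocity filter over image coordinates. This is a generic sketch; the paper's HMM coupling and noise settings are not specified, so `q` and `r` here are assumed values:

```python
import numpy as np

class Kalman2D:
    """Constant-velocity Kalman filter over state [x, y, vx, vy],
    observing position only."""

    def __init__(self, q=1e-3, r=1e-1):
        self.x = np.zeros(4)                       # state estimate
        self.P = np.eye(4)                         # state covariance
        self.F = np.eye(4)                         # transition: x += vx, y += vy
        self.F[0, 2] = self.F[1, 3] = 1.0
        self.H = np.zeros((2, 4))                  # observe (x, y) only
        self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = q * np.eye(4)                     # process noise
        self.R = r * np.eye(2)                     # measurement noise

    def step(self, z):
        # predict
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # update with measurement z = (x, y)
        y = np.asarray(z, float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

Feeding the per-frame detected position into `step` yields a smoothed track whose innovation covariance `S` is exactly the quantity one would monitor to flag noisy scenes.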
Real-time Moving Object Detection using SURF (iosrjce)

A Survey On Tracking Moving Objects Using Various Algorithms (IJMTST Journal)

Sparse representation has been applied to the object tracking problem. Mining the self-similarities between particles via multitask learning can improve tracking performance. However, some particles may differ from others when they are sampled from a large region, and forcing all particles to share the same structure may degrade the results. To overcome this problem, we propose a tracking algorithm based on robust multitask sparse representation (RMTT) in this letter. When learning the particle representations, we decompose the sparse coefficient matrix into two parts: joint sparse regularization is imposed on one coefficient matrix, while element-wise sparse regularization is imposed on the other. The former regularization exploits the self-similarities of particles, while the latter accounts for the differences between them.
MOTION PREDICTION USING DEPTH INFORMATION OF HUMAN ARM BASED ON ALEXNET (gerogepatton)

The development of convolutional neural networks (CNNs) has provided a new tool for the classification and prediction of human body motion. This project aims to predict the drop point of a ball thrown by experimenters by classifying their body motion during the throw. A Kinect v2 sensor is used to record depth maps, and the drop points are recorded by a square infrared induction module. First, convolutional neural networks take the data obtained from the depth maps as input and predict the drop point from the experimenters' motion. Second, a large amount of data is used to train networks of different structures, and a network structure that provides sufficiently high accuracy for drop point prediction is established. The network model and parameters are modified to improve the accuracy of the prediction algorithm. Finally, the experimental data is divided into a training group and a test group. The prediction results on the test group show that the prediction algorithm effectively improves the accuracy of human motion perception.
ROBUST STATISTICAL APPROACH FOR EXTRACTION OF MOVING HUMAN SILHOUETTES FROM V... (ijitjournal)

Human pose estimation is one of the key problems in computer vision that has been studied in recent years. Its significance lies in higher-level tasks of understanding human actions, such as recognition of anomalous actions present in videos and many other related applications. Human poses can be estimated by extracting silhouettes of humans, as silhouettes are robust to variations and give the shape information of the human body. Common challenges include illumination changes, variation in environments, and variation in human appearances; thus there is a need for a robust method for human pose estimation. This paper presents a study and analysis of existing approaches for silhouette extraction and proposes a robust technique for extracting human silhouettes in video sequences. A statistical approach, the Gaussian Mixture Model (GMM), is combined with the HSV (Hue, Saturation and Value) color space to build a robust background model that is used for background subtraction to produce foreground blobs, called human silhouettes. Morphological operations are then performed on the foreground blobs. The silhouettes obtained from this work can be used in further tasks associated with human action interpretation and activity analysis, such as human action classification, human pose estimation and action recognition.
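A single-Gaussian-per-pixel running model conveys the idea behind the GMM background subtraction stage. This is a deliberate simplification to one mode; the full mixture, the HSV conversion and the morphological post-processing are omitted, and the update rate `alpha` and threshold `k` are assumed values:

```python
import numpy as np

def update_background(mean, var, frame, alpha=0.05, k=2.5):
    """Classify foreground pixels against a per-pixel Gaussian background
    model, then update the model online (blind update for simplicity)."""
    d = frame - mean
    fg = d * d > k * k * var                 # pixel deviates > k sigma
    mean = mean + alpha * d                  # running mean
    var = np.maximum((1 - alpha) * var + alpha * d * d, 1e-4)
    return mean, var, fg
```

Calling this per frame yields the foreground blobs (silhouettes) that the paper then cleans up with morphological operations.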
HUMAN ACTION RECOGNITION IN VIDEOS USING STABLE FEATURES

Signal & Image Processing : An International Journal (SIPIJ) Vol.8, No.6, December 2017
DOI : 10.5121/sipij.2017.8601

Mohib Ullah1, Habib Ullah2 and Ibrahim M. Alseadoon2
1 Department of Computer Science, Norwegian University of Science and Technology, Gjøvik, Norway
2 Department of Computer Science and Software Engineering, University of Ha'il, Ha'il, Saudi Arabia
ABSTRACT
Human action recognition is still a challenging problem, and researchers are investigating it using different techniques. We propose a robust approach for human action recognition, achieved by extracting stable spatio-temporal features in terms of the pairwise local binary pattern (P-LBP) and the scale invariant feature transform (SIFT). These features are used to train an MLP neural network during the training stage, and the action classes are inferred from the test videos during the testing stage. The proposed features closely match the motion of individuals and their consistency, and achieve higher accuracy on a challenging dataset. The experimental evaluation is conducted on a benchmark dataset commonly used for human action recognition. In addition, we show that our approach outperforms the individual features, i.e., using only the spatial or only the temporal feature.
KEYWORDS
Neural Networks, Local Binary Pattern, Action Recognition, Scale Invariant Feature Transform
1. INTRODUCTION
Human action recognition is a significant component in many applications including, but not limited to, video surveillance, ambient intelligence, human-computer interaction systems, and health-care. In spite of incredible research efforts and many encouraging advances in the last ten years, accurate recognition of human actions is still a challenging problem.
To improve human action recognition performance, many techniques have been proposed. They rely on: Bag-of-Words [1] representations extracted from spatio-temporal interest points [2], the dynamic time warping algorithm [3] derived from exemplar-based approaches, and eigenjoints [4] stemming from skeleton-based approaches. However, designing effective features for human action recognition is difficult due to the large intra-class variation arising from pose, appearance and temporal variations. Therefore, it is crucial to design discriminative features. It is worth noticing that a combination of features can boost the discriminative power of a model: considering a combination of two features can provide much more information than observing the occurrence of the two features individually. We propose a robust approach that combines spatial and temporal features into a unified model for human action recognition. For this purpose, we consider the pairwise local binary pattern (P-LBP) [40] and the scale invariant feature transform (SIFT) [41]. The P-LBP
and SIFT serve as spatial and temporal features, respectively. We then train an MLP neural network
on the spatio-temporal features (P-LBP and SIFT) during the training stage. The action classes
are inferred from the test videos during the testing stage. The overall process of our proposed
approach is presented in Fig. 1.
The rest of the paper is organized as follows: Section 2 presents the related work; Section 3
elaborates our proposed method of feature extraction; Section 4 presents the experimental
evaluation; and Section 5 concludes the paper.
Figure 1. Flow diagram. The training videos are provided to train the neural network and testing videos are
used to recognize action classes during the testing stage.
2. RELATED WORK
Herath et al. [5] and Dhulekar et al. [6] present comprehensive surveys on action recognition
methods. Fernando et al. [7] propose a function-based temporal pooling method that explores the
latent structure of the video sequence. For this purpose, they model how frame-level features
evolve over time in a video. Rahmani et al. [8] present a robust non-linear knowledge transfer
model for human action recognition from different views. The model is based on a deep neural
network that transfers knowledge of human actions from any unseen view to a shared high-level
virtual view by finding a set of non-linear transformations that combines the
views. Idrees et al. [9] introduce the THUMOS challenge to serve as a benchmark for action
recognition. In THUMOS, action recognition is promoted to a more practical level by introducing
temporally untrimmed videos. These also include background videos which share similar scenes
and backgrounds with the action videos, but are devoid of the specific actions. Zhang et al. [10] present
a descriptor called 3D histograms of texture to extract discriminant features from a sequence. The
descriptor compactly characterizes the salient information of a specific action, on which texture
features are calculated to represent the action. Yang et al. [11] propose a discriminative multi-
instance multitask learning framework for human action recognition. For this purpose, they
discover the intrinsic relationship between joint configurations and action classes. Wang et al.
[12] recognize human actions by extracting feature point matches between frames using SURF
descriptors and dense optical flow. Han et al. [13] extract sparse geometric features based on the
second-generation Bandelet transform. Liu et al. [14] select characteristic frames using a
martingale-based method, followed by the formation of the corresponding motion history through
backtracking along the characteristic frames.
Deep learning approaches [15, 16] are also used for human action recognition. In these
approaches, the characteristics of deep networks can be exploited either by improving the
connectivity of the network in time [17] or by using optical flow. The convolutional independent
subspace analysis method [18] is a deep neural network that uses both visual appearance and
motion information in an unsupervised way, taking video volumes instead of frames as input to the
network. However, human actions last several seconds and exhibit spatio-temporal structure.
Deep learning based methods often learn action representations at the level of only a few video
frames, failing to model actions at their full temporal extent. To cope with this problem,
Varol et al. [19] learn video representations using neural networks with long-term temporal
convolutions. They demonstrate that this type of modeling with increased temporal extent
improves the accuracy of action recognition.
Some researchers prefer depth sensors over color cameras due to their invariance to lighting and
color conditions. These methods are generally based on a bag of 3D points [20], projected depth
maps [21, 22, 23], spatio-temporal depth cuboids [24], occupancy patterns [25], surface normals
[26, 27], and skeleton joints [28, 29, 30, 31]. Methods [32, 33, 34, 35, 36] that consider only depth
maps can suffer from noisy depth maps. To handle this problem, Tran et al. [37] propose a
feature combination scheme. However, it can be costly if several features are used since this
method needs one classifier and one weight coefficient for each feature, separately. Wang et al.
[38] present the actionlet ensemble model (AEM) that exploits both the depth maps and skeleton
joints. The AEM combines the relative 3D position of subgroups of skeleton joints and the local
occupancy pattern descriptor. Additionally, to capture the temporal structure of actions, they use
the short time Fourier transform on fused features to model the final feature vector.
3. PROPOSED METHOD
First, we consider the spatial feature P-LBP. To calculate the pairwise rotation invariant local
binary pattern (P-LBP), each pixel of the video frame is compared against its P symmetric
neighbors on a circle of radius R, and the thresholded result is read as a binary number. The basic
pattern is formulated in Eq. (1) and Eq. (2):

LBP_{P,R}(x) = Σ_{p=0}^{P-1} s_p(x) 2^p    (1)

s_p(x) = 1 if V(N_p(x)) - V(x) ≥ 0, and s_p(x) = 0 otherwise    (2)
where x is a coordinate, N_p(x) is the p-th neighbor of point x, and V(x) is the pixel value at
point x. It is worth noticing that the function s_p(x) is not affected by changes in mean
luminance; therefore, the P-LBP achieves gray-scale invariance. To achieve rotation invariance,
Ojala et al. [39] formulate Eq. (3):

LBP^{ri}_{P,R}(x) = min{ ROR(LBP_{P,R}(x), i) | i = 0, 1, ..., P-1 }    (3)
where ROR(x, i) performs a circular bit-wise right shift on the P-bit number x, i times. Ojala et
al. [39] also observe that patterns exhibiting few spatial transitions capture fundamental
properties of the frame microstructure. For example, the pattern 11110000 describes a local edge,
and the pattern 11111111 describes a flat region or a dark spot. To formally define these
patterns, a uniformity measure is formulated in Eq. (4):

U(x) = Σ_{p=1}^{P} | s_p(x) - s_{p-1}(x) |,  where s_P(x) is defined as s_0(x)    (4)

The uniform patterns are those satisfying the condition U(x) ≤ 2.
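To make the pattern computation concrete, the following is a minimal Python sketch of the basic building blocks described above: the thresholded code of Eqs. (1)-(2), the circular rotation of Eq. (3), and the uniformity measure of Eq. (4). The function names are illustrative; the full P-LBP of Qi et al. [40] additionally encodes co-occurring pattern pairs, which is omitted here.

```python
def lbp_code(neighbors, center):
    """Eqs. (1)-(2): threshold the P circular neighbors against the center value."""
    return sum((1 << p) for p, v in enumerate(neighbors) if v >= center)

def ror(x, i, bits):
    """Circular bit-wise right shift of a `bits`-bit number x by i positions."""
    i %= bits
    return ((x >> i) | (x << (bits - i))) & ((1 << bits) - 1)

def rotation_invariant(code, bits):
    """Eq. (3): minimum over all circular rotations of the pattern."""
    return min(ror(code, i, bits) for i in range(bits))

def uniformity(code, bits):
    """Eq. (4): number of 0/1 transitions in the circular bit pattern."""
    s = [(code >> p) & 1 for p in range(bits)]
    s.append(s[0])  # s_P(x) := s_0(x)
    return sum(abs(s[p] - s[p - 1]) for p in range(1, bits + 1))
```

For instance, the edge pattern 11110000 has uniformity 2 (hence uniform), while the flat pattern 11111111 has uniformity 0.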
We calculate the temporal feature using SIFT. The scale invariant feature transform (SIFT) has
several properties that make it suitable for extracting temporal information. SIFT features are
invariant to scaling and rotation, and partially invariant to changes in illumination and 3D camera
viewpoint. They are well localized in both the spatial and frequency domains, reducing the
chance of disruption by occlusion and other unexpected noise. SIFT features are also highly
distinctive, which allows a single feature to be matched reliably, providing a basis for human
action recognition.
For extracting the SIFT features, the scale space of a video frame I(x, y) is defined as a function
L(x, y, σ) produced by the convolution of the input frame with a variable-scale Gaussian, as
formulated in Eq. (5):

L(x, y, σ) = G(x, y, σ) * I(x, y)    (5)

where G(x, y, σ) is the variable-scale Gaussian formulated in Eq. (6):

G(x, y, σ) = (1 / (2πσ²)) exp(-(x² + y²) / (2σ²))    (6)

In the next stage, the difference of two scales separated by a constant multiplicative factor k is
computed according to Eq. (7):

D(x, y, σ) = (G(x, y, kσ) - G(x, y, σ)) * I(x, y) = L(x, y, kσ) - L(x, y, σ)    (7)
To find the local maxima and minima of D(x, y, σ), each sample point is compared to its eight
neighbors in the frame under observation and its nine neighbors in each of the scales above and
below it. It is selected only if it is larger than all of these neighbors or smaller than all of them.
An important aspect of the SIFT approach is that it produces large numbers of features that
densely cover the video frame over the full range of scales and locations. This quantity of
features is particularly important for human action recognition, where the ability to capture
objects exhibiting different actions is crucial.
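The scale-space construction of Eqs. (5)-(7) and the 26-neighbor extremum test can be sketched as follows. This is a simplified single-octave illustration with our own function names and parameters (no keypoint refinement or descriptor computation, which full SIFT requires).

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian of Eq. (6) restricted to one axis (the 2-D filter is separable)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(img, sigma):
    """L(x, y, sigma) of Eq. (5): separable Gaussian convolution of the frame."""
    k = gaussian_kernel(sigma)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)

def dog_stack(img, sigma=1.6, k=2**0.5, levels=4):
    """Eq. (7): differences of adjacent Gaussian-blurred scales."""
    L = [blur(img, sigma * k**i) for i in range(levels)]
    return np.stack([L[i + 1] - L[i] for i in range(levels - 1)])

def is_extremum(D, s, y, x):
    """True if D[s, y, x] is strictly larger (or smaller) than its 8 in-scale
    neighbours plus the 9 samples in each of the scales above and below."""
    patch = D[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2].ravel()
    v = D[s, y, x]
    neighbours = np.delete(patch, 13)  # drop the centre sample itself
    return v > neighbours.max() or v < neighbours.min()
```

In a full pipeline, every interior sample of the DoG stack would be scanned with `is_extremum` to produce candidate keypoints.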
We extract both spatial and temporal features from the set of training videos. We then adopt an
MLP feed-forward neural network to learn the behavior of these features instead of considering
the values of all the pixels. In fact, these features are exploited to learn the different classes of
human actions. The motivation for using an MLP lies in its ability, through backpropagation, to
resist noise, and in its capacity to generalize. During the training stage, the weights W and biases
b are updated so that the actual output y becomes closer to the desired output d. For this purpose,
a cost function is defined as in Eq. (8):

E(W, b) = (1/2) ||y - d||²    (8)
The cost function calculates the squared error between the desired and actual output vectors. The
backpropagation algorithm requires the gradient of the cost function E(W, b) with respect to the
weights and biases in each iteration in order to minimize the overall cost. Given the learning rate
α, the parameters are updated as in Eq. (9) and Eq. (10):

W ← W - α ∂E(W, b)/∂W    (9)

b ← b - α ∂E(W, b)/∂b    (10)
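As a concrete illustration of Eqs. (8)-(10), the sketch below performs one backpropagation update for a small feed-forward network with sigmoid units. The network shape, learning rate, and function names are illustrative assumptions, not the exact configuration used in the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    h = sigmoid(W1 @ x + b1)   # hidden-layer activations
    y = sigmoid(W2 @ h + b2)   # actual output y
    return h, y

def train_step(x, d, W1, b1, W2, b2, alpha=0.5):
    """One backpropagation update with E = 1/2 ||y - d||^2 (Eqs. (8)-(10))."""
    h, y = forward(x, W1, b1, W2, b2)
    delta2 = (y - d) * y * (1 - y)           # output-layer error signal
    delta1 = (W2.T @ delta2) * h * (1 - h)   # error propagated back through W2
    # gradient-descent updates: parameter <- parameter - alpha * dE/dparameter
    W2 -= alpha * np.outer(delta2, h)
    b2 -= alpha * delta2
    W1 -= alpha * np.outer(delta1, x)
    b1 -= alpha * delta1
    return 0.5 * np.sum((y - d) ** 2)        # cost before the update
```

Iterating `train_step` over the extracted feature vectors drives the cost E(W, b) down, moving the actual output y toward the desired output d.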
4. EXPERIMENTAL EVALUATION
We extensively evaluated our approach on the benchmark KTH dataset [2]. A few frames from the
dataset are depicted in Fig. 2. Although the sequences are recorded in front of largely uniform
backgrounds, the KTH dataset is diverse in subjects and recording conditions. It consists of six
human action classes: walking, jogging, running, boxing, hand waving, and hand clapping. Each
action is performed several times by 25 participants. Four different scenarios were considered for
recording these sequences: outdoors, outdoors with scale variation, outdoors with different
clothes, and indoors. In total, the dataset consists of 2391 video samples, and each sequence
averages about 4 seconds in length.
Figure 2. KTH dataset. There are six human action classes in this dataset: walking, jogging,
running, boxing, hand waving, and hand clapping.
The neural network is configured with one input layer, two hidden layers, and one output layer.
The input layer consists of three neurons, each hidden layer consists of three neurons, and a
single neuron is used in the output layer. Adjusting the number of layers and the number of
neurons does not affect the performance significantly. We use this MLP neural network to learn
six action classes based on the extracted spatio-temporal features.
The experimental results are presented in Table 1 in terms of a confusion matrix. It can be seen
that our proposed method shows very good performance on all action classes of the KTH dataset.
To further demonstrate the effectiveness of our proposed method, we carried out experiments
considering only spatial and only temporal features. These results are presented in Table 2 and
Table 3. It can be seen that the performance significantly declines when considering either spatial
or temporal features alone, which shows the robustness of our combined method.
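The confusion-matrix evaluation behind Tables 1-3 can be reproduced with a few lines of code. This is a generic sketch with our own helper names; the labels in the test are KTH class names, but the counts are purely illustrative.

```python
def confusion_matrix(true_labels, pred_labels, classes):
    """Rows index the true class, columns the predicted class."""
    idx = {c: i for i, c in enumerate(classes)}
    n = len(classes)
    M = [[0] * n for _ in range(n)]
    for t, p in zip(true_labels, pred_labels):
        M[idx[t]][idx[p]] += 1
    return M

def per_class_accuracy(M):
    """Diagonal entry divided by the row total, for each class."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(M)]
```

A perfect classifier yields a purely diagonal matrix; off-diagonal mass shows which action classes are confused with one another.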
Table 1. Our spatio-temporal model. Our proposed spatio-temporal model shows very good performance
considering KTH dataset.
Table 2. Only spatial features. It can be seen that the performance significantly declines using only spatial
features P-LBP.
Table 3. Only temporal features. The performance also declines significantly in this case,
considering only the temporal SIFT features.
5. CONCLUSIONS
In this paper, we proposed an approach for human action recognition using spatio-temporal
features and an MLP feed-forward neural network. We demonstrated the capability of our
approach in capturing the dynamics of different action classes by extracting these features. An
MLP neural network is trained on these features to learn six action classes. The main advantages
of the proposed method are its simplicity and robustness.
REFERENCES
[1] Niebles, J. C., Wang, H., & Fei-Fei, L. Unsupervised learning of human action categories using
spatial-temporal words. International journal of computer vision, 79(3), 299-318, 2008.
[2] Laptev, Ivan. "On space-time interest points." International journal of computer vision 64.2-3: 107-
123, 2005.
[3] Rabiner, Lawrence R., and Biing-Hwang Juang. "Fundamentals of speech recognition." 1993.
[4] Yang, Xiaodong, and YingLi Tian. "Effective 3d action recognition using eigenjoints." Journal of
Visual Communication and Image Representation 25.1: 2-11, 2014
[5] Herath, Samitha, Mehrtash Harandi, and Fatih Porikli. "Going deeper into action recognition: A
survey." Image and Vision Computing 60: 4-21, 2017.
[6] Dhulekar, Pravin, et al. "Human Action Recognition: An Overview." Proceedings of the International
Conference on Data Engineering and Communication Technology. Springer Singapore, 2017.
[7] Fernando, Basura, et al. "Rank pooling for action recognition." IEEE transactions on pattern analysis
and machine intelligence 39.4: 773-787, 2017.
[8] Rahmani, Hossein, Ajmal Mian, and Mubarak Shah. "Learning a deep model for human action
recognition from novel viewpoints." IEEE Transactions on Pattern Analysis and Machine
Intelligence, 2017.
[9] Idrees, Haroon, et al. "The THUMOS challenge on action recognition for videos 'in the wild'."
Computer Vision and Image Understanding 155: 1-23, 2017.
[10] Zhang, Baochang, et al. "Action recognition using 3D histograms of texture and a multi-class
boosting classifier." IEEE Transactions on Image Processing 26.10: 4648-4660, 2017.
[11] Yang, Yanhua, et al. "Discriminative Multi-instance Multitask Learning for 3D Action Recognition."
IEEE Transactions on Multimedia 19.3: 519-529, 2017.
[12] Wang, H., Oneata, D., Verbeek, J., & Schmid, C. A robust and efficient video representation for
action recognition. International Journal of Computer Vision, 119(3), 219-238, 2016.
[13] Han, H., and X. J. Li. "Human action recognition with sparse geometric features." The Imaging
Science Journal 63, no. 1: 45-53, 2015.
[14] Liu, Xueping, Yibo Li, and Qing Shen. "Real-time action detection and temporal segmentation in
continuous video." The Imaging Science Journal 65, no. 7: 418-427, 2017.
[15] Minhas, Rashid, Aryaz Baradarani, Sepideh Seifzadeh, and QM Jonathan Wu. "Human action
recognition using extreme learning machine based on visual vocabularies." Neurocomputing 73, no.
10: 1906-1917, 2010
[16] Wang, Tingwei, Chuancai Liu, and Liantao Wang. "Action recognition by Latent Duration Model."
Neurocomputing 273: 111-119, 2018.
[17] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, "Large-scale video
classification with convolutional neural networks," in CVPR, 2014.
[18] Z. Lan, D. Yao, M. Lin, S.-I. Yu, and A. Hauptmann, "The best of both worlds: Combining data-
independent and data-driven approaches for action recognition," arXiv preprint arXiv:1505.04427,
2015.
[19] Varol, Gul, Ivan Laptev, and Cordelia Schmid. "Long-term temporal convolutions for action
recognition." IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
[20] Wanqing L, Zhang Z, Liu Z. Action recognition based on a bag of 3D points. In: IEEE Computer
Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2010.
[21] Chen C, Jafari R, Kehtarnavaz N. Action recognition from depth sequences using depth motion maps
based local binary patterns. In: 2015 IEEE winter conference on Applications of Computer Vision
WACV, 2015.
[22] Chen C, Liu K, Kehtarnavaz N. Real-time human action recognition based on depth motion maps.
Journal of real-time image processing, p 1ā9, 2013.
[23] Xiaodong Y, Zhang C, Tian YL. Recognizing actions using depth motion maps-based histograms of
oriented gradients. Proceedings of the 20th ACM international conference on multimedia, 2012.
[24] Lu X, Aggarwal JK. Spatio-temporal depth cuboid similarity feature for activity recognition using
depth camera. In: IEEE conference on Computer Vision and Pattern Recognition CVPR, 2013.
[25] Wang J, Liu Z, Chorowski J, Chen Z, Wu, Y. Robust 3d action recognition with random occupancy
patterns. In: ECCV, p 872ā885, 2012.
[26] Omar O, Liu Z (2013) Hon4d: Histogram of oriented 4d normals for activity recognition from depth
sequences. In: IEEE conference on Computer Vision and Pattern Recognition CVPR, 2013.
[27] Xiaodong Y, Tian YL. Super normal vector for activity recognition using depth sequences. In: IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[28] Bin Liang and Lihong Zheng. A Survey on Human Action Recognition Using Depth Sensors. In
International Conference on Digital Image Computing: Techniques and Applications (DICTA), pages
1ā8, 2015.
[29] Georgios E, Singh G, Horaud R. Skeletal quads: Human action recognition using joint quadruples.
In: 22nd International Conference on Pattern Recognition (ICPR), 2014.
[30] Raviteja V, Arrate F, Chellappa R. Human action recognition by representing 3D skeletons as points
in a Lie group. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[31] Xiaodong Y, Tian YL. Eigenjoints-based action recognition using naive-bayes-nearest-neighbor. In:
IEEE computer society conference on Computer Vision and Pattern Recognition Workshops, 2012.
[32] Wanqing Li, Zhengyou Zhang, and Zicheng Liu. Action Recognition Based on a Bag of 3D Points. In
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 9ā14, 2010.
[33] C. Lu, J. Jia, and C. K. Tang. Range-Sample Depth Feature for Action Recognition. In IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), pages 772ā779, 2014.
[34] O. Oreifej and Zicheng Liu. HON4D: Histogram of Oriented 4D Normals for Activity Recognition
from Depth Sequences. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
pages 716ā 723, June 2013.
[35] H. Rahmani, A. Mahmood, D. Huynh, and A. Mian. Histogram of Oriented Principal Components for
Cross-View Action Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence
(TPAMI), PP(99):1ā1, 2016.
[36] X. Yang and Y. Tian. Super Normal Vector for Activity Recognition Using Depth Sequences. In
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 804ā811, 2014.
[37] Q. D. Tran and N. Q. Ly. An effective fusion scheme of spatio-temporal features for human action
recognition in RGB-D video. In International Conference on Control, Automation and Information
Sciences (ICCAIS), pages 246ā251, 2013.
[38] Jiang Wang, Zicheng Liu, Ying Wu, and Junsong Yuan. Mining Actionlet Ensemble for Action
Recognition with Depth Cameras. In IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), pages 1290ā1297, June 2012.
[39] Ojala, T., Pietikainen, M., & Maenpaa, T. Multiresolution gray-scale and rotation invariant texture
classification with local binary patterns. IEEE Transactions on pattern analysis and machine
intelligence, 24(7), 971-987, 2002.
[40] Qi, Xianbiao, Rong Xiao, Chun-Guang Li, Yu Qiao, Jun Guo, and Xiaoou Tang. "Pairwise rotation
invariant co-occurrence local binary pattern." IEEE Transactions on Pattern Analysis and Machine
Intelligence 36, no. 11: 2199-2213, 2014.
[41] Lowe, David G. "Object recognition from local scale-invariant features." In Proceedings of the
Seventh IEEE International Conference on Computer Vision, vol. 2, pp. 1150-1157. IEEE, 1999.
AUTHORS
Mohib Ullah received his Master's degree in Telecommunication Engineering from the
University of Trento, Italy, in 2015. He is currently pursuing a Ph.D. degree in
computer science at the Norwegian University of Science and Technology
(NTNU), Gjøvik, Norway. His research interests include crowd analysis, object
detection and tracking, and human action recognition.
Habib Ullah received his PhD in 2015 from the University of Trento, Italy. He is currently
working as an Assistant Professor in the College of Computer Science and Engineering at
the University of Ha'il, Ha'il, Saudi Arabia. His research interests include action
recognition, crowd motion analysis, and machine learning.
Ibrahim M. Alsaadoen received his PhD from the Queensland University of Technology,
Brisbane, Australia. He is currently working as an Assistant Professor and Dean of the
College of Computer Science and Engineering at the University of Ha'il, Saudi
Arabia. His research interests encompass information security, risk management, and
computer vision.