Graph convolutional networks (GCNs) have proven effective for processing structured data, as they can capture the features of related nodes and improve model performance. Increasing attention has therefore been paid to employing GCNs in skeleton-based action recognition, but existing GCN-based methods face several challenges. First, the consistency of temporal and spatial features is ignored because features are extracted node by node and frame by frame. We design a generic representation of skeleton sequences for action recognition and propose a novel model called Temporal Graph Networks (TGN), which obtains spatiotemporal features simultaneously. Second, the adjacency matrix of the graph describing the relations between joints mostly depends on the physical connections between joints. We propose a multi-scale graph strategy to describe the relations between joints more appropriately, adopting a full-scale graph, a part-scale graph, and a core-scale graph to capture the local features of each joint and the contour features of important joints. Extensive experiments on two large datasets, NTU RGB+D and Kinetics Skeleton, show that TGN with our graph strategy outperforms other state-of-the-art methods.
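The multi-scale idea can be sketched as building several adjacency matrices over the same joints and averaging their propagated features. The toy 5-joint skeleton and its edge lists below are illustrative assumptions, not the paper's actual NTU RGB+D joint layout or its learned weighting:

```python
import numpy as np

# Hypothetical 5-joint skeleton; edges at each scale are made up for illustration.
edges_full = [(0, 1), (1, 2), (1, 3), (1, 4)]   # full-scale: every physical bone
edges_part = [(0, 1), (1, 2)]                    # part-scale: grouped body parts
edges_core = [(0, 1)]                            # core-scale: most important joints

def adjacency(num_joints, edges):
    """Symmetric adjacency with self-loops, row-normalized as is usual in GCNs."""
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    return a / a.sum(axis=1, keepdims=True)

A_full = adjacency(5, edges_full)
A_part = adjacency(5, edges_part)
A_core = adjacency(5, edges_core)

# One multi-scale propagation step: average the three scales' propagated features.
X = np.random.default_rng(0).random((5, 8))      # 5 joints, 8-dim features
H = (A_full @ X + A_part @ X + A_core @ X) / 3.0
```

A real model would interleave such propagation with learned weight matrices and temporal convolutions; this only shows how the three graph scales coexist over one joint set.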
GPGPU-Assisted Subpixel Tracking Method for Fiducial Markers (Naoki Shibata)
Aiming to realize highly accurate position estimation, we propose in this paper a method for efficiently and accurately detecting the 3D positions and poses of traditional black-framed fiducial markers in high-resolution images taken by ordinary web cameras. Our tracking method can be executed efficiently using GPGPU computation; to realize this, we devised a connected-component labeling method suitable for GPGPU execution. To improve accuracy, we devised a method for detecting the 2D positions of marker corners with subpixel accuracy. We implemented our method in Java and OpenCL and confirmed that it provides better detection and measurement accuracy, that recognizing markers in high-resolution images is beneficial for accuracy, and that our method is more than twice as fast as the existing method with CPU computation.
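A standard way to refine a peak location to subpixel precision is to fit a parabola through three neighbouring response samples; the sketch below shows that generic technique only, not the paper's actual corner detector:

```python
def subpixel_peak(y_prev, y_cur, y_next):
    """Fit a parabola through three samples around an integer peak and
    return the fractional offset of the true maximum, in [-0.5, 0.5]."""
    denom = y_prev - 2.0 * y_cur + y_next
    if denom == 0.0:
        return 0.0
    return 0.5 * (y_prev - y_next) / denom

# Samples of f(x) = -(x - 0.2)**2 at x = -1, 0, 1: the integer argmax is 0,
# the true peak is at x = 0.2, and the parabola fit recovers it exactly.
samples = [-(x - 0.2) ** 2 for x in (-1, 0, 1)]
offset = subpixel_peak(*samples)
```

Applied along both image axes around a detected corner response maximum, this yields fractional 2D corner coordinates.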
International Journal of Engineering Research and Applications (IJERA) is an open access, online, peer-reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nanotechnology & Science, Power Electronics, Electronics & Communication Engineering, Computational Mathematics, Image Processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design, etc.
Intel, Intelligent Systems Lab: Stable View Synthesis Whitepaper (Alejandro Franceschi)
Intel, Intelligent Systems Lab:
Stable View Synthesis Whitepaper
We present Stable View Synthesis (SVS). Given a set of source images depicting a scene from freely distributed viewpoints, SVS synthesizes new views of the scene. The method operates on a geometric scaffold computed via structure-from-motion and multi-view stereo. Each point on this 3D scaffold is associated with view rays and corresponding feature vectors that encode the appearance of this point in the input images.
The core of SVS is view-dependent on-surface feature aggregation, in which directional feature vectors at each 3D point are processed to produce a new feature vector for a ray that maps this point into the new target view.
The target view is then rendered by a convolutional network from a tensor of features synthesized in this way for all pixels. The method is composed of differentiable modules and is trained end-to-end. It supports spatially-varying view-dependent importance weighting and feature transformation of source images at each point; spatial and temporal stability due to the smooth dependence of on-surface feature aggregation on the target view; and synthesis of view-dependent effects such as specular reflection.
Experimental results demonstrate that SVS outperforms state-of-the-art view synthesis methods both quantitatively and qualitatively on three diverse real-world datasets, achieving unprecedented levels of realism in free-viewpoint video of challenging large-scale scenes.
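The on-surface aggregation step can be caricatured as blending per-source-view features with weights derived from how well each source ray agrees with the target ray. SVS learns this mapping with networks; the softmax over ray-direction dot products below is a hypothetical stand-in:

```python
import numpy as np

def aggregate(source_dirs, target_dir, features):
    """Blend per-view feature vectors at one surface point, weighting each
    source view by the agreement of its ray with the target ray."""
    logits = source_dirs @ target_dir            # directional agreement
    w = np.exp(logits) / np.exp(logits).sum()    # softmax weights
    return w @ features

# Two source views observing the same 3D point; the target ray matches view 0.
dirs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
target = np.array([1.0, 0.0, 0.0])
feats = np.array([[10.0, 0.0], [0.0, 10.0]])     # one feature vector per view
agg = aggregate(dirs, target, feats)
```

The aligned view's features dominate the blended vector, which is the qualitative behaviour the learned aggregation provides.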
AN ENHANCED SEPARABLE REVERSIBLE DATA HIDING IN ENCRYPTED IMAGES USING SIDE MATCH (Editor IJMTER)
This paper proposes a scheme for enhanced separable reversible data hiding in encrypted images using side match. In the first step, the original image is encrypted using an encryption key. Additional data is then embedded into the image by modifying a small portion of the encrypted image using a data-hiding key. Given an encrypted image containing additional data, a receiver with the data-hiding key can extract the additional data. A receiver with the encryption key can decrypt the image but cannot extract the additional data. A receiver with both keys can extract the additional data and recover the original content by exploiting the spatial correlation in natural images. The accuracy of data extraction is improved by using a better scheme for measuring the smoothness of the received image, and the side match scheme further decreases the error rate of the extracted bits.
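The smoothness-based extraction idea can be illustrated with a much-simplified sketch (not the paper's exact scheme): bit 1 is embedded by flipping the LSBs of half a block's pixels, and the extractor picks whichever hypothesis, flipped or not, yields the smoother block, since natural-image blocks tend to be smooth:

```python
import numpy as np

mask = np.indices((4, 4)).sum(axis=0) % 2 == 0   # checkerboard half of a 4x4 block

def smoothness(block):
    """Sum of absolute differences between horizontal and vertical neighbours."""
    b = block.astype(int)
    return np.abs(np.diff(b, axis=0)).sum() + np.abs(np.diff(b, axis=1)).sum()

def embed_bit(block, bit):
    out = block.copy()
    if bit:
        out[mask] ^= 1                           # flip LSBs of half the pixels
    return out

def extract_bit(block):
    alt = block.copy()
    alt[mask] ^= 1                               # the "it was flipped" hypothesis
    return 0 if smoothness(block) <= smoothness(alt) else 1

flat = np.full((4, 4), 100, dtype=np.uint8)      # a smooth natural-image block
```

A side-match refinement would additionally include border pixels of the already-decoded neighbouring blocks in the smoothness measure, which is what lowers the error rate on textured regions.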
Robust foreground modelling to segment and detect multiple moving objects in ... (IJECEIAES)
The last decade has witnessed an ever-increasing number of video surveillance installations due to the rise of security concerns worldwide. With this comes the need for video analysis for fraud detection, crime investigation, and traffic monitoring, to name a few. For any video analysis application, detection of moving objects in videos is a fundamental step. In this paper, an efficient foreground modelling method to segment multiple moving objects is implemented. The proposed method significantly reduces noise, accurately segmenting the region of interest under dynamic conditions while handling occlusion to a large extent. Extensive performance analysis shows that the proposed method gives far better results than the de facto standard as well as relatively new approaches used for moving object detection.
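As a baseline for what foreground modelling means, here is the classic running-average background model with thresholded differencing; it is a generic textbook sketch, far simpler than the paper's method, and the frame sizes and thresholds are arbitrary:

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Exponential moving average: the background slowly tracks the scene."""
    return (1.0 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, thresh=25.0):
    """Pixels that deviate strongly from the background model are foreground."""
    return np.abs(frame.astype(float) - bg) > thresh

bg = np.zeros((4, 4))
frames = [np.zeros((4, 4)) for _ in range(10)]
frames[-1][1:3, 1:3] = 200.0            # a bright object enters the last frame
for f in frames[:-1]:
    bg = update_background(bg, f)
mask = foreground_mask(bg, frames[-1])
```

Robust methods improve on this baseline exactly where it fails: noise, dynamic backgrounds, and occlusion.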
Mixed Reality: Pose Aware Object Replacement for Alternate Realities (Alejandro Franceschi)
This document explains how the technology involved can semantically replace moving objects, humans, and other such visual input, and transform it, using mixed reality, into whatever the viewer would prefer the real world to look like instead.
From videogames to movies, to education, healthcare, commerce, communications, and industrial solutions, this will radically change the way we interact with the world, with others, and with ourselves.
ICVG: PRACTICAL CONSTRUCTIVE VOLUME GEOMETRY FOR INDIRECT VISUALIZATION (ijcga)
The task of creating detailed three-dimensional virtual worlds for interactive entertainment software can be simplified by using Constructive Solid Geometry (CSG) techniques. CSG allows artists to combine primitive shapes, visualized through polygons, into complex and believable scenery. Constructive Volume Geometry (CVG) is a superset of CSG that operates on volumetric data, which consists of values recorded at constant intervals in three dimensions of space. To allow volumetric data to be integrated into existing frameworks, indirect visualization is performed by constructing and visualizing polygon meshes corresponding to the implicit surfaces in the volumetric data. The Indirect CVG (ICVG) algebra, which provides constructive volume geometry operators appropriate to volumetric data that will be indirectly visualized, is introduced. ICVG includes operations analogous to the union, difference, and intersection operators in the standard CVG algebra, as well as new operations. Additionally, a series of volumetric primitives well suited to indirect visualization is defined.
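Under the common implicit-surface convention that "inside" means a field value at or above a threshold, the CSG-style union, intersection, and difference become pointwise max/min on scalar fields. The sketch below shows that standard reading on a 1-D occupancy grid; the ICVG algebra defines additional operators beyond these:

```python
import numpy as np

def cvg_union(a, b):        return np.maximum(a, b)
def cvg_intersection(a, b): return np.minimum(a, b)
def cvg_difference(a, b):   return np.minimum(a, 1.0 - b)   # a minus b

# Two overlapping "spheres" as occupancy fields sampled on a 1-D grid:
x = np.linspace(0.0, 1.0, 11)
a = (np.abs(x - 0.3) < 0.25).astype(float)
b = (np.abs(x - 0.6) < 0.25).astype(float)

u = cvg_union(a, b)          # occupied where either shape is
i = cvg_intersection(a, b)   # occupied only in the overlap
d = cvg_difference(a, b)     # a with the overlap carved away
```

Indirect visualization would then extract a polygon mesh (e.g. via marching cubes) from the resulting field rather than rendering the volume directly.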
COLOR IMAGE ENCRYPTION BASED ON MULTIPLE CHAOTIC SYSTEMS (IJNSA Journal)
This paper proposes a novel color image encryption scheme based on multiple chaotic systems. The ergodicity property of chaotic systems is utilized to perform the permutation process, and a substitution operation is applied to achieve the diffusion effect. In the permutation stage, the 3D color plain-image matrix is converted to a 2D image matrix, and two generalized Arnold maps are employed to generate hybrid chaotic sequences that depend on the plain-image's content. The generated chaotic sequences are then applied to perform the permutation. Because the encryption key streams depend not only on the cipher keys but also on the plain-image, the scheme can resist chosen-plaintext as well as known-plaintext attacks. In the diffusion stage, four pseudo-random gray-value sequences are generated by another generalized Arnold map and applied to the permuted image by a bit-XOR operation row by row or column by column to improve the encryption rate. Security and performance analyses have been performed, including key space analysis, histogram analysis, correlation analysis, information entropy analysis, key sensitivity analysis, differential analysis, etc. The experimental results show that the proposed image encryption scheme is highly secure thanks to its large key space and efficient permutation-substitution operation, and it is therefore suitable for practical image and video encryption.
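The permutation-substitution pattern the scheme follows can be sketched in a toy form: a generalized Arnold cat map scrambles pixel positions, then an XOR key stream diffuses values. The map parameters and the key stream below are illustrative stand-ins, not the paper's plain-image-dependent sequences:

```python
import numpy as np

N = 8  # toy image side length

def arnold(x, y, a=1, b=1):
    """Generalized Arnold map; its matrix [[1, a], [b, a*b + 1]] has
    determinant 1, so it permutes the N x N grid bijectively."""
    return (x + a * y) % N, (b * x + (a * b + 1) * y) % N

def permute(img):
    out = np.empty_like(img)
    for x in range(N):
        for y in range(N):
            nx, ny = arnold(x, y)
            out[nx, ny] = img[x, y]
    return out

def unpermute(img):
    out = np.empty_like(img)
    for x in range(N):
        for y in range(N):
            nx, ny = arnold(x, y)
            out[x, y] = img[nx, ny]
    return out

rng = np.random.default_rng(0)
plain = rng.integers(0, 256, (N, N), dtype=np.uint8)
key = rng.integers(0, 256, (N, N), dtype=np.uint8)   # stand-in key stream

cipher = permute(plain) ^ key            # permutation stage, then XOR diffusion
recovered = unpermute(cipher ^ key)      # both stages are exactly invertible
```

Making the key stream depend on the plain-image, as the paper does, is what defeats chosen- and known-plaintext attacks; a fixed stream as above would not.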
PIPELINED ARCHITECTURE OF 2D-DCT, QUANTIZATION AND ZIGZAG PROCESS FOR JPEG IM... (VLSICS Design)
This paper presents the architecture and VHDL design of a Two Dimensional Discrete Cosine Transform (2D-DCT) with quantization and zigzag arrangement. This architecture is used as the core and path in JPEG image compression hardware. The 2D-DCT calculation is made using the 2D-DCT separability property, such that the whole architecture is divided into two 1D-DCT calculations by using a transpose buffer. The architecture for the quantization and zigzag process is also described in this paper. The quantization process is done using a division operation. This design is aimed at implementation in a Spartan-3E XC3S500 FPGA. The 2D-DCT architecture uses 1891 slices, 51 I/O pins, and 8 multipliers of one Xilinx Spartan-3E XC3S500E FPGA and reaches an operating frequency of 101.35 MHz. One input block of 8 x 8 elements of 8 bits each is processed in 6604 ns, and the pipeline latency is 140 clock cycles.
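The separability property the hardware exploits can be verified numerically: an 8x8 2D DCT equals a 1D DCT on every row, a transpose (the transpose buffer), and a 1D DCT on every row again. The sketch uses the orthonormal DCT-II matrix:

```python
import numpy as np

N = 8
k = np.arange(N)[:, None]
n = np.arange(N)[None, :]
D = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))  # DCT-II rows
D[0] /= np.sqrt(2.0)                       # orthonormal scaling of the DC row

X = np.random.default_rng(42).uniform(0, 255, (N, N))

direct = D @ X @ D.T                       # 2D DCT in one shot

pass1 = X @ D.T                            # 1D DCT applied to every row
pass2 = pass1.T @ D.T                      # transpose buffer, then rows again
two_pass = pass2.T                         # identical to the direct 2D DCT
```

Orthonormality of D also makes the inverse transform a pair of transposed passes, which is why the same datapath can be reused for decoding.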
Intel Intelligent Systems Labs:
Enhancing Photorealism Enhancement
Abstract:
We present an approach to enhancing the realism of synthetic images. The images are enhanced by a convolutional network that leverages intermediate representations produced by conventional rendering pipelines. The network is trained via a novel adversarial objective, which provides strong supervision at multiple perceptual levels. We analyze scene layout distributions in commonly used datasets and find that they differ in important ways. We hypothesize that this is one of the causes of strong artifacts that can be observed in the results of many prior methods. To address this, we propose a new strategy for sampling image patches during training. We also introduce multiple architectural improvements in the deep network modules used for photorealism enhancement. We confirm the benefits of our contributions in controlled experiments and report substantial gains in stability and realism in comparison to recent image-to-image translation methods and a variety of other baselines.
Tissue Segmentation Methods Using 2D Histogram Matching in a Sequence of MR B... (Vladimir Kanchev)
This presentation provides a detailed description of the methodology of a segmentation method for brain tissues in MR image sequences using 2D histogram matching.
Background Estimation Using Principal Component Analysis Based on Limited Mem... (IJECEIAES)
Given a video of M frames of size h × w, the background components are the matrix elements that remain relatively constant over the M frames. In the PCA (principal component analysis) method, these elements are referred to as "principal components". In video processing, background subtraction means removing the background component from the video; the PCA method is used to obtain this component. The method reshapes the 3-dimensional video (h × w × M) into a 2-dimensional matrix (N × M), where N is a linear array of size h × w. The principal components are the dominant eigenvectors, which form the basis of an eigenspace. Limited-memory block Krylov subspace optimization is then proposed to improve the performance of the computation. The background estimate is obtained by projecting each input image (the first frame of each image sequence) onto the space spanned by the principal components. The procedure was run on the standard SBI (Scene Background Initialization) dataset, consisting of 8 videos with resolutions ranging from 146 × 150 to 352 × 240 and frame counts from 258 to 500. Performance is reported with 8 metrics; on average over the 8 videos: percentage of error pixels 0.24%, percentage of clustered error pixels 0.21%, multiscale structural similarity index 0.88 (of a maximum of 1), and running time 61.68 seconds.
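The reshaping and projection described above can be sketched directly; plain SVD stands in for the paper's limited-memory block Krylov acceleration, and the frame sizes and synthetic object are made up for illustration:

```python
import numpy as np

h, w, M = 6, 5, 20
rng = np.random.default_rng(1)
static = rng.uniform(50.0, 255.0, (h, w))              # the true background
frames = np.stack([static + rng.normal(0.0, 1.0, (h, w)) for _ in range(M)])
frames[-1, 2:4, 2:4] += 120.0                          # object in the last frame

A = frames.reshape(M, h * w).T                         # N x M data matrix, N = h*w
U, s, Vt = np.linalg.svd(A, full_matrices=False)
pc = U[:, :1]                                          # dominant principal component

last = A[:, -1]
background = (pc @ (pc.T @ last)).reshape(h, w)        # projection onto eigenspace
foreground = frames[-1] - background                   # residual = moving content
```

The Krylov-subspace variant computes the same dominant eigenvectors without materializing the full decomposition, which is what makes the method tractable for long, high-resolution videos.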
Enhanced target tracking based on mean shift algorithm for satellite imagery (eSAT Journals)
Abstract: Target tracking in high-resolution satellite images is a challenging task in the computer vision field. In this paper, we propose a mean shift algorithm based enhanced target tracking system for high-resolution satellite imagery. In the proposed tracking algorithm, target modeling is done using spectral features of the target object, i.e., mean and energy density function. A feature vector space with minimum Euclidean distance is used to predict the next possible position of the target object in consecutive frames. The proposed tracking algorithm has been tested on two high-resolution databases, i.e., harbor and airport region databases acquired by the WorldView-2 satellite at different times. Recall, precision, F1 score, and other performance parameters are also calculated to show the tracking ability of the proposed method in real-time applications and are compared with the results of the Regional Operator Design based tracking algorithm proposed in [1]. The results show that our proposed method gives relatively better performance than the other tracking algorithms used in satellite imagery. Keywords: target tracking, mean shift algorithm, energy density function, feature vector space, frame
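The core mean shift iteration, shown below in its generic one-dimensional textbook form rather than the paper's full tracking system, repeatedly moves an estimate to the kernel-weighted mean of nearby samples, converging to the nearest mode:

```python
import numpy as np

def mean_shift(points, start, bandwidth=1.0, iters=50):
    """Iterate the Gaussian-kernel weighted mean until it settles on a mode.
    A tracker's analogue moves a window toward the mode of a similarity surface."""
    x = float(start)
    for _ in range(iters):
        w = np.exp(-((points - x) ** 2) / (2.0 * bandwidth ** 2))  # kernel weights
        x = float(np.sum(w * points) / np.sum(w))
    return x

# A dense mode at 2.0 and a sparse, distant one at 9.0; starting near the
# dense mode, the iteration locks onto it.
pts = np.concatenate([np.full(50, 2.0), np.full(10, 9.0)])
mode = mean_shift(pts, start=3.0)
```

In 2-D image tracking the "points" are pixel locations weighted by how well their features match the target model, so the same iteration climbs the similarity surface frame after frame.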
NMS and Thresholding Architecture used for FPGA based Canny Edge Detector for... (idescitation)
In this paper, an architecture designed for non-maximal suppression used in the Canny edge detection algorithm is presented in order to reduce memory requirements significantly. The architecture also achieves decreased latency and increased throughput with no loss in edge detection. The new algorithm uses a low-complexity 8-bin non-uniform gradient magnitude histogram to compute block-based hysteresis thresholds that are used by the Canny edge detector. Furthermore, the hardware architecture of the proposed algorithm is presented in this paper and synthesized on a Xilinx Virtex 5 FPGA. The design is developed in VHDL, and simulation results are obtained using ModelSim 6.3 with Xilinx 12.2.
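For reference, the non-maximal suppression step being accelerated keeps a pixel only if its gradient magnitude is a local maximum along the quantized gradient direction. This is a plain software rendering of standard Canny NMS, not the paper's memory-optimized design:

```python
import numpy as np

def nms(mag, direction):
    """Suppress pixels that are not local maxima along the quantized
    gradient direction (0, 45, 90, or 135 degrees)."""
    h, w = mag.shape
    out = np.zeros_like(mag)
    offsets = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            di, dj = offsets[direction[i, j]]
            if mag[i, j] >= mag[i + di, j + dj] and mag[i, j] >= mag[i - di, j - dj]:
                out[i, j] = mag[i, j]
    return out

# A vertical edge: the horizontal-gradient ridge survives, its flanks do not.
mag = np.zeros((5, 5))
mag[:, 2] = 10.0
mag[:, 1] = mag[:, 3] = 4.0
direction = np.zeros((5, 5), dtype=int)     # gradient points horizontally (0 deg)
thin = nms(mag, direction)
```

The surviving ridge is then classified by the block-based hysteresis thresholds the paper derives from its 8-bin gradient histogram.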
Google Research Siggraph Whitepaper | Total Relighting: Learning to Relight Portraits for Background Replacement (Alejandro Franceschi)
Abstract:
Given a portrait and an arbitrary high dynamic range lighting environment, our framework uses machine learning to composite the subject into a new scene, while accurately modeling their appearance in the target illumination condition. We estimate a high quality alpha matte, foreground element, albedo map, and surface normals, and we propose a novel, per-pixel lighting representation within a deep learning framework.
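Once the alpha matte and relit foreground are estimated, the final composite is the standard per-pixel blend out = alpha * foreground + (1 - alpha) * background. A minimal numeric sketch (toy intensities, single channel):

```python
import numpy as np

alpha = np.array([1.0, 0.5, 0.0])          # matte for three pixels
fg = np.array([200.0, 200.0, 200.0])       # relit subject intensity
bg = np.array([0.0, 60.0, 80.0])           # new scene intensity
out = alpha * fg + (1.0 - alpha) * bg      # alpha-matte compositing
```

The quality of the whole pipeline shows up here: errors in the matte or in the estimated illumination appear directly as halos or mismatched shading at partially transparent pixels (alpha between 0 and 1).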
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Fast Full Search for Block Matching Algorithms (ijsrd.com)
This project introduces a configurable motion estimation architecture for a wide range of fast block-matching algorithms (BMAs). Contemporary motion estimation architectures are either too rigid to support multiple BMAs, or their flexibility comes at the cost of reduced performance. In block-based motion estimation, a block-matching algorithm searches for the best matching block for the current macroblock in the reference frame. During the search, the checking point yielding the minimum block distortion (MBD) determines the displacement of the best matching block.
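The full-search baseline that fast BMAs prune can be sketched directly: every candidate displacement in a search window is scored with a sum-of-absolute-differences (SAD) block distortion, and the minimum-distortion checking point gives the motion vector. Block and window sizes here are arbitrary:

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences, the usual block distortion measure."""
    return int(np.abs(a.astype(int) - b.astype(int)).sum())

def full_search(ref, cur_block, top, left, radius=2):
    """Exhaustively check every displacement within +/- radius around the
    current block's position (top, left); return the MBD motion vector."""
    n = cur_block.shape[0]
    best, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= ref.shape[0] - n and 0 <= x <= ref.shape[1] - n:
                d = sad(ref[y:y + n, x:x + n], cur_block)
                if best is None or d < best:
                    best, best_mv = d, (dy, dx)
    return best_mv, best

# The current block is reference content that "moved" by (1, 2):
ref = np.arange(100, dtype=np.uint8).reshape(10, 10)
cur_block = ref[4:8, 5:9]                  # true source at (4, 5)
mv, dist = full_search(ref, cur_block, top=3, left=3)
```

Fast BMAs (three-step, diamond search, etc.) visit only a subset of these checking points, which is exactly the behaviour a configurable architecture must support.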
EFFICIENT IMAGE COMPRESSION USING LAPLACIAN PYRAMIDAL FILTERS FOR EDGE IMAGES (ijcnac)
This project presents a new image compression technique for the coding of retinal and fingerprint images. Retinal images are used to detect diseases like diabetes and hypertension; fingerprint images are used for security purposes. In this work, the contourlet transform of the retinal or fingerprint image is taken first. The coefficients of the contourlet transform are then quantized using an adaptive multistage vector quantization scheme. The number of code vectors in the adaptive vector quantization scheme depends on the dynamic range of the input image.
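Multistage vector quantization can be illustrated with a minimal two-stage sketch: stage 1 quantizes the vector coarsely and stage 2 quantizes the residual, so codebook cost grows additively rather than multiplicatively. The codebooks below are toy values; the paper's scheme sizes them adaptively from the image's dynamic range:

```python
import numpy as np

def nearest(v, codebook):
    """Return the index and value of the closest code vector."""
    d = np.linalg.norm(codebook - v, axis=1)
    i = int(np.argmin(d))
    return i, codebook[i]

stage1 = np.array([[0.0, 0.0], [10.0, 10.0]])            # coarse codebook
stage2 = np.array([[0.0, 0.0], [1.0, -1.0], [-1.0, 1.0]])  # residual codebook

v = np.array([10.8, 9.2])
i1, c1 = nearest(v, stage1)          # stage 1: quantize the vector
i2, c2 = nearest(v - c1, stage2)     # stage 2: quantize the residual
reconstructed = c1 + c2              # decoder sums the stage outputs
```

Two stages of K entries each give K*K effective reproduction points while storing and searching only 2K code vectors, which is the efficiency the multistage design buys.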
A Novel Graph Representation for Skeleton-based Action Recognition (sipij)
Deep convolutional neural networks (CNNs) have achieved impressive performance in edge detection tasks, but their large number of parameters often leads to high memory and energy costs for implementation on lightweight devices. In this paper, we propose a new architecture, called Efficient Deep-learning Gradients Extraction Network (EDGE-Net), that integrates the advantages of depthwise separable convolutions and deformable convolutional networks (Deformable ConvNets) to address these inefficiencies. By carefully selecting proper components and utilizing network pruning techniques, our proposed EDGE-Net achieves state-of-the-art accuracy in edge detection while significantly reducing complexity. Experimental results on the BSDS500 and NYUDv2 datasets demonstrate that EDGE-Net outperforms current lightweight edge detectors with only 500k parameters, without relying on pre-trained weights.
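The parameter saving from depthwise separable convolutions is simple arithmetic: a standard convolution mixes space and channels at once, while the factorization does a per-channel spatial filter followed by a 1x1 channel mix. The layer sizes below are illustrative, not EDGE-Net's actual configuration:

```python
# Parameter counts (bias terms omitted) for a standard convolution versus
# its depthwise + pointwise factorization.
def standard_conv_params(c_in, c_out, k):
    return c_in * c_out * k * k

def separable_conv_params(c_in, c_out, k):
    depthwise = c_in * k * k        # one k x k filter per input channel
    pointwise = c_in * c_out        # 1 x 1 convolution mixes channels
    return depthwise + pointwise

dense = standard_conv_params(64, 128, 3)
light = separable_conv_params(64, 128, 3)
```

For a 3x3 layer with 64 input and 128 output channels the factorization is over 8x smaller, which is how architectures built from such blocks stay within a few hundred thousand parameters.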
Tissue Segmentation Methods Using 2D Histogram Matching in a Sequence of MR B...Vladimir Kanchev
This presentation provides a detailed description of the methodology of a segmentation method for brain tissues in MR image sequences using 2D histogram matching.
Background Estimation Using Principal Component Analysis Based on Limited Mem...IJECEIAES
Given a video of M frames of size h × w, the background components of the video are the matrix elements that remain relatively constant over the M frames. In the PCA (principal component analysis) method these elements are referred to as "principal components". In video processing, background subtraction means excision of the background component from the video, and the PCA method is used to obtain this component. The method transforms the 3-dimensional video (h × w × M) into a 2-dimensional one (N × M), where N is a linear array of size h × w. The principal components are the dominant eigenvectors, which form the basis of an eigenspace. Limited-memory block Krylov subspace optimization is then proposed to improve the performance of the computation. The background estimate is obtained as the projection of each input image (the first frame of each image sequence) onto the space spanned by the principal components. The procedure was run on a standard dataset, the SBI (Scene Background Initialization) dataset, consisting of 8 videos with resolutions in the range [146 × 150, 352 × 240] and total frame counts in [258, 500]. Performance is reported with 8 metrics, notably (on average over the 8 videos) the percentage of error pixels (0.24%), the percentage of clustered error pixels (0.21%), the multiscale structural similarity index (0.88 out of a maximum of 1) and the running time (61.68 seconds).
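The projection step described above can be sketched as follows. This illustrative NumPy version uses a plain SVD in place of the paper's limited-memory block Krylov solver, and the function name is ours:

```python
import numpy as np

def estimate_background(frames: np.ndarray, n_components: int = 1) -> np.ndarray:
    """Estimate a video background by projecting a frame onto the span
    of the dominant principal components.

    frames: array of shape (M, h, w) -- M grayscale frames.
    Returns an (h, w) background estimate.
    """
    m, h, w = frames.shape
    # Flatten each frame to a column: data matrix of shape (N, M), N = h*w.
    data = frames.reshape(m, h * w).T.astype(float)
    # Dominant eigenvectors of the covariance are the left singular vectors.
    u, _, _ = np.linalg.svd(data, full_matrices=False)
    basis = u[:, :n_components]                  # principal components
    # Project the first frame onto the principal subspace.
    proj = basis @ (basis.T @ data[:, 0])
    return proj.reshape(h, w)
```

For a perfectly static scene the first principal component already spans every frame, so the projection returns the frame itself.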
Enhanced target tracking based on mean shift algorithm for satellite imageryeSAT Journals
Abstract: Target tracking in high-resolution satellite images is a challenging task in computer vision. In this paper we propose a mean-shift-based enhanced target tracking system for high-resolution satellite imagery. In the proposed tracking algorithm, target modeling is done using spectral features of the target object, i.e. mean and energy density function. The feature-vector space with minimum Euclidean distance is used to predict the next possible position of the target object in consecutive frames. The proposed tracking algorithm has been tested on two high-resolution databases, i.e. harbor and airport region databases acquired by the WorldView-2 satellite at different times. Performance parameters such as recall, precision and F1 score are also calculated to show the tracking ability of the proposed method in real-time applications and are compared with the results of the Regional Operator Design based tracking algorithm proposed in [1]. The results show that the proposed method gives relatively better performance than the other tracking algorithms used for satellite imagery. Keywords: target tracking, mean shift algorithm, energy density function, feature vector space, frame
NMS and Thresholding Architecture used for FPGA based Canny Edge Detector for...idescitation
In this paper, an architecture designed for Non-Maximal Suppression in the Canny edge detection algorithm is presented in order to reduce memory requirements significantly. The architecture also achieves decreased latency and increased throughput with no loss in edge detection. The new algorithm uses a low-complexity 8-bin non-uniform gradient magnitude histogram to compute block-based hysteresis thresholds that are used by the Canny edge detector. Furthermore, the hardware architecture of the proposed algorithm is presented in this paper and is synthesized on the Xilinx Virtex 5 FPGA. The design development is done in VHDL and simulation results are obtained using ModelSim 6.3 with Xilinx 12.2.
Google Research Siggraph Whitepaper | Total Relighting: Learning to Relight P...Alejandro Franceschi
Google Research Siggraph Whitepaper | Total Relighting: Learning to Relight Portraits for Background Replacement
Abstract:
Given a portrait and an arbitrary high dynamic range lighting environment, our framework uses machine learning to composite the subject into a new scene, while accurately modeling their appearance in the target illumination condition. We estimate a high quality alpha matte, foreground element, albedo map, and surface normals, and we propose a novel, per-pixel lighting representation within a deep learning framework.
Fast Full Search for Block Matching Algorithmsijsrd.com
This project introduces a configurable motion estimation architecture for a wide range of fast block-matching algorithms (BMAs). Contemporary motion estimation architectures are either too rigid to support multiple BMAs or implement flexibility at the cost of reduced performance. In block-based motion estimation, a block-matching algorithm searches for the best matching block for the current macroblock in the reference frame. During the search, the checking point yielding the minimum block distortion (MBD) determines the displacement of the best matching block.
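The search procedure described above, scanning every candidate displacement and keeping the one with minimum block distortion, can be sketched with the sum of absolute differences (SAD) as the distortion measure. This is an illustrative full-search Python version; the names and the ±p window are ours:

```python
import numpy as np

def full_search(ref: np.ndarray, cur_block: np.ndarray,
                top: int, left: int, p: int = 4):
    """Exhaustive block matching: return the motion vector (dy, dx)
    minimizing SAD over a +/-p search window, plus the SAD itself.

    ref:       reference frame (2D array)
    cur_block: n x n block from the current frame
    top, left: the block's position in the current frame
    """
    n = cur_block.shape[0]
    best, best_sad = (0, 0), np.inf
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + n > ref.shape[0] or x + n > ref.shape[1]:
                continue
            cand = ref[y:y + n, x:x + n].astype(int)
            sad = np.abs(cand - cur_block.astype(int)).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad
```

Fast BMAs (three-step search, diamond search, and so on) visit only a subset of these checking points, which is where a configurable architecture earns its keep.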
EFFICIENT IMAGE COMPRESSION USING LAPLACIAN PYRAMIDAL FILTERS FOR EDGE IMAGESijcnac
This project presents a new image compression technique for the coding of retinal and fingerprint images. Retinal images are used to detect diseases like diabetes or hypertension; fingerprint images are used for security purposes. In this work, the contourlet transform of the retinal or fingerprint image is taken first. The coefficients of the contourlet transform are quantized using an adaptive multistage vector quantization scheme. The number of code vectors in the adaptive vector quantization scheme depends on the dynamic range of the input image.
A Novel Graph Representation for Skeleton-based Action Recognitionsipij
Graph convolutional networks (GCNs) have been proven effective for processing structured data, as they can capture the features of related nodes and improve model performance. Increasing attention is being paid to employing GCNs in skeleton-based action recognition, but existing GCN-based methods face several challenges.
Deep Convolutional Neural Networks (CNNs) have achieved impressive performance in edge detection tasks, but their large number of parameters often leads to high memory and energy costs for implementation on lightweight devices. In this paper, we propose a new architecture, called Efficient Deep-learning Gradients Extraction Network (EDGE-Net), that integrates the advantages of Depthwise Separable Convolutions and deformable convolutional networks (Deformable ConvNet) to address these inefficiencies. By carefully selecting proper components and utilizing network pruning techniques, our proposed EDGE-Net achieves state-of-the-art accuracy in edge detection while significantly reducing complexity. Experimental results on the BSDS500 and NYUDv2 datasets demonstrate that EDGE-Net outperforms current lightweight edge detectors with only 500k parameters, without relying on pre-trained weights.
Object gripping algorithm for robotic assistance by means of deep learning IJECEIAES
This paper explores the use of recent deep learning techniques that are still little addressed in robotic applications, presenting a new algorithm based on Faster R-CNN and CNN regression. Implemented machine vision systems tend to require multiple stages to locate an object so that a robot can take it, increasing the noise in the system and the processing times. Region-based convolutional networks solve this problem; two convolutional architectures are used, one for classification and localization of three types of objects and one to determine the grip angle for a robotic gripper. In the established virtual environment, the grip algorithm works at up to 5 frames per second with 100% object classification, and the implementation of Faster R-CNN obtains 100% accuracy in the classification of the test database and over 97% average precision in locating the generated boxes for each element, gripping the objects successfully.
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...IJERD Editor
This paper presents a blind steganalysis technique to effectively attack JPEG steganographic schemes, i.e. Jsteg, F5, Outguess and DWT-based schemes. The proposed method exploits the correlations between block-DCT coefficients from intra-block and inter-block relations, and the statistical moments of characteristic functions of the test image are selected as features. The features are extracted from the BDCT JPEG 2-array. A Support Vector Machine with cross-validation is implemented for the classification. The proposed scheme gives improved outcomes in attacking.
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...acijjournal
Cluster analysis of graph-related problems is an important issue nowadays. Different types of graph clustering techniques have appeared in the field, but most of them are vulnerable in terms of effectiveness and fragmentation of output in real-world applications across diverse systems. In this paper, we provide a comparative behavioural analysis of the RNSC (Restricted Neighbourhood Search Clustering) and MCL (Markov Clustering) algorithms on power-law distribution graphs. RNSC is a graph clustering technique using stochastic local search; it tries to achieve optimal-cost clustering by assigning cost functions to the set of clusterings of a graph. This algorithm was implemented by A. D. King only for undirected and unweighted random graphs. Another popular graph clustering algorithm, MCL, is based on a stochastic flow simulation model for weighted graphs. There are plentiful applications of power-law or scale-free graphs in nature and society. Scale-free topology is stochastic, i.e. nodes are connected in a random manner. Complex network topologies like the World Wide Web, the web of human sexual contacts, or the chemical network of a cell basically follow a power-law distribution to represent different real-life systems. This paper uses real large-scale power-law distribution graphs to analyze the performance of RNSC compared with the Markov clustering (MCL) algorithm. Extensive experimental results on several synthetic and real power-law distribution datasets reveal the effectiveness of our approach to comparative performance measurement of these algorithms on the basis of clustering cost, cluster size, modularity index of the clustering results and normalized mutual information (NMI).
Intra Block and Inter Block Neighboring Joint Density Based Approach for Jpeg...ijsc
Steganalysis is the method used to detect the presence of a hidden message in a cover medium. A novel approach for steganalysis of JPEG images is proposed, based on feature mining in the discrete cosine transform (DCT) domain and machine learning. The neighboring joint densities on both intra-block and inter-block relations are extracted from the DCT coefficient array. After the feature space has been constructed, an SVM binary classifier is used for training and classification. The performance of the proposed method on different steganographic systems, namely F5, Pixel Value Differencing, Model-Based Steganography with and without deblocking, JPHS, Steghide, etc., is analyzed. Classification accuracy is checked for each feature individually and for the combined features to conclude which provides better classification.
Node classification with graph neural network based centrality measures and f...IJECEIAES
Graph neural networks (GNNs) are a new topic of research in data science in which graph data structures are used as important components for developing and training neural networks. A GNN learns the importance weights of neighbors to perform message aggregation, in which the feature vectors of all neighbors are aggregated without considering whether the features are useful or not; using more informative features positively affects the performance of the GNN model. So, in this paper, i) after selecting a subset of features to define important node features, we present new graph feature explanation methods based on graph centrality measures to capture rich information and determine the most important nodes in a network. Through our experiments, we find that selecting certain subsets of these features and adding other features based on centrality measures can lead to better performance across a variety of datasets; and ii) we introduce a major design strategy for graph neural networks, specifically using batch renormalization as the normalization over GNN layers. Combining these techniques (representing features based on centrality measures, passing them to a multilayer perceptron (MLP) layer and then to an adjusted GNN layer), the proposed model achieves greater accuracy than modern GNN models.
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The calculation HTML code is included.
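Terzaghi's strip-footing expression can be evaluated directly. This sketch uses the standard form q_ult = c·N_c + γ·D_f·N_q + 0.5·γ·B·N_γ; the soil values below are illustrative, and the bearing-capacity factors come from standard tables (here the undrained φ = 0 case, for which Terzaghi's factors are N_c = 5.7, N_q = 1, N_γ = 0).

```python
def terzaghi_qult(c, gamma, d_f, b, n_c, n_q, n_gamma):
    """Ultimate bearing capacity of a strip footing (Terzaghi):
    q_ult = c*N_c + gamma*D_f*N_q + 0.5*gamma*B*N_gamma
    c:     soil cohesion (kPa)
    gamma: unit weight of soil (kN/m^3)
    d_f:   footing depth (m),  b: footing width (m)
    n_c, n_q, n_gamma: bearing capacity factors (from tables)."""
    return c * n_c + gamma * d_f * n_q + 0.5 * gamma * b * n_gamma

# Illustrative saturated clay, undrained (phi = 0):
q = terzaghi_qult(c=50.0, gamma=18.0, d_f=1.5, b=2.0,
                  n_c=5.7, n_q=1.0, n_gamma=0.0)   # kPa
```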
Welcome to WIPAC Monthly the magazine brought to you by the LinkedIn Group Water Industry Process Automation & Control.
In this month's edition, along with this month's industry news to celebrate the 13 years since the group was created we have articles including
A case study of the use of Advanced Process Control at the wastewater treatment works at Lleida in Spain
A look back on an article on smart wastewater networks in order to see how the industry has measured up in the interim around the adoption of Digital Transformation in the Water Industry.
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdffxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. 
Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
Student information management system project report ii.pdfKamal Acharya
Our project is about student management. It mainly covers the various actions related to student details, offering an easy way to add, edit and delete student details, and also provides a less time-consuming process for viewing, adding, editing and deleting the marks of the students.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Overview of the fundamental roles in Hydropower generation and the components involved in wider Electrical Engineering.
This paper presents the design and construction of hydroelectric dams from the hydrologist’s survey of the valley before construction, all aspects and involved disciplines, fluid dynamics, structural engineering, generation and mains frequency regulation to the very transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
A NOVEL GRAPH REPRESENTATION FOR SKELETON-BASED ACTION RECOGNITION
1. Signal & Image Processing: An International Journal (SIPIJ) Vol.11, No.6, December 2020
DOI: 10.5121/sipij.2020.11605
Tingwei Li1, Ruiwen Zhang2 and Qing Li3
1,3 Department of Automation, Tsinghua University, Beijing, China
2 Department of Computer Science, Tsinghua University, Beijing, China
ABSTRACT
Graph convolutional networks (GCNs) have been proven effective for processing structured data, as they can capture the features of related nodes and improve model performance. Increasing attention is being paid to employing GCNs in skeleton-based action recognition, but existing GCN-based methods face several challenges. First, the consistency of temporal and spatial features is ignored because features are extracted node by node and frame by frame. We design a generic representation of skeleton sequences for action recognition and propose a novel model called Temporal Graph Networks (TGN), which obtains spatiotemporal features simultaneously. Second, the adjacency matrix of the graph describing the relations of joints mostly depends on the physical connections between joints. We propose a multi-scale graph strategy to appropriately describe the relations between joints in the skeleton graph, adopting a full-scale graph, a part-scale graph and a core-scale graph to capture the local features of each joint and the contour features of important joints. Extensive experiments are conducted on two large datasets, NTU RGB+D and Kinetics Skeleton, and the results show that TGN with our graph strategy outperforms other state-of-the-art methods.
KEYWORDS
Skeleton-based action recognition, Graph convolutional network, Multi-scale graphs
1. INTRODUCTION
Human action recognition is a meaningful and challenging task with widespread potential applications, including health care, human-computer interaction and autonomous driving. At present, skeleton data is increasingly used for action recognition because it is robust to background noise and changes of viewpoint compared to video data. Skeleton-based action recognition is mainly based on deep learning methods such as Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs) and GCNs [3, 8, 10, 12, 13, 15, 17, 18]. RNNs and CNNs generally process the skeleton data into a vector sequence and an image, respectively; these representations cannot fully express the dependencies between correlated joints. With more research on GCNs, [12] first employed GCN in skeleton-based action recognition and inspired a line of new work [7, 10, 15, 17, 18].
The key to GCN-based action recognition is to obtain the temporal and spatial features of an action sequence through the graph [7, 10, 18]. In the skeleton graph, skeleton joints become nodes and the relations between joints are represented by edges. As shown in Fig. 1, most previous work uses more than one graph, and a node contains only spatial features. In this case, GCN extracts spatial features frame by frame, and then a Temporal Convolutional Network (TCN) extracts temporal features node by node. However, the features of a joint in an action are related not only to other joints within a frame but also to joints across frames; as a result, existing methods break this consistency of spatiotemporal features. To solve this problem, we propose TGN to capture spatiotemporal features simultaneously. As shown in Fig. 2, each node comprises a joint across all frames and contains both spatial and temporal features in the graph, and thus TGN obtains spatiotemporal features by processing all frames of each joint simultaneously.
Besides, the edges of the skeleton graph mainly depend on the adjacency matrix A, which is related to the physical connections of joints [12, 17, 18]. GCNs still lack an effective adaptive graph mechanism to establish global connections beyond the physical relations of nodes, such as a relation between head and toes; they can only obtain local features, such as a relation between head and neck. In this paper, a multi-scale graph strategy is proposed, which adopts graphs of different scales in different network branches; as a result, physically unconnected information is added to help the network capture the local features of each joint and the contour features of important joints.
The main contributions of this paper are summarized in three aspects:
(1) The Temporal Graph Network (TGN) proposed in this paper is a feature extractor that obtains spatiotemporal features simultaneously. Moreover, this feature extractor can be adapted to most GCN-based skeleton action recognition models and helps to improve their performance.
(2) A multi-scale graph strategy is proposed to optimize the graph, which extracts spatial features at different scales. This strategy captures not only local features but also global features.
(3) The Multi-scale Temporal Graph Network (MS-TGN) is proposed based on TGN and the multi-scale graph strategy. Extensive experiments are conducted on two datasets, NTU RGB+D and Kinetics Skeleton, and our MS-TGN outperforms state-of-the-art methods on these datasets for skeleton-based action recognition.
2. RELATED WORKS
2.1. Skeleton-Based Action Recognition
Handcrafted features were traditionally used to model human skeleton data for action recognition, but the performance of these conventional methods is barely satisfactory. With the development of deep learning, neural networks, including RNNs and CNNs, have become the main methods. RNN-based methods, such as LSTM and GRU, usually model the skeleton data as a sequence of coordinate vectors, each representing a human body joint [3, 9, 11, 13]; the 3D coordinates of all joints in a frame are concatenated in some order to form the input vector. CNN-based methods model the skeleton data as an image in which a node can be regarded as a pixel [2, 7, 8]; some works transform a skeleton sequence into an image by treating the joint coordinates (x, y, z) as the R, G and B channels of a pixel. Because skeleton data is naturally embedded in the form of a graph rather than a sequence or a grid, neither RNNs nor CNNs can fully represent its structure. Since ST-GCN [12] was proposed in 2018, a series of methods for skeleton-based action recognition have been based on GCN, which has been proven effective for processing structured data and has also been used to model skeleton data.
2.2. Graph Convolutional Networks
ST-GCN [12] introduced GCN into skeleton-based action recognition by constructing a spatiotemporal graph from the natural connections of human joints; it designs three different adjacency matrices to extract spatial and temporal features and achieves better performance than previous methods. 2s-AGCN [18] proposes a two-stream method and designs an adaptive adjacency matrix, which not only represents the structure of the human body but can also be learned with an attention mechanism. DGNN [28] designs a directed graph structure that learns the information between joints and bones in GCN. SGN [29] explicitly introduces the high-level semantics of joints, including joint type and frame index, into the network to enhance its feature representation capability. These approaches treat each joint as a node of the graph, and the edges denoting joint relationships are pre-defined by humans based on prior knowledge. However, the spatial and temporal features of actions are learned by GCN and TCN respectively, which makes the network less efficient.
3. METHODOLOGY
3.1. Temporal Graph Networks
The skeleton data of an action can be described as X = {x_{c,v,t}} of size C × V × T, where T is the number of frames, V is the number of joints in a frame, C is the number of channels per joint, and x_{c,v,t} is the value of channel c of joint v in frame t. X_i denotes the i-th sequence. Previous methods construct T graphs, each with V nodes, as shown in Fig. 1, where the node set N = {x_{v,t} | v = 1, 2, ..., V; t = 1, 2, ..., T} contains joint v in the t-th frame. That is, each frame is represented as one graph and there are T graphs in total, and the feature size of a node is C. GCN is used to obtain spatial features from each graph, and the outputs of the GCN are then fed into a TCN to extract temporal features.
Figure 1. A node in any graph represents data of a joint in a certain frame. GCN extracts spatial features
frame by frame, and TCN extracts temporal features node by node.
We redefine the graph and propose TGN. Instead of T graphs, we have only one graph with V nodes, as seen in Fig. 2. In this graph, the node set N = {x_v | v = 1, 2, ..., V} contains joint v across all frames, so the feature size of a node is C × T. Compared with methods that apply GCN and TCN alternately, we use only one GCN block to realize the extraction of spatiotemporal features.
Figure 2. Each node represents the data of a joint in all frames, so temporal information is also contained. Compared with Fig. 1, TGN extracts temporal and spatial features simultaneously.
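The difference between the per-frame representation (Fig. 1) and the TGN representation (Fig. 2) amounts to how the C × V × T skeleton tensor is sliced into node features. A small NumPy sketch, with shapes following the paper's notation and variable names ours:

```python
import numpy as np

C, V, T = 3, 25, 300                    # channels, joints, frames
x = np.random.rand(C, V, T)             # one skeleton sequence

# Previous methods: T separate graphs, each with V nodes whose
# feature size is C (one frame per graph).
per_frame_nodes = [x[:, :, t].T for t in range(T)]   # each (V, C)

# TGN: a single graph with V nodes; each node stacks the joint's
# C channels over all T frames, giving a feature of size C*T.
tgn_nodes = x.transpose(1, 0, 2).reshape(V, C * T)   # (V, C*T)
```

With the second layout, a single graph convolution already mixes information across all frames of each joint, which is the consistency argument made above.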
Signal & Image Processing: An International Journal (SIPIJ) Vol.11, No.6, December 2020
In a basic TGN block, each node already contains both temporal and spatial information;
therefore, during graph convolution, temporal and spatial features can be computed
simultaneously. The output value for a single channel at node N_v can be written as Eq. 2.
S_v = {n_j | n_j ∈ S(N_v, h)}    (1)

F_o(N_v) = ∑_{j=0}^{k} F_i(S_v(j)) × w(j)    (2)
where N_v is node v, S(N_v, h) is a sampling function used to find the set of nodes n_j adjacent
to node N_v, F_i maps a node to its feature vector, w(j) denotes the weights of a CNN whose
kernel size is 1×t, and F_o(N_v) is the output at N_v. Eq. 2 is a general formula among most
GCN-based action recognition models, where it is used to extract spatial features from a graph.
Our method can therefore be adapted to existing methods by changing their graph structure.
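If the sampled neighborhood S(N_v, h) is encoded as a normalized adjacency matrix, the sum in Eq. 2 becomes a matrix product. A minimal NumPy sketch of one such graph-convolution step (function and variable names are illustrative, not from the paper's code):

```python
import numpy as np

def tgn_conv(X, A, W):
    """One TGN graph-convolution step, a sketch of Eq. 2.
    X: (C_in, T, V) features; node v holds all T frames of joint v.
    A: (V, V) adjacency matrix encoding the sampling S(N_v, h).
    W: (C_out, C_in) channel-mixing weights (the 1x1 part of the 1xt CNN)."""
    agg = np.einsum("ctv,vw->ctw", X, A)     # sum F_i over sampled neighbors
    return np.einsum("oc,ctv->otv", W, agg)  # weighted output F_o(N_v)
```

Because the time axis rides along inside each node's feature, this single operation produces spatiotemporal output without a separate TCN pass.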
3.2. Multi-Scale Graph Strategy
Figure 3. (a) full-scale graph, (b) part-scale graph, (c) core-scale graph
Dilated convolution can capture otherwise-ignored features, such as relations between
unconnected points in an image, by skipping over intermediate positions. Inspired by this, we
select joints of different expressiveness to form graphs of different scales for convolution.
Temporal features of joints with a larger motion space are more expressive, and joints in a body
generally have different relative motion spaces. For example, the elbow and knee can move in a
larger space than surrounding joints such as the shoulder and spine. A small-scale graph has
fewer but more expressive nodes, and thus a larger receptive field; a large-scale graph has more
but less expressive nodes, and thus a smaller receptive field.
We design three graphs of different scales based on the NTU RGB+D dataset, as shown in Fig. 3.
The full-scale graph in Fig. 3(a) has all 25 nodes and captures the local features of each joint
due to its small receptive field. The part-scale graph in Fig. 3(b) is represented by only 11
nodes; its receptive field is larger, so it tends to capture contour information. Fig. 3(c) is
the core-scale graph with only seven nodes. It has the largest receptive field: although it
ignores the internal state of the limbs, it connects the left and right limbs directly, so global
information can be obtained. Through graphs of different scales, the GCN can capture local,
contour and global features, respectively.
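Each scale graph is encoded as an adjacency matrix. A minimal sketch of how such a matrix might be built and normalized (the edge list below is a hypothetical 4-joint chain for illustration; the real full-, part- and core-scale graphs would use 25, 11 and 7 of the NTU RGB+D joints):

```python
import numpy as np

def normalized_adjacency(edges, num_nodes):
    """Symmetrically normalized adjacency with self-loops,
    D^(-1/2) (A + I) D^(-1/2), as commonly used in GCNs."""
    A = np.eye(num_nodes)                  # self-loops
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0            # undirected edge
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# Hypothetical edges for a tiny 4-joint chain.
A_full = normalized_adjacency([(0, 1), (1, 2), (2, 3)], 4)
```

Fewer nodes in the part- and core-scale graphs mean each edge spans a physically longer stretch of the body, which is what enlarges the effective receptive field.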
Building on the above, we combine TGN with the multi-scale graph strategy to obtain the final
model, MS-TGN, as shown in Fig. 4. The model consists of 10 TGN layers, each of which uses a
3 × 1 convolutional kernel to extract the temporal features of each node, followed by a
fully-connected layer that classifies based on the extracted features.
Figure 4. The network architecture of MS-TGN. Multi-scale graphs are represented by different
adjacency matrices A_1, A_2 and A_3.
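The multi-scale fusion suggested by Fig. 4 can be sketched as a sum of graph convolutions over the three adjacency matrices. This is a simplified sketch under the assumption that all scale graphs are expressed over the same node set; the real layer would also include the 3 × 1 temporal convolution and any residual connections:

```python
import numpy as np

def multi_scale_gcn(X, adjacencies, weights):
    """Fuse graph convolutions over several adjacency matrices (A1, A2, A3).
    X: (C_in, T, V) node features; adjacencies: list of (V, V) matrices;
    weights: list of (C_out, C_in) per-scale channel-mixing matrices."""
    out = 0.0
    for A, W in zip(adjacencies, weights):
        agg = np.einsum("ctv,vw->ctw", X, A)          # aggregate at one scale
        out = out + np.einsum("oc,ctv->otv", W, agg)  # mix channels, accumulate
    return out
```

Summing per-scale outputs lets local, contour and global features contribute to the same node representation.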
4. EXPERIMENT
4.1. Datasets and Implementation Details
NTU RGB+D. This dataset is large and widely used. It contains 3D skeleton data collected by
Microsoft Kinect v2 [16] and has 60 action classes and about 56,000 action sequences, performed
by 40 subjects and captured by three cameras fixed at −45°, 0° and 45°, respectively. Each
sequence has several frames, and each frame is composed of 25 joints. We adopt the same protocol
as [16] to carry out the cross-view (CV) and cross-subject (CS) experiments. In the cross-view
experiments, the training data consists of 37,920 action sequences with views at 0° and −45°, and
the test data consists of 18,960 action sequences with the view at 45°. In the cross-subject
experiments, the training data consists of action sequences performed by 20 subjects, and the
test data contains 16,560 action sequences performed by the others. We use the top-1 accuracy for
evaluation.
Kinetics Skeleton. Kinetics [6] is a video action recognition dataset collected from YouTube
videos. Kinetics Skeleton employs the OpenPose [23] estimation toolbox to detect 18 joints of a
skeleton. It contains 400 kinds of actions and 260,232 sequences; the training data consists of
240,436 action sequences, and the test data of the remaining 19,796 sequences. We use the top-1
and top-5 accuracies for evaluation.
Unless otherwise stated, all proposed models follow the settings below. The number of channels is
64 in the first four layers, 128 in the middle three layers and 256 in the last three layers. The
SGD optimizer with Nesterov accelerated gradient is used, and different learning-rate schedules
are designed for different datasets. The mini-batch size is 32 and the momentum is set to 0.9.
All skeleton sequences are padded to T = 300 frames by replaying the actions. Inputs are
processed with normalization and translation as in [18].
4.2. Ablation Experiments
4.2.1. Feature Extractor: TGN
ST-GCN [12] and 2s-AGCN [18] are representative models that apply GCN and TCN alternately, and
were chosen as baselines. We replace their GCN & TCN blocks with TGN and keep their
adjacency-matrix construction strategies unchanged. The experimental results on the two datasets
are listed in Table 1 and Table 2. From Table 1, the accuracy of ST-GCN increases to 82.3% and
90.8% on X-sub and X-view, and the accuracy of 2s-AGCN increases by 0.4% on average. From
Table 2, the accuracies of both models are improved. In conclusion, TGN is sufficiently flexible
to be used as a feature extractor and performs better than methods based on GCN & TCN.
Table 1. Effectiveness of our TGN module on the NTU RGB+D dataset in terms of accuracy.

Model          TGN    X-sub(%)   X-view(%)
ST-GCN [12]           81.6       88.8
ST-GCN [12]    √      82.3       90.8
2s-AGCN [18]          88.5       95.1
2s-AGCN [18]   √      89.0       95.4
Js-Ours               86.0       93.7
Js-Ours        √      86.6       94.1
Bs-Ours               86.9       93.2
Bs-Ours        √      87.5       93.9
Table 2. Effectiveness of our TGN module on the Kinetics dataset in terms of accuracy.

Model          TGN    Top-1(%)   Top-5(%)
ST-GCN [12]           30.7       52.8
ST-GCN [12]    √      31.5       54.0
2s-AGCN [18]          36.1       58.7
2s-AGCN [18]   √      36.7       59.5
Js-Ours               35.0       93.7
Js-Ours        √      35.2       94.1
Bs-Ours               33.0       55.7
Bs-Ours        √      33.3       56.2
4.2.2. Multi-Scale Graph Strategy
Based on section 4.2.1, we construct the full-scale graph containing all joints of the original data,
and the part-scale graph containing 11 joints in NTU+RGB-D dataset. In Kinetics Skeleton
dataset, the full-scale graph contains all nodes and the part-scale graph contains 11 joints. We
evaluate each scale graph, and Table 3 and Table 4 show the experimental results on the two
datasets. In detail, adding a part-scale graph increases accuracy by 1.5% and 0.45%, respectively,
adding a core-scale graph increases accuracy by 0.3% and 0.2%, respectively. A core-scale graph
provides the global features of the whole body, a part-scale graph provides the contour features of
the body part, and a full-scale graph provides the local features of each joint. By feature fusion,
the model obtains richer information and performs better.
Table 3. Effectiveness of the multi-scale graph on the NTU RGB+D dataset in terms of accuracy.

Full-Scale  Part-Scale  Core-Scale  X-sub(%)  X-view(%)
√                                   89.0      95.2
            √                       86.0      94.0
                        √           85.6      93.3
√           √                       89.2      95.7
√           √           √           89.5      95.9
Table 4. Effectiveness of the multi-scale graph on the Kinetics dataset in terms of accuracy.

Full-Scale  Part-Scale  Core-Scale  Top-1(%)  Top-5(%)
√                                   36.6      59.5
            √                       35.0      55.9
                        √           33.8      54.6
√           √                       36.9      59.9
√           √           √           37.3      60.2
4.3. Comparison With State-of-the-Art Methods
We compare MS-TGN with the state-of-the-art skeleton-based action recognition methods on
both the NTU RGB+D dataset and the Kinetics-Skeleton dataset. The results of NTU RGB+D are
shown in Table 5. Our model performs the best in cross-view experiment on NTU RGB+D, and
has the highest top-1 accuracy on Kinetics Skeleton dataset as listed in Table 6.
Table 5. Performance comparisons on the NTU RGB+D dataset with the CS and CV settings.
Method Year X-sub(%) X-view(%)
HBRNN-L[7] 2015 59.1 64.0
PA LSTM[16] 2016 62.9 70.3
STA-LSTM[21] 2017 73.4 81.2
GCA-LSTM[22] 2017 74.4 82.8
ST-GCN[12] 2018 81.5 88.3
DPRL+GCNN[25] 2018 83.5 89.8
SR-TSL[19] 2018 84.8 92.4
AS-GCN[10] 2019 86.8 94.2
2s-AGCN[18] 2019 88.5 95.1
VA-CNN[26] 2019 88.7 94.3
SGN[27] 2020 89.0 94.5
MS-TGN(ours) - 89.5 95.9
Table 6. Performance comparisons on Kinetics dataset with SOTA methods.
Method Year Top-1(%) Top-5(%)
PA LSTM[16] 2016 16.4 35.3
TCN[8] 2017 20.3 40.0
ST-GCN[12] 2018 30.7 52.8
AS-GCN[10] 2019 34.8 56.6
2s-AGCN[18] 2019 36.1 58.7
NAS[15] 2020 37.1 60.1
MS-TGN(ours) - 37.3 60.2
Besides, our model reduces the computational cost and the number of parameters, as listed in
Table 7, which means our model is simpler yet has a better ability to model spatial and temporal
features. Why does our model have fewer parameters and less computation but perform better? In
our designed graph, each node denotes all frame data of a joint, which brings two advantages:
(1) the TGN extractor contains only GCN without TCN, which reduces parameters and computation;
(2) instead of extracting the two kinds of features alternately, TGN extracts spatial and
temporal features at the same time, which strengthens the consistency of the spatial and temporal
features.
Table 7. Comparisons of computational cost with state-of-the-art methods. The #Params and FLOPs
are calculated with the tool THOP (PyTorch-OpCounter) [24].
Method Year X-sub(%) #Params(M) #FLOPs(G)
ST-GCN[12] 2018 81.6 3.1 15.2
AS-GCN[10] 2019 86.8 4.3 17.1
2s-AGCN[18] 2019 88.5 3.5 17.4
NAS[15] 2020 89.4 6.6 36.6
MS-TGN(ours) - 89.5 3.0 15.0
5. CONCLUSIONS
The MS-TGN model proposed in this paper has two main innovations: one is the TGN module, which
extracts temporal and spatial features at the same time, and the other is the multi-scale graph
strategy, which obtains local, contour and global features simultaneously. TGN introduces a novel
representation of skeleton sequences for action recognition that captures spatiotemporal
features in a single graph convolution. On two published datasets, the proposed MS-TGN achieves
state-of-the-art accuracy with the fewest parameters and the least computation.
ACKNOWLEDGEMENT
This work is sponsored by the National Natural Science Foundation of China (No. 61771281), the
"New Generation Artificial Intelligence" major project of China (No. 2018AAA0101605), the 2018
Industrial Internet Innovation and Development Project, and the Tsinghua University Initiative
Scientific Research Program.
REFERENCES
[1] Bruna, Joan, et al. “Spectral Networks and Locally Connected Networks on Graphs.” ICLR 2014 :
International Conference on Learning Representations (ICLR) 2014, 2014.
[2] Du, Yong, et al. “Skeleton Based Action Recognition with Convolutional Neural Network.” 2015 3rd
IAPR Asian Conference on Pattern Recognition (ACPR), 2015, pp. 579–583.
[3] Du, Yong, et al. “Hierarchical Recurrent Neural Network for Skeleton Based Action Recognition.”
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1110–1118.
[4] Feichtenhofer, Christoph, et al. “Convolutional Two-Stream Network Fusion for Video Action
Recognition.” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016,
pp. 1933–1941.
[5] Huang, Zhiwu, et al. “Deep Learning on Lie Groups for Skeleton-Based Action Recognition.” 2017
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1243–1252.
[6] Kay, Will, et al. “The Kinetics Human Action Video Dataset.” ArXiv Preprint ArXiv:1705.06950,
2017.
[7] Ke, Qiuhong, et al. “A New Representation of Skeleton Sequences for 3D Action Recognition.” 2017
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 4570–4579.
[8] Kim, Tae Soo, and Austin Reiter. “Interpretable 3D Human Action Analysis with Temporal
Convolutional Networks.” 2017 IEEE Conference on Computer Vision and Pattern Recognition
Workshops (CVPRW), 2017, pp. 1623–1631.
[9] Li, Chao, et al. “Co-Occurrence Feature Learning from Skeleton Data for Action Recognition and
Detection with Hierarchical Aggregation.” IJCAI’18 Proceedings of the 27th International Joint
Conference on Artificial Intelligence, 2018, pp. 786–792.
[10] Li, Maosen, et al. “Actional-Structural Graph Convolutional Networks for Skeleton-Based Action
Recognition.” 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),
2019, pp. 3595–3603.
[11] Li, Shuai, et al. “Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper
RNN.” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 5457–
5466.
[12] Yan, Sijie, et al. “Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action
Recognition.” AAAI, 2018, pp. 7444–7452.
[13] Martinez, Julieta, et al. “On Human Motion Prediction Using Recurrent Neural Networks.” 2017
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 4674–4683.
[14] Ofli, Ferda, et al. “Sequence of the Most Informative Joints (SMIJ): A New Representation for
Human Skeletal Action Recognition.” 2012 IEEE Computer Society Conference on Computer Vision
and Pattern Recognition Workshops, 2012, pp. 8–13.
[15] Peng, Wei, et al. “Learning Graph Convolutional Network for Skeleton-Based Human Action
Recognition by Neural Searching.” Proceedings of the AAAI Conference on Artificial Intelligence,
vol. 34, no. 3, 2020, pp. 2669–2676.
[16] Shahroudy, Amir, et al. “NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis.”
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 1010–1019.
[17] Shi, Lei, et al. “Skeleton-Based Action Recognition With Directed Graph Neural Networks.” 2019
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 7912–7921.
[18] Shi, Lei, et al. “Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action
Recognition.” 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),
2019, pp. 12026–12035.
[19] Si, Chenyang, et al. “Skeleton-Based Action Recognition with Spatial Reasoning and Temporal Stack
Learning.” Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 106–
121.
[20] Si, Chenyang, et al. “An Attention Enhanced Graph Convolutional LSTM Network for Skeleton-
Based Action Recognition.” 2019 IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR), 2019, pp. 1227–1236.
[21] Liu, Jun, et al. “Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition.”
European Conference on Computer Vision, 2016, pp. 816–833.
[22] Song, Sijie, et al. “An End-to-End Spatio-Temporal Attention Model for Human Action Recognition
from Skeleton Data.” AAAI’17 Proceedings of the Thirty-First AAAI Conference on Artificial
Intelligence, 2017, pp. 4263–4270.
[23] Cao, Zhe, et al. “OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields.”
ArXiv Preprint ArXiv:1812.08008, 2018.
[24] Zhu, Ligeng. “THOP: PyTorch-OpCounter.” https://github.com/Lyken17/pytorch-OpCounter.
[25] Tang, Yansong, et al. “Deep Progressive Reinforcement Learning for Skeleton-Based Action
Recognition.” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp.
5323–5332.
[26] Zhang, Pengfei, et al. “View Adaptive Neural Networks for High Performance Skeleton-Based
Human Action Recognition.” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.
41, no. 8, 2019, pp. 1963–1978.
[27] Zhang, Pengfei, et al. “Semantics-Guided Neural Networks for Efficient Skeleton-Based Human
Action Recognition.” 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition
(CVPR), 2020, pp. 1112–1121.
[28] Shi, Lei, et al. “Skeleton-Based Action Recognition with Directed Graph Neural Networks.”
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 7912–7921.
[29] Zhang, Pengfei, et al. “Semantics-Guided Neural Networks for Efficient Skeleton-Based
Human Action Recognition.” 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition
(CVPR), 2020, pp. 1112–1121.