This document describes a proposed approach to human action recognition using an attention-based spatiotemporal graph convolutional network. The approach applies temporal and spatial attention modules to skeletal data: the temporal attention module identifies the most informative frames, while the spatial attention module highlights the significant joints within those frames. By incorporating both attention mechanisms into a graph convolutional network, the model captures the temporal and spatial relationships in skeleton sequences and recognizes human actions more efficiently and accurately.
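As a rough illustration of the two attention stages described above, the sketch below scores frames and joints with learned vectors and re-weights the feature tensor. The shapes, the mean-pooled scoring scheme, and the weight vectors are assumptions for illustration, not the deck's actual design (which would learn these parameters end-to-end).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: T frames, V joints, C feature channels per joint.
T, V, C = 30, 25, 64
x = rng.standard_normal((T, V, C))  # skeleton feature sequence

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Temporal attention: score each frame from its joint-pooled features.
w_t = rng.standard_normal(C)             # learned in practice; random here
frame_scores = x.mean(axis=1) @ w_t      # (T,) one score per frame
alpha = softmax(frame_scores)            # weights over frames
x_t = x * alpha[:, None, None]           # emphasize informative frames

# Spatial attention: score each joint from its time-pooled features.
w_s = rng.standard_normal(C)
joint_scores = x_t.mean(axis=0) @ w_s    # (V,) one score per joint
beta = softmax(joint_scores)             # weights over joints
x_st = x_t * beta[None, :, None]         # highlight significant joints

print(x_st.shape)  # (30, 25, 64)
```

The attended tensor `x_st` keeps the original shape, so it can feed directly into the graph convolutional layers that follow.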
final_project_1_2k21cse07.pptx
1. Human Action Recognition Using Attention Based Spatiotemporal Graph Convolutional Network
Under the guidance of: Prof. Anil Singh Parihar, Dept. of Computer Science & Engineering
Submitted By: Anshula Sharma (2K21/CSE/07)
2. What is Human Action Recognition?
• Human Action Recognition (HAR) is concerned with predicting or classifying the actions being performed by a human being.
• It is an important and active area of research.
• Different data modalities are used for HAR:
1. Optical flow
2. RGB images
3. Body skeletons
• Deep learning techniques are used to predict and recognize human actions.
3. Problem Statement
• The major goal is to build and improve a model that can recognize and interpret human actions
using skeletal data.
• Traditional techniques frequently depend on RGB video data, which can be affected by lighting and
occlusions. By capturing human actions using the spatial joints, action recognition based on
skeletons provide a more robust and efficient alternative.
• However, skeleton-based action identification faces a number of obstacles:
• It is still difficult to extract relevant information from skeletal data and properly capture the
spatiotemporal dynamics of human motions.
• Existing approaches usually process every body skeleton in the entire sequence that represents
the action performed. This strategy is inefficient in terms of computation time and memory
utilization.
• We propose an attention based spatiotemporal graph convolutional network to overcome these
challenges.
4. Recognition using Skeleton based data
• Body skeletons are increasingly used for
Human action recognition due to their
compact and action-focused nature.
• Skeletons are three-dimensional or two-
dimensional coordinate representations of
human body joints.
• Skeletons are represented as graphs,
where the graph’s nodes represent the
skeleton's joints and the edges of the
graph indicate the connections between
various body joints.
• Actions can be identified from the different
motion patterns of the joints of the skeletal
body.
5. Graph Convolutional Networks
• Graph Convolutional Networks (GCNs) have become quite prominent in the field of skeleton-based
action recognition.
• Using deep feed-forward architectures, graph convolutional networks successfully capture the
spatiotemporal characteristics inherent in human skeletons.
• GCNs are a variant of Convolutional Neural Networks (CNNs) that generalize convolution to graph-
structured data.
• GCNs operate, in a manner similar to CNNs, by inspecting the neighboring nodes.
• The input is non-Euclidean, graph-structured data, with each node having a varying number of
connections.
• Nodes and their connections (edges) with other nodes are represented with the help of an
adjacency matrix, which is then introduced to the forward propagation equation.
6. Graph Convolutional Network
• The initial input to the GCN is the graph's node
features along with the adjacency matrix. It
records local connection patterns as well as
information from surrounding nodes.
• A learnable weight matrix is then applied: it is multiplied with the
aggregated node features to compute the transformed
features for each node.
• Finally, a non-linear activation function is applied to
obtain updated node representations. The
updated node representations then serve as
the next layer’s input, along with the adjacency
matrix.
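The layer described above (aggregate via the adjacency matrix, transform with a weight matrix, apply a non-linearity) can be sketched in a few lines. This is a minimal NumPy illustration, not the deck's actual implementation; the function name `gcn_layer` and the toy 3-joint graph are assumptions for the example:

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph convolution layer: aggregate neighbor features via the
    adjacency matrix, transform with a weight matrix, then apply ReLU."""
    aggregated = A @ H                 # each node sums its neighbors' features
    transformed = aggregated @ W       # multiply by the learnable weight matrix
    return np.maximum(transformed, 0)  # non-linear activation (ReLU)

# Toy skeleton graph: 3 joints in a chain, with self-loops included
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)
A = A / A.sum(axis=1, keepdims=True)   # row-normalize the adjacency matrix

H = np.random.randn(3, 4)              # 3 joints, 4 input channels
W = np.random.randn(4, 8)              # project to 8 output channels
H_next = gcn_layer(H, A, W)            # updated node features, shape (3, 8)
```

Stacking such layers lets each node's representation incorporate information from increasingly distant neighbors in the skeleton graph.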
8. Dataset
• The model is experimented on a large-scale indoor action recognition dataset, NTU-RGB+D.
• It is one of the most comprehensive datasets with 3D annotations for human action recognition tasks.
• The video clips have been acquired by Microsoft Kinect v2 sensors.
• It covers 60 action classes across 56,880 action samples, with the actions carried out by 40
distinct test subjects.
• The action classes cover a wide variety of daily actions, including walking, waving, clapping, sitting down,
getting up, and playing musical instruments. It consists of both individual actions and interactions
between two subjects.
• The annotations that are obtained by the Kinect depth sensors offer 3D joint positions (X, Y, Z).
• A total of 25 joints are present in each subject in the skeleton series.
• The dataset consists of two evaluation benchmarks:
1. Cross-view (X-view): 3 different cameras capture the videos. The training set comprises 37,820 video clips and the test
set consists of 18,960 video clips.
2. Cross-subject (X-sub): Focuses on cross-subject action recognition. The training set has 40,320 videos and the test
set contains 16,560 videos.
10. Proposed Approach
• The proposed approach is an attention-based model for human action recognition that uses both
temporal and spatial attention modules to improve recognition accuracy.
• The temporal attention module selects the most informative frames from a sequence of skeletons,
capturing the action's critical temporal dynamics.
• Following that, the spatial attention mechanism highlights the most significant joints within the
selected frames, emphasizing their distinguishing characteristics.
• The computed attention scores are then used to select frames, allowing the identification of the
skeletons with the highest attention values.
• Both temporal and spatial relationships are effectively utilized in the skeletal data by incorporating
the attention modules into a graph convolutional network.
• The temporal and spatial attention mechanisms together improve the efficiency of skeleton-based
human action recognition, resulting in more accurate and robust recognition results.
11. Spatial Graph Convolution Module
• Spatial Graph Convolution block focuses on capturing spatial relationships.
• It operates by considering the nodes in the network as skeletal joints and the edges as connections that
describe interactions between these joints.
• The block aggregates information from nearby nodes in order to capture local interactions and
dependencies.
• The block accepts a tensor of shape (N, Cin, T, V) as input, where N denotes the batch size, T represents
the sequence length, Cin represents the number of input channels, and V represents the number of
joints.
• Along with the input tensor, an adjacency matrix is used which depicts pairwise interactions between
the joints.
• The block then runs a graph convolution operation, which updates the characteristics of each joint by
aggregating information from its nearby joints.
• Non-linear transformations are used after the graph convolution function to incorporate non-linearity
and improve discriminative capability of the features.
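A minimal NumPy sketch of the operations this block describes, operating on the (N, Cin, T, V) tensor; the function name `spatial_graph_conv`, the identity placeholder adjacency, and the single shared weight matrix are simplifying assumptions for illustration:

```python
import numpy as np

def spatial_graph_conv(x, A, W):
    """Spatial graph convolution over a skeleton tensor.
    x: (N, Cin, T, V) batch of skeleton sequences
    A: (V, V) normalized adjacency matrix of the skeleton graph
    W: (Cin, Cout) learnable weights shared across joints and frames"""
    # Aggregate each joint's features from its graph neighbors
    x = np.einsum('nctv,vw->nctw', x, A)
    # Channel transformation with the shared weight matrix
    x = np.einsum('nctv,cd->ndtv', x, W)
    return np.maximum(x, 0)  # non-linearity after the graph convolution

N, Cin, T, V, Cout = 2, 3, 10, 25, 16
x = np.random.randn(N, Cin, T, V)
A = np.eye(V)   # placeholder; a real model uses the skeleton's bone graph
W = np.random.randn(Cin, Cout)
out = spatial_graph_conv(x, A, W)   # shape (2, 16, 10, 25)
```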
12. Temporal Graph Convolution Module
• Temporal convolution block captures the dynamics of temporal information in the skeletal data.
• The block takes as input the output of the spatial graph convolution block.
• For temporal graph convolution, the neighborhood of each vertex is extended to include each node's
temporal neighbors.
• Each node in the preceding and following skeleton frames is linked to the same node, resulting in a
temporal neighborhood size of 2 for each node.
• A 2D convolution is performed to process the extended neighborhood, with a kernel which determines
the convolution operation's temporal receptive field.
• The temporal convolution block aggregates the features of each individual body joint at various time
steps by using a fixed kernel size, allowing the model to capture the dynamics of temporal dimension
and patterns contained in the skeletal data.
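The temporal aggregation over each joint's neighboring frames can be sketched as a 1D convolution along the time axis. This is an illustrative simplification (a single kernel shared across channels and joints, zero padding to preserve sequence length); the name `temporal_conv` and the kernel values are assumptions:

```python
import numpy as np

def temporal_conv(x, kernel):
    """Temporal convolution over a skeleton tensor.
    x: (N, C, T, V); kernel: (k,) weights applied along the time axis,
    shared across channels and joints for simplicity."""
    N, C, T, V = x.shape
    k = len(kernel)
    pad = k // 2
    # Zero-pad the time axis so the sequence length is preserved
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for t in range(T):
        # Weighted sum over each joint's temporal neighborhood
        window = xp[:, :, t:t + k, :]                       # (N, C, k, V)
        out[:, :, t, :] = np.tensordot(kernel, window, axes=([0], [2]))
    return out

x = np.ones((1, 2, 6, 4))             # constant signal over 6 frames
kernel = np.array([0.25, 0.5, 0.25])  # temporal receptive field of 3
y = temporal_conv(x, kernel)          # interior frames stay at 1.0
```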
13. Temporal Attention Block
• Temporal attention is used to detect relevant frames within a sequence of frames.
• Its goal is to emphasize and recognize the frames that make a significant contribution to the overall
interpretation of an action.
• The temporal attention module computes the average activation over all joints and channels for each
frame.
• It aggregates the collective information inside each frame, indicating its overall significance.
• The model learns the weights associated with each frame by passing the aggregated frame-level
activations through a linear layer.
• A sigmoid activation function is then applied to these weights which produces attention weights for
each frame.
• These attention weights serve as a mask, selectively amplifying or suppressing specific frame activations.
• The model can successfully focus on the frames that are deemed significant or informative for the given
action by applying attention weights to the input.
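The averaging, linear layer, and sigmoid masking described above can be sketched as follows. This is a minimal NumPy illustration assuming a scalar per-frame linear layer; the name `temporal_attention` and the parameters `w`, `b` are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def temporal_attention(x, w, b):
    """Temporal attention over a skeleton tensor x of shape (N, C, T, V).
    Returns the re-weighted tensor and the per-frame attention weights."""
    # Average activation over all channels and joints for each frame
    frame_feat = x.mean(axis=(1, 3))        # (N, T)
    # Linear layer followed by a sigmoid gives weights in (0, 1)
    attn = sigmoid(frame_feat * w + b)      # (N, T)
    # The weights act as a mask, amplifying or suppressing each frame
    return x * attn[:, None, :, None], attn

x = np.random.randn(2, 3, 10, 25)           # (batch, channels, frames, joints)
out, attn = temporal_attention(x, w=1.0, b=0.0)
```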
14. Spatial Attention Block
• Spatial attention is concerned with identifying important joints within the frames highlighted in the
temporal attention module.
• It computes the average activation for each joint over all frames and channels.
• The joint-level activations are then transferred to a linear layer, which allows the model to learn the
weights associated with each joint.
• These weights are then fed into a sigmoid activation function, which produces attention weights that
indicate the relative significance of each joint.
• The generated attention weights operate as a mask, selectively adjusting the activations of individual
joints based on their relevance by applying the attention weights to the attended frames.
• Finally, a subset of the most informative skeletons is fetched from the given sequence.
• The skeletons are sorted in decreasing order of their attention weights.
• These selected skeletons are then incorporated into the network's subsequent layers for additional
processing and analysis.
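The joint weighting and top-k frame selection can be sketched together. This is a simplified NumPy illustration; the names `spatial_attention` and `select_frames`, the scalar linear parameters, and the hand-picked frame weights are assumptions for the example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(x, w, b):
    """Spatial attention over x of shape (N, C, T, V): weight each joint."""
    joint_feat = x.mean(axis=(1, 2))         # (N, V) average per joint
    attn = sigmoid(joint_feat * w + b)       # (N, V) joint weights in (0, 1)
    return x * attn[:, None, None, :], attn

def select_frames(x, frame_attn, k):
    """Keep the k frames with the highest temporal attention weights."""
    # Sort frame indices by decreasing attention weight, take the top k
    order = np.argsort(-frame_attn, axis=1)[:, :k]            # (N, k)
    return np.stack([x[n][:, order[n], :] for n in range(x.shape[0])])

x = np.random.randn(2, 3, 10, 25)
attended, joint_attn = spatial_attention(x, w=1.0, b=0.0)

# Frame selection on a tiny sequence where frame t holds the value t
seq = np.arange(4.0)[None, None, :, None] * np.ones((1, 1, 4, 2))
frame_attn = np.array([[0.1, 0.9, 0.5, 0.2]])   # hypothetical frame weights
top = select_frames(seq, frame_attn, k=2)       # frames 1 and 2 are kept
```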
15. Network Architecture
• Skeleton data, comprising the coordinate locations of all the joints of the skeleton, is fed as input to
the model.
• The model is made up of six graph convolutional layers and an attention module.
• Initially, two spatial graph convolutional layers are present in the model. Spatial Graph Convolution layer
focuses on capturing spatial relationships.
• Temporal and Spatial attention blocks are present after the spatial graph convolutional layers. The temporal
attention module captures the most informative frames from a sequence of skeletons. The spatial attention
mechanism emphasizes the most informative joints from the frames highlighted. Frame selection is then
performed to select the skeletons with the highest attention scores.
• The last four graph convolutional layers combine spatial and temporal convolutions. Temporal dynamics in the
skeletal data is captured by the temporal convolution block. The block takes as input the output of the spatial
graph convolution block.
• Each skeleton sequence's enhanced spatiotemporal characteristics are passed through a global average
pooling, yielding a 256-dimensional output feature vector.
• Finally, human actions are classified using a fully connected layer with a SoftMax classifier. To reduce
classification error, the model is trained end-to-end via backpropagation.
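The final stage of the architecture (global average pooling to a 256-dimensional vector, then a fully connected SoftMax classifier over the 60 action classes) can be sketched as follows. This is a minimal NumPy illustration with randomly initialized weights, not the trained model; the name `classification_head` is hypothetical:

```python
import numpy as np

def classification_head(features, W_fc, b_fc):
    """Global average pooling over the temporal and joint dimensions,
    followed by a fully connected layer with a SoftMax classifier.
    features: (N, C, T, V) output of the last graph convolutional layer."""
    pooled = features.mean(axis=(2, 3))      # (N, C), here C = 256
    logits = pooled @ W_fc + b_fc            # (N, num_classes)
    # Numerically stable SoftMax over the class dimension
    z = np.exp(logits - logits.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

N, C, T, V, num_classes = 2, 256, 20, 25, 60
feats = np.random.randn(N, C, T, V)
probs = classification_head(feats,
                            np.random.randn(C, num_classes) * 0.01,
                            np.zeros(num_classes))
```

Each row of `probs` is a distribution over the 60 NTU-RGB+D action classes; training minimizes the classification error of this output end-to-end via backpropagation.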
17. Experimental Results
• The model achieved Top-1 and Top-5
accuracies of 83.58% and 97.07% on the
cross-subject benchmark, and Top-1
and Top-5 accuracies of 91.22% and
98.85% on the cross-view benchmark.
• The model is compared with other notable
approaches on Top-1 accuracy for
both the cross-subject and cross-view
benchmarks.
18. Conclusion
• Utilization of temporal and spatial attention mechanisms helped in enhancing the model’s
performance.
• The model is evaluated on a widely used NTU-RGB+D benchmark, assessing its top-1 and top-5
accuracies on the dataset’s cross-view and cross-subject benchmarks.
• In comparison to other skeleton-based models, we observed a significant difference in
performance between RNN and CNN-based methods and our method. Furthermore, our method
outperformed other GCN-based models, demonstrating the benefit of incorporating spatial and
temporal attention mechanisms to graph convolutional network which enhanced the model's
accuracy and efficiency.
19. References
• S. Yan, Y. Xiong and D. Lin, "Spatial temporal graph convolutional networks for skeleton-based
action recognition," in Proceedings of the AAAI conference on artificial intelligence, 2018.
• N. Heidari and A. Iosifidis, "Temporal attention-augmented graph convolutional network for
efficient skeleton-based human action recognition," in 2020 25th International Conference on
Pattern Recognition (ICPR), 2021.
• T. N. Kipf and M. Welling, "Semi-Supervised Classification with Graph Convolutional Networks,"
in International Conference on Learning Representations, 2017.
• A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar and L. Fei-Fei, "Large-scale video
classification with convolutional neural networks," 2014.
• Y. Du, W. Wang and L. Wang, "Hierarchical recurrent neural network for skeleton based action
recognition," in Proceedings of the IEEE conference on computer vision and pattern recognition,
2015.