Human action recognition with CNN is a thesis paper based on background reduction using Mask R-CNN; using a 3D CNN, the results are evaluated on two base models, ResNet50 and VGG16.
2. 2
Advisor
Dr. Md. Abu Layek
Associate Professor
Department of Computer Science and Engineering
Jagannath University
Md Monirul Islam
ID: B170305034
Department of Computer Science & Engineering
Jagannath University
monirulshahinme2@gmail.com
Shazid Ahmed Rajib
ID: B170305049
Department of Computer Science & Engineering
Jagannath University
shazidahmed159@gmail.com
Human Action Recognition with Background Subtraction and 3D CNN
3. 3
Introduction
• Problem Statement
• Motivation
• Proposed Solution
• Background Study
Literature Review
CNN Architecture
• VGG16
• ResNet
Materials
Methodology
• Tools
• Proposed Methodology
Evaluations and Results
Conclusion & Possible Improvements
• Summary
• Limitations & Future Directions
4. 4
Introduction Literature Review CNN Architecture Materials Methodology Evaluation Conclusion
As described by the author, accuracy is lower for some classes because
they share the same background elements. Our goal is therefore to
eliminate the background elements using preprocessing techniques.
6. 6
How does deep learning help with human action recognition?
- Feature Extraction: It automates the extraction of relevant
features from raw data, which is crucial for recognizing human
actions.
- Neural Networks: Utilizes complex neural networks capable of
processing large volumes of video data to identify intricate action
patterns.
- Spatial-Temporal Analysis: Employs models like CNNs and RNNs
to capture spatial and temporal dependencies, thereby improving
recognition accuracy.
7. 7
• Lower accuracy in a few classes (Biking, Swing, Walking with Dog)
• These classes share the same background elements
• Low input resolution
Solution
1. Clear the background noise as much as possible.
2. Develop an automatic background removal system to speed up
the process.
8. 8
1. HAR is a significant challenge for various reasons.
2. The use of cameras has expanded.
3. HAR can help identify any kind of crime or violence.
9. 9
Data Preprocessing → Data Background Noise Reduction → Multiple CNN Architectures → Result
10. 10
• Deep learning is a subfield of machine learning based on artificial neural networks (ANNs).
• A neural network is either shallow or deep.
A shallow neural network consists of
• an input layer
• one hidden layer
• an output layer
A deep neural network consists of
• an input layer
• more than one hidden layer
• an output layer
11. 11
• In deep learning, the hidden units in the hidden layers act like biological neurons.
• Each hidden unit is called a neuron.
• The network takes inputs from the input layer, processes them in each hidden
unit to reach a decision, and passes the outputs from one hidden layer to the
next.
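The flow from input layer through hidden layers to output layer can be sketched in a few lines of NumPy. This is only an illustration: the layer sizes, the random weights, the tanh activation, and the helper name `forward` are assumptions, not values taken from the thesis.

```python
import numpy as np

def forward(x, layers):
    """Pass x through (weights, bias) layers; hidden layers apply tanh."""
    *hidden, (w_out, b_out) = layers
    for w, b in hidden:
        x = np.tanh(x @ w + b)   # each hidden unit (neuron) transforms its inputs
    return x @ w_out + b_out     # output layer, left linear in this sketch

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 3]  # hypothetical: 4 inputs, two hidden layers of 8, 3 outputs
layers = [(rng.normal(size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=(1, 4))      # one input sample
output = forward(x, layers)
print(output.shape)              # (1, 3)
```

With a single hidden layer in `sizes` the same function describes a shallow network; adding more entries makes it deep.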
12. 12
• In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of
deep neural networks, most commonly applied to analyze visual imagery.
• A CNN model consists of three types of layers:
• Convolutional layer
• Pooling layer
• Fully connected layer
13. 13
• Convolutional layer:
• Convolutional layers convolve the input and pass the result to the next layer.
• This layer extracts features using various kernels (filters).
• The objective of the convolution operation is to extract features such as
edges from the input image.
• The first convolutional layer captures low-level features such as edges,
color, and gradient orientation. With added layers, the architecture adapts
to high-level features as well.
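A minimal sketch of how a single kernel extracts an edge feature, assuming a toy 5×5 image and a Prewitt-style vertical-edge kernel (both chosen only for illustration; `conv2d_valid` is a hypothetical helper, not part of the thesis code). As in most deep learning frameworks, the kernel is applied without flipping.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image with no padding and stride 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the window and the kernel, then sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image: dark on the left (0), bright on the right (1).
image = np.zeros((5, 5))
image[:, 3:] = 1.0

# Responds strongly where intensity rises from left to right.
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

feature_map = conv2d_valid(image, kernel)
print(feature_map)
# [[0. 3. 3.]
#  [0. 3. 3.]
#  [0. 3. 3.]]
```

The zero column marks the flat region; the nonzero columns mark windows that straddle the edge.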
15. 15
• Pooling layer:
• The pooling layer reduces the spatial size of the convolved
feature.
• It decreases the computational power required to process the data through
dimensionality reduction.
• There are two types of pooling:
1. Max pooling
2. Average pooling
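The two pooling types can be compared on a small feature map. The sketch below assumes non-overlapping 2×2 windows and made-up input values; `pool2d` is a hypothetical helper.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling; rows/columns that do not fill a window are dropped."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    # Split the map into (size x size) blocks: axes 1 and 3 index inside a block.
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "average":
        return blocks.mean(axis=(1, 3))
    raise ValueError("mode must be 'max' or 'average'")

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 9, 1],
                        [3, 4, 5, 6]], dtype=float)

max_pooled = pool2d(feature_map, mode="max")      # [[6. 4.] [7. 9.]]
avg_pooled = pool2d(feature_map, mode="average")  # [[3.75 2.25] [4. 5.25]]
print(max_pooled)
print(avg_pooled)
```

Either way the 4×4 map shrinks to 2×2, which is the dimensionality reduction the slide refers to.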
17. 17
Reference: Performance Comparison of ResNet50V2 and VGG16 Models for Feature Extraction in Deep Learning
Contribution: The study aimed to compare the performance of ResNet50V2 and VGG16 for feature extraction in image classification tasks.
Drawback: The paper suggests that while both models are effective, VGG16 may be less efficient due to slower convergence and lower accuracy in certain tasks.
Key contribution: ResNet50V2 outperformed VGG16, exhibiting faster convergence and achieving higher accuracy in the context of masked face recognition.

Reference: Human Action Recognition from Various Data Modalities
Contribution: The paper reviews the use of various data modalities in HAR, including the application of ResNet and VGG16.
Drawback: The review does not provide a direct comparison between the models.
Key contribution: It highlights the importance of multimodal data for improving the accuracy of HAR systems.
18. 18
Reference: Modern architectures convolutional neural networks in human activity recognition
Contribution: Discusses the role of modern CNN architectures like ResNet and VGG16 in HAR.
Drawback: Specific drawbacks of each model in the context of HAR are not detailed.
Key contribution: Emphasizes the advancements in CNN architectures that enhance HAR performance.
20. 20
• We used two CNN architectures:
• VGG-16
• ResNet-50
• Both architectures were successful in the ImageNet Large Scale Visual
Recognition Challenge (ILSVRC), which evaluates algorithms for object
detection and image classification at large scale.
21. 21
VGG16 (Visual Geometry Group):
• VGG16 was developed by the Visual Geometry Group at the University of
Oxford. At ILSVRC 2014 it placed first in the localization task and
second in the classification task.
• It has 16 weight layers.
Layer type              Quantity
Convolutional layer     13
Fully connected layer   3
Total                   16
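The layer count in the table can be checked by tallying the standard VGG-16 layout. The list below is the commonly published configuration (numbers are convolution output channels, "M" marks a max-pooling step); the constant names are hypothetical and used only for this sketch.

```python
# Standard VGG-16 layout: integers are conv output channels, "M" is max pooling.
VGG16_CONV_CONFIG = [64, 64, "M",
                     128, 128, "M",
                     256, 256, 256, "M",
                     512, 512, 512, "M",
                     512, 512, 512, "M"]
# Fully connected head as used for the 1000-class ImageNet task.
VGG16_FC_UNITS = [4096, 4096, 1000]

conv_layers = sum(1 for v in VGG16_CONV_CONFIG if v != "M")
pool_layers = VGG16_CONV_CONFIG.count("M")
fc_layers = len(VGG16_FC_UNITS)
weight_layers = conv_layers + fc_layers  # pooling has no weights, so it is not counted

print(conv_layers, fc_layers, weight_layers)  # 13 3 16
```

The five pooling steps are why the network is deeper than 16 operations while still being called VGG16.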
22. 22
ResNet-50:
• ResNet won the ImageNet challenge (ILSVRC) in 2015.
• ResNet-50 contains 50 layers.
24. 24
Pre-trained dataset
• The ImageNet dataset has more than 15 million labeled images belonging
to about 22,000 categories.
Framework
• Keras, an open-source deep learning library written in Python, is used.
Activation function
• ReLU (Rectified Linear Unit), a non-linear activation function.
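ReLU itself is a one-line function: it returns the input when positive and zero otherwise. A small sketch with made-up input values:

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: max(0, x), applied element-wise."""
    return np.maximum(0, x)

activations = relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0]))
print(activations)  # [0.  0.  0.  1.5 3. ]
```

Negative inputs are clipped to zero, which is what makes the function non-linear.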
26. 26
Hardware and Software Requirements
Tools
CPU: 64-bit
RAM: 32 GB
Operating system: Windows 11
Programming language: Python
27. 27
• The data were collected from Kaggle’s data repository.
• This dataset is composed of a set of 101 subjects.
• We will be using the UCF101 dataset.
• It has 101 classes of human actions, and each class
contains more than 100 videos on average.
• The frames will be extracted from our dataset, and any
background elements will be removed before we begin
processing the data.
• Furthermore, we will maintain a 224×224 resolution
for the images.
28. 28
• Background subtraction with Mask R-CNN
• Extracting frames
• Training the ResNet CNN on the frames
29. 29
Figure: Background subtraction using Mask R-CNN
31. 31
After gathering the data, we first extracted frames from each
video. We then removed the background, keeping only the
components needed to detect the object of interest.
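Once a per-pixel mask of the person is available, removing the background is a simple masking step. The sketch below assumes the mask is a boolean array of the same height and width as the frame (in the thesis it would come from Mask R-CNN; here it is a hand-made rectangle), and filling the background with black is also an assumption. `remove_background` is a hypothetical helper.

```python
import numpy as np

def remove_background(frame, person_mask):
    """Keep pixels where person_mask is True; set all other pixels to black."""
    if frame.shape[:2] != person_mask.shape:
        raise ValueError("mask and frame must have the same height and width")
    # Broadcast the 2D mask across the color channels.
    return np.where(person_mask[..., None], frame, 0).astype(frame.dtype)

# Stand-in 224x224 RGB frame (uniform gray) and a rectangular "person" mask.
frame = np.full((224, 224, 3), 200, dtype=np.uint8)
person_mask = np.zeros((224, 224), dtype=bool)
person_mask[60:180, 90:140] = True

foreground = remove_background(frame, person_mask)
print(foreground[100, 100], foreground[0, 0])  # [200 200 200] [0 0 0]
```

The resulting frames keep the 224×224 resolution, so they can be fed to the classifier without further resizing.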
33. 33
• 80% training and testing accuracy
• More than 90% accuracy on new videos
• Background elements were the issue
Figure: Training vs. testing accuracy and training vs. testing loss of VGG16
35. 35
Figure: Training vs. testing accuracy and training vs. testing loss of ResNet50
37. 37
Model     Accuracy   Precision   Recall   F1 Score
ResNet    93.93%     95%         93%      94%
VGG-16    51.68%     47%         56%      52%
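For reference, the F1 score is the harmonic mean of precision and recall. The sketch below applies that formula to the rounded precision and recall values from the table. The slides do not say how the reported scores were aggregated, so if they were averaged per class (an assumption), the table values can differ slightly from this calculation.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Rounded (precision, recall) pairs copied from the results table.
reported = {"ResNet": (0.95, 0.93), "VGG-16": (0.47, 0.56)}

f1 = {model: f1_score(p, r) for model, (p, r) in reported.items()}
for model, value in f1.items():
    print(f"{model}: F1 = {value:.3f}")
# ResNet: F1 = 0.940
# VGG-16: F1 = 0.511
```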
39. 39
• The same approach can be applied to various video classification problems.
Limitations
• Lack of a large original dataset with a variety of subjects.
• The study depends only on built-in CNN architectures.
40. 40
Future Directions
• Custom object detection is needed.
• A CNN+LSTM model can be implemented in future work.
• Pose estimation values can be added to the model.
42. 42
THANKS!