The document proposes a method for recognizing abnormal behavior on campus using temporal segment transformers (TST). It introduces the CABR50 dataset containing 50 abnormal behavior classes. The proposed TST model divides videos into three equal segments and samples each to capture motion sequences. Experiments show the TST-L+ model achieves 83.57% top-1 and 97.16% top-5 accuracy on the CABR50 dataset, outperforming other models. The advantages of TST include temporal understanding, fine-grained analysis, and ability to recognize complex patterns, allowing for automated abnormal behavior detection.
Recognizing Campus Behavior with Temporal Segment Transformers
Base paper Title: Campus Abnormal Behavior Recognition With Temporal Segment
Transformers
Modified Title: Recognizing Aberrant Behavior on Campus Using Temporal Segment
Transformers
Abstract
An intelligent campus surveillance system helps improve school safety.
Abnormal behavior recognition, a field of action recognition in computer vision, plays an
essential role in intelligent surveillance systems. Computer vision has been actively applied to
action recognition systems based on Convolutional Neural Networks (CNNs). However,
capturing sufficient motion sequence features from videos remains a significant challenge in
action recognition. This work explores the challenges of video-based abnormal behavior
recognition on campus. In addition, a novel framework is established on long-range temporal
video structure modeling and a global sparse uniform sampling strategy that divides a video
into three segments of identical durations and uniformly samples each snippet. The proposed
method incorporates a consensus of three temporal segment transformers (TST) that globally
connect patches and compute self-attention with joint spatiotemporal factorization. The
proposed model is developed on the newly created campus abnormal behavior recognition
(CABR50) dataset, which contains 50 human abnormal action classes with an average of over
700 clips per class. Experiments show that it is feasible to implement abnormal behavior
recognition on campus and that the proposed method is competitive with peer video
recognition methods in terms of Top-1 and Top-5 recognition accuracy. The results suggest
that TST-L+ can improve campus abnormal behavior recognition, achieving Top-1 and Top-5
accuracies of 83.57% and 97.16%, respectively.
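The global sparse uniform sampling strategy described in the abstract can be illustrated with a minimal sketch. It splits a video into three segments of identical duration and picks uniformly spaced frame indices inside each one. The function name sample_segment_indices and the parameter frames_per_segment are hypothetical, and the number of frames sampled per segment is an assumption, since this document does not state the value used by the authors.

```python
def sample_segment_indices(num_frames, num_segments=3, frames_per_segment=8):
    """Return one list of uniformly spaced frame indices per segment.

    Assumption: frames_per_segment is an illustrative value, not a
    setting taken from the base paper.
    """
    if num_frames < num_segments:
        raise ValueError("video has fewer frames than segments")
    if frames_per_segment < 1:
        raise ValueError("frames_per_segment must be at least 1")

    segment_length = num_frames / num_segments  # identical duration per segment
    step = segment_length / frames_per_segment  # spacing between samples
    indices = []
    for s in range(num_segments):
        start = s * segment_length
        # Take the frame at the center of each equal sub-interval,
        # clamped to the last valid frame index.
        segment = [
            min(int(start + (i + 0.5) * step), num_frames - 1)
            for i in range(frames_per_segment)
        ]
        indices.append(segment)
    return indices


if __name__ == "__main__":
    # A 90-frame clip, 3 segments, 5 frames per segment.
    print(sample_segment_indices(90, num_segments=3, frames_per_segment=5))
    # [[3, 9, 15, 21, 27], [33, 39, 45, 51, 57], [63, 69, 75, 81, 87]]
```

Each inner list would feed one of the three temporal segment transformers, so every transformer sees a sparse but evenly spread view of its own third of the video.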
Existing System
Campus abnormal behavior recognition refers to using surveillance devices and
artificial intelligence to identify unusual or potentially threatening behavior on campus. Video
understanding is a core technology [1], [2], [3], [4] in many scenarios of surveillance systems.
Over the years, unexpected actions, such as fighting, accidents, falling, and suicides, have
occurred frequently in schools, causing general concern. Recognizing abnormal behavior can
provide real-time and efficient warnings, which benefits school safety management.
Researchers focus on directly exploring abnormal behaviors instead of relying heavily on
pre-processing to classify video behaviors [5], [6]. Some have focused on applications in
specific scenarios on campus, such as classrooms [45] and laboratories [46], [47]. However,
there is little research on campus abnormal behavior recognition. Essentially, abnormal
behavior recognition is a broad application of video understanding. Motivated by video
understanding, this study aims to provide an effective solution for recognizing video-based
abnormal behavior on campus.
Drawbacks of Existing System
Limited Training Data: The performance of TST models heavily relies on the
availability and quality of training data. If the dataset is small or biased, the model may
not generalize well to diverse abnormal behaviors.
Interpretability Challenges: Understanding how and why a TST model makes a
particular prediction can be difficult. The complex nature of the model might hinder
interpretability, making it hard to trust or explain its decisions, which is crucial in
sensitive environments like campuses.
Privacy and Surveillance: Implementing such systems raises privacy concerns as they
involve continuous monitoring of individuals' behaviors. It can lead to conflicts
between ensuring safety and respecting privacy rights.
Maintenance and Updates: Continuous maintenance and updates are required to keep
the model relevant and effective. This includes retraining on new data and adapting to
evolving behaviors or threats.
Proposed System
The proposed models are compared with three relevant approaches:
TSN [13], SlowFast [22], and Swin-B [31]. In addition, this work attempts to innovate
in abnormal behavior identification on campus. First, the backbone network consists of
the video shifted windows (Swin) transformer [31].
We propose a consensus of three temporal segment transformers (TST) based on the
video Swin transformer for the new campus abnormal behavior recognition (CABR50)
dataset. It enhances the ability to capture motion sequences and model long-range
abnormal behavior on campus.
Related work proposes a deep convolutional network architecture to detect and classify the
behavioral patterns of students and teachers in computer-enabled laboratories.
This section describes the proposed TST based on the video Swin transformer [31] for
abnormal campus behavior recognition.
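The idea of a consensus over three temporal segment transformers can be sketched as follows. The sketch assumes the consensus is a simple average of per-class scores across segments followed by a softmax, which is the aggregation commonly used in segment-based models such as TSN. This document does not specify the consensus function, so the averaging step and the function names softmax and segment_consensus are assumptions for illustration only.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def segment_consensus(segment_scores):
    """Fuse per-segment class scores into one video-level prediction.

    Assumption: the consensus is an element-wise mean over segments;
    the base paper may use a different aggregation.
    """
    if not segment_scores:
        raise ValueError("need scores from at least one segment")
    num_classes = len(segment_scores[0])
    if any(len(scores) != num_classes for scores in segment_scores):
        raise ValueError("all segments must score the same classes")

    num_segments = len(segment_scores)
    mean_scores = [
        sum(scores[c] for scores in segment_scores) / num_segments
        for c in range(num_classes)
    ]
    return softmax(mean_scores)


if __name__ == "__main__":
    # Toy scores from three segment models over three classes.
    scores = [
        [2.0, 0.0, 1.0],  # first segment
        [1.0, 0.0, 3.0],  # second segment
        [0.0, 0.0, 2.0],  # third segment
    ]
    probabilities = segment_consensus(scores)
    predicted = max(range(len(probabilities)), key=probabilities.__getitem__)
    print(predicted)  # 2
```

Averaging before the softmax lets a segment with strong evidence outweigh segments where the behavior is not visible, which is one reason this form of consensus is a common default.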
Algorithm
These algorithms relate to decomposing spatiotemporal self-attention using
different factorization methods. They achieve better results than previous pure CNN
methods and methods that add self-attention units to videos.
TST-H and TST-L gain 2.85% and 5.29% in Top-1 accuracy, respectively, over the
previous algorithms, along with a correspondingly significant complexity enhancement.
The t-distributed Stochastic Neighbor Embedding (t-SNE) visualization algorithm is used to
reduce the dimensionality of the features and project them into a low-dimensional space.
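The Top-1 and Top-5 figures quoted in this document are top-k accuracies: a prediction counts as correct when the true class is among the k highest-scoring classes. A minimal sketch of that metric is shown here. The function name top_k_accuracy and the toy scores are hypothetical and are not taken from the CABR50 experiments.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k best-scored classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")

    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


if __name__ == "__main__":
    # Three toy samples scored over three classes.
    scores = [
        [0.1, 0.7, 0.2],  # true class 1, ranked first
        [0.5, 0.3, 0.2],  # true class 1, ranked second
        [0.2, 0.3, 0.5],  # true class 0, ranked last
    ]
    labels = [1, 1, 0]
    print(top_k_accuracy(scores, labels, k=1))  # one of three samples correct
    print(top_k_accuracy(scores, labels, k=2))  # two of three samples correct
```

With 50 classes, as in CABR50, Top-5 is a much looser criterion than Top-1, which is why the reported Top-5 value is considerably higher.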
Advantages
Temporal Understanding: TSTs excel in understanding temporal relationships within
data. In the context of abnormal behavior recognition, this temporal analysis allows the
model to capture nuanced patterns and sequences of actions, enhancing the accuracy of
identifying anomalies.
Fine-Grained Analysis: TSTs break long videos or
sequences into smaller segments, enabling a more granular analysis. This granularity
can help in pinpointing specific moments or sequences that constitute abnormal
behavior.
Complex Pattern Recognition: TSTs can recognize complex patterns that might be
challenging for traditional models or human observers to detect. This includes subtle
changes or deviations in behavior that might indicate potential risks or anomalies.
Scalability and Automation: These models can efficiently process large volumes of
video data, allowing for the automation of surveillance and detection processes across
a campus or large area, which could be challenging for manual monitoring.
Hardware Specification
Processor : Intel Core i3
RAM : 4 GB
Hard disk : 500 GB
Software Specification
Operating System : Windows 10 /11
Front End : Python
Back End : MySQL Server
IDE Tools : PyCharm