
1530-437X (c) 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSEN.2020.3028362, IEEE Sensors Journal
IEEE SENSORS JOURNAL, VOL. XX, NO. XX, XXXX 2020 1
Active shooter detection in multiple-person
scenario using RF based Machine Vision
Omid Bazgir, Daniel Nolte, Saugato Rahman Dhruba, Yiran Li, Changzhi Li, Souparno Ghosh, and
Ranadip Pal
Abstract— Emerging applications of radio frequency (RF) vision sensors for security and gesture recognition primarily
target single-individual scenarios, which restricts potential applications. In this article, we present the design of a
cyber-physical framework that analyzes RF micro-Doppler signatures for individual anomaly detection, such as a hidden rifle
among multiple individuals. RF avoids certain limitations of video surveillance, such as the inability to recognize concealed
objects and privacy concerns. Current RF-based approaches for human activity detection or gesture recognition usually consider
single-individual scenarios, and the features extracted for such scenarios are not applicable to multi-person cases. From
a machine learning perspective, RF sensor spectrogram images are well suited to training deep convolutional neural networks.
However, generating a large labeled training dataset with an exhaustive variety of multi-person scenarios is extremely
time-consuming and nearly impossible due to the wide range of possible combinations. We present approaches for multi-person
spectrogram generation based on individual-person spectrograms that can augment the training dataset and increase prediction
accuracy. Our results show that the spectrograms generated by RF sensors can be harnessed by artificial intelligence
algorithms to detect anomalies such as a concealed weapon in single- and multiple-person scenarios. The proposed system can
serve as a standalone tool, or be complemented by video surveillance, for anomaly detection in scenarios involving single
or multiple individuals.
Index Terms— Deep Learning, Bayesian optimization, Radar, Data Augmentation, Convolutional Neural Network
I. INTRODUCTION
This article considers the problem of anomaly detection in
multi-person scenarios using RF-based machine vision.
The specific application problem is to detect a concealed
weapon carried by an individual in a crowd. A person carrying
a concealed weapon refers to a person concealing a weapon
(such as a rifle) under their clothes (such as an overcoat). Note
that active shooting incidents have been on the rise in the
last two decades, and a significant portion of the incidents
involve shotguns or rifles [1]. Furthermore, the number of
crimes involving guns per 100,000 inhabitants is very high in
many countries, e.g., 21.5 in Mexico, 4.7 in the United States
and 1.6 in Belgium [2]. Thus, an early recognition technology
for detecting concealed weapons can be a beneficial tool
for surveillance purposes. Current technologies for anomaly
detection in a crowd usually rely on video surveillance, which
is of limited use for detecting a concealed rifle. Other forms
of shooter detection technologies, such as acoustic gunshot
identification and infrared camera gunfire flash detection [3],
only trigger an alarm after a weapon is fired. Thus, an effective
shooter detection system, especially one that detects shooters
armed with a concealed rifle or shotgun before the shooter draws
the weapon, is in high demand to prevent such tragedies [4],
[5]. In our earlier paper [4], we showed that an RF-based
machine vision system can be utilized to detect whether a
person has a concealed weapon. However, the manual feature
extraction-based approach for classifying a concealed weapon
followed in [4] turns out to be unsuitable for multi-person
scenarios. In this article, we address this issue by
designing a deep-learning system for detecting individual
anomalous behavior in a crowd. The problem of the limited
number of multi-person samples for training is addressed using
data augmentation, by synthetically generating multi-person RF
signatures from individual-person signatures. To the best of
our knowledge, this is the first approach to detect anomalies
in multi-person scenarios using an RF-based machine vision
system that incorporates deep learning and data augmentation.

O. Bazgir, D. Nolte, S.R. Dhruba, C. Li, and R. Pal are with the Department
of Electrical and Computer Engineering, Texas Tech University,
Lubbock, TX 79409 USA (email: ranadip.pal@ttu.edu).
S. Ghosh is with the Department of Mathematics and Statistics, Texas
Tech University, Lubbock, TX 79409 USA.
Y. Li was with the Department of Electrical and Computer Engineering,
Texas Tech University, Lubbock, TX 79409 USA. She is now with
United Imaging, Houston, TX 77054 USA.
Background
RF based approaches that consider the radar-measured
micro-Doppler signatures have been widely studied for target
recognition [6]–[8], activity classification [9], [10] and smart
home applications [11], [12]. For instance, [13] extracts gait
characteristics such as torso speed and arm and leg swing motion
from micro-Doppler signatures to enhance through-the-wall
surveillance applications. However, most studies have
concentrated on individual person activity or gait classification,
whereas in real life, the extension to multi-person scenarios is
highly relevant. A baseline approach to this problem would be
to train a model on single-person scenarios, using single-person
features, and then apply the learned model to multi-person
scenarios. However, this approach provides low accuracy, as
we observed in our case of concealed weapon detection, where
the problem consists of detecting whether any person among
a group of two or three people carries a concealed rifle based
on the micro-Doppler radar signature. Thus, to improve the
accuracy of detecting anomalies in multi-person scenarios, we
present a machine vision based framework that incorporates
synthetic data augmentation. We consider the setting in which
the experimental training dataset is limited to single-person
scenarios, while the classification model is tested on
multi-person scenarios.
The goal of the machine-vision system is to classify whether
any person among a group of two or three people carries a
concealed rifle. The assumption is that the radar spectrogram
will capture the difference in body movements for someone
with and without a concealed weapon [4]. However, a
micro-Doppler radar signature of multiple people contains a
superposition of the velocities of different people, which makes
it difficult to extract features such as limb velocity difference,
torso speed, bandwidth, and frequency gap [4] for the
individuals in the multi-person scenario. Thus, an automated
feature extraction approach is preferable to a manually designed
one. To tackle this issue, we consider a convolutional neural
network (CNN) based framework for automated feature
extraction and classification.
The RF sensor based images are suitable for deep learning
using the CNN architecture, as the neighboring pixels in the RF
sensor images are expected to be correlated and an automated
approach for feature extraction is desired. However, collecting
numerous training samples for multi-person scenarios is hard
to achieve in practice. For instance, if we consider 3-person
scenarios and m types of people (differing in gender, height,
weight, walking speed, etc.), then exhaustive training over
potential crowd scenarios would require $\binom{m}{3}$
combinations, with a multitude of training samples in each.
These combinations can require an extensive number of
experimental training samples; for instance, m = 15 results in
$\binom{15}{3} = 455$ scenarios. If 20 samples of each scenario
are desired for training, 9100 experimental training samples
would need to be captured and labeled, which is hard to achieve
in normal circumstances. Thus, a desirable approach is to use
individual-person images to generate the crowd images used for
training. This allows us to create numerous scenarios for
training without resorting to experimental generation of each
scenario.
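The scenario count above can be reproduced in a few lines. This is a minimal Python sketch, assuming (as the text does) that a crowd scenario is an unordered group of three distinct person types drawn from m types; the helper name `experimental_burden` is hypothetical. If repeated person types were allowed, the count would instead be C(m + 2, 3).

```python
from math import comb


def experimental_burden(m: int, group_size: int = 3, samples_per_scenario: int = 20):
    """Count crowd scenarios and labeled recordings needed for exhaustive training.

    Scenarios are unordered groups of `group_size` distinct person types out of
    `m` types, i.e. C(m, group_size), matching the count used in the text.
    """
    scenarios = comb(m, group_size)
    return scenarios, scenarios * samples_per_scenario


scenarios, recordings = experimental_burden(15)
print(scenarios, recordings)

# Allowing repeated person types (multisets of size 3) enlarges the count further.
print(comb(15 + 2, 3))
```

Either way, the number of recordings grows cubically with m, which is the motivation for synthesizing crowd images from single-person recordings.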
Contribution
In this article, we consider data augmentation for model
training using synthetically created radar signatures for
multiple-person scenarios based on radar signatures of in-
dividual persons. We show that the data augmentation and
subsequent Bayesian CNN based training can significantly
improve the performance of detecting anomalies in multiple-person
scenarios. We also explored how the ratio of normal to anomalous
training samples affects testing performance. We observed that a
balanced number of samples did not produce the best performance;
rather, skewing the training set towards more samples from the
anomalous class produced better testing performance. For
comparison purposes, we also considered one-class classification,
where only the normal samples are assumed to be available, and
the classification algorithm tries to detect whether a new sample
belongs to the distribution of the normal samples; if not, the
sample is considered anomalous [14]–[17].

Fig. 1: Application scenario of the proposed active shooter
detection system. (Diagram labels: a person with a concealed
rifle at distance R from the radar; radar output BI(t), BQ(t) in
Doppler mode; radar signal processing; micro-Doppler signature;
active shooter detector model classifying the person as shooter
or normal.)
The results reported in this paper show that synthetic radar
signature generation for multi-person scenarios and subsequent
data augmentation for training can be utilized to significantly
improve the anomaly detection performance for multi-person
scenarios.
II. METHODOLOGY
A. Data Acquisition
Our proposed potential active shooter detection system
is an application of RF radars that measure micro-Doppler
signatures. As shown in Fig. 1, the micro-Doppler signals
are obtained from the output of a radar operating in
Doppler mode. For this application, we consider that a radar
sensor can be mounted at the entrance of a building or a
monitored area inside a building. Thus, in the scenario that
a potential active shooter carrying a rifle/shotgun is walking
toward the door at distance R, the radar transmitted signal
is reflected by the moving target and the reflected signal is
received and processed by the radar receiver (Rx) end. Two
radar output channels are sampled and collected as the I and Q
channel radar outputs BI(t) and BQ(t). When the radar works
under the Doppler mode, one output channel is grounded,
and the other output channel is sampled and collected as the
beat signal BB(t). Micro-Doppler signatures are created by
applying the Short-Time Fourier Transform (STFT) with a sliding
Hamming window of 1.024 s to these outputs. We then
applied a noise removal procedure to the obtained signatures.
The main noise source is the experimental environment
(i.e., radar vibration due to the wind), which is minimal
in practical applications since the device is designed to be
mounted firmly on the gate or wall. To remove the noise, a power
density threshold is set, and any value below the threshold (−60
in this work) is suppressed. The information of human subject
movement is well preserved, while the noise at undesired
frequencies is filtered out. Fig. 2 illustrates the process of
deriving the signatures from the radar output, along with an
example of a clean signature (after noise removal) where a
human subject walked toward the radar sensor without holding
anything. The hardware architecture of the radar system and
relevant information can be found in detail in our previous
work [4].
Fig. 2: Micro-Doppler signature is created by applying Short-
Time Fourier Transform (STFT) to the radar output. The
sampled radar output is a series of voltage readings that
form a 1-D array along time. To perform STFT, the radar
output is divided into short segments with equal length, then
Fast Fourier Transform (FFT) is taken for each segment and
Doppler shift due to target movement is revealed for each short
period. Micro-Doppler signature of a human subject walking
naturally toward the radar sensor (a) before and (b) after noise
removal.
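The signature extraction and noise removal described above can be sketched with SciPy's `stft`. This is a minimal illustration, not the authors' pipeline: the sampling rate, the default 50% window overlap, and the choice to express power in dB relative to the spectrogram peak before applying the −60 threshold are all assumptions (the text gives only the 1.024 s Hamming window and the threshold value), and the helper name is hypothetical.

```python
import numpy as np
from scipy.signal import stft


def micro_doppler_signature(beat, fs, win_s=1.024, floor_db=-60.0):
    """Hypothetical helper: STFT of the Doppler-mode beat signal plus thresholding.

    Assumptions (not from the paper): default 50% overlap, and power expressed
    in dB relative to the spectrogram peak before applying the threshold.
    """
    nperseg = int(round(win_s * fs))              # 1.024 s Hamming window, in samples
    f, t, Z = stft(beat, fs=fs, window="hamming", nperseg=nperseg)
    power = np.abs(Z) ** 2
    power_db = 10.0 * np.log10(power / power.max() + 1e-12)
    # Noise removal: every value below the power-density threshold is suppressed.
    power_db = np.where(power_db < floor_db, floor_db, power_db)
    return f, t, power_db


# Toy check: a 50 Hz tone sampled at an assumed fs = 1000 Hz for 3 s.
fs = 1000
time = np.arange(0, 3.0, 1 / fs)
f, t, sig = micro_doppler_signature(np.sin(2 * np.pi * 50 * time), fs)
```

Rows of `sig` are Doppler-frequency bins and columns are time frames; converting frequency to velocity would additionally require the radar carrier frequency, which is not given here.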
B. Experimental setup
Experiments were conducted to evaluate the feasibility of
detecting a gun (rifle/shotgun) concealed by one person among
multiple people walking towards the radar based on training
using single-person radar signatures. In addition to the data
gathered in our previous study [4], in which subjects performed
three activities, namely (a) walking with a concealed gun under a
trench coat, (b) walking without holding anything, and (c) walking
with a gym bag, we acquired new data from 4 human subjects,
who performed activities (a)-(c) alone or in groups of two or
three.
The experimental setup is shown in Fig. 3. We conducted
the experiments outdoors, where the radar sensor was fixed on
a tripod 1 m above the ground and powered by a battery pack
mounted on the tripod. We asked each human subject to start
from a point 7 m away from the sensor, as shown in Fig. 3,
and walk straight toward the radar (0° to the main lobe of the
radar antenna). Eight human subjects (5 male and 3 female)
participated in the study, and we instructed each subject to
repeat every activity four times. Furthermore, to evaluate the
detection capability of the proposed method under different
movement angles, we adopted two other start points, at 15°
and 35° to the main lobe, respectively. The subjects were again
asked to perform each activity four times, starting randomly
from either of these two points. The data gathered in our
previous work [4] include only one-person cases. In this study,
we used data from four human subjects in multiple combinations,
where each subject walked toward the radar (i) alone, (ii) with
another subject, and (iii) with two other subjects, while
performing activities (a)-(c). Overall, we had 8 subjects in this
study, 4 of whom were not labeled subject-wise, meaning that
they were labeled only as gun-holder or non-gun-holder. The
other 4 were labeled subject-wise; for instance, in the case where
subject number one holds the gun, the sample was labeled
"S1-gun", and so on. Therefore, the subjects who were not
labeled subject-wise were included only in the training set, and
those who were labeled subject-wise were included in both
training and test sets, as described in Section II-D.

Fig. 3: Experimental setup for acquiring radar signals when
multiple persons walk toward the radar at different angles (θ),
where θ is either 15° or 35°. The red and green persons
indicate an active shooter and a normal person (a person who
does not carry a gun), respectively. In our experiment for the
three-person scenario, the active shooter was in the middle or
on the sides an equal number of times; for the two-person
scenario, the active shooter was on the right or left side an
equal number of times.
As shown in Fig. 4, when multiple subjects walk towards the
gate (radar), which is very common in public places, the
individual walking patterns are present as components of a
single composite signal. Hence, features such as the gap between
human body signatures, torso speed, or limb speed are not as
easily identifiable as in the one-person scenario, where the
features were easily identifiable, as shown in Fig. 4. Therefore,
using a model with embedded data-driven feature extraction is
expected to be more effective.

Fig. 4: Micro-Doppler signature images of different subjects
((a) subjects 1 and 2; (b) subjects 1, 2, and 3) when they walk
toward the radar with and without holding the rifle. The x-axis
and y-axis denote time in seconds and velocity in meters per
second, respectively.

C. Modeling
We investigated different classifier models to detect the
cases when a subject is carrying a gun. This section provides
detailed descriptions of the models used.
1) Convolutional Neural Networks (CNN): Deep CNNs have
shown improved performance in different tasks, including
object detection [18], image recognition [19], segmentation
[20], and classification and regression [21]. CNNs have two
main modules: (a) Feature extractor: convolutional layers
performing spatial feature extraction from the input images,
followed by sub-sampling layers applying dimensionality
reduction; (b) Dense: dense or fully connected (FC) layers, where
all the nodes of each layer are connected to those of the
preceding and following layers via weights. A non-linear
activation function is applied at each node. The final layer is the
classification layer, which assigns a given sample to the most
suitable class. The network is trained through backpropagation.
We trained a deep CNN with three feature-extractor modules and
four dense modules. Each extractor module includes a convolution
layer followed by batch normalization and a rectified linear
unit (ReLU) activation layer. Each dense module includes a
fully connected layer followed by batch normalization, a
ReLU, and a dropout [22] layer, except for the classification
layer, which excludes dropout. We used cross-validation to set
the hyperparameters (i.e., number of kernels, kernel size and
stride of convolution layers, dropout rate, number of units in
dense layers, and learning rate).
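The module structure described above can be sketched in PyTorch, instantiated here with the Bayesian CNN values listed in Fig. 8 and Table II (5×5 kernels, stride 2, 12/24/48 channels, 488/160/16 dense units, dropout 0.7). Several details are assumptions rather than statements from the text: 64 × 64 single-channel inputs (consistent with the 4096-dimensional images mentioned in the PCA subsection), convolution padding of kernel // 2, and all function and class names.

```python
import torch
from torch import nn


def conv_module(c_in, c_out, kernel=5, stride=2):
    # Feature-extractor module: convolution -> batch normalization -> ReLU.
    # padding = kernel // 2 is an assumption; the text does not state it.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel, stride=stride, padding=kernel // 2),
        nn.BatchNorm2d(c_out),
        nn.ReLU(),
    )


def dense_module(n_in, n_out, p_drop=0.7):
    # Dense module: fully connected -> batch normalization -> ReLU -> dropout.
    return nn.Sequential(
        nn.Linear(n_in, n_out), nn.BatchNorm1d(n_out), nn.ReLU(), nn.Dropout(p_drop)
    )


class GunDetectorCNN(nn.Module):
    """Hypothetical sketch of the Fig. 8 layout for square single-channel inputs."""

    def __init__(self, side=64, channels=(12, 24, 48), units=(488, 160, 16)):
        super().__init__()
        c1, c2, c3 = channels
        self.features = nn.Sequential(
            conv_module(1, c1), conv_module(c1, c2), conv_module(c2, c3)
        )
        # Three stride-2 convolutions halve the side three times
        # (valid for sides that are multiples of 8 with this padding).
        flat = c3 * (side // 8) ** 2
        u1, u2, u3 = units
        self.classifier = nn.Sequential(
            nn.Flatten(),
            dense_module(flat, u1), dense_module(u1, u2), dense_module(u2, u3),
            # Classification module: FC -> batch normalization -> softmax, no dropout.
            nn.Linear(u3, 2), nn.BatchNorm1d(2), nn.Softmax(dim=1),
        )

    def forward(self, x):
        return self.classifier(self.features(x))


model = GunDetectorCNN().eval()
with torch.no_grad():
    probs = model(torch.randn(4, 1, 64, 64))   # one (no gun, gun) row per image
```

For training, one would typically drop the final `Softmax` and feed logits to `nn.CrossEntropyLoss`; it is kept here only to mirror Fig. 8.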
2) Principal Component Analysis (PCA): PCA is a powerful
dimensionality reduction technique [23] that maps the data
such that the top principal components (PCs) contain the
majority of the data variance; therefore, one can discard the
smaller PCs and keep only the high-variance components to
achieve dimensionality reduction. In this study, we keep the
top 90 PCs, which together explain 85% of the data variance,
reducing the data dimension from 4096 to 90 (by 97.8%). We
used the coefficients of these 90 PCs as features to train and
test an SVM model.
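A minimal NumPy sketch of this step, computing the PCs from the singular value decomposition of the mean-centered training matrix (one flattened image per row). The helper name `pca_features` is hypothetical; the returned training mean and basis are what would be reused to project test images, so that test data never influence the components.

```python
import numpy as np


def pca_features(X, n_components=90):
    """Project flattened images (rows of X) onto the top principal components.

    Returns (coefficients, retained variance fraction, training mean, basis).
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)           # variance fraction per component
    k = min(n_components, Vt.shape[0])
    basis = Vt[:k]
    return Xc @ basis.T, float(explained[:k].sum()), mean, basis
```

With 4096-pixel images and `n_components=90`, the retained fraction returned by the function is the quantity the text reports as 85%.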
3) Support Vector Machines (SVM): SVM learns to optimally
select the hyperplanes separating different data classes
[24], [25]. To map linearly non-separable components into
linearly separable features, we use a Gaussian or radial basis
function (RBF) kernel representation of the input PCs as the
input feature space of the SVM. We then train two SVMs:
a standard two-class classifier and a one-class classifier for
the anomaly detection problem. The anomaly detection problem
can be formulated as the situation where there are no, or only
a limited number of, samples of human subjects walking with a
concealed gun (i.e., anomalies) in the training data compared to
the normal samples, where a normal activity consists of a subject
walking either holding nothing or carrying an object other than
a concealed gun.
The results of the one-class SVM are analyzed in the
scenario where no anomalous samples are present in the
training set. In the case with a small number of anomalous
samples, we adopted the two-class SVM and observed the
effect of increasing the number of anomalous samples on the
performance. The hyperparameters of the SVM models were
set using a cross-validation procedure.
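Both SVM variants can be sketched with scikit-learn's `PCA`, `SVC` and `OneClassSVM`, chained in a pipeline so that test data are projected with the PCs learned from training data. The `C`, `nu` and `gamma` values are placeholders (the text sets the hyperparameters by cross-validation), the helper names are hypothetical, and the random arrays are toy stand-ins for flattened spectrograms.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC, OneClassSVM


def two_class_detector(n_components=90, C=1.0):
    # PCA coefficients -> RBF-kernel SVM; labels 1 = concealed gun, 0 = normal.
    return make_pipeline(PCA(n_components=n_components),
                         SVC(kernel="rbf", C=C, gamma="scale"))


def one_class_detector(n_components=90, nu=0.1):
    # Fit on normal samples only; predict() returns +1 (normal) or -1 (anomaly).
    return make_pipeline(PCA(n_components=n_components),
                         OneClassSVM(kernel="rbf", nu=nu, gamma="scale"))


# Hypothetical toy data standing in for flattened spectrograms (20 features).
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 20))
gun = rng.normal(2.0, 1.0, size=(200, 20))
X = np.vstack([normal, gun])
y = np.r_[np.zeros(200), np.ones(200)]

clf = two_class_detector(n_components=5).fit(X, y)
occ = one_class_detector(n_components=None).fit(normal)   # keep all components
```

In the one-class setting, `nu` roughly bounds the fraction of training (normal) samples treated as outliers, which is one way to trade false alarms against missed anomalies.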
D. Data Augmentation
In general, fine-tuning the entire deep CNN (i.e., training
all the weights) is only advisable when the new dataset is
large enough; otherwise, the model could suffer from overfitting.
Since the first few layers extract low-level features such as edges
and corners, they do not change significantly and can be reused
for similar purposes. Therefore, to train the deep CNN, we used
data augmentation to generate a sufficient number of images
for training. The proposed data augmentation framework
generates a sufficiently large set of training images. This also
serves to evaluate the generalization performance of our model,
as we train the model on artificially generated multiple-person
(2- or 3-person) images derived from one-person images and then
test on the actual two-person and three-person images. We also
kept the training and testing sets separate by dividing them based
on the identity of the human subjects; e.g., we used Eq. (1) to
augment the images of subject 1 and subject 2 with random
weight values, as shown in Fig. 6(a), to increase the size of the
training set, and then tested on the actual two-person images
of subjects 3 and 4.
$\mathrm{STFT}(s_1 \circ s_2) \approx w_1\,\mathrm{STFT}(s_1) + w_2\,\mathrm{STFT}(s_2) \quad \text{s.t.} \quad w_1 + w_2 = 1 \qquad (1)$
where $s_k$ and $w_k$ are the radar output signal and augmentation
weight, respectively, for subject k, and $\mathrm{STFT}(s_1 \circ s_2)$ is the
resulting synthetic augmented micro-Doppler signature image.
The weight is estimated by minimizing the reconstruction
error. The weights are then perturbed using Uniform(0,1) samples
such that the reconstruction error obtained after the perturbation
is within 1 standard deviation of the minimum reconstruction error.
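The weight estimation and perturbation described above can be sketched in NumPy for the two-person case of Eq. (1), operating directly on spectrogram images. With w2 = 1 − w1, the reconstruction-error minimizer has a closed form (a one-dimensional least-squares fit, clipped to [0, 1]). The acceptance rule is one possible reading of the text, labeled as an assumption: a candidate w1 ~ Uniform(0, 1) is kept if its reconstruction error is within one standard deviation, computed over all candidates, of the minimum error. All function names are hypothetical.

```python
import numpy as np


def fit_weight(actual, img1, img2):
    """w1 minimizing ||actual - (w1*img1 + (1 - w1)*img2)||, clipped to [0, 1]."""
    d = (img1 - img2).ravel()
    r = (actual - img2).ravel()
    denom = float(d @ d)
    if denom == 0.0:                 # identical inputs: any weight is optimal
        return 0.5
    return min(max(float(d @ r) / denom, 0.0), 1.0)


def recon_error(actual, img1, img2, w1):
    return float(np.linalg.norm(actual - (w1 * img1 + (1.0 - w1) * img2)))


def sample_augmentations(actual, img1, img2, n, rng):
    """Draw n synthetic two-person images via Eq. (1) with perturbed weights.

    Assumed reading of the acceptance rule: a candidate w1 ~ Uniform(0, 1) is
    kept if its error is within one standard deviation (over all candidates)
    of the minimum error.
    """
    e_min = recon_error(actual, img1, img2, fit_weight(actual, img1, img2))
    cand = rng.uniform(0.0, 1.0, size=max(10 * n, 100))
    errs = np.array([recon_error(actual, img1, img2, w) for w in cand])
    accepted = cand[errs <= e_min + errs.std()]
    chosen = rng.choice(accepted, size=n)    # sampled with replacement
    return [w * img1 + (1.0 - w) * img2 for w in chosen], chosen
```

Because the error is a convex function of w1, the clipped closed-form weight is the minimum over [0, 1], so the accepted weights cluster around it while still varying between synthetic samples.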
For three-person scenarios, we combined two-person augmented
images with one-person images to generate augmented
images for training, as shown in Fig. 6(b), and then tested on the
actual three-person images. Since the augmented images are
used for training and actual images are used for testing, higher
prediction accuracy can be achieved when the augmented
images are more similar to the actual images. Examples of
augmented and actual micro-Doppler signature images of
two-person and three-person cases are provided in Fig. 5 to show
the similarity between the augmented and actual images.

Fig. 5: Augmented and actual micro-Doppler signature images of
two-person and three-person cases: (a) actual and augmented
images of one of the two-person (2P) cases; (b) actual and
augmented images of one of the three-person (3P) cases. The
x-axis and y-axis denote time in seconds and velocity in meters
per second, respectively.

Fig. 6: Demonstration of the augmentation process for (a)
two-person and (b) three-person scenarios. In (a), s1 and s2 are
weighted by w1 and w2 (w1 + w2 = 1) and combined into s1 ∘ s2;
in (b), s1 ∘ s2 (formed with w1 + w2 = 1) is combined with s3
using weights w3 and w4 (w3 + w4 = 1) to form s1 ∘ s2 ∘ s3.
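The chained construction of Fig. 6(b) can be sketched as two successive applications of Eq. (1). In this hypothetical NumPy helper, the weights are drawn from Uniform(0, 1) purely for illustration, and the assignment of w3 to the two-person image and w4 to the third image is an assumption.

```python
import numpy as np


def three_person_image(img1, img2, img3, rng):
    """Hypothetical helper: apply Eq. (1) twice, as in Fig. 6(b).

    Step 1 mixes s1 and s2 with (w1, w2); step 2 mixes that result with s3
    using (w3, w4). Weights are drawn from Uniform(0, 1) for illustration.
    """
    w1 = rng.uniform()
    pair = w1 * img1 + (1.0 - w1) * img2        # w1 + w2 = 1
    w3 = rng.uniform()
    triple = w3 * pair + (1.0 - w3) * img3      # w3 + w4 = 1
    return triple, (w1, 1.0 - w1, w3, 1.0 - w3)
```

The effective weights on the three one-person images are w1·w3, w2·w3 and w4, which again sum to 1, so the three-person image remains a convex combination of the individual signatures.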
Note that $\mathrm{STFT}(s_1 \circ s_2)$ in Eq. 1 is an approximation
of a two-person micro-Doppler signature image under specific
assumptions. If both targets are in line of sight, the antenna is
omnidirectional (i.e., radiating in every direction on the
horizontal plane with the same power), and there is no
multipath, then their signals would add up at the radar receiver
similarly to Eq. 1. However, the antenna we used has a
directional radiation pattern. In the case considered in our
previous work [26], the single human subject was right in front
of the antenna (90 deg), where the radiation is strongest. In the
current work, when two people are present, it is unlikely that
both are in the direction of the strongest radiation (90 deg), and
there could be occlusion when one subject obstructs the radar
beam (see Fig. 7). Therefore, the signal returned from each
individual depends on the angle at which they are walking and
their distance from the antenna. Thus, Eq. 1 approximates the
joint STFT only under specific conditions that are often violated
in normal experimental circumstances. In our study, we use
Eq. 1 to generate synthetic examples of two-person STFTs for
training our classification model. Note that the goal of data
augmentation is to improve the model's accuracy at test time;
even if the augmented samples do not reflect physical reality
accurately, they serve their purpose as long as they improve the
prediction accuracy for actual test samples. We further show
the usefulness of Eq. 1 in approximating the two-person STFT
through a simulation study in the Results section, which
illustrates that images generated using Eq. 1 preserve the
characteristics of actual two-person STFT images better than
images created by randomly sampling two-person STFT image
pixels.

Fig. 7: Plot of the array factor for an antenna array with two
elements separated by half a wavelength in the horizontal
plane, as an approximation of the radiation pattern of the
radar we used.
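Under the idealized conditions listed above, the received signals add, and the complex STFT is a linear operator, so a weighted sum of signals maps exactly to the same weighted sum of complex STFTs. The images, however, are magnitude spectrograms, for which Eq. 1 holds only approximately. The toy SciPy sketch below uses synthetic sinusoids as stand-ins for two people's Doppler returns (the sampling rate, tone frequencies and segment length are arbitrary assumptions). It checks exact linearity of the complex STFT, and shows that the magnitude approximation is close when the returns occupy different Doppler bins but breaks down when they overlap with opposite phase.

```python
import numpy as np
from scipy.signal import stft

FS = 1000                                    # assumed sampling rate (Hz)
T = np.arange(0, 2.0, 1 / FS)


def spec(x):
    # Complex STFT with a Hamming window (segment length is an arbitrary choice).
    return stft(x, fs=FS, window="hamming", nperseg=256)[2]


def magnitude_gap(x1, x2, w1=0.3):
    """Relative gap between |STFT(w1 x1 + w2 x2)| and w1|STFT(x1)| + w2|STFT(x2)|."""
    w2 = 1.0 - w1
    joint = np.abs(spec(w1 * x1 + w2 * x2))
    mixed = w1 * np.abs(spec(x1)) + w2 * np.abs(spec(x2))
    return np.linalg.norm(joint - mixed) / np.linalg.norm(mixed)


slow = np.sin(2 * np.pi * 50 * T)            # toy "person 1" Doppler tone
fast = np.sin(2 * np.pi * 200 * T)           # toy "person 2" at a different Doppler
# The complex STFT itself is exactly linear in the signal.
linear = np.allclose(spec(0.3 * slow + 0.7 * fast), 0.3 * spec(slow) + 0.7 * spec(fast))
gap_separated = magnitude_gap(slow, fast)    # small: returns occupy different bins
gap_cancelling = magnitude_gap(slow, -slow)  # large: same bins, opposite phase
```

This is consistent with the text's use of "≈" in Eq. 1: the approximation is weakest when two people's returns fall in the same Doppler bins and interfere.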
1) Skewness of training dataset: Since we can synthetically
generate any number of samples for the training dataset by
assigning different random weights to the images, we treated
the skewness of the training-set class distribution as a
hyperparameter. In other words, we created training datasets
whose class distributions were skewed towards the abnormal
class (gun holder class) by different percentages, and a model
was trained on each generated dataset. For each skewed training
dataset, we generated 2000 normal images, and the number of
anomalous images (scenarios with a concealed gun) depended on
the degree of class imbalance. For instance, if the skewness is
60%, the training set contains 2000 normal images and 3000
images with a concealed gun. Skewed training datasets were
created for both two-person and three-person cases.
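The example above follows from reading skewness as the fraction of anomalous images in the training set: with 2000 normal images and skewness p, the number of anomalous images is 2000 · p / (1 − p). A minimal Python sketch under that reading (the function name is hypothetical):

```python
def anomalous_count(n_normal: int, skew: float) -> int:
    """Number of concealed-gun images so that they form `skew` of the training set."""
    if not 0.0 <= skew < 1.0:
        raise ValueError("skew must lie in [0, 1)")
    return round(n_normal * skew / (1.0 - skew))


for skew in (0.5, 0.6, 0.7):
    print(f"skew={skew:.0%}: {anomalous_count(2000, skew)} gun images + 2000 normal")
```

At 50% the classes are balanced; each step towards the anomalous class grows the synthetic gun-holder set rather than shrinking the normal set, which is possible only because augmented samples are cheap to generate.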
E. Sensitivity Maximization
In our application, the repercussion of making a type II
error, or false negative (FN) detection, is much more severe
than that of making a type I error, or false positive (FP) detection.
In other words, we prefer to make fewer mistakes of reporting
no gun when a concealed gun is present, even at the cost of
occasionally identifying some other object as a gun. Erroneously
flagging a regular person as a gun holder results only in a brief
check to keep people safe, whereas letting an actual gun holder go
undetected might lead to catastrophic outcomes. Therefore, our
classifier must be extra responsive to the anomalous "gun" class.
This leads to minimization of the false negative rate (FNR),
or equivalently, maximization of the true positive rate (TPR) or
sensitivity, defined in Eq. 2 using the confusion matrix in Table
I, where # denotes total count.
TABLE I: Confusion matrix for the gun detection problem
Total Population      Observation: Gun            Observation: No Gun
Prediction: Gun       True Positive (# := TP)     False Positive (# := FP)
Prediction: No Gun    False Negative (# := FN)    True Negative (# := TN)
$\mathrm{Sen} = \mathrm{TPR} = \frac{TP}{TP + FN} = 1 - \mathrm{FNR} \qquad (2)$
$\mathrm{ACC} = \frac{TP + TN}{TP + FN + FP + TN} \qquad (3)$
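Eqs. (2) and (3) can be computed directly from predicted and true labels. A minimal Python sketch with hypothetical labels (1 = gun, 0 = no gun), illustrating that accuracy can remain high while a gun carrier is missed, which is why the text optimizes sensitivity instead:

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FN, FP, TN) with label 1 = gun (positive) and 0 = no gun."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def sensitivity(tp, fn):
    # Eq. (2): TPR = TP / (TP + FN) = 1 - FNR
    return tp / (tp + fn)


def accuracy(tp, fn, fp, tn):
    # Eq. (3)
    return (tp + tn) / (tp + fn + fp + tn)


# Hypothetical labels: 4 gun carriers followed by 6 normal walkers.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]   # one gun carrier is missed
counts = confusion_counts(y_true, y_pred)
```

Here the single miss leaves accuracy at 0.9 but drops sensitivity to 0.75.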
To this end, we explore two approaches to maximize the
model sensitivity as described below.
1) Penalized cross entropy: In information theory, cross-
entropy is a measure of the difference between two probability
distributions, built upon the idea of information entropy. In
machine learning, cross-entropy can be defined as the average
number of bits required to encode data coming from a source
with distribution t(x) when we use an approximate model y(x)
[27]. For a binary classification problem, the distribution of x
can be modeled as the Bernoulli distribution, therefore, the
cross-entropy loss function can be framed as
$H(t, y) = -\frac{1}{N}\sum_{i=1}^{N}\left[t_i \log y_i + (1 - t_i)\log(1 - y_i)\right] \qquad (4)$
where $t_i := t(x_i)$ and $y_i := y(x_i)$ for $i = 1, 2, \ldots, N$. The
two individual terms in Eq. 4 correspond to the TP and TN
measures, respectively. Since we are interested in minimizing
the cross-entropy loss to achieve maximum accuracy (or, in
information-theoretic terms, minimizing the number of bits
required to express the difference between the target and
approximated distributions), we add a regularization term to
Eq. 4 to define our revised loss function $H_p$ in Eq. 5. The
regularization term here corresponds to the FN measure, i.e.,
it penalizes scenarios where the algorithm fails to detect
gun holders (t = 1) as anomalies (y = 0), and the
regularization parameter λ is again learned through
cross-validation.
$H_p(t, y) = -\frac{1}{N}\sum_{i=1}^{N}\left[t_i \log y_i + (1 - t_i)\log(1 - y_i) + \lambda\, t_i \log(1 - y_i)\right] \qquad (5)$
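The text motivates the regularizer as a penalty on missed gun holders (t = 1, y near 0). A minimal NumPy sketch of one common way to implement that intent: the binary cross-entropy of Eq. 4 with the positive-class log-likelihood term up-weighted by (1 + λ). This positive-class weighting is swapped in for illustration and is not a literal transcription of Eq. 5. The function name and the clipping constant `eps` are assumptions.

```python
import numpy as np


def fn_weighted_bce(t, y, lam=1.0, eps=1e-7):
    """Binary cross-entropy (Eq. 4) with an extra lam-weighted positive-class term.

    The added -lam * t * log(y) grows as y -> 0 for gun holders (t = 1), so missed
    detections cost more; samples with t = 0 are unaffected. This is a common
    positive-class weighting, used here for illustration only.
    """
    t = np.asarray(t, dtype=float)
    y = np.clip(np.asarray(y, dtype=float), eps, 1.0 - eps)   # avoid log(0)
    log_lik = t * np.log(y) + (1.0 - t) * np.log(1.0 - y)
    return float(-np.mean(log_lik + lam * t * np.log(y)))
```

As with λ in the text, the weight would be selected by cross-validation; larger values push the classifier towards higher sensitivity at the cost of more false positives.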
2) Bayesian optimization: Bayesian optimization is a statistical
framework for derivative-free global optimization of expensive
black-box functions [28]. The framework, based on Bayes'
rule in Eq. 6, performs sequential queries of a distribution
over the black-box model M defined by a surrogate model;
simply put, it constructs a probabilistic model that defines a
distribution over an objective function mapped from the input
space, in order to optimize the objective of interest. Specifically,
we prescribe a prior belief P(Θ | M) over the possible objective
function f and then sequentially refine M as new data are
observed, by updating the Bayesian posterior P(Θ | M, D),
which represents our updated belief on the likelihood of f given
N observation pairs, $D = \{(x_i, y_i)\}_{i=1}^{N}$.
$P(\Theta \mid M, D) = \frac{P(D \mid \Theta, M)\,P(\Theta \mid M)}{P(D \mid M)} \qquad (6)$
The posterior model hyperparameter Θ is queried sequentially
to locate the global optimum. The promise of a new
experiment is quantified using an acquisition function, which,
applied to the posterior mean and variance, provides a trade-off
between exploration and exploitation. The acquisition function
evaluates the utility of candidate points for the next evaluation,
and the next set of hyperparameters is selected by maximizing
the acquisition function given the currently observed sets
[29]–[32]. For constructing the distribution over f, Gaussian
processes (GPs) are widely used due to their flexibility,
well-calibrated uncertainty, and analytic properties [28].

Fig. 8: The Bayesian CNN architecture. The kernel size and number
of channels of each convolutional layer, and the number of units
of each fully connected (FC) layer, are denoted in the corresponding
box. Layers: Input → 5×5 conv, 12 → Batch Norm → ReLU → 5×5 conv,
24 → Batch Norm → ReLU → 5×5 conv, 48 → Batch Norm → ReLU → FC, 488
→ Batch Norm → ReLU → Dropout (0.7) → FC, 160 → Batch Norm → ReLU →
Dropout (0.7) → FC, 16 → Batch Norm → ReLU → Dropout (0.7) → FC, 2
→ Batch Norm → SoftMax → Output.

To utilize Bayesian optimization for our application, we need to
define two key ingredients: (a) the prior distribution of
hyperparameters, P(Θ | M), and (b) the loss function f to be
minimized by the surrogate model. The hyperparameter set Θ
consists of the following variables, where U denotes the uniform
distribution (discrete for integer-valued hyperparameters).
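(i) Learning rate: continuous, log-uniform, i.e., $\log \rho \sim \mathcal{U}(\log 0.001, \log 0.01)$
(ii) #Kernels in conv. layer: discrete, $\sim \mathcal{U}(8, 32)$
(iii) Stride: categorical, 1 or 2
(iv) #Units in dense layers: discrete, modeled by a quantized
uniform distribution between a and b, i.e.,
$n \sim \left\lfloor \frac{1}{q}\,\mathcal{U}(a, b) \right\rceil q =: \mathcal{U}_q(a, b)$,
where ⌊·⌉ denotes rounding to the nearest integer
• Layer 1: $\sim \mathcal{U}_q(300, 500)$
• Layer 2: $\sim \mathcal{U}_q(80, 200)$
• Layer 3: $\sim \mathcal{U}_q(10, 50)$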
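A minimal sketch of drawing one configuration from the prior over Θ in (i)-(iv), using only Python's `random` module. The quantization step q = 2 and the dictionary keys are assumptions (the text does not give q). Following the text, only the first convolutional layer's kernel count is sampled, since later layers double it.

```python
import math
import random


def sample_hyperparameters(rng: random.Random, q: int = 2) -> dict:
    """Draw one configuration from the search space (i)-(iv).

    q is an assumed quantization step for the dense-layer sizes.
    """
    def quniform(a, b):
        # U_q(a, b): round(U(a, b) / q) * q, which stays in [a, b]
        # whenever a and b are multiples of q.
        return int(round(rng.uniform(a, b) / q) * q)

    return {
        # (i) log-uniform learning rate in [0.001, 0.01]
        "learning_rate": math.exp(rng.uniform(math.log(1e-3), math.log(1e-2))),
        # (ii) kernels in the first conv layer; later conv layers double this count
        "conv1_kernels": rng.randint(8, 32),
        # (iii) categorical stride
        "stride": rng.choice([1, 2]),
        # (iv) quantized-uniform dense-layer sizes
        "fc_units": (quniform(300, 500), quniform(80, 200), quniform(10, 50)),
    }


config = sample_hyperparameters(random.Random(0))
```

Sampling the learning rate on a log scale spends equal effort on each decade, which matters when the plausible range spans an order of magnitude.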
Since our goal is to maximize the sensitivity as defined in
Eq. 2, we choose the FNR, or miss rate, as our loss function f
to be minimized by Bayesian optimization:
$\Theta^{*} = \arg\min_{\Theta \in \Omega} \mathrm{FNR}(\Theta), \quad \Omega \subset \mathbb{R}^{K} \qquad (7)$
In our work, we build upon the standard GP-based approach
of Bergstra et al. [33]. The Bayesian CNN architecture is
provided in Fig. 8. To reduce the computational complexity
of the Bayesian learning process, we double the number
of channels of each convolutional layer relative to its
preceding convolutional layer. We used the same architecture
for the Bayesian CNN and base CNN approaches, but the final
hyperparameters of each model differ due to their different
hyperparameter learning paradigms. The base CNN model
hyperparameters are optimized using a grid search. The
hyperparameters of each network are provided in Table II.
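The Bayesian optimization loop described above can be illustrated with a bare-bones Gaussian-process surrogate and an expected-improvement acquisition function over a single hyperparameter (log10 of the learning rate, over the range in (i)). This is not the Bergstra et al. [33] implementation used in the paper. The RBF length scale, candidate grid, exploration margin ξ, evaluation budget and the toy objective standing in for the CNN's validation FNR are all assumptions.

```python
import numpy as np
from scipy.stats import norm


def rbf(a, b, length=0.2):
    # Squared-exponential covariance between two 1-D arrays of inputs.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)


def gp_posterior(x_obs, y_obs, x_new, noise=1e-6):
    """Zero-mean GP posterior (fitted to standardized y) evaluated at x_new."""
    mu, sd = y_obs.mean(), y_obs.std() + 1e-12
    z = (y_obs - mu) / sd
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf(x_obs, x_new)
    mean = Ks.T @ np.linalg.solve(K, z)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu + sd * mean, sd * np.sqrt(np.maximum(var, 1e-12))


def expected_improvement(mean, std, best, xi=0.01):
    # EI for minimization: rewards a low predicted loss (exploitation)
    # and a high predictive uncertainty (exploration).
    gain = best - mean - xi
    z = gain / std
    return gain * norm.cdf(z) + std * norm.pdf(z)


def bayes_opt(objective, lo, hi, n_init=3, n_iter=12, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(lo, hi, 201)             # candidate hyperparameter values
    x = rng.uniform(lo, hi, n_init)
    y = np.array([objective(v) for v in x])
    for _ in range(n_iter):
        mean, std = gp_posterior(x, y, grid)
        x_next = grid[np.argmax(expected_improvement(mean, std, y.min()))]
        x = np.append(x, x_next)
        y = np.append(y, objective(x_next))
    i = int(np.argmin(y))
    return x[i], y[i]


def toy_fnr(log10_lr):
    # Hypothetical stand-in for validation FNR; a real run would train the CNN.
    return 0.1 + (log10_lr + 2.4) ** 2


best_x, best_y = bayes_opt(toy_fnr, np.log10(1e-3), np.log10(1e-2))
```

Each objective evaluation in the real setting is a full CNN training run, which is why a surrogate that proposes few, well-chosen evaluations is preferred over exhaustive grid search.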
III. RESULTS
In this section we present results of 1) Simulation of
linear combination of micro-Doppler signature images that are
close approximation of their actuals. 2) Predictive modeling
frameworks on single and multiple-person data.
TABLE II: Bayesian CNN and CNN (base) hyperparameters
Hyperparameter Bayesian CNN CNN (base)
Conv1,2,3 kernel 5 × 5 3 × 3
Conv1,2,3 stride 2 1
Conv1 #Channels 12 8
Conv2 #Channels 24 16
Conv3 #Channels 48 32
FC1 #units 488 512
FC2 #units 160 256
FC3 #units 16 64
Learning rate 0.0037 0.001
A. Data augmentation simulation
This section describes a simulation that estimates whether synthetically generated images (Eq. 1) preserve the characteristics of an actual two-person STFT image. We consider a set of actual two-person images (I1, I2, ..., In) and multiple one-person images (P1, P2, ..., Pk). To measure the distance d between a linear combination of the Pi's and the I images, we use SSIM [34] and its embedded components, namely luminance (Eq. 8), contrast (Eq. 9), and structure (Eq. 10), together with total variation [35]. SSIM (Eq. 11) is the weighted product of these three components and measures the similarity between two images X and Y. Each component is calculated over a sliding window, where µx and µy denote the means of x and y, σx² and σy² their variances, σxy their covariance, and c1, c2, c3 are small constants that stabilize the division. As in [34], we set α = β = γ = 1 and c3 = c2/2. Larger values of SSIM, luminance, contrast, and structure indicate greater similarity between X (augmented image) and Y (original image).
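$$l(x, y) = \frac{2\mu_x\mu_y + c_1}{\mu_x^2 + \mu_y^2 + c_1} \qquad (8)$$
$$c(x, y) = \frac{2\sigma_x\sigma_y + c_2}{\sigma_x^2 + \sigma_y^2 + c_2} \qquad (9)$$
$$s(x, y) = \frac{\sigma_{xy} + c_3}{\sigma_x\sigma_y + c_3} \qquad (10)$$
$$\mathrm{SSIM}(x, y) = [l(x, y)]^{\alpha} \cdot [c(x, y)]^{\beta} \cdot [s(x, y)]^{\gamma} \qquad (11)$$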
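Eqs. (8)-(11) can be evaluated directly with NumPy. The sketch below is a simplification: it treats the whole image as a single window instead of the sliding window used in the text, assumes intensities scaled to [0, 1], and uses the commonly chosen constants c1 = (0.01·L)² and c2 = (0.03·L)² with dynamic range L = 1. For the windowed version, scikit-image provides `skimage.metrics.structural_similarity`.

```python
import numpy as np


def ssim_components(x, y, c1=1e-4, c2=9e-4):
    """Luminance, contrast, structure, and SSIM (Eqs. 8-11) over one window.

    The whole image is used as a single window; alpha = beta = gamma = 1
    and c3 = c2 / 2, as in the text.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c3 = c2 / 2.0
    mu_x, mu_y = x.mean(), y.mean()
    sigma_x, sigma_y = x.std(), y.std()
    sigma_xy = np.mean((x - mu_x) * (y - mu_y))
    lum = (2 * mu_x * mu_y + c1) / (mu_x**2 + mu_y**2 + c1)              # Eq. 8
    con = (2 * sigma_x * sigma_y + c2) / (sigma_x**2 + sigma_y**2 + c2)  # Eq. 9
    struct = (sigma_xy + c3) / (sigma_x * sigma_y + c3)                  # Eq. 10
    return lum, con, struct, lum * con * struct                          # Eq. 11


rng = np.random.default_rng(0)
img = rng.random((64, 64))  # stand-in image in [0, 1]
noisy = np.clip(img + 0.2 * rng.standard_normal(img.shape), 0.0, 1.0)
lum, con, struct, ssim = ssim_components(img, noisy)
print(f"l={lum:.3f} c={con:.3f} s={struct:.3f} SSIM={ssim:.3f}")
```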
[Fig. 9 panels: histograms (y-axis: Count) of Luminance, Contrast, Structure, SSIM, and Total Variation; legend: Augmented vs. Random; top row: 2 person, bottom row: 3 person.]
Fig. 9: Distribution of luminance, contrast, structure, SSIM, and total variation of images created by the proposed augmentation framework and of images randomly sampled from the actual micro-Doppler signatures, for two-person (top row) and three-person (bottom row) scenarios.
Signals with excessive spurious detail have higher total variation; therefore, a total variation value close to that of the original signal indicates a closer match. For a 2-D signal y, total variation is defined in Eq. 12. We compute the total variation of both augmented and actual micro-Doppler signature images and measure the distance between them.
$$V(y) = \sum_{i,j} \sqrt{|y_{i+1,j} - y_{i,j}|^2 + |y_{i,j+1} - y_{i,j}|^2} \qquad (12)$$
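Eq. (12) translates to a few array operations. In this sketch, terms whose forward difference would fall outside the image are dropped (a boundary convention the text does not specify), and the distance between two images is taken, as an assumption, to be the absolute difference of their total variations.

```python
import numpy as np


def total_variation(y):
    """Isotropic total variation of a 2-D array (Eq. 12).

    Sums sqrt(|y[i+1, j] - y[i, j]|^2 + |y[i, j+1] - y[i, j]|^2) over all
    (i, j) where both forward differences exist.
    """
    y = np.asarray(y, dtype=float)
    dv = y[1:, :-1] - y[:-1, :-1]  # y[i+1, j] - y[i, j]
    dh = y[:-1, 1:] - y[:-1, :-1]  # y[i, j+1] - y[i, j]
    return float(np.sum(np.sqrt(dv**2 + dh**2)))


def tv_distance(augmented, actual):
    """Hypothetical distance: absolute difference of total variations."""
    return abs(total_variation(augmented) - total_variation(actual))


ramp = np.tile(np.arange(3.0), (3, 1))  # y[i, j] = j
print(total_variation(ramp))            # 4.0
```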
To locate where our distances (d) stand, we build a reference distribution from the distances between random images and the actual two-person/three-person images. Note that the synthetic images have no exactly corresponding actual image; we therefore compare them against the luminance, contrast, structure, and SSIM distributions of the actual two-person/three-person images rather than against individual image pairs. We hypothesize that, for SSIM and its embedded components, if the d's lie at the far right of the distribution, the linear combinations produce images that are closer to two-person/three-person scenarios than random images are. Conversely, for total variation, if the d's lie at the far left of the distribution, the augmented images are again closer to the two-person/three-person scenarios than random images sampled from the same pixel distribution as the actual two-person/three-person images. We created the random images by sampling from the actual images: the intensity value of each pixel of each random image was sampled from the distribution of pixel intensity values across all actual two-person/three-person images. Fig. 9 shows the distribution of each similarity metric, including the SSIM components luminance, contrast, and structure. The distributions do not overlap, and our augmented images score better than the randomly sampled images on every metric. Furthermore, the luminance and structure scores of the augmented images are very close to the perfect score of 1, indicating that our augmented images preserve the structure and luminance of the actual images. We also applied the two-sample Kolmogorov-Smirnov test [36] to compare the distributions for each metric; the D-statistic and p-value are 1 and 0, respectively, for all the metrics.
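The random-image baseline and the distribution comparison can be sketched as follows. Two points are assumptions: each random pixel is drawn independently from the values that the same pixel position takes across the actual images, and the Kolmogorov-Smirnov D statistic is computed by hand so that the example needs only NumPy. `scipy.stats.ks_2samp` returns the same statistic together with a p-value.

```python
import numpy as np


def random_images_like(actual, n, rng):
    """Random images whose pixel (i, j) is copied from the same position of
    a randomly chosen actual image (actual: n_actual x H x W)."""
    actual = np.asarray(actual)
    n_actual, h, w = actual.shape
    idx = rng.integers(0, n_actual, size=(n, h, w))  # source image per pixel
    rows = np.arange(h)[None, :, None]
    cols = np.arange(w)[None, None, :]
    return actual[idx, rows, cols]


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D: largest gap between empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


rng = np.random.default_rng(0)
stack = rng.random((10, 64, 64))  # stand-in for actual two-person images
baseline = random_images_like(stack, n=100, rng=rng)
print(baseline.shape)  # (100, 64, 64)
```

A D statistic of 1 means the two empirical distributions do not overlap at all, which is what the non-overlapping histograms in Fig. 9 correspond to.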
B. Predictive modeling
We trained predictive models using one-person data (both with and without a gun) and tested them on detecting potential active shooters in three scenarios: 1-person, 2-person, and 3-person data. In the multi-person testing scenarios, either none of the individuals carries a gun or exactly one of them carries a concealed gun. To extract frequency-domain features of the data, the 2-D fast Fourier transform (FFT) was applied to the micro-Doppler signature images. Note that the micro-Doppler signature images were themselves created by applying the STFT to the radar output: the radar output is divided into short segments of equal length, and the FFT of each segment reveals the Doppler shift due to target movement during that short period. Hence, the micro-Doppler signature images contain combined frequency-domain characteristics of the subjects, and the FFT of these images unveils the combined patterns in the frequency domain, as shown in Fig. 10. Fig. 10 shows two radar signature images of subject 1 walking toward the radar with and without the concealed gun, along with their corresponding FFT images in regular and logarithmic scales. As shown in Fig. 10, the periodic pattern differs when the subject holds the rifle compared to when he does not. Thus, all the models were trained separately on the FFT images and on the micro-Doppler signature images to determine which input gives the best performance. Since FFT images in regular scale are sparse, we used their logarithmic scale to train the CNNs.
Fig. 10: Micro-Doppler signature images of subject 1 walking toward the radar with and without the rifle, with their corresponding FFT images in regular and logarithmic scales. In (a), the x-axis and y-axis denote time in seconds and velocity in meters per second, respectively. In (b) and (c), the x-axis and y-axis denote frequency in hertz.
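The two transforms described above, the STFT that turns the radar output into a micro-Doppler image and the log-scaled 2-D FFT of that image, can be sketched with NumPy. The segment length, hop size, Hann window, sampling rate, and the synthetic test signal are all hypothetical; the radar processing chain and the resizing to 64 × 64 pixels used in this work are not reproduced.

```python
import numpy as np


def stft_magnitude(signal, seg_len=128, hop=32):
    """Spectrogram: split the (complex) radar output into equal-length
    segments, window each, and take its FFT. Shape: (seg_len, n_segments)."""
    signal = np.asarray(signal)
    window = np.hanning(seg_len)
    starts = range(0, len(signal) - seg_len + 1, hop)
    frames = np.stack([signal[s:s + seg_len] * window for s in starts], axis=1)
    spec = np.fft.fftshift(np.fft.fft(frames, axis=0), axes=0)  # centre zero Doppler
    return np.abs(spec)


def log_fft2(image):
    """Log-magnitude 2-D FFT of an image, zero frequency centred."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    return np.log1p(np.abs(spectrum))  # log1p avoids log(0) in sparse spectra


fs = 1000.0  # hypothetical sampling rate (Hz)
t = np.arange(4096) / fs
# Toy target: 50 Hz Doppler tone with a periodic 2 Hz micro-Doppler modulation.
x = np.exp(1j * (2 * np.pi * 50 * t + 20 * np.sin(2 * np.pi * 2 * t)))
md = stft_magnitude(x)
print(md.shape, log_fft2(md).shape)  # (128, 125) (128, 125)
```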
We used leave-one-subject-out cross-validation to train and test all the predictive models: in each fold, one subject is held out for testing and the models are trained on the remaining subjects. In the second scenario, where we trained our models on the augmented one-person data and tested them on the actual data, we kept the subjects in the test set completely out of the training set. For example, when testing on subjects 1-2, we train on the augmented data of subjects 3 to 7, as explained in the Data Augmentation section. The baseline for comparison is the same model trained on the actual data without any data augmentation.
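The subject-wise splitting can be sketched as a small generator plus a helper for the multi-person case. The helper names are illustrative; scikit-learn's `LeaveOneGroupOut` implements the same fold structure for the one-person case.

```python
import numpy as np


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx), one fold per subject."""
    subject_ids = np.asarray(subject_ids)
    for s in np.unique(subject_ids):
        test = np.flatnonzero(subject_ids == s)
        train = np.flatnonzero(subject_ids != s)
        yield s, train, test


def training_subjects(all_subjects, test_group):
    """One-person subjects allowed into the pool that is augmented for training."""
    return sorted(set(all_subjects) - set(test_group))


for subject, train_idx, test_idx in leave_one_subject_out([1, 1, 2, 2, 3]):
    print(subject, train_idx, test_idx)
print(training_subjects(range(1, 8), (1, 2)))  # [3, 4, 5, 6, 7]
```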
To train the one-class and two-class SVMs, the 64 × 64 pixel micro-Doppler signature images and their Fourier-transformed images were vectorized into vectors of length 4096 and organized into training and testing matrices based on the subject being tested. The training matrices were then normalized and used to fit PCA separately for the two-class and one-class training data. All training and testing data were projected onto the first 90 principal components, which retained over 85% of the variance of the original data while reducing the dimension from 4096 to 90. The same set of images was used to train and test the CNN with the cross-entropy loss, the CNN with the penalized cross-entropy loss, and the CNN optimized with the Bayesian approach that maximizes sensitivity (BCNN). The results for training and testing on one-person data are provided in Table III. As shown in Table III, with micro-Doppler images as input, the CNN optimized with the Bayesian framework achieves the highest accuracy, precision, F1-score, and AUROC, while the two-class SVM attains the highest sensitivity. When the FFT of the micro-Doppler images is used for training, the two-class SVM outperforms the other models on all metrics.
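The PCA step can be sketched in NumPy as below. Normalization is assumed to be per-pixel standardization with training-set statistics, and random arrays stand in for the image matrices, so the retained-variance value this prints says nothing about the 85% reported above. In practice, `sklearn.decomposition.PCA(n_components=90)` followed by `sklearn.svm.SVC` or `sklearn.svm.OneClassSVM` is a common way to build this pipeline.

```python
import numpy as np


def fit_pca(train, n_components=90):
    """Standardize training vectors and fit PCA through an SVD.

    train: (n_samples, n_features) array, e.g. flattened 64 x 64 images.
    Returns (mean, std, components, retained_variance_ratio).
    """
    train = np.asarray(train, dtype=float)
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0                # avoid dividing by zero on constant pixels
    z = (train - mean) / std
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)    # variance ratio of each component
    return mean, std, vt[:n_components], float(explained[:n_components].sum())


def transform(x, mean, std, components):
    """Project flattened samples onto the fitted principal components."""
    return ((np.asarray(x, dtype=float) - mean) / std) @ components.T


rng = np.random.default_rng(0)
train = rng.random((191, 64 * 64))  # stand-in for 191 flattened training images
test = rng.random((30, 64 * 64))    # stand-in for 30 held-out images
mean, std, comps, retained = fit_pca(train, n_components=90)
print(transform(train, mean, std, comps).shape, transform(test, mean, std, comps).shape)
print(f"variance retained: {retained:.2f}")
```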
We used the same modeling framework for the two-person scenarios, the only difference being the training data, since there were no actual two-person signatures in the training set. The models were trained from scratch on artificially augmented data but tested on actual two-person data. The training data were augmented using the method described in the Data Augmentation section, with the images of the two subjects in the test set left completely out of the training set before augmentation. Because the augmentation assigns random weights to each subject through an iterative process, we generated training sets of different sizes with different skewness percentages, i.e., different proportions of the abnormal class. We then trained the models on each skewed augmented dataset, tested them on the actual two-person data, and averaged each metric. We also combined all the classifiers by majority voting as an ensemble prediction approach. The results are summarized in Fig. 11, and the complete results are provided in Table VII. As these results show, the ensemble classifier and the CNN optimized with the Bayesian framework outperform the other models, not only in sensitivity (the Bayesian CNN's objective function) but also in most of the other metrics. As shown in Fig. 11, skewing the distribution of the training dataset towards the abnormal (gun-holder) class improves classifier performance, with a peak at 80% skewness for the models trained on micro-Doppler images and at 70% for those trained on the FFT of the micro-Doppler images.
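Two ingredients of this experiment, drawing a training set with a chosen share of the abnormal class and combining classifiers by majority vote, can be sketched as below. This is a simplification: here the skewed set is subsampled from an already labeled pool, whereas the text controls skewness during augmentation, and sending ties to the abnormal class is an assumption, since the text does not say how ties are resolved.

```python
import numpy as np


def skewed_subset(labels, skew, size, rng):
    """Indices of a training subset in which a fraction `skew` is abnormal (label 1)."""
    labels = np.asarray(labels)
    n_abnormal = int(round(skew * size))
    abnormal = np.flatnonzero(labels == 1)
    normal = np.flatnonzero(labels == 0)
    pick_ab = rng.choice(abnormal, size=n_abnormal, replace=False)
    pick_no = rng.choice(normal, size=size - n_abnormal, replace=False)
    return np.concatenate([pick_ab, pick_no])


def majority_vote(predictions, tie_label=1):
    """Combine binary predictions (n_models x n_samples) by majority vote.

    Ties, possible with an even number of models, go to `tie_label`.
    """
    predictions = np.asarray(predictions)
    votes = predictions.sum(axis=0)
    n_models = predictions.shape[0]
    out = (2 * votes > n_models).astype(int)
    out[2 * votes == n_models] = tie_label
    return out


rng = np.random.default_rng(0)
labels = np.array([0] * 500 + [1] * 500)  # hypothetical labeled pool
idx = skewed_subset(labels, skew=0.8, size=200, rng=rng)
print(labels[idx].mean())  # 0.8
print(majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]]))  # [1 0 1]
```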
To minimize the objective loss (the FNR, whose minimization maximizes the sensitivity), the Bayesian optimizer seeks optimal hyperparameters iteratively; in each iteration, it picks the next set of hyperparameters based on the estimated uncertainty of the objective. In other words, the posterior distributions of the hyperparameters are updated by exploring the region of the hyperparameter space with the most uncertainty about the objective function. The prior and posterior distributions of the CNN's learning rate are shown in Fig. 13, where the optimum learning rate is 0.0037. The prior and posterior distributions of the number of units in the first through third dense layers are also shown in Fig. 13. Their prior distributions were chosen to be quantized uniform distributions with different ranges. As Fig. 13 shows, the posterior distributions of the number of units in the first and third dense layers are almost unimodal, whereas the posterior distribution for the second dense layer does not deviate considerably from a uniform distribution. This suggests that the first and third dense layers play a more important role in minimizing the objective function of the Bayesian optimizer.
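A generic GP-based Bayesian optimization loop with an expected-improvement acquisition function illustrates the iteration described above. It is not the optimizer used in this work: it searches a single hyperparameter (u = log10 of the learning rate) on a grid, uses a fixed RBF kernel length scale, and replaces the validation FNR with a hypothetical quadratic `toy_objective` whose minimum is placed at the reported learning rate of 0.0037.

```python
import math

import numpy as np


def rbf_kernel(a, b, length=0.3):
    """Squared-exponential kernel with unit prior variance."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)


def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """GP posterior mean and std at x_query (constant mean = mean of y_train)."""
    y_mean = y_train.mean()
    k = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_s = rbf_kernel(x_train, x_query)
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    mu = k_s.T @ alpha + y_mean
    v = np.linalg.solve(chol, k_s)
    var = np.clip(1.0 - np.sum(v**2, axis=0), 1e-12, None)  # prior variance is 1
    return mu, np.sqrt(var)


def expected_improvement(mu, sd, best):
    """EI for minimization: E[max(best - f, 0)] with f ~ N(mu, sd^2)."""
    z = (best - mu) / sd
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sd * pdf


def bayes_opt(objective, grid, n_init=3, n_iter=10, seed=0):
    """Minimize `objective` over a 1-D grid with a GP surrogate and EI."""
    rng = np.random.default_rng(seed)
    xs = list(rng.choice(grid, size=n_init, replace=False))
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        mu, sd = gp_posterior(np.array(xs), np.array(ys), grid)
        ei = expected_improvement(mu, sd, min(ys))
        x_next = grid[int(np.argmax(ei))]  # most promising point under the surrogate
        xs.append(x_next)
        ys.append(objective(x_next))
    i_best = int(np.argmin(ys))
    return xs[i_best], ys[i_best]


target = math.log10(0.0037)  # toy optimum at the learning rate reported in the text


def toy_objective(u):
    """Hypothetical stand-in for validation FNR versus u = log10(learning rate)."""
    return (u - target) ** 2


grid = np.linspace(-3.0, -2.0, 101)  # log10 of [0.001, 0.01]
best_x, best_y = bayes_opt(toy_objective, grid)
print(f"best learning rate ~ {10 ** best_x:.4f}, toy loss {best_y:.5f}")
```

EI is large where the surrogate predicts a low loss or is uncertain, which is the exploration/exploitation trade-off behind the behavior of the posteriors in Fig. 13.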
TABLE III: Results for training on one-person data and testing on one-person data, where penalized cross-entropy is denoted (PCE) and the CNN optimized with the Bayesian approach is denoted Bayesian CNN. Each result is the average of the metric over multiple runs, to minimize the effect of random initialization. In total, 120 micro-Doppler images were used for testing. Because we used leave-one-subject-out cross-validation (4 test subjects, each recorded under 6 different conditions with 5 repetitions), each fold had 30 images for testing and 191 (90 + 101) for training. The additional 101 images belong to three other subjects and were not labeled subject-wise.
Models
Input = Micro-Doppler signature images Input = Fourier transformed Micro-Doppler signature images
Accuracy Sensitivity Precision F1-score AUROC Accuracy Sensitivity Precision F1-score AUROC
One-class SVM 60.64 60.64 57.13 62.66 69.95 54.51 54.51 51.25 51.69 45.92
Two-class SVM 93.30 93.30 93.67 93.16 91.25 98.33 98.33 98.61 98.37 98.75
CNN (base) 85.27 80.39 72.36 88.14 84.66 91.74 87.50 89.74 88.61 90.87
CNN + PCE loss 92.50 84.03 90.75 92.50 92.68 89.90 87.50 85.36 86.61 90.87
Bayesian CNN 95.00 90.65 94.50 94.88 93.45 91.74 90.00 87.80 88.88 90.54
[Fig. 11 panels: Accuracy, Sensitivity, and F1-score versus training-set Skewness (%) for One-class SVM, Two-class SVM, CNN, CNN PCE, BCNN, and Ensemble; (a) Micro-Doppler signature images as the input of the predictive models; (b) FFT of Micro-Doppler signature images as the input of the predictive models.]
Fig. 11: Accuracy, sensitivity (recall), and F1-score of each model across class distributions of the training set for the two-person walking-toward-the-radar detection problem.
[Fig. 12 panels: Accuracy, Sensitivity, and F1-score versus training-set Skewness (%) for One-class SVM, Two-class SVM, CNN, CNN PCE, BCNN, and Ensemble; (a) Micro-Doppler signature images as the input of the predictive models; (b) FFT of Micro-Doppler signature images as the input of the predictive models.]
Fig. 12: Accuracy, sensitivity (recall), and F1-score of each model across class distributions of the training set for the three-person walking-toward-the-radar detection problem.
Similar to the procedure for the two-person data, we augmented the one-person data to generate three-person data for training all the models and subsequently tested the models on actual three-person data. We used leave-one-out cross-validation to evaluate all the models, with the subjects included in the test set left out of the training set. To generate the augmented training set, we applied the approach of Section II-D twice. First, we created two-person data by repeatedly assigning random weights to each individual subject. Subsequently, we created three-person images by assigning random weights to the created two-person images and another one-person image. The results of training on the three-person data created from one-person data and testing on the actual three-person data are provided in Fig. 12, and the complete results are provided in Table VIII. As the results show, the CNN optimized with the Bayesian framework outperforms the other predictive models in terms of accuracy and sensitivity, with both micro-Doppler signature images and their FFTs as inputs.
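The two-stage construction of three-person training images can be sketched as below. Eq. 1 is not reproduced here, so the mixing rule w·a + (1 − w)·b with w ∼ U(0, 1) is an assumption, as is restricting each synthetic group to at most one rifle carrier (matching the multi-person test scenarios described earlier). The function names are hypothetical.

```python
import numpy as np


def mix_pair(a, b, rng):
    """Hypothetical weighted combination w*a + (1 - w)*b with w ~ U(0, 1)."""
    w = rng.uniform(0.0, 1.0)
    return w * np.asarray(a, dtype=float) + (1.0 - w) * np.asarray(b, dtype=float)


def make_three_person(images, labels, subjects, rng, max_tries=100):
    """One synthetic three-person image from one-person images of 3 distinct subjects.

    Stage 1 mixes two subjects into a two-person image; stage 2 mixes that
    result with a third subject. Returns (image, label), label 1 = rifle present.
    """
    subjects = np.asarray(subjects)
    labels = np.asarray(labels)
    for _ in range(max_tries):
        chosen = rng.choice(np.unique(subjects), size=3, replace=False)
        picks = [int(rng.choice(np.flatnonzero(subjects == s))) for s in chosen]
        n_guns = int(sum(labels[p] == 1 for p in picks))
        if n_guns <= 1:  # at most one rifle carrier per group
            break
    else:
        raise RuntimeError("could not draw a group with at most one rifle carrier")
    two = mix_pair(images[picks[0]], images[picks[1]], rng)  # stage 1: two-person
    three = mix_pair(two, images[picks[2]], rng)             # stage 2: three-person
    return three, n_guns


rng = np.random.default_rng(0)
images = rng.random((12, 64, 64))      # stand-in one-person images
subjects = np.repeat([3, 4, 5, 6], 3)  # hypothetical subject ids
labels = np.tile([0, 0, 1], 4)         # 1 = rifle carrier
img3, label = make_three_person(images, labels, subjects, rng)
print(img3.shape, label)
```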
The results show that the synthetically generated training dataset built from acquired one-person data improved both the accuracy (from around 55% without augmentation to 88% with augmentation) and the sensitivity (from around 55% without augmentation to 86% with augmentation) of the models tested on actual two-person data. The results for the three-person scenario are still limited in terms of final accuracy and sensitivity, but they remain a significant improvement over the no-augmentation baseline (accuracy from around 52% without augmentation to 72% with augmentation, and sensitivity from around 50% to 70%). One possible reason for the roughly 70% accuracy is the limited number of individuals used to generate the one-person dataset (only 8 people, with 3 kept for testing and the remaining 5 used to generate the training set), so the variability captured in the training dataset is still limited. Further research is needed to explore the effect of including more subjects in the single-person training data on three-person performance. However, we observed that changing the percentage of samples in each class of the training set can affect accuracy and sensitivity significantly, as shown in Figs. 11 and 12; thus, the skewness can be treated as a hyperparameter during model training.
IV. CONCLUSION
This article investigated the feasibility of using RF based
Machine Vision for identifying a potential shooter with a
concealed weapon from a group of two or three people. Earlier
studies have focused on anomaly detection for single person
scenarios, but we observed that a model trained on single
person scenarios and applied to two or three-person scenarios
does not perform any better than random predictions (with
accuracy in the range of 50%). We improved the performance of our detection approach for the multi-person scenario through improvements in training sample generation, training sample selection, and model selection. In terms of training sample generation, we considered data augmentation using synthetically created multi-person data, which resulted in performance improvements of 20-30%. For training sample selection, we observed that unbalanced classes in training improved the performance of our classifier, and thus the skewness was incorporated as a hyperparameter in our model training. In terms of model selection, we considered a Bayesian optimizer framework that allows us to define a secondary objective function in conjunction with the primary objective of maximizing accuracy considered in a regular machine learning model. We showed that the Bayesian optimization framework was able to improve both the primary objective of increasing accuracy and the secondary objective of increasing sensitivity by around 5%. Furthermore, our approach used a mono-static radar (one transmitter and one receiver working coherently at a single location), which is simpler and less costly than the multi-static radars (multiple transmitters and receivers at different locations) used in other state-of-the-art studies in this area such as [37] and [38]. We believe that this study is the first to consider concealed rifle detection in multi-person scenarios using radar sensors. This detection method can potentially be integrated with other detection approaches to improve the reliability and robustness of the system. Future research in this area needs to consider the effect of multiple radar sensors on the reliability and robustness of the results, as well as optimized strategies for training sample generation.
APPENDIX
TABLE IV: Acquired micro-Doppler signature images for tasks performed by one person individually. The rows denote the subjects and the columns the tasks: carrying a rifle (Gun); not carrying the rifle (Walk); carrying a gym bag (Gym); carrying a gym bag and walking toward the radar at an angle (GymAngle); walking toward the radar at an angle without carrying the rifle (WalkAngle); walking toward the radar at an angle while carrying the rifle (GunAngle).
Subjects Gun Walk Gym GymAngle WalkAngle GunAngle Total
S1 5 5 5 5 5 5 30
S2 5 5 5 5 5 5 30
S3 5 5 5 5 5 5 30
S4 5 5 5 5 5 5 30
S5-7 12 12 12 22 21 22 101
TABLE V: Acquired micro-Doppler signature images for tasks performed by two people. The rows denote the pairs of subjects and the columns the tasks: one of the two people carries a rifle (Gun); neither carries the rifle (Walk); one of them carries a gym bag and the other carries nothing (Gym); one of them carries the rifle and the other carries the gym bag (GunGym); neither carries the rifle and both walk toward the radar at an angle (WalkAngle).
Subjects Gun Walk Gym GunGym WalkAngle Total
S12 2 2 0 4 4 12
S23 2 2 0 4 4 12
S24 4 2 4 4 4 18
S34 4 2 4 4 4 18
TABLE VI: Acquired micro-Doppler signature images for tasks performed by three people. The rows denote the groups of subjects and the columns the tasks: one of the three people carries a rifle (Gun); none of them carries the rifle (Walk); one of them carries a gym bag and the others carry nothing (Gym); one of them carries the rifle, one carries the gym bag, and the third carries nothing (GunGym).
Subjects Gun Walk Gym GunGym Total
S123 5 5 5 5 20
S134 5 5 5 5 20

Fig. 13: Prior and posterior distribution of hyper-parameters including learning rate, and the number of units in first, second, and third
dense layer of the CNN throughout the optimization process.

TABLE VII: Results of training on augmented one-person data and testing on two-person data, where penalized cross-entropy is denoted (PCE) and the CNN optimized with the Bayesian approach is denoted Bayesian CNN. Each result is the average of the metric over multiple runs, which minimizes the effect of random initialization. 60 micro-Doppler images were used for testing, and the training size varies with the skew rate given in the training class column.
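Training class Models
Input = Micro-Doppler signature images Input = Fourier transformed Micro-Doppler signature images
Accuracy Sensitivity Precision F1-score AUROC Accuracy Sensitivity Precision F1-score AUROC
Without Augmentation (training size = 191)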
Baseline One-class SVM 47.42 47.42 22.87 30.67 50.00 48.83 38.83 38.73 33.76 51.25
Baseline Two-class SVM 54.51 54.51 42.96 40.27 51.56 52.73 52.73 28.32 36.70 50.00
Baseline CNN 55.00 0.04 100.00 0.07 51.66 58.33 10.71 100.00 19.35 51.57
Skewed (30 %, training size = 2857)
One-class SVM 42.93 43.33 29.74 32.74 43.33 49.83 52.19 49.23 38.58 52.19
Two-class SVM 71.30 68.75 77.68 65.38 68.75 69.64 67.50 78.99 64.30 67.50
CNN (base) 71.67 60.71 73.91 66.67 70.12 70.00 57.14 72.73 64.00 68.74
CNN + PCE 73.33 64.29 75.00 69.23 73.33 73.33 60.71 77.27 68.00 72.41
Bayesian CNN 76.67 67.86 79.17 73.08 75.16 78.33 71.43 80.00 75.47 77.16
Ensemble 76.67 76.34 76.70 76.43 76.34 73.33 76.34 73.30 73.06 72.99
Skewed (40 %, training size = 3143)
One-class SVM 44.32 44.58 31.31 34.29 44.58 48.44 50.63 44.65 37.54 50.63
Two-class SVM 77.26 74.48 84.16 72.67 74.48 71.21 69.58 79.68 66.84 69.58
CNN (base) 75.00 64.29 78.26 70.59 76.42 73.33 60.71 77.27 68.00 72.41
CNN + PCE 76.67 67.86 79.17 73.08 75.16 73.33 64.29 75.00 69.23 73.33
Bayesian CNN 80.00 71.43 83.33 76.92 79.28 78.33 75.00 77.78 76.36 78.33
Ensemble 83.33 83.04 83.48 83.16 83.04 80.00 83.04 80.09 79.80 79.69
One-class SVM 45.71 45.83 32.66 35.72 45.83 50.22 52.19 45.06 38.29 52.19
Two-class SVM 78.82 76.56 84.96 74.83 76.56 69.82 68.02 78.10 64.76 68.02
CNN (base) 78.33 64.28 85.71 73.47 76.56 80.00 71.42 83.33 76.92 79.28
CNN + PCE 75.00 67.85 76.00 71.69 74.44 81.66 78.57 81.48 79.99 82.93
Bayesian CNN 83.33 75.00 87.50 80.76 83.58 83.33 82.14 82.14 82.14 84.71
Ensemble 80.00 79.24 81.34 79.43 79.24 81.67 79.24 81.65 81.54 81.47
One-class SVM 45.71 45.83 32.66 35.72 45.83 50.22 52.19 45.06 38.29 52.19
Two-class SVM 79.99 78.13 84.97 77.44 78.13 78.25 77.19 82.10 76.11 77.19
CNN (base) 78.33 67.86 82.61 74.51 76.93 76.67 64.29 81.82 72.00 76.42
CNN + PCE 76.67 71.43 76.92 74.07 74.36 76.67 67.86 79.17 73.08 75.16
Bayesian CNN 83.33 75.00 87.50 80.76 82.58 86.67 82.14 88.46 85.19 84.65
Ensemble 83.33 83.04 83.48 83.16 83.04 80.00 83.04 80.09 79.80 79.69
One-class SVM 45.71 45.83 32.66 35.72 45.83 50.22 52.19 45.06 38.29 52.19
Two-class SVM 83.16 82.08 84.70 82.17 82.08 79.86 80.21 80.31 79.45 80.21
CNN (base) 81.67 78.57 81.48 80.00 80.54 85.00 82.14 85.19 83.64 83.73
CNN + PCE 83.33 82.14 82.14 82.14 82.79 78.33 82.14 74.19 77.97 77.62
Bayesian CNN 85.00 82.14 85.19 83.64 83.73 86.67 82.14 88.46 85.19 86.28
Ensemble 83.33 83.26 83.26 83.26 83.26 88.33 83.26 88.26 88.30 88.39
One-class SVM 44.32 44.58 31.31 34.29 44.58 50.22 52.19 45.06 38.29 52.19
Two-class SVM 78.42 77.19 79.88 76.85 77.19 68.77 70.42 80.08 66.11 70.42
CNN (base) 86.67 85.71 85.71 85.71 86.28 88.33 85.71 88.89 87.27 87.71
CNN + PCE 82.76 82.14 82.14 82.14 82.85 83.33 78.57 84.62 81.48 81.14
Bayesian CNN 88.33 85.71 88.89 87.27 87.71 88.33 89.29 86.21 87.79 87.57
Ensemble 86.67 86.61 86.61 86.61 86.61 88.33 86.61 88.26 88.30 88.39
One-class SVM 45.71 45.83 32.66 35.72 45.83 50.22 52.19 45.06 38.29 52.19
Two-class SVM 79.64 79.38 80.21 79.07 79.38 59.18 61.25 64.46 51.81 61.25
CNN (base) 83.33 78.57 85.71 73.47 76.56 85.00 82.14 85.18 83.63 83.73
CNN + PCE 81.66 82.14 76.00 71.69 74.44 76.66 82.14 71.78 76.66 78.42
Bayesian CNN 86.66 82.14 87.50 80.76 82.58 86.66 85.71 85.71 85.71 86.42
Ensemble 85.00 84.82 85.02 84.90 84.82 85.00 84.82 84.93 84.96 85.04

TABLE VIII: Results of training on augmented one-person data and testing on three-person data, where penalized cross-entropy is denoted (PCE) and the CNN optimized with the Bayesian approach is denoted Bayesian CNN. Each result is the average of the metric over multiple runs, which minimizes the effect of random initialization. 40 micro-Doppler images were used for testing, and the training size varies with the skew rate given in the training class column.
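Training class Models
Input = Micro-Doppler signature images Input = Fourier transformed Micro-Doppler signature images
Accuracy Sensitivity Precision F1-score AUROC Accuracy Sensitivity Precision F1-score AUROC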
Without Augmentation
Baseline One-class SVM 50.00 50.00 25.00 33.33 50.00 65.00 65.00 77.86 51.84 60.00
Baseline Two-class SVM 50.00 50.00 25.00 33.33 50.00 50.00 50.00 25.00 33.33 50.00
Baseline CNN 52.50 0.05 100.00 0.09 52.50 47.50 50.00 47.61 48.78 47.50
One-class SVM 50.00 50.00 39.59 42.50 50.00 72.50 72.50 79.71 70.06 72.50
Two-class SVM 55.00 55.00 51.39 42.86 55.00 50.00 50.00 25.00 33.33 50.00
CNN (base) 47.50 35.00 46.67 40.00 50.00 52.50 45.00 52.94 48.65 52.50
CNN + PCE 50.00 35.00 50.00 41.18 50.00 52.50 45.00 52.94 48.65 52.50
Bayesian CNN 50.00 45.00 50.00 47.37 50.00 52.50 50.00 52.63 51.28 52.50
Ensemble 52.50 52.50 52.67 51.75 52.50 52.50 52.50 52.56 52.23 52.50
One-class SVM 50.00 50.00 39.59 42.50 50.00 72.50 72.50 79.71 70.06 72.50
Two-class SVM 55.00 55.00 51.39 42.86 55.00 50.00 50.00 25.00 33.33 50.00
CNN (base) 55.00 30.00 60.00 40.00 55.00 62.50 50.00 66.67 57.14 61.14
CNN + PCE 55.00 30.00 60.00 40.00 55.00 57.50 55.00 57.89 56.41 56.92
Bayesian CNN 57.50 60.00 57.14 58.54 57.50 60.00 55.00 61.11 57.89 58.53
Ensemble 55.00 55.00 55.95 53.13 55.00 62.50 55.00 63.33 61.90 62.50
One-class SVM 47.50 47.50 36.84 39.48 47.50 72.50 72.50 79.71 70.06 72.50
Two-class SVM 55.00 55.00 51.39 42.86 55.00 52.50 52.50 50.66 38.45 52.50
CNN (base) 55.00 25.00 62.50 35.71 51.65 67.50 55.00 73.33 62.86 65.40
CNN + PCE 57.50 35.00 63.63 45.16 52.93 67.50 60.00 70.58 64.86 67.50
Bayesian CNN 62.50 75.00 60.00 66.66 60.48 72.50 65.00 76.47 70.27 70.58
Ensemble 55.00 55.00 56.67 52.00 55.00 67.50 55.00 68.67 66.98 67.50
One-class SVM 47.50 47.50 36.84 39.48 47.50 72.50 72.50 79.71 70.06 72.50
Two-class SVM 52.50 52.50 50.66 38.45 52.50 57.50 57.50 77.05 47.98 57.50
CNN (base) 55.00 30.00 60.00 40.00 55.00 65.00 55.00 68.75 61.11 65.00
CNN + PCE 55.00 35.00 58.33 43.75 55.00 67.50 60.00 70.59 64.86 67.50
Bayesian CNN 60.00 70.00 58.33 63.64 60.00 70.00 65.00 72.22 68.42 70.00
Ensemble 55.00 55.00 56.67 52.00 55.00 62.50 55.00 64.25 61.32 62.50
One-class SVM 47.50 47.50 36.84 39.48 47.50 72.50 72.50 79.71 70.06 72.50
Two-class SVM 57.50 57.50 52.21 46.72 57.50 60.00 60.00 69.61 54.42 60.00
CNN (base) 60.00 55.00 61.11 57.89 58.53 67.50 60.00 70.59 64.86 67.50
CNN + PCE 62.50 60.00 63.16 61.54 61.83 65.00 60.00 66.67 63.16 63.16
Bayesian CNN 67.50 70.00 66.66 68.29 67.50 70.00 65.00 72.22 68.42 69.73
Ensemble 67.50 67.50 67.90 67.32 67.50 67.50 67.50 67.90 67.32 67.50
One-class SVM 47.50 47.50 36.84 39.48 47.50 72.50 72.50 79.71 70.06 72.50
Two-class SVM 55.00 55.00 51.55 45.57 55.00 57.50 57.50 59.29 55.17 57.50
CNN (base) 67.50 65.00 68.42 66.67 67.50 72.50 70.00 73.68 71.79 72.50
CNN + PCE 70.00 65.00 72.22 68.42 70.00 70.00 70.00 70.00 70.00 70.00
Bayesian CNN 72.50 70.00 73.68 71.79 72.50 77.50 75.00 78.95 76.92 77.50
Ensemble 70.00 70.00 70.20 69.92 70.00 72.50 70.00 73.02 72.34 72.50
One-class SVM 47.50 47.50 36.84 39.48 47.50 72.50 72.50 79.71 70.06 72.50
Two-class SVM 55.00 55.00 57.81 48.26 55.00 65.00 65.00 65.82 64.55 65.00
CNN (base) 60.00 50.00 62.50 55.55 60.00 70.00 60.00 75.00 66.66 70.00
CNN + PCE 65.00 55.00 68.75 61.11 62.50 70.00 65.00 72.22 68.42 71.00
Bayesian CNN 67.50 75.00 65.21 69.76 67.50 72.50 75.00 71.42 73.17 73.75
Ensemble 62.50 62.50 63.33 61.90 62.50 72.50 62.50 73.02 72.34 72.50

REFERENCES
[1] K. W. Schweit, “Active shooter incidents in the united states in 2014
and 2015,” Washington, DC: US Department of Justice, Federal Bureau
of Investigation, 2016.
[2] E. D. Nunes, “United nations office on drugs and crime (unodc). global
study on homicide: Trends, context, data. vienna: Unodc; 2011,” Ciência
Saúde Coletiva, vol. 17, pp. 3447–3449, 2012.
[3] T. Smith, “Weapon location by acoustic-optic sensor fusion,” Sept. 16
2003. US Patent 6,621,764.
[4] Y. Li, Z. Peng, R. Pal, and C. Li, “Potential active shooter detection
based on radar micro-doppler and range-doppler analysis using artificial
neural network,” IEEE Sensors Journal, vol. 19, no. 3, pp. 1052–1063,
2019.
[5] R. Olmos, S. Tabik, and F. Herrera, “Automatic handgun detection alarm
in videos using deep learning,” Neurocomputing, vol. 275, pp. 66–72,
2018.
[6] D. Tahmoush, J. Silvious, and J. Clark, “An ugs radar with micro-
doppler capabilities for wide area persistent surveillance,” in Radar
Sensor Technology XIV, vol. 7669, p. 766904, International Society for
Optics and Photonics, 2010.
[7] L. Du, L. Li, B. Wang, and J. Xiao, “Micro-doppler feature extraction
based on time-frequency spectrogram for ground moving targets classifi-
cation with low-resolution radar,” IEEE Sensors Journal, vol. 16, no. 10,
pp. 3756–3763, 2016.
[8] L. Du, Y. Ma, B. Wang, and H. Liu, “Noise-robust classification of
ground moving targets based on time-frequency feature from micro-
doppler signature,” IEEE Sensors Journal, vol. 14, no. 8, pp. 2672–2682,
2014.
[9] Y. Kim and H. Ling, “Human activity classification based on micro-
doppler signatures using a support vector machine,” IEEE Transactions
on Geoscience and Remote Sensing, vol. 47, no. 5, pp. 1328–1337, 2009.
[10] Y. Kim and H. Ling, “Human activity classification based on micro-
doppler signatures using an artificial neural network,” in 2008 IEEE
Antennas and Propagation Society International Symposium, pp. 1–4,
IEEE, 2008.
[11] Y. Kim and B. Toomajian, “Hand gesture recognition using micro-
doppler signatures with convolutional neural network,” IEEE Access,
vol. 4, pp. 7125–7130, 2016.
[12] Q. Wan, Y. Li, C. Li, and R. Pal, “Gesture recognition for smart
home applications using portable radar sensors,” in 2014 36th Annual
International Conference of the IEEE Engineering in Medicine and
Biology Society, pp. 6414–6417, IEEE, 2014.
[13] B. Ayhan, C. Kwan, and C. Lu, “Extracting gait characteristics from
micro-doppler features,” in Proceedings of the 2nd International Con-
ference on Vision, Image and Signal Processing, pp. 1–6, 2018.
[14] M. M. Moya, M. W. Koch, and L. D. Hostetler, “One-class classifier
networks for target recognition applications,” NASA STI/Recon Technical
Report N, vol. 93, 1993.
[15] L. Ruff, N. Görnitz, L. Deecke, S. A. Siddiqui, R. Vandermeulen, A. Binder, E. Müller, and M. Kloft, “Deep one-class classification,” in International Conference on Machine Learning, pp. 4390–4399, 2018.
[16] C. C. Aggarwal, “Outlier analysis,” in Data Mining, pp. 237–263, Springer, 2015.
[17] C. Ding, R. Chae, J. Wang, L. Zhang, H. Hong, X. Zhu, and C. Li, “Inattentive driving behavior detection based on portable FMCW radar,” IEEE Transactions on Microwave Theory and Techniques, vol. 67, no. 10, pp. 4031–4041, 2019.
[18] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Advances in Neural Information Processing Systems, pp. 91–99, 2015.
[19] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
[20] O. Bazgir, K. Barck, R. A. Carano, R. M. Weimer, and L. Xie, “Kidney segmentation using 3D U-Net localized with expectation maximization,” in 2020 IEEE Southwest Symposium on Image Analysis and Interpretation (SSIAI), pp. 22–25, IEEE, 2020.
[21] O. Bazgir, R. Zhang, S. R. Dhruba, R. Rahman, S. Ghosh, and R. Pal, “Representation of features as images with neighborhood dependencies for compatibility with convolutional neural networks,” Nature Communications, vol. 11, no. 1, pp. 1–13, 2020.
[22] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, “Improving neural networks by preventing co-adaptation of feature detectors,” arXiv preprint arXiv:1207.0580, 2012.
[23] S. Wold, K. Esbensen, and P. Geladi, “Principal component analysis,” Chemometrics and Intelligent Laboratory Systems, vol. 2, no. 1–3, pp. 37–52, 1987.
[24] V. Vapnik, “The support vector method of function estimation,” in Nonlinear Modeling, pp. 55–85, Springer, 1998.
[25] V. Vapnik, The Nature of Statistical Learning Theory. Springer Science & Business Media, 2013.
[26] C. Li, J. Wang, D. Rodriguez, A. Mishra, Z. Peng, and Y. Li, “Portable Doppler/FSK/FMCW radar systems for life activity sensing and human localization,” in 2019 14th International Conference on Advanced Technologies, Systems and Services in Telecommunications (TELSIKS), pp. 83–93, IEEE, 2019.
[27] K. P. Murphy, Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
[28] J. Mockus, Bayesian Approach to Global Optimization: Theory and Applications, vol. 37. Springer Science & Business Media, 2012.
[29] J. Snoek, O. Rippel, K. Swersky, R. Kiros, N. Satish, N. Sundaram, M. Patwary, M. Prabhat, and R. Adams, “Scalable Bayesian optimization using deep neural networks,” in International Conference on Machine Learning, pp. 2171–2180, 2015.
[30] J. S. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl, “Algorithms for hyper-parameter optimization,” in Advances in Neural Information Processing Systems, pp. 2546–2554, 2011.
[31] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. De Freitas, “Taking the human out of the loop: A review of Bayesian optimization,” Proceedings of the IEEE, vol. 104, no. 1, pp. 148–175, 2015.
[32] O. Bazgir, S. Ghosh, and R. Pal, “Investigation of refined CNN ensemble learning for anti-cancer drug sensitivity prediction,” arXiv preprint arXiv:2009.04076, 2020.
[33] J. Bergstra, D. Yamins, and D. D. Cox, “Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures,” 2013.
[34] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
[35] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D: Nonlinear Phenomena, vol. 60, no. 1–4, pp. 259–268, 1992.
[36] H. W. Lilliefors, “On the Kolmogorov-Smirnov test for normality with mean and variance unknown,” Journal of the American Statistical Association, vol. 62, no. 318, pp. 399–402, 1967.
[37] F. Fioranelli, M. Ritchie, and H. Griffiths, “Classification of unarmed/armed personnel using the NetRAD multistatic radar for micro-Doppler and singular value decomposition features,” IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 9, pp. 1933–1937, 2015.
[38] F. Fioranelli, M. Ritchie, and H. Griffiths, “Centroid features for classification of armed/unarmed multiple personnel using multistatic human micro-Doppler,” IET Radar, Sonar & Navigation, vol. 10, no. 9, pp. 1702–1710, 2016.

Omid Bazgir (S’18) received the B.S. degree in electrical engineering from Islamic Azad University, Najafabad, Iran, in 2012, and the M.S. degree in electrical engineering from the University of Tabriz, Tabriz, Iran, in 2015. He is currently pursuing the Ph.D. degree in electrical engineering at Texas Tech University, Lubbock, TX, USA. In the summer of 2019, he was with Genentech, South San Francisco, CA, USA, where he developed deep image segmentation algorithms for 3D MRI images. In the summer of 2020, he was with Bayer, Saint Louis, MO, USA. His research interests include the development of machine learning and signal/image processing algorithms for biological and industrial applications.
Daniel Nolte received the B.S. degree in electrical engineering from Texas Tech University, Lubbock, TX, USA, in 2019. He is currently pursuing the Ph.D. degree in electrical engineering at Texas Tech University, Lubbock, TX, USA. His research interests include machine learning algorithm development for security and anomaly detection applications.
Saugato Rahman Dhruba (S’18) received the B.S. degree in electrical and electronic engineering from Bangladesh University of Engineering and Technology, Dhaka, Bangladesh, in 2014. He is currently pursuing the Ph.D. degree in electrical engineering at Texas Tech University, Lubbock, TX, USA. From summer to fall 2019, he was with Biogen Inc., Cambridge, MA, USA, where he developed an analysis and visualization pipeline for processing next-generation sequencing data using statistical modeling approaches. His research interests include designing machine learning, transfer learning, and bioinformatics algorithms for biological and industrial applications.
Yiran Li (S’11) received the B.S. degree in electrical engineering from Southern Medical University, China, in 2009, and the M.S. and Ph.D. degrees in electrical engineering from Texas Tech University in 2012 and 2019, respectively. Since summer 2019, she has been a research scientist at United Imaging Healthcare, Houston, TX, USA. From 2015 to 2016, she was with Advanced Bionics, Valencia, CA, USA, where she worked on cochlear implant designs. From 2016 to 2017, she served as the CTO of a startup company, year ONE, LLC, Lubbock, TX, USA, where she worked on radar-based baby monitor design. In the summer of 2017, she was an intern at the United Technologies Research Center, East Hartford, CT, USA, where she focused on human detection algorithm development using FMCW radar. Her research interests include radar sensor design for biomedical and security applications.
Changzhi Li (S’06–M’09–SM’13) received the B.S. degree in electrical engineering from Zhejiang University, China, in 2004, and the Ph.D. degree in electrical engineering from the University of Florida, Gainesville, FL, USA, in 2009. In the summers of 2007–2009, he was with Alereon Inc., Austin, TX, USA, and Coherent Logix Inc., Austin, TX, USA, where he worked on ultra-wideband (UWB) transceivers and software-defined radio, respectively. He joined Texas Tech University as an Assistant Professor in 2009, became an Associate Professor in 2014, and became a Professor in 2020. His research interests include biomedical applications of microwave technology, wireless sensors, and RF/analog circuits. Dr. Li was a recipient of the IEEE Microwave Theory and Techniques Society (MTT-S) Outstanding Young Engineer Award, the IEEE Sensors Council Early Career Technical Achievement Award, the ASEE Frederick Emmons Terman Award, the IEEE-HKN Outstanding Young Professional Award, the NSF Faculty Early CAREER Award, and the IEEE MTT-S Graduate Fellowship Award. He served as the TPC Co-Chair for the IEEE MTT-S International Microwave Biomedical Conference in 2018 and for the IEEE Wireless and Microwave Technology Conference from 2012 to 2013. He is an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I and the IEEE JOURNAL OF ELECTROMAGNETICS, RF AND MICROWAVES IN MEDICINE AND BIOLOGY. He served as an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II from 2014 to 2015.
Souparno Ghosh received the M.S. degree in statistics from the University of Calcutta, Kolkata, India, in 2004, and the Ph.D. degree in statistics from Texas A&M University, TX, USA, in 2009. He is currently an Associate Professor in the Department of Statistics, University of Nebraska, Lincoln, NE, USA. His research areas include Bayesian hierarchical modeling, statistical image analysis, and statistical machine learning.
Ranadip Pal (S’05–M’07–SM’13) received the B.Tech. degree in electronics and electrical communication engineering from the Indian Institute of Technology, Kharagpur, India, in 2002, and the M.S. and Ph.D. degrees in electrical engineering from Texas A&M University, College Station, TX, USA, in 2004 and 2007, respectively. Since 2007, he has been with Texas Tech University, where he is currently a Professor with the Electrical and Computer Engineering Department. His research areas are genomic signal processing, stochastic modeling and control, machine learning, and computational biology. He has authored more than 90 peer-reviewed articles, including publications in high-impact journals such as Nature Medicine and Cancer Cell, and has authored a book entitled Predictive Modeling of Drug Sensitivity. He received the NSF CAREER Award in 2010, the President’s Excellence in Teaching Award in 2012, the Whitacre Research Award in 2014, and the Chancellor’s Council Distinguished Research Award in 2016.
