Convolutional Neural Network with Second-order
Pooling for Underwater Target Classification
Xu Cao, Student Member, IEEE, Roberto Togneri, Senior Member, IEEE, Xiaomin Zhang,
and Yang Yu, Member, IEEE,
Abstract—Underwater target classification using passive sonar
remains challenging due to the variable ocean environment.
Convolutional Neural Networks (CNNs) have shown success in
learning invariant features using local filtering and max pooling.
In this paper, we propose a novel classification framework which
combines the CNN architecture with the second-order pooling
(SOP) to capture the temporal correlations from the time-
frequency (T-F) representation of the radiated acoustic signals.
The convolutional layers are used to learn the local features with a
set of kernel filters from the T-F inputs which are extracted by the
constant-Q transform (CQT). Instead of using max pooling, the
proposed SOP operator is designed to learn the co-occurrences
of different CNN filters using the temporal feature trajectory
of CNN features for each frequency subband. To preserve the
frequency distinctions, the correlated features of each frequency
subband are retained. The pooling results are normalized with
signed square-root and l2 normalization, and then input into
the softmax classifier. The whole network can be trained in
an end-to-end fashion. To explore the generalization ability to
unseen conditions, the proposed CNN model is evaluated on
the real radiated acoustic signals recorded at new sea depths.
The experimental results demonstrate that the proposed method
yields an 8% improvement in classification accuracy over the
state-of-the-art deep learning methods.
Index Terms—underwater target classification, convolutional
neural networks, second-order pooling, constant-Q transform.
I. INTRODUCTION
UNDERWATER target classification aims to detect and
recognize marine vessels from the radiated acoustic
signals recorded by passive sonar. It has many important
applications in ocean engineering, such as automatic target
recognition (ATR) and marine monitoring. The task can be
formulated as a feature representation problem where the
discriminative characteristics are learned from the received
acoustic signals for classification. However, when applied
in practical situations, robustness and generalization ability
to environmental variation are significant for passive sonar
target classification, especially from single-sensor recordings.
Several factors affect the performance of classification
systems, including the lack of a priori knowledge of the targets,
the various working conditions within the same class (such as
speed and power configuration), and the unpredictable ocean
background noise. Consequently, more adaptive and robust
classification models are needed to deal with this problem.
X. Cao, X. Zhang and Y. Yu are with the School of Marine
Science and Technology, Northwestern Polytechnical University, Xi'an
710072, China (e-mail: caoxu@mail.nwpu.edu.cn; xmzhang@nwpu.edu.cn;
nwpuyuy@nwpu.edu.cn).
R. Togneri is with the School of Electrical, Electronics and Computer
Engineering, The University of Western Australia, Perth, WA 6009, Australia
(e-mail: roberto.togneri@uwa.edu.au).
Manuscript received August 15, 2018; revised November 5, 2018.
Several pattern recognition methods have been developed
for underwater target classification with different features
extracted from the radiated acoustic signals. In [1], the features
generated from the wavelet packet transform (WPT) and the
linear predictive coding (LPC) are put into the neural network
(NN) classifier. In [2], a Hidden Markov Model (HMM) is
used for multiaspect target detection and identification. In
[3], a preprocessing method is developed to improve the
performance of a feedforward neural network (NN) for passive
sonar signal classification. A novel class detection scheme
utilizing a clustering approach on an unsupervised neural
network based Self-Organizing Map (SOM) is proposed in [4].
In [5], canonical correlation analysis (CCA) is employed as a
multiaspect feature extraction method for underwater target
classification. In [6], a K-nearest neighbor (K-NN) system
is used as a memory to provide the closest matches of an
unknown pattern in the feature space. In the past few years,
support vector machines (SVMs) have seen an increased usage
in applications of underwater target classification. The
method in [7] proposes to adopt SVM as the classifier for
features captured using the Hilbert-Huang transform (HHT). In
[8], an underwater acoustic feature extraction and classification
method based on the Wigner-Ville distribution (WVD) and the
SVM is presented.
Compared with conventional machine-learning systems
based on a priori knowledge, deep networks are able to
hierarchically learn the high-level features from the large
number of samples, and the extracted deep features are more
robust to signal variations [9–11]. In [12], Kamal et al. proposed
to incorporate the Deep Belief Network (DBN) to capture
several layers of deep features from the underwater acoustic
signals, which are more abstract at the higher layers. Our
past work in [13] utilizes a Stacked Autoencoder (SAE)
for feature learning with the short-time Fourier transform
(STFT), which provides competitive performance. However,
these fully-connected networks demand huge collections of
training samples for effective training, especially when applied
to multiple-frame T-F features.
In recent years, Convolutional Neural Networks (CNNs)
have been successfully applied to many pattern recognition
tasks with local connectivity and weight sharing [14–16].
Compared with the fully-connected deep models, these popular
CNN architectures use a set of filters which process the local
parts of the whole input to capture the detailed characteristics.
Usually, the max-pooling is used to generate holistic and
invariant representations from the CNN features. However, the
max-pooling just focuses on the first-order statistics in the
local regions of the CNN features. For underwater acoustic
signals which have strong temporal relations, max-pooling
may ignore the high-level correlations in the time domain. The
second-order pooling (SOP) has shown success in computer
vision tasks to capture the second-order correlations of the
local features [17]. In this paper, we propose to learn the
second-order temporal correlations of the CNN features for
underwater target classification. The proposed SOP strategy
is designed to compute the co-occurrences of different CNN
filters using the temporal feature trajectory of CNN features as
input. Compared to the max-pooling, the proposed SOP strat-
egy is capable of exploring the second-order co-occurrences
for the CNN feature maps of underwater acoustic signals to
improve the classification performance.
The constant-Q transform (CQT) is popular in music sig-
nal processing since the bin frequencies of the CQT scale
have a perceptually relevant geometrical distribution [18–20].
Compared to the STFT, the CQT can provide a better fre-
quency resolution for lower frequencies and a better temporal
resolution for higher frequencies [21]. The radiated signal
of an underwater target contains much useful information
in the low frequency subbands, such as the line spectrum
components, which are related to the propeller’s turning. The
greater resolution in the low frequencies of the CQT can
contribute to a more robust feature representation. In this study,
unlike [12, 13], we use the CQT as the T-F representation
method for underwater target classification.
In this paper, a new underwater target classification frame-
work based on the CNN model is proposed. Our work focuses
on the second-order pooling (SOP) strategy for the CNN
feature maps. The proposed method is named the CNN-SOP
model. For each frequency subband, the pooling operation
takes a temporal sequence of every CNN feature map as
input to compute the similarities between these temporal
features of different CNN filters. The correlation features of
different frequency subbands are then passed through a signed
square-root step and l2 normalization to generate the final
feature vector, which is input into the softmax classifier for
classification. Furthermore, we propose to use the CQT to
generate the T-F representation for the CNN-SOP model. Since
the generalization ability to unseen conditions is significant in
practical applications, the proposed classification method is
tested on the real radiated acoustic signals recorded at new
depths. The results show that the proposed method achieves
an 8% improvement compared to other deep learning-based
approaches. The proposed second-order pooling strategy is
shown to improve the classification accuracy by a further 4%
over the max pooling.
The rest of this paper is organized as follows: Section II
introduces the related work of the CNN architecture and the
pooling strategy. Section III details the proposed CNN-SOP
model. The experimental results of this method are provided
in Section IV. In Section V we draw our conclusions of this
work.
II. RELATED WORK
Recently, CNN architectures have increasingly been used in
acoustic signal recognition. Approaches developed for image
recognition [22] can be extended to signal classification by
regarding the T-F representation (e.g. spectrogram and MFCC)
of raw signals as an image. In [23], the CNN model is
introduced in acoustic event detection to capture the local
properties of acoustic events, which provides competitive
performance in the evaluation task. In [24], the CNN networks
in conjunction with different data augmentation methods are
applied to environmental sound classification. In [25], the
performances of different auditory and spectrogram image
features using CNN models are evaluated. In [26], the CNN
architecture is integrated with the SVM classifier to improve
the overall classification performance of the real-time signals.
In [27], a CWT and CNN-based fault detection method is
proposed to extract the comprehensive T-F features of fault
signals.
A difficulty in extending regular CNN-based methods
to acoustic signals is that translation invariance in
frequency may not be appropriate, since a difference in
frequency bands usually indicates a different class. This problem
also exists in underwater acoustic signals, since the spectrum
distributions of various vessels differ significantly. One way to
overcome this difficulty is a deep convolutional neural
network architecture in which heterogeneous pooling is used to
provide constrained frequency-shift invariance in the speech
spectrogram [28]. In [29], a parallel CNN architecture is
created, which comprises a CNN layer which is optimized for
processing and recognizing relations in the frequency domain,
and a parallel one which is aimed at capturing temporal
relations. Another promising approach to this
problem is to add an intermap pooling (IMP) layer to the CNN
to increase robustness to spectral variations [30].
Second-order pooling methods have been widely used in
many computer vision tasks. Our proposed pooling approach
is inspired by the second-order pooling scheme in [17] which
summarizes sets of local features inside a free-form region,
while preserving information about their pairwise correlations.
However, this approach uses second-order pooling directly on
raw local descriptors such as SIFT while we apply the SOP
to the CNN feature maps in this work. In [31], a bilinear
CNN (B-CNN) model is proposed for image classification
which consists of two feature extractors based on CNNs whose
outputs are multiplied using the outer product at each location
to obtain the bilinear vector. When using the same CNN
extractor, the bilinear pooling used in the B-CNN model can
be seen as a second-order pooling approach. An improved
bilinear pooling method for CNN features is proposed in [32]
which proposes to use the matrix square-root normalization to
improve the classification performance. In [33], two compact
bilinear representations are proposed to reduce the dimensions
of the full bilinear models. Since the T-F representation is
different from the image input, in contrast to these B-CNN
models, our SOP method just focuses on the temporal corre-
lations and preserves the correlation matrix for each frequency
subband, which can retain the spectral variation characteristics
of different classes. Another second-order temporal pooling
is proposed for action recognition in [34], which uses the
temporal classification scores to generate the descriptor rather
than the CNN features.
III. THE PROPOSED SYSTEM
The whole framework of the proposed CNN-SOP model
is described in Fig. 1. In the preprocessing stage, the raw
radiated acoustic signals are converted into a time-frequency
representation using the CQT. Multiple frames of the CQT
representation are combined to generate the input for the CNN
network. Instead of using max pooling, we adopt second-order
pooling for the CNN feature maps to obtain the temporal
correlation features of the input. Elementwise square-root and
l2 normalization are used to further improve the performance.
The whole network can be trained end-to-end with back-
propagation.
A. Preprocessing using CQT
Since underwater radiated signals are non-stationary, T-F
representation approaches have been shown to be more
effective for feature extraction. The CQT
can transform the time-domain signal to the T-F domain
such that the center frequencies of the frequency bins are
geometrically spaced and their Q-factors are all equal [35].
That means the CQT can provide a better frequency resolution
for low frequency subbands compared to the STFT, and can
show more details about the low-frequency components. In this
paper, we propose to use the CQT to deal with the radiated
acoustic signals.
Given a discrete time-domain signal x(n), the CQT is
defined as:

$$X^{CQ}(k, n) = \sum_{j=n-\lfloor N_k/2 \rfloor}^{n+\lfloor N_k/2 \rfloor} x(j)\, a_k^*(j - n + N_k/2) \qquad (1)$$

where $k = 1, 2, \ldots, K$ indexes the K frequency bins of the
CQT, and $a_k^*(n)$ is the complex conjugate of the basis function
[35]. $N_k$ denotes the window length, which varies across bins.

The center frequency of the kth bin is defined by:

$$f_k = f_1 2^{\frac{k-1}{B}} \qquad (2)$$

where $f_1$ is the center frequency of the lowest-frequency bin
and B is the number of bins per octave, which determines
the time-frequency resolution trade-off of the CQT. The
total number of frequency bins K of the CQT can then be
computed as:

$$K = B\left(\log_2 \frac{f_{\max}}{f_1} + 1\right) \qquad (3)$$

where $f_{\max}$ is the center frequency of the highest-frequency
bin.
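As a quick numerical illustration of Eqs. (2)–(3), the following Python sketch evaluates the bin center frequencies and the bin count; the parameter values ($f_1$ = 4 Hz, $f_{\max}$ = 1 kHz, B = 8) are the ones adopted later in Section IV-A, and the helper name is ours:

```python
import math

f1, fmax, B = 4.0, 1000.0, 8   # parameter values adopted later in Section IV-A

# Eq. (2): center frequency of the k-th bin, f_k = f1 * 2^((k-1)/B)
def center_freq(k):
    return f1 * 2.0 ** ((k - 1) / B)

print(center_freq(1))        # 4.0 Hz: the lowest-frequency bin
print(center_freq(B + 1))    # 8.0 Hz: one octave above f1 when B = 8

# Eq. (3): total number of frequency bins (rounded to an integer in practice)
K = B * (math.log2(fmax / f1) + 1)
print(K)
```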
In our work, we propose to use the CQT to obtain the T-F
features for the CNN model. The CQT T-F feature is derived
from multiple frames as follows:

$$X = \{X^1, X^2, \ldots, X^N\} \qquad (4)$$
Fig. 2. The CNN architecture: the input X is processed by L convolutional
layers (with filter parameters W_1, W_2, ..., W_L and mapping functions
f_1, f_2, ...), producing the feature maps H^1, H^2, ..., H^L.
where

$$X^i = 20 \log_{10} \|X^{CQ}(i)\| \qquad (5)$$

and $X^i$ is the CQT feature for frame i, and N denotes the total
number of frames. $X^{CQ}(i) \in \mathbb{C}^K$ is the complex-valued CQT
vector of the K frequency bins representing frame i.
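To make the framing of Eqs. (4)–(5) concrete, here is a minimal NumPy sketch; it assumes the complex CQT matrix has already been computed by some CQT implementation (the paper uses the MATLAB toolbox of [38]), and the function names are ours:

```python
import numpy as np

def log_magnitude_frames(C_cq):
    """Eq. (5): turn complex CQT frames X^CQ(i) (shape K x N) into
    log-magnitude features X^i = 20*log10||X^CQ(i)||."""
    return 20.0 * np.log10(np.abs(C_cq) + 1e-10)   # eps guards against log(0)

def to_cnn_inputs(X, frames_per_sample=64):
    """Eq. (4): group N consecutive frames into fixed-size CNN input samples."""
    K, N = X.shape
    return [X[:, i:i + frames_per_sample]
            for i in range(0, N - frames_per_sample + 1, frames_per_sample)]

# C_cq would come from a CQT implementation such as the MATLAB toolbox of [38];
# here a random placeholder with K = 64 bins and 640 frames stands in for it:
C_cq = np.random.randn(64, 640) + 1j * np.random.randn(64, 640)
samples = to_cnn_inputs(log_magnitude_frames(C_cq))   # ten 64 x 64 input samples
```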
B. CNN architecture
In contrast to the fully-connected layers, CNNs are designed
to restrict the connections between the hidden units and the
input units, so that each hidden unit connects to only
a small neighborhood of input units. The
locally connected structure also makes it possible for CNNs
to model the local correlations of the input. By replicating
weights across the whole input, the parameters of the convo-
lutional layers are reduced. In this paper, we propose to use the
CNN model comprised of L convolutional layers to learn the
deep representation of the CQT feature. The CNN architecture
is described in Fig. 2. Unlike the regular CNN models, the
max-pooling layers are not adopted in the network since the
resolution is important for classification.
Given our input X, the CNN model learns the nonlinear
representation f which maps the input X to the output $H^L$
of the Lth layer:

$$H^L = f(X) = f_L(\cdots f_2(f_1(X; W_1); W_2) \cdots; W_L) \qquad (6)$$

where $f_l$ is the mapping function of the lth convolutional layer,
which takes the input $H^l$ and generates the feature maps $H^{l+1}$
with the filter parameters $W_l$. The convolutional layers are
constructed with rectified linear units (ReLUs); details of the
convolution process can be found in [36]. The feature maps of
the last layer $H^L$ form an h × w × c array, where h and
w denote the height and width of each feature map, and c is
the number of feature maps.
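A minimal TensorFlow/Keras sketch of this convolutional front end, using the two-layer configuration that Section IV reports as best (8 × 8 filters, 2 × 2 strides, ReLU, no intermediate max-pooling); the framework choice and function name are ours:

```python
import tensorflow as tf

# Two convolutional layers with 8x8 filters, 2x2 strides, ReLU activations and
# no intermediate max-pooling, matching the configuration of Fig. 4:
def build_cnn_front_end(input_shape=(64, 64, 1)):
    inputs = tf.keras.Input(shape=input_shape)   # the CQT input sample X
    h = tf.keras.layers.Conv2D(8, kernel_size=8, strides=2, padding="same",
                               activation="relu")(inputs)   # H^1: 32 x 32 x 8
    h = tf.keras.layers.Conv2D(16, kernel_size=8, strides=2, padding="same",
                               activation="relu")(h)        # H^2: 16 x 16 x 16
    return tf.keras.Model(inputs, h)

build_cnn_front_end().summary()   # H^L has shape (16, 16, 16) = h x w x c
```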
C. Second-order pooling
The T-F representation of the radiated acoustic signals has
strong temporal correlations, which can help to discriminate
different targets. In this work, we propose to use a second-
order pooling scheme for the CNN features to capture the
temporal correlations of the CQT input.
Since the CNN feature maps $H^L$ are learned from the CQT
feature X, h and w correspond to the frequency bins and the
temporal frames of the CQT input. For each frequency bin of
the feature maps, we denote $s^m = [s^m_1, s^m_2, \ldots, s^m_w] \in \mathbb{R}^w$ as
the temporal feature trajectory of the mth feature map (see
Fig. 3).
Fig. 1. The framework of the proposed CNN-SOP system: the original signal
is preprocessed with the CQT into a K × N input X, passed through the CNN
architecture to produce the h × w × c feature maps H^L, pooled per frequency
bin with SOP(S) into a c × c × h result, normalized into the vector z, and fed
through a dense layer and softmax classifier to obtain the class scores.
Fig. 3. The second-order pooling operation: the c × w temporal feature
matrix S is mapped to the c × c matrix SOP(S) = SS^T.
The second-order pooling operator is defined as:

$$\mathrm{SOP}(s^j, s^k) = \sum_{i=1}^{w} s^j_i s^k_i = (s^j)^T s^k \qquad (7)$$

where $\mathrm{SOP}(s^j, s^k)$ represents the temporal correlation of
two feature trajectories $s^j$ and $s^k$ from the jth and kth feature
maps. The SOP operator is designed to capture the interactions
of two convolution filters along the time axis. For c feature
maps, we denote $S \in \mathbb{R}^{c \times w}$ as the temporal feature matrix;
the SOP operator can then be defined in matrix form as:

$$\mathrm{SOP}(S) = SS^T \qquad (8)$$

where $\mathrm{SOP}(S) \in \mathbb{R}^{c \times c}$ is a symmetric positive semidefinite
matrix which captures the temporal correlations of all the
CNN filters for one frequency bin.
Since the differences between frequency bins are useful for
distinguishing underwater acoustic signals, unlike the pooling
strategy in [31], which uses sum-pooling to aggregate the
correlations across the whole image, we retain the SOP results
of all frequency bins to preserve the frequency distinctions for
classification. The final SOP feature, shown in Fig. 1, consists
of h SOP matrices, one per frequency bin of the CNN feature
maps.
It is often found that normalization offers significant
improvements in deep networks. In this work, we apply
elementwise signed square-root and $\ell_2$ normalization to the
SOP matrices. The SOP matrices are first flattened into the
vector $p \in \mathbb{R}^l$, where $l = c \times c \times h$. Then, the vector p
is passed through the elementwise signed square-root
($q \leftarrow \mathrm{sign}(p)\sqrt{|p|}$) and $\ell_2$ normalization ($z \leftarrow q/\|q\|_2$).
For CNN feature maps of size h × w × c, the computational
complexity of our proposed SOP strategy is $O(hwc^2)$,
which is the same as the bilinear pooling in [31], while the
max pooling is $O(hwc)$.
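The following NumPy sketch implements Eqs. (7)–(8) together with the signed square-root and $\ell_2$ normalization described above; the function names are illustrative:

```python
import numpy as np

def second_order_pool(H):
    """Eqs. (7)-(8): H is an h x w x c array of CNN feature maps. For each of
    the h frequency bins, form the temporal feature matrix S (c x w) whose
    m-th row is the trajectory s^m, and compute SOP(S) = S S^T (c x c)."""
    h, w, c = H.shape
    out = np.empty((h, c, c))
    for i in range(h):
        S = H[i].T            # c x w temporal feature matrix for bin i
        out[i] = S @ S.T      # correlations of all pairs of CNN filters
    return out                # all h SOP matrices are retained

def sqrt_l2_normalize(P):
    """Signed square-root and l2 normalization of the flattened SOP features."""
    p = P.reshape(-1)                         # vector of length l = c * c * h
    q = np.sign(p) * np.sqrt(np.abs(p))       # elementwise signed square-root
    return q / (np.linalg.norm(q) + 1e-12)    # z = q / ||q||_2

H = np.abs(np.random.randn(16, 16, 16))       # e.g. the 16 x 16 x 16 maps of Fig. 4
z = sqrt_l2_normalize(second_order_pool(H))   # 4096-dim vector for the dense layer
```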
D. Softmax classification
The resulting second-order pooling vector z is passed through
a dense layer and then input to the softmax layer for
classification. The class score of the ith sample $z^{(i)}$ for
category j is computed as follows:

$$p(y^{(i)} = j \mid a^{(i)}; \theta) = \frac{e^{\theta_j^T a^{(i)}}}{\sum_{t=1}^{u} e^{\theta_t^T a^{(i)}}} \qquad (9)$$

where

$$a^{(i)} = f_{L+1}(z^{(i)}; W_{L+1}) \qquad (10)$$

and $a^{(i)} \in \mathbb{R}^{l_1}$ is the output activation at the dense layer for
the ith sample, and $l_1$ and $W_{L+1}$ denote the number of nodes
and the parameters of the dense layer. We again use the ReLU
for the mapping function $f_{L+1}$. $\theta_j \in \mathbb{R}^{l_1}$ denotes the
parameters of the softmax layer for the jth unit, and u is
the total number of classes.
In this paper, we use the cross-entropy loss function as
the objective function [37]. Since the second-order pooling
and the normalization steps are both differentiable, the back-
propagation can be used to calculate the gradient [31]. Then
we fine-tune the whole model using the Adam optimization
algorithm. The whole model can be trained end-to-end.
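As a sketch of how the whole pipeline can be trained end-to-end, the snippet below expresses the SOP and normalization as differentiable TensorFlow ops inside a Keras model and compiles it with the Adam optimizer and cross-entropy loss; the layer sizes follow Fig. 4 and Section IV-A, while everything else (names, Lambda-layer packaging) is our assumption:

```python
import tensorflow as tf

def sop_and_normalize(feature_maps):           # feature_maps: (batch, h, w, c)
    # Eq. (8) per frequency bin: summing over time w gives S S^T for each bin
    corr = tf.einsum("bhwc,bhwd->bhcd", feature_maps, feature_maps)
    p = tf.reshape(corr, [tf.shape(corr)[0], -1])        # flatten to h*c*c
    q = tf.sign(p) * tf.sqrt(tf.abs(p) + 1e-12)          # signed square-root
    return tf.math.l2_normalize(q, axis=-1)              # l2 normalization

inputs = tf.keras.Input(shape=(64, 64, 1))
h = tf.keras.layers.Conv2D(8, 8, strides=2, padding="same", activation="relu")(inputs)
h = tf.keras.layers.Conv2D(16, 8, strides=2, padding="same", activation="relu")(h)
z = tf.keras.layers.Lambda(sop_and_normalize)(h)         # 16*16*16 = 4096-dim z
a = tf.keras.layers.Dense(1024, activation="relu")(z)    # dense layer of Eq. (10)
scores = tf.keras.layers.Dense(5, activation="softmax")(a)   # softmax of Eq. (9)

model = tf.keras.Model(inputs, scores)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_x, train_y, epochs=1000, batch_size=50)   # Section IV-A settings
```

Since every operation in the graph is differentiable, back-propagation handles the SOP and normalization steps automatically, which is what makes the end-to-end training described above possible.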
IV. EXPERIMENTS AND RESULTS
This section provides experiments to evaluate the perfor-
mance of our proposed CNN-SOP model for underwater target
classification. The experiments were performed on the real
radiated acoustic signals of 5 marine vessels. The advantage
of the proposed SOP scheme was verified by comparing
with the max pooling and the bilinear pooling [31]. We also
compared the classification accuracy of the proposed method
with previous deep learning methods, such as the DBN model
[12] and the SAE model [13].
TABLE I
DETAILS OF THE DATASET: NUMBER OF SAMPLES PER VESSEL (A–E) IN THE
TRAINING AND TESTING SETS

Depth (m)    A      B      C      D      E      Dataset
50           2880   2640   2880   1200   3600   Training
150          5520   6480   1200   4320   800    Training
70           2880   5680   920    2880   640    Testing
100          4800   4560   3600   3360   4800   Testing
200          2640   1640   560    1680   560    Testing
A. Experimental setup
In the experiments, the radiated acoustic signals were
recorded with a single hydrophone in the South China Sea
in 2015. The hydrophone was placed below the sea surface
at 5 depths (50 m, 70 m, 100 m, 150 m and 200 m). The radiated
signals were collected from 5 different vessels, which had
various weight, size, propeller structure and engine system.
The sampling rate of the signals was 50 kHz. For each run, the
portion of the recording when the vessel ranged from +500 m
to −500 m was selected.
In the preprocessing stage, the raw radiated signals were
transformed into CQT features. The signals were resampled
at a sampling rate of 4 kHz. We used the MATLAB toolbox
of [38] to compute the CQT representation. For the radiated
signals, we focused only on frequencies below 1 kHz.
The center frequency of the lowest-frequency bin f1 and
the highest frequency bin fmax were set to be 4 Hz and 1
kHz, respectively. The bin number for each octave B was 8.
Thus, the CQT can capture 64 bands covering 8 octaves. Each
single CQT feature frame can be computed using 23 points (5
milliseconds). We combined 64 frames for each CQT feature
to generate the input sample of the CNN model, then each
sample had the size of 64×64, which was derived from 1472
points (0.32 seconds). Since the radiated signals recorded from
different depths have various signal-to-noise ratios (SNRs),
to evaluate the generalization ability to unseen conditions,
we trained the proposed CNN-SOP model with the samples
generated from the depths of 50m and 150m, while testing the
model with the samples at depths of 70m, 100m and 200m.
The training set contained 31520 input samples and the testing
set had 41200 samples. The details of the whole dataset are
presented in Table I.
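A small sketch of the depth-based split of Table I, training on the 50 m and 150 m recordings and testing on the unseen depths; the data-structure layout is hypothetical:

```python
# Depth-based split of Table I: train on 50 m and 150 m recordings, then test
# on the unseen depths 70 m, 100 m and 200 m to probe generalization.
TRAIN_DEPTHS, TEST_DEPTHS = {50, 150}, {70, 100, 200}

def split_by_depth(samples):
    """samples: iterable of (depth_m, cqt_sample, label) tuples."""
    train = [(x, y) for d, x, y in samples if d in TRAIN_DEPTHS]
    test = [(x, y) for d, x, y in samples if d in TEST_DEPTHS]
    return train, test
```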
The proposed CNN model contained several convolutional
layers, each with a filter size of 8 bands × 8 frames and a
stride of 2 bands × 2 frames. The whole model was optimized
using the Adam optimizer with a learning rate of 0.0001. The
network was trained for 1000 epochs with a minibatch size
of 50. Our implementation was developed in TensorFlow
using an NVIDIA Tesla K40 GPU.
B. Comparison with the CNN model using max pooling
We first compared the performance of the proposed second-
order pooling based CNN model (CNN-SOP) with the CNN
model using max-pooling (CNN-MP). For the CNN-MP
model, the CNN feature maps of the last convolutional layer
were pooled with the pooling size of 2 bands × 2 frames
and the sub-sampling factor of 2 × 2.

Fig. 4. Model structures of the CNN-SOP model and the CNN-MP model
(two-convolutional-layer configurations):
CNN-SOP (2L): CQT input (64 × 64) → Conv. 32 × 32 × 8 → Conv. 16 × 16 × 16
→ SOP 16 × 16 × 16 → Norm. 4096 → Dense 1024 → Softmax 5.
CNN-MP (2L): CQT input (64 × 64) → Conv. 32 × 32 × 8 → Conv. 16 × 16 × 16
→ MP 8 × 8 × 16 → Dense 1024 → Softmax 5.

We have tested the
two CNN models with different convolutional layers from
1 to 4. The model structure for 2 convolutional layers is
described in Fig. 4. To reduce the computational complexity
of the SOP, the number of the CNN filters c was set to a
small number, in this case 16. The resolution is important for
the CQT feature of underwater acoustic signals, especially the
frequency resolution since the CQT features for some targets
are very similar in the frequency domain. To obtain more
discriminative features, we just added one max-pooling layer
after the final convolutional layer for the two CNN models.
Both networks were evaluated using the dataset in Table I.
Fig. 5 shows the classification accuracies of the CNN-SOP
model and the CNN-MP model with different convolutional
layers. It can be seen that the proposed CNN-SOP network
achieves better performance compared to the regular CNN-MP
model, with an improvement of 4% in overall classification
accuracy. We also found that deeper CNNs did not always
produce better results in our experiments, and both CNN
models yielded the highest accuracies when the number of
convolutional layers was set to 2. This may be explained by
the limited number of training samples and the wide CNN
filter (8 × 8) we used, so that fewer convolutional layers
are more efficient than a larger number of layers. It can be
observed that the classification accuracies of these two models
decline as the sea depth increases. This may be due to the
SNRs of the radiated signals degrading at greater depths, as
the distance between the surface vessel and the hydrophone
increases.
To explore the individual target performance of these two
models, we also use confusion matrices to show the
classification results. Both networks have 2 convolutional
layers, which proved to be the best configuration. We can see
from Fig. 6 that the CNN-SOP model provides better
classification accuracies than the CNN-MP model for all targets.
C. Comparison with STFT feature
The STFT feature has been used as the input for a DBN
model to provide the spectrum information of the radiated
signals [12]. To evaluate the advantages of the CQT feature,
the STFT feature was used for comparison. Similar to [12],
Fig. 5. Classification accuracies of the CNN-SOP model and the CNN-MP
model with different numbers of convolutional layers (1–4) for the dataset at
depths of 70m (upper-left), 100m (upper-right), 200m (lower-left), and the
overall results (lower-right).
the STFT feature was calculated with 1024 FFT points and a
sampling rate of 4 kHz. We concatenated 8 frames to generate
the input for the CNN model, so each input STFT feature
had a size of 512 dimensions × 8 frames. In this section,
we again applied the CNN-SOP model and the CNN-MP model
to the STFT feature for comparison. When using the STFT
feature, the filter size and the stride size of the convolutional
layers were both set to 8 bands × 1 frame. We again used
2 convolutional layers for the CNN models with the STFT
feature, with feature-map sizes of 64 × 8 × 8 and 8 ×
8 × 16 for the two convolutional layers.
The classification results of the CQT feature and the STFT
feature using two CNN models are compared in Fig. 7. It can
be seen that the CQT feature offers a 3% improvement over
the STFT feature using the CNN-SOP model, and a 1.6%
improvement using the CNN-MP model. This demonstrates
that the CQT feature is more appropriate for the CNN model
compared with the STFT feature when applied to radiated
acoustic signals, which may be explained by the better res-
olution at the lower frequencies.
D. Comparison with other pooling methods
To verify the effectiveness of the proposed SOP strategy,
we compared the proposed SOP with the bilinear pooling in
[31]. In [31], the B-CNN model is proposed which applies
bilinear pooling to the VGG-16 network [39]. When using the
same CNN extractor, the bilinear pooling can be seen as a
second-order pooling approach. In this section, three pooling
Fig. 6. Confusion matrices for the overall classification accuracy of the
CNN-SOP model (upper) and the CNN-MP model (lower). Rows indicate
the true label and columns indicate the predicted label.

CNN-SOP:
            pred A   pred B   pred C   pred D   pred E
true A      0.9603   0.0312   0.0000   0.0000   0.0085
true B      0.0279   0.9721   0.0000   0.0000   0.0000
true C      0.0000   0.0274   0.9315   0.0411   0.0000
true D      0.0000   0.0000   0.0216   0.9784   0.0000
true E      0.0322   0.0000   0.0092   0.0000   0.9587

CNN-MP:
            pred A   pred B   pred C   pred D   pred E
true A      0.9062   0.0589   0.0000   0.0000   0.0349
true B      0.0380   0.9374   0.0000   0.0000   0.0246
true C      0.0219   0.0128   0.9041   0.0612   0.0000
true D      0.0000   0.0000   0.0606   0.9394   0.0000
true E      0.0675   0.0000   0.0138   0.0000   0.9187
Fig. 7. Classification results of the CQT feature and the STFT feature using
the two CNN models (STFT+CNN-MP, STFT+CNN-SOP, CQT+CNN-MP,
CQT+CNN-SOP) at 70m, 100m, 200m and overall.
approaches based on the VGG network were used for com-
parison with the same CQT feature, which were the proposed
SOP, the bilinear pooling [31] and the max-pooling. However,
since the standard VGG-16 network has 13 convolutional and
3 fully-connected layers, leading to too many parameters to train, the standard
VGG may not be suitable for our limited dataset. Thus we
considered using a modified VGG-16 network consisting of
the first 7 convolutional layers, three pooling layers and one
dense layer in the experiment. Unlike the CNN model used
in IV.B which adopted the convolutional filter of size 8 × 8,
the VGG network used smaller (3 × 3) filters. We also used
fewer filters in each convolutional layer of the modified VGG
network, with (16-32-64) filters for the three convolutional
groups. The single dense layer had 4096 units like the standard
VGG network.
The max-pooling of the modified VGG network (VGG-MP)
was similar to the standard VGG, which was performed over a
2 × 2 window with stride 2. The B-CNN model based on the
modified VGG network had 64 filters in the final convolutional
layer, thus the bilinear feature dimension was 64×64 = 4096.
We also applied the proposed SOP strategy on the same VGG
network above (VGG-SOP) for comparison. The CNN feature
of the final convolutional layer had the size of 8 × 8 ×
64, which meant that the SOP feature had the dimension of
8 × 64 × 64 = 32768. The elementwise square-root and l2
normalization were used before the final classification for the
SOP and the bilinear pooling. The learning rate of the Adam
optimizer was set to 0.001. The network was trained for 600
epochs with a minibatch size of 64.
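A sketch of the modified VGG used here (the first 7 convolutional layers in (2, 2, 3) groups with (16-32-64) filters, 3 × 3 kernels, three 2 × 2 max-pooling layers, one 4096-unit dense layer); the exact grouping of the 7 layers is our assumption based on the standard VGG-16 layout, and this is the VGG-MP variant:

```python
import tensorflow as tf
from tensorflow.keras import layers

# First 7 VGG convolutional layers in (2, 2, 3) groups with (16-32-64) filters,
# 3x3 kernels, and three 2x2 max-pooling layers (the VGG-MP variant). The
# VGG-SOP and B-CNN variants instead apply SOP or bilinear pooling to the
# final 8 x 8 x 64 feature maps in place of flattening.
def modified_vgg(input_shape=(64, 64, 1), n_classes=5):
    inputs = tf.keras.Input(shape=input_shape)
    h = inputs
    for filters, n_convs in [(16, 2), (32, 2), (64, 3)]:
        for _ in range(n_convs):
            h = layers.Conv2D(filters, 3, padding="same", activation="relu")(h)
        h = layers.MaxPooling2D(2)(h)        # 2x2 window, stride 2
    h = layers.Flatten()(h)                  # final maps: 8 x 8 x 64
    h = layers.Dense(4096, activation="relu")(h)   # single dense layer
    outputs = layers.Dense(n_classes, activation="softmax")(h)
    return tf.keras.Model(inputs, outputs)
```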
It can be seen from Fig. 8 that when using the same
VGG network, the VGG-SOP outperforms the B-CNN model
[31] by nearly 2% and the max-pooling by 3%. The results
show that, compared to the bilinear pooling, the proposed SOP
strategy can take advantage of the local discrimination along
the frequency axis, which is more suitable for classification of
underwater acoustic signals.
Fig. 8. Classification results of the proposed SOP (VGG-SOP), the bilinear
pooling (B-CNN [31]) and the max pooling (VGG-MP) based on the VGG
network at 70m, 100m, 200m and overall.
E. Comparison with previous DNN-based classification models
In this section, we compared the classification accuracy
against other deep learning-based underwater target classifi-
cation systems [12, 13] with our dataset. We have applied
the CQT to the DBN model [12] and the SAE model [13] for
comparison. Since the DBN and SAE are both fully-connected
deep networks, the input CQT sample has the dimension of
4096 (64 bands × 64 frames), which may lead to too many
parameters and a heavy computational load. Thus we extracted
the averages across consecutive 8 frames from the original 64
TABLE II
COMPARISON OF THE PROPOSED CNN-SOP MODEL WITH THE DBN
MODEL AND THE SAE MODEL USING THE CQT FEATURE IN TERMS OF
CLASSIFICATION ACCURACY
Method 70 m 100 m 200 m Overall
DBN [12] 0.8941 0.8707 0.8305 0.8712
SAE [13] 0.9052 0.8819 0.8553 0.8847
Proposed CNN-SOP 0.9714 0.9656 0.9421 0.9634
frames to generate the CQT features for the DBN and SAE,
which had the dimension of 512 (64 bands × 8 frames). The
model structures of the DBN and SAE were similar to [12]
and [13]. The DBN model had 3 hidden layers (200-100-50)
while the SAE model was composed of 3 autoencoders with
100 units. We can see from Table II that the proposed CNN-
SOP model improves the overall classification accuracy by 8%
compared to the DBN and SAE model when using the same
CQT input. This shows that our CNN-SOP model has a great
advantage over these fully-connected networks.
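The frame averaging used to build the DBN/SAE inputs reduces each 64 × 64 CQT sample to 64 bands × 8 frames; a one-function NumPy sketch (names ours):

```python
import numpy as np

def average_frames(x, group=8):
    """Reduce a 64 x 64 CQT sample to 64 bands x 8 frames by averaging
    each run of `group` consecutive frames."""
    K, N = x.shape
    return x.reshape(K, N // group, group).mean(axis=2)

x = np.random.randn(64, 64)                 # one CQT input sample
v = average_frames(x).reshape(-1)           # 512-dim vector for the DBN/SAE
```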
V. CONCLUSION
In this paper, we have introduced a novel CNN model using
second-order pooling to capture the temporal correlations
for underwater target classification. The radiated signals are
transformed into a T-F feature using the CQT as the inputs to
the CNN model. The proposed second-order pooling learns the
temporal similarities of different CNN filters by computing
the covariance matrix of the CNN feature maps along the time
axis. The experimental results on the real radiated acoustic
signals recorded at various sea depths demonstrate that the
second-order pooling achieves better performance than the
max pooling. The CQT feature
has also been demonstrated to be more effective than the
STFT feature when applied to the proposed CNN model.
The proposed CNN-based classification approach improves the
classification accuracy by 8% compared with the state-of-the-
art deep learning methods.
ACKNOWLEDGMENT
The research was supported by the National Natural Science
Foundation of China (Grant no. 61601369). This work was com-
pleted when the first author was a visiting student in the School
of Electrical, Electronic and Computer Engineering, University
of Western Australia. We gratefully acknowledge the support
of NVIDIA Corporation with the donation of the Tesla K40
GPU used for this research.
REFERENCES
[1] M. R. Azimi-Sadjadi, D. Yao, Q. Huang, and G. J.
Dobeck, “Underwater target classification using wavelet
packets and neural networks,” IEEE Transactions on
Neural Networks, vol. 11, no. 3, pp. 784–794, 2000.
[2] S. Ji, X. Liao, and L. Carin, “Adaptive multiaspect target
classification and detection with hidden markov models,”
IEEE Sensors Journal, vol. 5, no. 5, pp. 1035–1042,
2005.
[3] J. De Seixas, N. De Moura et al., “Preprocessing passive
sonar signals for neural classification,” IET radar, sonar
& navigation, vol. 5, no. 6, pp. 605–612, 2011.
[4] S. Kamal, A. Mujeeb, M. Supriya et al., “Novel class
detection of underwater targets using self-organizing
neural networks,” in Underwater Technology (UT), 2015
IEEE. IEEE, 2015, pp. 1–5.
[5] A. Pezeshki, M. R. Azimi-Sadjadi, and L. L. Scharf,
“Undersea target classification using canonical correla-
tion analysis,” IEEE Journal of Oceanic Engineering,
vol. 32, no. 4, pp. 948–955, 2007.
[6] M. R. Azimi-Sadjadi, D. Yao, A. A. Jamshidi, and G. J.
Dobeck, “Underwater target classification in changing
environments using an adaptive feature mapping,” IEEE
Transactions on neural networks, vol. 13, no. 5, pp.
1099–1111, 2002.
[7] S. Wang and X. Zeng, “Robust underwater noise targets
classification using auditory inspired time–frequency
analysis,” Applied Acoustics, vol. 78, pp. 68–76, 2014.
[8] Y. Wu, X. Li, and Y. Wang, “Extraction and classification
of acoustic scattering from underwater target based on
wigner-ville distribution,” Applied Acoustics, vol. 138,
pp. 52–59, 2018.
[9] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mo-
hamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen,
T. N. Sainath et al., “Deep neural networks for acoustic
modeling in speech recognition: The shared views of
four research groups,” IEEE Signal Processing Magazine,
vol. 29, no. 6, pp. 82–97, 2012.
[10] S.-H. Fang, Y.-X. Fei, Z. Xu, and Y. Tsao, “Learning
transportation modes from smartphone sensors based on
deep neural network,” IEEE Sensors Journal, vol. 17,
no. 18, pp. 6111–6118, 2017.
[11] A. Dairi, F. Harrou, Y. Sun, and M. Senouci, “Ob-
stacle detection for intelligent transportation systems
using deep stacked autoencoder and k-nearest neighbor
scheme,” IEEE Sensors Journal, vol. 18, no. 12, pp.
5122–5132, 2018.
[12] S. Kamal, S. K. Mohammed, P. S. Pillai, and M. Supriya,
“Deep learning architectures for underwater target recog-
nition,” in Ocean Electronics (SYMPOL), 2013. IEEE,
2013, pp. 48–54.
[13] X. Cao, X. Zhang, Y. Yu, and L. Niu, “Deep learning-
based recognition of underwater target,” in Digital Signal
Processing (DSP), 2016 IEEE International Conference
on. IEEE, 2016, pp. 89–93.
[14] P. Swietojanski, A. Ghoshal, and S. Renals, “Convolu-
tional neural networks for distant speech recognition,”
IEEE Signal Processing Letters, vol. 21, no. 9, pp. 1120–
1124, 2014.
[15] X. Xiang, N. Lv, M. Zhai, and A. El Saddik, “Real-
time parking occupancy detection for gas stations based
on haar-adaboosting and cnn,” IEEE Sensors Journal,
vol. 17, no. 19, pp. 6360–6367, 2017.
[16] Y. Wang, A. Yang, X. Chen, P. Wang, Y. Wang, and
H. Yang, “A deep learning approach for blind drift
calibration of sensor networks,” IEEE Sensors Journal,
vol. 17, no. 13, pp. 4158–4171, 2017.
[17] J. Carreira, R. Caseiro, J. Batista, and C. Sminchisescu,
“Semantic segmentation with second-order pooling,” in
European Conference on Computer Vision. Springer,
2012, pp. 430–443.
[18] W. J. Pielemeier and G. H. Wakefield, “A high-resolution
time–frequency representation for musical instrument
signals,” The Journal of the Acoustical Society of Amer-
ica, vol. 99, no. 4, pp. 2382–2396, 1996.
[19] W. J. Pielemeier, G. H. Wakefield, and M. H. Simoni,
“Time-frequency analysis of musical signals,” Proceed-
ings of the IEEE, vol. 84, no. 9, pp. 1216–1230, 1996.
[20] G. Costantini, R. Perfetti, and M. Todisco, “Event based
transcription system for polyphonic piano music,” Signal
Processing, vol. 89, no. 9, pp. 1798–1811, 2009.
[21] J. C. Brown, “Calculation of a constant q spectral trans-
form,” The Journal of the Acoustical Society of America,
vol. 89, no. 1, pp. 425–434, 1991.
[22] Y. LeCun, F. J. Huang, and L. Bottou, “Learning methods
for generic object recognition with invariance to pose and
lighting,” in Computer Vision and Pattern Recognition,
2004. CVPR 2004. Proceedings of the 2004 IEEE Com-
puter Society Conference on, vol. 2. IEEE, 2004, pp.
II–104.
[23] M. Espi, M. Fujimoto, K. Kinoshita, and T. Nakatani,
“Exploiting spectro-temporal locality in deep learning
based acoustic event detection,” EURASIP Journal on
Audio, Speech, and Music Processing, vol. 2015, no. 1,
p. 26, 2015.
[24] J. Salamon and J. P. Bello, “Deep convolutional neural
networks and data augmentation for environmental sound
classification,” IEEE Signal Processing Letters, vol. 24,
no. 3, pp. 279–283, 2017.
[25] R. Hyder, S. Ghaffarzadegan, Z. Feng, J. H. Hansen,
and T. Hasan, “Acoustic scene classification using a cnn-
supervector system trained with auditory and spectro-
gram image features,” Proc. Interspeech 2017, pp. 3073–
3077, 2017.
[26] S. Lekha and M. Suchetha, “A novel 1-d convolution neu-
ral network with svm architecture for real-time detection
applications,” IEEE Sensors Journal, vol. 18, no. 2, pp.
724–731, 2018.
[27] M.-F. Guo, X.-D. Zeng, D.-Y. Chen, and N.-C. Yang,
“Deep-learning-based earth fault detection using contin-
uous wavelet transform and convolutional neural network
in resonant grounding distribution systems,” IEEE Sen-
sors Journal, vol. 18, no. 3, pp. 1291–1300, 2018.
[28] L. Deng, O. Abdel-Hamid, and D. Yu, “A deep convo-
lutional neural network using heterogeneous pooling for
trading acoustic invariance with phonetic confusion,” in
Acoustics, Speech and Signal Processing (ICASSP), 2013
IEEE International Conference on. IEEE, 2013, pp.
6669–6673.
[29] T. Lidy and A. Schindler, “Cqt-based convolutional
neural networks for audio scene classification,” in Pro-
ceedings of the Detection and Classification of Acous-
tic Scenes and Events 2016 Workshop (DCASE2016),
vol. 90. DCASE2016 Challenge, 2016, pp. 1032–1048.
[30] H. Lee, G. Kim, H.-G. Kim, S.-H. Oh, and S.-Y. Lee,
“Deep cnns along the time axis with intermap pooling for
robustness to spectral variations,” IEEE signal processing
letters, vol. 23, no. 10, pp. 1310–1314, 2016.
[31] T.-Y. Lin, A. RoyChowdhury, and S. Maji, “Bilinear cnn
models for fine-grained visual recognition,” in Proceed-
ings of the IEEE International Conference on Computer
Vision, 2015, pp. 1449–1457.
[32] T.-Y. Lin and S. Maji, “Improved bilinear pooling with
cnns,” arXiv preprint arXiv:1707.06772, 2017.
[33] Y. Gao, O. Beijbom, N. Zhang, and T. Darrell, “Compact
bilinear pooling,” in Proceedings of the IEEE conference
on computer vision and pattern recognition, 2016, pp.
317–326.
[34] A. Cherian and S. Gould, “Second-order temporal
pooling for action recognition,” arXiv preprint arX-
iv:1704.06925, 2017.
[35] C. Schörkhuber and A. Klapuri, “Constant-q transform
toolbox for music processing,” in 7th Sound and Music
Computing Conference, Barcelona, Spain, 2010, pp. 3–
64.
[36] O. Abdel-Hamid, A.-r. Mohamed, H. Jiang, and G. Penn,
“Applying convolutional neural networks concepts to
hybrid nn-hmm model for speech recognition,” in Acous-
tics, Speech and Signal Processing (ICASSP), 2012 IEEE
International Conference on. IEEE, 2012, pp. 4277–
4280.
[37] S. W. Abeyruwan, D. Sarkar, F. Sikder, and U. Visser,
“Semi-automatic extraction of training examples from
sensor readings for fall detection and posture monitor-
ing,” IEEE Sensors Journal, vol. 16, no. 13, pp. 5406–
5415, 2016.
[38] C. Schörkhuber, A. Klapuri, N. Holighaus, and
M. Dörfler, “A matlab toolbox for efficient perfect recon-
struction time-frequency transforms with log-frequency
resolution,” in Audio Engineering Society Conference:
53rd International Conference: Semantic Audio. Audio
Engineering Society, 2014.
[39] K. Simonyan and A. Zisserman, “Very deep convolu-
tional networks for large-scale image recognition,” arXiv
preprint arXiv:1409.1556, 2014.
Lucknow 💋 Call Girls in Lucknow ₹7.5k Pick Up & Drop With Cash Payment 892311...anilsa9823
 
Young⚡Call Girls in Lajpat Nagar Delhi >༒9667401043 Escort Service
Young⚡Call Girls in Lajpat Nagar Delhi >༒9667401043 Escort ServiceYoung⚡Call Girls in Lajpat Nagar Delhi >༒9667401043 Escort Service
Young⚡Call Girls in Lajpat Nagar Delhi >༒9667401043 Escort Servicesonnydelhi1992
 
Young⚡Call Girls in Uttam Nagar Delhi >༒9667401043 Escort Service
Young⚡Call Girls in Uttam Nagar Delhi >༒9667401043 Escort ServiceYoung⚡Call Girls in Uttam Nagar Delhi >༒9667401043 Escort Service
Young⚡Call Girls in Uttam Nagar Delhi >༒9667401043 Escort Servicesonnydelhi1992
 
Jeremy Casson - An Architectural and Historical Journey Around Europe
Jeremy Casson - An Architectural and Historical Journey Around EuropeJeremy Casson - An Architectural and Historical Journey Around Europe
Jeremy Casson - An Architectural and Historical Journey Around EuropeJeremy Casson
 
Aminabad @ Book Call Girls in Lucknow - 450+ Call Girl Cash Payment 🍵 8923113...
Aminabad @ Book Call Girls in Lucknow - 450+ Call Girl Cash Payment 🍵 8923113...Aminabad @ Book Call Girls in Lucknow - 450+ Call Girl Cash Payment 🍵 8923113...
Aminabad @ Book Call Girls in Lucknow - 450+ Call Girl Cash Payment 🍵 8923113...akbard9823
 
Jeremy Casson - How Painstaking Restoration Has Revealed the Beauty of an Imp...
Jeremy Casson - How Painstaking Restoration Has Revealed the Beauty of an Imp...Jeremy Casson - How Painstaking Restoration Has Revealed the Beauty of an Imp...
Jeremy Casson - How Painstaking Restoration Has Revealed the Beauty of an Imp...Jeremy Casson
 
Lucknow 💋 Escorts Service Lucknow Phone No 8923113531 Elite Escort Service Av...
Lucknow 💋 Escorts Service Lucknow Phone No 8923113531 Elite Escort Service Av...Lucknow 💋 Escorts Service Lucknow Phone No 8923113531 Elite Escort Service Av...
Lucknow 💋 Escorts Service Lucknow Phone No 8923113531 Elite Escort Service Av...anilsa9823
 
Lucknow 💋 best call girls in Lucknow (Adult Only) 8923113531 Escort Service ...
Lucknow 💋 best call girls in Lucknow  (Adult Only) 8923113531 Escort Service ...Lucknow 💋 best call girls in Lucknow  (Adult Only) 8923113531 Escort Service ...
Lucknow 💋 best call girls in Lucknow (Adult Only) 8923113531 Escort Service ...anilsa9823
 
Bobbie goods coloring book 81 pag_240127_163802.pdf
Bobbie goods coloring book 81 pag_240127_163802.pdfBobbie goods coloring book 81 pag_240127_163802.pdf
Bobbie goods coloring book 81 pag_240127_163802.pdfMARIBEL442158
 
Patrakarpuram ) Cheap Call Girls In Lucknow (Adult Only) 🧈 8923113531 𓀓 Esco...
Patrakarpuram ) Cheap Call Girls In Lucknow  (Adult Only) 🧈 8923113531 𓀓 Esco...Patrakarpuram ) Cheap Call Girls In Lucknow  (Adult Only) 🧈 8923113531 𓀓 Esco...
Patrakarpuram ) Cheap Call Girls In Lucknow (Adult Only) 🧈 8923113531 𓀓 Esco...akbard9823
 
Lucknow 💋 Virgin Call Girls Lucknow | Book 8923113531 Extreme Naughty Call Gi...
Lucknow 💋 Virgin Call Girls Lucknow | Book 8923113531 Extreme Naughty Call Gi...Lucknow 💋 Virgin Call Girls Lucknow | Book 8923113531 Extreme Naughty Call Gi...
Lucknow 💋 Virgin Call Girls Lucknow | Book 8923113531 Extreme Naughty Call Gi...anilsa9823
 
Alex and Chloe by Daniel Johnson Storyboard
Alex and Chloe by Daniel Johnson StoryboardAlex and Chloe by Daniel Johnson Storyboard
Alex and Chloe by Daniel Johnson Storyboardthephillipta
 
exhuma plot and synopsis from the exhuma movie.pptx
exhuma plot and synopsis from the exhuma movie.pptxexhuma plot and synopsis from the exhuma movie.pptx
exhuma plot and synopsis from the exhuma movie.pptxKurikulumPenilaian
 
Indira Nagar Lucknow #Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payme...
Indira Nagar Lucknow #Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payme...Indira Nagar Lucknow #Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payme...
Indira Nagar Lucknow #Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payme...akbard9823
 
Lucknow 💋 Russian Call Girls Lucknow - Book 8923113531 Call Girls Available 2...
Lucknow 💋 Russian Call Girls Lucknow - Book 8923113531 Call Girls Available 2...Lucknow 💋 Russian Call Girls Lucknow - Book 8923113531 Call Girls Available 2...
Lucknow 💋 Russian Call Girls Lucknow - Book 8923113531 Call Girls Available 2...anilsa9823
 
Lucknow 💋 Call Girl in Lucknow | Whatsapp No 8923113531 VIP Escorts Service A...
Lucknow 💋 Call Girl in Lucknow | Whatsapp No 8923113531 VIP Escorts Service A...Lucknow 💋 Call Girl in Lucknow | Whatsapp No 8923113531 VIP Escorts Service A...
Lucknow 💋 Call Girl in Lucknow | Whatsapp No 8923113531 VIP Escorts Service A...anilsa9823
 
VIP Ramnagar Call Girls, Ramnagar escorts Girls 📞 8617697112
VIP Ramnagar Call Girls, Ramnagar escorts Girls 📞 8617697112VIP Ramnagar Call Girls, Ramnagar escorts Girls 📞 8617697112
VIP Ramnagar Call Girls, Ramnagar escorts Girls 📞 8617697112Nitya salvi
 

Recently uploaded (20)

Storyboard short: Ferrarius Tries to Sing
Storyboard short: Ferrarius Tries to SingStoryboard short: Ferrarius Tries to Sing
Storyboard short: Ferrarius Tries to Sing
 
Lucknow 💋 Cheap Call Girls In Lucknow Finest Escorts Service 8923113531 Avail...
Lucknow 💋 Cheap Call Girls In Lucknow Finest Escorts Service 8923113531 Avail...Lucknow 💋 Cheap Call Girls In Lucknow Finest Escorts Service 8923113531 Avail...
Lucknow 💋 Cheap Call Girls In Lucknow Finest Escorts Service 8923113531 Avail...
 
Lucknow 💋 Call Girls in Lucknow ₹7.5k Pick Up & Drop With Cash Payment 892311...
Lucknow 💋 Call Girls in Lucknow ₹7.5k Pick Up & Drop With Cash Payment 892311...Lucknow 💋 Call Girls in Lucknow ₹7.5k Pick Up & Drop With Cash Payment 892311...
Lucknow 💋 Call Girls in Lucknow ₹7.5k Pick Up & Drop With Cash Payment 892311...
 
Young⚡Call Girls in Lajpat Nagar Delhi >༒9667401043 Escort Service
Young⚡Call Girls in Lajpat Nagar Delhi >༒9667401043 Escort ServiceYoung⚡Call Girls in Lajpat Nagar Delhi >༒9667401043 Escort Service
Young⚡Call Girls in Lajpat Nagar Delhi >༒9667401043 Escort Service
 
Young⚡Call Girls in Uttam Nagar Delhi >༒9667401043 Escort Service
Young⚡Call Girls in Uttam Nagar Delhi >༒9667401043 Escort ServiceYoung⚡Call Girls in Uttam Nagar Delhi >༒9667401043 Escort Service
Young⚡Call Girls in Uttam Nagar Delhi >༒9667401043 Escort Service
 
Jeremy Casson - An Architectural and Historical Journey Around Europe
Jeremy Casson - An Architectural and Historical Journey Around EuropeJeremy Casson - An Architectural and Historical Journey Around Europe
Jeremy Casson - An Architectural and Historical Journey Around Europe
 
Aminabad @ Book Call Girls in Lucknow - 450+ Call Girl Cash Payment 🍵 8923113...
Aminabad @ Book Call Girls in Lucknow - 450+ Call Girl Cash Payment 🍵 8923113...Aminabad @ Book Call Girls in Lucknow - 450+ Call Girl Cash Payment 🍵 8923113...
Aminabad @ Book Call Girls in Lucknow - 450+ Call Girl Cash Payment 🍵 8923113...
 
Jeremy Casson - How Painstaking Restoration Has Revealed the Beauty of an Imp...
Jeremy Casson - How Painstaking Restoration Has Revealed the Beauty of an Imp...Jeremy Casson - How Painstaking Restoration Has Revealed the Beauty of an Imp...
Jeremy Casson - How Painstaking Restoration Has Revealed the Beauty of an Imp...
 
Lucknow 💋 Escorts Service Lucknow Phone No 8923113531 Elite Escort Service Av...
Lucknow 💋 Escorts Service Lucknow Phone No 8923113531 Elite Escort Service Av...Lucknow 💋 Escorts Service Lucknow Phone No 8923113531 Elite Escort Service Av...
Lucknow 💋 Escorts Service Lucknow Phone No 8923113531 Elite Escort Service Av...
 
Lucknow 💋 best call girls in Lucknow (Adult Only) 8923113531 Escort Service ...
Lucknow 💋 best call girls in Lucknow  (Adult Only) 8923113531 Escort Service ...Lucknow 💋 best call girls in Lucknow  (Adult Only) 8923113531 Escort Service ...
Lucknow 💋 best call girls in Lucknow (Adult Only) 8923113531 Escort Service ...
 
Bobbie goods coloring book 81 pag_240127_163802.pdf
Bobbie goods coloring book 81 pag_240127_163802.pdfBobbie goods coloring book 81 pag_240127_163802.pdf
Bobbie goods coloring book 81 pag_240127_163802.pdf
 
Patrakarpuram ) Cheap Call Girls In Lucknow (Adult Only) 🧈 8923113531 𓀓 Esco...
Patrakarpuram ) Cheap Call Girls In Lucknow  (Adult Only) 🧈 8923113531 𓀓 Esco...Patrakarpuram ) Cheap Call Girls In Lucknow  (Adult Only) 🧈 8923113531 𓀓 Esco...
Patrakarpuram ) Cheap Call Girls In Lucknow (Adult Only) 🧈 8923113531 𓀓 Esco...
 
Lucknow 💋 Virgin Call Girls Lucknow | Book 8923113531 Extreme Naughty Call Gi...
Lucknow 💋 Virgin Call Girls Lucknow | Book 8923113531 Extreme Naughty Call Gi...Lucknow 💋 Virgin Call Girls Lucknow | Book 8923113531 Extreme Naughty Call Gi...
Lucknow 💋 Virgin Call Girls Lucknow | Book 8923113531 Extreme Naughty Call Gi...
 
(NEHA) Call Girls Mumbai Call Now 8250077686 Mumbai Escorts 24x7
(NEHA) Call Girls Mumbai Call Now 8250077686 Mumbai Escorts 24x7(NEHA) Call Girls Mumbai Call Now 8250077686 Mumbai Escorts 24x7
(NEHA) Call Girls Mumbai Call Now 8250077686 Mumbai Escorts 24x7
 
Alex and Chloe by Daniel Johnson Storyboard
Alex and Chloe by Daniel Johnson StoryboardAlex and Chloe by Daniel Johnson Storyboard
Alex and Chloe by Daniel Johnson Storyboard
 
exhuma plot and synopsis from the exhuma movie.pptx
exhuma plot and synopsis from the exhuma movie.pptxexhuma plot and synopsis from the exhuma movie.pptx
exhuma plot and synopsis from the exhuma movie.pptx
 
Indira Nagar Lucknow #Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payme...
Indira Nagar Lucknow #Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payme...Indira Nagar Lucknow #Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payme...
Indira Nagar Lucknow #Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payme...
 
Lucknow 💋 Russian Call Girls Lucknow - Book 8923113531 Call Girls Available 2...
Lucknow 💋 Russian Call Girls Lucknow - Book 8923113531 Call Girls Available 2...Lucknow 💋 Russian Call Girls Lucknow - Book 8923113531 Call Girls Available 2...
Lucknow 💋 Russian Call Girls Lucknow - Book 8923113531 Call Girls Available 2...
 
Lucknow 💋 Call Girl in Lucknow | Whatsapp No 8923113531 VIP Escorts Service A...
Lucknow 💋 Call Girl in Lucknow | Whatsapp No 8923113531 VIP Escorts Service A...Lucknow 💋 Call Girl in Lucknow | Whatsapp No 8923113531 VIP Escorts Service A...
Lucknow 💋 Call Girl in Lucknow | Whatsapp No 8923113531 VIP Escorts Service A...
 
VIP Ramnagar Call Girls, Ramnagar escorts Girls 📞 8617697112
VIP Ramnagar Call Girls, Ramnagar escorts Girls 📞 8617697112VIP Ramnagar Call Girls, Ramnagar escorts Girls 📞 8617697112
VIP Ramnagar Call Girls, Ramnagar escorts Girls 📞 8617697112
 

2933bf63f71e22ee0d6e84792f3fec1a.pdf

  • 1. 1558-1748 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSEN.2018.2886368, IEEE Sensors Journal JOURNAL OF L A TEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 1 Convolutional Neural Network with Second-order Pooling for Underwater Target Classification Xu Cao, Student Member, IEEE, Roberto Togneri, Senior Member, IEEE, Xiaomin Zhang, and Yang Yu, Member, IEEE, Abstract—Underwater target classification using passive sonar remains a critical issue due to the changeable ocean environment. Convolutional Neural Networks (CNNs) have shown success in learning invariant features using local filtering and max pooling. In this paper, we propose a novel classification framework which combines the CNN architecture with the second-order pooling (SOP) to capture the temporal correlations from the time- frequency (T-F) representation of the radiated acoustic signals. The convolutional layers are used to learn the local features with a set of kernel filters from the T-F inputs which are extracted by the constant-Q transform (CQT). Instead of using max pooling, the proposed SOP operator is designed to learn the co-occurrences of different CNN filters using the temporal feature trajectory of CNN features for each frequency subband. To preserve the frequency distinctions, the correlated features of each frequency subband are retained. The pooling results are normalized with signed square-root and l2 normalization, and then input into the softmax classifier. The whole network can be trained in an end-to-end fashion. To explore the generalization ability to unseen conditions, the proposed CNN model is evaluated on the real radiated acoustic signals recorded at new sea depths. The experimental results demonstrate that the proposed method yields an 8% improvement in classification accuracy over the state-of-the-art deep learning methods. Index Terms—underwater target classification, convolutional neural networks, second-order pooling, constant-Q transform. I. INTRODUCTION UNDERWATER target classification is aimed to detect and recognize the marine vessels with the radiated acoustic signals recorded by the passive sonar. It has many important applications in ocean engineering, such as automatic target recognition (ATR) and marine monitoring. The task can be formulated as a feature representation problem where the discriminative characteristics are learned from the received acoustic signals for classification. However, when applied in practical situations, robustness and generalization ability to environment variation are significant for passive sonar target classification, especially from single-sensor recordings. Several factors affect the performances of the classification systems, including the lack of a priori knowledge of the targets, the various working conditions of the same class such as the speed and the power configuration and the unpredictable ocean X. Cao, X. Zhang and Y. Yu are with the School of Marine Science and Technology, Northwestern Polytechnical University, Xi’an 710072, China(e-mail: caoxu@mail.nwpu.edu.cn; xmzhang@nwpu.edu.cn; n- wpuyuy@nwpu.edu.cn). R. 
ing, The University of Western Australia, Perth, WA 6009, Australia (e-mail: roberto.togneri@uwa.edu.au).
Manuscript received August 15, 2018; revised November 5, 2018.

background noise. Consequently, more adaptive and robust classification models are needed to deal with this problem.

Several pattern recognition methods have been developed for underwater target classification with different features extracted from the radiated acoustic signals. In [1], the features generated from the wavelet packet transform (WPT) and the linear predictive coding (LPC) are put into a neural network (NN) classifier. In [2], a Hidden Markov Model (HMM) is used for multiaspect target detection and identification. In [3], a preprocessing method is developed to improve the performance of a feedforward neural network (NN) for passive sonar signal classification. A novel class detection scheme utilizing a clustering approach on an unsupervised neural network based Self-Organizing Map (SOM) is proposed in [4]. In [5], canonical correlation analysis (CCA) is employed as a multiaspect feature extraction method for underwater target classification. In [6], a K-nearest neighbor (K-NN) system is used as a memory to provide the closest matches of an unknown pattern in the feature space. In the past few years, support vector machines (SVMs) have seen increased usage in underwater target classification. The method in [7] adopts the SVM as the classifier for features captured using the Hilbert-Huang transform (HHT). In [8], an underwater acoustic feature extraction and classification method based on the Wigner-Ville distribution (WVD) and the SVM is presented.

Compared with conventional machine-learning systems based on a priori knowledge, deep networks are able to hierarchically learn high-level features from large numbers of samples, and the extracted deep features are more robust to variations [9-11]. In [12], Kamal et al. proposed to incorporate the Deep Belief Network (DBN) to capture several layers of deep features from the underwater acoustic signals, which are more abstract at the higher layers. Our past work in [13] utilizes a Stacked Autoencoder (SAE) for feature learning with the short-time Fourier transform (STFT), which provides competitive performance. However, these fully-connected networks demand huge collections of training samples for effective training, especially when applied to multiple-frame T-F features.

In recent years, Convolutional Neural Networks (CNNs) have been successfully applied to many pattern recognition tasks with local connectivity and weight sharing [14-16]. Compared with the fully-connected deep models, these popular CNN architectures use a set of filters which process local parts of the whole input to capture the detailed characteristics.
Usually, max-pooling is used to generate holistic and invariant representations from the CNN features. However, max-pooling focuses only on the first-order statistics in local regions of the CNN features. For underwater acoustic signals, which have strong temporal relations, max-pooling may ignore the high-level correlations in the time domain. Second-order pooling (SOP) has shown success in computer vision tasks in capturing the second-order correlations of local features [17]. In this paper, we propose to learn the second-order temporal correlations of the CNN features for underwater target classification. The proposed SOP strategy is designed to compute the co-occurrences of different CNN filters using the temporal feature trajectory of the CNN features as input. Compared to max-pooling, the proposed SOP strategy is capable of exploring the second-order co-occurrences of the CNN feature maps of underwater acoustic signals to improve the classification performance.

The constant-Q transform (CQT) is popular in music signal processing since the bin frequencies of the CQT scale have a perceptually relevant geometrical distribution [18-20]. Compared to the STFT, the CQT provides a better frequency resolution for lower frequencies and a better temporal resolution for higher frequencies [21]. The radiated signal of an underwater target contains much useful information in the low frequency subbands, such as the line spectrum components, which are related to the propeller's turning. The greater resolution in the low frequencies of the CQT can contribute to a more robust feature representation. In this study, unlike [12, 13], we use the CQT as the T-F representation method for underwater target classification.

In this paper, a new underwater target classification framework based on the CNN model is proposed. Our work focuses on the second-order pooling (SOP) strategy for the CNN feature maps. The proposed method is named the CNN-SOP model. For each frequency subband, the pooling operation takes a temporal sequence of every CNN feature map as input to compute the similarities between the temporal features of different CNN filters. The correlation features of the different frequency subbands are then passed through a signed square-root step and l2 normalization to generate the final feature vector, which is input into the softmax classifier for classification. Furthermore, we propose to use the CQT to generate the T-F representation for the CNN-SOP model. Since the generalization ability to unseen conditions is significant in practical applications, the proposed classification method is tested on real radiated acoustic signals recorded at new depths. The results show that the proposed method achieves an 8% improvement compared to other deep learning-based approaches. The proposed second-order pooling strategy is shown to improve the classification accuracy by a further 4% over max pooling.
The rest of this paper is organized as follows: Section II introduces related work on the CNN architecture and the pooling strategy. Section III details the proposed CNN-SOP model. The experimental results of this method are provided in Section IV. In Section V we draw the conclusions of this work.

II. RELATED WORK

Recently, CNN architectures have increasingly been used in acoustic signal recognition. Approaches developed for image recognition [22] can be extended to signal classification by regarding the T-F representation (e.g. spectrogram and MFCC) of raw signals as an image. In [23], the CNN model is introduced in acoustic event detection to capture the local properties of acoustic events, which provides competitive performance in the evaluation task. In [24], CNN networks in conjunction with different data augmentation methods are applied to environmental sound classification. In [25], the performances of different auditory and spectrogram image features using CNN models are evaluated. In [26], the CNN architecture is integrated with the SVM classifier to improve the overall classification performance on real-time signals. In [27], a CWT and CNN-based fault detection method is proposed to extract the comprehensive T-F features of fault signals.

A difficulty when extending regular CNN-based methods to acoustic signals is that translation invariance in frequency may not be appropriate, since a difference in frequency bands usually means a different class. This problem also exists in underwater acoustic signals, since the spectrum distributions of various vessels differ significantly. One may overcome this difficulty with a deep convolutional neural network architecture in which heterogeneous pooling provides constrained frequency-shift invariance in the speech spectrogram [28]. In [29], a parallel CNN architecture is created, which comprises a CNN layer optimized for processing and recognizing relations in the frequency domain, and a parallel one aimed at capturing temporal relations. Another promising CNN network to deal with this problem adds an intermap pooling (IMP) layer to increase robustness to spectral variations [30].

Second-order pooling methods have been widely used in many computer vision tasks. Our proposed pooling approach is inspired by the second-order pooling scheme in [17], which summarizes sets of local features inside a free-form region while preserving information about their pairwise correlations. However, that approach applies second-order pooling directly to raw local descriptors such as SIFT, while we apply the SOP to the CNN feature maps in this work. In [31], a bilinear CNN (B-CNN) model is proposed for image classification which consists of two CNN-based feature extractors whose outputs are multiplied using the outer product at each location to obtain the bilinear vector. When using the same CNN extractor, the bilinear pooling used in the B-CNN model can be seen as a second-order pooling approach. An improved bilinear pooling method for CNN features is proposed in [32], which uses matrix square-root normalization to improve the classification performance. In [33], two compact bilinear representations are proposed to reduce the dimensions of the full bilinear models.
Since the T-F representation is different from an image input, in contrast to these B-CNN models our SOP method focuses only on the temporal correlations and preserves the correlation matrix for each frequency subband, which can retain the spectral variation characteristics of different classes. Another second-order temporal pooling is proposed for action recognition in [34], which uses the temporal classification scores rather than the CNN features to generate the descriptor.

III. THE PROPOSED SYSTEM

The whole framework of the proposed CNN-SOP model is described in Fig. 1. In the preprocessing stage, the raw radiated acoustic signals are converted into a time-frequency representation using the CQT. Multiple frames of the CQT representation are combined to generate the input for the CNN network. Instead of using max pooling, we adopt second-order pooling for the CNN feature maps to obtain the temporal correlation features of the input. Elementwise square-root and l2 normalization are used to further improve the performance. The whole network can be trained end-to-end with back-propagation.

Fig. 1. The framework of the proposed CNN-SOP system.

A. Preprocessing using CQT

For underwater radiated signals, which are non-stationary, T-F representation approaches have been shown to be more effective for feature extraction. The CQT transforms the time-domain signal to the T-F domain such that the center frequencies of the frequency bins are geometrically spaced and their Q-factors are all equal [35]. This means the CQT can provide a better frequency resolution for low frequency subbands compared to the STFT, and can show more details of the low-frequency components. In this paper, we propose to use the CQT to deal with the radiated acoustic signals.

Given a discrete time-domain signal x(n), the CQT is defined as:

X^{CQ}(k, n) = \sum_{j = n - \lfloor N_k/2 \rfloor}^{n + \lfloor N_k/2 \rfloor} x(j) \, a_k^*(j - n + N_k/2)    (1)

where k = 1, 2, ..., K indexes the K frequency bins of the CQT, and a_k^*(n) is the complex conjugate of the basis function [35]. N_k denotes the window length, which is variable. The center frequency of the kth bin is defined by:

f_k = f_1 \, 2^{(k-1)/B}    (2)

where f_1 is the center frequency of the lowest-frequency bin and B is the number of bins per octave, which determines the time-frequency resolution trade-off of the CQT. The total number of frequency bins K of the CQT can then be computed as:

K = B \left( \log_2 \frac{f_{\max}}{f_1} + 1 \right)    (3)

where f_{\max} is the center frequency of the highest-frequency bin.

In our work, we propose to use the CQT to obtain the T-F features for the CNN model. The CQT T-F feature is derived from multiple frames as follows:

X = \{ X^1, X^2, \ldots, X^N \}    (4)

where

X^i = 20 \log_{10} \lVert X^{CQ}(i) \rVert    (5)

and X^i is the CQT feature for frame i and N denotes the total number of frames. X^{CQ}(i) \in R^K is the complex-valued CQT vector of the K frequency bins representing frame i.
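To make the preprocessing concrete, the following is a minimal NumPy sketch of Eqs. (1)-(5) under the settings reported later in Section IV-A (f_1 = 4 Hz, B = 8, K = 64 bins, 23-point hop, 64 frames). It is a naive direct evaluation rather than the efficient toolbox of [38] used in the experiments, and the Hann window, 1/N_k scaling and zero-padding scheme are illustrative assumptions:

```python
import numpy as np

def naive_cqt_sample(x, fs=4000, f1=4.0, B=8, K=64, hop=23, n_frames=64):
    """64-band x 64-frame log-magnitude CQT input (Eqs. (1)-(5)).
    Expects a segment of roughly 1472 samples (0.37 s at 4 kHz)."""
    Q = 1.0 / (2.0 ** (1.0 / B) - 1.0)            # constant Q-factor of all bins
    f = f1 * 2.0 ** (np.arange(K) / B)            # Eq. (2), 0-based bin index
    n_max = int(round(Q * fs / f[0]))             # longest (lowest-frequency) window
    xp = np.pad(np.asarray(x, float), (n_max, n_max))  # zero-pad so long windows fit
    X = np.zeros((K, n_frames))
    for k, fk in enumerate(f):
        Nk = int(round(Q * fs / fk))              # variable window length N_k
        n = np.arange(Nk)
        # Windowed complex exponential basis a_k(n); Eq. (1) correlates x with a_k*.
        ak = np.hanning(Nk) * np.exp(2j * np.pi * n * fk / fs) / Nk
        for i in range(n_frames):
            c = n_max + i * hop                   # frame centre in the padded signal
            seg = xp[c - Nk // 2 : c - Nk // 2 + Nk]
            X[k, i] = 20.0 * np.log10(np.abs(seg @ np.conj(ak)) + 1e-12)  # Eq. (5)
    return X                                      # one 64 x 64 input sample, Eq. (4)
```

With the paper's settings this returns the multi-frame input X of Eq. (4) directly in the 64 x 64 layout fed to the CNN.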
B. CNN architecture

Fig. 2. The CNN architecture.

In contrast to fully-connected layers, CNNs are designed to restrict the connections between the hidden units and the input units, so that each hidden unit connects to only a small neighborhood of the input units. The locally connected structure also makes it possible for CNNs to model the local correlations of the input. By replicating weights across the whole input, the parameters of the convolutional layers are reduced.

In this paper, we propose to use a CNN model comprised of L convolutional layers to learn the deep representation of the CQT feature. The CNN architecture is described in Fig. 2. Unlike regular CNN models, max-pooling layers are not adopted in the network, since the resolution is important for classification. Given our input X, the CNN model learns the nonlinear representation f which maps the input X to the Lth output H^L:

H^L = f(X) = f_L(\cdots f_2(f_1(X; W_1); W_2) \cdots, W_L)    (6)

where f_l is the mapping function of the lth convolutional layer, which takes the input H^l and generates the feature maps H^{l+1} with the filter parameters W_l. The convolutional layers are constructed with rectified linear units (ReLUs). The details of the convolutional process can be found in [36]. The feature maps of the last layer H^L form an h x w x c array, where h and w denote the height and width of each feature map, and c is the number of feature maps.
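As an illustration of the mapping in Eq. (6), here is a minimal tf.keras sketch of the convolutional trunk with the configuration used later in Section IV (8 x 8 kernels, 2 x 2 strides, ReLU, no max pooling); the "same" padding and default weight initialization are assumptions:

```python
import tensorflow as tf

def build_cnn_trunk(num_layers=2, filters=(8, 16)):
    """The nonlinear map f of Eq. (6): L ReLU convolutional layers, no max pooling."""
    inp = tf.keras.Input(shape=(64, 64, 1))       # CQT sample X: 64 bands x 64 frames
    h = inp
    for l in range(num_layers):
        h = tf.keras.layers.Conv2D(filters[l], kernel_size=8, strides=2,
                                   padding="same", activation="relu")(h)
    return tf.keras.Model(inp, h)                 # feature maps H^L of size h x w x c

# For L = 2 this yields 64x64x1 -> 32x32x8 -> 16x16x16, matching Fig. 4.
```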
C. Second-order pooling

The T-F representation of radiated acoustic signals has strong temporal correlations, which can help to discriminate different targets. In this work, we propose a second-order pooling scheme for the CNN features to capture the temporal correlations of the CQT input. Since the CNN feature maps H^L are learned from the CQT feature X, h and w correspond to the frequency bins and the temporal frames of the CQT input. For each frequency bin of the feature maps, we denote s^m = [s^m_1, s^m_2, \ldots, s^m_w] \in R^w as the temporal feature trajectory of the mth feature map (see Fig. 3).

Fig. 3. The second-order pooling operation.

The second-order pooling operator is defined as:

SOP(s^j, s^k) = \sum_{i=1}^{w} s^j_i s^k_i = (s^j)^T s^k    (7)

where SOP(s^j, s^k) represents the temporal correlation of the two feature trajectories s^j and s^k from the jth and kth feature maps. The SOP operator is designed to capture the interactions of two convolution filters along the time axis. For c feature maps, we denote S \in R^{c \times w} as the temporal feature matrix; the SOP operator can then be written in matrix form as:

SOP(S) = S S^T    (8)

where SOP(S) \in R^{c \times c} is a symmetric positive semidefinite matrix which captures the temporal correlations of all the CNN filters for one frequency bin. Since the differences between frequency bins are useful for distinguishing underwater acoustic signals, unlike the pooling strategy in [31], which uses sum-pooling to aggregate the correlations across the whole image, we retain the SOP results of all frequency bins to preserve the frequency distinctions for classification. The final SOP feature is shown in Fig. 1 and consists of h SOP operators, corresponding to the height of the feature maps.

It is often found that normalization offers significant improvements to a deep network. In this work, we incorporate the elementwise signed square-root and l2 normalization for the SOP operators. The resulting SOP operators are first vectorized into p \in R^l, where l = c x c x h. The vector p is then passed through the elementwise signed square-root (q <- sign(p) \sqrt{|p|}) and l2 normalization (z <- q / \lVert q \rVert_2).

For CNN feature maps of size h x w x c, the computational complexity of the proposed SOP strategy is O(hwc^2), the same as the bilinear pooling in [31], while max pooling is O(hwc).
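To make the pooling and normalization pipeline concrete, here is a minimal NumPy sketch of Eqs. (7)-(8) followed by the signed square-root and l2 steps; the small epsilon guarding the division is an assumption:

```python
import numpy as np

def second_order_pool(H):
    """Apply the SOP of Eq. (8) per frequency bin, then normalize.
    H: CNN feature maps of shape (h, w, c); returns z of length h * c * c."""
    h, w, c = H.shape
    pooled = np.empty((h, c, c))
    for b in range(h):                      # one correlation matrix per frequency bin
        S = H[b].T                          # temporal feature matrix S in R^{c x w}
        pooled[b] = S @ S.T                 # SOP(S) = S S^T, Eq. (8)
    p = pooled.ravel()                      # vectorize: l = c * c * h
    q = np.sign(p) * np.sqrt(np.abs(p))     # elementwise signed square-root
    return q / (np.linalg.norm(q) + 1e-12)  # l2 normalization

# For the 16 x 16 x 16 feature maps of Fig. 4 this yields the 4096-dim vector z.
```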
D. Softmax classification

The resulting vector z of the second-order pooling is then input to the softmax layer for classification after a dense layer. The class score of the ith sample z^{(i)} for category j is computed as follows:

p(y^{(i)} = j \mid a^{(i)}; \theta) = \frac{e^{\theta_j^T a^{(i)}}}{\sum_{t=1}^{u} e^{\theta_t^T a^{(i)}}}    (9)

where

a^{(i)} = f_{L+1}(z^{(i)}; W_{L+1})    (10)

and a^{(i)} \in R^{l_1} is the output activation of the dense layer for the ith sample, and l_1 and W_{L+1} denote the number of nodes and the model parameters of the dense layer. We again use the ReLU for the mapping function f_{L+1}. \theta_j \in R^{l_1} denotes the parameter of the softmax layer for the jth unit, and u is the total number of classes.

In this paper, we use the cross-entropy loss as the objective function [37]. Since the second-order pooling and the normalization steps are both differentiable, back-propagation can be used to calculate the gradients [31]. We then fine-tune the whole model using the Adam optimization algorithm. The whole model can be trained end-to-end.

IV. EXPERIMENTS AND RESULTS

This section provides experiments to evaluate the performance of the proposed CNN-SOP model for underwater target classification. The experiments were performed on real radiated acoustic signals from 5 marine vessels. The advantage of the proposed SOP scheme was verified by comparison with max pooling and the bilinear pooling of [31]. We also compared the classification accuracy of the proposed method with previous deep learning methods, namely the DBN model [12] and the SAE model [13].
TABLE I
DETAILS OF THE DATASET: NUMBER OF SAMPLES FOR EACH VESSEL IN THE TRAINING AND TESTING SETS

Depth (m)    A      B      C      D      E      Dataset
50           2880   2640   2880   1200   3600   Training
150          5520   6480   1200   4320   800    Training
70           2880   5680   920    2880   640    Testing
100          4800   4560   3600   3360   4800   Testing
200          2640   1640   560    1680   560    Testing

A. Experimental setup

In the experiments, the radiated acoustic signals were recorded with a single hydrophone in the South China Sea in 2015. The hydrophone was placed below the sea surface at 5 depths (50 m, 70 m, 100 m, 150 m and 200 m). The radiated signals were collected from 5 different vessels, which had various weights, sizes, propeller structures and engine systems. The sampling rate of the signals was 50 kHz. For each run, the portion of the recording in which the vessel ranged from +500 m to -500 m was selected.

In the preprocessing stage, the raw radiated signals were transformed into CQT features. The signals were resampled at a sampling rate of 4 kHz. We used the Matlab toolbox of [38] to compute the CQT representation. For the radiated signals, we focused on the frequencies below 1 kHz. The center frequencies of the lowest-frequency bin f_1 and the highest-frequency bin f_max were set to 4 Hz and 1 kHz, respectively. The number of bins per octave B was 8. Thus, the CQT captures 64 bands covering 8 octaves. Each single CQT feature frame was computed with a hop of 23 points (5.75 milliseconds). We combined 64 frames for each CQT feature to generate the input sample of the CNN model, so each sample had the size 64 x 64, derived from 1472 points (0.368 seconds).

Since the radiated signals recorded at different depths have different signal-to-noise ratios (SNRs), to evaluate the generalization ability to unseen conditions we trained the proposed CNN-SOP model with the samples generated from the depths of 50 m and 150 m, while testing the model with the samples from the depths of 70 m, 100 m and 200 m. The training set contained 31520 input samples and the testing set 41200 samples. The details of the whole dataset are presented in Table I.

The proposed CNN model contained several convolutional layers, each with the same filter size of 8 bands x 8 frames and a stride of 2 bands x 2 frames. The whole model was optimized using the Adam optimizer with a learning rate of 0.0001. The network was trained for 1000 epochs with a minibatch size of 50. Our implementation was developed upon TensorFlow using an NVIDIA Tesla K40 GPU.
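Putting the pieces together, the following sketch assembles the complete 2-layer CNN-SOP model of Fig. 4 with the training settings above (Adam, learning rate 0.0001, cross-entropy loss). This is a re-implementation under stated assumptions ("same" padding, epsilon constants), not the authors' released code:

```python
import tensorflow as tf

class SOPLayer(tf.keras.layers.Layer):
    """Differentiable second-order pooling (Eqs. (7)-(8)) with normalization."""
    def call(self, H):                                 # H: (batch, h, w, c)
        S = tf.transpose(H, [0, 1, 3, 2])              # per-bin S: (batch, h, c, w)
        P = tf.matmul(S, S, transpose_b=True)          # S S^T: (batch, h, c, c)
        l = P.shape[1] * P.shape[2] * P.shape[3]       # l = h * c * c (static)
        p = tf.reshape(P, [-1, l])
        q = tf.sign(p) * tf.sqrt(tf.abs(p) + 1e-12)    # signed square-root (eps assumed)
        return tf.math.l2_normalize(q, axis=1)         # l2 normalization

def build_cnn_sop(num_classes=5):
    inp = tf.keras.Input(shape=(64, 64, 1))
    h = tf.keras.layers.Conv2D(8, 8, strides=2, padding="same", activation="relu")(inp)
    h = tf.keras.layers.Conv2D(16, 8, strides=2, padding="same", activation="relu")(h)
    z = SOPLayer()(h)                                  # 16 * 16 * 16 = 4096-dim vector
    a = tf.keras.layers.Dense(1024, activation="relu")(z)   # dense layer of Eq. (10)
    out = tf.keras.layers.Dense(num_classes, activation="softmax")(a)  # Eq. (9)
    model = tf.keras.Model(inp, out)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

# model.fit(x_train, y_train, batch_size=50, epochs=1000) matches Section IV-A.
```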
B. Comparison with the CNN model using max pooling

We first compared the performance of the proposed second-order pooling based CNN model (CNN-SOP) with the CNN model using max-pooling (CNN-MP). For the CNN-MP model, the CNN feature maps of the last convolutional layer were pooled with a pooling size of 2 bands x 2 frames and a sub-sampling factor of 2 x 2. We tested the two CNN models with between 1 and 4 convolutional layers. The model structure for 2 convolutional layers is described in Fig. 4. To reduce the computational complexity of the SOP, the number of CNN filters c was set to a small number, in this case 16. The resolution is important for the CQT feature of underwater acoustic signals, especially the frequency resolution, since the CQT features of some targets are very similar in the frequency domain. To obtain more discriminative features, we added only one max-pooling layer, after the final convolutional layer of the CNN-MP model. Both networks were evaluated using the dataset in Table I.

Fig. 4. Model structures of the CNN-SOP model and the CNN-MP model:
CNN-SOP (2L): CQT input (64x64) -> Conv. 32x32x8 -> Conv. 16x16x16 -> SOP 16x16x16 -> Norm. 4096 -> Dense 1024 -> Softmax 5
CNN-MP (2L): CQT input (64x64) -> Conv. 32x32x8 -> Conv. 16x16x16 -> MP 8x8x16 -> Dense 1024 -> Softmax 5

Fig. 5 shows the classification accuracies of the CNN-SOP model and the CNN-MP model with different numbers of convolutional layers. It can be seen that the proposed CNN-SOP network achieves better performance than the regular CNN-MP model, with an improvement of 4% in overall classification accuracy. We also found that deeper CNNs did not always produce better results in our experiments: both CNN models yield the highest accuracies when the number of convolutional layers is set to 2. This may be explained by the limited number of training samples and the wide CNN filters (8 x 8), so that fewer convolutional layers are more efficient than a larger number of layers. It can also be observed that the classification accuracies of the two models decline as the sea depth increases. This may be due to the SNRs of the radiated signals degrading as the distance between the surface vessel and the hydrophone increases.

To explore the per-target performance of the two models, we also use the confusion matrix to show the classification results. Both networks have 2 convolutional layers, which proved to be the best configuration. We can see from Fig. 6 that the CNN-SOP model provides better classification accuracies than the CNN-MP model for all targets.

C. Comparison with STFT feature

The STFT feature has been used as the input of a DBN model to provide the spectrum information of the radiated signals [12]. To evaluate the advantages of the CQT feature, the STFT feature was used for comparison. Similar to [12],
the STFT feature was calculated with 1024 FFT points and a sampling rate of 4 kHz. We concatenated 8 frames to generate the input for the CNN model, so the input STFT feature had the size of 512 dimensions x 8 frames. We again applied the CNN-SOP model and the CNN-MP model to the STFT feature for comparison. When using the STFT feature, the filter size and the stride of the convolutional layers were both set to 8 bands x 1 frame. We still used 2 convolutional layers for the CNN models with the STFT feature, giving feature sizes of 64 x 8 x 8 and 8 x 8 x 16 for the two convolutional layers.

Fig. 5. Classification accuracies of the CNN-SOP model and the CNN-MP model with different convolutional layers for the dataset at depths of 70 m (upper-left), 100 m (upper-right), 200 m (lower-left), and the overall results (lower-right).

The classification results of the CQT feature and the STFT feature using the two CNN models are compared in Fig. 7. It can be seen that the CQT feature offers a 3% improvement over the STFT feature using the CNN-SOP model, and a 1.6% improvement using the CNN-MP model. This demonstrates that the CQT feature is more appropriate for the CNN model than the STFT feature when applied to radiated acoustic signals, which may be explained by its better resolution at the lower frequencies.

Fig. 6. Confusion matrices for the overall classification accuracy of the CNN-SOP model (upper) and the CNN-MP model (lower). The X-axis indicates the predicted label and the Y-axis the true label:

CNN-SOP    A        B        C        D        E
class A    0.9603   0.0279   0.0000   0.0000   0.0322
class B    0.0312   0.9721   0.0274   0.0000   0.0000
class C    0.0000   0.0000   0.9315   0.0216   0.0092
class D    0.0000   0.0000   0.0411   0.9784   0.0000
class E    0.0085   0.0000   0.0000   0.0000   0.9587

CNN-MP     A        B        C        D        E
class A    0.9062   0.0380   0.0219   0.0000   0.0675
class B    0.0589   0.9374   0.0128   0.0000   0.0000
class C    0.0000   0.0000   0.9041   0.0606   0.0138
class D    0.0000   0.0000   0.0612   0.9394   0.0000
class E    0.0349   0.0246   0.0000   0.0000   0.9187

Fig. 7. Classification results of the CQT feature and the STFT feature using the two CNN models.
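For comparison with the CQT sketch earlier, here is a minimal NumPy version of this STFT baseline input (512 log-magnitude bins x 8 frames). The Hann window and the non-overlapping hop are assumptions, since the text specifies only the FFT size and frame count:

```python
import numpy as np

def stft_feature(x, n_fft=1024, hop=1024, n_frames=8):
    """Log-magnitude STFT input of size 512 x 8 for the baseline comparison.
    Expects at least n_frames * hop samples (about 2 s at 4 kHz)."""
    cols = []
    for i in range(n_frames):
        seg = np.asarray(x[i * hop : i * hop + n_fft], float)
        spec = np.fft.rfft(np.hanning(n_fft) * seg)          # 513 bins
        cols.append(20.0 * np.log10(np.abs(spec[:n_fft // 2]) + 1e-12))
    return np.stack(cols, axis=1)                            # shape (512, 8)
```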
D. Comparison with other pooling methods

To verify the effectiveness of the proposed SOP strategy, we compared the proposed SOP with the bilinear pooling of [31]. In [31], the B-CNN model is proposed, which applies bilinear pooling to the VGG-16 network [39]. When using the same CNN extractor, the bilinear pooling can be seen as a second-order pooling approach. In this section, three pooling approaches based on the VGG network were compared using the same CQT feature: the proposed SOP, the bilinear pooling [31] and max-pooling. However, the standard VGG-16 network is very deep (13 convolutional and 3 fully-connected layers), leading to too many parameters to train, so the standard VGG may not be suitable for our limited dataset. We therefore used a modified VGG-16 network consisting of
the first 7 convolutional layers, three pooling layers and one dense layer in the experiment. Unlike the CNN model used in Section IV-B, which adopted convolutional filters of size 8 x 8, the VGG network uses smaller (3 x 3) filters. We also used fewer filters in each convolutional layer of the modified VGG network, with (16-32-64) filters for the three convolutional groups. The single dense layer had 4096 units, as in the standard VGG network. The max-pooling of the modified VGG network (VGG-MP) was similar to that of the standard VGG, performed over a 2 x 2 window with stride 2. The B-CNN model based on the modified VGG network had 64 filters in the final convolutional layer, so the bilinear feature dimension was 64 x 64 = 4096. We also applied the proposed SOP strategy to the same VGG network (VGG-SOP) for comparison. The CNN feature of the final convolutional layer had the size 8 x 8 x 64, which means that the SOP feature had the dimension 8 x 64 x 64 = 32768. The elementwise square-root and l2 normalization were used before the final classification for both the SOP and the bilinear pooling. The learning rate of the Adam optimizer was set to 0.001. The networks were trained for 600 epochs with a minibatch size of 64.

It can be seen from Fig. 8 that, when using the same VGG network, the VGG-SOP outperforms the B-CNN model [31] by nearly 2% and max-pooling by 3%. The results show that, compared to the bilinear pooling, the proposed SOP strategy can take advantage of the local discrimination along the frequency axis, which makes it more suitable for the classification of underwater acoustic signals.

Fig. 8. Classification results of the proposed SOP, the bilinear pooling and the max pooling based on the VGG network.
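As a sketch of the modified VGG-SOP configuration just described (first 7 VGG convolutional layers with (16-32-64) filters, 3 x 3 kernels, three 2 x 2 max-pooling stages, one 4096-unit dense layer); the grouping of the 7 layers into (2, 2, 3) blocks follows the standard VGG-16 layout and is an assumption:

```python
import tensorflow as tf

def build_vgg_sop(num_classes=5):
    inp = tf.keras.Input(shape=(64, 64, 1))
    h = inp
    for filters, n_convs in [(16, 2), (32, 2), (64, 3)]:   # first 7 VGG conv layers
        for _ in range(n_convs):
            h = tf.keras.layers.Conv2D(filters, 3, padding="same",
                                       activation="relu")(h)
        h = tf.keras.layers.MaxPooling2D(pool_size=2, strides=2)(h)
    # h: 8 x 8 x 64 feature maps; per-bin SOP gives 8 * 64 * 64 = 32768 dims.
    S = tf.keras.layers.Permute((1, 3, 2))(h)              # per-bin S: (8, 64, 8)
    P = tf.keras.layers.Lambda(
        lambda s: tf.matmul(s, s, transpose_b=True))(S)    # S S^T per frequency bin
    p = tf.keras.layers.Flatten()(P)
    q = tf.keras.layers.Lambda(
        lambda t: tf.sign(t) * tf.sqrt(tf.abs(t) + 1e-12))(p)  # signed square-root
    z = tf.keras.layers.Lambda(
        lambda t: tf.math.l2_normalize(t, axis=1))(q)      # l2 normalization
    a = tf.keras.layers.Dense(4096, activation="relu")(z)
    out = tf.keras.layers.Dense(num_classes, activation="softmax")(a)
    model = tf.keras.Model(inp, out)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

# Trained for 600 epochs with a minibatch size of 64 in the comparison of Fig. 8.
```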
E. Comparison with previous DNN-based classification models

In this section, we compared the classification accuracy against other deep learning-based underwater target classification systems [12, 13] on our dataset. We applied the CQT to the DBN model [12] and the SAE model [13] for comparison. Since the DBN and SAE are both fully-connected deep networks, the full CQT sample has a dimension of 4096 (64 bands x 64 frames), which would lead to too many parameters and a heavy computational load. We therefore averaged each run of 8 consecutive frames of the original 64 frames to generate the CQT features for the DBN and SAE, which had a dimension of 512 (64 bands x 8 frames). The model structures of the DBN and SAE were similar to [12] and [13]: the DBN model had 3 hidden layers (200-100-50), while the SAE model was composed of 3 autoencoders with 100 units each.

TABLE II
COMPARISON OF THE PROPOSED CNN-SOP MODEL WITH THE DBN MODEL AND THE SAE MODEL USING THE CQT FEATURE, IN TERMS OF CLASSIFICATION ACCURACY

Method              70 m     100 m    200 m    Overall
DBN [12]            0.8941   0.8707   0.8305   0.8712
SAE [13]            0.9052   0.8819   0.8553   0.8847
Proposed CNN-SOP    0.9714   0.9656   0.9421   0.9634

We can see from Table II that the proposed CNN-SOP model improves the overall classification accuracy by 8% compared to the DBN and SAE models when using the same CQT input. This shows that our CNN-SOP model has a clear advantage over these fully-connected networks.

V. CONCLUSION

In this paper, we have introduced a novel CNN model using second-order pooling to capture the temporal correlations for underwater target classification. The radiated signals are transformed into a T-F feature using the CQT as the input to the CNN model. The proposed second-order pooling learns the temporal similarities of different CNN filters by computing the covariance matrix of the CNN feature maps along the time axis. The experimental results on real radiated acoustic signals recorded at various sea depths demonstrate that the second-order pooling achieves better performance than max pooling. The CQT feature has also been demonstrated to be more effective than the STFT feature when applied to the proposed CNN model. The proposed CNN-based classification approach improves the classification accuracy by 8% compared with state-of-the-art deep learning methods.

ACKNOWLEDGMENT

The research was supported by the National Natural Science Foundation of China (Grant no. 61601369). This work was completed while the first author was a visiting student in the School of Electrical, Electronic and Computer Engineering, The University of Western Australia. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used for this research.

REFERENCES

[1] M. R. Azimi-Sadjadi, D. Yao, Q. Huang, and G. J. Dobeck, "Underwater target classification using wavelet packets and neural networks," IEEE Transactions on Neural Networks, vol. 11, no. 3, pp. 784-794, 2000.
[2] S. Ji, X. Liao, and L. Carin, "Adaptive multiaspect target classification and detection with hidden Markov models," IEEE Sensors Journal, vol. 5, no. 5, pp. 1035-1042, 2005.
[3] J. De Seixas, N. De Moura et al., "Preprocessing passive sonar signals for neural classification," IET Radar, Sonar & Navigation, vol. 5, no. 6, pp. 605-612, 2011.
[4] S. Kamal, A. Mujeeb, M. Supriya et al., "Novel class detection of underwater targets using self-organizing neural networks," in Underwater Technology (UT), 2015 IEEE. IEEE, 2015, pp. 1-5.
[5] A. Pezeshki, M. R. Azimi-Sadjadi, and L. L. Scharf, "Undersea target classification using canonical correlation analysis," IEEE Journal of Oceanic Engineering, vol. 32, no. 4, pp. 948-955, 2007.
[6] M. R. Azimi-Sadjadi, D. Yao, A. A. Jamshidi, and G. J. Dobeck, "Underwater target classification in changing environments using an adaptive feature mapping," IEEE Transactions on Neural Networks, vol. 13, no. 5, pp. 1099-1111, 2002.
[7] S. Wang and X. Zeng, "Robust underwater noise targets classification using auditory inspired time-frequency analysis," Applied Acoustics, vol. 78, pp. 68-76, 2014.
[8] Y. Wu, X. Li, and Y. Wang, "Extraction and classification of acoustic scattering from underwater target based on Wigner-Ville distribution," Applied Acoustics, vol. 138, pp. 52-59, 2018.
[9] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, 2012.
[10] S.-H. Fang, Y.-X. Fei, Z. Xu, and Y. Tsao, "Learning transportation modes from smartphone sensors based on deep neural network," IEEE Sensors Journal, vol. 17, no. 18, pp. 6111-6118, 2017.
[11] A. Dairi, F. Harrou, Y. Sun, and M. Senouci, "Obstacle detection for intelligent transportation systems using deep stacked autoencoder and k-nearest neighbor scheme," IEEE Sensors Journal, vol. 18, no. 12, pp. 5122-5132, 2018.
[12] S. Kamal, S. K. Mohammed, P. S. Pillai, and M. Supriya, "Deep learning architectures for underwater target recognition," in Ocean Electronics (SYMPOL), 2013. IEEE, 2013, pp. 48-54.
[13] X. Cao, X. Zhang, Y. Yu, and L. Niu, "Deep learning-based recognition of underwater target," in Digital Signal Processing (DSP), 2016 IEEE International Conference on. IEEE, 2016, pp. 89-93.
[14] P. Swietojanski, A. Ghoshal, and S. Renals, "Convolutional neural networks for distant speech recognition," IEEE Signal Processing Letters, vol. 21, no. 9, pp. 1120-1124, 2014.
[15] X. Xiang, N. Lv, M. Zhai, and A. El Saddik, "Real-time parking occupancy detection for gas stations based on Haar-AdaBoosting and CNN," IEEE Sensors Journal, vol. 17, no. 19, pp. 6360-6367, 2017.
[16] Y. Wang, A. Yang, X. Chen, P. Wang, Y. Wang, and H. Yang, "A deep learning approach for blind drift calibration of sensor networks," IEEE Sensors Journal, vol. 17, no. 13, pp. 4158-4171, 2017.
[17] J. Carreira, R. Caseiro, J. Batista, and C. Sminchisescu, "Semantic segmentation with second-order pooling," in European Conference on Computer Vision. Springer, 2012, pp. 430-443.
[18] W. J. Pielemeier and G. H. Wakefield, "A high-resolution time-frequency representation for musical instrument signals," The Journal of the Acoustical Society of America, vol. 99, no. 4, pp. 2382-2396, 1996.
[19] W. J. Pielemeier, G. H. Wakefield, and M. H. Simoni, "Time-frequency analysis of musical signals," Proceedings of the IEEE, vol. 84, no. 9, pp. 1216-1230, 1996.
[20] G. Costantini, R. Perfetti, and M. Todisco, "Event based transcription system for polyphonic piano music," Signal Processing, vol. 89, no. 9, pp. 1798-1811, 2009.
[21] J. C. Brown, "Calculation of a constant Q spectral transform," The Journal of the Acoustical Society of America, vol. 89, no. 1, pp. 425-434, 1991.
[22] Y. LeCun, F. J. Huang, and L. Bottou, "Learning methods for generic object recognition with invariance to pose and lighting," in Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, vol. 2. IEEE, 2004, pp. II-104.
[23] M. Espi, M. Fujimoto, K. Kinoshita, and T. Nakatani, "Exploiting spectro-temporal locality in deep learning based acoustic event detection," EURASIP Journal on Audio, Speech, and Music Processing, vol. 2015, no. 1, p. 26, 2015.
[24] J. Salamon and J. P. Bello, "Deep convolutional neural networks and data augmentation for environmental sound classification," IEEE Signal Processing Letters, vol. 24, no. 3, pp. 279-283, 2017.
[25] R. Hyder, S. Ghaffarzadegan, Z. Feng, J. H. Hansen, and T. Hasan, "Acoustic scene classification using a CNN-supervector system trained with auditory and spectrogram image features," Proc. Interspeech 2017, pp. 3073-3077, 2017.
[26] S. Lekha and M. Suchetha, "A novel 1-D convolution neural network with SVM architecture for real-time detection applications," IEEE Sensors Journal, vol. 18, no. 2, pp. 724-731, 2018.
[27] M.-F. Guo, X.-D. Zeng, D.-Y. Chen, and N.-C. Yang, "Deep-learning-based earth fault detection using continuous wavelet transform and convolutional neural network in resonant grounding distribution systems," IEEE Sensors Journal, vol. 18, no. 3, pp. 1291-1300, 2018.
[28] L. Deng, O. Abdel-Hamid, and D. Yu, "A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion," in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 6669-6673.
[29] T. Lidy and A. Schindler, "CQT-based convolutional neural networks for audio scene classification," in Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016), vol. 90. DCASE2016 Challenge, 2016, pp. 1032-1048.
[30] H. Lee, G. Kim, H.-G. Kim, S.-H. Oh, and S.-Y. Lee, "Deep CNNs along the time axis with intermap pooling for robustness to spectral variations," IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1310-1314, 2016.
[31] T.-Y. Lin, A. RoyChowdhury, and S. Maji, "Bilinear CNN models for fine-grained visual recognition," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1449-1457.
[32] T.-Y. Lin and S. Maji, "Improved bilinear pooling with CNNs," arXiv preprint arXiv:1707.06772, 2017.
[33] Y. Gao, O. Beijbom, N. Zhang, and T. Darrell, "Compact bilinear pooling," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 317-326.
[34] A. Cherian and S. Gould, "Second-order temporal pooling for action recognition," arXiv preprint arXiv:1704.06925, 2017.
[35] C. Schörkhuber and A. Klapuri, "Constant-Q transform toolbox for music processing," in 7th Sound and Music Computing Conference, Barcelona, Spain, 2010, pp. 3-64.
[36] O. Abdel-Hamid, A.-r. Mohamed, H. Jiang, and G. Penn, "Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition," in Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on. IEEE, 2012, pp. 4277-4280.
[37] S. W. Abeyruwan, D. Sarkar, F. Sikder, and U. Visser, "Semi-automatic extraction of training examples from sensor readings for fall detection and posture monitoring," IEEE Sensors Journal, vol. 16, no. 13, pp. 5406-5415, 2016.
[38] C. Schörkhuber, A. Klapuri, N. Holighaus, and M. Dörfler, "A Matlab toolbox for efficient perfect reconstruction time-frequency transforms with log-frequency resolution," in Audio Engineering Society Conference: 53rd International Conference: Semantic Audio. Audio Engineering Society, 2014.
[39] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.