Advanced Engineering Informatics 54 (2022) 101815
Available online 23 November 2022
1474-0346/© 2022 Elsevier Ltd. All rights reserved.
Full length article
A novel Brownian correlation metric prototypical network for rotating
machinery fault diagnosis with few and zero shot learners
Jingli Yang, Changdong Wang, Chang’an Wei *
School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin 150006, People’s Republic of China
* Corresponding author. E-mail addresses: jinglidg@hit.edu.cn (J. Yang), weichangan2021@163.com (C. Wei).
https://doi.org/10.1016/j.aei.2022.101815
Received 14 July 2022; Received in revised form 10 October 2022; Accepted 12 November 2022
ARTICLE INFO
Keywords:
Rotating machinery
Fault diagnosis
Prototypical network
Multi-scale mask
Brownian distance covariance
ABSTRACT
Due to the variability of working conditions and the scarcity of fault samples, existing diagnosis models still fall short of covering many practical application scenarios. It is therefore of great significance to study an intelligent diagnosis scheme that accounts for few samples in the training source domain and zero samples in the test target domain (FST-ZST). A Brownian correlation metric prototypical network (BCMPN) algorithm based on a multi-scale mask preprocessing mechanism is proposed for this problem. First, this paper constructs a multi-scale mask preprocessing mechanism (MMP) to improve the optimization starting point. Second, multi-scale feature embedding is realized through the dilation convolution module and the effective light channel attention (ELCA) module. Third, based on the Brownian distance similarity measurement, we learn the feature representation by measuring the difference between the joint characteristic function and the product of the marginals in the diagnosis setting. Finally, on the gear data set of the University of Connecticut (UConn) and data collected in our laboratory, the BCMPN is shown to deliver better performance on the FST-ZST problem.
1. Introduction
With the vigorous development of prognostics and health management technology, intelligent diagnosis algorithms based on deep learning help people understand monitored data and decide on equipment maintenance strategies [1,2]. The deep belief network (DBN) [3], the long short-term memory (LSTM) network [4], the convolutional neural network (CNN) [5–7], and other deep learning methods have made great achievements in fault diagnosis and life prediction of rotating machinery. However, because working conditions change in practical applications, sufficient fault data sets cannot be obtained, and in many actual scenarios no samples are available at all due to safety constraints [8,9]. The scarcity of samples and the complexity of diagnostic conditions therefore seriously limit the ability of the above methods to adapt to new diagnostic tasks. At present, it remains a great challenge to train a powerful fault diagnosis model that works well under complex conditions such as FST-ZST [10].
As the diagnosis scenario changes with equipment structure, operating conditions, and data quality, the data features also become highly complex [11,12]. An in-depth study of cross-domain fault diagnosis is therefore the key to applying deep learning methods to practical equipment. To solve cross-working-condition and even cross-equipment problems, some researchers have proposed corresponding domain adversarial methods. Zou et al. [13] developed a fault diagnosis model based on a Wasserstein adversarial channel compression variational autoencoder (WACCVAE). The interference of redundant features is reduced by compressing the channel, and an inter-class and intra-class distance constraint is imposed on the proposed model to enhance the distribution alignment of same-class samples from different domains. Tian et al. [14] took multiple source domains as the diagnostic knowledge and developed a multi-branch network structure to match the feature space distribution. The local maximum mean discrepancy was introduced to correct the distribution difference of subdomains, and multiple source classifiers were applied to diagnose the status of equipment. To learn domain-invariant features, Wang et al. [15] designed a deep adaptive adversarial network (DAAN) including a condition recognition module and a domain adversarial learning module to automatically extract features and classify the health status under different working conditions. Wan et al. [16] constructed a new deep convolution multi-adversarial domain adaptive network (DCMADAN). The constructed
domain adaptive module combines a multi-kernel maximum mean discrepancy (MK-MMD) and multiple domain discriminators to align the marginal and conditional distributions. Chen et al. [17] developed an adversarial domain-invariant generalization (ADIG) fault diagnosis framework based on adversarial learning. After integrating data from multiple domains, adversarial learning between the feature extraction module and the domain classification module is used to realize high-precision cross-domain diagnosis. However, the above methods pay a high time cost for readjusting the model when facing new diagnostic tasks.
To alleviate the problem of differences in the distribution of data features, some researchers have combined transfer learning and deep learning for fault diagnosis in recent years. Based on transfer learning, Yang et al. [18] constructed a balance factor to weight the target samples, so that the adaptive subnetwork can jointly adapt to the partial distributions of the source domain and the target domain. In addition, they used a multi-source diagnostic knowledge fusion module to integrate multiple diagnostic decisions. Liu et al. [19] transferred a deep CNN pre-trained in the source domain to the target domain, and then applied deep adversarial training between the two domains to optimize the parameters and reduce the domain shift. Chen et al. [20] pretrained a one-dimensional CNN on a large source data set, and the excellent transferability of the proposed transferable convolutional neural network (TCNN) was verified on three test rigs. However, the premise for the high performance of the above transfer-learning-based diagnosis methods is that the source domain contains a large amount of effective training data [21]. As is well known, the data that can be obtained in many actual scenarios are limited.
At present, many studies focus on the feature embedding and classification stages of the model, ignoring the importance of the input preprocessing stage. We believe that preprocessing based on data augmentation is an important stage in learning the internal feature representation of different categories. A complete and effective preprocessing mechanism can therefore yield a large gain in diagnostic performance across working conditions and even across equipment [21]. Fortunately, one-dimensional time-domain vibration signals can be converted into two-dimensional form and presented effectively as time–frequency images. Meanwhile, a CNN can be utilized to extract critical features from the high-dimensional data in the image [22]. From a statistical point of view, fault diagnosis after transforming signals into images is essentially equivalent to embedding random vectors that express fault features into a high-dimensional space and measuring their correlation [8]. Among the diagnosis methods based on correlation metrics, those based on few-shot learning gradually show superior performance [23,24]. However, most current methods use the Euclidean distance or the cosine distance to realize the distance measurement; they only consider the marginal distributions of different sample features and ignore their joint distribution, which limits the diagnostic ability of the model [25].
It follows from the above that the dependence between two images should be measured according to their joint distribution [26]. Although the earth mover’s distance (EMD), which seeks the optimal joint distribution, is an effective way to measure this dependence [27], its computational cost is very high [28]. Mutual information (MI) [29] can quantify the dependence of two random variables through the Kullback-Leibler (KL) divergence between the joint distribution and the product of the marginals. However, MI is very difficult to compute in real-valued, high-dimensional settings [30]. The Brownian distance covariance (BDC) measurement [31] is defined through the Euclidean-norm-weighted difference between the joint characteristic function and the product of the marginal characteristic functions. It can naturally quantify the dependency between two random variables, accept a feature map as input, and output a BDC matrix as the image representation. In this way, the calculation of the similarity between two images reduces to the inner product of the two corresponding BDC matrices. Building on [32], we introduce the BDC measurement into the field of rotating machinery fault diagnosis for the first time. It is worth noting that our preliminary work is introduced in [33]. In this article, we make the following innovations and extensions for the more difficult FST-ZST task.
(1) To improve the generalization performance and the optimization starting point of the model, we propose a data augmentation preprocessing mechanism based on a multi-scale mask. With the MMP, a sub-distribution of the original data sample distribution can be obtained, which improves the initial performance and the generalization ability of the model when it is applied to new working conditions and new equipment.
(2) For the first time, we introduce the Brownian distance into the field of fault diagnosis and construct a Brownian correlation metric prototypical network. In the classification stage, fault recognition is carried out by measuring the difference between the joint distribution of the embedded features and the product of the marginals. In addition, the developed method moves beyond the Euclidean and cosine distances that dominate current metric learning methods and provides researchers with a new idea for the distance metric.
(3) To better realize fault diagnosis under the FST-ZST problem, we develop a powerful intelligent fault diagnosis scheme based on the BCMPN. When facing diagnosis tasks on different working conditions or different equipment, this method achieves high-precision fault diagnosis under FST-ZST.
(4) Based on the standard data set and an actual data set, we conduct ablation experiments to verify the effectiveness of each innovative component. Moreover, we also compare with current advanced diagnostic methods to verify the superiority of the proposed method across working conditions and across machines.
The structure of this article is arranged as follows. Section 1 is the introduction. Section 2 describes the problem from a mathematical point of view. Section 3 gives a detailed description of the proposed method. In Section 4, the experimental results and discussion are presented in depth. Section 5 gives a summary and future work.
2. Problem description
First, we give the source domain $D_S = \{(x_s^{(i)}, y_s^{(i)})\}_{i=1}^{N}$, which includes $n$ labeled images $G \in \mathbb{R}^{h \times w \times 3}$ from the source input space $\{X_S, Y_S\}$. In addition, we extract the feature map $m = f(I)$, $m \in \mathbb{R}^{\frac{h}{r} \times \frac{w}{r} \times c}$, which is independent of the class. The classifier can recognize the $l$ types of samples and processes the feature map by using the convolution kernel $c_l \in \mathbb{R}^{1 \times 1 \times c}$, where $r$ is the output stride and $c$ represents the number of feature channels. $x_s^{(i)} \in X_S$ represents a sample in the source input space drawn from the probability distribution function $p_s(X_S)$, and $y_s^{(i)} \in Y_S$ is the corresponding label identifying the fault type. In the source domain, we define the fault set under different working conditions and different equipment as $U = \emptyset$. In few-shot tasks, these fault instances $U$ are so rare that they cannot support the work of traditional classifiers. We record the fault samples in these source domains as $M = \{c_i^m \mid i = 1, \dots, N_m\}$, $M \in S$.

Given $D_T = \{x_t^{(j)}\}_{j=1}^{M}$ as the target domain, which includes $m$ unlabeled images from the target domain space $\{X_T, Y_T\}$. The probability distribution function of the samples in the target domain is $p_t(X_T)$. Generally, due to the domain shift, the source domain and the target domain are different, so their probability distributions are unequal, that is, $p_s(X_S) \neq p_t(X_T)$.

The goal of cross-domain fault diagnosis is to learn a general model that performs well on the classification of the unlabeled target domain with the help of the labeled source domain. Its mathematical principle is as follows:

$$\min \| y_t - \hat{y}_t \| \tag{1}$$

where $\hat{y}_t$ refers to the predicted fault type for an input from the target domain.
3. Methodology
3.1. The framework of the BCMPN
The overall diagnosis method is shown in Fig. 1 and mainly includes five stages:
(1) Sample acquisition and data transformation based on the frequency slice wavelet transform (FSWT) [34]. Our previous work [33,35] has shown that the FSWT improves the time–frequency description of samples and thus gives full play to the advantages of the CNN on images;
(2) Enhanced preprocessing based on the multi-scale mask;
(3) The enhanced samples and the original samples are fused as the source domain, and the data under different working conditions and equipment are taken as the target domain. The model is trained on the source domain through the feature embedding stage, which is composed of the ELCA module, the Convolution block and the Dilation block;
(4) In the BDC classification stage, a distance metric based on class prototypes and softmax is constructed to identify different fault types, and the trained model is saved;
(5) The relevant unknown categories are tested on the target domain and the cross-domain diagnostic results are obtained.
The Convolution block established in this paper consists of three parts: a 3 × 3 convolution, a BN layer and the ReLU activation function, as sketched below. The detailed structure and parameters of the ELCA module are given in [33].
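For concreteness, the following is a minimal PyTorch sketch of such a Convolution block (3 × 3 convolution, batch normalization, ReLU); the class name and channel arguments are illustrative assumptions, not the authors' released code.

```python
import torch.nn as nn

class ConvBlock(nn.Module):
    """Minimal sketch of the Convolution block: 3x3 convolution, BN, ReLU."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)
```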
3.2. The global–local data fusion preprocessing mechanism based on
multi-scale mask
Human cognitive learning is a gradual and multi-scale development process [36]. Therefore, humans can often complete simple tasks easily after learning more complex knowledge [37]. Inspired by the human cognitive learning process, the generalization performance can be boosted by increasing the difficulty of the training tasks while retaining important classification information [38,39].
The specific principle of the mask mechanism is shown in Fig. 2. Taking the gear time–frequency image as an example, the noise in the original input image is masked in a certain proportion to achieve local enhancement of critical pixels. In order to improve the ability to capture sensitive features in the target domain, we develop a multi-scale mask global–local data fusion preprocessing mechanism. The specific process is described in Fig. 3. We mask the pixels of the training samples at three scales to achieve multi-scale local feature enhancement. Moreover, we combine the result with the original input image to realize global data fusion.
To help readers better understand this process, we give a mathematical description as follows.

We complete the process of local feature enhancement by evenly deleting areas of the image, which is set as follows:

$$\hat{G} = G \times M \tag{2}$$

where $G \in \mathbb{R}^{h \times w \times c}$ represents the input image, $M \in \{0, 1\}^{h \times w}$ is the binary mask of the pixels to be removed, and $\hat{G} \in \mathbb{R}^{h \times w \times c}$ is the enhanced result. For the binary mask $M$, if $M_{i,j} = 1$, the pixel $(i, j)$ in the input image is retained; otherwise, it is masked.

As shown in Fig. 3, we use $(r, d, x, y)$ to represent $M$: $r$ is the mask retention ratio, $d$ is the length of a unit, and $x$ and $y$ indicate the distance between the first complete unit of the image and the image boundary. The parameter $r$ determines the retention ratio of the input image. We define the retention ratio $p$ of a given mask $M$ as

$$p = \frac{\mathrm{sum}(M)}{H \times W} \tag{3}$$

Too large a retention ratio may lead to overfitting, and too small a retention ratio may lead to underfitting due to the loss of too much information. The relationship between $r$ and $p$ can be expressed as

$$p = 1 - (1 - r)^2 = 2r - r^2 \tag{4}$$

The parameter $d$ determines the size of a removed area. When $r$ is fixed, the relationship between the removed side length $l$ and $d$ is

$$l = r \times d \tag{5}$$
Fig. 1. Framework of the proposed method.
We choose $d$ randomly from a range:

$$d = \mathrm{random}(d_{min}, d_{max}) \tag{6}$$

Given $r$ and $d$, $x$ and $y$ determine the offset of the moving mask; they ensure that the removed units can move to all possible positions. Therefore, $x$ and $y$ can be selected randomly according to

$$x(y) = \mathrm{random}(0, d - 1) \tag{7}$$
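The following is a minimal NumPy sketch of one scale of the mask operation described by Eqs. (2)-(7); the function name, the default parameter ranges, and the handling of partial boundary units are illustrative assumptions (the paper applies the mask at three scales and fuses the results with the original image).

```python
import numpy as np

def grid_mask(image, r=0.5, d_min=32, d_max=64, rng=None):
    """One scale of the multi-scale mask (MMP), following Eqs. (2)-(7).

    image : H x W x C time-frequency image.
    r     : ratio controlling the removed side length, Eq. (5).
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    d = int(rng.integers(d_min, d_max + 1))    # unit length, Eq. (6)
    l = int(round(r * d))                      # removed side length, Eq. (5)
    x = int(rng.integers(0, d))                # offsets of the first unit, Eq. (7)
    y = int(rng.integers(0, d))
    mask = np.ones((h, w), dtype=image.dtype)  # binary mask M
    for i in range(-1, h // d + 1):            # tile removed squares over the grid
        for j in range(-1, w // d + 1):
            r0, c0 = x + i * d, y + j * d
            mask[max(r0, 0):max(r0 + l, 0), max(c0, 0):max(c0 + l, 0)] = 0
    return image * mask[..., None]             # element-wise product, Eq. (2)
```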
In the preprocessing stage, we utilize the MMP to enlarge the differences between samples and increase the difficulty of the model training tasks, which makes the trained model generalize better. It is worth noting that the original data are also used with a certain probability during training, so that the original data set is a subset of the enhanced set, which gives the trained model a stronger generalization ability [40].
Fig. 2. Diagram of learning principle in feature representation based on mask.
Fig. 3. The principle of the MMP.
3.3. The feature embedding stage
The composition of the proposed dilation module in the feature embedding stage is shown in Fig. 4. First, multi-scale feature relationships are extracted in parallel by dilated convolutions with different dilation rates [41]. Second, the resulting features are concatenated and a 1 × 1 convolution reduces their dimensionality; finally, the identity mapping of the original input is added to the output to reduce the loss of effective features. From a theoretical point of view, writing $dr$ for the dilation rate, the dilated convolution $*_{dr}$ can be expressed as

$$(F *_{dr} k)(p) = \sum_{s + dr \cdot t = p} F(s)\, k(t) \tag{8}$$

The detailed structure of the other convolution modules and the ELCA module is described in [33].
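As a concrete illustration, the PyTorch sketch below mirrors the structure in Fig. 4: parallel dilated convolutions, a 1 × 1 fusion convolution, and an identity shortcut. The dilation rates, channel counts and class name are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DilationBlock(nn.Module):
    """Sketch of the dilation module: parallel dilated 3x3 convolutions,
    a 1x1 fusion convolution, and an identity shortcut (cf. Fig. 4)."""

    def __init__(self, channels, rates=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=r, dilation=r),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        # 1x1 convolution fuses the concatenated multi-scale features
        self.fuse = nn.Conv2d(channels * len(rates), channels, kernel_size=1)

    def forward(self, x):
        multi_scale = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.fuse(multi_scale) + x  # identity mapping of the original input
```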
3.4. The classification stage
As shown in Fig. 5, the diagnosis task is equivalent to an N-class image classification task on the training set $D^{train} = \{(z_j, y_j)\}_{j=1}^{N}$ and the test set $D^{test} = \{(z_j, y_j)\}_{j=1}^{N}$. The trained model is then tested on the new task $D^{test'}$. We feed the time–frequency images to the network to generate the BDC matrix $A_\theta(z_j)$. The prototype of the $k$-th category in the training set is the average of the BDC matrices belonging to this category, calculated as

$$P_k = \frac{1}{K} \sum_{(z_j, y_j) \in S_k} A_\theta(z_j) \tag{9}$$

where $S_k$ is the set of samples labeled with class $k$. We convert the similarities between a sample and the class prototypes of the training set into a class distribution with softmax, and then formulate the loss function as

$$\arg\min_{\theta} \; - \sum_{(z_j, y_j) \in D^{test'}} \log \frac{\exp\!\big(\tau \, \mathrm{tr}(A_\theta(z_j)^T P_{y_j})\big)}{\sum_{k} \exp\!\big(\tau \, \mathrm{tr}(A_\theta(z_j)^T P_k)\big)} \tag{10}$$

where $\tau$ is a learnable scaling parameter.
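A minimal PyTorch sketch of Eqs. (9) and (10) follows; the function name, tensor shapes and the use of cross-entropy on the scaled similarities are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def prototype_logits(query_bdc, support_bdc, support_labels, num_classes, tau):
    """Sketch of Eqs. (9)-(10): class prototypes are mean BDC matrices,
    and similarity is the trace inner product tr(A^T P).

    query_bdc      : (Q, d, d) BDC matrices of query samples.
    support_bdc    : (S, d, d) BDC matrices of support samples.
    support_labels : (S,) integer class labels.
    tau            : learnable temperature (scalar tensor).
    """
    prototypes = torch.stack([
        support_bdc[support_labels == k].mean(dim=0)   # Eq. (9)
        for k in range(num_classes)
    ])                                                 # (C, d, d)
    # tr(A^T P) equals the element-wise (Frobenius) inner product of the matrices
    similarities = torch.einsum('qij,cij->qc', query_bdc, prototypes)
    return tau * similarities

# usage (Eq. (10)): loss = F.cross_entropy(prototype_logits(...), query_labels)
```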
Based on the theory of Brownian distance covariance, let $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$ be random vectors with dimensions $p$ and $q$, respectively. Assuming $f_{XY}(x, y)$ is their joint probability density function, the joint characteristic function of $X$ and $Y$ can be defined as

$$\Phi_{XY}(t, s) = \int_{\mathbb{R}^p} \int_{\mathbb{R}^q} \exp\!\big(i(t^T x + s^T y)\big) f_{XY}(x, y)\, dx\, dy \tag{11}$$

where $i$ is the imaginary unit. Obviously, the marginal characteristic functions of $X$ and $Y$ are $\Phi_X(t) = \Phi_{XY}(t, 0)$ and $\Phi_Y(s) = \Phi_{XY}(0, s)$, where $0$ is the vector with all elements equal to zero.

$X$ and $Y$ are independent of each other if and only if $\Phi_{XY}(t, s) = \Phi_X(t)\Phi_Y(s)$. Assuming that $X$ and $Y$ have finite first-order moments, the Brownian distance covariance between them can be defined as

$$\rho(X, Y) = \int_{\mathbb{R}^p} \int_{\mathbb{R}^q} \frac{|\Phi_{XY}(t, s) - \Phi_X(t)\Phi_Y(s)|^2}{c_p c_q \|t\|^{1+p} \|s\|^{1+q}}\, dt\, ds \tag{12}$$

where $\|\cdot\|$ is the Euclidean norm, $c_p = \pi^{(1+p)/2}/\Gamma((1+p)/2)$, and $\Gamma$ is the gamma function; $\Phi_{XY}(t, s)$ is the joint characteristic function of $X$ and $Y$, and $\Phi_X(t)$, $\Phi_Y(s)$ are its marginals.

For a set of $m$ independent and identically distributed observations $\{(x_1, y_1), \dots, (x_m, y_m)\}$, the empirical characteristic function can be written as

$$\Phi_{XY}(t, s) = \frac{1}{m} \sum_{k=1}^{m} \exp\!\big(i(t^T x_k + s^T y_k)\big) \tag{13}$$
In the discrete case, let $\hat{A} = (\hat{a}_{kl}) \in \mathbb{R}^{m \times m}$, where $\hat{a}_{kl} = \|x_k - x_l\|$, be the Euclidean distance matrix computed from the observations of $X$. Similarly, we calculate the Euclidean distance matrix $\hat{B} = (\hat{b}_{kl}) \in \mathbb{R}^{m \times m}$, where $\hat{b}_{kl} = \|y_k - y_l\|$. The Brownian distance covariance can then be characterized in a relatively simple form as

$$\rho(X, Y) = \mathrm{tr}(A^T B) \tag{14}$$

where $\mathrm{tr}(\cdot)$ denotes the matrix trace, $T$ denotes the matrix transpose, and $A = (a_{kl})$ is the Brownian distance covariance matrix with

$$a_{kl} = \hat{a}_{kl} - \frac{1}{m} \sum_{k=1}^{m} \hat{a}_{kl} - \frac{1}{m} \sum_{l=1}^{m} \hat{a}_{kl} + \frac{1}{m^2} \sum_{k=1}^{m} \sum_{l=1}^{m} \hat{a}_{kl}$$

The last three terms are the mean of the $l$-th column, the mean of the $k$-th row, and the mean of all entries of $\hat{A}$, consistent with the centering in Eq. (17).
Fig. 4. The structure of the dilation module.
The matrix $B$ is obtained from $\hat{B}$ in the same way. Since the Brownian distance covariance matrix is symmetric, $\rho(X, Y)$ can be written as the inner product of the two vectorized Brownian distance covariance matrices $a$ and $b$:

$$\rho(X, Y) = \langle a, b \rangle = a^T b \tag{15}$$

Through the above analysis, the Brownian distance covariance matrix of each input image can be calculated independently. The module designed on the basis of the Brownian distance covariance matrix is shown in Fig. 6. Specifically, the two-layer structure of the module first reduces the feature size and then calculates the Brownian distance covariance matrix.
Suppose we embed a color image $z$ into the feature space and obtain a tensor of size $h \times w \times d$, where $h$ and $w$ denote the height and width and $d$ is the number of channels. We reshape the tensor into a matrix $X \in \mathbb{R}^{hw \times d}$, and each column $\chi_k \in \mathbb{R}^{hw}$ (or each row $x_j \in \mathbb{R}^{d}$ after transposing) can be viewed as an observation of the random vector $X$.

In the following, we take the column observations $\chi_k$ as an example. First, calculate the squared Euclidean distance matrix $\tilde{A} = (\tilde{a}_{kl})$, where $\tilde{a}_{kl}$ is the squared Euclidean distance between the $k$-th and $l$-th columns of $X$:

$$\tilde{A} = 2\big(\mathbf{1}(X^T X \circ I)\big)_{sym} - 2X^T X \tag{16}$$

where $\mathbf{1} \in \mathbb{R}^{d \times d}$ is the matrix whose elements are all one, $I$ is the identity matrix, and $\circ$ denotes the Hadamard product.

Then, the Euclidean distance matrix $\hat{A} = (\sqrt{\tilde{a}_{kl}})$ is obtained by taking the element-wise square root. Finally, the BDC matrix $A$ is obtained by subtracting the row mean and the column mean from $\hat{A}$ and adding back the mean of all elements:

$$A = \hat{A} - \frac{2}{d}(\mathbf{1}\hat{A})_{sym} + \frac{1}{d^2}\, \mathbf{1}\hat{A}\mathbf{1} \tag{17}$$

Fig. 5. The principle of the classification stage.
Fig. 6. The calculation process based on BDC.
It is worth noting that the BDC matrix can be regarded as a nonparametric, modular pooling layer. The above equations show that the BDC matrix, built on Euclidean distances, can model the nonlinear relationships between channels. Compared with the covariance matrix, which can only capture linear relationships and only considers the marginals, the BDC matrix comprehensively considers the joint distribution and therefore has more advantages in the few-shot classification task.
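To make the pooling step concrete, here is a minimal PyTorch sketch of Eqs. (16) and (17); it computes the pairwise squared distances directly from the Gram matrix (equivalent to Eq. (16)), and the function name and input layout are illustrative assumptions.

```python
import torch

def bdc_matrix(feature_map):
    """Sketch of the BDC pooling step, Eqs. (16)-(17).

    feature_map : tensor of shape (h, w, d) from the embedding network.
    Returns the d x d Brownian distance covariance matrix used as the
    image representation.
    """
    h, w, d = feature_map.shape
    X = feature_map.reshape(h * w, d)            # columns are channel observations
    gram = X.t() @ X                             # X^T X, shape (d, d)
    diag = torch.diagonal(gram).unsqueeze(0)     # squared norms of the columns
    sq_dist = diag + diag.t() - 2.0 * gram       # squared Euclidean distances, Eq. (16)
    a_hat = torch.sqrt(sq_dist.clamp(min=0.0))   # Euclidean distance matrix
    # double centering: subtract row/column means, add back the grand mean, Eq. (17)
    row_mean = a_hat.mean(dim=1, keepdim=True)
    col_mean = a_hat.mean(dim=0, keepdim=True)
    return a_hat - row_mean - col_mean + a_hat.mean()
```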
4. Case verification
In this section, the performance of the BCMPN is verified on the UConn data set and the laboratory data set. Combined with actual application requirements, we carry out experiments under the FST-ZST setting to test the performance of the method. The comparison models include advanced transfer learning, meta-learning and adversarial learning methods. The structures and parameters of the comparison methods are as follows.
MSSA [14]: MSSA is a multi-source subdomain adaptation transfer learning method, which can transfer diagnostic knowledge from multiple sources. The learning rate is 0.01, and one training run includes 10 epochs.
MSTLN [18]: MSTLN is a multi-source transfer learning network, which can transfer diagnostic knowledge from multiple source machines. The learning rate is 0.0005 and the mini-batch size is 64.
TCNN [20]: TCNN is a transferable CNN, which can promote the learning ability on target tasks. The learning rate is 0.01 and the momentum is 0.97 with 100 epochs.
FRAN [21]: FRAN is an unsupervised domain adaptation method, which can relieve domain shift. The learning rate is 1e-5 and the mini-batch size is 64.
EAPN [33]: This paper builds on the diagnosis method EAPN, which considers the problem of small samples and noise. The learning rate is selected as 0.001.
The computer configuration for all experiments in this paper is as follows: an AMD Ryzen 5 4600U CPU with Radeon Graphics @ 2.10 GHz and 16 GB of memory, an Intel(R) Xeon(R) CPU @ 2.30 GHz with 12 GB of memory and a Tesla K80 GPU, and Python 3.7.
In this paper, the evaluation index used to measure the performance of different models is accuracy, which is defined as

$$ACC = \frac{TP + TN}{TP + TN + FP + FN} \tag{18}$$

where TP is the number of correctly classified positive cases, FP is the number of incorrectly classified positive cases, FN is the number of incorrectly classified negative cases, and TN is the number of correctly classified negative cases.
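For completeness, a small Python helper corresponding to Eq. (18) might look as follows; treating label 1 as the positive class is an illustrative assumption.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Eq. (18): accuracy computed from confusion-matrix counts (positive class = 1)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return (tp + tn) / (tp + tn + fp + fn)
```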
4.1. Experimental Dataset description
Five datasets are used in this paper: UConnA, LabE, LabF, LabG and LabH. We assume that the fault types to be classified in the target domain are the same as those in the source domain.
4.1.1. Case 1: UConn data set
The UConn data set consists of gearbox vibration signals collected by researchers using the dSPACE system at a sampling frequency of 20 kHz. The structure of the gearbox used for signal acquisition is shown in Fig. 7 [42]; a variety of different gear conditions are introduced into the pinion on the input shaft. Based on the experimental requirements, three kinds of data, including healthy, root crack and spalling, are selected for the experiment. The number of sampling points refers to the number of data points acquired by the vibration sensor for each sample, that is, how many points a time-domain waveform consists of. Since the number of sampling points directly affects the frequency resolution, we build 30 samples for each class, and each sample contains 1024 sampling points. It is worth noting that this number of sampling points fully covers several sampling cycles.
Fig. 7. The gearbox signal acquisition system and 4 kinds of gear health status.
4.1.2. Case 2: Laboratory data set
This paper also introduces a gear data set collected under real conditions. As shown in Fig. 8, acceleration sensors are installed on the input-shaft motor side, the gearbox housing and the output-shaft motor side. Gear data of 7 different states are collected at a sampling frequency of 5120 Hz, including healthy, root cracks of 1 mm, 2 mm and 3 mm, and tooth surface spalling of 1B, 2B and 3B. The specific details are shown in Fig. 9. In order to highlight the advantages of this method in identifying weak faults with a low damage degree, this paper uses the healthy, 1 mm root crack and 1B tooth surface spalling states, which are listed in Table 1. The rotating speed is selected as 1500 rpm and 600 rpm, and the load is set as 0 HP, 2 HP and 4 HP.
4.2. Model structure and parameter selection
In the parameter selection stage, we test the various parameters with input images of size 250 × 250.
From Fig. 10(a), the highest accuracy of 93.71% is obtained when r = 0.5. In the cross-machine task, the highest accuracy of 87.39% is obtained when r = 0.6. These results suggest that a larger r should be chosen for more difficult tasks. A plausible explanation is that, to perform well on a more difficult task, the model should retain the complex classification features at a deeper level while removing the simple classification information extracted at a shallow level. In summary, the experimental results are consistent with intuition.
Fig. 10(b) shows the comparison of the model's diagnostic performance over different ranges of d. We find that when the range of d is too small, the diagnostic performance is relatively low. As the range increases, the diagnostic accuracy shows an upward trend. It can be seen that the choice of range affects the diagnosis process, and enlarging the range of d can clearly enhance the diagnostic performance of the network.
In order to illustrate the influence of the proposed modules at different layers of the network on the diagnostic accuracy, as well as the influence of the convolution kernel size and dilation rate on the classification ability of the model, we carry out experiments based on the cross-condition diagnosis task LabE-LabF; the results are shown in Table 2.
From the results in Table 2, it can be observed that the BCMPN1 configuration achieves the highest accuracy of 93.71%. We believe this is because shallow data features are relatively sparse, so the multi-scale spatial correlations of the features can be captured by the dilation module. As the network deepens, the ELCA module aims to extract high-level abstract features, that is, combinations of shallow features and other information. If multi-scale information and dense features are combined by the dilation module in the deep layers of the network, it will inevitably confuse the information flow and weaken the diagnostic performance of the network.
4.3. Method comparison
4.3.1. Cross-condition diagnostic experiment
To verify the superiority of the proposed method, we compare it with authoritative methods based on transfer learning and domain adversarial learning from recent years. The comparison results of cross-condition diagnosis are shown in Fig. 11.
From Fig. 11, the average diagnostic accuracy of our method under the six experimental conditions is 93.05%. Under the same conditions, the accuracy of our method is higher than that of all comparison methods, which shows that the proposed method has clear advantages in classifying the same faults under complex conditions. It is worth noting that under the LabF-LabG experimental condition, the accuracy is lower than the average. This indicates that the diagnostic accuracy of the model decreases when the high-speed condition is the source domain and the low-speed condition is the target domain. We believe the model faces the problem of untrained low-impact samples in the source domain, that is, zero samples in the cross-condition diagnosis task. The likely reason is that the fault impact is strong at high speed and those samples contain more intense fault information. Therefore, when the cross-domain task goes from the high-speed condition to the low-speed condition, the zero-sample problem in the target domain greatly affects the performance of the model. The comparison methods can achieve good cross-machine diagnosis through domain adaptation and related techniques. However, on the data sets of this paper, different working conditions and multiple fault degrees lead to more complex diagnostic tasks. Under these more complex fault tasks, our method is ahead of all comparison methods. The above analysis and results also illustrate the superiority of the method in this paper under the target-domain zero-sample problem.
4.3.2. Cross-device diagnostic experiment
It can be seen from Fig. 12 that the average accuracy of the proposed method in cross-device diagnosis reaches 82.32%, which is better than the other methods. In addition, when the UConn data set is the source domain and the Lab data set is the target domain, the accuracy of all models is higher than in the reverse direction. A possible reason is that the UConn data set, as a standard data set collected under good laboratory conditions, has more representative gear fault features, so the model can learn more target information from it and thus reaches a higher accuracy in cross-device diagnosis tasks. According to the time–frequency diagrams in Fig. 9, a certain amount of noise was introduced during data acquisition in our laboratory, which produces the yellow impact responses outside the fault frequency regions in the figure. Therefore, the fault information in these data that the model can learn is relatively limited. The extreme FST-ZST problem leads to a decline in the performance of every model in cross-device diagnosis.
Fig. 8. The schematic diagram of the experiment table structure and sensor placement location.
4.4. Ablation experiments
4.4.1. Importance exploration of MMP
To highlight the effectiveness of the proposed MMP stage, we compare it with two image enhancement methods from the image processing field, Cutout [43] and hide-and-seek (HaS) [44]. The specific results are shown in Fig. 13.
From the results, the MMP proposed in this paper yields the best gain for the model. To further show the gain brought by MMP, we compare the number of training iterations of the network on the LabE-LabF and LabA-LabE tasks, respectively.
Fig. 9. Seven kinds of gear health status based on actual laboratory data set.

Table 1
Different conditions of the gearing on different datasets.

Gearing status   Fault degree   Speed (rpm)   Number of samples   Load    Dataset
Root crack       1 mm           600           30                  2 HP    LabE
Spalling         1B             600           30                  2 HP    LabE
Healthy          –              600           30                  2 HP    LabE
Root crack       1 mm           1500          30                  2 HP    LabF
Spalling         1B             1500          30                  2 HP    LabF
Healthy          –              1500          30                  2 HP    LabF
Root crack       1 mm           600           30                  4 HP    LabG
Spalling         1B             600           30                  4 HP    LabG
Healthy          –              600           30                  4 HP    LabG
Root crack       1 mm           1500          30                  4 HP    LabH
Spalling         1B             1500          30                  4 HP    LabH
Healthy          –              1500          30                  4 HP    LabH

Fig. 10. The selection of parameters based on MMP.

As can be seen from Fig. 14, on the two tasks of LabE-LabF and LabA-LabE, the curve with the MMP method converges faster, reaching stability within 55 and 70 training epochs respectively. In contrast, the numbers of epochs to stability without the MMP method are 70 and 80, respectively. This is mainly because MMP establishes a good optimization starting point in the parameter space, which effectively reduces the number of learning iterations for cross-domain diagnosis and enhances the generalization performance of the model.
4.4.2. Exploration of different metric distances
In Fig. 15, we compare the diagnostic effect of the inner-product-based metric used in this paper with the cosine distance and the Euclidean distance traditionally used in metric learning. Our method achieves 80.38% and 79.05% diagnostic accuracy even on the LabE-UConnA and LabF-UConnA tasks, on which the cosine distance and the Euclidean distance perform worst. This demonstrates that the inner-product-based BDC module developed in this paper has better performance in cross-working-condition and cross-device diagnosis tasks.
Table 2
Fault diagnosis results of different module orders.

          BCMPN 1           BCMPN 2           BCMPN 3           BCMPN 4
          Dilation  ELCA    Dilation  ELCA    Dilation  ELCA    Dilation  ELCA
Stage 1   √         –       √         –       –         –       √         –
Stage 2   √         –       √         –       –         –       –         –
Stage 3   –         √       –         –       √         –       √         –
Stage 4   –         –       –         √       √         –       –         –
Stage 5   –         –       –         –       –         √       –         –
Stage 6   –         –       –         –       –         –       –         √
Accuracy  93.71%            93.16%            90.32%            91.47%
Fig. 11. The result of cross-condition diagnostic experiment. Accuracy (%):

        LabE-LabF  LabE-LabG  LabE-LabH  LabF-LabG  LabF-LabH  LabG-LabH
Ours    93.71      95.07      90.25      89.71      95.37      94.20
EAPN    80.49      81.37      75.98      76.32      82.10      84.74
TCNN    77.42      80.33      72.97      78.43      79.35      75.44
FRAN    79.14      80.10      76.94      74.31      79.08      80.44
MSSA    83.41      83.99      77.36      76.14      82.33      81.79
MSTLN   82.74      83.29      79.24      76.91      83.10      80.23
Fig. 12. The result of cross-device diagnostic experiment. Accuracy (%):

        UConnA-LabE  UConnA-LabF  UConnA-LabG  UConnA-LabH  LabE-UConnA  LabF-UConnA  LabG-UConnA  LabH-UConnA
Ours    87.39        87.14        83.30        89.37        80.38        79.05        78.10        72.82
EAPN    77.32        74.54        73.10        75.10        67.38        65.15        66.40        63.19
TCNN    60.17        62.04        58.10        63.47        55.32        53.01        52.35        50.37
FRAN    72.09        68.47        67.03        73.24        65.97        64.19        61.70        58.12
MSSA    75.53        74.26        73.49        78.91        70.30        66.74        64.12        60.77
MSTLN   74.19        73.42        70.34        77.04        68.13        65.32        64.18        60.10
5. Conclusion
This paper presents a novel fault diagnosis method for gear systems based on the Brownian correlation metric prototypical network. The core idea is to exploit the knowledge in limited source-domain samples to improve performance when the target domain contains zero samples. The BCMPN is an improved version of our previously published EAPN, in which the MMP technique is developed to establish better generalization performance from a global and local perspective. Meanwhile, the integrated MSFE module and ELCA module are designed to extract representative features from limited samples. In the classification stage, we introduce the Brownian distance covariance into the cross-domain fault diagnosis field for the first time, comprehensively consider the joint distribution, and complete the fault classification by calculating an inner product. Compared with advanced methods from recent years, our method has better performance. The current diagnosis scheme still leaves room for improvement in computing time. In future work, the model structure and the distance measurement mechanism will be improved to reduce training and computing time, and unsupervised learning methods will be explored so that the approach can be better applied to other data-driven tasks.
Fig. 13. Result of different image enhancement methods.
Fig. 14. Result of different image enhancement methods.
Fig. 15. Result of different methods based on different metric distances.
Declaration of Competing Interest
The authors declare that they have no known competing financial
interests or personal relationships that could have appeared to influence
the work reported in this paper.
Data availability
The authors do not have permission to share data.
Acknowledgment
This work was supported by the Natural Science Foundation of Heilongjiang Province of China (Grant No. LH2021F021).
References
[1] S. Liu, H. Jiang, Y. Wang, K. Zhu, C. Liu, A deep feature alignment adaptation
network for rolling bearing intelligent fault diagnosis, Adv. Eng. Inform. 52 (2022),
101598.
[2] Z. Huang, et al., A Multisource Dense Adaptation Adversarial Network for Fault
Diagnosis of Machinery, IEEE Trans. Ind. Electron. 69 (6) (2022) 6298–6307.
[3] Y. Wang, B. Qin, K. Liu, M. Shen, M. Niu, L. Han, A New Multitask Learning Method
for Tool Wear Condition and Part Surface Quality Prediction, IEEE Trans. Ind. Inf.
17 (9) (2020) 6023–6033.
[4] P. Xia, Y. Huang, P. Li, C. Liu, L. Shi, Fault Knowledge Transfer Assisted Ensemble
Method for Remaining Useful Life Prediction, IEEE Trans. Ind. Inf. 18 (3) (2021)
1758–1769.
[5] Z. Wang, J. Xuan, Intelligent fault recognition framework by using deep
reinforcement learning with one dimension convolution and improved actor-critic
algorithm, Adv. Eng. Inform. 49 (2021), 101315.
[6] X. Gu, Y. Zhao, G. Yang, L. Li, An Imbalance Modified Convolutional Neural
Network With Incremental Learning for Chemical Fault Diagnosis, IEEE Trans. Ind.
Inf. 18 (6) (2021) 3630–3639.
[7] X. Wang, L. Luo, L. Tang, Z. Yang, Automatic representation and detection of fault
bearings in in-wheel motors under variable load conditions, Adv. Eng. Inform. 49
(2021), 101321.
[8] C. Wang, Z. Xu, An intelligent fault diagnosis model based on deep neural network
for few-shot fault diagnosis, Neurocomputing 456 (2021) 550–562.
[9] S. Zhang, Y. Li, W. Cui, R. Yang, J. Dong, J. Hu, Limited data rolling bearing fault diagnosis with few-shot learning, IEEE Access 7 (2019) 110895–110904.
[10] S. Li, A. Li, Q. Zhang, Z. He, J. Liao, J. Hu, Meta-learning for few-shot bearing fault diagnosis under complex working conditions, Neurocomputing 439 (2021) 197–211.
[11] J. Fan, X. Yuan, Z. Miao, Z. Sun, X. Mei, F. Zhou, Full Attention Wasserstein GAN
With Gradient Normalization for Fault Diagnosis Under Imbalanced Data, IEEE
Trans. Ins. Mea 71 (2022) 1–16.
[12] Y. Zhou, Y. Ning, et al., Deep Dynamic Adaptive Transfer Network for Rolling
Bearing Fault Diagnosis with Considering Cross-machine Instance, IEEE Trans. Ins.
Mea 70 (2021).
[13] Y. Zou, K. Shi, Y. Liu, G. Ding, K. Ding, Rolling bearing transfer fault diagnosis
method based on adversarial variational autoencoder network, Meas. Sci. Technol.
32 (11) (2021), 115017.
[14] J. Tian, D. Han, M. Li, P. Shi, A multi-source information transfer learning method
with subdomain adaptation for cross-domain fault diagnosis, Knowl.-Based Syst.
243 (2020), 108466.
[15] J. Wang, S. Ji, B. Han, H. Bao, X. Jiang, Deep Adaptive Adversarial Network-Based
Method for Mechanical Fault Diagnosis under Different Working Conditions,
Complexity 2020 (2020) 6946702.
[16] L. Wan, Y. Li, K. Chen, K. Gong, C. Li, A novel deep convolution multi-adversarial
domain adaptation model for rolling bearing fault diagnosis, Measurement 191
(2022), 110752.
[17] L. Chen, Q. Li, C. Shen, J. Zhu, D. Wang, M. Xia, Adversarial Domain-Invariant
Generalization: A Generic Domain-Regressive Framework for Bearing Fault
Diagnosis Under Unseen Conditions, IEEE Trans. Ind. Inf. 18 (3) (2021)
1790–1800.
[18] B. Yang, S. Xu, Y. Lei, C. Lee, S. Edward, R. Clive, Multi-source transfer learning
network to complement knowledge for intelligent diagnosis of machines with
unseen faults, Mech. Syst. Signal Process. 162 (2022), 108095.
[19] S. Liu, H. Wang, J. Tang, X. Zhang, Research on fault diagnosis of gas turbine rotor
based on adversarial discriminative domain adaption transfer learning,
Measurement 196 (2022), 111174.
[20] Z. Chen, K. Gryllias, W. Li, Intelligent Fault Diagnosis for Rotary Machinery Using
Transferable Convolutional Neural Network, IEEE Trans. Ind. Inf. 16 (1) (2020)
339–349.
[21] J. Chen, J. Wang, J. Zhu, T.H. Lee, C.W. Silva, Unsupervised Cross-Domain Fault
Diagnosis Using Feature Representation Alignment Networks for Rotating
Machinery, IEEE ASME Trans. Mechatron. 26 (5) (2020) 2770–2781.
[22] F. Zhou, S. Yang, H. Fujita, D. Chen, C. Wen, Deep learning fault diagnosis method
based on global optimization GAN for unbalanced data, Knowl.-Based Syst. 187
(2020), 104837.
[23] C. Zheng, C. Zhao, Fault-Prototypical Adapted Network for Cross-Domain
Industrial Intelligent Diagnosis, IEEE Trans. Autom. Sci. Eng. 1–10 (2021).
[24] C. Jiang, H. Chen, Q. Xu, X. Wang, Few-shot fault diagnosis of rotating machinery
with two-branch prototypical networks, J. Intell. Manuf. (2022).
[25] C. Qian, Q. Jiang, Y. Shen, C. Huo, Q. Zhang, An intelligent fault diagnosis method
for rolling bearings based on feature transfer with improved DenseNet and joint
distribution adaptation, Meas. Sci. Technol. 33 (2) (2022), 025101.
[26] T.M. Cover, J.A. Thomas, Elements of information theory (2005) 463–508, https://
doi.org/10.1002/047174882X.ch14.
[27] C. Zhang, Y. Cai, G. Lin, C. Shen, DeepEMD: Few-shot image classification with
differentiable earth mover’s distance and structured classifiers, (2020) https://doi.
org/10.48550/arXiv.2003.06777.
[28] D. Wertheimer, L. Tang, B. Hariharan, Few-Shot Classification with Feature Map
Reconstruction Networks, (2021) https://doi.org/10.48550/arXiv.2012.01506.
[29] C.M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
[30] MI. Belghazi, A. Baratin, S. Rajeswar, S. Ozair, Y. Bengio, A. Courville, RD. Hjelm,
MINE: Mutual Information Neural Estimation. (2018) https://doi.org/10.48550/
arXiv.1801.04062.
[31] G.J. Székely, M.L. Rizzo, Brownian distance covariance, Ann. Appl. Stat. 3 (4) (2009) 1236–1265.
[32] J. Xie, F. Long, J. Lv, Q. Wang, P. Li, Joint Distribution Matters: Deep Brownian
Distance Covariance for Few-Shot Classification, (2022) https://doi.org/
10.48550/arXiv.2204.04567.
[33] C. Wang, H. Sun, X. Cao, Construction of the efficient attention prototypical net
based on the time-frequency characterization of vibration signals under noisy small
sample, Measurement 179 (2021), 109412.
[34] Z. Yan, M. Ayaho, Z. Jiang, X. Liu, An overall theoretical description of frequency
slice wavelet transform, Mech. Syst. Signal Process. 24 (2) (2010) 491–507.
[35] H. Sun, X. Cao, C. Dong, S. Gao, An interpretable anti-noise network for rolling
bearing fault diagnosis based on FSWT, Measurement 190 (2022), 110698.
[36] J. Joshua, G. Ashok, Perceptually grounded self-diagnosis and self-repair of domain
knowledge, Knowl.-Based Syst. 27 (2012) 281–301.
[37] E. Belouadah, A. Popescu, I. Kanellos, A Comprehensive Study of Class Incremental
Learning Algorithms for Visual Tasks, (2020) https://doi.org/10.48550/
arXiv.2011.01844.
[38] M. Lu, H. Liu, X. Yuan, Thermal fault diagnosis of electrical equipment in
substations based on image fusion, Trait. Signal. 38 (4) (2021) 1095–1102.
[39] P. Chen, S. Liu, H. Zhao, J. Jia, GridMask Data Augmentation (2020), https://doi.
org/10.48550/arXiv.2001.04086.
[40] T. Yao, X. Yi, DZ. Cheng, F. Yu, T. Chen, A. Menon, L. Hong, EH. Chi, S. Tjoa, J.
Kang, Self-supervised Learning for Large-scale Item Recommendations, (2020)
https://doi.org/10.48550/arXiv.2007.12865.
[41] F. Yu, V. Koltun, Multi-Scale Context Aggregation by Dilated Convolutions, ICLR.
(2016), https://doi.org/10.48550/arXiv.1511.07122.
[42] P. Cao, S. Zhang, J. Tang, Preprocessing-Free Gear Fault Diagnosis Using Small
Datasets With Deep Convolutional Neural Network-Based Transfer Learning, IEEE
Access 6 (2018) 26241–26253.
[43] T. DeVries, G. W. Taylor, Improved regularization of convolutional neural
networks with cutout, (2017) https://doi.org/10.48550/arXiv.1708.04552.
[44] K.K. Singh, H. Yu, A. Sarmasi, G. Pradeep, Y.J. Lee, Hide-and-seek: A data augmentation technique for weakly-supervised localization and beyond, (2018) https://doi.org/10.48550/arXiv.1811.02545.
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaOmar Fathy
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VDineshKumar4165
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdfKamal Acharya
 
2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projectssmsksolar
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.Kamal Acharya
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesRAJNEESHKUMAR341697
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Servicemeghakumariji156
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 

Recently uploaded (20)

Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planes
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 

1-s2.0-S1474034622002737-main.pdf

adversarial channel compression variational autoencoder (WACCVAE). The interference of redundant features is reduced by compressing the channels, and a between-class and within-class distance constraint is imposed on the model to enhance the distribution alignment of same-class samples from different domains. Tian et al. [14] took multiple source domains as the diagnostic knowledge and developed a multi-branch network structure to match the feature-space distribution; the local maximum mean discrepancy was introduced to correct the distribution difference between subdomains, and multiple source classifiers were applied to diagnose the equipment status. To learn domain-invariant features, Wang et al. [15] designed a deep adaptive adversarial network (DAAN), including a condition recognition module and a domain adversarial learning module, to automatically extract features and classify the health status under different working conditions. Wan et al. [16] constructed a new deep convolution multi-adversarial domain adaptation network (DCMADAN).
The constructed domain adaptive module combines a multi-kernel maximum mean discrepancy (MK-MMD) with a multi-domain discriminator to align the marginal and conditional distributions. Chen et al. [17] developed an adversarial domain-invariant generalization (ADIG) fault diagnosis framework based on adversarial learning: after integrating data from multiple domains, adversarial learning between the feature extraction module and the domain classification module is used to realize high-precision cross-domain diagnosis. However, readjusting the model incurs a high time cost whenever the above methods face new diagnostic tasks.

To alleviate the problem of differences in the distribution of data features, some researchers have combined transfer learning and deep learning for fault diagnosis in recent years. Based on transfer learning, Yang et al. [18] constructed a balance factor to weight the target samples, so that the adaptive subnetwork can jointly adapt to the partial distributions of the source domain and the target domain; in addition, they used a multi-source diagnostic knowledge fusion module to integrate multiple diagnostic decisions. Liu et al. [19] transferred a deep CNN pre-trained in the source domain to the target domain, and then applied deep adversarial training between the two domains to optimize the parameters and reduce the domain offset. Chen et al. [20] pretrained a one-dimensional CNN on a large source data set, and the excellent transferability of the proposed transferable convolutional neural network (TCNN) was verified on three test rigs. However, the high performance of the above transfer-learning-based diagnosis methods is premised on the source domain containing a large amount of effective training data [21]. As we all know, the data that can be obtained in many actual scenarios are limited.

At present, much attention is paid to the feature embedding and classification stages of the model, while the importance of the input preprocessing stage is ignored. We believe that preprocessing based on data enhancement is an important stage in gaining the internal feature representation of different categories. Therefore, a complete and effective preprocessing mechanism can yield a large gain in diagnostic performance across working conditions and even across equipment [21]. Fortunately, one-dimensional time-domain vibration signals can be effectively converted into two-dimensional time-frequency images, and a CNN can then be utilized to extract critical features from the high-dimensional data in the image [22]. From a statistical point of view, the essence of fault diagnosis after transforming signals into images is equivalent to embedding the random vectors expressing fault features into a high-dimensional space and measuring their correlation [8]. Among the diagnosis methods based on a correlation metric, those based on few-shot learning gradually show superior performance [23,24]. However, most current methods use the Euclidean distance or the cosine distance for the distance measurement; they only consider the marginal distributions of different sample features and ignore their joint distribution, which limits the diagnostic ability of the model [25]. It follows that the dependence between two images should be measured according to their joint distribution [26].
Although the earth mover's distance (EMD), which seeks the optimal joint distribution, is an effective way to measure this dependence [27], its computational cost is very high [28]. Mutual information (MI) [29] can quantify the dependence of two random variables through the Kullback-Leibler (KL) divergence between the joint distribution and the product of the marginals, but MI is very difficult to estimate in real-valued, high-dimensional settings [30]. The Brownian distance covariance (BDC) measure [31] is defined through the Euclidean-norm-weighted difference between the joint characteristic function and the product of the marginals. It naturally quantifies the dependency between two random variables, accepts a feature map as input, and outputs a BDC matrix as the image representation. In this way, the calculation of the similarity between two images reduces to the inner product of the two corresponding BDC matrices. In this work, we introduce the BDC measure [32] into the field of rotating machinery fault diagnosis for the first time. It is worth noting that our preliminary work is introduced in [33]. Compared with that version, this article makes the following innovations and extensions for the more difficult FST-ZST task.

(1) To improve the generalization performance and the optimization starting point of the model, we propose a data augmentation preprocessing mechanism based on a multi-scale mask. Based on the MMP, a sub-distribution of the original sample distribution can be obtained, which improves the initial performance and the generalization ability of the model when it is applied to new working conditions and new equipment.

(2) For the first time, we introduce the Brownian distance into the field of fault diagnosis and construct a Brownian correlation metric prototypical network. In the classification stage, fault recognition is carried out by measuring the difference between the joint distribution of the embedded features and the product of their marginals. The developed method breaks the monopoly of the Euclidean distance and the cosine distance in current metric learning methods and provides researchers with a new idea for the distance metric.

(3) To better realize fault diagnosis under the FST-ZST problem, we develop a powerful intelligent fault diagnosis scheme based on the BCMPN. When facing diagnosis tasks under different working conditions or on different equipment, this method realizes high-precision fault diagnosis under FST-ZST.

(4) Based on a standard data set and an actual laboratory data set, we conduct ablation experiments to verify the effectiveness of each innovation. Moreover, we compare with recent advanced diagnostic methods to verify the superiority of the proposed method across working conditions and across machines.

The structure of this article is arranged as follows. Section 1 is the introduction. Section 2 describes the problem from a mathematical point of view. Section 3 gives a detailed description of the proposed method. Section 4 presents and discusses the experimental results in depth. Section 5 gives a summary and future work.

2. Problem description

First, we are given the source domain $D_S = \{(x_s^{(i)}, y_s^{(i)})\}_{i=1}^{N}$, which includes $N$ labeled images $G \in \mathbb{R}^{h \times w \times 3}$ from the source input space $\{X_S, Y_S\}$. In addition, we extract the class-independent feature map $m = f(I)$, $m \in \mathbb{R}^{\frac{h}{r} \times \frac{w}{r} \times c}$.
The classifier can recognize the $l$ types of samples and processes the feature map with a convolution kernel $c_l \in \mathbb{R}^{1 \times 1 \times c}$, where $r$ is the output stride and $c$ is the number of feature channels. Here $x_s^{(i)} \in X_S$ is a sample drawn from the source input space with probability distribution $p_s(X_S)$, and $y_s^{(i)} \in Y_S$ is the corresponding label identifying the fault type. In the source domain, we define the fault set under different working conditions and different equipment as $U$; in few-shot tasks these fault instances $U$ are so rare that they cannot support a traditional classifier. We record the fault samples in these source domains as $M = \{c_i^m \mid i = 1, \dots, N_m\}$, $M \in S$. The target domain is given as $D_T = \{x_t^{(j)}\}_{j=1}^{M}$, which includes $M$ unlabeled images from the target domain space $\{X_T, Y_T\}$, with probability distribution $p_t(X_T)$. Generally, due to the domain offset, the source and target domains differ, so their probability distributions are unequal, that is, $p_s(X_S) \neq p_t(X_T)$. The goal of cross-domain fault diagnosis is to learn a general model that classifies the unlabeled target domain well with the help of the labeled source domain. Its mathematical principle is as follows:

$$\min \|y_t - \hat{y}_t\| \tag{1}$$

where $\hat{y}_t$ refers to the predicted fault type for an input in the target domain.
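To make the few-shot, episode-style training setup concrete, the following minimal sketch (our own illustration, not code released with the paper) samples an N-way K-shot support set and a query set from a labeled source-domain pool; the array names and default sizes are assumptions for illustration only.

```python
import numpy as np

def sample_episode(features, labels, n_way=3, k_shot=5, n_query=10, rng=None):
    """Sample an N-way K-shot episode (support + query) from a labeled source pool.

    features: (num_samples, ...) array of preprocessed time-frequency images
    labels:   (num_samples,) integer fault labels
    """
    rng = np.random.default_rng() if rng is None else rng
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support_x, support_y, query_x, query_y = [], [], [], []
    for episode_label, cls in enumerate(classes):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        support_x.append(features[idx[:k_shot]])
        query_x.append(features[idx[k_shot:k_shot + n_query]])
        support_y += [episode_label] * k_shot
        query_y += [episode_label] * len(idx[k_shot:k_shot + n_query])
    return (np.concatenate(support_x), np.array(support_y),
            np.concatenate(query_x), np.array(query_y))
```

Under the FST-ZST setting, such episodes are drawn only from the labeled source domain; the target domain contributes no samples to training.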
3. Methodology

3.1. The framework of the BCMPN

The overall diagnosis method is shown in Fig. 1 and mainly includes five stages:

(1) Sample acquisition and data transformation based on the frequency slice wavelet transform (FSWT) [34]. Our previous work [33,35] explained that the FSWT improves the time-frequency description of the samples and thereby gives full play to the advantages of the CNN on images;
(2) Enhanced preprocessing based on the multi-scale mask;
(3) The enhanced samples and the original samples are fused as the source domain, and the data under different working conditions and from different equipment are taken as the target domain. The model is trained on the source domain in the feature embedding stage, which is composed of the ELCA module, the Convolution block and the Dilation block;
(4) In the BDC classification stage, a distance metric based on class prototypes and softmax is constructed to identify the different fault types, and the trained model is saved;
(5) The unknown categories in the target domain are tested to obtain the cross-domain diagnostic results.

The Convolution block established in this paper consists of three parts: a 3 × 3 convolution, a BN layer and the ReLU activation function. The detailed structure and parameters of the ELCA module are given in [33].

3.2. The global-local data fusion preprocessing mechanism based on multi-scale mask

Human cognitive learning is a gradual, multi-scale development process [36]; humans can often complete simple tasks easily after learning more complex knowledge [37]. Inspired by the human cognitive learning process, the generalization performance can be boosted by increasing the difficulty of the training tasks while retaining the important classification information [38,39]. The principle of the mask mechanism is shown in Fig. 2. Taking a gear time-frequency image as an example, the noise in the original input image is masked in a certain proportion to achieve local enhancement of the critical pixels.

To improve the ability to capture sensitive features in the target domain, we develop a multi-scale mask global-local data fusion preprocessing mechanism; the specific process is described in Fig. 3. The training samples are masked at three pixel scales to achieve multi-scale local feature enhancement, and the original input image is also included to realize global data fusion.

To help readers understand this process, a mathematical description is given as follows. Local feature enhancement is completed by evenly removing areas of the image:

$$\hat{G} = G \times M \tag{2}$$

where $G \in \mathbb{R}^{h \times w \times c}$ is the input image, $M \in \{0, 1\}^{h \times w}$ is the binary mask of the pixels to be removed, and $\hat{G} \in \mathbb{R}^{h \times w \times c}$ is the enhanced result. For the binary mask $M$, if $M_{i,j} = 1$, the pixel $(i, j)$ of the input image is retained; otherwise it is masked. As shown in Fig. 3, we use $(r, d, x, y)$ to parameterize $M$: $r$ is the mask retention ratio, $d$ is the length of one unit, and $x$ and $y$ are the distances between the first complete unit and the image boundary. The value of $r$ determines the retention ratio of the input image. We define the retention ratio $p$ of a given mask $M$ as:

$$p = \frac{\mathrm{sum}(M)}{H \times W} \tag{3}$$

Too large a retention ratio may lead to overfitting, while too small a retention ratio may lead to underfitting due to the loss of too much information.
The relationship between $r$ and $p$ can be expressed as:

$$p = 1 - (1 - r)^2 = 2r - r^2 \tag{4}$$

The value of $d$ determines the size of a removed area. When $r$ is fixed, the relationship between the removed side length $l$ and $d$ is:

$$l = r \times d \tag{5}$$

Fig. 1. Framework of the proposed method.
The unit length $d$ is chosen randomly from a range:

$$d = \mathrm{random}(d_{\min}, d_{\max}) \tag{6}$$

Given $r$ and $d$, $x$ and $y$ determine the offset of the moving mask and ensure that the mask can move to all possible positions. Therefore, $x$ and $y$ are selected randomly according to:

$$x(y) = \mathrm{random}(0, d - 1) \tag{7}$$

Fig. 2. Diagram of the learning principle in feature representation based on the mask.
Fig. 3. The principle of the MMP.
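As a minimal sketch of Eqs. (2)-(7) (our own NumPy illustration based on Fig. 3 and the GridMask idea cited in [39], not code released with the paper), the following routine builds one grid mask from (r, d, x, y) and applies it to an image; the grid layout and the default parameter values are assumptions.

```python
import numpy as np

def grid_mask(h, w, r, d, x, y):
    """Build a binary mask M in {0,1}^(h x w) following Eqs. (2)-(7).

    r: mask ratio, d: unit length, (x, y): offset of the first unit.
    Inside every d x d unit, a square of side l = r * d is removed (set to 0);
    the remaining pixels are retained (set to 1).
    """
    m = np.ones((h, w), dtype=np.float32)
    l = int(round(r * d))                       # Eq. (5): removed side length
    for top in range(x - d, h, d):              # start one unit early so the grid covers the border
        for left in range(y - d, w, d):
            t0, t1 = max(top, 0), min(top + l, h)
            l0, l1 = max(left, 0), min(left + l, w)
            if t0 < t1 and l0 < l1:
                m[t0:t1, l0:l1] = 0.0
    return m

def mmp_augment(img, r=0.5, d_range=(20, 80), rng=None):
    """Apply Eq. (2) with a randomly placed grid mask at one scale."""
    rng = np.random.default_rng() if rng is None else rng
    d = int(rng.integers(d_range[0], d_range[1] + 1))          # Eq. (6)
    x, y = (int(v) for v in rng.integers(0, d, size=2))        # Eq. (7)
    m = grid_mask(img.shape[0], img.shape[1], r, d, x, y)
    return img * m[..., None]                                  # broadcast over channels
```

In the multi-scale setting of Fig. 3, such a routine would be called with three different d ranges, and the unmasked original image would be retained alongside the masked copies for global-local fusion.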
In the preprocessing stage, we utilize the MMP to enlarge the differences between samples and to increase the difficulty of the model training tasks, which makes the trained model more generalizable. It is worth noting that the original data are also used with a certain probability during training, so that the original data set is a subset of the enhanced set; the trained model then has a stronger generalization ability under the enhancement-assurance assumption [40].

3.3. The feature embedding stage

The composition of the proposed dilation module in the feature embedding stage is shown in Fig. 4. First, the multi-scale feature relationships are extracted in parallel by dilated convolutions with different dilation rates [41]. Second, the features are combined and a 1 × 1 convolution reduces their size; finally, the identity mapping of the original input is integrated into the output layer to reduce the loss of effective features. Formally, writing $d_r$ for the dilation rate, the dilated convolution $*_{d_r}$ can be expressed as:

$$(F *_{d_r} k)(p) = \sum_{s + d_r t = p} F(s)\, k(t) \tag{8}$$

Fig. 4. The structure of the dilation module.

The detailed structure of the other convolution modules and of the ELCA module is described in [33].
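The following PyTorch sketch gives one possible realization of the dilation block of Fig. 4 and Eq. (8); the channel count, the dilation rates and the use of three parallel branches are assumptions for illustration, not the exact configuration used in the paper.

```python
import torch
import torch.nn as nn

class DilationBlock(nn.Module):
    """Parallel dilated convolutions -> 1x1 fusion -> identity mapping (sketch of Fig. 4)."""

    def __init__(self, channels=64, rates=(1, 2, 4)):
        super().__init__()
        # one 3x3 branch per dilation rate; padding=rate keeps the spatial size unchanged
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3, padding=r, dilation=r),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for r in rates
        )
        # 1x1 convolution fuses the concatenated multi-scale features back to `channels`
        self.fuse = nn.Conv2d(channels * len(rates), channels, kernel_size=1)

    def forward(self, x):
        multi_scale = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.fuse(multi_scale) + x   # identity mapping preserves the original features
```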
3.4. The classification stage

As shown in Fig. 5, the diagnosis task is equivalent to an N-class image classification task with a training set $D_{train} = \{(z_j, y_j)\}_{j=1}^{N}$ and a test set $D_{test} = \{(z_j, y_j)\}_{j=1}^{N}$; the trained model is then tested on the new task $D_{test}'$. The time-frequency images are fed to the network to generate the BDC matrix $A_\theta(z_j)$. The prototype of the k-th category in the training set is the average of the BDC matrices belonging to that category:

$$P_k = \frac{1}{K} \sum_{(z_j, y_j) \in S_k} A_\theta(z_j) \tag{9}$$

where $S_k$ is the set of samples labeled with class k. The distances between a sample and the training-set class prototypes are turned into a class distribution by softmax, and the loss function is formulated as:

$$\underset{\theta}{\arg\min}\; -\sum_{(z_j, y_j) \in D_{test}'} \log \frac{\exp\!\big(\tau\, \mathrm{tr}(A_\theta(z_j)^{T} P_{y_j})\big)}{\sum_{k} \exp\!\big(\tau\, \mathrm{tr}(A_\theta(z_j)^{T} P_k)\big)} \tag{10}$$

where $\tau$ is a learnable scaling parameter.
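A minimal sketch of Eqs. (9)-(10), assuming the embedding network already returns one BDC representation per image; the tensor shapes and function names are hypothetical, and the implementation simply uses the fact that tr(A^T P) equals the dot product of the flattened matrices.

```python
import torch
import torch.nn.functional as F

def prototypes(support_emb, support_y, n_way):
    """Eq. (9): class prototype = mean BDC representation of the support samples of that class."""
    return torch.stack([support_emb[support_y == k].mean(dim=0) for k in range(n_way)])

def bcmpn_loss(query_emb, query_y, protos, tau):
    """Eq. (10): softmax over the inner products <A_theta(z), P_k>, scaled by a learnable tau."""
    logits = tau * query_emb.flatten(1) @ protos.flatten(1).T   # (num_query, n_way)
    return F.cross_entropy(logits, query_y)
```

During training, tau can be declared as an nn.Parameter and optimized jointly with the embedding network.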
Based on the theory of Brownian distance covariance, let $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$ be random vectors of dimensions p and q, respectively. Assuming $f_{XY}(x, y)$ is their joint probability density function, the joint characteristic function of X and Y is defined as:

$$\Phi_{XY}(t, s) = \int_{\mathbb{R}^p} \int_{\mathbb{R}^q} \exp\!\big(i(t^{T} x + s^{T} y)\big)\, f_{XY}(x, y)\, dx\, dy \tag{11}$$

where $i$ is the imaginary unit. The marginals of X and Y are $\Phi_X(t) = \Phi_{XY}(t, 0)$ and $\Phi_Y(s) = \Phi_{XY}(0, s)$, where 0 is the all-zero vector. X and Y are independent if and only if $\Phi_{XY}(t, s) = \Phi_X(t)\Phi_Y(s)$. Assuming that X and Y have finite first-order moments, the Brownian distance covariance between them is defined as:

$$\rho(X, Y) = \int_{\mathbb{R}^p} \int_{\mathbb{R}^q} \frac{|\Phi_{XY}(t, s) - \Phi_X(t)\Phi_Y(s)|^2}{c_p c_q \|t\|^{1+p} \|s\|^{1+q}}\, dt\, ds \tag{12}$$

where $\|\cdot\|$ is the Euclidean norm, $c_p = \pi^{(1+p)/2} / \Gamma((1+p)/2)$, and $\Gamma$ is the gamma function; $\Phi_{XY}(t, s)$ is the joint characteristic function of X and Y, and $\Phi_X(t)$, $\Phi_Y(s)$ are its marginals.

For a set of m observations $\{(x_1, y_1), \dots, (x_m, y_m)\}$ that are independent and identically distributed, the corresponding empirical characteristic function is:

$$\Phi_{XY}(t, s) = \frac{1}{m} \sum_{k=1}^{m} \exp\!\big(i(t^{T} x_k + s^{T} y_k)\big) \tag{13}$$

In the discrete case, let $\hat{A} = (\hat{a}_{kl}) \in \mathbb{R}^{m \times m}$, where $\hat{a}_{kl} = \|x_k - x_l\|$ is the Euclidean distance matrix of the observations X; similarly, $\hat{B} = (\hat{b}_{kl}) \in \mathbb{R}^{m \times m}$ with $\hat{b}_{kl} = \|y_k - y_l\|$. The Brownian distance covariance can then be characterized in a relatively simple form:

$$\rho(X, Y) = \mathrm{tr}(A^{T} B) \tag{14}$$

where $\mathrm{tr}(\cdot)$ is the matrix trace, T denotes the matrix transpose, and $A = (a_{kl})$ is the Brownian distance covariance matrix with

$$a_{kl} = \hat{a}_{kl} - \frac{1}{m}\sum_{k=1}^{m}\hat{a}_{kl} - \frac{1}{m}\sum_{l=1}^{m}\hat{a}_{kl} + \frac{1}{m^2}\sum_{k=1}^{m}\sum_{l=1}^{m}\hat{a}_{kl}$$

The last three terms are the mean of the l-th column, the mean of the k-th row, and the mean of all entries of $\hat{A}$, respectively. The matrix B can be obtained from $\hat{B}$ in the same way. Since the Brownian distance covariance matrix is symmetric, $\rho(X, Y)$ can be written as the inner product of the two vectorized Brownian distance covariance matrices a and b:

$$\rho(X, Y) = \langle a, b \rangle = a^{T} b \tag{15}$$

Through the above analysis, the Brownian distance covariance matrix of each input image can be calculated independently. The module designed around the BDC matrix is shown in Fig. 6. Specifically, a two-layer structure is used to reduce the size of the feature map and to calculate the Brownian distance covariance matrix. Suppose the color image is embedded into the feature space and a tensor of size $h \times w \times d$ is obtained, where h and w are the height and width and d is the number of channels. The tensor is reshaped into a matrix $X \in \mathbb{R}^{hw \times d}$, and each column $\chi_k \in \mathbb{R}^{hw}$ (or each row $x_j \in \mathbb{R}^{d}$ after transposing) can be viewed as an observation of the random vector X. In the following we take a column observation $\chi_k$ as an example. First, the squared Euclidean distance matrix $\tilde{A} = (\tilde{a}_{kl})$ is calculated, where $\tilde{a}_{kl}$ is the squared Euclidean distance between the k-th and l-th columns of X:

$$\tilde{A} = 2\big(\mathbf{1}(X^{T}X \circ I)\big)_{sym} - 2X^{T}X \tag{16}$$

where $\mathbf{1} \in \mathbb{R}^{d \times d}$ is the all-ones matrix, I is the identity matrix and $\circ$ is the Hadamard product. Then, the Euclidean distance matrix $\hat{A} = (\sqrt{\tilde{a}_{kl}})$ is obtained by taking the element-wise square root. Finally, the BDC matrix A is obtained from $\hat{A}$ by subtracting the row means and the column means and adding back the mean of all elements:

$$A = \hat{A} - \frac{2}{d}(\mathbf{1}\hat{A})_{sym} + \frac{1}{d^2}\mathbf{1}\hat{A}\mathbf{1} \tag{17}$$

Fig. 5. The principle of the classification stage.
Fig. 6. The calculation process based on BDC.
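The following NumPy sketch follows Eqs. (16)-(17) and the inner-product similarity of Eq. (15); it is our own illustration of the calculation process of Fig. 6, not the authors' released code, and the feature shapes are assumptions.

```python
import numpy as np

def bdc_matrix(feat):
    """Compute the BDC matrix of one image from its (h, w, d) feature map.

    The reshaped (hw, d) matrix treats each channel (column) as one observation.
    """
    X = feat.reshape(-1, feat.shape[-1])             # (hw, d)
    gram = X.T @ X                                   # (d, d) inner products between channels
    sq_norms = np.diag(gram)
    # Eq. (16): squared Euclidean distances between channel observations
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * gram
    A_hat = np.sqrt(np.maximum(sq_dist, 0.0))        # Euclidean distance matrix
    # Eq. (17): double centering (subtract row/column means, add back the grand mean)
    row_mean = A_hat.mean(axis=1, keepdims=True)
    col_mean = A_hat.mean(axis=0, keepdims=True)
    return A_hat - row_mean - col_mean + A_hat.mean()

def bdc_similarity(feat1, feat2):
    """Eq. (15): similarity of two images as the inner product of their BDC matrices."""
    return float(np.sum(bdc_matrix(feat1) * bdc_matrix(feat2)))
```

In the network, the same calculation is carried out with differentiable tensor operations so that the BDC matrix behaves like a pooling layer placed after the feature embedding stage.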
It is worth noting that the BDC matrix can be viewed as a nonparametric, modular pooling layer. Eq. (13) shows that the BDC matrix, combined with the Euclidean distance, can model the nonlinear relationship between channels. Compared with the covariance matrix, which can only capture linear relationships, the BDC matrix takes the joint distribution into account and therefore has clear advantages over a covariance matrix that only considers the marginal factors in the few-shot classification task.

4. Case verification

In this section, the performance of the BCMPN is verified on the UConn data set and on a laboratory data set. Combined with the practical application requirements, the experiments are carried out under the FST-ZST setting to test the performance of the method. The comparison models include recent advanced transfer learning, meta learning and adversarial learning methods; their structural parameters are set as follows.

MSSA [14]: MSSA is a multi-source subdomain adaptation transfer learning method that can transfer diagnostic knowledge from multiple sources. The learning rate is 0.01, and one training run includes 10 epochs.

MSTLN [18]: MSTLN is a multi-source transfer learning network that can transfer diagnostic knowledge from multiple source machines. The learning rate is 0.0005 and the mini-batch size is 64.

TCNN [20]: TCNN is a transferable CNN that can promote the learning ability on target tasks. The learning rate is 0.01 and the momentum is 0.97, with 100 epochs.

FRAN [21]: FRAN is an unsupervised domain adaptation method that can relieve domain shift. The learning rate is 1e-5 and the mini-batch size is 64.

EAPN [33]: This paper builds on the EAPN diagnosis method, which considers the problems of small samples and noise. The learning rate is selected as 0.001.

The computer configuration for all experiments in this paper is as follows: an AMD Ryzen 5 4600U CPU with Radeon Graphics @ 2.10 GHz and 16 GB of memory, an Intel(R) Xeon(R) CPU @ 2.30 GHz with 12 GB of memory and a Tesla K80 GPU; the Python version is 3.7.

The evaluation index used to measure the performance of the different models is the accuracy, defined as:

$$ACC = \frac{TP + TN}{TP + TN + FP + FN} \tag{18}$$

where TP is the number of correctly classified positive cases, FP the number of incorrectly classified positive cases, FN the number of incorrectly classified negative cases, and TN the number of correctly classified negative cases.

4.1. Experimental dataset description

Five datasets are used in this paper: UConnA, LabE, LabF, LabG and LabH. We assume that the fault types to be classified in the target domain are the same as those in the source domain.

4.1.1. Case 1: UConn data set

The UConn data set contains gearbox vibration signals collected by researchers using a dSPACE system at a sampling frequency of 20 kHz. The structure of the gearbox used to collect the signals is shown in Fig. 7 [42], in which a variety of different gear conditions are introduced into the pinion on the input shaft. Based on the experimental requirements, three kinds of data are selected for the experiment: healthy, root crack and spalling. The number of sampling points refers to the number of data points in each sample taken from the vibration sensor, that is, how many points one time-domain waveform consists of; its size directly affects the frequency resolution. We therefore build 30 samples for each class, each containing 1024 sampling points, a length that fully covers several sampling cycles.

Fig. 7. The gearbox signal acquisition system and 4 kinds of gear health status.
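As a small, hedged illustration of how a raw vibration record can be cut into the fixed-length samples described above (30 samples of 1024 points per class); the non-overlapping slicing policy is our assumption, since the paper does not state the windowing scheme.

```python
import numpy as np

def segment_signal(signal, n_samples=30, n_points=1024):
    """Slice a 1-D vibration record into consecutive, non-overlapping samples."""
    assert len(signal) >= n_samples * n_points, "record too short for the requested samples"
    return np.stack([signal[i * n_points:(i + 1) * n_points] for i in range(n_samples)])

# For the UConn gear data sampled at 20 kHz, each 1024-point sample spans roughly 51 ms.
```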
4.1.2. Case 2: Laboratory data set

This paper also introduces a gear data set collected under real conditions. As shown in Fig. 8, acceleration sensors are installed on the input-shaft motor side, the gearbox housing and the output-shaft motor side. Gear data for 7 different states are collected at a sampling frequency of 5120 Hz: healthy, root cracks of 1 mm, 2 mm and 3 mm, and tooth surface spalling of 1B, 2B and 3B. The specific details are shown in Fig. 9. In order to highlight the advantages of this method in identifying weak faults with a low damage degree, this paper uses the healthy state, the 1 mm tooth root crack and the 1B tooth surface spalling, as listed in Table 1. The rotating speed is selected as 1500 rpm and 600 rpm, and the load is set to 0 HP, 2 HP and 4 HP.

Fig. 8. The schematic diagram of the experiment table structure and sensor placement location.

4.2. Model structure and parameter selection

In the parameter selection stage, we test the selection of the various parameters based on input pictures of size 250 × 250. As shown in Fig. 10(a), the highest accuracy of 93.71% is reached when r = 0.5; in the cross-machine task, the highest accuracy of 87.39% is obtained when r = 0.6. From these results we can conclude that a larger r should be chosen for more difficult tasks. A plausible explanation is that, to perform well on the more difficult task, the model should retain the complex classification features at a deeper level while removing the simple classification information extracted at a shallow level. In summary, the experimental results are consistent with this intuition.

Fig. 10(b) compares the diagnostic performance of the model for different ranges of d. When the range of d is too small, the diagnostic performance is relatively low; as the range increases, the diagnostic accuracy shows an upward trend. It can be seen that different value ranges affect the diagnosis process differently, and enlarging the range of d clearly enhances the diagnostic performance of the network.
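A hedged sketch of the parameter-selection loop behind Fig. 10 is given below; evaluate_task is a placeholder for "train the BCMPN with these MMP settings and return the test accuracy", and the candidate grids are assumptions rather than the exact values searched in the paper.

```python
import random

def evaluate_task(task, r, d_range):
    """Placeholder: train/test the BCMPN on `task` with MMP settings (r, d_range), return accuracy (%)."""
    random.seed(hash((task, r, d_range)))
    return random.uniform(80, 94)   # stand-in value; the real routine would run training and testing

candidate_r = [0.3, 0.4, 0.5, 0.6, 0.7]
candidate_d_ranges = [(10, 30), (20, 60), (20, 100)]

best = max(((evaluate_task("LabE-LabF", r, dr), r, dr)
            for r in candidate_r for dr in candidate_d_ranges), key=lambda t: t[0])
print(f"best accuracy {best[0]:.2f}% with r={best[1]} and d range {best[2]}")
```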
In order to illustrate the influence of the proposed modules on the diagnosis accuracy when they are placed at different layers of the network, as well as the influence of the convolution kernel size and the dilation rate on the classification ability, experiments are carried out on the cross-condition diagnosis task LabE-LabF; the results are shown in Table 2. It can be observed that the BCMPN1 configuration achieves the highest accuracy of 93.71%. The analysis suggests that the shallow data features are relatively sparse, so the multi-scale spatial correlation of the features can be captured by the dilation module, while with the deepening of the network layers the ELCA module aims to extract high-level abstract features, that is, combinations of shallow features and other information. If multi-scale information and dense features were combined by the dilation module in the deep layers of the network, the information flow would become confused and the diagnostic performance of the network would be weakened.

4.3. Method comparison

4.3.1. Cross-condition diagnostic experiment

To verify the superiority of the proposed method, it is compared with authoritative methods based on transfer learning and domain adversarial learning from recent years. The comparison results for cross-condition diagnosis are shown in Fig. 11. The average diagnostic accuracy of our method over the six experimental conditions is 93.05%. Under the same conditions, the accuracy of our method is higher than that of all comparison methods, which shows that the proposed method has clear advantages in classifying the same faults under complex conditions. It is worth noting that under the LabF-LabG condition the accuracy is lower than the average accuracy, which shows that the diagnostic accuracy of the model decreases when the high-speed condition is the source domain and the low-speed condition is the target domain. Our analysis is that the model faces low-impact samples that were never seen in the source domain, that is, zero samples in the cross-condition diagnosis task. A possible reason is that the fault impact is strong under high-speed conditions and the samples contain more intense fault information; therefore, when the cross-domain task goes from the high-speed condition to the low-speed condition, the zero-sample problem in the target domain greatly affects the performance of the model. The comparison methods can achieve satisfactory cross-machine diagnosis through domain adaptation and similar techniques; however, in the data sets of this paper, different working conditions and multiple fault degrees lead to more complex diagnostic tasks. Under these more complex fault tasks, our method is ahead of all comparison methods. The above analysis and results also illustrate the superiority of the proposed method under the target-domain zero-sample problem.

4.3.2. Cross-device diagnostic experiment

It can be seen from Fig. 12 that the average accuracy of the proposed method in cross-device diagnosis reaches 82.32%, which is better than the other methods. In addition, when the UConn data set is the source domain and the Lab data set is the target domain, the accuracy of all models is higher than in the reverse direction.
A possible reason is that the UConn data set, as a standard data set collected under good laboratory conditions, has more representative gear fault features; such a data set lets the model learn more target information and thus reach a higher accuracy in the cross-device diagnosis task. According to the time-frequency diagrams in Fig. 9, a certain amount of noise was introduced during the data acquisition of the laboratory set, which causes the yellow impact responses outside the fault-frequency regions in the figure; the fault information that the model can learn from these data is therefore relatively limited. The extreme FST-ZST problem leads to a decline in the performance of every model in cross-device diagnosis.

4.4. Ablation experiments

4.4.1. Importance exploration of MMP

To highlight the effectiveness of the proposed MMP stage, it is compared with two image enhancement methods from the image processing field, Cutout [43] and hide-and-seek (HaS) [44]. The specific results are shown in Fig. 13; the MMP proposed in this paper has the best gain effect on the model. To better show the gain brought by the MMP, we also compare the number of training iterations of the network on the LabE-LabF and LabA-LabE tasks, respectively.

Fig. 9. Seven kinds of gear health status based on the actual laboratory data set.

Table 1. Different conditions of the gearing on different datasets.

Gearing status   Fault degree   Speed (rpm)   Number of samples   Load   Dataset
Root crack       1 mm           600           30                  2HP    LabE
Spalling         1B             600           30                  2HP    LabE
Healthy          -              600           30                  2HP    LabE
Root crack       1 mm           1500          30                  2HP    LabF
Spalling         1B             1500          30                  2HP    LabF
Healthy          -              1500          30                  2HP    LabF
Root crack       1 mm           600           30                  4HP    LabG
Spalling         1B             600           30                  4HP    LabG
Healthy          -              600           30                  4HP    LabG
Root crack       1 mm           1500          30                  4HP    LabH
Spalling         1B             1500          30                  4HP    LabH
Healthy          -              1500          30                  4HP    LabH

Fig. 10. The selection of parameters based on MMP.
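For readers who want to script the experiments, the entries of Table 1 can be captured in a small configuration dictionary; this mapping is our own convenience structure derived from Table 1, not part of any released code or data.

```python
# Speed in rpm, load in HP; every sub-set contains 30 samples per class (Table 1).
LAB_DATASETS = {
    "LabE": {"speed": 600,  "load": 2, "classes": ["healthy", "root_crack_1mm", "spalling_1B"]},
    "LabF": {"speed": 1500, "load": 2, "classes": ["healthy", "root_crack_1mm", "spalling_1B"]},
    "LabG": {"speed": 600,  "load": 4, "classes": ["healthy", "root_crack_1mm", "spalling_1B"]},
    "LabH": {"speed": 1500, "load": 4, "classes": ["healthy", "root_crack_1mm", "spalling_1B"]},
}
SAMPLES_PER_CLASS = 30
```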
As can be seen from Fig. 14, on the two tasks LabE-LabF and LabA-LabE, the curve with the MMP method converges faster and reaches stability at 55 and 70 training epochs, respectively; in contrast, the number of epochs needed to stabilize without the MMP method is 70 and 80, respectively. This is mainly because the MMP establishes a good optimization starting point in the parameter space, so it effectively reduces the number of learning iterations in cross-domain diagnosis and enhances the generalization performance of the model.

4.4.2. Exploration of different metric distances

In Fig. 15, the diagnostic effect of the inner-product-based metric used in this paper is compared with the traditional cosine distance and Euclidean distance used in current metric learning. Our method achieves 80.38% and 79.05% diagnostic accuracy even on the two tasks LabE-UConnA and LabF-UConnA, on which the cosine distance and the Euclidean distance perform worst. This proves that the inner-product-based BDC module developed in this paper has better performance in cross-working-condition and cross-device diagnosis tasks.

Table 2. Fault diagnosis results of different module orders.

             BCMPN 1            BCMPN 2            BCMPN 3            BCMPN 4
           Dilation  ELCA     Dilation  ELCA     Dilation  ELCA     Dilation  ELCA
Stage 1       Y        -         Y        -         -        -         Y        -
Stage 2       Y        -         Y        -         -        -         -        -
Stage 3       -        Y         -        -         Y        -         Y        -
Stage 4       -        -         -        Y         Y        -         -        -
Stage 5       -        -         -        -         -        Y         -        -
Stage 6       -        -         -        -         -        -         -        Y
Accuracy        93.71%             93.16%             90.32%             91.47%

(Y indicates that the module is placed at that stage; - indicates that it is not.)

Fig. 11. The result of the cross-condition diagnostic experiment. The accuracies (%) shown in the figure are:

         LabE-LabF  LabE-LabG  LabE-LabH  LabF-LabG  LabF-LabH  LabG-LabH
Ours       93.71      95.07      90.25      89.71      95.37      94.2
EAPN       80.49      81.37      75.98      76.32      82.1       84.74
TCNN       77.42      80.33      72.97      78.43      79.35      75.44
FRAN       79.14      80.1       76.94      74.31      79.08      80.44
MSSA       83.41      83.99      77.36      76.14      82.33      81.79
MSTLN      82.74      83.29      79.24      76.91      83.1       80.23

Fig. 12. The result of the cross-device diagnostic experiment. The accuracies (%) shown in the figure are:

         UConnA-LabE  UConnA-LabF  UConnA-LabG  UConnA-LabH  LabE-UConnA  LabF-UConnA  LabG-UConnA  LabH-UConnA
Ours        87.39        87.14        83.3         89.37        80.38        79.05        78.1         72.82
EAPN        77.32        74.54        73.1         75.1         67.38        65.15        66.4         63.19
TCNN        60.17        62.04        58.1         63.47        55.32        53.01        52.35        50.37
FRAN        72.09        68.47        67.03        73.24        65.97        64.19        61.7         58.12
MSSA        75.53        74.26        73.49        78.91        70.3         66.74        64.12        60.77
MSTLN       74.19        73.42        70.34        77.04        68.13        65.32        64.18        60.1
5. Conclusion

This paper presents a novel fault diagnosis method for gear systems based on the Brownian correlation metric prototypical network. The core idea of this method is to utilize the knowledge in the limited source-domain samples to improve the performance under zero samples in the target domain. The BCMPN is an improved version of our previously published EAPN, in which the MMP technique is developed to establish better generalization performance from a global and local perspective. Meanwhile, the integrated multi-scale feature embedding (MSFE) module and ELCA module are designed to extract representative features from the limited samples. In the classification stage, the Brownian distance covariance is introduced into the cross-domain fault diagnosis field for the first time; the joint distribution is comprehensively considered and the fault classification is completed by calculating an inner product. Compared with advanced methods from recent years, our method shows better performance. The current diagnosis mechanism still leaves room for improvement in computing time; future work includes further structural improvements of the model and of the distance measurement mechanism to reduce the training time of the network, and combining unsupervised learning methods to better apply the approach to other data-driven tasks.

Fig. 13. Result of different image enhancement methods.
Fig. 14. Result of different image enhancement methods.
Fig. 15. Result of different methods based on different metric distances.
Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The authors do not have permission to share data.

Acknowledgment

This work was supported by the Natural Science Foundation of Heilongjiang Province of China (Grant No. LH2021F021).

References

[1] S. Liu, H. Jiang, Y. Wang, K. Zhu, C. Liu, A deep feature alignment adaptation network for rolling bearing intelligent fault diagnosis, Adv. Eng. Inform. 52 (2022) 101598.
[2] Z. Huang, et al., A multisource dense adaptation adversarial network for fault diagnosis of machinery, IEEE Trans. Ind. Electron. 69 (6) (2022) 6298-6307.
[3] Y. Wang, B. Qin, K. Liu, M. Shen, M. Niu, L. Han, A new multitask learning method for tool wear condition and part surface quality prediction, IEEE Trans. Ind. Inf. 17 (9) (2020) 6023-6033.
[4] P. Xia, Y. Huang, P. Li, C. Liu, L. Shi, Fault knowledge transfer assisted ensemble method for remaining useful life prediction, IEEE Trans. Ind. Inf. 18 (3) (2021) 1758-1769.
[5] Z. Wang, J. Xuan, Intelligent fault recognition framework by using deep reinforcement learning with one dimension convolution and improved actor-critic algorithm, Adv. Eng. Inform. 49 (2021) 101315.
[6] X. Gu, Y. Zhao, G. Yang, L. Li, An imbalance modified convolutional neural network with incremental learning for chemical fault diagnosis, IEEE Trans. Ind. Inf. 18 (6) (2021) 3630-3639.
[7] X. Wang, L. Luo, L. Tang, Z. Yang, Automatic representation and detection of fault bearings in in-wheel motors under variable load conditions, Adv. Eng. Inform. 49 (2021) 101321.
[8] C. Wang, Z. Xu, An intelligent fault diagnosis model based on deep neural network for few-shot fault diagnosis, Neurocomputing 456 (2021) 550-562.
[9] S. Zhang, Y. Li, W. Cui, R. Yang, J. Dong, Hu, Limited data rolling bearing fault diagnosis with few-shot learning, IEEE Access 7 (2019) 110895-110904.
[10] S. Li, A. Li, Q. Zhang, Z. He, J. Liao, Hu, Meta-learning for few-shot bearing fault diagnosis under complex working conditions, Neurocomputing 439 (2021) 197-211.
[11] J. Fan, X. Yuan, Z. Miao, Z. Sun, X. Mei, F. Zhou, Full attention Wasserstein GAN with gradient normalization for fault diagnosis under imbalanced data, IEEE Trans. Instrum. Meas. 71 (2022) 1-16.
[12] Y. Zhou, Y. Ning, et al., Deep dynamic adaptive transfer network for rolling bearing fault diagnosis with considering cross-machine instance, IEEE Trans. Instrum. Meas. 70 (2021).
[13] Y. Zou, K. Shi, Y. Liu, G. Ding, K. Ding, Rolling bearing transfer fault diagnosis method based on adversarial variational autoencoder network, Meas. Sci. Technol. 32 (11) (2021) 115017.
[14] J. Tian, D. Han, M. Li, P. Shi, A multi-source information transfer learning method with subdomain adaptation for cross-domain fault diagnosis, Knowl.-Based Syst. 243 (2022) 108466.
[15] J. Wang, S. Ji, B. Han, H. Bao, X. Jiang, Deep adaptive adversarial network-based method for mechanical fault diagnosis under different working conditions, Complexity 2020 (2020) 6946702.
[16] L. Wan, Y. Li, K. Chen, K. Gong, C. Li, A novel deep convolution multi-adversarial domain adaptation model for rolling bearing fault diagnosis, Measurement 191 (2022) 110752.
[17] L. Chen, Q. Li, C. Shen, J. Zhu, D. Wang, M. Xia, Adversarial domain-invariant generalization: a generic domain-regressive framework for bearing fault diagnosis under unseen conditions, IEEE Trans. Ind. Inf. 18 (3) (2021) 1790-1800.
[18] B. Yang, S. Xu, Y. Lei, C. Lee, S. Edward, R. Clive, Multi-source transfer learning network to complement knowledge for intelligent diagnosis of machines with unseen faults, Mech. Syst. Signal Process. 162 (2022) 108095.
[19] S. Liu, H. Wang, J. Tang, X. Zhang, Research on fault diagnosis of gas turbine rotor based on adversarial discriminative domain adaptation transfer learning, Measurement 196 (2022) 111174.
[20] Z. Chen, K. Gryllias, W. Li, Intelligent fault diagnosis for rotary machinery using transferable convolutional neural network, IEEE Trans. Ind. Inf. 16 (1) (2020) 339-349.
[21] J. Chen, J. Wang, J. Zhu, T.H. Lee, C.W. de Silva, Unsupervised cross-domain fault diagnosis using feature representation alignment networks for rotating machinery, IEEE/ASME Trans. Mechatron. 26 (5) (2020) 2770-2781.
[22] F. Zhou, S. Yang, H. Fujita, D. Chen, C. Wen, Deep learning fault diagnosis method based on global optimization GAN for unbalanced data, Knowl.-Based Syst. 187 (2020) 104837.
[23] C. Zheng, C. Zhao, Fault-prototypical adapted network for cross-domain industrial intelligent diagnosis, IEEE Trans. Autom. Sci. Eng. (2021) 1-10.
[24] C. Jiang, H. Chen, Q. Xu, X. Wang, Few-shot fault diagnosis of rotating machinery with two-branch prototypical networks, J. Intell. Manuf. (2022).
[25] C. Qian, Q. Jiang, Y. Shen, C. Huo, Q. Zhang, An intelligent fault diagnosis method for rolling bearings based on feature transfer with improved DenseNet and joint distribution adaptation, Meas. Sci. Technol. 33 (2) (2022) 025101.
[26] T.M. Cover, J.A. Thomas, Elements of Information Theory, 2005, pp. 463-508, https://doi.org/10.1002/047174882X.ch14.
[27] C. Zhang, Y. Cai, G. Lin, C. Shen, DeepEMD: few-shot image classification with differentiable earth mover's distance and structured classifiers, 2020, https://doi.org/10.48550/arXiv.2003.06777.
[28] D. Wertheimer, L. Tang, B. Hariharan, Few-shot classification with feature map reconstruction networks, 2021, https://doi.org/10.48550/arXiv.2012.01506.
[29] C.M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006, https://doi.org/10.18637/jss.v017.b05.
[30] M.I. Belghazi, A. Baratin, S. Rajeswar, S. Ozair, Y. Bengio, A. Courville, R.D. Hjelm, MINE: mutual information neural estimation, 2018, https://doi.org/10.48550/arXiv.1801.04062.
[31] G.J. Székely, M.L. Rizzo, Brownian distance covariance, Ann. Appl. Stat. 3 (4) (2009) 1236-1265.
[32] J. Xie, F. Long, J. Lv, Q. Wang, P. Li, Joint distribution matters: deep Brownian distance covariance for few-shot classification, 2022, https://doi.org/10.48550/arXiv.2204.04567.
[33] C. Wang, H. Sun, X. Cao, Construction of the efficient attention prototypical net based on the time-frequency characterization of vibration signals under noisy small sample, Measurement 179 (2021) 109412.
[34] Z. Yan, M. Ayaho, Z. Jiang, X. Liu, An overall theoretical description of frequency slice wavelet transform, Mech. Syst. Signal Process. 24 (2) (2010) 491-507.
[35] H. Sun, X. Cao, C. Dong, S. Gao, An interpretable anti-noise network for rolling bearing fault diagnosis based on FSWT, Measurement 190 (2022) 110698.
[36] J. Joshua, G. Ashok, Perceptually grounded self-diagnosis and self-repair of domain knowledge, Knowl.-Based Syst. 27 (2012) 281-301.
[37] E. Belouadah, A. Popescu, I. Kanellos, A comprehensive study of class incremental learning algorithms for visual tasks, 2020, https://doi.org/10.48550/arXiv.2011.01844.
[38] M. Lu, H. Liu, X. Yuan, Thermal fault diagnosis of electrical equipment in substations based on image fusion, Trait. Signal 38 (4) (2021) 1095-1102.
[39] P. Chen, S. Liu, H. Zhao, J. Jia, GridMask data augmentation, 2020, https://doi.org/10.48550/arXiv.2001.04086.
[40] T. Yao, X. Yi, D.Z. Cheng, F. Yu, T. Chen, A. Menon, L. Hong, E.H. Chi, S. Tjoa, J. Kang, Self-supervised learning for large-scale item recommendations, 2020, https://doi.org/10.48550/arXiv.2007.12865.
[41] F. Yu, V. Koltun, Multi-scale context aggregation by dilated convolutions, ICLR, 2016, https://doi.org/10.48550/arXiv.1511.07122.
[42] P. Cao, S. Zhang, J. Tang, Preprocessing-free gear fault diagnosis using small datasets with deep convolutional neural network-based transfer learning, IEEE Access 6 (2018) 26241-26253.
[43] T. DeVries, G.W. Taylor, Improved regularization of convolutional neural networks with Cutout, 2017, https://doi.org/10.48550/arXiv.1708.04552.
[44] K. Kumar Singh, H. Yu, A. Sarmasi, G. Pradeep, Y.J. Lee, Hide-and-seek: a data augmentation technique for weakly-supervised localization and beyond, 2018, https://doi.org/10.48550/arXiv.1811.02545.