2. Advanced Engineering Informatics 54 (2022) 101815
domain adaptive module combined with a multi-kernel maximum mean discrepancy (MK-MMD) and a multi-domain discriminator to align the marginal distribution and the conditional distribution. Chen et al. [17] developed an adversarial domain invariant generalization (ADIG) fault diagnosis framework based on adversarial learning. After integrating the data of multiple different domains, adversarial learning between the feature extraction module and the domain classification module is used to realize high-precision cross-domain diagnosis. However, when the above methods face new diagnostic tasks, readjusting the model incurs a high time cost.
To alleviate the differences in the distribution of data features, some researchers have combined transfer learning and deep learning to diagnose faults in recent years. Based on transfer learning, Yang et al. [18] constructed a balance factor to weight the target samples, so that the adaptive subnetwork can jointly adapt to the partial distribution of the source domain and the target domain. In addition, they used a multi-source diagnostic knowledge fusion module to integrate multiple diagnostic decisions. Liu et al. [19] transferred a deep CNN pre-trained in the source domain to the target domain, and then applied deep adversarial training between the two domains to optimize the parameters and reduce the domain shift. Chen et al. [20] pretrained a one-dimensional CNN on a large source data set, and the transferability of the proposed transferable convolutional neural network (TCNN) was verified on three test rigs. However, the high performance of the above transfer-learning-based diagnosis methods presupposes that the source domain contains a large amount of effective training data [21]. In many practical scenarios, the data that can be obtained is limited.
At present, many studies focus on the feature embedding and classification stages of the model and ignore the importance of the input preprocessing stage. We believe that preprocessing based on data augmentation is an important stage in obtaining the internal feature representation of different categories. Therefore, a complete and effective preprocessing mechanism can give the diagnostic performance a high gain across working conditions and even across equipment [21]. Fortunately, one-dimensional time-domain vibration signals can be effectively converted into two-dimensional time–frequency images. Meanwhile, a CNN can be utilized to extract critical features from the high-dimensional data in the image [22]. From a statistical point of view, fault diagnosis after transforming signals into images is equivalent to embedding random vectors expressing fault features into a high-dimensional space and measuring their correlation [8]. Among the diagnosis methods based on a correlation metric, methods based on few-shot learning have gradually shown superior performance [23,24]. However, most current methods use the Euclidean distance or the cosine distance as the distance measure. They only consider the marginal distributions of different sample features and ignore their joint distribution, which limits the diagnostic ability of the model [25].
It follows that the dependence between two images should be measured according to their joint distribution [26]. Although the earth mover’s distance (EMD), which seeks the optimal joint distribution, is an effective method to measure this dependence [27], its computational cost is very high [28]. The mutual information (MI) [29] can quantify the dependence of two random variables through the Kullback-Leibler (KL) divergence between the joint distribution and the product of the marginals. However, it is very difficult to calculate MI in real-valued and high-dimensional settings [30]. The Brownian distance covariance (BDC) measurement [31] is defined as the Euclidean distance between the joint characteristic function and the product of the marginal characteristic functions. It can naturally quantify the dependency between two random variables, accept feature maps as input, and output a BDC matrix as the image representation. In this way, the similarity between two images reduces to the inner product of their BDC matrices. We first introduced the BDC measurement into the field of rotating machinery fault diagnosis [32]. It is worth noting that our preliminary work was introduced in [33]. In this article, we make the following innovations and extensions for the more difficult FST-ZST task.
(1) To improve the generalization performance and the optimization starting point of the model, we propose a data augmentation preprocessing mechanism based on a multi-scale mask (MMP). Based on the MMP, a sub-distribution of the original data sample distribution can be obtained to improve the initial performance and the generalization ability of the model when it is applied to new working conditions and new equipment.
(2) For the first time, we introduce the Brownian distance into the field of fault diagnosis and construct a Brownian correlation metric prototypical network (BCMPN). In the classification stage, fault recognition is carried out by measuring the difference between the joint distribution of the embedded features and the product of the marginals. In addition, the developed method offers an alternative to the Euclidean and cosine distances that dominate current metric learning methods, and provides researchers with a new idea for the distance metric.
(3) To better realize fault diagnosis under the FST-ZST problem, we develop an intelligent fault diagnosis scheme based on the BCMPN. When facing diagnosis tasks under different working conditions or on different equipment, this method realizes high-precision fault diagnosis under the FST-ZST.
(4) Based on a standard data set and an actual data set, we conducted ablation experiments to verify the effectiveness of the innovative parts. Moreover, we compared our method with current advanced diagnostic methods to verify its superiority in cross-working-condition and cross-machine tasks.
The structure of this article is arranged as follows. Section 1 is the introduction. Section 2 describes the problem from a mathematical point of view. Section 3 gives a detailed description of the proposed method. In Section 4, the experimental results and discussion are presented in depth. Section 5 gives a summary and future work.
2. Problem description
First, we give the source domain $D_S = \{(x_s^{(i)}, y_s^{(i)})\}_{i=1}^{N}$, which includes $n$ labeled images $G \in \mathbb{R}^{h \times w \times 3}$ from the source input space $\{X_S, Y_S\}$. In addition, we extract the feature map $m = f(I)$, $m \in \mathbb{R}^{\frac{h}{r} \times \frac{w}{r} \times c}$, which is independent of the class. The classifier can recognize the $l$ types of samples and process the feature map by using the convolution kernel $c_l \in \mathbb{R}^{1 \times 1 \times c}$, where $r$ is the output step size and $c$ represents the number of feature channels. $x_s^{(i)} \in X_S$ represents a sample in the source input space drawn from the probability distribution function $p_s(X_S)$, and $y_s^{(i)} \in Y_S$ is the corresponding label identifying the fault type. In the source domain, we define the fault set under different working conditions and different equipment as $U = \emptyset$. In few-shot tasks, these fault instances $U$ are so rare that they cannot support the work of traditional classifiers. We record the fault samples in these source domains as $M = \{c_i^m \mid i = 1, \ldots, N_m\}$, $M \in S$.
Let $D_T = \{(x_t^{(j)})\}_{j=1}^{M}$ be the target domain, which includes $m$ unlabeled images from the target domain space $\{X_T, Y_T\}$. The probability distribution function of the samples in the target domain is $p_t(X_T)$. Generally, due to the domain shift, the source domain and the target domain are different, so their probability distributions are unequal, that is, $p_s(X_S) \neq p_t(X_T)$.
The goal of cross-domain fault diagnosis is to learn a general model that performs well in the classification of the unlabeled target domain with the help of the labeled source domain. Its mathematical principle is as follows.
$$\min \lVert y_t, \hat{y}_t \rVert \tag{1}$$
where $\hat{y}_t$ refers to the predicted fault type of the input in the target domain.
J. Yang et al.
3. Methodology
3.1. The framework of the BCMPN
The overall diagnosis method is shown in Fig. 1, which mainly includes five stages:
(1) Sample acquisition and data transformation based on the frequency slice wavelet transform (FSWT) [34]. Our previous work [33,35] has explained that the FSWT can improve the time–frequency description of samples and thus give full play to the advantages of the CNN on images;
(2) The enhanced preprocessing based on the multi-scale mask;
(3) The enhanced samples and the original samples are fused as the source domain, and the data under different working conditions and equipment are taken as the target domain. The model is trained in the source domain based on the feature embedding stage, which is composed of the ELCA module, the Convolution block and the Dilation block;
(4) In the BDC classification stage, a distance metric based on class prototypes and softmax is constructed to identify different fault types, and the trained model is saved;
(5) The relevant unknown categories are tested on the target domain to obtain the cross-domain diagnostic results.
The Convolution block established in this paper consists of three parts: a 3 × 3 convolution, a BN layer and the ReLU activation function. The detailed structure and parameters of the ELCA module are given in [33].
3.2. The global–local data fusion preprocessing mechanism based on
multi-scale mask
Human cognitive learning is a gradual and multi-scale development process [36]. Therefore, humans can often complete simple tasks easily after learning more complex knowledge [37]. Inspired by the human cognitive learning process, the generalization performance can be boosted by increasing the difficulty of the training tasks while retaining important classification information [38,39].
The specific principle of the mask mechanism is shown in Fig. 2. Taking the gear time–frequency image as an example, the noise in the original input image is masked in a certain proportion to achieve local enhancement of critical pixels. To improve the ability to capture sensitive features in the target domain, we have developed a multi-scale mask global–local data fusion preprocessing mechanism. The specific process is described in Fig. 3. We mask the pixels of the training samples at three scales to achieve multi-scale local feature enhancement. Moreover, we combine the results with the original input image to realize global data fusion.
To help readers better understand this process, we give a mathematical description as follows.
We complete the process of local feature enhancement by evenly deleting areas of the image, which is set as follows.
$$\hat{G} = G \times M \tag{2}$$
where $G \in \mathbb{R}^{h \times w \times c}$ represents the input image, $M \in \{0, 1\}^{h \times w}$ is the binary mask of the pixels to be removed, and $\hat{G} \in \mathbb{R}^{h \times w \times c}$ is the enhanced result. For the binary mask $M$, if $M_{i,j} = 1$, the pixel $(i, j)$ in the input image is retained; otherwise, it is masked.
As shown in Fig. 3, we use $(r, d, x, y)$ to represent $M$, where $r$ is the ratio of mask retention, $d$ is the length of a unit, and $x$ and $y$ indicate the distance between the first complete unit of the image and the image boundary.
The parameter $r$ determines the retention ratio of the input image. We define the retention ratio $p$ of a given mask $M$ as follows.
$$p = \frac{\mathrm{sum}(M)}{H \times W} \tag{3}$$
Too large a retention ratio may lead to overfitting, and too small a retention ratio may lead to underfitting due to the loss of too much information. The relationship between $r$ and $p$ can be expressed as follows.
$$p = 1 - (1 - r)^2 = 2r - r^2 \tag{4}$$
The parameter $d$ determines the size of a removed area. When $r$ is fixed, the relationship between a removed side length $l$ and $d$ is:
$$l = r \times d \tag{5}$$
Fig. 1. Framework of the proposed method.
We choose $d$ randomly from a range as follows.
$$d = \mathrm{random}(d_{\min}, d_{\max}) \tag{6}$$
Given $r$ and $d$, the offsets $x$ and $y$ determine the position of the mask, and sampling them ensures that the mask can move to all possible positions. Therefore, $x$ and $y$ can be selected randomly according to the following equation.
$$x(y) = \mathrm{random}(0, d - 1) \tag{7}$$
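The mask construction in Eqs. (2)-(7) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `grid_mask` and `multi_scale_mask` are hypothetical, the ranges of d passed by the caller are free parameters, and each d × d unit is assumed to keep row and column stripes of width r·d so that the retention ratio agrees with Eq. (4).

```python
import numpy as np

def grid_mask(h, w, r, d, x, y):
    """Binary mask M of shape (h, w): 1 = pixel retained, 0 = pixel removed.

    Assumption: inside every d x d unit, stripes of width round(r * d) are
    retained along both axes, so the kept fraction is 1 - (1 - r)^2 (Eq. 4).
    """
    keep_len = int(round(r * d))
    # Position of every row/column inside its unit, shifted by the offsets.
    rows = np.less((np.arange(h) - y) % d, keep_len)
    cols = np.less((np.arange(w) - x) % d, keep_len)
    # A pixel is retained if it lies on a retained row stripe or column stripe.
    return (rows[:, None] | cols[None, :]).astype(np.uint8)

def multi_scale_mask(image, r, d_ranges, rng):
    """Return the original image plus one masked copy per range of d."""
    views = [image]  # the original image is kept for global data fusion
    for d_min, d_max in d_ranges:
        d = int(rng.integers(d_min, d_max + 1))   # Eq. (6)
        x, y = rng.integers(0, d, size=2)         # Eq. (7): values in 0..d-1
        m = grid_mask(image.shape[0], image.shape[1], r, d, int(x), int(y))
        views.append(image * m[..., None])        # Eq. (2), broadcast over channels
    return views
```

Calling `multi_scale_mask(img, 0.5, [(8, 16), (16, 32), (32, 48)], np.random.default_rng(0))` would yield the original image and three masked views; the three ranges here are placeholders, not values reported in the paper.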
Fig. 2. Diagram of learning principle in feature representation based on mask.
Fig. 3. The principle of the MMP.
In the preprocessing stage, we utilize the MMP to ensure the differences between samples and increase the difficulty of the training tasks, which makes the trained model generalize better. It is worth noting that we used the original data with a certain probability in the training process to ensure that the original data set is a subset of the enhanced set, so that the trained model has a stronger generalization ability under the enhanced assurance assumption [40].
3.3. The feature embedding stage
The composition of the proposed dilation module in the feature embedding stage is shown in Fig. 4. First, the multi-scale feature relationship is extracted in parallel by using dilated convolutions with different dilation rates [41]. Second, the features are combined and a 1 × 1 convolution is used to reduce their size. Finally, the identity mapping of the original input is integrated into the output layer to reduce the loss of effective features. From a theoretical point of view, denoting the dilation rate by $d_r$, the dilated convolution $*_{d_r}$ can be expressed as follows.
$$(F *_{d_r} k)(p) = \sum_{s + d_r t = p} F(s)\, k(t) \tag{8}$$
The detailed structure of the other convolution modules and the ELCA module is described in [33].
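Eq. (8) can be illustrated with a one-dimensional NumPy sketch. The function name `dilated_conv1d` and the choice of returning the full-length output are assumptions made for illustration; the dilation module itself operates on two-dimensional feature maps inside a CNN.

```python
import numpy as np

def dilated_conv1d(F, k, dr):
    """Full 1-D dilated convolution: out[p] = sum over s + dr*t = p of F[s] * k[t]."""
    F = np.asarray(F, dtype=float)
    k = np.asarray(k, dtype=float)
    n, m = len(F), len(k)
    out = np.zeros(n + dr * (m - 1))
    for t in range(m):
        # Kernel tap t contributes F shifted by dr * t positions.
        out[dr * t: dr * t + n] += F * k[t]
    return out
```

With `dr = 1` this reduces to the ordinary convolution, and a larger `dr` spreads the kernel taps apart, which enlarges the receptive field without adding parameters.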
3.4. The classification stage
As shown in Fig. 5, the diagnosis task is equivalent to an N-class image classification task with the training set $D_{train} = \{(z_j, y_j)\}_{j=1}^{N}$ and the test set $D_{test} = \{(z_j, y_j)\}_{j=1}^{N}$. The trained model will be tested on the new task $D'_{test}$. We provide time–frequency images to the network to generate the BDC matrix $A_\theta(z_j)$. The prototype of the $k$-th category in the training set is the average of the BDC matrices belonging to this category. Its calculation equation is as follows.
$$P_k = \frac{1}{K} \sum_{(z_j, y_j) \in S_k} A_\theta(z_j) \tag{9}$$
where $S_k$ is the set of samples labeled with class $k$. Based on softmax, we generate the class distribution from the distances to the class prototypes of the training set, and then formulate the loss function as follows.
$$\arg\min_{\theta} \; - \sum_{(z_j, y_j) \in D'_{test}} \log \frac{\exp\left(\tau\, \mathrm{tr}\left(A_\theta(z_j)^{T} P_{y_j}\right)\right)}{\sum_{k} \exp\left(\tau\, \mathrm{tr}\left(A_\theta(z_j)^{T} P_k\right)\right)} \tag{10}$$
where $\tau$ is a learnable scaling parameter.
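The prototype and loss computation of Eqs. (9) and (10) can be sketched in NumPy as below. This is only an illustration on precomputed BDC matrices: the function names are hypothetical, $K$ in Eq. (9) is assumed to be the number of samples in $S_k$, and $\tau$ is passed as a fixed number instead of being learned.

```python
import numpy as np

def prototypes(bdc, labels, n_classes):
    """Eq. (9): mean BDC matrix of each class. bdc has shape (n, d, d)."""
    return np.stack([bdc[labels == k].mean(axis=0) for k in range(n_classes)])

def log_softmax_scores(query_bdc, protos, tau):
    """Log-probabilities from tau * tr(A^T P_k) for every query and class."""
    # tr(A^T P) equals the sum of the element-wise product of A and P.
    logits = tau * np.einsum('qij,kij->qk', query_bdc, protos)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    return logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

def prototype_loss(query_bdc, query_labels, protos, tau):
    """Eq. (10): negative log-likelihood summed over the query samples."""
    logp = log_softmax_scores(query_bdc, protos, tau)
    return -logp[np.arange(len(query_labels)), query_labels].sum()
```

At test time, the predicted class of a query would be the index of the largest entry in its row of `log_softmax_scores`.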
Based on the theory of Brownian distance covariance, we let $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$ be random vectors with dimensions $p$ and $q$, respectively. Assuming $f_{XY}(x, y)$ is their joint probability density function, the joint characteristic function of $X$ and $Y$ can be defined as follows.
$$\phi_{XY}(t, s) = \int_{\mathbb{R}^p} \int_{\mathbb{R}^q} \exp\left(i\left(t^{T} x + s^{T} y\right)\right) f_{XY}(x, y)\, dx\, dy \tag{11}$$
where $i$ is the imaginary unit. The marginal characteristic functions of $X$ and $Y$ are $\phi_X(t) = \phi_{XY}(t, 0)$ and $\phi_Y(s) = \phi_{XY}(0, s)$, where $0$ is the vector with all elements being zero.
$X$ and $Y$ are independent of each other if and only if $\phi_{XY}(t, s) = \phi_X(t)\phi_Y(s)$. Assuming that $X$ and $Y$ have finite first-order moments, the Brownian distance covariance between them can be defined as follows.
$$\rho(X, Y) = \int_{\mathbb{R}^p} \int_{\mathbb{R}^q} \frac{\left|\phi_{XY}(t, s) - \phi_X(t)\phi_Y(s)\right|^2}{c_p c_q \lVert t \rVert^{1+p} \lVert s \rVert^{1+q}}\, dt\, ds \tag{12}$$
where $\lVert \cdot \rVert$ is the Euclidean norm, $c_p = \pi^{(1+p)/2} / \Gamma((1 + p)/2)$, and $\Gamma$ is the gamma function. $\phi_{XY}(t, s)$ is the joint characteristic function of $X$ and $Y$, and $\phi_X(t)$ and $\phi_Y(s)$ are the marginal characteristic functions.
For a set of $m$ observations $\{(x_1, y_1), \ldots, (x_m, y_m)\}$ that are independent and identically distributed, the continuous expression of the Brownian distance covariance can be defined according to the empirical characteristic function as follows.
$$\phi_{XY}(t, s) = \frac{1}{m} \sum_{k=1}^{m} \exp\left(i\left(t^{T} x_k + s^{T} y_k\right)\right) \tag{13}$$
In the discrete case, let $\hat{A} = (\hat{a}_{kl}) \in \mathbb{R}^{m \times m}$, where $\hat{a}_{kl} = \lVert x_k - x_l \rVert$, denote the Euclidean distance matrix computed between the observations of $X$. Similarly, we calculate the Euclidean distance matrix $\hat{B} = (\hat{b}_{kl}) \in \mathbb{R}^{m \times m}$, where $\hat{b}_{kl} = \lVert y_k - y_l \rVert$. Then the Brownian distance covariance can be characterized in a relatively simple form as follows.
$$\rho(X, Y) = \mathrm{tr}(A^{T} B) \tag{14}$$
where $\mathrm{tr}(\cdot)$ represents the matrix trace, $T$ indicates the matrix transpose, and $A = (a_{kl})$ is the Brownian distance covariance matrix with
$$a_{kl} = \hat{a}_{kl} - \frac{1}{m} \sum_{k=1}^{m} \hat{a}_{kl} - \frac{1}{m} \sum_{l=1}^{m} \hat{a}_{kl} + \frac{1}{m^2} \sum_{k=1}^{m} \sum_{l=1}^{m} \hat{a}_{kl}.$$
The last three terms in the above equation are the means of the $l$-th column, the $k$-th row, and all entries of $\hat{A}$, respectively.
Fig. 4. The structure of the dilation module.
The matrix $B$ can be obtained from $\hat{B}$ in the same way. Since the Brownian distance covariance matrix is symmetric, $\rho(X, Y)$ can be written as the inner product of the vectors $a$ and $b$ obtained by vectorizing the two Brownian distance covariance matrices, as follows.
$$\rho(X, Y) = \langle a, b \rangle = a^{T} b \tag{15}$$
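The discrete form in Eqs. (14) and (15) can be checked numerically with a short NumPy sketch. The function names `bdc_matrix` and `bdc` are hypothetical, observations are assumed to be stored as rows, and the centring adds the grand mean, as in the usual double-centring of a distance matrix.

```python
import numpy as np

def bdc_matrix(obs):
    """Double-centred Euclidean distance matrix of m observations (rows of obs)."""
    diff = obs[:, None, :] - obs[None, :, :]
    a_hat = np.linalg.norm(diff, axis=-1)           # pairwise distances
    return (a_hat
            - a_hat.mean(axis=0, keepdims=True)     # column means
            - a_hat.mean(axis=1, keepdims=True)     # row means
            + a_hat.mean())                         # mean of all entries

def bdc(x, y):
    """Eq. (14): rho(X, Y) = tr(A^T B) for paired observations x and y."""
    A, B = bdc_matrix(x), bdc_matrix(y)
    return np.trace(A.T @ B)
```

Because `A` and `B` are symmetric, `np.trace(A.T @ B)` equals `A.ravel() @ B.ravel()`, which is the inner-product form of Eq. (15).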
Through the above analysis, we can calculate the Brownian distance covariance matrix of each input image independently. The module design based on the Brownian distance covariance matrix is shown in Fig. 6. Specifically, the two-layer structure of the module is used to reduce the size and calculate the Brownian distance covariance matrix.
Suppose we embed a color image $z \in \mathbb{R}^3$ into the feature space and obtain an $h \times w \times d$ tensor, where $h$ and $w$ indicate the height and width and $d$ represents the number of channels. We reshape the tensor into a matrix $X \in \mathbb{R}^{hw \times d}$, and we can view each column $\chi_k \in \mathbb{R}^{hw}$, or each row $x_j \in \mathbb{R}^{d}$ after transposing, as an observation of the random vector $X$.
In the following, we take the column observations $\chi_k$ as an example. First, we calculate the squared Euclidean distance matrix $\tilde{A} = (\tilde{a}_{kl})$, where $\tilde{a}_{kl}$ is the squared Euclidean distance between the $k$-th column and the $l$-th column of $X$.
$$\tilde{A} = 2\left(\mathbf{1}\left(X^{T} X \circ I\right)\right)_{sym} - 2 X^{T} X \tag{16}$$
where $\mathbf{1} \in \mathbb{R}^{d \times d}$ is the matrix whose elements are all one, $I$ is the identity matrix, $\circ$ represents the Hadamard product, and $(U)_{sym} = \frac{1}{2}(U + U^{T})$.
Fig. 5. The principle of classification stage.
Fig. 6. The calculation process based on BDC.
Then, the Euclidean distance matrix $\hat{A} = (\sqrt{\tilde{a}_{kl}})$ is obtained by taking the element-wise square root. Finally, the BDC matrix $A$ is obtained by subtracting the row means and the column means from $\hat{A}$ and adding the mean of all its elements. The specific calculation equation is as follows.
$$A = \hat{A} - \frac{2}{d}\left(\mathbf{1}\hat{A}\right)_{sym} + \frac{1}{d^2}\mathbf{1}\hat{A}\mathbf{1} \tag{17}$$
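The matrix form of Eqs. (16) and (17) maps directly onto a few NumPy operations. The sketch below is an illustration only: the function name `bdc_from_feature_map` is hypothetical, $(U)_{sym}$ is taken to be $(U + U^{T})/2$, and the size-reduction layer that precedes the BDC computation in Fig. 6 is omitted.

```python
import numpy as np

def sym(U):
    """Symmetric part of a square matrix: (U + U^T) / 2."""
    return 0.5 * (U + U.T)

def bdc_from_feature_map(feat):
    """BDC matrix (d x d) of an h x w x d feature map, channels as observations."""
    h, w, d = feat.shape
    X = feat.reshape(h * w, d)        # each column is one channel
    gram = X.T @ X
    ones = np.ones((d, d))
    # Eq. (16): squared Euclidean distances between all pairs of columns.
    A_tilde = 2.0 * sym(ones @ (gram * np.eye(d))) - 2.0 * gram
    # Element-wise square root; clip guards against tiny negative round-off.
    A_hat = np.sqrt(np.clip(A_tilde, 0.0, None))
    # Eq. (17): subtract row and column means, add the mean of all entries.
    return A_hat - (2.0 / d) * sym(ones @ A_hat) + (ones @ A_hat @ ones) / d ** 2
```

The output size depends only on the number of channels `d`, so images of different spatial sizes would yield BDC matrices of the same shape.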
It is worth noting that the BDC matrix computation can be regarded as a nonparametric, modular pooling layer. Eq. (13) shows that the BDC matrix combined with the Euclidean distance can model the nonlinear relationship between channels. Compared with the covariance matrix, which can only model linear relationships and only considers the marginal distributions, the BDC matrix comprehensively considers the joint distribution, which gives it an advantage in few-shot classification tasks.
4. Case verification
In this section, the performance of the BCMPN is verified on the UConn data set and the laboratory data set. Meanwhile, in line with practical application requirements, we carry out experiments under the FST-ZST setting to test the performance of this method. The comparison models include advanced transfer learning, meta-learning and adversarial learning methods. The structural parameters of the comparison methods are as follows.
MSSA [14]: MSSA is a multi-source subdomain adaptation transfer learning method, which can transfer diagnostic knowledge from multiple sources. The learning rate is 0.01, and one training run includes 10 epochs.
MSTLN [18]: MSTLN is a multi-source transfer learning network, which can transfer diagnostic knowledge from multiple source machines. The learning rate is 0.0005 and the mini-batch size is 64.
TCNN [20]: TCNN is a transferable CNN, which can promote the learning ability under target tasks. The learning rate is 0.01 and the momentum is 0.97, with 100 epochs.
FRAN [21]: FRAN is an unsupervised domain adaptation method, which can relieve domain shifts. The learning rate is 1e-5 and the mini-batch size is 64.
EAPN [33]: The method in this paper is an improvement of the EAPN diagnosis method, which considers the problem of small samples and noise. The learning rate is selected as 0.001.
The computer configuration of all experiments in this paper is as follows: an AMD Ryzen 5 4600U with Radeon Graphics CPU @ 2.10 GHz with 16 GB memory, and an Intel(R) Xeon(R) CPU @ 2.30 GHz with 12 GB memory and a Tesla K80 GPU. The version of Python is 3.7.
In this paper, the evaluation index used to measure the performance of different models is accuracy, which is defined as follows.
$$ACC = \frac{TP + TN}{TP + TN + FP + FN} \tag{18}$$
where TP represents the number of correctly classified positive cases, FP represents the number of incorrectly classified positive cases, FN is the number of incorrectly classified negative cases, and TN is the number of correctly classified negative cases.
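As a small worked example of Eq. (18), the function below computes the accuracy from the four confusion counts. The function name and the counts in the usage line are made up for illustration and are not results from this paper.

```python
def accuracy(tp, tn, fp, fn):
    """Eq. (18): share of correctly classified cases among all cases."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts: 40 + 45 correct decisions out of 100 cases.
example_acc = accuracy(tp=40, tn=45, fp=5, fn=10)
```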
4.1. Experimental Dataset description
There are five datasets in this paper: UConnA, LabE, LabF, LabG and LabH. We assume that the fault types classified in the target domain are the same as those in the source domain.
4.1.1. Case 1: UConn data set
The UConn data set consists of gearbox vibration signals collected by researchers using the dSPACE system at a sampling frequency of 20 kHz. The structure and description of the gearbox used for signal collection are shown in Fig. 7 [42], in which a variety of different gear conditions are introduced into the pinion on the input shaft. Based on the experimental requirements, three kinds of data, including health, root cracking and spalling, are selected for the experiment. The number of sampling points refers to the number of data points in each sample acquired by the vibration sensor, that is, how many points a time-domain waveform consists of. In addition, the number of sampling points directly affects the frequency resolution, so we construct a total of 30 samples for each type, and each sample contains 1024 sampling points. It is worth noting that this number of sampling points covers multiple sampling cycles.
Fig. 7. The gearbox signal acquisition system and 4 kinds of gear health status.
4.1.2. Case 2: Laboratory data set
This paper introduces a gear data set collected under real conditions. As shown in Fig. 8, we install acceleration sensors on the input shaft motor side, the gearbox housing and the output shaft motor side. The gear data of 7 different states are collected at a sampling frequency of 5120 Hz, including health, root cracks of 1 mm, 2 mm and 3 mm, and tooth surface spalling of 1B, 2B and 3B. The specific details are shown in Fig. 9. To highlight the advantages of this method in identifying weak faults with a low damage degree, this paper uses health, 1 mm tooth root crack and 1B tooth surface spalling, which are shown in Table 1. The rotating speed is selected as 1500 rpm and 600 rpm, and the load is set as 0 HP, 2 HP and 4 HP.
4.2. Model structure and parameter selection
In the parameter selection stage, we test the selection of various parameters based on input images of size 250 × 250.
Fig. 10 (a) shows that the highest accuracy is 93.71% when $r = 0.5$. In the cross-machine task, the highest accuracy of 87.39% is obtained when $r = 0.6$. These results suggest that a larger $r$ should be chosen for more difficult tasks. A possible explanation is that, to perform well in a more difficult task, the model should retain the complex classification features at a deeper level and remove the simple classification information extracted at a shallow level. In summary, the experimental results are consistent with intuition.
Fig. 10 (b) compares the diagnostic performance of the model for different ranges of $d$. We found that when the range of $d$ is too small, the diagnostic performance is relatively low. As the range increases, the diagnostic accuracy shows an upward trend. It can be seen that selecting values from different ranges affects the diagnosis process differently, and increasing the range of $d$ enhances the diagnostic performance of the network.
To illustrate the influence of the proposed modules on the diagnosis accuracy at different layers of the network, as well as the influence of the convolution kernel size and dilation rate on the classification ability of the model, we carried out experiments based on the cross-condition diagnosis task LabE-LabF, and the results are shown in Table 2.
From the results in Table 2, it can be observed that the BCMPN 1 model achieves the highest accuracy of 93.71%. We attribute this to the fact that the shallow data features are relatively sparse, so the multi-scale spatial correlation of features can be captured by using the dilation module. As the network deepens, the ELCA module aims to extract high-level abstract features, that is, the combination of shallow features and other information. If the multi-scale information and dense features are combined by using the dilation module in the deep layers of the network, the information flow is likely to become confused, which weakens the diagnostic performance of the network.
4.3. Method comparison
4.3.1. Cross-condition diagnostic experiment
To verify the superiority of the method proposed in this paper, we compared it with authoritative methods based on transfer learning and domain adversarial learning from recent years. The comparison results of cross-condition diagnosis are shown in Fig. 11.
It can be calculated from Fig. 11 that the average diagnostic accuracy of our method under the six experimental conditions is 93.05%. Under the same conditions, the accuracy of our method is higher than that of all comparison methods, which indicates that the proposed method has advantages in classifying the same fault under complex conditions. It is worth noting that under the experimental condition LabF-LabG, the accuracy is lower than the average accuracy. This shows that the diagnostic accuracy of the model decreases when the high-speed condition is the source domain and the low-speed condition is the target domain. We believe that the model faces the problem that low-impact information samples are not trained in the source domain, that is, zero samples in the cross-condition diagnosis task. A possible reason is that the fault impact is strong under high-speed conditions, so the samples contain more intense fault information. Therefore, when the cross-domain task goes from a high-speed condition to a low-speed condition, the problem of zero samples in the target domain greatly affects the performance of the model. The comparison methods can achieve ideal cross-machine diagnosis through domain adaptation and other techniques. However, in the data set of this paper, different working conditions and multiple fault degrees lead to more complex diagnostic tasks. Under these more complex fault tasks, our method is ahead of all comparison methods. In addition, the above analysis and results illustrate the superiority of the proposed method under the target-domain zero-sample problem.
4.3.2. Cross-device diagnostic experiment
It can be seen from Fig. 12 that the average accuracy of the proposed method in cross-device diagnosis reaches 82.32%, which is better than that of the other methods. In addition, when the UConn data set is the source domain and the Lab data set is the target domain, the accuracy of all models is higher than in the reverse direction. A possible reason is that the UConn data set, as a standard sample collected under good laboratory conditions, has more representative features of gear faults. A good data set allows the model to learn more target information and thus achieve a higher accuracy in cross-device diagnosis tasks. According to the time–frequency diagram in Fig. 9, a certain amount of noise was introduced in the data acquisition process, which led to the yellow part of the impact response in the area outside the fault frequency in the figure. Therefore, the fault information that the model can learn from the data is relatively limited. The extreme problem of FST-ZST leads to the decline of the performance of each model in cross-device diagnosis.
Fig. 8. The schematic diagram of the experiment table structure and sensor placement location.
4.4. Ablation experiments
4.4.1. Importance exploration of MMP
To highlight the effectiveness of the proposed MMP stage, we compare it with two image enhancement methods from the image processing field, Cutout [43] and hide-and-seek (HaS) [44]. The specific results are shown in Fig. 13.
From the results, we can find that the MMP proposed in this paper has the best gain effect on the model. To better show the gain brought by the MMP, we compared the number of iterations of the network on the LabE-LabF and LabA-LabE tasks.
As can be seen from Fig. 14, on the two tasks LabE-LabF and LabA-LabE, the curve with the MMP method converges faster and reaches stability in 55 and 70 training epochs, respectively. In contrast, the number of iterations needed to stabilize without the MMP method is 70 and 80, respectively. This is mainly because the MMP establishes a good optimization starting point in the parameter space, which allows it to effectively reduce the number of learning iterations of cross-domain diagnosis and enhance the generalization performance of the model.
Fig. 9. Seven kinds of gear health status based on actual laboratory data set.
Table 1
Different conditions of the gearing on different datasets.
Gearing status  Fault degree  Speed/rpm  Number of samples  Loads  Dataset
Root crack      1 mm          600        30                 2HP    LabE
Spalling        1B            600        30                 2HP    LabE
Healthy         –             600        30                 2HP    LabE
Root crack      1 mm          1500       30                 2HP    LabF
Spalling        1B            1500       30                 2HP    LabF
Healthy         –             1500       30                 2HP    LabF
Root crack      1 mm          600        30                 4HP    LabG
Spalling        1B            600        30                 4HP    LabG
Healthy         –             600        30                 4HP    LabG
Root crack      1 mm          1500       30                 4HP    LabH
Spalling        1B            1500       30                 4HP    LabH
Healthy         –             1500       30                 4HP    LabH
Fig. 10. The selection of parameters based on MMP.
4.4.2. Exploration of different metric distances
In Fig. 15, we compare the diagnostic effect of the inner-product-based metric used in this paper with the traditional cosine distance and Euclidean distance used in current metric learning. It can be seen that our method achieves 80.38% and 79.05% diagnostic accuracy even in the two tasks LabE-UConnA and LabF-UConnA, on which the cosine distance and the Euclidean distance perform the worst. This indicates that the BDC module based on the inner product developed in this paper has better performance in cross-working-condition and cross-device diagnosis tasks.
Table 2
Fault diagnosis results of different module orders.
          BCMPN 1          BCMPN 2          BCMPN 3          BCMPN 4
          Dilation  ELCA   Dilation  ELCA   Dilation  ELCA   Dilation  ELCA
Stage 1   √         –      √         –      –         –      √         –
Stage 2   √         –      √         –      –         –      –         –
Stage 3   –         √      –         –      √         –      √         –
Stage 4   –         –      –         √      √         –      –         –
Stage 5   –         –      –         –      –         √      –         –
Stage 6   –         –      –         –      –         –      –         √
Accuracy  93.71%           93.16%           90.32%           91.47%
Accuracy (%)  LabE-LabF  LabE-LabG  LabE-LabH  LabF-LabG  LabF-LabH  LabG-LabH
Ours          93.71      95.07      90.25      89.71      95.37      94.2
EAPN          80.49      81.37      75.98      76.32      82.1       84.74
TCNN          77.42      80.33      72.97      78.43      79.35      75.44
FRAN          79.14      80.1       76.94      74.31      79.08      80.44
MSSA          83.41      83.99      77.36      76.14      82.33      81.79
MSTLN         82.74      83.29      79.24      76.91      83.1       80.23
Fig. 11. The result of cross-condition diagnostic experiment.
Accuracy (%) by task:
Method  UConnA-LabE  UConnA-LabF  UConnA-LabG  UConnA-LabH  LabE-UConnA  LabF-UConnA  LabG-UConnA  LabH-UConnA
Ours    87.39        87.14        83.3         89.37        80.38        79.05        78.1         72.82
EAPN    77.32        74.54        73.1         75.1         67.38        65.15        66.4         63.19
TCNN    60.17        62.04        58.1         63.47        55.32        53.01        52.35        50.37
FRAN    72.09        68.47        67.03        73.24        65.97        64.19        61.7         58.12
MSSA    75.53        74.26        73.49        78.91        70.3         66.74        64.12        60.77
MSTLN   74.19        73.42        70.34        77.04        68.13        65.32        64.18        60.1
Fig. 12. The result of cross-device diagnostic experiment.
5. Conclusion
This paper presents a novel fault diagnosis method for gear systems
based on the Brownian correlation metric prototypical network (BCMPN)
algorithm. The core idea is to use the knowledge in the limited
source-domain samples to improve performance when zero samples are
available in the target domain. The BCMPN is an improved version of our
previously published EAPN, in which the MMP technology is developed to
establish better generalization performance from both a global and a
local perspective. Meanwhile, the integrated MSFE module and ELCA
module are designed to extract representative features from limited
samples. In the classification stage, we introduce Brownian distance
covariance into the cross-domain fault diagnosis field for the first
time, comprehensively consider the joint distribution, and complete the
fault classification by calculating the inner product. Compared with
advanced methods from recent years, our method achieves higher
diagnostic accuracy in both the cross-condition and cross-device
experiments. Future work includes further improving the model structure
to reduce the training time of the network and combining it with
unsupervised learning methods so that it can be applied to other
data-driven tasks.
The current diagnosis mechanism still leaves room for improvement in
computing time. In the future, the model structure and distance
measurement mechanism will be improved to reduce computing time.
Fig. 13. Result of different image enhancement methods.
Fig. 14. Result of different image enhancement methods.
Fig. 15. Result of different methods based on different metric distances.
Declaration of Competing Interest
The authors declare that they have no known competing financial
interests or personal relationships that could have appeared to influence
the work reported in this paper.
Data availability
The authors do not have permission to share data.
Acknowledgment
This work was supported by the Natural Science Foundation of
Heilongjiang Province of China (Grant No. LH2021F021).
References
[1] S. Liu, H. Jiang, Y. Wang, K. Zhu, C. Liu, A deep feature alignment adaptation
network for rolling bearing intelligent fault diagnosis, Adv. Eng. Inform. 52 (2022),
101598.
[2] Z. Huang, et al., A Multisource Dense Adaptation Adversarial Network for Fault
Diagnosis of Machinery, IEEE Trans. Ind. Electron. 69 (6) (2022) 6298–6307.
[3] Y. Wang, B. Qin, K. Liu, M. Shen, M. Niu, L. Han, A New Multitask Learning Method
for Tool Wear Condition and Part Surface Quality Prediction, IEEE Trans. Ind. Inf.
17 (9) (2020) 6023–6033.
[4] P. Xia, Y. Huang, P. Li, C. Liu, L. Shi, Fault Knowledge Transfer Assisted Ensemble
Method for Remaining Useful Life Prediction, IEEE Trans. Ind. Inf. 18 (3) (2021)
1758–1769.
[5] Z. Wang, J. Xuan, Intelligent fault recognition framework by using deep
reinforcement learning with one dimension convolution and improved actor-critic
algorithm, Adv. Eng. Inform. 49 (2021), 101315.
[6] X. Gu, Y. Zhao, G. Yang, L. Li, An Imbalance Modified Convolutional Neural
Network With Incremental Learning for Chemical Fault Diagnosis, IEEE Trans. Ind.
Inf. 18 (6) (2021) 3630–3639.
[7] X. Wang, L. Luo, L. Tang, Z. Yang, Automatic representation and detection of fault
bearings in in-wheel motors under variable load conditions, Adv. Eng. Inform. 49
(2021), 101321.
[8] C. Wang, Z. Xu, An intelligent fault diagnosis model based on deep neural network
for few-shot fault diagnosis, Neurocomputing 456 (2021) 550–562.
[9] S. Zhang, Y. Li, W. Cui, R. Yang, J. Dong, Hu, Limited data rolling bearing fault
diagnosis with few-shot learning, IEEE Access 7 (2019) 110895–110904.
[10] S. Li, A. Li, Q. Zhang, Z. He, J. Liao, Hu, Meta-learning for few-shot bearing fault
diagnosis under complex working conditions, Neurocomputing 439 (2021)
197–211.
[11] J. Fan, X. Yuan, Z. Miao, Z. Sun, X. Mei, F. Zhou, Full Attention Wasserstein GAN
With Gradient Normalization for Fault Diagnosis Under Imbalanced Data, IEEE
Trans. Ins. Mea 71 (2022) 1–16.
[12] Y. Zhou, Y. Ning, et al., Deep Dynamic Adaptive Transfer Network for Rolling
Bearing Fault Diagnosis with Considering Cross-machine Instance, IEEE Trans. Ins.
Mea 70 (2021).
[13] Y. Zou, K. Shi, Y. Liu, G. Ding, K. Ding, Rolling bearing transfer fault diagnosis
method based on adversarial variational autoencoder network, Meas. Sci. Technol.
32 (11) (2021), 115017.
[14] J. Tian, D. Han, M. Li, P. Shi, A multi-source information transfer learning method
with subdomain adaptation for cross-domain fault diagnosis, Knowl.-Based Syst.
243 (2020), 108466.
[15] J. Wang, S. Ji, B. Han, H. Bao, X. Jiang, Deep Adaptive Adversarial Network-Based
Method for Mechanical Fault Diagnosis under Different Working Conditions,
Complexity 2020 (2020) 6946702.
[16] L. Wan, Y. Li, K. Chen, K. Gong, C. Li, A novel deep convolution multi-adversarial
domain adaptation model for rolling bearing fault diagnosis, Measurement 191
(2022), 110752.
[17] L. Chen, Q. Li, C. Shen, J. Zhu, D. Wang, M. Xia, Adversarial Domain-Invariant
Generalization: A Generic Domain-Regressive Framework for Bearing Fault
Diagnosis Under Unseen Conditions, IEEE Trans. Ind. Inf. 18 (3) (2021)
1790–1800.
[18] B. Yang, S. Xu, Y. Lei, C. Lee, S. Edward, R. Clive, Multi-source transfer learning
network to complement knowledge for intelligent diagnosis of machines with
unseen faults, Mech. Syst. Signal Process. 162 (2022), 108095.
[19] S. Liu, H. Wang, J. Tang, X. Zhang, Research on fault diagnosis of gas turbine rotor
based on adversarial discriminative domain adaption transfer learning,
Measurement 196 (2022), 111174.
[20] Z. Chen, K. Gryllias, W. Li, Intelligent Fault Diagnosis for Rotary Machinery Using
Transferable Convolutional Neural Network, IEEE Trans. Ind. Inf. 16 (1) (2020)
339–349.
[21] J. Chen, J. Wang, J. Zhu, T.H. Lee, C.W. Silva, Unsupervised Cross-Domain Fault
Diagnosis Using Feature Representation Alignment Networks for Rotating
Machinery, IEEE ASME Trans. Mechatron. 26 (5) (2020) 2770–2781.
[22] F. Zhou, S. Yang, H. Fujita, D. Chen, C. Wen, Deep learning fault diagnosis method
based on global optimization GAN for unbalanced data, Knowl.-Based Syst. 187
(2020), 104837.
[23] C. Zheng, C. Zhao, Fault-Prototypical Adapted Network for Cross-Domain
Industrial Intelligent Diagnosis, IEEE Trans. Autom. Sci. Eng. 1–10 (2021).
[24] C. Jiang, H. Chen, Q. Xu, X. Wang, Few-shot fault diagnosis of rotating machinery
with two-branch prototypical networks, J. Intell. Manuf. (2022).
[25] C. Qian, Q. Jiang, Y. Shen, C. Huo, Q. Zhang, An intelligent fault diagnosis method
for rolling bearings based on feature transfer with improved DenseNet and joint
distribution adaptation, Meas. Sci. Technol. 33 (2) (2022), 025101.
[26] T.M. Cover, J.A. Thomas, Elements of information theory (2005) 463–508,
https://doi.org/10.1002/047174882X.ch14.
[27] C. Zhang, Y. Cai, G. Lin, C. Shen, DeepEMD: Few-shot image classification with
differentiable earth mover’s distance and structured classifiers, (2020)
https://doi.org/10.48550/arXiv.2003.06777.
[28] D. Wertheimer, L. Tang, B. Hariharan, Few-Shot Classification with Feature Map
Reconstruction Networks, (2021) https://doi.org/10.48550/arXiv.2012.01506.
[29] B. Christopher, Pattern Recognition and Machine Learning, Springer. (2006),
https://doi.org/10.18637/jss.v017.b05.
[30] MI. Belghazi, A. Baratin, S. Rajeswar, S. Ozair, Y. Bengio, A. Courville, RD. Hjelm,
MINE: Mutual Information Neural Estimation. (2018)
https://doi.org/10.48550/arXiv.1801.04062.
[31] S. Gabor, R. Maria, Brownian distance covariance, Ann. Appl. Stat. 3 (4) (2009)
1236–1265.
[32] J. Xie, F. Long, J. Lv, Q. Wang, P. Li, Joint Distribution Matters: Deep Brownian
Distance Covariance for Few-Shot Classification, (2022)
https://doi.org/10.48550/arXiv.2204.04567.
[33] C. Wang, H. Sun, X. Cao, Construction of the efficient attention prototypical net
based on the time-frequency characterization of vibration signals under noisy small
sample, Measurement 179 (2021), 109412.
[34] Z. Yan, M. Ayaho, Z. Jiang, X. Liu, An overall theoretical description of frequency
slice wavelet transform, Mech. Syst. Signal Process. 24 (2) (2010) 491–507.
[35] H. Sun, X. Cao, C. Dong, S. Gao, An interpretable anti-noise network for rolling
bearing fault diagnosis based on FSWT, Measurement 190 (2022), 110698.
[36] J. Joshua, G. Ashok, Perceptually grounded self-diagnosis and self-repair of domain
knowledge, Knowl.-Based Syst. 27 (2012) 281–301.
[37] E. Belouadah, A. Popescu, I. Kanellos, A Comprehensive Study of Class Incremental
Learning Algorithms for Visual Tasks, (2020)
https://doi.org/10.48550/arXiv.2011.01844.
[38] M. Lu, H. Liu, X. Yuan, Thermal fault diagnosis of electrical equipment in
substations based on image fusion, Trait. Signal. 38 (4) (2021) 1095–1102.
[39] P. Chen, S. Liu, H. Zhao, J. Jia, GridMask Data Augmentation (2020),
https://doi.org/10.48550/arXiv.2001.04086.
[40] T. Yao, X. Yi, DZ. Cheng, F. Yu, T. Chen, A. Menon, L. Hong, EH. Chi, S. Tjoa, J.
Kang, Self-supervised Learning for Large-scale Item Recommendations, (2020)
https://doi.org/10.48550/arXiv.2007.12865.
[41] F. Yu, V. Koltun, Multi-Scale Context Aggregation by Dilated Convolutions, ICLR.
(2016), https://doi.org/10.48550/arXiv.1511.07122.
[42] P. Cao, S. Zhang, J. Tang, Preprocessing-Free Gear Fault Diagnosis Using Small
Datasets With Deep Convolutional Neural Network-Based Transfer Learning, IEEE
Access 6 (2018) 26241–26253.
[43] T. DeVries, G. W. Taylor, Improved regularization of convolutional neural
networks with cutout, (2017) https://doi.org/10.48550/arXiv.1708.04552.
[44] K. Kumar, Y. Hao, A. Sarmasi, G. Pradeep, Y. Jae, Hide-and-seek: A data
augmentation technique for weakly-supervised localization and beyond, (2018)
https://doi.org/10.48550/arXiv.1811.02545.