Please cite this article in press as: M. Yuwono, et al., Unsupervised feature selection using swarm intelligence and consensus clus-
tering for automatic fault detection and diagnosis in Heating Ventilation and Air Conditioning systems, Appl. Soft Comput. J. (2015),
http://dx.doi.org/10.1016/j.asoc.2015.05.030
ARTICLE IN PRESSG Model
ASOC29831–24
Applied Soft Computing xxx (2015) xxx–xxx
Contents lists available at ScienceDirect
Applied Soft Computing
journal homepage: www.elsevier.com/locate/asoc
Unsupervised feature selection using swarm intelligence and
consensus clustering for automatic fault detection and diagnosis in
Heating Ventilation and Air Conditioning systems
Mitchell Yuwono a,∗, Ying Guo b, Josh Wall c, Jiaming Li b, Sam West c, Glenn Platt c, Steven W. Su a
a Faculty of Engineering and Information Technology, University of Technology, Sydney (UTS), 15 Broadway, Ultimo, NSW 2007, Australia
b The Commonwealth Scientific and Industrial Research Organisation (CSIRO), Division of Computational Informatics, Marsfield, NSW 2122, Australia
c The Commonwealth Scientific and Industrial Research Organisation (CSIRO), Division of Energy Technology, Mayfield West, NSW 2304, Australia
Article info
Article history:
Received 4 May 2014
Received in revised form 12 February 2015
Accepted 17 May 2015
Available online xxx
Keywords:
Data clustering
Consensus clustering
Feature selection
Ensemble Rapid Centroid Estimation (ERCE)
Particle Swarm Optimization
Fault detection and diagnosis
Heating Ventilation and Air Conditioning (HVAC) system
Nonlinear Auto-Regressive Neural Network with eXogenous inputs and distributed time delays (NARX-TDNN)
Hidden Markov Model
Abstract
Various sensory and control signals in a Heating Ventilation and Air Conditioning (HVAC) system are closely interrelated, which gives rise to severe redundancies between the original signals. These redundancies may cripple the generalization capability of an automatic fault detection and diagnosis (AFDD) algorithm. This paper proposes an unsupervised feature selection approach and its application to AFDD in an HVAC system. Using Ensemble Rapid Centroid Estimation (ERCE), the important features are automatically selected from the original measurements based on the relative entropy between the low- and high-frequency features. The material used is the experimental HVAC fault data from the ASHRAE-1312-RP datasets, containing a total of 49 days of various types of faults and corresponding severity. The features selected using ERCE (Median normalized mutual information (NMI) = 0.019) achieved the least redundancies compared to those selected using manual selection (Median NMI = 0.0199), Complete Linkage (Median NMI = 0.1305), Evidence Accumulation K-means (Median NMI = 0.04) and Weighted Evidence Accumulation K-means (Median NMI = 0.048). The effectiveness of the feature selection method is further investigated using two well-established time-sequence classification algorithms: (a) Nonlinear Auto-Regressive Neural Network with eXogenous inputs and distributed time delays (NARX-TDNN); and (b) Hidden Markov Models (HMM). Weighted average sensitivity and specificity of (a) higher than 99% and 96% for NARX-TDNN, and (b) higher than 98% and 86% for HMM are observed. The proposed feature selection algorithm could potentially be applied to other model-based systems to improve the fault detection performance.
© 2015 Published by Elsevier B.V.
∗ Corresponding author. Tel.: +61 430731938.
E-mail addresses: mitchellyuwono@gmail.com (M. Yuwono), Ying.Guo@csiro.au (Y. Guo), Josh.Wall@csiro.au (J. Wall), Jiaming.Li@csiro.au (J. Li), Sam.West@csiro.au (S. West), Glenn.Platt@csiro.au (G. Platt), Steven.Su@uts.edu.au (S.W. Su).
1. Introduction
Heating Ventilation and Air Conditioning (HVAC) systems are important for maintaining thermal comfort and indoor air quality at places such as offices, shopping malls, warehouses, schools, and homes [1,2]. According to a CSIRO report [3], commercial buildings account for 25% of energy consumption in Australia. Moreover, HVAC systems represent 40–50% of energy use in these buildings [4]. In the United States (US), HVAC systems account for almost 31% of the electricity consumed by households [1]. Operational problems in HVAC systems can cause excess energy consumption. Regular checks and maintenance are therefore crucial to prevent unnecessary consumption. However, due to the high cost of reactive maintenance, preventive or predictive maintenance practices are usually preferred.
Discriminating a normally behaving HVAC system from a fault condition is a relatively well-researched area. A variety of automatic fault detection and diagnosis (AFDD) techniques provide a number of benefits to HVAC systems [5–7]. The AFDD techniques currently available on the market for HVAC systems are mainly rule-based approaches [8–10], which use prior knowledge to derive a set of if-then-else rules and an inference mechanism that searches through the rule-space to draw conclusions. Rule-based systems can be based solely on expert knowledge (inferred from experience) or on prior knowledge of a specific
system. Being among the very first methods used in HVAC fault detection problems, rule-based approaches have been the most popular over the last decades.
Rule-based approaches indeed come with advantages, including ease of development, transparent reasoning, the ability to reason even under uncertainty, and the ability to provide explanations for the conclusions reached. However, most HVAC systems are installed in different buildings/environments. This generally means that rules or analytical models developed for a particular system cannot be easily applied to an alternative system. As such, the difficult process of determining and setting rules or generating analytical mathematical models must be tailored to each individual building/environment. The threshold method utilized in rule-based systems is prone to producing false alarms. Moreover, building conditions such as the internal architectural design and even external factors (such as shading and the growth of plant life) often change after the installation/initialization of a fault detection system, which can require rules/models that were originally appropriate to be revisited and updated. The weaknesses of this type of approach therefore include the requirement of specific tailoring to a system, potential failure of the AFDD system due to its limited knowledge boundaries, and difficulty in updating the model when the AFDD system is installed in a different HVAC system. These complications with the rule-based approach have given rise to data-driven methods for AFDD in HVAC systems.
Regardless of the approach, the performance of an AFDD algorithm generally depends on the quality of the features. At CSIRO, we are developing a novel data-driven machine learning technique for AFDD in HVAC systems [4,11–14]. Preliminary results presented in [11–14] showed that the machine learning-based technique outperforms rule-based methods in detecting air-handling unit (AHU) faults on fault data obtained from ASHRAE Project 1312-RP, with up to 90% accuracy [13]. However, one limitation of the AFDD systems described in [11–13] is that they rely on features provided by field experts. As with rules, features that are particularly effective for one system may not guarantee equivalent performance when utilized in an alternative system.
Selecting the appropriate features is essential in any model-based framework. Feature selection aims to minimize redundancies/mutual information between features such that the more important ‘characteristic’ features are not undermined. Specific faults exhibit specific symptoms which are observable only in certain clusters of features that behave differently to the others. The difficulty is that these clusters of features need to be constantly monitored, as they may change dynamically depending on the condition of the HVAC system under investigation. Moreover, incorrect selections of these characteristic features are dangerous, as they may adversely affect the final classifier to an extent that some obvious faults are overlooked. The motivation of this paper is therefore to design a reliable method for feature selection that can be used to augment the effectiveness of AFDD frameworks in general. The unsupervised data-driven feature selection algorithm is designed for HVAC systems operating under varying seasonal dynamics.
Evolutionary algorithms are particularly powerful for solving
complex optimization problems with multiple local minima. For
example, Differential Evolution (DE) has been used for optimization
of pressure vessel structure design [15] and joint replenish-
ment and distribution model [16]. Although the methods outlined
in [15,16] are powerful for general purpose optimization, a
major algorithmic restructuring is required to implement these
algorithms for cluster optimization. Instead, our paper is inter-
ested in exploiting a lightweight evolutionary algorithm designed
specifically for clustering purposes, the Rapid Centroid Estimation
(RCE) [17].
Unsupervised feature selection based on data clustering is inher-
ently an ill-posed problem where the goal is to group redundant
features into some unknown number of clusters based on intrin-
sic information alone. For this paper, we utilize the Ensemble Rapid
Centroid Estimation (ERCE) [17,18], a semi-stochastic multi-swarm
clustering algorithm inspired by the Particle Swarm Optimization
(PSO [19]), to determine the characteristic features for the specific
season. The method is designed to automate the selection of charac-
teristic features in each season. The block diagram of the proposed
method is shown in Fig. 1.
The performance of the proposed feature selection algorithm was tested using two well-established time-sequence classifiers: (a) Nonlinear Auto-Regressive Time Delay Neural Networks with eXogenous inputs (NARX-TDNN); and (b) Hidden Markov Models (HMM) [13]. A comprehensive comparison is also given against other feature selection methods, including Li’s manual selection [20], Complete Linkage (CL), Ensemble Evidence Accumulation K-means (EAC K-means) and Weighted Evidence Accumulation K-means (WEAC K-means).
The paper is structured as follows: Section 2 presents the
overview of the proposed method as well as the materials used to
examine its performance. Section 3 presents the detailed descrip-
tion for each component including feature extraction, feature
selection, and the classifier used in experiment. Section 4 describes
the theoretical foundations of the consensus clustering algorithm
that we utilize for performing the feature selection. Section 5
describes the data utilized in the experiments. Section 6 presents
a comprehensive experimental result of the proposed method and
comparative analysis with other conventional feature selection and
classification algorithms. Section 7 presents in depth analyses and
discussion regarding the results. Finally, Section 8 presents the con-
clusion and future direction of the research.
2. General overview on HVAC systems
HVAC systems are configured and used to control the environ-
ment of a building or a zone including one or several rooms. The
environmental variables may, for example, include temperature,
air-flow, and humidity. The desired values/set-points of the envi-
ronmental variables will depend on the intended use of the HVAC
system. If the HVAC system is being used in an office building, the
environmental variables will be set to make the building/rooms
therein comfortable to humans. An HVAC system typically services
a number of zones within a building. The system normally includes
a central plant which includes:
• a hydronic heater and chiller,
• a pump system, which may include dedicated heated and chilled water pumps, that circulates heated and chilled water from the heater and chiller through a circuit of interconnected pipes, and
• a valve system, which may include dedicated heated and chilled water valves, that controls the flow of water into a heat exchange system (which may include dedicated heated and chilled water coils).
The heated and/or chilled water circulates through the heat
exchange system before being returned to the central plant where
the process repeats (i.e. the water is heated or chilled and recircu-
lated). In the heat exchange system, energy from the heated/chilled
water is exchanged with air being circulated through an air distri-
bution system.
The HVAC system also includes a sensing system which typically
includes a number of sensors located throughout the system, such
as temperature, humidity, air velocity, volumetric flow, pressure, gas, position, and occupancy detection sensors. The HVAC system is controlled by a control system that may be a stand-alone system, or may form part of a building automation system (BAS) or building management and control system (BMCS). The control system includes a computing system which is in communication with the various components of the HVAC system. The control system controls and/or receives feedback from the various components of the HVAC system in order to regulate environmental conditions for the inhabitancy or functional purpose of the building.

Fig. 1. Block diagram of the proposed method.
In an AFDD process, data from the components of the HVAC
system is received. This data may, for example, include sensed data
from various sensors within the system and feedback data from
various components of the system. Additional data from external
data sources can also be received, such as the external weather
data. Consequently, the dimensionality and volume of these data
are enormous.
In order to ensure proper identification of faults, an AFDD algorithm requires redundancies in the selected sensory and control signal sources to be minimized. The additional information given by redundant features is irrelevant, provides nothing useful for describing the type of fault, and will ultimately cripple the generalization capability of the fault detector. Insufficient features are equally dangerous, as they may lead to misdiagnoses due to incomplete information.
The method presented in this paper offers an unsupervised approach to feature selection using ERCE. The system is summarized in the block diagram in Fig. 1. A sample feature extraction and feature selection result using our proposed approach can be seen in Fig. 2.
The experimental materials in this paper are the fault data from the ASHRAE Project 1312-RP datasets, including Summer 2007, Spring 2008, and Winter 2008. In each season, different faults were generated, recorded and reported for experimental use.
3. Methods
Selecting important features in a HVAC system is challenging
due to the excessive interrelations between signals. This section
overviews our contribution on feature selection using consensus
clustering and how it is applied for the HVAC system in particular.
The section is subdivided into five subsections:
• Section 3.1 outlines the general model that we use for extracting
magnitude and oscillation (spectral centroid) features from a raw
signal.
• Section 3.2 outlines our proposed polar approach for visualizing
multi-dimensional patterns.
• Section 3.3 defines the measure that we use for quantifying the
degree of dissimilarity between features.
• Section 3.4 provides the general overview of our main contri-
bution, a method for feature selection using semi-stochastic
swarm-based consensus clustering, which will be further
detailed in Section 4.
• Section 3.5 shows the architecture of the neural networks that we
use to benchmark the efficiency of the proposed feature selection
method.
3.1. Extracting time signal features: magnitude and spectral
centroid
Sensory signals from an HVAC system are streamed in the form of sampled time signals. From each time signal, HVAC engineers observe two main features when deciding the condition of the system:
1. Whether the average magnitude of a sensory reading is within the typical range for the specific season.
2. Whether there is any excessive oscillation in the sensory readings compared to the typical condition for the specific season.
For example, a fault type classified as Sequence of Heating and Cooling Unstable (HCSF0517) can be identified by observing the excessive oscillation of the Chilled Water Coil control signal (CHWC GPM). The phenomenon can be seen in Fig. 3, where the moving average magnitude of the CHWC GPM during HCSF0517 is very close to the typical behavior, so the fault is visible only in the oscillation.
We model these two features mathematically as the moving average magnitude and the spectral centroid. For a discrete signal $g_s(n)$, the two features can be calculated as follows.
The magnitude characteristic is measured using a simple moving average,
$$\mathrm{MAG}(g_s) = \frac{1}{N} \sum_{n=1}^{N} g_s(n), \qquad (1)$$
where $n$ denotes the sample number and $N$ denotes the length of the window.
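As a minimal sketch of Eq. (1), the moving average can be computed with a running sum over a sliding window. The function name `moving_average_magnitude` and the choice of emitting one value per full window are assumptions made for illustration; the paper does not state how windows overlap.

```python
def moving_average_magnitude(signal, window):
    """Return the mean of every full sliding window of `window` samples (cf. Eq. (1))."""
    if window < 1 or window > len(signal):
        raise ValueError("window must be between 1 and len(signal)")
    running = sum(signal[:window])        # sum of the first window
    out = [running / window]
    for i in range(window, len(signal)):
        # Slide the window by one sample: add the new sample, drop the oldest.
        running += signal[i] - signal[i - window]
        out.append(running / window)
    return out


# Illustrative data only, not taken from the ASHRAE datasets.
mag = moving_average_magnitude([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
```

The running sum keeps the cost linear in the signal length, which may matter when many sensor channels are processed per season.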
The spectral centroid of a signal describes the center of mass of its spectrum, which can be calculated as follows,
$$\hat{g}_s = \mathrm{FFT}(g_s, N_{FFT}), \qquad (2)$$
$$\mathrm{SC}(g_s) = \frac{\sum_{n=5}^{N_{FFT}} |\hat{g}_s(n)|\,\hat{g}_s(n)}{\sum_{n=5}^{N_{FFT}} |\hat{g}_s(n)|}, \qquad (3)$$
where FFT denotes the fast Fourier transform, $N_{FFT}$ indicates the number of bins, and $\hat{g}_s(n)$ and $|\hat{g}_s(n)|$ represent the center frequency and magnitude of the $n$th bin. Notice that the frequency centroid is calculated from the fifth bin onwards to isolate only the high-frequency oscillation.

Fig. 2. (a) Raw signals for the Spring 2008 dataset; (b) the low and high frequency features are isolated from each signal. Signals 1–160 are moving average magnitude signals while signals 161–320 are spectral centroid signals; (c) characteristic features are selected using ERCE, while (d) classification is done using NARX-TDNN.
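The spectral centroid of Eqs. (2) and (3) can be sketched as a magnitude-weighted mean of bin frequencies. The sketch below uses a naive DFT from the standard library rather than an FFT, and it makes several assumptions that the text does not confirm: the sum stops at the Nyquist bin instead of $N_{FFT}$, bin `k` is given the center frequency `k * sample_rate / n_fft`, and `first_bin=5` is used directly as a zero-based index. All names are hypothetical.

```python
import cmath
import math


def spectral_centroid(signal, n_fft, sample_rate=1.0, first_bin=5):
    """Magnitude-weighted mean frequency over bins first_bin..n_fft//2 (cf. Eq. (3))."""
    # Zero-pad or truncate to n_fft samples, as a length-n_fft transform would.
    x = list(signal[:n_fft]) + [0.0] * max(0, n_fft - len(signal))
    num = 0.0
    den = 0.0
    for k in range(first_bin, n_fft // 2 + 1):
        # Naive DFT coefficient of bin k (O(n_fft) per bin; fine for a sketch).
        coeff = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n in range(n_fft))
        magnitude = abs(coeff)
        frequency = k * sample_rate / n_fft   # assumed center frequency of bin k
        num += magnitude * frequency
        den += magnitude
    return num / den if den > 0.0 else 0.0


# A pure tone placed exactly on bin 8 of a 64-point window.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(64)]
sc = spectral_centroid(tone, n_fft=64)   # close to 8 / 64 = 0.125
```

Skipping the lowest bins removes the slowly varying component that the moving average already captures, so the two features describe different parts of the spectrum.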
A fault can be interpreted as ‘how much a signal deviates from its typical characteristic during the specific season’. Incorporating this criterion, each feature vector $q_s$, which includes {MAG($g_s$), SC($g_s$)}, is normalized with respect to its normal operation. The discrepancy in both direction and magnitude relative to the normal signal is represented as a signed multiple of the signal’s standard deviation during typical operation,
$$z_s(n) = \frac{q_s(n) - \mu_n(n)}{\sigma_n(n)}, \qquad (4)$$
where $\mu_n(n)$ and $\sigma_n(n)$ denote the mean and standard deviation of a feature during its normal operation at a specific sample $n$ taken at a particular time of the day. In other words, the approach calculates the cross-sectional z-score of the feature $q_s$.
The hyperbolic tangent kernel is then applied to the z-score, effectively transforming each feature to a continuous measure in the interval (−1, 1) as follows,
$$y_s(n) = \tanh(z_s(n)), \qquad (5)$$
which has a rather intuitive ‘fuzzy’ interpretation:
(a) $y_s(n) = 0$: feature is at a typical level,
(b) $y_s(n) \to -1$: feature is atypical negative (much smaller than its typical level),
(c) $y_s(n) \to 1$: feature is atypical positive (much larger than its typical level).
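The normalization of Eq. (4) followed by the squashing of Eq. (5) can be sketched as below. The function name `squash_feature` and the per-sample lists of normal-operation statistics are illustrative assumptions; the sketch also assumes every standard deviation is strictly positive.

```python
import math


def squash_feature(q, normal_mean, normal_std):
    """Z-score each sample against normal operation, then map it into (-1, 1)."""
    y = []
    for q_n, mu_n, sigma_n in zip(q, normal_mean, normal_std):
        z_n = (q_n - mu_n) / sigma_n   # Eq. (4): signed number of standard deviations
        y.append(math.tanh(z_n))       # Eq. (5): bounded 'fuzzy' atypicality score
    return y


# Illustrative values: one typical sample, one 2 sigma above, one 3 sigma below.
y = squash_feature([10.0, 14.0, 4.0], [10.0, 10.0, 10.0], [2.0, 2.0, 2.0])
```

Because tanh saturates, a reading ten standard deviations out scores almost the same as one five standard deviations out, which keeps extreme outliers from dominating later distance calculations.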
Intuitively, the variability of $y_s$ throughout the season provides a good indicator of its importance. In this paper, we measure the variability of a feature in terms of its entropy,
$$H_{y_s} = -\int p_{y_s}(x) \log p_{y_s}(x)\,dx, \qquad (6)$$
where $p_{y_s}(x)$ can be approximated empirically from the histogram of $y_s$.
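A rough way to approximate Eq. (6) from a histogram is sketched here. The bin count of 20, the equal-width binning over [−1, 1], and the name `histogram_entropy` are assumptions for illustration; the result is the discrete entropy of the bin probabilities, which differs from the differential entropy by a constant that depends only on the bin width.

```python
import math


def histogram_entropy(y, n_bins=20, lo=-1.0, hi=1.0):
    """Estimate the entropy (in nats) of samples in [lo, hi] from an equal-width histogram."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in y:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), n_bins - 1)   # clamp so that v == hi falls in the last bin
        counts[idx] += 1
    total = len(y)
    # Discrete approximation of Eq. (6); empty bins contribute nothing.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


constant = histogram_entropy([0.0] * 100)                      # feature that never moves
spread = histogram_entropy([i / 50 - 1 for i in range(100)])   # feature sweeping the range
```

With a fixed bin width for every feature, the entropies remain comparable across features, which is all the ranking step needs.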
3.2. Feature visualization
Visualization is an important tool to verify the effectiveness of a
feature selection algorithm. However, due to the complexity of an
HVAC system, simultaneous visualization would easily overwhelm
the observer.
In this paper a polar approach for visualizing patterns consti-
tuted by multi-dimensional feature cross-sections is proposed. The
visualization scheme can be seen in Fig. 4.
In the proposed visualization scheme, each variable number is placed at a particular angle on the circle, and the corresponding radius represents the magnitude of ys, as detailed in Eq. (5). A normal system oscillates inside the typical region (ys = 0) such that the polar plot shows a circle-like pattern. During a fault condition some sensors move into the positive or negative atypical region, so the polar plot assumes shapes other than a circle. For example, Fig. 5 shows that the pattern during normal operation is visually different from that of the OA Damper Stuck (OADS) fault scenario.
3.3. Measuring divergence between features
A pair of feature vectors y1 ∈ Y and y2 ∈ Y calculated from Eq.
(5) can be treated as a vector of random numbers generated by the
probability distribution functions P = p(x) and Q = q(x), respectively.
y1 and y2 can be assumed to be redundant (i.e. generated from
the same distribution) when the Kullback–Leibler (KL) divergence
between the two approaches zero [21]. A practical illustration of the case can be seen in Fig. 6.

Fig. 3. The magnitude (top) and frequency (bottom) characteristics of the Chilled Water Control signal (CHWC GPM) during fault (HCSF0517) vs. normal (NOR0505). Even though CHWC GPM during HCSF0517 is correlated in terms of magnitude characteristic, the signal is uncorrelated in terms of frequency characteristic.
KL-divergence measures the relative entropy between two distributions [21], i.e. the amount of information lost when Q is used to approximate P, as follows,
$$\mathrm{KL}(P\|Q) = \underbrace{-\sum_{x} p(x) \log q(x)}_{H(P,Q)} + \underbrace{\sum_{x} p(x) \log p(x)}_{-H(P)}, \qquad (7)$$
$$\mathrm{KL}(P\|Q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}, \qquad (8)$$
where H(P, Q) denotes the cross entropy between P and Q and H(P) denotes the information entropy of P. In this paper we use the symmetrical KL-divergence as originally proposed in [21], due to its symmetrical property, as follows,
$$\mathrm{KL}_s(P\|Q) = \mathrm{KL}(P\|Q) + \mathrm{KL}(Q\|P) = \sum_{x} \left[ p(x) \log \frac{p(x)}{q(x)} - q(x) \log \frac{p(x)}{q(x)} \right]. \qquad (9)$$
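Eq. (9) collapses to a single sum of $(p(x) - q(x)) \log(p(x)/q(x))$, which the following sketch evaluates for two discrete distributions such as normalized histograms. The smoothing constant `eps` is an assumption added here to avoid taking the logarithm of an empty bin; it is not part of the paper's definition, and the function name is hypothetical.

```python
import math


def symmetric_kl(p, q, eps=1e-12):
    """Symmetrical KL divergence (cf. Eq. (9)) between two discrete distributions."""
    # Assumption: add eps to every bin and renormalize so that log(0) never occurs.
    p = [pi + eps for pi in p]
    q = [qi + eps for qi in q]
    sum_p, sum_q = sum(p), sum(q)
    p = [pi / sum_p for pi in p]
    q = [qi / sum_q for qi in q]
    # KL(P||Q) + KL(Q||P) = sum over x of (p - q) * log(p / q)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


same = symmetric_kl([0.5, 0.5], [0.5, 0.5])   # identical distributions
diff = symmetric_kl([0.9, 0.1], [0.1, 0.9])   # mirrored distributions
```

A value near zero suggests that two features are redundant in the sense described above, while larger values indicate that they carry different information.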
3.4. Feature selection using consensus clustering
Performing feature selection using prototype-based algorithms such as K-means, fuzzy C-means, or Self Organizing Map (SOM) can be difficult because the number of characteristic features K is not initially known. Consensus clustering provides quantitative evidence for determining the number and membership of possible clusters within a dataset (in our case, features). The method has gained popularity in cancer genomics as a powerful tool to extract and visualize the dependencies between genes [22–24].
In this paper we propose an approach for unsupervised feature selection using a swarm-based ensemble algorithm [18]. An advantage of ensemble clustering algorithms over conventional clustering algorithms is that they allow a robust estimation of natural clusters by investigating the consensus strength between multiple clusterings [22,25,26]. Consensus clustering is particularly powerful for identifying strong clusters in the data [22]. This is useful for our application, as can be seen in Section 6, where the features selected using consensus clustering algorithms are generally more compact and less redundant than the ones selected using complete-linkage.
Fig. 4. The proposed polar visualization scheme. In this illustration, we can see that features other than features #4 and #5 behave atypically.
The feature selection process can be summarized as follows:
1. Determine the feature clusters using consensus clustering.
2. For each cluster, rank each feature according to its entropy and
pick one whose entropy is the highest as the characteristic fea-
ture for the cluster.
A sample result of a run of the feature selection process using consensus clustering is shown in Fig. 7. Features in the same cluster are denoted using the same color. The radius of each feature indicates its entropy. The bold circle in each cluster marks the chosen characteristic feature, which is the feature with the highest entropy in that cluster.
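Step 2 of the selection process can be sketched as a single pass over the features once cluster labels and entropies are available. The cluster labels here stand in for the output of any consensus clustering run; the function name and the toy values are hypothetical.

```python
def select_characteristic_features(cluster_labels, entropies):
    """Return a dict mapping each cluster label to the index of its highest-entropy feature."""
    best = {}
    for idx, (label, entropy) in enumerate(zip(cluster_labels, entropies)):
        # Keep the first feature seen for a cluster, then replace it only if beaten.
        if label not in best or entropy > entropies[best[label]]:
            best[label] = idx
    return best


# Five illustrative features grouped into two clusters.
labels = [0, 0, 1, 1, 1]
entropies = [0.2, 0.9, 0.5, 0.1, 0.7]
chosen = select_characteristic_features(labels, entropies)
```

In this toy case feature 1 represents cluster 0 and feature 4 represents cluster 1, so five features are reduced to two.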
3.5. Fault classification using Nonlinear Auto-Regressive Neural
Network with eXogenous inputs and distributed time delays
(NARX-TDNN)
The Nonlinear Auto-Regressive with eXogenous inputs (NARX) network architecture [27] is a class of discrete-time nonlinear systems. The NARX architecture can be broadly expressed in the parallel mode,
$$\hat{y}(t) = f\big(u(t - n_u), \ldots, u(t - 1), u(t), \hat{y}(t - n_y), \ldots, \hat{y}(t - 1)\big), \qquad (10)$$
or in the series-parallel mode,
$$\hat{y}(t) = f\big(u(t - n_u), \ldots, u(t - 1), u(t), y(t - n_y), \ldots, y(t - 1)\big), \qquad (11)$$
where $u(t)$, $y(t)$ and $\hat{y}(t)$ denote the input, actual output and estimated output of the network at time $t$, $n_u$ and $n_y$ are the input and output orders, and $f$ denotes a nonlinear function, which can be approximated using a Multilayer Perceptron (MLP). As opposed to a conventional Recurrent Neural Network (RNN), a NARX network’s feedback comes only from the output neurons rather than its hidden states. Using this simplified configuration, it has been argued that NARX networks generalize better than other RNNs, especially on problems involving long-term dependencies [28].

Fig. 5. The proposed polar visualization scheme showing the characteristic signals in normal operation scenarios (left) and in OADS scenario (right) in the Winter 2008 dataset.

Fig. 6. A simplified case of redundancy between features in a HVAC system. How many clusters are there? It can be seen that the divergence between the $y_{CHWC-VLV}$ and $y_{CHWC-GPM}$ distributions is intuitively smaller than the divergence between $y_{CHWC-VLV}$ and $y_{SA-HUMD}$. If these four signals were to be clustered, then a possible solution would be to assign them into two clusters, i.e. {{$y_{CHWC-VLV}$, $y_{CHWC-GPM}$}, {$y_{SA-HUMD}$, $y_{RA-HUMD}$}}.
The configurations described in Eqs. (10) and (11) differ only in their mode of feedback. The configuration described in Eq. (10) is referred to as parallel mode or recurrent NARX (NARX-P), while Eq. (11) is referred to as series-parallel mode NARX (NARX-SP) [29]. The NARX-P uses the state estimate feedback, while NARX-SP uses the actual observable state. Because the actual state of an HVAC system is not available at all times in practice, the deployment of NARX in AFDD systems is currently limited to the NARX-P configuration.
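The difference between the two feedback modes can be made concrete with a small sketch. Here `f` is an arbitrary callable standing in for the trained MLP, and `narx_predict`, its arguments and the toy model are hypothetical names chosen for illustration; `y_init` is assumed to hold at least max(n_u, n_y) seed outputs.

```python
def narx_predict(f, u, y_init, n_u, n_y, y_true=None):
    """Run a NARX model over the input sequence u.

    With y_true=None the model feeds back its own estimates (parallel mode, Eq. (10)).
    Otherwise it feeds back the measured outputs (series-parallel mode, Eq. (11)).
    """
    start = max(n_u, n_y)
    y_hat = list(y_init[:start])            # seed the first `start` outputs
    for t in range(start, len(u)):
        feedback = y_hat if y_true is None else y_true
        inputs = u[t - n_u:t + 1]           # u(t - n_u), ..., u(t)
        past = feedback[t - n_y:t]          # y(t - n_y), ..., y(t - 1)
        y_hat.append(f(inputs, past))
    return y_hat


def toy_model(inputs, past):
    """Hypothetical stand-in for the MLP: a simple first-order linear recursion."""
    return 0.5 * past[-1] + inputs[-1]


u = [1.0] * 5
parallel = narx_predict(toy_model, u, [0.0], n_u=1, n_y=1)
series_parallel = narx_predict(toy_model, u, [0.0], n_u=1, n_y=1,
                               y_true=[0.0, 2.0, 2.0, 2.0, 2.0])
```

In parallel mode any estimation error is fed back into later predictions, which is the price of not needing the measured output at run time.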
4. Consensus clustering
This section details the semi-stochastic swarm-based consensus clustering approach to feature selection in an HVAC system. The section is subdivided into six subsections:
• Section 4.1 briefly introduces the consensus clustering paradigm,
• Section 4.2 presents the visual abstract of our proposed feature
selection method,
• Section 4.3 overviews Fred and Jain’s Evidence Accumulation [25],
• Section 4.4 summarizes our previous work on Swarm Rapid Cen-
troid Estimation (SRCE) [17],
• Section 4.5 introduces the newly proposed ‘self-evolution’ strat-
egy for the SRCE,
• Section 4.6 outlines the new implementation of ERCE for feature
selection purposes.
4.1. Fundamentals of consensus clustering
Consensus clustering infers a consensus matrix from multiple runs of clustering algorithms. This consensus matrix encodes the probability of each pair of observations belonging to the same cluster. It has been argued that the natural, and arguably optimum, clusters can be validated with higher confidence by analyzing the stability of this matrix [22,25].
The consensus matrix C is a positive semidefinite N × N square
matrix of joint probabilities. Each Cij ∈ {0, 1} represents the proba-
bility of data point i and j belonging in the same cluster. For given
Fig. 7. A result of feature selection using ERCE (Algorithm 4, Section 4) on the Spring 2008 dataset, projected on the first and second principal components for ease of
visualization. Each point represents a feature where the radius denotes the corresponding entropy. Each feature cluster is color coded and the characteristic feature of each
cluster is annotated accordingly. In this example, ERCE chose 16 characteristic features from the 320 features (160 magnitude features and 160 spectral centroid features). It
can be seen that the spectral centroid feature for CHWC-GPM (SC CHWC-GPM) is selected, in line with the observation in Fig. 3. ERCE accurately discovered that Return Fan
(RF) and Supply Fan (SF) features are particularly important. This discovery is in line with the existence of Return Fan Failure (RFF) faults (May 12th, 18th, and 19th) observed
during the season. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
a cluster assignment obtained from the mth clustering, we can calculate
the mth co-association matrix as follows,
$$C_m = U_m^{T} U_m, \tag{12}$$
where each $U_m$ is a $K_m \times N$ matrix which stores the values of
$u_{ik,m}$ for $i \in \{1, \ldots, N\}$ and $k \in \{1, \ldots, K_m\}$ obtained from the mth
run of any clustering algorithm. Each $u_{ik,m}$ denotes the probability
of a data point $y_i$ belonging to the cluster $C_k$. For any m, $U_m$
should satisfy the constraints $u_{ik,m} \in [0, 1]$ and $\sum_{k=1}^{K_m} u_{ik,m} = 1$. The
matrix multiplication represents a probabilistic ‘and’ operator
conveniently calculated using the (multiplicative) fuzzy T-norm [30].
The ith diagonal component of $C_m$, i.e. $D_{ii,m}$, quantifies the degree of
Fig. 8. An illustration describing the architecture of the Parallel Nonlinear Auto-Regressive Time Delay Neural Networks with eXogenous input (NARX-TDNN).
Fig. 9. Various partitions on the Spring 2008 dataset encoded by 16 subswarms of the Self Evolving Swarm Rapid Centroid Estimation (SE-SRCE, Algorithm 3). The fuzzifier
constant is set to 1.2, and the target entropies are uniformly randomized between 0.005 and 0.05. The coordinates are projected onto the first and second principal components for
ease of visualization. An in-depth explanation of the method is given in Sections 4.4 and 4.5.
stability for the ith data in the mth clustering. In this paper we
propose normalizing $C_m$ by its diagonal matrix $D_m$ as follows,
$$C_m \leftarrow D_m^{-1/2}\, C_m\, D_m^{-1/2}. \tag{13}$$
The consensus C, or ensemble aggregate, is calculated as the
weighted average of the co-association matrices $C_1, C_2, \ldots, C_M$ as
follows,
$$C = \frac{\sum_{m=1}^{M} w_m C_m}{\sum_{m=1}^{M} w_m}, \tag{14}$$
where wm denotes the weight of the corresponding partition which
can be determined manually or using any cluster validation method
[31]. wm can also be set to assume equal weighting such that wm = 1
for all m [25].
The consensus distance matrix can be defined as follows [22],
$$D = 1 - C, \tag{15}$$
which transforms the consensus matrix into a pairwise distance
matrix. Fred and Jain [25] propose applying a single, average, or complete
linkage algorithm to the D matrix to recover the natural clusters. In
their 2005 paper, a criterion called maximum lifetime is proposed
to determine the optimum threshold for cutting the cluster dendrogram [25].
Readers are encouraged to refer to [25] for more
details.
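As a rough illustration of Eqs. (12)–(15), the following Python sketch builds the consensus and consensus distance matrices from a list of membership matrices. It is only a minimal sketch under stated assumptions: the function names, the nested-list matrix representation, and the two toy partitions are ours and are not part of the original method description.

```python
import math

def co_association(U):
    """Eq. (12): C_m = U_m^T U_m for a K x N membership matrix U (list of rows)."""
    K, N = len(U), len(U[0])
    return [[sum(U[k][i] * U[k][j] for k in range(K)) for j in range(N)]
            for i in range(N)]

def normalize(C):
    """Eq. (13): scale C on both sides by the inverse square root of its diagonal."""
    d = [math.sqrt(C[i][i]) for i in range(len(C))]
    return [[C[i][j] / (d[i] * d[j]) for j in range(len(C))]
            for i in range(len(C))]

def consensus(partitions, weights=None):
    """Eq. (14): weighted average of the normalized co-association matrices."""
    weights = weights or [1.0] * len(partitions)  # equal weighting by default
    mats = [normalize(co_association(U)) for U in partitions]
    N = len(mats[0])
    total = sum(weights)
    return [[sum(w * C[i][j] for w, C in zip(weights, mats)) / total
             for j in range(N)] for i in range(N)]

def consensus_distance(C):
    """Eq. (15): D = 1 - C, applied element-wise."""
    return [[1.0 - c for c in row] for row in C]

# Toy input (hypothetical): two crisp partitions of N = 4 points.
# Rows are clusters, columns are data points.
U1 = [[1, 1, 0, 0],
      [0, 0, 1, 1]]
U2 = [[1, 1, 1, 0],
      [0, 0, 0, 1]]
C = consensus([U1, U2])
D = consensus_distance(C)
```

In this toy input, points 0 and 1 share a cluster in both partitions, so their consensus is 1, while points 0 and 2 share a cluster in only one of the two partitions, so their consensus is 0.5.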
4.2. Visual abstract: feature selection using ERCE
A visual abstract of the proposed swarm-based consensus
clustering algorithm can be seen in Figs. 9 and 10. Fig. 10
presents the consensus matrix and hierarchical cluster tree (clus-
ter dendrogram) from the aggregation of the partitions shown in
Fig. 9.
4.3. Evidence accumulation
Fred and Jain proposed Evidence Accumulation (EAC) in
2005 as a consensus clustering framework for combining the
results of multiple runs of a crisp prototype-based clustering
algorithm (e.g. K-means) [25]. Wang proposed a generalization
of the algorithm, extending the applicability of EAC to
both crisp and fuzzy clusters [30]. He found that fuzzy partitions
are advantageous over crisp partitions in Evidence
Accumulation, as the degree of overlap in a fuzzy partition
encodes, to an extent, how ‘close’ together clusters are [30].
The approach can be summarized as a two-step process as
follows,
1. Split: Partition the data matrix Y into some number of parti-
tions Km (may be fixed or randomized within an interval) using
any prototype-based clustering algorithm. Repeat this step M
times.
Fig. 10. A heat map presenting the consensus matrix resulting from the aggregation
of an SE-SRCE swarm shown in Fig. 9 using Algorithm 4 (Section 4.6). The rows and
columns indicate individual items (in our case: the 320 features) whose consensus
values range from 0 (never clustered together) to 1 (always clustered together)
marked by white to dark blue. The complete linkage cluster dendrogram showing
the degree of redundancy between features is shown above the consensus matrix.
Between the cluster dendrogram and the consensus matrix is the cluster label vector
suggested by the maximum lifetime cut. The output of the consensus clustering is
as shown in Fig. 7. (For interpretation of the references to color in this figure legend,
the reader is referred to the web version of the article.)
2. Merge: Calculate the consensus matrix C and interpret the
ensemble clustering by performing a desired graph algo-
rithm.
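Given the data vectors $y_i \in Y$, for each clustering m, $K_m$ centroid
vectors $x_{k,m} \in X_m$ can be obtained using any prototype-based clustering
algorithm (e.g. K-means, fuzzy C-means, Gaussian Mixture
Models). The degree of membership of $y_i$ with respect to $x_{k,m}$ is a function of
distance, calculated for crisp partitions ($u \in \{0, 1\}$) as
$$u_{ik,m} = \begin{cases} 1 & \text{if } k = \arg\min_{j} d(y_i, x_{j,m}), \\ 0 & \text{otherwise,} \end{cases} \tag{16}$$
and for fuzzy partitions ($u \in [0, 1]$) as
$$u_{ik,m} = \frac{d(y_i, x_{k,m})^{-1/(\gamma-1)}}{\sum_{j=1}^{K_m} d(y_i, x_{j,m})^{-1/(\gamma-1)}}, \quad \gamma > 1, \tag{17}$$
where $\gamma$ is the fuzzifier constant.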
Wang argues that using fuzzy partition in consensus clustering is
particularly efficient for suppressing over-segmentation. It is also
more tolerant to noisy information than its crisp counterpart [30].
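The crisp and fuzzy membership rules of Eqs. (16) and (17) can be sketched in Python as follows. This is an illustrative sketch only: the Euclidean distance, the parameter name `gamma` for the fuzzifier, and the small `eps` guard against zero distances are assumptions made here, not details taken from the paper.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance (assumed here; any distance d could be used)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def crisp_membership(y, centroids, dist=euclidean):
    """Eq. (16): membership 1 for the nearest centroid and 0 for all others."""
    d = [dist(y, x) for x in centroids]
    winner = d.index(min(d))
    return [1.0 if k == winner else 0.0 for k in range(len(centroids))]

def fuzzy_membership(y, centroids, gamma=1.2, dist=euclidean, eps=1e-12):
    """Eq. (17): memberships proportional to d^(-1/(gamma-1)), with gamma > 1."""
    if gamma <= 1:
        raise ValueError("the fuzzifier must be greater than 1")
    power = -1.0 / (gamma - 1.0)
    # eps avoids a division by zero when y coincides with a centroid
    w = [max(dist(y, x), eps) ** power for x in centroids]
    s = sum(w)
    return [v / s for v in w]

# Toy example (hypothetical data): one point and two centroids on a line.
y = (0.0, 0.0)
centroids = [(1.0, 0.0), (3.0, 0.0)]
u_crisp = crisp_membership(y, centroids)
u_fuzzy = fuzzy_membership(y, centroids, gamma=2.0)
```

With `gamma=2.0` the exponent is −1, so the fuzzy memberships are proportional to the inverse distances 1 and 1/3. Values of the fuzzifier closer to 1 push the fuzzy memberships toward the crisp assignment.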
The conventional approaches using Evidence Accumulation (EAC)
[25] and Weighted Evidence Accumulation (WEAC) [31] are
summarized in Algorithm 1. Note that the pseudocode is simplified
using the fuzzy t-norm approach to EAC introduced in
[30].
Algorithm 1. (Weighted) Ensemble Clustering ((W)EAC Clustering)
Input: dim × N data matrix Y, maximum number of prototypes $K_{\max}$, number of
repetitions M, prototype-based clustering algorithm Cluster (e.g. K-means,
Fuzzy C-means), linkage algorithm Linkage.
Output: Crisp ensemble partition L
1: for m ∈ {1, . . ., M} do
2:   // Partition Y using a random number of clusters.
3:   $K_{rnd}$ ← random({2, . . ., $K_{\max}$})
4:   {$U_m$, $X_m$} ← Cluster(Y, $K_{rnd}$)
5:   // Calculate the co-association matrix for each clustering.
6:   $C_m \leftarrow U_m^{T} U_m$
7:   $C_m \leftarrow D_m^{-1/2} C_m D_m^{-1/2}$
8: end for
9: // Calculate the consensus matrix.
10: $C \leftarrow \sum_{m=1}^{M} w_m C_m \,/\, \sum_{m=1}^{M} w_m$
11: // Interpret the consensus matrix using the Linkage algorithm.
12: HierarchicalTree ← Linkage(D), with D = 1 − C from Eq. (15)
13: th ← MaximumLifetime(HierarchicalTree)
14: L ← Cut(HierarchicalTree, th)
15: Note that the threshold for cutting the hierarchical tree is determined
using the maximum lifetime method [25].
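The interpretation step of Algorithm 1 (linkage, maximum lifetime threshold, cut) might be sketched in Python as below. This is a simplified reading and not the reference procedure of [25]: the lifetime is taken here as the widest gap between consecutive merge heights of the dendrogram, the linkage is a naive complete-linkage loop, the single-cluster solution is never selected, at least three items are required, and all function names and the toy distance matrix are hypothetical.

```python
def complete_linkage_merges(D):
    """Naive complete-linkage agglomeration on a distance matrix D.
    Returns the merges in order as (height, members of the new cluster)."""
    clusters = [[i] for i in range(len(D))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # complete linkage: largest pairwise distance between clusters
                h = max(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or h < best[0]:
                    best = (h, a, b)
        h, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
        merges.append((h, merged))
    return merges

def maximum_lifetime_threshold(heights):
    """Midpoint of the widest gap between consecutive merge heights."""
    hs = sorted(heights)
    width, i = max((hs[i + 1] - hs[i], i) for i in range(len(hs) - 1))
    return (hs[i] + hs[i + 1]) / 2.0

def cut(D, threshold):
    """Crisp labels obtained by applying only the merges below the threshold."""
    labels = list(range(len(D)))
    for h, members in complete_linkage_merges(D):
        if h < threshold:
            for i in members:
                labels[i] = min(members)
    return labels

# Toy consensus distance matrix (hypothetical): items 0,1 and items 2,3 are close.
D = [[0.0, 0.1, 0.9, 0.9],
     [0.1, 0.0, 0.9, 0.9],
     [0.9, 0.9, 0.0, 0.2],
     [0.9, 0.9, 0.2, 0.0]]
th = maximum_lifetime_threshold([h for h, _ in complete_linkage_merges(D)])
labels = cut(D, th)
```

For this matrix the merge heights are 0.1, 0.2 and 0.9, so the widest gap lies between 0.2 and 0.9 and the cut recovers two clusters.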
4.4. Swarm Rapid Centroid Estimation
Yuwono [17] proposed the Swarm Rapid Centroid Estimation
(Swarm RCEr+) algorithm in 2011 [32]. This semi-stochastic clustering
algorithm incorporates the paradigms of Particle
Swarm Optimization (PSO [19]) into traditional Expectation
Maximization (EM). Statistical validation on benchmark data
suggests that Swarm RCEr+ has a reduced risk of converging to
local minima and a leaner computational complexity compared to
earlier evolutionary-algorithm-based clustering approaches [17].
The algorithm was updated in 2014 to further decrease its memory
complexity for use in ensemble clustering applications [18].
The RCE algorithm below follows the 2014 proposition.
A particle in an RCE subswarm stores a tuple consisting of a
position vector x and a velocity vector v,
$$\text{particle}_{k,m} = \{x_{k,m}, v_{k,m}\}. \tag{18}$$
The position vector of each particle represents the coordinate of
a centroid vector $x_{k,m} \in \mathbb{R}^{dim}$. In RCE a subswarm is a collection of
centroid coordinates, encoding a possible solution to the clustering
problem. As the RCE swarm consists of M such subswarms, as many as
M clustering solutions can be obtained at the end of optimization.
Each subswarm stores two memory matrices:
1. The self-organizing memory $Y_m$, which is an array of randomly
sampled pointers to the data Y,
$$Y_m = \text{randsample}(Y, \eta\%), \tag{19}$$
where $\eta\% \in [0, 1]$ denotes the rate of random sampling.
2. The best position memory $X_m^{best}$, which stores the position vectors
$X = \{x_1, \ldots, x_{K_m}\}$ that minimize a given objective function
$f(Y_m, X_m)$ throughout the search. A typical objective function is
usually defined as, but not restricted to, the average distortion,
$$f(Y_m, X_m) = \sum_{x_k \in X_m} \frac{\sum_{y_i \in Y_m} u_{ik,m}\, d(x_k, y_i)}{\sum_{y_i \in Y_m} u_{ik,m}}, \tag{20}$$
where $u_{ik,m}$ can be calculated using either Eq. (16) or Eq. (17).
The RCE swarm $X^{best}$ matrix is the union of all $X_m^{best}$ such that,
$$X^{best} = \bigcup_{m=1}^{M} X_m^{best}. \tag{21}$$
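The average distortion objective of Eq. (20) can be written as a short Python function. The sketch below is illustrative only: the function name, the argument layout, and the decision to let empty clusters contribute nothing to the sum are assumptions made here.

```python
def average_distortion(Y, X, U, dist):
    """Eq. (20): for every centroid, the membership-weighted mean distance
    to the sampled data, summed over all centroids.
    Y: list of data points, X: list of centroids,
    U: K x N membership matrix (U[k][i] for centroid k and point i)."""
    total = 0.0
    for k, x in enumerate(X):
        num = sum(U[k][i] * dist(x, y) for i, y in enumerate(Y))
        den = sum(U[k][i] for i in range(len(Y)))
        if den > 0:  # assumption: a cluster with no members adds no distortion
            total += num / den
    return total

# Toy one-dimensional example (hypothetical data).
Y = [(0.0,), (2.0,), (10.0,)]
X = [(1.0,), (10.0,)]
U = [[1, 1, 0],
     [0, 0, 1]]
f = average_distortion(Y, X, U, dist=lambda a, b: abs(a[0] - b[0]))
```

Here the first centroid sits at distance 1 from both of its members and the second centroid coincides with its only member, so the objective evaluates to 1.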
Fig. 11. Trajectories of the Swarm RCE particles recorded after 30 iterations on a toy dataset with numerous random seedings, showing the robustness of Swarm RCE and its insensitivity
to initialization. M = 6, tmax = 30, ε = 0.05, δreset = 15.
On each iteration, the velocity and position of a particle are
updated as follows,
$$v_{k,m}(t+1) = v_{k,m}(t) + \boldsymbol{\epsilon}_{k,m}(t), \tag{22}$$
$$x_{k,m}(t+1) = x_{k,m}(t) + v_{k,m}(t+1), \tag{23}$$
where $\boldsymbol{\epsilon}$ denotes the resultant vector, which consists mainly of the
self-organizing term and the minimum (best position) term,
$$\begin{aligned}
\boldsymbol{\epsilon}_{k,m}(t) &= \phi_1 \circ \underbrace{\frac{\sum_{i=1}^{|Y_m|} u_{ik,m}\,\bigl(y_i - x_{k,m}(t)\bigr)}{\sum_{i=1}^{|Y_m|} u_{ik,m}}}_{\text{self organizing}}
+ \phi_2 \circ \underbrace{\left(\frac{\sum_{j=1}^{|X^{best}|} q_{jk,m}\,\bigl(x_j^{best}(t) - x_{k,m}(t)\bigr)}{\sum_{j=1}^{|X^{best}|} q_{jk,m}}\right)}_{\text{minimum (best position)}} \\
&= \phi_1 \circ \bigl(E[Y_m \mid X_m = x_{k,m}] - x_{k,m}\bigr) + \phi_2 \circ \bigl(E[X^{best} \mid X_m = x_{k,m}] - x_{k,m}\bigr),
\end{aligned} \tag{24}$$
where each $\phi \in [0, 1]^{dim}$ denotes a uniform random vector; $u_{ik,m}$
denotes the cluster membership when $Y_m$ is mapped to $X_m$; while
$q_{jk,m}$ denotes the cluster membership when $X^{best}$ is mapped to $X_m$.
Should the self-organizing vector of a particle equal 0, $x_i$ will
be directed to $x_{I_{win},m}$, the position of the winning particle. $x_{I_{win},m}$
is the particle in the mth subswarm whose cluster has the largest
cardinality.
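A single particle update following Eqs. (22)–(24) could look like the Python sketch below. It is a hedged illustration rather than the authors' implementation: the function names and argument layout are hypothetical, and the redirection toward the winning particle for a zero self-organizing vector is deliberately left out (a term with zero total membership is simply skipped).

```python
import random

def weighted_mean(points, weights):
    """Membership-weighted mean of a list of points, or None if all weights are 0."""
    s = sum(weights)
    if s == 0:
        return None
    dim = len(points[0])
    return [sum(w * p[d] for w, p in zip(weights, points)) / s for d in range(dim)]

def update_particle(x, v, Y, u, X_best, q, rng=random):
    """One application of Eqs. (22)-(24) to a single particle.
    u[i]: membership of data point Y[i] in this particle's cluster.
    q[j]: membership of best position X_best[j] in this particle's cluster."""
    dim = len(x)
    so_target = weighted_mean(Y, u)         # E[Y_m | X_m = x_k,m]
    best_target = weighted_mean(X_best, q)  # E[X_best | X_m = x_k,m]
    eps = [0.0] * dim                       # resultant vector of Eq. (24)
    for target in (so_target, best_target):
        if target is None:
            continue  # simplification: the winner redirection is not modelled
        for d in range(dim):
            # an independent uniform factor per term and per dimension (phi)
            eps[d] += rng.random() * (target[d] - x[d])
    v_new = [v[d] + eps[d] for d in range(dim)]    # Eq. (22)
    x_new = [x[d] + v_new[d] for d in range(dim)]  # Eq. (23)
    return x_new, v_new

# Toy call (hypothetical data): both attraction terms point toward (2, 2).
rng = random.Random(0)
x_new, v_new = update_particle([0.0, 0.0], [0.0, 0.0],
                               Y=[[2.0, 2.0]], u=[1.0],
                               X_best=[[2.0, 2.0]], q=[1.0], rng=rng)
```

Because each term is scaled by a uniform random factor in [0, 1), a particle starting at rest moves toward its targets by a random fraction of the offset in every dimension, which is the source of the semi-stochastic behavior described above.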
The RCE is equipped with two strategies to cope with suboptimal
convergence, namely substitution and particle reset:
1. Substitution strategy forces particles in a search space to reach
alternate equilibrium positions by introducing position instability.
After each position update episode for a particle, apply
$$\{x_i(t+1), v_i(t+1)\} = \begin{cases} \{x_{I_{win}}(t+1) + N(0, \sigma),\ 0\} & \text{if } \phi < \varepsilon, \\ \{x_i(t+1), v_i(t+1)\} & \text{otherwise,} \end{cases} \tag{25}$$
where ϕ is a uniform random number ϕ ∈ [0, 1], and N(0, σ) is
a Gaussian random vector with zero mean whose standard deviation σ
in each dimension is that of the corresponding dimension of the data
being clustered. ε denotes the substitution probability parameter.
Larger ε increases the frequency of substitution. Optimal ε values
lie in the range 0.01 ≤ ε ≤ 0.05 [17]. RCE
with the substitution strategy enabled is denoted with the superscript +.
2. Particle reset strategy is triggered when the fitness of the local
minimum $f(Y_m, X_m^{best}(t))$ does not improve after a number of
iterations. Stagnation can be detected using a stagnation counter
δ which is updated as follows:
$$\delta(t+1) = \begin{cases} \delta(t) + 1 & \text{if } f(Y_m, X(t)) \ge f(Y_m, X^{best}(t)), \\ 0 & \text{otherwise.} \end{cases} \tag{26}$$
When δ(t + 1) > δmax this strategy reinitializes all particles in a
subswarm without resetting the local minimum position matrix.
The only values reinitialized are xk(t) and vk(t). Swarm convergence
is detected when f (Ym, Xbest(t)) does not improve after
a number of resets. RCE with the particle reset strategy enabled is
denoted with the superscript r.
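The two strategies can be sketched as small Python helpers that mirror Eqs. (25) and (26). The names, the default values (a substitution probability of 0.03 and a reset limit of 15, matching settings quoted elsewhere in the paper), and the use of Python's `random` module are illustrative assumptions.

```python
import random

def substitute(x, v, x_win, sigma, eps=0.03, rng=random):
    """Eq. (25): with probability eps, replace the particle by the winning
    position plus Gaussian noise and set its velocity to zero."""
    if rng.random() < eps:
        x_new = [xw + rng.gauss(0.0, s) for xw, s in zip(x_win, sigma)]
        return x_new, [0.0] * len(v)
    return x, v

def update_stagnation(delta, f_current, f_best):
    """Eq. (26): count consecutive iterations without improvement."""
    return delta + 1 if f_current >= f_best else 0

def needs_reset(delta, delta_max=15):
    """Particle reset is triggered once the counter exceeds delta_max."""
    return delta > delta_max

# Toy usage (hypothetical values): a subswarm that fails to improve 16 times.
delta = 0
for _ in range(16):
    delta = update_stagnation(delta, f_current=1.0, f_best=0.5)
reset_now = needs_reset(delta)
```

Here `sigma` is expected to hold the per-dimension standard deviation of the data, as described for N(0, σ) above; a reset would reinitialize only positions and velocities and leave the stored best positions untouched.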
The algorithm pseudocode is shown in Algorithm 2. An illus-
tration of the search trajectory of the swarm on a toy example is
shown in Fig. 11.
Algorithm 2. Swarm RCEr+
Input: Data points $Y = \{y_1, \ldots, y_N\} \in \mathbb{R}^{dim}$, number of clusters K.
Output: Swarm centroid vectors $X^{best} = \{X_1^{best}, X_2^{best}, \ldots, X_M^{best}\} \in \mathbb{R}^{dim}$.
1: Initialize the swarm (randomize($X_{1,\ldots,M}$), $V_{1,\ldots,M} = 0$).
2: For each subswarm m, randomly sample Y and store it in the
memory $Y_m = \text{randsample}(Y, \eta\%)$.
3: repeat
4:   for all m ∈ {1, . . ., M} do
5:     Calculate $U_m$ from the pairwise distance between $X_m$ and $Y_m$,
6:     Calculate $Q_m$ from the pairwise distance between $X_m$ and $X^{best}$,
7:     Store $X_m^{best}$ which minimizes $f(Y_m, X_m)$ throughout the search,
8:     $V_m \leftarrow V_m + \boldsymbol{\epsilon}_m$,
9:     $X_m \leftarrow X_m + V_m$,
10:    Redirect particles with zero cardinality toward the
particle whose cluster has the largest cardinality.
11:    Apply substitution with rate ε
12:    if $f(Y_m, X_m^{best})$ does not improve after δreset iterations then
13:      Reinitialize subswarm (randomize($X_m$), $V_m = 0$)
14:    end if
15:  end for
16: until Convergence or maximum iteration reached
17: return $X^{best} = \{X_1^{best}, X_2^{best}, \ldots, X_M^{best}\} \in \mathbb{R}^{dim}$.
4.5. Self Evolving Swarm RCE
In this implementation we introduce a new self-evolution
criterion to the RCE which allows each subswarm to summon
additional particles at will until the target cluster entropy is
satisfied.
The uncertainty for a fuzzy membership value $u_{ik} \in [0, 1]$ [33]
can be quantified as follows,
$$h_{ik,m} = -u_{ik,m} \log u_{ik,m}. \tag{27}$$
Bezdek argues that a good clustering can be achieved when $h_{ik,m}$ is
minimized [33]. The average cluster entropy is then,
$$H_m = -\frac{1}{K_m |Y_m|} \sum_{k=1}^{K_m} \sum_{i=1}^{|Y_m|} u_{ik,m} \log u_{ik,m}, \tag{28}$$
where $U_m$ is calculated from $X_m^{best}$. $H_m$ close to 0.5 indicates
possible underpartitioning, while $H_m$ very close to 0 may indicate
overpartitioning.
$H_m$ is only evaluated when there is an update to $X_m^{best}$
in which the number of non-empty clusters is equal to $K_m$, such that
$|C_m^{best}| = K_m$. If $H_m$ is larger than the target entropy of subswarm m, the number
of particles is incremented using the following rule,
$$K_m(t+1) = \begin{cases} K_m(t) + z_r^{+} & \text{if } H_m \text{ exceeds the target entropy of subswarm } m, \\ K_m(t) & \text{otherwise,} \end{cases} \tag{29}$$
where $K_m(t)$ denotes the number of particles in subswarm m at the
current iteration t, $z_r^{+}$ denotes an upper-bounded random integer,
$z_r^{+} \in \{1, 2, \ldots, z_{\max}^{+}\}$, and the target entropy of each subswarm lies in [0, 0.5].
Using this approach each subswarm automatically adjusts $K_m$
until the entropy criterion is satisfied.
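The entropy criterion and the growth rule of Eqs. (28) and (29) can be sketched as follows. This is a minimal sketch under assumptions: a base-2 logarithm is used (so that two clusters with memberships of 0.5 give an average entropy of exactly 0.5, in line with the underpartitioning remark above), and the function and parameter names are hypothetical.

```python
import math
import random

def average_cluster_entropy(U):
    """Eq. (28): mean of -u*log2(u) over all entries of the K x N matrix U.
    The logarithm base is an assumption made for this sketch."""
    K, N = len(U), len(U[0])
    total = 0.0
    for row in U:
        for u in row:
            if u > 0.0:  # 0*log(0) is taken as 0
                total -= u * math.log2(u)
    return total / (K * N)

def grow(K, H, target, z_max=2, rng=random):
    """Eq. (29): add between 1 and z_max particles when H exceeds the target."""
    if H > target:
        return K + rng.randint(1, z_max)
    return K

# Toy memberships (hypothetical): a crisp and a maximally ambiguous partition.
H_crisp = average_cluster_entropy([[1.0, 0.0], [0.0, 1.0]])
H_fuzzy = average_cluster_entropy([[0.5, 0.5], [0.5, 0.5]])
K_next = grow(2, H_fuzzy, target=0.05, rng=random.Random(0))
```

The ambiguous partition exceeds any target entropy in the 0.005 to 0.05 range used in the experiments, so the subswarm would gain one or two particles, whereas the crisp partition would leave the particle count unchanged.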
The desired granularity and diversity of the swarm can be controlled
by setting or randomizing the target entropy of each subswarm. The growth speed
of the swarm can be controlled by setting the upper bound $z_{\max}^{+}$ of $z_r^{+}$.
As the subswarms infer $K_m$ automatically from $H_m$, the randomization
interval no longer needs to be specified (recall that in EAC and WEAC
K-means, $K_m$ is randomized within a pre-specified upper and lower
bound).
The pseudocode of the Self-Evolving Swarm RCEr+ (SE-SRCE) can
be seen in Algorithm 3. A typical summary of an execution of SE-
SRCE can be seen in Fig. 12.
Algorithm 3. Self-Evolving Swarm RCEr+ (SE-SRCE)
Input: Data points $Y = \{y_1, \ldots, y_N\} \in \mathbb{R}^{dim}$, number of clusters K.
Output: Swarm centroid vectors $X^{best} = \{X_1^{best}, X_2^{best}, \ldots, X_M^{best}\} \in \mathbb{R}^{dim}$.
1: Initialize the swarm (randomize($X_{1,\ldots,M}$), $V_{1,\ldots,M} = 0$).
2: For each subswarm m, randomly sample Y and store it in the
memory $Y_m = \text{randsample}(Y, \eta\%)$.
3: repeat
4:   for all m ∈ {1, . . ., M} do
5:     Execute Algorithm 2 lines 5–14,
6:     if $f(Y_m, X_m)$ improves then
7:       // Check whether the entropy criterion is satisfied and
whether all clusters are nonempty
8:       if $|C_m^{best}| = K_m$ and $H_m$ exceeds the target entropy of subswarm m then
9:         $K_m \leftarrow K_m + z_r^{+}$
10:      end if
11:    end if
12:  end for
13: until Convergence or maximum iteration reached
14: return $X^{best} = \{X_1^{best}, X_2^{best}, \ldots, X_M^{best}\} \in \mathbb{R}^{dim}$.
4.6. Ensemble Rapid Centroid Estimation using Self-Evolving
Swarm
Ensemble RCE (ERCE) [18] is an ensemble extension of the
Swarm RCEr+. The algorithm has been shown to have a leaner
complexity than conventional ensemble clustering algorithms,
achieving up to quasilinear complexity in both time and space
[18].
In this application we propose incorporating the proposed
SE-SRCE into the ERCE framework. As the size of the evidence accu-
mulation matrix is still relatively manageable (recall that since
there are 320 features = 160 magnitude features + 160 spectral cen-
troid features, the size of C is 320 × 320), EAC can be performed
without using the co-association tree compression process pro-
posed in the original paper [18,34]. However, it needs to be noted
that should the number of features increase up to thousands, it is
advisable that the co-association tree compression is utilized. Fur-
ther information on the co-association tree can be read in Wang’s
paper [34].
In order to interpret the final clustering, we need to clarify that in
our application each cluster represents a group of highly redundant
features. For each feature cluster, the feature with the largest entropy
is selected as the characteristic feature of the cluster. The pseudocode
of ERCE used in our application is shown in Algorithm 4.
Algorithm 4. Ensemble Rapid Centroid Estimation (ERCE)
Input: dim × N data matrix Y, number of subswarms M, fuzzification
constant $\gamma$, target entropy for each of the M subswarms, linkage
algorithm Linkage.
Output: Crisp ensemble partition L
$X^{best}$ ← SE-SRCE(Y)
for all m ∈ {1, . . ., M} do
  Given Y and $X_m^{best}$, calculate $U_m$ using Eq. (17).
  // Calculate the co-association matrix for each clustering.
  $C_m \leftarrow U_m^{T} U_m$
  $C_m \leftarrow D_m^{-1/2} C_m D_m^{-1/2}$
end for
$C \leftarrow \sum_{m=1}^{M} w_m C_m \,/\, \sum_{m=1}^{M} w_m$
HierarchicalTree ← Linkage(D), with D = 1 − C from Eq. (15)
th ← MaximumLifetime(HierarchicalTree)
L ← Cut(HierarchicalTree, th)
// Interpreting the final partition
for all $C_k \in \{C_1, \ldots, C_{L_{\max}}\}$ do
  // For each feature cluster, the characteristic feature is the feature with
  the highest entropy
  $y_k^{characteristic} = \arg\max_{y \in C_k} \left( -\int p_y(x) \log p_y(x)\, dx \right)$
end for
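The final interpretation step of Algorithm 4, which picks the highest-entropy feature in each cluster, might be approximated as in the Python sketch below. Several assumptions are made here and are not taken from the paper: each feature is given as a list of real-valued samples, the differential entropy is replaced by the Shannon entropy of a fixed-bin histogram computed over each feature's own range, and the function names and toy feature names are hypothetical.

```python
import math
from collections import Counter

def histogram_entropy(samples, bins=10):
    """Shannon entropy (natural log) of a fixed-bin histogram of the samples.
    This is a crude stand-in for the differential entropy in Algorithm 4."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return 0.0  # a constant signal carries no information
    width = (hi - lo) / bins
    counts = Counter(min(int((s - lo) / width), bins - 1) for s in samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def characteristic_features(features, labels, bins=10):
    """features: dict name -> samples; labels: dict name -> cluster id.
    Returns a dict cluster id -> name of the highest-entropy feature."""
    best = {}
    for name, samples in features.items():
        h = histogram_entropy(samples, bins)
        k = labels[name]
        if k not in best or h > best[k][1]:
            best[k] = (name, h)
    return {k: name for k, (name, _) in best.items()}

# Toy input (hypothetical feature names and values).
features = {
    "f1": [0.0, 0.0, 0.0, 1.0],
    "f2": [0.0, 0.3, 0.6, 1.0],
    "f3": [5.0, 5.0, 5.0, 5.0],
}
labels = {"f1": 0, "f2": 0, "f3": 1}
selected = characteristic_features(features, labels)
```

In the toy input, `f2` spreads over more histogram bins than `f1` and is therefore chosen for cluster 0, while `f3` is the only member of cluster 1. A density estimate that preserves the scale of each feature would be closer to the integral form used in the algorithm.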
5. Experimental data
The ASHRAE Project 1312-RP modeled and reported a wide variety
of faults in three different seasons. The experiments involved two
HVAC systems running side by side with identical zone loads. Fault
tests were conducted in Air Handling Unit (AHU)-A, while AHU-B ran
under normal operation. Fault characteristics were recorded by
comparing AHU-A and AHU-B. The ASHRAE-1312-RP datasets
include detailed experimental results from Summer 2007, Spring
2008, and Winter 2008. In each season different types of faults
were generated, recorded and reported. Readings from 160 signal
sources during normal operation and various fault scenarios
were recorded. The data was sampled every minute from 6:00 to
18:00. The faults reported in the ASHRAE-1312-RP datasets, as well
as a summary of the behavior of the features proposed by Li [20],
are described in Table 1. Note that the features used in this table
are not part of our research; they are included to illustrate how a static
model would struggle across varying seasons, because the
features that are important in one season may not be as important
in other seasons. The features that we use throughout the paper are
determined dynamically using consensus clustering based on the
unique behavior in each season.
6. Result
Based on the features in Table 1, we can see that faults such as
OASB, MADU and HCSF are particularly difficult to identify using Li’s
model [20]. In this section we present the experimental results of our
proposed unsupervised feature selection method. We
wish to investigate the following:
1. What the characteristic features for each season are, and
2. Whether the selected features improve the generalization capability
of an AFDD algorithm in general. In particular, we are
interested in whether we can reliably identify OASB, MADU, and
HCSF using the features selected by our proposed method.
Our approach is as follows. From each dataset (Summer 2007,
Spring 2008, and Winter 2008), 160 time signals and
a vector recording the time of the day were reported. Using the
method described in Section 3.1, 320 + 1 features
could be extracted, including:
• Magnitude features from 160 sensor and control signals,
• Spectral centroid features from 160 sensor and control signals,
• Time of the day (1 feature).
For clarity, the step-by-step process of the experiment can be
summarized as follows:
1. Select a season and get the raw signals during normal operations.
2. For each raw signal, isolate the magnitude and spectral centroid
components and calculate the fuzzy feature representation using
the method described in Section 3.
3. Find the characteristic features using a consensus clustering
algorithm (Our approach uses ERCE: Algorithm 4).
4. Append the time-of-the-day feature as an additional feature.
5. Using the selected features, train a model (Our approach uses
NARX-TDNN) using the data in Table 1. For each type of fault,
randomly partition the data as follows:
• 15% as training set,
• 15% as validation set, and
• 70% as test set.
6. Investigate the results on the test set to see whether using the
selected features increases/decreases the classifier’s generaliza-
tion capability.
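The random 15%/15%/70% partition in step 5 can be sketched in a few lines of Python. The function name, the fixed seed, and the example size of 720 samples (one per minute over a 12-hour day) are illustrative assumptions; the paper does not specify how the partition was implemented.

```python
import random

def split_indices(n, train=0.15, val=0.15, seed=0):
    """Randomly partition range(n) into training, validation and test indices.
    Whatever is not used for training or validation becomes the test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded for reproducibility
    n_train = int(round(train * n))
    n_val = int(round(val * n))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# Example: one day of one-minute samples (hypothetical size).
train_idx, val_idx, test_idx = split_indices(720)
```

Applying the split separately to the data of each fault type, as the step describes, keeps every fault represented in all three subsets.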
6.1. Feature selection result
We wish to keep the number of characteristic features at a reasonable
level (e.g. between 4 and 30) to ensure that the generalization
capability of the classifier is not undermined. The parameters of
ERCE, EAC K-means, and WEAC K-means were selected based
on the assumption derived using the method illustrated in Fig. 12.
From the average entropy-distortion scatter for each season, such
as that depicted in Fig. 12, we approximated the number of characteristic
features to be around 5–30, corresponding to an average cluster entropy of
0.005–0.05.
The parameters used for ERCE were as follows. The initial number
of particles was set to 2, the number of subswarms was set to
60, the substitution probability ε was set to 3%, δreset was set to 15, the
distance metric was set to KL-divergence, the fuzzifier was set to 1.2,
the entropy threshold for each subswarm was uniformly randomized
between 0.005 and 0.05, $z_{\max}^{+} = 2$, the maximum number of
iterations was set to 100, and the linkage method was set to complete
linkage. KL-divergence and complete linkage were selected
because the physical model of the HVAC was assumed to be unknown
and even a subtle difference in temporal patterns/shapes could be
an important predictive component for specific types of fault. Complete
linkage favors the formation of small spherical clusters, which
is particularly useful for capturing these subtle differences. The optimum
cut was then calculated using the maximum
lifetime criterion [25]. Subswarms were equally weighted during
ensemble aggregation such that $w_{1,\ldots,M} = 1$.
Further investigation was also performed in order to benchmark
the quality of the features selected by the method. The benchmark
unsupervised feature selection methods include EAC K-means [25],
WEAC K-means [31], and traditional complete linkage agglomerative
clustering (CL). CL was used to verify the advantages of the
consensus approaches over a conventional graph-based approach. In
this experiment, the CL hierarchical tree was cut using the inconsistency
criterion with an inconsistency coefficient of 1, returning
84 clusters and thus 84 characteristic features.
The parameters for EAC K-means and WEAC K-means were set
as follows. The number of repetitions was set to 60, the number
of clusters k was uniformly randomized between 5 and 30. The
distance metric was set to KL-divergence. The linkage method was
set to complete linkage as per discussion. The optimum cut was
calculated using the maximum lifetime criterion [25]. Weights for
WEAC K-means were calculated using the average silhouette width
criterion [35].
Table 1
ASHRAE-1312-RP dataset description and symptoms using features described in Shun Li’s model [20].
Summer 2007
| # | Name | Description | HWC-VLV | P-E-hcoil | CHWC-VLV | P-E-ccoil | SF-SPD | P-E-SF | RF-SPD | P-E-RF | P-SA-CFM | P-RA-CFM | P-OA-CFM | SA-TEMP | MA-TEMP | RA-TEMP | HWC-DAT | CHWC-DAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | NOR0819 | Normal Operation | | | | | | | | | | | | | | | | |
| 2 | NOR0825 | Normal Operation | | | | | | | | | | | | | | | | |
| 3 | EADS0820 | EA Damper Stuck (Fully Open) | 0 | 0 | 0 | 0 | + | + | + | + | 0 | + | + | 0 | 0 | 0 | 0 | 0 |
| 4 | EADS0821 | EA Damper Stuck (Fully Close) | 0 | 0 | 0 | 0 | − | − | − | − | 0 | − | − | 0 | 0 | 0 | 0 | 0 |
| 5 | RFF0822 | Return Fan at fixed speed (30% speed) | 0 | 0 | 0 | 0 | ++ | ++ | −− | −− | 0 | −− | ++ | 0 | 0 | 0 | 0 | 0 |
| 6 | RFF0823 | Return Fan complete failure | 0 | 0 | 0 | 0 | ++ | ++ | −− | −− | 0 | −− | ++ | 0 | 0 | 0 | 0 | 0 |
| 7 | CHWC0824 | Cooling Coil Valve Control unstable (Reduce PID Proportional Band by half) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | CHWC0903 | Cooling Coil Valve Reverse Action | ++ | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| 9 | OADS0826 | OA Damper Stuck (Fully Closed) | 0 | 0 | 0 | 0 | ++ | ++ | ++ | ++ | 0 | + | − | 0 | 0 | 0 | 0 | 0 |
| 10 | CHWV0827 | Cooling Coil Valve Stuck (Fully Closed) | 0 | 0 | −− | −− | ++ | ++ | ++ | ++ | ++ | ++ | ++ | ++ | + | + | + | ++ |
| 11 | CHWV0831 | Cooling Coil Valve Stuck (Fully Open) | ++ | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| 12 | CHWV0901 | Cooling Coil Valve Stuck (Partially Open – 15%) | 0 | 0 | −− | −− | ++ | ++ | ++ | ++ | ++ | ++ | ++ | ++ | + | + | + | ++ |
| 13 | CHWV0902 | Cooling Coil Valve Stuck (Partially Open – 65%) | ++ | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| 14 | HCL0828 | Heating Coil Valve Leaking (Stage 1 – 0.4GPM) | 0 | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| 15 | HCL0829 | Heating Coil Valve Leaking (Stage 2 – 1.0GPM) | 0 | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| 16 | HCL0830 | Heating Coil Valve Leaking (Stage 3 – 2.0GPM) | 0 | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| 17 | OADL0905 | OA Damper Leaking (45% Open) | 0 | 0 | 0 | 0 | − | − | − | − | 0 | 0 | ++ | 0 | 0 | 0 | 0 | 0 |
| 18 | OADL0906 | OA Damper Leaking (55% Open) | 0 | 0 | 0 | 0 | − | − | − | − | 0 | 0 | ++ | 0 | 0 | 0 | 0 | 0 |
| 19 | AHUL0907 | AHU Duct Leaking (after SF) | 0 | 0 | + | + | + | + | + | + | + | + | + | 0 | 0 | 0 | 0 | 0 |
| 20 | AHUL0908 | AHU Duct Leaking (before SF) | 0 | 0 | 0 | 0 | −− | −− | −− | −− | 0 | −− | −− | 0 | 0 | 0 | 0 | 0 |
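Table 1 (Continued)
Spring 2008
| # | Name | Description | HWC-VLV | P-E-hcoil | CHWC-VLV | P-E-ccoil | SF-SPD | P-E-SF | RF-SPD | P-E-RF | P-SA-CFM | P-RA-CFM | P-OA-CFM | SA-TEMP | MA-TEMP | RA-TEMP | HWC-DAT | CHWC-DAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | NOR0502 | Normal Operation | | | | | | | | | | | | | | | | |
| 2 | NOR0503 | Normal Operation | | | | | | | | | | | | | | | | |
| 3 | NOR0504 | Normal Operation | | | | | | | | | | | | | | | | |
| 4 | NOR0505 | Normal Operation | | | | | | | | | | | | | | | | |
| 5 | NOR0509 | Normal Operation | | | | | | | | | | | | | | | | |
| 6 | OASB0529 | OA temperature sensor bias (+3F) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | OASB0530 | OA temperature sensor bias (−3F) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | OADS0507 | OA Damper Stuck (Fully Close) | 0 | 0 | 0 | 0 | + | + | + | + | − | + | −− | 0 | 0 | 0 | 0 | 0 |
| 9 | OADS0508 | OA Damper Stuck (40% open) | 0 | 0 | 0 | 0 | + | + | + | + | − | + | −− | 0 | 0 | 0 | 0 | 0 |
| 10 | EADS0527 | EA Damper Stuck (Fully open) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 11 | EADS0510 | EA Damper Stuck (Fully Close) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | − | 0 | − | 0 | 0 | 0 | 0 | 0 | 0 |
| 12 | EADS0511 | EA Damper Stuck (40% open) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | − | 0 | − | 0 | 0 | 0 | 0 | 0 | 0 |
| 13 | CHW0506 | Cooling Coil Valve Stuck (Fully Closed) | 0 | 0 | −− | −− | ++ | ++ | ++ | ++ | ++ | ++ | ++ | ++ | + | + | + | ++ |
| 14 | CHW0515 | Cooling Coil Valve Stuck (Fully Open) | ++ | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| 15 | CHW0516 | Cooling Coil Valve Stuck (Partially Open – 50%) | ++ | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| 16 | RFF0512 | Return Fan complete failure | 0 | 0 | 0 | 0 | 0 | 0 | −− | −− | 0 | −− | 0 | 0 | 0 | 0 | 0 | 0 |
| 17 | RFF0518 | Return Fan at fixed speed (20%spd) | 0 | 0 | 0 | 0 | 0 | 0 | −− | −− | 0 | −− | 0 | 0 | 0 | 0 | 0 | 0 |
| 18 | RFF0519 | Return Fan at fixed speed (80%spd) | 0 | 0 | 0 | 0 | 0 | 0 | ++ | ++ | 0 | ++ | 0 | 0 | 0 | 0 | 0 | 0 |
| 19 | AFAB0522 | Air filter area block fault (10%) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 20 | AFAB0525 | Air filter area block fault (25%) | 0 | 0 | 0 | 0 | + | + | + | + | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 21 | MADU0513 | Mixed air damper unstable | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 22 | MADU0514 | Mixed air damper unstable/Cooling coil control unstable | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 23 | HCSF0517 | Sequence of heating and cooling unstable | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 24 | HCSF0601 | Supply Fan control unstable | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |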
Table 1 (Continued)
# | Name | Description | HWC-VLV | P-E-hcoil | CHWC-VLV | P-E-ccoil | SF-SPD | P-E-SF | RF-SPD | P-E-RF | P-SA-CFM | P-RA-CFM | P-OA-CFM | SA-TEMP | MA-TEMP | RA-TEMP | HWC-DAT | CHWC-DAT
Winter 2008
1 | NOR0129 | Normal Operation | | | | | | | | | | | | | | | |
2 | NOR0216 | Normal Operation | | | | | | | | | | | | | | | |
3 | NOR0217 | Normal Operation | | | | | | | | | | | | | | | |
4 | OADS0212 | OA Damper Stuck (Fully Close) | −− | −− | 0 | 0 | ++ | + | ++ | + | −− | ++ | −− | 0 | − | 0 | 0 | 0
5 | OADL0213 | OA damper leaking (52% open) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | + | 0 | 0 | 0 | 0 | 0
6 | OADL0215 | OA damper leaking (62% open) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | + | 0 | 0 | 0 | 0 | 0
7 | EADS0202 | EA Damper Stuck (Fully open) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | + | + | 0 | 0 | 0 | 0 | 0
8 | EADS0203 | EA Damper Stuck (Fully Close) | − | −− | 0 | 0 | 0 | 0 | 0 | −− | 0 | −− | −− | 0 | 0 | 0 | 0 | 0
9 | CHW0210 | Cooling Coil Valve Stuck (Fully Open) | ++ | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | − | 0 | 0 | ++ | −
10 | CHW0211 | Cooling Coil Valve Stuck (Partially Open – 20%) | + | + | + | + | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0
11 | HCF0205 | Heating Coil Fouling Stage 1 | 0 | −− | 0 | 0 | + | + | + | + | 0 | + | − | 0 | 0 | 0 | 0 | 0
12 | HCF0206 | Heating Coil Fouling Stage 2 | 0 | −− | 0 | 0 | + | + | + | + | 0 | + | − | 0 | 0 | 0 | 0 | 0
13 | HCRC0207 | Heating coil reduced capacity Stage 1 | + | − | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
14 | HCRC0208 | Heating coil reduced capacity Stage 2 | + | − | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
15 | HCRC0209 | Heating coil reduced capacity Stage 3 | + | − | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
A plug {0, +, ++, −, −−} indicates that the value for the variable is: (a) 0: unchanged (the fault has no effect on the corresponding variable); (b) +: greater than normal; (c) ++: substantially greater than normal; (d) −: less than normal; (e) −−: substantially less than normal.
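The qualitative signatures in Table 1 can be compared mechanically. The sketch below is only an illustration: it transcribes a handful of Winter 2008 rows, maps the five symbols to integers (the mapping and the function names `parse_signature` and `group_by_signature` are assumptions introduced here, and ASCII `-` stands in for the minus sign), and lists faults whose signatures over the manually selected variables are identical.

```python
from collections import defaultdict

# Assumed integer coding of the Table 1 symbols (ASCII '-' replaces the minus sign).
LEVEL = {"0": 0, "+": 1, "++": 2, "-": -1, "--": -2}

# A few Winter 2008 rows transcribed from Table 1 (16 variables each).
WINTER_ROWS = {
    "OADL0213": "0 0 0 0 0 0 0 0 0 0 + 0 0 0 0 0",
    "OADL0215": "0 0 0 0 0 0 0 0 0 0 + 0 0 0 0 0",
    "EADS0202": "0 0 0 0 0 0 0 0 0 + + 0 0 0 0 0",
    "HCRC0207": "+ - 0 0 0 0 0 0 0 0 0 0 0 0 0 0",
    "HCRC0208": "+ - 0 0 0 0 0 0 0 0 0 0 0 0 0 0",
}


def parse_signature(row):
    """Convert a whitespace-separated row of symbols into a tuple of integers."""
    return tuple(LEVEL[token] for token in row.split())


def group_by_signature(table):
    """Return the groups of fault names that share exactly the same signature."""
    groups = defaultdict(list)
    for name, row in table.items():
        groups[parse_signature(row)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


print(group_by_signature(WINTER_ROWS))
```

Faults that land in the same group cannot be told apart from these qualitative signatures alone, which is one way to read the ambiguity discussed for the manually selected features.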
[Fig. 12 panels: Ave. Cluster Entropy, Number of Clusters, and Ave. Distortion (logarithmic axis) versus iteration (200 to 1000); Cluster Entropy versus Number of Clusters; Average Distortion versus Number of Clusters; Average Distortion versus Cluster Entropy; and a scatter plot of Average Distortion over Number of Clusters and Cluster Entropy.]
Fig. 12. The scatter plot of the average distortion with respect to cluster entropy and the number of clusters extracted after a run of SE-SRCE with = 1.2. The top graphs show
the cross-sectional plots of the three parameters during optimization of SE-SRCE, leading to the creation of the bottom scatter plot. The appropriate entropy range/K range
can be investigated by observing Km, Hm, and f (Ym, X) trade-offs so that both distortion and entropy can be minimized while keeping the number of clusters to a reasonable
level.
We measured the appropriateness of the feature selection
method by investigating the normalized mutual information (NMI)
between features [26]. Mutual information measures the dependence
between two discrete random variables X and Y. It equals the
KL divergence between the joint distribution p(x, y) and the product
of the marginals p(x)p(y), or equivalently the sum of the marginal
entropies H(X) and H(Y) minus the joint entropy H(X, Y). The
normalized form is
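\[
\begin{aligned}
\mathrm{NMI}(X;Y) &= \frac{I(X;Y)}{\sqrt{H(X)H(Y)}} \\
&= \frac{H(X) + H(Y) - H(X,Y)}{\sqrt{H(X)H(Y)}} \\
&= \frac{\sum_{x \in X}\sum_{y \in Y} p(x,y)\log\dfrac{p(x,y)}{p(x)p(y)}}{\sqrt{\sum_{x \in X} p(x)\log p(x)\,\sum_{y \in Y} p(y)\log p(y)}},
\end{aligned}
\tag{30}
\]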
where X and Y in our case were a pair of fuzzy feature signals (y1 and
y2, calculated using Eq. (5)), rounded to the nearest integer, such
that
X(n) = round(y1(n)), X(n) ∈ {−1, 0, 1}, (31)
and
Y(n) = round(y2(n)), Y(n) ∈ {−1, 0, 1}. (32)
The NMI is calculated by marginalizing the probability of co-occurrence
between these three discrete categories. For a pair of
signals, an NMI closer to 1 indicates that the feature pair is redundant.
For each feature set, the strictly upper triangular part of the
pairwise NMI matrix is taken, and its median, 75th percentile, and
95th percentile are averaged over 80 runs. Since we want to minimize
redundancy between features, a good feature set is characterized
by an average NMI closer to 0. Table 2 summarizes the result of the
experiment.
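The redundancy measure described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the square-root normalization of Strehl and Ghosh [26] (so that two identical signals score 1), returns 0 when either signal is constant (a convention chosen here), and uses the hypothetical function names `discretize`, `entropy`, `nmi`, and `pairwise_nmi_summary`.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import median, quantiles


def discretize(signal):
    """Eqs. (31)-(32): round a fuzzy feature in [-1, 1] to a category in {-1, 0, 1}."""
    # Python's round() sends exact ties (+/-0.5) to 0; the clamp guards stray values.
    return [max(-1, min(1, round(v))) for v in signal]


def entropy(symbols):
    """Shannon entropy (natural log) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log(c / n) for c in Counter(symbols).values())


def nmi(x, y):
    """I(X; Y) / sqrt(H(X) H(Y)), using I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    hx, hy = entropy(x), entropy(y)
    if hx == 0.0 or hy == 0.0:
        return 0.0  # assumed convention: a constant signal shares no information
    hxy = entropy(list(zip(x, y)))
    return max(0.0, (hx + hy - hxy) / math.sqrt(hx * hy))


def pairwise_nmi_summary(features):
    """Median, 75th and 95th percentile over the strictly upper-triangular NMI entries."""
    cats = [discretize(f) for f in features]
    values = [nmi(a, b) for a, b in combinations(cats, 2)]
    cuts = quantiles(values, n=100, method="inclusive")  # needs at least two pairs
    return median(values), cuts[74], cuts[94]
```

Calling `pairwise_nmi_summary` once per run and averaging the three returned numbers over the runs would give summary statistics of the kind reported in Table 2.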
The characteristic features in each season were distinct from
those of other seasons. To analyze the important features
for each season, we repeated the clustering process 200 times. From
this process, three histograms describing the probability of occurrence
of the characteristic features for each season are reported
in Fig. 13. The probability of occurrence was calculated as the
frequency of appearance divided by the number of trials.
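As a small illustration of the occurrence probability just defined, the sketch below counts how often each feature appears across repeated clustering trials. The function name `occurrence_probability` and the toy trial contents are hypothetical; only the feature labels are taken from the paper.

```python
from collections import Counter


def occurrence_probability(trials):
    """Frequency of appearance of each feature divided by the number of trials.

    `trials` holds one collection of selected feature names per clustering run.
    """
    counts = Counter(name for selected in trials for name in set(selected))
    return {name: count / len(trials) for name, count in counts.items()}


# Four made-up trials, for illustration only.
trials = [
    ["HWC-VLV", "PLN-TMP"],
    ["HWC-VLV"],
    ["HWC-VLV", "EA-DMPR"],
    ["PLN-TMP", "HWC-VLV"],
]
print(occurrence_probability(trials))
```

Using `set(selected)` counts a feature at most once per trial, so the result stays within [0, 1].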
The overall patterns of the fault classes for each season, based on the
characteristic features, are presented in Figs. 14–16, respectively.
Each circle in these figures shows the condition of the characteristic
features during a specific fault in the HVAC system.
6.2. Classification result
The generalization capability of a classifier is a powerful indicator of
the quality of the features. Using the characteristic features selected
by the proposed method, a classifier can be trained with less
computational burden and a lower probability of overfitting (note that
in our experiment, 30% of the data was divided equally into training
and validation sets, and the remaining 70% was used as the test set). The
classifiers were trained and tested using the fuzzy features, ys, as
shown in Figs. 14–16.
The parameters for NARX-TDNN were set as follows. The number
of hidden neurons was set to 10. The input layer, hidden layer, and
feedback orders were set to 2. The architecture is illustrated in Fig. 8.
The dataset was divided at random into training (15%),
validation (15%), and test (70%) sets. Training was done using the
Levenberg–Marquardt algorithm. The experiment was repeated 80
times for each season to test the reliability and repeatability of the
method. Using the features shown in Figs. 14–16, the average sensitivity
and specificity of the proposed method compared to Li’s
manual feature selection approach are presented in Table 3.
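To make the delay orders and the data split concrete, the following sketch builds lagged regressors and a random 15/15/70 partition. It is a simplified illustration under stated assumptions: delays are taken as 1 to 2 steps for both input and feedback, past targets stand in for the fed-back outputs (teacher forcing), the hidden-layer delays and the network itself are not modelled, and the names `narx_regressors` and `split_indices` are introduced here.

```python
import random


def narx_regressors(u, y, input_order=2, feedback_order=2):
    """Build rows [u(t-1), ..., u(t-du), y(t-1), ..., y(t-dy)] with target y(t).

    `u` is a list of feature vectors (one list per time step); `y` is the label series.
    """
    start = max(input_order, feedback_order)
    rows, targets = [], []
    for t in range(start, len(y)):
        past_u = [v for k in range(1, input_order + 1) for v in u[t - k]]
        past_y = [y[t - k] for k in range(1, feedback_order + 1)]
        rows.append(past_u + past_y)
        targets.append(y[t])
    return rows, targets


def split_indices(n, train=0.15, val=0.15, seed=0):
    """Random split of n sample indices into training, validation, and test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train, n_val = round(n * train), round(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

With an order of 2, the first two time steps only serve as history, so a series of length N yields N − 2 regressor rows.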
The quality of the feature sets selected by ERCE was benchmarked
against the features selected by EAC K-means, WEAC
K-means, and Complete Linkage. The features selected by these
four competing algorithms were supplied to both NARX-TDNN
and Hidden Markov Models (HMM) [11–13], where the training
and testing of both classifiers were repeated 100 times for each
pair of feature selection and classification algorithms. The weighted
average (WA) sensitivity and WA specificity results are reported in
Table 4.
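The weighted averages can be illustrated from a multi-class confusion matrix. The paper does not spell out the weights, so this sketch assumes the usual one-vs-rest definitions of sensitivity and specificity and weights each class by its number of true samples; `per_class_rates` and `weighted_average` are hypothetical names.

```python
def per_class_rates(confusion):
    """One-vs-rest (sensitivity, specificity, support) for each class.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    Every class is assumed to have at least one true sample and one negative sample.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    rates = []
    for c in range(k):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(k)) - tp
        tn = total - tp - fn - fp
        rates.append((tp / (tp + fn), tn / (tn + fp), tp + fn))
    return rates


def weighted_average(rates):
    """Support-weighted average sensitivity and specificity (weighting is assumed)."""
    support = sum(n for _, _, n in rates)
    wa_sens = sum(s * n for s, _, n in rates) / support
    wa_spec = sum(p * n for _, p, n in rates) / support
    return wa_sens, wa_spec


print(weighted_average(per_class_rates([[8, 2], [1, 9]])))
```

Weighting by support keeps a rarely occurring fault class from dominating the average, which is one plausible reading of "weighted average" in Tables 3 and 4.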
The significance of the experimental results was validated using
paired t-tests with the following null hypotheses:
1. H_0^*: The performance of a classifier using features from ERCE is
not significantly better than using features from algorithm X. A
star (*) in Tables 3 and 4 indicates that H_0^* should be rejected,
whereas no sign indicates otherwise.
2. H_0^†: Given the same feature selection algorithm, a trained
classifier A does not exercise significantly better performance
compared to classifier B. A dagger (†) in Table 4 indicates that H_0^†
should be rejected, whereas no sign indicates otherwise.
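The paired t-test behind both hypotheses reduces to a statistic on the per-run differences. The sketch below computes only that statistic and its degrees of freedom with the standard library; how the runs were paired is not stated in the paper, so pairing by repetition index is an assumption, and `paired_t_statistic` is a name introduced here. Converting the statistic to a p-value requires the Student t distribution (for example via `scipy.stats.ttest_rel`).

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Paired t statistic and degrees of freedom for scores a[i] versus b[i].

    A large positive value supports "a is better than b" in a one-sided test.
    The differences must not all be equal, otherwise the standard error is zero.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


t, dof = paired_t_statistic([3, 5, 7, 9], [1, 2, 3, 4])
print(round(t, 4), dof)
```

The null hypothesis would be rejected at α = 0.001 when the statistic exceeds the one-sided critical value of the t distribution with the returned degrees of freedom.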
7. Discussion
As the proposed feature selection process is strictly unsu-
pervised, analyzing the result leads to a number of interesting
observations.
With regard to the redundancies between features,
it can be seen in Table 2 that all consensus algorithms
(Median NMI_ERCE = 0.019, Median NMI_EAC K-means = 0.040, Median
NMI_WEAC K-means = 0.048) in general outperformed CL (Median
NMI = 0.1305), manual selection (Median NMI = 0.0199, Q75%
NMI = 0.2227), and no selection (Median NMI = 0.1857). The three
consensus algorithms reported fewer than 20 characteristic features
on average, at least four times fewer than the number
of characteristic features selected using CL. Furthermore, the
features selected by ERCE (Median NMI = 0.019 ± 0.004) outperformed
those selected by the other consensus algorithms,
EAC K-means (Median NMI = 0.040 ± 0.011) and WEAC K-means
(Median NMI = 0.048 ± 0.034), as indicated by its low NMI. ERCE
also had smaller standard deviations on all performance aspects,
especially on the number of features, suggesting the relatively
high reliability and repeatability of the proposed swarm-based
consensus clustering algorithm.
With regard to the reliability of the feature selection algorithm,
ERCE consistently selects features that are unique and relevant to
the faults in the corresponding year, as can be seen in Fig. 13. For
example, throughout the experiment using the Winter 2008 dataset,
ERCE consistently selected HWC-VLV, PLN-TMP, EA-DMPR, HWC-DAT
and HWP-GPM, which are among the important features for
that season. The pattern for the Winter 2008 dataset is shown in
Fig. 16. In this figure, the pattern for Exhaust Air Damper Stuck
(EADS) faults can be easily distinguished from the others by
observing the conditions of both EA-DMPR and PLN-TMP. Similarly,
HCRC faults in this season are characterized by abnormal
HWC-VLV and VAV-DMPR signals. CHW faults are also observable
from an increase in HWC-DAT as the system compensates for the
increased flow of chilled water due to the faulty cooling coil valve.
ERCE also appropriately discovers that SC CHWC-GPM is a particularly
important feature in Spring 2008 due to HCSF0517, as
discussed previously in Section 3. ERCE discovers that the outside
air damper (OA-DMPR) is consistently inside the atypical negative
region during HCSF faults. This information may be useful for
further investigation of the nature of the particular fault.
Regarding the effect of the proposed feature selection algorithm
on classifier performance, the result of ERCE+NARX-TDNN,
Table 2
The Normalized Mutual Information (NMI) between features selected using various feature selection algorithms on the Spring 2008 dataset. Boldface indicates the lowest NMI
(the least redundancy between features).
Feature selection method | Without feature selection | Manual selection [20] | CL
# of features | 320 | 16 | 84
Median NMI | 0.1857 | 0.0199 | 0.1305
Q75% NMI | 0.4110 | 0.3014 | 0.2227
Q95% NMI | 0.8821 | 0.4899 | 0.4863
Feature selection method | EAC K-means | WEAC K-means | ERCE
# of features | 15.90 ± 3.86 | 16.70 ± 4.73 | 17.20 ± 1.60
Median NMI | 0.040 ± 0.011 | 0.048 ± 0.034 | 0.019 ± 0.004
Q75% NMI | 0.106 ± 0.025 | 0.131 ± 0.068 | 0.078 ± 0.013
Q95% NMI | 0.404 ± 0.035 | 0.364 ± 1.600 | 0.339 ± 0.031
(NMI rows are computed between characteristic feature pairs.)
particularly in Spring 2008, shows a clear advantage of ERCE
over other feature selection approaches. As can be seen in Table 3,
when compared to the manually selected features suggested
by Li [20], supplying NARX-TDNN with the features selected by
ERCE results in consistent specificity improvements in Spring 2008.
Moreover, overall statistically significant weighted average performance
improvements are also observed throughout Summer
2007, Spring 2008, and Winter 2008 based on our experiment.
Based on the statistical results in Table 4, using features from
Li and EAC K-means limits NARX-TDNN’s specificity to averages
of around 91.54% and 91.85%, respectively. The low average may
be attributed to misclassification of a number of more ambiguous
faults such as OASB, MADU, AFAB and HCSF. This report
is consistent with Li’s observation, presented in Table 1, where
Fig. 13. Representative feature occurrence histogram for each season after 200 clustering trials. The x-axis denotes the specific label for each feature, y-axis denotes the
probability of occurrence, calculated as the frequency of appearance divided by the number of trials.
[Fig. 14 panels, each over characteristic features 1–21 on a scale from −1.0 to 1.0: NOR0819/NOR0825, EADS0820, EADS0821, RFF0822, RFF0823, CHWC0824, CHWC0903, OADS0826, CHWV0827, CHWV0831, CHWV0901, CHWV0902, HCL0828, HCL0829, HCL0830, OADL0905, OADL0906, AHUL0907, AHUL0908.]
Fig. 14. Patterns constituted by the characteristic features for each data in the ASHRAE-1312-RP Summer 2007 dataset.
[Fig. 15 panels, each over characteristic features 1–19 on a scale from −1.0 to 1.0: NOR0502/NOR0503/NOR0504/NOR0505/NOR0509, OASB0529, OASB0530, OADS0507, OADS0508, EADS0527, EADS0510, EADS0511, CHW0506, CHW0515, CHW0516, RFF0512, RFF0518, RFF0519, AFAB0522, AFAB0525, MADU0513, MADU0514, HCSF0517, HCSF0601.]
Fig. 15. Patterns constituted by the characteristic features for each data in the ASHRAE-1312-RP Spring 2008 dataset.
[Fig. 16 panels, each over characteristic features 1–7 on a scale from −1.0 to 1.0: NOR0129/NOR0216/NOR0217, OADS0212, OADL0213, OADL0215, EADS0202, EADS0203, CHW0210, CHW0211, HCF0205, HCF0206, HCRC0207, HCRC0208, HCRC0209.]
Fig. 16. Patterns constituted by the characteristic features for each data in the ASHRAE-1312 Winter 2008 dataset.
these faults seem to have no effect on the manually selected
features. Similar cases are seen with WEAC K-means and complete
linkage. Using features from ERCE allows NARX-TDNN to
reach a significantly higher average specificity of 98.37% ± 0.25%.
The significance of the results is statistically validated on both the
Summer 2007 and Spring 2008 datasets, where signals exhibit
more nonlinearities than those in the Winter 2008
dataset.
Regarding the general performance of the classifiers, the results in
Table 4 show the comparative performance of HMM and
NARX-TDNN. While HMM shows superior specificity on the Winter
2008 dataset, its specificity on Spring 2008 and Summer 2007
is not as high. This is arguably due to the nonlinearities
in the fault patterns in the Spring 2008 and Summer 2007
datasets compared to the Winter 2008 faults. For instance, it can
be seen in Fig. 15 that MADU, AFAB and HCSF faults exhibit
visually ambiguous patterns. When dealing with these nonlinear
datasets, the NARX-TDNN classifier benefits from its capability
to handle long-term dependencies. Table 4 shows that
NARX-TDNN was capable of distinguishing these faults, achieving
a specificity of 98.37% ± 0.25% using the features provided by
ERCE.
Table 3
NARX-TDNN classification result.
Fault type | Manual selection (a): Sensitivity | Manual selection (a): Specificity | ERCE (b): Sensitivity | ERCE (b): Specificity
Summer 2007
NOR | 99.9% ± 0.1% | 98.1% ± 1.6% | 99.9% ± 0.2% | 99.0% ± 2.1%
EADS | 99.7% ± 0.5% | 99.5% ± 2.7% | 99.8% ± 0.3% | 98.9% ± 2.5%
RFF | 99.9% ± 0.0% | 99.0% ± 2.7% | 99.9% ± 0.1% | 99.5% ± 1.4%
CHWC | 99.9% ± 0.2% | 99.0% ± 1.1% | 99.8% ± 0.2% | 99.0% ± 4.4%
OADS | 99.9% ± 0.2% | 98.0% ± 2.2% | 99.9% ± 0.3% | 97.3% ± 3.1%
CHWV | 99.8% ± 0.3% | 99.0% ± 4.3% | 99.7% ± 0.9% | 99.2% ± 2.5%
HCL | 99.7% ± 0.4% | 98.0% ± 1.0% | 99.7% ± 0.3% | 98.4% ± 2.4%
OADL | 99.7% ± 0.5% | *95.2% ± 7.1% | 99.9% ± 0.2% | 98.0% ± 1.2%
AHUL | 99.8% ± 0.2% | 99.8% ± 1.1% | 99.9% ± 0.1% | 99.5% ± 2.6%
Weighted average | 99.8% ± 0.1% | *96.8% ± 2.2% | 99.8% ± 0.1% | 98.4% ± 0.7%
Spring 2008
NOR | 99.8% ± 0.3% | 99.3% ± 2.1% | 99.9% ± 0.1% | 99.6% ± 0.6%
OASB | 99.1% ± 1.5% | *95.0% ± 6.1% | 99.7% ± 0.3% | 99.5% ± 1.4%
OADS | 99.9% ± 0.2% | *98.2% ± 1.7% | 99.8% ± 0.1% | 99.5% ± 0.9%
EADS | 99.9% ± 0.1% | *98.3% ± 0.5% | 99.9% ± 0.1% | 99.0% ± 2.8%
CHW | 99.7% ± 0.4% | *98.7% ± 0.8% | 99.8% ± 0.2% | 99.3% ± 0.7%
RFF | 99.9% ± 0.2% | *82.6% ± 33.1% | 99.8% ± 0.1% | 99.4% ± 0.7%
AFAB | 99.7% ± 0.2% | *42.9% ± 17.8% | 99.7% ± 0.2% | 98.5% ± 4.9%
MADU | 98.6% ± 1.6% | *70.4% ± 39.8% | 98.9% ± 0.2% | 98.0% ± 4.0%
HCSF | 99.6% ± 0.6% | *94.7% ± 6.6% | 99.9% ± 0.0% | 99.5% ± 1.5%
Weighted average | 98.9% ± 0.2% | *86.2% ± 5.0% | 99.9% ± 0.1% | 99.2% ± 0.5%
Winter 2008
NOR | 99.6% ± 0.4% | 99.3% ± 1.1% | 99.8% ± 0.1% | 98.3% ± 2.4%
OADS | 99.9% ± 0.1% | *95.6% ± 3.8% | 99.8% ± 0.2% | 98.7% ± 1.4%
OADL | 99.8% ± 0.4% | 98.5% ± 3.2% | 99.5% ± 0.7% | 98.5% ± 1.5%
EADS | 99.9% ± 0.4% | 97.9% ± 1.3% | 99.6% ± 0.3% | 97.5% ± 2.5%
CHW | 99.8% ± 0.4% | *97.5% ± 5.2% | 99.6% ± 0.3% | 99.1% ± 1.2%
HCF | 99.8% ± 0.4% | *95.1% ± 4.5% | 99.2% ± 0.7% | 97.2% ± 2.9%
HCRC | 99.8% ± 0.4% | 99.0% ± 2.2% | 99.8% ± 0.3% | 99.4% ± 1.1%
Weighted average | 99.7% ± 0.2% | 97.5% ± 0.7% | 99.8% ± 0.1% | 98.7% ± 0.7%
H_0^*: The performance of NARX-TDNN using features from ERCE is not significantly better than using manually selected features.
(a) Manual selection utilizes Shun Li’s feature set [20].
(b) ERCE features are as shown in Figs. 14–16.
(*) Reject H_0^* (α = 0.001).
Table 4
Performance comparison with competing feature selection methods, tested against two classification methods: NARX-TDNN and HMM.
Feature selection | # of features | HMM WA sensitivity | HMM WA specificity | NARX-TDNN WA sensitivity | NARX-TDNN WA specificity
Summer 2007
Manual selection (a) | 16 ± 0.00 | *98.65% ± 0.34% | 89.45% ± 2.48% | †99.59% ± 0.12% | †96.81% ± 1.99%
EAC K-means | 29.85 ± 17.26 | *98.70% ± 0.50% | *85.01% ± 4.94% | †99.69% ± 0.22% | *,†95.07% ± 3.75%
WEAC K-means | 14.14 ± 13.09 | *97.69% ± 0.13% | *72.85% ± 1.48% | †99.79% ± 0.08% | *,†96.85% ± 2.31%
Complete linkage | 81.00 ± 0.00 | 98.71% ± 0.98% | 90.49% ± 7.52% | †99.51% ± 0.27% | †96.42% ± 1.16%
ERCE | 21.41 ± 4.46 | 99.15% ± 0.32% | 90.85% ± 4.16% | †99.69% ± 0.08% | †97.61% ± 0.85%
Spring 2008
Manual selection (a) | 16 ± 0.00 | 98.90% ± 0.54% | †91.54% ± 2.98% | *98.89% ± 0.23% | *86.17% ± 5.01%
EAC K-means | 34.56 ± 9.40 | 98.55% ± 0.42% | 91.85% ± 2.68% | *,†99.02% ± 0.81% | *91.92% ± 6.42%
WEAC K-means | 33.52 ± 10.32 | 98.83% ± 0.40% | 93.37% ± 2.38% | †99.20% ± 0.49% | *92.37% ± 6.53%
Complete linkage | 84 ± 0.00 | 98.80% ± 0.46% | 94.12% ± 2.61% | †99.62% ± 0.17% | *95.14% ± 1.29%
ERCE | 19.93 ± 5.19 | 98.84% ± 0.32% | 92.68% ± 2.66% | †99.79% ± 0.10% | †98.37% ± 0.25%
Winter 2008
Manual selection (a) | 16 ± 0.00 | 98.81% ± 0.56% | *92.92% ± 0.31% | †99.71% ± 0.15% | †97.51% ± 0.65%
EAC K-means | 27.74 ± 7.18 | †99.98% ± 0.14% | †99.85% ± 0.85% | 99.49% ± 0.50% | 97.87% ± 2.06%
WEAC K-means | 21.37 ± 11.75 | †99.96% ± 0.18% | 99.79% ± 1.00% | 99.59% ± 0.19% | 97.68% ± 0.88%
Complete linkage | 95 ± 0.00 | 99.87% ± 0.40% | 99.21% ± 2.37% | 99.74% ± 0.13% | 98.54% ± 1.01%
ERCE | 7.88 ± 3.02 | 99.92% ± 0.31% | 99.49% ± 1.43% | 99.73% ± 0.19% | 98.35% ± 1.16%
H_0^*: The performance of a classifier using features from ERCE is not significantly better than using features from algorithm X. H_0^†: Given the same feature selection algorithm,
a trained classifier A does not exercise significantly better performance compared to classifier B.
(a) Manual selection utilizes Shun Li’s feature set [20].
(*) Reject H_0^* (α = 0.001).
(†) Reject H_0^† (α = 0.001).
8. Conclusion
A method for automating feature selection and classification
of faults for Heating Ventilation and Air-Conditioning (HVAC) sys-
tems using a knowledge-discovery and Neural-Network approach
has been proposed. The core of the method is the Ensemble Rapid
Centroid Estimation (ERCE) which automatically finds characteris-
tic features and discards redundant features. Using these character-
istic features, a Parallel Nonlinear Auto-Regressive Neural Network
with eXogenous inputs and distributed time delays (NARX-TDNN)
is then trained to identify the faults described in ASHRAE-1312-RP
Summer 2007, Spring 2008, and Winter 2008 datasets.
The proposed unsupervised feature selection algorithm
(ERCE, Median NMI = 0.019 ± 0.004)
generally outperformed conventional consensus clustering
methods, including Evidence Accumulation K-means (Median
NMI = 0.040 ± 0.011) and Weighted Evidence Accumulation K-means
(Median NMI = 0.048 ± 0.034), as well as conventional complete
linkage clustering (Median NMI = 0.1305). ERCE also had smaller
standard deviations on all performance aspects, especially on the
number of features, suggesting the relatively high reliability and
repeatability of the proposed swarm-based consensus clustering
algorithm.
The proposed feature selection method was tested on the
experimental fault data from the ASHRAE-1312-RP datasets, including
Summer 2007, Spring 2008, and Winter 2008, using two
well-established time-domain classifiers: (a) NARX-TDNN; and (b)
Hidden Markov Models (HMM). Satisfactory results were reported
and summarized. Our experimental results showed weighted average
sensitivity and specificity of: (a) higher than 99% and 96% for
NARX-TDNN; and (b) higher than 98% and 86% for HMM on the
ASHRAE-1312-RP datasets. The proposed feature selection method
appears to have a positive effect on the generalization
capability of both AFDD algorithms based on our experiment.
Notwithstanding the satisfactory results to date, further work
is necessary to investigate the performance of the proposed
method on alternative HVAC systems. Future work will incorporate
semi-supervised adaptive learning capability for automatic
fault discovery. We are also interested in applying the proposed
consensus clustering method to other applications.
Acknowledgements
This research is funded by The Commonwealth Scientific and
Industrial Research Organisation (CSIRO), Marsfield, Australia. The
ASHRAE-1312-RP Summer 2007, Spring 2008, and Winter 2008
fault data are provided by CSIRO. The research is supervised
by CSIRO, and the paper writing is supervised specifically by Guo.
Automatic Fault Detection and Diagnosis (AFDD) for Heating
Ventilation and Air Conditioning (HVAC) research is an ongoing
project in CSIRO Energy Technology and Computational Informatics.
We thank the anonymous reviewers for the time and effort they
spent providing comprehensive, high-quality criticism of our paper.
The corresponding author would also like to personally
acknowledge Nina Elita for her contribution, especially in
proofreading and the provision of sincere moral support to the
corresponding author during the preparation, writing and submission of this
paper.
References
[1] A. Kusiak, M. Li, F. Tang, Modeling and optimization of {HVAC} energy con-
sumption, Appl. Energy 87 (2010) 3092–3102.
[2] A. Kusiak, F. Tang, G. Xu, Multi-objective optimization of {HVAC} system with
an evolutionary computation algorithm, Energy 36 (2011) 2440–2449.
[3] J. Wall, Automatic Fault Detection and Diagnosis, 2011 http://www.csiro.au/
Outcomes/Energy/building-fault-detection.aspx
[4] J. Ward, Opticool, 2013 http://www.csiro.au/Organisation-Structure/
Flagships/Energy-Flagship/Opticool.aspx
[5] J. Liang, R. Du, Model-based fault detection and diagnosis of HVAC systems
using support vector machine method, Int. J. Refrig. 30 (2007) 1104–1114.
[6] D. Jacob, S. Dietz, S. Komhard, C. Neumann, S. Herkel, Black-box models for fault
detection and performance monitoring of buildings, J. Build. Perform. Simul. 3
(2010) 53–62.
[7] C. Lo, P. Chan, Y.-K. Wong, A.B. Rad, K. Cheung, Fuzzy-genetic algorithm for auto-
matic fault detection in HVAC systems, Appl. Soft Comput. 7 (2007) 554–560.
[8] J. Schein, S.T. Bushby, N.S. Castro, J.M. House, A rule-based fault detection
method for air handling units, Energy Build. 38 (2006) 1485–1492.
[9] T.M. Rossi, J.E. Braun, A statistical, rule-based fault detection and diagnostic
method for vapor compression air conditioners, HVAC&R Res. 3 (1997) 19–37.
[10] J. Schein, Results from Field Testing of Embedded Air Handling Unit and Variable
Air Volume Box Fault Detection Tools, U.S. Dept. of Commerce, Technology
Administration, National Institute of Standards and Technology, 2006.
[11] J. Wall, Y. Guo, J. Li, S. West, A dynamic machine learning-based tech-
nique for automated fault detection in HVAC systems, in: Proceedings of
the ASHRAE Annual Conference, Montreal, Quebec, Canada, 2011, 2011,
pp. 449–456.
[12] Y. Guo, D. Dehestani, J. Li, J. Wall, S. West, S. Su, Intelligent outlier detection for
HVAC system fault detection, in: Proceedings of the 10th International Healthy
Buildings Conference, Brisbane, Queensland, Australia, 2012, 2012.
[13] Y. Guo, J. Wall, J. Li, S. West, Intelligent model based fault detection and
diagnosis for HVAC system using statistical machine learning methods, in:
Proceedings of the ASHRAE 2013 Winter Conference, Dallas, USA, 2013, 2013.
[14] M. Yuwono, S.W. Su, Y. Guo, J. Li, S. West, J. Wall, Automatic feature selection
using multiobjective cluster optimization for fault detection in a heating venti-
lation and air conditioning system, in: Proceedings of the 2013 1st International
Conference on Artificial Intelligence, Modelling and Simulation, AIMS ’13, IEEE
Computer Society, Washington, DC, USA, 2013, 2013, pp. 171–176, http://dx.
doi.org/10.1109/AIMS.2013.34
[15] W. Deng, X. Yang, L. Zou, M. Wang, Y. Liu, Y. Li, An improved self-adaptive
differential evolution algorithm and its application, Chemometr. Intell. Lab.
Syst. 128 (2013) 66–76, http://dx.doi.org/10.1016/j.chemolab.2013.07.004
[16] L. Wang, C.-X. Dun, W.-J. Bi, Y.-R. Zeng, An effective and efficient differen-
tial evolution algorithm for the integrated stochastic joint replenishment and
delivery model, Knowl.-Based Syst. 36 (2012) 104–114, http://dx.doi.org/10.
1016/j.knosys.2012.06.007
[17] M. Yuwono, S. Su, B. Moulton, H. Nguyen, Data clustering using variants of rapid
centroid estimation, IEEE Trans. Evol. Comput. 18 (2013) 366–377.
[18] M. Yuwono, S. Su, B. Moulton, H. Nguyen, An algorithm for scalable clustering:
ensemble rapid centroid estimation, in: Proceedings of the 2014 IEEE Congress
on Evolutionary Computation, 2014, 2014, pp. 1250–1257.
[19] D.W. van der Merwe, A.P. Engelbrecht, Data clustering using particle swarm
optimization, in: Proceedings of the 2003 IEEE Congress on Evolutionary Com-
putation, 2003, vol. 1, 2003, 2003, pp. 215–220.
[20] S. Li, A Model-Based Fault Detection and Diagnostic Methodology for Secondary
HVAC Systems (Ph.D. thesis), Drexel University, 2014.
[21] S. Kullback, R.A. Leibler, On information and sufficiency, Ann. Math. Stat. 22
(1951) 79–86, http://dx.doi.org/10.1214/aoms/1177729694
[22] S. Monti, P. Tamayo, J. Mesirov, T. Golub, Consensus clustering: A resampling-
based method for class discovery and visualization of gene expression
microarray data, Mach. Learn. 52 (2003) 91–118, http://dx.doi.org/10.1023/
A:1023949509487
[23] M.D. Wilkerson, D.N. Hayes, ConsensusClusterPlus: a class discovery tool
with confidence assessments and item tracking, Bioinformatics 26 (2010)
1572–1573.
[24] D.N. Hayes, S. Monti, G. Parmigiani, C.B. Gilks, K. Naoki, A. Bhattacharjee,
M.A. Socinski, C. Perou, M. Meyerson, Gene expression profiling reveals
reproducible human lung adenocarcinoma subtypes in multiple independent patient
cohorts, J. Clin. Oncol. 24 (2006) 5079–5090.
[25] A. Fred, A. Jain, Combining multiple clusterings using evidence accumulation,
IEEE Trans. Pattern Anal. Mach. Intell. 27 (2005) 835–850,
http://dx.doi.org/10.1109/TPAMI.2005.113
[26] A. Strehl, J. Ghosh, Cluster ensembles – a knowledge reuse framework for
combining multiple partitions, J. Mach. Learn. Res. 3 (2003) 583–617,
http://dx.doi.org/10.1162/153244303321897735
[27] I.J. Leontaritis, S.A. Billings, Input–output parametric models for non-linear
systems. Part I: Deterministic non-linear systems, Int. J. Control 41 (1985)
303–328, http://dx.doi.org/10.1080/0020718508961129
[28] H. Siegelmann, B. Horne, C. Giles, Computational capabilities of recurrent NARX
neural networks, IEEE Trans. Syst. Man Cybern. Part B: Cybern. 27 (1997)
208–215, http://dx.doi.org/10.1109/3477.558801
[29] J.M. Menezes Jr., G. Barreto, A new look at nonlinear time series prediction
with NARX recurrent neural network, in: Ninth Brazilian Symposium on Neural
Networks, SBRN ’06, 2006, pp. 160–165,
http://dx.doi.org/10.1109/SBRN.2006.7
[30] T. Wang, Comparing hard and fuzzy C-means for evidence-accumulation
clustering, in: Proceedings of the 18th International Conference on Fuzzy Systems,
FUZZ-IEEE’09, IEEE Press, Piscataway, NJ, USA, 2009, pp. 468–473.
[31] F. Duarte, A.L.N. Fred, A. Lourenco, M. Rodrigues, Weighting cluster ensembles
in evidence accumulation clustering, in: Portuguese Conference on Artificial
Intelligence, EPIA 2005, 2005, pp. 159–167,
http://dx.doi.org/10.1109/EPIA.2005.341287
[32] M. Yuwono, S.W. Su, B.D. Moulton, H.T. Nguyen, Fast unsupervised learning
method for rapid estimation of cluster centroids, in: Proceedings of the 2012
IEEE Congress on Evolutionary Computation, 2012, pp. 889–896.
[33] J.C. Bezdek, Mathematical models for systematics and taxonomy, in: G.
Estabrook (Ed.), Proceedings of the 8th International Conference on Numerical
Taxonomy, Freeman, San Francisco, CA, 1975, pp. 143–166.
[34] T. Wang, Ca-tree: a hierarchical structure for efficient and scalable
coassociation-based cluster ensembles, IEEE Trans. Syst. Man Cybern. Part B:
Cybern. 41 (2011) 686–698, http://dx.doi.org/10.1109/TSMCB.2010.2086059
[35] P.J. Rousseeuw, Silhouettes: a graphical aid to the interpretation and validation
of cluster analysis, J. Comput. Appl. Math. 20 (1987) 53–65.
HVAC_CSIRO_Proof_2015

  • 1. Please cite this article in press as: M. Yuwono, et al., Unsupervised feature selection using swarm intelligence and consensus clus- tering for automatic fault detection and diagnosis in Heating Ventilation and Air Conditioning systems, Appl. Soft Comput. J. (2015), http://dx.doi.org/10.1016/j.asoc.2015.05.030 ARTICLE IN PRESSG Model ASOC29831–24 Applied Soft Computing xxx (2015) xxx–xxx Contents lists available at ScienceDirect Applied Soft Computing journal homepage: www.elsevier.com/locate/asoc Unsupervised feature selection using swarm intelligence and consensus clustering for automatic fault detection and diagnosis in Heating Ventilation and Air Conditioning systems Mitchell Yuwonoa,∗Q1 , Ying Guob , Josh Wallc , Jiaming Lib , Sam Westc , Glenn Plattc , Steven W. Sua a Faculty of Engineering and Information Technology, University of Technology, Sydney (UTS), 15 Broadway, Ultimo, NSW 2007, Australia b The Commonwealth Scientific and Industrial Research Organisation (CSIRO), Division of Computational Informatics, Marsfield, NSW 2122, Australia c The Commonwealth Scientific and Industrial Research Organisation (CSIRO), Division of Energy Technology, Mayfield West, NSW 2304, Australia a r t i c l e i n f o Article history: Received 4 May 2014 Received in revised form 12 February 2015 Accepted 17 May 2015 Available online xxx Keywords: Data clusteringQ4 Consensus clustering Feature selection Ensemble Rapid Centroid Estimation (ERCE) Particle Swarm Optimization Fault detection and diagnosis Heating Ventilation and Air Conditioning (HVAC) system Nonlinear Auto-Regressive Neural Network with eXogenous inputs and distributed time delays (NARX-TDNN) Hidden Markov Model a b s t r a c t Various sensory andQ3 control signals in a Heating Ventilation and Air Conditioning (HVAC) system are closely interrelated which give rise to severe redundancies between original signals. 
These redundancies may cripple the generalization capability of an automatic fault detection and diagnosis (AFDD) algo- rithm. This paper proposes an unsupervised feature selection approach and its application to AFDD in a HVAC system. Using Ensemble Rapid Centroid Estimation (ERCE), the important features are auto- matically selected from original measurements based on the relative entropy between the low- and high-frequency features. The materials used is the experimental HVAC fault data from the ASHRAE- 1312-RP datasets containing a total of 49 days of various types of faults and corresponding severity. The features selected using ERCE (Median normalized mutual information (NMI) = 0.019) achieved the least redundancies compared to those selected using manual selection (Median NMI = 0.0199) Complete Linkage (Median NMI = 0.1305), Evidence Accumulation K-means (Median NMI = 0.04) and Weighted Evi- dence Accumulation K-means (Median NMI = 0.048). The effectiveness of the feature selection method is further investigated using two well-established time-sequence classification algorithms: (a) Nonlinear Auto-Regressive Neural Network with eXogenous inputs and distributed time delays (NARX-TDNN); and (b) Hidden Markov Models (HMM); where weighted average sensitivity and specificity of: (a) higher than 99% and 96% for NARX-TDNN; and (b) higher than 98% and 86% for HMM is observed. The proposed feature selection algorithm could potentially be applied to other model-based systems to improve the fault detection performance. © 2015 Published by Elsevier B.V. 1. Introduction Q5 Heating Ventilation and Air Conditioning (HVAC) systems are important for maintaining the thermal comfort and indoor air qual- ity at places such as offices, shopping malls, warehouses, schools, and homes [1,2]. According to the report by CSIRO [3], 25% of energy consumption in Australia is accounted from commercial buildings [3]. 
Moreover, HVAC systems represents 40–50% of energy use in these buildings [4]. In the United States (US), HVAC systems account for almost 31% of the electricity consumed by households ∗ Corresponding author. Tel.: +61 430731938.Q2 E-mail addresses: mitchellyuwono@gmail.com (M. Yuwono), Ying.Guo@csiro.au (Y. Guo), Josh.Wall@csiro.au (J. Wall), Jiaming.Li@csiro.au (J. Li), Sam.West@csiro.au (S. West), Glenn.Platt@csiro.au (G. Platt), Steven.Su@uts.edu.au (S.W. Su). [1]. Operational problems in the HVAC systems can cause excess energy consumption. Regular checks and maintenance are there- fore crucial to prevent unnecessary consumption. However, due to the high reactionary maintenance costs, preventive or predictive maintenance practices are usually preferred to reactionary main- tenance. Discriminating a normally behaving HVAC system to a fault condition is a relatively well researched area. A variety of auto- matic fault detection and diagnosis (AFDD) techniques provide a number of benefits to the HVAC systems [5–7]. The current AFDD techniques available in the market for HVAC systems are mainly rule-based approaches [8–10], which obtain prior knowledge to derive a set of if-then-else rules and an inference mechanism that searches through the rule-space to draw conclusions. The rule- based systems can be based solely on expert knowledge (inferred from experience) or can be based on prior knowledge of a specific http://dx.doi.org/10.1016/j.asoc.2015.05.030 1568-4946/© 2015 Published by Elsevier B.V. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56
  • 2. Please cite this article in press as: M. Yuwono, et al., Unsupervised feature selection using swarm intelligence and consensus clus- tering for automatic fault detection and diagnosis in Heating Ventilation and Air Conditioning systems, Appl. Soft Comput. J. (2015), http://dx.doi.org/10.1016/j.asoc.2015.05.030 ARTICLE IN PRESSG Model ASOC29831–24 2 M. Yuwono et al. / Applied Soft Computing xxx (2015) xxx–xxx system. Being one of the very first methods used in HVAC fault detection problems, the rule-based approaches have been most popularly used over the last decades. Indeed the rule-based approaches come with advantages including ease of development, transparent reasoning, ability to reason even under uncertainty, and the ability to provide explanations for the conclusions reached. However, one must realize that most HVAC systems are installed in different build- ings/environments. This generally means that rules or analytical models developed for a particular system cannot be easily applied to an alternative system. As such, the difficult process of deter- mining and setting rules or generating analytical mathematical models must be tailored to each individual building/environment. The threshold method utilized in the rule-based system is prone to producing false alarms. Moreover, building conditions such as structure of the internal architecture design and even external fac- tors (such as shading and the growth of plant life) often change after the system installation/initialization of a fault detection system, which can require rules/models that were originally appropri- ate to be revisited and updated. It can be learned that a number of weaknesses associated with this type of approach include the requirement of specific tailoring to a system, potential failure of the AFDD system due to its limited knowledge boundaries, and dif- ficulty in updating the model when the AFDD system is installed in a different HVAC system. 
The aforementioned complications with the rule-based approach give rise to the data driven methods for AFDD in HVAC systems. Regardless of the approach, the performance of an AFDD algo- rithm generally depends on the quality of the features. In CSIRO, we are developing a novel data-driven machine learning technique for AFDD in HVAC systems [4,11–14]. Preliminary results were presented in [11–14], showing the superior performance of the machine learning-based technique in detecting air-handling unit (AHU) faults to rule-based methods based on fault data obtained from ASHRAE Project 1312-RP up to 90% accuracy [13]. However, one limitation of the AFDD systems described in [11–13] is that they rely on features provided by field experts. As with rules, fea- tures that are particularly effective for a particular system may not guarantee equivalent performance when utilized in an alternative system. Selecting the appropriate features is essential in any model- based frameworks. Feature selection aims for minimizing redun- dancies/mutual information between features such that the more important ‘characteristic’ features are not undermined. Specific faults exhibit specific symptoms which are observable only in certain clusters of features that behave differently to the others. The difficulty is that these cluster of features need to be con- stantly monitored as they may change dynamically depending on the condition of the HVAC system under investigation. Moreover, incorrect selections of these characteristic features are dangerous as they may adversely effect the final classifier to an extent that some obvious faults are overlooked. The motivation of this paper is therefore to design a reliable method for feature selection that can be used to augment the effectiveness of AFDD frameworks in general. The unsupervised data-driven feature selection algorithm is designed for HVAC systems operating under varying seasonal dynamics. 
Evolutionary algorithms are particularly powerful for solving complex optimization problems with multiple local minima. For example, Differential Evolution (DE) has been used for optimization of pressure vessel structure design [15] and joint replenish- ment and distribution model [16]. Although the methods outlined in [15,16] are powerful for general purpose optimization, a major algorithmic restructuring is required to implement these algorithms for cluster optimization. Instead, our paper is inter- ested in exploiting a lightweight evolutionary algorithm designed specifically for clustering purposes, the Rapid Centroid Estimation (RCE) [17]. Unsupervised feature selection based on data clustering is inher- ently an ill-posed problem where the goal is to group redundant features into some unknown number of clusters based on intrin- sic information alone. For this paper, we utilize the Ensemble Rapid Centroid Estimation (ERCE) [17,18], a semi-stochastic multi-swarm clustering algorithm inspired by the Particle Swarm Optimization (PSO [19]), to determine the characteristic features for the specific season. The method is designed to automate the selection of charac- teristic features in each season. The block diagram of the proposed method is shown in Fig. 1. The performance of the proposed feature selection algorithm was tested using two well established time-sequence classifiers: (a) Nonlinear Auto-Regressive Time Delay Neural Networks with Exogenous inputs (NARX TDNN); and (b) Hidden Markov Models (HMM) [13]. A comprehensive comparison would also be given with regards to other feature selection methods including Li’s Manual selection [20], Complete Linkage (CL), Ensemble Evidence Accumulation K-means (EAC K-means) and Weighted Evidence Accumulation K-means (WEAC K-means). The paper is structured as follows: Section 2 presents the overview of the proposed method as well as the materials used to examine its performance. 
Section 3 presents the detailed descrip- tion for each component including feature extraction, feature selection, and the classifier used in experiment. Section 4 describes the theoretical foundations of the consensus clustering algorithm that we utilize for performing the feature selection. Section 5 describes the data utilized in the experiments. Section 6 presents a comprehensive experimental result of the proposed method and comparative analysis with other conventional feature selection and classification algorithms. Section 7 presents in depth analyses and discussion regarding the results. Finally, Section 8 presents the con- clusion and future direction of the research. 2. General overview on HVAC systems HVAC systems are configured and used to control the environ- ment of a building or a zone including one or several rooms. The environmental variables may, for example, include temperature, air-flow, and humidity. The desired values/set-points of the envi- ronmental variables will depend on the intended use of the HVAC system. If the HVAC system is being used in an office building, the environmental variables will be set to make the building/rooms therein comfortable to humans. An HVAC system typically services a number of zones within a building. The system normally includes a central plant which includes: • a hydronic heater and chiller, • a pump system, which may include dedicated heated and chilled water pumps, circulates heated and chilled water from the heater and chiller through a circuit of interconnected pipes, and • a valve system, which may include dedicated heated and chilled water valves, controls the flow of water into a heat exchange system (which may include dedicated heated and chilled water coils). The heated and/or chilled water circulates through the heat exchange system before being returned to the central plant where the process repeats (i.e. the water is heated or chilled and recircu- lated). 
In the heat exchange system, energy from the heated/chilled water is exchanged with air being circulated through an air distri- bution system. The HVAC system also includes a sensing system which typically includes a number of sensors located throughout the system, such 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183
  • 3. Please cite this article in press as: M. Yuwono, et al., Unsupervised feature selection using swarm intelligence and consensus clus- tering for automatic fault detection and diagnosis in Heating Ventilation and Air Conditioning systems, Appl. Soft Comput. J. (2015), http://dx.doi.org/10.1016/j.asoc.2015.05.030 ARTICLE IN PRESSG Model ASOC29831–24 M. Yuwono et al. / Applied Soft Computing xxx (2015) xxx–xxx 3 Fig. 1. Block diagram of the proposed method. as temperature, humidity, air velocity, volumetric flow, pressure, gas, position, and occupancy detection sensors. The HVAC system is controlled by a control system that may be a stand alone system, or may form part of a building automation system (BAS) or build- ing management and control system (BMCS). The control system includes a computing system which is in communication with the various components of the HVAC system. The control system con- trols and/or receives feedback from the various components of the HVAC system in order to regulate environmental conditions for the inhabitancy or functional purpose of the building. In an AFDD process, data from the components of the HVAC system is received. This data may, for example, include sensed data from various sensors within the system and feedback data from various components of the system. Additional data from external data sources can also be received, such as the external weather data. Consequently, the dimensionality and volume of these data are enormous. In order to ensure proper identification of faults, an AFDD algo- rithm requires redundancies in the selected sensory and control signal sources to be minimized. Additional information given by redundant features are irrelevant and provide no useful informa- tion in describing the type of fault and will ultimately cripple the generalization capability of the fault detector. Insufficient features are equally as dangerous as it may lead misdiagnoses due to incom- plete information. 
The method presented in this paper offers an unsupervised approach to feature selection using ERCE. The system is summarized in the block diagram in Fig. 1. A sample feature extraction and feature selection result using our proposed approach can be seen in Fig. 2.
The experimental materials in this paper are the experimental fault data from the ASHRAE-1312-RP datasets, including Summer 2007, Spring 2008, and Winter 2008, from the ASHRAE Project 1312-RP. In each season, different faults were generated, recorded and reported for experimental use.

3. Methods

Selecting important features in an HVAC system is challenging due to the excessive interrelations between signals. This section overviews our contribution on feature selection using consensus clustering and how it is applied to the HVAC system in particular. The section is subdivided into five subsections:
• Section 3.1 outlines the general model that we use for extracting magnitude and oscillation (spectral centroid) features from a raw signal.
• Section 3.2 outlines our proposed polar approach for visualizing multi-dimensional patterns.
• Section 3.3 defines the measure that we use for quantifying the degree of dissimilarity between features.
• Section 3.4 provides a general overview of our main contribution, a method for feature selection using semi-stochastic swarm-based consensus clustering, which is further detailed in Section 4.
• Section 3.5 shows the architecture of the neural networks that we use to benchmark the efficiency of the proposed feature selection method.

3.1. Extracting time signal features: magnitude and spectral centroid

Sensory signals from an HVAC system are streamed in the form of sampled time signals. From each time signal, HVAC engineers mainly observe two features when deciding the condition of the system:
1. Whether the average magnitude of a sensory reading is within the typical range for the specific season.
2. Whether there is any excessive oscillation in the sensory readings compared to the typical condition for the specific season.
For example, a fault type classified as Sequence of Heating and Cooling Unstable (HCSF0517) can be identified by observing the excessive oscillation of the Chilled Water Coil control signal (CHWC GPM). The phenomenon can be seen in Fig. 3. In this figure, it is easy to observe that the moving average magnitude of the CHWC GPM during HCSF0517 stays close to the typical behavior, whereas its oscillation does not.
We model these two features mathematically as the moving average magnitude and the spectral centroid. For a discrete signal $g_s(n)$, the two features can be measured as follows.
The magnitude characteristic is measured using a simple moving average,
$$\mathrm{MAG}(g_s) = \frac{1}{N} \sum_{n=1}^{N} g_s(n), \tag{1}$$
where $n$ denotes the sample number and $N$ denotes the length of the window.
The spectral centroid of a signal describes the center of mass of its spectrum, which can be calculated as follows,
$$\hat{g}_s = \mathrm{FFT}(g_s, N_{FFT}), \tag{2}$$
$$\mathrm{SC}(g_s) = \frac{\sum_{n=5}^{N_{FFT}} |\hat{g}_s(n)|\, \hat{g}_s(n)}{\sum_{n=5}^{N_{FFT}} |\hat{g}_s(n)|}, \tag{3}$$
where FFT denotes the fast Fourier transform, $N_{FFT}$ indicates the number of bins, and $\hat{g}_s(n)$ and $|\hat{g}_s(n)|$ represent the center frequency and magnitude of the $n$th bin. Notice that the frequency centroid is calculated from the fifth bin onward to isolate only the high frequency oscillation.
  • 4. Fig. 2. (a) Raw signals for the Spring 2008 dataset; (b) the low and high frequency features are isolated from each signal. Signals 1–160 are moving average magnitude signals while signals 161–320 are spectral centroid signals; (c) characteristic features are selected using ERCE, while (d) classification is done using NARX-TDNN.
A fault can be interpreted as 'how much a signal deviates from its typical characteristic during the specific season'. Incorporating this criterion, each feature vector $q_s$, which includes $\{\mathrm{MAG}(g_s), \mathrm{SC}(g_s)\}$, is normalized with respect to its normal operation. The discrepancy in both direction and magnitude relative to the normal signal is represented as a signed multiple of the signal's standard deviation during typical operation,
$$z_s(n) = \frac{q_s(n) - \mu_n(n)}{\sigma_n(n)}, \tag{4}$$
where $\mu_n(n)$ and $\sigma_n(n)$ denote the mean and standard deviation of a feature during its normal operation at a specific sample $n$ taken at a particular time of the day. In other words, the approach simply calculates the cross-sectional z-score of the feature $q_s$.
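Eqs. (1) to (4) can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: the function names, the use of a one-sided real FFT, the sampling rate `fs`, and the zero-based `first_bin=4` (standing in for "the fifth bin") are all assumptions made for the example.

```python
import numpy as np

def moving_average_magnitude(g, window):
    """Eq. (1): simple moving average over `window` samples (valid part only)."""
    kernel = np.ones(window) / window
    return np.convolve(g, kernel, mode="valid")

def spectral_centroid(g, n_fft, first_bin=4, fs=1.0):
    """Eqs. (2)-(3): magnitude-weighted mean frequency of the spectrum.

    Assumption: only non-negative frequencies are used (rfft), and bins
    below `first_bin` (zero-based) are skipped to drop the slow trend.
    """
    magnitude = np.abs(np.fft.rfft(g, n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    mag = magnitude[first_bin:]
    total = mag.sum()
    if total == 0.0:
        return 0.0  # flat signal: no oscillation to locate
    return float((mag * freqs[first_bin:]).sum() / total)

def z_score(q, mu_normal, sigma_normal):
    """Eq. (4): signed deviation from normal operation, in standard deviations."""
    return (np.asarray(q, dtype=float) - mu_normal) / sigma_normal

# Hypothetical signal: a constant level plus an oscillation at 0.25 cycles/sample.
n = np.arange(64)
signal = 10.0 + np.sin(2 * np.pi * 0.25 * n)
mag_feature = moving_average_magnitude(signal, window=8)
sc_feature = spectral_centroid(signal, n_fft=64)
```

In this sketch the constant level only affects the magnitude feature, while the oscillation only affects the spectral centroid, which is the separation the two features are meant to provide.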
The hyperbolic tangent kernel is then applied to the z-score, effectively transforming each feature to a continuous measure in $[-1, 1]$ as follows,
$$y_s(n) = \tanh(z_s(n)), \tag{5}$$
which has a rather intuitive 'fuzzy' interpretation:
(a) $y_s(n) = 0$: the feature is at a typical level,
(b) $y_s(n) \to -1$: the feature is atypical negative (much smaller than its typical level),
(c) $y_s(n) \to 1$: the feature is atypical positive (much larger than its typical level).
Intuitively, the variability of $y_s$ throughout the season would provide a good indicator of its importance. In this paper, we measure the variability of a feature in terms of its entropy as follows,
$$H_{y_s} = -\int p_{y_s}(x) \log p_{y_s}(x)\, dx, \tag{6}$$
where $p_{y_s}(x)$ can be approximated empirically from the histogram of $y_s$.

3.2. Feature visualization

Visualization is an important tool to verify the effectiveness of a feature selection algorithm. However, due to the complexity of an HVAC system, simultaneous visualization of all signals would easily overwhelm the observer.
In this paper a polar approach for visualizing patterns constituted by multi-dimensional feature cross-sections is proposed. The visualization scheme can be seen in Fig. 4.
In the proposed visualization scheme, the variable numbers are listed at particular angles around the circle, and the corresponding radius represents the magnitude of $y_s$, as previously detailed in Eq. (5). A normal system would oscillate inside the typical region ($y_s \approx 0$) such that the polar plot shows a circle-like pattern. During a fault condition the sensors behave inside either the positive or negative atypical region, such that the polar plot assumes various shapes other than a circle. For example, Fig. 5 shows that the pattern during normal operation is visually different from the OA Damper Stuck (OADS) fault scenario.

3.3. Measuring divergence between features

A pair of feature vectors $y_1 \in Y$ and $y_2 \in Y$ calculated from Eq. (5) can be treated as vectors of random numbers generated by the probability distribution functions $P = p(x)$ and $Q = q(x)$, respectively. $y_1$ and $y_2$ can be assumed to be redundant (i.e. generated from the same distribution) when the Kullback–Leibler (KL) divergence between the two approaches zero [21]. A practical illustration of the case can be seen in Fig. 6.
  • 5. Fig. 3. The magnitude (top) and frequency (bottom) characteristics of the Chilled Water Control signal (CHWC GPM) during fault (HCSF0517) vs. normal (NOR0505). Even though CHWC GPM during HCSF0517 is correlated with normal operation in terms of its magnitude characteristic, the signal is uncorrelated in terms of its frequency characteristic.
KL-divergence measures the relative entropy between two distributions [21], that is, the amount of information lost when $Q$ is used to approximate $P$,
$$\mathrm{KL}(P\|Q) = \underbrace{-\sum_x p(x) \log q(x)}_{H(P,Q)} + \underbrace{\sum_x p(x) \log p(x)}_{-H(P)} \tag{7}$$
$$= \sum_x p(x) \log \frac{p(x)}{q(x)}, \tag{8}$$
where $H(P, Q)$ denotes the cross entropy between $P$ and $Q$ and $H(P)$ denotes the information entropy of $P$. In this paper we use the symmetrical KL-divergence as originally proposed in [21], due to its symmetry property,
$$\mathrm{KL}_s(P\|Q) = \mathrm{KL}(P\|Q) + \mathrm{KL}(Q\|P) = \sum_x \left( p(x) \log \frac{p(x)}{q(x)} - q(x) \log \frac{p(x)}{q(x)} \right). \tag{9}$$

3.4. Feature selection using consensus clustering

Performing feature selection using prototype-based algorithms such as K-means, fuzzy C-means, or Self Organizing Map (SOM) can be difficult because the number of characteristic features $K$ is not initially known. Consensus clustering provides quantitative evidence for determining the number and membership of possible clusters within a dataset (in our case, features). The method has gained popularity in cancer genomics as a powerful tool to extract and visualize the dependencies between genes [22–24].
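Before moving on to the clustering itself, the quantities it relies on, Eqs. (5), (6) and (9), can be sketched as follows. This is a minimal sketch under stated assumptions: the entropy of Eq. (6) is approximated by a discrete histogram on a fixed grid over $[-1, 1]$, and the bin count and the small smoothing constant `eps` (used to avoid $\log 0$) are illustrative choices, not values from the paper.

```python
import numpy as np

def squash(z):
    """Eq. (5): map z-scores to the interval [-1, 1]."""
    return np.tanh(z)

def histogram_pmf(y, bins=20, eps=1e-12):
    """Empirical distribution of a squashed feature on a fixed grid.

    Assumption: every feature shares the same grid so that two
    distributions can be compared bin by bin.
    """
    counts, _ = np.histogram(y, bins=bins, range=(-1.0, 1.0))
    p = counts.astype(float) + eps  # smoothing keeps log() finite
    return p / p.sum()

def entropy(p):
    """Discrete stand-in for Eq. (6)."""
    return float(-(p * np.log(p)).sum())

def symmetric_kl(p, q):
    """Eq. (9): KL(P||Q) + KL(Q||P) = sum((p - q) * log(p / q))."""
    return float(((p - q) * np.log(p / q)).sum())

# Hypothetical features: y2 is a noisy copy of y1, y3 is unrelated.
rng = np.random.default_rng(0)
y1 = squash(rng.normal(0.0, 1.0, 720))
y2 = squash(np.arctanh(y1) + rng.normal(0.0, 0.05, 720))
y3 = squash(rng.normal(2.0, 0.3, 720))
d_redundant = symmetric_kl(histogram_pmf(y1), histogram_pmf(y2))
d_distinct = symmetric_kl(histogram_pmf(y1), histogram_pmf(y3))
```

A pairwise matrix of `symmetric_kl` values over all features is one way to obtain the dissimilarity that a clustering algorithm would then operate on.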
In this paper we propose an approach for unsupervised feature selection using a swarm-based ensemble algorithm [18]. An advantage of ensemble clustering algorithms over conventional clustering algorithms is that they allow a robust estimation of natural clusters by investigating the consensus strength between multiple clusterings [22,25,26]. Consensus clustering is particularly powerful for identifying strong clusters in the data [22]. This is particularly useful for our application, as can be seen in Section 6, where the features selected using consensus clustering algorithms are generally more compact and less redundant than the ones selected using complete-linkage.
  • 6. Fig. 4. The proposed polar visualization scheme. In this illustration, we can see that features other than features #4 and #5 behave atypically.
The feature selection process can be summarized as follows:
1. Determine the feature clusters using consensus clustering.
2. For each cluster, rank each feature according to its entropy and pick the one whose entropy is the highest as the characteristic feature for the cluster.
A sample result of a run of the feature selection process using consensus clustering is shown in Fig. 7. Features in the same cluster are denoted using the same color. The radius of each feature indicates its entropy. The bold circle in each cluster is the chosen characteristic feature, which is the feature with the highest entropy compared to the others in the same cluster.

3.5. Fault classification using Nonlinear Auto-Regressive Neural Network with eXogenous inputs and distributed time delays (NARX-TDNN)

The Nonlinear Auto-Regressive with eXogenous inputs (NARX) network architecture [27] is a class of discrete-time nonlinear systems. The NARX architecture can be broadly expressed in the parallel mode,
$$\hat{y}(t) = f(u(t - n_u), \ldots, u(t - 1), u(t), \hat{y}(t - n_y), \ldots, \hat{y}(t - 1)), \tag{10}$$
or in the series-parallel mode,
$$\hat{y}(t) = f(u(t - n_u), \ldots, u(t - 1), u(t), y(t - n_y), \ldots, y(t - 1)), \tag{11}$$
where $u(t)$, $y(t)$ and $\hat{y}(t)$ denote the input, actual output and estimated output of the network at time $t$, $n_u$ and $n_y$ are the input and output orders, and $f$ denotes a nonlinear function, which can be approximated using a Multilayer Perceptron (MLP). As opposed to a conventional Recurrent Neural Network (RNN), a NARX network's feedback comes only from the output neurons rather than from its hidden states. Using this simplified configuration, it has been argued that NARX networks generalize better than other RNNs, especially on problems involving long-term dependencies [28].
Fig. 5. The proposed polar visualization scheme showing the characteristic signals in normal operation scenarios (left) and in the OADS scenario (right) in the Winter 2008 dataset.
  • 7. Fig. 6. A simplified case of redundancy between features in an HVAC system. How many clusters are there? It can be seen that the divergence between the $y_{CHWC-VLV}$ and $y_{CHWC-GPM}$ distributions is intuitively smaller than the divergence between $y_{CHWC-VLV}$ and $y_{SA-HUMD}$. If these four signals were to be clustered, then a possible solution would be to assign them to two clusters, i.e. $\{\{y_{CHWC-VLV}, y_{CHWC-GPM}\}, \{y_{SA-HUMD}, y_{RA-HUMD}\}\}$.
The configurations described in Eqs. (10) and (11) differ only in their mode of feedback. The configuration described in Eq. (10) is referred to as parallel mode or recurrent NARX (NARX-P), while Eq. (11) is referred to as series-parallel mode NARX (NARX-SP) [29]. NARX-P uses the state estimate as feedback, while NARX-SP uses the actual observable state. Because the actual state of an HVAC system is practically unavailable at all times, the deployment of NARX in an AFDD system is currently limited to the NARX-P configuration.

4. Consensus clustering

This section explains in detail the semi-stochastic swarm-based consensus clustering approach to feature selection in an HVAC system.
The section is subdivided into six subsections:
• Section 4.1 briefly introduces the consensus clustering paradigm,
• Section 4.2 presents the visual abstract of our proposed feature selection method,
• Section 4.3 overviews Fred and Jain's Evidence Accumulation [25],
• Section 4.4 summarizes our previous work on Swarm Rapid Centroid Estimation (SRCE) [17],
• Section 4.5 introduces the newly proposed 'self-evolution' strategy for the SRCE,
• Section 4.6 outlines the new implementation of ERCE for feature selection purposes.

4.1. Fundamentals of consensus clustering

Consensus clustering infers a consensus matrix from multiple runs of clustering algorithms. This consensus matrix encodes the probability of each pair of observations belonging to the same cluster. It has been argued that the natural, and arguably optimum, clusters can be validated with higher confidence by analyzing the stability of this matrix [22,25].
The consensus matrix $C$ is a positive semidefinite $N \times N$ square matrix of joint probabilities. Each $C_{ij} \in [0, 1]$ represents the probability of data points $i$ and $j$ belonging to the same cluster.
  • 8. Fig. 7. A result of feature selection using ERCE (Algorithm 4, Section 4) on the Spring 2008 dataset, projected on the first and second principal components for ease of visualization. Each point represents a feature, where the radius denotes the corresponding entropy. Each feature cluster is color coded and the characteristic feature of each cluster is annotated accordingly. In this example, ERCE chose 16 characteristic features from the 320 features (160 magnitude features and 160 spectral centroid features). It can be seen that the spectral centroid feature for CHWC-GPM (SC CHWC-GPM) is selected, in line with the observation in Fig. 3. ERCE accurately discovered that Return Fan (RF) and Supply Fan (SF) features are particularly important. This discovery is in line with the existence of Return Fan Failure (RFF) faults (May 12th, 18th, and 19th) observed during the season. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
For a given cluster assignment obtained from the $m$th clustering, we can calculate the $m$th co-association matrix as follows,
$$C_m = U_m^T U_m, \tag{12}$$
where each $U_m$ is a $K_m \times N$ matrix which stores the values of $u_{ik,m}$ for $i \in \{1, \ldots, N\}$ and $k \in \{1, \ldots, K_m\}$ obtained from the $m$th run of any clustering algorithm. Each $u_{ik,m}$ denotes the probability of a data point $y_i$ belonging to the cluster $C_k$. For any $m$, $U_m$ should satisfy the constraints $u_{ik,m} \in [0, 1]$ and $\sum_{k=1}^{K_m} u_{ik,m} = 1$.
The matrix multiplication represents a probabilistic 'and' operator conveniently calculated using the (multiplicative) fuzzy T-norm [30]. The $i$th diagonal component of $C_m$, i.e. $D_{ii,m}$, quantifies the degree of stability for the $i$th data point in the $m$th clustering. In this paper we propose normalizing $C_m$ by its diagonal matrix $D_m$ as follows,
$$\bar{C}_m = D_m^{-1/2} C_m D_m^{-1/2}. \tag{13}$$
Fig. 8. An illustration describing the architecture of the Parallel Nonlinear Auto-Regressive Time Delay Neural Networks with eXogenous input (NARX-TDNN).
  • 9. Fig. 9. Various partitions on the Spring 2008 dataset encoded by 16 subswarms of the Self Evolving Swarm Rapid Centroid Estimation (SE-SRCE, Algorithm 3). The fuzzifier constant is set to 1.2, and target entropies are uniformly randomized between 0.005 and 0.05. The coordinates are projected to the first and second principal components for ease of visualization. An in-depth explanation of the method can be found in Sections 4.4 and 4.5.
The consensus $C$, or ensemble aggregate, is calculated as the weighted average of the normalized co-association matrices $\bar{C}_1, \bar{C}_2, \ldots, \bar{C}_M$ as follows,
$$C = \frac{\sum_{m=1}^{M} w_m \bar{C}_m}{\sum_{m=1}^{M} w_m}, \tag{14}$$
where $w_m$ denotes the weight of the corresponding partition, which can be determined manually or using any cluster validation method [31]. $w_m$ can also be set to assume equal weighting such that $w_m = 1$ for all $m$ [25].
The consensus distance matrix can be defined as follows [22],
$$D = 1 - C, \tag{15}$$
which transforms the consensus matrix into a pairwise distance matrix.
Fred and Jain [25] propose using a single, average, or complete linkage algorithm on the $D$ matrix to recover the natural clusters. In their 2005 paper, a criterion called maximum lifetime is proposed to determine the optimum threshold for cutting the cluster dendrogram [25]. Readers are encouraged to refer to [25] for more details.

4.2. Visual abstract: feature selection using ERCE

A visual abstract of the proposed swarm-based consensus clustering algorithm can be seen in Figs. 9 and 10. Fig. 10 presents the consensus matrix and hierarchical cluster tree (cluster dendrogram) from the aggregation of the partitions shown in Fig. 9.

4.3. Evidence accumulation

Fred and Jain proposed Evidence Accumulation (EAC) in 2005 as a consensus clustering framework for combining the results of multiple runs of a crisp prototype-based clustering algorithm (e.g. K-means) [25]. Wang proposed a generalization of the algorithm, extending the applicability of EAC to both crisp and fuzzy clusters [30]. He found that fuzzy partitions are advantageous over crisp partitions in Evidence Accumulation, as the degree of overlap in a fuzzy partition encodes to an extent how 'close' together clusters are [30].
The approach can be summarized as a two-step process as follows,
1. Split: Partition the data matrix $Y$ into some number of partitions $K_m$ (may be fixed or randomized within an interval) using any prototype-based clustering algorithm. Repeat this step $M$ times.
2. Merge: Calculate the consensus matrix $C$ and interpret the ensemble clustering by performing a desired graph algorithm.
  • 10. Fig. 10. A heat map presenting the consensus matrix resulting from the aggregation of an SE-SRCE swarm shown in Fig. 9 using Algorithm 4 (Section 4.6). The rows and columns indicate individual items (in our case, the 320 features) whose consensus values range from 0 (never clustered together) to 1 (always clustered together), marked by white to dark blue. The complete linkage cluster dendrogram showing the degree of redundancy between features is shown above the consensus matrix. Between the cluster dendrogram and the consensus matrix is the cluster label vector suggested by the maximum lifetime cut. The output of the consensus clustering is as shown in Fig. 7. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
Given the data vectors $y_i \in Y$, for each clustering $m$, $K_m$ centroid vectors $x_k \in X_m$ can be obtained using any prototype-based clustering algorithm (e.g. K-means, fuzzy C-means, Gaussian Mixture Models). The degree of membership of $y_i$ w.r.t. $x_k$ is a function of distance, calculated either as a crisp membership,
$$u_{ik,m} = \begin{cases} 1 & \text{if } k = \arg\min_{x_j \in X_m} d(y_i, x_{j,m}), \\ 0 & \text{otherwise,} \end{cases} \qquad u \in \{0, 1\}, \tag{16}$$
or as a fuzzy membership with fuzzifier $\nu$,
$$u_{ik,m} = \frac{d(y_i, x_{k,m})^{-1/(\nu - 1)}}{\sum_{j=1}^{K} d(y_i, x_{j,m})^{-1/(\nu - 1)}}, \quad \nu > 1, \qquad u \in [0, 1]. \tag{17}$$
Wang argues that using fuzzy partitions in consensus clustering is particularly efficient for suppressing over-segmentation. It is also more tolerant to noisy information than its crisp counterpart [30].
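The merge step, Eqs. (12) to (15) together with the fuzzy membership of Eq. (17), can be sketched as below. The sketch makes several assumptions that are not in the text: data are stored row-wise ($N \times dim$ rather than $dim \times N$), $d$ is taken to be the Euclidean distance, a small `eps` guards against a zero distance, and all function names are illustrative.

```python
import numpy as np

def fuzzy_membership(Y, X, fuzzifier=1.2, eps=1e-12):
    """Eq. (17): K x N memberships of N data rows Y to K centroid rows X."""
    d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2) + eps
    w = d ** (-1.0 / (fuzzifier - 1.0))
    return w / w.sum(axis=0, keepdims=True)  # each column sums to one

def normalized_coassociation(U):
    """Eqs. (12)-(13): C_m = U^T U, rescaled so the diagonal equals one."""
    C = U.T @ U
    d = np.sqrt(np.diag(C))
    return C / np.outer(d, d)

def consensus(memberships, weights=None):
    """Eq. (14): weighted average of the normalized co-association matrices."""
    mats = np.stack([normalized_coassociation(U) for U in memberships])
    w = np.ones(len(mats)) if weights is None else np.asarray(weights, dtype=float)
    return np.tensordot(w, mats, axes=1) / w.sum()

def consensus_distance(C):
    """Eq. (15): pairwise distance derived from the consensus matrix."""
    return 1.0 - C

# Hypothetical ensemble: two clusterings of three points with different K.
Y = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
U1 = fuzzy_membership(Y, np.array([[0.0, 0.1], [5.0, 5.1]]))
U2 = fuzzy_membership(Y, np.array([[0.0, 0.1], [2.5, 2.5], [5.0, 5.1]]))
D = consensus_distance(consensus([U1, U2]))
```

Because each $U_m$ may have a different number of rows, the ensemble members only need to agree on the number of data points $N$; the consensus matrix is always $N \times N$.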
The conventional approaches using Evidence Accumulation (EAC) [25] and Weighted Evidence Accumulation (WEAC) [31] are summarized in Algorithm 1. Notice that the pseudocode is simplified using the fuzzy t-norm approach to EAC as introduced in [30].

Algorithm 1. (Weighted) Ensemble Clustering ((W)EAC Clustering)
Input: $dim \times N$ data matrix $Y$, maximum number of prototypes $K_{max}$, number of repetitions $M$, prototype-based clustering algorithm Cluster (e.g. K-means, fuzzy C-means), linkage algorithm Linkage.
Output: crisp ensemble partition $L$
1: for $m = \{1, \ldots, M\}$ do
2:   // Partition Y using a random number of clusters.
3:   $K_{rnd} \leftarrow$ random($\{2, \ldots, K_{max}\}$)
4:   $\{U_m, X_m\} \leftarrow$ Cluster($Y$, $K_{rnd}$)
5:   // Calculate the co-association matrix for each clustering.
6:   $C_m \leftarrow U_m^T U_m$
7:   $\bar{C}_m \leftarrow D_m^{-1/2} C_m D_m^{-1/2}$
8: end for
9: // Calculate the consensus matrix
10: $C \leftarrow \sum_{m=1}^{M} w_m \bar{C}_m \,/\, \sum_{m=1}^{M} w_m$
11: // Interpret the consensus matrix using the Linkage algorithm
12: HierarchicalTree $\leftarrow$ Linkage($C$)
13: $th \leftarrow$ MaximumLifetime(HierarchicalTree)
14: $L \leftarrow$ Cut(HierarchicalTree, $th$)
Note that the threshold for cutting the hierarchical tree is determined using the maximum lifetime method [25].

4.4. Swarm Rapid Centroid Estimation

Yuwono [17] proposed the Swarm Rapid Centroid Estimation (Swarm RCE$^{r+}$) algorithm in 2011 [32]. This semi-stochastic clustering algorithm incorporates the paradigms of Particle Swarm Optimization (PSO [19]) into the traditional Expectation Maximization (EM). Statistical validation on benchmark data suggests that Swarm RCE$^{r+}$ has a reduced risk of converging to local minima and leaner computational complexity compared to earlier evolutionary-algorithm-based clustering approaches [17]. The algorithm was updated in 2014 to further decrease its memory complexity so that it can be used for ensemble clustering applications [18]. The RCE algorithm below follows the 2014 proposal.
A particle in an RCE subswarm stores a tuple consisting of a position vector $x$ and a velocity vector $v$,
$$\mathrm{particle}_{k,m} = \{x_{k,m}, v_{k,m}\}. \tag{18}$$
The position vector of each particle represents the coordinate of a centroid vector $x_k \in \mathbb{R}^{dim}$. In RCE a subswarm is a collection of centroid coordinates, encoding a possible solution to the clustering problem. As the RCE swarm consists of $M$ such subswarms, as many as $M$ clustering solutions can be obtained at the end of optimization.
Each subswarm stores two memory matrices:
1. The self-organizing memory $Y_m$, which is an array of randomly sampled pointers to the data $Y$,
$$Y_m = \mathrm{randsample}(Y, \eta\%), \tag{19}$$
where $\eta \in [0, 1]$ denotes the rate of random sampling.
2. The best position memory $X_m^{best}$, which stores the position vectors $X = \{x_1, \ldots, x_{K_m}\}$ that minimize a given objective function $f(Y_m, X_m)$ throughout the search. A typical objective function is defined as, but not restricted to, the average distortion,
$$f(Y_m, X_m) = \sum_{x_k \in X_m} \frac{\sum_{y_i \in Y_m} u_{ik,m}\, d(x_k, y_i)}{\sum_{y_i \in Y_m} u_{ik,m}}, \tag{20}$$
where $u_{ik,m}$ can be calculated using either Eq. (16) or Eq. (17).
The RCE swarm $X^{best}$ matrix is the union of all $X_m^{best}$ such that,
$$X^{best} = \bigcup_{m=1}^{M} X_m^{best}. \tag{21}$$
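The state held by one subswarm, Eqs. (18) to (20), can be sketched as a small data structure. This is only an illustration under assumptions: the class and function names are hypothetical, the crisp membership of Eq. (16) with Euclidean distance is used, Eq. (20) is read as a sum over clusters of the membership-weighted mean distance, and empty clusters are simply skipped.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class Subswarm:
    X: np.ndarray               # K x dim particle positions (centroids), Eq. (18)
    V: np.ndarray               # K x dim particle velocities, Eq. (18)
    sample_idx: np.ndarray      # indices into Y: the self-organizing memory, Eq. (19)
    X_best: np.ndarray = None   # best positions found so far
    f_best: float = np.inf      # objective value at X_best

def average_distortion(Y, X):
    """Eq. (20) with crisp memberships (Eq. (16)) and Euclidean distance."""
    d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)  # K x N
    U = np.zeros_like(d)
    U[d.argmin(axis=0), np.arange(Y.shape[0])] = 1.0
    mass = U.sum(axis=1)
    occupied = mass > 0  # assumption: empty clusters contribute nothing
    return float(((U * d).sum(axis=1)[occupied] / mass[occupied]).sum())

def init_subswarm(Y, K, rate, rng):
    """Sample a fraction `rate` of the data and seed K particles from it."""
    n = max(K, int(round(rate * len(Y))))
    idx = rng.choice(len(Y), size=n, replace=False)
    X = Y[rng.choice(idx, size=K, replace=False)].copy()
    return Subswarm(X=X, V=np.zeros_like(X), sample_idx=idx)

def remember_best(s, Y):
    """Keep the positions that minimize the objective throughout the search."""
    f = average_distortion(Y[s.sample_idx], s.X)
    if f < s.f_best:
        s.X_best, s.f_best = s.X.copy(), f
    return s
```

Storing indices rather than copies of the sampled rows mirrors the description of $Y_m$ as an array of pointers into $Y$, which keeps the memory cost of each subswarm small.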
  • 11. Fig. 11. Trajectory of the Swarm RCE particles recorded after 30 iterations on a toy dataset with numerous random seedings, showing the robustness of Swarm RCE and its insensitivity to initialization. $M = 6$, $t_{max} = 30$, $\varepsilon = 0.05$, $\delta_{reset} = 15$.
On each iteration, the velocity and position of a particle are updated as follows,
$$v_{k,m}(t + 1) = v_{k,m}(t) + \Delta_{k,m}(t), \tag{22}$$
$$x_{k,m}(t + 1) = x_{k,m}(t) + v_{k,m}(t + 1), \tag{23}$$
where $\Delta$ denotes the resultant vector, which consists mainly of the self-organizing term and the minimum (best position) term,
$$\Delta_{k,m}(t) = \varphi_1 \circ \underbrace{\left( \frac{\sum_{i=1}^{|Y_m|} u_{ik,m}\, (y_i - x_{k,m}(t))}{\sum_{i=1}^{|Y_m|} u_{ik,m}} \right)}_{\text{self organizing}} + \varphi_2 \circ \underbrace{\left( \frac{\sum_{j=1}^{|X^{best}|} q_{jk,m}\, (x_j^{best}(t) - x_{k,m}(t))}{\sum_{j=1}^{|X^{best}|} q_{jk,m}} \right)}_{\text{minimum (best position)}}$$
$$= \varphi_1 \circ (E[Y_m | X_m = x_{k,m}] - x_{k,m}) + \varphi_2 \circ (E[X^{best} | X_m = x_{k,m}] - x_{k,m}), \tag{24}$$
where $\varphi \in [0, 1]^{dim}$ denotes a uniform random vector, $u_{ik,m}$ denotes the cluster membership when $Y_m$ is mapped to $X_m$, and $q_{jk,m}$ denotes the cluster membership when $X^{best}$ is mapped to $X_m$. Should the self-organizing vector of a particle equal 0, the particle will be directed to $x_{I_{win},m}$, the position of the winning particle. $x_{I_{win},m}$ is the particle in the $m$th subswarm whose cluster has the largest cardinality.
The RCE is equipped with two strategies to cope with suboptimal convergence, substitution and particle reset:
1. The substitution strategy forces particles in a search space to reach alternate equilibrium positions by introducing position instability. After each position update episode for a particle, apply
$$\{x_i(t + 1), v_i(t + 1)\} = \begin{cases} \{x_{I_{win}}(t + 1) + \mathcal{N}(0, \sigma),\ 0\} & \text{if } \varphi < \varepsilon, \\ \{x_i(t + 1), v_i(t + 1)\} & \text{otherwise,} \end{cases} \tag{25}$$
where $\varphi$ is a uniform random number $\varphi \in [0, 1]$, and $\mathcal{N}(0, \sigma)$ is a Gaussian random vector with mean 0 and the standard deviation $\sigma$ of each dimension of the data being clustered. $\varepsilon$ denotes the substitution probability parameter. A larger $\varepsilon$ increases the frequency of substitution. Optimal $\varepsilon$ values lie in $0.01 \le \varepsilon \le 0.05$ [17]. RCE with the substitution strategy enabled is denoted with the superscript +.
2. The particle reset strategy is triggered when the fitness of the local minimum $f(Y_m, X_m^{best}(t))$ does not improve after a number of iterations. Stagnation can be detected using a stagnation counter $\delta$ which is updated as follows:
$$\delta(t + 1) = \begin{cases} \delta(t) + 1 & \text{if } f(Y_m, X(t)) \ge f(Y_m, X^{best}(t)), \\ 0 & \text{otherwise.} \end{cases} \tag{26}$$
When $\delta(t + 1) > \delta_{max}$, this strategy reinitializes all particles in a subswarm without resetting the local minimum position matrix. The only values reinitialized are $x_k(t)$ and $v_k(t)$. Swarm convergence is detected when $f(Y_m, X^{best}(t))$ does not improve after a number of resets. RCE with the particle reset strategy enabled is denoted with the superscript r.
  • 12. The algorithm pseudocode is shown in Algorithm 2. An illustration of the search trajectory of the swarm on a toy example is shown in Fig. 11.

Algorithm 2. Swarm RCE$^{r+}$
Input: data points $Y = \{y_1, \ldots, y_N\} \in \mathbb{R}^{dim}$, number of clusters $K$.
Output: swarm centroid vectors $X^{best} = \{X_1^{best}, X_2^{best}, \ldots, X_M^{best}\} \in \mathbb{R}^{dim}$.
1: Initialize the swarm (randomize($X_{1,\ldots,M}$), $V_{1,\ldots,M} = 0$).
2: For each subswarm $m$, randomly sample $Y$ and store it in the memory $Y_m$ = randsample($Y$, $\eta\%$).
3: repeat
4:   for all $m \in \{1, \ldots, M\}$ do
5:     Calculate $U_m$ from the pairwise distance between $X_m$ and $Y_m$,
6:     Calculate $Q_m$ from the pairwise distance between $X_m$ and $X^{best}$,
7:     Store $X_m^{best}$ which minimizes $f(Y_m, X_m)$ throughout the search,
8:     $V_m \leftarrow V_m + \Delta_m$,
9:     $X_m \leftarrow X_m + V_m$,
10:    Redirect particles with zero cardinality toward the particle whose cluster has the largest cardinality.
11:    Apply substitution with rate $\varepsilon$
12:    if $f(Y_m, X_m^{best})$ does not improve after $\delta_{reset}$ iterations then
13:      Reinitialize subswarm (randomize($X_m$), $V_m = 0$)
14:    end if
15:   end for
16: until convergence or maximum iteration reached
17: return $X^{best} = \{X_1^{best}, X_2^{best}, \ldots, X_M^{best}\} \in \mathbb{R}^{dim}$.

4.5. Self Evolving Swarm RCE

In this implementation we introduce a new self-evolution criterion to the RCE which allows each subswarm to summon additional particles at will until the target cluster entropy is satisfied.
The uncertainty of a fuzzy membership value $u_{ik} \in [0, 1]$ [33] can be quantified as follows,
$$h_{ik,m} = -u_{ik,m} \log u_{ik,m}. \tag{27}$$
Bezdek argues that a good clustering can be achieved when $h_{ik,m}$ is minimized [33]. The average cluster entropy is then,
$$H_m = -\frac{1}{K_m |Y_m|} \sum_{k=1}^{K_m} \sum_{i=1}^{|Y_m|} u_{ik,m} \log u_{ik,m}, \tag{28}$$
where $U_m$ is calculated from $X_m^{best}$. $H_m$ close to 0.5 indicates possible underpartitioning, while $H_m$ very close to 0 may indicate overpartitioning.
$H_m$ is only investigated when there is an update to $X_m^{best}$ in which the number of non-empty clusters is equal to $K_m$, such that $|C_m^{best}| = K_m$. If $H_m$ is larger than the target entropy $\theta_m$, the number of particles is incremented using the following rule,
$$K_m(t + 1) = \begin{cases} K_m(t) + z_r^+ & \text{if } H_m > \theta_m, \\ K_m(t) & \text{otherwise,} \end{cases} \tag{29}$$
where $K_m(t)$ denotes the number of particles in subswarm $m$ at the current iteration $t$, $z_r^+$ denotes an upper-bounded random integer, $z_r^+ \in \{1, 2, \ldots, z_{max}^+\}$, and $\theta_m \in [0, 0.5]$ denotes the target $H_m$. Using this approach, each subswarm automatically adjusts $K_m$ until the entropy criterion is satisfied.
The desired granularity and diversity of the swarm can be controlled by setting or randomizing the value of $\theta_m$. The growth speed of the swarm can be controlled by setting $z_r^+$. As the subswarms infer $K_m$ automatically from $H_m$, there is no longer any need to specify the randomization interval (recall that in EAC and WEAC K-means, $K_m$ is randomized within pre-specified upper and lower bounds).
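The self-evolution rule of Eqs. (28) and (29) can be sketched as follows. The names are illustrative, and two details are assumptions rather than statements from the text: $0 \log 0$ is taken as 0, and a cluster counts as non-empty when at least one data point has its largest membership in it.

```python
import numpy as np

def average_cluster_entropy(U):
    """Eq. (28) for a K x N membership matrix, taking 0 * log(0) as 0."""
    K, N = U.shape
    safe = np.where(U > 0, U, 1.0)  # log(1) = 0 neutralizes zero memberships
    return float(-(U * np.log(safe)).sum() / (K * N))

def grow(K, U_best, target_entropy, z_max, rng):
    """Eq. (29): add between 1 and z_max particles while entropy is too high."""
    all_occupied = np.unique(U_best.argmax(axis=0)).size == K
    if all_occupied and average_cluster_entropy(U_best) > target_entropy:
        return K + int(rng.integers(1, z_max + 1))
    return K

# Hypothetical memberships for two clusters and four data points.
rng = np.random.default_rng(0)
overlapping = np.array([[0.6, 0.6, 0.4, 0.4],
                        [0.4, 0.4, 0.6, 0.6]])
K_next = grow(2, overlapping, target_entropy=0.05, z_max=3, rng=rng)
```

A lower target entropy forces a subswarm to keep adding particles for longer, so randomizing the target across subswarms is one way to obtain partitions of different granularity within the same ensemble.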
The pseudocode of the Self-Evolving Swarm RCE$^{r+}$ (SE-SRCE) can be seen in Algorithm 3. A typical summary of an execution of SE-SRCE can be seen in Fig. 12.

Algorithm 3. Self-Evolving Swarm RCE$^{r+}$ (SE-SRCE)
Input: data points $Y = \{y_1, \ldots, y_N\} \in \mathbb{R}^{dim}$, number of clusters $K$.
Output: swarm centroid vectors $X^{best} = \{X_1^{best}, X_2^{best}, \ldots, X_M^{best}\} \in \mathbb{R}^{dim}$.
1: Initialize the swarm (randomize($X_{1,\ldots,M}$), $V_{1,\ldots,M} = 0$).
2: For each subswarm $m$, randomly sample $Y$ and store it in the memory $Y_m$ = randsample($Y$, $\eta\%$).
3: repeat
4:   for all $m \in \{1, \ldots, M\}$ do
5:     Execute Algorithm 2 lines 5–14,
6:     if $f(Y_m, X_m)$ improves then
7:       // Check whether the entropy criterion is satisfied and whether all clusters are non-empty
8:       if $|C_m^{best}| = K_m$ and $H_m > \theta_m$ then
9:         $K_m \leftarrow K_m + z_r^+$
10:      end if
11:    end if
12:   end for
13: until convergence or maximum iteration reached
14: return $X^{best} = \{X_1^{best}, X_2^{best}, \ldots, X_M^{best}\} \in \mathbb{R}^{dim}$.

4.6. Ensemble Rapid Centroid Estimation using Self-Evolving Swarm

Ensemble RCE (ERCE) [18] is an ensemble extension of the Swarm RCE$^{r+}$. The algorithm has been shown to have leaner complexity than conventional ensemble clustering algorithms, achieving up to quasilinear complexity in both time and space [18].
In this application we propose incorporating the proposed SE-SRCE into the ERCE framework. As the size of the evidence accumulation matrix is still relatively manageable (recall that since there are 320 features = 160 magnitude features + 160 spectral centroid features, the size of $C$ is 320 × 320), EAC can be performed without using the co-association tree compression process proposed in the original paper [18,34]. However, it should be noted that if the number of features increases into the thousands, it is advisable to use the co-association tree compression. Further information on the co-association tree can be found in Wang's paper [34].
In order to interpret the final clustering, we need to clarify that in our application each cluster represents “a group of more redundant features”. For each feature cluster, the feature with the largest entropy is selected as the characteristic feature of the cluster. The pseudocode of ERCE used in our application is shown in Algorithm 4.
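The aggregation and interpretation steps of ERCE can be sketched in a few lines. The sketch below is illustrative only and rests on several assumptions: all function names are hypothetical, each co-association matrix is normalized by its own diagonal (one plausible reading of the $D_m^{-1/2} C_m D_m^{-1/2}$ step), the maximum-lifetime cut is taken as the midpoint of the largest gap between successive merge heights, and the hierarchical linkage itself is left to an external routine that supplies those heights.

```python
import math


def co_association(U):
    """Normalized co-association U^T U for one clustering.

    U[k][i] is the membership of item i (here: a feature) in cluster k.
    Assumption: normalization by the diagonal, so every item scores 1 with itself.
    """
    n = len(U[0])
    C = [[sum(row[i] * row[j] for row in U) for j in range(n)] for i in range(n)]
    d = [math.sqrt(C[i][i]) for i in range(n)]
    return [[C[i][j] / (d[i] * d[j]) for j in range(n)] for i in range(n)]


def accumulate(memberships, weights=None):
    """Weighted average of the per-clustering co-association matrices."""
    w = weights or [1.0] * len(memberships)
    mats = [co_association(U) for U in memberships]
    n = len(mats[0])
    return [[sum(wm * Cm[i][j] for wm, Cm in zip(w, mats)) / sum(w)
             for j in range(n)] for i in range(n)]


def max_lifetime_threshold(merge_heights):
    """Cut height in the middle of the largest gap between merge heights."""
    h = sorted(merge_heights)
    gaps = [b - a for a, b in zip(h, h[1:])]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    return (h[i] + h[i + 1]) / 2


def characteristic_features(labels, entropies):
    """For each cluster label, the index of the member with the highest entropy."""
    best = {}
    for i, (lab, e) in enumerate(zip(labels, entropies)):
        if lab not in best or e > entropies[best[lab]]:
            best[lab] = i
    return best


# Hypothetical example: two crisp clusterings of three features.
C = accumulate([[[1, 1, 0], [0, 0, 1]], [[1, 0, 0], [0, 1, 1]]])
```

The accumulated matrix can be converted to a dissimilarity (for example one minus the co-association) before it is passed to a complete-linkage routine.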
Algorithm 4. Ensemble Rapid Centroid Estimation (ERCE)
Input: dim × N data matrix $Y$, number of subswarms $M$, fuzzification constant, target entropy for each of the $M$ subswarms, linkage algorithm Linkage.
Output: Crisp ensemble partition $L$
$X^{best} \leftarrow$ SE-SRCE($Y$)
for all $m \in \{1, \ldots, M\}$ do
  Given $Y$ and $X_m^{best}$, calculate $U_m$ using Eq. (17).
  // Calculate the co-association matrix for each clustering.
  $C_m \leftarrow U_m^T U_m$
  $C_m \leftarrow D_m^{-1/2} C_m D_m^{-1/2}$
end for
$C \leftarrow \frac{\sum_{m=1}^{M} w_m C_m}{\sum_{m=1}^{M} w_m}$,
HierarchicalTree = linkage($C$)
th ← MaximumLifetime(HierarchicalTree)
$L \leftarrow$ Cut(HierarchicalTree, th)
// Interpreting the final partition
for all $C_k \in \{C_1, \ldots, C_{L_{\max}}\}$ do
  // For each feature cluster, the characteristic feature is the feature with highest entropy
  $y_k^{characteristic} = \arg\max_{y \in C_k} \left( -\int p_y(x) \log p_y(x)\, dx \right)$
end for

5. Experimental data

The ASHRAE Project 1312-RP modeled and reported a wide variety of faults in three different seasons. The experiments involved two HVAC systems running side by side with identical zone loads. Fault tests were conducted in Air Handling Unit (AHU)-A, while AHU-B ran under normal operation. Fault characteristics were recorded by comparing AHU-A and AHU-B. The ASHRAE-1312-RP datasets include detailed experimental results from Summer 2007, Spring 2008, and Winter 2008. In each season, different types of faults were generated, recorded and reported. Readings from 160 signal sources during normal operation and various fault scenarios were recorded. The data was sampled every minute from 6:00 to 18:00.
The faults reported in the ASHRAE-1312-RP datasets, as well as a summary of the behavior of the features proposed by Li [20], are described in Table 1. Note that the features used in this table are not part of our research; they are included to illustrate how a static model would struggle across varying seasons. This is because the features that are important in one season may not be as important in other seasons. The features that we use throughout the paper are determined dynamically using consensus clustering based on the unique behavior in each season.

6. Result

Based on the features in Table 1, we can see that faults such as OASB, MADU and HCSF are particularly difficult to identify using Li’s model [20]. In this section we present the experimental results of our proposed unsupervised feature selection method. We wish to investigate the following:

1. What the characteristic features for each season are, and
2. Whether the selected features improve the generalization capability of an AFDD algorithm in general. In particular, we are interested in whether we can reliably identify OASB, MADU, and HCSF using the features selected by our proposed method.

Our approach is as follows. From each dataset (Summer 2007, Spring 2008, and Winter 2008), as many as 160 time signals and a vector recording the time of the day were reported. Using the method described in Section 3.1, up to 320 + 1 features could be extracted, including:

• Magnitude features from 160 sensor and control signals,
• Spectral centroid features from 160 sensor and control signals,
• Time of the day (1 feature).

For clarity, the step-by-step process of the experiment can be summarized as follows:

1. Select a season and get the raw signals during normal operations.
2. For each raw signal, isolate the magnitude and spectral centroid components and calculate the fuzzy feature representation using the method described in Section 3.
3.
Find the characteristic features using a consensus clustering algorithm (our approach uses ERCE: Algorithm 4).
4. Append the time-of-the-day feature as an additional feature.
5. Using the selected features, train a model (our approach uses NARX-TDNN) using the data in Table 1. For each type of fault, randomly partition the data as follows:
   • 15% as training set,
   • 15% as validation set, and
   • 70% as test set.
6. Investigate the results on the test set to see whether using the selected features increases or decreases the classifier’s generalization capability.

6.1. Feature selection result

We wish to keep the number of characteristic features at a reasonable level (e.g. between 4 and 30) to ensure that the generalization capability of the classifier is not undermined. The parameters of ERCE, EAC K-means, and WEAC K-means were selected based on the assumption derived using the method illustrated in Fig. 12. From the average entropy-distortion scatter for each season, such as that depicted in Fig. 12, we approximated the number of characteristic features to be around 5–30, corresponding to an average cluster entropy of 0.005–0.05.

The parameters used for ERCE were as follows. The initial number of particles was set to 2, the number of subswarms was set to 60, the substitution probability ε was set to 3%, ıreset was set to 15, the distance metric was set to KL-divergence, the fuzzifier was set to 1.2, the entropy threshold for each subswarm was uniformly randomized between 0.005 and 0.05, $z_{\max}^+ = 2$, the maximum number of iterations was set to 100, and the linkage method was set to complete linkage. KL-divergence and complete linkage were selected because the physical model of the HVAC was assumed to be unknown and even a subtle difference in temporal patterns/shapes could be an important predictive component for specific types of fault. Complete linkage favors the formation of small spherical clusters, which is particularly useful for capturing these subtle differences.
Optimum cut was then conventionally calculated using the maximum lifetime criterion [25]. Subswarms were equally weighted during ensemble aggregation such that $w_{1,\ldots,M} = 1$.

Further investigation was also performed in order to benchmark the quality of the features selected by the method. Benchmark unsupervised feature selection methods include EAC K-means [25], WEAC K-means [31], and a traditional complete linkage agglomerative clustering (CL). CL was utilized to verify the advantages of the consensus approaches over a conventional graph-based approach. In this experiment, the CL hierarchical tree is cut using the inconsistency criterion, with inconsistency coefficient = 1, returning as many as 84 clusters, thus 84 characteristic features.

The parameters for EAC K-means and WEAC K-means were set as follows. The number of repetitions was set to 60, and the number of clusters k was uniformly randomized between 5 and 30. The distance metric was set to KL-divergence. The linkage method was set to complete linkage as per the discussion above. The optimum cut was calculated using the maximum lifetime criterion [25]. Weights for WEAC K-means were calculated using the average silhouette width criterion [35].
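Step 5 of the experimental procedure partitions the recordings of each fault type at random into 15% training, 15% validation and 70% test data. A minimal sketch of such a per-fault split is given below; the function name, the use of integer sample indices and the rounding of the subset sizes are assumptions made for illustration, not details taken from the experiment.

```python
import random
from collections import defaultdict


def split_per_fault(labels, train=0.15, val=0.15, seed=0):
    """Randomly split sample indices per fault label into train/val/test.

    labels[i] is the fault type of sample i. Subset sizes are rounded per
    label, so the fractions are only approximate for small classes.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, lab in enumerate(labels):
        by_label[lab].append(i)

    parts = {"train": [], "val": [], "test": []}
    for idx in by_label.values():
        rng.shuffle(idx)
        n_train = round(train * len(idx))
        n_val = round(val * len(idx))
        parts["train"] += idx[:n_train]
        parts["val"] += idx[n_train:n_train + n_val]
        parts["test"] += idx[n_train + n_val:]
    return parts


# Hypothetical example: 100 samples each of two fault types.
parts = split_per_fault(["NOR"] * 100 + ["OASB"] * 100)
```

Splitting inside each fault type, rather than over the pooled data, keeps every fault represented in all three subsets.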
Table 1
ASHRAE-1312-RP dataset description and symptoms using features described in Shun Li’s model [20].

Summer 2007

| # | Name | Description | HWC-VLV | P-E-hcoil | CHWC-VLV | P-E-ccoil | SF-SPD | P-E-SF | RF-SPD | P-E-RF | P-SA-CFM | P-RA-CFM | P-OA-CFM | SA-TEMP | MA-TEMP | RA-TEMP | HWC-DAT | CHWC-DAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | NOR0819 | Normal Operation | | | | | | | | | | | | | | | | |
| 2 | NOR0825 | Normal Operation | | | | | | | | | | | | | | | | |
| 3 | EADS0820 | EA Damper Stuck (Fully Open) | 0 | 0 | 0 | 0 | + | + | + | + | 0 | + | + | 0 | 0 | 0 | 0 | 0 |
| 4 | EADS0821 | EA Damper Stuck (Fully Close) | 0 | 0 | 0 | 0 | − | − | − | − | 0 | − | − | 0 | 0 | 0 | 0 | 0 |
| 5 | RFF0822 | Return Fan at fixed speed (30% speed) | 0 | 0 | 0 | 0 | ++ | ++ | −− | −− | 0 | −− | ++ | 0 | 0 | 0 | 0 | 0 |
| 6 | RFF0823 | Return Fan complete failure | 0 | 0 | 0 | 0 | ++ | ++ | −− | −− | 0 | −− | ++ | 0 | 0 | 0 | 0 | 0 |
| 7 | CHWC0824 | Cooling Coil Valve Control unstable (Reduce PID Proportional Band by half) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | CHWC0903 | Cooling Coil Valve Reverse Action | ++ | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| 9 | OADS0826 | OA Damper Stuck (Fully Closed) | 0 | 0 | 0 | 0 | ++ | ++ | ++ | ++ | 0 | + | − | 0 | 0 | 0 | 0 | 0 |
| 10 | CHWV0827 | Cooling Coil Valve Stuck (Fully Closed) | 0 | 0 | −− | −− | ++ | ++ | ++ | ++ | ++ | ++ | ++ | ++ | + | + | + | ++ |
| 11 | CHWV0831 | Cooling Coil Valve Stuck (Fully Open) | ++ | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| 12 | CHWV0901 | Cooling Coil Valve Stuck (Partially Open – 15%) | 0 | 0 | −− | −− | ++ | ++ | ++ | ++ | ++ | ++ | ++ | ++ | + | + | + | ++ |
| 13 | CHWV0902 | Cooling Coil Valve Stuck (Partially Open – 65%) | ++ | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| 14 | HCL0828 | Heating Coil Valve Leaking (Stage 1 – 0.4GPM) | 0 | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| 15 | HCL0829 | Heating Coil Valve Leaking (Stage 2 – 1.0GPM) | 0 | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| 16 | HCL0830 | Heating Coil Valve Leaking (Stage 3 – 2.0GPM) | 0 | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| 17 | OADL0905 | OA Damper Leaking (45% Open) | 0 | 0 | 0 | 0 | − | − | − | − | 0 | 0 | ++ | 0 | 0 | 0 | 0 | 0 |
| 18 | OADL0906 | OA Damper Leaking (55% Open) | 0 | 0 | 0 | 0 | − | − | − | − | 0 | 0 | ++ | 0 | 0 | 0 | 0 | 0 |
| 19 | AHUL0907 | AHU Duct Leaking (after SF) | 0 | 0 | + | + | + | + | + | + | + | + | + | 0 | 0 | 0 | 0 | 0 |
| 20 | AHUL0908 | AHU Duct Leaking (before SF) | 0 | 0 | 0 | 0 | −− | −− | −− | −− | 0 | −− | −− | 0 | 0 | 0 | 0 | 0 |
Table 1 (Continued)

Spring 2008

| # | Name | Description | HWC-VLV | P-E-hcoil | CHWC-VLV | P-E-ccoil | SF-SPD | P-E-SF | RF-SPD | P-E-RF | P-SA-CFM | P-RA-CFM | P-OA-CFM | SA-TEMP | MA-TEMP | RA-TEMP | HWC-DAT | CHWC-DAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | NOR0502 | Normal Operation | | | | | | | | | | | | | | | | |
| 2 | NOR0503 | Normal Operation | | | | | | | | | | | | | | | | |
| 3 | NOR0504 | Normal Operation | | | | | | | | | | | | | | | | |
| 4 | NOR0505 | Normal Operation | | | | | | | | | | | | | | | | |
| 5 | NOR0509 | Normal Operation | | | | | | | | | | | | | | | | |
| 6 | OASB0529 | OA temperature sensor bias (+3F) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | OASB0530 | OA temperature sensor bias (−3F) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | OADS0507 | OA Damper Stuck (Fully Close) | 0 | 0 | 0 | 0 | + | + | + | + | − | + | −− | 0 | 0 | 0 | 0 | 0 |
| 9 | OADS0508 | OA Damper Stuck (40% open) | 0 | 0 | 0 | 0 | + | + | + | + | − | + | −− | 0 | 0 | 0 | 0 | 0 |
| 10 | EADS0527 | EA Damper Stuck (Fully open) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 11 | EADS0510 | EA Damper Stuck (Fully Close) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | − | 0 | − | 0 | 0 | 0 | 0 | 0 | 0 |
| 12 | EADS0511 | EA Damper Stuck (40% open) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | − | 0 | − | 0 | 0 | 0 | 0 | 0 | 0 |
| 13 | CHW0506 | Cooling Coil Valve Stuck (Fully Closed) | 0 | 0 | −− | −− | ++ | ++ | ++ | ++ | ++ | ++ | ++ | ++ | + | + | + | ++ |
| 14 | CHW0515 | Cooling Coil Valve Stuck (Fully Open) | ++ | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| 15 | CHW0516 | Cooling Coil Valve Stuck (Partially Open – 50%) | ++ | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| 16 | RFF0512 | Return Fan complete failure | 0 | 0 | 0 | 0 | 0 | 0 | −− | −− | 0 | −− | 0 | 0 | 0 | 0 | 0 | 0 |
| 17 | RFF0518 | Return Fan at fixed speed (20%spd) | 0 | 0 | 0 | 0 | 0 | 0 | −− | −− | 0 | −− | 0 | 0 | 0 | 0 | 0 | 0 |
| 18 | RFF0519 | Return Fan at fixed speed (80%spd) | 0 | 0 | 0 | 0 | 0 | 0 | ++ | ++ | 0 | ++ | 0 | 0 | 0 | 0 | 0 | 0 |
| 19 | AFAB0522 | Air filter area block fault (10%) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 20 | AFAB0525 | Air filter area block fault (25%) | 0 | 0 | 0 | 0 | + | + | + | + | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 21 | MADU0513 | Mixed air damper unstable | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 22 | MADU0514 | Mixed air damper unstable/Cooling coil control unstable | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 23 | HCSF0517 | Sequence of heating and cooling unstable | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 24 | HCSF0601 | Supply Fan control unstable | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Table 1 (Continued)

Winter 2008

| # | Name | Description | HWC-VLV | P-E-hcoil | CHWC-VLV | P-E-ccoil | SF-SPD | P-E-SF | RF-SPD | P-E-RF | P-SA-CFM | P-RA-CFM | P-OA-CFM | SA-TEMP | MA-TEMP | RA-TEMP | HWC-DAT | CHWC-DAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | NOR0129 | Normal Operation | | | | | | | | | | | | | | | | |
| 2 | NOR0216 | Normal Operation | | | | | | | | | | | | | | | | |
| 3 | NOR0217 | Normal Operation | | | | | | | | | | | | | | | | |
| 4 | OADS0212 | OA Damper Stuck (Fully Close) | −− | −− | 0 | 0 | ++ | + | ++ | + | −− | ++ | −− | 0 | − | 0 | 0 | 0 |
| 5 | OADL0213 | OA damper leaking (52% open) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | + | 0 | 0 | 0 | 0 | 0 |
| 6 | OADL0215 | OA damper leaking (62% open) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | + | 0 | 0 | 0 | 0 | 0 |
| 7 | EADS0202 | EA Damper Stuck (Fully open) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | + | + | 0 | 0 | 0 | 0 | 0 |
| 8 | EADS0203 | EA Damper Stuck (Fully Close) | − | −− | 0 | 0 | 0 | 0 | 0 | −− | 0 | −− | −− | 0 | 0 | 0 | 0 | 0 |
| 9 | CHW0210 | Cooling Coil Valve Stuck (Fully Open) | ++ | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | − | 0 | 0 | ++ | − |
| 10 | CHW0211 | Cooling Coil Valve Stuck (Partially Open – 20%) | + | + | + | + | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| 11 | HCF0205 | Heating Coil Fouling Stage 1 | 0 | −− | 0 | 0 | + | + | + | + | 0 | + | − | 0 | 0 | 0 | 0 | 0 |
| 12 | HCF0206 | Heating Coil Fouling Stage 2 | 0 | −− | 0 | 0 | + | + | + | + | 0 | + | − | 0 | 0 | 0 | 0 | 0 |
| 13 | HCRC0207 | Heating coil reduced capacity Stage 1 | + | − | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 14 | HCRC0208 | Heating coil reduced capacity Stage 2 | + | − | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 15 | HCRC0209 | Heating coil reduced capacity Stage 3 | + | − | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

Each entry in {0, +, ++, −, −−} indicates that the value of the variable is:
(a) 0: unchanged (the fault has no effect on the corresponding variable);
(b) +: greater than normal;
(c) ++: substantially greater than normal;
(d) −: less than normal;
(e) −−: substantially less than normal.
[Figure 12. Panels: average cluster entropy, number of clusters, and average distortion, each plotted against iteration; cluster entropy vs. number of clusters; average distortion vs. number of clusters; average distortion vs. cluster entropy; and a combined scatter plot of number of clusters, cluster entropy and average distortion.]

Fig. 12. The scatter plot of the average distortion with respect to cluster entropy and the number of clusters extracted after a run of SE-SRCE with a fuzzifier of 1.2. The top graphs show the cross-sectional plots of the three parameters during optimization of SE-SRCE, leading to the creation of the bottom scatter plot. The appropriate entropy range/K range can be investigated by observing the trade-offs between $K_m$, $H_m$, and $f(Y_m, X)$, so that both distortion and entropy can be minimized while keeping the number of clusters at a reasonable level.
We measured the appropriateness of the feature selection method by investigating the normalized mutual information (NMI) between features [26]. Mutual information examines the dependence between two discrete distributions X and Y. It equals the KL-divergence between the joint distribution p(x, y) and the product of the marginals p(x)p(y), or equivalently the sum of the marginal entropies H(X) and H(Y) minus the joint entropy H(X, Y). The NMI is defined as follows,

$$\mathrm{NMI}(X;Y) = \frac{I(X;Y)}{\sqrt{H(X)H(Y)}} = \frac{H(X) + H(Y) - H(X,Y)}{\sqrt{H(X)H(Y)}} = \frac{\sum_{x \in X} \sum_{y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)p(y)}}{\sqrt{\sum_{x \in X} p(x) \log p(x) \, \sum_{y \in Y} p(y) \log p(y)}}, \quad (30)$$

where X and Y in our case are a pair of fuzzy feature signals ($y_1$ and $y_2$, calculated using Eq. (5)), rounded to the nearest integer, such that

$$X(n) = \mathrm{round}(y_1(n)), \quad X(n) \in \{-1, 0, 1\}, \quad (31)$$

and

$$Y(n) = \mathrm{round}(y_2(n)), \quad Y(n) \in \{-1, 0, 1\}. \quad (32)$$

The NMI is calculated by marginalizing the probability of co-occurrence between these three discrete categories. For a pair of signals, an NMI closer to 1 indicates that the feature pair is redundant. For each feature set, the strictly upper triangular part of the pairwise NMI matrix is taken, and its median, 75th percentile, and 95th percentile are averaged over 80 runs. Since we want to minimize redundancies between features, a good feature set is characterized by an average NMI closer to 0. Table 2 summarizes the result of the experiment.

The characteristic features in each season were unique from those of other seasons. In order to analyze the important features for each season, we repeated the clustering process 200 times.
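The redundancy measure described above can be illustrated with a short sketch. It is not the authors' code: the function names are hypothetical, the entropies are computed from empirical frequencies with a base-2 logarithm, and the mutual information is normalized by the geometric mean of the two marginal entropies, a common convention that is assumed here. Note that Python's built-in `round` rounds exact halves to the nearest even integer, which may differ from the rounding used in the experiment.

```python
import math
from collections import Counter


def discretize(signal):
    """Round a fuzzy feature signal in [-1, 1] to the categories -1, 0, 1."""
    return [int(round(v)) for v in signal]


def _entropy(counts, n):
    """Empirical entropy (base 2) from a Counter of category frequencies."""
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def nmi(x, y):
    """Normalized mutual information between two discrete sequences."""
    n = len(x)
    hx = _entropy(Counter(x), n)
    hy = _entropy(Counter(y), n)
    hxy = _entropy(Counter(zip(x, y)), n)
    if hx == 0 or hy == 0:
        return 0.0  # a constant signal carries no information to share
    return (hx + hy - hxy) / math.sqrt(hx * hy)


# Hypothetical example signals.
a = discretize([-0.9, 0.2, 0.7, -0.8, 0.1, 0.9])
redundant = nmi(a, a)                           # identical signals
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])   # statistically independent
```

Applying `nmi` to every feature pair and collecting the strictly upper triangular entries gives the values summarized by the median and percentiles.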
From these 200 repetitions, three histograms describing the probability of occurrence of the characteristic features for each season are reported in Fig. 13. The probability of occurrence was calculated as the frequency of appearance divided by the number of trials. The overall patterns of the fault classes for each season, based on the characteristic features, are presented in Figs. 14–16, respectively. Each circle in these figures shows the condition of the characteristic features during a specific fault in the HVAC system.

6.2. Classification result

The generalization capability of a classifier is a powerful indicator of the quality of the features. Using the characteristic features selected by the proposed method, a classifier can be trained with less computational burden and a lower probability of overfitting (note that in our experiment, 30% of the data was equally divided into training and validation sets, and the remaining 70% was used as the test set). The classifiers were trained and tested using the fuzzy features, $y_s$, as shown in Figs. 14–16.

The parameters for NARX-TDNN were set as follows. The number of hidden neurons was set to 10. The input layer, hidden layer, and feedback orders were set to 2. The architecture is illustrated in Fig. 8. The dataset was divided at random into training (15%), validation (15%), and test (70%) sets. The training was done using the Levenberg–Marquardt algorithm. The experiment was repeated 80 times for each season to test the reliability and repeatability of the method. Using the features shown in Figs. 14–16, the average sensitivity and specificity of the proposed method compared to Li’s manual feature selection approach are presented in Table 3.

The quality of the feature sets selected by ERCE was benchmarked against the features selected by EAC K-means, WEAC K-means, and Complete Linkage.
The features selected by these four competing algorithms were supplied to both NARX-TDNN and Hidden Markov Models (HMM) [11–13], where the training and testing for both classifiers were repeated 100 times for each pair of feature selection and classification algorithm. The weighted average (WA) sensitivity and WA specificity results are reported in Table 4.

The significance of the experimental results was validated using a paired t-test with null hypotheses as follows:

1. $H_0^*$: The performance of a classifier using features from ERCE is not significantly better than using features from algorithm X. A star (*) in Tables 3 and 4 indicates that $H_0^*$ should be rejected, whereas no sign indicates otherwise.
2. $H_0^\dagger$: Given the same feature selection algorithm, a trained classifier A does not exercise significantly better performance compared to classifier B. A dagger (†) in Table 4 indicates that $H_0^\dagger$ should be rejected, whereas no sign indicates otherwise.

7. Discussion

As the proposed feature selection process is strictly unsupervised, analyzing the result leads to a number of interesting observations. With regard to the redundancies between features, it can be seen in Table 2 that all consensus algorithms (Median NMI_ERCE = 0.019, Median NMI_EAC Kmeans = 0.040, Median NMI_WEAC Kmeans = 0.048) in general outperformed CL (Median NMI = 0.1305), manual selection (Median NMI = 0.0199, Q75% NMI = 0.2227), and no selection (Median NMI = 0.1857). The three consensus algorithms reported fewer than 20 characteristic features on average, which is at least four times lower than the number of characteristic features selected using CL. Furthermore, the features selected by ERCE (Median NMI = 0.019 ± 0.004) outperformed those selected by the other consensus algorithms, EAC K-means (Median NMI = 0.040 ± 0.011) and WEAC K-means (Median NMI = 0.048 ± 0.034), as indicated by its low NMI.
ERCE also had smaller standard deviations on all performance aspects, especially on the number of features, suggesting the relatively high reliability and repeatability of the proposed swarm-based consensus clustering algorithm.

With regard to the reliability of the feature selection algorithm, ERCE consistently selects features that are unique and relevant to the faults in the corresponding year, as can be seen in Fig. 13. For example, throughout the experiment using the Winter 2008 dataset, ERCE consistently selected HWC-VLV, PLN-TMP, EA-DMPR, HWC-DAT and HWP-GPM, which are among the important features for that season. The pattern for the Winter 2008 dataset is shown in Fig. 16. In this figure, the pattern for Exhaust Air Damper Stuck (EADS) faults can be easily distinguished from the others by observing the conditions of both EA-DMPR and PLN-TMP. Similarly, HCRC faults in this season are characterized by abnormal HWC-VLV and VAV-DMPR signals. CHW faults are also observable from an increase in HWC-DAT as the system compensates for the increased flow of chilled water due to the faulty cooling coil valve. ERCE also appropriately discovers that SC CHWC-GPM is a particularly important feature in Spring 2008 due to HCSF0517, as discussed previously in Section 3. ERCE discovers that the outside air damper (OA-DMPR) is consistently inside the atypical negative region during HCSF faults. This information may be useful for further investigation of the nature of this particular fault.
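The occurrence probabilities used to judge this reliability (frequency of appearance divided by the number of trials, as plotted in Fig. 13) can be computed as in the following sketch. The function name and the four example trials are hypothetical; only the feature labels are taken from the text.

```python
from collections import Counter


def occurrence_probability(trials):
    """Fraction of trials in which each feature was selected.

    trials is a list of feature selections, one per clustering run.
    A feature is counted at most once per trial.
    """
    counts = Counter(f for selected in trials for f in set(selected))
    return {f: c / len(trials) for f, c in counts.items()}


# Hypothetical selections from four clustering runs.
trials = [
    ["HWC-VLV", "PLN-TMP"],
    ["HWC-VLV"],
    ["HWC-VLV", "EA-DMPR"],
    ["PLN-TMP"],
]
probabilities = occurrence_probability(trials)
```

Features whose probability stays close to 1 across many runs are the ones the ensemble selects consistently.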
Table 2
The Normalized Mutual Information (NMI) between features selected using various feature selection algorithms on the Spring 2008 dataset. Boldface indicates the lowest NMI (the least redundancies between features).

| | Without feature selection | Manual selection [20] | CL | EAC K-means | WEAC K-means | ERCE |
|---|---|---|---|---|---|---|
| # of features | 320 | 16 | 84 | 15.90 ± 3.86 | 16.70 ± 4.73 | 17.20 ± 1.60 |
| Median NMI | 0.1857 | 0.0199 | 0.1305 | 0.040 ± 0.011 | 0.048 ± 0.034 | 0.019 ± 0.004 |
| Q75% NMI | 0.4110 | 0.3014 | 0.2227 | 0.106 ± 0.025 | 0.131 ± 0.068 | 0.078 ± 0.013 |
| Q95% NMI | 0.8821 | 0.4899 | 0.4863 | 0.404 ± 0.035 | 0.364 ± 1.600 | 0.339 ± 0.031 |

NMI values are computed between characteristic feature pairs.

Regarding the effects of the proposed feature selection algorithm on classifier performance, the result of ERCE+NARX-TDNN, particularly on the Spring 2008 dataset, shows a clear advantage of ERCE over the other feature selection approaches. As can be seen in Table 3, when compared to the manually selected features suggested by Li [20], supplying NARX-TDNN with the features selected by ERCE results in consistent specificity improvements in Spring 2008. Moreover, overall statistically significant weighted average performance improvements are also observed throughout Summer 2007, Spring 2008, and Winter 2008 based on our experiment.

Based on the statistical results in Table 4, using features from Li and EAC K-means limits NARX-TDNN’s specificity to averages of around 91.54% and 91.85%, respectively. The low average may be attributed to misclassification of a number of more ambiguous faults such as OASB, MADU, AFAB and HCSF. This report is consistent with Li’s observation, presented in Table 1, where
This report is consistent with Li’s observation, presented in Table 1 where Fig. 13. Representative feature occurrence histogram for each season after 200 clustering trials. The x-axis denotes the specific label for each feature, y-axis denotes the probability of occurrence, calculated as the frequency of appearance divided by the number of trials. 850 851 852 853 854 855 856 857 858 859 860 861 862 863
[Figure 14. One panel per recording, each showing characteristic features 1–21 on a scale from −1.0 to 1.0. Panels: NOR0819/NOR0825, EADS0820, EADS0821, RFF0822, RFF0823, CHWC0824, CHWC0903, OADS0826, CHWV0827, CHWV0831, CHWV0901, CHWV0902, HCL0828, HCL0829, HCL0830, OADL0905, OADL0906, AHUL0907, AHUL0908.]

Fig. 14. Patterns constituted by the characteristic features for each data in the ASHRAE-1312-RP Summer 2007 dataset.
[Figure 15. One panel per recording, each showing characteristic features 1–19 on a scale from −1.0 to 1.0. Panels: NOR0502/NOR0503/NOR0504/NOR0505/NOR0509, OASB0529, OASB0530, OADS0507, OADS0508, EADS0527, EADS0510, EADS0511, CHW0506, CHW0515, CHW0516, RFF0512, RFF0518, RFF0519, AFAB0522, AFAB0525, MADU0513, MADU0514, HCSF0517, HCSF0601.]

Fig. 15. Patterns constituted by the characteristic features for each data in the ASHRAE-1312-RP Spring 2008 dataset.
[Figure 16. One panel per recording, each showing characteristic features 1–7 on a scale from −1.0 to 1.0. Panels: NOR0129/NOR0216/NOR0217, OADS0212, OADL0213, OADL0215, EADS0202, EADS0203, CHW0210, CHW0211, HCF0205, HCF0206, HCRC0207, HCRC0208, HCRC0209.]

Fig. 16. Patterns constituted by the characteristic features for each data in the ASHRAE-1312-RP Winter 2008 dataset.

these faults seem to have no effect on the manually selected features. Similar cases are seen with WEAC K-means and complete linkage. Using features from ERCE allows NARX-TDNN to reach a significantly higher average specificity of 98.37% ± 0.25%. The significance of the results is statistically validated on both the Summer 2007 and Spring 2008 datasets, where signals exhibit more nonlinearities compared to those in the Winter 2008 dataset.

Regarding the general performance of the classifiers, the results in Table 4 show the comparative performance of HMM and NARX-TDNN. While HMM shows superior specificity on the Winter 2008 dataset, its specificity on the Spring 2008 and Summer 2007 datasets is not as high. This is arguably due to the nonlinearities in the fault patterns of the Spring 2008 and Summer 2007 datasets compared to the Winter 2008 faults. For instance, it can be seen in Fig. 15 that MADU, AFAB and HCSF faults exhibit visually ambiguous patterns.
When dealing with these nonlinear datasets, the NARX-TDNN classifier benefits from its capabil- ity in dealing with long-term dependencies. Table 4 shows that NARX-TDNN was capable in distinguishing these faults, achiev- ing specificity of 98.37% ± 0.25% using the features provided by ERCE. 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885
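As a reading aid for the sensitivity and specificity figures discussed above and reported per fault type in Tables 3 and 4, the sketch below computes one-vs-rest sensitivity and specificity from a multi-class confusion matrix, followed by a support-weighted average. The one-vs-rest definitions, the weighting by number of samples per class, the function names and the toy matrix are all assumptions made for illustration; this section does not spell out how the paper's weighted averages were computed.

```python
from typing import List, Tuple


def per_class_rates(confusion: List[List[int]]) -> List[Tuple[float, float]]:
    """Return (sensitivity, specificity) per class, treating each class one-vs-rest.

    confusion[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    rates = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # class k missed
        fp = sum(confusion[i][k] for i in range(n)) - tp  # other classes flagged as k
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
        specificity = tn / (tn + fp) if (tn + fp) else 0.0
        rates.append((sensitivity, specificity))
    return rates


def weighted_average(rates: List[Tuple[float, float]],
                     confusion: List[List[int]]) -> Tuple[float, float]:
    """Average the per-class rates, weighting each class by its sample count.

    Weighting by class support is an assumption, not the paper's stated rule.
    """
    support = [sum(row) for row in confusion]
    total = sum(support)
    wa_sens = sum(s * w for (s, _), w in zip(rates, support)) / total
    wa_spec = sum(p * w for (_, p), w in zip(rates, support)) / total
    return wa_sens, wa_spec


if __name__ == "__main__":
    # Hypothetical 3-class result (e.g. normal operation plus two fault types).
    toy = [
        [50, 0, 0],
        [2, 45, 3],
        [0, 5, 45],
    ]
    rates = per_class_rates(toy)
    for label, (sens, spec) in zip(["class 0", "class 1", "class 2"], rates):
        print(f"{label}: sensitivity={sens:.3f} specificity={spec:.3f}")
    print("weighted average:", weighted_average(rates, toy))
```

Under these definitions, a fault that is frequently confused with another fault lowers the sensitivity of the first and the specificity of the second.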
  • 23. Table 3
NARX-TDNN classification result, by feature selection method.

| Fault type | Manual selection (a): sensitivity | Manual selection (a): specificity | ERCE (b): sensitivity | ERCE (b): specificity |
|---|---|---|---|---|
| Summer 2007 | | | | |
| NOR | 99.9% ± 0.1% | 98.1% ± 1.6% | 99.9% ± 0.2% | 99.0% ± 2.1% |
| EADS | 99.7% ± 0.5% | 99.5% ± 2.7% | 99.8% ± 0.3% | 98.9% ± 2.5% |
| RFF | 99.9% ± 0.0% | 99.0% ± 2.7% | 99.9% ± 0.1% | 99.5% ± 1.4% |
| CHWC | 99.9% ± 0.2% | 99.0% ± 1.1% | 99.8% ± 0.2% | 99.0% ± 4.4% |
| OADS | 99.9% ± 0.2% | 98.0% ± 2.2% | 99.9% ± 0.3% | 97.3% ± 3.1% |
| CHWV | 99.8% ± 0.3% | 99.0% ± 4.3% | 99.7% ± 0.9% | 99.2% ± 2.5% |
| HCL | 99.7% ± 0.4% | 98.0% ± 1.0% | 99.7% ± 0.3% | 98.4% ± 2.4% |
| OADL | 99.7% ± 0.5% | * 95.2% ± 7.1% | 99.9% ± 0.2% | 98.0% ± 1.2% |
| AHUL | 99.8% ± 0.2% | 99.8% ± 1.1% | 99.9% ± 0.1% | 99.5% ± 2.6% |
| Weighted average | 99.8% ± 0.1% | * 96.8% ± 2.2% | 99.8% ± 0.1% | 98.4% ± 0.7% |
| Spring 2008 | | | | |
| NOR | 99.8% ± 0.3% | 99.3% ± 2.1% | 99.9% ± 0.1% | 99.6% ± 0.6% |
| OASB | 99.1% ± 1.5% | * 95.0% ± 6.1% | 99.7% ± 0.3% | 99.5% ± 1.4% |
| OADS | 99.9% ± 0.2% | * 98.2% ± 1.7% | 99.8% ± 0.1% | 99.5% ± 0.9% |
| EADS | 99.9% ± 0.1% | * 98.3% ± 0.5% | 99.9% ± 0.1% | 99.0% ± 2.8% |
| CHW | 99.7% ± 0.4% | * 98.7% ± 0.8% | 99.8% ± 0.2% | 99.3% ± 0.7% |
| RFF | 99.9% ± 0.2% | * 82.6% ± 33.1% | 99.8% ± 0.1% | 99.4% ± 0.7% |
| AFAB | 99.7% ± 0.2% | * 42.9% ± 17.8% | 99.7% ± 0.2% | 98.5% ± 4.9% |
| MADU | 98.6% ± 1.6% | * 70.4% ± 39.8% | 98.9% ± 0.2% | 98.0% ± 4.0% |
| HCSF | 99.6% ± 0.6% | * 94.7% ± 6.6% | 99.9% ± 0.0% | 99.5% ± 1.5% |
| Weighted average | 98.9% ± 0.2% | * 86.2% ± 5.0% | 99.9% ± 0.1% | 99.2% ± 0.5% |
| Winter 2008 | | | | |
| NOR | 99.6% ± 0.4% | 99.3% ± 1.1% | 99.8% ± 0.1% | 98.3% ± 2.4% |
| OADS | 99.9% ± 0.1% | * 95.6% ± 3.8% | 99.8% ± 0.2% | 98.7% ± 1.4% |
| OADL | 99.8% ± 0.4% | 98.5% ± 3.2% | 99.5% ± 0.7% | 98.5% ± 1.5% |
| EADS | 99.9% ± 0.4% | 97.9% ± 1.3% | 99.6% ± 0.3% | 97.5% ± 2.5% |
| CHW | 99.8% ± 0.4% | * 97.5% ± 5.2% | 99.6% ± 0.3% | 99.1% ± 1.2% |
| HCF | 99.8% ± 0.4% | * 95.1% ± 4.5% | 99.2% ± 0.7% | 97.2% ± 2.9% |
| HCRC | 99.8% ± 0.4% | 99.0% ± 2.2% | 99.8% ± 0.3% | 99.4% ± 1.1% |
| Weighted average | 99.7% ± 0.2% | 97.5% ± 0.7% | 99.8% ± 0.1% | 98.7% ± 0.7% |

H∗0: The performance of NARX-TDNN using features from ERCE is not significantly better than using manually selected features.
(a) Manual selection utilizes Shun Li’s feature set [20].
(b) ERCE features are as shown in Figs. 14–16.
* Reject H∗0 (α = 0.001).

Table 4
Performance comparison with competing feature selection methods, tested against two classification methods: NARX-TDNN and HMM. WA = weighted average.

| Feature selection | # of features | HMM: WA sensitivity | HMM: WA specificity | NARX-TDNN: WA sensitivity | NARX-TDNN: WA specificity |
|---|---|---|---|---|---|
| Summer 2007 | | | | | |
| Manual selection (a) | 16 ± 0.00 | * 98.65% ± 0.34% | 89.45% ± 2.48% | † 99.59% ± 0.12% | † 96.81% ± 1.99% |
| EAC K-means | 29.85 ± 17.26 | * 98.70% ± 0.50% | * 85.01% ± 4.94% | † 99.69% ± 0.22% | *,† 95.07% ± 3.75% |
| WEAC K-means | 14.14 ± 13.09 | * 97.69% ± 0.13% | * 72.85% ± 1.48% | † 99.79% ± 0.08% | *,† 96.85% ± 2.31% |
| Complete linkage | 81.00 ± 0.00 | 98.71% ± 0.98% | 90.49% ± 7.52% | † 99.51% ± 0.27% | † 96.42% ± 1.16% |
| ERCE | 21.41 ± 4.46 | 99.15% ± 0.32% | 90.85% ± 4.16% | † 99.69% ± 0.08% | † 97.61% ± 0.85% |
| Spring 2008 | | | | | |
| Manual selection (a) | 16 ± 0.00 | 98.90% ± 0.54% | † 91.54% ± 2.98% | * 98.89% ± 0.23% | * 86.17% ± 5.01% |
| EAC K-means | 34.56 ± 9.40 | 98.55% ± 0.42% | 91.85% ± 2.68% | *,† 99.02% ± 0.81% | * 91.92% ± 6.42% |
| WEAC K-means | 33.52 ± 10.32 | 98.83% ± 0.40% | 93.37% ± 2.38% | † 99.20% ± 0.49% | * 92.37% ± 6.53% |
| Complete linkage | 84 ± 0.00 | 98.80% ± 0.46% | 94.12% ± 2.61% | † 99.62% ± 0.17% | * 95.14% ± 1.29% |
| ERCE | 19.93 ± 5.19 | 98.84% ± 0.32% | 92.68% ± 2.66% | † 99.79% ± 0.10% | † 98.37% ± 0.25% |
| Winter 2008 | | | | | |
| Manual selection (a) | 16 ± 0.00 | 98.81% ± 0.56% | * 92.92% ± 0.31% | † 99.71% ± 0.15% | † 97.51% ± 0.65% |
| EAC K-means | 27.74 ± 7.18 | † 99.98% ± 0.14% | † 99.85% ± 0.85% | 99.49% ± 0.50% | 97.87% ± 2.06% |
| WEAC K-means | 21.37 ± 11.75 | † 99.96% ± 0.18% | 99.79% ± 1.00% | 99.59% ± 0.19% | 97.68% ± 0.88% |
| Complete linkage | 95 ± 0.00 | 99.87% ± 0.40% | 99.21% ± 2.37% | 99.74% ± 0.13% | 98.54% ± 1.01% |
| ERCE | 7.88 ± 3.02 | 99.92% ± 0.31% | 99.49% ± 1.43% | 99.73% ± 0.19% | 98.35% ± 1.16% |

H∗0: The performance of a classifier using features from ERCE is not significantly better than using features from algorithm X.
H†0: Given the same feature selection algorithm, a trained classifier A does not exercise significantly better performance compared to classifier B.
(a) Manual selection utilizes Shun Li’s feature set [20].
* Reject H∗0 (α = 0.001).
† Reject H†0 (α = 0.001).

8. Conclusion

A method for automating feature selection and classification of faults for Heating Ventilation and Air-Conditioning (HVAC) systems using a knowledge-discovery and Neural-Network approach has been proposed. The core of the method is the Ensemble Rapid Centroid Estimation (ERCE) which automatically finds characteristic features and discards redundant features. Using these characteristic features, a Parallel Nonlinear Auto-Regressive Neural Network with eXogenous inputs and distributed time delays (NARX-TDNN) is then trained to identify the faults described in ASHRAE-1312-RP Summer 2007, Spring 2008, and Winter 2008 datasets.
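To give a concrete sense of what "redundant" can mean for two signals, the following sketch computes normalized mutual information (NMI), the measure quoted for each method in this conclusion, between two discretized signals. It is not the ERCE algorithm. The equal-width binning, the geometric-mean normalization (the form used by Strehl and Ghosh [26]), the function names and the toy signals are assumptions for illustration; the paper's exact NMI estimator is not restated in this section.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def discretize(signal: Sequence[float], bins: int = 4) -> List[int]:
    """Map a continuous signal to equal-width bin indices (assumed binning scheme)."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return [0] * len(signal)
    width = (hi - lo) / bins
    # The maximum value would land in bin `bins`, so clamp it into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in signal]


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in nats) of a discrete label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Mutual information (in nats) estimated from joint label counts."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c * n) / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def nmi(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """NMI with geometric-mean normalization: I(X;Y) / sqrt(H(X) * H(Y))."""
    hx, hy = entropy(x), entropy(y)
    if hx == 0.0 or hy == 0.0:
        return 0.0  # a constant signal carries no information
    return mutual_information(x, y) / math.sqrt(hx * hy)


if __name__ == "__main__":
    # Hypothetical signals: `scaled` is a rescaled copy of `base`, `other` is unrelated.
    base = [0.0, 0.1, 0.9, 1.0, 0.05, 0.95]
    scaled = [2.0 * v + 1.0 for v in base]
    other = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0]
    print("NMI(base, scaled):", nmi(discretize(base, 2), discretize(scaled, 2)))
    print("NMI(base, other): ", nmi(discretize(base, 2), discretize(other, 2)))
```

An NMI near 1 between two signals suggests that one adds little beyond the other, while values near 0 suggest they carry largely independent information.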
  • 24. The proposed unsupervised feature selection algorithm (ERCE, median NMI = 0.019 ± 0.004) generally outperformed conventional consensus clustering methods, including Evidence Accumulation K-means (median NMI = 0.040 ± 0.011) and Weighted Evidence Accumulation K-means (median NMI = 0.048 ± 0.034), as well as conventional complete linkage clustering (median NMI = 0.1305). ERCE also generally had smaller standard deviations across the performance measures, especially in the number of features selected, suggesting relatively high reliability and repeatability of the proposed swarm-based consensus clustering algorithm.

The proposed feature selection method was tested on the experimental fault data from the ASHRAE-1312-RP datasets, including Summer 2007, Spring 2008, and Winter 2008, using two well-established time-domain classifiers: (a) NARX-TDNN; and (b) Hidden Markov Models (HMM). Our experimental results showed weighted average sensitivity and specificity of: (a) higher than 99% and 96% for NARX-TDNN; and (b) higher than 98% and 86% for HMM on the ASHRAE-1312-RP datasets. Based on our experiment, the proposed feature selection method appears to have a positive effect on the generalization capability of both AFDD algorithms.

Notwithstanding the satisfactory results to date, further work is necessary to investigate the performance of the proposed method on alternative HVAC systems. Future work will incorporate semi-supervised adaptive learning capability for automatic fault discovery.
We are also interested in applying the proposed consensus clustering method to other applications.

Acknowledgements

This research is funded by The Commonwealth Scientific and Industrial Research Organisation (CSIRO), Marsfield, Australia. The ASHRAE-1312-RP Summer 2007, Spring 2008, and Winter 2008 fault data are provided by CSIRO. The research is supervised by CSIRO; the paper writing is supervised specifically by Guo. Automatic Fault Detection and Diagnosis (AFDD) for Heating Ventilation and Air Conditioning (HVAC) is an ongoing research project in CSIRO Energy Technology and Computational Informatics. We thank the anonymous reviewers for their time and effort in providing comprehensive, high-quality criticism of our paper. The corresponding author would also like to personally acknowledge Nina Elita for her contribution, especially in proofreading and in providing sincere moral support during the preparation, writing and submission of this paper.

References

[1] A. Kusiak, M. Li, F. Tang, Modeling and optimization of HVAC energy consumption, Appl. Energy 87 (2010) 3092–3102.
[2] A. Kusiak, F. Tang, G. Xu, Multi-objective optimization of HVAC system with an evolutionary computation algorithm, Energy 36 (2011) 2440–2449.
[3] J. Wall, Automatic Fault Detection and Diagnosis, 2011, http://www.csiro.au/Outcomes/Energy/building-fault-detection.aspx
[4] J. Ward, Opticool, 2013, http://www.csiro.au/Organisation-Structure/Flagships/Energy-Flagship/Opticool.aspx
[5] J. Liang, R. Du, Model-based fault detection and diagnosis of HVAC systems using support vector machine method, Int. J. Refrig. 30 (2007) 1104–1114.
[6] D. Jacob, S. Dietz, S. Komhard, C. Neumann, S. Herkel, Black-box models for fault detection and performance monitoring of buildings, J. Build. Perform. Simul. 3 (2010) 53–62.
[7] C. Lo, P. Chan, Y.-K. Wong, A.B. Rad, K. Cheung, Fuzzy-genetic algorithm for automatic fault detection in HVAC systems, Appl. Soft Comput. 7 (2007) 554–560.
[8] J. Schein, S.T. Bushby, N.S. Castro, J.M. House, A rule-based fault detection method for air handling units, Energy Build. 38 (2006) 1485–1492.
[9] T.M. Rossi, J.E. Braun, A statistical, rule-based fault detection and diagnostic method for vapor compression air conditioners, HVAC&R Res. 3 (1997) 19–37.
[10] J. Schein, Results from Field Testing of Embedded Air Handling Unit and Variable Air Volume Box Fault Detection Tools, U.S. Dept. of Commerce, Technology Administration, National Institute of Standards and Technology, 2006.
[11] J. Wall, Y. Guo, J. Li, S. West, A dynamic machine learning-based technique for automated fault detection in HVAC systems, in: Proceedings of the ASHRAE Annual Conference, Montreal, Quebec, Canada, 2011, pp. 449–456.
[12] Y. Guo, D. Dehestani, J. Li, J. Wall, S. West, S. Su, Intelligent outlier detection for HVAC system fault detection, in: Proceedings of the 10th International Healthy Buildings Conference, Brisbane, Queensland, Australia, 2012.
[13] Y. Guo, J. Wall, J. Li, S. West, Intelligent model based fault detection and diagnosis for HVAC system using statistical machine learning methods, in: Proceedings of the ASHRAE 2013 Winter Conference, Dallas, USA, 2013.
[14] M. Yuwono, S.W. Su, Y. Guo, J. Li, S. West, J. Wall, Automatic feature selection using multiobjective cluster optimization for fault detection in a heating ventilation and air conditioning system, in: Proceedings of the 2013 1st International Conference on Artificial Intelligence, Modelling and Simulation, AIMS ’13, IEEE Computer Society, Washington, DC, USA, 2013, pp. 171–176, http://dx.doi.org/10.1109/AIMS.2013.34
[15] W. Deng, X. Yang, L. Zou, M. Wang, Y. Liu, Y. Li, An improved self-adaptive differential evolution algorithm and its application, Chemometr. Intell. Lab. Syst. 128 (2013) 66–76, http://dx.doi.org/10.1016/j.chemolab.2013.07.004
[16] L. Wang, C.-X. Dun, W.-J. Bi, Y.-R. Zeng, An effective and efficient differential evolution algorithm for the integrated stochastic joint replenishment and delivery model, Knowl.-Based Syst. 36 (2012) 104–114, http://dx.doi.org/10.1016/j.knosys.2012.06.007
[17] M. Yuwono, S. Su, B. Moulton, H. Nguyen, Data clustering using variants of rapid centroid estimation, IEEE Trans. Evol. Comput. 18 (2013) 366–377.
[18] M. Yuwono, S. Su, B. Moulton, H. Nguyen, An algorithm for scalable clustering: ensemble rapid centroid estimation, in: Proceedings of the 2014 IEEE Congress on Evolutionary Computation, 2014, pp. 1250–1257.
[19] D.W. van der Merwe, A.P. Engelbrecht, Data clustering using particle swarm optimization, in: Proceedings of the 2003 IEEE Congress on Evolutionary Computation, vol. 1, 2003, pp. 215–220.
[20] S. Li, A Model-Based Fault Detection and Diagnostic Methodology for Secondary HVAC Systems (Ph.D. thesis), Drexel University, 2014.
[21] S. Kullback, R.A. Leibler, On information and sufficiency, Ann. Math. Stat. 22 (1951) 79–86, http://dx.doi.org/10.1214/aoms/1177729694
[22] S. Monti, P. Tamayo, J. Mesirov, T. Golub, Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data, Mach. Learn. 52 (2003) 91–118, http://dx.doi.org/10.1023/A:1023949509487
[23] M.D. Wilkerson, D.N. Hayes, ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking, Bioinformatics 26 (2010) 1572–1573.
[24] D.N. Hayes, S. Monti, G. Parmigiani, C.B. Gilks, K. Naoki, A. Bhattacharjee, M.A. Socinski, C. Perou, M. Meyerson, Gene expression profiling reveals reproducible human lung adenocarcinoma subtypes in multiple independent patient cohorts, J. Clin. Oncol. 24 (2006) 5079–5090.
[25] A. Fred, A. Jain, Combining multiple clusterings using evidence accumulation, IEEE Trans. Pattern Anal. Mach. Intell. 27 (2005) 835–850, http://dx.doi.org/10.1109/TPAMI.2005.113
[26] A. Strehl, J. Ghosh, Cluster ensembles – a knowledge reuse framework for combining multiple partitions, J. Mach. Learn. Res. 3 (2003) 583–617, http://dx.doi.org/10.1162/153244303321897735
[27] I.J. Leontaritis, S.A. Billings, Input–output parametric models for non-linear systems. Part I: Deterministic non-linear systems, Int. J. Control 41 (1985) 303–328, http://dx.doi.org/10.1080/0020718508961129
[28] H. Siegelmann, B. Horne, C. Giles, Computational capabilities of recurrent NARX neural networks, IEEE Trans. Syst. Man Cybern. Part B: Cybern. 27 (1997) 208–215, http://dx.doi.org/10.1109/3477.558801
[29] J.M. Menezes Jr., G. Barreto, A new look at nonlinear time series prediction with NARX recurrent neural network, in: Ninth Brazilian Symposium on Neural Networks, SBRN ’06, 2006, pp. 160–165, http://dx.doi.org/10.1109/SBRN.2006.7
[30] T. Wang, Comparing hard and fuzzy C-means for evidence-accumulation clustering, in: Proceedings of the 18th International Conference on Fuzzy Systems, FUZZ-IEEE’09, IEEE Press, Piscataway, NJ, USA, 2009, pp. 468–473.
[31] F. Duarte, A.L.N. Fred, A. Lourenco, M. Rodrigues, Weighting cluster ensembles in evidence accumulation clustering, in: Portuguese Conference on Artificial Intelligence, EPIA 2005, 2005, pp. 159–167, http://dx.doi.org/10.1109/EPIA.2005.341287
[32] M. Yuwono, S.W. Su, B.D. Moulton, H.T. Nguyen, Fast unsupervised learning method for rapid estimation of cluster centroids, in: Proceedings of the 2012 IEEE Congress on Evolutionary Computation, 2012, pp. 889–896.
[33] J.C. Bezdek, Mathematical models for systematics and taxonomy, in: G. Estabrook (Ed.), Proceedings of the 8th International Conference on Numerical Taxonomy, Freeman, San Francisco, CA, 1975, pp. 143–166.
[34] T. Wang, Ca-tree: a hierarchical structure for efficient and scalable coassociation-based cluster ensembles, IEEE Trans. Syst. Man Cybern. Part B: Cybern. 41 (2011) 686–698, http://dx.doi.org/10.1109/TSMCB.2010.2086059
[35] P.J. Rousseeuw, Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math. 20 (1987) 53–65.