HVAC_CSIRO_Proof_2015

Please cite this article in press as: M. Yuwono, et al., Unsupervised feature selection using swarm intelligence and consensus clus-
tering for automatic fault detection and diagnosis in Heating Ventilation and Air Conditioning systems, Appl. Soft Comput. J. (2015),
http://dx.doi.org/10.1016/j.asoc.2015.05.030
Applied Soft Computing xxx (2015) xxx–xxx
Contents lists available at ScienceDirect
Applied Soft Computing
journal homepage: www.elsevier.com/locate/asoc
Unsupervised feature selection using swarm intelligence and
consensus clustering for automatic fault detection and diagnosis in
Heating Ventilation and Air Conditioning systems
Mitchell Yuwono a,∗, Ying Guo b, Josh Wall c, Jiaming Li b, Sam West c, Glenn Platt c, Steven W. Su a
a Faculty of Engineering and Information Technology, University of Technology, Sydney (UTS), 15 Broadway, Ultimo, NSW 2007, Australia
b The Commonwealth Scientific and Industrial Research Organisation (CSIRO), Division of Computational Informatics, Marsfield, NSW 2122, Australia
c The Commonwealth Scientific and Industrial Research Organisation (CSIRO), Division of Energy Technology, Mayfield West, NSW 2304, Australia
Article info
Article history:
Received 4 May 2014
Received in revised form 12 February 2015
Accepted 17 May 2015
Available online xxx
Keywords:
Data clustering
Consensus clustering
Feature selection
Ensemble Rapid Centroid Estimation (ERCE)
Particle Swarm Optimization
Fault detection and diagnosis
Heating Ventilation and Air Conditioning (HVAC) system
Nonlinear Auto-Regressive Neural Network with eXogenous inputs and distributed time delays (NARX-TDNN)
Hidden Markov Model
Abstract
Various sensory and control signals in a Heating Ventilation and Air Conditioning (HVAC) system are closely interrelated, which gives rise to severe redundancies between the original signals. These redundancies may cripple the generalization capability of an automatic fault detection and diagnosis (AFDD) algorithm. This paper proposes an unsupervised feature selection approach and its application to AFDD in an HVAC system. Using Ensemble Rapid Centroid Estimation (ERCE), the important features are automatically selected from the original measurements based on the relative entropy between the low- and high-frequency features. The material used is the experimental HVAC fault data from the ASHRAE-1312-RP datasets, containing a total of 49 days of various types of faults and corresponding severity. The features selected using ERCE (median normalized mutual information (NMI) = 0.019) achieved the least redundancy compared to those selected using manual selection (median NMI = 0.0199), Complete Linkage (median NMI = 0.1305), Evidence Accumulation K-means (median NMI = 0.04) and Weighted Evidence Accumulation K-means (median NMI = 0.048). The effectiveness of the feature selection method is further investigated using two well-established time-sequence classification algorithms: (a) Nonlinear Auto-Regressive Neural Network with eXogenous inputs and distributed time delays (NARX-TDNN); and (b) Hidden Markov Models (HMM). Weighted average sensitivity and specificity of (a) higher than 99% and 96% for NARX-TDNN, and (b) higher than 98% and 86% for HMM are observed. The proposed feature selection algorithm could potentially be applied to other model-based systems to improve the fault detection performance.
© 2015 Published by Elsevier B.V.
1. Introduction
Heating Ventilation and Air Conditioning (HVAC) systems are important for maintaining thermal comfort and indoor air quality in places such as offices, shopping malls, warehouses, schools, and homes [1,2]. According to a report by CSIRO [3], commercial buildings account for 25% of energy consumption in Australia. Moreover, HVAC systems represent 40–50% of energy use in these buildings [4]. In the United States (US), HVAC systems account for almost 31% of the electricity consumed by households [1]. Operational problems in HVAC systems can cause excess energy consumption. Regular checks and maintenance are therefore crucial to prevent unnecessary consumption. However, due to the high cost of reactionary maintenance, preventive or predictive maintenance practices are usually preferred.
Discriminating a normally behaving HVAC system from a fault condition is a relatively well-researched area. A variety of automatic fault detection and diagnosis (AFDD) techniques provide a number of benefits to HVAC systems [5–7]. The AFDD techniques currently available in the market for HVAC systems are mainly rule-based approaches [8–10], which use prior knowledge to derive a set of if-then-else rules and an inference mechanism that searches through the rule-space to draw conclusions. Rule-based systems can be based solely on expert knowledge (inferred from experience) or on prior knowledge of a specific system. Being one of the very first methods used in HVAC fault detection problems, rule-based approaches have been the most popular over the last decades.
∗ Corresponding author. Tel.: +61 430731938.
E-mail addresses: mitchellyuwono@gmail.com (M. Yuwono), Ying.Guo@csiro.au (Y. Guo), Josh.Wall@csiro.au (J. Wall), Jiaming.Li@csiro.au (J. Li), Sam.West@csiro.au (S. West), Glenn.Platt@csiro.au (G. Platt), Steven.Su@uts.edu.au (S.W. Su).
1568-4946/© 2015 Published by Elsevier B.V.
Rule-based approaches do come with advantages, including ease of development, transparent reasoning, the ability to reason under uncertainty, and the ability to provide explanations for the conclusions reached. However, most HVAC systems are installed in different buildings/environments. This generally means that rules or analytical models developed for a particular system cannot be easily applied to another system. As such, the difficult process of determining and setting rules or generating analytical mathematical models must be tailored to each individual building/environment. The threshold method utilized in rule-based systems is also prone to producing false alarms. Moreover, building conditions such as the internal architectural layout, and even external factors (such as shading and the growth of plant life), often change after a fault detection system is installed and initialized, which can require rules/models that were originally appropriate to be revisited and updated. The weaknesses of this type of approach therefore include the need for system-specific tailoring, potential failure of the AFDD system due to its limited knowledge boundaries, and difficulty in updating the model when the AFDD system is installed in a different HVAC system. These complications with the rule-based approach have motivated data-driven methods for AFDD in HVAC systems.
Regardless of the approach, the performance of an AFDD algorithm generally depends on the quality of the features. At CSIRO, we are developing a novel data-driven machine learning technique for AFDD in HVAC systems [4,11–14]. Preliminary results presented in [11–14] show that, on fault data obtained from ASHRAE Project 1312-RP, the machine learning-based technique outperforms rule-based methods in detecting air-handling unit (AHU) faults, with up to 90% accuracy [13]. However, one limitation of the AFDD systems described in [11–13] is that they rely on features provided by field experts. As with rules, features that are effective for a particular system may not guarantee equivalent performance when utilized in another system.
Selecting the appropriate features is essential in any model-based framework. Feature selection aims to minimize redundancies/mutual information between features such that the more important ‘characteristic’ features are not undermined. Specific faults exhibit specific symptoms which are observable only in certain clusters of features that behave differently from the others. The difficulty is that these clusters of features need to be constantly monitored, as they may change dynamically depending on the condition of the HVAC system under investigation. Moreover, incorrect selection of these characteristic features is dangerous, as it may adversely affect the final classifier to the extent that some obvious faults are overlooked. The motivation of this paper is therefore to design a reliable method for feature selection that can be used to augment the effectiveness of AFDD frameworks in general. The unsupervised data-driven feature selection algorithm is designed for HVAC systems operating under varying seasonal dynamics.
Evolutionary algorithms are particularly powerful for solving complex optimization problems with multiple local minima. For example, Differential Evolution (DE) has been used to optimize pressure vessel structure design [15] and a joint replenishment and distribution model [16]. Although the methods outlined in [15,16] are powerful for general-purpose optimization, major algorithmic restructuring is required to apply them to cluster optimization. Instead, this paper exploits a lightweight evolutionary algorithm designed specifically for clustering purposes, the Rapid Centroid Estimation (RCE) [17].
Unsupervised feature selection based on data clustering is inherently an ill-posed problem, where the goal is to group redundant features into some unknown number of clusters based on intrinsic information alone. In this paper, we utilize the Ensemble Rapid Centroid Estimation (ERCE) [17,18], a semi-stochastic multi-swarm clustering algorithm inspired by Particle Swarm Optimization (PSO) [19], to determine the characteristic features for a specific season. The method is designed to automate the selection of characteristic features in each season. The block diagram of the proposed method is shown in Fig. 1.
The performance of the proposed feature selection algorithm was tested using two well-established time-sequence classifiers: (a) Nonlinear Auto-Regressive Time Delay Neural Networks with eXogenous inputs (NARX-TDNN); and (b) Hidden Markov Models (HMM) [13]. A comprehensive comparison is also given with other feature selection methods, including Li’s manual selection [20], Complete Linkage (CL), Ensemble Evidence Accumulation K-means (EAC K-means) and Weighted Evidence Accumulation K-means (WEAC K-means).
The paper is structured as follows: Section 2 presents an overview of the proposed method as well as the materials used to examine its performance. Section 3 presents a detailed description of each component, including feature extraction, feature selection, and the classifier used in the experiments. Section 4 describes the theoretical foundations of the consensus clustering algorithm that we utilize for performing the feature selection. Section 5 describes the data utilized in the experiments. Section 6 presents comprehensive experimental results for the proposed method and a comparative analysis with other conventional feature selection and classification algorithms. Section 7 presents in-depth analyses and discussion of the results. Finally, Section 8 presents the conclusion and future directions of the research.
2. General overview on HVAC systems
HVAC systems are configured and used to control the environ-
ment of a building or a zone including one or several rooms. The
environmental variables may, for example, include temperature,
air-flow, and humidity. The desired values/set-points of the envi-
ronmental variables will depend on the intended use of the HVAC
system. If the HVAC system is being used in an office building, the
environmental variables will be set to make the building/rooms
therein comfortable to humans. An HVAC system typically services
a number of zones within a building. The system normally includes
a central plant which includes:
• a hydronic heater and chiller,
• a pump system, which may include dedicated heated and chilled water pumps, and which circulates heated and chilled water from the heater and chiller through a circuit of interconnected pipes, and
• a valve system, which may include dedicated heated and chilled water valves, and which controls the flow of water into a heat exchange system (which may include dedicated heated and chilled water coils).
The heated and/or chilled water circulates through the heat
exchange system before being returned to the central plant where
the process repeats (i.e. the water is heated or chilled and recircu-
lated). In the heat exchange system, energy from the heated/chilled
water is exchanged with air being circulated through an air distri-
bution system.
The HVAC system also includes a sensing system, which typically includes a number of sensors located throughout the system, such as temperature, humidity, air velocity, volumetric flow, pressure, gas, position, and occupancy detection sensors. The HVAC system is controlled by a control system that may be a stand-alone system, or may form part of a building automation system (BAS) or building management and control system (BMCS). The control system includes a computing system which is in communication with the various components of the HVAC system. The control system controls and/or receives feedback from the various components of the HVAC system in order to regulate environmental conditions for the occupancy or functional purpose of the building.
Fig. 1. Block diagram of the proposed method.
In an AFDD process, data from the components of the HVAC
system is received. This data may, for example, include sensed data
from various sensors within the system and feedback data from
various components of the system. Additional data from external
data sources can also be received, such as the external weather
data. Consequently, the dimensionality and volume of these data
are enormous.
In order to ensure proper identification of faults, an AFDD algorithm requires redundancies in the selected sensory and control signal sources to be minimized. Redundant features provide no additional useful information for describing the type of fault and will ultimately cripple the generalization capability of the fault detector. Insufficient features are equally dangerous, as they may lead to misdiagnoses due to incomplete information.
The method presented in this paper offers an unsupervised approach to feature selection using ERCE. The system is summarized in the block diagram in Fig. 1. A sample feature extraction and feature selection result using our proposed approach can be seen in Fig. 2.
The experimental materials in this paper are the fault data from the ASHRAE Project 1312-RP datasets, covering Summer 2007, Spring 2008, and Winter 2008. In each season, different faults were generated, recorded and reported for experimental use.
3. Methods
Selecting important features in a HVAC system is challenging
due to the excessive interrelations between signals. This section
overviews our contribution on feature selection using consensus
clustering and how it is applied for the HVAC system in particular.
The section is subdivided into five subsections:
• Section 3.1 outlines the general model that we use for extracting
magnitude and oscillation (spectral centroid) features from a raw
signal.
• Section 3.2 outlines our proposed polar approach for visualizing
multi-dimensional patterns.
• Section 3.3 defines the measure that we use for quantifying the
degree of dissimilarity between features.
• Section 3.4 provides the general overview of our main contri-
bution, a method for feature selection using semi-stochastic
swarm-based consensus clustering, which will be further
detailed in Section 4.
• Section 3.5 shows the architecture of the neural networks that we
use to benchmark the efficiency of the proposed feature selection
method.
3.1. Extracting time signal features: magnitude and spectral centroid
Sensory signals from an HVAC system are streamed in the form of sampled time signals. From each time signal, HVAC engineers mainly observe two features when deciding the condition of the system:
1. Whether the average magnitude of a sensory reading is inside
the typical condition for the specific season.
2. Whether there is any excessive oscillation in the sensory read-
ings compared to the typical condition for the specific season.
For example, a fault type classified as Sequence of Heating and
Cooling Unstable (HCSF0517) can be identified by observing the
excessive oscillation of the Chilled Water Coil control signal (CHWC
GPM). The phenomenon can be seen in Fig. 3. In this Figure, it is easy
to observe that the moving average magnitude of the CHWC GPM
during HCSF0517 is considerably close to the typical behavior.
We model these two features mathematically as the moving average magnitude and spectral centroid. For a discrete signal $g_s(n)$, the two features can be measured using a straightforward calculation as follows.
Magnitude characteristic is measured using a simple moving average which is calculated as follows,
$$\mathrm{MAG}(g_s) = \frac{1}{N} \sum_{n=1}^{N} g_s(n), \qquad (1)$$
where $n$ denotes the sample number, $N$ denotes the length of the window.
Spectral centroid of a signal describes the center of mass of the spectrum, which can be calculated as follows,
$$\hat{g}_s = \mathrm{FFT}(g_s, N_{\mathrm{FFT}}), \qquad (2)$$
Fig. 2. (a) Raw signals for the Spring 2008 dataset; (b) the low and high frequency features are isolated from each signal. Signals 1–160 are moving average magnitude signals
while signals 161–320 are spectral centroid signals; (c) characteristic features are selected using ERCE, while (d) classification is done using NARX-TDNN.
$$\mathrm{SC}(g_s) = \frac{\sum_{n=5}^{N_{\mathrm{FFT}}} |\hat{g}_s(n)|\,\hat{g}_s(n)}{\sum_{n=5}^{N_{\mathrm{FFT}}} |\hat{g}_s(n)|}, \qquad (3)$$
where FFT denotes the fast Fourier transform, $N_{\mathrm{FFT}}$ indicates the number of bins, and $\hat{g}_s(n)$ and $|\hat{g}_s(n)|$ represent the center frequency and magnitude of the $n$th bin. Notice that the spectral centroid is calculated from the fifth bin upward to isolate only the high-frequency oscillation.
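The two features in Eqs. (1)–(3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a naive discrete Fourier transform stands in for the FFT, bins are numbered from 1 as in the text, only the non-redundant half of the spectrum is used, and the center frequency of a bin is expressed in cycles per window. The function names and the `first_bin` parameter are hypothetical and do not come from the paper.

```python
import cmath
import math


def moving_average_magnitude(window):
    """Eq. (1): mean of the samples in one analysis window."""
    return sum(window) / len(window)


def spectral_centroid(window, first_bin=5):
    """Eq. (3): magnitude-weighted mean bin frequency from `first_bin` upward.

    Assumptions: naive DFT instead of an FFT, 1-based bin numbering,
    half spectrum only, frequency measured in cycles per window.
    """
    n_samples = len(window)
    half = n_samples // 2
    weighted, total = 0.0, 0.0
    for bin_number in range(first_bin, half + 1):
        k = bin_number - 1  # zero-based DFT index, i.e. k cycles per window
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n_samples)
            for t, x in enumerate(window)
        )
        weighted += abs(coeff) * k
        total += abs(coeff)
    # Guard (an assumption): treat a numerically empty band as "no oscillation".
    if total < 1e-9:
        return 0.0
    return weighted / total
```

For a window that contains a pure sinusoid, the centroid returned by this sketch sits at the sinusoid's frequency, while a constant signal yields a centroid of zero because nothing remains above the excluded low bins.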
A fault can be interpreted as ‘how much a signal deviates from its typical characteristic during the specific season’. Incorporating this criterion, each feature vector $q_s$, which includes {MAG($g_s$), SC($g_s$)}, is normalized with respect to its normal operation. The discrepancy in both direction and magnitude relative to the normal signal is represented as a signed multiple of the signal’s standard deviation during typical operation,
$$z_s(n) = \frac{q_s(n) - \mu_n(n)}{\sigma_n(n)}, \qquad (4)$$
where $\mu_n(n)$ and $\sigma_n(n)$ denote the mean and standard deviation of a feature during its normal operation at a specific sample $n$ taken at a particular time of the day. In other words, the approach simply calculates the cross-sectional z-score of the feature $q_s$.
The hyperbolic tangent kernel is then applied to the z-score, effectively transforming each feature to a continuous measure in the interval (−1, 1) as follows,
$$y_s(n) = \tanh(z_s(n)), \qquad (5)$$
which has a rather intuitive ‘fuzzy’ interpretation as follows:
(a) $y_s(n) = 0$: feature is at a typical level,
(b) $y_s(n) \to -1$: feature is atypical negative (much smaller than its typical level),
(c) $y_s(n) \to 1$: feature is atypical positive (much larger than its typical level).
Intuitively, the variability of $y_s$ throughout the season provides a good indicator of its importance. In this paper, we measure the variability of a feature in terms of its entropy as follows,
$$H_{y_s} = -\int p_{y_s}(x) \log p_{y_s}(x)\,dx, \qquad (6)$$
where $p_{y_s}(x)$ can be approximated empirically from the histogram of $y_s$.
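The normalization in Eqs. (4)–(5) and the entropy in Eq. (6) can be sketched as follows. The sketch assumes that the integral is approximated by the Shannon entropy of a fixed-width histogram on [−1, 1]; the bin count of 20 is an arbitrary choice for illustration, since the text does not state one, and the function names are hypothetical.

```python
import math


def tanh_zscore(q, mu_normal, sigma_normal):
    """Eqs. (4)-(5): z-score against normal operation, squashed by tanh."""
    z = (q - mu_normal) / sigma_normal
    return math.tanh(z)


def histogram_entropy(values, n_bins=20):
    """Eq. (6), approximated by the Shannon entropy of a histogram on [-1, 1].

    The number of bins is an assumption made for this sketch.
    """
    counts = [0] * n_bins
    for y in values:
        # Map y in [-1, 1] to a bin index, clamping the right edge.
        index = min(int((y + 1.0) / 2.0 * n_bins), n_bins - 1)
        counts[index] += 1
    total = len(values)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)
```

A feature that stays at its typical level all season falls into a single bin and has zero entropy under this approximation, whereas a feature that visits many levels receives a higher score.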
3.2. Feature visualization
Visualization is an important tool to verify the effectiveness of a
feature selection algorithm. However, due to the complexity of an
HVAC system, simultaneous visualization would easily overwhelm
the observer.
In this paper, a polar approach for visualizing patterns constituted by multi-dimensional feature cross-sections is proposed. The visualization scheme can be seen in Fig. 4.
In the proposed visualization scheme, each variable number is placed at a particular angle on the circle, and the corresponding radius represents the magnitude of $y_s$, as previously detailed in Eq. (5). A normal system would oscillate inside the typical region ($y_s$ = 0) such that the polar plot shows a circle-like pattern. During a fault condition, the sensors operate in either the positive or the negative atypical region, such that the polar plot assumes shapes other than a circle. For example, Fig. 5 shows that the pattern during normal operation is visually different from that of the OA Damper Stuck (OADS) fault scenario.
3.3. Measuring divergence between features
A pair of feature vectors $y_1 \in Y$ and $y_2 \in Y$ calculated from Eq. (5) can be treated as vectors of random numbers generated by the probability distribution functions $P = p(x)$ and $Q = q(x)$, respectively. $y_1$ and $y_2$ can be assumed to be redundant (i.e. generated from the same distribution) when the Kullback–Leibler (KL) divergence between the two approaches zero [21]. A practical illustration of the case can be seen in Fig. 6.
Fig. 3. The magnitude (top) and frequency (bottom) characteristics of the Chilled Water Control signal (CHWC GPM) during fault (HCSF0517) vs. normal (NOR0505). Even though CHWC GPM during HCSF0517 is correlated in terms of magnitude characteristic, the signal is uncorrelated in terms of frequency characteristic.
KL-divergence measures the relative entropy between two distributions [21], that is, the amount of information lost when $Q$ is used to approximate $P$, as follows,
$$\mathrm{KL}(P\|Q) = \underbrace{-\sum_{x} p(x) \log q(x)}_{H(P,Q)} + \underbrace{\sum_{x} p(x) \log p(x)}_{-H(P)}, \qquad (7)$$
$$= \sum_{x} p(x) \log \frac{p(x)}{q(x)}, \qquad (8)$$
where $H(P, Q)$ denotes the cross entropy between $P$ and $Q$ and $H(P)$ denotes the information entropy of $P$. In this paper we use the symmetrical KL-divergence as originally proposed in [21] due to its symmetrical property as follows,
$$\mathrm{KL}_s(P\|Q) = \mathrm{KL}(P\|Q) + \mathrm{KL}(Q\|P) = \sum_{x} \left[ p(x) \log \frac{p(x)}{q(x)} - q(x) \log \frac{p(x)}{q(x)} \right]. \qquad (9)$$
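As an illustration of Eq. (9), the symmetrical divergence between two histograms can be computed directly from the combined sum of $(p(x) - q(x)) \log(p(x)/q(x))$. The sketch below assumes both distributions are given over the same bins; the small floor `eps` for empty bins is an assumption added here to keep the logarithm finite and is not part of the paper's definition.

```python
import math


def symmetric_kl(p, q, eps=1e-12):
    """Eq. (9): sum over x of (p(x) - q(x)) * log(p(x) / q(x)).

    `p` and `q` are discrete distributions over the same bins. Empty bins
    are floored at `eps` (an assumption) and both inputs are renormalised.
    """
    p = [max(v, eps) for v in p]
    q = [max(v, eps) for v in q]
    p_sum, q_sum = sum(p), sum(q)
    p = [v / p_sum for v in p]
    q = [v / q_sum for v in q]
    return sum((a - b) * math.log(a / b) for a, b in zip(p, q))
```

Identical histograms give a divergence of zero, which corresponds to the redundancy criterion described above, and swapping the arguments leaves the value unchanged.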
3.4. Feature selection using consensus clustering
Performing feature selection using prototype-based algorithms such as K-means, fuzzy C-means, or Self Organizing Map (SOM) can be difficult because the number of characteristic features K is not initially known. Consensus clustering provides quantitative evidence for determining the number and membership of possible clusters within a dataset (in our case, features). The method has gained popularity in cancer genomics as a powerful tool to extract and visualize the dependencies between genes [22–24].
In this paper we propose an approach for unsupervised feature selection using a swarm-based ensemble algorithm [18]. An advantage of ensemble clustering algorithms over conventional clustering algorithms is that they allow a robust estimation of natural clusters by investigating the consensus strength between multiple clusterings [22,25,26]. Consensus clustering is particularly powerful for identifying strong clusters in the data [22]. This is useful for our application: as can be seen in Section 6, the features selected using consensus clustering algorithms are generally more compact and less redundant than the ones selected using complete linkage.
Fig. 4. The proposed polar visualization scheme. In this illustration, we can see that features other than features #4 and #5 behave atypically.
The feature selection process can be summarized as follows:
1. Determine the feature clusters using consensus clustering.
2. For each cluster, rank the features according to their entropy and pick the one with the highest entropy as the characteristic feature for the cluster.
A sample result of a run of the feature selection process using consensus clustering is shown in Fig. 7. Features in the same cluster are denoted using the same color. The radius of each feature indicates its entropy. The bold circle in each cluster marks the chosen characteristic feature, which is the feature with the highest entropy in that cluster.
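The second step of the selection process can be sketched as below, assuming that a consensus clustering run has already produced one cluster label per feature and that the entropy of each feature is available from Eq. (6). The function name and the list-based inputs are hypothetical choices made for this illustration.

```python
def select_characteristic_features(cluster_labels, entropies):
    """Return {cluster label: index of its highest-entropy feature}.

    `cluster_labels[i]` is the consensus cluster of feature i and
    `entropies[i]` is its entropy. Both containers are hypothetical.
    """
    best = {}
    for index, (label, entropy) in enumerate(zip(cluster_labels, entropies)):
        if label not in best or entropy > entropies[best[label]]:
            best[label] = index
    return best
```

With labels `[0, 0, 1, 1, 1]` and entropies `[0.2, 0.7, 0.1, 0.9, 0.4]`, the sketch keeps feature 1 for cluster 0 and feature 3 for cluster 1, so one representative survives per group of redundant features.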
3.5. Fault classification using Nonlinear Auto-Regressive Neural Network with eXogenous inputs and distributed time delays (NARX-TDNN)
The Nonlinear Auto-Regressive with eXogenous inputs (NARX) network architecture [27] is a class of discrete-time nonlinear systems. The NARX architecture can be broadly expressed in the parallel mode,
$$\hat{y}(t) = f\big(u(t - n_u), \ldots, u(t - 1), u(t), \hat{y}(t - n_y), \ldots, \hat{y}(t - 1)\big), \qquad (10)$$
or in the series-parallel mode,
$$\hat{y}(t) = f\big(u(t - n_u), \ldots, u(t - 1), u(t), y(t - n_y), \ldots, y(t - 1)\big), \qquad (11)$$
where $u(t)$, $y(t)$ and $\hat{y}(t)$ denote the input, actual output and estimated output of the network at time $t$, $n_u$ and $n_y$ are the input and output orders, and $f$ denotes a nonlinear function, which can be approximated using a Multilayer Perceptron (MLP). As opposed to a conventional Recurrent Neural Network (RNN), a NARX network’s feedback comes only from the output neurons rather than from its hidden states. Using this simplified configuration, it has been argued that NARX networks generalize better than other RNNs, especially on problems involving long-term dependencies [28].
Fig. 5. The proposed polar visualization scheme showing the characteristic signals in normal operation scenarios (left) and in OADS scenario (right) in the Winter 2008 dataset.
Fig. 6. A simplified case of redundancy between features in a HVAC system. How many clusters are there? It can be seen that the divergence between the $y_{\mathrm{CHWC-VLV}}$ and $y_{\mathrm{CHWC-GPM}}$ distributions is intuitively smaller than the divergence between $y_{\mathrm{CHWC-VLV}}$ and $y_{\mathrm{SA-HUMD}}$. If these four signals were to be clustered, then a possible solution would be to assign them into two clusters, i.e. $\{\{y_{\mathrm{CHWC-VLV}}, y_{\mathrm{CHWC-GPM}}\}, \{y_{\mathrm{SA-HUMD}}, y_{\mathrm{RA-HUMD}}\}\}$.
The configurations described in Eqs. (10) and (11) differ only in their mode of feedback. The configuration described in Eq. (10) is referred to as parallel mode or recurrent NARX (NARX-P), while Eq. (11) is referred to as series-parallel mode NARX (NARX-SP) [29]. NARX-P uses the state estimate as feedback, while NARX-SP uses the actual observable state. Because the actual state of an HVAC system is practically unavailable at all times, the deployment of NARX in an AFDD system is currently limited to the NARX-P configuration.
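The difference between the two feedback modes can be made concrete with a short sketch. It assumes a generic callable `f(inputs, feedback)` in place of the trained MLP, and the names `run_narx`, `y_initial` and `observed` are hypothetical; the sketch only illustrates which values are fed back in Eqs. (10) and (11), not the network used in the paper.

```python
def run_narx(f, u, y_initial, n_u, n_y, observed=None):
    """Iterate Eq. (10) (parallel) or Eq. (11) (series-parallel).

    `f(inputs, feedback)` is any nonlinear map (hypothetical here).
    `y_initial` holds the n_y output values needed to start the feedback.
    With `observed=None` the estimates are fed back (NARX-P); otherwise
    the observed outputs `observed[t]` are fed back (NARX-SP).
    """
    estimates = []
    feedback = list(y_initial)
    for t in range(n_u, len(u)):
        inputs = u[t - n_u : t + 1]         # u(t - n_u), ..., u(t)
        y_hat = f(inputs, feedback[-n_y:])  # delayed outputs, oldest first
        estimates.append(y_hat)
        feedback.append(y_hat if observed is None else observed[t])
    return estimates
```

Calling the function without `observed` reproduces the NARX-P behaviour described above, where the network must rely on its own past estimates because the true state is not available at run time.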
4. Consensus clustering
This section explains, in great detail, the semi-stochastic swarm-
based consensus clustering approach to feature selection in a HVAC
system. The section is subdivided into six subsections:
• Section 4.1 briefly introduces the consensus clustering paradigm,
• Section 4.2 presents the visual abstract of our proposed feature
selection method,
• Section 4.3 overviews Fred and Jain’s Evidence Accumulation [25],
• Section 4.4 summarizes our previous work on Swarm Rapid Cen-
troid Estimation (SRCE) [17],
• Section 4.5 introduces the newly proposed ‘self-evolution’ strat-
egy for the SRCE,
• Section 4.6 outlines the new implementation of ERCE for feature
selection purposes.
4.1. Fundamentals of consensus clustering
Consensus clustering infers a consensus matrix from multiple runs of clustering algorithms. This consensus matrix encodes the probability of each pair of observations belonging to the same cluster. It has been argued that the natural, and arguably optimum, clusters can be validated with higher confidence by analyzing the stability of this matrix [22,25].
The consensus matrix $C$ is a positive semidefinite $N \times N$ square matrix of joint probabilities. Each $C_{ij} \in [0, 1]$ represents the probability of data points $i$ and $j$ belonging to the same cluster. For a given cluster assignment obtained from the $m$th clustering, we can calculate the $m$th co-association matrix as follows,
$$C_m = U_m^{T} U_m, \qquad (12)$$
where each $U_m$ is a $K_m \times N$ matrix which stores the values of $u_{ik,m}$ for $i \in \{1, \ldots, N\}$ and $k \in \{1, \ldots, K_m\}$ obtained from the $m$th run of any clustering algorithm. Each $u_{ik,m}$ denotes the probability of a data point $y_i$ belonging to the cluster $C_k$. For any $m$, $U_m$ should satisfy the constraints $u_{ik,m} \in [0, 1]$ and $\sum_{k=1}^{K_m} u_{ik,m} = 1$. The matrix multiplication represents a probabilistic ‘and’ operator conveniently calculated using the (multiplicative) fuzzy T-norm [30]. The $i$th diagonal component of $C_m$, i.e. $D_{ii,m}$, quantifies the degree of stability for the $i$th data point in the $m$th clustering. In this paper we propose normalizing $C_m$ by its diagonal matrix $D_m$ as follows,
$$C_m \leftarrow D_m^{-1/2} C_m D_m^{-1/2}. \qquad (13)$$
Fig. 7. A result of feature selection using ERCE (Algorithm 4, Section 4) on the Spring 2008 dataset, projected on the first and second principal components for ease of visualization. Each point represents a feature where the radius denotes the corresponding entropy. Each feature cluster is color coded and the characteristic feature of each cluster is annotated accordingly. In this example, ERCE chose 16 characteristic features from the 320 features (160 magnitude features and 160 spectral centroid features). It can be seen that the spectral centroid feature for CHWC-GPM (SC CHWC-GPM) is selected, in line with the observation in Fig. 3. ERCE accurately discovered that Return Fan (RF) and Supply Fan (SF) features are particularly important. This discovery is in line with the existence of Return Fan Failure (RFF) faults (May 12th, 18th, and 19th) observed during the season. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
Fig. 8. An illustration describing the architecture of the Parallel Nonlinear Auto-Regressive Time Delay Neural Networks with eXogenous input (NARX-TDNN).
Fig. 9. Various partitions on the Spring 2008 dataset encoded by 16 subswarms of the Self Evolving Swarm Rapid Centroid Estimation (SE-SRCE, Algorithm 3). Fuzzifier constant is set to 1.2, target entropies are uniformly randomized between 0.005 and 0.05. The coordinates are projected to the first and second principal components for ease of visualization. In depth explanation regarding the method can be read in Section 4.4 and Section 4.5.
The consensus C, or ensemble aggregate, is calculated as the
weighted average of the co-association matrices C1, C2, . . ., CM as
follows,
C =
M
m=1
wmCm
M
m=1
wm
, (14)
where $w_m$ denotes the weight of the corresponding partition, which can be determined manually or using any cluster validation method [31]. Equal weighting can also be assumed by setting $w_m = 1$ for all m [25].
The consensus distance matrix can be defined as follows [22],
$$D = 1 - C, \tag{15}$$
which transforms the consensus matrix into a pairwise distance matrix. Fred and Jain [25] propose applying a single, average, or complete linkage algorithm to the D matrix to recover the natural clusters. Their 2005 paper also proposes a criterion called maximum lifetime to determine the optimum threshold for cutting the cluster dendrogram [25]. Readers are encouraged to refer to [25] for more details.
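The chain from Eq. (12) to Eq. (15) can be illustrated with a minimal sketch. It uses plain Python lists to stay dependency-free; the function names and the two toy membership matrices are illustrative assumptions, not part of the original method description.

```python
def co_association(U):
    """Eq. (12): C = U^T U for a K x N membership matrix U (list of K rows)."""
    n = len(U[0])
    return [[sum(row[i] * row[j] for row in U) for j in range(n)]
            for i in range(n)]

def normalize(C):
    """Eq. (13): scale C by D^(-1/2) on both sides, where D = diag(C)."""
    d = [C[i][i] ** 0.5 for i in range(len(C))]
    return [[C[i][j] / (d[i] * d[j]) for j in range(len(C))]
            for i in range(len(C))]

def consensus(memberships, weights=None):
    """Eq. (14): weighted average of the normalized co-association matrices."""
    weights = weights or [1.0] * len(memberships)   # equal weighting by default
    mats = [normalize(co_association(U)) for U in memberships]
    n, total = len(mats[0]), sum(weights)
    return [[sum(w * M[i][j] for w, M in zip(weights, mats)) / total
             for j in range(n)] for i in range(n)]

def consensus_distance(C):
    """Eq. (15): D = 1 - C."""
    return [[1.0 - c for c in row] for row in C]

# Two hypothetical partitions of N = 4 points (columns); rows are clusters.
U1 = [[1, 1, 0, 0],
      [0, 0, 1, 1]]               # crisp partition, K = 2
U2 = [[0.9, 0.8, 0.1, 0.5],
      [0.1, 0.2, 0.9, 0.5]]       # fuzzy partition, K = 2
C = consensus([U1, U2])
D = consensus_distance(C)
```

In this toy case points 0 and 1 share a cluster in both partitions, so their consensus value is close to 1 and their consensus distance is close to 0.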
4.2. Visual abstract: feature selection using ERCE
A visual abstract of the proposed swarm-based consensus
clustering algorithm can be seen in Figs. 9 and 10. Fig. 10
presents the consensus matrix and hierarchical cluster tree (clus-
ter dendrogram) from the aggregation of the partitions shown in
Fig. 9.
4.3. Evidence accumulation
Fred and Jain proposed Evidence Accumulation (EAC) in 2005 as a consensus clustering framework for combining the results of multiple runs of a crisp prototype-based clustering algorithm (e.g. K-means) [25]. Wang proposes a generalization of the algorithm, extending the applicability of EAC to both crisp and fuzzy clusters [30]. He finds that fuzzy partitions are advantageous over crisp partitions in evidence accumulation, as the degree of overlap in a fuzzy partition encodes to an extent how 'close' together clusters are [30]. The approach can be summarized as a two-step process as follows,
1. Split: Partition the data matrix Y into some number of parti-
tions Km (may be fixed or randomized within an interval) using
any prototype-based clustering algorithm. Repeat this step M
times.
Fig. 10. A heat map presenting the consensus matrix resulting from the aggregation of the SE-SRCE swarm shown in Fig. 9 using Algorithm 4 (Section 4.6). The rows and columns indicate individual items (in our case: the 320 features) whose consensus values range from 0 (never clustered together) to 1 (always clustered together), marked by white to dark blue. The complete linkage cluster dendrogram showing the degree of redundancy between features is shown above the consensus matrix. Between the cluster dendrogram and the consensus matrix is the cluster label vector suggested by the maximum lifetime cut. The output of the consensus clustering is as shown in Fig. 7. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
2. Merge: Calculate the consensus matrix C and interpret the
ensemble clustering by performing a desired graph algo-
rithm.
Given the data vectors $y_i \in Y$, for each clustering m, $K_m$ centroid vectors $x_{k,m} \in X_m$ can be obtained using any prototype-based clustering algorithm (e.g. K-means, fuzzy C-means, Gaussian Mixture Models). The degree of membership of $y_i$ with respect to $x_{k,m}$ is a function of distance, calculated for a crisp partition as
$$u_{ik,m} = \begin{cases} 1 & \text{if } k = \arg\min_{j} d(y_i, x_{j,m}), \\ 0 & \text{otherwise}, \end{cases} \qquad u_{ik,m} \in \{0, 1\}, \tag{16}$$
and for a fuzzy partition as
$$u_{ik,m} = \frac{d(y_i, x_{k,m})^{-1/(\gamma-1)}}{\sum_{j=1}^{K_m} d(y_i, x_{j,m})^{-1/(\gamma-1)}}, \qquad \gamma > 1, \quad u_{ik,m} \in [0, 1], \tag{17}$$
where $\gamma$ denotes the fuzzifier constant.
Wang argues that using fuzzy partitions in consensus clustering is particularly efficient for suppressing over-segmentation. It is also more tolerant to noisy information than its crisp counterpart [30].
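The two membership rules of Eqs. (16) and (17) can be sketched for a single data point as follows. This is a minimal illustration under stated assumptions: the Euclidean distance is used as a placeholder for d, the fuzzifier is written `gamma`, and the handling of a point that coincides with a centroid is an added convention that the text does not specify.

```python
import math

def euclidean(a, b):
    """Placeholder distance d; any other metric could be passed instead."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def crisp_membership(y, centroids, dist=euclidean):
    """Eq. (16): 1 for the nearest centroid, 0 for all others."""
    d = [dist(y, x) for x in centroids]
    nearest = d.index(min(d))
    return [1.0 if k == nearest else 0.0 for k in range(len(centroids))]

def fuzzy_membership(y, centroids, gamma=1.2, dist=euclidean):
    """Eq. (17): memberships proportional to d^(-1/(gamma - 1)), gamma > 1."""
    d = [dist(y, x) for x in centroids]
    if min(d) == 0.0:
        # Assumption: a point sitting exactly on a centroid belongs to it fully.
        hits = [1.0 if v == 0.0 else 0.0 for v in d]
        return [h / sum(hits) for h in hits]
    p = [v ** (-1.0 / (gamma - 1.0)) for v in d]
    total = sum(p)
    return [v / total for v in p]

centroids = [[0.0, 0.0], [4.0, 0.0]]
u_crisp = crisp_membership([1.0, 0.0], centroids)
u_fuzzy = fuzzy_membership([1.0, 0.0], centroids, gamma=1.2)
```

With a fuzzifier close to 1 (here 1.2) the fuzzy memberships are nearly crisp; larger values spread the membership more evenly across centroids.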
The conventional approaches using Evidence Accumulation (EAC) [25] and Weighted Evidence Accumulation (WEAC) [31] are summarized in Algorithm 1. Note that the pseudocode is simplified using the fuzzy t-norm approach to EAC introduced in [30].
Algorithm 1. (Weighted) Ensemble Clustering ((W)EAC Clustering)
Input: dim × N data matrix Y, maximum number of prototypes $K_{max}$, number of repetitions M, prototype-based clustering algorithm Cluster (e.g. K-means, Fuzzy C-means), linkage algorithm Linkage.
Output: Crisp ensemble partition L.
 1: for m ∈ {1, . . ., M} do
 2:     // Partition Y using a random number of clusters.
 3:     $K_{rnd}$ ← random({2, $K_{max}$})
 4:     {$U_m$, $X_m$} ← Cluster(Y, $K_{rnd}$)
 5:     // Calculate the co-association matrix for each clustering.
 6:     $C_m \leftarrow U_m^{T} U_m$
 7:     $C_m \leftarrow D_m^{-1/2} C_m D_m^{-1/2}$
 8: end for
 9: // Calculate the consensus matrix.
10: $C \leftarrow \sum_{m=1}^{M} w_m C_m \,/\, \sum_{m=1}^{M} w_m$
11: // Interpret the consensus matrix using the Linkage algorithm.
12: HierarchicalTree ← Linkage(1 − C)    // consensus distance, Eq. (15)
13: th ← MaximumLifetime(HierarchicalTree)
14: L ← Cut(HierarchicalTree, th)
Note that the threshold for cutting the hierarchical tree is determined using the maximum lifetime method [25].
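The interpretation step of Algorithm 1 (lines 11 to 14) can be sketched as below. This is a simplified, dependency-free illustration rather than the authors' implementation: it assumes a naive O(N^3) complete linkage on a consensus distance matrix D, and it takes the lifetime of a partition to be the gap between the dendrogram height at which it is formed and the height at which it is next merged. The all-in-one partition is excluded because its lifetime is unbounded.

```python
def complete_linkage(D):
    """Return [(height, partition), ...] from N singletons down to one cluster."""
    clusters = [[i] for i in range(len(D))]
    history = [(0.0, [c[:] for c in clusters])]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: distance between the farthest pair of members.
                h = max(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or h < best[0]:
                    best = (h, a, b)
        h, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        history.append((h, [c[:] for c in clusters]))
    return history

def maximum_lifetime_cut(history):
    """Pick the partition that survives the widest range of cut thresholds."""
    best_life, best_partition = -1.0, None
    for t in range(len(history) - 1):           # skip the final one-cluster state
        life = history[t + 1][0] - history[t][0]
        if life > best_life:
            best_life, best_partition = life, history[t][1]
    return best_partition

# Hypothetical consensus distances: items {0, 1} and {2, 3} are clearly grouped.
D = [[0.0, 0.1, 0.9, 0.9],
     [0.1, 0.0, 0.9, 0.9],
     [0.9, 0.9, 0.0, 0.1],
     [0.9, 0.9, 0.1, 0.0]]
partition = maximum_lifetime_cut(complete_linkage(D))
```

For larger problems a library routine for hierarchical clustering would normally replace the naive loop; the sketch only makes the lifetime criterion explicit.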
4.4. Swarm Rapid Centroid Estimation
Yuwono [17] proposed the Swarm Rapid Centroid Estimation (Swarm RCE$^{r+}$) algorithm in 2011 [32]. This semi-stochastic clustering algorithm incorporates the paradigms of Particle Swarm Optimization (PSO [19]) into traditional Expectation Maximization (EM). Statistical validation on benchmark data suggests that Swarm RCE$^{r+}$ has a reduced risk of converging to local minima and a lower computational complexity than earlier evolutionary-algorithm-based clustering approaches [17]. The algorithm was updated in 2014 to further decrease its memory complexity for use in ensemble clustering applications [18]. The RCE algorithm below follows the 2014 formulation.
A particle in an RCE subswarm stores a tuple consisting of a position vector x and a velocity vector v,
$$\text{particle}_{k,m} = \{x_{k,m}, v_{k,m}\}. \tag{18}$$
The position vector of each particle represents the coordinates of a centroid vector $x_{k,m} \in \mathbb{R}^{dim}$. In RCE, a subswarm is a collection of centroid coordinates encoding a possible solution to the clustering problem. As the RCE swarm consists of M such subswarms, as many as M clustering solutions can be obtained at the end of optimization.
Each subswarm stores two memory matrices:
1. The self-organizing memory $Y_m$, which is an array of randomly sampled pointers to the data Y,
$$Y_m = \text{randsample}(Y, \eta\%), \tag{19}$$
where $\eta\% \in [0, 1]$ denotes the rate of random sampling.
2. The best position memory $X_m^{best}$, which stores the position vectors $X = \{x_1, \ldots, x_{K_m}\}$ that minimize a given objective function $f(Y_m, X_m)$ throughout the search. A typical objective function is, but is not restricted to, the average distortion,
$$f(Y_m, X_m) = \sum_{x_k \in X_m} \frac{\sum_{y_i \in Y_m} u_{ik,m}\, d(x_k, y_i)}{\sum_{y_i \in Y_m} u_{ik,m}}, \tag{20}$$
where $u_{ik,m}$ can be calculated using either Eq. (16) or Eq. (17).
The RCE swarm $X^{best}$ matrix is the union of all $X_m^{best}$ such that,
$$X^{best} = \bigcup_{m=1}^{M} X_m^{best}. \tag{21}$$
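The two subswarm memories can be made concrete with a short sketch of the random sampling in Eq. (19) and the average distortion in Eq. (20). The function names, the convention that an empty cluster contributes nothing to the sum, and the one-dimensional toy data are assumptions made for illustration only.

```python
import random

def sample_memory(n_points, eta, rng=random):
    """Eq. (19): indices (pointers) of a random fraction eta of the data."""
    size = max(1, round(eta * n_points))
    return rng.sample(range(n_points), size)

def average_distortion(Y, X, U, dist):
    """Eq. (20): sum over centroids of the membership-weighted mean distance.

    Y: list of data points, X: list of centroids,
    U: K x |Y| membership matrix (list of K rows).
    """
    total = 0.0
    for k, x in enumerate(X):
        weight = sum(U[k])
        if weight == 0.0:
            continue      # assumption: an empty cluster adds no distortion
        total += sum(U[k][i] * dist(x, y) for i, y in enumerate(Y)) / weight
    return total

# Toy one-dimensional example with two well-separated groups.
Y = [0.0, 1.0, 10.0, 11.0]
X = [0.5, 10.5]
U = [[1, 1, 0, 0],
     [0, 0, 1, 1]]
f = average_distortion(Y, X, U, dist=lambda a, b: abs(a - b))
pointers = sample_memory(100, 0.25, random.Random(0))
```

Each centroid here sits 0.5 away from both of its members, so the two per-cluster means add up to a distortion of 1.0.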
Fig. 11. Trajectory of the Swarm RCE particles recorded after 30 iterations on a toy dataset with numerous random seedings, showing the robustness of Swarm RCE and its insensitivity to initialization. M = 6, $t_{max}$ = 30, ε = 0.05, $\delta_{reset}$ = 15.
On each iteration, the velocity and position of a particle are updated as follows,
$$v_{k,m}(t+1) = v_{k,m}(t) + \boldsymbol{\epsilon}_{k,m}(t), \tag{22}$$
$$x_{k,m}(t+1) = x_{k,m}(t) + v_{k,m}(t+1), \tag{23}$$
where $\boldsymbol{\epsilon}$ denotes the resultant vector, which consists mainly of the self-organizing term and the minimum (best position) term,
$$\begin{aligned}
\boldsymbol{\epsilon}_{k,m}(t) &= \varphi_1 \circ \underbrace{\frac{\sum_{i=1}^{|Y_m|} u_{ik,m}\,\bigl(y_i - x_{k,m}(t)\bigr)}{\sum_{i=1}^{|Y_m|} u_{ik,m}}}_{\text{self organizing}} + \varphi_2 \circ \underbrace{\left(\frac{\sum_{j=1}^{|X^{best}|} q_{jk,m}\,\bigl(x_j^{best}(t) - x_{k,m}(t)\bigr)}{\sum_{j=1}^{|X^{best}|} q_{jk,m}}\right)}_{\text{minimum (best position)}} \\
&= \varphi_1 \circ \bigl(E[Y_m \mid X_m = x_{k,m}] - x_{k,m}\bigr) + \varphi_2 \circ \bigl(E[X^{best} \mid X_m = x_{k,m}] - x_{k,m}\bigr),
\end{aligned} \tag{24}$$
where $\varphi \in [0, 1]^{dim}$ denotes a uniform random vector; $u_{ik,m}$ denotes the cluster membership when $Y_m$ is mapped to $X_m$; while $q_{jk,m}$ denotes the cluster membership when $X^{best}$ is mapped to $X_m$. Should the self-organizing vector of a particle equal 0, $x_{k,m}$ will be directed to $x_{I_{win},m}$, the position of the winning particle. $x_{I_{win},m}$ is the particle in the mth subswarm whose cluster has the largest cardinality.
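The update rule of Eqs. (22) to (24) can be sketched for a single particle as follows. This is an illustrative reading rather than the reference implementation: memberships are passed in as precomputed weight lists, the element-wise product with the uniform random vectors is drawn per dimension, and a particle whose cluster is empty is steered toward the winning particle by replacing its self-organizing term with the offset to that particle, which is an assumption about how the redirection is realized.

```python
import random

def weighted_mean_offset(points, weights, x):
    """Membership-weighted mean of (p - x); None when all weights are zero."""
    total = sum(weights)
    if total == 0.0:
        return None
    return [sum(w * (p[d] - x[d]) for p, w in zip(points, weights)) / total
            for d in range(len(x))]

def update_particle(x, v, Ym, u_k, Xbest, q_k, x_win, rng=random):
    """One step of Eqs. (22)-(24) for a particle with position x and velocity v.

    u_k: memberships of the sampled data Ym with respect to this particle.
    q_k: memberships of the best positions Xbest with respect to this particle.
    """
    self_org = weighted_mean_offset(Ym, u_k, x)
    if self_org is None:
        # Empty cluster: head toward the winning particle (assumed realization).
        self_org = [w - c for w, c in zip(x_win, x)]
    best = weighted_mean_offset(Xbest, q_k, x)
    if best is None:
        best = [0.0] * len(x)
    # Eq. (24): element-wise products with two uniform random vectors.
    eps = [rng.random() * s + rng.random() * b for s, b in zip(self_org, best)]
    v_new = [vd + e for vd, e in zip(v, eps)]          # Eq. (22)
    x_new = [xd + vd for xd, vd in zip(x, v_new)]      # Eq. (23)
    return x_new, v_new

x_new, v_new = update_particle(
    x=[0.0, 0.0], v=[0.0, 0.0],
    Ym=[[1.0, 1.0], [1.0, 1.0]], u_k=[1.0, 0.5],
    Xbest=[[1.0, 1.0]], q_k=[1.0],
    x_win=[1.0, 1.0], rng=random.Random(0))
```

Because both attraction terms point at (1, 1) in this toy call, each coordinate of the new position falls between 0 and 2.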
The RCE is equipped with two strategies to cope with suboptimal convergence, namely substitution and particle reset:
1. The substitution strategy forces particles in a search space to reach alternate equilibrium positions by introducing position instability. After each position update episode for a particle, apply
$$\{x_i(t+1), v_i(t+1)\} = \begin{cases} \{x_{I_{win}}(t+1) + N(0, \sigma),\ 0\} & \text{if } \varphi < \varepsilon, \\ \{x_i(t+1), v_i(t+1)\} & \text{otherwise}, \end{cases} \tag{25}$$
where $\varphi$ is a uniform random number, $\varphi \in [0, 1]$, and $N(0, \sigma)$ is a Gaussian random vector with zero mean and standard deviation $\sigma$ taken from each dimension of the data being clustered. $\varepsilon$ denotes the substitution probability parameter; a larger ε increases the frequency of substitution. Optimal values lie in the range 0.01 ≤ ε ≤ 0.05 [17]. RCE with the substitution strategy enabled is denoted with the superscript +.
2. The particle reset strategy is triggered when the fitness of the local minimum $f(Y_m, X_m^{best}(t))$ does not improve after a number of
iterations. Stagnation can be detected using a stagnation counter $\delta$ which is updated as follows:
$$\delta(t+1) = \begin{cases} \delta(t) + 1 & \text{if } f(Y_m, X(t)) \ge f(Y_m, X^{best}(t)), \\ 0 & \text{otherwise}. \end{cases} \tag{26}$$
When $\delta(t+1) > \delta_{max}$, this strategy reinitializes all particles in a subswarm without resetting the local minimum position matrix. The only values reinitialized are $x_k(t)$ and $v_k(t)$. Swarm convergence is detected when $f(Y_m, X^{best}(t))$ does not improve after a number of resets. RCE with the particle reset strategy enabled is denoted with the superscript r.
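The two escape strategies can be sketched in a few lines. The code below is a hedged illustration of Eqs. (25) and (26): the function names are hypothetical, `sigma` is assumed to hold the per-dimension standard deviation of the data, and the default substitution probability of 0.03 is taken from the experimental settings reported later in the paper.

```python
import random

def substitute(x, v, x_win, sigma, eps=0.03, rng=random):
    """Eq. (25): with probability eps, jump next to the winning particle.

    The particle lands at x_win plus Gaussian noise and its velocity is
    zeroed; otherwise position and velocity are returned unchanged.
    """
    if rng.random() < eps:
        x_new = [w + rng.gauss(0.0, s) for w, s in zip(x_win, sigma)]
        return x_new, [0.0] * len(v)
    return x, v

def update_stagnation(counter, f_current, f_best):
    """Eq. (26): count consecutive iterations without improvement."""
    return counter + 1 if f_current >= f_best else 0

def needs_reset(counter, delta_max=15):
    """Particle reset is triggered once the counter exceeds its limit."""
    return counter > delta_max
```

A caller would reinitialize only the positions and velocities of the subswarm when `needs_reset` returns true, leaving the stored best positions untouched.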
The algorithm pseudocode is shown in Algorithm 2. An illus-
tration of the search trajectory of the swarm on a toy example is
shown in Fig. 11.
Algorithm 2. Swarm RCE$^{r+}$
Input: Data points $Y = \{y_1, \ldots, y_N\} \in \mathbb{R}^{dim}$, number of clusters K.
Output: Swarm centroid vectors $X^{best} = \{X_1^{best}, X_2^{best}, \ldots, X_M^{best}\} \in \mathbb{R}^{dim}$.
 1: Initialize the swarm (randomize($X_{1,\ldots,M}$), $V_{1,\ldots,M}$ = 0).
 2: For each subswarm m, randomly sample Y and store it in the memory $Y_m$ = randsample(Y, $\eta\%$).
 3: repeat
 4:     for all m ∈ {1, . . ., M} do
 5:         Calculate $U_m$ from the pairwise distance between $X_m$ and $Y_m$,
 6:         Calculate $Q_m$ from the pairwise distance between $X_m$ and $X^{best}$,
 7:         Store $X_m^{best}$ which minimizes $f(Y_m, X_m)$ throughout the search,
 8:         $V_m \leftarrow V_m + \boldsymbol{\epsilon}_m$,
 9:         $X_m \leftarrow X_m + V_m$,
10:         Redirect particles with zero cardinality toward the particle whose cluster has the largest cardinality.
11:         Apply substitution with rate of ε
12:         if $f(Y_m, X_m^{best})$ does not improve after $\delta_{reset}$ iterations then
13:             Reinitialize subswarm (randomize($X_m$), $V_m$ = 0)
14:         end if
15:     end for
16: until Convergence or maximum iteration reached
17: return $X^{best} = \{X_1^{best}, X_2^{best}, \ldots, X_M^{best}\} \in \mathbb{R}^{dim}$.
4.5. Self Evolving Swarm RCE
In this implementation we introduce a new self-evolution criterion to the RCE which allows each subswarm to summon additional particles at will until the target cluster entropy is satisfied.
The uncertainty for a fuzzy membership value $u_{ik} \in [0, 1]$ [33] can be quantified as follows,
$$h_{ik,m} = -u_{ik,m} \log u_{ik,m}. \tag{27}$$
Bezdek argues that a good clustering can be achieved when $h_{ik,m}$ is minimized [33]. The average cluster entropy is then,
$$H_m = -\frac{1}{K_m |Y_m|} \sum_{k=1}^{K_m} \sum_{i=1}^{|Y_m|} u_{ik,m} \log u_{ik,m}, \tag{28}$$
where $U_m$ is calculated from $X_m^{best}$. $H_m$ close to 0.5 indicates possible underpartitioning, while $H_m$ very close to 0 may indicate overpartitioning.
$H_m$ is only evaluated when there is an update to $X_m^{best}$ in which the number of non-empty clusters is equal to $K_m$, i.e. $|C_m^{best}| = K_m$. If $H_m$ is larger than the target entropy $H_m^{target}$, the number of particles is incremented using the following rule,
$$K_m(t+1) = \begin{cases} K_m(t) + z_r^{+} & \text{if } H_m > H_m^{target}, \\ K_m(t) & \text{otherwise}, \end{cases} \tag{29}$$
where $K_m(t)$ denotes the number of particles in the swarm m at the current iteration t, $z_r^{+}$ denotes an upper-bounded random integer, $z_r^{+} \in \{1, 2, \ldots, z_{max}^{+}\}$, while $H_m^{target} \in [0, 0.5]$ denotes the target $H_m$. Using this approach, each subswarm automatically adjusts $K_m$ until the entropy criterion is satisfied.
The desired granularity and diversity of the swarm can be controlled by setting or randomizing the value of $H_m^{target}$. The growth speed of the swarm can be controlled by setting $z_r^{+}$. As the subswarms infer $K_m$ automatically from $H_m$, there is no longer any need to specify the randomization interval (recall that in EAC and WEAC K-means, $K_m$ is randomized within a pre-specified upper and lower bound).
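The entropy criterion and the growth rule of Eqs. (27) to (29) can be sketched as below. One assumption should be flagged: the logarithm base is taken to be 2, which is not stated in the text but makes an evenly split two-cluster membership evaluate to exactly 0.5, in line with the remark that values near 0.5 suggest underpartitioning. Function and parameter names are illustrative.

```python
import math
import random

def average_cluster_entropy(U):
    """Eq. (28): mean of -u * log2(u) over all K x |Y| memberships.

    Terms with u = 0 are skipped, using the usual convention 0 * log 0 = 0.
    """
    K, n = len(U), len(U[0])
    total = sum(-u * math.log2(u) for row in U for u in row if u > 0.0)
    return total / (K * n)

def grow(K, H, H_target, z_max=2, rng=random):
    """Eq. (29): add between 1 and z_max particles while H exceeds the target."""
    return K + rng.randint(1, z_max) if H > H_target else K

U_crisp = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]     # no uncertainty
U_vague = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]     # maximally ambiguous for K = 2
H_crisp = average_cluster_entropy(U_crisp)
H_vague = average_cluster_entropy(U_vague)
K_next = grow(2, H_vague, H_target=0.05, z_max=2, rng=random.Random(0))
```

A subswarm would call `grow` only after an update of its best positions in which every cluster is non-empty, as described above.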
The pseudocode of the Self-Evolving Swarm RCEr+ (SE-SRCE) can
be seen in Algorithm 3. A typical summary of an execution of SE-
SRCE can be seen in Fig. 12.
Algorithm 3. Self-Evolving Swarm RCE$^{r+}$ (SE-SRCE)
Input: Data points $Y = \{y_1, \ldots, y_N\} \in \mathbb{R}^{dim}$, number of clusters K.
Output: Swarm centroid vectors $X^{best} = \{X_1^{best}, X_2^{best}, \ldots, X_M^{best}\} \in \mathbb{R}^{dim}$.
 1: Initialize the swarm (randomize($X_{1,\ldots,M}$), $V_{1,\ldots,M}$ = 0).
 2: For each subswarm m, randomly sample Y and store it in the memory $Y_m$ = randsample(Y, $\eta\%$).
 3: repeat
 4:     for all m ∈ {1, . . ., M} do
 5:         Execute Algorithm 2 lines 5–14,
 6:         if $f(Y_m, X_m)$ improves then
 7:             // Check whether the entropy criterion is satisfied and whether all clusters are nonempty
 8:             if $|C_m^{best}| = K_m$ and $H_m > H_m^{target}$ then
 9:                 $K_m \leftarrow K_m + z_r^{+}$
10:             end if
11:         end if
12:     end for
13: until Convergence or maximum iteration reached
14: return $X^{best} = \{X_1^{best}, X_2^{best}, \ldots, X_M^{best}\} \in \mathbb{R}^{dim}$.
4.6. Ensemble Rapid Centroid Estimation using Self-Evolving Swarm
Ensemble RCE (ERCE) [18] is an ensemble extension to the Swarm RCE$^{r+}$. The algorithm has been shown to have lower complexity than conventional ensemble clustering algorithms, achieving up to quasilinear complexity in both time and space [18].
In this application we propose incorporating the proposed SE-SRCE into the ERCE framework. As the size of the evidence accumulation matrix is still relatively manageable (recall that since there are 320 features = 160 magnitude features + 160 spectral centroid features, the size of C is 320 × 320), EAC can be performed without using the co-association tree compression process proposed in the original paper [18,34]. However, should the number of features increase into the thousands, it is advisable to use the co-association tree compression. Further information on the co-association tree can be found in Wang's paper [34].
In order to interpret the final clustering, we need to clarify that in our application each cluster represents "a group of more redundant features". For each feature cluster, the feature with the largest entropy is selected as the characteristic feature of the cluster. The pseudocode of ERCE used in our application is shown in Algorithm 4.
Algorithm 4. Ensemble Rapid Centroid Estimation (ERCE)
Input: dim × N data matrix Y, number of subswarms M, fuzzification constant $\gamma$, target entropy for each subswarm $\{H_1^{target}, \ldots, H_M^{target}\}$, linkage algorithm Linkage.
Output: Crisp ensemble partition L.
$X^{best}$ ← SE-SRCE(Y)
for all m ∈ {1, . . ., M} do
    Given Y and $X_m^{best}$, calculate $U_m$ using Eq. (17).
    // Calculate the co-association matrix for each clustering.
    $C_m \leftarrow U_m^{T} U_m$
    $C_m \leftarrow D_m^{-1/2} C_m D_m^{-1/2}$
end for
$C \leftarrow \sum_{m=1}^{M} w_m C_m \,/\, \sum_{m=1}^{M} w_m$
HierarchicalTree ← Linkage(1 − C)    // consensus distance, Eq. (15)
th ← MaximumLifetime(HierarchicalTree)
L ← Cut(HierarchicalTree, th)
// Interpreting the final partition
for all $C_k \in \{C_1, \ldots, C_{L_{max}}\}$ do
    // For each feature cluster, the characteristic feature is the feature with the highest entropy
    $y_k^{characteristic} = \arg\max_{y \in C_k} \left( -\int p_y(x) \log p_y(x)\, dx \right)$
end for
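The final loop of Algorithm 4 picks, for each feature cluster, the feature with the highest entropy. A minimal sketch is given below. It replaces the differential entropy integral with a simple fixed-width histogram estimate, which is an assumption made for illustration; the bin count, function names, and toy feature names are likewise hypothetical.

```python
import math
from collections import Counter

def histogram_entropy(signal, bins=10):
    """Histogram-based estimate of the entropy of a one-dimensional signal."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return 0.0                      # a constant signal carries no information
    width = (hi - lo) / bins
    counts = Counter(min(int((s - lo) / width), bins - 1) for s in signal)
    n = len(signal)
    return -sum(c / n * math.log(c / n) for c in counts.values())

def characteristic_features(features, labels):
    """Return {cluster id: name of the highest-entropy feature in that cluster}.

    features: dict mapping feature name -> list of samples
    labels:   dict mapping feature name -> cluster id from the consensus cut
    """
    best = {}
    for name, signal in features.items():
        h = histogram_entropy(signal)
        k = labels[name]
        if k not in best or h > best[k][1]:
            best[k] = (name, h)
    return {k: name for k, (name, _) in best.items()}

features = {"a": [0, 0, 0, 0], "b": [0, 1, 2, 3], "c": [5, 5, 5, 6]}
labels = {"a": 0, "b": 0, "c": 1}
selected = characteristic_features(features, labels)
```

In the toy call, features "a" and "b" share a cluster and "b" is kept because it varies, while "c" is the only member of its cluster.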
5. Experimental data
The ASHRAE Project 1312-RP modeled and reported a wide variety of faults in three different seasons. The experiments included two HVAC systems running side by side with identical zone loads. Fault tests were conducted in Air Handling Unit A (AHU-A), while AHU-B ran under normal operation. Fault characteristics were recorded by comparing AHU-A and AHU-B. The ASHRAE-1312-RP datasets include detailed experimental results from Summer 2007, Spring 2008, and Winter 2008. In each season, different types of faults were generated, recorded and reported. Readings from 160 signal sources during normal operation and various fault scenarios were recorded. The data were sampled every minute from 6:00 to 18:00. The faults reported in the ASHRAE-1312-RP datasets, together with a summary of the behavior of the features proposed by Li [20], are described in Table 1. Note that the features in this table are not used in our research; they serve to illustrate how a static model would struggle across seasons, because features that are important in one season may not be as important in others. The features that we use throughout the paper are determined dynamically using consensus clustering, based on the unique behavior in each season.
6. Result
Based on the features in Table 1, we can see that faults such as OASB, MADU and HCSF are particularly difficult to identify using Li's model [20]. In this section we present the experimental results of our proposed unsupervised feature selection method. We wish to investigate the following:
1. What the characteristic features for each season are, and
2. Whether the selected features improve the generalization capability of an AFDD algorithm in general. In particular, we are interested in whether we can reliably identify OASB, MADU, and HCSF using the features selected by our proposed method.
Our approach is as follows. From each dataset (Summer 2007, Spring 2008, and Winter 2008), as many as 160 time signals and a vector recording the time of the day were reported. Using the method described in Section 3.1, as many as 320 + 1 additional features could be extracted, including:
• Magnitude features from 160 sensor and control signals,
• Spectral centroid features from 160 sensor and control signals,
• Time of the day (1 feature).
For clarity, the step-by-step process of the experiment can be summarized as follows:
1. Select a season and get the raw signals during normal operations.
2. For each raw signal, isolate the magnitude and spectral centroid components and calculate the fuzzy feature representation using the method described in Section 3.
3. Find the characteristic features using a consensus clustering algorithm (our approach uses ERCE: Algorithm 4).
4. Append the time-of-the-day feature as an additional feature.
5. Using the selected features, train a model (our approach uses NARX-TDNN) using the data in Table 1. For each type of fault, randomly partition the data as follows:
• 15% as training set,
• 15% as validation set, and
• 70% as test set.
6. Investigate the results on the test set to see whether using the selected features increases or decreases the classifier's generalization capability.
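The random partition in step 5 can be sketched as follows. The helper name is hypothetical, and the sample count of 720 simply corresponds to one reading per minute over twelve hours; the exact number of samples per fault in the dataset is not stated here.

```python
import random

def split_indices(n, train=0.15, val=0.15, rng=random):
    """Randomly partition range(n) into training, validation, and test indices."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = round(train * n)
    n_val = round(val * n)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])          # the remaining ~70% is the test set

train_idx, val_idx, test_idx = split_indices(720, rng=random.Random(0))
```

Applying the helper separately to the samples of each fault type keeps every fault represented in all three subsets.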
6.1. Feature selection result
We wish to keep the number of characteristic features at a reasonable level (e.g. between 4 and 30) to ensure that the generalization capability of the classifier is not undermined. The parameters of ERCE, EAC K-means, and WEAC K-means were selected based on assumptions derived using the method illustrated in Fig. 12. From the average entropy-distortion scatter for each season, such as that depicted in Fig. 12, we approximated the number of characteristic features to be around 5–30, or an average cluster entropy of 0.005–0.05.
The parameters used for ERCE were as follows. The initial number of particles was set to 2, the number of subswarms was set to 60, the substitution probability ε was set to 3%, $\delta_{reset}$ was set to 15, the distance metric was set to KL-divergence, the fuzzifier was set to 1.2, the entropy threshold (target entropy) for each subswarm was uniformly randomized between 0.005 and 0.05, $z_{max}^{+} = 2$, the maximum number of iterations was set to 100, and the linkage method was set to complete linkage. KL-divergence and complete linkage were selected because the physical model of the HVAC was assumed to be unknown and even a subtle difference in temporal patterns or shapes could be an important predictive component for specific types of fault. Complete linkage favors the formation of small spherical clusters, which is particularly useful for capturing these subtle differences. The optimum cut was then calculated using the maximum lifetime criterion [25]. Subswarms were equally weighted during ensemble aggregation such that $w_{1,\ldots,M} = 1$.
Further investigation was also performed in order to benchmark the quality of the features selected by the method. The benchmark unsupervised feature selection methods include EAC K-means [25], WEAC K-means [31], and a traditional complete linkage agglomerative clustering (CL). CL was used to verify the advantages of the consensus approaches over a conventional graph-based approach. In this experiment, the CL hierarchical tree was cut using the inconsistency criterion, with inconsistency coefficient = 1, returning as many as 84 clusters and thus 84 characteristic features.
The parameters for EAC K-means and WEAC K-means were set as follows. The number of repetitions was set to 60, and the number of clusters k was uniformly randomized between 5 and 30. The distance metric was set to KL-divergence. The linkage method was set to complete linkage as discussed above. The optimum cut was calculated using the maximum lifetime criterion [25]. Weights for WEAC K-means were calculated using the average silhouette width criterion [35].
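Table 1
ASHRAE-1312-RP dataset description and symptoms using features described in Shun Li's model [20].
| Season | # | Name | Description | HWC-VLV | P-E-hcoil | CHWC-VLV | P-E-ccoil | SF-SPD | P-E-SF | RF-SPD | P-E-RF | P-SA-CFM | P-RA-CFM | P-OA-CFM | SA-TEMP | MA-TEMP | RA-TEMP | HWC-DAT | CHWC-DAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Summer 2007 | 1 | NOR0819 | Normal Operation |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Summer 2007 | 3 | EADS0820 | EA Damper Stuck (Fully Open) | 0 | 0 | 0 | 0 | + | + | + | + | 0 | + | + | 0 | 0 | 0 | 0 | 0 |
| Summer 2007 |  |  | Close) | 0 | 0 | 0 | 0 | − | − | − | − | 0 | − | − | 0 | 0 | 0 | 0 | 0 |
| Summer 2007 | 5 | RFF0822 | Return Fan at fixed speed (30% speed) | 0 | 0 | 0 | 0 | ++ | ++ | −− | −− | 0 | −− | ++ | 0 | 0 | 0 | 0 | 0 |
| Summer 2007 | 6 | RFF0823 | Return Fan complete failure | 0 | 0 | 0 | 0 | ++ | ++ | −− | −− | 0 | −− | ++ | 0 | 0 | 0 | 0 | 0 |
| Summer 2007 | 7 | CHWC0824 | Cooling Coil Valve Control unstable (Reduce PID Proportional Band by half) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Summer 2007 | 8 | CHWC0903 | Cooling Coil Valve Reverse Action | ++ | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| Summer 2007 | 9 | OADS0826 | OA Damper Stuck (Fully Closed) | 0 | 0 | 0 | 0 | ++ | ++ | ++ | ++ | 0 | + | − | 0 | 0 | 0 | 0 | 0 |
| Summer 2007 | 10 | CHWV0827 | Cooling Coil Valve Stuck (Fully Closed) | 0 | 0 | −− | −− | ++ | ++ | ++ | ++ | ++ | ++ | ++ | ++ | + | + | + | ++ |
| Summer 2007 |  |  | (Fully Open) | ++ | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| Summer 2007 |  |  | (Partially Open – 15%) | 0 | 0 | −− | −− | ++ | ++ | ++ | ++ | ++ | ++ | ++ | ++ | + | + | + | ++ |
| Summer 2007 |  |  |  | ++ | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| Summer 2007 | 14 | HCL0828 | Heating Coil Valve Leaking (Stage 1 – 0.4GPM) | 0 | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| Summer 2007 |  |  | 1.0GPM) | 0 | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| Summer 2007 |  |  | 2.0GPM) | 0 | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| Summer 2007 | 17 | OADL0905 | OA Damper Leaking (45% Open) | 0 | 0 | 0 | 0 | − | − | − | − | 0 | 0 | ++ | 0 | 0 | 0 | 0 | 0 |
| Summer 2007 | 18 | OADL0906 | OA Damper Leaking (55% Open) | 0 | 0 | 0 | 0 | − | − | − | − | 0 | 0 | ++ | 0 | 0 | 0 | 0 | 0 |
| Summer 2007 | 19 | AHUL0907 | AHU Duct Leaking (after SF) | 0 | 0 | + | + | + | + | + | + | + | + | + | 0 | 0 | 0 | 0 | 0 |
| Summer 2007 | 20 | AHUL0908 | AHU Duct Leaking (before SF) | 0 | 0 | 0 | 0 | −− | −− | −− | −− | 0 | −− | −− | 0 | 0 | 0 | 0 | 0 |
| Spring 2008 | 6 | OASB0529 | OA temperature sensor bias (+3F) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Spring 2008 | 7 | OASB0530 | OA temperature sensor bias (−3F) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Spring 2008 | 8 | OADS0507 | OA Damper Stuck (Fully Close) | 0 | 0 | 0 | 0 | + | + | + | + | − | + | −− | 0 | 0 | 0 | 0 | 0 |
| Spring 2008 | 9 | OADS0508 | OA Damper Stuck (40% open) | 0 | 0 | 0 | 0 | + | + | + | + | − | + | −− | 0 | 0 | 0 | 0 | 0 |
| Spring 2008 |  |  | open) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Spring 2008 |  |  | Close) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | − | 0 | − | 0 | 0 | 0 | 0 | 0 | 0 |
| Spring 2008 | 12 | EADS0511 | EA Damper Stuck (40% open) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | − | 0 | − | 0 | 0 | 0 | 0 | 0 | 0 |
| Spring 2008 | 13 | CHW0506 | Cooling Coil Valve Stuck (Fully Closed) | 0 | 0 | −− | −− | ++ | ++ | ++ | ++ | ++ | ++ | ++ | ++ | + | + | + | ++ |
| Spring 2008 |  |  | (Fully Open) | ++ | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| Spring 2008 |  |  |  | ++ | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| Spring 2008 | 16 | RFF0512 | Return Fan complete failure | 0 | 0 | 0 | 0 | 0 | 0 | −− | −− | 0 | −− | 0 | 0 | 0 | 0 | 0 | 0 |
| Spring 2008 |  |  | speed (20%spd) | 0 | 0 | 0 | 0 | 0 | 0 | −− | −− | 0 | −− | 0 | 0 | 0 | 0 | 0 | 0 |
| Spring 2008 |  |  | speed (80%spd) | 0 | 0 | 0 | 0 | 0 | 0 | ++ | ++ | 0 | ++ | 0 | 0 | 0 | 0 | 0 | 0 |
| Spring 2008 | 19 | AFAB0522 | Air filter area block fault (10%) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Spring 2008 | 20 | AFAB0525 | Air filter area block fault (25%) | 0 | 0 | 0 | 0 | + | + | + | + | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Spring 2008 | 21 | MADU0513 | Mixed air damper unstable | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Spring 2008 | 22 | MADU0514 | Mixed air damper unstable/Cooling coil control unstable | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Spring 2008 | 23 | HCSF0517 | Sequence of heating and cooling unstable | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Spring 2008 | 24 | HCSF0601 | Supply Fan control unstable | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Winter 2008 | 4 | OADS0212 | OA Damper Stuck (Fully Close) | −− | −− | 0 | 0 | ++ | + | ++ | + | −− | ++ | −− | 0 | − | 0 | 0 | 0 |
| Winter 2008 | 5 | OADL0213 | OA damper leaking (52% open) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | + | 0 | 0 | 0 | 0 | 0 |
| Winter 2008 | 6 | OADL0215 | OA damper leaking (62% open) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | + | 0 | 0 | 0 | 0 | 0 |
| Winter 2008 |  |  | open) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | + | + | 0 | 0 | 0 | 0 | 0 |
| Winter 2008 |  |  | Close) | − | −− | 0 | 0 | 0 | 0 | 0 | −− | 0 | −− | −− | 0 | 0 | 0 | 0 | 0 |
| Winter 2008 |  |  | (Fully Open) | ++ | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | − | 0 | 0 | ++ | − |
| Winter 2008 |  |  |  | + | + | + | + | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| Winter 2008 | 11 | HCF0205 | Heating Coil Fouling Stage 1 | 0 | −− | 0 | 0 | + | + | + | + | 0 | + | − | 0 | 0 | 0 | 0 | 0 |
| Winter 2008 | 12 | HCF0206 | Heating Coil Fouling Stage 2 | 0 | −− | 0 | 0 | + | + | + | + | 0 | + | − | 0 | 0 | 0 | 0 | 0 |
| Winter 2008 | 13 | HCRC0207 | Heating coil reduced capacity Stage 1 | + | − | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Winter 2008 |  |  | capacity Stage 2 | + | − | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Winter 2008 |  |  | capacity Stage 3 | + | − | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Each entry in {0, +, ++, −, −−} indicates that the value for the variable is: (a) 0: unchanged (the fault has no effect on the corresponding variable); (b) +: greater than normal; (c) ++: substantially greater than normal; (d) −: less than normal; (e) −−: substantially less than normal.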

[Fig. 12 panels. Top: average cluster entropy, number of clusters, and average distortion, each plotted against iteration. Bottom: scatter plots of cluster entropy versus number of clusters, average distortion versus number of clusters, average distortion versus cluster entropy, and the combined three-axis scatter.]
Fig. 12. The scatter plot of the average distortion with respect to cluster entropy and the number of clusters extracted after a run of SE-SRCE with the fuzzifier set to 1.2. The top graphs show the cross-sectional plots of the three parameters during optimization of SE-SRCE, leading to the creation of the bottom scatter plot. The appropriate entropy range/K range can be investigated by observing the trade-offs between $K_m$, $H_m$, and $f(Y_m, X)$, so that both distortion and entropy can be minimized while keeping the number of clusters at a reasonable level.

We measured the appropriateness of the feature selection method by investigating the normalized mutual information (NMI) between features [26]. Mutual information examines the dependence between two discrete distributions X and Y. It equals the KL-divergence between the joint distribution p(x, y) and the product of the marginals p(x)p(y), and can be expressed through the joint entropy H(X, Y) and the marginal entropies H(X) and H(Y) as follows,
$$\begin{aligned}
NMI(X; Y) &= \frac{I(X; Y)}{\sqrt{H(X)H(Y)}}, \\
&= \frac{H(X) + H(Y) - H(X, Y)}{\sqrt{H(X)H(Y)}}, \\
&= \frac{\sum_{x \in X} \sum_{y \in Y} p(x, y) \log \dfrac{p(x, y)}{p(x)p(y)}}{\sqrt{\sum_{x \in X} p(x) \log p(x) \sum_{y \in Y} p(y) \log p(y)}},
\end{aligned} \tag{30}$$
where X and Y in our case were a pair of fuzzy feature signals ($y_1$ and $y_2$ calculated using Eq. (5)), rounded to the nearest integer, such that
$$X(n) = \text{round}(y_1(n)), \quad X(n) \in \{-1, 0, 1\}, \tag{31}$$
and
$$Y(n) = \text{round}(y_2(n)), \quad Y(n) \in \{-1, 0, 1\}. \tag{32}$$
The NMI is calculated by marginalizing the probability of co-occurrence between these three discrete categories. For a pair of signals, an NMI closer to 1 indicates that the feature pair is redundant. For each feature set, the strictly upper triangular part of the pairwise NMI matrix is taken and the median, 75th percentile, and 95th percentile are averaged over 80 runs. Since we want to minimize redundancies between features, a good feature set is characterized by an average NMI closer to 0. Table 2 summarizes the result of the experiment.
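The NMI computation on a pair of rounded fuzzy feature signals can be sketched as below. The sketch assumes the geometric-mean normalization, i.e. division by the square root of H(X)H(Y), which is the form for which identical signals score exactly 1. Returning 0 for a constant signal is an added convention, and Python's built-in `round` rounds exact halves to the nearest even integer, which may differ from the rounding used in the original experiments.

```python
import math
from collections import Counter

def entropy(counts, n):
    """Shannon entropy of an empirical distribution given as a Counter."""
    return -sum(c / n * math.log(c / n) for c in counts.values())

def nmi(y1, y2):
    """Normalized mutual information between two fuzzy feature signals.

    Both signals are first rounded to the categories {-1, 0, 1}
    as in Eqs. (31) and (32).
    """
    X = [round(v) for v in y1]
    Y = [round(v) for v in y2]
    n = len(X)
    hx = entropy(Counter(X), n)
    hy = entropy(Counter(Y), n)
    hxy = entropy(Counter(zip(X, Y)), n)      # joint entropy H(X, Y)
    if hx == 0.0 or hy == 0.0:
        return 0.0                            # assumed convention for constants
    return (hx + hy - hxy) / math.sqrt(hx * hy)

redundant = nmi([-0.9, 0.1, 0.8, 1.0, 0.0, -1.0],
                [-1.0, 0.0, 1.0, 0.9, 0.2, -0.8])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

The first pair rounds to the same category sequence and is therefore fully redundant, whereas the second pair covers every category combination equally often and is independent.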
The characteristic features in each season differed from those of other seasons. In order to analyze the important features for each season, we repeated the clustering process 200 times. From this process, three histograms describing the probability of occurrence of the characteristic features for each season are reported in Fig. 13. The probability of occurrence was calculated as the frequency of appearance divided by the number of trials.
The overall patterns of the fault classes for each season, based on the characteristic features, are presented in Figs. 14–16, respectively. Each circle in these figures shows the condition of the characteristic features during a specific fault in the HVAC system.
6.2. Classification result
The generalization capability of a classifier is a powerful indicator of the quality of the features. Using the characteristic features selected by the proposed method, a classifier can be trained with less computational burden and a lower probability of overfitting (note that in our experiment, 30% of the data was equally divided into training and validation sets, and the remaining 70% was used as the test set). The classifiers were trained and tested using the fuzzy features, $y_s$, as shown in Figs. 14–16.
The parameters for NARX-TDNN were set as follows. The number of hidden neurons was set to 10. The input layer, hidden layer, and feedback orders were set to 2. The architecture is illustrated in Fig. 8. The dataset was divided at random into training (15%), validation (15%), and test (70%) sets. The training was done using the Levenberg–Marquardt algorithm. The experiment was repeated 80 times for each season to test the reliability and repeatability of the method. Using the features shown in Figs. 14–16, the average sensitivity and specificity of the proposed method compared to Li's manual feature selection approach are presented in Table 3.
The quality of the feature sets selected by ERCE was benchmarked against the features selected by EAC K-means, WEAC K-means, and Complete Linkage. The features selected by these four competing algorithms were supplied to both NARX-TDNN and Hidden Markov Models (HMM) [11–13], where the training and testing for both classifiers were repeated 100 times for each pair of feature selection and classification algorithm. The weighted average (WA) sensitivity and WA specificity results are reported in Table 4.
The significance of the experimental results was validated using
paired t-tests with the following null hypotheses:
1. $H_0^{*}$: The performance of a classifier using features from ERCE is
not significantly better than using features from algorithm X. A
star (*) in Tables 3 and 4 indicates that $H_0^{*}$ should be rejected,
whereas no sign indicates otherwise.
2. $H_0^{\dagger}$: Given the same feature selection algorithm, a trained
classifier A does not exercise significantly better performance
compared to classifier B. A dagger (†) in Table 4 indicates that
$H_0^{\dagger}$ should be rejected, whereas no sign indicates otherwise.
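A paired test of this kind can be sketched as below. The sketch makes two assumptions that are not stated in the text: the test is treated as one-sided (the alternative being that the first method is better), and the p-value is approximated with a standard normal distribution, which is only reasonable when the number of repetitions is large (as with the 100 repetitions used here). An exact p-value would require the Student t distribution with n − 1 degrees of freedom. The function names and the synthetic scores are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("paired differences have zero variance")
    return mean(diffs) / (sd / math.sqrt(len(diffs))), len(diffs) - 1


def reject_null(a, b, alpha=0.001):
    """One-sided test of 'a is not better than b' (large-sample normal approximation)."""
    t, _ = paired_t_statistic(a, b)
    p_approx = 1.0 - NormalDist().cdf(t)
    return p_approx < alpha


# Synthetic specificity scores over 100 repetitions, invented for illustration.
scores_a = [0.98 + 0.001 * (i % 5) for i in range(100)]
scores_b = [0.92 + 0.002 * (i % 7) for i in range(100)]
print(reject_null(scores_a, scores_b), reject_null(scores_b, scores_a))
```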
7. Discussion
As the proposed feature selection process is strictly unsu-
pervised, analyzing the result leads to a number of interesting
observations.
With regard to the redundancies between features,
it can be seen in Table 2 that all consensus algorithms
(median NMI of 0.019 for ERCE, 0.040 for EAC K-means, and 0.048
for WEAC K-means) in general outperformed CL (Median
NMI = 0.1305), manual selection (Median NMI = 0.0199, Q75%
NMI = 0.2227), and no selection (Median NMI = 0.1857). The three
consensus algorithms reported fewer than 20 characteristic features
on average, which is at least four times fewer than the number
of characteristic features selected using CL. Furthermore, the
features selected by ERCE (Median NMI = 0.019 ± 0.004) outperformed
those selected by the other consensus algorithms,
EAC K-means (Median NMI = 0.040 ± 0.011) and WEAC K-means
(Median NMI = 0.048 ± 0.034), as indicated by their lower NMI. ERCE
also had smaller standard deviations on all performance aspects,
especially on the number of features, suggesting relatively
high reliability and repeatability of the proposed swarm-based
consensus clustering algorithm.
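The redundancy measure discussed above can be illustrated with a small sketch that estimates the NMI between every pair of selected features and reports the median. Two choices here are assumptions rather than the procedure used in this work: the signals are discretized into equal-width bins, and the mutual information is normalized by the geometric mean of the two entropies, as in Strehl and Ghosh [26]. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import median


def discretize(signal, n_bins=10):
    """Map a real-valued signal to equal-width bin indices 0..n_bins-1."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return [0] * len(signal)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in signal]


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """Normalized mutual information of two label sequences, in [0, 1]."""
    n = len(a)
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 or h_b == 0:
        return 0.0  # a constant signal shares no information
    count_a, count_b = Counter(a), Counter(b)
    mi = sum((c / n) * math.log(c * n / (count_a[x] * count_b[y]))
             for (x, y), c in Counter(zip(a, b)).items())
    return mi / math.sqrt(h_a * h_b)


def median_pairwise_nmi(features, n_bins=10):
    """Median NMI over all pairs of feature signals (lower means less redundant)."""
    binned = [discretize(f, n_bins) for f in features]
    return median(nmi(p, q) for p, q in combinations(binned, 2))
```

For example, `nmi([0, 0, 1, 1], [0, 0, 1, 1])` is 1 (fully redundant), whereas `nmi([0, 0, 1, 1], [0, 1, 0, 1])` is 0 (no shared information).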
With regard to the reliability of the feature selection algorithm,
ERCE consistently selects features that are unique and relevant to
the faults in the corresponding year, as can be seen in Fig. 13. For
example, throughout the experiment using the Winter 2008 dataset,
ERCE consistently selected HWC-VLV, PLN-TMP, EA-DMPR, HWC-DAT
and HWP-GPM, which are among the important features for
that season. The pattern for the Winter 2008 dataset is shown in
Fig. 16. In this figure, the pattern for Exhaust Air Damper Stuck
(EADS) faults can be easily distinguished from the others by
observing the conditions of both EA-DMPR and PLN-TMP. Similarly,
HCRC faults in this season are characterized by abnormal
HWC-VLV and VAV-DMPR signals. CHW faults are also observable
from an increase in HWC-DAT as the system compensates for the
increased flow of chilled water due to the faulty cooling coil valve.
ERCE also appropriately discovers that SC CHWC-GPM is a particularly
important feature in Spring 2008 due to HCSF0517, as
discussed previously in Section 3. ERCE discovers that the outside
air damper (OA-DMPR) is consistently inside the atypical negative
region during HCSF faults. This information may be useful for
further investigation of the nature of this particular fault.
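The occurrence probability plotted in Fig. 13 is the frequency with which a feature appears divided by the number of clustering trials. A minimal sketch of that calculation is given below; the function name `occurrence_probability` is hypothetical, and the four trial outcomes are invented for illustration (only the feature names come from the text).

```python
from collections import Counter


def occurrence_probability(trials):
    """Return {feature: fraction of trials in which the feature was selected}."""
    n_trials = len(trials)
    if n_trials == 0:
        raise ValueError("at least one trial is required")
    # set() guards against a feature being listed twice within one trial.
    counts = Counter(name for selected in trials for name in set(selected))
    return {name: count / n_trials for name, count in counts.items()}


# Invented selections from four trials; not experimental data.
trials = [
    {"HWC-VLV", "PLN-TMP", "EA-DMPR"},
    {"HWC-VLV", "PLN-TMP"},
    {"HWC-VLV", "HWP-GPM"},
    {"HWC-VLV", "PLN-TMP", "HWC-DAT"},
]
probabilities = occurrence_probability(trials)
print(sorted(probabilities.items(), key=lambda item: -item[1]))
```

Features with a probability close to 1 are those selected in nearly every trial, which is the sense in which a feature is described as consistently selected.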
Regarding the effects of the proposed feature selection algorithm
on classifier performance, the result of ERCE+NARX-TDNN,
Table 2
The Normalized Mutual Information (NMI) between features selected using various feature selection algorithms on the Spring 2008 dataset. Boldface indicates the lowest NMI
(the least redundancy between features). NMI values are computed between characteristic feature pairs.

| | Without feature selection | Manual selection [20] | CL | EAC K-means | WEAC K-means | ERCE |
|---|---|---|---|---|---|---|
| # of features | 320 | 16 | 84 | 15.90 ± 3.86 | 16.70 ± 4.73 | 17.20 ± 1.60 |
| Median NMI | 0.1857 | 0.0199 | 0.1305 | 0.040 ± 0.011 | 0.048 ± 0.034 | 0.019 ± 0.004 |
| Q75% NMI | 0.4110 | 0.3014 | 0.2227 | 0.106 ± 0.025 | 0.131 ± 0.068 | 0.078 ± 0.013 |
| Q95% NMI | 0.8821 | 0.4899 | 0.4863 | 0.404 ± 0.035 | 0.364 ± 1.600 | 0.339 ± 0.031 |
particularly in Spring 2008, shows a clear advantage of ERCE
over other feature selection approaches. As can be seen in Table 3,
when compared with the manually selected features suggested
by Li [20], supplying NARX-TDNN with the features selected by
ERCE results in consistent specificity improvements in Spring 2008.
Moreover, overall statistically significant weighted average
performance improvements are also observed throughout Summer
2007, Spring 2008, and Winter 2008 based on our experiment.
Based on the statistical results in Table 4, using features from
Li and EAC K-means limits NARX-TDNN’s specificity to averages
of around 91.54% and 91.85%, respectively. The low average may
be attributed to misclassification of a number of more ambiguous
faults such as OASB, MADU, AFAB and HCSF. This finding
is consistent with Li’s observation, presented in Table 1, where
Fig. 13. Representative feature occurrence histogram for each season after 200 clustering trials. The x-axis denotes the specific label for each feature, y-axis denotes the
probability of occurrence, calculated as the frequency of appearance divided by the number of trials.
[Fig. 14 image not reproduced. One circular plot per case, each with feature axes numbered 1–21 on a scale from −1.0 to 1.0. Panels: NOR0819 and NOR0825 (shared panel), EADS0820, EADS0821, RFF0822, RFF0823, CHWC0824, CHWC0903, OADS0826, CHWV0827, CHWV0831, CHWV0901, CHWV0902, HCL0828, HCL0829, HCL0830, OADL0905, OADL0906, AHUL0907, AHUL0908.]
Fig. 14. Patterns constituted by the characteristic features for each data in the ASHRAE-1312-RP Summer 2007 dataset.

[Fig. 15 image not reproduced. One circular plot per case, each with feature axes numbered 1–19 on a scale from −1.0 to 1.0. Panels: NOR0502, NOR0503, NOR0504, NOR0505 and NOR0509 (shared panel), OASB0529, OASB0530, OADS0507, OADS0508, EADS0527, EADS0510, EADS0511, CHW0506, CHW0515, CHW0516, RFF0512, RFF0518, RFF0519, AFAB0522, AFAB0525, MADU0513, MADU0514, HCSF0517, HCSF0601.]
Fig. 15. Patterns constituted by the characteristic features for each data in the ASHRAE-1312-RP Spring 2008 dataset.

[Fig. 16 image not reproduced. One circular plot per case, each with feature axes numbered 1–7 on a scale from −1.0 to 1.0. Panels: NOR0129, NOR0216 and NOR0217 (shared panel), OADS0212, OADL0213, OADL0215, EADS0202, EADS0203, CHW0210, CHW0211, HCF0205, HCF0206, HCRC0207, HCRC0208, HCRC0209.]
Fig. 16. Patterns constituted by the characteristic features for each data in the ASHRAE-1312-RP Winter 2008 dataset.
these faults seem to have no effect on the manually selected
features. Similar cases are seen with WEAC K-means and complete
linkage. Using features from ERCE allows NARX-TDNN to
reach a significantly higher average specificity of 98.37% ± 0.25%.
The significance of the results is statistically validated on both the
Summer 2007 and Spring 2008 datasets, where signals exhibit
more nonlinearities than those in the Winter 2008
dataset.
Regarding the general performance of the classifiers, the results in
Table 4 show the comparative performance of HMM and
NARX-TDNN. While HMM shows superior specificity on the Winter
2008 dataset, its specificity on Spring 2008 and Summer 2007
is comparatively lower. This is arguably due to the nonlinearities
in the fault patterns in the Spring 2008 and Summer 2007
datasets compared to the Winter 2008 faults. For instance, it can
be seen in Fig. 15 that MADU, AFAB and HCSF faults exhibit
visually ambiguous patterns. When dealing with these nonlinear
datasets, the NARX-TDNN classifier benefits from its capability
to deal with long-term dependencies. Table 4 shows that
NARX-TDNN was capable of distinguishing these faults, achieving
a specificity of 98.37% ± 0.25% using the features provided by
ERCE.
Table 3
NARX-TDNN classification result.

| Fault type | Manual selection^a: sensitivity | Manual selection^a: specificity | ERCE^b: sensitivity | ERCE^b: specificity |
|---|---|---|---|---|
| Summer 2007 | | | | |
| NOR | 99.9% ± 0.1% | 98.1% ± 1.6% | 99.9% ± 0.2% | 99.0% ± 2.1% |
| EADS | 99.7% ± 0.5% | 99.5% ± 2.7% | 99.8% ± 0.3% | 98.9% ± 2.5% |
| RFF | 99.9% ± 0.0% | 99.0% ± 2.7% | 99.9% ± 0.1% | 99.5% ± 1.4% |
| CHWC | 99.9% ± 0.2% | 99.0% ± 1.1% | 99.8% ± 0.2% | 99.0% ± 4.4% |
| OADS | 99.9% ± 0.2% | 98.0% ± 2.2% | 99.9% ± 0.3% | 97.3% ± 3.1% |
| CHWV | 99.8% ± 0.3% | 99.0% ± 4.3% | 99.7% ± 0.9% | 99.2% ± 2.5% |
| HCL | 99.7% ± 0.4% | 98.0% ± 1.0% | 99.7% ± 0.3% | 98.4% ± 2.4% |
| OADL | 99.7% ± 0.5% | 95.2% ± 7.1% * | 99.9% ± 0.2% | 98.0% ± 1.2% |
| AHUL | 99.8% ± 0.2% | 99.8% ± 1.1% | 99.9% ± 0.1% | 99.5% ± 2.6% |
| Weighted average | 99.8% ± 0.1% | 96.8% ± 2.2% * | 99.8% ± 0.1% | 98.4% ± 0.7% |
| Spring 2008 | | | | |
| NOR | 99.8% ± 0.3% | 99.3% ± 2.1% | 99.9% ± 0.1% | 99.6% ± 0.6% |
| OASB | 99.1% ± 1.5% | 95.0% ± 6.1% * | 99.7% ± 0.3% | 99.5% ± 1.4% |
| OADS | 99.9% ± 0.2% | 98.2% ± 1.7% * | 99.8% ± 0.1% | 99.5% ± 0.9% |
| EADS | 99.9% ± 0.1% | 98.3% ± 0.5% * | 99.9% ± 0.1% | 99.0% ± 2.8% |
| CHW | 99.7% ± 0.4% | 98.7% ± 0.8% * | 99.8% ± 0.2% | 99.3% ± 0.7% |
| RFF | 99.9% ± 0.2% | 82.6% ± 33.1% * | 99.8% ± 0.1% | 99.4% ± 0.7% |
| AFAB | 99.7% ± 0.2% | 42.9% ± 17.8% * | 99.7% ± 0.2% | 98.5% ± 4.9% |
| MADU | 98.6% ± 1.6% | 70.4% ± 39.8% * | 98.9% ± 0.2% | 98.0% ± 4.0% |
| HCSF | 99.6% ± 0.6% | 94.7% ± 6.6% * | 99.9% ± 0.0% | 99.5% ± 1.5% |
| Weighted average | 98.9% ± 0.2% | 86.2% ± 5.0% * | 99.9% ± 0.1% | 99.2% ± 0.5% |
| Winter 2008 | | | | |
| NOR | 99.6% ± 0.4% | 99.3% ± 1.1% | 99.8% ± 0.1% | 98.3% ± 2.4% |
| OADS | 99.9% ± 0.1% | 95.6% ± 3.8% * | 99.8% ± 0.2% | 98.7% ± 1.4% |
| OADL | 99.8% ± 0.4% | 98.5% ± 3.2% | 99.5% ± 0.7% | 98.5% ± 1.5% |
| EADS | 99.9% ± 0.4% | 97.9% ± 1.3% | 99.6% ± 0.3% | 97.5% ± 2.5% |
| CHW | 99.8% ± 0.4% | 97.5% ± 5.2% * | 99.6% ± 0.3% | 99.1% ± 1.2% |
| HCF | 99.8% ± 0.4% | 95.1% ± 4.5% * | 99.2% ± 0.7% | 97.2% ± 2.9% |
| HCRC | 99.8% ± 0.4% | 99.0% ± 2.2% | 99.8% ± 0.3% | 99.4% ± 1.1% |
| Weighted average | 99.7% ± 0.2% | 97.5% ± 0.7% | 99.8% ± 0.1% | 98.7% ± 0.7% |

$H_0^{*}$: The performance of NARX-TDNN using features from ERCE is not significantly better than using manually selected features.
a Manual selection utilizes Shun Li’s feature set [20].
b ERCE features are as shown in Figs. 14–16.
* Reject $H_0^{*}$ (α = 0.001).
Table 4
Performance comparison with competing feature selection methods, tested against two classification methods: NARX-TDNN and HMM.

| Feature selection | # of features | HMM: WA sensitivity | HMM: WA specificity | NARX-TDNN: WA sensitivity | NARX-TDNN: WA specificity |
|---|---|---|---|---|---|
| Summer 2007 | | | | | |
| Manual selection^a | 16 ± 0.00 | 98.65% ± 0.34% * | 89.45% ± 2.48% | 99.59% ± 0.12% † | 96.81% ± 1.99% † |
| EAC K-means | 29.85 ± 17.26 | 98.70% ± 0.50% * | 85.01% ± 4.94% * | 99.69% ± 0.22% † | 95.07% ± 3.75% *,† |
| WEAC K-means | 14.14 ± 13.09 | 97.69% ± 0.13% * | 72.85% ± 1.48% * | 99.79% ± 0.08% † | 96.85% ± 2.31% *,† |
| Complete linkage | 81.00 ± 0.00 | 98.71% ± 0.98% | 90.49% ± 7.52% | 99.51% ± 0.27% † | 96.42% ± 1.16% † |
| ERCE | 21.41 ± 4.46 | 99.15% ± 0.32% | 90.85% ± 4.16% | 99.69% ± 0.08% † | 97.61% ± 0.85% † |
| Spring 2008 | | | | | |
| Manual selection^a | 16 ± 0.00 | 98.90% ± 0.54% | 91.54% ± 2.98% † | 98.89% ± 0.23% * | 86.17% ± 5.01% * |
| EAC K-means | 34.56 ± 9.40 | 98.55% ± 0.42% | 91.85% ± 2.68% | 99.02% ± 0.81% *,† | 91.92% ± 6.42% * |
| WEAC K-means | 33.52 ± 10.32 | 98.83% ± 0.40% | 93.37% ± 2.38% | 99.20% ± 0.49% † | 92.37% ± 6.53% * |
| Complete linkage | 84 ± 0.00 | 98.80% ± 0.46% | 94.12% ± 2.61% | 99.62% ± 0.17% † | 95.14% ± 1.29% * |
| ERCE | 19.93 ± 5.19 | 98.84% ± 0.32% | 92.68% ± 2.66% | 99.79% ± 0.10% † | 98.37% ± 0.25% † |
| Winter 2008 | | | | | |
| Manual selection^a | 16 ± 0.00 | 98.81% ± 0.56% | 92.92% ± 0.31% * | 99.71% ± 0.15% † | 97.51% ± 0.65% † |
| EAC K-means | 27.74 ± 7.18 | 99.98% ± 0.14% † | 99.85% ± 0.85% † | 99.49% ± 0.50% | 97.87% ± 2.06% |
| WEAC K-means | 21.37 ± 11.75 | 99.96% ± 0.18% † | 99.79% ± 1.00% | 99.59% ± 0.19% | 97.68% ± 0.88% |
| Complete linkage | 95 ± 0.00 | 99.87% ± 0.40% | 99.21% ± 2.37% | 99.74% ± 0.13% | 98.54% ± 1.01% |
| ERCE | 7.88 ± 3.02 | 99.92% ± 0.31% | 99.49% ± 1.43% | 99.73% ± 0.19% | 98.35% ± 1.16% |

$H_0^{*}$: The performance of a classifier using features from ERCE is not significantly better than using features from algorithm X.
$H_0^{\dagger}$: Given the same feature selection algorithm, a trained classifier A does not exercise significantly better performance compared to classifier B.
a Manual selection utilizes Shun Li’s feature set [20].
* Reject $H_0^{*}$ (α = 0.001).
† Reject $H_0^{\dagger}$ (α = 0.001).
8. Conclusion
A method for automating feature selection and classification
of faults for Heating Ventilation and Air-Conditioning (HVAC) sys-
tems using a knowledge-discovery and Neural-Network approach
has been proposed. The core of the method is the Ensemble Rapid
Centroid Estimation (ERCE) which automatically finds characteris-
tic features and discards redundant features. Using these character-
istic features, a Parallel Nonlinear Auto-Regressive Neural Network
with eXogenous inputs and distributed time delays (NARX-TDNN)
is then trained to identify the faults described in ASHRAE-1312-RP
Summer 2007, Spring 2008, and Winter 2008 datasets.
The proposed unsupervised feature selection algorithm
(ERCE, Median NMI = 0.019 ± 0.004)
generally outperformed conventional consensus clustering
methods, including Evidence Accumulation K-means (Median
NMI = 0.040 ± 0.011) and Weighted Evidence Accumulation K-means
(Median NMI = 0.048 ± 0.034), as well as conventional complete
linkage clustering (Median NMI = 0.1305). ERCE also had smaller
standard deviations on all performance aspects, especially on the
number of features, suggesting relatively high reliability and
repeatability of the proposed swarm-based consensus clustering
algorithm.
The proposed feature selection method was tested on the
experimental fault data from the ASHRAE-1312-RP datasets, including
Summer 2007, Spring 2008, and Winter 2008, using two
well-established time-domain classifiers: (a) NARX-TDNN; and (b)
Hidden Markov Models (HMM). Satisfactory results were reported
and summarized. Our experimental results showed weighted average
sensitivity and specificity of: (a) higher than 99% and 96% for
NARX-TDNN; and (b) higher than 98% and 86% for HMM on the
ASHRAE-1312-RP datasets. The proposed feature selection method
appears to have a positive effect in improving the generalization
capability of both AFDD algorithms based on our experiment.
Notwithstanding the satisfactory results to date, further work
is necessary to investigate the performance of the proposed
method on alternative HVAC systems. Future work will incorporate
a semi-supervised adaptive learning capability for automatic
fault discovery. We are also interested in applying the proposed
consensus clustering method to other applications.
Acknowledgements
This research is funded by The Commonwealth Scientific and
Industrial Research Organisation (CSIRO), Marsfield, Australia. The
ASHRAE-1312-RP Summer 2007, Spring 2008, and Winter 2008
fault data are provided by CSIRO. The research is supervised
by CSIRO; the paper writing is supervised specifically by Guo.
Automatic Fault Detection and Diagnosis (AFDD) for Heating
Ventilation and Air Conditioning (HVAC) systems is an ongoing research
project in CSIRO Energy Technology and Computational Informatics.
We thank the anonymous reviewers for the time and effort they
spent providing comprehensive, high-quality criticism of our paper.
The corresponding author would also like to personally
acknowledge Nina Elita for her contribution, especially in
proofreading and in providing sincere moral support
during the preparation, writing and submission of this
paper.
References
[1] A. Kusiak, M. Li, F. Tang, Modeling and optimization of HVAC energy consumption, Appl. Energy 87 (2010) 3092–3102.
[2] A. Kusiak, F. Tang, G. Xu, Multi-objective optimization of HVAC system with an evolutionary computation algorithm, Energy 36 (2011) 2440–2449.
[3] J. Wall, Automatic Fault Detection and Diagnosis, 2011, http://www.csiro.au/Outcomes/Energy/building-fault-detection.aspx
[4] J. Ward, Opticool, 2013, http://www.csiro.au/Organisation-Structure/Flagships/Energy-Flagship/Opticool.aspx
[5] J. Liang, R. Du, Model-based fault detection and diagnosis of HVAC systems using support vector machine method, Int. J. Refrig. 30 (2007) 1104–1114.
[6] D. Jacob, S. Dietz, S. Komhard, C. Neumann, S. Herkel, Black-box models for fault detection and performance monitoring of buildings, J. Build. Perform. Simul. 3 (2010) 53–62.
[7] C. Lo, P. Chan, Y.-K. Wong, A.B. Rad, K. Cheung, Fuzzy-genetic algorithm for automatic fault detection in HVAC systems, Appl. Soft Comput. 7 (2007) 554–560.
[8] J. Schein, S.T. Bushby, N.S. Castro, J.M. House, A rule-based fault detection method for air handling units, Energy Build. 38 (2006) 1485–1492.
[9] T.M. Rossi, J.E. Braun, A statistical, rule-based fault detection and diagnostic method for vapor compression air conditioners, HVAC&R Res. 3 (1997) 19–37.
[10] J. Schein, Results from Field Testing of Embedded Air Handling Unit and Variable Air Volume Box Fault Detection Tools, U.S. Dept. of Commerce, Technology Administration, National Institute of Standards and Technology, 2006.
[11] J. Wall, Y. Guo, J. Li, S. West, A dynamic machine learning-based technique for automated fault detection in HVAC systems, in: Proceedings of the ASHRAE Annual Conference, Montreal, Quebec, Canada, 2011, pp. 449–456.
[12] Y. Guo, D. Dehestani, J. Li, J. Wall, S. West, S. Su, Intelligent outlier detection for HVAC system fault detection, in: Proceedings of the 10th International Healthy Buildings Conference, Brisbane, Queensland, Australia, 2012.
[13] Y. Guo, J. Wall, J. Li, S. West, Intelligent model based fault detection and diagnosis for HVAC system using statistical machine learning methods, in: Proceedings of the ASHRAE 2013 Winter Conference, Dallas, USA, 2013.
[14] M. Yuwono, S.W. Su, Y. Guo, J. Li, S. West, J. Wall, Automatic feature selection using multiobjective cluster optimization for fault detection in a heating ventilation and air conditioning system, in: Proceedings of the 2013 1st International Conference on Artificial Intelligence, Modelling and Simulation, AIMS ’13, IEEE Computer Society, Washington, DC, USA, 2013, pp. 171–176, http://dx.doi.org/10.1109/AIMS.2013.34
[15] W. Deng, X. Yang, L. Zou, M. Wang, Y. Liu, Y. Li, An improved self-adaptive differential evolution algorithm and its application, Chemometr. Intell. Lab. Syst. 128 (2013) 66–76, http://dx.doi.org/10.1016/j.chemolab.2013.07.004
[16] L. Wang, C.-X. Dun, W.-J. Bi, Y.-R. Zeng, An effective and efficient differential evolution algorithm for the integrated stochastic joint replenishment and delivery model, Knowl.-Based Syst. 36 (2012) 104–114, http://dx.doi.org/10.1016/j.knosys.2012.06.007
[17] M. Yuwono, S. Su, B. Moulton, H. Nguyen, Data clustering using variants of rapid centroid estimation, IEEE Trans. Evol. Comput. 18 (2013) 366–377.
[18] M. Yuwono, S. Su, B. Moulton, H. Nguyen, An algorithm for scalable clustering: ensemble rapid centroid estimation, in: Proceedings of the 2014 IEEE Congress on Evolutionary Computation, 2014, pp. 1250–1257.
[19] D.W. van der Merwe, A.P. Engelbrecht, Data clustering using particle swarm optimization, in: Proceedings of the 2003 IEEE Congress on Evolutionary Computation, vol. 1, 2003, pp. 215–220.
[20] S. Li, A Model-Based Fault Detection and Diagnostic Methodology for Secondary HVAC Systems (Ph.D. thesis), Drexel University, 2014.
[21] S. Kullback, R.A. Leibler, On information and sufficiency, Ann. Math. Stat. 22 (1951) 79–86, http://dx.doi.org/10.1214/aoms/1177729694
[22] S. Monti, P. Tamayo, J. Mesirov, T. Golub, Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data, Mach. Learn. 52 (2003) 91–118, http://dx.doi.org/10.1023/A:1023949509487
[23] M.D. Wilkerson, D.N. Hayes, ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking, Bioinformatics 26 (2010) 1572–1573.
[24] D.N. Hayes, S. Monti, G. Parmigiani, C.B. Gilks, K. Naoki, A. Bhattacharjee, M.A. Socinski, C. Perou, M. Meyerson, Gene expression profiling reveals reproducible human lung adenocarcinoma subtypes in multiple independent patient cohorts, J. Clin. Oncol. 24 (2006) 5079–5090.
[25] A. Fred, A. Jain, Combining multiple clusterings using evidence accumulation, IEEE Trans. Pattern Anal. Mach. Intell. 27 (2005) 835–850, http://dx.doi.org/10.1109/TPAMI.2005.113
[26] A. Strehl, J. Ghosh, Cluster ensembles – a knowledge reuse framework for combining multiple partitions, J. Mach. Learn. Res. 3 (2003) 583–617, http://dx.doi.org/10.1162/153244303321897735
[27] I.J. Leontaritis, S.A. Billings, Input–output parametric models for non-linear systems. Part I: Deterministic non-linear systems, Int. J. Control 41 (1985) 303–328, http://dx.doi.org/10.1080/0020718508961129
[28] H. Siegelmann, B. Horne, C. Giles, Computational capabilities of recurrent NARX neural networks, IEEE Trans. Syst. Man Cybern. Part B: Cybern. 27 (1997) 208–215, http://dx.doi.org/10.1109/3477.558801
[29] J.M. Menezes Jr., G. Barreto, A new look at nonlinear time series prediction with NARX recurrent neural network, in: Ninth Brazilian Symposium on Neural Networks, SBRN ’06, 2006, pp. 160–165, http://dx.doi.org/10.1109/SBRN.2006.7
[30] T. Wang, Comparing hard and fuzzy C-means for evidence-accumulation clustering, in: Proceedings of the 18th International Conference on Fuzzy Systems, FUZZ-IEEE’09, IEEE Press, Piscataway, NJ, USA, 2009, pp. 468–473.
[31] F. Duarte, A.L.N. Fred, A. Lourenco, M. Rodrigues, Weighting cluster ensembles in evidence accumulation clustering, in: Portuguese Conference on Artificial Intelligence, EPIA 2005, 2005, pp. 159–167, http://dx.doi.org/10.1109/EPIA.2005.341287
[32] M. Yuwono, S.W. Su, B.D. Moulton, H.T. Nguyen, Fast unsupervised learning method for rapid estimation of cluster centroids, in: Proceedings of the 2012 IEEE Congress on Evolutionary Computation, 2012, pp. 889–896.
[33] J.C. Bezdek, Mathematical models for systematics and taxonomy, in: G. Estabrook (Ed.), Proceedings of the 8th International Conference on Numerical Taxonomy, Freeman, San Francisco, CA, 1975, pp. 143–166.
[34] T. Wang, Ca-tree: a hierarchical structure for efficient and scalable coassociation-based cluster ensembles, IEEE Trans. Syst. Man Cybern. Part B: Cybern. 41 (2011) 686–698, http://dx.doi.org/10.1109/TSMCB.2010.2086059
[35] P.J. Rousseeuw, Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math. 20 (1987) 53–65.