Vibration-Based Damage Detection
Using Unsupervised Support Vector Machine
Ching-Huei Tsou1 and John R. Williams2
{tsou, jrw}@mit.edu
Abstract:
Vibration-based damage detection methods can be used to identify hidden damage in structural
components. The traditional modal-based system identification paradigm requires a detailed model of
the structure, such as a finite element model. This paper describes a novel statistical damage
detection approach based on a support vector machine methodology. The proposed approach is
computationally efficient even when the number of features is large, and it does not suffer from the
local minima problem encountered by artificial neural networks. We build the statistical
model through unsupervised learning, avoiding the need for measurements from the
damaged structure, which are unavailable in many real-world problems. Extracting significant
features from raw vibration time series data is crucial to the efficiency and scalability of statistical
methods. A feature selection algorithm is presented along with the construction of our
statistical model. Numerical simulations, including the ASCE benchmark problem, are analyzed
to examine the accuracy and scalability of our approach. We show that the proposed approach
is able to detect both the occurrence and the location of damage, and that our feature selection scheme
can effectively reduce the required dimensions while retaining high accuracy.
1
Graduate Student, Intelligent Engineering Systems Laboratory (IESL), Department of Civil and
Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
2
Associate Professor, Director of IESL, Department of Civil and Environmental Engineering and
Engineering Systems Division, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
1. Introduction
The process of implementing a damage detection strategy is referred to as structural health
monitoring (SHM), and can be categorized into five stages [1]: (1) detecting the existence of
damage, (2) locating the damage, (3) identifying the type of the damage, (4) determining the
severity of the damage, and (5) predicting the remaining service life of the structure. Research has
been conducted in this field during the past decade and detailed literature reviews on vibration-
based damage detection methods can be found in [2-4]. The basic reasoning behind all vibration-
based damage detection is that the stiffness, mass, or energy dissipation behavior of the structure
will change significantly when damage occurs. These properties can be measured and detected by
monitoring the dynamic response of the system. When compared to other nondestructive damage
detection (NDD) techniques, such as ultrasonic scanning, acoustic emission, x-ray inspection,
etc., the vibration-based method has the advantage of providing a global evaluation of the state of
the structure.
Traditional vibration-based damage identification applications rely on detailed finite element
models of the undamaged structures, and damage diagnosis is made by comparing the modal
responses, such as frequencies and mode shapes, of the model and the potentially damaged
structure. These system identification approaches have been shown to be very accurate, provided that the
models produce robust and reliable modal estimates and that a large amount of high-quality data is
available. These two requirements, however, cannot always be met in the field.
To overcome these difficulties, pattern recognition based approaches have been proposed [5-8].
Instead of building models from the physical properties of the structures, these methods construct
statistical models directly from the vibration response data. This reduces the complexity of the
modeling process, at the cost of losing the physical meaning of the model. These methods have
been shown to be accurate in damage detection and are less sensitive to data quality; however,
some problems remain. For example, methods that use autoregressive models (AR/ARX)
[5] may not fit the vibration data well because they give only linear approximations.
Complex statistical models are less efficient and offer little control over the generalization
bound, i.e., they may fit the historical data perfectly but give no guarantee on future data.
Methods based on artificial neural networks (ANN) [8] often suffer from the local minima problem,
cannot be trained efficiently, and do not scale well to large problems. Methods that use the
support vector machine (SVM) have also been proposed [7, 9], with the SVM used to perform
supervised, binary classification.
In this paper, we propose using one-class SVM and support vector regression (SVR) to perform
unsupervised learning. This approach does not require training samples from the damaged structure,
which are usually unavailable in practice. Training an SVM is mathematically equivalent
to solving a convex quadratic programming (QP) problem, which has no local minima. The
lack of local minima means it can be trained faster than an ANN. Finally, an SVM with a linear
kernel is used to reduce the number of features in our model. This leads to a statistical model that is
efficient, accurate, and easy to implement. Numerical simulations are provided to examine the
performance and accuracy of this approach.
2. Theory of SVM and Its Application in Damage Detection
We propose an SVM-based approach in this paper because of its theoretical advantages over other
learning algorithms. SVM has been applied in various pattern recognition fields, and introducing
it into SHM is not new. Nevertheless, SVM itself has evolved considerably during the past few
years, and these developments shed new light on its applications in SHM. In this section, we
first review the motivation and algorithm of SVM. We then introduce two extensions
of SVM that are able to perform unsupervised learning, and show how they can be applied in the
damage detection scheme.
2.1 Support Vector Machine
The support vector machine was developed by Vapnik et al. [10] based on the structural risk
minimization (SRM) principle from statistical learning theory, rather than the empirical risk
minimization (ERM) principle used by most other learning algorithms (risk means test error in this
context). This fundamental difference allows SVM to select, from a family of functions, a classifier
that not only fits the training data well but also has a bounded generalization error,
i.e., better predictive power [11]. Together with kernel techniques, SVM has shown superior
performance in both speed and accuracy, and it has outperformed artificial neural networks
(ANN) in a wide variety of applications [12]. We introduce the algorithm by discussing
the simplest case, a linear classifier trained on separable binary data. Assume we have $l$ training
examples,
\[ \{\mathbf{x}_i, y_i\}, \quad i = 1, \ldots, l, \quad \text{where } y_i \in \{-1, 1\},\ \mathbf{x}_i \in \mathbb{R}^n \]
The $\mathbf{x}_i$ are often referred to as patterns or inputs, and the $y_i$ are called the labels or outputs of the examples. A
linear classifier (a hyperplane in $\mathbb{R}^n$) can be defined as,
\[ f(\mathbf{x}_i) = \mathbf{x}_i^T \mathbf{w}_1 + w_0 = 0 \]
where $\mathbf{w}_1$ is a vector normal to the hyperplane, and $w_0$ is a scalar constant. We can also define
two auxiliary hyperplanes by $f(\mathbf{x}_i) = \mathbf{x}_i^T \mathbf{w}_1 + w_0 = \pm 1$. It is easy to show that each of the
two parallel hyperplanes has a perpendicular distance to the original hyperplane equal to $1/\|\mathbf{w}_1\|$.
This distance is often referred to as the “margin”. Because the data is separable, we can always
find hyperplanes that separate the training samples perfectly. The solution
is not unique, and the SVM algorithm looks for the one that gives the maximum margin. The
optimization problem for the above process can be expressed as,
\[ \text{minimize:} \quad \frac{1}{2}\|\mathbf{w}_1\|^2 \]
\[ \text{subject to:} \quad y_i(\mathbf{x}_i^T \mathbf{w}_1 + w_0) \ge 1 \]
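The margin value quoted above can be checked with a short derivation. The point $\mathbf{x}_0$ is introduced here only for illustration and does not appear in the original text.

```latex
% Take any point x_0 on the auxiliary hyperplane x^T w_1 + w_0 = 1.
% Its perpendicular distance to the hyperplane x^T w_1 + w_0 = 0 is
\[
d = \frac{|\mathbf{x}_0^T \mathbf{w}_1 + w_0|}{\|\mathbf{w}_1\|}
  = \frac{1}{\|\mathbf{w}_1\|}
\]
% so maximizing d is equivalent to minimizing \tfrac{1}{2}\|\mathbf{w}_1\|^2.
```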
To extend to inseparable data, we can introduce slack variables $\xi_i$ to relax the constraints, and
then add some penalty to the relaxation. The new optimization problem becomes,
\[ \text{minimize:} \quad \frac{1}{2}\|\mathbf{w}_1\|^2 + C \sum_{i=1}^{l} \max\{1 - y_i(\mathbf{x}_i^T \mathbf{w}_1 + w_0),\ 0\} \]
\[ \text{subject to:} \quad y_i(\mathbf{x}_i^T \mathbf{w}_1 + w_0) \ge 1 - \xi_i \quad \text{and} \quad \xi_i \ge 0 \]
where $C$ is a constant determining the trade-off between our two conflicting goals: maximizing
the margin, and minimizing the training error. For computational simplicity, we can further
transform the optimization problem into its dual form by using Lagrange multipliers, denoted by
$\alpha_i$, and the result becomes,
\[ \text{maximize:} \quad \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^T \mathbf{x}_j \]
\[ \text{subject to:} \quad \sum_{i=1}^{l} \alpha_i y_i = 0 \quad \text{and} \quad C \ge \alpha_i \ge 0 \]
For every primal constraint that is not met as an equality, the corresponding $\alpha_i$ must be
zero. This follows from the Karush-Kuhn-Tucker (KKT) conditions in optimization theory.
Examples with non-zero $\alpha_i$ are called the support vectors, and the classifier is determined by
the support vectors alone,
\[ f(\mathbf{x}_i) = \mathbf{x}_i^T \mathbf{w}_1 + w_0 = \sum_{j=1}^{N_{SV}} \alpha_j y_j \mathbf{x}_i^T \mathbf{x}_j + w_0 \]
where $N_{SV}$ denotes the total number of support vectors.
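As a minimal sketch of how the classifier is evaluated from the support vectors alone, the following Python fragment computes the decision value for a new point. The two-point data set, the multipliers, and the function names are hypothetical and chosen by hand so that the maximum-margin solution is easy to verify; no QP solver is involved.

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def decision_value(x, support_vectors, labels, alphas, w0):
    """f(x) = sum_j alpha_j * y_j * <x, x_j> + w0, summed over the support vectors."""
    return sum(a * y * dot(x, sv)
               for sv, y, a in zip(support_vectors, labels, alphas)) + w0


# Hypothetical two-point problem: the maximum-margin hyperplane is x[0] = 0,
# so w1 = (1, 0), w0 = 0, and both multipliers equal 0.5.
support_vectors = [(1.0, 0.0), (-1.0, 0.0)]
labels = [1, -1]
alphas = [0.5, 0.5]
w0 = 0.0

value = decision_value((2.0, 1.0), support_vectors, labels, alphas, w0)
predicted_label = 1 if value >= 0 else -1
```

Only the support vectors enter the sum, so the cost of one prediction grows with the number of support vectors rather than with the size of the training set.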
To extend the algorithm from linear to nonlinear, we define a mapping function $\phi : \mathbb{R}^n \to H$
which maps $\mathbf{x}_i$ from its original Euclidean space to a reproducing kernel Hilbert space (RKHS).
The original space is often referred to as the sample space, and the RKHS is called the feature
space. Without loss of generality in our context, we can simply think of an RKHS as a generalized
Euclidean space which can have infinite dimensions. By replacing $\mathbf{x}_i$ in the optimization problem
with $\phi(\mathbf{x}_i)$ and performing linear classification in the corresponding RKHS, the solution becomes,
\[ f(\mathbf{x}_i) = \phi^T(\mathbf{x}_i)\,\mathbf{w}_1 + w_0 = \sum_{j=1}^{N_{SV}} \alpha_j y_j\, \phi^T(\mathbf{x}_i)\phi(\mathbf{x}_j) + w_0 = \sum_{j=1}^{N_{SV}} \alpha_j y_j\, K(\mathbf{x}_i, \mathbf{x}_j) + w_0 \]
The mapping function $\phi$ is often called the feature map, and the dot product $\phi^T(\mathbf{x}_i)\phi(\mathbf{x}_j) = K(\mathbf{x}_i, \mathbf{x}_j)$
is known as the kernel. Popular choices of kernel include the linear kernel, the polynomial kernel, and the
radial basis function (RBF) kernel. When nonlinear kernels are used, the classifier above is no longer a linear
function in the original Euclidean space.
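The three kernels named above can be sketched in a few lines of Python. The exact parametrizations are assumptions: the paper does not state its polynomial degree or offset, and the RBF form $\exp(-\|\mathbf{x}-\mathbf{z}\|^2 / (2\sigma^2))$ is one common convention rather than a confirmed choice of the authors.

```python
import math


def linear_kernel(x, z):
    """K(x, z) = <x, z>."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, offset=1.0):
    """K(x, z) = (<x, z> + offset) ** degree (degree and offset are assumed defaults)."""
    return (linear_kernel(x, z) + offset) ** degree


def rbf_kernel(x, z, sigma2=20.0):
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma2)) (assumed parametrization)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma2))


def kernel_decision_value(x, support_vectors, labels, alphas, w0, kernel):
    """f(x) = sum_j alpha_j * y_j * K(x, x_j) + w0."""
    return sum(a * y * kernel(x, sv)
               for sv, y, a in zip(support_vectors, labels, alphas)) + w0
```

Swapping the `kernel` argument changes the shape of the decision boundary without touching the rest of the evaluation code.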
Because training an SVM corresponds to solving a convex QP problem, it has no local
minima and can be trained faster than algorithms that do, such as ANN. It can be shown that
$N_{SV} \ll l$ for easier problems, i.e., problems with small generalization errors. This leads to a
sparse matrix in the equations above, which means the optimization problem can be solved efficiently.
Also, through SRM and the VC dimension [10], SVM provides a bounded generalization error and a
systematic way to select the complexity of the solution function, which effectively controls
overfitting. A detailed discussion of these properties is beyond the scope of this paper
and can be found in many recent statistical learning textbooks [13, 14].
2.2 One-Class Support Vector Machine
SVM is originally a supervised, batch learning algorithm, and it has been applied in the SHM field
[7, 9] to perform binary classification tasks. A major challenge is that data measured from the
damaged structure is often not available in practical situations, so unsupervised learning
methods are more desirable [15]. Similar needs also occur in other domains, and researchers in the
machine learning and pattern recognition communities have extended the idea of SVM to
unsupervised learning, often referred to as one-class SVM [16].
Instead of finding a hyperplane that maximizes the margin between two classes in the RKHS,
one-class SVM maximizes the distance from the hyperplane to the origin. The corresponding
optimization problem becomes,
\[ \text{minimize:} \quad \frac{1}{2}\|\mathbf{w}_1\|^2 + \frac{1}{\nu l} \sum_{i} \xi_i - \rho \]
\[ \text{subject to:} \quad \phi^T(\mathbf{x}_i)\,\mathbf{w}_1 \ge \rho - \xi_i \quad \text{and} \quad \xi_i \ge 0 \]
where $\nu \in (0, 1]$ is a parameter similar to the $C$ introduced for the soft-margin SVM, and $\rho$ is an offset which is
calculated automatically during the optimization. If a translation-invariant kernel is used (e.g.,
the RBF kernel), the goal of one-class SVM can also be thought of as finding small spheres that
contain most of the training samples.
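To make the one-class objective concrete, the sketch below evaluates it for a fixed hyperplane and counts the samples that fall on the origin side. It assumes an identity feature map (a linear kernel) and uses hand-picked, non-optimized values of $\mathbf{w}_1$ and $\rho$; the data and function names are hypothetical.

```python
def one_class_objective(w, rho, samples, nu):
    """0.5*||w||^2 + 1/(nu*l) * sum(xi_i) - rho, with xi_i = max(rho - <w, x_i>, 0)."""
    l = len(samples)
    slacks = [max(rho - sum(a * b for a, b in zip(w, x)), 0.0) for x in samples]
    return 0.5 * sum(a * a for a in w) + sum(slacks) / (nu * l) - rho


def is_outlier(w, rho, x):
    """A sample is flagged when it lies on the origin side of the hyperplane."""
    return sum(a * b for a, b in zip(w, x)) < rho


# Hypothetical training samples and a hand-picked (not optimized) hyperplane.
samples = [(2.0, 0.0), (3.0, 0.0), (0.5, 0.0), (2.5, 0.0)]
w, rho, nu = (1.0, 0.0), 1.0, 0.5

objective = one_class_objective(w, rho, samples, nu)
outlier_fraction = sum(is_outlier(w, rho, x) for x in samples) / len(samples)
```

The outlier fraction computed this way is the same kind of quantity as the outlier proportions reported in the numerical studies, although those were obtained with an RBF kernel and a trained model.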
2.3 Support Vector Regression
SVM was first developed for classification, where the labels $y_i$ represent a finite number of
possible categories. The algorithm can be extended to estimate real-valued functions by allowing
$y_i$ to take real values and defining a suitable loss function [17]. The following loss function,
\[ |f(\mathbf{x}_i) - y_i|_\varepsilon = \max\{|f(\mathbf{x}_i) - y_i| - \varepsilon,\ 0\} \]
known as the $\varepsilon$-insensitive loss function, assigns no penalty to points within the $\varepsilon$ range, and this
carries the sparseness property over from SVM to SVR. Again, the estimated function can be
expressed in the same kernel expansion form as the SVM classifier, and the goal now is to minimize,
\[ \frac{1}{2}\|\mathbf{w}_1\|^2 + C \sum_{i=1}^{l} \max\{|f(\mathbf{x}_i) - y_i| - \varepsilon,\ 0\} \]
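The $\varepsilon$-insensitive loss and the resulting SVR objective are simple to state in code. The following is a minimal sketch with hypothetical function names; it evaluates the objective for given predictions and does not solve the optimization problem.

```python
def eps_insensitive_loss(prediction, target, eps):
    """max(|prediction - target| - eps, 0): zero inside the eps-tube."""
    return max(abs(prediction - target) - eps, 0.0)


def svr_objective(w, predictions, targets, C, eps):
    """0.5*||w||^2 + C * sum of eps-insensitive losses."""
    penalty = sum(eps_insensitive_loss(p, t, eps)
                  for p, t in zip(predictions, targets))
    return 0.5 * sum(a * a for a in w) + C * penalty
```

Points whose residual is smaller than `eps` contribute nothing to the penalty, which is why they do not become support vectors.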
The basic ideas of SVM, one-class SVM and SVR are summarized in Table 1 and Figure 1. For
simplicity, the discriminant functions of SVM and one-class SVM are drawn as linear
functions. As mentioned, when nonlinear kernels are used, the functions are by no means linear in
the sample space.

| Method | Maximize | Penalty |
|---|---|---|
| SVM | distance between two hyperplanes | misclassified samples and samples within the margin |
| One-class SVM | distance between the hyperplane and origin | misclassified samples |
| SVR | smoothness of the function | samples outside the ε-tube |

Table 1. Comparison of SVM, one-class SVM and SVR
Figure 1. Geometric interpretation of SVM, one-class SVM and SVR in 2D
2.4 Damage Detection Using SVM
Vibration-based damage detection approaches are grounded on the assumption that the dynamic
response of the system will change significantly when damage occurs. We propose using SVM
for the detection, either through novelty detection or regression, and it is essential to have a
reasonable representation of the dynamic response before we can feed the data into SVM.
A time series is usually modeled by splitting it into a series of windows, with the value at each time
point determined by a set of its previous values, i.e.,
\[ x_t = f(x_{t-\tau}, x_{t-2\tau}, \ldots, x_{t-m\tau}) \]
where $m$ and $\tau$ are referred to as the embedding dimension and delay time [18], respectively.
Through this representation, an acceleration response series can be transformed into a data set of
fixed-length vectors and used by SVM. Damage detection is conducted by examining the
similarity and dissimilarity among data collected from different structural states. A detailed analysis
procedure is given in the numerical studies section.
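The windowing step can be sketched as a time-delay embedding. The function name, the toy series, and the choice to return both the lagged vectors and the value they are meant to predict are illustrative assumptions, not the authors' implementation.

```python
def delay_embed(series, m, tau=1):
    """Build fixed-length patterns from a time series.

    patterns[k] = [x[t - tau], x[t - 2*tau], ..., x[t - m*tau]] and
    targets[k]  = x[t], for every t with a full history available.
    """
    patterns, targets = [], []
    for t in range(m * tau, len(series)):
        patterns.append([series[t - k * tau] for k in range(1, m + 1)])
        targets.append(series[t])
    return patterns, targets


# Toy series for illustration only.
series = [0.0, 0.1, 0.4, 0.9, 1.6, 2.5]
patterns, targets = delay_embed(series, m=3, tau=1)
```

For novelty detection only `patterns` is needed; for regression, `targets` supplies the real-valued outputs.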
3. Feature Selection
As mentioned, the statistical approach is an attractive alternative to approaches based on
high-order physical models in the sense that it is computationally competitive, less
sensitive to modeling error and data quality, and requires only measurement signals to build the
model. SVM is among the fastest algorithms in statistical learning; however, for large-scale
problems the SVM algorithm is still slow, and further reducing the computational complexity is
necessary.
3.1 The Motivation of Feature Selection in the Proposed Approach
Although in the dual form of SVM we face a QP problem whose computational complexity
is approximately proportional to the square of the number of training examples $l$, not the number of
features (see the dual problem in section 2.1), reducing the number of features nevertheless helps to improve
performance. For example, dot products between feature vectors are frequently required when
evaluating a kernel function, and this is time-consuming when the number of features is
large. When implementing an SVM solver, we often cache these results to improve
performance, which in turn raises memory consumption. Besides, field data is
often polluted by noise and redundant information, and feature selection provides a way of
identifying and eliminating them from the feature set. This not only improves the computational
efficiency but also increases the accuracy.
3.2 Feature Selection using SVM
By looking at the solution of the primal form of SVM, $f(\mathbf{x}_i) = \phi^T(\mathbf{x}_i)\,\mathbf{w}_1 + w_0$, we can see that each
component of $\mathbf{w}_1$ can be thought of as the weight of its corresponding feature in $\phi(\mathbf{x}_i)$ in the RKHS.
Feature reduction is done by removing features with zero weights from the set.
The primal SVM optimization problem minimizes $\|\mathbf{w}_1\|^2$ while obeying all the constraints, which
forces the value of each component $w_i$ to be small, but does not set it to zero because the
derivative of $\|\mathbf{w}_1\|^2$ at $w_i \approx 0$ is small. We could replace $\|\mathbf{w}_1\|^2$ by the unsquared norm $\|\mathbf{w}_1\|$ in SVM to address this
problem, but this would prevent us from transforming SVM into its dual form, losing all of its advantages. The
simplest way around this is to set a threshold for $|w_i|$ and remove the features associated with
weights smaller than the threshold.
Using the time series model of section 2.4, the target of feature reduction in our damage
detection approach is to reduce the embedding dimension, i.e., the length of the patterns in the
sample space. Because we are targeting the sample space, no feature mapping is needed, and a
linear kernel is suitable for this scenario because of its efficiency. Note that the choice of kernel in
feature selection is independent of the choice of kernel in the classification or regression stage.
When a linear kernel is used, the features in the RKHS are the patterns in the sample space
themselves. This is why we use the conventional term “feature selection” throughout this paper,
although we are actually doing “pattern selection”.
The value of $\mathbf{w}_1$ is not calculated when solving SVM because only its dot product with the mapped inputs is required,
which can be obtained more efficiently by evaluating the kernel function. When doing feature
selection, we need to calculate $\mathbf{w}_1$ explicitly. The following relation is obtained while deriving
the dual form of SVM with a linear kernel,
\[ \mathbf{w}_1 = \sum_{i=1}^{N_{SV}} \alpha_i y_i \mathbf{x}_i \]
which can be used to determine $\mathbf{w}_1$ once the corresponding SVM is solved.
This feature selection approach allows us to reduce the number of features while maintaining
accuracy, as our numerical studies show.
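The selection rule can be sketched as follows: recover the weight vector from the dual solution of a linear-kernel SVM, then keep only the features whose absolute weight reaches a threshold. The support vectors, multipliers, threshold value, and function names below are hypothetical; in practice the multipliers would come from a trained SVM.

```python
def primal_weights(support_vectors, labels, alphas):
    """w1 = sum_i alpha_i * y_i * x_i for a linear-kernel SVM."""
    n = len(support_vectors[0])
    w = [0.0] * n
    for x, y, a in zip(support_vectors, labels, alphas):
        for k in range(n):
            w[k] += a * y * x[k]
    return w


def select_features(w, threshold):
    """Indices of features whose absolute weight is at least the threshold."""
    return [k for k, wk in enumerate(w) if abs(wk) >= threshold]


# Hypothetical dual solution with three features per pattern.
support_vectors = [(1.0, 0.2, 0.0), (-1.0, 0.2, 0.5)]
labels = [1, -1]
alphas = [0.5, 0.5]

w = primal_weights(support_vectors, labels, alphas)
kept = select_features(w, threshold=0.1)  # threshold chosen for illustration
```

Here the second feature takes the same value in both classes, so its weight cancels and it is dropped.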
4. Numerical Studies
In this section, we demonstrate the proposed approach using a simple 2-story shear building and
the ASCE benchmark problem [19]. In all examples, acceleration responses are first normalized
using,
\[ \hat{a}_i = (a_i - \mu_a) / \sigma_a \]
where $\mu_a$ and $\sigma_a$ are the sample mean and sample standard deviation of the acceleration signal,
respectively. Also, in all examples, acceleration response means the relative
acceleration response between two adjacent floors, i.e., the acceleration difference between the
current floor and the one below. By doing this, we do not need to deal with the scale and units
of the loading, and we can better isolate the effect of damage in each story. The values of the SVM-related
parameters, such as $C$, $\nu$, $\varepsilon$ and $\sigma$ in the RBF kernel, are selected based on common
practice in pattern recognition and are specified in each example. In general, we obtain
similar results as long as those parameters are within a reasonable range.
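A minimal sketch of this preprocessing is given below. The order of operations (relative acceleration first, then normalization) is an assumption, as the text does not state it, and the function names and sample values are hypothetical.

```python
import statistics


def relative_acceleration(current_floor, floor_below):
    """Acceleration difference between a floor and the one below it."""
    return [c - b for c, b in zip(current_floor, floor_below)]


def normalize(signal):
    """Subtract the sample mean and divide by the sample standard deviation."""
    mu = statistics.mean(signal)
    sigma = statistics.stdev(signal)  # sample (n - 1) standard deviation
    return [(a - mu) / sigma for a in signal]


# Toy signals for illustration only.
second_floor = [3.0, 5.0, 8.0]
first_floor = [2.0, 3.0, 5.0]

relative = relative_acceleration(second_floor, first_floor)
normalized = normalize(relative)
```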
4.1 Two-Story Shear Building
We start with a simple 2-story shear building shown in Figure 2. Damage is modeled by reducing
the stiffness of a column. Vibration data are collected through accelerometers attached under each
floor. Three different SVM-based approaches are used for damage detection, namely, (1) supervised
SVM, (2) one-class SVM, and (3) support vector regression.
[Figure 2 sketch: frame with accelerometers A and B; dimension labels 0.2 m, 0.5 m and 0.5 m]
Figure 2. Plane Steel Frame under Transverse Seismic Load ($EI = 6.379\ \mathrm{N \cdot m^2}$ for all columns)
4.1.1 Damage Detection Using Supervised Support Vector Machine
Figure 3 shows the acceleration response of the structure under the 1940 El Centro earthquake
load, and each time series corresponds to a different structure status. The damage in a floor is
modeled by reducing the stiffness of one of the columns in that floor by 50%.
[Figure 3 plot: Acceleration (g) versus Time (sec); series: Undamaged, 1F Damaged, 2F Damaged]
Figure 3. Acceleration Measurements from Accelerometer A (1F, El Centro)
The vibration data are recorded at locations A and B with a sampling rate of 50 Hz. Therefore,
with a 2-second-long window, we can extract 100 patterns from the time series for each example.
Knowing the patterns and their corresponding labels (undamaged, 1st floor damaged, or 2nd floor
damaged), we can feed these data into a support vector machine (using $C = 100$ and an RBF kernel
with $\sigma^2 = 20$). The results of a 5-fold cross validation are shown in Table 1. We can see that SVM
is able to detect the occurrence as well as the location of the damage with very high accuracy,
provided the patterns are long enough. The trial-and-error way of selecting patterns here
will be replaced by our feature selection algorithm in section 4.2.2.
| # of Patterns | CV 1 | CV 2 | CV 3 | CV 4 | CV 5 | Average |
|---|---|---|---|---|---|---|
| 100 | 97 / 120 | 89 / 120 | 90 / 120 | 87 / 120 | 78 / 120 | 76.2% |
| 150 | 111 / 120 | 111 / 120 | 119 / 120 | 117 / 120 | 116 / 120 | 95.7% |
| 200 | 120 / 120 | 120 / 120 | 120 / 120 | 120 / 120 | 120 / 120 | 100% |

Table 1. Cross Validation Results (3 structure status; El Centro)
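The averaging behind the cross validation table can be sketched as below. The fold-splitting helper is a hypothetical illustration, since the paper does not say how the folds were formed; the per-fold counts are those of the 150-pattern row above.

```python
def kfold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def average_accuracy(correct_per_fold, fold_size):
    """Total number of correct predictions divided by total number tested."""
    return sum(correct_per_fold) / (fold_size * len(correct_per_fold))


folds = kfold_indices(600, 5)  # 5 folds of 120 test samples each
acc_150 = average_accuracy([111, 111, 119, 117, 116], 120)
```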
In the next example, the same structure is excited using two different seismic loads. For each
load, acceleration responses in two different structure status (undamaged / 1st floor damaged) are
recorded, as shown in Figure 4.
[Figure 4 plot: Acceleration (g) versus Time (sec); series: Undamaged, 1F damaged, Undamaged (Kobe), 1F damaged (Kobe)]
Figure 4. Acceleration Measurements from Accelerometer A (El Centro and Kobe)
The purpose here is to detect damage in the structure, regardless of the source of the excitation
that causes it. We mix the acceleration responses measured from the structure
under the two excitations and train SVM with 2 classes (damaged or undamaged) instead of 4.
The cross validation results are shown in Table 2. Even with mixed excitations, SVM still
achieves accurate detection. When we group the training samples with the same structure status
together, we are implicitly indicating that the excitation is not a feature we care about; hence
SVM focuses on maximizing the differences caused by changes in the structure.
| # of Patterns | CV 1 | CV 2 | CV 3 | CV 4 | CV 5 | Average |
|---|---|---|---|---|---|---|
| 200 | 145 / 160 | 151 / 160 | 146 / 160 | 144 / 160 | 147 / 160 | 91.6% |

Table 2. Cross Validation Results (2 structure status; El Centro & Kobe)
4.1.2 Damage Detection Using One-Class Support Vector Machine
Although supervised SVM classification is accurate and easy to implement, in practice we often
do not have vibration data from the damaged structure beforehand. In this section, we apply one-class
SVM to the same structure used before. Similarly, we extract features from the acceleration
response by setting a window size of 2 seconds (100 patterns for each example), and we
again use an RBF kernel with $\sigma^2 = 20$, and $\nu = 0.1$. Three one-class SVM models are trained using
response data measured from the undamaged structure, each with a different seismic load. Each
model is then used to test response data measured from both the damaged and undamaged structure under
the 3 seismic loads. The results are shown in Table 3 and Figure 5.
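| Training | El Centro | El Centro | El Centro | Golden Gate | Golden Gate | Golden Gate | Kobe | Kobe | Kobe |
|---|---|---|---|---|---|---|---|---|---|
| Testing | El C. | G.G. | Kobe | El C. | G.G. | Kobe | El C. | G.G. | Kobe |
| Undam. | 14.3% | 29.6% | 34.9% | 85.3% | 16.3% | 97.8% | 18.9% | 21.6% | 17.4% |
| 1F | 89.1% | 90.6% | 75.0% | 98.8% | 97.2% | 100% | 86.7% | 57.8% | 68.8% |
| 2F | 80.9% | 100% | 73.4% | 100% | 100% | 100% | 75.4% | 100% | 66.6% |

Table 3. Proportion of outliers (800 testing samples)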
[Figure 5 plot: three panels showing the proportion of outliers (0% to 100%) for the Undamaged, 1F and 2F cases, each tested under El Centro, Golden Gate and Kobe loads]
Figure 5. Proportion of outliers (800 testing samples)
(model built using El Centro, Golden Gate, and Kobe seismic loading respectively, left to right)
Table 3 and Figure 5 indicate that when damage occurs in the structure, the percentage of
outliers increases significantly. Note that each SVM model is trained using positive samples
measured under one particular seismic load. When the model trained using the Golden Gate
earthquake is applied to monitor the same structure under a different seismic load, a large portion
of the signals measured from the undamaged structure are also considered outliers. This is because
both the external force and the structure status affect the acceleration response, and a
model built on one particular loading history does not generalize well to arbitrary
loading. To reduce this unwanted effect, we train SVM models using a larger database that
consists of a mixture of acceleration responses measured from the undamaged structure under different
seismic loads. By grouping these responses together, we implicitly tell SVM to ignore the
differences caused by excitation variability. Table 4 and Figure 6 show the results of damage
detection using models built on 3 data sets of different sizes (left to right, training data measured
from the structure under a. Golden Gate, b. El Centro and Golden Gate, c. El Centro, Golden Gate,
Corral, Hach and Hachinohe seismic loads).
| Training | Golden Gate | Golden Gate | 2 mixture | 2 mixture | 5 mixture | 5 mixture |
|---|---|---|---|---|---|---|
| Testing | El Centro | Kobe | El Centro | Kobe | El Centro | Kobe |
| Undam. | 85.3% | 97.8% | 9.5% | 31.6% | 1.0% | 16.0% |
| 1F | 98.8% | 100% | 90.5% | 75.1% | 64.0% | 66.6% |
| 2F | 100% | 100% | 79.0% | 73.5% | 63.8% | 64.5% |

Table 4. Proportion of outliers (800 testing samples) detected at location A
[Figure 6 plot: three panels showing the proportion of outliers (0% to 100%) for the Undamaged, 1F and 2F cases, each tested under El Centro and Kobe loads]
Figure 6. Proportion of outliers detected at location A
As shown in Figure 6, when the SVM model is trained using a mixed data set, the effect of loading
variability is averaged out and the change in structural properties becomes dominant, i.e., the
model is able to detect damage caused by arbitrary loads. Note that the acceleration response
measured from the Kobe earthquake is never included in the training set, yet the result is still good,
i.e., the model generalizes well to unseen data. Nonetheless, when damage occurs in either
floor, the model detects a significant change at both sensors and fails to identify the location of the
damage.
4.1.3 Damage Detection Using Regression-based Methods
Using a regression-based novelty detection approach for damage detection has been suggested by
Los Alamos National Laboratory (LANL) [5], and followed by others with minor modifications
[6, 20]. The concept of this two-step approach is as follows: for each structure, a “reference
database” is created by recording the acceleration response of the undamaged structure under
many different excitations. When a new acceleration response $a_{TBD}(t)$ is measured from a
structure whose current status is to be determined, the first step is to select from the predefined
database the acceleration response $a_{un}(t)$ that is closest to the current measurement. This
step is referred to as “data normalization”. The second step is to fit $a_{un}(t)$ using an auto-regressive
model with exogenous inputs (ARX), and use the ARX model to predict $a_{TBD}(t)$.
Denoting the training error between the ARX model and $a_{un}(t)$ at time $t$ as $\varepsilon_{un}(t)$ and the
prediction error between the ARX model and $a_{TBD}(t)$ at time $t$ as $\varepsilon_{TBD}(t)$, the ratio of the
standard deviations of the two errors is defined as the damage-sensitive feature,
\[ h = \sigma(\varepsilon_{TBD}) / \sigma(\varepsilon_{un}) \]
and an experimental threshold limit is used to indicate the occurrence of damage.
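The damage-sensitive feature is a one-line computation once the two error series are available. The sketch below uses the population standard deviation, which is an assumption since the text does not say which estimator is used; the error values and the function name are hypothetical.

```python
import statistics


def damage_sensitive_feature(errors_tbd, errors_un):
    """h = std(prediction errors on the structure under test) / std(errors on the undamaged reference)."""
    return statistics.pstdev(errors_tbd) / statistics.pstdev(errors_un)


# Toy residuals for illustration only.
errors_un = [-1.0, 1.0, -1.0, 1.0]
errors_tbd = [-2.0, 2.0, -2.0, 2.0]

h = damage_sensitive_feature(errors_tbd, errors_un)
```

A value of `h` near 1 means the model predicts the new response about as well as the reference response; larger values indicate a change in the structure.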
We adopt the concept of the damage-sensitive feature and make three modifications to the
LANL approach. First, instead of selecting the closest acceleration response from the reference
database and building a regression model from that one response, we build our model from all
responses in the database. This simulates the worst case in the first step of the LANL approach, i.e.,
no similar excitation can be found in the reference database. Second, in the LANL approach, $\varepsilon_{un}(t)$ is the
training error of building the ARX model, and $\varepsilon_{TBD}(t)$ is the prediction error when the ARX model is used
to predict the unseen data. To be more consistent, in our approach $\varepsilon_{un}(t)$ is calculated by using our
regression model to predict an arbitrary piece of unseen response data from the undamaged
structure. Third, linear regression is replaced by SVR, which does not have to be linear and
guarantees a bounded generalization error. In addition, combined with our feature selection scheme,
SVR provides a systematic way of determining the embedding dimension, a free parameter in
the time series model.
The 2-story steel frame shown in Figure 2 is used in this example. Two experiments are
conducted by exciting the structure using the El Centro (1940) and Golden Gate (1989) earthquakes,
respectively. For each experiment, a 5-second-long acceleration response, measured from the
structure 5 seconds after the start of excitation, is used as training data. The response measured in the
next 1 second is used as the testing data. We choose $C = 100$ and $\varepsilon = 0.1$ for the SVR, and an RBF
kernel with $\sigma = 10$ is used. The damage detection results are shown in Table 5.
| Seismic load | El Centro | El Centro | Golden Gate | Golden Gate |
|---|---|---|---|---|
| Location of damage | 1F | 2F | 1F | 2F |
| h (location A, 1F) | 2.984 | 1.474 | 2.240 | 1.173 |
| h (location B, 2F) | 1.207 | 2.554 | 1.244 | 2.344 |

Table 5. Damage Detection in a 2-story Frame using SVR
As expected, the SVR model built from the undamaged structure yields significantly higher prediction
errors when used to predict the response of the damaged structure. When a suitable threshold limit
is chosen for the damage-sensitive feature $h$, the proposed approach is able to indicate both
the existence and the location of the damage.
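The thresholding step can be illustrated with the values of $h$ reported in Table 5. The threshold of 2.0 is a hypothetical value chosen for this sketch; the paper does not state the threshold it would use.

```python
# Values of h copied from Table 5:
# (seismic load, true damage location) -> h measured at each sensor.
TABLE_5 = {
    ("El Centro", "1F"): {"A (1F)": 2.984, "B (2F)": 1.207},
    ("El Centro", "2F"): {"A (1F)": 1.474, "B (2F)": 2.554},
    ("Golden Gate", "1F"): {"A (1F)": 2.240, "B (2F)": 1.244},
    ("Golden Gate", "2F"): {"A (1F)": 1.173, "B (2F)": 2.344},
}

H_THRESHOLD = 2.0  # hypothetical threshold, not a value given in the paper


def flagged_sensors(h_by_sensor, threshold):
    """Return the sensors whose damage-sensitive feature exceeds the threshold."""
    return [sensor for sensor, h in h_by_sensor.items() if h > threshold]


flags = {case: flagged_sensors(h, H_THRESHOLD) for case, h in TABLE_5.items()}
```

With this threshold, each of the four cases flags exactly the sensor on the damaged floor.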
4.2 ASCE Benchmark Problem
Structural health monitoring studies often apply different methods to different structures, which
makes side-by-side comparison of those methods difficult. To coordinate the studies, the ASCE
Task Group on Health Monitoring built a 4-story 2-bay by 2-bay steel frame benchmark structure
and provided two finite element based models, a 12DOF shear-building model and a more realistic
120DOF 3D model [19]. The benchmark problem is studied in the following examples.
4.2.1 Support Vector Regression
Five damage patterns are defined in the benchmark study, and we apply the SVR detection
procedure to the first two patterns: (1) all braces in the first story are removed, and (2) all braces
in both the first story and the third story are removed. Acceleration responses for these two
damage patterns are generated by using the 12DOF analytical model under ambient wind load.
The results of damage detection and localization using the damage-sensitive feature $h$ are shown in
Table 6. The training data is a mixture of 5-second acceleration responses obtained from the
undamaged structure under 10 different ambient loads. For each damage pattern, two 1-second
acceleration responses caused by different ambient loads (denoted as L1 and L2 in Table 6) are
used as the testing data. We choose $C = 100$ and $\varepsilon = 0.1$ in the SVR, and an RBF kernel with $\sigma = 10$.
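| Damage pattern | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 |
|---|---|---|---|---|---|---|---|---|
| # of patterns | 30 | 30 | 100 | 100 | 30 | 30 | 100 | 100 |
| Ambient load | L1 | L2 | L1 | L2 | L1 | L2 | L1 | L2 |
| h (1F) | 2.57 | 2.46 | 1.69 | 1.56 | 2.03 | 2.07 | 1.78 | 1.58 |
| h (2F) | 1.74 | 1.07 | 1.32 | 0.88 | 1.48 | 1.11 | 1.28 | 1.09 |
| h (3F) | 1.30 | 1.43 | 1.26 | 1.07 | 2.19 | 1.92 | 1.71 | 1.48 |
| h (4F) | 1.30 | 1.23 | 1.02 | 0.89 | 1.20 | 1.11 | 1.08 | 1.12 |

Table 6. Damage detection and localization results for damage patterns 1 and 2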
Compared with the results given in [6] and [20], the differences in the damage-sensitive features
between the damaged and undamaged structure are less significant, because we simulate a worse
case in the data normalization step. Nevertheless, our approach successfully indicates the occurrence and the
location of the damage in both damage patterns, whereas the second floor in damage
pattern 2 is classified as damaged in [6].
We can see that the value of $h$ varies when the length of the patterns is changed. Although the
approach is able to distinguish the structure statuses from one another for both pattern lengths, a
systematic way of feature selection is more desirable. We apply the feature selection scheme
discussed in section 3.2 in the following example.
4.2.2 Feature Selection
We use the feature reduction scheme on both the supervised SVM and the unsupervised SVR approaches.
Recall that in our first example in Section 4.1.1, the number of features was selected by trial and
error, and more than 100 features were required to achieve 80% accuracy. Using Eq., we
plot the absolute value of the components of w1 in Figure 7. It is clear that some features are
more important than others, and we can understand why a long pattern was required. Table 7 shows
that by selecting features based on the value of wi, we can obtain the same level of accuracy
with far fewer features. Note that we use the terms feature and pattern interchangeably in this
section, because we are selecting features in the input (pattern) space.
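The selection step described above can be sketched as follows. This is a minimal illustration that assumes a weight vector w is already available from a trained model; the example weights are hypothetical, and the function names are not taken from the original implementation.

```python
def select_features(weights, k):
    """Return the indices of the k components with the largest |w_i|,
    listed in their original (time) order."""
    ranked = sorted(range(len(weights)),
                    key=lambda i: abs(weights[i]), reverse=True)
    return sorted(ranked[:k])

def reduce_pattern(pattern, kept):
    """Keep only the selected components of one input pattern."""
    return [pattern[i] for i in kept]

# Hypothetical weight vector, for illustration only.
w = [0.1, -2.4, 0.05, 1.7, -0.3, 0.9]
kept = select_features(w, 3)
reduced = reduce_pattern([10, 11, 12, 13, 14, 15], kept)
```

Because the selection uses the absolute value, a strongly negative weight counts as much as a strongly positive one. The same index list must be applied to both training and testing patterns before the reduced model is retrained.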
[Figure 7: plot of the absolute value of w1 (vertical axis, 0 to 3) against feature index (horizontal axis, 1 to 196)]
Figure 7. Absolute value of the components in the w1 vector
              First k patterns                Selected k patterns
k             50      100     150     200     100     40
CV 1 (120)    60      97      111     120     120     104
CV 2 (120)    62      89      111     120     120     108
CV 3 (120)    68      90      119     120     120     105
CV 4 (120)    61      87      117     120     120     106
CV 5 (120)    63      78      116     120     120     110
Average       52.3%   76.2%   95.7%   100%    100%    88.8%
Table 7. Feature selection in supervised SVM damage detection
Similarly, we apply the feature selection approach to the ASCE benchmark example. The results are
shown in Figure 8 and Table 8. In this case, we can see that a long pattern is not necessary: using
only the first 9 features, the model produces results similar to those obtained with 100 features.
[Figure 8: plot of the components of w1 (vertical axis, 0 to 7000) against feature index (horizontal axis, 1 to 100)]
Figure 8. Distribution of the components in the w1 vector
                Damage pattern 1        Damage pattern 2
# of patterns   9           100         9           100
Ambient load    L1    L2    L1    L2    L1    L2    L1    L2
h (1F)          2.16  2.18  1.69  1.56  1.90  1.85  1.78  1.58
h (2F)          1.55  1.14  1.32  0.88  1.48  1.13  1.28  1.09
h (3F)          1.35  1.30  1.26  1.07  2.08  1.74  1.71  1.48
h (4F)          1.31  0.99  1.02  0.89  1.15  1.03  1.08  1.12
Table 8. Feature selection in SVR-based damage detection
5. Conclusions
SVM has achieved remarkable success in pattern recognition and machine learning, and its
continuing development also sheds new light on its applications in SHM. This paper has described
two approaches that apply unsupervised SVM algorithms to vibration-based damage
detection, in addition to the supervised SVM introduced earlier by other researchers. Combining
SVM-based novelty detection techniques with the vibration-based damage detection
approach eliminates the need for data from the damaged structure. These approaches are easy
to implement because only vibration responses measured from the structure are required for
building the models. Numerical examples have shown that the SVR approach is able to detect
both the occurrence and the location of damage. Furthermore, high-dimensional feature vectors
introduce more noise and restrict the scalability of most statistical pattern
recognition methods. The idea of regularization in SVM is extended to feature selection, and we
show that the reduced model can still retain the same level of accuracy.
Acknowledgement
This research is supported by the …
References
1. Rytter, A., Vibration based inspection of Civil Engineering structures, in Department of
Building Technology and Structural Engineering. 1993, University of Aalborg: Denmark.
2. Doebling, S.W., C.R. Farrar, and M.B. Prime, A Summary Review of Vibration-Based
Damage Identification Methods. The Shock and Vibration Digest, 1998. 30(2): p. 91-105.
3. Stubbs, N., et al. A Methodology to Nondestructively Evaluate the Structural Properties
of Bridges. in Proceedings of the 17th International Modal Analysis Conference. 1999.
Kissimmee, Fla.
4. N. Haritos and J.S. Owen, The Use of Vibration Data for Damage Detection in Bridges:
A Comparison of System Identification and Pattern Recognition Approaches.
International Journal of Structural Health Monitoring, 2004.
5. Hoon Sohn and Charles R Farrar, Damage Diagnosis Using Time Series Analysis of
Vibration Signals, in Smart Materials and Structures. 2001.
6. Y. Lei, et al. An Enhanced Statistical Damage Detection Algorithm Using Time Series
Analysis. in Proceedings of the 4th International Workshop on Structural Health
Monitoring. 2003.
7. Worden, K. and A.J. Lane, Damage Identification using Support Vector Machines. Smart
Materials and Structures, 2001. 10(3): p. 540-547.
8. Yun, C.B., et al., Damage Estimation Method Using Committee of Neural Networks.
Smart Nondestructive Evaluation and Health Monitoring of Structural and Biological
Systems II. Proceedings of the SPIE, 2003. 5047: p. 263-274.
9. Ahmet Bulut, Peter Shin, and L. Yan. Real-time Nondestructive Structural Health
Monitoring using Support Vector Machines and Wavelets. in Proceedings of Knowledge
Discovery in Data and Data Mining. 2004. Seattle, WA.
10. Vladimir N. Vapnik, The Nature of Statistical Learning Theory. 1995, New York:
Springer-Verlag.
11. Christopher J.C. Burges, A Tutorial on Support Vector Machines for Pattern
Recognition. Knowledge Discovery and Data Mining, 2(2), 1998.
12. Byvatov E., et al., Comparison of support vector machine and artificial neural network
systems for drug/nondrug classification. Journal of Chemical Information and Computer
Sciences, 2003. 43(6): p. 1882-1889.
13. Bernhard Schölkopf and Alex Smola, Learning with Kernels - Support Vector Machines,
Regularization, Optimization and Beyond. 2002: MIT Press.
14. John Shawe-Taylor and Nello Cristianini, Kernel Methods for Pattern Analysis. 2004:
Cambridge University Press.
15. Michael L. Fugate, Hoon Sohn, and C.R. Farrar. Unsupervised Learning Methods for
Vibration-Based Damage Detection. in Proceedings of the 18th International Modal
Analysis Conference. 2000. San Antonio, Texas.
16. Bernhard Schölkopf, et al., Estimating the Support of a High-Dimensional Distribution.
Neural Computation, 2001. 13: p. 1443-1471.
17. Alex J. Smola and Bernhard Schölkopf, A Tutorial on Support Vector Regression, in
NeuroCOLT2 Technical Report Series. 1998.
18. Mead, W.C., et al. Prediction of Chaotic Time Series using CNLS-Net-Example: The
Mackey-Glass Equation. in Nonlinear Modeling and Forecasting. 1992: Addison
Wesley.
19. Johnson, E.A., et al. A Benchmark Problem for Structural Health Monitoring and
Damage Detection. in Proceedings of the 14th Engineering Mechanics Conference. 2000.
Austin, Texas.
20. K.K. Nair, et al. Application of time series analysis in structural damage evaluation. in
Proceedings of the International Conference on Structural Health Monitoring. 2003.
Tokyo, Japan.