Adaptive training of vibration-based anomaly detector for wind turbine condition monitoring

Takanori Hasegawa, Jun Ogata, Masahiro Murakawa, Tetsunori Kobayashi, Tetsuji Ogawa

Anomaly Detection for CMS
• Condition monitoring systems (CMSs) play a vital role in establishing condition-based maintenance and repair (M&R).
• Anomaly (fault) detection is a key technology for realizing effective CMSs. [Afrooz et al., 2014] [A. Zaher et al., 2009] [Z. Hameed et al., 2009]
https://www.nrel.gov/continuum/partnering/wind.html

Challenges in Anomaly Detection
1. Both the data for developing the system and the data observed during its operation are obtained from the identical device.
2. Constant monitoring of data is desirable, but data are observed only periodically.
[Figure: large data from the target wind turbine -> training -> target model]
Building a reliable anomaly detector needs large-scale data, but collecting such data is time-consuming!

Challenges in Anomaly Detection
• The anomaly detector is not reliable at the early stage of monitoring the target device.
[Figure: small data from the target wind turbine -> training -> target model]

Effective Use of Existing Detector
• A base anomaly detector is built using large-scale data from a non-target device.
• The base anomaly detector is adapted to the target device using a small amount of data.
[Figure: large data from another wind turbine -> base model; small data from the target wind turbine -> adaptive training of the base model -> target model]

Outcome: efficient development of a robust anomaly detector.

Contents
1. Baseline system
GMM-based anomaly detection using FLAC features
2. Proposed method
Adaptive training of an existing similar system to the target environment
3. Experiments
Adaptive training performed well when only a small amount of data was available during system development.

Baseline System for Anomaly Detection

Schematic Diagram
[Figure: system development: feature extraction -> training -> normal model. Run-time: feature extraction -> computing negative log-likelihood under the normal model -> thresholding]

Feature Extraction Using FLAC
STFT: a Fourier transform is applied to the vibration signal over short time intervals using a sliding window, which yields a spectrogram.
[J. Ogata et al., WWEC2016]
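The sliding-window Fourier transform can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `stft`, the Hann window, and the 50% hop are assumptions, and the 0.1 s window and 40 kHz rate are borrowed from the experimental setup slide. The test tone is synthetic.

```python
import numpy as np

def stft(signal, fs, win_sec=0.1, hop_sec=0.05):
    """Return (freqs, Z), where Z is a complex spectrogram of shape (n_frames, n_bins)."""
    win = int(round(win_sec * fs))   # samples per analysis window
    hop = int(round(hop_sec * fs))   # samples between window starts (assumed value)
    window = np.hanning(win)         # assumed taper; the slides do not name one
    n_frames = 1 + (len(signal) - win) // hop
    frames = np.stack([signal[i * hop:i * hop + win] * window
                       for i in range(n_frames)])
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)
    return freqs, np.fft.rfft(frames, axis=1)

# Synthetic 1 s, 1 kHz tone sampled at 40 kHz (illustrative input only)
fs = 40_000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)

freqs, Z = stft(x, fs)
peak_hz = freqs[np.argmax(np.abs(Z[0]))]   # strongest bin of the first frame
```

Each row of `Z` is one short-time spectrum; the complex values (not only the magnitudes) are kept because autocorrelation features are later computed on them.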

Fourier local autocorrelation (FLAC)
Autocorrelations (ACs) are calculated in a local area on the time-frequency (TF) plane of the spectrogram.
Five filters are applied to calculate the ACs, giving magnitude features, inner-domain features, and cross-domain dynamic features.
[Figure: spectrogram with time and frequency axes and the five FLAC filter masks]
[J. Ogata et al., WWEC2016]
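To make the idea of an autocorrelation on the TF plane concrete, here is a generic sketch. It is not the authors' FLAC implementation: the five mask patterns are not recoverable from the slide, so the function `local_autocorr`, its displacement arguments `dt` and `df`, and the choice to sum over frequency within each frame are all assumptions made for illustration.

```python
import numpy as np

def local_autocorr(Z, dt, df):
    """Per-frame sum over frequency of Z[t, f] * conj(Z[t + dt, f + df]).

    Z is a complex spectrogram of shape (n_frames, n_bins); dt and df are
    non-negative displacements in frames and bins (hypothetical parameters).
    """
    T, F = Z.shape
    a = Z[: T - dt, : F - df]   # reference points
    b = Z[dt:, df:]             # points displaced by (dt, df)
    return np.sum(a * np.conj(b), axis=1)

# Tiny hand-checkable spectrogram (2 frames x 2 bins)
Z = np.array([[1 + 1j, 2], [3, 4j]])

power = local_autocorr(Z, 0, 0)        # zero displacement: per-frame power
time_shift = local_autocorr(Z, 1, 0)   # correlation with the next frame
```

With zero displacement the result is real-valued frame power, which loosely corresponds to a magnitude-type feature; a non-zero `dt` correlates neighboring frames and so carries temporal dynamics.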

Modeling Sequence of Features
A vibration signal segment, which yields a varying number of feature vectors, can be represented by a Gaussian mixture model (GMM).
[Figure: signal space mapped into the FLAC feature space (2 to 75 dimensions), where the features are modeled by a GMM]

Measuring Anomaly Using GMM
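Likelihood of GMM:
$$p(x \mid \Theta) = \sum_{k=1}^{K} w_k \, \mathcal{N}(x; \mu_k, \Sigma_k)$$
$K$: number of Gaussians
$\Theta = \{w_k, \mu_k, \Sigma_k\}$: parameter set of the GMM
$w_k$: mixture weight for the $k$-th Gaussian
$\mu_k$: mean vector of the $k$-th Gaussian
$\Sigma_k$: full covariance matrix of the $k$-th Gaussian
Anomaly score:
$$-\log p(x \mid \Theta)$$
[Figure: FLAC feature space with an anomaly lying away from the normal-state GMM]
• The GMM represents the normal status of the machinery.
• The negative log-likelihood of an input under the normal-status GMM measures how anomalous the operating device is.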
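The scoring and thresholding steps can be sketched as below. This is an illustrative sketch under stated assumptions: the function name `gmm_neg_log_likelihood`, the averaging over frames, the toy single-Gaussian model, and the threshold value are all hypothetical and not taken from the slides.

```python
import numpy as np

def gmm_neg_log_likelihood(X, weights, means, covs):
    """Average negative log-likelihood of frames X (N, D) under a full-covariance GMM."""
    N, D = X.shape
    log_probs = []
    for w, mu, cov in zip(weights, means, covs):
        diff = X - mu
        inv = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)
        maha = np.einsum("nd,de,ne->n", diff, inv, diff)   # squared Mahalanobis distance
        log_probs.append(np.log(w) - 0.5 * (D * np.log(2 * np.pi) + logdet + maha))
    log_probs = np.stack(log_probs, axis=1)                # (N, K)
    m = log_probs.max(axis=1, keepdims=True)               # log-sum-exp for stability
    log_lik = m[:, 0] + np.log(np.exp(log_probs - m).sum(axis=1))
    return -log_lik.mean()                                 # averaging is an assumption

# Toy "normal" model: one standard Gaussian in 2-D (hypothetical parameters)
weights, means, covs = [1.0], [np.zeros(2)], [np.eye(2)]

score_normal = gmm_neg_log_likelihood(np.zeros((1, 2)), weights, means, covs)
score_far = gmm_neg_log_likelihood(np.full((1, 2), 5.0), weights, means, covs)

threshold = 5.0                      # hypothetical; would be tuned on held-out data
is_anomalous = score_far > threshold
```

An input close to the modeled normal state receives a low score, while an input far from every Gaussian receives a high score and exceeds the threshold.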

Model Adaptation
• Suitable for cases where only a small amount of data can be observed in the target environment.
• The existing model is adjusted to data in the target environment to reduce the mismatch between training and testing data.
[Figure: mismatch between the pre-trained model and the observations in the target environment; the pre-trained model is shifted to an adapted model]

Maximum A Posteriori Adaptation
MAP adaptation has been frequently applied to GMM-based prediction systems to reduce domain mismatch between development and run-time data.
e.g., the effect of speaker differences is compensated for in automatic speech recognition (ASR).
[Figure: mismatch between the pre-trained model and the target data; the pre-trained model is shifted to an adapted model]

Maximum Likelihood Training
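Mean is estimated with ML training only on data from the target device:
$$\hat{\mu}_k^{\mathrm{ML}} = \frac{\sum_t \gamma_k(t) \, x_t}{\sum_t \gamma_k(t)}$$
$\sum_t \gamma_k(t) \, x_t$: 1st-order statistic (sum)
$\sum_t \gamma_k(t)$: 0th-order statistic (count)
Here $\gamma_k(t)$ denotes the posterior probability of the $k$-th Gaussian for observation $x_t$.
ML estimates are not reliable, especially when limited data are available for training.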
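The ML mean update from the sum and count statistics can be sketched as follows. The function name `ml_means` and the hard (0/1) responsibilities in the example are assumptions for illustration; in practice the responsibilities would come from the E-step of EM.

```python
import numpy as np

def ml_means(X, gamma):
    """ML mean of each Gaussian from frames X (N, D) and responsibilities gamma (N, K)."""
    n = gamma.sum(axis=0)    # 0th-order statistic (count), shape (K,)
    f = gamma.T @ X          # 1st-order statistic (sum), shape (K, D)
    return f / n[:, None]

# Three toy frames; the first two are assigned to Gaussian 0, the last to Gaussian 1
X = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]])
gamma = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

means = ml_means(X, gamma)
```

Note that the second mean rests on a single frame, and a Gaussian with a zero count would have no defined estimate at all, which illustrates why ML estimates degrade when data are scarce.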

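Mean is estimated with MAP adaptation by averaging the statistics of the pre-trained model and those accumulated from the observations:
$$\hat{\mu}_k^{\mathrm{MAP}} = \frac{\tau \, \mu_k^{(0)} + \sum_t \gamma_k(t) \, x_t}{\tau + \sum_t \gamma_k(t)}$$
$\tau$: contribution of the pre-trained model
$\mu_k^{(0)}$: mean of the pre-trained model
$\sum_t \gamma_k(t)$: 0th-order statistic (count) of the observations
$\sum_t \gamma_k(t) \, x_t$: 1st-order statistic (sum) of the observations
Here $\gamma_k(t)$ denotes the posterior probability of the $k$-th Gaussian for observation $x_t$.
The MAP estimate is more reliable than the ML estimate even when only a small amount of data is available, because MAP adaptation uses not only the limited observations but also the reliable pre-trained model.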
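A minimal sketch of the MAP mean update follows, assuming the common relevance-factor form in which the pre-trained mean is weighted by a constant `tau`. The function name `map_adapt_means`, the value `tau = 2.0`, and the toy data are hypothetical; the responsibilities would normally be computed with the pre-trained GMM.

```python
import numpy as np

def map_adapt_means(X, gamma, prior_means, tau):
    """MAP-adapted means from frames X (N, D), responsibilities gamma (N, K),
    pre-trained means prior_means (K, D), and prior weight tau (assumed scalar)."""
    n = gamma.sum(axis=0)    # 0th-order statistic (count) of the observations
    f = gamma.T @ X          # 1st-order statistic (sum) of the observations
    return (tau * prior_means + f) / (tau + n)[:, None]

prior_means = np.array([[0.0, 0.0], [10.0, 10.0]])   # pre-trained (base) model
X = np.array([[4.0, 4.0], [4.0, 4.0]])               # two target-device frames
gamma = np.array([[1.0, 0.0], [1.0, 0.0]])           # both assigned to Gaussian 0

adapted = map_adapt_means(X, gamma, prior_means, tau=2.0)
ml_like = map_adapt_means(X, gamma[:, :1], prior_means[:1], tau=0.0)
```

Gaussian 0 moves halfway from its pre-trained mean toward the observed data, Gaussian 1 sees no data and keeps its pre-trained mean, and setting `tau` to zero recovers the ML estimate.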

Methods in Enrollment of Normality
ML-train: training only on limited data from the target gearbox
[Figure: vibration from target device -> feature extraction -> ML training -> target GMM]
MAP-adapt: adaptive training of the existing system to the target gearbox
[Figure: vibration from target device -> feature extraction -> MAP adaptation of the base GMM -> target GMM]

Vibration Materials
                 NREL dataset          HSG dataset (high-speed gearbox)
Target machine   Gearbox               Gearbox
Power rating     750 kW                2 MW
Nominal speed    1800 rpm              1800 rpm
Label            healthy and faulty    case1 (with fault), case2 (good condition), case3 (good condition)
Sampling rate    40 kHz                97.656 kHz
Purpose          pre-training (MAP)    ML training, MAP adaptation, testing

NREL, https://www.nrel.gov/docs/fy12osti/54530.pdf
Eric Bechhoefer, http://data-acoustics.com/measurements/gear-faults/gear-1/

Experimental Setup
Stage                               Dataset            Data length (normal)   Data length (faulty)
pre-training                        non-target NREL    59892                  ---
adaptation (MAP) / training (ML)    target HSG         4132 (case2)           ---
testing                             target HSG         3540 (case3)           6490 (case1)

Window length: 0.1 sec
Sampling frequency: unified to 40 kHz
Number of mixtures in GMM: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512
Covariance matrix: full covariance

Criterion: Area Under the Curve (AUC)
[Figure: detection error curve with axes false positive rate and false negative rate; AUC is the area under this curve, so lower is better]

       w/o adaptation               w/ MAP adaptation
       (0 frames of target data)    (4132 frames of target data)
AUC    0.890                        0.006
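The criterion can be sketched as the area under a curve of false negative rate against false positive rate, obtained by sweeping the decision threshold over the anomaly scores. The axis assignment, the trapezoidal integration, and the function name `det_auc` are assumptions; the slides do not specify how the area was computed. Under this definition the value equals one minus the usual ROC AUC.

```python
def det_auc(normal_scores, faulty_scores):
    """Area under the FNR-vs-FPR curve: 0.0 is perfect separation, 0.5 is chance."""
    # Sweep thresholds from high to low so that FPR never decreases
    thresholds = sorted(set(normal_scores) | set(faulty_scores), reverse=True)
    thresholds.append(float("-inf"))   # final point: everything is flagged
    points = []
    for th in thresholds:
        # A sample is flagged as anomalous when its score exceeds the threshold
        fpr = sum(s > th for s in normal_scores) / len(normal_scores)
        fnr = sum(s <= th for s in faulty_scores) / len(faulty_scores)
        points.append((fpr, fnr))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2   # trapezoidal rule
    return area

# Toy anomaly scores (illustrative values only)
perfect = det_auc([1.0, 2.0], [3.0, 4.0])    # faulty always scores higher
inverted = det_auc([3.0, 4.0], [1.0, 2.0])   # faulty always scores lower
```

A detector that ranks every faulty sample above every normal sample yields an area of zero, which matches the direction of improvement reported for MAP adaptation.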

MAP adaptation works on less training data
MAP adaptation successfully reduces AUCs at the early stage of collecting data.
[Figure: AUC versus data length [frames], with axis marks at 590 and 4132 frames and annotated AUC values of 0.61 and 0.10]

Advantages of MAP Adaptation
The adapted system can handle initial failures owing to its high performance at the early stage of monitoring.
[Figure: AUC versus data length [frames], with axis marks at 590 and 4132 frames]

Take‐home Messages
Approach: MAP adaptation updates an existing normal-state GMM using limited data from the target device.
Effectiveness: the developed system yields a significant reduction in AUCs at the early stage of collecting data.

       w/o training            ML training             MAP adaptation
       (0 s of target data)    (6 s of target data)    (6 s of target data)
AUC    0.89                    0.61                    0.10

A data-driven anomaly detector developed for other devices can be transferred to achieve efficient operation of robust condition monitoring.
