Use of Machine Learning Algorithms for Anomaly
Detection in Particle Accelerators' Technical Infrastructure
Lorenzo Giusti
lor.giusti@icloud.com
23/04/2020
Problem
• We want to anticipate upcoming failures in the accelerators’ technical infrastructure
• In the current situation, a device is checked only when its internal temperature rises above a fixed threshold (37.5 °C)
• The monitoring framework is constantly supervised by engineers who have thousands of devices to look after and can miss an alarm among the millions received every day
• Thermal runaways cannot be caught by human supervision alone: a sudden rise of the internal temperature can end in an explosion
Motivation
• Devices are not flawless:
• Reliability and performance decrease with time
• Faults can suddenly put devices out of service and impact overall availability
• Preventive and corrective maintenance is expensive and impacts operating time
• Predictive maintenance and anticipated interventions:
• Reduce risks
• Decrease downtime
• Increase reliability and overall availability
Approaches
• Mahalanobis Distance:
• Measures the distance between a point and a distribution
• If the distance is above a certain threshold, the point is considered anomalous
• Isolation Forest:
• Splits the feature space “randomly” until a point is isolated
• The probability of that point being non-anomalous is proportional to the number of splits needed to isolate it (see the sketch after this list)
• Residual Autoregressive Score:
• Compute the norm between the actual time series and the one predicted with an autoregressive model (e.g. ARIMA)
• If the gradient of the norm is monotonically increasing, the time series is classified as anomalous
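As a rough illustration of the first two baselines, the sketch below scores points with the Mahalanobis distance and with scikit-learn's Isolation Forest; the feature choice, parameters and synthetic data are assumptions made here for illustration, not part of the original framework.

```python
# Hedged sketch of two baseline anomaly scores on engineered device features.
import numpy as np
from sklearn.ensemble import IsolationForest

def mahalanobis_scores(X, X_ref):
    """Mahalanobis distance of each row of X from the reference distribution X_ref."""
    mu = X_ref.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X_ref, rowvar=False))
    diff = X - mu
    # Quadratic form diff @ cov_inv @ diff^T, evaluated row by row.
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

def isolation_forest_scores(X, X_ref):
    """Higher score = more anomalous (sklearn's score_samples is negated)."""
    model = IsolationForest(n_estimators=200, random_state=0).fit(X_ref)
    return -model.score_samples(X)

# Example with synthetic 2-D features (e.g. temperature and its gradient):
rng = np.random.default_rng(0)
X_ref = rng.normal(loc=[30.0, 0.0], scale=[1.0, 0.1], size=(1000, 2))  # nominal behaviour
X_new = np.vstack([X_ref[:5], [[38.0, 0.8]]])                          # last point is suspicious
print(mahalanobis_scores(X_new, X_ref))
print(isolation_forest_scores(X_new, X_ref))
```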
Solution
• Machine Learning based anomaly detection:
• Real time monitoring with unsupervised detection
• Device independent with generalized algorithms
• Independent of environmental and periodic operational conditions
• State of the art artificial intelligence algorithms
• Faults are predicted with significant lead time (i.e. days to weeks before failure)
• Different types and thresholds of anomalies are detected
Failure Analysis Framework Architecture
[Pipeline diagram: noisy temperature signal as input → Extract, Load and Transform → engineered features as output]
• Our framework is developed using only the significant features of the devices
• i.e. only the temperature sensors for the collimators
• Extract, Load and Transform pipeline (a sketch of these steps follows below):
• Identify and remove seasonal components from the signal
• Filter out the environmental noise (e.g. Gaussian smoothing)
• Derive additional features in order to gain more insight into the physical phenomena of interest
• At the end, we homogenize the range and variability of the extracted data
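A minimal sketch of how such a pipeline could look, assuming hourly temperature samples, a daily seasonal period and the feature names used below (none of which come from the presentation):

```python
# Hedged sketch of the Extract/Load/Transform steps described above.
import pandas as pd
from scipy.ndimage import gaussian_filter1d
from statsmodels.tsa.seasonal import seasonal_decompose

def engineer_features(temp: pd.Series, period: int = 24) -> pd.DataFrame:
    # 1) Identify and remove the seasonal component (e.g. a daily cycle).
    decomposition = seasonal_decompose(temp, model="additive", period=period,
                                       extrapolate_trend="freq")
    deseasonalized = temp - decomposition.seasonal

    # 2) Filter out environmental noise with Gaussian smoothing.
    smoothed = pd.Series(gaussian_filter1d(deseasonalized.to_numpy(), sigma=3),
                         index=temp.index, name="temp_smoothed")

    # 3) Derive additional features that expose the physics of interest.
    gradient = smoothed.diff().rename("temp_gradient")
    rolling_std = smoothed.rolling(window=period).std().rename("temp_rolling_std")

    # 4) Homogenize range and variability (z-score standardization).
    features = pd.concat([smoothed, gradient, rolling_std], axis=1).dropna()
    return (features - features.mean()) / features.std()
```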
Neural Networks
• Neural Networks are computing systems that are inspired by the biological
brain.
• They learn to perform tasks by considering examples, without being
programmed with task-specific rules
• The original goal of the neural approach was to solve problems in the same way that a human brain would; attention has since moved to performing specific tasks
McCulloch, W.S., Pitts, W., (1943), “A logical calculus of the ideas immanent in nervous activity”, Bulletin of Mathematical Biophysics.
Recurrent Neural Networks
• In handling generic sequences of data, simple feed-forward neural networks have severe limitations, especially for:
• Sequences with variable lengths (e.g. environmental processes)
• Sequences with long-term dependencies
• Recurrent neural networks (RNNs) are a class of neural networks naturally
able to exhibit temporal dynamic behavior
D. E. Rumelhart, G. E. Hinton, and R. J. Williams, (1986), “Learning internal representations by error propagation”.
• RNNs are susceptible to vanishing gradients when processing long sequences
Recurrent Neural Networks
• Sharing parameters across time steps allows the model to extend to sequences of different lengths and to generalize across them:
• The output is produced using the same update rule applied to the previous outputs
• RNNs introduce the concept of cycles in the computational graph:
• Cycles model the influence of the value at time t on the value at time t + 𝜏
I. Goodfellow, Y. Bengio, and A. Courville, (2015), “Deep Learning”.
• The hidden state at time t can be considered a summary of all the previous values processed by the RNN (see the sketch below)
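For concreteness, a minimal NumPy sketch of the shared recurrent update rule h_t = tanh(W_hh · h_{t-1} + W_xh · x_t + b); the shapes and names are chosen here for illustration:

```python
# Illustrative sketch of a vanilla RNN forward pass (not the presentation's code).
import numpy as np

def rnn_forward(x_seq, W_xh, W_hh, b):
    """x_seq: (T, n_in); W_xh: (n_hidden, n_in); W_hh: (n_hidden, n_hidden)."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in x_seq:                           # the same parameters are reused at every t
        h = np.tanh(W_hh @ h + W_xh @ x_t + b)  # shared update rule
        states.append(h)                        # h summarizes everything seen so far
    return np.stack(states)
```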
LSTM Networks
• The forget gate’s vector of activations allows the network to better control the gradient values, preventing them from vanishing (see the cell equations below)
• Well suited to classifying, processing and making predictions based on time series, since there can be lags of unknown duration between important events in a time series
• LSTM networks are a type of recurrent model that has been used in many sequence-learning tasks such as speech recognition and time series forecasting
Sepp Hochreiter; Jürgen Schmidhuber (1997). "Long short-term memory". In Neural Computation.
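For reference, the standard LSTM cell equations in a common textbook formulation (notation chosen here, not taken from the slides); the forget gate f_t multiplies the previous cell state, which is what keeps the gradient from vanishing:

```latex
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)          % forget gate
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)          % input gate
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)          % output gate
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)   % candidate cell state
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t    % cell-state update
h_t = o_t \odot \tanh(c_t)                         % hidden state
```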
Bidirectional Networks
• Bidirectional LSTM networks are made by putting two independent LSTM networks together
• The input sequence is given in normal time order to one network and in reversed time order to the other
• This structure allows the network to have both backward and forward information about the sequence at every time step
M. Schuster and K. K. Paliwal, (1997) ,”Bidirectional recurrent neural networks" in IEEE Transactions on Signal Processing.
Autoencoders
• The principal aim is to learn a representation for a set of data by training the network to ignore signal noise
• By learning to replicate the most salient features of the data, the model is encouraged to precisely reproduce the most frequent characteristics:
• When facing anomalies, the model should fail the reconstruction
• The reconstruction error of a data point is used as an anomaly score to detect anomalies
Kramer, M.A., (1991), “Nonlinear principal component analysis using autoassociative neural networks”, AIChE J.
Autoencoders
• Formally, Autoencoders are feedforward neural networks whose aim is to learn to copy their input to their output: ℐ(𝐱) = 𝐱
• As it is, the identity function itself isn’t very useful, unless we force the model to prioritize which aspects of the input should be copied, leading the model to learn useful properties of the input data
• The main idea is to separate the identity function into two parts: ℐ(𝐱) = 𝑔(𝑓(𝐱))
• 𝐡 = 𝑓(𝐱) is the encoder function, which maps the input to an internal representation or code
• 𝐫 = 𝑔(𝐡) is the decoder function, which produces a reconstruction of the input as a function of the code taken from the previous mapping
• One way to capture the most salient features of the input is to constrain the encoder/decoder functions to be of the form:
𝑓: ℝⁿ → ℝᵐ, 𝑔: ℝᵐ → ℝⁿ, with 𝑚 < 𝑛
Autoencoders
• An Autoencoder whose code dimension is less than the input dimension is
called undercomplete
• The learning process of an undercomplete autoencoder is described as (a minimal code sketch follows the formula):
𝐖∗ = argmin_𝐖 𝐿(𝐱, 𝑔(𝑓(𝐱))) = argmin_𝐖 ‖𝐱 − 𝑔(𝑓(𝐱))‖²
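A minimal Keras sketch of an undercomplete autoencoder trained with this objective; the layer sizes, activations and optimizer are illustrative assumptions:

```python
# Hedged sketch of an undercomplete autoencoder (code dimension m < input dimension n).
import tensorflow as tf

n_features, code_dim = 32, 8                                           # m < n: the bottleneck forces compression

inputs = tf.keras.Input(shape=(n_features,))
h = tf.keras.layers.Dense(16, activation="relu")(inputs)
code = tf.keras.layers.Dense(code_dim, activation="relu")(h)           # h = f(x), the encoder
h = tf.keras.layers.Dense(16, activation="relu")(code)
outputs = tf.keras.layers.Dense(n_features, activation="linear")(h)    # r = g(h), the decoder

autoencoder = tf.keras.Model(inputs, outputs)
# argmin_W ||x - g(f(x))||^2  <=>  mean squared error between input and output
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_train, x_train, epochs=50, batch_size=64)          # target == input
```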
Autoencoders
• Using proper activation functions leads the autoencoder to learn a generalization of principal component analysis under a kernel function:
• If the neuron activations are linear and the loss function is the mean squared error, it can be shown that the encoder function is just an approximation of the principal component analysis technique
• There are several variants of the undercomplete autoencoder:
• Regularized Autoencoders:
• The loss function has additional terms which force the model to have other properties besides the ability to reproduce the identity function
• Sparse Autoencoders:
• The loss function has an additional sparsity penalty 𝛀(𝐡) (usually the L1 regularizer) which induces sparsity in the code
• Denoising Autoencoders:
• Before the training phase, the training data is altered by some form of noise:
• The model is more stable and robust to the induced corruption
• Variational Autoencoders:
• Perform variational inference on the distribution of the training data ☠
How to detect extreme rare events
• A Bidirectional Long Short-Term Memory Autoencoder overcomes these limitations with a modelling approach that:
• Learns and reconstructs the nominal behaviour of a time series
• Uses the signal-versus-reconstruction error to detect anomalies
• Devices’ normal behaviour is often affected by external factors or variables which are not evident when analysing the signals’ behaviour over time, due to:
• Unmonitored or unknown environmental conditions
• Additional harsh conditions that add noise, e.g. radiation dose
• Measurement and data-acquisition errors, i.e. the difference between the measured value of a quantity and its true value
Extreme rare events detection
[Architecture diagrams: LSTM Autoencoder and Bidirectional LSTM Autoencoder]
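A hedged sketch of what a Bidirectional LSTM autoencoder of this kind might look like in Keras; the window length, feature count and layer sizes are assumptions made here, not the presentation's actual model:

```python
# Illustrative Bidirectional LSTM autoencoder for fixed-length signal windows.
import tensorflow as tf

timesteps, n_features = 48, 1                       # e.g. 48 temperature samples per window

inputs = tf.keras.Input(shape=(timesteps, n_features))
# Encoder: a bidirectional LSTM compresses the whole window into one code vector.
encoded = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32))(inputs)
# Decoder: repeat the code for every time step and reconstruct the sequence.
repeated = tf.keras.layers.RepeatVector(timesteps)(encoded)
decoded = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(32, return_sequences=True))(repeated)
outputs = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Dense(n_features))(decoded)

bilstm_autoencoder = tf.keras.Model(inputs, outputs)
bilstm_autoencoder.compile(optimizer="adam", loss="mse")
# Trained only on nominal-behaviour windows, so anomalies reconstruct poorly:
# bilstm_autoencoder.fit(x_nominal, x_nominal, epochs=100, batch_size=64)
```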
Anomalies as Outliers in the Reconstruction Error Distribution
• The model learns how to encode the normal behavior of our devices
• We set the threshold as an extreme value in the distribution of the reconstruction errors on data we assume to be anomaly-free
• Points whose reconstruction error exceeds the threshold are flagged as anomalies
[Plots: reconstruction error distribution under normal behavior vs. abnormal behavior]
• Subsequent anomalies trigger a critical alarm on the monitoring framework (a sketch of this thresholding step follows below)
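Continuing the previous sketch, one possible implementation of the thresholding step; the quantile, the three-in-a-row alarm rule and the arrays `x_nominal` / `x_live` are illustrative assumptions:

```python
# Hedged sketch of anomaly detection via extreme reconstruction errors.
import numpy as np

# x_nominal, x_live: 3-D arrays of shape (n_windows, timesteps, n_features),
# placeholders assumed here; bilstm_autoencoder comes from the previous sketch.

def reconstruction_errors(model, windows):
    """Per-window mean squared reconstruction error."""
    recon = model.predict(windows, verbose=0)
    return np.mean((windows - recon) ** 2, axis=(1, 2))

# Threshold = an extreme value of the error distribution on anomaly-free data.
errors_nominal = reconstruction_errors(bilstm_autoencoder, x_nominal)
threshold = np.quantile(errors_nominal, 0.999)

# Flag windows whose error exceeds the threshold; a run of consecutive flags
# (here 3 in a row) raises a critical alarm on the monitoring framework.
errors_live = reconstruction_errors(bilstm_autoencoder, x_live)
flags = errors_live > threshold
critical_alarm = any(flags[i:i + 3].all() for i in range(len(flags) - 2))
```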
Results
[Result plots: left, no anomalies detected and the temperature never goes over the critical level; right, anomaly detected on 06-19-2019 and temperature over the threshold on 07-23-2019]
Conclusions & Future developments
• With these techniques we have shown that it is possible to predict well in advance whether a device deviates from its nominal behavior, thus predicting a potential fault
• It is also possible to assess the criticality of the detected anomaly
• We aim to generalize the framework to a multi-system anomaly detector, for the following systems:
• Uninterruptible Power Supplies and, more generally, batteries ✔
• Collimators ✔
• Electrical Transformers ✘
• Hydraulic pumps ✘
• Compressors ✘
• Future features will also be added to infer the type of anomaly and its extent (sensor, component, sub-system, system)
References
Zhang, C. et al., (2018), “A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis
in Multivariate Time Series Data“, arXiv:1811.08055v1.
Marchi, E. et al., (2015), “A Novel Approach for Automatic Acoustic Novelty Detection Using a
Denoising Autoencoder with Bidirectional LSTM Neural Networks”, ICASSP.
Sakurada, M., Yairi, T., (2014), “Anomaly Detection Using Autoencoders with Nonlinear
Dimensionality Reduction”, ACM.
Zhou, C., Paffenroth, R. C., (2017), “Anomaly Detection with Robust Deep Autoencoders”, KDD.
Malhotra P., et al., (2016), “LSTM-based Encoder-Decoder for Multi-sensor Anomaly Detection”, ICML.
Gong, D. et al., (2019) “Memorizing Normality to Detect Anomaly: Memory-augmented Deep
Autoencoder for Unsupervised Anomaly Detection”, arXiv:1904.02639v2.
Majid S. alDosari, (2016) “Unsupervised Anomaly Detection in Sequences Using Long Short Term
Memory Recurrent Neural Networks”.
Thank you!
