Anomaly Detection in Sequences of Short Text Using Iterative Language Models
1. Want Access to the Coding Examples?
https://github.com/dn3kmc/dsaa_2020_tutorial_code_public
https://hub.docker.com/r/ianbeaver/dsaa2020
2. How to Determine the Optimal Anomaly Detection Method For Your Application
Cynthia Freeman, Research Scientist
Ian Beaver, Chief Scientist
3. Overview
1. Background
Define: time series, anomalies
Why is anomaly detection hard?
2. Time Series Characteristics and How to Detect Them
Seasonality, Trend, Concept Drift, Missing Time Steps
3. Dataset Resources
4. Anomaly Detection Methods
STL, SARIMA, Prophet, GPs, RNNs, etc.
5. Evaluation Methods
Numenta Benchmark Scores, Windowed F-Scores
6. Which anomaly detection method given a characteristic?
7. Human-in-the-Loop Methods
6. Time Series
A time series is a sequence of data points indexed in order of time.
How are time series used?
Stock Market
Tracking KPIs
Medical Sensors
Weather Patterns
7. Anomalies
An anomaly in a time series is a pattern that does not conform to past patterns of behavior.
Applications:
Efficient troubleshooting
Fraud detection
Ensuring undisrupted business
Saving lives in system health monitoring
Anomaly Detection is hard!
14. Which anomaly detection method should I use?
Base this decision on the characteristics the time series possesses
Evaluate anomaly detection methods on time series characteristics as an example
Experiment with 2 evaluation criteria
Window-based F-score
Numenta Anomaly Benchmark (NAB) Score
Human-in-the-loop methodologies
16. Simple Example: Sliding Gaussian Window Detector
Estimate mean and variance over sliding window
Compute a score based on the tail probability
$S(y_t) = P(y_t \le \tau \mid \mu, \sigma^2)$
Use max relative to upper and lower extremes
[Plot: example time series, 02-24 00:00 to 02-28 00:00]
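The sliding Gaussian window detector on this slide can be sketched in a few lines. This is a minimal illustration, not the presenters' implementation: the function name, the default window length, and the reading of "max relative to upper and lower extremes" as the larger of the lower-tail and upper-tail probabilities are all assumptions.

```python
from statistics import NormalDist, mean, pstdev

def sliding_gaussian_scores(values, window=20, min_std=1e-9):
    """Score each point against a Gaussian fitted to the preceding window.

    Assumption: the score is max(P(Y <= y), P(Y > y)), so values far in
    either tail approach 1 and values near the window mean sit near 0.5.
    """
    scores = []
    for t, y in enumerate(values):
        history = values[max(0, t - window):t]
        if len(history) < 2:
            scores.append(0.0)  # not enough history to estimate mean and variance
            continue
        mu = mean(history)
        sigma = max(pstdev(history), min_std)  # guard against zero variance
        cdf = NormalDist(mu, sigma).cdf(y)
        scores.append(max(cdf, 1.0 - cdf))
    return scores

series = [0.0, 0.1, -0.1] * 10 + [5.0]  # a spike at the end
scores = sliding_gaussian_scores(series)
```

The final point receives the highest score because it lies far outside the spread of the preceding window.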
19. Stationarity
A time series is stationary if the mean, variance, and autocorrelation structure are constant for all time
Autocorrelation: the correlation of a signal with a delayed copy of itself
A white noise process is stationary.
20. How can a time series be non-stationary?
Several possibilities:
Seasonality
Trend
Concept Drift
21. Seasonality
Presence of variations that occur at specific regular intervals
Real data often exhibits seasonal effects at multiple time scales.
Day-of-week
Hour-of-day
Can be irregular
Day-of-month
Holidays
Can be Additive or Multiplicative
If multiplicative, amplitude of seasonal behavior is dependent on the mean
[Plot: example time series against timestamp, July 2014]
22. Seasonality is not always obvious at a glance!
[Plot panels: Time Series, Autocorrelation]
The autocorrelation plot can help identify seasonality
The autocorrelation plot displays autocorrelation coefficients
23. Autocorrelation Plots
Autocorrelation coefficient equation:
$$r_k = \frac{\sum_{t=k+1}^{T} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2}$$
where $r_k$ = correlation between $y_t$ and $y_{t-k}$ and $T$ = length of time series
[Plot panels: Time Series, Autocorrelation]
x-axis is k (lag), y-axis is $r_k$
seasonality present → definitive repeated spikes
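The autocorrelation coefficient above translates directly into code. The sketch below uses only the standard library; the function name and the toy series are illustrative choices, not part of the original slides.

```python
def autocorrelation(y, k):
    """Autocorrelation coefficient r_k of the sequence y at lag k."""
    T = len(y)
    y_bar = sum(y) / T
    # The slide's sum runs over t = k+1..T (1-indexed), i.e. t = k..T-1 here.
    numerator = sum((y[t] - y_bar) * (y[t - k] - y_bar) for t in range(k, T))
    denominator = sum((v - y_bar) ** 2 for v in y)
    return numerator / denominator

# A toy series with period 4: spikes in r_k are expected at lags 4, 8, 12, ...
y = [0, 1, 0, -1] * 6
r = [autocorrelation(y, k) for k in range(9)]
```

For this series the coefficients are strongly positive at lags 4 and 8 and strongly negative at lags 2 and 6, which is the repeated-spike pattern the slide describes.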
24. Automatic Detection of Seasonality
What about a function to automatically detect seasonality?
R's findfrequency will return the period with the maximum spectral amplitude of the signal
What does this mean?
Quick Review:
Period = # of time steps required to complete a single cycle
Frequency = fraction of a cycle that's completed in a single time step
Frequency = 1 / period
Amplitude = measure of change in a single period
The spectral density is a frequency domain representation of a time series; we want to represent the time series as a sum of sine and cosine waves!
25. Automatic Detection of Seasonality
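Given a time series with n distinct values, we can represent it as a sum of sine and cosine waves!
$$x_t = \sum_{j=1}^{n/2} \left[ \beta_1\!\left(\tfrac{j}{n}\right) \cos(2\pi\omega_j t) + \beta_2\!\left(\tfrac{j}{n}\right) \sin(2\pi\omega_j t) \right]$$
$\omega_j = \tfrac{1}{n}, \tfrac{2}{n}, \ldots, \tfrac{n/2}{n}$ are the harmonic frequencies (j is a positive integer)
$\beta_1(j/n)$ and $\beta_2(j/n)$ are parameters that can be estimated using FFT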
26. Automatic Detection of Seasonality
Periodogram graphs the importance of possible frequency values that might explain the oscillation pattern of the data.
After FFT, we can plot the periodogram.
x-axis is frequency $j/n$
y-axis is $P(j/n) = \beta_1^2(j/n) + \beta_2^2(j/n)$
Large $P(j/n)$ → Frequency $j/n$ is important in explaining the oscillation in the observed series.
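As a rough illustration of the periodogram idea, the sketch below estimates $\beta_1(j/n)$ and $\beta_2(j/n)$ at each harmonic frequency and reports the period with the largest $P(j/n)$. It is not R's findfrequency: for self-containment it uses direct $O(n^2)$ sums instead of an FFT, and the function names and the mean-centering step are assumptions.

```python
import math

def periodogram(x):
    """Return (frequency j/n, P(j/n)) pairs for j = 1 .. n/2."""
    n = len(x)
    mean_x = sum(x) / n
    centered = [v - mean_x for v in x]  # drop the constant (zero-frequency) part
    result = []
    for j in range(1, n // 2 + 1):
        omega = j / n
        b1 = 2.0 / n * sum(v * math.cos(2 * math.pi * omega * t)
                           for t, v in enumerate(centered))
        b2 = 2.0 / n * sum(v * math.sin(2 * math.pi * omega * t)
                           for t, v in enumerate(centered))
        result.append((omega, b1 * b1 + b2 * b2))
    return result

def dominant_period(x):
    """Period (1 / frequency) of the harmonic with the largest P(j/n)."""
    omega, _ = max(periodogram(x), key=lambda pair: pair[1])
    return 1.0 / omega

# A sine wave that completes one cycle every 12 time steps.
x = [3.0 + math.sin(2 * math.pi * t / 12) for t in range(120)]
period = dominant_period(x)
```

For this input the largest periodogram value falls at frequency 10/120, so the recovered period is 12 time steps.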
29. Trend
The process mean can change over time.
Two types of trends: Deterministic and Stochastic
30. Deterministic vs Stochastic Trends
How a trend is eliminated depends on which type of trend is present
Stochastic trends (difference-stationary)
The mean trend is stochastic.
Eliminated by differencing
Detected via the Augmented Dickey Fuller Test
Deterministic trends (trend-stationary)
The mean trend is deterministic.
Eliminated by detrending
Detected via the Cox-Stuart Test
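The two elimination strategies can be sketched as follows. This is a small illustration under stated assumptions: detrending is taken to mean subtracting an ordinary least-squares line, and both function names are hypothetical. The Augmented Dickey-Fuller and Cox-Stuart tests themselves are not implemented here.

```python
def difference(y):
    """First difference y[t] - y[t-1], used for stochastic trends."""
    return [b - a for a, b in zip(y, y[1:])]

def detrend_linear(y):
    """Subtract a least-squares line (assumed form of a deterministic trend)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    slope = (sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
             / sum((t - t_mean) ** 2 for t in range(n)))
    intercept = y_mean - slope * t_mean
    return [v - (intercept + slope * t) for t, v in enumerate(y)]

y = [3 + 2 * t for t in range(10)]  # a purely linear trend
diffed = difference(y)              # constant step size
residual = detrend_linear(y)        # approximately zero everywhere
```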
35. Data Resources
Numenta Anomaly Benchmark Repository
https://github.com/numenta/NAB/tree/master/data
Annotation Instructions:
https://drive.google.com/file/d/0B1_XUjaAXeV3YlgwRXdsb3Voa1k/view
UCR Time Series Classification Archive
https://www.cs.ucr.edu/~eamonn/time_series_data/
Time Series Data Library
https://pkg.yangzhuoranyang.com/tsdl/
Kaggle
https://www.kaggle.com/datasets?tagids=6618
38. STL
Local regression with LOESS
$y(t) = S(t) + T(t) + \epsilon(t)$
Decompose into season and trend
LOESS smoothing can interpolate missing data
Residual should look more stationary
STL: A seasonal-trend decomposition by Cleveland, Robert B., et al.
41. ARMA
A family of Gaussian models with temporal correlation.
$$\underbrace{y(t) - \sum_{i=1}^{p} \theta_i\, y(t-i)}_{\text{AR}} = \underbrace{\epsilon(t) + \sum_{j=1}^{q} \phi_j\, \epsilon(t-j)}_{\text{MA}}$$
Autoregressive (AR)
The value at time t is a linear combination of p past values plus current noise signal.
Moving Average (MA)
The value at time t is a linear combination of q past values of noise.
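To make the AR and MA parts concrete, the sketch below generates a series from a given noise sequence using the slide's notation ($\theta_i$ for the AR coefficients, $\phi_j$ for the MA coefficients). The function name and the choice to treat values before the start of the series as zero are assumptions.

```python
def arma_series(eps, theta, phi):
    """Generate y(t) = sum_i theta_i*y(t-i) + eps(t) + sum_j phi_j*eps(t-j).

    Terms that would reach before t = 0 are treated as zero (an assumption).
    """
    y = []
    for t in range(len(eps)):
        ar = sum(th * y[t - i] for i, th in enumerate(theta, start=1) if t - i >= 0)
        ma = sum(ph * eps[t - j] for j, ph in enumerate(phi, start=1) if t - j >= 0)
        y.append(ar + eps[t] + ma)
    return y

# A single unit shock shows the difference between the two parts:
ar_response = arma_series([1, 0, 0, 0], theta=[0.5], phi=[])  # decays geometrically
ma_response = arma_series([1, 0, 0], theta=[], phi=[0.4])     # vanishes after q steps
```

The AR response to one shock persists indefinitely while shrinking, whereas the MA response disappears once the shock is more than q steps old.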
42. ARIMA and SARIMA
ARIMA (p,d,q)
ARMA on differenced signal.
SARIMA (p,d,q,P,D,Q,s)
Extend ARIMA to incorporate longer-term seasonal correlation.
44. Facebook Prophet
Uses an additive model:
$y(t) = g(t) + s(t) + h(t) + \epsilon_t$
g(t) is linear/logistic growth trend
s(t) is yearly/weekly seasonal component
h(t) is user-provided list of holidays
Forecasting at Scale by Taylor, Sean J., and Benjamin Letham.
46. What is a Gaussian Process?
A Gaussian distribution over functions consistent with our data
$p(f(x)) = \mathcal{N}(\mu(x), K(x, x))$
µ(x) is the mean function¹
K(x, x) is the covariance matrix
K(x, x) gives us power of expression...
¹ Usually flat functions are used here
47. Covariance Matrix
Assuming we have n many points, the covariance matrix² is...
$$K(x, x) = \begin{bmatrix} k(x_1, x_1) & \cdots & k(x_1, x_n) \\ \vdots & \ddots & \vdots \\ k(x_n, x_1) & \cdots & k(x_n, x_n) \end{bmatrix}$$
k is the covariance kernel function. If my data has...
Stationarity → $k(x, x') = \sigma^2 \exp\left(-\frac{(x - x')^2}{2\ell^2}\right)$
Periodicity → $k(x, x') = \sigma^2 \exp\left(-\frac{2\sin^2(\pi|x - x'|/p)}{\ell^2}\right)$
Trend → $k(x, x') = \sigma_b^2 + \sigma_v^2 (x - c)(x' - c)$
² K has to be a positive semidefinite matrix
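The three kernels and the covariance matrix built from them can be written out as below. This is a sketch: the function names, the default parameter values, and the reading of the length-scale symbol as $\ell$ (parameter `length`) are assumptions.

```python
import math

def rbf_kernel(x1, x2, sigma=1.0, length=1.0):
    """Squared-exponential kernel, suited to stationary data."""
    return sigma ** 2 * math.exp(-((x1 - x2) ** 2) / (2 * length ** 2))

def periodic_kernel(x1, x2, sigma=1.0, length=1.0, period=1.0):
    """Periodic kernel: inputs a whole number of periods apart correlate fully."""
    s = math.sin(math.pi * abs(x1 - x2) / period)
    return sigma ** 2 * math.exp(-2 * s ** 2 / length ** 2)

def linear_kernel(x1, x2, sigma_b=1.0, sigma_v=1.0, c=0.0):
    """Linear kernel, suited to data with a trend."""
    return sigma_b ** 2 + sigma_v ** 2 * (x1 - c) * (x2 - c)

def covariance_matrix(xs, kernel):
    """n x n matrix with entry (i, j) equal to kernel(xs[i], xs[j])."""
    return [[kernel(a, b) for b in xs] for a in xs]

K = covariance_matrix([0.0, 1.0, 2.0], rbf_kernel)
```

The matrix is symmetric with the kernel variance on its diagonal, and entries shrink as the two inputs move apart.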
48. Prediction
Once I have my mean and covariance functions, I can predict the future!³
1. Given $x_*$, I want to know what $f(x_*)$ is
2. We just select a point from $p(f(x_*)) = \mathcal{N}(m_*, C_*)$ where
$m_* = \mu(x_*) + K(x_*, x) K(x, x)^{-1} (f(x) - \mu(x))$
$C_* = K(x_*, x_*) - K(x_*, x) K(x, x)^{-1} K(x_*, x)^T$
Time complexity is $O(n^3)$ because we have to find the inverse of K.
³ Or interpolate
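The prediction equations for a single test point $x_*$ can be sketched with the standard library alone. Several choices here are assumptions rather than anything stated on the slide: a constant mean function, a squared-exponential kernel, a small diagonal `jitter` for numerical stability, and solving linear systems by Gaussian elimination instead of forming $K(x, x)^{-1}$ explicitly (the cost is still cubic in n).

```python
import math

def rbf(a, b, sigma=1.0, length=1.0):
    return sigma ** 2 * math.exp(-((a - b) ** 2) / (2 * length ** 2))

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        z[i] = (M[i][n] - sum(M[i][c] * z[c] for c in range(i + 1, n))) / M[i][i]
    return z

def gp_predict(xs, fs, x_star, kernel=rbf, mean=0.0, jitter=1e-9):
    """Return (m_star, C_star) for one test input x_star."""
    K = [[kernel(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [kernel(x_star, a) for a in xs]
    alpha = solve(K, [f - mean for f in fs])  # K(x,x)^-1 (f(x) - mu(x))
    m_star = mean + sum(k * a for k, a in zip(k_star, alpha))
    v = solve(K, k_star)                      # K(x,x)^-1 K(x*,x)^T
    c_star = kernel(x_star, x_star) - sum(k * w for k, w in zip(k_star, v))
    return m_star, c_star

xs, fs = [0.0, 1.0, 2.0], [0.0, 1.0, 0.5]
m_at_data, c_at_data = gp_predict(xs, fs, 1.0)  # at an observed input
m_far, c_far = gp_predict(xs, fs, 50.0)         # far from all observations
```

At an observed input the prediction reproduces the observation with near-zero variance. Far from the data it falls back to the prior mean and prior variance.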
50. Recurrent Neural Network
Given a window of $n_{lag}$ time steps in the past, predict a window of $n_{seq}$ time steps in the future
Anomaly score is an average of the prediction error
Adaptive: uses online gradient-based optimizer, built to deal with concept drift
Choice of $n_{seq}$ can greatly affect false positive rate
Online Anomaly Detection with Concept Drift Adaptation using RNNs CoDS-COMAD ’18, January 11–13, 2018, Goa, India
where T is length of the time series:
reset gate: $r_t^{(i)} = \sigma(W_r^{(i)} \cdot [D(z_t^{(i-1)}), z_{t-1}^{(i)}])$
update gate: $u_t^{(i)} = \sigma(W_u^{(i)} \cdot [D(z_t^{(i-1)}), z_{t-1}^{(i)}])$
proposed state: $\tilde{z}_t^{(i)} = \tanh(W_p^{(i)} \cdot [D(z_t^{(i-1)}), r_t^{(i)} \odot z_{t-1}^{(i)}])$
hidden state: $z_t^{(i)} = (1 - u_t^{(i)}) \odot z_{t-1}^{(i)} + u_t^{(i)} \odot \tilde{z}_t^{(i)}$    (1)
where $\odot$ is Hadamard product, [a, b] is concatenation of vectors a and b, D(·) is dropout operator that randomly sets the dimensions of its argument to zero with probability equal to dropout rate, $z_t^{0}$ equals the value of the input time series at time t. $W_r$, $W_u$, and $W_p$ are weight matrices of appropriate dimensions s.t. $r_t^{(i)}$, $u_t^{(i)}$, $\tilde{z}_t^{(i)}$, and $z_t^{(i)}$ are vectors in $\mathbb{R}^{c^{(i)}}$, where $c^{(i)}$ is the number of units in layer i. The sigmoid ($\sigma$) and tanh activation functions are applied element-wise. The hidden state $z_t^{(i)}$ is used to obtain the output via a linear or non-linear output layer. The parameters $W = [W_r, W_u, W_p]$ of the RNN consist of the weight matrices in Equations 1. Dropout is used for regularization [28, 33] and is applied only to the non-recurrent connections, ensuring information flow across time-steps.
3 APPROACH
We assume that a model that is able to predict the next few
Figure 1: Steps in Online RNN-AD approach (at time t and at time t+1: prediction using RNN, anomaly score computation, RNN updation using BPTT)
obtained using RNNs and then used for anomaly score computation as well as incremental model updation. Overall steps of the algorithm are depicted in Figure 1.
3.1 Online RNN-AD
Consider a multivariate time series x = {x1, x2, ..., xt}, where
m
Illustration from Saurav et al. '18
53. Anomaly Scores
Anomaly detectors are adapted to output a score between 0 and 1
STL: Apply Q-function to residuals
SARIMA, Prophet, Gaussian: Apply Q-function to forecasting error
RNN: Apply Q-function to unnormalized anomaly score
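One way to turn raw errors into a score between 0 and 1 with the Q-function is sketched below. The slide does not spell out the exact mapping, so the standardization against the mean and standard deviation of the errors and the use of $1 - Q(z)$ are assumptions, as are the function names.

```python
import math
from statistics import mean, pstdev

def q_function(x):
    """Gaussian tail probability Q(x) = P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def anomaly_scores(errors):
    """Map each error to 1 - Q(z), where z is its standardized value (assumed mapping)."""
    mu = mean(errors)
    sigma = pstdev(errors) or 1.0  # avoid dividing by zero when all errors are equal
    return [1.0 - q_function((e - mu) / sigma) for e in errors]

errors = [0.1] * 9 + [5.0]  # one unusually large forecasting error
scores = anomaly_scores(errors)
```

Typical errors land well below 0.5 here, while the one large error is pushed close to 1.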
54. Numenta Anomaly Benchmark Scoring
For every predicted anomaly y, its score σ(y) is determined by its position relative to its containing window or an immediately preceding window
For every ground truth anomaly, construct an anomaly window with the anomaly in the center.
Window size: $\frac{0.1 \times \text{length of time series}}{\#\text{ of true anomalies}}$
(FN) are not applicable for evaluating algorithms for the above requirements.
Fig. 2. Shaded red regions represent the anomaly windows for this data file. The shaded purple region is the first 15% of the data file, representing the probationary period. During this period the detector is allowed to learn the data patterns without being tested.
To promote early detection NAB defines anomaly windows. Each window represents a range of data points that is centered around a ground truth anomaly label. Fig. 2 shows an example using the data from Fig 1. A scoring function (described in more detail below) uses these windows to identify and weight true positives, false positives, and false negatives. If there are multiple detections within a window, the earliest detection is given credit and counted as a true positive. Additional positive detections within the window are ignored. The sigmoidal scoring function gives higher positive scores to true positive detections earlier in a window and negative scores to detections outside the window (i.e. the false positives). These properties are illustrated in Fig. 3 with an example.
How large should the windows be? The earlier a detector can reliably identify anomalies the better, implying these windows should be as large as possible. The tradeoff with extremely large windows is that random or unreliable
the cost of a false negative is far higher than the cost of a false positive. Alternatively, an application monitoring the statuses of individual servers in a datacenter might be sensitive to the number of false positives and be fine with the occasional missed anomaly since most server clusters are relatively fault tolerant.
To gauge how algorithms operate within these different application scenarios, NAB introduces the notion of application profiles. For TPs, FPs, FNs, and TNs, NAB applies different relative weights associated with each profile to obtain a separate score per profile.
Fig. 3. Scoring example for a sample anomaly window, where the values represent the scaled sigmoid function, the second term in Eq. (1). The first point is an FP preceding the anomaly window (red dashed lines) and contributes -1.0 to the score. Within the window we see two detections, and only count the earliest TP for the score. There are two FPs after the window. The first is less detrimental because it is close to the window, and the second yields -1.0 because it’s too far after the window to be associated with the true anomaly. TNs make no score contributions. The scaled sigmoid values are multiplied by the relevant application profile weight, as shown in Eq. (1), the NAB score for this example would calculate as: $-1.0 A_{FP} + 0.9999 A_{TP} - 0.8093 A_{FP} - 1.0 A_{FP}$. With the standard application profile this would result in a total score of 0.6909.
Illustration from Lavin Ahmad '15
55. Numenta Anomaly Benchmark Scoring (Continued)
The raw score is computed as:
$$S_d = \left( \sum_{y \in Y_d} \sigma(y) \right) + A_{FN} f_d$$
$A_{FN}$ is cost of false negatives
Then rescale to get summary score:
$$100 \times \frac{S - S_{null}}{S_{perfect} - S_{null}}$$
Choose threshold that maximizes score
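The raw and rescaled scores can be sketched as below. This is not the NAB reference implementation. The function names are hypothetical, and the weights $A_{TP} = 1.0$ and $A_{FP} = 0.11$ are taken to be the standard application profile: they are not stated in this transcript, but they do reproduce the 0.6909 total quoted in the Fig. 3 caption. The false-negative weight is passed as a negative number so that each missed window lowers the score, which is an assumed sign convention.

```python
def raw_score(scaled_sigmoid_values, a_tp, a_fp, a_fn, missed_windows):
    """Sum the weighted detection scores, then add A_FN per missed window.

    Positive scaled-sigmoid values are treated as true positives and
    weighted by a_tp; negative ones are false positives weighted by a_fp.
    """
    total = 0.0
    for value in scaled_sigmoid_values:
        total += value * (a_tp if value > 0 else a_fp)
    return total + a_fn * missed_windows

def normalized_score(raw, null, perfect):
    """Rescale so the null detector scores 0 and the perfect detector 100."""
    return 100.0 * (raw - null) / (perfect - null)

# The four detections described in the Fig. 3 caption; no windows are missed.
fig3 = raw_score([-1.0, 0.9999, -0.8093, -1.0],
                 a_tp=1.0, a_fp=0.11, a_fn=-1.0, missed_windows=0)
```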
56. Window-based F-score
Segment into nonoverlapping windows
Window is anomalous if it contains an anomaly
Treat like binary classification and report F1
Choose threshold that minimizes # of errors
Prefer detection in case of tie
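A minimal sketch of the window-based F-score follows, assuming per-time-step 0/1 flags for both the ground truth and the predictions. The function names and the convention of returning 0 when there are no true positives are assumptions, and threshold selection is left out.

```python
def window_labels(flags, window):
    """Mark each nonoverlapping window as anomalous if any flag inside it is set."""
    return [any(flags[i:i + window]) for i in range(0, len(flags), window)]

def window_f1(truth_flags, predicted_flags, window):
    """F1 over windows, treating each window as one binary classification."""
    truth = window_labels(truth_flags, window)
    pred = window_labels(predicted_flags, window)
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    fp = sum(1 for t, p in zip(truth, pred) if p and not t)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

truth = [0, 0, 1, 0,  0, 0, 0, 0,  0, 1, 0, 0]
pred  = [0, 1, 0, 0,  0, 0, 1, 0,  0, 0, 0, 0]
score = window_f1(truth, pred, window=4)
```

With windows of four steps the first window is a hit, the second a false alarm, and the third a miss. The prediction in the first window still counts even though it is one step away from the true anomaly.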
60. Which methods are promising given a characteristic?
Seasonality and Trend
STL, SARIMA, Prophet
Concept Drift
Requires more complex methods such as HTMs
Missing Time Steps
Performance varies based on evaluation strategy
Area for future work: more methods needed!
61. Which evaluation strategy should I use?
F-score scheme is more restrictive
NAB scores have more wiggle room for false positives due to reward for early detection
What evaluation metric to use is entirely based on the needs of the user
63. Human-in-the-Loop
Not advisable to completely remove the human element
Predicted anomalies given to user to annotate (Is the predicted anomaly truly an anomaly?)
Based on user decision:
Idea One: The parameters for that method can be tuned to reduce the error.
Idea Two: The anomaly score is tuned to reduce the error.
64. Concept One
Avoid predicted anomaly clusters.
Weight anomaly scores after a prediction by multiplying by a sigmoid function, $\operatorname{erf}(x) = \frac{1}{\sqrt{\pi}} \int_{-x}^{x} e^{-t^2}\,dt$, to briefly reduce the anomaly scores of clustered anomalies
65. Concept Two
Users disagree with a prediction → similar instances should not be detected.
1. Use MASS⁴ to find similar subsequences (motifs)
2. Reduce the anomaly scores corresponding to these motifs by multiplying them by a sigmoid function:
$$y = \frac{1}{1 + e^{-kx + b}}$$
where $b = \ln\left(\frac{1 - \text{min\_weight}}{\text{min\_weight}}\right)$, $k = \frac{\ln(\ ) - b}{-\text{max\_distance}}$
min_weight = minimum weight multiplied to the anomaly scores
max_distance = max discord distance from the query
⁴ Mueen's Algorithm for Similarity Search [14]
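The weighting function can be sketched as follows. The argument of the logarithm in the expression for k did not survive in this transcript, so the code takes it as a hypothetical parameter `c` with an arbitrary default; the function name is also an assumption. With this form the weight equals min_weight at distance 0 and $1/(1 + c)$ at max_distance, so a small `c` leaves distant subsequences almost untouched.

```python
import math

def motif_weight(distance, min_weight, max_distance, c=1e-3):
    """Sigmoid weight y = 1 / (1 + exp(-k*x + b)) applied to an anomaly score.

    `c` stands in for the missing argument of ln(.) in the slide's formula
    for k; its default value is a placeholder, not taken from the source.
    """
    b = math.log((1 - min_weight) / min_weight)
    k = (math.log(c) - b) / (-max_distance)
    return 1.0 / (1.0 + math.exp(-k * distance + b))

closest = motif_weight(0.0, min_weight=0.2, max_distance=10.0)    # exact match
farthest = motif_weight(10.0, min_weight=0.2, max_distance=10.0)  # at max distance
```

Subsequences nearly identical to the rejected prediction have their scores scaled down to min_weight, and the reduction fades as the distance grows.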
69. In Summary
The existence of an anomaly detection method that is optimal for all domains is a myth
Determine the characteristics present in the data to narrow down the choices for anomaly detection methods
Incorporate user feedback on predicted outliers by utilizing subsequence similarity search, reducing the need for annotation while also increasing evaluation scores