Live Anomaly Detection

Arun Kejariwal, Statistical Learning Principal at Machine Zone, Inc.

Identifying anomalies in live data
Dhruv Choudhary, Francois Orsini, Arun Kejariwal
SATORI #StrataData

What is live data?
• Petabytes of data, live reactions
• Time to reaction: less than about 5 ms
• Live data is new streaming data that needs to be reacted upon instantly.

Where is live data?

Live Data Properties
Most recent and real-time reactive data:
• Most recent
• Low latency
• Unstructured
• Highly reactive
• High throughput

Satori - A Live Data Computation Mesh
Satori powers a live data mesh capable of managing data flows from billions of endpoints simultaneously at millisecond latency.

Live Data and Big Data?

Live Data Platform
• Satori is a fully managed platform as a service.
• Connect, process, and react to streaming live data at ultra-low latency.
• Use cases include (but are not limited to) IoT, mobile fitness, gaming, and smart cities.

In-Stream Live Anomaly Detection

Data Quadrants

What are live anomalies?
• Data types: audio, time series, video, text, binary
• Examples: gunshot sound, stock market crash, profanity filters, road accident

Audio Time Series
• Pipeline: audio, time series, prediction error
• Example audio series: engine misfiring
• Applications: connected cars, surveillance, equipment malfunction
• Features: FFT window features; timbre, tempo, dynamics
• Other approaches: WaveNet
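
To make the "FFT window features" idea concrete, the sketch below cuts a signal into fixed windows, computes a magnitude spectrum per window, and keeps one feature per window (the strongest frequency bin), which turns audio into a time series. The function names, the choice of feature, and the synthetic signal are assumptions for illustration; the deck does not specify them. A naive DFT is used so the example needs only the standard library.

```python
import cmath
import math

def dft_magnitudes(window):
    """Magnitude spectrum of one window via a naive O(n^2) DFT.
    An FFT computes the same values faster."""
    n = len(window)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(window)))
        for k in range(n // 2 + 1)
    ]

def dominant_bin_series(signal, window_size):
    """One feature per non-overlapping window: index of the strongest non-DC bin."""
    features = []
    for start in range(0, len(signal) - window_size + 1, window_size):
        mags = dft_magnitudes(signal[start:start + window_size])
        features.append(max(range(1, len(mags)), key=lambda k: mags[k]))
    return features

def tone(cycles, n=16):
    """Synthetic sine with `cycles` full periods in n samples (illustrative input)."""
    return [math.sin(2 * math.pi * cycles * t / n) for t in range(n)]

# Two windows at one pitch, then a window at a different pitch.
signal = tone(2) + tone(2) + tone(5)
features = dominant_bin_series(signal, window_size=16)
```

The resulting `features` list is an ordinary numeric series, so any of the time series detectors discussed later can be applied to it.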

Text to Time Series
• Pipeline: text, Word2vec averaging, clustering (clusters C0, C1, C2), time series
• Applications: word anomalies, paragraph anomalies, novelty detection
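
As a rough illustration of Word2vec averaging, the sketch below averages per-word vectors into one vector per message and scores each message by its cosine distance from a reference vector, giving one number per message (a time series). The tiny `EMBEDDINGS` table, the function names, and the example messages are made up for illustration; a real pipeline would load trained Word2vec vectors and would likely compare against cluster centroids instead of a single reference.

```python
import math

# Hypothetical 3-dimensional embeddings; real Word2vec vectors are learned and much larger.
EMBEDDINGS = {
    "server": [0.9, 0.1, 0.0],
    "latency": [0.8, 0.2, 0.1],
    "high": [0.7, 0.3, 0.0],
    "pizza": [0.0, 0.1, 0.9],
    "party": [0.1, 0.0, 0.8],
}

def average_vector(text):
    """Average the vectors of known words; None if no word is known."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def text_to_series(messages, reference):
    """Map each message to its distance from the reference text's average vector."""
    ref = average_vector(reference)
    series = []
    for message in messages:
        vec = average_vector(message)
        # Messages with no known words get the maximum score used here (1.0).
        series.append(cosine_distance(vec, ref) if vec is not None else 1.0)
    return series

series = text_to_series(
    ["server latency high", "high latency", "pizza party"],
    reference="server latency high",
)
```

In this toy data the off-topic message scores far higher than the two on-topic ones, which is the signal a downstream detector would threshold.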

Video to Time Series
• Pipeline: video, deep encoders, representation, time series
• Application: shape anomalies

Attributes of Live Anomaly Detection
Model selection depends on:
• Type of anomaly
• Incremental
• Robust
• False alarm rate
• Labels
• Time granularity

Type of Anomaly
• Point anomaly: individual points that break the pattern made by adjoining points
• Changepoint: change in mean, variance, or structure of the series
• Pattern anomaly: a group of collective points that form a pattern never seen before
• Trend anomaly: significant perturbation in the long-term trend of a series

Incremental
• Memory constraints: can all the data be loaded into memory?
• Compute constraints: can you keep up with the data rate?
• Evolutionary data: is the structure of the data changing continuously?

Robust
• Data obsolescence: how fast should we forget past anomalies?
• Anomaly clustering: anomalies usually occur close to each other
(Figure labels: true positives, true negatives)

False Alarm Rate
• Constant rate: how much human attention to allocate?
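
One possible reading of the constant-rate idea is to fix an alert budget and derive the score threshold from it, so the number of alerts matches the human attention available. The sketch below does this over a batch of anomaly scores; the function name, the `alert_rate` parameter, and the scores are illustrative assumptions, not something the deck prescribes.

```python
import math

def threshold_for_alert_rate(scores, alert_rate):
    """Pick the score threshold so that roughly `alert_rate` of points alert."""
    ranked = sorted(scores, reverse=True)
    # Always allow at least one alert so the threshold is well defined.
    budget = max(1, math.floor(len(ranked) * alert_rate))
    return ranked[budget - 1]

# Hypothetical anomaly scores produced by some upstream detector.
scores = [0.1, 0.4, 0.2, 0.9, 0.3, 0.8, 0.05, 0.6, 0.7, 0.5]
threshold = threshold_for_alert_rate(scores, alert_rate=0.2)
alerts = [s for s in scores if s >= threshold]
```

On a live stream the same idea would need a streaming quantile estimate (for example a t-digest) in place of sorting the full batch.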

Labels
• Semi-supervised learning: use unlabelled data for training
• Supervised learning: separate positive and negative samples
• Workflow: label, train, inference

Time Granularity
• Hourly series: daily seasonality
• Minutely series: hourly seasonality
• Secondly series: seasonal jitter

Research Domains
• Statistics
• Pattern mining
• Time series
• Machine learning
• Stream clustering
• Deep learning

Statistics
Parametric statistics: anomaly detection based on strong distribution assumptions
• µ ± 3σ
• Poisson(λ)
• p-value based
Tags: point anomalies, incremental
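
A minimal sketch of the µ ± 3σ rule applied incrementally, assuming roughly Gaussian data. It keeps only a count, a running mean, and a running sum of squared deviations (Welford's method), so memory use is constant. The class name, the `warmup` parameter, and the sample stream are assumptions for illustration.

```python
import math

class RunningThreeSigma:
    """Incremental mean/variance (Welford's method) with a mean ± k*sigma check."""

    def __init__(self, k=3.0, warmup=10):
        self.k = k
        self.warmup = warmup  # assumed minimum sample count before flagging
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        """Return True if x is anomalous relative to earlier points, then absorb x."""
        is_anomaly = False
        if self.n >= self.warmup:
            sigma = math.sqrt(self.m2 / (self.n - 1))
            is_anomaly = abs(x - self.mean) > self.k * sigma
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly

detector = RunningThreeSigma()
stream = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7, 10.1, 9.9, 10.0, 25.0, 10.1]
flags = [detector.update(x) for x in stream]
```

Note that this version absorbs the spike into its statistics, which inflates σ afterwards; that weakness is what motivates robust estimators.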
Robust statistics: rejecting the effect of anomalies while modeling the distribution
• Median-MAD, Winsorization
• Grubbs' test, generalized ESD
• Student's t-test
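
The Median-MAD approach can be sketched as a robust z-score: the median replaces the mean and the median absolute deviation (MAD) replaces the standard deviation, so a few extreme values barely move the baseline. The threshold of 3.5 and the sample window are assumptions for illustration; the factor 1.4826 rescales the MAD to match a standard deviation for normally distributed data.

```python
import statistics

def mad_outliers(values, threshold=3.5):
    """Flag points whose robust z-score (median/MAD based) exceeds the threshold."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return [False] * len(values)  # degenerate case: no spread to compare against
    # 1.4826 scales the MAD to be comparable to a standard deviation under normality.
    return [abs(v - med) / (1.4826 * mad) > threshold for v in values]

window = [10.0, 10.2, 9.8, 10.1, 9.9, 50.0, 10.0, 10.3, 9.7, 60.0]
flags = mad_outliers(window)
outlier_indices = [i for i, flagged in enumerate(flags) if flagged]
```

Both extreme values are flagged even though they make up a fifth of the window, a case where a mean and standard deviation computed on the same window would be heavily distorted.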
Non-parametric statistics
• Histogram-based techniques
• t-digest
• Adjusted box plots
(Figure labels: 99.73%, 0.27%)

Time Series Analysis
• Autoregressive models: model the autocorrelation
• Non-parametric models: no distribution assumption about the structure of residuals
• Dimensionality reduction: model regular perturbations using a lower-rank representation
• Seasonal structure: regular pattern that occurs at a known seasonal period
• Trend structure: long-term change in the level of the series
• Evolutionary structure: changing (unknown) structure of the time series
ARMA, SARMA, EWMA, TBATS: model estimation based on past data
Tags: point anomalies
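
Of the models listed, EWMA is the simplest to sketch. The example below keeps an exponentially weighted estimate of the level and of the squared forecast error, and flags a point when its error exceeds k standard deviations. The class name, `alpha`, `k`, `warmup`, and the sample stream are assumed values for illustration, not settings from the deck.

```python
import math

class EwmaDetector:
    """EWMA of the level and of the squared error; flags large deviations."""

    def __init__(self, alpha=0.2, k=3.0, warmup=5):
        self.alpha = alpha    # smoothing factor (assumed value)
        self.k = k            # width of the band in standard deviations
        self.warmup = warmup  # points absorbed before flagging starts
        self.n = 0
        self.level = 0.0
        self.var = 0.0

    def update(self, x):
        if self.n == 0:
            self.level = x
            self.n = 1
            return False
        error = x - self.level
        is_anomaly = self.n >= self.warmup and abs(error) > self.k * math.sqrt(self.var)
        if not is_anomaly:
            # Only update on normal points so a spike does not drag the forecast.
            self.level += self.alpha * error
            self.var = (1 - self.alpha) * (self.var + self.alpha * error * error)
        self.n += 1
        return is_anomaly

detector = EwmaDetector()
stream = [10.0, 10.5, 9.5, 10.4, 9.6, 10.2, 30.0, 10.1]
flags = [detector.update(x) for x in stream]
```

Skipping updates on flagged points keeps spikes out of the model, but it also means a genuine level shift would keep being flagged; handling that case is what changepoint methods are for.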
STL, LOESS: non-parametric regression to model time series
Tags: point anomalies
PCA, Robust PCA: principal component analysis
EDM, BCP, SDAR: breakout detection, sequential discounting
Tags: point anomalies

Pattern Mining
Mark the rarest elements in the stream as anomalies
• Inter-arrival times for patterns
• HOT SAX
• Rare Rule Anomaly (RRA)
Tags: pattern anomalies, incremental, false alarm rate, robust
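
HOT SAX is a heuristic for quickly finding the most unusual subsequence (the "discord") of a series. The sketch below is not HOT SAX itself: it is the brute-force search that HOT SAX speeds up, shown because it makes the definition of a discord explicit. The function name and the synthetic series are assumptions for illustration; `math.dist` requires Python 3.8 or later.

```python
import math

def find_discord(series, m):
    """Return (start, distance) of the length-m subsequence whose nearest
    non-overlapping neighbour is farthest away (brute force, O(n^2 * m))."""
    n = len(series) - m + 1
    best_start, best_dist = -1, -1.0
    for i in range(n):
        nearest = math.inf
        for j in range(n):
            if abs(i - j) < m:  # skip overlapping (trivial) matches
                continue
            d = math.dist(series[i:i + m], series[j:j + m])
            nearest = min(nearest, d)
        if nearest != math.inf and nearest > best_dist:
            best_start, best_dist = i, nearest
    return best_start, best_dist

# A repeating pattern with one distorted cycle starting at index 8 (value 3 at index 9).
series = [0, 1, 0, -1] * 2 + [0, 3, 0, -1] + [0, 1, 0, -1] * 2
start, dist = find_discord(series, m=4)
```

The returned window covers the distorted value, while every normal window has an exact non-overlapping match elsewhere in the series and therefore a nearest-neighbour distance of zero.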

Clustering
• Algorithms: DBSCAN, k-means, c-means, CluStream, DenStream, ClusTree, D-Stream, HPStream, DBStream
• Micro-clusters: online micro-clustering, offline clustering of micro-clusters (MCs)
(Figure: runtimes of ClusTree, DenStream, and DBStream)
Tags: pattern anomalies, incremental, robust

Deep Learning
• Approaches: LSTM, encoders, LSTM auto-encoders
• Diagram labels: input, reconstructed input, anomaly
• Explicitly models time series structure
• Non-linear dimensionality reduction without modeling time series structure
• Performance degrades as the modality of the series increases

Deep Learning: LSTM
• Time series prediction (input, prediction): point anomalies
• Classifier (input, labels, anomaly): point anomalies
• Time series pattern prediction (input, pattern prediction): pattern anomalies
• No need for a fixed-size window for model estimation

Deep Learning: LSTM, Time Series Pattern Prediction
• Multiple predicted values for each future observation
• Model the errors as a multivariate Gaussian to find anomalous observations
• Model the Euclidean distance between predicted and true sequences as the prediction error
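
The Euclidean-distance variant of prediction error can be sketched independently of the forecasting model. Below, predicted and observed sequences are compared, and an error is flagged when it exceeds the mean plus k standard deviations of errors collected on normal data. This is a simpler univariate threshold, not the multivariate Gaussian model; the hard-coded forecasts, the baseline errors, and `k` are assumptions for illustration.

```python
import math
import statistics

def sequence_errors(predicted, actual):
    """Euclidean distance between each predicted and observed sequence."""
    return [math.dist(p, a) for p, a in zip(predicted, actual)]

def flag_by_error(errors, baseline_errors, k=3.0):
    """Flag errors above mean + k * stdev of errors seen on normal data."""
    mu = statistics.mean(baseline_errors)
    sigma = statistics.stdev(baseline_errors)
    return [e > mu + k * sigma for e in errors]

# Hypothetical forecasts; in the deck they would come from an LSTM.
predicted = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
actual = [[1.1, 2.0, 2.9], [2.0, 3.1, 4.0], [3.0, 9.0, 5.0]]
baseline = [0.10, 0.15, 0.12, 0.18, 0.11]  # assumed errors from anomaly-free data

errors = sequence_errors(predicted, actual)
flags = flag_by_error(errors, baseline)
```

Only the third sequence, where one observed value departs sharply from the forecast, is flagged.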

Deep Learning: LSTM Encoders, Time Series Reconstruction
• Encoder network and decoder network (input, pattern prediction): pattern anomalies
(Figure labels: LSTM-Encoder, LSTM, AutoEncoder, runtimes)

Multi-dimensionality: Correlation in Anomalies
• Naive algorithm: majority vote across all dimensions
• Model correlation: correlation in anomaly space can be captured in a graph
• What about jitter in anomalies? Model anomalies in fixed-size buckets of time.
• What about contextual anomalies? Modeling correlation in the space of the whole series is very expensive for live data.
(Figure: graph over series s1 to s8)
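
The naive majority vote combined with fixed-size time buckets can be sketched as follows. Per-series detectors emit (series, timestamp) pairs; timestamps are mapped to buckets so that small timing jitter between series still lands in the same bucket, and a bucket is reported when more than half of the series flagged something in it. The function name, the bucket size, and the detections are assumptions for illustration.

```python
from collections import defaultdict

def majority_vote(anomalies, num_series, bucket_size):
    """anomalies: iterable of (series_id, timestamp) flagged by per-series detectors.
    Returns sorted bucket indices in which more than half of the series were flagged."""
    buckets = defaultdict(set)
    for series_id, ts in anomalies:
        # Bucketing absorbs small timing jitter between series.
        buckets[int(ts // bucket_size)].add(series_id)
    return sorted(b for b, ids in buckets.items() if len(ids) > num_series / 2)

# Hypothetical per-series detections: (series id, timestamp in seconds).
detections = [
    ("s1", 61.0), ("s2", 62.5), ("s3", 64.9), ("s4", 118.0),
    ("s5", 63.3), ("s6", 60.2), ("s2", 200.0),
]
joint = majority_vote(detections, num_series=8, bucket_size=5.0)
```

A known limitation of fixed buckets is that jitter straddling a bucket boundary splits the votes; overlapping buckets are one way to soften that.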

Multi-dimensionality: LSTM
• No need for a fixed-size window for model estimation
• Can model high dimensionality
• Works with non-stationary time series with irregular structure
• Does not work with evolutionary series

Model Selection
Runtime comparison for single-dimensional and multi-dimensional series
(Figure labels: Statistics, Time Series Analysis, HOT SAX/RRA, One-class SVM/Isolation Forest, LSTM, DBStream)

Operations
• Examples: business metrics, performance metrics, health metrics
• What kind of anomalies: point anomalies, change points, trend anomalies
• Latency sensitivity: < 100 ms

Healthcare
• Examples: electrocardiograms, fitness trackers
• What kind of anomalies: pattern anomalies, change points
• Latency sensitivity: < 1 ms
(Figure labels: ECG, driver stress, epilepsy onset)

Transportation
• Examples: traffic routing, passenger load, scheduling
• What kind of anomalies: point anomalies, change points, trend anomalies, spatial anomalies
• Latency sensitivity: < 5 seconds

Financial Data
• Examples: stock trades, share prices
• What kind of anomalies: point anomalies, change points
• Latency sensitivity: < 100 microseconds

Security
• Examples: network intrusion, video surveillance
• What kind of anomalies: point anomalies, change points
• Latency sensitivity: < 5 ms

Internet of Things
• Examples: smart homes, connected cars, smart devices
• What kind of anomalies: pattern anomalies, change points, trend anomalies
• Latency sensitivity: < 5 ms

Check our tech @ satori.com
@choudharydhruv
@SatoriLiveData
@arun_kejariwal @FrancoisOrsini_

Resources
“Computing Extremely Accurate Quantiles using t-Digests”, https://github.com/tdunning/t-digest
https://deepmind.com/blog/wavenet-generative-model-raw-audio/
https://www.tensorflow.org/tutorials/word2vec
https://blog.keras.io/building-autoencoders-in-keras.html
“Deep Learning for Time Series Analysis”, https://arxiv.org/pdf/1701.01887.pdf

Readings
“Using Natural Language Processing Models for Understanding Network Anomalies”, HPEC’17.
“Deep Recurrent Neural Network-Based Autoencoders for Acoustic Novelty Detection”, CIN’17.
“Collective Anomaly Detection based on Long Short Term Memory Recurrent Neural Network”, FDSE’16.
“Deep Structured Energy Based Models for Anomaly Detection”, ICML’16.
“Variational Inference for On-line Anomaly Detection in High-Dimensional Time Series”, ICML’16.
“Long Short Term Memory Networks for Anomaly Detection in Time Series”, ESANN’15.
“Clustering Data Streams based on Shared Density Between Clusters”, TKDE’16.
“LSTM-based Encoder-Decoder for Multi-sensor Anomaly Detection”, ICML’16 Anomaly Detection Workshop.
“Sequence to Sequence Model for Anomaly Detection in Financial Transactions”, ICML’16.
“MS-LSTM: a Multi-Scale LSTM Model for BGP Anomaly Detection”, NetworkML’16.
“Anomaly detection: A survey”, ACM Computing Surveys, 2009.
“Time Series Analysis by State Space Methods”, by J. Durbin and S. J. Koopman, 2001.
“HOT SAX: Efficiently Finding the Most Unusual Time Series Subsequence”, ICDM, 2005.
“Real-time change-point detection using sequentially discounting normalized maximum likelihood coding”, Advanced Knowledge Discovery Data Mining, 2011.
“Unsupervised Learning of Video Representations using LSTMs”, ICML’15.

Thank you
54
1 of 54

Recommended

Anomaly detection in real-time data streams using Heron by
Anomaly detection in real-time data streams using HeronAnomaly detection in real-time data streams using Heron
Anomaly detection in real-time data streams using HeronArun Kejariwal
4.7K views49 slides
Data Data Everywhere: Not An Insight to Take Action Upon by
Data Data Everywhere: Not An Insight to Take Action UponData Data Everywhere: Not An Insight to Take Action Upon
Data Data Everywhere: Not An Insight to Take Action UponArun Kejariwal
1.5K views37 slides
Modern real-time streaming architectures by
Modern real-time streaming architecturesModern real-time streaming architectures
Modern real-time streaming architecturesArun Kejariwal
7.2K views175 slides
Strata 2014 Anomaly Detection by
Strata 2014 Anomaly DetectionStrata 2014 Anomaly Detection
Strata 2014 Anomaly DetectionTed Dunning
11.3K views56 slides
WSO2 Big Data Platform and Applications by
WSO2 Big Data Platform and ApplicationsWSO2 Big Data Platform and Applications
WSO2 Big Data Platform and ApplicationsSrinath Perera
1.7K views28 slides
Strata New York 2012 by
Strata New York 2012Strata New York 2012
Strata New York 2012MapR Technologies
316 views36 slides

More Related Content

Similar to Live Anomaly Detection

Mastering AIOps with Deep Learning by
Mastering AIOps with Deep LearningMastering AIOps with Deep Learning
Mastering AIOps with Deep LearningJorge Cardoso
2K views10 slides
RSC: Mining and Modeling Temporal Activity in Social Media by
RSC: Mining and Modeling Temporal Activity in Social MediaRSC: Mining and Modeling Temporal Activity in Social Media
RSC: Mining and Modeling Temporal Activity in Social MediaAlceu Ferraz Costa
1.1K views41 slides
What's new in Hivemall v0.5.0 by
What's new in Hivemall v0.5.0What's new in Hivemall v0.5.0
What's new in Hivemall v0.5.0Makoto Yui
686 views38 slides
Building Scalable IoT Apps (QCon S-F) by
Building Scalable IoT Apps (QCon S-F)Building Scalable IoT Apps (QCon S-F)
Building Scalable IoT Apps (QCon S-F)Pavel Hardak
1.3K views51 slides
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms by
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsJason Riedy
809 views70 slides
SmartData Webinar: Applying Neocortical Research to Streaming Analytics by
SmartData Webinar: Applying Neocortical Research to Streaming AnalyticsSmartData Webinar: Applying Neocortical Research to Streaming Analytics
SmartData Webinar: Applying Neocortical Research to Streaming AnalyticsDATAVERSITY
1.2K views29 slides

Similar to Live Anomaly Detection(20)

Mastering AIOps with Deep Learning by Jorge Cardoso
Mastering AIOps with Deep LearningMastering AIOps with Deep Learning
Mastering AIOps with Deep Learning
Jorge Cardoso2K views
RSC: Mining and Modeling Temporal Activity in Social Media by Alceu Ferraz Costa
RSC: Mining and Modeling Temporal Activity in Social MediaRSC: Mining and Modeling Temporal Activity in Social Media
RSC: Mining and Modeling Temporal Activity in Social Media
Alceu Ferraz Costa1.1K views
What's new in Hivemall v0.5.0 by Makoto Yui
What's new in Hivemall v0.5.0What's new in Hivemall v0.5.0
What's new in Hivemall v0.5.0
Makoto Yui686 views
Building Scalable IoT Apps (QCon S-F) by Pavel Hardak
Building Scalable IoT Apps (QCon S-F)Building Scalable IoT Apps (QCon S-F)
Building Scalable IoT Apps (QCon S-F)
Pavel Hardak1.3K views
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms by Jason Riedy
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
Jason Riedy809 views
SmartData Webinar: Applying Neocortical Research to Streaming Analytics by DATAVERSITY
SmartData Webinar: Applying Neocortical Research to Streaming AnalyticsSmartData Webinar: Applying Neocortical Research to Streaming Analytics
SmartData Webinar: Applying Neocortical Research to Streaming Analytics
DATAVERSITY1.2K views
Streaming Analytics: It's Not the Same Game by Numenta
Streaming Analytics: It's Not the Same GameStreaming Analytics: It's Not the Same Game
Streaming Analytics: It's Not the Same Game
Numenta2.6K views
Data Science in Retail-as-a-Service (RaaS) by zhiking
Data Science in Retail-as-a-Service (RaaS)Data Science in Retail-as-a-Service (RaaS)
Data Science in Retail-as-a-Service (RaaS)
zhiking7K views
Automated clock mesh analysis for faster turnaround by Animesh Sharma
Automated clock mesh analysis for faster turnaroundAutomated clock mesh analysis for faster turnaround
Automated clock mesh analysis for faster turnaround
Animesh Sharma299 views
Eclipse VIATRA Overview 2017 by Istvan Rath
Eclipse VIATRA Overview 2017Eclipse VIATRA Overview 2017
Eclipse VIATRA Overview 2017
Istvan Rath659 views
Time Series Anomaly Detection with .net and Azure by Marco Parenzan
Time Series Anomaly Detection with .net and AzureTime Series Anomaly Detection with .net and Azure
Time Series Anomaly Detection with .net and Azure
Marco Parenzan146 views
Stock Market Prediction.pptx by RastogiAman
Stock Market Prediction.pptxStock Market Prediction.pptx
Stock Market Prediction.pptx
RastogiAman83 views
The unknown spatial quality of dense point clouds derived from stereo images by ieeepondy
The unknown spatial quality of dense point clouds derived from stereo imagesThe unknown spatial quality of dense point clouds derived from stereo images
The unknown spatial quality of dense point clouds derived from stereo images
ieeepondy132 views
Using Spark and Riak for IoT Apps—Patterns and Anti-Patterns: Spark Summit Ea... by Spark Summit
Using Spark and Riak for IoT Apps—Patterns and Anti-Patterns: Spark Summit Ea...Using Spark and Riak for IoT Apps—Patterns and Anti-Patterns: Spark Summit Ea...
Using Spark and Riak for IoT Apps—Patterns and Anti-Patterns: Spark Summit Ea...
Spark Summit2K views
Денис Баталов by CodeFest
Денис БаталовДенис Баталов
Денис Баталов
CodeFest1.1K views
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016 by MLconf
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016
MLconf2.7K views
Forecasting at Scale with Marcello Tomasini by InfluxData
Forecasting at Scale with Marcello TomasiniForecasting at Scale with Marcello Tomasini
Forecasting at Scale with Marcello Tomasini
InfluxData996 views
Ml1 introduction to-supervised_learning_and_k_nearest_neighbors by ankit_ppt
Ml1 introduction to-supervised_learning_and_k_nearest_neighborsMl1 introduction to-supervised_learning_and_k_nearest_neighbors
Ml1 introduction to-supervised_learning_and_k_nearest_neighbors
ankit_ppt231 views

More from Arun Kejariwal

Anomaly Detection At The Edge by
Anomaly Detection At The EdgeAnomaly Detection At The Edge
Anomaly Detection At The EdgeArun Kejariwal
581 views54 slides
Serverless Streaming Architectures and Algorithms for the Enterprise by
Serverless Streaming Architectures and Algorithms for the EnterpriseServerless Streaming Architectures and Algorithms for the Enterprise
Serverless Streaming Architectures and Algorithms for the EnterpriseArun Kejariwal
2.8K views227 slides
Sequence-to-Sequence Modeling for Time Series by
Sequence-to-Sequence Modeling for Time SeriesSequence-to-Sequence Modeling for Time Series
Sequence-to-Sequence Modeling for Time SeriesArun Kejariwal
3.2K views64 slides
Sequence-to-Sequence Modeling for Time Series by
Sequence-to-Sequence Modeling for Time SeriesSequence-to-Sequence Modeling for Time Series
Sequence-to-Sequence Modeling for Time SeriesArun Kejariwal
1.9K views45 slides
Model Serving via Pulsar Functions by
Model Serving via Pulsar FunctionsModel Serving via Pulsar Functions
Model Serving via Pulsar FunctionsArun Kejariwal
1.7K views44 slides
Designing Modern Streaming Data Applications by
Designing Modern Streaming Data ApplicationsDesigning Modern Streaming Data Applications
Designing Modern Streaming Data ApplicationsArun Kejariwal
2.6K views227 slides

More from Arun Kejariwal(19)

Anomaly Detection At The Edge by Arun Kejariwal
Anomaly Detection At The EdgeAnomaly Detection At The Edge
Anomaly Detection At The Edge
Arun Kejariwal581 views
Serverless Streaming Architectures and Algorithms for the Enterprise by Arun Kejariwal
Serverless Streaming Architectures and Algorithms for the EnterpriseServerless Streaming Architectures and Algorithms for the Enterprise
Serverless Streaming Architectures and Algorithms for the Enterprise
Arun Kejariwal2.8K views
Sequence-to-Sequence Modeling for Time Series by Arun Kejariwal
Sequence-to-Sequence Modeling for Time SeriesSequence-to-Sequence Modeling for Time Series
Sequence-to-Sequence Modeling for Time Series
Arun Kejariwal3.2K views
Sequence-to-Sequence Modeling for Time Series by Arun Kejariwal
Sequence-to-Sequence Modeling for Time SeriesSequence-to-Sequence Modeling for Time Series
Sequence-to-Sequence Modeling for Time Series
Arun Kejariwal1.9K views
Model Serving via Pulsar Functions by Arun Kejariwal
Model Serving via Pulsar FunctionsModel Serving via Pulsar Functions
Model Serving via Pulsar Functions
Arun Kejariwal1.7K views
Designing Modern Streaming Data Applications by Arun Kejariwal
Designing Modern Streaming Data ApplicationsDesigning Modern Streaming Data Applications
Designing Modern Streaming Data Applications
Arun Kejariwal2.6K views
Correlation Analysis on Live Data Streams by Arun Kejariwal
Correlation Analysis on Live Data StreamsCorrelation Analysis on Live Data Streams
Correlation Analysis on Live Data Streams
Arun Kejariwal321 views
Deep Learning for Time Series Data by Arun Kejariwal
Deep Learning for Time Series DataDeep Learning for Time Series Data
Deep Learning for Time Series Data
Arun Kejariwal1.7K views
Correlation Analysis on Live Data Streams by Arun Kejariwal
Correlation Analysis on Live Data StreamsCorrelation Analysis on Live Data Streams
Correlation Analysis on Live Data Streams
Arun Kejariwal2.1K views
Real Time Analytics: Algorithms and Systems by Arun Kejariwal
Real Time Analytics: Algorithms and SystemsReal Time Analytics: Algorithms and Systems
Real Time Analytics: Algorithms and Systems
Arun Kejariwal23K views
Finding bad apples early: Minimizing performance impact by Arun Kejariwal
Finding bad apples early: Minimizing performance impactFinding bad apples early: Minimizing performance impact
Finding bad apples early: Minimizing performance impact
Arun Kejariwal1.1K views
Statistical Learning Based Anomaly Detection @ Twitter by Arun Kejariwal
Statistical Learning Based Anomaly Detection @ TwitterStatistical Learning Based Anomaly Detection @ Twitter
Statistical Learning Based Anomaly Detection @ Twitter
Arun Kejariwal5.1K views
Days In Green (DIG): Forecasting the life of a healthy service by Arun Kejariwal
Days In Green (DIG): Forecasting the life of a healthy serviceDays In Green (DIG): Forecasting the life of a healthy service
Days In Green (DIG): Forecasting the life of a healthy service
Arun Kejariwal793 views
Gimme More! Supporting User Growth in a Performant and Efficient Fashion by Arun Kejariwal
Gimme More! Supporting User Growth in a Performant and Efficient FashionGimme More! Supporting User Growth in a Performant and Efficient Fashion
Gimme More! Supporting User Growth in a Performant and Efficient Fashion
Arun Kejariwal2.3K views
A Systematic Approach to Capacity Planning in the Real World by Arun Kejariwal
A Systematic Approach to Capacity Planning in the Real WorldA Systematic Approach to Capacity Planning in the Real World
A Systematic Approach to Capacity Planning in the Real World
Arun Kejariwal5.5K views
Isolating Events from the Fail Whale by Arun Kejariwal
Isolating Events from the Fail WhaleIsolating Events from the Fail Whale
Isolating Events from the Fail Whale
Arun Kejariwal2K views
Techniques for Minimizing Cloud Footprint by Arun Kejariwal
Techniques for Minimizing Cloud FootprintTechniques for Minimizing Cloud Footprint
Techniques for Minimizing Cloud Footprint
Arun Kejariwal1.4K views
A Tool for Practical Garbage Collection Analysis In the Cloud by Arun Kejariwal
A Tool for Practical Garbage Collection Analysis In the CloudA Tool for Practical Garbage Collection Analysis In the Cloud
A Tool for Practical Garbage Collection Analysis In the Cloud
Arun Kejariwal3.4K views

Recently uploaded

State of the Union - Rohit Yadav - Apache CloudStack by
State of the Union - Rohit Yadav - Apache CloudStackState of the Union - Rohit Yadav - Apache CloudStack
State of the Union - Rohit Yadav - Apache CloudStackShapeBlue
303 views53 slides
Business Analyst Series 2023 - Week 4 Session 8 by
Business Analyst Series 2023 -  Week 4 Session 8Business Analyst Series 2023 -  Week 4 Session 8
Business Analyst Series 2023 - Week 4 Session 8DianaGray10
145 views13 slides
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ... by
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...ShapeBlue
120 views17 slides
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue by
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlueWhat’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlueShapeBlue
265 views23 slides
Future of AR - Facebook Presentation by
Future of AR - Facebook PresentationFuture of AR - Facebook Presentation
Future of AR - Facebook PresentationRob McCarty
65 views27 slides
Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De... by
Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De...Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De...
Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De...Moses Kemibaro
35 views38 slides

Recently uploaded(20)

State of the Union - Rohit Yadav - Apache CloudStack by ShapeBlue
State of the Union - Rohit Yadav - Apache CloudStackState of the Union - Rohit Yadav - Apache CloudStack
State of the Union - Rohit Yadav - Apache CloudStack
ShapeBlue303 views
Business Analyst Series 2023 - Week 4 Session 8 by DianaGray10
Business Analyst Series 2023 -  Week 4 Session 8Business Analyst Series 2023 -  Week 4 Session 8
Business Analyst Series 2023 - Week 4 Session 8
DianaGray10145 views
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ... by ShapeBlue
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...
ShapeBlue120 views
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue by ShapeBlue
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlueWhat’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue
ShapeBlue265 views
Future of AR - Facebook Presentation by Rob McCarty
Future of AR - Facebook PresentationFuture of AR - Facebook Presentation
Future of AR - Facebook Presentation
Rob McCarty65 views
Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De... by Moses Kemibaro
Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De...Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De...
Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De...
Moses Kemibaro35 views
The Power of Heat Decarbonisation Plans in the Built Environment by IES VE
The Power of Heat Decarbonisation Plans in the Built EnvironmentThe Power of Heat Decarbonisation Plans in the Built Environment
The Power of Heat Decarbonisation Plans in the Built Environment
IES VE84 views
NTGapps NTG LowCode Platform by Mustafa Kuğu
NTGapps NTG LowCode Platform NTGapps NTG LowCode Platform
NTGapps NTG LowCode Platform
Mustafa Kuğu437 views
Optimizing Communication to Optimize Human Behavior - LCBM by Yaman Kumar
Optimizing Communication to Optimize Human Behavior - LCBMOptimizing Communication to Optimize Human Behavior - LCBM
Optimizing Communication to Optimize Human Behavior - LCBM
Yaman Kumar38 views
The Role of Patterns in the Era of Large Language Models by Yunyao Li
The Role of Patterns in the Era of Large Language ModelsThe Role of Patterns in the Era of Large Language Models
The Role of Patterns in the Era of Large Language Models
Yunyao Li91 views
"Node.js Development in 2024: trends and tools", Nikita Galkin by Fwdays
"Node.js Development in 2024: trends and tools", Nikita Galkin "Node.js Development in 2024: trends and tools", Nikita Galkin
"Node.js Development in 2024: trends and tools", Nikita Galkin
Fwdays33 views
"Surviving highload with Node.js", Andrii Shumada by Fwdays
"Surviving highload with Node.js", Andrii Shumada "Surviving highload with Node.js", Andrii Shumada
"Surviving highload with Node.js", Andrii Shumada
Fwdays58 views
Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ... by ShapeBlue
Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ...Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ...
Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ...
ShapeBlue129 views
Why and How CloudStack at weSystems - Stephan Bienek - weSystems by ShapeBlue
Why and How CloudStack at weSystems - Stephan Bienek - weSystemsWhy and How CloudStack at weSystems - Stephan Bienek - weSystems
Why and How CloudStack at weSystems - Stephan Bienek - weSystems
ShapeBlue247 views
Setting Up Your First CloudStack Environment with Beginners Challenges - MD R... by ShapeBlue
Setting Up Your First CloudStack Environment with Beginners Challenges - MD R...Setting Up Your First CloudStack Environment with Beginners Challenges - MD R...
Setting Up Your First CloudStack Environment with Beginners Challenges - MD R...
ShapeBlue178 views
The Power of Generative AI in Accelerating No Code Adoption.pdf by Saeed Al Dhaheri
The Power of Generative AI in Accelerating No Code Adoption.pdfThe Power of Generative AI in Accelerating No Code Adoption.pdf
The Power of Generative AI in Accelerating No Code Adoption.pdf
Saeed Al Dhaheri39 views
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue by ShapeBlue
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlueElevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue
ShapeBlue224 views
Digital Personal Data Protection (DPDP) Practical Approach For CISOs by Priyanka Aash
Digital Personal Data Protection (DPDP) Practical Approach For CISOsDigital Personal Data Protection (DPDP) Practical Approach For CISOs
Digital Personal Data Protection (DPDP) Practical Approach For CISOs
Priyanka Aash162 views
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f... by TrustArc
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...
TrustArc176 views

Live Anomaly Detection

  • 1. LiveAnomalyDetection Identifying anomalies in live data 1 Dhruv Choudhary Francois Orsini Arun Kejariwal
  • 2. SATORI #StrataData What is live data ? Petabytes of Data Live Reactions Time to Reaction ~ less than 5 msecs 2 44546A
  • 3. What is live data? Live data is new streaming data that needs to be reacted upon instantly.
  • 5. Live Data Properties: most recent and realtime reactive data. Most recent, low latency, unstructured, highly reactive, high throughput.
  • 6. Satori - A Live Data Computation Mesh. Satori powers a live data mesh that is capable of managing data flows from billions of endpoints simultaneously at millisecond latency.
  • 8. Live Data and Big Data?
  • 10. Live Data Platform. Satori is a fully managed platform as a service. Connect, process and react to streaming live data at ultra-low latency. Use cases include (but are not limited to) IoT, mobile fitness, gaming and smart cities.
  • 11. In-Stream Live Anomaly Detection
  • 13. What are live anomalies? Data types: audio, time series, video, text, binary. Examples: gunshot sound, stock market crash, profanity filters, road accident.
  • 14. Audio Time Series. Diagram: audio, time series, prediction error. Example: engine misfiring. Applications: connected cars, surveillance, equipment malfunction. Approaches: FFT window features (timbre, tempo, dynamics), WaveNet, other approaches.
  • 15. Text to Time Series. Diagram: text, Word2vec averaging, clustering (clusters C0, C1, C2), time series. Applications: word anomalies, paragraph anomalies, novelty detection.
  • 16. Video to Time Series. Diagram: video, deep encoders, representation, time series. Shape anomalies.
  • 17. Attributes of Live Anomaly Detection. Model selection: type of anomaly, incremental, robust, false alarm rate, labels, time granularity.
  • 18. Type of Anomaly. Point anomaly: individual points that break the pattern made by adjoining points. Changepoint: change in mean, variance or structure of the series. Pattern anomaly: a group of points that collectively form a pattern never seen before. Trend anomaly: significant perturbation in the long-term trend of a series.
  • 19. Incremental. Memory constraints: can all the data be loaded into memory? Compute constraints: can you keep up with the data rate? Evolutionary data: is the structure in the data changing continuously?
  • 20. Robust. Data obsolescence: how fast should we forget past anomalies? Anomaly clustering: anomalies usually occur close to each other. Figure labels: true positives, true negatives.
  • 21. False Alarm Rate. Constant rate: how much human attention to allocate?
  • 22. Labels. Semi-supervised learning: use unlabelled data for training. Supervised learning: separate positive and negative samples. Workflow: label, train, inference.
  • 23. Time Granularity. Hourly series: daily seasonality. Minutely series: hourly seasonality. Secondly series: seasonal jitter.
  • 25. Statistics. Parametric statistics: anomaly detection based on strong distribution assumptions. Examples: µ ± 3σ, Poisson(λ), p-value based. Attributes: point anomalies, incremental.
  • 26. Statistics (continued). Robust statistics: rejecting the effect of anomalies while modeling the distribution. Examples: Median-MAD, Winsorization, Grubbs' test, Generalized ESD, Student's t-test.
  • 27. Statistics (continued). Non-parametric statistics: histogram-based techniques, t-digest, adjusted box plots. Figure labels: 99.73 %, 0.27 %.
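
The contrast between the parametric µ ± 3σ rule and the robust Median-MAD rule on the statistics slides can be sketched as follows. This is a minimal illustration, not code from the talk: the function names, the cutoff k = 3 and the sample data are assumptions. The factor 1.4826 is the usual constant that scales the MAD to σ for Gaussian data.

```python
from statistics import mean, median, pstdev


def three_sigma_outliers(xs, k=3.0):
    """Indices of points farther than k standard deviations from the mean."""
    mu, sigma = mean(xs), pstdev(xs)
    return [i for i, x in enumerate(xs) if abs(x - mu) > k * sigma]


def mad_outliers(xs, k=3.0):
    """Indices of points farther than k scaled MADs from the median."""
    med = median(xs)
    mad = median([abs(x - med) for x in xs])
    scale = 1.4826 * mad  # makes the MAD comparable to sigma for Gaussian data
    if scale == 0:
        return []  # more than half the points are identical; no usable scale
    return [i for i, x in enumerate(xs) if abs(x - med) > k * scale]


# Hypothetical sample: one gross outlier at index 7.
xs = [10, 11, 9, 10, 12, 10, 11, 100]
print(three_sigma_outliers(xs))  # [] : the outlier inflates sigma and masks itself
print(mad_outliers(xs))          # [7]
```

With only eight points, no single value can sit more than √7 ≈ 2.65 population standard deviations from the mean, so the 3σ rule cannot fire at all here. The median and MAD are barely moved by the outlier, which is the sense in which robust statistics reject the effect of anomalies.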
  • 28. Time Series Analysis. Model families: autoregressive models (model the autocorrelation); non-parametric models (no distribution assumption about the structure of residuals); dimensionality reduction (model regular perturbations using a lower rank representation). Structures: seasonal structure (regular pattern that occurs at a known seasonal period); trend structure (long term change in the level of the series); evolutionary structure (changing, unknown structure of the time series). Highlighted: ARMA, SARMA, EWMA, TBATS, with model estimation based on past data. Point anomalies.
  • 29. Time Series Analysis (continued). Highlighted: STL, LOESS, non-parametric regression to model time series. Point anomalies.
  • 30. Time Series Analysis (continued). Highlighted: PCA, RobustPCA (principal component analysis); EDM, BCP, SDAR (breakout detection, sequential discounting). Point anomalies.
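
Of the models named on the time series slides, EWMA is the simplest to run incrementally on a stream. The sketch below is an illustration under stated assumptions, not the speakers' implementation: the class name `EwmaDetector`, the smoothing factor `alpha`, the cutoff `k` and the `warmup` length are all hypothetical choices. Each point is scored against the running mean and variance before the model is updated, and flagged points are not folded back into the estimates.

```python
import math


class EwmaDetector:
    """Incremental point-anomaly detector based on an EWMA mean and variance."""

    def __init__(self, alpha=0.3, k=3.0, warmup=5):
        self.alpha = alpha    # weight of the newest observation
        self.k = k            # cutoff in estimated standard deviations
        self.warmup = warmup  # number of initial points that are never flagged
        self.mean = None
        self.var = 0.0
        self.n = 0

    def update(self, x):
        """Score x against the current model, then update. Returns True if anomalous."""
        self.n += 1
        if self.mean is None:
            self.mean = x
            return False
        diff = x - self.mean
        is_anomaly = self.n > self.warmup and abs(diff) > self.k * math.sqrt(self.var)
        if not is_anomaly:
            # Standard incremental EWMA updates; anomalies are skipped so that
            # they do not contaminate the estimates.
            incr = self.alpha * diff
            self.mean += incr
            self.var = (1 - self.alpha) * (self.var + diff * incr)
        return is_anomaly


detector = EwmaDetector()
stream = [10.0, 10.5, 9.5, 10.2, 9.8] * 4 + [25.0]  # hypothetical data with a final spike
flags = [detector.update(x) for x in stream]
print(flags[-1])  # True
```

Skipping the update on flagged points keeps the estimates clean but means the detector will not adapt on its own after a genuine level shift. A changepoint method, or a rule that resumes updating after several consecutive flags, would be needed for that case.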
  • 31. Pattern Mining. Mark the rarest elements in the stream as anomalies. Inter-arrival times for patterns. Methods: HOTSAX, Rare-Rule Anomaly. Attributes: pattern anomalies, incremental, false alarm rate, robust.
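
The idea of marking the rarest elements as anomalies can be illustrated with SAX words, the symbolic representation that HOTSAX builds on. The sketch below is a simplified stand-in, not the HOTSAX discord search itself: it discretizes each window into a short word and reports the windows whose word occurs least often. The function names, window sizes and sample data are assumptions. The breakpoints are the standard equiprobable Gaussian cut points for alphabets of size 3 and 4, and the window length is assumed to be divisible by the number of segments.

```python
from bisect import bisect_right
from collections import Counter
from statistics import fmean, pstdev

# Equiprobable cut points of the standard normal for small alphabets.
BREAKPOINTS = {3: [-0.43, 0.43], 4: [-0.67, 0.0, 0.67]}


def sax_word(window, segments, alphabet_size=3):
    """Z-normalize a window, average it per segment (PAA), map to letters."""
    mu, sigma = fmean(window), pstdev(window)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in window]
    seg_len = len(z) // segments
    paa = [fmean(z[i * seg_len:(i + 1) * seg_len]) for i in range(segments)]
    cuts = BREAKPOINTS[alphabet_size]
    return "".join(chr(ord("a") + bisect_right(cuts, v)) for v in paa)


def rarest_windows(series, window, segments, step=1, alphabet_size=3):
    """Return (start, word) for every window whose SAX word is least frequent."""
    starts = range(0, len(series) - window + 1, step)
    words = [(s, sax_word(series[s:s + window], segments, alphabet_size))
             for s in starts]
    counts = Counter(word for _, word in words)
    rarest = min(counts.values())
    return [(s, word) for s, word in words if counts[word] == rarest]


# Hypothetical series: a rising ramp repeated, with one reversed ramp in the middle.
series = [1, 2, 3, 4] * 5 + [4, 3, 2, 1] + [1, 2, 3, 4] * 5
print(rarest_windows(series, window=4, segments=2, step=4))  # [(20, 'ca')]
```

Non-overlapping windows (`step=4`) keep the example easy to follow. On a live stream the word counts would be maintained incrementally over sliding windows.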
  • 34. Deep Learning: LSTM, Encoders, LSTM Auto-Encoders. Diagram labels: input, reconstructed input, anomaly. Notes: explicitly models time series structure; non-linear dimensionality reduction without modeling time series structure; performance degrades as the modality of the series increases.
  • 35. Deep Learning: LSTM. Time series prediction (input, prediction): point anomalies. Classifier (input, anomaly labels): point anomalies. Time series pattern prediction (input, pattern prediction): pattern anomalies. No need for a fixed size window for model estimation.
  • 36. Deep Learning: LSTM. Time series pattern prediction (input, pattern prediction): pattern anomalies. Multiple predicted values for each future observation. Prediction error, option 1: model the errors as a multivariate Gaussian to find anomalous observations. Prediction error, option 2: model the Euclidean distance between predicted and true sequences as the error.
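
The prediction-error step on this slide does not depend on the predictor itself. A minimal sketch, assuming the error vectors from some sequence model are already available: fit a multivariate Gaussian to errors observed on normal data, then score new error vectors by squared Mahalanobis distance. The function names, the synthetic errors and the threshold (16.27, roughly the 0.999 quantile of a chi-square distribution with 3 degrees of freedom) are illustrative assumptions, and NumPy is an assumed dependency.

```python
import numpy as np


def fit_error_model(errors):
    """Fit a Gaussian (mean, inverse covariance) to rows of error vectors."""
    errors = np.asarray(errors, dtype=float)
    mu = errors.mean(axis=0)
    cov = np.cov(errors, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])  # small ridge keeps the inverse stable
    return mu, np.linalg.inv(cov)


def mahalanobis_sq(error, mu, cov_inv):
    """Squared Mahalanobis distance of one error vector from the fitted Gaussian."""
    d = np.asarray(error, dtype=float) - mu
    return float(d @ cov_inv @ d)


# Synthetic stand-in for prediction errors on normal data: 3 predictions per step.
rng = np.random.default_rng(0)
normal_errors = rng.normal(0.0, 0.1, size=(500, 3))
mu, cov_inv = fit_error_model(normal_errors)

THRESHOLD = 16.27  # assumed cutoff, about the chi-square(3) 0.999 quantile
print(mahalanobis_sq([0.05, -0.05, 0.0], mu, cov_inv) > THRESHOLD)  # False
print(mahalanobis_sq([1.0, 1.0, 1.0], mu, cov_inv) > THRESHOLD)     # True
```

The second option on the slide, Euclidean distance between predicted and true sequences, corresponds to replacing `cov_inv` with the identity matrix. That ignores correlations between the error components.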
  • 37. Deep Learning: LSTM Encoders. Time series reconstruction (input, encoder network, decoder network): pattern anomalies. Runtimes chart: LSTM-Encoder, LSTM AutoEncoder.
  • 38. Multi-dimensionality: Correlation in Anomalies. Model correlation: correlation in anomaly space can be captured in a graph (nodes s1 to s8). What about jitter in anomalies? Model anomalies in fixed-size buckets of time. What about contextual anomalies? Modeling correlation in the space of the whole series is very expensive for live data. Naive algorithm: majority vote across all dimensions.
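
The naive algorithm on this slide, combined with the fixed-size time buckets suggested for handling jitter, can be sketched in a few lines. This is an illustrative reading of the slide, not the speakers' code: the function name, the `(series_id, timestamp)` event format and the 60-second bucket are assumptions.

```python
from collections import defaultdict


def bucketed_majority_vote(anomalies, n_series, bucket_seconds=60.0):
    """Return the time buckets in which more than half of the series were anomalous.

    anomalies: iterable of (series_id, timestamp_in_seconds) pairs, one per
    anomaly reported by a per-series detector.
    """
    voters = defaultdict(set)
    for series_id, ts in anomalies:
        # Anomalies that jitter by a few seconds still land in the same bucket.
        voters[int(ts // bucket_seconds)].add(series_id)
    return sorted(b for b, ids in voters.items() if len(ids) > n_series / 2)


# Hypothetical events from four series s1..s4.
events = [("s1", 5), ("s2", 12), ("s3", 58), ("s1", 130), ("s4", 200), ("s4", 210)]
print(bucketed_majority_vote(events, n_series=4))  # [0]
```

Each series votes at most once per bucket, so repeated alarms from one noisy dimension cannot form a majority on their own. Anomalies that straddle a bucket boundary are split across two buckets, a known weakness of fixed bucketing.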
  • 39. Multi-dimensionality. What about jitter in anomalies? Model anomalies in fixed-size buckets of time. What about contextual anomalies? Modeling correlation in the space of the whole series can be very expensive. LSTM: no need for a fixed size window for model estimation; can model high dimensionality; works with non-stationary time series with irregular structure; does not work with evolutionary series.
  • 40. Model Selection. Runtimes charts for single-dimensional series and multi-dimensional series. Chart labels: Statistics, TimeSeriesAnalysis, HOTSAX/RRA, OneSVM/Iforest, LSTM, DBStream; Statistics, TimeSeries, LSTM, DBStream.
  • 42. Operations: business metrics, performance metrics, health metrics. What kind of anomalies: point anomalies, change points, trend anomalies. Latency sensitivity: < 100 msec.
  • 43. Healthcare: electrocardiograms, fitness trackers. Examples: ECG, driver stress, epilepsy onset. What kind of anomalies: pattern anomalies, change points. Latency sensitivity: < 1 msec.
  • 44. Transportation: traffic routing, passenger load, scheduling. What kind of anomalies: point anomalies, change points, trend anomalies, spatial anomalies. Latency sensitivity: < 5 seconds.
  • 45. Financial Data: stock trades, share prices. What kind of anomalies: point anomalies, change points. Latency sensitivity: < 100 microseconds.
  • 46. Security: network intrusion, video surveillance. What kind of anomalies: point anomalies, change points. Latency sensitivity: < 5 msecs.
  • 47. Internet of Things: smart homes, connected cars, smart devices. What kind of anomalies: pattern anomalies, change points, trend anomalies. Latency sensitivity: < 5 msecs.
  • 49. Check our tech @ satori.com. @choudharydhruv, @SatoriLiveData, @arun_kejariwal, @FrancoisOrsini_
  • 50. Resources: “Computing Extremely Accurate Quantiles using t-Digests”, https://github.com/tdunning/t-digest; https://deepmind.com/blog/wavenet-generative-model-raw-audio/; https://www.tensorflow.org/tutorials/word2vec; https://blog.keras.io/building-autoencoders-in-keras.html; “Deep Learning for Time Series Analysis”, https://arxiv.org/pdf/1701.01887.pdf
  • 51. Readings: “Using Natural Language Processing Models for Understanding Network Anomalies”, HPEC’17; “Deep Recurrent Neural Network-Based Autoencoders for Acoustic Novelty Detection”, CIN’17; “Collective Anomaly Detection based on Long Short Term Memory Recurrent Neural Network”, FDSE’16; “Deep Structured Energy Based Models for Anomaly Detection”, ICML’16; “Variational Inference for On-line Anomaly Detection in High-Dimensional Time Series”, ICML’16.
  • 52. Readings: “Long Short Term Memory Networks for Anomaly Detection in Time Series”, ESANN’15; “Clustering Data Streams based on Shared Density Between Clusters”, TKDE’16; “LSTM-based Encoder-Decoder for Multi-sensor Anomaly Detection”, ICML’16 Anomaly Detection Workshop; “Sequence to Sequence Model for Anomaly Detection in Financial Transactions”, ICML’16; “MS-LSTM: a Multi-Scale LSTM Model for BGP Anomaly Detection”, NetworkML’16.
  • 53. Readings: “Anomaly detection: A survey”, ACM Computing Surveys, 2009; “Time Series Analysis by State Space Methods”, by J. Durbin and S. J. Koopman, 2001; “HOT SAX: Efficiently Finding the Most Unusual Time Series Subsequence”, ICDM, 2005; “Real-time change-point detection using sequentially discounting normalized maximum likelihood coding”, Advanced Knowledge Discovery Data Mining, 2011; “Unsupervised Learning of Video Representations using LSTMs”, ICML’15.