Live Anomaly Detection

compute tier. Detection and filtering of anomalies in live data is of paramount importance for robust decision making. To this end, in this talk we share techniques for anomaly detection in live data.

Published in: Technology

Live Anomaly Detection

  1. 1. Live Anomaly Detection: Identifying anomalies in live data. Dhruv Choudhary, Francois Orsini, Arun Kejariwal
  2. 2. SATORI #StrataData What is live data? Petabytes of Data. Live Reactions. Time to Reaction: less than ~5 msecs
  3. 3. SATORI What is live data? Live data is new streaming data that must be reacted to instantly
  4. 4. SATORI #StrataData Where is live data ? 4
  5. 5. SATORI #StrataData Live Data Properties: Most Recent and Realtime Reactive Data. Most Recent, Low Latency, Unstructured, Highly Reactive, High Throughput
  6. 6. SATORI #StrataData Satori - A Live Data Computation Mesh. Satori powers a live data mesh capable of managing data flows from billions of endpoints simultaneously at millisecond latency.
  7. 7. SATORI #StrataData 7 7
  8. 8. SATORI #StrataData Live Data and Big Data ? 8
  9. 9. SATORI #StrataData 9
  10. 10. SATORI #StrataData Live Data Platform • Satori is a fully managed platform as a service • Connect, process and react to streaming live data at ultra-low latency • Use cases include (but are not limited to) IoT, mobile fitness, gaming and smart cities.
  11. 11. SATORI #StrataData In-Stream Live Anomaly Detection 11
  12. 12. SATORI #StrataData Data Quadrants 12
  13. 13. SATORI #StrataData What are live anomalies ? Audio Time Series Video Text Binary Gun Shot Sound Stock market Crash Profanity Filters Road Accident 13
  14. 14. SATORI #StrataData Audio Time Series. Approaches: FFT Window Features (Timbre, Tempo, Dynamics), Wavenet, Other approaches. Applications: Engine Misfiring, Connected Cars, Surveillance, Equipment Malfunction. (Diagram labels: Audio, Time Series, Prediction Error, Audio Series)
  15. 15. SATORI #StrataData Text to Time Series Text Word2vec Averaging Word Anomalies Paragraph Anomalies Novelty Detection Applications 15 C0 C1 C2 Clustering Word2vec Averaging Word2vec Averaging Time Series
  16. 16. SATORI #StrataData Video to Time Series Video Shape Anomalies Deep Encoders Representation 16 Time Series
  17. 17. SATORI #StrataData Attributes of Live Anomaly Detection Model Selection Type of Anomaly Incremental Robust False Alarm Rate Labels Time Granularity 17
  18. 18. SATORI #StrataData Type of Anomaly. POINT ANOMALY: Individual points that break the pattern made by adjoining points. CHANGEPOINT: Change in mean, variance or structure of the series. PATTERN ANOMALY: A group of points that collectively form a pattern never seen before. TREND ANOMALY: Significant perturbation in the long-term trend of a series.
  19. 19. SATORI #StrataData Incremental. MEMORY CONSTRAINTS: Can all the data be loaded into memory? COMPUTE CONSTRAINTS: Can you keep up with the data rate? EVOLUTIONARY DATA: Is the structure in the data changing continuously?
  20. 20. SATORI #StrataData Robust. DATA OBSOLESCENCE: How fast should we forget past anomalies? ANOMALY CLUSTERING: Anomalies usually occur close to each other. (Figure labels: True Positives, True Negatives)
  21. 21. SATORI #StrataData False Alarm Rate. CONSTANT RATE: How much human attention to allocate?
  22. 22. SATORI #StrataData Labels. SEMI-SUPERVISED LEARNING: Use unlabelled data for training. SUPERVISED LEARNING: Separate positive and negative samples. (Pipeline: LABEL, TRAIN, INFERENCE)
  23. 23. SATORI #StrataData Time Granularity. Hourly series, daily seasonality. Minutely series, hourly seasonality. Per-second series, seasonal jitter.
  24. 24. SATORI #StrataData Research Domains Statistics Pattern Mining Time Series Machine Learning Stream Clustering Deep Learning 24
  25. 25. SATORI #StrataData Statistics. PARAMETRIC STATISTICS: Anomaly detection based on strong distribution assumptions. µ ± 3σ, Poisson(λ), p-value based. (Point Anomalies, Incremental)
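As a rough illustration of the µ ± 3σ rule listed on the slide above, the following Python sketch keeps the mean and variance up to date incrementally. It uses Welford's algorithm, which the slides do not name; it is swapped in here as one common way to obtain an incremental estimate. The class name `RunningSigmaDetector`, the `warmup` parameter, and the sample stream are assumptions made for this example and are not taken from the talk.

```python
import math


class RunningSigmaDetector:
    """Flags points outside mean ± k·std, updating statistics incrementally."""

    def __init__(self, k=3.0, warmup=10):
        self.k = k              # width of the band in standard deviations
        self.warmup = warmup    # observations to absorb before flagging anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0           # running sum of squared deviations (Welford)

    def update(self, x):
        """Score x against the statistics seen so far, then absorb it."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            is_anomaly = abs(x - self.mean) > self.k * std
        # Welford's update: constant memory, a single pass over the stream.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly


detector = RunningSigmaDetector(k=3.0, warmup=10)
stream = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.3, 9.7, 10.1, 9.9, 10.0, 25.0, 10.1]
flags = [detector.update(x) for x in stream]
```

In this sketch every point, including a flagged one, is absorbed into the running statistics, so a large spike inflates the estimated variance afterwards. That weakness is what the robust statistics named in this deck are meant to address.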
  26. 26. SATORI #StrataData Statistics. PARAMETRIC STATISTICS: Anomaly detection based on strong distribution assumptions. µ ± 3σ, Poisson(λ), p-value based. (Point Anomalies, Incremental) ROBUST STATISTICS: Rejecting the effect of anomalies while modeling the distribution. Median-MAD, Winsorization, Grubbs' test, Generalized ESD, Student's t-test.
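The Median-MAD approach named on the slide above can be sketched as a sliding-window robust z-score. This is a minimal example under stated assumptions: the window size, the 3.5 threshold, and the function names `mad_score` and `detect` are illustrative choices, not values from the talk. The constant 1.4826 is the usual factor that makes the MAD comparable to a standard deviation for normally distributed data.

```python
from collections import deque
from statistics import median


def mad_score(window, x):
    """Robust z-score of x relative to the values in window."""
    med = median(window)
    mad = median(abs(v - med) for v in window)
    if mad == 0:
        # Degenerate window (at least half the values equal the median).
        return 0.0 if x == med else float("inf")
    # 1.4826 rescales the MAD to the standard deviation under normality.
    return abs(x - med) / (1.4826 * mad)


def detect(stream, window_size=20, threshold=3.5):
    """Return one boolean per point; True marks a suspected point anomaly."""
    window = deque(maxlen=window_size)   # bounded memory for live data
    flags = []
    for x in stream:
        full = len(window) == window_size
        flags.append(full and mad_score(window, x) > threshold)
        window.append(x)
    return flags


stream = [10, 11, 10, 12, 11, 10, 50, 11, 10, 12]
flags = detect(stream, window_size=5, threshold=3.5)
```

Because the median and MAD barely move when a single extreme value enters the window, the points that follow the spike at index 6 are still scored against a sensible baseline.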
  27. 27. SATORI #StrataData Statistics. NON-PARAMETRIC STATISTICS: Histogram-based techniques, t-digest, adjusted box plots. (Figure labels: 99.73 %, 0.27 %) PARAMETRIC STATISTICS: Anomaly detection based on strong distribution assumptions. µ ± 3σ, Poisson(λ), p-value based. (Point Anomalies, Incremental) ROBUST STATISTICS: Rejecting the effect of anomalies while modeling the distribution. Median-MAD, Winsorization, Grubbs' test, Generalized ESD, Student's t-test.
  28. 28. SATORI #StrataData Time Series Analysis. AUTOREGRESSIVE MODELS: Model the autocorrelation. NON-PARAMETRIC MODELS: No distribution assumption about the structure of residuals. DIMENSIONALITY REDUCTION: Model regular perturbations using a lower-rank representation. SEASONAL STRUCTURE: Regular pattern that occurs at a known seasonal period. TREND STRUCTURE: Long-term change in the level of the series. EVOLUTIONARY STRUCTURE: Changing (unknown) structure of the time series. ARMA, SARMA, EWMA, TBATS: Model estimation based on past data. (Point Anomalies)
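Of the models listed on the slide above, EWMA is the simplest to sketch for a live stream, because the estimate from past data is a single running value. The example below is one possible reading, not the speakers' implementation: it forecasts each point with an exponentially weighted moving average and flags points whose forecast error exceeds `k` times a smoothed error scale. The class name `EwmaDetector` and the values of `alpha`, `k`, and `warmup` are assumptions chosen for illustration.

```python
import math


class EwmaDetector:
    """One-step-ahead EWMA forecast with a smoothed squared-error scale."""

    def __init__(self, alpha=0.3, k=3.0, warmup=5):
        self.alpha = alpha    # smoothing factor: higher forgets the past faster
        self.k = k            # threshold in units of the error scale
        self.warmup = warmup  # points to absorb before flagging anything
        self.n = 0
        self.level = None     # current EWMA forecast
        self.var = 0.0        # EWMA of squared forecast errors

    def update(self, x):
        if self.level is None:
            self.level = x
            self.n = 1
            return False
        error = x - self.level  # forecast error for this point
        is_anomaly = (self.n >= self.warmup
                      and abs(error) > self.k * math.sqrt(self.var))
        # Always absorb the point so the model can follow a changing series.
        self.level += self.alpha * error
        self.var = (1 - self.alpha) * self.var + self.alpha * error ** 2
        self.n += 1
        return is_anomaly


detector = EwmaDetector(alpha=0.3, k=3.0, warmup=5)
stream = [10, 10.5, 9.5, 10.2, 9.8, 10.1, 9.9, 30.0, 10.0]
flags = [detector.update(x) for x in stream]
```

Absorbing flagged points keeps the detector adaptive but lets a spike drag the forecast for a few steps. Skipping the update on flagged points is the opposite trade-off, and it can leave the model stuck after a genuine changepoint.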
  29. 29. SATORI #StrataData Time Series Analysis. STL, LOESS: Non-parametric regression to model time series. AUTOREGRESSIVE MODELS: Model the autocorrelation. NON-PARAMETRIC MODELS: No distribution assumption about the structure of residuals. DIMENSIONALITY REDUCTION: Model regular perturbations using a lower-rank representation. SEASONAL STRUCTURE: Regular pattern that occurs at a known seasonal period. TREND STRUCTURE: Long-term change in the level of the series. EVOLUTIONARY STRUCTURE: Changing (unknown) structure of the time series. (Point Anomalies)
  30. 30. SATORI #StrataData Time Series Analysis. PCA, RobustPCA: Principal Component Analysis. EDM, BCP, SDAR: Breakout Detection, Sequential Discounting. AUTOREGRESSIVE MODELS: Model the autocorrelation. NON-PARAMETRIC MODELS: No distribution assumption about the structure of residuals. DIMENSIONALITY REDUCTION: Model regular perturbations using a lower-rank representation. SEASONAL STRUCTURE: Regular pattern that occurs at a known seasonal period. TREND STRUCTURE: Long-term change in the level of the series. EVOLUTIONARY STRUCTURE: Changing (unknown) structure of the time series. (Point Anomalies)
  31. 31. SATORI #StrataData Pattern Mining. Mark the rarest elements in the stream as anomalies. Inter-arrival times for patterns. HOT SAX, Rare-Rule Anomaly. (Pattern Anomalies, Incremental, False Alarm Rate, Robust)
  32. 32. SATORI #StrataData Clustering. DBSCAN, k-means, c-means, CluStream, DenStream, ClusTree, D-Stream, HPStream, DBStream. Micro-Clusters: Online Micro-Clustering, Offline Clustering of MCs.
  33. 33. SATORI #StrataData Clustering. Micro-Clusters: Online Micro-Clustering, Offline Clustering of MCs. (Figure labels: DBStream, ClusTree, DenStream, DBStream Runtimes) (Pattern Anomalies, Incremental, Robust)
  34. 34. SATORI #StrataData Deep Learning LSTM Encoders LSTM Auto-Encoders Anomaly Input Input Reconstructed Input Explicitly Models Time Series Structure Non-linear dimensionality reduction without modeling time series structure Performance degrades as the modality of the series increases 34
  35. 35. SATORI #StrataData Time Series Prediction Prediction Input Point Anomalies Deep Learning LSTM Anomaly Input Classifier Labels Point Anomalies No need for a fixed size window for model estimation Time Series Pattern Prediction Pattern Prediction Input Pattern Anomalies 35
  36. 36. SATORI #StrataData Deep Learning: LSTM. Time Series Pattern Prediction (Input, Pattern Prediction, Pattern Anomalies). Multiple predicted values for each future observation. Model the errors as a multivariate Gaussian to find anomalous observations. Model the Euclidean distance between predicted and true sequences as the error. (Figure labels: Prediction Error)
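The step of modeling prediction errors as a multivariate Gaussian can be sketched without the LSTM itself. The example below assumes each observation was predicted at two different lead times, giving a 2-D error vector, fits a Gaussian to error vectors collected on normal data, and scores new vectors by squared Mahalanobis distance. The restriction to two dimensions, the function names, the sample errors, and the 0.001 false-alarm target are all assumptions made to keep the sketch dependency-free. In two dimensions the squared distance follows a chi-square distribution with 2 degrees of freedom when the Gaussian assumption holds, so the threshold for tail probability p is -2·ln(p).

```python
import math


def fit_gaussian_2d(errors):
    """Estimate the mean and covariance of 2-D error vectors (e1, e2)."""
    n = len(errors)
    m1 = sum(e[0] for e in errors) / n
    m2 = sum(e[1] for e in errors) / n
    s11 = sum((e[0] - m1) ** 2 for e in errors) / (n - 1)
    s22 = sum((e[1] - m2) ** 2 for e in errors) / (n - 1)
    s12 = sum((e[0] - m1) * (e[1] - m2) for e in errors) / (n - 1)
    return (m1, m2), (s11, s12, s22)


def mahalanobis_sq(e, mean, cov):
    """Squared Mahalanobis distance of e under the fitted Gaussian."""
    s11, s12, s22 = cov
    det = s11 * s22 - s12 * s12
    d1, d2 = e[0] - mean[0], e[1] - mean[1]
    # Closed-form inverse of the 2x2 covariance matrix.
    return (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det


# Hypothetical prediction errors observed on anomaly-free validation data.
normal_errors = [(0.1, 0.2), (-0.2, -0.1), (0.0, 0.1), (0.2, 0.1),
                 (-0.1, -0.3), (0.1, 0.0), (-0.1, 0.0)]
mean, cov = fit_gaussian_2d(normal_errors)

# Chi-square tail with 2 degrees of freedom: P(D^2 > t) = exp(-t / 2).
threshold = -2 * math.log(0.001)

typical = mahalanobis_sq((0.1, 0.1), mean, cov)
unusual = mahalanobis_sq((1.5, -1.0), mean, cov)
is_anomaly = unusual > threshold
```

With more than two lead times the same idea applies, but the covariance inverse would normally come from a linear algebra library rather than a closed form.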
  37. 37. SATORI #StrataData Deep Learning LSTM Encoders Time Series Reconstruction Pattern Prediction Input Pattern Anomalies Decoder Network Encoder Network LSTM-Encoder LSTM AutoEncoder Runtimes 37
  38. 38. SATORI #StrataData Correlation in Anomalies. Multi-dimensionality. Model Correlation: Correlation in anomaly space can be captured in a graph. What about jitter in anomalies? Model anomalies in fixed-size buckets of time. What about contextual anomalies? Modeling correlation in the space of the whole series is very expensive for live data. Naive Algorithm: Majority vote across all dimensions. (Figure labels: series s1 to s8)
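The naive algorithm from the slide above, combined with its suggestion to absorb jitter by using fixed-size time buckets, might look like the following sketch. Per-series detections are assumed to arrive as `(timestamp_seconds, series_id)` pairs from some upstream detector. That input format, the function name `bucketed_majority_vote`, and the 5-second bucket are assumptions for illustration.

```python
from collections import defaultdict


def bucketed_majority_vote(anomalies, n_series, bucket_seconds=5.0):
    """Return start times of buckets where most series reported an anomaly.

    anomalies: iterable of (timestamp_seconds, series_id) detections.
    n_series:  total number of series (dimensions) being monitored.
    """
    voters = defaultdict(set)
    for ts, series_id in anomalies:
        # Detections that jitter within one bucket count as simultaneous.
        bucket = int(ts // bucket_seconds)
        voters[bucket].add(series_id)   # each series votes once per bucket
    return sorted(b * bucket_seconds
                  for b, ids in voters.items()
                  if len(ids) > n_series / 2)


detections = [(1.2, "s1"), (3.9, "s2"), (4.1, "s3"),   # three series agree
              (12.0, "s1"), (13.5, "s1"),              # one series, two hits
              (21.0, "s4"), (22.0, "s2")]              # two of four: no majority
alerts = bucketed_majority_vote(detections, n_series=4, bucket_seconds=5.0)
```

Fixed buckets are cheap, but two detections that straddle a bucket boundary are counted separately. Overlapping buckets would reduce that effect at the cost of extra bookkeeping.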
  39. 39. SATORI #StrataData Multi-dimensionality. What about jitter in anomalies? Model anomalies in fixed-size buckets of time. What about contextual anomalies? Modeling correlation in the space of the whole series can be very expensive. LSTM: No need for a fixed-size window for model estimation. Can model high dimensionality. Works with non-stationary time series with irregular structure. Does not work with evolutionary series.
  40. 40. SATORI #StrataData Model Selection Single Dimensional Series Multi-Dimensional Series DBStream Runtimes Statistics TimeSeriesAnalysis HOTSAX/RRA OneSVM/Iforest LSTM DBStream Statistics TimeSeries LSTM Runtimes 40
  41. 41. SATORI 41
  42. 42. SATORI #StrataData Operations: Business Metrics, Performance Metrics, Health Metrics. What kind of anomalies: Point Anomalies, Change Points, Trend Anomalies. Latency Sensitivity: < 100 msec
  43. 43. SATORI #StrataData Healthcare: Electrocardiograms, Fitness Trackers. What kind of anomalies: Pattern anomalies, Change points. Latency Sensitivity: < 1 msec. (Figure labels: ECG, Driver Stress, Epilepsy onset)
  44. 44. SATORI #StrataData Transportation: Traffic Routing, Passenger Load, Scheduling. What kind of anomalies: Point anomalies, Change points, Trend Anomalies, Spatial Anomalies. Latency Sensitivity: < 5 seconds
  45. 45. SATORI #StrataData Financial Data: Stock Trades, Share Prices. What kind of anomalies: Point anomalies, Change points. Latency Sensitivity: < 100 microseconds
  46. 46. SATORI #StrataData Security: Network Intrusion, Video Surveillance. What kind of anomalies: Point anomalies, Change points. Latency Sensitivity: < 5 msecs
  47. 47. SATORI #StrataData Internet of Things: Smart Homes, Connected Cars, Smart Devices. What kind of anomalies: Pattern anomalies, Change points, Trend anomalies. Latency Sensitivity: < 5 msecs
  48. 48. SATORI 48
  49. 49. SATORI Check our tech @ satori.com @choudharydhruv @SatoriLiveData @arun_kejariwal @FrancoisOrsini_ 49
  50. 50. SATORI Resources “Computing Extremely Accurate Quantiles using t-Digests”, https://github.com/tdunning/t-digest https://deepmind.com/blog/wavenet-generative-model-raw-audio/ https://www.tensorflow.org/tutorials/word2vec https://blog.keras.io/building-autoencoders-in-keras.html “Deep Learning for Time Series Analysis”, https://arxiv.org/pdf/1701.01887.pdf
  51. 51. SATORI Readings “Using Natural Language Processing Models for Understanding Network Anomalies”, HPEC’17. “Deep Recurrent Neural Network-Based Autoencoders for Acoustic Novelty Detection”, CIN’17. “Collective Anomaly Detection based on Long Short Term Memory Recurrent Neural Network”, FDSE’16. “Deep Structured Energy Based Models for Anomaly Detection”, ICML’16. “Variational Inference for On-line Anomaly Detection in High-Dimensional Time Series”, ICML’16.
  52. 52. SATORI Readings “Long Short Term Memory Networks for Anomaly Detection in Time Series”, ESANN’15. “Clustering Data Streams based on Shared Density Between Clusters”, TKDE’16. “LSTM-based Encoder-Decoder for Multi-sensor Anomaly Detection”, ICML’16 Anomaly Detection Workshop. “Sequence to Sequence Model for Anomaly Detection in Financial Transactions”, ICML’16. “MS-LSTM: a Multi-Scale LSTM Model for BGP Anomaly Detection”, NetworkML’16.
  53. 53. SATORI Readings “Anomaly detection: A survey”, ACM Computing Surveys, 2009. “Time Series Analysis by State Space Methods”, by J. Durbin and S. J. Koopman, 2001. “HOT SAX: Efficiently Finding the Most Unusual Time Series Subsequence”, ICDM, 2005. “Real-time change-point detection using sequentially discounting normalized maximum likelihood coding”, Advanced Knowledge Discovery Data Mining, 2011. “Unsupervised Learning of Video Representations using LSTMs”, ICML’15.
  54. 54. SATORI Thank you 54
