Anomaly Detection - Real World Scenarios, Approaches and Live Implementation

Detecting anomalous patterns in data can lead to significant actionable insights in a wide variety of application domains, such as fraud detection, network traffic management, predictive healthcare, energy monitoring and many more.

However, detecting anomalies accurately can be difficult. What qualifies as an anomaly is continuously changing and anomalous patterns are unexpected. An effective anomaly detection system needs to continuously self-learn without relying on pre-programmed thresholds.

Join our speakers, Ravishankar Rao Vallabhajosyula (Senior Data Scientist, Impetus Technologies) and Saurabh Dutta (Technical Product Manager, StreamAnalytix), in a discussion on:

Importance of anomaly detection in enterprise data, types of anomalies, and challenges
Prominent real-time application areas
Approaches, techniques and algorithms for anomaly detection
Sample use-case implementation on the StreamAnalytix platform

Published in: Data & Analytics

  1. 1. Anomaly Detection: Real World Scenarios, Approaches and Live Implementation • WEBINAR | DECEMBER 15, 2017 • Ravishankar Rao Vallabhajosyula, SENIOR DATA SCIENTIST, IMPETUS TECHNOLOGIES (TWITTER: @ImpetusTech) • Saurabh Dutta, TECHNICAL PRODUCT MANAGER, STREAMANALYTIX (TWITTER: @StreamAnalytix)
  2. 2. Agenda • What’s an anomaly? • Real world use cases of anomaly detection • Key steps in anomaly detection • A deep dive into building an anomaly detection model • Types of anomaly detection • Data attributes • Approaches and methods • A platform approach to anomaly detection • Live implementation using StreamAnalytix • Q & A
  3. 3. About Impetus • Mission critical technology solutions since 1996 • Fortune 500: Big data clients • 1700 people; US, India, global reach • Unique mix of big data products and services
  4. 4. What’s an Anomaly? An anomaly is an observation that deviates greatly from most of the other observations, i.e., a data point, behavior, or pattern that appears statistically unusual. Basic qualities of an anomaly: 1. Rare 2. Significantly different from others
  5. 5. What is different about modern anomaly detection? • Rule based methods are hard to scale • Modern data science techniques are more efficient • Can work with real-time data • Improve detection across multiple channels • Learn and detect variations • Adaptable to multiple domains
  6. 6. Real world use cases of anomaly detection: Anomaly detection is influencing business decisions across verticals • MANUFACTURING: Detect abnormal machine behavior to prevent cost overruns • FINANCE & INSURANCE: Detect and prevent out of pattern or fraudulent spend, travel expenses • HEALTHCARE: Detect fraud in claims and payments; events from RFID and mobiles • BANKING: Flag abnormally high purchases/deposits, detect cyber intrusions • NETWORKING: Detect intrusion into networks, prevent theft of source code or IP • SOCIAL MEDIA: Detect compromised accounts, bots that generate fake reviews • VIDEO SURVEILLANCE: Detect or track objects and persons of interest in monotonous footage • SMART HOUSE: Detect energy leakage, standardize smart sensor datasets • TELECOM: Detect roaming abuse, revenue fraud, service disruptions • TRANSPORTATION: Ensure external communications to the vehicle are not intrusions
  7. 7. Key steps in anomaly detection • Problem identification and setting expectations • Defining the sources and schema • Parsing and pre-processing • Model development • Model execution • Investigation and feedback • Model updating • Operationalize model for scoring
  8. 8. Key steps in anomaly detection • Problem identification and setting expectations • Defining the sources and schema • Parsing and pre-processing • Model development • Model execution • Investigation and feedback • Model updating • Operationalize model for scoring
  9. 9. Model development for anomaly detection • Type of anomaly detection used • Type of data available • Whether the data has labels
  10. 10. Taxonomy of anomaly detection • Point Anomaly • Contextual Anomaly • Collective Anomaly
  11. 11. Data – Types of attributes • Categorical: Nominal (named categories), Ordinal (categories with an implied order), Binary (variables with only two options, Yes/No) • Numerical: Discrete (only particular numbers), Continuous (any numerical value)
  12. 12. Data – Choice of algorithm • Data types: Categorical (Nominal, Ordinal, Binary), Numerical (Discrete, Continuous) • Data has no labels: apply K-means clustering when time-stamps are absent; apply time-series anomaly detection algorithms when time-stamps are present • Data has labels: use standard machine learning classifiers when time-stamps are absent; use sequence classification algorithms when time-stamps are present
  13. 13. Approaches to anomaly detection • Supervised (Classification): training data builds a model that is applied to test data; suffers from data skewness and a lack of counter examples • Semi-supervised (Novelty detection): training data builds a model that is applied to test data; requires a 'normal' training dataset • Unsupervised (Clustering): an unsupervised algorithm builds a model from unlabeled data; faces the curse of dimensionality
  14. 14. Methods for anomaly detection: Categorical and numeric attributes • K-modes: uses Hamming distance to measure distance for categorical features • Generic mixture models: extend the framework of Gaussian mixture models • Robust SVM: kernel-based approach that identifies regions in which data resides in an alternate feature space
  15. 15. Methods for anomaly detection: Sequential data • State space models: model the evolution of data in time to enable forecasting and flag an anomaly if it exceeds a threshold • Hidden Markov models: Markov chains and HMMs measure the probability of different events happening in some sequence • Graph-based methods: graphs capture interdependencies and allow discovery of relational associations, such as in fraud • (Diagram labels: System, Behavior model, Observed behavior, Expected behavior, Observation, Model Formation, Simulation, Anomaly Detection)
  16. 16. Latest methods for anomaly detection • Deep Learning (AutoEncoder): AutoEncoders can learn the latent representation of the data by using an encoder and a decoder together • Deep Learning (RNN-based): RNN-based architectures enable sequence prediction; the network can flag an anomaly when needed • Generative Adversarial Nets: GANs combine two neural networks, a generator and a discriminator, and can be used to find anomalies
  17. 17. Anomaly detection algorithms • Host-based IDS: statistical profiling using histograms, mixture of models, neural networks, SVM, rule-based systems • Network intrusion detection: statistical profiling using histograms, parametric statistical modeling, non-parametric statistical modeling, Bayesian networks, neural networks, SVM, rule-based systems, clustering based, nearest neighbor, spectral, information theoretic • Credit card fraud detection: neural networks, rule-based systems, clustering, self-organizing map, artificial immune system, decision trees, SVM • Mobile phone fraud detection: statistical profiling using histograms, parametric statistical modeling, neural networks, rule-based systems • Insider trading detection: statistical profiling using histograms, information theoretic • Medical and public health: parametric statistical modeling, neural networks, Bayesian networks, rule-based systems, nearest neighbor techniques • Fault detection in mechanical units: parametric statistical modeling, non-parametric statistical modeling, neural networks, spectral methods, rule-based systems • Structural damage detection: statistical profiling using histograms, parametric statistical modeling, mixture of models, neural networks • Image processing, surveillance: mixture of models, regression, SVM, Bayesian networks, neural networks, clustering, nearest neighbor methods • Anomalous topic detection: mixture of models, neural networks, statistical profiling using histograms, clustering, SVM • Anomaly detection in sensor networks: parametric statistical modeling, Bayesian networks, nearest neighbor, rule-based systems, spectral
  18. 18. Poll question: At what stage is your organization in implementing anomaly detection techniques / solutions using advanced Data Science / Machine Learning / Real-time approaches? • Stage 0: We do not have any plans yet, I am here for education • Stage 1: We are at an initial planning stage • Stage 2: Currently evaluating platforms / implementation partners • Stage 3: Implementation underway • Stage 4: Already using a modern anomaly detection platform / solution
  19. 19. Key steps in anomaly detection • Problem identification and setting expectations • Defining the sources and schema • Parsing and pre-processing • Model development • Model execution • Investigation and feedback • Model updating • Operationalize model for scoring
  20. 20. A modern platform approach to anomaly detection • Multi-tenancy • Rapidly develop and operationalize • Apply data science / machine learning techniques with real-time data • A-B testing • Easily scalable • Monitor, debug and diagnose at scale • Version management • Deployment workflow: Dev – Test – Prod
  21. 21. Real-time Stream Processing and Machine Learning Platform ENABLING THE REAL-TIME ENTERPRISE
  22. 22. Implementing credit card fraud detection in real-time using
  23. 23. Schema overview { "isMerchantCompromised": 0, "isfraudent": true, "transactionAmount": 11276.0, "phone": "1478523699", "radiusFromResidence": 2.0, "deviation": 10.0, "averageTransaction": 4608.0, "city": 3, "transactionTime": "1512979321050", "email": "ava@mail.com", "name": "Jean", "gender": "Male", "merchantName": "My_Company", "timeOfDay": "10:30:19", "merchantCity": 10 }
  24. 24. Build Apache Spark Applications Within Minutes https://www.streamanalytix.com/download
  25. 25. Key takeaways • Modern data science techniques significantly improve detection of anomalies • It is possible to do it on streaming data in a scalable manner • Modern platforms can simplify implementation and reduce development cycle
  26. 26. Thank you. Questions? © 2017 Impetus Technologies Email: inquiry@streamanalytix.com Twitter : @ImpetusTech / @StreamAnalytix
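
Slide 15 describes modeling how data evolves in time, forecasting the next value, and flagging an anomaly when the observation exceeds a threshold. The following is a minimal sketch of that idea, not the method shown in the webinar: it assumes a simple exponentially smoothed level as the forecast, and the function name `flag_anomalies` and the parameters `alpha`, `k`, and `warmup` are hypothetical choices made for illustration.

```python
from statistics import pstdev


def flag_anomalies(series, alpha=0.3, k=3.0, warmup=5):
    """Return indices whose one-step forecast error exceeds k times the
    standard deviation of the forecast errors seen so far.

    Assumption: the 'model' is an exponentially smoothed level, a very
    simple stand-in for the state space models named on the slide.
    """
    level = series[0]   # current estimate of the underlying level
    residuals = []      # forecast errors of points accepted as normal
    flagged = []
    for i, x in enumerate(series[1:], start=1):
        error = x - level  # the forecast for step i is the current level
        if len(residuals) >= warmup:
            spread = pstdev(residuals)
            if spread > 0 and abs(error) > k * spread:
                flagged.append(i)
                continue  # keep the anomaly from updating the model
        residuals.append(error)
        level += alpha * error  # move the level toward the observation
    return flagged


readings = [10, 11, 10, 12, 11, 10, 11, 50, 10, 11]
print(flag_anomalies(readings))
```

Skipping the model update for flagged points is a design choice: it keeps a single spike from inflating the threshold for the points that follow, at the cost of adapting more slowly to a genuine level shift.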

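
Slide 14 notes that K-modes uses Hamming distance to compare categorical features. As a rough illustration only, the sketch below computes that distance and scores each record by how far it sits from the per-attribute mode of the data. This is a one-cluster simplification rather than full K-modes, and the helper names and sample records are hypothetical, not taken from the webinar.

```python
from collections import Counter


def hamming(a, b):
    """Count the positions at which two equal-length records differ."""
    return sum(x != y for x, y in zip(a, b))


def mode_vector(records):
    """Most frequent value per attribute (the K-modes 'centroid' for k = 1)."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def anomaly_scores(records):
    """Hamming distance from each record to the mode vector."""
    center = mode_vector(records)
    return [hamming(record, center) for record in records]


# Hypothetical categorical records: (gender, time of day, city)
records = [
    ("Male", "day", "NYC"),
    ("Male", "day", "NYC"),
    ("Female", "day", "NYC"),
    ("Male", "night", "LA"),
]
print(anomaly_scores(records))
```

A higher score means the record disagrees with the typical value on more attributes; where to draw the cutoff depends on the data and is left open here.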