BASLE BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA
HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH
Anomaly Detection
database integrated
Dr. Olaf Nimz
Our company.
anomaly detection2 08/01/2018
Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on […] and […] technologies in Switzerland, Germany, Austria and Denmark. We offer our services in the following strategic business fields:
OPERATION
Trivadis Services takes over the interacting operation of your IT systems.
Scoring Engine for R
EXEC sp_execute_external_script
    @language = N'R',
    -- SQL part (its result set is passed to @script as InputDataSet)
    @input_data_1 = N'SELECT 1 AS Installed',
    -- R part (receives @input_data_1)
    @script = N'OutputDataSet <- InputDataSet'
WITH RESULT SETS (([Installed] int NOT NULL));
GO
Microsoft R Server
Launchpad
(BxlServer and SQL Satellite, Rserver.dll)
Agenda
1. Prioritise data quality effort
   Cleansing DWH
   IoT Streams
   Online Learning
2. Unsupervised Measures
   Mahalanobis distance
   Clustering
   Local Outlier Factor
   Isolation Forest
   Variational AutoEncoders
Novelty, Noise, Outlier, Anomaly, Fraud, Instability
1. Special Observations
   in Relation to Baseline
   Contextual
2. Suspicious Observations
   Novelty
   Outlier / Anomaly
   Data quality issue
   Unstable Process
   Random Noise
Local - Global
High dimensional: Distance is meaningless
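
The claim that distance loses meaning in high dimensions can be illustrated with a small experiment. The sketch below is not from the talk: it assumes uniformly random points in the unit cube and measures the relative contrast (max - min) / min of the distances from one query point to the rest. The function name `relative_contrast` and all parameters are illustrative.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min of the distances from one query point to a random cloud."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

# Hypothetical dimensions chosen for illustration only.
contrasts = {dim: relative_contrast(dim) for dim in (2, 10, 100, 1000)}
for dim, contrast in contrasts.items():
    print(f"dim={dim:5d}  relative contrast={contrast:.2f}")
```

As the dimension grows, the nearest and the farthest point end up at almost the same distance, so a distance threshold separates very little.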

Approach for Detection
1. Statistical Distribution
   Entropy
2. Deviation from Normal
   Sequence of Events
   Conditional (temporal, spatial context)
   Collective, like a DoS attack
3. Distance to neighborhood
4. Local Density
5. High-dimensional adaptations
   Subspace projection
   Angle-based
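
As one possible reading of the entropy approach (item 1) applied to a collective anomaly (item 2), the sketch below computes the Shannon entropy of the event types in a window. The request strings and window sizes are made up for illustration; the assumption is that a DoS-like burst concentrates traffic on one event type and therefore lowers the entropy.

```python
import math
from collections import Counter

def shannon_entropy(events):
    """Shannon entropy (in bits) of the empirical distribution of event types."""
    counts = Counter(events)
    total = len(events)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical windows of 100 requests each.
normal_window = ["GET /a", "GET /b", "POST /c", "GET /d"] * 25
attack_window = ["GET /a"] * 97 + ["GET /b", "POST /c", "GET /d"]

h_normal = shannon_entropy(normal_window)
h_attack = shannon_entropy(attack_window)
print(f"normal window: {h_normal:.2f} bits, attack window: {h_attack:.2f} bits")
```

A sudden drop of the window entropy against its baseline would then be the signal to investigate; where to set that threshold is left open here.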
 


Univariate Extreme Values
IQR = interquartile range (middle 50%)
Median = 50th percentile
Within 2 stdev: ~ 95%
> 2 stdev (~ 2% per tail)
Grubbs' test per point
Scaling by z-scores (robust using median absolute deviation)
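
A minimal sketch of the univariate rules above, written in Python rather than the R used elsewhere in the deck. The readings are invented, and the 1.5 x IQR fences and the |z| > 3 cutoff are common conventions assumed here, not values taken from the slides. The factor 1.4826 rescales the median absolute deviation so that it matches the standard deviation for normally distributed data.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Values outside the boxplot fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]

def z_scores(values):
    """Classical z-scores: distance from the mean in standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def robust_z_scores(values):
    """Robust z-scores: distance from the median in scaled MAD units."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scale = 1.4826 * mad
    return [(v - med) / scale for v in values]

# Hypothetical sensor readings with one extreme value.
readings = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.4, 10.0, 25.0]

by_iqr = iqr_outliers(readings)
classic = z_scores(readings)
robust = robust_z_scores(readings)
by_robust_z = [v for v, z in zip(readings, robust) if abs(z) > 3]
print("IQR rule:", by_iqr)
print("robust z rule:", by_robust_z)
```

In this example the extreme value inflates both the mean and the standard deviation, so its classical z-score stays below 3, while the MAD-based score is far above it. That masking effect is the reason for the robust variant.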
Multivariate data
Robust Mahalanobis Distance
chisq.plot()
2 dimensions
No outlier ?
SCADA of Wind Turbine
Power Curve – Deviation from Prediction
lm(power ~ wind_speed + I(wind_speed^2) + I(wind_speed^3), data)
Adjusted R² = 95%
Sample
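
The idea of flagging deviations from the fitted power curve can be sketched as follows. This is a Python stand-in for the cubic `lm()` fit above, not the author's code: the data are synthetic, the coefficient 0.6 and the noise level are invented, and the cutoff of five scaled MADs on the residuals is an assumption made for this example.

```python
import random
import statistics

def fit_polynomial(xs, ys, degree=3):
    """Ordinary least squares for y ~ 1 + x + ... + x^degree via normal equations."""
    n = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coef = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - tail) / a[i][i]
    return coef

def predict(coef, x):
    return sum(c * x ** i for i, c in enumerate(coef))

# Synthetic power curve (hypothetical units) with one curtailed reading.
rng = random.Random(42)
wind_speed = [3 + 0.1 * i for i in range(91)]
power = [0.6 * ws ** 3 + rng.gauss(0, 3) for ws in wind_speed]
curtailed = 50                 # index of the reading at 8.0
power[curtailed] = 120.0       # far below the roughly 307 the curve predicts

coef = fit_polynomial(wind_speed, power)
residuals = [p - predict(coef, ws) for ws, p in zip(wind_speed, power)]

# Flag residuals more than 5 scaled MADs away from the median residual.
med = statistics.median(residuals)
mad = statistics.median([abs(r - med) for r in residuals])
cutoff = 5 * 1.4826 * mad
flagged = [i for i, r in enumerate(residuals) if abs(r - med) > cutoff]
worst = max(range(len(residuals)), key=lambda i: abs(residuals[i]))
print("flagged indices:", flagged, "largest deviation at index", worst)
```

The normal equations are adequate for this small, well-scaled example; a production fit would more likely rely on a QR-based solver such as the one behind `lm()`.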
Boxplot – univariate
[Boxplots of Power, WindSpeed and Temperature, with mean and median marked]
Wind distribution
Mahalanobis Distance
Multivariate: multi-dimensional
Scale: how many stdev away from the center?
Mahalanobis Distance candidates
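
To make "how many stdev away from the center" concrete, here is a two-dimensional sketch of the classical (non-robust) Mahalanobis distance. The data are synthetic and the function name is illustrative. The cutoff uses the fact that, for two-dimensional Gaussian data, the squared distance follows a chi-square distribution with 2 degrees of freedom, whose quantile at probability p is -2 ln(1 - p). The 97.5% level is an assumed choice.

```python
import math
import random

def mahalanobis_2d(points):
    """Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    dists = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form with the inverse of the 2x2 covariance matrix.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        dists.append(math.sqrt(max(d2, 0.0)))
    return dists

# Synthetic, strongly correlated data plus one point off the correlation line.
rng = random.Random(1)
points = []
for _ in range(100):
    x = rng.gauss(5, 2)
    points.append((x, 2 * x + rng.gauss(0, 0.5)))
points.append((3.0, 12.0))  # unremarkable in x and in y taken separately

dists = mahalanobis_2d(points)
cutoff = math.sqrt(-2 * math.log(1 - 0.975))  # chi-square, 2 d.o.f., 97.5%
candidates = [i for i, d in enumerate(dists) if d > cutoff]
print("cutoff:", round(cutoff, 3), "candidates:", candidates)
```

The added point lies within the usual range of each variable on its own and only stands out once the correlation is taken into account. A robust variant would replace the mean and covariance with estimates that the outlier itself cannot distort.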
Outlier in Original Space
Eigenvector Space
Overview of Outliers
Leland Wilkinson's probabilistic HDoutlier model
=> for a mixture of numeric and categorical variables
Panels: 1D, 2D, 3D, 4D
HDBSCAN
Only Borderline & Core cases
Scaled by z-Scores
Local Outlier Factor
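Reachability distance: the distance at which a point can be "reached" from a neighbor, i.e. max(k-distance of the neighbor, actual distance).
LOF: the local reachability density of a point's k nearest neighbors relative to its own; values well above 1 indicate an outlier.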
K-Nearest Neighbors
Reachability distance
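
The Local Outlier Factor can be written down in a few lines. The sketch below follows the standard definitions (k-distance, reachability distance, local reachability density) in plain Python; the two grids and the extra point are made-up data, and ties and duplicate points are not handled.

```python
import math

def local_outlier_factor(points, k=3):
    """LOF score per point; assumes no duplicate points."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    neighbors, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    def reach_dist(i, j):
        # Reachability distance of point i with respect to its neighbor j.
        return max(k_distance[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(reach_dist(i, j) for j in neighbors[i]) for i in range(n)]
    # LOF: mean density of the neighbors relative to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]

# Hypothetical data: a dense cluster, a sparse cluster, one point near the dense one.
dense = [(0.1 * i, 0.1 * j) for i in range(3) for j in range(3)]
sparse = [(5.0 + i, 5.0 + j) for i in range(3) for j in range(3)]
points = dense + sparse + [(0.6, 0.6)]

scores = local_outlier_factor(points)
for point, score in zip(points, scores):
    print(point, round(score, 2))
```

The last point is closer to its neighbors than the points of the sparse cluster are to each other, so a single global distance threshold would not single it out. LOF does, because it compares the point with the density of its own neighborhood, which is the local versus global distinction made earlier in the deck.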
Hierarchical Clustering
MDS plot coloured by cutree() clusters
Isolation Forest - Ensemble
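
A compact sketch of the isolation forest idea: an ensemble of random trees in which anomalies are isolated after fewer splits than regular points. It follows the published scoring formula s = 2^(-E[h(x)] / c(n)) but is not tuned or optimised. The data, tree count and sample size are assumptions chosen for illustration, and the sample size must be at least 2.

```python
import math
import random

def c_factor(n):
    """Average path length of an unsuccessful binary search tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def build_tree(rows, rng, depth, max_depth):
    if depth >= max_depth or len(rows) <= 1:
        return ("leaf", len(rows))
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:
        return ("leaf", len(rows))
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return ("node", feature, split,
            build_tree(left, rng, depth + 1, max_depth),
            build_tree(right, rng, depth + 1, max_depth))

def path_length(tree, row, depth=0):
    if tree[0] == "leaf":
        # Unresolved leaves get the expected remaining depth added.
        return depth + c_factor(tree[1])
    _, feature, split, left, right = tree
    child = left if row[feature] < split else right
    return path_length(child, row, depth + 1)

def isolation_scores(rows, n_trees=100, sample_size=64, seed=0):
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(rows, sample_size), rng, 0, max_depth)
             for _ in range(n_trees)]
    norm = c_factor(sample_size)
    scores = []
    for row in rows:
        mean_depth = sum(path_length(t, row) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / norm))
    return scores

# Hypothetical data: a Gaussian cloud plus one far-away point.
data_rng = random.Random(7)
rows = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(127)]
rows.append((8.0, 8.0))

scores = isolation_scores(rows)
print("highest score at index", scores.index(max(scores)))
```

Scores close to 1 indicate points that are isolated quickly, while scores around 0.5 or below are typical for points inside the bulk of the data.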
Challenges
1. Manual threshold – automatic is expensive
2. Mixed data type (numeric & categorical)
3. High dimensional spaces
4. High Cardinality (Granularity)
5. Multi-Modal: Global vs Local Scope
6. Online
http://projects.rajivshah.com/shiny/outlier/
