Andrew B. Gardner
agardner@momentics.com
http://linkd.in/1byADxC
Anomalies
Data Science Fairy Tale
Topics in Anomaly Detection
Seizure Detection Example
Summary

anomaly something that deviates from what is standard, normal, or expected
data cleansing
3-5% mislabeled ground truth in MNIST database
[Figure: sample MNIST digits: 9 1 0 1 7 2 3 9 5 0 3 6 6 0 7 5 0 7 6 3]

...
transactions

video surveillance

email
Date: Sat, 12 Aug 2012 14:39:59 UTC
From: "Iglobal" <tryme@yourdomain.com>
To: "Mr...
counterfeit
healthcare
condition
seizures


Many names



One key (counter-intuitive) idea: focus on the hay, not the needle




Machine learning (ooh)
Unsupervised*
Classification*
User
Device
Sensors

Signals
(Data)




Alerts | interventio...
Nothing is more expensive than a missed opportunity.
– H. Jackson Brown, Jr.







Advantages
Data haystacks .01%
Un...
We sell healthy, green apples!



Bob ... knows apples

common (n=13)

rare (n=1)


Bob “The 8th Dwarf”
8 Dwarf Orchards...
Goal: label instances
(green vs. red)

watercore

greens


green = +1

red = -1

Feature Space

Labels



mass density (...
Test Examples

watercore

Test Examples – Results

Confusion Matrix
Green (G)

not-green
(NG)

Label G

13 (TP)

4 (FP)

L...
Key idea: trade-off mislabeling each class (P vs. N)

Sensitivity

Confusion matrix
True Classes
Green (G)

TPR = TP / (TP...
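The rates on this slide follow directly from the four confusion-matrix counts of the apple example (13 TP, 4 FP, 1 FN, 1 TN, with Green as the positive class). A minimal sketch, assuming the conventional definition of the false positive rate, FP / (FP + TN); the function name `rates` is only illustrative:

```python
# Counts from the apple example's confusion matrix (positive class = Green).
TP, FP, FN, TN = 13, 4, 1, 1

def rates(tp, fp, fn, tn):
    """Return sensitivity (TPR), specificity (SPC) and false positive rate (FPR)."""
    tpr = tp / (tp + fn)  # share of true greens that were labeled green
    spc = tn / (fp + tn)  # share of true not-greens that were labeled not-green
    fpr = fp / (fp + tn)  # share of true not-greens labeled green, equal to 1 - SPC
    return tpr, spc, fpr

tpr, spc, fpr = rates(TP, FP, FN, TN)
print(f"TPR={tpr:.3f} SPC={spc:.3f} FPR={fpr:.3f}")  # TPR=0.929 SPC=0.200 FPR=0.800
```

The classifier finds almost every green apple but mislabels four of the five not-green ones, which is the trade-off the slide points at.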
Idea: distance to “average” example
centroid based anomaly detection

examples
 centroid
 threshold
 anomaly

watercore...
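The centroid idea can be sketched in a few lines: average the training examples, score each new example by its distance to that average, and flag scores above a threshold. The mass densities and the threshold value below are made up for illustration and are not the data from the talk.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def anomaly_scores(points, center):
    """Anomaly score = Euclidean distance from each point to the centroid."""
    return [math.dist(p, center) for p in points]

# Hypothetical mass densities (g/cm^3) of 13 ordinary apples, one feature each.
train = [[0.78], [0.79], [0.80], [0.80], [0.81], [0.81], [0.82],
         [0.82], [0.83], [0.83], [0.84], [0.85], [0.86]]
center = centroid(train)

THRESHOLD = 0.06  # hypothetical "magic number"; in practice it must be tuned
test = [[0.82], [0.95]]  # an ordinary apple and an unusually dense one
flags = [score > THRESHOLD for score in anomaly_scores(test, center)]
print(flags)  # [False, True]
```

No labels are needed to build the detector; only the threshold has to be chosen.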
Trait

classic

anomaly

Sensitivity

.928

1.00

Specificity

.200

.833

Feature dependent?
Require labels?
Magic number...
Goal: find densest regions in feature space

Standard deviation



mass density (g/cm3)

Tukey statistic (IQR)



waterc...
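For a single feature, the standard-deviation rule and the Tukey (IQR) rule can be compared directly. The sketch below uses only Python's `statistics` module; the data, the cutoff `k=2.5` and the whisker length 1.5 are assumptions for illustration. In one dimension the Mahalanobis distance reduces to the absolute z-score used by the first rule.

```python
import statistics

def sigma_outliers(values, k):
    """Values more than k sample standard deviations away from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / sd > k]

def tukey_outliers(values, whisker=1.5):
    """Values outside [Q1 - whisker*IQR, Q3 + whisker*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical mass densities (g/cm^3); the last value plays the watercore apple.
densities = [0.78, 0.79, 0.80, 0.80, 0.81, 0.81, 0.82,
             0.82, 0.83, 0.83, 0.84, 0.85, 0.86, 0.95]
print(sigma_outliers(densities, k=2.5))  # [0.95]
print(tukey_outliers(densities))         # [0.95]
```

With the common cutoff `k=3` the standard-deviation rule would miss the dense apple here, because the outlier itself inflates the standard deviation; the quartile-based rule is less affected.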
Goal: find densest regions in feature space

Flexible



Density based



Robust



watercore



Tunable

mass density...
Goal: find densest regions in feature space







x

xx

“Flood” graph


x

Pick fraction, e.g. 0.5

Mark waterlines
...
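The "flood" picture (estimate a density, pick a fraction, mark the waterline) can be imitated with a plain Gaussian kernel density estimate. This is a swapped-in illustration of the picture, not a one-class SVM; the bandwidth and data are hypothetical.

```python
import math

def kde(x, sample, bandwidth):
    """Gaussian kernel density estimate at x from a 1-D sample."""
    norm = len(sample) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample) / norm

def waterline(sample, fraction, bandwidth):
    """Density level that leaves roughly `fraction` of the sample above water."""
    levels = sorted((kde(s, sample, bandwidth) for s in sample), reverse=True)
    keep = max(1, math.ceil(fraction * len(sample)))
    return levels[keep - 1]

# Hypothetical mass densities (g/cm^3); 0.95 sits far from the dense region.
sample = [0.78, 0.79, 0.80, 0.80, 0.81, 0.81, 0.82,
          0.82, 0.83, 0.83, 0.84, 0.85, 0.86, 0.95]
BANDWIDTH = 0.02  # hypothetical kernel width
level = waterline(sample, fraction=0.5, bandwidth=BANDWIDTH)
inside = [s for s in sample if kde(s, sample, BANDWIDTH) >= level]
print(inside)  # the densest half of the sample; 0.95 is left out
```

Raising the fraction lowers the waterline and admits more points; points that stay below it are the anomaly candidates.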



Outlier impact
Rich data
 Graphs

 Spatio-temporal
 Text

Use labels
 Online / latency
 Features
 Clustering & ...
APPROACHES

SAMPLE METHODS

Statistical methods
 Distance based methods
 Rule systems
 Profiling Methods
 Model based ...


Problem: Detect seizures in patients from IEEG



Solution: Use one-class SVM to train on 15-minutes of
baseline



P...








Neurological disorder
Electrographic seizures
1% of population
30% non-controllable
EEG, IEEG, MRI, fMRI, PE...
an “obvious” electrographic seizure

9 minutes
Traditional Model
Brain Electrical Activity

Novelty Model
Brain Electrical Activity

baseline

baseline

pre-seizure

sei...
Idea: Capture Spectral Changes

Sliding Windows

Spectrum
frequency

EEG

time



Teager Energy



Curve Length



Shor...
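The sliding-window features named on this slide can be sketched as follows. The exact definitions used in the talk are not given, so the formulas here are common textbook forms taken as assumptions: curve length as the sum of absolute sample-to-sample differences, short-term energy as the mean squared amplitude, and Teager energy as the mean of x[n]^2 - x[n-1]*x[n+1]. The toy signal is invented.

```python
def curve_length(w):
    """Sum of absolute differences between consecutive samples."""
    return sum(abs(b - a) for a, b in zip(w, w[1:]))

def short_term_energy(w):
    """Mean squared amplitude of the window."""
    return sum(v * v for v in w) / len(w)

def teager_energy(w):
    """Mean Teager-Kaiser energy, x[n]^2 - x[n-1]*x[n+1], over interior samples."""
    vals = [w[n] ** 2 - w[n - 1] * w[n + 1] for n in range(1, len(w) - 1)]
    return sum(vals) / len(vals)

def sliding_features(signal, width, step):
    """Slide a window over the signal and compute the three features per window."""
    feats = []
    for start in range(0, len(signal) - width + 1, step):
        w = signal[start:start + width]
        feats.append((curve_length(w), short_term_energy(w), teager_energy(w)))
    return feats

# Toy signal: a quiet oscillation followed by a burst five times larger.
signal = [0, 1, 0, -1] * 4 + [0, 5, 0, -5] * 4
feats = sliding_features(signal, width=8, step=8)
print(feats[0])   # (7, 0.5, 1.0)
print(feats[-1])  # (35, 12.5, 25.0)
```

All three features jump when the burst starts, which is the kind of change a detector trained only on baseline windows would score as unusual.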
[Figure: Baseline IEEG (y-axis from -2000 to 2000) with P(seizure) (0 to 1) below, versus time (minutes)]
[Figure: Ictal IEEG (y-axis from -2000 to 2000) with P(seizure) (0 to 1) below, versus time (minutes)]
Nothing is more expensive than a missed opportunity.
– H. Jackson Brown, Jr.

Advantages






Data haystacks .01%
Un...



Questions?
Connect!

Andrew B. Gardner
agardner@momentics.com
http://linkd.in/1byADxC

V. Chandola, A. Banerjee and V...
Introduction to Anomaly Detection - Data Science ATL Meetup Presentation, 07-31-2013

Anomaly detection is a useful machine learning technique for identifying interesting, valuable or unusual instances in data sets. Applications for anomaly detection are diverse, including: fraud and counterfeit detection; surveillance; network, security and process monitoring; data exploration and more.

In this presentation, I review the basic ideas behind outlier-based detectors and compare them to traditional classification. I highlight practical and advanced issues that affect performance. Finally, I present an application of anomaly detection to detecting seizures from intracranial EEG time series.

See the accompanying video, http://vimeo.com/71931374

Published in: Technology, Education

  1. Andrew B. Gardner, agardner@momentics.com, http://linkd.in/1byADxC
  2. Anomalies; Data Science Fairy Tale; Topics in Anomaly Detection; Seizure Detection Example; Summary
  3. anomaly: something that deviates from what is standard, normal, or expected
  4. data cleansing: 3-5% mislabeled ground truth in MNIST database (sample digits: 9 1 0 1 7 2 3 9 5 0 3 6 6 0 7 5 0 7 6 3); stock price: Volkswagen (VOW.DE) short squeeze, 10/28/2008
  5. transactions; video surveillance; email (sample message: Date: Sat, 12 Aug 2012 14:39:59 UTC From: "Iglobal" <tryme@yourdomain.com> To: "Mr. Foo1" <foo1@freemail.com> Subject: Foo1, Please Confirm Your Position! Hi Foo1, Welcome To The $7 Plan. I Bring in 3 to 5 New Members In Every Day, I can show you how easily. Its to much Fun. Solution #1 It costs too much every month. Not with the $7 Plan! The TOTAL cost is $7 per month. The $7.00 Plan is still holding your position and we have people that are waiting to place under you. That's right only). Credit Card Fraud; Campaign Response; Traffic; Persons of Interest; Spam; Intrusion / Malware
  6. counterfeit; healthcare; condition; seizures
  7. Many names. One key (counter-intuitive) idea: focus on the hay, not the needle
  8. Machine learning (ooh); Unsupervised*; Classification*; User; Device; Sensors; Signals (Data); Alerts | intervention; Online | batch; Features; Outputs; Detector (Classifier)
  9. Nothing is more expensive than a missed opportunity. – H. Jackson Brown, Jr. Advantages: Data haystacks .01%; Unusual = interesting; Models $$$; Labels $$$; … Disadvantages?
  10. We sell healthy, green apples! Bob “The 8th Dwarf”, 8 Dwarf Orchards, Inc.: … knows apples; … sells healthy apples; … studies data science; … does “Big Apple Data”. common (n=13); rare (n=1)
  11. Goal: label instances (green vs. red). Labels: green = +1, red = -1. Feature Space: mass density (g/cm³); greens; reds; watercore. Training zi; Inputs xi zi yi; f: X → Y
  12. Test Examples; watercore; mass density (g/cm³). Test Examples – Results, Confusion Matrix:
                    Green (G)   not-green (NG)
      Label G       13 (TP)     4 (FP)
      Label NG      1 (FN)      1 (TN)
  13. Key idea: trade off mislabeling each class (P vs. N). Confusion matrix, true classes Green (G) = P and not-green (NG) = N: Label G: 13 (TP), 4 (FP); Label NG: 1 (FN), 1 (TN). Sensitivity: TPR = TP / (TP+FN) = 13/14, errors on the “positive” class, Green. Specificity: SPC = TN / (FP+TN) = 1/5, errors on the “negative” class, not-green. False Positive Rate: FPR = FP / (FP+TN) = 4/5
  14. Idea: distance to “average” example; centroid-based anomaly detection. examples; centroid; threshold; anomaly; watercore; mass density (g/cm³); false positive; anomaly score
  15. Trait          classic   anomaly
      Sensitivity    .928      1.00
      Specificity    .200      .833
      Feature dependent? Require labels? Magic numbers? Performance
  16. Goal: find densest regions in feature space. Standard deviation; Tukey statistic (IQR); Mahalanobis distance. mass density (g/cm³); watercore
  17. Goal: find densest regions in feature space. Flexible; Density based; Robust; Tunable. mass density (g/cm³); watercore. How? The one-class support vector machine
  18. Goal: find densest regions in feature space. “Flood” graph: pick fraction, e.g. 0.5; mark waterlines; note support. The one-class support vector machine does this
  19. Outlier impact; Rich data (Graphs, Spatio-temporal, Text); Use labels; Online / latency; Features; Clustering & alternatives. You Are Here
  20. APPROACHES: Statistical methods; Distance based methods; Rule systems; Profiling Methods; Model based approaches. SAMPLE METHODS: Kernel methods; PCA & subspace methods; OCNM & OCSVM; CUSUM; Nearest neighbors; Decision trees; Replicator Neural Networks; Clustering. V. Chandola, A. Banerjee and V. Kumar, “Anomaly Detection: A Survey.” (2009)
  21. Problem: Detect seizures in patients from IEEG. Solution: Use one-class SVM to train on 15 minutes of baseline. Performance: Improve state-of-the-art latency (5 secs) to -13 secs, auto channel selection, unsupervised technique, … Reference: “One-Class Novelty Detection for Seizure Analysis from Intracranial EEG,” Journal of Machine Learning Research ’06
  22. Neurological disorder; Electrographic seizures; 1% of population; 30% non-controllable; EEG, IEEG, MRI, fMRI, PET, etc.; Cyberonics, Neuropace, NeuroVista, …
  23. an “obvious” electrographic seizure (9 minutes)
  24. Traditional Model, Brain Electrical Activity: baseline; pre-seizure; seizure. Novelty Model, Brain Electrical Activity: baseline; other (e.g., seizures, artifacts, etc.)
  25. Idea: Capture Spectral Changes. Sliding Windows; Spectrum (frequency); EEG (time). Teager Energy; Curve Length; Short-Term Energy; slide & compute
  26. [Figure: Baseline IEEG (y-axis from -2000 to 2000) with P(seizure) (0 to 1) below, versus time (minutes)]
  27. [Figure: Ictal IEEG (y-axis from -2000 to 2000) with P(seizure) (0 to 1) below, versus time (minutes)]
  28. Nothing is more expensive than a missed opportunity. – H. Jackson Brown, Jr. Advantages: Data haystacks .01%; Unusual = interesting; Models $$$; Labels $$$; … Challenges: Features FTW; Normal = ?; Deviation = ?; False positives; Adaptation; …
  29. Questions? Connect! Andrew B. Gardner, agardner@momentics.com, http://linkd.in/1byADxC. V. Chandola, A. Banerjee and V. Kumar, “Anomaly Detection: A Survey.” (2009)
