Andrew B. Gardner
agardner@momentics.com
http://linkd.in/1byADxC
Anomalies
Data Science Fairy Tale
Topics in Anomaly Detection
Seizure Detection Example
Summary

anomaly something that deviates from what is standard, normal, or expected
data cleansing
3-5% mislabeled ground truth in MNIST database
[Figure: grid of MNIST digits: 9 1 0 1 7 2 3 9 5 0 3 6 6 0 7 5 0 7 6 3]

...
transactions

video surveillance

email
Date: Sat, 12 Aug 2012 14:39:59 UTC
From: "Iglobal" <tryme@yourdomain.com>
To: ”Mr...
counterfeit
healthcare
condition
seizures


Many names



One key (counter-intuitive) idea:
focus on the hay…

… not the needle




Machine learning (ooh)
Unsupervised*
Classification*
User
Device
Sensors

Signals
(Data)




Alerts | interventio...
Nothing is more expensive than a missed opportunity.
– H. Jackson Brown, Jr.







Advantages
Data haystacks .01%
Un...
We sell healthy, green apples!



Bob ... knows apples

common (n=13)

rare (n=1)


Bob “The 8th Dwarf”
8 Dwarf Orchards...
Goal: label instances
(green vs. red)

watercore

greens


green = +1

red = -1

Feature Space

Labels



mass density (...
Test Examples

watercore

Test Examples – Results

Confusion Matrix
Green (G)

not-green
(NG)

Label G

13 (TP)

4 (FP)

L...
Key idea: trade-off mislabeling each class (P vs. N)

Sensitivity

Confusion matrix
True Classes
Green (G)

TPR = TP / (TP...
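The rates on this slide can be checked directly from the four confusion-matrix counts. The sketch below uses the apple counts from the deck (TP=13, FP=4, FN=1, TN=1); the function name `rates` is made up for illustration, and FPR is computed with the standard definition FP / (FP + TN), which equals 1 - specificity.

```python
# Counts from the apple example: rows are assigned labels, columns are true classes.
TP, FP = 13, 4   # labeled green: truly green / truly not-green
FN, TN = 1, 1    # labeled not-green: truly green / truly not-green

def rates(tp, fp, fn, tn):
    """Return (sensitivity, specificity, false positive rate)."""
    sensitivity = tp / (tp + fn)   # TPR: share of true greens labeled green
    specificity = tn / (fp + tn)   # SPC: share of true not-greens labeled not-green
    fpr = fp / (fp + tn)           # FPR = 1 - specificity
    return sensitivity, specificity, fpr

sens, spec, fpr = rates(TP, FP, FN, TN)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} FPR={fpr:.3f}")

# Labeling every apple green gives perfect sensitivity but zero specificity.
print(rates(tp=14, fp=5, fn=0, tn=0)[:2])
```

The second call mirrors the "label all apples green" detector: highly sensitive, but not specific.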
Idea: distance to “average” example
centroid based anomaly detection

Legend: examples, centroid, threshold, anomaly

watercore...
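A minimal sketch of the centroid idea: score each example by its distance to the average training example and flag scores above a threshold. The feature values and the threshold rule (1.5 times the largest training distance) are assumptions for illustration; the slides do not give either.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def anomaly_score(point, center):
    """Euclidean distance from a point to the centroid."""
    return math.dist(point, center)

# Hypothetical (mass density, watercore) values for normal apples; not from the slides.
normal = [(0.80, 0.10), (0.82, 0.12), (0.79, 0.09), (0.81, 0.11), (0.83, 0.10)]
center = centroid(normal)

# Assumed rule: threshold = largest training distance times a safety margin.
threshold = 1.5 * max(anomaly_score(p, center) for p in normal)

def is_anomaly(point):
    """True when the point lies farther from the centroid than the threshold."""
    return anomaly_score(point, center) > threshold

print(is_anomaly((0.81, 0.10)))  # close to the centroid
print(is_anomaly((0.60, 0.40)))  # far from the centroid
```

Only examples of the normal class are needed to fit the centroid. The threshold is the "magic number" that still has to be chosen.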
Trait         classic   anomaly
Sensitivity   .928      1.00
Specificity   .200      .833

Feature dependent?
Require labels?
Magic number...
Goal: find densest regions in feature space

Standard deviation



mass density (g/cm3)

Tukey statistic (IQR)



waterc...
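Two of the region rules named on this slide can be sketched in one dimension: the standard-deviation rule (parametric) and Tukey's IQR fences (nonparametric). The density values, the multipliers k=3 and k=1.5, and the helper names are assumptions for illustration. Mahalanobis distance is not covered here.

```python
import statistics

def stddev_bounds(values, k=3.0):
    """Parametric bounds: mean +/- k sample standard deviations."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return mu - k * sigma, mu + k * sigma

def tukey_fences(values, k=1.5):
    """Nonparametric bounds: quartiles widened by k times the IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def outside(values, bounds):
    """Values that fall outside the (low, high) bounds."""
    low, high = bounds
    return [v for v in values if v < low or v > high]

# Hypothetical mass densities (g/cm3) with one extreme value; not from the slides.
densities = [0.78, 0.79, 0.80, 0.80, 0.81, 0.81, 0.82, 0.83, 0.84, 1.20]

flagged_by_stddev = outside(densities, stddev_bounds(densities))
flagged_by_tukey = outside(densities, tukey_fences(densities))
print("stddev rule:", flagged_by_stddev)
print("Tukey rule: ", flagged_by_tukey)
```

With these made-up values the extreme point inflates the standard deviation enough to hide itself from the 3-sigma rule, while the quartile-based fences still flag it. This is one way outliers distort the task of finding the dense region.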
Goal: find densest regions in feature space

Flexible



Density based



Robust



watercore



Tunable

mass density...
Goal: find densest regions in feature space







x

xx

“Flood” graph


x

Pick fraction, e.g. 0.5

Mark waterlines
...
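The "flood" picture can be approximated without an SVM: estimate a density, pick a fraction, and set the waterline at the density level that keeps that fraction of the training points above water. The sketch below uses a Gaussian kernel density estimate as a stand-in; it is not a one-class SVM, and the data, bandwidth, and fraction are assumptions for illustration.

```python
import math

def kde(x, sample, bandwidth):
    """Gaussian kernel density estimate at x from a 1-D sample."""
    norm = len(sample) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample) / norm

def waterline(sample, fraction, bandwidth):
    """Density level that keeps about `fraction` of the sample at or above it."""
    levels = sorted((kde(s, sample, bandwidth) for s in sample), reverse=True)
    keep = max(1, math.ceil(fraction * len(sample)))
    return levels[keep - 1]

# Hypothetical 1-D feature values: two dense groups and one isolated point.
sample = [0.78, 0.79, 0.80, 0.81, 0.82, 1.00, 1.01, 1.02, 1.40]
BANDWIDTH = 0.02   # assumed kernel width
FRACTION = 0.8     # assumed fraction of points to keep above the waterline

level = waterline(sample, FRACTION, BANDWIDTH)

def in_support(x):
    """True when x sits on an island, i.e. its density reaches the waterline."""
    return kde(x, sample, BANDWIDTH) >= level

print([x for x in sample if not in_support(x)])
```

Lowering the fraction raises the waterline and shrinks the islands. The support can be several disjoint regions, which a single centroid cannot represent.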



Outlier impact
Rich data
  Graphs
  Spatio-temporal
  Text
Use labels
Online / latency
Features
Clustering & ...
APPROACHES

SAMPLE METHODS

Statistical methods
Distance based methods
Rule systems
Profiling Methods
Model based ...


Problem: Detect seizures in patients from IEEG



Solution: Train a one-class SVM on 15 minutes of baseline



P...








Neurological disorder
Electrographic seizures
1% of population
30% non-controllable
EEG, IEEG, MRI, fMRI, PE...
an “obvious” electrographic seizure

9 minutes
Traditional Model
Brain Electrical Activity

Novelty Model
Brain Electrical Activity

baseline

baseline

pre-seizure

sei...
Idea: Capture Spectral Changes

Sliding Windows

Spectrum
frequency

EEG

time



Teager Energy



Curve Length



Shor...
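The three features named on this slide can be computed per sliding window as sketched below. The definitions are common textbook forms (Teager energy x[n]^2 - x[n-1]*x[n+1], curve length as summed absolute differences, short-term energy as mean squared amplitude) and are assumptions here, since the slides do not spell them out. The window width, step, and synthetic signal are also illustrative.

```python
import math

def teager_energy(window):
    """Mean of x[n]^2 - x[n-1]*x[n+1] over the interior samples."""
    terms = [window[n] ** 2 - window[n - 1] * window[n + 1]
             for n in range(1, len(window) - 1)]
    return sum(terms) / len(terms)

def curve_length(window):
    """Sum of absolute differences between consecutive samples."""
    return sum(abs(window[n] - window[n - 1]) for n in range(1, len(window)))

def short_term_energy(window):
    """Mean squared amplitude of the window."""
    return sum(x * x for x in window) / len(window)

def sliding_features(signal, width, step):
    """Slide a window over the signal and compute one feature tuple per position."""
    features = []
    for start in range(0, len(signal) - width + 1, step):
        w = signal[start:start + width]
        features.append((teager_energy(w), curve_length(w), short_term_energy(w)))
    return features

# Synthetic stand-in for EEG: a quiet, slow segment followed by a large, fast one.
quiet = [math.sin(2 * math.pi * 0.02 * n) for n in range(200)]
active = [5 * math.sin(2 * math.pi * 0.1 * n) for n in range(200)]
feats = sliding_features(quiet + active, width=100, step=100)

for teager, length, energy in feats:
    print(f"teager={teager:.3f} curve_length={length:.1f} energy={energy:.3f}")
```

All three features rise when amplitude or frequency increases, so the feature vectors of the second segment move away from those of the first. A novelty detector trained on baseline windows would respond to this kind of shift.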
[Figure: Baseline IEEG. Panels: IEEG amplitude (-2000 to 2000) and P(seizure) (0 to 1) vs. time (minutes)]
[Figure: Ictal IEEG. Panels: IEEG amplitude (-2000 to 2000) and P(seizure) (0 to 1) vs. time (minutes)]
Nothing is more expensive than a missed opportunity.
– H. Jackson Brown, Jr.

Advantages






Data haystacks .01%
Un...



Questions?
Connect!

Andrew B. Gardner
agardner@momentics.com
http://linkd.in/1byADxC

V. Chandola, A. Banerjee and V...
Introduction to Anomaly Detection - Data Science ATL Meetup Presentation, 07-31-2013

Anomaly detection is a useful machine learning technique for identifying interesting, valuable or unusual instances in data sets. Applications for anomaly detection are diverse, including: fraud and counterfeit detection; surveillance; network, security and process monitoring; data exploration and more.

In this presentation, I review the basic ideas behind outlier-based detectors and compare them with traditional classification. I highlight practical and advanced issues that affect performance. Finally, I present an application of anomaly detection to detecting seizures from intracranial EEG time series.

See the accompanying video, http://vimeo.com/71931374

Published in: Technology, Education
  • (1:00) Thank organizers & attendees. My background thesis. Invitation to connect.
  • (1:00) Anomaly detection is intuitive. Requires a context. Requires a measure.
  • (0:45) MNIST database of handwritten digits. Longstanding story about accuracy of the data set. Volkswagen share price from 210 EUR -> 1005 EUR. Porsche disclosed holdings, including options intended to acquire the underlying. This was going to deplete the float, which caused a run by short sellers. (http://www.risk.net/risk-magazine/feature/1498381/the-volkswagen-squeeze) Anomalies focus our attention.
  • (0:45) Anomalies have intrinsic value: business, social and scientific. Transactions (insurance, purchases, returns, etc.): looking for unusual good and bad behavior. The canonical example is credit card fraud, for instance my recent “purchase” of wine in Spain. Video surveillance: directly examining people, vehicles, and scenes for gait, position, counts, etc. to determine unusual traffic, intent, direction. Email: the canonical example is the spam scam, anomalous to me individually by content, sender, etc., and anomalous to recipients of an ISP because of how widely it is spread. Malware: anomalous mailings by me.
  • (0:45) Often overlooked. Two axes: expensive to acquire examples; expensive to miss anomalies. Currency: Secret Service TV episode. Conditions: life safety, services, etc. Seizures.
  • Anomalies everywhere. Changing perspective.
  • Machine learning makes it happen. Ideal vs. real system. Alerts because of intervention cost. Online is rare. Workflow is similar.
  • Data growth. Unusual events. Expensive to model. Labeled examples are rare, expensive. Prioritized focus.
  • Meet Bob. Red apples are “poison”, so build a healthy (green) apple detector.
  • RFA request for apples. Count all combinations of “what I said it was” x “what it actually was” -> confusion matrix. Note the unforeseen apple examples: rotten, yellow, etc. These unanticipated counter-examples are one reason why traditional classification “breaks”.
  • Confusion matrices are … confusing. Reduce to two statistics (sens, spec). FPR is related to spec (FPR = 1 - spec). Sens: how well do we do on green apples. Spec: how well do we do on the others. Example: can build a perfect green apple detector by labeling all apples green. That’s highly sensitive, but not specific.
  • Watercore is a real produce feature! This works pretty well for some problems, but there are issues as we will see…
  • Tukey = nonparametric, spherical region of support. Stddev = parametric, spherical region of support. Mahalanobis = elliptical, generalization of stddev, tighter bounds but more expensive to compute. In practice, Mahalanobis performs nicely.
  • Ideal case: find statistically significant “islands”. Curiously, outliers distort this task. The one-class SVM is the canonical, golden algorithm to achieve this. Oracle Data Mining implements one-class SVM. There are better variants now, like OCNM.
  • Outlier pruning before modeling can help. Rich data has representation challenges. How do you encode feature vectors? What is an anomaly? How do you define normal? Semi-supervised technique: do anomaly detection + use labels for classifying. If online system, concerned with latency. Features matter, even more so for anomaly detection. Clustering is an alternative and related problem. Many other related problems. Maybe worth considering.
  • Good survey paper. They create a taxonomy of techniques. Examples of AD techniques listed. Note familiar methods: lots of ML algorithms can be reworked as anomaly detection. Strategies: find a technique that works for your data; map your data so it works with your favorite technique; invent your own technique.
  • When non-controllable, looking at: surgical brain resection (gold standard); implantable device (experimental) alternative.
  • Real 20-min ictal EEG. Seizures not so obvious in raw time series form.
  • We pick simple but robust features from the speech and signal processing literature. Time series almost never useful in raw form. Use sliding window approaches. How to pick window width? What about multiscale phenomena?
  • Interictal (baseline) features vs. ictal (seizure). Notice that feature distributions shift during seizure = anomaly.
  • Data growth. Unusual events. Expensive to model. Labeled examples are rare, expensive. Prioritized focus.

    1. Andrew B. Gardner agardner@momentics.com http://linkd.in/1byADxC
    2. Anomalies; Data Science Fairy Tale; Topics in Anomaly Detection; Seizure Detection Example; Summary
    3. anomaly: something that deviates from what is standard, normal, or expected
    4. data cleansing: 3-5% mislabeled ground truth in MNIST database (digits 9 1 0 1 7 2 3 9 5 0 3 6 6 0 7 5 0 7 6 3); stock price: Volkswagen (VOW.DE) short squeeze, 10/28/2008
    5. transactions; video surveillance; email. Date: Sat, 12 Aug 2012 14:39:59 UTC From: "Iglobal" <tryme@yourdomain.com> To: "Mr. Foo1" <foo1@freemail.com> Subject: Foo1, Please Confirm Your Position! Hi Foo1, Welcome To The $7 Plan. I Bring in 3 to 5 New Members In Every Day, I can show you how easily. Its to much Fun. Solution #1 It costs too much every month. Not with the $7 Plan! The TOTAL cost is $7 per month. The $7.00 Plan is still holding your position and we have people that are waiting to place under you. That's right only. Credit Card Fraud; Campaign Response; Traffic; Persons of Interest; Spam; Intrusion / Malware
    6. counterfeit; healthcare; condition; seizures
    7. Many names. One key (counter-intuitive) idea: focus on the hay… … not the needle
    8. Machine learning (ooh); Unsupervised*; Classification*; User; Device; Sensors; Signals (Data); Alerts | intervention; Online | batch; Features; Outputs; Detector (Classifier)
    9. Nothing is more expensive than a missed opportunity. – H. Jackson Brown, Jr. Advantages: Data haystacks .01%; Unusual = interesting; Models $$$; Labels $$$; … Disadvantages?
    10. We sell healthy, green apples! Bob ... knows apples; common (n=13); rare (n=1). Bob “The 8th Dwarf”, 8 Dwarf Orchards, Inc. … sells healthy apples; … studies data science; … does “Big Apple Data”
    11. Goal: label instances (green vs. red). Labels: green = +1, red = -1. Feature Space: mass density (g/cm3) vs. watercore, showing greens and reds. Training zi; Inputs xi; zi; yi; f : X → Y
    12. Test Examples (axes: mass density (g/cm3), watercore). Test Examples – Results. Confusion Matrix (rows: assigned label; columns: true class Green (G), not-green (NG)): Label G: 13 (TP), 4 (FP); Label NG: 1 (FN), 1 (TN)
    13. Key idea: trade-off mislabeling each class (P vs. N). Confusion matrix (columns: true classes Green (G) = P, not-green (NG) = N): Label G: 13 (TP), 4 (FP); Label NG: 1 (FN), 1 (TN). Sensitivity: TPR = TP / (TP+FN) = 13/14 (errors on the “positive” class, Green). Specificity: SPC = TN / (FP+TN) = 1/5 (errors on the “negative” class, not-green). False Positive Rate: FPR = FP / (FP+TN) = 4/5 = 1 - SPC
    14. Idea: distance to “average” example (centroid based anomaly detection). Legend: examples, centroid, threshold, anomaly. Axes: mass density (g/cm3), watercore. Anomaly score; false positive
    15. Trait: classic / anomaly. Sensitivity: .928 / 1.00. Specificity: .200 / .833. Feature dependent? Require labels? Magic numbers? Performance
    16. Goal: find densest regions in feature space. Standard deviation; Tukey statistic (IQR); Mahalanobis distance. Axes: mass density (g/cm3), watercore
    17. Goal: find densest regions in feature space. Flexible; Density based; Robust; Tunable. Axes: mass density (g/cm3), watercore. How? The one-class support vector machine
    18. Goal: find densest regions in feature space. “Flood” graph: pick fraction, e.g. 0.5; mark waterlines; note support. The One-class Support Vector Machine Does This
    19. Outlier impact; Rich data (Graphs, Spatio-temporal, Text); Use labels; Online / latency; Features; Clustering & alternatives. You Are Here
    20. APPROACHES: Statistical methods; Distance based methods; Rule systems; Profiling Methods; Model based approaches. SAMPLE METHODS: Kernel methods; PCA & subspace methods; OCNM & OCSVM; CUSUM; Nearest neighbors; Decision trees; Replicator Neural Networks; Clustering. V. Chandola, A. Banerjee and V. Kumar, “Anomaly Detection: A Survey.” (2009)
    21. Problem: Detect seizures in patients from IEEG. Solution: Train a one-class SVM on 15 minutes of baseline. Performance: Improve state-of-the-art latency (5 secs) to -13 secs, auto channel selection, unsupervised technique, … Reference: “One-Class Novelty Detection for Seizure Analysis from Intracranial EEG,” Journal of Machine Learning Research ‘06
    22. Neurological disorder; Electrographic seizures; 1% of population; 30% non-controllable; EEG, IEEG, MRI, fMRI, PET, etc.; Cyberonics, Neuropace, NeuroVista, …
    23. an “obvious” electrographic seizure (9 minutes)
    24. Traditional Model of Brain Electrical Activity: baseline, pre-seizure, seizure. Novelty Model of Brain Electrical Activity: baseline, other (e.g., seizures, artifacts, etc.)
    25. Idea: Capture Spectral Changes. Sliding Windows (slide & compute): EEG vs. time; Spectrum vs. frequency. Teager Energy; Curve Length; Short-Term Energy
    26. Baseline IEEG [plot: IEEG amplitude (-2000 to 2000) and P(seizure) (0 to 1) vs. time (minutes)]
    27. Ictal IEEG [plot: IEEG amplitude (-2000 to 2000) and P(seizure) (0 to 1) vs. time (minutes)]
    28. Nothing is more expensive than a missed opportunity. – H. Jackson Brown, Jr. Advantages: Data haystacks .01%; Unusual = interesting; Models $$$; Labels $$$; … Challenges: Features FTW; Normal = ?; Deviation = ?; False positives; Adaptation; …
    29. Questions? Connect! Andrew B. Gardner agardner@momentics.com http://linkd.in/1byADxC V. Chandola, A. Banerjee and V. Kumar, “Anomaly Detection: A Survey.” (2009)
