Outlier Selection and One Class Classification by Jeroen Janssens

I present a novel algorithm called Stochastic Outlier Selection (SOS). The SOS algorithm computes an outlier probability for each data point. These probabilities are more intuitive than the unbounded outlier scores computed by existing outlier-selection algorithms. I have evaluated SOS on a variety of real-world and synthetic datasets, and compared it to four state-of-the-art outlier-selection algorithms. The results show that SOS performs better while being more robust to data perturbations and parameter settings.

Published in: Technology, Sports

  1. Outlier Selection and One-Class Classification. Jeroen Janssens (@jeroenhjanssens)
  2. Overview • Anomalies and outliers • Stochastic Outlier Selection • Experiments and results
  3. Anomalies and outliers
  4. [No text on this slide]
  5. [No text on this slide]
  6. [No text on this slide]
  7. Definition (Anomaly): An anomaly is an observation or event that deviates qualitatively from what is considered to be normal, according to a domain expert.
  8. Detecting anomalies is important • Expensive • Dangerous
  9. Detecting anomalies is important • Expensive • Dangerous • Mess up your model
  10. Human anomaly detection may suffer from • Fatigue • Information overload • Emotional bias
  11. Computers work with numbers: Observations
  12. Computers work with numbers: Observations → Data points
        width  height  label
        8.4    7.3     apple
        6.7    7.1     orange
        8.0    6.8     apple
        7.4    7.2     apple
        9.6    9.2     orange
        …      …       …
  13. Computers work with numbers: Observations → Data points → Visualisation. [Figure: scatter plot of the data points above, width versus height, with apples and oranges marked.]
  14. From anomaly to outlier. [Figure: scatter plot of speed over ground versus rate of turn, with anomalous and normal vessels marked.]
  15. Definition (Outlier): An outlier is a data point that deviates quantitatively from the majority of the data points, according to an outlier-selection algorithm.
  16. Three standard deviations. [Figure: distribution over $x$ with ticks at $\mu$, $\pm 1\sigma$, $\pm 2\sigma$, and $\pm 3\sigma$; legend: outlier, inlier; the $-3\sigma$ and $3\sigma$ boundaries are marked.]
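The three-standard-deviations rule on slide 16 can be sketched in a few lines. This is a minimal illustration, not code from the talk: the function name `three_sigma_outliers`, the choice of the population standard deviation, and the sample readings are all assumptions made for the example.

```python
from statistics import mean, pstdev

def three_sigma_outliers(values, k=3.0):
    """Return the values lying more than k standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mu) > k * sigma]

# Hypothetical readings: twenty identical values and one extreme value.
readings = [10.0] * 20 + [50.0]
flagged = three_sigma_outliers(readings)
```

Note that an extreme value also inflates the mean and the standard deviation it is judged against, so with very small samples this rule may flag nothing at all.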
  17. Euler diagram (a). Set legend and notation: real-world observations, $X$.
  18. Euler diagram (b). Labeled by expert as anomalous: $A$; labeled by expert as normal: $X \cap \overline{A}$.
  19. Euler diagram (c). Data points: $D$; unrecorded: $X \cap \overline{D}$.
  20. Euler diagram (d). Anomalies represented as data points: $C_A = D \cap A$; normalities represented as data points: $C_N = D \cap \overline{A}$.
  21. Euler diagram (e). Classified by algorithm as an outlier: $C_O$; classified by algorithm as an inlier: $C_I = D \cap \overline{C_O}$.
  22. Euler diagram (f). Hits: $H = C_A \cap C_O$; false alarms: $FA = C_N \cap C_O$; misses: $M = C_A \cap C_I$; correct rejects: $CR = C_N \cap C_I$.
  23. Confusion matrix
                                  Expert labels the observation as a(n)
        Algorithm classifies      Anomaly ($C_A$)     Normality ($C_N$)
        the data point as an
        Outlier ($C_O$)           hit ($H$)           false alarm ($FA$)
        Inlier ($C_I$)            miss ($M$)          correct reject ($CR$)
  24. [Figure: a data set of data points shown three times: with the labels by the expert (anomaly, normality), with the classifications by the algorithm (inlier, outlier), and with the resulting outcome (hit, false alarm, miss, correct reject).]
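The four outcomes on slides 22 to 24 are plain set intersections, which can be sketched with Python sets. The function name `confusion_sets` and the integer identifiers are hypothetical, chosen only for this illustration.

```python
def confusion_sets(data_points, anomalies, outliers):
    """Split the data set D into hits, false alarms, misses and correct rejects."""
    D = set(data_points)
    C_A = D & set(anomalies)   # anomalies represented as data points
    C_N = D - C_A              # normalities represented as data points
    C_O = D & set(outliers)    # classified by the algorithm as outlier
    C_I = D - C_O              # classified by the algorithm as inlier
    return {
        "hits": C_A & C_O,
        "false_alarms": C_N & C_O,
        "misses": C_A & C_I,
        "correct_rejects": C_N & C_I,
    }

# Hypothetical example: observation 9 is anomalous but was never recorded,
# so it is not in D and cannot appear in any of the four outcome sets.
outcome = confusion_sets(range(1, 7), anomalies={5, 6, 9}, outliers={4, 6})
```

Every data point in D lands in exactly one of the four sets, which is what makes the confusion matrix on slide 23 a partition of the data set.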
  25. Stochastic Outlier Selection
  26. Stochastic Outlier Selection • Unsupervised outlier selection algorithm • Employs concept of affinity • Computes outlier probabilities • One parameter: perplexity • Inspired by t-SNE
  27. A data point is selected as an outlier when all the other data points have insufficient affinity with it.
  28. [No text on this slide]
  29. From input to output. [Figure: pipeline from the $n \times m$ feature matrix $X$ through the dissimilarity matrix $D$, the affinity matrix $A$, and the binding probability matrix $B$ to the outlier probabilities $\Phi$, annotated with subsections 4.2.1, 4.2.2, 4.2.3, and 4.2.4-6.]
  30. Demo: SOS on the command-line (see http://sos.jeroenjanssens.com)
  31. From feature matrix to dissimilarity matrix (Equation 2.1). [Figure: six data points $x_1$ to $x_6$ plotted by first feature versus second feature, with $d_{2,6} \equiv d_{6,2} = 5.056$ marked, next to the matrices $X$ and $D$.]
  32. From feature matrix to dissimilarity matrix (Equation 2.1): $d_{ij} = \sqrt{\sum_{k=1}^{m} (x_{jk} - x_{ik})^2}$
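The Euclidean dissimilarity on slide 32 can be sketched as follows. The toy feature matrix is hypothetical (it is not the six-point data set shown on the slides), and `dissimilarity_matrix` is a name chosen for this example.

```python
import math

def dissimilarity_matrix(X):
    """Pairwise Euclidean distances d_ij between the rows of X."""
    return [[math.dist(xi, xj) for xj in X] for xi in X]

# Hypothetical toy feature matrix: n = 4 data points, m = 2 features.
X = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (5.0, 4.0)]
D = dissimilarity_matrix(X)
```

The result is symmetric with a zero diagonal, matching the slide's note that $d_{2,6} \equiv d_{6,2}$.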
  33. Affinity between data points (Equation 4.2). [Figure: affinity $a_{ij}$ as a function of dissimilarity $d_{ij}$ for $\sigma_i^2 = 0.1$, $\sigma_i^2 = 1$, and $\sigma_i^2 = 10$, next to the matrices $D$ and $A$.]
  34. Affinity between data points (Equation 4.2): $a_{ij} = \begin{cases} \exp(-d_{ij}^2 / 2\sigma_i^2) & \text{if } i \neq j \\ 0 & \text{if } i = j \end{cases}$
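A minimal sketch of the affinity formula on slide 34, assuming the dissimilarity matrix and one variance per data point are already available. The variances below are picked by hand for illustration; in SOS they follow from the perplexity parameter.

```python
import math

def affinity_matrix(D, variances):
    """a_ij = exp(-d_ij^2 / (2 * sigma_i^2)) for i != j, and 0 on the diagonal."""
    n = len(D)
    return [
        [0.0 if i == j else math.exp(-D[i][j] ** 2 / (2.0 * variances[i]))
         for j in range(n)]
        for i in range(n)
    ]

# Hypothetical dissimilarities for three points on a line, hand-picked variances.
D = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 1.0],
     [2.0, 1.0, 0.0]]
A = affinity_matrix(D, variances=[1.0, 1.0, 0.5])
```

Because each row uses its own variance, the affinity matrix is generally not symmetric even though the dissimilarity matrix is.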
  35. Smooth neighborhoods. [Figure: the six data points plotted by first feature versus second feature, annotated with their variances $\sigma_1^2$ to $\sigma_6^2$.]
  36. From affinity to binding probability (Equation 4.5 / 4.6). [Figure: the affinity matrix $A$ next to the binding probability matrix $B$.]
  37. From affinity to binding probability (Equation 4.5 / 4.6): $b_{ij} = p(i \to j \in E_G) \propto a_{ij}$, with $b_{ij} = \dfrac{a_{ij}}{\sum_{k=1}^{n} a_{ik}}$
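The normalisation on slide 37 turns each row of the affinity matrix into a probability distribution. The sketch below uses a made-up affinity matrix and assumes every row has at least one non-zero affinity, since otherwise the division is undefined.

```python
def binding_matrix(A):
    """b_ij = a_ij / sum_k a_ik: normalise each row of A to sum to one."""
    B = []
    for row in A:
        total = sum(row)  # assumed to be non-zero
        B.append([a / total for a in row])
    return B

# Hypothetical affinity matrix with a zero diagonal.
A = [[0.0, 0.6, 0.2],
     [0.6, 0.0, 0.6],
     [0.1, 0.3, 0.0]]
B = binding_matrix(A)
```

Row $i$ of $B$ can be read as the distribution from which data point $i$ picks the single neighbor it binds to.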
  38. Binding probabilities. [Figure: panels (a) and (b), each a graph over the vertices $v_1$ to $v_6$ with edges labelled by binding probabilities, alongside matrix $B$.]
  39. Binding probabilities. [Figure: panels (a) to (c) of the same graph, alongside matrix $B$.]
  40. Binding probabilities. [Figure: panels (b) to (d) of the same graph, alongside matrix $B$.]
  41. Binding probabilities. [Figure: panels (c) and (d) of the same graph, alongside matrix $B$.]
  42. Stochastic Neighbor Graph: $G = (V, E_G)$
  43. Stochastic Neighbor Graph: $G = (V, E_G)$, $p(G) = \prod_{i \to j \in E_G} b_{ij}$
  44. Stochastic Neighbor Graph: $G = (V, E_G)$, $p(G) = \prod_{i \to j \in E_G} b_{ij}$, and $C_O \mid G = \{x_i \in X \mid \deg^-_G(v_i) = 0\} = \{x_i \in X \mid \nexists v_j \in V : j \to i \in E_G\} = \{x_i \in X \mid \forall v_j \in V : j \to i \notin E_G\}$
  45. [Figure: three example SNGs over $v_1$ to $v_6$.] $G_a$: $p(G_a) = 3.931 \cdot 10^{-4}$, $C_O \mid G_a = \{x_1, x_4, x_6\}$. $G_b$: $p(G_b) = 4.562 \cdot 10^{-5}$, $C_O \mid G_b = \{x_5, x_6\}$. $G_c$: $p(G_c) = 5.950 \cdot 10^{-7}$, $C_O \mid G_c = \{x_1, x_3\}$.
  46. Set of all SNGs. [Figure: probability mass ($\cdot 10^{-4}$) over the set of all SNGs $\mathcal{G}$ with $G_a$, $G_b$, and $G_c$ marked, and cumulative probability mass over $\mathcal{G}$ for $h = 4.0$, $h = 4.5$, and $h = 5.0$.]
  47. Approximating outlier probabilities by sampling SNGs: $p(x_i \in C_O) = \lim_{S \to \infty} \frac{1}{S} \sum_{s=1}^{S} \mathbb{I}\{x_i \in C_O \mid G^{(s)}\}$, $G^{(s)} \sim P(\mathcal{G})$. [Figure: estimated outlier probability of $x_1$ to $x_6$ as a function of the sampling iteration $S$, from $10^0$ to $10^6$.]
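The sampling approximation on slide 47 can be sketched as a small Monte Carlo loop. This is not the CoffeeScript demo from the talk: the function names, the sample count, the seed, and the binding matrix are assumptions for illustration. Each vertex draws exactly one outgoing edge from its row of $B$, and a point counts as an outlier in a sample when no edge points to it.

```python
import random

def sample_outlier_set(B, rng):
    """Sample one SNG and return the indices of vertices with in-degree zero."""
    n = len(B)
    has_incoming = [False] * n
    for i in range(n):
        # Vertex i binds to one vertex j, chosen with probability b_ij.
        j = rng.choices(range(n), weights=B[i])[0]
        has_incoming[j] = True
    return {i for i in range(n) if not has_incoming[i]}

def estimate_outlier_probabilities(B, samples=20000, seed=0):
    """Fraction of sampled SNGs in which each point has no incoming edge."""
    rng = random.Random(seed)
    counts = [0] * len(B)
    for _ in range(samples):
        for i in sample_outlier_set(B, rng):
            counts[i] += 1
    return [c / samples for c in counts]

# Hypothetical binding matrix: the third point rarely receives an edge.
B = [[0.0, 0.9, 0.1],
     [0.9, 0.0, 0.1],
     [0.5, 0.5, 0.0]]
estimates = estimate_outlier_probabilities(B)
```

For this matrix the exact values are $(1 - 0.9)(1 - 0.5) = 0.05$ for the first two points and $(1 - 0.1)(1 - 0.1) = 0.81$ for the third, so the estimates should land close to those numbers.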
  48. Demo: Sampling SNGs in CoffeeScript and D3 (see http://sos.jeroenjanssens.com)
  49. Computing outlier probabilities through marginalisation: $p(x_i \in C_O) = \sum_{G \in \mathcal{G}} \mathbb{I}\{x_i \in C_O \mid G\} \cdot p(G) = \sum_{G \in \mathcal{G}} \mathbb{I}\{x_i \in C_O \mid G\} \cdot \prod_{q \to r \in E_G} b_{qr}$, where $|\mathcal{G}| = (n - 1)^n$
  50. Computing outlier probabilities in closed form (Equation 4.14): $p(x_i \in C_O) = \prod_{j \neq i} (1 - b_{ji})$. [Figure: the binding probability matrix $B$, the resulting outlier probabilities $\Phi$, and the six data points plotted by first feature versus second feature.]
  51. (1) $x_i \in C_O \mid G \iff \deg^-_G(v_i) = 0$
      (2) $p(x_i \in C_O) = \mathbb{E}_G[\mathbb{I}\{\deg^-_G(v_i) = 0\}]$
      (3) $p(x_i \in C_O) = \mathbb{E}_G[\prod_{j \neq i} \mathbb{I}\{j \to i \notin E_G\}]$
      (4) $p(x_i \in C_O) = \mathbb{E}_G[\prod_{j \neq i} (1 - \mathbb{I}\{j \to i \in E_G\})]$
      (5) $p(x_i \in C_O) = \prod_{j \neq i} (1 - \mathbb{E}_G[\mathbb{I}\{j \to i \in E_G\}])$
      (6) $p(x_i \in C_O) = \prod_{j \neq i} (1 - p(j \to i \in E_G))$
      (7) $p(x_i \in C_O) = \prod_{j \neq i} (1 - b_{ji})$
  52. Selecting outliers: $f(x) = \begin{cases} \text{outlier} & \text{if } p(x \in C_O) > \theta, \\ \text{inlier} & \text{if } p(x \in C_O) \leq \theta. \end{cases}$ [Figure: the six data points plotted by first feature versus second feature, marked as outlier or inlier.]
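The closed form from slide 50 and the threshold rule from slide 52 fit in two short functions. This is a sketch under stated assumptions: the binding matrix is made up, and the default threshold of 0.5 is an illustrative choice rather than a value prescribed by the slides.

```python
import math

def outlier_probabilities(B):
    """p(x_i in C_O) = product over j != i of (1 - b_ji), using column i of B."""
    n = len(B)
    return [
        math.prod(1.0 - B[j][i] for j in range(n) if j != i)
        for i in range(n)
    ]

def select_outliers(probabilities, theta=0.5):
    """Indices whose outlier probability exceeds the threshold theta."""
    return [i for i, p in enumerate(probabilities) if p > theta]

# Hypothetical binding matrix: the third point rarely receives an edge.
B = [[0.0, 0.9, 0.1],
     [0.9, 0.0, 0.1],
     [0.5, 0.5, 0.0]]
phi = outlier_probabilities(B)
outliers = select_outliers(phi, theta=0.5)
```

The product runs over a column of $B$, not a row: what matters for point $i$ is how likely the other points are to bind to it, not where $i$ itself binds.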
  53. Adaptive variances via the perplexity parameter: $h(b_i) = 2^{H(b_i)}$, $H(b_i) = -\sum_{j=1, j \neq i}^{n} b_{ij} \log_2(b_{ij})$. [Figure: perplexity $h(b_i)$ as a function of the variance $\sigma_i^2$ for $x_1$ to $x_6$, with $h = 4.5$ marked.]
  54. Continuous binary search. [Figure: variance $\sigma_i^2$ and perplexity $h(b_i)$ of $x_1$ to $x_6$ over binary search iterations 0 to 10.]
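Slides 53 and 54 describe choosing each variance so that the perplexity of a point's binding distribution matches the parameter $h$. The sketch below is one way to do that, not necessarily the author's procedure: it bisects on a logarithmic scale between assumed bounds `lo` and `hi`, and it treats a fully underflowed row as having perplexity 1. All names and numbers are illustrative.

```python
import math

def perplexity(dists_i, variance):
    """Perplexity 2**H(b_i) for one point, given its distances to the others."""
    aff = [math.exp(-d * d / (2.0 * variance)) for d in dists_i]
    total = sum(aff)
    if total == 0.0:
        return 1.0  # every affinity underflowed; treat as minimal perplexity
    entropy = 0.0
    for a in aff:
        b = a / total
        if b > 0.0:
            entropy -= b * math.log2(b)
    return 2.0 ** entropy

def find_variance(dists_i, target, lo=1e-6, hi=1e6, iterations=100):
    """Bisect (geometrically) for the variance whose perplexity equals target."""
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if perplexity(dists_i, mid) < target:
            lo = mid   # neighborhood too narrow: increase the variance
        else:
            hi = mid   # neighborhood too wide: decrease the variance
    return math.sqrt(lo * hi)

# Hypothetical distances from one point to its five neighbors.
dists = [1.0, 1.5, 2.0, 5.0, 6.0]
variance = find_variance(dists, target=3.0)
```

The search works because perplexity grows monotonically with the variance, from 1 (only the nearest neighbor matters) towards $n - 1$ (all neighbors equally likely), so the target must lie strictly between those two values.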
  55. Perplexity influences outlier probabilities. [Figure: outlier probability of $x_1$ to $x_6$ as a function of the perplexity $h$.]
  56. Experiments and results
  57. [Figure: SOS ($h = 10$), KNNDD ($k = 10$), LOF ($k = 10$), LOCI, and LSOD applied to three datasets (Densities, Ring, Banana), with data points A to J marked.]
  58. One-class datasets. $D_M$ = Iris flower data set, with classes $C_1$ = Setosa, $C_2$ = Versicolor, $C_3$ = Virginica. $D_1$: $C_A$ = Versicolor ∪ Virginica, $C_N$ = Setosa. $D_2$: $C_A$ = Setosa ∪ Virginica, $C_N$ = Versicolor. $D_3$: $C_A$ = Setosa ∪ Versicolor, $C_N$ = Virginica. [Figure: scatter plots of sepal length versus petal width for $D_M$, $D_1$, $D_2$, and $D_3$.]
  59. Weighted AUC. [Figure: outlier scores of the data points in $C_A$ and $C_N$, split by a threshold into $C_O$ and $C_I$ and marked as hit, false alarm, miss, or correct reject, with the values 0.5 and 0.57 marked for $\theta$ and $\theta'$; next to it, a curve of hit rate versus false alarm rate.]
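Slide 59 plots hit rate against false alarm rate; the area under such a curve can be computed directly from outlier scores and expert labels. The sketch below computes the plain, unweighted AUC as the fraction of (anomaly, normality) pairs that are ranked correctly. The weighting scheme behind the "weighted AUC" is not spelled out in the transcript, so it is not attempted here, and the scores are made up.

```python
def auc(scores, is_anomaly):
    """Probability that a random anomaly scores higher than a random normality.

    Ties count as half a correctly ranked pair.
    """
    pos = [s for s, a in zip(scores, is_anomaly) if a]
    neg = [s for s, a in zip(scores, is_anomaly) if not a]
    if not pos or not neg:
        raise ValueError("need at least one anomaly and one normality")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical outlier scores and expert labels for four data points.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [True, False, True, False]
area = auc(scores, labels)
```

An AUC of 1 means every anomaly scores above every normality, and 0.5 corresponds to a random ranking, which matches the 0.5 to 1 range on the results slides.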
  60. Real-world datasets. [Figure: weighted AUC (0.5 to 1) of SOS, KNNDD, LOF, LOCI, and LSOD on the datasets Iris, Ecoli, Breast W. Original, Delft Pump, Waveform Generator, Wine Recognition, Colon Gene, BioMed, Vehicle Silhouettes, Glass Identification, Boston Housing, Heart Disease, Arrhythmia, Haberman's Survival, Breast W. New, Liver Disorders, SPECTF Heart, and Hepatitis.]
  61. Synthetic datasets
        Data set  Parameter λ determines                 λ start  step size  λ end
        (a)       Radius of ring                         5        0.1        2
        (b)       Cardinality of cluster and ring        100      5          5
        (c)       Distance between clusters              4        0.1        0
        (d)       Cardinality of one cluster and ring    0        5          45
        (e)       Density of one cluster and ring        1        0.05       0
        (f)       Radius of square ring                  2        0.05       0.8
        (g)       Radius of square ring                  2        0.05       0.8
  62. [Figure: synthetic datasets (a) to (g), each shown at λ start, λ inter, and λ end.]
  63. Results on synthetic datasets. [Figure: AUC (0.6 to 1) as a function of λ, panels (a) to (f).]
  64. Results on synthetic datasets. [Figure: AUC (0.6 to 1) as a function of λ, panels (e) to (g); legend: SOS, KNNDD, LOF, LOCI, LSOD.]
  65. SOS performs significantly better. [Figure: two critical difference (cd) diagrams, at $p < .05$ and at $p < .01$, placing LOCI, KNNDD, SOS, LOF, and LSOD on a rank scale from 5 to 1.]
  66. Conclusion • Outlier selection can support the detection of anomalies • SOS is an intuitive and probabilistic algorithm to select outliers • SOS has a very good performance • No free lunch
  67. Outlier Selection and One-Class Classification. Jeroen Janssens (@jeroenhjanssens)
