Kevin Ongkowijaya interned at StarHub from May to August 2016 and performed outstanding work on an end-to-end data analytics project. He completed the challenging project involving multiple phases and skill sets with little supervision. Throughout the internship, Kevin demonstrated a proactive attitude, strong sense of ownership, and effectively picked up new technical skills. He was a team player who communicated well and was willing to work extra hours. The manager was very satisfied with Kevin's performance and believes he will contribute strongly in the big data analytics industry given the right opportunity.
Calidad Seis Sigma con R: Aplicación a la docencia [Six Sigma Quality with R: An Application to Teaching] (Emilio L. Cano)
This document discusses using R software to support Six Sigma methodology. It introduces reproducible research approaches for statistical training, provides examples using Sweave documents to integrate R code and LaTeX, and outlines an EADAPU training program covering Six Sigma phases and tools. The document also describes using R for process mapping, loss function analysis, and measurement system analysis for quality improvement projects.
The Klima iOS app aims to get people into art galleries by providing useful and relatable information. It retrieves weather data through an API and uses the current conditions to suggest local art that features similar weather. The app then displays this art along with the current weather, gallery opening times, and website links. Technically, it uses the OpenWeatherMap API for weather data and the Nasjonal Museet API to execute queries to find matching art based on the weather conditions.
2015 FOSS4G Track: Visualization and Analysis of Spatiotemporal Data using Fr... (GIS in the Rockies)
The document discusses the National Renewable Energy Laboratory's (NREL) transition from proprietary to open source software for spatial analysis and application development. It describes NREL's geospatial data science team and their work analyzing renewable energy and energy efficiency data. The transition to open source provided benefits like flexibility, cost savings, and promoting integrated architecture, but also challenges like difficulty staffing roles that require expertise across multiple open source technologies. Examples of projects using open source include a renewable electricity futures study visualization and various spatial applications.
OGC SensorThings API Get Started Webinar Series #3 of 4. (Dec 10 2015)
Title: RESTful Pattern for IoT API
More to come:
#4: Connect Sensors and IoT Devices to SensorThings API (Dec 17th 2015)
Register for our webinar here: http://sensorup.com/#signup
Rhea: Adaptively Sampling Authoritative Content from Social Activity Streams (Panagiotis Liakos)
The document summarizes the Rhea algorithm for adaptively sampling authoritative content from social activity streams. Rhea forms a network of authoritative users as it processes the stream and samples only content from the top-K authoritative users based on an auth-value measure. It addresses challenges of maintaining user information efficiently, ranking users, and filtering irrelevant content. Experimental results on Twitter and StackOverflow data show Rhea outperforms white-list baselines in terms of precision, recall, and ranking accuracy of the sampled documents.
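The summary above leaves the auth-value measure unspecified, so the following is only an illustrative sketch of top-K authoritative sampling. The stream schema and the update rule (an author gains one point per reply received) are placeholders, not Rhea's actual design:

```python
import heapq
from collections import defaultdict

def sample_stream(stream, k):
    """Emit only items whose author is currently among the top-k users
    by auth-value. Placeholder update rule: an author gains one point
    whenever another user replies to them."""
    auth = defaultdict(float)
    sampled = []
    for author, replied_to in stream:  # hypothetical (author, replied_to) schema
        if replied_to is not None:
            auth[replied_to] += 1.0
        top_k = set(heapq.nlargest(k, auth, key=auth.get))
        if author in top_k:
            sampled.append((author, replied_to))
    return sampled
```

Recomputing the top-k per item like this is naive; doing such bookkeeping efficiently online is exactly the challenge the paper addresses.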
AN EMPIRICAL STUDY OF THE RELATION BETWEEN STRONG CHANGE COUPLING AND DEFECTS... (Igor Wiese)
This study investigated the relationship between strong change couplings and defects in the Apache Aries project. The researchers found that strong change couplings were moderately correlated with defects and that the majority were associated with at least one defect. Models using historical and social metrics achieved high accuracy in identifying strong change couplings. The best metrics included discussion length, number of committers, committer experience, number of defects, and the number of weeks between the first and last commit. The models correctly predicted 45.67% of strong change couplings linked to post-release defects. The researchers concluded that strong change couplings influence code quality and aim to further investigate their impact and how to monitor and track the "damage" they cause.
Recommender Systems with Implicit Feedback Challenges, Techniques, and Applic... (NAVER Engineering)
Presenter: Yeon-Chang Lee (Ph.D. candidate, Hanyang University)
Date: February 2018
We investigate how to address the shortcomings of popular One-Class Collaborative Filtering (OCCF) methods in handling challenging "sparse" datasets in the one-class setting (e.g., clicked or bookmarked items), and propose a novel graph-theoretic OCCF approach, named gOCCF, that exploits both positive preferences (derived from rated items) and negative preferences (derived from unrated items). Capturing both positive and negative preferences as a bipartite graph, we further apply graph shattering theory to determine the right amount of negative preferences to use. We then develop a suite of novel graph-based OCCF methods based on random walk with restart and belief propagation. Through extensive experiments on 3 real-life datasets, we show that gOCCF effectively addresses the sparsity challenge and significantly outperforms all 8 competing methods in accuracy on very sparse datasets, while providing accuracy comparable to the best-performing OCCF methods on less sparse datasets.
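Of the two methods the abstract names, random walk with restart is the simpler to illustrate. Below is a minimal power-iteration sketch on a toy bipartite user-item graph; the node names are invented and this is not the authors' implementation:

```python
from collections import defaultdict

def rwr_scores(edges, start, restart=0.15, iters=100):
    """Random walk with restart via plain power iteration on an
    undirected bipartite user-item graph. Returns each node's
    visiting probability; item scores can be read off directly."""
    adj = defaultdict(list)
    for u, i in edges:
        adj[u].append(i)
        adj[i].append(u)
    p = {n: 1.0 if n == start else 0.0 for n in adj}
    for _ in range(iters):
        nxt = dict.fromkeys(adj, 0.0)
        for n, mass in p.items():
            share = (1.0 - restart) * mass / len(adj[n])
            for m in adj[n]:
                nxt[m] += share
        nxt[start] += restart  # teleport back to the target user
        p = nxt
    return p
```

Because the update is an affine contraction with factor (1 - restart), the iteration converges regardless of the bipartite graph's periodicity; items close to the target user end up with the highest scores.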
Event Detection and Characterization in Dynamic Graphs (Shebuti Rayana)
The document presents a framework for event detection and characterization in dynamic graphs. It proposes an ensemble approach that uses multiple algorithms for event detection, including eigen-behavior based detection, a probabilistic approach, and SPIRIT. The algorithms produce different scores and rankings that are merged through consensus methods. The approach is evaluated on two datasets: a cyber network dataset with ground truths and a New York Times corpus without ground truths. Major events are successfully detected in both datasets.
The document discusses the application of machine learning techniques to 21cm cosmology studies. It describes how artificial neural networks (ANNs) can be used as emulators to rapidly predict 21cm power spectra from cosmological parameters, bypassing the need for computationally expensive simulations. This allows ANNs to be combined with Markov chain Monte Carlo methods to efficiently estimate parameter posteriors. ANNs can also be applied to directly estimate parameters from 21cm power spectra or lightcones. The document outlines some open questions around fully characterizing uncertainties and obtaining rigorous posteriors when using ANN-based approaches in 21cm cosmology.
Self-managed and automatically reconfigurable stream processing (Vasia Kalavri)
With its superior state management and savepoint mechanism, Apache Flink is unique among modern stream processors in supporting minimal-effort job reconfiguration. Savepoints are extensively used to enable dynamic scaling, bug fixing, upgrades, and numerous other reconfiguration use cases, all while preserving exactly-once semantics. However, when it comes to dynamic scaling, the burden of reconfiguration decisions (when and how much to scale) currently falls on the user.
In this talk, I share our recent work at ETH Zurich on support for self-managed and automatically reconfigurable stream processing. I present SnailTrail (NSDI '18), an online critical-path analysis module that detects bottlenecks and provides insights into streaming application performance, and DS2 (OSDI '18), an automatic scaling controller that identifies optimal backpressure-free configurations and operates reactively online. Both SnailTrail and DS2 are integrated with Apache Flink and publicly available. I conclude with evaluation results, ongoing work, and future challenges in this area.
Event Detection and Characterization in Dynamic Graphs (Shebuti Rayana)
This document discusses event detection and characterization in dynamic graphs. It presents an ensemble approach that uses multiple algorithms for event detection, including eigen-behavior based detection (EBED), a probabilistic approach (PTSAD), and SPIRIT. The approaches are combined using consensus methods like rank merging and scoring to provide a "better" result than individual algorithms. The framework is evaluated on two datasets: a cyber network flow dataset and the New York Times news corpus, detecting events like elections, disasters and attacks. The document concludes by encouraging judging based on questions rather than answers.
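The rank-merging consensus described above can be realized in several ways; averaging each item's rank across detectors is one of the simplest (a sketch under that assumption, not necessarily the deck's exact consensus function):

```python
def consensus_rank(rankings):
    """Merge several detectors' rankings of the same items by average
    rank (lower mean rank = more anomalous); ties broken by name."""
    items = set(rankings[0])
    mean_rank = {it: sum(r.index(it) for r in rankings) / len(rankings)
                 for it in items}
    return sorted(items, key=lambda it: (mean_rank[it], it))
```

An item flagged near the top by every detector keeps a low mean rank, while one flagged by only a single detector is pushed down, which is the intuition behind the ensemble beating its individual members.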
1. The document discusses recommender systems for processing data streams in real time. It introduces different types of recommender systems including unpersonalized, collaborative filtering, and content-based filtering approaches.
2. It then discusses challenges specific to recommending news, such as the large volume of new articles published daily and changing relevance of content over time.
3. Finally, it addresses big data issues related to news recommendation and introduces a system that provides researchers access to large news datasets to test different recommendation approaches.
Tutorial: Context In Recommender Systems (YONG ZHENG)
This document provides an overview of a tutorial on context-aware recommender systems. The tutorial will cover traditional recommendation techniques, context-aware recommendation which incorporates additional contextual information such as time and location, and context suggestion. It includes an agenda with topics, background information on recommender systems and evaluation metrics, and descriptions of techniques for context-aware recommendation including context filtering and modeling.
Fast Feature Selection for Learning to Rank - ACM International Conference on... (Andrea Gigli)
My talk on fast feature selection filter algorithms at the ACM International Conference on the Theory of Information Retrieval (ICTIR 2016) held in Newark, DE, US
On Unified Stream Reasoning - The RDF Stream Processing realm (Daniele Dell'Aglio)
Slides from my talk at WU Vienna on 18 February 2016. I discuss the problem of unifying existing solutions for processing semantic streams, with a particular focus on those that perform continuous query answering over RDF streams.
Keynote of HOP-Rec @ RecSys 2018
Presenter: Jheng-Hong Yang
These slides are complementary material for the short paper HOP-Rec (RecSys 2018). They explain the intuition and some of the abstract ideas behind the descriptions and mathematical symbols through plots and figures.
Forecasting time series powerful and simple (Ivo Andreev)
A time series is a sequence of data points ordered in time. Time series forecasting has two main purposes: to understand the mechanisms behind rises and falls, and to predict future values. It often involves analyzing trends, cyclical events, and seasonality, and is uniquely important in economics and business. Because of temporal dependencies on previous data points, the quality of predictions can be evaluated only in the future, and there are many model types for approximation. In this session we talk about challenges, ways of improvement, and a technology stack including ML.NET, ARIMA, Python, Azure ML, regression, and FB Prophet.
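As a concrete reference point (not taken from the talk), the seasonal-naive baseline is the yardstick that fancier models like ARIMA or Prophet are usually expected to beat:

```python
def seasonal_naive(history, season, horizon):
    """Seasonal-naive baseline: forecast each future point with the
    value observed exactly one full season earlier."""
    return [history[-season + (h % season)] for h in range(horizon)]
```

For monthly data with yearly seasonality (season=12), next March is simply predicted to look like last March; if a sophisticated model cannot beat this, its extra complexity is not paying off.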
Incremental View Maintenance for openCypher Queries (Gábor Szárnyas)
Presented at the Fourth openCypher Implementers Meeting
Numerous graph use cases require continuous evaluation of queries over a constantly changing data set, e.g. fraud detection in financial systems, recommendations, and checking integrity constraints. For relational systems, incremental view maintenance has been researched for three decades, resulting in a wide body of literature. The property graph data model and the openCypher language, however, are recent developments, and therefore lack established techniques to perform efficient view maintenance. In this talk, we give an overview of the view maintenance problem for property graphs, discuss why it is particularly difficult and present an approach that tackles a meaningful subset of the language.
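The incremental idea can be illustrated on a toy pattern: maintaining the count of two-edge paths under edge insertions by applying only the delta, instead of re-running the pattern query from scratch. This is a stand-in example, not the talk's actual technique:

```python
from collections import defaultdict

class TwoHopCounter:
    """Incrementally maintains the number of two-edge paths a->b->c
    while edges are inserted, applying only the delta each time
    instead of re-evaluating the pattern over the whole graph."""
    def __init__(self):
        self.out = defaultdict(set)   # successors of each node
        self.inn = defaultdict(set)   # predecessors of each node
        self.count = 0
    def insert(self, a, b):
        if b in self.out[a]:
            return  # edge already present, view unchanged
        # the new edge a->b completes paths x->a->b and a->b->y
        self.count += len(self.inn[a]) + len(self.out[b])
        self.out[a].add(b)
        self.inn[b].add(a)
```

Each insertion costs O(1) dictionary work rather than a full re-evaluation; the hard part for a real openCypher engine is generating such delta rules for arbitrary patterns with filters and aggregations.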
Mba om 14_statistical_qualitycontrolmethods (Niranjana K.R.)
This document provides an overview of statistical quality control techniques including:
- Describing categories of statistical quality control and how to measure quality characteristics.
- Explaining sources of variation, process capability, and how to set control limits for control charts.
- Detailing different types of control charts for variables and attributes including x-bar, R, p, and c charts.
- Defining three sigma and six sigma process capability and how they relate to acceptable defect levels.
- Discussing challenges in measuring quality in service organizations and potential metrics that could be monitored.
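The three-sigma idea behind these control charts can be sketched for an x-bar chart. Note this simplified version estimates sigma directly from the subgroup means rather than via the textbook A2 * R-bar constants:

```python
from statistics import mean, stdev

def xbar_limits(subgroups):
    """Three-sigma control limits for an x-bar chart. Simplified:
    sigma of the subgroup means is estimated directly from the data
    rather than via the standard A2 * R-bar factors."""
    means = [mean(g) for g in subgroups]
    center = mean(means)           # grand mean (center line)
    sigma = stdev(means)           # spread of subgroup means
    return center - 3 * sigma, center, center + 3 * sigma
```

A subgroup mean falling outside the returned limits signals a likely assignable cause; under three-sigma limits a stable process triggers such a false alarm only about 0.27% of the time.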
This document discusses randomized data structures and algorithms. It begins by motivating randomized data structures as a way to transform average case runtimes into expected runtimes that are not dependent on specific inputs. It then provides examples of randomized data structures like treaps and randomized skip lists that provide efficient operations like insertion, deletion, and search in expected logarithmic time. It also discusses how randomization can be applied in algorithms like primality testing.
This document discusses randomized data structures and algorithms. It begins by motivating randomized data structures by noting that some data structures like binary search trees have average case performance but worst case inputs. Randomizing the data structure removes dependency on inputs and provides expected case performance. The document then discusses treaps and randomized skip lists as examples of randomized data structures that provide efficient expected case performance for operations like insertion, deletion, and search. It also covers topics like randomized number generation, primality testing, and how randomization can transform average case runtimes into expected case runtimes.
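The randomized primality testing mentioned above is usually Miller-Rabin, where each random base catches a composite with probability at least 3/4, so a handful of rounds makes an error vanishingly unlikely; a standard sketch:

```python
import random

def is_probable_prime(n, rounds=20):
    """Miller-Rabin probabilistic primality test: a composite number
    survives each random-base round with probability at most 1/4."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):   # screen out small factors first
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:                 # write n-1 = d * 2^r with d odd
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)              # modular exponentiation
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False              # a witnesses that n is composite
    return True
```

This is the classic example of randomization the slides allude to: no fixed input can be "bad" for the algorithm, because the bases are chosen at random at run time.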
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag... (sameer shah)
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Natural Language Processing (NLP), RAG and its applications .pptx (fkyes25)
1. In the realm of Natural Language Processing (NLP), knowledge-intensive tasks such as question answering, fact verification, and open-domain dialogue generation require the integration of vast and up-to-date information. Traditional neural models, though powerful, struggle with encoding all necessary knowledge within their parameters, leading to limitations in generalization and scalability. The paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" introduces RAG (Retrieval-Augmented Generation), a novel framework that synergizes retrieval mechanisms with generative models, enhancing performance by dynamically incorporating external knowledge during inference.
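The retrieval step that RAG adds before generation can be caricatured with simple word-overlap ranking. Real systems use dense embeddings and approximate nearest-neighbor search; this toy only shows where retrieved context enters the pipeline:

```python
def retrieve(query, docs, k=2):
    """Toy retrieval step of a RAG pipeline: rank documents by word
    overlap with the query and return the top-k, which a generator
    would then receive as extra context alongside the question."""
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: -len(q & set(d.lower().split())))
    return ranked[:k]
```

The generator then conditions on both the question and the returned passages, which is how RAG injects up-to-date external knowledge without storing it all in model parameters.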
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake (Walaa Eldin Moustafa)
Dynamic policy enforcement is becoming an increasingly important topic in today's world, where data privacy and compliance are a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) they are auto-generated from declarative data annotations; (2) they respect user-level consent and preferences; (3) they are context-aware, encoding a different set of transformations for different use cases; (4) they are portable: while the SQL logic is implemented in only one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data (Kiwi Creative)
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
More Related Content
Similar to Less is More: Building Selective Anomaly Ensembles with Application to Event Detection in Temporal Graphs
Event Detection and Characterization in Dynamic GraphsShebuti Rayana
The document presents a framework for event detection and characterization in dynamic graphs. It proposes an ensemble approach that uses multiple algorithms for event detection, including eigen-behavior based detection, a probabilistic approach, and SPIRIT. The algorithms produce different scores and rankings that are merged through consensus methods. The approach is evaluated on two datasets: a cyber network dataset with ground truths and a New York Times corpus without ground truths. Major events are successfully detected in both datasets.
The document discusses the application of machine learning techniques to 21cm cosmology studies. It describes how artificial neural networks (ANNs) can be used as emulators to rapidly predict 21cm power spectra from cosmological parameters, bypassing the need for computationally expensive simulations. This allows ANNs to be combined with Markov chain Monte Carlo methods to efficiently estimate parameter posteriors. ANNs can also be applied to directly estimate parameters from 21cm power spectra or lightcones. The document outlines some open questions around fully characterizing uncertainties and obtaining rigorous posteriors when using ANN-based approaches in 21cm cosmology.
Self-managed and automatically reconfigurable stream processingVasia Kalavri
With its superior state management and savepoint mechanism, Apache Flink is unique among modern stream processors in supporting minimal-effort job reconfiguration. Savepoints are being extensively used to enable dynamic scaling, bug fixing, upgrades, and numerous other reconfiguration use-cases, all while preserving exactly-once semantics. However, when it comes to dynamic scaling, the burden of reconfiguration decisions -when and how much to scale- is currently placed on the user.
In this talk, I share our recent work at ETH Zurich on providing support for self-managed and automatically reconfigurable stream processing. I present SnailTrail (NSDI’18), an online critical path analysis module that detects bottlenecks and provides insights on streaming application performance, and DS2 (OSDI’18), an automatic scaling controller which identifies optimal backpressure-free configurations and operates reactively online. Both SnailTrail and DS2 are integrated with Apache Flink and publicly available. I conclude with evaluation results, ongoing work, and and future challenges in this area.
With its superior state management and savepoint mechanism, Apache Flink is unique among modern stream processors in supporting minimal-effort job reconfiguration. Savepoints are being extensively used to enable dynamic scaling, bug fixing, upgrades, and numerous other reconfiguration use-cases, all while preserving exactly-once semantics. However, when it comes to dynamic scaling, the burden of reconfiguration decisions -when and how much to scale- is currently placed on the user.
In this talk, I will share our recent work at ETH Zurich on providing support for self-managed and automatically reconfigurable stream processing. I will present SnailTrail (NSDI’18), an online critical path analysis module that detects bottlenecks and provides insights on streaming application performance, and DS2 (OSDI’18), an automatic scaling controller which identifies optimal backpressure-free configurations and operates reactively online. Both SnailTrail and DS2 are integrated with Apache Flink and publicly available. I will conclude with evaluation results, ongoing work, and and future challenges in this area.
Event Detection and Characterization in Dynamic GraphsShebuti Rayana
This document discusses event detection and characterization in dynamic graphs. It presents an ensemble approach that uses multiple algorithms for event detection, including eigen-behavior based detection (EBED), a probabilistic approach (PTSAD), and SPIRIT. The approaches are combined using consensus methods like rank merging and scoring to provide a "better" result than individual algorithms. The framework is evaluated on two datasets: a cyber network flow dataset and the New York Times news corpus, detecting events like elections, disasters and attacks. The document concludes by encouraging judging based on questions rather than answers.
1. The document discusses recommender systems for processing data streams in real time. It introduces different types of recommender systems including unpersonalized, collaborative filtering, and content-based filtering approaches.
2. It then discusses challenges specific to recommending news, such as the large volume of new articles published daily and changing relevance of content over time.
3. Finally, it addresses big data issues related to news recommendation and introduces a system that provides researchers access to large news datasets to test different recommendation approaches.
Tutorial: Context In Recommender SystemsYONG ZHENG
This document provides an overview of a tutorial on context-aware recommender systems. The tutorial will cover traditional recommendation techniques, context-aware recommendation which incorporates additional contextual information such as time and location, and context suggestion. It includes an agenda with topics, background information on recommender systems and evaluation metrics, and descriptions of techniques for context-aware recommendation including context filtering and modeling.
Fast Feature Selection for Learning to Rank - ACM International Conference on...Andrea Gigli
My talk on fast feature selection filter algorithms at the ACM International Conference on the Theory of Information Retrieval (ICTIR 2016) held in Newark, DE, US
On Unified Stream Reasoning - The RDF Stream Processing realmDaniele Dell'Aglio
The presentation of my talk at WU Vienna on 18/2/2016. I discuss the problem of unifying existing solutions to process semantic streams - with a particular focus on the ones that perform continuous query answering over RDF streams
Keynote of HOP-Rec @ RecSys 2018
Presenter: Jheng-Hong Yang
These slides aim to be a complementary material for the short paper: HOP-Rec @ RecSys18. It explains the intuition and some abstract idea behind the descriptions and mathematical symbols by illustrating some plots and figures.
Forecasting time series powerful and simpleIvo Andreev
Time series are a sequence of data points positioned in order of time. Time series forecasting has two main purposes - to understand the mechanisms that lead to rise or fall, and to predict future values. Very often it analyses trends, cyclical events, seasonality and has unique importance in Economics and Business. The quality of predictions can be evaluated only in future due to temporal dependencies on previous data points and there are many model types for approximation. In this session we are going to talk about challenges, ways of improvement and technology stack like ML.NET, ARIMA, Python, Azure ML, Regression and FB Prophet
Incremental View Maintenance for openCypher QueriesGábor Szárnyas
Presented at the Fourth openCypher Implementers Meeting
Numerous graph use cases require continuous evaluation of queries over a constantly changing data set, e.g. fraud detection in financial systems, recommendations, and checking integrity constraints. For relational systems, incremental view maintenance has been researched for three decades, resulting in a wide body of literature. The property graph data model and the openCypher language, however, are recent developments, and therefore lack established techniques to perform efficient view maintenance. In this talk, we give an overview of the view maintenance problem for property graphs, discuss why it is particularly difficult and present an approach that tackles a meaningful subset of the language.
Mba om 14_statistical_qualitycontrolmethodsNiranjana K.R.
This document provides an overview of statistical quality control techniques including:
- Describing categories of statistical quality control and how to measure quality characteristics.
- Explaining sources of variation, process capability, and how to set control limits for control charts.
- Detailing different types of control charts for variables and attributes including x-bar, R, p, and c charts.
- Defining three sigma and six sigma process capability and how they relate to acceptable defect levels.
- Discussing challenges in measuring quality in service organizations and potential metrics that could be monitored.
This document discusses randomized data structures and algorithms. It begins by motivating randomized data structures by noting that some data structures like binary search trees have good average case performance but bad worst case inputs. Randomizing the data structure removes the dependency on inputs and provides expected case performance. The document then discusses treaps and randomized skip lists as examples of randomized data structures that provide efficient expected case performance for operations like insertion, deletion, and search. It also covers randomized number generation, primality testing, and how randomization can transform average case runtimes into expected case runtimes.
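For instance, the randomized primality testing mentioned above is commonly realized as the Miller-Rabin test, where each additional random base shrinks the error probability; this sketch is illustrative, not taken from the summarized document:

```python
import random

def is_probably_prime(n, rounds=20):
    """Miller-Rabin probabilistic primality test.

    Always returns True for primes; returns True for a composite
    with probability at most 4**(-rounds).
    """
    if n < 2:
        return False
    for p in (2, 3, 5, 7):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a witnesses that n is composite
    return True
```

This is the standard example of trading certainty for speed: the answer is only probably correct, but the error bound is controlled by the number of rounds rather than by the input.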
Similar to Less is More: Building Selective Anomaly Ensembles with Application to Event Detection in Temporal Graphs
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag... (sameer shah)
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Natural Language Processing (NLP), RAG and its Applications (fkyes25)
1. In the realm of Natural Language Processing (NLP), knowledge-intensive tasks such as question answering, fact verification, and open-domain dialogue generation require the integration of vast and up-to-date information. Traditional neural models, though powerful, struggle with encoding all necessary knowledge within their parameters, leading to limitations in generalization and scalability. The paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" introduces RAG (Retrieval-Augmented Generation), a novel framework that synergizes retrieval mechanisms with generative models, enhancing performance by dynamically incorporating external knowledge during inference.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake (Walaa Eldin Moustafa)
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data (Kiwi Creative)
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
End-to-end pipeline agility - Berlin Buzzwords 2024 (Lars Albertsson)
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
Enhanced Enterprise Intelligence with your personal AI Data Copilot (GetInData)
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You... (Aggregage)
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
Global Situational Awareness of A.I. and Where It's Headed (vikram sood)
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we're lucky, we'll be in an all-out race with the CCP; if we're unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
2. Rayana & Akoglu, Less is More: Building Selective Anomaly Ensembles
Event detection example: network intrusion. [Score-vs-time plot: anomaly score (0 to 1) over time ticks 0 to 20, spiking at time tick 7, the time point t of the intrusion.]
3. Event detection example: emerging topic in social media, the Nepal Earth Quake 2015: tweets and retweets with #Nepal, #NepalEarthQuake, #NepalEarthQuakeRelief, … [Score-vs-time plot: anomaly score (0 to 1) over time ticks 0 to 20, spiking on 25th April 2015.]
4. Problem statement: given a sequence of graphs {G1, G2, …, Gt, …, GT}, find time points t′ at which Gt′ changes significantly from Gt′−1, based on similarity/distance scores between consecutive snapshots over time.
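This problem statement can be made concrete with a toy distance score: Jaccard distance between consecutive snapshots' edge sets. The talk's detectors are far more sophisticated, so treat this purely as an illustration of "score how much Gt differs from Gt-1":

```python
def change_scores(graphs):
    """Score how much each snapshot differs from its predecessor.

    graphs: list of edge sets; score = Jaccard distance between
    consecutive snapshots' edge sets (1.0 = completely different).
    """
    scores = [0.0]  # first snapshot has no predecessor
    for prev, curr in zip(graphs, graphs[1:]):
        union = prev | curr
        sim = len(prev & curr) / len(union) if union else 1.0
        scores.append(1.0 - sim)
    return scores

snapshots = [
    {(1, 2), (2, 3)},
    {(1, 2), (2, 3)},          # unchanged
    {(4, 5), (5, 6), (6, 7)},  # event: the graph changes drastically
]
scores = change_scores(snapshots)
```

Time points whose score stands far above the rest would be reported as events.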
5. Numerous algorithms exist for event detection, but there is no "winner" algorithm across datasets.
Idea: an ensemble approach.
- Combine the strengths of accurate detectors.
- Alleviate the weaknesses of inaccurate detectors.
- Improved accuracy, reduced noise.
- More robust performance, better than individual base detectors.
T. G. Dietterich. Ensemble methods in machine learning. Springer, 2000.
J. Ghosh and A. Acharya. Cluster ensembles: Theory and applications. 2013.
6. Idea: ensemble approach. Challenge: building anomaly ensembles is a fully unsupervised task.
- No labels to guide for detector accuracy.
- No objective function inherent to the task.
- Combining all the results may deteriorate the overall ensemble accuracy [Rayana & Akoglu '14], since some detectors may be inaccurate.
We build SELECTive anomaly ensembles that identify (in)accurate detectors in unsupervised fashion.
7. Event Detection [overview diagram].
8. Event detection on Cybernet data (feature: degree). [Pipeline diagram: node feature time series feed the base detectors (Eigen-behaviors, Parametric modeling, SPIRIT, Subspace Method, Moving Average), whose per-tick outputs (Z-score, 1 − normalized sum of p-values, projection, SPE, aggregated p-value) are plotted over time ticks.]
10. Graphs over time are converted to node feature time series (a nodes × features (egonet) × time tensor).
Base detectors:
- Anomalous Subspace (ASED) [Lakhina et al. '04]
- SPIRIT [Papadimitriou et al. '05]
- Eigen-behavior based (EBED) [Akoglu et al. '10]
- Parametric modeling (PTSAD) [Rayana & Akoglu '14]: candidate models Poisson, ZIP, Bernoulli+ZTP, Markov+ZTP; model selection via likelihood ratio test
- Moving average (MAED)
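The slides name moving average (MAED) as a base detector without giving details; one plausible minimal version, which is this sketch's assumption rather than the paper's exact definition, scores each tick by its deviation from a trailing mean:

```python
def maed_scores(series, window=3):
    """Moving-average based event scores for one feature time series.

    Score at tick t = |x_t - mean of the previous `window` values|,
    a simple stand-in for the MAED base detector.
    """
    scores = []
    for t, x in enumerate(series):
        past = series[max(0, t - window):t]
        baseline = sum(past) / len(past) if past else x
        scores.append(abs(x - baseline))
    return scores

scores = maed_scores([10, 10, 10, 10, 50, 10])
```

Here the jump at tick 4 dominates the score list, which is exactly the per-tick signal the consensus stage would later combine across detectors.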
11. Base detector SELECTion over the five detectors (ASED, SPIRIT, EBED, PTSAD, MAED), followed by consensus SELECTion and the final ensemble.
Rank-based consensus: Inverse Rank; Kemeny-Young [Kemeny '59]; Robust Rank Aggregation [Kolde+ '12].
Score-based consensus: Unification [Zimek+ '11] (avg & max); Mixture Model [Gao+ '06] (avg & max).
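Of the listed consensus functions, Inverse Rank is simple enough to sketch: an item ranked r-th by a detector contributes 1/r to its consensus score. This is the common formulation; the talk does not spell out its exact variant, so treat the details as assumptions:

```python
def inverse_rank_consensus(rank_lists):
    """Inverse-rank aggregation: an item ranked r-th in a list gets
    weight 1/r; the consensus score is the sum over all lists."""
    scores = {}
    for ranking in rank_lists:
        for r, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / r
    return sorted(scores, key=scores.get, reverse=True)

# Three detectors rank four time ticks from most to least anomalous.
consensus = inverse_rank_consensus([
    ["t7", "t3", "t1", "t9"],
    ["t7", "t1", "t3", "t9"],
    ["t3", "t7", "t9", "t1"],
])
```

Because early ranks carry most of the weight, items that several detectors place near the top dominate the consensus even if one detector disagrees.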
12. Vertical SELECTion (SELECT-V) exploits correlation among the rank lists.
Horizontal SELECTion (SELECT-H) exploits element-wise order statistics to filter out inaccurate detectors.
14. Build a pseudo ground truth (target): the average of the detector result lists P1 … P5. In this example, P3 is most correlated to the target.
15. Initialize the ensemble with P3; its consensus p is the average of the selected lists.
16. Among the remaining lists, P1 is most correlated to p. If corr(avg(E, P1), target) > corr(p, target), accept P1 into the ensemble; else discard P1.
17. Repeat, updating p after each accepted list, until the candidate list is empty.
18. Final ensemble in the example: P2, P3, P4, P5, with P1 discarded.
19. SELECT-H. Each score list S1 … Sm is converted to a binary list M1 … Mm via mixture modeling (1 = outlier, 0 = inlier); majority voting across M1 … Mm yields a pseudo-outlier set O.
Order statistics then choose the accurate lists: for each pseudo outlier, collect its normalized ranks across the m lists, r = [r(1), …, r(m)] with r(1) ≤ … ≤ r(m). Under the uniform null, the probability that r̂(l) ≤ r(l) is the probability that at least l ranks drawn uniformly from [0, 1] fall in [0, r(l)].
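The uniform-null probability used here is a binomial tail: with m independent Uniform(0, 1) ranks, the chance that at least l of them fall in [0, r] is the sum over k ≥ l of C(m, k) r^k (1 − r)^(m − k). A sketch (the function name is mine):

```python
from math import comb

def at_least_l_small_ranks(m, l, r):
    """P(at least l of m i.i.d. Uniform(0,1) ranks fall in [0, r]).

    This is the binomial tail that scores how surprising the l-th
    smallest normalized rank r = r(l) of a pseudo outlier is under
    the uniform null: a small probability means the lists agree on
    the item more than chance would allow.
    """
    return sum(comb(m, k) * r**k * (1 - r)**(m - k) for k in range(l, m + 1))

p = at_least_l_small_ranks(m=10, l=5, r=0.1)
```

For example, 5 of 10 detectors ranking an item in the top 10% is very unlikely under the null (p well below 1%), so lists contributing to such agreement are judged accurate.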
20. Example with 20 detectors: the last 5 are likely inaccurate.
21. Baseline ensembles:
- Full Ensemble (Full) [Rayana & Akoglu '14]: assemble all the detector/consensus results.
- Diversity-based Ensemble (DivE) [Schubert et al. 2012]: select diverse (less correlated) detector/consensus results to assemble.
22. Data sets (name, duration, #nodes, #edges, snapshot rate):
1. EnronInc: 4 years, ~80K nodes, ~350K edges, 1 day
2. RealityMining: 50 weeks, ~18K nodes, ~33K edges, 1 week
3. TwitterSecurity: 4 months, ~130K nodes, ~441K edges, 1 day
4. TwitterWCup: 1 month, ~54K nodes, ~274K edges, 5 mins
5. NYTNews: 7.5 years, ~320K nodes, ~2980K edges, 1 week
Ground truth is available for datasets 1-4; NYTNews is evaluated qualitatively.
35. Case study on NYTNews: Columbia Disaster and the 9/11 attack. [Entity co-mention graphs at time ticks 89 and 90, featuring New York City, World Trade Center, Washington (DC), Afghanistan, Bin Laden, Osama, Al Qaeda, Manhattan (NY), Bush, George W, White House, and Congress.]
36. Summary: a new anomaly ensemble.
- SELECTive: discards inaccurate detectors, in unsupervised fashion.
- Heterogeneous: different detectors and different consensus functions.
- 2-phase: no bias towards detectors or consensus.
SELECT outperforms Full (no selection) and DivE (diversity-based ensemble), both of which are hurt by inaccurate detectors, on 5 large datasets (4 with ground truth).
37. Event Detection. Contact: srayana@cs.stonybrook.edu, http://www.cs.stonybrook.edu/~datalab/
Editor's Notes
My work focuses on discovering patterns and detecting anomalies in real-world data, using graph analytics techniques, and developing effective and efficient tools to do so.