Rayana & Akoglu
Shebuti Rayana* Leman Akoglu
May 2, 2015
Less is More: Building Selective Anomaly Ensembles
Event Detection: network intrusion
[Figure: anomaly score (0 to 1) versus time tick (0 to 20), annotated "At time point t" and "Time tick 7".]
Event Detection: emerging topic in social media
Nepal earthquake 2015 (25th April 2015): tweets, retweets with
• #Nepal
• #NepalEarthQuake
• #NepalEarthQuakeRelief
• …
[Figure: score (0 to 1) versus time tick (0 to 20).]
• Given a sequence of graphs {G_1, G_2, …, G_t, …, G_T}
• Find time points t' at which G_t' changes significantly from G_(t'-1)
[Figure: similarity/distance scores plotted over time.]
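The problem statement above can be illustrated with a minimal sketch. It assumes each graph is given as a set of edges and uses the Jaccard distance between consecutive edge sets as the distance score. The edge-set representation, the function names, and the fixed threshold are illustrative assumptions, not the detectors used in the talk.

```python
def jaccard_distance(edges_a, edges_b):
    """Distance in [0, 1] between two edge sets (0 means identical)."""
    union = edges_a | edges_b
    if not union:
        return 0.0
    return 1.0 - len(edges_a & edges_b) / len(union)


def change_points(graphs, threshold=0.5):
    """Score each time point t >= 1 by the distance between G_(t-1) and G_t.

    Returns the per-time-point scores and the time points whose score
    exceeds the (hypothetical) threshold.
    """
    scores = {
        t: jaccard_distance(graphs[t - 1], graphs[t])
        for t in range(1, len(graphs))
    }
    flagged = [t for t, s in scores.items() if s > threshold]
    return scores, flagged


# Toy sequence: the edge set is replaced entirely at time point 2.
graphs = [
    {("a", "b"), ("b", "c")},
    {("a", "b"), ("b", "c")},
    {("x", "y"), ("y", "z")},
    {("x", "y"), ("y", "z")},
]
scores, flagged = change_points(graphs)
```

A real detector would replace the fixed threshold with a statistical test on the score series, which is where the base detectors listed later come in.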
• Numerous algorithms for event detection
• no “winner” algorithm across datasets
• Idea: ensemble approach
• Combine strength of accurate detectors
• Alleviate weakness of inaccurate detectors
Improved accuracy, reduced noise
More robust performance
Better than individual base detectors
T. G. Dietterich. Ensemble methods in machine learning. Springer, 2000.
J. Ghosh and A. Acharya. Cluster ensembles: Theory and applications. 2013.
• Idea: ensemble approach
• Challenge: building anomaly ensembles is a fully unsupervised task
• No labels to assess detector accuracy
• No objective function inherent to the task
• Combining all the results may deteriorate the overall ensemble accuracy [Rayana&Akoglu ’14]
  ▪ some detectors may be inaccurate
We build SELECTive anomaly ensembles
- identify (in)accurate detectors
- in unsupervised fashion
Event Detection
[Figure: event detection on Cybernet (feature: degree). Panels: Eigen-behaviors, Parametric modeling, SPIRIT, Subspace Method, Moving Average. Score axes: Z-score, 1 – norm. (sum p-value), projection, SPE, Agg. p-value. x-axis: time ticks.]
[Figure: event detection on Enron (feature: weighted in-degree). Score axes: Z-score, 1 – norm. (sum p-value), projection, SPE, Agg. p-value.]
• Graphs over time → node feature time series
• Base detectors
  - Anomalous Subspace (ASED) [Lakhina et al. ’04]
  - SPIRIT [Papadimitriou et al. ’05]
  - Eigen-behavior based (EBED) [Akoglu et al. ’10]
  - Parametric modeling (PTSAD) [Rayana&Akoglu ’14]
    ▪ Models: Poisson, ZIP, Bernoulli+ZTP, Markov+ZTP
    ▪ Model selection: likelihood ratio test
  - Moving average (MAED)
[Figure: axes labeled Nodes, Features (egonet), Time.]
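The slides name a moving-average detector (MAED) without giving its details. As a rough sketch of the idea, the code below scores each time tick of one node feature series by how far its value lies from the mean of the preceding window, in units of that window's standard deviation. The window length, the z-score form, and the function name are assumptions made for illustration.

```python
import numpy as np


def moving_average_scores(series, window=5):
    """Absolute z-score of each value against the preceding `window` values.

    The first `window` ticks get score 0 because they have no full history.
    """
    series = np.asarray(series, dtype=float)
    scores = np.zeros(len(series))
    for t in range(window, len(series)):
        past = series[t - window:t]
        mu, sigma = past.mean(), past.std()
        # Small constant avoids division by zero on a flat history.
        scores[t] = abs(series[t] - mu) / (sigma + 1e-9)
    return scores


# Hypothetical degree series of one node with a burst at time tick 7.
degree = [10, 11, 9, 10, 10, 11, 10, 50, 10, 9]
scores = moving_average_scores(degree, window=5)
most_anomalous_tick = int(np.argmax(scores))
```

Running such a scorer per node and aggregating across nodes would give one score per time tick, which is the form the ensemble consumes.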
Base detectors: ASED, SPIRIT, EBED, PTSAD, MAED
Base detector SELECTion
Rank based
• Inverse Rank
• Kemeny-Young [Kemeny ’59]
• Robust Rank Aggregation [Kolde+ ’12]
Score based
• Unification [Zimek+ ’11]
  - avg & max
• Mixture Model [Gao+ ’06]
  - avg & max
Consensus SELECTion & final ensemble
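To make the rank-based consensus family concrete, here is a small sketch of an inverse-rank combination. It assumes that each detector's scores are turned into ranks (rank 1 is the most anomalous) and that the consensus score of an item is the average of 1/rank over all lists. The slides list "Inverse Rank" without a formula, so this definition and the helper names are assumptions.

```python
import numpy as np


def ranks_from_scores(scores):
    """Rank 1 is the highest score; ties are broken by original position."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    ranks = np.empty(len(order), dtype=int)
    ranks[order] = np.arange(1, len(order) + 1)
    return ranks


def inverse_rank_consensus(score_lists):
    """Average of 1/rank across detectors; higher means more anomalous."""
    rank_lists = np.array([ranks_from_scores(s) for s in score_lists])
    return (1.0 / rank_lists).mean(axis=0)


# Three hypothetical detectors scoring four time ticks.
score_lists = [
    [0.9, 0.1, 0.5, 0.2],
    [0.8, 0.3, 0.7, 0.1],
    [0.2, 0.1, 0.9, 0.3],
]
consensus = inverse_rank_consensus(score_lists)
final_order = [int(i) for i in np.argsort(-consensus)]
```

Because only ranks are used, detectors whose raw scores live on very different scales can be combined without any normalization step.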
• Vertical SELECTion (SELECT-V)
  - Exploits correlation among the rank lists
• Horizontal SELECTion (SELECT-H)
  - Exploits element-wise order statistics to filter out inaccurate detectors
[Figure: Unification converts the score lists S1, S2, S3, S4, S5 into the probability lists P1, P2, P3, P4, P5.]
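The step from score lists S1…S5 to probability lists P1…P5 can be sketched with Gaussian scaling, a scheme from the outlier score unification literature. The slides do not say which unification variant is used, so the choice of Gaussian scaling and the function name below are assumptions.

```python
import math

import numpy as np


def gaussian_unify(scores):
    """Map raw outlier scores to [0, 1] via max(0, erf((s - mu) / (sigma * sqrt(2))))."""
    s = np.asarray(scores, dtype=float)
    mu, sigma = s.mean(), s.std()
    if sigma == 0:
        # A constant list carries no outlier information.
        return np.zeros_like(s)
    z = (s - mu) / (sigma * math.sqrt(2))
    return np.array([max(0.0, math.erf(v)) for v in z])


# Hypothetical raw scores from one detector; the last tick stands out.
S1 = [1.0, 1.2, 0.9, 1.1, 5.0]
P1 = gaussian_unify(S1)
```

Scores at or below the list mean map to 0, so only points that look unusually high for that detector receive a nonzero outlier probability.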
SELECT-V walkthrough
• target = avg(P1, P2, P3, P4, P5), used as pseudo ground truth
• P3 is most correlated to the target, so it starts the Ensemble E; p = avg(E)
• Among the remaining lists, P1 is most correlated to p
  If corr(avg(E,P1), target) > corr(p, target)
      accept P1
  else
      discard P1
• Update until the list of remaining candidates is empty
• Result in the example: Ensemble = {P2, P3, P4, P5}, Discarded = {P1}
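The accept/discard loop shown on these slides can be sketched as follows. The code uses plain Pearson correlation from NumPy and takes the probability lists as rows of an array. The correlation measure, the function name, and the toy data are assumptions, since the slides only write corr(·,·).

```python
import numpy as np


def select_vertical(P):
    """Greedy selection over probability lists (rows of P).

    Returns (ensemble, discarded) as sorted lists of row indices.
    """
    P = np.asarray(P, dtype=float)
    target = P.mean(axis=0)  # pseudo ground truth

    def corr(a, b):
        return np.corrcoef(a, b)[0, 1]

    remaining = list(range(len(P)))
    # Seed the ensemble with the list most correlated to the target.
    first = max(remaining, key=lambda i: corr(P[i], target))
    ensemble, discarded = [first], []
    remaining.remove(first)

    while remaining:
        p = P[ensemble].mean(axis=0)  # current ensemble prediction
        cand = max(remaining, key=lambda i: corr(P[i], p))
        remaining.remove(cand)
        p_new = P[ensemble + [cand]].mean(axis=0)
        if corr(p_new, target) > corr(p, target):
            ensemble.append(cand)
        else:
            discarded.append(cand)
    return sorted(ensemble), sorted(discarded)


# Toy input: four lists that follow a common signal and one pure-noise list.
rng = np.random.default_rng(0)
signal = rng.random(50)
P = np.vstack([signal + 0.05 * rng.standard_normal(50) for _ in range(4)]
              + [rng.random(50)])
ensemble, discarded = select_vertical(P)
```

Every list ends up in exactly one of the two outputs, and no labels are consulted at any point, which matches the unsupervised setting stressed earlier in the talk.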
SELECT-H
[Figure: mixture modeling converts the score lists S1, S2, S3, …, Sm into binary lists M1, M2, M3, …, Mm (1 = outliers, 0 = inliers); majority voting over the binary lists gives the pseudo outliers O.]
• Order statistics to choose accurate lists
• Given m lists, for each pseudo outlier:
  r = [r(1), …, r(m)], s.t. r(1) ≤ … ≤ r(m)
• Under the uniform null, the probability that r̂(l) ≤ r(l) is the probability that at least l of the m ranks, drawn uniformly from [0, 1], fall in [0, r(l)]
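Two pieces of this slide lend themselves to a short sketch: the majority vote that produces the pseudo outliers, and the order-statistic probability. "At least l of m uniform draws fall in [0, r(l)]" is a binomial tail, that is, the sum over t = l, …, m of C(m, t) · r(l)^t · (1 − r(l))^(m − t). The function names and toy inputs below are assumptions, and how these probabilities are then used to filter detectors is not shown on the slide, so it is left out.

```python
from math import comb


def majority_vote(binary_lists):
    """Indices labeled 1 (outlier) by more than half of the binary lists."""
    m = len(binary_lists)
    n = len(binary_lists[0])
    return [i for i in range(n) if sum(b[i] for b in binary_lists) > m / 2]


def order_statistic_probs(normalized_ranks):
    """For one pseudo outlier, the binomial tail probability for each l = 1..m.

    normalized_ranks holds that item's rank in each of the m lists,
    scaled to [0, 1]; they are sorted so that r(1) <= ... <= r(m).
    """
    r = sorted(normalized_ranks)
    m = len(r)
    probs = []
    for l in range(1, m + 1):
        x = r[l - 1]
        tail = sum(comb(m, t) * x**t * (1 - x) ** (m - t)
                   for t in range(l, m + 1))
        probs.append(tail)
    return probs


# Three hypothetical binary lists over four time ticks.
binary_lists = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
]
pseudo_outliers = majority_vote(binary_lists)

# An item ranked in the middle of all three lists.
probs = order_statistic_probs([0.5, 0.5, 0.5])
```

A small probability for some l means that many lists rank the item higher than a random ordering would, which is the kind of signal the order statistics expose.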
• Example with 20 detectors
• last 5 likely inaccurate
• Full Ensemble (Full) [Rayana&Akoglu ’14]
  - Assemble all the detector/consensus results
• Diversity-based Ensemble (DivE) [Schubert et al. 2012]
  - Select diverse (less correlated) detector/consensus results to assemble
Dataset             Duration    #Nodes   #Edges   Rate
1. EnronInc         4 years     ~80K     ~350K    1 day
2. RealityMining    50 weeks    ~18K     ~33K     1 week
3. TwitterSecurity  4 months    ~130K    ~441K    1 day
4. TwitterWCup      1 month     ~54K     ~274K    5 mins
5. NYTNews          7.5 years   ~320K    ~2980K   1 week
• Ground truth for datasets 1-4
• Qualitative evaluation for NYTNews
• Performance comparison
[Figures: result plots, including the performance comparisons.]
Feature: weighted degree
• Columbia Disaster
• 9/11 attack
[Figure: entity graphs at time tick 89 and time tick 90 over the same entities: New York City; World Trade Center; Washington (DC); Afghanistan; Bin Laden, Osama; Al Qaeda; Manhattan (NY); Bush, George W; White House; Congress.]
• A new Anomaly Ensemble
• SELECTive:
  ▪ Discard inaccurate detectors
  ▪ unsupervised
• Heterogeneous
  ▪ different detectors
  ▪ different consensus
• 2-phases:
  ▪ No bias towards detectors & consensus
• SELECT outperforms
  ▪ Full (no selection)
  ▪ DivE (diversity ensemble)
• 5 large datasets (4 w/ ground truth)
Hurt by inaccurate detectors
Event Detection
srayana@cs.stonybrook.edu
http://www.cs.stonybrook.edu/~datalab/

Less is More: Building Selective Anomaly Ensembles with Application to Event Detection in Temporal Graphs

Editor's Notes

  1. My work focuses on discovering patterns and detecting anomalies in real-world data, using graph analytics techniques, and developing effective and efficient tools to do so.