Anomaly Detection in Plain Static Graphs
Prepared by : Javad Forough (javad.forough@aut.ac.ir)
Professor : Dr.Mousavi
Amirkabir University of Technology
Outline
 Introduction
 Outliers & graph anomalies
 Challenges
 Anomaly detection in static graphs
 Anomalies in static plain graphs
 Structure based methods
 Community based methods
 Anomalies in static attributed graphs
 Structure based methods
 Community based methods
 Relational learning based methods
Introduction
  Detecting anomalies in data is a vital task, with numerous high-impact
applications in areas such as security, finance, health care, law enforcement,
and many others.
  With graph data becoming ubiquitous, techniques for structured graph data have
recently come into focus.
 The branch of data mining concerned with discovering rare occurrences in datasets
is called anomaly detection.
 Application examples:
 Detecting network intrusion or network failure
 Detecting credit card fraud
Introduction
 Application examples:
 Detecting email and Web spam
 Detecting auction fraud
 Detecting securities fraud
 Detecting malware/spyware
 Data cleaning
  And many others
Introduction
 Outliers & graph anomalies:
  Many techniques have been developed in the past decades, especially for
spotting outliers and anomalies in unstructured collections of
multi-dimensional data points.
  However, data objects cannot always be treated as points lying independently
in a multi-dimensional space.
  They may exhibit inter-dependencies, which should be accounted for during
the anomaly detection process (Fig. 1).
 Data instances in a wide range of disciplines, such as physics, biology,
social sciences, and information systems, are inherently related to one
another.
Introduction
 Outliers & graph anomalies:
 Types of outliers in graphs
1. Node outliers : vertices with unusual characteristics
2. Linkage outliers : edges with unusual characteristics
3. Subgraph outliers : parts of the graph which exhibit unusual
characteristics
Introduction
  Fig. 1 Point-based outlier detection versus graph-based anomaly detection:
(a) clouds of points (multi-dimensional), (b) inter-linked objects (network)
Introduction
 Researchers have recently intensified their study of methods for anomaly
detection in structured graph data.
  Why graphs? We highlight four main reasons that make graph-based
approaches to anomaly detection vital and necessary:
1. Inter-dependent nature of the data
2. Powerful representation
3. Relational nature of problem domains
4. Robust machinery
Introduction
 Challenges:
 No unique definition for the problem of anomaly detection exists
  The general definition of an anomaly or an outlier is vague: it
becomes meaningful only under a given context or application
 The very first definition of an outlier dates back to 1980, and is given by Hawkins
(1980)[1]:
 Definition 1 (Hawkins’ Definition of Outlier, 1980) “An outlier is an observation that
differs so much from other observations as to arouse suspicion that it was
generated by a different mechanism.”
  The above definition is quite general and thus makes the detection problem an
open-ended one
  The problem of anomaly detection has been defined in various ways in different
contexts
Introduction
 Challenges:
  The problem has many definitions, often tailored to the specific application
domain, and goes by various names such as outlier, anomaly, outbreak,
event, change, fraud, detection, etc.
  In some applications, such as data cleaning, outliers are even called noise.
  We provide a general definition for the graph anomaly detection problem as
follows.
  Definition 2 (General Graph Anomaly Detection Problem):
Given a (plain/attributed, static/dynamic) graph database, find the graph
objects (nodes/edges/substructures) that are rare and that differ significantly
from the majority of the reference objects in the graph.
Introduction
 Challenges:
 For practical purposes, a record/point/graph-object is flagged as anomalous if its
rarity/likelihood/outlierness score exceeds a user-defined or an estimated threshold.
 In other words, an anomaly is treated as a data object or a group of objects that is
rare (e.g., rare combination of categorical attribute values), isolated (e.g., far-away
points in n-dimensional spaces), and/or surprising (e.g., data instances that do not
fit well in our mental/statistical model, or need too many bits to describe under the
Minimum Description Length principle (Rissanen 1999)[2]).
 There are two challenges associated with anomaly detection:
1. data-specific challenges
2. problem-specific challenges
Introduction
 Challenges:
  Data-specific challenges: Simply put, the challenges with respect to data are those of
working with big data, namely the volume, velocity, and variety of massive, streaming,
and complex datasets. The same challenges generalize to graph data as well.
  Scale and dynamics
  Facebook ~ 2 billion users
  The Web ~ 1 trillion pages
  Cell phones ~ over 6 billion users
  Complexity
  The datasets are rich and complex in content
Introduction
 Challenges:
  Problem-specific challenges: Additional challenges arise with respect to
the anomaly detection task itself.
 Lack and noise of labels
 Class imbalance and asymmetric error
 Novel anomalies
 Graph-specific challenges : All of the above +
 Inter-dependent objects
 Variety of definitions
 Size of search space
Anomaly detection in static plain graphs
 Outliers in clouds of data points:
 Multi-dimensional outlier detection
 Techniques:
 density-based
 distance-based
 depth-based
 distribution-based
 clustering-based
 classification-based
 information theory-based
 spectrum-based
 subspace-based
Anomaly detection in static plain graphs
 Anomaly detection in static graphs:
1. Plain graphs
 only nodes and edges among those nodes, i.e. the graph structure.
2. Attributed graphs
  Social network: users have various interests, work/live at
different locations, have various education levels, etc.
  Relational links have various strengths, types, frequencies, etc.
Anomaly detection in static plain graphs
  A general definition of the anomaly detection problem for static graphs can be
stated as follows:
  Definition 3 (Static-graph anomaly detection problem):
  Given the snapshot of a (plain or attributed) graph database, find the
nodes and/or edges and/or substructures that are “few and different” or
deviate significantly from the patterns observed in the graph.
Anomaly detection in static plain graphs
 Anomalies in static plain graphs
 The only information is the graph structure
1. Structure based methods
 Feature-based approaches
 Proximity-based approaches
2. Community based methods
Anomaly detection in static plain graphs
 Structure based methods:
 Feature-based approaches:
 Main idea : This group of approaches uses the graph representation to
extract structural graph-centric features
  These approaches use the given graph structure to compute various measures
associated with the nodes, dyads, triads, egonets, and communities, as well as the
global graph structure[3]
 These features have been used in several anomaly detection
applications including Web spam[4] and network intrusion[5]
Anomaly detection in static plain graphs
 Structure based methods:
 Node-level features
1. (in/out) degrees
2. centrality measures
   a. eigenvector
   b. closeness
   c. betweenness
3. local clustering coefficient
4. degree assortativity
5. roles
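As a rough illustration of two of the node-level features listed above, the following sketch computes the degree and the local clustering coefficient of every node. The adjacency-dict input format and the function name `node_features` are assumptions made for this example, not something prescribed by the slides.

```python
from itertools import combinations

def node_features(adj):
    """Return {node: (degree, local_clustering)} for an undirected graph.

    adj maps each node to the set of its neighbors (assumed input format).
    """
    features = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        # Count the edges that run among the neighbors of `node`.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        possible = k * (k - 1) / 2
        clustering = links / possible if possible else 0.0
        features[node] = (k, clustering)
    return features

# Toy graph: triangle a-b-c plus a pendant node d attached to c.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
feats = node_features(toy)
```

Each node becomes a small feature vector, which can then be handed to any point-based outlier detector.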
Anomaly detection in static plain graphs
 Structure based methods:
 dyadic features:
1. Reciprocity
2. edge betweenness
3. number of common neighbors
 Egonet features:
1. number of triangles
2. total weight
3. principal eigenvalue
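Two of the dyadic features above can be sketched in a few lines. The helper names `reciprocity` and `common_neighbors` and the input formats (a list of directed edges, an adjacency dict of neighbor sets) are assumptions for illustration only.

```python
def reciprocity(edges):
    """Fraction of directed edges (u, v) whose reverse edge (v, u) also exists."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for u, v in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)

def common_neighbors(adj, u, v):
    """Set of nodes adjacent to both u and v in an undirected graph."""
    return adj[u] & adj[v]

# Toy inputs (hypothetical data).
directed_edges = [("a", "b"), ("b", "a"), ("b", "c")]
undirected = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
```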
Anomaly detection in static plain graphs
 Structure based methods:
 node-group-level:
1. Density
2. Modularity
3. Conductance
 Global measures:
1. number of connected components
2. distribution of component sizes
3. principal eigenvalue
4. minimum spanning tree weight
5. average node degree
6. global clustering coefficient
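A minimal sketch of three of the global measures above (number of connected components, component sizes, average node degree), assuming an undirected graph stored as a dict of neighbor sets. The function names are hypothetical.

```python
from collections import deque

def connected_components(adj):
    """Return a list of components, each a list of nodes (breadth-first search)."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), []
        while queue:
            node = queue.popleft()
            comp.append(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(comp)
    return components

def average_degree(adj):
    """Mean number of neighbors per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj) if adj else 0.0

# Toy graph: a triangle {1, 2, 3} and a separate edge {4, 5}.
toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: {5}, 5: {4}}
comps = connected_components(toy)
sizes = sorted(len(c) for c in comps)
```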
Anomaly detection in static plain graphs
 Structure based methods:
 Oddball[6] :
 The aim of this technique is to find anomalous nodes
 It builds its solution on the analysis of ego networks
 Input = a graph , output = list of node outlier candidates
 Ego network :
 one-step neighborhood around a central node “ego”
 includes the central node, its direct neighbors and all the edges
among these nodes
 In other words, the ego network is the subgraph of one-step
neighborhood of the central node
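The ego network definition above translates almost directly into code. This is a sketch under the assumption that the graph is undirected and stored as a dict of neighbor sets; `ego_network` is a hypothetical helper name.

```python
def ego_network(adj, ego):
    """Return (nodes, edges) of the one-step neighborhood subgraph around `ego`."""
    nodes = {ego} | adj[ego]
    # Keep every edge whose two endpoints both lie inside the neighborhood.
    edges = {frozenset((u, v)) for u in nodes for v in adj[u] if v in nodes}
    return nodes, edges

# Toy graph: triangle a-b-c with a tail c-d-e.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
nodes_c, edges_c = ego_network(toy, "c")
nodes_e, edges_e = ego_network(toy, "e")
```

Note that the edge a-b belongs to the ego network of c even though it does not touch c, while the edge d-e does not, because e is two steps away.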
Anomaly detection in static plain graphs
  Ego network (Fig. 2, Fig. 3)
Anomaly detection in static plain graphs
 Oddball :
1. Ego network extraction: get all ego networks from the input graph.
2. Feature selection: choose features of ego networks that could indicate
anomalies; compute these features for all ego networks.
3. Analysis: pinpoint anomalies using any outlier detection method for point
clouds
  Two features that are successful in detecting outliers are the number of
nodes and the number of edges in the ego network
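The three steps can be sketched as follows. Steps 1 and 2 compute the node and edge counts of every ego network. For step 3 this example uses the absolute residual from a least-squares line in log-log space as a simple stand-in detector; that choice, the function names, and the adjacency-dict input are assumptions of this sketch and not the scoring function from the OddBall paper.

```python
import math

def egonet_counts(adj):
    """Steps 1-2: (node count, edge count) of every ego network."""
    counts = {}
    for ego, nbrs in adj.items():
        nodes = {ego} | nbrs
        # Each undirected edge inside the egonet is seen from both endpoints.
        twice_edges = sum(1 for u in nodes for v in adj[u] if v in nodes)
        counts[ego] = (len(nodes), twice_edges // 2)
    return counts

def residual_scores(counts):
    """Step 3 (stand-in detector): |residual| from a log-log least-squares line."""
    xs = [math.log(n) for n, _ in counts.values()]
    ys = [math.log(max(e, 1)) for _, e in counts.values()]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x if var_x else 0.0
    intercept = mean_y - slope * mean_x
    return {
        ego: abs(math.log(max(e, 1)) - (intercept + slope * math.log(n)))
        for ego, (n, e) in counts.items()
    }

# Toy graph: star centered on s with leaves a, b, c, d, plus the edge a-b.
toy = {
    "s": {"a", "b", "c", "d"},
    "a": {"s", "b"},
    "b": {"s", "a"},
    "c": {"s"},
    "d": {"s"},
}
counts = egonet_counts(toy)
scores = residual_scores(counts)
ranking = sorted(scores, key=scores.get, reverse=True)  # most unusual first
```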
Anomaly detection in static plain graphs
 Oddball:
  Plotting the number of nodes against the number of edges reveals
near-cliques and stars
  The green line represents the maximum number of edges in an 𝑛-node ego
network (𝑛∗(𝑛−1)/2)
  The blue line represents the minimum number of edges (𝑛−1)
  The closer an ego network lies to either line, the more remarkable it is likely
to be.
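To make the two bounding lines concrete, the sketch below computes both bounds for an 𝑛-node ego network and places an observed edge count between them. The normalized position (0 for a star, 1 for a clique) is an illustrative quantity introduced here, not part of the original method, and the function names are hypothetical.

```python
def edge_bounds(n):
    """(minimum, maximum) number of edges in an n-node ego network."""
    return n - 1, n * (n - 1) // 2

def position_between_bounds(n, e):
    """0.0 means star-like (on the lower line), 1.0 means clique-like (on the upper line)."""
    low, high = edge_bounds(n)
    if high == low:  # n <= 2: the two bounds coincide
        return 0.0
    return (e - low) / (high - low)
```

For example, a 5-node ego network has between 4 and 10 edges, so an observed count of 4 sits on the star line and a count of 10 on the clique line.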
Anomaly detection in static plain graphs
 Oddball:
  Fig. 4 Revealing cliques in graph 𝐴; Fig. 5 Revealing stars in graph 𝐵
Anomaly detection in static plain graphs
 Structure based methods:
 Proximity-based approaches:
 Main idea : This group of techniques exploits the graph structure to measure closeness
(or proximity) of objects in the graph
 These methods capture the simple autocorrelation between these objects, where close-by
objects are considered to be likely to belong to the same class (e.g. malicious/benign or
infected/healthy)
 Measuring the importance of the nodes in a graph
 PageRank[7]
 Personalized PageRank (PPR)[8]
 SimRank[9]
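As an illustration of a proximity measure, here is a minimal power-iteration sketch of Personalized PageRank, viewed as a random walk with restart to a single source node. The restart probability of 0.15, the fixed iteration count, the handling of dangling nodes, and the function name are all assumptions of this example.

```python
def personalized_pagerank(adj, source, restart=0.15, iterations=100):
    """Approximate PPR scores of all nodes with respect to `source`."""
    score = {v: 1.0 if v == source else 0.0 for v in adj}
    for _ in range(iterations):
        nxt = {v: 0.0 for v in adj}
        for u in adj:
            if not adj[u]:
                # Dangling node: send its walking mass back to the source
                # (an assumed convention for this sketch).
                nxt[source] += (1 - restart) * score[u]
                continue
            share = (1 - restart) * score[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        # Scores sum to 1, so the restart step returns `restart` mass in total.
        nxt[source] += restart
        score = nxt
    return score

# Toy graph: two triangles {1, 2, 3} and {4, 5, 6} joined by the edge 3-4.
toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
ppr = personalized_pagerank(toy, source=1)
```

With node 1 as the source, every node in the triangle containing node 1 ends up with a higher score than any node in the other triangle, which is the "close-by objects" intuition described above.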
Anomaly detection in static plain graphs
 Community based methods
  Main idea : The cluster- or community-based methods for graph anomaly detection rely on finding densely
connected groups of “close-by” nodes in the graph and on spotting nodes and/or edges that have connections
across communities.
  Two main problems[10]
  P1 : how to find the community of a given node, i.e. the ‘neighborhood of a node’
  Use random-walk-with-restart-based PPR scores[8] of all the nodes with respect to the given node
  Nodes with high PPR scores constitute the neighborhood of the node
  P2 : how to quantify the extent to which the given node is a bridge node
  The pairwise PPR scores among all the neighbors of the given node are aggregated by averaging to compute a
so-called “normality” score of the node
  Nodes with low normality have neighbors with low pairwise proximity to one another, i.e. the neighbors lie in
different, separate communities, so the given node resembles a bridging node across communities
 techniques:
1. SCAN
2. AUTOPART
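The normality score described under P2 can be sketched as the average pairwise PPR score among a node's neighbors. This example works on a plain undirected graph and uses direct graph neighbors, which is a simplification of the bipartite setting in [10]; the restart probability, iteration count, and function names are assumptions.

```python
from itertools import permutations

def ppr(adj, source, restart=0.15, iterations=100):
    """Personalized PageRank by power iteration (assumes no isolated nodes)."""
    score = {v: 1.0 if v == source else 0.0 for v in adj}
    for _ in range(iterations):
        nxt = {v: 0.0 for v in adj}
        for u in adj:
            share = (1 - restart) * score[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        nxt[source] += restart
        score = nxt
    return score

def normality(adj, node):
    """Average pairwise PPR among the neighbors of `node` (None if < 2 neighbors)."""
    nbrs = list(adj[node])
    if len(nbrs) < 2:
        return None
    scores = {u: ppr(adj, u) for u in nbrs}
    pairs = list(permutations(nbrs, 2))
    return sum(scores[u][v] for u, v in pairs) / len(pairs)

# Toy graph: two triangles joined through the bridging node x.
toy = {
    "a1": {"a2", "a3", "x"}, "a2": {"a1", "a3"}, "a3": {"a1", "a2"},
    "x": {"a1", "b1"},
    "b1": {"b2", "b3", "x"}, "b2": {"b1", "b3"}, "b3": {"b1", "b2"},
}
bridge_score = normality(toy, "x")
member_score = normality(toy, "a2")
```

The bridging node x receives a lower normality score than the community member a2, because its two neighbors sit in different triangles and are far apart in PPR terms.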
Anomaly detection in static plain graphs
 SCAN[11] – Structural Clustering Algorithm for Networks
 purpose : to identify node outliers
 two types of nodes that play special roles:
1. Outliers : nodes that are marginally connected to clusters
2. Hubs : nodes that bridge clusters
 clusters : groups of nodes that have a dense set of edges running within
the clusters, and have a relatively low number of edges that run between
the clusters
  Hubs play a significant role
  Outliers have no importance and may be discarded or isolated as noise
Anomaly detection in static plain graphs
 SCAN :
 Input : a graph and two parameters (ε,μ)
  ε sets how strict the condition is for a node to be considered
part of a cluster
  μ determines the minimum number of vertices a cluster must have
  Output : a list of clusters, hubs, and outliers
Anomaly detection in static plain graphs
 SCAN :
  A low ε sets a low bar for being a member of a cluster
  Increasing ε tightens the coherence inside a cluster, and the initial
all-encompassing cluster is broken up into smaller groups
  Fig. 6: ε=0.7, μ=2; Fig. 7: ε=0.8, μ=2; Fig. 8: ε=0.9, μ=2
Anomaly detection in static plain graphs
 SCAN :
  In Fig. 6, the original interpretation is retrieved: clusters {1, 2, 3, 4, 5, 6}
and {8, 9, 10, 11, 12, 13}, 7 as a hub and 14 as an outlier.
  Fig. 7 further decomposes the two clusters, thus identifying 10 also as a
hub, because it neighbors two clusters.
  In the extreme case in Fig. 8, the conditions to form a cluster are so strict
that none is identified, so all nodes are taken to be outliers.
  It is worth noting that ε=0.7 and μ=7 would also lead to the extreme case,
because there is no combination of seven nodes that are closely connected.
Anomaly detection in static plain graphs
 SCAN :
  How does it work?
 At the beginning, all nodes are labeled as unclassified
 SCAN performs one pass of the nodes, and classifies them either as a
cluster member or a non-member based on structure connectivity
 At the end, when all clusters are found, the non-members are classified
further as hubs or outliers, based on the cluster membership of their
neighbors
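The procedure above can be sketched compactly. This version uses the structural similarity of the SCAN paper, σ(u, v) = |Γ(u) ∩ Γ(v)| / sqrt(|Γ(u)| · |Γ(v)|), where Γ(v) is v together with its neighbors; a node is treated as a core when at least μ nodes of Γ(v) have similarity ≥ ε. The data layout, the function name, and the toy graph are assumptions of this sketch, which is meant as an illustration rather than a faithful reimplementation.

```python
import math
from collections import deque

def scan(adj, eps, mu):
    """Return (cluster label per member node, set of hubs, set of outliers)."""
    gamma = {v: adj[v] | {v} for v in adj}  # closed neighborhoods

    def sigma(u, v):
        return len(gamma[u] & gamma[v]) / math.sqrt(len(gamma[u]) * len(gamma[v]))

    eps_nbrs = {v: {w for w in gamma[v] if sigma(v, w) >= eps} for v in adj}
    cores = {v for v in adj if len(eps_nbrs[v]) >= mu}

    label, next_id = {}, 0
    for seed in adj:
        if seed not in cores or seed in label:
            continue
        label[seed] = next_id
        queue = deque([seed])
        while queue:
            v = queue.popleft()
            if v not in cores:
                continue  # border members join a cluster but do not expand it
            for w in eps_nbrs[v]:
                if w not in label:
                    label[w] = next_id
                    queue.append(w)
        next_id += 1

    hubs, outliers = set(), set()
    for v in adj:
        if v in label:
            continue
        # Non-members touching two or more clusters are hubs, the rest outliers.
        touched = {label[w] for w in adj[v] if w in label}
        (hubs if len(touched) >= 2 else outliers).add(v)
    return label, hubs, outliers

# Toy graph: 4-cliques {1,2,3,4} and {6,7,8,9}; node 5 links 4 and 6; node 10 hangs off 9.
toy = {
    1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3, 5},
    5: {4, 6},
    6: {5, 7, 8, 9}, 7: {6, 8, 9}, 8: {6, 7, 9}, 9: {6, 7, 8, 10},
    10: {9},
}
label, hubs, outliers = scan(toy, eps=0.7, mu=2)
```

On this toy graph with ε=0.7 and μ=2, the two cliques form two clusters, node 5 is reported as a hub, and node 10 as an outlier.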
Anomaly detection in static plain graphs
  AUTOPART[12]: Parameter-Free Graph Partitioning and Outlier Detection
 capable of identifying anomalous edges
 primary purpose is to (automatically) partition the graph into clusters
without user intervention ~ it is parameter free
 After finding a partitioning – a set of clusters – it proposes a method to
measure the outlierness of edges that bridge separate clusters
 This technique specifically uses the adjacency matrix as graph
representation
 A partitioning is a reordering of rows and columns in a way that nodes
belonging to the same cluster are placed next to each other
  The adjacency matrix is thereby broken down into blocks
Anomaly detection in static plain graphs
 AUTOPART :
  The squares located on the diagonal of the matrix capture the edges running inside
the clusters
  The off-diagonal rectangles represent the edges bridging the corresponding clusters.
  Fig. 9: nodes; Fig. 10: groups
Anomaly detection in static plain graphs
 AUTOPART:
  A good partitioning yields homogeneous blocks, which, in turn, can be compressed
efficiently
  The total cost consists of a description cost and a code cost
  Description cost : holds the information about the rectangular/square blocks. It is
the transmission cost of the following terms :
• Number of nodes
• Node permutation (which row represents which node)
• number of clusters
• number of nodes in each cluster
• number of ones in each block (the number of edges bridging the given clusters)
 Code cost : holds the information about the content of the blocks. It is the
transmission cost of the blocks calculated using the Shannon entropy function
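The code cost can be illustrated with a short sketch: each block of the partitioned adjacency matrix is charged (number of cells) × (binary Shannon entropy of its density of ones). The matrix-as-list-of-lists layout, the `groups` list, and the function names are assumptions; the description cost is left out of this example.

```python
import math
from collections import Counter

def binary_entropy(p):
    """Shannon entropy (in bits) of a Bernoulli variable with parameter p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def code_cost(matrix, groups):
    """Bits needed to encode all blocks; groups[i] is the cluster of node i."""
    size = Counter(groups)
    ones = Counter()
    for i, row in enumerate(matrix):
        for j, cell in enumerate(row):
            if cell:
                ones[(groups[i], groups[j])] += 1
    cost = 0.0
    for a in size:
        for b in size:
            cells = size[a] * size[b]
            cost += cells * binary_entropy(ones[(a, b)] / cells)
    return cost

# Toy matrix with two perfectly homogeneous diagonal blocks (hypothetical data).
matrix = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
good = code_cost(matrix, [0, 0, 1, 1])  # blocks are all-ones or all-zeros
bad = code_cost(matrix, [0, 0, 0, 0])   # one mixed block, half ones
```

The two-cluster partitioning costs zero bits because every block is homogeneous, whereas lumping all nodes into one cluster yields a single half-full block that costs one bit per cell.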
Anomaly detection in static plain graphs
 AUTOPART:
 Description cost penalizes a high number of blocks
 code cost penalizes heterogeneous blocks
 a good partitioning maintains a balance between a low number of clusters
and a high homogeneity of blocks
 The algorithm finds the tradeoff point between the two aspects and yields
a construction with the minimal total cost
Anomaly detection in static plain graphs
  AUTOPART: how does it work? (Fig. 11)
  Start with the initial matrix, k=1
  STEP 1. Find good clusters for fixed k
  STEP 2. Increase k, k=k+1
  Repeat STEP 1 and STEP 2 while the encoding cost keeps decreasing; output the
final partitioning, k*
Anomaly detection in static plain graphs
 AUTOPART:
  It starts with an initial adjacency matrix, where all nodes belong to one cluster
(k = 1)
  Inside the main loop, the total cost is iteratively reduced until no improvements can
be made, and the final partitioning together with the final cluster count 𝑘∗ is
returned
 The iterative reduction is made up of two steps :
1. first, a good partitioning given the number of clusters is found
2. Second, the number of clusters is increased to allow for better partitioning
 Once the final partitioning is found, AUTOPART marks the anomalous edges
  Outliers show deviation from the normal patterns, so they hurt attempts to
compress data
  Therefore, those edges whose removal reduces the total cost the most are marked
as outliers
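The edge-marking idea can be sketched by removing each edge in turn and measuring how much the encoding cost drops. For brevity this example measures only the entropy-based code cost as a proxy for the total cost and takes the partitioning as given; both simplifications, the data layout, and the function names are assumptions.

```python
import math
from collections import Counter

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def code_cost(edges, groups):
    """Code cost of a symmetric adjacency matrix under the partitioning `groups`."""
    size = Counter(groups.values())
    ones = Counter()
    for edge in edges:
        u, v = tuple(edge)
        # An undirected edge contributes two symmetric matrix entries.
        ones[(groups[u], groups[v])] += 1
        ones[(groups[v], groups[u])] += 1
    return sum(
        size[a] * size[b] * binary_entropy(ones[(a, b)] / (size[a] * size[b]))
        for a in size for b in size
    )

def rank_edges_by_saving(edges, groups):
    """Edges sorted by how much their removal lowers the code cost (largest first)."""
    base = code_cost(edges, groups)
    savings = [(e, base - code_cost(edges - {e}, groups)) for e in edges]
    return sorted(savings, key=lambda item: item[1], reverse=True)

# Toy graph: two triangles {0,1,2} and {3,4,5} plus one bridging edge 2-3.
pairs = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
edges = {frozenset(p) for p in pairs}
groups = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
ranked = rank_edges_by_saving(edges, groups)
```

Here the bridging edge 2-3 comes out on top: removing it makes both off-diagonal blocks all-zero, while removing an edge inside a triangle makes a diagonal block less homogeneous.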
References
1. D. M. Hawkins, Identification of outliers, Springer, 1980.
2. Rissanen J (1999) Hypothesis selection and testing by the MDL principle. Comput J 42:260–269
3. Henderson K, Eliassi-Rad T, Faloutsos C, Akoglu L, Li L, Maruhashi K, Prakash BA, Tong H (2010)
MetricForensics: a multi-level approach for mining volatile graphs. In: Proceedings of the 16th ACM
international conference on knowledge discovery and data mining (SIGKDD), Washington, DC, pp
163–172
4. Becchetti L, Castillo C, Donato D, Leonardi S, Baeza-Yates R (2006) Link-based characterization and
detection of Web Spam. In: Second international workshop on adversarial information retrieval on
the web (AIRWeb)
5. Ding Q, Katenka N, Barford P, Kolaczyk ED, Crovella M (2012) Intrusion as (anti)social
communication: characterization and detection. In: Proceedings of the 18th ACM international
conference on knowledge discovery and data mining (SIGKDD), Beijing, China. ACM, pp 886–894
6. Akoglu L, McGlohon M, Faloutsos C (2010) OddBall: spotting anomalies in weighted graphs. In:
Proceedings of the 14th Pacific-Asia conference on knowledge discovery and data mining
(PAKDD), Hyderabad, India, pp 410–421
7. Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw
30(1–7):107–117
8. Haveliwala TH (2003) Topic-sensitive PageRank: a context-sensitive ranking algorithm for web
search. IEEE Trans Knowl Data Eng 15(4):784–796
9. Jeh G, Widom J (2002) SimRank: a measure of structural-context similarity. In: Proceedings of the
8th ACM international conference on knowledge discovery and data mining (SIGKDD), Edmonton,
Alberta, pp 538–543
10. Sun J, Qu H, Chakrabarti D, Faloutsos C (2005) Neighborhood formation and anomaly detection in
bipartite graphs. In: Proceedings of the 5th IEEE international conference on data mining (ICDM),
Houston, TX. IEEE Computer Society, pp 418–425
11. X. Xu, N. Yuruk, Z. Feng and T. A. Schweiger, "SCAN: A Structural Clustering Algorithm for
Networks," in Proceedings of the 13th ACM SIGKDD international conference on Knowledge
discovery and data mining, 2007.
12. D. Chakrabarti, "Autopart: Parameter-free graph partitioning and outlier detection," in Knowledge
Discovery in Databases: PKDD 2004, Springer, 2004, pp. 112–124.
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 

Recently uploaded (20)

Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
 
Data Warehouse , Data Cube Computation
Data Warehouse   , Data Cube ComputationData Warehouse   , Data Cube Computation
Data Warehouse , Data Cube Computation
 
Data Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationData Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health Classification
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 

Anomaly detection in plain static graphs

  • 1. Anomaly Detection in Plain Static Graphs
      Prepared by: Javad Forough (javad.forough@aut.ac.ir)
      Professor: Dr. Mousavi
      Amirkabir University of Technology
  • 2. Outline
      - Introduction
          - Outliers & graph anomalies
          - Challenges
      - Anomaly detection in static graphs
          - Anomalies in static plain graphs
              - Structure based methods
              - Community based methods
          - Anomalies in static attributed graphs
              - Structure based methods
              - Community based methods
              - Relational learning based methods
  • 3. Introduction
      - Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement.
      - With graph data becoming ubiquitous, techniques for structured graph data have recently come into focus.
      - The branch of data mining concerned with discovering rare occurrences in datasets is called anomaly detection.
      - Application examples:
          - Detecting network intrusion or network failure
          - Detecting credit card fraud
  • 4. Introduction
      - Application examples (continued):
          - Detecting email and Web spam
          - Detecting auction fraud
          - Detecting securities fraud
          - Detecting malware/spyware
          - Data cleaning
          - and many others
  • 5. Introduction
      - Outliers & graph anomalies:
          - Many techniques have been developed in the past decades, especially for spotting outliers and anomalies in unstructured collections of multi-dimensional data points.
          - Data objects cannot always be treated as independent points lying in a multi-dimensional space.
          - They may exhibit inter-dependencies, which should be accounted for during the anomaly detection process (Fig. 1).
          - Data instances in a wide range of disciplines, such as physics, biology, social sciences, and information systems, are inherently related to one another.
  • 6. Introduction
      - Outliers & graph anomalies:
          - Types of outliers in graphs:
              1. Node outliers: vertices with unusual characteristics
              2. Linkage outliers: edges with unusual characteristics
              3. Subgraph outliers: parts of the graph which exhibit unusual characteristics
  • 7. Introduction
      [Figure] Fig. 1: Point-based outlier detection versus graph-based anomaly detection. (a) Clouds of points (multi-dimensional); (b) inter-linked objects (network).
  • 8. Introduction
      - Researchers have recently intensified their study of methods for anomaly detection in structured graph data.
      - Why graphs? Four main reasons make graph-based approaches to anomaly detection vital and necessary:
          1. Inter-dependent nature of the data
          2. Powerful representation
          3. Relational nature of problem domains
          4. Robust machinery
  • 9. Introduction
      - Challenges:
          - No unique definition of the anomaly detection problem exists.
          - The general definition of an anomaly or an outlier is vague: it becomes meaningful only under a given context or application.
          - The very first definition of an outlier dates back to 1980 and is given by Hawkins [1]:
          - Definition 1 (Hawkins' Definition of Outlier, 1980): "An outlier is an observation that differs so much from other observations as to arouse suspicion that it was generated by a different mechanism."
          - This definition is quite general and thus makes the detection problem an open-ended one.
          - The problem of anomaly detection has been defined in various ways in different contexts.
  • 10. Introduction
      - Challenges:
          - The problem has many definitions, often tailored to the specific application domain, and also goes by various names such as outlier, anomaly, outbreak, event, change, fraud, detection, etc.
          - In some applications, such as data cleaning, outliers are even called noise.
          - A general definition of the graph anomaly detection problem is as follows.
          - Definition 2 (General Graph Anomaly Detection Problem): Given a (plain/attributed, static/dynamic) graph database, find the graph objects (nodes/edges/substructures) that are rare and that differ significantly from the majority of the reference objects in the graph.
  • 11. Introduction
      - Challenges:
          - For practical purposes, a record/point/graph-object is flagged as anomalous if its rarity/likelihood/outlierness score exceeds a user-defined or an estimated threshold.
          - In other words, an anomaly is treated as a data object or a group of objects that is rare (e.g., a rare combination of categorical attribute values), isolated (e.g., far-away points in n-dimensional spaces), and/or surprising (e.g., data instances that do not fit well in our mental/statistical model, or need too many bits to describe under the Minimum Description Length principle (Rissanen 1999) [2]).
          - There are two kinds of challenges associated with anomaly detection:
              1. Data-specific challenges
              2. Problem-specific challenges
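
The thresholding rule on slide 11 can be illustrated with a minimal Python sketch. The function name `flag_anomalies`, the example scores, and the "mean + k standard deviations" rule for estimating a threshold are assumptions made for illustration; the slides do not prescribe how the threshold is estimated.

```python
from statistics import mean, pstdev


def flag_anomalies(scores, threshold=None, k=2.0):
    """Return the indices of objects whose outlierness score exceeds the threshold.

    If no user-defined threshold is given, one is estimated as mean + k * std.
    That estimation rule is an assumption of this sketch, not part of the slides.
    """
    if threshold is None:
        threshold = mean(scores) + k * pstdev(scores)
    return [i for i, s in enumerate(scores) if s > threshold]


# Hypothetical outlierness scores for six graph objects.
scores = [0.10, 0.20, 0.15, 0.12, 0.18, 0.95]

user_flagged = flag_anomalies(scores, threshold=0.5)  # user-defined threshold
auto_flagged = flag_anomalies(scores)                 # estimated threshold
print(user_flagged, auto_flagged)
```

Both calls flag only the last object here; in practice the choice of threshold trades off missed anomalies against false alarms.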
  • 12. Introduction
      - Challenges:
          - Data-specific challenges: simply put, these are the challenges of working with big data, namely the volume, velocity, and variety of massive, streaming, and complex datasets. The same challenges carry over to graph data.
          - Scale and dynamics:
              - Facebook: ~2 billion users
              - The Web: ~1 trillion pages
              - Cell phones: over 6 billion users
          - Complexity:
              - The datasets are rich and complex in content.
  • 13. Introduction
      - Challenges:
          - Problem-specific challenges: additional challenges arise with respect to the anomaly detection task itself.
              - Missing and noisy labels
              - Class imbalance and asymmetric error
              - Novel anomalies
          - Graph-specific challenges: all of the above, plus
              - Inter-dependent objects
              - Variety of definitions
              - Size of search space
  • 14. Anomaly detection in static plain graphs
      - Outliers in clouds of data points:
          - Multi-dimensional outlier detection
          - Techniques:
              - density-based
              - distance-based
              - depth-based
              - distribution-based
              - clustering-based
              - classification-based
              - information theory-based
              - spectrum-based
              - subspace-based
  • 15. Anomaly detection in static plain graphs
  • 16. Anomaly detection in static plain graphs
      - Anomaly detection in static graphs:
          1. Plain graphs
              - Only nodes and the edges among those nodes, i.e. the graph structure.
          2. Attributed graphs
              - Example (social network): users have various interests, work or live at different locations, have various education levels, etc.
              - Relational links have various strengths, types, frequencies, etc.
  • 17. Anomaly detection in static plain graphs
      - A general definition of the anomaly detection problem for static graphs can be stated as follows:
      - Definition 3 (Static-graph anomaly detection problem): Given the snapshot of a (plain or attributed) graph database, find the nodes and/or edges and/or substructures that are "few and different" or deviate significantly from the patterns observed in the graph.
  • 18. Anomaly detection in static plain graphs
      - Anomalies in static plain graphs
          - The only available information is the graph structure.
          1. Structure based methods
              - Feature-based approaches
              - Proximity-based approaches
          2. Community based methods
  • 19. Anomaly detection in static plain graphs
      - Structure based methods:
          - Feature-based approaches:
              - Main idea: this group of approaches uses the graph representation to extract structural, graph-centric features.
              - The given graph structure is used to compute various measures associated with the nodes, dyads, triads, egonets, and communities, as well as the global graph structure [3].
              - These features have been used in several anomaly detection applications, including Web spam [4] and network intrusion [5].
  • 20. Anomaly detection in static plain graphs
      - Structure based methods:
          - Node-level features:
              1. (in/out) degrees
              2. centrality measures
                  1. eigenvector
                  2. closeness
                  3. betweenness
              3. local clustering coefficient
              4. degree assortativity
              5. roles
  • 21. Anomaly detection in static plain graphs
      - Structure based methods:
          - Dyadic features:
              1. reciprocity
              2. edge betweenness
              3. number of common neighbors
          - Egonet features:
              1. number of triangles
              2. total weight
              3. principal eigenvalue
  • 22. Anomaly detection in static plain graphs
      - Structure based methods:
          - Node-group-level features:
              1. density
              2. modularity
              3. conductance
          - Global measures:
              1. number of connected components
              2. distribution of component sizes
              3. principal eigenvalue
              4. minimum spanning tree weight
              5. average node degree
              6. global clustering coefficient
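
A few of the features listed on slides 20 to 22 can be computed directly from an adjacency structure. The following Python sketch covers one feature per level (degree and local clustering coefficient for nodes, common neighbors for dyads, component count and average degree for the whole graph). The helper names and the toy graph are assumptions for illustration, and the graph is taken to be undirected and unweighted.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    k = len(adj[v])
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def common_neighbors(adj, u, v):
    """Dyadic feature: number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def count_components(adj):
    """Global feature: number of connected components (iterative DFS)."""
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            x = stack.pop()
            for y in adj[x] - seen:
                seen.add(y)
                stack.append(y)
    return count


def average_degree(adj):
    """Global feature: mean number of neighbors per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


# Toy graph: a triangle a-b-c with a pendant node d, plus a separate edge e-f.
adj = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("e", "f")])
node_features = {v: (len(adj[v]), local_clustering(adj, v)) for v in adj}
print(node_features, count_components(adj), average_degree(adj))
```

Each node's feature vector can then be handed to any of the point-cloud outlier detection techniques listed on slide 14.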
  • 23. Anomaly detection in static plain graphs
      - Structure based methods:
          - OddBall [6]:
              - The aim of this technique is to find anomalous nodes.
              - It builds its solution on the analysis of ego networks.
              - Input: a graph. Output: a list of node outlier candidates.
          - Ego network:
              - The one-step neighborhood around a central node, the "ego".
              - It includes the central node, its direct neighbors, and all the edges among these nodes.
              - In other words, the ego network is the subgraph induced by the one-step neighborhood of the central node.
  • 24. Anomaly detection in static plain graphs
      [Figure] Figs. 2 and 3: Ego network.
  • 25. Anomaly detection in static plain graphs
      - OddBall:
          1. Ego network extraction: get all ego networks from the input graph.
          2. Feature selection: choose features of ego networks that could indicate anomalies; compute these features for all ego networks.
          3. Analysis: pinpoint anomalies using any outlier detection method for point clouds.
      - Two features that are successful in detecting outliers are the number of nodes and the number of edges in the ego network.
  • 26. Anomaly detection in static plain graphs
      - OddBall:
          - Plotting the number of nodes against the number of edges reveals near-cliques and stars.
          - The green line represents the maximum number of edges in an n-node ego network, n(n−1)/2.
          - The blue line represents the minimum number of edges, n−1.
          - The closer an ego network lies to either line, the more remarkable it is likely to be.
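
The first two OddBall steps, and the two bounding lines from slide 26, can be sketched in Python as follows. This is a simplified illustration: it only places each ego network between the star bound n−1 and the clique bound n(n−1)/2 and does not reproduce the scoring used in the OddBall paper [6]. The function names and the toy graph are assumptions.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def egonet_counts(adj, ego):
    """Return (number of nodes, number of edges) of the ego network of `ego`."""
    nodes = {ego} | adj[ego]
    # Every internal edge is seen from both endpoints, hence the division by 2.
    edges = sum(len(adj[u] & nodes) for u in nodes) // 2
    return len(nodes), edges


def star_clique_position(adj, ego):
    """Place an ego network between the two bounds from slide 26.

    Returns 0.0 for a perfect star (n - 1 edges), 1.0 for a clique
    (n * (n - 1) / 2 edges), or None when n < 3 and the bounds coincide.
    """
    n, e = egonet_counts(adj, ego)
    if n < 3:
        return None
    low, high = n - 1, n * (n - 1) // 2
    return (e - low) / (high - low)


# Toy graph: a star centred on node 0, a 4-clique on nodes 5..8, joined by edge 4-5.
edges = [(0, 1), (0, 2), (0, 3), (0, 4),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8),
         (4, 5)]
adj = build_adjacency(edges)
positions = {v: star_clique_position(adj, v) for v in sorted(adj)}
print(positions)
```

Nodes whose position is close to 0 or 1 correspond to ego networks lying near the blue or green line, i.e. the star-like and clique-like candidates.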
  • 27. Anomaly detection in static plain graphs
      - OddBall:
      [Figure] Fig. 4: Revealing cliques in graph A (clique in graph A). Fig. 5: Revealing stars in graph B (star in graph B).
  • 28. Anomaly detection in static plain graphs
      - Structure based methods:
          - Proximity-based approaches:
              - Main idea: this group of techniques exploits the graph structure to measure the closeness (or proximity) of objects in the graph.
              - These methods capture the simple autocorrelation between objects: close-by objects are considered likely to belong to the same class (e.g. malicious/benign or infected/healthy).
              - Measures of node importance and proximity in a graph:
                  - PageRank [7]
                  - Personalized PageRank (PPR) [8]
                  - SimRank [9]
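
As an illustration of the proximity measures on slide 28, the sketch below computes PageRank and Personalized PageRank by power iteration on an undirected graph. It is a minimal version under stated assumptions: the damping factor 0.85, the fixed iteration count, the handling of dangling nodes, and all names are choices made for this example.

```python
def personalized_pagerank(adj, restart, damping=0.85, iters=100):
    """Power iteration for (personalized) PageRank on an adjacency map.

    `restart` maps nodes to non-negative restart weights. A uniform restart
    gives ordinary PageRank; a single-node restart gives PPR for that node.
    """
    nodes = list(adj)
    total = sum(restart.get(v, 0.0) for v in nodes)
    r = {v: restart.get(v, 0.0) / total for v in nodes}
    p = dict(r)
    for _ in range(iters):
        nxt = {v: (1.0 - damping) * r[v] for v in nodes}
        for u in nodes:
            if adj[u]:
                share = damping * p[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
            else:
                # Dangling node: send its mass back to the restart distribution.
                for v in nodes:
                    nxt[v] += damping * p[u] * r[v]
        p = nxt
    return p


# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

pagerank = personalized_pagerank(adj, {v: 1.0 for v in adj})  # global importance
ppr_from_0 = personalized_pagerank(adj, {0: 1.0})             # proximity to node 0
print(pagerank)
print(ppr_from_0)
```

In the PPR result, nodes in the same triangle as node 0 score higher than nodes in the other triangle, which is the "close-by objects" signal that proximity-based methods rely on.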
  • 29. Anomaly detection in static plain graphs
      - Community based methods
          - Main idea: cluster- or community-based methods for graph anomaly detection find densely connected groups of "close-by" nodes in the graph and spot nodes and/or edges that have connections across communities.
          - Two main problems [10]:
              - P1: how to find the community of a given node, i.e. the "neighborhood of a node"
                  - Use random-walk-with-restart-based PPR scores [8] of all the nodes with respect to the given node.
                  - Nodes with high PPR scores constitute the neighborhood of the node.
              - P2: how to quantify the extent to which the given node is a bridge node
                  - The pairwise PPR scores among all the neighbors of the given node are averaged to compute a so-called "normality" score of the node.
                  - Nodes with low normality have neighbors with low pairwise proximity to one another; these neighbors lie in different, separate communities, so the given node resembles a bridging node across communities.
          - Techniques:
              1. SCAN
              2. AUTOPART
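
The normality score described under P2 can be sketched as follows. This is an illustrative simplification on a plain undirected graph (the cited work [10] targets bipartite graphs). The restart probability 0.15, the iteration count, the function names, and the toy graph are assumptions, and the sketch assumes the graph has no isolated nodes.

```python
def rwr_scores(adj, source, restart_prob=0.15, iters=100):
    """Random walk with restart from `source`; returns PPR scores for all nodes.

    Assumes every node has at least one neighbor.
    """
    p = {v: 1.0 if v == source else 0.0 for v in adj}
    for _ in range(iters):
        nxt = {v: (restart_prob if v == source else 0.0) for v in adj}
        for u in adj:
            share = (1.0 - restart_prob) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        p = nxt
    return p


def normality(adj, node):
    """Average pairwise PPR score among the neighbors of `node`.

    Returns None when the node has fewer than two neighbors.
    """
    nbrs = sorted(adj[node])
    if len(nbrs) < 2:
        return None
    ppr = {u: rwr_scores(adj, u) for u in nbrs}
    pairs = [(u, v) for u in nbrs for v in nbrs if u != v]
    return sum(ppr[u][v] for u, v in pairs) / len(pairs)


# Toy graph: triangles {0, 1, 2} and {4, 5, 6}; node 3 bridges them via 2-3 and 3-4.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4},
       4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}

normality_scores = {v: normality(adj, v) for v in adj}
print(normality_scores)
```

Node 3, whose two neighbors sit in different triangles, receives a lower normality score than node 0, whose neighbors are directly connected to each other.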
  • 30. Anomaly detection in static plain graphs
      - SCAN [11]: Structural Clustering Algorithm for Networks
          - Purpose: to identify node outliers.
          - Two types of nodes play special roles:
              1. Outliers: nodes that are marginally connected to clusters
              2. Hubs: nodes that bridge clusters
          - Clusters: groups of nodes that have a dense set of edges running within them and a relatively low number of edges running between them.
          - Hubs play a significant role.
          - Outliers have no importance and may be discarded or isolated as noise.
  • 31. Anomaly detection in static plain graphs
      - SCAN:
          - Input: a graph and two parameters (ε, μ)
              - ε captures how strict the condition is for a node to be considered part of a cluster.
              - μ determines the minimum number of vertices a cluster must have.
          - Output: a list of clusters, hubs, and outliers.
  • 32. Anomaly detection in static plain graphs
      - SCAN:
          - A low ε sets a low bar for being a member of a cluster.
          - Increasing ε tightens the coherence inside a cluster, and the initial all-encompassing cluster is broken up into smaller groups.
      [Figure] Fig. 6: ε = 0.7, μ = 2. Fig. 7: ε = 0.8, μ = 2. Fig. 8: ε = 0.9, μ = 2.
  • 33. Anomaly detection in static plain graphs
      - SCAN:
          - In Fig. 6, the original interpretation is retrieved: clusters {1, 2, 3, 4, 5, 6} and {8, 9, 10, 11, 12, 13}, with 7 as a hub and 14 as an outlier.
          - Fig. 7 further decomposes the two clusters, thus also identifying 10 as a hub, because it neighbors two clusters.
          - In the extreme case in Fig. 8, the conditions to form a cluster are so strict that none is identified, so all nodes are taken to be outliers.
          - It is worth noting that ε = 0.7 and μ = 7 would also lead to the extreme case, because there is no combination of seven nodes that are closely connected.
  • 34. Anomaly detection in static plain graphs
      - SCAN:
          - How does it work?
              - At the beginning, all nodes are labeled as unclassified.
              - SCAN performs one pass over the nodes and classifies each as either a cluster member or a non-member, based on structural connectivity.
              - At the end, when all clusters are found, the non-members are further classified as hubs or outliers, based on the cluster membership of their neighbors.
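
The procedure on slides 30 to 34 can be sketched compactly in Python. The structural similarity used here, σ(u, v) = |Γ(u) ∩ Γ(v)| / sqrt(|Γ(u)| · |Γ(v)|) with Γ(v) being v together with its neighbors, follows the SCAN paper [11] and is not stated on the slides. The code is a simplified illustration rather than a faithful reimplementation, and the toy graph is an assumption (it is not the 14-node example of Figs. 6 to 8).

```python
from collections import deque
from math import sqrt


def scan(adj, eps, mu):
    """Simplified SCAN: returns (clusters, hubs, outliers) for an undirected graph."""
    gamma = {v: adj[v] | {v} for v in adj}  # closed neighborhood of each node

    def sigma(u, v):
        return len(gamma[u] & gamma[v]) / sqrt(len(gamma[u]) * len(gamma[v]))

    def eps_neighborhood(v):
        return {w for w in gamma[v] if sigma(v, w) >= eps}

    label = {}  # node -> cluster id; absent means non-member so far
    next_id = 0
    for v in adj:
        if v in label or len(eps_neighborhood(v)) < mu:
            continue  # already clustered, or not a core node
        label[v] = next_id
        queue = deque([v])
        while queue:
            u = queue.popleft()
            reach = eps_neighborhood(u)
            if len(reach) < mu:
                continue  # border node: belongs to the cluster but does not expand it
            for w in reach:
                if w not in label:
                    label[w] = next_id
                    queue.append(w)
        next_id += 1

    clusters = {}
    for v, cid in label.items():
        clusters.setdefault(cid, set()).add(v)

    hubs, outliers = set(), set()
    for v in adj:
        if v in label:
            continue
        touched = {label[w] for w in adj[v] if w in label}
        (hubs if len(touched) >= 2 else outliers).add(v)
    return list(clusters.values()), hubs, outliers


# Toy graph: 4-cliques {0,1,2,3} and {5,6,7,8}; node 4 links 3 and 5; node 9 hangs off 8.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
       4: {3, 5}, 5: {4, 6, 7, 8}, 6: {5, 7, 8}, 7: {5, 6, 8},
       8: {5, 6, 7, 9}, 9: {8}}

clusters, hubs, outliers = scan(adj, eps=0.7, mu=2)
print(clusters, hubs, outliers)
```

With ε = 0.7 and μ = 2 the two cliques are recovered, node 4 is reported as a hub, and node 9 as an outlier. Raising μ to 5 leaves no core nodes, so every node becomes an outlier, mirroring the extreme case described on slide 33.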
  • 35. Anomaly detection in static plain graphs
      - AUTOPART [12]: Parameter-Free Graph Partitioning and Outlier Detection
          - It is capable of identifying anomalous edges.
          - Its primary purpose is to partition the graph into clusters automatically, without user intervention; that is, it is parameter-free.
          - After finding a partitioning (a set of clusters), it proposes a method to measure the outlierness of edges that bridge separate clusters.
          - The technique uses the adjacency matrix as the graph representation.
          - A partitioning is a reordering of rows and columns such that nodes belonging to the same cluster are placed next to each other.
          - The adjacency matrix is thereby broken down into blocks.
  • 36. Anomaly detection in static plain graphs
      - AUTOPART:
          - The squares located on the diagonal of the matrix capture the edges running inside the clusters.
          - The rectangles represent the edges bridging the corresponding clusters.
      [Figure] Fig. 9: nodes. Fig. 10: groups.
  • 37. Anomaly detection in static plain graphs
      - AUTOPART:
          - A good partitioning yields homogeneous blocks, which in turn can be compressed efficiently.
          - The total cost consists of a description cost and a code cost.
          - Description cost: holds the information about the rectangular/square blocks. It is the transmission cost of the following terms:
              - number of nodes
              - node permutation (which row represents which node)
              - number of clusters
              - number of nodes in each cluster
              - number of ones in each block (the number of edges bridging the given clusters)
          - Code cost: holds the information about the content of the blocks. It is the transmission cost of the blocks, calculated using the Shannon entropy function.
  • 38. Anomaly detection in static plain graphs
      - AUTOPART:
          - The description cost penalizes a high number of blocks.
          - The code cost penalizes heterogeneous blocks.
          - A good partitioning maintains a balance between a low number of clusters and a high homogeneity of blocks.
          - The algorithm finds the tradeoff point between the two aspects and yields a construction with the minimal total cost.
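
The code cost from slide 37 can be illustrated with a short Python sketch that charges each block its size times the binary Shannon entropy of its density. The description cost is deliberately left out, so this is not the full AUTOPART objective. The function names and the toy matrix are assumptions.

```python
from math import log2


def binary_entropy(p):
    """Shannon entropy (in bits) of a 0/1 source with P(1) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def code_cost(matrix, groups):
    """Bits needed to encode all blocks of `matrix` under the grouping `groups`.

    `groups[i]` is the cluster index (0..k-1) of node i. Each block costs
    (number of cells) * H(fraction of ones in the block).
    """
    k = max(groups) + 1
    ones = [[0] * k for _ in range(k)]
    cells = [[0] * k for _ in range(k)]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            cells[groups[i]][groups[j]] += 1
            ones[groups[i]][groups[j]] += value
    return sum(cells[a][b] * binary_entropy(ones[a][b] / cells[a][b])
               for a in range(k) for b in range(k) if cells[a][b])


def adjacency_matrix(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph on nodes 0..n-1."""
    m = [[0] * n for _ in range(n)]
    for u, v in edges:
        m[u][v] = m[v][u] = 1
    return m


# Toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
matrix = adjacency_matrix(6, [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)])

cost_one_cluster = code_cost(matrix, [0, 0, 0, 0, 0, 0])
cost_two_clusters = code_cost(matrix, [0, 0, 0, 1, 1, 1])
print(cost_one_cluster, cost_two_clusters)
```

Splitting the nodes into the two triangles gives more homogeneous blocks and therefore a lower code cost than a single all-encompassing cluster; in the full method this gain is weighed against the extra description cost of having more blocks.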
  • 39. Anomaly detection in static plain graphs
      - AUTOPART: how does it work?
      [Figure] Fig. 11 (flowchart): start with the initial matrix, k = 1; Step 1: find good clusters for fixed k; Step 2: increase k (k = k + 1) to lower the encoding cost; repeat until the final partitioning with k* clusters is reached.
  • 40. Anomaly detection in static plain graphs
      - AUTOPART:
          - It starts with an initial adjacency matrix in which all nodes belong to one cluster (k = 1).
          - Inside the main loop, the total cost is iteratively reduced until no further improvement can be made; the final partitioning is then output together with the final cluster count k*.
          - The iterative reduction consists of two steps:
              1. First, a good partitioning for the given number of clusters is found.
              2. Second, the number of clusters is increased to allow for a better partitioning.
          - Once the final partitioning is found, AUTOPART marks the anomalous edges.
          - Outliers deviate from the normal patterns, so they hurt attempts to compress the data.
          - Therefore, the edges whose removal reduces the total cost the most are marked as outliers.
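
The edge-marking step on slide 40 can be sketched as follows for a fixed partitioning. As a simplification, only the entropy-based code cost is considered and the description cost is ignored, so the scores are an approximation of the total-cost reduction. The names and the toy graph are assumptions for illustration.

```python
from math import log2


def code_cost(matrix, groups):
    """Entropy-based code cost of all blocks induced by `groups` (see slide 37)."""
    k = max(groups) + 1
    ones = [[0] * k for _ in range(k)]
    cells = [[0] * k for _ in range(k)]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            cells[groups[i]][groups[j]] += 1
            ones[groups[i]][groups[j]] += value
    total = 0.0
    for a in range(k):
        for b in range(k):
            p = ones[a][b] / cells[a][b] if cells[a][b] else 0.0
            if 0.0 < p < 1.0:
                total += cells[a][b] * (-p * log2(p) - (1.0 - p) * log2(1.0 - p))
    return total


def edge_outlierness(matrix, groups):
    """Cost reduction obtained by removing each edge, keeping the partitioning fixed."""
    base = code_cost(matrix, groups)
    scores = {}
    n = len(matrix)
    for u in range(n):
        for v in range(u + 1, n):
            if matrix[u][v]:
                matrix[u][v] = matrix[v][u] = 0   # temporarily remove the edge
                scores[(u, v)] = base - code_cost(matrix, groups)
                matrix[u][v] = matrix[v][u] = 1   # restore it
    return scores


# Toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
n = 6
matrix = [[0] * n for _ in range(n)]
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    matrix[u][v] = matrix[v][u] = 1

scores = edge_outlierness(matrix, [0, 0, 0, 1, 1, 1])
most_anomalous = max(scores, key=scores.get)
print(most_anomalous, scores[most_anomalous])
```

Removing the bridging edge 2-3 makes the off-diagonal blocks perfectly homogeneous and yields the largest cost reduction, so it is the edge this sketch would mark as an outlier; removing an edge inside a triangle makes its block less homogeneous and increases the cost.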
  • 41. References
      1. Hawkins DM (1980) Identification of outliers. Springer
      2. Rissanen J (1999) Hypothesis selection and testing by the MDL principle. Comput J 42:260–269
      3. Henderson K, Eliassi-Rad T, Faloutsos C, Akoglu L, Li L, Maruhashi K, Prakash BA, Tong H (2010) Metricforensics: a multi-level approach for mining volatile graphs. In: Proceedings of the 16th ACM international conference on knowledge discovery and data mining (SIGKDD), Washington, DC, pp 163–172
      4. Becchetti L, Castillo C, Donato D, Leonardi S, Baeza-Yates R (2006) Link-based characterization and detection of Web Spam. In: Second international workshop on adversarial information retrieval on the web (AIRWeb)
      5. Ding Q, Katenka N, Barford P, Kolaczyk ED, Crovella M (2012) Intrusion as (anti)social communication: characterization and detection. In: Proceedings of the 18th ACM international conference on knowledge discovery and data mining (SIGKDD), Beijing, China. ACM, pp 886–894
  • 42. References
      6. Akoglu L, McGlohon M, Faloutsos C (2010) OddBall: spotting anomalies in weighted graphs. In: Proceedings of the 14th Pacific-Asia conference on knowledge discovery and data mining (PAKDD), Hyderabad, India, pp 410–421
      7. Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw 30(1–7):107–117
      8. Haveliwala TH (2003) Topic-sensitive pagerank: a context-sensitive ranking algorithm for web search. IEEE Trans Knowl Data Eng 15(4):784–796
      9. Jeh G, Widom J (2002) SimRank: a measure of structural-context similarity. In: Proceedings of the 8th ACM international conference on knowledge discovery and data mining (SIGKDD), Edmonton, Alberta, pp 538–543
      10. Sun J, Qu H, Chakrabarti D, Faloutsos C (2005) Neighborhood formation and anomaly detection in bipartite graphs. In: Proceedings of the 5th IEEE international conference on data mining (ICDM), Houston, TX. IEEE Computer Society, pp 418–425
      11. Xu X, Yuruk N, Feng Z, Schweiger TA (2007) SCAN: a structural clustering algorithm for networks. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining
  • 43. References
      12. Chakrabarti D (2004) AutoPart: parameter-free graph partitioning and outlier detection. In: Knowledge discovery in databases: PKDD 2004. Springer, pp 112–124