Focused Clustering and Outlier Detection in Large Attributed Graphs 
ACM SIG-KDD August 26, 2014 
Patricia Iglesias Sánchez*, Emmanuel Müller*† *Karlsruhe Institute of Technology †University of Antwerp 
Bryan Perozzi, Leman Akoglu Stony Brook University
Attributed Graphs 
Attributed graph: each node has 1+ properties 
Examples: 
Age 
School 
Relationship Status 
Bryan Perozzi 
Focused Clustering and Outlier Detection in Large Attributed Graphs 
2
Focused Mining of Attributed Graphs 
Numerous attributes (ex: Facebook profiles) 
Many irrelevant for most queries 
Ex: When trying to sell mortgages 
Useful: Income, Credit Score, Employer 
Not Useful: Hair Color, # Apps Installed 
Ex: When trying to sell make up 
Useful: Hair Color, Skin Tone, Gender 
Not Useful: Shoe Size 
Bryan Perozzi 
Focused Clustering and Outlier Detection in Large Attributed Graphs 
3 
Users have a Focus  Algorithms need a Focus too! 
 Focus 
 Focus
Adding Focus to Algorithms 
Bryan Perozzi 
Focused Clustering and Outlier Detection in Large Attributed Graphs 
4 
Users provide examples of the kind of similarity they are interested in. 
We infer the similarity function that matters to them. 
examples 
! 
user 
infer focus attributes 
focus 
Task
Outline 
Introduction 
New Problem: Focused Clustering & Outliers 
Our Approach: FocusCO 
Evaluation 
Conclusion 
Bryan Perozzi 
Focused Clustering and Outlier Detection in Large Attributed Graphs 
5
Focused Clusters and Outliers: Problem 
Given 
1) a graph w/ node attributes, 
2) exemplar nodes by the user 
Infer attribute weights/relevance 
Extract focused clusters: 
1) dense in structure, 
2) coherent in “heavy” attributes 
(called the “focus”) 
Detect focused outliers: 
*) nodes deviating in focus attribute values 
Bryan Perozzi 
Focused Clustering and Outlier Detection in Large Attributed Graphs 
6
An Example 
Bryan Perozzi 
Focused Clustering and Outlier Detection in Large Attributed Graphs 
7 
Users provide examples of nodes they consider similar. 
Ex: ‘Yann LeCun’ and ‘Foster Provost’ 
We learn a focus 
Education Level 
Location 
We extract clusters 
which agree with the focus 
We detect outliers 
which don’t agree with focus
Related Work 
Graph 
Clustering 
Attributed Graphs 
Attribute Subspace 
User Preference 
Outlier Detection 
METIS, Spectral 
✓ 
Parallel Nibble, BigClam 
✓ 
CoPaM, Gamer 
✓ 
✓ 
✓ 
CODA 
✓ 
✓ 
✓ 
GOutRank, ConSub 
✓ 
✓ 
✓ 
FocusCO 
✓ 
✓ 
✓ 
✓ 
✓ 
Bryan Perozzi 
Focused Clustering and Outlier Detection in Large Attributed Graphs 
8
examples 
FocusCO: sketch 
Bryan Perozzi 
Focused Clustering and Outlier Detection in Large Attributed Graphs 
9 
… 
… 
age 
gender 
location 
1 
2 
3 
infer 
“focus” 
attribute(s) 
4 
detect 
focused clusters & 
outliers
Focus attribute inference 
Bryan Perozzi 
Focused Clustering and Outlier Detection in Large Attributed Graphs 
10 
1. Construct a set of similar pairs, PS 
2. Construct a set of dissimilar pairs, PD 
Randomly sample 
pairs (u,v) 
3. Learn a distance metric between PS and PD 
Input: Set of similar nodes, Cex 
Pair user examples together 
Cex
Distance Metric Learning 
Bryan Perozzi 
Focused Clustering and Outlier Detection in Large Attributed Graphs 
11 
nodes 
attributes 
Feature Matrix 
PS and PD intermixed 
nodes 
attributes 
[Xing, et al 2002] 
Focused Attribute Vector 
PS closer together
examples 
FocusCO: sketch 
Bryan Perozzi 
Focused Clustering and Outlier Detection in Large Attributed Graphs 
12 
… 
… 
age 
gender 
location 
1 
2 
3 
infer 
“focus” 
attribute(s) 
4 
detect 
focused clusters & 
outliers
FocusCO: Cluster Extraction 
Local clustering algorithm 
Not cluster whole graph 
Expands a cluster around a starting set 
Two procedures: 
1.Finding good candidate sets to start at 
2.Growing clusters 
Bryan Perozzi 
Focused Clustering and Outlier Detection in Large Attributed Graphs 
13
Finding nodes to cluster around 
Bryan Perozzi 
Focused Clustering and Outlier Detection in Large Attributed Graphs 
14 
1.) We reweigh the graph using the focus 
2.) We keep only highly weighted edges 
3.) The connected components are our seeds 
A seed set
Growing a Focused Cluster 
Bryan Perozzi 
Focused Clustering and Outlier Detection in Large Attributed Graphs 
15 
2. At each step in cluster expansion: 2.1 - Examine boundary nodes 2.2 - Add node with best Δ 2.3 - Record best structural node 
1. Clustering objective: conductance weighted by focus 
Cluster Member 
Focused Outlier 
3. Focused Outliers: left out best structural nodes
Experiment set up 
Synthetic and Real World Graphs 
Performance measures: 
Cluster quality: NMI 
Outlier accuracy: precision, F1 
Compared to: 
CODA [Gao+’10] 
METIS (no outlier detection) [Karypis+’98] 
Bryan Perozzi 
Focused Clustering and Outlier Detection in Large Attributed Graphs 
16
Focused clustering performance 
Bryan Perozzi 
Focused Clustering and Outlier Detection in Large Attributed Graphs 
17 
9 clusters (3 focus1 + 3 focus2 +3 unfocused). 5 focus attributes.
Focused clustering performance 
Bryan Perozzi 
Focused Clustering and Outlier Detection in Large Attributed Graphs 
18
Outlier detection performance 
Bryan Perozzi 
Focused Clustering and Outlier Detection in Large Attributed Graphs 
19 
# deflated focus attributes increased (easier) from left to right
Disney: Amazon co-purchase graph 
Bryan Perozzi 
Focused Clustering and Outlier Detection in Large Attributed Graphs 
20 
Images are Focused Outliers
DBLP co-authorship graph 
Bryan Perozzi 
Focused Clustering and Outlier Detection in Large Attributed Graphs 
21 
Focused Outlier publishes in IR
Political blogs citation graph 
Bryan Perozzi 
Focused Clustering and Outlier Detection in Large Attributed Graphs 
22 
Focused Outlier did not mention Waas.
Summary 
Bryan Perozzi 
Focused Clustering and Outlier Detection in Large Attributed Graphs 
23 
A new graph mining paradigm where the focus steers graph mining according to user preference. 
A new problem formulation 
Focused Clustering & Outlier detection 
Thanks! Any questions? 
Bryan Perozzi (bperozzi@cs.stonybrook.edu) 
examples 
! 
user 
infer focus attributes 
focus 
Clustering

Focused Clustering and Outlier Detection in Large Attributed Graphs

  • 1.
    Focused Clustering andOutlier Detection in Large Attributed Graphs ACM SIG-KDD August 26, 2014 Patricia Iglesias Sánchez*, Emmanuel Müller*† *Karlsruhe Institute of Technology †University of Antwerp Bryan Perozzi, Leman Akoglu Stony Brook University
  • 2.
    Attributed Graphs Attributedgraph: each node has 1+ properties Examples: Age School Relationship Status Bryan Perozzi Focused Clustering and Outlier Detection in Large Attributed Graphs 2
  • 3.
    Focused Mining ofAttributed Graphs Numerous attributes (ex: Facebook profiles) Many irrelevant for most queries Ex: When trying to sell mortgages Useful: Income, Credit Score, Employer Not Useful: Hair Color, # Apps Installed Ex: When trying to sell make up Useful: Hair Color, Skin Tone, Gender Not Useful: Shoe Size Bryan Perozzi Focused Clustering and Outlier Detection in Large Attributed Graphs 3 Users have a Focus  Algorithms need a Focus too!  Focus  Focus
  • 4.
    Adding Focus toAlgorithms Bryan Perozzi Focused Clustering and Outlier Detection in Large Attributed Graphs 4 Users provide examples of the kind of similarity they are interested in. We infer the similarity function that matters to them. examples ! user infer focus attributes focus Task
  • 5.
    Outline Introduction NewProblem: Focused Clustering & Outliers Our Approach: FocusCO Evaluation Conclusion Bryan Perozzi Focused Clustering and Outlier Detection in Large Attributed Graphs 5
  • 6.
    Focused Clusters andOutliers: Problem Given 1) a graph w/ node attributes, 2) exemplar nodes by the user Infer attribute weights/relevance Extract focused clusters: 1) dense in structure, 2) coherent in “heavy” attributes (called the “focus”) Detect focused outliers: *) nodes deviating in focus attribute values Bryan Perozzi Focused Clustering and Outlier Detection in Large Attributed Graphs 6
  • 7.
    An Example BryanPerozzi Focused Clustering and Outlier Detection in Large Attributed Graphs 7 Users provide examples of nodes they consider similar. Ex: ‘Yann LeCun’ and ‘Foster Provost’ We learn a focus Education Level Location We extract clusters which agree with the focus We detect outliers which don’t agree with focus
  • 8.
    Related Work Graph Clustering Attributed Graphs Attribute Subspace User Preference Outlier Detection METIS, Spectral ✓ Parallel Nibble, BigClam ✓ CoPaM, Gamer ✓ ✓ ✓ CODA ✓ ✓ ✓ GOutRank, ConSub ✓ ✓ ✓ FocusCO ✓ ✓ ✓ ✓ ✓ Bryan Perozzi Focused Clustering and Outlier Detection in Large Attributed Graphs 8
  • 9.
    examples FocusCO: sketch Bryan Perozzi Focused Clustering and Outlier Detection in Large Attributed Graphs 9 … … age gender location 1 2 3 infer “focus” attribute(s) 4 detect focused clusters & outliers
  • 10.
    Focus attribute inference Bryan Perozzi Focused Clustering and Outlier Detection in Large Attributed Graphs 10 1. Construct a set of similar pairs, PS 2. Construct a set of dissimilar pairs, PD Randomly sample pairs (u,v) 3. Learn a distance metric between PS and PD Input: Set of similar nodes, Cex Pair user examples together Cex
  • 11.
    Distance Metric Learning Bryan Perozzi Focused Clustering and Outlier Detection in Large Attributed Graphs 11 nodes attributes Feature Matrix PS and PD intermixed nodes attributes [Xing, et al 2002] Focused Attribute Vector PS closer together
  • 12.
    examples FocusCO: sketch Bryan Perozzi Focused Clustering and Outlier Detection in Large Attributed Graphs 12 … … age gender location 1 2 3 infer “focus” attribute(s) 4 detect focused clusters & outliers
  • 13.
    FocusCO: Cluster Extraction Local clustering algorithm Not cluster whole graph Expands a cluster around a starting set Two procedures: 1.Finding good candidate sets to start at 2.Growing clusters Bryan Perozzi Focused Clustering and Outlier Detection in Large Attributed Graphs 13
  • 14.
    Finding nodes tocluster around Bryan Perozzi Focused Clustering and Outlier Detection in Large Attributed Graphs 14 1.) We reweigh the graph using the focus 2.) We keep only highly weighted edges 3.) The connected components are our seeds A seed set
  • 15.
    Growing a FocusedCluster Bryan Perozzi Focused Clustering and Outlier Detection in Large Attributed Graphs 15 2. At each step in cluster expansion: 2.1 - Examine boundary nodes 2.2 - Add node with best Δ 2.3 - Record best structural node 1. Clustering objective: conductance weighted by focus Cluster Member Focused Outlier 3. Focused Outliers: left out best structural nodes
  • 16.
    Experiment set up Synthetic and Real World Graphs Performance measures: Cluster quality: NMI Outlier accuracy: precision, F1 Compared to: CODA [Gao+’10] METIS (no outlier detection) [Karypis+’98] Bryan Perozzi Focused Clustering and Outlier Detection in Large Attributed Graphs 16
  • 17.
    Focused clustering performance Bryan Perozzi Focused Clustering and Outlier Detection in Large Attributed Graphs 17 9 clusters (3 focus1 + 3 focus2 +3 unfocused). 5 focus attributes.
  • 18.
    Focused clustering performance Bryan Perozzi Focused Clustering and Outlier Detection in Large Attributed Graphs 18
  • 19.
    Outlier detection performance Bryan Perozzi Focused Clustering and Outlier Detection in Large Attributed Graphs 19 # deflated focus attributes increased (easier) from left to right
  • 20.
    Disney: Amazon co-purchasegraph Bryan Perozzi Focused Clustering and Outlier Detection in Large Attributed Graphs 20 Images are Focused Outliers
  • 21.
    DBLP co-authorship graph Bryan Perozzi Focused Clustering and Outlier Detection in Large Attributed Graphs 21 Focused Outlier publishes in IR
  • 22.
    Political blogs citationgraph Bryan Perozzi Focused Clustering and Outlier Detection in Large Attributed Graphs 22 Focused Outlier did not mention Waas.
  • 23.
    Summary Bryan Perozzi Focused Clustering and Outlier Detection in Large Attributed Graphs 23 A new graph mining paradigm where the focus steers graph mining according to user preference. A new problem formulation Focused Clustering & Outlier detection Thanks! Any questions? Bryan Perozzi (bperozzi@cs.stonybrook.edu) examples ! user infer focus attributes focus Clustering