
Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL


Networks naturally capture a host of interactions in the real world spanning from friendships to brain activity. But, given a massive graph, like the Facebook social graph, what can be said about its structure? Which are its most important structures? How does it compare to other networks like Twitter? This talk will focus on my work developing scalable algorithms and models that help us to make sense of large graphs via pattern discovery and similarity analysis.

I will begin by presenting VoG, an approach that efficiently summarizes large graphs by finding their most interesting and semantically meaningful structures. Starting from a clutter of millions of nodes and edges, such as the Enron who-mails-whom graph, our Minimum Description Length-based algorithm disentangles the complex graph connectivity and spotlights the structures that ‘best’ describe the graph.
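
To make the Minimum Description Length idea concrete, here is a minimal Python sketch of a two-part cost L(M) + L(E), the objective stated on slide 11 of the transcript. The encoding is a toy assumption chosen for readability and is not VoG's actual encoding: each chosen structure costs log2(n) bits per member node, and each edge where the model and the graph disagree costs 2·log2(n) bits. The helpers `clique`, `star`, `total_bits`, and `greedy_summary` are hypothetical names, and the greedy selection is only one plausible way to avoid enumerating every subset of candidates.

```python
import math
from itertools import combinations


def clique(nodes):
    """Candidate structure: every pair among `nodes` is an edge."""
    return tuple(nodes), {frozenset(p) for p in combinations(nodes, 2)}


def star(hub, spokes):
    """Candidate structure: `hub` is connected to every spoke."""
    return (hub, *spokes), {frozenset((hub, s)) for s in spokes}


def total_bits(model, edges, n):
    """Toy two-part cost L(M) + L(E) in bits (an assumed encoding)."""
    bits_per_node = math.log2(n)
    # L(M): list the member nodes of every chosen structure.
    model_bits = sum(len(nodes) * bits_per_node for nodes, _ in model)
    # L(E): edges on which the model's reconstruction and the graph disagree.
    covered = set()
    for _, structure_edges in model:
        covered |= structure_edges
    error_edges = covered ^ edges
    return model_bits + len(error_edges) * 2 * bits_per_node


def greedy_summary(candidates, edges, n):
    """Add the candidate that lowers the total cost most; stop when none does."""
    model, best = [], total_bits([], edges, n)
    remaining = list(candidates)
    while remaining:
        cost, pick = min(
            ((total_bits(model + [c], edges, n), c) for c in remaining),
            key=lambda pair: pair[0],
        )
        if cost >= best:
            break
        model.append(pick)
        remaining.remove(pick)
        best = cost
    return model, best


# Toy graph on 8 nodes: a 4-clique plus a star with hub 4.
good_clique = clique([0, 1, 2, 3])
good_star = star(4, [5, 6, 7])
poor_clique = clique([0, 4, 5])  # mostly wrong: only edge 4-5 exists
graph_edges = good_clique[1] | good_star[1]
summary, bits = greedy_summary(
    [poor_clique, good_star, good_clique], graph_edges, n=8
)
```

In this toy run the empty model costs 54 bits, while the summary that keeps the clique and the star costs 24 bits; the poorly fitting candidate is rejected because it adds more model and error bits than it saves.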

Then, for similarity analysis at the graph level, I will introduce the problems of graph comparison and graph alignment. I will conclude by showing how to apply my methods to temporal anomaly detection, brain graph clustering, deanonymization of bipartite (e.g., user-group membership) and unipartite graphs, and more.
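
The temporal anomaly detection application can be sketched as follows: compute one similarity score per pair of consecutive daily snapshots (sim1, sim2, … on slides 22 and 32 of the transcript) and flag the transitions where the score drops. This is a sketch under stated assumptions: the Jaccard edge overlap stands in for the talk's similarity function, and the fixed `threshold` and the names `consecutive_similarities` and `flag_changes` are hypothetical.

```python
def jaccard(edges_a, edges_b):
    """Stand-in similarity: shared edges divided by all edges seen."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 1.0


def consecutive_similarities(snapshots, sim=jaccard):
    """One score per pair of consecutive snapshots (Day t, Day t+1)."""
    return [sim(g1, g2) for g1, g2 in zip(snapshots, snapshots[1:])]


def flag_changes(sims, threshold):
    """Indices t where the similarity of (Day t+1, Day t+2) is below threshold."""
    return [t for t, s in enumerate(sims) if s < threshold]


def as_edges(*pairs):
    """Build an undirected edge set from (sender, receiver) pairs."""
    return {frozenset(p) for p in pairs}


# Five toy daily email graphs; the graph changes abruptly after Day 3.
days = [
    as_edges(("a", "b"), ("b", "c"), ("c", "d")),
    as_edges(("a", "b"), ("b", "c"), ("c", "d")),
    as_edges(("a", "b"), ("b", "c"), ("c", "e")),
    as_edges(("x", "y"), ("y", "z")),
    as_edges(("x", "y"), ("y", "z"), ("x", "z")),
]
sims = consecutive_similarities(days)
flagged = flag_changes(sims, threshold=0.3)
```

Any similarity function with scores in [0, 1] can be passed as `sim`; the threshold rule is the simplest possible choice and would need tuning on real data.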


  1. Carnegie Mellon University. Making Sense of Large Graphs: Summarization and Similarity. Danai Koutra, Computer Science Department, Carnegie Mellon University. danai@cs.cmu.edu, http://www.cs.cmu.edu/~dkoutra. MLconf ’14, Atlanta, GA
  2. Making sense of large graphs. Human Connectome Project. >1.25B users! Scalable algorithms and models for understanding massive graphs. Danai Koutra (CMU) 2
  3. Understanding Large Graphs, Part 1: Summarization
  4. Ever tried visualizing a large graph? 79,870 email accounts, 288,364 emails
  5. Ever tried visualizing a large graph? 79,870 email accounts, 288,364 emails
  6. After this talk, you’ll know how to find… VoG Top-3 Stars: klay@enron.com, kenneth.lay@enron.com
  7. Enron Summary. VoG Top Near Bipartite Core. Commenters, CC’ed. Ski excursion: organizers, participants. “Affair”
  8. Problem Definition. Given: a graph. Find: a succinct summary with possibly overlapping subgraphs ≈ important graph structures. [Koutra, Kang, Vreeken, Faloutsos. SDM’14] Lady Gaga Fan Club
  9. Main Ideas. Idea 1: Use well-known structures (vocabulary). Idea 2: Best graph summary: shortest lossless description → optimal compression (MDL)
  10. BACKGROUND: Minimum Description Length (~Occam’s razor). $\min\; L(M) + L(D \mid M)$, where $L(M)$ is the # bits for $M$ and $L(D \mid M)$ is the # bits for the data using $M$ (errors). Example models: $a_1 x + a_0$ vs. $a_{10} x^{10} + a_9 x^9 + \dots + a_0$. Simple & good explanations
  11. Formally: Minimum Graph Description. Given: a graph $G$ and a vocabulary $\Omega$. Find: model $M$ s.t. $\min L(G, M) = \min\{\, L(M) + L(E) \,\}$ (Adjacency $A$, Model $M$, Error $E$)
  12. VoG: Overview. ≈? argmin ≈
  13. VoG: Overview. Pick best (with some criterion). Summary
  14. Q: Which structures to pick? A: Those that minimize the description length of G. A candidate set S has $2^{|S|}$ combinations
  15. Runtime. 1.25B users! VoG is near-linear in the # edges of the input graph.
  16. Understanding a wiki graph. I don’t see anything! :( Nodes: wiki editors. Edges: co-edited
  17. Wiki Controversial Article. Stars: admins, bots, heavy users. Bipartite cores: edit wars (Kiev vs. Kyiv, vandals vs. admins)
  18. VoG vs. other methods. Columns: VoG | Bounded-Error Summarization [Navlakha+’08] | Motif Simplification [Dunne+’13] | Clustering Methods | Cross-Associations [Chakrabarti+’03]
      Variety of Structures: ✔ | ✗ | ✗ | ✗ | ✗
      Important Structures: ✔ | ✗ | ✗ | ✗ | ✗
      Low Complexity: ✔ | ✗ | ✗ | ✔(?) | ✔
      Visualization: ✔ | ✔ | ✔ | ✗ | ✗
      Graph Summary: ✔ | ✔ | ✔ | ✗ | ✗
      Figure labels: stars, cliques; near-cliques
  19. VoG: summary • Focus on important • possibly-overlapping structures • with known graph-theoretic properties. www.cs.cmu.edu/~dkoutra/SRC/vog.tar
  20. Understanding Large Graphs, Part 2: Similarities
  21. friendship graph ≈ wall posts graph? 1. Behavioral Patterns: Are the graphs / behaviors similar?
  22. Why graph similarity? 2. Classification. 3. Temporal anomaly detection (Day 1, Day 2, Day 3, Day 4; sim1, sim2, sim3). 4. Intrusion detection
  23. Problem Definition: Graph Similarity • Given: (i) 2 graphs $G_A$, $G_B$ with the same nodes and different edge sets (ii) node correspondence • Find: similarity score $s \in [0,1]$
  24. Obvious solution? Edge Overlap (EO): # of common edges (normalized or not), between $G_A$ and $G_B$
  25. … but “barbell”… EO(B10,mB10) == EO(B10,mmB10) ($G_A$ vs. $G_B$, and $G_A$ vs. $G_{B'}$)
  26. What makes a similarity function good? • Properties: ◦ Intuitive: properties like “Edge-importance”
  27. What makes a similarity function good? • Properties: ◦ Intuitive: properties like “Weight-awareness” ◦ Scalable
  28. MAIN IDEA: DELTACON ① Find the pairwise node influence, $S_A$ and $S_B$. ② Find the similarity between $S_A$ and $S_B$.
  29. INTUITION. How? Using Belief Propagation. Attenuating Neighboring Influence for small $\epsilon$: $S = [I + \epsilon^2 D - \epsilon A]^{-1} \approx [I - \epsilon A]^{-1} = I + \epsilon A + \epsilon^2 A^2 + \dots$ (1-hop: $\epsilon A$, 2-hops: $\epsilon^2 A^2$, …). Note: $\epsilon > \epsilon^2 > \dots$, $0 < \epsilon < 1$
  30. OUR SOLUTION: DELTACON ① Find the pairwise node influence, $S_A$ and $S_B$. ② Find the similarity between $S_A$ and $S_B$: $\mathrm{sim}(S_A, S_B) = \dfrac{1}{1 + \sqrt{\sum_{i,j} \left( \sqrt{s_{A,ij}} - \sqrt{s_{B,ij}} \right)^2}}$ (“Root” Euclidean Distance)
  31. … but $O(n^2)$ … faster? $O(m_1 + m_2)$ in the paper :)
  32. Temporal Anomaly Detection • Nodes: email accounts of employees • Edges: email exchange. sim1, sim2, sim3, sim4 between Day 1, Day 2, Day 3, Day 4, Day 5
  33. Temporal Anomaly Detection: similarity of consecutive days. Feb 4: Lay resigns
  34. Brain-Connectivity Graph Clustering • 114 brain graphs ◦ Nodes: 70 cortical regions ◦ Edges: connections • Attributes: gender, IQ, age…
  35. Brain-Connectivity Graph Clustering. t-test p-value = 0.0057
  36. Graph Understanding via … • … Summarization … ◦ VoG: to spot the important graph structures • … Comparison … ◦ DeltaCon: to find the similarity between aligned networks ◦ BiG-Align, Uni-Align: to align bi/uni-partite graphs efficiently
  37. Thank you! Understanding summarization similarities. www.cs.cmu.edu/~dkoutra/pub.htm danai@cs.cmu.edu
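
The contrast between Edge Overlap (slides 24 and 25) and DeltaCon (slides 28 to 30) can be tried on a small example. The Python sketch below is a direct, quadratic-cost reading of the slide formulas, not the faster algorithm from the paper. Several things are assumptions made for illustration: the tiny barbell graph (two triangles joined by a bridge, much smaller than the B10 graph on slide 25), the Jaccard normalization of the edge overlap, the value ε = 0.1, and all function names. Matrix inversion is done with a small Gauss-Jordan routine so that the block needs only the standard library.

```python
import math


def edge_overlap(edges_a, edges_b):
    """Number of common edges, normalized by the number of distinct edges."""
    a = {frozenset(e) for e in edges_a}
    b = {frozenset(e) for e in edges_b}
    return len(a & b) / len(a | b) if a | b else 1.0


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (fine for small matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def influence(edges, n, eps):
    """Pairwise node influence S = [I + eps^2 D - eps A]^-1 (slide 29)."""
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    deg = [sum(row) for row in adj]
    m = [[(1.0 + eps * eps * deg[i] if i == j else 0.0) - eps * adj[i][j]
          for j in range(n)] for i in range(n)]
    return invert(m)


def deltacon_similarity(edges_a, edges_b, n, eps=0.1):
    """1 / (1 + root Euclidean distance between S_A and S_B) (slide 30)."""
    sa, sb = influence(edges_a, n, eps), influence(edges_b, n, eps)
    total = 0.0
    for i in range(n):
        for j in range(n):
            # Clamp at 0 to guard against tiny negative rounding errors.
            diff = math.sqrt(max(sa[i][j], 0.0)) - math.sqrt(max(sb[i][j], 0.0))
            total += diff * diff
    return 1.0 / (1.0 + math.sqrt(total))


# Two triangles (0,1,2) and (3,4,5) joined by the bridge 2-3.
barbell = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
no_bridge = [e for e in barbell if e != (2, 3)]
no_clique_edge = [e for e in barbell if e != (0, 1)]

eo_bridge = edge_overlap(barbell, no_bridge)
eo_clique = edge_overlap(barbell, no_clique_edge)
dc_bridge = deltacon_similarity(barbell, no_bridge, n=6)
dc_clique = deltacon_similarity(barbell, no_clique_edge, n=6)
```

Edge overlap gives the same score for both modified graphs, because each lost exactly one edge. The influence-based score is lower when the bridge is removed, which matches the “edge-importance” property from slide 26: an edge whose removal disconnects the graph matters more than an edge inside a clique.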
