I will begin by presenting VoG, an approach that efficiently summarizes large graphs by finding their most interesting and semantically meaningful structures. Starting from a clutter of millions of nodes and edges, such as the Enron who-mails-whom graph, our Minimum Description Length-based algorithm disentangles the complex graph connectivity and spotlights the structures that ‘best’ describe the graph.

Then, for similarity analysis at the graph level, I will introduce the problems of graph comparison and graph alignment. I will conclude by showing how to apply my methods to temporal anomaly detection, brain graph clustering, deanonymization of bipartite (e.g., user-group membership) and unipartite graphs, and more.


- 1. Carnegie Mellon University. Making Sense of Large Graphs: Summarization and Similarity. Danai Koutra, Computer Science Department, Carnegie Mellon University. danai@cs.cmu.edu http://www.cs.cmu.edu/~dkoutra MLconf ’14, Atlanta, GA
- 2. Making sense of large graphs: Human Connectome Project; >1.25B users! Scalable algorithms and models for understanding massive graphs. Danai Koutra (CMU)
- 3. Understanding Large Graphs, Part 1: Summarization
- 4. Ever tried visualizing a large graph? 79,870 email accounts, 288,364 emails.
- 5. Ever tried visualizing a large graph? 79,870 email accounts, 288,364 emails.
- 6. After this talk, you’ll know how to find… VoG Top-3 Stars (labeled hubs: klay@enron.com, kenneth.lay@enron.com)
- 7. Enron Summary: VoG top near-bipartite core. Figure labels: commenters, CC’ed; ski excursion organizers, participants; “affair”.
- 8. Problem Definition. Given: a graph. Find: a succinct summary with possibly overlapping subgraphs ≈ important graph structures. [Koutra, Kang, Vreeken, Faloutsos. SDM’14] Figure label: Lady Gaga Fan Club.
- 9. Main Ideas. Idea 1: Use well-known structures (vocabulary). Idea 2: Best graph summary: shortest lossless description ⇒ optimal compression (MDL).
- 10. BACKGROUND: Minimum Description Length (~Occam’s razor). $\min\; L(M) + L(D \mid M)$, where $L(M)$ is the # bits for the model $M$ and $L(D \mid M)$ is the # bits for the data using $M$ (errors). Example models: $a_1 x + a_0$ and $a_{10} x^{10} + a_9 x^9 + \dots + a_0$. Simple & good explanations.
- 11. Formally: Minimum Graph Description. Given: a graph $G$ and a vocabulary $\Omega$. Find: model $M$ s.t. $\min L(G, M) = \min\{L(M) + L(E)\}$. Figure labels: adjacency $A$, model $M$, error $E$.
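The two-part objective on slide 11 can be illustrated with a toy encoding. The sketch below is only an assumption-laden illustration, not VoG's actual encoding: the helper names (`clique_edges`, `description_length`), the choice to treat every structure as a full clique, and the bit costs (log2(n) bits per listed node, two node IDs per erroneous edge) are all hypothetical.

```python
import math
from itertools import combinations

def clique_edges(nodes):
    """All undirected edges of a full clique on `nodes`."""
    return {frozenset(p) for p in combinations(sorted(nodes), 2)}

def description_length(n, edges, model):
    """Toy two-part cost L(M) + L(E) in bits (hypothetical encoding).

    n     -- number of nodes in the graph
    edges -- set of frozenset({u, v}) edges of G
    model -- list of node sets, each interpreted as a full clique
    """
    bits_per_node = math.log2(n)
    # L(M): list the member nodes of every structure.
    model_bits = sum(len(s) for s in model) * bits_per_node
    # Edges that the model claims exist.
    covered = set()
    for s in model:
        covered |= clique_edges(s)
    # L(E): list both endpoints of every edge where model and graph disagree.
    error = covered ^ edges
    error_bits = len(error) * 2 * bits_per_node
    return model_bits + error_bits

# A 6-clique plus one pendant edge, in a graph with 8 nodes.
graph = clique_edges(range(6)) | {frozenset({5, 6})}
empty_cost = description_length(8, graph, [])
clique_cost = description_length(8, graph, [set(range(6))])
print(empty_cost, clique_cost)
```

Under these assumed costs, describing the graph through one clique plus a single error edge is cheaper than listing every edge, which is the sense in which a good summary compresses the graph.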
- 12. VoG: Overview [figure: ≈?, argmin, ≈]
- 13. VoG: Overview. Pick best (with some criterion). Summary.
- 14. Q: Which structures to pick? A: Those that minimize the description length of G. For a candidate set S there are $2^{|S|}$ combinations.
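Because trying all $2^{|S|}$ subsets is infeasible, some selection heuristic is needed. The slides only say "pick best (with some criterion)", so the single-pass greedy scan below is an assumption for illustration, not the documented VoG heuristic. The cost function is the same kind of toy clique-only encoding (hypothetical bit costs), and the names `pairs`, `cost`, and `greedy_summary` are made up for this sketch.

```python
import math
from itertools import combinations

def pairs(nodes):
    """All undirected edges of a full clique on `nodes`."""
    return {frozenset(p) for p in combinations(sorted(nodes), 2)}

def cost(n, edges, chosen):
    """Toy L(M) + L(E) in bits; each chosen node set is read as a clique."""
    bits = math.log2(n)
    covered = set().union(*[pairs(s) for s in chosen])
    model_bits = sum(len(s) for s in chosen) * bits
    error_bits = len(covered ^ edges) * 2 * bits
    return model_bits + error_bits

def greedy_summary(n, edges, candidates):
    """Scan candidates once; keep one only if it lowers the total cost."""
    chosen, best = [], cost(n, edges, [])
    for cand in candidates:
        trial = cost(n, edges, chosen + [cand])
        if trial < best:
            chosen, best = chosen + [cand], trial
    return chosen, best

# Two 4-cliques joined by the edge (3, 4); the middle candidate is a poor fit.
graph = pairs(range(4)) | pairs(range(4, 8)) | {frozenset({3, 4})}
candidates = [{0, 1, 2, 3}, {2, 3, 4, 5}, {4, 5, 6, 7}]
chosen, best = greedy_summary(8, graph, candidates)
print(chosen, best)
```

The scan evaluates |S| + 1 costs instead of $2^{|S|}$, at the price of depending on the order in which candidates are visited.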
- 15. Runtime (1.25B users!): VoG is near-linear in the number of edges of the input graph.
- 16. Understanding a wiki graph: I don’t see anything! ☹ Nodes: wiki editors. Edges: co-edited.
- 17. Wiki Controversial Article. Stars: admins, bots, heavy users. Bipartite cores: edit wars (Kiev vs. Kyiv; vandals vs. admins).
- 18. VoG vs. other methods. Cited: [Navlakha+’08], [Dunne+’13], [Chakrabarti+’03]. Figure labels: stars, cliques, near-cliques.

  | | VoG | Bounded-Error Summarization | Motif Simplification | Clustering Methods | Cross-Associations |
  |---|---|---|---|---|---|
  | Variety of Structures | ✔ | ✗ | ✗ | ✗ | ✗ |
  | Important Structures | ✔ | ✗ | ✗ | ✗ | ✗ |
  | Low Complexity | ✔ | ✗ | ✗ | ✔(?) | ✔ |
  | Visualization | ✔ | ✔ | ✔ | ✗ | ✗ |
  | Graph Summary | ✔ | ✔ | ✔ | ✗ | ✗ |
- 19. VoG: summary. Focus on important, possibly-overlapping structures with known graph-theoretic properties. www.cs.cmu.edu/~dkoutra/SRC/vog.tar
- 20. Understanding Large Graphs, Part 2: Similarities
- 21. (1) Behavioral Patterns: friendship graph ≈ wall posts graph? Are the graphs / behaviors similar?
- 22. Why graph similarity? (2) Classification. (3) Temporal anomaly detection: Day 1, Day 2, Day 3, Day 4, compared through sim1, sim2, sim3. (4) Intrusion detection.
- 23. Problem Definition: Graph Similarity. Given: (i) 2 graphs $G_A$, $G_B$ with the same nodes and different edge sets; (ii) node correspondence. Find: similarity score $s \in [0,1]$.
- 24. Obvious solution? Edge Overlap (EO): # of common edges of $G_A$ and $G_B$ (normalized or not).
- 25. … but “barbell”… EO(B10, mB10) == EO(B10, mmB10)
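The barbell objection to Edge Overlap can be reproduced in a few lines. This is a sketch under stated assumptions: the slides do not define B10, mB10, or mmB10, so the code assumes a barbell of two 5-node cliques joined by one bridge, with one variant missing the bridge and the other missing an edge inside a clique. The normalization (dividing by the size of the edge union) is also just one possible choice, and the function names are hypothetical.

```python
from itertools import combinations

def edge_overlap(edges_a, edges_b, normalized=True):
    """Number of common edges; optionally divided by the size of the union."""
    common = len(edges_a & edges_b)
    if not normalized:
        return common
    union = len(edges_a | edges_b)
    return common / union if union else 1.0

def barbell(k):
    """Two k-cliques (nodes 0..k-1 and k..2k-1) joined by the bridge (k-1, k)."""
    left = {frozenset(p) for p in combinations(range(k), 2)}
    right = {frozenset(p) for p in combinations(range(k, 2 * k), 2)}
    return left | right | {frozenset({k - 1, k})}

b10 = barbell(5)
no_bridge = b10 - {frozenset({4, 5})}       # graph splits into two components
no_clique_edge = b10 - {frozenset({0, 1})}  # graph stays connected

eo_bridge = edge_overlap(b10, no_bridge)
eo_clique = edge_overlap(b10, no_clique_edge)
print(eo_bridge, eo_clique)
```

Both variants differ from the original by exactly one edge, so an edge count cannot tell the structurally important bridge apart from a redundant clique edge.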
- 26. What makes a similarity function good? Properties: intuitive (properties like “edge-importance”).
- 27. What makes a similarity function good? Properties: intuitive (properties like “weight-awareness”); scalable.
- 28. MAIN IDEA: DELTACON. ① Find the pairwise node influence, $S_A$ and $S_B$. ② Find the similarity between $S_A$ and $S_B$.
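- 29. INTUITION. How? Using Belief Propagation. Attenuating neighboring influence for small ε: $S = [I + \epsilon^2 D - \epsilon A]^{-1} \approx [I - \epsilon A]^{-1} = I + \epsilon A + \epsilon^2 A^2 + \dots$, where $\epsilon A$ carries the 1-hop and $\epsilon^2 A^2$ the 2-hop influence. Note: $\epsilon > \epsilon^2 > \dots$, $0 < \epsilon < 1$.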
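The influence matrix and its series approximation can be checked numerically on a tiny graph. This is a minimal sketch, assuming a dense symmetric 0/1 adjacency matrix, a hand-picked ε = 0.01, and a plain Gauss-Jordan inverse that is only suitable for toy sizes; the function names are hypothetical, and this is not the scalable method the talk refers to.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (tiny dense matrices only)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def influence(adj, eps):
    """S = [I + eps^2 D - eps A]^{-1}, with D the diagonal degree matrix."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    system = [[(1 + eps**2 * deg[i] if i == j else 0.0) - eps * adj[i][j]
               for j in range(n)] for i in range(n)]
    return invert(system)

def series(adj, eps):
    """Truncated series I + eps A + eps^2 A^2."""
    n = len(adj)
    a2 = [[sum(adj[i][k] * adj[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[float(i == j) + eps * adj[i][j] + eps**2 * a2[i][j]
             for j in range(n)] for i in range(n)]

path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path graph 0 - 1 - 2
S = influence(path, 0.01)
approx = series(path, 0.01)
print(S[0][1], S[0][2])
```

On this path graph the entry for the direct neighbor (0, 1) is much larger than the entry for the 2-hop pair (0, 2), which is the attenuation the slide describes.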
- 30. OUR SOLUTION: DELTACON. ① Find the pairwise node influence, $S_A$ and $S_B$. ② Find the similarity between $S_A$ and $S_B$: $\mathrm{sim}(S_A, S_B) = \dfrac{1}{1 + \sqrt{\sum_{i,j} \left(\sqrt{s_{A,ij}} - \sqrt{s_{B,ij}}\right)^2}}$ (“root” Euclidean distance).
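Step ② on slide 30 maps a distance between two influence matrices to a score in (0, 1]. The sketch below assumes the “root” Euclidean distance takes element-wise square roots before the usual Euclidean distance, and that both matrices have the same shape and non-negative entries. The function name and the two small input matrices are hypothetical, chosen only to exercise the formula.

```python
import math

def root_ed_similarity(s_a, s_b):
    """sim = 1 / (1 + d), d = sqrt(sum_ij (sqrt(a_ij) - sqrt(b_ij))^2).

    Assumes equal shapes and non-negative entries (math.sqrt raises otherwise).
    """
    total = 0.0
    for row_a, row_b in zip(s_a, s_b):
        for a, b in zip(row_a, row_b):
            total += (math.sqrt(a) - math.sqrt(b)) ** 2
    return 1.0 / (1.0 + math.sqrt(total))

# Made-up influence matrices for two 2-node graphs (not real data).
s_a = [[1.0, 0.25], [0.25, 1.0]]
s_b = [[1.0, 0.0], [0.0, 1.0]]

same = root_ed_similarity(s_a, s_a)  # identical matrices give distance 0
diff = root_ed_similarity(s_a, s_b)
print(same, diff)
```

Identical inputs give a similarity of exactly 1, and the score decreases toward 0 as the distance grows.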
- 31. … but $O(n^2)$ … faster? $O(m_1 + m_2)$ in the paper ☺
- 32. Temporal Anomaly Detection. Nodes: email accounts of employees. Edges: email exchange. Consecutive days (Day 1 to Day 5) are compared through sim1, sim2, sim3, sim4.
- 33. Temporal Anomaly Detection [plot: similarity of consecutive days; annotation: Feb 4: Lay resigns]
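Once a similarity score exists for each pair of consecutive days, unusually low scores can be flagged. The slides do not state the thresholding rule, so the median-minus-k-times-MAD cutoff below is purely an assumption, the function name is hypothetical, and the scores are made-up numbers rather than Enron results.

```python
import statistics

def flag_anomalies(sims, k=3.0):
    """Indices of scores below median - k * MAD (a hypothetical rule).

    sims[i] is the similarity between day i+1 and day i+2.
    """
    med = statistics.median(sims)
    mad = statistics.median(abs(s - med) for s in sims)
    threshold = med - k * mad
    return [i for i, s in enumerate(sims) if s < threshold]

# Made-up similarity scores for six consecutive days (five transitions).
sims = [0.92, 0.90, 0.91, 0.45, 0.93]
flagged = flag_anomalies(sims)
print(flagged)
```

A median-based cutoff is used here because a single extreme day shifts the median far less than it shifts a mean and standard deviation.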
- 34. Brain-Connectivity Graph Clustering. 114 brain graphs (nodes: 70 cortical regions; edges: connections). Attributes: gender, IQ, age…
- 35. Brain-Connectivity Graph Clustering: t-test p-value = 0.0057
- 36. Graph Understanding via … Summarization (VoG: to spot the important graph structures) … Comparison (DeltaCon: to find the similarity between aligned networks; BiG-Align and Uni-Align: to align bi/uni-partite graphs efficiently).
- 37. Thank you! Understanding: summarization, similarities. www.cs.cmu.edu/~dkoutra/pub.htm danai@cs.cmu.edu
