
"Incremental Lossless Graph Summarization", KDD 2020

Presentation slides for Jihoon Ko*, Yunbum Kook*, and Kijung Shin, "Incremental Lossless Graph Summarization", KDD 2020.

Given a fully dynamic graph, represented as a stream of edge insertions and deletions, how can we obtain and incrementally update a lossless summary of its current snapshot?
As large-scale graphs are prevalent, concisely representing them is essential for efficient storage and analysis. Lossless graph summarization is an effective graph-compression technique with many desirable properties. It aims to compactly represent the input graph as (a) a summary graph consisting of supernodes (i.e., sets of nodes) and superedges (i.e., edges between supernodes), which provide a rough description, and (b) edge corrections which fix errors induced by the rough description. While a number of batch algorithms, suited for static graphs, have been developed for rapid and compact graph summarization, they are highly inefficient in terms of time and space for dynamic graphs, which are common in practice.

In this work, we propose MoSSo, the first incremental algorithm for lossless summarization of fully dynamic graphs. In response to each change in the input graph, MoSSo updates the output representation by repeatedly moving nodes among supernodes. MoSSo decides nodes to be moved and their destinations carefully but rapidly based on several novel ideas. Through extensive experiments on 10 real graphs, we show MoSSo is (a) Fast and 'any time': processing each change in near-constant time (less than 0.1 millisecond), up to 7 orders of magnitude faster than running state-of-the-art batch methods, (b) Scalable: summarizing graphs with hundreds of millions of edges, requiring sub-linear memory during the process, and (c) Effective: achieving comparable compression ratios even to state-of-the-art batch methods.

  1. Incremental Lossless Graph Summarization. Jihoon Ko*, Yunbum Kook*, Kijung Shin
  2. Large-scale Graphs are Everywhere! 2B+ active users, 600M+ users, 1.5B+ users. (Icons made by Freepik from www.flaticon.com)
  3. Large-scale Graphs are Everywhere! (cont.) 4B+ web pages, 5M papers, 6K+ proteins. (Icons made by Freepik from www.flaticon.com)
  4–6. Graph Compression for Efficient Manipulation • Handling large-scale graphs as they are → heavy disk or network I/O • Their compact representation makes efficient manipulation possible • A larger portion of the original graph can be stored in main memory or cache
  7–10. Previous Graph Compression Techniques • Various compression techniques have been proposed: relabeling nodes, pattern mining, lossless graph summarization (one of the most effective compression techniques), … • So far, lossless graph summarization has been done by batch algorithms for “static graphs”, i.e., a single or a few snapshots of evolving graphs • However, most real-world graphs in fact go through lots of changes…
  11–13. Real-world Graphs are Evolving • From 2M+ users to 2B+ users in 10 years • Previous algorithms: not designed to allow for changes in graphs; they must be rerun from scratch to reflect changes • Solution: incrementally update compressed graphs in a fast and effective manner!
  14. Outline • Preliminaries • Proposed Algorithm: MoSSo • Experimental Results • Conclusions
  15–19. Lossless Graph Summarization: Example • Input graph with 10 edges on nodes a–i • The nodes are grouped into supernodes A = {a}, B = {b, c, d, e}, and C = {f, g, h, i}, with the corrections “Add {a, f}” and “Delete {f, i}” recording where the grouping is inexact • Output with 4 edges
  20–24. Lossless Graph Summarization: Definition • Lossless summarization yields (1) a summary graph and (2) edge corrections, while minimizing the edge count |P| + |C+| + |C-| (≈ the “description cost”, denoted by φ) • Proposed in [NRS08] based on “the Minimum Description Length principle” • Running example: input graph G = (V, E) on nodes a–i; summary graph G* = (S, P) with A = {a}, B = {b, c, d, e}, C = {f, g, h, i}; edge corrections C+ = {af}, C- = {fi}
  25–28. Lossless Graph Summarization: Definition (cont.) • 1. Summary graph G* = (S, P) • Supernodes S = a partition of V, where each supernode is a set of nodes • Superedges P = a set of pairs of supernodes (ex: {A, B} in the example above) • 2. Edge corrections (C+, C-) • Residual graph (positive) C+ • Residual graph (negative) C-
  29. Lossless Graph Summarization: Notation • Supernode containing u: S_u (i.e., u ∈ S_u) • Edges between supernodes A and B: E_AB = {uv ∈ E : u ∈ A, v ∈ B (u ≠ v)} • All possible edges between A and B: T_AB = {uv ⊆ V : u ∈ A, v ∈ B (u ≠ v)} • Neighborhood of a node u: N(u) = {v ∈ V : uv ∈ E} • Nodes incident to u in C+ (or C-): C+(u) (or C-(u)) • Compression ratio: (|P| + |C+| + |C-|) / |E|
  30. Lossless Graph Summarization: Optimal Encoding • For summarization, determining the supernodes S (a partition of V) is our main concern → for a given S, the superedges P and edge corrections C are determined optimally
  31. Lossless Graph Summarization: Optimal Encoding (cont.) • The edges E_AB between two supernodes are encoded as either (1) a superedge with C- entries or (2) no superedge with C+ entries • Case 1: |E_AB| ≥ (|T_AB| + 1) / 2: add superedge AB to P and T_AB \ E_AB to C- (cost: 1 + |T_AB| − |E_AB|) • Case 2: |E_AB| < (|T_AB| + 1) / 2: add all edges in E_AB to C+ (cost: |E_AB|)
  32–33. Lossless Graph Summarization: Optimal Encoding (cont.) • With superedge AB: φ = 2 + 1 + 1 = 4 • With C+ only (C+ = {ab, ac, ad, ae, af}, C- = {fi}): φ = 1 + 5 + 1 = 7
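
To make the per-pair rule on slides 30–33 concrete, here is a minimal sketch of the optimal encoding decision. It assumes the edge sets are given explicitly; the function name and the returned triple are illustrative, not from the paper.

```python
def encode_pair(E_AB: set, T_AB: set):
    """Optimally encode the edges between supernodes A and B.

    E_AB: edges that actually exist between A and B.
    T_AB: all possible edges between A and B.
    Returns (has_superedge, C_plus_part, C_minus_part) for this pair.
    """
    if len(E_AB) >= (len(T_AB) + 1) / 2:
        # Case 1: superedge AB plus negative corrections,
        # cost = 1 + |T_AB| - |E_AB|
        return True, set(), T_AB - E_AB
    else:
        # Case 2: no superedge; every existing edge goes to C+,
        # cost = |E_AB|
        return False, set(E_AB), set()
```

In the running example, the A–B pair has |E_AB| = |T_AB| = 4, so Case 1 applies at cost 1, whereas recording the four edges in C+ would cost 4; this is exactly the gap between φ = 4 and φ = 7 on slides 32–33.
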
  34–37. Recovery: Example • Given A = {a}, B = {b, c, d, e}, C = {f, g, h, i}, C+ = {af}, C- = {fi}: • (1) Add all pairs of nodes between two adjacent supernodes • (2) Remove all edges in C- • (3) Add all edges in C+
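
A minimal sketch of this three-step recovery, assuming an undirected graph, supernodes stored as sets, and a self-pair such as (C, C) standing for the all-pairs clique inside a supernode (this representation is my assumption, not the paper's):

```python
from itertools import combinations

def recover(supernodes, superedges, c_plus, c_minus):
    """Rebuild the original edge set from a summary graph and corrections."""
    edges = set()
    for X, Y in superedges:
        if X == Y:
            # Self-loop superedge: all pairs of nodes within the supernode.
            edges |= {frozenset(p) for p in combinations(supernodes[X], 2)}
        else:
            # Step 1: all pairs of nodes between two adjacent supernodes.
            edges |= {frozenset((u, v))
                      for u in supernodes[X] for v in supernodes[Y]}
    edges -= c_minus  # Step 2: remove all edges in C-.
    edges |= c_plus   # Step 3: add all edges in C+.
    return edges

# Running example: recovers the 10-edge input graph.
S = {"A": {"a"}, "B": {"b", "c", "d", "e"}, "C": {"f", "g", "h", "i"}}
P = {("A", "B"), ("C", "C")}
G = recover(S, P, c_plus={frozenset("af")}, c_minus={frozenset("fi")})
assert len(G) == 10
```
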
  38–41. Why Lossless Graph Summarization? • Queryable: the neighborhood of a query node can be retrieved rapidly from the summary and corrections • Queryability: such neighborhood queries are key building blocks in numerous graph algorithms (ex: DFS, PageRank, Dijkstra’s, etc.) • Combinable: its outputs are also graphs → they can be further compressed via other compression techniques!
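
As a sketch of why such queries are fast, the lookup below touches only the superedges incident to S_u plus the corrections of u, never the whole graph. The names (S_of, c_plus_of, c_minus_of) are illustrative assumptions:

```python
def neighbors(u, S_of, supernodes, superedges, c_plus_of, c_minus_of):
    """Retrieve N(u) from the summary graph and edge corrections.

    S_of[u] is the id of the supernode containing u; c_plus_of[u] and
    c_minus_of[u] are the sets of nodes incident to u in C+ and C-.
    """
    out = set()
    for X, Y in superedges:
        if X == S_of[u]:
            out |= supernodes[Y]         # tentative neighbors via superedge
        if Y == S_of[u]:
            out |= supernodes[X]
    out.discard(u)                       # a self-pair (A, A) must not yield uu
    out -= c_minus_of.get(u, set())      # drop edges corrected away in C-
    out |= c_plus_of.get(u, set())       # add edges recorded in C+
    return out
```
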
  42–47. Fully Dynamic Graph Stream • A fully dynamic graph can be represented as a sequence {e_t}_{t=0}^∞ of edge additions e_t = {u, v}+ and deletions e_t = {u, v}- • The graph at time t is constructed by aggregating all edge changes up to time t • Stream of changes: starting from the empty graph G_0 at time t = 0, each change (+ or -) is applied in turn, yielding the current graph G_t at time t
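
A tiny sketch of this aggregation, with each change encoded as ((u, v), '+') or ((u, v), '-') (the tuple format is my assumption):

```python
def snapshots(stream):
    """Yield the graph G_t after each change e_t of a fully dynamic stream."""
    G = set()  # G_0 is the empty graph
    for (u, v), sign in stream:
        e = frozenset((u, v))
        if sign == "+":
            G.add(e)       # edge addition e_t = {u, v}+
        else:
            G.discard(e)   # edge deletion e_t = {u, v}-
        yield set(G)       # the current snapshot G_t
```
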
  48–52. Problem Formulation • Given: a fully dynamic graph stream {e_t}_{t=0}^∞ • Retain: the summary graph G*_t = (S_t, P_t) and edge corrections C_t = (C_t+, C_t-) of the graph G_t at each time t • To Minimize: the size of the output representation |P_t| + |C_t+| + |C_t-| • [Figure: the representation retained at time t is updated upon each edge change e_{t+1}]
  53. Challenge: Fast Update but Good Performance
  54. Outline • Preliminaries • Proposed Algorithm: MoSSo • Experimental Results • Conclusions
  55–57. Scheme for Incremental Summarization • Current graph on nodes a–i and its lossless summarization G*: A = {a}, B = {b, c, d, e}, C = {f, g, h, i}, C+ = {af}, C- = {fi}, so φ = |P| + |C+| + |C-| = 4 • New edge: {a, j}; simply recording it as C+ = {af, aj} gives φ = 5 • How do we update the current summarization?
  58–61. Scheme for Incremental Summarization (cont.) • Our approach: (1) attempt to move nodes among supernodes, (2) accept the move if φ decreases, (3) reject it otherwise • The new node j is the testing node, and supernode B is its candidate
  62–63. Scheme for Incremental Summarization (cont.) • MoSSo finds (1) testing nodes whose move likely results in φ ↓ and (2) candidates for each testing node, likely resulting in φ ↓
  64. Scheme for Incremental Summarization (cont.) • Accepting the move of j into B = {b, c, d, e, j} lets superedge AB cover the new edge aj, so C+ = {af}, C- = {fi}, and φ = |P| + |C+| + |C-| = 4 again
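
A sketch of this accept/reject step. The state object with description_cost(), move(), and undo() is a hypothetical interface used only for illustration; MoSSo evaluates the cost change Δφ incrementally rather than recomputing φ from scratch.

```python
def try_move(y, target, state):
    """Tentatively move node y into supernode `target`; keep the move only
    if the description cost phi = |P| + |C+| + |C-| decreases."""
    phi_before = state.description_cost()
    state.move(y, target)                       # (1) attempt the move
    if state.description_cost() < phi_before:
        return True                             # (2) accept: phi decreased
    state.undo()                                # (3) reject otherwise
    return False
```
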
  65–66. MoSSo: Main Ideas • Given a changed edge e = {u, v} • Step 1: Set testing nodes (which nodes to move?) • (S1) No restoration from the current summarization G*_t = (S_t, P_t), C_t = (C_t+, C_t-) • (S2) Reduce redundant testing by a stochastic filtering • Step 2: Find candidates (move into which supernode?) • (S3) Utilize an incremental coarse clustering • (S4) Inject flexibility into the reorganization of supernodes
  67. MoSSo: Details • Input: summary graph G*_t & edge corrections C_t; an edge change {u, v}+ (addition) or {u, v}- (deletion) • Output: summary graph G*_{t+1} & edge corrections C_{t+1} • Parameters: sample number c, escape probability e
  68–73. MoSSo: Details (Step 1) – MCMC • The neighborhood N(u) of an endpoint u of the changed edge e = {u, v} is the most likely to be affected → focus on testing nodes in N(u) • P1: To sample neighbors, one would have to retrieve all of N(u) from G* and C, which takes O(average degree) time on average → deadly to scalability, since by the graph densification law [LKF05] (“the average degree of real-world graphs increases over time”) this cost keeps growing
  74. MoSSo: Details (Step 1) – MCMC (cont.) • S1: Without full retrieval of N(u), sample c neighbors uniformly at random using the Markov chain Monte Carlo (MCMC) method • MCMC method: sampling from a random variable whose probability density is proportional to a given function
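
In the output representation, N(u) is scattered across pieces: the supernodes adjacent to S_u plus C+(u), with C-(u) to be excluded. The sketch below uses rejection sampling as a stand-in for the paper's MCMC formulation; both yield uniform neighbor samples while avoiding the O(average degree) full retrieval of N(u).

```python
import random

def sample_neighbors(pools, invalid, c):
    """Draw c uniform samples from N(u) without materializing it.

    pools:   disjoint node lists whose union contains N(u), e.g. the members
             of each supernode adjacent to S_u and the list C+(u).
    invalid: nodes that are not real neighbors (C-(u) and u itself).
    """
    sizes = [len(pool) for pool in pools]
    samples = []
    while len(samples) < c:
        pool = random.choices(pools, weights=sizes)[0]  # pick a pool ∝ its size
        v = random.choice(pool)                         # then uniform inside it
        if v not in invalid:                            # reject non-neighbors
            samples.append(v)
    return samples
```

Proposals are uniform over the union of the pools, so accepted samples are uniform over the true neighborhood.
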
  75–79. MoSSo: Details (Step 1) – Probabilistic Filtering • Should we test all the sampled nodes? Better not… • P2: High-degree nodes are tested too frequently, since P(v sampled) ∝ deg(v), and testing them is computationally heavy (too many neighbors are involved in updating the optimal encoding and computing the change Δφ in the description cost) • S2: Test a sampled node v w.p. 1/deg(v) • (1) Expensive testing on high-degree nodes is likely avoided • (2) In expectation, P(v actually tested) is the same across all nodes v (i.e., the imbalance in the number of testings is smoothed out)
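
S2 is a one-line filter; a sketch (deg is assumed to map each node to its current degree):

```python
import random

def passes_filter(v, deg):
    """Keep a sampled node v for testing with probability 1 / deg(v).

    Since P(v sampled) grows proportionally to deg(v), keeping v with
    probability 1/deg(v) makes P(v actually tested) roughly uniform over
    all nodes and skips most expensive tests on high-degree nodes.
    """
    return random.random() < 1.0 / deg[v]
```
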
  80–84. MoSSo: Details (Step 2) – Coarse Clustering • P3: Among many choices, how do we know “good” candidates (likely resulting in φ ↓) for a testing node? • S3: Utilize an incremental coarse clustering • Desirable property: nodes with “similar connectivity” end up in the same cluster; any incremental coarse clustering with this property can be used • MoSSo uses min-hashing: (1) fast, with the desirable theoretical property P(u, v in the same cluster) ∝ Jaccard(N(u), N(v)) ⇒ grouping nodes with similar connectivity; (2) clusters from min-hashing are updated rapidly in response to edge changes
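
A one-permutation min-hash sketch of S3: each node is keyed by the minimum-hash member of its neighborhood, so two nodes collide with probability Jaccard(N(u), N(v)) per hash function. MoSSo's incremental maintenance of these groups under edge changes is omitted here; this only illustrates the grouping principle.

```python
import random

def minhash_clusters(N, seed=0):
    """Coarsely cluster nodes by the min-hash of their neighborhoods.

    N: dict mapping each node to its (non-empty) neighbor set.
    """
    rng = random.Random(seed)
    h = {}                          # one random hash value per node id

    def hval(x):
        if x not in h:
            h[x] = rng.random()
        return h[x]

    clusters = {}
    for u, N_u in N.items():
        key = min(N_u, key=hval)    # min-hash signature of N(u)
        clusters.setdefault(key, []).append(u)
    return list(clusters.values())
```
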
  85–90. MoSSo: Details (Step 2) – Separation of Nodes • P4: Moving nodes among existing supernodes only decreases or maintains |S| → reorganization of the supernodes is discouraged in the long run • S4: Instead of finding a candidate, separate the testing node y from S_y and create a singleton supernode {y} w.p. the escape probability e → injects flexibility into the supernodes (a partition of V), with an empirically significant improvement in compression rates • Similar to before, the separation is accepted or rejected depending on Δφ
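
A sketch of Step 2 with separation, reusing the hypothetical state interface from the move sketch above (separate(y) is an assumed method that creates the singleton supernode {y}):

```python
import random

def step2(y, candidate, e, state):
    """With escape probability e, try separating y into a singleton
    supernode; otherwise try moving y into the candidate supernode.
    Either trial is kept only if the description cost phi decreases."""
    phi_before = state.description_cost()
    if random.random() < e:
        state.separate(y)            # S4: create singleton supernode {y}
    else:
        state.move(y, candidate)     # ordinary move into the candidate
    if state.description_cost() < phi_before:
        return True                  # accept: delta-phi < 0
    state.undo()                     # reject otherwise
    return False
```
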
  91. Outline • Preliminaries • Proposed Algorithm: MoSSo • Experimental Results • Conclusions
  92–97. Experimental Settings • 10 real-world graphs (up to 0.3B edges) from various domains: web, social, collaboration, email, and others • Competitors: batch lossless graph summarization algorithms Randomized [NRS08], SAGS [KNL15], and SWeG [SGKR19]
  98. Baseline Incremental Algorithms • MoSSo-Greedy: greedily moves nodes related to the inserted/deleted edge, while fixing the other nodes, so that the objective is minimized • MoSSo-MCMC: see the paper for details • MoSSo-Simple: MoSSo without coarse clustering
  99–101. Experimental Results: Speed • MoSSo processed each change up to 7 orders of magnitude faster than rerunning the fastest batch algorithm • [Figures: per-change processing time on insertion-only graph streams (e.g., UK) and on fully dynamic graph streams]
  102–104. Experimental Results: Compression Performance • The compression ratio (|P| + |C+| + |C-|) / |E| of MoSSo was comparable even to those of the best batch algorithms • MoSSo achieved the best compression ratios among the streaming algorithms • [Figures: compression ratios on the PR, EN, FB, DB, YT, SK, LJ, EU, HW, and UK datasets]
  105–106. Experimental Results: Scalability • MoSSo processed each change in near-constant time • [Figures: EU (insertion-only) and SK (fully dynamic)]
  107. Outline • Preliminaries • Proposed Algorithm: MoSSo • Experimental Results • Conclusions
  108–112. Conclusions • We propose MoSSo, the first algorithm for incremental lossless graph summarization • Fast and ‘any time’ • Effective • Scalable • The code and datasets used in the paper are available at http://dmlab.kaist.ac.kr/mosso/
  113. Incremental Lossless Graph Summarization. Jihoon Ko*, Yunbum Kook*, Kijung Shin
