"Incremental Lossless Graph Summarization", KDD 2020

Incremental Lossless
Graph Summarization
Jihoon Ko* Yunbum Kook* Kijung Shin
Large-scale Graphs are Everywhere!
Icon made by Freepik from www.flaticon.com
2B+ active users
600M+ users
1.5B+ users
Large-scale Graphs are Everywhere! (cont.)
4B+ web pages 5M papers 6K+ proteins
Icon made by Freepik from www.flaticon.com
Graph Compression for Efficient Manipulation
• Handling large-scale graphs as they are → heavy disk or network I/O
• Their compact representation makes efficient manipulation possible!
• A larger portion of the original graph can be stored in main memory or cache
Previous Graph Compression Techniques
• Various compression techniques have been proposed
• Relabeling nodes
• Pattern mining
• Lossless graph summarization → one of the most effective compression techniques
• …
• Lossless graph summarization is a batch algorithm for “static graphs”,
which are a single (or a few) snapshots of evolving graphs
However, most real-world graphs in fact go through lots of changes...
Real-world Graphs are Evolving
2M+ users → 2B+ users over 10 years
Previous algorithms: not designed to allow for changes in graphs
→ Algorithms must be rerun from scratch to reflect each change
Solution: Incrementally update compressed graphs in a fast and effective manner!
Outline
• Preliminaries
• Proposed Algorithm: MoSSo
• Experimental Results
• Conclusions
Lossless Graph Summarization: Example
[Figure: input graph on nodes a–i, summarized step by step]
Input graph with 11 edges
→ Merge nodes into supernodes A = {a} and B = {b, c, d, e}, with correction Delete {f, i}
→ Merge C = {f, g, h, i}, with corrections Add {a, f} & Delete {f, i}
→ Output with 4 edges
Lossless Graph Summarization: Definition
Lossless summarization takes an input graph G = (V, E) and yields (1) a summary graph G* = (S, P) and (2) edge corrections (C+, C-), while minimizing the edge count |P| + |C+| + |C-| (≈ “description cost”, denoted by φ)
Proposed in [NRS08] based on “the Minimum Description Length principle”
Ex: A = {a}, B = {b, c, d, e}, C = {f, g, h, i}; C+ = {af}, C- = {fi}
Lossless Graph Summarization: Definition (cont.)
1. Summary graph G* = (S, P)
• Supernodes S = a partition of V, where each supernode is a set of nodes
• Superedges P = a set of pairs of supernodes (ex: {A, B} in the example above)
2. Edge corrections (C+, C-)
• Residual graph (positive) C+
• Residual graph (negative) C-
Lossless Graph Summarization: Notation
• Supernode containing u: S_u (i.e., u ∈ S_u)
• Edges between supernodes A and B: E_AB = {uv ∈ E : u ∈ A, v ∈ B, u ≠ v}
• All possible edges between A and B: T_AB = {{u, v} ⊆ V : u ∈ A, v ∈ B, u ≠ v}
• Neighborhood of a node u: N(u) = {v ∈ V : uv ∈ E}
• Nodes incident to u in C+ (or C-): C+(u) (or C-(u))
• Compression rate: (|P| + |C+| + |C-|) / |E|
Lossless Graph Summarization: Optimal Encoding
For summarization, determining the supernodes S (a partition of V) is our main concern
→ Given S, the superedges P and edge corrections C are optimally determined
Lossless Graph Summarization: Optimal Encoding (cont.)
The edges E_AB between two supernodes are encoded as either (1) a superedge with C- or (2) no superedge with C+
Case 1: |E_AB| ≥ (|T_AB| + 1) / 2 → add superedge AB to P and T_AB \ E_AB to C- (cost: 1 + |T_AB| − |E_AB|)
Case 2: |E_AB| < (|T_AB| + 1) / 2 → add all edges in E_AB to C+ (cost: |E_AB|)
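A minimal sketch of this encoding rule in Python (function and variable names are our own, not the paper's API): given two supernodes and the edge set (edges stored as frozensets), it picks the cheaper of the two cases.

```python
from itertools import combinations

def encode_pair(A, B, edges):
    """Encode the edges between supernodes A and B optimally.
    Returns (has_superedge, C_plus, C_minus)."""
    if A == B:  # node pairs inside a single supernode
        T = {frozenset(p) for p in combinations(A, 2)}
    else:
        T = {frozenset((u, v)) for u in A for v in B if u != v}
    E = {p for p in T if p in edges}  # edges actually present
    if E and len(E) >= (len(T) + 1) / 2:
        # Case 1: superedge AB plus negative corrections T_AB \ E_AB
        return True, set(), T - E
    # Case 2: no superedge; all present edges go to C+
    return False, E, set()

# Ex: A = {a}, B = {b, c, d, e} from the running example
edges = {frozenset(e) for e in [("a","b"), ("a","c"), ("a","d"), ("a","e")]}
print(encode_pair({"a"}, {"b","c","d","e"}, edges))
# -> (True, set(), set()): one superedge, no corrections needed
```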
Lossless Graph Summarization: Optimal Encoding (Example)
Encoding the A–B edges with superedge AB: φ = |P| + |C+| + |C-| = 2 + 1 + 1 = 4, with C+ = {af}, C- = {fi}
Encoding them with C+ only: φ = 1 + 5 + 1 = 7, with C+ = {ab, ac, ad, ae, af}, C- = {fi}
Recovery: Example
Given A = {a}, B = {b, c, d, e}, C = {f, g, h, i} with C+ = {af} and C- = {fi}:
(1) Add all pairs of nodes between two adjacent supernodes
(2) Remove all edges in C-
(3) Add all edges in C+
→ The original graph on nodes a–i is recovered exactly
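The three recovery steps translate directly into set operations. Below is a minimal sketch (the data layout and names are our assumptions, not the paper's API): supernodes as a dict of node sets, superedges as ordered name pairs, and corrections as sets of frozenset edges.

```python
def recover(supernodes, superedges, c_plus, c_minus):
    """Rebuild the exact edge set from a summary graph + corrections."""
    edges = set()
    for X, Y in superedges:  # (X, X) encodes a superedge inside one supernode
        A, B = supernodes[X], supernodes[Y]
        # (1) add every node pair between the two adjacent supernodes
        edges |= {frozenset((u, v)) for u in A for v in B if u != v}
    edges -= set(c_minus)    # (2) remove all edges in C-
    edges |= set(c_plus)     # (3) add all edges in C+
    return edges
```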
Why Lossless Graph Summarization?
• Queryable (retrieving the neighborhood of a query node)
• Neighborhood queries: a key building block in numerous graph algorithms (ex: DFS, PageRank, Dijkstra’s, etc.)
• Rapidly answered from a summary and corrections
• Combinable
• Its outputs are also graphs → they can be further compressed via other compression techniques!
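As an illustration of queryability, here is a sketch of a neighborhood query answered straight from the summary and corrections, without decompressing the whole graph. It reuses the assumed layout of the recovery sketch plus an S_of map from node to supernode name; the usage below wires up the running example under our assumption that P = {AB, CC}.

```python
def neighbors(u, S_of, supernodes, superedges, c_plus, c_minus):
    """Answer N(u) from the compressed representation."""
    su, out = S_of[u], set()
    for X, Y in superedges:          # collect superedge-implied neighbors
        if su == X:
            out |= supernodes[Y]
        if su == Y:
            out |= supernodes[X]
    out.discard(u)                   # simple graphs: no self-loops
    out -= {next(iter(e - {u})) for e in c_minus if u in e}  # drop C- partners
    out |= {next(iter(e - {u})) for e in c_plus if u in e}   # add C+ partners
    return out

S_of = {"a": "A", "b": "B", "c": "B", "d": "B", "e": "B",
        "f": "C", "g": "C", "h": "C", "i": "C"}
supernodes = {"A": {"a"}, "B": {"b","c","d","e"}, "C": {"f","g","h","i"}}
superedges = {("A", "B"), ("C", "C")}
c_plus = {frozenset(("a", "f"))}; c_minus = {frozenset(("f", "i"))}
print(sorted(neighbors("f", S_of, supernodes, superedges, c_plus, c_minus)))
# -> ['a', 'g', 'h']: superedge CC gives g, h, i; C- drops i; C+ adds a
```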
Fully Dynamic Graph Stream
A fully dynamic graph can be represented by a sequence {e_t}, t = 0, 1, 2, …, of edge additions e_t = {u, v}+ and deletions e_t = {u, v}−
→ The graph G_t at time t is constructed by aggregating all edge changes up to time t
Stream of changes: + − − + + … starting from the empty graph G_0 at time t = 0 and producing the current graph G_t at time t
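A tiny sketch of this stream model (the (u, v, op) tuple encoding is our assumption):

```python
def graph_at(stream, t):
    """Aggregate the first t edge changes into the edge set of G_t."""
    edges = set()
    for u, v, op in stream[:t]:
        e = frozenset((u, v))
        if op == '+':
            edges.add(e)       # edge addition
        else:
            edges.discard(e)   # edge deletion
    return edges

# Ex: G_3 after adding {a,b}, adding {a,c}, then deleting {a,b}
print(graph_at([("a","b","+"), ("a","c","+"), ("a","b","-")], 3))
# -> {frozenset({'a', 'c'})}
```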
Problem Formulation
• Given: a fully dynamic graph stream {e_t}, t = 0, 1, 2, …
• Retain: the summary graph G*_t = (S_t, P_t) and edge corrections C_t = (C+_t, C-_t) of the graph G_t at each time t
• To Minimize: the size of the output representation |P_t| + |C+_t| + |C-_t|
→ Whenever an edge change e_{t+1} arrives, the retained summary graph and edge corrections must be updated to reflect it
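In code form, the problem is a maintenance loop over the stream. This skeleton only fixes the interface; the state layout and the `update` hook are our assumptions, to be filled in by an incremental rule such as MoSSo:

```python
def summarize_stream(stream, update):
    """Maintain (G*_t, C_t) while consuming the change stream."""
    state = {"S": {}, "P": set(), "C+": set(), "C-": set()}
    for change in stream:
        update(state, change)   # reflect e_{t+1} in G*_{t+1}, C_{t+1}
        yield state             # retained outputs at each time t
```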
Challenge: Fast Updates yet Good Compression Performance
Outline
• Preliminaries
• Proposed Algorithm: MoSSo
• Experimental Results
• Conclusions
Scheme for Incremental Summarization
Current state: the graph on nodes a–i and its lossless summarization G* (A = {a}, B = {b, c, d, e}, C = {f, g, h, i}) with C+ = {af}, C- = {fi}, so φ = |P| + |C+| + |C-| = 4
A new edge {a, j} arrives, attaching a new node j → stored trivially in the corrections, so C+ = {af, aj} and φ = 5
How to update the current summarization?
Our approach:
(1) Attempt to move nodes among supernodes
(2) Accept a move if φ decreases
(3) Reject it otherwise
MoSSo finds...
(1) testing nodes whose move likely results in φ ↓
(2) candidates for each testing node, likely resulting in φ ↓
Ex: testing node j moves into candidate supernode B → superedge AB now covers {a, j}, C+ shrinks back to {af}, and φ = 4 again
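A brute-force sketch of this accept/reject rule, under our own minimal data model (S as a dict of node sets, edges as frozensets). Note that MoSSo computes the change Δφ incrementally; recomputing φ from scratch, as below, is only for illustration.

```python
from itertools import combinations

def phi(S, edges):
    """Description cost |P| + |C+| + |C-| under the optimal encoding,
    recomputed naively over every supernode pair."""
    groups, cost = list(S.values()), 0
    for i, A in enumerate(groups):
        for B in groups[i:]:
            if A is B:
                T = {frozenset(p) for p in combinations(A, 2)}
            else:
                T = {frozenset((u, v)) for u in A for v in B}
            E = {p for p in T if p in edges}
            if E:  # cheaper of: superedge + C-, or C+ only
                cost += min(len(E), 1 + len(T) - len(E))
    return cost

def try_move(y, target, S, edges):
    """Move node y into supernode `target`; keep the move iff φ drops."""
    src = next(k for k, g in S.items() if y in g)
    before = phi(S, edges)
    S[src].discard(y); S[target].add(y)
    if phi(S, edges) >= before:           # reject: φ did not decrease
        S[target].discard(y); S[src].add(y)
        return False
    if not S[src]:
        del S[src]                        # drop an emptied supernode
    return True

# Running example: moving j from its singleton into B lowers φ
S = {"A": {"a"}, "B": {"b", "c", "d", "e"}, "J": {"j"}}
edges = {frozenset(("a", x)) for x in "bcdej"}
print(try_move("j", "B", S, edges), phi(S, edges))  # -> True 1
```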
MoSSo: Main Ideas
Given a changed edge e = {u, v}:
• Step 1: Set the testing nodes (which nodes to move?)
• (S1) No restoration of the graph from the current summarization G*_t = (S_t, P_t), C_t = (C+_t, C-_t)
• (S2) Reduce redundant testing by stochastic filtering
• Step 2: Find candidates (move into which supernode?)
• (S3) Utilize an incremental coarse clustering
• (S4) Inject flexibility into the reorganization of supernodes
MoSSo: Details
Parameters:
• Sample number c
• Escape probability e
Input:
• Summary graph G*_t & edge corrections C_t
• Edge change {u, v}+ (addition) or {u, v}− (deletion)
Output:
• Summary graph G*_{t+1} & edge corrections C_{t+1}
MoSSo: Details (Step 1) – MCMC
Notation: N(u) is the neighborhood of a node u
The neighborhood N(u) of an endpoint u of the changed edge e = {u, v} is the most likely to be affected
→ Focus on testing nodes in N(u)
P1. To sample neighbors, one would have to retrieve all of N(u) from G* and C, which takes O(average degree) time on average
→ Deadly to scalability…
Graph densification law [LKF05]: “The average degree of real-world graphs increases over time.”
S1. Without fully retrieving N(u), sample c neighbors uniformly at random using the Markov Chain Monte Carlo (MCMC) method
→ MCMC: sampling from a random variable whose probability density is proportional to a given function
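For intuition, here is a generic Metropolis-style sketch of the MCMC idea the slide cites: drawing from a distribution proportional to f using only pointwise evaluations of f. It illustrates the principle only; it is not MoSSo's actual neighbor sampler, and all names are ours.

```python
import random

def mcmc_sample(states, f, steps=200, seed=None):
    """Return a state drawn (approximately) with probability ∝ f(state)."""
    rng = random.Random(seed)
    x = rng.choice([s for s in states if f(s) > 0])  # valid start state
    for _ in range(steps):
        y = rng.choice(states)                       # symmetric uniform proposal
        if rng.random() < min(1.0, f(y) / f(x)):
            x = y                                    # Metropolis accept
    return x

# With f constant on the valid set, the samples are (approximately) uniform
print(mcmc_sample(list("bcde"), lambda v: 1.0, seed=7))
```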
MoSSo: Details (Step 1) – Probabilistic Filtering
Test all the sampled nodes? → Better not…
P2. Testing would fall too frequently on high-degree nodes, since P(v is sampled) ∝ deg(v), and testing such nodes is computationally heavy (too many neighbors):
- updating the optimal encoding
- computing the change Δφ in the description cost
S2. Test a sampled node v w.p. 1/deg(v)
(1) Likely avoids expensive testing on high-degree nodes
(2) In expectation, P(v is actually tested) is the same across all nodes v (i.e., it smooths the imbalance in the number of tests)
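S2 is a one-liner in practice; a sketch with assumed inputs (a sampled-node list and a degree map):

```python
import random

def filtered_tests(sampled, deg):
    """Keep each sampled node v with probability 1/deg(v), so that
    P(sampled) * P(kept | sampled) no longer depends on the degree."""
    return [v for v in sampled if random.random() < 1.0 / deg[v]]
```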
MoSSo: Details (Step 2) – Coarse Clustering
P3. Among many choices, how do we find “good” candidates for a testing node y (i.e., candidates likely resulting in φ ↓)?
S3. Utilize an incremental coarse clustering
→ Desirable property: nodes with “similar connectivity” fall in the same cluster
→ Any incremental coarse clustering with this property will do!
Our choice: min-hashing
(1) Fast, with the desirable theoretical property:
P(u, v in the same cluster) ∝ Jaccard(N(u), N(v))
⇒ groups nodes with similar connectivity
(2) Clusters from min-hashing are updated rapidly in response to edge changes
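A one-hash sketch of min-hash clustering (our own minimal version, assuming every node has a non-empty neighborhood; for a single random hash, two nodes collide with probability exactly Jaccard(N(u), N(v))):

```python
import random

def minhash_clusters(adj, seed=0):
    """Group nodes by the min-hash of their neighborhoods."""
    rng = random.Random(seed)
    nodes = set(adj) | {v for nbrs in adj.values() for v in nbrs}
    h = {v: rng.random() for v in nodes}       # one random hash value per node
    clusters = {}
    for u in adj:
        key = min(adj[u], key=h.__getitem__)   # min-hash signature of N(u)
        clusters.setdefault(key, set()).add(u)
    return clusters

# b..e all have N = {a}, so they land in one cluster
print(minhash_clusters({v: {"a"} for v in "bcde"}))
```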
MoSSo: Details (Step 2) – Separation of Node
P4. Moving nodes this way only decreases or maintains |S|
→ Discourages reorganization of the supernodes in the long run
S4. Instead of moving to a found candidate, with escape probability e, separate y from S_y and create a singleton supernode S_y = {y}
→ Injects flexibility into the supernodes (a partition of V)
→ Empirically, a significant improvement in compression rates
As before, accept or reject the separation depending on Δφ
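Combining S3 and S4, the target-selection step can be sketched as follows (the singleton-naming scheme is an assumption; the returned target then goes through the same Δφ accept/reject test as the try_move sketch above):

```python
import random

def choose_target(y, candidates, S, escape_prob):
    """Pick a move target for testing node y: usually a clustered
    candidate supernode, but with probability `escape_prob` a fresh
    singleton, so the partition can also grow."""
    if random.random() < escape_prob or not candidates:
        name = ("singleton", y)   # fresh supernode id (assumed scheme)
        S[name] = set()
        return name
    return random.choice(list(candidates))
```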
Outline
• Preliminaries
• Proposed Algorithm: MoSSo
• Experimental Results
• Conclusions
Experimental Settings
• 10 real-world graphs (up to 0.3B edges): web, social, collaboration, email, and others!
• Batch lossless graph summarization algorithms as baselines:
• Randomized [NRS08], SAGS [KNL15], SWeG [SGKR19]
Baseline Incremental Algorithms
• MoSSo-Greedy
• Greedily moves the nodes related to each inserted/deleted edge, while fixing all other nodes, so that the objective is minimized
• MoSSo-MCMC
• See the paper for details
• MoSSo-Simple
• MoSSo without coarse clustering
Experiment results: Speed
• MoSSo processed each change up to 7 orders of magnitude faster than running the fastest batch algorithm
[Figures: per-change speed on insertion-only graph streams (ex: UK) and fully dynamic graph streams]
Experiment results: Compression Performance
Notation: compression ratio = (|P| + |C+| + |C-|) / |E|
• The compression ratio of MoSSo was even comparable to those of the best batch algorithms
• MoSSo achieved the best compression ratios among the streaming algorithms
[Figures: compression ratios on UK and on PR, EN, FB, DB, YT, SK, LJ, EU, HW]
Experiment results: Scalability
• MoSSo processed each change in near-constant time
[Figures: EU (insertion-only) and SK (fully dynamic)]
Outline
• Preliminaries
• Proposed Algorithm: MoSSo
• Experimental Results
• Conclusions
Conclusions
We propose MoSSo, the first algorithm for incremental lossless graph summarization
• Fast and ‘any time’
• Effective
• Scalable
The code and datasets used in the paper are available at http://dmlab.kaist.ac.kr/mosso/
Incremental Lossless
Graph Summarization
Jihoon Ko* Yunbum Kook* Kijung Shin