Presentation slides for Jihoon Ko*, Yunbum Kook*, and Kijung Shin, "Incremental Lossless Graph Summarization", KDD 2020.
Given a fully dynamic graph, represented as a stream of edge insertions and deletions, how can we obtain and incrementally update a lossless summary of its current snapshot?
As large-scale graphs are prevalent, representing them concisely is essential for efficient storage and analysis. Lossless graph summarization is an effective graph-compression technique with many desirable properties. It aims to compactly represent the input graph as (a) a summary graph consisting of supernodes (i.e., sets of nodes) and superedges (i.e., edges between supernodes), which provide a rough description, and (b) edge corrections, which fix the errors induced by that rough description. While a number of batch algorithms, suited for static graphs, have been developed for rapid and compact graph summarization, they are highly inefficient in terms of time and space on dynamic graphs, which are common in practice.
In this work, we propose MoSSo, the first incremental algorithm for lossless summarization of fully dynamic graphs. In response to each change in the input graph, MoSSo updates the output representation by repeatedly moving nodes among supernodes. MoSSo decides which nodes to move, and where to move them, carefully but rapidly, based on several novel ideas. Through extensive experiments on 10 real graphs, we show that MoSSo is (a) fast and 'any time': processing each change in near-constant time (less than 0.1 millisecond), up to 7 orders of magnitude faster than running state-of-the-art batch methods, (b) scalable: summarizing graphs with hundreds of millions of edges while requiring sub-linear memory during the process, and (c) effective: achieving compression ratios comparable even to those of state-of-the-art batch methods.
2. Large-scale Graphs are Everywhere!
Icon made by Freepik from www.flaticon.com
2B+ active users
600M+ users
1.5B+ users
3. Large-scale Graphs are Everywhere! (cont.)
4B+ web pages 5M papers 6K+ proteins
4. Graph Compression for Efficient Manipulation
• Handling large-scale graphs as they are → heavy disk or network I/O
• Their compact representation makes efficient manipulation possible!
• A larger portion of the original graph can be stored in main memory or cache
7. Previous Graph Compression Techniques
• Various compression techniques have been proposed:
• Relabeling nodes
• Pattern mining
• Lossless graph summarization → one of the most effective compression techniques
• …
• Lossless graph summarization algorithms are batch algorithms for "static graphs",
i.e., a single snapshot (or a few snapshots) of an evolving graph
However, most real-world graphs in fact undergo continual change...
12. Real-world Graphs are Evolving
[Figure: 2M+ users → 2B+ users over 10 years]
Previous algorithms: not designed to accommodate changes in graphs
→ they must be rerun from scratch to reflect each change
Solution: incrementally update the compressed graph in a fast and effective manner!
40. Why Lossless Graph Summarization?
• Queryable (retrieving the neighborhood of a query node)
• Queryability: a key building block in numerous graph algorithms
(e.g., DFS, PageRank, Dijkstra's)
• Done rapidly from the summary and corrections
• Combinable
• Its outputs are also graphs → they can be further compressed via other compression techniques!
[Figure: summary graph G* = (S, P) with supernodes A, B, C, and edge corrections (C+, C−)]
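The queryability property above can be illustrated with a minimal sketch (toy data structures and node names are illustrative, not the authors' implementation): a node's neighborhood is recovered by expanding the superedges incident to its supernode and then applying the edge corrections.

```python
# Minimal sketch of neighborhood queries on a lossless graph summary.
# Supernode contents and correction edges below are hypothetical examples.

supernodes = {"A": {"a"}, "B": {"b", "c", "d", "e"}}          # supernode -> members
member_of = {v: S for S, mem in supernodes.items() for v in mem}
superedges = {frozenset({"A", "B"})}                          # edges between supernodes
c_plus = {frozenset({"b", "c"})}                              # C+: edges to add back
c_minus = {frozenset({"a", "e"})}                             # C-: edges to remove

def neighbors(v):
    """Reconstruct N(v) from the summary graph and corrections."""
    S = member_of[v]
    out = set()
    for se in superedges:
        if S in se:
            for T in (se - {S} or {S}):       # {S} itself for a self-loop superedge
                out |= supernodes[T]
    out.discard(v)                            # simple graph: no self-edges
    out |= {u for e in c_plus for u in e if v in e and u != v}
    out -= {u for e in c_minus for u in e if v in e and u != v}
    return out
```

Note that only the superedges incident to one supernode and the corrections touching `v` are consulted, which is what makes the query fast.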
42. Fully Dynamic Graph Stream
A fully dynamic graph can be represented as a sequence {e_t} (t = 1, 2, …)
of edge additions e_t = ({u, v}, +) and deletions e_t = ({u, v}, −)
→ The graph at time t is constructed by aggregating all edge changes up to time t
[Figure: stream of changes (+ − − + − + …) turning the empty graph G_0 at time t = 0
into the current graph G_t at time t]
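The aggregation described above can be sketched in a few lines (a toy replay, not an incremental data structure):

```python
# Minimal sketch: building the snapshot G_t by replaying a fully dynamic
# edge stream {e_t} of additions ('+') and deletions ('-').

def snapshot(stream):
    """Aggregate all edge changes into the current edge set."""
    edges = set()                      # G_0 is the empty graph
    for (u, v), op in stream:
        e = frozenset({u, v})
        if op == "+":
            edges.add(e)
        else:                          # op == "-"
            edges.discard(e)
    return edges

stream = [(("a", "b"), "+"), (("b", "c"), "+"), (("a", "b"), "-")]
current = snapshot(stream)             # -> {frozenset({"b", "c"})}
```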
48. Problem Formulation
• Given: a fully dynamic graph stream {e_t} (t = 1, 2, …)
• Retain: a summary graph G*_t = (S_t, P_t) and edge corrections C_t = (C+_t, C−_t)
of the graph G_t at time t
• To Minimize: the size of the output representation |P_t| + |C+_t| + |C−_t|
[Figure: at each edge change e_{t+1}, the retained summary graph G* = (S, P),
its supernodes A, B, C, and the edge corrections (C+, C−) are updated]
69. MoSSo: Details (Step 1) – MCMC
Notation: N(x) is the neighborhood of a node x
The neighborhood N(u) of an endpoint u of the changed edge e = {u, v} is the most likely to be affected
→ Focus on testing nodes in N(u)
P1. To sample neighbors, one would have to retrieve all of N(u) from
G* and C, which takes O(average degree) time on average
→ Deadly to scalability…
Graph densification law [LKF05]:
"The average degree of real-world graphs increases over time."
74. MoSSo: Details (Step 1) – MCMC (cont.)
S1. Without fully retrieving N(u), sample a fixed number of neighbors
uniformly at random using the Markov Chain Monte Carlo (MCMC) method
→ MCMC: sampling from a random variable whose probability density is
proportional to a given function
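The MCMC idea invoked above can be illustrated with a generic Metropolis sampler (a textbook sketch, not MoSSo's neighbor sampler): states are visited with probability proportional to a target function f, using only ratios f(y)/f(x), so no normalization or full enumeration of the target distribution is needed.

```python
# Minimal sketch of the Metropolis (MCMC) method: sample states with
# probability proportional to a given function f. The state space and f
# here are illustrative toys.

import random

def metropolis(states, f, steps=20000, seed=0):
    rnd = random.Random(seed)
    x = states[0]
    counts = {s: 0 for s in states}
    for _ in range(steps):
        y = rnd.choice(states)                    # symmetric proposal
        if rnd.random() < min(1.0, f(y) / f(x)):  # Metropolis acceptance
            x = y
        counts[x] += 1
    return counts

# target P(s) proportional to f(s); a constant f yields the uniform distribution
counts = metropolis(["a", "b", "c"], lambda s: 1.0)
```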
75. MoSSo: Details (Step 1) – Probabilistic Filtering
Test all the sampled nodes? → Better not…
P2. High-degree nodes would be tested too frequently, since
P(v sampled) ∝ deg(v), and testing them is computationally heavy
(too many neighbors):
- updating the optimal encoding
- computing the change Δ in the description cost
S2. Test a sampled node v with probability 1/deg(v)
(1) Likely to avoid expensive testing on high-degree nodes
(2) In expectation, P(v is actually tested) is the same across all nodes v
(i.e., it smooths the imbalance in the number of tests)
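The filtering step above can be sketched as follows (hypothetical helper names): since P(v sampled) ∝ deg(v) and P(tested | sampled) = 1/deg(v), the product is the same for every node, so each node is tested equally often in expectation.

```python
# Minimal sketch of probabilistic filtering: a sampled node v is tested
# only with probability 1/deg(v), equalizing the expected number of tests.

import random

def degree_proportional_sample(adj):
    """Sample a node with probability proportional to its degree."""
    nodes = list(adj)
    weights = [len(adj[v]) for v in nodes]
    return random.choices(nodes, weights=weights, k=1)[0]

def maybe_test(v, adj, test_fn):
    """Run the (expensive) test on v only with probability 1/deg(v)."""
    if random.random() < 1.0 / len(adj[v]):
        return test_fn(v)
    return None
```

In a star graph, for instance, the hub is sampled far more often than the leaves but is filtered out proportionally more often, so all nodes end up tested at the same expected rate.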
81. MoSSo: Details (Step 2) – Coarse Clustering
P3. Among the many choices of destination for the testing node u,
how do we know "good" candidates y (those likely to decrease the cost)?
S3. Utilize an incremental coarse clustering
→ Desirable: nodes with "similar connectivity" end up in the same cluster
→ Any incremental coarse clustering with this desirable property will do!
Min-hashing:
(1) Fast, with the desirable theoretical property:
P(u and v receive the same min-hash value) = Jaccard-similarity(N(u), N(v))
≈ grouping nodes with similar connectivity
(2) Clusters from min-hashing are updated rapidly in response to edge changes
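The min-hashing property stated above can be checked empirically with a small sketch (the seeded hash construction is illustrative, not the authors' implementation): two sets receive the same min-hash exactly when the minimum-keyed element of their union is shared, which happens with probability equal to their Jaccard similarity.

```python
# Minimal sketch of the min-hash / Jaccard property used for coarse clustering:
# P(minhash(A) == minhash(B)) equals Jaccard(A, B).

import random

def minhash(s, seed):
    """Min-hash of a set: the element minimizing a seeded random key."""
    return min(s, key=lambda x: random.Random(f"{seed}:{x}").random())

def jaccard(a, b):
    return len(a & b) / len(a | b)

a = {"x", "y", "z", "w"}
b = {"x", "y", "z", "q"}

trials = 2000
agree = sum(minhash(a, s) == minhash(b, s) for s in range(trials))
# agree / trials estimates jaccard(a, b) = 3/5
```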
86. MoSSo: Details (Step 2) – Separation of Node
P4. Moving nodes between existing supernodes can only decrease or maintain
the number of supernodes
→ This discourages reorganizing supernodes in the long run
S4. With escape probability θ, instead of finding a candidate destination,
separate the testing node u from its supernode S_u into a new singleton supernode
→ Injects flexibility into the supernodes (a partition of V)
→ Empirically, a significant improvement in compression rates
Similar to before, accept or reject the separation depending on the change Δ
in the description cost
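The move/separate step above can be sketched as follows (the cost function, escape probability, and data structures are hypothetical; this is a sketch of the idea, not the authors' implementation): with some probability the node escapes into a singleton supernode, otherwise it is moved toward a candidate's supernode, and the change is kept only if the description cost does not increase.

```python
# Minimal sketch of MoSSo-style node moves with escape (separation).

import random

def try_move(u, supernode_of, supernodes, candidates, cost, escape_prob=0.3):
    """Either separate u into a singleton supernode (w.p. escape_prob) or
    propose moving u into a candidate's supernode; keep the change only
    if the description cost does not increase."""
    before = cost()
    src = supernode_of[u]

    def relocate(frm, to):
        supernodes[frm].discard(u)
        supernodes.setdefault(to, set()).add(u)
        supernode_of[u] = to

    if random.random() < escape_prob:
        dst = ("singleton", u)              # new singleton supernode {u}
    else:
        dst = supernode_of[random.choice(candidates)]
    if dst == src:
        return False
    relocate(src, dst)
    if cost() > before:                     # reject: roll back the move
        relocate(dst, src)
        return False
    return True
```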
97. Experimental Settings
• 10 real-world graphs (up to 0.3B edges):
web, social, collaboration, email, and others!
• Batch lossless graph summarization algorithms:
• Randomized [NSR08], SAGS [KNL15], SWeG [SGKR19]
98. Baseline Incremental Algorithms
• MoSSo-Greedy:
• Greedily moves the nodes related to each inserted/deleted edge, while fixing
the other nodes, so that the objective is minimized
• MoSSo-MCMC:
• See the paper for details
• MoSSo-Simple:
• MoSSo without coarse clustering
99. Experiment Results: Speed
• MoSSo processed each change up to 7 orders of magnitude faster
than rerunning the fastest batch algorithm
[Plots: insertion-only and fully dynamic graph streams; UK (insertion-only) shown]
102. Experiment Results: Compression Performance
Notation – compression ratio: (|P| + |C+| + |C−|) / |E|
• The compression ratio of MoSSo was comparable even to those of the
best batch algorithms
• MoSSo achieved the best compression ratios among the streaming algorithms
[Plots on the 10 datasets: PR, EN, FB, DB, YT, SK, LJ, EU, HW, UK]
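The compression ratio used above is simply the size of the output representation relative to the original edge count; a tiny sketch with illustrative numbers:

```python
# Compression ratio from the slides: (|P| + |C+| + |C-|) / |E|,
# i.e., superedges plus correction edges over original edges.
# The example numbers are illustrative only.

def compression_ratio(num_superedges, num_c_plus, num_c_minus, num_edges):
    return (num_superedges + num_c_plus + num_c_minus) / num_edges

# e.g., 30 superedges and 10 + 5 correction edges for a 100-edge graph
ratio = compression_ratio(30, 10, 5, 100)   # -> 0.45, i.e., 55% saved
```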
109. Conclusions
We propose MoSSo, the first algorithm for incremental lossless graph summarization:
fast and "any time", effective, and scalable.
The code and datasets used in the paper are available at http://dmlab.kaist.ac.kr/mosso/