SWeG: Lossless and
Lossy Summarization of
Web-Scale Graphs
Kijung Shin Amol Ghoting Myunghwan Kim Hema Raghavan
SWeG: Lossless and Lossy Summarization of Web-Scale Graphs (by Kijung Shin)
Graphs are Everywhere
Graphs are Everywhere (cont.)
Graphs are Everywhere (cont.)
Real-world Graphs are Huge
2B+ active users
500M+ products
300M+ customers
4B+ Web pages
600M+ users
20B+ connections
× 30+
How can we analyze and utilize
such large graph data?
Limitations of Existing Tools
• Graph algorithms in textbooks
◦ Assume graphs fit in main memory (i.e., random-access memory)
• Tools for out-of-core graph processing
◦ Not for every graph algorithm
◦ Requiring engineering for each algorithm
◦ Inappropriate for real-time applications
1000+ new graph algorithms per year!
Solution: Graph Compression
Figure labels: Textbook algorithms, Any new algorithms, Real-time algorithms
• Compressing large graphs so that
◦ Compressed data fit in main memory
◦ Algorithms can be performed on compressed data without changes
Roadmap
• Problem Definition <<
• Proposed Method: SWeG
• Experimental Results
• Conclusions
Graph Summarization: Example
• Graph summarization is a graph compression technique
Figure: the input graph on nodes 𝑎 to 𝑔 (w/ 9 edges) is summarized by merging {𝑎, 𝑏}, then {𝑐, 𝑑, 𝑒}, then {𝑓, 𝑔}.
Output (w/ 6 edges): super nodes {𝑎, 𝑏}, {𝑐, 𝑑, 𝑒}, {𝑓, 𝑔} with residual edges −(𝑎, 𝑑), −(𝑐, 𝑒), +(𝑑, 𝑔)
Graph Summarization: Definition
• Given: an input graph
• Find:
◦ a summary graph
◦ positive and negative residual graphs
• To Minimize: the edge count (≈ description length)
Figure: the input graph (nodes 𝑎 to 𝑔) is represented by
◦ Summary Graph: super nodes {𝑎, 𝑏}, {𝑐, 𝑑, 𝑒}, {𝑓, 𝑔}
◦ Residual Graph (Positive): +(𝑑, 𝑔)
◦ Residual Graph (Negative): −(𝑎, 𝑑), −(𝑐, 𝑒)
Restoration: Example
Figure: the Summarized Graph (w/ 6 edges), with super nodes {𝑎, 𝑏}, {𝑐, 𝑑, 𝑒}, {𝑓, 𝑔} and residual edges −(𝑎, 𝑑), −(𝑐, 𝑒), +(𝑑, 𝑔), is expanded one super node at a time ({𝑓, 𝑔}, then {𝑐, 𝑑, 𝑒}, then {𝑎, 𝑏}) into the Restored Graph (w/ 9 edges).
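The restoration shown on this slide can be sketched in a few lines of Python. This is only an illustration: the slides give the super nodes and the three residual edges, but not the super edges, so `super_edges` below is a hypothetical choice that happens to reproduce the slide's counts (6 stored edges, 9 restored edges). The function name `restore` is also an assumption.

```python
from itertools import combinations, product

# Super nodes from the slide's example.
super_nodes = {"A": {"a", "b"}, "B": {"c", "d", "e"}, "C": {"f", "g"}}

# ASSUMPTION: the slides do not list the super edges. These three are
# hypothetical, chosen so the totals match the slide (6 stored, 9 restored).
super_edges = [("A", "B"), ("B", "B"), ("C", "C")]

# Residual edges as shown on the slide.
residual_plus = {frozenset(("d", "g"))}
residual_minus = {frozenset(("a", "d")), frozenset(("c", "e"))}


def restore(super_nodes, super_edges, residual_plus, residual_minus):
    """Rebuild the undirected edge set encoded by a summary and residuals."""
    edges = set()
    for x, y in super_edges:
        if x == y:
            # A self-loop on a super node stands for all pairs inside it.
            pairs = combinations(sorted(super_nodes[x]), 2)
        else:
            # A super edge stands for all pairs across the two super nodes.
            pairs = product(super_nodes[x], super_nodes[y])
        edges.update(frozenset(p) for p in pairs)
    # Negative residuals remove edges; positive residuals add edges.
    return (edges - residual_minus) | residual_plus


restored = restore(super_nodes, super_edges, residual_plus, residual_minus)
stored = len(super_edges) + len(residual_plus) + len(residual_minus)
print(stored, len(restored))  # prints "6 9"
```

Under these assumptions, six stored edges expand back to nine original edges, which is the lossless round trip the slide illustrates.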
Why Graph Summarization?
• Neighbor queries can be rapid and efficient
◦ Given a seed node, return its neighbors
◦ Key building block of most graph algorithms
• Easily extended to lossy compression
• Easily combined with other graph-compression methods
◦ the outputs are also graphs
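A neighbor query can be answered directly on the summary, without restoring the whole graph. The sketch below assumes the slide's running example; the super-node adjacency `super_adj` is hypothetical (the slides do not list the super edges), and `neighbors` is an illustrative name rather than an API from the paper.

```python
# Super nodes and residual edges from the slide's running example.
super_nodes = {"A": {"a", "b"}, "B": {"c", "d", "e"}, "C": {"f", "g"}}

# ASSUMPTION: hypothetical super edges A-B, B-B (self-loop), C-C (self-loop).
super_adj = {"A": {"B"}, "B": {"A", "B"}, "C": {"C"}}

# Residual edges stored as adjacency maps for quick lookup.
plus_adj = {"d": {"g"}, "g": {"d"}}
minus_adj = {"a": {"d"}, "d": {"a"}, "c": {"e"}, "e": {"c"}}

node_to_super = {v: s for s, members in super_nodes.items() for v in members}


def neighbors(v):
    """Return the neighbors of node v using only the summary and residuals."""
    result = set()
    # Every member of an adjacent super node is a candidate neighbor.
    for s in super_adj.get(node_to_super[v], ()):
        result |= super_nodes[s]
    result.discard(v)                   # no self-loops on original nodes
    result -= minus_adj.get(v, set())   # apply negative residual edges
    result |= plus_adj.get(v, set())    # apply positive residual edges
    return result


print(sorted(neighbors("d")))  # prints ['b', 'c', 'e', 'g']
```

The work per query is proportional to the size of the adjacent super nodes plus the residual edges at the seed node, so only the part of the graph around the seed is expanded.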
Figure: Summary Graph with super nodes {𝑎, 𝑏}, {𝑐, 𝑑, 𝑒}, {𝑓, 𝑔}; Residual Graph (Positive): +(𝑑, 𝑔); Residual Graph (Negative): −(𝑎, 𝑑), −(𝑐, 𝑒)
(discussed in the paper)
Challenge: Scalability!
Figure: compression performance (Bad to Good) versus maximum size of input graphs (millions, 10 millions, billions) for VoG [KKVF14], Greedy [NRS08], Randomized [NRS08], SAGS [KNL15], and SWeG (Proposed). Annotation: 10,000×
Our Contribution: SWeG
• We develop SWeG (SWeG: Lossless and Lossy Summarization of Web-Scale Graphs):
◦ Fast with Concise Outputs
◦ Memory Efficient
◦ Scalable
Roadmap
• Problem Definition
• Proposed Method: SWeG <<
• Experimental Results
• Conclusions
Main Idea behind SWeG
• Graph summarization ≈ node clustering
◦ Finding sets of similar nodes to be merged into super nodes
• Previous heuristics are greedy algorithms
• Why are even greedy algorithms slow?
◦ Too many node pairs are considered: 𝑂(𝑛³)
• How can we reduce the number of node pairs to be
considered without missing similar node pairs?
Figure: nodes 𝑎 to 𝑔 are clustered into super nodes {𝑎, 𝑏}, {𝑐, 𝑑, 𝑒}, {𝑓, 𝑔}.
Main Idea behind SWeG (cont.)
• Step 1: Coarse clustering (Grouping)
◦ Fast and careless
• Step 2: Fine clustering (Merging)
◦ Greedy algorithm
◦ Only node pairs within each group are considered
Steps 1 and 2 are repeated.
Figure: nodes 𝑎 to 𝑔 are grouped and then merged into super nodes {𝑎, 𝑏}, {𝑐, 𝑑, 𝑒}, {𝑓, 𝑔}.
Terminologies
• Summary Graph 𝑺: its nodes are super nodes, e.g. 𝐴 = {𝑎, 𝑏}, 𝐵 = {𝑐, 𝑑, 𝑒}, 𝐶 = {𝑓, 𝑔}
• Residual Graph 𝑹
◦ Positive Residual Graph 𝑹⁺: +(𝑑, 𝑔)
◦ Negative Residual Graph 𝑹⁻: −(𝑎, 𝑑), −(𝑐, 𝑒)
• 𝑺𝒂𝒗𝒊𝒏𝒈(𝐴, 𝐵) := 1 − 𝐶𝑜𝑠𝑡(𝐴 ∪ 𝐵) / (𝐶𝑜𝑠𝑡(𝐴) + 𝐶𝑜𝑠𝑡(𝐵))
◦ 𝐶𝑜𝑠𝑡(𝐴 ∪ 𝐵): encoding cost when 𝐴 and 𝐵 are merged
◦ 𝐶𝑜𝑠𝑡(𝐴), 𝐶𝑜𝑠𝑡(𝐵): encoding cost of 𝐴 and encoding cost of 𝐵
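To make the saving formula concrete, here is a small Python sketch. The slides only say "encoding cost", so the cost model below is an assumption: for each pair of super nodes it charges the cheaper of listing the existing edges individually, or using one super edge plus one negative residual edge per missing pair. The names `pair_cost`, `cost`, and `saving` are illustrative.

```python
from itertools import combinations


def pair_cost(adj, x, y):
    """Edges needed to encode the links between node sets x and y."""
    if x == y:
        possible = len(x) * (len(x) - 1) // 2
        actual = sum(1 for u, v in combinations(sorted(x), 2) if v in adj[u])
    else:
        possible = len(x) * len(y)
        actual = sum(1 for u in x for v in y if v in adj[u])
    if actual == 0:
        return 0
    # ASSUMPTION: either list every edge, or use one super edge plus
    # one negative residual edge per missing pair.
    return min(actual, possible - actual + 1)


def cost(adj, target, partition):
    """Encoding cost of one super node against every super node."""
    return sum(pair_cost(adj, target, other) for other in partition)


def saving(adj, a, b, partition):
    """Saving(A, B) = 1 - Cost(A ∪ B) / (Cost(A) + Cost(B))."""
    before = cost(adj, a, partition) + cost(adj, b, partition)
    if before == 0:
        return 0.0  # nothing to save when neither super node has edges
    merged = a | b
    rest = [s for s in partition if s != a and s != b]
    after = cost(adj, merged, rest + [merged])
    return 1 - after / before


# Toy graph (hypothetical): a and b both connect to c and d.
adj = {"a": {"c", "d"}, "b": {"c", "d"}, "c": {"a", "b"}, "d": {"a", "b"}}
partition = [frozenset(v) for v in adj]
A, B, C = frozenset("a"), frozenset("b"), frozenset("c")

print(saving(adj, A, B, partition))  # prints 0.5
print(saving(adj, A, C, partition))  # prints 0.25
```

In this toy graph, merging the two nodes with identical neighborhoods halves their encoding cost, while merging `a` with `c` saves less, which is why a greedy merger would prefer the first pair.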
Details of SWeG
• Inputs: - input graph 𝑮
- number of iterations 𝑻
• Outputs: - summary graph 𝑺
- residual graph 𝑹 (or 𝑹⁺ and 𝑹⁻)
• Procedure:
• S0: Initializing Step
• repeat 𝑇 times
• S1-1: Dividing Step
• S1-2: Merging Step
• S2: Compressing Step (optional)
SWeG: Initializing Step
𝐴 = {𝑎}
𝐵 = {𝑏}
𝐷 = {𝑑}
𝐸 = {𝑒}
𝐹 = {𝑓}
𝐺 = {𝑔}
𝐶 = {𝑐}
Summary Graph 𝑺 = 𝑮 Residual Graph 𝑹 = ∅
• S0: Initializing Step <<
• repeat 𝑇 times
• S1-1: Dividing Step
• S1-2: Merging Step
• S2: Compressing Step (optional)
SWeG: Dividing Step
• S0: Initializing Step
• repeat 𝑇 times
• S1-1: Dividing Step <<
• S1-2: Merging Step
• S2: Compressing Step (optional)
𝐴 = {𝑎}
𝐵 = {𝑏}
𝐷 = {𝑑}
𝐶 = {𝑐}
𝐸 = {𝑒}
𝐹 = {𝑓}
𝐺 = {𝑔}
• Divides super nodes into groups
◦ MinHashing: rapid and out-of-core
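The dividing step can be sketched with a single random permutation used as the hash function. This is a minimal in-memory illustration, not the authors' out-of-core implementation: it assumes each super node is keyed by the smallest hash value over its members and their neighbors, and the function name `divide` and the toy graph are hypothetical.

```python
import random
from collections import defaultdict


def divide(super_nodes, adj, seed=0):
    """Group super nodes that share the same minimum hashed neighbor."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    ranks = list(range(len(nodes)))
    rng.shuffle(ranks)
    h = dict(zip(nodes, ranks))  # a random permutation acts as the hash

    groups = defaultdict(list)
    for name, members in super_nodes.items():
        # ASSUMPTION: the key is the minimum hash over the closed
        # neighborhood (each member itself plus its neighbors).
        key = min(h[v] for u in members for v in adj[u] | {u})
        groups[key].append(name)
    return list(groups.values())


# Toy graph (hypothetical): a and b have the same closed neighborhood,
# and so do e and f, so each pair always lands in one group.
adj = {
    "a": {"b", "c", "d"}, "b": {"a", "c", "d"},
    "c": {"a", "b"}, "d": {"a", "b"},
    "e": {"f"}, "f": {"e"},
}
super_nodes = {name.upper(): {name} for name in adj}

groups = divide(super_nodes, adj)
print(groups)
```

Super nodes with heavily overlapping neighborhoods are likely to share a minimum hash, so similar candidates end up together while each super node is hashed only once per iteration. Changing `seed` between iterations reshuffles the groups.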
SWeG: Merging Step
• Merge some super nodes within each group if 𝑆𝑎𝑣𝑖𝑛𝑔 > 𝜃(𝑡)
𝐴 = {𝑎, 𝑏}
𝐷 = {𝑑, 𝑒}
𝐶 = {𝑐}
𝐹 = {𝑓, 𝑔}
• S0: Initializing Step
• repeat 𝑇 times
• S1-1: Dividing Step
• S1-2: Merging Step <<
• S2: Compressing Step (optional)
SWeG: Merging Step (cont.)
Summary Graph 𝑺: 𝐴 = {𝑎, 𝑏}, 𝐶 = {𝑐}, 𝐷 = {𝑑, 𝑒}, 𝐹 = {𝑓, 𝑔}
Residual Graph 𝑹: −(𝑎, 𝑑), +(𝑑, 𝑔)
• S0: Initializing Step
• repeat 𝑇 times
• S1-1: Dividing Step
• S1-2: Merging Step
• S2: Compressing Step (optional)
SWeG: Dividing Step
• S0: Initializing Step
• repeat 𝑇 times
• S1-1: Dividing Step <<
• S1-2: Merging Step
• S2: Compressing Step (optional)
𝐴 = {𝑎, 𝑏} 𝐷 = {𝑑, 𝑒}
𝐶 = {𝑐} 𝐹 = {𝑓, 𝑔}
• Divides super nodes into groups
SWeG: Merging Step
• Merge some super nodes within each group if 𝑆𝑎𝑣𝑖𝑛𝑔 > 𝜃(𝑡)
𝐴 = {𝑎, 𝑏} 𝐶 = {𝑐, 𝑑, 𝑒}
𝐹 = {𝑓, 𝑔}
• S0: Initializing Step
• repeat 𝑇 times
• S1-1: Dividing Step
• S1-2: Merging Step <<
• S2: Compressing Step (optional)
SWeG: Merging Step (cont.)
Summary Graph 𝑺: 𝐴 = {𝑎, 𝑏}, 𝐵 = {𝑐, 𝑑, 𝑒}, 𝐹 = {𝑓, 𝑔}
Residual Graph 𝑹: −(𝑎, 𝑑), −(𝑐, 𝑒), +(𝑑, 𝑔)
• S0: Initializing Step
• repeat 𝑇 times
• S1-1: Dividing Step
• S1-2: Merging Step
• S2: Compressing Step (optional)
SWeG: Merging Step (cont.)
Details
• S0: Initializing Step
• repeat 𝑇 times
• S1-1: Dividing Step
• S1-2: Merging Step <<
• S2: Compressing Step (optional)
• Merge some super nodes within each group if 𝑆𝑎𝑣𝑖𝑛𝑔 > 𝜃(𝑡)
• Decreasing 𝜃(𝑡) = (1 + 𝑡)⁻¹
◦ exploration of other groups
◦ exploitation within each group
◦ ~30% better compression than 𝜃(𝑡) = 0
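The decreasing threshold on this slide, read as θ(t) = 1 / (1 + t), can be sketched as follows. Treating t as a 1-based iteration counter is an assumption, and the names `threshold` and `should_merge` are illustrative.

```python
def threshold(t):
    """Merging threshold theta(t) = 1 / (1 + t).

    ASSUMPTION: t is the 1-based iteration number.
    """
    return 1.0 / (1 + t)


def should_merge(saving, t):
    """Merge a candidate pair only if its saving exceeds theta(t)."""
    return saving > threshold(t)


# The bar is high in early iterations and lowers over time.
for t in range(1, 6):
    print(t, round(threshold(t), 3), should_merge(0.3, t))
```

A pair whose saving is 0.3 is rejected in iterations 1 and 2 (thresholds 0.5 and about 0.333) and accepted from iteration 3 onward (threshold 0.25). Early iterations therefore accept only very good merges and leave other super nodes free to be regrouped later, which matches the exploration and exploitation bullets above.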
SWeG: Compressing Step
• Compress each output graph (𝑺, 𝑹⁺ and 𝑹⁻)
• Use any off-the-shelf graph-compression algorithm
◦ Boldi-Vigna [BV04] / VNMiner [BC08] / BFS [AD09] /
Graph Bisection [DKKO+16]
• SWeG+: SWeG with the compressing step
• S0: Initializing Step
• repeat 𝑇 times
• S1-1: Dividing Step
• S1-2: Merging Step
• S2: Compressing Step (optional) <<
Parallel & Distributed Processing
Details
No need to load the entire graph in memory!
• Map stage: compute min hashes in parallel
• Shuffle stage: divide super nodes using min hashes
• Reduce stage: process groups independently in parallel
Figure: super nodes grouped by min hash: 𝐴 = {𝑎}, 𝐵 = {𝑏}, 𝐶 = {𝑐} (MinHash = 𝟏); 𝐷 = {𝑑}, 𝐸 = {𝑒} (MinHash = 𝟐); 𝐹 = {𝑓}, 𝐺 = {𝑔} (MinHash = 𝟑). Merging runs within each group.
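The shuffle and reduce stages can be imitated on one machine with a thread pool. This is a rough sketch, not the Hadoop implementation mentioned in the experiments: the map stage is skipped and the min-hash values are copied from the slide's figure, and `reduce_group` is a hypothetical placeholder that only lists the candidate pairs a real merging step would examine.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# Min-hash values as drawn in the slide's figure (map stage output).
min_hash = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 3, "G": 3}


def shuffle(min_hash):
    """Shuffle stage: put super nodes with equal min hash in one group."""
    groups = defaultdict(list)
    for name, key in min_hash.items():
        groups[key].append(name)
    return dict(groups)


def reduce_group(group):
    """Reduce stage placeholder: candidate pairs inside a single group.

    ASSUMPTION: a real reducer would evaluate Saving for these pairs and
    merge them; here the pairs are only enumerated.
    """
    return list(combinations(sorted(group), 2))


groups = shuffle(min_hash)
# Groups do not interact, so each one can be handled by its own worker.
with ThreadPoolExecutor() as pool:
    candidates = dict(zip(groups, pool.map(reduce_group, groups.values())))

within_groups = sum(len(pairs) for pairs in candidates.values())
all_pairs = len(list(combinations(min_hash, 2)))
print(within_groups, all_pairs)  # prints "5 21"
```

Only 5 of the 21 possible pairs are examined in this example, and each reducer needs only the data for its own group, which is the reason the entire graph never has to be loaded at once.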
Roadmap
• Problem Definition
• Proposed Method: SWeG
• Experimental Results <<
• Conclusions
Experimental Settings
• 13 real-world graphs (10K - 20B edges)
• Graph summarization algorithms:
◦ Greedy [NRS08], Randomized [NRS08], SAGS [KNL15]
• Implementations: &
• Graph types: Social, Collaboration, Citation, Web, …
EXP1. Speed and Compression
SWeG outperforms its competitors
SWeG: 𝟑𝟕𝟎 − 𝟒,𝟒𝟗𝟎× faster
Advantages of SWeG
Fast with Concise Outputs
Memory Efficient
Scalable
EXP2. Memory Efficiency
SWeG loads ≤0.1−4% of edges in main memory at once
Figure: Memory Usage compared with Input Graph size; annotated ratios: 294× and 1209×
Advantages of SWeG
Fast with Concise Outputs
Memory Efficient
Scalable
EXP3. Effect of Iterations
About 20 iterations are enough
EXP4. Data Scalability
SWeG is linear in the number of edges
Figure: scalability plot for SWeG (Hadoop) and SWeG (Single machine), up to ≥ 𝟐𝟎 billion edges
EXP5. Machine Scalability
SWeG scales up
Figure: SWeG (Hadoop) and SWeG (Single machine), each plotted against the ideal speed-up
Advantages of SWeG
Fast with Concise Outputs
Memory Efficient
Scalable
EXP6. Further Compression
SWeG+ achieves ~0.7 bit / link for Web graphs
Figure: methods compared: BV, BFS, BP, SWeG+
𝟏.𝟐 − 𝟑.𝟒× additional compression
Roadmap
• Problem Definition
• Proposed Method: SWeG
• Experimental Results
• Conclusions <<
Conclusions
• We propose SWeG (Summarizing Web Graphs)
◦ for summarizing large-scale graphs
◦ Fast with Concise Outputs
◦ Memory Efficient
◦ Scalable

SWeG: Lossless and Lossy Summarization of Web-Scale Graphs

  • 1. SWeG: Lossless and Lossy Summarization of Web-Scale Graphs Kijung Shin Amol Ghoting Myunghwan Kim Hema Raghavan SWeG: Lossless and Lossy Summarization of Web-Scale Graphs (by Kijung Shin)
  • 2. Graphs are Everywhere SWeG: Lossless and Lossy Summarization of Web-Scale Graphs (by Kijung Shin) 2/42
  • 3. Graphs are Everywhere (cont.) SWeG: Lossless and Lossy Summarization of Web-Scale Graphs (by Kijung Shin) 3/42
  • 4. SWeG: Lossless and Lossy Summarization of Web-Scale Graphs (by Kijung Shin) 4/42 Graphs are Everywhere (cont.)
  • 5. Real-world Graphs are Huge SWeG: Lossless and Lossy Summarization of Web-Scale Graphs (by Kijung Shin) 5/42 2B+ active users 500M+ products 300M+ customers 4B+ Web pages 600M+ users 20B+ connections × 30+ How can we analyze and utilize such large graph data?
  • 6. Limitations of Existing Tools •Graph algorithms in textbooks ◦ Assume graphs fit in main memory (i.e., random-access memory) •Tools for out-of-core graph processing ◦ Not for every graph algorithm ◦ Requiring engineering for each algorithm ◦ Inappropriate for real-time applications SWeG: Lossless and Lossy Summarization of Web-Scale Graphs (by Kijung Shin) 6/42 1000+ new graph algorithms per year!
  • 7. Solution: Graph Compression SWeG: Lossless and Lossy Summarization of Web-Scale Graphs (by Kijung Shin) 7/42 Textbook algorithms Any new algorithms Real-time algorithms •Compressing large graphs so that ◦ Compressed data fit in main memory ◦ Algorithms can be performed on compressed data without changes
  • 8. Roadmap • Problem Definition << • Proposed Method: SWeG • Experimental Results • Conclusions SWeG: Lossless and Lossy Summarization of Web-Scale Graphs (by Kijung Shin) 8/42
  • 9. Graph Summarization: Example • Graph summarization is a graph compression technique SWeG: Lossless and Lossy Summarization of Web-Scale Graphs (by Kijung Shin) 9/42 − (𝑎, 𝑑) 𝑎 𝑏 𝑐 𝑑 𝑒 𝑓 𝑔 𝑎, 𝑏 𝑐 𝑑 𝑒 𝑓 𝑔 𝑎, 𝑏 𝑐, 𝑑, 𝑒 𝑓 𝑔 − (𝑎, 𝑑) − (𝑐, 𝑒) + 𝑑, 𝑔 𝑎, 𝑏 𝑐, 𝑑, 𝑒 𝑓, 𝑔 − (𝑎, 𝑑) − (𝑐, 𝑒) + 𝑑, 𝑔 Input Graph (w/ 9 edges) Output (w/ 6 edges)
  • 10. Graph Summarization: Definition • Given: an input graph • Find: ◦ a summary graph ◦ positive and negative residual graphs • To Minimize: the edge count (≈ description length) SWeG: Lossless and Lossy Summarization of Web-Scale Graphs (by Kijung Shin) 10/42 Summary Graph Residual Graph (Positive) Residual Graph (Negative) 𝑎 𝑏 𝑑 𝑒 𝑓 𝑔 𝑎, 𝑏 𝑐, 𝑑, 𝑒 𝑓, 𝑔 − (𝑎, 𝑑) − (𝑐, 𝑒) + 𝑑, 𝑔 Input Graph 𝑐
  • 11. Restoration: Example 11/42 𝑎, 𝑏 𝑐, 𝑑, 𝑒 𝑓, 𝑔 − (𝑎, 𝑑) − (𝑐, 𝑒) + 𝑑, 𝑔 𝑎, 𝑏 𝑐, 𝑑, 𝑒 𝑓 𝑔 − (𝑎, 𝑑) − (𝑐, 𝑒) + 𝑑, 𝑔 𝑎, 𝑏 𝑐 𝑑 𝑒 𝑓 𝑔 − (𝑎, 𝑑) 𝑎 𝑏 𝑐 𝑑 𝑒 𝑓 𝑔 Restored Graph (w/ 9 edges) Summarized Graph (w/ 6 edges) SWeG: Lossless and Lossy Summarization of Web-Scale Graphs (by Kijung Shin)
  • 12. Why Graph Summarization? • Neighbor queries can be rapid and efficient ◦ Given a seed node, return its neighbors ◦ Key building block of most graph algorithms • Easily extended to lossy compression • Easily combined with other graph-compression methods ◦ the outputs are also graphs (discussed in the paper) [Figure: summary graph with super nodes {𝑎, 𝑏}, {𝑐, 𝑑, 𝑒}, {𝑓, 𝑔}; positive residual graph +(𝑑, 𝑔); negative residual graph −(𝑎, 𝑑), −(𝑐, 𝑒)]
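A neighbor query can be answered directly on the summarized representation, without restoring the whole graph. The sketch below is a minimal illustration under stated assumptions: the name `neighbors` and the data layout are hypothetical, and the super edges are assumed (they are not listed in the transcript) so that the example matches the 9-edge input graph.

```python
super_nodes = {"A": {"a", "b"}, "B": {"c", "d", "e"}, "C": {"f", "g"}}
# ASSUMPTION: super edges chosen to be consistent with the slide's edge counts.
super_edges = {("A", "B"), ("B", "B"), ("C", "C")}
pos_residual = {("d", "g")}
neg_residual = {("a", "d"), ("c", "e")}

# Map each node to the super node that contains it.
owner = {v: name for name, members in super_nodes.items() for v in members}


def neighbors(v):
    """Return the neighbors of node v using only the summary and residual graphs."""
    result = set()
    # 1) Every member of an adjacent super node is a candidate neighbor.
    for x, y in super_edges:
        if owner[v] == x:
            result |= super_nodes[y]
        if owner[v] == y:
            result |= super_nodes[x]
    result.discard(v)  # a self-loop on a super node never implies v-v
    # 2) Apply the positive residual edges incident to v.
    for x, y in pos_residual:
        if v == x:
            result.add(y)
        if v == y:
            result.add(x)
    # 3) Apply the negative residual edges incident to v.
    for x, y in neg_residual:
        if v == x:
            result.discard(y)
        if v == y:
            result.discard(x)
    return result


print(sorted(neighbors("d")))
```

The linear scans keep the sketch short; a practical implementation would presumably index super edges and residual edges by endpoint so that each query touches only the entries incident to the seed node.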
  • 13. Challenge: Scalability! [Figure: compression performance (good to bad) plotted against the maximum size of input graphs (millions, 10 millions, billions) for VoG [KKVF14], Greedy [NRS08], Randomized [NRS08], SAGS [KNL15], and SWeG (Proposed); annotation: 10,000×]
  • 14. Our Contribution: SWeG • We develop SWeG (Summarizing Web Graphs): ◦ Fast with Concise Outputs ◦ Memory Efficient ◦ Scalable
  • 15. Roadmap • Problem Definition • Proposed Method: SWeG << • Experimental Results • Conclusions
  • 16. Main Idea behind SWeG • Graph summarization ≈ node clustering ◦ Finding sets of similar nodes to be merged into super nodes • Previous heuristics are greedy algorithms • Why are even greedy algorithms slow? ◦ Too many node pairs are considered: 𝑂(𝑛³) • How can we reduce the number of node pairs to be considered without missing similar node pairs? [Figure: nodes 𝑎 to 𝑔 clustered into super nodes {𝑎, 𝑏}, {𝑐, 𝑑, 𝑒}, {𝑓, 𝑔}]
  • 17. Main Idea behind SWeG (cont.) • Step 1: Coarse clustering (Grouping) ◦ Fast and careless • Step 2: Fine clustering (Merging) ◦ Greedy algorithm ◦ Only node pairs within each group are considered • Steps 1 and 2 are repeated [Figure: nodes 𝑎 to 𝑔 grouped, then merged into super nodes {𝑎, 𝑏}, {𝑐, 𝑑, 𝑒}, {𝑓, 𝑔}]
  • 18. Terminologies • Summary Graph 𝑺: super nodes 𝐴 = {𝑎, 𝑏}, 𝐵 = {𝑐, 𝑑, 𝑒}, 𝐶 = {𝑓, 𝑔} • Residual Graph 𝑹 ◦ Positive Residual Graph 𝑹⁺: +(𝑑, 𝑔) ◦ Negative Residual Graph 𝑹⁻: −(𝑎, 𝑑), −(𝑐, 𝑒) • 𝑺𝒂𝒗𝒊𝒏𝒈(𝑨, 𝑩) := 1 − 𝐶𝑜𝑠𝑡(𝐴 ∪ 𝐵) / (𝐶𝑜𝑠𝑡(𝐴) + 𝐶𝑜𝑠𝑡(𝐵)) ◦ 𝐶𝑜𝑠𝑡(𝐴 ∪ 𝐵): encoding cost when 𝐴 and 𝐵 are merged ◦ 𝐶𝑜𝑠𝑡(𝐴): encoding cost of 𝐴 ◦ 𝐶𝑜𝑠𝑡(𝐵): encoding cost of 𝐵
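The Saving formula can be made concrete with a small sketch. The slides do not define Cost, so the code below assumes a common per-pair encoding cost from the graph summarization literature: for a super node and each other super node, take the smaller of (number of actual edges) and (number of missing edges + 1 for the super edge). The function names, the data layout, and the 9-edge example graph are all hypothetical.

```python
# ASSUMPTION: a 9-edge example graph consistent with the slide's counts;
# the actual edges in the slide's figure are not recoverable.
edges = {frozenset(e) for e in [
    ("a", "c"), ("a", "e"), ("b", "c"), ("b", "d"), ("b", "e"),
    ("c", "d"), ("d", "e"), ("f", "g"), ("d", "g"),
]}
# Initially every node is its own super node.
groups = [frozenset(v) for v in "abcdefg"]


def cost(group, groups, edges):
    """Assumed encoding cost of `group` against every super node in `groups`."""
    total = 0
    for other in groups:
        if other == group:
            possible = len(group) * (len(group) - 1) // 2
            count = sum(1 for e in edges if e <= group)
        else:
            possible = len(group) * len(other)
            count = sum(1 for e in edges
                        if len(e & group) == 1 and len(e & other) == 1)
        if count:
            # Either list the edges as positive residuals, or use one super
            # edge and list the missing pairs as negative residuals.
            total += min(count, possible - count + 1)
    return total


def saving(a, b, groups, edges):
    """Saving(A, B) = 1 - Cost(A u B) / (Cost(A) + Cost(B))."""
    merged = a | b
    rest = [g for g in groups if g not in (a, b)]
    denominator = cost(a, groups, edges) + cost(b, groups, edges)
    if denominator == 0:
        return 0.0  # neither super node has incident edges
    return 1 - cost(merged, rest + [merged], edges) / denominator


s = saving(frozenset("a"), frozenset("b"), groups, edges)
print(round(s, 3))
```

In this example, Cost({a}) = 2 and Cost({b}) = 3, while the merged super node {a, b} costs 3, so merging saves 40% of the encoding cost under the assumed cost model.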
  • 19. Details of SWeG • Inputs: ◦ input graph 𝑮 ◦ number of iterations 𝑻 • Outputs: ◦ summary graph 𝑺 ◦ residual graph 𝑹 (or 𝑹⁺ and 𝑹⁻) • Procedure: ◦ S0: Initializing Step ◦ repeat 𝑇 times: S1-1: Dividing Step, then S1-2: Merging Step ◦ S2: Compressing Step (optional)
  • 20. SWeG: Initializing Step (S0) • Each node starts as its own super node: 𝐴 = {𝑎}, 𝐵 = {𝑏}, 𝐶 = {𝑐}, 𝐷 = {𝑑}, 𝐸 = {𝑒}, 𝐹 = {𝑓}, 𝐺 = {𝑔} • Summary Graph 𝑺 = 𝑮 • Residual Graph 𝑹 = ∅
  • 21. SWeG: Dividing Step (S1-1) • Divides super nodes into groups ◦ MinHashing: rapid and out-of-core [Figure: super nodes 𝐴 = {𝑎}, 𝐵 = {𝑏}, 𝐶 = {𝑐}, 𝐷 = {𝑑}, 𝐸 = {𝑒}, 𝐹 = {𝑓}, 𝐺 = {𝑔} divided into groups]
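One way to picture the MinHash-based dividing step is the sketch below. It is an illustration under assumptions, not the authors' implementation: each super node's signature is taken to be the minimum of a random node ranking over the closed neighborhoods of its members, and super nodes with equal signatures share a group. The name `divide`, the seed parameter, and the example adjacency lists are hypothetical.

```python
import random
from collections import defaultdict

# ASSUMPTION: adjacency lists of a 9-edge example graph (not from the slides).
adjacency = {
    "a": {"c", "e"}, "b": {"c", "d", "e"}, "c": {"a", "b", "d"},
    "d": {"b", "c", "e", "g"}, "e": {"a", "b", "d"},
    "f": {"g"}, "g": {"d", "f"},
}
# State after the initializing step: one super node per node.
super_nodes = {v.upper(): {v} for v in adjacency}


def divide(super_nodes, adjacency, seed=0):
    """Group super nodes whose MinHash signatures coincide."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    ranks = list(range(len(nodes)))
    rng.shuffle(ranks)               # a random permutation acts as the hash
    h = dict(zip(nodes, ranks))
    groups = defaultdict(list)
    for name, members in super_nodes.items():
        # Minimum hash over the members and all of their neighbors.
        signature = min(h[u] for v in members for u in adjacency[v] | {v})
        groups[signature].append(name)
    return list(groups.values())


groups = divide(super_nodes, adjacency)
print(groups)
```

Super nodes with heavily overlapping neighborhoods are likely to share the same minimum and therefore land in the same group, and each signature needs only one pass over the edges incident to the super node. Using a different seed in each iteration changes the grouping.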
  • 22. SWeG: Merging Step (S1-2) • Merge some super nodes within each group if 𝑆𝑎𝑣𝑖𝑛𝑔 > 𝜃(𝑡) [Figure: super nodes after merging: 𝐴 = {𝑎, 𝑏}, 𝐶 = {𝑐}, 𝐷 = {𝑑, 𝑒}, 𝐹 = {𝑓, 𝑔}]
  • 23. SWeG: Merging Step (cont.) • Summary Graph 𝑺: super nodes 𝐴 = {𝑎, 𝑏}, 𝐶 = {𝑐}, 𝐷 = {𝑑, 𝑒}, 𝐹 = {𝑓, 𝑔} • Residual Graph 𝑹: −(𝑎, 𝑑), +(𝑑, 𝑔)
  • 24. SWeG: Dividing Step (S1-1) • Divides super nodes into groups [Figure: super nodes 𝐴 = {𝑎, 𝑏}, 𝐶 = {𝑐}, 𝐷 = {𝑑, 𝑒}, 𝐹 = {𝑓, 𝑔} divided into groups]
  • 25. SWeG: Merging Step (S1-2) • Merge some super nodes within each group if 𝑆𝑎𝑣𝑖𝑛𝑔 > 𝜃(𝑡) [Figure: super nodes after merging: 𝐴 = {𝑎, 𝑏}, 𝐶 = {𝑐, 𝑑, 𝑒}, 𝐹 = {𝑓, 𝑔}]
  • 26. SWeG: Merging Step (cont.) • Summary Graph 𝑺: super nodes 𝐴 = {𝑎, 𝑏}, 𝐵 = {𝑐, 𝑑, 𝑒}, 𝐹 = {𝑓, 𝑔} • Residual Graph 𝑹: −(𝑎, 𝑑), −(𝑐, 𝑒), +(𝑑, 𝑔)
  • 27. SWeG: Merging Step (cont.) • Merge some super nodes within each group if 𝑆𝑎𝑣𝑖𝑛𝑔 > 𝜃(𝑡) • Decreasing threshold 𝜃(𝑡) = (1 + 𝑡)⁻¹ ◦ exploration of other groups ◦ exploitation within each group ◦ ~30% better compression than 𝜃(𝑡) = 0
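The decreasing threshold can be illustrated with a short sketch. The helper names `theta` and `should_merge` are hypothetical, and the saving value 0.4 is just an example input, not a number from the slides.

```python
def theta(t):
    """Merging threshold at iteration t (t = 1, 2, ...): (1 + t)^-1."""
    return 1.0 / (1 + t)


def should_merge(saving, t):
    """Merge a candidate pair only if its saving exceeds the current threshold."""
    return saving > theta(t)


# The threshold shrinks as iterations proceed, so merges become easier.
schedule = [round(theta(t), 3) for t in range(1, 6)]
print(schedule)

# A pair with an example saving of 0.4 is rejected early and accepted later.
decisions = [should_merge(0.4, t) for t in range(1, 4)]
print(decisions)
```

A plausible reading of the slide's "exploration" and "exploitation" bullets: while the threshold is high, only clearly beneficial pairs are merged, leaving other super nodes free to meet better partners in the groups formed by later iterations; as the threshold drops, the best remaining pairs within each group are accepted.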
  • 28. SWeG: Compressing Step (S2, optional) • Compress each output graph (𝑺, 𝑹⁺ and 𝑹⁻) • Use any off-the-shelf graph-compression algorithm ◦ Boldi-Vigna [BV04] / VNMiner [BC08] / BFS [AD09] / Graph Bisection [DKKO+16] • SWeG+: SWeG with the compressing step
  • 29. Parallel & Distributed Processing • No need to load the entire graph in memory! • Map stage: compute min hashes in parallel • Shuffle stage: divide super nodes using min hashes • Reduce stage: process groups independently in parallel [Figure: super nodes grouped by MinHash value: 𝐴 = {𝑎}, 𝐵 = {𝑏}, 𝐶 = {𝑐} with MinHash = 1; 𝐷 = {𝑑}, 𝐸 = {𝑒} with MinHash = 2; 𝐹 = {𝑓}, 𝐺 = {𝑔} with MinHash = 3; merging happens within each group]
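The three stages can be mimicked on a single machine with a toy sketch. This is not Hadoop code: the stage functions are hypothetical, the MinHash values are copied from the slide's figure rather than computed, and the reducer is a placeholder that simply returns the whole group where a real reducer would run the merging step.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# MinHash values as shown in the slide's figure (looked up, not computed).
min_hash = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 3, "G": 3}


def map_stage(names):
    """Emit a (MinHash value, super node) pair for every super node."""
    return [(min_hash[name], name) for name in names]


def shuffle_stage(pairs):
    """Collect super nodes that share a MinHash value into one group."""
    groups = defaultdict(list)
    for key, name in pairs:
        groups[key].append(name)
    return groups


def reduce_group(names):
    """PLACEHOLDER reducer: stands in for merging within one group."""
    return tuple(sorted(names))


def run(names):
    groups = shuffle_stage(map_stage(names))
    # Groups share no state, so they can be processed independently.
    with ThreadPoolExecutor() as pool:
        return sorted(pool.map(reduce_group, groups.values()))


result = run(list("ABCDEFG"))
print(result)
```

Because each reducer sees only the super nodes of its own group, only that part of the graph has to be in memory at any one time, which matches the slide's first bullet.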
  • 30. Roadmap • Problem Definition • Proposed Method: SWeG • Experimental Results << • Conclusions
  • 31. Experimental Settings • 13 real-world graphs (10K - 20B edges): Social, Collaboration, Citation, Web, … • Graph summarization algorithms: ◦ Greedy [NRS08], Randomized [NRS08], SAGS [KNL15] • Implementations: [logos not recoverable]
  • 32. EXP1. Speed and Compression • SWeG outperforms its competitors ◦ 370−4,490× faster across datasets
  • 33. Advantages of SWeG • Fast with Concise Outputs • Memory Efficient • Scalable
  • 34. EXP2. Memory Efficiency • SWeG loads ≤ 0.1−4% of edges in main memory at once [Figure: memory usage compared with the input graph; annotations: 294×, 1209×]
  • 35. Advantages of SWeG • Fast with Concise Outputs • Memory Efficient • Scalable
  • 36. EXP3. Effect of Iterations • About 20 iterations are enough
  • 37. EXP4. Data Scalability • SWeG is linear in the number of edges [Figure: SWeG (Hadoop) and SWeG (Single machine); annotation: ≥ 20 billion edges]
  • 38. EXP5. Machine Scalability • SWeG scales up [Figure: SWeG (Hadoop) and SWeG (Single machine), each compared with the ideal]
  • 39. Advantages of SWeG • Fast with Concise Outputs • Memory Efficient • Scalable
  • 40. EXP6. Further Compression • SWeG+ achieves ~0.7 bit / link for Web graphs ◦ 1.2−3.4× additional compression across datasets [Figure: BV, BFS, BP, and SWeG+ compared; annotation: 3.4×]
  • 41. Roadmap • Problem Definition • Proposed Method: SWeG • Experimental Results • Conclusions <<
  • 42. Conclusions • We propose SWeG (Summarizing Web Graphs) ◦ for summarizing large-scale graphs • Fast with Concise Outputs • Memory Efficient • Scalable [Figure labels: SWeG (Hadoop); SWeG]