SlideShare a Scribd company logo
Yulin Che, Shixuan Sun, Qiong Luo
Hong Kong University of
Science and Technology
1
Parallelizing Pruning-based
Graph Structural Clustering
2
2
Outline
3、Parallelization & Vectorization
1、Pruning-based Graph Structural Clustering
2、Performance Bottleneck & Challenge
4、Experimental Study
5、Conclusion
Outline
Graph Structural Clustering
2
3
• Graph Clustering
• group vertices into clusters: dense intra connection and sparse
inter connection
• Application
• do recommendations on social networks, web graphs and co-
purchasing graphs
• Graph Structural Clustering (Our Focus)
• utilize structural similarity among vertices for clustering
• identify clusters and vertex roles (cores, non-cores)
• Graph Structural Clustering (Our Focus)
• utilize structural similarity among vertices for clustering
• identify clusters and vertex roles (cores, non-cores: hubs, outliers)
Graph Structural Clustering Example
2
4clusters
core
non-core
SCAN [Xu+, KDD’07]
2
5
• Structural Similarity Computation
• based on neighbors of two vertices 𝑢 and 𝑣 (cosine measure):
• 𝑠𝑖𝑚 𝑢, 𝑣 = 𝑁 𝑢 ∩ 𝑁 𝑣 / |𝑁(𝑢)| ∙ |𝑁(𝑣)|
• 𝑢 and 𝑣 are similar neighbors, if
• they are connected
• their structural similarity 𝑠𝑖𝑚(𝑢, 𝑣) ≥ 𝜀
𝑁 6 = 5,6,7,8,9
𝑁 9 = 6,7,8,9,10,11
𝑠𝑖𝑚 6,9 = 4/ 5 ∙ 6 ≈ 0.73 similar:
𝑠𝑖𝑚(6,9) ≥ 0.6
𝜀 = 0.6, 𝜇 = 3
SCAN [Xu+, KDD’07]
2
6
• Core Checking
• a vertex 𝑢 is a core, if it has ≥ 𝜇 similar neighbors
• Structural Clustering
• clustering by similar neighbors from cores
• adopting Breadth-First-Search (BFS) for cluster expansion
core
non-core
corenon-core
similar
not-similar
𝜀 = 0.6, 𝜇 = 3
• Structural Clustering
• clustering by similar neighbors from cores
• adopting Breadth-First-Search (BFS) for cluster expansion
SCAN [Xu+, KDD’07]
2
7clusters
core
non-core
𝜀 = 0.6, 𝜇 = 3
pSCAN [Chang+, ICDE’16]
2
8
𝑁 6 = 5,6,7,8,9
𝑁 9 = 6,7,8,9,10,11
𝑠𝑖𝑚 6,9 = 4/ 5 ∙ 6 ≈ 0.73
• Pruning Similarity Computations
• adopt union-find data-structure, change algorithmic design
• avoid redundant similarity computation, apply pruning techniques
• apply early termination in similarity computation
set-intersection
expensive
Motivation of Parallelization
2
9
On Twitter (0.7-billion-edge)
• Motivation of Parallelizing pSCAN
• cost too much time for interactive exploration of clustering results
• cost most from the set-intersection based similarity computation
2
10
Outline
3、Parallelization & Vectorization
1、Pruning-based Graph Structural Clustering
2、Performance Bottleneck & Challenge
4、Experimental Study
5、Conclusion
Time BreakDown (SCAN and pSCAN)
2
11
On LiveJournal/Orkut/Twitter
• Observations
• similarity computation is the performance bottleneck
• workload reduction of pSCAN is light-weight but useful
pSCAN Parallelization Challenge
2
12
• Data Dependency
• there exists concurrent access of lower and upper bounds of
number of similar neighbors
• a priority queue for selecting vertex with max upper bound of
similar neighbors requires heavy synchronization
• owing to the similarity reuse technique, similarity values
𝒔𝒊𝒎(𝒖, 𝒗) and 𝒔𝒊𝒎(𝒗, 𝒖) are dependent
• Clustering Concurrency Issues
• union-find operations should be thread-safe
• cluster id initialization and assignment should be thread-safe
• Workload Skew and Irregularity
• the workload for each vertex depends on its degree and role
• pruning techniques make the workload irregular
2
13
Outline
3、Parallelization & Vectorization
1、Pruning-based Graph Structural Clustering
2、Performance Bottleneck & Challenge
4、Experimental Study
5、Conclusion
Two-Step Multi-Phase Design Overview
2
14
• Step 1: Role Computing (Determining Core or Non-Core)
• similarity pruning phase (without similarity computation)
• core checking phase
• core consolidating phase (finalizing roles of all the vertices)
• Step 2: Core and Non-Core Clustering
• core clustering without similarity computation phase
• finalizing core clustering with similarity computation phase
• cluster id initialization phase
• non-core clustering (cluster id assignment) phase
• Computation Optimizations
• degree-based task scheduling: dealing with the workload
skewness
• pivot-based set-intersection vectorization: improving the
efficiency of similarity computation
Step 1 : Role Computing
2
15
core
non-core
finalizing vertex roles (either core or non-core),
stashing some similarity values (unknow/similar/not-similar)
Similarity Pruning
2
16
• Utilize Similarity Definition & Min-Max Pruning
• do not incur set intersections (similarity computations)
• utilize lower and upper bounds of number of similar neighbors
similar
not-similar
𝑁 0 ∩ 𝑁 1 ≥ 2
𝑠𝑖𝑚(0,1) ≥ 2/ 4 ∙ 2 ≈ 0.71 > 𝜀
𝑁 9 ∩ 𝑁 11 ≤ 𝑁 11 = 2
𝑠𝑖𝑚(9,11) ≤ 2/ 6 ∙ 2 ≈ 0.58 < 𝜀
lower bound
upper bound
Similarity Pruning
2
17
• Utilize Similarity Definition & Min-Max Pruning
• do not incur set intersections (similarity computations)
• utilize lower and upper bounds of number of similar neighbors
this phase determines
some vertex roles
without similarity
computations
lower bound
upper bound
local variables 𝒔𝒅 (similar degree) and 𝒆𝒅 (effective degree):
lower and upper bounds of 𝑢’s similar neighborhood size
• Vertex Exploration Order Constraint (𝒖 < 𝒗):
• to guarantee no redundant computation: each undirected edge is
computed at most once for the similarity value
• Core Checking and Consolidating Two-Phase
• to apply similarity reuse technique given vertex order constraint,
while finalizing all vertex roles
Core Checking and Consolidating
2
18
after the two parallel phases,
all vertex roles are known
two phases
Core Checking
2
19
two phases
1) initialize pruning related
lower and upper bounds, and
see if we can benefit from
parallel core checking from
𝑢’s neighbors with the min-
max pruning technique
Core Checking
2
20
two phases
2) determine some vertex roles,
and apply the similarity reuse and
min-max pruning techniques
Core Consolidating
2
21
finalizing all vertex roles
two phases
work-efficiency-proof: the similarity computation is at most invoked once for
the similarity values 𝒔𝒊𝒎[𝒆(𝒖, 𝒗)] and 𝒔𝒊𝒎[𝒆(𝒗, 𝒖)]
Step 2 : Core and Non-Core Clustering
2
22
core and non-core clustering
cluster 0
cluster 1
non-core
Core Clustering
2
23
• Two Phase Separation
• core clustering using already known similarity values without set
intersections
• finalizing core clustering with set intersections
union-find pruning
vertex order constraint
vertex order constraint
• Avoiding Redundant Computation
• adding 𝑢 < 𝑣 constraint during the clustering for both phases
• applying union-find pruning in the second phase
two phases
Non-Core Clustering
2
24
• Two Phase Separation
• cluster id initialization using atomic operations
• non-core cluster id assignment from all the cores to similar
neighbors (to form final results)
CAS atomic operations
cluster id assignment
Degree-based Task Scheduling
2
25
degree-accumulation-
based-task-submission
handling tail task
• Dynamic Scheduling
• vertex computations relate to the degree and role of the vertex in all
the phases, for the exploration of neighbors and computations
• a task can be represented with 𝑣 𝑏𝑒𝑔, 𝑣 𝑒𝑛𝑑 , and the parameter of
range size is tunable (in our experimental setting: 32768)
Dynamic Scheduling Example for Core Checking
Set-Intersection Vectorization
2
26
1. Find the first element in u’s
sorted neighbors ≥ pivot_v
2. Find the first element in v’s
sorted neighbors ≥ pivot_u
3. Count the match
upper and lower bounds
early termination
2
27
Outline
3、Parallelization & Vectorization
1、Pruning-based Graph Structural Clustering
2、Performance Bottleneck & Challenge
4、Experimental Study
5、Conclusion
Experimental Setup
2
28
• Environments
• Xeon Phi Processor (KNL): 64 cores (2 VPUs / core), AVX512,
64KB/1024KB L1/L2 caches, 16GB MCDRAM (cache mode),
96GB RAM
• Xeon CPU E5-2650: 20 cores, AVX2, 64KB/256KB/25MB
L1/L2/L3 cache, 64GB RAM
• Algorithms
• Sequential: SCAN [Xu+, KDD’07], pSCAN [Chang+, ICDE’16]
• Parallel: anySCAN [Mai+, ICDE’17], SCAN-XP [Takahashi,
NDA’17], our ppSCAN, ppSCAN-NO (without vectorization)
• Graphs (Billion-Edge)
• Real-World: Orkut/Twitter/Friendster (Social), Webbase (Web)
• Synthetic Power-Law: ROLL Graphs [Hadian+, SIGMOD’16]
On KNL
Overall Performance (on CPU and KNL)
2
29
On CPU
• Algorithm Comparison
• SCAN and pSCAN lack parallelization
• SCAN-XP lacks usage of pruning techniques
• anySCAN suffers from heavy synchronization and poor memory locality,
and runs out of memory on webbase and friendster datasets
• ppSCAN has good memory locality and negligible synchronization
overheads, and utilizes vectorization AVX2 on CPU and AVX512 on KNL
Set-Intersection Invocation Reduction
2
30
• Work Efficiency
• multi-phase computation does not introduce more workload
• ppSCAN even compute less because of the similarity pruning and
parallel core checking benefits
𝝁 = 𝟏𝟎
𝝁 = 𝟓
Set-Intersection Vectorization Improvement
2
31
𝝁 = 𝟓 (On Both CPU and KNL)
• Core Checking Speedup from Vectorization
• on CPU: at most 3.5x, on KNL: at most 4.5x
• vectorization have better performance with more workloads (when
memory access is not a bottleneck, e.g., when 𝜀 = 0.1 , intensive set
intersections hide the memory access latency
Scalability to Number of threads
2
32
• Scalability and Time Breakdown
• all the four phases scale well to number of threads
• speedup of core checking is better than other phases, because the
intensive set intersection computations hide the memory access
latency
• time breakdown ratio:
• core checking and consolidating > similarity pruning > non-core
clustering > core clustering
𝜺 = 𝟎. 𝟐, 𝝁 = 𝟓 (On KNL)
Robustness
2
33
• Given Different Parameters and Graphs
• robust on varying input parameters
• finish computations within 70 seconds
• robust on synthetic graphs (1-billion edge)
• finish computations within 60 seconds
• achieve up to 135x self-speedup on KNL
2
34
Outline
3、Parallelization & Vectorization
1、Pruning-based Graph Structural Clustering
2、Performance Bottleneck & Challenge
4、Experimental Study
5、Conclusion
Conclusion
2
35
• Parallelization & Vectorization Design
• multi-phase lock-free parallel vertex computations
• dynamic degree-based vertex computation task scheduling
• pivot-based set intersection vectorization
• Experimental Study
• ppSCAN is about 2x faster on KNL (64 cores, AVX512) than on
Xeon CPU (20 cores, AVX2) because of wider SIMD width
• on KNL, two orders of magnitude faster than the sequential pSCAN
• on KNL, one order of magnitude faster than the parallel anySCAN
and SCAN-XP
• on KNL, up to 135x self-speedup (over single-thread ppSCAN)
• Source Codes / Figures / Related Projects :
https://github.com/GraphProcessor/ppSCAN
• More Experimental Studies (Scripts / Figures):
https://github.com/GraphProcessor/ppSCAN/tree/master/pyt
hon_experiments
• This PPT:
https://www.dropbox.com/sh/i1r45o2ceraey8j/AAD8V3
WwPElQjwJ0-QtaKAzYa?dl=0&preview=ppSCAN.pdf
End - Q & A
36

More Related Content

What's hot

Advanced database and data mining & clustering concepts
Advanced database and data mining & clustering conceptsAdvanced database and data mining & clustering concepts
Advanced database and data mining & clustering concepts
NithyananthSengottai
 
Analysis of Multi Level Feedback Queue Scheduling Using Markov Chain Model wi...
Analysis of Multi Level Feedback Queue Scheduling Using Markov Chain Model wi...Analysis of Multi Level Feedback Queue Scheduling Using Markov Chain Model wi...
Analysis of Multi Level Feedback Queue Scheduling Using Markov Chain Model wi...
Eswar Publications
 
4838281 operating-system-scheduling-on-multicore-architectures
4838281 operating-system-scheduling-on-multicore-architectures4838281 operating-system-scheduling-on-multicore-architectures
4838281 operating-system-scheduling-on-multicore-architecturesIslam Samir
 
Chap8 slides
Chap8 slidesChap8 slides
Chap8 slides
BaliThorat1
 
1.meena tushir finalpaper-1-12
1.meena tushir finalpaper-1-121.meena tushir finalpaper-1-12
1.meena tushir finalpaper-1-12Alexander Decker
 
A novel load balancing model for overloaded cloud
A novel load balancing model for overloaded cloudA novel load balancing model for overloaded cloud
A novel load balancing model for overloaded cloud
eSAT Publishing House
 
Ijcatr04041019
Ijcatr04041019Ijcatr04041019
Ijcatr04041019
Editor IJCATR
 
(Slides) Task scheduling algorithm for multicore processor system for minimiz...
(Slides) Task scheduling algorithm for multicore processor system for minimiz...(Slides) Task scheduling algorithm for multicore processor system for minimiz...
(Slides) Task scheduling algorithm for multicore processor system for minimiz...
Naoki Shibata
 
Resource Management for Computer Operating Systems
Resource Management for Computer Operating SystemsResource Management for Computer Operating Systems
Resource Management for Computer Operating Systems
inside-BigData.com
 
Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost an...
Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost an...Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost an...
Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost an...
Naoki Shibata
 
★Mean shift a_robust_approach_to_feature_space_analysis
★Mean shift a_robust_approach_to_feature_space_analysis★Mean shift a_robust_approach_to_feature_space_analysis
★Mean shift a_robust_approach_to_feature_space_analysisirisshicat
 
Optimum capacity allocation of distributed generation units using parallel ps...
Optimum capacity allocation of distributed generation units using parallel ps...Optimum capacity allocation of distributed generation units using parallel ps...
Optimum capacity allocation of distributed generation units using parallel ps...
eSAT Journals
 
Chap3 slides
Chap3 slidesChap3 slides
Chap3 slides
BaliThorat1
 
Hadoop deconstructing map reduce job step by step
Hadoop deconstructing map reduce job step by stepHadoop deconstructing map reduce job step by step
Hadoop deconstructing map reduce job step by step
Subhas Kumar Ghosh
 
Pattern recognition binoy k means clustering
Pattern recognition binoy  k means clusteringPattern recognition binoy  k means clustering
Pattern recognition binoy k means clustering
108kaushik
 
Lecture 4 principles of parallel algorithm design updated
Lecture 4   principles of parallel algorithm design updatedLecture 4   principles of parallel algorithm design updated
Lecture 4 principles of parallel algorithm design updated
Vajira Thambawita
 
CLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptxCLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptx
ShwetapadmaBabu1
 
On selection of periodic kernels parameters in time series prediction
On selection of periodic kernels parameters in time series predictionOn selection of periodic kernels parameters in time series prediction
On selection of periodic kernels parameters in time series prediction
csandit
 
10. resource management
10. resource management10. resource management
10. resource management
Dr Sandeep Kumar Poonia
 

What's hot (20)

Advanced database and data mining & clustering concepts
Advanced database and data mining & clustering conceptsAdvanced database and data mining & clustering concepts
Advanced database and data mining & clustering concepts
 
Analysis of Multi Level Feedback Queue Scheduling Using Markov Chain Model wi...
Analysis of Multi Level Feedback Queue Scheduling Using Markov Chain Model wi...Analysis of Multi Level Feedback Queue Scheduling Using Markov Chain Model wi...
Analysis of Multi Level Feedback Queue Scheduling Using Markov Chain Model wi...
 
Birch
BirchBirch
Birch
 
4838281 operating-system-scheduling-on-multicore-architectures
4838281 operating-system-scheduling-on-multicore-architectures4838281 operating-system-scheduling-on-multicore-architectures
4838281 operating-system-scheduling-on-multicore-architectures
 
Chap8 slides
Chap8 slidesChap8 slides
Chap8 slides
 
1.meena tushir finalpaper-1-12
1.meena tushir finalpaper-1-121.meena tushir finalpaper-1-12
1.meena tushir finalpaper-1-12
 
A novel load balancing model for overloaded cloud
A novel load balancing model for overloaded cloudA novel load balancing model for overloaded cloud
A novel load balancing model for overloaded cloud
 
Ijcatr04041019
Ijcatr04041019Ijcatr04041019
Ijcatr04041019
 
(Slides) Task scheduling algorithm for multicore processor system for minimiz...
(Slides) Task scheduling algorithm for multicore processor system for minimiz...(Slides) Task scheduling algorithm for multicore processor system for minimiz...
(Slides) Task scheduling algorithm for multicore processor system for minimiz...
 
Resource Management for Computer Operating Systems
Resource Management for Computer Operating SystemsResource Management for Computer Operating Systems
Resource Management for Computer Operating Systems
 
Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost an...
Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost an...Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost an...
Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost an...
 
★Mean shift a_robust_approach_to_feature_space_analysis
★Mean shift a_robust_approach_to_feature_space_analysis★Mean shift a_robust_approach_to_feature_space_analysis
★Mean shift a_robust_approach_to_feature_space_analysis
 
Optimum capacity allocation of distributed generation units using parallel ps...
Optimum capacity allocation of distributed generation units using parallel ps...Optimum capacity allocation of distributed generation units using parallel ps...
Optimum capacity allocation of distributed generation units using parallel ps...
 
Chap3 slides
Chap3 slidesChap3 slides
Chap3 slides
 
Hadoop deconstructing map reduce job step by step
Hadoop deconstructing map reduce job step by stepHadoop deconstructing map reduce job step by step
Hadoop deconstructing map reduce job step by step
 
Pattern recognition binoy k means clustering
Pattern recognition binoy  k means clusteringPattern recognition binoy  k means clustering
Pattern recognition binoy k means clustering
 
Lecture 4 principles of parallel algorithm design updated
Lecture 4   principles of parallel algorithm design updatedLecture 4   principles of parallel algorithm design updated
Lecture 4 principles of parallel algorithm design updated
 
CLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptxCLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptx
 
On selection of periodic kernels parameters in time series prediction
On selection of periodic kernels parameters in time series predictionOn selection of periodic kernels parameters in time series prediction
On selection of periodic kernels parameters in time series prediction
 
10. resource management
10. resource management10. resource management
10. resource management
 

Similar to Parallelizing Pruning-based Graph Structural Clustering

Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks, arXiv e-...
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks, arXiv e-...Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks, arXiv e-...
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks, arXiv e-...
ssuser2624f71
 
"Sparse Graph Attention Networks", IEEE Transactions on Knowledge and Data En...
"Sparse Graph Attention Networks", IEEE Transactions on Knowledge and Data En..."Sparse Graph Attention Networks", IEEE Transactions on Knowledge and Data En...
"Sparse Graph Attention Networks", IEEE Transactions on Knowledge and Data En...
ssuser2624f71
 
PR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation LearningPR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation Learning
Sungchul Kim
 
201907 AutoML and Neural Architecture Search
201907 AutoML and Neural Architecture Search201907 AutoML and Neural Architecture Search
201907 AutoML and Neural Architecture Search
DaeJin Kim
 
Phases of distributed query processing
Phases of distributed query processingPhases of distributed query processing
Phases of distributed query processing
Nevil Dsouza
 
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
NAVER Engineering
 
240506_JW_labseminar[Structural Deep Network Embedding].pptx
240506_JW_labseminar[Structural Deep Network Embedding].pptx240506_JW_labseminar[Structural Deep Network Embedding].pptx
240506_JW_labseminar[Structural Deep Network Embedding].pptx
thanhdowork
 
Paper study: Learning to solve circuit sat
Paper study: Learning to solve circuit satPaper study: Learning to solve circuit sat
Paper study: Learning to solve circuit sat
ChenYiHuang5
 
IJCAI13 Paper review: Large-scale spectral clustering on graphs
IJCAI13 Paper review: Large-scale spectral clustering on graphsIJCAI13 Paper review: Large-scale spectral clustering on graphs
IJCAI13 Paper review: Large-scale spectral clustering on graphs
Akisato Kimura
 
Unsupervised Learning Clustering KMean and Hirarchical.pptx
Unsupervised Learning Clustering KMean and Hirarchical.pptxUnsupervised Learning Clustering KMean and Hirarchical.pptx
Unsupervised Learning Clustering KMean and Hirarchical.pptx
FaridAliMousa1
 
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
MLconf
 
Designing Network Design Spaces
Designing Network Design SpacesDesigning Network Design Spaces
Designing Network Design Spaces
Sungchul Kim
 
Exploring Randomly Wired Neural Networks for Image Recognition
Exploring Randomly Wired Neural Networks for Image RecognitionExploring Randomly Wired Neural Networks for Image Recognition
Exploring Randomly Wired Neural Networks for Image Recognition
Yongsu Baek
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...
Spark Summit
 
POLARDB for MySQL - Parallel Query
POLARDB for MySQL - Parallel QueryPOLARDB for MySQL - Parallel Query
POLARDB for MySQL - Parallel Query
oysteing
 
Join Algorithms in MapReduce
Join Algorithms in MapReduceJoin Algorithms in MapReduce
Join Algorithms in MapReduce
Shrihari Rathod
 
NS-CUK Seminar: H.E.Lee, Review on "Gated Graph Sequence Neural Networks", I...
NS-CUK Seminar: H.E.Lee,  Review on "Gated Graph Sequence Neural Networks", I...NS-CUK Seminar: H.E.Lee,  Review on "Gated Graph Sequence Neural Networks", I...
NS-CUK Seminar: H.E.Lee, Review on "Gated Graph Sequence Neural Networks", I...
ssuser4b1f48
 
Exploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningExploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation Learning
Sungchul Kim
 

Similar to Parallelizing Pruning-based Graph Structural Clustering (20)

Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks, arXiv e-...
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks, arXiv e-...Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks, arXiv e-...
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks, arXiv e-...
 
"Sparse Graph Attention Networks", IEEE Transactions on Knowledge and Data En...
"Sparse Graph Attention Networks", IEEE Transactions on Knowledge and Data En..."Sparse Graph Attention Networks", IEEE Transactions on Knowledge and Data En...
"Sparse Graph Attention Networks", IEEE Transactions on Knowledge and Data En...
 
PR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation LearningPR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation Learning
 
201907 AutoML and Neural Architecture Search
201907 AutoML and Neural Architecture Search201907 AutoML and Neural Architecture Search
201907 AutoML and Neural Architecture Search
 
Phases of distributed query processing
Phases of distributed query processingPhases of distributed query processing
Phases of distributed query processing
 
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
 
240506_JW_labseminar[Structural Deep Network Embedding].pptx
240506_JW_labseminar[Structural Deep Network Embedding].pptx240506_JW_labseminar[Structural Deep Network Embedding].pptx
240506_JW_labseminar[Structural Deep Network Embedding].pptx
 
Paper study: Learning to solve circuit sat
Paper study: Learning to solve circuit satPaper study: Learning to solve circuit sat
Paper study: Learning to solve circuit sat
 
IJCAI13 Paper review: Large-scale spectral clustering on graphs
IJCAI13 Paper review: Large-scale spectral clustering on graphsIJCAI13 Paper review: Large-scale spectral clustering on graphs
IJCAI13 Paper review: Large-scale spectral clustering on graphs
 
Unsupervised Learning Clustering KMean and Hirarchical.pptx
Unsupervised Learning Clustering KMean and Hirarchical.pptxUnsupervised Learning Clustering KMean and Hirarchical.pptx
Unsupervised Learning Clustering KMean and Hirarchical.pptx
 
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
 
Designing Network Design Spaces
Designing Network Design SpacesDesigning Network Design Spaces
Designing Network Design Spaces
 
Exploring Randomly Wired Neural Networks for Image Recognition
Exploring Randomly Wired Neural Networks for Image RecognitionExploring Randomly Wired Neural Networks for Image Recognition
Exploring Randomly Wired Neural Networks for Image Recognition
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
230727_HB_JointJournalClub.pptx
230727_HB_JointJournalClub.pptx230727_HB_JointJournalClub.pptx
230727_HB_JointJournalClub.pptx
 
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...
 
POLARDB for MySQL - Parallel Query
POLARDB for MySQL - Parallel QueryPOLARDB for MySQL - Parallel Query
POLARDB for MySQL - Parallel Query
 
Join Algorithms in MapReduce
Join Algorithms in MapReduceJoin Algorithms in MapReduce
Join Algorithms in MapReduce
 
NS-CUK Seminar: H.E.Lee, Review on "Gated Graph Sequence Neural Networks", I...
NS-CUK Seminar: H.E.Lee,  Review on "Gated Graph Sequence Neural Networks", I...NS-CUK Seminar: H.E.Lee,  Review on "Gated Graph Sequence Neural Networks", I...
NS-CUK Seminar: H.E.Lee, Review on "Gated Graph Sequence Neural Networks", I...
 
Exploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningExploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation Learning
 

Recently uploaded

erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
muralinath2
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
silvermistyshot
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
Health Advances
 
Cancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate PathwayCancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate Pathway
AADYARAJPANDEY1
 
Structures and textures of metamorphic rocks
Structures and textures of metamorphic rocksStructures and textures of metamorphic rocks
Structures and textures of metamorphic rocks
kumarmathi863
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
muralinath2
 
Mammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also FunctionsMammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also Functions
YOGESH DOGRA
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
Lokesh Patil
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Erdal Coalmaker
 
Lab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerinLab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerin
ossaicprecious19
 
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
aishnasrivastava
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
Richard Gill
 
extra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdfextra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdf
DiyaBiswas10
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
Areesha Ahmad
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
muralinath2
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
Sérgio Sacani
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
muralinath2
 
Citrus Greening Disease and its Management
Citrus Greening Disease and its ManagementCitrus Greening Disease and its Management
Citrus Greening Disease and its Management
subedisuryaofficial
 
EY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptxEY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptx
AlguinaldoKong
 
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
Scintica Instrumentation
 

Recently uploaded (20)

erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
 
Cancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate PathwayCancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate Pathway
 
Structures and textures of metamorphic rocks
Structures and textures of metamorphic rocksStructures and textures of metamorphic rocks
Structures and textures of metamorphic rocks
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
 
Mammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also FunctionsMammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also Functions
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
Lab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerinLab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerin
 
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
 
extra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdfextra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdf
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
 
Citrus Greening Disease and its Management
Citrus Greening Disease and its ManagementCitrus Greening Disease and its Management
Citrus Greening Disease and its Management
 
EY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptxEY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptx
 
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
 

Parallelizing Pruning-based Graph Structural Clustering

  • 1. Yulin Che, Shixuan Sun, Qiong Luo Hong Kong University of Science and Technology 1 Parallelizing Pruning-based Graph Structural Clustering
  • 2. 2 2 Outline 3、Parallelization & Vectorization 1、Pruning-based Graph Structural Clustering 2、Performance Bottleneck & Challenge 4、Experimental Study 5、Conclusion Outline
  • 3. Graph Structural Clustering 2 3 • Graph Clustering • group vertices into clusters: dense intra connection and sparse inter connection • Application • do recommendations on social networks, web graphs and co- purchasing graphs • Graph Structural Clustering (Our Focus) • utilize structural similarity among vertices for clustering • identify clusters and vertex roles (cores, non-cores)
  • 4. • Graph Structural Clustering (Our Focus) • utilize structural similarity among vertices for clustering • identify clusters and vertex roles (cores, non-cores: hubs, outliers) Graph Structural Clustering Example 2 4clusters core non-core
  • 5. SCAN [Xu+, KDD’07] 2 5 • Structural Similarity Computation • based on neighbors of two vertices 𝑢 and 𝑣 (cosine measure): • 𝑠𝑖𝑚 𝑢, 𝑣 = 𝑁 𝑢 ∩ 𝑁 𝑣 / |𝑁(𝑢)| ∙ |𝑁(𝑣)| • 𝑢 and 𝑣 are similar neighbors, if • they are connected • their structural similarity 𝑠𝑖𝑚(𝑢, 𝑣) ≥ 𝜀 𝑁 6 = 5,6,7,8,9 𝑁 9 = 6,7,8,9,10,11 𝑠𝑖𝑚 6,9 = 4/ 5 ∙ 6 ≈ 0.73 similar: 𝑠𝑖𝑚(6,9) ≥ 0.6 𝜀 = 0.6, 𝜇 = 3
  • 6. SCAN [Xu+, KDD’07] 2 6 • Core Checking • a vertex 𝑢 is a core, if it has ≥ 𝜇 similar neighbors • Structural Clustering • clustering by similar neighbors from cores • adopting Breadth-First-Search (BFS) for cluster expansion core non-core corenon-core similar not-similar 𝜀 = 0.6, 𝜇 = 3
  • 7. • Structural Clustering • clustering by similar neighbors from cores • adopting Breadth-First-Search (BFS) for cluster expansion SCAN [Xu+, KDD’07] 2 7clusters core non-core 𝜀 = 0.6, 𝜇 = 3
  • 8. pSCAN [Chang+, ICDE’16] 2 8 𝑁 6 = 5,6,7,8,9 𝑁 9 = 6,7,8,9,10,11 𝑠𝑖𝑚 6,9 = 4/ 5 ∙ 6 ≈ 0.73 • Pruning Similarity Computations • adopt union-find data-structure, change algorithmic design • avoid redundant similarity computation, apply pruning techniques • apply early termination in similarity computation set-intersection expensive
  • 9. Motivation of Parallelization 2 9 On Twitter (0.7-billion-edge) • Motivation of Parallelizing pSCAN • cost too much time for interactive exploration of clustering results • cost most from the set-intersection based similarity computation
  • 10. 2 10 Outline 3、Parallelization & Vectorization 1、Pruning-based Graph Structural Clustering 2、Performance Bottleneck & Challenge 4、Experimental Study 5、Conclusion
  • 11. Time BreakDown (SCAN and pSCAN) 2 11 On LiveJournal/Orkut/Twitter • Observations • similarity computation is the performance bottleneck • workload reduction of pSCAN is light-weight but useful
  • 12. pSCAN Parallelization Challenge 2 12 • Data Dependency • there exists concurrent access of lower and upper bounds of number of similar neighbors • a priority queue for selecting vertex with max upper bound of similar neighbors requires heavy synchronization • owing to the similarity reuse technique, similarity values 𝒔𝒊𝒎(𝒖, 𝒗) and 𝒔𝒊𝒎(𝒗, 𝒖) are dependent • Clustering Concurrency Issues • union-find operations should be thread-safe • cluster id initialization and assignment should be thread-safe • Workload Skew and Irregularity • the workload for each vertex depends on its degree and role • pruning techniques make the workload irregular
  • 13. 2 13 Outline 3、Parallelization & Vectorization 1、Pruning-based Graph Structural Clustering 2、Performance Bottleneck & Challenge 4、Experimental Study 5、Conclusion
  • 14. Two-Step Multi-Phase Design Overview 2 14 • Step 1: Role Computing (Determining Core or Non-Core) • similarity pruning phase (without similarity computation) • core checking phase • core consolidating phase (finalizing roles of all the vertices) • Step 2: Core and Non-Core Clustering • core clustering without similarity computation phase • finalizing core clustering with similarity computation phase • cluster id initialization phase • non-core clustering (cluster id assignment) phase • Computation Optimizations • degree-based task scheduling: dealing with the workload skewness • pivot-based set-intersection vectorization: improving the efficiency of similarity computation
  • 15. Step 1 : Role Computing 2 15 core non-core finalizing vertex roles (either core or non-core), stashing some similarity values (unknow/similar/not-similar)
  • 16. Similarity Pruning 2 16 • Utilize Similarity Definition & Min-Max Pruning • do not incur set intersections (similarity computations) • utilize lower and upper bounds of number of similar neighbors similar not-similar 𝑁 0 ∩ 𝑁 1 ≥ 2 𝑠𝑖𝑚(0,1) ≥ 2/ 4 ∙ 2 ≈ 0.71 > 𝜀 𝑁 9 ∩ 𝑁 11 ≤ 𝑁 11 = 2 𝑠𝑖𝑚(9,11) ≤ 2/ 6 ∙ 2 ≈ 0.58 < 𝜀 lower bound upper bound
  • 17. Similarity Pruning 2 17 • Utilize Similarity Definition & Min-Max Pruning • do not incur set intersections (similarity computations) • utilize lower and upper bounds of number of similar neighbors this phase determines some vertex roles without similarity computations lower bound upper bound local variables 𝒔𝒅 (similar degree) and 𝒆𝒅 (effective degree): lower and upper bounds of 𝑢’s similar neighborhood size
  • 18. • Vertex Exploration Order Constraint (𝒖 < 𝒗): • to guarantee no redundant computation: each undirected edge is computed at most once for the similarity value • Core Checking and Consolidating Two-Phase • to apply similarity reuse technique given vertex order constraint, while finalizing all vertex roles Core Checking and Consolidating 2 18 after the two parallel phases, all vertex roles are known two phases
  • 19. Core Checking 2 19 two phases 1) initialize pruning related lower and upper bounds, and see if we can benefit from parallel core checking from 𝑢’s neighbors with the min- max pruning technique
  • 20. Core Checking 2 20 two phases 2) determine some vertex roles, and apply the similarity reuse and min-max pruning techniques
  • 21. Core Consolidating 2 21 finalizing all vertex roles two phases work-efficiency-proof: the similarity computation is at most invoked once for the similarity values 𝒔𝒊𝒎[𝒆(𝒖, 𝒗)] and 𝒔𝒊𝒎[𝒆(𝒗, 𝒖)]
  • 22. Step 2 : Core and Non-Core Clustering 2 22 core and non-core clustering cluster 0 cluster 1 non-core
  • 23. Core Clustering 2 23 • Two Phase Separation • core clustering using already known similarity values without set intersections • finalizing core clustering with set intersections union-find pruning vertex order constraint vertex order constraint • Avoiding Redundant Computation • adding 𝑢 < 𝑣 constraint during the clustering for both phases • applying union-find pruning in the second phase two phases
  • 24. Non-Core Clustering 2 24 • Two Phase Separation • cluster id initialization using atomic operations • non-core cluster id assignment from all the cores to similar neighbors (to form final results) CAS atomic operations cluster id assignment
  • 25. Degree-based Task Scheduling 2 25 degree-accumulation- based-task-submission handling tail task • Dynamic Scheduling • vertex computations relate to the degree and role of the vertex in all the phases, for the exploration of neighbors and computations • a task can be represented with 𝑣 𝑏𝑒𝑔, 𝑣 𝑒𝑛𝑑 , and the parameter of range size is tunable (in our experimental setting: 32768) Dynamic Scheduling Example for Core Checking
  • 26. Set-Intersection Vectorization 2 26 1. Find the first element in u’s sorted neighbors ≥ pivot_v 2. Find the first element in v’s sorted neighbors ≥ pivot_u 3. Count the match upper and lower bounds early termination
  • 27. 2 27 Outline 3、Parallelization & Vectorization 1、Pruning-based Graph Structural Clustering 2、Performance Bottleneck & Challenge 4、Experimental Study 5、Conclusion
  • 28. Experimental Setup 2 28 • Environments • Xeon Phi Processor (KNL): 64 cores (2 VPUs / core), AVX512, 64KB/1024KB L1/L2 caches, 16GB MCDRAM (cache mode), 96GB RAM • Xeon CPU E5-2650: 20 cores, AVX2, 64KB/256KB/25MB L1/L2/L3 cache, 64GB RAM • Algorithms • Sequential: SCAN [Xu+, KDD’07], pSCAN [Chang+, ICDE’16] • Parallel: anySCAN [Mai+, ICDE’17], SCAN-XP [Takahashi, NDA’17], our ppSCAN, ppSCAN-NO (without vectorization) • Graphs (Billion-Edge) • Real-World: Orkut/Twitter/Friendster (Social), Webbase (Web) • Synthetic Power-Law: ROLL Graphs [Hadian+, SIGMOD’16]
  • 29. On KNL Overall Performance (on CPU and KNL) 2 29 On CPU • Algorithm Comparison • SCAN and pSCAN lack parallelization • SCAN-XP lacks usage of pruning techniques • anySCAN suffers from heavy synchronization and poor memory locality, and runs out of memory on webbase and friendster datasets • ppSCAN has good memory locality and negligible synchronization overheads, and utilizes vectorization AVX2 on CPU and AVX512 on KNL
  • 30. Set-Intersection Invocation Reduction 2 30 • Work Efficiency • multi-phase computation does not introduce more workload • ppSCAN even compute less because of the similarity pruning and parallel core checking benefits 𝝁 = 𝟏𝟎 𝝁 = 𝟓
  • 31. Set-Intersection Vectorization Improvement 2 31 𝝁 = 𝟓 (On Both CPU and KNL) • Core Checking Speedup from Vectorization • on CPU: at most 3.5x, on KNL: at most 4.5x • vectorization have better performance with more workloads (when memory access is not a bottleneck, e.g., when 𝜀 = 0.1 , intensive set intersections hide the memory access latency
  • 32. Scalability to Number of threads 2 32 • Scalability and Time Breakdown • all the four phases scale well to number of threads • speedup of core checking is better than other phases, because the intensive set intersection computations hide the memory access latency • time breakdown ratio: • core checking and consolidating > similarity pruning > non-core clustering > core clustering 𝜺 = 𝟎. 𝟐, 𝝁 = 𝟓 (On KNL)
  • 33. Robustness 2 33 • Given Different Parameters and Graphs • robust on varying input parameters • finish computations within 70 seconds • robust on synthetic graphs (1-billion edge) • finish computations within 60 seconds • achieve up to 135x self-speedup on KNL
  • 34. 2 34 Outline 3、Parallelization & Vectorization 1、Pruning-based Graph Structural Clustering 2、Performance Bottleneck & Challenge 4、Experimental Study 5、Conclusion
  • 35. Conclusion 2 35 • Parallelization & Vectorization Design • multi-phase lock-free parallel vertex computations • dynamic degree-based vertex computation task scheduling • pivot-based set intersection vectorization • Experimental Study • ppSCAN is about 2x faster on KNL (64 cores, AVX512) than on Xeon CPU (20 cores, AVX2) because of wider SIMD width • on KNL, two orders of magnitude faster than the sequential pSCAN • on KNL, one order of magnitude faster than the parallel anySCAN and SCAN-XP • on KNL, up to 135x self-speedup (over single-thread ppSCAN)
  • 36. • Source Codes / Figures / Related Projects : https://github.com/GraphProcessor/ppSCAN • More Experimental Studies (Scripts / Figures): https://github.com/GraphProcessor/ppSCAN/tree/master/pyt hon_experiments • This PPT: https://www.dropbox.com/sh/i1r45o2ceraey8j/AAD8V3 WwPElQjwJ0-QtaKAzYa?dl=0&preview=ppSCAN.pdf End - Q & A 36