Dynamic Batch Parallel Algorithms for Updating PageRank : POSTER

Dynamic Batch Parallel Algorithms for Updating PageRank
Subhajit Sahu†, Kishore Kothapalli† and Dip Sankar Banerjee‡
†International Institute of Information Technology Hyderabad, India.
‡Indian Institute of Technology Jodhpur, India.
subhajit.sahu@research.,kkishore@iiit.ac.in, dipsankarb@iitj.ac.in
Acknowledgement
This work is partially supported by a grant from the Department of Science and Technology (DST), India, under the National
Supercomputing Mission (NSM) R&D in Exascale initiative vide Ref. No: DST/NSM/R&D Exascale/2021/16.
References
[1] P. Garg and K. Kothapalli, “STIC-D: Algorithmic Techniques For Efficient Parallel Pagerank Computation on Real-World Graphs,” in
Proceedings of the 17th International Conference on Distributed Computing and Networking - ICDCN ’16. ACM Press, Jan. 2016, pp. 1–10.
[2] H. K. Giri, M. Haque, and D. S. Banerjee, “HyPR: Hybrid Page Ranking on Evolving Graphs,” in Proc. IEEE 27th International Conference on
High Performance Computing, Data, and Analytics (HiPC), 2020, pp. 62–71.
Results
Batched vs Cumulative update
- CPU: 4066×, 2998× speedup for a batch of 5000
edges wrt cumulative single-edge updates.
- GPU: 1712×, 2324× speedup for a batch of 5000
edges wrt cumulative single-edge updates.
Comparison with state-of-the-art
- CPU: 6.1×, 8.6× wrt static plain STIC-D PR [1].
- GPU: 9.8×, 9.3× wrt naive dynamic nvGraph PR.
- CPU: 4.2×, 5.8× wrt Pure CPU HyPR [2].
- GPU: 1.9×, 1.8× wrt Pure GPU HyPR.
Figure 2: Comparison with pure-CPU HyPR and plain STIC-D PR on
the CPU; speedup of DynamicLevelwisePR on the respective bars
(top). Comparison with pure-GPU HyPR and naive dynamic
nvGraph PR on the GPU; speedup of DynamicMonolithicPR on the
respective bars (bottom). Averaged over batch sizes of 500, 1000,
2000, 5000, and 10000.
Figure 3: Speedup of batched DynamicLevelwisePR with respect
to cumulative single-edge updates (same approach) on the CPU
is shown on the top. Speedup of batched DynamicMonolithicPR
with respect to cumulative single-edge updates with the same
approach on the GPU is shown on the bottom. Batch sizes of 500,
1000, 5000, and 10000 are shown.
Dataset
- From the SuiteSparse Matrix Collection.
- Add self-loops to dead ends in all graphs.
- Number of vertices varies from 75k to 41M.
- Number of edges varies from 524k to 1.1B.
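The dead-end handling listed above can be sketched as follows. This is a minimal Python illustration, not the authors' implementation; the function name and the dict-of-lists graph layout are assumptions made for the example.

```python
def add_self_loops_to_dead_ends(out_edges):
    """Return a copy of the graph in which every dead end gets a self-loop.

    out_edges: dict mapping each vertex to a list of its out-neighbours
    (hypothetical layout chosen for this sketch).
    """
    fixed = {v: list(targets) for v, targets in out_edges.items()}
    for v, targets in fixed.items():
        if not targets:          # dead end: no outgoing edges
            targets.append(v)    # add the self-loop v -> v
    return fixed


graph = {0: [1], 1: []}          # vertex 1 is a dead end
print(add_self_loops_to_dead_ends(graph))
```

After this step every vertex has at least one out-edge, so rank never leaks out of the graph during power-iteration.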
Batch generation
- Batch sizes vary from 500 to 10,000 edges.
- Edge insertions and deletions in equal mix.
- High-degree vertices have a higher chance
of selection (to mimic real-world graphs).
- No vertices are added or removed.
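One way to generate such a batch is sketched below. The poster does not state the exact sampling rule, so the weighting (out-degree + 1), the `seed` and `max_attempts` parameters, and the function name are all assumptions made for illustration.

```python
import random


def generate_batch(out_edges, batch_size, seed=0, max_attempts=100000):
    """Return (deletions, insertions) as sorted lists of (u, v) edges.

    Half of the batch deletes existing edges, the other half inserts
    edges that are not present. Endpoints are sampled with probability
    proportional to out-degree + 1 (an assumed weighting), so that
    high-degree vertices are picked more often. No vertices are added.
    """
    rng = random.Random(seed)
    vertices = list(out_edges)
    weights = [len(out_edges[v]) + 1 for v in vertices]
    existing = {(u, v) for u, targets in out_edges.items() for v in targets}
    want_del = min(batch_size // 2, len(existing))
    want_ins = batch_size - batch_size // 2
    deletions, insertions = set(), set()
    for _ in range(max_attempts):
        if len(deletions) == want_del and len(insertions) == want_ins:
            break
        u, v = rng.choices(vertices, weights=weights, k=2)
        if len(deletions) < want_del and out_edges[u]:
            deletions.add((u, rng.choice(out_edges[u])))
        if len(insertions) < want_ins and (u, v) not in existing:
            insertions.add((u, v))
    return sorted(deletions), sorted(insertions)


ring = {i: [(i + 1) % 6] for i in range(6)}
print(generate_batch(ring, 4, seed=1))
```

Deleting edges can create new dead ends; in this sketch they would need self-loops added again before ranks are recomputed.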
Performance measurement
- 32-bit integers for CSR representation.
- 32-bit floats for rank vector.
- L∞-norm for error measurement,
(L2-norm for nvGraph PageRank).
- Measured time covers only rank computation.
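The two error measures named above can be written as below. This is a plain-Python sketch for clarity; the function names are hypothetical, and the rank vectors are assumed to be equal-length sequences of floats.

```python
def linf_error(ranks_a, ranks_b):
    """L-infinity norm: the largest absolute per-vertex rank difference."""
    return max(abs(a - b) for a, b in zip(ranks_a, ranks_b))


def l2_error(ranks_a, ranks_b):
    """L2 norm: square root of the sum of squared rank differences."""
    return sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b)) ** 0.5


previous = [0.20, 0.30, 0.50]
current = [0.25, 0.30, 0.45]
print(linf_error(previous, current), l2_error(previous, current))
```

The L∞-norm bounds the change of every individual vertex, whereas the L2-norm aggregates the change over all vertices, so the same tolerance value is not directly comparable between the two.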
Platform
- Intel(R) Xeon(R) Silver 4116 CPU (12 cores) x 2
Cache L1: 768KB, L2: 12MB, L3: 16MB (shared).
- NVIDIA Tesla V100 GPU (16GB PCIe),
14 TFLOPs SP (84 SMs x 64 FP/INT cores).
- CentOS 7.9, OpenMP 5.0, CUDA 11.3, GCC 9.3.
Our Approaches
DynamicLevelwisePR
- Contrasts with full power-iteration.
- Process vertices in levels of SCCs.
- Avoid converged/unstable vertices.
- No per-iteration sharing of ranks.
- Faster on CPU with OpenMP.
- Slightly higher error.
- Requires graph to be dead-end free.
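To illustrate what "levels of SCCs" means, the sketch below groups vertices into strongly connected components and assigns each component a level, so that a component only depends on components in earlier levels. It uses Kosaraju's algorithm as an assumed choice; the poster does not say which SCC algorithm is used, and the function names and dict-of-lists layout are hypothetical. The rank computation itself is not shown.

```python
def strongly_connected_components(out_edges):
    """Kosaraju's algorithm. Returns (comp, count), where comp maps each
    vertex to a component id; ids follow a topological order of the
    condensed graph (source components get the smallest ids)."""
    order, seen = [], set()
    for s in out_edges:                      # pass 1: DFS finishing order
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(out_edges[s]))]
        while stack:
            v, it = stack[-1]
            for w in it:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(out_edges[w])))
                    break
            else:                            # all out-neighbours visited
                order.append(v)
                stack.pop()
    in_edges = {v: [] for v in out_edges}    # transposed graph
    for u, targets in out_edges.items():
        for v in targets:
            in_edges[v].append(u)
    comp, count = {}, 0
    for s in reversed(order):                # pass 2: DFS on the transpose
        if s in comp:
            continue
        comp[s] = count
        stack = [s]
        while stack:
            v = stack.pop()
            for u in in_edges[v]:
                if u not in comp:
                    comp[u] = count
                    stack.append(u)
        count += 1
    return comp, count


def scc_levels(out_edges):
    """Return a list of levels; each level is a list of components
    (sorted vertex lists) whose in-edges come only from earlier levels."""
    comp, count = strongly_connected_components(out_edges)
    members = [[] for _ in range(count)]
    for v, c in comp.items():
        members[c].append(v)
    level = [0] * count
    for c in range(count):                   # ids are topologically ordered
        for u in members[c]:
            for w in out_edges[u]:
                if comp[w] != c:
                    level[comp[w]] = max(level[comp[w]], level[c] + 1)
    levels = [[] for _ in range(max(level, default=-1) + 1)]
    for c in range(count):
        levels[level[c]].append(sorted(members[c]))
    return levels


graph = {0: [1], 1: [0, 2], 2: [3], 3: [2], 4: [3, 4]}
print(scc_levels(graph))
```

Because ranks only flow from earlier levels to later ones, a level can be iterated to convergence once and then left alone while later levels are processed.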
DynamicMonolithicPR
- Full power-iteration, process all vertices.
- Group vertices by SCC for better access.
- Partition vertices by in-degree on GPU.
- Use old ranks, skip unaffected vertices.
- Affected vertices found with DFS.
- Faster on GPU with CUDA.
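The reuse of old ranks and the DFS-based search for affected vertices can be sketched on the CPU as below. This is an illustrative Python sketch under stated assumptions, not the authors' CUDA implementation: a vertex is treated as affected if it is reachable, in the updated graph, from either endpoint of any updated edge; the damping factor 0.85, the tolerance, and all function names are assumed. The graph is taken to be dead-end free.

```python
def affected_vertices(out_edges, batch):
    """Iterative DFS over the updated graph, started from both endpoints
    of every inserted or deleted edge (u, v) in the batch."""
    affected = set()
    stack = [x for edge in batch for x in edge]
    while stack:
        u = stack.pop()
        if u in affected:
            continue
        affected.add(u)
        stack.extend(w for w in out_edges[u] if w not in affected)
    return affected


def update_ranks(out_edges, old_ranks, batch,
                 damping=0.85, tol=1e-10, max_iter=500):
    """Start from the old ranks and recompute only the affected vertices
    with pull-based power-iteration; unaffected vertices keep old ranks."""
    n = len(out_edges)
    in_edges = {v: [] for v in out_edges}
    for u, targets in out_edges.items():
        for v in targets:
            in_edges[v].append(u)
    affected = affected_vertices(out_edges, batch)
    ranks = dict(old_ranks)
    for _ in range(max_iter):
        new_ranks = dict(ranks)
        error = 0.0
        for v in affected:
            pulled = sum(ranks[u] / len(out_edges[u]) for u in in_edges[v])
            new_ranks[v] = (1.0 - damping) / n + damping * pulled
            error = max(error, abs(new_ranks[v] - ranks[v]))  # L-infinity
        ranks = new_ranks
        if error < tol:
            break
    return ranks


# Old graph: three self-loops, so every old rank is 1/3.
old_ranks = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}
# Batch: delete (1, 1), insert (1, 0). Updated graph:
new_graph = {0: [0], 1: [0], 2: [2]}
print(update_ranks(new_graph, old_ranks, [(1, 1), (1, 0)]))
```

In the usage example vertex 2 is unreachable from the updated edge, so it is never touched and keeps its old rank.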
Introduction
Types of Dynamic graph algorithms
- Incremental: handles 1 edge/vertex insertion.
- Decremental: handles 1 edge/vertex deletion.
- Fully dynamic: handles 1 insertion or deletion.
- Batched fully dynamic: handles n insertions
and/or deletions.
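The difference between these update types is only in what a single update call accepts. A minimal sketch of a batched fully dynamic edge update is given below; the function name and the dict-of-lists graph layout are assumptions for illustration. An incremental or decremental update corresponds to a batch holding one insertion or one deletion.

```python
def apply_batch(out_edges, deletions, insertions):
    """Apply a batch of edge deletions and insertions and return the
    updated graph; the input graph is left unchanged."""
    updated = {v: set(targets) for v, targets in out_edges.items()}
    for u, v in deletions:
        updated[u].discard(v)    # ignore edges that are already absent
    for u, v in insertions:
        updated[u].add(v)        # duplicate insertions have no effect
    return {v: sorted(targets) for v, targets in updated.items()}


graph = {0: [1], 1: [2], 2: [0]}
print(apply_batch(graph, deletions=[(0, 1)], insertions=[(0, 2), (1, 0)]))
```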
Benefits of Dynamic graph algorithms
- Reduces time needed for performing analytics.
- Enables interactivity with dataset.
- Parallel fully dynamic algorithms accept a
batch of updates, which minimizes the
computation needed compared with fully
dynamic ones that apply one update at a time.
PageRank computation approaches
- Matrix multiplication.
- Power-iteration (push vs pull).
- Random walk (approximate).
Challenges & Limitations
- Graphs are massive and constantly updated.
- Existing dynamic algorithms do not utilize
reducibility of graphs.
- Vertices that cannot converge until the
vertices they depend on have converged are
still processed.
- Locality benefits of SCCs are not explored.
PageRank has applications in:
- Ranking of websites.
- Measuring scientific impact of researchers.
- Finding the best teams and athletes.
- Ranking companies by talent concentration.
- Predicting road/foot traffic in urban spaces.
- Analysing protein networks.
- Finding the most authoritative news sources.
- Identifying parts of brain that change jointly.
- Toxic waste management.
- PageRank is a link-analysis algorithm.
- By Larry Page and Sergey Brin in 1996.
- For ordering information on the web.
- Represented with a random-surfer model.
- Rank of a page is defined recursively.
- Calculate iteratively with power-iteration.
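The recursive definition and its iterative solution can be sketched as a pull-based power-iteration. This is a minimal static version for illustration; the damping factor 0.85, the tolerance, the function name, and the dict-of-lists layout are assumptions, and the graph is assumed to have no dead ends.

```python
def pagerank(out_edges, damping=0.85, tol=1e-10, max_iter=500):
    """Pull-based power-iteration on a dead-end-free graph.

    Each vertex v pulls rank[u] / out_degree(u) from every in-neighbour
    u, which is the recursive definition of its rank. Iteration stops
    when the L-infinity change between two iterations drops below tol.
    """
    vertices = list(out_edges)
    n = len(vertices)
    in_edges = {v: [] for v in vertices}
    for u, targets in out_edges.items():
        for v in targets:
            in_edges[v].append(u)
    ranks = {v: 1.0 / n for v in vertices}   # uniform initial ranks
    for _ in range(max_iter):
        new_ranks = {}
        for v in vertices:
            pulled = sum(ranks[u] / len(out_edges[u]) for u in in_edges[v])
            # (1 - damping) / n models the random surfer's teleport step.
            new_ranks[v] = (1.0 - damping) / n + damping * pulled
        error = max(abs(new_ranks[v] - ranks[v]) for v in vertices)
        ranks = new_ranks
        if error < tol:
            break
    return ranks


# Vertices 1 and 2 link to vertex 0, which has a self-loop.
print(pagerank({0: [0], 1: [0], 2: [0]}))
```

In the usage example vertex 0 collects rank from every vertex and ends up with most of the total, while vertices 1 and 2 receive only the teleport share.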
Fighting Fake news
- Click-Gap: when is Facebook driving
disproportionate amounts of traffic to
websites?
- Effort to rid Facebook’s services of
fake news.
- Is a website relying on Facebook to drive
significant traffic, but not well ranked by the
rest of the web?
Debugging complex software systems
- MonitorRank: a version of PageRank designed
to analyze complex, engineered systems.
- Returns a ranked list of systems based on the
likelihood that they contributed to, or
participated in, an anomalous situation.
Finding the most original writers
- BookRank: using a network of 19th-century
authors to find quantitative evidence that
Jane Austen and Walter Scott were the most
original authors of the 19th century.
Finding topical authorities
- TwitterRank: using the teleportation vector
and topic-specific transition probabilities to
localize the PageRank vector.