Dynamic Batch Parallel Algorithms for Updating PageRank
Subhajit Sahu†, Kishore Kothapalli† and Dip Sankar Banerjee‡
†International Institute of Information Technology Hyderabad, India.
‡Indian Institute of Technology Jodhpur, India.
subhajit.sahu@research., kkishore@iiit.ac.in, dipsankarb@iitj.ac.in
Acknowledgement
This work is partially supported by a grant from the Department of Science and Technology (DST), India, under the National
Supercomputing Mission (NSM) R&D in Exascale initiative vide Ref. No: DST/NSM/R&D Exascale/2021/16.
References
[1] P. Garg and K. Kothapalli, “STIC-D: Algorithmic Techniques For Efficient Parallel Pagerank Computation on Real-World Graphs,” in
Proceedings of the 17th International Conference on Distributed Computing and Networking - ICDCN ’16. ACM Press, Jan. 2016, pp. 1–10.
[2] H. K. Giri, M. Haque, and D. S. Banerjee, “HyPR: Hybrid Page Ranking on Evolving Graphs,” in Proc. IEEE 27th International Conference on
High Performance Computing, Data, and Analytics (HiPC), 2020, pp. 62–71.
Results
Batched vs Cumulative update
- CPU: 4066×, 2998× speedup for a 5000-edge batch
wrt cumulative single-edge updates.
- GPU: 1712×, 2324× speedup for a 5000-edge batch
wrt cumulative single-edge updates.
Comparison with state-of-the-art
- CPU: 6.1×, 8.6× wrt static plain STIC-D PR [1].
- GPU: 9.8×, 9.3× wrt naive dynamic nvGraph PR.
- CPU: 4.2×, 5.8× wrt Pure CPU HyPR [2].
- GPU: 1.9×, 1.8× wrt Pure GPU HyPR.
Figure 2: Comparison with pure-CPU HyPR and plain STIC-D PR on
the CPU; speedup of DynamicLevelwisePR on the respective bars
(top). Comparison with pure-GPU HyPR and naive dynamic
nvGraph PR on the GPU; speedup of DynamicMonolithicPR on the
respective bars (bottom). Averaged over batch sizes of 500, 1000,
2000, 5000, and 10000.
Figure 3: Speedup of batched DynamicLevelwisePR with respect
to cumulative single-edge updates (same approach) on the CPU
is shown on the top. Speedup of batched DynamicMonolithicPR
with respect to cumulative single-edge updates with the same
approach on the GPU is shown on the bottom. Batch sizes of 500,
1000, 5000, and 10000 are shown.
Dataset
- From the SuiteSparse Matrix Collection.
- Add self-loops to dead ends in all graphs.
- Number of vertices varies from 75k to 41M.
- Number of edges varies from 524k to 1.1B.
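The dead-end handling listed above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `add_self_loops_to_dead_ends` and the edge-list representation are assumptions made for the example.

```python
def add_self_loops_to_dead_ends(num_vertices, edges):
    """Return a new edge list in which every vertex has at least one out-edge.

    Hypothetical helper: a dead end is taken to be a vertex with no
    outgoing edge, and it receives a single self-loop (v, v).
    """
    has_out_edge = [False] * num_vertices
    for u, _ in edges:
        has_out_edge[u] = True
    self_loops = [(v, v) for v in range(num_vertices) if not has_out_edge[v]]
    return list(edges) + self_loops


# Path 0 -> 1 -> 2: vertex 2 is the only dead end.
edges = [(0, 1), (1, 2)]
fixed = add_self_loops_to_dead_ends(3, edges)
print(fixed)
```

With every vertex having an out-edge, the total rank is conserved across iterations without a separate dead-end correction term.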
Batch generation
- Batch sizes vary from 500 to 10,000 edges.
- Edge insertions, deletions in equal mix.
- High degree vertices have higher chance
of selection (mimic real-world graphs).
- No new vertices are added or removed.
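One way to generate such a batch is sketched below. It is an assumption-laden illustration rather than the generator used for the poster: the name `generate_batch`, the degree-plus-one weighting, and uniform sampling of deletions from existing edges are all choices made for this example. It also does not repair dead ends that a deletion might create.

```python
import random


def generate_batch(num_vertices, edges, batch_size, seed=0):
    """Return (insertions, deletions) with an equal mix of both.

    Assumptions for this sketch: endpoints of inserted edges are drawn
    with probability proportional to (degree + 1); deletions are sampled
    uniformly from the existing edges. The vertex set never changes.
    """
    rng = random.Random(seed)
    edge_set = set(edges)
    weight = [1] * num_vertices  # +1 keeps isolated vertices selectable
    for u, v in edges:
        weight[u] += 1
        weight[v] += 1
    vertices = list(range(num_vertices))
    deletions = rng.sample(sorted(edge_set), batch_size // 2)
    insertions = set()
    # Rejection sampling; assumes the graph is sparse enough to terminate.
    while len(insertions) < batch_size - len(deletions):
        u, v = rng.choices(vertices, weights=weight, k=2)
        if u != v and (u, v) not in edge_set:
            insertions.add((u, v))
    return sorted(insertions), deletions


# Directed ring on 10 vertices, batch of 6 updates.
edges = [(i, (i + 1) % 10) for i in range(10)]
insertions, deletions = generate_batch(10, edges, batch_size=6)
print(insertions, deletions)
```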
Performance measurement
- 32-bit integers for CSR representation.
- 32-bit floats for rank vector.
- L∞-norm for error measurement,
(L2-norm for nvGraph PageRank).
- Measured time includes only rank computation.
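The two error norms mentioned above differ as in the following sketch. The function names `linf_error` and `l2_error` and the sample vectors are illustrative assumptions.

```python
import math


def linf_error(a, b):
    """L-infinity norm: the largest absolute per-vertex difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def l2_error(a, b):
    """L2 norm: square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two hypothetical rank vectors over three vertices.
reference = [0.5, 0.3, 0.2]
computed = [0.4, 0.3, 0.3]
linf = linf_error(reference, computed)
l2 = l2_error(reference, computed)
print(linf, l2)
```

Because the L2 norm accumulates differences over all vertices while the L∞ norm reports only the worst one, the same tolerance value is not directly comparable between the two.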
Platform
- Intel(R) Xeon(R) Silver 4116 CPU (12 cores) x 2
Cache L1: 768KB, L2: 12MB, L3: 16MB (shared).
- NVIDIA Tesla V100 GPU (16GB PCIe),
14 TFLOPs SP (84 SMs x 64 FP/INT cores).
- CentOS 7.9, OpenMP 5.0, CUDA 11.3, GCC 9.3.
Our Approaches
DynamicLevelwisePR
- In contrast to full power-iteration.
- Process vertices in levels of SCCs.
- Avoid converged/unstable vertices.
- No per-iteration sharing of ranks.
- Faster on CPU with OpenMP.
- Slightly higher error.
- Requires graph to be dead-end free.
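The idea of processing vertices in levels of SCCs can be illustrated with the sequential sketch below. It is not the authors' OpenMP implementation: it omits the dynamic (batch-update) part and parallelism, and the names `strongly_connected_components` and `levelwise_pagerank`, the damping factor 0.85, and the tolerance are assumptions. Each component is iterated to convergence on its own, reading already-final ranks from upstream components.

```python
def strongly_connected_components(out_neighbors):
    """Tarjan's algorithm (recursive, so only suitable for small graphs).

    Returns components in topological order of the condensation DAG.
    """
    n = len(out_neighbors)
    index, low = [None] * n, [0] * n
    on_stack, stack, comps, counter = [False] * n, [], [], [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack[v] = True
        for w in out_neighbors[v]:
            if index[w] is None:
                visit(w)
                low[v] = min(low[v], low[w])
            elif on_stack[w]:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack[w] = False
                comp.append(w)
                if w == v:
                    break
            comps.append(comp)

    for v in range(n):
        if index[v] is None:
            visit(v)
    comps.reverse()  # Tarjan emits sink components first
    return comps


def levelwise_pagerank(out_neighbors, damping=0.85, tol=1e-10, max_iter=500):
    """Converge one SCC at a time; assumes a dead-end-free graph."""
    n = len(out_neighbors)
    in_neighbors = [[] for _ in range(n)]
    for u, nbrs in enumerate(out_neighbors):
        for v in nbrs:
            in_neighbors[v].append(u)
    ranks = [1.0 / n] * n
    for comp in strongly_connected_components(out_neighbors):
        for _ in range(max_iter):
            new = {v: (1.0 - damping) / n + damping * sum(
                ranks[u] / len(out_neighbors[u]) for u in in_neighbors[v])
                for v in comp}
            error = max(abs(new[v] - ranks[v]) for v in comp)
            for v in comp:
                ranks[v] = new[v]
            if error < tol:
                break
    return ranks


# Two SCCs, {0, 1} and {2, 3}, linked by the edge 1 -> 2.
out_neighbors = [[1], [0, 2], [3], [2]]
ranks = levelwise_pagerank(out_neighbors)
print(ranks)
```

Since each component stops at its own tolerance and downstream components inherit that residual, the final error can be slightly larger than with a single global iteration.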
DynamicMonolithicPR
- Full power-iteration, process all vertices.
- Group vertices by SCC for better access.
- Partition vertices by in-degree on GPU.
- Use old ranks, skip unaffected vertices.
- Affected vertices found with DFS.
- Faster on GPU with CUDA.
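Finding affected vertices with DFS might look like the sketch below. The exact marking rule is an assumption for this example: every vertex reachable in the updated graph from either endpoint of an updated edge is treated as affected. The name `affected_vertices` is hypothetical.

```python
def affected_vertices(out_neighbors, updated_edges):
    """Iterative DFS from the endpoints of all inserted/deleted edges.

    Assumption for this sketch: out_neighbors describes the graph after
    the batch is applied, and reachability from either endpoint of an
    updated edge marks a vertex as affected.
    """
    affected = [False] * len(out_neighbors)
    stack = []
    for u, v in updated_edges:
        stack.extend((u, v))
    while stack:
        x = stack.pop()
        if affected[x]:
            continue
        affected[x] = True
        stack.extend(y for y in out_neighbors[x] if not affected[y])
    return affected


# Edges: 0 -> 1, 1 -> 2, 2 -> 1, 3 -> 0, 4 -> 3; edge (0, 1) was updated.
out_neighbors = [[1], [2], [1], [0], [3]]
affected = affected_vertices(out_neighbors, [(0, 1)])
print(affected)
```

Vertices 3 and 4 lie upstream of the change, so they keep their old ranks and can be skipped during the power-iteration.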
Introduction
Types of Dynamic graph algorithms
- Incremental: handles 1 edge/vertex insertion.
- Decremental: handles 1 edge/vertex deletion.
- Fully dynamic: handles 1 insertion or deletion.
- Batched fully dynamic: handles n insertions
and/or deletions.
Benefits of Dynamic graph algorithms
- Reduces time needed for performing analytics.
- Enables interactivity with dataset.
- Parallel batched fully dynamic algorithms
accept a batch of updates, minimizing the
computation needed compared with
single-update fully dynamic ones.
PageRank computation approaches
- Matrix multiplication.
- Power-iteration (push vs pull).
- Random walk (approximate).
Challenges & Limitations
- Graphs are massive and constantly updated.
- Existing dynamic algorithms do not utilize
reducibility of graphs.
- Vertices which are dependent upon other
vertices to converge are still processed.
- Locality benefits of SCCs are not explored.
PageRank has applications in:
- Ranking of websites.
- Measuring scientific impact of researchers.
- Finding the best teams and athletes.
- Ranking companies by talent concentration.
- Predicting road/foot traffic in urban spaces.
- Analysing protein networks.
- Finding the most authoritative news sources.
- Identifying parts of brain that change jointly.
- Toxic waste management.
- PageRank is a link-analysis algorithm.
- By Larry Page and Sergey Brin in 1996.
- For ordering information on the web.
- Represented with a random-surfer model.
- Rank of a page is defined recursively.
- Calculated iteratively with power-iteration.
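The recursive definition and its iterative solution can be sketched as a pull-based power-iteration. This is a generic textbook formulation under stated assumptions (damping factor 0.85, uniform teleport, dead-end-free graph), not the poster's implementation; the name `pagerank` and its parameters are illustrative.

```python
def pagerank(in_neighbors, out_degree, damping=0.85, tol=1e-10, max_iter=500):
    """Pull-based power-iteration on a dead-end-free graph.

    Each vertex v pulls rank from its in-neighbors u:
        r[v] = (1 - d) / n + d * sum(r[u] / out_degree[u])
    """
    n = len(in_neighbors)
    ranks = [1.0 / n] * n
    for _ in range(max_iter):
        new_ranks = [
            (1.0 - damping) / n
            + damping * sum(ranks[u] / out_degree[u] for u in in_neighbors[v])
            for v in range(n)
        ]
        # L-infinity norm of the change between successive iterations
        error = max(abs(a - b) for a, b in zip(new_ranks, ranks))
        ranks = new_ranks
        if error < tol:
            break
    return ranks


# Edges: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
in_neighbors = [[2], [0], [0, 1]]
out_degree = [2, 1, 1]
ranks = pagerank(in_neighbors, out_degree)
print(ranks)
```

Vertex 2 receives links from both other vertices and ends up with the highest rank; vertex 0 inherits most of that rank through the single edge 2 -> 0.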
Fighting Fake news
- Click-Gap: when Facebook is driving
disproportionate amounts of traffic to
websites.
- Effort to rid fake news from Facebook’s
services.
- Is a website relying on Facebook to drive
significant traffic, but not well ranked by the
rest of the web?
Debugging complex software systems
- MonitorRank: a version of PageRank designed
to analyze complex, engineered systems.
- Returns a ranked list of systems based on the
likelihood that they contributed to, or
participated in, an anomalous situation.
Finding the most original writers
- BookRank: using a network of 19th century
authors to find quantitative evidence that
Jane Austen and Walter Scott were the most
original authors of the 19th century.
Finding topical authorities
- TwitterRank: using the teleportation vector
and topic-specific transition probabilities to
localize the PageRank vector.
