Adjusting data types for rank vector
Custom (fp16, bfloat16) · float · double
1. Performance of vector element sum using float vs bfloat16 as the storage type.
2. Comparison of PageRank using float vs bfloat16 as the storage type (pull, CSR).
3. Performance of PageRank using 32-bit floats vs 64-bit floats (pull, CSR).
Adjusting PageRank parameters
Damping factor: adjust · dynamic-adjust
Tolerance: L1 norm · L2 norm · L∞ norm
1. Comparing the effect of using different values of damping factor, with PageRank (pull, CSR).
2. Experimenting with improving PageRank by adjusting the damping factor (α) between iterations.
3. Comparing the effect of using different functions for convergence check, with PageRank (...).
4. Comparing the effect of using different values of tolerance, with PageRank (pull, CSR).
Adjusting Sequential approach
Push vs Pull · Class vs CSR
1. Performance of contribution-push based vs contribution-pull based PageRank.
2. Performance of C++ DiGraph class based vs CSR based PageRank (pull).
Adjusting OpenMP approach
Map · Reduce · Uniform vs Hybrid
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Performance of sequential execution based vs OpenMP based vector element sum.
3. Performance of uniform-OpenMP based vs hybrid-OpenMP based PageRank (pull, CSR).
Comparing sequential approach
                    OpenMP   nvGraph
  Sequential        vs       vs
  OpenMP                     vs
1. Performance of sequential execution based vs OpenMP based PageRank (pull, CSR).
2. Performance of sequential execution based vs nvGraph based PageRank (pull, CSR).
3. Performance of OpenMP based vs nvGraph based PageRank (pull, CSR).
Adjusting Monolithic (Sequential) optimizations (from STICD)
Split components · Skip in-identicals · Skip chains · Skip converged
1. Performance benefit of PageRank with vertices split by components (pull, CSR).
2. Performance benefit of skipping in-identical vertices for PageRank (pull, CSR).
3. Performance benefit of skipping chain vertices for PageRank (pull, CSR).
4. Performance benefit of skipping converged vertices for PageRank (pull, CSR).
Adjusting Levelwise (STICD) approach
Min. component size · Min. compute size · Skip teleport calculation
1. Comparing various min. component sizes for topologically-ordered components (levelwise...).
2. Comparing various min. compute sizes for topologically-ordered components (levelwise...).
3. Checking performance benefit of levelwise PageRank when teleport calculation is skipped.
Note: min. component size merges small components even before generating the block-graph / topological ordering, whereas min. compute size does so just before the PageRank computation.
Comparing Levelwise (STICD) approach
                      Monolithic   nvGraph
  Levelwise (STICD)   vs
1. Performance of monolithic vs topologically-ordered components (levelwise) PageRank.
Adjusting ranks for dynamic graphs
Update new: zero fill · 1/N fill
Update old, new: scale · 1/N fill
1. Comparing strategies to update ranks for dynamic PageRank (pull, CSR).
Adjusting Levelwise (STICD) dynamic approach
Skip unaffected components · For fixed graphs · For temporal graphs
1. Checking for correctness of levelwise PageRank when unchanged components are skipped.
2. Performance benefit of levelwise PageRank when unchanged components are skipped (fixed).
3. Performance benefit of levelwise PageRank when unchanged components are skipped (temporal).
Note: fixed ⇒ static graphs with batches of random edge updates; temporal ⇒ batches of edge updates from temporal graphs.
Comparing dynamic approach with static
                      nvGraph dynamic   Monolithic dynamic    Levelwise dynamic
  nvGraph static      vs: temporal
  Monolithic static                     vs: fixed, temporal   vs: fixed, temporal
  Levelwise static                      vs: fixed             vs: fixed, temporal
1. Performance of nvGraph based static vs dynamic PageRank (temporal).
2. Performance of static vs dynamic PageRank (temporal).
3. Performance of static vs dynamic levelwise PageRank (fixed).
4. Performance of levelwise based static vs dynamic PageRank (temporal).
Note: fixed ⇒ static graphs with batches of random edge updates; temporal ⇒ batches of edge updates from temporal graphs.
Adjusting Monolithic CUDA approach
Map: launch
Reduce: memcpy launch · in-place launch · memcpy vs in-place
Thread/V: launch · sort/p. vertices · sort edges
Block/V: launch · sort/p. vertices · sort edges
Switched/V: thread launch · block launch · switch-point
1. Comparing various launch configs for CUDA based vector multiply.
2. Comparing various launch configs for CUDA based vector element sum (memcpy).
3. Comparing various launch configs for CUDA based vector element sum (in-place).
4. Performance of memcpy vs in-place based CUDA based vector element sum.
5. Comparing various launch configs for CUDA thread-per-vertex based PageRank (pull, CSR).
6. Sorting vertices and/or edges by in-degree for CUDA thread-per-vertex based PageRank.
7. Comparing various launch configs for CUDA block-per-vertex based PageRank (pull, CSR).
8. Sorting vertices and/or edges by in-degree for CUDA block-per-vertex based PageRank.
9. Launch configs for CUDA switched-per-vertex based PageRank focusing on thread approach.
10. Launch configs for CUDA switched-per-vertex based PageRank focusing on block approach.
11. Sorting vertices and/or edges by in-degree for CUDA switched-per-vertex based PageRank.
12. Comparing various switch points for CUDA switched-per-vertex based PageRank (pull, ...).
Note: sort/p. vertices ⇒ sorting vertices in ascending or descending order of in-degree, or simply partitioning them by in-degree; sort edges ⇒ sorting edges in ascending or descending order of id.
Adjusting Monolithic CUDA optimizations (from STICD)
Split components · Skip in-identicals · Skip chains · Skip converged
1. Performance benefit of CUDA based PageRank with vertices split by components.
2. Performance benefit of skipping in-identical vertices for CUDA based PageRank (pull, CSR).
3. Performance benefit of skipping chain vertices for CUDA based PageRank (pull, CSR).
4. Performance benefit of skipping converged vertices for CUDA based PageRank (pull, CSR).
Adjusting Levelwise (STICD) CUDA approach
Min. component size · Min. compute size · Skip teleport calculation
1. Comparing various min. component sizes for topologically-ordered components (levelwise, CUDA) PageRank.
2. Comparing various min. compute sizes for topologically-ordered components (levelwise, CUDA) PageRank.
Note: min. component size merges small components even before generating the block-graph / topological ordering, whereas min. compute size does so just before the PageRank computation.
Comparing Levelwise (STICD) CUDA approach
                      nvGraph   Monolithic CUDA
  Monolithic          vs        vs
  Monolithic CUDA     vs
  Levelwise CUDA      vs        vs
1. Performance of sequential execution based vs CUDA based PageRank (pull, CSR).
2. Performance of nvGraph vs CUDA based PageRank (pull, CSR).
3. Performance of Monolithic CUDA vs Levelwise CUDA PageRank (pull, CSR, ...).
Comparing dynamic CUDA approach with static
                      nvGraph dynamic       Monolithic dynamic    Levelwise dynamic
  nvGraph static      vs: fixed, temporal   vs: fixed, temporal   vs: fixed, temporal
  Monolithic static   vs: fixed, temporal   vs: fixed, temporal   vs: fixed, temporal
  Levelwise static    vs: fixed, temporal   vs: fixed, temporal   vs: fixed, temporal
1. Performance of static vs dynamic CUDA based PageRank (fixed).
2. Performance of static vs dynamic CUDA based PageRank (temporal).
3. Performance of CUDA based static vs dynamic levelwise PageRank (fixed).
4. Performance of static vs dynamic CUDA based levelwise PageRank (temporal).
Note: fixed ⇒ static graphs with batches of random edge updates; temporal ⇒ batches of edge updates from temporal graphs.
Comparing dynamic optimized CUDA approach with static
                      nvGraph dynamic   Monolithic dynamic   Levelwise dynamic
  nvGraph static      vs: fixed         vs: fixed            vs: fixed
  Monolithic static   vs: fixed         vs: fixed            vs: fixed
  Levelwise static    vs: fixed         vs: fixed            vs: fixed
1. Performance of CUDA based optimized dynamic monolithic vs levelwise PageRank (fixed).
Note: fixed ⇒ static graphs with batches of random edge updates; temporal ⇒ batches of edge updates from temporal graphs.

PageRank Experiments : SHORT REPORT / NOTES
