Simulating Stellar Merger using HPX/Kokkos on A64FX on Supercomputer Fugaku
Patrick Diehl
Joint work with: Gregor Daiß, Kevin Huck, Dominic Marcello, Sagiv Shiber,
Hartmut Kaiser, and Dirk Pflüger
Center for Computation & Technology, Louisiana State University
Department of Physics & Astronomy
patrickdiehl@lsu.edu
May 2023
Patrick Diehl (CCT/LSU) HPX & Octo-Tiger May 2023 1 / 27
6. Example simulation
Astrophysical event: the merger of two stars. Shown is the flow on the stellar
surface, which in layman's terms corresponds to the weather on the stars.
8. Octo-Tiger
Octo-Tiger is an open-source astrophysics program¹ that simulates the evolution of star
systems using the fast multipole method on adaptive octrees.
Modules
Hydro
Gravity
Radiation
Supports
Communication:
MPI/libfabric/LCI + GASNet
Backends: CUDA, HIP, SYCL
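To make "fast multipole method on adaptive octrees" more concrete, here is a minimal sketch of an adaptive octree together with the lowest-order (monopole) upward pass, in which every parent cell aggregates the total mass and center of mass of its eight children. The names `Cell`, `refine`, `upward_pass`, and `count_leaves` are hypothetical. Octo-Tiger's actual tree, its higher-order multipole moments, and its conservation properties are considerably more involved than this illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical octree cell: a cube that is either a leaf or an interior
// node with exactly eight children, one per octant.
struct Cell {
    std::array<double, 3> center{};  // geometric center of the cube
    double half_width = 1.0;         // half of the cube's edge length
    double mass = 0.0;               // monopole moment (total mass)
    std::array<double, 3> com{};     // center of mass
    std::array<std::unique_ptr<Cell>, 8> children{};

    bool is_leaf() const { return children[0] == nullptr; }
};

// Adaptive refinement step: split one leaf into eight children.
// Bit k of the child index selects the upper (1) or lower (0) half along axis k.
void refine(Cell& c) {
    assert(c.is_leaf());
    const double h = c.half_width / 2.0;
    for (std::size_t i = 0; i < 8; ++i) {
        auto child = std::make_unique<Cell>();
        for (std::size_t k = 0; k < 3; ++k) {
            const double sign = ((i >> k) & 1u) ? 1.0 : -1.0;
            child->center[k] = c.center[k] + sign * h;
        }
        child->half_width = h;
        child->com = child->center;  // a leaf's mass is placed at its center
        c.children[i] = std::move(child);
    }
}

// Upward pass (monopole only): a parent's mass is the sum of its children's
// masses, and its center of mass is their mass-weighted average.
void upward_pass(Cell& c) {
    if (c.is_leaf()) return;
    c.mass = 0.0;
    c.com = {0.0, 0.0, 0.0};
    for (auto& child : c.children) {
        upward_pass(*child);
        c.mass += child->mass;
        for (std::size_t k = 0; k < 3; ++k) c.com[k] += child->mass * child->com[k];
    }
    if (c.mass > 0.0)
        for (std::size_t k = 0; k < 3; ++k) c.com[k] /= c.mass;
}

// Number of leaf cells, which grows only where the tree was refined.
std::size_t count_leaves(const Cell& c) {
    if (c.is_leaf()) return 1;
    std::size_t n = 0;
    for (const auto& child : c.children) n += count_leaves(*child);
    return n;
}
```

Refining the root and then one of its children yields 8 - 1 + 8 = 15 leaves, showing how adaptivity adds resolution only where it is requested.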
Reference
Marcello, Dominic C., et al. "Octo-Tiger: a new, 3D hydrodynamic code for stellar mergers that uses HPX parallelization."
Monthly Notices of the Royal Astronomical Society 504.4 (2021): 5345-5382.
¹ https://github.com/STEllAR-GROUP/octotiger
9. Kokkos: C++ Performance Portability Programming EcoSystem
Kokkos is a C++ library² for writing performance-portable applications
that target all major HPC platforms:
CPU
OpenMP
HPX
GPU
Native: CUDA & HIP
SYCL: CUDA & HIP
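One ingredient of this portability is that a Kokkos multidimensional array (a `Kokkos::View`) carries a memory layout chosen per backend: by default `LayoutRight` (row-major) on host backends such as OpenMP and `LayoutLeft` (column-major) on CUDA, so the same kernel source gets a suitable access pattern on each platform. The sketch below imitates only that idea in plain C++; `View2D`, `RowMajor`, `ColMajor`, and `fill` are hypothetical stand-ins, not the Kokkos API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout policies mapping a 2D index (i, j) to a linear offset.
struct RowMajor {  // similar in spirit to Kokkos::LayoutRight
    static std::size_t offset(std::size_t i, std::size_t j,
                              std::size_t /*rows*/, std::size_t cols) {
        return i * cols + j;
    }
};
struct ColMajor {  // similar in spirit to Kokkos::LayoutLeft
    static std::size_t offset(std::size_t i, std::size_t j,
                              std::size_t rows, std::size_t /*cols*/) {
        return j * rows + i;
    }
};

// Minimal owning 2D array whose memory order is a template parameter.
template <class Layout>
class View2D {
public:
    View2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) {
        assert(i < rows_ && j < cols_);
        return data_[Layout::offset(i, j, rows_, cols_)];
    }
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    const std::vector<double>& raw() const { return data_; }
private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};

// Layout-agnostic "kernel": written once, valid for either layout.
template <class Layout>
void fill(View2D<Layout>& a) {
    for (std::size_t i = 0; i < a.rows(); ++i)
        for (std::size_t j = 0; j < a.cols(); ++j)
            a(i, j) = 10.0 * i + j;
}
```

Both instantiations give the same logical values, for example `a(1, 2) == 12.0`, but store them in a different order in memory. Only the index mapping changes; the loop body in `fill` does not.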
Reference
Trott, Christian R., et al. "Kokkos 3: Programming model extensions for the exascale era." IEEE Transactions on
Parallel and Distributed Systems 33.4 (2021): 805-817.
² https://github.com/kokkos/kokkos
10. HPX
HPX is an open-source C++ Standard Library for Concurrency and
Parallelism³.
Features
HPX exposes a uniform, standards-oriented API for ease of
programming parallel and distributed applications.
HPX provides unified syntax and semantics for local and remote
operations.
HPX exposes a uniform, flexible, and extendable performance counter
framework which can enable runtime adaptivity.
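The idea of runtime adaptivity driven by performance counters can be sketched as a registry of named counters that the application queries while it runs, using the values to tune a parameter such as the chunk size handed to worker threads. This is a hypothetical, single-threaded illustration: `CounterRegistry`, the counter name `/sketch/idle-rate`, and the chunk-size policy are invented for the example and are not HPX's performance counter API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical registry: each counter is a name bound to a callable that
// returns the counter's current value whenever it is queried.
class CounterRegistry {
public:
    void add(const std::string& name, std::function<double()> source) {
        counters_[name] = std::move(source);
    }
    std::optional<double> query(const std::string& name) const {
        auto it = counters_.find(name);
        if (it == counters_.end()) return std::nullopt;
        return it->second();
    }
private:
    std::map<std::string, std::function<double()>> counters_;
};

// Hypothetical adaptation policy: when workers are often idle, hand out
// smaller chunks (more parallel slack); when they are almost never idle,
// use larger chunks to reduce scheduling overhead.
std::size_t adapt_chunk_size(std::size_t chunk, double idle_rate) {
    assert(idle_rate >= 0.0 && idle_rate <= 1.0);
    if (idle_rate > 0.2 && chunk > 1) return chunk / 2;
    if (idle_rate < 0.05) return chunk * 2;
    return chunk;
}
```

With a simulated idle rate of 0.5, a chunk size of 64 is halved to 32; once the idle rate drops to 0.01, it grows back to 64. The thresholds are arbitrary placeholders.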
Reference
Kaiser, Hartmut, et al. "HPX - the C++ standard library for parallelism and concurrency." Journal of Open Source
Software 5.53 (2020): 2352.
³ https://github.com/STEllAR-GROUP/hpx
11. HPX and Kokkos
References
Daiß, Gregor, et al. "From Merging Frameworks to Merging Stars: Experiences using HPX, Kokkos and SIMD Types." 2022
IEEE/ACM 7th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2), Dallas, TX,
USA, 2022, pp. 10-19.
Daiß, Gregor, et al. "Beyond Fork-Join: Integration of Performance Portable Kokkos Kernels with HPX." 2021 IEEE
International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE, 2021.
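The "beyond fork-join" idea is that a kernel launch yields a future instead of ending in a global barrier, so independent work keeps running and results are combined only where they are needed. The sketch uses standard `std::async`/`std::future` as a stand-in. HPX's `hpx::async` returns an `hpx::future` of the same basic shape, which additionally supports continuations via `.then()` and composition via `hpx::dataflow`; the standard type lacks both. In the referenced work, Kokkos kernels are integrated so that their completion is signaled through HPX futures. `hydro_kernel`, `gravity_kernel`, and `step` are hypothetical placeholders, not Octo-Tiger code.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-ins for two independent kernels on separate data.
double hydro_kernel(std::vector<double> u) {
    for (double& x : u) x *= 2.0;
    return std::accumulate(u.begin(), u.end(), 0.0);
}
double gravity_kernel(std::vector<double> phi) {
    return std::accumulate(phi.begin(), phi.end(), 0.0);
}

// Future-based composition: both kernels are launched asynchronously, and the
// caller blocks only where it needs both results, instead of waiting at a
// barrier after each kernel as in a strict fork-join structure.
double step(const std::vector<double>& u, const std::vector<double>& phi) {
    std::future<double> h = std::async(std::launch::async, hydro_kernel, u);
    std::future<double> g = std::async(std::launch::async, gravity_kernel, phi);
    // Other independent work could run here while both kernels execute.
    return h.get() + g.get();
}
```

For `u = {1, 2, 3}` and `phi = {4, 5}`, `step` returns 12 + 9 = 21.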
13. Supercomputer Fugaku
Details:
158,976 nodes
Fujitsu A64FX CPU (48+4 cores) per node
Tofu interconnect D
Memory 32 GiB/node
Software:
Fujitsu compiler software
Fujitsu MPI
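For scale, the per-node figures on this slide multiply out to the following machine-wide totals. The calculation counts only the 48 main cores per node, not the additional 4, and uses the nominal 32 GiB per node:

```latex
\begin{aligned}
158{,}976 \times 48 &= 7{,}630{,}848 \ \text{cores},\\
158{,}976 \times 32\,\text{GiB} &= 5{,}087{,}232\,\text{GiB} = 4{,}968\,\text{TiB} \approx 4.85\,\text{PiB}.
\end{aligned}
```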
14. Porting HPX to Arm
Challenges on Supercomputer Fugaku:
Adding support for the parallel job manager (PJM)
Cross-compiling on the x86 head node for the A64FX compute nodes
Some of the dependencies do not support cross-compilation
Compiling natively on A64FX was very slow
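When cross-compiling on an x86 head node for A64FX compute nodes, it is easy to end up with a binary built for the wrong target. A minimal compile-time report of the target architecture can serve as a guard. It assumes GCC/Clang-style predefined macros (`__aarch64__`, `__ARM_FEATURE_SVE`, `__x86_64__`); whether the Fujitsu compiler defines the same macros in all of its modes should be checked separately. `target_architecture` and `check_built_for_sve` are hypothetical helpers.

```cpp
#include <cassert>
#include <string>

// Reports the architecture this translation unit was compiled *for*. Under
// cross-compilation this differs from the machine that ran the compiler.
// Relies on GCC/Clang-style predefined macros (an assumption for other compilers).
std::string target_architecture() {
#if defined(__aarch64__) && defined(__ARM_FEATURE_SVE)
    return "aarch64+sve";
#elif defined(__aarch64__)
    return "aarch64";
#elif defined(__x86_64__)
    return "x86_64";
#else
    return "unknown";
#endif
}

// Debug-build sanity check for a job that must run on A64FX with SVE enabled;
// like any assert, it is compiled out when NDEBUG is defined.
void check_built_for_sve() {
    assert(target_architecture() == "aarch64+sve" &&
           "this binary was not compiled for SVE");
}
```

Calling `check_built_for_sve()` early in a debug build of the compute-node binary turns a silent wrong-target build into an immediate failure.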
Reference
Gupta, Nikunj, et al. "Deploying a task-based runtime system on Raspberry Pi clusters." 2020 IEEE/ACM Fifth
International Workshop on Extreme Scale Programming Models and Middleware (ESPM2). IEEE, 2020.
15. Node-level scaling
SVE vectorization reduced the computation time by a factor of two
compared to Neon.
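The gain from SVE over Neon comes from explicit vectorization. One width-agnostic way to write such a loop in C++ is `std::experimental::simd` from the Parallelism TS v2, which ships with libstdc++ since GCC 11. The sketch below uses it. `simd_sum` is a hypothetical example function, and this is not necessarily the mechanism behind the measurements above. Whether `native_simd<double>` maps to SVE or Neon registers on A64FX depends on the compiler and standard-library version.

```cpp
#include <cassert>
#include <cstddef>
#include <experimental/simd>
#include <vector>

namespace stdx = std::experimental;

// Sums a vector using the target's native SIMD width for double; the same
// source adapts to whatever width native_simd<double> has on the target.
double simd_sum(const std::vector<double>& x) {
    using V = stdx::native_simd<double>;
    V acc(0.0);  // broadcast 0.0 to all lanes
    std::size_t i = 0;
    for (; i + V::size() <= x.size(); i += V::size()) {
        V chunk(&x[i], stdx::element_aligned);  // load that needs only element alignment
        acc += chunk;
    }
    double total = stdx::reduce(acc);          // horizontal sum across lanes
    for (; i < x.size(); ++i) total += x[i];   // scalar remainder
    return total;
}
```

The scalar tail loop handles sizes that are not a multiple of the vector width, so the result does not depend on which width the target provides.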
17. Disclaimer
This was a testbed allocation to port Octo-Tiger and its dependencies to
Supercomputer Fugaku and to do some scaling runs in preparation for a larger
allocation.
18. Distributed scaling: Perlmutter and Supercomputer Fugaku
Due to the 28 GB of memory per node on Supercomputer Fugaku, we used a small
example that fits into a single Fugaku node.
19. Distributed scaling: Summit and Supercomputer Fugaku
Due to the 28 GB of memory per node, Supercomputer Fugaku requires more nodes
than the other machines for the same problem.
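The node count in these comparisons is bounded from below by memory. A problem whose state needs M bytes cannot run on fewer than ⌈M / m⌉ nodes that offer m usable bytes each. The small helper below makes that arithmetic explicit. `min_nodes` and the 100 GB problem size in the usage note are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Smallest number of nodes whose combined usable memory holds the problem:
// ceil(problem_bytes / bytes_per_node), computed in integer arithmetic.
std::uint64_t min_nodes(std::uint64_t problem_bytes, std::uint64_t bytes_per_node) {
    assert(bytes_per_node > 0);
    return (problem_bytes + bytes_per_node - 1) / bytes_per_node;
}
```

For a hypothetical 100 GB state, `min_nodes` gives 4 nodes at 28 GB each. This is only a lower bound. Ghost zones, communication buffers, and runtime overhead raise the real requirement, which is why machines with less memory per node need more nodes for the same problem.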
21. Power consumption
Measurements in the job manager’s output:
Power consumption using PowerAPI
Network traffic
Reference
Grant, Ryan E., et al. "Standardizing Power Monitoring and Control at Exascale." Computer 49.10 (2016):
38-46.
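If power is available as a time series of samples, energy follows by integrating power over time, E = ∫ P dt. A minimal sketch using the trapezoidal rule is shown below. `PowerSample`, `energy_joules`, and the sample values are hypothetical, and the sketch makes no claim about the actual format of the PowerAPI data in the job manager's output on Fugaku.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical power sample: timestamp and instantaneous power.
struct PowerSample {
    double time_s;   // seconds since job start
    double power_w;  // watts
};

// Trapezoidal-rule estimate of energy in joules from samples sorted by time.
double energy_joules(const std::vector<PowerSample>& samples) {
    double e = 0.0;
    for (std::size_t i = 1; i < samples.size(); ++i) {
        const double dt = samples[i].time_s - samples[i - 1].time_s;
        assert(dt >= 0.0);
        e += 0.5 * (samples[i].power_w + samples[i - 1].power_w) * dt;
    }
    return e;
}
```

For samples (0 s, 100 W), (10 s, 100 W), (20 s, 200 W), the estimate is 1000 J + 1500 J = 2500 J.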
25. Conclusion
HPX and Octo-Tiger
One of the early adopters of Kokkos on Supercomputer Fugaku &
Ookami
The first to use HPX on A64FX
Programming model
Work stealing, overlapping of communication and computation, and
lightweight threads worked well on A64FX for irregular workloads.
Performance portability
Compiling the code was relatively easy (2 days)
The code ran without major changes
The obtained performance was good, but possibly not optimal
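Work stealing, credited above, gives each worker its own task deque. The owner pushes and pops at one end (LIFO, which favors cache locality), while idle workers steal from the opposite end (FIFO, taking the oldest task, which in recursive decompositions is often the largest). The lock-based sketch below shows only that discipline. `WorkStealingDeque` is hypothetical and much simpler than the schedulers in HPX, which use more elaborate, lower-overhead queue structures.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <utility>

// Per-worker task deque guarded by a single mutex (a simplification of the
// lock-free designs used in production schedulers).
class WorkStealingDeque {
public:
    using Task = std::function<int()>;

    // Owner side: push and pop at the back (LIFO).
    void push(Task t) {
        std::lock_guard<std::mutex> lock(m_);
        tasks_.push_back(std::move(t));
    }
    std::optional<Task> pop() {
        std::lock_guard<std::mutex> lock(m_);
        if (tasks_.empty()) return std::nullopt;
        Task t = std::move(tasks_.back());
        tasks_.pop_back();
        return t;
    }

    // Thief side: steal from the front (FIFO), i.e. the oldest task.
    std::optional<Task> steal() {
        std::lock_guard<std::mutex> lock(m_);
        if (tasks_.empty()) return std::nullopt;
        Task t = std::move(tasks_.front());
        tasks_.pop_front();
        return t;
    }

private:
    std::mutex m_;
    std::deque<Task> tasks_;
};
```

After pushing tasks returning 1, 2, and 3, a thief's `steal()` takes task 1 while the owner's `pop()` takes task 3, so owner and thief work on opposite ends and rarely contend for the same task.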
26. Outlook
Do more large-scale runs on Supercomputer Fugaku
Optimize the code for A64FX
Energy consumption measurements on other GPU-based systems
Thanks to all my collaborators; without their effort, I could not present these results.
Thanks for your attention! Questions?
27. This work is licensed under a Creative Commons
“Attribution-NonCommercial-NoDerivatives 4.0 International” license.