Simulating Stellar Merger using HPX/Kokkos on
A64FX on Supercomputer Fugaku
Patrick Diehl
Joint work with: Gregor Daiß, Kevin Huck, Dominic Marcello, Sagiv Schiber,
Hartmut Kaiser, and Dirk Pflüger
Center for Computation & Technology, Louisiana State University
Department of Physics & Astronomy
patrickdiehl@lsu.edu
May 2023
Patrick Diehl (CCT/LSU) HPX & Octo-Tiger May 2023 1 / 27
Motivation
Navigating Performance, Portability, and Productivity
Reference
Pennycook, S. John, et al. "Navigating performance, portability, and productivity." Computing in Science & Engineering 23.5 (2021): 28-38.
Outline
1 Astrophysical application
2 Software framework
    Octo-Tiger
    Kokkos
    HPX
    Kokkos and HPX
3 Supercomputer Fugaku
4 Distributed scaling
5 Performance optimization
6 Conclusion and Outlook
Astrophysical application
Example simulation
Astrophysical event: merger of two stars. The flow on the surface
corresponds, in layman's terms, to the weather on the stars.
Software framework
Octo-Tiger
Open-source astrophysics program [1] that simulates the evolution of star
systems, based on the fast multipole method on adaptive octrees.
Modules:
- Hydro
- Gravity
- Radiation
Supports:
- Communication: MPI/libfabric/LCI + GASNet
- Backends: CUDA, HIP, SYCL
Reference
Marcello, Dominic C., et al. "Octo-Tiger: a new, 3D hydrodynamic code for stellar mergers that uses HPX parallelization." Monthly Notices of the Royal Astronomical Society 504.4 (2021): 5345-5382.
[1] https://github.com/STEllAR-GROUP/octotiger
Kokkos: C++ Performance Portability Programming EcoSystem
Kokkos is a C++ library [2] for writing performance-portable applications
targeting all major HPC platforms.
CPU:
- OpenMP
- HPX
GPU:
- Native: CUDA & HIP
- SYCL: CUDA & HIP
Reference
Trott, Christian R., et al. "Kokkos 3: Programming model extensions for the exascale era." IEEE Transactions on Parallel and Distributed Systems 33.4 (2021): 805-817.
[2] https://github.com/kokkos/kokkos
HPX
HPX is an open-source C++ Standard Library for Concurrency and
Parallelism [3].
Features
- HPX exposes a uniform, standards-oriented API for ease of
  programming parallel and distributed applications.
- HPX provides unified syntax and semantics for local and remote
  operations.
- HPX exposes a uniform, flexible, and extendable performance counter
  framework which can enable runtime adaptivity.
Reference
Kaiser, Hartmut, et al. "HPX - the C++ standard library for parallelism and concurrency." Journal of Open Source Software 5.53 (2020): 2352.
[3] https://github.com/STEllAR-GROUP/hpx
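The future-based style behind a standards-oriented API can be sketched with the standard library alone. This is a minimal illustration, not Octo-Tiger or HPX code: it uses std::async and std::future from standard C++, for which HPX provides the counterparts hpx::async and hpx::future. The function names partial_sum and overlapped_sum are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Sum the elements in [begin, end); stands in for any independent task.
double partial_sum(const std::vector<double>& data,
                   std::size_t begin, std::size_t end) {
    assert(begin <= end && end <= data.size());
    return std::accumulate(data.begin() + static_cast<std::ptrdiff_t>(begin),
                           data.begin() + static_cast<std::ptrdiff_t>(end),
                           0.0);
}

// Launch the first half as an asynchronous task, compute the second half
// on the calling thread, then join both results through the future.
double overlapped_sum(const std::vector<double>& data) {
    const std::size_t mid = data.size() / 2;
    std::future<double> lower = std::async(
        std::launch::async, [&data, mid] { return partial_sum(data, 0, mid); });
    const double upper = partial_sum(data, mid, data.size());
    return lower.get() + upper;  // get() blocks until the task has finished
}
```

Calling overlapped_sum on a vector lets the two halves proceed concurrently; the caller only waits at the point where the result is actually needed.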
HPX and Kokkos
References
G. Daiß, et al., "From Merging Frameworks to Merging Stars: Experiences using HPX, Kokkos and SIMD Types," 2022 IEEE/ACM 7th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2), Dallas, TX, USA, 2022, pp. 10-19.
Daiß, Gregor, et al. "Beyond Fork-Join: Integration of Performance Portable Kokkos Kernels with HPX." 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE, 2021.
Supercomputer Fugaku
Supercomputer Fugaku
Details:
- 158,976 nodes
- Fujitsu A64FX CPU (48+4 cores) per node
- Tofu interconnect D
- Memory: 32 GiB/node
Software:
- Fujitsu compiler software
- Fujitsu MPI
Porting HPX to Arm
Challenges on Supercomputer Fugaku:
- Add support for the Parallel Job Manager (PJM)
- Cross compilation on the x86 head node for the A64FX compute nodes
- Some of the dependencies do not support cross compilation
- Compiling on A64FX was very slow
Reference
Gupta, Nikunj, et al. "Deploying a task-based runtime system on Raspberry Pi clusters." 2020 IEEE/ACM Fifth International Workshop on Extreme Scale Programming Models and Middleware (ESPM2). IEEE, 2020.
Node-level scaling
SVE vectorization reduced the computation time by a factor of two
compared to Neon.
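For context on the vector widths involved: Neon registers are 128 bits wide, while the A64FX implements SVE with 512-bit vectors. The sketch below only does that arithmetic. The helper name lanes and the constant names are hypothetical, and the numbers are register widths, not measurements from the talk.

```cpp
#include <cstddef>

// Number of elements of type T that fit into one vector register.
template <typename T>
constexpr std::size_t lanes(std::size_t register_bits) {
    return register_bits / (8 * sizeof(T));
}

constexpr std::size_t neon_bits = 128;       // Neon register width
constexpr std::size_t a64fx_sve_bits = 512;  // SVE vector length on A64FX

static_assert(lanes<double>(neon_bits) == 2,
              "two doubles per Neon register");
static_assert(lanes<double>(a64fx_sve_bits) == 8,
              "eight doubles per SVE vector on A64FX");
```

Per vector instruction, SVE on A64FX thus handles four times as many double-precision values as Neon. A measured factor of two sits below that upper bound, which is plausible for a code that is not limited by vector arithmetic alone.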
Distributed scaling
Disclaimer
This was a testbed allocation to port Octo-Tiger and its dependencies to
Supercomputer Fugaku and to do some scaling runs in preparation for a
larger allocation.
Distributed scaling: Perlmutter and Supercomputer Fugaku
Because of the 28 GB of memory per node on Supercomputer Fugaku, we used a
small example that fits into one Fugaku node.
Distributed scaling: Summit and Supercomputer Fugaku
Because of the 28 GB of memory per node, more nodes are required on
Supercomputer Fugaku than on other machines.
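The effect of a fixed per-node memory budget on the node count is a ceiling division. In the sketch below, min_nodes is a hypothetical helper that assumes the data can be distributed evenly across nodes. Only the 28 GB per node comes from the slides; the problem sizes in the usage note are illustrative values, not figures from the runs.

```cpp
#include <cassert>
#include <cstdint>

// Smallest number of nodes whose combined memory holds the whole problem,
// assuming the data can be spread evenly across the nodes.
inline std::uint64_t min_nodes(std::uint64_t problem_gb,
                               std::uint64_t node_gb) {
    assert(node_gb > 0);
    return (problem_gb + node_gb - 1) / node_gb;  // ceiling division
}
```

For an illustrative 1,000 GB problem, min_nodes(1000, 28) gives 36 nodes, whereas a hypothetical machine with 256 GB per node would need min_nodes(1000, 256), which is 4.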
Scaling on Supercomputer Fugaku
Power consumption
Measurements in the job manager’s output:
Power consumption using PowerAPI
Network traffic
Reference
R. E. Grant, M. Levenhagen, S. L. Olivier, D. DeBonis, K. T. Pedretti and J. H. Laros III, "Standardizing Power Monitoring and Control at Exascale," in Computer, vol. 49, no. 10, pp. 38-46, Oct. 2016.
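Power readings become an energy-to-solution figure by integrating power over the run time. The sketch below applies the trapezoidal rule to evenly spaced samples. It does not call PowerAPI or the job manager; the function name energy_joules, the sampling interval, and any sample values are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Integrate evenly spaced power samples (in watts) over time with the
// trapezoidal rule; dt_seconds is the spacing between two samples.
// Returns the energy in joules; fewer than two samples yield 0.
double energy_joules(const std::vector<double>& power_watts,
                     double dt_seconds) {
    double energy = 0.0;
    for (std::size_t i = 1; i < power_watts.size(); ++i) {
        energy += 0.5 * (power_watts[i - 1] + power_watts[i]) * dt_seconds;
    }
    return energy;
}
```

Dividing the returned energy by the elapsed time gives the average power of the run, which makes runs of different length comparable.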
Performance optimization
Optimization
Conclusion and Outlook
Conclusion
HPX and Octo-Tiger
- One of the early adopters of Kokkos on Supercomputer Fugaku & Ookami
- The first to use HPX on A64FX
Programming model
- Work-stealing, overlapping communication and computation, and
  lightweight threads with irregular workloads worked well on A64FX.
Performance portability
- It was relatively easy to compile the code (2 days)
- The code ran without major changes
- The obtained performance was good, but may not be optimal
Outlook
Do more large scale runs on Supercomputer Fugaku
Optimize the code for A64FX
Energy consumption measurements on other, GPU-based systems
Thanks to all my collaborators; without their effort, I could not present all these results.
Thanks for your attention! Questions?
This work is licensed under a Creative Commons "Attribution-NonCommercial-NoDerivatives 4.0 International" license.

 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

Simulating Stellar Merger using HPX/Kokkos on A64FX on Supercomputer Fugaku

  • 1. Simulating Stellar Merger using HPX/Kokkos on A64FX on Supercomputer Fugaku Patrick Diehl Joint work with: Gregor Daiß, Kevin Huck, Dominic Marcello, Sagiv Schiber, Hartmut Kaiser, and Dirk Pflüger Center for Computation & Technology, Louisiana State University Department of Physics & Astronomy patrickdiehl@lsu.edu May 2023 Patrick Diehl (CCT/LSU) HPX & Octo-Tiger May 2023 1 / 27
  • 2. Motivation
  • 3. Navigating Performance, Portability, and Productivity. Reference: Pennycook, S. John, et al. "Navigating performance, portability, and productivity." Computing in Science & Engineering 23.5 (2021): 28-38.
  • 4. Outline: 1 Astrophysical application; 2 Software framework (Octo-Tiger, Kokkos, HPX, Kokkos and HPX); 3 Supercomputer Fugaku; 4 Distributed scaling; 5 Performance optimization; 6 Conclusion and Outlook
  • 5. Astrophysical application
  • 6. Example simulation. Astrophysical event: merging of two stars. Flow on the surface, which corresponds in layman's terms to the weather on the stars.
  • 7. Software framework
  • 8. Octo-Tiger: Open-source astrophysics program [1] that simulates the evolution of star systems, based on the fast multipole method on adaptive octrees. Modules: Hydro, Gravity, Radiation. Supports: Communication: MPI/libfabric/LCI + GasNet; Backends: CUDA, HIP, SYCL. Reference: Marcello, Dominic C., et al. "octo-tiger: a new, 3D hydrodynamic code for stellar mergers that uses hpx parallelization." Monthly Notices of the Royal Astronomical Society 504.4 (2021): 5345-5382. [1] https://github.com/STEllAR-GROUP/octotiger
  • 9. Kokkos: C++ Performance Portability Programming EcoSystem. Kokkos is a C++ library [2] for writing performance-portable applications targeting all major HPC platforms. CPU: OpenMP, HPX. GPU: Native (CUDA & HIP), SYCL (CUDA & HIP). Reference: Trott, Christian R., et al. "Kokkos 3: Programming model extensions for the exascale era." IEEE Transactions on Parallel and Distributed Systems 33.4 (2021): 805-817. [2] https://github.com/kokkos/kokkos
  • 10. HPX: HPX is an open-source C++ Standard Library for Concurrency and Parallelism [3]. Features: HPX exposes a uniform, standards-oriented API for ease of programming parallel and distributed applications. HPX provides unified syntax and semantics for local and remote operations. HPX exposes a uniform, flexible, and extendable performance counter framework which can enable runtime adaptivity. Reference: Kaiser, Hartmut, et al. "HPX-the C++ standard library for parallelism and concurrency." Journal of Open Source Software 5.53 (2020): 2352. [3] https://github.com/STEllAR-GROUP/hpx
  • 11. HPX and Kokkos. References: G. Daiß, et al., "From Merging Frameworks to Merging Stars: Experiences using HPX, Kokkos and SIMD Types," 2022 IEEE/ACM 7th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2), Dallas, TX, USA, 2022, pp. 10-19. Daiß, Gregor, et al. "Beyond Fork-Join: Integration of Performance Portable Kokkos Kernels with HPX." 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE, 2021.
  • 12. Supercomputer Fugaku
  • 13. Supercomputer Fugaku. Details: 158,976 nodes; Fujitsu A64FX CPU (48+4 cores) per node; Tofu interconnect D; Memory: 32 GiB/node. Software: Fujitsu compiler software; Fujitsu MPI.
  • 14. Porting HPX to Arm. Challenges on Supercomputer Fugaku: adding support for the Parallel job manager (PJM); cross compilation on the x86 head node for the A64FX compute nodes; some of the dependencies do not support cross compilation; compiling on A64FX was very slow. Reference: Gupta, Nikunj, et al. "Deploying a task-based runtime system on raspberry pi clusters." 2020 IEEE/ACM Fifth International Workshop on Extreme Scale Programming Models and Middleware (ESPM2). IEEE, 2020.
  • 15. Node-level scaling. SVE vectorization reduced the computation time by a factor of two compared to Neon.
  • 16. Distributed scaling
  • 17. Disclaimer. This was a testbed allocation to port Octo-Tiger and its dependencies to Supercomputer Fugaku and to do some scaling runs in preparation for a larger allocation.
  • 18. Distributed scaling: Perlmutter and Supercomputer Fugaku. Due to the 28 GB of memory on Supercomputer Fugaku, we used a small example that fits into one Fugaku node.
  • 19. Distributed scaling: Summit and Supercomputer Fugaku. Due to the 28 GB of memory, more nodes are required on Supercomputer Fugaku than on other machines.
  • 20. Scaling on Supercomputer Fugaku
  • 21. Power consumption. Measurements in the job manager's output: power consumption using PowerAPI; network traffic. Reference: R. E. Grant, M. Levenhagen, S. L. Olivier, D. DeBonis, K. T. Pedretti and J. H. Laros III, "Standardizing Power Monitoring and Control at Exascale," in Computer, vol. 49, no. 10, pp. 38-46, Oct. 2016.
  • 22. Performance optimization
  • 23. Optimization
  • 24. Conclusion and Outlook
  • 25. Conclusion. HPX and Octo-Tiger: one of the early adopters of Kokkos on Supercomputer Fugaku and Ookami; the first to use HPX on A64FX. Programming model: work stealing, overlapping communication and computation, and lightweight threads with irregular workloads worked well on A64FX. Performance portability: it was relatively easy to compile the code (2 days); the code ran without major changes; the obtained performance was good, but possibly not optimal.
  • 26. Outlook. Do more large-scale runs on Supercomputer Fugaku; optimize the code for A64FX; energy consumption measurements on other GPU-based systems. Thanks to all my collaborators; without their effort, I could not present these results. Thanks for your attention! Questions?
  • 27. This work is licensed under a Creative Commons "Attribution-NonCommercial-NoDerivatives 4.0 International" license.
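
The memory remark on slide 19 comes down to a ceiling division: a run whose working set needs M GB cannot use fewer than ceil(M / m) nodes when each node offers m GB. The following is a minimal C++ sketch of that arithmetic, not code from Octo-Tiger. The function name `min_nodes` and the 512 GB working set in the usage note are hypothetical. Only the 28 GB figure comes from the slides, and reading it as memory per node is an assumption.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Octo-Tiger, HPX, or Kokkos):
// smallest number of nodes whose combined memory holds the working set.
// Both arguments use the same unit (for example GB); node_mem must be > 0.
constexpr std::uint64_t min_nodes(std::uint64_t total_mem,
                                  std::uint64_t node_mem) {
    // Integer ceiling division: round up whenever there is a remainder.
    return (total_mem + node_mem - 1) / node_mem;
}
```

For example, `min_nodes(512, 28)` evaluates to 19, while the same hypothetical 512 GB working set on nodes with 512 GB each would need only one node. This ignores memory used by the runtime and communication buffers, so real runs may need more nodes than this lower bound.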