Recent developments in HPX and Octo-Tiger
Patrick Diehl
Joint work with: Gregor Daiß, Sagiv Schieber, Dominic Marcello, Kevin Huck,
Hartmut Kaiser, Juhan Frank, Geoffrey Clayton, Patrick Motl, Dirk Pflüger,
Orsola De Marco, Mikael Simberg, John Biddiscombe, Srinivas Yadav, and many
more
Center for Computation & Technology, Louisiana State University
Department of Physics & Astronomy
patrickdiehl@lsu.edu
November 2022
Patrick Diehl (CCT/LSU) HPX & Octo-Tiger November 2022 1 / 44
Motivation
Astrophysical event: the merger of two stars. The flow on the surface
corresponds, in layman's terms, to the weather on the stars.
Outline
1 Astrophysical application
2 Software framework
Octo-Tiger
HPX
APEX
3 New features
Kokkos and HPX
Vectorization
Work aggregation
4 Overhead measurements
5 Scaling
Synchronous (MPI) vs asynchronous communication (libfabric)
Scaling on ORNL’s Summit
First experience on Fugaku using A64FX
6 Performance profiling
7 Conclusion and Outlook
Astrophysical application
V1309 Scorpii
Figure: At peak brightness, the rare 2002 red nova V838 Monocerotis briefly rivalled the most powerful stars in the Galaxy. Credit: NASA/ESA/H. E. Bond (STScI)
Figure: A near-infrared (I band) light curve for V1309 Scorpii, plotted from OGLE data
Reference
Tylenda, R., et al. "V1309 Scorpii: merger of a contact binary." Astronomy & Astrophysics 528 (2011): A114.
Software framework
Octo-Tiger
Open-source astrophysics program (footnote 1) that simulates the evolution of star
systems, based on the fast multipole method on adaptive octrees.
Modules
Hydro
Gravity
Radiation (benchmarking)
Supports
Communication: MPI/libfabric
Backends: CUDA, HIP, Kokkos
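The adaptive octree mentioned above can be pictured with a small sketch. This is not Octo-Tiger's data structure: `OctreeNode`, `refine_first_octant`, and `count_leaves` are hypothetical names, and the "refine only the first octant" rule merely stands in for a physical refinement criterion such as density.

```cpp
#include <array>
#include <cstddef>
#include <memory>

// Hypothetical minimal octree node; Octo-Tiger's real data structures differ.
struct OctreeNode {
    int level = 0;                                        // refinement depth, root = 0
    std::array<std::unique_ptr<OctreeNode>, 8> children;  // all empty for a leaf

    bool is_leaf() const { return children[0] == nullptr; }

    // Split this leaf into eight children, one per octant.
    void refine() {
        if (!is_leaf()) return;
        for (auto& child : children) {
            child = std::make_unique<OctreeNode>();
            child->level = level + 1;
        }
    }
};

// Refine only along the first octant until max_level is reached
// (an assumed stand-in for a density-based refinement criterion).
void refine_first_octant(OctreeNode& node, int max_level) {
    if (node.level >= max_level) return;
    node.refine();
    refine_first_octant(*node.children[0], max_level);
}

// Count the leaf cells, i.e. the cells that would carry the simulation data.
std::size_t count_leaves(const OctreeNode& node) {
    if (node.is_leaf()) return 1;
    std::size_t total = 0;
    for (const auto& child : node.children) total += count_leaves(*child);
    return total;
}
```

Refining a root to level 2 along one octant yields 15 leaves, whereas a uniform level-2 mesh would have 64; this saving is the point of adaptive refinement.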
Reference
Marcello, Dominic C., et al. "Octo-Tiger: a new, 3D hydrodynamic code for stellar mergers that uses HPX parallelization."
Monthly Notices of the Royal Astronomical Society 504.4 (2021): 5345-5382.
1 https://github.com/STEllAR-GROUP/octotiger
Example of adaptive mesh refinement
Reference
Heller, Thomas, et al. "Harnessing billions of tasks for a scalable portable hydrodynamic simulation of the merger of two
stars." The International Journal of High Performance Computing Applications 33.4 (2019): 699-715.
HPX
HPX is an open-source C++ Standard Library for Concurrency and
Parallelism (footnote 2).
Features
HPX exposes a uniform, standards-oriented API for ease of
programming parallel and distributed applications.
HPX provides unified syntax and semantics for local and remote
operations.
HPX exposes a uniform, flexible, and extendable performance counter
framework which can enable runtime adaptivity.
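To give a feel for the standards-oriented, future-based style the features above refer to, here is a minimal sketch that uses only the C++ standard library (`std::async`, `std::future`), so it compiles without HPX. HPX provides `hpx::async` and `hpx::future` in the same spirit, but this sketch does not use them and does not show remote operations; `partial_sum` and `async_sum` are hypothetical helper names.

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Sum the elements in [first, last); stands in for one fine-grained task.
double partial_sum(const std::vector<double>& data, std::size_t first, std::size_t last) {
    double sum = 0.0;
    for (std::size_t i = first; i < last; ++i) sum += data[i];
    return sum;
}

// Launch two tasks asynchronously and combine their results via futures.
double async_sum(const std::vector<double>& data) {
    const std::size_t mid = data.size() / 2;
    std::future<double> lower = std::async(std::launch::async,
        [&data, mid] { return partial_sum(data, 0, mid); });
    std::future<double> upper = std::async(std::launch::async,
        [&data, mid] { return partial_sum(data, mid, data.size()); });
    // get() blocks until the corresponding task has finished.
    return lower.get() + upper.get();
}
```

Calling `async_sum` on a vector returns the same value as a sequential sum; only the execution is split into two concurrently running tasks.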
Reference
Kaiser, Hartmut, et al. "HPX - The C++ Standard Library for Parallelism and Concurrency." Journal of Open Source
Software 5.53 (2020): 2352.
2 https://github.com/STEllAR-GROUP/hpx
HPX’s architecture
Figure (architecture diagram). Labels shown: Application; Operating System; C++2z Concurrency/Parallelism APIs; Threading Subsystem; Active Global Address Space (AGAS); Local Control Objects (LCOs); Parcel Transport Layer (Networking); API; OS; Performance Counter Framework; Policy Engine/Policies.
Reference
Kaiser, Hartmut, et al. "HPX - The C++ Standard Library for Parallelism and Concurrency." Journal of Open Source
Software 5.53 (2020): 2352.
APEX
APEX (Autonomic Performance Environment for eXascale): performance measurement library for distributed, asynchronous multitasking systems.
CUPTI used to capture CUDA events
NVML used to monitor the GPU
OTF2 and Google Trace Events trace output
Task Graphs and Trees
Scatterplots of timers and counters
Reference
Huck, Kevin A., et al. "An autonomic performance environment for exascale." Supercomputing Frontiers and Innovations
2.3 (2015): 49-66.
APEX
To support performance measurement in systems that employ user-level threading, APEX uses a dependency chain in addition to the call stack to produce traces and task dependency graphs.
New features
HPX and Kokkos
Reference
Edwards, H. Carter, Christian R. Trott, and Daniel Sunderland. "Kokkos: Enabling manycore performance portability
through polymorphic memory access patterns." Journal of Parallel and Distributed Computing 74.12 (2014): 3202-3216.
Daiß, Gregor, et al. "Beyond Fork-Join: Integration of Performance Portable Kokkos Kernels with HPX." 2021 IEEE
International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE, 2021.
Overhead
Reference
Daiß, Gregor, et al. "Beyond Fork-Join: Integration of Performance Portable Kokkos Kernels with HPX." 2021 IEEE
International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE, 2021.
Distributed scaling
Vectorization using Kokkos + HPX
We can now easily switch both the SIMD library (Kokkos SIMD or
std::experimental::simd) and the SIMD extensions used.
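One way to picture this switch is a kernel written once against a SIMD type that is passed as a template parameter. The sketch below is an illustration under stated assumptions, not Octo-Tiger's code: `Pack<N>` is a hypothetical scalar stand-in so the example compiles without Kokkos or `<experimental/simd>`, and its `load`/`store` members are invented here (the real `std::experimental::simd` uses `copy_from`/`copy_to`).

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a SIMD type: N doubles processed together.
template <std::size_t N>
struct Pack {
    std::array<double, N> lanes{};

    static constexpr std::size_t size() { return N; }

    void load(const double* src) {
        for (std::size_t i = 0; i < N; ++i) lanes[i] = src[i];
    }
    void store(double* dst) const {
        for (std::size_t i = 0; i < N; ++i) dst[i] = lanes[i];
    }
    friend Pack operator+(Pack a, const Pack& b) {
        for (std::size_t i = 0; i < N; ++i) a.lanes[i] += b.lanes[i];
        return a;
    }
    friend Pack operator*(double s, Pack a) {
        for (std::size_t i = 0; i < N; ++i) a.lanes[i] *= s;
        return a;
    }
};

// Kernel written once against the SIMD type: y = a * x + y.
// Precondition: y holds at least as many elements as x.
template <typename Simd>
void axpy(double a, const std::vector<double>& x, std::vector<double>& y) {
    const std::size_t width = Simd::size();
    std::size_t i = 0;
    for (; i + width <= x.size(); i += width) {
        Simd xv, yv;
        xv.load(&x[i]);
        yv.load(&y[i]);
        (a * xv + yv).store(&y[i]);
    }
    for (; i < x.size(); ++i) y[i] = a * x[i] + y[i];  // scalar remainder loop
}
```

Changing the vector width, or in real code the SIMD library, then only means changing the template argument, for example `axpy<Pack<4>>(2.0, x, y)` versus `axpy<Pack<2>>(2.0, x, y)`; the kernel body stays untouched.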
Reference
Daiß, Gregor, et al. "From Merging Frameworks to Merging Stars: Experiences using HPX, Kokkos and SIMD Types."
arXiv preprint arXiv:2210.06439 (2022). (Accepted to SC 22 workshop proceedings)
Single node runs on different CPU architectures
Work aggregation strategies
Reference
Daiß, Gregor, et al. "From Task-Based GPU Work Aggregation to Stellar Mergers: Turning Fine-Grained CPU Tasks
into Portable GPU Kernels." arXiv preprint arXiv:2210.06438 (2022). (Accepted to SC 22 workshop proceedings)
Results
Overhead measurements
Task Bench
Task Bench is a configurable benchmark for evaluating the efficiency and
performance of parallel and distributed programming models, runtimes,
and languages. It is primarily intended for evaluating task-based
models, in which the basic unit of execution is a task, but it can be
implemented in any parallel system.
Reference
Slaughter, Elliott, et al. "Task Bench: A parameterized benchmark for evaluating parallel runtime performance." SC20:
International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2020.
Measurements
Single node: 1 task per core
Distributed runs: 16 tasks per core
METG(50%) (Minimum Effective Task Granularity): the smallest average task granularity (task duration) at which a system still achieves at least 50%
overall efficiency.
Takeaway: Lightweight threads and work stealing come with overhead,
but this overhead does not matter for large simulations.
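As a rough illustration of how METG is read off a set of measurements, the sketch below picks the smallest measured task granularity whose efficiency still meets the threshold. It is a simplification under stated assumptions: Task Bench itself derives the value from its own sweep over task sizes, while `Sample` and `metg` are hypothetical names and any numbers fed in are made up for illustration.

```cpp
#include <vector>

// One measurement: average task duration and the efficiency reached with it.
struct Sample {
    double task_granularity_us;  // average task duration in microseconds
    double efficiency;           // achieved fraction of peak performance, in [0, 1]
};

// Return the smallest measured granularity whose efficiency meets the
// threshold, or -1.0 if no sample qualifies.
double metg(const std::vector<Sample>& samples, double threshold = 0.5) {
    double best = -1.0;
    for (const Sample& s : samples) {
        const bool qualifies = s.efficiency >= threshold;
        if (qualifies && (best < 0.0 || s.task_granularity_us < best)) {
            best = s.task_granularity_us;
        }
    }
    return best;
}
```

A lower METG means the runtime tolerates smaller tasks before its overhead dominates, which is why the metric is useful for comparing task-based systems.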
Reference
Wu, Nanmiao, et al. "Quantifying Overheads in Charm++ and HPX using Task Bench." arXiv preprint
arXiv:2207.12127 (2022). (Accepted to EuroPar 22 workshop proceedings)
Scaling
Synchronous (MPI) vs asynchronous communication (libfabric)
Configuration
Reference
Daiß, Gregor, et al. "From Piz Daint to the stars: Simulation of stellar mergers using high-level abstractions."
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 2019.
Synchronous vs asynchronous communication
Reference
Daiß, Gregor, et al. "From Piz Daint to the stars: Simulation of stellar mergers using high-level abstractions."
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 2019.
Scaling on ORNL’s Summit
Node level scaling: Hydro
Reference
Diehl, Patrick, et al. "Octo-Tiger's New Hydro Module and Performance Using HPX + CUDA on ORNL's Summit."
2021 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 2021.
Distributed scaling: Hydro
Reference
Diehl, Patrick, et al. "Octo-Tiger's New Hydro Module and Performance Using HPX + CUDA on ORNL's Summit."
2021 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 2021.
Node level scaling: Hydro + Gravity
Reference
Diehl, Patrick, et al. "Octo-Tiger's New Hydro Module and Performance Using HPX + CUDA on ORNL's Summit."
2021 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 2021.
Distributed scaling: Hydro + Gravity
Reference
Diehl, Patrick, et al. "Octo-Tiger's New Hydro Module and Performance Using HPX + CUDA on ORNL's Summit."
2021 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 2021.
First experience on Fugaku using A64FX
Porting HPX to Arm
Challenges on Fugaku:
Add support for the Parallel Job Manager (PJM)
Cross compilation on the x86 head node for the A64FX compute nodes
Some of the dependencies do not support cross compilation
Reference
Gupta, Nikunj, et al. "Deploying a task-based runtime system on Raspberry Pi clusters." 2020 IEEE/ACM Fifth
International Workshop on Extreme Scale Programming Models and Middleware (ESPM2). IEEE, 2020.
Node-level scaling
SVE vectorization reduced the computation time by a factor of two
compared to NEON.
Distributed scaling
Because each Fugaku node offers only 28 GB of memory, more nodes are required
on Fugaku than on other machines.
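The memory argument can be made concrete with a small calculation: the minimum node count is the working set divided by the usable memory per node, rounded up. This is a back-of-the-envelope sketch; `min_nodes` is a hypothetical helper, and apart from the 28 GB figure from the slide, any working-set or per-node sizes passed in are illustrative assumptions rather than measured values.

```cpp
#include <cstdint>

// Minimum number of nodes so that the working set fits into node memory.
// Precondition: usable_gb_per_node > 0.
std::uint64_t min_nodes(std::uint64_t working_set_gb, std::uint64_t usable_gb_per_node) {
    // Integer ceiling of working_set_gb / usable_gb_per_node.
    return (working_set_gb + usable_gb_per_node - 1) / usable_gb_per_node;
}
```

For an assumed 1024 GB working set, `min_nodes(1024, 28)` gives 37 nodes, while a hypothetical machine with 512 GB per node would need only `min_nodes(1024, 512)`, that is 2 nodes.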
References: HPCI report submitted and IPDPS workshop paper in preparation.
Performance profiling
Overhead measurements
Task trees and task graphs
Reference
Diehl, Patrick, et al. "Octo-Tiger's New Hydro Module and Performance Using HPX + CUDA on ORNL's Summit."
2021 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 2021.
Diehl, Patrick, et al. "Distributed, combined CPU and GPU profiling within HPX using APEX." arXiv preprint
arXiv:2210.06437 (2022).
Sampled profile of tasks on Piz Daint and Summit
Conclusion and Outlook
Conclusion
HPX and Octo-Tiger
Asynchronous integration of GPU in HPX:
→ Using CUDA API or HIP API
→ Using Kokkos for Nvidia or AMD GPUs
Providing HPX as a backend for Kokkos and integration of
asynchronous Kokkos launches
Work aggregation for small tasks on CPU and GPU
Vectorization using std::simd
Alternatives to MPI, like libfabric or LCI
Programming model
Work stealing, overlapping communication and computation, and
lightweight threads can improve the performance of irregular
workloads.
Outlook
Distributed scaling data and optimization for AMD GPUs
Optimization and large scale runs on Fugaku or Ookami
Get ready for Intel GPU support
Test the effect of libfabric on other systems
More astrophysics studies to prepare for the production run with the
light curve
Advanced visualization of the large scale results
Thanks to all my collaborators; without their effort, I could not present all these results.
Thanks for your attention! Questions?
Astrophysical validation
Resolution convergence: Double white dwarf merger
Reference
Diehl, Patrick, et al. "Performance Measurements Within Asynchronous Task-Based Runtime Systems: A Double White
Dwarf Merger as an Application." Computing in Science & Engineering 23.3 (2021): 73-81.
Higher reconstruction in the hydro module
Reference
Diehl, Patrick, et al. "Octo-Tiger's New Hydro Module and Performance Using HPX + CUDA on ORNL's Summit."
arXiv:2107.10987 (2021). (Accepted to IEEE Cluster 21)
Comparison with Flash I
Reference
Marcello, Dominic C., et al. "Octo-Tiger: a new, 3D hydrodynamic code for stellar mergers that uses HPX parallelization."
Monthly Notices of the Royal Astronomical Society 504.4 (2021): 5345-5382.
Comparison with Flash II
Reference
Marcello, Dominic C., et al. "Octo-Tiger: a new, 3D hydrodynamic code for stellar mergers that uses HPX parallelization."
Monthly Notices of the Royal Astronomical Society 504.4 (2021): 5345-5382.
This work is licensed under a Creative Commons "Attribution-NonCommercial-NoDerivatives 4.0 International" license.

 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdfNAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdfWadeK3
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 

Recently uploaded (20)

STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdfNAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 

Recent Developments in HPX, Octo-Tiger and Vectorization

  • 1. Recent developments in HPX and Octo-Tiger Patrick Diehl Joint work with: Gregor Daiß, Sagiv Schieber, Dominic Marcello, Kevin Huck, Hartmut Kaiser, Juhan Frank, Geoffery Clayton, Patrick Motl, Dirk Pflüger, Orsola DeMarco, Mikael Simberg, John Bidiscombe, Srinivas Yada, and many more Center for Computation & Technology, Louisiana State University Department of Physics & Astronomy patrickdiehl@lsu.edu November 2022 Patrick Diehl (CCT/LSU) HPX & Octo-Tiger November 2022 1 / 44
  • 2. Motivation. Astrophysical event: the merger of two stars. The flow on the stellar surface corresponds, in layman's terms, to the weather on the stars.
  • 3. Outline: 1 Astrophysical application; 2 Software framework (Octo-Tiger, HPX, APEX); 3 New features (Kokkos and HPX, Vectorization, Work aggregation); 4 Overhead measurements; 5 Scaling (Synchronous (MPI) vs asynchronous communication (libfabric), Scaling on ORNL’s Summit, First experience on Fugaku using A64FX); 6 Performance profiling; 7 Conclusion and Outlook
  • 5. Astrophysical application
  • 6. V1309 Scorpii. Image caption: At peak brightness, the rare 2002 red nova V838 Monocerotis briefly rivalled the most powerful stars in the Galaxy. Credit: NASA/ESA/H. E. Bond (STScI). Plot caption: A near-infrared (I band) light curve for V1309 Scorpii, plotted from OGLE data. Reference: Tylenda, R., et al. “V1309 Scorpii: merger of a contact binary.” Astronomy & Astrophysics 528 (2011): A114.
  • 7. Software framework
  • 8. Octo-Tiger: Open source astrophysics program [1] simulating the evolution of star systems based on the fast multipole method on adaptive octrees. Modules: Hydro, Gravity, Radiation (benchmarking). Supports: communication via MPI/libfabric; backends CUDA, HIP, Kokkos. Reference: Marcello, Dominic C., et al. “octo-tiger: a new, 3D hydrodynamic code for stellar mergers that uses hpx parallelization.” Monthly Notices of the Royal Astronomical Society 504.4 (2021): 5345-5382. [1] https://github.com/STEllAR-GROUP/octotiger
  • 9. Example of adaptive mesh refinement. Reference: Heller, Thomas, et al. “Harnessing billions of tasks for a scalable portable hydrodynamic simulation of the merger of two stars.” The International Journal of High Performance Computing Applications 33.4 (2019): 699-715.
  • 11. HPX: HPX is an open source C++ Standard Library for Concurrency and Parallelism [2]. Features: HPX exposes a uniform, standards-oriented API for ease of programming parallel and distributed applications. HPX provides unified syntax and semantics for local and remote operations. HPX exposes a uniform, flexible, and extendable performance counter framework which can enable runtime adaptivity. Reference: Kaiser, Hartmut, et al. “Hpx-the c++ standard library for parallelism and concurrency.” Journal of Open Source Software 5.53 (2020): 2352. [2] https://github.com/STEllAR-GROUP/hpx
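
The "standards-oriented API" mentioned on the HPX slide can be illustrated with plain standard C++. The following is a minimal sketch, not code from the talk: it uses `std::async` and `std::future` from the C++ standard library, and HPX offers counterparts with the same names in the `hpx` namespace (`hpx::async`, `hpx::future`). The helper names `partial_sum` and `async_sum` are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// Sum the half-open slice [first, last) of the data.
double partial_sum(const std::vector<double>& data, std::size_t first, std::size_t last) {
    assert(first <= last && last <= data.size());
    return std::accumulate(data.begin() + first, data.begin() + last, 0.0);
}

// Split a reduction into two asynchronous tasks and combine their futures.
double async_sum(const std::vector<double>& data) {
    const std::size_t mid = data.size() / 2;
    std::future<double> lower =
        std::async(std::launch::async, partial_sum, std::cref(data), std::size_t{0}, mid);
    std::future<double> upper =
        std::async(std::launch::async, partial_sum, std::cref(data), mid, data.size());
    // get() blocks until the corresponding task has finished.
    return lower.get() + upper.get();
}
```

Calling `async_sum` on a vector returns the same value as a sequential `std::accumulate`; the two halves are simply computed as separate tasks. A runtime such as HPX would schedule many such tasks on lightweight threads rather than on operating-system threads.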
  • 12. HPX’s architecture (diagram labels): Application; C++2z Concurrency/Parallelism APIs; Threading Subsystem; Active Global Address Space (AGAS); Local Control Objects (LCOs); Parcel Transport Layer (Networking); Performance Counter Framework; Policy Engine/Policies; Operating System. Reference: Kaiser, Hartmut, et al. “Hpx-the c++ standard library for parallelism and concurrency.” Journal of Open Source Software 5.53 (2020): 2352.
  • 13. APEX (Autonomous Performance Environment for Exascale): Performance measurement library for distributed, asynchronous multitasking systems. CUPTI used to capture CUDA events; NVML used to monitor the GPU; OTF2 and Google Trace Events trace output; task graphs and trees; scatterplots of timers and counters. Reference: Huck, Kevin A., et al. “An autonomic performance environment for exascale.” Supercomputing frontiers and innovations 2.3 (2015): 49-66.
  • 14. APEX: To support performance measurement in systems that employ user-level threading, APEX uses a dependency chain in addition to the call stack to produce traces and task dependency graphs. CUPTI used to capture CUDA events; NVML used to monitor the GPU; OTF2 and Google Trace Events trace output; task graphs and trees; scatterplots of timers and counters. Reference: Huck, Kevin A., et al. “An autonomic performance environment for exascale.” Supercomputing frontiers and innovations 2.3 (2015): 49-66.
  • 15. New features
  • 16. HPX and Kokkos. References: Edwards, H. Carter, Christian R. Trott, and Daniel Sunderland. “Kokkos: Enabling manycore performance portability through polymorphic memory access patterns.” Journal of parallel and distributed computing 74.12 (2014): 3202-3216. Daiß, Gregor, et al. “Beyond Fork-Join: Integration of Performance Portable Kokkos Kernels with HPX.” 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE, 2021.
  • 17. Overhead. Reference: Daiß, Gregor, et al. “Beyond Fork-Join: Integration of Performance Portable Kokkos Kernels with HPX.” 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE, 2021.
  • 18. Distributed scaling
  • 19. Vectorization using Kokkos + HPX: We can now easily switch both the SIMD library (Kokkos SIMD or std::experimental::simd) and the SIMD extensions used. Reference: Daiß, Gregor, et al. “From Merging Frameworks to Merging Stars: Experiences using HPX, Kokkos and SIMD Types.” arXiv preprint arXiv:2210.06439 (2022). (Accepted to SC 22 workshop proceedings)
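
One common way to make the SIMD library switchable, as the vectorization slide describes, is to write each kernel once as a template over the SIMD type. The sketch below illustrates the idea under stated assumptions and is not Octo-Tiger code: `scalar_pack` is a hypothetical stand-in whose interface is only loosely modeled on SIMD types such as `std::experimental::simd` (whose real `copy_from` and `copy_to` additionally take an alignment flag), and `axpy` is a hypothetical example kernel.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a SIMD type: N lanes processed with plain loops.
// A real build would substitute a type from a SIMD library here.
template <typename T, std::size_t N>
struct scalar_pack {
    std::array<T, N> lane{};

    scalar_pack() = default;
    explicit scalar_pack(T value) { lane.fill(value); }  // broadcast to all lanes

    static constexpr std::size_t size() { return N; }

    void copy_from(const T* src) {
        for (std::size_t i = 0; i < N; ++i) lane[i] = src[i];
    }
    void copy_to(T* dst) const {
        for (std::size_t i = 0; i < N; ++i) dst[i] = lane[i];
    }

    friend scalar_pack operator+(scalar_pack a, const scalar_pack& b) {
        for (std::size_t i = 0; i < N; ++i) a.lane[i] += b.lane[i];
        return a;
    }
    friend scalar_pack operator*(scalar_pack a, const scalar_pack& b) {
        for (std::size_t i = 0; i < N; ++i) a.lane[i] *= b.lane[i];
        return a;
    }
};

// Kernel written once against the pack interface: y = a * x + y.
template <typename Pack, typename T>
void axpy(T a, const T* x, T* y, std::size_t n) {
    constexpr std::size_t width = Pack::size();
    const Pack va(a);
    std::size_t i = 0;
    // Main loop: process full packs of `width` elements.
    for (; i + width <= n; i += width) {
        Pack vx, vy;
        vx.copy_from(x + i);
        vy.copy_from(y + i);
        (va * vx + vy).copy_to(y + i);
    }
    // Remainder loop: leftover elements are handled one at a time.
    for (; i < n; ++i) y[i] = a * x[i] + y[i];
}
```

Switching the pack type is then a one-line change at the call site, for example `axpy<scalar_pack<double, 4>>(2.0, x, y, n)` versus `axpy<scalar_pack<double, 8>>(2.0, x, y, n)`; the kernel body stays untouched.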
  • 20. Single node runs on different CPU architectures
  • 21. Work aggregation strategies. Reference: Daiß, Gregor, et al. “From Task-Based GPU Work Aggregation to Stellar Mergers: Turning Fine-Grained CPU Tasks into Portable GPU Kernels.” arXiv preprint arXiv:2210.06438 (2022). (Accepted to SC 22 workshop proceedings)
  • 22. Results
  • 23. Overhead measurements
  • 24. Task Bench: Task Bench is a configurable benchmark for evaluating the efficiency and performance of parallel and distributed programming models, runtimes, and languages. It is primarily intended for evaluating task-based models, in which the basic unit of execution is a task, but it can be implemented in any parallel system. Reference: Slaughter, Elliott, et al. “Task bench: A parameterized benchmark for evaluating parallel runtime performance.” SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2020.
  • 25. Measurements: Single node: 1 task per core. Distributed runs: 16 tasks per core. METG (Minimum Effective Task Granularity): METG(50%) is the smallest task granularity (task duration) at which the system still achieves 50% overall efficiency. Takeaway: Lightweight threads and work stealing come with overhead, but this overhead is not significant for large simulations. Reference: Wu, Nanmiao, et al. “Quantifying Overheads in Charm++ and HPX using Task Bench.” arXiv preprint arXiv:2207.12127 (2022). (Accepted to EuroPar 22 workshop proceedings)
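
To make the METG definition on the measurements slide concrete, the sketch below picks the smallest measured task granularity whose efficiency still reaches a given threshold. It is a minimal illustration under stated assumptions: the `Sample` struct and the `metg` function are hypothetical names, efficiency is assumed to be given as a fraction of peak between 0 and 1, and no interpolation between measurements is attempted.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// One benchmark measurement (hypothetical layout for this sketch).
struct Sample {
    double granularity_us;  // task duration in microseconds
    double efficiency;      // achieved fraction of peak performance, in [0, 1]
};

// Return the smallest measured granularity whose efficiency is at least
// `threshold`, or std::nullopt if no measurement reaches the threshold.
std::optional<double> metg(const std::vector<Sample>& samples, double threshold = 0.5) {
    assert(threshold > 0.0 && threshold <= 1.0);
    std::optional<double> best;
    for (const Sample& s : samples) {
        if (s.efficiency >= threshold && (!best || s.granularity_us < *best)) {
            best = s.granularity_us;
        }
    }
    return best;
}
```

A lower METG means a runtime tolerates finer-grained tasks before its overhead costs more than half of the achievable performance, which is why the metric is useful for comparing task-based runtimes.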
  • 26. Scaling
  • 27. Synchronous (MPI) vs asynchronous communication (libfabric)
  • 28. Configuration. Reference: Daiß, Gregor, et al. “From piz daint to the stars: Simulation of stellar mergers using high-level abstractions.” Proceedings of the international conference for high performance computing, networking, storage and analysis. 2019.
  • 29. Synchronous vs asynchronous communication. Reference: Daiß, Gregor, et al. “From piz daint to the stars: Simulation of stellar mergers using high-level abstractions.” Proceedings of the international conference for high performance computing, networking, storage and analysis. 2019.
  • 30. Scaling on ORNL’s Summit
  • 31. Node level scaling: Hydro. Reference: Diehl, Patrick, et al. “Octo-Tiger’s New Hydro Module and Performance Using HPX+ CUDA on ORNL’s Summit.” 2021 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 2021.
  • 32. Distributed scaling: Hydro. Reference: Diehl, Patrick, et al. “Octo-Tiger’s New Hydro Module and Performance Using HPX+ CUDA on ORNL’s Summit.” 2021 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 2021.
  • 33. Node level scaling: Hydro + Gravity. Reference: Diehl, Patrick, et al. “Octo-Tiger’s New Hydro Module and Performance Using HPX+ CUDA on ORNL’s Summit.” 2021 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 2021.
  • 34. Distributed scaling: Hydro + Gravity. Reference: Diehl, Patrick, et al. “Octo-Tiger’s New Hydro Module and Performance Using HPX+ CUDA on ORNL’s Summit.” 2021 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 2021.
  • 35. First experience on Fugaku using A64FX
  • 36. Porting HPX to Arm. Challenges on Fugaku: Add support for the Parallel Job Manager (PJM). Cross compilation on the x86 head node for the A64FX compute nodes. Some of the dependencies do not support cross compilation. Reference: Gupta, Nikunj, et al. “Deploying a task-based runtime system on raspberry pi clusters.” 2020 IEEE/ACM Fifth International Workshop on Extreme Scale Programming Models and Middleware (ESPM2). IEEE, 2020.
  • 37. Node-level scaling: SVE vectorization reduced the computation time by a factor of two compared to Neon.
  • 38. Distributed scaling: Because of the 28 GB of memory per node, more nodes are required on Fugaku than on other machines. References: HPCI report submitted and IPDPS workshop paper in preparation.
  • 39. Performance profiling
  • 40. Overhead measurements
  • 41. Task trees and task graphs. References: Diehl, Patrick, et al. “Octo-Tiger’s New Hydro Module and Performance Using HPX+ CUDA on ORNL’s Summit.” 2021 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 2021. Diehl, Patrick, et al. “Distributed, combined CPU and GPU profiling within HPX using APEX.” arXiv preprint arXiv:2210.06437 (2022).
  • 42. Sampled profile of tasks on Piz Daint and Summit
  • 43. Conclusion and Outlook
  • 44. Conclusion. HPX and Octo-Tiger: Asynchronous integration of GPUs in HPX (using the CUDA API or HIP API, or using Kokkos for Nvidia or AMD GPUs). Providing HPX as a backend for Kokkos and integration of asynchronous Kokkos launches. Work aggregation for small tasks on CPU and GPU. Vectorization using std::simd. Alternatives to MPI, like libfabric or LCI. Programming model: Work stealing, overlapping communication and computation, and lightweight threads can improve the performance of irregular workloads.
  • 45. Outlook: Distributed scaling data and optimization for AMD GPUs. Optimization and large scale runs on Fugaku or Ookami. Get ready for Intel GPU support. Test the effect of libfabric on other systems. More astrophysics studies to prepare for the production run with the light curve. Advanced visualization of the large scale results. Thanks to all my collaborators; without their effort, I could not present these results. Thanks for your attention! Questions?
  • 46. Astrophysical validation
  • 47. Resolution convergence: Double white dwarf merger. Reference: Diehl, Patrick, et al. “Performance Measurements Within Asynchronous Task-Based Runtime Systems: A Double White Dwarf Merger as an Application.” Computing in Science & Engineering 23.3 (2021): 73-81.
  • 48. Higher reconstruction in the hydro module. Reference: Diehl, Patrick, et al. “Octo-Tiger’s New Hydro Module and Performance Using HPX+ CUDA on ORNL’s Summit.” arXiv:2107.10987 (2021). (Accepted IEEE Cluster 21)
  • 49. Comparison with Flash I. Reference: Marcello, Dominic C., et al. “octo-tiger: a new, 3D hydrodynamic code for stellar mergers that uses hpx parallelization.” Monthly Notices of the Royal Astronomical Society 504.4 (2021): 5345-5382.
  • 50. Comparison with Flash II. Reference: Marcello, Dominic C., et al. “octo-tiger: a new, 3D hydrodynamic code for stellar mergers that uses hpx parallelization.” Monthly Notices of the Royal Astronomical Society 504.4 (2021): 5345-5382.
  • 51. This work is licensed under a Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” license.