Summer 2024
OPENACC AND
OPEN HACKATHONS
MONTHLY HIGHLIGHTS
WHO IS OPENACC?
The OpenACC Organization is dedicated to helping the research and
developer community advance science by expanding their accelerated
and parallel computing skills.
Three Areas of Focus

Ecosystem Development: Participating in work that enables and/or advances the interoperability of parallel programming models in compilers and tools, with the vision of transitioning to standard language parallelism.

Training/Education: Managing one of the leading hackathon series; our events have been instrumental in training thousands of domain experts and accelerating over 550 scientific applications using a variety of programming models and tools.

OpenACC Specification: Developing and utilizing the OpenACC directives-based programming model to port, accelerate, or optimize scientific applications.
OPENACC SPECIFICATION MOMENTUM
Wide Adoption Across Key HPC Codes

ANSYS Fluent
Gaussian
VASP
LSDalton
MPAS
GAMERA
GTC
XGC
ACME
FLASH
COSMO
Numeca

400+ APPS* USING OPENACC
* Applications in production and development

"For VASP, OpenACC is the way forward for GPU acceleration. Performance is similar to CUDA, and OpenACC dramatically decreases GPU development and maintenance efforts. We're excited to collaborate with NVIDIA and PGI as an early adopter of Unified Memory."

Prof. Georg Kresse
Computational Materials Physics
University of Vienna

VASP: Top Quantum Chemistry and Material Science Code
[Figure: VASP performance chart; benchmark: silica IFPEN, RMM-DIIS on P100]
2024 OPEN ACCELERATED COMPUTING SUMMIT
Call for Speakers Closes September 30th!

The 2024 Open Accelerated Computing (OAC) Summit is a platform for examining the latest research and advances in scientific computing with modern technologies and tools. Topics covered include enabling interoperability of parallel programming models, transitioning to standard language parallelism, using the OpenACC directives-based programming model to accelerate or optimize scientific applications, shared experiences and use cases from Open Hackathons, and more.

Take advantage of this opportunity to highlight your research to our global community by submitting a talk proposal today!

LEARN MORE
UPCOMING OPEN HACKATHONS & BOOTCAMPS

Event                    | Call Closes        | Event Date
China NIM Open Hackathon | September 30, 2024 | October 11-30, 2024
NCHC Open Hackathon      | October 15, 2024   | November 13 & December 2-4, 2024
Helmholtz Open Hackathon | February 10, 2025  | April 1 & 9-11, 2025
IDRIS Open Hackathon     | March 18, 2025     | May 13 & 20-22, 2025

Onsite and Digital: Our events offer in-person, digital, and hybrid formats!

COMPLETE LIST OF EVENTS
IDRIS HOLDS A HYBRID OPEN HACKATHON
Four Years and Going Strong

Hosting more than 1,400 projects in both HPC and AI domains, the Institute for Development and Resources in Intensive Scientific Computing (IDRIS) held its fourth consecutive Open Hackathon in a hybrid format to help applications run at scale on the Jean Zay supercomputing system. The event saw eight teams, with projects ranging from bioinformatics to large language model (LLM) training, benefit from dedicated mentor collaboration and community knowledge exchange.

READ THE BLOG
2024 HELMHOLTZ OPEN HACKATHON
A Word from Our Organizers

Listen to the organizers from Helmholtz-Zentrum Dresden-Rossendorf (HZDR) and the Helmholtz Information & Data Science Academy (HiDA) as they share their thoughts on how Open Hackathons bring together interdisciplinary researchers and help advance collaborative science.

"The value that comes out of the hackathon is the connection and friendly outside review of your code. New inspirations and new ideas come in, and we get more or better scientific software that hopefully more scientists can use, is easier to use, and is more performant, so that everyone wins from this in the end."

Guido Juckeland
Helmholtz-Zentrum Dresden-Rossendorf

WATCH THE VIDEO
KOKKACC: ENHANCING KOKKOS WITH OPENACC
On-Demand Webinar Now Available

In this on-demand webinar, Dr. Pedro Valero-Lara from Oak Ridge National Laboratory presents KokkACC, an alternative OpenACC backend for Kokkos.

KokkACC provides a high-productivity programming environment and, potentially, a multidevice backend. Competitive performance has been observed: in some cases, KokkACC is faster than NVIDIA's CUDA backend and much faster than OpenMP's GPU offloading backend. The webinar also covers implementation details and a detailed performance study conducted with a set of mini-benchmarks (AXPY and DOT product) and three mini-apps (LULESH, LAMMPS, and miniFE).

WATCH NOW
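
For readers who have not used Kokkos, the sketch below shows the flavor of kernel the study benchmarks: a DOT product written once against the Kokkos API. A backend such as KokkACC can then dispatch the same parallel_reduce through OpenACC rather than CUDA or OpenMP. This is an illustrative sketch, not code from the webinar.

    #include <Kokkos_Core.hpp>
    #include <cstdio>

    int main(int argc, char* argv[]) {
      Kokkos::initialize(argc, argv);
      {
        const int n = 1 << 20;
        Kokkos::View<double*> x("x", n), y("y", n);
        Kokkos::deep_copy(x, 1.0);  // fill x with 1.0
        Kokkos::deep_copy(y, 2.0);  // fill y with 2.0

        double dot = 0.0;
        // The backend (CUDA, OpenMP, or OpenACC via KokkACC) decides how
        // this reduction is mapped to the hardware; the source is unchanged.
        Kokkos::parallel_reduce(
            "dot", n,
            KOKKOS_LAMBDA(const int i, double& sum) { sum += x(i) * y(i); },
            dot);

        std::printf("dot = %.1f\n", dot);  // expect 2.0 * n
      }
      Kokkos::finalize();
      return 0;
    }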
RESOURCES

Paper: A GPU-ready Pseudo-spectral Method for Direct Numerical Simulations of Multiphase Turbulence
Alessio Roccon

In this work, we detail the GPU porting of an in-house pseudo-spectral solver tailored to large-scale, interface-resolved simulations of drop- and bubble-laden turbulent flows. The code relies on direct numerical simulation of the Navier-Stokes equations, used to describe the flow field, coupled with a phase-field method, used to describe the shape, deformation, and topological changes of the interface of the drops or bubbles. The governing equations (Navier-Stokes and Cahn-Hilliard) are solved using a pseudo-spectral method that transforms the variables into wavenumber space. The code relies on multilevel parallelism: the first level uses the message-passing interface (MPI) and targets multi-core, CPU-based infrastructures; a second level uses OpenACC directives and cuFFT libraries to accelerate execution on GPU-based infrastructures. The resulting multiphase flow solver can be efficiently executed on heterogeneous computing infrastructures and exhibits a remarkable speed-up when GPUs are employed. Thanks to the modular structure of the code and the use of a directive-based strategy to offload execution to GPUs, only minor code modifications are required when targeting different computing architectures. This improves code maintenance, version control, and the implementation of additional modules or governing equations.

Figure 4. Strong scaling results for the code FLOW36 obtained on Marconi100. Two different problem sizes have been considered: Nx × Ny × Nz = 512 × 512 × 513 and Nx × Ny × Nz = 1024 × 1024 × 1025. For the smaller problem size (blue dots), tests have been performed from 1 node (4 GPUs) up to 32 nodes (128 GPUs), while for the larger problem size (red dots), tests run from 8 nodes (32 GPUs) up to 64 nodes (256 GPUs).

READ PAPER
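
For a sense of what the directive-based strategy looks like in practice: FLOW36 itself is a Fortran code, but the pattern the paper describes resembles the hypothetical C++ fragment below, where a single pragma offloads a field-update loop and the same source still builds for CPU-only targets. A minimal sketch under those assumptions, not code from the paper.

    #include <cstddef>

    // Hypothetical update of a flattened 3D field: scale every point by alpha.
    // Built with OpenACC enabled (e.g. nvc++ -acc), the loop runs on the GPU;
    // built without it, the pragma is ignored and the loop runs on the CPU.
    void scale_field(double* f, std::size_t n, double alpha) {
        #pragma acc parallel loop copy(f[0:n])
        for (std::size_t i = 0; i < n; ++i)
            f[i] *= alpha;
    }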
RESOURCES

Paper: The Gaia AVU–GSR Solver: CPU+GPU Parallel Solutions for Linear Systems Solving and Covariances Calculation Toward Exascale Systems
Valentina Cesare, Ugo Becciani, Alberto Vecchiato, Mario Gilberto Lattanzi, Marco Aldinucci, and Beatrice Bucciarelli

We ported to GPU with CUDA the solver module of the Astrometric Verification Unit–Global Sphere Reconstruction (AVU–GSR) pipeline for the ESA Gaia mission. The code finds the astrometric parameters of approximately 10^8 sources by solving a linear system with the LSQR iterative algorithm. The coefficient matrix is large (10-50 TB) and sparse. The CUDA code achieves a speedup of approximately 14x over the original MPI + OpenMP solver on the CINECA cluster Marconi100. We have since migrated production to Leonardo, which has 4x the GPU memory per node. This speedup was obtained without computing the system covariances, whose total number is N_unk × (N_unk − 1)/2 and which occupy approximately 1 EB for N_unk ≈ 5 × 10^8. This "Big Data" problem cannot be solved with standard methods, so we defined a two-job, I/O-based pipeline in which one job writes the files and a second, concurrent job reads them, iteratively computes the covariances, and deletes them. The covariance calculation does not significantly slow down the code up to approximately 8 × 10^6 covariance elements.

READ PAPER
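
As a quick sanity check on the ~1 EB figure, assuming 8-byte double-precision covariance elements (an assumption; the abstract does not state the element size):

    N_cov = \frac{N_unk (N_unk - 1)}{2} \approx \frac{(5 \times 10^{8})^{2}}{2}
          = 1.25 \times 10^{17} \text{ elements},
    \qquad
    1.25 \times 10^{17} \times 8 \text{ B} = 10^{18} \text{ B} = 1 \text{ EB},

which matches the order of magnitude quoted above.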
RESOURCES

Paper: Portability of Fortran's 'do concurrent' on GPUs
Ronald M. Caplan, Miko M. Stulajter, Jon A. Linker, Jeff Larkin, Henry A. Gabb, Shiquan Su, Ivan Rodriguez, Zachary Tschirhart, and Nicholas Malaya

There is continuing interest in using standard language constructs for accelerated computing in order to avoid (sometimes vendor-specific) external APIs. For Fortran codes, the do concurrent (DC) loop has been successfully demonstrated on the NVIDIA platform. However, support for DC on other platforms has taken longer to implement. Recently, Intel has added DC GPU offload support to its compiler, as has HPE for AMD GPUs. In this paper, we explore the current portability of using DC across GPU vendors using the in-production solar surface flux evolution code HipFT. We discuss implementation and compilation details, including when and where directive APIs for data movement are needed or desired compared to using a unified memory system. The performance achieved on both data center and consumer platforms is shown.

Figure 1. Visualization of the HipFT test case used to evaluate the DC implementations. We show the maps at times 0, 200, 400, and 600 hours (left to right) for four of the eight realizations (top to bottom).

READ PAPER
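
The paper's loops are Fortran (shown below only in a comment); to keep this newsletter's examples in one language, the sketch gives the ISO C++ standard-parallelism counterpart of a DC-style AXPY loop. With NVIDIA's nvc++, for example, the -stdpar flag offloads such algorithms to the GPU. Illustrative only, not code from the paper.

    // Fortran 'do concurrent' form of the loop (illustrative):
    //
    //   do concurrent (i = 1:n)
    //     y(i) = y(i) + a*x(i)
    //   end do
    //
    // ISO C++ counterpart using standard parallel algorithms:
    #include <algorithm>
    #include <execution>
    #include <vector>

    void axpy(double a, const std::vector<double>& x, std::vector<double>& y) {
        std::transform(std::execution::par_unseq,
                       x.begin(), x.end(),   // input 1: x
                       y.begin(),            // input 2: y
                       y.begin(),            // output: y, updated in place
                       [a](double xi, double yi) { return a * xi + yi; });
    }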
RESOURCES

Documentation: OpenACC Programming and Best Practices Guide

Explore a wealth of resources for writing portable code, assessing application performance, and understanding and exploring the OpenACC directives-based programming model for parallelization and accelerated computing.

This guide covers the OpenACC directive syntax, profiling, parallelizing and optimizing loops, handling data, interoperability, and advanced features of the specification.

VIEW GUIDE
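
To give a taste of the guide's data-handling material, here is a small illustrative C++ sketch (not taken from the guide): a structured data region keeps both arrays resident on the GPU across two parallel loops, so data moves between host and device only once in each direction.

    #include <cstddef>

    // Three-point smoothing followed by scaling; 'out' stays on the device
    // between the two loops instead of bouncing back to the host.
    void smooth_and_scale(const float* in, float* out, std::size_t n) {
        #pragma acc data copyin(in[0:n]) copyout(out[0:n])
        {
            #pragma acc parallel loop
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t l = (i == 0)     ? i : i - 1;  // clamp edges
                const std::size_t r = (i + 1 == n) ? i : i + 1;
                out[i] = (in[l] + in[i] + in[r]) / 3.0f;
            }

            // Second pass reuses 'out' on the device; no intermediate copy.
            #pragma acc parallel loop
            for (std::size_t i = 0; i < n; ++i)
                out[i] *= 0.5f;
        }
    }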
STAY IN THE KNOW: JOIN THE COMMUNITY
OPENACC AND HACKATHON UPDATES

The OpenACC Organization is dedicated to helping the research and developer community advance science by expanding their accelerated and parallel computing skills.

Take an active role in influencing the future of both the OpenACC specification and the organization itself by becoming a member of the community. Keep abreast of new tools, the latest resources, recent research, and upcoming events.

JOIN TODAY

Learn more at WWW.OPENACC.ORG