Stay up to date with the OpenACC Monthly Highlights. July's edition covers the OpenACC Summit 2021, improved OpenACC support in GCC, upcoming GPU Hackathons and Bootcamps, Sunita Chandrasekaran's appointment as PI of the SOLLVE project, recent research, and more!
WHAT IS OPENACC?
main()
{
    <serial code>

    #pragma acc kernels
    {
        <parallel code>
    }
}
SIMPLE: Add a simple compiler directive (a compilable version of this pattern appears below).
POWERFUL & PORTABLE: A directives-based programming model for parallel computing, designed for performance and portability on CPUs and GPUs.
An open specification developed by the OpenACC.org consortium.
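Below is a minimal, compilable C sketch of the pattern above (our illustration, not from the newsletter), filling in the serial and parallel placeholders with a simple vector addition. It can be built with an OpenACC compiler, e.g. nvc -acc or gcc -fopenacc.

#include <stdio.h>

#define N 1000000

int main(void)
{
    static float x[N], y[N];

    /* Serial code: initialize the inputs on the host. */
    for (int i = 0; i < N; ++i) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    /* Parallel code: the compiler analyzes the kernels region, generates
       device kernels, and manages the data movement for x and y. */
    #pragma acc kernels
    for (int i = 0; i < N; ++i)
        y[i] = x[i] + y[i];

    printf("y[0] = %f\n", y[0]); /* expect 3.000000 */
    return 0;
}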
[Chart: VASP speedup, silica IFPEN, RMM-DIIS on a P100 GPU]
OPENACC GROWING MOMENTUM
Wide Adoption Across Key HPC Codes
ANSYS Fluent
Gaussian
VASP
LSDalton
MPAS
GAMERA
GTC
XGC
ACME
FLASH
COSMO
Numeca
230+ APPS* USING OpenACC
Prof. Georg Kresse
Computational Materials Physics
University of Vienna
"For VASP, OpenACC is the way forward for GPU acceleration. Performance is similar to CUDA, and OpenACC dramatically decreases GPU development and maintenance efforts. We're excited to collaborate with NVIDIA and PGI as an early adopter of Unified Memory."
VASP
Top Quantum Chemistry and Material Science Code
* Applications in production and development
VIEW PRESENTATIONS
Expanding from the annual meeting, the OpenACC Summit
brings together users of the OpenACC programming model,
members of the OpenACC organization, and the scientific
computing community across national laboratories, research
institutions and industry.
This year's Summit was held online and featured daily keynotes, a Birds-of-a-Feather (BoF) interactive discussion, invited talks, and a "Connect with Experts" session.
Presentations are now available.
OPENACC ANNUAL SUMMIT 2021
PRESENTATIONS NOW AVAILABLE
UPCOMING GPU BOOTCAMP & HACKATHONS
COMPLETE LIST OF EVENTS
Event | Call Closes | Event Date
ENCCS & NSC Megatron Bootcamp | October 6, 2021 | October 25-27, 2021
KSC GPU Bootcamp | October 18, 2021 | October 25, 2021
HPCCC Croatia N Ways To GPU Programming Bootcamp | October 13, 2021 | October 28-29, 2021
Tsinghua GPU Hackathon 2021 | October 3, 2021 | November 3, 10-12, 2021
NERSC December GPU Hackathon 2021 | October 8, 2021 | December 1, 7-9, 2021
ENCCS GPU Hackathon 2021 | October 6, 2021 | December 6, 13-15, 2021
CHPC AI For Science Bootcamp | November 24, 2021 | December 7-8, 2021
NOAA AI For Science Bootcamp | November 24, 2021 | December 3, 2021
Digital in 2021: Our virtual events continue to offer the same high-touch training and mentorship without the
hassle of travel!
READ ARTICLE
While the GNU Compiler Collection (GCC) has supported OpenACC, the parallel programming standard popular for GPUs and accelerators, for a few years now, the current implementation has proven inadequate for many real-world HPC workloads that leverage OpenACC. Fortunately, Siemens has been working to improve GCC's OpenACC kernels support. Forthcoming work offers a more unified internal representation of kernels and parallel regions, data-dependence analysis using Graphite, improvements to Graphite itself, and language front-end improvements (see the sketch below).
BETTER SUPPORT AND PERFORMANCE FOR
OPENACC KERNELS COMING TO GCC
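To see why kernels support is demanding, consider this hedged sketch of ours (not from the article). Inside an OpenACC kernels region it is the compiler, not the programmer, that must prove each loop free of dependences before parallelizing it, which is exactly where data-dependence analysis such as Graphite's comes in.

void smooth_then_scan(int n, const float *a, float *b)
{
    #pragma acc kernels copyin(a[0:n]) copy(b[0:n])
    {
        /* Independent iterations: the compiler may parallelize this loop
           once its analysis proves there is no dependence. */
        for (int i = 1; i < n - 1; ++i)
            b[i] = 0.5f * (a[i - 1] + a[i + 1]);

        /* Loop-carried dependence on b: a correct compiler must detect it
           and keep this loop sequential (or transform it). */
        for (int i = 1; i < n; ++i)
            b[i] += b[i - 1];
    }
}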
LEARN MORE
The US Department of Energy's (DOE's) Exascale Computing
Project (ECP) has named Sunita Chandrasekaran of
Brookhaven National Laboratory (BNL) as principal
investigator of the SOLLVE project.
As the new PI of SOLLVE, Chandrasekaran is charged with
developing leading-edge parallel programming models and
tools in collaboration with the LLVM community and
participating high-performance computing system vendors.
SUNITA CHANDRASEKARAN SELECTED
AS PRINCIPAL INVESTIGATOR OF
SOLLVE PROJECT
SUBMIT NOW
Hyperion Research and the HPC User Forum Steering
Committee are now accepting submissions for the 18th round
of the HPC Innovation Excellence Awards. The program
started in 2011 and has recognized nearly 100 outstanding
HPC-supported achievements in academia, government and
the private sector.
Award entries are invited from throughout the world and may
be submitted at any time on or before October 25, 2021.
HPC INNOVATION EXCELLENCE AWARDS
NOW ACCEPTING SUBMISSIONS
VIEW ARTICLE
The staff at the Oak Ridge Leadership Computing Facility, backed by the Exascale Computing Project community and technology partner HPE, is hard at work fielding the United States' first exascale system this year and ensuring that the system is ready for real science on day one.

This is a major milestone in the DOE's larger mission goal to "engage in research and development to create a capable exascale computing ecosystem that integrates hardware and software capability delivering at least 100 times the performance of current 10-petaflop (10^15 floating-point operations per second) systems across a range of applications representing government needs and societal priorities such as Artificial Intelligence (AI) technologies."
FRONTIER INSTALLATION UNDERWAY
US CLOSES IN ON EXASCALE
Source: ORNL/DOE
RESOURCES
Paper: Progress Towards Accelerating the Unified Model on Hybrid
Multi-Core Systems
Wei Zhang, Min Xu, Katherine Evans, Matthew Norman, Mario Morales-Hernandez,
Salil Mahajan, Adrian Hill, James Manners, Ben Shipway, and Maynard Christopher
The cloud microphysics scheme, CASIM, and the radiation scheme,
SOCRATES, are two computationally intensive parts within the Met
Office's Unified Model (UM). This study enables CASIM and SOCRATES
to use accelerated multi-core systems for optimal computational
performance of the UM. Using profiling to guide our efforts, we
refactored the code for optimal threading and kernel arrangement and
implemented OpenACC directives manually or through the CLAW
source-to-source translator. Initial porting results achieved 10.02x and
9.25x speedup in CASIM and SOCRATES respectively on 1 GPU
compared with 1 CPU core. A granular performance analysis of the
strategy and bottlenecks is discussed. These improvements will enable the
UM to run on heterogeneous computers, and a path forward for further
improvements is provided.
READ PAPER
Fig. 4: The GPU speedup for microphysics_common and the entire CASIM in experiments with different numbers of columns.
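The UM code itself is Fortran, but a C analogue of the loop-level porting the abstract describes might look like the following sketch (hypothetical names and placeholder physics, not code from the paper): collapse the column and level loops so the GPU sees one large iteration space, and state the data movement explicitly.

/* Hypothetical column-physics kernel over ncol columns and nlev levels. */
void microphysics_step(int ncol, int nlev, double *q, const double *t)
{
    #pragma acc parallel loop collapse(2) \
        copy(q[0:ncol*nlev]) copyin(t[0:ncol*nlev])
    for (int c = 0; c < ncol; ++c) {
        for (int k = 0; k < nlev; ++k) {
            int idx = c * nlev + k;
            /* Placeholder physics: deplete moisture below freezing. */
            if (t[idx] < 273.15)
                q[idx] *= 0.99;
        }
    }
}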
RESOURCES
Paper: Strong Scaling of OpenACC Enabled Nek5000 on Several GPU
Based HPC Systems
Jonathan Vincent, Jing Gong, Martin Karp, Adam Peplinski, Niclas Jansson, Artur
Podobas, Andreas Jocksch, Jie Yao, Fazle Hussain, Stefano Markidis, Matts Karlsson,
Dirk Pleiter, Erwin Laure, and Philipp Schlatter
We present new results on the strong parallel scaling of the OpenACC-accelerated implementation of the high-order spectral element fluid dynamics solver Nek5000. The test case considered consists of a direct numerical simulation of fully developed turbulent flow in a straight pipe, at two different Reynolds numbers Re_τ = 360 and Re_τ = 550, based on friction velocity and pipe radius. The strong scaling is tested on several GPU-enabled HPC systems, including the Swiss Piz Daint system, TACC's Longhorn, Jülich's JUWELS Booster, and Berzelius in Sweden. The performance results show that a speed-up of between 3 and 5 can be achieved using the GPU-accelerated version compared with the CPU version on these different systems. The run-time for 20 timesteps reduces from 43.5 to 13.2 seconds when increasing the number of GPUs from 64 to 512 for the Re_τ = 550 case on the JUWELS Booster system, illustrating the potential of the GPU-accelerated version for high throughput. At the same time, the strong scaling limit is significantly larger for GPUs, at about 2,000-5,000 elements per rank, compared to about 50-100 for a CPU rank.
READ PAPER
Fig. 2: Execution time for 1,000 spectral elements as a function of element polynomial order N, using shared memory and no shared memory.
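As a quick back-of-the-envelope check (our arithmetic, not the paper's): going from 64 to 512 GPUs is an 8x increase in resources, while the run-time drops from 43.5 s to 13.2 s, a speed-up of 43.5 / 13.2 ≈ 3.3x. That corresponds to a parallel efficiency of roughly 3.3 / 8 ≈ 41%, as expected when running near the stated strong-scaling limit of a few thousand elements per GPU rank.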
RESOURCES
Paper: Accelerating an Iterative Eigensolver for
Nuclear Structure Configuration Interaction
Calculations on GPUs using OpenACC
Pieter Maris, Chao Yang, Dossay Oryspayev, and
Brandon Cook
To accelerate the solution of large eigenvalue problems arising from many-body calculations in
nuclear physics on distributed-memory parallel systems equipped with general-purpose Graphics
Processing Units (GPUs), we modified a previously developed hybrid MPI/OpenMP
implementation of an eigensolver written in FORTRAN 90 by using an OpenACC directives-based
programming model. Such an approach requires making minimal changes to the original code and
enables a smooth migration of large-scale nuclear structure simulations from a distributed-memory
many-core CPU system to a distributed GPU system. However, in order to make the OpenACC-
based eigensolver run efficiently on GPUs, we need to take into account the architectural
differences between a many-core CPU and a GPU device. Consequently, the optimal way to insert
OpenACC directives may be different from the original way of inserting OpenMP directives. We
point out these differences in the implementation of sparse matrix-matrix multiplications (SpMM),
which constitutes the main cost of the eigensolver, as well as other differences in the
preconditioning step and dense linear algebra operations. We compare the performance of the
OpenACC-based implementation executed on multiple GPUs with the performance on distributed-
memory many-core CPUs, and demonstrate significant speedup achieved on GPUs compared to
the on-node performance of a many-core CPU. We also show that the overall performance
improvement of the eigensolver on multiple GPUs is more modest due to the communication
overhead among different MPI ranks.
READ PAPER
Figure 8: An on-device performance comparison among three versions of the OpenACC implementation of the SpMM subroutine used in LOBPCG.
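The abstract's key point, that the best OpenACC directive placement can differ from the original OpenMP placement, can be illustrated with a hedged CSR SpMM sketch of our own (hypothetical names, not the paper's code): OpenMP on a many-core CPU typically parallelizes only the outer row loop, while OpenACC on a GPU benefits from exposing a second loop level to the vector lanes.

/* Y = A * X for a CSR matrix A (nrows x ncols, nnz nonzeros) and nvec
   dense vectors stored row-major: X[col * nvec + v], Y[row * nvec + v]. */

void spmm_omp(int nrows, int nvec, const int *rowptr, const int *colidx,
              const double *vals, const double *X, double *Y)
{
    /* CPU flavor: one OpenMP thread per row; inner loops stay sequential. */
    #pragma omp parallel for
    for (int i = 0; i < nrows; ++i)
        for (int v = 0; v < nvec; ++v) {
            double sum = 0.0;
            for (int k = rowptr[i]; k < rowptr[i + 1]; ++k)
                sum += vals[k] * X[colidx[k] * nvec + v];
            Y[i * nvec + v] = sum;
        }
}

void spmm_acc(int nrows, int ncols, int nnz, int nvec,
              const int *rowptr, const int *colidx, const double *vals,
              const double *X, double *Y)
{
    /* GPU flavor: gangs over rows, vector lanes over the nvec dense
       columns, so adjacent lanes touch contiguous memory. In a real
       eigensolver the arrays would stay resident on the device across
       iterations via an enclosing data region. */
    #pragma acc parallel loop gang \
        copyin(rowptr[0:nrows+1], colidx[0:nnz], vals[0:nnz], X[0:ncols*nvec]) \
        copyout(Y[0:nrows*nvec])
    for (int i = 0; i < nrows; ++i) {
        #pragma acc loop vector
        for (int v = 0; v < nvec; ++v) {
            double sum = 0.0;
            for (int k = rowptr[i]; k < rowptr[i + 1]; ++k)
                sum += vals[k] * X[colidx[k] * nvec + v];
            Y[i * nvec + v] = sum;
        }
    }
}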
RESOURCES
Website: GPUHackathons.org
Technical Resources
VISIT SITE
Explore a wealth of resources for GPU-accelerated
computing across HPC, AI and Big Data.
Review a collection of videos, presentations, GitHub repos,
tutorials, libraries and more to help you advance your skills
and expand your knowledge.
LEARN MORE
Are you passionate about science and technology? Want to help
researchers accelerate their scientific codes or refine their AI projects?
Become a GPU Hackathon mentor and connect scientists to available
tools, techniques and GPU architectures. Sharpen your skills, learn about
the latest research efforts, and collaborate across the computing and
scientific community. Certification is available for HPC and AI pathways.
"I strongly believe in the concept of community science. Scientists should not be siloed; they should be continually contributing their codes back and working collaboratively on projects at these GPU Hackathons."
Max Katz
Senior Solutions Architect, NVIDIA
Mentor for GPU Hackathons
BECOME A GPU HACKATHON MENTOR:
ADVANCE SCIENCE WITH TECHNOLOGY
STAY IN THE KNOW:
JOIN THE OPENACC COMMUNITY
JOIN TODAY
The OpenACC specification is designed for, and by, users, meaning that the OpenACC organization relies on our users' active participation to shape the specification and to educate the scientific community on its use.
Take an active role in influencing the future of both
the OpenACC specification and the organization
itself by becoming a member of the community.