September 2020
OPENACC MONTHLY
HIGHLIGHTS
2
WHAT IS OPENACC?
main()
{
<serial code>
#pragma acc kernels
{
<parallel code>
}
}
Add Simple Compiler Directive
POWERFUL & PORTABLE
Directives-based programming model for parallel computing
Designed for performance and portability on CPUs and GPUs
SIMPLE
Open Specification Developed by OpenACC.org Consortium
3
silica IFPEN, RMM-DIIS on P100
OPENACC GROWING MOMENTUM
Wide Adoption Across Key HPC Codes
ANSYS Fluent
Gaussian
VASP
LSDalton
MPAS
GAMERA
GTC
XGC
ACME
FLASH
COSMO
Numeca
200 APPS* USING OpenACC
“For VASP, OpenACC is the way forward for GPU
acceleration. Performance is similar to CUDA, and
OpenACC dramatically decreases GPU
development and maintenance efforts. We’re
excited to collaborate with NVIDIA and PGI as an
early adopter of Unified Memory.”
Prof. Georg Kresse
Computational Materials Physics
University of Vienna
VASP
Top Quantum Chemistry and Material Science Code
* Applications in production and development
4
VIEW SESSIONS NOW
The OpenACC Summit 2020 brought together users of the OpenACC
programming model and members of the OpenACC organization from
national laboratories, research institutions, and industry.
This year’s Summit was completely online and featured a keynote, Birds
of a Feather (BOF) interactive discussion, and invited talks across
multiple disciplines of science.
OPENACC SUMMIT 2020: ON-DEMAND
Speakers included:
• Martijn Marsman, University of Vienna
• Daniel Neuhauser, University of California, Los Angeles
• Peter Willendrup, European Spallation Source, Technical University of Denmark
• Dossay Oryspayev, National Energy Research Scientific Computing Center (NERSC)
• Igor Sfiligoi, San Diego Supercomputer Center
• Niclas Jansson, KTH Royal Institute of Technology
• Min-Gu Yoo, Princeton Plasma Physics Laboratory
• Antonio Ragagnin, Ludwig-Maximilians-Universität München
• Phil Hasnip, University of York
• Andrew Powis, Princeton University
5
DON’T MISS THESE UPCOMING EVENTS
COMPLETE LIST OF EVENTS
Event                                      Call Closes        Event Date
SFU AI For Science GPU Bootcamp (Digital)  October 28, 2020   November 4-5, 2020
NOAA GPU Hackathon (Digital)               October 8, 2020    December 1-9, 2020
SFU HPC OpenACC GPU Bootcamp (Digital)     November 24, 2020  December 1-2, 2020
New in 2020: Many of our events are happening digitally! Get the same high-touch training and
mentorship without the hassle of travel!
6
READ BLOG
OpenACC has provided a high-level option for GPU
programmers for years. Application developers interested
in GPU-accelerated performance without the details,
complications, and overhead of programming in a
language, such as CUDA, have found OpenACC to be an
attractive solution. However, OpenACC's potential as an
efficient option for other types of accelerators, such as
Field Programmable Gate Arrays (FPGAs), is still under
exploration. A research team with collaborators from the
University of Oregon and Oak Ridge National Laboratory
is investigating this exact question.
Read this blog by Jacob Lambert from University of
Oregon to learn more about their development of an
OpenACC-to-FPGA framework.
CAN OPENACC SIMPLIFY FPGA
PROGRAMMING?
7
READ ARTICLE
Like most of the conferences, seminars, and
workshops taking place this year across the
country and around the world, the recent
GPU hackathon hosted by NERSC was a
fully virtual affair. Held July 13-15 in
conjunction with NVIDIA, the Oak Ridge
Leadership Computing Facility, and
OpenACC as part of the GPU Hackathons
series, the event served as an innovative
model for what could be the next generation
of HPC hackathons.
NERSC HOSTS VIRTUAL GPU HACKATHON
8
READ NEWS
One of the most widely used compilers at the OLCF, the
GNU Compiler Collection (GCC), is favored not only for
its high quality but also for its availability across platforms.
Because it’s open-source, it often comes as a default on
computers running the Linux operating system and is
easily installed on Windows and Mac systems. Now, the
OLCF has contracted with Mentor, a Siemens business,
which will contribute to the compiler’s development to better
meet the needs of OLCF users.
OLCF FOSTERS GCC COMPILER
DEVELOPMENT WITH MENTOR CONTRACT
9
VOTE TODAY
Each year the HPCwire Readers’ Choice Awards are
determined by HPCwire readers across the HPC community,
to recognize the most outstanding individuals,
organizations, products, and technologies in the industry.
Many great entries have been submitted – now it’s up to
you to support the best and brightest. Voting closes
October 12th at 11:59 PM PT.
Vote today!
2020 HPCWIRE READERS’ CHOICE
AWARDS: VOTING IS OPEN!
10
RESOURCES
Paper: 8 Steps to 3.7 TFLOP/s on NVIDIA V100
GPU: Roofline Analysis and Other Tricks
Charlene Yang
Performance optimization can be a daunting task especially as the hardware
architecture becomes more and more complex. This paper takes a kernel from
the Materials Science code BerkeleyGW and demonstrates a few performance
analysis and optimization techniques. Despite challenges such as high register
usage, low occupancy, complex data access patterns, and the existence of
several long-latency instructions, we have achieved 3.7 TFLOP/s of double-
precision performance on an NVIDIA V100 GPU, with 8 optimization steps. This
is 55% of the theoretical peak, 6.7 TFLOP/s, at nominal frequency 1312 MHz,
and 70% of the more customized peak based on our 58% FMA ratio, 5.3
TFLOP/s. An array of techniques used to analyze this OpenACC kernel and
optimize its performance are shown, including the use of hierarchical Roofline
performance model and the performance tool Nsight Compute. This kernel
exhibits computational characteristics that are commonly seen in many high-
performance computing (HPC) applications and are expected to be very helpful
to a general audience of HPC developers and computational scientists, as they
pursue more performance on NVIDIA GPUs.
READ PAPER
11
RESOURCES
Paper: Accelerating Spatial Cross-Matching on
CPU-GPU Hybrid Platform With CUDA and
OpenACC
Furqan Baig, Chao Gao, Dejun Teng, Jun Kong and
Fusheng Wang
In this paper, we present a CPU-GPU hybrid platform to accelerate the cross-
matching operation of geospatial datasets. We propose a pipeline of geospatial
subtasks that are dynamically scheduled to be executed on either CPU or
GPU. To accommodate geospatial datasets processing on GPU using
pixelization approach, we convert the floating point-valued vertices into integer-
valued vertices with an adaptive scaling factor as a function of the area of
minimum bounding box. We present a comparative analysis of GPU enabled
cross-matching algorithm implementation in CUDA and OpenACC accelerated
C++. We test our implementations over Natural Earth Data and our results
indicate that although CUDA based implementations provide better
performance, OpenACC accelerated implementations are more portable and
extendable while still providing considerable performance gain as compared to
CPU. We also investigate the effects of input data size on the IO / computation
ratio and note that a larger dataset compensates for IO overheads associated
with GPU computations. Finally, we demonstrate that an efficient cross-
matching comparison can be achieved with a cost-effective GPU.
READ PAPER
Fig. 2. Approximation of geospatial object with a scaling factor k to
convert vector based expression to pixel based expression.
12
RESOURCES
Books, eBooks and online courses: InformIT
VISIT SITE
InformIT, a part of Pearson, is your one-stop resource for
Addison-Wesley DRM-free eBooks and video courses for
learning tech skills including game development,
programming, and data engineering.
Through the end of 2020, InformIT is offering the community
35% off books or eBooks and 50% off video courses with
coupon code: NVIDIA.
13
RESOURCES
Website: GPUHackathons.org
Technical Resources
VISIT SITE
Explore a wealth of resources for GPU-accelerated
computing across HPC, AI and Big Data.
Review a collection of videos, presentations, GitHub repos,
tutorials, libraries and more to help you advance your skills
and expand your knowledge.
14
STAY IN THE KNOW:
JOIN THE OPENACC COMMUNITY
JOIN TODAY
The OpenACC specification is designed for, and
by, users, meaning that the OpenACC organization
relies on our users’ active participation to shape
the specification and to educate the scientific
community on its use.
Take an active role in influencing the future of both
the OpenACC specification and the organization
itself by becoming a member of the community.
Learn more at
WWW.OPENACC.ORG
