2. 2
OPENACC DIRECTIVES ARE DESIGNED FOR
1. Scientists, to help them do more science and less programming
2. Performance portability on CPUs, GPUs and other platforms
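As an illustration of the directive approach described on this slide, here is a minimal sketch in C. The function name `saxpy` and the array sizes are chosen for this example and do not come from the slides. The single `#pragma acc` line asks an OpenACC compiler to parallelize the loop; a compiler without OpenACC support ignores the pragma and runs the same loop serially, which is the sense in which one source can serve several platforms.

```c
#include <stddef.h>

/* Illustrative example (not from the slides): y = a*x + y.
 * The directive requests parallel execution of the loop and states
 * which arrays are read (copyin) and which are read and written (copy).
 * Without OpenACC support the pragma is ignored and the loop runs serially. */
void saxpy(int n, float a, const float *x, float *y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Calling `saxpy(n, 2.0f, x, y)` from ordinary host code needs no further changes; the choice of target (multicore CPU or GPU) is made with compiler options rather than in the source.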
3. 3
SINGLE CODE FOR MULTIPLE PLATFORMS
OpenPOWER
Sunway
x86 CPU
x86 Xeon Phi
NVIDIA GPU
AMD
PEZY-SC
OpenACC - Performance Portable Programming Model for HPC
Figure: speedup vs. a single Haswell core (axis 0x to 80x), comparing PGI OpenACC, Intel OpenMP and IBM OpenMP on Dual Haswell, Dual Broadwell, Dual POWER8, 1 Tesla P100 and 1 Tesla V100. Bar labels: 9x, 10x, 11x, 52x; 9x, 10x, 11x, 77x.
AWE Hydrodynamics CloverLeaf mini-App, bm32 data set
Systems: Haswell: 2x16 core Haswell server, four K80s, CentOS 7.2 (perf-hsw10); Broadwell: 2x20 core Broadwell server, eight P100s (dgx1-prd-01); Minsky: POWER8+NVLINK, four P100s, RHEL 7.3 (gsn1).
Compilers: Intel 17.0, IBM XL 13.1.3, PGI 16.10; KNL compiler version: 17.0.1 20161005.
Benchmark: CloverLeaf v1.3 downloaded from http://uk-mac.github.io/CloverLeaf the week of November 7, 2016; CloverLeaf_Serial; CloverLeaf_ref (MPI+OpenMP); CloverLeaf_OpenACC (MPI+OpenACC).
Data compiled by PGI November 2016; Volta data collected June 2017.
4. 4
OPENACC GROWING MOMENTUM
Wide Adoption Across Key HPC Codes and Domains
3 of Top 5 HPC Applications
ANSYS Fluent, Gaussian, VASP
Figure: VASP, silica IFPEN, RMM-DIIS on P100. Speed-up (axis 0 to 5) of vasp_std (5.4.4), vasp_gpu (5.4.4) and the OpenACC port (latest) on a 40 core Broadwell, 1 P100, 2 P100 and 4 P100.
* The OpenACC port covers more VASP routines than the CUDA port. It was planned top down, with a complete analysis of the call tree, and it leverages improvements in the latest VASP Fortran source base.
CAAR Codes
GTC
XGC
ACME
FLASH
LSDalton
5. 5
2018 HACKATHONS ANNOUNCED
Worldwide 5-day events, 2 mentors/team, >85 codes accelerated to date
1st 2018 Hackathon: TU Dresden Hackathon
Call for Applications: January 14, 2018
Event Dates: March 5-9, 2018
6. 6
PGI COMMUNITY EDITION 17.10 RELEASED
Tesla V100 GPU Support
OpenACC for CUDA Unified Memory
Automatic Deep Copy of Fortran Derived Types in OpenACC
Use C++14 Lambdas with Capture in OpenACC Regions
New Profiling Features for OpenACC and CUDA Unified Memory
New cuSOLVER Library Interoperability
PGI Unified Binary for Tesla and Multicore
and much more…
No-cost license to a recent release of the PGI Fortran, C and C++ compilers and tools for multicore CPUs and NVIDIA Tesla GPUs, including all OpenACC, OpenMP and CUDA Fortran features.
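To make the "OpenACC for CUDA Unified Memory" item on this slide more concrete, here is a small sketch in C. The function names `make_ramp` and `scale` are invented for this example. It assumes the program is built with managed memory enabled (with the PGI compilers this is, to our understanding, the `-ta=tesla:managed` option), so that heap allocations are migrated between host and device by the runtime and the loop needs no explicit data clauses. Without that assumption, the array extents would have to be given in `copy` or `copyin` clauses.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate n doubles on the heap and fill them with 0, 1, 2, ... */
double *make_ramp(int n)
{
    double *v = malloc((size_t)n * sizeof *v);
    if (v == NULL) {
        return NULL;
    }
    for (int i = 0; i < n; ++i) {
        v[i] = (double)i;
    }
    return v;
}

/* Multiply every element by s.
 * ASSUMPTION: built with managed (unified) memory, so no data clauses are
 * written here; the runtime is expected to move v on demand.
 * A compiler without OpenACC support ignores the pragma. */
void scale(double *v, int n, double s)
{
    #pragma acc parallel loop
    for (int i = 0; i < n; ++i) {
        v[i] *= s;
    }
}
```

The point of the sketch is what is absent: the loop carries no description of the data it touches, which is the productivity gain the unified memory feature is aimed at.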
7. 7
BROAD ADOPTION IN WEATHER & CLIMATE
OpenACC is Widely Deployed in the Weather and Climate Domain
“Our team has been investigating OpenACC as a pathway to performance portability for the MPAS global atmospheric model. Using this approach for the MPAS dynamical core, we have achieved speedups ranging between 2.5 and 2.7, relative to a dual-socket, 18-core Intel Xeon v4 node.”
Dr. Rich Loft, Director of Technology Development at the NCAR Computational and Information Systems Laboratory
Key Models Worldwide Using OpenACC
MPAS
ACME
COSMO
IFS (ESCAPE)
NICAM
ICON
8. 8
GORDON BELL FINALISTS TAP INTO OPENACC
“OpenACC allowed us to scale the CAM-SE to over 1.8 million Sunway cores with a simulation speed of over 3 simulated years per day. Without OpenACC, the team would have spent years coding for the complexity of the components in the TaihuLight Supercomputer.”
Prof. Haohuan Fu, Deputy Director of the National Supercomputing Center in Wuxi
Figure: Petascale-level CAM-SE Performance on TaihuLight
9. 9
OPENACC ONLINE COURSE RECORDINGS
Free lectures and hands-on labs
John Urbanic, Parallel Computing Specialist, Pittsburgh Supercomputing Center
Jeff Larkin, Developer Technology Software Engineer, NVIDIA
10. 10
RESOURCES
Paper: Parallware Trainer: Interactive Tool for Experiential Learning of Parallel Programming using OpenMP and OpenACC
“This paper presents Parallware Trainer, a new interactive tool for high-productivity STEM education and training in parallel programming using OpenMP 4.5 and OpenACC 2.5. It enables experiential learning by providing an interactive, real-time GUI with editor capabilities to assist in the design, implementation and benchmarking of OpenMP/OpenACC-enabled parallel code.”
Recorded SC17 Talks
1. Application Readiness Projects for the Summit Architecture
2. Accelerating HPC Programmer Productivity with OpenACC and CUDA Unified Memory
3. An Agile Approach to Building a GPU-enabled and Performance-portable Global Cloud-resolving Atmospheric Model
11. 11
OPENACC.ORG ANNOUNCEMENTS
New Members
New Officers
Robert Henschel, Treasurer: Director, Science Community Tools, Indiana University
Fernanda Foertter, Release Manager: HPC Data Scientist, Oak Ridge National Laboratory
New Specification
OpenACC 2.6 Specification Approved
Includes Deep Copy, one of the most user-requested features.
Benefits apps with complex data types.
Simplifies coding.
Complete OpenACC 2.6 Specification
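The deep copy item in the OpenACC 2.6 announcement on this slide concerns data structures that contain pointers to further data. A hedged sketch of the manual deep-copy pattern in C follows, as we understand it. The type `vec_t` and the function `sum_squares` are hypothetical names made up for this example. The structure is copied to the device first, then the array it points to; an OpenACC 2.6 implementation is expected to attach the device copy of the array to the device copy of the structure.

```c
#include <stddef.h>

/* Hypothetical "complex data type": a struct holding a pointer to its data. */
typedef struct {
    int n;
    double *data;
} vec_t;

/* Sum of squares of v.data[0..n-1].
 * Sketch of manual deep copy: copy the struct, then the pointed-to array,
 * use both on the device, then release them in reverse order.
 * A compiler without OpenACC support ignores all of the pragmas. */
double sum_squares(vec_t v)
{
    double sum = 0.0;

    #pragma acc enter data copyin(v)
    #pragma acc enter data copyin(v.data[0:v.n])

    #pragma acc parallel loop reduction(+:sum) present(v)
    for (int i = 0; i < v.n; ++i) {
        sum += v.data[i] * v.data[i];
    }

    #pragma acc exit data delete(v.data[0:v.n])
    #pragma acc exit data delete(v)

    return sum;
}
```

The ordering matters in this pattern: the parent structure goes to the device before its members and is removed after them, so that the pointer inside the device copy always refers to valid device memory.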
12. 12
OPENACC IS GROWING
New Member
>50K downloads of PGI Community Edition since the 1st release
2K students of OpenACC Online Course in October 2017
2X Slack Channel growth since June 2017
Growing User Community