SC13: OpenMP and NVIDIA

This talk was given at the OpenMP booth at Supercomputing 2013.


  1. OpenMP and NVIDIA. Jeff Larkin, NVIDIA Developer Technologies
  2. OpenMP and NVIDIA: OpenMP is the dominant standard for directive-based parallel programming. NVIDIA joined OpenMP in 2011 to contribute to discussions around parallel accelerators. NVIDIA proposed the TEAMS construct for accelerators in 2012. OpenMP 4.0, with accelerator support, was released in 2013.
  3. Why does NVIDIA care about OpenMP?
  4. 3 Ways to Accelerate Applications: Libraries (“drop-in” acceleration), Compiler Directives (easily accelerate applications), Programming Languages (maximum flexibility).
  5. Reaching a Broader Set of Developers. [Chart: number of developers over time, from 100,000’s of early adopters in 2004 (research, universities, supercomputing centers, oil & gas) to 1,000,000’s at present (CAE, CFD, finance, rendering, data analytics, life sciences, defense, weather, climate, plasma physics).]
  6. Introduction to OpenMP Teams/Distribute
  7. OpenMP PARALLEL FOR: Executes the iterations of the next for loop in parallel across a team of threads.
        #pragma omp parallel for
        for (i=0; i<N; i++) p[i] = v1[i] * v2[i];
  8. OpenMP TARGET PARALLEL FOR: Offloads data and execution to a target device, then executes the iterations of the next for loop in parallel across a team of threads.
        #pragma omp target
        #pragma omp parallel for
        for (i=0; i<N; i++) p[i] = v1[i] * v2[i];
  9. OpenMP 4.0 TEAMS/DISTRIBUTE: Creates a league of teams on the target device, distributes blocks of work among those teams, and executes the remaining work in parallel within each team. This code is portable whether 1 team or many teams are used.
        #pragma omp target teams
        #pragma omp distribute parallel for reduction(+:sum)
        for (i=0; i<N; i++) sum += B[i] * C[i];
  10. OpenMP 4.0 Teams Distribute Parallel For
        #pragma omp target
        #pragma omp parallel for reduction(+:sum)
        for (i=0; i<N; i++) sum += B[i] * C[i];
      (On the slide, the loop bound N is overlaid with "??".)
  11. OpenMP 4.0 Teams Distribute Parallel For
        #pragma omp target teams
        #pragma omp distribute parallel for

        #pragma omp parallel for reduction(+:sum)
        for (i=0; i< ??; i++) sum += B[i] * C[i];

        #pragma omp parallel for reduction(+:sum)
        for (i=??; i< ??; i++) sum += B[i] * C[i];

        #pragma omp parallel for reduction(+:sum)
        for (i=??; i< ??; i++) sum += B[i] * C[i];

        #pragma omp parallel for reduction(+:sum)
        for (i=??; i< N; i++) sum += B[i] * C[i];
      The programmer doesn’t have to think about how the loop is decomposed!
  12. OMP + NV: We’re not done yet!
  13. OMP + NV: We’re not done yet! Hardware parallelism is not going away; programmers demand a simple, portable way to use it. OpenMP 4.0 is just the first step toward a portable standard for directive-based acceleration. We will continue to work with OpenMP to address the challenges of parallel computing: improved interoperability, improved portability, improved expressibility.
  14. Thank You
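
The progression on slides 7 through 11 can be sketched as a small set of C functions computing the same dot product. This is a minimal sketch under stated assumptions, not code from the talk: the function names (`dot_parallel_for`, `dot_target_parallel_for`, `dot_teams_distribute`, `dot_manual_chunks`) and the `num_chunks` parameter are hypothetical, and the explicit `map` clauses do not appear on the slides. They are added here because OpenMP versions from 4.5 onward treat an unmapped scalar in a target region as firstprivate by default, so `sum` would otherwise not be copied back, and because arrays passed as pointers need an explicit length to be mapped. The last function is a hand-written stand-in for the "??" bounds on slide 11; the chunking a real runtime picks may differ.

```c
#include <assert.h>

/* Slide 7: one team of host threads shares the loop iterations. */
double dot_parallel_for(const double *b, const double *c, int n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += b[i] * c[i];
    return sum;
}

/* Slide 8: offload to the target device, then run the loop with a
 * single team of threads there. The map clauses are an addition. */
double dot_target_parallel_for(const double *b, const double *c, int n)
{
    double sum = 0.0;
    #pragma omp target map(to: b[0:n], c[0:n]) map(tofrom: sum)
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += b[i] * c[i];
    return sum;
}

/* Slide 9: create a league of teams, distribute blocks of iterations
 * among them, and run each block in parallel within its team. The
 * combined construct applies the reduction across teams and threads. */
double dot_teams_distribute(const double *b, const double *c, int n)
{
    double sum = 0.0;
    #pragma omp target teams distribute parallel for reduction(+:sum) \
            map(to: b[0:n], c[0:n]) map(tofrom: sum)
    for (int i = 0; i < n; i++)
        sum += b[i] * c[i];
    return sum;
}

/* Slide 11, written out by hand on the host: split [0, n) into
 * num_chunks contiguous blocks and reduce each block in parallel.
 * This is the bookkeeping that teams/distribute hides. */
double dot_manual_chunks(const double *b, const double *c, int n,
                         int num_chunks)
{
    assert(num_chunks > 0);
    double sum = 0.0;
    int chunk = (n + num_chunks - 1) / num_chunks;  /* ceiling division */
    for (int t = 0; t < num_chunks; t++) {
        int lo = t * chunk;
        int hi = (lo + chunk < n) ? lo + chunk : n;
        #pragma omp parallel for reduction(+:sum)
        for (int i = lo; i < hi; i++)
            sum += b[i] * c[i];
    }
    return sum;
}
```

All four functions should return the same value for the same inputs. Built without OpenMP support the pragmas are ignored and the loops run serially; built with OpenMP but without an offload device, the target regions are expected to fall back to the host.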
