SlideShare a Scribd company logo
1 of 17
Download to read offline
Jeff Larkin, NVIDIA
OPENACC FEATURES FUTURE DIRECTIONS
2
AGENDA
• Why Compiler Directives?
• OpenACC and OpenMP Status and Roadmap
3
Why compiler directives?
4
WHAT ARE COMPILER DIRECTIVES?
program myscience
... serial code ...
do k = 1,n1
do i = 1,n2
...
enddo
enddo
...
end program myscience
CPU GPU
Your original
Fortran, C, or C++
code
Insert portable compiler
directives
Compiler parallelizes code and
manages data movement
Programmer optimizes
incrementally
Designed for multi-core CPUs,
GPUs & many-core Accelerators
program myscience
... serial code ...
!$acc parallel loop
do k = 1,n1
do i = 1,n2
...
enddo
enddo
...
end program myscience
5
WHY USE COMPILER DIRECTIVES?
Single Source Code
No need to maintain multiple code paths
High Level
Abstract away device details, focus on expressing the parallelism and data
locality
Low Learning Curve
Programmer remains in same language and adds directives to existing code
Rapid Development
Fewer code changes means faster development
6
1,000,000’s
Early Adopters
Research
Universities
Supercomputing Centers
Oil & Gas
WHY SUPPORT COMPILER DIRECTIVES?
100,000’s
2004 Present
Increased Usability, Broader Adoption
CAE
CFD
Finance
Rendering
Data Analytics
Life Sciences
Defense
Weather
Climate
Plasma Physics
7
3 WAYS TO ACCELERATE APPLICATIONS
Applications
Libraries
“Drop-in”
Acceleration
Programming
Languages
Compiler
Directives
Maximum
Flexibility
Easily Accelerate
Applications
8
OpenACC & OpenMP Status
9
COMPILER DIRECTIVES TODAY
OpenACC
Focused solely on compiler
directives for accelerators
Quickly moving
Performance Portability a
primary consideration
Descriptive approach to parallel
programming
OpenMP
Addresses a broad range of
parallel programming challenges
More measured approach
Less focus on performance
portability
Prescriptive approach to parallel
programming
Choice of two great options
10
OPENACC IMPLEMENTATIONS
OpenACC 2.0
launched
December 2013
OpenACC 2.0
launched
January 2014
OpenACC 2.0
Targeted for
Late 2014
Tremendous interest in academia:
• accULL – U. La Laguna/EPCC
• Omni – U. of Tsukuba
• OpenARC – Oak Ridge NL
• OpenUH – U. of Houston
• RoseACC – U of DE & LLNL
Compilers
Debuggers and Profilers
OpenACC 2.0
Targeted for
2015
11 11
OPENACC 2.NEXT
Deep-Copy
Support for more complex data structures
Improves C++ class support
Tools API
Standard API for profiling tools
Default to present_or behavior for data clauses
Tell us what else you need feedback@openacc.org
12
WHAT IS DEEP COPY?
On Host
Shallow Copy
Deep Copy
Deep copy is required for structures with pointer members to work properly, but also useful
where only part of a structure is needed on the device.
• Users can do this today manually, but we want it to be automatic
13
SOLVING DEEP COPY
Inform the compiler of the
shape of the data behind the
pointer.
struct T1 {
int N;
float* A;
#pragma acc policy(“shape") 
shape( A[0:N] )
};
Develop policies for how the
data should be relocated.
struct T1 {
int N;
float* A;
#pragma acc policy(“shape") 
shape( A[0:N] )
#pragma acc policy("boundary") 
update(A[0:1],A[N-1:1)
};
Self-describing Structures Data Policies
*Syntax and functionality subject to change
14
OPENACC & UNIFIED MEMORY
With Unified Memory (or equivalent) shallow copying no longer
results in runtime error, so does deep-copy become irrelevant?
Dereferencing the copied pointer no-longer results in error.
Hardware support for Unified Memory would allow for data to migrate on
access.
Deep Copy support in OpenACC is still beneficial because
Data can be moved before it’s needed without penalty
Code will be portable to devices without Unified Memory support
17 17
THE “MERGE”
Both specifications bring value to the market
OpenACC – more focused, more agile, descriptive approach
OpenMP – more cautious, more mature overall, prescriptive approach
The specifications are constantly converging, even if they never
eventually merge.
Use the best tool available to you today, the restructuring and
knowledge from each are portable.
In other words: the syntax isn’t the hard part.
Does having 2 specifications cause confusion?
19
SHOULD I JUST WAIT FOR OPENMP4
SUPPORT?
No!
THANK YOU

More Related Content

What's hot

OpenACC Monthly Highlights March 2019
OpenACC Monthly Highlights March 2019OpenACC Monthly Highlights March 2019
OpenACC Monthly Highlights March 2019OpenACC
 
OpenACC Monthly Highlights: February 2022
OpenACC Monthly Highlights: February 2022OpenACC Monthly Highlights: February 2022
OpenACC Monthly Highlights: February 2022OpenACC
 
OpenACC Monthly Highlights: January 2021
OpenACC Monthly Highlights: January 2021OpenACC Monthly Highlights: January 2021
OpenACC Monthly Highlights: January 2021OpenACC
 
OpenACC Monthly Highlights: June 2020
OpenACC Monthly Highlights: June 2020OpenACC Monthly Highlights: June 2020
OpenACC Monthly Highlights: June 2020OpenACC
 
OpenACC Monthly Highlights: November 2020
OpenACC Monthly Highlights: November 2020OpenACC Monthly Highlights: November 2020
OpenACC Monthly Highlights: November 2020OpenACC
 
OpenACC Monthly Highlights: March 2021
OpenACC Monthly Highlights: March 2021OpenACC Monthly Highlights: March 2021
OpenACC Monthly Highlights: March 2021OpenACC
 
OpenACC Monthly Highlights: May 2019
OpenACC Monthly Highlights: May 2019OpenACC Monthly Highlights: May 2019
OpenACC Monthly Highlights: May 2019OpenACC
 
OpenACC Monthly Highlights: February 2021
OpenACC Monthly Highlights: February 2021OpenACC Monthly Highlights: February 2021
OpenACC Monthly Highlights: February 2021OpenACC
 
OpenACC Monthly Highlights: August 2021
OpenACC Monthly Highlights: August 2021OpenACC Monthly Highlights: August 2021
OpenACC Monthly Highlights: August 2021OpenACC
 
OpenACC Monthly Highlights: August 2020
OpenACC Monthly Highlights: August 2020OpenACC Monthly Highlights: August 2020
OpenACC Monthly Highlights: August 2020OpenACC
 
OpenACC Monthly Highlights September 2019
OpenACC Monthly Highlights September 2019OpenACC Monthly Highlights September 2019
OpenACC Monthly Highlights September 2019OpenACC
 
OpenACC Monthly Highlights Summer 2019
OpenACC Monthly Highlights Summer 2019OpenACC Monthly Highlights Summer 2019
OpenACC Monthly Highlights Summer 2019OpenACC
 
OpenACC Monthly Highlights April 2018
OpenACC Monthly Highlights April 2018OpenACC Monthly Highlights April 2018
OpenACC Monthly Highlights April 2018NVIDIA
 
OpenACC Monthly Highlights: May 2020
OpenACC Monthly Highlights: May 2020OpenACC Monthly Highlights: May 2020
OpenACC Monthly Highlights: May 2020OpenACC
 
OpenACC Monthly Highlights June 2017
OpenACC Monthly Highlights June 2017OpenACC Monthly Highlights June 2017
OpenACC Monthly Highlights June 2017NVIDIA
 
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)inside-BigData.com
 
GTC 2017: Powering the AI Revolution
GTC 2017: Powering the AI RevolutionGTC 2017: Powering the AI Revolution
GTC 2017: Powering the AI RevolutionNVIDIA
 
OpenACC Monthly Highlights - February 2018
OpenACC Monthly Highlights - February 2018OpenACC Monthly Highlights - February 2018
OpenACC Monthly Highlights - February 2018NVIDIA
 
Nvidia SC16: The Greatest Challenges Can't Wait
Nvidia SC16: The Greatest Challenges Can't WaitNvidia SC16: The Greatest Challenges Can't Wait
Nvidia SC16: The Greatest Challenges Can't Waitinside-BigData.com
 
HPC Top 5 Stories: April 26, 2018
HPC Top 5 Stories: April 26, 2018HPC Top 5 Stories: April 26, 2018
HPC Top 5 Stories: April 26, 2018NVIDIA
 

What's hot (20)

OpenACC Monthly Highlights March 2019
OpenACC Monthly Highlights March 2019OpenACC Monthly Highlights March 2019
OpenACC Monthly Highlights March 2019
 
OpenACC Monthly Highlights: February 2022
OpenACC Monthly Highlights: February 2022OpenACC Monthly Highlights: February 2022
OpenACC Monthly Highlights: February 2022
 
OpenACC Monthly Highlights: January 2021
OpenACC Monthly Highlights: January 2021OpenACC Monthly Highlights: January 2021
OpenACC Monthly Highlights: January 2021
 
OpenACC Monthly Highlights: June 2020
OpenACC Monthly Highlights: June 2020OpenACC Monthly Highlights: June 2020
OpenACC Monthly Highlights: June 2020
 
OpenACC Monthly Highlights: November 2020
OpenACC Monthly Highlights: November 2020OpenACC Monthly Highlights: November 2020
OpenACC Monthly Highlights: November 2020
 
OpenACC Monthly Highlights: March 2021
OpenACC Monthly Highlights: March 2021OpenACC Monthly Highlights: March 2021
OpenACC Monthly Highlights: March 2021
 
OpenACC Monthly Highlights: May 2019
OpenACC Monthly Highlights: May 2019OpenACC Monthly Highlights: May 2019
OpenACC Monthly Highlights: May 2019
 
OpenACC Monthly Highlights: February 2021
OpenACC Monthly Highlights: February 2021OpenACC Monthly Highlights: February 2021
OpenACC Monthly Highlights: February 2021
 
OpenACC Monthly Highlights: August 2021
OpenACC Monthly Highlights: August 2021OpenACC Monthly Highlights: August 2021
OpenACC Monthly Highlights: August 2021
 
OpenACC Monthly Highlights: August 2020
OpenACC Monthly Highlights: August 2020OpenACC Monthly Highlights: August 2020
OpenACC Monthly Highlights: August 2020
 
OpenACC Monthly Highlights September 2019
OpenACC Monthly Highlights September 2019OpenACC Monthly Highlights September 2019
OpenACC Monthly Highlights September 2019
 
OpenACC Monthly Highlights Summer 2019
OpenACC Monthly Highlights Summer 2019OpenACC Monthly Highlights Summer 2019
OpenACC Monthly Highlights Summer 2019
 
OpenACC Monthly Highlights April 2018
OpenACC Monthly Highlights April 2018OpenACC Monthly Highlights April 2018
OpenACC Monthly Highlights April 2018
 
OpenACC Monthly Highlights: May 2020
OpenACC Monthly Highlights: May 2020OpenACC Monthly Highlights: May 2020
OpenACC Monthly Highlights: May 2020
 
OpenACC Monthly Highlights June 2017
OpenACC Monthly Highlights June 2017OpenACC Monthly Highlights June 2017
OpenACC Monthly Highlights June 2017
 
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
 
GTC 2017: Powering the AI Revolution
GTC 2017: Powering the AI RevolutionGTC 2017: Powering the AI Revolution
GTC 2017: Powering the AI Revolution
 
OpenACC Monthly Highlights - February 2018
OpenACC Monthly Highlights - February 2018OpenACC Monthly Highlights - February 2018
OpenACC Monthly Highlights - February 2018
 
Nvidia SC16: The Greatest Challenges Can't Wait
Nvidia SC16: The Greatest Challenges Can't WaitNvidia SC16: The Greatest Challenges Can't Wait
Nvidia SC16: The Greatest Challenges Can't Wait
 
HPC Top 5 Stories: April 26, 2018
HPC Top 5 Stories: April 26, 2018HPC Top 5 Stories: April 26, 2018
HPC Top 5 Stories: April 26, 2018
 

Viewers also liked

UberCloud: From Experiment to Marketplace
UberCloud: From Experiment to MarketplaceUberCloud: From Experiment to Marketplace
UberCloud: From Experiment to Marketplaceinside-BigData.com
 
Assisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated ArchitectureAssisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated Architectureinside-BigData.com
 
EMC in HPC – The Journey so far and the Road Ahead
EMC in HPC – The Journey so far and the Road AheadEMC in HPC – The Journey so far and the Road Ahead
EMC in HPC – The Journey so far and the Road Aheadinside-BigData.com
 
Nimbix: Cloud for the Missing Middle
Nimbix: Cloud for the Missing MiddleNimbix: Cloud for the Missing Middle
Nimbix: Cloud for the Missing Middleinside-BigData.com
 
Unum Computing: An Energy Efficient and Massively Parallel Approach to Valid ...
Unum Computing: An Energy Efficient and Massively Parallel Approach to Valid ...Unum Computing: An Energy Efficient and Massively Parallel Approach to Valid ...
Unum Computing: An Energy Efficient and Massively Parallel Approach to Valid ...inside-BigData.com
 
Introduction of Fujitsu's HPC Processor for the Post-K Computer
Introduction of Fujitsu's HPC Processor for the Post-K ComputerIntroduction of Fujitsu's HPC Processor for the Post-K Computer
Introduction of Fujitsu's HPC Processor for the Post-K Computerinside-BigData.com
 
The Quantum Effect: HPC without FLOPS
The Quantum Effect: HPC without FLOPSThe Quantum Effect: HPC without FLOPS
The Quantum Effect: HPC without FLOPSinside-BigData.com
 
Ncar globally accessible user environment
Ncar globally accessible user environmentNcar globally accessible user environment
Ncar globally accessible user environmentinside-BigData.com
 
SGI: Meeting Manufacturing's Need for Production Supercomputing
SGI: Meeting Manufacturing's Need for Production SupercomputingSGI: Meeting Manufacturing's Need for Production Supercomputing
SGI: Meeting Manufacturing's Need for Production Supercomputinginside-BigData.com
 
Performing Simulation-Based, Real-time Decision Making with Cloud HPC
Performing Simulation-Based, Real-time Decision Making with Cloud HPCPerforming Simulation-Based, Real-time Decision Making with Cloud HPC
Performing Simulation-Based, Real-time Decision Making with Cloud HPCinside-BigData.com
 
Epiphany-V: A 1024 processor 64- bit RISC System-On-Chip
Epiphany-V: A 1024 processor 64- bit RISC System-On-ChipEpiphany-V: A 1024 processor 64- bit RISC System-On-Chip
Epiphany-V: A 1024 processor 64- bit RISC System-On-Chipinside-BigData.com
 
Co-Design Architecture for Exascale
Co-Design Architecture for ExascaleCo-Design Architecture for Exascale
Co-Design Architecture for Exascaleinside-BigData.com
 
Mellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand SolutionsMellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand Solutionsinside-BigData.com
 

Viewers also liked (16)

UberCloud: From Experiment to Marketplace
UberCloud: From Experiment to MarketplaceUberCloud: From Experiment to Marketplace
UberCloud: From Experiment to Marketplace
 
Assisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated ArchitectureAssisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated Architecture
 
EMC in HPC – The Journey so far and the Road Ahead
EMC in HPC – The Journey so far and the Road AheadEMC in HPC – The Journey so far and the Road Ahead
EMC in HPC – The Journey so far and the Road Ahead
 
Nimbix: Cloud for the Missing Middle
Nimbix: Cloud for the Missing MiddleNimbix: Cloud for the Missing Middle
Nimbix: Cloud for the Missing Middle
 
Unum Computing: An Energy Efficient and Massively Parallel Approach to Valid ...
Unum Computing: An Energy Efficient and Massively Parallel Approach to Valid ...Unum Computing: An Energy Efficient and Massively Parallel Approach to Valid ...
Unum Computing: An Energy Efficient and Massively Parallel Approach to Valid ...
 
Machine learning
Machine learningMachine learning
Machine learning
 
IDC HPC Market Update
IDC HPC Market UpdateIDC HPC Market Update
IDC HPC Market Update
 
Introduction of Fujitsu's HPC Processor for the Post-K Computer
Introduction of Fujitsu's HPC Processor for the Post-K ComputerIntroduction of Fujitsu's HPC Processor for the Post-K Computer
Introduction of Fujitsu's HPC Processor for the Post-K Computer
 
The Quantum Effect: HPC without FLOPS
The Quantum Effect: HPC without FLOPSThe Quantum Effect: HPC without FLOPS
The Quantum Effect: HPC without FLOPS
 
Ncar globally accessible user environment
Ncar globally accessible user environmentNcar globally accessible user environment
Ncar globally accessible user environment
 
SGI: Meeting Manufacturing's Need for Production Supercomputing
SGI: Meeting Manufacturing's Need for Production SupercomputingSGI: Meeting Manufacturing's Need for Production Supercomputing
SGI: Meeting Manufacturing's Need for Production Supercomputing
 
Performing Simulation-Based, Real-time Decision Making with Cloud HPC
Performing Simulation-Based, Real-time Decision Making with Cloud HPCPerforming Simulation-Based, Real-time Decision Making with Cloud HPC
Performing Simulation-Based, Real-time Decision Making with Cloud HPC
 
Epiphany-V: A 1024 processor 64- bit RISC System-On-Chip
Epiphany-V: A 1024 processor 64- bit RISC System-On-ChipEpiphany-V: A 1024 processor 64- bit RISC System-On-Chip
Epiphany-V: A 1024 processor 64- bit RISC System-On-Chip
 
Japan's post K Computer
Japan's post K ComputerJapan's post K Computer
Japan's post K Computer
 
Co-Design Architecture for Exascale
Co-Design Architecture for ExascaleCo-Design Architecture for Exascale
Co-Design Architecture for Exascale
 
Mellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand SolutionsMellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand Solutions
 

Similar to The Past, Present, and Future of OpenACC

Cockatrice: A Hardware Design Environment with Elixir
Cockatrice: A Hardware Design Environment with ElixirCockatrice: A Hardware Design Environment with Elixir
Cockatrice: A Hardware Design Environment with ElixirHideki Takase
 
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”Databricks
 
Apache Spark Performance Observations
Apache Spark Performance ObservationsApache Spark Performance Observations
Apache Spark Performance ObservationsAdam Roberts
 
Pragmatic Optimization in Modern Programming - Ordering Optimization Approaches
Pragmatic Optimization in Modern Programming - Ordering Optimization ApproachesPragmatic Optimization in Modern Programming - Ordering Optimization Approaches
Pragmatic Optimization in Modern Programming - Ordering Optimization ApproachesMarina Kolpakova
 
GTC16 - S6410 - Comparing OpenACC 2.5 and OpenMP 4.5
GTC16 - S6410 - Comparing OpenACC 2.5 and OpenMP 4.5GTC16 - S6410 - Comparing OpenACC 2.5 and OpenMP 4.5
GTC16 - S6410 - Comparing OpenACC 2.5 and OpenMP 4.5Jeff Larkin
 
LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2Linaro
 
Apache Spark: What's under the hood
Apache Spark: What's under the hoodApache Spark: What's under the hood
Apache Spark: What's under the hoodAdarsh Pannu
 
IBM Runtimes Performance Observations with Apache Spark
IBM Runtimes Performance Observations with Apache SparkIBM Runtimes Performance Observations with Apache Spark
IBM Runtimes Performance Observations with Apache SparkAdamRobertsIBM
 
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARKSCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARKzmhassan
 
Introduction to Programmable Networks by Clarence Anslem, Intel
Introduction to Programmable Networks by Clarence Anslem, IntelIntroduction to Programmable Networks by Clarence Anslem, Intel
Introduction to Programmable Networks by Clarence Anslem, IntelMyNOG
 
Learn more about the tremendous value Open Data Plane brings to NFV
Learn more about the tremendous value Open Data Plane brings to NFVLearn more about the tremendous value Open Data Plane brings to NFV
Learn more about the tremendous value Open Data Plane brings to NFVGhodhbane Mohamed Amine
 
Synopsis on online shopping by sudeep singh
Synopsis on online shopping by  sudeep singhSynopsis on online shopping by  sudeep singh
Synopsis on online shopping by sudeep singhSudeep Singh
 
IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...
IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...
IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...Christopher Diamantopoulos
 
Scala & Spark(1.6) in Performance Aspect for Scala Taiwan
Scala & Spark(1.6) in Performance Aspect for Scala TaiwanScala & Spark(1.6) in Performance Aspect for Scala Taiwan
Scala & Spark(1.6) in Performance Aspect for Scala TaiwanJimin Hsieh
 
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese..."Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...Edge AI and Vision Alliance
 
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F... Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...Databricks
 
FIWARE Global Summit - Real-time Media Stream Processing Using Kurento
FIWARE Global Summit - Real-time Media Stream Processing Using KurentoFIWARE Global Summit - Real-time Media Stream Processing Using Kurento
FIWARE Global Summit - Real-time Media Stream Processing Using KurentoFIWARE
 
Boosting spark performance: An Overview of Techniques
Boosting spark performance: An Overview of TechniquesBoosting spark performance: An Overview of Techniques
Boosting spark performance: An Overview of TechniquesAhsan Javed Awan
 

Similar to The Past, Present, and Future of OpenACC (20)

Cockatrice: A Hardware Design Environment with Elixir
Cockatrice: A Hardware Design Environment with ElixirCockatrice: A Hardware Design Environment with Elixir
Cockatrice: A Hardware Design Environment with Elixir
 
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
 
Introduction to Blackfin BF532 DSP
Introduction to Blackfin BF532 DSPIntroduction to Blackfin BF532 DSP
Introduction to Blackfin BF532 DSP
 
Apache Spark Performance Observations
Apache Spark Performance ObservationsApache Spark Performance Observations
Apache Spark Performance Observations
 
Pragmatic Optimization in Modern Programming - Ordering Optimization Approaches
Pragmatic Optimization in Modern Programming - Ordering Optimization ApproachesPragmatic Optimization in Modern Programming - Ordering Optimization Approaches
Pragmatic Optimization in Modern Programming - Ordering Optimization Approaches
 
GTC16 - S6410 - Comparing OpenACC 2.5 and OpenMP 4.5
GTC16 - S6410 - Comparing OpenACC 2.5 and OpenMP 4.5GTC16 - S6410 - Comparing OpenACC 2.5 and OpenMP 4.5
GTC16 - S6410 - Comparing OpenACC 2.5 and OpenMP 4.5
 
LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2
 
Apache Spark: What's under the hood
Apache Spark: What's under the hoodApache Spark: What's under the hood
Apache Spark: What's under the hood
 
IBM Runtimes Performance Observations with Apache Spark
IBM Runtimes Performance Observations with Apache SparkIBM Runtimes Performance Observations with Apache Spark
IBM Runtimes Performance Observations with Apache Spark
 
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARKSCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
 
Introduction to Programmable Networks by Clarence Anslem, Intel
Introduction to Programmable Networks by Clarence Anslem, IntelIntroduction to Programmable Networks by Clarence Anslem, Intel
Introduction to Programmable Networks by Clarence Anslem, Intel
 
Learn more about the tremendous value Open Data Plane brings to NFV
Learn more about the tremendous value Open Data Plane brings to NFVLearn more about the tremendous value Open Data Plane brings to NFV
Learn more about the tremendous value Open Data Plane brings to NFV
 
Synopsis on online shopping by sudeep singh
Synopsis on online shopping by  sudeep singhSynopsis on online shopping by  sudeep singh
Synopsis on online shopping by sudeep singh
 
IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...
IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...
IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...
 
Scala & Spark(1.6) in Performance Aspect for Scala Taiwan
Scala & Spark(1.6) in Performance Aspect for Scala TaiwanScala & Spark(1.6) in Performance Aspect for Scala Taiwan
Scala & Spark(1.6) in Performance Aspect for Scala Taiwan
 
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese..."Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
 
OpenDataPlane Project
OpenDataPlane ProjectOpenDataPlane Project
OpenDataPlane Project
 
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F... Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 
FIWARE Global Summit - Real-time Media Stream Processing Using Kurento
FIWARE Global Summit - Real-time Media Stream Processing Using KurentoFIWARE Global Summit - Real-time Media Stream Processing Using Kurento
FIWARE Global Summit - Real-time Media Stream Processing Using Kurento
 
Boosting spark performance: An Overview of Techniques
Boosting spark performance: An Overview of TechniquesBoosting spark performance: An Overview of Techniques
Boosting spark performance: An Overview of Techniques
 

More from inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networksinside-BigData.com
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...inside-BigData.com
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...inside-BigData.com
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...inside-BigData.com
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networksinside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoringinside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecastsinside-BigData.com
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Updateinside-BigData.com
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuninginside-BigData.com
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODinside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Accelerationinside-BigData.com
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficientlyinside-BigData.com
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Erainside-BigData.com
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Clusterinside-BigData.com
 

More from inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 

Recently uploaded

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 

Recently uploaded (20)

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 

The Past, Present, and Future of OpenACC

  • 1. Jeff Larkin, NVIDIA OPENACC FEATURES FUTURE DIRECTIONS
  • 2. 2 AGENDA • Why Compiler Directives? • OpenACC and OpenMP Status and Roadmap
  • 4. 4 WHAT ARE COMPILER DIRECTIVES? program myscience ... serial code ... do k = 1,n1 do i = 1,n2 ... enddo enddo ... end program myscience CPU GPU Your original Fortran, C, or C++ code Insert portable compiler directives Compiler parallelizes code and manages data movement Programmer optimizes incrementally Designed for multi-core CPUs, GPUs & many-core Accelerators program myscience ... serial code ... !$acc parallel loop do k = 1,n1 do i = 1,n2 ... enddo enddo ... end program myscience
  • 5. 5 WHY USE COMPILER DIRECTIVES? Single Source Code No need to maintain multiple code paths High Level Abstract away device details, focus on expressing the parallelism and data locality Low Learning Curve Programmer remains in same language and adds directives to existing code Rapid Development Fewer code changes means faster development
  • 6. 6 1,000,000’s Early Adopters Research Universities Supercomputing Centers Oil & Gas WHY SUPPORT COMPILER DIRECTIVES? 100,000’s 2004 Present Increased Usability, Broader Adoption CAE CFD Finance Rendering Data Analytics Life Sciences Defense Weather Climate Plasma Physics
  • 7. 7 3 WAYS TO ACCELERATE APPLICATIONS Applications Libraries “Drop-in” Acceleration Programming Languages Compiler Directives Maximum Flexibility Easily Accelerate Applications
  • 9. 9 COMPILER DIRECTIVES TODAY OpenACC Focused solely on compiler directives for accelerators Quickly moving Performance Portability a primary consideration Descriptive approach to parallel programming OpenMP Addresses a broad range of parallel programming challenges More measured approach Less focus on performance portability Prescriptive approach to parallel programming Choice of two great options
  • 10. 10 OPENACC IMPLEMENTATIONS OpenACC 2.0 launched December 2013 OpenACC 2.0 launched January 2014 OpenACC 2.0 Targeted for Late 2014 Tremendous interest in academia: • accULL – U. La Laguna/EPCC • Omni – U. of Tsukuba • OpenARC – Oak Ridge NL • OpenUH – U. of Houston • RoseACC – U of DE & LLNL Compilers Debuggers and Profilers OpenACC 2.0 Targeted for 2015
  • 11. 11 11 OPENACC 2.NEXT Deep-Copy Support for more complex data structures Improves C++ class support Tools API Standard API for profiling tools Default to present_or behavior for data clauses Tell us what else you need feedback@openacc.org
  • 12. 12 WHAT IS DEEP COPY? On Host Shallow Copy Deep Copy Deep copy is required for structures with pointer members to work properly, but also useful where only part of a structure is needed on the device. • Users can do this today manually, but we want it to be automatic
  • 13. 13 SOLVING DEEP COPY Inform the compiler of the shape of the data behind the pointer. struct T1 { int N; float* A; #pragma acc policy(“shape") shape( A[0:N] ) }; Develop policies for how the data should be relocated. struct T1 { int N; float* A; #pragma acc policy(“shape") shape( A[0:N] ) #pragma acc policy("boundary") update(A[0:1],A[N-1:1) }; Self-describing Structures Data Policies *Syntax and functionality subject to change
  • 14. 14 OPENACC & UNIFIED MEMORY With Unified Memory (or equivalent) shallow copying no longer results in runtime error, so does deep-copy become irrelevant? Dereferencing the copied pointer no-longer results in error. Hardware support for Unified Memory would allow for data to migrate on access. Deep Copy support in OpenACC is still beneficial because Data can be moved before it’s needed without penalty Code will be portable to devices without Unified Memory support
  • 15. 17 17 THE “MERGE” Both specifications bring value to the market OpenACC – more focused, more agile, descriptive approach OpenMP – more cautious, more mature overall, prescriptive approach The specifications are constantly converging, even if they never eventually merge. Use the best tool available to you today, the restructuring and knowledge from each are portable. In other words: the syntax isn’t the hard part. Does having 2 specifications cause confusion?
  • 16. 19 SHOULD I JUST WAIT FOR OPENMP4 SUPPORT? No!