Towards Exascale Computing with Fortran 2015
Damian Rouson

Sourcery Institute
Alessandro Fanfarillo

National Center for Atmospheric Research
Outline
Parallelism in Fortran 2008

SPMD

PGAS

Exascale challenges

Fortran 2015 targeting exascale challenges

Events

Teams

Collective subroutines

Failed-image detection

Richer set of atomic subroutines

Scientific kernels
SPMD
Images -> {processes | threads}
Images execute asynchronously until they reach a programmer-specified synchronization:

sync all
sync images
allocate/deallocate
[Figure: images 1, 2, 3, ..., n, each reaching "sync all"]
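The SPMD model above can be sketched as a minimal program in which every image runs the same code and meets at a barrier. The program name is illustrative and not taken from the slides; it should build with any coarray-capable compiler, for example gfortran with the OpenCoarrays caf and cafrun wrappers.

```fortran
program spmd_hello
  ! Every image executes this same program text (SPMD).
  implicit none
  integer :: me, n

  me = this_image()   ! index of the executing image, 1..n
  n  = num_images()   ! total number of images

  print "(a,i0,a,i0)", "Hello from image ", me, " of ", n

  sync all            ! barrier: no image continues until all have arrived

  if (me == 1) print "(a)", "All images reached the barrier."
end program spmd_hello
```

The greetings may appear in any order because the images run asynchronously before the barrier.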
PGAS
a(1)=a(2)[2]
Coarrays integrate with other language features:

1. All the usual Fortran 90 array syntax applies to Fortran 2008 coarrays.

2. “a” can be an object of derived type.

type(foo), allocatable :: a(:)[:]
integer, parameter :: local_size=100
allocate(a(local_size)[*])

The ability to drop the square brackets harbors two important implications:

1. One can easily determine where communication might happen.

2. Parallelized legacy codes need not change where communication is not required.
[Figure: images 1, 2, 3, ..., n, each executing a(1)=a(2)[2]]
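A minimal sketch of the square-bracket rule, assuming an integer coarray in place of the slide's type(foo) so that the example is self-contained. The variable source is hypothetical and only keeps the program valid when it runs on a single image.

```fortran
program pgas_get
  implicit none
  integer, parameter :: local_size = 100
  integer, allocatable :: a(:)[:]
  integer :: source

  allocate(a(local_size)[*])    ! coarray allocation synchronizes all images

  a = this_image()              ! no brackets: purely local work
  sync all                      ! every image has defined its copy of a

  source = min(2, num_images()) ! image 2, or image 1 in a single-image run
  a(1) = a(2)[source]           ! brackets: a get from image "source"

  if (a(1) /= source) error stop "unexpected value fetched"
  if (this_image() == 1) print "(a,i0)", "a(1) now holds ", a(1)
end program pgas_get
```

Only the one statement with brackets can communicate, which is what makes communication easy to locate in a coarray code.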
User-Defined Communication Patterns
Communication between two images that is invoked remotely:

a(3)[1] = a(4)[2]
if (this_image()==3) a(3)[1] = a(4)[2]
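As an illustration of the remotely invoked transfer above, the following sketch lets image 3 copy one element from image 2 to image 1. The initial values and the three-image requirement are assumptions made for the example.

```fortran
program third_party_copy
  implicit none
  integer :: a(4)[*]

  if (num_images() < 3) error stop "run with at least 3 images"

  a = 10*this_image() + [1, 2, 3, 4]   ! image 2 holds 21 22 23 24
  sync all                             ! all copies of a are defined

  ! Image 3 moves data from image 2 to image 1 without touching its own a.
  if (this_image() == 3) a(3)[1] = a(4)[2]

  sync all                             ! the transfer is complete everywhere

  if (this_image() == 1) then
    if (a(3) /= 24) error stop "transfer failed"
    print "(a,i0)", "a(3) on image 1 is now ", a(3)
  end if
end program third_party_copy
```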
Exascale Challenges -> Fortran 2015 Responses
High parallelism within a single chip -> events, collectives, atomics

Heterogeneous hardware within a single chip -> events

Higher failure rates -> failed-image detection

Locality Control -> Teams
Source: Ang, J.A., Barrett, R.F., Benner, R.E., Burke, D., Chan, C., Cook, J., Donofrio, D., Hammond, S.D., Hemmert, K.S., Kelly, S.M. and Le, H.,
2014, November. Abstract machine models and proxy architectures for exascale computing. In Hardware-Software Co-Design for High
Performance Computing (Co-HPC), 2014 (pp. 25-32). IEEE.
Events
An event is an intrinsic derived type that encapsulates a private integer
(atomic_int_kind) component default-initialized to zero:

use iso_fortran_env, only : event_type
type(event_type), allocatable :: greeting_ready(:)[:]
type(event_type) :: ok_to_overwrite[*]

User-Controlled Segment Ordering: One image posts an event to a remote
image, e.g., to signal that data is ready for pickup. The remote image might
query the event to obtain its atomic integer count, which holds the number of
posts.

New statements and subroutines: event post, event wait, and event_query.
Events
Performance-Oriented Constraints
The standard obligates the compiler to enforce the following constraints:
Query and wait must be local.
Post and wait are disallowed in do concurrent loops.
Additional Performance-Oriented Tips
Wherever safety permits, query without waiting.
Write a spin-query-work loop and construct logical masks describing the remaining work.
Overlap computation and communication.
[Figure: images 1, 2, 3, ..., n with events greeting_ready(:)[1] and ok_to_overwrite[3]; arrows labeled post, query, wait, post]
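The spin-query-work tip can be sketched as follows, reusing the greeting_ready declaration from the earlier slide. The greeting coarray, the pending mask, and the counter posts are hypothetical names added for illustration. Because event_query does not by itself order segments, the sketch still executes event wait once the count is positive.

```fortran
program event_greetings
  use iso_fortran_env, only : event_type
  implicit none
  type(event_type), allocatable :: greeting_ready(:)[:]
  character(len=32) :: greeting[*]
  logical, allocatable :: pending(:)   ! mask of images not yet heard from
  integer :: image, posts

  allocate(greeting_ready(num_images())[*])   ! allocation synchronizes images

  if (this_image() /= 1) then
    write(greeting, "(a,i0)") "Hello from image ", this_image()
    event post(greeting_ready(this_image())[1])   ! tell image 1: data is ready
  else
    allocate(pending(num_images()))
    pending = .true.
    pending(1) = .false.
    do while (any(pending))                       ! spin-query-work loop
      do image = 2, num_images()
        if (.not. pending(image)) cycle
        call event_query(greeting_ready(image), posts)   ! local, non-blocking
        if (posts > 0) then
          event wait(greeting_ready(image))   ! count is positive, so no stall
          print "(a)", trim(greeting[image])
          pending(image) = .false.
        end if
      end do
    end do
  end if

  sync all   ! keep every greeting accessible until image 1 has finished
end program event_greetings
```

Image 1 handles greetings in arrival order rather than image order, so a slow image does not delay the processing of the others.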
Teams
Team: set of images that can readily execute independently of other images
[Figure: Team 1 contains images 1, 2, 3 with a(1:4)[1], a(1:4)[2], a(1:4)[3]; Team 2 contains images 4, 5, 6 with a(1:4)[4], a(1:4)[5], a(1:4)[6]]
TS 18508 Additional Parallel Features in Fortran WG5/N2027, August 22, 2014
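A minimal sketch of splitting the images into two teams, as in the figure. The names half, id, and global_me are illustrative. The sketch assumes a compiler that implements teams, which the compiler-support slide reports as not yet available in GCC 7.

```fortran
program teams_demo
  use iso_fortran_env, only : team_type
  implicit none
  type(team_type) :: half
  integer :: id, global_me

  global_me = this_image()

  ! Lower half of the images forms team 1, upper half team 2.
  ! With six images: images 1-3 are team 1 and images 4-6 are team 2.
  id = merge(1, 2, global_me <= (num_images() + 1)/2)

  form team(id, half)          ! collective over the current team

  change team(half)
    ! Inside the construct, this_image() and num_images() are team-relative.
    print "(4(a,i0))", "image ", global_me, " is image ", this_image(), &
      " of ", num_images(), " in team ", team_number()
  end team
end program teams_demo
```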
Collective Subroutines
Each non-failed image of the current team must invoke the collective.

Parallel calculation/communication; the result is placed on one image or on all images.

Optional arguments: STAT, ERRMSG, RESULT_IMAGE. (SOURCE_IMAGE is a required argument of CO_BROADCAST.)

All collectives have an INTENT(INOUT) argument A that holds the input data on
entry and may hold the result on return (depending on RESULT_IMAGE).

No implicit synchronization at beginning/end: allows overlap with other actions.

No image’s data is accessed before that image invokes the collective subroutine.

Extensible Set of Collectives
CO_BROADCAST (A, SOURCE_IMAGE [, STAT, ERRMSG])

CO_MAX (A [, RESULT_IMAGE, STAT, ERRMSG])

CO_MIN (A [, RESULT_IMAGE, STAT, ERRMSG])

CO_SUM (A [, RESULT_IMAGE, STAT, ERRMSG])

CO_REDUCE (A, OPERATOR [, RESULT_IMAGE, STAT, ERRMSG])
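As an illustration of two of the collectives listed above, the sketch below broadcasts a value from image 1 and then uses CO_REDUCE with a user-written pure function to check that every image received it. The function both and the value 42 are assumptions for the example. The operator is passed positionally because the keyword name of that argument differs between the TS and the later standard.

```fortran
program collectives_demo
  implicit none
  integer :: n
  logical :: ok

  n = 0
  if (this_image() == 1) n = 42
  call co_broadcast(n, source_image=1)   ! every image now holds image 1's n

  ok = (n == 42)
  call co_reduce(ok, both)               ! logical AND across all images
  if (.not. ok) error stop "broadcast did not reach every image"

  if (this_image() == 1) print "(a)", "broadcast verified on all images"

contains

  ! User-defined reduction operator: must be pure with two scalar arguments.
  pure function both(x, y) result(z)
    logical, intent(in) :: x, y
    logical :: z
    z = x .and. y
  end function both

end program collectives_demo
```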
Collective Sum: Definition
CO_SUM (A [, RESULT_IMAGE, STAT, ERRMSG])

Description: Sum elements on the current team of images.

Arguments

A shall be of numeric type. It shall have the same type and type parameters on all images of the
current team. It is an INTENT(INOUT) argument. If it is a scalar, the computed value is equal to a
processor-dependent and image-dependent approximation to the sum of the values of A on all images of
the current team. If it is an array it shall have the same shape on all images of the current team and
each element of the computed value is equal to a processor-dependent and image-dependent
approximation to the sum of all the corresponding elements of A on the images of the current team.

RESULT_IMAGE (optional) shall be a scalar of type integer. It is an INTENT(IN) argument. If it is
present, it shall be present on all images of the current team, have the same value on all images of the
current team, and that value shall be the image index of an image of the current team.

STAT (optional) shall be a scalar of type default integer. It is an INTENT(OUT) argument.

ERRMSG (optional) shall be a scalar of type default character. It is an INTENT(INOUT) argument.
Collective Sum: Example
call co_sum(a)

Before the call:
Team 1
Image 1   a(1:4)[1] = 0 5 3 6
Image 2   a(1:4)[2] = 2 6 5 1
Image 3   a(1:4)[3] = 3 4 9 7
Team 2
Image 4   a(1:4)[4] = 3 4 5 1
Image 5   a(1:4)[5] = 2 4 3 9
Image 6   a(1:4)[6] = 4 4 0 1

After the call:
Team 1   a(1:4) = 5 15 17 14 on images 1, 2, and 3
Team 2   a(1:4) = 9 12 8 11 on images 4, 5, and 6
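A minimal sketch of co_sum over all images (no teams), with input values chosen so that the result can be checked in closed form. The initialization is an assumption for the example; note that the argument need not be a coarray.

```fortran
program co_sum_demo
  implicit none
  integer :: a(4), n

  n = num_images()
  a = this_image() * [1, 2, 3, 4]   ! image i contributes i, 2i, 3i, 4i

  call co_sum(a)                    ! no RESULT_IMAGE: every image gets the sum

  ! Sum of 1..n is n(n+1)/2, so each element has a known value.
  if (any(a /= n*(n + 1)/2 * [1, 2, 3, 4])) error stop "co_sum mismatch"

  if (this_image() == 1) print "(a,4(1x,i0))", "sum:", a
end program co_sum_demo
```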
Performance of Fortran 2008 User-Defined Collective vs Fortran 2015 co_sum Intrinsic
[Figure: execution time for user-defined collective sum reductions and the Fortran 2015 intrinsic co_sum]
Hopper at NERSC: Cray XE6, peak performance of 1.28 petaflops, 153,216 compute cores, 212 terabytes of memory, and 2 petabytes of disk.
Performance of Fortran 2008 User-Defined Collective vs Fortran 2015 co_broadcast Intrinsic
[Figure: execution time for user-defined collective broadcasts and the Fortran 2015 intrinsic co_broadcast]
Measured on Hopper at NERSC (Cray XE6).
Fault Tolerance
Why do we need fault tolerance?
Exascale machines will be composed of millions of nodes.
Each node will be composed of thousands of cores.

MTBF - one node             1 year        10 years     120 years
MTBF - one million nodes    30 seconds    5 minutes    1 hour

Checkpoint/restart cannot be effective with high failure rates.
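The table's second row follows from dividing the per-node MTBF by the node count. The sketch below reproduces that arithmetic under the assumption, not stated on the slide, that nodes fail independently at a constant rate.

```fortran
program system_mtbf
  implicit none
  real, parameter :: seconds_per_year = 365.25 * 24. * 3600.
  real, parameter :: node_mtbf_years(3) = [1., 10., 120.]   ! from the table
  real, parameter :: nodes = 1.e6
  integer :: i

  do i = 1, size(node_mtbf_years)
    ! With independent failures, the machine fails "nodes" times as often.
    print "(f6.0,a,f8.1,a)", node_mtbf_years(i), " years per node -> ", &
      node_mtbf_years(i) * seconds_per_year / nodes, " seconds per machine"
  end do
end program system_mtbf
```

The three results come to roughly half a minute, five minutes, and one hour, matching the rounded figures in the table.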
Failed Images
FAIL IMAGE (simulates a failure)

IMAGE_STATUS (checks the status of a specific image)

FAILED_IMAGES (provides the list of failed images)

Coarray operations may fail. The STAT= specifier is used to check for correct
behavior.
The failed-image features provide support for failure detection.
Failure Detection
[Figure: images 1 through 6 holding a(1:4)[1] through a(1:4)[6]]
use iso_fortran_env, only : STAT_FAILED_IMAGE
integer :: status
sync all(stat=status)
if (status==STAT_FAILED_IMAGE) call fault_tolerant_algorithm()
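The detection pattern above can be exercised with a simulated failure. This is a sketch that assumes a compiler and runtime implementing FAIL IMAGE and FAILED_IMAGES, which the compiler-support slide lists as pending for GCC 7. The choice of image 2 as the failing image is arbitrary.

```fortran
program failure_demo
  use iso_fortran_env, only : STAT_FAILED_IMAGE
  implicit none
  integer :: status

  ! Simulate a failure on image 2 (skipped in a single-image run).
  if (num_images() > 1 .and. this_image() == 2) fail image

  sync all(stat=status)   ! stat= lets surviving images continue after a failure

  if (status == STAT_FAILED_IMAGE) then
    print "(a,i0,a,*(1x,i0))", "image ", this_image(), &
      " detected failed image(s):", failed_images()
    ! A real code would invoke its recovery strategy here.
  else
    print "(a,i0,a)", "image ", this_image(), " saw no failures"
  end if
end program failure_demo
```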
Scientific Kernels (PRK)
Parallel Research Kernels: suite of representative scientific parallel
codes.

Distributed Transpose (not examined)

Peer-to-peer synchronization

2D Stencil (halo exchange)
Sync_p2p Kernel
Stencil with a demanding data dependence, resolved using software pipelining techniques.
Events provide the right level of granularity for this case.
Sync_p2p on Stampede
[Figure: MFlops (0 to 7.0e+03) versus cores (16, 32, 64, 128) for the SYNC and EVENTS versions]
Stencil Kernel
2D Stencil operator

One of the most common communication patterns: halo exchange

Events can be used on every side

sync images and sync all force computation and communication into separate phases
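A minimal sketch of an event-based halo exchange, simplified to a 1D three-point stencil on a periodic ring rather than the 2D kernel measured in the talk. All names (u, unew, halo_ready, left, right) are illustrative. Interior points are updated while the halo data is still in flight, which is the overlap that barrier-style synchronization prevents.

```fortran
program halo_events
  use iso_fortran_env, only : event_type
  implicit none
  integer, parameter :: n = 8
  real :: u(0:n+1)[*], unew(n)
  type(event_type) :: halo_ready(2)[*]   ! 1: left halo filled, 2: right halo
  integer :: me, left, right

  me    = this_image()
  left  = merge(num_images(), me - 1, me == 1)   ! periodic neighbors
  right = merge(1, me + 1, me == num_images())

  u(1:n) = real(me)   ! halo cells u(0) and u(n+1) are only written remotely

  u(0)[right] = u(n)                 ! my last cell is my right neighbor's halo
  event post(halo_ready(1)[right])
  u(n+1)[left] = u(1)                ! my first cell is my left neighbor's halo
  event post(halo_ready(2)[left])

  ! Interior update needs no halo data, so it overlaps the communication.
  unew(2:n-1) = (u(1:n-2) + u(2:n-1) + u(3:n)) / 3.

  event wait(halo_ready(1))          ! left halo has arrived
  unew(1) = (u(0) + u(1) + u(2)) / 3.
  event wait(halo_ready(2))          ! right halo has arrived
  unew(n) = (u(n-1) + u(n) + u(n+1)) / 3.

  if (u(0) /= real(left) .or. u(n+1) /= real(right)) error stop "halo mismatch"
  if (me == 1) print "(a)", "halo exchange complete"
end program halo_events
```

This covers a single exchange. A time-stepping loop would need a second set of events, in the spirit of ok_to_overwrite from the Events slides, so that a neighbor does not overwrite a halo cell that is still being read.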
Stencil Kernel on Stampede
[Figure: MFlops (2.0e+04 to 2.4e+05) versus cores (16, 32, 64, 128) for the SYNC and EVENTS versions]
Compiler Support & Video Tutorials
GCC 7 + OpenCoarrays support

Fortran 2008 parallel features. 

Fortran 2015 events, collectives, atomics.

Unsupported: Teams, Failure detection (pending)

The Sourcery Institute (SI) Linux virtual machine (VM) contains GCC 7,
OpenCoarrays, and numerous useful tools for modern Fortran development:

Download free VM at sourceryinstitute.org/store

SI VM is used in Stanford EARTH/CME 214 course

View video tutorials at sourceryinstitute.org/videos

Other compilers:

Cray supports Fortran 2008 + 2015 collectives

Intel supports Fortran 2008 parallel features
Related Research Literature
Coarray Fortran (CAF)

See http://www.opencoarrays.org/publications

See http://www.sourceryinstitute.org/publications

HPC Applications of CAF

Speedup of 33% relative to MPI-2 on 80K cores for European weather model: Mozdzynski,
G., Hamrud, M., & Wedi, N. (2015). A Partitioned Global Address Space implementation of the European Centre for Medium Range Weather Forecasts
Integrated Forecasting System. International Journal of High Performance Computing Applications, 1094342015576773. 

Performance competitive with MPI-3 for several applications: Garain, S.,
Balsara, D. S., & Reid, J. (2015). Comparing Coarray Fortran (CAF) with MPI for several structured mesh PDE applications. Journal of Computational
Physics. 

Speedup of 50% relative to MPI-2 for plasma fusion code on 130,000 cores: Preissl, R.,
Wichmann, N., Long, B., Shalf, J., Ethier, S., & Koniges, A. (2011, November). Multithreaded global address space communication techniques for
gyrokinetic fusion applications on ultra-scale platforms. In Proceedings of 2011 International Conference for High Performance Computing, Networking,
Storage and Analysis (p. 78). ACM.
Acknowledgements
Tobias Burnus and Andre Vehreschild, GCC developers

Salvatore Filippone, Cranfield University

Zaak Beekman, Princeton University

Valeria Cardellini, University of Rome “Tor Vergata”

Dan Nagle, NCAR

Kenneth Craft and Jeff Hammond, Intel

More Related Content

What's hot

Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
MLconf
 
Performance analysis of real-time and general-purpose operating systems for p...
Performance analysis of real-time and general-purpose operating systems for p...Performance analysis of real-time and general-purpose operating systems for p...
Performance analysis of real-time and general-purpose operating systems for p...
IJECEIAES
 
Implementation of an arithmetic logic using area efficient carry lookahead adder
Implementation of an arithmetic logic using area efficient carry lookahead adderImplementation of an arithmetic logic using area efficient carry lookahead adder
Implementation of an arithmetic logic using area efficient carry lookahead adder
VLSICS Design
 
Variational Autoencoded Regression of Visual Data with Generative Adversarial...
Variational Autoencoded Regression of Visual Data with Generative Adversarial...Variational Autoencoded Regression of Visual Data with Generative Adversarial...
Variational Autoencoded Regression of Visual Data with Generative Adversarial...
NAVER Engineering
 
그림 그리는 AI
그림 그리는 AI그림 그리는 AI
그림 그리는 AI
NAVER Engineering
 
reliability workshop
reliability workshopreliability workshop
reliability workshop
Gaurav Dixit
 
DiscoGAN - Learning to Discover Cross-Domain Relations with Generative Advers...
DiscoGAN - Learning to Discover Cross-Domain Relations with Generative Advers...DiscoGAN - Learning to Discover Cross-Domain Relations with Generative Advers...
DiscoGAN - Learning to Discover Cross-Domain Relations with Generative Advers...
Taeksoo Kim
 
Expert system design for elastic scattering neutrons optical model using bpnn
Expert system design for elastic scattering neutrons optical model using bpnnExpert system design for elastic scattering neutrons optical model using bpnn
Expert system design for elastic scattering neutrons optical model using bpnn
ijcsa
 
Antenna Physical Characetristics
Antenna Physical CharacetristicsAntenna Physical Characetristics
Antenna Physical Characetristics
Assignmentpedia
 
A complete user adaptive antenna tutorial demonstration. a gui based approach...
A complete user adaptive antenna tutorial demonstration. a gui based approach...A complete user adaptive antenna tutorial demonstration. a gui based approach...
A complete user adaptive antenna tutorial demonstration. a gui based approach...
Pablo Velarde A
 
3D Multi Object GAN
3D Multi Object GAN3D Multi Object GAN
3D Multi Object GAN
Yu Nishimura
 
Ijcatr04041019
Ijcatr04041019Ijcatr04041019
Ijcatr04041019
Editor IJCATR
 
STABILITY ANALYSIS AND CONTROL OF A 3-D AUTONOMOUS AI-YUAN-ZHI-HAO HYPERCHAOT...
STABILITY ANALYSIS AND CONTROL OF A 3-D AUTONOMOUS AI-YUAN-ZHI-HAO HYPERCHAOT...STABILITY ANALYSIS AND CONTROL OF A 3-D AUTONOMOUS AI-YUAN-ZHI-HAO HYPERCHAOT...
STABILITY ANALYSIS AND CONTROL OF A 3-D AUTONOMOUS AI-YUAN-ZHI-HAO HYPERCHAOT...
ijscai
 
08 neural networks
08 neural networks08 neural networks
08 neural networks
ankit_ppt
 
Design of a novel controller to increase the frequency response of an aerospace
Design of a novel controller to increase the frequency response of an aerospaceDesign of a novel controller to increase the frequency response of an aerospace
Design of a novel controller to increase the frequency response of an aerospace
IAEME Publication
 
Simultaneous State and Actuator Fault Estimation With Fuzzy Descriptor PMID a...
Simultaneous State and Actuator Fault Estimation With Fuzzy Descriptor PMID a...Simultaneous State and Actuator Fault Estimation With Fuzzy Descriptor PMID a...
Simultaneous State and Actuator Fault Estimation With Fuzzy Descriptor PMID a...
Waqas Tariq
 
Beginners Guide to Non-Negative Matrix Factorization
Beginners Guide to Non-Negative Matrix FactorizationBeginners Guide to Non-Negative Matrix Factorization
Beginners Guide to Non-Negative Matrix Factorization
Benjamin Bengfort
 
Fundamentals of Image Processing & Computer Vision with MATLAB
Fundamentals of Image Processing & Computer Vision with MATLABFundamentals of Image Processing & Computer Vision with MATLAB
Fundamentals of Image Processing & Computer Vision with MATLAB
Ali Ghanbarzadeh
 
Matlab day 1: Introduction to MATLAB
Matlab day 1: Introduction to MATLABMatlab day 1: Introduction to MATLAB
Matlab day 1: Introduction to MATLAB
reddyprasad reddyvari
 
Real-time traffic sign detection and recognition using Raspberry Pi
Real-time traffic sign detection and recognition using Raspberry Pi Real-time traffic sign detection and recognition using Raspberry Pi
Real-time traffic sign detection and recognition using Raspberry Pi
IJECEIAES
 

What's hot (20)

Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
 
Performance analysis of real-time and general-purpose operating systems for p...
Performance analysis of real-time and general-purpose operating systems for p...Performance analysis of real-time and general-purpose operating systems for p...
Performance analysis of real-time and general-purpose operating systems for p...
 
Implementation of an arithmetic logic using area efficient carry lookahead adder
Implementation of an arithmetic logic using area efficient carry lookahead adderImplementation of an arithmetic logic using area efficient carry lookahead adder
Implementation of an arithmetic logic using area efficient carry lookahead adder
 
Variational Autoencoded Regression of Visual Data with Generative Adversarial...
Variational Autoencoded Regression of Visual Data with Generative Adversarial...Variational Autoencoded Regression of Visual Data with Generative Adversarial...
Variational Autoencoded Regression of Visual Data with Generative Adversarial...
 
그림 그리는 AI
그림 그리는 AI그림 그리는 AI
그림 그리는 AI
 
reliability workshop
reliability workshopreliability workshop
reliability workshop
 
DiscoGAN - Learning to Discover Cross-Domain Relations with Generative Advers...
DiscoGAN - Learning to Discover Cross-Domain Relations with Generative Advers...DiscoGAN - Learning to Discover Cross-Domain Relations with Generative Advers...
DiscoGAN - Learning to Discover Cross-Domain Relations with Generative Advers...
 
Expert system design for elastic scattering neutrons optical model using bpnn
Expert system design for elastic scattering neutrons optical model using bpnnExpert system design for elastic scattering neutrons optical model using bpnn
Expert system design for elastic scattering neutrons optical model using bpnn
 
Antenna Physical Characetristics
Antenna Physical CharacetristicsAntenna Physical Characetristics
Antenna Physical Characetristics
 
A complete user adaptive antenna tutorial demonstration. a gui based approach...
A complete user adaptive antenna tutorial demonstration. a gui based approach...A complete user adaptive antenna tutorial demonstration. a gui based approach...
A complete user adaptive antenna tutorial demonstration. a gui based approach...
 
3D Multi Object GAN
3D Multi Object GAN3D Multi Object GAN
3D Multi Object GAN
 
Ijcatr04041019
Ijcatr04041019Ijcatr04041019
Ijcatr04041019
 
STABILITY ANALYSIS AND CONTROL OF A 3-D AUTONOMOUS AI-YUAN-ZHI-HAO HYPERCHAOT...
STABILITY ANALYSIS AND CONTROL OF A 3-D AUTONOMOUS AI-YUAN-ZHI-HAO HYPERCHAOT...STABILITY ANALYSIS AND CONTROL OF A 3-D AUTONOMOUS AI-YUAN-ZHI-HAO HYPERCHAOT...
STABILITY ANALYSIS AND CONTROL OF A 3-D AUTONOMOUS AI-YUAN-ZHI-HAO HYPERCHAOT...
 
08 neural networks
08 neural networks08 neural networks
08 neural networks
 
Design of a novel controller to increase the frequency response of an aerospace
Design of a novel controller to increase the frequency response of an aerospaceDesign of a novel controller to increase the frequency response of an aerospace
Design of a novel controller to increase the frequency response of an aerospace
 
Simultaneous State and Actuator Fault Estimation With Fuzzy Descriptor PMID a...
Simultaneous State and Actuator Fault Estimation With Fuzzy Descriptor PMID a...Simultaneous State and Actuator Fault Estimation With Fuzzy Descriptor PMID a...
Simultaneous State and Actuator Fault Estimation With Fuzzy Descriptor PMID a...
 
Beginners Guide to Non-Negative Matrix Factorization
Beginners Guide to Non-Negative Matrix FactorizationBeginners Guide to Non-Negative Matrix Factorization
Beginners Guide to Non-Negative Matrix Factorization
 
Fundamentals of Image Processing & Computer Vision with MATLAB
Fundamentals of Image Processing & Computer Vision with MATLABFundamentals of Image Processing & Computer Vision with MATLAB
Fundamentals of Image Processing & Computer Vision with MATLAB
 
Matlab day 1: Introduction to MATLAB
Matlab day 1: Introduction to MATLABMatlab day 1: Introduction to MATLAB
Matlab day 1: Introduction to MATLAB
 
Real-time traffic sign detection and recognition using Raspberry Pi
Real-time traffic sign detection and recognition using Raspberry Pi Real-time traffic sign detection and recognition using Raspberry Pi
Real-time traffic sign detection and recognition using Raspberry Pi
 

Viewers also liked

State of Linux Containers for HPC
State of Linux Containers for HPCState of Linux Containers for HPC
State of Linux Containers for HPC
inside-BigData.com
 
Designing HPC & Deep Learning Middleware for Exascale Systems
Designing HPC & Deep Learning Middleware for Exascale SystemsDesigning HPC & Deep Learning Middleware for Exascale Systems
Designing HPC & Deep Learning Middleware for Exascale Systems
inside-BigData.com
 
HPC Computing Trends
HPC Computing TrendsHPC Computing Trends
HPC Computing Trends
inside-BigData.com
 
Multi-Physics Methods, Modeling, Simulation & Analysis
Multi-Physics Methods, Modeling, Simulation & AnalysisMulti-Physics Methods, Modeling, Simulation & Analysis
Multi-Physics Methods, Modeling, Simulation & Analysis
inside-BigData.com
 
Beyond Floating Point – Next Generation Computer Arithmetic
Beyond Floating Point – Next Generation Computer ArithmeticBeyond Floating Point – Next Generation Computer Arithmetic
Beyond Floating Point – Next Generation Computer Arithmetic
inside-BigData.com
 
Containerizing Distributed Pipes
Containerizing Distributed PipesContainerizing Distributed Pipes
Containerizing Distributed Pipes
inside-BigData.com
 
Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?
inside-BigData.com
 
Intersect360 Top of All Things in HPC Snapshot Analysis
Intersect360 Top of All Things in HPC Snapshot AnalysisIntersect360 Top of All Things in HPC Snapshot Analysis
Intersect360 Top of All Things in HPC Snapshot Analysis
inside-BigData.com
 
Application Profiling at the HPCAC High Performance Center
Application Profiling at the HPCAC High Performance CenterApplication Profiling at the HPCAC High Performance Center
Application Profiling at the HPCAC High Performance Center
inside-BigData.com
 
Introduction to GPUs in HPC
Introduction to GPUs in HPCIntroduction to GPUs in HPC
Introduction to GPUs in HPC
inside-BigData.com
 
A good looking pipeline
A good looking pipelineA good looking pipeline
A good looking pipeline
Nacho Caballero
 
Programar En Fortran
Programar En FortranProgramar En Fortran
Programar En Fortran
Saul Bernal
 
Exascale Computing Project - Driving a HUGE Change in a Changing World
Exascale Computing Project - Driving a HUGE Change in a Changing WorldExascale Computing Project - Driving a HUGE Change in a Changing World
Exascale Computing Project - Driving a HUGE Change in a Changing World
inside-BigData.com
 
Fortran - concise review
Fortran - concise reviewFortran - concise review
Fortran - concise review
Hans Zimermann
 
Unum Computing: An Energy Efficient and Massively Parallel Approach to Valid ...
Unum Computing: An Energy Efficient and Massively Parallel Approach to Valid ...Unum Computing: An Energy Efficient and Massively Parallel Approach to Valid ...
Unum Computing: An Energy Efficient and Massively Parallel Approach to Valid ...
inside-BigData.com
 
PGI CUDA FortranとGPU最適化ライブラリの一連携法
PGI CUDA FortranとGPU最適化ライブラリの一連携法PGI CUDA FortranとGPU最適化ライブラリの一連携法
PGI CUDA FortranとGPU最適化ライブラリの一連携法
智啓 出川
 
Maximizing HPC Compute Resources with Minimal Cost
Maximizing HPC Compute Resources with Minimal CostMaximizing HPC Compute Resources with Minimal Cost
Maximizing HPC Compute Resources with Minimal Cost
inside-BigData.com
 
EMC in HPC – The Journey so far and the Road Ahead
EMC in HPC – The Journey so far and the Road AheadEMC in HPC – The Journey so far and the Road Ahead
EMC in HPC – The Journey so far and the Road Ahead
inside-BigData.com
 
Best Practices: Large Scale Multiphysics
Best Practices: Large Scale MultiphysicsBest Practices: Large Scale Multiphysics
Best Practices: Large Scale Multiphysics
inside-BigData.com
 
Business Wizard Of The Year : Mr. AMAR BABU,M.D.-LENOVO INDIA
Business Wizard Of The Year : Mr. AMAR BABU,M.D.-LENOVO INDIABusiness Wizard Of The Year : Mr. AMAR BABU,M.D.-LENOVO INDIA
Business Wizard Of The Year : Mr. AMAR BABU,M.D.-LENOVO INDIA
VARINDIA
 

Viewers also liked (20)

State of Linux Containers for HPC
State of Linux Containers for HPCState of Linux Containers for HPC
State of Linux Containers for HPC
 
Designing HPC & Deep Learning Middleware for Exascale Systems
Designing HPC & Deep Learning Middleware for Exascale SystemsDesigning HPC & Deep Learning Middleware for Exascale Systems
Designing HPC & Deep Learning Middleware for Exascale Systems
 
HPC Computing Trends
HPC Computing TrendsHPC Computing Trends
HPC Computing Trends
 
Multi-Physics Methods, Modeling, Simulation & Analysis
Multi-Physics Methods, Modeling, Simulation & AnalysisMulti-Physics Methods, Modeling, Simulation & Analysis
Multi-Physics Methods, Modeling, Simulation & Analysis
 
Beyond Floating Point – Next Generation Computer Arithmetic
Beyond Floating Point – Next Generation Computer ArithmeticBeyond Floating Point – Next Generation Computer Arithmetic
Beyond Floating Point – Next Generation Computer Arithmetic
 
Containerizing Distributed Pipes
Containerizing Distributed PipesContainerizing Distributed Pipes
Containerizing Distributed Pipes
 
Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?
 
Intersect360 Top of All Things in HPC Snapshot Analysis
Intersect360 Top of All Things in HPC Snapshot AnalysisIntersect360 Top of All Things in HPC Snapshot Analysis
Intersect360 Top of All Things in HPC Snapshot Analysis
 
Application Profiling at the HPCAC High Performance Center
Application Profiling at the HPCAC High Performance CenterApplication Profiling at the HPCAC High Performance Center
Application Profiling at the HPCAC High Performance Center
 
Introduction to GPUs in HPC
Introduction to GPUs in HPCIntroduction to GPUs in HPC
Introduction to GPUs in HPC
 
A good looking pipeline
A good looking pipelineA good looking pipeline
A good looking pipeline
 
Programar En Fortran
Programar En FortranProgramar En Fortran
Programar En Fortran
 
Exascale Computing Project - Driving a HUGE Change in a Changing World
Exascale Computing Project - Driving a HUGE Change in a Changing WorldExascale Computing Project - Driving a HUGE Change in a Changing World
Exascale Computing Project - Driving a HUGE Change in a Changing World
 
Fortran - concise review
Fortran - concise reviewFortran - concise review
Fortran - concise review
 
Unum Computing: An Energy Efficient and Massively Parallel Approach to Valid ...
Unum Computing: An Energy Efficient and Massively Parallel Approach to Valid ...Unum Computing: An Energy Efficient and Massively Parallel Approach to Valid ...
Unum Computing: An Energy Efficient and Massively Parallel Approach to Valid ...
 
PGI CUDA FortranとGPU最適化ライブラリの一連携法
PGI CUDA FortranとGPU最適化ライブラリの一連携法PGI CUDA FortranとGPU最適化ライブラリの一連携法
PGI CUDA FortranとGPU最適化ライブラリの一連携法
 
Maximizing HPC Compute Resources with Minimal Cost
Maximizing HPC Compute Resources with Minimal CostMaximizing HPC Compute Resources with Minimal Cost
Maximizing HPC Compute Resources with Minimal Cost
 
EMC in HPC – The Journey so far and the Road Ahead
EMC in HPC – The Journey so far and the Road AheadEMC in HPC – The Journey so far and the Road Ahead
EMC in HPC – The Journey so far and the Road Ahead
 
Best Practices: Large Scale Multiphysics
Best Practices: Large Scale MultiphysicsBest Practices: Large Scale Multiphysics
Best Practices: Large Scale Multiphysics
 
Business Wizard Of The Year : Mr. AMAR BABU,M.D.-LENOVO INDIA
Business Wizard Of The Year : Mr. AMAR BABU,M.D.-LENOVO INDIABusiness Wizard Of The Year : Mr. AMAR BABU,M.D.-LENOVO INDIA
Business Wizard Of The Year : Mr. AMAR BABU,M.D.-LENOVO INDIA
 

Towards Exascale Computing with Fortran 2015

  • 1. Towards Exascale Computing with Fortran 2015. Damian Rouson (Sourcery Institute); Alessandro Fanfarillo (National Center for Atmospheric Research).
  • 2. Outline: Parallelism in Fortran 2008; SPMD; PGAS; Exascale challenges; Fortran 2015 targeting exascale challenges; Events; Teams; Collective subroutines; Failed-image detection; Richer set of atomic subroutines; Scientific kernels.
  • 4. SPMD. Images (processes or threads) execute asynchronously until they reach a programmer-specified synchronization: sync all, sync images, or allocate/deallocate. [Diagram: Image 1, Image 2, Image 3, ..., Image n]
  • 6. SPMD. Images (processes or threads) execute asynchronously until they reach a programmer-specified synchronization: sync all, sync images, or allocate/deallocate. [Diagram: Image 1, Image 2, Image 3, ..., Image n, each reaching sync all]
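The synchronization points named on the SPMD slides can be sketched in a few lines. This is a minimal illustration, not code from the talk: the program name and the printed messages are assumptions, and the order in which images print is not guaranteed.

```fortran
program spmd_sketch
  implicit none
  integer :: me, n

  me = this_image()   ! index of this image, in 1..n
  n  = num_images()   ! total number of images

  print '(a,i0,a,i0)', 'image ', me, ' of ', n

  ! Global barrier: no image continues until every image arrives here.
  sync all

  ! Pairwise synchronization: image 1 waits for every other image,
  ! while each other image waits only for image 1.
  if (me == 1) then
    sync images (*)
  else
    sync images (1)
  end if

  if (me == 1) print '(a)', 'all images synchronized'
end program spmd_sketch
```

Every image runs the same program text; only the value returned by this_image() differs. With gfortran, a single-image build (-fcoarray=single) should be enough to check the syntax, while a multi-image run needs a coarray runtime such as OpenCoarrays.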
  • 8. PGAS. Every image executes a(1)=a(2)[2]. Coarrays integrate with other language features: (1) all the usual Fortran 90 array syntax applies to Fortran 2008 coarrays; (2) "a" can be an object of derived type. [Diagram: Image 1, Image 2, Image 3, ..., Image n]
  • 13. PGAS. Every image executes a(1)=a(2)[2]. The ability to drop the square brackets has two important implications: (1) one can easily determine where communication might happen; (2) parallelized legacy codes need not change where communication is not required. Coarrays integrate with other language features: (1) all the usual Fortran 90 array syntax applies to Fortran 2008 coarrays; (2) "a" can be an object of derived type. Declaration and allocation: type(foo), allocatable :: a(:)[:] ; integer, parameter :: local_size=100 ; allocate(a(local_size)[*])
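The declaration, allocation, and remote read shown on the PGAS slides fit together roughly as follows. This is a sketch under assumptions: it uses real elements instead of the slide's derived type foo (whose definition is not shown), and the initialization, the image-count guard, and the check are illustrative additions.

```fortran
program pgas_sketch
  implicit none
  integer, parameter :: local_size = 100
  real, allocatable :: a(:)[:]   ! one array per image, remotely addressable
  integer :: me

  me = this_image()

  ! Allocating a coarray is collective and synchronizes the images.
  allocate(a(local_size)[*])

  ! No square brackets: a purely local operation, no communication.
  a = real(me)

  ! Make sure every image has finished initializing before remote reads.
  sync all

  if (num_images() >= 2) then
    ! Square brackets: fetch element 2 from image 2 into local element 1.
    a(1) = a(2)[2]
    if (a(1) /= 2.0) error stop 'unexpected value fetched from image 2'
  end if

  ! Deallocation is also collective and synchronizes the images.
  deallocate(a)
end program pgas_sketch
```

The only statement that can communicate is the one with square brackets, which is the point the slide makes about spotting communication at a glance.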
  • 14. User-Defined Communication Patterns: a(3)[1] = a(4)[2] [Diagram: Image 1, Image 2, Image 3, ..., Image n]
  • 16. User-Defined Communication Patterns. Communication between two images that is invoked remotely, here by a third image: if (this_image()==3) a(3)[1] = a(4)[2] [Diagram: Image 1, Image 2, Image 3, ..., Image n]
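A small self-checking program can make the remotely invoked copy concrete. It is a sketch, not the authors' code: the initial values, the guard on the number of images, and the final check are assumptions added so that the effect of the slide's statement can be verified.

```fortran
program third_party_copy
  implicit none
  integer :: a(4)[*]
  integer :: i

  ! Give every element a value that encodes its owner and position,
  ! e.g. element 4 on image 2 holds 24.
  a = [(10*this_image() + i, i = 1, 4)]
  sync all

  if (num_images() >= 3) then
    ! Image 3 moves data from image 2 to image 1; neither of those
    ! two images executes the assignment itself.
    if (this_image() == 3) a(3)[1] = a(4)[2]
    sync all

    if (this_image() == 1) then
      if (a(3) /= 24) error stop 'image 1 did not receive the value from image 2'
    end if
  end if
end program third_party_copy
```

The sync all after the assignment is what makes it safe for image 1 to read a(3) afterwards.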
  • 18. Exascale Challenges -> Fortran 2015 Responses. High parallelism within a single chip -> events, collectives, atomics. Heterogeneous hardware within a single chip -> events. Higher failure rates -> failed-image detection. Locality control -> teams. Source: Ang, J.A., Barrett, R.F., Benner, R.E., Burke, D., Chan, C., Cook, J., Donofrio, D., Hammond, S.D., Hemmert, K.S., Kelly, S.M. and Le, H., 2014, November. Abstract machine models and proxy architectures for exascale computing. In Hardware-Software Co-Design for High Performance Computing (Co-HPC), 2014 (pp. 25-32). IEEE.
  • 20. Events. An event is an intrinsic derived type that encapsulates a private integer (atomic_int_kind) component default-initialized to zero: use iso_fortran_env, only : event_type ; type(event_type), allocatable :: greeting_ready(:)[:] ; type(event_type) :: ok_to_overwrite[*]. User-controlled segment ordering: one image posts an event to a remote image, e.g., to signal that data is ready for pickup. The remote image might query the event to obtain its atomic integer count, which holds the number of posts. New statements and subroutines: event post, event wait, and event_query.
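The post/wait handshake described above might look like the following. The names data_ready and payload and the value 42 are hypothetical, chosen for illustration; the slide's own variables (greeting_ready, ok_to_overwrite) belong to an example whose full source is not shown here.

```fortran
program event_handshake
  use iso_fortran_env, only : event_type
  implicit none
  type(event_type) :: data_ready[*]   ! count starts at zero on every image
  integer :: payload[*]
  integer :: me

  me = this_image()
  payload = 0
  sync all   ! every image has initialized payload before any remote write

  if (num_images() >= 2) then
    if (me == 1) then
      payload[2] = 42                 ! put the data on image 2 ...
      event post (data_ready[2])      ! ... then signal that it is ready
    else if (me == 2) then
      event wait (data_ready)         ! blocks until the post has arrived
      if (payload /= 42) error stop 'payload was not ready after the wait'
    end if
  end if
end program event_handshake
```

Only images 1 and 2 take part in the handshake; the other images are free to continue, which is the difference from a sync all barrier.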
  • 21-29. Events. [Diagram built up over nine slides: Image 1, Image 2, Image 3, ..., Image n, with the event variables greeting_ready(:)[1] and ok_to_overwrite[3]; the operations query, post, and wait are added step by step.]
  • 30. Events: Performance-Oriented Constraints. The standard obligates the compiler to enforce the following constraints: query and wait must be local; post and wait are disallowed in do concurrent loops. Additional performance-oriented tips: wherever safety permits, query without waiting; write a spin-query-work loop and construct logical masks describing the remaining work.
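One way to read the spin-query-work tip is sketched below, assuming image 1 collects one integer from every other image. The names `data_ready`, `payload`, and `pending` are hypothetical. Because `event_query` by itself does not order segments, the sketch follows a successful query with an `event wait`, which returns promptly since the count is already positive.

```fortran
program spin_query_work_sketch
  ! Sketch only: data_ready, payload, and pending are hypothetical names.
  use iso_fortran_env, only : event_type
  implicit none
  type(event_type), allocatable :: data_ready(:)[:]
  logical, allocatable :: pending(:)   ! mask describing the remaining work
  integer :: payload[*]                ! one result per image
  integer :: me, n, i, posts, total

  me = this_image()
  n  = num_images()
  allocate(data_ready(n)[*])           ! implicit synchronization

  if (me /= 1) then
    payload = me*me
    event post (data_ready(me)[1])     ! tell image 1 the payload is ready
  else
    allocate(pending(n))
    pending    = .true.
    pending(1) = .false.
    total = 1                          ! image 1's own contribution, 1*1
    do while (any(pending))
      do i = 2, n
        if (.not. pending(i)) cycle
        call event_query(data_ready(i), posts)   ! non-blocking, local query
        if (posts > 0) then
          event wait (data_ready(i))   ! count is positive, so no real waiting
          total = total + payload[i]   ! work on whatever has arrived
          pending(i) = .false.
        end if
      end do
    end do
    print '(a,i0)', 'sum of squares = ', total
  end if
end program spin_query_work_sketch
```

The design choice here is that image 1 never blocks on one slow image while data from another image is already available.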
  • 32. Teams. Team: a set of images that can readily execute independently of other images. [Figure: Team 1 comprises images 1-3, holding a(1:4)[1], a(1:4)[2], a(1:4)[3]; Team 2 comprises images 4-6, holding a(1:4)[4], a(1:4)[5], a(1:4)[6].] Source: TS 18508 Additional Parallel Features in Fortran, WG5/N2027, August 22, 2014.
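A minimal sketch of forming and entering teams is given below, assuming the images are split into a lower and an upper half as in the figure. The variable names are hypothetical, and the program needs a compiler and runtime that implement teams; the compiler-support slide lists teams as unsupported in GCC 7.

```fortran
program teams_sketch
  ! Sketch only: the half-and-half split is an assumption based on the figure.
  use iso_fortran_env, only : team_type
  implicit none
  type(team_type) :: half
  integer :: initial_index, my_team

  initial_index = this_image()
  ! Lower half of the images joins team 1, the rest joins team 2
  ! (with 6 images: 1-3 and 4-6).
  my_team = merge(1, 2, initial_index <= (num_images() + 1)/2)

  form team (my_team, half)

  change team (half)
    ! Inside the construct, this_image() and num_images() refer to the new team.
    print '(3(a,i0))', 'initial image ', initial_index, &
          ' is image ', this_image(), ' of team ', team_number()
  end team
end program teams_sketch
```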
  • 33. Collective Subroutines. Each non-failed image of the current team must invoke the collective. The calculation and communication proceed in parallel, and the result is placed on one image or on all images. Optional arguments: STAT, ERRMSG, RESULT_IMAGE, SOURCE_IMAGE. All collectives have an INTENT(INOUT) argument A that holds the input data on entry and may hold the result on return (depending on RESULT_IMAGE). There is no implicit synchronization at the beginning or end, which allows overlap with other actions. No image's data is accessed before that image invokes the collective subroutine. Source: TS 18508 Additional Parallel Features in Fortran, WG5/N2027, August 22, 2014.
  • 34. Extensible Set of Collectives. CO_BROADCAST (A, SOURCE_IMAGE [, STAT, ERRMSG]); CO_MAX (A [, RESULT_IMAGE, STAT, ERRMSG]); CO_MIN (A [, RESULT_IMAGE, STAT, ERRMSG]); CO_SUM (A [, RESULT_IMAGE, STAT, ERRMSG]); CO_REDUCE (A, OPERATOR [, RESULT_IMAGE, STAT, ERRMSG]). Source: TS 18508 Additional Parallel Features in Fortran, WG5/N2027, August 22, 2014.
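The five collectives listed above might be exercised as in the following sketch. The module `reduction_ops`, the function `both`, and the values being reduced are hypothetical choices for illustration. The operation is passed to `co_reduce` positionally because the keyword name differs between the TS (OPERATOR) and the later standard.

```fortran
module reduction_ops
  ! Hypothetical helper module holding a user-defined reduction operation.
  implicit none
contains
  pure function both(x, y) result(z)
    logical, intent(in) :: x, y
    logical :: z
    z = x .and. y
  end function both
end module reduction_ops

program collectives_sketch
  use reduction_ops, only : both
  implicit none
  integer :: me, total, largest, smallest, seed
  logical :: all_ok

  me = this_image()
  total    = me
  largest  = me
  smallest = me
  seed     = merge(42, 0, me == 1)   ! only image 1 starts with the real value
  all_ok   = .true.

  call co_sum(total)                        ! result on every image
  call co_max(largest, result_image=1)      ! result defined on image 1 only
  call co_min(smallest)
  call co_broadcast(seed, source_image=1)   ! every image now holds 42
  call co_reduce(all_ok, both)              ! user-defined logical "and"

  ! Print on image 1 only: "largest" is undefined on the other images.
  if (me == 1) print '(4(a,i0),a,l1)', 'sum=', total, ' max=', largest, &
                     ' min=', smallest, ' seed=', seed, ' all_ok=', all_ok
end program collectives_sketch
```

None of these calls synchronizes on entry or exit, so an image may continue with unrelated work as soon as its own call returns.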
  • 35. Collective Sum: Definition. CO_SUM (A [, RESULT_IMAGE, STAT, ERRMSG]). Description: Sum elements on the current team of images. Arguments: A shall be of numeric type. It shall have the same type and type parameters on all images of the current team. It is an INTENT(INOUT) argument. If it is a scalar, the computed value is equal to a processor-dependent and image-dependent approximation to the sum of the values of A on all images of the current team. If it is an array, it shall have the same shape on all images of the current team and each element of the computed value is equal to a processor-dependent and image-dependent approximation to the sum of all the corresponding elements of A on the images of the current team. RESULT_IMAGE (optional) shall be a scalar of type integer. It is an INTENT(IN) argument. If it is present, it shall be present on all images of the current team, have the same value on all images of the current team, and that value shall be the image index of an image of the current team. STAT (optional) shall be a scalar of type default integer. It is an INTENT(OUT) argument. ERRMSG (optional) shall be a scalar of type default character. It is an INTENT(INOUT) argument. Source: TS 18508 Additional Parallel Features in Fortran, WG5/N2027, August 22, 2014.
  • 37. Collective Sum: Example. Before call co_sum(a): Team 1 (images 1-3): a(1:4)[1] = 0 5 3 6, a(1:4)[2] = 2 6 5 1, a(1:4)[3] = 3 4 9 7; Team 2 (images 4-6): a(1:4)[4] = 3 4 5 1, a(1:4)[5] = 2 4 3 9, a(1:4)[6] = 4 4 0 1. After call co_sum(a): every image of Team 1 holds a(1:4) = 5 15 17 14; every image of Team 2 holds a(1:4) = 9 12 8 11.
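The example on this slide could be reproduced with a sketch along the following lines, assuming exactly six images and a compiler that supports teams. The table `slide_values` simply restates the numbers from the slide, and `a` is declared as an ordinary array here because collectives do not require coarrays. If the sketch behaves as intended, image 1 of each team should report the per-team sums shown on the slide (5 15 17 14 and 9 12 8 11).

```fortran
program co_sum_teams_sketch
  ! Sketch only: assumes 6 images split into two teams of 3, as on the slide.
  use iso_fortran_env, only : team_type
  implicit none
  ! Column k holds the initial a(1:4) of image k, copied from the slide.
  integer, parameter :: slide_values(4,6) = reshape( &
      [0,5,3,6,  2,6,5,1,  3,4,9,7,  3,4,5,1,  2,4,3,9,  4,4,0,1], [4,6])
  type(team_type) :: team
  integer :: a(4)
  integer :: me

  me = this_image()
  if (num_images() /= 6) error stop 'this sketch assumes exactly 6 images'

  a = slide_values(:, me)
  form team (merge(1, 2, me <= 3), team)

  change team (team)
    call co_sum(a)   ! sums over the images of the current team only
    if (this_image() == 1) &
      print '(a,i0,a,4(1x,i0))', 'team ', team_number(), ' sum:', a
  end team
end program co_sum_teams_sketch
```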
  • 38. Performance of Fortran 2008 User-Defined Collective vs Fortran 2015 co_sum Intrinsic. [Figure: execution time for user-defined collective sum reductions and the Fortran 2015 intrinsic co_sum.] Platform: Hopper at NERSC, a Cray XE6 with a peak performance of 1.28 petaflops, 153,216 compute cores, 212 terabytes of memory, and 2 petabytes of disk.
  • 39. Performance of Fortran 2008 User-Defined Collective vs Fortran 2015 co_broadcast Intrinsic. [Figure: execution time for user-defined collective broadcasts and the Fortran 2015 intrinsic co_broadcast.] Platform: Hopper at NERSC, a Cray XE6 with a peak performance of 1.28 petaflops, 153,216 compute cores, 212 terabytes of memory, and 2 petabytes of disk.
  • 40. Fault Tolerance. Why do we need fault tolerance? Exascale machines will be composed of millions of nodes, and each node will be composed of thousands of cores. MTBF of one node -> MTBF of one million nodes: 1 year -> 30 seconds; 10 years -> 5 minutes; 120 years -> 1 hour. Checkpoint/restart cannot be effective at such high failure rates.
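The figures in this table are consistent with a back-of-the-envelope estimate. The slide does not state its model, so the following assumes independent node failures at a constant rate, in which case the system MTBF is the node MTBF divided by the node count N:

```latex
\mathrm{MTBF}_{\text{system}} \approx \frac{\mathrm{MTBF}_{\text{node}}}{N},
\qquad N = 10^{6}:
\quad
\frac{1\ \text{year}}{10^{6}} \approx \frac{3.16\times 10^{7}\ \text{s}}{10^{6}} \approx 32\ \text{s},
\quad
\frac{10\ \text{years}}{10^{6}} \approx 316\ \text{s} \approx 5\ \text{min},
\quad
\frac{120\ \text{years}}{10^{6}} \approx 3.8\times 10^{3}\ \text{s} \approx 1\ \text{h}.
```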
  • 41. Failed Images. Coarray operations may fail; the STAT= specifier is used to check for correct behavior. The failed-images feature provides support for failure detection: FAIL IMAGE (simulates a failure), IMAGE_STATUS (checks the status of a specific image), FAILED_IMAGES (provides the list of failed images).
  • 44. Failure Detection. [Figure: images 1-6 in two rows of three, holding a(1:4)[1] through a(1:4)[6].] use iso_fortran_env, only : STAT_FAILED_IMAGE; integer :: status; sync all (stat=status); if (status == STAT_FAILED_IMAGE) call fault_tolerant_algorithm()
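A slightly fuller sketch of the detection pattern is shown below, assuming a compiler and runtime that implement failed-image support. Having the last image simulate a failure is a hypothetical scenario chosen for illustration, and the recovery step is left as a comment because the slides do not define `fault_tolerant_algorithm`.

```fortran
program failure_detection_sketch
  ! Sketch only: the simulated failure on the last image is an assumption.
  use iso_fortran_env, only : stat_failed_image
  implicit none
  integer :: status
  integer, allocatable :: failed(:)

  ! With two or more images, the last image simulates a failure.
  if (num_images() > 1 .and. this_image() == num_images()) fail image

  ! Without stat=, a failed image would cause error termination here.
  sync all (stat=status)

  if (status == stat_failed_image) then
    failed = failed_images()   ! indices of the images known to have failed
    print '(a,i0,a,i0,a)', 'image ', this_image(), ' detected ', &
          size(failed), ' failed image(s)'
    ! A recovery routine (e.g., redistributing work) would be called here.
  else
    print '(a,i0,a)', 'image ', this_image(), ' saw no failures'
  end if
end program failure_detection_sketch
```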
  • 46. Scientific Kernels (PRK). Parallel Research Kernels: a suite of representative scientific parallel codes. Kernels: distributed transpose (not examined); peer-to-peer synchronization; 2D stencil (halo exchange).
  • 47. Sync_p2p Kernel. A stencil with a demanding data dependence, resolved using software pipelining techniques. Events provide the right level of granularity for this case.
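The pairwise granularity that events offer can be illustrated with a much simpler pipeline than the actual PRK kernel. In this sketch each image waits only for its left neighbor, adds its own contribution, and signals only its right neighbor. The names `left_done`, `incoming`, and `running_total` are hypothetical.

```fortran
program pipeline_sketch
  ! Sketch only: a one-pass pipeline, not the PRK sync_p2p kernel itself.
  use iso_fortran_env, only : event_type
  implicit none
  type(event_type) :: left_done[*]   ! posted by the left neighbor
  integer :: incoming[*]             ! value written by the left neighbor
  integer :: me, n, running_total

  me = this_image()
  n  = num_images()

  if (me > 1) then
    ! Block only until the left neighbor has delivered its value.
    event wait (left_done)
    running_total = incoming + me
  else
    running_total = me
  end if

  if (me < n) then
    incoming[me + 1] = running_total   ! pass the partial result to the right
    event post (left_done[me + 1])     ! then tell the right neighbor it is there
  else
    print '(a,i0)', 'pipeline total = ', running_total
  end if
end program pipeline_sketch
```

On n images the last image should end up with the sum 1 + 2 + ... + n, and no image ever synchronizes with more than its two neighbors.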
  • 49. Stencil Kernel. 2D stencil operator. One of the most common communication patterns: halo exchange. Events can be used on every side. With sync images or sync all, computation and communication must proceed in separate phases.
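To illustrate how events allow halo communication to overlap with computation, here is a one-dimensional, single-step sketch (the kernel on the slide is two-dimensional). The array size `m`, the three-point average, and the zero boundary values are assumptions made for the example.

```fortran
program halo_exchange_sketch
  ! Sketch only: a 1D, single-step stand-in for the 2D stencil kernel.
  use iso_fortran_env, only : event_type
  implicit none
  integer, parameter :: m = 4        ! interior points per image (assumed)
  real :: u(0:m+1)[*]                ! u(0) and u(m+1) are halo cells
  real :: smoothed(m)
  type(event_type) :: halo_ready[*]
  integer :: me, n, expected, i

  me = this_image()
  n  = num_images()
  u(1:m) = real(me)
  expected = 0                       ! number of neighbors that will post

  if (me > 1) then
    u(m+1)[me-1] = u(1)              ! fill the left neighbor's right halo
    event post (halo_ready[me-1])
    expected = expected + 1
  else
    u(0) = 0.0                       ! assumed physical boundary value
  end if
  if (me < n) then
    u(0)[me+1] = u(m)                ! fill the right neighbor's left halo
    event post (halo_ready[me+1])
    expected = expected + 1
  else
    u(m+1) = 0.0                     ! assumed physical boundary value
  end if

  ! Interior points need no halo data, so this work overlaps communication.
  do i = 2, m - 1
    smoothed(i) = (u(i-1) + u(i) + u(i+1))/3.0
  end do

  ! Wait only for this image's own neighbors, not for all images.
  if (expected > 0) event wait (halo_ready, until_count=expected)

  smoothed(1) = (u(0)   + u(1) + u(2)  )/3.0
  smoothed(m) = (u(m-1) + u(m) + u(m+1))/3.0
  print '(a,i0,a,4f8.3)', 'image ', me, ':', smoothed
end program halo_exchange_sketch
```

A multi-step version would also need a second event, in the spirit of `ok_to_overwrite` from the events slides, so that a neighbor's halo is not overwritten before it has been read.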
  • 50. Stencil Kernel on Stampede. [Figure: MFlops (axis from 2.0e+04 to 2.4e+05) versus cores (16, 32, 64, 128) for two series, SYNC and EVENTS.]
  • 51. Compiler Support & Video Tutorials. GCC 7 + OpenCoarrays supports the Fortran 2008 parallel features plus Fortran 2015 events, collectives, and atomics. Unsupported: teams and failure detection (pending). The Sourcery Institute (SI) Linux virtual machine (VM) contains GCC 7, OpenCoarrays, and numerous useful tools for modern Fortran development. Download the free VM at sourceryinstitute.org/store. The SI VM is used in the Stanford EARTH/CME 214 course. View video tutorials at sourceryinstitute.org/videos. Other compilers: Cray supports Fortran 2008 plus the Fortran 2015 collectives; Intel supports the Fortran 2008 parallel features.
  • 52. Related Research Literature. Coarray Fortran (CAF): see http://www.opencoarrays.org/publications and http://www.sourceryinstitute.org/publications. HPC applications of CAF: (1) Speedup of 33% relative to MPI-2 on 80K cores for European weather model: Mozdzynski, G., Hamrud, M., & Wedi, N. (2015). A Partitioned Global Address Space implementation of the European Centre for Medium Range Weather Forecasts Integrated Forecasting System. International Journal of High Performance Computing Applications, 1094342015576773. (2) Performance competitive with MPI-3 for several applications: Garain, S., Balsara, D. S., & Reid, J. (2015). Comparing Coarray Fortran (CAF) with MPI for several structured mesh PDE applications. Journal of Computational Physics. (3) Speedup of 50% relative to MPI-2 for plasma fusion code on 130,000 cores: Preissl, R., Wichmann, N., Long, B., Shalf, J., Ethier, S., & Koniges, A. (2011, November). Multithreaded global address space communication techniques for gyrokinetic fusion applications on ultra-scale platforms. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (p. 78). ACM.
  • 53. Acknowledgements. Tobias Burnus and Andre Vehreschild, GCC developers; Salvatore Filippone, Cranfield University; Zaak Beekman, Princeton University; Valeria Cardellini, University of Rome “Tor Vergata”; Dan Nagle, NCAR; Kenneth Craft and Jeff Hammond, Intel.