First Experiences with
Application Development
with Fortran 2018
Damian Rouson
Overview
Fortran 2018 in a Nutshell
— Motivation: exascale challenges
— Background: SPMD & PGAS in Fortran 2018
— Beyond CAF


ICAR & Coarray ICAR
WRF-Hydro
Results
Conclusions
Exascale Challenges & Fortran 2018 Response
Billion-way concurrency with high levels of on-chip parallelism
— events
— collective subroutines
— richer set of atomic subroutines
— teams




Expensive data movement
— one-sided communication
— teams (locality control)
Higher failure rates
— failed-image detection
Heterogeneous hardware on processor
— events
Source: Ang, J.A., Barrett, R.F., Benner, R.E., Burke, D., Chan,
C., Cook, J., Donofrio, D., Hammond, S.D., Hemmert, K.S.,
Kelly, S.M. and Le, H., 2014, November. Abstract machine
models and proxy architectures for exascale computing. In
Hardware-Software Co-Design for High Performance Computing
(Co-HPC), 2014 (pp. 25-32). IEEE.
Single Program Multiple Data (SPMD)
Images {processes | threads}: Image 1, Image 2, ..., Image N
[Figure: every image runs the same program and reaches "sync all"]
Images execute asynchronously up to a programmer-specified synchronization:
sync all
sync images
allocate/deallocate
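A minimal sketch of the SPMD model described above, assuming a coarray-enabled Fortran compiler. The program name and the variable `counter` are hypothetical. Every image runs the same program, the allocation of the coarray synchronizes the images implicitly, and `sync all` orders the local definitions before image 1 reads them remotely.

```fortran
program spmd_sketch
  implicit none
  integer, allocatable :: counter[:]   ! hypothetical coarray: one copy per image
  integer :: i

  allocate(counter[*])        ! coarray allocation implies a synchronization
  counter = this_image()      ! each image defines its own copy asynchronously
  sync all                    ! programmer-specified synchronization point

  if (this_image() == 1) then
    do i = 1, num_images()
      if (counter[i] /= i) error stop "unexpected value on a remote image"
    end do
    print *, "all", num_images(), "images checked in"
  end if

  deallocate(counter)         ! coarray deallocation also synchronizes
end program spmd_sketch
```

The program stops with an error if any remote value differs from the owning image's index, so a normal exit indicates that the synchronization ordered the accesses as intended.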
Partitioned Global Address Space (PGAS)
Coarrays integrate with other language features:
Fortran 90 array syntax works on local data.
Communicate objects.
The ability to drop the square brackets harbors two important implications:
Easily determine where communication occurs.
Parallelize legacy code with minimal revisions.
Image 1 Image 2 Image 3
if (me<n) a(1:2) = a(2:3)[me+1]
Image 1 Image 2 Image 3
if (me==1) a(5)[2] = a(5)[3]
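The two statements above can be tried in a small program. This is a sketch under stated assumptions: the array contents, the program name, and the local array `halo` are hypothetical. The neighbor's values are fetched into `halo` rather than into `a(1:2)` so that no image reads elements that a neighbor might be overwriting in the same segment.

```fortran
program coarray_get_sketch
  implicit none
  integer :: a(5)[*]       ! one 5-element array per image
  integer :: halo(2)       ! hypothetical local buffer for fetched values
  integer :: me, n, i

  me = this_image()
  n  = num_images()
  a  = [(10*me + i, i = 1, 5)]   ! no brackets: purely local work
  sync all                       ! every image has defined a

  ! Brackets mark communication: a one-sided get from the next image.
  if (me < n) then
    halo = a(2:3)[me+1]
    if (any(halo /= [10*(me+1) + 2, 10*(me+1) + 3])) error stop "bad get"
  end if

  ! Image 1 copies an element from image 3 to image 2 (needs 3 images).
  if (me == 1 .and. n >= 3) a(5)[2] = a(5)[3]
  sync all

  if (me == 2 .and. n >= 3) then
    if (a(5) /= 35) error stop "bad remote-to-remote copy"
  end if
end program coarray_get_sketch
```

Because square brackets appear only on the lines that communicate, a search for "[" is enough to locate every remote access in the program.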
Segment Ordering: Events
An intrinsic module provides the derived type event_type, which encapsulates an atomic_int_kind integer component default-initialized to zero.
An image increments the event count on a remote image by executing event post.
The remote image obtains the post count by executing event_query.

Statement      Image control   Side effect
event post     x               atomic_add 1
event_query                    defines count
event wait     x               atomic_add -1
Events
[Figure: "Hello, world!" exchange using post, query, and wait on greeting_ready(2:n)[1], followed by a post to ok_to_overwrite[3]]
Performance-oriented constraints:
— Query and wait must be local.
— Post and wait are disallowed in do concurrent constructs.
Pro tips:
— Overlap communication and computation.
— Wherever safety permits, query without waiting.
— Write a spin-query-work loop & build a logical mask describing the remaining work.
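A minimal sketch of the spin-query-work pattern, assuming a compiler and runtime that support Fortran 2018 events. The event names follow the figure labels. The program name, the `greeting` coarray, the `message` buffer, and the `pending` mask are hypothetical. Image 1 queries its local events without blocking and handles whichever greetings are ready.

```fortran
program events_sketch
  use iso_fortran_env, only : event_type
  implicit none
  type(event_type), allocatable :: greeting_ready(:)[:]
  type(event_type) :: ok_to_overwrite[*]
  character(len=32) :: greeting[*]     ! hypothetical message buffer
  character(len=32) :: message
  logical, allocatable :: pending(:)   ! mask describing the remaining work
  integer :: me, n, i, count

  me = this_image()
  n  = num_images()
  allocate(greeting_ready(2:n)[*])     ! implicit synchronization

  if (me /= 1) then
    write(greeting, "(a,i0)") "Hello, world! from image ", me
    event post(greeting_ready(me)[1])  ! remote post: the greeting is ready
    event wait(ok_to_overwrite)        ! local wait: image 1 has read it
  else
    allocate(pending(2:n))
    pending = .true.
    do while (any(pending))            ! spin-query-work loop
      do i = 2, n
        if (.not. pending(i)) cycle
        call event_query(greeting_ready(i), count)  ! local, non-blocking
        if (count > 0) then
          event wait(greeting_ready(i))  ! orders this segment after the post
          message = greeting[i]          ! one-sided get
          print *, trim(message)
          event post(ok_to_overwrite[i])
          pending(i) = .false.
        end if
      end do
    end do
  end if
end program events_sketch
```

The query only reports a count. The subsequent event wait is what orders image 1's read after the sender's write, and it returns immediately because the count is already positive.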
Team
A set of images that can readily execute independently of other images
[Figure: images 1 through 6, each holding a(1:4), partitioned into Team 1 and Team 2]
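A minimal sketch of forming and entering teams, assuming a compiler with Fortran 2018 teams support, which varies between compilers and versions. The program name, the variable names, and the rule that splits the images into two halves are hypothetical. Inside the change team construct, image indices, image counts, and collectives refer to the current team only.

```fortran
program teams_sketch
  use iso_fortran_env, only : team_type
  implicit none
  type(team_type) :: subteam
  integer :: color, members, tally

  ! Hypothetical split: the lower half of the images joins team 1,
  ! the remaining images join team 2.
  color = merge(1, 2, 2*this_image() <= num_images())
  form team(color, subteam)

  change team(subteam)
    members = num_images()    ! number of images in the current team
    tally   = this_image()    ! image index within the current team
    call co_sum(tally)        ! collective over the current team only
    if (tally /= members*(members + 1)/2) error stop "team-local sum mismatch"
    if (team_number() /= color) error stop "unexpected team number"
    if (this_image() == 1) print *, "team", team_number(), "has", members, "images"
  end team
end program teams_sketch
```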
Collective Subroutines
Each non-failed image of the current team must invoke the collective.
After parallel calculation/communication, the result is placed on one or all images.
All collectives have an intent(inout) argument A that holds the input data and may hold the result on return, depending on result_image.
Optional arguments: stat, errmsg, result_image. The co_broadcast subroutine takes a required source_image.
No implicit synchronization at beginning/end, which allows for overlap with other actions.
No image's data is accessed before that image invokes the collective subroutine.
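A minimal sketch of the intrinsic collectives, assuming a coarray-enabled compiler. The module `reduction_ops`, the function `multiply`, and all variable names are hypothetical. The sketch shows a result placed on all images, a result placed on one image through result_image, a broadcast, and a user-defined reduction passed to co_reduce.

```fortran
module reduction_ops
  implicit none
contains
  ! Hypothetical user-defined operation for co_reduce: pure and associative.
  pure function multiply(x, y) result(product_xy)
    integer, intent(in) :: x, y
    integer :: product_xy
    product_xy = x*y
  end function multiply
end module reduction_ops

program collectives_sketch
  use reduction_ops, only : multiply
  implicit none
  integer :: n, total, on_root, seed, sign_product

  n = num_images()

  total = this_image()
  call co_sum(total)                     ! result on all images
  if (total /= n*(n + 1)/2) error stop "co_sum mismatch"

  on_root = this_image()
  call co_sum(on_root, result_image=1)   ! result only on image 1;
  if (this_image() == 1) then            ! on_root is undefined elsewhere
    if (on_root /= n*(n + 1)/2) error stop "co_sum result_image mismatch"
  end if

  seed = 0
  if (this_image() == 1) seed = 42
  call co_broadcast(seed, source_image=1)
  if (seed /= 42) error stop "co_broadcast mismatch"

  sign_product = -1
  call co_reduce(sign_product, multiply) ! user-defined reduction
  if (sign_product /= (-1)**n) error stop "co_reduce mismatch"
end program collectives_sketch
```

Since the collectives do not synchronize at entry or exit, an image may continue with unrelated local work as soon as its own call returns.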
Extensible Set of Collective Subroutines
Performance: User-Defined vs Intrinsic Collectives
[Figure: execution time (sec, log scale from 1e-05 to 1) versus number of images (32 to 512) for co_sum_int32, sum_bin_tree_int32, sum_bin_tree_real64, sum_rec_doubling_int32, sum_rec_doubling_real64, and sum_alpha_tree_real64. Panels: NERSC Hopper, Xeon nodes on a Cray XE6; KNL nodes on a Cray XC.]
Failure Detection
use iso_fortran_env, only : STAT_FAILED_IMAGE
integer :: status
sync all(stat=status)
if (status==STAT_FAILED_IMAGE) call fault_tolerant_algorithm()
[Figure: images 1 through n, each holding a(1:4)]
Fortran 2018 Failed-Image Detection
FAIL IMAGE (simulates a failure)
IMAGE_STATUS (checks the status of a specific image)
FAILED_IMAGES (provides the list of failed images)
Coarray operations may fail. The STAT= specifier is used to check for correct behavior.
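A minimal sketch that combines these features, assuming a compiler and runtime with failed-image support, which is not universal. The program name and the `simulate_failure` switch are hypothetical. The switch is off by default, so the program runs to completion on any coarray-enabled compiler; turning it on makes the last image execute fail image.

```fortran
program failure_sketch
  use iso_fortran_env, only : STAT_FAILED_IMAGE
  implicit none
  ! Hypothetical switch: set to .true. to simulate a failure on the last image.
  logical, parameter :: simulate_failure = .false.
  integer :: status, i

  if (simulate_failure .and. num_images() > 1 &
      .and. this_image() == num_images()) fail image

  sync all(stat=status)       ! without stat=, a failed image ends the run
  if (status == STAT_FAILED_IMAGE) then
    print *, "image", this_image(), "sees failed images:", failed_images()
    do i = 1, num_images()
      if (image_status(i) == STAT_FAILED_IMAGE) then
        print *, "  image", i, "must be excluded from further work"
      end if
    end do
  else if (status /= 0) then
    error stop "unexpected synchronization error"
  end if
end program failure_sketch
```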
The Climate Downscaling Problem
[Figure: topography and precipitation panels; climate model (top row) and application needs (bottom row)]
Computational Cost
High-res regional model: >10 billion core hours (CONUS 4km 150yrs 40 scenarios)
The Intermediate Complexity Atmospheric Research Model (ICAR)
[Figure: ICAR wind field over topography]
Analytical solution for flow over topography (right)
90% of the information for 1% of the computational cost
Core Numerics:
90% of cost = Cloud physics (grid cells independent)
5% of cost = Advection (fully explicit, requires local neighbor communication)
Gutmann et al (2016) J. of Hydrometeorology, 17 p957. doi:10.1175/JHM-D-15-0155.1
ICAR: Intermediate Complexity Atmospheric Research
[Animation: water vapor (blue) and precipitation (green to red) over the contiguous United States]
Output timestep: 1hr
Integration timestep: Variable (~30s)
Boundary Conditions: ERA-interim (historical)
Run on 16 cores with OpenMP
Limited scalability
ICAR comparison to “Full-physics” atmospheric model (WRF)
Ideal Hill case: ICAR (red) and WRF (blue) precipitation over an idealized hill (green)
Real Simulation: WRF (left) and ICAR (right) season total precipitation over Colorado Rockies
ICAR Users and Applications
http://github.com/NCAR/icar.git
Coarray ICAR Mini-App
Object-oriented design
Overlaps communication & computation via one-sided “puts.”
Coarray halo exchanges with 3D, 1st-order upwind advection (~2000 lines of new code)
Collective broadcast of initial data
Cloud microphysics (~5000 lines of pre-existing code)
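A one-dimensional sketch of a coarray halo exchange with first-order upwind advection. It is not taken from the mini-app: the grid size, Courant number, periodic layout, and all names are assumptions chosen for illustration. It also uses sync all for simplicity, whereas the mini-app is described as overlapping communication with computation.

```fortran
program halo_put_sketch
  implicit none
  integer, parameter :: nx = 8, steps = 10   ! hypothetical local grid size
  real, parameter :: courant = 0.5           ! hypothetical Courant number
  real :: u(0:nx)[*]     ! u(0) is the halo cell filled by the upwind neighbor
  real :: total0, total
  integer :: me, n, right, step

  me = this_image()
  n  = num_images()
  right = merge(1, me + 1, me == n)    ! periodic downwind neighbor

  u = 0.
  if (me == 1) u(1:nx) = 1.            ! initial blob on image 1
  total0 = sum(u(1:nx))
  call co_sum(total0)
  sync all                             ! all images initialized before any put

  do step = 1, steps
    u(0)[right] = u(nx)                ! one-sided put into the neighbor's halo
    sync all                           ! halos have arrived
    ! First-order upwind update for flow in the +x direction.
    u(1:nx) = u(1:nx) - courant*(u(1:nx) - u(0:nx-1))
    sync all                           ! halos consumed before the next put
  end do

  total = sum(u(1:nx))
  call co_sum(total)
  if (abs(total - total0) > 1.e-3) error stop "advected quantity not conserved"
  if (me == 1) print *, "total before and after:", total0, total
end program halo_put_sketch
```

With periodic boundaries the upwind scheme conserves the summed field, so the final check serves as a simple test that every halo value arrived before it was used.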

WRF-Hydro: Weather Research Forecast Hydrological Model
Currently, ensemble runs involve redundant initialization calculations that occupy approximately 30% of the runtime and use only the processes allocated to one ensemble member.
Our use of Fortran 2018 teams will eliminate the redundancy and amortize that effort across all of the processes allocated for the full ensemble of simulations.
Mapping Fortran Teams onto MPI
[Figure: timeline for images 1 to 3 relating Fortran statements or procedures (form team(…), end team, team_number(…)) to MPI procedures or variables (color, MPI_Barrier(…), MPI_Comm_Split(…), MPI_Barrier(…))]
Communicator Hand-off (CAF to MPI)
CAF in the driver seat:
CAF compiler embeds MPI_Init, MPI_Finalize
Language extension exposes MPI communicator
MPI under the hood:
Only trivial modification of MPI application
Future work: amortize & speed up initialization
Coarray ICAR: Simulation Time
[Figure: simulation time versus number of processes for 500 x 500 x 20 and 2000 x 2000 x 20 grids, comparing OpenSHMEM vs MPI and puts vs gets]
GCC 6.3 on NCAR “Cheyenne” SGI ICE XA Cluster: 4032 nodes, 2 x 18-core Xeon Broadwell @ 2.3 GHz
Coarray ICAR: Speedup
[Figure: speedup versus number of processes for 500 x 500 x 20 and 2000 x 2000 x 20 grids]
GCC 6.3 on NCAR “Cheyenne” SGI ICE XA Cluster: 4032 nodes, 2 x 18-core Xeon Broadwell @ 2.3 GHz
Coarray ICAR: Xeon vs KNL
[Figure: two panels plotted against number of processes]
Compilers & platforms: GNU on Cheyenne; Cray compiler on Edison (Broadwell), Cori (KNL).
Coarray ICAR: At Scale
[Figure: two panels plotted against number of processes]
Cray compiler on Edison.
Conclusions
Fortran 2018 is a PGAS language supporting SPMD programming at scale.
High productivity pays off: from shared-memory parallelism to 100,000 cores in ~100 person-hours.
Programming-model agnosticism is a life-saver.
CAF interoperates seamlessly with MPI.
Acknowledgements & References
Ethan Gutmann
Alessandro Fanfarillo
James McCreight
Brian Friesen
Rouson, D., Gutmann, E. D., Fanfarillo, A., & Friesen, B. (2017, November). Performance portability of an intermediate-complexity atmospheric research model in coarray Fortran. In Proceedings of the Second Annual PGAS Applications Workshop (p. 4). ACM.
Rouson, D., McCreight, J. L., & Fanfarillo, A. (2017, November). Incremental caffeination of a terrestrial hydrological modeling framework using Fortran 2018 teams. In Proceedings of the Second Annual PGAS Applications Workshop (p. 6). ACM.

A Domino Admins Adventures (Engage 2024)
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 

First Experiences with Parallel Application Development in Fortran 2018

  • 1. First Experiences with Application Development with Fortran 2018 Damian Rouson
  • 3. Overview: Fortran 2018 in a Nutshell (Motivation: exascale challenges; Background: SPMD & PGAS in Fortran 2018; Beyond CAF); ICAR & Coarray ICAR; WRF-Hydro; Results; Conclusions.
  • 4. Exascale Challenges & Fortran 2018 Response. Billion-way concurrency with high levels of on-chip parallelism: events, collective subroutines, richer set of atomic subroutines, teams. Expensive data movement: one-sided communication, teams (locality control). Higher failure rates: failed-image detection. Heterogeneous hardware on processor: events. Source: Ang, J.A., Barrett, R.F., Benner, R.E., Burke, D., Chan, C., Cook, J., Donofrio, D., Hammond, S.D., Hemmert, K.S., Kelly, S.M. and Le, H., 2014, November. Abstract machine models and proxy architectures for exascale computing. In Hardware-Software Co-Design for High Performance Computing (Co-HPC), 2014 (pp. 25-32). IEEE.
  • 11. Single Program Multiple Data (SPMD). Images {processes | threads}: Image 1, Image 2, ..., Image N. Images execute asynchronously up to a programmer-specified synchronization: sync all, sync images, allocate/deallocate.
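The SPMD model on this slide can be sketched with a few lines of standard Fortran. This is a minimal illustration, not code from the talk: the program name and messages are made up. Every image runs the same program, `sync all` acts as a barrier, and `sync images` synchronizes only the listed images. The order of output lines across images is not guaranteed.

```fortran
program spmd_sync
  implicit none
  integer :: me, n

  me = this_image()      ! index of this image, 1..n
  n  = num_images()      ! total number of images

  print '(a,i0,a,i0,a)', "image ", me, " of ", n, " is working asynchronously"

  sync all               ! barrier: no image continues until all have arrived

  if (me == 1) print '(a)', "every image has reached the barrier"

  ! Pairwise synchronization: only images 1 and 2 wait for each other.
  if (n >= 2) then
    if (me == 1) sync images(2)
    if (me == 2) sync images(1)
  end if
end program spmd_sync
```

With gfortran, a single-image build (`-fcoarray=single`) is enough to check that this compiles; a multi-image run needs a coarray runtime such as OpenCoarrays or a vendor compiler.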
  • 14. Partitioned Global Address Space (PGAS). Coarrays integrate with other language features: Fortran 90 array syntax works on local data, and objects can be communicated. The ability to drop the square brackets has two important implications: it is easy to determine where communication occurs, and legacy code can be parallelized with minimal revisions.
  • 15. Image 1, Image 2, Image 3: if (me<n) a(1:2) = a(2:3)[me+1]
  • 20. Image 1, Image 2, Image 3: if (me==1) a(5)[2] = a(5)[3]
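A minimal, self-contained sketch of the square-bracket syntax from the preceding slides follows. It is a variant of the slide statement rather than a copy: the fetched values go into a separate local array `b` (an assumption made here), because reading `a(2)` on a neighbor while that neighbor updates its own `a(2)` in the same segment would be a race.

```fortran
program pgas_get
  implicit none
  integer :: a(3)[*]     ! coarray: every image owns its own a(1:3)
  integer :: b(2)        ! ordinary local array that receives the fetched data
  integer :: me, n

  me = this_image()
  n  = num_images()

  a = 10*me + [1, 2, 3]  ! no square brackets: plain local array syntax
  b = a(1:2)
  sync all               ! neighbors must finish defining a before it is fetched

  ! Square brackets mark the only communication: a one-sided get from image me+1.
  if (me < n) b = a(2:3)[me+1]

  print '(a,i0,a,2(1x,i0))', "image ", me, " holds", b
end program pgas_get
```

Dropping the brackets turns the same statement into ordinary local array code, which is what makes the communication points easy to spot.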
  • 22. Segment Ordering: Events. An intrinsic module (iso_fortran_env) provides the derived type event_type, which encapsulates an atomic_int_kind integer component default-initialized to zero. An image increments the event count on a remote image by executing event post. The remote image obtains the post count by calling event_query. Summary: event post is an image control statement, side effect atomic_add 1; event_query is not an image control statement, defines count; event wait is an image control statement, side effect atomic_add -1.
  • 33. Events: "Hello, world!" example (diagram labels: post, query, wait; greeting_ready(2:n)[1]; ok_to_overwrite[3]). Performance-oriented constraints: query and wait must be local; post and wait are disallowed in do concurrent constructs. Pro tips: overlap communication and computation; wherever safety permits, query without waiting; write a spin-query-work loop and build a logical mask describing the remaining work.
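The spin-query-work pattern from the pro tips could look roughly like the sketch below. It borrows the name `greeting_ready` from the slide; everything else (the `pending` mask, the `greeting` coarray, the messages) is an assumption made for illustration, not the talk's code. Images 2..n post to their slot on image 1; image 1 queries locally without blocking and only waits on events that have already been posted.

```fortran
program events_hello
  use iso_fortran_env, only : event_type
  implicit none
  type(event_type), allocatable :: greeting_ready(:)[:]  ! slots 2..n on image 1 are used
  character(len=32) :: greeting[*]
  logical, allocatable :: pending(:)
  integer :: me, n, i, nposts

  me = this_image()
  n  = num_images()
  allocate(greeting_ready(n)[*])            ! coarray allocation synchronizes all images

  if (me /= 1) then
    write(greeting, '(a,i0)') "Hello from image ", me
    event post(greeting_ready(me)[1])       ! remote post: the greeting is ready
  else
    pending = [(i > 1, i = 1, n)]           ! logical mask of greetings still outstanding
    do while (any(pending))                 ! spin-query-work loop
      do i = 2, n
        if (.not. pending(i)) cycle
        call event_query(greeting_ready(i), nposts)  ! local and non-blocking
        if (nposts > 0) then
          event wait(greeting_ready(i))     ! returns at once and orders the segments
          print '(a)', trim(greeting[i])    ! one-sided get of the greeting
          pending(i) = .false.
        end if
      end do
      ! other useful work could be done here between queries
    end do
    print '(a,i0,a)', "image 1 received ", n - 1, " greetings"
  end if
end program events_hello
```

Both `event_query` and `event wait` act on image 1's own event variables, which matches the constraint that query and wait must be local.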
  • 34. Team: a set of images that can readily execute independently of other images. [Diagram: Images 1 to 6, holding a(1:4)[1] through a(1:4)[6], partitioned into Team 1 and Team 2.]
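A minimal sketch of forming and entering teams is shown below. The split into two halves and the names `half` and `team_id` are assumptions chosen for illustration. Compiler support for teams varies, so treat this as standard Fortran 2018 that a given compiler may or may not accept yet.

```fortran
program teams_demo
  use iso_fortran_env, only : team_type
  implicit none
  type(team_type) :: half
  integer :: me_global, team_id

  me_global = this_image()
  ! Team numbers must be positive: lower half of the images -> 1, upper half -> 2.
  team_id = merge(1, 2, me_global <= (num_images() + 1)/2)

  form team(team_id, half)        ! executed by all images of the current team

  change team(half)
    ! Inside the construct, this_image() and num_images() are team-relative.
    print '(4(a,i0))', "global image ", me_global, " is image ", this_image(), &
      " of ", num_images(), " in team ", team_number()
    sync all                      ! synchronizes only the images of this team
  end team
end program teams_demo
```

Inside the `change team` construct each team behaves like an independent set of images, which is the property the slide highlights.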
  • 35. Collective Subroutines. Each non-failed image of the current team must invoke the collective. After parallel calculation/communication, the result is placed on one or all images. All collectives have an intent(inout) argument A that holds the input data and may hold the result on return, depending on result_image. Optional arguments: stat, errmsg, result_image, source_image. There is no implicit synchronization at beginning/end, which allows for overlap with other actions. No image’s data is accessed before that image invokes the collective subroutine.
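As a small sketch of the two intrinsic collectives used most often, the program below broadcasts a value from image 1 and then sums one integer per image onto image 1. The variable names and the value 30.0 are illustrative assumptions.

```fortran
program collectives_demo
  implicit none
  integer :: me, n, total, status
  real :: dt

  me = this_image()
  n  = num_images()

  ! co_broadcast: image 1 supplies the value, every image receives it.
  dt = 0.0
  if (me == 1) dt = 30.0
  call co_broadcast(dt, source_image=1)
  if (dt /= 30.0) error stop "broadcast failed"

  ! co_sum: the intent(inout) argument holds the input and, on result_image, the result.
  total = me
  call co_sum(total, result_image=1, stat=status)
  if (status /= 0) error stop "co_sum failed"

  ! On images other than result_image, total is undefined after the call.
  if (me == 1) then
    if (total /= n*(n + 1)/2) error stop "unexpected sum"
    print '(a,i0)', "sum of image indices: ", total
  end if
end program collectives_demo
```

Omitting `result_image` would leave the sum on every image instead of only on image 1.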
  • 37. Performance: User-Defined vs Intrinsic Collectives. [Plot: execution time (sec, 1e-05 to 1) vs. number of images (32 to 512) for co_sum_int32, sum_bin_tree_int32, sum_bin_tree_real64, sum_rec_doubling_int32, sum_rec_doubling_real64, sum_alpha_tree_real64. Platform labels: NERSC Hopper: Xeon nodes on Cray XE6; KNL nodes on a Cray XC.]
  • 38. Failure Detection: use iso_fortran_env, only : STAT_FAILED_IMAGE; integer :: status; sync all(stat=status); if (status==STAT_FAILED_IMAGE) call fault_tolerant_algorithm(). [Diagram: Image 1 through Image n, holding a(1:4)[1] through a(1:4)[n].]
  • 41. Fortran 2018 Failed-Image Detection: FAIL IMAGE (simulates a failure); IMAGE_STATUS (checks the status of a specific image); FAILED_IMAGES (provides the list of failed images). Coarray operations may fail; the STAT= specifier is used to check for correct behavior.
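The failed-image features listed above might be combined as in the following sketch. The `simulate_failure` switch is a hypothetical testing aid, and the recovery step is left as a comment because the talk does not show a fault-tolerant algorithm. Support for `fail image` and `failed_images` depends on the compiler and runtime.

```fortran
program failure_demo
  use iso_fortran_env, only : stat_failed_image
  implicit none
  logical, parameter :: simulate_failure = .false.  ! hypothetical switch for testing
  integer :: status

  ! FAIL IMAGE makes this image behave as if it had failed.
  if (simulate_failure .and. num_images() > 1 .and. this_image() == 2) fail image

  sync all(stat=status)     ! with stat=, a failed image is reported instead of aborting

  if (status == stat_failed_image) then
    print '(a,i0,a,*(1x,i0))', "image ", this_image(), &
      " sees failed images:", failed_images()
    ! a fault-tolerant algorithm would take over here
  else if (status /= 0) then
    error stop "sync all reported an unexpected error"
  else
    if (this_image() == 1) print '(a)', "no failed images detected"
  end if
end program failure_demo
```

`image_status(i)` could be used in place of `failed_images()` when only one specific image needs to be checked.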
  • 43. The Climate Downscaling Problem. [Figure: precipitation and topography panels; climate model (top row) and application needs (bottom row).] Computational cost: high-res regional model, >10 billion core hours (CONUS, 4 km, 150 yrs, 40 scenarios).
  • 44. The Intermediate Complexity Atmospheric Research Model (ICAR): 90% of the information for 1% of the computational cost. [Figure: ICAR wind field over topography; analytical solution for flow over topography (right).] Core numerics: 90% of cost = cloud physics (grid cells independent); 5% of cost = advection (fully explicit, requires local neighbor communication). Gutmann et al. (2016), J. of Hydrometeorology, 17, p957. doi:10.1175/JHM-D-15-0155.1
  • 45. ICAR (Intermediate Complexity Atmospheric Research). [Animation: water vapor (blue) and precipitation (green to red) over the contiguous United States.] Output timestep: 1 hr. Integration timestep: variable (~30 s). Boundary conditions: ERA-interim (historical). Run on 16 cores with OpenMP; limited scalability.
  • 47. ICAR comparison to “full-physics” atmospheric model (WRF). Ideal hill case: ICAR (red) and WRF (blue) precipitation over an idealized hill (green). Real simulation: WRF (left) and ICAR (right) season-total precipitation over the Colorado Rockies.
  • 48. ICAR Users and Applications http://github.com/NCAR/icar.git
  • 49. Coarray ICAR Mini-App: object-oriented design; overlaps communication & computation via one-sided “puts”; coarray halo exchanges with 3D, 1st-order upwind advection (~2000 lines of new code); collective broadcast of initial data; cloud microphysics (~5000 lines of pre-existing code).
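To make the halo-exchange idea concrete, here is a deliberately simplified sketch: 1D instead of 3D, periodic neighbors, a constant positive wind, and `sync all` as the only synchronization. None of this is taken from the mini-app source; the grid size, step count, and Courant number are assumptions. Each image puts its last cell into its downwind neighbor's halo and updates the cells that need no halo data while the put is in flight.

```fortran
program halo_put
  implicit none
  integer, parameter :: nx = 8          ! cells per image (hypothetical size)
  integer, parameter :: nsteps = 4
  real,    parameter :: courant = 0.5   ! u*dt/dx with u > 0 (assumed constant)
  real :: q(0:nx)[*]                    ! q(1:nx) interior, q(0) upwind halo cell
  real :: total
  integer :: me, n, right, step

  me = this_image()
  n  = num_images()
  right = merge(1, me + 1, me == n)     ! periodic downwind neighbor

  q = real(me)
  sync all                              ! all halos initialized before the first put

  do step = 1, nsteps
    q(0)[right] = q(nx)                 ! one-sided put into the neighbor's halo
    ! These cells need no halo data, so this work can overlap the put.
    q(2:nx) = q(2:nx) - courant*(q(2:nx) - q(1:nx-1))
    sync all                            ! my own halo cell has been filled
    q(1) = q(1) - courant*(q(1) - q(0)) ! first-order upwind at the boundary cell
    sync all                            ! halos consumed; safe to overwrite next step
  end do

  ! Periodic upwind advection conserves the global sum, which gives a cheap check.
  total = sum(q(1:nx))
  call co_sum(total, result_image=1)
  if (me == 1) then
    if (abs(total - nx*n*(n + 1)/2.0) > 1.0e-3*total) error stop "sum not conserved"
    print '(a,f0.3)', "global sum after advection: ", total
  end if
end program halo_put
```

A production code would likely replace the global `sync all` barriers with neighbor-only synchronization (`sync images` or events) so that distant images do not wait on each other.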
  • 51. WRF-Hydro (Weather Research Forecast, Hydrological Model). Currently, ensemble runs involve redundant initialization calculations that occupy approximately 30% of the runtime and use only the processes allocated to one ensemble member. Our use of Fortran 2018 teams will eliminate the redundancy and amortize that effort across all of the processes allocated for the full ensemble of simulations.
  • 53. Mapping Fortran Teams onto MPI. [Timeline diagram, steps 1 to 3. Fortran statements or procedures: form team(…), end team, team_number(…). MPI procedures or variables: MPI_Barrier(…), MPI_Comm_Split(…), MPI_Barrier(…), color.]
  • 56. Communicator Hand-off (CAF to MPI). CAF in the driver seat: the CAF compiler embeds MPI_Init and MPI_Finalize; a language extension exposes the MPI communicator. MPI under the hood: only trivial modification of the MPI application. Future work: amortize & speed up initialization.
  • 58. Coarray ICAR Simulation Time. [Plots: simulation time vs. number of processes for a 500 x 500 x 20 grid and a 2000 x 2000 x 20 grid; comparisons: OpenSHMEM vs MPI, puts vs gets.] GCC 6.3 on NCAR “Cheyenne” SGI ICE XA Cluster: 4032 nodes, 2 x 18-core Xeon Broadwell @ 2.3 GHz.
  • 59. Coarray ICAR Speedup. [Plots: speedup vs. number of processes for a 500 x 500 x 20 grid and a 2000 x 2000 x 20 grid.] GCC 6.3 on NCAR “Cheyenne” SGI ICE XA Cluster: 4032 nodes, 2 x 18-core Xeon Broadwell @ 2.3 GHz.
  • 60. Coarray ICAR: Xeon vs KNL. [Plots vs. number of processes.] Compilers & platforms: GNU on Cheyenne; Cray compiler on Edison (Broadwell), Cori (KNL).
  • 61. Coarray ICAR at Scale. [Plots vs. number of processes.] Cray compiler on Edison.
  • 62. Conclusions. Fortran 2018 is a PGAS language supporting SPMD programming at scale. High productivity pays off: from shared-memory parallelism to 100,000 cores in ~100 person-hours. Programming-model agnosticism is a life-saver. CAF interoperates seamlessly with MPI.
  • 63. Acknowledgements & References. Acknowledgements: Ethan Gutmann, Alessandro Fanfarillo, James McCreight, Brian Friesen. References: Rouson, D., Gutmann, E. D., Fanfarillo, A., & Friesen, B. (2017, November). Performance portability of an intermediate-complexity atmospheric research model in coarray Fortran. In Proceedings of the Second Annual PGAS Applications Workshop (p. 4). ACM. Rouson, D., McCreight, J. L., & Fanfarillo, A. (2017, November). Incremental caffeination of a terrestrial hydrological modeling framework using Fortran 2018 teams. In Proceedings of the Second Annual PGAS Applications Workshop (p. 6). ACM.