Benchmarking the Parallel 1D Heat Equation Solver in
Chapel, Charm++, C++, HPX, Go, Julia, Python, Rust,
Swift, and Java
Patrick Diehl, Max Morris, Steven R. Brandt, Nikunj Gupta and
Hartmut Kaiser
Center of Computation and Technology
Department of Physics and Astronomy
Louisiana State University
patrickdiehl@lsu.edu
August 28, 2023
P. Diehl and et al. (LSU) August 28, 2023 1 / 26
Motivation
Rank  Language  Rating   Change
1     Python    13.33%   -2.30%
3     C++       11.41%   +0.49%
4     Java      10.33%   -2.24%
12    Go        1.16%    +0.20%
18    Swift     0.90%    -0.35%
19    Rust      0.89%    +0.32%
20    Julia     0.85%    +0.41%
Table: TIOBE Index for August 2023
Chapel is not listed in the index.
Charm++ and HPX are based on C++.
How do these languages compare?
Overview
1 Model problem
2 Features of the approaches
3 Productivity
4 Performance measurements
5 Conclusion and Outlook
Model problem
Model problem I
The one-dimensional heat equation on a 1-D loop (e.g., a limp noodle) of
length $L$, with $0 \le x < L$, is described for all times $t > 0$ by

\[ \frac{\partial u}{\partial t} = \alpha \frac{\partial^2 u}{\partial x^2}, \quad 0 \le x < L, \; t > 0, \tag{1} \]

with $\alpha$ as the material's diffusivity. For the discretization in space, we use
the $N$ grid points $x = \{x_i = i \cdot h \in \mathbb{R} \mid i = 0, \ldots, N - 1\}$ with the grid
spacing $h$, and second-order finite differencing. For the discretization in
time, we use the Euler method, i.e.

\[ u(t + \delta t, x_i) = u(t, x_i) + \delta t \cdot \alpha \, \frac{u(t, x_{i-1}) - 2 \cdot u(t, x_i) + u(t, x_{i+1})}{h^2}, \tag{2} \]

with the initial condition $u(0, x_i) = x_i$. To model a loop, we use periodic
boundary conditions, i.e. $u(t, x) = u(t, L + x)$.
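As an illustration of the update in Eq. 2 (not the code used in the benchmark), a minimal serial Python sketch could look as follows. It uses the standard second-order central difference, which divides by the squared grid spacing. The function names `heat_step` and `solve`, the choice h = L / N, and the default time step are assumptions made for this sketch.

```python
def heat_step(u, alpha, dt, h):
    """Apply one explicit Euler step on a periodic 1-D grid."""
    n = len(u)
    factor = alpha * dt / (h * h)
    # u[i - 1] wraps to the last point for i == 0 (Python negative index);
    # (i + 1) % n wraps to the first point for i == n - 1.
    return [
        u[i] + factor * (u[i - 1] - 2.0 * u[i] + u[(i + 1) % n])
        for i in range(n)
    ]


def solve(n, steps, alpha=1.0, length=1.0, dt=None):
    """Run `steps` Euler steps starting from u(0, x_i) = x_i."""
    h = length / n  # assumed spacing for N points on a loop of length L
    if dt is None:
        # Assumed default; it satisfies the usual stability bound
        # dt <= h^2 / (2 * alpha) for the explicit scheme.
        dt = 0.25 * h * h / alpha
    u = [i * h for i in range(n)]
    for _ in range(steps):
        u = heat_step(u, alpha, dt, h)
    return u
```

With periodic boundaries the update only redistributes heat, so the sum of all grid values should stay constant up to rounding, which makes a convenient sanity check.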
Model problem II
The parallel algorithm was implemented by having multiple threads of
execution each sequentially applying Eq. 2 on a local segment of the grid.
We used queues to communicate ghost zones between the segments. We
note that for this problem, the queues are single-producer, single-consumer
and, therefore, in principle, don’t need synchronization (although
synchronization to suspend/resume threads seemed to help in some cases).
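The scheme described above can be sketched in Python with one thread per segment and one queue per direction between neighbors. This is only an illustration under stated assumptions, not one of the benchmarked implementations: the names `worker` and `run_parallel` are hypothetical, `factor` stands for α · δt / h², and the thread-safe `queue.Queue` is used for simplicity even though each queue has a single producer and a single consumer.

```python
import queue
import threading


def worker(rank, nworkers, segment, steps, factor, left_in, right_in, results):
    """Advance one segment, exchanging ghost values with both neighbors."""
    left = (rank - 1) % nworkers
    right = (rank + 1) % nworkers
    u = list(segment)
    for _ in range(steps):
        # My first value is the right ghost of my left neighbor,
        # my last value is the left ghost of my right neighbor.
        right_in[left].put(u[0])
        left_in[right].put(u[-1])
        # Block until both neighbors have delivered their ghost values.
        ghost_left = left_in[rank].get()
        ghost_right = right_in[rank].get()
        padded = [ghost_left] + u + [ghost_right]
        u = [
            padded[i] + factor * (padded[i - 1] - 2.0 * padded[i] + padded[i + 1])
            for i in range(1, len(padded) - 1)
        ]
    results[rank] = u


def run_parallel(u0, nworkers, steps, factor):
    """Split u0 into nworkers segments (requires len(u0) >= nworkers)."""
    n = len(u0)
    bounds = [n * r // nworkers for r in range(nworkers + 1)]
    left_in = [queue.Queue() for _ in range(nworkers)]   # values from the left neighbor
    right_in = [queue.Queue() for _ in range(nworkers)]  # values from the right neighbor
    results = [None] * nworkers
    threads = [
        threading.Thread(
            target=worker,
            args=(r, nworkers, u0[bounds[r]:bounds[r + 1]], steps, factor,
                  left_in, right_in, results),
        )
        for r in range(nworkers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [value for seg in results for value in seg]
```

Because the queues are first-in, first-out and a worker cannot finish a step without both ghost values, no segment can run more than one step ahead of its neighbors, so no additional barrier is needed in this sketch.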
Features of the approaches
Overview
Approach Async Coroutine ParAlg Win Linux Mac Licence
C++ 17 X X X X X X GNU
Java X X X X X X GNU
Swift X X X X X X Apache
Chapel X X ∼ X X X Apache
Charm++ X ∼ X X X X Own
HPX X X X X X X Boost
Go X X X X X X BSD
Python X X X X X X BSD
Julia X X X X X X MIT
Rust X X X X X X MIT
Table: Overview of the programming languages: (1) the parallelism approaches
they provide, (2) the supported operating systems, and (3) the license. The C++ 17
standard was used as a base. The symbol ∼ indicates partial support.
Chapel
We had to write our own queue; the full/empty bit
synchronization mechanism was helpful.
The coforall loop, which assigns a different thread to each iteration,
provided a convenient mechanism for launching the outer loop.
Chapel also lacked a built-in way to append to a file. However,
opening a file, seeking to the end, and writing is possible.
The support we received in response to questions asked on the
Chapel Gitter was exceptional.
We found Chapel to be among the higher-performing codes, comparable to Rust
or C++.
Go
We use go func to launch worker threads (goroutines) and buffered
channels, created with make(), to exchange ghost zones. To
synchronize the goroutines, we use sync.WaitGroup: we register each
goroutine by calling waitGroup.Add() and wait for all of them by
calling waitGroup.Wait().
At the time of this writing, only biogo, an HPC bioinformatics toolkit
[1], is available.
Reference
1. Köster, J.: Rust-bio: a fast and safe bioinformatics library. Bioinformatics 32(3), 444–446 (2016)
Julia
Julia is clearly inspired by both Python and Fortran. It is a good choice for
Fortran programmers who want to get into scripting, as it offers
some familiarity: array indexes start at one by default
(instead of zero), and end marks the end of a block.
In our Julia code, we implemented our own queue. Since Julia does
not support classes directly (though it has structs), we found it
convenient to use arrays. For parallelism, we used Julia's
Threads.@threads for-loop macro.
Julia's community contacted us and provided some optimized code.
However, these optimizations require confidence in Julia and knowledge
of its internals.
Rust
We use std::thread::scope to launch worker threads, and
non-blocking channels from std::sync::mpsc to facilitate the
exchange of ghost zones.
We avoided using unsafe, working only in the safe subset of Rust.
Only two scientific codes (molecular dynamics and bioinformatics) are
using Rust.
Because of its guarantees concerning data races and memory
access, as well as its high performance, Rust is a potentially good choice
for new scientific programming projects.
However, Rust's syntax and semantics differ vastly from those of more
traditional languages like C++, Java, and Python, which may make
for a steep learning curve.
Swift
Swift claims to be safe by design and to produce lightning-fast software.
Unfortunately, we had to disable the safety features to get
performant code.
We used UnsafeMutableBufferPointer<Double> to avoid unnecessary calls of
await when accessing the elements of arrays. These buffers allow
explicit vectorization on newer x86 and Apple Silicon; see, for
example, addingProduct. However, we could not measure a
significant improvement using these functions.
For concurrency, we use await withTaskGroup with a body closure
{ group in ... } to launch chunks of work on each thread, and
for await _ in group {} to wait for their completion.
We found that Swift is designed for application development on iOS or
macOS, but not for numerical applications.
Productivity
Lines of code
Figure: Lines of code (LOC) per approach (bar chart, axis from 0 to 200) for
Python, Swift, HPX, Julia, Go, Rust, Chapel, Charm++, C++ 17, and Java.
The numbers were determined with the Linux tool cloc.
Productivity metric
Average of the computation time
\[ T_{\text{average}}(\text{approach}) := \frac{T_{2}(\text{approach}) + T_{20}(\text{approach}) + T_{40}(\text{approach})}{3} \]
Constructive Cost Model (COCOMO)
COCOMO does not reflect parallel features
However, the HPC community never proposed its own cost model
We map both metrics to the interval [−1, 1] using
Easy and Difficult for the costs
Slow and Fast for computation time
References
1. Barry, B., et al.: Software engineering economics. New York 197 (1981)
2. Stutzke, R.D., Crosstalk, M.: Software estimating technology: A survey. Los. Alamitos, CA: IEEE Computer Society
Press (1997)
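The two ingredients of the metric can be sketched in Python as follows. This is a hedged illustration, not the authors' evaluation script: the slide does not state which COCOMO variant or which mapping to [−1, 1] was used, so the sketch assumes the basic COCOMO effort formula for "organic" projects (effort = 2.4 · KLOC^1.05 person-months) and a linear min-max rescaling. All function names are hypothetical.

```python
def average_time(t2, t20, t40):
    """Average of the three measured computation times."""
    return (t2 + t20 + t40) / 3.0


def cocomo_effort(loc, a=2.4, b=1.05):
    """Basic COCOMO effort in person-months (assumed 'organic' coefficients)."""
    return a * (loc / 1000.0) ** b


def map_to_interval(values):
    """Linearly rescale a dict of values so that min -> -1 and max -> +1.

    Assumed mapping; the formula used on the slide is not recoverable.
    """
    lo = min(values.values())
    hi = max(values.values())
    if hi == lo:
        raise ValueError("all values are equal; rescaling is undefined")
    return {name: 2.0 * (v - lo) / (hi - lo) - 1.0 for name, v in values.items()}
```

Applied to the costs, the rescaled axis would run from Easy (−1) to Difficult (+1), and applied to the averaged times from Fast (−1) to Slow (+1); the orientation is a choice of this sketch.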
Productivity
Figure: 2D classification using the computational time (Slow to Fast) and the
COCOMO model (Easy to Difficult) for Python, Go, Julia, Rust, Chapel, C++ 17,
HPX, Charm++, Swift, and Java.
Performance measurements
AMD EPYC 7H12
Figure: Time [s] (logarithmic axis) versus #cores (0 to 40) for nx=1000000 and
nt=1000; curves for go, python, swift, rust, chapel, cxx, hpx, julia, charm++,
and java.
Intel® Xeon® Gold 6148 Skylake
Figure: Time [s] (logarithmic axis) versus #cores (0 to 40) for nx=1000000 and
nt=1000; curves for go, python, swift, rust, chapel, cxx, hpx, julia, charm++,
and java.
A64FX
Figure: Time [s] (logarithmic axis) versus #cores (0 to 40) for nx=1000000 and
nt=1000; curves for go, python, rust, chapel, cxx, hpx, julia, charm++, and java.
Swift is missing, since no package was available for Rocky Linux.
Summary of performance measurements
Table: R² correlation of the fit of the measured data points for all approaches and
architectures, computed using Python NumPy.
Arch   C++   Charm++  Chapel  Rust  Go    Julia  HPX   Swift  Python  Java
Intel  0.49  0.36     0.45    0.52  0.28  0.41   0.52  0.56   0.43    0.03
AMD    0.48  0.45     0.53    0.49  0.75  0.12   0.42  0.02   0.46    0.12
A64FX  0.49  0.52     0.08    0.40  0.52  0.42   0.73  –      0.90    0.32
Python was the slowest approach.
Swift and Julia are comparable.
For more than 10 threads, Go behaves slightly better than Swift and Julia.
For smaller core counts, up to eight cores, the remaining approaches behave
similarly.
However, Chapel gets slower for higher core counts.
For Rust, Charm++, and HPX, the performance is comparable. HPX is the fastest
for larger core counts, but has a high variance; see R² in the table above.
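For readers who want to reproduce such R² values, a minimal NumPy sketch is given below. The slides do not state which model was fitted to the measurements, so the polynomial least-squares fit via `np.polyfit` (linear by default) is an assumption, and the helper name `r_squared` is hypothetical.

```python
import numpy as np


def r_squared(x, y, degree=1):
    """Coefficient of determination of a polynomial least-squares fit.

    Assumption: the fit model is a polynomial of the given degree;
    the model used for the table is not stated in the slides.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)          # least-squares coefficients
    predicted = np.polyval(coeffs, x)          # fitted values at the sample points
    ss_res = np.sum((y - predicted) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)     # total sum of squares
    return 1.0 - ss_res / ss_tot
```

Here `x` would hold the core counts and `y` the measured times for one approach on one architecture; values close to 1 indicate that the fit explains most of the variation, while values close to 0 indicate strongly scattered measurements.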
Conclusion and Outlook
Conclusion
We will not name a winner concerning speed.
The higher performing platforms were mostly similar in what they
achieved.
The tests in this paper depend on the
hardware, the version of the interpreters and compilers, the particular
problem chosen,
the amount of effort applied, and our level of expertise (which varied
by platform).
Outlook
More numerical applications for a more comprehensive comparison
Distributed runs and GPU support
I am happy to answer any of your questions.
More Related Content

Similar to Benchmarking the Parallel 1D Heat Equation Solver in Chapel, Charm++, C++, HPX, Go, Julia, Python, Rust, Swift, and Java

Os Reindersfinal
Os ReindersfinalOs Reindersfinal
Os Reindersfinal
oscon2007
 
Os Reindersfinal
Os ReindersfinalOs Reindersfinal
Os Reindersfinal
oscon2007
 
Enabling Congestion Control Using Homogeneous Archetypes
Enabling Congestion Control Using Homogeneous ArchetypesEnabling Congestion Control Using Homogeneous Archetypes
Enabling Congestion Control Using Homogeneous Archetypes
James Johnson
 
A peek on numerical programming in perl and python e christopher dyken 2005
A peek on numerical programming in perl and python  e christopher dyken  2005A peek on numerical programming in perl and python  e christopher dyken  2005
A peek on numerical programming in perl and python e christopher dyken 2005
Jules Krdenas
 
Producer consumer-problems
Producer consumer-problemsProducer consumer-problems
Producer consumer-problems
Richard Ashworth
 
Future Programming Language
Future Programming LanguageFuture Programming Language
Future Programming Language
YLTO
 

Similar to Benchmarking the Parallel 1D Heat Equation Solver in Chapel, Charm++, C++, HPX, Go, Julia, Python, Rust, Swift, and Java (20)

Os Reindersfinal
Os ReindersfinalOs Reindersfinal
Os Reindersfinal
 
Os Reindersfinal
Os ReindersfinalOs Reindersfinal
Os Reindersfinal
 
[COSCUP 2023] 我的Julia軟體架構演進之旅
[COSCUP 2023] 我的Julia軟體架構演進之旅[COSCUP 2023] 我的Julia軟體架構演進之旅
[COSCUP 2023] 我的Julia軟體架構演進之旅
 
Enabling Congestion Control Using Homogeneous Archetypes
Enabling Congestion Control Using Homogeneous ArchetypesEnabling Congestion Control Using Homogeneous Archetypes
Enabling Congestion Control Using Homogeneous Archetypes
 
20 26
20 26 20 26
20 26
 
Low complexity low-latency architecture for matching
Low complexity low-latency architecture for matchingLow complexity low-latency architecture for matching
Low complexity low-latency architecture for matching
 
Voip
VoipVoip
Voip
 
A New Partnership for Cross-Scale, Cross-Domain eScience
A New Partnership for Cross-Scale, Cross-Domain eScienceA New Partnership for Cross-Scale, Cross-Domain eScience
A New Partnership for Cross-Scale, Cross-Domain eScience
 
Scimakelatex.93126.cocoon.bobbin
Scimakelatex.93126.cocoon.bobbinScimakelatex.93126.cocoon.bobbin
Scimakelatex.93126.cocoon.bobbin
 
Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9
 
The effect of distributed archetypes on complexity theory
The effect of distributed archetypes on complexity theoryThe effect of distributed archetypes on complexity theory
The effect of distributed archetypes on complexity theory
 
Accelerating Random Forests in Scikit-Learn
Accelerating Random Forests in Scikit-LearnAccelerating Random Forests in Scikit-Learn
Accelerating Random Forests in Scikit-Learn
 
A peek on numerical programming in perl and python e christopher dyken 2005
A peek on numerical programming in perl and python  e christopher dyken  2005A peek on numerical programming in perl and python  e christopher dyken  2005
A peek on numerical programming in perl and python e christopher dyken 2005
 
An Effective PSO-inspired Algorithm for Workflow Scheduling
An Effective PSO-inspired Algorithm for Workflow Scheduling An Effective PSO-inspired Algorithm for Workflow Scheduling
An Effective PSO-inspired Algorithm for Workflow Scheduling
 
DCE: A NOVEL DELAY CORRELATION MEASUREMENT FOR TOMOGRAPHY WITH PASSIVE REAL...
DCE: A NOVEL DELAY CORRELATION  MEASUREMENT FOR TOMOGRAPHY WITH PASSIVE  REAL...DCE: A NOVEL DELAY CORRELATION  MEASUREMENT FOR TOMOGRAPHY WITH PASSIVE  REAL...
DCE: A NOVEL DELAY CORRELATION MEASUREMENT FOR TOMOGRAPHY WITH PASSIVE REAL...
 
Producer consumer-problems
Producer consumer-problemsProducer consumer-problems
Producer consumer-problems
 
A methodology for the study of fiber optic cables
A methodology for the study of fiber optic cablesA methodology for the study of fiber optic cables
A methodology for the study of fiber optic cables
 
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of Us
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of UsPossible Worlds Explorer: Datalog & Answer Set Programming for the Rest of Us
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of Us
 
post119s1-file2
post119s1-file2post119s1-file2
post119s1-file2
 
Future Programming Language
Future Programming LanguageFuture Programming Language
Future Programming Language
 

More from Patrick Diehl

Framework for Extensible, Asynchronous Task Scheduling (FEATS) in Fortran
Framework for Extensible, Asynchronous Task Scheduling (FEATS) in FortranFramework for Extensible, Asynchronous Task Scheduling (FEATS) in Fortran
Framework for Extensible, Asynchronous Task Scheduling (FEATS) in Fortran
Patrick Diehl
 
A tale of two approaches for coupling nonlocal and local models
A tale of two approaches for coupling nonlocal and local modelsA tale of two approaches for coupling nonlocal and local models
A tale of two approaches for coupling nonlocal and local models
Patrick Diehl
 
Interactive C++ code development using C++Explorer and GitHub Classroom for e...
Interactive C++ code development using C++Explorer and GitHub Classroom for e...Interactive C++ code development using C++Explorer and GitHub Classroom for e...
Interactive C++ code development using C++Explorer and GitHub Classroom for e...
Patrick Diehl
 
Porting our astrophysics application to Arm64FX and adding Arm64FX support us...
Porting our astrophysics application to Arm64FX and adding Arm64FX support us...Porting our astrophysics application to Arm64FX and adding Arm64FX support us...
Porting our astrophysics application to Arm64FX and adding Arm64FX support us...
Patrick Diehl
 
An asynchronous and task-based implementation of peridynamics utilizing HPX—t...
An asynchronous and task-based implementation of peridynamics utilizing HPX—t...An asynchronous and task-based implementation of peridynamics utilizing HPX—t...
An asynchronous and task-based implementation of peridynamics utilizing HPX—t...
Patrick Diehl
 
Quasistatic Fracture using Nonliner-Nonlocal Elastostatics with an Analytic T...
Quasistatic Fracture using Nonliner-Nonlocal Elastostatics with an Analytic T...Quasistatic Fracture using Nonliner-Nonlocal Elastostatics with an Analytic T...
Quasistatic Fracture using Nonliner-Nonlocal Elastostatics with an Analytic T...
Patrick Diehl
 
Google Summer of Code mentor summit 2020 - Session 2 - Open Science and Open ...
Google Summer of Code mentor summit 2020 - Session 2 - Open Science and Open ...Google Summer of Code mentor summit 2020 - Session 2 - Open Science and Open ...
Google Summer of Code mentor summit 2020 - Session 2 - Open Science and Open ...
Patrick Diehl
 

More from Patrick Diehl (18)

Evaluating HPX and Kokkos on RISC-V using an Astrophysics Application Octo-Tiger
Evaluating HPX and Kokkos on RISC-V using an Astrophysics Application Octo-TigerEvaluating HPX and Kokkos on RISC-V using an Astrophysics Application Octo-Tiger
Evaluating HPX and Kokkos on RISC-V using an Astrophysics Application Octo-Tiger
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Evaluating HPX and Kokkos on RISC-V Using an Astrophysics Application Octo-Tiger
Evaluating HPX and Kokkos on RISC-V Using an Astrophysics Application Octo-TigerEvaluating HPX and Kokkos on RISC-V Using an Astrophysics Application Octo-Tiger
Evaluating HPX and Kokkos on RISC-V Using an Astrophysics Application Octo-Tiger
 
D-HPC Workshop Panel : S4PST: Stewardship of Programming Systems and Tools
D-HPC Workshop Panel : S4PST: Stewardship of Programming Systems and ToolsD-HPC Workshop Panel : S4PST: Stewardship of Programming Systems and Tools
D-HPC Workshop Panel : S4PST: Stewardship of Programming Systems and Tools
 
Subtle Asynchrony by Jeff Hammond
Subtle Asynchrony by Jeff HammondSubtle Asynchrony by Jeff Hammond
Subtle Asynchrony by Jeff Hammond
 
Framework for Extensible, Asynchronous Task Scheduling (FEATS) in Fortran
Framework for Extensible, Asynchronous Task Scheduling (FEATS) in FortranFramework for Extensible, Asynchronous Task Scheduling (FEATS) in Fortran
Framework for Extensible, Asynchronous Task Scheduling (FEATS) in Fortran
 
JOSS and FLOSS for science: Examples for promoting open source software and s...
JOSS and FLOSS for science: Examples for promoting open source software and s...JOSS and FLOSS for science: Examples for promoting open source software and s...
JOSS and FLOSS for science: Examples for promoting open source software and s...
 
A tale of two approaches for coupling nonlocal and local models
A tale of two approaches for coupling nonlocal and local modelsA tale of two approaches for coupling nonlocal and local models
A tale of two approaches for coupling nonlocal and local models
 
Challenges for coupling approaches for classical linear elasticity and bond-b...
Challenges for coupling approaches for classical linear elasticity and bond-b...Challenges for coupling approaches for classical linear elasticity and bond-b...
Challenges for coupling approaches for classical linear elasticity and bond-b...
 
Quantifying Overheads in Charm++ and HPX using Task Bench
Quantifying Overheads in Charm++ and HPX using Task BenchQuantifying Overheads in Charm++ and HPX using Task Bench
Quantifying Overheads in Charm++ and HPX using Task Bench
 
Interactive C++ code development using C++Explorer and GitHub Classroom for e...
Interactive C++ code development using C++Explorer and GitHub Classroom for e...Interactive C++ code development using C++Explorer and GitHub Classroom for e...
Interactive C++ code development using C++Explorer and GitHub Classroom for e...
 
Porting our astrophysics application to Arm64FX and adding Arm64FX support us...
Porting our astrophysics application to Arm64FX and adding Arm64FX support us...Porting our astrophysics application to Arm64FX and adding Arm64FX support us...
Porting our astrophysics application to Arm64FX and adding Arm64FX support us...
 
An asynchronous and task-based implementation of peridynamics utilizing HPX—t...
An asynchronous and task-based implementation of peridynamics utilizing HPX—t...An asynchronous and task-based implementation of peridynamics utilizing HPX—t...
An asynchronous and task-based implementation of peridynamics utilizing HPX—t...
 
Quasistatic Fracture using Nonliner-Nonlocal Elastostatics with an Analytic T...
Quasistatic Fracture using Nonliner-Nonlocal Elastostatics with an Analytic T...Quasistatic Fracture using Nonliner-Nonlocal Elastostatics with an Analytic T...
Quasistatic Fracture using Nonliner-Nonlocal Elastostatics with an Analytic T...
 
A review of benchmark experiments for the validation of peridynamics models
A review of benchmark experiments for the validation of peridynamics modelsA review of benchmark experiments for the validation of peridynamics models
A review of benchmark experiments for the validation of peridynamics models
 
On the treatment of boundary conditions for bond-based peridynamic models
On the treatment of boundary conditions for bond-based peridynamic modelsOn the treatment of boundary conditions for bond-based peridynamic models
On the treatment of boundary conditions for bond-based peridynamic models
 
EMI 2021 - A comparative review of peridynamics and phase-field models for en...
EMI 2021 - A comparative review of peridynamics and phase-field models for en...EMI 2021 - A comparative review of peridynamics and phase-field models for en...
EMI 2021 - A comparative review of peridynamics and phase-field models for en...
 
Google Summer of Code mentor summit 2020 - Session 2 - Open Science and Open ...
Google Summer of Code mentor summit 2020 - Session 2 - Open Science and Open ...Google Summer of Code mentor summit 2020 - Session 2 - Open Science and Open ...
Google Summer of Code mentor summit 2020 - Session 2 - Open Science and Open ...
 

Recently uploaded

+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 

Recently uploaded (20)

How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 

Benchmarking the Parallel 1D Heat Equation Solver in Chapel, Charm++, C++, HPX, Go, Julia, Python, Rust, Swift, and Java

  • 1. Benchmarking the Parallel 1D Heat Equation Solver in Chapel, Charm++, C++, HPX, Go, Julia, Python, Rust, Swift, and Java Patrick Diehl, Max Morris, Steven R. Brandt, Nikunj Gupta and Hartmut Kaiser Center of Computation and Technology Department of Physiscs and Astronomy Louisiana State University patrickdiehl@lsu.edu August 28, 2023 P. Diehl and et al. (LSU) August 28, 2023 1 / 26
  • 2. Motivation Ranking Language Ranking Change 1 Python 13.33% -2.30% 3 C++ 11.41% +0.49% 4 Java 10.33% -2.24% 12 Go 1.16% +0.20% 18 Swift 0.90% -0.35% 19 Rust 0.89% +0.32% 20 Julia 0.85% +0.41% Table: TIOBE Index for August 2023 Chapel is not listed in the index. Charm++ and HPX are using C++ How do these languages compare? P. Diehl and et al. (LSU) August 28, 2023 2 / 26
  • 3. Overview 1 Model problem 2 Features of the approaches 3 Productivity 4 Performance measurements 5 Conclusion and Outlook P. Diehl and et al. (LSU) August 28, 2023 3 / 26
  • 4. Model problem P. Diehl and et al. (LSU) August 28, 2023 4 / 26
  • 5. Model problem I The one-dimensional heat equation on a 1-D loop (e.g. limp noodle) (0 ≤ x < L) with the length L for all times t > 0 is described by ∂u ∂t = α ∂2u ∂x2 , 0 ≤ x < L, t > 0, (1) with α as the material’s diffusivity. For the discretization in space, we use the N grid points x = {xi = i · h ∈ R | i = 0, . . . , N − 1}, with the grid spacing h and we use 2nd order finite differencing. For the discretization in time, we use the Euler method, i.e. u(t + δt, xi) = u(t, xi) + δt · α u(t, xi−1) − 2 · u(t, xi) + u(t, xi+1) 2h , (2) with the initial condition u(0, xi) = xi. To model a loop, we use periodic boundary conditions, i.e. u(t, x) = u(t, L + x). P. Diehl and et al. (LSU) August 28, 2023 5 / 26
  • 6. Model problem II The parallel algorithm was implemented by having multiple threads of execution each sequentially applying Eq. 2 on a local segment of the grid. We used queues to communicate ghost zones between the segments. We note that for this problem, the queues are single-producer, single-consumer and, therefore, in principle, don’t need synchronization (although synchronization to suspend/resume threads seemed to help in some cases). P. Diehl and et al. (LSU) August 28, 2023 6 / 26
  • 7. Features of the approaches P. Diehl and et al. (LSU) August 28, 2023 7 / 26
  • 8. Overview Approach Async Coroutine ParAlg Win Linux Mac Licence C++ 17 X X X X X X GNU Java X X X X X X GNU Swift X X X X X X Apache Chapel X X ∼ X X X Apache Charm++ X ∼ X X X X Own HPX X X X X X X Boost Go X X X X X X BSD Python X X X X X X BSD Julia X X X X X X MIT Rust X X X X X X MIT Table: Overview of the programming languages: (1) the parallelism approaches they provide, (2) supported OS, and (3) the license. The C++ 17 standard was used as a base. The symbol ∼ indicates that partial support. P. Diehl and et al. (LSU) August 28, 2023 8 / 26
  • 9. Chapel We had to write our own queue and the full/empty bit synchronization mechanism was helpful The coforall loop, which assigns a different thread to each iteration, provided a convenient mechanism for launching the outer loop. Chapel also lacked a built-in way to append to a file. However, opening a file, seeking to the end, and writing is possible. We also add that the support we received from questions asked in the Chapel Gitter was exceptional. We found Chapel among the higher performing codes, comparable to Rust or C++. P. Diehl and et al. (LSU) August 28, 2023 9 / 26
  • 10. Go We use go func to launch worker threads (goroutines) and buffered channels using make() to facilitate the exchange of ghost zones. We use go func to launch worker threads (goroutines) and buffered channels using make() to facilitate the exchange of ghost zones. For synchronization of the goroutines, we use sync.WaitGroup and add threads by calling waitGroup.Add(), and synchronize the threads by calling waitGroup.Wait(). At the time of this writing, only biogo, an HPC bioinformatics toolkit [1], is available. Reference 1. Köster, J.: Rust-bio: a fast and safe bioinformatics library. Bioinformatics 32(3), 444–446 (2016) P. Diehl and et al. (LSU) August 28, 2023 10 / 26
  • 11. Julia Julia is clearly inspired by both Python and Fortran. It is a good choice for Fortran programmers who want to get into scripting, as it offers some familiarity: array indexes start at one by default (instead of zero), and end marks the end of a block. In our Julia code, we implemented our own queue. Since Julia does not support classes directly (though it has structs), we found it convenient to use arrays. For parallelism, we used Julia’s Threads.@threads for-loop macro. Julia’s community contacted us and provided some optimized code. However, applying these optimizations requires proficiency in Julia and knowledge of its internals.
  • 12. Rust We use std::thread::scope to launch worker threads, and non-blocking channels from std::sync::mpsc to facilitate the exchange of ghost zones. We avoided using unsafe, working only in the safe subset of Rust. Only two scientific codes (molecular dynamics and bioinformatics) are using Rust. Because of its guarantees concerning data races and memory access, as well as its high performance, Rust is a potentially good choice for new scientific programming projects. However, Rust's syntax and semantics differ vastly from those of more traditional languages like C++, Java, and Python, which may make for a steep learning curve.
  • 13. Swift Swift claims to be safe by design and to produce lightning-fast software. Unfortunately, we had to disable the safety features to get performant code. We used UnsafeMutableBufferPointer<Double> to avoid unnecessary calls of await when accessing the elements of arrays. These buffers allow explicit vectorization on newer x86 and Apple Silicon; see, for example, addingProduct. However, we could not measure a significant improvement using these functions. For concurrency, we use await withTaskGroup with a body closure { group in ... } to launch chunks of work on each thread, and for await _ in group {} to wait for them. We found that Swift is designed for application development on iOS or macOS, but not for numerical applications.
  • 14. Productivity
  • 15. Lines of code [Bar chart: lines of code (LOC) per approach, axis from 0 to 200; approaches shown: Python, Swift, HPX, Julia, Go, Rust, Chapel, Charm++, C++ 17, Java.] The numbers were determined with the Linux tool cloc.
  • 16. Productivity metric Average of the computation time: T_average(approach) := (T_2(approach) + T_20(approach) + T_40(approach)) / 3. Cost: Constructive Cost Model (COCOMO). COCOMO does not reflect parallel features. However, the HPC community has never proposed its own cost model. We map both metrics to the interval [−1, 1], using Easy and Difficult for the costs and Slow and Fast for the computation time. References 1. Barry, B., et al.: Software engineering economics. New York 197 (1981) 2. Stutzke, R.D., Crosstalk, M.: Software estimating technology: A survey. Los Alamitos, CA: IEEE Computer Society Press (1997)
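As a rough illustration of the metric above, the following Python sketch averages three timings and rescales a set of values to [−1, 1]. The slide does not state how the mapping was performed, so the linear min-max rescaling used here is an assumption, as are the function names; the subscripts 2, 20, and 40 are assumed to denote the number of cores.

```python
def average_time(t2, t20, t40):
    """T_average: mean of the timings for 2, 20, and 40 cores (assumed meaning)."""
    return (t2 + t20 + t40) / 3.0


def map_to_unit_interval(values):
    """Linearly rescale a dict of values to [-1, 1].

    Assumption: min-max normalization; the smallest value maps to -1
    and the largest to +1.
    """
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # All values equal: place everything in the middle of the interval.
        return {name: 0.0 for name in values}
    return {name: 2.0 * (v - lo) / (hi - lo) - 1.0 for name, v in values.items()}
```

With made-up placeholder timings, `map_to_unit_interval({"a": 1.0, "b": 2.0, "c": 3.0})` places `a` at −1, `b` at 0, and `c` at +1; the same function could be applied to the COCOMO cost estimates to obtain the second axis.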
  • 17. Productivity [Scatter plot with the axes Easy to Difficult and Slow to Fast; approaches shown: Python, Go, Julia, Rust, Chapel, C++ 17, HPX, Charm++, Swift, Java.] Figure: 2D classification using the computational time and the COCOMO model.
  • 18. Performance measurements
  • 19. AMD EPYC 7H12 [Plot: time [s] (logarithmic axis) versus #cores (0 to 40) for nx=1000000 and nt=1000; series: go, python, swift, rust, chapel, cxx, hpx, julia, charm++, java.]
  • 20. Intel® Xeon® Gold 6148 Skylake [Plot: time [s] (logarithmic axis) versus #cores (0 to 40) for nx=1000000 and nt=1000; series: go, python, swift, rust, chapel, cxx, hpx, julia, charm++, java.]
  • 21. A64FX [Plot: time [s] (logarithmic axis) versus #cores (0 to 40) for nx=1000000 and nt=1000; series: go, python, rust, chapel, cxx, hpx, julia, charm++, java.] Swift is missing, since no package was available for Rocky Linux.
  • 22. Summary of performance measurements
    Table: R² correlation of the fit of the measured data points for all approaches and architectures, computed using Python NumPy.
    Arch    C++   Charm++  Chapel  Rust  Go    Julia  HPX   Swift  Python  Java
    Intel   0.49  0.36     0.45    0.52  0.28  0.41   0.52  0.56   0.43    0.03
    AMD     0.48  0.45     0.53    0.49  0.75  0.12   0.42  0.02   0.46    0.12
    A64FX   0.49  0.52     0.08    0.40  0.52  0.42   0.73  –      0.90    0.32
    Python was the slowest approach. Swift and Julia are comparable. For more than 10 threads, Go behaves slightly better than Swift and Julia. For core counts up to eight, the remaining approaches behave similarly. However, Chapel gets slower for higher core counts. For Rust, Charm++, and HPX, the performance is comparable. HPX is the fastest for larger core counts, but has a high variance; see R² in Table 3.
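For readers unfamiliar with the R² value reported in the table, here is a minimal sketch of how such a coefficient can be computed. The slide does not state which model was fitted, so the straight-line least-squares fit below is an assumption; the sketch also uses plain Python rather than NumPy so that it has no dependencies, and the function name is hypothetical.

```python
def r_squared(xs, ys):
    """Coefficient of determination of a least-squares straight-line fit.

    Assumption: the model is y = slope * x + intercept; the fit used for
    the table above is not specified on the slide.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares versus total sum of squares.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot
```

A value close to 1 means the fitted line explains almost all of the variation in the data points, while a value close to 0 means the points scatter widely around it.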
  • 23. Conclusion and Outlook
  • 24. Conclusion and Outlook Conclusion: We will not name a winner concerning speed. The higher performing platforms were mostly similar in what they achieved. The tests in this paper depend on the hardware, the version of the interpreters and compilers, the particular problem chosen, the amount of effort applied, and our level of expertise (which varied by platform). Outlook: More numerical applications for a more comprehensive comparison; distributed runs and GPU support. I am happy to answer any of your questions.
  • 25. Special issue
  • 26. Advertisement