HPC MPI CICD
OBJECT AUTOMATION SYSTEM SOLUTIONS PVT. LTD.
CI/CD
Contents
 Overview
 Bluefield Run
 OMB Tests
 IMB Tests
 MPICH Tests
 NAS Tests
 References
Overview
 CI/CD falls under DevOps (the joining of development and operations) and combines the practices of continuous integration and continuous delivery. CI/CD automates much or all of the manual intervention traditionally needed to get new code from a commit into production, such as building, testing, and deploying, as well as infrastructure provisioning. With a CI/CD pipeline, developers can make changes to code that are then automatically tested and pushed out for delivery and deployment. With CI/CD, code releases happen faster and more reliably.
 Continuous integration is the practice of integrating all code changes into the main branch of a shared source code repository early and often, automatically testing each change when a commit or merge happens, and automatically kicking off a build. With continuous integration, errors and security issues can be identified and fixed more easily, and much earlier in the software development lifecycle.
 Continuous delivery is a software development practice that works in conjunction with continuous integration to automate the infrastructure provisioning and application release process.
 Once code has been tested and built as part of the CI process, continuous delivery takes over during the final stages to ensure it is packaged with everything it needs, so it can be deployed to any environment at any time. Continuous delivery can cover everything from provisioning the infrastructure to deploying the application to the testing or production environment.
 With continuous delivery, the software is built so that it can be deployed to production at any time. Deployments can then be triggered manually, or the team can move to continuous deployment, where deployments are automated as well.
Directory Tree
The pipeline lives in /global/home/users/rgopal/CITest/usr/bin. This is the base directory, and it is where the gitlab-runner is installed. From here, the structure of the directory looks like:
/global/home/users/rgopal/CITest/usr/bin
|- src/
|  |- ...
|  |- <git pull location>
|- builds/
|  |- <commit_hash>/
|     |- gcc/
|        |- install/
|- logs/
|  |- <commit_hash>/
|     |- nas
|     |- mpich
|     |- imb
|     |- omb
|- tests/
|- tmp/
   |- <commit_hash>/
   |- ...
   |- mv2
src/
The location into which the gitlab-runner clones the most recent commit. This directory is checked for changes at each new job. Do not change any files here in any of the jobs.
builds/
The install location of the files built from the most recent commit.
logs/
Where the testing logs live.
tests/
The binaries of all of the external tests (those not built alongside mv2).
tmp/
Location of temporary files.
Pipeline Structure
Currently, the pipeline is divided into 4 phases:
 build
 test
 verify
 clean
Within each phase, separate jobs are executed at the same time. For each new job, the gitlab-runner makes a fresh clone from master into a directory and tries to start from a clean slate.
build
For the build step, the entire repo is copied into $GITLAB_BASE_DIR/tmp/mv2. This is done because if we ran the build in the source directory, the following phases would notice that files have changed; they would try to reset and eventually fail. The job then runs autogen, make, and make install, installing to builds/<commit_hash>/gcc.
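As a rough sketch, the build job boils down to something like the following (the actual script contents are not shown in this document; the configure step, its flags, and the way $COMMIT is obtained are assumptions):

COMMIT=$(git -C "$GITLAB_BASE_DIR/src" rev-parse HEAD)     # assumed source of <commit_hash>
cp -r "$GITLAB_BASE_DIR/src" "$GITLAB_BASE_DIR/tmp/mv2"    # build outside src/ so later phases see a clean tree
cd "$GITLAB_BASE_DIR/tmp/mv2"
./autogen.sh
./configure --prefix="$GITLAB_BASE_DIR/builds/$COMMIT/gcc"
make -j && make install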
test
This phase just submits batch jobs using sbatch. Currently, we're using the thor partition. The scripts save their output to $GITLAB_BASE_DIR/logs/<commit_hash>, and when a job is done, it generates a file in $GITLAB_BASE_DIR/tmp/<commit_hash>.
This phase completes in a couple of seconds; we're just submitting jobs to sbatch and then checking on them in the verify phase.
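Schematically, a test job does little more than this (the wrapper script name and sbatch flags below are illustrative, not the actual scripts; $COMMIT stands for the commit hash as above):

# Submit the batch job and return immediately; the verify phase picks it up later
sbatch --partition=thor \
    --output="$GITLAB_BASE_DIR/logs/$COMMIT/omb/%j.out" \
    ./run_omb_tests.sh    # hypothetical wrapper that writes a .done marker on completion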
verify
Checks the status of the tests that were submitted to sbatch. The scripts in the test phase should generate a .done file in $GITLAB_BASE_DIR/tmp/<commit_hash>. The verify scripts loop and check for that file. Once it is found, they grep the output from the batch job for errors and log them in $GITLAB_BASE_DIR/logs/<commit_hash>/<test>/.
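A minimal sketch of that verify loop, assuming one .done marker per test suite (marker and log names are illustrative):

# Wait for the batch job to drop its marker, then scan the output for failures
while [ ! -f "$GITLAB_BASE_DIR/tmp/$COMMIT/omb.done" ]; do
    sleep 10
done
grep -i "error" "$GITLAB_BASE_DIR"/logs/"$COMMIT"/omb/*.out \
    > "$GITLAB_BASE_DIR/logs/$COMMIT/omb/errors.log"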
clean
Cleans up any extra files, and removes the builds and any generated hostfiles.
Bluefield Run
Building
After cloning, run ./build.sh. This script will run autogen, configure, make, and make install. If you
open the script and set ARMSRC_DIR to another clone of this repo in a separate folder (make sure it's
on the same commit), it will launch a parallel build.
Note: Open the script and set LICENSE=0 before running in order to build without the need for a
license file.
Note 2: Building on ARM takes a long time. Wait around 20 minutes for it to complete. The host build
will finish much faster. Don't forget that both host and ARM need to finish before you can run!
The script uses a separate set of configure flags in order to allow both SRC_DIR and ARMSRC_DIR to share the same --prefix. Essentially, the ARMSRC_DIR flags build mpicc, mpispawn, proxy_program, etc. with an -arm suffix appended, to distinguish the host binaries from the ARM binaries.
If you see a "cannot execute binary file" error, please make sure that file ./install/bin/mpispawn reports an x86 executable, and file ./install/bin/proxy_program reports an aarch64 executable. If either is wrong, run cp proxy_program-arm proxy_program or cp mpispawn-x86 mpispawn as appropriate. If these files don't exist, rerun make && make install to regenerate them.
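The two checks above as commands, for convenience:

file ./install/bin/mpispawn        # should report an x86-64 executable
file ./install/bin/proxy_program   # should report an aarch64 (ARM) executable
# If either architecture is wrong, restore the correct binary:
cp proxy_program-arm proxy_program
# or: cp mpispawn-x86 mpispawn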
Environment Setup
On HPCAC, the Thor hosts have two physical HCAs plugged in. One is the BF-2, and the other is a ConnectX-6:
[rgopal@thor011 xsc]$ ibstat
CA 'mlx5_0'
    CA type: MT4123
    Number of ports: 1
    Firmware version: 20.30.1004
    Hardware version: 0
    Node GUID: 0x98039b03008553e6
    System image GUID: 0x98039b03008553e6
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 100
        Base lid: 60
        LMC: 0
        SM lid: 9
        Capability mask: 0x2651e848
        Port GUID: 0x98039b03008553e6
        Link layer: InfiniBand
CA 'mlx5_1'
    CA type: MT4123
    Number of ports: 1
    Firmware version: 20.30.1004
    Hardware version: 0
    Node GUID: 0x98039b03008553e7
    System image GUID: 0x98039b03008553e6
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 100
        Base lid: 41
        LMC: 0
        SM lid: 1
        Capability mask: 0x2651e848
        Port GUID: 0x98039b03008553e7
        Link layer: InfiniBand
CA 'mlx5_2'
    CA type: MT41686
    Number of ports: 1
    Firmware version: 24.30.1004
    Hardware version: 0
    Node GUID: 0x043f720300ec7f0e
    System image GUID: 0x043f720300ec7f0e
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 100
        Base lid: 51
        LMC: 0
        SM lid: 1
        Capability mask: 0x2641e848
        Port GUID: 0x043f720300ec7f13
        Link layer: InfiniBand
In order to run the offload, set the following snippet in ~/.bashrc to select the BF-2 on the host while also
setting the BF-2 on the ARM cores.
STR=`hostname`
SUB="bf"
if [[ "$STR" == *"$SUB"* ]]; then
    export MV2_IBA_HCA=mlx5_0
else
    export MV2_IBA_HCA=mlx5_2
fi
Running
 Create a hostfile as usual. (A hostfile contains the list of hostnames of the nodes on which to launch the MPI job.)
 Create a file called dpufile. Fill it with the individual hostnames of each BlueField. Don't write any WPN information (like thor-bf01:2, or listing thor-bf01 on multiple lines), since the launcher will launch 8 WPN automatically.
For example, if you have a job allocation with SLURM, you can generate a dpufile like this:
scontrol show hostnames | grep bf | tee ./dpufile
Set MV2_USE_DPU=1 as an environment variable in the mpirun_rsh command.
Full run command example:
./bin/mpirun_rsh -np 128 -hostfile ./hostfile -dpufile ./dpufile MV2_USE_DPU=1 ./libexec/osu-micro-benchmarks/mpi/collective/osu_ialltoall
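For concreteness, with a hypothetical two-node allocation the two files might look like this (hostnames are illustrative):

$ cat ./hostfile    # host nodes that run the MPI ranks
thor011
thor012
$ cat ./dpufile     # one BlueField hostname per line, no WPN suffixes
thor-bf011
thor-bf012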
OMB tests
OSU Micro-Benchmarks (OMB) is a benchmark suite developed by NOWLAB that is included in every installation of MVAPICH2 (including MVAPICH2-DPU): after building, the binaries live in the install-prefix/libexec/osu-micro-benchmarks folder, and the source code lives in the osu_benchmarks folder. Additionally, it can be downloaded as a standalone package here: http://mvapich.cse.ohio-state.edu/benchmarks/.
The benchmark suite has tests for MPI point-to-point communication (sending and receiving between exactly two processes), collectives (communication among groups of processes), RMA (one-sided puts and gets, i.e., sending without a corresponding receive, or receiving without a corresponding send), and others.
As of this writing, the MVAPICH2-DPU package supports MPI_Ialltoall, MPI_Ibcast, and
MPI_Iallgather collective offloads.
[Figure omitted: a diagram of the data movement in these collectives. Each row is a buffer (sendbuf or recvbuf) on a single process.]
Also, it is important to note that there is an I in front of each collective name: it means the collective is nonblocking. To demonstrate this, compare the usage of a blocking alltoall (i.e., MPI_Alltoall) with a nonblocking alltoall (i.e., MPI_Ialltoall).
MPI_Alltoall:

// Recvbuf empty
MPI_Alltoall(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, comm); // May take some time to complete
// Recvbuf guaranteed to be full

MPI_Ialltoall:

MPI_Request request;
// Recvbuf empty
MPI_Ialltoall(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, comm, &request); // Returns immediately
// Do other work while the nonblocking alltoall progresses
heavy_computation_which_does_not_depend_on_recvbuf();
// Recvbuf may or may not be filled yet
MPI_Wait(&request, MPI_STATUS_IGNORE); // May or may not take much time to complete, depending on how long the computation took
// Recvbuf guaranteed to be full
 Since MVAPICH2-DPU supports MPI_Ialltoall, MPI_Ibcast, and MPI_Iallgather, we are mainly interested in the output of osu_ialltoall, osu_iallgather, and osu_ibcast. Binaries for these can be found in the install-prefix/libexec/osu-micro-benchmarks/mpi/collective folder.
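The host and DPU-offload paths can then be compared back to back with the same binary (the node count here is illustrative; the command form follows the full run example above):

# Baseline on the host CPUs only
./bin/mpirun_rsh -np 32 -hostfile ./hostfile MV2_USE_DPU=0 \
    ./libexec/osu-micro-benchmarks/mpi/collective/osu_ialltoall
# Same benchmark with the collective offloaded to the BlueField DPUs
./bin/mpirun_rsh -np 32 -hostfile ./hostfile -dpufile ./dpufile MV2_USE_DPU=1 \
    ./libexec/osu-micro-benchmarks/mpi/collective/osu_ialltoall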
The following MPI tests are included in the OMB package:
 Point-to-Point MPI Benchmarks: latency, multi-threaded latency, multi-pair latency, multiple bandwidth / message rate test, bandwidth, bidirectional bandwidth
 Collective MPI Benchmarks: collective latency tests for various MPI collective operations such as MPI_Reduce, MPI_Reduce_scatter, MPI_Scatter, and vector collectives.
 Non-Blocking Collective (NBC) MPI Benchmarks: collective latency and overlap tests for various MPI collective operations such as MPI_Iallgather, MPI_Iallreduce, MPI_Ialltoall, MPI_Ibarrier, MPI_Ibcast, MPI_Igather, MPI_Ireduce, MPI_Iscatter, and vector collectives.
 One-sided MPI Benchmarks: one-sided put latency, one-sided put bandwidth, one-sided put bidirectional bandwidth, one-sided get latency, one-sided get bandwidth, one-sided accumulate latency, compare and swap latency, fetch and operate, and get_accumulate latency for MVAPICH2 (MPI-2 and MPI-3).
omb-refactor.sh runs the OMB tests. The tests are run on 1, 2, 4, 8, and 16 nodes with full subscription, with both MV2_USE_DPU=0 and MV2_USE_DPU=1.
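A minimal sketch of what such a sweep looks like (this loop is an assumption about omb-refactor.sh, not its actual contents; 16 ppn full subscription follows the collective-test setup described below):

COMMON="MV2_DEBUG_SHOW_BACKTRACE=2 MV2_ENABLE_AFFINITY=0"
BENCH=./libexec/osu-micro-benchmarks/mpi/collective/osu_iallgather
for nodes in 1 2 4 8 16; do
    np=$((nodes * 16))    # full subscription at 16 ppn
    ./bin/mpirun_rsh -np $np -hostfile ./hostfile $COMMON MV2_USE_DPU=0 $BENCH
    ./bin/mpirun_rsh -np $np -hostfile ./hostfile -dpufile ./dpufile $COMMON MV2_USE_DPU=1 $BENCH
done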
All tests are run with the options below:
COMMON="MV2_DEBUG_SHOW_BACKTRACE=2 MV2_ENABLE_AFFINITY=0"
MV2_DEBUG_SHOW_BACKTRACE
Show a backtrace when a process fails on errors like "Illegal Instruction", "Abort", or "Floating point exception".
MV2_ENABLE_AFFINITY
Enable CPU affinity by setting MV2_ENABLE_AFFINITY to 1, or disable it by setting it to 0.
MV2_MPIRUN_TIMEOUT
The limit, in seconds, on the execution time of the MPI job.
Point-to-Point MPI Benchmarks
 osu_latency - Latency Test
 osu_latency_mt - Multi-threaded Latency Test
 osu_latency_mp - Multi-process Latency Test
 osu_bw - Bandwidth Test
 osu_bibw - Bidirectional Bandwidth Test
 osu_mbw_mr - Multiple Bandwidth / Message Rate Test
 osu_multi_lat - Multi-pair Latency Test
Pt2Pt tests are run on 1 node [2 ppn] and 2 nodes [1 ppn].
Total Tests: (7 tests * 2 scenarios [host/dpu]) = 14
Collective MPI Benchmarks
 osu_allgather - MPI_Allgather Latency Test
 osu_allgatherv - MPI_Allgatherv Latency Test
 osu_allreduce - MPI_Allreduce Latency Test
 osu_alltoall - MPI_Alltoall Latency Test
 osu_alltoallv - MPI_Alltoallv Latency Test
 osu_barrier - MPI_Barrier Latency Test
 osu_bcast - MPI_Bcast Latency Test
 osu_gather - MPI_Gather Latency Test
 osu_gatherv - MPI_Gatherv Latency Test
 osu_reduce - MPI_Reduce Latency Test
 osu_reduce_scatter - MPI_Reduce_scatter Latency Test
 osu_scatter - MPI_Scatter Latency Test
 osu_scatterv - MPI_Scatterv Latency Test
Non-Blocking Collective (NBC) MPI Benchmarks
 osu_iallgather - MPI_Iallgather Latency Test
 osu_iallgatherv - MPI_Iallgatherv Latency Test
 osu_iallreduce - MPI_Iallreduce Latency Test
 osu_ialltoall - MPI_Ialltoall Latency Test
 osu_ialltoallv - MPI_Ialltoallv Latency Test
 osu_ialltoallw - MPI_Ialltoallw Latency Test
 osu_ibarrier - MPI_Ibarrier Latency Test
 osu_ibcast - MPI_Ibcast Latency Test
 osu_igather - MPI_Igather Latency Test
 osu_igatherv - MPI_Igatherv Latency Test
 osu_ireduce - MPI_Ireduce Latency Test
 osu_iscatter - MPI_Iscatter Latency Test
 osu_iscatterv - MPI_Iscatterv Latency Test
Collective tests are run on 1, 2, 4, 8, and 16 nodes with full subscription [16 ppn].
Total Tests: (26 tests * 2 scenarios [host/dpu]) - 1 = 51
Note: osu_ialltoall is a time-consuming test, so it is run with a maximum message size of 32KB; all the others are run with the default maximum message size.
One-sided MPI Benchmarks
 osu_put_latency - Latency Test for Put with Active/Passive Synchronization
 osu_get_latency - Latency Test for Get with Active/Passive Synchronization
 osu_put_bw - Bandwidth Test for Put with Active/Passive Synchronization
 osu_get_bw - Bandwidth Test for Get with Active/Passive Synchronization
 osu_put_bibw - Bidirectional Bandwidth Test for Put with Active Synchronization
 osu_acc_latency - Latency Test for Accumulate with Active/Passive Synchronization
 osu_cas_latency - Latency Test for Compare and Swap with Active/Passive Synchronization
 osu_fop_latency - Latency Test for Fetch and Op with Active/Passive Synchronization
 osu_get_acc_latency - Latency Test for Get_accumulate with Active/Passive Synchronization
RMA tests are run on one or two nodes with ppn = 2 or ppn = 1, respectively.
Total Tests: (9 tests * 2 scenarios [host/dpu]) = 18
IMB Tests
The objectives of the Intel® MPI Benchmarks are:
• Provide a concise set of benchmarks targeted at measuring the most important MPI functions.
• Set forth a precise benchmark methodology.
• Report bare timings rather than provide interpretation of the measured results. Show throughput values if and only if these values are well-defined.
Intel® MPI Benchmarks is developed using ANSI C plus standard MPI.
Intel® MPI Benchmarks performs a set of MPI measurements across a range of message sizes. The generated benchmark data fully characterizes:
• Performance of a cluster system, including node performance, network latency, and throughput
• Efficiency of the MPI implementation used
The Intel® MPI Benchmarks package consists of the following components:
• IMB-MPI1 - benchmarks for MPI-1 functions.
• Two components for MPI-2 functionality:
• IMB-EXT - one-sided communications benchmarks.
• IMB-IO - input/output (I/O) benchmarks.
• Two components for MPI-3 functionality:
• IMB-NBC - benchmarks for nonblocking collective (NBC) operations.
• IMB-RMA - one-sided communications benchmarks. These benchmarks measure the Remote Memory Access (RMA) functionality introduced in the MPI-3 standard.
Each component constitutes a separate executable file. You can run all of the supported benchmarks, or specify a single executable file in the command line to get results for a specific subset of benchmarks.
imb-refactor.sh runs the IMB tests.
IMB tests are first run with full subscription: 16 nodes at 16 ppn = 256 processes, with both MV2_USE_DPU=0 and MV2_USE_DPU=1. The tests are then repeated on 2 nodes and 4 nodes.
For the two- and four-node runs, the maximum message size is limited to 64KB and the number of iterations is kept at 500. Here too, the tests are run with MV2_USE_DPU=0 and 1.
The number of iterations for all the tests is set to 500.
Tests are run both with the "multi" option set to 1 and without it; this option defines whether the benchmark runs in multiple mode or not. An example invocation is sketched below.
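As an illustration, a single two-node IMB-NBC run under these settings might look like the following (-iter, -msglog, and -multi are standard IMB flags; their exact combination here is an assumption, not the actual contents of imb-refactor.sh):

# 2 nodes at 16 ppn, 500 iterations, message sizes capped at 64KB (2^16), multiple mode on
./bin/mpirun_rsh -np 32 -hostfile ./hostfile $COMMON MV2_USE_DPU=1 \
    ./IMB-NBC -iter 500 -msglog 16 -multi 1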
All tests are run with the options below:
COMMON="MV2_DEBUG_SHOW_BACKTRACE=2 MV2_ENABLE_AFFINITY=0 MV2_SUPPRESS_JOB_STARTUP_PERFORMANCE_WARNING=1 MPIEXEC_TIMEOUT=300"
MV2_DEBUG_SHOW_BACKTRACE
Show a backtrace when a process fails on errors like "Illegal Instruction", "Abort", or "Floating point exception".
MV2_ENABLE_AFFINITY
Enable CPU affinity by setting MV2_ENABLE_AFFINITY to 1, or disable it by setting it to 0.
MPIEXEC_TIMEOUT
The limit, in seconds, on the execution time of the MPI application. This overwrites the MV2_MPIRUN_TIMEOUT parameter.
The following table lists all the IMB-NBC benchmarks. [Table omitted in this export.]
Total Tests: (26 tests * 2 scenarios [host/dpu] * 2 modes [multi/non-multi]) = 104
IMB-EXT tests
Unidir_Put
Unidir_Get
Bidir_Put
Bidir_Get
Accumulate
Window
Total Tests: (6 tests * 2 scenarios [host/dpu] * 2 modes [multi/non-multi]) = 24
MPI-1 Benchmarks
[Table of IMB-MPI1 benchmarks omitted in this export.]
Total Tests: (19 tests * 2 scenarios [host/dpu] * 2 modes [multi/non-multi]) = 76
IMB-RMA Benchmarks
The table below lists all the IMB-RMA benchmarks. [Table omitted in this export.]
Total Tests: (19 tests * 2 scenarios [host/dpu] * 2 modes [multi/non-multi]) = 76
MPICH Tests
The MVAPICH2 MPI library (by the Network-Based Computing Laboratory at The Ohio State University) is a derivative of MPICH (by Argonne National Laboratory). Tests originally from MPICH can be found in the ./test folder of a fresh clone of the MVAPICH2 code. There are folders with tests for multiple parts of the code: pt2pt, collectives, rma, etc.
Each folder has a testlist file that can be read by a script to know which tests to run. Since the MVAPICH2-DPU library at this time only has support for DPU-based collectives, we are mainly interested in passing the tests within the ./test/mpi/coll folder. A sketch of a testlist-driven runner follows.
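A hedged sketch of how such a runner might work (the loop is illustrative; each testlist line is of the form "name number-of-processes [options]"):

# Run every collective test listed in test/mpi/coll/testlist
while read -r name np rest; do
    [ -z "$name" ] && continue                # skip blank lines
    case "$name" in \#*) continue ;; esac     # skip comments
    ./bin/mpirun_rsh -np "$np" -hostfile ./hostfile MV2_USE_DPU=1 "./test/mpi/coll/$name"
done < ./test/mpi/coll/testlist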
MPICH collective tests
Total Tests: (169 tests * 2 scenarios [host/dpu]) = 338
There are many dense tests here, such as "alltoall", "bcasttest", "redscat", "red_scat_block", "gather_big", "opprod", "nbicbcast", "nbicallreduce", and "nbic".
The MPICH comm, pt2pt, and rma tests are yet to be investigated.
NAS Tests
The NAS Parallel Benchmarks (NPB) are a small set of programs designed to help evaluate the
performance of parallel supercomputers. The benchmarks are derived from computational fluid
dynamics (CFD) applications and consist of five kernels and three pseudo-applications in the original
"pencil-and-paper" specification (NPB 1).
The benchmark suite has been extended to include new benchmarks for unstructured adaptive meshes, parallel I/O, multi-zone applications, and computational grids. Problem sizes in NPB are predefined and indicated as different classes. Reference implementations of NPB are available in commonly used programming models like MPI and OpenMP (NPB 2 and NPB 3).
"mg.B.x”,"cg.B.x","ft.B.x","lu.B.x",
"is.B.x","sp.B.x","bt.B.x.ep_io", "bt.B.x.mpi_io_full"
"ep.B.x",
NAS has 9 benchmarks. Following are NAS benchmarks
All benchmarks are run successfully on 2 nodes with 2ppn.
It is yet to be scaled up to 4,8 and 16 nodes with full subscription.
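For reference, one of these runs can be reproduced by hand roughly as follows (the make target syntax is that of the NPB MPI reference implementation; the paths and the MV2_USE_DPU setting are illustrative):

# Build the class-B CG benchmark inside the NPB source tree
make cg CLASS=B
# Launch on 2 nodes with 2 ppn (np = 4)
./bin/mpirun_rsh -np 4 -hostfile ./hostfile MV2_USE_DPU=1 ./bin/cg.B.x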
References
1. OMB user guide
2. IMB user guide
3. Wiki
End of Document
Thank you.
www.object-automation.com
India
Object Automation System Solutions Pvt. Ltd.
Chennai.
Contact Details:
hr@object-automation.com
India : +91 73977 84815 / +91 91500 55959