Benchmarking MPI Applications in Singularity Containers on Traditional HPC and Cloud Infrastructures

2019 hpc-ch Forum – Cloud and Containers
Andrei Plamadă, Jarunan Panyasantisuk
ETH Zürich – Scientific IT Services
16.05.2019

Outline
§ Motivation
§ User experience:
  § Traditional HPC vs HPC in the Public Cloud
  § Singularity v2.6
§ Benchmarking MPI Applications
  § OSU Micro-Benchmarks
  § Machine Learning: TensorFlow

Motivation – Public Cloud is growing rapidly
§ 2018-2022: 20.2% CAGR for IaaS (see Forbes – Gartner)
Worldwide Public Cloud SaaS and IaaS Revenue Forecast (Billions of U.S. Dollars)
        2018   2019   2020   2021   2022
SaaS    80.0   94.8  110.5  126.7  143.7
IaaS    30.5   38.9   49.1   61.9   76.7

§ Expectations
  § More competitive prices
  § More regions
  § More heterogeneous
§ Available in Switzerland
  § 2019-03-12 Google Cloud Platform in Zurich
§ Announced in Switzerland
  § 2018-03-14 Azure Switzerland North and West

Motivation – HPC is in the Cloud as per Press Releases
§ Amazon EC2
  § 2018-11-26 c5n Instances
    § Intel Xeon Platinum ~3.0 GHz, 72 vCPUs, 2.6 GB/vCPU, 100 Gbps
§ Azure
  § 2017-10-23 Cray in Azure
    § Cray XC-series, Cray CS-series
  § 2018-11-14 New H-series in preview*
    § AMD EPYC 7551 ~3.0 GHz: 60 vCPUs, 4.0 GB/vCPU, 100 Gbps EDR InfiniBand (2019-05-14 available)
    § Intel Xeon Platinum 8168 ~3.4 GHz: 44 vCPUs, 8.0 GB/vCPU, 100 Gbps EDR InfiniBand
§ Google Cloud Platform
  § 2019-04-02 Compute-Optimized VMs (C2)
    § 2nd Gen Intel Xeon Scalable Processors ~3.8 GHz, 60 vCPUs, 4.0 GB/vCPU

Motivation – Singularity as the container solution for HPC
§ Containers improve portability and can address the reproducibility issue in research (EnhanceR Survey - Science IT Consultants)
§ EnhanceR Survey - Infrastructure Providers for Container Use
§ Singularity:
  § Initially developed at Lawrence Berkeley National Laboratory (LBL) for the HPC use case (multi-tenancy)
  § Open source under the standard 3-clause BSD license: https://github.com/sylabs/singularity
  § Under active development: 12 contributors with more than 100 commits each
  § Also available with commercial support: Singularity Pro
  § Used worldwide and recommended by vendors, e.g. NVIDIA, Azure Batch
  § Big worldwide community (Google Groups, Slack)
  § Swiss community - EnhanceR

§ Main idea (diagram: native vs containerized MPI stack)
  § Native: Host OS+Drivers+Middleware (OSDM); MPI (mpirun, MPI Library); SSH Server; App (Shared MPI Library)
  § Containerized: Host OS+Drivers+Middleware (OSDM); MPI (mpirun); SSH Server; Container OSDM (MPI, App, Shared MPI Library)
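The containerized layout is commonly driven by letting the host's mpirun start one Singularity process per rank, while the application inside the container links against the container's own MPI library. The following is a minimal sketch that only assembles and prints such a command line. The image name `osu.simg`, the binary path `/opt/osu/osu_bw`, and the rank count are hypothetical placeholders, not values taken from the benchmarks.

```python
import shlex


def build_mpi_container_command(ranks, image, app, app_args=()):
    """Assemble a host-side mpirun command that starts `app` inside a container."""
    if ranks < 1:
        raise ValueError("ranks must be at least 1")
    # Host part: the launcher comes from the host MPI installation.
    host_part = ["mpirun", "-np", str(ranks)]
    # Container part: each rank runs the application inside the image.
    container_part = ["singularity", "exec", image, app, *app_args]
    return host_part + container_part


# Hypothetical image and binary path, for illustration only.
cmd = build_mpi_container_command(2, "osu.simg", "/opt/osu/osu_bw")
print(shlex.join(cmd))
```

Nothing is executed here; on a real system the printed line would be submitted through the scheduler, and it only works if the host and container MPI implementations are compatible.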

Traditional HPC vs HPC in the Public Cloud
§ Traditional HPC (ETH – SIS – HPC)
  § Euler IV:
    § 2x18 core Intel Xeon Gold 6150 (2.7-3.7 GHz)
    § All cores available
    § HT available
    § 7.4 GB/core memory
    § 100 Gbps InfiniBand
§ Public Cloud - Azure
  § In preview HC-Series – Standard_HC44rs
    § 2x24 core Intel Xeon Platinum 8168 (2.7-3.7 GHz)?
    § 2x2 cores used by the hypervisor?
    § HT disabled?
    § 8.0 GB/core memory
    § 100 Gbps InfiniBand

User Experience – Traditional HPC vs HPC in the Public Cloud
§ Traditional HPC (ETH – SIS – HPC)
  § Ready to be used (LSF)
  § No maintenance / set-up
  § Login and Compute Nodes
  § Moderate flexibility regarding the software stack
  § Queue
  § It generally works as expected
§ Public Cloud - Azure
  § Needs to be set up (Slurm cluster) via CycleCloud
  § As admin, you are fully responsible
  § Master and Execute Nodes
  § High flexibility (as the admin), e.g. OpenMPI, MPICH, MVAPICH2, Intel MPI
  § Queue (as admin, high availability)
  § Auto-scaling
  § https://github.com/Azure/cyclecloud-slurm/issues

User Experience on CentOS 7 – Singularity v2.6
§ Create
  • Docker
  • root access
  • on your PC
§ Run
  • Singularity
  • on your PC or HPC infrastructure
§ Multi-node: MPICH ABI Compatibility initiative
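One way to realize the create-with-Docker, run-with-Singularity flow is sketched below as a list of shell commands. It assumes that the Docker image is pushed to a registry and then converted with `singularity build ... docker://...`; the slides do not state how the image was transferred, so that step, the tag `user/osu:latest`, and the file name `osu.simg` are hypothetical.

```python
def container_workflow(docker_tag, image_file, context_dir="."):
    """Return the commands for a Docker-build, Singularity-run workflow."""
    return [
        # Create: build the image with Docker on a PC with root access.
        ["docker", "build", "-t", docker_tag, context_dir],
        # Assumption: the image is shared through a Docker registry.
        ["docker", "push", docker_tag],
        # Convert the Docker image into a Singularity image file.
        ["singularity", "build", image_file, f"docker://{docker_tag}"],
        # Run: execute a command inside the image on the PC or the cluster.
        ["singularity", "exec", image_file, "cat", "/etc/os-release"],
    ]


# Hypothetical tag and file name, for illustration only.
steps = container_workflow("user/osu:latest", "osu.simg")
for step in steps:
    print(" ".join(step))
```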

OSU Micro-Benchmarks – osu_bw (Gbps), 1000 iterations
Bytes  EN m2 v2.2  EC m2 v2.2  EC m2 v2.3  AN m2 v2.3  AC m2 v2.3
8            0.16        0.15        0.16        0.16        0.08
64           1.30        1.27        1.29        1.28        1.25
512          8.27        8.21        8.14        7.87        7.65
4K          37.41       37.65       37.42       37.23       36.54
32K         88.89       89.25       89.43       83.50       82.47
2M          94.75       94.59       95.19       94.25       94.30
16M         94.95       94.75       95.50       91.49       89.99
Abbreviations: Azure (A), Euler (E), MVAPICH2 (m2), Native (N), Container (C)
§ A naïve MPICH v3.3 container (EC/AC) works, but only up to 10/4 Gbps (no InfiniBand)
§ Host: AC MPICH v3.3, Container: m2 v2.3; results as for AC m2 v2.3, up to 100 Gbps
§ OpenMPI is not compatible with MPICH-derived MPI implementations, so this combination does not work
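The native and container columns can be compared directly by computing the relative change per message size. The sketch below does this for the two pairs that share an MVAPICH2 version (EN vs EC with v2.2, AN vs AC with v2.3), using the osu_bw values reported for this benchmark; the function and variable names are illustrative only.

```python
# osu_bw results in Gbps, as reported for this benchmark.
SIZES = ["8", "64", "512", "4K", "32K", "2M", "16M"]
BANDWIDTH_GBPS = {
    "EN m2 v2.2": [0.16, 1.30, 8.27, 37.41, 88.89, 94.75, 94.95],
    "EC m2 v2.2": [0.15, 1.27, 8.21, 37.65, 89.25, 94.59, 94.75],
    "AN m2 v2.3": [0.16, 1.28, 7.87, 37.23, 83.50, 94.25, 91.49],
    "AC m2 v2.3": [0.08, 1.25, 7.65, 36.54, 82.47, 94.30, 89.99],
}


def relative_change_percent(native, container):
    """Percent change of the container result relative to the native result."""
    return [100.0 * (c - n) / n for n, c in zip(native, container)]


euler = relative_change_percent(BANDWIDTH_GBPS["EN m2 v2.2"], BANDWIDTH_GBPS["EC m2 v2.2"])
azure = relative_change_percent(BANDWIDTH_GBPS["AN m2 v2.3"], BANDWIDTH_GBPS["AC m2 v2.3"])

for size, e, a in zip(SIZES, euler, azure):
    print(f"{size:>4}  Euler {e:+7.2f}%  Azure {a:+7.2f}%")
```

The values are reported with two decimals, so the percentages for the smallest message sizes are dominated by rounding and should not be over-interpreted.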

OSU Micro-Benchmarks – osu_latency (μs), 100000 iterations
Bytes  EN m2 v2.2  EC m2 v2.2  EC m2 v2.3  AN m2 v2.3  AC m2 v2.3
8            1.25        1.26        1.30        2.37        2.34
64           1.37        1.38        1.37        2.54        2.54
512          2.12        2.09        2.12        3.44        3.38
4K           3.44        3.34        3.63        5.16        5.30
32K          8.69        8.59        8.88       14.07       13.47
2M          28.46       28.39       28.54       39.62       38.71
16M        188.68      188.70      185.10      202.52      204.84
Abbreviations: Azure (A), Euler (E), MVAPICH2 (m2), Native (N), Container (C)
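To compare the two infrastructures like for like, the latency of the same containerized MPI (MVAPICH2 v2.3) on Azure can be divided by its latency on Euler. The sketch below computes that ratio per message size from the EC m2 v2.3 and AC m2 v2.3 columns; the names used are illustrative only.

```python
# osu_latency results in microseconds for the MVAPICH2 v2.3 container.
SIZES = ["8", "64", "512", "4K", "32K", "2M", "16M"]
EULER_CONTAINER_US = [1.30, 1.37, 2.12, 3.63, 8.88, 28.54, 185.10]
AZURE_CONTAINER_US = [2.34, 2.54, 3.38, 5.30, 13.47, 38.71, 204.84]


def latency_ratio(reference, other):
    """Ratio other/reference per message size (values above 1 mean higher latency)."""
    return [o / r for r, o in zip(reference, other)]


ratios = latency_ratio(EULER_CONTAINER_US, AZURE_CONTAINER_US)

for size, ratio in zip(SIZES, ratios):
    print(f"{size:>4}  Azure/Euler latency ratio: {ratio:.2f}")
```

A ratio close to 1 at large message sizes and a larger ratio at small sizes would indicate that the difference between the systems lies mainly in per-message latency rather than in bandwidth.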

OSU Micro-Benchmarks – Dockerfile

Machine Learning – TensorFlow – on Azure (1 iteration – NO STATISTICS)
§ 2018-11-24: new N-Series Azure Virtual Machines (in preview)
  § Standard_ND40s_v2:
    § Intel Skylake: 40 vCPUs, 16.8 GB/vCPU
    § 8 x NVIDIA Tesla V100 NVLINK
Time to Solution (min)
No of GPUs  CUDA 9  CUDA 10  Singularity CUDA 10
1               87       63                   65
2              102       89                  59?
4               66       46                   45
8               28       19                   18
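The time-to-solution values translate into speedup and parallel efficiency relative to the 1-GPU run of the same configuration. The sketch below performs that arithmetic; since each value comes from a single run without statistics, and the 2-GPU Singularity value is marked as uncertain ("59?"), the results are indicative only. Function and variable names are illustrative.

```python
# Time to solution in minutes per number of GPUs (single runs, no statistics).
GPUS = [1, 2, 4, 8]
MINUTES = {
    "CUDA 9": [87, 102, 66, 28],
    "CUDA 10": [63, 89, 46, 19],
    # The 2-GPU value is marked as uncertain ("59?") in the reported results.
    "Singularity CUDA 10": [65, 59, 45, 18],
}


def scaling(times, gpus):
    """Return (gpus, speedup, efficiency) tuples relative to the first entry."""
    base = times[0]
    return [(g, base / t, base / t / g) for g, t in zip(gpus, times)]


results = {name: scaling(times, GPUS) for name, times in MINUTES.items()}

for name, rows in results.items():
    for gpus, speedup, efficiency in rows:
        print(f"{name:<20} {gpus} GPU(s): speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```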

Machine Learning – TensorFlow – Dockerfile (1/2)

Machine Learning – TensorFlow – Dockerfile (2/2)

Conclusion
§ User experience on Azure - HPC in the cloud is catching up:
  § CycleCloud Slurm cluster with compute-intensive VMs + 100 Gbps InfiniBand in preview
  § Big machine learning VMs (up to 8 x Tesla V100 NVLINK) in preview
§ Singularity Containers:
  § When the host is similar to the container, we did not observe any overhead
  § HPC partially breaks the portability of containers
    § The container should be compatible with the host infrastructure and the host MPI implementation
§ Updating the CUDA drivers (9 to 10) might improve the time to solution

Contact
ETH Zürich
Andrei Plamadă
Scientific IT Services
Weinbergstrasse 11
8092 Zürich

Acknowledgements
§ SIS colleagues: Thomas Wüst, Urban Borstnik, Samuel Fux
§ EnhanceR colleagues: Alexander Kashev (UniBe)
§ Microsoft / Azure: Lukasz Miroslaw, Andy Howard

EnhanceR Survey - Infrastructure Providers for Container Use
https://forms.gle/JBW78qDPWabd4GDR8
