Nuttapong Chakthranont†, Phonlawat Khunphet†,
Ryousei Takano‡, Tsutomu Ikegami‡
†King Mongkut’s University of Technology North Bangkok, Thailand
‡National Institute of Advanced Industrial Science and Technology, Japan
IEEE CloudCom 2014@Singapore, 18 Dec. 2014
Exploring the Performance Impact of Virtualization on an HPC Cloud
Outline
•  HPC Cloud
•  AIST Super Green Cloud (ASGC)
•  Experiment
•  Conclusion
2
HPC Cloud
•  HPC users are beginning to take an interest in the Cloud.
–  e.g., CycleCloud, Penguin on Demand
•  Virtualization is a key technology.
–  Pro: customized software environments, elasticity, etc.
–  Con: large overhead, especially degraded I/O performance.
•  VMM-bypass I/O technologies, e.g., PCI passthrough and SR-IOV, can significantly mitigate the overhead.
Motivating Observation
•  Performance evaluation of HPC cloud
–  (Para-)virtualized I/O incurs a large overhead.
–  PCI passthrough significantly mitigates the overhead.
[Bar chart: execution time (seconds) of the NAS Parallel Benchmarks 3.3.1, class C, 64 processes (BT, CG, EP, FT, LU), comparing BMM (IB), BMM (10GbE), KVM (IB), and KVM (virtio). BMM: Bare Metal Machine.]
[Diagram: with KVM (virtio), I/O passes from the guest driver in the guest OS through the VMM's physical driver to the 10GbE NIC; with KVM (IB), PCI passthrough lets the guest OS's physical driver bypass the VMM and drive the IB QDR HCA directly. Improvement by PCI passthrough.]
HPC Cloud
•  HPC users are beginning to take an interest in the Cloud.
–  e.g., CycleCloud, Penguin on Demand
•  Virtualization is a key technology.
–  Pro: customized software environments, elasticity, etc.
–  Con: large overhead, especially degraded I/O performance.
•  VMM-bypass I/O technologies, e.g., PCI passthrough and SR-IOV, can significantly mitigate the overhead.
We quantitatively assess the performance impact of virtualization to demonstrate the feasibility of the HPC Cloud.
Outline
•  HPC Cloud
•  AIST Super Green Cloud (ASGC)
•  Experiment
•  Conclusion
6
Easy to Use Supercomputer
- Usage Model of AIST Cloud -
7
HPC + ease of use: allow users to customize their virtual clusters (HPC, Big Data, Web apps).
1. Choose a template image of a virtual machine
2. Add required software packages
3. Save a user-customized template image
[Diagram: VM template image files; deploy a virtual cluster, take snapshots, and launch a virtual machine when necessary.]
Easy to Use Supercomputer
- Elastic Virtual Cluster -
8
[Diagram: from the login node, sgc-tools creates a virtual cluster from the image repository. The virtual cluster consists of a frontend node (VM) running the job scheduler and NFSd, plus compute nodes connected over InfiniBand; users submit jobs to the frontend, and the cluster scales in/out.]
Note: a single VM runs on a node.
ASGC Hardware Spec.
9
Compute Node
  CPU: Intel Xeon E5-2680 v2 (2.8 GHz, 10 cores) x 2
  Memory: 128 GB DDR3-1866
  InfiniBand: Mellanox ConnectX-3 (FDR)
  Ethernet: Intel X520-DA2 (10 GbE)
  Disk: Intel SSD DC S3500, 600 GB
•  The 155-node cluster consists of Cray H2312 blade servers.
•  The theoretical peak performance is 69.44 TFLOPS (see the note below).
•  Operation started in July 2014.
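As a cross-check (assuming the AVX rate of 8 double-precision FLOPs per cycle per core for this Ivy Bridge part): 155 nodes x 2 CPUs x 10 cores x 2.8 GHz x 8 = 69,440 GFLOPS = 69.44 TFLOPS, matching the stated peak.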
ASGC Software Stack
Management Stack
–  CentOS 6.5 (QEMU/KVM 0.12.1.2)
–  Apache CloudStack 4.3 + our extensions
•  PCI passthrough/SR-IOV support (KVM only)
•  sgc-tools: Virtual cluster construction utility
–  RADOS cluster storage
HPC Stack (Virtual Cluster)
–  Intel Compiler/Math Kernel Library SP1 1.1.106
–  Open MPI 1.6.5
–  Mellanox OFED 2.1
–  Torque job scheduler
10
Outline
•  HPC Cloud
•  AIST Super Green Cloud (ASGC)
•  Experiment
•  Conclusion
11
Benchmark Programs
Micro benchmark
–  Intel MPI Benchmarks (IMB) version 3.2.4
Application-level benchmark
–  HPC Challenge (HPCC) version 1.4.3
•  G-HPL
•  EP-STREAM
•  G-RandomAccess
•  G-FFT
–  OpenMX version 3.7.4
–  Graph 500 version 2.1.4
  
MPI Point-to-point
communication
13
[Plot: throughput (GB/s, log scale) vs. message size (KB), Physical Cluster vs. Virtual Cluster; peak throughput 5.85 GB/s (physical) vs. 5.69 GB/s (virtual).]
The overhead is less than 3% for large messages, though it is up to 25% for small messages.
IMB
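For illustration, the following is a minimal ping-pong throughput measurement in C, sketching the kind of point-to-point test IMB runs; it is not the IMB source, and the 4 MB message size and iteration count are arbitrary choices for the sketch.

    /* Minimal MPI ping-pong throughput sketch between rank 0 and rank 1.
     * Not the IMB code; message size and iteration count are placeholders. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int size = 4 * 1024 * 1024;   /* 4 MB message */
        const int iters = 100;
        char *buf = malloc(size);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();

        if (rank == 0) {
            /* One send and one receive of `size` bytes per iteration (round trip). */
            printf("throughput: %.2f GB/s\n", 2.0 * size * iters / (t1 - t0) / 1e9);
        }
        free(buf);
        MPI_Finalize();
        return 0;
    }

Run with two ranks placed on different nodes (e.g., mpirun -np 2 with a suitable hostfile) so the traffic actually crosses the interconnect.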
MPI Collectives (64 bytes)
  
[Plots: execution time (usec) vs. number of nodes (0-128) for Allgather, Allreduce, and Alltoall, Physical Cluster vs. Virtual Cluster; annotated overheads of +77% and +88% (Allgather, Allreduce) and +43% (Alltoall).]
IMB
The overhead becomes significant as the number of nodes increases... load imbalance?
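For reference, here is a minimal sketch in C of how the 64-byte Allreduce latency could be timed; it is simplified relative to IMB, which also sweeps message sizes and repetition counts.

    /* Times an Allreduce of 8 doubles (64 bytes), averaged over many iterations.
     * Simplified sketch, not the IMB implementation. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double sendbuf[8] = {0.0}, recvbuf[8];   /* 8 doubles = 64 bytes */
        const int iters = 1000;

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++)
            MPI_Allreduce(sendbuf, recvbuf, 8, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("avg Allreduce time: %.1f usec\n", (t1 - t0) / iters * 1e6);
        MPI_Finalize();
        return 0;
    }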
G-HPL (LINPACK)
15
[Plot: performance (TFLOPS) vs. number of nodes (0-128), Physical Cluster vs. Virtual Cluster.]
Performance degradation: 5.4 - 6.6%
Efficiency* on 128 nodes
Physical: 90%
Virtual: 84%
*) Rmax / Rpeak
HPCC
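For context (derived from the node specification earlier, assuming 8 double-precision FLOPs per cycle per core): Rpeak per node is 2 CPUs x 10 cores x 2.8 GHz x 8 = 448 GFLOPS, so Rpeak for 128 nodes is about 57.3 TFLOPS; efficiencies of 90% and 84% then correspond to roughly 51.6 and 48.2 TFLOPS.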
EP-STREAM and G-FFT
  
[Plots: EP-STREAM performance (GB/s) and G-FFT performance (GFLOPS) vs. number of nodes (0-128), Physical Cluster vs. Virtual Cluster.]
HPCC
The overheads are negligible.
EP-STREAM: memory intensive, with no communication.
G-FFT: all-to-all communication with large messages.
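As a reminder of what EP-STREAM stresses, here is a simplified triad-style kernel in C; it is a sketch, not the HPCC EP-STREAM source, the array size is an arbitrary choice, and in EP-STREAM each MPI process runs such a kernel independently.

    /* STREAM-triad-style kernel: a[i] = b[i] + scalar * c[i].
     * Memory-bandwidth bound; no inter-process communication. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void) {
        const size_t n = 1 << 24;              /* 16M doubles (~128 MB) per array */
        double *a = malloc(n * sizeof *a);
        double *b = malloc(n * sizeof *b);
        double *c = malloc(n * sizeof *c);
        const double scalar = 3.0;
        for (size_t i = 0; i < n; i++) { b[i] = 1.0; c[i] = 2.0; }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t i = 0; i < n; i++)
            a[i] = b[i] + scalar * c[i];
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
        /* The triad touches three arrays per element: two reads and one write. */
        printf("bandwidth: %.2f GB/s (a[0] = %.1f)\n",
               3.0 * n * sizeof(double) / sec / 1e9, a[0]);

        free(a); free(b); free(c);
        return 0;
    }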
Graph500 (replicated-csc, scale 26)
17
[Plot: performance (TEPS, log scale 1e7-1e10) vs. number of nodes (0-64), Physical Cluster vs. Virtual Cluster.]
Graph500
Performance degradation: 2% (64 nodes)
Graph500 is a hybrid parallel program (MPI + OpenMP). We used a combination of 2 MPI processes and 10 OpenMP threads.
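Below is a minimal sketch in C of the hybrid MPI + OpenMP execution model; it is not Graph 500 code, and it assumes the 2 x 10 combination means each rank runs a team of 10 OpenMP threads (e.g., launched with OMP_NUM_THREADS=10).

    /* Hybrid MPI + OpenMP skeleton: each MPI rank spawns an OpenMP thread team.
     * Illustrative only; not part of the Graph 500 reference implementation. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int provided, rank;
        /* FUNNELED: only the master thread of each rank makes MPI calls. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        #pragma omp parallel
        {
            #pragma omp single
            printf("rank %d runs %d OpenMP threads\n", rank, omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }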
Findings
•  PCI passthrough is effective in improving I/O performance; however, it still cannot achieve the low communication latency of a physical cluster, due to virtual interrupt injection.
•  VCPU pinning improves performance for HPC applications.
•  Almost all MPI collectives suffer from scalability issues.
•  The virtualization overhead has less impact on actual applications than on microbenchmarks.
  
Outline
•  HPC Cloud
•  AIST Super Green Cloud (ASGC)
•  Experiment
•  Conclusion
19
Conclusion and Future work
•  HPC Cloud is promising.
–  Microbenchmarks: MPI collectives have a scalability issue.
–  Application-level benchmarks: the negative impact is limited, and the virtualization overhead is about 5%.
–  Our HPC Cloud operation started in July 2014.
•  Virtualization can contribute to system utilization
improvements.
–  SR-IOV
–  VM placement optimization based on the workloads of
virtual clusters
20
Question?
Thank you for your attention!
21
Acknowledgments:
The authors would like to thank Assoc. Prof. Vara Varavithya, KMUTNB, and Dr. Yoshio Tanaka, AIST, for their valuable guidance and advice. The authors would also like to thank the ASGC support team for the preparation and troubleshooting of the experiments. This work was partly supported by JSPS KAKENHI Grant Number 24700040.
