AIST Super Green Cloud:
Lessons Learned from the Operation and the Performance Evaluation of HPC Cloud

Ryousei Takano, Yusuke Tanimura, Akihiko Oota, Hiroki Oohashi, Keiichi Yusa, Yoshio Tanaka
National Institute of Advanced Industrial Science and Technology (AIST), Japan
ISGC 2015, Taipei, 20 March 2015
This talk is about…
AIST Super Green Cloud
Introduction
•  The HPC Cloud is a promising HPC platform.
•  Virtualization is a key technology.
–  Pro: customized software environments, elasticity, etc.
–  Con: large overhead, which spoils I/O performance.
•  VMM-bypass I/O technologies, e.g., PCI passthrough and SR-IOV, can significantly mitigate this overhead.
•  “99% of HPC jobs running on US NSF computing centers fit into one rack.” -- M. Norman, UCSD
•  Current virtualization technologies are capable of supporting such a scale.
LINPACK on ASGC
[Figure: LINPACK performance (TFLOPS) vs. number of nodes (up to 128) for the physical cluster and the virtual cluster]
Performance degradation: 5.4 - 6.6%
Efficiency* on 128 nodes
  Physical cluster: 90%
  Virtual cluster: 84%
*) Rmax / Rpeak
(IEEE CloudCom 2014)
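As a sanity check, the sketch below recomputes the 128-node figures from the node spec given on the hardware slide (2 x 10-core Xeon E5-2680v2 at 2.8 GHz). The assumption of 8 double-precision FLOPs per cycle per core (AVX on Ivy Bridge) is ours, not stated in the slides.

# Back-of-the-envelope check of the 128-node LINPACK figures above.
# Assumes 8 DP FLOPs/cycle/core (AVX on Ivy Bridge); node spec is taken
# from the hardware slide later in this deck.
flops_per_node = 2 * 10 * 2.8e9 * 8            # 448 GFLOPS per node
rpeak_128 = 128 * flops_per_node / 1e12        # ~57.3 TFLOPS
rmax_physical = 0.90 * rpeak_128               # ~51.6 TFLOPS
rmax_virtual = 0.84 * rpeak_128                # ~48.2 TFLOPS
degradation = 1 - rmax_virtual / rmax_physical
print(f"{rpeak_128:.1f} TFLOPS peak, {degradation:.1%} degradation")
# ~6.7% degradation, in line with the reported 5.4-6.6% range once the
# rounding of the efficiency figures (90% / 84%) is taken into account.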
Introduction (cont’d)
•  HPC Clouds are heading toward hybrid-cloud and multi-cloud systems, where users can execute their applications anytime and anywhere they want.
•  Vision of the AIST HPC Cloud: “Build once, run everywhere”
•  AIST Super Green Cloud (ASGC): a fully virtualized HPC system
Outline
•  AIST Super Green Cloud (ASGC) and HPC Cloud service
•  Lessons learned from the first six months of operation
•  Experiments
•  Conclusion
Vision of AIST HPC Cloud: “Build Once, Run Everywhere”
[Figure: virtual cluster templates deployed as virtual clusters on academic, private, and commercial clouds]
•  Feature 1: Create a customized virtual cluster easily
•  Feature 2: Build a virtual cluster once, and run it everywhere on clouds
Usage Model of AIST Cloud
[Figure: workflow for customizing and deploying virtual clusters from VM template files (HPC, Big Data, Web apps)]
Allow users to customize their virtual clusters:
1. Select a template of a virtual machine
2. Install the required software packages
3. Save the user-customized template in the repository
Virtual machines and virtual clusters are then deployed from the saved templates (snapshots) whenever necessary, combining HPC capability with ease of use.
Elastic Virtual Cluster
[Figure: sgc-tools on the login node drives the cloud controller to create virtual clusters from templates in the image repository]
•  Virtual cluster on ASGC (in operation): a frontend node running the job scheduler and NFS, plus compute nodes connected over InfiniBand/Ethernet; users submit jobs to the frontend, and the cluster scales in/out.
•  Virtual cluster on a public cloud (under development): a frontend node plus compute nodes connected over Ethernet; virtual cluster templates are imported/exported through the image repository.
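To make the scale-in/scale-out idea above concrete, here is a toy control loop in Python. It is purely illustrative: queued_jobs, idle_nodes, add_compute_node, and remove_compute_node are hypothetical helpers standing in for job-scheduler queries and cloud-controller calls; they are not part of sgc-tools or CloudStack.

import time

# Hypothetical helpers (stubs), NOT real sgc-tools/CloudStack/TORQUE calls.
def queued_jobs():           return 0    # jobs waiting in the queue
def idle_nodes():            return []   # compute nodes with no running jobs
def add_compute_node():      pass        # ask the cloud controller for one more VM
def remove_compute_node(n):  pass        # release an idle compute VM

MIN_NODES, MAX_NODES = 1, 32

def elastic_loop(current_nodes=MIN_NODES):
    """Toy loop: grow the virtual cluster while jobs wait, shrink it when idle."""
    while True:
        if queued_jobs() > 0 and current_nodes < MAX_NODES:
            add_compute_node()                       # scale out
            current_nodes += 1
        elif queued_jobs() == 0:
            for node in idle_nodes():
                if current_nodes <= MIN_NODES:
                    break
                remove_compute_node(node)            # scale in
                current_nodes -= 1
        time.sleep(60)                               # re-evaluate every minute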
ASGC Hardware Spec.
•  155-node cluster consisting of Cray H2312 blade servers
•  Theoretical peak performance: 69.44 TFLOPS

Compute node
  CPU: Intel Xeon E5-2680v2 2.8 GHz (10 cores) x 2 CPUs
  Memory: 128 GB DDR3-1866
  InfiniBand: Mellanox ConnectX-3 (FDR)
  Ethernet: Intel X520-DA2 (10 GbE)
  Disk: Intel SSD DC S3500 600 GB

Network switches
  InfiniBand: Mellanox SX6025
  Ethernet: Extreme BlackDiamond X8
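As a quick check, the 69.44 TFLOPS figure follows from the node spec above, assuming 8 double-precision FLOPs per cycle per core (AVX on Ivy Bridge):

# Theoretical peak: 155 nodes x 2 sockets x 10 cores x 2.8 GHz x 8 FLOPs/cycle
peak_tflops = 155 * 2 * 10 * 2.8e9 * 8 / 1e12
print(peak_tflops)   # 69.44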
ASGC Software Stack
Management Stack
–  CentOS 6.5 (QEMU/KVM 0.12.1.2)
–  Apache CloudStack 4.3 + our extensions
•  PCI passthrough/SR-IOV support (KVM only); see the sketch below
•  sgc-tools: Virtual cluster construction utility
–  RADOS cluster storage
HPC Stack (Virtual Cluster)
–  Intel Cluster Studio SP1 1.2.144
–  Mellanox OFED 2.1
–  TORQUE job scheduler 4.2.8
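As referenced above, PCI passthrough and SR-IOV on KVM are ultimately expressed at the libvirt level as a <hostdev> device. The Python sketch below shows the generic mechanism only; it is not our CloudStack extension, and the PCI address and guest name are placeholders.

import libvirt

# Hand a host PCI device (e.g., an SR-IOV virtual function of the
# ConnectX-3 HCA) to a running KVM guest. Address and name are placeholders.
HOSTDEV_XML = """
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x04' slot='0x00' function='0x1'/>
  </source>
</hostdev>
"""

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("hpc-vm-0")   # hypothetical guest name
dom.attachDeviceFlags(HOSTDEV_XML, libvirt.VIR_DOMAIN_AFFECT_LIVE)
conn.close()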
Storage Architecture
[Figure: 155 compute nodes connected by the compute network (InfiniBand FDR), data network (10 GbE), and management network (10/1 GbE) to the VMDI storage cluster and user-attached storage]
•  No shared storage/filesystem; each compute node's local SSD is used as primary storage.
•  VMDI (Virtual Machine Disk Image) storage serves as secondary storage:
–  RADOS storage cluster
–  RADOS gateway
–  NFS secondary staging server
•  User-attached storage holds user data.
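Since the RADOS gateway speaks the S3 protocol, disk images can be moved in and out of the secondary storage with any standard S3 client. A minimal sketch with boto3 follows; the endpoint, credentials, bucket, and object names are placeholders rather than ASGC's actual configuration.

import boto3

# Upload a VM disk image to the S3-compatible RADOS gateway.
# Endpoint, credentials, bucket, and key are placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="http://radosgw.example.org:7480",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)
s3.upload_file("centos65-hpc.qcow2", "vmdi-templates", "centos65-hpc.qcow2")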
Zabbix Monitoring System
[Screenshots: Zabbix dashboards showing CPU usage and power usage]
Outline
•  AIST Super Green Cloud (ASGC) and HPC Cloud service
•  Lessons learned from the first six months of operation
–  CloudStack on a Supercomputer
–  Cloud Service for HPC users
–  Utilization
•  Experiments
•  Conclusion
Overview of ASGC Operation
•  Operation started in July 2014.
•  Accounts: 30+
–  The main users are materials scientists and genome scientists.
•  Utilization: < 70%
•  95% of the total usage time is consumed by HPC VM instances.
•  Hardware failures: 19 (memory, M/B, power supply)
CloudStack on a Supercomputer
•  A supercomputer is not designed for cloud computing.
–  Cluster management software is troublesome.
•  We can launch a highly productive system in a short
development time by leveraging open source system
software.
•  Software maturity of CloudStack
–  Our storage architecture is slightly uncommon: we use local SSDs as primary storage and an S3-compatible object store as secondary storage.
–  We discovered and resolved several serious bugs.
Software Maturity
CloudStack issues we encountered (our action; status):
–  cloudstack-agent jsvc gets too large a virtual memory space (Patch; Fixed)
–  listUsageRecords generates NullPointerExceptions for expunging instances (Patch; Fixed)
–  Duplicate usage records when listing a large number of records / small page sizes return duplicate results (Backporting; Fixed)
–  Public key content is overridden by the template's metadata when you create an instance (Bug report; Fixed)
–  Migration of a VM with volumes in local storage to another host in the same cluster is failing (Backporting; Fixed)
–  Negative ref_cnt of template(snapshot/volume)_store_ref results in an out-of-range error in MySQL (Patch, not merged; Fixed)
–  [S3] Parallel deployment makes the reference count of a cache in the NFS secondary staging store negative (-1) (Patch, not merged; Unresolved)
–  Can't create a proper template from a VM on an S3 secondary storage environment (Patch; Fixed)
–  Fails to attach a volume (made from a snapshot) to a VM using local storage as primary storage (Bug report; Unresolved)
Cloud service for HPC users
•  SaaS is best if the target application is clear.
•  IaaS is quite flexible; however, it is difficult for application users to manage an HPC environment from scratch.
•  To bridge this gap, sgc-tools is introduced on top of the IaaS service.
•  We believe it works well, although some minor problems remain.
•  To make VM templates easier to maintain, the idea of “infrastructure as code” can help, as in the sketch below.
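To illustrate the “infrastructure as code” idea: instead of hand-installing software and snapshotting the VM, the template's software stack is described in a small versioned script that can be re-run to rebuild it. The sketch below is a toy example; the package list is illustrative, not ASGC's actual template contents.

#!/usr/bin/env python
"""Toy 'infrastructure as code' provisioning script for a VM template."""
import subprocess

PACKAGES = ["nfs-utils", "environment-modules", "openmpi"]  # illustrative only

def provision():
    for pkg in PACKAGES:
        # yum skips packages that are already installed, so re-running the
        # script converges to the same template state.
        subprocess.check_call(["yum", "-y", "install", pkg])

if __name__ == "__main__":
    provision()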
Utilization
•  Efficient use of limited resources is required.
•  A virtual cluster holds dedicated resources whether or not the user fully utilizes them.
•  sgc-tools does not support system-wide queuing, so users need to check resource availability themselves.
•  Introducing a global scheduler, e.g., Condor VM
universe, can be a solution for this problem.
Outline
•  AIST Super Green Cloud (ASGC) and HPC Cloud service
•  Lessons learned from the first six months of operation
•  Experiments
–  Deployment time
–  Performance evaluation of SR-IOV
•  Conclusion
Virtual Cluster Deployment
[Figure: deployment time (seconds) vs. number of nodes (up to 35) for three cases: no cache, image cached on the NFS secondary staging server (SS), and image cached on the local node]
Breakdown (seconds):
  Device attach (before OS boot): 90
  OS boot: 90
  FS creation (mkfs): 90
  Image transfer from RADOS to the SS, and from the SS to the local compute node: varies with caching (see figure)
Benchmark Programs
Micro benchmark
–  Intel MPI Benchmarks (IMB) version 3.2.4
•  Point-to-point
•  Collectives: Allgather, Allreduce, Alltoall, Bcast, Reduce, Barrier
Application-level benchmark
–  LAMMPS Molecular Dynamics Simulator version 28 June
2014
•  EAM benchmark, 100x100x100 atoms
MPI Point-to-point
communication
[Figure: IMB point-to-point bandwidth (MB/s) vs. message size for bare metal (BM), PCI passthrough, and SR-IOV; peak bandwidth is 6.00 GB/s (BM), 5.72 GB/s (PCI passthrough), and 5.73 GB/s (SR-IOV)]
The overhead is less than 5% for large messages, though it reaches up to 30% for small messages.
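A quick check of the large-message claim using the peak bandwidths in the figure:

# Large-message bandwidth overhead relative to bare metal (6.00 GB/s).
bm, pci_pt, sriov = 6.00, 5.72, 5.73
print(f"PCI passthrough: {1 - pci_pt / bm:.1%}")  # ~4.7%
print(f"SR-IOV:          {1 - sriov / bm:.1%}")   # ~4.5%
# Both stay below 5%, consistent with the statement above.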
MPI Collectives
[Figure: IMB execution time (microseconds) vs. message size for Allgather, Allreduce, Alltoall, Bcast, and Reduce, comparing bare metal (BM), PCI passthrough, and SR-IOV]
Barrier time (microseconds, ratio to BM):
  BM: 6.87 (1.00)
  PCI passthrough: 8.07 (1.17)
  SR-IOV: 9.36 (1.36)
The performance of SR-IOV is comparable to that of PCI passthrough, although unexpected performance degradation is often observed.
LAMMPS: MD simulator
•  EAM benchmark:
–  Fixed problem size (1M
atoms)
–  #proc: 20 - 160
•  VCPU pinning reduces
performance fluctuation.
•  Performance overhead
of PCI passthrough and
SR-IOV is about 13 %.
[Figure: LAMMPS EAM execution time (seconds) vs. number of processes (20-160) for BM, PCI passthrough, and SR-IOV, each with and without vCPU pinning]
Findings
•  The performance of SR-IOV is comparable to that of PCI passthrough, although unexpected performance degradation is often observed.
•  vCPU pinning improves performance for HPC applications; a libvirt-level sketch follows.
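vCPU pinning of a KVM guest can be done through libvirt, as in the minimal sketch below; the guest name and the simple one-to-one vCPU-to-core mapping are placeholders, not ASGC's actual configuration.

import libvirt

# Pin each vCPU of a guest to one physical core so the host scheduler
# cannot migrate it (reduces run-to-run fluctuation). Placeholders only.
conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("hpc-vm-0")

host_cpus = conn.getInfo()[2]        # number of physical CPUs on the host
for vcpu in range(dom.maxVcpus()):
    cpumap = tuple(i == vcpu for i in range(host_cpus))   # vCPU i -> core i
    dom.pinVcpu(vcpu, cpumap)
conn.close()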
Outline
•  AIST Super Green Cloud (ASGC) and HPC Cloud service
•  Lessons learned from the first six months of operation
•  Experiments
•  Conclusion
Conclusion and Future work
•  ASGC is a fully virtualized HPC system.
•  We can launch a highly productive system in a short development time by leveraging state-of-the-art open source system software.
–  Extensions: PCI passthrough/SR-IOV support, sgc-tools
–  Bug fixes…
•  Future research direction: data movement is key.
–  Efficient data management and transfer methods
–  Federated identity management
Questions?
Thank you for your attention!
Acknowledgments:
This work was partly supported by JSPS KAKENHI Grant Number 24700040.
