Exploring the Performance Impact of Virtualization on an HPC Cloud

  1. Exploring the Performance Impact of Virtualization on an HPC Cloud. Nuttapong Chakthranont†, Phonlawat Khunphet†, Ryousei Takano‡, Tsutomu Ikegami‡. †King Mongkut’s University of Technology North Bangkok, Thailand; ‡National Institute of Advanced Industrial Science and Technology, Japan. IEEE CloudCom 2014@Singapore, 18 Dec. 2014.
  2. Outline •  HPC Cloud •  AIST Super Green Cloud (ASGC) •  Experiment •  Conclusion
  3. HPC Cloud •  HPC users are beginning to take an interest in the Cloud. –  e.g., CycleCloud, Penguin on Demand •  Virtualization is a key technology. –  Pro: a customized software environment, elasticity, etc. –  Con: a large overhead that spoils I/O performance. •  VMM-bypass I/O technologies, e.g., PCI passthrough and SR-IOV, can significantly mitigate the overhead.
  4. Motivating Observation •  Performance evaluation of HPC cloud –  (Para-)virtualized I/O incurs a large overhead. –  PCI passthrough significantly mitigates the overhead. [Figure: execution time (seconds) of the NAS Parallel Benchmarks 3.3.1, class C, 64 processes (BT, CG, EP, FT, LU) for BMM (IB), BMM (10GbE), KVM (IB), and KVM (virtio); BMM: Bare Metal Machine. KVM (virtio) routes I/O from the guest driver through the physical driver in the VMM, whereas KVM (IB) bypasses the VMM via PCI passthrough of the IB QDR HCA, which yields the improvement over virtio.]
  5. HPC Cloud •  HPC users are beginning to take an interest in the Cloud. –  e.g., CycleCloud, Penguin on Demand •  Virtualization is a key technology. –  Pro: a customized software environment, elasticity, etc. –  Con: a large overhead that spoils I/O performance. •  VMM-bypass I/O technologies, e.g., PCI passthrough and SR-IOV, can significantly mitigate the overhead. We quantitatively assess the performance impact of virtualization to demonstrate the feasibility of HPC Cloud.
  6. Outline •  HPC Cloud •  AIST Super Green Cloud (ASGC) •  Experiment •  Conclusion
  7. Easy to Use Supercomputer! - Usage Model of AIST Cloud - Allow users to customize their virtual clusters (HPC, Big Data, Web apps). 1. Choose a template image of a virtual machine. 2. Add required software packages. 3. Save a user-customized template image. VM template image files can be snapshotted; a virtual machine is launched, or a virtual cluster deployed, when necessary. HPC + ease of use.
  8. Easy to Use Supercomputer! - Elastic Virtual Cluster - From the login node, sgc-tools creates a virtual cluster from an image in the image repository. The virtual cluster consists of a frontend node (VM) running NFSd and the job scheduler plus compute-node VMs connected by InfiniBand; it can scale in/scale out, and users submit jobs to the frontend. Note: a single VM runs on a node.
  9. ASGC Hardware Spec. Compute node: CPU: Intel Xeon E5-2680v2 / 2.8 GHz (10 cores) x 2 CPUs; Memory: 128 GB DDR3-1866; InfiniBand: Mellanox ConnectX-3 (FDR); Ethernet: Intel X520-DA2 (10 GbE); Disk: Intel SSD DC S3500, 600 GB. •  The 155-node cluster consists of Cray H2312 blade servers. •  The theoretical peak performance is 69.44 TFLOPS. •  Operation started in July 2014.
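As a quick cross-check of the quoted peak (my own arithmetic, assuming 8 double-precision FLOPs per cycle per Ivy Bridge core), the figure follows directly from the node specification above:

```latex
R_{\mathrm{peak}} = 155\ \text{nodes} \times 2\ \text{CPUs} \times 10\ \text{cores}
                    \times 2.8\ \text{GHz} \times 8\ \tfrac{\text{FLOP}}{\text{cycle}}
                  = 69.44\ \text{TFLOPS}
```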
  10. ASGC Software Stack Management Stack –  CentOS 6.5 (QEMU/KVM 0.12.1.2) –  Apache CloudStack 4.3 + our extensions •  PCI passthrough/SR-IOV support (KVM only) •  sgc-tools: virtual cluster construction utility –  RADOS cluster storage HPC Stack (Virtual Cluster) –  Intel Compiler/Math Kernel Library SP1 1.1.106 –  Open MPI 1.6.5 –  Mellanox OFED 2.1 –  Torque job scheduler
  11. Outline •  HPC Cloud •  AIST Super Green Cloud (ASGC) •  Experiment •  Conclusion
  12. Benchmark Programs Micro benchmark –  Intel MPI Benchmarks (IMB) version 3.2.4 Application-level benchmarks –  HPC Challenge (HPCC) version 1.4.3 •  G-HPL •  EP-STREAM •  G-RandomAccess •  G-FFT –  OpenMX version 3.7.4 –  Graph 500 version 2.1.4
  13. MPI Point-to-point Communication (IMB) [Figure: throughput (GB/s) vs. message size (KB) for the physical and virtual clusters; peak throughput is 5.85 GB/s (physical) vs. 5.69 GB/s (virtual).] The overhead is less than 3% with large messages, though it is up to 25% with small messages.
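For readers unfamiliar with how such point-to-point numbers are obtained, the following is a minimal sketch in the spirit of a ping-pong throughput test, not the actual IMB source; the message size and repetition count are illustrative.

```c
/* Minimal MPI ping-pong throughput sketch (illustrative, not IMB itself).
 * Run with at least 2 ranks: rank 0 sends a buffer to rank 1 and waits for
 * it to come back; throughput is derived from the round-trip time. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int reps = 1000;
    const int msg_bytes = 1 << 20;            /* 1 MB message, illustrative */
    char *buf = malloc(msg_bytes);
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, msg_bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, msg_bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, msg_bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, msg_bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t = (MPI_Wtime() - t0) / reps;     /* average round-trip time */

    if (rank == 0)
        printf("throughput: %.2f GB/s\n",
               2.0 * msg_bytes / t / 1e9);    /* one-way rate: bytes / (t/2) */

    free(buf);
    MPI_Finalize();
    return 0;
}
```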
  14. MPI Collectives (64 bytes, IMB) [Figure: execution time (usec) vs. number of nodes (up to 128) for Allgather, Allreduce, and Alltoall on the physical and virtual clusters; the virtual cluster shows overheads of +77%, +88%, and +43%.] The overhead becomes significant as the number of nodes increases… load imbalance?
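The collective measurements follow the same pattern as the point-to-point test; below is a minimal sketch that times MPI_Allreduce on a 64-byte payload (the repetition count and payload layout are illustrative, not the IMB implementation).

```c
/* Minimal 64-byte MPI_Allreduce timing sketch (illustrative). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int reps = 1000;
    double in[8] = {0}, out[8];              /* 8 doubles = 64 bytes */
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);             /* align ranks before timing */
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++)
        MPI_Allreduce(in, out, 8, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    double t = (MPI_Wtime() - t0) / reps;

    if (rank == 0)
        printf("Allreduce(64B) avg: %.1f usec\n", t * 1e6);

    MPI_Finalize();
    return 0;
}
```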
  15. G-HPL (LINPACK), HPCC [Figure: performance (TFLOPS) vs. number of nodes (up to 128) for the physical and virtual clusters.] Performance degradation: 5.4 - 6.6%. Efficiency* on 128 nodes: Physical 90%, Virtual 84%. *) Rmax / Rpeak
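A rough back-of-envelope reading of those efficiencies (my own arithmetic, using the per-node peak implied by the hardware slide, 69.44 TFLOPS / 155 nodes ≈ 0.448 TFLOPS, so R_peak ≈ 57.3 TFLOPS on 128 nodes):

```latex
R_{\max}^{\text{phys}} \approx 0.90 \times 57.3\ \text{TFLOPS} \approx 51.6\ \text{TFLOPS},
\qquad
R_{\max}^{\text{virt}} \approx 0.84 \times 57.3\ \text{TFLOPS} \approx 48.2\ \text{TFLOPS}
```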
  16. EP-STREAM and G-FFT, HPCC [Figure: EP-STREAM performance (GB/s) and G-FFT performance (GFLOPS) vs. number of nodes (up to 128) for the physical and virtual clusters.] The overheads are negligible. EP-STREAM is memory intensive with no communication; G-FFT performs all-to-all communication with large messages.
  17. Graph500 (replicated-csc, scale 26) [Figure: performance (TEPS, log scale) vs. number of nodes (up to 64) for the physical and virtual clusters.] Performance degradation: 2% (64 nodes). Graph500 is a hybrid parallel program (MPI + OpenMP); we used a combination of 2 MPI processes and 10 OpenMP threads.
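A minimal skeleton of the hybrid setup described above (the 10-thread count is the only number taken from the slide; everything else is generic MPI + OpenMP boilerplate, not the Graph500 reference code):

```c
/* Hybrid MPI + OpenMP skeleton (illustrative), matching the
 * "2 MPI processes x 10 OpenMP threads per node" configuration. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;

    /* Request threaded MPI; FUNNELED suffices if only the master thread calls MPI. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    omp_set_num_threads(10);                 /* 10 OpenMP threads per MPI process */

    #pragma omp parallel
    {
        #pragma omp single
        printf("rank %d running %d threads\n", rank, omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}
```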
  18. Findings •  PCI passthrough is effective in improving I/O performance; however, it is still unable to achieve the low communication latency of a physical cluster because of virtual interrupt injection. •  VCPU pinning improves the performance of HPC applications. •  Almost all MPI collectives suffer from the scalability issue. •  The overhead of virtualization has less impact on actual applications.
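The vCPU pinning mentioned above is normally configured on the host through the hypervisor management layer; mechanically it amounts to fixing the CPU affinity of each vCPU thread. The sketch below shows only that underlying mechanism, with a hypothetical thread ID and core number, and is not how ASGC configures it.

```c
/* Pin one (hypothetical) vCPU thread to one physical core by setting its
 * CPU affinity mask -- the low-level effect of vCPU pinning. */
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <stdio.h>

int pin_thread_to_core(pid_t tid, int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return sched_setaffinity(tid, sizeof(set), &set);   /* 0 on success */
}

int main(void)
{
    pid_t vcpu_tid = 12345;   /* hypothetical TID of a QEMU vCPU thread */
    if (pin_thread_to_core(vcpu_tid, 3) != 0)
        perror("sched_setaffinity");
    return 0;
}
```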
  19. Outline •  HPC Cloud •  AIST Super Green Cloud (ASGC) •  Experiment •  Conclusion
  20. Conclusion and Future Work •  HPC Cloud is promising. –  Micro benchmarks: MPI collectives have a scalability issue. –  Application-level benchmarks: the negative impact is limited and the virtualization overhead is about 5%. –  Our HPC Cloud operation started in July 2014. •  Virtualization can contribute to improvements in system utilization. –  SR-IOV –  VM placement optimization based on the workloads of virtual clusters
  21. Questions? Thank you for your attention! Acknowledgments: The authors would like to thank Assoc. Prof. Vara Varavithya, KMUTNB, and Dr. Yoshio Tanaka, AIST, for valuable guidance and advice. In addition, the authors would also like to thank the ASGC support team for the preparation and troubleshooting of the experiments. This work was partly supported by JSPS KAKENHI Grant Number 24700040.
