IEEE CloudCom 2014 Trip Report
Part presented by Takano @ AIST
•  Session 2C: Virtualization I
•  Sessions 3C, 4B: HPC on Cloud
2015-02-06, the 45th Grid Consortium Workshop
Impressions
•  Strongly academic in character
•  Many Asian attendees
•  Considering the acceptance rate...
•  The field is maturing
Rank: CORE computer science conference rankings
Publication, Citation: Microsoft Academic Search
Conference        Rank   Publication   Citation   % accepted
IEEE/ACM CCGrid   A      1454          10577      19
IEEE CLOUD        B      234           445        18
IEEE CloudCom     C      70            187        18
IEEE CloudNet     -      -             -          28
IEEE/ACM UCC      -      -             -          19
ACM SoCC          -      -             -          24
CLOSER            -      -             -          17
The table lists international conferences bearing "Cloud" in their names (the ordering carries no particular meaning).
(Reference: Gartner Hype Curve 2014)
A 3-level Cache Miss Model for a Nonvolatile Extension to Transcendent Memory
•  Transcendent memory (tmem)
–  Memory whose size nobody knows, whose writes may fail, and whose data may already be gone at read time (a minimal sketch of these semantics follows below)
–  A mechanism for managing a cache of clean pages
•  cleancache, frontswap
•  zcache, RAMster, Xen shim
–  Example application: memory overprovisioning in VM environments
•  NEXTmem (aka Ex-Tmem)
–  Uses nonvolatile memory to enlarge the cache
–  Memory hierarchies in cloud environments tend to grow deeper, so analytical models of them are an important research topic
[Figure: NEXTmem memory allocation in the hypervisor — a guest VM's evicted pages pass through a put buffer (FIFO); clean pages are cached in a level-1 DRAM hot region (LRU) and a level-2 NVM clean region (LFU) plus swap region, while dirty pages are flushed to disk]
3
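To make the tmem contract above concrete, here is a minimal Python sketch of those semantics. It deliberately does not model the real Linux cleancache/frontswap API; the page key layout and the random drop probabilities are purely illustrative.

```python
import random

class TranscendentMemory:
    """Toy model of tmem semantics: capacity is unknown to the guest,
    a put may be refused, and a page that was accepted may be gone by
    the time it is read back. Only clean pages belong here, because the
    guest must always be able to refetch them from disk."""

    def __init__(self):
        self._pages = {}  # key: (pool_id, object_id, index) -> page bytes

    def put(self, key, page):
        # The hypervisor may reject the page for any reason (e.g. memory pressure).
        if random.random() < 0.2:
            return False
        self._pages[key] = page
        return True

    def get(self, key):
        # The hypervisor may have evicted the page since it was put.
        if random.random() < 0.2:
            self._pages.pop(key, None)
        return self._pages.get(key)  # None == cache miss; refetch from disk


# Guest-side usage: treat tmem strictly as a best-effort clean-page cache.
tmem = TranscendentMemory()
key = (0, 42, 7)                       # hypothetical page identifier
tmem.put(key, b"clean page contents")  # may silently fail
page = tmem.get(key)
if page is None:
    page = b"clean page contents"      # fall back to re-reading from disk
```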
Reference: Persistent memory
•  Block device
–  NVMe  driver
•  File system
–  Removes the file-cache layer and accesses NVM directly (a rough sketch follows this slide)
–  PMFS,  DAX
•  OpenNVM  (SanDisk)
–  API:  atomic  write,  atomic  trim
–  NVMKV,  NVMFS
•  SNIA  NVM  Programming  Technical  WG
–  http://www.snia.org/forums/sssi/nvmp
4
PM = the Linux term for nonvolatile memory
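As a rough illustration of the "bypass the file cache, access NVM directly" idea behind DAX and PMFS, the sketch below memory-maps a file assumed to live on a DAX-mounted filesystem. The path /mnt/pmem is hypothetical, and real persistent-memory code would typically use explicit CPU cache flushes (e.g. via libpmem) rather than rely on msync alone.

```python
import mmap
import os

# Assumption: /mnt/pmem is an ext4/XFS filesystem mounted with the dax option
# (or a PMFS mount), so stores through this mapping bypass the kernel page cache.
PATH = "/mnt/pmem/example.dat"   # hypothetical path
SIZE = 4096

fd = os.open(PATH, os.O_CREAT | os.O_RDWR, 0o644)
os.ftruncate(fd, SIZE)

buf = mmap.mmap(fd, SIZE)
buf[0:15] = b"persistent data"   # store goes toward NVM directly under DAX
buf.flush()                      # msync(); persistent-memory libraries usually
                                 # add cache-line flush + fence instructions here
buf.close()
os.close(fd)
```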
HPC  on  Cloud  (8  papers)
1.  “Reliability Guided Resource Allocation for Large-Scale Systems,” S. Umamaheshwaran and T. J. Hacker (Purdue U.)
2.  “Energy-Efficient Scheduling of Urgent Bag-of-Tasks Applications in Clouds through DVFS,” R. N. Calheiros and R. Buyya (U. Melbourne)
3.  “A Framework for Measuring the Impact and Effectiveness of the NEES Cyberinfrastructure for Earthquake Engineering,” T. Hacker and A. J. Magana (Purdue U.)
4.  “Executing Bag of Distributed Tasks on the Cloud: Investigating the Trade-Offs between Performance and Cost,” L. Thai, B. Varghese, and A. Barker (U. St Andrews)
5.  “CPU Performance Coefficient (CPU-PC): A Novel Performance Metric Based on Real-Time CPU Resource Provisioning in Time-Shared Cloud Environments,” T. Mastelić, I. Brandić, and J. Jašarević (Vienna U. of Technology)
6.  “Performance Analysis of Cloud Environments on Top of Energy-Efficient Platforms Featuring Low Power Processors,” V. Plugaru, S. Varrette, and P. Bouvry (U. Luxembourg)
7.  “Exploring the Performance Impact of Virtualization on an HPC Cloud,” N. Chakthranont, P. Khunphet, R. Takano, and T. Ikegami (KMUTNB, AIST)
8.  “GateCloud: An Integration of Gate Monte Carlo Simulation with a Cloud Computing Environment,” B. A. Rowedder, H. Wang, and Y. Kuang (UNLV)
5
Keywords
•  Goals
–  Fault tolerance [1], energy efficiency [2, 6], performance metrics [4, 5], high performance [6, 7]
•  Systems
–  Resource provisioning / schedulers [1, 4, 5]
–  IaaS: OpenStack [6], CloudStack [7]
–  Workflow [8]
•  Applications
–  MPI [6, 7]
–  Bag of Tasks [2], Bag of Distributed Tasks [4]
–  Web applications (FFmpeg, MongoDB, Ruby on Rails) [5]
–  Monte Carlo [8]
–  Earthquake Engineering [3]
6
CPU Performance Coefficient (CPU-PC): A Novel Performance Metric Based on Real-Time CPU Resource Provisioning in Time-Shared Cloud Environments
•  In cloud environments, multiple VMs coexist on a single server
•  A performance metric usable by both cloud providers and users is desirable
–  Response time fluctuates due to interference from other VMs
•  Proposes CPU-PC, a metric that focuses on stolen time (an illustrative measurement sketch follows)
•  CPU-PC correlates very strongly with response time
7
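The paper's exact CPU-PC formula is not reproduced in these slides; as a hedged stand-in for the underlying idea, the sketch below measures, from inside a guest, the fraction of CPU time stolen by the hypervisor using the steal counter in /proc/stat. Treat it only as a proxy for the proposed metric.

```python
import time

def read_cpu_ticks():
    """Return (steal, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    with open("/proc/stat") as f:
        fields = f.readline().split()[1:]       # user nice system idle iowait irq softirq steal ...
    ticks = [int(x) for x in fields]
    steal = ticks[7] if len(ticks) > 7 else 0   # 'steal' is the 8th counter
    return steal, sum(ticks)

def stolen_fraction(interval=1.0):
    """Fraction of CPU time stolen by the hypervisor over the interval.
    Only an illustrative proxy for the CPU-PC metric proposed in the paper."""
    s0, t0 = read_cpu_ticks()
    time.sleep(interval)
    s1, t1 = read_cpu_ticks()
    return (s1 - s0) / max(t1 - t0, 1)

if __name__ == "__main__":
    print("stolen CPU fraction: {:.1%}".format(stolen_fraction()))
```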
ASGC Hardware Spec.
8
Compute node
–  CPU: Intel Xeon E5-2680v2, 2.8 GHz (10 cores) x 2 CPUs
–  Memory: 128 GB DDR3-1866
–  InfiniBand: Mellanox ConnectX-3 (FDR)
–  Ethernet: Intel X520-DA2 (10 GbE)
–  Disk: Intel SSD DC S3500, 600 GB
•  The 155-node cluster consists of Cray H2312 blade servers
•  The theoretical peak performance is 69.44 TFLOPS
•  Operation started in July 2014
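For reference, the quoted 69.44 TFLOPS peak is consistent with the node spec above, assuming 8 double-precision FLOP per cycle per Ivy Bridge core (AVX):

```latex
R_{\mathrm{peak}} = 155~\text{nodes} \times 2~\text{CPUs} \times 10~\text{cores}
  \times 2.8~\text{GHz} \times 8~\tfrac{\text{FLOP}}{\text{cycle}}
  = 69.44~\text{TFLOPS}
```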
Exploring the Performance Impact of Virtualization on an HPC Cloud
ASGC Software Stack
Management Stack
–  CentOS 6.5 (QEMU/KVM 0.12.1.2)
–  Apache CloudStack 4.3 + our extensions
•  PCI passthrough/SR-IOV support (KVM only)
•  sgc-tools: Virtual cluster construction utility
–  RADOS cluster storage
HPC Stack (Virtual Cluster)
–  Intel Compiler/Math Kernel Library SP1 1.1.106
–  Open MPI 1.6.5
–  Mellanox OFED 2.1
–  Torque job scheduler
9
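As an illustration of what the PCI passthrough support listed above has to arrange per guest, here is a minimal sketch using the libvirt Python bindings to attach a host PCI device (such as an InfiniBand HCA) to a KVM domain. The domain name and PCI address are hypothetical, and the paper's actual CloudStack integration ("our extensions") is not shown.

```python
import libvirt

# Hypothetical guest name and host PCI address (e.g. a Mellanox ConnectX-3 HCA).
DOMAIN = "vc-node-001"
HOSTDEV_XML = """
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
  </source>
</hostdev>
"""

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName(DOMAIN)
# Attach the device to both the running guest and its persistent definition.
dom.attachDeviceFlags(HOSTDEV_XML,
                      libvirt.VIR_DOMAIN_AFFECT_LIVE |
                      libvirt.VIR_DOMAIN_AFFECT_CONFIG)
conn.close()
```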
Benchmark Programs
Micro benchmark
–  Intel MPI Benchmarks (IMB) version 3.2.4
Application-level benchmark
–  HPC Challenge (HPCC) version 1.4.3
•  G-HPL
•  EP-STREAM
•  G-RandomAccess
•  G-FFT
–  OpenMX version 3.7.4
–  Graph 500 version 2.1.4
10
MPI Point-to-point
communication
11
[Figure: MPI point-to-point throughput (GB/s) vs. message size (KB), log scale; the Physical Cluster peaks at 5.85 GB/s, the Virtual Cluster at 5.69 GB/s]
The overhead is less than 3% with large messages, though it is up to 25% with small messages.
IMB
MPI Collectives (64bytes)
12
[Figure: IMB Allgather, Allreduce, and Alltoall execution time (usec) vs. number of nodes (up to 128), Physical vs. Virtual Cluster; annotated overheads: +77%, +88%, +43%]
The overhead becomes significant as the number of nodes increases ... load imbalance?
G-HPL (LINPACK)
13
[Figure: G-HPL (HPCC) performance (TFLOPS) vs. number of nodes (up to 128), Physical vs. Virtual Cluster]
Performance degradation: 5.4 - 6.6%
Efficiency* on 128 nodes: Physical 90%, Virtual 84%
*) Efficiency = Rmax / Rpeak
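Assuming the same 448 GFLOPS per-node peak estimated earlier, these efficiencies correspond roughly to:

```latex
R_{\mathrm{peak}}^{128} = 128 \times 448~\text{GFLOPS} \approx 57.3~\text{TFLOPS},\qquad
\text{Efficiency} = \frac{R_{\mathrm{max}}}{R_{\mathrm{peak}}}
\;\Rightarrow\;
R_{\mathrm{max}} \approx 0.90 \times 57.3 \approx 51.6~\text{TFLOPS (physical)},\quad
0.84 \times 57.3 \approx 48.2~\text{TFLOPS (virtual)}
```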
EP-STREAM and G-FFT
14
[Figure: HPCC EP-STREAM performance (GB/s) and G-FFT performance (GFLOPS) vs. number of nodes (up to 128), Physical vs. Virtual Cluster]
The overheads are negligible: EP-STREAM is memory intensive with no communication, while G-FFT performs all-to-all communication with large messages.
Graph500 (replicated-csc, scale 26)
15
[Figure: Graph500 performance (TEPS, log scale) vs. number of nodes (up to 64), Physical vs. Virtual Cluster]
Performance degradation: 2% (64 nodes)
Graph500 is a Hybrid parallel program (MPI + OpenMP).
We used a combination of 2 MPI processes and 10 OpenMP threads.
Findings
•  PCI passthrough is effective in improving I/O performance; however, it still cannot match the low communication latency of a physical cluster because of virtual interrupt injection.
•  VCPU pinning improves performance for HPC applications (see the sketch below).
•  Almost all MPI collectives suffer from scalability issues.
•  The overhead of virtualization has less impact on actual applications than on micro benchmarks.
16
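The VCPU pinning finding can be illustrated with the libvirt Python bindings. The sketch below pins each vCPU of a hypothetical 20-vCPU guest one-to-one onto the 20 physical cores of a compute node like the one described earlier; it is an illustration, not the configuration actually used in the paper.

```python
import libvirt

DOMAIN = "vc-node-001"   # hypothetical guest name
HOST_CPUS = 20           # 2 sockets x 10 cores on the compute node
GUEST_VCPUS = 20

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName(DOMAIN)

for vcpu in range(GUEST_VCPUS):
    # cpumap holds one boolean per host CPU; pin vCPU i onto physical core i.
    cpumap = tuple(i == vcpu for i in range(HOST_CPUS))
    dom.pinVcpu(vcpu, cpumap)

conn.close()
```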
