SlideShare a Scribd company logo
1 of 29
Download to read offline
Ryousei Takano, Yusuke Tanimura, Akihiko Oota,!
Hiroki Oohashi, Keiichi Yusa, Yoshio Tanaka
National Institute of Advanced Industrial Science and Technology, Japan
ISGC 2015@Taipei, 20 March. 2015
AIST Super Green Cloud: !
Lessons Learned from the Operation and !
the Performance Evaluation of HPC Cloud
AIST Super Green Cloud
This talk is about…
Introduction
•  HPC Cloud is promising HPC platform.
•  Virtualization is a key technology.
–  Pro: a customized software environment, elasticity, etc
–  Con: a large overhead, spoiling I/O performance.
•  VMM-bypass I/O technologies, e.g., PCI passthrough and !
SR-IOV, can significantly mitigate the overhead.
•  “99% of HPC jobs running on US NSF computing
centers fit into one rack.” -- M. Norman, UCSD
•  Current virtualization technologies are feasible
enough for supporting such a scale.
LINPACK on ASGC
4
0
10
20
30
40
50
60
0 32 64 96 128
Performance(TFLOPS)
Number of Nodes
Physical Cluster
Virtual Cluster
Performance degradation:!
5.4 - 6.6%
Efficiency* on 128 nodes
Physical cluster: 90%
Virtual cluster: 84%
*) Rmax / Rpeak
IEEE CloudCom 2014
Introduction (cont’d)
•  HPC Clouds are heading for hybrid-cloud and multi-
cloud systems, where the user can execute their
application anytime and anywhere he/she wants.
•  Vision of AIST HPC Cloud:
“Build once, run everywhere”
•  AIST Super Green Cloud (ASGC): !
a fully virtualized HPC system
Outline
•  AIST Super Green Cloud (ASGC) and!
HPC Cloud service
•  Lessons learned from the first six months of operation
•  Experiments
•  Conclusion
6
Vision of AIST HPC Cloud!
“Build Once, Run Everywhere”
7
Academic Cloud
Private Cloud
Commercial Cloud
Virtual cluster!
templates
Deploy a Virtual Cluster
Feature 1: Create a customized
virtual cluster easily
Feature 2: Build a virtual
cluster once, and run it
everywhere on clouds
Usage Model of AIST Cloud
8
Allow users to
customize their
virtual clusters
0 21
Web appsBigDataHPC
1. Select a template of!
a virtual machine
2. Install required software!
package
VM template files
HPC
+
Ease of use
Virtual cluster
deploy
take snapshots
Launch a virtual
machine when
necessary
3. Save a user-customized!
template in the repository
9
Elastic Virtual Cluster
Cloud
controller
Login node
sgc-tools
Image
repository
Virtual cluster!
template
Cloud
controller
Frontend node
cmp!
node
cmp
node
cmp
node Scale in/!
scale out
NFSdJob scheduler
Virtual Cluster on ASGC
InfiniBand/Ethernet
Image
repository
Import/
export
Create a virtual
cluster
Virtual Cluster!
on Public Cloud
Frontend node
cmp!
node
cmp!
node
Ethernet
Submit a job
Submit a job
In operationUnder development
ASGC Hardware Spec.
10
Compute Node
CPU Intel Xeon E5-2680v2/2.8GHz !
(10 core) x 2CPU
Memory 128 GB DDR3-1866
InfiniBand Mellanox ConnectX-3 (FDR)
Ethernet Intel X520-DA2 (10 GbE)
Disk Intel SSD DC S3500 600 GB
•  155 node-cluster consists of Cray H2312 blade server
•  The theoretical peak performance is 69.44 TFLOPS
Network switch
InfiniBand Mellanox SX6025
Ethernet Extreme BlackDiamond X8
ASGC Software Stack
Management Stack
–  CentOS 6.5 (QEMU/KVM 0.12.1.2)
–  Apache CloudStack 4.3 + our extensions
•  PCI passthrough/SR-IOV support (KVM only)
•  sgc-tools: Virtual cluster construction utility
–  RADOS cluster storage
HPC Stack (Virtual Cluster)
–  Intel Cluster Studio SP1 1.2.144
–  Mellanox OFED 2.1
–  TORQUE job scheduler 4.2.8
11
Storage Architecture
User%a'ached+
storage
x+N
VLAN
Compute+nodes
Compute+network+
(Infiniband+FDR)
Management+network+
(10/1GbE)
x+155
x+155
BDX8
x+155
x+5
VMDI+public+SW
RGWRADOS
VMDI+cluster+SW
x+10
x+10
VMDI+storage
Data+network+(10GbE)
NFS
x+2 x+2
VM
•  No shared storage/!
filesystem
•  VMDI (Virtual Machine
Disk Image) storage
–  RADOS storage cluster
–  RADOS gateway
–  NFS secondary staging
server
•  User-attached storage
primary storage!
(local SSD)
secondary
storage
user data
Zabbix Monitoring System
CPU Usage
Power Usage
Outline
•  AIST Super Green Cloud (ASGC) and!
HPC Cloud service
•  Lessons learned from the first six months of operation
–  CloudStack on a Supercomputer
–  Cloud Service for HPC users
–  Utilization
•  Experiments
•  Conclusion
14
Overview of ASGC Operation
•  The operation started from July 2014.
•  Accounts: 30+
–  Main users are material scientists and genome scientists.
•  Utilization: < 70%
•  95% of the total usage time is consumed for running
HPC VM instances.
•  Hardware failures: 19 (memory, M/B, power supply)
CloudStack on Supercomputer
•  Supercomputer is not designed for cloud computing.
–  Cluster management software is troublesome.
•  We can launch a highly productive system in a short
development time by leveraging open source system
software.
•  Software maturity of CloudStack
–  Our storage architecture is slightly uncommon, that is we
use local SSD disk as primary storage, and S3-compatible
object store as secondary storage.
–  We discovered and resolved several serious bugs.
Software Maturily
CloudStack Issue Our action Status
cloudstack-agent jsvc gets too large virtual memory space Patch Fixed
listUsageRecords generates NullPointerExceptions for
expunging instances
Patch Fixed
Duplicate usage records when listing large number of records /
Small page sizes return duplicate results
Backporting Fixed
Public key content is overridden by template's meta data when
you create a instance
Bug report Fixed
Migration of aVM with volumes in local storage to another
host in the same cluster is failing
Backporting Fixed
Negative ref_cnt of template(snapshot/volume)_store_ref
results in out-of-range error in MySQL
Patch (not merged) Fixed
[S3] Parallel deployment makes reference count of a cache in
NFS secondary staging store negative(-1)
Patch (not merged) Unresolved
Can't create proper template fromVM on S3 secondary
storage environment
Patch Fixed
Fails to attach a volume (is made from a snapshot) to aVM
with using local storage as primary storage
Bug report Unresolved
Cloud service for HPC users
•  SaaS is the best if the target application is clear.
•  IaaS is quite flexible. However, it is difficult to
manage an HPC environment from scratch for
application users.
•  To bridge this gap, sgc-tools is introduced on top of
an IaaS service.
•  We believe it works well, although some minor
problems are remained.
•  To improve the ability to maintain VM template, the
idea of “Infrastructure as code” can help.
Utilization
•  Efficient use of limited resources is required.
•  A virtual cluster dedicates resources whether the
user fully utilizes them or not.
•  sgc-tools do not support queuing at system wide,
therefore, the users need to check the availability.
•  Introducing a global scheduler, e.g., Condor VM
universe, can be a solution for this problem.
Outline
•  AIST Super Green Cloud (ASGC) and!
HPC Cloud service
•  Lessons learned from the first six months of operation
•  Experiments
–  Deployment time
–  Performance evaluation of SR-IOV
•  Conclusion
20
0 5 10 15 20 25 30 35
0
200
400
600
800
1000
Number of nodes
Deploytime[second]
●
●
●
●
●
● no cache
cache on NFS/SS
cache on local node
Virtual Cluster Deployment
Breakdown (second)
Device attach!
(before OS boot)
90
OS boot 90
FS creation (mkfs) 90
transfer from !
RADOS to SS
transfer from SS to !
local node
VM
VM
RADOS
NFS/SS
Compute!
node
VM
Benchmark Programs
Micro benchmark
–  Intel Micro Benchmark (IMB) version 3.2.4
•  Point-to-point
•  Collectives: Allgather, Allreduce, Alltoall, Bcast, Reduce, Barrier
Application-level benchmark
–  LAMMPS Molecular Dynamics Simulator version 28 June
2014
•  EAM benchmark, 100x100x100 atoms
MPI Point-to-point
communication
23
6.00GB/s
5.72GB/s
5.73GB/s
IMB
Message size [Byte]
Bandwidth[MB/sec]
101
102
103
104
105
10610−1
101
102
103
104
BM
PCI passthrough
SR−IOV
The overhead is less than 5% with large message,
though it is up to 30% with small message.
Message size [Byte]
Executiontime[microsecond]
101
102
103
104
105
106101
102
103
104
105
BM
PCI passthrough
SR−IOV
MPI Collectives
Time [microsecond]
BM 6.87 (1.00)
PCI passthrough 8.07 (1.17)
SR-IOV 9.36 (1.36)
Message size [Byte]
Executiontime[microsecond]
101
102
103
104
105
106101
102
103
104
105
BM
PCI passthrough
SR−IOV
Message size [Byte]
Executiontime[microsecond]
101
102
103
104
105
106101
102
103
104
105
BM
PCI passthrough
SR−IOV
Message size [Byte]
Executiontime[microsecond]
101
102
103
104
105
106101
102
103
104
BM
PCI passthrough
SR−IOV
Message size [Byte]
Executiontime[microsecond]
101
102
103
104
105
106101
102
103
104
BM
PCI passthrough
SR−IOV
ReuceBcast
AllreduceAllgather Alltoall
The performance of SR-IOV is comparable to
that of PCI passthrough while unexpected
performance degradation is often observed
Barrier
LAMMPS: MD simulator
•  EAM benchmark:
–  Fixed problem size (1M
atoms)
–  #proc: 20 - 160
•  VCPU pinning reduces
performance fluctuation.
•  Performance overhead
of PCI passthrough and
SR-IOV is about 13 %.
20 40 60 80 100 120 140 160
0
10
20
30
40
50
60
Number of processes
Executiontime[second]
●
●
●
●
●
● BM
PCI passthrough
PCI passthrough w/ vCPU pin
SR−IOV
SR−IOV w/ vCPU pin
Findings
•  The performance of SR-IOV is comparable to that of
PCI passthrough while unexpected performance
degradation is often observed.
•  VCPU pinning improves the performance for HPC
applications.
Outline
•  AIST Super Green Cloud (ASGC) and!
HPC Cloud service
•  Lessons learned from the first six months of operation
•  Experiments
•  Conclusion
27
Conclusion and Future work
•  ASGC is a fully virtualized HPC system.
•  We can launch a highly productive system in a short
development time by leveraging start-of-the-art open
source system software.
–  Extension: PCI passthrough/SR-IOV support, sgc-tools
–  Bug fixes…
•  Future research direction: data movement is key.
–  Efficient data management and transfer methods
–  Federated identity management
28
Question?
Thank you for your attention!
29
Acknowledgments:
This work was partly supported by JSPS KAKENHI Grant Number 24700040.

More Related Content

What's hot

Flow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore EraFlow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore EraRyousei Takano
 
MIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platformMIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platformGanesan Narayanasamy
 
HPC Cloud: Clouds on supercomputers for HPC
HPC Cloud: Clouds on supercomputers for HPCHPC Cloud: Clouds on supercomputers for HPC
HPC Cloud: Clouds on supercomputers for HPCRyousei Takano
 
Introduction to High-Performance Computing (HPC) Containers and Singularity*
Introduction to High-Performance Computing (HPC) Containers and Singularity*Introduction to High-Performance Computing (HPC) Containers and Singularity*
Introduction to High-Performance Computing (HPC) Containers and Singularity*Intel® Software
 
High performance computing - building blocks, production & perspective
High performance computing - building blocks, production & perspectiveHigh performance computing - building blocks, production & perspective
High performance computing - building blocks, production & perspectiveJason Shih
 
Optimizing High Performance Computing Applications for Energy
Optimizing High Performance Computing Applications for EnergyOptimizing High Performance Computing Applications for Energy
Optimizing High Performance Computing Applications for EnergyDavid Lecomber
 
High Performance Computing (HPC) in cloud
High Performance Computing (HPC) in cloudHigh Performance Computing (HPC) in cloud
High Performance Computing (HPC) in cloudAccubits Technologies
 
TAU E4S ON OpenPOWER /POWER9 platform
TAU E4S ON OpenPOWER /POWER9 platformTAU E4S ON OpenPOWER /POWER9 platform
TAU E4S ON OpenPOWER /POWER9 platformGanesan Narayanasamy
 
Mellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand SolutionsMellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand Solutionsinside-BigData.com
 
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)inside-BigData.com
 
Scale-out AI Training on Massive Core System from HPC to Fabric-based SOC
Scale-out AI Training on Massive Core System from HPC to Fabric-based SOCScale-out AI Training on Massive Core System from HPC to Fabric-based SOC
Scale-out AI Training on Massive Core System from HPC to Fabric-based SOCinside-BigData.com
 
A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator S...
A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator S...A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator S...
A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator S...inside-BigData.com
 
Programming Models for Exascale Systems
Programming Models for Exascale SystemsProgramming Models for Exascale Systems
Programming Models for Exascale Systemsinside-BigData.com
 
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre..."Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...Edge AI and Vision Alliance
 
Deep Learning on the SaturnV Cluster
Deep Learning on the SaturnV ClusterDeep Learning on the SaturnV Cluster
Deep Learning on the SaturnV Clusterinside-BigData.com
 
IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...
IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...
IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...In-Memory Computing Summit
 

What's hot (20)

Flow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore EraFlow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore Era
 
MIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platformMIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platform
 
HPC Cloud: Clouds on supercomputers for HPC
HPC Cloud: Clouds on supercomputers for HPCHPC Cloud: Clouds on supercomputers for HPC
HPC Cloud: Clouds on supercomputers for HPC
 
POWER10 innovations for HPC
POWER10 innovations for HPCPOWER10 innovations for HPC
POWER10 innovations for HPC
 
Introduction to High-Performance Computing (HPC) Containers and Singularity*
Introduction to High-Performance Computing (HPC) Containers and Singularity*Introduction to High-Performance Computing (HPC) Containers and Singularity*
Introduction to High-Performance Computing (HPC) Containers and Singularity*
 
High performance computing - building blocks, production & perspective
High performance computing - building blocks, production & perspectiveHigh performance computing - building blocks, production & perspective
High performance computing - building blocks, production & perspective
 
Summit workshop thompto
Summit workshop thomptoSummit workshop thompto
Summit workshop thompto
 
Optimizing High Performance Computing Applications for Energy
Optimizing High Performance Computing Applications for EnergyOptimizing High Performance Computing Applications for Energy
Optimizing High Performance Computing Applications for Energy
 
High Performance Computing (HPC) in cloud
High Performance Computing (HPC) in cloudHigh Performance Computing (HPC) in cloud
High Performance Computing (HPC) in cloud
 
TAU E4S ON OpenPOWER /POWER9 platform
TAU E4S ON OpenPOWER /POWER9 platformTAU E4S ON OpenPOWER /POWER9 platform
TAU E4S ON OpenPOWER /POWER9 platform
 
Mellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand SolutionsMellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand Solutions
 
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
 
Scale-out AI Training on Massive Core System from HPC to Fabric-based SOC
Scale-out AI Training on Massive Core System from HPC to Fabric-based SOCScale-out AI Training on Massive Core System from HPC to Fabric-based SOC
Scale-out AI Training on Massive Core System from HPC to Fabric-based SOC
 
Ac922 cdac webinar
Ac922 cdac webinarAc922 cdac webinar
Ac922 cdac webinar
 
A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator S...
A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator S...A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator S...
A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator S...
 
Programming Models for Exascale Systems
Programming Models for Exascale SystemsProgramming Models for Exascale Systems
Programming Models for Exascale Systems
 
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre..."Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
 
Deep Learning on the SaturnV Cluster
Deep Learning on the SaturnV ClusterDeep Learning on the SaturnV Cluster
Deep Learning on the SaturnV Cluster
 
IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...
IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...
IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...
 
IBM HPC Transformation with AI
IBM HPC Transformation with AI IBM HPC Transformation with AI
IBM HPC Transformation with AI
 

Viewers also liked

クラウドコンピューティングとは何か?
クラウドコンピューティングとは何か?クラウドコンピューティングとは何か?
クラウドコンピューティングとは何か?Ryousei Takano
 
Building Trust Despite Digital Personal Devices
Building Trust Despite Digital Personal DevicesBuilding Trust Despite Digital Personal Devices
Building Trust Despite Digital Personal DevicesJavier González
 
クラウド時代の半導体メモリー技術
クラウド時代の半導体メモリー技術クラウド時代の半導体メモリー技術
クラウド時代の半導体メモリー技術Ryousei Takano
 
Towards Application Driven Storage
Towards Application Driven StorageTowards Application Driven Storage
Towards Application Driven StorageJavier González
 
Operating System Support for Run-Time Security with a Trusted Execution Envir...
Operating System Support for Run-Time Security with a Trusted Execution Envir...Operating System Support for Run-Time Security with a Trusted Execution Envir...
Operating System Support for Run-Time Security with a Trusted Execution Envir...Javier González
 
IEEE eScience 2012および併設ワークショップ報告
IEEE eScience 2012および併設ワークショップ報告IEEE eScience 2012および併設ワークショップ報告
IEEE eScience 2012および併設ワークショップ報告Ryousei Takano
 
Toward a practical “HPC Cloud”: Performance tuning of a virtualized HPC cluster
Toward a practical “HPC Cloud”: Performance tuning of a virtualized HPC clusterToward a practical “HPC Cloud”: Performance tuning of a virtualized HPC cluster
Toward a practical “HPC Cloud”: Performance tuning of a virtualized HPC clusterRyousei Takano
 
A Distributed Architecture for Sharing Ecological Data Sets with Access and U...
A Distributed Architecture for Sharing Ecological Data Sets with Access and U...A Distributed Architecture for Sharing Ecological Data Sets with Access and U...
A Distributed Architecture for Sharing Ecological Data Sets with Access and U...Javier González
 
オペレーティングシステム 設計と実装 第3版(20101211)
オペレーティングシステム 設計と実装 第3版(20101211)オペレーティングシステム 設計と実装 第3版(20101211)
オペレーティングシステム 設計と実装 第3版(20101211)Ryousei Takano
 
不揮発メモリとOS研究にまつわる何か
不揮発メモリとOS研究にまつわる何か不揮発メモリとOS研究にまつわる何か
不揮発メモリとOS研究にまつわる何かRyousei Takano
 
A Look Inside Google’s Data Center Networks
A Look Inside Google’s Data Center NetworksA Look Inside Google’s Data Center Networks
A Look Inside Google’s Data Center NetworksRyousei Takano
 
Optimizing RocksDB for Open-Channel SSDs
Optimizing RocksDB for Open-Channel SSDsOptimizing RocksDB for Open-Channel SSDs
Optimizing RocksDB for Open-Channel SSDsJavier González
 
クラウド環境におけるキャッシュメモリQoS制御の評価
クラウド環境におけるキャッシュメモリQoS制御の評価クラウド環境におけるキャッシュメモリQoS制御の評価
クラウド環境におけるキャッシュメモリQoS制御の評価Ryousei Takano
 
WALをバックアップとレプリケーションに使う方法
WALをバックアップとレプリケーションに使う方法WALをバックアップとレプリケーションに使う方法
WALをバックアップとレプリケーションに使う方法Takashi Hoshino
 
A Scalable and Distributed Electrical Power Monitoring System Utilizing Cloud...
A Scalable and Distributed Electrical Power Monitoring System Utilizing Cloud...A Scalable and Distributed Electrical Power Monitoring System Utilizing Cloud...
A Scalable and Distributed Electrical Power Monitoring System Utilizing Cloud...Ryousei Takano
 
2014 IEEE Int. Symposium on System Integration (SII) : Project on Development...
2014 IEEE Int. Symposium on System Integration (SII) : Project on Development...2014 IEEE Int. Symposium on System Integration (SII) : Project on Development...
2014 IEEE Int. Symposium on System Integration (SII) : Project on Development...Kensuke Harada
 
How Google Does Big Data - DevNexus 2014
How Google Does Big Data - DevNexus 2014How Google Does Big Data - DevNexus 2014
How Google Does Big Data - DevNexus 2014James Chittenden
 

Viewers also liked (18)

クラウドコンピューティングとは何か?
クラウドコンピューティングとは何か?クラウドコンピューティングとは何か?
クラウドコンピューティングとは何か?
 
RocksDB meetup
RocksDB meetupRocksDB meetup
RocksDB meetup
 
Building Trust Despite Digital Personal Devices
Building Trust Despite Digital Personal DevicesBuilding Trust Despite Digital Personal Devices
Building Trust Despite Digital Personal Devices
 
クラウド時代の半導体メモリー技術
クラウド時代の半導体メモリー技術クラウド時代の半導体メモリー技術
クラウド時代の半導体メモリー技術
 
Towards Application Driven Storage
Towards Application Driven StorageTowards Application Driven Storage
Towards Application Driven Storage
 
Operating System Support for Run-Time Security with a Trusted Execution Envir...
Operating System Support for Run-Time Security with a Trusted Execution Envir...Operating System Support for Run-Time Security with a Trusted Execution Envir...
Operating System Support for Run-Time Security with a Trusted Execution Envir...
 
IEEE eScience 2012および併設ワークショップ報告
IEEE eScience 2012および併設ワークショップ報告IEEE eScience 2012および併設ワークショップ報告
IEEE eScience 2012および併設ワークショップ報告
 
Toward a practical “HPC Cloud”: Performance tuning of a virtualized HPC cluster
Toward a practical “HPC Cloud”: Performance tuning of a virtualized HPC clusterToward a practical “HPC Cloud”: Performance tuning of a virtualized HPC cluster
Toward a practical “HPC Cloud”: Performance tuning of a virtualized HPC cluster
 
A Distributed Architecture for Sharing Ecological Data Sets with Access and U...
A Distributed Architecture for Sharing Ecological Data Sets with Access and U...A Distributed Architecture for Sharing Ecological Data Sets with Access and U...
A Distributed Architecture for Sharing Ecological Data Sets with Access and U...
 
オペレーティングシステム 設計と実装 第3版(20101211)
オペレーティングシステム 設計と実装 第3版(20101211)オペレーティングシステム 設計と実装 第3版(20101211)
オペレーティングシステム 設計と実装 第3版(20101211)
 
不揮発メモリとOS研究にまつわる何か
不揮発メモリとOS研究にまつわる何か不揮発メモリとOS研究にまつわる何か
不揮発メモリとOS研究にまつわる何か
 
A Look Inside Google’s Data Center Networks
A Look Inside Google’s Data Center NetworksA Look Inside Google’s Data Center Networks
A Look Inside Google’s Data Center Networks
 
Optimizing RocksDB for Open-Channel SSDs
Optimizing RocksDB for Open-Channel SSDsOptimizing RocksDB for Open-Channel SSDs
Optimizing RocksDB for Open-Channel SSDs
 
クラウド環境におけるキャッシュメモリQoS制御の評価
クラウド環境におけるキャッシュメモリQoS制御の評価クラウド環境におけるキャッシュメモリQoS制御の評価
クラウド環境におけるキャッシュメモリQoS制御の評価
 
WALをバックアップとレプリケーションに使う方法
WALをバックアップとレプリケーションに使う方法WALをバックアップとレプリケーションに使う方法
WALをバックアップとレプリケーションに使う方法
 
A Scalable and Distributed Electrical Power Monitoring System Utilizing Cloud...
A Scalable and Distributed Electrical Power Monitoring System Utilizing Cloud...A Scalable and Distributed Electrical Power Monitoring System Utilizing Cloud...
A Scalable and Distributed Electrical Power Monitoring System Utilizing Cloud...
 
2014 IEEE Int. Symposium on System Integration (SII) : Project on Development...
2014 IEEE Int. Symposium on System Integration (SII) : Project on Development...2014 IEEE Int. Symposium on System Integration (SII) : Project on Development...
2014 IEEE Int. Symposium on System Integration (SII) : Project on Development...
 
How Google Does Big Data - DevNexus 2014
How Google Does Big Data - DevNexus 2014How Google Does Big Data - DevNexus 2014
How Google Does Big Data - DevNexus 2014
 

Similar to AIST Super Green Cloud: lessons learned from the operation and the performance evaluation of HPC cloud

Deep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesDeep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesAmazon Web Services
 
ClickOS_EE80777777777777777777777777777.pptx
ClickOS_EE80777777777777777777777777777.pptxClickOS_EE80777777777777777777777777777.pptx
ClickOS_EE80777777777777777777777777777.pptxBiHongPhc
 
JITServerTalk JCON World 2023.pdf
JITServerTalk JCON World 2023.pdfJITServerTalk JCON World 2023.pdf
JITServerTalk JCON World 2023.pdfRichHagarty
 
Lessons Learned during IBM SmartCloud Orchestrator Deployment at a Large Tel...
Lessons Learned during IBM SmartCloud Orchestrator Deployment at a Large Tel...Lessons Learned during IBM SmartCloud Orchestrator Deployment at a Large Tel...
Lessons Learned during IBM SmartCloud Orchestrator Deployment at a Large Tel...Eduardo Patrocinio
 
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageWebinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageMayaData Inc
 
Arquitetura Hibrida - Integrando seu Data Center com a Nuvem da AWS
Arquitetura Hibrida - Integrando seu Data Center com a Nuvem da AWSArquitetura Hibrida - Integrando seu Data Center com a Nuvem da AWS
Arquitetura Hibrida - Integrando seu Data Center com a Nuvem da AWSAmazon Web Services LATAM
 
OS for AI: Elastic Microservices & the Next Gen of ML
OS for AI: Elastic Microservices & the Next Gen of MLOS for AI: Elastic Microservices & the Next Gen of ML
OS for AI: Elastic Microservices & the Next Gen of MLNordic APIs
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...Amazon Web Services
 
High Performance Computing (HPC) and Engineering Simulations in the Cloud
High Performance Computing (HPC) and Engineering Simulations in the CloudHigh Performance Computing (HPC) and Engineering Simulations in the Cloud
High Performance Computing (HPC) and Engineering Simulations in the CloudWolfgang Gentzsch
 
High Performance Computing (HPC) and Engineering Simulations in the Cloud
High Performance Computing (HPC) and Engineering Simulations in the CloudHigh Performance Computing (HPC) and Engineering Simulations in the Cloud
High Performance Computing (HPC) and Engineering Simulations in the CloudThe UberCloud
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...Amazon Web Services
 
Optimising Service Deployment and Infrastructure Resource Configuration
Optimising Service Deployment and Infrastructure Resource ConfigurationOptimising Service Deployment and Infrastructure Resource Configuration
Optimising Service Deployment and Infrastructure Resource ConfigurationRECAP Project
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyPeter Clapham
 
Presentation mongo db munich
Presentation mongo db munichPresentation mongo db munich
Presentation mongo db munichMongoDB
 
Edge optimized architecture for fabric defect detection in real-time
Edge optimized architecture for fabric defect detection in real-timeEdge optimized architecture for fabric defect detection in real-time
Edge optimized architecture for fabric defect detection in real-timeShuquan Huang
 
Large-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC WorkloadsLarge-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC Workloadsinside-BigData.com
 
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013Amazon Web Services
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...Amazon Web Services
 
Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013
Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013
Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013Amazon Web Services
 

Similar to AIST Super Green Cloud: lessons learned from the operation and the performance evaluation of HPC cloud (20)

Deep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesDeep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instances
 
ClickOS_EE80777777777777777777777777777.pptx
ClickOS_EE80777777777777777777777777777.pptxClickOS_EE80777777777777777777777777777.pptx
ClickOS_EE80777777777777777777777777777.pptx
 
JITServerTalk JCON World 2023.pdf
JITServerTalk JCON World 2023.pdfJITServerTalk JCON World 2023.pdf
JITServerTalk JCON World 2023.pdf
 
Lessons Learned during IBM SmartCloud Orchestrator Deployment at a Large Tel...
Lessons Learned during IBM SmartCloud Orchestrator Deployment at a Large Tel...Lessons Learned during IBM SmartCloud Orchestrator Deployment at a Large Tel...
Lessons Learned during IBM SmartCloud Orchestrator Deployment at a Large Tel...
 
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageWebinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
 
HPC in the Cloud
HPC in the CloudHPC in the Cloud
HPC in the Cloud
 
Arquitetura Hibrida - Integrando seu Data Center com a Nuvem da AWS
Arquitetura Hibrida - Integrando seu Data Center com a Nuvem da AWSArquitetura Hibrida - Integrando seu Data Center com a Nuvem da AWS
Arquitetura Hibrida - Integrando seu Data Center com a Nuvem da AWS
 
OS for AI: Elastic Microservices & the Next Gen of ML
OS for AI: Elastic Microservices & the Next Gen of MLOS for AI: Elastic Microservices & the Next Gen of ML
OS for AI: Elastic Microservices & the Next Gen of ML
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
 
High Performance Computing (HPC) and Engineering Simulations in the Cloud
High Performance Computing (HPC) and Engineering Simulations in the CloudHigh Performance Computing (HPC) and Engineering Simulations in the Cloud
High Performance Computing (HPC) and Engineering Simulations in the Cloud
 
High Performance Computing (HPC) and Engineering Simulations in the Cloud
High Performance Computing (HPC) and Engineering Simulations in the CloudHigh Performance Computing (HPC) and Engineering Simulations in the Cloud
High Performance Computing (HPC) and Engineering Simulations in the Cloud
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
 
Optimising Service Deployment and Infrastructure Resource Configuration
Optimising Service Deployment and Infrastructure Resource ConfigurationOptimising Service Deployment and Infrastructure Resource Configuration
Optimising Service Deployment and Infrastructure Resource Configuration
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 
Presentation mongo db munich
Presentation mongo db munichPresentation mongo db munich
Presentation mongo db munich
 
Edge optimized architecture for fabric defect detection in real-time
Edge optimized architecture for fabric defect detection in real-timeEdge optimized architecture for fabric defect detection in real-time
Edge optimized architecture for fabric defect detection in real-time
 
Large-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC WorkloadsLarge-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC Workloads
 
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
 
Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013
Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013
Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013
 

More from Ryousei Takano

Error Permissive Computing
Error Permissive ComputingError Permissive Computing
Error Permissive ComputingRyousei Takano
 
Opportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIOpportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIRyousei Takano
 
ABCI: An Open Innovation Platform for Advancing AI Research and Deployment
ABCI: An Open Innovation Platform for Advancing AI Research and DeploymentABCI: An Open Innovation Platform for Advancing AI Research and Deployment
ABCI: An Open Innovation Platform for Advancing AI Research and DeploymentRyousei Takano
 
High-resolution Timer-based Packet Pacing Mechanism on the Linux Operating Sy...
High-resolution Timer-based Packet Pacing Mechanism on the Linux Operating Sy...High-resolution Timer-based Packet Pacing Mechanism on the Linux Operating Sy...
High-resolution Timer-based Packet Pacing Mechanism on the Linux Operating Sy...Ryousei Takano
 
クラウドの垣根を超えた高性能計算に向けて~AIST Super Green Cloudでの試み~
クラウドの垣根を超えた高性能計算に向けて~AIST Super Green Cloudでの試み~クラウドの垣根を超えた高性能計算に向けて~AIST Super Green Cloudでの試み~
クラウドの垣根を超えた高性能計算に向けて~AIST Super Green Cloudでの試み~Ryousei Takano
 
高性能かつスケールアウト可能なHPCクラウド AIST Super Green Cloud
高性能かつスケールアウト可能なHPCクラウド AIST Super Green Cloud高性能かつスケールアウト可能なHPCクラウド AIST Super Green Cloud
高性能かつスケールアウト可能なHPCクラウド AIST Super Green CloudRyousei Takano
 
伸縮自在なデータセンターを実現するインタークラウド資源管理システム
伸縮自在なデータセンターを実現するインタークラウド資源管理システム伸縮自在なデータセンターを実現するインタークラウド資源管理システム
伸縮自在なデータセンターを実現するインタークラウド資源管理システムRyousei Takano
 
SoNIC: Precise Realtime Software Access and Control of Wired Networks
SoNIC: Precise Realtime Software Access and Control of Wired NetworksSoNIC: Precise Realtime Software Access and Control of Wired Networks
SoNIC: Precise Realtime Software Access and Control of Wired NetworksRyousei Takano
 
異種クラスタを跨がる仮想マシンマイグレーション機構
異種クラスタを跨がる仮想マシンマイグレーション機構異種クラスタを跨がる仮想マシンマイグレーション機構
異種クラスタを跨がる仮想マシンマイグレーション機構Ryousei Takano
 
動的ネットワーク切替を用いた省電力指向トラフィックオフロード方式
動的ネットワーク切替を用いた省電力指向トラフィックオフロード方式動的ネットワーク切替を用いた省電力指向トラフィックオフロード方式
動的ネットワーク切替を用いた省電力指向トラフィックオフロード方式Ryousei Takano
 
Ninja Migration: An Interconnect transparent Migration for Heterogeneous Data...
Ninja Migration: An Interconnect transparent Migration for Heterogeneous Data...Ninja Migration: An Interconnect transparent Migration for Heterogeneous Data...
Ninja Migration: An Interconnect transparent Migration for Heterogeneous Data...Ryousei Takano
 
インタークラウドにおける仮想インフラ構築システム
インタークラウドにおける仮想インフラ構築システムインタークラウドにおける仮想インフラ構築システム
インタークラウドにおける仮想インフラ構築システムRyousei Takano
 
Preliminary Experiment of Disaster Recovery based on Interconnect-transparent...
Preliminary Experiment of Disaster Recovery based on Interconnect-transparent...Preliminary Experiment of Disaster Recovery based on Interconnect-transparent...
Preliminary Experiment of Disaster Recovery based on Interconnect-transparent...Ryousei Takano
 
動的ネットワークパス構築と連携したエッジオーバレイ帯域制御
動的ネットワークパス構築と連携したエッジオーバレイ帯域制御動的ネットワークパス構築と連携したエッジオーバレイ帯域制御
動的ネットワークパス構築と連携したエッジオーバレイ帯域制御Ryousei Takano
 

More from Ryousei Takano (16)

Error Permissive Computing
Error Permissive ComputingError Permissive Computing
Error Permissive Computing
 
Opportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIOpportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCI
 
ABCI: An Open Innovation Platform for Advancing AI Research and Deployment
ABCI: An Open Innovation Platform for Advancing AI Research and DeploymentABCI: An Open Innovation Platform for Advancing AI Research and Deployment
ABCI: An Open Innovation Platform for Advancing AI Research and Deployment
 
ABCI Data Center
ABCI Data CenterABCI Data Center
ABCI Data Center
 
High-resolution Timer-based Packet Pacing Mechanism on the Linux Operating Sy...
High-resolution Timer-based Packet Pacing Mechanism on the Linux Operating Sy...High-resolution Timer-based Packet Pacing Mechanism on the Linux Operating Sy...
High-resolution Timer-based Packet Pacing Mechanism on the Linux Operating Sy...
 
クラウドの垣根を超えた高性能計算に向けて~AIST Super Green Cloudでの試み~
クラウドの垣根を超えた高性能計算に向けて~AIST Super Green Cloudでの試み~クラウドの垣根を超えた高性能計算に向けて~AIST Super Green Cloudでの試み~
クラウドの垣根を超えた高性能計算に向けて~AIST Super Green Cloudでの試み~
 
高性能かつスケールアウト可能なHPCクラウド AIST Super Green Cloud
高性能かつスケールアウト可能なHPCクラウド AIST Super Green Cloud高性能かつスケールアウト可能なHPCクラウド AIST Super Green Cloud
高性能かつスケールアウト可能なHPCクラウド AIST Super Green Cloud
 
IEEE/ACM SC2013報告
IEEE/ACM SC2013報告IEEE/ACM SC2013報告
IEEE/ACM SC2013報告
 
伸縮自在なデータセンターを実現するインタークラウド資源管理システム
伸縮自在なデータセンターを実現するインタークラウド資源管理システム伸縮自在なデータセンターを実現するインタークラウド資源管理システム
伸縮自在なデータセンターを実現するインタークラウド資源管理システム
 
SoNIC: Precise Realtime Software Access and Control of Wired Networks
SoNIC: Precise Realtime Software Access and Control of Wired NetworksSoNIC: Precise Realtime Software Access and Control of Wired Networks
SoNIC: Precise Realtime Software Access and Control of Wired Networks
 
異種クラスタを跨がる仮想マシンマイグレーション機構
異種クラスタを跨がる仮想マシンマイグレーション機構異種クラスタを跨がる仮想マシンマイグレーション機構
異種クラスタを跨がる仮想マシンマイグレーション機構
 
動的ネットワーク切替を用いた省電力指向トラフィックオフロード方式
動的ネットワーク切替を用いた省電力指向トラフィックオフロード方式動的ネットワーク切替を用いた省電力指向トラフィックオフロード方式
動的ネットワーク切替を用いた省電力指向トラフィックオフロード方式
 
Ninja Migration: An Interconnect transparent Migration for Heterogeneous Data...
Ninja Migration: An Interconnect transparent Migration for Heterogeneous Data...Ninja Migration: An Interconnect transparent Migration for Heterogeneous Data...
Ninja Migration: An Interconnect transparent Migration for Heterogeneous Data...
 
インタークラウドにおける仮想インフラ構築システム
インタークラウドにおける仮想インフラ構築システムインタークラウドにおける仮想インフラ構築システム
インタークラウドにおける仮想インフラ構築システム
 
Preliminary Experiment of Disaster Recovery based on Interconnect-transparent...
Preliminary Experiment of Disaster Recovery based on Interconnect-transparent...Preliminary Experiment of Disaster Recovery based on Interconnect-transparent...
Preliminary Experiment of Disaster Recovery based on Interconnect-transparent...
 
動的ネットワークパス構築と連携したエッジオーバレイ帯域制御
動的ネットワークパス構築と連携したエッジオーバレイ帯域制御動的ネットワークパス構築と連携したエッジオーバレイ帯域制御
動的ネットワークパス構築と連携したエッジオーバレイ帯域制御
 

Recently uploaded

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 

Recently uploaded (20)

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 

AIST Super Green Cloud: lessons learned from the operation and the performance evaluation of HPC cloud

  • 1. Ryousei Takano, Yusuke Tanimura, Akihiko Oota,! Hiroki Oohashi, Keiichi Yusa, Yoshio Tanaka National Institute of Advanced Industrial Science and Technology, Japan ISGC 2015@Taipei, 20 March. 2015 AIST Super Green Cloud: ! Lessons Learned from the Operation and ! the Performance Evaluation of HPC Cloud
  • 2. AIST Super Green Cloud This talk is about…
  • 3. Introduction •  HPC Cloud is promising HPC platform. •  Virtualization is a key technology. –  Pro: a customized software environment, elasticity, etc –  Con: a large overhead, spoiling I/O performance. •  VMM-bypass I/O technologies, e.g., PCI passthrough and ! SR-IOV, can significantly mitigate the overhead. •  “99% of HPC jobs running on US NSF computing centers fit into one rack.” -- M. Norman, UCSD •  Current virtualization technologies are feasible enough for supporting such a scale.
  • 4. LINPACK on ASGC 4 0 10 20 30 40 50 60 0 32 64 96 128 Performance(TFLOPS) Number of Nodes Physical Cluster Virtual Cluster Performance degradation:! 5.4 - 6.6% Efficiency* on 128 nodes Physical cluster: 90% Virtual cluster: 84% *) Rmax / Rpeak IEEE CloudCom 2014
  • 5. Introduction (cont’d) •  HPC Clouds are heading for hybrid-cloud and multi- cloud systems, where the user can execute their application anytime and anywhere he/she wants. •  Vision of AIST HPC Cloud: “Build once, run everywhere” •  AIST Super Green Cloud (ASGC): ! a fully virtualized HPC system
  • 6. Outline •  AIST Super Green Cloud (ASGC) and! HPC Cloud service •  Lessons learned from the first six months of operation •  Experiments •  Conclusion 6
  • 7. Vision of AIST HPC Cloud! “Build Once, Run Everywhere” 7 Academic Cloud Private Cloud Commercial Cloud Virtual cluster! templates Deploy a Virtual Cluster Feature 1: Create a customized virtual cluster easily Feature 2: Build a virtual cluster once, and run it everywhere on clouds
  • 8. Usage Model of AIST Cloud 8 Allow users to customize their virtual clusters 0 21 Web appsBigDataHPC 1. Select a template of! a virtual machine 2. Install required software! package VM template files HPC + Ease of use Virtual cluster deploy take snapshots Launch a virtual machine when necessary 3. Save a user-customized! template in the repository
  • 9. 9 Elastic Virtual Cluster Cloud controller Login node sgc-tools Image repository Virtual cluster! template Cloud controller Frontend node cmp! node cmp node cmp node Scale in/! scale out NFSdJob scheduler Virtual Cluster on ASGC InfiniBand/Ethernet Image repository Import/ export Create a virtual cluster Virtual Cluster! on Public Cloud Frontend node cmp! node cmp! node Ethernet Submit a job Submit a job In operationUnder development
  • 10. ASGC Hardware Spec. 10 Compute Node CPU Intel Xeon E5-2680v2/2.8GHz ! (10 core) x 2CPU Memory 128 GB DDR3-1866 InfiniBand Mellanox ConnectX-3 (FDR) Ethernet Intel X520-DA2 (10 GbE) Disk Intel SSD DC S3500 600 GB •  155 node-cluster consists of Cray H2312 blade server •  The theoretical peak performance is 69.44 TFLOPS Network switch InfiniBand Mellanox SX6025 Ethernet Extreme BlackDiamond X8
  • 11. ASGC Software Stack Management Stack –  CentOS 6.5 (QEMU/KVM 0.12.1.2) –  Apache CloudStack 4.3 + our extensions •  PCI passthrough/SR-IOV support (KVM only) •  sgc-tools: Virtual cluster construction utility –  RADOS cluster storage HPC Stack (Virtual Cluster) –  Intel Cluster Studio SP1 1.2.144 –  Mellanox OFED 2.1 –  TORQUE job scheduler 4.2.8 11
  • 12. Storage Architecture User%a'ached+ storage x+N VLAN Compute+nodes Compute+network+ (Infiniband+FDR) Management+network+ (10/1GbE) x+155 x+155 BDX8 x+155 x+5 VMDI+public+SW RGWRADOS VMDI+cluster+SW x+10 x+10 VMDI+storage Data+network+(10GbE) NFS x+2 x+2 VM •  No shared storage/! filesystem •  VMDI (Virtual Machine Disk Image) storage –  RADOS storage cluster –  RADOS gateway –  NFS secondary staging server •  User-attached storage primary storage! (local SSD) secondary storage user data
  • 13. Zabbix Monitoring System CPU Usage Power Usage
  • 14. Outline •  AIST Super Green Cloud (ASGC) and! HPC Cloud service •  Lessons learned from the first six months of operation –  CloudStack on a Supercomputer –  Cloud Service for HPC users –  Utilization •  Experiments •  Conclusion 14
  • 15. Overview of ASGC Operation •  The operation started from July 2014. •  Accounts: 30+ –  Main users are material scientists and genome scientists. •  Utilization: < 70% •  95% of the total usage time is consumed for running HPC VM instances. •  Hardware failures: 19 (memory, M/B, power supply)
  • 16. CloudStack on Supercomputer •  Supercomputer is not designed for cloud computing. –  Cluster management software is troublesome. •  We can launch a highly productive system in a short development time by leveraging open source system software. •  Software maturity of CloudStack –  Our storage architecture is slightly uncommon, that is we use local SSD disk as primary storage, and S3-compatible object store as secondary storage. –  We discovered and resolved several serious bugs.
  • 17. Software Maturily CloudStack Issue Our action Status cloudstack-agent jsvc gets too large virtual memory space Patch Fixed listUsageRecords generates NullPointerExceptions for expunging instances Patch Fixed Duplicate usage records when listing large number of records / Small page sizes return duplicate results Backporting Fixed Public key content is overridden by template's meta data when you create a instance Bug report Fixed Migration of aVM with volumes in local storage to another host in the same cluster is failing Backporting Fixed Negative ref_cnt of template(snapshot/volume)_store_ref results in out-of-range error in MySQL Patch (not merged) Fixed [S3] Parallel deployment makes reference count of a cache in NFS secondary staging store negative(-1) Patch (not merged) Unresolved Can't create proper template fromVM on S3 secondary storage environment Patch Fixed Fails to attach a volume (is made from a snapshot) to aVM with using local storage as primary storage Bug report Unresolved
  • 18. Cloud service for HPC users •  SaaS is the best if the target application is clear. •  IaaS is quite flexible. However, it is difficult to manage an HPC environment from scratch for application users. •  To bridge this gap, sgc-tools is introduced on top of an IaaS service. •  We believe it works well, although some minor problems are remained. •  To improve the ability to maintain VM template, the idea of “Infrastructure as code” can help.
  • 19. Utilization •  Efficient use of limited resources is required. •  A virtual cluster dedicates resources whether the user fully utilizes them or not. •  sgc-tools do not support queuing at system wide, therefore, the users need to check the availability. •  Introducing a global scheduler, e.g., Condor VM universe, can be a solution for this problem.
  • 20. Outline •  AIST Super Green Cloud (ASGC) and! HPC Cloud service •  Lessons learned from the first six months of operation •  Experiments –  Deployment time –  Performance evaluation of SR-IOV •  Conclusion 20
  • 21. 0 5 10 15 20 25 30 35 0 200 400 600 800 1000 Number of nodes Deploytime[second] ● ● ● ● ● ● no cache cache on NFS/SS cache on local node Virtual Cluster Deployment Breakdown (second) Device attach! (before OS boot) 90 OS boot 90 FS creation (mkfs) 90 transfer from ! RADOS to SS transfer from SS to ! local node VM VM RADOS NFS/SS Compute! node VM
  • 22. Benchmark Programs Micro benchmark –  Intel Micro Benchmark (IMB) version 3.2.4 •  Point-to-point •  Collectives: Allgather, Allreduce, Alltoall, Bcast, Reduce, Barrier Application-level benchmark –  LAMMPS Molecular Dynamics Simulator version 28 June 2014 •  EAM benchmark, 100x100x100 atoms
  • 23. MPI Point-to-point communication 23 6.00GB/s 5.72GB/s 5.73GB/s IMB Message size [Byte] Bandwidth[MB/sec] 101 102 103 104 105 10610−1 101 102 103 104 BM PCI passthrough SR−IOV The overhead is less than 5% with large message, though it is up to 30% with small message.
  • 24. Message size [Byte] Executiontime[microsecond] 101 102 103 104 105 106101 102 103 104 105 BM PCI passthrough SR−IOV MPI Collectives Time [microsecond] BM 6.87 (1.00) PCI passthrough 8.07 (1.17) SR-IOV 9.36 (1.36) Message size [Byte] Executiontime[microsecond] 101 102 103 104 105 106101 102 103 104 105 BM PCI passthrough SR−IOV Message size [Byte] Executiontime[microsecond] 101 102 103 104 105 106101 102 103 104 105 BM PCI passthrough SR−IOV Message size [Byte] Executiontime[microsecond] 101 102 103 104 105 106101 102 103 104 BM PCI passthrough SR−IOV Message size [Byte] Executiontime[microsecond] 101 102 103 104 105 106101 102 103 104 BM PCI passthrough SR−IOV ReuceBcast AllreduceAllgather Alltoall The performance of SR-IOV is comparable to that of PCI passthrough while unexpected performance degradation is often observed Barrier
  • 25. LAMMPS: MD simulator •  EAM benchmark: –  Fixed problem size (1M atoms) –  #proc: 20 - 160 •  VCPU pinning reduces performance fluctuation. •  Performance overhead of PCI passthrough and SR-IOV is about 13 %. 20 40 60 80 100 120 140 160 0 10 20 30 40 50 60 Number of processes Executiontime[second] ● ● ● ● ● ● BM PCI passthrough PCI passthrough w/ vCPU pin SR−IOV SR−IOV w/ vCPU pin
  • 26. Findings •  The performance of SR-IOV is comparable to that of PCI passthrough while unexpected performance degradation is often observed. •  VCPU pinning improves the performance for HPC applications.
  • 27. Outline •  AIST Super Green Cloud (ASGC) and! HPC Cloud service •  Lessons learned from the first six months of operation •  Experiments •  Conclusion 27
  • 28. Conclusion and Future work •  ASGC is a fully virtualized HPC system. •  We can launch a highly productive system in a short development time by leveraging start-of-the-art open source system software. –  Extension: PCI passthrough/SR-IOV support, sgc-tools –  Bug fixes… •  Future research direction: data movement is key. –  Efficient data management and transfer methods –  Federated identity management 28
  • 29. Question? Thank you for your attention! 29 Acknowledgments: This work was partly supported by JSPS KAKENHI Grant Number 24700040.