Building Software Ecosystems for AI Cloud using Singularity HPC Container
1. National Institute of
Advanced Industrial Science and Technology (AIST)
Artificial Intelligence Research Center
Hitoshi Sato
1
2. About AIST
2
ITRI
Information Technology
Research Institute
Artificial Intelligence
Research Center
Real World Big-Data
Computing – Open
Innovation Laboratory
One of the public research organizations in Japan under METI*, focused on
“bridging” the gap between innovative technological seeds and commercialization.
*Ministry of Economy, Trade and Industry
@Odaiba, Tokyo Joint Lab w/ Tokyo Tech
HQ @Tsukuba
3. Current Status of AI R&D in Japan:
AI requires the convergence of Algorithm, Big Data, and
Computing Infrastructure
3
Big Data
Algorithm
Computing Infrastructure
Deep Learning, Machine Learning, etc.
Cutting edge infrastructures,
Clouds, Supercomputers, etc.
We lack cutting-edge infrastructure
dedicated to AI and Big Data that is open to public use
Big Companies AI/BD
R&D (also Science)
AI Venture Startups
AI/BD Centers & Labs in National
Labs & Universities
4. AIST ABCI: AI Bridging Cloud Infrastructure
as the world's first large-scale Open AI Infrastructure
• Open, Public, and Dedicated infrastructure
for AI & Big Data Algorithms, Software and Applications
• Platform to accelerate joint academic-industry R&D for AI in Japan
• Top-level compute capability w/ 0.55 EFlops (half precision), 37 PFlops (double precision)
4
Univ. Tokyo Kashiwa II Campus
Operation Scheduled in 2018
5. • 1088x compute nodes w/ 4352x NVIDIA Tesla V100 GPUs, 476TiB of Memory,
1.6PB of NVMe SSDs, 22PB of HDD-based Storage and Infiniband EDR network
• Ultra-dense IDC design from the ground up w/ 20x thermal density of standard IDC
• Extreme Green w/ ambient warm liquid cooling, high-efficiency power supplies, etc.,
commoditizing supercomputer cooling technologies to clouds (2.2MW, 60kW/rack)
5
Gateway and
Firewall
Computing Nodes: 550 PFlops (half precision), 37 PFlops (double precision)
476 TiB Mem, 1.6 PB NVMe SSD
Storage: 22 PB GPFS
High Performance Computing Nodes (w/GPU) x1088
• Intel Xeon Gold6148 (2.4GHz/20cores) x2
• NVIDIA Tesla V100 (SXM2) x 4
• 384GiB Memory, 1.6TB NVMe SSD
Multi-platform Nodes (w/o GPU) x10
• Intel Xeon Gold6132 (2.6GHz/14cores) x2
• 768GiB Memory, 3.8TB NVMe SSD
Interactive Nodes
DDN SFA14K
(w/ SS8462 Enclosure x 10) x 3
• 12TB 7.2Krpm NL-SAS HDD x 2400
• 3.84TB SAS SSD x 216
• NSD Servers x 12
Interconnect (Infiniband EDR, 100Gbps)
Service Network (10GbE)
Object Storage for Protocol Nodes
Management and Gateway Nodes
SINET5
6. • Accommodate open ecosystems of data and software for AI R&D
– Sharable/Reusable/Reproducible algorithm methods,
training data w/ labels, trained models, etc.
• Creating various social applications that use AI (ML, DL, etc.)
6
Open ecosystem workflow:
• Prepare Training Datasets: data collection and cleansing from public data on the Internet, open data in enterprises, and open data in government
• Modeling and Learning w/ Deep Learning Frameworks: trial and error for designing networks and parameters; accelerate learning with HPC to improve accuracy of models
• Trained Models and Inference: publish trained models to the public to apply to various domains; reuse trained models
• Implementing AI Techs to Society: e.g., object detection and tracking in retail stores, factories, and logistics centers
7. ABCI Prototype: AIST AI Cloud (AAIC) ∼June 2017
• 400x NVIDIA Tesla P100s and Infiniband EDR accelerate various AI workloads
including ML (Machine Learning) and DL (Deep Learning).
• Advanced data analytics leveraged by 4PiB shared Big Data Storage and Apache
Spark w/ its ecosystem.
Computation System and Large Capacity Storage System:
• Computing Nodes (w/ GPU): Intel Xeon E5 v4 x2 and NVIDIA Tesla P100 per node, 256GiB Memory, 480GB SSD; 400x Pascal GPUs in total
• Computing Nodes (w/o GPU), Interactive Nodes, Management and Service Nodes
• Storage: DDN GRIDScaler (GPFS), ~4PiB effective
• Computation Network: Mellanox CS7520 Director Switch, Infiniband EDR, full bisection bandwidth
• Service and Management Network: 10GbE
• Firewall: FortiGate, FortiAnalyzer
• Internet Connection
8. Example of Real AI/DL Apps Usage
• lang2program
– https://github.com/kelvinguu/
lang2program
– Implementation for ACL 2017
(http://acl2017.org/ ) paper
– Provided as Dockerfile
• Including Tensorflow, CUDA, CuDNN,
PostgreSQL, Python pip packages, etc.
• Various software dependencies for installation
8
Can traditional shared HPC systems support
such DL Apps?
• Docker is not allowed for security reasons
(rootless execution is required)
• Manual installation requires elaborate effort
9. Software Ecosystems for BD/Cloud and HPC
Different ecosystems between BD/Cloud and HPC (Application Layer / System Software Layer / Hardware Layer):
• Existing Clouds: BD/AI user applications; MapReduce frameworks (Spark, Hadoop); DBs (PostgreSQL), SQL/NoSQL (Hive, Pig), Cloud DB/NoSQL (HBase, Cassandra, MongoDB); machine learning (MLlib, Mahout, Chainer); graph processing (GraphX, giraph, ScaleGraph); Java, Scala, Python, DSLs; coordination services (ZooKeeper); distributed file systems (HDFS), object stores, local node storage; containers (Docker), cloud services (OpenStack); Linux OS; Ethernet, high-latency low-capacity NW
• Existing Supercomputers: user code (Fortran, C, C++); numerical libraries; various DSLs; parallel debuggers and profilers; OpenMP, OpenACC, CUDA, OpenCL; parallel file systems (Lustre); batch job schedulers (PBS Pro, Slurm, UGE); Infiniband, OPA, high-capacity low-latency NW; high-performance SAN, burst buffers; accelerators (e.g., GPUs, FPGAs); Linux OS; workflow systems
• Cloud deploys interactive resource control; HPC deploys batch-oriented resource control
• Cloud deploys high-productivity languages; HPC focuses on compute kernels and scales to thousands of jobs, thus queuing and performance tuning matter
• Cloud requires purpose-specific computing environments as well as their mutual isolation (security); HPC requires environments for stable use of resources, but modern machines require considerable system software support
• Cloud is based on commodity servers w/ distributed storage on nodes; HPC aggressively adopts new technologies and shared storage to support legacy apps
• Various convergence research efforts, but no realistic converged SW stack yet achieving HPC/BD/Cloud convergence like the ABCI goal
10. Requirements for AI Cloud Systems
Target software stack: BD/AI user applications; Python, Jupyter Notebook, R, etc.; DBs (PostgreSQL, Hive, Pig, HBase, MongoDB, Redis); machine learning and deep learning frameworks; graph computing libraries; numerical libraries; BD/AI algorithm kernels (sort etc.); parallel debuggers and profilers; workflow systems; web services; resource brokers and batch job schedulers; MPI, OpenMP, OpenACC, CUDA, OpenCL; Lustre, HDFS, local node storage; Linux containers and cloud services on Linux OS; Xeon and Xeon Phi, accelerators (e.g., GPUs), IB/OPA, high-capacity low-latency NW
System Software
ü Easy use of various ML/DL/Graph frameworks from
Python, Jupyter Notebook, R, etc.
ü Web-based applications and services provision
ü HPC-oriented techniques for numerical libraries, BD
Algorithm kernels, etc.
ü Supporting long running jobs / workflow for DL
ü Accelerated I/O and secure data access to large data
sets
ü User-customized environment based on Linux
containers for easy deployment and reproducibility
Hardware
ü Modern HPC computing facilities based on commodity
components
11. • Singularity
– HPC Linux Container developed at LBNL and Sylabs Inc.
• http://singularity.lbl.gov/
• https://github.com/singularityware/singularity
– Rootless program execution, storage access
• No daemons w/ root privilege like Docker
– Computing resources are managed by job schedulers such as UGE
• Root privilege is granted only via setuid helper commands
– Mount, Linux namespace creation, Bind paths between host and container
– Existing Docker images available via DockerHub
– HPC software stacks available
• CUDA, MPI, Infiniband, etc.
11
12. ) )
sudo singularity build –sandbox tmpdir/ Singularity
sudo singularity build –writable container.img Singularity
sudo singularity build container.img Singularity
sudo singularity build container.img docker://ubuntu
sudo singularity build container.img shub://ubuntu
Sandbox directory
Container image file
sudo singularity shell –writable container.img
Package software installation
(yum, apt etc.)
Writing a Recipe file for container image creation
Using Docker container images via DockerHub
Using Singularity container images via SingularityHub
container.img
. .
) (
Using container images from interactive operations
Using given container images
Using user defined recipe
singularity run container.img
singularity exec container.img …
singularity shell container.img
Shell Execution (shell)
Command execution
(exec)
User defined execution
(run)
User Machine Shared HPC Systems
12
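A minimal Recipe file of the kind referenced above might look like the following sketch; the base image and package list are illustrative assumptions, not an actual AAIC/ABCI configuration:

```
Bootstrap: docker
From: ubuntu:16.04

%post
    # Build-time installation (runs as root inside the image)
    apt-get update
    apt-get install -y python3 python3-pip

%runscript
    # Entry point for "singularity run container.img"
    exec python3 "$@"
```

Built once with `sudo singularity build container.img Singularity` on a user machine, the image can then be copied to a shared HPC system and executed entirely without root.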
13. • Can we use GPU, Infiniband, etc. => Yes
– Mount host installed device drivers and libraries to container.
– cf. nvidia-docker
• Can we use MPI? => Yes
– Mount host installed MPI to container.
(though this brings little merit from the container)
– Install MPI programs inside the container, and launch the binary
from the outside.
(Same MPI configuration may be required between host and container)
• mpirun -np N singularity exec container.img mpi_program
• Can we use mpi4py? => Yes
– Install MPI inside the container
– Install mpi4py inside the container
– Launch python script from the outside
13
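The mpi4py steps above can be sketched with a minimal script; the script and image names are illustrative, and it assumes MPI and mpi4py were installed inside the container while a compatible MPI exists on the host:

```
# hello_mpi.py -- launched from outside the container, e.g.:
#   mpirun -np 4 singularity exec container.img python hello_mpi.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
# Each MPI process reports its rank; host-side mpirun does the spawning,
# the container only provides the userland MPI and Python stack
print("rank %d of %d" % (comm.Get_rank(), comm.Get_size()))
```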
14. How to Create Container Images for Distributed DL
• Host
– Mount native device drivers and
system libraries to containers
• GPU, Infiniband, etc.
• cf. nvidia-docker
• Containers
– Install user space libraries
• CUDA, CuDNN, NCCL2, ibverbs, MPI, etc.
– Install distributed deep learning
frameworks
• Optimized build for underlying computing
resources
14
Base Drivers, Libraries on Host
CUDA
Drivers
Infiniband
Drivers
Filesystem
Libraries
(GPFS, Lustre)
Userland Libraries on Container
CUDA CuDNN NCCL2
MPI
(mpi4py)
Mount
ibverbs
Distributed Deep Learning
Frameworks
Caffe2, ChainerMN, Distributed
Tensorflow, MXNet
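One way to realize the layering above is to bootstrap from a CUDA Docker image and add the user-space pieces in %post; every name and version below is an assumption for illustration, not the actual ABCI/AAIC build:

```
Bootstrap: docker
From: nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04

%post
    # Userland layers only: MPI and a distributed DL framework.
    # GPU / Infiniband / filesystem drivers stay on the host and are
    # bind-mounted into the container at run time.
    apt-get update
    apt-get install -y openmpi-bin libopenmpi-dev python3-pip
    pip3 install mpi4py chainer chainermn
```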
15. AICloudModules: Docker-based Container Image
Collection for AI
• Base Images
– Optimized OS Images for
AI Cloud
• Including CUDA, CuDNN, NCCL,
MPI, etc., Libraries
• Excluding CUDA, Infiniband, GPFS
etc., native drivers on host
• DL Images
– Installed Deep Learning
Frameworks and AI Applications
based on Base OS Images
15
Base Images
Base OS Images
incl. CUDA, CuDNN, NCCL2, MPI, Infiniband
DL Images
Images for Deep Learning Frameworks
based on HPCBase
CentOS, Ubuntu
Base Drivers, Libraries on Host
GPU, Infiniband, GPFS, MPI
Caffe2, ChainerMN, Distributed
Tensorflow, MXNet, CNTK
https://hub.docker.com/r/aistairc/aaic/
as a prototype
16. Preliminary Performance Studies
• Emulated AI Workloads
– Computation (GEMM)
– Memory Bandwidth (STREAM)
– Network
(OSU Micro Benchmarks)
– Storage I/O (FIO)
– Distributed Deep Learning
(Imagenet1K)
• Experimental Environment
– AAIC
– Baremetal
• CentOS 7.3, Linux kernel v3.10.0,
gcc v4.8.5, glibc v2.17
– Singularity Container
• Singularity v2.4
• Ubuntu 16.04, Linux kernel v3.10.0,
gcc v5.4.0, glibc v2.23
– Base Software on Both
• CUDA 8.0.61.2, CuDNN v6.0.21,
NCCL v2.0.5, OpenMPI v2.1.1
16
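The computation and memory-bandwidth workloads in this list can be emulated in a few lines of NumPy; this sketch (sizes are illustrative, not the parameters used in the study) reports GEMM GFLOP/s and a STREAM-style triad bandwidth:

```python
import time
import numpy as np

def gemm_gflops(n=1024, dtype=np.float32):
    """Time C = A x B and report GFLOP/s (2*n^3 flops for an n x n GEMM)."""
    a = np.random.rand(n, n).astype(dtype)
    b = np.random.rand(n, n).astype(dtype)
    t0 = time.perf_counter()
    c = a.dot(b)
    elapsed = time.perf_counter() - t0
    return c, 2.0 * n ** 3 / elapsed / 1e9

def stream_triad_gbs(n=10_000_000):
    """Time a = b + s*c and report GB/s (three arrays of 8-byte doubles)."""
    b = np.random.rand(n)
    c = np.random.rand(n)
    s = 3.0
    t0 = time.perf_counter()
    a = b + s * c
    elapsed = time.perf_counter() - t0
    return a, 3 * 8 * n / elapsed / 1e9

if __name__ == "__main__":
    _, gflops = gemm_gflops()
    _, gbs = stream_triad_gbs()
    print("GEMM: %.1f GFLOP/s, Triad: %.1f GB/s" % (gflops, gbs))
```

On AAIC-like nodes the same pattern extends to GPU GEMM by swapping in a CUDA-backed library; plain NumPy keeps the sketch dependency-free, and the point of the study is that such numbers inside a Singularity container track the baremetal ones.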
17. Computing Node Configuration
of AAIC (up to 50x nodes)
• NVIDIA TESLA P100 x 8
• Intel Xeon E5-2630 v4 x 2 sockets
– 10 cores per socket
– Hyper Threading (HT) enabled
– 40 cores per node
• 256GiB of Memory
• 480GB of SSD
• EDR Infiniband
– Connected to other computing nodes and
GPFS storage
17
lstopo topology output (physical indexes, Tue May 30 2017):
• 2x NUMA nodes w/ 128GB memory each (256GB total)
• Each package: 25MB L3; 10 cores, each w/ 256KB L2, 32KB L1d, 32KB L1i, and 2 HW threads (PU P#0-P#39)
• PCI devices: NVIDIA GPUs (10de:15f9) x8 (card1-card8), SSD (sda), Ethernet (enp129s0f0, enp129s0f1), Infiniband HBA (mlx5_0, ib0)
24. Discussion
• Singularity
– Competitive performance to Baremetal
• Not only computation, but also memory bandwidth, storage I/O, and network
– Easy to handle OS differences (CentOS vs. Ubuntu)
• cf.) glibc version differences
– Simple installation to shared computing resources
• configure, make, make install (but, root privilege required)
– Some Remaining Engineering Issues
• Software Stability
• Security Issues
24
25. Discussion (cont’d)
• Are there any problems in running untrusted users' containers?
– Basically no problems, but it depends on the site policy
=> Define the issues!
– Singularity provides some security mechanisms:
• Singularity Check command to analyze installed binaries inside the container
• File-owner- or Path-based limitation to launch Singularity container
• Read/Write restriction to container image files
• Do we need special container images for each site?
– Basically, NO. Singularity can run on/with ARM, OPA, etc. (as long as the host runs Linux)
– Optimized builds are required for best performance
• Can we use ISV Applications?
– Difficult to include inside the container due to license issues
– Alternatively, mount host installed ISV apps to container
25
26. ) (
• Can we run demonized programs, such as web services and
databases, etc.?
– Yes. Singularity instance command helps to run such instances
in the background.
– User privilege
=> Under investigation using
our AI application prototype (Web Services)
26
https://github.com/gistairc/MUSIC4P3
Power Plant Detection from Satellite Data
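With Singularity 2.4, the instance workflow mentioned above looks roughly like this sketch (the instance and image names are illustrative):

```
# Start a daemonized service from a container image, under user privilege
singularity instance.start container.img web1

# List running instances and execute a command inside one
singularity instance.list
singularity exec instance://web1 ps aux

# Shut the instance down
singularity instance.stop web1
```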
27. • Software Ecosystems for AI Cloud
– AI needs both HPC and Cloud/Big Data technologies
• Hardware, Software, etc.
– How to fill the chasm?
• Case study of Singularity (User-level Container)
– Containerization of Distributed Deep Learning Frameworks
– Preliminary Performance Studies w/ Emulated AI Workloads
• Competitive performance to Baremetal
27
28. The world's first large-scale Open AI Infrastructure
28
ABCI
Serving your AI needs
in Spring 2018