This slide deck introduces the technical specs and details of Backend.AI 19.09.
* On-premise clustering / container orchestration / scaling on cloud
* Container-level fractional GPU technology that lets one physical GPU serve many containers as multiple smaller GPUs at the same time.
* NVIDIA GPU Cloud integrations
* Enterprise features
JMI Techtalk: 한재근 - How to use GPU for developing AI – Lablup Inc.
This Techtalk introduces a variety of techniques that Nvidia provides for improving performance when using GPUs for AI development, along with supporting technical resources. In particular, it covers in detail the process of improving performance by introducing mixed precision on the Volta architecture.
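A minimal NumPy sketch of the numeric rationale behind mixed precision (this is an illustration of FP16 rounding behavior, not actual Volta Tensor Core code): FP16 storage is fast and compact, but once a running sum grows large, each small increment falls below half the spacing between representable FP16 values and is rounded away, which is why mixed-precision training keeps an FP32 accumulator.

```python
import numpy as np

# Accumulate 10,000 small increments (0.01) sequentially.
# FP16 accumulation stalls once increments round away;
# an FP32 accumulator stays close to the true sum.
step = np.float16(0.01)          # ~0.0099945 after FP16 rounding

acc16 = np.float16(0.0)
acc32 = np.float32(0.0)
for _ in range(10_000):
    acc16 = np.float16(acc16 + step)              # stalls far below ~99.9
    acc32 = np.float32(acc32 + np.float32(step))  # stays accurate

print(float(acc16))  # much less than the true sum
print(float(acc32))  # close to 99.9
```

This is the same failure mode that motivates FP32 "master weights" and loss scaling in mixed-precision training recipes.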
A brief introduction to the problems and prospects of OpenCL and distributed heterogeneous computation with Hadoop. Presented at Big Data Dive 2013 (Belarus Java User Group).
Open Source RAPIDS GPU Platform to Accelerate Predictive Data Analytics – inside-BigData.com
Today NVIDIA announced a GPU-acceleration platform for data science and machine learning, with broad adoption from industry leaders, that enables even the largest companies to analyze massive amounts of data and make accurate business predictions at unprecedented speed.
“Data analytics and machine learning are the largest segments of the high performance computing market that have not been accelerated — until now,” said Jensen Huang, founder and CEO of NVIDIA, who revealed RAPIDS in his keynote address at the GPU Technology Conference. “The world’s largest industries run algorithms written by machine learning on a sea of servers to sense complex patterns in their market and environment, and make fast, accurate predictions that directly impact their bottom line.”
"RAPIDS open-source software gives data scientists a giant performance boost as they address highly complex business challenges, such as predicting credit card fraud, forecasting retail inventory and understanding customer buying behavior. Reflecting the growing consensus about the GPU’s importance in data analytics, an array of companies is supporting RAPIDS — from pioneers in the open-source community, such as Databricks and Anaconda, to tech leaders like Hewlett Packard Enterprise, IBM and Oracle."
Learn more: https://insidehpc.com/2018/10/open-source-rapids-gpu-platform-accelerate-predictive-data-analytics/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
This is a presentation about how to use Kubeflow for "AI pipeline optimization" - we show the "traditional" pipeline and why it should be optimized to make it available to a wider audience. Services are getting more and more important nowadays - that's why we call it "Data Science as a Service".
Introduction to Software Defined Visualization (SDVis) – Intel® Software
Software defined visualization (SDVis) is an open-source initiative from Intel and industry collaborators. It aims to improve the visual fidelity, performance, and efficiency of prominent visualization solutions, while supporting rapidly growing big-data use on workstations and high-performance computing (HPC) supercomputing clusters, without the memory limitations and cost of GPU-based solutions.
An overview of changes to OSPRay, focusing on:
Critical API features for practical OSPRay use
Internal changes and the motivation behind them
How to extend OSPRay for advanced use cases
RAPIDS: GPU-Accelerated ETL and Feature Engineering – Keith Kraus
The RAPIDS suite of open source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.
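To make the "user-friendly Python interfaces" point concrete: cuDF deliberately mirrors much of the pandas API, so a typical ETL and feature-engineering step reads the same on CPU and GPU. A sketch under that assumption (shown with pandas so it runs anywhere; on a RAPIDS install the usual pattern is to swap the import for `import cudf as pd` — the DataFrame contents here are purely illustrative):

```python
import pandas as pd  # on a RAPIDS system: import cudf as pd

# A small ETL + feature-engineering step in the pandas/cuDF dialect:
# aggregate per-user purchase amounts into simple features.
df = pd.DataFrame({
    "user": ["a", "b", "a", "c", "b", "a"],
    "amount": [10.0, 25.0, 5.0, 40.0, 15.0, 30.0],
})

features = (
    df.groupby("user")["amount"]
      .agg(["sum", "mean"])   # per-user total and average spend
      .reset_index()
)
print(features)
```

Because the dataframe dialect is shared, the same pipeline code can be developed on a laptop and moved to GPU memory without a rewrite.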
In this deck from FOSDEM'19, Christoph Angerer from NVIDIA presents: Rapids - Data Science on GPUs.
"The next big step in data science will combine the ease of use of common Python APIs, but with the power and scalability of GPU compute. The RAPIDS project is the first step in giving data scientists the ability to use familiar APIs and abstractions while taking advantage of the same technology that enables dramatic increases in speed in deep learning. This session highlights the progress that has been made on RAPIDS, discusses how you can get up and running doing data science on the GPU, and provides some use cases involving graph analytics as motivation.
GPUs and GPU platforms have been responsible for the dramatic advancement of deep learning and other neural net methods in the past several years. At the same time, traditional machine learning workloads, which comprise the majority of business use cases, continue to be written in Python with heavy reliance on a combination of single-threaded tools (e.g., Pandas and Scikit-Learn) or large, multi-CPU distributed solutions (e.g., Spark and PySpark). RAPIDS, developed by a consortium of companies and available as open source code, allows for moving the vast majority of machine learning workloads from a CPU environment to GPUs. This allows for a substantial speed up, particularly on large data sets, and affords rapid, interactive work that previously was cumbersome to code or very slow to execute. Many data science problems can be approached using a graph/network view, and much like traditional machine learning workloads, this has been either local (e.g., Gephi, Cytoscape, NetworkX) or distributed on CPU platforms (e.g., GraphX). We will present GPU-accelerated graph capabilities that, with minimal conceptual code changes, allow both graph representations and graph-based analytics to achieve similar speed-ups on a GPU platform. By keeping all of these tasks on the GPU and minimizing redundant I/O, data scientists can model their data quickly and frequently, affording a higher degree of experimentation and more effective model generation. Further, keeping all of this in compatible formats allows quick movement from feature extraction, graph representation, and graph analytics to enrichment back to the original data and visualization of results. RAPIDS has a mission to build a platform that allows data scientists to explore data, train machine learning algorithms, and build applications while primarily staying on the GPU and GPU platforms."
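As a concrete example of the kind of graph analytic discussed above, here is a tiny pure-Python PageRank by power iteration; in RAPIDS, cuGraph exposes a NetworkX-style `pagerank()` entry point over GPU data, so code at this conceptual level maps over with minimal changes (this toy implementation and graph are illustrative, not cuGraph code):

```python
# Minimal power-iteration PageRank over an adjacency-list graph.
def pagerank(graph, damping=0.85, iters=50):
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        # Base (teleport) probability for every node.
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n, targets in graph.items():
            if targets:
                share = damping * rank[n] / len(targets)
                for t in targets:
                    new[t] += share
            else:  # dangling node: spread its rank evenly
                for t in nodes:
                    new[t] += damping * rank[n] / len(nodes)
        rank = new
    return rank

# Hypothetical 4-node graph: "c" is pointed to by three nodes.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
print(ranks)  # "c" accumulates the most rank
```

The GPU win comes from doing these sparse propagation steps over millions of edges in parallel while the data stays resident in GPU memory.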
Learn more: https://rapids.ai/
and
https://fosdem.org/2019/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/xilinx/embedded-vision-training/videos/pages/may-2019-embedded-vision-summit
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Nick Ni, Director of Product Marketing at Xilinx, presents the "Xilinx AI Engine: High Performance with Future-proof Architecture Adaptability" tutorial at the May 2019 Embedded Vision Summit.
AI inference demands orders-of-magnitude more compute capacity than what today’s SoCs offer. At the same time, neural network topologies are changing too quickly to be addressed by ASICs that take years to go from architecture to production. In this talk, Ni introduces the Xilinx AI Engine, which complements the dynamically-programmable FPGA fabric to enable ASIC-like performance via custom data flows and a flexible memory hierarchy. This combination provides an orders-of-magnitude boost in AI performance along with the hardware architecture flexibility needed to quickly adapt to rapidly evolving neural network topologies.
The RAPIDS suite of software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.
NVIDIA CEO Jensen Huang Presentation at Supercomputing 2019 – NVIDIA
Broadening support for GPU-accelerated supercomputing to a fast-growing new platform, NVIDIA founder and CEO Jensen Huang introduced a reference design for building GPU-accelerated Arm servers, with wide industry backing.
Developing, experimenting, and deploying ML models at scale requires substantial tooling, scripting, tracking, versioning, and monitoring.
Watch full video here: https://cnvrg.io/webinars-and-workshops/scaling-mlops-on-nvidia-dgx-systems/
Data scientists want to do data science – and are slowed down by MLOps and DevOps tasks.
They lack the user-friendly tools needed to track experiments, attach resources, manage datasets, and launch multiple ML pipelines.
In this presentation, cnvrg.io CEO Yochay Ettun hosts a special guest from NVIDIA, Michael Balint, Sr. Product Manager for NVIDIA DGX systems, to discuss how to optimize the use of any NVIDIA DGX and NVIDIA GPU asset, both on-prem and in the cloud, with the cnvrg.io machine learning platform.
We will show best practices to reach high utilization of NVIDIA DGX systems, while conducting meta-scheduling across multiple heterogeneous Kubernetes/OpenShift/Linux server clusters.
In addition, we will introduce the concept of production flows, which automate hundreds of models from the data hub to deployment. We will wrap up with a real-life demo of flows, exercising many experiments across DGX platforms.
What you will learn:
- Creating a data science flow: from data to deployment, while attaching different NVIDIA DGX Kubernetes clusters to each step of the flow
- The concept of a meta-scheduler: scheduling experiments across disparate resources or other schedulers, achieving high utilization at scale
- How the NVIDIA DGX ecosystem with cnvrg.io makes GPU assets easy to consume with one click, bypassing the complexity of MLOps
- How to leverage NGC containers in ML pipelines
You can watch the full presentation along with audio and video in the link here: https://cnvrg.io/webinars-and-workshops/scaling-mlops-on-nvidia-dgx-systems/
RAPIDS – Open GPU-accelerated Data Science – Data Works MD
RAPIDS – Open GPU-accelerated Data Science
RAPIDS is an initiative driven by NVIDIA to accelerate the complete end-to-end data science ecosystem with GPUs. It consists of several open source projects that expose familiar interfaces, making it easy to accelerate the entire data science pipeline – from ETL and data wrangling to feature engineering, statistical modeling, machine learning, and graph analysis.
Corey J. Nolet
Corey has a passion for understanding the world through the analysis of data. He is a developer on the RAPIDS open source project focused on accelerating machine learning algorithms with GPUs.
Adam Thompson
Adam Thompson is a Senior Solutions Architect at NVIDIA. With a background in signal processing, he has spent his career participating in and leading programs focused on deep learning for RF classification, data compression, high-performance computing, and managing and designing applications targeting large collection frameworks. His research interests include deep learning, high-performance computing, systems engineering, cloud architecture/integration, and statistical signal processing. He holds a Masters degree in Electrical & Computer Engineering from Georgia Tech and a Bachelors from Clemson University.
A Primer on FPGAs - Field Programmable Gate Arrays – Taylor Riggan
A focus on the use of FPGAs by cloud service providers, including Microsoft Azure Catapult, Google Tensor Processors, and Amazon EC2 F1 instances. Also includes background info on how to get started with FPGAs.
IBM Bayesian Optimization Accelerator (BOA) is a do-it-yourself toolkit to apply state-of-the-art Bayesian inferencing techniques and obtain optimal solutions for complex, real-world design simulations without requiring deep machine learning skills. This talk will describe IBM BOA, its differentiation and ease of use, and how researchers can take advantage of it for optimizing any arbitrary HPC simulation.
Harnessing the virtual realm for successful real world artificial intelligence – Alison B. Lowndes
Artificial Intelligence is impacting all areas of society, from healthcare and transportation to smart cities and energy. This talk covers how NVIDIA invests in both internal pure research and accelerated computation to enable its diverse customer base across gaming & extended reality, graphics, AI, robotics, simulation, high performance scientific computing, healthcare & more. You will be introduced to the GPU computing platform and shown successfully deployed real-world applications, as well as a glimpse into the current state of the art across academia, enterprise, and startups.
Axel Koehler from Nvidia presented this deck at the 2016 HPC Advisory Council Switzerland Conference.
“Accelerated computing is transforming the data center that delivers unprecedented throughput, enabling new discoveries and services for end users. This talk will give an overview about the NVIDIA Tesla accelerated computing platform including the latest developments in hardware and software. In addition it will be shown how deep learning on GPUs is changing how we use computers to understand data.”
In related news, the GPU Technology Conference takes place April 4-7 in Silicon Valley.
Watch the video presentation: http://insidehpc.com/2016/03/tesla-accelerated-computing/
See more talks in the Swiss Conference Video Gallery:
http://insidehpc.com/2016-swiss-hpc-conference/
Sign up for our insideHPC Newsletter:
http://insidehpc.com/newsletter
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito além do hardware. – E-Commerce Brasil
NVIDIA technologies applied to e-commerce: far beyond the hardware.
Jomar Silva
Developer Relations Manager for Latin America – NVIDIA
https://eventos.ecommercebrasil.com.br/forum/
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS – Databricks
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Abstract: We will introduce RAPIDS, a suite of open source libraries for GPU-accelerated data science, and illustrate how it operates seamlessly with MLflow to enable reproducible training, model storage, and deployment. We will walk through a baseline example that incorporates MLflow locally, with a simple SQLite backend, and briefly introduce how the same workflow can be deployed in the context of GPU enabled Kubernetes clusters.
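The "MLflow locally, with a simple SQLite backend" setup described above typically amounts to pointing the tracking server at a SQLite URI. A sketch of the usual invocation (a config fragment; the file paths, host, and port here are illustrative):

```shell
# Start a local MLflow tracking server backed by SQLite,
# with run artifacts written to a local directory.
mlflow server \
  --backend-store-uri sqlite:///mlflow.db \
  --default-artifact-root ./mlruns \
  --host 127.0.0.1 --port 5000
```

Training code then points at the same URI (e.g. via `mlflow.set_tracking_uri`), so runs, parameters, and models are recorded reproducibly before the workflow is moved onto GPU-enabled Kubernetes clusters.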
Building and operating an HPC-based AI computing environment at the Gwangju Institute of Science and Technology (GIST).
To use any part of these slides, please cite "Narantuya Jargalsaikhan, GIST AI-X Computing Cluster, 2021".
Thank you!
Enabling Artificial Intelligence - Alison B. Lowndes – WithTheBest
An overview and update of our hardware and software offering and support provided to the Machine & Deep Learning Community around the world.
Alison B. Lowndes, AI DevRel, EMEA
Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep... – Databricks
What we call the public cloud was developed primarily to manage and deploy web servers. The target audience for these products is DevOps. While this is a massive and exciting market, the world of Data Science and Deep Learning is very different — and possibly even bigger. Unfortunately, the tools available today are not designed for this new audience, and the cloud needs to evolve. This talk covers what the next 10 years of cloud computing will look like.
Semiconductors are the driving force behind the AI evolution and enable its adoption across various application areas ranging from connected and automated driving to smart healthcare and wearables. Given that, electronics research, design and manufacturing communities around the world are increasingly investing in specialized AI chips providing less latency, greater processing power, higher bandwidth and faster performance. AI also attracts new technology players to invest in making their own specialized AI chips, changing the electronics manufacturing landscape and moving the AI technology towards machine learning, deep learning and neural networks.
Fast data in times of crisis with GPU accelerated database QikkDB | Business ... – Matej Misik
Graphics cards (GPUs) open up new ways of processing and analytics over big data, delivering millisecond selections over billions of rows, as well as telling stories about data. #QikkDB
How do you present data so that everyone understands it? Data analysis is for scientists, but data storytelling is for everyone – managers, product owners, sales teams, and the general public. #TellStory
Learn about high-performance computing with GPUs and how to present data, with a rich Covid-19 data-story example, in the upcoming webinar.
IBM Consultants & System Integrators Interchange - 2015
http://www-07.ibm.com/events/in/csiinterchange/index.html
Demystify OpenPOWER
Speaker: Anand Haridass, Chief Engineer – Power System, IBM India
OpenPOWER is an open development community using the POWER Architecture to serve the evolving needs of customers. Hear about the success of the OpenPOWER strategy and Foundation, which is building momentum and fueling an explosion of new development, innovation, collaboration, and improved performance on the POWER Architecture. What does this mean for your clients? Find out how OpenPOWER is expanding the Power ecosystem and its capabilities with new solutions coming from IBM and our partners.
Presentation given by Jens Hagemeyer (Bielefeld University) at the ‘Low-Energy Heterogeneous Computing Workshop’ on 16 October 2020 within HiPEAC CSW Autumn 2020
Lablupconf session1-2 "Putting bricks into a giant backend" – Lablup Inc.
Lablup Conf 1st (Session1/Community)
"Putting bricks into a giant backend" (거대한 백엔드에 벽돌 끼워넣기) - 여종현
- Contents:
* Issue management techniques: collecting and managing issues for Backend.AI's agent, manager, storage-proxy, etc. in a single repo
* GitHub Actions: towncrier, Travis CI, branch management
* The source code structure of Backend.AI, as understood from its documentation
- Watch the video: https://youtu.be/ip_leryNV-I
Lablupconf session8 "Paving the road to AI-powered world" – Lablup Inc.
Lablup Conf 1st (Session4/Core)
"Paving the road to AI-powered world" - 김준기
- Contents:
* Recap of Backend.AI history
* Future roadmap of Backend.AI for the next 2 years
- Watch the video: https://youtu.be/kAGSl99U0Bo
Lablupconf session7 "People don't know what they want until LABLUP show it to ..." – Lablup Inc.
Lablup Conf 1st (Session7/Cases)
"People don't know what they want until LABLUP show it to them. : Practical guide to building GPU clusters for AI" - 김정묵
- Contents:
* From education to hyperscale AI model development: considerations to keep in mind when preparing to build and operate a GPU cluster, shared with real-world cases.
- Watch the video: https://youtu.be/GMYWKF993J8
Lablupconf session4 "Striking a balance between accelerating storage-solution I/O pipelines and development scope" – Lablup Inc.
Lablup Conf 1st (Session4/Core)
"How to strike a balance between accelerating pipeline I/O of each storage solution and development scope" - 강지현
- Contents:
* Backend.AI Storage Proxy: accelerating the data/model I/O pipeline
* Integrating storage solutions: PureStorage / NetApp
* Case: building the NetApp integration
- Watch the video: https://youtu.be/itCEkuO2DtE
Lablup Conf 1st (Keynote/Core)
"The good, the bad, the weird: Future of Backend.AI" - 신정규
발표내용
- Road to Backend.AI. Current and the future.
영상보러가기
- https://youtu.be/5askMmSumP4
초심자를 위한 무작정 시작하는 Backend.AI_04
○ Backend.AI 버전 확인하기
○ 사용자 정보 변경하기
○ 사용자 설정 건드려보기
- 일반
* 데스크탑 알림
* 간결한 사이드바 기본 사용 옵션
* 언어 설정
* SSH 키페어 관리 (하단 링크 참고)
* 자동 업데이트 체크 옵션
* 자동 로그아웃 활성화/비활성화
* 쉘 스크립트 환경 설정하기
- 로그
* 로그 살펴보기
* 로그 새로고침
* 로그 삭제하기
○ FAQ & Troubleshooting
- 세션에서 Jupyter notebook 실행시 kernel error로 종료됩니다.
- 비밀번호를 잊어버렸어요.
- 비밀번호를 올바르게 입력했는데도 로그인이 안됩니다.
- 기타 참고할 내용
1. Import & Run (가져오기 & 실행)
1) 노트북 가져오기
- (https://github.com/lablup/backend.ai-example-notebooks)
2) 노트북 런치 버튼 만들기
3) GitHub 저장소 내용 가져오기
- Data & Storage 메뉴에서 가져온 저장소 폴더 확인하기
2. Data & Storage (데이터 & 폴더)
1) 폴더 제어기능 살펴보기
- 폴더 생성하기
- 폴더 정보 보기
- 폴더 활용하기
∘ 폴더에 파일 업로드하기
∘ 폴더 마운트하여 세션 생성하기
∙ 웹 터미널에서 자동 마운트 폴더 확인하기
∘ 폴더에서 업로드한 파일 다운로드하기
∘ 폴더 내 파일 이름 변경하기
∘ 폴더 내 파일 삭제하기
- 폴더 공유하기
∘ 폴더 초대하기
∘ 폴더 초대 수락/거절하기
∘ 폴더 공유 권한 갱신하기
- 폴더 이름 변경하기
- 폴더 삭제하기
2) 자동 마운트 폴더 탭
- 자동 마운트 폴더 생성하기
- 폴더 마운트 없이 세션 생성하기
∘ 웹 터미널에서 자동 마운트 폴더 확인하기
3. Statistics (통계)
1) 일일 사용량 통계
2) 일주일 간 사용량 통계
초심자를 위한 무작정 시작하는 Backend.AI-02
○ Backend.AI 클라우드 둘러보기
- Summary(요약) 페이지
* 시작 패널
* 자원 사용량 살펴보기
* 시스템 자원
* 공지
* 초대 폴더
○ Session(세션) 페이지
- 세션 상태 안내
- 실행중인 세션
* 앱 런쳐
* 웹 터미널
- 종료된 세션
*세션 내 사용량
*사용시간
JMI Techtalk: 강재욱 - Toward tf.keras from tf.estimator - From TensorFlow 2.0 p...Lablup Inc.
이 Techtalk에서는 TensorFlow 2.0으로 이전시 tf.estimator 에서 tf.keras로 이전해야 하는 이유에 대하여 설명합니다.
This Techtalk explains why you need to migrate from tf.estimator to tf.keras when moving to TensorFlow 2.0.
Just Model It 이벤트에서 사용할 Backend.AI 에 관한 소개입니다. Backend.AI의 개괄, 주요 기능 및 사용예들을 다룹니다. 또한 Backend.AI 를 이용한 End-to-end ML model 개발 시나리오도 소개합니다.
An Introduction to Backend.AI to use in Just Model It event. It covers the overview of Backend.AI, its main features and examples. It also introduces the scenario of developing end-to-end ML model using Backend.AI.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
2. GPU Computing: Maximizing GPU utilization via Backend.AI
Backend.AI: The most efficient way to build and train your machine learning models
2 / 38
3. Synergy of Deep Learning and GPU
Deep Learning = repetition of numeric operations on millions/billions of parameter matrices
[Figure: growth of deep-learning compute (reference: NVIDIA 2017, "A New Computing Era"): 2015 Microsoft ResNet, 60 million parameters, 70 quadrillion calc.; 2016 Baidu Deep Speech 2, 300 million parameters, 200 quadrillion calc.; 2017 Google NMT, 8.7 billion parameters, 1.05 quintillion calc. (calc. = GOPS × bandwidth)]
[Figure: matrix multiplication example: [[1, 2, 3], [4, 5, 6]] × [[7, 8], [9, 10], [11, 12]]; the first output element is 1×7 + 2×9 + 3×11 = 58]
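The matrix-multiply example on this slide, written out as code. This is a plain-Python sketch for illustration; real deep-learning workloads run the same row-by-column dot products on the GPU billions of times:

```python
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]

def matmul(A, B):
    """Naive matrix multiply: each output cell is a row-by-column dot product."""
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

C = matmul(A, B)
print(C[0][0])  # 1*7 + 2*9 + 3*11 = 58
print(C)        # [[58, 64], [139, 154]]
```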
4. Synergy of Deep Learning and GPU
[Figure: CPU vs. GPU die layout. A CPU spends chip area on control logic, cache, and a few ALUs attached to DRAM; a GPU spends it on many ALUs.]
GPU = more computing units (ALUs) per chip area
§ C/C++ code that runs on the GPU in parallel, made easy with NVIDIA's CUDA (2007) and OpenCL (2009)
§ Used in machine learning, numerical analysis, and scientific computing
[Chart: single-thread CPU performance, once improving 1.5× per year, now grows about 1.1× per year, while GPU computing power keeps growing 1.5× per year, projecting a 1000× gap by 2025 (log scale 10^2 to 10^7, years 1980 to 2020)]
5. Why GPU Computing?
HPC & AI = high utilization of large-scale resources
GPU = high-density computing chips

Workload                      Baseline (CPU-only)   HPC (Amber, LAMMPS)   AI Training (TensorFlow)   AI Inference (Image, Speech)
Speed-up                      1x                    20x                   >100x                      60x
Servers                       5,000                 250                   <50                        84
Capex                         $45M                  $11M                  $7.5M                      $7M
3-Year Opex (Power+Cooling)   $19.5M                $2.5M                 $1M                        $1.5M
TCO Saving                    N/A                   79%                   86%                        86%

Note(s): CPU baselined to 5,000 servers for each workload | Capex: CPU node with 2x Skylake CPUs ~$9K; GPU node with 4x V100 GPUs ~$45K | Opex: power & cooling is $180/kW/month | Power: CPU server + n/w = 0.6 kW; GPU server + n/w = 1.6 kW; DGX-1V/HGX-1 server = 3.2 kW | HPC: GPU node with 4x V100 compared to a 2x CPU server | DL Training: DGX-1V compared to a 2x CPU server | DL Inference: HGX-1 based server (8x V100) compared to a 2x CPU server | numbers rounded to nearest $0.5M
GPU is necessary!
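The TCO savings in the table can be reproduced from the Capex and Opex rows. A quick check, assuming TCO = Capex + 3-year Opex (that formula is an inference from the table, and the inputs were rounded to the nearest $0.5M, so recomputed savings can differ by about a point):

```python
def tco_saving(capex_m, opex_m, base_capex_m=45.0, base_opex_m=19.5):
    """Percent TCO saving vs. the CPU-only baseline ($45M capex + $19.5M opex)."""
    baseline = base_capex_m + base_opex_m  # $64.5M for 5,000 CPU-only servers
    return 100 * (1 - (capex_m + opex_m) / baseline)

print(round(tco_saving(11.0, 2.5)))  # HPC          -> 79  (table: 79%)
print(round(tco_saving(7.5, 1.0)))   # AI training  -> 87  (table: 86%, input rounding)
print(round(tco_saving(7.0, 1.5)))   # AI inference -> 87  (table: 86%, input rounding)
```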
10. Backend.AI Platform
[Diagram: the Backend.AI stack. Managed GPU apps run on top of the Backend.AI Manager, Agent, and Client components, above the IaaS / OS layer and the hardware infrastructure, serving data scientists, data analysts, instructors & learners, and developers.]
Container-level GPU virtualization
Click-to-ready GPU environments
Web GUI for monitoring & control
IDE integration
11. Backend.AI Differentiation
• The only solution that provides machine learning container technology in a single framework
Existing orchestration layers are optimized for domain-specific functions other than machine learning (e.g. scheduling, microservice hosting)
There is a lack of products that solve the problems of real machine learning researchers and developers
• Backend.AI
GPU optimization technology
ü CUDA-optimized solutions implemented through the NVIDIA partnership
ü The only container-based multi / partial GPU sharing (fractional scaling) solution
Dynamic sandboxing: programmable and rewritable syscall filters
ü Supports richer programmable policies than AppArmor / seccomp, etc.
Docker-based legacy app resource control
ü Calibrates the number of CPU cores reported to mathematical libraries such as OpenBLAS
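The OpenBLAS core-count calibration matters because, inside a container, `os.cpu_count()` reports the host's cores rather than the container's CPU quota, so math libraries over-subscribe threads. A minimal sketch of the idea, assuming standard Linux cgroup v1 file paths; this is an illustration, not Backend.AI's actual implementation:

```python
import os

def effective_cpu_count(default=None):
    """CPU quota actually granted to this container (cgroup v1), else a fallback."""
    try:
        with open("/sys/fs/cgroup/cpu/cpu.cfs_quota_us") as f:
            quota = int(f.read())
        with open("/sys/fs/cgroup/cpu/cpu.cfs_period_us") as f:
            period = int(f.read())
        if quota > 0 and period > 0:
            return max(1, quota // period)
    except (OSError, ValueError):
        pass  # no cgroup v1 CPU controller: fall through
    return default or os.cpu_count()  # falls back to the host core count

# BLAS libraries read this environment variable when the process starts.
os.environ["OPENBLAS_NUM_THREADS"] = str(effective_cpu_count())
```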
13. Backend.AI: https://www.backend.ai
Backend.AI is an open-source cloud resource management platform. We provide fractional GPU resourcing so you can scale efficiently, whether you are a scientist, a DevOps engineer, an enterprise, or an AI hobbyist.
14. Backend.AI: GPU Features
• Container-level fractional GPU scaling
Assigns slices of SMP / RAM to containers
ü e.g. allocating 2.5 GPUs or 0.3 GPUs
Shared GPUs for inference & education workloads
Multiple GPUs for model training workloads
Built on a proprietary CUDA virtualization layer
• NVIDIA platform integration
Optimized for the DGX server families
Supports NGC (DL / HPC) image integration
[Figure: example of GPU sharing / allocation with 2.5-GPU and 0.5-GPU slots]
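Fractional scaling like the 2.5 / 0.5-GPU example above can be sketched as simple capacity bookkeeping. A hypothetical allocator, where the greedy strategy and device names are assumptions and not Backend.AI's real scheduler:

```python
def allocate(gpus, request):
    """Split `request` GPUs (e.g. 2.5) across devices with free capacity.

    `gpus` maps device name -> free capacity, where 1.0 = one whole GPU.
    Returns a list of (device, amount) grants, or raises if unsatisfiable.
    """
    grant, remaining = [], round(request, 3)
    for dev, free in sorted(gpus.items(), key=lambda kv: -kv[1]):
        if remaining <= 0:
            break
        take = min(free, remaining)
        if take > 0:
            gpus[dev] = round(free - take, 3)
            grant.append((dev, take))
            remaining = round(remaining - take, 3)
    if remaining > 0:  # roll back partial grants on failure
        for dev, take in grant:
            gpus[dev] = round(gpus[dev] + take, 3)
        raise RuntimeError("not enough free GPU capacity")
    return grant

pool = {"gpu0": 1.0, "gpu1": 1.0, "gpu2": 0.5}
print(allocate(pool, 2.5))  # [('gpu0', 1.0), ('gpu1', 1.0), ('gpu2', 0.5)]
```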
16. NVIDIA DGX Series
• NVIDIA DGX-1 / DGX-2
Complete multi-GPU environment system
ü Ubuntu-based host OS (RedHat also supported)
ü NVLink / NVSwitch based high-speed networking
ü Great testbed for various load tests!
• Backend.AI on the DGX family
Complements the NVIDIA Container Runtime
ü GPU sharing for multi-user support
ü Scheduling with CPU / GPU topology awareness
ü Features for machine learning pipelines
Technology collaboration via the NVIDIA Inception Program

NVIDIA DGX-2: "The world's most powerful deep learning system for the most complex AI challenges"
System specifications:
GPUs: 16x NVIDIA Tesla V100
GPU memory: 512 GB total
Performance: 2 petaFLOPS
NVIDIA CUDA cores: 81,920
NVIDIA Tensor Cores: 10,240
NVSwitches: 12
Maximum power usage: 10 kW
CPU: dual Intel Xeon Platinum 8168, 2.7 GHz, 24 cores
System memory: 1.5 TB
Network: 8x 100 Gb/sec InfiniBand / 100 GigE, dual 10/25 Gb/sec Ethernet

The challenge of scaling to meet the demands of modern AI and deep learning: deep neural networks are rapidly growing in size and complexity in response to the most pressing challenges in business and research. The computational capacity needed to support today's modern AI workloads has outpaced traditional data center architectures. Modern techniques that exploit increasing use of model parallelism are colliding with the limits of inter-GPU bandwidth as developers build increasingly large accelerated computing clusters, pushing the limits of data center scale. A new approach is needed, one that delivers almost limitless AI computing scale to break through the barriers to faster insights that can transform the world.

[Diagram: DGX software stack. Deep-learning frameworks (TensorFlow, Caffe, Torch, MXNet, Theano, etc.) and user programs (NVIDIA DIGITS) run in containers via the NVIDIA Container Runtime for Docker, on top of the NVIDIA Driver and an Ubuntu-based host OS.]
17. NVIDIA Platform Integration: NGC
• NGC (NVIDIA GPU Cloud)
A collection of container images optimized for nvidia-docker
ü Direct optimization options and library dependency management by NVIDIA
Expansion into a model store announced at GTC 2019
ü Sharing of deep learning models between users and organizations
ü Supports transfer learning by adding data on top of a trained model
ü Easier and faster model training environments through model scripts
• Backend.AI with NGC
Supports execution of all NGC-based images
Applies NVIDIA-recommended options (including the Docker shm limit)
Fractional GPU sharing
NGC model store and model script support (coming soon)
18. Backend.AI @ NVIDIA GTC Silicon Valley 2019
• DGX User Group Meetup: heard about DGX deployment cases and customer requirements
• NGC User Group Meetup: presented the Backend.AI NGC integration
• Main session talk: introduced Backend.AI technology
• Inception startup booth: demonstrated container-level GPU virtualization, with direct Q&A with NVIDIA CUDA developers
19. Backend.AI Competitor Analysis
[Table: feature comparison of nvidia-docker, Docker Swarm, OpenStack, Kubernetes, Apache Mesos, and Backend.AI. Rows cover GPU support (GPU assignment & isolation; heterogeneous accelerators; fractional GPU scaling*), security (sandboxing via hypervisor/container; programmable sandboxing), virtualization (VM / hypervisor**; Docker container), scheduling (availability-slot based***; advanced, e.g. DRF), and integration with modern AI frameworks.]
* Now in beta testing
** Cloud vendor / OpenStack handles VM management
*** Slot-based, but advanced customization is possible with the label feature
21. Flexible Resource Allocation: Resource Groups
• Resource groups: groups of managed hardware resources
Specify the available resource groups for each user, project, and domain
Allow resource requests to be allocated only within specific resource groups
In the cloud, autoscaling can be applied per resource group
• Examples
Resource groups by device performance: V100 / P100 / K80 / etc.
Resource groups by node type: servers / workstations / IDC / etc.
Resource groups by cloud: AWS / GCP / Azure / etc.
• Applications
Assign specific hardware or GPUs only to specific users, projects, teams, or domains
Divide node groups by CPU / GPU / storage
Group and manage nodes that are physically on the same network (for multi-network clusters)
22. Flexible Resource Allocation: Scenarios
[Diagram: a Backend.AI Manager coordinating Resource Group A (on-premise), Resource Group B (on-premise), Resource Group D (cloud / scalable), and Storage Group C]
23. Flexible Resource Allocation: Scenarios
• Per-user resource group permissions
User 1: granted RG A
User 2: granted RG A and B
Each user has separate privileges on Storage C
• Session / task batching
Manual batching to a specific RG
Automatic discovery of the optimal resource combination across all available RGs before starting
[Diagram: User 1 runs in Resource Group A (on-premise); User 2 runs in Resource Groups A and B; Storage Group C and Resource Group D (cloud) sit under the Backend.AI Manager]
24. Flexible Resource Allocation: Scenarios
• Project-wise resource group permissions
Project 1: granted RG A and B
Project 2: granted RG B and D
• Storage sharing
Different resource groups can share the same storage groups
Personal storage folders
ü Only owners can access
ü Invitation feature for sharing
Project storage folders
ü All project members can access
[Diagram: Project 1 spans Resource Groups A and B (on-premise); Project 2 spans Resource Groups B and D (cloud); both share Storage Group C under the Backend.AI Manager]
25. Flexible Resource Allocation: Scenarios
• Resource group example
RG A: NVIDIA V100 GPU group
RG B: NVIDIA P100 GPU group
User 1 can use only V100; User 2 can use both V100 and P100
Project 3 can use the P100 group and the AWS cloud
Project 4 can use only the Microsoft Azure cloud
[Diagram: Resource Groups A (V100), B (P100), D (AWS), and E (Azure) plus Storage Group C under the Backend.AI Manager, with User 1, User 2, Project 3, and Project 4 mapped to their granted groups]
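The grant scenarios above boil down to a mapping from users and projects to their permitted resource groups, checked at scheduling time. A minimal sketch, where the entity names come from the slides but the data model itself is hypothetical, not Backend.AI's real schema:

```python
# Grants from the example: User 1 -> V100 only; User 2 -> V100 and P100;
# Project 3 -> P100 group and AWS; Project 4 -> Azure only.
GRANTS = {
    "User 1":    {"RG A"},
    "User 2":    {"RG A", "RG B"},
    "Project 3": {"RG B", "RG D"},
    "Project 4": {"RG E"},
}

def allowed_groups(entity):
    return GRANTS.get(entity, set())

def can_allocate(entity, resource_group):
    """A session may only be scheduled into a resource group granted to its owner."""
    return resource_group in allowed_groups(entity)

print(can_allocate("User 2", "RG B"))   # True
print(can_allocate("User 1", "RG B"))   # False: User 1 has no P100 access
```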
32. Backend.AI Cases: AI Bigdata MBA Dept., Kookmin Univ.
• GPU server farm for students and researchers in finance
3 servers with 24 GPUs for the simultaneous use of more than 80 students in a class plus researchers in labs
• Spec
Different resource policies for students and researchers
18 TiB Ceph distributed file system built by binding the HDDs on each node over the LAN
Web GUI for operation and maintenance: no dedicated operator needed
[Diagram: one Manager + Agent node and two Agent nodes on a 1 Gbps LAN, with 24 NVIDIA GPUs, node-specific CPUs, and an 18 TiB distributed file system (CephFS), serving an ML class of 40+ students and 10+ graduate students and researchers]
33. Backend.AI Cases: Lablup GPU Cloud
• Backend.AI service for cloud users (B2C)
https://cloud.backend.ai/ (in private beta)
Use Backend.AI on the web after sign-up (invitation required)
• Spec
Unified AWS + Azure + GCP
Google TPU support (beta)
Azure FileShare + AWS EFS (Elastic File System) for the datastore
Custom-built GPU nodes (DGX-2)
[Diagram: users connect over the Internet to the Manager + Agents in AWS ap-northeast-2 (EFS, RDS), with Agents in Azure korea-south (FileShare), GCP asia-east1 (TPUs), and the LG U+ IDC]
37. Case of Cloud for Machine Learning Education
• Machine learning education and development cloud service
25 users / 2 months for each term
• Optimal utilization for each education / development workload through GPU virtualization
Infrastructure costs reduced by more than 75%
• Automatic resource allocation and environment preparation with the GUI
Optimal operation without a dedicated administrator
Eliminates the long-term maintenance burden
The infrastructure / management cost reduction from GPU virtualization also applies to on-premise deployments.
[Chart: cloud-based ML education service cost comparison, Backend.AI Cloud relative to Company A's ML cloud (baseline 100%): total cost 20%, operator payroll 0%, infrastructure cost 23%]
38. Make AI Accessible!
For more information:
Lablup Inc.: https://www.lablup.com
Backend.AI: https://www.backend.ai
Backend.AI GitHub: https://github.com/lablup/backend.ai
Backend.AI Cloud (beta): https://cloud.backend.ai