Enterprise data centers have to support a diverse set of workloads: cloud native, big data, high performance computing, and legacy applications. While cloud native applications are ideal for running in Docker clusters, bare metal and virtualization infrastructures must still be supported in the data center. The result is a proliferation of clusters and technologies running in individual silos, leading to high management costs and low utilization. This talk describes the challenges and experiences in implementing a shared cluster infrastructure based on Kubernetes to support big data, high performance computing, and VM-based workloads. The talk will show the deployment and scaling of a high performance computing workload manager, Spark, and OpenStack, and how VM and Docker management can be integrated.
4. Docker Adoption Growth
Sample of 10,000 companies tracking real usage (Ref: https://www.datadoghq.com/docker-adoption/)
• 90% still not "Dockerized"
• 85% have still not adopted Docker
5. Challenges
Legacy Applications in Containers
• Network dependencies: static IP address requirements; hostname/MAC address dependencies
• Storage dependencies: persistent data on local file systems
• Security dependencies: data security across tenants
• Legacy Linux kernel requirements
• Other OSes, e.g., Windows
VMs are still ideal for running legacy applications
6. Introducing SkyForm
[Architecture diagram]
• SkyForm CMP: department, user, tenant, and quota management; authentication; logs; reporting
• SkyForm ECP: app store, application layout, container management, releasing, service discovery, auto-scaling, load balancing, monitoring, logs, image repository, flow configuration
• IaaS: VMs, physical servers, clusters, hosts, storage (vdisks), networks (VPC, firewalls), auto-scaling, image management, HA
7. Model 1: Fine-Grain Shared Docker & VM Management
[Diagram: servers running Linux with KVM and the SkyForm Docker Engine; on each server, legacy apps run as VMs inside containers, side by side with container-ready modern apps in ordinary containers]
Running VMs inside containers
Limitation: lacks support for migration, snapshotting, etc.
8. Model 2: Coarse-Grain Shared Docker & VM Management
The customer wants to keep their existing VM management framework; in particular, assume the existing VM environment is managed by OpenStack using KVM.
• The objective becomes how to improve utilization and QoS in a shared infrastructure
Architecture
• OpenStack itself is deployed and managed as one SkyForm ECP application, with the OpenStack services running in Docker containers
• SkyForm ECP allocates whole physical servers to OpenStack
• VMs are managed by OpenStack, and SkyForm ECP is not aware of them – VMs run on the hypervisor, not in containers
[Diagram: a row of servers, each running Linux with KVM and a Docker Engine; some servers host OpenStack-managed VMs, others host containers, and the OpenStack services themselves run in containers]
9. A Use Case: HPC Cluster
• HPC jobs can be serial or parallel; a parallel job can run across multiple hosts
• An HPC cluster is typically used by multiple users
• A supercomputing center may run jobs for multiple tenants, who share the same bare-metal cluster environment
• Job management consists of:
  – Job queues
  – Resource allocation policies
  – Job submission, monitoring, and control
[Diagram: an OpenLava batch system dispatching jobs across servers running directly on the OS]
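As a concrete illustration of the submission step: OpenLava (an LSF-derived batch system) accepts jobs via `bsub`. The helper below is a hypothetical sketch that only builds the submission command line; the queue name and job command are made-up examples.

```python
def bsub_cmd(queue, ncpus, command):
    """Build an OpenLava `bsub` submission line (hypothetical helper):
    pick a job queue (-q), request slots (-n), then name the job command."""
    return ["bsub", "-q", queue, "-n", str(ncpus), command]

# A parallel job spanning multiple hosts simply requests more slots:
print(bsub_cmd("normal", 16, "mpirun ./simulation"))
```

Monitoring and control then fall to the companion commands (`bjobs`, `bkill`) in the same spirit.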
10. Docker-enabled HPC Cluster
[Diagram: a traditional HPC cluster, where the HPC batch service (WLM) and jobs run as processes, next to a Docker-enabled HPC cluster shared by tenants – HPC batch jobs (Tenant A), Spark (Tenant B), TensorFlow (Tenant C), and VMs (Tenant D) – where the WLM runs as a container and jobs run in containers]
Docker-enabled HPC cluster
• Dockerized HPC compute nodes
• Run jobs from user-supplied Docker images
• Run traditional "a.out" jobs
Benefits: application portability, isolation, compatibility
Expansion of the HPC cluster
• Big data, ML, VMs (legacy)
• Multi-tenant
• All workloads run in containers
Benefits: increased demand for HPC infrastructure, higher utilization, reduced IT costs
11. Coarse-Grain HPC Batch Sharing: Method 1
[Diagram: a batch service master and a failover master (slave) on dedicated servers; WLM and job containers, Spark, TensorFlow, and VM-in-container workloads spread across Docker Engine servers; batch scale-out and scale-in move whole hosts between tenants]
Sharing at the physical host boundary
Batch jobs that are not provided as a container image are run inside a container created by the system
Pros
• Evolutionary extension of the existing HPC cluster architecture
• Physical isolation between the HPC batch tenant and non-HPC tenants
Cons
• Lower utilization
• Dedicated master hosts
• Longer time to scale out – an entire host must be vacated before it can be given to a new tenant
12. Coarse-Grain HPC Batch Sharing: Method 2
[Diagram: as in Method 1, but the batch master & slave run in a container on a shared server rather than on dedicated master hosts]
Run the master & slave in a container; leverages ECP for master HA
Pros
• Removes dedicated hosts
• Improved utilization compared with Method 1
• Can support multiple HPC batch service tenants
Con
• Utilization is not optimal due to sharing at the physical host boundary
13. Fine-Grain HPC Cluster Sharing Architecture
[Diagram: SkyForm schedules HPC batch job containers, Spark, TensorFlow, and VM containers side by side on the same servers, driven by HPC batch queue definitions]
Each of the HPC batch service, Spark, TensorFlow, and all VMs in aggregate is a separate tenant.
Sample resource plan (algorithm) for the batch service tenant:
• Own: 50 CPUs, 100 GB memory
• Borrow: as many CPUs/memory as available, with the understanding that a job running on borrowed resources must finish within 1/2 hour
• Lend: up to 20 CPUs, 50 GB memory, with the understanding that the loaned resources will be returned within 15 minutes after a reclaim request is made
• The number of pending jobs and the job characteristics / historical patterns are used to trigger scale-up (reclaim, borrow) and scale-down (lend, return borrowed resources)
Built-in HPC batch service
• Optimal utilization
• Lowest TCO
• Fastest scaling out / in
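The own/borrow/lend plan above can be sketched as a scale trigger. The one-CPU-per-pending-job demand model and the return values below are illustrative assumptions, not SkyForm's actual algorithm:

```python
def scale_decision(pending_jobs, used_cpus, own_cpus=50, lend_cap=20):
    # Illustrative demand model: one CPU per pending job.
    demand = used_cpus + pending_jobs
    if demand > own_cpus:
        # Scale up: reclaim lent CPUs and borrow, for jobs short enough
        # to finish within the 1/2-hour borrowing window.
        return ("borrow", demand - own_cpus)
    idle = own_cpus - demand
    if idle > 0:
        # Scale down: lend idle CPUs, up to the plan's cap, on the promise
        # they are returned within 15 minutes of a reclaim request.
        return ("lend", min(idle, lend_cap))
    return ("hold", 0)
```

In practice the trigger would also weigh job characteristics and historical patterns, as the slide notes; this sketch uses counts only.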
14. OpenLava Compute Server
[Diagram: a server running Linux and Docker in "best effort" mode; a StatefulSet runs the OpenLava daemons (lim, res, sbd) as root alongside process jobs (JobA/UserA, JobB/UserB) and Docker jobs (JobC/UserC, JobD/UserD), with a shared file system (e.g., Gluster, NFS) mounted in]
• Runs with "host network", so there is no need to set up DNS
• LIM reports the resource metrics of the physical host
• On Docker startup: connect to the existing LDAP, mount file systems, start the OpenLava daemons, and run lsaddhost to add the slave to the OpenLava cluster
• Supports binary-based (process) jobs and Docker-based jobs
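The startup sequence above can be sketched as an entrypoint that returns the ordered commands. The daemon names (`lim`, `res`, `sbatchd`) and `lsaddhost` come from OpenLava; the mount point, file server path, and the use of `sssd` as the LDAP client are assumptions for illustration only:

```python
def bootstrap_cmds(shared_fs="fileserver:/export/home"):
    """Ordered steps a compute-node container runs on Docker startup
    (sketch; paths and the LDAP client choice are assumptions)."""
    return [
        ["sssd"],                                    # connect to the existing LDAP (assumed client)
        ["mount", "-t", "nfs", shared_fs, "/home"],  # mount the shared file system
        ["lim"], ["res"], ["sbatchd"],               # start the OpenLava daemons as root
        ["lsaddhost", "this-host"],                  # add this slave to the OpenLava cluster
    ]

for cmd in bootstrap_cmds():
    print(" ".join(cmd))
```

Ordering matters: the shared file system must be in place before the daemons start, and `lsaddhost` runs last so the node joins the cluster only once it can accept work.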
15. (15)
SkyForm ECP (Enterprise Container Platform) leverages Kubernetes to manage
container clusters
Challenge: How to share resources to ensure QoS among multiple tenants when
there is contention for resources (i.e., all resources all used/reserved)?
Solution: Leverage traditional HPC job scheduling algorithm for scheduling
containers
Using Kubernetes
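A minimal sketch of what "HPC scheduling for containers" means: order pending container requests like a batch queue, highest priority first and FIFO within a priority level. The tuple encoding below is an assumption for illustration, not SkyForm's implementation:

```python
import heapq

def pick_next(pending):
    # pending: list of (priority, submit_order, name); higher priority wins,
    # ties broken by earlier submission, exactly as in an HPC batch queue.
    heap = [(-prio, order, name) for prio, order, name in pending]
    heapq.heapify(heap)  # min-heap, so priority is negated
    return heap[0][2]
```

Under contention, a real scheduler would layer fairshare and preemption (next slide) on top of this basic ordering.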
16. Scheduling Policies and QoS
Policies: priority, fairshare/round robin, preemption, ownership (private cloud)
• Higher-QoS users get: high priority, more shares, the ability to preempt lower-priority users' containers, more physical resource ownership
• Lower-QoS users get: low priority, fewer shares, containers that can be preempted, less or zero physical resource ownership
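Fairshare, for instance, can be sketched as a dynamic priority that rises with a tenant's share allocation and falls as its running containers consume it. The formula is a textbook simplification (no decay or history), not the product's algorithm:

```python
def dynamic_priority(shares, running_containers, base=1000.0):
    # More shares raise priority; more running containers lower it, so
    # under contention the under-served tenant gets scheduled next.
    return base * shares / (1 + running_containers)

# A 2-share tenant with 3 running containers vs. a 1-share idle tenant:
tenants = {"gold": dynamic_priority(2, 3), "bronze": dynamic_priority(1, 0)}
```

Here the idle "bronze" tenant outranks the busy "gold" one, which is the fairshare effect the slide describes.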
17. SkyForm Future Development
Workload assurance for IoT, big data, HPC, and ML; infrastructure optimization; multi-tenant sharing; integrated Docker & VM management – an Algorithmic IT (AI) enabled VM/Docker cluster manager.
[Chart: infrastructure utilization over time, rising from low (siloed) through multi-tenant fairshare toward optimal]