SAMSUNG OPEN SOURCE CONFERENCE 2019
SOSCON
Method of NUMA-Aware Resource Management for
Kubernetes 5G NFV Cluster
Samsung Electronics | Samsung Research | Byonggon Chun
10.16, 2019
Bio
Byonggon Chun
Projects and details
• Tizen (2015~2016): Tizen Web-Device API development
• IoTivity (2016~2017): IoTivity development based on the OCF 1.0 spec (Endpoint, Smarthome, etc.)
• Edge Computing (2017~2018): Factory Edge Computing PoC (based on EdgeX, DDS); FaaS-based Home Edge Computing PoC (based on Greengrass Core, OSS FaaS, etc.)
• 5G MEC (2018~2019): 5G MEC PoC (based on LF Akraino, OpenStack-Helm, ETSI MEC Standard)
• Container-based NFV Infra (2019~): NUMA-aware Resource Manager for CNF (PoC) (CPU, Memory, Hugepages); open source contribution (Kubernetes, Docker, containerd)
Agenda
01 Background
02 Deep dive into Kubernetes at the node level
03 How Kubernetes supports NUMA
04 Kubernetes Contribution
Background
Major benefits of Network Function Containerization
• Faster startup (quick to deploy)
• Lower performance overhead (no overhead from a guest kernel)
[Figure: Startup benchmark with generic kernel. Startup time in ms for VM, Docker, and Native, split into guest OS and app.]
[Figure: Cyclictest benchmark with generic kernel. Latency in µs for VM, Docker, and Native.]
Cyclictest benchmark source: Minimizing Latency of Real-Time Container Cloud for Software Radio Access Networks, IEEE CloudCom, 2015
Virtual Machine vs Container
Difference between VM and Container
VM stack: Host Kernel -> Hypervisor (KVM/QEMU) -> Guest Kernel -> Bins/Libs -> User Apps
Container stack: Host Kernel -> Container Runtime (namespaces/cgroups)
• Q. So, is a container a new kind of virtual machine without kernel emulation?
• A. No. To see the difference, you need to know about “Linux namespaces” and “Linux control groups”.
What is Container?
• A container is a lightweight mechanism that provides an isolated environment.
• Processes are “isolated by Linux namespaces”.
• Resource usage is “restricted by Linux cgroups”.
• So most containers share the host kernel.
• Some containers run on an isolated kernel, similar to a virtual machine (kata-runtime, gVisor, etc.).
[Figure: The fundamental concept of container. Processes in the host space and processes in the container's isolated space run on the same host kernel.]
Deep dive into Kubernetes at the node level
The structure of Kubernetes is straightforward.
• Kubernetes consists of master components (API server, scheduler, etc.) and node components (kubelet, container runtime, mandatory services).
[Figure: Overall architecture of Kubernetes. Master: API Server, etcd, Scheduler, Controller Manager. Each Node: Kubelet, Proxy, and Pods.]
Let’s tear down the Pod.
• A Pod is usually known as the basic execution unit, or the smallest deployable unit, in Kubernetes.
• Let’s look at the Pod from the point of view of namespaces and cgroups.
The concept of Pod
Pod: Infra Container, Container foo, Container goo, Container hoo
The infra container provides the network and IPC namespaces for the other containers.
pod-cgroup
├── container-foo-cgroup
│   └── control-files
├── container-goo-cgroup
│   └── control-files
├── container-hoo-cgroup
│   └── control-files
├── infra-container-cgroup
│   └── control-files
└── control-files
Resource limits are set on the cgroups.
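To make "resource limits are set on the cgroups" concrete, here is a minimal sketch of how a CPU request and limit might be translated into cgroup v1 control-file values (`cpu.shares` and `cpu.cfs_quota_us`). The constants (1024 shares per CPU, a 100 ms CFS period, and the minimum values) are assumptions chosen to resemble common kubelet defaults; the function names are hypothetical and this is not the kubelet code itself.

```python
# Assumed constants, modelled on common cgroup v1 defaults.
SHARES_PER_CPU = 1024      # cpu.shares granted per full CPU
MIN_SHARES = 2             # smallest cpu.shares value accepted by the kernel
CFS_PERIOD_US = 100_000    # cpu.cfs_period_us (100 ms)
MIN_QUOTA_US = 1_000       # smallest quota used in this sketch
MILLI_PER_CPU = 1000       # Kubernetes expresses CPU in millicores


def milli_cpu_to_shares(milli_cpu: int) -> int:
    """Convert a CPU request in millicores to a cpu.shares value."""
    if milli_cpu == 0:
        return MIN_SHARES
    return max(MIN_SHARES, milli_cpu * SHARES_PER_CPU // MILLI_PER_CPU)


def milli_cpu_to_quota(milli_cpu: int, period_us: int = CFS_PERIOD_US) -> int:
    """Convert a CPU limit in millicores to a cpu.cfs_quota_us value.

    Returns 0 when no limit is set, meaning "do not write a quota".
    """
    if milli_cpu == 0:
        return 0
    return max(MIN_QUOTA_US, milli_cpu * period_us // MILLI_PER_CPU)


def pod_level_values(container_milli_cpus):
    """Pod-level cgroup values as the sum over the pod's containers."""
    total = sum(container_milli_cpus)
    return milli_cpu_to_shares(total), milli_cpu_to_quota(total)


if __name__ == "__main__":
    # Three containers (foo, goo, hoo) requesting 2, 1, and 0.25 CPUs.
    print(pod_level_values([2000, 1000, 250]))
```

In this sketch the pod-level cgroup gets the sum of its containers' values, and each container-level cgroup gets its own share and quota beneath it.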
Let’s talk about kubelet and container runtime.
• Kubelet communicates with container runtimes over CRI (Container Runtime Interface).
• CRI was developed to keep kubelet and container runtimes loosely coupled.
(But kubelet still communicates with Docker over dockershim, which is part of kubelet.)
• CRI offers a set of gRPC APIs and protobuf messages for pod/container lifecycle management.
(The CRI runtime runs the CRI runtime service as a gRPC server; kubelet is the client.)
[Figure: The concept of CRI. Kubelet calls the CRI runtime (Docker, containerd, CRI-O, …) over gRPC/CRI: RunPodSandbox, CreateContainer, StartContainer, ...]
source: Kubernetes Blog, Introducing Container Runtime Interface (CRI) in Kubernetes
What is OCI and OCI compliant runtime?
• OCI (Open Container Initiative) offers “image-spec” and “runtime-spec” as open industry standards.
• Image-spec specifies the image format; an image is extracted into an “OCI Runtime bundle”, which is a set of files.
• Runtime-spec defines the runtime bundle and the configuration and lifecycle of a container.
• An OCI compliant runtime is a runtime that can run an “OCI Runtime bundle”.
(opencontainers/runc is the best-known OCI runtime and the reference implementation.)
[Figure: The concept of OCI image spec and runtime spec. OCI Image -> (extraction) -> OCI Runtime bundle -> (execution) -> OCI Container]
source: OCI Runtime Specification
Now we can draw a clear picture with CRI and OCI runtime.
Lifecycle related CRI APIs and OCI Runtime operations
Kubelet -> (CRI) -> CRI Runtime -> (OCI) -> OCI Runtime -> OCI Container
CRI APIs: RunPodSandbox, CreateContainer, UpdateContainerResources, StartContainer, StopContainer, RemoveContainer, StopPodSandbox, RemovePodSandbox, …
OCI Runtime operations: State, Create, Start, Kill, Delete
source: Container Runtime Interface, OCI Runtime Specification
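The OCI Runtime operations listed on this slide can be read as a small state machine. The sketch below is a simplified illustration, assuming the container statuses named in the OCI Runtime Specification (created, running, stopped) and treating Kill as always stopping the container. The table and function names are hypothetical and no real runtime is invoked.

```python
# (operation, current status) -> next status. None means "does not exist".
# Simplification: a real Kill only sends a signal; here it always stops.
TRANSITIONS = {
    ("create", None): "created",
    ("start", "created"): "running",
    ("kill", "created"): "stopped",
    ("kill", "running"): "stopped",
    ("delete", "stopped"): None,
}


def apply(status, operation):
    """Return the status after applying one operation, or raise if invalid."""
    key = (operation, status)
    if key not in TRANSITIONS:
        raise ValueError(f"cannot {operation} a container in state {status!r}")
    return TRANSITIONS[key]


def run_lifecycle(operations):
    """Apply operations in order and record the status after each one."""
    status = None
    seen = []
    for operation in operations:
        status = apply(status, operation)
        seen.append(status)
    return seen


if __name__ == "__main__":
    print(run_lifecycle(["create", "start", "kill", "delete"]))
```

State is a read-only query in this model, so it does not appear in the transition table.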
Now we can draw a clear picture with CRI and OCI runtime.
List of CRI and OCI Runtimes
Kubelet -> (CRI) -> CRI Runtime -> (OCI) -> OCI Runtime -> OCI Container
CRI runtimes: docker, containerd, CRI-O, rkt, frakti, singularity-cri, …
OCI runtimes: runc, crun, gVisor, kata-runtime, nabla, firecracker-runtime, singularity, …
But in the real world, there is a “shim”.
3 ways to runc
• Kubelet/dockershim -> (CRI) -> dockerd/containerd -> (OCI) -> runc
• Kubelet -> (CRI) -> CRI-plugin/containerd -> (OCI) -> runc
• Kubelet -> (CRI) -> CRI-O -> (OCI) -> runc
In the figure, a shim (containerd-shim-v2 for containerd) sits between the CRI runtime and runc.
But in the real world, there is a “shim”.
2 ways to Kata-runtime
• Kubelet -> (CRI) -> CRI-plugin/containerd -> containerd-shim-v2 -> (OCI) -> Kata-runtime
• Kubelet -> (CRI) -> CRI-O -> (OCI) -> Kata-runtime
Do we have to know all of this for resource management?
Sequence of pod and container creation
Kubelet -> (CRI) -> CRI Runtime -> (OCI) -> OCI Runtime -> OCI Container
1. Create a Pod-level cgroup, then set the resource restrictions for the pod.
2. Create a container-level cgroup, then set the resource restrictions for the container. Lastly, run the container in its dedicated cgroup.
• You need to know how resources are managed at the low level
(to use the Node Allocatable feature and resource managers in Kubernetes such as CPU Manager).
• You need to know low-level resource management to run hardware-accelerated applications such as DPDK.
• In the case of Kata Containers with KVM/QEMU, the way resources are managed is a little different.
How Kubernetes supports NUMA
What is NUMA?
• NUMA (Non-Uniform Memory Access) is a modern architecture for multiprocessor systems.
• Each socket (NUMA node) has its own CPU, memory, and PCI devices.
(Typically, one socket equals one NUMA node.)
• A processor can access remote memory and I/O devices on other sockets.
(But remote access to resources degrades performance.)
[Figure: Typical 2-socket configuration of Intel Xeon. Two CPUs connected by UPI, each with 6 x DDR4 and 3 x PCIe (x16).]
When is NUMA-aware resource allocation required?
• NUMA-aware resource allocation should be used for the following applications:
• Latency-sensitive applications such as real-time AR/VR and game streaming.
• Hardware-acceleration-based applications such as DPDK and CUDA.
Memory access latency from socket 0 (Intel Xeon E7-4800), latency in ns
Socket 0: 136
Socket 1: 194
DPDK l2fwd throughput with 10 Gbps NIC (Intel Xeon Scalable Gold 6148), throughput in Gbps
aligned: 9.9
misaligned: 7.9
Latency test source: Memory Latencies on Intel® Xeon® Processor E5-4600 and E7-4800 product families, Intel
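As a quick worked calculation, the figures on this slide can be turned into relative penalties. The sketch assumes that 136 ns is the local (socket 0) latency and 194 ns the remote (socket 1) latency, and that 9.9 Gbps and 7.9 Gbps are the aligned and misaligned throughputs; the helper name is hypothetical.

```python
def relative_change(baseline: float, value: float) -> float:
    """Fractional change of value relative to baseline."""
    return (value - baseline) / baseline


# Memory access latency from socket 0, in ns (values from the slide).
remote_latency_penalty = relative_change(136, 194)

# DPDK l2fwd throughput in Gbps, aligned vs misaligned (values from the slide).
misaligned_throughput_change = relative_change(9.9, 7.9)

if __name__ == "__main__":
    print(f"remote memory latency: {remote_latency_penalty:+.1%}")
    print(f"misaligned throughput: {misaligned_throughput_change:+.1%}")
```

Under these assumptions, remote access costs roughly 43% more latency and misalignment loses roughly 20% of the throughput.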
CPU Pinning in Kubernetes
• CPU pinning gives a process or thread exclusive use of CPUs.
• CPU Manager in Kubernetes is responsible for allocating logical CPUs (SMT threads) to containers.
(CPU Manager attempts to allocate sibling threads to containers when siblings are available.)
• CPU Manager allocates exclusive CPUs using the cpuset cgroup controller.
(It is possible to adjust a container’s CPU affinity at the thread level with “sched_setaffinity”.)
• An alternative (Intel CMK) is also available.
(Both solutions and NTM, the Node Topology Manager, are contributed by Intel.)
Comparison between CPU Manager and Intel CMK
Solution | Part of Kubelet | Approach | Allowed CPUSET | NUMA Support | Node Allocatable Feature | Node Topology Manager
CPU Manager | Yes | cgroup (CPUSET) | Allocated CPUs only | CPU, I/O devices, etc. over NTM | Supported | Supported
Intel CMK | No (Plugin) | sched_setaffinity subprocess | Entire CPUs on machine | CPU only | Not supported | Not supported
How it Works: CPU Manager
Yaml example for CPU pinning
apiVersion: v1
kind: Pod
metadata:
  name: dpdk-sample
spec:
  containers:
  - image: dpdk-sample
    name: simple-l2fwd
    resources:
      requests:
        cpu: "4"
        memory: "1Gi"
        hugepages-1Gi: "2Gi"
      limits:
        cpu: "4"
        memory: "1Gi"
        hugepages-1Gi: "2Gi"
Allocated CPUs for container
cat <container-cgroup>/cpuset.cpus
1-2,41-42
CPU Layout (2 sockets, SMT enabled)
         Socket 0   Socket 1
         --------   --------
Core 0   [0, 40]    [20, 60]
Core 1   [1, 41]    [21, 61]
Core 2   [2, 42]    [22, 62]
Core 3   [3, 43]    [23, 63]
Core 4   [4, 44]    [24, 64]
Core 5   [5, 45]    [25, 65]
Core 6   [6, 46]    [26, 66]
Core 7   [7, 47]    [27, 67]
Core 8   [8, 48]    [28, 68]
…
“sched_setaffinity” usage in DPDK (pin pThread3 to lCore42)
Process A: pThread0, pThread1, pThread2, pThread3 on lCore1, lCore2, lCore41, lCore42
Thread_creation
├─pthread_create
├─pthread_setname_np
└─pthread_setaffinity_np
   └─ sched_setaffinity(tid, cpuset)
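The allocation shown above can be checked with a short sketch that parses the `cpuset.cpus` list format and compares it against the CPU layout. It assumes the layout continues as the table suggests (20 cores per socket, 40 physical cores, SMT siblings numbered c and c+40); the rows beyond Core 8 are not shown on the slide, so that numbering is an assumption and all helper names are hypothetical.

```python
def parse_cpuset(text: str) -> list:
    """Parse a cpuset list such as '1-2,41-42' into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


# Assumed topology: 2 sockets x 20 cores, sibling threads are c and c + 40.
PHYSICAL_CORES = 40
CORES_PER_SOCKET = 20


def sibling(cpu: int) -> int:
    """Return the SMT sibling of a logical CPU under the assumed layout."""
    return cpu + PHYSICAL_CORES if cpu < PHYSICAL_CORES else cpu - PHYSICAL_CORES


def socket_of(cpu: int) -> int:
    """Return the socket (NUMA node) of a logical CPU under the assumed layout."""
    return (cpu % PHYSICAL_CORES) // CORES_PER_SOCKET


allocated = parse_cpuset("1-2,41-42")
whole_cores = all(sibling(cpu) in allocated for cpu in allocated)
sockets = {socket_of(cpu) for cpu in allocated}

if __name__ == "__main__":
    print(allocated, whole_cores, sockets)
```

On Linux, a thread could then pin itself to one of these CPUs with `os.sched_setaffinity(0, {42})`, which wraps the same system call DPDK reaches through `pthread_setaffinity_np`.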
Resource Manager and Plugins in Kubernetes
• Device Manager
(Component of kubelet; advertises and allocates extended resources.)
• Device Plugins
(nvidia-gpu-plugin, amd-gpu-plugin, gpu-sharing-plugin, sr-iov-plugin, rdma-device-plugin, etc.)
Sequence of extended resource allocation in Kubernetes
Device Plugin -> Device Manager: Register
Device Manager -> Device Plugin: ListAndWatch
Device Manager -> Device Plugin: Allocate
Normally, plugins are gRPC servers and are containerized.
Kubelet: PodAdmitHandler
The concept of Topology Manager
• Topology Manager provides NUMA-aware resource allocation for containers at the node level.
• Topology Manager retrieves Topology Hints from Hint Providers.
• Topology Manager calculates the NUMA node affinity, then decides whether to admit the pod according to the given policy.
(Pod admission is rejected if the chosen policy cannot be satisfied.)
Sequence of Pod admission with Topology Manager
1. Kubelet -> Topology Manager: Admit()
2. Topology Manager -> HintProviders (CPU Manager, Device Manager, …): GetTopologyHints()
3. HintProviders -> Topology Manager: Hints
4. Topology Manager -> Kubelet: admit or reject Pod
5. Kubelet: SyncPod()
What is Topology Hint and Topology Policy?
• A Topology Hint is a data structure that represents the NUMA nodes of allocatable resources as bits.
• Topology Manager collects hints, then merges them to find the best hint.
(Policies share the same merging algorithm in 1.16; each policy will have its own in a future release.)
• Each policy has its own pod admission criteria.
Topology Policies
Policy | Description
none | Does nothing; Topology Manager is not active.
best-effort | Calculates the best hint, then uses it whatever it is.
restricted* | Rejects pod admission if the best hint is not a preferred hint.
single-numa* | Rejects pod admission if the best hint does not fit in a single NUMA node.
Topology Hint Structure
// TopologyHint is a struct containing the NUMANodeAffinity for a Container
type TopologyHint struct {
    NUMANodeAffinity bitmask.BitMask
    // Preferred is set to true when the NUMANodeAffinity encodes a preferred
    // allocation for the Container. It is set to false otherwise.
    Preferred bool
}
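The merge-then-admit idea can be sketched as follows. This is a simplified illustration, not the kubelet implementation: it assumes that merging means a bitwise AND over one hint per provider, that a merged hint is preferred only if every contributing hint is preferred, and that the best hint is a preferred one with the fewest NUMA nodes. The `admit` rule for single-numa additionally requires a preferred hint, which is an assumption of this sketch. The Python names mirror the Go struct but are hypothetical.

```python
from itertools import product
from typing import NamedTuple


class TopologyHint(NamedTuple):
    numa_affinity: int  # bitmask: bit i set means NUMA node i is usable
    preferred: bool


def popcount(mask: int) -> int:
    return bin(mask).count("1")


def is_better(candidate: TopologyHint, best: TopologyHint) -> bool:
    """Prefer preferred hints, then hints spanning fewer NUMA nodes."""
    if candidate.preferred != best.preferred:
        return candidate.preferred
    return popcount(candidate.numa_affinity) < popcount(best.numa_affinity)


def merge_hints(providers_hints, num_numa_nodes: int) -> TopologyHint:
    """Merge one hint from every provider and return the best combination."""
    all_nodes = (1 << num_numa_nodes) - 1
    best = TopologyHint(all_nodes, False)  # fallback: any node, not preferred
    for combination in product(*providers_hints):
        mask, preferred = all_nodes, True
        for hint in combination:
            mask &= hint.numa_affinity
            preferred = preferred and hint.preferred
        if mask == 0:
            continue  # the providers cannot agree on any NUMA node
        candidate = TopologyHint(mask, preferred)
        if is_better(candidate, best):
            best = candidate
    return best


def admit(policy: str, hint: TopologyHint) -> bool:
    """Admission decision per policy, following the table on the slide."""
    if policy in ("none", "best-effort"):
        return True
    if policy == "restricted":
        return hint.preferred
    if policy == "single-numa":
        return hint.preferred and popcount(hint.numa_affinity) == 1
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    cpu_hints = [TopologyHint(0b01, True), TopologyHint(0b10, True),
                 TopologyHint(0b11, False)]
    gpu_hints = [TopologyHint(0b01, True)]
    best = merge_hints([cpu_hints, gpu_hints], num_numa_nodes=2)
    print(best, admit("single-numa", best))
```

In the usage example, the CPU provider can serve the request from either node while the GPU provider only offers node 0, so the merged hint narrows to node 0.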
How it Works: Topology Manager (w/ single-numa policy)
Yaml example for NTM
apiVersion: v1
kind: Pod
metadata:
  name: ntm-sample
spec:
  containers:
  - image: simple-sample
    name: simple-sample
    resources:
      requests:
        cpu: "4"
        memory: "1Gi"
        nvidia.com/gpu: 1
      limits:
        cpu: "4"
        memory: "1Gi"
        nvidia.com/gpu: 1
PodAdmit Test Cases
Test Case | Resource availability at scheduler level | Available resources on Socket 0 | Available resources on Socket 1 | Expected result
Positive Case 1 | CPU: 20, GPU: 4 | CPU: 10, GPU: 2 | CPU: 10, GPU: 2 | Socket0, Socket1
Positive Case 2 | CPU: 20, GPU: 2 | CPU: 10, GPU: 2 | CPU: 10, GPU: 0 | Socket0
Positive Case 3 | CPU: 7, GPU: 3 | CPU: 3, GPU: 2 | CPU: 4, GPU: 1 | Socket1
Negative Case 1 | CPU: 13, GPU: 2 | CPU: 3, GPU: 2 | CPU: 10, GPU: 0 | Admission rejected
Negative Case 2 | CPU: 6, GPU: 4 | CPU: 3, GPU: 2 | CPU: 3, GPU: 2 | Admission rejected
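The test cases above can be reproduced with a small sketch of the single-numa check: a pod requesting 4 CPUs and 1 GPU is admitted only if at least one socket can satisfy every resource on its own, even when the node-wide totals look sufficient to the scheduler. The function name and the dictionary representation are hypothetical and only illustrate the table.

```python
def sockets_that_fit(request: dict, sockets: list) -> list:
    """Return the indices of sockets that can satisfy the whole request alone."""
    return [
        index
        for index, available in enumerate(sockets)
        if all(available.get(name, 0) >= amount for name, amount in request.items())
    ]


REQUEST = {"cpu": 4, "gpu": 1}  # from the Yaml example

# Per-socket availability (Socket 0, Socket 1) for each test case in the table.
CASES = {
    "Positive Case 1": [{"cpu": 10, "gpu": 2}, {"cpu": 10, "gpu": 2}],
    "Positive Case 2": [{"cpu": 10, "gpu": 2}, {"cpu": 10, "gpu": 0}],
    "Positive Case 3": [{"cpu": 3, "gpu": 2}, {"cpu": 4, "gpu": 1}],
    "Negative Case 1": [{"cpu": 3, "gpu": 2}, {"cpu": 10, "gpu": 0}],
    "Negative Case 2": [{"cpu": 3, "gpu": 2}, {"cpu": 3, "gpu": 2}],
}

if __name__ == "__main__":
    for name, sockets in CASES.items():
        fit = sockets_that_fit(REQUEST, sockets)
        result = ", ".join(f"Socket{i}" for i in fit) or "Admission rejected"
        print(f"{name}: {result}")
```

Negative Case 1 shows why the node-level total is not enough: 13 CPUs and 2 GPUs are available in total, but no single socket holds both 4 CPUs and 1 GPU.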
Issues (in 1.16)
Issue | Description
Kubernetes/Issues/#83476* | Unreliable Topology Hint generation when multiple containers in the same pod require alignment.
Kubernetes/PR/#83697 | Topology Manager would not admit a pod with the single-numa policy when any of the hint providers had no NUMA preference. (Merged)
Kubernetes/PR/#83492 | Topology Manager supports only the Guaranteed QoS class. (Merged)
Kubernetes/Issue/#83483 | Support for “inter-device” topology constraints (i.e. GPU-direct, NVLink, RDMA).
Kubernetes/Issues/#83478 | Same affinity calculation algorithm for various policies. (Refactoring has already started.)
TBD | Alignment is limited to the container level; Topology Manager doesn't support Pod-level alignment.
Helpful Links
Title | Link
Cgroup | https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt
CPU Manager KEP | https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/cpu-manager.md
Device Manager KEP | https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/device-plugin.md
Topology Manager KEP | https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/0035-20190130-topology-manager.md
CPU Manager Guide | https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/
Topology Manager Guide | https://kubernetes.io/docs/tasks/administer-cluster/topology-manager/
Kubelet (Container Manager) | https://github.com/kubernetes/kubernetes/tree/master/pkg/kubelet/cm
Kubernetes Contribution
Special Interest Groups (SIGs) are open to new contributors
source: https://github.com/kubernetes/community
Hugepages Enhancement
But… what are hugepages?
• Hugepages are literally pages of a huge size; a typical Linux machine supports two hugepage sizes (2MB, 1GB).
(The default page size is 4KB.)
• Hugepages reduce TLB misses, which reduces memory access latency.
(Hugepages also allow better utilization of the hardware cache by reducing the number of page table entries.)
• DPDK and databases are well-known consumers of hugepages.
(DPDK is the Data Plane Development Kit for packet processing.)
• Kubernetes supports consuming pre-allocated hugepages, but it does not support NUMA awareness or container isolation of hugepages.
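A short calculation shows why larger pages reduce TLB pressure. The sketch counts how many pages, and therefore how many page table entries, are needed to map a 2Gi region (the hugepages request used in the earlier Pod example) at each page size named above. The helper name is hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def pages_needed(size_bytes: int, page_size: int) -> int:
    """Number of pages required to map size_bytes, rounded up."""
    return -(-size_bytes // page_size)


REGION = 2 * GIB  # matches hugepages-1Gi: "2Gi" in the Pod example

if __name__ == "__main__":
    for label, page_size in (("4KB", 4 * KIB), ("2MB", 2 * MIB), ("1GB", GIB)):
        print(f"{label} pages: {pages_needed(REGION, page_size)}")
```

The same 2Gi region needs 524288 default pages, 1024 pages of 2MB, or only 2 pages of 1GB, so far fewer translations compete for TLB entries.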
Hugepages Enhancement
What is the goal of the hugepages enhancement?
• Support container isolation of hugepages.
• Support multiple hugepage sizes at host and container level.
• Support NUMA-aware hugepages management.
THANK YOU

TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 

Recently uploaded (20)

Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 

Method of NUMA-Aware Resource Management for Kubernetes 5G NFV Cluster

  • 1. SAMSUNG OPEN SOURCE CONFERENCE 2019 SOSCON Method of NUMA-Aware Resource Management for Kubernetes 5G NFV Cluster Samsung Electronics | Samsung Research | Byonggon Chun 10.16, 2019
  • 2. SOSCON2019 BIO: Byonggon Chun. Projects and details:
      • Tizen (2015~2016): Tizen Web-Device API development
      • IoTivity (2016~2017): IoTivity development based on the OCF 1.0 spec (Endpoint, Smarthome, etc.)
      • Edge Computing (2017~2018): Factory Edge Computing PoC (based on EdgeX, DDS); FaaS-based Home Edge Computing PoC (based on Greengrass Core, OSS FaaS, etc.)
      • 5G MEC (2018~2019): 5G MEC PoC (based on LF Akraino, OpenStack-Helm, ETSI MEC Standard)
      • Container-based NFV Infra (2019~): NUMA-aware Resource Manager for CNF (PoC) (CPU, Memory, Hugepages); open source contribution (Kubernetes, Docker, Containerd)
  • 3. SOSCON2019 Agenda: 01 Background; 02 Deep dive into Kubernetes at the node level; 03 How Kubernetes supports NUMA; 04 Kubernetes Contribution
  • 4. SOSCON2019 Background: Major benefits of Network Function Containerization
      • Faster startup (quick to deploy)
      • Lower performance overhead (no overhead from a guest kernel)
      [Figure: Startup benchmark with generic kernel; startup time in ms, split into App and Guest OS, for VM, Docker, and Native.]
      [Figure: Cyclictest benchmark with generic kernel; latency in µs for VM, Docker, and Native.]
      Cyclictest benchmark source: Minimizing Latency of Real-Time Container Cloud for Software Radio Access Networks, IEEE CloudCom, 2015
  • 5. SOSCON2019 Background: Virtual Machine vs Container
      [Figure: difference between VM and container. VM column: Host Kernel, Hypervisor (KVM/QEMU), Guest Kernel, Bins/Libs, User Apps. Container column: Host Kernel, Container Runtime (namespaces/cgroups).]
      • Q. So, is a container a new kind of virtual machine without kernel emulation?
      • A. No. To see why, you need to know about “Linux namespaces” and “Linux control groups”.
  • 6. SOSCON2019 Background: What is a Container?
      • A container is a lightweight mechanism for providing an isolated environment.
      • Processes are “isolated by Linux namespaces”.
      • Resource usage is “restricted by Linux cgroups”.
      • As a result, most containers share the host kernel.
      • Some containers run on an isolated kernel, similar to a virtual machine (kata-runtime, gVisor, etc.).
      [Figure: the fundamental concept of a container. On top of the host kernel, processes run either in the host space or in an isolated space (the container).]
  • 7. SOSCON2019 Deep dive into Kubernetes at the node level: The structure of Kubernetes is straightforward.
      • Kubernetes consists of master components (APIs, scheduler, etc.) and node components (kubelet, container runtime, mandatory services).
      [Figure: overall architecture of Kubernetes. Master: API Server, etcd, Scheduler, Controller Manager. Each Node: Kubelet, Proxy, and several Pods.]
  • 8. SOSCON2019 Deep dive into Kubernetes at the node level: Let's break down the Pod.
      • A Pod is usually described as the basic execution unit, or smallest deployable unit, in Kubernetes.
      • Let's look at the Pod from the point of view of namespaces and cgroups.
      [Figure: the concept of a Pod. A Pod holds an infra container plus containers foo, goo, and hoo; the infra container provides the network and IPC namespaces for the other containers.]
      Resource limits are set on the cgroups:
        pod-cgroup
        ├── container-foo-cgroup
        │   └── control-files
        ├── container-goo-cgroup
        │   └── control-files
        ├── container-hoo-cgroup
        │   └── control-files
        ├── infra-container-cgroup
        │   └── control-files
        └── control-files
  • 9. SOSCON2019 Deep dive into Kubernetes at the node level: Let's talk about the kubelet and the container runtime.
      • The kubelet communicates with container runtimes over CRI (Container Runtime Interface).
      • CRI was developed to keep the kubelet and container runtimes loosely coupled. (The kubelet still communicates with Docker through dockershim, which is part of the kubelet.)
      • CRI offers a set of gRPC APIs and protobuf messages for pod/container lifecycle management. (The CRI runtime runs the CRI runtime service server; the kubelet is the client.)
      [Figure: the concept of CRI. Kubelet → gRPC/CRI (RunPodSandbox, CreateContainer, StartContainer, ...) → CRI runtime (Docker, containerd, CRI-O, …).]
      Source: Kubernetes Blog, Introducing Container Runtime Interface (CRI) in Kubernetes
  • 10. SOSCON2019 Deep dive into Kubernetes at the node level: What are OCI and an OCI-compliant runtime?
      • OCI (Open Container Initiative) offers the “image-spec” and the “runtime-spec” as open industry standards.
      • The image-spec specifies the image format that is unpacked into an “OCI Runtime bundle”, which is a set of files.
      • The runtime-spec defines the concept of the runtime bundle and the configuration and lifecycle of a container.
      • An OCI-compliant runtime is a runtime that can run an “OCI Runtime bundle”. (opencontainers/runc is known as the iconic OCI runtime and the reference implementation.)
      [Figure: the concept of the OCI image spec and runtime spec. OCI Image → (extraction) → OCI Runtime bundle → (execution) → OCI Container.]
      Source: OCI Runtime Specification
  • 11. SOSCON2019 Deep dive into Kubernetes at the node level: Now we can draw a clear picture with CRI and the OCI runtime.
      [Figure: lifecycle-related CRI APIs and OCI runtime events. Kubelet → (CRI) → CRI runtime → (OCI) → OCI runtime → OCI container.]
      • CRI APIs: RunPodSandbox, CreateContainer, UpdateContainerResources, StartContainer, StopContainer, RemoveContainer, StopPodSandbox, RemovePodSandbox, …
      • OCI runtime operations: State, Create, Start, Kill, Delete
      Source: Container Runtime Interface, OCI Runtime Specification
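The five OCI runtime operations on the slide above can be read as a small state machine. The following is a minimal sketch, not code from the talk: `ToyOCIRuntime` is a hypothetical class, the status names (`created`, `running`, `stopped`) follow the OCI runtime specification, and `kill` is simplified so that the container stops immediately instead of waiting for the signalled process to exit.

```python
class ToyOCIRuntime:
    """Toy in-memory model of the OCI runtime operations (hypothetical, for illustration)."""

    def __init__(self):
        self._status = {}  # container id -> status string

    def state(self, cid):
        """State: report the current status of a container."""
        return self._status[cid]

    def create(self, cid):
        """Create: set up the container without running the user process."""
        if cid in self._status:
            raise RuntimeError(f"container {cid} already exists")
        self._status[cid] = "created"

    def start(self, cid):
        """Start: only valid for a container that is 'created'."""
        self._expect(cid, {"created"})
        self._status[cid] = "running"

    def kill(self, cid):
        """Kill: valid for 'created' or 'running'; simplified to stop at once."""
        self._expect(cid, {"created", "running"})
        self._status[cid] = "stopped"

    def delete(self, cid):
        """Delete: only valid for a container that is 'stopped'."""
        self._expect(cid, {"stopped"})
        del self._status[cid]

    def _expect(self, cid, allowed):
        status = self._status.get(cid)
        if status not in allowed:
            raise RuntimeError(f"container {cid} is {status!r}, expected one of {sorted(allowed)}")


runtime = ToyOCIRuntime()
runtime.create("demo")
runtime.start("demo")
print(runtime.state("demo"))
```

A CRI runtime would issue roughly this sequence per container in response to CreateContainer, StartContainer, StopContainer, and RemoveContainer, although the exact mapping depends on the runtime.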
  • 12. SOSCON2019 Deep dive into Kubernetes at the node level: Now we can draw a clear picture with CRI and the OCI runtime.
      [Figure: list of CRI and OCI runtimes. Kubelet → (CRI) → CRI runtime → (OCI) → OCI runtime → OCI container.]
      • CRI runtimes: docker, containerd, CRI-O, rkt, frakti, singularity-cri, …
      • OCI runtimes: runc, crun, gVisor, kata-runtime, nabla, firecracker-runtime, singularity, …
  • 13. SOSCON2019 Deep dive into Kubernetes at the node level: But in the real world, there is a “shim”.
      [Figure: 3 ways to runc. Each path crosses a CRI boundary and an OCI boundary.]
      • Kubelet/dockershim → dockerd/containerd → containerd-shim-v2 (shim) → runc
      • Kubelet → CRI-plugin/containerd → containerd-shim-v2 (shim) → runc
      • Kubelet → CRI-O → runc
  • 14. SOSCON2019 Deep dive into Kubernetes at the node level: But in the real world, there is a “shim”.
      [Figure: 2 ways to Kata-runtime. Each path crosses a CRI boundary and an OCI boundary.]
      • Kubelet → CRI-plugin/containerd → containerd-shim-v2 (shim) → Kata-runtime
      • Kubelet → CRI-O → containerd-shim-v2 (shim) → Kata-runtime
  • 15. SOSCON2019 Deep dive into Kubernetes at the node level: Do we have to know all of this for resource management?
      [Figure: sequence of pod and container creation. Kubelet → (CRI) → CRI runtime → (OCI) → OCI runtime → OCI container.]
      Sequence: create a pod-level cgroup, then set the resource restrictions for the pod. Create a container-level cgroup, then set the resource restrictions for the container. Lastly, run the container in its dedicated cgroup.
      • You need to know how resources are managed at the low level (to use the Node Allocatable feature and resource managers in Kubernetes such as the CPU Manager).
      • You need this knowledge to run hardware-accelerated applications such as DPDK with low-level resource management.
      • With kata-containers on KVM/QEMU, resources are managed in a slightly different way.
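To make "set the resource restrictions" on the slide above concrete, here is a minimal sketch of how a CPU request or limit could be turned into cgroup v1 control-file values (`cpu.shares` and `cpu.cfs_quota_us`). The constants (1024 shares per CPU, a 100 ms CFS period, and the minimum values) and the function names are assumptions of this sketch, modeled on the usual cgroup v1 conversion; they are not taken from the slides.

```python
CFS_PERIOD_US = 100_000   # assumed CFS period: 100 ms
SHARES_PER_CPU = 1024     # assumed cpu.shares value for one full CPU
MIN_SHARES = 2            # assumed lower bound for cpu.shares
MIN_QUOTA_US = 1_000      # assumed lower bound for cpu.cfs_quota_us


def parse_cpu(quantity):
    """Convert a CPU quantity such as "4" or "500m" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def milli_cpu_to_shares(milli_cpu):
    """Map a CPU request in millicores to a cpu.shares value."""
    return max(MIN_SHARES, milli_cpu * SHARES_PER_CPU // 1000)


def milli_cpu_to_quota(milli_cpu, period_us=CFS_PERIOD_US):
    """Map a CPU limit in millicores to a cpu.cfs_quota_us value."""
    return max(MIN_QUOTA_US, milli_cpu * period_us // 1000)


request = parse_cpu("4")
print("cpu.shares =", milli_cpu_to_shares(request))
print("cpu.cfs_quota_us =", milli_cpu_to_quota(request))
```

Values like these would be written to the control files of the pod-level cgroup first and of the container-level cgroup second, following the sequence described on the slide.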
  • 16. SOSCON2019 How Kubernetes supports NUMA: What is NUMA?
      • NUMA (Non-Uniform Memory Access) is a modern architecture for multiprocessor systems.
      • Each socket (NUMA node) has its own CPU, memory, and PCI devices. (Typically, one socket equals one NUMA node.)
      • A processor can access remote memory and I/O devices on other sockets, but remote access comes with a performance penalty.
      [Figure: typical 2-socket configuration of Intel Xeon CPUs. Each CPU has 3 x PCIe (x16) and 6 x DDR4; the two CPUs are connected over UPI.]
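On Linux, the CPU-to-NUMA-node mapping described above can be inspected through sysfs, where each node directory exposes a `cpulist` file such as `0-19,40-59`. The sketch below parses that format; `parse_cpulist` and `numa_nodes` are hypothetical helper names, and the parser handles only plain numbers and ranges, not the less common stride notation.

```python
import glob
import os
import re


def parse_cpulist(text):
    """Expand a cpulist string such as "0-3,8" into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return {node id: [cpu ids]} read from sysfs; empty if the path is absent."""
    nodes = {}
    pattern = os.path.join(sysfs_root, "node[0-9]*", "cpulist")
    for path in sorted(glob.glob(pattern)):
        node_id = int(re.search(r"node(\d+)", path).group(1))
        with open(path) as handle:
            nodes[node_id] = parse_cpulist(handle.read())
    return nodes


if __name__ == "__main__":
    for node_id, cpus in numa_nodes().items():
        print(f"NUMA node {node_id}: {len(cpus)} CPUs")
```

On a machine without NUMA information in sysfs, `numa_nodes()` simply returns an empty dictionary.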
  • 17. SOSCON2019 How Kubernetes supports NUMA: When is NUMA-aware resource allocation required?
      • NUMA-aware resource allocation should be used for the following applications:
      • Latency-sensitive applications such as real-time AR/VR and game streaming.
      • Hardware-accelerated applications such as those built on DPDK and CUDA.
      [Figure: memory access latency from socket 0 (Intel Xeon E7-4800): Socket 0 136 ns, Socket 1 194 ns.]
      [Figure: DPDK l2fwd throughput with a 10 Gbps NIC (Intel Xeon Scalable Gold 6148): aligned 9.9 Gbps, misaligned 7.9 Gbps.]
      Latency test source: Memory Latencies on Intel® Xeon® Processor E5-4600 and E7-4800 product families, Intel
  • 18. SOSCON2019 How Kubernetes supports NUMA: CPU Pinning in Kubernetes
      • CPU pinning gives a process or thread exclusive use of CPUs.
      • The CPU Manager in Kubernetes is responsible for allocating logical threads (SMT) to containers. (It attempts to allocate sibling threads to a container when siblings are available.)
      • The CPU Manager allocates exclusive CPUs using the CPUSET cgroup controller. (A container's CPU affinity can still be adjusted at the thread level with “sched_setaffinity”.)
      • An alternative, Intel CMK, is also available. (Both solutions and NTM were contributed by Intel.)
      Comparison between CPU Manager and Intel CMK:
      • CPU Manager: part of kubelet: yes; approach: cgroup (CPUSET); allowed CPUSET: allocated CPUs only; NUMA support: CPU, I/O devices, etc. via NTM; Node Allocatable feature: supported; Node Topology Manager: supported.
      • Intel CMK: part of kubelet: no (plugin); approach: sched_setaffinity subprocess; allowed CPUSET: entire CPUs on the machine; NUMA support: CPU only; Node Allocatable feature: not supported; Node Topology Manager: not supported.
  • 19. SOSCON2019 How Kubernetes supports NUMA: How it works: CPU Manager
      YAML example for CPU pinning:
        apiVersion: v1
        kind: Pod
        metadata:
          name: dpdk-sample
        spec:
          containers:
          - image: dpdk-sample
            name: simple-l2fwd
            resources:
              requests:
                cpu: "4"
                memory: "1Gi"
                hugepages-1Gi: "2Gi"
              limits:
                cpu: "4"
                memory: "1Gi"
                hugepages-1Gi: "2Gi"
      Allocated CPUs for the container:
        cat <container-cgroup>.cpuset.cpus
        1-2,41-42
      CPU layout (2 sockets, SMT enabled):
                Socket 0   Socket 1
        Core 0  [0, 40]    [20, 60]
        Core 1  [1, 41]    [21, 61]
        Core 2  [2, 42]    [22, 62]
        Core 3  [3, 43]    [23, 63]
        Core 4  [4, 44]    [24, 64]
        Core 5  [5, 45]    [25, 65]
        Core 6  [6, 46]    [26, 66]
        Core 7  [7, 47]    [27, 67]
        Core 8  [8, 48]    [28, 68]
        …
      “sched_setaffinity” usage in DPDK (pin pThread3 to lCore42): Process A has threads pThread0 to pThread3 and may use lCore1, lCore2, lCore41, and lCore42.
        Thread_creation
        ├─pthread_create
        ├─pthread_setname_np
        └─pthread_setaffinity_np
           └─ sched_setaffinity(tid, cpuset)
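The allocation on the slide above (`1-2,41-42` for a 4-CPU request) can be reproduced with a small sketch that takes whole physical cores, that is, both SMT siblings, from a single socket. This is an illustration only and not the CPU Manager's actual algorithm: the function names are hypothetical, and treating core 0 of socket 0 (CPUs 0 and 40) as already reserved is an assumption made so that the result matches the slide.

```python
def build_layout(sockets=2, cores_per_socket=20):
    """Return {(socket, core): (thread_a, thread_b)} matching the slide's numbering."""
    total_cores = sockets * cores_per_socket
    layout = {}
    for socket in range(sockets):
        for core in range(cores_per_socket):
            first = socket * cores_per_socket + core
            layout[(socket, core)] = (first, first + total_cores)
    return layout


def allocate_exclusive(layout, free, count):
    """Pick `count` CPUs from `free`, using fully free cores of one socket."""
    for socket in sorted({s for s, _ in layout}):
        chosen = []
        for (s, _core), threads in sorted(layout.items()):
            if s == socket and all(t in free for t in threads):
                chosen.extend(threads)
                if len(chosen) >= count:
                    return sorted(chosen[:count])
    raise RuntimeError("no single socket has enough fully free cores")


def format_cpulist(cpus):
    """Compress CPU ids into cpuset notation such as "1-2,41-42"."""
    ranges = []
    for cpu in sorted(cpus):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu
        else:
            ranges.append([cpu, cpu])
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


layout = build_layout()
all_cpus = {t for threads in layout.values() for t in threads}
free_cpus = all_cpus - {0, 40}  # assumption: core 0 is reserved for the system
print(format_cpulist(allocate_exclusive(layout, free_cpus, 4)))
```

Inside such a container, a Linux process could narrow its own affinity further with `os.sched_setaffinity`, which corresponds to the thread-level pinning that DPDK performs through `pthread_setaffinity_np`.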
  • 20. SOSCON2019 How Kubernetes supports NUMA: Resource Manager and Plugins in Kubernetes
      • Device Manager (a component of the kubelet that advertises and allocates extended resources)
      • Device Plugins (nvidia-gpu-plugin, amd-gpu-plugin, gpu-sharing-plugin, sr-iov-plugin, rdma-device-plugin, etc.)
      [Figure: sequence of extended resource allocation in Kubernetes. The Device Plugin calls Register on the Device Manager; the Device Manager (inside the kubelet, alongside the PodAdmitHandler) calls ListWatch and Allocate on the plugin.]
      Normally, plugins are gRPC servers and are containerized.
  • 21. SOSCON2019 How Kubernetes supports NUMA: The concept of Topology Manager
      • Topology Manager provides NUMA-aware resource allocation for containers at the node level.
      • Topology Manager retrieves Topology Hints from Hint Providers.
      • Topology Manager calculates the NUMA node affinity, then decides whether to admit the pod according to the given policy. (Pod admission is rejected if the chosen policy cannot be satisfied.)
      [Figure: sequence of Pod admission with Topology Manager. Kubelet SyncPod() → Topology Manager Admit() → GetTopologyHints() on the Hint Providers (CPU Manager, Device Manager, …) → hints returned → admit or reject the Pod.]
  • 22. SOSCON2019 How Kubernetes supports NUMA: What are Topology Hint and Topology Policy?
      • A Topology Hint is a data structure that represents the NUMA nodes of allocatable resources as bits.
      • Topology Manager collects hints, then merges them to find the best hint. (Policies share the same merging algorithm in 1.16; each policy will have its own in a future release.)
      • Each policy has its own pod admission criteria.
      Topology Policies:
      • none: Does nothing; Topology Manager is effectively disabled.
      • best-effort: Calculates the best hint and uses it, whatever it is.
      • restricted*: Rejects pod admission if the best hint is not a preferred hint.
      • single-numa*: Rejects pod admission if the best hint does not fit on a single NUMA node.
      Topology Hint structure:
        // TopologyHint is a struct containing the NUMANodeAffinity for a Container
        type TopologyHint struct {
            NUMANodeAffinity bitmask.BitMask
            // Preferred is set to true when the NUMANodeAffinity encodes a preferred
            // allocation for the Container. It is set to false otherwise.
            Preferred bool
        }
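The hint merging and the policy table above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the Topology Manager's implementation: a hint is modeled as a `(mask, preferred)` tuple where bit `i` of `mask` stands for NUMA node `i`, hints from different providers are combined with a bitwise AND, and the best merged hint is the preferred one covering the fewest nodes. The `admit` function encodes one reading of the policy descriptions on the slide.

```python
from itertools import product


def merge_hints(provider_hints, num_nodes):
    """Merge one hint per provider and return the best (mask, preferred) result."""
    all_nodes = (1 << num_nodes) - 1
    candidates = []
    for combo in product(*provider_hints):
        mask = all_nodes
        preferred = True
        for hint_mask, hint_preferred in combo:
            mask &= hint_mask                    # keep nodes every provider accepts
            preferred = preferred and hint_preferred
        if mask:                                 # skip combinations with no common node
            candidates.append((mask, preferred))
    if not candidates:
        return (all_nodes, False)                # assumed fallback: any node, not preferred
    # Prefer preferred hints, then the narrowest mask, then the lowest node ids.
    return min(candidates, key=lambda h: (not h[1], bin(h[0]).count("1"), h[0]))


def admit(policy, hint):
    """Apply the admission criteria from the policy table to a merged hint."""
    mask, preferred = hint
    if policy in ("none", "best-effort"):
        return True
    if policy == "restricted":
        return preferred
    if policy == "single-numa":
        return preferred and bin(mask).count("1") == 1
    raise ValueError(f"unknown policy: {policy}")


cpu_hints = [(0b01, True), (0b10, True), (0b11, False)]  # CPUs fit on node 0 or node 1
gpu_hints = [(0b10, True)]                               # the GPU sits on node 1
best = merge_hints([cpu_hints, gpu_hints], num_nodes=2)
print(bin(best[0]), best[1], admit("single-numa", best))
```

In this example both providers can agree on NUMA node 1, so the merged hint is preferred and every policy admits the pod.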
  • 23. SOSCON2019 How Kubernetes supports NUMA: How it works: Topology Manager (with the single-numa policy)
      YAML example for NTM:
        apiVersion: v1
        kind: Pod
        metadata:
          name: ntm-sample
        spec:
          containers:
          - image: simple-sample
            name: simple-sample
            resources:
              requests:
                cpu: "4"
                memory: "1Gi"
                nvidia.com/gpu: 1
              limits:
                cpu: "4"
                memory: "1Gi"
                nvidia.com/gpu: 1
      Pod admission test cases (resource availability at scheduler level; available resources on Socket 0; available resources on Socket 1; expected result):
      • Positive Case 1: CPU: 20, GPU: 4; Socket 0: CPU: 10, GPU: 2; Socket 1: CPU: 10, GPU: 2; expected: Socket0, Socket1
      • Positive Case 2: CPU: 20, GPU: 2; Socket 0: CPU: 10, GPU: 2; Socket 1: CPU: 10, GPU: 0; expected: Socket0
      • Positive Case 3: CPU: 7, GPU: 3; Socket 0: CPU: 3, GPU: 2; Socket 1: CPU: 4, GPU: 1; expected: Socket1
      • Negative Case 1: CPU: 13, GPU: 2; Socket 0: CPU: 3, GPU: 2; Socket 1: CPU: 10, GPU: 0; expected: admission rejected
      • Negative Case 2: CPU: 6, GPU: 4; Socket 0: CPU: 3, GPU: 2; Socket 1: CPU: 3, GPU: 2; expected: admission rejected
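The five test cases above follow from one rule: with the single-numa policy, the pod is admitted only if some single socket can supply the whole request (4 CPUs and 1 GPU). A minimal sketch of that check follows; `fitting_sockets` is a hypothetical helper and the per-socket figures are copied from the table.

```python
REQUEST = {"cpu": 4, "gpu": 1}  # from the YAML example on the slide

# Available resources per socket (Socket 0, Socket 1), copied from the test-case table.
CASES = {
    "Positive Case 1": [{"cpu": 10, "gpu": 2}, {"cpu": 10, "gpu": 2}],
    "Positive Case 2": [{"cpu": 10, "gpu": 2}, {"cpu": 10, "gpu": 0}],
    "Positive Case 3": [{"cpu": 3, "gpu": 2}, {"cpu": 4, "gpu": 1}],
    "Negative Case 1": [{"cpu": 3, "gpu": 2}, {"cpu": 10, "gpu": 0}],
    "Negative Case 2": [{"cpu": 3, "gpu": 2}, {"cpu": 3, "gpu": 2}],
}


def fitting_sockets(request, sockets):
    """Return the indexes of sockets that can satisfy every requested resource alone."""
    return [
        index
        for index, available in enumerate(sockets)
        if all(available.get(name, 0) >= amount for name, amount in request.items())
    ]


for name, sockets in CASES.items():
    fits = fitting_sockets(REQUEST, sockets)
    verdict = ", ".join(f"Socket{i}" for i in fits) if fits else "admission rejected"
    print(f"{name}: {verdict}")
```

Negative Case 1 shows why the scheduler-level view is not enough: the node as a whole has 13 CPUs and 2 GPUs, but no single socket has both 4 CPUs and a GPU.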
  • 24. SOSCON2019 How Kubernetes supports NUMA: Issues (in 1.16)
      • Kubernetes/Issues/#83476*: Unreliable Topology Hint generation when multiple containers in the same pod require alignment.
      • Kubernetes/PR/#83697: Topology Manager would not admit a pod under the single-numa policy when any hint provider had no NUMA preference. (Merged)
      • Kubernetes/PR/#83492: Topology Manager supports only the Guaranteed QoS class. (Merged)
      • Kubernetes/Issue/#83483: Support for “inter-device” topology constraints (e.g. GPU-direct, NVLink, RDMA).
      • Kubernetes/Issues/#83478: The same affinity calculation algorithm is used for all policies. (Refactoring has already started.)
      • TBD: Alignment is limited to the container level; Topology Manager does not support Pod-level alignment.
  • 25. SOSCON2019 How Kubernetes supports NUMA: Helpful Links
      • Cgroup: https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt
      • CPU Manager KEP: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/cpu-manager.md
      • Device Manager KEP: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/device-plugin.md
      • Topology Manager KEP: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/0035-20190130-topology-manager.md
      • CPU Manager Guide: https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/
      • Topology Manager Guide: https://kubernetes.io/docs/tasks/administer-cluster/topology-manager/
      • Kubelet (Container Manager): https://github.com/kubernetes/kubernetes/tree/master/pkg/kubelet/cm
  • 26. SOSCON2019 Kubernetes Contribution: Special Interest Groups (SIGs) are open to new contributors. Source: https://github.com/kubernetes/community
  • 27. SOSCON2019 Kubernetes Contribution: Hugepages Enhancement. But what are hugepages?
      • Hugepages are, literally, memory pages of a huge size. A typical Linux machine supports two huge page sizes (2 MB and 1 GB); the default page size is 4 KB.
      • The idea behind hugepages is to reduce TLB misses and thereby memory access latency. (Hugepages also allow better use of the hardware cache by reducing the number of page table entries.)
      • DPDK and databases are well-known examples of applications that consume hugepages. (DPDK is the Data Plane Development Kit for packet processing.)
      • Kubernetes can consume pre-allocated hugepages, but it does not support NUMA awareness or container isolation for hugepages.
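A short piece of arithmetic shows why larger pages reduce TLB pressure: the number of pages, and therefore of page table entries and potential TLB entries, needed to map a region shrinks in proportion to the page size. The sketch below uses the 2 GiB region from the `hugepages-1Gi: "2Gi"` request shown earlier in the deck; `pages_needed` is a hypothetical helper and the sizes are the binary units (KiB, MiB, GiB) that the slide writes as KB, MB, and GB.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def pages_needed(region_bytes, page_bytes):
    """Number of pages required to map a region, rounded up."""
    return -(-region_bytes // page_bytes)


region = 2 * GIB
for label, page_size in (("4 KiB", 4 * KIB), ("2 MiB", 2 * MIB), ("1 GiB", GIB)):
    print(f"{label} pages: {pages_needed(region, page_size)}")
```

Mapping the same 2 GiB takes 524,288 default pages, 1,024 pages of 2 MiB, or just 2 pages of 1 GiB.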
  • 28. SOSCON2019 Kubernetes Contribution: Hugepages Enhancement. What is the goal of the hugepages enhancement?
      • Support container isolation of hugepages.
      • Support multiple hugepage sizes at the host and container level.
      • Support NUMA-aware hugepages management.
  • 29. SOSCON2019 SAMSUNG OPEN SOURCE CONFERENCE 2019 THANK YOU