2. Agenda
Why Kata?
Container eco-system
Kata architecture Overview
Kata adaptations to support ACRN
ACRN-DM features to support Kata
Current Results & Next Steps
3. Why Kata?
[Diagram: three processes on one host Linux kernel, each isolated by namespaces/cgroups and all sharing the host CPU, memory, network, and storage.]
Software is not enough!
4. Why Kata?
[Diagram, three panels: (1) containers, each wrapping a process, on one shared Linux kernel; (2) containers running inside VMs on top of a host Linux kernel; (3) Kata Containers, where each container runs in its own VM with its own guest Linux kernel, backed by HW virtualization, on the host Linux kernel.]
Kata: “trust something to someone”
To achieve more security and avoid noisy-neighbor issues, cloud providers spin up containers inside a VM.
Maintaining each individual VM becomes a pain!
7. Container eco-system: Kubernetes
o Kubelet: Primary “node agent” that runs on each node. It has one job: given a
set of containers to run, make sure they are all running.
o CRI-O: An implementation of the Kubernetes CRI (Container Runtime
Interface) to enable using OCI (Open Container Initiative) compatible runtimes.
o Pod: A group of one or more containers, with shared storage/network, and a
specification for how to run the containers.
o Pause Container: Part of each pod; it is responsible for creating the shared
network, assigning the IP address, and sharing volumes for all containers inside the pod.
o Regular containers vs. VM-based containers:
                                         io.kubernetes.cri-o.TrustedSandbox
                                         not set           = true            = false
  Default CRI-O trust level: trusted     runc              runc              Kata containers
  Default CRI-O trust level: untrusted   Kata containers   Kata containers   Kata containers
[Diagram: Kubelet -> CRI-O -> (OCI) runc, which runs the pod's containers directly, or (OCI) Kata-runtime, which runs the pod's containers inside a VM.]
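The trust-level matrix on this slide can be read as a small decision function. The sketch below is only an illustration of that matrix, not CRI-O's actual code: the function name `select_runtime` and the returned strings are assumptions made for this example.

```python
ANNOTATION = "io.kubernetes.cri-o.TrustedSandbox"


def select_runtime(annotations, default_trust_level="trusted"):
    """Pick a runtime for a pod, following the trust-level matrix on the slide.

    `annotations` is the pod's annotation dict; `default_trust_level` is the
    CRI-O default ("trusted" or "untrusted"). Names here are hypothetical.
    """
    if default_trust_level not in ("trusted", "untrusted"):
        raise ValueError(f"unknown trust level: {default_trust_level!r}")

    # With an untrusted default, every pod runs in a VM regardless of annotation.
    if default_trust_level == "untrusted":
        return "kata-runtime"

    # With a trusted default, only an explicit "false" moves the pod into a VM.
    value = annotations.get(ANNOTATION)
    if value is not None and value.strip().lower() == "false":
        return "kata-runtime"
    return "runc"


if __name__ == "__main__":
    print(select_runtime({}))                     # annotation not set
    print(select_runtime({ANNOTATION: "false"}))  # explicitly untrusted pod
```

Keeping the default trust level as a parameter mirrors the two rows of the matrix: the annotation only matters when the default is "trusted".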
9. Kata Architecture Overview
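[Diagram: Kata-Runtime receives OCI commands and each Kata-Shim carries container I/O; both talk gRPC to Kata-Proxy, which reaches Kata-Agent in the guest (through the hypervisor) using gRPC over Yamux. Kata-Agent, on the guest Linux kernel, manages the containers and their processes inside the VM.]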
o Kata-Runtime: An OCI-compatible container runtime, responsible for handling all commands
specified by the OCI runtime specification.
o Kata-Agent: A process running in the guest as a supervisor, managing containers and the processes
running within those containers.
o Kata-Shim: Stands in on the host for the container process, so that a container process reaper such as
Docker's containerd-shim or CRI-O's conmon has a process to monitor.
o Kata-Proxy: A process offering access to the VM's kata-agent to the multiple kata-runtime and kata-shim
clients associated with the VM.
10. Kata Architecture Overview: OCI lifecycle - Create
[Sequence diagram between Kata-Shim, Kata-Runtime, Kata-Proxy, the hypervisor, and Kata-Agent (with the container and its process) in the VM on the guest Linux kernel. Labels: I/O; OCI create command; Socket connection; Start proxy (socket connection); Connect VM; Start VM; Listen to serial; Connection established; Create container.]
Container is created and running!
11. Kata Architecture Overview: Container infrastructure
[Diagram: Kata-Runtime passes the OCI spec to Kata-Agent in the VM; Kata-Agent uses libcontainer on the guest Linux kernel to create the container and its process.]
o All namespaces (PID, UTS, IPC, …) are created by the kata-agent running inside
the VM, except the network and shared PID namespaces.
o Resource constraints:
Inside VM: Can be set by passing values through the kernel
command line used to boot the guest (configuration.toml).
Inside Container: The container orchestrator, be it Kubernetes or Docker,
needs to specify the resources via the OCI spec.
• E.g. “docker run -it --cpus=".5" ubuntu /bin/bash” would
ensure that the container can use at most 50% of one CPU.
o Since ACRN-DM doesn’t support 9pfs, use the device-mapper storage driver.
Benefits: Device-mapper is POSIX compliant and more performant than 9pfs.
o Container rootfs is provided as a block device and is hot-plugged
directly to the container.
[Diagram: a block device is exposed to the guest Linux kernel over virtio-blk and provides the container's rootfs and volumes inside the VM.]
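The `--cpus=".5"` example on this slide ends up as a CFS quota/period pair in the OCI spec (`linux.resources.cpu`). The following is a minimal sketch of that arithmetic, assuming the common default period of 100000 microseconds; the helper names `cpus_to_cfs` and `oci_cpu_resources` are made up for illustration.

```python
def cpus_to_cfs(cpus, period_us=100_000):
    """Convert a fractional CPU count into a (quota, period) pair in microseconds.

    quota / period is the share of one CPU the container may use per period.
    The 100000 us default period is an assumption of this sketch.
    """
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    quota_us = int(round(cpus * period_us))
    return quota_us, period_us


def oci_cpu_resources(cpus):
    """Build the fragment of an OCI config.json that carries the CPU limit."""
    quota_us, period_us = cpus_to_cfs(cpus)
    return {"linux": {"resources": {"cpu": {"quota": quota_us, "period": period_us}}}}


if __name__ == "__main__":
    # Half a CPU: the container may run for 50000 us in every 100000 us window.
    print(oci_cpu_resources(0.5))
```

Values above 1.0 are valid as well; for example 1.5 yields a quota larger than the period, which allows one and a half CPUs' worth of run time per period.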
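12. Kata Architecture Overview: Host Namespaces
[Diagram: a pre-start hook (CNI/CNM plugin) sets up a new network namespace on the host; a veth pair links it to the Docker bridge, and inside the new namespace a bridge connects the veth interface to the TAP device used by the VM.]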
The network namespace lives on the host, because network plugins expect to create an interface
between the container namespace and the host network namespace.
13. Kata Architecture Overview: Sum-up
1. Create the network namespace where the VM and shim processes will be spawned.
2. Call into the pre-start hooks to create the veth network pair between the host network namespace and the freshly created network namespace.
3. Scan the network from the new network namespace, and create a bridge connecting the veth interface to a tap.
4. Start the VM inside the network namespace.
5. Wait for the VM to be ready.
6. Start kata-proxy, which will connect to the created VM (a single proxy per VM).
7. Communicate with kata-agent (through the proxy) to configure the sandbox inside the VM.
8. Communicate with kata-agent to create the container, relying on the OCI configuration file config.json initially provided to kata-runtime.
9. Start kata-shim, which will connect to the gRPC server socket provided by the kata-proxy. kata-shim will spawn a few goroutines to parallelize the blocking calls ReadStdout() and ReadStderr().
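The last step describes running two blocking reads side by side so that neither stream starves the other. The sketch below shows the same pattern with Python threads and OS pipes rather than goroutines and gRPC calls; it is an analogue for illustration only, and `drain`, `forward_streams`, and `demo` are hypothetical names, not kata-shim functions.

```python
import os
import threading


def drain(fd, sink):
    """Blocking read loop: copy everything from fd into sink until EOF."""
    with os.fdopen(fd, "rb") as stream:
        for chunk in iter(lambda: stream.read(4096), b""):
            sink.append(chunk)


def forward_streams(stdout_fd, stderr_fd):
    """Read both streams concurrently, one worker per blocking reader."""
    out, err = [], []
    workers = [
        threading.Thread(target=drain, args=(stdout_fd, out)),
        threading.Thread(target=drain, args=(stderr_fd, err)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return b"".join(out), b"".join(err)


def demo():
    """Simulate a container's stdout and stderr with two pipes."""
    out_r, out_w = os.pipe()
    err_r, err_w = os.pipe()
    os.write(out_w, b"hello from stdout\n")
    os.write(err_w, b"hello from stderr\n")
    os.close(out_w)  # closing the write ends lets the readers see EOF
    os.close(err_w)
    return forward_streams(out_r, err_r)


if __name__ == "__main__":
    print(demo())
```

A single-threaded loop that reads stdout to EOF before touching stderr would stall if the process only wrote to stderr; one reader per stream avoids that ordering dependency.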
15. Kata adaptations to support ACRN
o Kata-runtime:
Added ACRN as a supported hypervisor, so that the Kata configuration can pick up ACRN instead of QEMU
when launching the VM.
[hypervisor.acrn]
path = "/usr/bin/acrn-dm"
kernel = "/usr/share/kata-containers/vmlinuz.container"
# initrd = "/usr/share/kata-containers/kata-containers-initrd.img"
image = "/usr/share/kata-containers/kata-containers.img"
Implemented Sandbox Management APIs such as CreateSandbox, FetchSandbox, ...
• https://github.com/kata-containers/documentation/blob/master/design/kata-api-design.md#sandbox-management-api
Implemented Sandbox Operation APIs such as StartSandbox, StopSandbox, PauseSandbox, ...
• https://github.com/kata-containers/documentation/blob/master/design/kata-api-design.md#sandbox-operation-api
Implemented the hot-plug API (currently with a workaround; working on adding PCI device hot-plug support in ACRN-DM).
• https://github.com/kata-containers/documentation/blob/master/design/kata-api-design.md#sandbox-hotplug-api
Prime the devices, image, and kernel as parameters for ACRN-DM when launching the VM.
17. ACRN-DM features to support Kata
o Socket Backend support:
Kata uses socket communication between kata-runtime <-> kata-proxy, kata-shim <-> kata-proxy,
and kata-proxy <-> kata-agent running inside the VM.
Implemented a socket backend for the virtio-console device in acrn-dm. (This feature is already merged upstream.)
• -s x,virtio-console,socket:"socket name"="socket path" where x is the PCI slot number.
o PCI device Hot-Plug support:
Kata Containers hot-plugs the container rootfs for both Docker and Kubernetes.
• When creating a pod, Kubernetes first creates a pause container and then creates the application
container(s). When the VM is launched, the rootfs for the application container is not yet known, so it needs to be hot-plugged.
Looking at both ACPI-based hot-plug and PCI/PCIe hot-plug support for PCI devices (the container rootfs is
passed as a virtio-blk device). Scoped out the initial design and code changes that would be needed in acrn-dm. (WIP)
o Graceful shutdown of VMs:
When a Kata container finishes its job, the VM associated with the container is expected to be shut down
gracefully. (WIP)
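The virtio-console socket option on this slide has a fixed shape, so a launcher can assemble it from three values. The helper below is a hypothetical sketch that follows the format exactly as written on the slide; the function name, the socket name `agent`, and the socket path are invented for illustration and are not taken from acrn-dm or Kata.

```python
def virtio_console_socket_arg(slot, socket_name, socket_path):
    """Build the '-s' argument pair for a socket-backed virtio-console device.

    Follows the slide's format: -s x,virtio-console,socket:<name>=<path>,
    where x is the PCI slot number.
    """
    if slot < 0:
        raise ValueError("PCI slot number must be non-negative")
    for label, value in (("socket name", socket_name), ("socket path", socket_path)):
        # A comma would be read as the next field of the -s option.
        if not value or "," in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return ["-s", f"{slot},virtio-console,socket:{socket_name}={socket_path}"]


if __name__ == "__main__":
    # Placeholder values, chosen only to show the resulting argument.
    print(virtio_console_socket_arg(9, "agent", "/tmp/kata-console.sock"))
```

Returning a two-element list keeps the option ready to append to an argument vector without shell quoting.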
20. Next Steps
o Complete PCI device hot-plug support in ACRN-DM.
o Complete support for graceful shutdown of VMs.
o Complete validation with Kubernetes/Docker and identify limitations.
For example, Docker privileged mode cannot be supported, as there is no simple way to grant the
VM access to all of the host devices, which this mode expects.
o Create a pull request for the Kata changes and upstream them.
The Kata team wants all the ACRN-DM-related changes to be complete first.
o Performance optimizations to match or exceed QEMU/Firecracker.