CRIU: Time and Space Travel for Linux Containers

Kirill Kolyshkin, Project Manager at Parallels, Inc.
Nov. 6, 2015

Editor's Notes

  1. This talk is not about CRIU per se: I could talk about it for a whole day, and you are probably not interested. It is about one of its applications, live migration of containers. I'm going to explain why and when it is useful, when it is not, and what obstacles you will face if you decide to do it.
  2. What is live migration? It is very well described in science fiction, where it is called teleportation. An object is analyzed, information about its bits and pieces is communicated to the other side, and it is reassembled at the destination. It's pretty much the same for containers, except that it is already implemented.
  3. It has been implemented in OpenVZ for about 10 years, in the kernel, as a set of kernel modules. For the last 4 years we have been working on re-implementing that feature using a different «engine»: the functionality of analyzing, decomposing and then re-composing the processes is being developed not as kernel modules but as a user-space application.
  4. Why would we want to migrate containers? First, it looks awesome, totally mind-blowing. If you take an inexperienced user and show them a set of processes, with all the bells and whistles, being moved from one physical server to another without being stopped, it looks cool! Live migration can also be used to balance load between a few machines.
  5. Of course, live migration is a complex technology. It is error-prone, and people are afraid of using it because of various possible side effects, good or bad. So there are ways to avoid live migration.
  6. One method is to balance not the processes that use the resources but the cause of that resource usage. For example, for incoming network traffic you can use a load-balancing frontend, if your architecture allows it. Another method is microservices: you run services that don't have much context or state, so you can stop any of them and start it on a different machine quickly and without losing anything. Again, if your architecture allows it. This is the paradigm of OpenStack, Docker, and some Docker-based projects such as Kubernetes. The third option is somewhat peculiar but is still in use: you wait until there's a major problem with the machine, and then you reboot and upgrade. The obvious option is to plan a downtime.
  7. Anyway, live migration is also a way to go, and once we start using it we'll see that during migration a lot of time is spent moving the memory over the network. To make the migration really live, to have a really uninterrupted service, you need to exclude this memory transfer from the period of time when the container is frozen. There are two options for that. The first is to copy all or most of the memory before freezing the container. The second is not to migrate the memory at that point at all, and to move it after the container has been resumed on the destination.
  8. Once we take into account this need to pre- or post-migrate the memory, live migration becomes more complicated.
  9. There are some specifics in implementing such a technology for containers. Live migration for VMs has existed for a while, whereas for containers it is relatively new. So, to better understand the details, let's compare containers and VMs, step by step.
  10. For a VM, the state is all the virtual hardware a hypervisor gives to the guest OS, plus virtual CPU state and memory state. It's roughly the same for containers (CTs), but named differently: instead of virtual hardware we have cgroups and namespaces, and instead of CPUs we have processes.
  11. Memory is not a problem for a VM, as the hypervisor manages VM memory and knows everything about it. For CTs, there are many different types of memory: shared or private, backed by a file or not, and so on.
  12. There are two ways to catch the processes. First, we can follow in the steps of the ps utility: get the processes one by one and stop them, keeping in mind that the ones we haven't stopped yet might fork, and their children might fork too. The second option is to use the freezer cgroup. If you put processes inside such a cgroup, you can later say «freeze!» and they will. In that case the freezing is done by the kernel, which is good at it.
  13. For a VM running a fresh install of, say, Fedora Linux, the state excluding memory is about 300K of data and fewer than 100 objects. For a CT, this is far more fine-grained: open files, sockets, and everything else those processes might have used. Plus, some of those objects might be shared, like files, so we have a graph rather than a tree. It takes somewhat less space (comparable to a VM), but the number of objects is two orders of magnitude greater! The second problem is not a fundamental one, but rather a specific of the CRIU implementation. If we did the checkpoint from the kernel, we would know everything, every state of every object. But as we are doing it from userspace, we need some API to get that state.
  14. For containers, the receiving side can't take it straight from a socket, as there might be objects that depend on objects that have not been copied yet.
  15. For CTs, we have a set of objects to be restored, relations between those objects (a graph), and some rules, some restrictions on how to create these objects with their relations. It's not as if we can create an object and then tie it to some other objects afterwards. We also have a target state that we want to reach. So we need to solve this task: figure out a sequence in which to recreate all this.
  17. If a page is missing, the kernel won't kill the process but will send a special message over that file descriptor, so the listening process can get this memory and give it to the kernel. Userfaultfd does not work as is for CRIU, for a few reasons:
     - with QEMU, it's the same process initiating and handling the page fault; with CRIU it's different processes
     - not all memory types are currently supported
     - an app can remap its memory, which is currently unsupported
     - fork() is not supported: the child will have pages filled with zeroes
  18. Vibrant community; version 1.7.2 was released this week. Mostly driven by Odin, but also by Google, Canonical, Red Hat, SUSE, Debian, Samsung, Huawei, Docker… Integrated with OpenVZ (future version), LXC, LXD, Docker/Rocket libcontainer. Linux kernel developers are aware and helpful.
  19. For slow boot, we tried starting the Eclipse GUI: it took 30s to start and 1.5s to restore.
  20. The project logo is the Little Humpbacked Horse (a magic pony).
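
The first option in note 7 (copy most of the memory before the freeze) can be illustrated with a toy model of iterative pre-copy. This is only a sketch under loud assumptions: a constant network bandwidth, a constant rate at which the still-running container dirties pages, and a hypothetical function name, parameters and numbers. None of it is taken from CRIU itself.

```python
def precopy_rounds(total_pages, dirty_rate, bandwidth, max_rounds=5, threshold=100):
    """Toy model of iterative pre-copy (all names and numbers are hypothetical).

    total_pages -- pages in the container's memory
    dirty_rate  -- pages the running container dirties per second (assumed constant)
    bandwidth   -- pages the network can transfer per second (assumed constant)
    Returns (pages sent in each live round, pages left for the frozen phase).
    """
    to_send = total_pages
    sent_live = []
    for _ in range(max_rounds):
        if to_send <= threshold:
            break  # small enough: freeze the container and send the rest
        sent_live.append(to_send)
        # Sending `to_send` pages takes to_send / bandwidth seconds; pages
        # dirtied during that time have to be sent again in the next round.
        to_send = min(total_pages, dirty_rate * to_send // bandwidth)
    return sent_live, to_send


rounds, frozen_pages = precopy_rounds(
    total_pages=100_000, dirty_rate=1_000, bandwidth=10_000
)
print(rounds, frozen_pages)
```

With these made-up numbers the frozen phase has to move 100 pages instead of 100,000. If the dirty rate approaches the bandwidth, the rounds stop shrinking, which is why the sketch caps the number of rounds.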
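
The freezer cgroup from note 12 is driven by writing to a control file. A minimal sketch, assuming a cgroup-v1 freezer hierarchy, where each cgroup directory has a `freezer.state` file that accepts `FROZEN` and `THAWED`. The helper names and any directory you pass in are hypothetical, and this is not how CRIU itself is structured.

```python
import os


def set_freezer_state(cgroup_dir, frozen):
    """Ask the kernel to freeze or thaw every task in a cgroup-v1 freezer cgroup."""
    state = "FROZEN" if frozen else "THAWED"
    with open(os.path.join(cgroup_dir, "freezer.state"), "w") as f:
        f.write(state)
    return state


def is_frozen(cgroup_dir):
    """Return True once the cgroup reports that all its tasks are frozen."""
    with open(os.path.join(cgroup_dir, "freezer.state")) as f:
        return f.read().strip() == "FROZEN"
```

On a real host the directory would be the container's cgroup under the mounted freezer hierarchy, and writing to it requires sufficient privileges. The kernel may report `FREEZING` for a while before the state becomes `FROZEN`, so a caller would normally poll `is_frozen`. On cgroup v2 the equivalent control is the `cgroup.freeze` file, which takes `1` or `0`.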
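
The ordering problem in note 15 can be pictured as sorting a dependency graph. The sketch below is a deliberate simplification, not CRIU's actual restore algorithm: it assumes the relations form an acyclic "must exist before" graph and uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The object names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical objects: each key lists what must already exist
# before that object can be created.
depends_on = {
    "net namespace": [],
    "listening socket": ["net namespace"],
    "parent process": ["net namespace"],
    "shared file": ["parent process"],
    "child process": ["parent process", "shared file"],
}


def restore_order(graph):
    """Return one creation order in which every dependency comes first."""
    return list(TopologicalSorter(graph).static_order())


order = restore_order(depends_on)
print(order)
```

Any order returned creates each object only after everything it depends on. Real restore is harder than this, because some objects cannot simply be created and attached later, which is the point the note makes.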