It all started in 1999... let's see where we are in 2015. The history of Linux Containers, presented by Kirill Kolyshkin at ContainerCon 2015 in Seattle.
Not so brief history of Linux Containers
1. A brief history of
Linux Containers
Kir Kolyshkin <kir@openvz.org>
ContainerCon, Seattle, 17th
of August 2015
2. A (not so) brief history of
(mostly) Linux Containers
Kir Kolyshkin <kir@openvz.org>
ContainerCon, Seattle, 17th
of August 2015
3. Evolution of OS
● Single process → batch processing → multitask
● Single user → multiple users and groups
● Single computer → network of computers
● Single userspace → multiple userspaces
a.k.a. containers
5. 1999-2000
● 1999: Initial idea about Virtuozzo
– “virtual environments” – groups of processes
– a file system to share code / save RAM
– resource management / isolation
● 2000: 5 engineers, public testing, 5000 VEs
with root accounts, public source code release
6. 2000
● User Beancounters:
– per process group limits
– Andrey Savochkin and Alan Cox
– barrier, limit, held, maxheld, failcnt
● Al Viro: [mount] namespace
7. 2001
● Virtuozzo for … Windows!
– no source code – lots of reverse engineering
– live kernel patching
– “most advanced software ever written for Windows”
● Linux-VServer project
– Jacques Gélinas, Herbert Pötzl
8. 2002-2003
● 2002 Jan: First Virtuozzo release (v2.0)
● 2003: Meiosys Metacluster
– containers for the sake of live migration
– acquired by IBM in 2005
9. 2004-2005
● Feb: Solaris Zones/Containers released
– kudos to Sun for the term “containers”!
● Dec: first Virtuozzo for Windows release
● CKRM, resource management framework from IBM [FAIL]
● 2005: OpenVZ project announced
– better late than never
10. 2006-2010: up the stream!
● Lots of new namespaces:
– PID (process tree)
– net (net devices, addresses, routing etc)
– IPC (shared memory, semaphores, msg queues)
– UTS (hostname, kernel version)
– Mount (filesystem mounts and files, 2000)
– user (UIDs/GIDs, only completed in 2013, Linux 3.9)
● Use: clone() with CLONE_NEW* flags
12. 2006
● Kernel ports: 2.6.15, FC5, RHEL4, 2.6.18
● “Weekend project” ports to SPARC and Power
● Live migration in OpenVZ
13. Checkpointing and Live Migration
● Live migration, simplified:
– freeze processes, dump their complete state
– copy that dump to other machine
– restore from dump; unfreeze!
● Initially implemented in the kernel
– touches every subsystem (except drivers)
– so, really hard to merge upstream
15. 2007
● IBM AIX WPARs
● HP-UX SRP containers
● Rebase to RHEL5 kernel, port to 2.6.20
● 2007: cgroups framework from Google [PASS]
– based on cpusets feature from BULL/SGI
16. CGroups
● Cgroups is a mechanism to control resources
per hierarchical groups of processes
● Modern alternative to user beancounters
● Cgroups is nothing without controllers:
– blkio, cpu, cpuacct, cpuset, devices, freezer,
memory, net_cls, net_prio
● Cgroups are orthogonal to namespaces
● Still working on it: just added kmem controller
17. 2008-2009
● Kernel port to 2.6.25
● Weekend project – port to ARM
● LXC (userspace tool a la vzctl) was born
18. 2010
● Port to RHEL6
● VSwap (RAM/swap limits, simplified UBC)
● ploop aka CT filesystem in a file
– on-demand allocation
– instant snapshots
– online resize, merge, compact
– write tracker (improved live migration)
19. 2011-2012: CRIU
● Jul 2011: initial proposal for CRIU
● Idea: implement most of C/R in userspace using existing APIs
● Jul 2012: initial CRIU release (v0.1)
criu.org
21. 2014
● CRIU for Docker & LXC support
● LXD announced
● OpenStack talks about adding containers support
22. OpenVZ in 2015
● New, more open development model
● Unified with Virtuozzo
● Plays well with Docker (in, out, and on the side)
23. CRIU in 2015
● 3 years old, tools at version 1.6.2
● Users: Google, Samsung, Huawei, ...
● LXC & Docker – integrated!
● TCP connection migration works!
● About 160 patches merged to 3.x - 4.x kernels
under CONFIG_CHECKPOINT_RESTORE
● Live migration: p.haul (criu.org/P.Haul)
24. Future!
● Virtuozzo 7
● 4th gen of resource management: vcmmd
– More dynamic, with bursts, guarantees etc
● Proper port to POWER, ARM
● CRIU: p.haul, integration
(http://criu.org/Integration)
● MetaPC? Mosaic?
I like that this is a nested talk, it's like a novel within a book or a story within a story. I don't like that it's only 15 minutes, I've got so much to tell you!
So, this is the first containercon. When do you think the history of containers started for Linux?
Disclaimer: I work for Odin (ex Parallels, ex SWsoft), my POV is skewed.
Our chief scientist, a professor from MIPT (~ru MIT), Alexander Tormasov proposed a new direction to senior mgmt – lightweight partitioning. He was inspired by IBM mainframe partitioning. The idea is to have multiple “virtual environments”, – isolated groups of processes, each acting as a standalone Linux machine (except for the kernel – shared). Another idea was about file system to share code (binaries/libraries) and therefore save RAM, making density even higher. Third cornerstone was resource isolation.
In Feb 2000 they got an office in MIPT: 3 engineers, a sysadmin, and a manager/engineer. Later, two more guys for web management tools. Initial public testing during a hot summer – 5000 VEs.
That initial testing revealed a big problem with resource isolation. A mathematician from MSU (~ru Stanford) was hired; he wrote User Beancounters (with Alan Cox; the luid idea came from HP-UX). WARNING: PhD in economics!
Also in 2000, Al Viro wrote the first namespace for the Linux kernel – the [mount] namespace. It's like chroot() but with bells and whistles. The kernel API is the clone() call with the CLONE_NEWNS flag.
As a result of OpenVZ upstreaming efforts, a few more namespaces appeared in the Linux kernel. The most notable ones are netns and pidns. Netns was developed from scratch by the OpenVZ kernel guys, based on their experience with the OVZ kernel. For pidns, there were two implementations, one from IBM and one from us; we won, as ours had zero overhead on the first level of nesting.
User namespace was all IBM work, and it was initially merged in 2.6.23 (2007), but was only completed (became usable) in Linux 3.9 (2013).
We failed to upstream our User Beancounters, but Google contributed the cgroups framework (an adaptation of the cpusets feature from BULL/Silicon Graphics).
As this stuff became available in the kernel, userspace tools emerged. LXC is one such tool, from IBM.
This time period was characterized by lots of container-related patches contributed to the Linux kernel, i.e. the upstreaming age. Our company is a few hundred people, and our kernel team is only about 10 people, give or take, and I am very proud of the fact that this upstreaming effort made us appear in the top 10 companies contributing to the Linux kernel. Well, it's the bottom of that top 10, that is. Other companies in that list are way bigger.
Now, upstreaming is probably as complicated for developers as swimming upstream is for salmon. They die exhausted, they get eaten by grizzly bears, etc. On the right you can see a salmon, err, a developer, and on the left is a bear, err, a Linux kernel subsystem maintainer.
What is LXC?
At first glance, very similar to OpenVZ
In fact LXC is just a userspace tool a la vzctl
LXC uses the standard kernel
OpenVZ is a complete set with its own kernel, many tools, libraries etc.
A superset of OpenVZ also exists as a commercial product (Virtuozzo)
Virtuozzo 7 is a reboot of OpenVZ. Ten years ago we made the mistake of not having our development process open enough; this time we are trying to fix it. This April we opened our next kernel git repo, and just this Monday we opened our toolchain. We also moved all of our discussions to the public mailing list, and we follow the git fork-branch-pull request model of development for our tools.
The other thing is next-gen resource management. It's more dynamic, with a user-space daemon which allows bursts, guarantees, and in general more elastic limits.
We will probably be working on proper ARM and POWER ports (the improper ones were done by me years ago just to demonstrate that container technology is arch-agnostic). The only arch-dependent feature is CPT/RST, as it requires deep knowledge of the architecture to develop. CRIU has already been ported to ARM.
Finally, MetaPC is something we're thinking about: a way to combine many servers into a single big virtual one. This is anti-partitioning, and it will work with the help of CRIU.