N problems of Linux Containers

Kirill Kolyshkin, Project Manager at Parallels, Inc.
N Problems
of Linux Containers
(with solutions!)
Kir Kolyshkin
<kir@openvz.org>
6 June 2015 ContainerDays Boston
openvz.org || criu.org || odin.com
Problem: Effective virtualization
● Virtualization is partitioning
● Historical way: $M mainframes
● Modern way: virtual machines
● Problem: performance overhead
● Partial solution: hardware support
(Intel VT-x, AMD-V)
Solution: isolation
● Run many userspace instances
on top of one single (Linux) kernel
● All processes see each other
– files, process information, network,
shared memory, users, etc.
● Make them unsee it!
One historical way to unsee
chroot()
Namespaces
● Implemented in the Linux kernel
– PID (process tree)
– net (net devices, addresses, routing, etc.)
– IPC (shared memory, semaphores, msg queues)
– UTS (hostname, NIS domain name)
– mnt (filesystem mounts)
– user (UIDs/GIDs)
● clone() with CLONE_NEW* flags
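
As a rough illustration of the CLONE_NEW* flags above, the Python sketch below lists the flag values from the kernel's <linux/sched.h> and ORs them into the single mask that clone() or unshare() would take. The helper names (NAMESPACE_FLAGS, flags_for, current_namespaces) are made up for this example. Nothing is actually unshared here, and the /proc/self/ns lookup simply returns an empty dict where that directory does not exist.

```python
import os

# CLONE_NEW* values as defined in the kernel's <linux/sched.h>.
NAMESPACE_FLAGS = {
    "mnt": 0x00020000,   # CLONE_NEWNS
    "uts": 0x04000000,   # CLONE_NEWUTS
    "ipc": 0x08000000,   # CLONE_NEWIPC
    "user": 0x10000000,  # CLONE_NEWUSER
    "pid": 0x20000000,   # CLONE_NEWPID
    "net": 0x40000000,   # CLONE_NEWNET
}


def flags_for(*names):
    """OR together the CLONE_NEW* flags for the given namespace names."""
    mask = 0
    for name in names:
        mask |= NAMESPACE_FLAGS[name]
    return mask


def current_namespaces(ns_dir="/proc/self/ns"):
    """Map namespace name to its identifier, e.g. 'pid:[4026531836]'."""
    if not os.path.isdir(ns_dir):
        return {}
    result = {}
    for entry in sorted(os.listdir(ns_dir)):
        try:
            result[entry] = os.readlink(os.path.join(ns_dir, entry))
        except OSError:
            continue  # entry vanished or is not readable
    return result


if __name__ == "__main__":
    # Mask a container runtime might pass for a new PID + net + mnt namespace.
    print(hex(flags_for("pid", "net", "mnt")))
    for name, ident in current_namespaces().items():
        print(name, ident)
```

Two processes that show the same identifier for a namespace share it; a containerized process would show different identifiers than the host's init.
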
Problem: Shared resources
● All containers share the same set of resources
(CPU, RAM, disk, various in-kernel things ...)
● Need fair distribution of “goods” so everyone
gets their share
● Need DoS prevention
● Need prioritization and SLAs
Solution: OpenVZ resource controls
● OpenVZ:
– user beancounters
● controls 20 parameters
– hierarchical CPU scheduler
– per-container disk quota
– per-container I/O priority and I/O bandwidth limit
● Dynamic control: can “resize” a container at runtime
Solution 2: VSwap
● Only two primary parameters: RAM and swap
– others still exist, but are optional
● Swap is virtual: no actual I/O is performed
● The container is slowed down to emulate real swap
● Only when an actual global RAM shortage occurs
does virtual swap go to real swap
● Currently available only in the OpenVZ kernel
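
The VSwap behaviour described above can be pictured with a toy decision function. This is a sketch under stated assumptions, not OpenVZ code: the function name, its parameters, and the returned labels are invented for illustration, and 'denied' merely stands in for whatever happens when both limits are exhausted, which the slide does not cover.

```python
def vswap_place_page(ct_ram_used, ct_ram_limit, ct_swap_used, ct_swap_limit,
                     host_ram_free):
    """Decide what happens to one more page a container asks for.

    All arguments are page counts. Returns one of:
      'ram'          - the page fits within the container's RAM limit
      'virtual-swap' - accounted as swap but kept in host RAM (no disk I/O;
                       the container is slowed down to mimic swap latency)
      'real-swap'    - the host itself is short of RAM, so the page goes
                       to real swap
      'denied'       - both container limits are exhausted (placeholder)
    """
    if ct_ram_used < ct_ram_limit:
        return "ram"
    if ct_swap_used >= ct_swap_limit:
        return "denied"
    if host_ram_free > 0:
        return "virtual-swap"
    return "real-swap"


if __name__ == "__main__":
    # Container is at its RAM limit, but the host still has free memory.
    print(vswap_place_page(100, 100, 0, 50, host_ram_free=1000))
```

The point of the model is that the container only ever sees two knobs (RAM and swap), while the choice between virtual and real swap depends on host-wide memory pressure.
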
Solution: cgroups + controllers
● Cgroups is a mechanism to control resources
for hierarchical groups of processes
● Cgroups is nothing without controllers:
– blkio, cpu, cpuacct, cpuset, devices, freezer,
memory, net_cls, net_prio
● Cgroups are orthogonal to namespaces
● Still a work in progress: the kmem controller was just added
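
To make the controller list above more concrete, the sketch below parses the text of /proc/<pid>/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path, and reports which cgroup a process belongs to for each controller. The function name and the SAMPLE text (including the /ct101 path) are assumptions made up for this example.

```python
def parse_proc_cgroup(text):
    """Parse the contents of /proc/<pid>/cgroup.

    Each line has the form 'hierarchy-ID:controller-list:cgroup-path'.
    Returns a dict mapping each controller name to its cgroup path.
    """
    membership = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        _hier_id, controllers, path = line.split(":", 2)
        for controller in controllers.split(","):
            if controller:  # the list is empty on a cgroup v2 line
                membership[controller] = path
    return membership


# Hypothetical contents for a process inside a container named ct101.
SAMPLE = """\
4:memory:/ct101
3:cpu,cpuacct:/ct101
2:freezer:/
"""

if __name__ == "__main__":
    for controller, path in sorted(parse_proc_cgroup(SAMPLE).items()):
        print(controller, path)
```

Note that the freezer line in the sample points at the root group while the others point at /ct101: membership is tracked per hierarchy, independently of any namespace the process lives in.
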
Solution 3: vcmmd
● 4th generation of OpenVZ resource management
● A user-space daemon using kernel controls
● Monitors usage, tweaks limits
● Adds a “time” dimension
● More flexible limits, e.g. burstable
Problem: fast live migration
● We can already live migrate
a running OpenVZ container
from one server to another
without shutting it down
● We want to do it fast even for huge containers
– huge disk: use shared storage
– huge RAM: ???
Live migration process
(assuming shared storage)
● 1 Freeze the container
● 2 Dump its complete state to a dump file
● 3 Copy the dump file to destination server
● 4 Undump back to RAM, recreate everything
● 5 Unfreeze
● Problem: a huge dump file takes a long time (seconds)
to dump, copy, and undump
Solution 1: network swap
● 1 Dump the minimal memory, lock the rest
● 2 Copy, undump what we have,
mark the rest as swapped out
● 3 Set up network swap served from the source
● 4 Unfreeze. Missing RAM will be “swapped in”
● 5 Migrate the rest of RAM and kill it on source
● PROBLEM: no way to roll back
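
The network swap steps above can be sketched as a toy model of the destination side. The class and method names are invented for this illustration, and a plain dict stands in for the network link to the source; real page-fault handling is of course done by the kernel.

```python
class LazyMemory:
    """Destination-side memory that pulls missing pages from the source."""

    def __init__(self, local_pages, source_pages):
        self.local = dict(local_pages)    # pages copied before unfreeze
        self.source = dict(source_pages)  # pages still held only by the source
        self.faults = 0

    def read(self, page_no):
        """Return page contents, 'swapping in' from the source if needed."""
        if page_no not in self.local:
            self.faults += 1
            self.local[page_no] = self.source.pop(page_no)
        return self.local[page_no]

    def migrate_rest(self):
        """Background copy of pages that were never touched (step 5)."""
        self.local.update(self.source)
        self.source.clear()


if __name__ == "__main__":
    mem = LazyMemory({0: "stack"}, {1: "heap", 2: "cache"})
    print(mem.read(1), mem.faults)  # page 1 is fetched on first access
    mem.migrate_rest()
    print(sorted(mem.local))
```

One way to read the rollback problem: once the container is unfrozen on the destination, the up-to-date state is split between the two machines, so neither side alone can resume the container if the link fails.
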
Solution 2: Iterative RAM migration
● 1 Ask kernel to track modified pages
● 2 Copy all memory to the destination system's memory
● 3 Ask kernel for list of modified pages
● 4 Copy those pages
● 5 GOTO 3 until satisfied
● 6 Freeze and do migration as usual, but
with much smaller set of pages
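
The iterative loop above can be simulated in a few lines. This is a sketch under stated assumptions: the function name and parameters are invented, and the dirtied_per_round callback stands in for the kernel's modified-page tracking, which is not modelled here.

```python
def precopy_migrate(total_pages, dirtied_per_round, threshold, max_rounds=10):
    """Simulate iterative (pre-copy) memory migration.

    total_pages       - number of pages in the container
    dirtied_per_round - callable(round_no) -> set of page numbers modified
                        while that round's copy was in flight
    threshold         - stop iterating once this few pages remain dirty
    Returns (pages_copied_while_running, pages_copied_while_frozen).
    """
    to_copy = set(range(total_pages))  # step 2: first pass copies everything
    copied_live = 0
    for round_no in range(max_rounds):
        copied_live += len(to_copy)            # steps 2 and 4: copy pages
        to_copy = dirtied_per_round(round_no)  # step 3: list modified pages
        if len(to_copy) <= threshold:          # step 5: satisfied?
            break
    # Step 6: freeze, then copy only what is still dirty.
    return copied_live, len(to_copy)


if __name__ == "__main__":
    # Hypothetical workload whose dirty set halves every round.
    def halving(round_no):
        return set(range(800 // (2 ** round_no)))

    live, frozen = precopy_migrate(10000, halving, threshold=100)
    print("copied while running:", live)
    print("copied while frozen:", frozen)
```

With max_rounds=0 the model degenerates to the plain freeze-and-copy process, where every page is transferred while the container is frozen. The trade-off is more total data copied in exchange for a much shorter freeze.
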
Problem: upstreaming
● OpenVZ was developed separately
● Same for many past IBM Linux projects
(ELVM, CKRM, ...)
● Develop, then merge it upstream
(i.e. to vanilla Linux kernel)
● Problem:
grizzly bears (upstream developers)
do not accept massive patchsets
appearing out of nowhere
Solution 1: rewrite from scratch
● User Beancounters -> CGroups + controllers
● PID namespace: 2 rewrites until accepted
● Network namespace – rewritten
● It works!
● 1500+ patches ended up in vanilla
● OpenVZ made it into the top 10 contributors
Solution 2: circumvent the system!
● We tried hard to merge checkpoint/restore
● Other people tried hard too, no luck
● Can't make it to the kernel?
Let's riot: implement it in userspace
● With minimal kernel intervention when required
● The kernel already exports most of the information, so
let's just add the missing bits and pieces
CRIU
● Checkpoint / Restore [mostly] In Userspace
● About 3 years old, tools at version 1.6
● Users: Google, Samsung, Huawei, ...
● LXC & Docker – integrated!
● Kernel support already in upstream 3.x kernels
(CONFIG_CHECKPOINT_RESTORE)
● Live migration: P.Haul http://criu.org/P.Haul
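
For a feel of how the CRIU tools are driven, the sketch below builds the command lines for a basic checkpoint and restore using the criu dump and criu restore subcommands with the -t (process tree), -D (images directory), and --shell-job options. The helper names, the PID, and the directory are hypothetical; the commands are only printed, not run, since actually running them needs CRIU installed and root privileges.

```python
def criu_dump_cmd(pid, images_dir, shell_job=False):
    """Command line to checkpoint the process tree rooted at `pid`."""
    cmd = ["criu", "dump", "-t", str(pid), "-D", images_dir]
    if shell_job:
        cmd.append("--shell-job")  # for tasks started from a shell
    return cmd


def criu_restore_cmd(images_dir, shell_job=False):
    """Command line to restore a process tree from saved images."""
    cmd = ["criu", "restore", "-D", images_dir]
    if shell_job:
        cmd.append("--shell-job")
    return cmd


if __name__ == "__main__":
    # Hypothetical PID and image directory, for illustration only.
    print(" ".join(criu_dump_cmd(1234, "/tmp/ckpt", shell_job=True)))
    print(" ".join(criu_restore_cmd("/tmp/ckpt", shell_job=True)))
```

Copying the images directory to another machine between the two commands is the simplest form of migration; P.Haul automates that and adds the iterative memory transfer.
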
CRIU Linux kernel patches, per kernel version
Total: 176 (+11 this year)
[Bar chart: number of patches per kernel version, 3.3 through 4.1 plus pending; vertical axis 0 to 60]
Problem: common file system
● A container is just a directory on the host that we chroot() into
● The file system journal (metadata updates) is a bottleneck
● Lots of small-file I/O during container (CT) backup/migration
(sometimes rsync hangs or OOMs!)
● No sub-tree disk quota support in upstream
● No sub-tree snapshots
● Live migration: rsync changes inode numbers
● File system type and properties are fixed, the same for all CTs
Solution 1: LVM
● Works only on top of a block device
● Hard to manage
(e.g. how to migrate a huge volume?)
● No thin provisioning
Solution 2: loop device
(filesystem within a file)
● VFS operations lead to double page caching
– (already fixed in recent kernels)
● No thin provisioning
● Limited feature set
Solution 3: ZFS + zvol
● PRO: features
– zvol, thin provisioning, dedup, zfs send/receive
● CONTRA:
– Licensing is problematic
– Linux port issues (people report cache OOM)
– Was not available in 2008
Solution 4: ploop
● Basic idea: same as block loop, just better
● Modular design:
– various image formats (qcow2 in progress)
– various I/O backends (ext4, vfs O_DIRECT, nfs)
● Feature rich:
– online resize (grow and shrink, ballooning)
– instant live snapshots
– write tracker to facilitate faster live migration
Any problems or questions?
● kir@openvz.org
● Twitter: @kolyshkin @_openvz_ @__criu__

Editor's Notes

  1. Remember that on an earlier slide chroot() was a solution? Now it has become a problem.