Andrey Vagin <avagin@openvz.org>
CRIU - Checkpoint/Restore in User-space
FOSDEM 2015
2
Agenda
● History
● Under the hood
● Online (iterative) migration
● P.Haul
● How to integrate with/into CRIU
● Kernel impact
● Questions
3
History
● OpenVZ (2005)
– OpenVZ kernel
● Linux Checkpoint/Restart by Oren Laadan (2008)
– A non-mainline kernel
● CRIU (2011)
OpenVZ
2005
Linux C/R
2008
CRIU
2011
5
How does this work?
Kernel objects Process tree
crtools
Image files
Name-spaces
Files
Sockets
Pipes
001101
101010
110001
011010
000011
010101
001101
101010
110001
011010
000011
010101
001101
101010
110001
011010
000011
010101
001101
101010
110001
011010
000011
010101
001101
101010
110001
011010
000011
010101
001101
101010
110001
011010
000011
010101
6
Dump
● Parasite code
– Receive file descriptors
– Dump memory content
– prctl(), sigaction, pending signals, timers, etc.
● Ptrace
– Freeze processes
– Inject a parasite code
● Netlink
– Get information about sockets, netns
● Procfs
/proc/PID/maps, /proc/PID/map_files/,
/proc/PID/status, /proc/PID/mountinfo
7
Restore
● Collect shared objects
● Restore name-spaces
● Create a process tree
– Restore SID, PGID
– Restore objects, which should be inherited
● Files, sockets, pipes, ...
● Restore per-task properties.
● Restore memory
● Sim! Sala bim!
● Awesome
Namespaces
Processes
9
sigreturn()
User mode Kernel mode
Normal
program
flow
do_signal()
handle_signal()
setup_frame()
Signal
handler
Return code
on the stack
system_call()
sys_sigreturn()
restore_sigcontext()
10
Why to integrate with CRIU
● RPC protocol and a shared library
● Plugins
● Action scripts
– lock, unlock network
– setup name-spaces
– etc
● External resources
– Bind-mounts of host file systems
– One end of a socket pair or a tty pair
– etc
How to integrate with CRIU
11
Who is CRIU user?
14
Online migration
● Suspend the container on the source host
● Transfer images on the remote host
A size of images can be quite big, so the downtime is too long.
● Resume the container on the target host
● Transfer memory in a few iterations without freezing processes
● Freeze processes only on the last iteration
15
Transfer memory iteratively
Whole memory
Dirty memory
Freeze
16
P.Haul (process hauler) - Live migration using CRIU
Live migration using CRIU
● Iterative
● Optimal
● Customizable
#./p.haul ovz 100 10.30.25.213
Migration succeeded
total time is ~2.86 sec
frozen time is ~1.99 sec
( ['0.27', '0.18', '1.55'] )
restore time is ~0.86 sec
img sync time is ~0.32 sec
17
P.Haul (Process Hauler)
Pre-dump and sync FS
Freeze, dump, sync FS
Restore
Kill
Resume
post-dump
Resume
rollback
clean up
source host destination host
18
New features in the kernel
● Parasite code injection (by Tejun Heo)
– Read task states, that are currently retrieved by a task only about itself
● The kcmp() system call
– Helps checking which kernel objects are shared between processes
● Proc map_files directory
– Find out what exact file is mapped
– Mappings sharing info
● A bunch of prctl extensions
– Set various private stuff on task/mm objects (c/r-only feature)
● Last-pid sysctl
– Restore task with desired PID value
19
New features in a kernel
● Sockets information dumping via netlink (sock_diag)
– Extendable sockets state retrieving engine
● TCP repair mode
– Read intimate state of a TCP connection
and reconstructs it from scratch on a freshly created socket
● Virtual net devices indexes
– Allows to restore network devices in a namespace
● Socket peeking offset
– Allows peeking sockets queues (reading without removing data from queue)
● Task memory tracking
– incremental snapshots, online migration
20
Community
21
In a Nutshell, CRIU...
.... has had 5046 commits made by 44 contributors
representing 85414 lines of code
... is mostly written in C
with a very low number of source code comments
... has a young, but established codebase
maintained by a large development team
with stable Y-O-Y commits
... took an estimated 17 years of effort (COCOMO model)
starting with its first commit in September, 2011
https://www.ohloh.net/p/criu#
Thank you
http://criu.org
https://plus.google.com/+CriuOrg
criu@openvz.org

FOSDEM 2015: Live migration for containers is around the corner

  • 1.
    Andrey Vagin <avagin@openvz.org> CRIU- Checkpoint/Restore in User-space FOSDEM 2015
  • 2.
    2 Agenda ● History ● Underthe hood ● Online (iterative) migration ● P.Haul ● How to integrate with/into CRIU ● Kernel impact ● Questions
  • 3.
    3 History ● OpenVZ (2005) –OpenVZ kernel ● Linux Checkpoint/Restart by Oren Laadan (2008) – A non-mainline kernel ● CRIU (2011) OpenVZ 2005 Linux C/R 2008 CRIU 2011
  • 4.
    5 How does thiswork? Kernel objects Process tree crtools Image files Name-spaces Files Sockets Pipes 001101 101010 110001 011010 000011 010101 001101 101010 110001 011010 000011 010101 001101 101010 110001 011010 000011 010101 001101 101010 110001 011010 000011 010101 001101 101010 110001 011010 000011 010101 001101 101010 110001 011010 000011 010101
  • 5.
    6 Dump ● Parasite code –Receive file descriptors – Dump memory content – prctl(), sigaction, pending signals, timers, etc. ● Ptrace – Freeze processes – Inject a parasite code ● Netlink – Get information about sockets, netns ● Procfs /proc/PID/maps, /proc/PID/map_files/, /proc/PID/status, /proc/PID/mountinfo
  • 6.
    7 Restore ● Collect sharedobjects ● Restore name-spaces ● Create a process tree – Restore SID, PGID – Restore objects, which should be inherited ● Files, sockets, pipes, ... ● Restore per-task properties. ● Restore memory ● Sim! Sala bim! ● Awesome Namespaces Processes
  • 7.
    9 sigreturn() User mode Kernelmode Normal program flow do_signal() handle_signal() setup_frame() Signal handler Return code on the stack system_call() sys_sigreturn() restore_sigcontext()
  • 8.
    10 Why to integratewith CRIU ● RPC protocol and a shared library ● Plugins ● Action scripts – lock, unlock network – setup name-spaces – etc ● External resources – Bind-mounts of host file systems – One end of a socket pair or a tty pair – etc How to integrate with CRIU
  • 9.
  • 10.
    14 Online migration ● Suspendthe container on the source host ● Transfer images on the remote host A size of images can be quite big, so the downtime is too long. ● Resume the container on the target host ● Transfer memory in a few iterations without freezing processes ● Freeze processes only on the last iteration
  • 11.
    15 Transfer memory iteratively Wholememory Dirty memory Freeze
  • 12.
    16 P.Haul (process hauler)- Live migration using CRIU Live migration using CRIU ● Iterative ● Optimal ● Customizable #./p.haul ovz 100 10.30.25.213 Migration succeeded total time is ~2.86 sec frozen time is ~1.99 sec ( ['0.27', '0.18', '1.55'] ) restore time is ~0.86 sec img sync time is ~0.32 sec
  • 13.
    17 P.Haul (Process Hauler) Pre-dumpand sync FS Freeze, dump, sync FS Restore Kill Resume post-dump Resume rollback clean up source host destination host
  • 14.
    18 New features inthe kernel ● Parasite code injection (by Tejun Heo) – Read task states, that are currently retrieved by a task only about itself ● The kcmp() system call – Helps checking which kernel objects are shared between processes ● Proc map_files directory – Find out what exact file is mapped – Mappings sharing info ● A bunch of prctl extensions – Set various private stuff on task/mm objects (c/r-only feature) ● Last-pid sysctl – Restore task with desired PID value
  • 15.
    19 New features ina kernel ● Sockets information dumping via netlink (sock_diag) – Extendable sockets state retrieving engine ● TCP repair mode – Read intimate state of a TCP connection and reconstructs it from scratch on a freshly created socket ● Virtual net devices indexes – Allows to restore network devices in a namespace ● Socket peeking offset – Allows peeking sockets queues (reading without removing data from queue) ● Task memory tracking – incremental snapshots, online migration
  • 16.
  • 17.
    21 In a Nutshell,CRIU... .... has had 5046 commits made by 44 contributors representing 85414 lines of code ... is mostly written in C with a very low number of source code comments ... has a young, but established codebase maintained by a large development team with stable Y-O-Y commits ... took an estimated 17 years of effort (COCOMO model) starting with its first commit in September, 2011 https://www.ohloh.net/p/criu#
  • 18.