Linux Containers, Fedora Virtualization Day
Andrey Vagin <avagin@openvz.org>, 1 June 2013, Moscow

Fedora Virtualization Day: Linux Containers & CRIU

  • Speaker note (History slide): BLCR uses a kernel module and doesn't checkpoint sockets, SysV IPC, zombies, etc. Applications must be linked with a library and started via a helper. DMTCP also uses a launcher, but doesn't require a kernel module. OpenVZ C/R is used to checkpoint, restore and migrate OpenVZ containers; it requires the OpenVZ kernel. Linux C/R is very similar to OpenVZ C/R; it is used to checkpoint/restore LXC containers. CRIU combines all these projects. It will work on the pure upstream kernel and is able to dump a task without any preparation.

    2. Different types of Virtualization
       ● Virtual Machines
         – Emulation (QEMU)
         – Paravirtualization (Xen)
         – Hardware Virtualization (KVM, ESX)
       ● OS-Level Virtualization
         – Containers (Linux Containers, Solaris Zones, BSD Jails)
    3. Virtual Machine (VM)
       [Diagram: Hardware, then a Hypervisor, then four guests, each made of Virtual HW, Kernel and Apps]
    4. Containers (CT)
       [Diagram: Hardware, then one Host Kernel, then four containers, each made of Apps in Namespaces]
       ● chroot() on steroids
    6. Comparison: VMs vs CTs
       ● One real HW, many virtual HW, many OSes
       ● One real HW, one kernel, many userspace instances
       ● Full control over the guest OS
       ● Native performance: [almost] no overhead
       ● High density
       ● KSM (Kernel Samepage Merging)
       ● Use resources on demand
       ● Dynamic resource allocation
       ● Naturally share pages
       ● Depends on hardware (VT-x, VT-d, EPT, etc.)
       ● Not all functionality is virtualized
       ● Flexibility
    9. Evolution of Operating Systems
       ● Multitask: many processes
       ● Multiuser: many users
       ● Multicontainer: many containers
    10. Containers (CT)
        ● Cgroups – control resources
          – cpu, cpuacct, cpuset
          – blkio
          – memory
          – net_cls
        ● Namespaces – isolate environments
          – MNT
          – PID
          – NET
          – IPC
          – User
          – UTS
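A minimal Python sketch (not from the talk) of how the two mechanisms on this slide are visible from userspace: each namespace of a process appears as a symlink under /proc/<pid>/ns, and its cgroup membership is listed in /proc/<pid>/cgroup. It assumes a Linux host with /proc mounted; the function names namespace_ids and cgroup_memberships are hypothetical.

```python
import os

# /proc/<pid>/ns entry names for the namespaces listed on the slide.
NS_ENTRIES = ["mnt", "pid", "net", "ipc", "user", "uts"]


def namespace_ids(pid="self"):
    """Map each namespace entry to its link target, e.g. 'pid:[4026531836]'.

    Two processes are in the same namespace of a given type when their
    link targets for that type are equal.
    """
    ids = {}
    for name in NS_ENTRIES:
        try:
            ids[name] = os.readlink(f"/proc/{pid}/ns/{name}")
        except FileNotFoundError:
            # Older kernels do not expose every entry (e.g. "user" before 3.8).
            pass
    return ids


def cgroup_memberships(pid="self"):
    """Parse /proc/<pid>/cgroup into (hierarchy_id, controllers, path) tuples."""
    entries = []
    with open(f"/proc/{pid}/cgroup") as f:
        for line in f:
            hierarchy_id, controllers, path = line.rstrip("\n").split(":", 2)
            entries.append((hierarchy_id, controllers, path))
    return entries


if __name__ == "__main__":
    for name, target in namespace_ids().items():
        print(f"{name:5} {target}")
    for entry in cgroup_memberships():
        print(entry)
```

Run once on the host and once inside a container: the namespaces the container unshares should show different link targets, and the cgroup paths should point into the container's own subtree.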
    11. How to execute CT
        ● All allowed by default
          – unshare, nsenter
          – Systemd Lightweight Containers
          – LXC
          – Libvirt LXC
        ● All restricted by default
          – OpenVZ (vzctl-core) (FC19)
    12. vzctl - perform various operations on a container
        # yum install -y vzctl-core
        # vzctl create 101 --ostemplate fedora-15
        # vzctl start 101
        # vzctl exec 101 ps ax
          PID TTY      STAT   TIME COMMAND
            1 ?        Ss     0:00 init
        11830 ?        Ss     0:00 syslogd -m 0
        11897 ?        Ss     0:00 /usr/sbin/sshd
        11943 ?        Ss     0:00 xinetd -stayalive -pidfile ...
        12218 ?        Ss     0:00 sendmail: accepting connections
        12265 ?        Ss     0:00 sendmail: Queue runner@01:00:00
        13362 ?        Ss     0:00 /usr/sbin/httpd
        13363 ?        S      0:00  \_ /usr/sbin/httpd
        ...
         6416 ?        Rs     0:00 ps axf
        # vzctl stop 101
        # vzctl destroy 101
    13. OpenVZ kernel-only features
        ● Ploop (snapshots, backups, different formats)
        ● Second-level quota
        ● More capable memory accounting
        ● PFCache (memory deduplication, I/O operation savings)
        ● Better isolation compared with FC19 (which lacks userns)
    14. Questions? http://openvz.org
    15. CRIU - Checkpoint/Restore in User-space
        Andrey Vagin <avagin@openvz.org>
    16. What is C/R and how can it be used?
        C/R is the ability to save the state of processes and to restore it later.
        Usage scenarios:
          – Failure recovery
          – Live migration
          – Reboot-less upgrade
          – Speeding up slow-booting services
          – HPC issues
    17. History
        ● Berkeley Lab Checkpoint/Restart (BLCR) (2003)
          – Load a kernel module and link with a library
        ● DMTCP: Distributed MultiThreaded CheckPointing (2004-2006)
          – Preload a library
        ● OpenVZ (2005)
          – OpenVZ kernel
        ● Linux Checkpoint/Restart by Oren Laadan (2008)
          – A non-mainline kernel
        ● CRIU (2011)
        [Timeline: BLCR 2003, OpenVZ 2005, DMTCP 2007, Linux C/R 2008, CRIU 2011]
    18. How does this work?
        [Diagram: crtools between the kernel objects of a process tree (name-spaces, files, sockets, pipes) and the image files]
    19. Kernel interfaces
        [Diagram: Dump and Restore, built on syscalls, netlink, /proc and ptrace]
    20. Dump
        ● Parasite code
          – Receive file descriptors
          – Dump memory content
          – prctl(), sigaction, pending signals, timers, etc.
        ● Ptrace
          – Freeze processes
          – Inject the parasite code
        ● Netlink
          – Get information about sockets, netns
        ● Procfs
          – /proc/PID/maps, /proc/PID/map_files/, /proc/PID/status, /proc/PID/mountinfo
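The Procfs bullet is the easiest part of the dump to try by hand. The sketch below, in Python rather than CRIU's C, parses /proc/PID/maps lines into records resembling the VMA list a dumper needs; the Mapping class and the helper names are hypothetical, and suffixes such as " (deleted)" are simply kept as part of the path.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mapping:
    start: int            # first address of the VMA
    end: int              # one past the last address
    perms: str            # e.g. "r-xp"; the last letter is p (private) or s (shared)
    offset: int           # file offset of the mapping
    dev: str              # "major:minor" of the backing file's device
    inode: int            # 0 for anonymous mappings
    path: Optional[str]   # file name, "[heap]", "[stack]", ..., or None


def parse_maps_line(line: str) -> Mapping:
    """Parse one line of /proc/PID/maps (the path may contain spaces)."""
    fields = line.rstrip("\n").split(maxsplit=5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5] if len(fields) == 6 else None
    return Mapping(start, end, fields[1], int(fields[2], 16),
                   fields[3], int(fields[4]), path)


def read_maps(pid="self") -> List[Mapping]:
    with open(f"/proc/{pid}/maps") as f:
        return [parse_maps_line(line) for line in f]


if __name__ == "__main__":
    for m in read_maps():
        kind = "shared" if m.perms.endswith("s") else "private"
        print(f"{m.start:#x}-{m.end:#x} {kind:7} {m.path or '(anonymous)'}")
```

The maps file alone does not contain memory contents; per the slide, those are dumped by the parasite code running inside the task.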
    21. Restore
        ● Collect shared objects
        ● Restore name-spaces
        ● Create a process tree
          – Restore SID, PGID
          – Restore objects that must be inherited
            ● Files, sockets, pipes, ...
        ● Restore per-task properties
        ● Restore memory
        ● Call sigreturn
        ● Awesome
        [Diagram labels: Namespaces, Processes]
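A Python sketch of why SID and PGID are restored while the tree is being built: a session can only be created by the task that will lead it (setsid()), and a child inherits its parent's session at fork time, so the order of calls matters. This is an illustration under those assumptions, not CRIU's restorer; restore_session_tree and _report are hypothetical names.

```python
import json
import os


def _report(fd):
    """Write this task's pid/sid/pgid as one JSON line to fd."""
    info = {"pid": os.getpid(), "sid": os.getsid(0), "pgid": os.getpgid(0)}
    os.write(fd, (json.dumps(info) + "\n").encode())


def restore_session_tree():
    """Build a two-task tree: a session leader and a member with its own group.

    Returns (leader_info, member_info) as seen by the tasks themselves.
    """
    r, w = os.pipe()
    leader = os.fork()
    if leader == 0:
        status = 1
        try:
            os.close(r)
            os.setsid()            # new session and group, both equal to our PID
            member = os.fork()     # the child inherits the session at fork time
            if member == 0:
                os.setpgid(0, 0)   # own process group inside the same session
                _report(w)
                os._exit(0)
            os.waitpid(member, 0)  # member reports first, then the leader
            _report(w)
            status = 0
        finally:
            os._exit(status)
    os.close(w)
    os.waitpid(leader, 0)
    with os.fdopen(r) as f:
        member_info, leader_info = (json.loads(line) for line in f)
    return leader_info, member_info
```

Calling setsid() in the member instead would give it a separate session, which is why the tree has to be recreated top-down with the right calls at each level.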
    22. Interesting moments
        ● How to restore shared objects?
          – Send file descriptors via unix sockets
          – Map files from /proc/self/map_files/ to restore anonymous shared mappings
        ● How to restore memory mappings at the correct places?
          – Map a new code block and a stack
          – Unmap crtools mappings
          – Remap the task's mappings to the correct places
        ● How to resume a process?
          – Create a signal frame
          – Call sigreturn()
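Descriptor passing over unix sockets can be sketched with Python's socket.send_fds and socket.recv_fds (Python 3.9+), which wrap an SCM_RIGHTS control message. The received descriptor is a new number for the same open file, which is what lets one task hand a shared file to another; the helper share_pipe_end and the use of a pipe as the shared object are assumptions for illustration.

```python
import os
import socket


def share_pipe_end():
    """Pass a pipe's read end over an AF_UNIX socket and read through the copy.

    Returns (data read via the received fd, whether the fd number differs,
    whether both fds refer to the same pipe inode).
    """
    r, w = os.pipe()
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with sender, receiver:
        # send_fds attaches an SCM_RIGHTS control message carrying r.
        socket.send_fds(sender, [b"x"], [r])
        _msg, fds, _flags, _addr = socket.recv_fds(receiver, 16, 1)
    copy = fds[0]
    distinct = copy != r
    same_pipe = os.fstat(copy).st_ino == os.fstat(r).st_ino
    os.close(r)                 # the received fd alone keeps the read end open
    os.write(w, b"shared")
    data = os.read(copy, 6)
    os.close(copy)
    os.close(w)
    return data, distinct, same_pipe
```

In a real restore the sender and receiver would be different tasks, and the receiver would then dup2() the descriptor onto the number recorded in the image.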
    23. Kernel impact
        ● ~140 patches merged, ~10 patches in flight
        ● ~11 new features appeared, ~2 new features to come
    24. New features in the kernel
        ● Parasite code injection (by Tejun Heo)
          – Read task state that is currently retrievable only by the task itself
        ● The kcmp() system call
          – Helps check which kernel objects are shared between processes
        ● Proc map_files directory
          – Find out exactly which file is mapped
          – Mapping-sharing info
        ● A bunch of prctl extensions
          – Set various private fields of task/mm objects (C/R-only feature)
        ● Last-pid sysctl
          – Restore a task with the desired PID value
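The last-pid sysctl lets a privileged process set the PID that was "last allocated", so the next fork() in that PID namespace receives the following number. A Python sketch, assuming the sysctl is exposed as /proc/sys/kernel/ns_last_pid (present only on kernels built with checkpoint/restore support); read_last_pid and fork_with_pid are hypothetical helpers, and only the read is unprivileged.

```python
import os

LAST_PID = "/proc/sys/kernel/ns_last_pid"


def read_last_pid() -> int:
    """Return the last PID allocated in the caller's PID namespace."""
    with open(LAST_PID) as f:
        return int(f.read())


def fork_with_pid(desired: int) -> int:
    """Try to fork a child whose PID is `desired`.

    Writing desired - 1 makes the kernel hand out `desired` to the next fork
    in this PID namespace, provided that PID is free and no other process
    forks in between (callers must serialize against that race themselves).
    Needs CAP_SYS_ADMIN, or CAP_CHECKPOINT_RESTORE on Linux 5.9+, in the user
    namespace owning the PID namespace. Like os.fork(), returns 0 in the child
    and the child's actual PID in the parent, which the caller should compare
    with `desired`.
    """
    with open(LAST_PID, "w") as f:
        f.write(str(desired - 1))
    return os.fork()


if __name__ == "__main__":
    print("last allocated PID:", read_last_pid())
```

The race noted in the docstring is why a restorer has to make sure nothing else forks in the namespace between the write and the fork.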
    25. New features in the kernel
        ● TCP repair mode
          – Read the intimate state of a TCP connection and reconstruct it from scratch on a freshly created socket
        ● Socket information dumping via netlink (sock_diag)
          – Extensible engine for retrieving socket state
        ● Virtual net device indexes
          – Allow restoring network devices in a namespace
        ● Socket peeking offset
          – Allows peeking at socket queues (reading without removing data from the queue)
        ● Task memory tracking
          – Incremental snapshots, online migration
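The socket peeking offset (SO_PEEK_OFF) can be tried without privileges on an AF_UNIX stream pair: once the option is set, each MSG_PEEK read continues where the previous peek stopped, while the queue itself stays intact. This Python sketch assumes the Linux value 42 for SO_PEEK_OFF when the socket module does not export it (true for x86 and ARM, not every architecture); peek_in_chunks is a hypothetical helper.

```python
import socket

# Python's socket module may not export SO_PEEK_OFF; 42 is the value from the
# generic Linux socket.h used on x86 and ARM (other architectures differ).
SO_PEEK_OFF = getattr(socket, "SO_PEEK_OFF", 42)


def peek_in_chunks(data: bytes, chunk: int):
    """Peek at a queued message chunk by chunk, then read it normally.

    Returns (list of peeked chunks, bytes returned by the normal read).
    """
    writer, reader = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with writer, reader:
        writer.sendall(data)
        # Enable the option at offset 0: each MSG_PEEK now starts where
        # the previous peek ended instead of at the head of the queue.
        reader.setsockopt(socket.SOL_SOCKET, SO_PEEK_OFF, 0)
        peeked = [reader.recv(chunk, socket.MSG_PEEK)
                  for _ in range(0, len(data), chunk)]
        # Peeking removed nothing, so a normal read still sees everything.
        remaining = reader.recv(len(data))
    return peeked, remaining
```

Without the option, every MSG_PEEK call would return the same first chunk, so a dumper could not walk a long queue without consuming it.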
    26. What is already supported?
        – x86_64 architecture
        – Process tree linkage
        – Multi-threaded apps
        – All kinds of memory mappings
        – Terminals, groups, sessions
        – Open files (shared and unlinked)
        – Established TCP connections
        – Unix sockets, packet sockets
        – Name-spaces (net, mount, ipc)
        – Non-POSIX files (epoll, inotify)
        – Pipes, FIFOs, IPC, ...
        – ARM architecture
        – Pending signals
        – TCP timestamps
        – Iterative snapshots
        – VDSO
        – LXC and OpenVZ containers
        In flight
        – POSIX timers
        – Converting OpenVZ images
    27. How is CRIU tested?
        ● ZDTM – a set of unit tests
        ● Real-life applications
          – Apache, Nginx
          – MySQL, MongoDB, Oracle
          – make && gcc
          – tar & gzip
          – Screen
          – Java
          – LXC
          – VNC server + GUI applications
    28. Future plans (Feb 2013)
        ● Support all kinds of kernel objects
        ● Merge all in-flight patches into the mainline kernel
        ● Integrate CRIU with OpenVZ and LXC utilities
        ● Iterative migration
          – Migrate memory content before freezing applications
        ● Integration in distributions
          – CRIU was accepted into Fedora 19
    29. How to use
        ● ./crtools dump -t pid [<options>]
          – checkpoint a process/tree identified by pid
        ● ./crtools restore -t pid [<options>]
          – restore a process/tree identified by pid
        ● ./crtools show (-D dir)|(-f file) [<options>]
          – show dump file(s) contents
        ● ./crtools check
          – check whether the kernel support is up-to-date
        ● ./crtools exec -t pid <syscall-string>
          – execute a system call in the context of another task
    30. Checkpoint/restore of a VNC server.
    31. Questions? http://criu.org
