CRIU: Time and Space Travel Service for Linux Applications
Kir Kolyshkin
Texas Linux Fest, 14 Jun 2014

Checkpoint/restore is a technology that makes it possible to take a snapshot of running Linux processes and restore those processes at any other place and time. This opens up various possibilities, such as live migration, keeping HPC tasks safe from hardware problems, cloud services, and dynamic load balancing. Despite being a very tempting feature to have, Linux lacked it for quite a long time.

The Checkpoint-Restore In Userspace (CRIU) project is The One to make this technology real. This talk covers the project history, its dependence on and influence on the Linux kernel, and then goes on to usage scenarios that are now real with CRIU and that will be possible in the future.

The talk will be interesting to anyone who knows Linux as a user, but especially to system and distribution developers, advanced users, and anyone involved in containers, virtualization, HA, or HPC.
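
As a rough illustration of the snapshot-and-restore cycle described above, the sketch below builds `criu dump` and `criu restore` command lines and prints them; it does not execute anything. The options used (`--tree`, `--images-dir`, `--shell-job`, `--leave-running`, `--restore-detached`) are CRIU command-line options; the helper names, the PID, and the image directory are hypothetical and only there for illustration.

```python
from __future__ import annotations

import shlex


def dump_cmd(pid: int, images_dir: str, shell_job: bool = False,
             leave_running: bool = False) -> list[str]:
    """Build a `criu dump` command for the process tree rooted at `pid`."""
    cmd = ["criu", "dump", "--tree", str(pid), "--images-dir", images_dir]
    if shell_job:
        # Needed when the process was started from an interactive shell.
        cmd.append("--shell-job")
    if leave_running:
        # Keep the process alive after the snapshot is taken.
        cmd.append("--leave-running")
    return cmd


def restore_cmd(images_dir: str, shell_job: bool = False,
                detached: bool = False) -> list[str]:
    """Build a `criu restore` command that reads images from `images_dir`."""
    cmd = ["criu", "restore", "--images-dir", images_dir]
    if shell_job:
        cmd.append("--shell-job")
    if detached:
        # Let criu exit once the restored processes are running.
        cmd.append("--restore-detached")
    return cmd


if __name__ == "__main__":
    # Hypothetical PID and directory; CRIU itself normally needs root.
    print(shlex.join(dump_cmd(1234, "/tmp/criu-images", shell_job=True)))
    print(shlex.join(restore_cmd("/tmp/criu-images", shell_job=True)))
```

The image directory is the only thing the two commands share, so "any other place" comes down to making that directory (and the files the processes had open) available on the target host.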


  1. CRIU: Time and Space Travel Service for Linux Applications. Kir Kolyshkin, Texas Linux Fest, 14 Jun 2014
  2. Agenda
     What is CRIU?
     Project history and state
     Usage scenarios
       – Live migration
       – Reboot-less kernel upgrade
       – Slow services startup
       – Advanced debugging and testing
       – and more...
  3. What is CRIU?
     Checkpoint Restore In Userspace
     (diagram: Checkpoint, or Dump; full info about state; Restore, or Restart)
  4. CRIU pre-history
     ● OpenVZ project
     ● Containers live migration feature
     ● Containers → Upstream Linux
     ● 1500+ kernel patches from us
     ● Kernel-level checkpoint-restore merge failed
     ● User-level checkpoint-restore ...
  5. Why in userspace?
     (diagram: Kernel, User-space, Process, kmod, C/R API)
     Dump:
       - ptrace
       - /proc
       - netlink
       - syscalls
     Restore:
       - syscalls
  6. Some history
     Project started almost 3 years ago
       – an RFC on kernel memory API extension
       – small command line tool
       – minimal dump of process internals
     First release
       – v0.1, 23 Jul 2012 (x86 and basic stuff)
     Since then
       – Kernel part completed a year ago (150+ kernel patches: new APIs for reading and setting process state)
  7. Current project state
     The latest release
       – v1.3rc1
       – supports x86_64, ARM, and AArch64
       – supports features that typical apps use
       – works on unmodified linux-3.11+
       – included in Debian, Fedora, Ubuntu, Arch, SUSE, Gentoo, CoreOS...
     Explicitly checked
       – Apache, nginx, Oracle*, mysql, mongodb
       – ssh/sshd, openvpn, cron, sendmail
       – Java, gcc, make
       – VNC + { gimp, mplayer, blender, supertux }
       – Screen + { bash, top, tcpdump, tar/bz2 }
     * some kernel tweaks required
  8. Some vitals
       - 55K lines of code
       - 150+ kernel patches
       - contribs from Google, Huawei, Samsung, Canonical
  9. Usage scenarios
     ● Live migration
       ● incl. Docker, LXC, OpenVZ containers
     ● Kernel upgrade w/o reboot
     ● Slow services startup
     ● Periodic snapshots (HPC)
     ● Advanced debugging and testing
  10. Live migration (diagram: Host A, Host B)
  11. Live migration (diagram: Host A, Host B, Shared FS)
      Pre-migrate memory with memory tracker
      http://criu.org/P.Haul
  12. Load balancing on cluster (diagram: Host A, Host B, Host C)
  13. Power saving on cluster (diagram: Host A, Host B, Host C)
  14. Node maintenance (diagram: Host A, Host B)
  15. Kernel upgrade w/o reboot (diagram: Host; Kernel A, Kexec, Kernel B)
  16. Slow services startup
      # service foo start
      (timeline figure: service readiness over time; Spawn process, Load config, Top-up caches, Initialize resource pools, Ready; 100% readiness at time T)
  17. Slow services startup
      # service foo restore
      (timeline figure: service readiness over time; Spawn process, Ready; 100% readiness at t < T)
  18. Periodic snapshots (timeline figure)
      Memory tracker helps to keep images smaller
  19. HPC (timeline figure: progress 0%, 20%, 40%, 60%; power failure; 60%)
  20. Advanced debugging (diagram: Production Host with application in trouble; Developer Host with debugger)
  21. Advanced testing (diagram: ...; New test or new hardware?)
  22. More (funny) use cases
      Forgot to launch your program in screen
        – Live-migrate it there
      Playing a game without the save button
        – Snapshot it
      [Put your own use case here]
      http://criu.org/Usage_scenarios
  23. Recap
      ● Started as containers live-migration tool
      ● General tool to dump/restore apps state
      ● v1.2 + Linux-3.11+ can do the trick
      ● A lot of interesting technologies
        ● Memory tracker
        ● Migration of TCP connections
        ● Injecting your code into a running application
        ● Detecting kernel objects sharing
        ● etc.
  24. Resources
      http://criu.org – main site, documentation
      http://git.criu.org – git repo with tool sources
      http://plus.google.com/+CRIU page
      criu@openvz.org mailing list
      Kir Kolyshkin <kir@openvz.org> (that's me)
      Thank you!
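
Slide 5 lists /proc as one of the interfaces used during dump. As a small illustration of what "reading process state from /proc" can look like, the sketch below parses lines of `/proc/<pid>/maps`, the standard Linux file describing a process's memory mappings. This is not CRIU's code; the `Mapping` type and `parse_maps_line` helper are hypothetical names chosen for this example.

```python
from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass
class Mapping:
    start: int   # first address of the mapping
    end: int     # address just past the mapping
    perms: str   # e.g. "r-xp"
    offset: int  # offset into the backing file
    path: str    # backing file, or "" for anonymous memory


def parse_maps_line(line: str) -> Mapping:
    """Parse one line of /proc/<pid>/maps.

    Layout: address-range perms offset dev inode [pathname]
    """
    fields = line.split(maxsplit=5)
    start, end = (int(part, 16) for part in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(start, end, fields[1], int(fields[2], 16), path)


if __name__ == "__main__":
    maps_file = "/proc/self/maps"
    if os.path.exists(maps_file):  # Linux only
        with open(maps_file) as f:
            mappings = [parse_maps_line(line) for line in f]
        total = sum(m.end - m.start for m in mappings)
        print(f"{len(mappings)} mappings, {total} bytes of address space")
```

A real dump needs far more than this (registers via ptrace, sockets via netlink, and so on), which is what the other entries on the slide refer to.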

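The slides mention a memory tracker (slides 11, 18, and 23) without describing how it works. One kernel facility that supports this kind of tracking is the soft-dirty bit: writing `4` to `/proc/<pid>/clear_refs` resets it, and afterwards bit 55 of each 64-bit entry in `/proc/<pid>/pagemap` is set for pages written since the reset. The sketch below assumes that mechanism and only decodes pagemap entries from a byte buffer; the function name is hypothetical, and reading a real pagemap file usually requires elevated privileges.

```python
from __future__ import annotations

import struct

ENTRY_SIZE = 8             # each pagemap entry is one 64-bit word
PRESENT_BIT = 1 << 63      # page is present in RAM
SOFT_DIRTY_BIT = 1 << 55   # page was written since the last clear_refs reset


def dirty_page_indices(pagemap_bytes: bytes) -> list[int]:
    """Return indices of entries whose soft-dirty bit is set.

    `pagemap_bytes` is a slice of /proc/<pid>/pagemap (native byte order);
    any trailing partial entry is ignored.
    """
    count = len(pagemap_bytes) // ENTRY_SIZE
    entries = struct.unpack(f"={count}Q", pagemap_bytes[: count * ENTRY_SIZE])
    return [i for i, entry in enumerate(entries) if entry & SOFT_DIRTY_BIT]


if __name__ == "__main__":
    # Synthetic data: three pages, only the second one modified.
    sample = struct.pack(
        "=3Q", PRESENT_BIT, PRESENT_BIT | SOFT_DIRTY_BIT, PRESENT_BIT
    )
    print(dirty_page_indices(sample))
```

Only the pages reported here would need to be written again in a follow-up snapshot, which is consistent with the slide's remark that the memory tracker keeps images smaller.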