
Docker: Principles and Implementation

The technology behind Docker.
Prepared for OSDC.tw 2014.

Published in: Technology

  1. Docker: Principles and Implementation (果凍)
  2. About the speaker
     ● Works at 迎廣科技
       ○ python
       ○ openstack
     ● http://about.me/ya790206
     ● http://blog.blackwhite.tw/
     ● https://github.com/ya790206/call_seq
  3. Agenda
     ● linux kernel namespace
     ● seccomp
     ● cgroup
     ● lxc
     ● docker
  4. docker
     ● Lightweight, portable, self-sufficient containers.
     ● Processes running in one container are isolated from processes running in other containers.
  5. Linux startup process
     ● Linux startup process
       ○ Boot loader ->
       ○ Kernel ->
       ○ Init process
     ● Differences between Linux distros:
       ○ package manager
       ○ init
  6. Docker (diagram; surrounding components):
     ● Autofs
     ● lxc
     ● Kernel namespaces
     ● Apparmor and SELinux profiles
     ● Seccomp policies
     ● Control groups
     ● Kernel capabilities
     ● Chroots
     ● btrfs
  7. kernel namespace
     ● The purpose of each namespace is to wrap a particular global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource.
     ● Private view
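  8. kernel pid namespace
     ● root pid namespace
       ○ pid 1 (pid 1)
       ○ pid 2 (pid 2)
     ● pid namespace x (nested inside the root pid namespace)
       ○ pid 3 (pid 1)
       ○ pid 4 (pid 2)
     ● First number (black on the slide): the real pid.
     ● Number in parentheses (red on the slide): the pid the process gets from getpid().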
  9. kernel namespace
     ● Mount namespaces
     ● UTS namespaces
     ● PID namespaces
     ● Network namespaces
     ● User namespaces
     ● IPC namespaces
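The namespace types listed on this slide can be inspected for a live process. The following is a minimal sketch, not part of the original deck, that assumes a Linux system with /proc mounted; the function name current_namespaces is made up for illustration. It reads the symlinks under /proc/self/ns, whose targets identify the namespace instances the process belongs to.

```python
import os

NAMESPACE_DIR = "/proc/self/ns"  # standard location on Linux


def current_namespaces(ns_dir=NAMESPACE_DIR):
    """Return {namespace type: identifier} for the calling process.

    Each entry under /proc/<pid>/ns is a symlink whose target looks like
    'uts:[4026531838]'. Processes that share a namespace show the same
    identifier for that entry.
    """
    result = {}
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return result  # not Linux, or /proc is not mounted
    for name in names:
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # entry not readable in this environment
    return result


if __name__ == "__main__":
    for name, ident in current_namespaces().items():
        print(f"{name:10s} {ident}")
```

Running the script once on the host and once inside a container should show different identifiers for every namespace type the container does not share with the host.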
  10. int child_pid = clone(child_main, child_stack+STACK_SIZE,
          CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWPID | SIGCHLD, NULL);
      ● https://gist.github.com/ya790206/9855021
  11. The tail is still showing (the isolation is not yet complete)
  12. int child_pid = clone(child_main, child_stack+STACK_SIZE,
          CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWPID | CLONE_NEWNS | SIGCHLD, NULL);
      mount("proc", "/proc", "proc", 0, NULL);
      ● https://gist.github.com/ya790206/9855094
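The two clone() calls differ by a single flag. As a rough illustration (not from the original deck), the sketch below decodes a clone() flags value into the namespaces it requests. The flag values are the ones defined in <linux/sched.h>; the helper describe_clone_flags and the variable names are hypothetical.

```python
import signal

# Values from <linux/sched.h>.
CLONE_FLAGS = {
    "CLONE_NEWNS": 0x00020000,    # mount namespace
    "CLONE_NEWUTS": 0x04000000,   # hostname and domain name
    "CLONE_NEWIPC": 0x08000000,   # System V IPC, POSIX message queues
    "CLONE_NEWUSER": 0x10000000,  # user and group IDs
    "CLONE_NEWPID": 0x20000000,   # process IDs
    "CLONE_NEWNET": 0x40000000,   # network stack
}
CSIGNAL = 0x000000FF  # low byte: signal sent to the parent when the child exits
SIGCHLD = int(signal.SIGCHLD)


def describe_clone_flags(flags):
    """Return (sorted namespace flag names, exit signal number)."""
    names = sorted(name for name, bit in CLONE_FLAGS.items() if flags & bit)
    return names, flags & CSIGNAL


# Flags from the first clone() call (slide 10).
first_call = (CLONE_FLAGS["CLONE_NEWUTS"] | CLONE_FLAGS["CLONE_NEWIPC"]
              | CLONE_FLAGS["CLONE_NEWPID"] | SIGCHLD)
# Flags from the second clone() call (slide 12): adds a mount namespace.
second_call = first_call | CLONE_FLAGS["CLONE_NEWNS"]

if __name__ == "__main__":
    print(describe_clone_flags(first_call))
    print(describe_clone_flags(second_call))
```

The extra CLONE_NEWNS gives the child its own mount namespace, so the mount of proc on /proc affects only the child's view and not the host's.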
  13. seccomp
      ● A process running in seccomp (strict) mode is severely limited in what it can do.
      ● It can make only four system calls: read(), write(), exit(), and sigreturn(). read() and write() work only on already-open file descriptors.
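The restriction can be observed directly. The sketch below is not from the original deck and makes several assumptions: a Linux host, a libc that exposes prctl(), and a process that is not already running under a seccomp filter (inside a Docker container, for example, enabling strict mode may fail). The function name strict_mode_kills_child is hypothetical. It forks a child, switches the child to strict mode, and has it make a system call outside the allowed four.

```python
import ctypes
import os
import signal
import sys

PR_SET_SECCOMP = 22      # from <linux/prctl.h>
SECCOMP_MODE_STRICT = 1  # from <linux/seccomp.h>


def strict_mode_kills_child():
    """Fork a child, enable strict seccomp in it, then make a forbidden call.

    Returns True if the kernel killed the child with SIGKILL, False if the
    child survived, and None if strict mode could not be enabled here.
    """
    if not sys.platform.startswith("linux"):
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    libc.prctl.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4
    libc.prctl.restype = ctypes.c_int

    pid = os.fork()
    if pid == 0:  # child
        if libc.prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) != 0:
            os._exit(2)  # seccomp unavailable, or a filter is already active
        os.getpid()      # getpid() is not one of the four allowed calls
        os._exit(0)      # only reached if the kernel did not intervene

    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return os.WTERMSIG(status) == signal.SIGKILL
    if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 2:
        return None
    return False


if __name__ == "__main__":
    print(strict_mode_kills_child())
```

Strict mode is too coarse for most programs, which is one reason filter-based policies such as those built with libseccomp are used in practice.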
  14. libseccomp example
      ● https://gist.github.com/ya790206/9579145
  15. cgroup
      ● This work was started by engineers at Google
      ● Resource limiting
      ● Prioritization
      ● Accounting
      ● Control
  16. cgroup subsystems
      ○ blkio: sets limits on input/output access to and from block devices such as physical drives (disk, solid state, USB, etc.).
      ○ cpu: uses the scheduler to provide cgroup tasks access to the CPU.
      ○ cpuacct: generates automatic reports on CPU resources used by tasks in a cgroup.
      ○ cpuset: assigns individual CPUs (on a multicore system) and memory nodes to tasks in a cgroup.
      ○ devices: allows or denies access to devices by tasks in a cgroup.
      ○ freezer: suspends or resumes tasks in a cgroup.
      ○ memory: sets limits on memory use by tasks in a cgroup, and generates automatic reports on memory resources used by those tasks.
      ○ net_cls: tags network packets with a class identifier (classid) that allows the Linux traffic controller (tc) to identify packets originating from a particular cgroup task.
      ○ net_prio: provides a way to dynamically set the priority of network traffic per network interface.
      ○ ns: the namespace subsystem.
  17. cgroup freezer
      ● The cgroup freezer is useful for batch job management systems, which start and stop sets of tasks in order to schedule the resources of a machine according to the wishes of a system administrator.
  18. $ mount -t cgroup -ofreezer freezer /<path>/freezer
      $ mkdir /<path>/freezer/my
      ● /<path>/freezer: root cgroup
        ○ tasks (all process pids)
        ○ other files
        ○ my
      ● /<path>/freezer/my: sub cgroup
        ○ tasks
        ○ other files
  19. cgroup freezer
      $ mount -t cgroup -ofreezer freezer /<path>/freezer
      $ cd /<path>/freezer/; ls
      cgroup.clone_children cgroup.event_control cgroup.procs cgroup.sane_behavior notify_on_release release_agent tasks
      1. mkdir my_group; cd my_group
      2. echo $some_pid > tasks
      3. echo FROZEN > freezer.state
      4. echo THAWED > freezer.state
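The four numbered steps are plain file operations, so they can be scripted. The following is a minimal sketch, not from the original deck, assuming a cgroup v1 freezer hierarchy is already mounted at the directory passed in; all function names are hypothetical. On a real system the script needs permission to write to the cgroup filesystem, and freezer.state may briefly read FREEZING before it settles on FROZEN.

```python
from pathlib import Path


def create_group(freezer_root, name):
    """Step 1: create a sub cgroup by making a directory under the mount."""
    group_dir = Path(freezer_root) / name
    group_dir.mkdir(exist_ok=True)
    return group_dir


def add_task(group_dir, pid):
    """Step 2: move a process into the cgroup by writing its pid to 'tasks'."""
    (Path(group_dir) / "tasks").write_text(f"{pid}\n")


def set_freezer_state(group_dir, state):
    """Steps 3 and 4: write FROZEN or THAWED to the group's freezer.state."""
    if state not in ("FROZEN", "THAWED"):
        raise ValueError(f"unsupported freezer state: {state!r}")
    (Path(group_dir) / "freezer.state").write_text(state + "\n")


def read_freezer_state(group_dir):
    """Return the current contents of freezer.state without the newline."""
    return (Path(group_dir) / "freezer.state").read_text().strip()
```

A typical call sequence would be create_group(root, "my_group"), add_task(group, some_pid), set_freezer_state(group, "FROZEN"), and later set_freezer_state(group, "THAWED"), where root stands for the elided /<path>/freezer mount point.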
  20. other cgroup
      ● memory cgroup:
        ○ limit process memory usage
        ○ show various statistics
      ● blkio cgroup:
        ○ change weight
        ○ show various statistics
  21. lxc
      ● LXC is a userspace interface for the Linux kernel containment features.
      ● Container templates
      ● A set of standard tools to control the containers
  22. lxc
      [Diagram: the host OS runs container A (process 1, process 2), container B (process 3, process 4), and process x. Legend: a one-way arrow from A to B means A can see B; a two-way arrow means A can see B and B can see A.]
  23. lxc
      1. lxc-create -n test-container -t ubuntu
      2. lxc-ls --fancy
      3. lxc-start -n test-container
      4. lxc-console -n test-container
      5. lxc-stop -n test-container
      6. lxc-destroy -n test-container
  24. start vs execute
      ● start:
        ○ boots a Linux system inside the container
      ● execute:
        ○ executes a program directly
        ○ make sure you have "/usr/lib/lxc/lxc-init" in your container
  25. sudo lxc-checkpoint --name p1 --statefile a
      ● output:
        ○ lxc-checkpoint: 'checkpoint' function not implemented
  26. linux aufs
      ● It lets files and directories from separate filesystems coexist under a single directory.
      ● Diagram: /tmp/union is the union of /tmp/a, /tmp/b, and /tmp/c.
  27. # apt-get install aufs-tools
      # mount -t aufs -o br=/tmp/a:/tmp/b none /tmp/union/
      # mount -t aufs -o br=/tmp/a=rw:/tmp/b=rw none /tmp/union
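To make the idea of a union mount concrete, here is a toy model in plain Python, not from the original deck and not aufs itself. It assumes that branches listed earlier in br= take priority over later ones, and it ignores whiteouts, permissions, and the rw/ro distinction. The function names union_lookup and union_listdir are made up for illustration.

```python
import os


def union_lookup(branches, relative_path):
    """Return the real path of relative_path in the first branch that has it.

    Returns None when no branch contains the path.
    """
    for branch in branches:
        candidate = os.path.join(branch, relative_path)
        if os.path.lexists(candidate):
            return candidate
    return None


def union_listdir(branches):
    """Names visible at the top of the union: the merged set of all branches."""
    names = set()
    for branch in branches:
        names.update(os.listdir(branch))
    return sorted(names)
```

With branches ["/tmp/a", "/tmp/b"], a file present in both directories resolves to the copy in /tmp/a, while a file present only in /tmp/b is still visible through the union.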
  28. docker vs lxc
      ● Docker is based on lxc.
      ● Docker can build an image from a text file (a Dockerfile).
      ● Docker seldom boots a full system.
      ● Docker provides a user-friendly interface.
      ● Docker uses less disk space (aufs).
  29. docker
      [Diagram: state transitions between an image (rootfs), running containers (process + rootfs), and stopped containers (rootfs), with arrows labeled run, stop, start, and commit.]
  30. rootfs layers, stacked with aufs (top to bottom)
      ● rootfs in container:
        ○ image: rw
        ○ ZZZ image: ro
        ○ XXX image: ro
        ○ ubuntu image: ro
      ● rootfs in image:
        ○ image: ro
        ○ ZZZ image: ro
        ○ XXX image: ro
        ○ ubuntu image: ro
  31. taiwan.py site dockerfile
      FROM ubuntu:12.10
      RUN apt-get update
      RUN apt-get install -y python-dev
      RUN apt-get install -y python-pip
      RUN apt-get install -y git
      RUN pip install mynt
      RUN git clone https://github.com/lucemia/taiwan.py
      RUN mynt gen -f taiwan.py/src/ taiwan.py/build/
      EXPOSE 8000
      CMD cd taiwan.py/build/ && python -m SimpleHTTPServer
  32. How to run
      1. cat dockerfile | sudo docker build -t taiwanpy -
      2. docker run -p 8000:8000 taiwanpy
      3. docker stop xxx
      4. docker start xxx
      5. docker stop xxx
      6. docker rm xxx
      7. docker rmi taiwanpy
  33. simple docker shell
      ● https://github.com/ya790206/misc_tools/tree/master/docker_wrapper
  34. Summary
      ● Namespaces provide the virtualization (each container gets a private view of system resources).
      ● Cgroups control a group of processes.
      ● The container and the host system use the same kernel.
      ● Docker is similar to lxc, but Docker is easier to use.
  35. Questions
  36. Thank you
  37. References: kernel namespace
      ● Namespaces in operation, part 1: namespaces overview
      ● PaaS under the hood, episode 1: kernel namespaces
      ● Introduction to Linux namespaces – Part 1: UTS
  38. References: cgroup
      ● cgroup
      ● http://en.wikipedia.org/wiki/Cgroups
  39. Bibliography
      ● Linux Kernel Hacks:改善效能、提昇開發效率及節能的技巧與工具 (Linux Kernel Hacks: techniques and tools for improving performance, development efficiency, and power savings)
