Docker - container and lightweight virtualization


Published in: Technology


  1. Docker - Container and Lightweight Virtualization
     Paul Sim, Technical Account Manager, paul.sim@canonical.com
  2. Virtualization
     [Diagram comparing virtual machines with containers. Virtual machine side (Type 1, Type 2, para-virtualization): Virtual Machine, Application, Kernel, Hypervisor, Hardware. Container side (lightweight virtualization): Container, Application, Operating System, Hardware.]
  3. Linux Container - aka LXC
     [Diagram: several running environments (Ubuntu Precise, Ubuntu Trusty, CentOS) hosting applications such as tomcat, MongoDB, apache, MySQL, Rails, and Nginx, each in its own namespace on one Linux kernel and one set of hardware.]
     Namespace: UTS, IPC, PID, Network, User
     Control group
  4. Linux Container - performance
     Less than 1% degradation.
     Source: Realizing Linux Containers (LXC) - IBM
  5. Linux Container - performance
     Source: Realizing Linux Containers (LXC) - IBM
  6. Docker
     janghoon@ubuntu:~$ sudo docker run -i -t centos:latest /bin/bash
     bash-4.1# ps -ef
     UID        PID  PPID  C STIME TTY          TIME CMD
     root         1     0  0 15:51 ?        00:00:00 /bin/bash
     root        75     1  0 15:54 ?        00:00:00 /usr/sbin/httpd
     apache      77    75  0 15:54 ?        00:00:00 /usr/sbin/httpd
     apache      78    75  0 15:54 ?        00:00:00 /usr/sbin/httpd
     apache      79    75  0 15:54 ?        00:00:00 /usr/sbin/httpd
     apache      80    75  0 15:54 ?        00:00:00 /usr/sbin/httpd
     apache      81    75  0 15:54 ?        00:00:00 /usr/sbin/httpd
     apache      82    75  0 15:54 ?        00:00:00 /usr/sbin/httpd
     apache      83    75  0 15:54 ?        00:00:00 /usr/sbin/httpd
     apache      84    75  0 15:54 ?        00:00:00 /usr/sbin/httpd
     bash-4.1# ls
     bin boot dev etc home lib lib64 lost+found media mnt opt proc root sbin selinux srv sys tmp usr var
     bash-4.1# uname -a
     Linux 7c6702b13a48 3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
     bash-4.1# cat /etc/redhat-release
     CentOS release 6.5 (Final)
     janghoon@ubuntu:~$ ps -ef | grep httpd
     root      7605  7256  0 Jun12 ?        00:00:00 /usr/sbin/httpd
     48        7607  7605  0 Jun12 ?        00:00:00 /usr/sbin/httpd
     48        7608  7605  0 Jun12 ?        00:00:00 /usr/sbin/httpd
     48        7609  7605  0 Jun12 ?        00:00:00 /usr/sbin/httpd
     48        7610  7605  0 Jun12 ?        00:00:00 /usr/sbin/httpd
     48        7611  7605  0 Jun12 ?        00:00:00 /usr/sbin/httpd
     48        7612  7605  0 Jun12 ?        00:00:00 /usr/sbin/httpd
     48        7613  7605  0 Jun12 ?        00:00:00 /usr/sbin/httpd
     48        7614  7605  0 Jun12 ?        00:00:00 /usr/sbin/httpd
  7. Docker
     Docker is an open platform for developers and sysadmins to build, ship, and run distributed applications. Consisting of Docker Engine, a portable, lightweight runtime and packaging tool, and Docker Hub, a cloud service for sharing applications and automating workflows, Docker enables apps to be quickly assembled from components and eliminates the friction between development, QA, and production environments. As a result, IT can ship faster and run the same app, unchanged, on laptops, data center VMs, and any cloud.
  8. Docker
     [Diagram: Docker and Docker Hub/Repository on top of the execution back ends libcontainer, lxc, libvirt, and systemd-nspawn, which rely on the Linux features Namespace, Control group, Capabilities, AppArmor, and netfilter.]
  9. Docker
     Docker images: A Docker image is a read-only template. For example, an image could contain an Ubuntu operating system with Apache and your web application installed. Images are used to create Docker containers. Docker provides a simple way to build new images or update existing images, or you can download Docker images that other people have already created. Docker images are the build component of Docker.
     Docker registries: Docker registries hold images. These are public or private stores from which you upload or download images. The public Docker registry is called Docker Hub. It provides a huge collection of existing images for your use. These can be images you create yourself or you can use images that others have previously created. Docker registries are the distribution component of Docker.
     Docker containers: Docker containers are similar to a directory. A Docker container holds everything that is needed for an application to run. Each container is created from a Docker image. Docker containers can be run, started, stopped, moved, and deleted. Each container is an isolated and secure application platform. Docker containers are the run component of Docker.
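  10. Custom Image
      [Diagram: a Docker image bundles a running environment (libraries, binaries, ...) based on Debian, Ubuntu, or CentOS, with applications such as MySQL, Memcached, node.js, Rails, MongoDB, and Apache. Arrows labeled pull, Run, and commit connect Docker Hub/Registry, the Docker image, and the container.]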
  11. Docker - networking
      [Diagram: a host machine with a physical NIC and a bridge. Each container sits in its own namespace (NameSpace-1, NameSpace-2) with a vNIC, and each vNIC is connected to the bridge through a veth peer pair.]
  12. Docker - networking
      On host machine:
      janghoon@ubuntu:~$ ifconfig
      docker0   Link encap:Ethernet  HWaddr 4e:da:3e:50:cb:ef
                inet addr:172.17.42.1  Bcast:0.0.0.0  Mask:255.255.0.0
                inet6 addr: fe80::18df:90ff:fe07:45b7/64 Scope:Link
                UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
                RX packets:10569 errors:0 dropped:0 overruns:0 frame:0
                TX packets:18750 errors:0 dropped:0 overruns:0 carrier:0
                collisions:0 txqueuelen:0
                RX bytes:553677 (553.6 KB)  TX bytes:28304983 (28.3 MB)
      em1       Link encap:Ethernet  HWaddr c0:3f:d5:62:44:05
                inet addr:172.30.1.51  Bcast:172.30.1.255  Mask:255.255.255.0
                inet6 addr: fe80::c23f:d5ff:fe62:4405/64 Scope:Link
                UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
                RX packets:29764 errors:0 dropped:0 overruns:0 frame:0
                TX packets:20619 errors:0 dropped:0 overruns:0 carrier:0
                collisions:0 txqueuelen:1000
                RX bytes:30366129 (30.3 MB)  TX bytes:2240284 (2.2 MB)
                Interrupt:20 Memory:f7c00000-f7c20000
      veth3adc  Link encap:Ethernet  HWaddr 4e:da:3e:50:cb:ef
                inet6 addr: fe80::4cda:3eff:fe50:cbef/64 Scope:Link
                UP BROADCAST RUNNING  MTU:1500  Metric:1
                RX packets:10569 errors:0 dropped:0 overruns:0 frame:0
                TX packets:18739 errors:0 dropped:1 overruns:0 carrier:0
                collisions:0 txqueuelen:1000
                RX bytes:701643 (701.6 KB)  TX bytes:28302881 (28.3 MB)
      Within a container:
      bash-4.1# ifconfig
      eth0      Link encap:Ethernet  HWaddr FA:25:A0:77:55:C9
                inet addr:172.17.0.2  Bcast:0.0.0.0  Mask:255.255.0.0
                inet6 addr: fe80::f825:a0ff:fe77:55c9/64 Scope:Link
                UP BROADCAST RUNNING  MTU:1500  Metric:1
                RX packets:18739 errors:0 dropped:2 overruns:0 frame:0
                TX packets:10569 errors:0 dropped:0 overruns:0 carrier:0
                collisions:0 txqueuelen:1000
                RX bytes:28302881 (26.9 MiB)  TX bytes:701643 (685.1 KiB)
      lo        Link encap:Local Loopback
                inet addr:127.0.0.1  Mask:255.0.0.0
                inet6 addr: ::1/128 Scope:Host
                UP LOOPBACK RUNNING  MTU:1500  Metric:1
                RX packets:0 errors:0 dropped:0 overruns:0 frame:0
                TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
                collisions:0 txqueuelen:0
                RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
      bash-4.1# route
      Kernel IP routing table
      Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
      default         172.17.42.1     0.0.0.0         UG    0      0        0 eth0
      172.17.0.0      *               255.255.0.0     U     0      0        0 eth0
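The addressing in the output above can be checked with a few lines of Python. This is a minimal sketch using only the standard `ipaddress` module; the addresses and masks are copied from the `ifconfig` and `route` output, while the variable and function names are made up for illustration. It confirms that the container's `eth0` and the host's `docker0` bridge sit in the same /16 network and that the container's default gateway is the bridge address.

```python
import ipaddress

# Values copied from the ifconfig/route output on the slide.
bridge = ipaddress.ip_interface("172.17.42.1/255.255.0.0")    # docker0 on the host
container = ipaddress.ip_interface("172.17.0.2/255.255.0.0")  # eth0 inside the container
default_gateway = ipaddress.ip_address("172.17.42.1")         # container's default route


def same_subnet(a, b):
    """Return True when both interfaces belong to the same IP network."""
    return a.network == b.network


print("container network:", container.network)
print("bridge and container share a subnet:", same_subnet(bridge, container))
print("default gateway is the bridge:", default_gateway == bridge.ip)
```

Because both ends are on the same subnet, traffic from the container reaches the bridge directly, and anything else is sent to the bridge as the default gateway.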
  13. Docker - storage
      root@ubuntu:~# ls -l /var/lib/docker/containers/7c6702b13a48a9b5ba9c70537e8c82ec95a5a98ac8fb23fa70669f39d80b1d07/root
      total 80
      dr-xr-xr-x  2 root root 4096 Jun 12 23:52 bin
      drwxr-xr-x  3 root root 4096 Jun 12 23:52 boot
      drwxr-xr-x  4 root root 4096 Jun 12 23:51 dev
      drwxr-xr-x 57 root root 4096 Jun 12 23:52 etc
      drwxr-xr-x  2 root root 4096 Sep 23  2011 home
      dr-xr-xr-x  8 root root 4096 Jun 12 23:52 lib
      dr-xr-xr-x  6 root root 4096 Jun 12 23:52 lib64
      drwx------  2 root root 4096 Jun 10 00:10 lost+found
      drwxr-xr-x  2 root root 4096 Sep 23  2011 media
      drwxr-xr-x  2 root root 4096 Sep 23  2011 mnt
      drwxr-xr-x  2 root root 4096 Sep 23  2011 opt
      drwxr-xr-x  2 root root 4096 Jun 10 00:10 proc
      dr-xr-x---  2 root root 4096 Jun 10 00:14 root
      dr-xr-xr-x  2 root root 4096 Jun 12 23:52 sbin
      drwxr-xr-x  3 root root 4096 Jun 10 00:14 selinux
      drwxr-xr-x  2 root root 4096 Sep 23  2011 srv
      drwxr-xr-x  2 root root 4096 Jun 10 00:10 sys
      drwxrwxrwt  2 root root 4096 Jun 12 23:54 tmp
      drwxr-xr-x 18 root root 4096 Jun 12 23:52 usr
      drwxr-xr-x 24 root root 4096 Jun 12 23:54 var
  14. Namespace
      root@ubuntu:~# docker ps -a
      CONTAINER ID   IMAGE            COMMAND     CREATED             STATUS              PORTS   NAMES
      7c6702b13a48   centos:centos6   /bin/bash   About an hour ago   Up About an hour            berserk_ptolemy
      root@ubuntu:~# docker inspect 7c6702b13a48 | grep Pid
      "Pid": 7256,
      root@ubuntu:~# ls -l /proc/7256/ns
      total 0
      lrwxrwxrwx 1 root root 0 Jun 13 01:15 ipc -> ipc:[4026532245]
      lrwxrwxrwx 1 root root 0 Jun 13 01:15 mnt -> mnt:[4026532243]
      lrwxrwxrwx 1 root root 0 Jun 13 01:15 net -> net:[4026532248]
      lrwxrwxrwx 1 root root 0 Jun 13 01:15 pid -> pid:[4026532246]
      lrwxrwxrwx 1 root root 0 Jun 13 01:15 user -> user:[4026531837]
      lrwxrwxrwx 1 root root 0 Jun 13 01:15 uts -> uts:[4026532244]
      root@ubuntu:~# ps faux
      ...
      root   976  0.0  0.1 439596 14768 ?     Sl   Jun12  0:00  \_ /usr/bin/docker.io -d
      root  7256  0.0  0.0  11484  1660 pts/3 Ss+  Jun12  0:00      \_ /bin/bash
      root  7605  0.0  0.0 175764  3748 ?     Ss   Jun12  0:00          \_ /usr/sbin/httpd
      48    7607  0.0  0.0 175764  2216 ?     S    Jun12  0:00              \_ /usr/sbin/httpd
      48    7608  0.0  0.0 175764  2216 ?     S    Jun12  0:00              \_ /usr/sbin/httpd
      48    7609  0.0  0.0 175764  2216 ?     S    Jun12  0:00              \_ /usr/sbin/httpd
      48    7610  0.0  0.0 175764  2216 ?     S    Jun12  0:00              \_ /usr/sbin/httpd
      48    7611  0.0  0.0 175764  2216 ?     S    Jun12  0:00              \_ /usr/sbin/httpd
      48    7612  0.0  0.0 175764  2216 ?     S    Jun12  0:00              \_ /usr/sbin/httpd
      48    7613  0.0  0.0 175764  2216 ?     S    Jun12  0:00              \_ /usr/sbin/httpd
      48    7614  0.0  0.0 175764  2216 ?     S    Jun12  0:00              \_ /usr/sbin/httpd
      ...
  15. Namespace
      Currently, Linux implements six different types of namespaces. The purpose of each namespace is to wrap a particular global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource. One of the overall goals of namespaces is to support the implementation of containers, a tool for lightweight virtualization (as well as other purposes) that provides a group of processes with the illusion that they are the only processes on the system.
      1. Mount namespaces (CLONE_NEWNS, Linux 2.4.19) isolate the set of filesystem mount points seen by a group of processes. Thus, processes in different mount namespaces can have different views of the filesystem hierarchy. One use of mount namespaces is to create environments that are similar to chroot jails. However, by contrast with the use of the chroot() system call, mount namespaces are a more secure and flexible tool for this task. Other more sophisticated uses of mount namespaces are also possible. For example, separate mount namespaces can be set up in a master-slave relationship, so that the mount events are automatically propagated from one namespace to another; this allows, for example, an optical disk device that is mounted in one namespace to automatically appear in other namespaces.
      2. UTS namespaces (CLONE_NEWUTS, Linux 2.6.19) isolate two system identifiers (nodename and domainname) returned by the uname() system call; the names are set using the sethostname() and setdomainname() system calls. In the context of containers, the UTS namespaces feature allows each container to have its own hostname and NIS domain name.
  16. Namespace
      3. IPC namespaces (CLONE_NEWIPC, Linux 2.6.19) isolate certain interprocess communication (IPC) resources, namely, System V IPC objects and (since Linux 2.6.30) POSIX message queues. The common characteristic of these IPC mechanisms is that IPC objects are identified by mechanisms other than filesystem pathnames. Each IPC namespace has its own set of System V IPC identifiers and its own POSIX message queue filesystem.
      4. PID namespaces (CLONE_NEWPID, Linux 2.6.24) isolate the process ID number space. In other words, processes in different PID namespaces can have the same PID.
      5. Network namespaces (CLONE_NEWNET, started in Linux 2.6.24 and largely completed by about Linux 2.6.29) provide isolation of the system resources associated with networking. Thus, each network namespace has its own network devices, IP addresses, IP routing tables, /proc/net directory, port numbers, and so on.
      6. User namespaces (CLONE_NEWUSER, started in Linux 2.6.23 and completed in Linux 3.8) isolate the user and group ID number spaces. In other words, a process's user and group IDs can be different inside and outside a user namespace.
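The six CLONE_NEW* constants named above are bit flags that are OR-ed together when a process is created or moved into new namespaces. The following is a small sketch of how such a mask is built and decoded. The numeric values are the ones defined in `<linux/sched.h>`; the dictionary keys, the helper names, and the example selection of namespaces are illustrative assumptions, and nothing here calls into the kernel.

```python
# Flag values as defined in <linux/sched.h>; keys follow the names used in /proc/<pid>/ns.
CLONE_FLAGS = {
    "mnt": 0x00020000,   # CLONE_NEWNS
    "uts": 0x04000000,   # CLONE_NEWUTS
    "ipc": 0x08000000,   # CLONE_NEWIPC
    "user": 0x10000000,  # CLONE_NEWUSER
    "pid": 0x20000000,   # CLONE_NEWPID
    "net": 0x40000000,   # CLONE_NEWNET
}


def build_mask(names):
    """OR together the flags for the requested namespace types."""
    mask = 0
    for name in names:
        mask |= CLONE_FLAGS[name]
    return mask


def decode_mask(mask):
    """List the namespace types whose flag is set in the mask."""
    return [name for name, flag in CLONE_FLAGS.items() if mask & flag]


# Hypothetical example: request every namespace type except the user namespace.
mask = build_mask(["mnt", "uts", "ipc", "pid", "net"])
print(hex(mask))
print(decode_mask(mask))
```

A mask like this is what would be passed as the flags argument to clone() or unshare(); the sketch only shows the bit arithmetic.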
  17. Namespace
      root@ubuntu:~# ps faux
      ...
      root   976  0.0  0.1 439596 14768 ?     Sl   Jun12  0:00  \_ /usr/bin/docker.io -d
      root  7256  0.0  0.0  11484  1660 pts/3 Ss+  Jun12  0:00      \_ /bin/bash
      root  7605  0.0  0.0 175764  3748 ?     Ss   Jun12  0:00          \_ /usr/sbin/httpd
      48    7607  0.0  0.0 175764  2216 ?     S    Jun12  0:00              \_ /usr/sbin/httpd
      48    7608  0.0  0.0 175764  2216 ?     S    Jun12  0:00              \_ /usr/sbin/httpd
      ...
      root@ubuntu:~# ls -l /proc/7256/ns
      lrwxrwxrwx 1 root root 0 Jun 13 01:15 ipc -> ipc:[4026532245]
      lrwxrwxrwx 1 root root 0 Jun 13 01:15 mnt -> mnt:[4026532243]
      lrwxrwxrwx 1 root root 0 Jun 13 01:15 net -> net:[4026532248]
      lrwxrwxrwx 1 root root 0 Jun 13 01:15 pid -> pid:[4026532246]
      lrwxrwxrwx 1 root root 0 Jun 13 01:15 user -> user:[4026531837]
      lrwxrwxrwx 1 root root 0 Jun 13 01:15 uts -> uts:[4026532244]
      root@ubuntu:~# ls -l /proc/7605/ns
      lrwxrwxrwx 1 root root 0 Jun 13 01:48 ipc -> ipc:[4026532245]
      lrwxrwxrwx 1 root root 0 Jun 13 01:48 mnt -> mnt:[4026532243]
      lrwxrwxrwx 1 root root 0 Jun 13 01:48 net -> net:[4026532248]
      lrwxrwxrwx 1 root root 0 Jun 13 01:48 pid -> pid:[4026532246]
      lrwxrwxrwx 1 root root 0 Jun 13 01:48 user -> user:[4026531837]
      lrwxrwxrwx 1 root root 0 Jun 13 01:48 uts -> uts:[4026532244]
      root@ubuntu:~# ls -l /proc/976/ns
      lrwxrwxrwx 1 root root 0 Jun 13 01:47 ipc -> ipc:[4026531839]
      lrwxrwxrwx 1 root root 0 Jun 13 01:47 mnt -> mnt:[4026531840]
      lrwxrwxrwx 1 root root 0 Jun 13 01:47 net -> net:[4026531968]
      lrwxrwxrwx 1 root root 0 Jun 13 01:47 pid -> pid:[4026531836]
      lrwxrwxrwx 1 root root 0 Jun 13 01:47 user -> user:[4026531837]
      lrwxrwxrwx 1 root root 0 Jun 13 01:47 uts -> uts:[4026531838]
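The point of the listing above is that two processes share a namespace exactly when their /proc/<pid>/ns links carry the same inode number. The comparison can be sketched in Python as follows. The link targets are copied from the slide for PIDs 7256 (container bash), 7605 (container httpd), and 976 (Docker daemon); the function names are hypothetical. On a live Linux system the same strings could be read with `os.readlink(f"/proc/{pid}/ns/{name}")`.

```python
import re

# A namespace link target looks like "pid:[4026532246]".
NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_links(targets):
    """Map namespace type to inode number for a list of link targets."""
    result = {}
    for target in targets:
        match = NS_LINK.match(target.strip())
        if match:
            result[match.group(1)] = int(match.group(2))
    return result


def shared_and_isolated(a, b):
    """Split namespace types into those two processes share and those they do not."""
    shared = sorted(name for name in a if a[name] == b.get(name))
    isolated = sorted(name for name in a if a[name] != b.get(name))
    return shared, isolated


# Link targets copied from the slide.
container_bash = parse_ns_links([
    "ipc:[4026532245]", "mnt:[4026532243]", "net:[4026532248]",
    "pid:[4026532246]", "user:[4026531837]", "uts:[4026532244]",
])
container_httpd = dict(container_bash)  # PID 7605 shows identical links on the slide
docker_daemon = parse_ns_links([
    "ipc:[4026531839]", "mnt:[4026531840]", "net:[4026531968]",
    "pid:[4026531836]", "user:[4026531837]", "uts:[4026531838]",
])

print("bash vs httpd :", shared_and_isolated(container_bash, container_httpd))
print("bash vs daemon:", shared_and_isolated(container_bash, docker_daemon))
```

With the slide's data, the two container processes share all six namespaces, while the container and the daemon share only the user namespace.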
  18. Security - POSIX Capabilities
      For the purpose of performing permission checks, traditional UNIX implementations distinguish two categories of processes: privileged processes (whose effective user ID is 0, referred to as superuser or root), and unprivileged processes (whose effective UID is nonzero). Privileged processes bypass all kernel permission checks, while unprivileged processes are subject to full permission checking based on the process's credentials (usually: effective UID, effective GID, and supplementary group list).
      Starting with kernel 2.2, Linux divides the privileges traditionally associated with superuser into distinct units, known as capabilities, which can be independently enabled and disabled. Capabilities are a per-thread attribute.
      - man capabilities
      [Diagram: the capability sets Permitted, Inheritable, and Effective, with example capabilities CAP_CHOWN, CAP_SETPCAP, CAP_NET_ADMIN, CAP_SYS_BOOT, ……]
  19. Security - POSIX Capabilities
      janghoon@ubuntu:~# sudo docker ps -a
      CONTAINER ID   IMAGE          COMMAND     CREATED        STATUS        PORTS   NAMES
      c33faf7c3537   ubuntu:12.04   /bin/bash   17 hours ago   Up 16 hours           berserk_wright
      758e5b50d54e   ubuntu:12.04   /bin/bash   18 hours ago   Up 16 hours           prickly_fermat
      * --privileged=true
      janghoon@ubuntu:~# sudo getpcaps 1951
      Capabilities for `1951': =ep
      * --privileged=false
      janghoon@ubuntu:~# sudo getpcaps 2174
      Capabilities for `2174': =ep cap_setpcap,cap_net_admin,cap_sys_module,cap_sys_rawio,cap_sys_pacct,cap_sys_admin,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_audit_write,cap_audit_control,cap_mac_override,cap_mac_admin-ep
      root@758e5b50d54e:/# id
      uid=0(root) gid=0(root) groups=0(root)
      root@758e5b50d54e:/# rmmod bridge
      ERROR: Removing 'bridge': Operation not permitted
      root@758e5b50d54e:/# iptables -L -n
      iptables v1.4.12: can't initialize iptables table `filter': Permission denied (you must be root)
      Perhaps iptables or your kernel needs to be upgraded.
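The `getpcaps` strings above explain why root inside the unprivileged container cannot unload a module or list iptables rules: `=ep` raises every capability in the effective and permitted sets, and a following `names-ep` clause lowers the listed ones again. Below is a minimal sketch that parses only these two clause forms (it is not a full implementation of the libcap text format) and checks for CAP_SYS_MODULE, which module removal requires, and CAP_NET_ADMIN, which iptables requires. The function names are hypothetical; the strings are copied from the slide.

```python
def parse_getpcaps(text):
    """Parse a simplified capability string such as '=ep cap_a,cap_b-ep'.

    Returns (all_raised, dropped). Only the two clause forms that appear on
    the slide are supported; anything else raises ValueError.
    """
    all_raised = False
    dropped = set()
    for clause in text.split():
        if clause == "=ep":
            all_raised = True
        elif clause.endswith("-ep"):
            names = clause[: -len("-ep")]
            dropped.update(name for name in names.split(",") if name)
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return all_raised, dropped


def has_capability(parsed, name):
    """True if the capability is still in the effective and permitted sets."""
    all_raised, dropped = parsed
    return all_raised and name not in dropped


# Strings copied from the getpcaps output on the slide.
privileged = parse_getpcaps("=ep")
unprivileged = parse_getpcaps(
    "=ep cap_setpcap,cap_net_admin,cap_sys_module,cap_sys_rawio,"
    "cap_sys_pacct,cap_sys_admin,cap_sys_nice,cap_sys_resource,cap_sys_time,"
    "cap_sys_tty_config,cap_mknod,cap_audit_write,cap_audit_control,"
    "cap_mac_override,cap_mac_admin-ep"
)

for cap in ("cap_sys_module", "cap_net_admin", "cap_chown"):
    print(cap, has_capability(privileged, cap), has_capability(unprivileged, cap))
```

The unprivileged container keeps capabilities such as cap_chown but has lost cap_sys_module and cap_net_admin, which matches the two "not permitted" errors even though `id` reports uid 0.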
  20. Control Group - aka cgroup
      cgroups (control groups) is a Linux kernel feature to limit, account for, and isolate the resource usage (CPU, memory, disk I/O, etc.) of process groups. In late 2007 it was merged into kernel version 2.6.24.
      By using cgroups, system administrators gain fine-grained control over allocating, prioritizing, denying, managing, and monitoring system resources. Hardware resources can be smartly divided up among tasks and users, increasing overall efficiency.
      Cgroups are organized hierarchically, like processes, and child cgroups inherit some of the attributes of their parents.
      ● blkio: this subsystem sets limits on input/output access to and from block devices such as physical drives (disk, solid state, USB, etc.).
      ● cpu: this subsystem uses the scheduler to provide cgroup tasks access to the CPU.
      ● cpuacct: this subsystem generates automatic reports on CPU resources used by tasks in a cgroup.
      ● cpuset: this subsystem assigns individual CPUs (on a multicore system) and memory nodes to tasks in a cgroup.
      ● devices: this subsystem allows or denies access to devices by tasks in a cgroup.
      ● freezer: this subsystem suspends or resumes tasks in a cgroup.
      ● memory: this subsystem sets limits on memory use by tasks in a cgroup, and generates automatic reports on memory resources used by those tasks.
      ● net_cls: this subsystem tags network packets with a class identifier (classid) that allows the Linux traffic controller (tc) to identify packets originating from a particular cgroup task.
      ● net_prio: this subsystem provides a way to dynamically set the priority of network traffic per network interface.
      ● ns: the namespace subsystem.
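Which cgroup a process belongs to, per subsystem, is exposed in /proc/<pid>/cgroup, where each line has the form `hierarchy-ID:controller-list:cgroup-path`. The sketch below parses that format into a mapping from subsystem name to cgroup path. The sample text is hypothetical and not taken from the slides (in particular the `/docker/example` path is a placeholder); on a real host the text would come from reading /proc/<pid>/cgroup.

```python
def parse_proc_cgroup(text):
    """Map each subsystem (controller) name to its cgroup path.

    Expects lines of the form 'hierarchy-ID:controller-list:cgroup-path';
    entries with an empty controller list are skipped.
    """
    membership = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        _hierarchy_id, controllers, path = line.split(":", 2)
        for controller in controllers.split(","):
            if controller:
                membership[controller] = path
    return membership


# Hypothetical sample; real content comes from /proc/<pid>/cgroup.
SAMPLE = """\
4:memory:/docker/example
3:cpuacct,cpu:/docker/example
2:freezer:/
"""

membership = parse_proc_cgroup(SAMPLE)
for subsystem in sorted(membership):
    print(f"{subsystem:8s} {membership[subsystem]}")
```

A path of `/` means the process sits in the root cgroup of that hierarchy, so no limits specific to a container apply for that subsystem.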
