Inside Docker for Fedora20/RHEL7

Transcript

  • 1. Inside Docker for Fedora20/RHEL7 ver1.8e Etsuji Nakai Twitter @enakai00 Open Cloud Campus Inside Docker for Fedora20/RHEL7
  • 2. Open Cloud Campus 2 Inside Docker for Fedora20/RHEL7 $ who am i – The author of the “Professional Linux Systems” series. • Available only in Japanese (some are in Korean translation.) • Translation offers from publishers are welcome ;-) Self-study Linux Deploy and Manage by yourself Professional Linux Systems Deployment and Management Professional Linux Systems Network Management  Etsuji Nakai – Senior solution architect and cloud evangelist at Red Hat. Professional Linux Systems Technology for Next Decade New OpenStack book is in stores now!
  • 3. Open Cloud Campus 3 Inside Docker for Fedora20/RHEL7 Contents  What is Linux Container  Device Mapper Thin-Provisioning  Network Namespace  systemd and cgroups (*) The contents of this document are based on Fedora20 with docker-io-1.0.0-1.fc20.x86_64
  • 4. Inside Docker for Fedora20/RHEL7 What is Linux Container
  • 5. Open Cloud Campus 5 Inside Docker for Fedora20/RHEL7 Traditional server virtualization Physical machine Physical machine Host OS Hypervisor (Kernel Module) Virtual Machine Guest OS VMware vSphere, Xen, etc. Linux KVM Hardware-assisted virtualization (Hypervisor is embedded in firmware.) Software-assisted virtualization (Hypervisor is installed on a physical machine.) Software-assisted virtualization (Host OS provides the hypervisor feature.) Physical machine OS Baremetal Traditional "server virtualization" is a technology to create software-emulated "virtual machines" hosting various guest operating systems. Hypervisor (Software) Physical machine Hypervisor (Firmware) Virtual Machine Guest OS Virtual Machine Guest OS Virtual Machine Guest OS Virtual Machine Guest OS Virtual Machine Guest OS Virtual Machine Guest OS Virtual Machine Guest OS
  • 6. Open Cloud Campus 6 Inside Docker for Fedora20/RHEL7  "Linux Container" is a Linux kernel feature to contain a group of processes in an independent execution environment called a container.  The Linux kernel provides an independent application execution environment for each container, which includes: – Independent filesystem. – Independent network interface and IP address. – Usage limit for memory and CPU time.  You can use containers on Linux virtual machines in addition to baremetal servers since containers can co-exist with traditional server virtualization technology. Linux Kernel UserProcess ・・・ Physical Machine Physical Machine OS Container Baremetal UserProcess UserProcess User Space Linux Kernel UserProcess UserProcess User Space UserProcess UserProcess User Space ・・・ What is container technology? Container
  • 7. Open Cloud Campus 7 Inside Docker for Fedora20/RHEL7  Containers support separation of various resources. These are internally realized with different kernel technologies called "namespaces." – Filesystem separation → Mount namespace (kernel 2.4.19) – Hostname separation → UTS namespace (kernel 2.6.19) – IPC separation → IPC namespace (kernel 2.6.19) – User (UID/GID) separation → User namespace (kernel 2.6.23 to kernel 3.8) – Process table separation → PID namespace (kernel 2.6.24) – Network separation → Network namespace (kernel 2.6.24) – Usage limit of CPU/Memory → Control groups (*) Reference: "Namespaces in operation, part 1: namespaces overview" • http://lwn.net/Articles/531114/  A Linux container is realized by integrating these namespace features. There are multiple container management tools such as lxc tools, libvirt and Docker, and they may use different parts of these features. Under the hood
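Each of these namespaces can also be inspected directly on a running host: /proc/<PID>/ns holds one descriptor per namespace the process belongs to. Below is a minimal sketch, assuming util-linux's unshare command is available, that starts a shell in new UTS and mount namespaces; the hostname change is visible only inside that shell.
# ls -l /proc/$$/ns                  (namespace descriptors of the current shell)
# unshare --uts --mount /bin/bash    (new shell in its own UTS and mount namespaces)
# hostname container01               (takes effect only inside the new UTS namespace)
# exit
# hostname                           (the host's hostname is unchanged)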
  • 8. Open Cloud Campus 8 Inside Docker for Fedora20/RHEL7  Processes in all containers are executed on the same Linux kernel, but inside a container you can see only the processes in that container. – This is because each container has its own process table. On the host Linux, outside the containers, you can see all processes including the ones in containers. Resource separation / Process tables # ps -ef UID PID PPID C STIME TTY TIME CMD root 1 0 0 09:49 ? 00:00:00 /bin/sh /usr/local/bin/init.sh root 35 1 0 09:49 ? 00:00:00 /usr/sbin/sshd root 47 1 0 09:49 ? 00:00:00 /usr/sbin/httpd apache 49 47 0 09:49 ? 00:00:00 /usr/sbin/httpd apache 50 47 0 09:49 ? 00:00:00 /usr/sbin/httpd ... apache 56 47 0 09:49 ? 00:00:00 /usr/sbin/httpd root 57 1 0 09:49 ? 00:00:00 /bin/bash # ps -ef UID PID PPID C STIME TTY TIME CMD ... root 802 1 0 18:10 ? 00:01:20 /usr/bin/docker -d --selinux-enabled -H fd:// ... root 3687 802 0 18:49 pts/2 00:00:00 /bin/sh /usr/local/bin/init.sh root 3736 3687 0 18:49 ? 00:00:00 /usr/sbin/sshd root 3748 3687 0 18:49 ? 00:00:00 /usr/sbin/httpd 48 3750 3748 0 18:49 ? 00:00:00 /usr/sbin/httpd ... 48 3757 3748 0 18:49 ? 00:00:00 /usr/sbin/httpd root 3758 3687 0 18:49 pts/2 00:00:00 /bin/bash Processes seen inside container Processes seen outside container
  • 9. Open Cloud Campus 9 Inside Docker for Fedora20/RHEL7 Resource separation / Process tables (cont.) fork/exec sshd PID namespace  In the example on the previous page, the docker daemon fork/exec-ed the initial process "init.sh" and put it in a new "PID namespace." After that, all processes fork/exec-ed from init.sh are put in the same namespace. – Inside the container, the initial process has PID=1 independently from the host. Likewise, its child processes have independent PIDs. – Since Docker 1.0 doesn't support the user namespace, the same UIDs/GIDs as on the host are used even in the container. User/group names could be different because /etc/passwd is different in the container. • Reference: "Docker 1.0 and user namespaces" https://groups.google.com/forum/#!topic/docker-dev/MoIDYDF3suY PID=1 bash /bin/sh /usr/local/bin/init.sh httpd httpd ・・・ #!/bin/sh service sshd start service httpd start while [[ true ]]; do /bin/bash done init.sh docker daemon
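To relate the two process views on the previous page, you can look up the host-side PID of a container's initial process and confirm that it lives in a separate PID namespace. A minimal sketch; the --format option of "docker inspect" is assumed here, and <Container ID> is a placeholder.
# docker inspect --format '{{ .State.Pid }}' <Container ID>
3687                                            (host-side PID of init.sh in the example above)
# ls -l /proc/3687/ns/pid /proc/1/ns/pid        (different pid:[...] values mean different PID namespaces)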
  • 10. Open Cloud Campus 10 Inside Docker for Fedora20/RHEL7 Resource separation / Filesystem  A specific directory on the host is bind-mounted as the root directory of the container. Inside the container, that directory is seen as the root directory, a mechanism very similar to a "chroot jail."  When using traditional container management tools such as lxc tools or libvirt, you need to prepare the directory contents by hand. – You can put minimal contents for a specific application, such as the application binaries and shared libraries, in the directory. – It's also possible to copy a whole root filesystem of a specific Linux distribution to the directory. – If necessary, special filesystems such as /dev, /proc and /sys are mounted in the container by the management tool. Mount namespace / |--etc |--bin |--sbin ... /export/container01/rootfs/ |--etc |--bin |--sbin ... bind mount
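A minimal sketch of this by-hand approach, assuming a statically linked busybox binary is available (the paths are illustrative): populate a directory with just enough binaries and enter it as the root filesystem with chroot; container tools achieve the equivalent through the mount namespace.
# mkdir -p /export/container01/rootfs/bin
# cp /sbin/busybox /export/container01/rootfs/bin/
# /export/container01/rootfs/bin/busybox --install /export/container01/rootfs/bin    (hard-link all applets into the directory)
# chroot /export/container01/rootfs /bin/sh     (this directory is now seen as "/" by the shell)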
  • 11. Open Cloud Campus 11 Inside Docker for Fedora20/RHEL7 Resource separation / Filesystem (cont.)  Docker provides its own disk image management system, which mounts the specified image on the host and makes it the root filesystem of the container. # df -a Filesystem 1K-blocks Used Available Use% Mounted on rootfs 10190136 169036 9480428 2% / /dev/mapper/docker-252:3-130516-d798a41bcba1dbe621bf2dd87de0f9c6dd9f9c8aadb79f84e0170 5ee82f364c6 10190136 169036 9480428 2% / proc 0 0 0 - /proc sysfs 0 0 0 - /sys tmpfs 1025136 0 1025136 0% /dev shm 65536 0 65536 0% /dev/shm devpts 0 0 0 - /dev/pts /dev/vda3 14226800 3013432 10467640 23% /.dockerinit /dev/vda3 14226800 3013432 10467640 23% /etc/resolv.conf /dev/vda3 14226800 3013432 10467640 23% /etc/hostname /dev/vda3 14226800 3013432 10467640 23% /etc/hosts devpts 0 0 0 - /dev/console ... # df Filesystem 1K-blocks Used Available Use% Mounted on ... /dev/dm-2 10190136 169036 9480428 2% /var/lib/docker/devicemapper/mnt/d798a41bcba1dbe621bf2dd87de0f9c6dd9f9c8aadb79f84e017 05ee82f364c6 Filesystem seen in a container Specified disk image mounted on the host Disk image mounted on the host. Some files are separately bind-mounted.
  • 12. Open Cloud Campus 12 Inside Docker for Fedora20/RHEL7 Network namespace Resource separation / Network  Containers use Linux's "veth" devices for network communication. – veth is a pair of logical NIC devices connected through a (virtual) crossover cable.  One side of the veth pair is placed in a container's network namespace so that it can be seen only inside the container. The other side is connected to a Linux bridge on the host. – The device in the container is renamed to something like "eth0." By means of the namespace, network settings such as the IP address, routing table and iptables are independently configured in the container. – The connection between the bridge and a physical network is up to the host configuration. Host Linux vethXX eth0 docker0 eth0 IP masquerade Physical network  Docker creates a bridge "docker0" and packets from containers are forwarded with IP masquerade. – Packets from the physical network targeted at specified ports are forwarded to the container using the port forwarding feature of iptables. 172.17.42.1
  • 13. Open Cloud Campus 13 Inside Docker for Fedora20/RHEL7 Resource separation / CPU and Memory  Processes inside a container see all physical memory and CPU cores, but allocation is restricted with Linux's control groups (cgroups). – In theory, fine-grained allocation control including the number of CPU cores, CPU time quota and I/O bandwidth is possible.  Docker uses systemd's unit mechanism to manage the group of processes in the container. – When creating a container, Docker asks systemd to create a new unit to start the initial process. As a result, all processes fork/exec-ed from the initial process belong to the same unit. At the same time, systemd creates a new cgroups group for the unit. # systemd-cgls ... └─system.slice ├─docker-cc08291a81556ba55f049e50fd2c04287b04c6cf657a8a9971ef42468a2befa7.scope │ ├─7444 nginx: master process ngin │ ├─7458 nginx: worker proces │ ├─7459 nginx: worker proces │ ├─7460 nginx: worker proces │ └─7461 nginx: worker proces ... "docker-<Container ID>.scope" is the cgroups group name
  • 14. Inside Docker for Fedora20/RHEL7 Device Mapper Thin-Provisioning
  • 15. Open Cloud Campus 15 Inside Docker for Fedora20/RHEL7  Device Mapper is a Linux kernel mechanism to create logical block devices which provide additional features on top of physical block devices. This is done through wrapper software modules. Typical modules are: – dm-raid : adds a software RAID feature – dm-multipath : adds multipath access to LUNs – dm-crypt : adds an encryption feature – dm-delay : adds an access delay emulation feature What is Device Mapper? /dev/sda /dev/sdb /dev/dm1 Mirroring dm-raid /dev/sda /dev/dm1 dm-crypt Encryption /Decryption /dev/sda /dev/dm1 dm-delay Access delay
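As a small illustration of the wrapper idea (not Docker-specific; the device /dev/sdb and the 100ms delay are assumptions for illustration), dmsetup can stack the dm-delay module on top of an existing block device:
# size=$(blockdev --getsz /dev/sdb)                               (device size in 512-byte sectors)
# dmsetup create delayed --table "0 $size delay /dev/sdb 0 100"   (delay every I/O by 100ms)
# ls -l /dev/mapper/delayed                                       (use this device instead of /dev/sdb)
# dmsetup remove delayed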
  • 16. Open Cloud Campus 16 Inside Docker for Fedora20/RHEL7  Device Mapper Thin-Provisioning (dm-thin) is a relatively new module which provides "thin-provisioning" and "snapshot" features similar to commercial storage appliances.  dm-thin uses two block devices, one for the "block pool" and the other for the "metadata device." – Fixed-size blocks are dynamically allocated to logical devices so that blocks are consumed only when data are actually written. – Pointers from segments of logical devices to blocks in the block pool are stored in the metadata device. – CoW (Copy on Write) snapshots are created by allowing different logical devices to point to the same block. You can create multi-generation snapshots with this mechanism. What is Device Mapper Thin-Provisioning? Block Pool Metadata Device Pointers from segments of logical devices to blocks in the pool are stored. Logical device #001 Logical device #002 Logical device #003
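At the lowest level, a thin pool is just another device-mapper table, as described in the kernel's thin-provisioning documentation referenced later in this document. A minimal sketch, assuming /dev/loop1 as the metadata device and /dev/loop0 as the data device; the LVM interface on the next page wraps the same mechanism.
# dmsetup create pool --table "0 $(blockdev --getsz /dev/loop0) thin-pool /dev/loop1 /dev/loop0 128 32768"
                          (data block size = 128 sectors = 64KB, low water mark = 32768 blocks)
# dmsetup message /dev/mapper/pool 0 "create_thin 0"              (allocate thin device ID 0 in the pool)
# dmsetup create thin --table "0 2097152 thin /dev/mapper/pool 0" (expose it as a 1GB logical device)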
  • 17. Open Cloud Campus 17 Inside Docker for Fedora20/RHEL7  On recent Linux distributions, you can use dm-thin through the LVM interface as below. – First, create a volume group as usual. – Then, define a "thin pool". It creates LVs for the block pool and metadata in the background. Using dm-thin through LVM interface # fallocate -l $((1024*1024*1024)) pooldev.img # losetup -f pooldev.img # losetup -a /dev/loop0: [64768]:39781720 (/root/pooldev.img) # pvcreate /dev/loop0 # vgcreate vg_data /dev/loop0 # lvcreate -L 900M -T vg_data/thinpool Logical volume "lvol1" created Logical volume "thinpool" created # lvs LV VG Attr LSize Pool Origin Data% Move Log Cpy%Sync Convert ... lvol0 vg_data -wi------- 4.00m thinpool vg_data twi-a-tz-- 900.00m 0.00 LV: thinpool LV: lvol1 VG: vg_data Block pool Metadata device Logical device vol00 Logical device vol01 ・・・
  • 18. Open Cloud Campus 18 Inside Docker for Fedora20/RHEL7 – Define a new logical device specifying its logical size with the -V option. – Create a snapshot with the following command. – Snapshots are inactive by default for the sake of data protection. You can use a snapshot after activating it with the following command. Using dm-thin through LVM interface (cont.) # lvcreate -V 100G -T vg_data/thinpool -n vol00 Logical volume "vol00" created # lvs LV VG Attr LSize Pool Origin Data% Move Log Cpy%Sync Convert ... lvol0 vg_data -wi------- 4.00m thinpool vg_data twi-a-tz-- 900.00m 0.00 vol00 vg_data Vwi-a-tz-- 100.00g thinpool 0.00 # lvcreate -s --name vol01 vg_data/vol00 Logical volume "vol01" created # lvs LV VG Attr LSize Pool Origin Data% Move Log Cpy%Sync Convert ... lvol0 vg_data -wi------- 4.00m thinpool vg_data twi-a-tz-- 900.00m 0.00 vol00 vg_data Vwi-a-tz-- 100.00g thinpool 0.00 vol01 vg_data Vwi---tz-k 100.00g thinpool vol00 # lvchange -K -ay /dev/vg_data/vol01
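A quick way to see the snapshot behaviour end to end (the 1GB size, ext4 and the mount point are illustrative assumptions): put a filesystem on a small thin volume, write a file, take a snapshot, then mount the activated snapshot.
# lvcreate -V 1G -T vg_data/thinpool -n vol10
# mkfs.ext4 /dev/vg_data/vol10
# mount /dev/vg_data/vol10 /mnt; echo 'Hello, World!' > /mnt/test.txt; umount /mnt
# lvcreate -s --name vol11 vg_data/vol10
# lvchange -K -ay /dev/vg_data/vol11
# mount -o ro /dev/vg_data/vol11 /mnt; cat /mnt/test.txt; umount /mnt
Hello, World!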
  • 19. Open Cloud Campus 19 Inside Docker for Fedora20/RHEL7  Docker has a plugin mechanism for image management drivers, and the "Device Mapper driver" is used in Fedora20/RHEL7. It stores each image in a logical device of "Device Mapper Thin-Provisioning (dm-thin)." – When starting a new container, a snapshot of the specified image is attached to the container. – When storing the image with "docker commit", it creates a new snapshot of the snapshot. You should stop the container with "docker stop" before executing "docker commit." Use of Thin Provisioning in Docker Local image Snapshot Create a snapshot when starting a container. × run commit rm Processes Snapshot stop start Local image When a container is stopped, all processes in it are stopped. (The snapshot image is not deleted.) When a container is removed, the associated snapshot is deleted. Save a new local image by taking a snapshot of the snapshot.
  • 20. Open Cloud Campus 20 Inside Docker for Fedora20/RHEL7  Docker uses the native dm interface of the dm-thin module instead of the LVM interface. – When the docker service is launched, it loop-mounts the following "data" and "metadata" disk image files and creates a block pool with them. How Docker uses Device Mapper Thin-Provisioning? # ls -lh /var/lib/docker/devicemapper/devicemapper/ total 1.2G -rw-------. 1 root root 100G May 11 21:37 data -rw-------. 1 root root 2.0G May 11 22:05 metadata # losetup NAME SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE /dev/loop0 0 0 1 0 /var/lib/docker/devicemapper/devicemapper/data /dev/loop1 0 0 1 0 /var/lib/docker/devicemapper/devicemapper/metadata # lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT ... loop0 7:0 0 100G 0 loop └─docker-252:3-130516-pool 253:0 0 100G 0 dm loop1 7:1 0 2G 0 loop └─docker-252:3-130516-pool 253:0 0 100G 0 dm Block pool device Metadata device
  • 21. Open Cloud Campus 21 Inside Docker for Fedora20/RHEL7  Configuration data of logical devices are stored in the following JSON files. – /var/lib/docker/devicemapper/metadata/<Image ID> – The logical device with device ID "0" has a special role. It is created with a 10GB size when the Docker service is started for the first time. Docker initializes it as an empty ext4 filesystem. – When you download images from an external registry, snapshots of this device are used to store those images. Therefore, all logical devices have the same 10GB size and ext4 filesystem. How Docker uses Device Mapper Thin-Provisioning? (cont.) # docker images enakai/httpd REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE enakai/httpd ver1.0 d3d92adfcafb 36 hours ago 206.6 MB # cat /var/lib/docker/devicemapper/metadata/d3d92adfcafb* | python -mjson.tool { "device_id": 72, "initialized": false, "size": 10737418240, "transaction_id": 99 } # cat /var/lib/docker/devicemapper/metadata/base | python -mjson.tool { "device_id": 0, "initialized": true, "size": 10737418240, "transaction_id": 1 }
  • 22. Open Cloud Campus 22 Inside Docker for Fedora20/RHEL7  As a sort of hacking technique, you can mount disk image contents by hand, using the dmsetup command to interact with the dm-thin module. – First, using the commands on the previous page, check the "device_id" and "size" of the disk image you want to mount. In addition, check the name of the thin pool with the following command. It's "docker-252:3-130516-pool" in this example. – For the sake of simplicity, set these values in shell variables. Manipulating image contents by hand # lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT ... loop0 7:0 0 100G 0 loop └─docker-252:3-130516-pool 253:0 0 100G 0 dm loop1 7:1 0 2G 0 loop └─docker-252:3-130516-pool 253:0 0 100G 0 dm # device_id=72 # size=10737418240 # pool=docker-252:3-130516-pool
  • 23. Open Cloud Campus 23 Inside Docker for Fedora20/RHEL7 – Activate and mount the logical device with the following command. The "rootfs" directory contains the root filesystem seen from the container. – Finally, unmount and deactivate the logical device. (*) Modifying the contents of images is not a supported procedure of Docker. You should do it at your own risk as it may damage the image. – Reference: https://www.kernel.org/doc/Documentation/device-mapper/thin-provisioning.txt Manipulating image contents by hand (cont.) # dmsetup create myvol --table "0 $(($size / 512)) thin /dev/mapper/$pool $device_id" # lsblk ... loop0 7:0 0 100G 0 loop └─docker-252:3-130516-pool 253:0 0 100G 0 dm └─myvol 253:1 0 10G 0 dm loop1 7:1 0 2G 0 loop └─docker-252:3-130516-pool 253:0 0 100G 0 dm └─myvol 253:1 0 10G 0 dm # mount /dev/mapper/myvol /mnt # ls /mnt id lost+found rootfs # cat /mnt/rootfs/var/www/html/index.html Hello, World! # umount /mnt # dmsetup remove myvol
  • 24. Inside Docker for Fedora20/RHEL7 Network Namespace
  • 25. Open Cloud Campus 25 Inside Docker for Fedora20/RHEL7 Network namespace Network configuration in Docker  The container's logical NIC "eth0" is connected to the Linux bridge "docker0." Communication between the container and the external network is controlled with iptables on the host. – Packets from a container are forwarded with IP masquerade. – Packets from the external network to specified ports are forwarded to a container with iptables' port forwarding feature. Host Linux vethXX eth0 docker0 eth0 IP Masquerade 172.17.42.1  As an example, start a container with port forwarding from 8000 to 80 and from 2222 to 22. – One end of the veth pair is connected to the bridge "docker0." # docker run -itd -p 8000:80 -p 2222:22 enakai/httpd:ver1.0 a7838c84cd008161086839379e4a0be2d0e109e02c779229cde49f53b79ae1d5 # brctl show bridge name bridge id STP enabled interfaces docker0 8000.56847afe9799 no veth66c0 # ifconfig docker0 docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 172.17.42.1 netmask 255.255.0.0 broadcast 0.0.0.0 ...
  • 26. Open Cloud Campus 26 Inside Docker for Fedora20/RHEL7 Network configuration in Docker (cont.) – The nat table of iptables is configured as below. ① Packets from an external network are processed in the DOCKER chain for port forwarding. ② Packets from the host itself to a local IP address of the host (except "127.0.0.0/8") are processed in the DOCKER chain, too. ③ Packets from a container to an external network are forwarded with IP masquerade. ④⑤ Port forwarding configuration specified with "docker run". – I'm not sure why "127.0.0.0/8" is excluded in ②. But anyway, packets to "127.0.0.0/8" are processed appropriately because... (see the next page.) # iptables-save # Generated by iptables-save v1.4.19.1 on Fri Jun 13 22:36:14 2014 *nat ... -A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER -A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER -A POSTROUTING -s 172.17.0.0/16 ! -d 172.17.0.0/16 -j MASQUERADE -A DOCKER ! -i docker0 -p tcp -m tcp --dport 2222 -j DNAT --to-destination 172.17.0.23:22 -A DOCKER ! -i docker0 -p tcp -m tcp --dport 8000 -j DNAT --to-destination 172.17.0.23:80 COMMIT ① ② ③ ④ ⑤
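A quick way to confirm the two paths, assuming the example web server from the earlier pages is running in the container: access the forwarded port from another machine, which is DNAT-ed via PREROUTING ① and rule ⑤, and from the host itself via 127.0.0.1, which is excluded by ② and handled instead by the Docker proxy described on the next page.
$ curl http://<host address>:8000     (from an external machine: DNAT to 172.17.0.23:80 by rule ⑤)
# curl http://127.0.0.1:8000          (on the host: excluded by ②, served by the docker daemon's proxy)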
  • 27. Open Cloud Campus 27 Inside Docker for Fedora20/RHEL7 Network configuration in Docker (cont.) – The Docker daemon provides a port forwarding proxy feature, and packets which are not processed by iptables are handled by it. – Originally, this feature was prepared for hosts without iptables. I'm not sure why packets to "127.0.0.0/8" are selectively handled by it. # lsof -i -P COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME ... docker 20003 root 11u IPv6 177010 0t0 TCP *:2222 (LISTEN) docker 20003 root 12u IPv6 178468 0t0 TCP *:8000 (LISTEN) ...
  • 28. Open Cloud Campus 28 Inside Docker for Fedora20/RHEL7 Network namespace manipulation  As a sort of hacking technique, you can directly manipulate network namespaces. Without Docker, you would use network namespaces in the following steps. – Define a new namespace. – Add network configuration in the namespace such as a logical NIC, IP address, routing table and iptables. – Launch processes in the namespace.  You can use the "ip netns" command to manipulate network namespaces, but you need some additional operations to manipulate network namespaces created by Docker. – Find the PID of one of the processes in the container. – There is a symlink to the descriptor for manipulating the namespace in the /proc filesystem of this process. # systemd-cgls ... └─system.slice ├─docker-61151db106a7fd6d5cf937a03eac0e9b33c7799d3d48b6cddc83070839afeea9.scop │ ├─502 /bin/sh /usr/local/bin/init.sh │ ├─545 /usr/sbin/sshd │ ├─557 /usr/sbin/httpd ... # ls -l /proc/502/ns/net lrwxrwxrwx 1 root root 0 June 13 22:52 /proc/502/ns/net -> net:[4026532255]
  • 29. Open Cloud Campus 29 Inside Docker for Fedora20/RHEL7 Network namespace manipulation (cont.) – By creating a symlink under /var/run/netns/ to the descriptor, the ip command recognizes the namespace. – From this point, you can execute any commands inside the namespace "foo-ns." – For example, by starting bash inside the namespace, you can see the network configuration in the container. But configuration other than the network is the same as on the host since you switched only the network namespace. # mkdir /var/run/netns # ln -s /proc/502/ns/net /var/run/netns/foo-ns # ip netns foo-ns # ip netns exec foo-ns <command> # ip netns exec foo-ns bash # ifconfig eth0 eth0: flags=67<UP,BROADCAST,RUNNING> mtu 1500 inet 172.17.0.2 netmask 255.255.0.0 broadcast 0.0.0.0 ... # route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 172.17.42.1 0.0.0.0 UG 0 0 0 eth0 172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0 # exit # ip netns exec foo-ns <command>
  • 30. Open Cloud Campus 30 Inside Docker for Fedora20/RHEL7 Adding more logical NICs  With the hacking technique of "ip netns", you can add logical NICs after starting a new container. The following is an example of adding a logical NIC which connects to the physical network through a bridge "br0." (This is not a supported operation of Docker.) – Create a bridge "br0" and move the IP address (192.168.200.20/24 in this case) of the physical NIC to the bridge. # brctl addbr br0; ip link set br0 up # ip addr del 192.168.200.20/24 dev eth0; ip addr add 192.168.200.20/24 broadcast 192.168.200.255 dev br0; brctl addif br0 eth0; route add default gw 192.168.200.1 # echo 'NM_CONTROLLED="no"' >> /etc/sysconfig/network-scripts/ifcfg-eth0 # systemctl enable network.service Host Linux vethXX eth0 Container docker0 IP Masquerade External network vethYY eth1 br0 192.168.200.99 192.168.200.20 192.168.200.20 eth0 (*) You should understand what you're doing with these commands. They may disable the network connection if you make a mistake.
  • 31. Open Cloud Campus 31 Inside Docker for Fedora20/RHEL7 Adding more logical NICs (cont.) – Create a veth pair "veth-host / veth-guest" and attach "veth-host" to the bridge br0. # ip link add name veth-host type veth peer name veth-guest # ip link set veth-guest down # brctl addif br0 veth-host # brctl show br0 bridge name bridge id STP enabled interfaces br0 8000.525400677470 no eth0 veth-host Host Linux vethXX eth0 Container docker0 IP Masquerade External network veth-host veth-guest br0 eth0 • At this point, both veth-host and veth-guest are visible on the host, not in the container.
  • 32. Open Cloud Campus 32 Inside Docker for Fedora20/RHEL7 Adding more logical NICs (cont.) – Add veth-guest to the container's namespace. At this point, veth-guest becomes invisible on the host. – From this point, you can use "ip netns exec" to make additional network configurations in the container. The following renames the logical NIC to "eth1" and adds an IP address. In addition, it modifies the routing table to make eth1 the default route. # ip link set veth-guest netns foo-ns # ifconfig veth-guest veth-guest: error fetching interface information: Device not found # ip netns exec foo-ns ip link set veth-guest name eth1 # ip netns exec foo-ns ip addr add 192.168.200.99/24 dev eth1 # ip netns exec foo-ns ip link set eth1 up # ip netns exec foo-ns ip route delete default # ip netns exec foo-ns ip route add default via 192.168.200.1
  • 33. Open Cloud Campus 33 Inside Docker for Fedora20/RHEL7 Adding more logical NICs (cont.) – Log in to the container and check the network configuration inside the container. – Now you can directly access the container without port forwarding. – You can remove the symlink in /var/run/netns once you have finished the configuration.  By the way, there is a shell script to automate this procedure.... – jpetazzo/pipework – https://github.com/jpetazzo/pipework # ssh enakai@localhost -p 2222 $ ifconfig eth1 eth1 Link encap:Ethernet HWaddr BE:53:16:06:BF:3A inet addr:192.168.200.99 Bcast:0.0.0.0 Mask:255.255.255.0 ... $ route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 192.168.200.1 0.0.0.0 UG 0 0 0 eth1 172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0 192.168.200.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1 $ curl http://192.168.200.99:80 Hello, World! # rm /var/run/netns/foo-ns
  • 34. Inside Docker for Fedora20/RHEL7 systemd and cgroups
  • 35. Open Cloud Campus 35 Inside Docker for Fedora20/RHEL7 Basics of systemd and cgroups  Refer to the following slides for systemd basics. – Your first dive into systemd • http://www.slideshare.net/enakai/systemd-study-v14e  In particular, you need to understand how systemd manages cgroups in conjunction with units. – systemd defines various "units" corresponding to services and daemons. – When systemd starts a service as a unit, it dynamically creates a cgroups group for that unit. All processes of the service are placed under this group. – If you specify "CPUShares" and "MemoryLimit" in the unit's configuration file, they are translated to the corresponding cgroups settings. ("CPUShares" specifies the relative weight of CPU time allocation, and "MemoryLimit" specifies the upper limit of memory usage.)
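For a regular (non-container) service, these settings go into the unit's configuration file as mentioned above. A minimal sketch, assuming an httpd.service drop-in with illustrative values:
# cat /etc/systemd/system/httpd.service.d/50-resources.conf
[Service]
CPUShares=512
MemoryLimit=1G
# systemctl daemon-reload; systemctl restart httpd.service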
  • 36. Open Cloud Campus 36 Inside Docker for Fedora20/RHEL7 Basics of systemd and cgroups (cont.)  You can check the cgroups status managed by systemd with the following command. # systemd-cgls ├─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 23 ├─user.slice │ └─user-0.slice │ ├─session-1.scope │ │ ├─439 sshd: root@pts/0 │ │ ├─444 -bash │ │ ├─464 systemd-cgls │ │ └─465 systemd-cgls │ └─user@0.service │ ├─441 /usr/lib/systemd/systemd --user │ └─442 (sd-pam) └─system.slice ├─polkit.service │ └─352 /usr/lib/polkit-1/polkitd --no-debug ├─auditd.service │ └─301 /sbin/auditd -n ├─systemd-udevd.service │ └─248 /usr/lib/systemd/systemd-udevd ...
  • 37. Open Cloud Campus 37 Inside Docker for Fedora20/RHEL7 How Docker works with systemd?  When starting a container, Docker asks systemd to create a new unit to start the initial process. – As a result, all processes fork/exec-ed from the initial process belong to the same unit and are placed under the same cgroups group. The unit name is "docker-<container ID>.scope". # docker run -td -p 8000:80 -p 2222:22 enakai/httpd:ver1.0 # systemd-cgls -a ... └─system.slice ├─var-lib-docker-devicemapper-mnt-a985fc6dbe8dfc6335474ae68291ad3c51cddcbc28c1a4 7f7c4bc8b37e3b488b.mount ├─docker-a985fc6dbe8dfc6335474ae68291ad3c51cddcbc28c1a47f7c4bc8b37e3b488b.scope │ ├─496 /bin/sh /usr/local/bin/init.sh │ ├─538 /usr/sbin/sshd │ ├─550 /usr/sbin/httpd │ ├─552 /bin/bash │ ├─553 /usr/sbin/httpd │ ├─554 /usr/sbin/httpd │ ├─555 /usr/sbin/httpd │ ├─556 /usr/sbin/httpd │ ├─557 /usr/sbin/httpd │ ├─558 /usr/sbin/httpd │ ├─559 /usr/sbin/httpd │ └─560 /usr/sbin/httpd ...
  • 38. Open Cloud Campus 38 Inside Docker for Fedora20/RHEL7 How Docker works with systemd? – You can check the unit status corresponding to a container. # unitname=docker-a985fc6dbe8dfc6335474ae68291ad3c51cddcbc28c1a47f7c4bc8b37e3b488b.scope # systemctl status $unitname docker-a985fc6dbe8dfc6335474ae68291ad3c51cddcbc28c1a47f7c4bc8b37e3b488b.scope - docker container a985fc6dbe8dfc6335474ae68291ad3c51cddcbc28c1a47f7c4bc8b37e3b488b Loaded: loaded (/run/systemd/system/docker- a985fc6dbe8dfc6335474ae68291ad3c51cddcbc28c1a47f7c4bc8b37e3b488b.scope; static) Drop-In: /run/systemd/system/docker- a985fc6dbe8dfc6335474ae68291ad3c51cddcbc28c1a47f7c4bc8b37e3b488b.scope.d └─90-BlockIOAccounting.conf, 90-CPUAccounting.conf, 90-Description.conf, 90- MemoryAccounting.conf, 90-Slice.conf Active: active (running) since Fri 2014-06-13 23:05:27 JST; 1min 41s ago CGroup: /system.slice/docker- a985fc6dbe8dfc6335474ae68291ad3c51cddcbc28c1a47f7c4bc8b37e3b488b.scope ├─496 /bin/sh /usr/local/bin/init.sh ├─538 /usr/sbin/sshd ├─550 /usr/sbin/httpd ├─552 /bin/bash ├─553 /usr/sbin/httpd ├─554 /usr/sbin/httpd ├─555 /usr/sbin/httpd ... └─560 /usr/sbin/httpd Jun 13 23:05:27 fedora20 systemd[1]: Started docker container a985fc6dbe8dfc6335474ae68291ad3c51cddcbc28c1a...488b. Hint: Some lines were ellipsized, use -l to show in full.
  • 39. Open Cloud Campus 39 Inside Docker for Fedora20/RHEL7 How Docker works with systemd? (cont.) – There are "-c" and "-m" options for the "docker run" command. They are translated to the unit's configuration parameters "CPUShares" and "MemoryLimit". – After starting a container, you can change these parameters through systemd's interface.  systemd will be more tightly integrated with cgroups in the future, and additional resource controls (CPU pinning, CPU quota, I/O bandwidth) may then be added to Docker. # systemctl show $unitname | grep -E "(CPUShares=|MemoryLimit=)" CPUShares=1024 MemoryLimit=18446744073709551615 # systemctl set-property $unitname CPUShares=512 --runtime # systemctl show $unitname | grep -E "(CPUShares=|MemoryLimit=)" CPUShares=512 MemoryLimit=18446744073709551615
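A minimal sketch of this translation, reusing the example image from the earlier pages (the specific values are illustrative):
# docker run -td -c 512 -m 256m -p 8000:80 -p 2222:22 enakai/httpd:ver1.0
# systemctl show docker-<Container ID>.scope | grep -E "(CPUShares=|MemoryLimit=)"
CPUShares=512
MemoryLimit=268435456      (= 256MB in bytes)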
  • 40. Inside Docker for Fedora20/RHEL7 Etsuji Nakai Twitter @enakai00 Open Cloud Campus Let's learn up-to-date technology with Fedora/RHEL