{
1st Container Information Exchange Meeting @ Kansai
Kernel/LXC/OpenVZ/Virtuozzo/Linux-Vserver/etc
June 1, 2013
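• About me
• 36 years old (born December 1976, Showa 51)
• A good old-fashioned infrastructure engineer
• Devoted fan of Studio Ghibli
• Enjoy tinkering with the kernel
• Recently developed an interest in telephony / UC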
山本 政秀 / QLOOG
@masahide7
First things first: have you heard of docker?
https://github.com/dotcloud/docker/
Docker containers are both hardware-agnostic and platform-agnostic. This means that they can run
anywhere, from your laptop to the largest EC2 compute instance and everything in between - and they don't
require that you use a particular language, framework or packaging system. That makes them great building
blocks for deploying and scaling web apps, databases and backend services without depending on a
particular stack or provider.
docker demo
• Demo environment
[root@lxcbase-local dockerwk]# go version
go version go1.1 linux/amd64
[root@lxcbase-local dockerwk]# docker version
Version: 0.3.4
Git Commit: 1c09165+CHANGES
[root@lxcbase-local dockerwk]# uname -a
Linux lxcbase-local.qloog.ne.jp 3.9.4-QLOOG #2 SMP PREEMPT Sat Jun 1 02:59:16 JST 2013 x86_64 x86_64 x86_64 GNU/Linux
[root@lxcbase-local dockerwk]# ls
./ aufs-aufs-util/ docker/ go1.1.linux-amd64.tar.gz libarchive-3.1.2.tar.gz util-linux-2.23.1.tar.xz
../ aufs-aufs3-standalone/ go/ libarchive-3.1.2/ util-linux-2.23.1/
• Start the daemon
[root@lxcbase-local docker]# docker -d -b lxcbr1 >/var/log/docker.log 2>&1 &
• Search for and install an image (busybox)
[root@lxcbase-local docker]# docker search busy
2013/05/31 23:23:31 GET /v1.1/images/search?term=busy
Found 8 results matching your query ("busy")
NAME DESCRIPTION
test422/busybox
shykes/busybox
lopter/busybox
kencochrane/busybox
busybox
vieux/busybox
vieux/busybox.test
vieux/busyboxreadme
[root@lxcbase-local /]# docker pull busybox
2013/06/01 03:03:38 POST /v1.1/images/create?fromImage=busybox&registry=&tag=
Pulling repository busybox from https://index.docker.io/v1
Pulling image e9aa60c60128cad1 (latest) from busybox
• Spawn a container process on top of the image (that's all it takes!)
[root@lxcbase-local ~]# docker run -i -t busybox /bin/sh
BusyBox v1.19.3 (Ubuntu 1:1.19.3-7ubuntu1.1) built-in shell (ash)
Enter 'help' for a list of built-in commands.
/ # uname -a
Linux 9526b79d150a 3.9.4-QLOOG #2 SMP PREEMPT Sat Jun 1 02:59:16 JST 2013 x86_64 GNU/Linux
/ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
45: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether be:6d:92:19:da:5e brd ff:ff:ff:ff:ff:ff
inet 10.20.30.12/24 brd 10.20.30.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::bc6d:92ff:fe19:da5e/64 scope link
valid_lft forever preferred_lft forever
• Continued
/ # while :; do sleep 1; echo test; done
test
test
test
test
• Force-kill the process of the command that started the container
[root@lxcbase-local /]# pgrep -lf busybox
7540 docker run -i -t busybox /bin/sh
[root@lxcbase-local /]# kill -9 7540
• List the container processes (it is still running in the background)
[root@lxcbase-local /]# docker ps
ID IMAGE COMMAND CREATED STATUS PORTS
9526b79d150a busybox:latest /bin/sh 7 minutes ago Up 7 minutes
• Attach to that container
[root@lxcbase-local /]# docker attach 9526b79d150a
test
test
test
test
^C
/ #
• Try a somewhat more complete environment than busybox (the ubuntu image)
[root@lxcbase-local /]# docker pull ubuntu
Pulling repository ubuntu from https://index.docker.io/v1
Pulling image 27cf784147099545 () from ubuntu
Pulling 27cf784147099545 metadata
Pulling 27cf784147099545 fs layer
Downloading 94863360/? (n/a)
Pulling image 8dbd9e392a964056420e5d58ca5cc376ef18e2de93b5cc90e868a1bbc8318c1c (precise) from ubuntu
Pulling 8dbd9e392a964056420e5d58ca5cc376ef18e2de93b5cc90e868a1bbc8318c1c metadata
Pulling 8dbd9e392a964056420e5d58ca5cc376ef18e2de93b5cc90e868a1bbc8318c1c fs layer
Downloading 58337280/? (n/a)
Pulling image b750fe79269d2ec9a3c593ef05b4332b1d1a02a62b4accb2c21d589ff2f5f2dc (quantal) from ubuntu
Pulling b750fe79269d2ec9a3c593ef05b4332b1d1a02a62b4accb2c21d589ff2f5f2dc metadata
Pulling b750fe79269d2ec9a3c593ef05b4332b1d1a02a62b4accb2c21d589ff2f5f2dc fs layer
Downloading 10240/? (n/a)
[root@lxcbase-local /]#
• Create a container from the ubuntu image and start sh (once again, that's all it takes!)
[root@lxcbase-local ~]# docker run -i -t ubuntu /bin/sh
• Print some information from inside the container
# ip a s dev eth0
49: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether ea:5b:3b:45:d8:d5 brd ff:ff:ff:ff:ff:ff
inet 10.20.30.14/24 brd 10.20.30.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::e85b:3bff:fe45:d8d5/64 scope link
valid_lft forever preferred_lft forever
# uname -a
Linux 133fc6d7edbf 3.9.4-QLOOG #2 SMP PREEMPT Sat Jun 1 02:59:16 JST 2013 x86_64 x86_64 x86_64 GNU/Linux
# df -hP
df: cannot read table of mounted file systems: No such file or directory
# ps axufw
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 4392 612 ? S 18:47 0:00 /bin/sh
root 15 0.0 0.0 15268 1076 ? R+ 18:48 0:00 ps axufw
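docker summary
• The satisfaction of a single command doing everything
• Images can be created freely and managed in dotcloud's repository (https://index.docker.io/v1/)
• Environments can be set up freely on the host without worrying about things such as dependency problems
• Freedom from boot-related problems (no need to think about systemd, Upstart, and the like; the very idea of booting becomes unnecessary. "A fork in a separate namespace" may be the better description)
• The very small overhead of LXC can be put to effective use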
Now that we've had some fun, on to the next topic
• Interesting kernel debugging techniques
• hot-patching / kernel function hijacking
What is hot-patching?
• The best-known example is ksplice (http://www.ksplice.com/), which was acquired by Oracle
• Security patches without a reboot
• Swapping out kernel functions (lol)
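On to the demo!
• To start with, let's replace the function "pid_revalidate".
• Note: this function is called whenever /proc/[pid] is stat-ed (for example, once per process when you run ps or top), so it is a convenient place to debug a process's task_struct structure and the various data structures linked from it.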
Implementation before the replacement (from 3.9)
http://lxr.free-electrons.com/source/fs/proc/base.c#L1591
/*
 * Exceptional case: normally we are not allowed to unhash a busy
 * directory. In this case, however, we can do it - no aliasing problems
 * due to the way we treat inodes.
 *
 * Rewrite the inode's ownerships here because the owning task may have
 * performed a setuid(), etc.
 *
 * Before the /proc/pid/status file was created the only way to read
 * the effective uid of a /process was to stat /proc/pid. Reading
 * /proc/pid/status is slow enough that procps and other packages
 * kept stating /proc/pid. To keep the rules in /proc simple I have
 * made this apply to all per process world readable and executable
 * directories.
 */
int pid_revalidate(struct dentry *dentry, unsigned int flags)
{
    struct inode *inode;
    struct task_struct *task;
    const struct cred *cred;

    if (flags & LOOKUP_RCU)
        return -ECHILD;

    inode = dentry->d_inode;
    task = get_proc_task(inode);

    if (task) {
        if ((inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO)) ||
            task_dumpable(task)) {
            rcu_read_lock();
            cred = __task_cred(task);
            inode->i_uid = cred->euid;
            inode->i_gid = cred->egid;
            rcu_read_unlock();
        } else {
            inode->i_uid = GLOBAL_ROOT_UID;
            inode->i_gid = GLOBAL_ROOT_GID;
        }
        inode->i_mode &= ~(S_ISUID | S_ISGID);
        security_task_to_inode(task, inode);
        put_task_struct(task);
        return 1;
    }
    d_drop(dentry);
    return 0;
}
• Build the home-made kernel module (please ignore the warnings ^^;)
[root@lxcbase-local km]# ./build
make -C /lib/modules/3.9.4-QLOOG/build M=/home/qloog/kmodwk/km clean
make[1]: Entering directory `/home/qloog/kernels/linux-3.9.4-q'
CLEAN /home/qloog/kmodwk/km/.tmp_versions
CLEAN /home/qloog/kmodwk/km/Module.symvers
make[1]: Leaving directory `/home/qloog/kernels/linux-3.9.4-q'
rm -f Module* tests/mmap-mprotect-test
make -C /lib/modules/3.9.4-QLOOG/build M=/home/qloog/kmodwk/km modules
make[1]: Entering directory `/home/qloog/kernels/linux-3.9.4-q'
CC [M] /home/qloog/kmodwk/km/core.o
/home/qloog/kmodwk/km/core.c: In function 'qloog_kmod_allow_file':
/home/qloog/kmodwk/km/core.c:208: warning: the frame size of 1328 bytes is larger than 1024 bytes
CC [M] /home/qloog/kmodwk/km/module.o
CC [M] /home/qloog/kmodwk/km/security.o
/home/qloog/kmodwk/km/security.c: In function 'qloog_kmod_pid_revalidate':
/home/qloog/kmodwk/km/security.c:227: warning: unused variable 'ops'
/home/qloog/kmodwk/km/security.c:231: warning: ignoring return value of 'kstrtol', declared with attribute warn_unused_result
/home/qloog/kmodwk/km/security.c: In function 'hijack_syscalls':
/home/qloog/kmodwk/km/security.c:311: warning: ISO C90 forbids mixed declarations and code
CC [M] /home/qloog/kmodwk/km/symbols.o
CC [M] /home/qloog/kmodwk/km/malloc.o
CC [M] /home/qloog/kmodwk/km/sysctl.o
CC [M] /home/qloog/kmodwk/km/hijacks.o
CC [M] /home/qloog/kmodwk/km/arch/x86/lib/inat.o
CC [M] /home/qloog/kmodwk/km/arch/x86/lib/insn.o
LD [M] /home/qloog/kmodwk/km/qloog_kmod.o
Building modules, stage 2.
MODPOST 1 modules
CC /home/qloog/kmodwk/km/qloog_kmod.mod.o
LD [M] /home/qloog/kmodwk/km/qloog_kmod.ko
make[1]: Leaving directory `/home/qloog/kernels/linux-3.9.4-q'
• Run the demo command!
[root@lxcbase-local km]# ./test
[root@lxcbase-local km]#
[root@lxcbase-local km]# lsmod | head
Module Size Used by
qloog_kmod 29659 0 ⬅ this one
veth 4352 0
aufs 266722 0
xt_addrtype 2813 2
xt_nat 1878 2
iptable_nat 2742 1
nf_conntrack_ipv4 12368 1
nf_defrag_ipv4 1299 1 nf_conntrack_ipv4
nf_nat_ipv4 3432 1 iptable_nat
[root@lxcbase-local km]#
[root@lxcbase-local km]# pgrep -lf test ⬅ not listed (lol)
[root@lxcbase-local km]#
[root@lxcbase-local km]# /bin/ps axufwww | grep test
root 16990 0.0 0.0 6420 592 pts/3 S+ 05:33 0:00 \_ grep test ⬅ not listed (lol)
[root@lxcbase-local km]#
[root@lxcbase-local km]# dmesg | tail
[ 9181.761860] [qloog_kmod] * mnt_ns->count.counter: 3,
[ 9192.111482] [qloog_kmod] * pid: 16977,
[ 9192.111485] [qloog_kmod] * ns->count.counter: 146,
[ 9192.111487] [qloog_kmod] * mnt_ns->count.counter: 3,
[ 9202.473832] [qloog_kmod] * pid: 16977,
[ 9202.473834] [qloog_kmod] * ns->count.counter: 147,
[ 9202.473836] [qloog_kmod] * mnt_ns->count.counter: 3,
• Because the demo module replaces pid_revalidate, demo information for the specified process is dumped every time a ps-style command runs
[root@lxcbase-local km]# ls /sys/module/qloog_kmod/parameters/
./ ../ args command
[root@lxcbase-local km]# cat /sys/module/qloog_kmod/parameters/command
sniff_process_info
[root@lxcbase-local km]# cat /sys/module/qloog_kmod/parameters/args
16977
[root@lxcbase-local km]# ls -ld /proc/16977
/bin/ls: cannot access /proc/16977: No such file or directory
[root@lxcbase-local km]# echo 0 > /sys/module/qloog_kmod/parameters/args
[root@lxcbase-local km]# ls -ld /proc/16977
dr-xr-xr-x 8 root root 0 Jun 1 05:33 /proc/16977/ ⬅ it's back (lol)
[root@lxcbase-local km]# pgrep -lf test
16977 ./test ⬅ back here too (lol)
[root@lxcbase-local km]# kill -9 16977
[root@lxcbase-local km]# pgrep -lf test
[root@lxcbase-local km]# ⬅ terminated cleanly
[root@lxcbase-local km]# echo 1 > /sys/module/qloog_kmod/parameters/args
[root@lxcbase-local km]# ps > /dev/null
[root@lxcbase-local km]# dmesg |tail
[ 9984.380727] [qloog_kmod] * pid: 1,
[ 9984.380729] [qloog_kmod] * ns->count.counter: 149,
[ 9984.380731] [qloog_kmod] * mnt_ns->count.counter: 3,
• Dumped the information for init (pid 1).
[root@lxcbase-local km]# rmmod qloog_kmod
[root@lxcbase-local km]# dmesg | tail
[10061.708553] [qloog_kmod] removed from kernel
• Unloaded cleanly
• Contents of the demo program
[root@lxcbase-local km]# cat test.c
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

#define DEMO_PATH "export PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin; "

int main(void)
{
    char cmd[255];
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid > 0) {
        /* Parent: (re)load the module, passing the child's pid as the target. */
        snprintf(cmd, sizeof cmd,
                 DEMO_PATH "rmmod qloog_kmod >/dev/null 2>&1; "
                 "insmod ./qloog_kmod.ko command='sniff_process_info' args='%d'",
                 (int)pid);
        system(cmd);
        return 0;
    }
    /* Child: wait for the module to load, then try to kill this process by
     * name. In the demo the process survives and reappears once args is reset. */
    sleep(3);
    system(DEMO_PATH "killall -9 ld-2.12.so >/dev/null 2>&1; "
           "killall -9 test >/dev/null 2>&1; "
           "kill -9 `pgrep -f ld-2.12.so 2>&1` >/dev/null 2>&1; "
           "kill -9 `pgrep -f ./test 2>&1` >/dev/null 2>&1");
    sleep(6393600);
    return 0;
}
• Contents of the replacement pid_revalidate
static int qloog_kmod_pid_revalidate(struct dentry *dentry, struct nameidata *nd)
{
    /* Entry point used below to call the original version. */
    int (*run)(struct dentry *, struct nameidata *) = sym_pid_revalidate.run;
    int ret;
    long arg;
    struct task_struct *task;
    struct nsproxy *ns;
    struct proc_ns_operations *ops;    /* unused (see the build warning) */
    struct inode *inode = dentry->d_inode;

    /* Only intercept when the "command" parameter selects this mode. */
    if (strcmp(command, "sniff_process_info"))
        goto _call_orig_version;

    /* "args" holds the target pid; the return value is ignored (see the build warning). */
    kstrtol(args, 10, &arg);

    rcu_read_lock();
    task = pid_task(PROC_I(inode)->pid, PIDTYPE_PID);
    if (task && task->pid == arg) {
        printk(KERN_DEBUG PKPRE "* pid: %d, ", task->pid);
        ns = task->nsproxy;
        if (ns) {
            //printk("* ops->type: %d, ", ops->type);
            printk(KERN_DEBUG PKPRE "* ns->count.counter: %d, ", ns->count.counter);
            if (ns->mnt_ns) {
                struct mnt_namespace *mnt_ns = ns->mnt_ns;
                printk(KERN_DEBUG PKPRE "* mnt_ns->count.counter: %d, ", mnt_ns->count.counter);
            }
        }
        printk(KERN_DEBUG "\n");
        rcu_read_unlock();
        /* Report the target's /proc entry as nonexistent. */
        return -ENOENT;
    }
    rcu_read_unlock();

_call_orig_version:
    /* Everything else is passed through to the original version. */
    ret = run(dentry, nd);
    return ret;
}
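Summary of the debugging technique
• You can peek at what is happening inside the kernel
• A module can change the kernel's behavior from the outside (do something odd and it crashes in no time)
• That makes it useful for debugging kernel namespaces, cgroups, CRIU, and so on
• It's fun (lol)
• It also makes bad things possible (don't do them!)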
Thank you very much!
• Let's keep building excitement around containers!
