rootless
User namespace
•
•
•
•
•
• user
• user
• podman
• podman uid_map
• podman
• podman
• rootless
•
: rootless
• root root
• Docker: membership in the docker group ≒ root, so that is rootful, not rootless
• rootless
•
• e.g. CVE-2014-9357: (Docker)
• root
rootless : podman
• RHEL8
• Docker
• Podman docker
• root daemon Docker
Red Hat
• root
• docker
• RHEL8 podman rootless
•
•
• Retrieva Tech Blog
• [🔍 TECH Blog]
•
•
• Linux Namespace and cgroups (+ CoW, seccomp, etc. ……)
• Linux Namespace pid ( )OS
ID ( )
•
• root
:
• Exposed under /proc/${PID}/ns/
• Inherited from the parent on fork
• Created with clone(2) or unshare(2)
• Joined with setns(2)
• using an fd opened from /proc/${PID}/ns/
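The links under /proc/${PID}/ns/ can be read directly to see which namespaces a process is in. The following is a minimal sketch, assuming a Linux /proc; the helper names `ns_id` and `same_ns` are made up for illustration and are not part of any tool.

```shell
# ns_id PID KIND: print the identifier that /proc/PID/ns/KIND points to,
# for example "user:[4026531837]". Prints nothing if the link is unreadable.
ns_id() {
  readlink "/proc/$1/ns/$2" 2>/dev/null || true
}

# same_ns PID_A PID_B KIND: succeed when both processes are in the same
# namespace of the given kind, i.e. when the two identifiers are equal.
same_ns() (
  a=$(ns_id "$1" "$3")
  b=$(ns_id "$2" "$3")
  [ -n "$a" ] && [ "$a" = "$b" ]
)
```

For example, `ns_id $$ user` prints the user namespace of the current shell, and `same_ns $$ 1 net` tells whether the shell shares its network namespace with PID 1 (reading another user's process may need extra privileges).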
:
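• mnt : mount points (2.4.19 and later)
• ipc : IPC resources (2.6.19 and later)
• uts : hostname and domain name (2.6.19 and later)
• net : network stack (2.6.24 and later)
• pid : process IDs (2.6.24 and later)
• user : uid/gid and capabilities (2.6.23 and later)
• completed in 3.8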
: mnt
•
• /tmp
• pivot_root
• /proc
• The clone(2) flag is CLONE_NEWNS: it was the first of the CLONE_NEW* flags (2.4.19), hence the generic name
: ipc
• (InterProcess Communication)
•
• PIPE IPC
• /proc/sys/fs/mqueue
: uts
•
•
• /etc/hosts
mnt
: net
•
•
• net
•
• veth( )
• ip(1)
• /proc/${PID}/ns/ bind
: pid
• id
• pid pid
• pid
• /proc mnt /proc
• ps(1) /proc pid
user new!!
•
• uid
• → uid=0 (root)
• Linux 3.8 User Namespace
• clone(2) CLONE_NEWUSER 2.6.23 clone(2)
3.5 3.8
• RHEL RHEL7.3(Kernel 3.10.0) User Namespace
• RHEL7.4 sysctl RHEL8
user
•
•
• uid=0 ( )
• e.g. (uid=0) / /
SUID / CLONE_FS chroot so / mount propagation
/ audit log( ) etc
• RHEL Fedora Project
User Namespace
•
• root
• etc
• =
• User Namespace
:
• RHEL7/CentOS7 (7.4 or later) (not needed on RHEL8 / Ubuntu)
• sudo sysctl user.max_user_namespaces=31194
• the default for user namespaces on 7 is 0
•
• sudo useradd -m -U -u 2001 alice
• sudo useradd -m -U -u 2002 bob
• sudo useradd -m -U -u 2003 -G wheel charlotte; sudo passwd charlotte
: unshare -U
• unshare(1) -U user
• root
• 65534(nobody)
• sysctl kernel.overflowuid (kernel.overflowgid)
• uid/gid
• nobody
[alice@rutledge ~]$ id # alice
uid=2001(alice) gid=2001(alice) groups=2001(alice) ...
[alice@rutledge ~]$ readlink /proc/$$/ns/user
user:[4026531837]
[alice@rutledge ~]$ unshare -U # no sudo needed
[nobody@rutledge ~]$ id
uid=65534(nobody) gid=65534(nobody) groups=65534(nobody) ...
[nobody@rutledge ~]$ readlink /proc/$$/ns/user
user:[4026532602]
[nobody@rutledge ~]$ sysctl kernel.overflowuid
kernel.overflowuid = 65534
[nobody@rutledge ~]$ ls -ld /home/* /root/
drwx------. 2 nobody nobody 99 Apr 15 18:36 /home/alice
drwx------. 2 nobody nobody 62 Apr 15 18:11 /home/bob
drwx------. 2 nobody nobody 83 Apr 15 18:32 /home/charlotte
dr-xr-x---. 2 nobody nobody 114 Apr 12 18:55 /root/
: nobody
•
• /home/alice
• /home/bob
• → nobody
• Alice
• user alice
• → Alice
• user alice
• nobody
[nobody@rutledge ~]$ touch /home/alice/file
[nobody@rutledge ~]$ touch /home/bob/file
touch: cannot touch '/home/bob/file': Permission denied
[nobody@rutledge ~]$ ls -l /home/alice/file
-rw-rw-r--. 1 nobody nobody 0 Apr 15 18:40 /home/alice/file
[nobody@rutledge ~]$ ls -l /home/bob/
ls: cannot open directory '/home/bob/': Permission denied
[nobody@rutledge ~]$ exit
logout
[alice@rutledge ~]$ ls -l /home/alice/file
-rw-rw-r--. 1 alice alice 0 Apr 15 18:40 /home/alice/file
: alice nobody
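• /proc/${PID}/uid_map holds the uid mapping of the process's user namespace
• each line: (first uid inside the namespace) (first uid outside the namespace) (length)
• up to 5 lines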
•
•
•
• uid
• uid
[alice@rutledge ~]$ unshare -U
[nobody@rutledge ~]$ id
uid=65534(nobody) gid=65534(nobody) groups=65534(nobody) ...
[nobody@rutledge ~]$ echo $$
2392
--- in another terminal ---
[alice@rutledge ~]$ echo "0 2002 1" > /proc/2392/uid_map
-bash: echo: write error: Operation not permitted
[alice@rutledge ~]$ echo "0 2001 2" > /proc/2392/uid_map
-bash: echo: write error: Operation not permitted
[alice@rutledge ~]$ echo "0 2001 1" > /proc/2392/uid_map
[alice@rutledge ~]$ echo "0 2001 1" > /proc/2392/uid_map
-bash: echo: write error: Operation not permitted
--- back in the first terminal ---
[nobody@rutledge ~]$ id
uid=0(root) gid=65534(nobody) groups=65534(nobody) ...
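The failures in the session above follow from the rule for unprivileged writers: the map may contain a single line that maps exactly one uid, and the outside uid must be the writer's own. The sketch below only models that check; the function name `unpriv_map_ok` is hypothetical, and the write-once rule (the fourth `echo` above) is not modeled.

```shell
# unpriv_map_ok WRITER_UID "INSIDE OUTSIDE LENGTH"
# Succeed only if the map line is one an unprivileged writer may put into
# uid_map: it maps exactly one uid (LENGTH = 1) and the outside uid is the
# writer's own uid.
unpriv_map_ok() (
  writer_uid=$1
  set -- $2                    # split the map line into its three fields
  [ "$#" -eq 3 ] || exit 1
  outside=$2
  length=$3
  [ "$length" -eq 1 ] && [ "$outside" -eq "$writer_uid" ]
)
```

With alice's uid 2001, `unpriv_map_ok 2001 "0 2001 1"` succeeds, while `"0 2002 1"` (someone else's uid) and `"0 2001 2"` (more than one uid) are rejected, matching the transcript.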
: root
• uid=0 2001(alice)
• alice uid=0(root)
• /home/bob /root (
)alice
nobody( )
• unshare -r
• sudo root
[nobody@rutledge ~]$ id
uid=0(root) gid=65534(nobody) groups=65534(nobody) ...
[nobody@rutledge ~]$ ls -ld /home/* /home/
drwxr-xr-x. 5 nobody nobody 47 Apr 15 18:21 /home/
drwx------. 2 root nobody 111 Apr 15 18:40 /home/alice
drwx------. 2 nobody nobody 62 Apr 15 18:11 /home/bob
drwx------. 2 nobody nobody 83 Apr 15 18:32 /home/charlotte
: root
• root
• /etc/shadow
• bob home
•
•
•
• poweroff
• root 🤔
[root@rutledge ~]# cat /etc/shadow
cat: /etc/shadow: Permission denied
[root@rutledge ~]# touch /home/bob/file
touch: cannot touch '/home/bob/file': Permission denied
[root@rutledge ~]# pkill NetworkManager
pkill: killing pid 969 failed: Operation not permitted
[root@rutledge ~]# ip link add type veth
RTNETLINK answers: Operation not permitted
[root@rutledge ~]# mount -t tmpfs tmpfs /bin/
mount: /usr/bin: permission denied.
[root@rutledge ~]# umount /boot
umount: /boot: must be superuser to unmount.
[root@rutledge ~]# poweroff
Failed to connect to bus: Operation not permitted
Failed to open initctl fifo: Permission denied
Failed to talk to init daemon.
: root
• user alice
•
• user root
• chroot
• -U unshare
•
[root@rutledge ~]# chroot /
[root@rutledge /]# unshare --pid --fork --mount-proc
[root@rutledge /]# ps -el --forest
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
4 S 0 1 0 0 80 0 - 7337 - pts/1 00:00:00 bash
0 R 0 24 1 0 80 0 - 11184 - pts/1 00:00:00 ps
:
• user user
• mnt mount
• net
• pid
• user
root
• ok (user
)
• user
[root@rutledge /]# unshare --mount --net --pid --fork --mount-proc
[root@rutledge /]# mount -t tmpfs tmp /tmp/
[root@rutledge /]# findmnt /tmp
TARGET SOURCE FSTYPE OPTIONS
/tmp tmp tmpfs rw,relatime,seclabel,uid=2001,gid=2001
[root@rutledge /]# ip link add type veth
[root@rutledge /]# ip a
1: lo: <LOOPBACK> ...
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> ...
    link/ether 22:43:f8:f3:10:60 brd ff:ff:ff:ff:ff:ff
3: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> ...
    link/ether e2:d0:8b:dd:19:b0 brd ff:ff:ff:ff:ff:ff
:
•
chroot/pivot_root
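1. Prepare a root filesystem
2. Create user + mount namespaces
3. Bind-mount the root filesystem so that it can be used with pivot_root
4. Create the oldroot directory
5. pivot_root
6. Leave oldroot with exec chroot
7. Lazy umount oldroot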
•
--- yum: run by charlotte (sudo), then handed over to alice ---
[alice@rutledge ~]$ su - charlotte
[charlotte@rutledge ~]$ sudo yum install -y --installroot=/home/alice/wonderland --releasever=8 @core iproute
[charlotte@rutledge ~]$ sudo chown -R alice: /home/alice/wonderland
--- as alice ---
[alice@rutledge ~]$ unshare -Ur -n -m -pf
[root@rutledge ~]# mkdir -p under_ground
[root@rutledge ~]# mount -o bind wonderland under_ground
[root@rutledge ~]# mkdir -p under_ground/.oldroot
[root@rutledge ~]# cd under_ground
[root@rutledge under_ground]# pivot_root . .oldroot
[root@rutledge under_ground]# exec chroot . /bin/bash -l
[root@rutledge /]# mount -t proc proc /proc
[root@rutledge /]# umount --lazy .oldroot
[root@rutledge /]# findmnt
TARGET SOURCE FSTYPE OPTIONS
/ /dev/mapper/rhel-home[/alice/wonderland] xfs rw,relatime,seclabel,attr2,inode64,noquota
└─/proc proc proc rw,relatime
:
• ……
• su
• →uid_map 1
• net
• →net veth
NIC net
root
• bind overlayfs
• CoW
• → overlayfs (Kernel
)user
[root@rutledge /]# useradd jack
Setting mailbox file permissions: Invalid argument
[root@rutledge /]# su - jack
su: cannot set groups: Operation not permitted
[root@rutledge /]# ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop ...
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
[root@rutledge /]# mkdir -p upper work newroot
[root@rutledge /]# mount -t overlay -o lowerdir=/,upperdir=upper,workdir=work overlay newroot
mount: /mnt: permission denied.
podman
•
• podman (on RHEL8)
• podman yum dnf
• centos7 sleep inf
• Docker podman exec
• sudo (rootless!!)
[alice@rutledge ~]$ podman run -d centos:centos7 sleep inf
1209...7e74
[alice@rutledge ~]$ podman exec -lit /bin/bash
[root@1209b4cedd82 /]# ps aux --forest
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 6 1.0 0.3 11832 2972 pts/0 Ss 10:23 0:00 /bin/bash
root 19 0.0 0.4 51748 3392 pts/0 R+ 10:23 0:00 \_ ps aux --forest
root 1 0.0 0.0 4372 664 ? Ss 10:22 0:00 sleep inf
1: podman uid_map
• podman
• uid_map: "0 2001 1" and "1 100000 65536" ……
• jack has uid=1000 inside the container, which is uid=100999 on the host ……
•
• root
uid_map uid
• 100000
• 🤔
[alice@rutledge ~]$ podman exec -lit /bin/bash
[root@1209b4cedd82 /]# useradd jack
[root@1209b4cedd82 /]# su -c id jack
uid=1000(jack) gid=1000(jack) groups=1000(jack)
[root@1209b4cedd82 /]# cat /proc/1/uid_map
0 2001 1
1 100000 65536
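The two-line map above determines which host uid backs each uid inside the container: a uid that falls in a line's inside range is shifted by the same offset into the outside range. A small sketch of that lookup follows; `host_uid` is a hypothetical helper, reading map lines in the uid_map format from standard input.

```shell
# host_uid INSIDE_UID < uid_map
# Print the uid outside the namespace that INSIDE_UID maps to, or fail if
# no line of the map covers it. Each input line has the uid_map format
# "inside outside length".
host_uid() (
  target=$1
  while read -r inside outside length; do
    if [ "$target" -ge "$inside" ] && [ "$target" -lt $((inside + length)) ]; then
      echo $((outside + target - inside))
      exit 0
    fi
  done
  exit 1
)

# jack (uid 1000 in the container) under the map shown above:
printf '0 2001 1\n1 100000 65536\n' | host_uid 1000   # prints 100999
```

The same call with uid 0 prints 2001, which is why root in the container acts as alice on the host.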
newuidmap(1) / newgidmap(1)
• shadow-utils
• /proc/${pid}/uid_map(gid_map)
•
• SUID
=root
uid
• /etc/subuid (subgid)
• useradd
•
• SUID rootless ……
[alice@rutledge ~]$ cat /etc/subuid
alice:100000:65536
bob:165536:65536
charlotte:231072:65536
[alice@rutledge ~]$ cat /etc/subgid
alice:100000:65536
bob:165536:65536
charlotte:231072:65536
[alice@rutledge ~]$ unshare -U sleep inf &
[1] 7126
[alice@rutledge ~]$ newuidmap $! 0 2002 1
newuidmap: uid range [0-1) -> [2002-2003) not allowed
[alice@rutledge ~]$ newuidmap $! 0 $(id -u) 1 1 100000 65536
[alice@rutledge ~]$ newgidmap $! 0 $(id -g) 1 1 100000 65536
[alice@rutledge ~]$ cat /proc/$!/uid_map
0 2001 1
1 100000 65536
[alice@rutledge ~]$ cat /proc/$!/gid_map
0 2001 1
1 100000 65536
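The "not allowed" error above comes from a range check against /etc/subuid. The sketch below imitates only that part, as an illustration rather than the actual newuidmap logic: the function name `subuid_allows` is made up, the file is passed explicitly, and the separate allowance for mapping the caller's own uid (the `0 $(id -u) 1` triple) is not covered.

```shell
# subuid_allows FILE USER START COUNT
# Succeed if the uid range [START, START+COUNT) lies inside one range
# granted to USER in FILE, where FILE uses the "name:first:count" format
# of /etc/subuid.
subuid_allows() (
  file=$1 user=$2 start=$3 count=$4
  while IFS=: read -r name first n; do
    [ "$name" = "$user" ] || continue
    if [ "$start" -ge "$first" ] && [ $((start + count)) -le $((first + n)) ]; then
      exit 0
    fi
  done < "$file"
  exit 1
)
```

With the file shown above, `subuid_allows /etc/subuid alice 100000 65536` succeeds, while a request for uid 2002 (bob) or for 165536 (the start of bob's subordinate range) fails.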
SUID rootless
• rootless
• root
• (200 )
• int overflow ……
• uid_map/gid_map
• e.g. user uid
• ( 1 1 newuidmap …… )
2: podman
• podman ( )
• tap0
• grep
slirp4netns
•
tap0
• → TUN/TAP
[alice@rutledge ~]$ podman exec -lit /bin/bash
[root@1209b4cedd82 /]# curl -I 'https://retrieva.jp/'
HTTP/1.1 200 OK
:
[root@1209b4cedd82 /]# yum install -y iproute
[root@934bf6e4252b /]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue ...
:
2: tap0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel ...
:
[root@934bf6e4252b /]# exit
[alice@rutledge ~]$ ps aux | grep tap0
alice 11881 0.0 0.2 4592 1856 pts/0 S 19:22 0:00 /usr/bin/slirp4netns -c -e 3 -r 4 11870 tap0
[alice@rutledge ~]$ kill 11870
[alice@rutledge ~]$ podman exec -it $(podman ps -ql) ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue ...
:
slirp4netns: slirp
• slirp SLIP
(Serial Line Internet Protocol)
• SLIP PPP
•
net
slirp4netns
• QEMU
• IP
• default route: 10.0.2.2/24
• DNS forward: 10.0.2.3
• DHCP addresses: 10.0.2.15 - 10.0.2.31
[alice@rutledge ~]$ podman exec -lit /bin/bash
[root@934bf6e4252b /]# curl 'https://retrieva.jp/' -I
HTTP/1.1 200 OK
:
[root@a041f01d3221 /]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP>...
link/loopback 00:00:00:00:00:00 brd ...
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: tap0: <BROADCAST,UP,LOWER_UP>...
link/ether 0e:3c:3c:65:d9:82 brd ...
inet 10.0.2.100/24 brd 10.0.2.255 scope global tap0
valid_lft forever preferred_lft forever
inet6 fe80::c3c:3cff:fe65:d982/64 scope link
valid_lft forever preferred_lft forever
[root@a041f01d3221 /]# ip route
default via 10.0.2.2 dev tap0
10.0.2.0/24 dev tap0 proto kernel scope link src 10.0.2.100
[root@934bf6e4252b /]# exit
slirp4netns: slirp netns
•
root
• net
• SUID
• RHEL8 slirp
listen
• slirp4netns-0.1-2 bind
[alice@rutledge ~]$ ls -l $(which slirp4netns)
-rwxr-xr-x. 1 root root 76264 Aug 11 2018 /usr/bin/slirp4netns
[alice@rutledge ~]$ podman run -p 10080:80 centos:centos7
port bindings are not yet supported by rootless containers
[alice@rutledge ~]$ rpm -q slirp4netns
slirp4netns-0.1-1.dev.gitc4e1bc5.el8+1463+3d8a3dce.x86_64
3: CoW
• OS
• CoW(Copy-on-Write)
+
• Docker dm-thin overlayfs
root
• podman info
• GraphDriverName vfs
• GraphRoot ~/.local/ storage
• RunRoot /run/user/${UID}/run
• RunRoot bind
(hosts resolv.conf ) GraphRoot
• vfs-layers/mountpoints.json
[alice@rutledge ~]$ podman info
:
store:
  ContainerStore:
    number: 1
  GraphDriverName: vfs
  GraphOptions: []
  GraphRoot: /home/alice/.local/share/containers/storage
  GraphStatus: {}
  ImageStore:
    number: 1
  RunRoot: /run/user/2001/run
[alice@rutledge ~]$ find /run/user/2001/run
:
/run/user/2001/run/vfs-containers/d1ab...eefd
/run/user/2001/run/vfs-containers/d1ab...eefd/userdata
:
/run/user/2001/run/vfs-layers
/run/user/2001/run/vfs-layers/mountpoints.json
podman (vfs)
• mountpoints.json
•
• jack's home directory is owned by 100999 on the host (per uid_map)
•
[alice@rutledge ~]$ jq '.[].path' /run/user/2001/run/vfs-layers/mountpoints.json
"/home/alice/.local/share/containers/storage/vfs/dir/aeaa...458a"
[alice@rutledge ~]$ ll /home/alice/.local/share/containers/storage/vfs/dir/aeaa...458a/
total 16
-rw-r--r--. 1 alice alice 12082 Mar 6 02:36 anaconda-post.log
lrwxrwxrwx. 1 alice alice 7 Mar 6 02:34 bin -> usr/bin
drwxr-xr-x. 2 alice alice 6 Mar 6 02:34 dev
[alice@rutledge ~]$ ll /home/alice/.local/share/containers/storage/vfs/dir/aeaa...7458a/home/
total 0
drwx------. 2 100999 100999 62 Apr 15 21:19 jack
podman(vfs)
• centos:centos7 image: 210M
• 10 containers: 210M*10=2G
• CoW
• → 2G
• CoW
[alice@rutledge ~]$ du -sh .local/share/containers/storage/vfs/dir/aeaa...7458a/
210M .local/share/containers/storage/vfs/dir/aeaa...458a/
[alice@rutledge ~]$ df -h .local/share/containers/storage/
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rhel-home 20G 4.2G 16G 21% /home
[alice@rutledge ~]$ seq 10 | xargs -I{} podman run -d centos:centos7 sleep inf
[alice@rutledge ~]$ df -h .local/share/containers/storage/
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rhel-home 20G 6.3G 14G 32% /home
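The growth seen in df can be checked with simple arithmetic: without CoW every container carries a full copy of the image. A tiny sketch, with the hypothetical helper name `vfs_usage_mb` and sizes in MB:

```shell
# vfs_usage_mb IMAGE_MB CONTAINERS
# Disk space in MB consumed when every container gets its own full copy of
# the image, with nothing shared between containers.
vfs_usage_mb() {
  echo $(( $1 * $2 ))
}

vfs_usage_mb 210 10   # prints 2100
```

2100 MB is about 2.1G, in line with Used going from 4.2G to 6.3G after the ten containers were started.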
fuse-overlayfs(1)
• vfs
• fuse-overlayfs user
overlayfs
• ~/.config/containers/storage.conf
• storage.driver="overlay"
• storage_options.mount_program="/usr/bin/fuse-overlayfs"
•
podman storage
• vfs XFS reflink
shallow copy/CoW
• orz
[alice@rutledge ~]$ podman rm -f --all
[alice@rutledge ~]$ podman rmi -f --all
[alice@rutledge ~]$ su -c 'rm /home/alice/.local/' charlotte
[alice@rutledge ~]$ mkdir -p .config/containers/
[alice@rutledge ~]$ cat .config/containers/storage.conf
[storage]
driver = "overlay"
[storage.options]
mount_program = "/usr/bin/fuse-overlayfs"
podman with fuse-overlayfs
• / fuse-overlayfs
• ~/.local/share/containers/storage/*/ overlayfs
• diff: CoW
• work: overlayfs
• merged: overlayfs
•
• → mnt
[alice@rutledge ~]$ podman run -d centos:centos7 sleep inf
[alice@rutledge ~]$ podman exec -l findmnt /
TARGET SOURCE FSTYPE OPTIONS
/ fuse-overlayfs fuse.fuse-overlayfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
[alice@rutledge ~]$ ll /home/alice/.local/share/containers/storage/overlay/*
/home/alice/.local/share/containers/storage/overlay/2bbb2f38cf08544b67e60954e9da373c67f2d5658a7e6a074afc5818c9805ebe:
total 8
drwxr-xr-x. 4 alice alice 28 Apr 16 23:13 diff
-rw-r--r--. 1 alice alice 26 Apr 16 23:13 link
-rw-rw-r--. 1 alice alice 28 Apr 16 23:13 lower
drwx------. 2 alice alice 6 Apr 16 23:13 merged
drwx------. 3 alice alice 18 Apr 16 23:13 work
:
rootless
• su (uid_map )
• newuidmap(1) / newgidmap(1) (SUID )
• net (veth )
• slirp4netns !
• (bind )
• bind overlayfs (CoW )
• fuse-overlayfs userns
• XFS reflink
: rootless
1.
2. user + mnt + net
3. [NEW] newuidmap(1) / newgidmap(1)
4. [UPDATE] pivot_root bind fuse-overlayfs
5. oldroot
6. [NEW] fuse-overlayfs pivot_root mnt
•
• dev/ console tty bind mount sys/ proc/
7. pivot_root
8. oldroot exec chroot
9. oldroot lazy umount
10. [NEW] slirp4netns
11. [NEW] ip route
• Rootless
• https://www.slideshare.net/AkihiroSuda/rootless
• Namespaces in operation, part 1: namespaces overview [LWN.net]
• https://lwn.net/Articles/531114/
• Namespaces in operation, part 5: User namespaces [LWN.net]
• https://lwn.net/Articles/532593/
• Filesystem mounts in user namespaces [LWN.net]
• https://lwn.net/Articles/652468/
• Anatomy of a user namespaces vulnerability [LWN.net]
• https://lwn.net/Articles/543273/
• Man page of USER_NAMESPACES
• https://linuxjm.osdn.jp/html/LDP_man-pages/man7/user_namespaces.7.html
• util-linux/unshare.c at master · karelzak/util-linux
• https://github.com/karelzak/util-linux/blob/master/sys-utils/unshare.c
• shadow/newuidmap.c at master · shadow-maint/shadow
• https://github.com/shadow-maint/shadow/blob/master/src/newuidmap.c
• hnakamur’s blog: QEMU Wiki Slirp Tap
• http://hnakamur.blogspot.com/2009/08/qemu-wikislirptap.html
• slirp4netns/main.c at master · rootless-containers/slirp4netns
• https://github.com/rootless-containers/slirp4netns/blob/master/main.c
• Working with the Container Storage library and tools in Red Hat Enterprise Linux
• https://www.redhat.com/en/blog/working-container-storage-library-and-tools-red-hat-enterprise-linux
• The State of Rootless Containers
• https://www.slideshare.net/AkihiroSuda/the-state-of-rootless-containers

Container Virtualization, Behind the Scenes: User Namespaces and Rootless Containers
