Linux Containers
with focus on namespaces
created December 2014 for SUSE Linux Expert Forum
Ralf Dannert
Systems Engineer
rdannert@suse.com
2
Agenda
• Containers – clean slate approach
• Linux namespaces
3
Container examples
‒ Non-Linux:
‒ Solaris Containers (Zones), FreeBSD jails, WPAR (AIX)
‒ Linux:
‒ Vserver, OpenVZ, and FreeVPS
‒ Out of tree
‒ Process containers:
‒ OpenAFS PAGs (process authentication groups)
‒ Inheritance through fork()
‒ Cached token used for access control
‒ http://docs.openafs.org/AdminGuide/ch02s10.html
‒ Process containers: http://lwn.net/Articles/236529/
‒ Plan9:
‒ Everything as a filesystem (naming, access, protection methods)
‒ per-process namespaces
4
Linux containers - a conceptual artifice
‒ Namespaces
‒ Isolation, virtualization
‒ clone() and unshare()
‒ Resource containers
‒ manage the use of resources outside the operating system
‒ disk, network, memory and processor
‒ cgroups
‒ Capability bounding sets
‒ divide the privileges traditionally associated with superuser into distinct units
‒ limit the privileges available to containers, e.g. CAP_SYS_ADMIN
‒ Checkpoint/restart
‒ Requires the mechanisms listed above
Containers – clean slate approach
6
Looking forward..
‒ 16 Aug 2006 Andrew Morton
‒ “Generally, I am not very comfortable merging any
namespace/containerization/resource management patches into
mainline until we have some sort of high-level agreed-to roadmap
which will take us to an agreed-to-at-a-high-level destination.
‒ Now, I _am_ OK with merging useless infrastructure as long as all
the prime stakeholders are OK with it. ..
‒ That would not be a useful patchset on its own because nothing
_uses_ it..
‒ We don't normally merge useless patches, but this is a special
case.
‒ So, (policy making on the fly), let's start merging the well-tested,
well-isolated, low-overhead generally-agreed-to features into
mainline.”
7
Multiple Instances of the Global Linux Namespaces (2006)
Eric W. Biederman, Linux Networx
‒ By adding additional namespaces .. we can, at a trivial cost,
extend the UNIX concept and make novel uses of Linux
possible
‒ Multiple instances of a namespace simply means that you can
have two things with the same name.
‒ Implementation: give an application with capabilities full control
over a namespace while still leaving it unable to escape
‒ https://www.kernel.org/doc/ols/2006/ols2006v1-pages-101-112.pdf
8
History
http://www.golem.de/0610/48351.html
9
Coordinated Efforts 2007
Companies And Individuals Involved
‒ Arista Networks (Arastra): Eric Biederman - all, initial approach
‒ SGI: Paul Jackson - original cpusets, now part of cgroups
‒ Linux-VServer: Herbert Poetzl - namespaces, containers
‒ OpenVZ: Pavel Emelyanov, Kir Kolyshkin
‒ Google: Paul Menage - task containers, cgroups
‒ Zap project: Oren Laadan - C/R
‒ IBM: Serge E. Hallyn, Dave Hansen, Cedric Le Goater, Daniel Lezcano -
ns, C/R; Balbir Singh, Srivatsa Vaddagiri - task containers
‒ Others: NEC, XtreemOS, Kerlabs, Bull, HP, PlanetLab
‒ Source: container mailing list - containers development plans (Aug 8 2007)
10
Coordinated Efforts
‒ post anything container-related to the containers mailing list before
any attempt to send it upstream - containers@lists.osdl.org
‒ make sure what is in -mm fits openvz, VServer and other
products
‒ make sure initial framework also fits requirements of basic
resource management system
Linux Namespaces
12
Namespaces
• Namespaces - lightweight process virtualization
• Isolation: Enable a process (or several processes) to
have different views of the system than other
processes
• Currently 6 namespaces:
‒ mnt, pid, net, ipc, uts, user
‒ 4 more planned..(2006)
‒ security namespace
‒ security keys namespace
‒ device namespace
‒ time namespace
13
Mount namespace
‒ Mount namespaces were the first namespace type, by Al Viro, 2002
‒ Kernel 2.4.19
‒ CLONE_NEWNS
‒ 6 CLONE_NEW* flags were added (include/linux/sched.h)
‒ These flags (or a combination of them) can be used in clone()
or unshare() syscalls to create a namespace
14
clone() flags
‒ Flag            Since kernel   Required capability
‒ CLONE_NEWNS     2.4.19         CAP_SYS_ADMIN
‒ CLONE_NEWUTS    2.6.19         CAP_SYS_ADMIN
‒ CLONE_NEWIPC    2.6.19         CAP_SYS_ADMIN
‒ CLONE_NEWPID    2.6.24         CAP_SYS_ADMIN
‒ CLONE_NEWNET    2.6.29         CAP_SYS_ADMIN
‒ CLONE_NEWUSER   3.8            No capability is required
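The flags in the table are plain bit masks, so asking for several namespaces at once is a bitwise OR. The following is a minimal Python sketch, assuming the constant values from the kernel header linux/sched.h; the helper names `combine` and `describe` are hypothetical and exist only for illustration.

```python
# CLONE_NEW* bit values as defined in the kernel header <linux/sched.h>.
CLONE_FLAGS = {
    "CLONE_NEWNS":   0x00020000,  # mount namespace
    "CLONE_NEWUTS":  0x04000000,  # hostname / domain name
    "CLONE_NEWIPC":  0x08000000,  # System V IPC
    "CLONE_NEWUSER": 0x10000000,  # user and group IDs
    "CLONE_NEWPID":  0x20000000,  # process IDs
    "CLONE_NEWNET":  0x40000000,  # network stack
}


def combine(*names):
    """OR the named flags into one value suitable for clone() or unshare()."""
    mask = 0
    for name in names:
        mask |= CLONE_FLAGS[name]
    return mask


def describe(mask):
    """Return the names of the CLONE_NEW* flags set in mask, ordered by bit value."""
    ordered = sorted(CLONE_FLAGS.items(), key=lambda item: item[1])
    return [name for name, bit in ordered if mask & bit]
```

For instance, `combine("CLONE_NEWNET", "CLONE_NEWUSER")` yields the mask that a combined network and user namespace request would pass to unshare().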
15
Namespaces: system calls
‒ 3 system calls are used
‒ clone()
‒ creates a new process and a new namespace; the process is attached to the new namespace
‒ unshare()
‒ creates a new namespace and attaches the current process to it
‒ reverses sharing that was done using the clone(2) system call (2005)
‒ setns(int fd, int nstype)
‒ join an existing namespace
16
• Namespaces have no names; no system call takes a namespace name as a parameter
• 6 entries (inodes) added under /proc/<pid>/ns
‒ Kernel 3.8
• Nsproxy
• Kernel config items:
‒ CONFIG_UTS_NS
‒ CONFIG_IPC_NS
‒ CONFIG_USER_NS
‒ CONFIG_PID_NS
‒ CONFIG_NET_NS
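Since kernel 3.8 each entry under /proc/<pid>/ns is a symbolic link whose target has the form type:[inode]; two processes share a namespace when the inode numbers match. The following is a small Python sketch of reading those links. The function names are hypothetical, and `read_namespaces` assumes a Linux /proc and sufficient permission to inspect the target process.

```python
import os
import re

# Link targets look like "net:[4026531956]".
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a link target such as 'net:[4026531956]' into (type, inode)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def read_namespaces(pid="self"):
    """Map each entry name under /proc/<pid>/ns to its namespace inode number."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for entry in os.listdir(ns_dir):
        _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        result[entry] = inode
    return result
```

Comparing `read_namespaces()` with `read_namespaces(1)` (run as root) would show which namespaces the current process shares with init.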
17
Namespace: User space additions
‒ nsenter (util-linux >= 2.23)
‒ wrapper around setns
‒ allows running a new process in the context of an existing process
‒ iproute2
‒ ip netns
‒ add, del, exec
‒ util-linux
‒ unshare
‒ All 6 namespaces
18
UTS namespace
‒ UTS - UNIX Time-sharing System
‒ new_utsname struct:
‒ sysname, nodename, release, version, machine, domainname
‒ CLONE_NEWUTS
‒ Since 2.6.19
‒ Initial use case: VServer/OpenVZ - clone a new UTS namespace
for each new virtual server
‒ http://lwn.net/Articles/179345/
‒ Demo: unshare -u /bin/bash
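The fields listed above are what uname(2) reports, and a process sees the values of its own UTS namespace. The following is a minimal Python sketch that reads them; the helper name `uts_fields` is hypothetical, and note that Python's `os.uname()` exposes five of the six fields (it does not return domainname).

```python
import os


def uts_fields():
    """Return the uname(2) fields visible in the caller's UTS namespace."""
    info = os.uname()
    return {
        "sysname": info.sysname,
        "nodename": info.nodename,  # the hostname seen in this UTS namespace
        "release": info.release,
        "version": info.version,
        "machine": info.machine,
    }
```

Run inside the shell started by the demo command, after changing the hostname there, `uts_fields()["nodename"]` would be expected to differ from the value seen on the host.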
19
IPC namespace
‒ same principle as UTS
‒ a process has an independent namespace for System V
message queues, semaphore sets and shared memory
segments
‒ CONFIG_IPC_NS, CONFIG_SYSVIPC
‒ CLONE_NEWIPC flag:
‒ since 2.6.19
20
Network namespace
‒ A network namespace is logically another copy of the network
stack, with its own routes, firewall rules, and network devices
‒ a network device belongs to exactly one network namespace
‒ a socket belongs to exactly one network namespace
‒ a new network namespace only includes the loopback device
‒ communication between namespaces using veth or unix
sockets
21
Network namespace: Use cases
‒ Turn off network inside namespace:
‒ ensure that processes running there will be unable to make connections
outside of the namespace
‒ e.g. spam, botnets
‒ Restricted namespace:
‒ Even processes that handle network traffic (a web server worker process or
web browser rendering process for example) can be placed into a restricted
namespace
‒ Namespace without network devices
‒ make it impossible for child or worker processes to make additional network
connections
‒ http://lwn.net/Articles/580893/
22
Network namespace
‒ man ip-netns
‒ ip netns add <net_ns>
‒ creates /var/run/netns/<net_ns>
‒ ip netns exec NAME cmd ... - Run cmd in the named network namespace
‒ /etc/netns/<net_ns>/resolv.conf overrides /etc/resolv.conf
‒ Communicate between network namespaces by
‒ creating a pair of network devices (veth) and moving one of them to another
network namespace
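The veth approach above can be sketched as a short command plan. This Python helper only builds the ip(8) command lines and executes nothing; the function name and the interface names veth0 and veth1 are placeholders chosen for illustration, not part of the original slides.

```python
def veth_link_commands(namespace, host_if="veth0", peer_if="veth1"):
    """Build ip(8) command lines that link the current network namespace
    to another one through a veth pair.

    Returns a list of argv lists; nothing is executed here.
    """
    return [
        # Create the named namespace.
        ["ip", "netns", "add", namespace],
        # Create both ends of the veth pair in the current namespace.
        ["ip", "link", "add", host_if, "type", "veth", "peer", "name", peer_if],
        # Move one end into the target namespace.
        ["ip", "link", "set", peer_if, "netns", namespace],
        # Bring up both ends.
        ["ip", "netns", "exec", namespace, "ip", "link", "set", peer_if, "up"],
        ["ip", "link", "set", host_if, "up"],
    ]
```

Each entry could be passed to `subprocess.run(cmd, check=True)` as root; addresses still have to be assigned on both ends before traffic flows.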
network namespaces demo
24
Network namespace example
Move a VPN connection to its own namespace
‒ ip netns add tns0
‒ mkdir /etc/netns/tns0
‒ openconnect -s /etc/vpnc/vpnc-script <your-vpn-network>
‒ ip link set dev tun0 netns tns0
‒ #example: VPN_IP_ADDRESS=`ip a|grep 149|sed -e 's/..*149/149/' -e 's#/32.*##'`
‒ ip netns exec tns0 ip addr add $VPN_IP_ADDRESS dev tun0
‒ ip netns exec tns0 ip link set tun0 up
‒ ip netns exec tns0 ip link set lo up
‒ #test: ip netns exec tns0 ping $VPN_IP_ADDRESS
‒ #ip netns exec tns0 ip route restore </tmp/ip-route-save-vpn
‒ ip route | sed -E -e 's/ (scope|proto) .*//' -e 's/^/ip route add /' >/tmp/ip-route-add
‒ chmod 755 /tmp/ip-route-add
‒ ip netns exec tns0 /tmp/ip-route-add
‒ #test: ip netns exec tns0 ip route
‒ echo nameserver <your_VPN_specific_nameserver> >/etc/netns/tns0/resolv.conf
‒ ip netns exec tns0 cat /etc/resolv.conf
‒ ip netns exec tns0 wget <IP_ADDRESS_only_available_via_VPN>
25
User namespace
‒ only namespace which can be created without CAP_SYS_ADMIN capability
‒ A process will have a distinct set of UIDs, GIDs and capabilities
‒ User namespaces allow per-namespace mappings of user and group IDs.
‒ users and groups may have privileges for certain operations inside the
container without having those privileges outside the container
‒ Capabilities
‒ have root privileges for operations inside the container only
‒ map user IDs on the host system to corresponding user IDs in the
namespace
‒ Complete since kernel 3.8
‒ Having a full set of capabilities in your local user namespace is safe
‒ user namespace root users can create network namespaces
User namespaces demo
27
User namespaces demo
‒ as demo user:
‒ unshare --net --user /bin/bash
‒ nobody@sles12rc3:~> echo $$
‒ 4016
‒ as root user:
‒ cat /proc/4016/uid_map
‒ #empty
‒ #ID-inside-ns ID-outside-ns length
‒ echo 0 1000 10 > /proc/4016/uid_map
‒ echo 0 100 10 > /proc/4016/gid_map
‒ as demo user:
‒ nobody@sles12rc3:~> id
‒ uid=0(root) gid=0(root) groups=0(root)
‒ nobody@sles12rc3:~> whoami
‒ root
‒ nobody@sles12rc3:~> ls -la /root/
‒ ls: cannot open directory /root/: Permission denied
http://man7.org/linux/man-pages/man7/user_namespaces.7.html
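Each line written to uid_map or gid_map in the demo is a triple: first ID inside the namespace, first ID outside, and the length of the range. The following is a minimal Python sketch of how such a table translates IDs; the function names are hypothetical and the code only models the lookup, it does not touch /proc.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map content into (inside, outside, length) tuples."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            entries.append((inside, outside, length))
    return entries


def to_outside(entries, inside_id):
    """Translate an ID inside the namespace to the ID outside it.

    Returns None when the ID falls in no mapped range.
    """
    for inside, outside, length in entries:
        if inside <= inside_id < inside + length:
            return outside + (inside_id - inside)
    return None
```

With the demo's mapping `0 1000 10`, UID 0 inside the namespace corresponds to UID 1000 outside, which is consistent with root in the namespace being denied access to /root/ on the host.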
Appendix
Advanced Container examples
30
cgroup only container
‒ One of the cgroup only container uses we see at Parallels (so no separate
filesystem and no net namespaces) is pure apache load balancer type
shared hosting. In this scenario, base apache is effectively brought up in
the host environment, but then spawned instances are resource limited
using cgroups according to what the customer has paid.
‒ Obviously all apache instances are sharing /var and /run from the host
(mostly for logging and pid storage and static pages). The reason some
hosters do this is that it allows much higher density simple web serving
(either static pages from quota limited chroots or dynamic pages limited by
database space constraints) because each "instance" shares so much from
the host. The service is obviously much more basic than giving each
customer a container running apache, but it's much easier for the hoster to
administer and it serves the customer just as well for a large cross section
of use cases and for those it doesn't serve, the hoster usually has separate
container hosting (for a higher price, of course).
‒ systemd-devel ml: Sun, 25 Aug 13, 19:16 CEST James Bottomley
31
PaaS SaaS Container
‒ I gave you one example: a really simplistic one. A more sophisticated
example is a PaaS or SaaS container where you bring the OS up in the host
but spawn a particular application into its own container (this is essentially
similar to what Docker does). Often in this case, you do add separate
mount and network namespaces to make the application isolated and
migratable with its own IP address. The reason you share init and most of
the OS from the host is for elasticity and density, which are fast becoming a
holy grail type quest of cloud orchestration systems: if you don't have to
bring up the OS from init and you can just start the application from a C/R
image (orders of magnitude smaller than a full system image) and slap on
the necessary namespaces as you clone it, you have something that comes
online in milliseconds which is a feat no hypervisor based virtualisation can
match.
‒ systemd-devel ml, Sun, 25 Aug 13, 20:16 CEST James Bottomley
32
tidbits
‒ mboxgrep namespace systemd-devel201*
‒ It sounds like you're setting up your containers wrongly. If a container can
reboot the system it means that host root capabilities have leaked into the
container, which is a big security no-no. The upstream way of avoiding this
is USER_NS (because root in the container is now not root in the host).
The OpenVZ kernel uses a different mechanism to solve the problem, but
we think USER_NS is the better way to go on this.
‒ For launching new services in a container simply sending a message to the
init process is probably what you want. I think those messages already
traverse unix domain sockets so it isn't too shabby.
33
tidbits
‒ mboxgrep namespace systemd-devel201*
‒ Feb 2014
‒ > FYI I have successfully run Fedora 19 with systemd inside a container
‒ > with libvirt LXC, however, I did *not* enable user namespaces. Every
‒ > time I try user namespaces I find some other bug in either the kernel
‒ > or libvirt, so I wouldn't be surprised if yet more breakage has
‒ > occurred in user namespaces :-(
‒ Those bugs should now be fixed, if you don't enable the option, how are we
supposed to know what is left to be done? :)
34
tidbits
‒ https://lkml.org/lkml/2013/4/25/596
‒ > Final question, is it by design that uid 0 within a namespace is not
‒ > allowed to write to
‒ > /proc/*/oom_score_adj?
‒
‒ Essentially. It is by design that uid 0 within a namespace be mapped to some
other uid outside the namespace, and that the permissions on writes should use
the permission needed outside of the user namespace.
‒ Which means there are all kinds of things only uid 0 can write to, that you can't
touch in a user namespace. Some of those things the policy may need to be
reconsidered. A lot of those things the default policy is good. Regardless we are
now defaulting to not letting root in a container do risky things which is a good
thing.
‒ Eric
35
Capabilities
‒ http://man7.org/linux/man-pages/man7/user_namespaces.7.html
‒ The child process created by clone(2) with the CLONE_NEWUSER flag starts out
with a complete set of capabilities in the new user namespace. Likewise, a
process that creates a new user namespace using unshare(2) or joins an existing
user namespace using setns(2) gains a full set of capabilities in that namespace.
On the other hand, that process has no capabilities in the parent (in the case of
clone(2)) or previous (in the case of unshare(2) and setns(2)) user namespace,
even if the new namespace is created or joined by the root user (i.e., a process
with user ID 0 in the root namespace).
‒ Note that a call to execve(2) will cause a process's capabilities to be recalculated
in the usual way (see capabilities(7)), so that usually, unless it has a user ID of 0
within the namespace or the executable file has a nonempty inheritable
capabilities mask, it will lose all capabilities.
‒ Having a capability inside a user namespace permits a process to perform
operations (that require privilege) only on resources governed by that namespace.
36
Socketat - network namespaces
‒ http://lwn.net/Articles/407615/
‒ The use case is the handful of networking applications that find that it
makes sense to listen to sockets from multiple network namespaces at once. Say a
home machine that has a vpn into your office network and the vpn into the office network
runs in a different network namespace so you don't have to worry about address conflicts
between the two networks, the chance of accidentally bridging between them, and so you
can use different dns resolvers for the different networks.
‒ In that scenario it would be nice if I could run some services on both networks. Starting
two+ copies of the daemons just so they can live in all of the networks is ok, but in the
fullness of time I expect that there will be daemons that want to optimize things and have
sockets in all of the network namespaces you are connected to.
‒ In a multiple network namespace aware application when it goes to open a socket it will
want to specify which network namespace the socket is in. If it is a general listener it will
probably be listening to events in /proc/mounts waiting for extra namespaces to be mounted
under a standard location say: /var/run/netns/<netnsname>/ns.
‒ Once the application receives the event for a new network namespace showing up it will
want to create a new socket listening for connections in the new network namespace.
‒ In that scenario none of those network namespaces are foreign, but one network
namespace will be the default and the rest will be non-default network namespaces.
37
socketat
‒ http://lists.openvz.org/pipermail/devel/2010-October/025720.html
‒ [Devel] Re: [PATCH 8/8] net: Implement socketat.
‒ Just to clarify this point. You enter the namespace, create the socket and go back
to the initial namespace (or create a new one). Further operations can be made
against this fd because it is the network namespace stored in the sock struct
which is used, not the current process network namespace which is used at the
socket creation only.
‒ We can actually already do that by unsharing and then create a socket. This
socket will pin the namespace and can be used as a control socket for the
namespace (assuming the socket domain will be ok for all the operations).
‒ .. if I assume you want to create a process controlling 1024 netns, let's try to
identify what happens with setns and with socketat:
‒ With setns:
‒ * open /proc/self/ns/net (1)
‒ * unshare the netns
‒ * open /proc/self/ns/net (2)
‒ * setns (1)
‒ * create a virtual network device
‒ * move the virtual device to (2) (using the set netns by fd)
38
socketat
‒ http://lists.openvz.org/pipermail/devel/2010-October/025736.html
‒ > The app control point is in namespace0. I still want to be able to
‒ > "boot" namespaces first and maybe a few seconds later do a socketat()...
‒ > and create devices, tcp sockets etc. I suspect create_ns(namespace-name)
‒ > would involve:
‒ > * open /proc/self/ns/net (namespace-name)
‒ > * unshare the netns
‒ > Is this correct?
‒
‒ Almost.
‒ create should be:
‒ * verify namespace-name is not already in use
‒ * mkdir -p /var/run/netns/<namespace-name>
‒ * unshare the netns
‒ * mount --bind /proc/self/ns/net /var/run/netns/<namespace-name>
39
Operating system–level virtualization
As of: 30.11.2014
http://en.wikipedia.org/wiki/Operating_system-level_virtualization
40
References – old
‒ Paul B. Menage. Adding Generic Process Containers to the Linux Kernel. Proceedings
of the Ottawa Linux Symposium, 2007.
‒ http://www.kernel.org/doc/ols/2007/ols2007v2-pages-45-58.pdf
‒ Linux-CR: Transparent Application Checkpoint-Restart in Linux
‒ http://www1.cs.columbia.edu/~orenl/papers/ols2010-linuxcr.pdf
‒ Making applications mobile using containers
‒ http://lxc.sourceforge.net/doc/ols2006/lxc-ols2006-slides.pdf
‒ Virtual Servers and Checkpoint/Restart in Mainstream Linux
‒ describes the general namespace support in Linux and its usage
‒ Transparent Checkpoint-Restart of Multiple Processes on Commodity Operating
Systems - Oren Laadan
‒ Source: Operating System Virtualization: Practice and Experience, Oren
Laadan (systor2010_osvirt.pdf)
41
References
‒ http://lwn.net/Articles/531114/#series_index
‒ Namespaces in operation, 6 part series by Michael Kerrisk
‒ https://github.com/bigbighd604/C-Notes
‒ Git repository with demo code from the namespaces series
‒ www.haifux.org/lectures/299/netLec7.pdf (Rami Rosen, 2013)
‒ https://www.kernel.org/doc/ols/2006/ols2006v1-pages-101-112.pdf (Biederman)
‒ http://books.google.de/books?id=RpsQAwAAQBAJ&pg=PA424&lpg=PA423&ots=rAqP4sxMXn&focus=viewport&dq=Rami+Rosen+network+namespaces&hl=de
‒ Linux Kernel Networking (Rami Rosen)
‒ http://www.makelinux.net/kernel_map/
‒ http://en.wikipedia.org/wiki/Operating_system-level_virtualization
‒ /usr/src/linux/Documentation/unshare.txt
‒ How to find namespaces in a Linux system
‒ http://www.opencloudblog.com/?p=251
42
Corporate Headquarters
Maxfeldstrasse 5
90409 Nuremberg
Germany
+49 911 740 53 0 (Worldwide)
www.suse.com
Join us on:
www.opensuse.org
43
Unpublished Work of SUSE LLC. All Rights Reserved.
This work is an unpublished work and contains confidential, proprietary and trade secret information of SUSE LLC.
Access to this work is restricted to SUSE employees who have a need to know to perform tasks within the scope of
their assignments. No part of this work may be practiced, performed, copied, distributed, revised, modified, translated,
abridged, condensed, expanded, collected, or adapted without the prior written consent of SUSE.
Any use or exploitation of this work without authorization could subject the perpetrator to criminal and civil liability.
General Disclaimer
This document is not to be construed as a promise by any participating company to develop, deliver, or market a
product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making
purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document,
and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The
development, release, and timing of features or functionality described for SUSE products remains at the sole
discretion of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at
any time, without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in
this presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All
third-party trademarks are the property of their respective owners.

Tuan Q. Phan - WESST - Getting Started on the Computational Social Sciences
Tuan Q. Phan - WESST - Getting Started on the Computational Social SciencesTuan Q. Phan - WESST - Getting Started on the Computational Social Sciences
Tuan Q. Phan - WESST - Getting Started on the Computational Social Sciences
 
Solaris basics
Solaris basicsSolaris basics
Solaris basics
 
Endocode Kubernetes Meetup: Architecture Patterns for Microservices in Kubern...
Endocode Kubernetes Meetup: Architecture Patterns for Microservices in Kubern...Endocode Kubernetes Meetup: Architecture Patterns for Microservices in Kubern...
Endocode Kubernetes Meetup: Architecture Patterns for Microservices in Kubern...
 
Docker introduction
Docker introductionDocker introduction
Docker introduction
 
POUG2022_OracleDbNestInsideOut.pptx
POUG2022_OracleDbNestInsideOut.pptxPOUG2022_OracleDbNestInsideOut.pptx
POUG2022_OracleDbNestInsideOut.pptx
 
Elastic101tutorial Percona Live Europe 2018
Elastic101tutorial Percona Live Europe 2018Elastic101tutorial Percona Live Europe 2018
Elastic101tutorial Percona Live Europe 2018
 
Elastic 101 tutorial - Percona Europe 2018
Elastic 101 tutorial - Percona Europe 2018 Elastic 101 tutorial - Percona Europe 2018
Elastic 101 tutorial - Percona Europe 2018
 
How containers helped a SaaS startup be developed and go live
How containers helped a SaaS startup be developed and go liveHow containers helped a SaaS startup be developed and go live
How containers helped a SaaS startup be developed and go live
 
Basic orientation to Linux
Basic orientation to LinuxBasic orientation to Linux
Basic orientation to Linux
 
Unix Security
Unix SecurityUnix Security
Unix Security
 
Linux training
Linux trainingLinux training
Linux training
 
Linux: An Unbeaten Empire
Linux: An Unbeaten EmpireLinux: An Unbeaten Empire
Linux: An Unbeaten Empire
 
Linux
LinuxLinux
Linux
 
Unraveling Docker Security: Lessons From a Production Cloud
Unraveling Docker Security: Lessons From a Production CloudUnraveling Docker Security: Lessons From a Production Cloud
Unraveling Docker Security: Lessons From a Production Cloud
 

Recently uploaded

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 

Recently uploaded (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 

Linux containers-namespaces(Dec 2014)

  • 1. Linux Containers with focus on namespaces created December 2014 for SUSE Linux Expert Forum Ralf Dannert Systems Engineer rdannert@suse.com
  • 2. 2 Agenda • Containers – clean slate approach • Linux namespaces
  • 3. 3 Container examples ‒ Non-Linux: ‒ Solaris Containers(Zones), FreeBSD jails, WPAR(AIX) ‒ Linux: ‒ Vserver, OpenVZ, and FreeVPS ‒ Out of tree ‒ Process containers: ‒ OpenAFS's PAGs process authentication group membership ‒ Inheritance through fork() ‒ Cached token used for access control ‒ http://docs.openafs.org/AdminGuide/ch02s10.html ‒ Process containers: http://lwn.net/Articles/236529/ ‒ Plan9: ‒ Everything as a filesystem(naming, access, protection methods) ‒ per-process namespaces
  • 4. 4 Linux containers - a conceptual artifice ‒ Namespaces ‒ Isolation, virtualization ‒ clone() and unshare() ‒ Resource containers ‒ manage the use of resources outside the operating system ‒ disk, network, memory and processor ‒ cgroups ‒ Capability bounding sets ‒ divide the privileges traditionally associated with superuser into distinct units ‒ limit the privilege available to containers, CAP_SYS_ADMIN ‒ Checkpoint/restart ‒ Requires former
  • 5. Containers – clean slate approach
  • 6. 6 Looking forward.. ‒ 16 Aug 2006 Andrew Morton ‒ “Generally, I am not very comfortable merging any namespace/containerization/resource management patches into mainline until we have some sort of high- level agreed-to roadmap which will take us to an agreed-to-at-a-high-level destination. ‒ Now, I _am_ OK with merging useless infrastructure as long as all the prime stakeholders are OK with it. .. ‒ That would not be a useful patchset on its own because nothing _uses_ it.. ‒ We don't normally merge useless patches, but this is a special case. ‒ So, (policy making on the fly), let's start merging the well-tested, well-isolated, low-overhead generally-agreed-to features into mainline.”
  • 7. 7 Multiple Instances of the Global Linux Namespaces(2006) Eric W. Biederman, Linux Networx ‒ By adding additional namespaces .. we can, at a trivial cost, extend the UNIX concept and make novel uses of Linux possible ‒ Multiple instances of a namespace simply means that you can have two things with the same name. ‒ Implementation: allow an application with capability full control over a namespace and still not be able to escape ‒ https://www.kernel.org/doc/ols/2006/ols2006v1-pages-101-112.pdf
  • 9. 9 Coordinated Efforts 2007 Companies And Individuals Involved ‒ Arista Networks(Arastra): Eric Biederman - all, initial approach ‒ SGI: Paul Jackson - original cpusets, now part of cgroups ‒ Linux-VServer: Herbert Poetzl - namespaces, containers ‒ Openvz: Pavel Emelyanov, Kir Kolyshkin ‒ Google: Paul Menage - task containers, cgroups ‒ Zap project: Oren Laadan - C/R ‒ IBM: Serge E. Hallyn, Dave Hansen, Cedric Le Goater, Daniel Lezcano - ns, C/R, Balbir Singh, Srivatsa Vaddagiri - task containers ‒ Others: NEC, XtreemOS, kerlabs, Bull, HP, planetlab ‒ Source: container mailing list - containers development plans (Aug 8 2007)
  • 10. 10 Coordinated Efforts ‒ post anything container-related to the containers mailing list before any attempt to send it upstream - containers@lists.osdl.org ‒ make sure what is in -mm fits openvz, VServer and other products ‒ make sure the initial framework also fits the requirements of a basic resource management system
  • 12. 12 Namespaces • Namespaces - lightweight process virtualization • Isolation: Enable a process (or several processes) to have different views of the system than other processes • Currently 6 namespaces: ‒ mnt, pid, net, ipc, uts, user ‒ 4 more planned..(2006) ‒ security namespace ‒ security keys namespace ‒ device namespace ‒ time namespace
  • 13. 13 Mount namespace ‒ Mount namespace was the first namespace type, by Al Viro, 2002 ‒ Kernel 2.4.19 ‒ CLONE_NEWNS ‒ 6 CLONE_NEW* flags were added (include/linux/sched.h) ‒ These flags (or a combination of them) can be used in clone() or unshare() syscalls to create a namespace
  • 14. 14 Clone() flags (flag: first kernel version, required capability)
    ‒ CLONE_NEWNS: 2.4.19, CAP_SYS_ADMIN
    ‒ CLONE_NEWUTS: 2.6.19, CAP_SYS_ADMIN
    ‒ CLONE_NEWIPC: 2.6.19, CAP_SYS_ADMIN
    ‒ CLONE_NEWPID: 2.6.24, CAP_SYS_ADMIN
    ‒ CLONE_NEWNET: 2.6.29, CAP_SYS_ADMIN
    ‒ CLONE_NEWUSER: 3.8, no capability is required
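
The flag table above can be sketched as data. The following is a minimal illustration, not part of the original slides: the numeric values are the ones defined in the kernel's `linux/sched.h`, while the table layout and the helper names `combine_flags` and `capabilities_listed` are assumptions made for this example. It only computes the bitmask that would be passed to clone() or unshare(); it does not create any namespace. The capability column simply repeats the slide and ignores the case where a new user namespace is created in the same call.

```python
# Flag values as defined in the Linux header <linux/sched.h>.
CLONE_NEWNS = 0x00020000
CLONE_NEWUTS = 0x04000000
CLONE_NEWIPC = 0x08000000
CLONE_NEWUSER = 0x10000000
CLONE_NEWPID = 0x20000000
CLONE_NEWNET = 0x40000000

# name -> (flag, first kernel version, capability), as listed on the slide.
NAMESPACE_FLAGS = {
    "mnt": (CLONE_NEWNS, "2.4.19", "CAP_SYS_ADMIN"),
    "uts": (CLONE_NEWUTS, "2.6.19", "CAP_SYS_ADMIN"),
    "ipc": (CLONE_NEWIPC, "2.6.19", "CAP_SYS_ADMIN"),
    "pid": (CLONE_NEWPID, "2.6.24", "CAP_SYS_ADMIN"),
    "net": (CLONE_NEWNET, "2.6.29", "CAP_SYS_ADMIN"),
    "user": (CLONE_NEWUSER, "3.8", None),
}


def combine_flags(names):
    """OR together the CLONE_NEW* flags for the given namespace names."""
    flags = 0
    for name in names:
        flags |= NAMESPACE_FLAGS[name][0]
    return flags


def capabilities_listed(names):
    """Return the set of capabilities the slide lists for these namespaces."""
    return {NAMESPACE_FLAGS[name][2] for name in names} - {None}


if __name__ == "__main__":
    # Bitmask for a combined UTS + network namespace request.
    print(hex(combine_flags(["uts", "net"])))
```

For example, `combine_flags(["net", "user"])` yields the mask that corresponds to the `unshare --net --user` call used later in the user namespace demo.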
  • 15. 15 Namespace: System calls ‒ 3 system calls are used ‒ clone() ‒ creates a new process and a new namespace, and attaches the process to the namespace ‒ unshare() ‒ creates a new namespace and attaches the current process to it ‒ reverses sharing that was done using the clone(2) system call (2005) ‒ setns(int fd, int nstype) ‒ joins an existing namespace
  • 16. 16 • no parameter of a namespace name • 6 entries (inodes) added under /proc/<pid>/ns ‒ Kernel 3.8 • Nsproxy • Kernel config items: ‒ CONFIG_UTS_NS ‒ CONFIG_IPC_NS ‒ CONFIG_USER_NS ‒ CONFIG_PID_NS ‒ CONFIG_NET_NS
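
The /proc/<pid>/ns entries mentioned above can be inspected from user space. The sketch below is an illustration only and assumes a Linux system with a kernel that exposes these entries as symbolic links of the form `net:[4026531956]`; the helper names `parse_ns_link` and `list_namespaces` are made up for this example. Two processes are in the same namespace when the inode numbers of the corresponding entries match.

```python
import os
import re

# Link targets look like "net:[4026531956]": namespace type plus inode number.
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a link target such as 'net:[4026531956]' into (type, inode)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def list_namespaces(pid="self"):
    """Return {entry name: inode} for /proc/<pid>/ns, or {} if unavailable."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    if not os.path.isdir(ns_dir):
        return result
    for entry in sorted(os.listdir(ns_dir)):
        try:
            target = os.readlink(os.path.join(ns_dir, entry))
            result[entry] = parse_ns_link(target)[1]
        except (OSError, ValueError):
            # Entry is not a readable symlink in the expected format; skip it.
            continue
    return result


if __name__ == "__main__":
    for name, inode in list_namespaces().items():
        print(name, inode)
```

Comparing `list_namespaces()` with `list_namespaces(<other pid>)` shows which namespaces the two processes share; reading another user's entries may require additional privileges.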
  • 17. 17 Namespace: User space additions ‒ nsenter(util-linux >= 2.23) ‒ wrapper around setns ‒ allows running a new process in context of existing process ‒ iproute ‒ ip netns ‒ add, del, exec ‒ util-linux ‒ unshare ‒ All 6 namespaces
  • 18. 18 UTS namespace ‒ Uts - Unix timesharing ‒ new_utsname struct: ‒ sysname, nodename, release, version, machine, domainname ‒ CLONE_NEWUTS ‒ Since 2.6.19 ‒ Initial usecase: vserver/openvz - clone a new uts namespace for each new virtual server ‒ http://lwn.net/Articles/179345/ ‒ Demo: unshare -u /bin/bash
  • 19. 19 IPC namespace ‒ same principle as uts ‒ process will have independent namespace for System V message queues, semaphore sets and shared memory segments ‒ CONFIG_IPC_NS, CONFIG_SYSVIPC ‒ CLONE_NEWIPC flag: ‒ since 2.6.19
  • 20. 20 Network namespace ‒ A network namespace is logically another copy of the network stack, with its own routes, firewall rules, and network devices ‒ a network device belongs to exactly one network namespace ‒ a socket belongs to exactly one network namespace ‒ a new network namespace only includes the loopback device ‒ communication between namespaces using veth or unix sockets
  • 21. 21 Network namespace: Use cases ‒ Turn off network inside namespace: ‒ ensure that processes running there will be unable to make connections outside of namespace ‒ e.g. spam, botnets ‒ Restricted namespace: ‒ Even processes that handle network traffic (a web server worker process or web browser rendering process for example) can be placed into a restricted namespace ‒ Namespace without network devices ‒ makes it impossible for child or worker processes to make additional network connections ‒ http://lwn.net/Articles/580893/
  • 22. 22 Network namespace ‒ man ip-netns ‒ ip netns add <net_ns> ‒ creates /var/run/netns/<net_ns> ‒ ip netns exec NAME cmd ... - Run cmd in the named network namespace ‒ /etc/netns/<net_ns>/resolv.conf overrides /etc/resolv.conf ‒ Communicate between net ns by ‒ creating a pair of network devices (veth) and moving one to another network namespace
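
The veth step described above can be sketched as a dry-run plan. This is an illustration under stated assumptions, not something shown on the slides: the interface names `veth0` and `veth1`, the namespace name `tns0`, and the helper `veth_plan` are chosen for the example. The function only builds the ip(8) command lines; running them needs root privileges and an existing namespace created with `ip netns add`, and IP addresses would still have to be assigned on both ends.

```python
import shlex


def veth_plan(netns, host_if="veth0", ns_if="veth1"):
    """Return the ip(8) commands that link the current namespace to `netns`."""
    return [
        # Create the veth pair; both ends start in the current namespace.
        ["ip", "link", "add", host_if, "type", "veth", "peer", "name", ns_if],
        # Move one end into the target network namespace.
        ["ip", "link", "set", ns_if, "netns", netns],
        # Bring up both ends.
        ["ip", "link", "set", host_if, "up"],
        ["ip", "netns", "exec", netns, "ip", "link", "set", ns_if, "up"],
    ]


if __name__ == "__main__":
    # Print the plan instead of executing it.
    for argv in veth_plan("tns0"):
        print(shlex.join(argv))
```

Returning argument lists rather than executing them keeps the sketch safe to run anywhere; the lists could be handed to `subprocess.run` on a test machine.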
  • 24. 24 Network namespace example: Move a VPN connection to its own namespace
    ‒ ip netns add tns0
    ‒ mkdir /etc/netns/tns0
    ‒ openconnect -s /etc/vpnc/vpnc-script <your-vpn-network>
    ‒ ip link set dev tun0 netns tns0
    ‒ #example: VPN_IP_ADDRESS=`ip a|grep 149|sed -e 's/..*149/149/' -e 's#/32.*##'`
    ‒ ip netns exec tns0 ip addr add $VPN_IP_ADDRESS dev tun0
    ‒ ip netns exec tns0 ip link set tun0 up
    ‒ ip netns exec tns0 ip link set lo up
    ‒ #test: ip netns exec tns0 ping $VPN_IP_ADDRESS
    ‒ #ip netns exec tns0 ip route restore </tmp/ip-route-save-vpn
    ‒ #strip everything from " scope" or " proto" onward, then turn each route into an "ip route add" command
    ‒ ip route|sed -E -e 's/ (scope|proto).*//' -e 's/^/ip route add /' >/tmp/ip-route-add
    ‒ chmod 755 /tmp/ip-route-add
    ‒ ip netns exec tns0 /tmp/ip-route-add
    ‒ #test: ip netns exec tns0 ip route
    ‒ echo nameserver <your_VPN_specific_nameserver> >/etc/netns/tns0/resolv.conf
    ‒ ip netns exec tns0 cat /etc/resolv.conf
    ‒ ip netns exec tns0 wget <IP_ADDRESS_only_available_via_VPN>
  • 25. 25 User namespace ‒ only namespace which can be created without CAP_SYS_ADMIN capability ‒ A process will have distinct set of UIDs, GIDs and capabilities ‒ User namespaces allow per-namespace mappings of user and group IDs. ‒ users and groups may have privileges for certain operations inside the container without having those privileges outside the container ‒ Capabilities ‒ have root privileges for operations inside the container only ‒ map user IDs on the host system to corresponding user IDs in the namespace ‒ Since 3.8 complete ‒ Having a full set of caps in your local user namespace is safe ‒ user namespace root users can create network namespaces
  • 27. 27 User namespaces demo
    ‒ as demo user:
    ‒ unshare --net --user /bin/bash
    ‒ nobody@sles12rc3:~> echo $$
    ‒ 4016
    ‒ as root user:
    ‒ cat /proc/4016/uid_map
    ‒ #empty
    ‒ #ID-inside-ns ID-outside-ns length
    ‒ echo 0 1000 10 > /proc/4016/uid_map
    ‒ echo 0 100 10 > /proc/4016/gid_map
    ‒ as demo user:
    ‒ nobody@sles12rc3:~> id
    ‒ uid=0(root) gid=0(root) groups=0(root)
    ‒ nobody@sles12rc3:~> whoami
    ‒ root
    ‒ nobody@sles12rc3:~> ls -la /root/
    ‒ ls: cannot open directory /root/: Permission denied
    ‒ http://man7.org/linux/man-pages/man7/user_namespaces.7.html
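
The "ID-inside-ns ID-outside-ns length" format written to uid_map in the demo above can be sketched as a small translation routine. This is a minimal illustration: the helper names `parse_id_map` and `to_outside` are assumptions made for this example, and the code only models the arithmetic of the mapping, it does not touch /proc. An ID without a mapping has no counterpart outside the namespace, which is why the demo shell shows `nobody` until the map is written.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map content into (inside, outside, length) tuples."""
    mappings = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        inside, outside, length = (int(field) for field in fields)
        mappings.append((inside, outside, length))
    return mappings


def to_outside(inside_id, mappings):
    """Translate an ID inside the namespace to the ID outside, or None."""
    for inside, outside, length in mappings:
        if inside <= inside_id < inside + length:
            return outside + (inside_id - inside)
    return None  # unmapped ID


if __name__ == "__main__":
    # The mapping written by root in the demo: "0 1000 10".
    demo_map = parse_id_map("0 1000 10\n")
    print(to_outside(0, demo_map))
```

With the demo mapping, UID 0 inside the namespace corresponds to UID 1000 outside, so "root" in the namespace is still an ordinary user when accessing /root on the host.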
  • 30. 30 cgroup only container ‒ One of the cgroup only container uses we see @Parallels (so no separate filesystem and no net namespaces) is pure apache load balancer type shared hosting. In this scenario, base apache is effectively brought up in the host environment, but then spawned instances are resource limited using cgroups according to what the customer has paid. ‒ Obviously all apache instances are sharing /var and /run from the host (mostly for logging and pid storage and static pages). The reason some hosters do this is that it allows much higher density simple web serving (either static pages from quota limited chroots or dynamic pages limited by database space constraints) because each "instance" shares so much from the host. The service is obviously much more basic than giving each customer a container running apache, but it's much easier for the hoster to administer and it serves the customer just as well for a large cross section of use cases and for those it doesn't serve, the hoster usually has separate container hosting (for a higher price, of course). ‒ systemd-devel ml: Sun, 25 Aug 13, 19:16 CEST James Bottomley
  • 31. 31 PaaS SaaS Container ‒ I gave you one example: a really simplistic one. A more sophisticated example is a PaaS or SaaS container where you bring the OS up in the host but spawn a particular application into its own container (this is essentially similar to what Docker does). Often in this case, you do add separate mount and network namespaces to make the application isolated and migrateable with its own IP address. The reason you share init and most of the OS from the host is for elasticity and density, which are fast becoming a holy grail type quest of cloud orchestration systems: if you don't have to bring up the OS from init and you can just start the application from a C/R image (orders of magnitude smaller than a full system image) and slap on the necessary namespaces as you clone it, you have something that comes online in milliseconds which is a feat no hypervisor based virtualisation can match. ‒ systemd-devel ml, Sun, 25 Aug 13, 20:16 CEST James Bottomley
  • 32. 32 tidbits ‒ mboxgrep namespace systemd-devel201* ‒ It sounds like you're setting up your containers wrongly. If a container can reboot the system it means that host root capabilities have leaked into the container, which is a big security no-no. The upstream way of avoiding this is USER_NS (because root in the container is now not root in the host). The OpenVZ kernel uses a different mechanism to solve the problem, but we think USER_NS is the better way to go on this. ‒ For launching new services in a container simply sending a message to the init process is probably what you want. I think those messages already traverse unix domain sockets so it isn't too shabby.
  • 33. 33 tidbits ‒ mboxgrep namespace systemd-devel201* ‒ Feb 2014 ‒ > FYI I have successfully run Fedora 19 with systemd inside a container ‒ > with libvirt LXC, however, I did *not* enable user namespaces. Every ‒ > time I try user namespaces I find some other bug in either the kernel ‒ > or libvirt, so I wouldn't be surprised if yet more breakage has ‒ > occurred in user namespaces :-( ‒ Those bugs should now be fixed, if you don't enable the option, how are we supposed to know what is left to be done? :)
  • 34. 34 tidbits ‒ https://lkml.org/lkml/2013/4/25/596 ‒ > Final question, is it by design that uid 0 within a namespace is not ‒ > allowed to write to ‒ > /proc/*/oom_score_adj? ‒ Essentially. It is by design that uid 0 within a namespace be mapped to some other uid outside the namespace, and that the permissions on writes should use the permission needed outside of the user namespace. ‒ Which means there are all kinds of things only uid 0 can write to, that you can't touch in a user namespace. For some of those things the policy may need to be reconsidered. For a lot of those things the default policy is good. Regardless we are now defaulting to not letting root in a container do risky things which is a good thing. ‒ Eric
  • 35. 35 Capabilities ‒ http://man7.org/linux/man-pages/man7/user_namespaces.7.html ‒ The child process created by clone(2) with the CLONE_NEWUSER flag starts out with a complete set of capabilities in the new user namespace. Likewise, a process that creates a new user namespace using unshare(2) or joins an existing user namespace using setns(2) gains a full set of capabilities in that namespace. On the other hand, that process has no capabilities in the parent (in the case of clone(2)) or previous (in the case of unshare(2) and setns(2)) user namespace, even if the new namespace is created or joined by the root user (i.e., a process with user ID 0 in the root namespace). ‒ Note that a call to execve(2) will cause a process's capabilities to be recalculated in the usual way (see capabilities(7)), so that usually, unless it has a user ID of 0 within the namespace or the executable file has a nonempty inheritable capabilities mask, it will lose all capabilities. ‒ Having a capability inside a user namespace permits a process to perform operations (that require privilege) only on resources governed by that namespace.
  • 36. 36 Socketat - network namespaces ‒ http://lwn.net/Articles/407615/ ‒ The use cases are the handful of networking applications that find that it makes sense to listen to sockets from multiple network namespaces at once. Say a home machine that has a vpn into your office network and the vpn into the office network runs in a different network namespace so you don't have to worry about address conflicts between the two networks, the chance of accidentally bridging between them, and so you can use different dns resolvers for the different networks. ‒ In that scenario it would be nice if I could run some services on both networks. Starting two+ copies of the daemons just so they can live in all of the networks is ok, but in the fullness of time I expect that there will be daemons that want to optimize things and have sockets in all of the network namespaces you are connected to. ‒ In a multiple network namespace aware application when it goes to open a socket it will want to specify which network namespace the socket is in. If it is a general listener it will probably be listening to events in /proc/mounts waiting for extra namespaces to be mounted under a standard location say: /var/run/netns/<netnsname>/ns. ‒ Once the application receives the event for a new network namespace showing up it will want to create a new socket listening for connections in the new network namespace. ‒ In that scenario none of those network namespaces are foreign, but one network namespace will be the default and the rest will be non-default network namespaces.
  • 37. 37 socketat ‒ http://lists.openvz.org/pipermail/devel/2010-October/025720.html ‒ [Devel] Re: [PATCH 8/8] net: Implement socketat. ‒ Just to clarify this point. You enter the namespace, create the socket and go back to the initial namespace (or create a new one). Further operations can be made against this fd because it is the network namespace stored in the sock struct which is used, not the current process network namespace which is used at the socket creation only. ‒ We can actually already do that by unsharing and then creating a socket. This socket will pin the namespace and can be used as a control socket for the namespace (assuming the socket domain will be ok for all the operations). ‒ .. if I assume you want to create a process controlling 1024 netns, let's try to identify what happens with setns and with socketat : ‒ With setns: ‒ * open /proc/self/ns/net (1) ‒ * unshare the netns ‒ * open /proc/self/ns/net (2) ‒ * setns (1) ‒ * create a virtual network device ‒ * move the virtual device to (2) (using the set netns by fd)
  • 38. 38 socketat ‒ http://lists.openvz.org/pipermail/devel/2010-October/025736.html ‒ > The app control point is in namespace0. I still want to be able to ‒ > "boot" namespaces first and maybe a few seconds later do a socketat()... ‒ > and create devices, tcp sockets etc. I suspect create_ns(namespace-name) ‒ > would involve: ‒ > * open /proc/self/ns/net (namespace-name) ‒ > * unshare the netns ‒ > Is this correct? ‒ ‒ Almost. ‒ create should be: ‒ * verify namespace-name is not already in use ‒ * mkdir -p /var/run/netns/<namespace-name> ‒ * unshare the netns ‒ * mount --bind /proc/self/ns/net /var/run/netns/<namespace-name>
  • 39. 39 Operating system–level virtualization Stand: 30.11.2014 http://en.wikipedia.org/wiki/Operating_system-level_virtualization
  • 40. 40 References – old ‒ Paul B. Menage. Adding Generic Process Containers to the Linux Kernel. Proceedings of the Ottawa Linux Symposium, 2007. ‒ http://www.kernel.org/doc/ols/2007/ols2007v2-pages-45-58.pdf ‒ Linux-CR: Transparent Application Checkpoint-Restart in Linux ‒ http://www1.cs.columbia.edu/~orenl/papers/ols2010-linuxcr.pdf ‒ Making applications mobile using containers ‒ http://lxc.sourceforge.net/doc/ols2006/lxc-ols2006-slides.pdf ‒ Virtual Servers and Checkpoint/Restart in Mainstream Linux ‒ describes the general namespace support in Linux and its usage ‒ Transparent Checkpoint-Restart of Multiple Processes on Commodity Operating Systems - Oren Laadan ‒ Source: Operating System Virtualization: Practice and Experience, Oren Laadan (systor2010_osvirt.pdf)
  • 41. References
    ‒ http://lwn.net/Articles/531114/#series_index
    ‒ Namespaces in operation, 6-part series by Michael Kerrisk
    ‒ https://github.com/bigbighd604/C-Notes
    ‒ Git repository with demo code from the namespace series
    ‒ www.haifux.org/lectures/299/netLec7.pdf (Rami Rosen, 2013)
    ‒ https://www.kernel.org/doc/ols/2006/ols2006v1-pages-101-112.pdf (Biederman)
    ‒ http://books.google.de/books?id=RpsQAwAAQBAJ&pg=PA424&lpg=PA423&ots=rAqP4sxMXn&focus=viewport&dq=Rami+Rosen+network+namespaces&hl=de
    ‒ Linux Kernel Networking (Rami Rosen)
    ‒ http://www.makelinux.net/kernel_map/
    ‒ http://en.wikipedia.org/wiki/Operating_system-level_virtualization
    ‒ /usr/src/linux/Documentation/unshare.txt
    ‒ How to find namespaces in a Linux system
    ‒ http://www.opencloudblog.com/?p=251