This presentation covers OS-level virtualization and container internals. It reuses other material from SlideShare, which is referenced in the notes of the PPT.
2. This session is not about DevOps, CI/CD, or testing, but it is essential background for designing state-of-the-art
DevOps and SecDevOps solutions.
There are no new concepts here; most are as old as 2002, and some date to the 1970s.
Presentation is designed in two parts
Information for all
Information for system programmers
Examples are based on the RHEL 7 platform.
What is not covered
In-depth discussion of storage-related topics such as copy-on-write.
Containers and systemd/AppArmor related topics and issues.
3. Basics of OS LEVEL Virtualization.
Products of Interest.
Features of OS level virtualization.
OS level virtualization features in brief.
Linux Container Building blocks.
Samples
5. It is server-level virtualization that works at the OS layer.
A single physical instance is virtualized into multiple isolated partitions.
Common hardware and a common OS kernel host multiple isolated partitions.
It cannot host a guest OS kernel different from the host OS kernel.
OS-level virtualization requires the host kernel and system services to
support multiple isolated partitions.
Hardware resources are limited on a per-process basis.
7. OS containers:
Share the kernel of the host operating system but provide userspace isolation.
System resources (RAM, processor, libraries, etc.) are shared among containers.
System resources are controlled by quotas created per policy on the container controller or host
system.
Run multiple processes and services.
No layered filesystem in the default configuration.
Built on top of native process resource isolation.
Examples: LXC, OpenVZ, Linux-VServer, BSD jails, Solaris Zones, etc.
8. Application containers are designed to run a single process/service.
Built on top of OS containers.
13. The kernel needs help from userspace to understand which processes are important and should have
higher priority. [NICE]
Limit the resource usage of a given process.
Without CPU quotas, container processes can starve each other and slow down the system.
Every OS provides controls to manage resource usage on a per-process basis.
An administrator can pin a container to specific CPUs/cores.
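The per-process priority control mentioned above (the nice value) can be inspected and changed from userspace; a minimal Python sketch, assuming a Linux host (lowering priority needs no privileges, raising it requires CAP_SYS_NICE):

```python
import os

# os.nice(0) returns the current niceness without changing it.
before = os.nice(0)

# Increase niceness by 5: the process becomes "nicer", i.e. lower priority.
# (Decreasing niceness below 0 would require CAP_SYS_NICE.)
after = os.nice(5)

print("niceness:", before, "->", after)
```

The scheduler then gives this process a smaller share of CPU time relative to its siblings; container engines express the same idea through cgroup CPU shares instead.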
14. Networking is based on isolation, not virtualization.
Why?
To leverage existing infrastructure and scale up as and when required.
To provide security through sandboxing.
To make network resources transparent to the host.
Obsolete/old types
Links and Ambassador
Container Mapped Networking
Modern Container networking
None
Bridge
Host
Overlay
Underlays
MACVLAN
IPVLAN
DIRECT ROUTING
FAN Networking
Point-to-Point
Benefit
OS support
15. Memory limit
A container is a process, and the operating system is bound to ensure the amount of memory it
needs, provided the operating system has it.
A memory-intensive task can consume all of your system's memory.
Limiting memory is part of the operating system's framework in general.
A container solution can use the OS-provided framework to control memory on a per-process basis.
Example: a container with a memory setting can use at most the value configured as its RAM
limit.
Not setting this may throw your container into an uninterruptible sleep state.
I/O rate limit
The same OS framework that controls memory limits also does I/O rate limiting.
All containers share the same CPU time.
We need this setting to make sure containers run in parallel instead of getting preempted
all the time.
Defining the CPU share is the key.
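The per-process OS framework referred to above can be seen in miniature with POSIX resource limits; a sketch, not how container engines actually enforce memory limits (they use cgroups):

```python
import resource

# Cap this process's address space at 512 MiB (soft and hard limit).
# This is the classic per-process control; container engines use cgroups instead.
limit = 512 * 1024 * 1024
resource.setrlimit(resource.RLIMIT_AS, (limit, limit))

# An allocation beyond the limit now fails with MemoryError
# instead of exhausting the whole system's memory.
try:
    big = bytearray(1024 * 1024 * 1024)  # try to grab 1 GiB
    hit_limit = False
except MemoryError:
    hit_limit = True

print("allocation blocked by RLIMIT_AS:", hit_limit)
```

Unlike a cgroup memory limit, an rlimit applies to one process only and is not inherited as a group-wide budget.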
16. Disk quotas
When an admin needs to give multiple users/services access to a container,
a single user/service should not be able to consume all the disk space.
In general, three parameters determine how much disk space and how many inodes a
container can use:
Disk space
Disk inodes
Quota time
Partitioning
By definition, partitioning is running multiple OSes on a single physical system while sharing
hardware resources.
Approaches:
Hosted architecture
Hypervisor (bare-metal architecture)
Application-level partitioning
17. Checkpointing
A running container makes changes to the filesystem which remain intact when the container engine
starts/stops.
In-memory data can be lost in such container-engine start/stop events.
If the container or the host system crashes, the container instance and its data may be left inconsistent on the
filesystem.
A robust container solution must allow freezing a running container and
creating a checkpoint as a collection of files.
Linux provides the CRIU mechanism for Checkpoint/Restore in userspace.
[https://criu.org/Main_Page]
Live migration
The process of moving a live container from one physical server to another, or to the cloud, without
disconnecting clients.
Two kinds of live migration:
1) pre-copy memory 2) post-copy memory (lazy migration)
18. Filesystem isolation
How to restrict a container to read/write within its own filesystem?
chroot is the basic form of filesystem isolation.
Two types of isolators in general:
Filesystem/posix
Works on all POSIX-compliant systems.
Shares the same host filesystem.
This isolator handles persistent volumes by creating symlinks in the container sandbox.
These symlinks point to specific persistent volumes on the host filesystem.
Example: Mesos
Filesystem/linux
The container gets its own mounts.
Uses Unix permissions to secure container sandboxes.
Example: Docker, Mesos
Root Privilege Isolation
19. Nice: we can run any application as a container without caring about the
underlying host OS or even the hardware, as long as the host OS/machine guarantees
availability of the OS.
But what if a user wants to test some kernel functionality?
Use virtual kernels:
Compile and execute kernel code in userspace.
Examples:
vkernel
Rump kernel
User-mode Linux
Unikernel
21. Namespace
Control groups
Capabilities
CRIU (Checkpoint-Restore in userspace)
Storage
SELINUX
22. The Linux kernel allows developers to partition kernel resources in such a manner that
distinct processes get distinct views of those kernel resources.
This feature wraps a set of resources and processes in a common namespace.
Namespaces are the basic building blocks of Linux containers.
There are different namespaces for different resources:
USER isolates user and group IDs
MNT isolates mount points
PID isolates process IDs
NET isolates network devices, ports, stacks, etc.
UTS isolates hostname and NIS domain name
IPC isolates System V IPC and POSIX message queues
TIME isolates boot and monotonic clocks
CGROUP isolates the cgroup root directory
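Each of these namespaces is visible per process under /proc. A quick way to see which namespaces the current process belongs to (Python sketch, Linux only):

```python
import os

# Every entry in /proc/<pid>/ns is a symlink like "pid:[4026531836]";
# the number in brackets is the namespace's inode, the NS column in lsns.
ns_dir = "/proc/self/ns"
namespaces = {name: os.readlink(os.path.join(ns_dir, name))
              for name in os.listdir(ns_dir)}

for name, ident in sorted(namespaces.items()):
    print(name, ident)
```

Two processes share a namespace exactly when these symlinks resolve to the same inode, which is how tools like lsns group processes.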
23. Very often an application starts consuming system resources to the extent
that users see a hang-like situation while other processes starve for
resources.
This may crash the system or, worse, the whole ecosystem.
Engineers at Google started addressing this problem in 2006 (initially as "process
containers"); the work was merged into the mainline Linux kernel in 2008 under the name CGROUPS.
The main goal of CGROUPS was to provide a single interface for realizing whole
operating-system-level virtualization.
CGROUPS provide the following functionality:
Resource limiting
Prioritization
Accounting
Control (e.g. device node access control)
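The controllers compiled into the running kernel can be listed from /proc/cgroups (Python sketch; on a pure cgroup-v2 system the hierarchy column is 0 for most entries):

```python
def read_cgroup_controllers(path="/proc/cgroups"):
    """Parse /proc/cgroups into {controller: (hierarchy, num_cgroups, enabled)}."""
    controllers = {}
    with open(path) as f:
        for line in f:
            if line.startswith("#"):  # skip the header line
                continue
            name, hierarchy, num, enabled = line.split()
            controllers[name] = (int(hierarchy), int(num), int(enabled))
    return controllers

print(read_cgroup_controllers())
```

This is the same table dumped verbatim by `cat /proc/cgroups` later in the deck.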
24. Every process on Linux is a descendant of the common init process, so the Linux process model is a single
hierarchy or tree.
Except for init, every process in Linux inherits the environment (e.g. PATH) and some other
attributes, such as open file descriptors, from its parent.
Cgroups are somewhat similar to processes in that:
They are hierarchical.
Child cgroups inherit attributes from their parent cgroup.
Caveat: multiple cgroup hierarchies can coexist, while processes live in a
single process tree.
Multiple hierarchies allow cgroups to be part of many subsystems simultaneously.
A subsystem is a kernel component that modifies the behavior of the processes in a cgroup.
25. cpuset - assigns individual processor(s) and memory nodes to task(s) in a group;
cpu - uses the scheduler to provide cgroup tasks access to processor resources;
cpuacct - generates reports about processor usage by a group;
io - sets limits on reads/writes from/to block devices;
memory - sets limits on memory usage by task(s) from a group;
devices - controls access to devices by task(s) from a group;
freezer - allows suspending/resuming task(s) from a group;
net_cls - allows marking network packets from task(s) from a group;
net_prio - provides a way to dynamically set the priority of network traffic per network
interface for a group;
perf_event - provides access to perf events for a group;
hugetlb - activates support for huge pages for a group;
pids - sets a limit on the number of processes in a group, to avoid fork bombs.
27. [vasharma@vasharma ~]$ mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=1743648k,nr_inodes=435912,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,pids)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuset)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,memory)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,perf_event)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,hugetlb)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,freezer)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,net_prio,net_cls)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuacct,cpu)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,blkio)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,devices)
configfs on /sys/kernel/config type configfs (rw,relatime)
28. • As a container feature designer, one does not want to give everyone root access to the host
system.
• Capabilities allow the designer to distinguish privileged processes from
unprivileged processes.
• A privileged process bypasses all kernel permission checks based on process credentials.
• A list of important capabilities implemented in Linux:
• CAP_AUDIT_CONTROL
• CAP_AUDIT_READ
• CAP_AUDIT_WRITE
• CAP_CHOWN
• CAP_FOWNER
• CAP_IPC_LOCK
• CAP_IPC_OWNER
• CAP_KILL
• CAP_LINUX_IMMUTABLE
• CAP_MKNOD
• CAP_NET_ADMIN
• CAP_SETGID
• CAP_SETUID
• CAP_SYS_ADMIN
• CAP_SYS_BOOT
• CAP_SYS_CHROOT
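The capability sets of any process can be read from /proc (a Python sketch; the bit positions used for decoding, e.g. CAP_SETUID = 7, are an assumption taken from linux/capability.h):

```python
def effective_caps(pid="self"):
    """Return the effective capability bitmask (CapEff) of a process as an int."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("CapEff:"):
                return int(line.split()[1], 16)
    raise RuntimeError("CapEff not found")

caps = effective_caps()
CAP_SETUID = 7  # bit position defined in <linux/capability.h>
print(hex(caps), "CAP_SETUID:", bool(caps & (1 << CAP_SETUID)))
```

For an unprivileged process this prints 0x0; container runtimes drop all but a small whitelist of these bits before starting the container's PID 1.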
29. The CRIU feature allows stopping a process and saving its state to the filesystem.
CRIU also allows restoring the saved state.
This helps achieve load balancing when a container solution is deployed
in a high-availability environment.
There can be a PID collision while restoring the saved state of a process
unless the process being restored had its own PID namespace.
30. The container use case creates two problems when maintaining multiple
containers at a time:
Inefficient disk space utilization
10 containers running on a native filesystem of 1 GB each will consume 10 GB of
disk space. That is very inefficient utilization.
Latency in creating new containers
All container processes are created as children of the container engine.
Containers share a copy of the memory segments of the parent process.
To create a container, the engine copies a container image, and that should complete in a
few seconds.
So the footprint of the image should be small enough that it can share physical memory
segments among other containers.
Union filesystems or similar solutions with copy-on-write support
(OverlayFS, Union Mounts, AUFS, etc.) are basic building blocks of any
Linux-based container solution.
A union filesystem works on top of any filesystem native to the Linux
environment.
31. All major Linux distributions ship a security framework consisting of either
AppArmor or SELinux.
SELinux/AppArmor restrict the capabilities of a process running on the host
operating system.
Both SELinux and AppArmor provide security labels to secure container
processes and files.
Example of a container process secured with SELinux:
system_u:system_r:container_t:s0:c940,c967
system_u : user [user designated to run system services]
system_r : role [this role is for all system processes except user processes]
container_t : type [prebuilt SELinux type to run containers]
Running a Docker container with AppArmor security on Ubuntu:
docker run --rm -it --security-opt apparmor=unconfined debian:jessie bash -i
33. From the man page of cgroups(7):
The kernel's cgroup interface is provided through a pseudo-filesystem called
cgroupfs. Grouping is implemented in the core cgroup kernel code, while
resource tracking and limits are implemented in a set of per-resource-type
subsystems (memory, CPU, and so on).
34. Two versions:
CGROUP v1 [Linux kernel 2.6.24 and later]
CGROUP v2 [Linux kernel 4.5 and later]
The two versions can coexist.
Currently, cgroups v2 implements only a subset of the controllers available in cgroups v1.
The two systems are implemented so that both v1 controllers and v2 controllers can be
mounted on the same system, but a given controller cannot be employed in
both simultaneously.
CGROUP v1 supports named hierarchies.
Multiple instances of such hierarchies can be mounted; each hierarchy must have a unique name.
The only purpose of such hierarchies is to track processes.
mount -t cgroup -o none,name=somename none /some/mount/point
35. CGROUP v2 uses a unified hierarchy.
Cgroups v2 provides a single unified hierarchy against which all controllers are mounted.
"Internal" processes are not permitted. With the exception of the root cgroup, processes may reside only in leaf nodes (cgroups that do not
themselves contain child cgroups). The details are somewhat more subtle than this and are described below.
Active cgroups must be specified via the files cgroup.controllers and cgroup.subtree_control.
The tasks file has been removed. In addition, the cgroup.clone_children file that is employed by the cpuset controller has been removed.
An improved mechanism for notification of empty cgroups is provided by the cgroup.events file.
mount -t cgroup2 none /mnt/cgroup2
A cgroup v2 controller is available only if it is not currently in use via a mount against a cgroup v1 hierarchy.
Cgroups v2 controllers:
cpu, cpuset, freezer, hugetlb, io, memory, perf_event, pids, rdma
There is no direct equivalent of the net_cls and net_prio controllers from cgroups version 1. Instead, support has been added to iptables(8) to
allow eBPF filters that hook on cgroup v2 pathnames to make decisions about network traffic on a per-cgroup basis.
Each cgroup in the v2 hierarchy contains the following two files:
cgroup.controllers : this read-only file exposes the list of controllers that are available in this cgroup.
cgroup.subtree_control : this is the list of controllers that are active (enabled) in the cgroup.
Example: echo '+pids -memory' > x/y/cgroup.subtree_control
The "no internal processes" rule of CGROUP v2:
If cgroup /cg1/cg2 exists, then a process may reside in /cg1/cg2, but not in /cg1. This avoids an ambiguity in cgroups v1 with respect to the
delegation of resources between processes in /cg1 and its child cgroups.
In the path /cg1/cg2, the cg2 directory is called a leaf node.
So the above rule can be stated as:
"A (nonroot) cgroup can't both (1) have member processes, and (2) distribute resources into child cgroups—that is, have a nonempty
cgroup.subtree_control file."
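Whether the running system has the v2 unified hierarchy mounted, and which controllers it exposes, can be checked from the mount table (Python sketch; returns None when no cgroup2 mount is present, e.g. on a pure v1 system such as RHEL 7):

```python
def cgroup2_controllers(mounts="/proc/self/mounts"):
    """Find the cgroup2 mount point and read its cgroup.controllers file."""
    with open(mounts) as f:
        for line in f:
            device, mountpoint, fstype = line.split()[:3]
            if fstype == "cgroup2":
                with open(mountpoint + "/cgroup.controllers") as cf:
                    return cf.read().split()
    return None  # no unified (v2) hierarchy mounted

print(cgroup2_controllers())
```

On a hybrid system this reveals exactly which controllers are free for v2 use, i.e. not already claimed by a v1 mount.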
36. The implementation of cgroups requires a few simple hooks into the rest of the kernel,
none in performance-critical paths:
In the boot phase (init/main.c), to perform various initializations.
In the process creation and destruction paths, fork() and exit().
A new filesystem of type "cgroup" (VFS).
Process descriptor additions (struct task_struct).
Added procfs entries:
For each process: /proc/pid/cgroup
System-wide: /proc/cgroups
CGROUP code locations:
mm/memcontrol.c for memory
kernel/cpuset.c for cpusets
And, as per functional requirements, in different directories of the kernel source.
CGROUPs do not depend on namespaces.
CGROUP is a very complex feature and comes with a very large number of rules for
anyone who wants to control resources in a given environment for a container. Most
container solutions provide a wrapper around it.
37. A single hierarchy can have one or more subsystems attached to it.
A single subsystem (e.g. cpuacct) cannot be attached to more than one
hierarchy if one of those hierarchies already has a different subsystem attached to
it.
A process cannot be part of two different cgroups in the same hierarchy.
A forked process inherits the same cgroups as its parent process.
38. A child process created via fork(2) inherits its parent's cgroup memberships. A process's cgroup memberships are preserved across
execve(2).
The clone3(2) CLONE_INTO_CGROUP flag can be used to create a child process that begins its life in a different version 2 cgroup from
the parent process.
CGROUP-v1/v2 related file
# cat /proc/cgroups
#subsys_name hierarchy num_cgroups enabled
cpuset 3 1 1
cpu 9 1 1
cpuacct 9 1 1
memory 4 1 1
devices 11 92 1
freezer 7 1 1
net_cls 8 1 1
blkio 10 1 1
perf_event 5 1 1
hugetlb 6 1 1
pids 2 92 1
net_prio 8 1 1
# cat /proc/[pid]/cgroup
11:devices:/system.slice/gdm.service
10:blkio:/
9:cpuacct,cpu:/
/sys/kernel/cgroup/delegate : This file exports a list of the cgroups v2 files (one per line) that are delegatable.
/sys/kernel/cgroup/features : This file contains list of cgroups v2 features that are provided by the kernel.
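The /proc/[pid]/cgroup membership file shown above can be parsed per process (Python sketch; on a v2-only system there is a single line of the form 0::/path, with an empty controller list):

```python
def cgroup_memberships(pid="self"):
    """Parse /proc/<pid>/cgroup into (hierarchy_id, controllers, path) tuples."""
    entries = []
    with open(f"/proc/{pid}/cgroup") as f:
        for line in f:
            hier_id, controllers, path = line.rstrip("\n").split(":", 2)
            entries.append((int(hier_id),
                            controllers.split(",") if controllers else [],
                            path))
    return entries

for entry in cgroup_memberships():
    print(entry)
```

Comparing this output for a shell on the host and for a containerized process shows the per-container cgroup paths a runtime creates.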
39. Development library: libcgroup
yum install libcgroup (this also installs cgconfig)
yum install libcgroup-tools
Set up the cgconfig service and restart it [edit /etc/cgconfig.conf]:
mount {
controller_name = /sys/fs/cgroup/controller_name;
…
}
# systemctl restart cgconfig.service
CGROUP uses VFS.
CGROUP actions are filesystem operations, i.e. mount/unmount, create/delete directory, etc.
Mounting a CGROUP:
# mkdir /sys/fs/cgroup/name
# mount -t cgroup -o controller_name none /sys/fs/cgroup/controller_name
The mount command will attach the controller to the cgroup hierarchy.
Verify whether the cgroup is attached to the hierarchy correctly by listing all available hierarchies along with their current mount points using the lssubsys command:
# lssubsys -am
cpuset /sys/fs/cgroup/cpuset
cpu,cpuacct /sys/fs/cgroup/cpu,cpuacct
memory /sys/fs/cgroup/memory
devices /sys/fs/cgroup/devices
freezer /sys/fs/cgroup/freezer
net_cls /sys/fs/cgroup/net_cls
blkio /sys/fs/cgroup/blkio
perf_event /sys/fs/cgroup/perf_event
hugetlb /sys/fs/cgroup/hugetlb
net_prio /sys/fs/cgroup/net_prio
Unmounting a hierarchy:
# umount /sys/fs/cgroup/controller_name
40. Use the cgcreate command:
cgcreate -t uid:gid -a uid:gid -g controllers:path
-g : specifies the hierarchy in which the cgroup should be created, as a comma-separated list of the controllers associated with hierarchies.
Alternatively, we can create a child cgroup directly using the mkdir command:
mkdir /sys/fs/cgroup/controller/name/child_name
To delete a cgroup:
cgdelete controllers:path
Modify /etc/cgconfig.conf to set the parameters of a control group:
perm {
task {
uid = task_user;
gid = task_group;
}
admin {
uid = admin_name;
gid = admin_group;
}
}
Alternatively, we can use the cgset command:
cgset -r parameter=value path_to_cgroup
Now we can move a desired process into the cgroup:
# cgclassify -g controllers:path_to_cgroup pidlist
Start a process in a control group:
# cgexec -g controllers:path_to_cgroup command arguments
Displaying parameters of control groups:
cgget -r parameter list_of_cgroups
# cgget -g cpuset /
group name {
    [permissions]
    controller {
        param_name = param_value;
        …
    }
    …
}
$ cgget -g cpuset /
/:
cpuset.memory_pressure_enabled: 0
cpuset.memory_spread_slab: 0
cpuset.memory_spread_page: 0
cpuset.memory_pressure: 0
cpuset.memory_migrate: 0
cpuset.sched_relax_domain_level: -1
41. Things to discuss
Namespace - Recap
Linux processes and Namespace
CGROUP namespace
PID namespace
USER namespace
NET namespace
MNT namespace
UTS namespace
IPC namespace
TIME namespace
42. A namespace wraps a global system resource in an abstraction that makes it
appear to the processes within the namespace that they have their own isolated
instance of the global resource. Changes to the global resource are visible to other
processes that are members of the namespace, but are invisible to other processes.
One use of namespaces is to implement containers.
Namespace Flag Page Isolates
Cgroup CLONE_NEWCGROUP cgroup_namespaces(7) Cgroup root directory
IPC CLONE_NEWIPC ipc_namespaces(7) System V IPC, POSIX message queues
Network CLONE_NEWNET network_namespaces(7) Network devices, stacks, ports, etc.
Mount CLONE_NEWNS mount_namespaces(7) Mount points
PID CLONE_NEWPID pid_namespaces(7) Process IDs
Time CLONE_NEWTIME time_namespaces(7) Boot and monotonic clocks
User CLONE_NEWUSER user_namespaces(7) User and group IDs
UTS CLONE_NEWUTS uts_namespaces(7) Hostname and NIS domain name
43. The namespace API consists of the following system calls:
clone()
setns()
unshare()
plus the nsenter command.
44. clone() creates a new process.
Unlike fork(2), it allows a child process to share parts of its
execution context with the parent process:
Memory space
File descriptor table
Signal handler table
Important flags:
CLONE_FS : allows the child process to share the same filesystem attributes.
CLONE_IO : allows the child process to share I/O context with the parent.
CLONE_PARENT : if set, the parent of the new child (as returned by getppid(2)) will be the same as that of the
calling process. Otherwise the child's parent is the calling process.
CLONE_NEWIPC : create the process in a new IPC namespace.
CLONE_NEWNET : create the process in a new network namespace.
CLONE_NEWNS : the cloned child is started in a new mount namespace, initialized with a copy of the
namespace of the parent.
CLONE_NEWPID : create the process in a new PID namespace.
CLONE_NEWUSER : create the process in a new user namespace.
CLONE_NEWUTS : create the process in a new UTS namespace, whose identifiers are initialized by
duplicating the identifiers from the UTS namespace of the calling process.
45. This system call reassociates a thread with a namespace.
Signature: int setns(int fd, int nstype);
The nstype argument specifies which type of namespace the calling thread may be
reassociated with:
0 : allow any type of namespace to be joined.
CLONE_NEWIPC : fd must refer to an IPC namespace.
CLONE_NEWNET : fd must refer to a network namespace.
CLONE_NEWUTS : fd must refer to a UTS namespace.
46. unshare() enables a process to disassociate parts of its execution context that are
currently being shared with other processes.
int unshare(int flags); // defined in sched.h
The CLONE_FS flag reverses the effect of the clone(2) CLONE_FS flag: it unshares
filesystem attributes, so that the calling process no longer shares its root directory.
The following flags unshare the given namespace, so that the calling process has
a private copy of that namespace which is not shared with any other process:
CLONE_NEWIPC
CLONE_NEWNET
CLONE_NEWNS
CLONE_NEWUTS
NOTE: If flags is specified as zero, then unshare() is a no-op; no changes are made
to the calling process's execution context.
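A minimal way to exercise unshare() from a script is via ctypes; a sketch, assuming the CLONE_NEWUTS value from <linux/sched.h>, and noting that the call normally needs CAP_SYS_ADMIN, so an unprivileged run is expected to fail with EPERM:

```python
import ctypes
import os

CLONE_NEWUTS = 0x04000000  # from <linux/sched.h>

libc = ctypes.CDLL(None, use_errno=True)

# Ask the kernel for a private UTS namespace; hostname changes made after
# this call would no longer be visible to the rest of the system.
ret = libc.unshare(CLONE_NEWUTS)

if ret == 0:
    print("now in a private UTS namespace")
else:
    # Without CAP_SYS_ADMIN the kernel refuses with EPERM.
    print("unshare failed:", os.strerror(ctypes.get_errno()))
```

The `unshare(1)` command-line tool wraps exactly this system call, e.g. `unshare --uts hostname test`.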
49. clone() -> do_fork() -> copy_process() -> copy_namespaces()
If no namespace flags are present in the do_fork() call, the child simply uses the parent's
namespaces; otherwise a new nsproxy struct is created and all namespaces are copied.
The child process is responsible for changing any namespace data.
The unshare() system call allows a process to disassociate parts of its
execution context that are being shared with other processes.
When a process ends, all namespaces it belongs to that have no other
process attached are cleaned up.
50. nsenter stands for "namespace enter".
The nsenter command allows entering a specified namespace.
Use the nsenter command to demystify containers and understand their
internals.
51. [vasharma@vasharma ~]$ lsns
NS TYPE NPROCS PID USER COMMAND
4026531836 pid 2 9943 vasharma -bash
4026531837 user 2 9943 vasharma -bash
4026531838 uts 2 9943 vasharma -bash
4026531839 ipc 2 9943 vasharma -bash
4026531840 mnt 2 9943 vasharma -bash
4026531956 net 2 9943 vasharma -bash
To check the list of namespaces associated with a given process:
lsns -p <pid of a container process>
52. Example 1: check the IP address and routing table in a network namespace
nsenter -t <pid of a container process> -n ip a s
nsenter -t <pid of a container process> -n ip route
Example 2: check the hostname through the UTS namespace
nsenter -t <pid of a container process> -u hostname
53. Processes running in different PID namespaces can have the same PID.
The PID of the first process created in a new namespace is 1.
PID 1 in a namespace behaves like the init process.
getppid() called by the process with PID 1 in a new namespace returns 0.
PID namespaces can be nested up to 32 levels.
54. A process created in a user namespace can have different UIDs and GIDs.
This allows mapping a UID in the container to a UID on the host.
UID 0 of the container can be mapped to an unprivileged user on the host.
Users can check the current mapping in:
/proc/PID/uid_map
/proc/PID/gid_map
These files hold three values:
ID-inside-ns ID-outside-ns length
The writing process must have the CAP_SETUID (CAP_SETGID for gid_map)
capability in the user namespace of the process PID.
The writing process must be in either the user namespace of the process PID or
the (immediate) parent user namespace of the process PID.
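The mapping files described above can be read for any process (Python sketch; in the initial user namespace the single line maps the full ID range, 0 0 4294967295):

```python
def read_id_map(pid="self", which="uid_map"):
    """Return (inside, outside, length) tuples from /proc/<pid>/uid_map or gid_map."""
    entries = []
    with open(f"/proc/{pid}/{which}") as f:
        for line in f:
            inside, outside, length = (int(x) for x in line.split())
            entries.append((inside, outside, length))
    return entries

print(read_id_map())                 # current process's uid_map
print(read_id_map(which="gid_map"))  # and its gid_map
```

Inspecting these files for a rootless container's process shows its UID 0 mapped onto an unprivileged host UID.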
55. The mount namespace allows processes to have their own private mounts and root filesystem.
A container can have its own /proc, /sys, and NFS mounts.
A container can have a private /tmp mounted per service or per user.
Each mount namespace has an owning user namespace.
When creating a less privileged mount namespace, shared mounts are reduced to
slave mounts.
56. When a user creates a process within a given network namespace, the process gets its own network stack, available
privately to it.
The process will see its own:
Network interfaces
Routing table rules
Firewall rules
Sockets
To create a new network namespace:
ip netns add <new namespace name>
To assign an interface to a network namespace:
Create a virtual Ethernet adapter:
ip link add veth0 type veth peer name <virtual adapter name>
Move this virtual network adapter to the newly created namespace:
ip link set <virtual adapter name> netns <network namespace name>
List the network interfaces in a given network namespace:
ip netns exec <network namespace name> ip link list
Configure a network interface in the network namespace:
ip netns exec <network namespace name> <command to run against that namespace>
Connecting network namespaces to the physical network:
ip link set dev <device> netns <network namespace name>
57. The IPC namespace allows us to isolate the following IPC resources:
System V IPC (man 7 sysvipc)
POSIX message queues
The /proc interfaces are different for each IPC namespace:
POSIX message queue interfaces in /proc/sys/fs/mqueue.
The System V IPC interfaces in /proc/sys/kernel: shmmni, shmmax, shmall,
shm_rmid_forced, sem, msgmax, msgmnb, msgmni.
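These per-namespace /proc interfaces are plain files, so the current IPC namespace's System V limits can be read directly (Python sketch over a subset of the tunables listed above):

```python
def sysv_ipc_limits():
    """Read a few System V IPC tunables for the current IPC namespace."""
    limits = {}
    for name in ("shmmax", "shmmni", "msgmax", "msgmnb", "msgmni"):
        with open(f"/proc/sys/kernel/{name}") as f:
            limits[name] = int(f.read())
    return limits

print(sysv_ipc_limits())
```

Running the same snippet inside a container with its own IPC namespace can show different values from the host, since each namespace carries its own copy.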
58. UTS: Unix Time-Sharing
The UTS namespace isolates the hostname and NIS domain name.
System calls: uname()/sethostname()/gethostname()
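The values the UTS namespace isolates are exactly those returned by uname(2) and gethostname(2), which Python exposes directly (sketch):

```python
import os
import socket

uts = os.uname()  # wraps uname(2): sysname, nodename, release, version, machine
print("kernel:", uts.sysname, uts.release)
print("hostname:", uts.nodename)

# gethostname(2) reports the same per-UTS-namespace hostname.
print("gethostname:", socket.gethostname())
```

Run inside a container, both calls report the container's hostname, not the host's, because the container engine gave it a private UTS namespace.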
59. Namespaces in operation, part 1: namespaces overview
Namespaces in operation, part 2: the namespaces API
Namespaces in operation, part 3: PID namespaces
Namespaces in operation, part 4: more on PID namespaces
Namespaces in operation, part 5: User namespaces
Namespaces in operation, part 6: more on user namespaces
Namespaces in operation, part 7: Network namespaces
Mount namespaces and shared subtrees
Mount namespaces, mount propagation, and unbindable mounts
https://www.usenix.org/conference/usenixsecurity18/presentation/sun
https://www.redhat.com/en/blog/how-selinux-separates-containers-using-multi-level-security
https://cloud.google.com/container-optimized-os/docs/how-to/secure-apparmor
docker run --rm -it --security-opt apparmor=unconfined debian:jessie bash -i [ --rm will remove the container once the work is done ]
https://opensource.com/article/18/2/understanding-selinux-labels-container-runtimes