
Linux Container Brief for IEEE WG P2302



A brief intro to Linux Containers presented to IEEE working group P2302 (InterCloud standards and portability). This deck covers:
- Definitions and motivations for containers
- Container technology stack
- Containers vs Hypervisor VMs
- Cgroups
- Namespaces
- Pivot root vs chroot
- Linux Container image basics
- Linux Container security topics
- Overview of Linux Container tooling functionality
- Thoughts on container portability and runtime configuration
- Container tooling in the industry
- Container gaps
- Sample use cases for traditional VMs

Overall, the bulk of this deck is covered in other material I have posted here. However, there are a few new slides in this deck, most notably some thoughts on container portability and runtime config.

Published in: Technology, News & Politics


  1. Linux Containers (LXC): IEEE P2302 Working Group Brief – Boden Russell
  2. Definitions (6/27/2014)
     - Linux Containers (LXC = LinuX Containers)
       - Lightweight virtualization
       - Realized using features provided by a modern Linux kernel
       - VMs without the hypervisor (kind of)
     - Containerization of:
       - (Linux) operating systems (less the kernel)
       - Single or multiple applications
     - LXC as a technology ≠ LXC “tools”
  3. Why LXC
     - Provision in seconds / milliseconds
     - Near bare-metal runtime performance
     - VM-like agility – it’s still “virtualization”
     - Flexibility: containerize a “system” or “application(s)”
     - Lightweight: Just enough Operating System (JeOS), minimal per-container penalty
     - Open source – free – lower TCO
     - Supported with an OOTB modern Linux kernel
     - Growing in popularity
     “Linux Containers are poised as the next VM in our modern Cloud era…”
     [Charts: provision time – manual: days, VM: minutes, LXC: seconds/ms; linpack performance @ 45000 (GFlops, BM vs. vcpus); Google Trends for LXC and docker]
  4. LXC Technology Stack
     [Diagram, bottom to top: hardware → kernel space (architecture-dependent kernel code, system call interface; cgroups, namespaces, chroots, LSM) → user space (GLIBC, pseudo FS, user-space tools & libs) → Linux container tooling (lxc) → Linux container commoditization → orchestration & management]
  5. Hypervisors vs. Linux Containers
     [Diagram: Type 1 hypervisor (hardware → hypervisor → VMs, each with OS + bins/libs + apps); Type 2 hypervisor (hardware → OS → hypervisor → VMs); Linux containers (hardware → OS → containers, each with bins/libs + apps)]
     Containers share the OS kernel of the host and thus are lightweight. However, every container on a host must share that same kernel. Containers are isolated, but share the OS and, where appropriate, bins / libs.
  6. Linux cgroups
     - History
       - Work started in 2006 by Google engineers
       - Merged into the upstream 2.6.24 kernel due to widespread LXC usage
       - A number of features are still a WIP
     - Functionality
       - Access: which devices can be used per cgroup
       - Resource limiting: memory, CPU, device accessibility, block I/O, etc.
       - Prioritization: who gets more of the CPU, memory, etc.
       - Accounting: resource usage per cgroup
       - Control: freezing & checkpointing
       - Injection: packet tagging
     - Usage
       - cgroup functionality is exposed as “resource controllers” (aka “subsystems”)
       - Subsystems are mounted on the FS; the top-level subsystem mount is the root cgroup, containing all procs on the host
       - Directories created under a top-level mount define per-cgroup groups
       - Procs are put in a group’s tasks file for group assignment
       - Interface is via read / write of pseudo files in the group
     Cgroups provide throttling, prioritization, control and metrics for a group of processes.
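The usage workflow on the slide above (mkdir a group directory under a mounted subsystem, write its limit pseudo-files, then write a PID into its tasks file) can be sketched in Python. The path layout and the `memory.limit_in_bytes` name mirror the cgroup v1 memory controller; on a real host `base` would be the subsystem mount under `/sys/fs/cgroup` and root privileges would be needed, so the sketch below works against any plain directory instead:

```python
import os

def add_to_cgroup(base, subsystem, group, pid, limits):
    """Create a cgroup under a v1-style subsystem mount and assign a process.

    base: the subsystem mount point (e.g. /sys/fs/cgroup on a real host).
    limits: mapping of pseudo-file name -> value, e.g.
            {"memory.limit_in_bytes": 536870912}.
    """
    group_dir = os.path.join(base, subsystem, group)
    # Creating a directory under the subsystem mount creates the cgroup.
    os.makedirs(group_dir, exist_ok=True)
    for pseudo_file, value in limits.items():
        # On a real cgroup mount these pseudo files already exist; in a
        # plain directory this just creates them, which keeps the sketch
        # runnable without root.
        with open(os.path.join(group_dir, pseudo_file), "w") as f:
            f.write(str(value))
    # The kernel moves a process into the group when its PID is written
    # to the group's "tasks" file.
    with open(os.path.join(group_dir, "tasks"), "w") as f:
        f.write(str(pid))
    return group_dir
```

For example, `add_to_cgroup("/sys/fs/cgroup", "memory", "demo", os.getpid(), {"memory.limit_in_bytes": 536870912})` would cap the calling process at 512 MiB on a host with the memory controller mounted there.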
  7. Linux cgroups Subsystems
     - cgroups are provided via kernel modules
       - Not always loaded / provided by default; locate and load with modprobe
     - Some features are tied to kernel version
     Subsystems and tunable parameters:
     - blkio: weighted proportional block I/O access (group-wide or per device); per-device hard limits on block I/O read/write, specified as bytes per second or IOPS per second
     - cpu: time period (microseconds per second) a group should have CPU access; group-wide upper limit on CPU time per second; weighted proportional value of relative CPU time for a group
     - cpuset: CPUs (cores) the group can access; memory nodes the group can access and migrate ability; memory hardwall, pressure, spread, etc.
     - devices: define which devices, and access type, a group can use
     - freezer: suspend / resume group tasks
     - memory: max memory limits for the group (in bytes); memory swappiness, OOM control, hierarchy, etc.
     - hugetlb: limit HugeTLB size usage; per-cgroup HugeTLB metrics
     - net_cls: tag network packets with a class ID; use tc to prioritize tagged packets
     - net_prio: weighted proportional priority on egress traffic (per interface)
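As a worked example of the cpu subsystem's "upper limit on CPU time per second": the cap is expressed as a quota/period pair. Assuming the cgroup v1 parameter names `cpu.cfs_period_us` and `cpu.cfs_quota_us` (the standard CFS bandwidth-control files, not named on the slide), translating a fractional core count into a quota is simple arithmetic:

```python
def cfs_quota(cpus, period_us=100000):
    """Translate a CPU allowance (e.g. 1.5 cores) into the value to write
    to cpu.cfs_quota_us: the group may run for quota microseconds out of
    every period microseconds, summed across all cores.

    period_us is the value in cpu.cfs_period_us (100 ms is a common default).
    """
    return int(cpus * period_us)
```

So capping a group at one and a half cores means writing 150000 to `cpu.cfs_quota_us` while `cpu.cfs_period_us` stays at 100000.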
  8. Linux namespaces
     - History
       - Initial kernel patches in 2.4.19
       - Recent 3.8 patches for user namespace support
       - A number of features are still a WIP
     - Functionality: process-level isolation of global resources
       - MNT (mount points, file systems, etc.)
       - PID (processes)
       - NET (NICs, routing, etc.)
       - IPC (System V IPC resources)
       - UTS (host & domain name)
       - USER (UID + GID)
     - Process(es) in a namespace have the illusion they are the only processes on the system
     - Constructs generally exist to permit “connectivity” with the parent namespace
     - Usage
       - Construct namespace(s) of the desired type
       - Create process(es) in the namespace (typically done when creating the namespace)
       - If necessary, initialize “connectivity” to the parent namespace
       - Process(es) in the namespace then internally function as if they are the only proc(s) on the system
     Namespaces provide isolation of Linux resources for a group of processes.
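The "construct namespace(s) of the desired type" step maps to the kernel's clone flags, one per namespace type listed above. A minimal sketch via `ctypes` and the `unshare(2)` syscall (most namespace types require root, or a user namespace, to actually create; defining the wrapper is harmless):

```python
import ctypes
import os

# Namespace clone flags from <linux/sched.h>, one per type on the slide.
CLONE_NEWNS   = 0x00020000  # MNT:  mount points, file systems
CLONE_NEWUTS  = 0x04000000  # UTS:  host & domain name
CLONE_NEWIPC  = 0x08000000  # IPC:  System V IPC resources
CLONE_NEWUSER = 0x10000000  # USER: UID + GID mappings
CLONE_NEWPID  = 0x20000000  # PID:  process IDs
CLONE_NEWNET  = 0x40000000  # NET:  NICs, routing

def unshare(flags):
    """Detach the calling process into new namespaces of the given types
    by wrapping unshare(2). Flags are OR-ed together to pick & choose
    which namespaces to create."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(flags) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

For example, `unshare(CLONE_NEWUTS | CLONE_NEWNET)` (run as root) gives the process its own hostname and its own network stack, leaving the other namespaces shared with the parent.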
  9. Linux namespaces: Conceptual Overview
     [Diagram: the global (i.e. root) namespace alongside two child namespaces (“purple” and “blue”), each with its own view per namespace type – MNT (mount table), UTS (hostname), PID (process table rooted at PID 1), IPC (SHM / SEM / MSG objects), NET (interfaces, IPs, apps) and USER (UID:GID mappings)]
  10. Linux namespaces: Common Idioms
     - It’s not required to use all namespaces – pick & choose, if your toolset allows it
     - Constructs exist to permit “connectivity” between parent / child namespaces
     - Various Linux user-space tools have namespace support
     - The Linux sys API supports flexible namespace creation
  11. Linux namespaces & cgroups: Availability
     Note: user namespace support is in the upstream kernel as of 3.8+, but distributions are rolling out phased support:
     - Mapping LXC UID / GID between container and host
     - Non-root LXC creation
  12. Linux chroot & pivot_root
     - Using pivot_root with the MNT namespace addresses the classic concerns about escaping a chroot
     - The pivot_root target directory becomes the “new root FS”
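The pivot_root sequence hinted at above can be sketched as follows. This is illustrative only: the real thing must run as root inside a fresh MNT namespace, with `new_root` itself a mount point; the syscall number is the x86_64 one (an architecture-specific assumption), since glibc has historically not exported a `pivot_root` wrapper. Unlike `chroot(2)`, after the old root is unmounted there is nothing left to escape back into:

```python
import ctypes
import os

SYS_PIVOT_ROOT = 155  # x86_64 syscall number (architecture-specific)
MNT_DETACH = 2        # lazy unmount flag for umount2(2)

def enter_rootfs(new_root):
    """Make new_root the MNT namespace's root FS via pivot_root(2), then
    discard the old root. Requires root inside a fresh MNT namespace,
    with new_root being a mount point."""
    libc = ctypes.CDLL(None, use_errno=True)
    old = os.path.join(new_root, ".old_root")   # parking spot for the old root
    os.makedirs(old, exist_ok=True)
    # pivot_root(new_root, put_old): new_root becomes "/", the previous
    # root is re-mounted at put_old.
    if libc.syscall(SYS_PIVOT_ROOT, new_root.encode(), old.encode()) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    os.chdir("/")                               # move cwd into the new root
    libc.umount2(b"/.old_root", MNT_DETACH)     # detach the old root entirely
    os.rmdir("/.old_root")
```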
  13. LXC Images
     LXC images provide a flexible means to deliver only what you need – lightweight and minimal footprint.
     - Basic constraints
       - Same architecture & endianness as the host
       - Linux’ish operating system; you can run different Linux distros on the same host
     - Image types
       - System: virtualize operating system(s) – a standard distro root FS, less the kernel
       - Application: virtualize application(s) – package only the apps + dependencies (aka JeOS – Just enough Operating System)
     - Bind mount host libs / bins into the LXC to share host resources
     - Container image init process
       - The container init command is provided on invocation – it can be an application or a full-fledged init process
       - Init script customized for the image – a skinny SysVinit, upstart, etc.
       - Reduces LXC start-up overhead and runtime footprint
     - Various tools to build images: SUSE Kiwi, debootstrap, etc.
     - LXC tooling options often include numerous image templates
  14. Linux Security Modules & MAC
     - Linux Security Modules (LSM): kernel modules which provide a framework for Mandatory Access Control (MAC) security implementations
     - MAC vs. DAC
       - In MAC, an admin (user or process) assigns access controls to the subject / initiator
       - In DAC, the resource owner (user) assigns access controls to individual resources
     - Existing LSM implementations include AppArmor, SELinux, GRSEC, etc.
  15. Linux Capabilities
     - Per-process privileges which define sys call access
     - Can be assigned to LXC process(es)
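A process's effective capability set is visible as the hex `CapEff` mask in `/proc/<pid>/status`, one bit per capability. A small decoder (covering only a handful of the capability bit numbers from `<linux/capability.h>`) makes the per-process privilege idea concrete:

```python
# A few capability bit numbers from <linux/capability.h>.
CAPS = {
    0:  "CAP_CHOWN",      # change file ownership
    7:  "CAP_SETUID",     # manipulate process UIDs
    12: "CAP_NET_ADMIN",  # configure interfaces, routing, etc.
    13: "CAP_NET_RAW",    # use raw sockets (e.g. ping)
    21: "CAP_SYS_ADMIN",  # the catch-all "root-ish" capability
}

def decode_caps(capeff_hex):
    """Decode a CapEff mask (as printed in /proc/<pid>/status) into the
    capability names it grants, limited to the bits listed in CAPS."""
    mask = int(capeff_hex, 16)
    return sorted(name for bit, name in CAPS.items() if mask & (1 << bit))
```

A container runtime that drops `CAP_SYS_ADMIN` and `CAP_NET_ADMIN` from its processes would show a CapEff mask with bits 21 and 12 cleared; reading `CapEff` from a containerized process's status file is a quick way to audit what it was actually granted.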
  16. Other Security Measures
     - Reduce shared FS access using RO bind mounts
     - Linux seccomp: confine system calls
     - Keep the Linux kernel up to date
     - User namespaces in the 3.8+ kernel mitigate container root-escape concerns
       - Launch containers as a non-root user
       - Map UID / GID into the container
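Of the measures above, seccomp is the one with a direct programmatic hook: strict mode is entered with a single `prctl(2)` call. A minimal sketch (the constants are from `<linux/prctl.h>` and `<linux/seccomp.h>`; real container tooling uses the later filter mode with a syscall allowlist rather than strict mode):

```python
import ctypes

PR_SET_SECCOMP = 22      # prctl option to enable seccomp
SECCOMP_MODE_STRICT = 1  # only read, write, _exit and sigreturn allowed

def confine_strict():
    """Put the calling process into strict seccomp mode: any syscall
    outside read/write/_exit/sigreturn kills the process. Irreversible,
    so call it only right before handling untrusted input."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) != 0:
        raise OSError("prctl(PR_SET_SECCOMP) failed")
```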
  17. LXC Security Triangle
     - Private environments (trusted code): app packaging / deployment / management, devOps, Cloud, etc. – no additional worries about security
     - Public environments
       - Single tenant: same restrictions as private environments; tenant-trusted code
       - Multi-tenant
     [Diagram: triangle weighing privileges, multitenancy and untrusted code against security measures – LSM, capabilities, seccomp, RO bind mounts, GRSEC, etc.]
     NOTE: User namespaces will mitigate root-escape concerns.
  18. So You Want To Build A Container?
  19. LXC Engine: A “Hypervisor” for Containers
     [Diagram: Type 1 hypervisor stack (hardware → hypervisor → VMs, each with OS + bins/libs + apps) next to Linux container stacks (hardware → OS → LXC engine / tooling → containers, each with bins/libs + apps)]
     Conceptual mapping: VM → container; hypervisor → LXC engine.
  20. Docker Container Engine
     [Diagram: one Docker image running on a PC, a Mac, a virtual machine, or bare metal]
     Develop a container image once – run it anywhere the docker engine is supported.
  21. Typical Container Engine / Tooling Functionality
     (Sample concepts; not a complete list.)
     - Standard container operations to realize / manage LXCs: run, start, stop, delete, attach, snapshot, etc.
     - APIs: CLI, web API (RESTful or other), automation files (e.g. dockerfile); security – who can use the APIs
     - Images: format (typically RAW), required dev files, etc.; image repository or catalog; container FS layers (union-FS style); import / export of images and containers
     - Metrics: CPU, memory, block I/O, etc.
     - Eventing: push events on container changes
     - Logs: inspect / manage container logs
     - Checkpointing: freeze a container’s state
  22. LXC Host / Engine Portability Considerations
     (Sample concepts; not a complete list. Not well solidified in the current open source community.)
     - Linux kernel dependencies: non-standard kernel modules needed by app(s) on the image; LXC-related kernel features tied to specific kernel versions
     - LSM profile settings (security): different profile config per LSM (e.g. AppArmor, SELinux, etc.)
     - Container file system / storage type & capability: FS type; FS capabilities such as direct I/O
     - External storage: FS volumes bind-mounted into the container
     - Network options: Linux bridging, OVS, etc.
     - Shared host libs / bins: dependencies on the host OS
  23. LXC Runtime Configuration Considerations
     (Sample concepts; not a complete list. The community is consolidating around docker’s libcontainer project.)
     - Linux capabilities: applied to the container process for Linux security; sysctl settings
     - Cgroups settings: allowed devices, limits, UID / GID mappings, etc.
     - Environment variables: exposed to the shell / procs in the container
     - Namespaces: which to use for the container’s procs
     - Networking: type, IP, gateway, hostname, etc.
     - Init: which cmd / bin to run on container init
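Most of the knobs above surface as key/value pairs in an LXC container config file. A sketch using lxc 1.0-era keys (the hostname, bridge name, IP, limits and mapping ranges are all illustrative values, not from the deck):

```ini
# Hostname (UTS namespace)
lxc.utsname = demo01

# Networking: veth pair attached to a host bridge (names illustrative)
lxc.network.type = veth
lxc.network.link = br0
lxc.network.ipv4 = 10.0.3.10/24

# Cgroup limits applied to the container's processes
lxc.cgroup.memory.limit_in_bytes = 512M
lxc.cgroup.cpuset.cpus = 0-1

# Allowed devices: deny all, then allow /dev/null (c 1:3) and /dev/zero (c 1:5)
lxc.cgroup.devices.deny = a
lxc.cgroup.devices.allow = c 1:3 rwm
lxc.cgroup.devices.allow = c 1:5 rwm

# Drop dangerous capabilities from the container's init
lxc.cap.drop = sys_module sys_admin

# User namespace UID / GID mapping: container root -> host 100000
lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536
```

The init command itself is typically supplied at invocation (e.g. as an argument to the start tool) rather than in the config file.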
  24. LXC Industry Tooling
     - Parallels: commercial product using OpenVZ under the hood. Not in the upstream kernel. Commercial license. APIs: CLI, API. Management plane / dashboard: Parallels Cloud Server.
     - OpenVZ: custom kernel (pre-3.13) providing well-seasoned LXC support. Not in the upstream kernel. GNU GPL v2. APIs: CLI, C. Management plane / dashboard: Parallels + others.
     - Linux-VServer: a set of kernel patches providing LXC; not based on cgroups or namespaces. Partially in the upstream kernel. GNU GPL v2. APIs: CLI.
     - libvirt-lxc: libvirt support for LXC via cgroups and namespaces. In the upstream kernel. GNU LGPL. APIs: CLI, C, Python, Java, C#, PHP. Management plane / dashboard: OpenStack, Archipel, Virt-Manager.
     - lxc (tools): lib + set of user-space tools / bindings for LXC. In the upstream kernel. GNU LGPL. APIs: Python, Lua, Go. Management plane / dashboard: LXC web panel, Lexy.
     - Warden: LXC management tooling used by CF. In the upstream kernel. Apache v2. APIs: CLI.
     - lmctfy: similar to LXC, but with a more intent-based focus. In the upstream kernel, but additional patches needed for specific features. Apache v2. APIs: Go.
     - Docker: commoditization of LXC, adding support for images, build files, etc. In the upstream kernel. Apache v2. APIs: REST, CLI, Python, other 3rd-party libs. Management plane / dashboard: OpenStack, Shipyard, Docker UI.
  25. LXC Orchestration & Management
     - Docker & libvirt-lxc in OpenStack: manage containers heterogeneously with traditional VMs… but not with the level of support & features we might like
     - CoreOS: zero-touch-admin Linux distro with docker images as the unit of operation; centralized key/value store to coordinate the distributed environment
     - Various other 3rd-party apps: Maestro for docker, Shipyard for docker, Fleet for CoreOS, etc.
     - LXC migration: container migration via CRIU
     - But… there is still no great way to tie all virtual resources (e.g. storage + networking) together with LXC
       - IMO, an area which needs focus for LXC to become more generally applicable
  26. LXC Gaps
     There are gaps…
     - Lack of industry tooling / support
     - Live migration is still a WIP
     - Full orchestration across resources (compute / storage / networking)
     - Security concerns
     - Not a well-known technology… yet
     - Integration with existing virtualization and Cloud tooling
     - Little / no industry standardization
     - Missing skill set
     - Slower upstream support due to the kernel dev process
     - Etc.
  27. LXC: Use Cases For Traditional VMs
     There are still use cases where traditional VMs are warranted:
     - Virtualization of non-Linux OSs (Windows, AIX, etc.)
     - LXC not supported on the host
     - A VM requires a unique kernel setup which is not applicable to the other VMs on the host (i.e. per-VM kernel config)
     - Etc.
  28. Questions?
  29. Links & Resources
     - openstack
     - libcontainer