Linux Containers – NextGen Virtualization
for Cloud (Intro & Overview)
Cloud Expo
June 10-12, 2014
New York City, NY
Boden...
Why LXC: Performance
6/13/2014 2
Manual VM LXC
Provision Time
Days
Minutes
Seconds / ms
linpack performance @ 45000
0
50
1...
Why LXC: Industry Uptrend
6/13/2014 3
Google trends - LXC
Google trends - docker
Why LXC: Flexible & Lightweight
Virtual Machines Linux Containers
6/13/2014 4
OS
bins / libs
app
OS
bins / libs
app app
bi...
Why LXC: Lower TCO
 Supported with out of the box modern
Linux Kernel
 Open source toolsets
 Cloudy integration
6/13/20...
Definitions
 Linux Containers (LXC  LinuX Containers)
– Lightweight virtualization
– Realized using features provided by...
Hypervisors vs. Linux Containers
6/13/2014 7
Hardware
Operating System
Hypervisor
Virtual Machine
Operating
System
Bins / ...
LXC Technology Stack
6/13/2014 8
UserSpaceKernelSpace
Kernel
System Call Interface
Architecture Dependent Kernel Code
GLIB...
So You Want To Build A Container?
 High level checklist
– Process(es)
– Throttling / limits
– Prioritization
– Resource i...
Linux Control Groups (cgroups)
 Problem
– How do I throttle, prioritize, control and obtain metrics for a group of
tasks ...
Linux cgroup Subsystems
Subsystem Tunable Parameters
blkio - Weighted proportional block I/O access. Group wide or per dev...
Linux cgroups Pseudo FS Interface
 Linux pseudo FS is the interface to cgroups
– Directory per subsystem per cgroup
– Rea...
Linux cgroups FS Layout
6/13/2014 13
Linux cgroups: CPU Usage
 Use CPU shares (and other controls) to prioritize jobs /
containers
 Carry out complex schedul...
Linux cgroups: CPU Pinning
 Pin containers / jobs to CPU cores
 Carry out complex scheduling schemes
 Reduce core switc...
Linux cgroups: Device Access
 Limit device visibility; isolation
 Implement device access controls
– Secure sharing
 Se...
So You Want To Build A Container?
6/13/2014 17
Linux namespaces
 Problem
– How do I provide an isolated view of global resources to a group of tasks
(processes)?
 Solu...
Linux namespaces: Conceptual Overview
6/13/2014 19
global (i.e. root) namespace
MNT NS
/
/proc
/mnt/fsrd
/mnt/fsrw
/mnt/cd...
Linux namespaces: Common Idioms
 It’s not required to use all namespaces
– Pick & choose; if your toolset allows it
 Con...
Linux namespaces & cgroups: Availability
6/13/2014 21
Note: user namespace support in
upstream kernel 3.8+, but
distributi...
So You Want To Build A Container?
6/13/2014 22
Linux chroot & pivot_root
 Using pivot_root with MNT namespace addresses escaping chroot
concerns
 The pivot_root target...
LXC Images
LXC images provide a flexible means to deliver only what you need – lightweight and minimal
footprint
 Basic c...
So You Want To Build A Container?
6/13/2014 25
Linux Security Modules & MAC
 Linux Security Modules (LSM) – kernel modules which provide a
framework for Mandatory Acces...
Linux Capabilities
 Per process privileges which define sys call
access
 Can be assigned to LXC process(es)
6/13/2014 27
Other Security Measures
 Reduce shared FS access using RO bind mounts
 Linux seccomp
– Confine system calls
 Keep Linux...
So You Want To Build A Container?
6/13/2014 29
LXC Industry Tooling
Virtuozzo OpenVZ Linux
VServer
Libvirt-lxc Lxc (tools) Warden lmctfy Docker
Summary Commercial
produc...
LXC Orchestration & Management
 Docker & libvirt-lxc in OpenStack
– Manage containers heterogeneously with traditional VM...
LXC Gaps
There are gaps…
 Lack of industry tooling / support
 Live migration still a WIP
 Full orchestration across res...
LXC: Use Cases For Traditional VMs
There are still use cases where traditional VMs are warranted.
 Virtualization of non ...
References & Related Links
 http://www.slideshare.net/BodenRussell/realizing-linux-containerslxc
 http://bodenr.blogspot...
Upcoming SlideShare
Loading in...5
×

Lxc – next gen virtualization for cloud intro (cloudexpo)

1,261

Published on

Intro to Linux Containers presented @ cloudexpo 2014 east in NYC.

Published in: Technology, News & Politics

Transcript of "Lxc – next gen virtualization for cloud intro (cloudexpo)"

  1. 1. Linux Containers – NextGen Virtualization for Cloud (Intro & Overview) Cloud Expo June 10-12, 2014 New York City, NY Boden Russell (brussell@us.ibm.com)
  2. 2. Why LXC: Performance 6/13/2014 2 Manual VM LXC Provision Time Days Minutes Seconds / ms linpack performance @ 45000 0 50 100 150 200 250 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 B M vcpus GFlops
  3. 3. Why LXC: Industry Uptrend 6/13/2014 3 Google trends - LXC Google trends - docker
  4. 4. Why LXC: Flexible & Lightweight Virtual Machines Linux Containers 6/13/2014 4 OS bins / libs app OS bins / libs app app bins / libs app bins / libs app app app app OS bins / libs app OS bins / libs app OS bins / libs app bins / libs app bins / libs app bins / libs app bins / libs app bins / libs app bins / libs app bins / libs app bins / libs app bins / libs app FlexibilityDensity OS
  5. 5. Why LXC: Lower TCO  Supported with out of the box modern Linux Kernel  Open source toolsets  Cloudy integration 6/13/2014 5
  6. 6. Definitions  Linux Containers (LXC  LinuX Containers) – Lightweight virtualization – Realized using features provided by a modern Linux kernel – VMs without the hypervisor (kind of)  Containerization of – (Linux) Operating Systems – Single or multiple applications  LXC as a technology ≠ LXC “tools” 6/13/2014 6
  7. 7. Hypervisors vs. Linux Containers 6/13/2014 7 Hardware Operating System Hypervisor Virtual Machine Operating System Bins / libs App App Virtual Machine Operating System Bins / libs App App Hardware Hypervisor Virtual Machine Operating System Bins / libs App App Virtual Machine Operating System Bins / libs App App Hardware Operating System Container Bins / libs App App Container Bins / libs App App Type 1 Hypervisor Type 2 Hypervisor Linux Containers Containers share the OS kernel of the host and thus are lightweight. However, each container must have the same OS kernel. Containers are isolated, but share OS and, where appropriate, libs / bins.
  8. 8. LXC Technology Stack 6/13/2014 8 UserSpaceKernelSpace Kernel System Call Interface Architecture Dependent Kernel Code GLIBC / Pseudo FS / User Space Tools & Libs Linux Container Tooling Linux Container Commoditization Orchestration & Management Hardware cgroups namespaces chroots LSM lxc
  9. 9. So You Want To Build A Container?  High level checklist – Process(es) – Throttling / limits – Prioritization – Resource isolation – Root file system – Security 6/13/2014 9 my-lxc ?
  10. 10. Linux Control Groups (cgroups)  Problem – How do I throttle, prioritize, control and obtain metrics for a group of tasks (processes)?  Solution  control groups (cgroups) 6/13/2014 10 cgroup blue proc proc proc – Device Access – Resource limiting – Prioritization – Accounting – Control – Injection
  11. 11. Linux cgroup Subsystems Subsystem Tunable Parameters blkio - Weighted proportional block I/O access. Group wide or per device. - Per device hard limits on block I/O read/write specified as bytes per second or IOPS per second. cpu - Time period (microseconds per second) a group should have CPU access. - Group wide upper limit on CPU time per second. - Weighted proportional value of relative CPU time for a group. cpuset - CPUs (cores) the group can access. - Memory nodes the group can access and migrate ability. - Memory hardwall, pressure, spread, etc. devices - Define which devices and access type a group can use. freezer - Suspend/resume group tasks. memory - Max memory limits for the group (in bytes). - Memory swappiness, OOM control, hierarchy, etc.. hugetlb - Limit HugeTLB size usage. - Per cgroup HugeTLB metrics. net_cls - Tag network packets with a class ID. - Use tc to prioritize tagged packets. net_prio - Weighted proportional priority on egress traffic (per interface). 6/13/2014 11
  12. 12. Linux cgroups Pseudo FS Interface  Linux pseudo FS is the interface to cgroups – Directory per subsystem per cgroup – Read / write to pseudo file(s) in your cgroup directory 6/13/2014 12 /sys/fs/cgroup/my-lxc |-- blkio | |-- blkio.io_merged | |-- blkio.io_queued | |-- blkio.io_service_bytes | |-- blkio.io_serviced | |-- blkio.io_service_time | |-- blkio.io_wait_time | |-- blkio.reset_stats | |-- blkio.sectors | |-- blkio.throttle.io_service_bytes | |-- blkio.throttle.io_serviced | |-- blkio.throttle.read_bps_device | |-- blkio.throttle.read_iops_device | |-- blkio.throttle.write_bps_device | |-- blkio.throttle.write_iops_device | |-- blkio.time | |-- blkio.weight | |-- blkio.weight_device | |-- cgroup.clone_children | |-- cgroup.event_control | |-- cgroup.procs | |-- notify_on_release | |-- release_agent | `-- tasks |-- cpu | |-- ... |-- ... `-- perf_event echo "8:16 1048576“ > blkio.throttle.read_bps_device cat blkio.weight_device dev weight 8:1 200 8:16 500 App App App
  13. 13. Linux cgroups FS Layout 6/13/2014 13
  14. 14. Linux cgroups: CPU Usage  Use CPU shares (and other controls) to prioritize jobs / containers  Carry out complex scheduling schemes  Segment host resources  Adhere to SLAs 6/13/2014 14
  15. 15. Linux cgroups: CPU Pinning  Pin containers / jobs to CPU cores  Carry out complex scheduling schemes  Reduce core switching costs  Adhere to SLAs 6/13/2014 15
  16. 16. Linux cgroups: Device Access  Limit device visibility; isolation  Implement device access controls – Secure sharing  Segment device access  Device whitelist / blacklist 6/13/2014 16
  17. 17. So You Want To Build A Container? 6/13/2014 17
  18. 18. Linux namespaces  Problem – How do I provide an isolated view of global resources to a group of tasks (processes)?  Solution  namespaces 6/13/2014 18 namespace blue – MNT; mount points, files systems, etc. – PID; processes – NET; NICs, routing, etc. – IPC; System V IPC – UTS; host and domain name – USER; UID and GID MNT PID NET UTS USER proc proc proc
  19. 19. Linux namespaces: Conceptual Overview 6/13/2014 19 global (i.e. root) namespace MNT NS / /proc /mnt/fsrd /mnt/fsrw /mnt/cdrom /run2 UTS NS globalhost rootns.com PID NS PID COMMAND 1 /sbin/init 2 [kthreadd] 3 [ksoftirqd] 4 [cpuset] 5 /sbin/udevd 6 /bin/sh 7 /bin/bash IPC NS SHMID OWNER 32452 root 43321 boden SEMID OWNER 0 root 1 Boden MSQID OWNER NET NS lo: UNKNOWN… eth0: UP… eth1: UP… br0: UP… app1 IP:5000 app2 IP:6000 app3 IP:7000 USER NS root 0:0 ntp 104:109 mysql 105:110 boden 106:111 purple namespace MNT NS / /proc /mnt/purplenfs /mnt/fsrw /mnt/cdrom UTS NS purplehost purplens.com PID NS PID COMMAND 1 /bin/bash 2 /bin/vim IPC NS SHMID OWNER SEMID OWNER 0 root MSQID OWNER NET NS lo: UNKNOWN… eth0: UP… app1 IP:1000 app2 IP:7000 USER NS root 0:0 app 106:111 blue namespace MNT NS / /proc /mnt/cdrom /bluens UTS NS bluehost bluens.com PID NS PID COMMAND 1 /bin/bash 2 python 3 node IPC NS SHMID OWNER SEMID OWNER MSQID OWNER NET NS lo: UNKNOWN… eth0: DOWN… eth1: UP app1 IP:7000 app2 IP:9000 USER NS root 0:0 app 104:109
  20. 20. Linux namespaces: Common Idioms  It’s not required to use all namespaces – Pick & choose; if your toolset allows it  Constructs exist to permit “connectivity” between parent / child namespace  Various linux user space tools have namespace support  Linux sys API supports flexible namespace creation 6/13/2014 20
  21. 21. Linux namespaces & cgroups: Availability 6/13/2014 21 Note: user namespace support in upstream kernel 3.8+, but distributions rolling out phased support: - Map LXC UID/GID between container and host - Non-root LXC creation
  22. 22. So You Want To Build A Container? 6/13/2014 22
  23. 23. Linux chroot & pivot_root  Using pivot_root with MNT namespace addresses escaping chroot concerns  The pivot_root target directory becomes the “new root FS” 6/13/2014 23
  24. 24. LXC Images LXC images provide a flexible means to deliver only what you need – lightweight and minimal footprint  Basic constraints – Same architecture & endian – Linux’ish Operating System; you can run different Linux distros on same host  Image types – System; virtualize Operating System(s) – standard distro root FS less the kernel – Application; virtualize application(s) – only package apps + dependencies (aka JeOS – Just enough Operating System)  Bind mount host libs / bins into LXC to share host resources  Container image init process – Container init command provided on invocation – can be an application or a full fledged init process – Init script customized for image – skinny SysVinit, upstart, etc. – Reduces overhead of lxc start-up and runtime foot print  Various tools to build images – SuSE Kiwi – Debootstrap – Etc.  LXC tooling options often include numerous image templates 6/13/2014 24
  25. 25. So You Want To Build A Container? 6/13/2014 25
  26. 26. Linux Security Modules & MAC  Linux Security Modules (LSM) – kernel modules which provide a framework for Mandatory Access Control (MAC) security implementations  MAC vs DAC – In MAC, admin (user or process) assigns access controls to subject / initiator – In DAC, resource owner (user) assigns access controls to individual resources  Existing LSM implementations include: AppArmor, SELinux, GRSEC, etc. 6/13/2014 26
  27. 27. Linux Capabilities  Per process privileges which define sys call access  Can be assigned to LXC process(es) 6/13/2014 27
  28. 28. Other Security Measures  Reduce shared FS access using RO bind mounts  Linux seccomp – Confine system calls  Keep Linux kernel up to date  User namespaces in 3.8+ kernel – Launching containers as non-root user – Mapping UID / GID into container 6/13/2014 28
  29. 29. So You Want To Build A Container? 6/13/2014 29
  30. 30. LXC Industry Tooling Virtuozzo OpenVZ Linux VServer Libvirt-lxc Lxc (tools) Warden lmctfy Docker Summary Commercial product using OpenVZ under the hood Custom Kernel providing well seasoned LXC support A set of kernel patches providing LXC. Not based on cgroups or namespaces. Libvirt support for LXC via cgroups and namespaces. Lib + set of user spaces tools /bindings for LXC. LXC management tooling used by CF. Similar to LXC, but provides more intent based focus. Commoditizatio n of LXC adding support for images, build files, etc. Part of upstream Kernel? No No Partial Yes Yes Yes Yes, but additional patches needed for specific features. Yes License Commercial GNU GPL v2 GNU GPL v2 GNU LGPL GNU LGPL Apache v2 Apache v2 Apache v2 APIs / Bindings - CLI - API - CLI - C - CLI - C - Python - Java - C# - PHP - Python - Lua - GO - CLI - GO - REST - CLI - Python - Other 3rd party libs Managem ent plane/ Dashboard Virtuozzo Parrallels Virtuozzo Parrallels + others - OpenStack - Archipel - Virt- Manager - LXC web panel - Lexy - OpenStack - Shipyard - Docker UI 6/13/2014 30
  31. 31. LXC Orchestration & Management  Docker & libvirt-lxc in OpenStack – Manage containers heterogeneously with traditional VMs… but not w/the level of support & features we might like  CoreOS – Zero-touch admin Linux distro with docker images as the unit of operation – Centralized key/value store to coordinate distributed environment  Various other 3rd party apps – Maestro for docker – Shipyard for docker – Fleet for CoreOS – Etc.  LXC migration – Container migration via criu  But… – Still no great way to tie all virtual resources together with LXC – e.g. storage + networking • IMO; an area which needs focus for LXC to become more generally applicable 6/13/2014 31
  32. 32. LXC Gaps There are gaps…  Lack of industry tooling / support  Live migration still a WIP  Full orchestration across resources (compute / storage / networking)  Fears of security  Not a well known technology… yet  Integration with existing virtualization and Cloud tooling  Not much / any industry standards  Missing skillset  Slower upstream support due to kernel dev process  Etc. 6/13/2014 32
  33. 33. LXC: Use Cases For Traditional VMs There are still use cases where traditional VMs are warranted.  Virtualization of non Linux based OSs – Windows – AIX – Etc.  LXC not supported on host  VM requires unique kernel setup which is not applicable to other VMs on the host (i.e. per VM kernel config)  Etc. 6/13/2014 33
  34. 34. References & Related Links  http://www.slideshare.net/BodenRussell/realizing-linux-containerslxc  http://bodenr.blogspot.com/2014/05/kvm-and-docker-lxc-benchmarking- with.html  https://www.docker.io/  http://sysbench.sourceforge.net/  http://dag.wiee.rs/home-made/dstat/  http://www.openstack.org/  https://wiki.openstack.org/wiki/Rally  https://wiki.openstack.org/wiki/Docker  http://devstack.org/  http://www.linux-kvm.org/page/Main_Page  https://github.com/stackforge/nova-docker  https://github.com/dotcloud/docker-registry  http://www.netperf.org/netperf/  http://www.tokutek.com/products/iibench/  http://www.brendangregg.com/activebenchmarking.html  http://wiki.openvz.org/Performance 6/13/2014 34
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×