What is a Container?
A lightweight VM? Not quite. Unlike a VM, a container:
1 Uses the host kernel
2 Does not need to boot a different OS
3 Does not have its own kernel modules
4 Does not need init as PID 1
It is just normal processes on a host machine
What is a Container? (cont.)
Containers wrap a piece of software in a complete
filesystem that contains everything it needs to run:
• Code
• Runtime
• System tools
• System libraries
(anything you can install on a server)
This guarantees that the software runs the same way
regardless of the environment it is running in.
VM vs. Container
[Figure: two stacks side by side.
VM: Infrastructure / Operating system / Hypervisor / Guest OS (x3) / Bins/Libs / App1, App2, App3.
Container: Infrastructure / Operating system / Docker Engine / Bins/Libs / App1, App2, App3.]
Docker containers:
• Share the kernel with other containers
• Run as isolated processes in user space
• Are not tied to any specific infrastructure
Base tech of container(AUFS)
An ordered group of branches
- a branch = a single directory
- each branch is stored in a directory on the host
- at least a single read-only branch, plus many read-write branches
[Figure: stacked branches labeled Read-only, Read-write, Read-write, Read-write]
Base tech of container(AUFS)
Mount point
With AUFS, the mount point of a container is:
/var/lib/docker/aufs/mnt/$CONTAINER_ID/
It is mounted only while the container is running.
The AUFS branches (read-only and read-write) are in:
/var/lib/docker/aufs/diff/$CONTAINER_OR_IMAGE_ID
Base tech of container(AUFS)
e.g. creating a container
Where to look on the host:
/proc/mounts
/sys/fs/aufs/si_XXXX/br*
/var/lib/docker/aufs/diff/XXX
Container = a group of branches
[Figure: host view vs. container view]
Base tech of container(AUFS)
[Figure: a file as seen from the container and from the host; deleting the container]
Base tech of container(AUFS)
Docker v1.10: content-addressable storage model
Ubuntu 15.04 image
Image layers (R/O):
  C84bfc126a2     188 MB
  D14bfc54ea1     194.5 KB
  c80179960767    1.895 KB
  6d45a3841788    0 B
Container layer: thin R/W layer
- The Docker storage driver:
  enables and manages both the image layers and the container layer;
  stacks the layers and provides a single unified view.
- Location: /var/lib/docker/
Layer IDs: random UUID -> cryptographic content hash
• Security
• Avoids ID collisions
• Guarantees data integrity
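The difference between a random UUID and a content hash can be sketched in a few lines of Python. This is only an illustration, not Docker's actual implementation: the `content_id` and `verify` helpers and the sample bytes are hypothetical, and SHA-256 is assumed as the hash.

```python
import hashlib
import uuid


def content_id(layer_bytes: bytes) -> str:
    """Return an ID derived from the content: the SHA-256 digest of the bytes."""
    return "sha256:" + hashlib.sha256(layer_bytes).hexdigest()


def verify(layer_bytes: bytes, expected_id: str) -> bool:
    """Integrity check: recompute the digest and compare it with the stored ID."""
    return content_id(layer_bytes) == expected_id


# Hypothetical stand-in for a layer archive.
layer = b"pretend this is a layer tarball"
layer_id = content_id(layer)

# Same content always yields the same ID, so identical layers are recognizable.
same_content_same_id = content_id(layer) == layer_id
# Any change to the content changes the ID, so tampering is detectable.
tampered_detected = not verify(layer + b"!", layer_id)
# A random UUID carries no information about the content it names.
random_ids_differ = uuid.uuid4() != uuid.uuid4()
```

Because the ID is a function of the bytes, the ID itself doubles as a checksum, which is where the data-integrity bullet comes from.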
Storage Driver
AUFS
Btrfs
Device mapper
OverlayFS
ZFS
File modification (create, delete, update) steps
1. Search through the image layers for the file, top-down
2. Perform a “copy-up” operation: copy the file into the thin writable layer
3. Modify the copy of the file
[Figure: the Ubuntu 15.04 layer stack (C84bfc126a2 188 MB, D14bfc54ea1 194.5 KB, c80179960767 1.895 KB, 6d45a3841788 0 B, thin R/W layer) before and after a 2 B modification on 6d45a3841788: copy-up first, then modification]
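The search, copy-up, and modify steps can be sketched as a toy model in Python. This is an assumption-laden illustration, not how any storage driver is implemented: layers are plain dicts, and the `LayeredFS` class, its methods, and the sample paths are all hypothetical.

```python
class LayeredFS:
    """Toy model: read-only image layers under one thin writable layer."""

    def __init__(self, image_layers):
        # image_layers: list of dicts (path -> bytes), ordered top to bottom.
        # They are never modified by this class.
        self.image_layers = image_layers
        self.rw = {}        # the thin R/W container layer
        self.copy_ups = 0   # how many copy-up operations happened

    def _find(self, path):
        # Step 1: search the image layers top-down; the first hit wins.
        for layer in self.image_layers:
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def read(self, path):
        # The writable layer shadows everything below it.
        if path in self.rw:
            return self.rw[path]
        return self._find(path)

    def append(self, path, data):
        if path not in self.rw:
            # Step 2: copy-up, only the first time the file is modified.
            self.rw[path] = self._find(path)
            self.copy_ups += 1
        # Step 3: modify the copy in the writable layer.
        self.rw[path] += data


image = [{"/etc/motd": b"hi"}, {"/etc/motd": b"old", "/bin/sh": b"ELF"}]
fs = LayeredFS(image)
fs.append("/etc/motd", b"!!")
fs.append("/etc/motd", b"?")
```

After the two appends, the container sees the modified file, the image layers are untouched, and only one copy-up has taken place, since later modifications act on the copy already in the writable layer.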
Base tech of container(CGroups)
Developed by Rohit Seth in 2006 under the name “Process Containers”
A kernel capability to limit, account for (meter), and isolate resources:
CPU, memory, disk I/O, network
Cgroup controllers
• Memory controller
• CPU set controller
• CPU accounting controller
• CPU scheduler controller
• Devices controller
• I/O controller for block devices
• Freezer
• Network class controller
Goal: reduce resource contention and increase predictability in performance
Controller   Description
memory       Sets limits on RAM and resource usage, and reports the cumulative usage of all processes in the group
cpuset       Binds the processes in a group to a set of CPUs and controls migration between CPUs
cpuacct      Reports CPU usage for a group of processes
cpu          Controls the prioritization of processes in the group
devices      Access control lists on character and block devices
Base tech of container(CGroups)
Cgroups (control groups)
• A ‘cgroup’ associates a set of tasks with a set of parameters for one or more subsystems.
• A ‘subsystem’ is a module that uses the task-grouping facilities provided by cgroups to treat groups of tasks in particular ways. It is typically a “resource controller” that schedules a resource and applies per-cgroup limits.
• A ‘hierarchy’ is a set of cgroups arranged in a tree, such that every task in the system is in exactly one of the cgroups in the hierarchy, together with a set of subsystems; each subsystem has system-specific state attached to each cgroup in the hierarchy. Each hierarchy has an instance of the cgroup virtual filesystem associated with it.
Cgroup subsystems
- Isolation and special controls: cpuset, namespace, freezer, device, checkpoint/restart
- Resource control: cpu (scheduler), memory, disk I/O, network
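To make the task / subsystem / hierarchy vocabulary concrete: on Linux, `/proc/self/cgroup` lists one line per hierarchy in the form `hierarchy-ID:controller-list:cgroup-path`. The sketch below parses that format in Python. The `CgroupEntry` type, the `parse_proc_cgroup` helper, and the sample text (including the `/docker/abc123` path) are hypothetical and only for illustration.

```python
from typing import List, NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: List[str]   # subsystems attached to this hierarchy
    path: str                # the cgroup this task belongs to in the hierarchy


def parse_proc_cgroup(text: str) -> List[CgroupEntry]:
    """Parse text in the 'hierarchy-ID:controller-list:cgroup-path' format."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first two colons only; the path itself may contain colons.
        hierarchy_id, controllers, path = line.split(":", 2)
        names = [c for c in controllers.split(",") if c]
        entries.append(CgroupEntry(int(hierarchy_id), names, path))
    return entries


# Hypothetical sample, shaped like what a containerized process might see.
SAMPLE = """\
4:memory:/docker/abc123
3:cpu,cpuacct:/docker/abc123
2:cpuset:/
"""

entries = parse_proc_cgroup(SAMPLE)
```

Each parsed entry shows the rule stated above: within one hierarchy the task sits in exactly one cgroup (one path), and one hierarchy can carry several subsystems (here `cpu` and `cpuacct`). On a real Linux host the same function could be fed `open("/proc/self/cgroup").read()`.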
Base tech of container(Namespace)
Namespaces handle the six items in the table below
Namespace   What it isolates
PID         Processes (process IDs)
NET         Network interfaces, iptables, routing tables, sockets
MNT         Root file system (mount points)
UTS         Hostname
IPC         Inter-process communication
USER        UIDs/GIDs (security improvement)
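One way to see which namespaces two processes share: on Linux, each entry under `/proc/<pid>/ns/` is a symbolic link whose target looks like `pid:[4026531836]`, and two processes are in the same namespace when those targets match. The Python sketch below compares such link targets. The helper names and all inode numbers are made up for illustration; on a real host the targets would come from `os.readlink`.

```python
import re
from typing import Dict, List, Tuple

# Link targets look like "net:[4026531992]": namespace type, then inode number.
NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target: str) -> Tuple[str, int]:
    """Split a namespace link target into (type, inode number)."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link target: {target!r}")
    return match.group("kind"), int(match.group("inode"))


def shared_namespaces(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Return the namespace names whose link targets are identical in a and b."""
    return sorted(
        name
        for name in a
        if name in b and parse_ns_link(a[name]) == parse_ns_link(b[name])
    )


# Hypothetical link targets for a host process and a containerized process.
host_proc = {
    "pid": "pid:[4026531836]",
    "net": "net:[4026531992]",
    "uts": "uts:[4026531838]",
}
container_proc = {
    "pid": "pid:[4026532201]",
    "net": "net:[4026532204]",
    "uts": "uts:[4026531838]",
}

shared = shared_namespaces(host_proc, container_proc)
```

In this made-up sample the container has its own PID and NET namespaces but still shares the host's UTS namespace, so it would see the host's hostname.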
Base tech of container(Summarize)
Why do we need cgroups?
• SLA management: reduce resource contention and increase predictability in performance
• Large virtual consolidation: prevent a single virtual machine, or a group of them, from monopolizing resources or impacting other environments
Cgroups: limit how much of a resource can be used
Namespaces: limit which resources can be seen; they give processes their own view of the system
[Figure: Docker on top of libcontainer, which builds on namespaces and cgroups]
Base tech of container(COW)
Copy-on-write (COW): everyone shares a single copy of the same data until
it is overwritten, and only then is a copy made.
Docker uses COW, which essentially means that every
container started from your Docker image uses the same files until
one of them needs to change a file.
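The sharing behavior can be mimicked with Python's standard `collections.ChainMap`, which looks keys up through a chain of dicts but sends writes only to the first one. This is an analogy rather than Docker's mechanism, and the paths and values are invented for the example.

```python
from collections import ChainMap

# One shared image, treated as read-only by convention.
image = {"/etc/hostname": "base", "/bin/app": "v1"}

# Each container gets its own empty writable dict in front of the shared image.
container_a = ChainMap({}, image)
container_b = ChainMap({}, image)

# A write lands in container_a's own first dict; the image is not touched.
container_a["/etc/hostname"] = "a"

a_sees = container_a["/etc/hostname"]   # container_a sees its private copy
b_sees = container_b["/etc/hostname"]   # container_b still sees the shared file
```

Until the write, both containers read the very same data; afterwards only the container that changed the file holds a private version.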
K8S terms
Replication Controllers
• Dynamically manage (create, kill, etc.) the lifecycle of pods
• Scaling up/down, rolling updates
Services
• an abstraction
• a REST object
• a logical set of pods and a policy for accessing them
Pods
• a collocated group of Docker containers with shared volumes
• pods are mortal: each one is born and dies
• the deployable unit: created, scheduled, and managed
Clusters
• a pool of Kubernetes resources
[Figure: a cluster of servers running pods of containers, grouped into services, with an iptables rule in front of the containers]
K8S terms (routing mode of service traffic)
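mode: userspace
[Figure: kube-proxy watches the master; an iptables rule redirects service traffic to kube-proxy, which forwards it to one of the endpoints (pods)]
mode: iptables
[Figure: kube-proxy watches the master; an iptables rule redirects service traffic directly to one of the endpoints (pods)]
• Fast
• Reliable
But,
• No retry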
How K8S works
Kubernetes Master
• API server: validates and configures pods, services, and replication controllers (REST operations)
• etcd: where the master's status is stored
• Scheduler: schedules pods to worker nodes
• Kubernetes controller manager
Worker Node (server)
• kubelet
• kube-proxy
Container manifest: YAML (description of a pod)
[Figure: master and worker nodes with services and pods; arrows labeled with ports 8080 and 4001 and "Synchronize pod status"]
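As an illustration of a container manifest, here is a minimal pod description built as a Python dict and serialized to JSON, which the API server accepts as well as YAML. The field names (`apiVersion`, `kind`, `metadata`, `spec.containers`) are the standard ones; the pod name `web`, the label, and the `nginx` image are placeholders chosen for the example.

```python
import json

# Minimal pod manifest; every value below is a placeholder.
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "web",
        "labels": {"app": "web"},   # labels are what services select pods by
    },
    "spec": {
        "containers": [
            {
                "name": "web",
                "image": "nginx",
                "ports": [{"containerPort": 80}],
            }
        ]
    },
}

# Serialized form, suitable for saving to a file such as pod.json.
manifest_json = json.dumps(pod_manifest, indent=2)
```

The same structure written as YAML is what the slide refers to; either form describes one pod that the scheduler then places on a worker node.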
K8S Service Traffic Flows
[Figure: a request enters Service 1 (front-end) and flows on to Service 2 (…) and Service 3 (back-end). Each service has a kube-proxy and pods of containers managed by replication controllers (rc:3, rc:1, rc:2); skydns runs in the cluster.]
Cluster domain: 10.100.0.10 (Service_Cluster_IP_Range, virtual IP)
Cluster pool: 192.168.0.0/16
Then, what is Kube-proxy?
• Watches the Kubernetes master for added and removed objects:
  - Service
  - Endpoints
• Can do simple TCP and UDP stream forwarding
• Can do round-robin TCP and UDP forwarding
• Manages the service VIP
• Watches all services
• Updates iptables after the backends change
• Translates the service IP to a pod IP
[Figure: kube-proxy and iptables rules on Node #1 and Node #2 in front of pods and containers; the master's API server keeps cluster status and current configuration in the etcd cluster]
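The round-robin forwarding and the service IP to pod IP translation listed for kube-proxy can be sketched as a small lookup table in Python. This is a simplified model, not kube-proxy's code: the `ServiceTable` class is hypothetical, and the addresses are invented (picked from the service range and pod pool shown earlier).

```python
class ServiceTable:
    """Toy model: map a service virtual IP to its pod endpoints, round-robin."""

    def __init__(self):
        self._endpoints = {}   # service IP -> list of pod IPs
        self._next = {}        # service IP -> index of the next pod to use

    def set_endpoints(self, service_ip, pod_ips):
        # Called whenever the watched Service/Endpoints objects change.
        self._endpoints[service_ip] = list(pod_ips)
        self._next[service_ip] = 0

    def pick(self, service_ip):
        # Translate the service IP to one pod IP, rotating through the backends.
        pods = self._endpoints.get(service_ip)
        if not pods:
            raise LookupError(f"no endpoints for service {service_ip}")
        index = self._next[service_ip] % len(pods)
        self._next[service_ip] = index + 1
        return pods[index]


table = ServiceTable()
table.set_endpoints("10.100.0.20", ["192.168.1.5", "192.168.2.7"])
picks = [table.pick("10.100.0.20") for _ in range(3)]

# The backends change: one pod is replaced, so the table is rewritten.
table.set_endpoints("10.100.0.20", ["192.168.3.9"])
after_update = table.pick("10.100.0.20")
```

Clients only ever address the stable service IP; which pod answers is decided per connection, and the mapping is refreshed when pods come and go.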
SkyDNS
SkyDNS in Kubernetes?
Kubernetes offers a DNS cluster addon, which most of the supported
environments enable by default.
SkyDNS is a DNS service with some custom logic that slaves it to the Kubernetes
API server.
When a service is created:
• a virtual IP address is assigned to the service
• a DNS name is mapped to the service
kubelet --v=5 --address=0.0.0.0 --port=10250 --hostname_override=105.144.47.24 \
  --api_servers=105.*.*.23:8080 --healthz_bind_address=0.0.0.0 --healthz_port=10248 \
  --network_plugin=calico --cluster-domain=cluster.local --cluster-dns=10.100.0.10 --logtostderr=true
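The name-to-service mapping can be illustrated with a short Python sketch. It assumes the common `<service>.<namespace>.svc.<cluster-domain>` naming scheme and reuses the `cluster.local` domain from the kubelet flags above; the `service_dns_name` helper, the `frontend` service, and its virtual IP are hypothetical.

```python
def service_dns_name(service, namespace="default", cluster_domain="cluster.local"):
    """Build the DNS name under which a service is assumed to be published."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


# Hypothetical record table: DNS name -> virtual IP assigned to the service.
records = {
    service_dns_name("frontend"): "10.100.0.20",
}

name = service_dns_name("frontend")
resolved_ip = records[name]
```

A pod that asks the cluster DNS server (10.100.0.10 in the flags above) for this name would get back the service's virtual IP rather than the address of any single pod.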
SkyDNS(cont..)
Components
• etcd in a pod: holds the DNS records
• SkyDNS in a pod: the DNS server
• kube2sky in a pod: bridges between Kubernetes and etcd
• Kubernetes (kubelet): keeps the pods running
• Kubernetes (master)
Flow
• Service info is pulled from the master (e.g. which services have changed?)
• The service info is published (written) into etcd
• SkyDNS is then able to retrieve the name of the service
• The kubelet presents itself as a DNS server
[Figure arrows: Retrieve, Search, Query, Update]
Editor's Notes
An ordered group of branches; each branch is a directory, and the branches are stored in directories on the host machine.
How many copy-ups happen when a file that is already in the thin R/W layer is modified again? None: copy-up happens only once per file.
When a container is deleted, any data written to the container that is not stored in a data volume is deleted along with the container.
A data volume (mounted directly into a container) is required to keep data persistently; data volumes are not controlled by the storage driver.