Kubernetes Walk-through
by Harry Zhang @resouer
Kubernetes
• Created by Google Borg/Omega team
• Founded and operated by CNCF (Linux Foundation)
• Container orchestration, scheduling and management
• One of the most popular open source projects in the world
Project State
Growing Contributors
• 1728+ authors
Architecture
kubelet
SyncLoop
controller-manager
ControlLoop
kubelet
SyncLoop
proxy
proxy
network
pod
replica
namespace
service
job
deployment
volume
petset
…
scheduler
Node
Node
Desired World
Real World
etcd
api-server
Example
kubelet
SyncLoop
kubelet
SyncLoop
proxy
proxy
1 Container created
etcd
scheduler
api-server
Example
kubelet
SyncLoop
kubelet
SyncLoop
proxy
proxy
2 Object added
etcd
scheduler
api-server
Example
kubelet
SyncLoop
kubelet
SyncLoop
proxy
proxy
3.1 New container detected
3.2 Bind container to a node
etcd
scheduler
api-server
Example
kubelet
SyncLoop
kubelet
SyncLoop
proxy
proxy
4.1 Detected bind operation
4.2 Start container on this machine
etcd
scheduler
api-server
Takeaways
• Independent control loops
• loosely coupled
• high performance
• easy to customize and extend
• “Watch” object changes
• Decide the next step based on state change
• level driven (state), not edge driven (event)
{Pod} = a group of containers
Co-scheduling
• Two containers:
• App: generate log files
• LogCollector: read and redirect logs to storage
• Request MEM:
• App: 1G
• LogCollector: 0.5G
• Available MEM:
• Node_A: 1.25G
• Node_B: 2G
• What happens if App is scheduled to Node_A first?
• Only 0.25G remains on Node_A, so LogCollector can never fit: the two containers must be placed as one unit
Pod
• Deeply coupled containers
• Atomic scheduling/placement unit
• Shared namespace
• network, IPC etc
• Shared volume
• Process group in container cloud
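The log-collector scenario above can be sketched as a single Pod. This is only a sketch: the image names are illustrative, and the memory figures follow the co-scheduling example.

```yaml
# Two deeply coupled containers in one Pod: they are scheduled
# atomically and share an emptyDir volume for the log files.
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-collector
spec:
  containers:
  - name: app
    image: my-app:latest              # illustrative image
    resources:
      requests:
        memory: "1Gi"
    volumeMounts:
    - name: logs
      mountPath: /var/log/app         # app writes logs here
  - name: log-collector
    image: my-log-collector:latest    # illustrative image
    resources:
      requests:
        memory: "512Mi"
    volumeMounts:
    - name: logs
      mountPath: /logs                # collector reads the same files
  volumes:
  - name: logs
    emptyDir: {}
```

Because both containers sit in one Pod, the scheduler must find 1.5G on a single node, which rules out Node_A in the earlier example.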
Why co-scheduling?
• It’s about using containers the right way:
• Lesson learned from Borg: “workloads tend to have tight relationships”
Ensure Container Order
• Decouple web server
and application
• war file container
• tomcat container
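One way to guarantee the war-file-before-tomcat order is an init container that copies the artifact into a shared volume before tomcat starts. A sketch with illustrative image names; note that around the time of this deck init containers were a beta annotation, while newer clusters expose them as the `spec.initContainers` field shown here:

```yaml
# The init container runs to completion before tomcat starts,
# so the war file is guaranteed to be in place.
apiVersion: v1
kind: Pod
metadata:
  name: javaweb
spec:
  initContainers:
  - name: war
    image: my-war-file:latest         # illustrative: contains sample.war
    command: ["cp", "/sample.war", "/app"]
    volumeMounts:
    - name: app-volume
      mountPath: /app
  containers:
  - name: tomcat
    image: tomcat:8
    ports:
    - containerPort: 8080
    volumeMounts:
    - name: app-volume
      mountPath: /usr/local/tomcat/webapps
  volumes:
  - name: app-volume
    emptyDir: {}
```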
Multiple Apps in One Container?
• Wrong!
Master Pod
kube-apiserver
kube-scheduler
controller-manager
• Logs cannot be seen
• No way to tell whether each process is running
• Day-to-day operations become difficult
• Failures are hard to locate: you don't know which process died, and you keep having to log into the container
Copy Files from One to Another?
• Wrong!
Master Pod
kube-apiserver
kube-scheduler
controller-manager
/etc/kubernetes/ssl
Connect to Peer Container thru IP?
• Wrong!
Master Pod
kube-apiserver
kube-scheduler
controller-manager
network namespace
So this is Pod
• Design pattern in container world
• decoupling
• reuse & refactoring
• Describe more real-world workloads by container
• e.g. ML
• Parameter server and trainer in same Pod
Kubernetes Control Plane
1. How does Kubernetes schedule
workloads?
Resource Model
• Compressible resources
• Hold no state
• Can be taken away very quickly
• “Merely” cause slowness when revoked
• e.g. CPU
• Non-compressible resources
• Hold state
• Are slower to be taken away
• Can fail to be revoked
• e.g. Memory, disk space
Kubernetes (and Docker) only handle CPU & memory
Memory bandwidth, disk time, cache, network
bandwidth, etc. are not handled (yet)
Resource Model
• Request: amount of a resource allowed
to be used, with a strong guarantee of
availability
• CPU (seconds/second), RAM (bytes)
• Scheduler will not over-commit requests
• Limit: max amount of a resource that
can be used, regardless of guarantees
• scheduler ignores limits
• Mapping to Docker
• --cpu-shares=requests.cpu
• --cpu-quota=limits.cpu
• --cpu-period=100ms
• --memory=limits.memory
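The request/limit mapping above is written directly in the container spec; a minimal sketch with illustrative values:

```yaml
# The scheduler places the Pod using requests; limits cap actual
# usage via cgroups (the Docker flags on the slide).
apiVersion: v1
kind: Pod
metadata:
  name: qos-demo
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "250m"      # maps to --cpu-shares
        memory: "64Mi"
      limits:
        cpu: "500m"      # maps to --cpu-quota (per 100ms period)
        memory: "128Mi"  # maps to --memory
```

Since limits != requests here, this Pod would land in the Burstable QoS tier described on the next slide.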
QoS Tiers and Eviction
• Guaranteed
• limits is set for all resources, all containers
• limits == requests (if set)
• only killed if they exceed their limits
• or if the system is under memory pressure and there are no lower-priority containers that can be killed
• Burstable
• requests is set for one or more resources, one or more containers
• limits (if set) != requests
• killed if they exceed their requests and no Best-Effort pods exist, when the system is under memory pressure
• Best-Effort
• requests and limits are not set for any resource, in any container
• First to get killed if the system runs out of memory
Scheduler
• Predicates
• NoDiskConflict
• NoVolumeZoneConflict
• PodFitsResources
• PodFitsHostPorts
• MatchNodeSelector
• MaxEBSVolumeCount
• MaxGCEPDVolumeCount
• CheckNodeMemoryPressure
• eviction, QoS tiers
• CheckNodeDiskPressure
• Priorities
• LeastRequestedPriority
• BalancedResourceAllocation
• SelectorSpreadPriority
• CalculateAntiAffinityPriority
• ImageLocalityPriority
• NodeAffinityPriority
• Design tips:
• watch and sync podQueue
• schedule based on cached info
• optimistically bind
• predicates are parallelized across
nodes
• priorities are parallelized across
scoring functions, Map-Reduce style
Multi-Scheduler
The 2nd scheduler
• Tips: annotation: system usage labels
• Do NOT abuse labels
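Selecting the second scheduler for a Pod looked roughly like this in the era of this deck; the alpha annotation below was later replaced by the `spec.schedulerName` field, so treat this as a historical sketch:

```yaml
# Ask "my-second-scheduler" (illustrative name) to place this Pod
# instead of the default scheduler.
apiVersion: v1
kind: Pod
metadata:
  name: pod-for-scheduler-2
  annotations:
    scheduler.alpha.kubernetes.io/name: my-second-scheduler
spec:
  containers:
  - name: app
    image: nginx
```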
2. Workload management?
Deployment
• Replicas with control
• Bring up a Replica Set and Pods.
• Check the status of a Deployment.
• Update that Deployment (e.g. new image, labels).
• Rollback to an earlier Deployment revision.
• Pause and resume a Deployment.
Create
• ReplicaSet
• Next generation of ReplicationController
• --record: records the command in the annotations of ‘nginx-deployment’
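A minimal Deployment of the kind discussed here might look like this; the `extensions/v1beta1` apiVersion matches this deck's era, while newer clusters use `apps/v1` and require an explicit selector:

```yaml
# nginx-deployment.yaml: three nginx replicas managed via a ReplicaSet.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
```

Created with `$ kubectl create -f nginx-deployment.yaml --record`, so the command is saved in the object's annotations.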
Check
• DESIRED: .spec.replicas
• CURRENT: .status.replicas
• UP-TO-DATE: contains the latest pod template
• AVAILABLE: pod status is ready (running)
Update
• kubectl set image
• will change container image
• kubectl edit
• open an editor and modify
your deployment yaml
• RollingUpdateStrategy
• 1 max unavailable
• 1 max surge
• can also be percentage
• Does not kill old Pods until a sufficient
number of new Pods have come up
• Does not create new Pods until a
sufficient number of old Pods have
been killed.
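The behavior above corresponds to this fragment of the Deployment spec; the values are the "1 max unavailable / 1 max surge" defaults from the slide and may also be percentages such as "25%":

```yaml
# Partial Deployment spec: rolling update knobs only.
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # at most 1 Pod below desired count
      maxSurge: 1         # at most 1 Pod above desired count
```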
Update Process
• The update process is coordinated by Deployment
Controller
• Create: created Replica Set (nginx-deployment-2035384211) and scaled it up to 3 replicas directly.
• Update:
• created a new Replica Set (nginx-deployment-1564180365) and scaled it up to 1
• scaled down the old Replica Set to 2
• continued scaling up and down the new and the old Replica Set, with the same rolling update
strategy.
• Finally, 3 available replicas in the new Replica Set, and the old Replica Set is scaled down to 0.
Rolling Back
• Check revisions
• Roll back to a revision
Pausing & Resuming
(Canary)
• Tips
• blue-green deployment: duplicated infrastructure
• canary release: share same infrastructure
• rolling back a resumed deployment is WIP
• old way: kubectl rolling-update rc-1 rc-2
3. Deploy Daemon workload to
every Node?
DaemonSet
• Spread daemon pod to every node
• DaemonSet Controller
• bypass default scheduler
• even on unschedulable nodes
• e.g. bootstrap
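A DaemonSet sketch; the log-agent name and image are illustrative, and newer clusters use `apps/v1` with a required selector:

```yaml
# One copy of this Pod runs on every node; the DaemonSet Controller
# places it directly, bypassing the default scheduler.
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: log-agent
spec:
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      containers:
      - name: agent
        image: fluentd:latest     # illustrative image
```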
4. Automatically scale?
Horizontal Pod Autoscaling
• Tips
• Scale out/in
• TriggeredScaleUp (GCE, AWS, will add more)
• Support for custom metrics
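A CPU-based autoscaler for an nginx Deployment might be declared like this; the thresholds and replica bounds are illustrative:

```yaml
# Scale nginx-deployment between 2 and 10 replicas, targeting
# 80% average CPU utilization.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80
```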
Custom Metrics
• Endpoint (Location to collect metrics from)
• Name of metric
• Type (Counter, Gauge, ...)
• Data Type (int, float)
• Units (kbps, seconds, count)
• Polling Frequency
• Regexps (Regular expressions to specify
which metrics to collect and how to parse
them)
• The metric definitions are added to the pod as a ConfigMap volume
Prometheus
Nginx
5. Pass information to workloads?
ConfigMap
• Decouple configuration from image
• configuration is a runtime attribute
• Can be consumed by pods thru:
• env
• volumes
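Both consumption styles can be sketched in one manifest; the names and keys are illustrative:

```yaml
# A ConfigMap consumed both as an env var and as a mounted volume.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  log.level: debug
---
apiVersion: v1
kind: Pod
metadata:
  name: configured-app
spec:
  containers:
  - name: app
    image: nginx
    env:
    - name: LOG_LEVEL              # style 1: env
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: log.level
    volumeMounts:
    - name: config                 # style 2: volume
      mountPath: /etc/config
  volumes:
  - name: config
    configMap:
      name: app-config
```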
ConfigMap Volume
• No need to use Persistent Volume
• Think about Etcd
Secret
• Tip: credentials for
accessing the k8s API are
automatically added to
your pods as a secret
6. Read information from system
itself?
Downward API
• Get these inside your pod as
ENV or volume
• The pod’s name
• The pod’s namespace
• The pod’s IP
• A container’s cpu limit
• A container’s cpu request
• A container’s memory limit
• A container’s memory request
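A sketch of exposing some of these fields as environment variables via the downward API:

```yaml
# Pod metadata and status fields injected as env vars.
apiVersion: v1
kind: Pod
metadata:
  name: downward-demo
spec:
  containers:
  - name: app
    image: nginx
    env:
    - name: POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    - name: POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
```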
7. Service discovery?
Service
• The unified portal of replica Pods
• Portal IP:Port
• External load balancer
• GCE
• AWS
• HAproxy
• Nginx
• OpenStack LB
Service Implementation
Tip: the IPVS solution works in NAT mode, which behaves the same as this iptables approach
$ iptables-save | grep my-service
-A KUBE-SERVICES -d 10.0.0.116/32 -p tcp -m comment --comment "default/my-service: cluster IP" -m tcp --dport 8001 -j KUBE-SVC-KEAUNL7HVWWSEZA6
-A KUBE-SVC-KEAUNL7HVWWSEZA6 -m comment --comment "default/my-service:" -m statistic --mode random --probability 0.5 -j KUBE-SEP-6XXFWO3KTRMPKCHZ
-A KUBE-SVC-KEAUNL7HVWWSEZA6 -m comment --comment "default/my-service:" -j KUBE-SEP-57KPRZ3JQVENLNBRZ
-A KUBE-SEP-6XXFWO3KTRMPKCHZ -p tcp -m comment --comment "default/my-service:" -m tcp -j DNAT --to-destination 172.17.0.2:80
-A KUBE-SEP-57KPRZ3JQVENLNBRZ -p tcp -m comment --comment "default/my-service:" -m tcp -j DNAT --to-destination 172.17.0.3:80
Publishing Services
• Use Service.Type=NodePort
• <node_ip>:<node_port>
• External IP
• IPs route to one or more cluster nodes (e.g. floating IP)
• Use external LoadBalancer
• Require support from IaaS (GCE, AWS, OpenStack)
• Deploy a service-loadbalancer (e.g. HAproxy)
• Official guide: https://github.com/kubernetes/contrib/tree/master/service-loadbalancer
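A NodePort Service sketch; port 8001 matches the iptables example earlier, while the nodePort value is illustrative and must fall within the cluster's node-port range (30000-32767 by default):

```yaml
# Reachable inside the cluster at <cluster_ip>:8001 and from
# outside at <node_ip>:30080 on every node.
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
  - port: 8001
    targetPort: 80
    nodePort: 30080
```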
Ingress
• The next generation external Service load
balancer
• Deployed as a Pod on dedicated Node
(with external network)
• Implementation
• Nginx, HAproxy, GCE L7
• External access for service
• SSL support for service
• …
http://foo.bar.com → <IP_of_Ingress_node>
http://foo.bar.com/foo → backend service s1
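An Ingress of this era (`extensions/v1beta1`) routing the example host and path to service s1 might be sketched as follows; the servicePort is illustrative:

```yaml
# Route http://foo.bar.com/foo to backend service s1.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: foo-bar-ingress
spec:
  rules:
  - host: foo.bar.com
    http:
      paths:
      - path: /foo
        backend:
          serviceName: s1
          servicePort: 80
```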
Headless Service
*.nginx.default.svc.cluster.local
app=nginx app=nginx app=nginx
also: subdomain
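A headless Service is simply a Service with clusterIP set to None; a sketch for the nginx pods above, under which DNS returns the Pod IPs directly instead of a virtual IP:

```yaml
# Headless: no cluster IP is allocated; *.nginx.default.svc.cluster.local
# resolves to the individual Pod IPs.
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  clusterIP: None
  selector:
    app: nginx
  ports:
  - port: 80
```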
8. Stateful applications?
StatefulSet: “clustered applications”
• Ordinal index
• startup/teardown ordering
• Stable hostname
• Stable storage
• linked to the ordinal & hostname
• Databases like MySQL or PostgreSQL
• single instance attached to a persistent volume at any time
• Clustered software like Zookeeper, Etcd, or Elasticsearch, Cassandra
• stable membership.
Updating a StatefulSet:
• Scale: create/delete Pods one by one, in order
• Scale in: old persistent volumes are not deleted
StatefulSet
StatefulSet Example
cassandra-0 cassandra-1
volume 0 volume 1
cassandra-0.cassandra.default.svc.cluster.local
cassandra-1.cassandra.default.svc.cluster.local
$ kubectl patch petset cassandra -p '{"spec":{"replicas":10}}'
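A StatefulSet sketch for the cassandra example, in the `apps/v1beta1` form of this deck's era (the image tag and storage size are illustrative); `volumeClaimTemplates` is what links one stable volume to each ordinal:

```yaml
# Pods come up as cassandra-0, cassandra-1, ... each with its own
# PersistentVolumeClaim (data-cassandra-0, data-cassandra-1, ...).
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: cassandra
spec:
  serviceName: cassandra        # headless Service providing stable hostnames
  replicas: 2
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      containers:
      - name: cassandra
        image: cassandra:3      # illustrative tag
        volumeMounts:
        - name: data
          mountPath: /var/lib/cassandra
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```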
9. Container network?
One Pod One IP
• Network sharing is important for affiliated
containers
• Not all containers need an independent
network
• The network implementation for a pod is
exactly the same as for a single container
Pod
Infra
container
Container A Container B
--net=container:pause
/proc/{pid}/ns/net -> net:[4026532483]
Kubernetes uses CNI
• CNI plugin
• e.g. Calico, Flannel etc
• The kubelet cni flags:
• --network-plugin=cni
• --network-plugin-dir=/etc/cni/net.d
• CNI is very simple
1.Kubelet creates a network namespace for Pod
2.Kubelet invokes CNI plugin to configure the NS (interface
name, IP, MAC, gateway, bridge name …)
3.Infra container in Pod join this network namespace
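The plugin and its parameters come from a config file in the directory given by --network-plugin-dir; a minimal bridge example (CNI configs are JSON, and all field values here are illustrative):

```yaml
# /etc/cni/net.d/10-mynet.conf
{
  "cniVersion": "0.2.0",
  "name": "mynet",
  "type": "bridge",
  "bridge": "cni0",
  "ipam": {
    "type": "host-local",
    "subnet": "10.244.0.0/16"
  }
}
```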
Tips
• host < calico(bgp) < calico(ipip) = flannel(vxlan) = docker(vxlan) < flannel(udp) < weave(udp)
• Test graph comes from: http://cmgs.me/life/docker-network-cloud
• Calico: pure layer-3 solution
• Flannel: VxLAN or UDP channel
• Weave: VxLAN or UDP channel
• Docker Overlay Network: VxLAN
Calico
• Step 1: Run calico-node image as DaemonSet
Calico
• Step 2: Download and enable calico cni plugin
Calico
• Step 3: Add calico network controller
• Done!
10. Persistent volume?
Persistent Volumes
• -v host_path:container_path
1. Attach networked storage to the host path
• i.e. mounted at host_path
2. Mount the host path as a container volume
• i.e. bind mount container_path to host_path
3. Independent volume control loop
Officially Supported PVs
• GCEPersistentDisk
• AWSElasticBlockStore
• AzureFile
• FC (Fibre Channel)
• NFS
• iSCSI
• RBD (Ceph Block Device)
• CephFS
• Cinder (OpenStack block storage)
• Glusterfs
• VsphereVolume
• HostPath (single node testing only)
• 20+ supported in total
• Write your own volume plugin: FlexVolume
1. Implement 10 methods
2. Put binary/shell in plugin directory
• example: LVM as k8s volume
Production ENV Volume Model
Persistent Volumes
PersistentVolumeClaims Pod
Host
path
networked
storage
Pod Pod
mountPath mountPath
Key point: separation of responsibilities
PV & PVC
• System Admin:
• $ kubectl create -f nfs-pv.yaml
• create a volume with access mode, capacity, recycling mode
• Dev:
• $ kubectl create -f pv-claim.yaml
• request a volume with access mode, resource, selector
• $ kubectl create -f pod.yaml
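The two halves of that workflow, sketched; the NFS server address, export path, and sizes are illustrative:

```yaml
# nfs-pv.yaml (admin side): provision the volume itself.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    server: 10.0.0.5        # illustrative NFS server
    path: /exports          # illustrative export path
---
# pv-claim.yaml (dev side): request storage without knowing the backend.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-claim
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
```

The Pod then references the claim via `volumes[].persistentVolumeClaim.claimName: nfs-claim`, never the NFS details themselves.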
More …
• GC
• Health check
• Container lifecycle hook
• Jobs (batch)
• Pod affinity and binding
• Dynamic provisioning
• Rescheduling
• CronJob
• Logging and monitoring
• Network policy
• Federation
• Container capabilities
• Resource quotas
• Security context
• Security polices
• GPU scheduling
Summary
• Q: Where do all these control plane ideas come from?
• A: Kubernetes = “Borg” + “Container”
• Kubernetes is a methodology for using containers, based on 10+
years of experience inside Google
• “No need to cross the river by feeling for the stones” (no need to reinvent by trial and error)
• Kubernetes is a container-centric DevOps/workload orchestration system
• Not a “CI/CD”- or “micro-service”-focused container cloud
Growing Adopters
• Public Cloud
• AWS
• Microsoft Azure (acquired Deis)
• Google Cloud
• Tencent Cloud
• Baidu AI
• Alibaba Cloud
Enterprise Users
THE END
@resouer
harryzhang@zju.edu.cn

More Related Content

What's hot

What's hot (20)

Monitoring, Logging and Tracing on Kubernetes
Monitoring, Logging and Tracing on KubernetesMonitoring, Logging and Tracing on Kubernetes
Monitoring, Logging and Tracing on Kubernetes
 
Kubernetes Basic Operation
Kubernetes Basic OperationKubernetes Basic Operation
Kubernetes Basic Operation
 
kubernetes for beginners
kubernetes for beginnerskubernetes for beginners
kubernetes for beginners
 
Scaling Docker Containers using Kubernetes and Azure Container Service
Scaling Docker Containers using Kubernetes and Azure Container ServiceScaling Docker Containers using Kubernetes and Azure Container Service
Scaling Docker Containers using Kubernetes and Azure Container Service
 
Docker and Kubernetes 101 workshop
Docker and Kubernetes 101 workshopDocker and Kubernetes 101 workshop
Docker and Kubernetes 101 workshop
 
Scaling Microservices with Kubernetes
Scaling Microservices with KubernetesScaling Microservices with Kubernetes
Scaling Microservices with Kubernetes
 
Kubernetes introduction
Kubernetes introductionKubernetes introduction
Kubernetes introduction
 
Kubernetes Basics
Kubernetes BasicsKubernetes Basics
Kubernetes Basics
 
Kubernetes automation in production
Kubernetes automation in productionKubernetes automation in production
Kubernetes automation in production
 
Kubernetes 101 for Beginners
Kubernetes 101 for BeginnersKubernetes 101 for Beginners
Kubernetes 101 for Beginners
 
Intro to Kubernetes
Intro to KubernetesIntro to Kubernetes
Intro to Kubernetes
 
Container orchestration
Container orchestrationContainer orchestration
Container orchestration
 
Using Docker with OpenStack - Hands On!
 Using Docker with OpenStack - Hands On! Using Docker with OpenStack - Hands On!
Using Docker with OpenStack - Hands On!
 
Docker and kubernetes
Docker and kubernetesDocker and kubernetes
Docker and kubernetes
 
Planes, Raft, and Pods: A Tour of Distributed Systems Within Kubernetes
Planes, Raft, and Pods: A Tour of Distributed Systems Within KubernetesPlanes, Raft, and Pods: A Tour of Distributed Systems Within Kubernetes
Planes, Raft, and Pods: A Tour of Distributed Systems Within Kubernetes
 
Kubernetes 101
Kubernetes 101Kubernetes 101
Kubernetes 101
 
Microservices , Docker , CI/CD , Kubernetes Seminar - Sri Lanka
Microservices , Docker , CI/CD , Kubernetes Seminar - Sri Lanka Microservices , Docker , CI/CD , Kubernetes Seminar - Sri Lanka
Microservices , Docker , CI/CD , Kubernetes Seminar - Sri Lanka
 
Kubernetes Basis: Pods, Deployments, and Services
Kubernetes Basis: Pods, Deployments, and ServicesKubernetes Basis: Pods, Deployments, and Services
Kubernetes Basis: Pods, Deployments, and Services
 
[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus
[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus
[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus
 
Kubernetes - introduction
Kubernetes - introductionKubernetes - introduction
Kubernetes - introduction
 

Similar to Kubernetes Walk Through from Technical View

Similar to Kubernetes Walk Through from Technical View (20)

Kubernetes Internals
Kubernetes InternalsKubernetes Internals
Kubernetes Internals
 
LISA2017 Kubernetes: Hit the Ground Running
LISA2017 Kubernetes: Hit the Ground RunningLISA2017 Kubernetes: Hit the Ground Running
LISA2017 Kubernetes: Hit the Ground Running
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
An Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & KubernetesAn Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & Kubernetes
 
Kubernetes fundamentals
Kubernetes fundamentalsKubernetes fundamentals
Kubernetes fundamentals
 
CKA_1st.pptx
CKA_1st.pptxCKA_1st.pptx
CKA_1st.pptx
 
Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018
 
Docker kubernetes fundamental(pod_service)_190307
Docker kubernetes fundamental(pod_service)_190307Docker kubernetes fundamental(pod_service)_190307
Docker kubernetes fundamental(pod_service)_190307
 
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
 
Kubernetes Problem-Solving
Kubernetes Problem-SolvingKubernetes Problem-Solving
Kubernetes Problem-Solving
 
Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013
 
Kubernetes intro public - kubernetes meetup 4-21-2015
Kubernetes intro   public - kubernetes meetup 4-21-2015Kubernetes intro   public - kubernetes meetup 4-21-2015
Kubernetes intro public - kubernetes meetup 4-21-2015
 
Kubernetes intro public - kubernetes user group 4-21-2015
Kubernetes intro   public - kubernetes user group 4-21-2015Kubernetes intro   public - kubernetes user group 4-21-2015
Kubernetes intro public - kubernetes user group 4-21-2015
 
Toward 10,000 Containers on OpenStack
Toward 10,000 Containers on OpenStackToward 10,000 Containers on OpenStack
Toward 10,000 Containers on OpenStack
 
Kubernetes #1 intro
Kubernetes #1   introKubernetes #1   intro
Kubernetes #1 intro
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetes
 
Google Kubernetes Engine Deep Dive Meetup
Google Kubernetes Engine Deep Dive MeetupGoogle Kubernetes Engine Deep Dive Meetup
Google Kubernetes Engine Deep Dive Meetup
 
141204 upload
141204 upload141204 upload
141204 upload
 
Kubernetes presentation
Kubernetes presentationKubernetes presentation
Kubernetes presentation
 
Ippevent : openshift Introduction
Ippevent : openshift IntroductionIppevent : openshift Introduction
Ippevent : openshift Introduction
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Recently uploaded (20)

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Kubernetes Walk Through from Technical View

  • 2. Kubernetes • Created by Google Borg/Omega team • Founded and operated by CNCF (Linux Foundation) • Container orchestration, scheduling and management • One of the most popular open source project in the world
  • 8. Example kubelet SyncLoop kubelet SyncLoop proxy proxy 3.1 New container detected 3.2 Bind container to a node etcd scheduler api-server
  • 9. Example kubelet SyncLoop kubelet SyncLoop proxy proxy 4.1 Detected bind operation 4.2 Start container on this machine etcd scheduler api-server
  • 10. Take Aways • Independent control loops • loosely coupled • high performance • easy to customize and extend • “Watch” object change • Decide next step based on state change • not edge driven (event), level driven (state)
  • 11. {Pod} = a group of containers
  • 12. Co-scheduling • Tow containers: • App: generate log files • LogCollector: read and redirect logs to storage • Request MEM: • App: 1G • LogCollector: 0.5G • Available MEM: • Node_A: 1.25G • Node_B: 2G • What happens if App is scheduled to Node_A first?
  • 13. Pod • Deeply coupled containers • Atomic scheduling/placement unit • Shared namespace • network, IPC etc • Shared volume • Process group in container cloud
  • 14. Why co-scheduling? • It’s about using container in right way: • Lesson learnt from Borg: “workloads tend to have tight relationship”
  • 15. Ensure Container Order • Decouple web server and application • war file container • tomcat container
  • 16. • Wrong! Multiple Apps in One Container? Master Pod kube-apiserver kube-scheduler controller-manager ⽇日志看不不到 是否running没法判断 运维操作困难 出错定位麻烦,不不知道是哪个挂了了,频繁登陆容器器
  • 17. Copy Files from One to Another? • Wrong! Master Pod kube-apiserver kube-scheduler controller-manager /etc/kubernetes/ssl
  • 18. Connect to Peer Container thru IP? • Wrong! Master Pod kube-apiserver kube-scheduler controller-manager network namespace
  • 19. So this is Pod • Design pattern in container world • decoupling • reuse & refactoring • Describe more real-world workloads by container • e.g. ML • Parameter server and trainer in same Pod
  • 21. 1. How Kubernetes schedule workloads?
  • 22. Resource Model • Compressible resources •  Hold no state •  Can be taken away very quickly •  “Merely” cause slowness when revoked •  e.g. CPU • Non-compressible resources • Hold state • Are slower to be taken away • Can fail to be revoked • e.g. Memory, disk space Kubernetes (and Docker) can only handle CPU & Memory Don’t handle things like memory bandwidth, disk time, cache, network bandwidth, ... (yet)
  • 23. Resource Model • Request: amount of a resource allowed to be used, with a strong guarantee of availability •  CPU (seconds/second), RAM (bytes) •  Scheduler will not over-commit requests • Limit: max amount of a resource that can be used, regardless of guarantees • scheduler ignores limits • Mapping to Docker • —cpu-shares=requests.cpu • —cpu-quota=limits.cpu • —cpu-period=100ms • —memory=limits.memory
  • 24. QoS Tiers and Eviction • Guaranteed • limits is set for all resources, all containers • limits == requests (if set) • Be killed until they exceed their limits • or if the system is under memory pressure and there are no lower priority containers that can be killed. • Burstable • requests is set for one or more resources, one or more containers • limits (if set) != requests • killed once they exceed their requests and no Best-Effort pods exist when system under memory pressure • Best-Effort • requests and limits are not set for all of the resources, all containers • First to get killed if the system runs out of memory
  • 25. Scheduler • Predicates • NoDiskConflict • NoVolumeZoneConflict • PodFitsResources • PodFitsHostPorts • MatchNodeSelector • MaxEBSVolumeCount • MaxGCEPDVolumeCount • CheckNodeMemoryPressure • eviction, QoS tiers • CheckNodeDiskPressure • Priorities • LeastRequestedPriority • BalancedResourceAllocation • SelectorSpreadPriority • CalculateAntiAffinityPriority • ImageLocalityPriority • NodeAffinityPriority • Design tips: • watch and sync podQueue • schedule based on cached info • optimistically bind • predicates is paralleled between nodes • priorities are paralleled between functions in Map-Reduce way
  • 26. Multi-Scheduler The 2nd scheduler • Tips: annotation: system usage labels • Do NOT abuse labels
  • 28. Deployment • Replicas with control • Bring up a Replica Set and Pods. • Check the status of a Deployment. • Update that Deployment (e.g. new image, labels). • Rollback to an earlier Deployment revision. • Pause and resume a Deployment.
  • 29. Create • ReplicaSet • Next generation of ReplicaController • —record: record command in the annotation of ‘nginx-deployment’
  • 30. Check • DESIRED: .spec.replicas • CURRENT: .status.replicas • UP-TO-DATE: contains the latest pod template • AVAILABLE: pod status is ready (running)
  • 31. Update • kubectl set image • will change container image • kubectl edit • open an editor and modify your deployment yaml • RollingUpdateStrategy • 1 max unavailable • 1 max surge • can also be percentage • Does not kill old Pods until a sufficient number of new Pods have come up • Does not create new Pods until a sufficient number of old Pods have been killed. trigger
  • 32. Update Process • The update process is coordinated by Deployment Controller • Create: Replica Set (nginx-deployment-2035384211) and scaled it up to 3 replicas directly. • Update: • created a new Replica Set (nginx-deployment-1564180365) and scaled it up to 1 • scaled down the old Replica Set to 2 • continued scaling up and down the new and the old Replica Set, with the same rolling update strategy. • Finally, 3 available replicas in the new Replica Set, and the old Replica Set is scaled down to 0.
  • 33. Rolling Back • Check reversions • Roll back to reversion
  • 34. Pausing & Resuming (Canary) • Tips • blue-green deployment: duplicated infrastructure • canary release: share same infrastructure • rollback resumed deployment is WIP • old way: kubectl rolling-update rc-1 rc-2
  • 35. 3. Deploy Daemon workload to every Node?
  • 36. DaemonSet • Spread daemon pod to every node • DaemonSet Controller • bypass default scheduler • even on unschedulable nodes • e.g. bootstrap
  • 38. Horizontal Pod Autoscaling • Tips • Scale out/in • TriggeredScaleUp (GCE, AWS, will add more) • Support for custom metrics
  • 39. Custom Metrics • Endpoint (Location to collect metrics from) • Name of metric • Type (Counter, Gauge, ...) • Data Type (int, float) • Units (kbps, seconds, count) • Polling Frequency • Regexps (Regular expressions to specify which metrics to collect and how to parse them) • The metric will be added to pod as ConfigMap volume Prometheus Nginx
  • 40. 5. Pass information to workloads?
  • 41. ConfigMap • Decouple configuration from image • configuration is a runtime attribute • Can be consumed by pods thru: • env • volumes
  • 42. ConfigMap Volume • No need to use Persistent Volume • Think about Etcd
  • 43. Secret • Tip: credentials for accessing the k8s API is automatically added to your pods as secret
  • 44. 6. Read information from system itself?
  • 45. Downward Api • Get these inside your pod as ENV or volume • The pod’s name • The pod’s namespace • The pod’s IP • A container’s cpu limit • A container’s cpu request • A container’s memory limit • A container’s memory request
  • 47. Service • The unified portal of replica Pods • Portal IP:Port • External load balancer • GCE • AWS • HAproxy • Nginx • OpenStack LB
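A minimal Service sketch of that portal idea (the selector is an assumption; the 8001→80 mapping mirrors the iptables rules shown later in the deck):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app        # the replica Pods behind this portal
  ports:
  - port: 8001         # the portal (cluster IP) port
    targetPort: 80     # the container port on the Pods
```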
  • 48. Service Implementation Tip: the ipvs solution works in NAT mode, which is the same as this iptables approach
$ iptables-save | grep my-service
-A KUBE-SERVICES -d 10.0.0.116/32 -p tcp -m comment --comment "default/my-service: cluster IP" -m tcp --dport 8001 -j KUBE-SVC-KEAUNL7HVWWSEZA6
-A KUBE-SVC-KEAUNL7HVWWSEZA6 -m comment --comment "default/my-service:" -m statistic --mode random -j KUBE-SEP-6XXFWO3KTRMPKCHZ
-A KUBE-SVC-KEAUNL7HVWWSEZA6 -m comment --comment "default/my-service:" -m statistic --mode random -j KUBE-SEP-57KPRZ3JQVENLNBRZ
-A KUBE-SEP-6XXFWO3KTRMPKCHZ -p tcp -m comment --comment "default/my-service:" -m tcp -j DNAT --to-destination 172.17.0.2:80
-A KUBE-SEP-57KPRZ3JQVENLNBRZ -p tcp -m comment --comment "default/my-service:" -m tcp -j DNAT --to-destination 172.17.0.3:80
  • 49. Publishing Services • Use Service.Type=NodePort • <node_ip>:<node_port> • External IP • IPs route to one or more cluster nodes (e.g. floating IP) • Use external LoadBalancer • Require support from IaaS (GCE, AWS, OpenStack) • Deploy a service-loadbalancer (e.g. HAproxy) • Official guide: https://github.com/kubernetes/contrib/tree/master/service-loadbalancer
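A NodePort variant sketch of the Service above (port numbers are assumptions):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: NodePort
  selector:
    app: my-app
  ports:
  - port: 8001
    targetPort: 80
    nodePort: 30080    # reachable at <node_ip>:30080 on every cluster node
```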
  • 50. Ingress • The next generation external Service load balancer • Deployed as a Pod on a dedicated Node (with external network) • Implementation • Nginx, HAproxy, GCE L7 • External access for service • SSL support for service • … (diagram: http://foo.bar.com/foo → <IP_of_Ingress_node> → service s1)
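An Ingress sketch for the foo.bar.com/foo example on the slide, using the extensions/v1beta1 API of this deck's era (newer clusters use networking.k8s.io/v1 with a slightly different backend syntax):

```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: foo-ingress
spec:
  rules:
  - host: foo.bar.com       # requests for this host...
    http:
      paths:
      - path: /foo          # ...and this path...
        backend:
          serviceName: s1   # ...are routed to service s1
          servicePort: 80
```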
  • 53. StatefulSet: “clustered applications” • Ordinal index • startup/teardown ordering • Stable hostname • Stable storage • linked to the ordinal & hostname • Databases like MySQL or PostgreSQL • single instance attached to a persistent volume at any time • Clustered software like Zookeeper, Etcd, Elasticsearch, or Cassandra • stable membership • Update StatefulSet: scaling creates/deletes pods one by one, in order • scaling in does not delete the old persistent volumes
  • 54. StatefulSet StatefulSet Example cassandra-0 cassandra-1 volume 0 volume 1 cassandra-0.cassandra.default.svc.cluster.local cassandra-1.cassandra.default.svc.cluster.local $ kubectl patch petset cassandra -p '{"spec":{"replicas":10}}'
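A trimmed-down sketch of the headless Service plus StatefulSet behind this example (image, port, and storage size are assumptions):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: cassandra
spec:
  clusterIP: None          # headless: gives each pod its stable DNS name
  selector:
    app: cassandra
  ports:
  - port: 9042
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
spec:
  serviceName: cassandra   # pods become cassandra-0.cassandra..., cassandra-1...
  replicas: 2
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      containers:
      - name: cassandra
        image: cassandra:3.11
        volumeMounts:
        - name: data
          mountPath: /var/lib/cassandra
  volumeClaimTemplates:    # one PVC per ordinal: volume 0, volume 1, ...
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```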
  • 56. One Pod One IP • Network sharing is important for affiliated containers • Not all containers need an independent network • The network implementation for a pod is exactly the same as for a single container (diagram: Pod = infra container + Container A + Container B, joined via --net=container:pause, i.e. /proc/{pid}/ns/net -> net:[4026532483])
  • 57. Kubernetes uses CNI • CNI plugin • e.g. Calico, Flannel etc • The kubelet cni flags: • --network-plugin=cni • --network-plugin-dir=/etc/cni/net.d • CNI is very simple: 1. Kubelet creates a network namespace for the Pod 2. Kubelet invokes the CNI plugin to configure the NS (interface name, IP, MAC, gateway, bridge name …) 3. The infra container in the Pod joins this network namespace
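The plugin is selected by a network config file dropped into --network-plugin-dir. A minimal sketch using the standard bridge plugin (name, subnet, and bridge are assumptions):

```json
{
  "cniVersion": "0.3.1",
  "name": "mynet",
  "type": "bridge",
  "bridge": "cni0",
  "ipam": {
    "type": "host-local",
    "subnet": "10.244.1.0/24",
    "gateway": "10.244.1.1"
  }
}
```

Kubelet hands the pod's network namespace to this plugin, which creates the veth pair, attaches it to cni0, and asks the host-local IPAM for an address.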
  • 58. Tips • Benchmark: host < calico(bgp) < calico(ipip) = flannel(vxlan) = docker(vxlan) < flannel(udp) < weave(udp) • Test graph comes from: http://cmgs.me/life/docker-network-cloud • Network model: Calico — pure Layer-3 solution; Flannel — VxLAN or UDP channel; Weave — VxLAN or UDP channel; Docker Overlay Network — VxLAN
  • 59. Calico • Step 1: Run calico-node image as DaemonSet
  • 60. Calico • Step 2: Download and enable calico cni plugin
  • 61. Calico • Step 3: Add calico network controller • Done!
  • 63. Persistent Volumes • -v host_path:container_path • 1. Attach networked storage to the host (mounted at host_path) • 2. Mount the host path as the container volume (bind-mount container_path to host_path) • 3. Independent volume control loop
  • 64. Officially Supported PVs • GCEPersistentDisk • AWSElasticBlockStore • AzureFile • FC (Fibre Channel) • NFS • iSCSI • RBD (Ceph Block Device) • CephFS • Cinder (OpenStack block storage) • Glusterfs • VsphereVolume • HostPath (single node testing only) • more than 20 in total • Write your own volume plugin with FlexVolume: 1. Implement 10 methods 2. Put the binary/shell script in the plugin directory • example: LVM as a k8s volume
  • 65. Production ENV Volume Model (diagram: Pods → mountPath → PersistentVolumeClaims → PersistentVolumes → host path → networked storage) • Key point: separation of concerns
  • 66. PV & PVC • System Admin: • $ kubectl create -f nfs-pv.yaml • create a volume with access mode, capacity, recycling mode • Dev: • $ kubectl create -f pv-claim.yaml • request a volume with access mode, resource, selector • $ kubectl create -f pod.yaml
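A hedged sketch of the two YAML files above (the NFS server address, capacity, and access mode are assumptions):

```yaml
# nfs-pv.yaml — created by the system admin
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Recycle   # the recycling mode
  nfs:
    server: 10.0.0.5        # assumed NFS server
    path: /exports/data
---
# pv-claim.yaml — created by the dev; binds to a matching PV
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv-claim
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
```

The pod then mounts the claim, not the volume, via `volumes: [{persistentVolumeClaim: {claimName: pv-claim}}]` — which is exactly the separation of concerns between admin and dev.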
  • 67. More … • GC • Health check • Container lifecycle hook • Jobs (batch) • Pod affinity and binding • Dynamic provisioning • Rescheduling • CronJob • Logging and monitoring • Network policy • Federation • Container capabilities • Resource quotas • Security context • Security polices • GPU scheduling
  • 68. Summary • Q: Where do all these control plane ideas come from? • A: Kubernetes = “Borg” + “Container” • Kubernetes is a set of methodologies for using containers, based on the past 10+ years of experience inside Google • “No more crossing the river by feeling the stones” • Kubernetes is a container-centric DevOps/workload orchestration system • Not a “CI/CD”- or “micro-service”-focused container cloud
  • 69. Growing Adopters • Public Cloud • AWS • Microsoft Azure (acquired Deis) • Google Cloud • Tencent Cloud • Baidu AI • Alibaba Cloud • Enterprise Users