SlideShare a Scribd company logo
1 of 22
Download to read offline
How Self-Healing Nodes and Infrastructure
Management Impact Reliability
Oleg Chunikhin |CTO
Introductions
Oleg Chunikhin
CTO, Kublr
✔ 20+ years in software architecture & development
✔ Working w/ Kubernetes since its release in 2015
✔ CTO at Kublr—an enterprise ready container
management platform
✔ Twitter @olgch; @kublr
Like what you hear? Tweet at us!
Automation
Ingress
Custom
Clusters
Infrastructure
Logging Monitoring
Observability
API
Usage
Reporting
RBAC IAM
Air Gap TLS
Certificate
Rotation
Audit
Storage Networking
Container
Registry
CI / CD App Mgmt
Infrastructure
Container Runtime Kubernetes
OPERATIONS SECURITY &
GOVERNANCE
What’s Kublr?
@olgch, @kublr
Building a Reliable System with Kubernetes
• Day 1: trivial, K8S will restart my pod!
• Day 30: this is not so simple...
• What Kubernetes does, can, and doesn’t do
• Full stack reliability: tools and approaches
@olgch, @kublr
Kubernetes Cluster
K8S Architecture Refresher: Components
The Master, agent, etcd, API, overlay network, and DNS
Master
API Server
etcd data
controller
manager
scheduler etcd
kubectl
Worker
kubelet
container
runtime
overlay
network
cluster
DNS
kube-proxy
@olgch, @kublr
Cluster
K8S Architecture Refresher: API Objects
Nodes, pods, services, and persistent volumes
Node 1 Node 2
Pod A-1
10.0.0.3
Cnt1
Cnt2
Pod A-2
10.0.0.5
Cnt1
Cnt2
Pod B-1
10.0.0.8
Cnt3
SrvA
10.7.0.1
SrvB
10.7.0.3
Persistent
Volume
@olgch, @kublr
Reliability Tools: Pod Probes and Controllers
Probes
Liveness and readiness check
• TCP, HTTP(S), exec
Controllers
ReplicaSet
• Maintain specific number of identical replicas
Deployment
• ReplicaSet + update strategy, rolling update
StatefulSet
• Deployment + replica identity, persistent volume stability
DaemonSet
• Maintain identical replicas on each node (of a specified set)
Operators
@olgch, @kublr
• Resource framework
• Standard: CPU, memory, disk
• Custom: GPU, FPGA, etc.
• Requests and limits
• Kube and system reservations
• no swap
• Pod eviction and disruption budget (resource starving)
• Pod priority and preemption (critical pods)
• Affinity, anti-affinity, node selectors & matchers
Reliability Tools: Resource & Location Mgmt
@olgch, @kublr
Reliability Tools: Autoscaling
Horizontal pod autoscaler (HPA)
Vertical pod autoscaler (VPA)
• In-place updates - WIP (issue #5774)
Cluster Autoscaler
• Depends on infrastructure provider - uses node groups
AWS ASG, Azure ScaleSets, ...
• Supports AWS, Azure, GCE, GKE, Openstack, Alibaba Cloud
@olgch, @kublr
Applications/pods/containers
“Middleware”
• Operations: monitoring, log collection, alerting, etc.
• Lifecycle: CI/CD, SCM, binary repo, etc.
• Container management: registry, scanning, governance, etc.
Container Persistence: cloud native storage, DB, messaging
Container Orchestrator: Kubernetes
• “Essentials”: overlay network, DNS, autoscaler, etc.
• Core: K8S etcd, master, worker components
• Container engine: Docker, CRI-O, etc.
OS: kernel, network, devices, services, etc.
Infrastructure: “raw” compute, network, storage
Full Stack Reliability
@olgch, @kublr
Architecture 101
• Layers are separate and independent
• Disposable/restartable components
• Reattachable dependencies (including data)
• Persistent state is separate from disposable
processes
Pets vs cattle - only data is allowed to be pets
(ideally)
@olgch, @kublr
If a node has a problem...
• Try to fix it (pet)
• Replace or reset it (cattle)
Infrastructure and OS
Tools
• In-cluster: npd, weaveworks kured, ...
• hardware, kernel, servicer, container
runtime issues
• reboot
• Infrastructure provider automation
• AWS ASG, Azure Scale Set, ...
• External auto node recovery logic
• Custom + infrastructure provider API
• Cluster management solution
• (Future) cluster API
@olgch, @kublr
Components
• etcd
• Master: API server, controller
manager, scheduler
• Worker: kubelet, kube-proxy
• Container runtime: Docker,
CRI-O
Already 12 factor
Kubernetes Components: Auto-Recovery
Monitor liveliness, automate
restart
• Run as services
• Run as static pods
Dependencies to care about
• etcd data
• K8S keys and certificates
• Configuration
@olgch, @kublr
API Server
controller
manager
scheduler
API Server
controller
manager
scheduler
etcd data
etcd
etcd data
etcd
K8S multi-master
• Pros: HA, scaling
• Cons: need LB (server or client)
etcd cluster
• Pros: HA, data replication
• Cons: latency, ops complexity
etcd data
• Local ephemeral
• Local persistent (survives node failure)
• Remote persistent (survives node replacement)
Kubernetes Components: Multi-Master
API Server
etcd data
controller
manager
scheduler
etcd
kubectl
kubelet
@olgch, @kublr
• Persistent volumes
• Volume provisioning
• Storage categories
• Native block storage: AWS EBS, Azure Disk, vSphere volume,
attached block device, etc.
• HostPath
• Managed network storage: NFS, iSCSI, NAS/SAN, AWS EFS, etc.
• Some of the idiosyncrasies
• Topology sensitivity (e.g. AZ-local, host-local)
• Cloud provider limitations (e.g. number of attached disks)
• Kubernetes integration (e.g. provisioning and snapshots)
Container Persistence
@olgch, @kublr
• Integrates with Kubernetes
• CSI, FlexVolume, or native
• Volume provisioners
• Snapshots support
• Runs in cluster or externally
• Approach
• Flexible storage on top of backing storage
• Augmenting and extending backing storage
• Backing storage: local, managed, Kubernetes PV
• Examples: Rook/Ceph, Portworx, Nuvoloso, GlusterFS, Linstor,
OpenEBS, etc.
Cloud Native Storage
@olgch, @kublr
Data pool
mon
config
data
config
data
monmon
config
data
Cloud Native Storage: Rook/Ceph
raw data
osd
raw data
osd
raw data
mdsosd
Data pool
Image Image
Ceph
Filesystem
Components
Abstractions
Ceph
rgw
S3/Swift
Object Store
mgr
Rook
Operator
CSI plugins
osdosdganesha
NFS
CephCluster
Block Pool
Object Store
Filesystem
NFS
Object Store User
Provisioners
rbd-mirror
@olgch, @kublr
Operations: monitoring, log collection, alerting, etc.
Lifecycle: CI/CD, SCM, binary repo, etc.
Container management: registry, scanning, governance, etc.
Deployment options:
• Managed service
• In Kubernetes
• Deploy separately
Middleware
@olgch, @kublr
• Region to region; cloud to cloud; cloud to on-prem (hybrid)
• One cluster (⚠) vs cluster per location (✔)
Tasks
• Physical network connectivity: VPN, direct
• Overlay network connectivity: Calico BGP peering, Native routing
• Cross-cluster DNS: CoreDNS
• Cross-cluster deployment: K8S federation
• Cross-cluster ingress, load balancing: K8S federation, DNS, CDN
• Cross-cluster data replication
• native: e.g. AWS EBS, Snapshots inter-region transfer
• CNS level: e.g. Ceph geo-replication
• database level: e.g. Yugabyte geo-replication, sharding, ...
• application level
Something Missing? Multi-Site
@olgch, @kublr
To Recap...
• Kubernetes provides robust tools for application reliability
• Underlying infrastructure and Kubernetes components
recovery is responsibility of the cluster operator
• Kubernetes is just one of the layers
• Remember Architecture 101 and assess all layers accordingly
• Middleware, and even CNS, can run in Kubernetes and be
treated as regular applications to benefit from K8S capabilities
• Multi-site HA, balancing, failover is much easier with K8S and
the cloud native ecosystem. Still requires careful planning!
@olgch, @kublr
Q&A
@olgch, @kublr
Oleg Chunikhin
CTO | Kublr
oleg@kublr.com
Thank you!
Take Kublr for a test drive!
kublr.com/deploy
Free non-production license
@olgch, @kublr

More Related Content

What's hot

Application Portability with Kubernetes (k8)
Application Portability with Kubernetes (k8)Application Portability with Kubernetes (k8)
Application Portability with Kubernetes (k8)Kublr
 
Centralizing Kubernetes and Container Operations
Centralizing Kubernetes and Container OperationsCentralizing Kubernetes and Container Operations
Centralizing Kubernetes and Container OperationsKublr
 
Advanced Scheduling in Kubernetes
Advanced Scheduling in KubernetesAdvanced Scheduling in Kubernetes
Advanced Scheduling in KubernetesKublr
 
Orchestrating Microservices with Kubernetes
Orchestrating Microservices with Kubernetes Orchestrating Microservices with Kubernetes
Orchestrating Microservices with Kubernetes Weaveworks
 
AWS Summit Singapore 2019 | Autoscaling Your Kubernetes Workloads
AWS Summit Singapore 2019 | Autoscaling Your Kubernetes WorkloadsAWS Summit Singapore 2019 | Autoscaling Your Kubernetes Workloads
AWS Summit Singapore 2019 | Autoscaling Your Kubernetes WorkloadsAWS Summits
 
Effective Kubernetes - Is Kubernetes the new Linux? Is the new Application Se...
Effective Kubernetes - Is Kubernetes the new Linux? Is the new Application Se...Effective Kubernetes - Is Kubernetes the new Linux? Is the new Application Se...
Effective Kubernetes - Is Kubernetes the new Linux? Is the new Application Se...Wojciech Barczyński
 
Implement Advanced Scheduling Techniques in Kubernetes
Implement Advanced Scheduling Techniques in Kubernetes Implement Advanced Scheduling Techniques in Kubernetes
Implement Advanced Scheduling Techniques in Kubernetes Kublr
 
K8s Pod Scheduling - Deep Dive. By Tsahi Duek.
K8s Pod Scheduling - Deep Dive. By Tsahi Duek.K8s Pod Scheduling - Deep Dive. By Tsahi Duek.
K8s Pod Scheduling - Deep Dive. By Tsahi Duek.Cloud Native Day Tel Aviv
 
Effective Building your Platform with Kubernetes == Keep it Simple
Effective Building your Platform with Kubernetes == Keep it Simple Effective Building your Platform with Kubernetes == Keep it Simple
Effective Building your Platform with Kubernetes == Keep it Simple Wojciech Barczyński
 
MongoDB.local DC 2018: MongoDB Ops Manager + Kubernetes
MongoDB.local DC 2018: MongoDB Ops Manager + KubernetesMongoDB.local DC 2018: MongoDB Ops Manager + Kubernetes
MongoDB.local DC 2018: MongoDB Ops Manager + KubernetesMongoDB
 
Openstack days sv building highly available services using kubernetes (preso)
Openstack days sv   building highly available services using kubernetes (preso)Openstack days sv   building highly available services using kubernetes (preso)
Openstack days sv building highly available services using kubernetes (preso)Allan Naim
 
Managing kubernetes deployment with operators
Managing kubernetes deployment with operatorsManaging kubernetes deployment with operators
Managing kubernetes deployment with operatorsCloud Technology Experts
 
MongoDB.local Austin 2018: MongoDB Ops Manager + Kubernetes
MongoDB.local Austin 2018: MongoDB Ops Manager + KubernetesMongoDB.local Austin 2018: MongoDB Ops Manager + Kubernetes
MongoDB.local Austin 2018: MongoDB Ops Manager + KubernetesMongoDB
 
Social Connections 14 - Kubernetes Basics for Connections Admins
Social Connections 14 - Kubernetes Basics for Connections AdminsSocial Connections 14 - Kubernetes Basics for Connections Admins
Social Connections 14 - Kubernetes Basics for Connections Adminspanagenda
 
How to integrate Kubernetes in OpenStack: You need to know these project
How to integrate Kubernetes in OpenStack: You need to know these projectHow to integrate Kubernetes in OpenStack: You need to know these project
How to integrate Kubernetes in OpenStack: You need to know these projectinwin stack
 
Kubernetes Ingress 101
Kubernetes Ingress 101Kubernetes Ingress 101
Kubernetes Ingress 101Kublr
 
Kubernetes extensibility
Kubernetes extensibilityKubernetes extensibility
Kubernetes extensibilityDocker, Inc.
 

What's hot (20)

Application Portability with Kubernetes (k8)
Application Portability with Kubernetes (k8)Application Portability with Kubernetes (k8)
Application Portability with Kubernetes (k8)
 
Centralizing Kubernetes and Container Operations
Centralizing Kubernetes and Container OperationsCentralizing Kubernetes and Container Operations
Centralizing Kubernetes and Container Operations
 
Advanced Scheduling in Kubernetes
Advanced Scheduling in KubernetesAdvanced Scheduling in Kubernetes
Advanced Scheduling in Kubernetes
 
Orchestrating Microservices with Kubernetes
Orchestrating Microservices with Kubernetes Orchestrating Microservices with Kubernetes
Orchestrating Microservices with Kubernetes
 
AWS Summit Singapore 2019 | Autoscaling Your Kubernetes Workloads
AWS Summit Singapore 2019 | Autoscaling Your Kubernetes WorkloadsAWS Summit Singapore 2019 | Autoscaling Your Kubernetes Workloads
AWS Summit Singapore 2019 | Autoscaling Your Kubernetes Workloads
 
Effective Kubernetes - Is Kubernetes the new Linux? Is the new Application Se...
Effective Kubernetes - Is Kubernetes the new Linux? Is the new Application Se...Effective Kubernetes - Is Kubernetes the new Linux? Is the new Application Se...
Effective Kubernetes - Is Kubernetes the new Linux? Is the new Application Se...
 
Implement Advanced Scheduling Techniques in Kubernetes
Implement Advanced Scheduling Techniques in Kubernetes Implement Advanced Scheduling Techniques in Kubernetes
Implement Advanced Scheduling Techniques in Kubernetes
 
K8s Pod Scheduling - Deep Dive. By Tsahi Duek.
K8s Pod Scheduling - Deep Dive. By Tsahi Duek.K8s Pod Scheduling - Deep Dive. By Tsahi Duek.
K8s Pod Scheduling - Deep Dive. By Tsahi Duek.
 
Effective Building your Platform with Kubernetes == Keep it Simple
Effective Building your Platform with Kubernetes == Keep it Simple Effective Building your Platform with Kubernetes == Keep it Simple
Effective Building your Platform with Kubernetes == Keep it Simple
 
MongoDB.local DC 2018: MongoDB Ops Manager + Kubernetes
MongoDB.local DC 2018: MongoDB Ops Manager + KubernetesMongoDB.local DC 2018: MongoDB Ops Manager + Kubernetes
MongoDB.local DC 2018: MongoDB Ops Manager + Kubernetes
 
Kubernetes basics and hands on exercise
Kubernetes basics and hands on exerciseKubernetes basics and hands on exercise
Kubernetes basics and hands on exercise
 
Kubernetes debug like a pro
Kubernetes debug like a proKubernetes debug like a pro
Kubernetes debug like a pro
 
Openstack days sv building highly available services using kubernetes (preso)
Openstack days sv   building highly available services using kubernetes (preso)Openstack days sv   building highly available services using kubernetes (preso)
Openstack days sv building highly available services using kubernetes (preso)
 
Managing kubernetes deployment with operators
Managing kubernetes deployment with operatorsManaging kubernetes deployment with operators
Managing kubernetes deployment with operators
 
MongoDB.local Austin 2018: MongoDB Ops Manager + Kubernetes
MongoDB.local Austin 2018: MongoDB Ops Manager + KubernetesMongoDB.local Austin 2018: MongoDB Ops Manager + Kubernetes
MongoDB.local Austin 2018: MongoDB Ops Manager + Kubernetes
 
Social Connections 14 - Kubernetes Basics for Connections Admins
Social Connections 14 - Kubernetes Basics for Connections AdminsSocial Connections 14 - Kubernetes Basics for Connections Admins
Social Connections 14 - Kubernetes Basics for Connections Admins
 
From Code to Kubernetes
From Code to KubernetesFrom Code to Kubernetes
From Code to Kubernetes
 
How to integrate Kubernetes in OpenStack: You need to know these project
How to integrate Kubernetes in OpenStack: You need to know these projectHow to integrate Kubernetes in OpenStack: You need to know these project
How to integrate Kubernetes in OpenStack: You need to know these project
 
Kubernetes Ingress 101
Kubernetes Ingress 101Kubernetes Ingress 101
Kubernetes Ingress 101
 
Kubernetes extensibility
Kubernetes extensibilityKubernetes extensibility
Kubernetes extensibility
 

Similar to How Self-Healing Nodes and Infrastructure Management Impact Reliability

Hybrid architecture solutions with kubernetes and the cloud native stack
Hybrid architecture solutions with kubernetes and the cloud native stackHybrid architecture solutions with kubernetes and the cloud native stack
Hybrid architecture solutions with kubernetes and the cloud native stackKublr
 
Intro into Rook and Ceph on Kubernetes
Intro into Rook and Ceph on KubernetesIntro into Rook and Ceph on Kubernetes
Intro into Rook and Ceph on KubernetesKublr
 
DevOps in AWS with Kubernetes
DevOps in AWS with KubernetesDevOps in AWS with Kubernetes
DevOps in AWS with KubernetesOleg Chunikhin
 
The Evolution of your Kubernetes Cluster
The Evolution of your Kubernetes ClusterThe Evolution of your Kubernetes Cluster
The Evolution of your Kubernetes ClusterKublr
 
DevOpsDays Houston 2019 - Terry Shea - Centralizing Kubernetes Operations
DevOpsDays Houston 2019 - Terry Shea - Centralizing Kubernetes OperationsDevOpsDays Houston 2019 - Terry Shea - Centralizing Kubernetes Operations
DevOpsDays Houston 2019 - Terry Shea - Centralizing Kubernetes OperationsDevOpsDays Houston
 
Kubernetes – An open platform for container orchestration
Kubernetes – An open platform for container orchestrationKubernetes – An open platform for container orchestration
Kubernetes – An open platform for container orchestrationinovex GmbH
 
Monitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudMonitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudDatadog
 
The State of Kubernetes Security
The State of Kubernetes Security The State of Kubernetes Security
The State of Kubernetes Security Jimmy Mesta
 
KT ucloud storage, by Jaesuk Ahn
KT ucloud storage, by Jaesuk AhnKT ucloud storage, by Jaesuk Ahn
KT ucloud storage, by Jaesuk AhnHui Cheng
 
TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...
TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...
TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...tdc-globalcode
 
Kubernetes Internals
Kubernetes InternalsKubernetes Internals
Kubernetes InternalsShimi Bandiel
 
Run Cloud Native MySQL NDB Cluster in Kubernetes
Run Cloud Native MySQL NDB Cluster in KubernetesRun Cloud Native MySQL NDB Cluster in Kubernetes
Run Cloud Native MySQL NDB Cluster in KubernetesBernd Ocklin
 
Storage as a service and OpenStack Cinder
Storage as a service and OpenStack CinderStorage as a service and OpenStack Cinder
Storage as a service and OpenStack Cinderopenstackindia
 
Kubernetes data science and machine learning
Kubernetes data science and machine learningKubernetes data science and machine learning
Kubernetes data science and machine learningKublr
 
DCSF19 Container Security: Theory & Practice at Netflix
DCSF19 Container Security: Theory & Practice at NetflixDCSF19 Container Security: Theory & Practice at Netflix
DCSF19 Container Security: Theory & Practice at NetflixDocker, Inc.
 
On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...
On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...
On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...Radhika Puthiyetath
 
PlovDev 2016: Оркестрация на контейнери с Kubernetes - Мартин Владев
PlovDev 2016: Оркестрация на контейнери с Kubernetes - Мартин ВладевPlovDev 2016: Оркестрация на контейнери с Kubernetes - Мартин Владев
PlovDev 2016: Оркестрация на контейнери с Kubernetes - Мартин ВладевPlovDev Conference
 
Secure Your Containers: What Network Admins Should Know When Moving Into Prod...
Secure Your Containers: What Network Admins Should Know When Moving Into Prod...Secure Your Containers: What Network Admins Should Know When Moving Into Prod...
Secure Your Containers: What Network Admins Should Know When Moving Into Prod...Cynthia Thomas
 

Similar to How Self-Healing Nodes and Infrastructure Management Impact Reliability (20)

Hybrid architecture solutions with kubernetes and the cloud native stack
Hybrid architecture solutions with kubernetes and the cloud native stackHybrid architecture solutions with kubernetes and the cloud native stack
Hybrid architecture solutions with kubernetes and the cloud native stack
 
DevOps with Kubernetes
DevOps with KubernetesDevOps with Kubernetes
DevOps with Kubernetes
 
Intro into Rook and Ceph on Kubernetes
Intro into Rook and Ceph on KubernetesIntro into Rook and Ceph on Kubernetes
Intro into Rook and Ceph on Kubernetes
 
DevOps in AWS with Kubernetes
DevOps in AWS with KubernetesDevOps in AWS with Kubernetes
DevOps in AWS with Kubernetes
 
The Evolution of your Kubernetes Cluster
The Evolution of your Kubernetes ClusterThe Evolution of your Kubernetes Cluster
The Evolution of your Kubernetes Cluster
 
DevOpsDays Houston 2019 - Terry Shea - Centralizing Kubernetes Operations
DevOpsDays Houston 2019 - Terry Shea - Centralizing Kubernetes OperationsDevOpsDays Houston 2019 - Terry Shea - Centralizing Kubernetes Operations
DevOpsDays Houston 2019 - Terry Shea - Centralizing Kubernetes Operations
 
Kubernetes – An open platform for container orchestration
Kubernetes – An open platform for container orchestrationKubernetes – An open platform for container orchestration
Kubernetes – An open platform for container orchestration
 
Monitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudMonitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloud
 
The State of Kubernetes Security
The State of Kubernetes Security The State of Kubernetes Security
The State of Kubernetes Security
 
KT ucloud storage, by Jaesuk Ahn
KT ucloud storage, by Jaesuk AhnKT ucloud storage, by Jaesuk Ahn
KT ucloud storage, by Jaesuk Ahn
 
Am 02 osac_kt_swift
Am 02 osac_kt_swiftAm 02 osac_kt_swift
Am 02 osac_kt_swift
 
TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...
TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...
TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...
 
Kubernetes Internals
Kubernetes InternalsKubernetes Internals
Kubernetes Internals
 
Run Cloud Native MySQL NDB Cluster in Kubernetes
Run Cloud Native MySQL NDB Cluster in KubernetesRun Cloud Native MySQL NDB Cluster in Kubernetes
Run Cloud Native MySQL NDB Cluster in Kubernetes
 
Storage as a service and OpenStack Cinder
Storage as a service and OpenStack CinderStorage as a service and OpenStack Cinder
Storage as a service and OpenStack Cinder
 
Kubernetes data science and machine learning
Kubernetes data science and machine learningKubernetes data science and machine learning
Kubernetes data science and machine learning
 
DCSF19 Container Security: Theory & Practice at Netflix
DCSF19 Container Security: Theory & Practice at NetflixDCSF19 Container Security: Theory & Practice at Netflix
DCSF19 Container Security: Theory & Practice at Netflix
 
On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...
On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...
On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...
 
PlovDev 2016: Оркестрация на контейнери с Kubernetes - Мартин Владев
PlovDev 2016: Оркестрация на контейнери с Kubernetes - Мартин ВладевPlovDev 2016: Оркестрация на контейнери с Kubernetes - Мартин Владев
PlovDev 2016: Оркестрация на контейнери с Kubernetes - Мартин Владев
 
Secure Your Containers: What Network Admins Should Know When Moving Into Prod...
Secure Your Containers: What Network Admins Should Know When Moving Into Prod...Secure Your Containers: What Network Admins Should Know When Moving Into Prod...
Secure Your Containers: What Network Admins Should Know When Moving Into Prod...
 

More from Kublr

Container Runtimes and Tooling, v2
Container Runtimes and Tooling, v2Container Runtimes and Tooling, v2
Container Runtimes and Tooling, v2Kublr
 
Container Runtimes and Tooling
Container Runtimes and ToolingContainer Runtimes and Tooling
Container Runtimes and ToolingKublr
 
Kubernetes in Hybrid Environments with Submariner
Kubernetes in Hybrid Environments with SubmarinerKubernetes in Hybrid Environments with Submariner
Kubernetes in Hybrid Environments with SubmarinerKublr
 
Multi-cloud Kubernetes BCDR with Velero
Multi-cloud Kubernetes BCDR with VeleroMulti-cloud Kubernetes BCDR with Velero
Multi-cloud Kubernetes BCDR with VeleroKublr
 
Kubernetes Networking 101
Kubernetes Networking 101Kubernetes Networking 101
Kubernetes Networking 101Kublr
 
Kubernetes 101
Kubernetes 101Kubernetes 101
Kubernetes 101Kublr
 
Setting up CI/CD Pipeline with Kubernetes and Kublr step by-step
Setting up CI/CD Pipeline with Kubernetes and Kublr step by-stepSetting up CI/CD Pipeline with Kubernetes and Kublr step by-step
Setting up CI/CD Pipeline with Kubernetes and Kublr step by-stepKublr
 
Canary Releases on Kubernetes with Spinnaker, Istio, & Prometheus (2020)
Canary Releases on Kubernetes with Spinnaker, Istio, & Prometheus (2020)Canary Releases on Kubernetes with Spinnaker, Istio, & Prometheus (2020)
Canary Releases on Kubernetes with Spinnaker, Istio, & Prometheus (2020)Kublr
 
How to Run Kubernetes in Restrictive Environments
How to Run Kubernetes in Restrictive EnvironmentsHow to Run Kubernetes in Restrictive Environments
How to Run Kubernetes in Restrictive EnvironmentsKublr
 
Centralizing Kubernetes Management in Restrictive Environments
Centralizing Kubernetes Management in Restrictive EnvironmentsCentralizing Kubernetes Management in Restrictive Environments
Centralizing Kubernetes Management in Restrictive EnvironmentsKublr
 

More from Kublr (10)

Container Runtimes and Tooling, v2
Container Runtimes and Tooling, v2Container Runtimes and Tooling, v2
Container Runtimes and Tooling, v2
 
Container Runtimes and Tooling
Container Runtimes and ToolingContainer Runtimes and Tooling
Container Runtimes and Tooling
 
Kubernetes in Hybrid Environments with Submariner
Kubernetes in Hybrid Environments with SubmarinerKubernetes in Hybrid Environments with Submariner
Kubernetes in Hybrid Environments with Submariner
 
Multi-cloud Kubernetes BCDR with Velero
Multi-cloud Kubernetes BCDR with VeleroMulti-cloud Kubernetes BCDR with Velero
Multi-cloud Kubernetes BCDR with Velero
 
Kubernetes Networking 101
Kubernetes Networking 101Kubernetes Networking 101
Kubernetes Networking 101
 
Kubernetes 101
Kubernetes 101Kubernetes 101
Kubernetes 101
 
Setting up CI/CD Pipeline with Kubernetes and Kublr step by-step
Setting up CI/CD Pipeline with Kubernetes and Kublr step by-stepSetting up CI/CD Pipeline with Kubernetes and Kublr step by-step
Setting up CI/CD Pipeline with Kubernetes and Kublr step by-step
 
Canary Releases on Kubernetes with Spinnaker, Istio, & Prometheus (2020)
Canary Releases on Kubernetes with Spinnaker, Istio, & Prometheus (2020)Canary Releases on Kubernetes with Spinnaker, Istio, & Prometheus (2020)
Canary Releases on Kubernetes with Spinnaker, Istio, & Prometheus (2020)
 
How to Run Kubernetes in Restrictive Environments
How to Run Kubernetes in Restrictive EnvironmentsHow to Run Kubernetes in Restrictive Environments
How to Run Kubernetes in Restrictive Environments
 
Centralizing Kubernetes Management in Restrictive Environments
Centralizing Kubernetes Management in Restrictive EnvironmentsCentralizing Kubernetes Management in Restrictive Environments
Centralizing Kubernetes Management in Restrictive Environments
 

Recently uploaded

Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 

Recently uploaded (20)

Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 

How Self-Healing Nodes and Infrastructure Management Impact Reliability

  • 1. How Self-Healing Nodes and Infrastructure Management Impact Reliability Oleg Chunikhin |CTO
  • 2. Introductions Oleg Chunikhin CTO, Kublr ✔ 20+ years in software architecture & development ✔ Working w/ Kubernetes since its release in 2015 ✔ CTO at Kublr—an enterprise ready container management platform ✔ Twitter @olgch; @kublr Like what you hear? Tweet at us!
  • 3. Automation Ingress Custom Clusters Infrastructure Logging Monitoring Observability API Usage Reporting RBAC IAM Air Gap TLS Certificate Rotation Audit Storage Networking Container Registry CI / CD App Mgmt Infrastructure Container Runtime Kubernetes OPERATIONS SECURITY & GOVERNANCE What’s Kublr? @olgch, @kublr
  • 4. Building a Reliable System with Kubernetes • Day 1: trivial, K8S will restart my pod! • Day 30: this is not so simple... • What Kubernetes does, can, and doesn’t do • Full stack reliability: tools and approaches @olgch, @kublr
  • 5. Kubernetes Cluster K8S Architecture Refresher: Components The Master, agent, etcd, API, overlay network, and DNS Master API Server etcd data controller manager scheduler etcd kubectl Worker kubelet container runtime overlay network cluster DNS kube-proxy @olgch, @kublr
  • 6. Cluster K8S Architecture Refresher: API Objects Nodes, pods, services, and persistent volumes Node 1 Node 2 Pod A-1 10.0.0.3 Cnt1 Cnt2 Pod A-2 10.0.0.5 Cnt1 Cnt2 Pod B-1 10.0.0.8 Cnt3 SrvA 10.7.0.1 SrvB 10.7.0.3 Persistent Volume @olgch, @kublr
  • 7. Reliability Tools: Pod Probes and Controllers Probes Liveness and readiness check • TCP, HTTP(S), exec Controllers ReplicaSet • Maintain specific number of identical replicas Deployment • ReplicaSet + update strategy, rolling update StatefulSet • Deployment + replica identity, persistent volume stability DaemonSet • Maintain identical replicas on each node (of a specified set) Operators @olgch, @kublr
  • 8. • Resource framework • Standard: CPU, memory, disk • Custom: GPU, FPGA, etc. • Requests and limits • Kube and system reservations • no swap • Pod eviction and disruption budget (resource starving) • Pod priority and preemption (critical pods) • Affinity, anti-affinity, node selectors & matchers Reliability Tools: Resource & Location Mgmt @olgch, @kublr
  • 9. Reliability Tools: Autoscaling Horizontal pod autoscaler (HPA) Vertical pod autoscaler (VPA) • In-place updates - WIP (issue #5774) Cluster Autoscaler • Depends on infrastructure provider - uses node groups AWS ASG, Azure ScaleSets, ... • Supports AWS, Azure, GCE, GKE, Openstack, Alibaba Cloud @olgch, @kublr
  • 10. Applications/pods/containers “Middleware” • Operations: monitoring, log collection, alerting, etc. • Lifecycle: CI/CD, SCM, binary repo, etc. • Container management: registry, scanning, governance, etc. Container Persistence: cloud native storage, DB, messaging Container Orchestrator: Kubernetes • “Essentials”: overlay network, DNS, autoscaler, etc. • Core: K8S etcd, master, worker components • Container engine: Docker, CRI-O, etc. OS: kernel, network, devices, services, etc. Infrastructure: “raw” compute, network, storage Full Stack Reliability @olgch, @kublr
  • 11. Architecture 101 • Layers are separate and independent • Disposable/restartable components • Reattachable dependencies (including data) • Persistent state is separate from disposable processes Pets vs cattle - only data is allowed to be pets (ideally) @olgch, @kublr
  • 12. If a node has a problem... • Try to fix it (pet) • Replace or reset it (cattle) Infrastructure and OS Tools • In-cluster: npd, weaveworks kured, ... • hardware, kernel, servicer, container runtime issues • reboot • Infrastructure provider automation • AWS ASG, Azure Scale Set, ... • External auto node recovery logic • Custom + infrastructure provider API • Cluster management solution • (Future) cluster API @olgch, @kublr
  • 13. Components • etcd • Master: API server, controller manager, scheduler • Worker: kubelet, kube-proxy • Container runtime: Docker, CRI-O Already 12 factor Kubernetes Components: Auto-Recovery Monitor liveliness, automate restart • Run as services • Run as static pods Dependencies to care about • etcd data • K8S keys and certificates • Configuration @olgch, @kublr
  • 14. API Server controller manager scheduler API Server controller manager scheduler etcd data etcd etcd data etcd K8S multi-master • Pros: HA, scaling • Cons: need LB (server or client) etcd cluster • Pros: HA, data replication • Cons: latency, ops complexity etcd data • Local ephemeral • Local persistent (survives node failure) • Remote persistent (survives node replacement) Kubernetes Components: Multi-Master API Server etcd data controller manager scheduler etcd kubectl kubelet @olgch, @kublr
  • 15. • Persistent volumes • Volume provisioning • Storage categories • Native block storage: AWS EBS, Azure Disk, vSphere volume, attached block device, etc. • HostPath • Managed network storage: NFS, iSCSI, NAS/SAN, AWS EFS, etc. • Some of the idiosyncrasies • Topology sensitivity (e.g. AZ-local, host-local) • Cloud provider limitations (e.g. number of attached disks) • Kubernetes integration (e.g. provisioning and snapshots) Container Persistence @olgch, @kublr
  • 16. • Integrates with Kubernetes • CSI, FlexVolume, or native • Volume provisioners • Snapshots support • Runs in cluster or externally • Approach • Flexible storage on top of backing storage • Augmenting and extending backing storage • Backing storage: local, managed, Kubernetes PV • Examples: Rook/Ceph, Portworx, Nuvoloso, GlusterFS, Linstor, OpenEBS, etc. Cloud Native Storage @olgch, @kublr
  • 17. Data pool mon config data config data monmon config data Cloud Native Storage: Rook/Ceph raw data osd raw data osd raw data mdsosd Data pool Image Image Ceph Filesystem Components Abstractions Ceph rgw S3/Swift Object Store mgr Rook Operator CSI plugins osdosdganesha NFS CephCluster Block Pool Object Store Filesystem NFS Object Store User Provisioners rbd-mirror @olgch, @kublr
  • 18. Operations: monitoring, log collection, alerting, etc. Lifecycle: CI/CD, SCM, binary repo, etc. Container management: registry, scanning, governance, etc. Deployment options: • Managed service • In Kubernetes • Deploy separately Middleware @olgch, @kublr
  • 19. • Region to region; cloud to cloud; cloud to on-prem (hybrid) • One cluster (⚠) vs cluster per location (✔) Tasks • Physical network connectivity: VPN, direct • Overlay network connectivity: Calico BGP peering, Native routing • Cross-cluster DNS: CoreDNS • Cross-cluster deployment: K8S federation • Cross-cluster ingress, load balancing: K8S federation, DNS, CDN • Cross-cluster data replication • native: e.g. AWS EBS, Snapshots inter-region transfer • CNS level: e.g. Ceph geo-replication • database level: e.g. Yugabyte geo-replication, sharding, ... • application level Something Missing? Multi-Site @olgch, @kublr
  • 20. To Recap... • Kubernetes provides robust tools for application reliability • Underlying infrastructure and Kubernetes components recovery is responsibility of the cluster operator • Kubernetes is just one of the layers • Remember Architecture 101 and assess all layers accordingly • Middleware, and even CNS, can run in Kubernetes and be treated as regular applications to benefit from K8S capabilities • Multi-site HA, balancing, failover is much easier with K8S and the cloud native ecosystem. Still requires careful planning! @olgch, @kublr
  • 22. Oleg Chunikhin CTO | Kublr oleg@kublr.com Thank you! Take Kublr for a test drive! kublr.com/deploy Free non-production license @olgch, @kublr