All The Troubles You Get Into When Setting up a Production-ready Kubernetes Cluster
Jimmy Lu
Software Engineer @hyvesolutions
jimmylu@hyvesolutions.com
Agenda
Motivation
Recap of Kubernetes Architecture
Security
Networking
Miscellaneous
High Availability
Motivation
• A Million Ways of Deploying a Kubernetes Cluster – DevOpsDays 2017
• https://goo.gl/5yHFHa
• We tried to build our own solutions – Kubewizard
• Large Clusters
• Configurable/Customizable
• Easy to Use
• Fast
• Production Ready
• We want to save you precious time and keep you out of trouble
Recap of Kubernetes Architecture
Architecture Recap
Architecture Recap – Network
Architecture Recap – Authentication/Authorization
Clinical Cases
Case Study Sample
01 Symptom
02 Diagnosis
03 Therapy
Security
SSL/TLS Recap
Architecture Recap
Security – SSL/TLS
• Symptom: Unable to connect to the server: x509: certificate signed by unknown authority
• Diagnosis: Check the CA data in the .kubeconfig file
• Therapy: Make your client CA data identical to the CA file assigned to the server
• --client-ca-file of kube-apiserver and kubelet, --trusted-ca-file and --peer-trusted-ca-file of etcd, and the CA of your authentication proxy
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0t……
    server: https://xxx.xxx.xxx.xxx:6443
  name: kw
contexts:
- context:
    cluster: kw
    namespace: default
    user: kw-admin
  name: kw
current-context: kw
kind: Config
preferences: {}
users:
- name: kw-admin
  user:
    client-certificate-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tL……
    client-key-data: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQ……
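One way to run this diagnosis is to decode the certificate-authority-data value and compare it with the CA file given to the server. The sketch below assumes you have already copied the base64 string out of the kubeconfig and read the server-side CA file as text. The helper names decode_ca_data and same_ca are made up for illustration.

```python
import base64


def decode_ca_data(ca_data_b64: str) -> str:
    """Decode a kubeconfig certificate-authority-data value into PEM text."""
    return base64.b64decode(ca_data_b64).decode("ascii")


def same_ca(ca_data_b64: str, server_ca_pem: str) -> bool:
    """Return True when the kubeconfig CA and the server CA file hold the same PEM text."""
    # Leading/trailing whitespace is ignored; the PEM body itself must match exactly.
    return decode_ca_data(ca_data_b64).strip() == server_ca_pem.strip()
```

If same_ca returns False, the client is trusting a different CA than the one the server certificate was issued under, which is the situation this slide describes.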
Security – SSL/TLS
• Symptom: You try to get the logs via ‘kubectl logs <pod_name>’ but encounter x509: certificate signed by unknown authority
• Diagnosis: Check whether the kubelet CA assigned to kube-apiserver is correct
• Therapy: Make sure the CA assigned to kube-apiserver via --kubelet-certificate-authority matches the CA that signed the kubelets' serving certificates
Architecture Recap
Security – SSL/TLS
• Symptom: Unable to connect to the server: x509: certificate is valid
for 10.240.0.4, 10.240.0.5, 35.194.148.244
• Diagnosis: Check IPs and domains in the hosts part of the certificate
request
• Therapy: Make sure all the IPs and domains are included in your
certificate request file when generating the server certificates
{
  "CN": "kube-apiserver",
  "hosts": [
    "kw-master-001",
    "kw-master-002",
    "10.240.0.4",
    "10.240.0.5",
    "35.194.148.244",
    "35.201.222.64",
    "35.201.171.127",
    "10.96.0.1",
    "127.0.0.1",
    "kubernetes",
    "kubernetes.default",
    "kubernetes.default.svc",
    "kubernetes.default.svc.cluster.local"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [{}]
}
Host entries to cover: Kube-DNS domain, hostname, internal IP, external IP, load balancer, cluster IP
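The check in the diagnosis can be scripted: list every name or IP that clients use to reach the apiserver and see which ones the certificate request leaves out. This is a minimal sketch; missing_hosts is a hypothetical helper, and the trimmed-down request below only borrows a few values from the slide.

```python
import json


def missing_hosts(csr_json: str, required: list[str]) -> list[str]:
    """Return the required names/IPs that are absent from the request's "hosts" list."""
    hosts = set(json.loads(csr_json).get("hosts", []))
    return [name for name in required if name not in hosts]


# Trimmed-down certificate request, with one load balancer IP left out on purpose.
csr = '{"CN": "kube-apiserver", "hosts": ["kw-master-001", "10.240.0.4", "10.96.0.1"]}'

print(missing_hosts(csr, ["10.240.0.4", "35.201.222.64", "10.96.0.1"]))
# prints ['35.201.222.64']
```

Any entry reported as missing has to be added to the request before the server certificate is regenerated.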
Security – Authentication
• X509 Client Certs
• Static Token File
• Bootstrap Tokens
• Static Password File
• Service Account Tokens
• OpenID Connect Tokens
• Webhook Token Authentication
• Authenticating Proxy
• Keystone Password
• Anonymous requests
• https://kubernetes.io/docs/admin/authentication/
Security – Authentication
• Symptom: tls: failed to find any PEM data in certificate (or key) input
• Diagnosis: Check the certificate and key data in .kubeconfig file
• Therapy: Make sure the certificate and key data are correctly signed
by the CA you created
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0t……
    server: https://xxx.xxx.xxx.xxx:6443
  name: kw
contexts:
- context:
    cluster: kw
    namespace: default
    user: kw-admin
  name: kw
current-context: kw
kind: Config
preferences: {}
users:
- name: kw-admin
  user:
    client-certificate-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tL……
    client-key-data: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQ……
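Since the error message is about PEM data not being found, a cheap first step is to confirm that client-certificate-data and client-key-data decode to PEM at all. This sketch assumes the values are passed in as the raw base64 strings from the kubeconfig; looks_like_pem is an illustrative name, not part of any tool.

```python
import base64
import binascii


def looks_like_pem(data_b64: str) -> bool:
    """Return True if the base64 value decodes to bytes containing a PEM header."""
    try:
        raw = base64.b64decode(data_b64, validate=True)
    except binascii.Error:
        # Not valid base64 (stray characters, bad padding, truncated copy/paste).
        return False
    return b"-----BEGIN " in raw
```

A False result points at an encoding or copy/paste problem. A True result only says the format is plausible, so the signing check from the therapy above is still needed.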
Security – RBAC Authorization
• Symptom: Error from server (Forbidden): User "kubernetes-admin"
cannot list nodes at the cluster scope. (get nodes)
• Diagnosis: Check clusterrole, clusterrolebinding, role, rolebinding,
and the user:group values in the kubeconfig, token, or http-headers
• Therapy: Create corresponding roles and rolebindings for the users,
groups, or service accounts
certificate request
{
  "CN": "system:node:kw-etcd-001",
  "names": [
    {
      "O": "system:nodes"
    }
  ]
}
authentication token
2915baa1f710cbada00aad86706ded28,kubelet-bootstrap,10001,"system:kubelet-bootstrap"
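To see which user and groups RBAC will match against, it helps to extract them from the credentials. With X509 client certificates, Kubernetes takes the user name from the CN and the groups from the O fields. A static token file line has the form token,user,uid,"group1,group2". The two helpers below are a hypothetical sketch of that mapping, not code from Kubernetes.

```python
import csv
import json


def identity_from_csr(csr_json: str) -> tuple[str, list[str]]:
    """Map a certificate request to (user, groups): CN is the user, each O is a group."""
    csr = json.loads(csr_json)
    groups = [entry["O"] for entry in csr.get("names", []) if "O" in entry]
    return csr["CN"], groups


def identity_from_token_line(line: str) -> tuple[str, list[str]]:
    """Parse one static token file line into (user, groups)."""
    fields = next(csv.reader([line]))
    user = fields[1]
    # The optional fourth column is a quoted, comma-separated list of groups.
    groups = fields[3].split(",") if len(fields) > 3 and fields[3] else []
    return user, groups
```

The returned user and groups are the subjects that need a matching rolebinding or clusterrolebinding.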
configmap
clusters:
- cluster:
    certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    server: https://10.96.0.1:443
  name: default
contexts:
……
users:
- name: default
  user:
    tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
Security – RBAC Authorization
• Affects all the components that connect to the apiserver
• Symptom:
• Failed to list *v1.Pod, *v1.Node, etc. in the logs of all the components that talk to the apiserver
• kube-proxy – requests to the service cannot be proxied to the endpoints
• kube-dns – domain names cannot be resolved
• kubelet – nodes cannot join the cluster
• overlay network – cannot assign IPs to pods, thus no traffic to the pods
• kube-controller-manager – primary features of Kubernetes malfunction
Networking
• Symptom: Nodes are in NotReady state when running ‘kubectl get nodes’
• Diagnosis: Verify that the overlay network works
• Therapy: Install CNI and the CNI plugins and make sure they work as expected
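On a larger cluster it is easier to filter the node list than to read it. The sketch below assumes the input is the parsed JSON from ‘kubectl get nodes -o json’ and reports the nodes whose Ready condition is not "True". The function name not_ready_nodes is illustrative.

```python
def not_ready_nodes(node_list: dict) -> list[str]:
    """Return the names of nodes whose Ready condition is not "True"."""
    result = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            result.append(node["metadata"]["name"])
    return result
```

Nodes with no Ready condition at all are reported too, since a kubelet that never posted status is just as unusable.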
Networking
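• Symptom: All traffic between pods, or between nodes and pods, is dropped
• Diagnosis: Look at the iptables rules and the routing tables
• Therapy: Allow packet forwarding in the iptables FORWARD chain, or downgrade to Docker v1.12.x (Docker 1.13 and later set the FORWARD chain policy to DROP)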
$ sudo iptables-save
-A INPUT -j KUBE-FIREWALL
-A FORWARD -j DOCKER-ISOLATION
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j DROP
-A FORWARD -i docker0 -o docker0 -j DROP
-A OUTPUT -j KUBE-FIREWALL
-A DOCKER-ISOLATION -j RETURN
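Scanning a long iptables-save dump by eye is error-prone, so a small filter can pull out the FORWARD entries that drop packets. This is a rough sketch that assumes the dump has been captured as text. It looks for a DROP chain policy line (of the form ":FORWARD DROP [0:0]") and for explicit DROP rules; forward_blockers is a made-up helper name.

```python
def forward_blockers(dump: str) -> list[str]:
    """Return FORWARD-chain lines from iptables-save output that drop packets."""
    blockers = []
    for line in dump.splitlines():
        line = line.strip()
        if line.startswith(":FORWARD DROP"):
            # Chain policy line: packets that match no rule are dropped.
            blockers.append(line)
        elif line.startswith("-A FORWARD ") and line.endswith("-j DROP"):
            # Explicit rule in the FORWARD chain with a DROP target.
            blockers.append(line)
    return blockers
```

Run against the output above, it would return the two docker0 DROP rules, which are the ones to review before changing anything.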
Miscellaneous
• Symptom: Kubernetes components are periodically signaled to stop
• Diagnosis: Check the configuration of the liveness and readiness probes
• Therapy: Make sure the host, port, and scheme match the health-check targets. Also, make sure your applications are in a healthy state
kube-controller-manager.yaml
livenessProbe:
  failureThreshold: 8
  httpGet:
    host: 127.0.0.1 # defaults to the pod's IP when omitted
    path: /healthz
    port: 10252
    scheme: HTTP
  initialDelaySeconds: 15
  timeoutSeconds: 15
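When a probe keeps failing, it helps to spell out the exact URL it targets and try that URL by hand from the node. The sketch below builds the URL from the httpGet fields of a probe. It assumes the probe has been loaded into a dict, that host falls back to the pod's IP and scheme to HTTP when omitted, and that a missing path is treated as "/". The name probe_url is hypothetical.

```python
def probe_url(http_get: dict, pod_ip: str) -> str:
    """Build the URL an httpGet probe targets; host falls back to the pod's IP."""
    host = http_get.get("host") or pod_ip
    scheme = http_get.get("scheme", "HTTP").lower()
    path = http_get.get("path", "/")  # assumption: a missing path is treated as "/"
    return f"{scheme}://{host}:{http_get['port']}{path}"
```

For the probe above this yields http://127.0.0.1:10252/healthz. If the component listens on another address, port, or scheme, the probe fails and the component is restarted.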
Miscellaneous
• Symptom: Cannot create cloud load balancers or dynamic persistent volumes automatically
• Diagnosis: Look at the --cloud-provider argument of kube-apiserver, kube-controller-manager, and kubelet
• Therapy: Enable cloud integration by giving correct values to the --cloud-provider argument
Miscellaneous
• Symptom: TLS certificate requests do not work with static pods; --run_once does not help solve the issue because it is broken.
• Therapy: Use either TLS certificate requests or static pods, not both
High Availability
• Symptom: controller.go:290] Resetting endpoints for master service "kubernetes" to...'
• Diagnosis: Look at the --apiserver-count argument to see if it matches the actual number of apiservers
• Therapy: Correct the value of the --apiserver-count argument
• https://stackoverflow.com/questions/36337431/kubernetes-newer-api-server-shows-errors-resetting-endpoints-for-master-service
High Availability
• Symptom: attempting to acquire leader lease... keeps showing in the logs of kube-controller-managers and kube-schedulers
• Diagnosis: Check whether ‘successfully acquired lease…’ appears in the log of one of the kube-controller-managers and one of the kube-schedulers
• Therapy: No action needed; the instances that do not hold the lease are standbys
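With several replicas, the diagnosis amounts to searching each instance's log for the "acquired" message. The following is a minimal sketch that assumes the logs have already been collected into a dict keyed by instance name; current_leaders and the marker constant are illustrative.

```python
ACQUIRED = "successfully acquired lease"


def current_leaders(logs_by_instance: dict[str, str]) -> list[str]:
    """Return the instances whose log contains the 'acquired lease' message."""
    return [name for name, log in logs_by_instance.items() if ACQUIRED in log]
```

An empty result for a component is the case worth investigating. A log can also still contain the message from an earlier term, so treat the result as a hint rather than proof of who currently holds the lease.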
High Availability
• Use monit or systemctl to watch over kubelet and docker
• Point the health check of the external load balancer at the insecure port of kube-apiserver (--insecure-port, --insecure-bind-address)
• A load balancer may sometimes aggravate issues, e.g. when some apiservers are healthy and some are not
• etcd, kube-apiserver, kube-controller-manager, kube-scheduler, and kubelet should all be set up to be highly available
Kubewizard
Summary
• Setting up a distributed system is never easy, especially a complex system like Kubernetes
• Some suggestions
• Be patient
• Go step by step and reduce the number of variables to a minimum
• Start from a small cluster, then move to an HA cluster, then a large cluster
• kubectl logs, kubectl describe, systemctl status, journalctl -xe, docker logs, minikube, kubeadm, and Kubernetes-the-hard-way are your good friends
• RTFM (Read the Documents)
Q&A
