Kubeinvaders & Chaos Engineering practices for Kubernetes
Eugenio Marzo - CKA, Vault Associate
FOSDEM 2023
Agenda
● Definition of Chaos Engineering
● k-inv - game and programming mode
● Chaos Programming Console
● Community Links
Definition of Chaos Engineering
Definition
Chaos Engineering is the discipline of experimenting
on a system in order to build confidence in the
system’s capability to withstand turbulent conditions
in production (https://principlesofchaos.org/)
Use cases
● Test the resilience of a distributed system
● Trigger controlled alerts for testing monitoring systems
Tools
● Chaos Monkey
● LitmusChaos
● Chaos Mesh
● Chaos Toolkit
● Kubedoom
Use Case - Metrics Selection
A monitoring system exposes tons of metrics. All of them matter,
but which ones are truly significant?
Sometimes too many metrics and alerts confuse SysOps
teams. It is not clear what matters for a first-level
monitoring system.
Chaos Engineering is a discipline for stressing systems to see
how resilient and rock-solid they are. But can it help us solve
our problem?
Stressing system reliability == Producing controlled alerts
Test environment
● OpenShift 4.10 (3 master/worker nodes: wrk1, wrk2, wrk3)
● Chaosd (on the physical nodes)
● Chaos Mesh pods
● Prometheus stack
While running Chaos Mesh, we saw some interesting alerts and related metrics in the Prometheus
console:
MEM Attack - /usr/local/chaosd-v1.0.0-linux-amd64/tools/stress-ng --vm 2 --vm-bytes 15G
● etcdMembersDown
● etcdNoLeader
● TargetDown
● KubeClientErrors
● ExtremelyHighIndividualControlPlaneCPU
Disk Attack - ./chaosd attack disk fill -s95G -p /var/lib/containers/foo.bar
● NodeFilesystemAlmostOutOfSpace
CPU Attack - ./chaosd attack stress cpu -w 4
● etcdMemberCommunicationSlow
● etcdHighCommitDurations
● KubePodNotReady
● HighOverallControlPlaneCPU
Network Fault - Delay 3s
● TargetDown
● KubeAPIErrorBudgetBurn
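The attack-to-alert observations above can be kept as a small lookup to support metric selection. The following is a minimal Python sketch and not part of the talk: the alert names are copied from the list above, while the dictionary layout and the helper names `alerts_by_attack_count` and `attacks_for` are illustrative assumptions. It counts how many attack types fired each alert, which is one possible way to shortlist alerts for a first-level monitoring system.

```python
from collections import Counter

# Alerts observed per attack type, copied from the list above.
# The short keys ("mem", "disk", ...) are labels chosen for this sketch.
OBSERVED_ALERTS = {
    "mem": ["etcdMembersDown", "etcdNoLeader", "TargetDown",
            "KubeClientErrors", "ExtremelyHighIndividualControlPlaneCPU"],
    "disk": ["NodeFilesystemAlmostOutOfSpace"],
    "cpu": ["etcdMemberCommunicationSlow", "etcdHighCommitDurations",
            "KubePodNotReady", "HighOverallControlPlaneCPU"],
    "network-delay": ["TargetDown", "KubeAPIErrorBudgetBurn"],
}


def alerts_by_attack_count(observed):
    """Return {alert: number of distinct attack types that fired it}."""
    counts = Counter()
    for alerts in observed.values():
        counts.update(set(alerts))  # set() so one attack counts at most once
    return dict(counts)


def attacks_for(alert, observed):
    """Return the sorted attack types under which the alert was seen."""
    return sorted(attack for attack, alerts in observed.items() if alert in alerts)


if __name__ == "__main__":
    counts = alerts_by_attack_count(OBSERVED_ALERTS)
    # Most widely triggered alerts first, then alphabetical.
    for alert, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        print(f"{alert}: {n} ({', '.join(attacks_for(alert, OBSERVED_ALERTS))})")
```

In this data only TargetDown fired under more than one attack type (memory stress and network delay). That could make it a candidate for a broad first-level alert, while the others point at more specific failure modes.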
What is k-inv
Definition
A Chaos Engineering tool for Kubernetes. It consists of a game part (a Space Invaders
clone for killing pods) and a chaos programming console.
Features
● Kill pods randomly and start chaos jobs against worker and master nodes
● Define and run chaos experiments and load tests
k-inv - Helm
helm repo add kubeinvaders https://lucky-sideburn.github.io/helm-charts/
helm repo update
kubectl create namespace kubeinvaders
helm install kubeinvaders kubeinvaders/kubeinvaders -n kubeinvaders \
  --set-string config.target_namespace="namespace1\,namespace2" \
  --set ingress.enabled=true \
  --set ingress.hostName=kubeinvaders.io \
  --set deployment.image.tag=v1.9.6
k-inv - Docker
docker run -p 8080:8080 \
  --env K8S_TOKEN=<k8s_service_account_token> \
  --env ENDPOINT=localhost:8080 \
  --env INSECURE_ENDPOINT=true \
  --env KUBERNETES_SERVICE_HOST=<k8s_controlplane_host> \
  --env KUBERNETES_SERVICE_PORT_HTTPS=<k8s_controlplane_port> \
  --env NAMESPACE=<comma_separated_namespaces_to_stress> \
  luckysideburn/kubeinvaders:develop
Game Mode
Architecture
Control Plane Overview
● Switch between game and programming mode
● OpenMetrics exporter: http://kubeinvaders:8080/metrics
● Customizable presets for chaos experiments and load testing
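The OpenMetrics exporter mentioned above serves plain-text samples, so its output can be inspected with a few lines of standard-library Python. This is a minimal sketch under stated assumptions: the metric names in `SAMPLE` are hypothetical placeholders (the slides do not list the real metric names), `parse_metrics` and `fetch_metrics` are illustrative helpers, and the regular expression handles only simple `name{labels} value` lines. It is not a full OpenMetrics parser.

```python
import re
from urllib.request import urlopen

# Matches "metric_name 1.0" or "metric_name{label="x"} 1.0".
# Simplification: label values containing "}" are not handled.
SAMPLE_LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def parse_metrics(text):
    """Parse simple exposition-format lines into {(name, labels): value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP/TYPE/EOF lines
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        name, labels, value = match.group(1), match.group(2) or "", match.group(3)
        samples[(name, labels)] = float(value)
    return samples


def fetch_metrics(url="http://kubeinvaders:8080/metrics"):
    """Fetch and parse the exporter output (requires a reachable k-inv instance)."""
    with urlopen(url, timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"))


# Hypothetical sample payload; the metric names are placeholders for illustration.
SAMPLE = """\
# TYPE example_deleted_pods_total counter
example_deleted_pods_total 16
example_chaos_jobs_count{node="wrk1"} 3
# EOF
"""

if __name__ == "__main__":
    for (name, labels), value in parse_metrics(SAMPLE).items():
        print(name, labels, value)
```

In practice the endpoint would normally be scraped by Prometheus itself. A direct fetch like this is mainly useful for a quick check that the exporter is reachable.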
Chaos Programming Console - Controls and metrics
Light and Dark Mode
Options and Chaos Container Definition
Chaos Programming Console - Logging
Chaos Programming Mode - Pods status
Watch the status of pods related to the current chaos experiments.
Chaos Programming Mode - k-inv language
chaos-codename: promethium
jobs:
  cpu-attack-job:
    additional-labels:
      chaos-controller: kubeinvaders
      chaos-type: stress-ng
      chaos-codename: promethium
    image: docker.io/luckysideburn/kubeinvaders-stress-ng:latest
    command: "stress-ng"
    args:
      - --version
  mem-attack-job:
    additional-labels:
      chaos-controller: kubeinvaders
      chaos-type: stress-ng
      chaos-codename: promethium
    image: docker.io/luckysideburn/kubeinvaders-stress-ng:latest
    command: "stress-ng"
    args:
      - --version
experiments:
  - name: cpu-attack-exp
    job: cpu-attack-job
    loop: 5
  - name: mem-attack-exp
    job: mem-attack-job
    loop: 5
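In the definition above, each experiment refers to a job by name, so a typo in `job:` would leave an experiment pointing at nothing. The following minimal Python sketch shows one way to check such a definition before submitting it. It is an illustration only: the slides do not show how k-inv validates its input, the dictionary is a hand-written mirror of the YAML above, `find_problems` and `stress_job` are hypothetical helpers, and reading `loop` as a positive repeat count is an assumption.

```python
def stress_job():
    """Build one job entry as it appears in the example definition."""
    return {
        "additional-labels": {
            "chaos-controller": "kubeinvaders",
            "chaos-type": "stress-ng",
            "chaos-codename": "promethium",
        },
        "image": "docker.io/luckysideburn/kubeinvaders-stress-ng:latest",
        "command": "stress-ng",
        "args": ["--version"],
    }


# Hand-written Python mirror of the YAML definition shown above.
DEFINITION = {
    "chaos-codename": "promethium",
    "jobs": {"cpu-attack-job": stress_job(), "mem-attack-job": stress_job()},
    "experiments": [
        {"name": "cpu-attack-exp", "job": "cpu-attack-job", "loop": 5},
        {"name": "mem-attack-exp", "job": "mem-attack-job", "loop": 5},
    ],
}


def find_problems(definition):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    jobs = definition.get("jobs", {})
    for experiment in definition.get("experiments", []):
        name = experiment.get("name", "<unnamed>")
        job = experiment.get("job")
        if job not in jobs:
            problems.append(f"{name}: unknown job {job!r}")
        loop = experiment.get("loop")
        # Assumption: loop is a repeat count and must be a positive integer.
        if isinstance(loop, bool) or not isinstance(loop, int) or loop < 1:
            problems.append(f"{name}: loop must be a positive integer")
    return problems


if __name__ == "__main__":
    print(find_problems(DEFINITION) or "no problems found")
```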
Chaos Programming Mode - HTTP load test
Community Links
Links
● Awesome_k8s: https://github.com/ramitsurana/awesome-kubernetes
● Kubernetes_blog: https://kubernetes.io/blog/2020/01/22/kubeinvaders-gamified-chaos-engineering-tool-for-kubernetes/
● Live_session: https://www.youtube.com/watch?v=k0w-NXt0_hA
● https://github.com/lucky-sideburn/kubeinvaders (git repo)
● https://devopstribe.it/ (my blog)
● https://www.linkedin.com/in/eugenio-marzo-646a6742/ (LinkedIn profile)
Contacts
● eugenio.marzo [at] yahoo.it
● kubeinvaders [at] gmail.com
Thank you for your attention!
