Presented By: Dipayan Pramanik
Chaos Engineering on Kubernetes
Lack of etiquette and manners is a huge turn off.
KnolX Etiquettes
Punctuality
Respect KnolX session timings. Please
do not join a session more than
5 minutes after its start time.
Feedback
Make sure to submit constructive
feedback for all sessions, as it is
very helpful for the presenter.
Silent Mode
Keep your mobile devices in silent
mode. Feel free to step out of the
session if you need to attend
an urgent call.
Avoid Disturbance
Avoid unwanted chit chat during
the session.
Agenda
01 A real life scenario
02 What is Chaos Engineering
03 Why Chaos Engineering is needed
04 Chaos Mesh - a Chaos Engineering tool for Kubernetes
05 Chaos Mesh architecture and features
06 Demo
Real Life Scenario
● Today, containers are the main mode of application
deployment, and Kubernetes is the container orchestrator
that serves this purpose.
● Although Kubernetes handles container recreation, high
availability, and load balancing, many failures remain
unforeseen and can still affect the application.
Chaos Engineering
● As the name suggests, chaos engineering is about deliberately
creating havoc in a running environment.
● Through chaos experiments and tests, engineers can simulate
failure events, check the results, find out what the
application lacks, and fix the issue.
● In simpler words, chaos engineering is the practice of
injecting controlled failures into the production or staging
environment so that engineers can build a fault-tolerant
application.
Why Chaos Engineering
● Let us list some of the issues that might affect a deployed
application: network failure, network corruption,
unresponsive pods, extra traffic, etc.
● Any one of these scenarios is enough to cause downtime. But
how do we avoid the downtime? How can we stay prepared for
problems that have not occurred yet but might?
● The answer is Chaos Engineering. We recreate many of the
failure scenarios that can affect the application, and then
build an application that tolerates the induced faults and
stays fully functional.
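The experiment loop described above can be sketched as a toy in Python. This is only an illustration of the idea (inject a fault, then check whether the application still behaves), not a real chaos tool. The names FlakyService, fetch_with_retry and run_experiment are hypothetical and exist only for this sketch.

```python
class FlakyService:
    """Toy dependency that fails its first `failures` calls (the injected fault)."""

    def __init__(self, failures):
        self.remaining_failures = failures
        self.calls = 0

    def fetch(self):
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("injected network failure")
        return "ok"


def fetch_with_retry(service, attempts=3):
    """Client-side tolerance: retry up to `attempts` times (attempts >= 1)."""
    last_error = None
    for _ in range(attempts):
        try:
            return service.fetch()
        except ConnectionError as exc:
            last_error = exc
    raise last_error


def run_experiment(failures, attempts):
    """Return True if the client still gets a good answer despite the fault."""
    service = FlakyService(failures)
    try:
        return fetch_with_retry(service, attempts) == "ok"
    except ConnectionError:
        return False


# Two injected failures are tolerated by three attempts; three are not.
print(run_experiment(failures=2, attempts=3))
print(run_experiment(failures=3, attempts=3))
```

The second run is the interesting one: it shows a weakness (too few retries for the fault) before it happens in production, which is the point of running the experiment.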
Chaos Mesh
● Chaos Mesh is a chaos engineering platform for Kubernetes.
● It has no external dependencies. It uses Kubernetes Custom Resource Definitions (CRDs)
to define the chaos experiments.
● Chaos Mesh gives us control over the blast radius of experiments by letting us
allow or block specific namespaces.
● Chaos Mesh provides a wide variety of experiments that can be used to replicate real-life
scenarios.
● We can run experiments on a schedule, or run them in serial or in parallel as a workflow.
● Experiments can be configured through YAML or through the dashboard it provides.
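As a rough illustration of a CRD-based experiment, the Python sketch below builds a PodChaos manifest as a plain dictionary and prints it as JSON, which is also valid YAML and could be saved to a file for kubectl. The field names follow the chaos-mesh.org/v1alpha1 API as commonly documented and should be checked against the installed Chaos Mesh version. The experiment name, the "staging" namespace and the "app: web" label are made-up assumptions, and the helper pod_kill_experiment is hypothetical.

```python
import json


def pod_kill_experiment(name, target_namespace, app_label):
    """Build a PodChaos manifest that kills one matching pod.

    The selector limits the blast radius to a single namespace and label.
    """
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "PodChaos",
        # Assumption: the experiment object lives in the target namespace.
        "metadata": {"name": name, "namespace": target_namespace},
        "spec": {
            "action": "pod-kill",
            "mode": "one",  # pick one pod among those matched by the selector
            "selector": {
                "namespaces": [target_namespace],
                "labelSelectors": {"app": app_label},
            },
        },
    }


# Hypothetical values, for illustration only.
manifest = pod_kill_experiment("pod-kill-demo", "staging", "web")
print(json.dumps(manifest, indent=2))
```

Keeping the selector narrow (one namespace, one label) is the simplest way to keep a first experiment from touching unrelated workloads.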
Architecture
● Chaos Dashboard: The visualization component of Chaos Mesh. Chaos Dashboard
offers a set of user-friendly web interfaces through which users can manipulate
and observe Chaos experiments. At the same time, Chaos Dashboard also
provides an RBAC permission management mechanism.
● Chaos Controller Manager: The core logical component of Chaos Mesh. Chaos
Controller Manager is primarily responsible for the scheduling and management of
Chaos experiments. This component contains several CRD Controllers, such as
Workflow Controller, Scheduler Controller, and Controllers of various fault types.
● Chaos Daemon: The main executive component. Chaos Daemon runs as a
DaemonSet and has Privileged permission by default (which can be
disabled). This component mainly interferes with specific network devices, file
systems, and kernels by entering the namespaces of the target Pod.
Features
● Provides different types of simulated faults, such as container failure, network
corruption, network delay, and kernel errors.
● Also provides faults specific to cloud platforms such as AWS and GCP.
● A chaos daemon can also be run on remote physical hosts and used to inject
simulated faults into those nodes.
● We can run single experiments, or combine experiments to form
a workflow chain.
● We can also run experiments on a recurring schedule.
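A recurring experiment might look like the Python sketch below, which wraps a network delay fault in a Schedule object and prints it as JSON (also valid YAML). The field names follow the chaos-mesh.org/v1alpha1 API as commonly documented and should be verified against the installed version. The cron expression, latency, namespace and label are invented for illustration, and the helpers network_delay_spec and recurring are hypothetical.

```python
import json


def network_delay_spec(namespace, app_label, latency="100ms"):
    """Spec of a NetworkChaos fault that delays traffic of the matched pods."""
    return {
        "action": "delay",
        "mode": "all",
        "selector": {
            "namespaces": [namespace],
            "labelSelectors": {"app": app_label},
        },
        "delay": {"latency": latency},
        "duration": "30s",  # how long each run of the fault lasts
    }


def recurring(name, cron, chaos_type, chaos_spec):
    """Wrap a fault spec in a Schedule so that it runs on a cron schedule.

    The embedded spec sits under the camel-cased type name,
    for example NetworkChaos becomes networkChaos.
    """
    key = chaos_type[0].lower() + chaos_type[1:]
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "Schedule",
        "metadata": {"name": name},
        "spec": {
            "schedule": cron,
            "type": chaos_type,
            "historyLimit": 5,
            "concurrencyPolicy": "Forbid",  # skip a run if the previous one is still active
            key: chaos_spec,
        },
    }


# Hypothetical values, for illustration only.
schedule = recurring(
    "delay-every-10-min",
    "*/10 * * * *",
    "NetworkChaos",
    network_delay_spec("staging", "web"),
)
print(json.dumps(schedule, indent=2))
```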
Demo
Thank You!
