Chaos Engineering is the practice of intentionally introducing controlled and measurable failures into software systems to build resilience and confidence in their ability to withstand unexpected conditions.
2. #techtuesdays
Fallacies of Distributed Systems
The network is reliable
Latency is zero
Bandwidth is infinite
The network is secure
Topology doesn't change
There is one administrator
Transport cost is zero
The network is homogeneous
3. #techtuesdays
What is Chaos Engineering?
Chaos Engineering is the practice of intentionally introducing controlled and measurable failures into software systems to build resilience and confidence in their ability to withstand unexpected conditions.
4. #techtuesdays
Principles of Chaos Engineering
01 Build a hypothesis around steady-state behavior
02 Simulate real-world events
03 Run experiments in production
04 Automate experiments and run them continuously
05 Minimize blast radius
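A minimal Python sketch of how these principles can fit together in one experiment loop. Everything here is hypothetical and for illustration only: `run_experiment`, `ToyCheckout`, and the 0.99 success-rate threshold are assumptions, not part of any real tool. The fault is limited to a single dependency and is always rolled back, which is one simple way to keep the blast radius small.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ExperimentResult:
    ran: bool              # False if the system was unhealthy before we started
    hypothesis_held: bool  # True if steady state survived the injected fault


def run_experiment(
    steady_state: Callable[[], bool],
    inject_fault: Callable[[], None],
    rollback: Callable[[], None],
) -> ExperimentResult:
    """Check steady state, inject one fault, re-check, and always roll back."""
    if not steady_state():
        # Never start an experiment on a system that is already degraded.
        return ExperimentResult(ran=False, hypothesis_held=False)
    inject_fault()
    try:
        held = steady_state()
    finally:
        rollback()  # limit the blast radius: undo the fault no matter what
    return ExperimentResult(ran=True, hypothesis_held=held)


class ToyCheckout:
    """Hypothetical service that depends on a cache, optionally with a fallback."""

    def __init__(self, has_fallback: bool) -> None:
        self.cache_up = True
        self.has_fallback = has_fallback

    def success_rate(self) -> float:
        return 1.0 if (self.cache_up or self.has_fallback) else 0.0


def experiment_for(service: ToyCheckout) -> ExperimentResult:
    """Hypothesis: success rate stays at or above 0.99 while the cache is down."""

    def steady_state() -> bool:
        return service.success_rate() >= 0.99

    def inject_fault() -> None:
        service.cache_up = False

    def rollback() -> None:
        service.cache_up = True

    return run_experiment(steady_state, inject_fault, rollback)
```

Running `experiment_for` on a schedule against a service with and without a fallback shows the difference between a hypothesis that holds and one that is disproved.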
5. #techtuesdays
Types of Chaos Engineering Experiments
Game Days
Latency Injection
Infrastructure Failure
Volume Testing
Latency injection is the deliberate introduction of a delay in system response times to understand the impact of degraded performance.
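One way to picture latency injection in application code is a decorator that delays a call before letting it through. This is a sketch under stated assumptions: `inject_latency` and its parameters are hypothetical names, and real setups often inject delay at the network or proxy layer instead.

```python
import functools
import random
import time


def inject_latency(delay_seconds, probability=1.0, rng=None, sleep=time.sleep):
    """Delay the wrapped call by delay_seconds with the given probability.

    rng and sleep are injectable so that experiments and tests can be
    made deterministic; both are assumptions of this sketch.
    """
    rng = rng or random.Random()

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Only a fraction of calls are slowed down, which keeps the
            # experiment's impact bounded.
            if rng.random() < probability:
                sleep(delay_seconds)
            return func(*args, **kwargs)

        return wrapper

    return decorator
```

Applying `@inject_latency(0.25, probability=0.1)` to a client function would slow roughly one call in ten by 250 ms, which is enough to see whether callers time out, retry, or degrade gracefully.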
A game day is a simulation of a disaster scenario, designed to test the system's resiliency.
Volume testing is the process of increasing the volume of traffic to the system to assess its response to high levels of traffic.
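The ramp-up idea behind volume testing can be sketched as a loop that raises the request rate step by step and stops once the error rate crosses a threshold. The `toy_system` model (a fixed capacity of 500 requests per second) and the 1% threshold are assumptions made up for this illustration; a real test would drive actual traffic with a load-testing tool.

```python
def toy_system(requests_per_second, capacity=500):
    """Hypothetical system: every request beyond capacity fails."""
    return max(0, requests_per_second - capacity)


def ramp_load(system, steps, max_error_rate=0.01):
    """Increase load step by step; stop after the first step that breaches
    max_error_rate. Returns a list of (requests_per_second, error_rate)."""
    results = []
    for rps in steps:
        failed = system(rps)
        error_rate = failed / rps
        results.append((rps, error_rate))
        if error_rate > max_error_rate:
            break  # do not keep pushing a system that is already failing
    return results
```

Calling `ramp_load(toy_system, [100, 250, 500, 1000, 2000])` stops at the 1000 step, where half of the requests fail in this toy model.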
Simulating infrastructure failures, such as server crashes or network outages, can help to identify weaknesses in the system's failover processes.
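A small sketch of what such an experiment checks: when the primary node is taken down, requests should be served by a standby. The `Node` class and `send_with_failover` function are hypothetical and stand in for whatever failover mechanism the real system uses.

```python
class Node:
    """Hypothetical server that can be marked as crashed."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


def send_with_failover(nodes, request):
    """Try each node in order and return the first successful response."""
    last_error = None
    for node in nodes:
        try:
            return node.handle(request)
        except ConnectionError as exc:
            last_error = exc  # remember the failure and try the next node
    raise RuntimeError("all nodes failed") from last_error
```

Setting `primary.alive = False` before calling `send_with_failover([primary, standby], "order-1")` simulates a server crash; if the call raises instead of returning a response from the standby, the failover path has a weakness.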
7. #techtuesdays
Benefits of Using Chaos Engineering for Software Testing
Improved System Resilience
Chaos Engineering helps teams identify and resolve vulnerabilities in a system, making it more resilient to unexpected conditions.
Increased Customer Confidence
By improving system reliability and reducing downtime, Chaos Engineering can increase customer confidence in your product or service.
Improved Team Culture
Chaos Engineering requires engineers to work together and learn from their mistakes, which can lead to an improved team culture and a stronger sense of collaboration.
8. #techtuesdays
Common Tools of Chaos Engineering
Chaos Kong
Simulates the failure of an entire AWS region.
Chaos Monkey
Randomly terminates instances in the production environment to test that the system tolerates instance failure.
Latency Monkey
Introduces artificial delays to simulate network degradation and, with long enough delays, outages.
Gremlin
A commercial chaos engineering platform that works with AWS and Kubernetes.
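To make the idea of random instance termination concrete, here is a toy sketch in Python. It is not how Chaos Monkey or any of the tools above are implemented: `pick_victim`, `terminate_random_instance`, and the in-memory `fleet` set are hypothetical, and a real tool would call a cloud provider's API rather than remove a name from a set.

```python
import random


def pick_victim(instances, protected=frozenset(), rng=None):
    """Choose one instance at random, skipping any that are opted out."""
    rng = rng or random.Random()
    # Sort so that a seeded rng gives a reproducible choice.
    candidates = sorted(i for i in instances if i not in protected)
    if not candidates:
        return None
    return rng.choice(candidates)


def terminate_random_instance(fleet, protected=frozenset(), rng=None):
    """Remove one unprotected instance from the fleet and return its name."""
    victim = pick_victim(fleet, protected, rng)
    if victim is not None:
        fleet.remove(victim)  # stand-in for a real "terminate instance" call
    return victim
```

The `protected` set is one simple way to express an opt-out list, so that critical instances are never selected while the rest of the fleet is still exercised.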
9. #techtuesdays
Thank you for your time.
Contact us: hello@gleecus.com
www.Gleecus.com