2. What is Chaos Engineering? (for new attendees)
● Murphy’s law: anything that can go wrong will go wrong.
● Builds confidence in a distributed system’s ability to withstand turbulent and unexpected conditions.
● Proactively exposes weaknesses in complex systems.
● Less downtime -> fewer SLA breaches -> less revenue loss.
● Improves the resilience of the system. Key areas:
○ Infrastructure Failures
○ Network Failures
○ Application Failures
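To make the "less downtime -> fewer SLA breaches" chain concrete, the sketch below converts an availability target into an allowed-downtime budget. The 30-day period and the example targets are illustrative assumptions, not figures from this talk.

```python
# Minimal sketch: turn an availability SLA target into an allowed-downtime budget.
# The 30-day period and the targets in the demo loop are illustrative assumptions.

def downtime_budget_minutes(availability_pct: float, period_days: float = 30) -> float:
    """Return the minutes of downtime allowed in `period_days` at the given availability."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60  # 43,200 minutes in 30 days
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        print(f"{target}% -> {downtime_budget_minutes(target):.1f} min per 30 days")
```

At 99.9% the whole monthly budget is about 43 minutes, so a single unrehearsed outage can exhaust it; finding weaknesses ahead of time is what protects that budget.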
3. What is Gremlin?
● A Failure/Resiliency-as-a-Service (SaaS) platform.
● Safely, securely and simply simulates real-world outages.
● Supports a variety of failure vectors.
● Failures can be injected at the infrastructure and application layers.
● Failures can be triggered on demand or on a schedule.
● Available as a free version (with limitations) and an organization version.
4. Gremlin Failure Vectors
● Resources: starve your system’s critical resources.
○ CPU, Memory, IO, Disk
● State: change the state of the system your application is running on.
○ Shutdown, Time Travel, Process Killer
● Network: simulate unreliable network behavior.
○ Blackhole, Latency (egress), Packet Loss (egress), DNS
● Requests: impact individual requests as they hit the wire.
○ Latency, Yes/No Switch
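The "Requests" vector acts on individual calls rather than on whole hosts. The sketch below illustrates that idea with an application-level wrapper that can add latency to a call or make it fail outright. It is a conceptual illustration only: `FaultConfig`, `inject_faults`, `InjectedFault` and `get_price` are hypothetical names and are not Gremlin's client library or API.

```python
import functools
import time
from dataclasses import dataclass


@dataclass
class FaultConfig:
    """Hypothetical toggle for a request-level attack (not Gremlin's API)."""
    enabled: bool = False
    latency_s: float = 0.0  # extra delay added before the real call runs
    fail: bool = False      # yes/no switch: when True, the call raises instead of running


class InjectedFault(Exception):
    """Raised in place of the real call when the fail switch is on."""


def inject_faults(config: FaultConfig):
    """Wrap a function so that, while enabled, calls are delayed and/or failed."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if config.enabled:
                if config.latency_s > 0:
                    time.sleep(config.latency_s)  # simulated slow request
                if config.fail:
                    raise InjectedFault(f"injected failure in {func.__name__}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


config = FaultConfig()


@inject_faults(config)
def get_price(item: str) -> float:
    # Hypothetical stand-in for a real downstream request.
    return {"apple": 1.25}.get(item, 0.0)
```

Because the config object is shared and mutable, an experiment can switch latency or failure on and off at runtime without redeploying, and turning `enabled` off restores normal behavior immediately.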
5. Gremlin Architecture
● Gremlin clients
● Clients are installed in your infrastructure or in your application.
● Clients communicate back and forth with the Gremlin platform.
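The client/platform split can be pictured as a control plane holding pending attacks and clients that fetch work and report results. The sketch below assumes a simple pull model (the client polls, runs, then reports), which is an assumption made for illustration rather than a description of Gremlin's actual protocol; `ControlPlane`, `Client`, `Attack`, `poll` and `report` are all hypothetical.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Attack:
    target: str  # id of the client that should run the attack
    kind: str    # e.g. "cpu", "latency"
    params: dict = field(default_factory=dict)


class ControlPlane:
    """In-memory stand-in for the SaaS platform (hypothetical)."""

    def __init__(self) -> None:
        self.pending: dict[str, deque] = {}
        self.reports: list[tuple[str, str, str]] = []

    def schedule(self, attack: Attack) -> None:
        self.pending.setdefault(attack.target, deque()).append(attack)

    def poll(self, client_id: str) -> Attack | None:
        queue = self.pending.get(client_id)
        return queue.popleft() if queue else None

    def report(self, client_id: str, attack: Attack, status: str) -> None:
        self.reports.append((client_id, attack.kind, status))


class Client:
    """Client installed on a host or in an application; pulls work from the platform."""

    def __init__(self, client_id: str, platform: ControlPlane) -> None:
        self.client_id = client_id
        self.platform = platform

    def run_once(self) -> bool:
        """Fetch and handle one attack; return False when there is nothing to do."""
        attack = self.platform.poll(self.client_id)
        if attack is None:
            return False
        # A real client would execute the attack here; this sketch only reports it.
        self.platform.report(self.client_id, attack, "completed")
        return True
```

A pull model like this keeps the connection outbound from the client, which is a common reason for choosing it when clients sit inside private infrastructure.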
7. Gremlin Scenarios
● A scenario is a set of Gremlin attacks.
● Each scenario can define a name, a description and a hypothesis.
● Reusable.
● Stores the results of triggered scenarios along with their run history.
● Gremlin offers a few pre-configured scenarios.
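The scenario properties listed above map naturally onto a small data model: a named, reusable list of attack steps plus a hypothesis and an accumulating run history. The class and field names here (`Scenario`, `AttackStep`, `ScenarioRun`, `record_run`) are hypothetical and only mirror the bullets; they are not Gremlin's schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AttackStep:
    kind: str        # e.g. "latency", "cpu", "blackhole"
    target: str      # host, container or service to attack
    duration_s: int


@dataclass
class ScenarioRun:
    started_at: datetime
    hypothesis_held: bool
    notes: str = ""


@dataclass
class Scenario:
    name: str
    description: str
    hypothesis: str
    steps: list[AttackStep]
    history: list[ScenarioRun] = field(default_factory=list)

    def record_run(self, hypothesis_held: bool, notes: str = "") -> ScenarioRun:
        """Append the outcome of one execution; the scenario itself is reused unchanged."""
        run = ScenarioRun(datetime.now(timezone.utc), hypothesis_held, notes)
        self.history.append(run)
        return run
```

For example, a scenario named "Checkout latency" with the hypothesis "checkout p99 stays under 2 s" can be run every release, and its history then shows when the hypothesis stopped holding.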
8. Users and Teams
● Company
● Teams
● Users
● Role-based access control (RBAC) at company and team level
● Authentication
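Company-level and team-level RBAC can be sketched as two role maps consulted together: an action is allowed if either the user's company role or their role in the relevant team is high enough. The role names and the action-to-role mapping are hypothetical and not Gremlin's actual permission model; authentication is assumed to have already identified the user.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    VIEWER = 1
    USER = 2
    ADMIN = 3


# Hypothetical action -> minimum role mapping (not Gremlin's actual roles).
REQUIRED_ROLE = {
    "view_attacks": Role.VIEWER,
    "run_attack": Role.USER,
    "manage_team": Role.ADMIN,
}


@dataclass
class Company:
    name: str
    company_roles: dict[str, Role] = field(default_factory=dict)          # user -> company-wide role
    team_roles: dict[str, dict[str, Role]] = field(default_factory=dict)  # team -> user -> role

    def is_allowed(self, user: str, team: str, action: str) -> bool:
        """Allow the action if the company-level or team-level role is high enough."""
        needed = REQUIRED_ROLE[action]
        roles = [self.company_roles.get(user), self.team_roles.get(team, {}).get(user)]
        return any(r is not None and r.value >= needed.value for r in roles)
```

Checking both maps lets a company-wide admin act in every team while ordinary users only get rights in the teams they belong to.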