Enterprise resilience patterns

@seva_dolgopolov

2 Approaches
“Defensive coding” vs. “Let it crash”

Disaster Math
Defensive coding
Let it crash

Patterns
Exceptions, Timeout, Circuit Breaker, Handshaking, Bulkhead,
Health checks, Heartbeat, Retry, Rollback, Reset, Failover,
Fallback, Backpressure, Bounded Queue, Load Balancing,
Dead Letter, Supervision, Governor, ...
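As an illustration of one pattern from this list, here is a minimal sketch of Retry with exponential backoff. The names `retry` and `Flaky` are hypothetical and not taken from the talk; the delays are kept tiny so the example runs quickly.

```python
import time


def retry(operation, attempts=3, base_delay=0.01, sleep=time.sleep):
    """Call `operation` until it succeeds or `attempts` calls have failed."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the failure
            # Back off before the next try: 1x, 2x, 4x, ... the base delay.
            sleep(base_delay * 2 ** (attempt - 1))


class Flaky:
    """Hypothetical dependency that fails a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("transient failure")
        return "ok"


flaky = Flaky(failures=2)
result = retry(flaky, attempts=3)
print(result, flaky.calls)  # ok 3
```

The bound on attempts matters: an unbounded retry loop can amplify an outage instead of riding it out, which is why Retry usually appears together with Timeout and Circuit Breaker.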

Infrastructure View
Hardware | Process | Network
- Exceptions, Health checks, Heartbeat, Dead Letter, Retry, Rollback, Reset, Governor, Supervision
- Timeout, Handshaking, Backpressure, Fallback, Circuit Breaker
- Load Balancing, Bulkhead, Failover

Employer View
- Ops: Failover, Load balancing, Supervisor, Health checks
- Dev: Fallback, Bulkhead, Timeout, Circuit Breaker, Handshaking, Backpressure, Retry, Rollback, Reset, Supervisor, Bounded Queue, Dead Letter

Application View
- Deployment: Failover, Load balancing, Supervisor, Bulkhead
- Detection: Health checks, Timeout, Handshaking, Supervisor, Dead Letter, Heartbeat
- Repair: Fallback, Circuit Breaker, Backpressure, Retry, Rollback, Reset, Bounded Queue

Akka
Deployment
- Supervisor
- Bulkhead (as Actor)
-> Detection -> Repair
- Heartbeat
- Dead letters
- Timeouts
- Restart
- Fallback
- Backpressure (akka-stream)
- Failover (akka-cluster)
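The supervision idea behind "Let it crash" can be sketched without Akka. The following is plain Python and deliberately not Akka's API: `Supervisor`, `Counter`, `tell`, and `max_restarts` are hypothetical names chosen for illustration. The worker contains no defensive checks; the supervisor parks the offending message in a dead-letter list and restarts the worker with fresh state.

```python
class Supervisor:
    """Restarts a crashed worker instead of making the worker defend itself."""

    def __init__(self, worker_factory, max_restarts=3):
        self.worker_factory = worker_factory
        self.max_restarts = max_restarts
        self.restarts = 0
        self.dead_letters = []
        self.worker = worker_factory()

    def tell(self, message):
        try:
            return self.worker.handle(message)
        except Exception:
            # The message that crashed the worker is parked, not retried.
            self.dead_letters.append(message)
            if self.restarts >= self.max_restarts:
                raise  # restart budget exhausted: escalate to the caller
            self.restarts += 1
            self.worker = self.worker_factory()  # replace with fresh state
            return None


class Counter:
    """Hypothetical worker with no defensive code: bad input simply crashes it."""

    def __init__(self):
        self.total = 0

    def handle(self, message):
        self.total += int(message)  # raises ValueError on garbage
        return self.total


supervisor = Supervisor(Counter)
replies = [supervisor.tell(m) for m in ["1", "2", "oops", "5"]]
print(replies, supervisor.dead_letters, supervisor.restarts)
# [1, 3, None, 5] ['oops'] 1
```

Note that the restart discards the running total, so the reply after the crash is 5 rather than 8. Whether losing state on restart is acceptable is a design decision this sketch leaves open.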

Netflix
Deployment
- Bulkhead (as Microservice)
-> Detection -> Repair
- Heartbeat
- Timeouts
- Circuit Breaker (Hystrix)
- Fallback
- Retry (Ribbon)
- Failover (Eureka)
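To make the Circuit Breaker and Fallback pairing concrete, here is a minimal sketch in plain Python. It is not Hystrix's API: the class, its parameters (`failure_threshold`, `reset_timeout`, `clock`), and the demo functions are assumptions made for illustration. The clock is injectable so the example runs without waiting.

```python
import time


class CircuitBreaker:
    """Closed -> open after repeated failures -> one trial call after a timeout."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()  # open: fail fast, do not touch the dependency
            # Timeout elapsed: half-open, let this one call through as a trial.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result


now = [0.0]  # fake clock so the demo does not sleep
calls = []


def broken():
    calls.append("broken")
    raise TimeoutError("dependency timed out")


def healthy():
    calls.append("healthy")
    return "live"


def cached():
    return "cached"


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])
results = [breaker.call(broken, cached) for _ in range(3)]
now[0] = 11.0  # move past the reset timeout
results.append(breaker.call(healthy, cached))
print(results)  # ['cached', 'cached', 'cached', 'live']
print(calls)    # ['broken', 'broken', 'healthy']
```

The third call never reaches the dependency: once the circuit is open, callers get the fallback immediately, which protects both the caller's latency and the struggling service.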

A few things
1. Resilience will shape the way you implement your business logic
2. It adds another level of complexity

Objectives
- A staging environment will never be a full-sized replica of production
- Safety is not a composable property

Chaos Engineering
http://principlesofchaos.org/

Netflix “Chaos monkey”
If something hurts, do it more often.

Enrolling Chaos
Opt Out vs. Opt In

Targeting chaos
Random vs. Prespecified
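The two choices above (who is enrolled, and how targets are picked) can be sketched as two small functions. Everything here is hypothetical: the `Service` fields, the policy strings, and the service names are illustrative assumptions, not part of any chaos tool.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    opted_in: bool = False
    opted_out: bool = False


def enrolled(services, policy):
    """Opt-in: only volunteers take part. Opt-out: everyone unless they decline."""
    if policy == "opt-in":
        return [s for s in services if s.opted_in]
    if policy == "opt-out":
        return [s for s in services if not s.opted_out]
    raise ValueError(f"unknown policy: {policy!r}")


def pick_targets(candidates, count=1, prespecified=None, rng=random):
    """Prespecified: target the named services. Random: sample from the candidates."""
    if prespecified is not None:
        return [s for s in candidates if s.name in prespecified]
    return rng.sample(candidates, min(count, len(candidates)))


fleet = [
    Service("billing", opted_out=True),
    Service("search", opted_in=True),
    Service("catalog"),
]

print([s.name for s in enrolled(fleet, "opt-in")])   # ['search']
print([s.name for s in enrolled(fleet, "opt-out")])  # ['search', 'catalog']

candidates = enrolled(fleet, "opt-out")
print([s.name for s in pick_targets(candidates, prespecified={"catalog"})])  # ['catalog']
random_targets = pick_targets(candidates, count=1, rng=random.Random(42))
```

Opt-out enrolment covers more of the system by default, while opt-in keeps the blast radius limited to teams that have agreed to take part.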

Playing Chaos
1. Define a “Steady State”
2. Hypothesize that the steady state will not change, for example:
“Clustered services should be unaffected by instance failures”
“The application is responsive even under high latency conditions”
3. Inject “Chaos”
4. Verify your hypothesis
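The four steps above can be sketched as a tiny experiment harness. This is a toy under stated assumptions: `Cluster` is a hypothetical in-memory stand-in for a real system, the steady state is taken to be the request success rate, and the `tolerance` value is arbitrary.

```python
class Cluster:
    """Hypothetical in-memory cluster: any healthy instance can serve a request."""

    def __init__(self, instances):
        self.healthy = set(instances)

    def kill(self, instance):
        self.healthy.discard(instance)

    def handle(self, request):
        if not self.healthy:
            raise RuntimeError("no healthy instances")
        return f"{request} served by {sorted(self.healthy)[0]}"


def success_rate(cluster, requests=100):
    """The steady-state metric assumed here: fraction of requests that succeed."""
    ok = 0
    for i in range(requests):
        try:
            cluster.handle(f"req-{i}")
            ok += 1
        except RuntimeError:
            pass
    return ok / requests


def run_experiment(cluster, inject, tolerance=0.01):
    steady = success_rate(cluster)    # 1. define the steady state
    # 2. hypothesis: the metric stays within `tolerance` of the steady state
    inject(cluster)                   # 3. inject chaos
    observed = success_rate(cluster)  # 4. verify the hypothesis
    return {
        "steady": steady,
        "observed": observed,
        "hypothesis_holds": steady - observed <= tolerance,
    }


clustered = run_experiment(Cluster(["a", "b", "c"]), lambda c: c.kill("a"))
single = run_experiment(Cluster(["a"]), lambda c: c.kill("a"))
print(clustered["hypothesis_holds"])  # True
print(single["hypothesis_holds"])     # False
```

The clustered run supports the hypothesis "Clustered services should be unaffected by instance failures", while the single-instance run refutes it. A refuted hypothesis is the useful outcome: it points at a weakness before a real outage does.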

Takeaways
- There is no silver bullet for achieving a safe system
- Implementing resilience brings complexity, and you need to manage it
- Only testing in production makes you confident
