Resilience engineering

Application Resiliency
Yet Another Resilience Framework
Inspired By Netflix Hystrix

Resilience:
Systems easily “drift” from a state of resilience,
and failure can emerge from component
relationships. Thus, applications (as components
of a complex system) must be resilient to latency
and failure in all of their system relationships, and
must not rely upon infrastructure alone to implement
this resilience.

Philosophy
• Embrace failure as a natural state in the life cycle
of the application
• Instead of trying to prevent it, manage it
• Make developers responsible for resiliency
• Process supervision
• Supervisor hierarchies

Resiliency Patterns
• Bulkheads
– Workload isolation with thread pools
• Dataflow Concurrency (Promises)
• Retry On Failure
• Timeouts
• Circuit Breaker
• Fallback
• Governor
– Overload Protection
– Throttling of concurrency and request rate
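
The bulkhead and governor patterns listed above can be sketched as follows. This is a minimal illustration in Python, not the framework's own code: the class names `Bulkhead`, `Governor` and `OverloadError` are assumptions made for this example, and the governor covers only the concurrency limit, not the rate limit.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class OverloadError(Exception):
    """Raised when the governor rejects a call (hypothetical name)."""


class Bulkhead:
    """Isolates one dependency's workload in its own bounded thread pool."""

    def __init__(self, name, max_workers):
        self.name = name
        self._pool = ThreadPoolExecutor(
            max_workers=max_workers, thread_name_prefix=name
        )

    def submit(self, fn, *args, **kwargs):
        # Work queues behind this pool only; other bulkheads are unaffected.
        return self._pool.submit(fn, *args, **kwargs)

    def shutdown(self):
        self._pool.shutdown(wait=True)


class Governor:
    """Overload protection: rejects calls beyond a concurrency limit."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, **kwargs):
        # Non-blocking acquire: reject immediately instead of queueing.
        if not self._slots.acquire(blocking=False):
            raise OverloadError("too many concurrent calls")
        try:
            return fn(*args, **kwargs)
        finally:
            self._slots.release()
```

With one `Bulkhead` per dependency, a slow dependency can exhaust only its own threads, while the `Governor` sheds load before it reaches any pool.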

Async task orchestration with Promises
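
As a rough sketch of this idea, the example below uses futures from Python's `concurrent.futures` in place of promises. The function names mirror the Order Service calls shown later in the deck; their bodies and return values are stubs invented for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def get_order(order_id):
    # Stub standing in for a remote call (assumed data).
    return {"id": order_id, "customer": "c-1"}


def get_billing_info(order):
    return {"customer": order["customer"], "method": "card"}


def get_shipping_info(order):
    return {"customer": order["customer"], "city": "Springfield"}


def load_order_view(order_id, pool, timeout_s=1.0):
    # Billing and shipping both depend on the order, so fetch it first.
    order = pool.submit(get_order, order_id).result(timeout=timeout_s)
    # The two dependent calls are independent of each other: run them
    # concurrently and join on both results.
    billing = pool.submit(get_billing_info, order)
    shipping = pool.submit(get_shipping_info, order)
    return {
        "order": order,
        "billing": billing.result(timeout=timeout_s),
        "shipping": shipping.result(timeout=timeout_s),
    }
```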

Fail fast (Timeout)
• Avoid “slow responses”
• Separate:
– SystemError - resources not available
– ApplicationError - bad user input etc
• Verify resource availability before starting
expensive task
• Validate input immediately
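
The fail-fast points above might look like this in Python. It is a sketch under assumptions: `SystemFault` and `ApplicationFault` stand in for the slide's SystemError and ApplicationError (renamed because Python already has a built-in `SystemError`), and `place_order`, `resource_available` and `task` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


class SystemFault(Exception):
    """Resources not available (the slide's SystemError)."""


class ApplicationFault(Exception):
    """Bad user input and similar (the slide's ApplicationError)."""


def place_order(quantity, resource_available, task, pool, timeout_s=0.5):
    # 1. Validate input immediately, before any work is scheduled.
    if not isinstance(quantity, int) or quantity <= 0:
        raise ApplicationFault("quantity must be a positive integer")
    # 2. Verify resource availability before starting the expensive task.
    if not resource_available():
        raise SystemFault("dependency not available")
    # 3. Bound the wait so the caller never sees a "slow response".
    future = pool.submit(task, quantity)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        future.cancel()  # best effort; a task that is already running continues
        raise SystemFault(f"no response within {timeout_s}s") from None
```

Keeping the two error types separate lets callers report bad input to the user while treating resource problems as candidates for retry or fallback.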

Retry on Failure
Improve user experience and application resiliency by
retrying a dependency call if the error is recoverable
• Retry policy per service call, or global defaults
• Transient errors should be retried
[Diagram: an Application Service makes a Dependent Service Call
through the Framework; on transient exception failures the
Framework retries the call (attempts 1 and 2).]
Policy: Service 1
– Exception list: Error 1, Error 2
– Retry = Y
– Retry Cnt = 2
– Delay = 50ms
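
A retry policy like the one above could be sketched as follows. The defaults mirror the slide (retry count 2, delay 50 ms); `RetryPolicy`, `TransientError` and `call_with_retry` are names assumed for this example, not the framework's API.

```python
import time
from dataclasses import dataclass


class TransientError(Exception):
    """Placeholder for a recoverable error (hypothetical name)."""


@dataclass(frozen=True)
class RetryPolicy:
    retry_on: tuple = (TransientError,)  # the policy's exception list
    retry_count: int = 2                 # retries after the first attempt
    delay_s: float = 0.05                # 50 ms between attempts


def call_with_retry(fn, policy=RetryPolicy(), sleep=time.sleep):
    retries_used = 0
    while True:
        try:
            return fn()
        except policy.retry_on:
            # Only errors in the exception list are retried; anything
            # else propagates to the caller on the first failure.
            if retries_used >= policy.retry_count:
                raise
            retries_used += 1
            sleep(policy.delay_s)
```

A per-service policy is then just a different `RetryPolicy` instance passed at the call site, with the default instance acting as the global default.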

Our Approach
[Diagram: component order, numbered 1 to 6. Labels shown:
Order Service (getBillingInfo(), getOrder(), getShippingInfo()),
Governor, bulkhead (Thread Pool) ×2, Fallback, CircuitBreaker ×2,
Retry ×2, Timeout ×2, Primary Service, Alternate Service.]
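
Two of the components named here, the circuit breaker and the fallback, can be sketched as below. This is a simplified, single-threaded illustration; the class and parameter names, the consecutive-failure threshold and the reset timeout are assumptions of the example, and the wiring shown is not a claim about the exact component order in the original diagram.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected without reaching the service."""


class CircuitBreaker:
    # Not thread-safe: a real implementation would guard this state.
    def __init__(self, failure_threshold=3, reset_timeout_s=5.0,
                 clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout_s:
                raise CircuitOpenError("circuit is open")
            # Reset timeout elapsed: let one trial call through (half-open).
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()  # open, or re-open after a trial
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def with_fallback(primary, alternate):
    """Calls the primary; on any failure, returns the alternate's result."""
    try:
        return primary()
    except Exception:
        return alternate()
```

Usage would be along the lines of `with_fallback(lambda: breaker.call(primary_service), alternate_service)`, so that an open circuit is answered by the alternate service without touching the failing primary.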

Future Patterns
• Request Caching
• Request Collapsing
– A mechanism that combines multiple requests
into a single backend dependency call
– The primary driver is to reduce the number of threads and network
connections needed for concurrent command executions, and to do so automatically,
without forcing all developers of a codebase to coordinate manual batching of requests
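
Since request collapsing is listed as a future pattern, the following is only a sketch of how it might work, written with Python's `asyncio`. The `Collapser` class, the `batch_fn` callback (assumed to take a list of keys and return a dict of results) and the 10 ms window are all assumptions of this example.

```python
import asyncio


class Collapser:
    """Combines requests made within a short window into one batch call."""

    def __init__(self, batch_fn, window_s=0.01):
        self._batch_fn = batch_fn  # async: list of keys -> {key: result}
        self._window_s = window_s
        self._pending = {}         # key -> Future shared by all callers
        self._flush_task = None

    async def get(self, key):
        fut = self._pending.get(key)
        if fut is None:
            fut = asyncio.get_running_loop().create_future()
            self._pending[key] = fut
            if self._flush_task is None:
                # First request of a new window schedules the batch call.
                self._flush_task = asyncio.create_task(self._flush())
        return await fut

    async def _flush(self):
        await asyncio.sleep(self._window_s)
        pending, self._pending = self._pending, {}
        self._flush_task = None
        try:
            results = await self._batch_fn(list(pending))
            for key, fut in pending.items():
                fut.set_result(results[key])
        except Exception as exc:
            # Fail every caller that has not already received a result.
            for fut in pending.values():
                if not fut.done():
                    fut.set_exception(exc)
```

Callers simply `await collapser.get(key)`; the batching happens behind that call, which is the "automated manner" the slide refers to.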

Getting Started
Demo on YouTube
– http://youtu.be/ZyeEdjufSHE
Code on GitHub
– https://github.com/xmlking/Resilience
Follow me on Twitter
– @xmlking
