
[WSO2Con EU 2017] Resilience Patterns with Ballerina

Today almost all systems are distributed and interact with each other in complex ways to provide useful functionality. In a software system, resilience is the ability to recover to a working condition after being affected by a serious incident. Ballerina has built-in functionality that makes programs resilient to network failures. This slide deck explores how to build resilience patterns with Ballerina.

  1. Resilience Patterns with Ballerina. Isuru Udana, Senior Technical Lead, WSO2
  2. Bob is a developer who works in the IT department of a popular bank.
  3. Bob was asked to develop a mobile self-care banking application.
  4. The mobile application should be able to show account balances as well as details of past transactions.
  5. Legacy services are already available.
  6. But when he started to implement the application, Bob found issues with some of the legacy services: transient network failures, services that can only handle moderate load, and intermittent failures.
  7. Bob found it very difficult to build a reliable application.
  8. Bob got an idea! A resilient service.
  9. Resilience
  10. What is Resilience? The ability to return to the original form or position after being affected by a particular alteration.
  11. Resilience in Software Applications: In a software system, resilience means the ability to recover to a working condition after being affected by a serious incident.
  12. Reliability and Resilience. Reliability: “The probability of failure-free software operation for a specified period of time in a specified environment.” (The IEEE Reliability Society) • 100% operational all the time
  13. Is Focusing on Reliability Enough...?
  14. Why Focusing on Reliability is Not Enough: Systems are complex and prone to failures. Distributed and complex systems with many interactions are prone to failures.
  15. Why Focusing on Reliability is Not Enough: Avoiding failures is not practical. • Untested corner cases • Minor mistakes can cause serious production incidents • Failures are unpredictable
  16. Resilience in Production: • Handle unexpected situations • When one feature is temporarily unavailable, the rest of the application still runs • Stop errors that happen downstream in a complex system from propagating upstream
  17. It’s All About Achieving Availability of a Production System!
  18. What Does it Mean to a User? Best case: • Users get 100% availability of the service. Typical case: • Users see a graceful degradation of the service.
  19. What Does it Mean to a Developer? • Never expect systems to be 100% reliable • Design systems with connection issues, downtime, etc. in mind
  20. Resilience Patterns: • Bulkhead • Retry • Circuit Breaker • Timeout • …
  21. Bulkhead Isolation: Isolate components of an application into multiple pools. If one component fails, the others continue to serve requests.
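A minimal Python sketch of bulkhead isolation, assuming one bounded slot pool per downstream component. It is not Ballerina's API; the `Bulkhead` class, the component names and the pool sizes are hypothetical. When a pool is full the call is rejected immediately, so a hung Account Balance backend cannot use up the capacity needed to call the Account History backend.

```python
import threading
from typing import Any, Callable


class Bulkhead:
    """A fixed number of concurrent slots reserved for one component (hypothetical)."""

    def __init__(self, name: str, max_concurrent: int) -> None:
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        # Reject at once when this component's pool is exhausted instead of
        # queueing, so a stuck component cannot tie up every caller.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError(f"bulkhead '{self.name}' is full")
        try:
            return fn(*args, **kwargs)
        finally:
            self._slots.release()


# One isolated pool per downstream component (sizes are illustrative).
balance_pool = Bulkhead("account-balance", max_concurrent=2)
history_pool = Bulkhead("account-history", max_concurrent=2)
```

Rejecting instead of blocking is a design choice: callers get a fast error they can degrade on, rather than waiting behind a component that may never recover.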
  22. Retry: • Transient failures are not uncommon • They recover by themselves • They can be handled by – Cancel – Retry – Retry with a delay
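A minimal Python sketch of retry with a delay, under the assumption that transient failures are signalled by a dedicated exception type. `TransientError`, `retry`, `retry_count` and `retry_delay` are hypothetical names, not Ballerina's API. With `retry_count=0` the helper behaves like the "cancel" option: it gives up after the first failure.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures expected to clear up on their own."""


def retry(operation: Callable[[], T], retry_count: int = 3, retry_delay: float = 0.5) -> T:
    """Call `operation`, retrying up to `retry_count` extra times on TransientError.

    `retry_delay` is the pause in seconds between attempts; 0 retries immediately.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except TransientError:
            if attempt >= retry_count:
                raise  # give up: the failure did not clear within the allowed attempts
            attempt += 1
            time.sleep(retry_delay)
```

Only `TransientError` is retried; any other exception propagates on the first attempt, since retrying a permanent failure only adds load.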
  23. Timeout: • Hide downstream latency and keep the service responsive to upstream callers • Prevent waiting forever
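A minimal Python sketch of a timeout around a downstream call, assuming the call runs on a shared worker pool. `call_with_timeout` and `_executor` are hypothetical names, not Ballerina's API. The caller stops waiting after the deadline; note that in this sketch the slow call itself keeps running in the background until it finishes.

```python
import concurrent.futures

# Shared worker pool for downstream calls (an assumption of this sketch).
_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_timeout(operation, timeout_seconds: float):
    """Run `operation`, but stop waiting for it after `timeout_seconds`.

    Raises TimeoutError if the downstream call is too slow, so the caller can
    answer with an error or a fallback instead of waiting forever.
    """
    future = _executor.submit(operation)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        future.cancel()  # only takes effect if the call has not started yet
        raise TimeoutError(f"no response within {timeout_seconds}s") from None
```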
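  24. Circuit Breaker: • Some transient failures take much longer to recover from • Repeatedly retrying may hinder recovery • Retry up to a certain degree and then cut off
  25. States in a Circuit Breaker: Closed, Open and Half-Open. • Closed: a success, or a failure while the threshold is not reached, keeps it Closed; a failure that exceeds the threshold moves it to Open • Open: calls fail and it stays Open; after the reset timeout it moves to Half-Open • Half-Open: a success moves it back to Closed; a failure moves it back to Open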
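The state diagram above can be sketched in Python as a small single-threaded state machine. This is not Ballerina's implementation; `CircuitBreaker`, `CircuitOpenError`, `failure_threshold` and `reset_timeout` are hypothetical names, and the injectable `clock` is an assumption added so the reset timeout can be tested without sleeping. Here the breaker trips when the count of consecutive failures reaches `failure_threshold`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker is Open and the call is rejected without trying."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self.state = "CLOSED"
        self._failures = 0  # consecutive failures
        self._opened_at = 0.0

    def call(self, operation: Callable[[], T]) -> T:
        if self.state == "OPEN":
            if self._clock() - self._opened_at < self.reset_timeout:
                # Fail / keep Open: reject fast without touching the backend.
                raise CircuitOpenError("circuit is open")
            # Reset timeout elapsed: let a trial call through.
            self.state = "HALF_OPEN"
        try:
            result = operation()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self) -> None:
        # Success in Closed or Half-Open: (back to) Closed with a clean count.
        self.state = "CLOSED"
        self._failures = 0

    def _on_failure(self) -> None:
        self._failures += 1
        if self.state == "HALF_OPEN" or self._failures >= self.failure_threshold:
            # Trial call failed, or the threshold was reached while Closed: trip to Open.
            self.state = "OPEN"
            self._opened_at = self._clock()
```

A production breaker would also need locking and would usually admit only a limited number of trial calls while Half-Open; both are left out to keep the state transitions visible.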
  26. Resilience with Ballerina
  27. Resilience with Ballerina: • Designed to implement resilient programs/services • Highly structured error and exception handling
  28. Back to the Story
  29. [Diagram: Mobile Application → Banking Service → Account Balance Service and Account History Service] The legacy services can only handle moderate load, have transient failures, and sometimes take a long time to respond.
  30. [Diagram: the Banking Service exposes an Account Balance resource and an Account History resource, backed by the Account Balance legacy service and the Account History legacy service]
  31. [Diagram: Banking Service, Account Balance resource → Account Balance legacy service]
  32. Connectors, Connections and Endpoints (code annotations: endpoint, connection params, connector, options struct)
  33. Handling Transient Failures: Retry [Diagram: Mobile Application → Banking Service → Account Balance Service]
  34. Handling Transient Failures: Retry (code annotations: retry count, retry delay, options struct)
  35. Timeout (code annotation: timeout duration)
  36. Protect Services From Overload: Circuit Breaker (code annotations: retries before suspension, suspension duration)
  37. Applying Multiple Patterns: Circuit Breaker + Retry + Timeout
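A minimal Python sketch of combining the three patterns in one client. It is not Ballerina's connector; `ResilientClient` and its parameters are hypothetical, though they loosely mirror the options named on the slides (timeout duration, retry count, retry delay, failures before suspension, suspension duration). The layering is one possible choice: each attempt gets its own timeout, failed attempts are retried, and the whole retried call counts as a single success or failure for the circuit breaker.

```python
import concurrent.futures
import time
from typing import Any, Callable, Optional

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


class CircuitOpenError(Exception):
    """Raised while the backend is suspended (breaker Open)."""


class ResilientClient:
    def __init__(self, timeout: float, retry_count: int, retry_delay: float,
                 failure_threshold: int, reset_timeout: float) -> None:
        self.timeout = timeout
        self.retry_count = retry_count
        self.retry_delay = retry_delay
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the breaker is not Open

    def _attempt(self, operation: Callable[[], Any]) -> Any:
        # Timeout: stop waiting for a slow backend after self.timeout seconds.
        return _pool.submit(operation).result(timeout=self.timeout)

    def _with_retry(self, operation: Callable[[], Any]) -> Any:
        # Retry: up to retry_count extra attempts, pausing retry_delay between them.
        for attempt in range(self.retry_count + 1):
            try:
                return self._attempt(operation)
            except Exception:
                if attempt == self.retry_count:
                    raise
                time.sleep(self.retry_delay)

    def call(self, operation: Callable[[], Any]) -> Any:
        # Circuit breaker: reject fast while suspended; after reset_timeout the
        # next call goes through as a Half-Open trial.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("backend suspended")
        try:
            result = self._with_retry(operation)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # open, or re-open after a failed trial
            raise
        self.failures = 0
        self.opened_at = None  # success closes the breaker
        return result
```

Putting the breaker outside the retries means one suspended backend costs callers a single fast rejection rather than several timed-out attempts.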
  38. Balancing Load [Diagram: Mobile Application → Banking Service → Account Balance Service 1 and Account Balance Service 2]
  39. Balancing Load: Connectors
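A minimal Python sketch of spreading calls across Account Balance Service 1 and Service 2. Round robin is assumed here as one common strategy; the slides do not say which algorithm the Ballerina load-balancing connector uses. `RoundRobinBalancer` is a hypothetical name, and each backend is modelled as a zero-argument callable.

```python
import itertools
import threading
from typing import Any, Callable, Sequence


class RoundRobinBalancer:
    """Send each call to the next backend in turn (hypothetical helper)."""

    def __init__(self, backends: Sequence[Callable[[], Any]]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._next = itertools.cycle(backends)
        # Guard the shared iterator so concurrent callers each take the next backend.
        self._lock = threading.Lock()

    def call(self) -> Any:
        with self._lock:
            backend = next(self._next)
        return backend()  # the backend call itself runs outside the lock
```

For example, `RoundRobinBalancer([lambda: "service-1", lambda: "service-2"])` alternates between the two backends on successive calls.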
  40. Story Continued... For priority customers, the application should provide zero downtime.
  41. Offering Different Quality of Services: Bulkhead [Diagram: the Banking Service routes priority customers to a reliable service and standard customers to a service with transient failures]
  42. Offering Different Quality of Services: Bulkhead (code annotations: reliable service, service with transient issues, priority check function)
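A minimal Python sketch of the priority-based bulkhead, assuming each customer class gets its own worker pool and a priority check function picks the backend. It is not Ballerina's API; `is_priority`, `get_balance`, the two stand-in service functions, the customer IDs and the pool sizes are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Separate worker pools: a flood of standard-customer requests cannot
# exhaust the threads reserved for priority customers.
priority_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="priority")
standard_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="standard")

PRIORITY_CUSTOMERS = {"C-1001"}  # hypothetical customer IDs


def is_priority(customer_id: str) -> bool:
    """Hypothetical priority check function."""
    return customer_id in PRIORITY_CUSTOMERS


def reliable_balance_service(customer_id: str) -> str:
    # Stand-in for the reliable backend.
    return f"reliable:{customer_id}"


def standard_balance_service(customer_id: str) -> str:
    # Stand-in for the backend that has transient failures.
    return f"standard:{customer_id}"


def get_balance(customer_id: str, timeout: float = 2.0) -> str:
    # Route by customer class, each class isolated in its own pool.
    if is_priority(customer_id):
        future = priority_pool.submit(reliable_balance_service, customer_id)
    else:
        future = standard_pool.submit(standard_balance_service, customer_id)
    return future.result(timeout=timeout)
```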
  43. Conclusion: ● What is resilience ● Reliability and resilience ● Resilience patterns ● Building resilience patterns with Ballerina