Application Resiliency Patterns
Kiran Sama
ksama@visa.com
Application Resiliency Patterns | Aug 19, 2016
What is Application Resiliency?
• To recover quickly from difficulties
• To handle sustained load and spikes in traffic
• To stop cascading failures
• To degrade service gracefully
Everything Is Healthy
Resources get saturated under load
Resiliency Patterns
• Timeout
• Isolation
• Fail Fast
• Circuit Breaker
• Adaptive Throttle
• Fallback
Timeout
• Always wait with a timeout
• Preserve responsiveness regardless of dependency latency
• Measure response times
• Apply to HTTP, JDBC and LDAP calls
• Connection timeout, read timeout and connection pool timeout
• Configurable timeouts
• Self-adapting timeouts
• Base timeouts on the 99.5th-percentile response time measured at full load without failures
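A minimal Python sketch of the timeout guidance above, under stated assumptions: call_with_timeout, percentile and adaptive_timeout are hypothetical helpers, the pool size and the 50 ms floor are illustrative, and percentile uses the nearest-rank method. It is not the ResourceManager implementation; real HTTP, JDBC and LDAP clients should still set their own connect and read timeouts, because abandoning a future does not stop the underlying call.

```python
import concurrent.futures
import math

# Worker pool for outbound calls; the size of 8 is an illustrative assumption.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def call_with_timeout(fn, timeout_s, *args):
    """Run fn(*args) on the pool, but never wait longer than timeout_s seconds."""
    future = _POOL.submit(fn, *args)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # cancel() only helps if the call has not started yet; a running call
        # keeps its worker thread until it returns.
        future.cancel()
        raise


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of response times."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def adaptive_timeout(latencies_s, pct=99.5, floor_s=0.05):
    """Self-adapting timeout: the pct-th percentile of measured latencies,
    never below floor_s seconds (the floor is a hypothetical safety margin)."""
    return max(percentile(latencies_s, pct), floor_s)
```

In practice the latency samples would come from response times measured at full load, and the derived value would be recomputed periodically so the timeout keeps adapting.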
Isolation
• A single latent dependency does not affect the rest of the application
• When the dependency recovers, its thread pool clears up
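One way to sketch per-dependency isolation in Python, assuming semaphore-based bulkheads rather than the dedicated thread pools the slide describes (Hystrix supports both strategies). Bulkhead, BulkheadFull and the dependency names and limits are hypothetical. A call that finds its compartment full is rejected at once instead of tying up a shared worker, and permits are returned as calls finish, so capacity frees up once the dependency recovers.

```python
import threading


class BulkheadFull(Exception):
    """Raised when a dependency's compartment has no free slot."""


class Bulkhead:
    """Caps concurrent calls to one dependency so that a latent dependency
    cannot occupy every worker in the application."""

    def __init__(self, name, max_concurrent):
        self.name = name
        self._permits = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, **kwargs):
        # Non-blocking acquire: reject immediately when the compartment is full.
        if not self._permits.acquire(blocking=False):
            raise BulkheadFull(self.name)
        try:
            return fn(*args, **kwargs)
        finally:
            self._permits.release()  # slot frees up as soon as the call ends


# One compartment per dependency (names and limits are illustrative).
bulkheads = {
    "payments": Bulkhead("payments", max_concurrent=10),
    "profile": Bulkhead("profile", max_concurrent=5),
}
```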
Isolation on Server Side
• Clients sharing a server become dependent on one another
• Load from one client affects the others
Isolation on Server Side
• Partitioning the service increases stability
• Tradeoff: resource cost increases
• A hybrid of shared and dedicated pools is efficient
• Separation granularity
– Servers in a cluster
– Thread pools in an application
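The hybrid approach can be sketched as a dedicated allowance per client plus a shared overflow pool. HybridPool, the client names and the sizes are hypothetical; a real server would tie this to its request-handling thread pools. A client that exhausts both its own permits and the shared ones is rejected, while other clients keep their dedicated capacity.

```python
import threading


class HybridPool:
    """Per-client dedicated permits backed by one shared overflow pool."""

    def __init__(self, dedicated_per_client, shared):
        self._dedicated = {
            client: threading.BoundedSemaphore(n)
            for client, n in dedicated_per_client.items()
        }
        self._shared = threading.BoundedSemaphore(shared)

    def try_acquire(self, client):
        """Return 'dedicated', 'shared', or None if the request must be rejected."""
        own = self._dedicated.get(client)
        if own is not None and own.acquire(blocking=False):
            return "dedicated"
        if self._shared.acquire(blocking=False):
            return "shared"  # borrow from the common pool
        return None

    def release(self, client, source):
        """Give back a permit obtained from try_acquire."""
        if source == "dedicated":
            self._dedicated[client].release()
        elif source == "shared":
            self._shared.release()
```

Sizing the dedicated part generously buys isolation at a higher resource cost; growing the shared part does the opposite, which is the tradeoff the slide names.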
Circuit Breaker
Hystrix Circuit Breaker
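The Hystrix circuit breaker slides are diagrams, so here is a minimal Python sketch of the closed, open and half-open states. It is not Hystrix's code: the defaults of 20 requests and 50% errors echo the Hystrix settings listed in the configuration slide, the 5-second sleep window is an assumption, and counts are kept in a single window that resets on each state change rather than in Hystrix's rolling time buckets.

```python
import time


class CircuitBreaker:
    """Closed -> open -> half-open breaker in the spirit of Hystrix (sketch)."""

    def __init__(self, request_volume_threshold=20, error_threshold_pct=50,
                 sleep_window_s=5.0, clock=time.monotonic):
        self.volume = request_volume_threshold  # min requests before tripping
        self.error_pct = error_threshold_pct    # trip when error % reaches this
        self.sleep_window_s = sleep_window_s    # how long to stay open
        self.clock = clock
        self.state = "CLOSED"
        self.opened_at = 0.0
        self.total = 0
        self.errors = 0

    def allow_request(self):
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.sleep_window_s:
                self.state = "HALF_OPEN"  # let a single trial request through
                return True
            return False                  # fail fast while open
        if self.state == "HALF_OPEN":
            return False                  # trial request already in flight
        return True

    def record(self, success):
        if self.state == "HALF_OPEN":
            self._enter("CLOSED" if success else "OPEN")
            return
        if self.state == "OPEN":
            return  # late result from a call made before the circuit opened
        self.total += 1
        if not success:
            self.errors += 1
        if (self.total >= self.volume
                and self.errors * 100 >= self.error_pct * self.total):
            self._enter("OPEN")

    def _enter(self, state):
        self.state = state
        self.total = self.errors = 0
        if state == "OPEN":
            self.opened_at = self.clock()
```

Callers check allow_request() before each call and report the outcome with record(); when allow_request() returns False, the caller fails fast or uses a fallback.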
Circuit Breaker in Resource Manager
HTTP Calls Using Resource Manager
• All REST and SOAP calls made through ResourceManager are converted to Hystrix commands
• By default, all calls to one external system are grouped into a single command
• Circuits can be defined at API granularity to isolate endpoint-specific failures
• Failures: TimeoutException, ConnectException, UnknownHostException and status codes of 500 or above
• Fallback: fail fast
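A sketch of how the failure rules above might be applied to one call, assuming a call that returns an (HTTP status, body) pair. The Python exceptions are stand-ins chosen for the Java ones on the slide (TimeoutException, ConnectException, UnknownHostException), and command_key, run_command and FailFastError are hypothetical names, not the ResourceManager API.

```python
import socket


class FailFastError(Exception):
    """Raised immediately instead of waiting on a failing dependency."""


# Assumed Python stand-ins for the Java exceptions on the slide:
# TimeoutException -> TimeoutError, ConnectException -> ConnectionError,
# UnknownHostException -> socket.gaierror.
FAILURE_EXCEPTIONS = (TimeoutError, ConnectionError, socket.gaierror)


def command_key(system, api=None):
    """One command per external system by default, or per API if given."""
    return f"{system}:{api}" if api else system


def run_command(key, call):
    """Run call() -> (status, body); convert failures into FailFastError."""
    try:
        status, body = call()
    except FAILURE_EXCEPTIONS as exc:
        raise FailFastError(f"{key}: {exc!r}") from exc
    if status >= 500:
        raise FailFastError(f"{key}: HTTP {status}")
    return status, body
```

4xx responses pass through unchanged, since only status codes of 500 or above count as failures here.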
Configuration Mechanism
• Configs can be controlled per environment via rm_configuration.xml
• Configuration can be refreshed at runtime
• Configs can be controlled at multiple levels
– Value for all (e.g. A, B, C, D, E… -> A, B, C, D, E…)
– Value for one client application (e.g. A -> B, C, D, E…)
– Value for one client to one server (e.g. A -> B)
• Important circuit breaker configs
– hystrix.command.default.circuitBreaker.enabled=true
– hystrix.command.default.circuitBreaker.requestVolumeThreshold=20
– hystrix.command.default.circuitBreaker.errorThresholdPercentage=50
– hystrix.command.default.circuitBreaker.forceOpen=false
– hystrix.command.default.circuitBreaker.forceClosed=false
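The multi-level configuration can be sketched as a most-specific-wins lookup, with an atomically swapped snapshot for runtime refresh. The key layout (default.PROP, A.PROP, A->B.PROP) and the LayeredConfig class are hypothetical; the rm_configuration.xml schema is not shown in the deck, so parsing it is left out.

```python
import threading


class LayeredConfig:
    """Resolve a property for a client->server pair, most specific level first."""

    def __init__(self, values):
        self._values = dict(values)
        self._lock = threading.Lock()

    def refresh(self, values):
        """Swap in a new snapshot at runtime (e.g. after re-reading the file)."""
        snapshot = dict(values)
        with self._lock:  # serialize writers; readers see old or new, never a mix
            self._values = snapshot

    def get(self, prop, client, server):
        values = self._values  # read one consistent snapshot
        for key in (f"{client}->{server}.{prop}",  # one client to one server
                    f"{client}.{prop}",            # one client to all servers
                    f"default.{prop}"):            # all clients to all servers
            if key in values:
                return values[key]
        raise KeyError(prop)
```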
Adaptive Throttle
Adaptive Throttle
• Calculate the success rate over the last n seconds
• The client allows only the throughput that succeeded, plus a little extra to probe server stability for the next period
• This yields more throughput than an open circuit breaker, because some requests are always allowed through
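The slide does not give a formula, so this Python sketch swaps in the client-side throttling rule described in Google's Site Reliability Engineering book: over the last window_s seconds, reject locally with probability max(0, (requests - k * accepts) / (requests + 1)). AdaptiveThrottle, k = 2.0 and the 10-second window are illustrative assumptions. Because the probability always stays below 1, some requests always get through to test whether the server has recovered.

```python
import random
import time
from collections import deque


class AdaptiveThrottle:
    """Client-side adaptive throttle over a sliding time window (sketch)."""

    def __init__(self, window_s=10.0, k=2.0, clock=time.monotonic, rng=random.random):
        self.window_s = window_s  # the "last n seconds" from the slide
        self.k = k                # k > 1 leaves headroom above accepted throughput
        self.clock = clock
        self.rng = rng
        self._requests = deque()  # timestamps of every attempt
        self._accepts = deque()   # timestamps of successful responses

    def _trim(self, now):
        for q in (self._requests, self._accepts):
            while q and now - q[0] > self.window_s:
                q.popleft()

    def reject_probability(self):
        self._trim(self.clock())
        requests, accepts = len(self._requests), len(self._accepts)
        return max(0.0, (requests - self.k * accepts) / (requests + 1))

    def allow(self):
        p = self.reject_probability()
        self._requests.append(self.clock())  # counted even if rejected locally
        return self.rng() >= p

    def record_accept(self):
        self._accepts.append(self.clock())
```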
Fallback
• Fallback is not optional
• Fallback should be part of the initial design
• Fallback strategies
– Cached responses
– Stale data
– Queue up the requests
– Fail over to a remote cluster
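A minimal sketch of the cached-response and stale-data fallback strategies, assuming an in-process dictionary of last good responses. FallbackCache and its (value, source) return shape are hypothetical; labelling the source lets callers decide whether stale data is acceptable for a given request.

```python
import time


class FallbackCache:
    """Serve fresh data when the primary call works; on failure fall back to
    the last good response (possibly stale), and only then to a default."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self._last_good = {}  # key -> (value, stored_at)

    def get(self, key, primary, default=None):
        try:
            value = primary()
        except Exception:  # broad for brevity; catch only dependency failures in practice
            if key in self._last_good:
                value, stored_at = self._last_good[key]
                return value, f"stale ({self.clock() - stored_at:.0f}s old)"
            return default, "default"
        self._last_good[key] = (value, self.clock())
        return value, "fresh"
```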
More Patterns
• Defer work: queue it up
– Eventual consistency helps stability
• Asynchronous execution
– Stops cascading failures
– Blocks chain reactions
• Handshaking pattern
– Server and client communicate about capacity and timeouts
• Steady state
– Data purging
– Log archival
– Limit in-memory caching
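For the steady-state point on limiting in-memory caching, a size-capped LRU cache keeps memory flat no matter how long the process runs. BoundedCache is a hypothetical helper built on collections.OrderedDict; for caching pure function results, functools.lru_cache(maxsize=...) provides the same bound.

```python
from collections import OrderedDict


class BoundedCache:
    """In-memory cache capped at max_entries; evicts the least recently used entry."""

    def __init__(self, max_entries):
        if max_entries <= 0:
            raise ValueError("max_entries must be positive")
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # drop the least recently used entry

    def __len__(self):
        return len(self._data)
```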