Resilient Architecture

Matt Stine (@mstine)
http://www.mattstine.com

A SYSTEM FAILURE COSTS A WELL-KNOWN
RETAILER SIGNIFICANT REVENUE ON THE
BIGGEST INTERNET SHOPPING DAY OF
THE YEAR.

A SYSTEM FAILURE CAUSES THE
CANCELLATION OF HUNDREDS OF
FLIGHTS, STRANDING THOUSANDS OF
AIRLINE PASSENGERS, AND ULTIMATELY
COSTING THE AIRLINE MILLIONS IN
REVENUE.

A BEAUTIFULLY DESIGNED ONLINE STORE
CRUMBLES UNDER THE PRESSURE OF A
THUNDERING HERD OF CUSTOMERS
TRYING TO PURCHASE THE LATEST TECH
GADGET.

A SECURITY BREACH EXPOSES
THOUSANDS OF CUSTOMER CREDIT CARD
NUMBERS, LEADING TO MILLIONS IN LOST
REVENUE DUE TO THE RESULTING LOSS
OF TRUST.

DISRUPTIVE COMPANIES ARE
ALSO APPROACHING RESILIENCY
DIFFERENTLY.

STOP TRYING TO PREVENT
MISTAKES.

WE NEED BETTER TOOLS AND
TECHNIQUES.

RESILIENT ARCHITECTURES
Enhance Observability
Leverage Resiliency Patterns
Embrace Chaos

WHAT IS NORMAL?
Values
Rates of Change
Mean?
P95/99/99.9?
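
The gap between the mean and the tail percentiles can be sketched in a few lines of plain Java. This is a minimal illustration, not a production metrics library: the class name `LatencyStats`, the nearest-rank percentile definition, and the sample data (97 fast requests, 3 slow ones) are all assumptions made for the example.

```java
import java.util.Arrays;

/** Compares the mean with tail percentiles for a sample of latencies (hypothetical helper). */
class LatencyStats {

    /** Arithmetic mean of the samples, or 0.0 if there are none. */
    static double mean(long[] samplesMillis) {
        return Arrays.stream(samplesMillis).average().orElse(0.0);
    }

    /** Nearest-rank percentile: the smallest sample with at least p percent of samples at or below it. */
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Assumed data: 97 requests at 10 ms and 3 requests at 2000 ms.
        long[] samples = new long[100];
        Arrays.fill(samples, 0, 97, 10L);
        Arrays.fill(samples, 97, 100, 2000L);

        System.out.println("mean = " + mean(samples));
        System.out.println("p50  = " + percentile(samples, 50));
        System.out.println("p95  = " + percentile(samples, 95));
        System.out.println("p99  = " + percentile(samples, 99));
    }
}
```

For this data the mean is 69.7 ms, p50 and p95 are 10 ms, and p99 is 2000 ms. The mean describes neither the typical request nor the slow ones, which is why the percentiles are worth tracking separately.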

WHAT IS NORMAL?
http://bravenewgeek.com/everything-you-know-about-latency-is-wrong/

SPRING BOOT HEALTH ENDPOINT
{
  "diskSpace": {
    "status": "UP",
    "total": 1056858112,
    "free": 878850048,
    "threshold": 10485760
  },
  "refreshScope": {
    "status": "UP"
  },
  "configServer": {
    "status": "UP",
    "propertySources": [
      "configClient",
      "https://github.com/spring-cloud-services-samples/fortune-teller/configuration/application.yml"
    ]
  },
  "hystrix": {

SPRING BOOT INFO ENDPOINT
"git": {
  "build": {
    "host": "Matts-MacBook-Pro.local",
    "version": "0.0.1-SNAPSHOT",
    "time": 1489021333000,
    "user": {
      "name": "Matt Stine",
      "email": "mstine@pivotal.io"
    }
  },
  "branch": "master",
  "commit": {
    "message": {
      "short": "initial commit",
      "full": "initial commit"
    },
    "id": "9b624974e417693cf921b9abc50b5af4ea0b6dde",
    "id.describe-short": "9b62497-dirty",
    "id.abbrev": "9b62497",
    "id.describe": "9b62497-dirty",

EXAMPLES:
Spring Boot Actuator
http://docs.spring.io/spring-boot/docs/current/reference/htmlsingle/#production-ready
PCF Apps Manager
https://docs.pivotal.io/pivotalcf/1-9/console/using-actuators.html
Spring Cloud Sleuth
https://cloud.spring.io/spring-cloud-sleuth/
Zipkin
http://zipkin.io/

TIMEOUTS
Thinking is half the battle!
Anything that blocks threads
Any method call with an optional timeout argument

ADDING TIMEOUTS TO
RESTTEMPLATE
@Bean
public RestTemplate restTemplate() {
    SimpleClientHttpRequestFactory clientHttpRequestFactory =
            new SimpleClientHttpRequestFactory();
    // Both timeouts are given in milliseconds.
    clientHttpRequestFactory.setConnectTimeout(10 * 1000); // Ten seconds!
    clientHttpRequestFactory.setReadTimeout(10 * 1000);    // Ten seconds!
    return new RestTemplate(clientHttpRequestFactory);
}

RETRIES
Potentially transient failures
Immediately
With a backoff
Up to a maximum number of times
Log all the things
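
The bullets above can be sketched without any framework. The class below is a hypothetical helper, not Spring Retry: it retries a `Supplier` up to a maximum number of attempts, waits with a capped exponential backoff between attempts, and logs every failure. Jitter is left out to keep the sketch short.

```java
import java.util.function.Supplier;

class RetryWithBackoff {

    /** Delay after the given failed attempt (1-based): initialDelay * multiplier^(attempt-1), capped at maxDelay. */
    static long delayMillis(int attempt, long initialDelay, double multiplier, long maxDelay) {
        double delay = initialDelay * Math.pow(multiplier, attempt - 1);
        return (long) Math.min(delay, maxDelay);
    }

    /** Calls the action until it succeeds or maxAttempts is reached, then rethrows the last failure. */
    static <T> T retry(Supplier<T> action, int maxAttempts,
                       long initialDelay, double multiplier, long maxDelay) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;
                // Log all the things.
                System.err.println("Attempt " + attempt + " of " + maxAttempts
                        + " failed: " + e.getMessage());
                if (attempt < maxAttempts) {
                    sleep(delayMillis(attempt, initialDelay, multiplier, maxDelay));
                }
            }
        }
        throw last;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while backing off", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated transient failure: the first two calls fail, the third succeeds.
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("transient failure");
            }
            return "things";
        }, 5, 100, 2.0, 1000);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

With an initial delay of 100 ms, a multiplier of 2, and a cap of 1000 ms, the waits after successive failures are 100, 200, 400, 800, and then 1000 ms.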

SIMPLE RETRY
// Requires @EnableRetry on a configuration class.
@RequestMapping("/acquireThings")
@Retryable
public ResponseEntity<String> tryToAcquireThings() {
    logger.info("Attempting to acquire things...");
    String things = restTemplate
            .getForObject("http://localhost:8081/things", String.class);
    return new ResponseEntity<>(things, HttpStatus.OK);
}

// Called once the retries are exhausted.
@Recover
public ResponseEntity<String> recover() {
    logger.warn("Returning default response...");
    return new ResponseEntity<>("default things", HttpStatus.OK);
}

RETRY WITH BACKOFF
@RequestMapping("/acquireThings")
@Retryable(maxAttempts = 5,
        backoff = @Backoff(delay = 100L, maxDelay = 1000L,
                multiplier = 2, random = true)
)
public ResponseEntity<String> tryToAcquireThings() {
    logger.info("Attempting to acquire things...");
    String things = restTemplate
            .getForObject("http://localhost:8081/things", String.class);
    return new ResponseEntity<>(things, HttpStatus.OK);
}

EXPONENTIAL BACKOFF
@Bean
public BackOffPolicy backOffPolicy() {
    return new ExponentialBackOffPolicy();
}

BULKHEADS
Microservices
Thread Pools
Availability Zones
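
A thread-pool bulkhead can be sketched with the standard `ThreadPoolExecutor`: each dependency gets its own small pool with a bounded queue, so a stalled dependency exhausts only its own threads. The class and method names (`Bulkheads`, `newBulkhead`, `trySubmit`) and the pool sizes are assumptions chosen for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class Bulkheads {

    /** Bounded pool: fixed thread count, bounded queue, and rejection instead of unbounded queuing. */
    static ThreadPoolExecutor newBulkhead(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy());
    }

    /** Returns false when the bulkhead is full, so the caller can fail fast or fall back. */
    static boolean trySubmit(ThreadPoolExecutor pool, Runnable task) {
        try {
            pool.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ThreadPoolExecutor slowDependency = newBulkhead(1, 1);
        ThreadPoolExecutor fastDependency = newBulkhead(1, 1);
        CountDownLatch release = new CountDownLatch(1);
        // Simulated hung call: blocks until the latch is released.
        Runnable hung = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            trySubmit(slowDependency, hung); // occupies the only thread
            trySubmit(slowDependency, hung); // fills the queue
            System.out.println("slow pool accepts more work: "
                    + trySubmit(slowDependency, hung));
            System.out.println("fast pool accepts work: "
                    + trySubmit(fastDependency, () -> { }));
        } finally {
            release.countDown();
            slowDependency.shutdownNow();
            fastDependency.shutdownNow();
        }
    }
}
```

Once the slow pool's single thread and single queue slot are taken, further work for that dependency is rejected immediately, while the fast dependency's pool is unaffected.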

SPRING CLOUD HYSTRIX
@HystrixCommand(fallbackMethod = "fallbackFortune")
public Fortune randomFortune() {
    return restTemplate.getForObject("http://fortunes/random", Fortune.class);
}

private Fortune fallbackFortune() {
    return new Fortune(42L, fortuneProperties.getFallbackFortune());
}
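
To make the circuit-breaker idea behind the fallback concrete, here is a deliberately simplified sketch in plain Java. It is not how Hystrix is implemented: the class `SimpleCircuitBreaker`, its consecutive-failure threshold, and the injected clock are assumptions for illustration.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Opens after a run of consecutive failures and serves the fallback until a cool-down elapses. */
class SimpleCircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock; // injected so the cool-down can be tested without sleeping
    private int consecutiveFailures = 0;
    private long openedAt = -1; // -1 means the circuit is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    /** True while the circuit is open and the cool-down has not yet elapsed. */
    synchronized boolean isOpen() {
        return openedAt >= 0 && clock.getAsLong() - openedAt < openMillis;
    }

    /**
     * Runs the action unless the circuit is open; any failure yields the fallback.
     * Synchronized for simplicity, which serializes calls through the breaker.
     */
    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (isOpen()) {
            return fallback.get(); // short-circuit: do not touch the failing dependency
        }
        try {
            T result = action.get();
            consecutiveFailures = 0;
            openedAt = -1; // a success closes the circuit again
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = clock.getAsLong(); // open, or re-open after a failed trial call
            }
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker =
                new SimpleCircuitBreaker(2, 1_000, System::currentTimeMillis);
        Supplier<String> failing = () -> {
            throw new IllegalStateException("fortunes service is down");
        };
        for (int i = 0; i < 3; i++) {
            System.out.println(breaker.call(failing, () -> "fallback fortune")
                    + " (open=" + breaker.isOpen() + ")");
        }
    }
}
```

After the cool-down, the next call acts as a trial: a success closes the circuit, and a failure opens it again for another cool-down period.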

EXAMPLES:
Spring Retry
https://github.com/spring-projects/spring-retry
Hystrix
https://github.com/Netflix/Hystrix
via Spring Cloud Netflix
https://cloud.spring.io/spring-cloud-netflix/

HOW DO YOU KNOW YOUR
SYSTEM WILL TOLERATE FAILURE
IF IT HASN'T FAILED?
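
One small-scale way to approach that question is to inject failures on purpose and check that the fallback path actually works. The sketch below is an in-process analogy only; the `FaultInjector` class, its failure rate, and the seeded `Random` are assumptions for illustration and are unrelated to how any particular chaos tool works.

```java
import java.util.Random;
import java.util.function.Supplier;

class FaultInjector {

    /** Wraps a supplier so that a given fraction of calls throws instead of delegating. */
    static <T> Supplier<T> withFaults(Supplier<T> delegate, double failureRate, Random random) {
        return () -> {
            if (random.nextDouble() < failureRate) {
                throw new IllegalStateException("injected failure");
            }
            return delegate.get();
        };
    }

    /** Calls the supplier and returns the fallback if it throws. */
    static <T> T callOrFallback(Supplier<T> supplier, T fallback) {
        try {
            return supplier.get();
        } catch (RuntimeException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Assumed setup: roughly 30% of calls to a hypothetical dependency fail.
        Supplier<String> flaky = withFaults(() -> "things", 0.3, new Random(42));
        int fallbacks = 0;
        for (int i = 0; i < 1_000; i++) {
            if ("default things".equals(callOrFallback(flaky, "default things"))) {
                fallbacks++;
            }
        }
        System.out.println("fallback served " + fallbacks + " times out of 1000");
    }
}
```

Every call still returns a usable answer, which is the property the experiment is meant to confirm before a real failure tests it.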

YAU AND CHEUNG:
DESIGN OF SELF-CHECKING SOFTWARE
(1975)

EXAMPLES:
Chaos Lemur (BOSH)
https://github.com/strepsirrhini-army/chaos-lemur
Chaos Loris (CF)
https://github.com/strepsirrhini-army/chaos-loris

REVIEW TIME!
Stop trying to prevent mistakes
Focus on MTTR
Enhance observability
Leverage resiliency patterns
Embrace chaos!

THANKS!
Matt Stine (@mstine)
http://www.mattstine.com
