Voxxed Vienna 2015 Fault tolerant microservices

@chbatey #Voxxed
Fault tolerant microservices
Christopher Batey
DataStax

Who am I?
• DataStax
- Technical Evangelist / Software Engineer
- Builds enterprise-ready version of Apache Cassandra
• Sky: Building next generation Internet TV platform
• Lots of time working on a test double for Apache Cassandra

Agenda
•Setting the scene
•What do we mean by a fault?
•What is a micro(ish)service?
•Monolith application vs the micro(ish)service
•A worked example
•Identify an issue
•Reproduce/test it
•Show how to deal with the issue

So… what do applications look like?

So... what do systems look like now?

But different things go wrong...
down
slow network
slow app
SLA: 2 second max
missing packets
GC :(

Example: Movie player service
[Diagram: a Play Movie request goes to the Movie Player, which depends on the User Service, Device Service and Pin Service]

Time for an example...
•All examples are on github
•Technologies used:
•Dropwizard
•Spring Boot
•Wiremock
•Hystrix
•Graphite
•Saboteur

Testing microservices
• You don’t know a service is fault tolerant if you don’t test faults

Isolated service tests
[Diagram: the Acceptance Test primes mocks of the User, Device and Pin services, then sends Play Movie to the Movie service over real HTTP/TCP]

Fault tolerance
1. Don’t take forever - Timeouts
2. Don’t try if you can’t succeed
3. Fail gracefully
4. Know if it’s your fault
5. Don’t whack a dead horse
6. Turn broken stuff off

1 - Don’t take forever
• If at first you don’t succeed, don’t take forever to tell someone
• Timeout and fail fast

Which timeouts?
• Socket connection timeout
• Socket read timeout

Your service hung for 30 seconds :(
Customer
You :(

Which timeouts?
• Socket connection timeout
• Socket read timeout
• Resource acquisition

Your service hung for 10 minutes :(

Wiremock + Saboteur + Vagrant
•Vagrant - launches + provisions local VMs
•Saboteur - uses tc, iptables to simulate network issues
•Wiremock - used to mock HTTP dependencies
•Cucumber - acceptance tests

I can write an automated test for that?
[Diagram: the Acceptance Test primes Saboteur to drop traffic and later resets it; Saboteur and Wiremock (User Service, Device Service, Pin Service) run in a Vagrant + VirtualBox VM; the Movie Service calls those stubs]

Implementing reliable timeouts
• Protect the container thread!
• Homemade: Worker Queue + Thread pool (executor)
• Hystrix
• Spring Cloud Netflix
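
The "homemade" option could look roughly like this: hand the call to a bounded worker pool and let the container thread wait only as long as the SLA allows. A sketch under assumptions: TimeoutRunner, callWithTimeout and demo are hypothetical names, and the pool and queue sizes are arbitrary.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimeoutRunner {

    /** Runs the call on a worker thread and gives up after timeoutMs. */
    static String callWithTimeout(ExecutorService workers, Callable<String> call,
                                  long timeoutMs, String fallback) {
        Future<String> future;
        try {
            future = workers.submit(call);
        } catch (RejectedExecutionException e) {
            return fallback; // pool and queue are full: fail fast
        }
        try {
            // The calling (container) thread waits at most timeoutMs.
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker so it can stop early
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the call itself threw
        }
    }

    /** Demo: a dependency that takes workMs, called with a timeoutMs budget. */
    static String demo(long workMs, long timeoutMs) {
        ExecutorService workers = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(10));
        try {
            return callWithTimeout(workers, () -> {
                Thread.sleep(workMs);
                return "real answer";
            }, timeoutMs, "fallback");
        } finally {
            workers.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(2000, 100)); // slow dependency: prints "fallback"
        System.out.println(demo(0, 1000));   // fast dependency: prints "real answer"
    }
}
```

Hystrix packages the same idea (worker pool, timeout, fallback) so it does not have to be hand-rolled.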

A simple Spring RestController
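import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class Resource {

    private static final Logger LOGGER = LoggerFactory.getLogger(Resource.class);

    @Autowired
    private ScaryDependency scaryDependency;

    // Handles /scary on the servlet container's request thread
    @RequestMapping("/scary")
    public String callTheScaryDependency() {
        LOGGER.info("Resource later: I wonder which thread I am on!");
        return scaryDependency.getScaryString();
    }
}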

Scary dependency
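import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Component
public class ScaryDependency {

    private static final Logger LOGGER = LoggerFactory.getLogger(ScaryDependency.class);

    public String getScaryString() {
        LOGGER.info("Scary Dependency: I wonder which thread I am on! Tomcats?");
        // Roughly half of the calls are fast; the rest block for five seconds
        if (System.currentTimeMillis() % 2 == 0) {
            return "Scary String";
        } else {
            try {
                Thread.sleep(5000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
            }
            return "Slow Scary String";
        }
    }
}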

All on the tomcat thread
13:47:20.200 [http-8080-exec-1] INFO info.batey.examples.Resource -
Resource later: I wonder which thread I am on!
13:47:20.200 [http-8080-exec-1] INFO info.batey.examples.ScaryDependency
- Scary Dependency: I wonder which thread I am on! Tomcats?

Scary dependency
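import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Component
public class ScaryDependency {

    private static final Logger LOGGER = LoggerFactory.getLogger(ScaryDependency.class);

    // With this annotation the method runs on a Hystrix thread, not the Tomcat thread
    @HystrixCommand()
    public String getScaryString() {
        LOGGER.info("Scary Dependency: I wonder which thread I am on! Tomcats?");
        if (System.currentTimeMillis() % 2 == 0) {
            return "Scary String";
        } else {
            try {
                Thread.sleep(5000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
            }
            return "Slow Scary String";
        }
    }
}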

What an annotation can do...
13:51:21.513 [http-8080-exec-1] INFO info.batey.examples.Resource - Resource
later: I wonder which thread I am on!
13:51:21.614 [hystrix-ScaryDependency-1] INFO info.batey.examples.ScaryDependency
- Scary Dependency: I wonder which thread I am on! Tomcats? :P

Timeouts take home
● You can’t use network level timeouts for SLAs
● Test your SLAs - if someone says you can’t, hit them with a stick
● Scary things happen without network issues
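
One reason network-level timeouts cannot enforce an SLA: a socket read timeout bounds each individual read, not the whole response. The sketch below (hypothetical class SlowDripDemo, a local server written just for the demo) drips one byte every gapMs; the read timeout never fires even though the total time is far longer than the timeout.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class SlowDripDemo {

    /** Returns the total ms spent reading byteCount bytes that arrive one every gapMs. */
    static long readDrippedResponse(int byteCount, int gapMs, int readTimeoutMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            // Server side: send one byte at a time, pausing before each one.
            Thread dripper = new Thread(() -> {
                try (Socket client = server.accept();
                     OutputStream out = client.getOutputStream()) {
                    for (int i = 0; i < byteCount; i++) {
                        Thread.sleep(gapMs);
                        out.write('x');
                        out.flush();
                    }
                } catch (IOException | InterruptedException e) {
                    // demo only: the reader will see the connection close
                }
            });
            dripper.setDaemon(true);
            dripper.start();

            long start = System.nanoTime();
            try (Socket socket = new Socket(loopback, server.getLocalPort())) {
                socket.setSoTimeout(readTimeoutMs);
                InputStream in = socket.getInputStream();
                while (in.read() != -1) {
                    // every byte that arrives restarts the read-timeout clock
                }
            }
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            // a SocketTimeoutException would also end up here
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long elapsed = readDrippedResponse(10, 100, 500);
        System.out.println("read timeout 500 ms, total time " + elapsed + " ms");
    }
}
```

An overall deadline has to be enforced above the socket, for example by the worker-pool timeout that Hystrix provides.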

2 - Don’t try if you can’t succeed

Complexity
“When an application grows in complexity it will
eventually start sending emails”

Complexity
“When an application grows in complexity it will
eventually start using queues and thread pools”
Or use Akka :)

Don’t try if you can’t succeed
• Executors with unbounded queues (or threads) :(
- newFixedThreadPool
- newSingleThreadExecutor
- newCachedThreadPool (unbounded threads)
• Bound your queues and threads
• Fail quickly when the queue size / maxPoolSize is reached
• Know your drivers
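
The defaults behind the JDK factory methods can be checked directly. A small sketch (the class name ExecutorDefaults is hypothetical); it inspects the two factories that return a plain ThreadPoolExecutor.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

class ExecutorDefaults {

    /** Remaining queue capacity of a freshly created fixed thread pool. */
    static int fixedPoolQueueCapacity() {
        ThreadPoolExecutor pool = (ThreadPoolExecutor) Executors.newFixedThreadPool(4);
        try {
            return pool.getQueue().remainingCapacity();
        } finally {
            pool.shutdown();
        }
    }

    /** Maximum number of threads a cached thread pool may create. */
    static int cachedPoolMaxThreads() {
        ThreadPoolExecutor pool = (ThreadPoolExecutor) Executors.newCachedThreadPool();
        try {
            return pool.getMaximumPoolSize();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Both values are Integer.MAX_VALUE: nothing ever pushes back on the caller.
        System.out.println("fixed pool queue capacity: " + fixedPoolQueueCapacity());
        System.out.println("cached pool max threads:   " + cachedPoolMaxThreads());
    }
}
```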

This is a functional requirement
•Set the timeout very high
•Use Wiremock to add a large delay to the requests
•Set queue size and thread pool size to 1
•Send in 2 requests to use the thread and fill the queue
•What happens on the 3rd request?
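
The scenario above can be reproduced in isolation with a JDK ThreadPoolExecutor. A sketch, assuming the default rejection policy (AbortPolicy); BoundedPoolDemo and rejectedCount are hypothetical names, and a latch stands in for the Wiremock delay.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedPoolDemo {

    /** Submits blocking tasks to a pool of 1 thread + queue of 1; returns how many were rejected. */
    static int rejectedCount(int requests) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(1));
        CountDownLatch release = new CountDownLatch(1); // keeps every task "slow"
        int rejected = 0;
        try {
            for (int i = 0; i < requests; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // thread busy and queue full: fail immediately
                }
            }
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("rejected out of 3: " + rejectedCount(3)); // prints 1
    }
}
```

The first request occupies the thread, the second fills the queue, and the third is rejected straight away instead of waiting.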

Expect rubbish
•Expect invalid HTTP
•Expect malformed response bodies
•Expect connection failures
•Expect huge / tiny responses
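
"Expect rubbish" applies to parsing too. Note that Boolean.valueOf returns false for any text that is not "true", so an HTML error page would silently parse as a valid answer. A stricter sketch, with hypothetical names (StrictResponseParser, parsePinCheck) and an arbitrary size limit:

```java
import java.util.Optional;

class StrictResponseParser {

    static final int MAX_BODY_CHARS = 16; // assumption: a pin check body is tiny

    /** Returns the parsed pin check, or empty if the response is not exactly what we expect. */
    static Optional<Boolean> parsePinCheck(int statusCode, String body) {
        if (statusCode != 200 || body == null || body.length() > MAX_BODY_CHARS) {
            return Optional.empty();
        }
        String trimmed = body.trim();
        if (trimmed.equals("true")) {
            return Optional.of(true);
        }
        if (trimmed.equals("false")) {
            return Optional.of(false);
        }
        return Optional.empty(); // malformed: let the caller fail or fall back
    }

    public static void main(String[] args) {
        System.out.println(parsePinCheck(200, "true"));   // Optional[true]
        System.out.println(parsePinCheck(200, "<html>")); // Optional.empty
        System.out.println(Boolean.valueOf("<html>"));    // false
    }
}
```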

Testing with Wiremock
stubFor(get(urlEqualTo("/dependencyPath")) 
.willReturn(aResponse() 
.withFault(Fault.MALFORMED_RESPONSE_CHUNK))); 
{ 
"request": { 
"method": "GET", 
"url": "/fault" 
}, 
"response": { 
"fault": "RANDOM_DATA_THEN_CLOSE" 
} 
{ 
"request": { 
"method": "GET", 
"url": "/fault" 
}, 
"response": { 
"fault": "EMPTY_RESPONSE" 
} 
}

Record stuff
•Metrics:
- Timings
- Errors
- Concurrent incoming requests
- Thread pool statistics
- Connection pool statistics
•Logging: Boundary logging, ElasticSearch / Logstash
•Request identifiers
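
To make the first three metrics concrete, here is a JDK-only sketch of a wrapper that records timings, errors and concurrent requests around a call. In the talk's stack a metrics library reporting to Graphite does this job; RequestStats and its method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

class RequestStats {
    private final AtomicInteger inFlight = new AtomicInteger();
    private final LongAdder calls = new LongAdder();
    private final LongAdder errors = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    /** Runs the call and records its duration, outcome and concurrency. */
    <T> T record(Supplier<T> call) {
        inFlight.incrementAndGet();
        long start = System.nanoTime();
        try {
            return call.get();
        } catch (RuntimeException e) {
            errors.increment();
            throw e; // metrics must not swallow the failure
        } finally {
            totalNanos.add(System.nanoTime() - start);
            calls.increment();
            inFlight.decrementAndGet();
        }
    }

    long callCount() { return calls.sum(); }
    long errorCount() { return errors.sum(); }
    int inFlightCount() { return inFlight.get(); }

    /** Mean call duration in milliseconds, or 0 if nothing was recorded. */
    double meanMillis() {
        long n = calls.sum();
        return n == 0 ? 0.0 : totalNanos.sum() / 1_000_000.0 / n;
    }

    public static void main(String[] args) {
        RequestStats stats = new RequestStats();
        stats.record(() -> "ok");
        System.out.println(stats.callCount() + " call(s), " + stats.errorCount() + " error(s)");
    }
}
```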

Separate resource pools
•Don’t flood your dependencies
•Be able to answer the questions:
-How many connections will you make to dependency X?
-Are you getting close to your max connections?
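
One way to be able to answer both questions is to give each dependency its own fixed budget of concurrent calls. Hystrix does this with a pool per command group; the sketch below uses a plain Semaphore instead, and the Bulkhead class is a hypothetical name.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

class Bulkhead {
    private final int maxConcurrent;
    private final Semaphore permits;

    Bulkhead(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Runs the call if a permit is free; otherwise returns the fallback without waiting. */
    <T> T call(Supplier<T> dependencyCall, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback; // at the limit for this dependency: fail fast
        }
        try {
            return dependencyCall.get();
        } finally {
            permits.release();
        }
    }

    /** How many calls to this dependency are running right now. */
    int inUse() {
        return maxConcurrent - permits.availablePermits();
    }

    public static void main(String[] args) {
        Bulkhead pinService = new Bulkhead(1);
        // The inner call arrives while the outer one still holds the only permit.
        String result = pinService.call(
                () -> pinService.call(() -> "inner ran", "inner rejected"),
                "outer rejected");
        System.out.println(result); // prints "inner rejected"
    }
}
```

With one Bulkhead per dependency, maxConcurrent answers the first question and inUse() the second.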

So easy with Dropwizard + Hystrix
metrics:
  reporters:
    - type: graphite
      host: 192.168.10.120
      port: 2003
      prefix: shiny_app

@Override
public void initialize(Bootstrap<AppConfig> appConfigBootstrap) {
    // Publish Hystrix command metrics through Dropwizard's MetricRegistry
    HystrixCodaHaleMetricsPublisher metricsPublisher =
            new HystrixCodaHaleMetricsPublisher(appConfigBootstrap.getMetricRegistry());
    HystrixPlugins.getInstance().registerMetricsPublisher(metricsPublisher);
}

5 - Don’t whack a dead horse
[Diagram: a Play Movie request goes to the Movie Player, which depends on the User Service, Device Service and Pin Service]

What to do…
•Yes this will happen…
•Mandatory dependency - fail *really* fast
•Throttling
•Fallbacks

Implementation with Hystrix
 
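import com.codahale.metrics.annotation.Timed;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@Path("integrate")
public class IntegrationResource {

    private static final Logger LOGGER = LoggerFactory.getLogger(IntegrationResource.class);

    // userService, deviceService and pinService are client fields
    // initialised elsewhere (not shown on the slide)

    @GET
    @Timed
    public String integrate() {
        LOGGER.info("integrate");
        // Each dependency call is wrapped in its own Hystrix command
        String user = new UserServiceDependency(userService).execute();
        String device = new DeviceServiceDependency(deviceService).execute();
        Boolean pinCheck = new PinCheckDependency(pinService).execute();
        return String.format("[User info: %s] \n[Device info: %s] \n[Pin check: %s] \n", user, device, pinCheck);
    }
}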

 
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.util.EntityUtils;

public class PinCheckDependency extends HystrixCommand<Boolean> {

    private HttpClient httpClient;

    public PinCheckDependency(HttpClient httpClient) {
        super(HystrixCommandGroupKey.Factory.asKey("PinCheckService"));
        this.httpClient = httpClient;
    }

    // Runs when the command is executed; any exception counts as a failure
    @Override
    protected Boolean run() throws Exception {
        HttpGet pinCheck = new HttpGet("http://localhost:9090/pincheck");
        HttpResponse pinCheckResponse = httpClient.execute(pinCheck);
        int statusCode = pinCheckResponse.getStatusLine().getStatusCode();
        if (statusCode != 200) {
            throw new RuntimeException("Oh dear no pin check, status code " + statusCode);
        }
        String pinCheckInfo = EntityUtils.toString(pinCheckResponse.getEntity());
        return Boolean.valueOf(pinCheckInfo);
    }
}

 
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.util.EntityUtils;

public class PinCheckDependency extends HystrixCommand<Boolean> {

    private HttpClient httpClient;

    public PinCheckDependency(HttpClient httpClient) {
        super(HystrixCommandGroupKey.Factory.asKey("PinCheckService"));
        this.httpClient = httpClient;
    }

    @Override
    protected Boolean run() throws Exception {
        HttpGet pinCheck = new HttpGet("http://localhost:9090/pincheck");
        HttpResponse pinCheckResponse = httpClient.execute(pinCheck);
        int statusCode = pinCheckResponse.getStatusLine().getStatusCode();
        if (statusCode != 200) {
            throw new RuntimeException("Oh dear no pin check, status code " + statusCode);
        }
        String pinCheckInfo = EntityUtils.toString(pinCheckResponse.getEntity());
        return Boolean.valueOf(pinCheckInfo);
    }

    // Fallback: treat the pin check as passed when the pin service is unavailable
    @Override
    public Boolean getFallback() {
        return true;
    }
}

Triggering the fallback
•Error threshold percentage
•Bucket of time for the percentage
•Minimum number of requests to trigger
•Time before trying a request again
•Disable
•Per instance statistics
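
How the first four settings interact can be sketched as a tiny circuit breaker. This is an illustration of the idea only, not Hystrix's implementation: SimpleCircuitBreaker and its parameter names are hypothetical, it uses a single fixed window rather than rolling buckets, and it may let several trial requests through at once after the sleep window.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

class SimpleCircuitBreaker {
    private final int errorThresholdPercent; // open at or above this error rate
    private final int minimumRequests;       // ignore the error rate below this volume
    private final long windowMs;             // bucket of time the percentage covers
    private final long sleepWindowMs;        // time before trying a request again
    private final LongSupplier clock;        // current time in ms, injectable for tests

    private long windowStart;
    private int requests;
    private int failures;
    private boolean open;
    private long openedAt;

    SimpleCircuitBreaker(int errorThresholdPercent, int minimumRequests,
                         long windowMs, long sleepWindowMs, LongSupplier clock) {
        this.errorThresholdPercent = errorThresholdPercent;
        this.minimumRequests = minimumRequests;
        this.windowMs = windowMs;
        this.sleepWindowMs = sleepWindowMs;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    /** Runs the command, or returns the fallback if the circuit is open or the command throws. */
    <T> T call(Supplier<T> command, T fallback) {
        if (!allowRequest()) {
            return fallback; // short-circuit: do not touch the dependency at all
        }
        try {
            T result = command.get();
            record(true);
            return result;
        } catch (RuntimeException e) {
            record(false);
            return fallback;
        }
    }

    synchronized boolean isOpen() {
        return open;
    }

    private synchronized boolean allowRequest() {
        // Closed: always allow. Open: allow a trial once the sleep window has passed.
        return !open || clock.getAsLong() - openedAt >= sleepWindowMs;
    }

    private synchronized void record(boolean success) {
        long now = clock.getAsLong();
        if (open) {
            if (success) {
                open = false;      // trial request worked: close the circuit
                resetWindow(now);
            } else {
                openedAt = now;    // trial failed: start a new sleep window
            }
            return;
        }
        if (now - windowStart >= windowMs) {
            resetWindow(now);
        }
        requests++;
        if (!success) {
            failures++;
        }
        if (requests >= minimumRequests
                && failures * 100 / requests >= errorThresholdPercent) {
            open = true;
            openedAt = now;
        }
    }

    private void resetWindow(long now) {
        windowStart = now;
        requests = 0;
        failures = 0;
    }
}
```

For example, new SimpleCircuitBreaker(50, 20, 10_000, 5_000, System::currentTimeMillis) would open once at least 20 requests in a 10 second window show a 50% error rate, then try the dependency again after 5 seconds.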

6 - Turn off broken stuff
• The kill switch
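
In its simplest form a kill switch is a flag checked before each call, flipped at runtime by an operator. A minimal sketch; the KillSwitch class is hypothetical, and how the flag gets flipped (admin endpoint, config reload) is left open.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

class KillSwitch {
    private final AtomicBoolean enabled = new AtomicBoolean(true);

    void turnOff() {
        enabled.set(false);
    }

    void turnOn() {
        enabled.set(true);
    }

    /** Runs the feature only while the switch is on; otherwise returns whenDisabled. */
    <T> T call(Supplier<T> feature, T whenDisabled) {
        return enabled.get() ? feature.get() : whenDisabled;
    }

    public static void main(String[] args) {
        KillSwitch pinCheck = new KillSwitch();
        System.out.println(pinCheck.call(() -> "checked pin", "skipped")); // checked pin
        pinCheck.turnOff();
        System.out.println(pinCheck.call(() -> "checked pin", "skipped")); // skipped
    }
}
```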

To recap
1. Don’t take forever - Timeouts
2. Don’t try if you can’t succeed
3. Fail gracefully
4. Know if it’s your fault
5. Don’t whack a dead horse
6. Turn broken stuff off

Links
• Examples:
- https://github.com/chbatey/spring-cloud-example
- https://github.com/chbatey/dropwizard-hystrix
- https://github.com/chbatey/vagrant-wiremock-saboteur
• Tech:
- https://github.com/Netflix/Hystrix
- https://www.vagrantup.com/
- http://wiremock.org/
- https://github.com/tomakehurst/saboteur

Questions?
Thanks for listening!
Questions: @chbatey
http://christopher-batey.blogspot.co.uk/

Developer takeaways
● Learn about TCP
● Love vagrant, docker etc to enable testing
● Don’t trust libraries

Hystrix cost - do this yourself

Hystrix metrics
● Failure count
● Percentiles from Hystrix point of view
● Error percentages

How to test metric publishing?
● Stub out graphite and verify calls?
● Programmatically call graphite and verify numbers?
● Make metrics + logs part of the story demo
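
The "stub out graphite" option is cheap because Graphite's plaintext protocol is one line per data point, "path value timestamp", sent over TCP (port 2003 by default). A sketch of a stub listener that captures what a reporter sends; GraphiteStubDemo is a hypothetical name and the hand-written sender stands in for the real metrics reporter.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class GraphiteStubDemo {

    /** Sends one metric line to a stub listener and returns the line the stub received. */
    static String sendAndCapture(String path, long value, long epochSeconds) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket stub = new ServerSocket(0, 1, loopback)) {
            // Reporter side: in a real test this would be the application's reporter,
            // configured to use the stub's host and port.
            try (Socket reporter = new Socket(loopback, stub.getLocalPort());
                 OutputStream out = reporter.getOutputStream()) {
                String line = path + " " + value + " " + epochSeconds + "\n";
                out.write(line.getBytes(StandardCharsets.UTF_8));
                out.flush();
            }
            // Stub side: accept the queued connection and read what was sent.
            try (Socket accepted = stub.accept();
                 BufferedReader in = new BufferedReader(new InputStreamReader(
                         accepted.getInputStream(), StandardCharsets.UTF_8))) {
                return in.readLine();
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints "shiny_app.requests.count 3 1420070400"
        System.out.println(sendAndCapture("shiny_app.requests.count", 3, 1420070400L));
    }
}
```

An acceptance test can then assert on the captured lines, for instance that an error counter for a dependency went up after a primed fault.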
