Devoxx France: Fault tolerant microservices on the JVM with Cassandra

@chbatey
Fault tolerant microservices on
the JVM
Christopher Batey
DataStax
@chbatey

@chbatey
Who am I?
• DataStax
- Technical Evangelist / Software Engineer
- Builds an enterprise-ready version of Apache Cassandra
• Sky: Building next generation Internet TV platform
• Lots of time working on a test double for Apache Cassandra

@chbatey
Agenda
• Setting the scene
- What do we mean by a fault?
- What is a micro(ish)service?
- Monolith application vs the micro(ish)service
• A worked example
- Identify an issue
- Reproduce/test it
- Show how to deal with the issue

@chbatey
So… what do applications look like?

@chbatey
Small horizontally scalable services
• Move to small, independently deployed services
- Login service
- Device service
- etc
• Move to a horizontally scalable database that can run active-active in multiple data centres

@chbatey
So... what do systems look like now?

@chbatey
Example: Movie player service
[Diagram: a Play Movie request arrives at the Movie Player, which calls the User Service, Device Service and Pin Service]

@chbatey
Time for an example...
• All examples are on GitHub
• Technologies used:
- Dropwizard
- Spring Boot
- Wiremock
- Hystrix
- Graphite
- Saboteur

@chbatey
Testing microservices
• You don’t know a service is fault tolerant if you don’t test faults

@chbatey
The test double
Wiremock for HTTP integration
Stubbed Cassandra for Database
Kafka Unit

@chbatey
Isolated service tests
[Diagram: the acceptance test primes mocks of the User, Device and Pin services, then sends Play Movie to the Movie service over real HTTP/TCP]

@chbatey
Fault tolerance
1. Don’t take forever - Timeouts
2. Don’t try if you can’t succeed
3. Fail gracefully
4. Know if it’s your fault
5. Don’t whack a dead horse
6. Turn broken stuff off

@chbatey
1 - Don’t take forever
• If at first you don’t succeed, don’t take forever to tell someone
• Timeout and fail fast

@chbatey
Which timeouts?
• Socket connection timeout
• Socket read timeout

@chbatey
Your service hung for 30 seconds :(
Customer
You :(

@chbatey
Which timeouts?
• Socket connection timeout
• Socket read timeout
• Resource acquisition
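The first two timeouts in the list can be sketched with a plain JDK socket. This is a minimal illustration rather than code from the talk: the class name, method names and the silent local server are all hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class SocketTimeouts {
    /**
     * Connects to host:port and tries to read one byte. Returns "TIMEOUT" if
     * the connect or the read exceeds its limit.
     */
    static String readFirstByte(String host, int port, int connectTimeoutMs, int readTimeoutMs) {
        try (Socket socket = new Socket()) {
            // Connection timeout: how long to wait for the TCP handshake.
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            // Read timeout: how long a single blocking read may wait for data.
            socket.setSoTimeout(readTimeoutMs);
            InputStream in = socket.getInputStream();
            return in.read() == -1 ? "EOF" : "BYTE";
        } catch (SocketTimeoutException e) {
            return "TIMEOUT";
        } catch (IOException e) {
            return "ERROR";
        }
    }

    /** Hypothetical helper: a local server that listens but never writes anything. */
    static ServerSocket silentServer() {
        try {
            return new ServerSocket(0); // port 0 = any free port
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that the read timeout applies to each blocking read, not to the whole exchange, so a dependency that trickles out one byte at a time can hold the caller far longer than the configured value.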

@chbatey
Your service hung for 10 minutes :(

@chbatey
Let’s think about this

@chbatey
Adding an automated test

@chbatey
Adding an automated test
• Vagrant - launches + provisions local VMs
• Saboteur - uses tc, iptables to simulate network issues
• Wiremock - used to mock HTTP dependencies
• Cucumber - acceptance tests

@chbatey
I can write an automated test for that?
[Diagram: inside a Vagrant + VirtualBox VM, Wiremock stands in for the User, Device and Pin services, with Saboteur alongside it; the acceptance test primes Saboteur to drop traffic, exercises the Movie Service, then resets]

@chbatey
Implementing reliable timeouts

@chbatey
• Protect the container thread!
• Homemade: worker queue + thread pool (executor)

@chbatey
• Protect the container thread!
• Homemade: worker queue + thread pool (executor)
• Hystrix
• Spring Cloud Netflix
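The "homemade" option might look roughly like the sketch below: the container thread hands the call to a bounded worker pool and waits no longer than its budget. The class and method names are hypothetical and only the JDK is used; Hystrix does considerably more than this.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimeLimitedCaller {
    private final ExecutorService workers;

    TimeLimitedCaller(int threads, int queueSize) {
        // Bounded queue and fixed thread count; daemon threads so the JVM can exit.
        this.workers = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize),
                runnable -> {
                    Thread t = new Thread(runnable, "dependency-worker");
                    t.setDaemon(true);
                    return t;
                });
    }

    /** Runs the dependency call on a worker thread; the caller waits at most timeoutMs. */
    String call(Callable<String> dependency, long timeoutMs, String fallback) {
        Future<String> future;
        try {
            future = workers.submit(dependency);
        } catch (RejectedExecutionException e) {
            return fallback; // all threads busy and the queue is full
        }
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker so it can be reused
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the dependency call itself failed
        }
    }
}
```

The container thread is released after `timeoutMs` regardless of what the dependency is doing, which is the property that socket-level timeouts alone do not give.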

@chbatey
A simple Spring RestController
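import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class Resource {

    private static final Logger LOGGER = LoggerFactory.getLogger(Resource.class);

    @Autowired
    private ScaryDependency scaryDependency;

    // Runs on the container (Tomcat) request thread.
    @RequestMapping("/scary")
    public String callTheScaryDependency() {
        LOGGER.info("Resource later: I wonder which thread I am on!");
        return scaryDependency.getScaryString();
    }
}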

@chbatey
Scary dependency
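import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Component
public class ScaryDependency {

    private static final Logger LOGGER = LoggerFactory.getLogger(ScaryDependency.class);

    public String getScaryString() {
        LOGGER.info("Scary Dependency: I wonder which thread I am on! Tomcats?");
        if (System.currentTimeMillis() % 2 == 0) {
            return "Scary String";
        } else {
            try {
                Thread.sleep(5000); // simulate a slow dependency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag set
            }
            return "Slow Scary String";
        }
    }
}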

@chbatey
All on the tomcat thread
13:47:20.200 [http-8080-exec-1] INFO info.batey.examples.Resource - Resource later: I wonder which thread I am on!
13:47:20.200 [http-8080-exec-1] INFO info.batey.examples.ScaryDependency - Scary Dependency: I wonder which thread I am on! Tomcats?

@chbatey
Scary dependency
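import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Component
public class ScaryDependency {

    private static final Logger LOGGER = LoggerFactory.getLogger(ScaryDependency.class);

    // Hystrix runs this method on its own thread pool instead of the Tomcat thread.
    @HystrixCommand
    public String getScaryString() {
        LOGGER.info("Scary Dependency: I wonder which thread I am on! Tomcats?");
        if (System.currentTimeMillis() % 2 == 0) {
            return "Scary String";
        } else {
            try {
                Thread.sleep(5000); // simulate a slow dependency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag set
            }
            return "Slow Scary String";
        }
    }
}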

@chbatey
What an annotation can do...
13:51:21.513 [http-8080-exec-1] INFO info.batey.examples.Resource - Resource later: I wonder which thread I am on!
13:51:21.614 [hystrix-ScaryDependency-1] INFO info.batey.examples.ScaryDependency - Scary Dependency: I wonder which thread I am on! Tomcats? :P

@chbatey
Async libraries are your friend
• DataStax Java Driver
- Guava ListenableFuture

@chbatey
Timeouts take home
• You can’t use network level timeouts for SLAs
• Test your SLAs - if someone says you can’t, hit them with a
stick
• Scary things happen without network issues

@chbatey
2 - Don’t try if you can’t succeed

@chbatey
Complexity
“When an application grows
in complexity it will
eventually start sending
emails”

@chbatey
Complexity
“When an application grows
in complexity it will
eventually start using
queues and thread pools”

@chbatey
Don’t try if you can’t succeed

@chbatey
Don’t try if you can’t succeed
• Executors factories are unbounded by default :(
- newFixedThreadPool (unbounded queue)
- newSingleThreadExecutor (unbounded queue)
- newCachedThreadPool (unbounded number of threads)
• Bound your queues and threads
• Fail quickly when the queue / maxPoolSize is met
• Know your drivers
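One way to see the defaults for yourself is to inspect what the `Executors` factories return. The sketch below relies on the OpenJDK behaviour that `newFixedThreadPool` and `newCachedThreadPool` return a `ThreadPoolExecutor`; the casts are an assumption about the implementation, not part of the documented contract, and the class name is hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

final class ExecutorDefaults {
    /** Remaining queue capacity of a pool created by Executors.newFixedThreadPool(1). */
    static int fixedPoolQueueCapacity() {
        // Cast assumes the OpenJDK implementation, which returns a ThreadPoolExecutor.
        ThreadPoolExecutor pool = (ThreadPoolExecutor) Executors.newFixedThreadPool(1);
        try {
            return pool.getQueue().remainingCapacity();
        } finally {
            pool.shutdown();
        }
    }

    /** Maximum thread count of a pool created by Executors.newCachedThreadPool(). */
    static int cachedPoolMaxThreads() {
        ThreadPoolExecutor pool = (ThreadPoolExecutor) Executors.newCachedThreadPool();
        try {
            return pool.getMaximumPoolSize();
        } finally {
            pool.shutdown();
        }
    }
}
```

Both values come back as `Integer.MAX_VALUE`: the fixed pool will queue work without limit and the cached pool will create threads without limit, so neither ever says "no" to the caller.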

@chbatey
This is a functional requirement
• Set the timeout very high
• Use Wiremock to add a large delay to the requests

@chbatey
This is a functional requirement
• Set the timeout very high
• Use Wiremock to add a large delay to the requests
• Set queue size and thread pool size to 1
• Send in 2 requests to use the thread and fill the queue
• What happens on the 3rd request?
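The experiment above can be reproduced in miniature with the JDK alone. In this sketch a latch stands in for the delayed Wiremock dependency, and the class and method names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedPoolExperiment {
    /**
     * Sends the given number of blocking tasks to a pool with one thread and
     * one queue slot, and returns how many of them were rejected.
     */
    static int countRejections(int requests) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.AbortPolicy()); // reject instead of queueing forever
        CountDownLatch release = new CountDownLatch(1); // stands in for the slow dependency
        int rejected = 0;
        try {
            for (int i = 0; i < requests; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // block until the experiment is over
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // thread busy and queue full: fail fast
                }
            }
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return rejected;
    }
}
```

The first request occupies the thread, the second fills the queue, and the third is rejected immediately rather than waiting, which is the behaviour the acceptance test should assert on.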

@chbatey
Expect rubbish
• Expect invalid HTTP
• Expect malformed response bodies
• Expect connection failures
• Expect huge / tiny responses
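As an illustration of expecting rubbish, the sketch below reads a response body with a size cap and only accepts the two values it understands. It assumes a pin check that returns `true` or `false` as plain text, as in the Hystrix example later in the deck; the class name and the byte limit are hypothetical, and `readNBytes(int)` needs Java 11 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

final class DefensiveParsing {
    /**
     * Parses a pin check body. Returns empty for anything that is not exactly
     * "true" or "false": empty bodies, oversized bodies, garbage, read errors.
     */
    static Optional<Boolean> parsePinCheck(InputStream body, int maxBytes) {
        try {
            // Read one byte more than allowed so an oversized body is detectable
            // without buffering the whole thing.
            byte[] buffer = body.readNBytes(maxBytes + 1);
            if (buffer.length == 0 || buffer.length > maxBytes) {
                return Optional.empty();
            }
            String text = new String(buffer, StandardCharsets.UTF_8).trim();
            if (text.equalsIgnoreCase("true")) {
                return Optional.of(true);
            }
            if (text.equalsIgnoreCase("false")) {
                return Optional.of(false);
            }
            return Optional.empty(); // malformed body
        } catch (IOException e) {
            return Optional.empty(); // connection dropped mid-read
        }
    }
}
```

By contrast, `Boolean.valueOf` maps every string other than "true" to `false`, so an HTML error page would be read as a failed pin check rather than as a broken dependency.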

@chbatey
Testing with Wiremock
stubFor(get(urlEqualTo("/dependencyPath"))
        .willReturn(aResponse()
                .withFault(Fault.MALFORMED_RESPONSE_CHUNK)));

{
    "request": {
        "method": "GET",
        "url": "/fault"
    },
    "response": {
        "fault": "RANDOM_DATA_THEN_CLOSE"
    }
}

{
    "request": {
        "method": "GET",
        "url": "/fault"
    },
    "response": {
        "fault": "EMPTY_RESPONSE"
    }
}

@chbatey
4 - Know if it’s your fault

@chbatey
Record stuff
• Metrics:
- Timings
- Errors
- Concurrent incoming requests
- Thread pool statistics
- Connection pool statistics
• Logging: Boundary logging, ElasticSearch / Logstash
• Request identifiers
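A hand-rolled version of the timings, errors and concurrency metrics might look like the sketch below. It is only meant to show what gets recorded around each call; the class name is hypothetical, and in the talk's examples a metrics library does this job.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

final class CallMetrics {
    private final LongAdder calls = new LongAdder();
    private final LongAdder errors = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();
    private final AtomicInteger inFlight = new AtomicInteger();

    /** Runs the call and records its duration, outcome and concurrency. */
    <T> T record(Supplier<T> call) {
        inFlight.incrementAndGet();
        long start = System.nanoTime();
        try {
            return call.get();
        } catch (RuntimeException e) {
            errors.increment();
            throw e; // metrics must not swallow the failure
        } finally {
            totalNanos.add(System.nanoTime() - start);
            calls.increment();
            inFlight.decrementAndGet();
        }
    }

    long calls() { return calls.sum(); }
    long errors() { return errors.sum(); }
    long totalNanos() { return totalNanos.sum(); }
    int inFlight() { return inFlight.get(); }
}
```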

@chbatey
Separate resource pools
• Don’t flood your dependencies
• Be able to answer the questions:
- How many connections will you make to
dependency X?
- Are you getting close to your max
connections?
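One simple way to be able to answer both questions is to give each dependency its own permit pool. The sketch below uses a JDK `Semaphore`; the class name is hypothetical and this is not how the talk's examples implement it.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

final class DependencyBulkhead {
    private final int maxConcurrent;
    private final Semaphore permits;

    DependencyBulkhead(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Calls the dependency if a permit is free; otherwise returns the fallback at once. */
    <T> T call(Supplier<T> dependency, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback; // at the limit: do not pile more load on the dependency
        }
        try {
            return dependency.get();
        } finally {
            permits.release();
        }
    }

    /** How many calls to this dependency are in progress right now. */
    int inUse() {
        return maxConcurrent - permits.availablePermits();
    }

    /** The most concurrent calls this service will ever make to the dependency. */
    int max() {
        return maxConcurrent;
    }
}
```

Publishing `inUse()` next to `max()` as a gauge shows how close the service is to its limit for dependency X.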

@chbatey
So easy with Dropwizard + Hystrix
metrics:
  reporters:
    - type: graphite
      host: 192.168.10.120
      port: 2003
      prefix: shiny_app

@Override
public void initialize(Bootstrap<AppConfig> appConfigBootstrap) {
    HystrixCodaHaleMetricsPublisher metricsPublisher =
            new HystrixCodaHaleMetricsPublisher(appConfigBootstrap.getMetricRegistry());
    HystrixPlugins.getInstance().registerMetricsPublisher(metricsPublisher);
}

@chbatey
5 - Don’t whack a dead horse
[Diagram: a Play Movie request arrives at the Movie Player, which calls the User Service, Device Service and Pin Service]

@chbatey
What to do…
• Yes this will happen…
• Mandatory dependency - fail *really* fast
• Throttling
• Fallbacks

@chbatey
Circuit breaker pattern

@chbatey
Implementation with Hystrix
 
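import com.codahale.metrics.annotation.Timed;
import org.apache.http.client.HttpClient;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import javax.ws.rs.GET;
import javax.ws.rs.Path;

@Path("integrate")
public class IntegrationResource {

    private static final Logger LOGGER = LoggerFactory.getLogger(IntegrationResource.class);

    // One HTTP client per dependency (fields and constructor assumed; the slide omits them).
    private final HttpClient userService;
    private final HttpClient deviceService;
    private final HttpClient pinService;

    public IntegrationResource(HttpClient userService, HttpClient deviceService, HttpClient pinService) {
        this.userService = userService;
        this.deviceService = deviceService;
        this.pinService = pinService;
    }

    @GET
    @Timed
    public String integrate() {
        LOGGER.info("integrate");
        String user = new UserServiceDependency(userService).execute();
        String device = new DeviceServiceDependency(deviceService).execute();
        Boolean pinCheck = new PinCheckDependency(pinService).execute();
        return String.format("[User info: %s] \n[Device info: %s] \n[Pin check: %s] \n",
                user, device, pinCheck);
    }
}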

@chbatey
 
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.util.EntityUtils;

public class PinCheckDependency extends HystrixCommand<Boolean> {

    private HttpClient httpClient;

    public PinCheckDependency(HttpClient httpClient) {
        super(HystrixCommandGroupKey.Factory.asKey("PinCheckService"));
        this.httpClient = httpClient;
    }

    // Executed by Hystrix; any exception counts as a failure.
    @Override
    protected Boolean run() throws Exception {
        HttpGet pinCheck = new HttpGet("http://localhost:9090/pincheck");
        HttpResponse pinCheckResponse = httpClient.execute(pinCheck);
        int statusCode = pinCheckResponse.getStatusLine().getStatusCode();
        if (statusCode != 200) {
            throw new RuntimeException("Oh dear no pin check, status code " + statusCode);
        }
        String pinCheckInfo = EntityUtils.toString(pinCheckResponse.getEntity());
        return Boolean.valueOf(pinCheckInfo);
    }
}

@chbatey
 
public class PinCheckDependency extends HystrixCommand<Boolean> {

    private HttpClient httpClient;

    public PinCheckDependency(HttpClient httpClient) {
        super(HystrixCommandGroupKey.Factory.asKey("PinCheckService"));
        this.httpClient = httpClient;
    }

    @Override
    protected Boolean run() throws Exception {
        HttpGet pinCheck = new HttpGet("http://localhost:9090/pincheck");
        HttpResponse pinCheckResponse = httpClient.execute(pinCheck);
        int statusCode = pinCheckResponse.getStatusLine().getStatusCode();
        if (statusCode != 200) {
            throw new RuntimeException("Oh dear no pin check, status code " + statusCode);
        }
        String pinCheckInfo = EntityUtils.toString(pinCheckResponse.getEntity());
        return Boolean.valueOf(pinCheckInfo);
    }

    // Used when run() fails, times out, is rejected, or the circuit is open.
    @Override
    public Boolean getFallback() {
        return true;
    }
}

@chbatey
Triggering the fallback
• Error threshold percentage
• Bucket of time for the percentage
• Minimum number of requests to trigger
• Time before trying a request again
• Disable
• Per instance statistics
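To make the first four settings concrete, here is a deliberately small circuit breaker written against the JDK only. It is a sketch of the pattern, not Hystrix's implementation: the class, parameter names and the single resetting window are assumptions made for illustration, and the clock is injected so the behaviour can be tested without sleeping.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class SimpleCircuitBreaker {
    private final int errorThresholdPercent; // error threshold percentage
    private final int minRequests;           // minimum number of requests to trigger
    private final long windowMs;             // bucket of time for the percentage
    private final long sleepMs;              // time before trying a request again
    private final LongSupplier clock;        // current time in milliseconds

    private long windowStart;
    private int requests;
    private int failures;
    private long openedAt = -1;              // -1 means the circuit is closed

    SimpleCircuitBreaker(int errorThresholdPercent, int minRequests,
                         long windowMs, long sleepMs, LongSupplier clock) {
        this.errorThresholdPercent = errorThresholdPercent;
        this.minRequests = minRequests;
        this.windowMs = windowMs;
        this.sleepMs = sleepMs;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    <T> T call(Supplier<T> dependency, Supplier<T> fallback) {
        if (!allowRequest()) {
            return fallback.get(); // open: do not touch the dependency at all
        }
        try {
            T result = dependency.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback.get();
        }
    }

    synchronized boolean isOpen() {
        return openedAt >= 0;
    }

    private synchronized boolean allowRequest() {
        if (openedAt < 0) {
            return true;
        }
        long now = clock.getAsLong();
        if (now - openedAt >= sleepMs) {
            openedAt = now; // let one trial request through and restart the sleep
            return true;
        }
        return false;
    }

    private synchronized void onSuccess() {
        rollWindow();
        requests++;
        if (openedAt >= 0) { // the trial request succeeded: close the circuit
            openedAt = -1;
            requests = 0;
            failures = 0;
        }
    }

    private synchronized void onFailure() {
        rollWindow();
        requests++;
        failures++;
        boolean overThreshold = failures * 100 >= errorThresholdPercent * requests;
        if (openedAt < 0 && requests >= minRequests && overThreshold) {
            openedAt = clock.getAsLong(); // trip the breaker
        }
    }

    private void rollWindow() {
        long now = clock.getAsLong();
        if (now - windowStart >= windowMs) {
            windowStart = now;
            requests = 0;
            failures = 0;
        }
    }
}
```

With a 50% threshold and a minimum of 4 requests, four consecutive failures open the circuit; later calls return the fallback without invoking the dependency until the sleep period has passed and a trial request succeeds.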

@chbatey
6 - Turn off broken stuff
• The kill switch

@chbatey
To recap
1. Don’t take forever - Timeouts
2. Don’t try if you can’t succeed
3. Fail gracefully
4. Know if it’s your fault
5. Don’t whack a dead horse
6. Turn broken stuff off

@chbatey
Links
• Examples:
- https://github.com/chbatey/spring-cloud-example
- https://github.com/chbatey/dropwizard-hystrix
- https://github.com/chbatey/vagrant-wiremock-saboteur
• Tech:
- https://github.com/Netflix/Hystrix
- https://www.vagrantup.com/
- http://wiremock.org/
- https://github.com/tomakehurst/saboteur

@chbatey
Questions?
Thanks for listening!
Questions: @chbatey
http://christopher-batey.blogspot.co.uk/
http://www.eventbrite.com/e/cassandra-day-paris-france-2015-june-16th-2015-tickets-15053035033?aff=meetup1
http://www.eventbrite.com/e/cassandra-day-london-2015-april-22nd-2015-tickets-15053026006?aff=meetup1
