
Devoxx France: Fault tolerant microservices on the JVM with Cassandra


  1. 1. @chbatey Fault tolerant microservices on the JVM Christopher Batey DataStax @chbatey
  2. 2. @chbatey Who am I? • DataStax - Technical Evangelist / Software Engineer - Builds enterprise ready version of Apache Cassandra • Sky: Building next generation Internet TV platform • Lots of time working on a test double for Apache Cassandra
  3. 3. @chbatey Agenda •Setting the scene -What do we mean by a fault? -What is a micro(ish)service? -Monolith application vs the micro(ish)service •A worked example -Identify an issue -Reproduce/test it -Show how to deal with the issue
  4. 4. @chbatey So… what do applications look like?
  9. 9. @chbatey Small horizontally scalable services • Move to small, independently deployed services - Login service - Device service - etc. • Move to a horizontally scalable database that can run active-active in multiple data centres
  11. 11. @chbatey So... what do systems look like now?
  12. 12. @chbatey Pin Service Movie Player User Service Device Service Play Movie Example: Movie player service
  13. 13. @chbatey Time for an example... •All examples are on GitHub •Technologies used: -Dropwizard -Spring Boot -Wiremock -Hystrix -Graphite -Saboteur
  14. 14. @chbatey Testing microservices • You don’t know a service is fault tolerant if you don’t test faults
  15. 15. @chbatey The test double Wiremock for HTTP integration Stubbed Cassandra for Database Kafka Unit
  16. 16. @chbatey Isolated service tests Movie service Mocks User Device Pin service Play Movie Acceptance Test Prime Real HTTP/TCP
  17. 17. @chbatey Fault tolerance 1.Don’t take forever - Timeouts 2.Don’t try if you can’t succeed 3.Fail gracefully 4.Know if it’s your fault 5.Don’t whack a dead horse 6.Turn broken stuff off
  18. 18. @chbatey 1 - Don’t take forever • If at first you don’t succeed, don’t take forever to tell someone • Timeout and fail fast
  19. 19. @chbatey Which timeouts? • Socket connection timeout • Socket read timeout
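The two socket-level timeouts can be sketched with nothing but the JDK. This is a minimal illustration, not code from the talk's repositories: the class name `SocketTimeouts`, the silent local server, and the timeout values are all assumptions.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class: names and values are illustrative assumptions.
class SocketTimeouts {

    // Connects to a local server that accepts the TCP connection but never
    // writes a byte, and reports whether the read timeout fired.
    static boolean readTimesOut(int connectTimeoutMs, int readTimeoutMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket()) {
            // Connection timeout: upper bound on establishing the TCP connection.
            client.connect(
                    new InetSocketAddress(loopback, silentServer.getLocalPort()),
                    connectTimeoutMs);
            // Read timeout: upper bound on how long one read() may block.
            client.setSoTimeout(readTimeoutMs);
            try {
                client.getInputStream().read();
                return false; // data or end of stream arrived, no timeout
            } catch (SocketTimeoutException expected) {
                return true;  // the dependency stayed silent for too long
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `SocketTimeouts.readTimesOut(500, 200)` returns `true`, because the server never answers. Without `setSoTimeout` the same read would block indefinitely.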
  20. 20. @chbatey Your service hung for 30 seconds :( Customer You :(
  21. 21. @chbatey Which timeouts? • Socket connection timeout • Socket read timeout • Resource acquisition
  22. 22. @chbatey Your service hung for 10 minutes :(
  23. 23. @chbatey Let’s think about this
  24. 24. @chbatey A little more detail
  25. 25. @chbatey Adding an automated test
  26. 26. @chbatey Adding an automated test •Vagrant - launches + provisions local VMs •Saboteur - uses tc, iptables to simulate network issues •Wiremock - used to mock HTTP dependencies •Cucumber - acceptance tests
  27. 27. @chbatey I can write an automated test for that? Wiremock: •User Service •Device Service •Pin Service Saboteur Vagrant + Virtual box VM Movie Service Acceptance prime to drop traffic reset
  28. 28. @chbatey Implementing reliable timeouts
  29. 29. @chbatey Implementing reliable timeouts • Protect the container thread! • Homemade: worker queue + thread pool (executor)
  30. 30. @chbatey Implementing reliable timeouts • Protect the container thread! • Homemade: worker queue + thread pool (executor) • Hystrix • Spring Cloud Netflix
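The "homemade" option might look roughly like the sketch below: the calling (container) thread hands the work to a bounded pool and waits for at most a fixed time. The class name `TimeLimitedCaller`, the pool size of 2, the queue size of 10, and the fallback value are assumptions made for illustration. Hystrix offers the same idea with far more features.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: sizes and names are illustrative assumptions.
class TimeLimitedCaller {

    // Bounded worker pool: 2 threads and at most 10 queued calls.
    private final ExecutorService workers = new ThreadPoolExecutor(
            2, 2, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(10));

    // Runs the dependency call on a worker thread and waits at most
    // timeoutMs on the calling thread before giving up.
    String call(Callable<String> dependency, long timeoutMs, String fallback) {
        Future<String> future = workers.submit(dependency);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker so it can stop early
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the dependency call itself failed
        }
    }

    void shutdown() {
        workers.shutdownNow();
    }
}
```

With this sketch, `call(() -> "ok", 1000, "fallback")` returns `"ok"`, while a call that sleeps for five seconds with a 100 ms limit returns `"fallback"` after roughly 100 ms. Note that `submit` throws `RejectedExecutionException` once both threads are busy and the queue is full, which is the fail-fast behaviour the talk recommends later.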
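  31. 31. @chbatey A simple Spring RestController
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.springframework.beans.factory.annotation.Autowired;
 import org.springframework.web.bind.annotation.RequestMapping;
 import org.springframework.web.bind.annotation.RestController;
 
 @RestController
 public class Resource {
 
     private static final Logger LOGGER = LoggerFactory.getLogger(Resource.class);
 
     @Autowired
     private ScaryDependency scaryDependency;
 
     @RequestMapping("/scary")
     public String callTheScaryDependency() {
         LOGGER.info("Resource later: I wonder which thread I am on!");
         return scaryDependency.getScaryString();
     }
 }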
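  32. 32. @chbatey Scary dependency
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.springframework.stereotype.Component;
 
 @Component
 public class ScaryDependency {
 
     private static final Logger LOGGER = LoggerFactory.getLogger(ScaryDependency.class);
 
     public String getScaryString() {
         LOGGER.info("Scary Dependency: I wonder which thread I am on! Tomcats?");
         if (System.currentTimeMillis() % 2 == 0) {
             return "Scary String";
         } else {
             try {
                 Thread.sleep(5000); // simulate a slow dependency
             } catch (InterruptedException e) {
                 Thread.currentThread().interrupt(); // restore the interrupt flag
             }
             return "Slow Scary String";
         }
     }
 }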
  33. 33. @chbatey All on the tomcat thread 13:47:20.200 [http-8080-exec-1] INFO info.batey.examples.Resource - Resource later: I wonder which thread I am on! 13:47:20.200 [http-8080-exec-1] INFO info.batey.examples.ScaryDependency - Scary Dependency: I wonder which thread I am on! Tomcats?
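  34. 34. @chbatey Scary dependency
 import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.springframework.stereotype.Component;
 
 @Component
 public class ScaryDependency {
 
     private static final Logger LOGGER = LoggerFactory.getLogger(ScaryDependency.class);
 
     @HystrixCommand()
     public String getScaryString() {
         LOGGER.info("Scary Dependency: I wonder which thread I am on! Tomcats?");
         if (System.currentTimeMillis() % 2 == 0) {
             return "Scary String";
         } else {
             try {
                 Thread.sleep(5000); // simulate a slow dependency
             } catch (InterruptedException e) {
                 Thread.currentThread().interrupt(); // restore the interrupt flag
             }
             return "Slow Scary String";
         }
     }
 }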
  35. 35. @chbatey What an annotation can do... 13:51:21.513 [http-8080-exec-1] INFO info.batey.examples.Resource - Resource later: I wonder which thread I am on! 13:51:21.614 [hystrix-ScaryDependency-1] INFO info.batey.examples.ScaryDependency - Scary Dependency: I wonder which thread I am on! Tomcats? :P
  36. 36. @chbatey Async libraries are your friend • DataStax Java Driver - Guava ListenableFuture
  37. 37. @chbatey Timeouts take home • You can’t use network level timeouts for SLAs • Test your SLAs - if someone says you can’t, hit them with a stick • Scary things happen without network issues
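One way to see why network-level timeouts cannot enforce an SLA: a socket read timeout applies to each individual read, so a dependency that trickles its response one byte at a time never trips it. The sketch below is an assumption-laden illustration (the class name `TrickleDemo`, the local trickling server, and the timings are invented for the demo), not code from the talk.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class: names and timings are illustrative assumptions.
class TrickleDemo {

    // Reads a response that arrives one byte every gapMs milliseconds and
    // returns the total elapsed time in milliseconds.
    static long readTrickledResponse(int readTimeoutMs, int bytes, int gapMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread trickler = new Thread(() -> {
                try (Socket peer = server.accept();
                     OutputStream out = peer.getOutputStream()) {
                    for (int i = 0; i < bytes; i++) {
                        Thread.sleep(gapMs); // slow, but never silent for long
                        out.write('x');
                        out.flush();
                    }
                } catch (IOException | InterruptedException ignored) {
                    // demo only: the client simply sees the stream end
                }
            });
            trickler.setDaemon(true);
            trickler.start();

            long start = System.nanoTime();
            try (Socket client = new Socket()) {
                client.connect(new InetSocketAddress(loopback, server.getLocalPort()), 1000);
                client.setSoTimeout(readTimeoutMs); // applies to each read() call
                InputStream in = client.getInputStream();
                while (in.read() != -1) {
                    // every byte arrives within the read timeout, so it never fires
                }
            }
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With a 200 ms read timeout, six bytes, and a 50 ms gap, the read completes without a `SocketTimeoutException` even though the whole response takes at least 300 ms. Bounding the total time needs a timeout around the whole call, as with a worker pool or Hystrix.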
  38. 38. @chbatey Fault tolerance 1.Don’t take forever - Timeouts 2.Don’t try if you can’t succeed 3.Fail gracefully 4.Know if it’s your fault 5.Don’t whack a dead horse 6.Turn broken stuff off
  39. 39. @chbatey 2 - Don’t try if you can’t succeed
  40. 40. @chbatey Complexity “When an application grows in complexity it will eventually start sending emails”
  41. 41. @chbatey Complexity “When an application grows in complexity it will eventually start using queues and thread pools”
  42. 42. @chbatey Don’t try if you can’t succeed
  43. 43. @chbatey Don’t try if you can’t succeed • Executors factory methods are unbounded :( - newFixedThreadPool (unbounded queue) - newSingleThreadExecutor (unbounded queue) - newCachedThreadPool (unbounded threads) • Bound your queues and threads • Fail quickly when the queue size / maxPoolSize is reached • Know your drivers
  44. 44. @chbatey This is a functional requirement • Set the timeout very high • Use Wiremock to add a large delay to the requests
  45. 45. @chbatey This is a functional requirement • Set the timeout very high • Use Wiremock to add a large delay to the requests • Set queue size and thread pool size to 1 • Send in 2 requests to use the thread and fill the queue • What happens on the 3rd request?
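The experiment described above can be reproduced with a plain `ThreadPoolExecutor`. This is a sketch under assumptions: the class name `BoundedPoolDemo` is invented, and a latch stands in for the Wiremock delay so the example needs no HTTP.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: the latch simulates a very slow dependency.
class BoundedPoolDemo {

    // Returns true if the third request is rejected immediately.
    static boolean thirdRequestIsRejected() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,        // exactly one thread
                new ArrayBlockingQueue<>(1),             // queue of size one
                new ThreadPoolExecutor.AbortPolicy());   // reject when full
        CountDownLatch release = new CountDownLatch(1);
        Runnable slowRequest = () -> {
            try {
                release.await(); // blocks until the demo is over
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            pool.execute(slowRequest); // 1st request occupies the thread
            pool.execute(slowRequest); // 2nd request fills the queue
            try {
                pool.execute(slowRequest); // 3rd request has nowhere to go
                return false;
            } catch (RejectedExecutionException rejected) {
                return true; // failed fast instead of queueing forever
            }
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }
}
```

`BoundedPoolDemo.thirdRequestIsRejected()` returns `true`: the third request fails at once with `RejectedExecutionException` rather than waiting behind work that cannot complete in time.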
  46. 46. @chbatey Fault tolerance 1.Don’t take forever - Timeouts 2.Don’t try if you can’t succeed 3.Fail gracefully 4.Know if it’s your fault 5.Don’t whack a dead horse 6.Turn broken stuff off
  47. 47. @chbatey 3 - Fail gracefully
  48. 48. @chbatey Expect rubbish • Expect invalid HTTP • Expect malformed response bodies • Expect connection failures • Expect huge / tiny responses
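As a small illustration of expecting rubbish, the sketch below parses a response body that should contain only `true` or `false`. The class name `PinCheckParser` and the 16-character limit are assumptions. The point is that anything unexpected is reported as "no answer" rather than silently treated as a valid value.

```java
import java.util.Optional;

// Hypothetical parser: the size limit and names are illustrative assumptions.
class PinCheckParser {

    static final int MAX_BODY_CHARS = 16;

    // Returns the parsed flag, or empty when the body is missing, oversized,
    // or anything other than "true" / "false".
    static Optional<Boolean> parse(String body) {
        if (body == null || body.length() > MAX_BODY_CHARS) {
            return Optional.empty(); // missing or suspiciously large response
        }
        String trimmed = body.trim();
        if (trimmed.equalsIgnoreCase("true")) {
            return Optional.of(true);
        }
        if (trimmed.equalsIgnoreCase("false")) {
            return Optional.of(false);
        }
        return Optional.empty(); // malformed: let the caller choose a fallback
    }
}
```

By contrast, `Boolean.valueOf("<html>502</html>")` quietly returns `false`, which hides the fact that the dependency sent garbage.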
  49. 49. @chbatey Testing with Wiremock
 stubFor(get(urlEqualTo("/dependencyPath"))
         .willReturn(aResponse()
                 .withFault(Fault.MALFORMED_RESPONSE_CHUNK)));
 {
     "request": {
         "method": "GET",
         "url": "/fault"
     },
     "response": {
         "fault": "RANDOM_DATA_THEN_CLOSE"
     }
 }
 {
     "request": {
         "method": "GET",
         "url": "/fault"
     },
     "response": {
         "fault": "EMPTY_RESPONSE"
     }
 }
  50. 50. @chbatey Stubbed Cassandra
  51. 51. @chbatey Fault tolerance 1.Don’t take forever - Timeouts 2.Don’t try if you can’t succeed 3.Fail gracefully 4.Know if it’s your fault 5.Don’t whack a dead horse 6.Turn broken stuff off
  52. 52. @chbatey 4 - Know if it’s your fault
  53. 53. @chbatey Record stuff • Metrics: - Timings - Errors - Concurrent incoming requests - Thread pool statistics - Connection pool statistics • Logging: Boundary logging, ElasticSearch / Logstash • Request identifiers
  54. 54. @chbatey Zipkin from Twitter
  55. 55. @chbatey Graphite + Codahale
  56. 56. @chbatey Response times
  57. 57. @chbatey Separate resource pools • Don’t flood your dependencies • Be able to answer the questions: - How many connections will you make to dependency X? - Are you getting close to your max connections?
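A separate resource pool per dependency can be approximated with one semaphore per dependency. The sketch below assumes a hypothetical `Bulkhead` class. Hystrix provides thread-pool and semaphore isolation that serve the same purpose, and this is not its implementation.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical per-dependency limiter: create one instance per dependency.
class Bulkhead {

    private final int maxConcurrent;
    private final Semaphore permits;

    Bulkhead(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the call only if a permit is free; otherwise fails fast with the
    // fallback instead of piling more load onto the dependency.
    <T> T call(Supplier<T> dependency, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback;
        }
        try {
            return dependency.get();
        } finally {
            permits.release();
        }
    }

    // Answers "how close are we to the maximum?" for metrics and alerts.
    int inFlight() {
        return maxConcurrent - permits.availablePermits();
    }
}
```

Because each dependency gets its own `Bulkhead`, a slow Pin Service can exhaust only its own permits and cannot starve calls to the User or Device services. `inFlight()` is the kind of number worth publishing to Graphite.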
  58. 58. @chbatey So easy with Dropwizard + Hystrix
 metrics:
   reporters:
     - type: graphite
       host: 192.168.10.120
       port: 2003
       prefix: shiny_app
 
 @Override
 public void initialize(Bootstrap<AppConfig> appConfigBootstrap) {
     HystrixCodaHaleMetricsPublisher metricsPublisher =
             new HystrixCodaHaleMetricsPublisher(appConfigBootstrap.getMetricRegistry());
     HystrixPlugins.getInstance().registerMetricsPublisher(metricsPublisher);
 }
  59. 59. @chbatey Fault tolerance 1.Don’t take forever - Timeouts 2.Don’t try if you can’t succeed 3.Fail gracefully 4.Know if it’s your fault 5.Don’t whack a dead horse 6.Turn broken stuff off
  60. 60. @chbatey 5 - Don’t whack a dead horse Movie Player User Service Device Service Play Movie Pin Service
  61. 61. @chbatey What to do… • Yes this will happen… • Mandatory dependency - fail *really* fast • Throttling • Fallbacks
  62. 62. @chbatey Circuit breaker pattern
  63. 63. @chbatey Implementation with Hystrix
 import com.codahale.metrics.annotation.Timed;
 import javax.ws.rs.GET;
 import javax.ws.rs.Path;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 @Path("integrate")
 public class IntegrationResource {
 
     private static final Logger LOGGER = LoggerFactory.getLogger(IntegrationResource.class);
 
     // userService, deviceService and pinService are fields not shown on the slide
 
     @GET
     @Timed
     public String integrate() {
         LOGGER.info("integrate");
         String user = new UserServiceDependency(userService).execute();
         String device = new DeviceServiceDependency(deviceService).execute();
         Boolean pinCheck = new PinCheckDependency(pinService).execute();
         return String.format("[User info: %s] \n[Device info: %s] \n[Pin check: %s] \n", user, device, pinCheck);
     }
 
 }
  64. 64. @chbatey Implementation with Hystrix
 import com.netflix.hystrix.HystrixCommand;
 import com.netflix.hystrix.HystrixCommandGroupKey;
 import org.apache.http.HttpResponse;
 import org.apache.http.client.HttpClient;
 import org.apache.http.client.methods.HttpGet;
 import org.apache.http.util.EntityUtils;
 
 public class PinCheckDependency extends HystrixCommand<Boolean> {
 
     private HttpClient httpClient;
 
     public PinCheckDependency(HttpClient httpClient) {
         super(HystrixCommandGroupKey.Factory.asKey("PinCheckService"));
         this.httpClient = httpClient;
     }
 
     @Override
     protected Boolean run() throws Exception {
         HttpGet pinCheck = new HttpGet("http://localhost:9090/pincheck");
         HttpResponse pinCheckResponse = httpClient.execute(pinCheck);
         int statusCode = pinCheckResponse.getStatusLine().getStatusCode();
         if (statusCode != 200) {
             throw new RuntimeException("Oh dear no pin check, status code " + statusCode);
         }
         String pinCheckInfo = EntityUtils.toString(pinCheckResponse.getEntity());
         return Boolean.valueOf(pinCheckInfo);
     }
 
 }

  65. 65. @chbatey Implementation with Hystrix
 public class PinCheckDependency extends HystrixCommand<Boolean> {
 
     private HttpClient httpClient;
 
     public PinCheckDependency(HttpClient httpClient) {
         super(HystrixCommandGroupKey.Factory.asKey("PinCheckService"));
         this.httpClient = httpClient;
     }
 
     @Override
     protected Boolean run() throws Exception {
         HttpGet pinCheck = new HttpGet("http://localhost:9090/pincheck");
         HttpResponse pinCheckResponse = httpClient.execute(pinCheck);
         int statusCode = pinCheckResponse.getStatusLine().getStatusCode();
         if (statusCode != 200) {
             throw new RuntimeException("Oh dear no pin check, status code " + statusCode);
         }
         String pinCheckInfo = EntityUtils.toString(pinCheckResponse.getEntity());
         return Boolean.valueOf(pinCheckInfo);
     }
 
     // Used when run() fails, times out, is rejected, or the circuit is open
     @Override
     public Boolean getFallback() {
         return true;
     }
 }

  66. 66. @chbatey Triggering the fallback • Error threshold percentage • Bucket of time for the percentage • Minimum number of requests to trigger • Time before trying a request again • Disable • Per instance statistics
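To make these knobs concrete, here is a deliberately simplified circuit breaker. It is a sketch under assumptions and not how Hystrix is implemented: the class name `SimpleCircuitBreaker` and its constructor parameters are invented, the clock is injected so the behaviour can be tested, and the rolling time bucket is omitted (failures are counted since the breaker last closed).

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical, simplified breaker: no rolling window, one trial request.
class SimpleCircuitBreaker {

    private final int errorThresholdPercent; // error threshold percentage
    private final int minimumRequests;       // minimum requests before tripping
    private final long sleepWindowMs;        // time before trying a request again
    private final LongSupplier clock;        // current time in milliseconds

    private boolean tripped;
    private long openedAt;
    private int requests;
    private int failures;

    SimpleCircuitBreaker(int errorThresholdPercent, int minimumRequests,
                         long sleepWindowMs, LongSupplier clock) {
        this.errorThresholdPercent = errorThresholdPercent;
        this.minimumRequests = minimumRequests;
        this.sleepWindowMs = sleepWindowMs;
        this.clock = clock;
    }

    // Open means: tripped, and the sleep window has not yet elapsed.
    synchronized boolean isOpen() {
        return tripped && clock.getAsLong() - openedAt < sleepWindowMs;
    }

    <T> T call(Supplier<T> dependency, T fallback) {
        if (isOpen()) {
            return fallback; // short-circuit: do not touch the dependency
        }
        try {
            T result = dependency.get();
            record(true);
            return result;
        } catch (RuntimeException e) {
            record(false);
            return fallback;
        }
    }

    private synchronized void record(boolean success) {
        if (tripped) {
            // This was a trial request made after the sleep window.
            if (success) {
                tripped = false;
                requests = 0;
                failures = 0;
            } else {
                openedAt = clock.getAsLong(); // stay open for another window
            }
            return;
        }
        requests++;
        if (!success) {
            failures++;
        }
        if (requests >= minimumRequests
                && failures * 100 >= errorThresholdPercent * requests) {
            tripped = true;
            openedAt = clock.getAsLong();
        }
    }
}
```

With a 50% threshold, a minimum of 4 requests, and a 1000 ms sleep window, four consecutive failures open the circuit. Calls then return the fallback without invoking the dependency until the window has passed, and a successful trial request closes the circuit again.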
  67. 67. @chbatey Fault tolerance 1.Don’t take forever - Timeouts 2.Don’t try if you can’t succeed 3.Fail gracefully 4.Know if it’s your fault 5.Don’t whack a dead horse 6.Turn broken stuff off
  68. 68. @chbatey 6 - Turn off broken stuff • The kill switch
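A kill switch can be as small as a flag checked before each call. The sketch below assumes a hypothetical `KillSwitch` class. How the flag gets flipped at runtime (an admin endpoint, configuration reload, and so on) is left out.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical kill switch: one instance per feature or dependency.
class KillSwitch {

    private final AtomicBoolean enabled = new AtomicBoolean(true);

    void turnOff() {
        enabled.set(false);
    }

    void turnOn() {
        enabled.set(true);
    }

    // Runs the feature only while it is enabled; otherwise returns the
    // supplied default without touching the broken dependency.
    <T> T call(Supplier<T> feature, T whenDisabled) {
        return enabled.get() ? feature.get() : whenDisabled;
    }
}
```

Once `turnOff()` has been called, every `call` returns the default immediately, so operators can disable a misbehaving dependency without redeploying.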
  69. 69. @chbatey To recap 1.Don’t take forever - Timeouts 2.Don’t try if you can’t succeed 3.Fail gracefully 4.Know if it’s your fault 5.Don’t whack a dead horse 6.Turn broken stuff off
  70. 70. @chbatey Links • Examples: - https://github.com/chbatey/spring-cloud-example - https://github.com/chbatey/dropwizard-hystrix - https://github.com/chbatey/vagrant-wiremock-saboteur • Tech: - https://github.com/Netflix/Hystrix - https://www.vagrantup.com/ - http://wiremock.org/ - https://github.com/tomakehurst/saboteur
  71. 71. @chbatey Questions? Thanks for listening! Questions: @chbatey http://christopher-batey.blogspot.co.uk/ http://www.eventbrite.com/e/cassandra-day-paris-france-2015-june-16th-2015-tickets-15053035033?aff=meetup1 http://www.eventbrite.com/e/cassandra-day-london-2015-april-22nd-2015-tickets-15053026006?aff=meetup1
