Fault tolerance made easy
Upcoming SlideShare
Loading in...5
×
 

Like this? Share it with your network

Share

Fault tolerance made easy

on

  • 2,742 views

Fault tolerance in general is a challenging topic. Yet we need fault toleranct designs more badly than ever in order to provide robust, highly available systems - especially in times of scale out ...

Fault tolerance in general is a challenging topic. Yet we need fault toleranct designs more badly than ever in order to provide robust, highly available systems - especially in times of scale out systems becoming more and more popular.

Unfortunately, most developers do not care too much about a fault tolerant design, either because they are scared by the complexity of the realm or because they do not care enough. One of the problems is that a lack of fault tolerant design does not hurt a lot in development or in QA, but it hurts a lot in production - as Michael Nygard said: "It's all about production!" (at least figuratively).

In this presentation I do *not* try to give a general introduction to fault tolerant design. Instead I pick a few generic case studies that demonstrate the results of missing fault tolerant design, try to sensitize a bit about the production relevance of fault tolerant design and then go along with a few selected patterns. I picked a few patterns which are surprisingly easy to implement and help to mitigate the problems of the former case studies.

This way I try to show two things:
1. A piece of architecture or design as a pattern is not necessarily hard to implement. Sometimes the code is written quicker than it takes to explain the pattern beforehand.
2. Even if fault tolerant design as a general topic might be hard, some parts of it can be implemented very easily and it's more than worth the coding effort if you look how much better your system behaves in production just from adding those few lines of code.

Statistics

Views

Total Views
2,742
Views on SlideShare
2,053
Embed Views
689

Actions

Likes
7
Downloads
47
Comments
1

18 Embeds 689

https://blog.codecentric.de 414
http://ufried.tumblr.com 162
http://feedly.com 29
http://www.redlab.be 22
http://feeds.feedburner.com 19
http://www.newsblur.com 11
https://twitter.com 6
http://newsblur.com 6
http://www.feedspot.com 4
https://newsblur.com 4
http://www.tumblr.com 3
http://feedproxy.google.com 3
https://bazqux.com 1
https://feedly.com 1
https://schalanda.name 1
http://www.linkedin.com 1
http://digg.com 1
https://www.linkedin.com 1
More...

Accessibility

Upload Details

Uploaded via as Adobe PDF

Usage Rights

CC Attribution License

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Fault tolerance made easy Presentation Transcript

  • 1. Fault tolerance made easy Patterns for fault tolerance implemented surprisingly easy Uwe Friedrichsen, codecentric AG, 2013-2014
  • 2. @ufried Uwe Friedrichsen | uwe.friedrichsen@codecentric.de | http://slideshare.net/ufried | http://ufried.tumblr.com
  • 3. It‘s all about production!
  • 4. Production Availability Resilience Fault Tolerance
  • 5. Your web server doesn‘t look good …
  • 6. Pattern #1 Timeouts
  • 7. Timeouts (1) // Basics myObject.wait(); // Do not use this by default myObject.wait(TIMEOUT); // Better use this // Some more basics myThread.join(); // Do not use this by default myThread.join(TIMEOUT); // Better use this
  • 8. Timeouts (2) // Using the Java concurrent library Callable<MyActionResult> myAction = <My Blocking Action> ExecutorService executor = Executors.newSingleThreadExecutor(); Future<MyActionResult> future = executor.submit(myAction); MyActionResult result = null; try { result = future.get(); // Do not use this by default result = future.get(TIMEOUT, TIMEUNIT); // Better use this } catch (TimeoutException e) { // Only thrown if timeouts are used ... } catch (...) { ... }
  • 9. Timeouts (3) // Using Guava SimpleTimeLimiter Callable<MyActionResult> myAction = <My Blocking Action> SimpleTimeLimiter limiter = new SimpleTimeLimiter(); MyActionResult result = null; try { result = limiter.callWithTimeout(myAction, TIMEOUT, TIMEUNIT, false); } catch (UncheckedTimeoutException e) { ... } catch (...) { ... }
  • 10. Determining Timeout Duration Configurable Timeouts Self-Adapting Timeouts Timeouts in JavaEE Containers
  • 11. Pattern #2 Circuit Breaker
  • 12. Circuit Breaker (1) Client Resource Circuit Breaker Request Resource unavailable Resource available Closed Open Half-Open Lifecycle
  • 13. Circuit Breaker (2) Closed on call / pass through call succeeds / reset count call fails / count failure threshold reached / trip breaker Open on call / fail on timeout / attempt reset trip breaker Half-Open on call / pass through call succeeds / reset call fails / trip breaker trip breaker attempt reset reset Source: M. Nygard, „Release It!“
  • 14. Circuit Breaker (3) public class CircuitBreaker implements MyResource { public enum State { CLOSED, OPEN, HALF_OPEN } final MyResource resource; State state; int counter; long tripTime; public CircuitBreaker(MyResource r) { resource = r; state = CLOSED; counter = 0; tripTime = 0L; } ...
  • 15. Circuit Breaker (4) ... public Result access(...) { // resource access Result r = null; if (state == OPEN) { checkTimeout(); throw new ResourceUnavailableException(); } try { r = r.access(...); // should use timeout } catch (Exception e) { fail(); throw e; } success(); return r; } ...
  • 16. Circuit Breaker (5) ... private void success() { reset(); } private void fail() { counter++; if (counter > THRESHOLD) { tripBreaker(); } } private void reset() { state = CLOSED; counter = 0; } ...
  • 17. Circuit Breaker (6) ... private void tripBreaker() { state = OPEN; tripTime = System.currentTimeMillis(); } private void checkTimeout() { if ((System.currentTimeMillis - tripTime) > TIMEOUT) { state = HALF_OPEN; counter = THRESHOLD; } } public State getState() return state; } }
  • 18. Thread-Safe Circuit Breaker Failure Types Tuning Circuit Breakers Available Implementations
  • 19. Pattern #3 Fail Fast
  • 20. Fail Fast (1) Client Resources Expensive Action Request Uses
  • 21. Fail Fast (2) Client Resources Expensive Action Request Fail Fast Guard Uses Check availability Forward
  • 22. Fail Fast (3) public class FailFastGuard { private FailFastGuard() {} public static void checkResources(Set<CircuitBreaker> resources) { for (CircuitBreaker r : resources) { if (r.getState() != CircuitBreaker.CLOSED) { throw new ResourceUnavailableException(r); } } } }
  • 23. Fail Fast (4) public class MyService { Set<CircuitBreaker> requiredResources; // Initialize resources ... public Result myExpensiveAction(...) { FailFastGuard.checkResources(requiredResources); // Execute core action ... } }
  • 24. The dreaded SiteTooSuccessfulException …
  • 25. Pattern #4 Shed Load
  • 26. Shed Load (1) Clients Server Too many Requests
  • 27. Shed Load (2) Server Too many Requests Gate Keeper Monitor Requests Request Load Data Monitor Load Shedded Requests Clients
  • 28. Shed Load (3) public class ShedLoadFilter implements Filter { Random random; public void init(FilterConfig fc) throws ServletException { random = new Random(System.currentTimeMillis()); } public void destroy() { random = null; } ...
  • 29. Shed Load (4) ... public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws java.io.IOException, ServletException { int load = getLoad(); if (shouldShed(load)) { HttpServletResponse res = (HttpServletResponse)response; res.setIntHeader("Retry-After", RECOMMENDATION); res.sendError(HttpServletResponse.SC_SERVICE_UNAVAILABLE); return; } chain.doFilter(request, response); } ...
  • 30. Shed Load (5) ... private boolean shouldShed(int load) { // Example implementation if (load < THRESHOLD) { return false; } double shedBoundary = ((double)(load - THRESHOLD))/ ((double)(MAX_LOAD - THRESHOLD)); return random.nextDouble() < shedBoundary; } }
  • 31. Shed Load (6)
  • 32. Shed Load (7)
  • 33. Shedding Strategy Retrieving Load Tuning Load Shedders Alternative Strategies
  • 34. Pattern #5 Deferrable Work
  • 35. Deferrable Work (1) Client Requests Request Processing Resources Use Routine Work Use
  • 36. OVERLOAD Deferrable Work (2) Without
 Deferrable Work 100% OVERLOAD With
 Deferrable Work 100% Request Processing Routine Work
  • 37. // Do or wait variant ProcessingState state = initBatch(); while(!state.done()) { int load = getLoad(); if (load > THRESHOLD) { waitFixedDuration(); } else { state = processNext(state); } } void waitFixedDuration() { Thread.sleep(DELAY); // try-catch left out for better readability } Deferrable Work (3)
  • 38. // Adaptive load variant ProcessingState state = initBatch(); while(!state.done()) { waitLoadBased(); state = processNext(state); } void waitLoadBased() { int load = getLoad(); long delay = calcDelay(load); Thread.sleep(delay); // try-catch left out for better readability } long calcDelay(int load) { // Simple example implementation if (load < THRESHOLD) { return 0L; } return (load – THRESHOLD) * DELAY_FACTOR; } Deferrable Work (4)
  • 39. Delay Strategy Retrieving Load Tuning Deferrable Work
  • 40. I can hardly hear you …
  • 41. Pattern #6 Leaky Bucket
  • 42. Leaky Bucket (1) Leaky Bucket Fill Problem occured Periodically Leak Error Handling Overflowed?
  • 43. public class LeakyBucket { // Very simple implementation final private int capacity; private int level; private boolean overflow; public LeakyBucket(int capacity) { this.capacity = capacity; drain(); } public void drain () { this.level = 0; this.overflow = false; } ... Leaky Bucket (2)
  • 44. ... public void fill() { level++; if (level > capacity) { overflow = true; } } public void leak() { level--; if (level < 0) { level = 0; } } public boolean overflowed() { return overflow; } } Leaky Bucket (3)
  • 45. Thread-Safe Leaky Bucket Leaking strategies Tuning Leaky Bucket Available Implementations
  • 46. Pattern #7 Limited Retries
  • 47. // doAction returns true if successful, false otherwise // General pattern boolean success = false int tries = 0; while (!success && (tries < MAX_TRIES)) { success = doAction(...); tries++; } // Alternative one-retry-only variant success = doAction(...) || doAction(...); Limited Retries (1)
  • 48. Idempotent Actions Closures / Lambdas Tuning Retries
  • 49. More Patterns •  Complete Parameter Checking •  Marked Data •  Routine Audits
  • 50. Further reading 1.  Michael T. Nygard, Release It!, Pragmatic Bookshelf, 2007 2.  Robert S. Hanmer,
 Patterns for Fault Tolerant Software, Wiley, 2007 3.  James Hamilton, On Designing and Deploying Internet-Scale Services,
 21st LISA Conference 2007 4.  Andrew Tanenbaum, Marten van Steen, Distributed Systems – Principles and Paradigms,
 Prentice Hall, 2nd Edition, 2006
  • 51. It‘s all about production!
  • 52. @ufried Uwe Friedrichsen | uwe.friedrichsen@codecentric.de | http://slideshare.net/ufried | http://ufried.tumblr.com