Fault tolerance made easy
A head-start to resilient software design

Uwe Friedrichsen (codecentric AG) – QCon London – 5. ...
InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese an...
Presented at QCon London
www.qconlondon.com
Purpose of QCon
- to empower software development by facilitating the spread o...
@ufried
Uwe Friedrichsen | uwe.friedrichsen@codecentric.de | http://slideshare.net/ufried | http://ufried.tumblr.com
It‘s all about production!
Production
Availability
Resilience
Fault Tolerance
Your web server doesn‘t look good …
Pattern #1

Timeouts
Timeouts (1)
// Basics
myObject.wait(); // Do not use this by default
myObject.wait(TIMEOUT); // Better use this
// Some m...
Timeouts (2)
// Using the Java concurrent library
Callable<MyActionResult> myAction = <My Blocking Action>
ExecutorService...
Timeouts (3)
// Using Guava SimpleTimeLimiter
Callable<MyActionResult> myAction = <My Blocking Action>
SimpleTimeLimiter l...
Determining Timeout Duration

Configurable Timeouts

Self-Adapting Timeouts

Timeouts in JavaEE Containers
Pattern #2

Circuit Breaker
Circuit Breaker (1)
Client
 Resource
Circuit Breaker
Request
Resource unavailable
Resource available
Closed
 Open
Half-Ope...
Circuit Breaker (2)
Closed

on call / pass through
call succeeds / reset count
call fails / count failure
threshold reache...
Circuit Breaker (3)
public class CircuitBreaker implements MyResource {
public enum State { CLOSED, OPEN, HALF_OPEN }
fina...
Circuit Breaker (4)
...
public Result access(...) { // resource access
Result r = null;
if (state == OPEN) {
checkTimeout(...
Circuit Breaker (5)
...
private void success() {
reset();
}
private void fail() {
counter++;
if (counter > THRESHOLD) {
tr...
Circuit Breaker (6)
...
private void tripBreaker() {
state = OPEN;
tripTime = System.currentTimeMillis();
}
private void c...
Thread-Safe Circuit Breaker

Failure Types

Tuning Circuit Breakers

Available Implementations
Pattern #3

Fail Fast
Fail Fast (1)
Client
 Resources
Expensive Action
Request
Uses
Fail Fast (2)
Client
 Resources
Expensive Action
Request
Fail Fast Guard
Uses
Check availability
Forward
Fail Fast (3)
public class FailFastGuard {
private FailFastGuard() {}
public static void checkResources(Set<CircuitBreaker...
Fail Fast (4)
public class MyService {
Set<CircuitBreaker> requiredResources;
// Initialize resources
...
public Result my...
The dreaded SiteTooSuccessfulException …
Pattern #4

Shed Load
Shed Load (1)
Clients
 Server
Too many Requests
Shed Load (2)
Server
Too many Requests
Gate Keeper
Monitor
Requests
Request Load Data
 Monitor Load
Shedded Requests
Clien...
Shed Load (3)
public class ShedLoadFilter implements Filter {
Random random;
public void init(FilterConfig fc) throws Serv...
Shed Load (4)
...
public void doFilter(ServletRequest request,
ServletResponse response,
FilterChain chain)
throws java.io...
Shed Load (5)
...
private boolean shouldShed(int load) { // Example implementation
if (load < THRESHOLD) {
return false;
}...
Shed Load (6)
Shed Load (7)
Shedding Strategy

Retrieving Load

Tuning Load Shedders

Alternative Strategies
Pattern #5

Deferrable Work
Deferrable Work (1)
Client
Requests
Request Processing
Resources
Use
Routine Work
Use
OVERLOAD
Deferrable Work (2)
Without

Deferrable Work
100%
OVERLOAD
With

Deferrable Work
100%
Request Processing
Routine ...
// Do or wait variant
ProcessingState state = initBatch();
while(!state.done()) {
int load = getLoad();
if (load > THRESHO...
// Adaptive load variant
ProcessingState state = initBatch();
while(!state.done()) {
waitLoadBased();
state = processNext(...
Delay Strategy

Retrieving Load

Tuning Deferrable Work
I can hardly hear you …
Pattern #6

Leaky Bucket
Leaky Bucket (1)
Leaky Bucket
Fill
Problem
occured
Periodically
Leak
Error
Handling
Overflowed?
public class LeakyBucket { // Very simple implementation
final private int capacity;
private int level;
private boolean ov...
...
public void fill() {
level++;
if (level > capacity) {
overflow = true;
}
}
public void leak() {
level--;
if (level < 0...
Thread-Safe Leaky Bucket

Leaking strategies

Tuning Leaky Bucket

Available Implementations
Pattern #7

Limited Retries
// doAction returns true if successful, false otherwise
// General pattern
boolean success = false
int tries = 0;
while (!...
Idempotent Actions

Closures / Lambdas

Tuning Retries
More Patterns






•  Complete Parameter Checking
•  Marked Data
•  Routine Audits
Further reading

1.  Michael T. Nygard, Release It!,
Pragmatic Bookshelf, 2007
2.  Robert S. Hanmer,

Patterns for Fault T...
It‘s all about production!
@ufried
Uwe Friedrichsen | uwe.friedrichsen@codecentric.de | http://slideshare.net/ufried | http://ufried.tumblr.com
Watch the video with slide synchronization on
InfoQ.com!
http://www.infoq.com/presentations/resilient-
software-design-pat...
Fault Tolerance Made Easy
Upcoming SlideShare
Loading in...5
×

Fault Tolerance Made Easy

299

Published on

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1kZylpD.

Uwe Friedrichsen discusses several easy to implement resilient software design patterns, when to use them and how to actually implement them - code included along with options to extend and improve those patterns in order to make an application more robust step by step in order to achieve the next level after agile and clean code: Becoming a resilient software developer! Filmed at qconlondon.com.

Uwe Friedrichsen travels the IT world for many years. As a fellow of codecentric AG he can live out his curiosity about new ideas and concepts as well as his joy at outside-the-box thinking. His current focus areas are distributed, highly scalable systems and architecture in post-agile environments.

Published in: Technology, Business
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
299
On Slideshare
0
From Embeds
0
Number of Embeds
4
Actions
Shares
0
Downloads
0
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Transcript of "Fault Tolerance Made Easy"

  1. 1. Fault tolerance made easy A head-start to resilient software design Uwe Friedrichsen (codecentric AG) – QCon London – 5. March 2014
  2. 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations /resilient-software-design-patterns
  3. 3. Presented at QCon London www.qconlondon.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  4. 4. @ufried Uwe Friedrichsen | uwe.friedrichsen@codecentric.de | http://slideshare.net/ufried | http://ufried.tumblr.com
  5. 5. It‘s all about production!
  6. 6. Production Availability Resilience Fault Tolerance
  7. 7. Your web server doesn‘t look good …
  8. 8. Pattern #1 Timeouts
  9. 9. Timeouts (1) // Basics myObject.wait(); // Do not use this by default myObject.wait(TIMEOUT); // Better use this // Some more basics myThread.join(); // Do not use this by default myThread.join(TIMEOUT); // Better use this
  10. 10. Timeouts (2) // Using the Java concurrent library Callable<MyActionResult> myAction = <My Blocking Action> ExecutorService executor = Executors.newSingleThreadExecutor(); Future<MyActionResult> future = executor.submit(myAction); MyActionResult result = null; try { result = future.get(); // Do not use this by default result = future.get(TIMEOUT, TIMEUNIT); // Better use this } catch (TimeoutException e) { // Only thrown if timeouts are used ... } catch (...) { ... }
  11. 11. Timeouts (3) // Using Guava SimpleTimeLimiter Callable<MyActionResult> myAction = <My Blocking Action> SimpleTimeLimiter limiter = new SimpleTimeLimiter(); MyActionResult result = null; try { result = limiter.callWithTimeout(myAction, TIMEOUT, TIMEUNIT, false); } catch (UncheckedTimeoutException e) { ... } catch (...) { ... }
  12. 12. Determining Timeout Duration Configurable Timeouts Self-Adapting Timeouts Timeouts in JavaEE Containers
  13. 13. Pattern #2 Circuit Breaker
  14. 14. Circuit Breaker (1) Client Resource Circuit Breaker Request Resource unavailable Resource available Closed Open Half-Open Lifecycle
  15. 15. Circuit Breaker (2) Closed on call / pass through call succeeds / reset count call fails / count failure threshold reached / trip breaker Open on call / fail on timeout / attempt reset trip breaker Half-Open on call / pass through call succeeds / reset call fails / trip breaker trip breaker attempt reset reset Source: M. Nygard, „Release It!“
  16. 16. Circuit Breaker (3) public class CircuitBreaker implements MyResource { public enum State { CLOSED, OPEN, HALF_OPEN } final MyResource resource; State state; int counter; long tripTime; public CircuitBreaker(MyResource r) { resource = r; state = CLOSED; counter = 0; tripTime = 0L; } ...
  17. 17. Circuit Breaker (4) ... public Result access(...) { // resource access Result r = null; if (state == OPEN) { checkTimeout(); throw new ResourceUnavailableException(); } try { r = resource.access(...); // should use timeout } catch (Exception e) { fail(); throw e; } success(); return r; } ...
  18. 18. Circuit Breaker (5) ... private void success() { reset(); } private void fail() { counter++; if (counter > THRESHOLD) { tripBreaker(); } } private void reset() { state = CLOSED; counter = 0; } ...
  19. 19. Circuit Breaker (6) ... private void tripBreaker() { state = OPEN; tripTime = System.currentTimeMillis(); } private void checkTimeout() { if ((System.currentTimeMillis - tripTime) > TIMEOUT) { state = HALF_OPEN; counter = THRESHOLD; } } public State getState() return state; } }
  20. 20. Thread-Safe Circuit Breaker Failure Types Tuning Circuit Breakers Available Implementations
  21. 21. Pattern #3 Fail Fast
  22. 22. Fail Fast (1) Client Resources Expensive Action Request Uses
  23. 23. Fail Fast (2) Client Resources Expensive Action Request Fail Fast Guard Uses Check availability Forward
  24. 24. Fail Fast (3) public class FailFastGuard { private FailFastGuard() {} public static void checkResources(Set<CircuitBreaker> resources) { for (CircuitBreaker r : resources) { if (r.getState() != CircuitBreaker.CLOSED) { throw new ResourceUnavailableException(r); } } } }
  25. 25. Fail Fast (4) public class MyService { Set<CircuitBreaker> requiredResources; // Initialize resources ... public Result myExpensiveAction(...) { FailFastGuard.checkResources(requiredResources); // Execute core action ... } }
  26. 26. The dreaded SiteTooSuccessfulException …
  27. 27. Pattern #4 Shed Load
  28. 28. Shed Load (1) Clients Server Too many Requests
  29. 29. Shed Load (2) Server Too many Requests Gate Keeper Monitor Requests Request Load Data Monitor Load Shedded Requests Clients
  30. 30. Shed Load (3) public class ShedLoadFilter implements Filter { Random random; public void init(FilterConfig fc) throws ServletException { random = new Random(System.currentTimeMillis()); } public void destroy() { random = null; } ...
  31. 31. Shed Load (4) ... public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws java.io.IOException, ServletException { int load = getLoad(); if (shouldShed(load)) { HttpServletResponse res = (HttpServletResponse)response; res.setIntHeader("Retry-After", RECOMMENDATION); res.sendError(HttpServletResponse.SC_SERVICE_UNAVAILABLE); return; } chain.doFilter(request, response); } ...
  32. 32. Shed Load (5) ... private boolean shouldShed(int load) { // Example implementation if (load < THRESHOLD) { return false; } double shedBoundary = ((double)(load - THRESHOLD))/ ((double)(MAX_LOAD - THRESHOLD)); return random.nextDouble() < shedBoundary; } }
  33. 33. Shed Load (6)
  34. 34. Shed Load (7)
  35. 35. Shedding Strategy Retrieving Load Tuning Load Shedders Alternative Strategies
  36. 36. Pattern #5 Deferrable Work
  37. 37. Deferrable Work (1) Client Requests Request Processing Resources Use Routine Work Use
  38. 38. OVERLOAD Deferrable Work (2) Without
 Deferrable Work 100% OVERLOAD With
 Deferrable Work 100% Request Processing Routine Work
  39. 39. // Do or wait variant ProcessingState state = initBatch(); while(!state.done()) { int load = getLoad(); if (load > THRESHOLD) { waitFixedDuration(); } else { state = processNext(state); } } void waitFixedDuration() { Thread.sleep(DELAY); // try-catch left out for better readability } Deferrable Work (3)
  40. 40. // Adaptive load variant ProcessingState state = initBatch(); while(!state.done()) { waitLoadBased(); state = processNext(state); } void waitLoadBased() { int load = getLoad(); long delay = calcDelay(load); Thread.sleep(delay); // try-catch left out for better readability } long calcDelay(int load) { // Simple example implementation if (load < THRESHOLD) { return 0L; } return (load – THRESHOLD) * DELAY_FACTOR; } Deferrable Work (4)
  41. 41. Delay Strategy Retrieving Load Tuning Deferrable Work
  42. 42. I can hardly hear you …
  43. 43. Pattern #6 Leaky Bucket
  44. 44. Leaky Bucket (1) Leaky Bucket Fill Problem occured Periodically Leak Error Handling Overflowed?
  45. 45. public class LeakyBucket { // Very simple implementation final private int capacity; private int level; private boolean overflow; public LeakyBucket(int capacity) { this.capacity = capacity; drain(); } public void drain () { this.level = 0; this.overflow = false; } ... Leaky Bucket (2)
  46. 46. ... public void fill() { level++; if (level > capacity) { overflow = true; } } public void leak() { level--; if (level < 0) { level = 0; } } public boolean overflowed() { return overflow; } } Leaky Bucket (3)
  47. 47. Thread-Safe Leaky Bucket Leaking strategies Tuning Leaky Bucket Available Implementations
  48. 48. Pattern #7 Limited Retries
  49. 49. // doAction returns true if successful, false otherwise // General pattern boolean success = false int tries = 0; while (!success && (tries < MAX_TRIES)) { success = doAction(...); tries++; } // Alternative one-retry-only variant success = doAction(...) || doAction(...); Limited Retries (1)
  50. 50. Idempotent Actions Closures / Lambdas Tuning Retries
  51. 51. More Patterns •  Complete Parameter Checking •  Marked Data •  Routine Audits
  52. 52. Further reading 1.  Michael T. Nygard, Release It!, Pragmatic Bookshelf, 2007 2.  Robert S. Hanmer,
 Patterns for Fault Tolerant Software, Wiley, 2007 3.  James Hamilton, On Designing and Deploying Internet-Scale Services,
 21st LISA Conference 2007 4.  Andrew Tanenbaum, Marten van Steen, Distributed Systems – Principles and Paradigms,
 Prentice Hall, 2nd Edition, 2006
  53. 53. It‘s all about production!
  54. 54. @ufried Uwe Friedrichsen | uwe.friedrichsen@codecentric.de | http://slideshare.net/ufried | http://ufried.tumblr.com
  55. 55. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/resilient- software-design-patterns

×