Introduction to Failsafe
debop@coupang.com 2019.03.14
Agenda
• MSA Use cases
• What is Failsafe
• Usage in Coupang
• How it works
• Main Features
MSA - Key features
[Diagram: Our Service, Service A, Service B, Service C]
• Event Loop: Asynchronous, Non-Blocking
• Async, Non-Blocking calls: CompletableFuture? Observable (RxJava)?
• Cache
• Dashboard
• Fail? -> Retry? Fallback?
Failsafe
• Latency and Fault Tolerance for Distributed Systems
• Realtime Operations
• Synchronous & Asynchronous
• Resiliency - Fallback, Retry, Circuit Breaker
Failsafe vs Hystrix
• Executable logic can be passed to Failsafe as a simple lambda expression
• Failsafe supports retries
• Asynchronous executions in Failsafe are performed on a user-supplied ThreadPool / Scheduler
• Asynchronous executions can be observed via the event listener API and return a Future
• Hystrix circuit breakers are time sensitive; Failsafe circuit breakers decide on the last executions, regardless of when they took place
• Failsafe circuit breakers support execution timeouts and configurable success thresholds; Hystrix performs only a single execution when in the half-open state
• Failsafe circuit breakers can be shared across different executions against the same component, so that if a failure occurs, all executions against that component are halted by the circuit breaker
Usage in Coupang
• Used in Connect SDK
• Main Goals
• Retry policy (backoff, jitter)
• Circuit breaker (Resiliency)
How it works
Circuit Breaker Pattern
• Closed: calls pass through; failures and successes are counted
• Closed -> Open: trip the breaker if the threshold is reached
• Open -> Half-Open: try a reset after the timeout is reached
• Half-Open: calls pass through
• Half-Open -> Closed: reset the breaker on success
• Half-Open -> Open: trip the breaker
Closed-State
• The circuit breaker executes operations as usual
• If a failure occurs, the circuit breaker records it
• If a specified error threshold (number of failures or frequency of failures) is reached, it trips and opens the circuit (transitions to the open state)
Open-State
• Calls to the circuit breaker in the open state fail immediately
• No call to the underlying operation is executed
• After a specified timeout is reached, the circuit breaker transitions to the half-open state
Half-Open-State
• In this state, one call is allowed to reach the underlying operation
• If this call fails, the circuit breaker transitions to the open state again until another timeout is reached
• If it succeeds, the circuit breaker resets and transitions to the closed state
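The three states described above can be sketched as a small state machine in plain Java. This is only an illustration and not Failsafe's or Hystrix's implementation: the class name SimpleCircuitBreaker, the consecutive-failure threshold, and the injected clock are all assumptions made for the example, and the sketch does not limit the half-open state to exactly one concurrent call.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical minimal circuit breaker; a sketch, not a library implementation. */
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // consecutive failures that trip the breaker (assumed rule)
    private final long openTimeoutMillis; // time to stay open before allowing a trial call
    private final LongSupplier clock;     // injected so the timeout can be tested

    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    SimpleCircuitBreaker(int failureThreshold, long openTimeoutMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openTimeoutMillis = openTimeoutMillis;
        this.clock = clock;
    }

    /** Returns the current state, moving from open to half-open once the timeout has elapsed. */
    synchronized State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openTimeoutMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    /** Runs the operation unless the circuit is open, and records the outcome. */
    <T> T call(Supplier<T> operation) {
        if (state() == State.OPEN) {
            throw new IllegalStateException("circuit is open"); // fail immediately
        }
        try {
            T result = operation.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            throw e;
        }
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED; // a success in half-open resets the breaker
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        // A failed trial call in half-open, or reaching the threshold, opens the circuit.
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(3, 60_000, System::currentTimeMillis);
        System.out.println(breaker.call(() -> "connected"));
        System.out.println(breaker.state());
    }
}
```

Passing the clock in as a LongSupplier keeps the open-state timeout testable without sleeping.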
Hystrix Flow Chart
[Figure: Hystrix flow chart, annotated with Reactive / Async / Sync execution, Cache, Circuit Breaker, Thread Pool, Fallback]
Main Features
Failsafe
Main Features
• Retry
• Circuit breaker
• Fallback
Retry
RetryPolicy<Object> retryPolicy = new RetryPolicy<>()
    .handle(ConnectException.class)    // retry only on this exception type
    .withDelay(Duration.ofSeconds(1))  // wait 1 second between attempts
    .withMaxRetries(3);                // give up after 3 retries
// Run with retries
Failsafe.with(retryPolicy).run(() -> connect());
// Get with retries
Connection connection = Failsafe.with(retryPolicy).get(() -> connect());
// Run with retries asynchronously
CompletableFuture<Void> runFuture = Failsafe.with(retryPolicy).runAsync(() -> connect());
// Get with retries asynchronously
CompletableFuture<Connection> getFuture = Failsafe.with(retryPolicy).getAsync(() -> connect());
Retry policies
// maximum number of attempts, including the first
retryPolicy.withMaxAttempts(3);
// fixed delay between attempts
retryPolicy.withDelay(Duration.ofSeconds(1));
// exponential backoff from 1 second up to 30 seconds
retryPolicy.withBackoff(1, 30, ChronoUnit.SECONDS);
// random delay between 1 and 10 seconds
retryPolicy.withDelay(1, 10, ChronoUnit.SECONDS);
// time-based jitter
retryPolicy.withJitter(Duration.ofMillis(100));
// stop retrying on these results or exceptions
retryPolicy
    .abortWhen(false)
    .abortOn(NoRouteToHostException.class)
    .abortIf(result -> result == false);
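To make the backoff and jitter settings above concrete, here is a rough sketch of how such delays could be computed in plain Java. It is not Failsafe's code: the class BackoffSketch, the doubling factor, and the symmetric +/- jitter range are assumptions chosen for illustration.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper showing exponential backoff and jitter; not a library API. */
final class BackoffSketch {

    /** Delay before retry number `retry` (1-based): base * 2^(retry - 1), capped at max. */
    static Duration backoff(Duration base, Duration max, int retry) {
        long millis = base.toMillis();
        // Stop doubling once the cap is reached so the value cannot overflow.
        for (int i = 1; i < retry && millis < max.toMillis(); i++) {
            millis *= 2;
        }
        return Duration.ofMillis(Math.min(millis, max.toMillis()));
    }

    /** Shifts a delay by a random offset in [-jitter, +jitter], never below zero. */
    static Duration withJitter(Duration delay, Duration jitter) {
        long j = jitter.toMillis(); // assumed to be non-negative
        long offset = ThreadLocalRandom.current().nextLong(-j, j + 1);
        return Duration.ofMillis(Math.max(0, delay.toMillis() + offset));
    }

    public static void main(String[] args) {
        Duration base = Duration.ofSeconds(1);
        Duration max = Duration.ofSeconds(30);
        for (int retry = 1; retry <= 6; retry++) {
            Duration delay = withJitter(backoff(base, max, retry), Duration.ofMillis(100));
            System.out.println("retry " + retry + ": " + delay.toMillis() + " ms");
        }
    }
}
```

Jitter spreads out retries from many clients so they do not all hit a recovering service at the same instant.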
Circuit Breakers
Circuit breakers allow you to create systems that fail fast by temporarily disabling execution as a way of preventing system overload.
CircuitBreaker<Object> breaker = new CircuitBreaker<>()
    .handle(ConnectException.class)     // count ConnectException as a failure
    .withFailureThreshold(3, 10)        // open when 3 of the last 10 executions fail
    .withSuccessThreshold(5)            // close from half-open after 5 consecutive successes
    .withDelay(Duration.ofMinutes(1));  // after 1 minute, move from open to half-open
breaker.withFailureThreshold(5);    // open after 5 consecutive failures
breaker.withFailureThreshold(3, 5); // open when 3 of the last 5 executions fail
breaker.withSuccessThreshold(3, 5); // close when 3 of the last 5 executions succeed
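A threshold such as "3 of the last 5 executions" depends on how many executions have happened, not on when they happened. The sketch below shows one way to track that with a fixed-size window. FailureWindow is a hypothetical class, and tripping before the window has filled is an assumption of this sketch; a real breaker may wait for a full window.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical count-based failure window; a sketch, not a library class. */
final class FailureWindow {
    private final int failures; // failures needed to exceed the threshold
    private final int window;   // number of most recent executions considered
    private final Deque<Boolean> results = new ArrayDeque<>();

    FailureWindow(int failures, int window) {
        this.failures = failures;
        this.window = window;
    }

    /** Records one execution result and reports whether the threshold is now exceeded. */
    boolean record(boolean success) {
        results.addLast(success);
        if (results.size() > window) {
            results.removeFirst(); // the oldest result ages out by count, not by time
        }
        long failed = results.stream().filter(ok -> !ok).count();
        return failed >= failures;
    }

    public static void main(String[] args) {
        FailureWindow window = new FailureWindow(3, 5);
        boolean[] outcomes = {false, true, false, false, true, true};
        for (boolean outcome : outcomes) {
            System.out.println(outcome + " -> exceeded: " + window.record(outcome));
        }
    }
}
```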
Circuit Breaker Best practices
// The state can be changed manually
breaker.open();
breaker.halfOpen();
breaker.close();
// The breaker can also guard code directly, without Failsafe.with(...)
if (breaker.allowsExecution()) {
    try {
        breaker.preExecute();
        doSomething();
        breaker.recordSuccess();
    } catch (Exception e) {
        breaker.recordFailure(e);
    }
}
Fallback
// Fall back to null
Fallback<Object> fallback = Fallback.of(null);
// Fall back by throwing a custom exception
Fallback<Object> fallback = Fallback.of(failure -> { throw new CustomException(failure); });
// Fall back by calling an alternative method
Fallback<Object> fallback = Fallback.of(this::connectToBackup);
// Run the fallback asynchronously
Fallback<Object> fallback = Fallback.ofAsync(this::blockingCall);
Policy Composition
// Policies handle execution results in reverse order
Failsafe.with(fallback, retryPolicy, circuitBreaker).get(supplier);
// Means: Fallback(RetryPolicy(CircuitBreaker(Supplier)))
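The nesting Fallback(RetryPolicy(CircuitBreaker(Supplier))) can be illustrated with plain function wrapping. The sketch below is not how Failsafe is implemented: CompositionSketch, compose, retry, and fallback are hypothetical names, and each "policy" is simply a function that wraps a Supplier. It shows why the first policy listed ends up outermost and therefore sees the result last.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

/** Hypothetical illustration of policy nesting using plain function composition. */
final class CompositionSketch {

    /** Wraps the supplier so that the first policy is outermost: p1(p2(p3(supplier))). */
    @SafeVarargs
    static <T> Supplier<T> compose(Supplier<T> supplier, UnaryOperator<Supplier<T>>... policies) {
        Supplier<T> wrapped = supplier;
        for (int i = policies.length - 1; i >= 0; i--) {
            wrapped = policies[i].apply(wrapped);
        }
        return wrapped;
    }

    /** Retries the inner supplier up to maxRetries times (maxRetries is assumed >= 0). */
    static <T> UnaryOperator<Supplier<T>> retry(int maxRetries, List<String> log) {
        return inner -> () -> {
            RuntimeException last = null;
            for (int attempt = 0; attempt <= maxRetries; attempt++) {
                try {
                    return inner.get();
                } catch (RuntimeException e) {
                    last = e;
                    log.add("retry saw failure");
                }
            }
            throw last; // all attempts failed; let the outer policy handle it
        };
    }

    /** Returns the given value when the inner supplier fails. */
    static <T> UnaryOperator<Supplier<T>> fallback(T value, List<String> log) {
        return inner -> () -> {
            try {
                return inner.get();
            } catch (RuntimeException e) {
                log.add("fallback used");
                return value;
            }
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Supplier<String> failing = () -> { throw new IllegalStateException("down"); };
        Supplier<String> composed = compose(failing,
                CompositionSketch.<String>fallback("backup", log),
                CompositionSketch.<String>retry(2, log));
        System.out.println(composed.get());
        System.out.println(log);
    }
}
```

Because the retry wrapper sits inside the fallback wrapper, the fallback only runs after every retry has failed.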
Event Listeners
Failsafe.with(retryPolicy, circuitBreaker)
    .onComplete(e -> {
        if (e.getResult() != null)
            log.info("Connected to {}", e.getResult());
        else if (e.getFailure() != null)
            log.error("Failed to create connection", e.getFailure());
    })
    .get(this::connect);
retryPolicy
    .onFailedAttempt(e -> log.error("Connection attempt failed", e.getLastFailure()))
    .onRetry(e -> log.warn("Failure #{}. Retrying.", e.getAttemptCount()));
Event Listeners
circuitBreaker
    .onClose(() -> log.info("The circuit breaker was closed"))
    .onOpen(() -> log.info("The circuit breaker was opened"))
    .onHalfOpen(() -> log.info("The circuit breaker was half-opened"));
Asynchronous API Integration
Failsafe.with(retryPolicy)
    .getAsyncExecution(execution -> service.connect().whenComplete((result, failure) -> {
        if (execution.complete(result, failure))
            log.info("Connected");
        else if (!execution.retry())
            log.error("Connection attempts failed", failure);
    }));
Failsafe.with(retryPolicy)
    .getStageAsync(this::connectAsync)
    .thenApplyAsync(value -> value + "bar")
    .thenAccept(System.out::println);
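The idea behind retrying an operation that already returns a CompletableFuture can be sketched without any library. The code below is an assumption-laden illustration, not Failsafe's mechanism: AsyncRetrySketch and retryAsync are hypothetical names, there is no delay between attempts, and an operation that throws synchronously instead of returning a failed future is not handled.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical async retry helper built only on CompletableFuture. */
final class AsyncRetrySketch {

    /** Starts the operation and retries up to maxRetries times if its future fails. */
    static <T> CompletableFuture<T> retryAsync(Supplier<CompletableFuture<T>> operation, int maxRetries) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(operation, maxRetries, result);
        return result;
    }

    private static <T> void attempt(Supplier<CompletableFuture<T>> operation,
                                    int retriesLeft,
                                    CompletableFuture<T> result) {
        operation.get().whenComplete((value, failure) -> {
            if (failure == null) {
                result.complete(value);                      // success: finish the outer future
            } else if (retriesLeft > 0) {
                attempt(operation, retriesLeft - 1, result); // failure: try again
            } else {
                result.completeExceptionally(failure);       // out of retries: propagate the failure
            }
        });
    }

    public static void main(String[] args) {
        int[] calls = {0};
        Supplier<CompletableFuture<String>> flaky = () -> {
            CompletableFuture<String> f = new CompletableFuture<>();
            if (++calls[0] < 3) {
                f.completeExceptionally(new RuntimeException("down"));
            } else {
                f.complete("connected");
            }
            return f;
        };
        System.out.println(retryAsync(flaky, 3).join() + " after " + calls[0] + " attempts");
    }
}
```

The caller receives a single future that completes only when an attempt succeeds or the retries are exhausted, so no thread blocks while waiting.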
Thank you!
