This session is a primer on resilience at varying flight altitudes.
It starts at the management level and explains why resilience matters, why it matters today, and what the business case for resilience is (or rather, is not).
Then it descends to a high-level architectural view and explains resilience in a bit more detail, including its relationship to availability and the difference between resilience and robustness.
Next it descends to the design level and explains selected core principles of resilience, some of them illustrated with code examples at grass-roots altitude.
At the end the flight altitude rises again, with recommendations on how to introduce resilient software design into your software development process and an explanation of how resilience relates to some neighboring topics.
Of course, this slide deck shows only a fraction of the actual talk, since the voice track is missing, but I hope it is helpful anyway.
5. Resilience (IT)
The ability of an application to handle unexpected situations
- without the user noticing it (best case)
- with a graceful degradation of service (worst case)
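As a minimal sketch of the "graceful degradation" case: if the primary call fails, the application answers with a reduced but still useful result instead of an error. The class name, the `withFallback` helper and the recommendation scenario are hypothetical and only serve as illustration.

```java
import java.util.function.Supplier;

public class GracefulDegradationDemo {

    // Hypothetical helper: try the primary action, fall back to a degraded result on failure.
    public static <T> T withFallback(Supplier<T> primary, Supplier<T> degraded) {
        try {
            return primary.get();
        } catch (RuntimeException e) {
            // The unexpected situation is handled here; the caller still gets an answer.
            return degraded.get();
        }
    }

    public static void main(String[] args) {
        // Assumed scenario: a personalized recommendation service is down,
        // so we serve a static list instead.
        String recommendations = withFallback(
                () -> { throw new IllegalStateException("recommendation service unavailable"); },
                () -> "bestsellers");
        System.out.println(recommendations);
    }
}
```

The user may notice that the result is less personalized, but the page still renders.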
6. Resilience is not about testing your application
(You should definitely test your application, but that's a different story)
public class MySUTTest {
    @Test
    public void shouldDoSomething() {
        MySUT sut = new MySUT();
        MyResult result = sut.doSomething();
        assertEquals(<Some expected result>, result);
    }
    …
}
16. Counter question
Can you afford to ignore it?
(It’s not about making money, it’s about not losing money)
17. Resilience business case
• Identify risk scenarios
• Calculate current occurrence probability
• Calculate future occurrence probability
• Calculate short-term losses
• Calculate long-term losses
• Assess risks and money
• Do not forget the competitors
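One common way to put numbers on such a scenario is the expected loss: occurrence probability times the sum of short-term and long-term losses. The sketch below assumes this simple formula; the class name and all figures are made up for illustration and are not taken from the talk.

```java
public class RiskExposureDemo {

    // Assumed formula: expected loss = probability * (short-term loss + long-term loss).
    public static double expectedLoss(double probability, double shortTermLoss, double longTermLoss) {
        return probability * (shortTermLoss + longTermLoss);
    }

    public static void main(String[] args) {
        // Hypothetical risk scenario: checkout is unavailable for one hour.
        double shortTermLoss = 100_000; // lost orders during the outage (made-up figure)
        double longTermLoss = 300_000;  // customers who do not come back (made-up figure)

        double current = expectedLoss(0.05, shortTermLoss, longTermLoss); // today's probability
        double future = expectedLoss(0.20, shortTermLoss, longTermLoss);  // after more distribution

        System.out.println("Expected loss today:  " + current);
        System.out.println("Expected loss future: " + future);
    }
}
```

Comparing both numbers with the cost of the resilience measures gives a rough basis for the risk assessment.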
19. Classification attempt
ISO/IEC 9126 software quality characteristics:
• Efficiency
• Usability
• Reliability
• Portability
• Maintainability
• Functionality
Reliability: A set of attributes that bear on the capability of software to maintain its level of performance under stated conditions for a stated period of time.
Available with acceptable latency
Resilience goes beyond that
37. Timeouts (1)
// Basics
myObject.wait(); // Do not use this by default
myObject.wait(TIMEOUT); // Better use this
// Some more basics
myThread.join(); // Do not use this by default
myThread.join(TIMEOUT); // Better use this
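One detail worth keeping in mind: `join(TIMEOUT)` returns silently when the timeout expires, so the caller has to check `isAlive()` afterwards. Also note that a timeout of 0 means "wait forever". A minimal sketch, with a hypothetical `finishedWithin` helper:

```java
public class JoinTimeoutDemo {

    // Hypothetical helper: returns true if the worker terminated within the timeout.
    // timeoutMillis must be > 0, because join(0) waits forever.
    public static boolean finishedWithin(Thread worker, long timeoutMillis) {
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag for the caller
            return false;
        }
        // join() gives no hint whether it returned due to termination or timeout.
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        Thread slow = new Thread(() -> {
            try {
                Thread.sleep(5_000); // simulated blocking action
            } catch (InterruptedException e) {
                // stop early when interrupted
            }
        });
        slow.start();

        if (!finishedWithin(slow, 100)) {
            slow.interrupt(); // ask the worker to stop instead of waiting any longer
            System.out.println("worker timed out");
        }
    }
}
```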
38. Timeouts (2)
// Using the Java concurrent library
Callable<MyActionResult> myAction = <My Blocking Action>;
ExecutorService executor = Executors.newSingleThreadExecutor();
Future<MyActionResult> future = executor.submit(myAction);
MyActionResult result = null;
try {
    result = future.get(); // Do not use this by default
    result = future.get(TIMEOUT, TIMEUNIT); // Better use this
} catch (TimeoutException e) { // Only thrown if timeouts are used
    ...
} catch (...) {
    ...
}
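A self-contained variant of the snippet above could look like the following sketch. The `callWithTimeout` helper and the decision to return a fallback value on every failure are assumptions made for illustration; a real application may want to treat the individual exceptions differently.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutDemo {

    // Hypothetical helper: waits at most timeoutMillis for the action, else returns the fallback.
    public static <T> T callWithTimeout(Callable<T> action, long timeoutMillis, T fallback) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(action);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker so it does not keep running
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the action itself threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return fallback;
        } finally {
            executor.shutdownNow(); // one executor per call keeps the sketch short
        }
    }

    public static void main(String[] args) {
        String fast = callWithTimeout(() -> "done", 1_000, "fallback");
        String slow = callWithTimeout(() -> {
            Thread.sleep(5_000); // simulated blocking action
            return "done";
        }, 100, "fallback");
        System.out.println(fast);
        System.out.println(slow);
    }
}
```

Creating an executor per call is only acceptable in a sketch; in production code the executor would usually be shared.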
51. • Define scaling strategy
• Think full stack
• Apply D-I-D rule
• Design for elasticity
52. … and many more
• Supervision patterns
• Recovery & mitigation patterns
• Anti-fragility patterns
• Supporting patterns
• A rich pattern family
A different approach from traditional enterprise software development
53. How do I integrate resilience into my software development process?
54. Steps to adopt resilient software design
1. Create awareness:
Go DevOps
2. Create capability:
Coach your developers
3. Create sustainability:
Inject errors
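"Inject errors" can start very small. The following sketch wraps a call so that it fails with a configurable probability, which forces the calling code to prove that it handles the failure. The `withInjectedFailures` helper, the failure rate and the exception type are assumptions for illustration, not a specific tool's API.

```java
import java.util.Random;
import java.util.function.Supplier;

public class FaultInjectionDemo {

    // Hypothetical helper: the returned supplier fails with probability failureRate (0.0 to 1.0).
    public static <T> Supplier<T> withInjectedFailures(Supplier<T> delegate, double failureRate, Random random) {
        return () -> {
            // nextDouble() is in [0.0, 1.0), so 0.0 never fails and 1.0 always fails.
            if (random.nextDouble() < failureRate) {
                throw new IllegalStateException("injected failure");
            }
            return delegate.get();
        };
    }

    public static void main(String[] args) {
        Supplier<String> flaky = withInjectedFailures(() -> "ok", 0.3, new Random());
        int failures = 0;
        for (int i = 0; i < 1_000; i++) {
            try {
                flaky.get();
            } catch (IllegalStateException e) {
                failures++; // the caller must cope with this, e.g. via fallback or retry
            }
        }
        System.out.println("injected failures: " + failures);
    }
}
```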