CYNICAL SOFTWARE
Tips for writing software that do not wake you up at 2am
dagi@goodata.com
https://twitter.com/_dagi

Sunday 17 November 13
HOW TO BECOME CYNICAL

• Eat

your own dog food

• Strong
• Pager

feedback

duty (Engineer on duty)

• DevOps

Sunday 17 November 13

No throw over the wall syndrom
Sunday 17 November 13
http://pragprog.com/book/mnee/release-it
Sunday 17 November 13

Read an excelent book about Cynical software
- stability patterns&anti-patterns
- architecture, testing and design
INTEGRATION POINTS & FAILURE
MODES
Integration points

Backend
service

Frontend
service
LB

DB
LB

Frontend
service

Backend
service

DB
DB

Backend
service

Failure modes

LB

3rd party
service

Sunday 17 November 13

-

shift toward SOA, interconnected services, remote communication, 3rd party services
how a crack may appear - failure mode
integration point
cracks are tightly coupled to integration points
the way cracks appear and propagated across multiple layers and services
Integration points may accelerate (chained reaction) or stop cracks
Failure in one component increase the probability of failure in another component service
Slow responses, endpoint unreachability
High levels of complexity provide more directions for the cracks to propagate in.
Sunday 17 November 13
User user; Projects projects;
1
try {
projects = dao.getProjects(user, OFFSET, LIMIT).await(250, MS);
} catch (FutureTimeoutException e) {
throw new AccountInfoUnavailableException();
}
2

for (Project project : projects) {
Set<String> userPermissions = dao.getPermissions(project, user);
if (userPermissions.contains(CAN_INIT_DATA)) {
Hanging SQL
return new AccountInfo(Boolean.TRUE);
connection
}
(90 s timeout)
}

Client

Frontend server

Hanging HTTP
worker thread
(120s)

Backend server

Hanging HTTP
connection (60s
C3
timeout)

Datacenter AWS
Frontend server

Datacenter Rackspace
Sunday 17 November 13

Backend server
C3

MySQL
MySQL

MySQL
MySQL
MySQL
MySQL
Sunday 17 November 13
“Cynicism is merely the art of seeing things as
they are instead of as they ought to be” [1.]

A cynical software
No intimacy
Internal barriers

Lack of trust
Bad things happen

Resilience to impulse and stress
Sunday 17 November 13

Explain the main attributes of cynical software
Steady state
Circuit breaker

Bulkheads
Test harness

Handshaking

Fail fast

Timeouts
Decoupling middleware

STABILITY PATTERNS &
ANTIPATTERNS
Slow responses
SLA inversion
Attacks of self denial

Unbalanced capacities
Unbounded result set
Blocked threads
Scaling effects

Sunday 17 November 13

- the antipatterns will create, accelerate or multiply cracks in the system
- the patterns provide architecture and design guidance to reduce, eliminate, or mitigate the
effects of cracks in the system
CIRCUIT BREAKER

Frontend
service

Half-Open
Timeout
Closed
Open
Open
Failure counter: 2
1
0

Sunday 17 November 13

- mediator (decoupling, isolation), integration point wrapper
- fail fast, recovery

Failure

Backend
service
BULKHEADS
Hiccup
HTTP client

Bulkheads
Frontend
service

Connections pool

Backend
service A
Backend
service B
Backend
service C

Sunday 17 November 13

fat tails - sizing/capacity
isolation, degrading functionality
Latency and Fault Tolerance for Distributed Systems
https://github.com/Netflix/Hystrix

Sunday 17 November 13

- do not reinvent the wheel
- OSS, Java
- most of the patterns are implemented there (circuit breaker, bulkheads, fail fast)
public class CommandHelloWorld extends HystrixCommand<String> {
private final String name;
public CommandHelloWorld(String name) {
super(HystrixCommandGroupKey.Factory.asKey("ExampleGroup"));
this.name = name;
}
@Override
protected String run() {
// a real example would do work like a network call here
return "Hello " + name + "!";
}
}

String s = new CommandHelloWorld("World").execute();
Future<String> fs = new CommandHelloWorld("World").queue();

Sunday 17 November 13

HystrixCommand = circuit breaker + bulkheads
Synchronous/Asynchronous usage
Async usage Future -> timeout
Focus on failures

TESTING
Simulate bad things
Chaos monkey [2]
Sunday 17 November 13

-

we do test but on "sligthly" different topology. It makes hard to reveal some kind of bugs
we mostly test optimistic cases
HTTP mock server (bad responses, slow responses, protocol violation...)
Longevity tests
Adaptable design

[3]

ARCHITECTURE
Conway’s law
Sunday 17 November 13

ROC

[4]

[5]

- early decisions are hard to revert later (costs)
- Big Up Front Design doesn’t work prefer Adaptable design - Framework, Platform
- restartability (No restart the world), diagnostics (health checks), recovery mechanism circuit breaker, isolation/redundancy ()
Q&A
https://github.com/dagi/cynical-software

Sunday 17 November 13

Cynycal software

  • 1.
    CYNICAL SOFTWARE Tips forwriting software that do not wake you up at 2am dagi@goodata.com https://twitter.com/_dagi Sunday 17 November 13
  • 2.
    HOW TO BECOMECYNICAL • Eat your own dog food • Strong • Pager feedback duty (Engineer on duty) • DevOps Sunday 17 November 13 No throw over the wall syndrom
  • 3.
  • 4.
    http://pragprog.com/book/mnee/release-it Sunday 17 November13 Read an excelent book about Cynical software - stability patterns&anti-patterns - architecture, testing and design
  • 5.
    INTEGRATION POINTS &FAILURE MODES Integration points Backend service Frontend service LB DB LB Frontend service Backend service DB DB Backend service Failure modes LB 3rd party service Sunday 17 November 13 - shift toward SOA, interconnected services, remote communication, 3rd party services how a crack may appear - failure mode integration point cracks are tightly coupled to integration points the way cracks appear and propagated across multiple layers and services Integration points may accelerate (chained reaction) or stop cracks Failure in one component increase the probability of failure in another component service Slow responses, endpoint unreachability High levels of complexity provide more directions for the cracks to propagate in.
  • 6.
  • 7.
    User user; Projectsprojects; 1 try { projects = dao.getProjects(user, OFFSET, LIMIT).await(250, MS); } catch (FutureTimeoutException e) { throw new AccountInfoUnavailableException(); } 2 for (Project project : projects) { Set<String> userPermissions = dao.getPermissions(project, user); if (userPermissions.contains(CAN_INIT_DATA)) { Hanging SQL return new AccountInfo(Boolean.TRUE); connection } (90 s timeout) } Client Frontend server Hanging HTTP worker thread (120s) Backend server Hanging HTTP connection (60s C3 timeout) Datacenter AWS Frontend server Datacenter Rackspace Sunday 17 November 13 Backend server C3 MySQL MySQL MySQL MySQL MySQL MySQL
  • 8.
  • 9.
    “Cynicism is merelythe art of seeing things as they are instead of as they ought to be” [1.] A cynical software No intimacy Internal barriers Lack of trust Bad things happen Resilience to impulse and stress Sunday 17 November 13 Explain the main attributes of cynical software
  • 10.
    Steady state Circuit breaker Bulkheads Testharness Handshaking Fail fast Timeouts Decoupling middleware STABILITY PATTERNS & ANTIPATTERNS Slow responses SLA inversion Attacks of self denial Unbalanced capacities Unbounded result set Blocked threads Scaling effects Sunday 17 November 13 - the antipatterns will create, accelerate or multiply cracks in the system - the patterns provide architecture and design guidance to reduce, eliminate, or mitigate the effects of cracks in the system
  • 11.
    CIRCUIT BREAKER Frontend service Half-Open Timeout Closed Open Open Failure counter:2 1 0 Sunday 17 November 13 - mediator (decoupling, isolation), integration point wrapper - fail fast, recovery Failure Backend service
  • 12.
    BULKHEADS Hiccup HTTP client Bulkheads Frontend service Connections pool Backend serviceA Backend service B Backend service C Sunday 17 November 13 fat tails - sizing/capacity isolation, degrading functionality
  • 13.
    Latency and FaultTolerance for Distributed Systems https://github.com/Netflix/Hystrix Sunday 17 November 13 - do not reinvent the wheel - OSS, Java - most of the patterns are implemented there (circuit breaker, bulkheads, fail fast)
  • 14.
    public class CommandHelloWorldextends HystrixCommand<String> { private final String name; public CommandHelloWorld(String name) { super(HystrixCommandGroupKey.Factory.asKey("ExampleGroup")); this.name = name; } @Override protected String run() { // a real example would do work like a network call here return "Hello " + name + "!"; } } String s = new CommandHelloWorld("World").execute(); Future<String> fs = new CommandHelloWorld("World").queue(); Sunday 17 November 13 HystrixCommand = circuit breaker + bulkheads Synchronous/Asynchronous usage Async usage Future -> timeout
  • 15.
    Focus on failures TESTING Simulatebad things Chaos monkey [2] Sunday 17 November 13 - we do test but on "sligthly" different topology. It makes hard to reveal some kind of bugs we mostly test optimistic cases HTTP mock server (bad responses, slow responses, protocol violation...) Longevity tests
  • 16.
    Adaptable design [3] ARCHITECTURE Conway’s law Sunday17 November 13 ROC [4] [5] - early decisions are hard to revert later (costs) - Big Up Front Design doesn’t work prefer Adaptable design - Framework, Platform - restartability (No restart the world), diagnostics (health checks), recovery mechanism circuit breaker, isolation/redundancy ()
  • 17.