2. From components to interactions
• Promiser (component) -> promisee(s) / stakeholder(s)
• Quantitative delivery and qualitative interpretation (perceptions)
• Error/fault -> Measured deviations from expected behaviour (probability)
• Incident -> Promise not kept (overlap of intent)
• Ticket -> Diagnostics (graph causation)
3. Agents and their promises
• Agents can be humans or machines
• Promises are quantifiable (not just yes/no)
• Scalable theory (agents inside agents)
• Different agents promise different capabilities
• Different agents perceive differently (context and capability)
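A minimal sketch of the last three bullets, assuming a promise is assessed as the fraction of observations that a given promisee counts as "kept". The names `Promise` and `kept_fraction`, the latency figures, and the two thresholds are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Promise:
    """Hypothetical record of who promises what (illustration only)."""
    promiser: str
    body: str


def kept_fraction(samples: Sequence[float],
                  acceptable: Callable[[float], bool]) -> float:
    """Fraction of observations that this observer counts as 'promise kept'."""
    if not samples:
        raise ValueError("no observations to assess")
    return sum(1 for s in samples if acceptable(s)) / len(samples)


promise = Promise("web-server", "respond within the agreed latency")

# Invented measurements of the same delivery, in milliseconds.
latencies_ms = [80, 120, 95, 300, 110, 90, 450, 100]

# Two promisees with different contexts judge the same data differently.
batch_job = kept_fraction(latencies_ms, lambda ms: ms <= 400)
interactive_user = kept_fraction(latencies_ms, lambda ms: ms <= 100)

print(promise.body, batch_job, interactive_user)
```

The delivery is the same in both cases; only the assessment differs, which is why the outcome is a number per observer rather than a single yes/no.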
4. Fault/Error modes
• Agents can be humans or machines
• Promises are quantifiable
• Scalable theory
• Flawed communication
• Correctly intended/sent AND correctly perceived/received?
• Flawed / missing promises
• Flawed / missing cooperation (agreement) or interpretation
• Just wrong mindset / intuition
• Byzantine failures
5. Issues relating to faulty cooperation
• Dependency (makes faults travel)
• Amplification (makes faults worse)
• Redundancy (helps to absorb faults)
• Repair (makes faults disappear before they are noticed)
• Tolerance (keeps working in spite of faults)
Diverge
Converge
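The "dependency makes faults travel" point can be sketched as a walk over a dependency graph. This is an illustrative assumption, not a method given in the slides: the function `affected_by` and the agent names are hypothetical, and a fault is assumed to reach every agent that depends, directly or indirectly, on the faulty one.

```python
from collections import deque


def affected_by(fault_origin, depends_on):
    """Return the agents that depend, directly or indirectly, on fault_origin."""
    # Invert "agent depends on dep" into "dep is relied on by agent".
    dependents = {}
    for agent, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(agent)

    # Breadth-first walk outward from the faulty agent.
    seen = {fault_origin}
    queue = deque([fault_origin])
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)

    seen.discard(fault_origin)
    return seen


# Hypothetical system: each agent maps to the agents it depends on.
depends_on = {
    "web": {"app"},
    "app": {"db", "cache"},
    "report": {"db"},
    "db": set(),
    "cache": set(),
}

print(sorted(affected_by("db", depends_on)))
```

Running the same walk in the opposite direction, from a failing agent towards what it depends on, gives a list of candidate causes, which is one way to read "Ticket -> Diagnostics (graph causation)".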
6. Redundancy
• Serial:
• Humans: “Are you sure you want to do X?” (Self-confirm - AND = circuit breaker)
• Clients: failover to server X if server Y is not available (Self-repair - XOR Backup)
• Parallel:
• Humans: “Insert both your keys to confirm” (Average/voting - AND circuit breaker)
• Clients: query all sources for quorum (minimum acceptable confirmation - vote)
Converge/Confirm
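The two client patterns above can be sketched as follows. This is a minimal sketch under stated assumptions: `failover`, `quorum`, and the three example sources are hypothetical names, each source is modelled as a plain callable, and the "parallel" queries are issued one after another for simplicity rather than concurrently.

```python
from collections import Counter


def failover(sources):
    """Serial redundancy: try each source in order, return the first answer."""
    last_error = None
    for source in sources:
        try:
            return source()
        except Exception as exc:  # this source broke its promise; try the next
            last_error = exc
    raise RuntimeError("all sources failed") from last_error


def quorum(sources, minimum):
    """Parallel redundancy: ask every source, accept a value only if at
    least `minimum` sources agree on it; otherwise return None."""
    answers = []
    for source in sources:
        try:
            answers.append(source())
        except Exception:
            continue  # an unavailable source simply casts no vote
    if not answers:
        return None
    value, votes = Counter(answers).most_common(1)[0]
    return value if votes >= minimum else None


# Hypothetical sources used only for the demonstration.
def down():
    raise ConnectionError("server Y is not available")


def up():
    return "v2"


def stale():
    return "v1"


print(failover([down, up]))
print(quorum([up, up, stale], minimum=2))
print(quorum([up, stale, down], minimum=2))
```

Failover accepts the first answer it gets, so it absorbs unavailability but not wrong answers; the quorum also absorbs a minority of wrong answers, at the cost of querying every source.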
7. Repair and tolerance
• There can be MANY ways to break a component
• It is MORE efficient to detect and repair quickly than to try to prevent failure
• It is MOST efficient to tolerate errors and failures at all stages
Converge
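One common way to quantify the repair argument, which the slide itself does not state, is the standard steady-state availability formula, where MTBF is the mean time between failures and MTTR is the mean time to repair:

```latex
A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}},
\qquad
1 - A = \frac{\mathrm{MTTR}}{\mathrm{MTBF} + \mathrm{MTTR}}
\approx \frac{\mathrm{MTTR}}{\mathrm{MTBF}}
\quad \text{when } \mathrm{MTTR} \ll \mathrm{MTBF}.
```

Under this model, halving the repair time reduces unavailability by about as much as doubling the time between failures. Whether faster repair is also the cheaper option depends on the system, so the formula illustrates the slide's claim rather than proving it.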
9. Too late = broken promise
• Separate concerns by timescales, not by features
• Management is about balance
• Error correction as fast as error generation
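The last bullet can be read as a simple rate balance. As an illustrative assumption (the symbols are not from the slides), let errors be generated at a constant rate \(g\) and corrected at a constant rate \(c\), and let \(B(t)\) be the number of uncorrected errors at time \(t\). While \(B(t) > 0\):

```latex
\frac{dB}{dt} = g - c
\quad\Longrightarrow\quad
B(t) = B(0) + (g - c)\,t .
```

The backlog stays bounded only if \(c \ge g\). If correction is slower than generation, the backlog grows without limit, and every promise that depends on it eventually arrives too late.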