Be the first to like this
Modern software systems now increasingly span cloud, on-premise, and remote embedded devices & sensors. These distributed systems bring challenges with data, connectivity, performance, and systems management, so for business success we need to design and build with operability as a first class property.
In this talk, we explore five practical, tried-and-tested, real world techniques for improving operability with many kinds of software systems, including cloud, Serverless, on-premise, and IoT:
- Logging as a live diagnostics vector with sparse Event IDs
- Operational checklists and 'Run Book dialogue sheets' as a discovery mechanism for teams
- Endpoint healthchecks as a way to assess runtime dependencies and complexity
- Correlation IDs beyond simple HTTP calls
- Lightweight 'User Personas' as drivers for operational dashboards
These techniques work very differently with different technologies. For instance, an IoT device has limited storage, processing, and I/O, so generation and shipping of logs and metrics looks very different from the cloud or Serverless case. However, the principles - logging as a live diagnostics vector, Event IDs for discovery, etc. - work remarkably well across very different technologies.