Modern software systems now increasingly span cloud and on-premises deployments and remote embedded devices and sensors. These distributed systems bring challenges with data, connectivity, performance, and systems management; to ensure success, you must design and build with operability as a first-class property.
Matthew Skelton shares five practical, tried-and-tested techniques for improving operability with many kinds of software systems, including the cloud, serverless, on-premises, and the IoT: logging as a live diagnostics vector with sparse event IDs; operational checklists and runbook dialog sheets as a discovery mechanism for teams; endpoint health checks as a way to assess runtime dependencies and complexity; correlation IDs beyond simple HTTP calls; and lightweight user personas as drivers for operational dashboards.
These techniques work very differently with different technologies. For instance, an IoT device has limited storage, processing, and I/O, so generating and shipping of logs and metrics looks very different from cloud or serverless cases. However, the principles—logging as a live diagnostics vector, event IDs for discovery, etc.—work remarkably well across very different technologies.
Drawing from his experience helping teams improve the operability of their software systems, Matthew explains what works (and what doesn’t) and how teams can expand their understanding and awareness of operability through these straightforward, team-friendly techniques.
From a talk given by Matthew Skelton at Velocity Conference EU 2017 - https://conferences.oreilly.com/velocity/vl-eu/public/schedule/detail/61954