In this talk, Matthew Skelton (Skelton Thatcher Consulting) explores five practical, tried-and-tested, real-world techniques for improving operability with many kinds of software systems, including cloud, Serverless, on-premise, and IoT.
Logging as a live diagnostics vector with sparse event IDs
Operational checklists and 'run book dialogue sheets' as a discovery mechanism for teams
Endpoint healthchecks as a way to assess runtime dependencies and complexity
Correlation IDs beyond simple HTTP calls
Lightweight 'User Personas' as drivers for operational dashboards
These techniques work very differently with different technologies. For instance, an IoT device has limited storage, processing, and I/O, so generation and shipping of logs and metrics looks very different from the cloud or 'serverless' case. However, the principles - logging as a live diagnostics vector, event IDs for discovery, etc - work remarkably well across very different technologies.
From a talk at Agile in the City Bristol 2017 http://agileinthecity.net/2017/bristol/sessions/index.php?session=44