In this talk, we explore five practical, tried-and-tested, real world techniques for improving operability with many kinds of software systems, including cloud, Serverless, Microservices, on-premise, and IoT. Based on our work in many industry sectors, we will share our experience of helping teams to improve the operability of their software systems through these straightforward, team-friendly techniques.
From a talk given at DevOpsCon Munich 2017 https://devopsconference.de/microservices/practical-team-focused-operability-techniques-for-distributed-systems/
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Practical, team-focused operability techniques for distributed systems - DevOpsCon Munich 2017
1. Practical, Team-focused Operability
Techniques for Distributed Systems
Matthew Skelton, Skelton Thatcher Consulting
skeltonthatcher.com / @SkeltonThatcher
DevOpsCon 2017, MUC, DE – 20 Nov 2017
2. Today
Run Book dialogue sheets
User Personas for dashboards
Endpoint healthchecks
Modern logging
Correlation IDs
19. Run Book dialogue sheets
Checklists for typical operational
considerations
Team-friendly exploration
20. System characteristics
Hours of operation
During what hours does the service or system actually need to operate? Can portions or features of the
system be unavailable at times if needed?
Hours of operation - core features
(e.g. 03:00-01:00 GMT+0)
Hours of operation - secondary features
(e.g. 07:00-23:00 GMT+0)
Data and processing flows
How and where does data flow through the system? What controls or triggers data flows?
(e.g. mobile requests / scheduled batch jobs / inbound IoT sensor data )
…
35. endpoint healthchecks
Every runnable app/service/daemon
exposes /status/health
An HTTP GET to the endpoint returns:
200 – "I am healthy"
500 – "I am sick"
59. Example: video processing
On-demand processing of TV and
mobile streaming adverts
Ad-agency TV broadcaster
High throughput
Glitch-free video & audio
63. Example: video processing
Discover processing bottlenecks
Trigger alerts via LogEntries /
HostedGraphite
Report on KPIs
Target areas for improvement
85. Resources
• Team Guide to Software Operability by Matthew Skelton and Rob
Thatcher (Skelton Thatcher Publications, 2016)
http://operabilitybook.com/
• Run Book template & Run Book dialogue sheets
http://runbooktemplate.info/